Mobile Price Prediction Using Machine Learning Classification Techniques

Saiteja
4 min read · Oct 7, 2019


We use classification techniques such as logistic regression, decision trees (CART and ID3), random forest, naive Bayes, AdaBoost with an SVM base estimator, and KNN to classify the price range of mobile phones.

Modules involved:

  1. Loading data, pre-processing our data
  2. Visualizing our data, finding correlation among features and target label
  3. Splitting the data into training samples and testing samples
  4. Using classification techniques and finding the accuracy of the model
  5. Analyzing different classification metrics like MSE, RMSE, Precision, Recall, Accuracy etc.
  6. Concluding the best model.

About The Data

I took this data set from Kaggle. It consists of two CSV files; I used the train.csv file, which has 2000 rows and 21 columns. The columns are:

battery_power, blue, clock_speed, dual_sim, fc, four_g, int_memory, m_dep, mobile_wt, n_cores, pc, px_height, px_width, ram, sc_h, sc_w, talk_time, three_g, touch_screen, wifi, price_range

Here price_range is categorized from 0 to 3 in increasing order of price, i.e. 0 for low range, 1 for mid range, 2 for high range, and 3 for premium or very high range phones.

Our target label is price_range: we need to classify the price range of a mobile phone based on its specs or features.

1. Loading data, pre-processing our data

Let's load the data downloaded from Kaggle.

The data set we took is already clean, so we are not doing any pre-processing.
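The loading step can be sketched as below. This is a minimal example assuming pandas; the function name load_mobile_data is made up for illustration, and the missing-value count is only one simple way to back up the decision to skip pre-processing.

```python
import pandas as pd

TARGET = "price_range"  # target column named in the article


def load_mobile_data(path):
    """Read the CSV and return it with two quick sanity checks."""
    df = pd.read_csv(path)
    # Total number of missing cells; zero supports skipping pre-processing.
    n_missing = int(df.isnull().sum().sum())
    # Number of rows in each price class (0 to 3).
    class_counts = df[TARGET].value_counts().sort_index()
    return df, n_missing, class_counts


# Usage (assumes train.csv sits in the working directory):
# df, n_missing, class_counts = load_mobile_data("train.csv")
# print(df.shape, n_missing)
# print(class_counts)
```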

2. Visualizing our data, finding correlation among features and target label

Finding the correlation between the target label and the other columns:

Here we can observe that ram has the highest correlation value, i.e. as the ram value increases the price also increases. There is a negative correlation between the weight of the phone and the price, i.e. as the weight increases the price decreases, and so on.
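One way to get these per-feature correlations is sketched below, assuming the data is in a pandas DataFrame. The helper name correlation_with_target is hypothetical; sorting by absolute value is my own choice so that negative correlations such as mobile_wt are not pushed to the bottom.

```python
import pandas as pd


def correlation_with_target(df, target="price_range"):
    """Pearson correlation of every feature with the target, strongest first."""
    corr = df.corr(method="pearson")[target].drop(target)
    # Order by absolute value so strong negative correlations stay visible.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Usage:
# print(correlation_with_target(df).head(5))
```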

Let's see one example from the highest price range and its column values:

Correlation matrix of our data set:

We are using Pearson's correlation coefficient.

Heat map of our data set:
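The correlation matrix and its heat map could be produced roughly like this. It is a sketch assuming pandas and matplotlib; the function name, colour map and output file name are my assumptions, not something taken from the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_correlation_heatmap(df, out_path="correlation_heatmap.png"):
    """Compute the Pearson correlation matrix and save it as a heat map image."""
    corr = df.corr(method="pearson")
    fig, ax = plt.subplots(figsize=(10, 8))
    # Fix the colour scale to [-1, 1] so positive and negative values compare fairly.
    im = ax.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_yticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im, ax=ax, label="Pearson r")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
    return corr


# Usage:
# corr = plot_correlation_heatmap(df)
```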

3. Splitting the data into training samples and testing samples

We used the sklearn library to split the data into training and test sets, in a 70–30 ratio.
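A 70–30 split with scikit-learn might look like the sketch below. The seed value and the stratify option are assumptions on my side (the article only states the ratio); stratifying keeps the four price classes equally represented in both parts.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_features_target(df, target="price_range", test_size=0.3, seed=42):
    """Separate features from the label and split them 70-30."""
    X = df.drop(columns=[target])
    y = df[target]
    # stratify=y is an assumption: it preserves the class balance in both sets.
    return train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )


# Usage:
# X_train, X_test, y_train, y_test = split_features_target(df)
```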

4. Using classification techniques and finding the accuracy of the model

As an experiment, we first tried linear regression, which is not a classifier and gave very low accuracy, so we now use proper classification methods.

Logistic Regression:
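A minimal logistic regression sketch, assuming scikit-learn. The scaling step and max_iter=1000 are my assumptions: features such as ram and clock_speed live on very different scales, and without scaling the solver often fails to converge.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_logistic_regression(X_train, y_train, X_test, y_test):
    """Fit a scaled logistic regression and return it with its test accuracy."""
    # StandardScaler is an assumption, added to help the solver converge.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


# Usage (X_train, X_test, y_train, y_test come from the 70-30 split):
# model, accuracy = fit_logistic_regression(X_train, y_train, X_test, y_test)
```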

Decision tree classifier using Gini impurity (CART):

Corresponding tree diagram:

Decision tree classifier using entropy (ID3):

Corresponding tree diagram:
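Both decision tree variants above can be sketched with one helper, assuming scikit-learn. Note that scikit-learn's DecisionTreeClassifier always builds binary trees in the CART style; criterion="entropy" only switches the split measure to information gain, the measure ID3 uses. The helper name and the seed are assumptions.

```python
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier, export_text


def fit_tree(X_train, y_train, X_test, y_test, criterion="gini",
             max_depth=None, seed=0):
    """Fit a decision tree with the given split criterion."""
    tree = DecisionTreeClassifier(
        criterion=criterion, max_depth=max_depth, random_state=seed
    )
    tree.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, tree.predict(X_test))
    return tree, accuracy


# Usage: train both variants and print the top of each tree as text.
# for criterion in ("gini", "entropy"):
#     tree, accuracy = fit_tree(X_train, y_train, X_test, y_test, criterion)
#     print(criterion, accuracy)
#     print(export_text(tree, feature_names=list(X_train.columns), max_depth=2))
```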

AdaBoost with SVM as the base estimator:

Corresponding confusion matrix:
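Boosting an SVM and computing the confusion matrix could look like this sketch, assuming scikit-learn. The linear kernel, the number of estimators, the scaling step and probability=True are all my assumptions; the article does not list its hyperparameters.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fit_adaboost_svm(X_train, y_train, X_test, y_test, n_estimators=20,
                     labels=(0, 1, 2, 3), seed=0):
    """Boost a linear SVM and return the model with its confusion matrix."""
    # probability=True lets older scikit-learn versions use predict_proba.
    base = SVC(kernel="linear", probability=True, random_state=seed)
    # The base estimator is passed positionally because its keyword was
    # renamed from base_estimator to estimator between scikit-learn versions.
    booster = AdaBoostClassifier(base, n_estimators=n_estimators, random_state=seed)
    # Scale outside the booster: AdaBoost needs a base estimator whose fit
    # accepts sample_weight, which a Pipeline does not do directly.
    model = make_pipeline(StandardScaler(), booster)
    model.fit(X_train, y_train)
    # Rows are true classes, columns are predicted classes.
    cm = confusion_matrix(y_test, model.predict(X_test), labels=list(labels))
    return model, cm


# Usage:
# model, cm = fit_adaboost_svm(X_train, y_train, X_test, y_test)
# print(cm)
```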

KNN with the 10 nearest neighbours:
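A k-nearest-neighbours sketch, assuming scikit-learn and reading "10" as the number of neighbours k. All other settings are left at the library defaults, which is an assumption about the original run.

```python
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier


def fit_knn(X_train, y_train, X_test, y_test, k=10):
    """Fit a KNN classifier that votes among the k nearest training points."""
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, knn.predict(X_test))
    return knn, accuracy


# Usage:
# knn, accuracy = fit_knn(X_train, y_train, X_test, y_test, k=10)
```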

Random forest and multinomial naive Bayes:
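These two models can be trained side by side as in the sketch below, assuming scikit-learn. The number of trees and the seed are assumptions. MultinomialNB requires non-negative features, which holds for the columns listed in this data set.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB


def fit_forest_and_nb(X_train, y_train, X_test, y_test, seed=0):
    """Fit a random forest and a multinomial naive Bayes model."""
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        # MultinomialNB raises an error if any feature value is negative.
        "multinomial_nb": MultinomialNB(),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return models, scores


# Usage:
# models, scores = fit_forest_and_nb(X_train, y_train, X_test, y_test)
# print(scores)
```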

5. Analyzing different classification metrics like MSE, RMSE, Precision, Recall, Accuracy etc.

We already analysed them in the previous steps.
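For reference, the metrics named in this step can be computed for any set of predictions as sketched below, assuming scikit-learn and NumPy. Macro averaging is my assumption. MSE and RMSE treat the class labels as numbers, which is only meaningful here because price_range is ordered from 0 to 3.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    mean_squared_error,
    precision_score,
    recall_score,
)


def classification_metrics(y_true, y_pred):
    """Collect accuracy, macro precision/recall, MSE and RMSE in one dict."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # zero_division=0 scores a never-predicted class as 0 instead of warning.
        "precision_macro": precision_score(
            y_true, y_pred, average="macro", zero_division=0
        ),
        "recall_macro": recall_score(
            y_true, y_pred, average="macro", zero_division=0
        ),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
    }


# Usage:
# print(classification_metrics(y_test, model.predict(X_test)))
```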

6. Concluding the best model

Among all the classifiers, we choose the decision tree with Gini impurity (CART) because it has the highest accuracy, followed by ID3 and logistic regression.
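A comparison like this can be automated by scoring every fitted model on the same test set. The sketch below assumes scikit-learn; the function name rank_by_accuracy and the dictionary keys in the usage note are hypothetical.

```python
from sklearn.metrics import accuracy_score


def rank_by_accuracy(models, X_test, y_test):
    """Return (name, accuracy) pairs for fitted models, best first."""
    scores = {
        name: accuracy_score(y_test, model.predict(X_test))
        for name, model in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Usage (the dictionary holds already fitted classifiers):
# ranking = rank_by_accuracy({"cart": cart, "id3": id3, "logreg": logreg},
#                            X_test, y_test)
# print(ranking[0])
```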

We also ran the model on the test.csv file and checked the accuracy of its predictions.

That's all for now. Thank you!

A. Saiteja
