Mar 2025 - Apr 2025

Used Car Price Prediction Model

This project uses Linear Regression to predict used car prices based on features like mileage, engine size, and brand. The model delivers actionable insights and a clear breakdown of how different attributes influence market value, making it both statistically sound and interpretable.

Overview


This project applies Linear Regression to predict used car prices from key features: mileage, engine size, year of manufacture, body type, and brand. Built in Python, it aims for a statistically sound model that captures real-world pricing dynamics. The coefficients provide a clear, interpretable breakdown of how each feature influences the market value of cars.



Approach


  • Exploratory Data Analysis (EDA): 

    I started off with EDA, using scatterplots and correlation matrices to identify patterns and relationships between variables. During this phase, I also detected skewness in price, mileage, and year, which shaped the preprocessing steps.


  • Preprocessing: 

    To prepare the data, I applied a log transformation to the skewed price variable, dealt with outliers, and scaled the features. Dummy encoding turned categorical data into usable inputs, and I checked for multicollinearity so that correlated features wouldn't distort the coefficients. Additionally, I relaxed some linearity assumptions and validated the remaining Ordinary Least Squares (OLS) regression assumptions.


  • Modeling: 

    I trained the model using an 80/20 train-test split, with log-transformed price as the target. After standardizing the input features, I calculated the coefficients, making sure that the relationship between input variables and predicted prices was both logical and interpretable.


  • Evaluation: 

    The model achieved an R² of ~0.67 on the test set, meaning it explained roughly two-thirds of the variance in log-transformed used car prices. To assess its performance further, I analyzed the residuals and plotted actual vs. predicted values, highlighting where the model could be improved.
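
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the column names (`mileage`, `engine_size`, `year`, `brand`) and the scikit-learn stack are assumptions, and standardizing the dummy columns along with the numeric ones is one possible choice among several.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical; NOT the project's dataset).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "mileage": rng.uniform(5, 300, n),        # thousands of km
    "engine_size": rng.uniform(1.0, 4.0, n),  # litres
    "year": rng.integers(1995, 2017, n),
    "brand": rng.choice(["Audi", "BMW", "Toyota"], n),
})
log_price = (
    9.0
    - 0.004 * df["mileage"]
    + 0.25 * df["engine_size"]
    + 0.05 * (df["year"] - 1995)
    + 0.3 * (df["brand"] == "BMW")
    + rng.normal(0, 0.3, n)
)
df["price"] = np.exp(log_price)

# EDA: compare skewness of raw vs. log-transformed price.
print("skew(price):", df["price"].skew())
print("skew(log price):", np.log(df["price"]).skew())

# Preprocessing: log-transform the target, dummy-encode the category.
# drop_first=True leaves one brand out as the baseline.
y = np.log(df["price"])
X = pd.get_dummies(
    df.drop(columns="price"), columns=["brand"], drop_first=True, dtype=float
)

# Modeling: 80/20 split, scaler fitted on the training rows only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler().fit(X_train)
model = LinearRegression().fit(scaler.transform(X_train), y_train)

# Evaluation: test-set R² (on the log scale) and residuals.
y_pred = model.predict(scaler.transform(X_test))
test_r2 = r2_score(y_test, y_pred)
residuals = y_test - y_pred

# Standardized coefficients, sorted from most negative to most positive.
coefficients = pd.Series(model.coef_, index=X.columns).sort_values()
print("test R²:", round(test_r2, 3))
print(coefficients)
```

Fitting the scaler on the training split only keeps information from the test rows out of the model, so the test R² is a fairer estimate. Plotting `residuals` against `y_pred` is one way to look for the systematic errors mentioned above.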



Insights & Interpretability


  • Mileage negatively impacts the price, which aligns with expectations in the used car market since cars with higher mileage generally have lower prices.


  • Engine size and premium brands (like BMW and Mercedes) are positively associated with higher prices, showing that certain attributes add significant value.


  • Petrol engines and older vehicles show a lower influence on price compared to the baseline, suggesting that these factors aren't as strong in driving market value.


  • The model underestimates prices for luxury vehicles, likely due to their limited representation in the dataset, and tends to overestimate low-end outliers.
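
Because the target is log-transformed price, a coefficient is easiest to read as an approximate percentage change in price. The sketch below shows that conversion; the coefficient values are made up for illustration and are not the model's results. With standardized inputs, each figure would correspond to a one-standard-deviation increase in a numeric feature.

```python
import math


def pct_effect(coef: float) -> float:
    """Percent change in price implied by a coefficient on log(price)."""
    return (math.exp(coef) - 1.0) * 100.0


# Hypothetical coefficients (illustrative values only).
example_coefs = {"mileage": -0.45, "engine_size": 0.21, "brand_BMW": 0.02}

for name, coef in example_coefs.items():
    print(f"{name}: {pct_effect(coef):+.1f}% change in price")
```

For small coefficients the percentage is close to `100 * coef`, but the gap widens as the coefficient grows, so the exponential form is the safer one to report.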






Have a Question or Want to Connect?

 

Let's Get In Touch!

linkedin.com/in/shreeyasha-pandey/

United States


 

© 2025 by Shreeyasha Pandey. Powered and secured by Wix 

 
