
Used Car Price Prediction
The project aimed to develop a multiple regression model and a program that estimates used car prices in the market. The team obtained a dataset from Kaggle, focused on six popular car brands, and performed data cleaning and exploration. Two regression models were created, one fitted for individual car brands and another for all brands combined, with an average R-squared value of 87%. A Python program was then developed that lets users enter a car's variables and receive an estimated price, giving potential buyers a tool to check prices before making a purchasing decision.

Python was used to check, validate, and filter the data. To ensure the quality of our data, we first applied basic data cleaning procedures in Python, for example checking for invalid values and null values.
We noticed several unreasonable observations in the dataset, such as used cars from the year 2060, cars that can run 470 miles per gallon (mpg), and cars with zero engine size.
The images here show the unreasonable observations and the code used to rectify the issue, which limits the year to 2000 - 2022, the MPG range to 10 - 70, engine size to more than 0, and tax to more than 0.
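The filtering rules above can be sketched in a few lines of plain Python. This is only an illustration and not the team's original code: the field names (`year`, `mpg`, `engineSize`, `tax`), the sample rows, and the choice to treat the year and MPG limits as inclusive are all assumptions.

```python
# Minimal sketch of the range checks described above.
# Field names and sample rows are hypothetical; the real dataset may differ.

REQUIRED_FIELDS = ("year", "mpg", "engineSize", "tax")


def is_reasonable(row):
    """Return True if a row has no nulls and passes every range check."""
    # Reject rows with a missing (null) value in any checked field.
    if any(row.get(field) is None for field in REQUIRED_FIELDS):
        return False
    return (
        2000 <= row["year"] <= 2022   # assumed inclusive limits
        and 10 <= row["mpg"] <= 70    # assumed inclusive limits
        and row["engineSize"] > 0
        and row["tax"] > 0
    )


def clean(rows):
    """Keep only the rows that pass is_reasonable."""
    return [row for row in rows if is_reasonable(row)]


# Made-up rows that mimic the kinds of problems mentioned in the text.
sample = [
    {"year": 2017, "mpg": 55.4, "engineSize": 1.4, "tax": 150},
    {"year": 2060, "mpg": 42.0, "engineSize": 1.6, "tax": 145},   # impossible year
    {"year": 2015, "mpg": 470.8, "engineSize": 0.6, "tax": 20},   # impossible mpg
    {"year": 2018, "mpg": 50.0, "engineSize": 0.0, "tax": 145},   # zero engine size
    {"year": 2019, "mpg": None, "engineSize": 2.0, "tax": 145},   # null value
]

cleaned = clean(sample)
print(len(sample), "rows before,", len(cleaned), "rows after cleaning")
```

With a tabular library such as pandas, the same conditions would typically be combined into one boolean mask and applied to the whole table at once.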



Variable Selection Using SPSS


Heatmap Correlation
Scatter plot for Individual Car Brand multiple regression results
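To make the multiple regression step concrete, here is a minimal sketch that fits an ordinary least squares model with NumPy, reports R-squared, and estimates the price of one car. It is not the team's SPSS or Python model: the predictors (`age`, `mileage`, `engine`), the coefficients used to generate the data, and the data itself are synthetic assumptions chosen only for illustration.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares; return [b0, b1, ...]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients to one or more rows of predictors."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]


def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data standing in for the cleaned dataset (all values are made up).
rng = np.random.default_rng(0)
n = 200
age = rng.integers(0, 23, n)              # years since manufacture
mileage = rng.uniform(1_000, 120_000, n)  # miles driven
engine = rng.uniform(1.0, 3.0, n)         # engine size in litres
X = np.column_stack([age, mileage, engine])
price = 20_000 - 900 * age - 0.05 * mileage + 3_000 * engine + rng.normal(0, 500, n)

coef = fit_linear_regression(X, price)
r2 = r_squared(price, predict(coef, X))

# Estimate the price of one hypothetical car: 5 years old, 40,000 miles, 1.6 L.
estimate = predict(coef, [[5, 40_000, 1.6]])
print("R-squared:", round(float(r2), 3))
print("Estimated price:", round(float(estimate[0]), 2))
```

Fitting the same function once per brand and once on the pooled data would mirror the two-model setup described in the summary; categorical variables such as brand or fuel type would first need to be encoded as numeric columns.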
FULL REPORT BELOW

20221204_BANA212 FINAL Project Report_ Team 12A (18 pages)