My code can be found here on GitHub: https://github.com/spang047/Date-A-Scientist-1.git
I am working on analyzing the data, and have spent a good twenty hours on the project. What I am trying to develop is a regression model that predicts a person’s income based on the other provided information in the data set.
However, I have tried several approaches to get a better result, to no avail. I attempted to scale the data to prevent one set of large numbers, like income, from biasing the results. I also removed columns from the data set that do not have income information.
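For anyone following along, here is a minimal sketch of the kind of scaling described above. It is not the code from the repository. The feature names and values are made up for illustration, and the helpers `fit_scaler` and `transform` are hypothetical. It standardizes each feature column to zero mean and unit variance, using statistics computed from the training rows only, and leaves the income target alone.

```python
from statistics import mean, pstdev

def fit_scaler(rows):
    """Return per-column (mean, std) computed from the training rows only."""
    columns = list(zip(*rows))
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]

def transform(rows, params):
    """Apply z-score scaling to each row using previously fitted parameters."""
    return [[(x - m) / s for x, (m, s) in zip(row, params)] for row in rows]

# Hypothetical feature rows: [age, height_in_inches]. Income is the target
# and is deliberately not part of the scaled features.
train_features = [[22, 70], [35, 64], [41, 68], [29, 72]]
test_features = [[30, 66]]

params = fit_scaler(train_features)
train_scaled = transform(train_features, params)
# The test rows reuse the training statistics, so nothing leaks from test data.
test_scaled = transform(test_features, params)
```

Scaling matters most for distance-based models such as KNN, where a feature with a large numeric range would otherwise dominate the distance calculation.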
I also tried different combinations of features in the model to “trim the forest,” but no combination drove the score close to one.
Also, I am very confused as to why all of my attempts with the KNN model score at zero or below. I would have expected at least one combination to score above zero, but every score is negative.
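One thing that may help in reading those scores: for scikit-learn regressors, `score` returns R², which is 1 minus the ratio of the model's squared error to the squared error of always predicting the mean. A negative value therefore means "worse than predicting the mean," which is not the same thing as a negative correlation. Below is a small sketch with made-up income numbers. The hand-written `r2_score` function follows the standard R² formula and is only for illustration.

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, the coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up incomes, purely for illustration.
incomes = [20000, 40000, 60000, 80000]

# Always predicting the mean income scores exactly zero.
baseline = [50000] * len(incomes)

# These predictions track the true values perfectly (positive correlation)
# but are offset by a constant, so the squared error is large.
shifted = [t + 100000 for t in incomes]

baseline_score = r2_score(incomes, baseline)  # 0.0
shifted_score = r2_score(incomes, shifted)    # -19.0
```

So a score at or below zero suggests the model is doing no better than a constant guess, which can happen when the features carry little information about income or when the target column is dominated by placeholder values.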
I’m tempted to post the project as a way to show a null result: basically, an ML regression model couldn’t be built that predicts the values. Is there something major I am missing in my approach that is preventing a better score?
I did think of doing part three and trying a classifier approach. It wouldn’t necessarily accomplish what I set out to do, which is predicting values, but I could try to classify which income bracket people fall into. That may be a way to at least build something with some predictive value.
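To make the classifier idea concrete, here is a rough sketch of turning income values into bracket labels that a classifier could be trained on. The bracket edges, label names, and sample incomes are all assumptions chosen for illustration, not values taken from the data set.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bracket boundaries and labels, chosen only for illustration.
BRACKET_EDGES = [30000, 60000, 100000]
BRACKET_LABELS = ["under_30k", "30k_to_60k", "60k_to_100k", "100k_plus"]

def income_bracket(income):
    """Map an income to its bracket; each lower edge is inclusive."""
    return BRACKET_LABELS[bisect_right(BRACKET_EDGES, income)]

# Made-up incomes standing in for the data set's income column.
sample_incomes = [20000, 30000, 45000, 80000, 150000, 25000]
labels = [income_bracket(value) for value in sample_incomes]

# Checking class balance first helps when judging classifier accuracy later.
class_counts = Counter(labels)
```

If one bracket holds most of the rows, plain accuracy can look good even for a model that always predicts that bracket, so the class counts are worth a look before comparing models.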
Any suggestions to improve the regression models would be great!
I love my life,