My code can be found here on GitHub: https://github.com/spang047/Date-A-Scientist-1.git

I am analyzing the data and have spent a good twenty hours on the project. I am trying to develop a regression model that predicts a person’s income from the other information provided in the data set.

However, I have tried several approaches to get a better result, to no avail. I attempted to scale the data to prevent one set of large numbers, like income, from biasing the results. I also removed columns from the data set that do not have income information.
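For reference, here is a minimal sketch of that preprocessing step in plain Python. It rests on several assumptions that are not confirmed above: income is treated as the target (so only the feature columns are scaled), unreported income is marked with a sentinel value of -1, and the records and column choices are made up for illustration. The helpers `fit_scaler` and `transform` are hypothetical names, not part of any library.

```python
from statistics import mean, pstdev

def fit_scaler(rows):
    """Learn a (mean, standard deviation) pair for each feature column."""
    columns = list(zip(*rows))
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]

def transform(rows, params):
    """Standardize each value using the parameters learned by fit_scaler."""
    return [[(v - m) / s for v, (m, s) in zip(row, params)] for row in rows]

# Made-up records: (age, height_in, income). -1 is an ASSUMED marker for
# "income not reported"; check how the real data set encodes missing values.
records = [(25, 70, 40000), (32, 65, -1), (41, 68, 80000), (29, 72, 20000)]

# Keep only the records that actually report an income.
known = [r for r in records if r[2] != -1]

X = [list(r[:2]) for r in known]   # features
y = [r[2] for r in known]          # target, left unscaled

params = fit_scaler(X)             # learn the scaling from the features only
X_scaled = transform(X, params)
```

With a train/test split, the scaling parameters would normally be learned from the training rows only and then reused on the test rows.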

I also tried feeding different combinations of the data into the model to “trim the forest,” but no combination drove the score close to one.

Also, I am very confused as to why all of my attempts with the KNN model go to zero. I would have expected at least one combination to go above zero, but everything is negatively correlated.
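One note on reading these scores, assuming the score is R² (the value scikit-learn regressors return from `.score()`; the metric is not named above, so this is a guess): R² is not a correlation, and it can be negative. It compares the model's squared error to that of a baseline that always predicts the mean, so zero means "no better than the mean" and a negative value means "worse than the mean." The sketch below computes it by hand on made-up numbers; the `r2_score` function here is a hand-written stand-in, not the library function.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

incomes = [20000, 30000, 40000, 50000, 60000]   # made-up values

# Baseline: always predict the mean income. This scores exactly zero.
baseline = [40000] * len(incomes)
print(r2_score(incomes, baseline))

# Predictions that miss by more than the baseline does score below zero.
poor = [60000, 20000, 50000, 30000, 40000]
print(r2_score(incomes, poor))
```

So a score that hovers around zero or dips below it suggests the features carry little signal about income for that model, rather than a negative relationship.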

I’m tempted to post the project as a way to show a null result. Basically, an ML regression model that can predict the values couldn’t be built. Is there something major I am missing in my approach that is preventing a better score?

I did think of doing part three to try a classifier approach. It wouldn’t necessarily accomplish what I set out to do, which is predicting values, but I could try to classify which income bracket people would fall into with the model. That may be a way to at least build something that has a certain level of predictive value.
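A rough sketch of how the income values could be turned into class labels for that approach, in plain Python. The bracket edges and label names are hypothetical placeholders chosen for illustration, not values taken from the data set, and `income_bracket` is a made-up helper.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bracket boundaries; pick edges that suit the real distribution.
BRACKET_EDGES = [30000, 60000, 100000]
BRACKET_LABELS = ["under 30k", "30k-60k", "60k-100k", "100k+"]

def income_bracket(income):
    """Map an income to its bracket; a value on an edge goes to the higher bracket."""
    return BRACKET_LABELS[bisect_right(BRACKET_EDGES, income)]

incomes = [20000, 30000, 45000, 80000, 150000, 20000]   # made-up values
labels = [income_bracket(i) for i in incomes]

# Checking the class counts first helps spot brackets that are too sparse.
print(Counter(labels))
```

The resulting labels can then serve as the target for a classifier in place of the raw income values.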

Any suggestions to improve the regression models would be great!

I love my life,

Lucas