Predicting a variable from other variables


For a task, I have to predict income ten years after A-levels from three predictors: final high school degree (0-100%), IQ, and extraversion (scale from 0-10). I have a data set of 10,000 former students, so basically a 10000x4 table with three predictor variables (degree, IQ, extraversion) and one target variable (gross yearly income).

What would be a simple way of using regression or machine learning approaches to best predict income? My idea is to take a subset of the students (maybe half) to determine the weights that best fit the income. I would then use these weights to predict the income of the second half of the students (an independent data set) and see how well the prediction does. Does that make sense? Which functions would fit best here?
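To make the split-and-evaluate idea concrete, here is a minimal sketch in Python, assuming NumPy is available. The data are synthetic stand-ins, because the real table is not shown in this thread: the coefficients used to generate the income column are made up purely for illustration, and `add_intercept` is a hypothetical helper name. The weights are fitted by ordinary least squares on a random half of the students and then scored on the other half.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the 10000x4 table (assumption: the real data is not shown).
n = 10_000
degree = rng.uniform(0, 100, n)        # final high school degree, 0-100 %
iq = rng.normal(100, 15, n)            # IQ
extraversion = rng.uniform(0, 10, n)   # extraversion, scale 0-10

# Made-up relationship, used only to generate example incomes.
income = (10_000 + 200 * degree + 150 * iq + 500 * extraversion
          + rng.normal(0, 5_000, n))

X = np.column_stack([degree, iq, extraversion])
y = income

# Random 50/50 split into a training half and an independent test half.
idx = rng.permutation(n)
train, test = idx[: n // 2], idx[n // 2:]


def add_intercept(features):
    """Prepend a column of ones so the model can learn an offset."""
    return np.column_stack([np.ones(len(features)), features])


# Fit the weights (intercept + 3 slopes) on the training half only.
weights, *_ = np.linalg.lstsq(add_intercept(X[train]), y[train], rcond=None)

# Predict the held-out half and measure how well the weights generalize.
y_pred = add_intercept(X[test]) @ weights
residuals = y[test] - y_pred
rmse = np.sqrt(np.mean(residuals ** 2))
r2 = 1 - np.sum(residuals ** 2) / np.sum((y[test] - y[test].mean()) ** 2)

print("weights (intercept, degree, IQ, extraversion):", weights)
print(f"held-out RMSE: {rmse:.0f}, held-out R^2: {r2:.3f}")
```

With real data you would replace the synthetic columns with your own table. A single 50/50 split gives one estimate of out-of-sample performance; repeating the split, or using k-fold cross-validation, would probably give a more stable number.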


Is this for a school exam or a code challenge for a job interview?

Not sure why that is relevant, but it’s for a scientific paper.

We don’t give out answers for school exams or coding challenges.

See here for guidance.

This guy might have some good ideas for what regression model to select.

Okay, that is understandable. To be honest with you, I very much simplified the problem to make it more approachable. In reality, the predictor variables are neural measures from electroencephalography (EEG) data.

That sounds pretty fascinating!

I know there’s some debate around IQ tests in general. I feel like IQ isn’t an accurate measurement, though. The tests seem flawed to me because the results depend on access to education, nutrition, health conditions, income, etc.

Have you seen this from UC-Irvine?