Perhaps the answer is obvious, but to be honest, I have been working on this project for almost three months, and I still feel very confused about how these analytical and predictive models work.
I have been using multivariate linear regression to explore linear relationships between primary ethnicity and orientation, as well as other factors. There are not any to speak of for orientation, but that’s not the question.
Though I have identified and isolated nine primary ethnicities (see below), I only receive back eight coefficients.
I do not understand why that would be the case.
It happens even though I know every member of every ethnicity in the sample has a numerical value (0-2) for the variable being regressed, which in this case is orientation.
Though I am finding nothing of statistical significance to suggest greater diversity in orientation, I still should get back nine coefficients - one for each ethnicity - using sklearn’s linear regression model, right?
This happens whether I work with an unbalanced sample, in which an overwhelming share of members (87%) are straight, or with a balanced sample containing equal numbers of “straight” and “not-straight” members.
I thought it would make a compelling perspective, given that a dating site is the one place people would be honest about this topic. I am bringing an objective viewpoint to it.
I just don’t understand why I am receiving one fewer coefficient (8) than I should, given the nine primary ethnicities in my datasets. Please help.
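As a sanity check, here is a minimal sketch of how the column count and the coefficient count can be compared. Everything in it is an assumption for illustration: the data is synthetic (random, not my dataset), the variable names are made up, and the encoding via `pd.get_dummies` may not match how the real columns were built. It only shows that sklearn's `LinearRegression` returns one coefficient per input column, and that one common way to end up with eight columns from nine categories is `drop_first=True`. Whether that is what happened in my pipeline is a guess.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (assumption: NOT the real dataset).
rng = np.random.default_rng(0)
ethnicities = [
    "asian", "black", "hispanic_", "indian", "middle_eastern",
    "native_american", "other", "pacific_islander", "white",
]
n = 900
# Repeat each category 100 times, then shuffle, so all nine appear.
eth = pd.Series(rng.permutation(np.repeat(ethnicities, n // 9)), name="eth1")
# Orientation coded 0-2, drawn at random here.
y = rng.integers(0, 3, size=n)

# Nine dummy columns: one per ethnicity.
X_full = pd.get_dummies(eth, prefix="eth1", dtype=float)
# Eight dummy columns: the first category is dropped as the baseline.
X_drop = pd.get_dummies(eth, prefix="eth1", drop_first=True, dtype=float)

coef_full = LinearRegression().fit(X_full, y).coef_
coef_drop = LinearRegression().fit(X_drop, y).coef_

# coef_ always has one entry per column passed to fit().
print(X_full.shape[1], len(coef_full))
print(X_drop.shape[1], len(coef_drop))
print(list(X_drop.columns))
```

Printing `X.shape[1]` and `list(X.columns)` on the real feature matrix right before calling `fit` should show whether a column went missing before the model ever saw it.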
['eth1_asian', 'eth1_black', 'eth1_hispanic_', 'eth1_indian', 'eth1_middle_eastern', 'eth1_native_american', 'eth1_other', 'eth1_pacific_islander', 'eth1_white']
“Coeficient for linear regression model is: [ 0.0003811 0.1198191 -0.0091023 0.07000563 0.05869241 0.06398784