NBA trends

Hello
I am currently working on the “NBA Trends” project (https://www.codecademy.com/paths/data-science/tracks/dacp-summary-statistics/modules/stats-associations-between-variables/projects/nba-trends) and I wanted to make sure I have the right interpretation of the step 8 result.

Step 8 is :
“Using the contingency table created in the previous exercise (Ex. 7), calculate the expected contingency table (if there were no association) and the Chi-Square statistic and print your results. Does the actual contingency table look similar to the expected table — or different? Based on this output, do you think there is an association between these variables?”

So I have a 2x2 table, and my Chi-Square result is 6.5. To me, that means there is not really a correlation between “game location” and “game result”. Is that correct?

I also transformed games_result and location to binary (0 for L, 1 for W and 0 for H, 1 for A) and ran a correlation matrix. The result shows a correlation coefficient of -0.12 between game result and location, which we can interpret as no correlation between these two variables.
So my second question is: is a correlation matrix really relevant in this case?

Thanks a lot for your help

Great questions!

For looking at an association between two categorical variables, one method is to simply compare the “expected” contingency table (if there’s no association) to the observed contingency table and see how much they differ. Larger differences indicate a stronger association. In this case, the observed table is:

133  105
 92  120

And the expected is:

[[119. 119.]
 [106. 106.]]

So there’s a clear (although not huge) difference between each pair of numbers (e.g. 133 vs. 119 or 92 vs. 106). While we haven’t really gotten to hypothesis testing at this point, the lesson also mentions that for a 2 by 2 table like this, a Chi-Square statistic greater than around 4 would suggest a clear association. So the statistic of 6.5 definitely suggests an association!
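If it helps to see where those numbers come from, here is a minimal sketch in plain Python that rebuilds the expected table from the row and column totals and then computes the Chi-Square statistic by hand. The helper name `chi_square` and its `yates` flag are made up for illustration. The sketch assumes the 6.5 came from `scipy.stats.chi2_contingency` with its default settings, which apply Yates' continuity correction to 2 by 2 tables; that would explain why the uncorrected statistic comes out a little higher.

```python
# Observed counts copied from the table above (rows and columns as printed).
observed = [[133, 105],
            [92, 120]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under no association: row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(obs, exp, yates=False):
    """Sum of (|O - E| - shift)^2 / E over all cells.

    Hypothetical helper: with yates=True, 0.5 is subtracted from each
    absolute difference (never going below zero) before squaring.
    """
    shift = 0.5 if yates else 0.0
    return sum(
        max(abs(o - e) - shift, 0.0) ** 2 / e
        for obs_row, exp_row in zip(obs, exp)
        for o, e in zip(obs_row, exp_row)
    )


chi2_plain = chi_square(observed, expected)
chi2_yates = chi_square(observed, expected, yates=True)

print(expected)              # [[119.0, 119.0], [106.0, 106.0]]
print(round(chi2_plain, 2))  # 6.99
print(round(chi2_yates, 2))  # 6.5
```

Either way the statistic is comfortably above the rough threshold of 4 mentioned in the lesson.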

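Because these are both categorical variables, a Pearson correlation is not the natural tool here (it is designed for two quantitative variables, or one quantitative and one binary categorical variable). The contingency table and the Chi-Square statistic are the more direct way to describe this kind of association.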
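One side note on the -0.12 from the question: when both variables are coded as 0/1, the Pearson coefficient is the same number as the phi coefficient, and its magnitude equals the square root of the uncorrected Chi-Square statistic divided by the number of games. The sketch below checks that against the observed table above. It assumes the cells map to the 0/1 codes in the order printed, which only affects the sign; the magnitude does not depend on the coding.

```python
import math

# Observed counts copied from the table above (rows and columns as printed).
observed = [[133, 105],
            [92, 120]]
(a, b), (c, d) = observed
n = a + b + c + d

# Phi coefficient for a 2x2 table; the sign depends on how the cells are coded.
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Uncorrected Chi-Square statistic, computed from the expected counts.
rows = [a + b, c + d]
cols = [a + c, b + d]
chi2_plain = sum(
    (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
    for i in range(2)
    for j in range(2)
)

print(round(abs(phi), 2))                   # 0.12
print(round(math.sqrt(chi2_plain / n), 2))  # 0.12
```

So a coefficient of about 0.12 in absolute value describes a weak association rather than none, which lines up with the "clear although not huge" difference between the observed and expected tables.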
