# Calculating Pearson Correlation -- two variables?

Hello,

I am facing confusion as to why we have to assign two variables when calculating the Pearson correlation. For example:

```
from scipy.stats import pearsonr

corr_price_sqfeet, p = pearsonr(housing.price, housing.sqfeet)

print(corr_price_sqfeet)
```

Here I am wondering why we assign both `corr_price_sqfeet` and `p` to the result of `pearsonr(housing.price, housing.sqfeet)`. What does `p` represent?

Itâs because youâre measuring the linear dependence between two variables. In this case, housing price & sq. feet. The range is -1 and 1 (a negative relationship or a positive one, or, 0 meaning no relationship).

Is this an instance of unpacking? That is, would `corr_price_sqfeet` match to `housing.price` and `p` to `housing.sqfeet`?
If so, I am still not appreciating the functionality of assigning the variables to the arguments in this way.

HmâŚIâm not sure Iâm following what âunpackingâ is.

Youâve imported the pearson module from scipy stats and it requires two variablesâso, youâre passing through the two quantitative variables that you think might have a relationshipâhousing price and sq. feet.
Youâre calculating the correlation (`corr_price_sqfeet`) but also the p value (but that value only relates to the sample data). Though a Pearson correlation doesnât show us any significance; just that the two quantitative variables in the sample data are either positively, negatively or not at all correlated. Youâre seeing the strength of the relationship (range -1 and 1).

Usually after a Pearson test one would do a hypothesis test (t-test) to see if there's any significance. Your null hypothesis (H0) would be something like "square footage has no effect on sale price," and your alternative (Ha) would be "square footage does affect sale price."

By â`[unpacking](https://www.geeksforgeeks.org/unpacking-a-tuple-in-python/)`â I mean assigning variables to arguments. Although, I no longer think this is relevant here.

So in the above example, it seems that `corr_price_sqfeet` is assigned whatever value `pearsonr()` returns first. But what is the p-value? When I print it out, it does not make much sense in relation to my data...

Also, thank you for the general stats insight!

Yep, the first thing is the Pearson correlation coefficient (often represented by r in formulas, where it means the sample correlation coefficient).

As for the second thing...
The p-value is used for stuff like hypothesis testing,
meaning checking "Are they really correlated? Or is it something that just happened by chance?"

Or stated in more detail: "Are these variables really correlated in the population? Or did this correlation just happen by chance for this particular sample, and the variables are not actually correlated in the population?"
(That's roughly the null hypothesis vs. the alternative hypothesis here, if you've heard of those.)

The p-value is the probability of seeing a correlation at least this strong in a sample if the variables were not actually correlated in the population. It is calculated from the correlation coefficient and the number of observations in the sample.
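For anyone curious how the coefficient and the sample size turn into a p-value, here is a hedged sketch using the standard t-statistic for a correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The data is randomly generated for illustration, and this is the textbook formula rather than a claim about how SciPy computes it internally; the two-sided result should agree with what `pearsonr` returns up to floating-point error.

```python
import numpy as np
from scipy import stats

# Randomly generated illustration data (not the housing data).
rng = np.random.default_rng(42)
x = rng.normal(size=30)
y = 0.5 * x + rng.normal(size=30)

# Coefficient and p-value as returned by SciPy.
r, p = stats.pearsonr(x, y)

# The same p-value rebuilt by hand from r and the sample size n.
n = len(x)
t_stat = r * np.sqrt((n - 2) / (1 - r**2))
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # two-sided tail probability

print(r, p, p_manual)
```

This also shows why the same r can be "significant" in a large sample and not in a small one: n enters the formula directly.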

I know this isnât a precise explanation (and parts of this may be inaccurate).
Correct me if Iâm wrong please.

I vaguely recall doing this lesson but forgot where it is on the DS path. Do you have a link?

Yes, correct.

The p-value tells you whether the correlation is statistically significant (conventionally, p < 0.05).

But, like I said, this is just a first step in hypothesis testing.

Very cool. Thank you.
Here is the link, if you are still curious