Hypothesis Testing - Indexing

I am working on "HYPOTHESIS TESTING WITH PYTHON Familiar: A Study In Data Analysis" and had no issues implementing the following question (question 2 of the project):
"Extract the life spans of subscribers to the 'vein' pack and save the data into a variable called vein_pack_lifespans."
I wrote the following for this part of the question:
```
# Select the rows where "pack" is "vein"
vein_pack_lifespans = lifespans.loc[lifespans["pack"] == "vein"]

print(vein_pack_lifespans)
```
But when it comes to implementing the t-test, I get a giant error. Part of it says:
TypeError: unsupported operand type(s) for /: 'str' and 'int'
Why doesn't this method work? How can I rewrite it (using a similar style of indexing) to make it work?

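This code, using .loc[], returns the pack column in addition to the lifespan column.

Because pack holds strings, computing the mean for the t-test ends up trying to divide a string by an integer count. That is the TypeError at the end of that lengthy error message.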
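To see this concretely, here is a minimal sketch on a small hypothetical DataFrame (the values are made up; only the column names pack and lifespan come from the project). It shows that the .loc filter keeps both columns, one of them holding strings, and that averaging the result column by column fails with the same kind of TypeError.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the project's lifespans DataFrame
lifespans = pd.DataFrame({
    "pack": ["vein", "artery", "vein", "artery", "vein"],
    "lifespan": [76.9, 79.2, 74.3, 80.1, 73.8],
})

# Boolean filter with .loc: every column of the matching rows is kept
vein_rows = lifespans.loc[lifespans["pack"] == "vein"]
print(vein_rows.shape)   # (3, 2): three rows, both columns
print(vein_rows.dtypes)  # pack holds strings (object dtype in pandas 2.x), lifespan is float64

# Averaging column by column concatenates the "pack" strings into one long
# string and then divides that string by the row count, raising a TypeError
try:
    np.mean(vein_rows.to_numpy(), axis=0)
except TypeError as err:
    print("TypeError:", err)
```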

Whereas this way:

vein_pack_lifespans = lifespans.lifespan[lifespans.pack=='vein']

returns only the lifespan column (dtype float64),

which is what you need both to find the mean with np.mean() and to run the one-sample t-test, ttest_1samp.
np.mean() is meant for numeric array-like input, not a mix of strings and numbers.
See:
https://numpy.org/doc/stable/reference/generated/numpy.mean.html
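Putting the fix together, here is a minimal sketch using scipy.stats.ttest_1samp on the same kind of hypothetical data. The hypothesized mean of 73 is an illustrative assumption, not a value taken from this thread; use whatever the project specifies.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_1samp

# Hypothetical toy data standing in for the project's lifespans DataFrame
lifespans = pd.DataFrame({
    "pack": ["vein", "artery", "vein", "artery", "vein"],
    "lifespan": [76.9, 79.2, 74.3, 80.1, 73.8],
})

# Attribute access picks one column; the boolean mask keeps only "vein" rows
vein_pack_lifespans = lifespans.lifespan[lifespans.pack == "vein"]
print(vein_pack_lifespans.dtype)  # float64

# Both calls now receive purely numeric data
print(np.mean(vein_pack_lifespans))

# HYPOTHETICAL null-hypothesis mean, chosen only for illustration
hypothesized_mean = 73
result = ttest_1samp(vein_pack_lifespans, hypothesized_mean)
print(result.statistic, result.pvalue)
```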
I hope that helps.


Yes, that helped, thank you! I think there is a concept here I need to review. Why is this written as lifespans.lifespan[lifespans.pack=='vein']

rather than as the following (ending with an s)?

lifespans.lifespans[lifespans.pack=='vein']

Your explanation made a lightbulb go off, and I rewrote the line as:

vein_pack_lifespans = lifespans[lifespans["pack"] == "vein"].iloc[:, 1]
print(vein_pack_lifespans)
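For comparison, the positional .iloc[:, 1] works because lifespan is the second column. A minimal sketch on hypothetical data (column order pack, lifespan assumed) shows that version next to a label-based .loc[mask, "lifespan"] call, which selects rows and column in one step and keeps working even if the column order changes, plus the attribute-plus-mask form from the answer key.

```python
import pandas as pd

# Hypothetical toy data; assumes "lifespan" is the second column, as in the post
lifespans = pd.DataFrame({
    "pack": ["vein", "artery", "vein", "artery", "vein"],
    "lifespan": [76.9, 79.2, 74.3, 80.1, 73.8],
})

mask = lifespans["pack"] == "vein"

# Positional: column 1 happens to be "lifespan" in this layout
by_position = lifespans[mask].iloc[:, 1]

# Label-based: rows and the "lifespan" column selected in a single .loc call
by_label = lifespans.loc[mask, "lifespan"]

# Attribute-plus-mask form used in the answer key
by_attribute = lifespans.lifespan[mask]

print(by_position.equals(by_label) and by_label.equals(by_attribute))  # True
```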

For the method used in the answer key, what is this concept called? I should probably review it.
Thank you for your time and help :slight_smile:

vein_pack_lifespans = lifespans.lifespan[lifespans.pack=='vein']
print(vein_pack_lifespans)

It can be confusing! :slight_smile:
I think it follows this pattern:

new_variable = original_dataframe.column_to_keep[original_dataframe.column_to_filter == 'value']

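You're selecting a subset of the DataFrame: one column, restricted to the rows where the condition is True. In pandas this is usually called boolean indexing.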
See:
https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html
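The pattern above can be broken into steps. A minimal sketch on hypothetical data: the comparison builds a boolean mask with one True/False per row, and indexing the chosen column with that mask keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical toy data standing in for the project's lifespans DataFrame
lifespans = pd.DataFrame({
    "pack": ["vein", "artery", "vein", "artery", "vein"],
    "lifespan": [76.9, 79.2, 74.3, 80.1, 73.8],
})

# Step 1: the comparison produces a boolean Series, one value per row
mask = lifespans.pack == "vein"
print(mask.tolist())  # [True, False, True, False, True]

# Step 2: pick the column you want to keep
column = lifespans.lifespan

# Step 3: indexing the column with the mask keeps only rows where mask is True
vein_pack_lifespans = column[mask]
print(vein_pack_lifespans.index.tolist())  # [0, 2, 4]
```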


For that particular query about lifespans.lifespan: this expression is equivalent to lifespans['lifespan'], which simply accesses the column 'lifespan' of the lifespans DataFrame. That dotted syntax, or attribute lookup (df.column_name), is a convenience offered by pandas; the following might add some useful detail if you're curious about how and why it is allowed:
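Separately, the equivalence, and where the dotted shortcut stops working, can be checked directly. A minimal sketch on a hypothetical DataFrame: the column names mean and pack name are invented to show two cases where attribute access cannot reach a column (a name that collides with an existing DataFrame method, and a name that is not a valid Python identifier). The last lines show why lifespans.lifespans fails: there is no column with that name.

```python
import pandas as pd

# Hypothetical toy DataFrame; "mean" deliberately shadows a DataFrame method name
df = pd.DataFrame({
    "lifespan": [76.9, 79.2, 74.3],
    "mean": [1.0, 2.0, 3.0],
    "pack name": ["vein", "artery", "vein"],
})

# For an ordinary column name, both spellings return the same Series
print(df.lifespan.equals(df["lifespan"]))  # True

# Name clash: the attribute is the DataFrame.mean method, not the column
print(callable(df.mean))    # True
print(df["mean"].tolist())  # [1.0, 2.0, 3.0]

# Not a valid Python identifier: only bracket access works
print(df["pack name"].tolist())  # ['vein', 'artery', 'vein']

# A name that is not a column (e.g. an extra "s") raises AttributeError
try:
    df.lifespans
except AttributeError as err:
    print("AttributeError:", err)
```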


Frankly, I had not realized they were equivalent. Thank you for the useful post! :slight_smile:
