Central Tendency Project - What does the data being accessed look like?

Hi - thanks for checking out my post regarding a project called Central Tendency in the Master Statistics with Python Skill Path.

I completed the tasks of the project; however, I am curious about the prewritten code towards the end. Specifically, in the try/except blocks for MODE, why does the value I created for manhattan_mode appear to be a list within a list? The value is being accessed as manhattan_mode[0][0] and manhattan_mode[1][0].

When I print manhattan_mode I see that it is an array. However, I cannot use head() to inspect it; I receive an error: AttributeError: 'ModeResult' object has no attribute 'head'.

I believe it is something with the stats module that I’m not fully understanding yet.
Is there another function I can use to see what manhattan_mode (and the other mode variables) contain?

Thank you kindly

# Import packages
import numpy as np
import pandas as pd
from scipy import stats

# Read in housing data
brooklyn_one_bed = pd.read_csv('brooklyn-one-bed.csv')
brooklyn_price = brooklyn_one_bed['rent']

manhattan_one_bed = pd.read_csv('manhattan-one-bed.csv')
manhattan_price = manhattan_one_bed['rent']

queens_one_bed = pd.read_csv('queens-one-bed.csv')
queens_price = queens_one_bed['rent']

# Add mean calculations below
brooklyn_mean = np.average(brooklyn_price)
manhattan_mean = np.average(manhattan_price)
queens_mean = np.average(queens_price)

# Add median calculations below
brooklyn_median = np.median(brooklyn_price)
manhattan_median = np.median(manhattan_price)
queens_median = np.median(queens_price)

# Add mode calculations below
brooklyn_mode = stats.mode(brooklyn_price)
manhattan_mode = stats.mode(manhattan_price)
queens_mode = stats.mode(queens_price)

##############################################
##############################################
##############################################

# Mean
try:
    print("The mean price in Brooklyn is " + str(round(brooklyn_mean, 2)))
except NameError:
    print("The mean price in Brooklyn is not yet defined.")
try:
    print("The mean price in Manhattan is " + str(round(manhattan_mean, 2)))
except NameError:
    print("The mean in Manhattan is not yet defined.")
try:
    print("The mean price in Queens is " + str(round(queens_mean, 2)))
except NameError:
    print("The mean price in Queens is not yet defined.")

# Median
try:
    print("The median price in Brooklyn is " + str(brooklyn_median))
except NameError:
    print("The median price in Brooklyn is not yet defined.")
try:
    print("The median price in Manhattan is " + str(manhattan_median))
except NameError:
    print("The median price in Manhattan is not yet defined.")
try:
    print("The median price in Queens is " + str(queens_median))
except NameError:
    print("The median price in Queens is not yet defined.")

# Mode
try:
    print("The mode price in Brooklyn is " + str(brooklyn_mode[0][0])
          + " and it appears " + str(brooklyn_mode[1][0])
          + " times out of " + str(len(brooklyn_price)))
except NameError:
    print("The mode price in Brooklyn is not yet defined.")
try:
    print("The mode price in Manhattan is " + str(manhattan_mode[0][0])
          + " and it appears " + str(manhattan_mode[1][0])
          + " times out of " + str(len(manhattan_price)))
except NameError:
    print("The mode price in Manhattan is not yet defined.")
try:
    print("The mode price in Queens is " + str(queens_mode[0][0])
          + " and it appears " + str(queens_mode[1][0])
          + " times out of " + str(len(queens_price)))
except NameError:
    print("The mode price in Queens is not yet defined.")

Link to the lesson, please? CodeBytes doesn't work for this because you cannot import Python libraries there; it throws errors.

The mode is the most frequent value in an array, i.e. it's one value. What stats.mode returns is not a data frame or a Series, so you cannot use the Pandas method .head() on it. You're better off printing manhattan_price, as it's an array, or column of data.


for ref, always check the docs:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html

Thanks for your response.

Here is the link. Please let me know if it works, as this is my first time posting: Central Tendency | Codecademy

Also, when I print manhattan_mode, it doesn’t appear to be a single value because this is printed: ModeResult(mode=array([3500]), count=array([56])). I guess I’m not sure exactly what all that means?
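For what it's worth, one way to look inside that printout is to read its two fields by name. This is only a sketch under a couple of assumptions: `prices` is a made-up stand-in for the rent column, and `np.atleast_1d` is there because some SciPy versions return one-element arrays (as in the printout above) while newer ones may return plain scalars.

```python
import numpy as np
from scipy import stats

# Made-up stand-in for the rent column (not the lesson's real data)
prices = np.array([3500, 4500, 3500, 2950, 3500, 4500])

result = stats.mode(prices)

# The result is a small tuple-like object with two named fields
print(type(result).__name__)
print(result.mode)    # the most frequent value
print(result.count)   # how many times that value appears

# np.atleast_1d lets the same lookup work for both array and scalar fields
mode_value = int(np.atleast_1d(result.mode)[0])
mode_count = int(np.atleast_1d(result.count)[0])
print(mode_value, mode_count)
```

So `mode=array([3500])` would be the most frequent rent and `count=array([56])` the number of rows holding that rent.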

In the beginning of the lesson it states:
" In this project, we only care about the price of apartments, so we saved the price of apartments in each borough to:

  • brooklyn_price
  • manhattan_price
  • queens_price

If you want to see what these arrays look like, you can use print statements to see them in the output terminal."

Which means that the arrays are just that single rent column from the data frame. You’re calculating mean, median, mode on a single column of data only.

If you do a print(manhattan_price) you'll get this:

0       4500
1       4795
2       4650
3       2950
4       4875
       #not all 1476 rows are printed. 
1471    3420
1472    2095
1473    4210
1474    3475
1475    4500
Name: rent, Length: 1476, dtype: int64 #column name, length/number of rows, data type of the col.

For manhattan_mode = stats.mode(manhattan_price)

this is printed by the prewritten code: "The mode price in Manhattan is 3500 and it appears 56 times out of 1476"

The mode, $3500, is the most frequent value in that array; it appears 56 times. 1476 is the number of rows.
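If you want to double-check a mode and its count by hand, a quick sketch with the standard library's Counter could look like this. The `prices` list is made up for illustration; the real column has 1476 rows.

```python
from collections import Counter

# Made-up prices standing in for the rent column
prices = [3500, 4500, 3500, 2950, 3500, 4500]

# most_common(1) returns a list holding one (value, count) pair
value, count = Counter(prices).most_common(1)[0]

print("The mode price is " + str(value) + " and it appears "
      + str(count) + " times out of " + str(len(prices)))
```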

The mode is a single value. The other column, 0-1475, holds the index numbers for each row in the rent column (think of it like a single column in a spreadsheet or table). 3500 is the most frequent price, and 56 is the number of times it appears in the array.

More on indexing:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html

arrays:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.array.html

and on scipy.stats.mode:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mode.html

Thank you for the information and resources.

Why does print(manhattan_mode) return this: ModeResult(mode=array([3500]), count=array([56])) instead of its single value?

I also still don’t understand why a single value is being accessed like this: manhattan_mode[0][0] or manhattan_mode[1][0]

Also, the scipy.stats.mode link says that stats.mode, “Return an array of the modal (most common) value in the passed array.”, not a single value. Is there a way to view this array it returned?
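To make the double indexing concrete, here is a rough sketch that mimics the shape of that printout with a plain namedtuple. The `ModeResult` defined here is a hypothetical stand-in built for illustration, not SciPy's own class, and the lists stand in for the one-element arrays.

```python
from collections import namedtuple

# Hypothetical stand-in that copies the shape of the printed result
ModeResult = namedtuple("ModeResult", ["mode", "count"])
manhattan_mode = ModeResult(mode=[3500], count=[56])

# The first index picks a field: 0 is mode, 1 is count
mode_field = manhattan_mode[0]     # same as manhattan_mode.mode
count_field = manhattan_mode[1]    # same as manhattan_mode.count

# The second index picks the single element inside that field
mode_value = manhattan_mode[0][0]
mode_count = manhattan_mode[1][0]

print(mode_field, count_field)
print(mode_value, mode_count)
```

Read that way, `manhattan_mode[0][0]` is "first field, first element" and `manhattan_mode[1][0]` is "second field, first element".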

Sidebar: How does Pandas deal with multiple modes?

“most common value” = a single value.

$3500 is a single value. It is the one that is represented most frequently in the rental column for Manhattan. 3500 appears 56x in that column of data. You could pull out each row where 3500 is listed, and that would be, [3500, 3500, 3500, etc]

Also, when I try print(manhattan_mode), there isn't a result in my learning environment.

You're not really supposed to be concerned with the try and except statements below your code. But the result is zero-indexed, a bit like columns in an Excel spreadsheet: the mode is at index 0 and the count is at index 1.

It might be a good idea to re-review mean, median, and mode either here on CC or on a basic stats site.

The lesson is on central tendency, mean, median, mode & the distribution of this data. This particular column has one mode, or is unimodal. Let’s not confuse things. :slight_smile:

But, sure, a data set can have more than one value that is most frequently represented. It can have no mode, or be unimodal (one mode), bimodal (two modes), trimodal (three modes), or multimodal (four or more modes).
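As a small illustration of those categories, here is a sketch that uses the standard library's statistics.multimode. The helper name `describe_modes` is made up, and treating "every value appears only once" as "no mode" is an assumption; some texts define it differently.

```python
from statistics import multimode

def describe_modes(values):
    """Return (modes, label) for a list of values. Hypothetical helper."""
    modes = multimode(values)  # every value tied for most frequent
    # Assumption: if nothing repeats, call it "no mode"
    if not modes or (len(values) > 1 and values.count(modes[0]) == 1):
        return [], "no mode"
    labels = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return modes, labels.get(len(modes), "multimodal")

print(describe_modes([3500, 4500, 3500]))
print(describe_modes([1, 1, 2, 2, 3]))
print(describe_modes([1, 2, 3]))
```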


Valid question, as realistic data tends to be more complex. When there are multiple modes, stats.mode from SciPy returns only the smallest of them, whereas pandas' Series.mode() returns all of them.

There appear to be solutions for getting all the modes returned per group, from StackOverflow:

agg_mode = purchases_df.groupby(['date', 'user_id'])['purchase'].agg(lambda x: x.mode().tolist())
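To try that one-liner out, here is a small sketch. The `rents` Series and the contents of `purchases_df` are made up for illustration; only the groupby line comes from the StackOverflow snippet.

```python
import pandas as pd

# Made-up rents with two values tied for most frequent
rents = pd.Series([3500, 4500, 3500, 4500, 2950])
all_modes = rents.mode().tolist()  # Series.mode() keeps every tied value
print(all_modes)

# Made-up purchases table for the groupby version
purchases_df = pd.DataFrame({
    "date": ["2024-01-01"] * 5,
    "user_id": [1, 1, 1, 2, 2],
    "purchase": ["tea", "coffee", "tea", "milk", "bread"],
})

# One list of modes per (date, user_id) group
agg_mode = purchases_df.groupby(["date", "user_id"])["purchase"].agg(
    lambda x: x.mode().tolist()
)
print(agg_mode)
```

User 1 bought tea twice, so that group has one mode; user 2 bought two different things once each, so both come back.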

I understand that mode is supposed to be a single value, I understand the concepts of mean, median, and mode.

What I have been asking is why Python returns an array after using stats.mode, as is evident from the need to access manhattan_mode not just as a variable but as a list within a list (manhattan_mode[0][0]).

The more code I understand, the better programmer I will be, no?

So, it looks like the answer is that stats.mode returns the single mode value inside a one-element array. Thanks!

The result returned by stats.mode is a ModeResult object.

ages = [37, 12, 38, 34, 37, 26, 37, 21, 21, 37, 19, 67, 37, 41, 17, 37, 44, 37]
stats.mode(ages, axis=None, keepdims=False)

>>ModeResult(mode=37, count=7) 

It's not. It's an array (that's zero-indexed) with indices.

I need some help. Could anyone paste the whole code so that I can compare it with mine? I went through the whole process, but the Next option is still inactive.

Show me yours, I’ll show you mine. Please be sure it is formatted so we can make sense of it.

While you’re figuring that out, here is a little segue to tease your brain. It may help.

[image: a2plus2abplusb2]
credit: ‘TabletClass Math’ YouTube channel