FAQ: Mode - Mode SciPy

This community-built FAQ covers the “Mode SciPy” exercise from the lesson “Mode”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Learn Statistics With Python

FAQs on the exercise Mode SciPy

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by replying below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.



Could someone please explain why, in the previous exercise about the median, we used np.median(), and now all of a sudden we have to use stats.mode() from scipy? Very confusing. Thanks!


Noticed that you have a comment that is probably referencing the “median” lesson rather than the “mode” lesson (part 3 of 4)…

Use numpy to calculate the median age of the top 100 authors

where it should probably be

Use scipy to calculate the mode age of the top 100 authors

Feel free to delete this comment when you get it fixed :wink:


I was curious about this myself, so I went to look at the packages at https://docs.scipy.org/doc/ . Then I dug into SciPy’s FAQ and found this: https://www.scipy.org/scipylib/faq.html#what-is-the-difference-between-numpy-and-scipy . Hopefully this helps!


If a dataset has multiple modes, how do we find all of them and not just the smallest one?


I am trying to understand the last bits of the print statement:

print("The mode age and its frequency of authors from "
      "Le Monde's 100 greatest books is: " + str(mode_age[0][0]) +
      " and " + str(mode_age[1][0]))

where we add in str(mode_age[0][0]) and str(mode_age[1][0]). What's the reasoning behind the indexes?


Apologies for the delay. Hopefully this will help someone else even if you have already solved your problem.

Unfortunately there’s no simple route to returning multiple modal values. As per the docs for scipy.stats.mode, only the smallest mode is returned.

Something along the lines of collecting and counting the unique values and then keeping only the values with the largest count would be a good place to start, e.g. with NumPy's unique function:
unique, counts = numpy.unique(a, return_counts=True)
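
To make that suggestion concrete, here is a minimal sketch that keeps every value tied for the highest count. The helper name all_modes and the sample data are made up for illustration; only numpy.unique with return_counts=True comes from the post above.

```python
import numpy as np

def all_modes(values):
    """Return (modes, count): every value tied for the highest count, and that count."""
    # np.unique returns the sorted unique values and how often each one appears
    unique, counts = np.unique(values, return_counts=True)
    max_count = counts.max()
    # Boolean mask keeps only the values whose count equals the maximum
    return unique[counts == max_count], max_count

# Hypothetical sample data with two modes (1 and 3)
modes, count = all_modes([0, 1, 1, 2, 3, 3])
print(modes, count)
```

Because np.unique sorts its output, the modes come back in ascending order.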


scipy.stats.mode returns a namedtuple containing both the modal values and their counts (which are themselves numpy arrays). A namedtuple is like a regular tuple: it can be accessed by index, e.g. [0], but it can also be accessed by a specific attribute name. An example might be simpler:

import numpy as np
from scipy import stats

values = np.array((0, 1, 1, 2, 3, 3))
output_vals = stats.mode(values)
print(output_vals)
Out: ModeResult(mode=array([1]), count=array([2]))  # namedtuple values

# namedtuples can be addressed by a pre-defined attribute (in this case 'mode' and 'count')
# ... or by index like a normal tuple
output_vals[0] == output_vals.mode  # array([True])
# note that the output is a numpy array due to the way numpy comparisons work
output_vals[1] == output_vals.count  # array([True])
# these are simply different ways of referencing the same element

As for why [0][0] and [1][0] are used: the first index picks the mode or the count out of the namedtuple, and the second is needed because scipy wraps each of those results in a numpy array.

# following on from above...
print(type(output_vals[0]))
<class 'numpy.ndarray'>
print(output_vals[0])
Out: [1]  # We have the mode but it is still inside a numpy array
print(output_vals[0][0])  # Index the mode array AND index the array itself
Out: 1  # This provides us with the actual value.
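
One caveat, offered as a hedged note rather than a definitive statement: as far as I know, newer SciPy releases (1.11 onwards) return plain scalars instead of length-1 arrays for 1-D input by default, in which case the second [0] raises an error. The sketch below is one way to write the lookup so that it works with either behaviour; the use of np.atleast_1d here is my own workaround, not something from the exercise.

```python
import numpy as np
from scipy import stats

values = np.array([0, 1, 1, 2, 3, 3])
result = stats.mode(values)

# np.atleast_1d turns a scalar into a length-1 array and leaves arrays alone,
# so indexing with [0] is safe whichever form this SciPy version returns.
mode_value = np.atleast_1d(result.mode)[0]
mode_count = np.atleast_1d(result.count)[0]
print(mode_value, mode_count)
```

Passing keepdims=True to stats.mode is another option on SciPy versions that support that parameter.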

They want you to use SciPy for this, so I did.
Then I worried about what happens if you have more than one mode. They note that in that case this function only shows the smallest one, but it gives no warning that there might be more than one mode in your data.

So, this:

# Import packages
import numpy as np
import pandas as pd
from scipy import stats

# Read in author data
greatest_books = pd.read_csv("top-hundred-books.csv")

# Save author ages to author_ages
author_ages = greatest_books['Ages']

# Use scipy to calculate the mode age of the top 100 authors
mode_age = stats.mode(author_ages)
print(mode_age)

print("The mode age and its frequency of authors from Le Monde's 100 greatest books is: " + str(mode_age[0][0]) + " and " + str(mode_age[1][0]))

# Home-grown version: it will show multiple keys if they share the same max value.
# You can test it by adding in a second mode with the same count as max_value. I used mode_dic[43] = 7.
# It collects the keys having the same max value in a list.

def find_the_modes():
    # Count how many times each age appears
    mode_dic = {}
    for age in author_ages:
        if age in mode_dic:
            mode_dic[age] += 1
        else:
            mode_dic[age] = 1
    print(mode_dic)

    max_value = max(mode_dic.values())
    print(max_value)

    # Test only: inject a second mode with the same count as max_value
    mode_dic[43] = 7

    print("The mode age of authors is " + str([k for k, v in mode_dic.items() if v == max_value]))
    print(" and the frequency is " + str(max_value))

find_the_modes()
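
For anyone who wants the same idea as a function that returns its result instead of printing it, here is a small sketch using collections.Counter from the standard library. The function name find_modes and the sample ages are made up for illustration and are not from the exercise's CSV.

```python
from collections import Counter

def find_modes(values):
    """Return (modes, frequency): all values tied for the highest count, sorted."""
    counts = Counter(values)            # maps each value to how often it appears
    max_count = max(counts.values())
    modes = sorted(v for v, n in counts.items() if n == max_count)
    return modes, max_count

# Made-up ages with two modes, to show that ties are kept
sample_ages = [43, 51, 43, 38, 51, 60]
modes, frequency = find_modes(sample_ages)
print("The mode age(s):", modes, "with frequency", frequency)
```

Returning the values makes it easier to reuse the result, for example to compare it with what stats.mode reports.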

It seems like the pandas value_counts method is a good alternative for anybody who wants to be able to deal with multiple results.
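
A minimal sketch of that pandas approach, assuming the ages are in a Series. The sample values are made up; value_counts and Series.mode are standard pandas methods, and the variable names are only for illustration.

```python
import pandas as pd

# Made-up ages with two modes (43 and 51)
ages = pd.Series([43, 51, 43, 38, 51, 60])

# value_counts maps each distinct age to the number of times it appears
counts = ages.value_counts()

# Keep every age whose count equals the highest count
top = counts[counts == counts.max()]
modes_from_counts = sorted(top.index.tolist())

# Series.mode returns all modes directly, in ascending order
modes_builtin = ages.mode().tolist()

print(modes_from_counts, modes_builtin, int(counts.max()))
```

Both routes give every tied value, whereas scipy's stats.mode reports only the smallest one.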

Hi,

Does anybody know where I can find the example file “top-hundred-books.csv”?

I would like to practice with this data in my own IDE…