Could someone please explain why in the previous exercise about the median we used np.median() and now all of a sudden we have to use scipy's stats.mode()? Very confusing... thanks
Apologies for the delay. Hopefully this will help someone else even if you've already solved your problem.
Unfortunately there's no simple route to returning multiple modal values. As per the docs for scipy.stats.mode, only the smallest mode is returned.
Something along the lines of collecting and counting unique values and then taking only the values with the largest count(s) would be a good place to start, e.g. with NumPy's unique function: unique, counts = numpy.unique(a, return_counts=True)
scipy.stats.mode returns a namedtuple holding the modal value and its appearance count (each of which is itself wrapped in a numpy array). A namedtuple is like a regular tuple, so it can be accessed by index, e.g. [0], but its elements can also be accessed by attribute name. An example might be simpler:
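To make that idea concrete, here is a minimal sketch of the unique-and-count approach. The helper name all_modes is just something I made up for illustration, it isn't part of NumPy or SciPy:

```python
import numpy as np

def all_modes(a):
    """Return every value that shares the highest count (hypothetical helper)."""
    # np.unique returns the sorted unique values and, with return_counts=True,
    # how many times each one appears
    values, counts = np.unique(a, return_counts=True)
    # keep only the values whose count equals the largest count
    return values[counts == counts.max()]

data = np.array([0, 1, 1, 2, 3, 3])
modes = all_modes(data)
print(modes)  # [1 3]
```

If the data has a single mode you simply get a one-element array back, so the same code covers both cases.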
import numpy as np
from scipy import stats
values = np.array((0, 1, 1, 2, 3, 3))
output_vals = stats.mode(values)
print(output_vals)
# Out: ModeResult(mode=array([1]), count=array([2]))  <- namedtuple values
# (newer SciPy releases, 1.11 and later, return plain scalars here unless you pass keepdims=True)
# namedtuples can be addressed by a pre-defined attribute (in this case 'mode' and 'count')
# ... or by index like a normal tuple
output_vals[0] == output_vals.mode # array([True])
# note that the output is a numpy array due to the way numpy comparisons work
output_vals[1] == output_vals.count
# these are simply different methods of referencing the same element
As for why [0][0] and [0][1] are used: the scipy result wraps these values in numpy arrays, so you index the result first and then index the array inside it.
# following on from above...
print(type(output_vals[0]))
# Out: <class 'numpy.ndarray'>
print(output_vals[0])
# Out: [1]  (we have the mode but it is still inside a numpy array)
print(output_vals[0][0]) # index the result tuple AND then the array inside it
# Out: 1  (this provides us with the actual value)
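One caveat worth hedging against: whether the result is wrapped in an array depends on your SciPy version (releases from 1.11 on return plain scalars for 1-D input by default), and indexing a scalar with [0] raises an error. A rough sketch that should work either way, assuming 1-D input like the example above, is to pass the fields through np.atleast_1d before indexing:

```python
import numpy as np
from scipy import stats

values = np.array([0, 1, 1, 2, 3, 3])
result = stats.mode(values)

# np.atleast_1d leaves an array untouched and wraps a bare scalar in a
# one-element array, so the [0] index is safe in both cases
mode_value = np.atleast_1d(result.mode)[0]
mode_count = np.atleast_1d(result.count)[0]
print(mode_value, mode_count)  # 1 2
```

This is only a convenience for pulling out the single value; it still reports just the smallest mode.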
So they want you to use SciPy for this, so I did.
Then I worried about what happens if you have more than one mode (they note that if that happens, this function only returns the smallest one, but it gives no warning that there might be more than one mode in your data).
So, this:
# Import packages
import numpy as np
import pandas as pd
from scipy import stats