FAQs on the exercise Calculating Aggregate Functions III

I think if there were a method or attribute for calculating a percentile on the Series object, it could be applied directly like the other statistics such as max, min, etc. I can only conclude that percentile is a function in the numpy module, which has to be called with the Series object x passed to it. That is why the lambda function is used: to pass the wage object to it.
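For what it's worth, pandas does ship something close to this: `Series.quantile`, which also works on a groupby. Below is a minimal sketch with made-up data (the `category` and `wage` values are placeholders, not the exercise's dataset) comparing it with the lambda approach.

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "category": ["a", "a", "a", "b", "b", "b"],
    "wage": [10, 20, 30, 40, 60, 80],
})

# The approach from the exercise: hand each group's wages to np.percentile.
via_numpy = df.groupby("category").wage.apply(lambda x: np.percentile(x, 75))

# The built-in method: quantile takes a fraction (0.75), not a percent (75).
via_pandas = df.groupby("category").wage.quantile(0.75)

print(via_numpy)
print(via_pandas)
```

Both calls should give the same numbers here, since both default to linear interpolation.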

Results as follows:
   shoe_color  percentile(x, 25)  Excel
0  black       130                130
1  brown       248                248
2  navy        200                200
3  red         157                149
4  white       188                181
The results match for 3 colors, but are different for two colors. Furthermore, I checked the CSV data and there is no data point (red, 157) or (white, 188)… Any insights?

I tried to calculate in Excel the percentile(x,25) only for red-coloured shoes in order to check what is going on.

I noticed that Excel has two functions for calculating percentiles, “PERCENTILE.INC” and “PERCENTILE.EXC”. The difference is that the first treats the percentile range as 0 to 1 inclusive, so the min. and max. datapoints count as the 0th and 100th percentiles, while the second excludes those endpoints.
Using the first function (INC), the result is the same as in Codecademy’s solution, i.e. 157. Using the second (EXC), it is 149. It seems that you used the PERCENTILE.EXC function. It is also apparent that np.percentile, as called here, uses the inclusive method.

As far as your second comment that the result is 157 whereas there isn’t any datapoint (red shoe) in the dataframe priced at 157:
Recall that we calculate percentiles, quartiles, etc. in order to find the values that divide our dataset into the groups we want. Then we check which of our data falls into each group. Those dividing values may or may not coincide with actual datapoints in our dataframe.

In the red-shoe dataframe example, percentile(x, 25) actually lies halfway between ids 11 and 12, which correspond to prices 149 and 165. Thus the percentile, in terms of price, is the mean of those two values (157).
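To make the INC/EXC difference concrete, here is a rough sketch of the two formulas in plain Python. The helper names and the price list are made up for illustration (this is not the exercise's red-shoe data); the prices were picked so that the same 157 vs. 149 split shows up.

```python
import math

def percentile_inclusive(values, pct):
    """Interpolate at 0-based position pct * (n - 1).
    Matches Excel's PERCENTILE.INC and np.percentile's default."""
    data = sorted(values)
    pos = (pct / 100) * (len(data) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])

def percentile_exclusive(values, pct):
    """Interpolate at 1-based rank pct * (n + 1).
    Matches Excel's PERCENTILE.EXC."""
    data = sorted(values)
    rank = (pct / 100) * (len(data) + 1)
    if rank < 1 or rank > len(data):
        raise ValueError("percentile out of range for this sample size")
    lo = math.floor(rank)
    hi = min(lo + 1, len(data))
    return data[lo - 1] + (rank - lo) * (data[hi - 1] - data[lo - 1])

# Made-up prices: the inclusive position falls halfway between 149 and 165.
prices = [120, 149, 165, 180, 200, 220, 250]

inc = percentile_inclusive(prices, 25)  # 149 + 0.5 * (165 - 149)
exc = percentile_exclusive(prices, 25)  # rank 2 lands exactly on 149

print(inc, exc)
```

With these seven prices the inclusive version returns 157.0 and the exclusive one 149.0, so neither result has to be an actual datapoint, and the two methods can disagree on the same data.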

Okay, the lesson here has a few misleading instructions.

Prior to this, the student is not introduced to the numpy library in any formal way, nor is it explained how and why we import it. This needs to be explained, especially since we are using it in this exercise. At the very least, we need some sort of lesson on Python libraries and how and why we use them.

We are given the following code as instructive in the lesson write-up:

Technically, this syntax wouldn’t work, if a student tried to emulate it. For example, in the exercise, when you write the following, it will generate an error:

Note the backslashes in the above. Again, this sort of thing needs to be explained in the lesson write-up, since previous lesson solutions have this in the code without any explanation of what it does.

hey!
when i type in the command: cheap_shoes = orders.groupby('shoe_color').price.apply(lambda x: np.percentile(x, 25)).reset_index()
the dot after price differs in color. it’s brown where all the others are white.

Does anyone understand why this is?
just wondering

Hi,
Well, if you keep typing and let the editor wrap the code onto the next line by itself, without pressing “enter”, it works. Otherwise, you have to insert backslashes at the line breaks. It worked for me.
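In case it helps, here is a small sketch of the two usual ways to split a long chain over several lines. The `orders` table is a made-up stand-in, not the exercise's CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the exercise's orders table.
orders = pd.DataFrame({
    "shoe_color": ["red", "red", "red", "black", "black", "black"],
    "price": [100, 200, 300, 10, 20, 30],
})

# Option 1: a backslash at the very end of a line continues the statement.
cheap_shoes = orders.groupby("shoe_color").price \
    .apply(lambda x: np.percentile(x, 25)) \
    .reset_index()

# Option 2: wrap the whole chain in parentheses; no backslashes needed.
cheap_shoes_parens = (
    orders.groupby("shoe_color").price
    .apply(lambda x: np.percentile(x, 25))
    .reset_index()
)

print(cheap_shoes)
```

Without either the backslashes or the parentheses, Python treats the first line as a complete statement, and the next line starting with a dot is a syntax error.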

Hey, I have a question about how the percentile operates with groupby.

In a previous section, groupby is explained as enabling us to loop through a subset of values. But np.percentile takes an array-like input, not a single value.

So when we refer to x in the lambda function we are referring to the array output by the groupby function?
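One way to check this yourself is to loop over the groups and look at what each one contains. A quick sketch, again with a made-up `orders` table rather than the exercise's data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the exercise's orders table.
orders = pd.DataFrame({
    "shoe_color": ["red", "black", "red", "black", "red", "black"],
    "price": [100, 10, 200, 20, 300, 30],
})

grouped_prices = orders.groupby("shoe_color").price

# Iterating yields one (group key, Series) pair per shoe color.
group_values = {}
for color, prices in grouped_prices:
    group_values[color] = prices.tolist()
    print(color, type(prices).__name__, prices.tolist())

# apply calls the lambda once per group; x is that group's Series of prices.
result = grouped_prices.apply(lambda x: np.percentile(x, 25))
print(result)
```

So, as far as I can tell, x is not a single value: it is the whole Series of prices for one color, which is array-like and therefore fine to pass to np.percentile.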