Hi everyone! I wanted to ask something about the fifth task of this project.

It’s about calculating the mean of the question values. The final-round questions don’t have a dollar value, so the solution code recommends setting all of them (3654 questions) to zero:

jeopardy_data["Float Value"] = jeopardy_data["Value"].apply(
    lambda x: float(x[1:].replace(",", "")) if x != "None" else 0
)

This results in an average value of 740 (rounded). Wouldn’t it be more sound to ignore those questions altogether? Is it really fair to let them lower the mean of all the other questions?

I tried to do so by setting all ‘None’ strings to actual NaN values:

df = pd.read_csv("jeopardy.csv", na_values="None")

and then converted the value column to float:

df.value = df.value.str[1:].str.replace(",", "").astype(float)

Since mean() skips NaN values by default, this gives me a result of 753 (rounded).
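
To make the difference concrete, here is a minimal sketch on a made-up four-row Series (these are not real rows from jeopardy.csv, and the names values, zero_filled and nan_kept are just for illustration). It only shows how the two treatments change the denominator of the mean:

```python
import pandas as pd

# Made-up stand-in for the "Value" column (not real rows from jeopardy.csv).
values = pd.Series(["$200", "$1,000", "None", "$600"])

# Approach 1: map "None" to 0, so the row still counts in the denominator.
zero_filled = values.apply(
    lambda x: float(x[1:].replace(",", "")) if x != "None" else 0.0
)

# Approach 2: map "None" to NaN, then strip "$" and "," and convert to float.
nan_kept = (
    values.where(values != "None")
    .str[1:]
    .str.replace(",", "", regex=False)
    .astype(float)
)

print(zero_filled.mean())  # sum of 1800 divided by all 4 rows
print(nan_kept.mean())     # sum of 1800 divided by 3 rows; NaN is skipped (skipna=True by default)
```

Both versions have the same sum, so the zero-filled mean is simply the NaN-skipping mean scaled by (rows with a value) / (all rows). Which one is "right" depends on whether a question without a dollar value should count as worth $0 or as not having a value at all.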

Well, what do you think?

