Write a function that filters the dataset for questions that contain all of the words in a list of words. For example, when the list ["King", "England"] was passed to our function, the function returned a DataFrame of 152 rows. Every row had the strings "King" and "England" somewhere in its " Question".

Note that in this example, we found 152 rows by filtering the entire dataset. You can download the entire dataset at the start or end of this project. The dataset used on Codecademy is only a fraction of the full dataset, so you won't find as many rows.

Test your function by printing out the column containing the question of each row of the dataset.

Here I am using
df = pd.read_csv(r"C:\Users\rakumishra\Desktop\jeopardy.csv")
print(df["Question"]) to retrieve the column "Question". I don't understand why I'm getting an error when I'm using the correct syntax. Could you please help?

Hello mate, I am stuck on this question. If you found an answer, please share it with me.

Please post the question; I'm unable to view it.

There is a space before the Q, so use print(df[" Question"]) instead of print(df["Question"]). Most of the other columns also have a space before their first letter, so it's better to print out the column names, copy them straight from the printed output, and rename each one with .replace().
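To see the problem directly, printing the column names as a list shows the leading spaces. A minimal sketch, assuming the raw header has a space after each comma (a tiny made-up CSV stands in for jeopardy.csv here):

```python
import io
import pandas as pd

# Made-up stand-in for jeopardy.csv (assumption: a space follows each comma
# in the header row, which is what produces the name " Question").
csv_text = "Show Number, Air Date, Question\n4680,2004-12-31,Some clue text\n"
df = pd.read_csv(io.StringIO(csv_text))

# Printing the list of names makes the leading spaces visible.
print(list(df.columns))  # ['Show Number', ' Air Date', ' Question']

# df["Question"] would raise a KeyError; the label must include the space.
print(df[" Question"])
```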

I've also been stuck on question 3 for a while now. The code below keeps returning an empty DataFrame. Can anyone help me through it, please?

ret_rows = df[df.question.isin(['King', 'England'])]


Make sure you've inspected the column names carefully. I was struggling to rename them until I realized that, as others said, something really was off about the column names. Make sure they're stripped.

I've been searching for a good strip method and haven't been able to find one. Do you have a suggested way to strip all the column names?

Did you ever get past this question? I'm stuck at the same point and tried the same approach. The hint suggests using all(), but I don't understand how the all() syntax applies to this requirement. Any guidance would be greatly appreciated. Thank you.
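On the all() syntax: all() takes an iterable and returns True only if every item in it is truthy, so pairing it with a generator expression gives one "is this word in the text?" check per word. A small sketch on plain strings; contains_all is a hypothetical helper name, not part of the exercise:

```python
# Hypothetical helper: True only if every word occurs somewhere in text.
def contains_all(text, words):
    # The generator yields one membership test per word; all() stops
    # at the first False.
    return all(word in text for word in words)

words = ["King", "England"]
print(contains_all("This King of England built a castle", words))  # True
print(contains_all("This King of France built a castle", words))   # False
```

The same expression can be applied to each question in the column, which is what the hint is pointing at.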

I think it is empty because the questions in jeopardy[' Question'] contain more than just "King" or "England". isin() filters for exact matches of the whole cell, so every question with any additional text is filtered out.
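A quick way to see the difference between exact matching and substring matching, using a made-up Series of questions:

```python
import pandas as pd

questions = pd.Series([
    "King",
    "This King of England built a castle",
    "A king of France",
])

# isin() compares each whole cell to the list items: only the cell that
# is exactly "King" matches.
print(questions.isin(["King", "England"]).tolist())  # [True, False, False]

# str.contains() looks for the substring inside each cell instead
# (case-sensitive by default, so "king" does not match "King").
print(questions.str.contains("King", regex=False).tolist())  # [True, True, False]
```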

If you try the following code, it doesn't come back empty:

King_England = jeopardy[(jeopardy[' Question'].str.contains("King")) & (jeopardy[' Question'].str.contains("England"))]

BUT, it doesn't match the number in the text. It says there should be 152 results, but with the downloaded dataset I only get 49. You can check this using the following code:
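One way to run that check is to count the rows with len() or .shape. A minimal sketch, with a small made-up DataFrame standing in for the real jeopardy data:

```python
import pandas as pd

# Made-up stand-in for the real dataset.
jeopardy = pd.DataFrame({" Question": [
    "This King of England built a castle",
    "A King of France",
    "England's King built this palace",
]})
King_England = jeopardy[
    (jeopardy[" Question"].str.contains("King"))
    & (jeopardy[" Question"].str.contains("England"))
]

# Either line reports how many rows survived the filter.
print(len(King_England))      # 2
print(King_England.shape[0])  # 2
```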


If I inspect the results, they are correct though:

print(King_England[' Question'].head(20))

So while I’m confident in that code, I see that the instructions say to create a function.

I'm not able to turn that into a function that returns the proper DataFrame.

If anyone has an idea how to use a function to filter a DataFrame, please share it.
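A sketch of how the str.contains approach could be wrapped in a function that accepts any list of words. The names filter_by_words, column and case are hypothetical choices for illustration, and the default column name assumes the un-stripped header " Question":

```python
import pandas as pd

def filter_by_words(df, words, column=" Question", case=True):
    """Return the rows of df whose `column` contains every word in `words`."""
    # Start with an all-True mask and AND in one substring test per word.
    mask = pd.Series(True, index=df.index)
    for word in words:
        # regex=False treats the word literally; na=False treats missing
        # questions as non-matches.
        mask = mask & df[column].str.contains(word, case=case, regex=False, na=False)
    return df[mask]

# Made-up sample data.
jeopardy = pd.DataFrame({" Question": [
    "This King of England built a castle",
    "A King of France",
    "England's king built this palace",
]})
print(len(filter_by_words(jeopardy, ["King", "England"])))              # 1
print(len(filter_by_words(jeopardy, ["King", "England"], case=False)))  # 2
```

Building one boolean mask keeps the filtering vectorized; the case flag shows how much the capital letters in the search words can change the count.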

I think the problem is the capital letters in 'King' and 'England'.
If you convert both the question text and the search words to lower case, you will get the 152 rows.

word_list = ['king', 'england']

question_key_words = jeopardy[jeopardy.question.apply(lambda x: all(word in x.lower() for word in word_list))]

Note: I've changed the column name from " Question" to "question".
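A self-contained run of the lowercase approach above, assuming the column has already been renamed to "question" (the rows are made up). One thing worth knowing: `word in x.lower()` is a plain substring test, so "king" also matches inside "kingdom":

```python
import pandas as pd

# Made-up rows; assumes the column was already renamed to "question".
jeopardy = pd.DataFrame({"question": [
    "This King of England built a castle",
    "The United Kingdom includes England and Wales",
    "A King of France",
]})
word_list = ["king", "england"]

# Lowercase each question so "King" and "king" are treated the same.
question_key_words = jeopardy[
    jeopardy.question.apply(lambda x: all(word in x.lower() for word in word_list))
]
# The "Kingdom" row also matches, because the test is substring-based.
print(len(question_key_words))  # 2
```

Matching whole words only would need a regular expression with word boundaries instead of the plain `in` test.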


Yes, I used the following code, and it strips all the column names. I suggest you try it out.

df = pd.read_csv("jeopardy.csv")
df.columns = df.columns.str.strip()
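To confirm the strip works, here it is on a small made-up CSV (assumption: the header has a space after each comma, like the real file):

```python
import io
import pandas as pd

csv_text = "Show Number, Air Date, Question\n4680,2004-12-31,Some clue text\n"
df = pd.read_csv(io.StringIO(csv_text))

# Index.str.strip() removes leading and trailing whitespace from every label.
df.columns = df.columns.str.strip()
print(list(df.columns))        # ['Show Number', 'Air Date', 'Question']
print(df["Question"].iloc[0])  # Some clue text
```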


There are just 7 columns, so you can assign a new list to df.columns to rename all of them before you continue writing your function.
For example: df.columns = ['show_number', 'air_date', 'round', 'category', 'value', 'question', 'answer']
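A sketch of that renaming on an empty DataFrame with the original (space-prefixed) labels, which are assumed here. Assigning a list replaces every label at once, in order, and the list must have exactly as many names as there are columns:

```python
import pandas as pd

# Empty DataFrame with the assumed original labels.
df = pd.DataFrame(columns=["Show Number", " Air Date", " Round", " Category",
                           " Value", " Question", " Answer"])

# New names are matched to columns by position.
df.columns = ["show_number", "air_date", "round", "category",
              "value", "question", "answer"]
print(list(df.columns))

# A list of the wrong length is rejected with a ValueError.
try:
    df.columns = ["show_number", "air_date"]
except ValueError as err:
    print("ValueError:", err)
```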