This is Jeopardy

I’m not sure why, but I’m struggling with question three on this.

"Write a function that filters the dataset for questions that contains all of the words in a list of words. For example, when the list ["King", "England"]was passed to our function, the function returned a DataFrame of 152 rows. Every row had the strings "King" and "England" somewhere in its " Question".

Note that in this example, we found 152 rows by filtering the entire dataset. You can download the entire dataset at the start or end of this project. The dataset used on Codecademy is only a fraction of the dataset so you won’t find as many rows.

Test your function by printing out the column containing the question of each row of the dataset."

I’ve looked at the hint and at the solution (which I don’t understand), and I don’t have any clue where to start. I feel like I’ve tried everything. I don’t even think the solution is doing the right thing.

I tried modifying the solution to use a regular expression such as “\Wking\W”, but as soon as I change anything in the code it stops working. I’ve tried .isin, I’ve tried str.contains, and I’ve tried looking at other people’s solutions. I’ve also tried taking all the strings from the DataFrame’s Question column and appending them to a list. I couldn’t be more lost on this one. Here is the solution code:

# Filtering a dataset by a list of words
def filter_data(data, words):
  # Lowercases all words in the list of words as well as the questions. Returns true if all of the words in the list appear in the question.
  filter = lambda x: all(word.lower() in x.lower() for word in words)
  # Applies the lambda function to the Question column and returns the rows where the function returned True
  return data.loc[data["Question"].apply(filter)]

# Testing the filter function
filtered = filter_data(jeopardy_data, ["King", "England"])
print(filtered["Question"])

Hi @x5960114061

You know what? I was struggling with this question too. I couldn’t understand lambda functions at all. For me, a for loop was easier to read than a compressed lambda function.
What I did was delete the lambda function and put back my lovely for loop, just to understand it:

def filter(x, words):
    # lowercase the question text (x is one value from data['Question'])
    x_lower = x.lower()
    for word in words:
        if word.lower() not in x_lower:
            return False
    return True

The line return data.loc[data["Question"].apply(filter)] takes each value in the DataFrame column named ‘Question’ and passes it to the function, which converts it to lowercase. The function also takes the words in the list named ‘words’ (‘King’ and ‘England’ in this case) and converts them to lowercase. Then, for every word in the list, it checks whether that word appears in the question. apply collects the resulting True/False values, and data.loc builds a DataFrame from only the rows that contain all of the words (in the lambda version, all() makes sure that every word in the list is in the row, so questions with just one of the words are not returned). Because this version of filter takes words as a second argument, the call has to pass it along, for example apply(filter, args=(words,)).
Note that x here is a single string: apply hands the function one question at a time, not the whole DataFrame. The for loop just spells out what the lambda function is doing.
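
To see the pieces fit together, here is a minimal sketch that wires the for-loop version into apply. The tiny DataFrame and the name contains_all_words are made up for illustration (the real project loads the Jeopardy CSV), and the intermediate mask variable is only there to make the two steps visible.

```python
import pandas as pd

# Made-up stand-in for the Jeopardy data; only the "Question" column matters here.
jeopardy_data = pd.DataFrame({
    "Question": [
        "This king of England signed the Magna Carta",
        "The Vikings raided England in 793",
        "He was king of France during the Revolution",
    ]
})

def contains_all_words(question, words):
    # question is ONE string from the column, not the whole DataFrame
    question_lower = question.lower()
    for word in words:
        if word.lower() not in question_lower:
            return False
    return True

def filter_data(data, words):
    # apply() calls contains_all_words(question, words) once per row
    # and collects the True/False results into a boolean Series
    mask = data["Question"].apply(contains_all_words, args=(words,))
    # .loc keeps only the rows where the mask is True
    return data.loc[mask]

filtered = filter_data(jeopardy_data, ["King", "England"])
print(filtered["Question"])
```

With this sample data the first two rows come back. The Vikings row is kept as well, because the check is a plain substring test and "king" is inside "vikings".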

Now you can see how practical it is to write a lambda function instead of a for loop.
I never imagined that lambda functions would become my best friends. :face_with_hand_over_mouth:

I’ll share a webpage where you can find more info on lambda functions here

Let me know if you still have any questions.

Happy Coding!


OK, this was actually very helpful, thank you. From what I understand, your function filter takes a question from the column data['Question'], makes it lowercase, and returns False if any of the words are not in x_lower and True otherwise.

Then data.loc[data["Question"].apply(filter)] actually goes through all the rows and finds which ones have the words.

My next question: sometimes this will return ‘Vikings’, which we don’t want. With a regular expression we could fix this using r'\Wkings\W', but I’m not sure how to add that. I guess I could build the pattern by concatenation with a variable regx = '\W' + word.lower() + '\W' and then do if r(regx) not in x_lower:. I’ll try this.
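
One possible way to do the whole-word check is sketched below. It is only a sketch under a few assumptions: the sample rows and the function names are invented, and it swaps \W for \b. The reason for the swap is that \W needs an actual non-word character on both sides, so it misses a word at the very start or end of a question, while \b also matches at the edges of the string. Also note that the in operator only does a plain substring test, so the pattern has to go through re.search instead.

```python
import re
import pandas as pd

# Made-up sample rows for illustration only.
jeopardy_data = pd.DataFrame({
    "Question": [
        "This king of England signed the Magna Carta",
        "The Vikings raided England in 793",
        "England's King, Henry VIII, had six wives",
    ]
})

def contains_all_whole_words(question, words):
    for word in words:
        # re.escape protects any special regex characters inside the word;
        # \b matches at a word boundary, including the start and end of the string
        pattern = r"\b" + re.escape(word) + r"\b"
        if re.search(pattern, question, flags=re.IGNORECASE) is None:
            return False
    return True

def filter_data_whole_words(data, words):
    # Same apply/.loc structure as before, only the per-question check changed
    mask = data["Question"].apply(contains_all_whole_words, args=(words,))
    return data.loc[mask]

filtered = filter_data_whole_words(jeopardy_data, ["King", "England"])
print(filtered["Question"])
```

Here the Vikings row is dropped and the other two rows are kept. One trade-off to be aware of: a whole-word match for "King" no longer matches "Kings", so plurals would need their own entry in the list.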