This is Jeopardy! project: using regex

This relates to part 4 of the project
https://www.codecademy.com/practice/projects/this-is-jeopardy

I’d like to use regex as part of the filter function so that rows that match on substring are excluded. Does anyone know how I’d go about doing this? Where would the regex fit in?

I don’t know the answer, but was wondering the same thing as I approach section 3 of this project. Did you ever figure it out?

Hi,

I guess it’s a little bit late, but this may help future programmers who want to exclude substring matches with regex.
I came up with this solution:

    def filter_question(df, words_list):
        base = r'^{}'
        expr = r'(?=.*\b{}\b)'  # Added \b to limit the search to the exact words of the list
        sequence = base.format(''.join(expr.format(w) for w in words_list))  # Regular expression
        return df.loc[df['Question'].str.contains(sequence, case=False, regex=True)]  # Case insensitive

    filtered = filter_question(jeopardy, ['England', 'king'])
    print(filtered['Question'])
    # Question 208295, which contains only "Viking", no longer appears
    # Question 216789, which contains only "kingdom", no longer appears

This seems to work very well because:

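  • It avoids substring matches (e.g. if you search for “king”, you won’t get results that only contain “viking” or “kingdom”) thanks to the \b word-boundary anchor in the regex
  • It keeps words followed by an apostrophe (e.g. searching for “England” still keeps results with “England’s”), because the apostrophe is not a word character and \b matches right before it
  • With case=False, the search is case-insensitive
  • It requires every word in the words list to be present, since each word gets its own lookahead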
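To check these points without loading the full dataset, here is a minimal sketch that builds the same kind of pattern and tries it on a few strings using only Python's re module. The helper name build_pattern and the sample questions are made up for illustration, and re.escape is an extra precaution that is not part of the solution above. Passing re.IGNORECASE is meant to mirror case=False.

```python
import re

def build_pattern(words_list):
    # Hypothetical helper: one lookahead per word, anchored at the start of the string.
    # re.escape is an addition that protects against words containing regex metacharacters.
    return '^' + ''.join(r'(?=.*\b{}\b)'.format(re.escape(w)) for w in words_list)

pattern = re.compile(build_pattern(['England', 'king']), flags=re.IGNORECASE)

# Made-up sample questions, not rows from the Jeopardy dataset
samples = [
    "This king of England signed the Magna Carta",  # both whole words present
    "A Viking raid on England",                     # "king" appears only inside "Viking"
    "England's king ruled a vast kingdom",          # apostrophe directly after "England"
    "The KING of france",                           # "England" is missing
]

matches = [bool(pattern.search(s)) for s in samples]
print(matches)  # [True, False, True, False]
```

Only the first and third strings match: the second fails because "king" never appears as a whole word, and the fourth fails because "England" is absent even though "KING" matches case-insensitively.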

Hope it helps
Cheers
Alberto