This is Jeopardy Challenge Project (Python, Pandas)

Hello everyone,
Coding this project step by step is harder than it might seem. Anyway, “the end justifies the means”, as they say, and two days ago I finally reached the end of the tunnel. Apart from the difficulties I encountered, the most important thing is everything I learned while developing it. Isn’t it reassuring when, at the end of a project, you can freely play around with your code?

This is my solution for this Jeopardy Challenge project.
Your comments, please!

Hi Everyone,

Here is my solution; feel free to comment on it. Happy to hear different thought processes.

import pandas as pd
pd.set_option('display.max_colwidth', 50)

# read csv into all_data
all_data = pd.read_csv('jeopardy.csv')

# rename columns (all raw headers except the first start with a space)
all_data.rename(columns={
    'Show Number': 'show_number',
    ' Air Date': 'air_date',
    ' Round': 'round',
    ' Category': 'category',
    ' Value': 'value',
    ' Question': 'question',
    ' Answer': 'answer'
    },inplace=True,errors="raise")

# convert the value column to floats, turning 'None' values into zero
all_data['question_value'] = all_data.value.apply(lambda x : float(0) if x=='None' else float(x.replace(',','').replace('$','')))

# filter function with options for case-sensitive and whole-word searches
# (parameter named words so it does not shadow the built-in filter())
def filter_df(words,case_sensitive=False,whole_word=False):
    if case_sensitive:
        if whole_word:
            return all_data[all_data.question.apply(lambda x : all(word in x.split() for word in words))]
        else:
            return all_data[all_data.question.apply(lambda x : all(word in x for word in words))]
    else:
        if whole_word:
            return all_data[all_data.question.apply(lambda x : all(word.lower() in x.lower().split() for word in words))]
        else:
            return all_data[all_data.question.apply(lambda x : all(word.lower() in x.lower() for word in words))]

# find all rows containing the whole words "King" and "England" in the question column
filtered_df = filter_df(["King", "England"],False,True)
print('Number of found rows: {}'.format(len(filtered_df.question)))
print(filtered_df.head(20))

# find all rows containing "King" in the question column
filtered_df_king = filter_df(["King"],False,False)
print('Number of found rows for value King: {}'.format(len(filtered_df_king)))

# mean question value where "King" is found in the question
print(filtered_df_king.question_value.mean())

# unique answer counts where questions contain "King"
print(filtered_df_king.answer.value_counts(sort=True))
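For comparison, here is a minimal sketch of the same case-sensitive and whole-word options using pandas’ vectorized Series.str.contains instead of four lambda branches. The helper name filter_questions and its column parameter are my own (hypothetical), not part of the project. One behavioral difference: whole-word matching here uses regex word boundaries (\b), so a word followed by punctuation, like the “king” in “king’s”, still counts as a whole word, whereas x.split() keeps the punctuation attached and would miss it.

```python
import re

import pandas as pd


def filter_questions(df, words, case_sensitive=False, whole_word=False, column="question"):
    """Return the rows of df whose `column` contains every word in `words`.

    filter_questions is a hypothetical helper, not part of the project's code.
    """
    text = df[column].astype(str)  # guard against non-string values such as NaN
    mask = pd.Series(True, index=df.index)
    for word in words:
        pattern = re.escape(word)  # treat the word literally, not as a regex
        if whole_word:
            pattern = r"\b" + pattern + r"\b"  # match only at word boundaries
        # every word must be found, so AND the per-word masks together
        mask &= text.str.contains(pattern, case=case_sensitive, regex=True)
    return df[mask]


questions = pd.DataFrame({"question": [
    "The King of England",
    "A king's ransom",
    "Kingdom of England",
    "A queen",
]})

print(filter_questions(questions, ["King", "England"], whole_word=True))
```

On the real data, filter_questions(all_data, ["King", "England"], whole_word=True) should give much the same rows as filter_df(["King", "England"], False, True), apart from the punctuation difference described above.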

Hello @pheaxx ,

That’s some great stuff here. I got extremely confused by recursive programming, but I see you are confident enough not only to read it in other people’s code but to implement it yourself! Well done!

I am currently working on this project and I will post my effort soon, feel free to take a look at it!


Hi @cspanias,

Thanks, I look forward to seeing how you interpreted the questions and how you translated them into code.

If you want someone to discuss your thought process with, just PM me!


Hello,

You can find my effort on the Jeopardy project here.

My helper functions for the find_question function are not well structured; they only work for 1-3 words. I couldn’t figure out a way to do it otherwise!

In activity 5, regarding “cleaning” and converting the value column, I actually find my regex solution more intuitive, but maybe that’s because I am studying NLP at the moment and see regular expressions everywhere!
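For anyone curious what a regex-based cleanup of the value column can look like, here is a minimal sketch (not cspanias’s actual code, which is behind the link). It assumes, as the other solutions in this thread do, that missing values are stored as the string 'None'; pd.to_numeric with errors="coerce" turns anything that still isn’t a number into NaN, which fillna(0) then replaces.

```python
import pandas as pd

# Small stand-in for the "value" column of the Jeopardy data
values = pd.Series(["$200", "$1,000", "None", "$2,127"])

# Remove every "$" and "," in one pass; the character class [$,] matches either one
cleaned = values.str.replace(r"[$,]", "", regex=True)

# "None" cannot be parsed, so errors="coerce" makes it NaN; fillna(0) then turns it into 0.0
numeric = pd.to_numeric(cleaned, errors="coerce").fillna(0)

print(numeric.tolist())  # [200.0, 1000.0, 0.0, 2127.0]
```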

Overall, it was the most challenging project on the path so far (for me at least), and it made me realize that I have to dive deeper into lambda functions, because I am still confused by some keywords, like all(), that I saw in the proposed solution.

@pheaxx feel free to have a look!


Nice work! As you mentioned, there is a lot to gain by using lambda functions and list comprehensions. Built-in functions are really useful; I strongly recommend the page that Codecademy shared: Built-in Functions (Python 3.10.2 documentation).


Hello again @pheaxx, yes, I have added lambda functions to my revision list, along with some other things I thought I had grasped; but when I try to implement them without any guidance, I feel lost.

Thanks for commenting!

For the keyword-search function I came up with this little solution; maybe it helps someone. It was very difficult for me to understand the lambda function and the built-in all() function, so I did it “my way”. It is quite flexible to use: you can pass in any DataFrame, the name of the column you want to search in, and a list of keywords to search for. It is probably not the “master” solution, but for me it was the easiest to understand, and with the given dataset it does a good job so far :slight_smile:

def searcher(datainput, columnname, keywords):
  # Initialize some helpers: a lowercase copy of the keywords, a lowercase copy of the
  # column we search in ("Searchcolumn"), and a helper column that records whether
  # all keywords were found in that row ("Trueorfalse").
  datainput = datainput.copy()  # work on a copy so the caller's DataFrame is not modified
  keywordlow = []
  for element in keywords:
    keywordlow.append(element.lower())

  # str() guards against non-string values (numbers, NaN) in the column
  datainput["Searchcolumn"] = datainput[columnname].apply(lambda x: str(x).lower())
  datainput["Trueorfalse"] = True

  # Go through every value in "Searchcolumn" and check every keyword;
  # truelist gets one True or False per keyword.
  # .items() yields (index label, value), so this also works when the index is not 0..n-1
  for rowindex, value in datainput.Searchcolumn.items():
    truelist = []
    for element in keywordlow:
      if element in value:
        truelist.append(True)
      else:
        truelist.append(False)

    # If one keyword was not found, set the helper column "Trueorfalse" to False
    if False in truelist:
      datainput.at[rowindex, "Trueorfalse"] = False

  # Return a DF with all rows that have True in their "Trueorfalse" column
  # (i.e. all keywords were found in the search column)
  df2 = datainput[datainput.Trueorfalse == True]
  return df2
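Since the all() part was the tricky bit: the truelist loop above computes exactly what all() computes, because “False in truelist” is true precisely when all(truelist) is false. Here is a minimal sketch of the same case-insensitive keyword search written with all() and apply(), returning the matching rows without adding helper columns; the name searcher_all and the sample data are hypothetical.

```python
import pandas as pd


def searcher_all(datainput, columnname, keywords):
    """Rows of datainput whose columnname contains every keyword (case-insensitive)."""
    keywordlow = [kw.lower() for kw in keywords]

    def found_all(value):
        text = str(value).lower()
        # same test as the truelist loop: one True/False per keyword ...
        truelist = [kw in text for kw in keywordlow]
        # ... and "no False in truelist" is exactly what all() checks
        return all(truelist)

    # apply() gives a boolean Series aligned with datainput's index,
    # so no row counter and no helper columns are needed
    return datainput[datainput[columnname].apply(found_all)]


jeopardy = pd.DataFrame({"question": ["The King of England", "King of the jungle", None]})
print(searcher_all(jeopardy, "question", ["king", "ENGLAND"]))
```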

Hi, actually it is not a list comprehension (there are no square brackets in the syntax). They used what is called a generator expression (sometimes called a “generator comprehension”), which has the same syntax as a list comprehension, just without the square brackets.
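A small illustration of the difference, using made-up words and a made-up question: with a generator expression, all() stops at the first False, while the list comprehension builds the whole list before all() even starts. The helper contains() exists only to record which words actually got checked.

```python
def contains(word, text, checked):
    checked.append(word)  # record every word that actually gets tested
    return word in text


question = "the queen of england"
words = ["king", "england", "france"]

# Generator expression (no square brackets): values are produced lazily
checked_gen = []
result_gen = all(contains(w, question, checked_gen) for w in words)

# List comprehension (square brackets): the full list is built before all() runs
checked_list = []
result_list = all([contains(w, question, checked_list) for w in words])

print(result_gen, checked_gen)    # False ['king']
print(result_list, checked_list)  # False ['king', 'england', 'france']
print(type(w for w in words).__name__)  # generator
```

Both give the same answer; the generator version just skips work it doesn’t need, which adds up when the filter runs over every row of the dataset.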
Here is a link about it, if you’re curious:

On the discussion about whether this was taught or not:
In programming, no one can teach you everything. If you want to be a good programmer, you have to get comfortable with looking things up and learning different ways of doing things on your own. Very often this leads to a very deep understanding, and for me at least it is an important part of the learning process. So my recommendation would be to shift focus from complaining to seeing these kinds of unexpected solutions as an opportunity to learn and understand.

Can anyone explain to me why this part doesn’t work?
Here is my code for Task 5. I try to remove ‘$’ and ‘,’ from the string so I can convert it into a float, but I still get an error.

Task 5:

jeopardy['converted_value'] = jeopardy.value.apply(lambda x: float(x.lstrip('$').lstrip(',')) if x != 'None' else 'None')
print(jeopardy['converted_value'])

Here’s my Jeopardy project submission:

This started out when I became unhappy about having all of my code in one Python file. I did a heck of a lot of work on this, and judging by some of the others’ submissions, I may have overdone it. However, I think that makes it all the more impressive for those who want to take a look at it, since it is much more interactive than the other submissions or the official solution.

Nice solutions to the extra questions :slight_smile:

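To get rid of the commas, I used .replace(',', '') instead of lstrip.
lstrip only removes leading characters, so the comma in the middle of a value like "$1,000" is never removed (.strip would not help either, since it only removes characters from both ends of the string).
I’d also replace 'None' with None (without the quotes) or 0.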
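Putting those suggestions together, here is a minimal sketch of a working Task 5 conversion. It uses replace() for both "$" and ",", and returns 0.0 for the 'None' entries (None, as suggested above, would also work). The isinstance check is an extra guard I added in case the column holds real missing values (NaN) instead of the string 'None'. The first two prints show why the original chain failed: after lstrip('$') the string starts with a digit, so lstrip(',') has nothing to remove, and strip(',') does not reach the comma in the middle either.

```python
import pandas as pd


def convert_value(x):
    """Turn strings like '$1,000' into 1000.0; 'None' or missing values become 0.0."""
    if not isinstance(x, str) or x == "None":
        return 0.0
    return float(x.replace("$", "").replace(",", ""))


# why the original lstrip chain fails: the comma sits in the middle of the string
print("$1,000".lstrip("$").lstrip(","))  # 1,000
print("1,000".strip(","))                # 1,000

jeopardy = pd.DataFrame({"value": ["$200", "$1,000", "None", "$2,127"]})
jeopardy["converted_value"] = jeopardy.value.apply(convert_value)
print(jeopardy["converted_value"].tolist())  # [200.0, 1000.0, 0.0, 2127.0]
```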


thank you for your support, I appreciate it :heart_eyes:


Hi everyone,
here is my solution: https://github.com/mige92/This-is-Jeopardy

I had a lot of trouble with the filter function and found the solution here in this thread.
Also, I think there is a better way to find the connection between the round and the category.

I really appreciate any feedback!

It includes an answer for question 7.

Hello,

If anyone can help with my question, I would appreciate it.
For task #5, I wrote the code below.

def questions_filter_by_words(data, words):
    filter = lambda x: all(word.lower() in x.lower() for word in words)
    return data.loc[data.question.apply(filter)]

def count_unique_answers(data, words):
    filtered = questions_filter_by_words(data, words)
    count_unique = {}
    for i in range(len(filtered)):
        if filtered.answer.iloc[i] not in count_unique:
            count_unique[filtered.answer.iloc[i]] = 0
        count_unique[filtered.answer.iloc[i]] += 1
    return count_unique

count_unique_answers(jeopardy, 'King')

It works, but it gives different counts compared to the solution code.
For example, the solution code counts ‘Henry VIII’ 55 times, but mine counts it 61 times. The solution’s result has length 5268, while mine has 37688, which means my code finds more unique answers. Moreover, my result has some answers that the solution code does not have, and vice versa. I am trying to find out why, but I can’t. Could anyone please help me figure out why my code does not behave like the value_counts() method?
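A likely cause of at least a large part of the difference: count_unique_answers(jeopardy, 'King') passes a string, not a list. Iterating over a string yields its characters, so all(word.lower() in x.lower() for word in words) checks ‘k’, ‘i’, ‘n’ and ‘g’ separately, and any question containing those four letters anywhere passes the filter. Passing ['King'] makes the loop check the whole word. The small DataFrame below is made up for illustration (the lambda is renamed filter_fn so it doesn’t shadow the built-in filter()), and collections.Counter is used to show that once the filtered rows agree, a dictionary count matches value_counts().

```python
from collections import Counter

import pandas as pd


def questions_filter_by_words(data, words):
    filter_fn = lambda x: all(word.lower() in x.lower() for word in words)
    return data.loc[data.question.apply(filter_fn)]


data = pd.DataFrame({
    "question": ["The King of Spain", "Think big", "King Henry's wives"],
    "answer": ["Juan Carlos", "Ambition", "Henry VIII"],
})

print(list("King"))  # ['K', 'i', 'n', 'g']: a string is iterated character by character

as_string = questions_filter_by_words(data, "King")   # "Think big" slips through
as_list = questions_filter_by_words(data, ["King"])   # only questions containing "king"

print(len(as_string), len(as_list))  # 3 2

# a plain dictionary count gives the same numbers as value_counts()
print(dict(Counter(as_list.answer)) == as_list.answer.value_counts().to_dict())  # True
```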

Hello everyone. I’d like to share the link to my Github showcasing my Jeopardy project. Please click the link above and offer any feedback. I’d appreciate it. And please try out the quiz!

Nice code, Michael. You were able to filter through everything manually. I made some custom filters based on converting the air date column to a datetime object and then creating a decade column. From there, I made a function that lets you find the connection between decade or round (or whatever column you like) and the number of shows those appear in. Find my link below if you’re interested.
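For anyone who wants to try the decade idea, here is a minimal sketch under my own assumptions: the lowercase column names from earlier in the thread (show_number, air_date, round, category), a tiny made-up DataFrame in place of the real data, and a hypothetical helper shows_per() that counts in how many distinct shows a category appears, grouped by any column.

```python
import pandas as pd

# Made-up rows standing in for the real data
jeopardy = pd.DataFrame({
    "show_number": [1, 1, 2, 3, 4, 5],
    "air_date": ["1984-09-10", "1984-09-10", "1995-03-01",
                 "1998-11-20", "2004-06-07", "1996-01-15"],
    "round": ["Jeopardy!", "Double Jeopardy!", "Jeopardy!",
              "Jeopardy!", "Final Jeopardy!", "Double Jeopardy!"],
    "category": ["HISTORY", "HISTORY", "HISTORY", "SCIENCE", "HISTORY", "HISTORY"],
})

# Parse the dates, then floor the year to its decade (1984 -> 1980)
jeopardy["air_date"] = pd.to_datetime(jeopardy["air_date"])
jeopardy["decade"] = jeopardy["air_date"].dt.year // 10 * 10


def shows_per(df, group_col, category):
    """Number of distinct shows featuring `category`, grouped by `group_col`."""
    subset = df[df["category"] == category]
    # nunique() counts show 1 once even though HISTORY appears twice in it
    return subset.groupby(group_col)["show_number"].nunique()


print(shows_per(jeopardy, "decade", "HISTORY"))
print(shows_per(jeopardy, "round", "HISTORY"))
```

Because group_col is just a column name, the same helper covers the round/category question raised earlier in the thread.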

Hello all. I’m here to share a couple of observations from when I was playing around with the Jeopardy Challenge project.

The second code cell contains this code.

# Filtering a dataset by a list of words
def filter_data(data, words):
  # Lowercases all words in the list of words as well as the questions. Returns true if all of the words in the list appear in the question.
  filter = lambda x: all(word in x for word in words)
  # Applies the lambda function to the Question column and returns the rows where the function returned True
  return data.loc[data["Question"].apply(filter)]
1. The first comment doesn’t match the code: it says the words and the questions are lowercased, but the lambda never calls .lower(). Not a big deal, just a copy-paste issue.

2. The code doesn’t work if we try to apply it to the ‘Answer’ column. We get “TypeError: argument of type ‘float’ is not iterable”.

The reason it doesn’t work is that one of the values in the Answer column is read in as a float, most likely NaN, since pandas treats strings such as “null” as missing values by default. There are a few ways to fix this.
Option 1. Cast x as a string, so it is iterable
filter = lambda x: all(word in str(x) for word in words)

Option 2. Clean up the column’s type up front, at the same time as we clean up the column names.

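Option 3. Specify how the column is parsed when reading in the CSV. Note that dtype=str on its own still leaves missing values as NaN, so it has to be combined with keep_default_na=False (or na_filter=False).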
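Here is a minimal sketch of the three options on a tiny made-up CSV (the real file is much larger, but the mechanism should be the same). The “null” entry stands in for whatever answer gets read as NaN; “null” is one of the strings pandas treats as missing by default.

```python
import io

import pandas as pd

csv_text = (
    "Question,Answer\n"
    "This keyword means no value in many languages,null\n"
    "He was the first US president,George Washington\n"
)


def filter_data(data, words, column="Question"):
    # Option 1: str(x) makes every cell iterable, so NaN no longer raises TypeError
    filter_fn = lambda x: all(word in str(x) for word in words)
    return data.loc[data[column].apply(filter_fn)]


default = pd.read_csv(io.StringIO(csv_text))
print(default["Answer"].isna().sum())  # 1: "null" was read as NaN (a float)
print(filter_data(default, ["Washington"], column="Answer"))

# Option 2: clean up the type right after loading (the original text "null" is lost)
cleaned = default.copy()
cleaned["Answer"] = cleaned["Answer"].fillna("").astype(str)

# Option 3: control parsing when reading; dtype=str on its own still leaves NaN
only_dtype = pd.read_csv(io.StringIO(csv_text), dtype={"Answer": str})
print(only_dtype["Answer"].isna().sum())  # 1
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"Answer": str}, keep_default_na=False)
print(as_text.loc[0, "Answer"])  # null
```

Option 3 is the only one of the three that keeps “null” as a genuine answer string, which matters if someone searches the answers for that word.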
