Updated: This is Jeopardy! - CC Challenge Project


I’m doing this off-platform with the larger data set. I’m working on “Explore from Here” suggestion 2:

Is there a connection between the round and the category? Are you more likely to find certain categories, like "Literature" in Single Jeopardy or Double Jeopardy?

Here are the relevant parts of my code so far:

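import pandas as pd

# None means "do not truncate"; passing -1 is deprecated
pd.set_option('display.max_colwidth', None)

df = pd.read_csv('jeopardy.csv')

df.columns = ['show', 'date', 'round', 'cat', 'val', 'quest', 'ans']

# Count the questions for each (category, round) pair
cat_round = df.groupby(['cat', 'round']).quest.count().reset_index()
# One row per category, one column per round
cat_round_pivot = cat_round.pivot(
    columns='round',
    index='cat',
    values='quest'
)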

The output for my print command is good:

round                          cat  Double Jeopardy!  Final Jeopardy!  Jeopardy!  Tiebreaker
0       A JIM CARREY FILM FESTIVAL NaN               NaN               5.0       NaN
1      "!"                         NaN               NaN               5.0       NaN
2      "-ARES"                      5.0              NaN              NaN        NaN
3      "-ICIAN" EXPEDITION         NaN               NaN               5.0       NaN
4      "...OD" WORDS                5.0              NaN              NaN        NaN


So, to answer the question, I need to find a row in the table with the category cat == "Literature". It should be simple to do this, but nothing I’ve tried works. Here’s what I’ve tried:

litPiv = cat_round_pivot(cat_round_pivot['cat'] == 'Literature').reset_index()
# TypeError: 'DataFrame' object is not callable
litPiv = cat_round_pivot.xs(('Literature')).reset_index()
# KeyError: 'Literature'
print(cat_round_pivot[cat_round_pivot.index == "Literature"])
# KeyError: False
print(cat_round_pivot[cat_round_pivot.index["Literature"]])
# IndexError: only integers, slices (':'), ellipsis ('...'), numpy.newaxis ('None') and integer or boolean arrays are valid indices

So: How do I search a pivot table for a specific row?

Note: I figured out how to answer the question without the pivot table, like this (but I still want to know how to use the pivot table):

print(cat_round[cat_round.cat == 'Literature'])

but no results came up:

Empty DataFrame
Columns: [cat, round, quest]
Index: []

This result seems unlikely, as I’ve watched Jeopardy before and think there must have been a category called Literature in the 30 years or so the data set covers.

Update: Putting ‘LITERATURE’ (in all caps) gave the expected result:

              cat             round  quest
16377  LITERATURE  Double Jeopardy!  381
16378  LITERATURE  Final Jeopardy!   10
16379  LITERATURE  Jeopardy!         105

However, when I try it with a category I know for sure is in there:

print(cat_round[cat_round.cat == 'eBay'])
        cat             round  quest
31660  eBay  Double Jeopardy!  5

So I’m still a bit mystified. Any help/explanation would be welcome!

You will find your answer here: https://stackoverflow.com/questions/42985070/how-to-search-data-in-a-pivot-table-in-pandas

In your case, with cat as the index of the pivot table, I would use
litPiv = cat_round_pivot.loc['literature']

For that lookup to match, it is better to lowercase the whole category column first, since the data set stores the categories in mixed casing.
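A minimal sketch of that suggestion, assuming a tiny made-up table in place of jeopardy.csv (the rows and the cat_lower column name are hypothetical, chosen only for illustration). It lowercases the category before grouping, so that differently cased spellings land in one pivot row, and then looks that row up by label:

```python
import pandas as pd

# Hypothetical stand-in for the real data set; only the column names follow the thread.
df = pd.DataFrame({
    "cat": ["LITERATURE", "LITERATURE", "LITERATURE", "eBay", "Literature"],
    "round": ["Jeopardy!", "Double Jeopardy!", "Double Jeopardy!",
              "Double Jeopardy!", "Jeopardy!"],
    "quest": ["q1", "q2", "q3", "q4", "q5"],
})

# Normalize the casing so 'LITERATURE' and 'Literature' count as one category.
df["cat_lower"] = df["cat"].str.lower()

cat_round = df.groupby(["cat_lower", "round"]).quest.count().reset_index()
cat_round_pivot = cat_round.pivot(index="cat_lower", columns="round", values="quest")

# The category is the index of the pivot table, so a row is looked up by label.
lit_row = cat_round_pivot.loc["literature"]
print(lit_row)

# A boolean mask on the index returns an empty frame for an unknown category,
# whereas .loc would raise a KeyError.
missing = cat_round_pivot[cat_round_pivot.index == "opera"]
print(missing.empty)
```

Whether to use .loc or a mask mostly depends on whether a missing category should count as an error or as an empty result.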

Thank you, @micro4146363334. This question was from a while ago, and, as noted in the update, the issue was the casing.

Can you please help me with the first suggestion? Can you share your code? @mtf @appylpye @stetim94

I’d appreciate some help with this particular problem. One of the suggestions was to…

“Build a system to quiz yourself. Grab random questions, and use the [input] function to get a response from the user. Check to see if that response was right or wrong.”

I thought this sounded like fun and I’ve got something working (don’t know if it’s particularly good code) in the PyCharm console, but I’m getting an error. Here’s what I’m running…

import pandas as pd
df = pd.read_csv(r'/Users/jeopardy.csv')
df.columns = ['num','air_date','round','category','value','question','answer']
random_question = df.sample(1)
answer = input(random_question['question'])
if answer == random_question['answer']:

However, when I enter an answer, it fails with “The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().”

I understand that the dataframe expects bitwise operators rather than logical ones, but in this instance I’d thought that ‘==’ would work.

I’m a total novice so I’m probably missing something obvious or have just done it completely wrong? :slight_smile:

Any help appreciated.

If you’re able to obtain a string for the question and write that out, why would obtaining a string for the answer be any different? Is that even the problem? Maybe you’re not getting the question either? And if so then it’s not related to ==, then it’s about obtaining the string from your data structure.

And if you weren’t able to write out the question, then maybe you shouldn’t have written the rest of the code after that yet? Fix current problems before proceeding?

Thanks Ionatan, perhaps I didn’t explain very well. The question is written out successfully and the request for an input afterwards.

After entering the answer related to the question, I’m expecting a response of ‘correct’ but instead receive the error I posted.

Sure, but what types do you expect the following to have?

  • user input
  • question
  • answer

In particular, what types do you think you’re handing to ==?
And if answer is of one type, then the question, obtained the same way, is probably of the same type.

Would the thing you’re comparing the user input to ever consider itself equal to a string?

Either you have two strings, or zero. Having one string is unlikely because you obtained them the same way.

Thanks @ionatan. The types should all be strings so to provide an example…

Here’s another screenshot, if useful:

Then that’s your problem.
You expected string. That’s something else.

Somewhere in your dataframe you’ve put a string. Two strings, I suppose. You’d want to dig them out of there.

Thanks @ionatan. Just to make sure I understand, I want to ensure that the answer is stored in the dataframe as a string. So for simplification in solving this issue, I’ve created a new dataframe to make sure this is the case:

import pandas as pd
df = pd.read_csv(r'/Users/jeopardyv.csv')
df.columns = ['num','air_date','round','category','value','question','answer']
df2 = df.filter(['question','answer'], axis=1)
df2 = df2.applymap(str)
random_question = df2.sample(1)
answer = input(random_question['question'])
if answer == random_question['answer']:

Checking the answer column of the selected random_question this seems fine:

But when entering the answer into the console, I still encounter the same ValueError

The value in your dataframe is very likely a string already. But you’d have to get it out of the dataframe.

You might find this a whole lot easier to reason about if you sample TWO rows.
Or for that matter ten.
Or zero.

That sample method, it doesn’t return a row. It returns rows. Plural. Zero. One. Many. The type doesn’t change based on the amount.

For example, consider a list. A list of one element looks like this:

[1]

not this:

1
…you’ve still got a bunch of pandas-related structure around your result. You just want the string.
How would you know? By reading what sample promises to do, and then considering what it returns and how you would use that return value to get what you want from it.
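To make the point concrete, here is a small sketch using a made-up two-row frame (the rows are hypothetical, not from the data set). It prints the type at each step of digging the string out of what sample returns; .iloc[0] is one of several ways to do that last step:

```python
import pandas as pd

# Hypothetical stand-in for the Jeopardy data.
df = pd.DataFrame({
    "question": ["Capital of France?", "2 + 2?"],
    "answer": ["Paris", "4"],
})

rows = df.sample(1, random_state=0)  # a DataFrame that happens to hold one row
column = rows["answer"]              # a Series that happens to hold one value
value = rows["answer"].iloc[0]       # the plain Python string inside it

print(type(rows).__name__)
print(type(column).__name__)
print(type(value).__name__)

# Comparing a string with a Series gives a Series of booleans, not a single bool,
# which is why the if statement complains.
print(type(value == column).__name__)
```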


Thanks @ionatan, you’ve perfectly articulated what I was missing / what was confusing me. I went about this another way and it works as expected…

import pandas as pd
df = pd.read_csv(r'/Users/jeopardy.csv')
df.columns = ['num','air_date','round','category','value','question','answer']
from random import randrange
random_question_id = randrange(500)
sample_question = df.loc[random_question_id]
answer = input(sample_question.question)
if answer == sample_question.answer:

Appreciate your help (and patience!)
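For anyone landing here later, the same idea can be wrapped in a function. This is only a sketch under a few assumptions: ask_question and read_answer are names made up for illustration, the one-row frame stands in for the CSV, and ignoring case and whitespace is an optional leniency the original code does not have. Using df.sample also avoids hard-coding the range to the first 500 rows.

```python
import pandas as pd

def ask_question(df, read_answer=input, seed=None):
    """Ask one randomly chosen question; return True if the reply matches."""
    # .iloc[0] turns the one-row DataFrame from sample() into a single row.
    row = df.sample(1, random_state=seed).iloc[0]
    reply = read_answer(row["question"] + " ")
    # Assumed leniency: ignore case and surrounding whitespace.
    return reply.strip().lower() == str(row["answer"]).strip().lower()

# Hypothetical one-row frame in place of the real data set.
quiz = pd.DataFrame({"question": ["Capital of France?"], "answer": ["Paris"]})

# Passing a function instead of input() lets the check run without a console.
print(ask_question(quiz, read_answer=lambda prompt: " paris "))  # True
```

Called as ask_question(df) with no other arguments, it falls back to the built-in input and prompts in the console.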

Pandas follows the NumPy convention of raising an error when you try to convert a Series to a bool. This happens in an if statement or when using the boolean operations and, or, or not. It is not clear what the truth value of the following should be:


5 == pd.Series([12,2,5,10])

The result you get is a Series of booleans, equal in size to the pd.Series on the right-hand side of the expression. Because you are comparing a pd.Series with a single value, that result can hold several True and several False values, as in the case above. Using it as a condition is ambiguous, since as a whole it is neither True nor False, and that is why you get the error. You need to aggregate the result so that a single boolean value comes out of the operation. For that, use either any or all, depending on whether you want at least one value (any) or all values (all) to satisfy the condition.

(5 == pd.Series([12,2,5,10])).all()
# False


(5 == pd.Series([12,2,5,10])).any()
# True
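As a sketch of where the error actually comes from, the snippet below puts the comparison result straight into an if and catches the exception. It also shows .item(), which pulls the single value out of a one-element Series (the 'Paris' value is made up for illustration):

```python
import pandas as pd

matches = 5 == pd.Series([12, 2, 5, 10])  # a Series of four booleans

raised = False
try:
    if matches:  # the if statement asks for bool(matches)
        print("found")
except ValueError as err:
    raised = True
    print("ValueError:", err)

# For a Series holding exactly one element, .item() returns that element
# as a scalar, so the comparison gives a single bool.
single = pd.Series(["Paris"])
print("Paris" == single.item())
```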