Chat Bots - Mystery Friend Step 10 Value error on custom text

Hey all, I am having an issue in step 10 of the chatbots project: https://www.codecademy.com/paths/build-chatbots-with-python/tracks/retrieval-based-chatbots/modules/language-and-topic-modeling-chatbots/projects/bag-of-words-mystery-friend

When I use text other than the default, I get this error:

Traceback (most recent call last):
  File "script.py", line 38, in <module>
    mystery_friend = predictions[0] if predictions[0] else "someone else"
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Code:

from goldman_emma_raw import goldman_docs
from henson_matthew_raw import henson_docs
from wu_tingfang_raw import wu_docs

# import sklearn modules here:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Setting up the combined list of friends' writing samples
friends_docs = goldman_docs + henson_docs + wu_docs
# Setting up labels for your three friends
friends_labels = [1] * 154 + [2] * 141 + [3] * 166

# Print out a document from each friend:
mystery_postcard = """
The signal officer leaped from his position and made a vicious grab at the thin paper tape that was snaking from his typer to the master transmitter. It tore just at the entrance slot. The tape-end slid in; disappeared.
"""

# Create bow_vectorizer:
bow_vectorizer = CountVectorizer()

# Define friends_vectors:
friends_vectors = bow_vectorizer.fit_transform(friends_docs)

# Define mystery_vector: 
mystery_vector = bow_vectorizer.transform([mystery_postcard])

# Define friends_classifier:
friends_classifier = MultinomialNB()

# Train the classifier:
friends_classifier.fit(friends_vectors, friends_labels)

# Change predictions:
# predictions = friends_classifier.predict(mystery_vector)
predictions = friends_classifier.predict_proba(mystery_vector)
mystery_friend = predictions[0] if predictions[0] else "someone else"

# Uncomment the print statement:
print("The postcard was from {}!".format(mystery_friend))

Hi,
On the line above the one that fails, predictions has been changed to hold a list of probabilities, one for each option. Before, it held just the one predicted value.

Hope that helps

So, if I understood this right:

I printed out each probability and got these results:

0.004582551489322573
0.9945159382991403
0.0009015102115406884

In the case above, this means that the 2nd is the most probable prediction and the other two do not even come close.

How can the suggested any() or all() help with classification, since whichever of the two I use there, the answer will be True?

predictions = friends_classifier.predict_proba(mystery_vector)
print(f'Result {predictions}')
for p in predictions[0]:
    print(p)

print(all(predictions[0]))
print(any(predictions[0]))

# Printout
Result [[4.58255149e-03 9.94515938e-01 9.01510212e-04]]
0.004582551489322573
0.9945159382991403
0.0009015102115406884
True
True

Hi,
The any() or all() hint comes from the error message, which is suggesting a way to get the code to run, not necessarily a way to make it correct.

If you look at the code:

mystery_friend = predictions[0] if predictions[0] else "someone else"

The if condition is an array of numbers, and Python is not sure what you want to do with them.
Before, when you used just predict, predictions was [2], so predictions[0] was a single number and the condition was simply whether that value was truthy or not.
But with an array of numbers, some might be truthy and some falsy, so Python cannot decide how to treat it, hence the any/all suggestion.
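
The ambiguity described above can be reproduced without the classifier. This is only a minimal sketch, assuming NumPy is installed; the probabilities are copied from the printout earlier in the thread, and `row` and `single` are illustrative names, not part of the project.

```python
import numpy as np

# One row of probabilities, like predictions[0] after predict_proba
# (values copied from the printout earlier in the thread).
row = np.array([0.004582551489322573, 0.9945159382991403, 0.0009015102115406884])

try:
    # Using a multi-element array as an if condition raises ValueError.
    label = row if row else "someone else"
except ValueError as err:
    error_message = str(err)
    print(error_message)

# A single value, like predictions[0] after predict, is unambiguous.
single = np.array([2])
print(bool(single[0]))  # True, because 2 is non-zero

# any()/all() make the error go away, but say nothing about which class won.
print(row.any(), row.all())
```

Both any() and all() come back True here because every probability is non-zero, which is why neither helps with picking a friend.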

Basically, predict does what you’ve done by hand: it looks at the probability of each class, decides which is highest, and returns just that label.
predict_proba, on the other hand, returns all the probabilities, and the mystery_friend line isn’t set up to deal with that.
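
One way to make the mystery_friend line cope with predict_proba is to pick the column with the highest probability and map it back to a label. The sketch below is an assumption about how that could look, not the project's official solution: `predictions` and `classes` are hard-coded stand-ins for `friends_classifier.predict_proba(mystery_vector)` and `friends_classifier.classes_`, so that it runs on its own.

```python
import numpy as np

# Stand-ins for the real objects, so the sketch runs on its own:
# predictions mimics friends_classifier.predict_proba(mystery_vector),
# classes mimics friends_classifier.classes_ (the labels 1, 2, 3).
predictions = np.array([[4.58255149e-03, 9.94515938e-01, 9.01510212e-04]])
classes = np.array([1, 2, 3])

# Index of the largest probability in the first (and only) row.
best_index = int(np.argmax(predictions[0]))

# Map the column index back to the friend's label.
mystery_friend = int(classes[best_index])

print("The postcard was from {}!".format(mystery_friend))
```

In the real script, the two stand-in arrays would be replaced by the classifier's own output and its classes_ attribute.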

OK, so the logical return would be the highest score, and then let the end user decide how much faith they want to put into
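
A rough sketch of that idea: return the most likely label together with its probability, and fall back to "someone else" when the best score is low. The function name `pick_friend` and the 0.5 threshold are hypothetical choices for illustration, and the probabilities are the ones printed earlier in the thread.

```python
def pick_friend(probabilities, labels, threshold=0.5):
    """Return (label, probability) for the most likely class.

    Falls back to "someone else" when the best probability is below
    the (hypothetical) threshold.
    """
    # Position of the highest probability in the sequence.
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    best_probability = float(probabilities[best_index])
    if best_probability < threshold:
        return "someone else", best_probability
    return labels[best_index], best_probability


# Probabilities copied from the printout earlier in the thread.
probabilities = [0.004582551489322573, 0.9945159382991403, 0.0009015102115406884]
friend, confidence = pick_friend(probabilities, [1, 2, 3])
print("The postcard was from {} (probability {:.3f})".format(friend, confidence))
```

The function only indexes into its arguments, so it should also accept predictions[0] and friends_classifier.classes_ directly; the threshold is where the end user decides how much faith to put in the result.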