Mystery Friend

#9 Uncomment the final print statement and save your code to see who your mystery friend was all along!

When I do this, I get the following error:

Traceback (most recent call last):
  File "c:/Users/scott/Documents/Coding/Tutorials/Codecademy/Python/Learn Natural Language Processing/00 - LEARN NATURAL LANGUAGE PROCESSING/02 - Mystery Friend/main.py", line 44, in <module>
    friends_classifier.fit(friends_labels, friends_vectors)
  File "C:\Users\scott\miniconda3\lib\site-packages\sklearn\naive_bayes.py", line 615, in fit
    X, y = self._check_X_y(X, y)
  File "C:\Users\scott\miniconda3\lib\site-packages\sklearn\naive_bayes.py", line 480, in _check_X_y
    return self._validate_data(X, y, accept_sparse='csr')
  File "C:\Users\scott\miniconda3\lib\site-packages\sklearn\base.py", line 432, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "C:\Users\scott\miniconda3\lib\site-packages\sklearn\utils\validation.py", line 73, in inner_f
    return f(**kwargs)
  File "C:\Users\scott\miniconda3\lib\site-packages\sklearn\utils\validation.py", line 803, in check_X_y
    estimator=estimator)
  File "C:\Users\scott\miniconda3\lib\site-packages\sklearn\utils\validation.py", line 73, in inner_f
    return f(**kwargs)
  File "C:\Users\scott\miniconda3\lib\site-packages\sklearn\utils\validation.py", line 624, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3
 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

The error seems to point to this line in my code:

# Train friends_classifier on friends_vectors and friends_labels using the classifier’s .fit() method.
friends_classifier.fit(friends_labels, friends_vectors)

That line comes from step #7:

Train friends_classifier on friends_vectors and friends_labels using the classifier’s .fit() method.

Did I do step #7 correctly? I assumed I did, based on what I found on Google, but now it appears that it wants an array and I am feeding it a string?
sklearn.naive_bayes.MultinomialNB

I’m not sure if that is the issue, or if I did something else wrong. Can someone please look over my code (below) and tell me if they spot a problem?

# import CountVectorizer from sklearn.feature_extraction.text.
# import MultinomialNB from sklearn.naive_bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

from goldman_emma_raw import goldman_docs
from henson_matthew_raw import henson_docs
from wu_tingfang_raw import wu_docs

# Setting up the combined list of friends' writing samples
friends_docs = goldman_docs + henson_docs + wu_docs
# Setting up labels for your three friends
friends_labels = [1] * 154 + [2] * 141 + [3] * 166

# Print out a document from each friend:


mystery_postcard = """
My friend,
From the 10th of July to the 13th, a fierce storm raged, clouds of
freeing spray broke over the ship, incasing her in a coat of icy mail,
and the tempest forced all of the ice out of the lower end of the
channel and beyond as far as the eye could see, but the _Roosevelt_
still remained surrounded by ice.
Hope to see you soon.
"""

# Define bow_vectorizer as an implementation of CountVectorizer.
bow_vectorizer = CountVectorizer()

# Use your newly minted bow_vectorizer to both fit (train) and
# transform (vectorize) all your friends’ writing (stored in the variable friends_docs).
# Save the resulting vector object as friends_vectors.
friends_vectors = bow_vectorizer.fit_transform(friends_docs)

# Create a new variable mystery_vector.
# Assign to it the vectorized form of [mystery_postcard] using the vectorizer’s .transform() method.
mystery_vector = bow_vectorizer.transform([mystery_postcard])

# Implement a Naive Bayes classifier using MultinomialNB. Save the result to friends_classifier.
friends_classifier = MultinomialNB()

# Train friends_classifier on friends_vectors and friends_labels using the classifier’s .fit() method.
friends_classifier.fit(friends_labels, friends_vectors)

# Change predictions value from ["None Yet"] to the classifier’s prediction about which friend wrote the postcard.
# You can do this by calling the classifier’s .predict() method on the mystery_vector.
predictions = friends_classifier.predict(mystery_vector)

mystery_friend = predictions[0] if predictions[0] else "someone else"

# Uncomment the print statement:
print("The postcard was from {}!".format(mystery_friend))

Looks like you needed to pass in your friends_vectors as the first argument and friends_labels as the second argument in friends_classifier.fit().

Looking at the docs, the .fit() method has this signature:

.fit(X, y, sample_weight=None)

…where X is your matrix of training vectors and y is your array of target values (a.k.a. the labels).

Swapping those two arguments around should solve your error.
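
To illustrate the argument order, here is a minimal sketch on a made-up toy corpus. The names toy_docs and toy_labels are hypothetical stand-ins for friends_docs and friends_labels, not the exercise data. X (the vectors) goes first and y (the labels) goes second:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus standing in for friends_docs (not the exercise data)
toy_docs = [
    "ice storm ship channel",
    "ship ice spray tempest",
    "liberty speech anarchism prison",
    "prison liberty speech workers",
]
# One label per document, in the same order as toy_docs
toy_labels = [1, 1, 2, 2]

vectorizer = CountVectorizer()
toy_vectors = vectorizer.fit_transform(toy_docs)  # 2D matrix: one row per document

classifier = MultinomialNB()
# X (the 2D matrix of vectors) first, y (the 1D list of labels) second
classifier.fit(toy_vectors, toy_labels)

prediction = classifier.predict(vectorizer.transform(["the ship was stuck in ice"]))
print(prediction[0])
```

Passing the labels first is what triggers the "Expected 2D array, got 1D array" message, because scikit-learn then tries to validate the flat list of labels as the feature matrix.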

Swapping the arguments like this produces a new error:

friends_classifier.fit(friends_vectors, friends_labels)

Python then complains with this:

  File "c:/Users/scott/Documents/Coding/Tutorials/Codecademy/Python/Learn Natural Language Processing/00 - LEARN NATURAL LANGUAGE PROCESSING/02 - Mystery Friend/main.py", line 44, in <module>
    friends_classifier.fit(friends_vectors, friends_labels)
  File "C:\Users\scott\miniconda3\lib\site-packages\sklearn\naive_bayes.py", line 615, in fit
    X, y = self._check_X_y(X, y)
  File "C:\Users\scott\miniconda3\lib\site-packages\sklearn\naive_bayes.py", line 480, in _check_X_y
    return self._validate_data(X, y, accept_sparse='csr')
  File "C:\Users\scott\miniconda3\lib\site-packages\sklearn\base.py", line 432, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "C:\Users\scott\miniconda3\lib\site-packages\sklearn\utils\validation.py", line 73, in inner_f
    return f(**kwargs)
  File "C:\Users\scott\miniconda3\lib\site-packages\sklearn\utils\validation.py", line 813, in check_X_y
    check_consistent_length(X, y)
  File "C:\Users\scott\miniconda3\lib\site-packages\sklearn\utils\validation.py", line 257, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])

That was my only guess before posting. When that didn’t work, I posted here. Based on this, I’m pretty sure something else is going on, if not some unknown bug?

Hm, if that is the only error, it seems like it thinks friends_vectors and friends_labels are not the same length. Did you try this on Codecademy’s platform? It works for me there, but I haven’t tested it off platform. Maybe it has something to do with how you copied the files over to your machine?
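
One way to rule out a length mismatch, as a sketch: build the labels from the actual length of each list instead of the hard-coded 154, 141 and 166. The build_labels helper and the tiny sample lists below are hypothetical and only stand in for goldman_docs, henson_docs and wu_docs. Whether the local files really differ in length is an assumption I haven’t verified.

```python
def build_labels(*doc_lists):
    """Return one integer label per document, numbering the lists 1, 2, 3, ..."""
    labels = []
    for label, docs in enumerate(doc_lists, start=1):
        labels.extend([label] * len(docs))
    return labels


# Hypothetical tiny stand-ins for goldman_docs, henson_docs and wu_docs
sample_goldman = ["doc a", "doc b"]
sample_henson = ["doc c"]
sample_wu = ["doc d", "doc e", "doc f"]

sample_docs = sample_goldman + sample_henson + sample_wu
sample_labels = build_labels(sample_goldman, sample_henson, sample_wu)

# The labels always match the documents, whatever the list lengths are
print(len(sample_docs), len(sample_labels))
```

In the exercise script this would become friends_labels = build_labels(goldman_docs, henson_docs, wu_docs). Printing friends_vectors.shape[0] and len(friends_labels) just before the .fit() call should then show whether the two counts agree.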

I didn’t copy the files incorrectly. Just to be sure, I copied and pasted over what I had, with the same results. However, the script works on the website. I’m not sure why I am getting different results, other than that I excluded the .pyc files. I’m not sure if that’s the reason, but I guess I am moving on?

Thank you for your help!

Hm, that’s strange that it works on the website but not on your computer. When I have the time, I’ll look into this further and see if I run into the same problems when I do it locally. If so, I’ll reach out to the Codecademy team to see why this might be.

Thanks for bringing this to our attention!

I’m seeing the same error on the website – “Reshape your data”. Did you ever fix this? Can you check to see if your solution on the website still works? I don’t know what to do.

No, I gave up and moved on.

Is there some special intention behind task 10? Did anybody get a proper result? This is what I get:

[array([[2.96848999e-15, 1.05268065e-13, 1.00000000e+00]])]

How do I interpret this data?
The first two values are really, really small, I see that, but what about the third?

I never get the “someone else” returned, no matter which text I give as an input…
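
On interpreting that output: it looks like what .predict_proba() returns, which is one probability per class in the order of friends_classifier.classes_, so the third value would be the probability for label 3. As for "someone else", .predict() always returns one of the labels the classifier was trained on, and since 1, 2 and 3 are all truthy, the else branch in the script cannot trigger. Below is a sketch of one way to get a "someone else" answer by putting a cutoff on the top probability. The pick_friend helper and the 0.9 threshold are my own illustration, not part of the exercise.

```python
def pick_friend(probabilities, class_labels, threshold=0.9):
    """Return the most probable label, or "someone else" if it is below threshold."""
    # Index of the largest probability
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best_index] < threshold:
        return "someone else"
    return class_labels[best_index]


# The probabilities posted above, with labels 1, 2, 3 as in the exercise
confident = pick_friend([2.96848999e-15, 1.05268065e-13, 1.0], [1, 2, 3])
# A made-up case where no class clearly wins
uncertain = pick_friend([0.40, 0.35, 0.25], [1, 2, 3])

print(confident)
print(uncertain)
```

With the real classifier, the inputs would be friends_classifier.predict_proba(mystery_vector)[0] and friends_classifier.classes_. Naive Bayes probabilities tend to be pushed toward 0 or 1, so even with a cutoff like this the "someone else" case may rarely show up.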