Problems with Q4 on Machine Learning Foundations Exam No 2

[https://www.codecademy.com/exams/journeys/data-scientist-ml/paths/dsmlcj-22-machine-learning-foundations/parts/2]

I am having trouble getting my answer for Q4 on Machine Learning Foundations Exam No 2 into the right format.

Here is my code, which I think should answer the question posed:
import pandas as pd

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data',
                 names=['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'accep'])
print(df.head())

# Create target variable array 'y' and set it to have a binary outcome by transforming df['accep']

lst = ['acc', 'good', 'vgood']
df['accep'] = df['accep'].replace('unacc', 1)
df['accep'] = df['accep'].replace(lst, 0)
y = df['accep']

# Create feature matrix 'X' by transforming the rest of the columns to be one-hot encoded variables

df1 = pd.get_dummies(df, columns=['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety'])
df1 = df1.drop(columns=['accep', 'buying_low', 'maint_high', 'doors_4', 'persons_more', 'lug_boot_small', 'safety_med'])
x = df1
print(x)

But I get the message:
Error in calculation of number of entries in y that have the value 1.

Please can you help me see what I am doing wrong?

Thanks

I also ran into this issue, although I used another method. For the binary encoding I did:

# to compare for validation
print(df.accep.value_counts())

binary_dict = {'unacc': 1, 'acc': 0, 'good': 0, 'vgood': 0}
df['accep'] = df['accep'].map(binary_dict)
y = df['accep']

# check values match with initial counts
print(y.value_counts())

My code wouldn’t even get as far as the feature matrix X, so I’m still stuck here.

Sorry, I never did manage to get it to work, but maybe try using a BinaryEncoder?

Tom

I’ve come up with other methods that might work, but I have yet to retry them on the exam to see whether they do. I’ll give them one more try after the 24h probation, and if they all fail, we might have to ask the Codecademy team for some assistance on this.

import pandas as pd

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data', 
                 names=['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'accep'])
#print(df.head())


# Create target variable array 'y' and set it to have a binary outcome by transforming df['accep']
# method 1: lambda (works for me)
df['accep'] = df.apply(lambda x: 1 if 'unacc' in x['accep'] else 0, axis=1)
y = df['accep'] 
print(y.value_counts())

#method 2: map dictionary (also works for me)
binary_dict = {'unacc':1,'acc':0,'good':0,'vgood':0}
df['accep'] = df['accep'].map(binary_dict)
y2 = df['accep']
print(y2.value_counts())


# method 3: trial from Codecademy post (should work; I don't know why it doesn't)
lst = ['acc', 'good', 'vgood']
df['accep'] = df['accep'].replace('unacc', 1)
df['accep'] = df['accep'].replace(lst, 0)
y3 = df['accep']
print(y3.value_counts())

# Create feature matrix 'X' by transforming the rest of the columns to be one-hot encoded variables
# check column category counts
print([df[col].value_counts() for col in list(df.select_dtypes(include=['O']).columns)])


# one hot encode column and drop redundant columns
X = pd.get_dummies(df,columns=df.columns,drop_first=True)
print(X.head())
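
One thing to watch in the script above: each method writes its result back to df['accep'], so if it is run top to bottom, methods 2 and 3 receive a column that method 1 has already turned into 0/1. Below is a minimal sketch of comparing the three methods against an untouched copy instead. The six-row sample and the names raw, y1, y2 and y3 are made up for illustration; whether the dtype difference (if any) is what the exam's checker objects to is only a guess.

```python
import pandas as pd

# Hypothetical miniature of the 'accep' column (object dtype, like the
# text column read_csv has traditionally produced).
raw = pd.Series(["unacc", "acc", "unacc", "good", "vgood", "unacc"],
                name="accep", dtype=object)

# Method 1: element-wise lambda on the untouched column.
y1 = raw.apply(lambda label: 1 if label == "unacc" else 0)

# Method 2: dictionary lookup with .map().
binary_dict = {"unacc": 1, "acc": 0, "good": 0, "vgood": 0}
y2 = raw.map(binary_dict)

# Method 3: two chained .replace() calls.
y3 = raw.replace("unacc", 1).replace(["acc", "good", "vgood"], 0)

# The three results should hold the same values ...
print(y1.tolist() == y2.tolist() == y3.tolist())
# ... but the dtypes may differ between pandas versions, so print them.
print(y1.dtype, y2.dtype, y3.dtype)

# Re-applying the string-keyed map to an already converted column
# finds no matching keys, so every entry becomes NaN.
print(y2.map(binary_dict).isna().all())
```

Starting every method from raw rather than from df['accep'] keeps the comparison fair, and the last line shows why chaining the methods on the same column can silently wipe out the target.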

Technically speaking, a BinaryEncoder shouldn’t work here, but I might be misunderstanding things.
In the example given, the 19 colour types are converted to binary codes, and the length of the longest code becomes the number of columns. For instance, pink is the 19th colour, and 19 in binary is 10011, which is 5 digits long. The transformed dataframe therefore has 5 new features rather than 19, cutting the feature count substantially (by about 74%).

This wouldn’t work here because the question asks for a single column of binary values. Since the df['accep'] column has 4 different categories, a binary encoder would split it into multiple columns, which isn’t useful to us.
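
The arithmetic behind the colour example can be sketched in plain Python. This is only an illustration of the counting argument, not how any particular BinaryEncoder is implemented; numbering the categories from 1 follows the example in this post and is an assumption, as is the helper name binary_code.

```python
def binary_code(index: int, width: int) -> str:
    """Return index written in binary, left-padded with zeros to width digits."""
    return format(index, f"0{width}b")

n_categories = 19                   # the 19 colours from the example
width = n_categories.bit_length()   # digits needed to write 19 in binary

print(width)                        # number of encoded columns
print(binary_code(19, width))       # code for the 19th colour (pink)
print(binary_code(1, width))        # code for the 1st colour

# Fraction of columns saved compared with 19 one-hot columns.
reduction = 1 - width / n_categories
print(f"{reduction:.0%}")
```

Each digit of the code becomes its own column, which is why the result has several columns rather than the single 0/1 column the question asks for.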

To make your code pass, you need to:

  1. use .map() when creating y
  2. use a capital X at the end.
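
The two points above can be sketched on a small made-up frame. The four sample rows and the reduced set of columns are hypothetical; the real car.data has six feature columns. Leaving accep out of X and using drop_first=True (which drops the alphabetically first category, not the hand-picked columns from the original post) are assumptions, and I have not checked this against the exam's checker.

```python
import pandas as pd

# Hypothetical sample rows using column names from car.data.
df = pd.DataFrame({
    "buying": ["vhigh", "low", "med", "high"],
    "safety": ["low", "high", "med", "high"],
    "accep": ["unacc", "acc", "good", "unacc"],
})

# 1. Build y with .map(): 'unacc' becomes 1, every other class becomes 0.
y = df["accep"].map({"unacc": 1, "acc": 0, "good": 0, "vgood": 0})

# 2. Build the feature matrix under the capital-X name, one-hot encoding
#    every feature column and dropping one reference level per feature.
X = pd.get_dummies(df.drop(columns=["accep"]), drop_first=True)

print(y.value_counts())
print(list(X.columns))
```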

By the way, thank you for the second part. I had tried so many other options, which unfortunately didn’t work properly. columns for life!