Hi all!
Here is my code for the Automobile Evaluation Data project. Feel free to correct me in the comments!
import pandas as pd
import numpy as np
car_eval = pd.read_csv('car_eval_dataset.csv')
print(car_eval.head())
manufacturer_country = car_eval.manufacturer_country.value_counts(normalize=True)
print(manufacturer_country)
manufacturer_no_value = car_eval["manufacturer_country"].value_counts() / len(car_eval["manufacturer_country"])
print(manufacturer_no_value)
buying_cost = car_eval["buying_cost"].unique()
print(buying_cost)
buying_cost_categories = ['low', 'med', 'high', 'vhigh']
car_eval["buying_cost"] = pd.Categorical(car_eval["buying_cost"], buying_cost_categories, ordered=True)
median = np.median(car_eval["buying_cost"].cat.codes)
print(median)
median_category = buying_cost_categories[int(median)]
print(median_category)
luggage = car_eval.luggage.value_counts(normalize=True)
print(luggage)
luggage_nodrop = car_eval.luggage.value_counts(dropna=False, normalize=True)
print(luggage_nodrop)
luggage_no_value = car_eval["luggage"].value_counts() / len(car_eval["luggage"])
print(luggage_no_value)
doors = (car_eval["doors"] == '5more').sum()
print(doors)
doors_mean = (car_eval["doors"] == '5more').mean()
print(doors_mean)
Cheers!
Jorge.
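For anyone comparing the three luggage tables in the post above: they only differ when the column has missing values. Here is a small sketch on made-up data (a toy Series, not the project's CSV) that shows what each call does with a NaN. The variable names are just for illustration.

```python
import numpy as np
import pandas as pd

# Toy data, NOT the project's CSV: four rows, one of them missing.
luggage = pd.Series(["small", "big", "small", np.nan], name="luggage")

# Default: NaN is dropped first, so proportions are out of the 3 known rows.
prop_known = luggage.value_counts(normalize=True)

# dropna=False: NaN gets its own row, proportions are out of all 4 rows.
prop_all = luggage.value_counts(dropna=False, normalize=True)

# Counts divided by len(): NaN is not listed, but the denominator is all
# 4 rows, so the listed proportions no longer sum to 1.
prop_len = luggage.value_counts() / len(luggage)

print(prop_known)
print(prop_all)
print(prop_len)
```

If the column has no missing values, all three tables should come out the same.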
Hi Jorge,
I just found this skill path and am doing this evaluation today. Your code looks good. I could not get numpy to run on the Codecademy platform, so that is a bummer. Could you get it to work or is that why you are posting it here?
Hey @nfry2672241086 !
I got it working and didn't have any trouble with numpy.
I just wanted to share my code to help others, because I wasn't able to find a solution to this project in the forum.
See ya.
I don't know if this is still relevant, but I noticed that on my Codecademy platform the exercise did not have 'import numpy as np' at the top, so if you just add that line it should work fine!
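If numpy still gives trouble, one possible workaround is to take the median with pandas alone, since the category codes are already a pandas Series. This is only a sketch on made-up values (not the project's data), and `costs` and `order` are names picked for the example.

```python
import pandas as pd

# Made-up values, NOT the project's data.
order = ["low", "med", "high", "vhigh"]
costs = pd.Series(
    pd.Categorical(["low", "vhigh", "med", "high", "med"],
                   categories=order, ordered=True)
)

# cat.codes maps each value to its position in `order` (low=0 ... vhigh=3).
median_code = costs.cat.codes.median()  # pandas method, no numpy import needed
median_label = order[int(median_code)]

print(median_code, median_label)  # 1.0 med
```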
Dear all,
does anyone know if the data set for the car project is available as a CSV file somewhere? Codecademy extended the original data set by one variable for country, and I would like to work on this project in a Jupyter notebook using the CSV file :)
And here is my code:
import pandas as pd
import numpy as np
car_eval = pd.read_csv('car_eval_dataset.csv')
print(car_eval.head())
# Table of Frequencies
freq = car_eval.manufacturer_country.value_counts()
print(freq)
# Modal category: most frequently occurring country
modal_cat = freq.index[0]
print(modal_cat) # Output: Japan
print(freq.index[5]) # France
# 2 Table of Proportions
prop = car_eval.manufacturer_country.value_counts(normalize=True)
print(prop) # 23% for Japan
# 3 List of possible values
print(car_eval['buying_cost'].unique()) # ['vhigh' 'med' 'low' 'high']
# 4 Ordered list for ordinal categorical variable
buying_cost_categories = ['low', 'med', 'high', 'vhigh']  # lowest to highest cost
# 5 Convert to Type category
car_eval['buying_cost'] = pd.Categorical(car_eval['buying_cost'], buying_cost_categories, ordered = True)
print(car_eval['buying_cost'].head())
# 6 Median for buying cost category
median_category_num = np.median(car_eval['buying_cost'].cat.codes)
print(median_category_num)
median_category = buying_cost_categories[int(median_category_num)]
print(median_category) # category label at the median code
# Summarize luggage capacity
# 7 Table of proportions
prop2 = car_eval.luggage.value_counts(normalize=True)
print(prop2)
# 8 Account for missing data
prop3 = car_eval.luggage.value_counts(dropna = False, normalize=True)
print(prop3) # no missings
print(car_eval.luggage.isnull().any()) # no missings
# 9 Table of counts, including missing values
prop4 = car_eval.luggage.value_counts(dropna = False)
print(prop4)
# Passenger Capacity
count = (car_eval.doors == '5more').sum()
print(count)
# 11
proportion = (car_eval["doors"] == '5more').mean()
print(proportion)
Happy Coding
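One thing worth checking in step 4 of any solution: the order of the category list decides which label the median lands on. A quick sketch with made-up values (not the project's data) shows the same column giving two different median labels, depending only on the order passed to pd.Categorical. The helper `median_label` is hypothetical, written just for this example.

```python
import numpy as np
import pandas as pd

# Made-up values, NOT the project's data.
values = ["low", "med", "high", "high", "vhigh"]

def median_label(values, categories):
    """Return the category label found at the median category code."""
    cat = pd.Categorical(values, categories=categories, ordered=True)
    return categories[int(np.median(cat.codes))]

# Categories listed from lowest to highest cost.
print(median_label(values, ["low", "med", "high", "vhigh"]))  # high

# Same data, but 'vhigh' listed before 'high'.
print(median_label(values, ["low", "med", "vhigh", "high"]))  # vhigh
```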
Sorry, but is there a different task now? Why did you use print(freq.index[5]) # France for the 4th most frequent country?
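I can't speak for what the author intended, but here is how the positions work in general: value_counts() sorts from most to least frequent, and the index is zero-based, so index[0] is the most frequent value, index[3] is the 4th, and index[5] would be the 6th. A sketch on made-up counts (not the project's data):

```python
import pandas as pd

# Made-up data, NOT the project's CSV: counts are 4, 3, 2, 1 (no ties).
countries = pd.Series(
    ["Japan"] * 4 + ["Germany"] * 3 + ["Italy"] * 2 + ["France"]
)

# value_counts() sorts in descending order of frequency by default.
freq = countries.value_counts()

most_frequent = freq.index[0]    # position 0 -> 1st most frequent
fourth_frequent = freq.index[3]  # position 3 -> 4th most frequent

print(most_frequent)    # Japan
print(fourth_frequent)  # France
```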