What does cat.codes do?

Hi!

I’m doing the Data Scientist: Natural Language Processing Specialist career path, and I’ve gotten to the point where we’re doing variable types. On the variable types review page, it asks me to use cat.codes, but it never went over this in the lesson. Is anyone willing to explain exactly what cat.codes means and does?

Thank you for the help,
mews_mochi <3

P.S.: It also asks me to use this code in the census variables project, and I'm not sure how I'm supposed to know that without looking at the hint :sob:

Do you have a link to the lesson?

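cat.codes returns the integer code behind each value in a categorical column. The code is the value's position in the list of categories (0 for the first category, 1 for the second, and so on), and missing values get -1. When you call it on a Series, you get back a Series of integers that shares the original index. For ordered (ordinal) categories, this lets you compute summary statistics such as the median on the column.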
Also, it’s okay to look at the hint if you don’t understand a concept. :slight_smile:
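To make that description concrete, here is a minimal sketch. The `sizes` data and its category order are made up for illustration and are not from the lesson.

```python
import pandas as pd

# Hypothetical data: the values and the category order are invented.
sizes = pd.Series(
    pd.Categorical(
        ["small", "large", "medium", "small"],
        categories=["small", "medium", "large"],
        ordered=True,
    )
)

# Each value is replaced by its position in the categories list.
codes = sizes.cat.codes
print(codes.tolist())  # [0, 2, 1, 0]

# The codes are plain integers, so numeric summaries work on them.
print(codes.median())  # 0.5
```

The codes only carry meaning for ordered categories. For unordered ones, a statistic like the median of the codes is just an artifact of the category order.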

See the Pandas documentation:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Categorical.codes.html?highlight=cat%20codes#pandas.Categorical.codes

And more here on categorical variables:

https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html

Example from the Summary Statistics lesson on NYC Tree census data:

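import numpy as np
import pandas as pd

# nyc_trees is assumed to be the lesson's DataFrame of NYC tree census data
health_categories = ['Poor', 'Fair', 'Good']

# Convert the column to an ordered categorical: Poor < Fair < Good
nyc_trees['health'] = pd.Categorical(nyc_trees['health'], health_categories, ordered=True)

# cat.codes maps Poor -> 0, Fair -> 1, Good -> 2
median_index = np.median(nyc_trees['health'].cat.codes)
print(median_index)

# Look up the category name for the median code
median_health_status = health_categories[int(median_index)]
print(median_health_status)

Output:
2.0
Good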
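If you want to try this without the lesson's data, here is a self-contained sketch of the same idea. The `trees` DataFrame is a made-up stand-in for nyc_trees, and it includes one missing value to show that cat.codes reports missing entries as -1, which you would probably want to filter out before taking a median.

```python
import numpy as np
import pandas as pd

health_categories = ["Poor", "Fair", "Good"]

# Made-up stand-in for the lesson's data, with one missing value.
trees = pd.DataFrame({"health": ["Good", "Fair", None, "Good", "Poor", "Good"]})
trees["health"] = pd.Categorical(trees["health"], health_categories, ordered=True)

codes = trees["health"].cat.codes
print(codes.tolist())  # [2, 1, -1, 2, 0, 2]

# Drop the -1 codes (missing values) so they do not pull the median down.
valid_codes = codes[codes >= 0]
median_index = int(np.median(valid_codes))
print(health_categories[median_index])  # Good
```

Note that with an even number of valid values the median can land between two codes (for example 1.5), and int() then truncates it to the lower category.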
 

Here is a link to the lesson:

https://www.codecademy.com/paths/data-science-nlp/tracks/dsf-exploratory-data-analysis-python/modules/dsf-variable-types-for-data-science/lessons/variable-types/exercises/variable-types-review

Also, thank you for the explanation! Definitely makes more sense now.
