Project: Census Variables

Hello!
Could someone share the code for solving this part of the project?

> * Create a new variable called marital_codes by Label Encoding the marital_status variable. This could help the Census team use machine learning to predict if a respondent thinks the wealthy should pay higher taxes based on their marital status.

As this is a learning environment, simply asking for code is frowned upon; the community guidelines are worth viewing:
https://discuss.codecademy.com/faq

You’d likely get a better response by following the guidance in the FAQ on how best to set up questions (by and large, these forums follow more of a Q&A style, especially under “get-help”).


Dear fellow coders,

Here is my solution for the Census Variables project:

import codecademylib3

# Import pandas with alias
import pandas as pd

# Read in the census dataframe
census = pd.read_csv('census_data.csv', index_col=0)

# 1
print(census.head())

# 3
print(census.dtypes)

# 4
print(census.birth_year.unique())

# 5
census['birth_year'] = census['birth_year'].replace('missing', 1967)
print(census['birth_year'].head())

# 6
census['birth_year'] = census['birth_year'].astype('int')

# 8
print(census['birth_year'].mean())

# 9
# Convert higher_tax to the 'category' dtype.
# Note: with no explicit order given, the categories (and so the codes) follow sorted (alphabetical) order.
census['higher_tax'] = census['higher_tax'].astype('category')

# Encode each category as an integer code
census['higher_tax'] = census['higher_tax'].cat.codes
print(census.higher_tax.unique())

# Print the median of the higher_tax codes
print(census['higher_tax'].median())

# 10
census = pd.get_dummies(census, columns=['marital_status'])

print(census.head())
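
Step 10 above one-hot encodes marital_status, while the original question asked about label encoding it into a new marital_codes column. Here is a minimal sketch of the difference, assuming a small made-up DataFrame in place of census_data.csv (the rows are hypothetical, only the column names come from the project):

```python
import pandas as pd

# Toy data standing in for census_data.csv (values are made up for illustration)
census = pd.DataFrame({
    "marital_status": ["single", "married", "divorced", "married", "widowed"]
})

# Label encoding: one integer per category, stored in a new column.
# With no explicit order, pandas assigns codes in sorted (alphabetical) order.
census["marital_codes"] = census["marital_status"].astype("category").cat.codes

# One-hot encoding for comparison: one indicator column per category
dummies = pd.get_dummies(census["marital_status"])

print(census)
print(dummies.columns.tolist())
```

Label encoding keeps a single column of integers, while get_dummies adds one column per category. If you run get_dummies on the whole DataFrame with columns=['marital_status'] first, the original marital_status column is dropped, so create marital_codes before that step.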

Hi,

This is my code. I'd appreciate your feedback, thanks.


Thanks for sharing. :grinning_face_with_smiling_eyes:


Hi guys, here’s my solution.

import codecademylib3
import numpy as np
# Import pandas with alias
import pandas as pd

# Read in the census dataframe
census = pd.read_csv('census_data.csv', index_col=0)

#print(census.head())
#print(census.dtypes)

#check why birth year was classed as an object
#print(census['birth_year'].unique())

#change the missing data to 1967
census['birth_year'] = census['birth_year'].replace(['missing'], 1967)
#change birth year data type to int
census['birth_year'] = census['birth_year'].astype('int')
#print(census.dtypes)

#average birth year
#print(census['birth_year'].mean())

#convert higher_tax to categorical
census['higher_tax'] = pd.Categorical(census['higher_tax'], ['strongly disagree', 'disagree', 'neutral', 'agree', 'strongly agree'], ordered=True)
print(census['higher_tax'].unique())

census['higher_tax'] = census['higher_tax'].cat.codes
print(census.head())

# median for higher tax is neutral
#print(census['higher_tax'].median())


#census = pd.get_dummies(census, columns=['marital_status'])
#print(census.head())

#print(census['marital_status'].unique())
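# Label encode marital_status into a new marital_codes column.
# Any value not in the category list becomes NaN and gets the code -1.
census['marital_status'] = pd.Categorical(census['marital_status'], ['single', 'married', 'divorced', 'widowed'], ordered=True)
census['marital_codes'] = census['marital_status'].cat.codes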

# Create an age column (relative to 2021) and group ages into 5-year intervals
census['age'] = 2021 - census['birth_year']
age_bins = np.arange(min(census['age']) - 4, 100, 5)
census['age_group'] = pd.cut(census['age'], bins=age_bins)

# pd.cut returns a categorical column, so the age groups can be encoded directly
census['age_group'] = census['age_group'].cat.codes

print(census.head())
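
One thing to watch in the binning step above: np.arange stops before 100, so the last bin edge can fall well short of it, and pd.cut returns NaN for any age above that edge, which cat.codes then encodes as -1. A small sketch with made-up ages (not taken from the census data) shows the effect:

```python
import numpy as np
import pandas as pd

# Hypothetical ages, chosen so that one value falls above the last bin edge
ages = pd.Series([21, 38, 54, 97, 102])

# Same construction as in the solution: edges start at min - 4 and stop before 100
age_bins = np.arange(ages.min() - 4, 100, 5)

# Intervals are right-inclusive by default, e.g. (17, 22]
groups = pd.cut(ages, bins=age_bins)
codes = groups.cat.codes

print(age_bins[-1])    # last bin edge
print(codes.tolist())  # -1 marks ages that fell outside every bin
```

If the real data contains ages above the last edge, extending the upper limit of np.arange (or checking for -1 codes afterwards) would avoid silently grouping them as missing.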