Hello, I'm in a bit of a pinch with this project.
My problem is with part b of the last task, which reads as follows:
- Create a new variable called `age_group`, which groups respondents based on their birth year. The groups should be in five-year increments, e.g., `25-30`, `31-35`, etc. Then label encode the `age_group` variable to assist the Census team in the event they would like to use machine learning to predict if a respondent thinks the wealthy should pay higher taxes based on their age group.
I wrote the following code, and thankfully it works. However, writing this long if/elif chain just to sort the values into age groups feels inefficient and time-consuming. Is there a more elegant way to write the `parse_values` function?
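One possible simplification, sketched here under the assumption that the labels and boundaries of the chain below should stay exactly as they are: every branch past 15 is a right-closed five-year bucket, so the upper edge can be computed by rounding the age up to the next multiple of 5. The `current_year` parameter is a hypothetical addition that defaults to the hard-coded 2021.

```python
import math


def parse_values(birth_year, current_year=2021):
    """Map a birth year to the same label the if/elif chain produces.

    current_year is a hypothetical parameter (defaults to 2021, the
    year hard-coded in the original chain).
    """
    age = current_year - birth_year
    if age <= 15:
        # The first branch of the chain catches every age up to 15.
        return '10-15'
    if 15 < age <= 85:
        # Round the age up to the next multiple of 5: that is the bucket's
        # upper edge, e.g. 16..20 -> 20, 21..25 -> 25.
        upper = math.ceil(age / 5) * 5
        return f'{upper - 5}-{upper}'
    # Older than 85 (or a missing/NaN birth year): no bucket, so return
    # None, just as the original chain falls through without a match.
    return None
```

For example, `parse_values(2001)` gives `'15-20'` and `parse_values(1990)` gives `'30-35'`. It can be used with `census.birth_year.apply(parse_values)` exactly like the original.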
```python
import pandas as pd

# Assumes `census` is the DataFrame loaded earlier, with a `birth_year` column.

def parse_values(birth_year):
    x = 2021 - birth_year
    if x <= 15:
        return '10-15'
    elif x > 15 and x <= 20:
        return '15-20'
    elif x > 20 and x <= 25:
        return '20-25'
    elif x > 25 and x <= 30:
        return '25-30'
    elif x > 30 and x <= 35:
        return '30-35'
    elif x > 35 and x <= 40:
        return '35-40'
    elif x > 40 and x <= 45:
        return '40-45'
    elif x > 45 and x <= 50:
        return '45-50'
    elif x > 50 and x <= 55:
        return '50-55'
    elif x > 55 and x <= 60:
        return '55-60'
    elif x > 60 and x <= 65:
        return '60-65'
    elif x > 65 and x <= 70:
        return '65-70'
    elif x > 70 and x <= 75:
        return '70-75'
    elif x > 75 and x <= 80:
        return '75-80'
    elif x > 80 and x <= 85:
        return '80-85'
    # Ages above 85 match no branch, so the function returns None.

census['age_group'] = census.birth_year.apply(parse_values)
# unique() lists groups in order of first appearance, so drop missing values
# (None is not allowed as a category) and sort; the labels all start with two
# digits, so string order equals age order and the codes follow age.
ages = sorted(census.age_group.dropna().unique())
census.age_group = pd.Categorical(census.age_group, ages, ordered=True)
census['age_group_codes'] = census.age_group.cat.codes
```
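A vectorized alternative to `apply` plus `pd.Categorical` is `pd.cut`, which bins a numeric Series into labelled intervals and returns an ordered categorical, so `.cat.codes` comes out in age order without any manual sorting. The sketch below assumes the same right-closed buckets and 2021 reference year as the chain above; the small `census` DataFrame is hypothetical stand-in data.

```python
import pandas as pd

# Hypothetical stand-in for the real census DataFrame.
census = pd.DataFrame({'birth_year': [1990, 1975, 2001, 1950, 1996]})

age = 2021 - census['birth_year']

# Right-closed bins matching the if/elif chain:
# (-inf, 15] -> '10-15', (15, 20] -> '15-20', ..., (80, 85] -> '80-85'.
edges = [-float('inf')] + list(range(15, 90, 5))
labels = ['10-15'] + [f'{lo}-{lo + 5}' for lo in range(15, 85, 5)]

# pd.cut returns an ordered Categorical whose categories follow `labels`;
# ages outside the bins (above 85) become NaN.
census['age_group'] = pd.cut(age, bins=edges, labels=labels, right=True)

# Label encoding: integer codes in category order (NaN becomes -1).
census['age_group_codes'] = census['age_group'].cat.codes
print(census)
```

Because the categories come from the fixed `labels` list rather than from `unique()`, every age group gets the same code regardless of which rows appear first, which is usually what you want before feeding the column to a model.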
Thanks in advance to everyone willing to help!