I’m doing the extra for Data Science Foundations: Census Variables, and decided to be hellbent on autogenerating the labels for age groups because… I’m extra? I haven’t used bins before and had to google how to make the age groups, so I’m sure I don’t understand them perfectly.
I used the range function to make the bin edges and a list comprehension to create the age group labels in five-year steps, e.g. 21-25. However, I kept running into issues when making the new age group column: either the ages weren’t sorting correctly or I kept getting the error “Bin labels must be one fewer than the number of bin edges”. Which I do understand: there needs to be one fewer label than edges, sure. I ended up using .pop to remove a label because I couldn’t find another way to comply. The code works properly afaik, but it just looks so clunky. There must be a better way to do this, right? Can someone tell me how?
# Generating edges for bin
age_group_max = list(range(0, 101, 5))
print(age_group_max)

# Generating labels (first needs to be removed because it's nonsensical)
age_group_labels = [(str(x-4) + "-" + str(x)) for x in age_group_max]

# The first label generated is '-4-0' so removing that one
age_group_labels.pop(0)
print(age_group_labels)

census["age_group"] = pd.cut(census.age, bins=age_group_max, labels=age_group_labels)
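
For comparison, here is a rough sketch of one way the label step might be tidied up: build each label from a pair of neighbouring edges, so the label count comes out one short of the edge count without needing .pop. The small `census` DataFrame and the names `edges` and `labels` are made up for illustration and are not part of the exercise data.

```python
import pandas as pd

# Hypothetical stand-in for the census DataFrame from the exercise
census = pd.DataFrame({"age": [3, 5, 6, 21, 25, 26, 100]})

# Bin edges: 0, 5, 10, ..., 100
edges = list(range(0, 101, 5))

# Pair each edge with the next one; n edges give n - 1 pairs,
# so there is exactly one label per bin and nothing to remove
labels = [f"{low + 1}-{high}" for low, high in zip(edges[:-1], edges[1:])]

# pd.cut uses right-closed intervals by default: (0, 5], (5, 10], ...
census["age_group"] = pd.cut(census["age"], bins=edges, labels=labels)
print(census)
```

One thing to watch with these edges: because the default intervals are open on the left, an age of exactly 0 falls outside the first bin and comes back as NaN. Passing `include_lowest=True` to `pd.cut` makes the first interval include its left edge, though the "1-5" label would then be slightly off for that bin.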