More concise way of changing a column's values to specific integers (depending on the original values) using pandas?

This is for this project in the Learn the Basics of Machine Learning course: https://www.codecademy.com/courses/machine-learning/projects/ml-decision-trees-flags

I finished the project and wanted to use the column “Mainhue” as a feature. However, it contained strings like “green”, “red”, etc., which I needed to change to integers. I did that using the following method I found through Google:

flags.loc[(flags.Mainhue == 'green'), 'Mainhue'] = 1
flags.loc[(flags.Mainhue == 'red'), 'Mainhue'] = 2
flags.loc[(flags.Mainhue == 'blue'), 'Mainhue'] = 3
flags.loc[(flags.Mainhue == 'gold'), 'Mainhue'] = 4
flags.loc[(flags.Mainhue == 'white'), 'Mainhue'] = 5
flags.loc[(flags.Mainhue == 'orange'), 'Mainhue'] = 6
flags.loc[(flags.Mainhue == 'black'), 'Mainhue'] = 7
flags.loc[(flags.Mainhue == 'brown'), 'Mainhue'] = 8

But surely there is a more concise way of doing this? I hope this all makes sense; I had never used the pandas library before this course.

For a single Series (such as the Mainhue column), pandas.Series.apply or pandas.Series.map (see the pandas 1.3.3 documentation) might be what you’re looking for. apply takes a function, while map accepts either a function (more powerful) or a mapping type (simpler), e.g. a dictionary that maps each of these string values to its integer counterpart. You might need to set the new Series’ dtype (e.g. with .astype(int)) if you want to use the values as integers.
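A minimal sketch of both suggestions, assuming a small hypothetical stand-in for the project's flags DataFrame (only a Mainhue column) and the integer codes from the question. The names hue_codes and hue_to_code, and the fallback code 0, are illustrative choices, not part of the project.

```python
import pandas as pd

# Hypothetical stand-in for the project's flags DataFrame (Mainhue only).
flags = pd.DataFrame(
    {"Mainhue": ["green", "red", "blue", "gold", "white",
                 "orange", "black", "brown", "red"]}
)

# One dictionary replaces the eight separate .loc assignments.
hue_codes = {
    "green": 1,
    "red": 2,
    "blue": 3,
    "gold": 4,
    "white": 5,
    "orange": 6,
    "black": 7,
    "brown": 8,
}

# Mapping type (simpler): values missing from the dictionary become NaN.
mapped = flags["Mainhue"].map(hue_codes)

# Function (more powerful): apply calls it once per value, so it can hold
# arbitrary logic, here a fallback code of 0 for unexpected colours.
def hue_to_code(hue):
    return hue_codes.get(hue, 0)

applied = flags["Mainhue"].apply(hue_to_code)

# Make sure every colour was mapped, then store the codes as integers.
assert mapped.notna().all(), "unmapped Mainhue values present"
flags["Mainhue"] = mapped.astype(int)
```

With map, any colour that is not a key in the dictionary turns into NaN, which would force a float dtype; that is why the sketch checks for missing values before calling astype(int).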
