Python Pandas Personal Project Help

#Import required packages/libraries
import pandas as pd

#Practice dataframe 1, comprised of student info.
Student_Info = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 6, 7, 8, 9], 
              "Name": ["John", "Steve", "Mike", "Luke", "Jane", "Ella", "Sophie", "Alice", "Susan"], 
              "Email": ["John1@email.com", "Steve2@email.com", "Mike3@email.com", "Luke4@email.com", 
                        "Jane5@email.com", "Ella6@email.com", "Sophie7@email.com", "Alice8@email.com", 
                        "Susan9@email.com"]})

#Practice dataframe 2, comprised of student scores for their major. "ID" column is the same
# as dataframe 1.
Student_Marks = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 6, 7, 8, 9], 
                    "Major": ["English", "Geology", "Math", "Biology", "Education", "Business", 
                              "Finance", "Chemistry", "Psychology"], 
                    "Score": [34.6, 26.2, 98.1, 87.3, 65.5, 72.4, 59.7, 68.6, 61.0]})

#Merging 2 of the above dataframes together by the ID column.
df_merge = pd.merge(Student_Info, Student_Marks, on="ID", how="outer").reset_index(drop=True)

# Making a list (and dictionary) for the sex of a name.
name_by_sex = {"John": "M", "Steve": "M", "Mike": "M", "Luke": "M", "Jane": "F", 
               "Ella": "F", "Sophie": "F", "Alice": "F", "Susan": "F"}
male_names = ["John", "Steve", "Mike", "Luke"]

# Iterate through df_merge names column, to see if any of the names match those in a dictionary
# that has names as keys, and the values as either F or M for the sex of that name.
# If they match, create a new column in df_merge, "Sex", and append the sex of the name to
# that column for the respective row.

                                    ### NOT WORKING ###
for x, y in name_by_sex.items():
    for i in df_merge["Name"]:
        if i == x:
            df_merge[i]["Sex"] = y

# Same as above, except using a list of males names, instead of a dictionary of names with their sex
                                    
                                    ### NOT WORKING ###
for x in male_names:
    df_merge.loc[df_merge["Name"] == x, ["Sex"]] = "M"
    df_merge.loc[df_merge["Name"] != x, ["Sex"]] = "F"
    
# Check to see if the sex by name code above has worked.
df_merge

I am currently working through the “Multiple Tables with Pandas” Chapter in the Data Science course with Python (https://www.codecademy.com/paths/data-science/tracks/dscp-data-manipulation-with-pandas/modules/dscp-multiple-tables-in-pandas/lessons/pandas-multiple-tables/exercises/review-ii), and have made a small practice project on my own to try and apply some of the topics covered over the course so far. In my project I am currently trying to iterate through a column of names in a Pandas dataframe, and check to see if the names in that column match a list (and dictionary) of names I have made that also has the sex that those names belong to. I’m trying to do this so that I can then add an additional column to the dataframe (“Sex”), where if the names in the dataframe column match those in the list (or dictionary), then it will add the sex associated with that name to the appropriate column and row.

I’m doing this to practice modifying a dataframe with external data provided in another form (as I have had this done by a colleague in R on a work project, and can see that it’d be very useful to know). The 2 most common errors I receive are 1. KeyError: ‘John’, and 2. ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Both errors seem to be because my code can’t find the string ‘John’, which hasn’t helped me a lot in terms of fixing my code. The dictionary attempt I have made above gives the key error, and the ValueError I have gotten from various previous attempts at the list and dictionary for-loops. Lastly, the list for-loop I have given above seems to have the strange effect of making every value in the Sex column “F” except for the row for “Luke”, which is a confusing outcome.

Any tips or hints on how to achieve my goal of iterating through the dataframe and checking against the dictionary and list that I have made, and how to append the sex values to the new colum would be greatly appreciated. If this isn’t the correct area to post this, then my apologies. I can move it to somewhere more relevant if anyone lets me know.

Regards,
Damir

Hm. I tweaked this a bit and it worked. I commented out the first part and, well, try this(?):

'''for x, y in name_by_sex.items():
    for i in df_merge["Name"]:
        if i == x:
            df_merge[i]["Sex"] = "M"'''

# Same as above, except using a list of males names, instead of a dictionary of names with their sex
                                    
                                    ### NOT WORKING ###
for x in male_names:
    df_merge.loc[df_merge["Name"] == x, ["Sex"]] = "M"
    #df_merge.loc[df_merge["Name"] != x, ["Sex"]] = "F"
    
# Check to see if the sex by name code above has worked.
df_merge
>>>
	ID	Name	Email	Major	Score	Sex
0	1	John	John1@email.com	English	34.6	M
1	2	Steve	Steve2@email.com	Geology	26.2	M
2	3	Mike	Mike3@email.com	Math	98.1	M
3	4	Luke	Luke4@email.com	Biology	87.3	M
4	5	Jane	Jane5@email.com	Education	65.5	F
5	6	Ella	Ella6@email.com	Business	72.4	F
6	7	Sophie	Sophie7@email.com	Finance	59.7	F
7	8	Alice	Alice8@email.com	Chemistry	68.6	F
8	9	Susan	Susan9@email.com	Psychology	61.0	F
1 Like

Hey Lisalisaj, thanks for giving this a crack.

Hmmm, it’s strange that that code works for you. I noticed you commented out the first for loop, and commenting out the second statement -

#df_merge.loc[df_merge["Name"] != x, ["Sex"]] = "F"

in the second for loop. So I’m not actually sure how you even got an output with “F” values in the sex column, seeing as you didn’t run any statement that could do that (as they’re all commented). When I run the code you provided (with the second for loop statement in the brackets above un-commented ) I get this output.

   ID    Name              Email       Major  Score Sex
0   1    John    John1@email.com     English   34.6   F
1   2   Steve   Steve2@email.com     Geology   26.2   F
2   3    Mike    Mike3@email.com        Math   98.1   F
3   4    Luke    Luke4@email.com     Biology   87.3   M
4   5    Jane    Jane5@email.com   Education   65.5   F
5   6    Ella    Ella6@email.com    Business   72.4   F
6   7  Sophie  Sophie7@email.com     Finance   59.7   F
7   8   Alice   Alice8@email.com   Chemistry   68.6   F
8   9   Susan   Susan9@email.com  Psychology   81.0   F

The strange thing is, if I run only this as you have,

for x in male_names:
    df_merge.loc[df_merge["Name"] == x, ["Sex"]] = "M"

then the output is halfway correct. It gives me this.

   ID    Name              Email       Major  Score  Sex
0   1    John    John1@email.com     English   34.6    M
1   2   Steve   Steve2@email.com     Geology   26.2    M
2   3    Mike    Mike3@email.com        Math   98.1    M
3   4    Luke    Luke4@email.com     Biology   87.3    M
4   5    Jane    Jane5@email.com   Education   65.5  NaN
5   6    Ella    Ella6@email.com    Business   72.4  NaN
6   7  Sophie  Sophie7@email.com     Finance   59.7  NaN
7   8   Alice   Alice8@email.com   Chemistry   68.6  NaN
8   9   Susan   Susan9@email.com  Psychology   81.0  NaN

However, as soon as I add in any additional lines of code to fill in the remaining values for sex to be “F” (like shown initially), I get either one of the aforementioned errors, the outcome with only a single “M” as shown above, or all the values in the sex column being “F” (although I figured out why this last outcome happened fairly quickly).

I have managed to solve making the dictionary work, by doing this

# Create a new column in df_merge titled "Sex".
df_merge["Sex"] = ""

for idx, row in df_merge.iterrows():
    for x, y in name_by_sex.items():
        if row["Name"] == x:
            df_merge.loc[idx, "Sex"] = y

The only way I could get the list attempt working though was by creating a second list of female names only, then using the following for loop

male_names = ["John", "Steve", "Mike", "Luke"]

female_names = ["Jane", "Ella", "Sophie", "Alice", "Susan"]

# Make a new empty string column in df_merge for the sex values "M" and "F".
df_merge["Sex"] = ""

for idx, row in df_merge.iterrows():
    for x in male_names:
        if row["Name"] == x:
            df_merge.loc[idx, "Sex"] = "M"
    for y in female_names:
        if row["Name"] == y:
            df_merge.loc[idx, "Sex"] = "F"

Unfortunately, I didn’t want to solve it this way, so I’m still racking my brain trying to figure out a way to get it working with just the male_names list.

Thanks for the help so far though. It might indicate that there is something off with something else in my code, or with the software I’m using.

Regards,
Damir

1 Like

Hm. I’ll go back and look at it.
Though, you might just not want such a column anyway.
What happens if you have a name like “Chris” or “Pat” or “Davy”?
:woman_shrugging:t2:

1 Like

I know it isn’t the most realistic example, I only made the dummy data that way to purposefully keep it simple. It was just to learn how to manipulate data in python, the example of names and sex has no real world purpose nor do I need it to.

Thanks for your help so far.

1 Like

First consider if you really need to iterate over rows in a DataFrame. Iterating through pandas dataFrame objects is generally slow. Iteration beats the whole purpose of using DataFrame. It is an anti-pattern and is something you should only do when you have exhausted every other option. It is better look for a List Comprehensions , vectorized solution or DataFrame.apply() method for iterate through DataFrame.

Pandas DataFrame loop using list comprehension

result = [(x, y,z) for x, y,z in zip(df['Name'], df['Promoted'],df['Grade'])]

Pandas DataFrame loop using DataFrame.apply()

result = df.apply(lambda row: row["Name"] + " , " + str(row["TotalMarks"]) + " , " + row["Grade"], axis = 1)