#Import required packages/libraries
import pandas as pd
#Practice dataframe 1, comprised of student info.
Student_Info = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
                             "Name": ["John", "Steve", "Mike", "Luke", "Jane", "Ella", "Sophie", "Alice", "Susan"],
                             "Email": ["John1@email.com", "Steve2@email.com", "Mike3@email.com", "Luke4@email.com",
                                       "Jane5@email.com", "Ella6@email.com", "Sophie7@email.com", "Alice8@email.com",
                                       "Susan9@email.com"]})
#Practice dataframe 2, comprised of student scores for their major. "ID" column is the same
# as dataframe 1.
Student_Marks = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
                              "Major": ["English", "Geology", "Math", "Biology", "Education", "Business",
                                        "Finance", "Chemistry", "Psychology"],
                              "Score": [34.6, 26.2, 98.1, 87.3, 65.5, 72.4, 59.7, 68.6, 61.0]})
# Merging the 2 dataframes above on the ID column.
df_merge = pd.merge(Student_Info, Student_Marks, on="ID", how="outer").reset_index(drop=True)
# Making a list (and dictionary) for the sex of a name.
name_by_sex = {"John": "M", "Steve": "M", "Mike": "M", "Luke": "M", "Jane": "F",
"Ella": "F", "Sophie": "F", "Alice": "F", "Susan": "F"}
male_names = ["John", "Steve", "Mike", "Luke"]
# Iterate through df_merge names column, to see if any of the names match those in a dictionary
# that has names as keys, and the values as either F or M for the sex of that name.
# If they match, create a new column in df_merge, "Sex", and append the sex of the name to
# that column for the respective row.
### NOT WORKING ###
for x, y in name_by_sex.items():
    for i in df_merge["Name"]:
        if i == x:
            df_merge[i]["Sex"] = y
# Same as above, except using a list of male names instead of a dictionary of names with their sex.
### NOT WORKING ###
for x in male_names:
    df_merge.loc[df_merge["Name"] == x, ["Sex"]] = "M"
    df_merge.loc[df_merge["Name"] != x, ["Sex"]] = "F"
# Check to see if the sex by name code above has worked.
df_merge
I am currently working through the “Multiple Tables with Pandas” chapter of the Data Science course with Python (https://www.codecademy.com/paths/data-science/tracks/dscp-data-manipulation-with-pandas/modules/dscp-multiple-tables-in-pandas/lessons/pandas-multiple-tables/exercises/review-ii), and have made a small practice project of my own to apply some of the topics covered so far. In this project I am trying to iterate through a column of names in a Pandas dataframe and check whether each name matches a list (and a dictionary) of names I have made, which also records the sex each name belongs to. The aim is to add a new column to the dataframe (“Sex”) so that, where a name in the dataframe matches one in the list (or dictionary), the sex associated with that name is written to that column for the matching row.
I’m doing this to practice modifying a dataframe with external data provided in another form (a colleague did this in R on a work project, and I can see that it’d be very useful to know). The 2 most common errors I receive are: 1. KeyError: ‘John’, and 2. ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Both errors seem to occur because my code can’t find the string ‘John’, which hasn’t helped me much in fixing it. The dictionary attempt above gives the KeyError; the ValueError came from various earlier attempts at the list and dictionary for-loops. Lastly, the list for-loop above has the strange effect of making every value in the Sex column “F” except for the row for “Luke”, which is a confusing outcome.
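A possible reading of the two symptoms described above, offered as a small sketch rather than a definitive diagnosis: `df_merge[i]` selects a column by label, so with `i == "John"` pandas looks for a column called "John" and raises the KeyError; and in the list loop every pass rewrites the whole "Sex" column, so only the last name in `male_names` ("Luke") is still "M" at the end. The frame `df` below is a cut-down, hypothetical stand-in for `df_merge`, and the "Sex" column is pre-created with empty strings purely to keep the sketch simple.

```python
import pandas as pd

# Hypothetical, cut-down stand-in for df_merge (assumption: only "Name" matters here).
df = pd.DataFrame({"Name": ["John", "Steve", "Mike", "Luke", "Jane"]})
male_names = ["John", "Steve", "Mike", "Luke"]

# 1) df["John"] is a column lookup, and there is no column labelled "John".
try:
    df["John"]
    key_error_message = None
except KeyError as err:
    key_error_message = str(err)

# 2) Reproduce the list loop: each pass marks one row "M" and every other row "F",
#    so earlier "M" values are overwritten by later passes.
df["Sex"] = ""  # assumption: start from an existing, empty column
for x in male_names:
    df.loc[df["Name"] == x, "Sex"] = "M"
    df.loc[df["Name"] != x, "Sex"] = "F"
looped_result = df["Sex"].tolist()

# One way to avoid the overwrite: test membership in the whole list at once.
is_male = df["Name"].isin(male_names)
df["Sex"] = "F"
df.loc[is_male, "Sex"] = "M"
fixed_result = df["Sex"].tolist()

print(key_error_message)
print(looped_result)
print(fixed_result)
```

The membership test with `isin` builds one boolean mask for all of the names, so no row is visited more than once and nothing gets overwritten.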
Any tips or hints on how to iterate through the dataframe, check the names against the dictionary and list that I have made, and append the sex values to the new column would be greatly appreciated. If this isn’t the correct area to post this, then my apologies. I can move it to somewhere more relevant if anyone lets me know.
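For the dictionary case, one common pandas idiom is `Series.map`, which looks up each value of a column in a dictionary and returns the matching values as a new Series, with no explicit loop. The sketch below is a minimal illustration under assumptions: `df` is a small hypothetical frame standing in for `df_merge`, and "Pat" is an invented name added only to show what happens when a name is missing from the dictionary.

```python
import pandas as pd

# Hypothetical stand-in for df_merge; "Pat" is deliberately absent from the dictionary.
df = pd.DataFrame({"ID": [1, 2, 3], "Name": ["John", "Jane", "Pat"]})
name_by_sex = {"John": "M", "Jane": "F"}

# Look up every name in the dictionary; names without an entry become NaN.
df["Sex"] = df["Name"].map(name_by_sex)

print(df)
```

Names that are not in the dictionary come back as missing values (NaN) rather than raising an error, which makes unmatched rows easy to spot afterwards, for example with `df["Sex"].isna()`.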
Regards,
Damir