FAQ: Aggregating Data - Let's Review

This community-built FAQ covers the “Let’s Review” exercise from the lesson “Aggregating Data”.

Paths and Courses
This exercise can be found in the following Codecademy content:

[Beta] Python for Data Science: Working with Data

FAQs on the exercise Let’s Review

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply (reply) below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.

Join the Discussion. Help a fellow learner on their journey.

Ask or answer a question about this exercise by clicking reply (reply) below!
You can also find further discussion and get answers to your questions over in Language Help.

Agree with a comment or answer? Like (like) to up-vote the contribution!

Need broader help or resources? Head to Language Help and Tips and Resources. If you are wanting feedback or inspiration for a project, check out Projects.

Looking for motivation to keep learning? Join our wider discussions in Community

Learn more about how to use this guide.

Found a bug? Report it online, or post in Bug Reporting

Have a question about your account or billing? Reach out to our customer support team!

None of the above? Find out where to ask other questions here!

I am slightly confused with the last task in the summary which is an analysis of FIFA 2022:

Calculate the Total Number of Goals by Each Team in the FIFA World Cup 2022.

Hint: We’ll need to combine each team’s goals scored as both the home team and away team.

  1. Split results into a DataFrame containing only FIFA World Cup 2022 matches
  • Combine multiple boolean masks using the & operator:
    • results['tournament']=='FIFA World Cup'
    • results['year']== 2022
  • Perform a second split that creates separate DataFrames for home teams and away teams
  1. Apply aggregation functions to each DataFrame that sums up the total number of goals scored by each team
  • Don’t forget to flatten the index of the resulting aggregated DataFrames
  1. Combine the two DataFrames by performing a left join where the DataFrame on the left contains the data for the home team (important!)
  • Feel free to clean the column names but don’t drop the home_team and away_team columns!
  • Create a new column total_goals that adds up the total goals scored by each team as the home team and away team
  • Sort from highest to lowest number of total_goals by each team.

The code I’ve written for this task is below and I believe it adheres to the instructions but I am baffled by the table it produces. How should I be reading this table?

import pandas as pd import numpy as np results = pd.read_csv('results.csv') results.head() # Filter data for FIFA World Cup 2022 world_cup_2022 = results[(results['tournament'] == 'FIFA World Cup') & (results['year'] == 2022)] # Split into separate DataFrames for home and away teams home_teams = world_cup_2022[['home_team', 'home_score']] away_teams = world_cup_2022[['away_team', 'away_score']] # Aggregate goals for home and away teams home_goals = home_teams.groupby('home_team').agg(total_home_goals=('home_score', 'sum')).reset_index() away_goals = away_teams.groupby('away_team').agg(total_away_goals=('away_score', 'sum')).reset_index() # Join the two DataFrames combined_goals = pd.merge(home_goals, away_goals, left_on='home_team', right_on='away_team', how='left').fillna(0) # Calculate and sort total goals combined_goals['total_goals'] = combined_goals['total_home_goals'] + combined_goals['total_away_goals'] combined_goals = combined_goals.sort_values(by='total_goals', ascending=False) combined_goals = combined_goals.rename(columns={ 'home_team': 'team_1', 'away_team': 'team_2', 'total_home_goals': 'team_1_goals', 'total_away_goals': 'team_2_goals' }) combined_goals

I think you got it right. I got similar results.
team_1_goals are goals the team scored as the home team
team_2_goals are goals the team scored as the away team
the total_goals are all the goals the team scored during the FIFA 2022

Here’s how I did it. I cleaned up the output a little bit.

fifa = results['tournament'] == 'FIFA World Cup' year2022 = results['year'] == 2022 fifa2022 = results[fifa & year2022] home = fifa2022[['home_team', 'home_score']] away = fifa2022[['away_team', 'away_score']] home_total = home.groupby('home_team').agg({'home_score': 'sum'}).reset_index() away_total = away.groupby('away_team').agg({'away_score': 'sum'}).reset_index() combine = pd.merge( left = home_total, right = away_total, left_on = 'home_team', right_on = 'away_team', how = 'left') combine['away_score'] = combine['away_score'].fillna(0) sum_scores = combine['home_score'] + combine['away_score'] combine['total_score'] = sum_scores drop_columns = ['home_score', 'away_team', 'away_score'] clean = combine.drop( labels = drop_columns, axis = 1) clean = clean.rename(columns = {'home_team': 'team'}) clean = clean.sort_values(by = 'total_score', ascending = False) clean.reset_index(drop=True, inplace=True) print("FIFA World Cup 2022 Total Scores") clean