"Best of Baseball" PostgreSQL Challenge Help Needed!

Hi All,

I’m having some trouble with the “Best of Baseball Awards” lesson in the Design Databases with PostgreSQL course.

Specifically with the first query “Heaviest Hitters” (awarded to the team with the highest average weight of its batters on a given year). Here’s what I’ve got so far:

SELECT years, MAX(avg_weight) as max_weight
FROM (SELECT batting.yearid AS years, teams.name AS team, round(avg(people.weight), 2) AS avg_weight
	FROM batting, teams, people
	WHERE batting.teamid = teams.teamid
	AND batting.playerid = people.playerid
	GROUP BY batting.yearid, teams.name) AS weights
GROUP BY weights.years;

This gives me a sorted list of the highest average weight per year (which is what we want!). However, when I try to also list the team name in the following way:

SELECT years, team, MAX(avg_weight) as max_weight
FROM (SELECT batting.yearid AS years, teams.name AS team, round(avg(people.weight), 2) AS avg_weight
	FROM batting, teams, people
	WHERE batting.teamid = teams.teamid
	AND batting.playerid = people.playerid
	GROUP BY batting.yearid, teams.name) AS weights
GROUP BY weights.years;

I get the following error:

ERROR:  column "weights.team" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select years, team, max(avg_weight) as max_weight

And if I add weights.team to the GROUP BY statement, the output changes to be identical to the subquery (because we are now grouping by year, then by team, and then taking the max of avg_weight).

So, my question is: is there some way I can “join” the columns I’ve selected with the team name that corresponds to the maximum average weight each year?

Thanks in advance!

(I’m running PostgreSQL 13.0 via pgAdmin 4 4.27 on macOS Catalina 10.15.7.)

I feel your pain. I’ve been struggling all day trying to accomplish the same.
I believe, the Codecademy solutions for this project don’t really answer their own questions correctly. They provide lists of a variable (in this instance: average weight) for each team and year, rather than providing the team with the highest/lowest value for the variable for each individual year.
So hats to you for trying to accomplish a ‘correct’ solution. Have you found a solution, yet?

1 Like

I would expect to solve it with a Window Function as following:

WITH
	avg_weight 
  AS
  (
    SELECT
        teams.name AS team_name,
        batting.yearid AS season,
        ROUND(AVG(people.weight)) AS avg_player_weight
      FROM 
        people,
        batting,
        teams
      WHERE people.playerid = batting.playerid
      AND teams.id = batting.team_id
      GROUP BY
        batting.yearid,
        teams.name
      ORDER BY avg_player_weight DESC
  )
SELECT
 	team_name,
 	season,
  MAX(avg_player_weight) OVER (PARTITION BY season) AS avg_player_weight
FROM 
	avg_weight
ORDER BY avg_player_weight DESC;

But it doesn’t provide the desired results, as I expect only ONE result row for each year.

team_name season avg_player_weight
team_name season avg_player_weight
Tampa Bay Rays 2009 221
Atlanta Braves 2014 221
Tampa Bay Rays 2014 221
Pittsburgh Pirates 2009 221
Oakland Athletics 2009 221

etc.

It would appear PostgreSQL does not support the MAX() function (as expected) in Window Functions although it does not produce an error: PostgreSQL: Documentation: 13: 3.5. Window Functions.

HI,
I’ve been working on this too and kept getting stuck with an error when trying to select a column that I wasn’t using an Aggregate function on or using GROUP BY on.

I found a workaround by first creating a temp table called ‘avg_weights’ that has every team for every year with the average weight for that year.

Then I was able to select those columns from itself WHERE the avg_weight is in a query of all the max avg_weights ordered by yearid.

WITH avg_weights AS 
(SELECT 
    teams.name,
    AVG(people.weight) AS average_weight,
    batting.yearid
FROM batting
LEFT JOIN teams
    ON teams.id = batting.team_id
LEFT JOIN people 
    ON batting.playerid = people.playerid
GROUP BY 1, 3
ORDER BY 3 DESC)

SELECT * FROM avg_weights
WHERE average_weight IN
(SELECT max(average_weight)
 FROM avg_weights
 GROUP BY yearid
)
ORDER BY yearid DESC;

The Flaw with my solution is that there are a few max averages that match other non-max averages in the data set. Thus allowing several non-max-averges through to the data set. Such as in the years 2001, 1999 and 1998.

I don’t know why PostgreSQL will not allow us to Select for a column that is not used in an aggregate function or the GROUP BY tag. Would be much easier If they did allow us.

Hopefully this helps get you a bit closer to the answer.

Found an ugly workaround to my problem.

avg_weights stays the same from my code above. The changed part is below. I followed a tip found on StackOverflow here.

They say selected columns needing to be found in the GROUP BY/AGGREGATE is a know problem. You can get around it by making a subquery with as much info as possible and then joining it to your table.

Here I make a sub-query t which has max average_weight and yearid then I Join back to the first data (avg_weights) where BOTH the max_average = average AND yearid = yearid. That narrows the margin for error such that two teams should only show for a given year if they have the same average and are both the MAX. Which I think is what we want.

SELECT name, average_weight, yearid
FROM (
	SELECT MAX(average_weight) AS mx, yearid AS y
  FROM avg_weights
  GROUP BY yearid
) AS t
JOIN avg_weights as a
	ON t.mx = a.average_weight AND t.y = a.yearid
ORDER BY yearid DESC;
1 Like

I found another way to get the right answer. I used two WITH clauses to create two separate result tables, one with the average weight of each team and year, and one with the maximum weight and year. Then I could join those two tables into a final table.

WITH weights AS (  
  SELECT ROUND(AVG(weight),2) AS "avg_weight", batting.yearid AS "year", teams.name as "team_name"
  FROM batting
  JOIN people
    ON people.playerid = batting.playerid
  JOIN teams
    ON batting.team_id = teams.id
  WHERE weight IS NOT NULL
  GROUP BY batting.yearid, teams.name
  ORDER BY batting.yearid
),
winner AS (
  SELECT MAX(avg_weight) AS "highest_average", year
  FROM weights
  GROUP BY year
  ORDER BY year
)
SELECT highest_average, winner.year, team_name
FROM winner
JOIN weights
  ON winner.year = weights.year AND winner.highest_average = weights.avg_weight;
2 Likes

Is this the shortest way to get the team from each year with the highest average weight without using AS?

SELECT 
  batting.yearid,
  teams.name, 
  AVG(weight),
  RANK() OVER (PARTITION BY batting.yearid ORDER BY AVG(weight) DESC)
FROM people 
JOIN batting ON people.playerid = batting.playerid
JOIN teams ON batting.team_id = teams.id
GROUP BY 1, 2
ORDER BY 4
LIMIT 149

Hello.

I have problem with database. When I want to run baseball_database.sql in pgAdmin 4 it give me:

“ERROR: syntax error at or near “1”
LINE 688: 1 gomezle01 1933 0 ALS193307060 NYA 921 AL 1 1
^
SQL state: 42601
Character: 11803”

Can you help me with this?

Sorry for bothering! :woozy_face:
I found how to fix this problem in other discussion:

Querying Baseball data off platform project

1 Like

This is a great solution! I just added 1 to the ORDER BY list so it would be sorted by year and it was exactly what I was trying to do. It frustrates me that the “solutions” provided by codecademy don’t actually do what the instructions request. On the positive side I learned some things about window functions while I was trying to fix them. I didn’t discover RANK() until I read your post, so thanks for that.