Honey production exercise

In the Data Science path, MACHINE LEARNING: SUPERVISED LEARNING
Project: Honey Production

https://www.codecademy.com/paths/data-science/tracks/dspath-supervised/modules/dspath-linear-regression/projects/honey-production

Why does the lesson use the mean instead of the sum when grouping the data by year?
The solution is:
prod_per_year = df.groupby('year').totalprod.mean().reset_index()
Shouldn't it be:
prod_per_year = df.groupby('year').totalprod.sum().reset_index()

Hi eyalbre,

You should ask yourself what information you want to extract from the model. Are you interested in the sum of the honey produced in 2050? Or would the sum still represent the trend of production per beekeeper?

Regards,
Vince

My question is about what they claim to have found in the solution. They use the mean, and at the end they say they have found the total production, which sounds wrong to me.

Yeah, that’s actually a bit confusing. I think this comes from the fact that the corresponding column in the data set, which refers to the production of one producer, is called ‘totalprod’, i.e. total production.

So I guess by ‘total production’ they always mean the production of one producer in the corresponding year.

But I agree, it’s a bit confusing.
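
To make the difference concrete, here is a minimal sketch on made-up numbers (not the real honey data; only the column names `year` and `totalprod` are taken from the lesson). It shows what `.mean()` and `.sum()` each return after grouping by year:

```python
import pandas as pd

# Hypothetical toy data: two rows per year, values invented for illustration.
df = pd.DataFrame({
    "year": [1998, 1998, 1999, 1999],
    "totalprod": [100.0, 300.0, 200.0, 600.0],
})

# Average of the 'totalprod' column within each year (what the solution uses).
mean_per_year = df.groupby("year").totalprod.mean().reset_index()

# Sum of the 'totalprod' column within each year (what the question proposes).
sum_per_year = df.groupby("year").totalprod.sum().reset_index()

print(mean_per_year)
print(sum_per_year)
```

With the same number of rows in every year, as in this toy frame, the sum is just the mean times that row count, so both series show the same trend. If the number of rows per year differs, the sum also reflects how many rows were recorded, which is one possible reason to prefer the mean here.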