Hi @lord-proton, welcome to the forums.
I like the fact you’ve included a summary (and I love the fact you’ve referenced things in it). Your values are sensibly formatted and you seem to have used reasonable charts for the data you represent in them. Overall, you seem to have a good idea of how to structure and present your analysis, which is great.
For a little feedback:
I like the introduction, but at the moment some of it seems to be missing the why. There’s a lengthy description of what you intend to look into, but it’s always worthwhile engaging your reader early on. Granted, this is a little trickier with a pre-defined project; just consider how someone who is unfamiliar with the project would interpret it.
There are a couple of `print` outputs early on which take up several pages’ worth of scrolling; you’d be better off just adding a link to the dataset. They might have been left in accidentally, but make sure your notebook is human-readable before uploading it. If you’re introducing your dataset, a few example lines should do the job.
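As a rough sketch of what I mean by "a few example lines" (the dataframe here is made-up stand-in data, not your dataset), something like this keeps the preview to a handful of rows:

```python
import pandas as pd

# Hypothetical stand-in data; in your notebook this would be the real dataset.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f", "g"],
    "value": [1.2, 3.4, 2.2, 5.0, 4.1, 0.7, 3.3],
})

preview = df.head(3)  # first three rows only, rather than the whole table
print(preview)
print(df.shape)       # (rows, columns) gives a sense of size without printing everything
```

If your data isn't in a dataframe, slicing the first few items of a list does the same job.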
In terms of code, Jupyter notebooks do keep a background state: you don’t need to import packages in every new cell, and re-using functions across cells could improve readability here. You also don’t need to declare or initialise variables ahead of time in Python; since there are so many such variables in your code, it hampers readability in its current form. Lists, dictionaries, arrays or dataframes/series might also make your variables much more manageable. Container types are great, so make the most of them.
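To illustrate the container point, here's a minimal sketch. The group names and numbers are invented for the example; the idea is just that one dictionary can stand in for a whole set of separately named variables:

```python
# Instead of pre-declaring mean_a = 0, mean_b = 0, mean_c = 0 and filling
# each one in by hand, keep the data and the results in dictionaries.
# (Group names and values are hypothetical.)
groups = {
    "a": [1.0, 2.0, 3.0],
    "b": [10.0, 20.0],
    "c": [5.0],
}

# One expression computes a mean for every group, however many there are.
means = {name: sum(vals) / len(vals) for name, vals in groups.items()}

for name, mean in means.items():
    print(f"{name}: {mean:.1f}")
```

Adding a fourth group then means adding one entry to `groups` rather than a new variable and a new block of code.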
At present most of your functions just do too much. I’d highly suggest modularising your code a little further; with sensible function names the code can remain very readable even if the number of functions increases. There are a couple of lengthy if statements that I’m sure you could reduce somewhat with functions and looping. What’s more, that code seems to repeat in other functions, so try to up that DRY (don’t repeat yourself) factor.
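As a sketch of the kind of refactor I have in mind (the thresholds, labels and function names are all hypothetical, not taken from your notebook), a long if/elif chain can often become a small table plus a loop, wrapped in a function that other functions can share:

```python
# Hypothetical upper bounds and labels; swap in whatever your analysis uses.
BANDS = [(10, "low"), (20, "mid"), (30, "high")]


def band_for(value, bands=BANDS, default="very high"):
    """Return the label of the first band whose upper bound exceeds value."""
    for upper, label in bands:
        if value < upper:
            return label
    return default


def count_bands(values):
    """Count how many values fall into each band, reusing band_for."""
    counts = {}
    for value in values:
        label = band_for(value)
        counts[label] = counts.get(label, 0) + 1
    return counts


print(count_bands([3, 12, 25, 7, 18, 35]))
```

Each function does one thing, and changing a threshold means editing `BANDS` in one place rather than every copy of the if statement.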
Since you do a lot of binning for histograms and you already import `numpy`, functions like the following would be useful (pandas has an equivalent too if you prefer that):
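To give a feel for how that kind of binning looks in practice, here's a minimal sketch using `np.histogram` and `np.digitize`. The values and bin edges are made up for the example:

```python
import numpy as np

# Made-up data and bin edges, purely for illustration.
values = np.array([1, 2, 2, 3, 5, 7, 8, 9])
edges = np.array([0, 3, 6, 9])

# np.histogram counts values per bin. Bins are half-open [low, high),
# except the last one, which also includes its right edge.
counts, used_edges = np.histogram(values, bins=edges)
print(counts)

# np.digitize returns, for each value, the index of the bin it falls in
# (1-based here; a value at or beyond the last edge gets len(edges)).
bin_index = np.digitize(values, edges)
print(bin_index)
```

On the pandas side, `pd.cut` does a similar job of assigning values to bins.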