This was my first attempt at some statistical and graphing analysis of the American Insurance data set. There seem to be some good news stories and some bad ones. The project overall was reasonably challenging, since I tried to apply as much of the Python I’ve learned up to this point as I could. Learning to access and use various modules containing all manner of statistical analyses, models and graphing was incredibly helpful and made me realize that I am just scratching the surface of this topic. I took my time with this project and probably put in around 20 hours of work. A good deal of that time was spent finding meaningful representations for the data and learning how to run various methods within modules, as well as a bit of reading in scientific and medical journals on smoking and obesity trends in the USA.
Some highlights from the findings:
Average American BMI is over 30 in nearly every age category, which is considered ‘obese’.
Over 80% of Americans according to this dataset are non-smokers!
I like the fact you’ve included a summary (and I love the fact you’ve referenced things in it). Your values are sensibly formatted and you seem to have used reasonable charts for the data you represent in them. Overall, you seem to have a good idea of how to structure and present your analysis which is great.
For a little feedback:
I like the introduction but at the minute some of it seems to be missing the why. There’s a lengthy description of what you intend to look into, but it’s always worthwhile engaging your reader early on. Granted, this is a little trickier with a pre-defined project; just consider how someone who is unfamiliar with the project would interpret it.
There are a couple of print outputs early on which take up several pages’ worth of scrolling; you’d be better off just adding a link to the dataset. They might have been left in accidentally, but make sure your notebook is human readable before uploading it. If you’re introducing your dataset then a few example lines should do the job.
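As a rough sketch of what "a few example lines" could look like, here is one way to preview only the first rows using the standard library. The inline CSV text, its column names and its values are made up for illustration, and `preview_rows` is a hypothetical helper, not something from your notebook.

```python
import csv
import io
from itertools import islice

# Made-up stand-in for the dataset file; column names and values are assumptions.
SAMPLE_CSV = """age,sex,bmi,smoker,region
19,female,27.9,yes,southwest
45,male,31.2,no,southeast
33,male,24.5,no,northwest
52,female,35.0,no,northeast
"""


def preview_rows(csv_file, n=3):
    """Return only the first n data rows, each as a dictionary."""
    reader = csv.DictReader(csv_file)
    return list(islice(reader, n))


# Show two rows instead of printing the whole dataset.
preview = preview_rows(io.StringIO(SAMPLE_CSV), n=2)
for row in preview:
    print(row)
```

If the data is already in a pandas dataframe, `df.head()` gives the same kind of short preview.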
In terms of code, Jupyter notebooks do have a background state, so you don’t need to import packages in every new cell, and re-using functions in more than one cell could improve readability here. You also don’t need to initialise variables in Python; since there are such a large number of these variables in your code, it does hamper readability in its current form. Lists, dictionaries, arrays or dataframes/series might also make your variables much more manageable. Container types are great, so make the most of them.
At present most of your functions just do too much. I’d highly suggest modularising your code a little further; with sensible function names the code can remain very readable even if the number of functions increases. There are a couple of lengthy if statements that I’m sure you could reduce somewhat with functions and looping. What’s more, that code seems to repeat in other functions, so try and up that DRY factor.
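To illustrate the container point, here is a minimal sketch in which a `Counter` takes the place of one pre-initialised counter variable per category. The `records` list is made-up data standing in for rows of the dataset, and the field names are assumptions.

```python
from collections import Counter

# Made-up records standing in for rows of the dataset.
records = [
    {"region": "southwest", "smoker": "yes"},
    {"region": "southeast", "smoker": "no"},
    {"region": "southwest", "smoker": "no"},
]

# One Counter per field replaces a separate, pre-initialised variable per value.
region_counts = Counter(record["region"] for record in records)
smoker_counts = Counter(record["smoker"] for record in records)

for region, count in region_counts.items():
    print(region, count)
```

Adding a new region or category then needs no new variable, because the container grows with the data.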
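As a hedged example of trimming a long if chain, the sketch below keeps the thresholds in a list and loops over them, with one small function per job. The band boundaries follow the commonly used BMI categories (30 and above being "obese"); the function names and the sample values are hypothetical rather than taken from your notebook.

```python
# Upper bound (exclusive) and label for each band below "obese".
BMI_BANDS = [
    (18.5, "underweight"),
    (25.0, "healthy weight"),
    (30.0, "overweight"),
]


def bmi_category(bmi):
    """Return the label of the first band whose upper bound exceeds bmi."""
    for upper_bound, label in BMI_BANDS:
        if bmi < upper_bound:
            return label
    return "obese"


def count_categories(bmi_values):
    """Count how many values fall into each category."""
    counts = {}
    for bmi in bmi_values:
        label = bmi_category(bmi)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Made-up values for illustration only.
category_counts = count_categories([27.9, 31.2, 24.5, 35.0])
print(category_counts)
```

Changing a threshold or adding a band then means editing one list rather than every function that repeats the same if statements.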
Thanks for your comments and review, I do appreciate it. I agree my functions could have been broken down much further into manageable bites. I think I just got carried away with the task at hand. Sorry about the data printout, I will show sample data from now on. Your idea of using dictionaries to eliminate the if statements is a good one. I had a look at some usage of dictionaries in this regard and it makes a lot of sense. I will definitely use that in the future. I think the “why” portion of the project was definitely lacking, and quite honestly I hadn’t really given it any thought, mostly because I was being a bit too assignment-minded. I see the value of the ‘why’, as it is essential in building an analysis.
Ah grand, it had most of the hallmarks of a scientific article, which is right up my street. I figured a few minor additions on that front and you’d have a presentable notebook that someone else could read and understand. Presenting your data/analysis can sometimes be worth far more than the analysis itself.
The code comments are just things to get in the habit of, chances are you’ll come to appreciate them yourself even if it’s just to save an extra couple of minutes typing.