I thought the project would be fairly simple since I used the pandas library, but it took me all day because I wasn’t sure how to do the data analysis part. So any feedback, good or bad, would be appreciated!!!
I’m unable to access your project via this link at the moment (it seems like quite a long link, do you have that many nested directories in the project?). Could this repo possibly be set to private? Please update when you get a working link, ta.
I think it is fixed now
Yes that’s now accessible, ta.
Everything appears to be easy to read and you explain what you’re doing which is very helpful for a viewer. You’ve done a fair bit of analysis but consider perhaps adding an extra few lines at the end summing up the most interesting analysis. What is most important to cost? Are there any unexpected correlations? A short summary might be nice at the very end (your choice to include or not).
Consider limiting your coding comments. Since this is Python ideally you can understand what’s happening without much annotation (if you want to keep the detailed explanation somewhere then perhaps have two .ipynb files, one with annotations to explain every step and one without).
It’s the only one I spotted, but this statement is questionable: “If you do not smoke you will save $…”. It should really be “approximately / around / an average of”, or ideally an average with std. errors or similar. The rest seem to be covered correctly.
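To show what I mean by an average with std. errors, here is a rough sketch. The data is made up, and the column names "smoker" and "charges" are my assumption about your dataset, so adjust them to match yours:

```python
import pandas as pd

# Made-up sample data; "smoker" and "charges" are assumed column names.
df = pd.DataFrame({
    "smoker": ["yes", "no", "yes", "no", "no", "yes"],
    "charges": [30000.0, 5000.0, 34000.0, 7000.0, 6000.0, 32000.0],
})

# Mean, standard error of the mean, and group size for each group.
summary = df.groupby("smoker")["charges"].agg(["mean", "sem", "count"])

# Difference between the two group means.
diff = summary.loc["yes", "mean"] - summary.loc["no", "mean"]

# Standard error of that difference, treating the groups as independent.
diff_se = (summary.loc["yes", "sem"] ** 2 + summary.loc["no", "sem"] ** 2) ** 0.5

print(summary)
print(f"Smokers paid about ${diff:,.0f} more on average (std. error about ${diff_se:,.0f})")
```

Reporting the difference this way makes it clear that it is an average across the sample, not a guaranteed saving for any one person.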
Your code seems fairly straightforward and readable, which is great. I would say though that if you’ve gone to the trouble of importing pandas then make the most of it. It has built-in methods for counting, averaging, grouping and much more.
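As a quick illustration of those built-in methods, something along these lines might replace hand-written loops. The data and column names here are invented for the example, so treat it as a sketch and not as your actual dataset:

```python
import pandas as pd

# Invented sample data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "region": ["southwest", "southeast", "southeast", "northwest"],
    "sex": ["female", "male", "male", "female"],
    "age": [19, 18, 28, 33],
    "charges": [16000.0, 2000.0, 4000.0, 22000.0],
})

# Counting: how many rows fall in each region.
region_counts = df["region"].value_counts()

# Averaging: mean of a single column.
mean_age = df["age"].mean()

# Grouping: mean charges for each value of "sex".
mean_charges_by_sex = df.groupby("sex")["charges"].mean()

print(region_counts)
print(mean_age)
print(mean_charges_by_sex)
```

Each of these is a single line, which also tends to cut down on how many comments the code needs.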
Perhaps you could reorganise your project to be completed without pandas (showing you’re capable of performing analysis without a third-party library) if you wanted to make the most of the work you’ve done. Or consider having more than one solution file: one with pandas and one without. It’s your work you’re displaying so it’s up to you.
Consider perhaps limiting the significant figures for some of the values where you have printed them out with df.describe and similar. Pandas has a few options for this, such as round (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.round.html) or formatting options (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.set_option.html).
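Here is a small sketch of both approaches. The values and column names are made up for the example:

```python
import pandas as pd

# Made-up sample data; "bmi" and "charges" are assumed column names.
df = pd.DataFrame({
    "bmi": [27.9, 33.77, 33.0],
    "charges": [16884.924, 1725.5523, 4449.462],
})

# Option 1: round the summary itself to two decimal places.
rounded = df.describe().round(2)
print(rounded)

# Option 2: leave the numbers alone and only change how they are displayed.
# option_context applies the setting inside the "with" block only;
# pd.set_option("display.precision", 2) would apply it for the whole session.
with pd.option_context("display.precision", 2):
    print(df.describe())
```

Note that round changes the stored values in the returned DataFrame, while the display option only affects how they are printed.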
Thank you! I was looking for more pandas documentation but what I found was not the same as what you have sent over. I plan on going over it and doing what you suggested!!!