Finished Insurance Cost Project - Function Criticism?

Just finished up the US Medical Insurance Cost project! I originally attempted multivariate regression, which was a bit ambitious, but then went back to trying some more basic things within my skill level.

Overall I’m pretty happy with it, but I would like to hear any comments/criticism, especially around my region_count and smoker_difference functions. I feel like they may be bulkier than they should be.

File is called Insurance Cost Data Analysis.ipynb

Thanks for any help!

Hi @web1886190748, congrats on finishing up. You can always revisit the project to try out any skills you learn in the future. A reasonable step when adding something complex is to work out what you want to do, and a method of solving it, on paper before diving into the code. Otherwise you risk trying to solve two difficult problems at once.

Since you were looking for a little criticism: I’d agree with your comments about the functions. I think region_count does too much for a single function and mixes its purposes (the name and the return value don’t really match). Perhaps you could separate the counting from the return of a maximum value for a start.

Code like the following could potentially be simplified with something like a dictionary (or perhaps even the Counter dict subclass):

if region == 'northwest':
    northwest += 1
if region == 'northeast':
...
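As a rough sketch of the Counter idea (the sample list and the name count_regions are made up for illustration; the real project would pass in the region column from the dataset):

```python
from collections import Counter

# Hypothetical sample data standing in for the dataset's region column.
regions = ["northwest", "northeast", "southeast", "southeast", "southwest"]

def count_regions(regions):
    """Return a Counter mapping each region name to how often it appears."""
    return Counter(regions)

region_counts = count_regions(regions)
print(region_counts["southeast"])
```

This replaces the chain of if statements with a single call, and the function now only counts, which matches its name.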

There are several almost-repeated logic statements like:

if northwest > northeast and northwest > southeast and northwest > southwest:
    return ['NorthWest' , northwest]

The logic around these lines could be greatly simplified. Consider what the main goal is (find largest value) and then what you want to return.
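One possible way to find the largest value, assuming the counts are already stored in a dictionary keyed by region name (the function name most_common_region and the numbers below are invented for the example):

```python
def most_common_region(region_counts):
    """Return (region, count) for the region with the highest count."""
    # max() walks the keys and compares them by their stored count.
    region = max(region_counts, key=region_counts.get)
    return region, region_counts[region]

# Made-up counts, not taken from the real dataset.
example_counts = {"northwest": 3, "northeast": 2, "southeast": 5, "southwest": 4}
print(most_common_region(example_counts))
```

If two regions tie, max() returns whichever it meets first, so it is worth deciding whether that matters for the analysis.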

It might be helpful to readers if you formatted some of your floating-point values to include only a relevant number of decimal places: an average age of 38.3 is straightforward and clear, while 38.3343903 is less helpful. Including an error estimate might be worthwhile, and it also stops claims of accuracy that aren’t true. On that note, you import part of the decimal module but never use it; either make use of it or remove the import.
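For display purposes, an f-string format specifier is one simple option. A minimal sketch, reusing the 38.3343903 figure from above as a stand-in value:

```python
# Stand-in value; in the project this would come from the average-age calculation.
average_age = 38.3343903

# ".1f" formats the number with one digit after the decimal point.
formatted = f"{average_age:.1f}"
print(f"Average age: {formatted} years")
```

This only changes how the number is printed; the underlying float keeps its full precision for any later calculations.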

Thank you for this @tgrtim! You’re right about the repeated logic checks when looking for a max. I’ll try delving into something simpler or less “congested”.

I’ll have to look into that Counter dict subclass as well.

Funnily enough, I did bring in Decimal to try and deal with the float issue. Then I couldn’t seem to work it out properly and forgot to remove the import. I’ll need to read the documentation on Decimal to see how to fix those floats properly.

With regard to the average age…

return total_age / len(ages)

I was trying to use…

return Decimal(str(total_age / len(ages)))

That seems to be what the lesson on Decimal showed me, but I couldn’t seem to make it work.

EDIT: I have realized I can just use round() instead of the decimal module, which makes life way easier!
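For anyone comparing the two approaches: Decimal(str(x)) on its own only converts the float, digits and all, so it does not shorten anything; rounding with Decimal needs an extra quantize() step. A small sketch of both, with made-up ages and hypothetical function names:

```python
from decimal import Decimal, ROUND_HALF_UP

def average_age(ages):
    """Mean age rounded to one decimal place using the built-in round()."""
    return round(sum(ages) / len(ages), 1)

def average_age_decimal(ages):
    """Same mean computed with Decimal, then rounded via quantize()."""
    mean = Decimal(sum(ages)) / Decimal(len(ages))
    # quantize() rounds to the same number of places as its argument.
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# Made-up ages for illustration only.
sample_ages = [19, 18, 28]
print(average_age(sample_ages))
print(average_age_decimal(sample_ages))
```

For a summary statistic like this, round() is probably enough; Decimal tends to matter more when exact decimal arithmetic is required, such as with money.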
