I think I found a better way to layout the initial variables in this project than the suggested answer by Codecademy. Am I wrong?

Hello Everyone!

Still pretty new here, but I have a question in regards to my first Python-based project. Originally, I’m asked to create variables based off of a set of data that will be referenced later in the code I’m writing.

These are the variables I will need to create:

  • age : age of the individual in years
  • sex : 0 for female, 1 for male*
  • bmi : individual’s body mass index
  • num_of_children : number of children the individual has
  • smoker : 0 for a non-smoker, 1 for a smoker

The first task is: Create the following variables for a 28-year-old, non-smoking woman who has three children and a BMI of 26.2.

The “correct” answer is as follows:
age = 28
sex = 0
num_of_children = 3
bmi = 26.6
smoker = 0

Initially, I was thinking in the real-world and in more of a long term situation, I will most likely be bouncing between male/female and smoker/non smoker as I’m going through different data comparisons. The structure I would’ve done for my variables would’ve included those 2 extra variables for easier referencing later. Instead of having to update the sex and smoker variable, I would have individual variables for male, female, smoker, and non-smoker. My preferred answer is as follows:

Variables
age = 28
female = 0
male = 1
num_of_children = 3
bmi = 26.6
nonsmoker = 0
smoker = 1

Is there something I haven’t learned yet as to why this form of an answer wouldn’t work? Is it too many variables for the computer to remember? I feel like, at least for how I think, this would have been more efficient to start with. The other variables can be updated on a per patient basis, but in the long run, the programmer (me) would have less variables to update, making my flow a bit faster.

Here’s an example of the insurance cost formula:
insurance_cost = 250 * age - 128 * sex + 370 * bmi + 425 * num_of_children + 24000 * smoker - 12500

Using my variables instead, I would write the formula as:
insurance_cost = 250 * age - 128 * female + 370 * bmi + 425 * num_of_children + 24000 * nonsmoker - 12500

Looking at it, I can see that as I write out the formula 100s of times, I would be constantly updating the formula with either female/male, or nonsmoker/smoker. That seems like it would get tedious really fast. It also seems like updating these variables along with the other 3 is about the same amount of work too. What else am I missing?

Any thoughts? Excited to hear! :grinning:

https://www.codecademy.com/paths/data-science/tracks/dscp-python-fundamentals/modules/dscp-python-syntax/projects/ds-python-syntax-project

Hello!
The second way would involve checking to see if the “customer” is a male or female. Imagine if this was put into a website, and people checked various boxes. You would need to have separate conditions depending on whether they’re male/female, smoking/nonsmoking. The first way is also more efficient, as the formula works regardless of female/male/smoker/non-smoker, whereas you would need to have 4 different formulas if you used the second way.
I hope this helps!

1 Like

Thank you for the feedback! Much appreciated :grinning:

1 Like