Wrapper Method with `obesity.csv` dataset project

Greetings Data Scientists,

Wrapper Method Solution with obesity.csv dataset

I noticed that the Wrapper Method practice project does not include obesity.csv in the files downloaded for the Jupyter Notebook.

I found the dataset on the internet and converted all of its categorical values into numerical values.
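For anyone curious what that conversion step can look like, here is a minimal sketch using only the Python standard library. It is not the code from my notebook: the sample rows, the column names (`gender`, `smokes`, `activity_level`) and the helper `encode_categoricals` are all hypothetical and only illustrate the idea of mapping each category to an integer code.

```python
import csv
import io

# Hypothetical sample rows: these column names and values are made up for
# illustration and are NOT taken from the actual obesity.csv file.
SAMPLE_CSV = """gender,smokes,activity_level
Female,no,low
Male,yes,high
Female,no,medium
"""


def encode_categoricals(rows, columns):
    """Replace each distinct string in the given columns with an integer code.

    Codes follow the sorted order of the category names, so the mapping is
    reproducible across runs. Columns not listed are left unchanged.
    """
    mappings = {}
    for col in columns:
        categories = sorted({row[col] for row in rows})
        mappings[col] = {cat: code for code, cat in enumerate(categories)}

    encoded = [
        {col: mappings[col][val] if col in mappings else val
         for col, val in row.items()}
        for row in rows
    ]
    return encoded, mappings


# Read the sample data into a list of dicts, one per row.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
encoded_rows, mappings = encode_categoricals(
    rows, ["gender", "smokes", "activity_level"]
)

for row in encoded_rows:
    print(row)
print(mappings)
```

Note that alphabetical codes do not respect any natural ordering (here `high` sorts before `low`), so for ordinal columns you would probably want to supply an explicit order instead. In a notebook, pandas `Series.map` or `pandas.get_dummies` can do the same job more compactly.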

Here is my GH Repo:

You’re welcome. :cowboy_hat_face:

Happy Coding

I think it’s great that you did this project (the link to the dataset is broken, though), and you seem to have some interesting predictive models. A bit more detail about what the models do might help non-technical readers. I guess it depends on who the audience is.

I think it’s key to note that the data only has participants from Mexico, Peru, & Colombia. Any results would apply only to the population in the study, not to the general population as a whole.

I have to wonder if this original study was peer reviewed?

That said, I think it leaves out several important social factors that can influence the variables in the data. Factors like:

  • median income

  • access & availability of grocery stores to the individual (does the person live in a “food desert”?)

  • access to affordable health care & access to health information

That’s all I can think of off the top of my head. I think the original study is flawed because it leaves out so many other variables that would affect one’s health.

Hi lisalisaj

The variables in this research fall under the category of Life Science, hence they are related to health.

The variables you mentioned above are related to the economy, and they are a good idea for future research investigating the impact of economic factors on the obesity rate.

As data scientists, we rely mainly on secondary data, and our specialty is analysing the data provided and drawing useful insights from the results.

Appreciate your feedback!



Health doesn’t exist in a vacuum and nutritional behavior is affected by other variables.

I understand that the data is categorized under “Life Science”, but the factors that I mentioned, while economic, also fall under that category and have an influence on the variables in the data. These factors have to be taken into consideration. For instance, the variables (level of physical activity, total caloric intake, and how much water one drinks), while specific to each individual, are also influenced by whether or not the person has access to health care and health information. The food choices that a person makes are also influenced by income. Any results from the study can only be applied to the survey respondents and cannot be extrapolated to the general population without error.
It seems like the study is more anthropological in nature, or more of an ethnography of the sample population involved.
In general, I think that just because we can use ML to sort of predict this stuff doesn’t necessarily mean that we should.

I just found the link to the peer-reviewed publication

Link to the obesity dataset

The machine learning techniques that we apply actually have a lot of potential to predict the rate of obesity because each disease has its own pathophysiology.

The researchers list all possible significant variables that are associated with obesity from a medical point of view.

I did not reject your idea that median income, access to healthcare, etc. could affect the rate of obesity. What I said in my previous post is that your idea is useful for future research investigating the impact of income on obesity.

For example, we can hypothesize that people with higher incomes tend to develop obesity, possibly due to a sedentary lifestyle.

Two research gaps we have identified:

  • investigate the impact of median income, access to healthcare services and availability of nearby grocery stores on obesity.
  • investigate whether the findings in this research are consistent in countries other than Peru, Mexico and Colombia.

Yep, that’s what I was looking at. :slight_smile: I searched the UCI ML data repo because the link in your notebook led to a “404 error”, and that’s where I found the dataset and the article.

I hear what you’re saying…I’m just pointing out that these predictive models can be pretty tricky, which is why I mentioned that they can only be applied to the sample survey population.

And I cannot stress enough that people’s individual health is influenced by outside factors, and those have to be considered when doing something like this, including survey creation.
Someone’s decisions about what to eat are definitely influenced by how much money they have. (So, you can’t think of one w/o the other.) Do they choose foods that have more refined sugars and higher fat and calories? Are those foods often cheaper? (Yes.) Does one live near a grocery store, or do they have to travel farther? (Food deserts in the U.S. and health indicators are something that I’ve been interested in.) Do they have a doctor to discuss these things with (access to health information)?

I’m glad that we can discuss these sorts of topics! :slight_smile: