Date-A-Scientist Capstone Project

Hi guys,

Here are the links to my ML Capstone Project (slides and code).
Please take a look and kindly give me your feedback.
I dedicated so much time to this project that I would be really delighted to have some comments on it! :smiley:
Thanks a lot in advance!



Hey there,
I have yet to complete the data science module, but looking at the PPT file I was able to understand most of it. I did not go into much of the technical detail, as I lack the knowledge, but the questions were answered satisfactorily and the results were well presented in tables.
So, an excellent project from my perspective.


I used an online PDF-to-PPT converter and then uploaded the result to my Google Drive.

I’m not quite finished yet but I’d say overall it’s a very good job of showing the important concepts.

My critique would be that the presentation is very packed with information. My approach so far is to keep mentions of code out of the presentation entirely (imagining I’m presenting to a group of managers who may not care) and to keep my graphs and slides as clean as possible.

If someone is really curious about the code, they can read through the notebook.

I’d consider eliminating a lot of info from the slides and also splitting a few slides up. It took me a long time to get this, but the goal of a slide isn’t to fit in as much info as possible; it’s to make what you present understandable in as little time as possible.

You can find my notebook here

My presentation may also be found here

I know I should edit it to formulate some new questions, but I suddenly needed a portfolio quickly.


Heyyy good job!

I just began mine, but I’m getting weirdly high precision, recall and F1 scores. What I’m trying to predict is whether a person is comfortable enough ['is_comf'] to state their ['body_type'] (0 if ‘rather not say’, else 1, or if they left it empty, which I handled by replacing NaNs with zeros).
I’m getting (0.9938707672669976, 0.9969306732501502, 0.9953983686868814, None) from precision_recall_fscore_support(test_y, predict, average='weighted').
Only about 1 in 244 people won’t state their body_type. Does that undermine my results, or is my model actually good at predicting who won’t state their body_type?
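
A quick sanity check on those three numbers, as a sketch rather than a diagnosis: assume (this is an assumption, the post does not confirm it) that the model predicts 1 for every test profile. With average='weighted', each class's score is weighted by its share of the true labels, so the result depends only on p, the share of 1s in the test set. Here p is taken from the reported recall.

```python
# Hypothetical scenario: a model that predicts 1 for every test profile.
# p = share of class 1 among the true test labels (taken from the reported recall).
p = 0.9969306732501502

# Per-class scores for an always-1 predictor:
#   class 1: precision = p, recall = 1, F1 = 2p / (1 + p)
#   class 0: precision = 0, recall = 0, F1 = 0  (class 0 is never predicted)
# average='weighted' weights class 1 by p and class 0 by (1 - p).
weighted_precision = p * p + (1 - p) * 0.0
weighted_recall = p * 1.0 + (1 - p) * 0.0
weighted_f1 = p * (2 * p / (1 + p)) + (1 - p) * 0.0

print(weighted_precision, weighted_recall, weighted_f1)
```

If the printed values agree with the reported triple, the scores are what class imbalance alone would produce, and they say nothing about how well the model finds the people who won’t state their body_type.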

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support

df = pd.read_csv('profiles.csv')
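# Label: 0 if the person chose 'rather not say', else 1.
# Note: a missing (NaN) body_type also maps to 1 here, because NaN != 'rather not say'.
df['is_comf'] = df['body_type'].apply(lambda x: 0 if x == 'rather not say' else 1)
df['is_vegan'] = df['diet'].apply(lambda x: 1 if x in ('vegan', 'strictly vegan', 'mostly vegan') else 0)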

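# Map the ordinal answers to integer codes.
# Assumption: the source columns are named 'drinks', 'smokes' and 'drugs';
# that part of each line was lost in the posted code.
df["drinks_code"] = df["drinks"].map({"not at all": 0, "rarely": 1, "socially": 2, "often": 3, "very often": 4, "desperately": 5})
df["smokes_code"] = df["smokes"].map({"no": 0, "trying to quit": 1, "when drinking": 2, "sometimes": 3, "yes": 4})
df["drugs_code"] = df["drugs"].map({"never": 0, "sometimes": 1, "often": 2})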

# Treat missing answers as 0.
features = df[['drinks_code', 'smokes_code', 'drugs_code']].fillna(0)

labels = df['is_comf'].fillna(0)

train_x, test_x, train_y, test_y = train_test_split(features, labels)

# Fit on the training split, then evaluate on the held-out split.
model = LogisticRegression()
model.fit(train_x, train_y)
predict = model.predict(test_x)
score = model.score(test_x, test_y)
precision, recall, fbeta_score, support = precision_recall_fscore_support(test_y, predict, average='weighted')
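
One way to answer the "does the imbalance undermine my results" question is to compare against a baseline that ignores the features, and to look at per-class scores instead of the weighted average. The sketch below uses synthetic stand-in data, not profiles.csv: the array shapes, the 100-in-24,400 label split and the random seed are all assumptions made for illustration. DummyClassifier with strategy="most_frequent" always predicts the majority class.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 24400

# Synthetic stand-ins (assumed, not the real data): three ordinal codes,
# and a label that is 0 for 100 rows, i.e. about 1 in 244.
X = rng.integers(0, 5, size=(n, 3))
y = np.ones(n, dtype=int)
y[rng.choice(n, size=100, replace=False)] = 0

# stratify=y keeps the rare class in both splits.
train_x, test_x, train_y, test_y = train_test_split(X, y, stratify=y, random_state=0)

# Baseline that always predicts the most frequent class (1) and ignores the features.
baseline = DummyClassifier(strategy="most_frequent").fit(train_x, train_y)
pred = baseline.predict(test_x)

# The weighted average hides the rare class; average=None reports each class separately.
weighted = precision_recall_fscore_support(test_y, pred, average="weighted", zero_division=0)
per_class = precision_recall_fscore_support(test_y, pred, average=None, labels=[0, 1], zero_division=0)

print("weighted (precision, recall, f1):", weighted[:3])
print("class 0  (precision, recall, f1):", per_class[0][0], per_class[1][0], per_class[2][0])
```

Swapping the baseline for the LogisticRegression model on the real features and printing the same per-class line would show whether it ever identifies class 0 at all. If its class 0 recall is also zero, the high weighted score is coming from the imbalance, not from the model.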