Deep Learning Regression with Admissions Data

Hi,

I completed the code for this project step by step.
My solution can be found on GitHub or on Colaboratory.

I don’t think I fully understood the main ideas, maybe because I cannot figure out how my solution answers this question from the project:

For this project, you will attempt to determine which student factors (such as test scores) are most important when applying to graduate school.

Can anyone help me?

Hi @lendoo!!

You did an incredible job with this project!

For the prompt

For this project, you will create a deep learning regression model that determines how strongly different student application factors (such as test scores) predict whether they are accepted to graduate school.

I actually recently rephrased it to

For this project, you will create a deep learning regression model that predicts the likelihood that a student applying to graduate school will be accepted based on various application factors (such as test scores).

because I believe this is more clearly aligned with what we are doing in the step-by-step directions in the project.

However, if you are still interested in investigating the original prompt, one thing you could do is look at the r2 outlined in the final task. This r2 score shows the strength of the predictions your deep learning model is making. If you would like to see how each feature predicts the chance an applicant will be accepted, you can plot each feature individually against the predicted acceptance chances to see the strength of the relationship (and see which ones appear to have the strongest correlation).
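A minimal sketch of that per-feature plotting idea: scatter each feature against the model's predicted acceptance chances and annotate with the Pearson correlation. The column names and data below are stand-ins — in the project you'd use your test-set features DataFrame and the output of `model.predict(features_test)`:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # render off-screen so this also runs without a display
import matplotlib.pyplot as plt

# stand-in data: in the project, `features` would be your test-set DataFrame
# and `predictions` the array returned by model.predict(features_test)
rng = np.random.default_rng(0)
features = pd.DataFrame({
    'GRE Score': rng.normal(315, 10, 100),
    'CGPA': rng.normal(8.5, 0.5, 100),
})
predictions = 0.01 * features['GRE Score'] + rng.normal(0, 0.02, 100)

# one scatter plot per feature, annotated with the Pearson correlation
for name in features.columns:
    r = np.corrcoef(features[name], predictions)[0, 1]
    plt.figure()
    plt.scatter(features[name], predictions, s=10)
    plt.xlabel(name)
    plt.ylabel('predicted chance of admit')
    plt.title(f'{name} (r = {r:.2f})')
    plt.savefig(f'{name}.png'.replace(' ', '_'))
    plt.close()
```

The features with the largest |r| are the ones most strongly associated with the model's predictions.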

Sorry for any confusion about the original prompt, and again I want to say that your deep learning analysis on the dataset is very well done! Let me know if you have any additional questions. :smile:

Hey @jrich20! I’m about to finish this project and I’m a bit confused about how the r2 score gives us insight into how each feature impacts the acceptance chances! I got an r2_score of 0.76, and I’m wondering if that’s good enough.
First I searched for ways to find the feature importance for the model. :dango: Since it’s deep learning, there’s nothing like that built in, due to the model’s complexity and its “black box” aspect. Getting some insight into each feature would require an iterative approach: dropping features one at a time and retraining.
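Instead of retraining with features dropped, one model-agnostic alternative is a permutation-importance loop: shuffle one column at a time and measure the drop in r2. This is only a sketch — `permutation_importance` and the toy data/predict function are my own names for illustration, not part of the project:

```python
import numpy as np
from sklearn.metrics import r2_score

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Drop in r2 when each column is shuffled; a bigger drop means the
    feature matters more to the model's predictions."""
    rng = np.random.default_rng(seed)
    base = r2_score(y, predict(X))
    importances = {}
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # break the link between column j and y
            drops.append(base - r2_score(y, predict(Xp)))
        importances[j] = float(np.mean(drops))
    return importances

# stand-in model: y depends strongly on column 0, weakly on column 1,
# and not at all on column 2
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.1, 200)
predict = lambda X: 2.0 * X[:, 0] + 0.1 * X[:, 1]

importances = permutation_importance(predict, X, y)
print(importances)
```

For a Keras regression model you could pass something like `predict=lambda X: model.predict(X).ravel()` — the loop only needs a predict function, so the network stays a black box.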
Anyway, thanks for the skill path! It’s pretty nice!

Also @lendoo, nice project! Here’s mine :slight_smile:


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score

df = pd.read_csv('admissions_data.csv')

# drop the first column (serial number); the last column is the label
features = df.iloc[:, 1:-1]
labels = df.iloc[:, -1]
print(features.head())

features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.1)

scaler = StandardScaler()
features_train = scaler.fit_transform(features_train)
features_test = scaler.transform(features_test)



model = Sequential([
    layers.InputLayer(input_shape=(features_train.shape[1],)),
    layers.Dense(6, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(6, activation='relu'),
    layers.Dense(1),
])
opt = keras.optimizers.Adam(learning_rate=0.1)
model.compile(loss='mse', metrics=['mae'], optimizer=opt)

history = model.fit(features_train, labels_train, epochs=40, batch_size=250, verbose=1, validation_split=0.2)
res_mse, res_mae = model.evaluate(features_test, labels_test)

# r2 score on the held-out test set
labels_pred = model.predict(features_test)
print('r2:', r2_score(labels_test, labels_pred))

print(history.history.keys())

plt.plot(history.history['mae'])
plt.plot(history.history['val_mae'])
plt.title('model mae')
plt.ylabel('mae')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
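One possible refinement: with a fixed epoch count the model can over- or under-train, and Keras' `EarlyStopping` callback can halt training once validation loss stops improving. A minimal sketch with stand-in data (the shapes and toy targets here are placeholders, not the admissions dataset):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping

# stand-in data in place of the scaled admissions features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7)).astype('float32')
y = (0.3 * X[:, 0] + 0.5).astype('float32')

model = keras.Sequential([
    keras.Input(shape=(7,)),
    layers.Dense(6, activation='relu'),
    layers.Dense(1),
])
model.compile(loss='mse', metrics=['mae'],
              optimizer=keras.optimizers.Adam(learning_rate=0.01))

# stop once val_loss hasn't improved for 10 epochs; keep the best weights
stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
history = model.fit(X, y, epochs=200, batch_size=32,
                    validation_split=0.2, callbacks=[stop], verbose=0)
print('trained for', len(history.history['loss']), 'epochs')
```

With the callback in place you can set `epochs` generously and let validation loss decide when to stop.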