I’m on the Sports Vector Machines project and trying to build a function for testing out different SVM parameters. I’ve written the following block:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
# draw_boundary is assumed to be available already (it is not defined in this snippet)

def machine(playa, gamma_range, C_range):
    # Prepare data
    playa = playa.dropna(subset = ['plate_x', 'plate_z', 'type'])
    playa = playa[['plate_x', 'plate_z', 'type']]
    playa['type'] = playa['type'].map({'S':1, 'B':0})
    training_set, validation_set = train_test_split(playa, random_state = 1)
    # Loop through a range of gamma and C values
    scores_list = []
    for gamma in gamma_range:
        for C in C_range:
            classifier = SVC(kernel = 'rbf', gamma = gamma, C = C)
            classifier.fit(training_set[['plate_x', 'plate_z']], training_set['type'])
            scores_list.append([gamma, C, classifier.score(validation_set[['plate_x', 'plate_z']], validation_set.type)])
    scores_df = pd.DataFrame(scores_list, columns = ['Gamma', 'C', 'Score'])
    print(scores_df)
    # Plot the best boundary
    fig, ax = plt.subplots()
    plt.scatter(playa.plate_x, playa.plate_z, c = playa.type, cmap = plt.cm.coolwarm, alpha = 0.25)
    # Take gamma and C from the row with the highest score
    best_row = scores_df.loc[scores_df['Score'].idxmax()]
    optimal_gamma = best_row['Gamma']
    optimal_C = best_row['C']
    optimal_classifier = SVC(kernel = 'rbf', gamma = optimal_gamma, C = optimal_C)
    # Fit before drawing so the boundary comes from a trained model
    optimal_classifier.fit(training_set[['plate_x', 'plate_z']], training_set['type'])
    draw_boundary(ax, optimal_classifier)
    plt.show()
When I call machine(david_ortiz, range(1,6), range(1,6)) to test the function, it throws
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I’ve removed NaNs from the dataset, both in a line inside the function and preemptively before calling it. I’ve checked for NaNs and they’re gone, but the error persists. Would really like to make this work! Apologies in advance for any unintended formatting!
Thanks!
EDIT: many thanks for the formatting tips, tgrtim!
Indeed, the error is generated by the line classifier.fit(training_set[['plate_x', 'plate_z']], training_set['type']) inside the for loop. It’s a simple .fit call, which works perfectly when not inside a loop…
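
One possible source of the NaNs, sketched here under an assumption: that the type column holds values other than 'S' and 'B' (for example 'X'; that is a guess about the data, not something confirmed in this post). pandas' Series.map with a dict returns NaN for any value that is not a key in the dict, so NaNs can reappear after dropna has already run. The small pitches frame below is a hypothetical stand-in, not the real david_ortiz data.

```python
import pandas as pd

# Hypothetical stand-in for the pitch data; the 'X' row is an assumption.
pitches = pd.DataFrame({
    'plate_x': [0.1, -0.4, 0.7, 0.2],
    'plate_z': [2.5, 1.9, 3.1, 2.2],
    'type': ['S', 'B', 'X', 'S'],
})

# dropna runs first and finds nothing to drop here
pitches = pitches.dropna(subset=['plate_x', 'plate_z', 'type'])

# map() turns every value that is not a key in the dict into NaN
pitches['type'] = pitches['type'].map({'S': 1, 'B': 0})
nan_after_map = int(pitches['type'].isna().sum())
print("NaNs in 'type' after map:", nan_after_map)

# Dropping NaNs again after the mapping keeps only the labelled rows
cleaned = pitches.dropna(subset=['type'])
print("Rows left after second dropna:", len(cleaned))
```

If this is what is happening, checking playa['type'].isna().sum() right after the map line (rather than before it) should show the leftover NaNs, and moving the dropna call to after the mapping would be one way to remove them.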