Sports Vector Machines data science project, function-writing trouble

I’m on the Sports Vector Machines project and trying to build a function for testing out different SVM parameters. I’ve written the following block:

def machine(playa, gamma_range, C_range):
    # Prepare data
    playa = playa.dropna(subset = ['plate_x', 'plate_z', 'type'])
    playa = playa[['plate_x', 'plate_z', 'type']]
    playa['type'] = playa['type'].map({'S':1, 'B':0})

    training_set, validation_set = train_test_split(playa, random_state = 1)

    # Loop through a range of gamma and C values
    scores_list = []
    for gamma in gamma_range:
        for C in C_range:
            classifier = SVC(kernel = 'rbf', gamma = gamma, C = C)
            classifier.fit(training_set[['plate_x', 'plate_z']], training_set['type'])
            scores_list.append([gamma, C, classifier.score(validation_set[['plate_x', 'plate_z']], validation_set.type)])

    scores_df = pd.DataFrame(scores_list, columns = ['Gamma', 'C', 'Score'])
    print(scores_df)

    # Plot the best boundary
    fig, ax = plt.subplots()
    plt.scatter(playa.plate_x, playa.plate_z, c = playa.type, cmap = plt.cm.coolwarm, alpha = 0.25)

    optimal_gamma = scores_df[scores_df['Score'] == scores_df['Score'].max()].scores_df['Gamma']
    optimal_C = scores_df[scores_df['Score'] == scores_df['Score'].max()].scores_df['C']

    optimal_classifier = SVC(kernel = 'rbf', gamma = optimal_gamma, C = optimal_C)
    draw_boundary(ax, optimal_classifier)
    plt.show()

When I call machine(david_ortiz, range(1,6), range(1,6)) to test the function, it throws:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I’ve removed NaNs from the dataset, in a line inside the function and preemptively outside. I’ve checked for NaNs and they’re gone. But the error persists. Would really like to make this work! Apologies in advance for any unintended formatting!

Thanks!

EDIT: many thanks for the formatting tips, tgrtim!

Indeed the error is generated by the line classifier.fit(training_set[['plate_x', 'plate_z']], training_set['type']) inside the for loop. It’s a simple .fit call - which works perfectly when not inside a loop…
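
One way to double-check what actually reaches .fit is to count NaN and infinite values in the exact columns being passed in, right before the call. A minimal sketch, assuming the columns are already numeric; `report_bad_values` and the toy frame are made up for illustration and are not part of the project:

```python
import numpy as np
import pandas as pd

def report_bad_values(frame, columns):
    """Count NaN and infinite entries per column (hypothetical helper).

    Assumes every listed column already has a numeric dtype.
    """
    subset = frame[columns]
    return pd.DataFrame({
        "nan": subset.isna().sum(),
        "inf": np.isinf(subset).sum(),
    })

# Toy stand-in for training_set; the values are invented for illustration.
toy = pd.DataFrame({
    "plate_x": [0.1, -0.4, 0.7],
    "plate_z": [2.5, np.inf, 1.9],
    "type": [1.0, 0.0, np.nan],
})

report = report_bad_values(toy, ["plate_x", "plate_z", "type"])
print(report)
```

Running the same check on training_set just before the loop would show whether the NaNs are in the features or in the labels.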

Sadly I’ve never used pandas, and this doesn’t look like a straightforward numpy error, since numpy is eerily happy with infinity and NaN. Still, I would encourage you to check this guidance on formatting code to help the next person along.

Short version: wrap the code as follows:
` ` `
code goes here
` ` `
Does the error point you to a specific location btw? That would be helpful too.
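
For what it’s worth, here is one possible cause. It is an assumption, since the data isn’t shown. If the type column contains any value other than 'S' and 'B', .map turns it into NaN. Because dropna runs before the mapping in the function, those new NaNs would still reach .fit. A small sketch with made-up data, where 'X' is a hypothetical third category:

```python
import pandas as pd

# Toy stand-in for the pitch data; 'X' is an assumed extra category.
pitches = pd.DataFrame({
    "plate_x": [0.1, -0.4, 0.7, 1.2],
    "plate_z": [2.5, 3.1, 1.9, 2.2],
    "type": ["S", "B", "X", "S"],
})

# Order used in the question: dropna first, then map.
# 'X' is not NaN yet, so dropna keeps it; map then turns it into NaN.
before = pitches.dropna(subset=["plate_x", "plate_z", "type"]).copy()
before["type"] = before["type"].map({"S": 1, "B": 0})
nan_after_map = int(before["type"].isna().sum())

# Swapped order: map first, then dropna removes the unmapped rows.
after = pitches.copy()
after["type"] = after["type"].map({"S": 1, "B": 0})
after = after.dropna(subset=["plate_x", "plate_z", "type"])
nan_after_drop = int(after["type"].isna().sum())

print(nan_after_map, nan_after_drop)
```

If that matches your data (playa['type'].unique() would show it), moving the dropna call to after the .map line should clear the error.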