Sports Vector Machine launch_angle accuracy = 1.00


I included more features in the function. Following the suggestion, I added ‘strikes’ and the accuracy reached 0.85! That’s nice!

BUT the issue came when I included ‘launch_angle’ feature… the accuracy is always 1.00. Why?! I do not feel ok with this result.

I included plate_x, plate_z and launch_angle.

Does anyone else get the same result?



Can you share your code (where relevant)?

Maybe somebody will notice something there.



I have copied and pasted the entire script just to be sure. At the bottom is a printout of the data frame with the results of the code.

import codecademylib3_seaborn

import matplotlib.pyplot as plt

from sklearn.svm import SVC

from sklearn.model_selection import train_test_split

from svm_visualization import draw_boundary

from players import aaron_judge, jose_altuve, david_ortiz

import pandas as pd

def strike_area(player, feature):

  fig, ax = plt.subplots()

  player['type'] = player['type'].map({'S':1, 'B': 0})

  player = player.dropna(subset = ['plate_x', 'plate_z', 'type', feature])

  plt.scatter(x=player.plate_x, y=player.plate_z, c=player.type, alpha=0.25)

  training_set, validation_set = train_test_split(player, random_state=1)

  gam = []

  param_c = []

  accuracy_list = []

  for i in range (1, 5):

    for w in range(1,5):

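      # Record the parameters tried so the results table lines up with the accuracies
      gam.append(i)

      param_c.append(w)

      classifier = SVC(kernel='rbf', gamma = i, C=w)

      classifier.fit(training_set[['plate_x', 'plate_z', feature]], training_set.type)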

      accuracy_list.append(classifier.score(validation_set[['plate_x', 'plate_z', feature]], validation_set.type))

  frames = [gam, param_c, accuracy_list]

  dictionary = {'Gamma':gam, 'C':param_c, 'Accuracy':accuracy_list}

  df = pd.DataFrame.from_dict(dictionary)

  print (df.sort_values(by=['Accuracy'], ascending = False))


  classifier.fit(training_set[['plate_x', 'plate_z', feature]], training_set.type)

  score = classifier.score(validation_set[['plate_x', 'plate_z', feature]], validation_set.type)

  draw_boundary(ax, classifier)

  # ax.set_ylim(-2, 6)

  # ax.set_xlim(-3, 3)


  return df


strike_area(aaron_judge, 'launch_angle')



I tried it and got the same result. The explanation I came up with is that there is too little data. About 90% of the rows have NaN as launch_angle, so applying .dropna() leaves only about 10% of the data (282 out of 2989 rows). Moreover, almost all of the remaining rows have 1 (strike) as type, and only 2 rows have 0 (ball).
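To make the class imbalance concrete, here is a minimal sketch of a majority-class baseline: the accuracy you would get by always predicting the most common label. The helper name `majority_baseline` is hypothetical (it is not part of the project code), and the label list is rebuilt from the counts quoted above (282 rows, 2 of them balls) rather than loaded from the real data.

```python
from collections import Counter

def majority_baseline(labels):
    """Return (class counts, accuracy of always predicting the most common class).

    Hypothetical helper for illustration; accepts any iterable of labels.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    most_common_count = counts.most_common(1)[0][1]
    return dict(counts), most_common_count / sum(counts.values())

# Assumption: labels rebuilt from the counts reported in this thread,
# i.e. 282 rows left after dropna, of which 2 are balls (0) and 280 strikes (1).
labels = [1] * 280 + [0] * 2

counts, baseline = majority_baseline(labels)
print(counts)              # {1: 280, 0: 2}
print(round(baseline, 3))  # 0.993
```

So a model that ignores every feature and always answers "strike" already scores about 0.99 on these rows. With the default `train_test_split`, the validation set is roughly a quarter of the 282 rows, so it may well contain no balls at all, in which case that same trivial model scores exactly 1.00. Because the function takes any iterable, you could also pass it `validation_set.type` inside `strike_area` to see how many balls actually ended up in the validation set.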


That makes sense: too little data gives an unreliable result.

Thank you!!