In the Decision Trees project "Find the Flag", question 10 asks you to write a for loop that iterates over different max_depth values, each time creating, training, and testing a decision tree and recording its DecisionTreeClassifier.score. The next two questions have you store and plot these results. Finally, question 13 asks you to say something meaningful about this graph, and here is where I run into my issue.
Each time I run the code, the graph is markedly different, which prevents me from giving a meaningful answer to the question and, more importantly, raises a question of my own:
Is this expected? Why does the graph change so much on every run? Is it because train_test_split produces a different split each time, and the sample size is "too" small for different splits to yield similar graphs?
Below is the code:
import codecademylib3_seaborn
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt

flags = pd.read_csv('flags.csv', header=0)

labels = flags[['Landmass']]
data = flags[["Red", "Green", "Blue", "Gold", "White", "Black", "Orange",
              "Circles", "Crosses", "Saltires", "Quarters", "Sunstars",
              "Crescent", "Triangle"]]

train_data, test_data, train_labels, test_labels = train_test_split(data, labels)

scores = []
for i in range(1, 20):
    tree = DecisionTreeClassifier(random_state=1, max_depth=i)
    tree.fit(train_data, train_labels)
    scores.append(tree.score(test_data, test_labels))

plt.plot(range(1, 20), scores)
plt.show()
plt.close()
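For what it's worth, one way to test my own hypothesis would be to pin the split and see whether the curve becomes reproducible. Below is a minimal sketch of that idea: it uses synthetic stand-in data (I can't attach flags.csv here), and the only change to the logic above is passing a random_state to train_test_split so the split is fixed between runs.

```python
# Sketch: is the run-to-run variation coming from the random split?
# Synthetic stand-in data with the same shape as the flags dataset
# (194 flags, 14 binary-ish features, 6 landmass classes) -- an
# assumption for illustration, not the real CSV.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(194, 14))   # stand-in for the flag features
y = rng.randint(1, 7, size=194)         # stand-in for the Landmass labels

def depth_scores(split_seed):
    # random_state fixes the train/test split, so the same seed
    # always produces the same split (and thus the same curve).
    train_data, test_data, train_labels, test_labels = train_test_split(
        X, y, random_state=split_seed)
    scores = []
    for i in range(1, 20):
        tree = DecisionTreeClassifier(random_state=1, max_depth=i)
        tree.fit(train_data, train_labels)
        scores.append(tree.score(test_data, test_labels))
    return scores

# Same split seed -> identical score curve every time.
assert depth_scores(42) == depth_scores(42)
```

If the curves for two different split seeds look very different from each other while each seed's own curve is perfectly repeatable, that would point at the small test set (roughly 49 flags by default) as the source of the variance.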