Hi everyone!
I’m working through the “Data Scientist: Machine Learning Specialist” path.
Going through the project “Find the Flag!”, about Decision Tree Classification, I arrived at question 11, which asks you to tune the `ccp_alpha` value in `DecisionTreeClassifier`, in the same way we did previously for the `depth` hyperparameter.
However, in `depth`’s case the initial range to loop over was already provided (`range(1, 21)`), while for `ccp_alpha` it wasn’t.
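For context, this is roughly what I mean by the earlier tuning loop (the variable names `X_train`/`X_test`/`y_train`/`y_test`, and the use of `max_depth` and `accuracy_score`, are my assumptions about the project setup):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Try each depth in the given range and record test accuracy
depth_scores = []
for depth in range(1, 21):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=1)
    clf.fit(X_train, y_train)
    depth_scores.append(accuracy_score(y_test, clf.predict(X_test)))
```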
Looking at the documentation, I understood that this value should be a non-negative float, but I was totally unsure about which initial range to use.
My idea was to start with a big range (the same one used for `depth`) and then, depending on the accuracy distribution, narrow it down to the region showing the highest accuracy (which was between 0 and 1).
The Codecademy solution, however, sets the initial range with `ccp = np.logspace(-3, 0, num=20)`.
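In other words, it generates 20 candidate values spread evenly on a log scale between 10⁻³ and 10⁰ = 1, and then reuses the same kind of loop. A minimal sketch, assuming the same variable names as above:

```python
import numpy as np

# 20 candidate alphas from 0.001 to 1, evenly spaced in log space
ccp = np.logspace(-3, 0, num=20)

ccp_scores = []
for alpha in ccp:
    clf = DecisionTreeClassifier(ccp_alpha=alpha, random_state=1)
    clf.fit(X_train, y_train)
    ccp_scores.append(accuracy_score(y_test, clf.predict(X_test)))

best_alpha = ccp[np.argmax(ccp_scores)]
```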
My question is: why did they use a log-spaced distribution of values as the initial range for `ccp_alpha`? How did they decide on the boundaries?
And more generally: is there a rule of thumb for defining the initial range when tuning `ccp_alpha`?
Thanks for the help!
Happy coding
Alberto