Supervised learning, advanced classification model

I'm stuck on this exercise. Can anyone help, please?


We’ve given you the function find_best_split() that takes a set of data points and a set of labels.

The function returns the index of the feature that causes the best split and the information gain caused by that split.

For now, at the bottom of your code, call this function using car_data and car_labels as parameters and store the values in variables named best_feature and best_gain.

Print those two variables. What was the best feature to split on and what was the information gain?

My code:
from tree import *

car_data = [
    ['med', 'low', '3', '4', 'med', 'med'],
    ['med', 'vhigh', '4', 'more', 'small', 'high'],
    ['high', 'med', '3', '2', 'med', 'low'],
    ['med', 'low', '4', '4', 'med', 'low'],
    ['med', 'low', '5more', '2', 'big', 'med'],
    ['med', 'med', '2', 'more', 'big', 'high'],
    ['med', 'med', '2', 'more', 'med', 'med'],
    ['vhigh', 'vhigh', '2', '2', 'med', 'low'],
    ['high', 'med', '4', '2', 'big', 'low'],
    ['low', 'low', '2', '4', 'big', 'med'],
]

car_labels = ['acc', 'acc', 'unacc', 'unacc', 'unacc', 'vgood', 'acc', 'unacc', 'unacc', 'good']

def find_best_split(dataset, labels):
    best_gain = 0
    best_feature = 0
    for feature in range(len(dataset[0])):
        data_subsets, label_subsets = split(dataset, labels, feature)
        gain = information_gain(labels, label_subsets)
        if gain > best_gain:
            best_gain, best_feature = gain, feature
    return best_feature, best_gain

def best_feature(car_data, car_labels):
    best_feature = find_best_split(car_data, car_labels)

def best_gain(car_data, car_labels):
    best_gain = find_best_split(car_data, car_labels)

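best_feature should be the first item returned by the call to find_best_split, and best_gain the second. The function returns both values at once, so you don't need to define new functions named best_feature and best_gain. Call find_best_split once at the bottom of your file, unpack the result into the two variables, and print them.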
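For anyone who wants to try this outside the exercise environment, here is a minimal sketch of that call. The tree module from the exercise isn't shown in this thread, so gini, split and information_gain below are hypothetical stand-ins (a Gini-impurity version) and the real helpers may behave differently. The part that answers the question is the last three lines.

```python
from collections import Counter

# Hypothetical stand-ins for the helpers that `from tree import *` supplies
# in the exercise. They are assumptions, not the course's actual code.
def gini(labels):
    """Gini impurity of a list of labels."""
    total = len(labels)
    return 1 - sum((count / total) ** 2 for count in Counter(labels).values())

def split(dataset, labels, column):
    """Group rows and their labels by the value found in `column`."""
    data_subsets, label_subsets = {}, {}
    for row, label in zip(dataset, labels):
        data_subsets.setdefault(row[column], []).append(row)
        label_subsets.setdefault(row[column], []).append(label)
    return list(data_subsets.values()), list(label_subsets.values())

def information_gain(starting_labels, split_labels):
    """Impurity before the split minus the weighted impurity after it."""
    gain = gini(starting_labels)
    for subset in split_labels:
        gain -= gini(subset) * len(subset) / len(starting_labels)
    return gain

# Same function as in the question.
def find_best_split(dataset, labels):
    best_gain = 0
    best_feature = 0
    for feature in range(len(dataset[0])):
        data_subsets, label_subsets = split(dataset, labels, feature)
        gain = information_gain(labels, label_subsets)
        if gain > best_gain:
            best_gain, best_feature = gain, feature
    return best_feature, best_gain

car_data = [
    ['med', 'low', '3', '4', 'med', 'med'],
    ['med', 'vhigh', '4', 'more', 'small', 'high'],
    ['high', 'med', '3', '2', 'med', 'low'],
    ['med', 'low', '4', '4', 'med', 'low'],
    ['med', 'low', '5more', '2', 'big', 'med'],
    ['med', 'med', '2', 'more', 'big', 'high'],
    ['med', 'med', '2', 'more', 'med', 'med'],
    ['vhigh', 'vhigh', '2', '2', 'med', 'low'],
    ['high', 'med', '4', '2', 'big', 'low'],
    ['low', 'low', '2', '4', 'big', 'med'],
]
car_labels = ['acc', 'acc', 'unacc', 'unacc', 'unacc', 'vgood', 'acc', 'unacc', 'unacc', 'good']

# The function returns a tuple, so unpack it into two variables in one call.
best_feature, best_gain = find_best_split(car_data, car_labels)
print(best_feature)
print(best_gain)
```

In the exercise itself you only need the final three lines, since find_best_split and the helpers are already provided. Wrapping the call in a def, as in the question, stores the whole tuple in a local variable that disappears when the function ends, and nothing ever calls those functions.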
