In the context of this exercise, how many axes of features can you have?
There is no limit to how many axes, or features, a dataset can have. The more features you have for a dataset, the more accurate your predictor can be.
For instance, in addition to the three measurements the exercise uses to tell humans from cyborgs, we could have added features like “Running speed” and “Power”.
Sure, we can have more features, but how would that be modeled with other geometric figures, or would it still be the typical graphs?
Currently it looks like the axes are hard-coded to the x, y, and z fields. If we wanted, we could change which features go on them. Writing code that lets you switch the projection on the fly would take more work than the hard-coded example here.
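As a rough sketch of what changing the axes could look like, the function below lets you pick which three feature columns go on the x, y and z axes instead of hard-coding them. It assumes matplotlib and scikit-learn are installed and uses the iris data mentioned later in this thread; `plot_projection` is a hypothetical helper name, not something from the exercise.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to get a window
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers "3d" on older matplotlib)
from sklearn.datasets import load_iris


def plot_projection(data, labels, feature_names, axes=(0, 1, 2)):
    """Scatter three chosen feature columns on the x, y and z axes.

    `axes` holds the column indices to plot, so switching the projection
    means passing different indices (hypothetical helper for illustration).
    """
    if len(axes) != 3:
        raise ValueError("a 3D scatter needs exactly three feature indices")
    i, j, k = axes
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    # Colour each point by its label so the groups are visible
    ax.scatter(data[:, i], data[:, j], data[:, k], c=labels)
    ax.set_xlabel(feature_names[i])
    ax.set_ylabel(feature_names[j])
    ax.set_zlabel(feature_names[k])
    return ax


iris = load_iris()
# Same data, two different 3D views of its four features
ax_a = plot_projection(iris.data, iris.target, iris.feature_names, axes=(0, 1, 2))
ax_b = plot_projection(iris.data, iris.target, iris.feature_names, axes=(1, 2, 3))
```

Only the `axes` argument changes between the two calls; everything else about the plot stays the same.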
In the context of this exercise, how is the input data not labelled? Aren’t the points labelled as cyborgs, robots, and humans? What, then, is the difference between labelled and unlabelled data?
Hey, from what I understand, the input data the learning algorithm evaluates is not labelled. The labels are only added to the graph afterwards so that humans can interpret it.
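To make that concrete, here is a minimal sketch that assumes the exercise does something like k-means clustering (an assumption; the thread doesn't name the algorithm). The clustering step receives only the feature matrix `X`; the labels `y` are never passed to it and are used only afterwards to compare the clusters with the known groups.

```python
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # features only: this is all the algorithm sees
y = iris.target  # known labels, kept aside for humans to interpret results

# Unsupervised step: fit on X alone, asking for three groups
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X)

# Afterwards, compare each cluster with the labels its points actually have
for cluster_id in range(3):
    members = y[clusters == cluster_id]
    most_common_label, count = Counter(members).most_common(1)[0]
    print(f"cluster {cluster_id}: {len(members)} points, "
          f"mostly {iris.target_names[most_common_label]} ({count})")
```

The cluster numbers are arbitrary; only by lining them up with `y` can a human call one cluster "setosa" (or, in the exercise's story, "cyborgs").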
“The more features you have for a dataset, the more accurate your predictor can be.” → not always. Sometimes adding unrelated features makes the model’s predictions less accurate. For example, assume you have built a model that predicts life satisfaction based on GDP. Then you decide to add ‘name of the country’ as an additional feature (predictor). Your model starts to ‘think’ that people in countries whose names contain a ‘w’, such as Norway, Switzerland and Sweden, are happier. Then you give the model a new data point: a country called Zimbabwe. Although Zimbabwe’s GDP is low, the model predicts higher life satisfaction for its people because the country name contains a ‘w’.
I hope we agree that the name of a country has nothing to do with the life satisfaction of its people. So this example shows that an additional feature (the country name) does not necessarily increase prediction accuracy.
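A rough way to see this effect in code, using made-up synthetic data rather than real country statistics: the target depends only on one "GDP-like" column, and a linear model trained on that column is compared with one that also gets 15 columns of pure noise, standing in for irrelevant features like the country name. The numbers are random and assumed purely for illustration; with only 20 training rows, the extra columns let the model fit noise, so its training error drops while its error on new data rises.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_train, n_test, n_noise = 20, 1000, 15


def make_data(n):
    """Synthetic rows: one relevant feature plus irrelevant noise columns."""
    gdp = rng.normal(size=(n, 1))          # the feature that matters
    noise = rng.normal(size=(n, n_noise))  # features unrelated to the target
    target = 2.0 * gdp[:, 0] + rng.normal(scale=0.5, size=n)
    return gdp, noise, target


gdp_tr, noise_tr, y_tr = make_data(n_train)
gdp_te, noise_te, y_te = make_data(n_test)

# Model A: only the relevant feature
clean = LinearRegression().fit(gdp_tr, y_tr)
mse_clean_train = mean_squared_error(y_tr, clean.predict(gdp_tr))
mse_clean = mean_squared_error(y_te, clean.predict(gdp_te))

# Model B: the relevant feature plus the irrelevant ones
X_tr = np.hstack([gdp_tr, noise_tr])
X_te = np.hstack([gdp_te, noise_te])
noisy = LinearRegression().fit(X_tr, y_tr)
mse_noisy_train = mean_squared_error(y_tr, noisy.predict(X_tr))
mse_noisy = mean_squared_error(y_te, noisy.predict(X_te))

print(f"relevant only:   train MSE {mse_clean_train:.3f}, test MSE {mse_clean:.3f}")
print(f"+{n_noise} irrelevant: train MSE {mse_noisy_train:.3f}, test MSE {mse_noisy:.3f}")
```

The lower training error of the second model is exactly the trap: it looks better on the data it has seen and worse on new data, like the Zimbabwe prediction.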
It would be the latter. We are essentially observing a larger dataset across more axes. This could mean several things: are we observing more data, or running more tests to get the data we plot? In this case, I believe there is no limit to the amount of data we can plot on the axes.
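One common way to look at more features than three axes allow, while sticking to the typical graphs, is a grid of pairwise 2D scatter plots. A minimal sketch with pandas’ `scatter_matrix` on the iris data, assuming pandas, matplotlib and scikit-learn are installed:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to display the figure
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# One 2D scatter plot for every pair of features, histograms on the diagonal,
# points coloured by their label
axes = pd.plotting.scatter_matrix(df, c=iris.target, figsize=(8, 8), diagonal="hist")
```

Each extra feature adds one row and one column to the grid, so this scales past what a 3D plot can show, although each panel only ever shows two features at a time.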
The dataset is already labeled. Run the following snippet to see the labels (aka “target”) used in this example. By the way, keep in mind that the dataset has nothing to do with cyborgs, robots and humans. It’s the classic Iris dataset: measurements of iris flowers from three species. To make the example easier to follow, the exercise wraps this dataset in a storyline of humans, robots and cyborgs that matches its three labels.
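# Credits: https://thatascience.com/learn-machine-learning/iris-dataset/
# Install pandas with `pip install pandas` if you haven't already.
from sklearn.datasets import load_iris
import pandas as pd

# Load the data
iris = load_iris()
# Build a DataFrame of the four features and attach the labels
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target
X = iris.data    # feature matrix
y = iris.target  # labels: 0, 1, 2
# Show which species each label stands for and how many samples each has
print(iris.target_names)
print(df['target'].value_counts())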