I’m curious about what One-Hot Encoding can be used for, and what some real-world examples are. From what I understand, it creates additional columns based on the unique values of the mfr column (the cereal manufacturer or brand name) and assigns each row a 1 or 0 in each of those columns.
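To make that description concrete, here is a minimal plain-Python sketch of what one-hot encoding does to an mfr column. The rows and manufacturer codes below are made-up sample values, and `one_hot` is a hypothetical helper written for illustration. In pandas, `pd.get_dummies(df, columns=["mfr"])` produces the same kind of `mfr_<value>` indicator columns.

```python
# Hypothetical sample rows; the real lesson works on a pandas DataFrame with an "mfr" column.
cereals = [
    {"name": "Cereal A", "mfr": "K"},
    {"name": "Cereal B", "mfr": "G"},
    {"name": "Cereal C", "mfr": "K"},
    {"name": "Cereal D", "mfr": "P"},
]


def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator column per unique value (hypothetical helper)."""
    categories = sorted({row[column] for row in rows})  # unique values, e.g. G, K, P
    encoded = []
    for row in rows:
        # Copy every field except the original categorical column.
        new_row = {key: value for key, value in row.items() if key != column}
        # Exactly one indicator per row is 1: the one matching this row's category.
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


for row in one_hot(cereals, "mfr"):
    print(row)
# First line printed: {'name': 'Cereal A', 'mfr_G': 0, 'mfr_K': 1, 'mfr_P': 0}
```

Each unique manufacturer becomes its own column, and each row has a single 1 marking its manufacturer.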
Hi.
I don’t understand the use case for preferring several dummy variables over one categorical variable.
I find the explanation in the lesson unclear.
It says:
sometimes we need a different approach. This could be because:
We have a nominal categorical variable (like breed of dog), so it doesn’t really make sense to assign numbers like 0,1,2,3,4,5 to our categories, as this could create an order among the species that is not present.
We have an ordinal categorical variable but we don’t want to assume that there’s equal spacing between categories.
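One way to see both of the lesson's reasons is to look at what a model "sees" once categories become numbers. The minimal sketch below uses four made-up breed labels: label encoding gives them the integers 0 to 3, so some breeds end up numerically "closer" than others, whereas one-hot vectors place every pair of distinct breeds at the same distance. The helper names are hypothetical, chosen only for this illustration.

```python
import math

# Hypothetical nominal categories (no natural order between breeds).
breeds = ["beagle", "collie", "poodle", "terrier"]

# Label encoding: one integer per breed; the order here is just alphabetical.
label_codes = {breed: code for code, breed in enumerate(breeds)}

# One-hot encoding: one 0/1 vector per breed.
one_hot_vectors = {
    breed: [1 if breed == other else 0 for other in breeds] for breed in breeds
}


def label_distance(a, b):
    """Distance between two breeds when their integer codes are treated as numbers."""
    return abs(label_codes[a] - label_codes[b])


def one_hot_distance(a, b):
    """Euclidean distance between two one-hot vectors."""
    return math.dist(one_hot_vectors[a], one_hot_vectors[b])


print(label_distance("beagle", "collie"))     # 1
print(label_distance("beagle", "terrier"))    # 3
print(one_hot_distance("beagle", "collie"))   # about 1.414 (sqrt(2))
print(one_hot_distance("beagle", "terrier"))  # about 1.414 (sqrt(2))
```

With label codes, a beagle looks three times "farther" from a terrier than from a collie, an order and spacing that exist only because of how the codes were assigned. This is also where "equal spacing" comes in: if a model uses the codes as a number, it treats each step of 1 as the same size.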
BUT we can create an unordered categorical type, so the first reason is unclear to me.
As for the second reason, why is “equal spacing” a consideration at all for categorical variables?
The lesson here might have explained it poorly, but the real value of one-hot encoding becomes evident during statistical analysis, especially in exploratory data analysis.
It’s closely related to the dummy-variable technique, which assigns numerical values to categories so they can be used for predictive or descriptive purposes. Think of it less as a way of organizing data and more as a way of handling categorical data in a model.
For instance, if we’re predicting survival odds from Titanic embarkation points, or predicting a cereal’s rating from its manufacturer, label encoding can cause problems. In a linear regression, label encoding gives a nominal variable just one coefficient, which isn’t ideal, because the model treats the arbitrary codes as meaningful quantities. For ordinal variables, it assumes a linear relationship between the “steps”. As a very simple example, if ticket class is used to predict survival odds, the model may fit a single coefficient of, say, -0.3, so each move from class 1 to class 2 to class 3 changes the predicted survival odds by the same -0.3. Every class shift is then forced to produce the same constant change in the predicted outcome, even if the real differences between classes are not equal.
One-hot encoding solves this issue by creating a separate indicator variable for each category. This way, each category can be assigned its own coefficient, which gives the model more flexibility and can lead to more accurate predictions.
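Here is a minimal sketch of that difference, assuming a tiny invented dataset of (ticket class, survived) pairs and an ordinary least-squares fit on the 0/1 survived column (a linear probability model, used only to keep the example dependency-free; it is not how you would model real Titanic survival odds). The `least_squares` helper is a hypothetical hand-written normal-equations solver, not a library function.

```python
# Invented illustration data: (ticket_class, survived) pairs, NOT real Titanic records.
data = [(1, 1), (1, 1), (1, 1), (1, 0),
        (2, 1), (2, 1), (2, 0), (2, 0),
        (3, 0), (3, 0), (3, 0), (3, 0)]
y = [survived for _, survived in data]


def least_squares(X, y):
    """Solve the normal equations (X^T X) b = X^T y with Gauss-Jordan elimination."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][-1] / A[i][i] for i in range(n)]


# Label encoding: intercept plus ONE coefficient for the class code 1, 2, 3.
X_label = [[1, cls] for cls, _ in data]
b0, slope = least_squares(X_label, y)
label_preds = {c: b0 + slope * c for c in (1, 2, 3)}

# One-hot (dummy) encoding: intercept plus one indicator per class, with class 1 as baseline.
X_onehot = [[1, int(cls == 2), int(cls == 3)] for cls, _ in data]
c0, c2, c3 = least_squares(X_onehot, y)
onehot_preds = {1: c0, 2: c0 + c2, 3: c0 + c3}

print(label_preds)   # equally spaced: every class step is the same slope (about -0.375)
print(onehot_preds)  # each class gets its own value (about 0.75, 0.5 and 0.0)
```

The label-encoded fit can only move by one fixed step per class, while the one-hot fit reproduces each class's own survival rate. One class is dropped as the baseline because, with an intercept, including indicators for all three classes would make the columns perfectly collinear (the "dummy variable trap").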
Hope this helps. This is just one example of why one-hot encoding is used instead of label encoding.