Hey guys, I'm currently getting familiar with sklearn and working on the Classification Project of the Deep Learning Path: https://www.codecademy.com/paths/build-deep-learning-models-with-tensorflow/tracks/dlsp-classification-track/modules/dlsp-classification/projects/classification-neural-networks-project
In the process of this project (and probably most ML projects ahead), I'm using sklearn's ColumnTransformer to transform certain columns of a pandas DataFrame and leave the others untouched, e.g.:
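    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import StandardScaler

    # use column transformer with standard scaling on numeric features, passthrough other features
    ct = ColumnTransformer([
        ('scaled', StandardScaler(), numeric_features),
    ], remainder='passthrough')
    X_train = ct.fit_transform(X_train)  # fit the scaler on the training data only
    X_test = ct.transform(X_test)        # reuse the fitted scaler on the test data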
When I peek at X_train after the transformation, for example, not only have the values changed (as expected), but I'm also looking at a numpy ndarray (also as expected) with a different column order! My ‘not numeric’ columns got pushed to the end of the array. I'm guessing ColumnTransformer transforms the columns it is told to, puts them in the array first, and then concatenates the remaining columns without transformation.
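To make the reordering concrete, here is a minimal sketch on a made-up DataFrame. The column names `flag`, `age` and `income` are hypothetical and not from the project; the only point is that the passthrough column starts out first and ends up last.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 'flag' comes first and is NOT in numeric_features
df = pd.DataFrame({
    "flag": [0, 1, 0, 1],
    "age": [20.0, 30.0, 40.0, 50.0],
    "income": [1000.0, 2000.0, 3000.0, 4000.0],
})
numeric_features = ["age", "income"]

ct = ColumnTransformer([
    ("scaled", StandardScaler(), numeric_features),
], remainder="passthrough")

out = ct.fit_transform(df)

# The result is a plain ndarray: scaled columns first, passthrough column last
print(type(out), out.shape)
print(out[:, 2])  # the untouched 'flag' values, now in the last position
```

So in this toy case the output order is age, income, flag, even though the input order was flag, age, income.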
Now to my actual question: is there any way I can trace back which column is which? I mean, for my ML algos it should not matter, right? They don’t care what a feature is called, they’re just assigning weights. But I’m ‘scared’ something gets mixed up here and I’m going to draw false conclusions from my models. Is sklearn tracking these attributes anywhere? Do I have access to them? Does it even matter (insert Linkin Park melody here whilst reading)?
Thanks a lot for your answers guys!