Forest Cover Classification

The last project is also done. I reached 82% accuracy. I'm not sure whether that is good enough or whether the model needs further improvement.

  • I tried different combinations of batch_size and learning_rate; these values gave the best accuracy/performance (a rough sketch of that kind of sweep is shown after this list).
  • I tried modifying the model architecture as well. Models with fewer hidden layers gave noticeably lower accuracy, while more complex models increased the running time without raising the accuracy significantly.
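
A minimal sketch of that kind of sweep (the build_model helper, the value grids, and the epoch count here are illustrative assumptions, not the exact code from the project):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import Adam

def build_model(learning_rate, n_features, n_classes):
    # Fresh, compiled model for each hyperparameter combination
    model = Sequential([
        Input(shape=(n_features,)),
        Dense(64, activation='relu'),
        Dense(64, activation='relu'),
        Dense(n_classes, activation='softmax'),
    ])
    model.compile(optimizer=Adam(learning_rate=learning_rate),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

best = None
for lr in [0.01, 0.005, 0.001]:
    for batch_size in [64, 128, 256]:
        model = build_model(lr, x_train.shape[1], 8)
        history = model.fit(x_train, y_train, batch_size=batch_size,
                            epochs=20, validation_split=0.1, verbose=0)
        val_acc = max(history.history['val_accuracy'])
        if best is None or val_acc > best[0]:
            best = (val_acc, lr, batch_size)

print(f"Best val accuracy {best[0]:.3f} with lr={best[1]}, batch_size={best[2]}")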

I am open to any ideas on how to improve my code or model.

Link to GitHub

Link to Colaboratory

Hi @lendoo. Thanks for sharing! It looks like you need to make the Colaboratory notebook public; right now we’re not able to access it. Looking forward to checking out your project :slight_smile:

Hi,

Can you access the notebook now?
This is the link I got.

Sharing my deep learning portfolio project: Forest Cover Classification.

I think it was a good project to complete the deep learning TensorFlow skill path. It gave me a chance not only to practice creating a neural network, but also to practice adjusting hyperparameters and evaluating the model. My accuracy was around 83%.

Hey there everyone!
With the following model,

# Imports used below (tensorflow.keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy

model = Sequential()
model.add(Input(shape=(x_train.shape[1],)))  # one input per feature; shape should be a tuple
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(8, activation='softmax'))  # 8 units so the integer labels 1-7 map directly to output indices

model.compile(optimizer=Adam(learning_rate=0.01),
              loss=SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=200, epochs=50, validation_split=0.1)

I’ve got this classification report:
              precision    recall  f1-score   support

           1       0.86      0.79      0.82     42275
           2       0.83      0.91      0.87     56602
           3       0.87      0.74      0.80      7269
           4       0.81      0.60      0.69       546
           5       0.82      0.30      0.44      1929
           6       0.60      0.80      0.69      3496
           7       0.87      0.77      0.82      4086

    accuracy                           0.84    116203
   macro avg       0.81      0.70      0.73    116203
weighted avg       0.84      0.84      0.83    116203
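
For reference, a report like this one can be generated with scikit-learn (a quick sketch; x_test and y_test are assumed to be the held-out split used for evaluation):

from sklearn.metrics import classification_report

# Predicted class = index of the largest softmax output for each sample
y_pred = model.predict(x_test).argmax(axis=1)
print(classification_report(y_test, y_pred))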

I’ll have my presentation done later, but I’d like to point something out! It was the first time on this Skill Path that I’ve seen info on how to save and open a model! Quite simple, just

model.save('path/to/save')

and then open it with

model = keras.models.load_model('path/to/load')

I would definitely consider making that more than just a side note in the Final Project, both on this path and on the Data Science Path! Knowing that models can be saved and used later in a program is essential! My bad if that is covered somewhere in the paths :stuck_out_tongue:
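
To make that concrete, here is a small sketch of the full round trip (the file name is just an example; x_test is assumed to exist as elsewhere in the project, and on older TensorFlow/Keras versions a plain directory or an .h5 path works as well):

from tensorflow import keras

# Save the trained model (architecture, weights and optimizer state)
model.save('forest_cover_model.keras')

# Later, in a separate script or session, reload it and predict
restored = keras.models.load_model('forest_cover_model.keras')
predictions = restored.predict(x_test)
predicted_classes = predictions.argmax(axis=1)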

Hi all,
My results are not too far from yours but here’s my two cents anyway.
First of all, I did not consider all the columns as features for the final model. I disregarded the following ones:

  • 'Soil_Type7', 'Soil_Type8', 'Soil_Type14', 'Soil_Type15', 'Soil_Type25', 'Soil_Type36', 'Soil_Type37' - these soil types have very few samples compared to the others, so I assumed they were not as relevant. The corresponding samples were removed as well;
  • Hillshade values - my understanding is that these values can be calculated from other features ('Slope' and 'Aspect'), so they should not provide any additional information;
  • 'Horizontal_Distance_To_Roadways' and 'Horizontal_Distance_To_Fire_Points' - my reasoning is that, since the description of the project says the “forest cover types are mainly a result of ecological processes rather than forest management practices”, I wanted to discard any features that suggested human activity;
  • The wilderness areas - based on my quick research, some of these areas are described as having a specific type of tree (for example, the Cache La Poudre Wilderness Area has ponderosa and lodgepole pine forests; see Cache La Poudre Wilderness | National Wilderness Area near Fort Collins, CO). This made me think that the wilderness areas could introduce biased information regarding tree types. I agree this is debatable, but in the end I decided not to include these features.
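
A rough sketch of that filtering with pandas (the file name, the df variable, and the Wilderness_Area1-4 column names are my assumptions):

import pandas as pd

df = pd.read_csv('cover_data.csv')   # illustrative path to the dataset

# Soil types with very few samples: drop both the samples and the columns
rare_soils = ['Soil_Type7', 'Soil_Type8', 'Soil_Type14', 'Soil_Type15',
              'Soil_Type25', 'Soil_Type36', 'Soil_Type37']
df = df[(df[rare_soils] == 0).all(axis=1)]

# Drop the rare soil columns plus the hillshade, man-made-distance and wilderness columns
df = df.drop(columns=rare_soils + [
    'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm',
    'Horizontal_Distance_To_Roadways', 'Horizontal_Distance_To_Fire_Points',
    'Wilderness_Area1', 'Wilderness_Area2', 'Wilderness_Area3', 'Wilderness_Area4'])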
    My model, in summary (a rough Keras sketch of this setup is included after the results below):
  • 3 hidden dense layers (128, 64, and 32 units, respectively);
  • optimizer = Adam(learning_rate=0.005);
  • loss_function = tf.keras.losses.CategoricalCrossentropy();
  • metrics: categorical accuracy and AUC;
  • epochs = 60;
  • batch_size = 256 (I read that processors perform better when the batch size is a power of 2).
    The final results:
                   precision    recall  f1-score   support

       Spruce/Fir       0.78      0.83      0.80     42332
   Lodgepole Pine       0.85      0.83      0.84     56534
   Ponderosa Pine       0.78      0.87      0.82      7127
Cottonwood/Willow       0.77      0.63      0.69       518
            Aspen       0.73      0.50      0.59      1899
      Douglas-fir       0.70      0.55      0.62      3407
        Krummholz       0.86      0.68      0.76      4030
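
A rough Keras sketch of the setup described above (the relu activations, the variable names, and the validation split are assumptions; labels are one-hot encoded since CategoricalCrossentropy is used):

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import Adam

n_features = x_train.shape[1]   # after dropping the columns listed earlier
n_classes = 7

model = Sequential([
    Input(shape=(n_features,)),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(n_classes, activation='softmax'),
])
model.compile(optimizer=Adam(learning_rate=0.005),
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=[tf.keras.metrics.CategoricalAccuracy(), tf.keras.metrics.AUC()])
model.fit(x_train, y_train_one_hot, batch_size=256, epochs=60, validation_split=0.1)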

Feel free to comment!
Cheers.

I’m curious why you chose to use sparse categorical crossentropy instead of one-hot encoding the labels and using normal categorical crossentropy. What is the difference/benefit? I saw an allusion to “sparse” in a hint, but I didn’t spot it covered in the material.

Sparse categorical crossentropy is meant for when the labels are plain integer class indices rather than one-hot vectors! Which I think means each label can be any value from 1 to N, N being the number of categories. The benefit for me is that it was easier to use.

e.g. for a row that falls under category 2 (with three categories in total):
sparse: 2
one-hot: [0, 1, 0]

If I’m wrong, I hope someone can correct me!
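
A tiny example of the two label formats side by side (just a sketch, assuming three classes labelled 0 to 2, which is how Keras usually expects sparse labels):

import numpy as np
from tensorflow.keras.utils import to_categorical

y_int = np.array([2, 0, 1])            # integer labels -> use SparseCategoricalCrossentropy
y_onehot = to_categorical(y_int, 3)    # one-hot labels  -> use CategoricalCrossentropy
# y_onehot is now:
# [[0., 0., 1.],
#  [1., 0., 0.],
#  [0., 1., 0.]]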

Maybe someone can enlighten me, since this topic is not really covered in the Codecademy class?

I trained and validated the model. So far so good. Now I want to use it to make a prediction for a single new sample. How can I apply the ColumnTransformer with StandardScaler to the new data using the scaling values learned from the training features?

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
import pandas as pd

# Scale the continuous columns; pass the binary indicator columns through unchanged
st = ColumnTransformer([("scale", StandardScaler(),
                         ['Elevation', 'Aspect', 'Horizontal_Distance_To_Hydrology',
                          'Vertical_Distance_To_Hydrology',
                          'Horizontal_Distance_To_Roadways',
                          'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm',
                          'Horizontal_Distance_To_Fire_Points'])],
                        remainder='passthrough')
scaled_features = pd.DataFrame(st.fit_transform(self.training_features))

Looking forward to the solution :grinning:
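
One common approach, as a rough sketch (assuming the fitted ColumnTransformer st from the snippet above is kept around, and that training_features and new_sample are placeholder names for the training DataFrame and a one-row DataFrame with the same columns): fit the transformer once on the training data, then call transform, not fit_transform, on new data so the training-set scaling parameters are reused.

import pandas as pd

# Fit once on the training features; this learns the means and standard deviations
scaled_train = pd.DataFrame(st.fit_transform(training_features))

# For new data, reuse the already-fitted transformer: transform() applies
# the training-set scaling without refitting on the new values
scaled_new = pd.DataFrame(st.transform(new_sample))

prediction = model.predict(scaled_new)
predicted_class = prediction.argmax(axis=1)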