Classifying Galaxies Using Convolutional Neural Networks

Hello there everyone!

For the exercise Classifying Galaxies, I’m getting a terrible accuracy score and I don’t know why! The exercise says that, by following the steps, we should end up with a pretty good score:

Your accuracy tells you that your model assigns the highest probability to the correct class more than 60% of the time. For a classification task with over four classes, this is no small feat: a random baseline model would achieve only ~25% accuracy on the dataset. Your AUC tells you that for a random galaxy, there is more than an 80% chance your model would assign a higher probability to a true class than to a false one.

But that’s definitely not the case. Training accuracy is 0.32 and validation accuracy is 0.30. Yikes. The last training log line and .summary() print the following:

categorical_accuracy: 0.3214 - val_loss: 1.3868 - val_categorical_accuracy: 0.3000

Model: "sequential"
Layer (type)                   Output Shape         Param #
conv2d (Conv2D)                (None, 63, 63, 8)    224
max_pooling2d (MaxPooling2D)   (None, 31, 31, 8)    0
conv2d_1 (Conv2D)              (None, 15, 15, 8)    584
max_pooling2d_1 (MaxPooling2   (None, 7, 7, 8)      0
flatten (Flatten)              (None, 392)          0
dense (Dense)                  (None, 16)           6288
dense_1 (Dense)                (None, 4)            68
Total params: 7,164
Trainable params: 7,164
Non-trainable params: 0
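
The shapes and parameter counts in this summary can be checked by hand. The sketch below is only a rough arithmetic check, not part of the exercise: it assumes 128 x 128 RGB inputs (the shape quoted later in this thread), 'valid' padding, and the kernel sizes and strides from the posted model. The helper names are made up for illustration.

```python
# Rough arithmetic check of the summary above (helper names are hypothetical).

def out_size(size, kernel, stride):
    # Output width/height for 'valid' padding: floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1

def conv_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight block per filter, plus one bias each
    return kernel * kernel * in_channels * filters + filters

def dense_params(in_units, out_units):
    # One weight per input/output pair, plus one bias per output unit
    return in_units * out_units + out_units

size = 128  # assumed input height/width
shapes = []
for kernel, stride in [(3, 2), (2, 2), (3, 2), (2, 2)]:  # conv, pool, conv, pool
    size = out_size(size, kernel, stride)
    shapes.append(size)

flat = size * size * 8  # 8 filters in the last convolution
params = [
    conv_params(3, 3, 8),   # conv2d
    conv_params(3, 8, 8),   # conv2d_1
    dense_params(flat, 16), # dense
    dense_params(16, 4),    # dense_1
]

print(shapes)       # [63, 31, 15, 7]
print(flat)         # 392
print(params)       # [224, 584, 6288, 68]
print(sum(params))  # 7164
```

The numbers match the summary, so the architecture itself is wired as intended and the low score has to come from somewhere else.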

I’ll try to adjust things, but I’m wondering if I’ve done something wrong. Here’s the code!

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np

from sklearn.model_selection import train_test_split
from utils import load_galaxy_data

import app

input_data, labels = load_galaxy_data()
batch_size = 5

generator = ImageDataGenerator(rescale=1./128)

x_train, x_test, y_train, y_test = train_test_split(input_data, labels, test_size = 0.2, shuffle=True, random_state=222, stratify=labels)

training_iterator = generator.flow(x_train, y_train, batch_size=5)
validation_iterator = generator.flow(x_test, y_test, batch_size=batch_size)

model = tf.keras.Sequential([
  # input shape and Flatten layer as implied by the summary above
  tf.keras.Input(shape=(128, 128, 3)),
  tf.keras.layers.Conv2D(8, 3, strides=2),
  tf.keras.layers.MaxPooling2D(pool_size=(2,2), strides=(2,2)),
  tf.keras.layers.Conv2D(8, 3, strides=2),
  tf.keras.layers.MaxPooling2D(pool_size=(2,2), strides=(2,2)),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(16, activation='relu'),
  tf.keras.layers.Dense(4, activation='softmax')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss='CategoricalCrossentropy', metrics = ['CategoricalAccuracy'])

model.fit(training_iterator, steps_per_epoch = len(x_train)/batch_size, epochs = 8, validation_data = validation_iterator, validation_steps = len(validation_iterator)/batch_size)

Something I’ve fixed is that the conv2d layers had no activation, but even with relu, it performs poorly. Still 0.3.

Hi @lucasvinzon, after a skim through your code, I’ve noticed two things:

  • the rescale factor in ImageDataGenerator is set to 1./128; it should be 1./255 (you want to normalize the pixel intensity to the 0 to 1 range);
  • when training the model, validation_steps is set to len(validation_iterator)/batch_size; it should be len(x_test)/batch_size, since the iterator’s length is already a number of batches.

I hope this is enough for you to get better results.
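
Both points can be checked with plain arithmetic. This is only a sketch under stated assumptions: 280 validation images (20% of the 1400 images mentioned later in this thread), the batch size of 5 from the original post, and a Keras iterator whose length is the number of batches per pass. The variable names are made up for illustration.

```python
import math

# Assumed values: 20% of 1400 images held out, batch size from the post.
n_validation = 280
batch_size = 5

# Length of the iterator = number of batches needed to see every image once.
batches_per_pass = math.ceil(n_validation / batch_size)

# The posted code divides that length by the batch size a second time,
# so each validation run covers only a fraction of the held-out images.
steps_in_post = batches_per_pass / batch_size

print(batches_per_pass)  # 56
print(steps_in_post)     # about 11.2

# Effect of the rescale factor on the brightest possible pixel value (255).
max_pixel = 255
scaled_128 = max_pixel * (1.0 / 128)  # ends up close to 2, outside [0, 1]
scaled_255 = max_pixel * (1.0 / 255)  # ends up at (approximately) 1

print(scaled_128)
print(scaled_255)
```

With 1./128 the inputs span roughly 0 to 2 instead of 0 to 1, and with the double division only about a fifth of the validation batches are used per epoch.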



Ha!! Finally, I think I understand rescale! It’s supposed to normalize the pixel intensity? In color theory, that would be the 0 to 255 range of the Value property (brightness).
For the validation_steps, I’ll check that. I think I got it: we are iterating through x_test, and the iterator works as a configurator for the iterations.
Thank you so much! :smiley: And welcome back, seems you’ve been away for some time.
Does PT stand for Portuguese or Portugal?

Yes, you’re right, think of it as brightness (ranging from 0 to 255).
Yes, PT stands for Portuguese :slight_smile:
In case you’re interested, I’m posting my solution to this exercise:

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from utils import load_galaxy_data
import app

input_data, labels = load_galaxy_data()

# shapes:
# input_data (1400, 128, 128, 3): 1400 images, 128 by 128 pixels, RGB (3 channels)
# labels (1400, 4): 1400 labels, 4 classification types

X_train, X_test, Y_train, Y_test = train_test_split(input_data, labels, test_size=0.2, random_state=222, stratify=labels)

train_data_generator = ImageDataGenerator(rescale=1.0/255.0)

batch_size_val = 5

train_iterator = train_data_generator.flow(X_train, Y_train, batch_size=batch_size_val)

validation_iterator = train_data_generator.flow(X_test, Y_test, batch_size=batch_size_val)

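model = tf.keras.models.Sequential()

# input layer (assumed from the input_data shape noted above)
model.add(tf.keras.Input(shape=(128, 128, 3)))

# convolution + max pooling layers

# 1st pair
model.add(tf.keras.layers.Conv2D(3, 3, strides=2, padding='valid', activation='relu'))

# 2nd pair
model.add(tf.keras.layers.Conv2D(8, 3, strides=2, padding='same', activation='relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2,2), strides=(2,2)))

# output layer (4 outputs)
# assumed: flatten the feature maps, then a 4-way softmax to match the 4 label columns
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(4, activation='softmax'))
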
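opt = tf.keras.optimizers.Adam(learning_rate=0.001)
loss_function = tf.keras.losses.CategoricalCrossentropy()

model.compile(
  optimizer=opt,
  loss=loss_function,
  metrics=[tf.keras.metrics.CategoricalAccuracy(), tf.keras.metrics.AUC()]
)

model.fit(
  train_iterator,
  steps_per_epoch = len(X_train)/batch_size_val,
  # the number of epochs is not shown in this post;
  # the validation arguments below are assumed from the iterators defined above
  validation_data = validation_iterator,
  validation_steps = len(X_test)/batch_size_val
)
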
With this model, I managed to achieve:
val_categorical_accuracy = 0.7179
val_auc = 0.8935

I would like to know what other people managed to do in this exercise, but I couldn’t find any other forum thread on this topic.

Portugal? Brazil? Angola? I’m from Rio de Janeiro :smiley:
There are chapters in Portuguese if you are willing to join! There is one in São Paulo that tackles Machine Learning, and I’m heading one in Rio to help newcomers. You are more than welcome in both, I guess.

Thank you for your solution! It seems pretty clean.
The deeper we go into Codecademy, the harder it is to find solutions, but eventually we do get answers! It’s nice to be the first sometimes, since you get the chance to review things and clear them up :slight_smile:


I’m from the south of Portugal :slight_smile:
Thanks for your suggestion, I was unaware of the existence of “Chapters” in Codecademy.
