I hope you’re doing great. I downloaded a cancer dataset from Kaggle and built a project to predict the type of cancer. The project took me a week, and it was an interesting experience. The AdaBoost classifier gave the best result (98% accuracy, 95% recall, and 98% F1-score). Since the data was slightly imbalanced, I used SMOTE to balance the dataset, but it didn’t bring a marked improvement. I also applied PCA to see whether performance would improve. I summarized the results of all my models in tables and compared them on four metrics (accuracy, recall, precision, and F1-score).
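For anyone reading along, here is roughly how the four metrics relate to the counts in a binary confusion matrix. This is only a minimal sketch, not the code from the project: the function name `classification_metrics` is hypothetical, the counts are made up for illustration, and it assumes malignant is treated as the positive class.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are counts for the positive class (assumed here to be
    'malignant'); the name and signature are illustrative only.
    """
    total = tp + fp + fn + tn
    # Accuracy: share of all predictions that are correct.
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the samples predicted positive, how many really are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly positive samples, how many were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up counts, NOT the results of the project.
    scores = classification_metrics(tp=8, fp=2, fn=2, tn=88)
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

With imbalanced classes, accuracy can stay high even when recall on the minority class drops, which is one reason to compare models on all four numbers rather than accuracy alone.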
I would be very grateful for any feedback.
I uploaded my code to GitHub:
Thanks in advance!
Congrats on completing the project. A lot of work went into this!
A few thoughts:
Maybe in the README file, rather than pasting the CC-specific directions for the final project, just describe in your own words: 1.) where the data came from, 2.) what’s in the data, e.g. "570 cancer cells and 30 features to determine whether the cancer cells in our data are benign or malignant. Our cancer data contains 2 types of cancers: 1. benign cancer (B) and 2. malignant cancer (M)." (from Kaggle), and 3.) the goals of your project and what you hope to accomplish with it.
You do a good job of using comments and Markdown text to describe what you’re doing as you go through the EDA and model selection. I think anyone (even non-technical people) can follow along.
Your conclusions wrap up everything neatly.
Thank you very much for your suggestions. I really appreciate the time you took to read my code and give your feedback.
I find it to be very helpful.
Have a good time.