Here’s crossing my fingers that Lillian highlights my final project!
This is it: I’ve made it to the end of the data science course with this capstone project. I found data about a wind turbine on Kaggle. The data poster was asking about predicting the power output of the wind turbine 15 days out. While there wasn’t weather forecast data provided, I did put together a model of the power output based on all the parameters given, including those related to weather.
This was a fun project. I’m a professional engineer and work in power generation. Looking at the wind turbine data was fun. It took me about twenty-five to thirty hours to put this together. I found that as I tried to get fancy with plugins for Jupyter Notebook, I started to break matplotlib or even the version of Python being used. I spent a good four hours trying to cobble my environment back together. I did find some great tools for doing exploratory data analysis (see the profile report code in my notebook for more).
I focused mostly on machine learning, as that is what I think is the most interesting aspect of data science. Image processing and NLP are cool topics, but not something I am looking to use at this time. Maybe in another project!
I found my topic was narrow enough that I could show the basic skills I learned from the course, though I still have much to learn. However, I spent almost every day coding. I have built enough of a skill set that I would feel comfortable talking about it in an interview.
I learned through the last few projects that being intentional about what questions I can answer makes a big difference in how much I enjoy the project. In some of the earlier unit capstone projects, I would ask a question that was hard to answer or was missing a lot of information. With this data set, I spent more time making sure I understood what I could and could not answer with the data.
I also pushed myself to be more concise with the code and to write functions wherever I repeated calculations. I’ve been sloppy in the past, copying and pasting code instead of making a function. I’ve noticed that as part of my data analysis I have a bunch of basic functions I use again and again. I need to create a library for myself to speed up data exploration.
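As a rough idea of what one of those reusable helpers could look like, here is a minimal sketch. It is not taken from the notebook: the function names and the column names (`wind_speed`, `active_power`) are made up for illustration, and it uses plain Python so it runs without any extra packages.

```python
from statistics import mean


def summarize_column(values):
    """Return basic statistics for one numeric column, skipping missing (None) entries."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
    }


def summarize_table(table):
    """Apply summarize_column to every column of a {name: list_of_values} table."""
    return {name: summarize_column(column) for name, column in table.items()}


# Hypothetical sample data, invented for this example.
data = {
    "wind_speed": [5.0, 7.5, None, 6.0],
    "active_power": [300.0, 820.0, 650.0, None],
}

summary = summarize_table(data)
for name, stats in summary.items():
    print(name, stats)
```

Putting helpers like these in one module means each new notebook can import them instead of pasting the same cells again.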
I’m excited to explore some of the intermediate and advanced topics on Codecademy to further push myself and increase my skills.
What I would like feedback on is whether the way I normalized and shifted the data is the correct approach. I am not sure the data was modified as it should have been, given the high predictive scores I am getting with the methods used.
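For anyone reviewing, one common cause of overly high scores on time series is leakage: normalizing with statistics computed over the whole data set, or shuffling before splitting so future rows end up in the training set. The sketch below shows one leakage-free ordering under assumptions of mine (a single power series, a made-up horizon of 2 steps, an 80/20 chronological split). It is plain Python with invented numbers, not the code from the notebook; with pandas and scikit-learn the same steps would typically use `shift` and a `StandardScaler` fitted on the training rows only.

```python
from statistics import mean, pstdev


def make_supervised(series, horizon):
    """Pair each reading with the reading `horizon` steps later, which is the target."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return series[:-horizon], series[horizon:]


def chronological_split(features, targets, train_fraction=0.8):
    """Split without shuffling so the test rows come strictly after the training rows."""
    cut = int(len(features) * train_fraction)
    return (features[:cut], targets[:cut]), (features[cut:], targets[cut:])


def fit_scaler(train_values):
    """Compute the mean and standard deviation from the training rows only."""
    sigma = pstdev(train_values) or 1.0  # guard against a constant column
    return mean(train_values), sigma


def apply_scaler(values, mu, sigma):
    """Standardize values with statistics that were fitted elsewhere."""
    return [(v - mu) / sigma for v in values]


# Hypothetical power readings, invented for this example.
power = [100.0, 120.0, 130.0, 90.0, 80.0, 110.0,
         150.0, 160.0, 140.0, 100.0, 95.0, 105.0]
horizon = 2  # assumed number of steps ahead to predict

X, y = make_supervised(power, horizon)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)

# Fit on the training rows, then reuse the same statistics for the test rows.
mu, sigma = fit_scaler(train_X)
train_X_scaled = apply_scaler(train_X, mu, sigma)
test_X_scaled = apply_scaler(test_X, mu, sigma)
```

If the scores drop noticeably after reordering the steps this way, the earlier numbers were probably inflated by leakage rather than by the model itself.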
You can find my final project here: Wind Turbine Capstone Project