Data Scientist - Final Portfolio Project / Capstone - As the Wind Blows

Greetings Coders!

Fingers crossed that Lillian highlights my final project!

This is it: I’ve made it to the end of the data science course with this capstone project. I found data about a wind turbine on Kaggle. The data poster was asking about predicting the power output of the wind turbine 15 days out. While no weather data was provided, I did put together a model of the power output based on all the parameters given, and on those related to weather.

This was a fun project. I’m a professional engineer and work in power generation, so looking at the wind turbine data was fun. It took me about twenty-five to thirty hours to put this together. I found that as I tried to get fancy with plugins for Jupyter Notebook, I started to break matplotlib or even the version of Python being used. I spent a good four hours trying to cobble my environment back together. I did find some great tools for doing exploratory data analysis (see the profile report code in my notebook for more).

I focused mostly on machine learning, as that is what I find the most interesting aspect of data science. Image processing and NLP are cool topics, but not something I am looking to use at this time. Maybe in another project!

I found my topic was narrow enough that I could show the basic skills I learned from the course, though I still have much to learn. However, I spent almost every day coding, and I have built enough of a skill set that I would feel comfortable talking about it in an interview.

I learned through the last few projects that being intentional about what questions I can answer makes a big difference in how much I enjoy the project. In some of the other capstone projects for the units, I would ask a question that was hard to answer or was missing a lot of information. With this data set, I spent more time making sure I understood what I could and could not answer with the data.

I also pushed myself to be more concise with the code and to write functions wherever I repeated calculations. I’ve been sloppy in the past with copying and pasting code instead of making a function. I’ve noticed that as part of my data analysis I have a bunch of basic functions I use again and again. I need to create a library for myself to speed up data exploration.

I’m excited to explore some of the intermediate and advanced topics on Codecademy to further push myself and increase my skills.

What I would like feedback on is whether the way I normalized and shifted the data is the correct approach. I am not sure the data was modified as it should have been, given the high predictive scores I am getting with the methods used.
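
For anyone weighing in on that question, here is a minimal sketch of one leakage-safe ordering for those two steps. It is not the code from the notebook. The toy readings, the function names, the 2-step horizon and the 75/25 split are all hypothetical and chosen only for illustration. The idea is to shift the target first, split by time without shuffling, and compute the scaling statistics from the training portion only, since unusually high scores can come from scaling on the full data set or from letting future rows end up in the training set.

```python
from statistics import mean, pstdev


def shift_target(features, target, horizon):
    """Pair each feature row with the target value `horizon` steps later."""
    if horizon <= 0 or horizon >= len(target):
        raise ValueError("horizon must be between 1 and len(target) - 1")
    return features[:-horizon], target[horizon:]


def chronological_split(values, train_fraction):
    """Split without shuffling so the test rows come strictly after the training rows."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


def fit_scaler(train_values):
    """Compute mean and standard deviation from the training rows only."""
    sigma = pstdev(train_values)
    return mean(train_values), sigma if sigma > 0 else 1.0


def apply_scaler(values, mu, sigma):
    """Standardize values with statistics fitted elsewhere (never refit on test data)."""
    return [(v - mu) / sigma for v in values]


# Hypothetical toy readings, one per time step (not the Kaggle data).
wind_speed = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
power = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0]

# 1. Shift: wind speed at step t is paired with power at step t + 2.
X, y = shift_target(wind_speed, power, horizon=2)

# 2. Split by time, keeping the original order.
X_train, X_test = chronological_split(X, train_fraction=0.75)
y_train, y_test = chronological_split(y, train_fraction=0.75)

# 3. Fit the scaler on the training rows, then apply it to both parts.
mu, sigma = fit_scaler(X_train)
X_train_scaled = apply_scaler(X_train, mu, sigma)
X_test_scaled = apply_scaler(X_test, mu, sigma)
```

If the scores drop noticeably once the steps are done in this order, that would suggest the earlier numbers were helped by information from the test period.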

You can find my final project here: Wind Turbine Capstone Project