Hello, please take a look at the slides, as well as a copy of my jupyter notebook file for the Netflix Stock Data Capstone Project below.
Any feedback is appreciated! Thanks!
JUPYTER NOTEBOOK BELOW:
Introduction
In this project, you will act as a data visualization developer at Yahoo Finance! You will be helping the “Netflix Stock Profile” team visualize the Netflix stock data. In finance, a stock profile is a series of studies, visualizations, and analyses that dive into different aspects of a publicly traded company’s data.
For the purposes of the project, you will only visualize data for the year of 2017. Specifically, you will be in charge of creating the following visualizations:
- The distribution of the stock prices for the past year
- Netflix’s earnings and revenue in the last four quarters
- The actual vs. estimated earnings per share for the four quarters in 2017
- A comparison of the Netflix Stock price vs the Dow Jones Industrial Average price in 2017
Note: We are using the Dow Jones Industrial Average to compare the Netflix stock to the larger stock market. Learn more about why the Dow Jones Industrial Average is a general reflection of the larger stock market here.
During this project, you will analyze, prepare, and plot data. Your visualizations will help the financial analysts assess the risk of the Netflix stock.
After you complete your visualizations, you’ll be creating a presentation to share the images with the rest of the Netflix Stock Profile team. Your slides should include:
- A title slide
- A list of your visualizations and your role in their creation for the “Stock Profile” team
- A visualization of the distribution of the stock prices for Netflix in 2017
- A visualization and a summary of Netflix earnings and revenue for the past four quarters
- A visualization and a brief summary of their estimated versus actual earnings per share
- A visualization of Netflix stock against the Dow Jones stock (to get a sense of the market) in 2017
Financial Data Source: Yahoo Finance
Step 1
Let’s get our notebook ready for visualizing! Import the modules that you’ll be using in this project:
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns
Step 2
Let’s load the datasets and inspect them.
Load NFLX.csv into a DataFrame called netflix_stocks. Then, quickly inspect the DataFrame using print().
Hint: Use the pd.read_csv() function.
Note: In the Yahoo Data, Adj Close represents the closing price adjusted for both dividends and splits. This means this is the true closing stock price for a given business day.
netflix_stocks = pd.read_csv('NFLX.csv')
print(netflix_stocks)
Load DJI.csv into a DataFrame called dowjones_stocks. Then, quickly inspect the DataFrame using print().
Note: You can learn more about why the Dow Jones Industrial Average is an industry reflection of the larger stock market here.
dowjones_stocks = pd.read_csv('DJI.csv')
print(dowjones_stocks)
Load NFLX_daily_by_quarter.csv into a DataFrame called netflix_stocks_quarterly. Then, quickly inspect the DataFrame using print().
netflix_stocks_quarterly = pd.read_csv('NFLX_daily_by_quarter.csv')
print(netflix_stocks_quarterly)
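A minimal sketch of the load-and-inspect pattern from Step 2. The project CSV files are not reproduced in this post, so the snippet reads a tiny made-up CSV from memory: the rows in `sample_csv` and the name `sample_stocks` are hypothetical, and `parse_dates` is an optional extra that the notebook itself does not use.

```python
import io

import pandas as pd

# Hypothetical stand-in for NFLX.csv; the real file has more rows and columns.
sample_csv = io.StringIO(
    "Date,Open,Close,Adj Close,Volume\n"
    "2017-01-01,124.0,140.0,140.0,1000\n"
    "2017-02-01,141.0,142.0,142.0,1200\n"
)

# parse_dates converts the Date column to datetime values instead of strings.
sample_stocks = pd.read_csv(sample_csv, parse_dates=["Date"])

print(sample_stocks.head())   # first rows of the frame
print(sample_stocks.dtypes)   # type of each column
print(sample_stocks.shape)    # (number of rows, number of columns)
```

Compared with print() on the whole DataFrame, .head(), .dtypes, and .shape give a quicker overview when the file is large.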
Step 3
Let’s learn more about our data. The datasets are large and it may be easier to view the entire dataset locally on your computer. Open the CSV files directly from the folder you downloaded for this project.
- NFLX is the stock ticker symbol for Netflix and ^DJI is the stock ticker symbol for the Dow Jones Industrial Average, which is why the CSV files are named accordingly
- In the Yahoo Data, Adj Close is documented as the closing price adjusted for both dividends and splits.
- You can learn more about why the Dow Jones Industrial Average is an industry reflection of the larger stock market here.
Answer the following questions by inspecting the data in NFLX.csv, DJI.csv, and NFLX_daily_by_quarter.csv on your computer.
What year is represented in the data? Look out for the latest and earliest date.
# The year of 2017
- Is the data represented by days, weeks, or months?
- In which ways are the files different?
- What’s different about the columns for netflix_stocks versus netflix_stocks_quarterly?
# Data points are daily in NFLX_daily_by_quarter.csv, and monthly in both NFLX.csv and DJI.csv
# DJI.csv provides monthly data for the Dow Jones index. NFLX.csv provides monthly data for the NFLX stock ticker.
# NFLX_daily_by_quarter.csv provides data for every business day for the NFLX stock ticker.
# The difference between the columns is that the netflix_stocks_quarterly columns provide the values for each day,
# whereas the netflix_stocks columns provide the values for the entire month (e.g. volume is the sum of all days in the month for netflix_stocks).
# Lastly, the netflix_stocks_quarterly dataframe has an extra column to indicate the quarter for each day.
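The Step 3 questions can also be checked in code rather than by eye. The sketch below is one possible approach, using small hypothetical frames (`monthly`, `daily`) in place of the real CSV files; the helper `describe_dates` and the "Q1" labels are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical monthly rows standing in for NFLX.csv.
monthly = pd.DataFrame({"Date": ["2017-01-01", "2017-02-01", "2017-03-01"]})

# Hypothetical daily rows standing in for NFLX_daily_by_quarter.csv.
daily = pd.DataFrame({
    "Date": ["2017-01-03", "2017-01-04", "2017-01-05"],
    "Quarter": ["Q1", "Q1", "Q1"],
})


def describe_dates(df):
    """Return the earliest date, latest date, and median gap between rows."""
    dates = pd.to_datetime(df["Date"])
    median_gap = dates.diff().dropna().median()
    return dates.min(), dates.max(), median_gap


monthly_first, monthly_last, monthly_gap = describe_dates(monthly)
daily_first, daily_last, daily_gap = describe_dates(daily)

print(monthly_first.year, monthly_last.year, monthly_gap)
print(daily_first.year, daily_last.year, daily_gap)

# Columns that appear in the daily frame but not in the monthly one.
extra_columns = set(daily.columns) - set(monthly.columns)
print(extra_columns)
```

A median gap of about one day points to daily data, and a gap of roughly 28 to 31 days points to monthly data.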
Step 4
Great! Now that we have spent some time looking at the data, let’s look at the column names of the DataFrame netflix_stocks using .head().
netflix_stocks.head()
What do you notice? The first two column names are one word each, and the only one that is not is Adj Close!
The term Adj Close is confusing if you don’t read the Yahoo Documentation. In Yahoo, Adj Close is documented as the closing price adjusted for both dividends and splits.
This means this is the column with the true closing price, so these data are very important.
Use Pandas to change the name of the column Adj Close to Price so that it is easier to work with the data. Remember to use inplace=True.
Do this for the Dow Jones and Netflix Quarterly pandas dataframes as well.
Hint: Use .rename().
netflix_stocks.rename(columns={'Adj Close': 'Price'}, inplace=True)
netflix_stocks_quarterly.rename(columns={'Adj Close': 'Price'}, inplace=True)
dowjones_stocks.rename(columns={'Adj Close': 'Price'}, inplace=True)
Run netflix_stocks.head() again to check your column name has changed.
netflix_stocks.head()
Call .head() on the DataFrame dowjones_stocks and netflix_stocks_quarterly.
print('dowjones_stocks:\n', dowjones_stocks.head(), '\n\n\n')
print('netflix_stocks_quarterly:\n', netflix_stocks_quarterly.head())
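As an aside on the rename step: .rename() returns a new DataFrame by default, so assigning the result is an alternative to inplace=True. A small sketch with a hypothetical two-row frame (`sample` and `renamed` are illustrative names, not part of the project):

```python
import pandas as pd

# Hypothetical frame with the same column name as the Yahoo data.
sample = pd.DataFrame({
    "Date": ["2017-01-01", "2017-02-01"],
    "Adj Close": [140.0, 142.0],
})

# Without inplace=True, rename() leaves `sample` untouched and returns a copy.
renamed = sample.rename(columns={"Adj Close": "Price"})

print(list(sample.columns))   # original still has 'Adj Close'
print(list(renamed.columns))  # copy has 'Price'
```

Either style works for this project; the assignment form just makes it explicit which variable holds the renamed data.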
Step 5
In this step, we will be visualizing the Netflix quarterly data!
We want to get an understanding of the distribution of the Netflix quarterly stock prices for 2017. Specifically, we want to see in which quarter stock prices fluctuated the most. We can accomplish this using a violin plot with four violins, one for each business quarter!
- Start by creating a variable ax and setting it equal to sns.violinplot(). This will instantiate a figure and give us access to the axes through the variable name ax.
- Use sns.violinplot() and pass in the following arguments:
  - The Quarter column as the x values
  - The Price column as your y values
  - The netflix_stocks_quarterly dataframe as your data
- Improve the readability of the chart by adding a title of the plot. Add "Distribution of 2017 Netflix Stock Prices by Quarter" by using ax.set_title()
- Change your ylabel to “Closing Stock Price”
- Change your xlabel to “Business Quarters in 2017”
- Be sure to show your plot!
ax = sns.violinplot(data=netflix_stocks_quarterly, x='Quarter', y='Price')
ax.set_title('Distribution of 2017 Netflix Stock Prices by Quarter')
ax.set_ylabel('Closing Stock Price')
ax.set_xlabel('Business Quarters in 2017')
plt.savefig("nflx_distribution.png")
plt.show()
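To back up what the violin plot suggests about which quarter fluctuated the most, a numeric summary per quarter can help. This is only a sketch on hypothetical prices (`sample_quarterly` is made up); with the real data, the same groupby could be run on netflix_stocks_quarterly.

```python
import pandas as pd

# Hypothetical prices; the real values come from NFLX_daily_by_quarter.csv.
sample_quarterly = pd.DataFrame({
    "Quarter": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
    "Price": [130.0, 140.0, 150.0, 150.0, 152.0, 154.0],
})

# Per-quarter minimum, maximum, and standard deviation of the price.
summary = sample_quarterly.groupby("Quarter")["Price"].agg(["min", "max", "std"])
summary["range"] = summary["max"] - summary["min"]
print(summary)

# Quarter with the largest standard deviation in this sample.
most_spread = summary["std"].idxmax()
print(most_spread)
```

The standard deviation and the min-to-max range roughly correspond to the width and height of each violin.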
Step 6
Next, we will chart the performance of the earnings per share (EPS) by graphing the estimate Yahoo projected for the quarter compared to the actual earnings for that quarter. We will accomplish this using a scatter chart.
- Plot the actual EPS by using x_positions and earnings_actual with the plt.scatter() function. Assign red as the color.
- Plot the estimated EPS by using x_positions and earnings_estimate with the plt.scatter() function. Assign blue as the color.
- Often, estimates and actual EPS are the same. To account for this, be sure to set your transparency alpha=0.5 to allow for visibility of overlapping datapoints.
- Add a legend by using plt.legend() and passing in a list with two strings ["Actual", "Estimate"]
- Change the x_ticks label to reflect each quarter by using plt.xticks(x_positions, chart_labels)
- Assign "Earnings Per Share in Cents" as the title of your plot.
x_positions = [1, 2, 3, 4]
chart_labels = ["1Q2017","2Q2017","3Q2017","4Q2017"]
earnings_actual =[.4, .15,.29,.41]
earnings_estimate = [.37,.15,.32,.41 ]
plt.scatter(x_positions, earnings_actual, color='red', alpha=0.5)
plt.scatter(x_positions, earnings_estimate, color='blue', alpha=0.5)
plt.legend(['Actual', 'Estimate'])
plt.xticks(x_positions, chart_labels)
plt.title('Earnings Per Share in Cents')
plt.savefig("earnings_actual_v_estimate.png")
plt.show()
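For the slide summary, the gap between actual and estimated EPS can be computed directly from the same lists. A small sketch (the names `surprises` and `verdict` are illustrative, and the "beat / miss / match" wording is my own shorthand):

```python
chart_labels = ["1Q2017", "2Q2017", "3Q2017", "4Q2017"]
earnings_actual = [0.40, 0.15, 0.29, 0.41]
earnings_estimate = [0.37, 0.15, 0.32, 0.41]

# Actual minus estimate, rounded to two decimals to hide floating-point noise.
surprises = [round(a - e, 2) for a, e in zip(earnings_actual, earnings_estimate)]

for label, surprise in zip(chart_labels, surprises):
    if surprise > 0:
        verdict = "beat"
    elif surprise < 0:
        verdict = "miss"
    else:
        verdict = "match"
    print(f"{label}: {surprise:+.2f} ({verdict})")
```

Quarters where the difference is zero are the ones where the red and blue points overlap in the scatter chart.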
Step 7
Next, we will visualize the earnings and revenue reported by Netflix by mapping two bars side-by-side. We have visualized a similar chart in the second Matplotlib lesson Exercise 4.
As you may recall, plotting side-by-side bars in Matplotlib requires computing the width of each bar beforehand. We have pasted the starter code for that exercise below.
- Fill in the n, t, d, w values for the revenue bars
- Plot the revenue bars by calling plt.bar() with the newly computed x_values and the revenue_by_quarter data
- Fill in the n, t, d, w values for the earnings bars
- Plot the earnings bars by calling plt.bar() with the newly computed x_values and the earnings_by_quarter data
- Create a legend for your bar chart with the labels provided
- Add a descriptive title for your chart with plt.title()
- Add labels to each quarter by assigning the position of the ticks through the code provided. Hint: plt.xticks(middle_x, quarter_labels)
- Be sure to show your plot!
# The metrics below are in billions of dollars
revenue_by_quarter = [2.79, 2.98,3.29,3.7]
earnings_by_quarter = [.0656,.12959,.18552,.29012]
quarter_labels = ["2Q2017","3Q2017","4Q2017", "1Q2018"]
# Revenue
n = 1 # This is our first dataset (out of 2)
t = 2 # Number of datasets
d = 4 # Number of sets of bars
w = 0.8 # Width of each bar
bars1_x = [t*element + w*n for element in range(d)]
# Earnings
n = 2 # This is our second dataset (out of 2)
t = 2 # Number of datasets
d = 4 # Number of sets of bars
w = 0.8 # Width of each bar
bars2_x = [t*element + w*n for element in range(d)]
middle_x = [ (a + b) / 2.0 for a, b in zip(bars1_x, bars2_x)]
labels = ["Revenue", "Earnings"]
plt.xticks(middle_x, quarter_labels)
plt.bar(bars1_x, revenue_by_quarter, color='purple')
plt.bar(bars2_x, earnings_by_quarter, color='orange')
plt.legend(labels)
plt.title('How do quarterly Revenue and Earnings compare for NFLX in 2017?')
plt.savefig("earnings_v_revenue.png")
plt.show()
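To see what the position formula t*element + w*n actually produces, it can be wrapped in a small helper and printed. The function name `bar_positions` is hypothetical; the formula and the n, t, d, w values are the ones from the cell above.

```python
def bar_positions(n, t, d, w):
    """x positions for dataset n (1-based) out of t datasets, d groups, bar width w."""
    return [t * element + w * n for element in range(d)]


revenue_x = bar_positions(n=1, t=2, d=4, w=0.8)
earnings_x = bar_positions(n=2, t=2, d=4, w=0.8)

# Tick positions sit halfway between the two bars of each group.
tick_x = [(a + b) / 2 for a, b in zip(revenue_x, earnings_x)]

# Rounded for display, since float arithmetic can leave tiny residues.
print([round(x, 2) for x in revenue_x])
print([round(x, 2) for x in earnings_x])
print([round(x, 2) for x in tick_x])
```

Each group starts t units after the previous one, and within a group the second bar is shifted right by w. The plt.bar() calls in the notebook do not pass a width, and Matplotlib's default bar width is 0.8, which happens to match w here, so the two bars in each group touch without overlapping.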
Graph Literacy
What are your first impressions looking at the visualized data?
- Does Revenue follow a trend?
- Do Earnings follow a trend?
- Roughly, what percentage of the revenue constitutes earnings?
earnings_from_revenue = []
for a, b in zip(revenue_by_quarter, earnings_by_quarter):
    earnings_from_revenue.append(b/a)
df_evr = pd.DataFrame({'Quarter': quarter_labels,
                       'Earnings from Revenue': earnings_from_revenue})
ax = df_evr.plot(x='Quarter', y='Earnings from Revenue', style='o--')
vals = ax.get_yticks()
ax.set_yticklabels(['{:,.2%}'.format(x) for x in vals])
ax.set_xticks([0, 1, 2, 3])
ax.set_xticklabels(quarter_labels)
ax.set_ylabel('Earnings as % of Revenue')
plt.title('What percentage of revenue constitutes earnings? (Quarterly)')
plt.savefig("revenue_from_earnings.png")
plt.show()
# Both revenue and earnings follow a positive, roughly linear trend
# Earnings make up roughly 2-8% of revenue, and that share rises every quarter
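The percentages behind that answer can be printed without a chart. A short sketch using the same revenue and earnings lists; `margins` and `revenue_growth` are illustrative names, and the quarter-over-quarter growth figure is an extra that the notebook does not compute.

```python
revenue_by_quarter = [2.79, 2.98, 3.29, 3.7]                # billions of dollars
earnings_by_quarter = [0.0656, 0.12959, 0.18552, 0.29012]   # billions of dollars
quarter_labels = ["2Q2017", "3Q2017", "4Q2017", "1Q2018"]

# Earnings divided by revenue for each quarter.
margins = [e / r for r, e in zip(revenue_by_quarter, earnings_by_quarter)]

for label, margin in zip(quarter_labels, margins):
    print(f"{label}: {margin:.1%}")

# Quarter-over-quarter revenue growth, as a fraction of the previous quarter.
revenue_growth = [
    (later - earlier) / earlier
    for earlier, later in zip(revenue_by_quarter, revenue_by_quarter[1:])
]
print([f"{g:.1%}" for g in revenue_growth])
```

The ratios all fall between 2% and 8% and increase from one quarter to the next.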
Step 8
In this last step, we will compare Netflix stock to the Dow Jones Industrial Average in 2017. We will accomplish this by plotting two line charts side by side in one figure.
Since Price, which is the most relevant data, is on the Y axis, let’s place our subplots side by side so that the two Y axes sit next to each other.
- We have set up the code for you on line 1 in the cell below. Complete the figure by passing the following arguments to plt.subplot() for the first plot, and tweaking the third argument for the second plot
  - 1 – the number of rows for the subplots
  - 2 – the number of columns for the subplots
  - 1 – the subplot you are modifying
- Chart the Netflix Stock Prices in the left-hand subplot. Using your data frame, access the Date and Price columns as the x and y axes respectively. Hint: (netflix_stocks['Date'], netflix_stocks['Price'])
- Assign “Netflix” as a title to this subplot. Hint: ax1.set_title()
- For each subplot, set_xlabel to "Date" and set_ylabel to "Stock Price"
- Chart the Dow Jones Stock Prices in the right-hand subplot. Using your data frame, access the Date and Price columns as the x and y axes respectively. Hint: (dowjones_stocks['Date'], dowjones_stocks['Price'])
- Assign “Dow Jones” as a title to this subplot. Hint: ax2.set_title()
- There is some crowding in the Y axis labels; add some space by calling plt.subplots_adjust(wspace=.5)
- Be sure to .show() your plots.
# Left plot Netflix
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'June', 'July', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
x_values = range(len(months))
plt.figure(figsize=(12, 4))
ax1 = plt.subplot(1, 2, 1)
ax1.plot(netflix_stocks['Date'], netflix_stocks['Price'], color='purple')
ax1.set_title('Netflix')
ax1.set_xlabel('Date')
ax1.set_ylabel('Stock Price')
ax1.set_xticks(x_values)
ax1.set_xticklabels(months)
# Right plot Dow Jones
ax2 = plt.subplot(1, 2, 2)
ax2.plot(dowjones_stocks['Date'], dowjones_stocks['Price'])
ax2.set_title('Dow Jones')
ax2.set_xlabel('Date')
ax2.set_ylabel('Stock Price')
ax2.set_xticks(x_values)
ax2.set_xticklabels(months)
plt.suptitle('Does NFLX behave similarly to the DJI in 2017?')
plt.subplots_adjust(wspace=.5)
plt.savefig("nflx_v_dji.png")
plt.show()
- How did Netflix perform relative to Dow Jones Industrial Average in 2017?
- Which was more volatile?
- How do the prices of the stocks compare?
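Because the two series are on very different price scales, one way to approach these questions is to compare percentage changes instead of raw prices. The sketch below uses hypothetical monthly prices (`nflx`, `dji`); with the real data, netflix_stocks['Price'] and dowjones_stocks['Price'] could be passed to the same helpers. The helper names are my own, and "volatility" here simply means the standard deviation of month-over-month percentage changes, which is one common but not the only definition.

```python
import pandas as pd

# Hypothetical monthly prices; substitute the Price columns from the notebook.
nflx = pd.Series([100.0, 110.0, 99.0, 120.0])
dji = pd.Series([20000.0, 20200.0, 20400.0, 20600.0])


def cumulative_return(prices):
    """Price relative to the first observation, minus 1 (0.10 means +10%)."""
    return prices / prices.iloc[0] - 1


def period_volatility(prices):
    """Standard deviation of period-over-period percentage changes."""
    return prices.pct_change().dropna().std()


print(cumulative_return(nflx).iloc[-1], cumulative_return(dji).iloc[-1])
print(period_volatility(nflx), period_volatility(dji))
```

Plotting cumulative_return() for both series on a single axis would also put them on a shared scale, which the side-by-side subplots do not.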