Here is my project. Please give some comments
[python]
#%% md
Introduction
In this project, you will act as a data visualization developer at Yahoo Finance! You will be helping the “Netflix Stock Profile” team visualize the Netflix stock data. In finance, a stock profile is a series of studies, visualizations, and analyses that dive into different aspects of a publicly traded company’s data.
For the purposes of the project, you will only visualize data for the year of 2017. Specifically, you will be in charge of creating the following visualizations:
- The distribution of the stock prices for the past year
- Netflix’s earnings and revenue in the last four quarters
- The actual vs. estimated earnings per share for the four quarters in 2017
- A comparison of the Netflix Stock price vs the Dow Jones Industrial Average price in 2017
Note: We are using the Dow Jones Industrial Average to compare the Netflix stock to the larger stock market. Learn more about why the Dow Jones Industrial Average is a general reflection of the larger stock market here.
During this project, you will analyze, prepare, and plot data. Your visualizations will help the financial analysts assess the risk of the Netflix stock.
After you complete your visualizations, you’ll be creating a presentation to share the images with the rest of the Netflix Stock Profile team. Your slides should include:
- A title slide
- A list of your visualizations and your role in their creation for the “Stock Profile” team
- A visualization of the distribution of the stock prices for Netflix in 2017
- A visualization and a summary of Netflix earnings and revenue for the past four quarters
- A visualization and a brief summary of their estimated versus actual earnings per share
- A visualization of Netflix stock against the Dow Jones stock (to get a sense of the market) in 2017
Financial Data Source: Yahoo Finance
#%% md
Step 1
Let’s get our notebook ready for visualizing! Import the modules that you’ll be using in this project:
- from matplotlib import pyplot as plt
- import pandas as pd
- import seaborn as sns
#%%
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
#%% md
Step 2
#%% md
Let’s load the datasets and inspect them.
#%% md
Load NFLX.csv into a DataFrame called netflix_stocks. Then, quickly inspect the DataFrame using print().
Hint: Use the pd.read_csv() function.
Note: In the Yahoo Data, Adj Close represents the adjusted close price adjusted for both dividends and splits. This means this is the true closing stock price for a given business day.
#%%
netflix_stocks = pd.read_csv("NFLX.csv")
#%%
#inspect first few rows and check the cleanliness of the dataframe
netflix_stocks.head()
#%%
netflix_stocks.info() #seems the datatype is correct
#%%
netflix_stocks.isna().sum()
#%%
for col in netflix_stocks.columns:
    print(netflix_stocks[col].unique())
# netflix_stocks is a clean dataframe
#%% md
Load DJI.csv into a DataFrame called dowjones_stocks. Then, quickly inspect the DataFrame using print().
Note: You can learn more about why the Dow Jones Industrial Average is an industry reflection of the larger stock market here.
#%%
dowjones_stocks = pd.read_csv("DJI.csv")
#%%
#inspect first few rows and check the cleanliness of the dataframe
dowjones_stocks.head()
#%%
dowjones_stocks.info() #seems the datatype is correct
#%%
dowjones_stocks.isna().sum()
#%%
for col in dowjones_stocks.columns:
    print(dowjones_stocks[col].unique())
# dowjones_stocks is a clean dataframe
#%% md
Load NFLX_daily_by_quarter.csv into a DataFrame called netflix_stocks_quarterly. Then, quickly inspect the DataFrame using print().
#%%
netflix_stocks_quarterly = pd.read_csv("NFLX_daily_by_quarter.csv")
#%%
netflix_stocks_quarterly.head()
#%%
netflix_stocks_quarterly.info()
#%%
netflix_stocks_quarterly.isna().sum()
#%%
for col in netflix_stocks_quarterly.columns:
    print(netflix_stocks_quarterly[col].unique())
# netflix_stocks_quarterly is a clean dataframe
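#%% md
The same four inspection calls are repeated for each of the three files. A possible way to condense them is a small helper that reports the dtype, missing count, and distinct count per column. This is only a sketch: the function name summarize and the sample values are made up for illustration and are not taken from the CSV files.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, and distinct count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # distinct non-missing values
    })


# Hypothetical stand-in for one of the CSV files (values are invented)
sample = pd.DataFrame({
    "Date": ["2017-01-01", "2017-02-01", "2017-03-01"],
    "Adj Close": [100.0, 110.0, None],
})
summary = summarize(sample)
print(summary)
```

With the real data this would be called as summarize(netflix_stocks), summarize(dowjones_stocks), and summarize(netflix_stocks_quarterly).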
#%% md
Step 3
#%% md
Let’s learn more about our data. The datasets are large and it may be easier to view the entire dataset locally on your computer. Open the CSV files directly from the folder you downloaded for this project.
- NFLX is the stock ticker symbol for Netflix and ^DJI is the stock ticker symbol for the Dow Jones Industrial Average, which is why the CSV files are named accordingly
- In the Yahoo Data, Adj Close is documented as adjusted close price adjusted for both dividends and splits.
- You can learn more about why the Dow Jones Industrial Average is an industry reflection of the larger stock market here.
Answer the following questions by inspecting the data in NFLX.csv, DJI.csv, and NFLX_daily_by_quarter.csv on your computer.
#%% md
What year is represented in the data? Look out for the latest and earliest date.
#%% md
The data covers the year 2017.
#%% md
- Is the data represented by days, weeks, or months?
- In which ways are the files different?
- What’s different about the columns for netflix_stocks versus netflix_stocks_quarterly?
#%% md
Is the data represented by days, weeks, or months?
In the first two DataFrames (netflix_stocks and dowjones_stocks), the data is presented by month.
In the last DataFrame (netflix_stocks_quarterly), the data is presented by day.
In which ways are the files different?
NFLX.csv and DJI.csv hold monthly data for Netflix and the Dow Jones respectively, while NFLX_daily_by_quarter.csv holds daily Netflix data.
What’s different about the columns for netflix_stocks versus netflix_stocks_quarterly?
The difference is that netflix_stocks_quarterly has an additional “Quarter” column.
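#%% md
The year and the sampling frequency can also be checked in code rather than by eye. The sketch below assumes the Date column holds ISO-formatted date strings; the sample frame is invented for illustration. A median gap of about 30 days suggests monthly rows, and a gap of about 1 day suggests daily rows.

```python
import pandas as pd

# Hypothetical stand-in for the Date column of one of the files
sample = pd.DataFrame(
    {"Date": ["2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01"]}
)

dates = pd.to_datetime(sample["Date"])

# Earliest and latest dates show which year(s) the data covers
first, last = dates.min(), dates.max()

# Days between consecutive rows reveal the sampling frequency
gaps = dates.sort_values().diff().dropna().dt.days
median_gap = gaps.median()

print(first.date(), last.date(), median_gap)
```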
#%% md
Step 4
Great! Now that we have spent some time looking at the data, let’s look at the column names of the DataFrame netflix_stocks using .head().
#%%
netflix_stocks.head()
#%% md
What do you notice? The first two column names are one word each, and the only one that is not is Adj Close!
The term Adj Close is a confusing term if you don’t read the Yahoo Documentation. In Yahoo, Adj Close is documented as adjusted close price adjusted for both dividends and splits.
This means this is the column with the true closing price, so these data are very important.
Use Pandas to change the name of the column Adj Close to Price so that it is easier to work with the data. Remember to use inplace=True.
Do this for the Dow Jones and Netflix Quarterly pandas dataframes as well.
Hint: Use .rename().
#%%
netflix_stocks.rename(columns={"Adj Close": "Price"}, inplace=True)
dowjones_stocks.rename(columns={"Adj Close": "Price"}, inplace=True)
netflix_stocks_quarterly.rename(columns={"Adj Close": "Price"}, inplace=True)
#%% md
Run netflix_stocks.head() again to check your column name has changed.
#%%
netflix_stocks.head()
#%% md
Call .head() on the DataFrame dowjones_stocks and netflix_stocks_quarterly.
#%%
dowjones_stocks.head()
#%%
netflix_stocks_quarterly.head()
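#%% md
As an alternative to inplace=True, rename() returns a new DataFrame by default, which allows the same mapping to be applied to several frames in one pass while leaving the originals untouched. The sketch below uses invented one-row frames, and the dictionary names frames and renamed are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the loaded DataFrames (values are invented)
frames = {
    "netflix": pd.DataFrame({"Date": ["2017-01-01"], "Adj Close": [100.0]}),
    "dowjones": pd.DataFrame({"Date": ["2017-01-01"], "Adj Close": [200.0]}),
}

# rename() without inplace=True returns a renamed copy
renamed = {
    name: df.rename(columns={"Adj Close": "Price"})
    for name, df in frames.items()
}

for name, df in renamed.items():
    print(name, list(df.columns))
```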
#%% md
Step 5
In this step, we will be visualizing the Netflix quarterly data!
We want to get an understanding of the distribution of the Netflix quarterly stock prices for 2017. Specifically, we want to see in which quarter stock prices fluctuated the most. We can accomplish this using a violin plot with four violins, one for each business quarter!
- Start by creating a variable ax and setting it equal to sns.violinplot(). This will instantiate a figure and give us access to the axes through the variable name ax.
- Use sns.violinplot() and pass in the following arguments:
  - The Quarter column as the x values
  - The Price column as your y values
  - The netflix_stocks_quarterly dataframe as your data
- Improve the readability of the chart by adding a title of the plot. Add "Distribution of 2017 Netflix Stock Prices by Quarter" by using ax.set_title()
- Change your ylabel to “Closing Stock Price”
- Change your xlabel to “Business Quarters in 2017”
- Be sure to show your plot!
#%%
ax = sns.violinplot(data=netflix_stocks_quarterly, x="Quarter", y="Price")
ax.set_title("Distribution of 2017 Netflix Stock Prices by Quarter")
ax.set_xlabel("Business Quarters in 2017")
ax.set_ylabel("Closing Stock Price")
plt.show()
#%% md
Graph Literacy
- What are your first impressions looking at the visualized data?
- In what range(s) did most of the prices fall throughout the year?
- What were the highest and lowest prices?
#%% md
Question 1: The average stock price clearly differs between quarters in 2017.
On average, Q4 has the highest Netflix stock prices.
In Q3, the Netflix stock prices fluctuated the most.
Question 2: Most prices fell between 120 and 200.
Question 3: The highest price is above 200 and the lowest price is under 130. We can’t read the exact numbers from the graph alone.
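#%% md
The exact figures that cannot be read off the violin plot can be computed with a groupby. The sketch below uses an invented four-row frame with the same Quarter and Price column names; it is not the real data. It reports min, max, median, and standard deviation per quarter, and uses the max-min range as one simple proxy for how much prices fluctuated.

```python
import pandas as pd

# Hypothetical stand-in for netflix_stocks_quarterly (values are invented)
sample = pd.DataFrame({
    "Quarter": ["Q1", "Q1", "Q2", "Q2"],
    "Price": [100.0, 110.0, 120.0, 150.0],
})

# Per-quarter summary statistics of the closing price
stats = sample.groupby("Quarter")["Price"].agg(["min", "max", "median", "std"])
stats["range"] = stats["max"] - stats["min"]

# Quarter with the widest price range
most_volatile = stats["range"].idxmax()

print(stats)
print("Widest range:", most_volatile)
```

Running the same groupby on netflix_stocks_quarterly would give the exact highest and lowest prices for the answers above.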
#%% md
Step 6
Next, we will chart the performance of the earnings per share (EPS) by graphing the estimate Yahoo projected for the quarter compared to the actual earnings for that quarter. We will accomplish this using a scatter chart.
- Plot the actual EPS by using x_positions and earnings_actual with the plt.scatter() function. Assign red as the color.
- Plot the estimated EPS by using x_positions and earnings_estimate with the plt.scatter() function. Assign blue as the color.
- Often, estimates and actual EPS are the same. To account for this, be sure to set your transparency alpha=0.5 to allow for visibility of overlapping datapoints.
- Add a legend by using plt.legend() and passing in a list with two strings ["Actual", "Estimate"]
- Change the x_ticks label to reflect each quarter by using plt.xticks(x_positions, chart_labels)
- Assign "Earnings Per Share in Cents" as the title of your plot.
#%%
x_positions = [1, 2, 3, 4]
chart_labels = ["1Q2017", "2Q2017", "3Q2017", "4Q2017"]
earnings_actual = [.4, .15, .29, .41]
earnings_estimate = [.37, .15, .32, .41]
# Semi-transparent markers so overlapping points stay visible
plt.scatter(x_positions, earnings_actual, color="red", alpha=.5)
plt.scatter(x_positions, earnings_estimate, color="blue", alpha=.5)
plt.legend(["Actual", "Estimate"])
plt.xticks(x_positions, chart_labels)
plt.title("Earnings Per Share in Cents")
plt.show()
#%% md
Graph Literacy
- What do the purple dots tell us about the actual and estimate earnings per share in this graph? Hint: In color theory red and blue mix to make purple.
#%% md
The purple dots mean that earnings_actual and earnings_estimate overlap in those quarters, i.e. the actual EPS matched the estimate (2Q2017 and 4Q2017 in this data).
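#%% md
The overlap can be confirmed numerically by subtracting the estimate from the actual value for each quarter. The sketch below reuses the lists from the scatter cell; the names surprises and matching are introduced here for illustration. Differences are rounded to two decimals to avoid floating-point noise.

```python
chart_labels = ["1Q2017", "2Q2017", "3Q2017", "4Q2017"]
earnings_actual = [0.40, 0.15, 0.29, 0.41]
earnings_estimate = [0.37, 0.15, 0.32, 0.41]

# Actual minus estimate per quarter (positive means the estimate was beaten)
surprises = {
    label: round(actual - estimate, 2)
    for label, actual, estimate in zip(
        chart_labels, earnings_actual, earnings_estimate
    )
}

# Quarters where the two dots coincide (the purple ones)
matching = [label for label, diff in surprises.items() if diff == 0]

print(surprises)
print(matching)
```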
#%% md
Step 7
#%% md
Next, we will visualize the earnings and revenue reported by Netflix by mapping two bars side-by-side. We have visualized a similar chart in the second Matplotlib lesson Exercise 4.
As you may recall, plotting side-by-side bars in Matplotlib requires computing the width of each bar beforehand. We have pasted the starter code for that exercise below.
- Fill in the n, t, d, w values for the revenue bars
- Plot the revenue bars by calling plt.bar() with the newly computed x_values and the revenue_by_quarter data
- Fill in the n, t, d, w values for the earnings bars
- Plot the earnings bars by calling plt.bar() with the newly computed x_values and the earnings_by_quarter data
- Create a legend for your bar chart with the labels provided
- Add a descriptive title for your chart with plt.title()
- Add labels to each quarter by assigning the position of the ticks through the code provided. Hint: plt.xticks(middle_x, quarter_labels)
- Be sure to show your plot!
#%%
# The metrics below are in billions of dollars
revenue_by_quarter = [2.79, 2.98, 3.29, 3.7]
earnings_by_quarter = [.0656, .12959, .18552, .29012]
quarter_labels = ["2Q2017", "3Q2017", "4Q2017", "1Q2018"]
# Revenue
n = 1  # This is our first dataset (out of 2)
t = 2  # Number of datasets
d = len(revenue_by_quarter)  # Number of sets of bars
w = 0.8  # Width of each bar
bars1_x = [t*element + w*n for element in range(d)]
plt.bar(bars1_x, revenue_by_quarter)
# Earnings
n = 2  # This is our second dataset (out of 2)
t = 2  # Number of datasets
d = len(earnings_by_quarter)  # Number of sets of bars
w = 0.8  # Width of each bar
bars2_x = [t*element + w*n for element in range(d)]
plt.bar(bars2_x, earnings_by_quarter)
# Tick positions halfway between each pair of bars
middle_x = [(a + b) / 2.0 for a, b in zip(bars1_x, bars2_x)]
labels = ["Revenue", "Earnings"]
plt.title("Netflix Revenue and Earnings in $ Billions")
plt.xticks(middle_x, quarter_labels)
plt.legend(labels)
plt.show()
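#%% md
To see what the n, t, d, w values do, the position formula can be wrapped in a function and printed. This is a sketch that assumes the starter-code expression is t*element + w*n; the function name bar_positions is made up for illustration. Each group starts t units after the previous one, and dataset n is shifted right by n bar widths, so the two bars of a quarter sit next to each other.

```python
def bar_positions(n, t, d, w=0.8):
    """x positions for dataset n (1-based) out of t datasets, d bars each."""
    return [t * element + w * n for element in range(d)]


bars1_x = bar_positions(n=1, t=2, d=4)  # revenue bars
bars2_x = bar_positions(n=2, t=2, d=4)  # earnings bars

# Tick positions halfway between the two bars of each quarter
middle_x = [(a + b) / 2.0 for a, b in zip(bars1_x, bars2_x)]

print(bars1_x)
print(bars2_x)
print(middle_x)
```

The revenue bars land near 0.8, 2.8, 4.8, 6.8 and the earnings bars near 1.6, 3.6, 5.6, 7.6, so the ticks fall near 1.2, 3.2, 5.2, 7.2.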
#%% md
Graph Literacy
What are your first impressions looking at the visualized data?
- Does Revenue follow a trend?
- Do Earnings follow a trend?
- Roughly, what percentage of the revenue constitutes earnings?
#%% md
Both revenue and earnings follow an upward trend.
The exact percentage can’t be read from the graph, but earnings are a small proportion of revenue: dividing the two lists gives roughly 2% to 8% per quarter.
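#%% md
The share of revenue that ends up as earnings can be computed directly from the two lists used for the bar chart. A minimal sketch follows; the name margins is introduced here for illustration.

```python
revenue_by_quarter = [2.79, 2.98, 3.29, 3.7]
earnings_by_quarter = [.0656, .12959, .18552, .29012]
quarter_labels = ["2Q2017", "3Q2017", "4Q2017", "1Q2018"]

# Earnings as a percentage of revenue, per quarter
margins = {
    label: round(100 * earnings / revenue, 2)
    for label, earnings, revenue in zip(
        quarter_labels, earnings_by_quarter, revenue_by_quarter
    )
}

for label, pct in margins.items():
    print(f"{label}: {pct}%")
```

The percentage rises every quarter, so earnings grew faster than revenue over this period.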
#%% md
Step 8
In this last step, we will compare Netflix stock to the Dow Jones Industrial Average in 2017. We will accomplish this by plotting two line charts side by side in one figure.
Since Price, which is the most relevant data, is on the Y axis, let’s map our subplots to align horizontally, side by side.
- We have set up the code for you on line 1 in the cell below. Complete the figure by passing the following arguments to plt.subplot() for the first plot, and tweaking the third argument for the second plot:
  - 1 – the number of rows for the subplots
  - 2 – the number of columns for the subplots
  - 1 – the subplot you are modifying
- Chart the Netflix Stock Prices in the left-hand subplot. Using your data frame, access the Date and Price columns as the x and y axes respectively. Hint: (netflix_stocks['Date'], netflix_stocks['Price'])
- Assign “Netflix” as a title to this subplot. Hint: ax1.set_title()
- For each subplot, set_xlabel to "Date" and set_ylabel to "Stock Price"
- Chart the Dow Jones Stock Prices in the right-hand subplot. Using your data frame, access the Date and Price columns as the x and y axes respectively. Hint: (dowjones_stocks['Date'], dowjones_stocks['Price'])
- Assign “Dow Jones” as a title to this subplot. Hint: ax2.set_title()
- There is some crowding in the Y axis labels, add some space by calling plt.subplots_adjust(wspace=.5)
- Be sure to .show() your plots.
#%%
# Left plot: Netflix
ax1 = plt.subplot(1, 2, 1)
plt.plot(netflix_stocks['Date'], netflix_stocks['Price'])
ax1.set_title("Netflix")
ax1.set_xlabel('Date')
ax1.set_ylabel('Stock Price')
plt.xticks(rotation=90)
# Right plot: Dow Jones
ax2 = plt.subplot(1, 2, 2)
plt.plot(dowjones_stocks['Date'], dowjones_stocks['Price'])
ax2.set_title("Dow Jones")
ax2.set_xlabel('Date')
ax2.set_ylabel('Stock Price')
plt.xticks(rotation=90)
# Add horizontal space so the y-axis labels do not crowd each other
plt.subplots_adjust(wspace=0.5)
plt.show()
#%% md
- How did Netflix perform relative to Dow Jones Industrial Average in 2017?
- Which was more volatile?
- How do the prices of the stocks compare?
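#%% md
- They both followed an upward trend
- However, Netflix seems to be more volatile
- The Dow Jones values are far larger than the Netflix share price, though the Dow Jones is an index rather than a single stock, so the two levels are not directly comparable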
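#%% md
Because the two series sit on very different scales, one way to compare them is to rescale each so its first value is 100, and to measure volatility as the standard deviation of period-over-period percentage returns. The sketch below uses invented numbers and hypothetical function names; it is one possible definition of volatility, not the only one.

```python
import pandas as pd


def relative_performance(prices: pd.Series) -> pd.Series:
    """Rescale a price series so that its first value equals 100."""
    return prices / prices.iloc[0] * 100


def volatility(prices: pd.Series) -> float:
    """Standard deviation of period-over-period percentage returns."""
    return prices.pct_change().dropna().std()


# Hypothetical stand-ins for the two Price columns (values are invented)
stock = pd.Series([100.0, 120.0, 110.0, 150.0])
index = pd.Series([20000.0, 20400.0, 20800.0, 21200.0])

print(relative_performance(stock).iloc[-1])
print(relative_performance(index).iloc[-1])
print(volatility(stock), volatility(index))
```

Applied to netflix_stocks['Price'] and dowjones_stocks['Price'], the rescaled series could be plotted on a single shared axis.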
#%% md
Step 9
It’s time to make your presentation! Save each of your visualizations as a png file with plt.savefig("filename.png").
As you prepare your slides, think about the answers to the graph literacy questions. Embed your observations in the narrative of your slideshow!
Remember that your slideshow must include:
- A title slide
- A list of your visualizations and your role in their creation for the “Stock Profile” team
- A visualization of the distribution of the stock prices for Netflix in 2017
- A visualization and a summary of Netflix earnings and revenue for the past four quarters
- A visualization and a brief summary of their estimated versus actual earnings per share
- A visualization of Netflix stock against the Dow Jones stock (to get a sense of the market) in 2017
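#%% md
A minimal sketch of saving a figure to a png file. It saves before any call to plt.show(), since saving afterwards can produce an empty image with some backends. The plotted data, the file name example.png, and the temporary output folder are placeholders; the Agg backend line is only there so the sketch runs without a display and can be dropped inside a notebook.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])  # placeholder data
ax.set_title("Example")

# Save before showing; bbox_inches="tight" trims surrounding whitespace
out_path = os.path.join(tempfile.mkdtemp(), "example.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)

print(out_path)
```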
#%%
[/python]