 # Capstone Project: Analyze Financial Data with Python

Hi guys!

Here’s my take on the capstone project for the Analyze Financial Data with Python skill path. Feel free to have a look. I appreciate any comments or recommendations for improvement.

Thanks!

Here’s the code I used for this assignment:

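``````python
import pandas as pd
import numpy as np
import cvxopt as opt
import matplotlib.pyplot as plt
from cvxopt import blas, solvers
# web.DataReader is used further down; assuming it comes from pandas-datareader
import pandas_datareader.data as web
``````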
``````python
def return_portfolios(expected_returns, cov_matrix):
    port_returns = []
    port_volatility = []
    stock_weights = []

    # Ticker labels come from the index of the expected-returns Series
    selected = (expected_returns.axes)[0]

    num_assets = len(selected)
    num_portfolios = 5000

    for single_portfolio in range(num_portfolios):
        # Get stock portfolio weights by dividing the random number assigned
        # to each stock by the sum of the random numbers
        weights = np.random.random(num_assets)
        weights /= np.sum(weights)
        returns = np.dot(weights, expected_returns)
        volatility = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
        port_returns.append(returns)
        port_volatility.append(volatility)
        stock_weights.append(weights)

    portfolio = {'Returns': port_returns,
                 'Volatility': port_volatility}

    # One weight column per stock
    for counter, symbol in enumerate(selected):
        portfolio[symbol + ' Weight'] = [Weight[counter] for Weight in stock_weights]

    df = pd.DataFrame(portfolio)

    column_order = ['Returns', 'Volatility'] + [stock + ' Weight' for stock in selected]

    df = df[column_order]

    return df
``````
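In case it helps anyone reading along, here is a minimal sketch of the two formulas that return_portfolios() applies to every random weight vector: the portfolio return is the weighted average of the asset returns, and the volatility is the square root of wᵀΣw. The two-asset numbers below are made up purely for illustration and are not from my dataset.

```python
import numpy as np

# Hypothetical daily figures for two assets (illustrative values only)
expected_returns = np.array([0.0010, 0.0020])
cov_matrix = np.array([[0.0001, 0.00002],
                       [0.00002, 0.0004]])

weights = np.array([0.6, 0.4])  # long-only weights that sum to 1

# Portfolio return: weighted average of the asset returns
port_return = weights @ expected_returns

# Portfolio volatility: sqrt(w^T * Sigma * w)
port_volatility = np.sqrt(weights @ cov_matrix @ weights)

print(port_return, port_volatility)
```

Because the covariance between the two assets is small, the portfolio volatility comes out lower than the weighted average of the individual volatilities, which is the diversification effect the scatter plot shows.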
``````python
def optimal_portfolio(returns):
    # Number of assets = number of columns in the returns DataFrame
    n = returns.shape[1]
    returns = np.transpose(returns.values)

    # Risk-aversion values swept to trace out the frontier
    N = 100
    mus = [10**(5.0 * t/N - 1.0) for t in range(N)]

    # Convert to cvxopt matrices
    S = opt.matrix(np.cov(returns))
    pbar = opt.matrix(np.mean(returns, axis=1))

    # Create constraint matrices
    G = -opt.matrix(np.eye(n))   # negative n x n identity matrix
    h = opt.matrix(0.0, (n, 1))
    A = opt.matrix(1.0, (1, n))
    b = opt.matrix(1.0)

    # Calculate efficient frontier weights using quadratic programming
    portfolios = [solvers.qp(mu*S, -pbar, G, h, A, b)['x'] for mu in mus]

    # Calculate risks and returns for the frontier
    returns = [blas.dot(pbar, x) for x in portfolios]
    risks = [np.sqrt(blas.dot(x, S*x)) for x in portfolios]
    # Fit a 2nd-degree polynomial to the frontier curve
    m1 = np.polyfit(returns, risks, 2)
    x1 = np.sqrt(m1[2] / m1[0])
    # Calculate the optimal portfolio
    wt = solvers.qp(opt.matrix(x1 * S), -pbar, G, h, A, b)['x']
    return np.asarray(wt), returns, risks
``````

# Parameter Definition

``````startdate = '2015/5/1'
enddate = '2020/5/1'
symbol = ['DUK', 'MSFT', 'AMD', 'ABBV', 'BABA']
stocks = ['Duke Energy', 'Microsoft', 'Advanced Micro Devices Inc.', 'Abbvie', 'Alibaba']
``````

# Data Extraction

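``````python
adjclose = web.DataReader(symbol, start=startdate, end=enddate, data_source='yahoo')['Adj Close']
for i in range(1,len(symbol)):
    pass  # loop body not shown in the post

# One subplot per stock
plt.figure(figsize = (25, 25))
for i in range(len(symbol)):
    plt.subplot(3, 2, i + 1)
    # The plotting call is not shown in the post; the adjusted close is the likely series
    plt.plot(adjclose[symbol[i]])
    plt.xlabel('Date')
    plt.title(str(symbol[i]), fontsize = 20, fontweight='bold')
plt.savefig('stockperformance.jpeg')
plt.show()
``````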

# Data Analysis on Stocks

``````simpleror = adjclose.pct_change()
print(simpleror)
``````
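For anyone unsure what `pct_change()` computes, here is a small sketch of the same calculation done by hand with NumPy. The prices are hypothetical. Each simple return is today's price divided by yesterday's, minus 1, so the first day has no return (pandas fills it with NaN, which is why I pass `simpleror[1:]` to the optimizer later).

```python
import numpy as np

# Hypothetical adjusted closing prices on three consecutive days
prices = np.array([100.0, 102.0, 99.96])

# Simple rate of return: P_t / P_{t-1} - 1
simple_ror = prices[1:] / prices[:-1] - 1

print(simple_ror)
```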
``````simple_returns = simpleror.mean()
print(simple_returns)
plt.figure()
plt.bar(symbol, simple_returns)
plt.title('Average Daily Returns of Stocks')
plt.xlabel('Stock')
plt.ylabel('Simple Returns')
plt.savefig('dailyreturns.jpeg')
plt.show()
``````
``````simpleror_var = simpleror.var()
print(simpleror_var)
print('The riskiest stock in the portfolio, exhibiting a variance of {} in daily simple returns over the past 5 years is {}.'.format(max(simpleror_var), simpleror_var.idxmax()))
plt.figure()
plt.bar(symbol, simpleror_var)
plt.title('Variance of Daily Stock Returns')
plt.xlabel('Stock')
plt.ylabel('Variance')
plt.savefig('variance.jpeg')
plt.show()
``````
``````simpleror_std = simpleror.std()
print(simpleror_std)
``````
``````simpleror_corr = simpleror.corr()
#simpleror_corr = simpleror_corr.where(simpleror_corr < 1.00)
print(simpleror_corr)
a, b = simpleror_corr.stack().idxmin()
c, d = simpleror_corr.where(simpleror_corr < 1.00).stack().idxmax()
print('{}/{} exhibits the lowest correlation in daily simple returns.'.format(a, b))
print('{}/{} exhibits the greatest correlation in daily simple returns.'.format(c, d))
``````
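The `where(simpleror_corr < 1.00)` step above is there to hide each stock's correlation with itself, which is always 1 and would otherwise win the "greatest correlation" search. Below is a rough NumPy-only sketch of the same idea on a made-up 3x3 matrix; the ticker names and values are hypothetical.

```python
import numpy as np

names = ['A', 'B', 'C']  # hypothetical tickers
corr = np.array([[1.0, 0.2, 0.7],
                 [0.2, 1.0, -0.1],
                 [0.7, -0.1, 1.0]])

# Blank out the diagonal so self-correlation is ignored
off_diag = corr.copy()
np.fill_diagonal(off_diag, np.nan)

# Locate the highest and lowest remaining entries
hi_i, hi_j = np.unravel_index(np.nanargmax(off_diag), off_diag.shape)
lo_i, lo_j = np.unravel_index(np.nanargmin(off_diag), off_diag.shape)

print('{}/{} has the greatest correlation.'.format(names[hi_i], names[hi_j]))
print('{}/{} has the lowest correlation.'.format(names[lo_i], names[lo_j]))
```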

# Mean-Variance Optimization

Generate a set of random portfolio options with their volatility and expected daily returns:

``````portfolio_options = return_portfolios(simple_returns, simpleror.cov())
print(portfolio_options)
``````
``````weights, returns, risks = optimal_portfolio(simpleror[1:])
``````
``````python
print(weights.shape)
simp_returns = simple_returns.to_numpy().reshape(1, 5)
print(simp_returns.shape)
# np.asscalar was removed in NumPy 1.23; .item() extracts the single value instead
opt_returns = np.dot(simp_returns, weights).item()
print(opt_returns)
``````
``````(5, 1)
(1, 5)
0.0032835199914771087
``````
``````python
# Create the figure and axes up front and pass ax to pandas; calling plt.figure()
# first leaves an empty figure because DataFrame.plot.scatter opens its own.
fig, ax = plt.subplots(figsize = (15,15))
portfolio_options.plot.scatter(x='Volatility', y='Returns', fontsize=12, ax=ax)
plt.title('Efficient Frontier', fontsize = 15)
plt.xlabel('Volatility')
plt.ylabel('Daily Returns')
plt.axis([min(portfolio_options['Volatility']) * 0.5, max((portfolio_options['Volatility'])) * 1.5, min(portfolio_options['Returns']) * 0.5, max((portfolio_options['Returns'])) * 1.5])
# Place markers for single stock portfolios
std = np.sqrt(np.diagonal(simpleror.cov()))
plt.scatter(std, simple_returns, marker = 'x', color = 'red', s = 200)
# Efficient frontier from optimal_portfolio()
plt.plot(risks, returns, color = 'purple')
plt.grid(color='grey', linestyle=':', linewidth=2)
plt.savefig('efficientfrontier.jpeg')
plt.show()
``````

# Portfolio Recommendations

``````python
returns_tolerance = np.array([0.0015, 0.0020])
volatility_tolerance = np.array([0.017, 0.022])
# Index 0 holds the lower-volatility thresholds, index 1 the higher-return thresholds
df_lower_volatility = portfolio_options[(portfolio_options.Returns > returns_tolerance[0]) & (portfolio_options.Volatility < volatility_tolerance[0])]
df_higher_returns = portfolio_options[(portfolio_options.Returns > returns_tolerance[1]) & (portfolio_options.Volatility < volatility_tolerance[1])]
print(df_lower_volatility)
print(df_higher_returns)
``````
``````
       Returns  Volatility  DUK Weight  MSFT Weight  AMD Weight  ABBV Weight  BABA Weight
1464  0.001501    0.016963    0.232939     0.383125    0.253342     0.004610     0.125985
3082  0.001503    0.016983    0.131725     0.431681    0.227222     0.034093     0.175278
4384  0.001514    0.016963    0.085769     0.543129    0.210829     0.078017     0.082256

       Returns  Volatility  DUK Weight  MSFT Weight  AMD Weight  ABBV Weight  BABA Weight
4581  0.002006    0.021957    0.011947     0.507696    0.399157     0.039691     0.041509
``````
``````python
# Final picks: row 4384 (lower volatility) and row 4581 (higher returns)
print(portfolio_options.iloc[4384])
print(portfolio_options.iloc[4581])
``````
``````Returns        0.001514
Volatility     0.016963
DUK Weight     0.085769
MSFT Weight    0.543129
AMD Weight     0.210829
ABBV Weight    0.078017
BABA Weight    0.082256
Name: 4384, dtype: float64
Returns        0.002006
Volatility     0.021957
DUK Weight     0.011947
MSFT Weight    0.507696
AMD Weight     0.399157
ABBV Weight    0.039691
BABA Weight    0.041509
Name: 4581, dtype: float64
``````

Did you copy the return_portfolios and optimal_portfolio functions directly from the portfolio optimization section of the course?

I am currently trying to do this project and find myself with no idea how to do the final section:
“Perform a mean-variance portfolio optimization that shows the efficient frontier for the group of stocks you have selected. If the investor is less risky, how should she allocate her funds across the portfolio? If she is more risky, how should she allocate her funds? Indicate multiple investment options at different risk levels and specify the returns.”

I’m sure I could copy and paste the functions from the course into my code and use them but that feels like I’m skipping a step.

How did you know how/why to use cvxopt?

I use the cvxopt package in my optimal_portfolio() function, which I adapted from the source code on page 9/12 of the lesson “Mean-Variance Portfolio Optimization”.

It isn’t critical to the outcome of my analysis, though; I mainly used it to draw the efficient frontier curve on the plot.
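On the "why cvxopt" part: as far as I understand it, `solvers.qp(P, q, G, h, A, b)` solves a quadratic program in the standard form on the first line below. Plugging in the arguments that optimal_portfolio() passes gives the long-only portfolio problem on the last line, where S is the covariance matrix of returns, p̄ is the vector of mean returns, and μ is the risk-aversion value taken from `mus`. Treat this as my reading of the function rather than an official derivation.

```latex
\begin{aligned}
\text{cvxopt form:}\quad
  & \min_{x}\; \tfrac{1}{2}\, x^{\top} P x + q^{\top} x
    \quad \text{s.t. } G x \le h,\; A x = b \\
\text{with}\quad
  & P = \mu S,\quad q = -\bar{p},\quad G = -I,\quad h = 0,\quad
    A = \mathbf{1}^{\top},\quad b = 1 \\
\text{gives:}\quad
  & \min_{x}\; \tfrac{\mu}{2}\, x^{\top} S x - \bar{p}^{\top} x
    \quad \text{s.t. } x \ge 0,\; \mathbf{1}^{\top} x = 1
\end{aligned}
```

A larger μ penalizes variance more heavily, so sweeping μ from 0.1 up to roughly 10⁴ traces out the frontier from the high-return end to the low-risk end.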

Using the gridlines I added to the efficient frontier plot, I read off two target returns (0.0015 and 0.0020) and a rough volatility limit for each:

``````python
returns_tolerance = np.array([0.0015, 0.0020])
volatility_tolerance = np.array([0.017, 0.022])
``````

I then filtered the list of portfolio options (portfolio_options) down to those that satisfy my preset conditions for returns and volatility, and picked my final portfolios from the result: one with lower returns but lower volatility (df_lower_volatility), and one with higher returns but higher volatility, and therefore more risk (df_higher_returns).

``````python
# Index 0 holds the lower-volatility thresholds, index 1 the higher-return thresholds
df_lower_volatility = portfolio_options[(portfolio_options.Returns > returns_tolerance[0]) & (portfolio_options.Volatility < volatility_tolerance[0])]
df_higher_returns = portfolio_options[(portfolio_options.Returns > returns_tolerance[1]) & (portfolio_options.Volatility < volatility_tolerance[1])]
print(df_lower_volatility)
print(df_higher_returns)
``````
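If the filter returns several candidates, one way to narrow them down (not what I did, just a possible refinement) is to take the lowest-volatility portfolio among those that clear the return target. Here is a minimal NumPy sketch of that idea. The arrays and the helper name `pick_portfolio` are hypothetical stand-ins for the Returns and Volatility columns of portfolio_options.

```python
import numpy as np

# Hypothetical stand-ins for portfolio_options.Returns / .Volatility
returns = np.array([0.0012, 0.0016, 0.0017, 0.0021])
volatility = np.array([0.0150, 0.0165, 0.0180, 0.0215])

def pick_portfolio(min_return, max_volatility):
    """Index of the lowest-volatility candidate meeting both limits, or None."""
    mask = (returns > min_return) & (volatility < max_volatility)
    candidates = np.flatnonzero(mask)
    if candidates.size == 0:
        return None
    return int(candidates[np.argmin(volatility[candidates])])

print(pick_portfolio(0.0015, 0.017))  # lower-risk option
print(pick_portfolio(0.0020, 0.022))  # higher-return option
```

Returning None when nothing qualifies makes it obvious that the thresholds need loosening, instead of silently printing an empty table.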

Hope this helps!

I think the highest return “portfolio” will just be 100% of the previously best performing stock. If you add any lower-performing stock, the expected returns should go down.
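A quick way to check this is to compare random long-only portfolios against the best single stock. The sketch below uses made-up expected returns and assumes weights are non-negative and sum to 1 (no shorting or leverage, the same constraints as in optimal_portfolio()). Under those assumptions a portfolio's expected return is a weighted average, so it cannot exceed the largest individual return.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expected daily returns for five stocks
expected_returns = np.array([0.0004, 0.0012, 0.0030, 0.0007, 0.0009])

# 10,000 random long-only portfolios whose weights sum to 1
weights = rng.random((10_000, expected_returns.size))
weights /= weights.sum(axis=1, keepdims=True)

portfolio_returns = weights @ expected_returns
best_single = expected_returns.max()

# Putting 100% into the best stock reproduces its return exactly
all_in = np.eye(expected_returns.size)[expected_returns.argmax()]

print(portfolio_returns.max() <= best_single)
print(all_in @ expected_returns == best_single)
```

That only speaks to expected return, of course. The all-in portfolio also carries that stock's full volatility, which is why it sits at the far end of the frontier.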