Financial Analysis Skill Path with Python, Pandas and NumPy

my capstone ! This was fun. In case you take some time to review this: Thanks in advance.

Please review the code below. You can find my finished presentation at mikesMeanVarOpt - Google Slides

import yfinance as yf

# OF NOTE: End date is non-inclusive. Also, random securities become unavailable intermittently.
#          Watch in console for ommissions.
start_date = '2009-06-30'
end_date = '2023-04-04'


result_data =, start_date, end_date)
result_data['Date'] = result_data.index
result_data = result_data[['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']]
result_data.reset_index(drop=True, inplace=True)

import pandas as pd

# Read csv procuced by
result_data = pd.read_csv('rawProjectTickers.csv')

# Dropping all columns except Adj Close, Date, and 'Unnamed 0'
result_data = result_data.drop(['Close','Close.1','Close.2','Close.3','Close.4','Close.5','Close.6','Close.7',\
                                'Close.8','Close.9','Close.10'], axis=1).reset_index(drop=True)
result_data = result_data.drop(columns=result_data.columns[result_data.columns.str.contains('Open')])
result_data = result_data.drop(columns=result_data.columns[result_data.columns.str.contains('High')])
result_data = result_data.drop(columns=result_data.columns[result_data.columns.str.contains('Low')])
result_data = result_data.drop(columns=result_data.columns[result_data.columns.str.contains('Volume')])

# Convert times to datetime objects.
result_data['Date'] = pd.to_datetime(result_data['Date'], format='%Y-%m-%d')

# Transform all NaNs/NaTs to zero. The next line changes 'Unnamed' column to ints.
# This wasn't necessary, but it helped me see what the 'real' indexes were meant to look like.
result_data = result_data.fillna(value=0)
result_data['Unnamed: 0'] = result_data['Unnamed: 0'].astype('int')

# This was a manual data massage to recreate the ticker labels on the2nd line.
result_data = result_data.rename(columns={'Adj Close':'AAPL', 'Adj Close.1':'AER', \
    'Adj Close.2':'CAH', 'Adj Close.3':'CRM', 'Adj Close.4':'F', 'Adj Close.5':'HAL', \
    'Adj Close.6':'MSFT', 'Adj Close.7':'PNC', 'Adj Close.8':'UPS', 'Adj Close.9':'WYNN', \
    'Adj Close.10':'XOM'})

# Dropped the top row of the dataframe.
result_data = result_data.drop(0, axis=0)

# Drop the unnamed column to establish the new indices, disallow adding the
# old index column, and insure the result is a new dataframe.
result_data = result_data.drop('Unnamed: 0', axis=1).reset_index(drop=True)

# File to be read in and used in

import pandas as pd

# Read csv produced by
result_data = pd.read_csv('modifiedProjectTickers.csv')
result_data = result_data.drop('Unnamed: 0', axis=1)

# Creating a dictionary to name headers and then transforming it into a dataframe.
result_data2 = {'Date': [], 'AAPL': [], 'AER': [], 'CAH': [], 'CRM': [], 'F': [], 'HAL': [], \
    'MSFT':[], 'PNC': [], 'UPS': [], 'WYNN': [], 'XOM': []}
result_data2 = pd.DataFrame(result_data2)

# Function to append a row.
def add_a_row(i_for_modified, i_for_original):
    result_data2.loc[i_for_modified]= result_data.loc[i_for_original]

# Iterating through the csv:
#   Grab the date for the current index. [5:7] is the two-digit month.
#   Test to only examine the date for a Quarterly month.
#   Make sure I don't go past the final element when comparing to the next one.
#   Examine current index and the next one to find the last trading day of the month.
#   Invoke the function above and manage the index for the new df.
i_for_new_df = 0
for i in range(len(result_data)):
    examine_date = result_data.loc[i,'Date']
    if (examine_date[5:7] == '03') | (examine_date[5:7] == '06') | \
        (examine_date[5:7] == '09') | (examine_date[5:7] == '12'):
        if i+1 in range(len(result_data)):
            examine_date_plus_1 = result_data.loc[i+1,'Date']
            if examine_date[5:7] != examine_date_plus_1[5:7]:
                add_a_row(i_for_new_df, i)
                i_for_new_df += 1
# Write the new (fully prepared) csv file.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from rf import return_portfolios, optimal_portfolio

# Read csv produced by
result_data = pd.read_csv('adjClose.csv', index_col=1)#.reset_index(drop=True)
result_data = result_data.drop('Unnamed: 0', axis=1)

my_var = result_data.var()#.reset_index()
#my_var.to_csv('outAllVariance.csv') # This can be done as a Series or as a Dataframe (1 dimension).

my_cov = result_data.cov()#.reset_index()   # BUT this is based on Adj Close, NOT on 
                                            # quarterly returns!!

# Calculate quarterly returns, and expected_returns
selected = list(result_data.columns[:])  # Each ticker
quarterly_returns = result_data[selected].pct_change()
expected_returns = quarterly_returns.mean()

quarterly_cov = quarterly_returns.cov() # THIS is cov based on quarterly returns!
correlQR = quarterly_returns.corr()

single_asset_std=np.sqrt(np.diagonal(quarterly_cov))  # All the standard 

# Call to functions
randomized_weights = return_portfolios(expected_returns, quarterly_cov)
weights, returns, risks = optimal_portfolio(quarterly_returns[1:])    # IS THIS  [1:]  OMITTING ANYTHING?

# Initial plot, followed by the risk/return efficient frontier.
randomized_weights.plot.scatter(x='Volatility', y='Returns', figsize=(10,8), grid=True)
plt.plot(risks, returns, color='aquamarine', marker='v')

# Plot the individual assets.

plt.xlabel('Volatility (Std. Deviation)')
plt.ylabel('Expected Returns')
plt.title('Efficient Frontier')

Also wanted to mention a fairly old post that I found very helpful in my approach to completing this skillpath. It was posted by
py7378625778 with great feedback from lisalisaj

Thanks for that posting thread. :smiley:

1 Like

Congrats on completing the project.

Some thoughts:

  • rather than post the code here, include a link to the GitHub repo with the notebook.

  • I’m a fan of Slides over Powerpoint (that’s just my .02 & I know no one cares. lol)

  • If you can, reduce the amount of text on the slides. You could make each key point a brief bulleted list. Reasoning: if you’re presenting this to people, they don’t want to read a bunch of text on a slide; but rather key points.

  • Slides 7 & 9, need to have one column on them (one seems to be greyed out).

  • As far as including code snippets–these would be great in an appendix rather than the main slide presentation. It’s about ‘knowing your audience’–and if they’re non-technical, these slides will mean nothing to them. If people are interested in the code part of your analysis, you can refer them to the appendix, or to your GitHub repo.

  • unrelated-nice shot of the Golden Gate Bridge(?) at the end.

Good work! Keep going.

Thank you for the feedback. I totally agree with your comments about the slides. I broke my own rules here; it’s hard to “present” when you’re not presenting.

I have 3 or 4 notebooks that I could post, but most of them are related to other financial analysis (REITs and such). This is really an interest for me, rather than all about my main vocation. Perhaps I’ll try one of the python project selections, set it up in an iPython nb, and then post that to my GitHub account! Nice idea - thanks again.

(The bridge pic was inside the Google Slides template.)

1 Like

Nice work.
Would be interesting to see where your portfolio with optimal weights will lie along the efficient frontier.
Your philosophical comments made this more interesting.
Had no idea Harry lived so long.

Well done!

Thanks for the feedback. I tried to follow the instructions to give the fictitious investor a few different risk options. This was fun.

Can’t remember when I first read about Harry, but was surprised to learn he had passed away just last summer. A life well lived I think