The Efficient Frontier — An Analysis on Portfolio Optimization Through Simulations
Coding has been an immense interest of mine ever since I enrolled in a Mathematics dual degree: it bridges abstract thought (i.e. math and economics) with application (finance and data analysis). This is a recent Python project I have been working on, and it hopefully serves as a stepping stone to better understanding packages like pandas and matplotlib.
Note: The following post is copy-pasted from my Jupyter notebook (link). As a result, there may be a few formatting issues (if you spot any, please let me know). Also, while I have tried to reduce redundancies and improve my coding style, I am by no means a trained coder and would love input on how to better format and optimize my code.
Overview
The purpose of this analysis is to identify, from a pre-determined basket of stocks, the portfolio that maximizes return or minimizes risk (at the investor's discretion) based on historical returns. This form of portfolio analysis was pioneered by Harry Markowitz, has been a staple of portfolio theory ever since, and led to the development of the capital asset pricing model.
Assuming that the investor is rational, risk-averse, profit-maximizing and subject to negligible transaction costs, the optimal portfolio at any given level of risk or return lies along the top edge, i.e. the "frontier", of the model. (Note that this is not a time-series Monte Carlo simulation, which is used to model asset prices according to the random walk theory.)
To determine the frontier, a portfolio of x chosen stocks is constructed. The simulation repeatedly assigns randomized weightings to each stock in the portfolio (for example, Stock A at 40%, Stock B at 50% and Stock C at 10% of the portfolio's total value) across a large number of samples. Each sample is evaluated for its expected return and volatility over the pre-determined time interval, and the results are plotted to visualize the effectiveness of each iteration.
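As a minimal sketch of a single iteration (this standalone example uses numpy's Generator API purely for illustration; the simulation later in this post calls np.random directly):

import numpy as np

rng = np.random.default_rng()  # illustrative only
raw = rng.uniform(size = 3)    # one random draw for a three-stock portfolio
weights = raw / raw.sum()      # normalize so the weights total 100%
print(weights, weights.sum())  # three fractions that add up to 1.0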
The stock data used in this analysis is dated from October 2016 to the end of June 2020 and will focus on the following:
["NVDA", "GOLD","USO", "FTS", "PG"]
However, the code behind this work has been tested and runs on virtually any combination of equities, though it is most efficient with only a handful of them. Simulations over a larger number of equities require far more iterations to produce an adequate spread of randomized weightings for tracing the efficient frontier.
Import the Data from Google
The data is imported using gspread and oauth2client from a customizable Google Spreadsheet with a .json keyfile. Stocks are populated in columns across the spreadsheet, along with rows of prices corresponding to each time interval.
import gspread
from oauth2client.service_account import ServiceAccountCredentials

# Authorize against the Sheets and Drive APIs using the service-account keyfile.
scope = ['https://spreadsheets.google.com/feeds',
         'https://www.googleapis.com/auth/drive']
creds = ServiceAccountCredentials.from_json_keyfile_name('Finance-3dc23d566121.json', scope)
client = gspread.authorize(creds)

# Pull every row of the first worksheet as a list of dicts keyed by the header row.
sheet = client.open('S&P500 Data').sheet1
data = sheet.get_all_records()
Initialize Data with Pandas
The following loads the retrieved data into a pandas DataFrame to quickly check data quality.
import numpy as np
import pandas as pd
headers = list(data[0].keys())

# Split each record into its first field (the row label) and the price columns.
listdata = [list(row.values()) for row in data]
row_labels = [row[0] for row in listdata]
listdata = [row[1:] for row in listdata]

pddata = pd.DataFrame(listdata, columns = headers[1:], index = row_labels)
print(pddata.head())
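As an aside, pandas can build what should be an equivalent frame directly from the list of dicts:

# One-line equivalent: construct from the records, then index by the first column.
pddata = pd.DataFrame(data).set_index(headers[0])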
Choose the Stocks and Simulation Size
To begin the analysis, we pick a list of stocks. For this particular exercise, stocks from different sectors are chosen to demonstrate their differences in return and risk.
stock_list = ["NVDA", "GOLD","USO", "FTS", "PG"]
sim_size = 1000000 # 1M simulations
Generate Randomized Weightings
Simulation counts above 1M were found to increase processing times significantly.
# Draw uniform random weights for every simulation, then normalize each row
# so that each portfolio's weights sum to 1.
weight = np.random.uniform(low = 0.0, high = 1.0, size = (sim_size, len(stock_list)))
weight = weight / weight.sum(axis = 1, keepdims = True)
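One caveat about this sampling scheme: normalizing independent uniforms does not spread portfolios evenly over all valid weightings, as it under-samples the corners where a single stock dominates. If that matters, numpy can draw directly from a flat Dirichlet distribution, which is uniform over the simplex of weights:

# Alternative sampling: a flat Dirichlet (alpha = 1 per stock) is uniform
# over all weight vectors that sum to 1.
weight = np.random.dirichlet(np.ones(len(stock_list)), size = sim_size)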
Calculate Return and Variance
As the retrieved stock data consists of weekly prices, it must first be converted into % returns; this is where pandas remains useful before our calculations.
# The sheet appears to list the most recent week first, so inverting the raw
# percentage change yields the forward-looking gross return (1 + r).
pctdata = 1 / (1 + pddata.pct_change())
cov_table = np.array(pctdata[stock_list].cov())
avgreturn = pctdata[stock_list].mean(axis = 0)
npreturn = np.array(avgreturn)
Portfolio Variance and Volatility
The portfolio variance is the sum of the covariances between every pair of stocks (including each stock with itself), with each term weighted by the product of the two stocks' portfolio weights. Mathematically it is expressed as:
$$\sigma_p^2 = \sum_{i=1}^{n} w_i^2 \sigma_i^2 + 2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_i w_j \operatorname{cov}(r_i, r_j)$$

where $w_i$ is the weight of stock $i$ and $\sigma_i^2$ is the variance of its returns.
var_table = np.empty(len(weight))
for k in range(len(weight)):
    port_var = 0.0
    for i in range(len(cov_table)):
        # Diagonal terms: w_i^2 * var_i.
        port_var += weight[k][i] ** 2 * cov_table[i][i]
        for j in range(i + 1, len(cov_table)):
            # Each off-diagonal pair is counted twice by symmetry.
            port_var += 2 * weight[k][i] * weight[k][j] * cov_table[i][j]
    # Store the square root, i.e. the portfolio's volatility (standard deviation).
    var_table[k] = port_var ** 0.5
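This nested loop is the slowest part of the notebook. It is simply the quadratic form wᵀΣw evaluated once per portfolio, which numpy can vectorize; the following one-liner should produce the same values far faster:

# Vectorized equivalent: evaluate w Σ wᵀ for every row of weights at once.
var_table = np.sqrt(np.einsum('ki,ij,kj->k', weight, cov_table, weight))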
Expected Weighted Returns
ret_table = np.zeros(len(weight))
for i in range(len(weight)):
    # Weighted average of gross returns, minus 1 to convert to a net return.
    for j in range(len(npreturn)):
        ret_table[i] += weight[i][j] * npreturn[j]
    ret_table[i] -= 1
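This loop likewise collapses to a single matrix-vector product:

# Vectorized equivalent: each portfolio's expected gross return, minus 1.
ret_table = weight @ npreturn - 1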
Graphing the Results
import matplotlib as mpl
import matplotlib.pyplot as plt

plt.style.use('seaborn')  # on matplotlib >= 3.8 this style is named 'seaborn-v0_8'
mpl.rcParams['font.family'] = 'serif'
y = ret_table.tolist()
x = var_table.tolist()
plt.figure(figsize = (16, 12), dpi = 80, facecolor = 'w', edgecolor = 'k')
plt.scatter(x, y, s = 5)
plt.xlabel("Portfolio Standard Deviation")
plt.ylabel("Portfolio % Return, Weekly")
plt.title("Return vs. Variance Distribution of Randomized Portfolios")
With an ample number of simulations, a clear relationship between portfolio return and risk emerges along the top edge of the cloud of blue dots.
Note: A 0.002 return is a 0.2% weekly return, or roughly 11% annualized when compounded over 52 weeks.
Pruning Inefficient Portfolios
Any portfolio with both a higher standard deviation and a lower return than some other portfolio is removed, as it is clearly suboptimal.
Naively, identifying the inefficient portfolios means comparing each portfolio against every other one, a worst-case O(n²) operation. However, if the data is sorted by one measure (e.g. ascending standard deviation) and we then keep only the portfolios that improve on the secondary measure (i.e. a higher expected return than anything seen so far), we obtain the frontier in time dominated by the sort (effectively O(n log n)).
# Columns: volatility, return, and the original simulation index.
ret_var = np.vstack((var_table, ret_table, np.arange(sim_size))).T
sorted_retvar = ret_var[np.argsort(ret_var[:, 0], kind = 'quicksort')]

# Sweep in order of increasing volatility, keeping a portfolio only if it
# beats the best return seen so far. Start below any possible return so the
# lowest-volatility portfolio is always kept.
optimals = []
largest = -np.inf
for row in sorted_retvar:
    if row[1] > largest:
        largest = row[1]
        optimals.append(row)
optimals = np.array(optimals)
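A quick sanity check on the pruned set: moving along the frontier, volatility should be non-decreasing and return strictly increasing.

# Both columns should be monotone after the sweep above.
assert np.all(np.diff(optimals[:, 0]) >= 0)
assert np.all(np.diff(optimals[:, 1]) > 0)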
Plot the Optimal Portfolios
y = optimals[:,1]
x = optimals[:,0]
plt.figure(figsize = (16, 12), dpi = 80, facecolor = 'w', edgecolor = 'k')
plt.scatter(x, y, s = 5)
plt.xlabel("Portfolio Standard Deviation")
plt.ylabel("Portfolio % Return, Weekly")
plt.title("The Efficient Frontier of Randomized Portfolios")
After pruning the data, a curve of the optimal portfolio choices begins to emerge. As the number of simulations increases, the line becomes smoother and more accurate.
Backend Analysis of Weightings
# Recover the weight vector behind each frontier portfolio via the stored
# simulation index, then transpose so that each row tracks one stock.
eff_weights = weight[optimals[:, 2].astype(int)].T
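Before plotting, a quick numeric summary gives a feel for the weightings (the formatting here is purely illustrative):

# Average weight of each stock across all frontier portfolios.
for ticker, row in zip(stock_list, eff_weights):
    print(f"{ticker}: mean weight {row.mean():.2%}")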
Cross-examination of Return and Risk
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize = (16,14), dpi = 80, facecolor = 'w', edgecolor = 'k')
ax = fig.add_subplot(111, projection = '3d')
for i, item in enumerate(eff_weights):
ax.scatter(x, y, item, label = str(stock_list[i]))
ax.set_xlabel("Portfolio Standard Deviation")
ax.set_ylabel("Portfolio % Return, Weekly")
ax.set_zlabel("Portfolio Weight")
ax.set_title("The Efficient Frontier of Randomized Portfolios")
ax.legend()
Key Findings
Based on this data, NVDA is clearly the most effective equity in the basket for improving returns, while a combination of PG and FTS serves to hedge away risk. More specifically:
- At a lower risk tolerance, one would optimally hold a collection of stocks in utilities and consumer packaged goods
- An investor that is relatively risk-seeking would benefit from buying more volatile stocks in tech
- The correlation between weight and risk/return is very much evident in NVDA, but less so in PG, FTS and GOLD
- NVDA is the only stock in the list skewed towards returns despite higher risk, which pushes the model to weight it increasingly heavily as risk and return rise
- USO shows low weighting across all optimal portfolio choices, indicating it is neither an effective hedge against risk nor a source of favorable returns