Exploring the 2020 Stock Market Recovery in the United States
You can access the notebook used to generate the analysis presented below here: https://gist.github.com/aarongilman/5054fc12ccde79076e45501ab82c0690
If you prefer to view this post as a Jupyter Notebook, you can view it that way at the end of the post.
The S&P 500 has recovered all of the losses from the COVID crash earlier in the year. Does that also mean that all of the stocks in the index have recovered their losses too? Let’s find out below.
First we are going to gather the necessary data to perform our analysis.
We need to get a list of all of the companies in the S&P 500, along with their size. I am going to use a dataset based on the holdings of the popular ETF SPY, and use the weights as a proxy for size. The ETF weights companies by market capitalization, so it should serve the purpose.
import requests
import pandas as pd

# finnhub_token is assumed to be defined elsewhere with your Finnhub API key
constituents = requests.get('https://finnhub.io/api/v1/etf/holdings?symbol=SPY&token={}'.format(finnhub_token))
constituents_df = pd.DataFrame.from_records(constituents.json()['holdings'])
constituents_df.head(10)
Now that we have a nice list of the companies along with weights, we need to get price data so we can calculate trailing returns for each company, before we move to the next step.
- First we are going to drop the last row, which has None as its ticker and is probably the cash position in the ETF.
- We are going to use adjusted close prices instead of unadjusted close prices, to account for any dividends or splits and put every company on the same footing.
# Keep the first 505 rows, dropping the trailing cash position
constituents_df = constituents_df.iloc[:505]
# Replace dots with dashes in share-class tickers (e.g. BRK.B -> BRK-B) to match the price vendor's format
constituents_df.loc[:, 'symbol'] = constituents_df['symbol'].apply(lambda x: x.replace(".", "-"))
I am going to add the SPY ETF for comparison purposes, RSP (the equal weighted S&P 500 ETF), and XLG (the 50 largest companies in the S&P 500) so we can have something to benchmark against later in this research.
from datetime import datetime

import pandas_datareader.data as web
from dateutil.relativedelta import relativedelta

tickers = list(set(constituents_df['symbol'].to_list())) + ['SPY','RSP','XLG']
adjusted_close = pd.DataFrame(columns=tickers)
null_tickers = []
# The Tiingo reader expects an API key (e.g. via the TIINGO_API_KEY environment variable)
for ticker in tickers:
    try:
        data_panel = web.DataReader([ticker], "tiingo").loc[ticker]['adjClose'].to_frame()
        data_panel.columns = [ticker]
        data_panel.index = pd.to_datetime(data_panel.index)
        # Skip tickers whose most recent price is stale (older than 5 days)
        if data_panel.index.max().tz_localize(None) < datetime.today() - relativedelta(days=5):
            print("{} most recent date is {}".format(ticker, str(data_panel.index.max())))
        else:
            adjusted_close[ticker] = data_panel[ticker]
    except Exception:
        null_tickers.append(ticker)
        print("{} not found".format(ticker))
filtered_adjusted_close = adjusted_close.loc['2019-12-31':]
constituents_df.set_index('symbol', inplace=True)
The time period I am interested in begins with the peak of the S&P 500, which was on the date below. I will measure performance for each constituent from the peak of the market to the last market close.
peak = '2020-02-19'
# Total return from the February peak through the most recent close
for ticker in filtered_adjusted_close.columns:
    constituents_df.loc[ticker, 'peak_to_date'] = (
        filtered_adjusted_close[ticker].iloc[-1] - filtered_adjusted_close[ticker].loc[peak]
    ) / filtered_adjusted_close[ticker].loc[peak]
constituents_df.head(10)
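As an aside, the same numbers can be computed without the loop using pandas’ vectorized arithmetic across columns; a minimal sketch (the peak_to_date_all name is mine, not from the original notebook):

# One-shot alternative to the loop above: latest close divided by the close
# on the peak date, minus 1, for every ticker at once (a Series indexed by ticker)
peak_to_date_all = filtered_adjusted_close.iloc[-1] / filtered_adjusted_close.loc[peak] - 1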
What is the total return of the S&P 500 ETF from the peak in February to the last market close?
print("{}%".format(round(constituents_df.loc['SPY', 'peak_to_date'] * 100, 2)))
What about the equal-weight S&P 500 performance?
print("{}%".format(round(constituents_df.loc['RSP', 'peak_to_date'] * 100, 2)))
What about just the top 50 in the S&P 500?
print("{}%".format(round(constituents_df.loc['XLG', 'peak_to_date'] * 100, 2)))
So, from the peak of the market on 2/19/2020, the S&P 500 as a whole is up over 6.5%, which means the index has recovered all of the losses from the drawdown with some to spare. The equal-weighted version of the index is also positive, but trails the market-cap-weighted equivalent by a little less than 3%. That gap reflects the outperformance of large companies over small ones so far this year. The 10.04% return from just the top 50 companies shows how much the index was buoyed by its largest names: if you had held the S&P 500 without the top 50, you likely would have been negative through the recovery.
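To put a rough number on that last claim, we can drop the 50 largest names, re-normalize the remaining weights, and take the weighted average return. This is only a sketch based on today’s holdings weights (it ignores rebalancing during the period), and the ex_top50 names are mine:

# Approximate ex-top-50 return: drop the benchmark ETF rows, remove the 50
# largest names by weight, re-normalize, and compute the weighted average
ex_top50 = constituents_df.drop(['SPY', 'RSP', 'XLG']).sort_values('percent', ascending=False).iloc[50:]
ex_top50_return = (ex_top50['percent'] / ex_top50['percent'].sum() * ex_top50['peak_to_date']).sum()
print("{}%".format(round(ex_top50_return * 100, 2)))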
constituents_df.drop(['SPY','RSP','XLG'], inplace=True, axis=0)
Breaking it down by Market Cap (size)
Next, we are going to split up the S&P 500 by market cap into deciles, which should give us around 50 companies per decile. We will be using these deciles to group the returns of the companies, to see if there is any pattern or information that may be interesting.
# Split the index into 10 equal-count buckets by weight (decile 9 = largest)
constituents_df.loc[:, 'decile'] = pd.qcut(constituents_df['percent'].to_list(), 10, labels=False)
Decile 9 contains the 50 largest companies, while decile 0 contains the 50 smallest companies in the index.
constituents_df
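A quick sanity check that qcut did what we expect: each decile should hold roughly 50 names, with non-overlapping weight ranges; a minimal sketch:

# Count of names and the weight range within each decile
constituents_df.groupby('decile')['percent'].agg(['count', 'min', 'max'])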
Now that we have all of the data we need to start digging in, we are going to group the data by decile, and look at some descriptive statistics for the return from the peak in February. This will give us the average, min, max, standard deviation, etc. so it will be helpful to get a peek into the variation within each group.
constituents_df.groupby('decile')['peak_to_date'].describe()
On average, the two smallest deciles (0 and 1 above) have not yet fully recovered their losses from the pandemic crash, and the third smallest (decile 2) has barely recovered (slightly above 0 on average). The best performance has been in the top decile, while the second best has been the 4th decile. However, the “std” column shows the variation within each decile, and the 4th decile’s is almost 50% greater than the top decile’s. The 4th decile’s worst performer was -53%, while its best performer returned a massive 163%. Compare this with the top decile, whose worst performer was -34% and best performer was 66%. This demonstrates the uniformity and consistency of the market’s preference for mega cap stocks during the recovery. Market participants appear to have put a premium on company size, perhaps due to the perception that larger companies would be able to weather the storm better.
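To express that dispersion comparison as a single number, the ratio of the two standard deviations makes the point directly; a minimal sketch (assuming the “4th decile” above refers to label 3, counting from the smallest):

# Return dispersion in decile 3 relative to the top decile (label 9)
stds = constituents_df.groupby('decile')['peak_to_date'].std()
print(round(stds.loc[3] / stds.loc[9], 2))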
# Re-normalize the weights so they sum to 1 after dropping SPY/RSP/XLG
constituents_df.loc[:, 'weight'] = constituents_df['percent'] / constituents_df['percent'].sum()
constituents_df
# Each stock's contribution to the index return = weight * return
constituents_df.loc[:, 'return_contribution_peak_to_date'] = constituents_df['weight'] * constituents_df['peak_to_date']
# Share of the total recovery contributed by each decile
constituents_df.groupby('decile')['return_contribution_peak_to_date'].sum() / constituents_df['return_contribution_peak_to_date'].sum()
What this shows is that the top decile contributed around 81% of the recovery performance, and the bottom two deciles detracted from performance, which lines up with our results so far.
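To see which individual names drove that top-decile contribution, we can rank the single-stock contributions; a quick sketch:

# Ten largest single-stock contributions to the recovery
constituents_df['return_contribution_peak_to_date'].nlargest(10)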
To wrap up this part of the analysis, let’s look at a box plot of the performance by decile, which is a nice way to visualize most of what we saw in the data previously.
boxplot = constituents_df.boxplot(column=['peak_to_date'], by=['decile'], figsize=[24,12])
Breaking down the constituents by sector
Let’s bring in some sector data real quick.
# Local CSV with sector labels for each company, indexed on the ticker column
sector_data = pd.read_csv('sp500_sectors.csv', index_col=1)
sector_constituent_df = pd.merge(sector_data, constituents_df, left_index=True, right_index=True)
sector_constituent_df = sector_constituent_df.dropna()
sector_constituent_df.groupby('sector')['peak_to_date'].describe().sort_values(by='mean')
So, even though the S&P 500 in aggregate has recovered all losses from the recession, 4 of the sectors (when equally weighted) have not recovered all of their losses. The rest have recovered (greater than 0 return), but two (Communication Services and Consumer Staples) are lagging behind the broader S&P 500 return. Sentiment around the Energy and Utilities sectors has been negative due to expectations of lower energy consumption as the economy faced the threat of shutdowns across the country; the same goes for Real Estate. Financials have been out of favor due to the low interest rate environment. Bargain shoppers and value-inclined investors may be rummaging through these sectors for stocks that are still beaten down and may have some room to run.
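To pull out just the sectors that are still underwater on an equal-weighted basis, we can filter on the group means; a minimal sketch (the sector_means name is mine):

# Sectors whose average constituent is still below its February 19 close
sector_means = sector_constituent_df.groupby('sector')['peak_to_date'].mean()
print(sector_means[sector_means < 0].sort_values())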
Let’s look inside the energy and real estate sectors real quick to see the distribution of returns for the stocks in each.
# Horizontal bar chart of peak-to-date returns for every energy constituent
sector_constituent_df[sector_constituent_df['sector'] == 'Energy']['peak_to_date'].sort_values().plot(kind='barh', figsize=[12,6], legend=True)
Pretty bleak performance across the entire sector, with only 2 companies that have recovered from the drawdown earlier in the year.
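The two recovered names can be pulled out directly; a quick sketch:

# Energy constituents that are back above their February 19 close
energy = sector_constituent_df[sector_constituent_df['sector'] == 'Energy']
print(energy[energy['peak_to_date'] > 0]['peak_to_date'].sort_values(ascending=False))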
# Horizontal bar chart of peak-to-date returns for every real estate constituent
sector_constituent_df[sector_constituent_df['sector'] == 'Real Estate']['peak_to_date'].sort_values().plot(kind='barh', figsize=[12,6], legend=True)
Real estate performance is mostly still in the red, with a couple of bright spots, which appear to be data center, wireless tower, and storage unit REITs, all of which have pretty strong tailwinds driving their performance.
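The bright spots can likewise be listed directly; a quick sketch:

# Best-performing real estate constituents since the peak
real_estate = sector_constituent_df[sector_constituent_df['sector'] == 'Real Estate']
print(real_estate['peak_to_date'].nlargest(5))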
# Each sector's total contribution to the index's peak-to-date return
sector_constituent_df.groupby('sector')['return_contribution_peak_to_date'].sum().plot(kind='barh', figsize=[12,6], legend=True)
sector_constituent_df.groupby('sector')['return_contribution_peak_to_date'].sum() / sector_constituent_df['return_contribution_peak_to_date'].sum()
Last but not least, looking at the sectors’ contributions to the recovery performance, over 50% of the positive performance is attributable to the technology sector, with consumer discretionary adding about a quarter of the index’s performance.
The bottom line is that while the broad S&P 500 index (both cap-weighted and equal-weighted) has recovered all losses from the pandemic drawdown, stock pickers using the index as their universe have probably had very different results depending on their sector and market cap weights relative to the index.