Exploring the 2020 Stock Market Recovery in the United States

You can access the notebook used to generate the analysis presented below here: https://gist.github.com/aarongilman/5054fc12ccde79076e45501ab82c0690

If you prefer to view this post as a Jupyter Notebook, you can do so at the end of the post.

The S&P 500 has recovered all of the losses from the COVID crash earlier in the year. Does that also mean that all of the stocks in the index have recovered their losses too? Let’s find out below.

First we are going to gather the necessary data to perform our analysis.

We need to get a list of all of the companies in the S&P 500, along with their size. I am going to use a dataset based on the holdings of the popular ETF, SPY, and use the weights as a proxy for size. The ETF weights companies by market capitalization, so it should serve the purpose well.
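Before diving in, here are the imports the analysis below relies on, reconstructed for completeness since the original cells don't show them. Note that finnhub_token is a placeholder for your own Finnhub API key, and the Tiingo reader used later expects a TIINGO_API_KEY environment variable (or an api_key argument).

In [ ]:
from datetime import datetime

import pandas as pd
import pandas_datareader.data as web
import requests
from dateutil.relativedelta import relativedelta

finnhub_token = 'YOUR_FINNHUB_API_KEY'  # placeholder: substitute your own key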

In [50]:
constituents = requests.get('https://finnhub.io/api/v1/etf/holdings?symbol=SPY&token={}'.format(finnhub_token))
In [70]:
constituents_df = pd.DataFrame.from_records(constituents.json()['holdings'])
In [71]:
constituents_df.head(10)
Out[71]:
percent share symbol
0 6.755792 164439440 AAPL
1 5.645525 77406670 MSFT
2 4.981768 4354880 AMZN
3 2.460435 24592506 FB
4 1.704741 3073531 GOOGL
5 1.676826 3003218 GOOG
6 1.455559 20275316 BRK.B
7 1.316222 26930294 JNJ
8 1.248076 25465472 PG
9 1.171733 6313330 NVDA

Now that we have a nice list of the companies along with their weights, we need price data so we can calculate trailing returns for each company before moving on to the next step.

  1. First, we drop the last row, which has None as the ticker; it is probably the ETF’s cash position.
  2. We use adjusted close prices instead of unadjusted prices to account for any dividends or splits, putting every company on equal footing.
In [72]:
constituents_df = constituents_df.iloc[:-1]  # drop the trailing None/cash row
constituents_df.loc[:, 'symbol'] = constituents_df['symbol'].apply(lambda x: x.replace(".", "-"))  # e.g. BRK.B -> BRK-B for the price API

I am going to add the SPY ETF for comparison purposes, along with RSP (the equal-weighted S&P 500 ETF) and XLG (the 50 largest companies in the S&P 500), so we have benchmarks to compare against later in this research.

In [73]:
tickers = list(set(constituents_df['symbol'].to_list())) + ['SPY','RSP','XLG']
In [74]:
adjusted_close = pd.DataFrame(columns=tickers)
null_tickers = []

for ticker in tickers:
    try:
        data_panel = web.DataReader([ticker], "tiingo").loc[ticker]['adjClose'].to_frame()
        data_panel.columns = [ticker]
        data_panel.index = pd.to_datetime(data_panel.index)
        # Skip tickers with stale data (no prices within the last 5 days)
        if data_panel.index.max().tz_localize(None) < datetime.today() - relativedelta(days=5):
            print("{} most recent date is {}".format(ticker, str(data_panel.index.max())))
        else:
            adjusted_close[ticker] = data_panel[ticker]
    except Exception:
        null_tickers.append(ticker)
        print("{} not found".format(ticker))
In [75]:
filtered_adjusted_close = adjusted_close.loc['2019-12-31':]
In [76]:
constituents_df.set_index('symbol', inplace=True)

The time period I am interested in begins with the peak of the S&P 500, which was on the date below. I will measure performance for every constituent from the peak of the market to the last market close.
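As a quick sanity check, we can recover that peak date from the price data itself by looking for the highest SPY close in the pre-crash window. A minimal sketch; the end-of-March cutoff is an assumption to avoid matching the new highs set later in the recovery.

In [ ]:
# Date of the highest SPY adjusted close in Q1 2020 (before the crash)
spy_peak_date = filtered_adjusted_close['SPY'].loc[:'2020-03-31'].idxmax()
print(spy_peak_date)  # expected: 2020-02-19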

In [77]:
peak = '2020-02-19'
In [78]:
for ticker in filtered_adjusted_close.columns:
    constituents_df.loc[ticker, 'peak_to_date'] = (filtered_adjusted_close[ticker].iloc[-1] - filtered_adjusted_close[ticker].loc[peak])/filtered_adjusted_close[ticker].loc[peak]
In [79]:
constituents_df.head(10)
Out[79]:
percent share peak_to_date
symbol
AAPL 6.755792 164439440.0 0.459377
MSFT 5.645525 77406670.0 0.132220
AMZN 4.981768 4354880.0 0.428150
FB 2.460435 24592506.0 0.240057
GOOGL 1.704741 3073531.0 0.138707
GOOG 1.676826 3003218.0 0.141155
BRK-B 1.455559 20275316.0 -0.009728
JNJ 1.316222 26930294.0 0.002607
PG 1.248076 25465472.0 0.131139
NVDA 1.171733 6313330.0 0.665816

What is the total return of the S&P 500 ETF from the peak in February to the last market close?

In [80]:
print("{}%".format(round(constituents_df.loc['SPY', 'peak_to_date'] * 100, 2)))
6.56%

What about the equal-weight S&P 500 performance?

In [81]:
print("{}%".format(round(constituents_df.loc['RSP', 'peak_to_date'] * 100, 2)))
3.68%

What about just the top 50 in the S&P 500?

In [82]:
print("{}%".format(round(constituents_df.loc['XLG', 'peak_to_date'] * 100, 2)))
10.04%

So, from the peak of the market on 2/19/2020, the S&P 500 as a whole is up over 6.5%, which means the index has recovered all of the losses from the drawdown/recession with some to spare. The equal-weighted version of the index is also positive, but trails the market-cap-weighted equivalent by almost 3 percentage points, a gap indicative of the outperformance of large versus small companies so far this year. The 10.04% return from just the top 50 companies shows how much the index was buoyed by its largest names; it could even mean that if you held the S&P 500 without those top 50 names, your performance during the recovery would be negative.
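That last claim is worth a quick sanity check. SPY's return is a weighted blend of the top 50 and everything else, r_spy = w * r_xlg + (1 - w) * r_rest, where w is the weight of the top 50 in the index, so we can back out the implied ex-top-50 return. A minimal sketch, using a hypothetical value for w rather than one computed from the holdings:

In [ ]:
# Back out the implied return of the S&P 500 excluding the top 50 names:
# r_spy = w * r_xlg + (1 - w) * r_rest  =>  r_rest = (r_spy - w * r_xlg) / (1 - w)
r_spy = 0.0656  # SPY peak-to-date return from above
r_xlg = 0.1004  # XLG (top 50) peak-to-date return from above
w = 0.55        # ASSUMPTION: hypothetical top-50 weight in the index

r_rest = (r_spy - w * r_xlg) / (1 - w)
print("Implied ex-top-50 return: {:.2%}".format(r_rest))

Whether the remainder actually turns negative depends on the true top-50 weight (it would take w greater than r_spy / r_xlg, roughly 0.65), so treat this as a way to frame the question rather than a definitive answer.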

In [83]:
constituents_df.drop(['SPY','RSP','XLG'], inplace=True, axis=0)

Breaking it down by Market Cap (size)

Next, we are going to split the S&P 500 into deciles by market cap, which should give us around 50 companies per decile. We will use these deciles to group the returns of the companies and see whether any interesting pattern emerges.

In [84]:
constituents_df.loc[:, 'decile'] = pd.qcut(constituents_df['percent'].to_list(), 10, labels=False)

Decile 9 contains the 50 largest companies, while decile 0 represents the 50 smallest companies in the index.

In [85]:
constituents_df
Out[85]:
percent share peak_to_date decile
symbol
AAPL 6.755792 164439440.0 0.459377 9
MSFT 5.645525 77406670.0 0.132220 9
AMZN 4.981768 4354880.0 0.428150 9
FB 2.460435 24592506.0 0.240057 9
GOOGL 1.704741 3073531.0 0.138707 9
... ... ... ... ...
HFC 0.009603 1515188.0 -0.456968 0
UAA 0.009101 1852602.0 0.009656 0
UA 0.008485 1978056.0 -0.021390 0
FTI 0.008211 4291772.0 -0.509858 0
NWS 0.005825 1261692.0 0.185057 0

505 rows × 4 columns
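Since 505 names don't split evenly into 10 buckets, a quick count confirms each decile holds roughly 50 companies:

In [ ]:
# Number of constituents in each size decile
print(constituents_df['decile'].value_counts().sort_index())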

Now that we have all of the data we need to start digging in, we are going to group the data by decile and look at some descriptive statistics for the return from the peak in February. This gives us the average, min, max, standard deviation, etc., which is a helpful peek into the variation within each group.

In [86]:
constituents_df.groupby('decile')['peak_to_date'].describe()
Out[86]:
count mean std min 25% 50% 75% max
decile
0 50.0 -0.201483 0.223319 -0.615725 -0.341461 -0.217842 -0.080332 0.418510
1 49.0 -0.067887 0.270849 -0.662109 -0.195012 -0.137430 0.063570 0.699703
2 51.0 0.001260 0.224653 -0.382226 -0.152608 -0.046219 0.126765 0.597439
3 50.0 0.113857 0.324194 -0.526371 -0.057762 0.059852 0.237990 1.633697
4 51.0 0.012033 0.241358 -0.443316 -0.111223 0.015064 0.115940 0.632796
5 49.0 0.058554 0.210509 -0.318459 -0.104685 0.040385 0.201152 0.756198
6 49.0 0.110746 0.196115 -0.308597 -0.016661 0.102932 0.215325 0.610861
7 51.0 0.038762 0.175152 -0.284102 -0.104194 0.023504 0.150917 0.595663
8 50.0 0.068971 0.236575 -0.443211 -0.055814 0.038220 0.156029 0.739545
9 50.0 0.122010 0.221007 -0.348387 -0.007703 0.127393 0.251156 0.665816

On average, the two smallest deciles (0 and 1 above) have not yet fully recovered their losses from the pandemic crash, and the third smallest (decile 2) has barely recovered, sitting just above 0 on average. The best performance has been in the top decile, while the second best has been decile 3. However, if you look at the “std” column, you can see the variation within each decile: decile 3’s is almost 50% greater than the top decile’s. Decile 3’s worst performer was -53%, while its best returned a massive 163%. Compare this with the top decile, whose worst performer was -35% and best was 66%. This demonstrates the uniformity and consistency of the market’s preference for mega cap stocks during the recovery. Market participants appear to have put a premium on company size, perhaps due to the perception that larger companies would be better able to weather the storm.
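To put a number on that dispersion comparison, a minimal sketch computing the ratio of the two standard deviations from the table above:

In [ ]:
# Compare return dispersion in decile 3 vs. the top decile
decile_stats = constituents_df.groupby('decile')['peak_to_date'].describe()
ratio = decile_stats.loc[3, 'std'] / decile_stats.loc[9, 'std']
print("Decile 3 std is {:.0%} of the top decile's".format(ratio))  # ~147%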

In [87]:
# Re-normalize the weights to sum to 1 now that the benchmark ETFs are dropped
constituents_df.loc[:, 'weight'] = constituents_df['percent'] / constituents_df['percent'].sum()
In [88]:
constituents_df
Out[88]:
percent share peak_to_date decile weight
symbol
AAPL 6.755792 164439440.0 0.459377 9 0.067562
MSFT 5.645525 77406670.0 0.132220 9 0.056459
AMZN 4.981768 4354880.0 0.428150 9 0.049821
FB 2.460435 24592506.0 0.240057 9 0.024606
GOOGL 1.704741 3073531.0 0.138707 9 0.017048
... ... ... ... ... ...
HFC 0.009603 1515188.0 -0.456968 0 0.000096
UAA 0.009101 1852602.0 0.009656 0 0.000091
UA 0.008485 1978056.0 -0.021390 0 0.000085
FTI 0.008211 4291772.0 -0.509858 0 0.000082
NWS 0.005825 1261692.0 0.185057 0 0.000058

505 rows × 5 columns

In [89]:
# Each stock's contribution to the index return is its weight times its return
constituents_df.loc[:, 'return_contribution_peak_to_date'] = constituents_df['weight'] * constituents_df['peak_to_date']
In [93]:
constituents_df.groupby('decile')['return_contribution_peak_to_date'].sum() / constituents_df['return_contribution_peak_to_date'].sum()
Out[93]:
decile
0   -0.011930
1   -0.008300
2    0.000549
3    0.024789
4    0.003140
5    0.020998
6    0.048493
7    0.027243
8    0.081920
9    0.813099
Name: return_contribution_peak_to_date, dtype: float64

What this shows is that the top decile contributed around 81% of the recovery performance, while the bottom two deciles detracted from it, which lines up with our results so far.
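As a sanity check on the methodology, these contributions should sum back to roughly the cap-weighted index return we computed for SPY (a minimal sketch; small differences from 6.56% would reflect the ETF's cash position, dropped tickers, and weight drift since the peak):

In [ ]:
# The weighted contributions should roughly reconstruct SPY's peak-to-date return
print("{:.2%}".format(constituents_df['return_contribution_peak_to_date'].sum()))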

To wrap up this part of the analysis, let’s look at a box plot of the performance by decile, which is a nice way to visualize most of what we saw in the data above.

In [42]:
boxplot = constituents_df.boxplot(column=['peak_to_date'], by=['decile'], figsize=[24,12])

Breaking down the constituents by sector

Let’s bring in some sector data real quick.

In [94]:
sector_data = pd.read_csv('sp500_sectors.csv', index_col=1)
In [95]:
sector_constituent_df = pd.merge(sector_data, constituents_df, left_index=True, right_index=True)
In [96]:
sector_constituent_df = sector_constituent_df.dropna()
In [97]:
sector_constituent_df.groupby('sector')['peak_to_date'].describe().sort_values(by='mean')
Out[97]:
count mean std min 25% 50% 75% max
sector
Energy 25.0 -0.341995 0.175702 -0.662109 -0.462221 -0.348387 -0.273734 0.112045
Real Estate 31.0 -0.133095 0.150568 -0.404468 -0.228111 -0.138372 -0.001369 0.148429
Utilities 28.0 -0.100580 0.113766 -0.443316 -0.158609 -0.091907 -0.041900 0.150589
Financials 64.0 -0.080723 0.161523 -0.443211 -0.180854 -0.115506 -0.008073 0.604008
Communication Services 26.0 0.033171 0.207584 -0.257213 -0.172928 0.077039 0.197006 0.497710
Consumer Staples 31.0 0.044896 0.149484 -0.253839 -0.055527 0.049870 0.145582 0.293411
Industrials 66.0 0.087485 0.262545 -0.557713 -0.081551 0.113872 0.219676 0.739545
Consumer Discretionary 61.0 0.087760 0.312344 -0.615725 -0.049052 0.032395 0.248092 1.633697
Health Care 61.0 0.106403 0.226890 -0.284102 -0.056451 0.044411 0.202356 0.703710
Information Technology 72.0 0.116447 0.248760 -0.382226 -0.070303 0.122893 0.270537 0.665816
Materials 28.0 0.182086 0.179906 -0.148591 0.072165 0.195846 0.249910 0.756198

So, even though the S&P 500 in aggregate has recovered all losses from the recession, 4 of the sectors (when equally weighted) have not recovered theirs. The rest have recovered (greater than 0 return), but two (Communication Services and Consumer Staples) are lagging behind the broader S&P 500 return. Sentiment around the Energy and Utilities sectors has been negative due to expectations of lower energy consumption as the economy faced the threat of shutdowns across the country, and the same goes for Real Estate. Financials have been out of favor due to the low interest rate environment. Bargain shoppers/value-inclined investors may be rummaging through these sectors for stocks that are still beaten down and may have some room to run.
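A minimal sketch that flags those groups programmatically, assuming SPY's peak-to-date return from earlier (about 0.0656) is stored as spy_return:

In [ ]:
spy_return = 0.0656  # SPY's peak-to-date return, computed earlier

sector_means = sector_constituent_df.groupby('sector')['peak_to_date'].mean()
print("Not yet recovered:", list(sector_means[sector_means < 0].index))
print("Recovered but lagging SPY:",
      list(sector_means[(sector_means >= 0) & (sector_means < spy_return)].index))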

Let’s look inside the energy and real estate sectors real quick to see the distribution of returns for the stocks in each.

In [99]:
sector_constituent_df[sector_constituent_df['sector'] == 'Energy']['peak_to_date'].sort_values().plot(kind='barh', figsize=[12,6], legend=True)
Out[99]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7001199320>

Pretty bleak performance across the entire sector, with only 2 companies that have recovered from the drawdown earlier in the year.

In [100]:
sector_constituent_df[sector_constituent_df['sector'] == 'Real Estate']['peak_to_date'].sort_values().plot(kind='barh', figsize=[12,6], legend=True)
Out[100]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7000f6d278>

Real estate performance is mostly still in the red, with a couple of bright spots, which appear to be data center, wireless tower, and storage unit REITs, all of which have strong tailwinds driving their performance.
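To pull those names out of the data rather than eyeballing the chart, a minimal sketch listing the sector's best performers:

In [ ]:
# Top five Real Estate names by peak-to-date return
real_estate = sector_constituent_df[sector_constituent_df['sector'] == 'Real Estate']
print(real_estate['peak_to_date'].nlargest(5))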

In [101]:
sector_constituent_df.groupby('sector')['return_contribution_peak_to_date'].sum().plot(kind='barh', figsize=[12,6], legend=True)
Out[101]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7000e52b70>
In [103]:
sector_constituent_df.groupby('sector')['return_contribution_peak_to_date'].sum() / sector_constituent_df['return_contribution_peak_to_date'].sum()
Out[103]:
sector
Communication Services    0.113385
Consumer Discretionary    0.242785
Consumer Staples          0.040726
Energy                   -0.044964
Financials               -0.057521
Health Care               0.095992
Industrials               0.078075
Information Technology    0.516078
Materials                 0.038822
Real Estate              -0.009315
Utilities                -0.014064
Name: return_contribution_peak_to_date, dtype: float64

Last but not least, looking at the sectors’ contributions to the recovery performance, over 50% of the positive performance is attributable to the technology sector, with consumer discretionary adding about a quarter of the index’s performance.
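And circling back to the question we opened with, a quick check of how many individual constituents have actually recovered their losses (a minimal sketch; the exact figure depends on the run date):

In [ ]:
# Share of S&P 500 constituents trading above their February 19 level
recovered_share = (constituents_df['peak_to_date'] > 0).mean()
print("{:.0%} of constituents have fully recovered".format(recovered_share))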

The bottom line is that while the broad S&P 500 index (cap-weighted and equal-weighted) has recovered all losses from the pandemic drawdown, stock pickers using the index as their universe have probably had very different results, depending on how their sector and market cap exposures compared to the index.
