This is the long-awaited second part of February’s post on Linear Regression. This time, without the needlessly abstract math, but with some classic portfolio theory as an example of “applied linear regression”.

We’ll be discussing two papers: The performance of mutual funds in the period 1945-1964 (1968) by Michael Jensen and Common Risk Factors in the Returns On Stocks And Bonds (1993) by Eugene Fama and Kenneth French.

Some disclaimers:

  • I am by no means an expert on this topic. I learned about these investing concepts on the Rational Reminder podcast by the excellent Benjamin Felix and Cameron Passmore.
  • There are many more results on this topic, both classic and recent. The two papers discussed here added some delta at the time, but previous results by others were essential too, and get no mention here. I chose these two papers because they are relatively easy to follow and their main message can be explained well in a blog post.
  • All of this is just for fun. Capital at risk.

Jensen (1968): Standard stuff.

On models

The papers we discuss present models. Models have assumptions as well as domains in which they are valid and domains in which they are not. We are not going to make precise statements about either of these, but it’s useful to not confuse models with reality. ‘Whereof one has no model thereof one must be silent’ (otherwise, philosophus mansisses). This is also known as searching under the streetlights.

The aim of our model will be to evaluate, or assess, the performance of any given portfolio. Specifically, the model should evaluate a portfolio manager’s ability to increase returns by successfully predicting future prices, for a given level of riskiness. For instance, if stock market returns are positive in expectation, a leveraged portfolio could outperform an unleveraged one without any special forecasting ability on the manager’s part. Conversely, a classic 60/40 mix of stocks and bonds would likely do worse than a 100% equity portfolio in this case, but may still be preferable due to its reduced risk.

We won’t go into detail about how risk is measured, but we should mention that under certain assumptions, it is precisely its riskiness that causes a given asset to yield higher returns (ceteris paribus, and on average): If investors are risk-averse, they need to be compensated for taking on the additional risk of a specific asset compared to some other asset, and higher expected future returns (expected future prices relative to the asset’s current price) are how that compensation happens in a liquid market. Of course, the actual expectation of any given investor may simply be wrong, but over time one may expect the worst investors to drop out, improving the accuracy of the average investor’s expectations.

Time series regressions

The specific models here do linear regression on time series. Suppose we have one data point per time interval, e.g. per trading day:

day              value
5 January 1970   388.8K
6 January 1970   475.2K
…                …

We will use this data as an \(n\)-dimensional feature vector (aka explanatory variable). Think of it as coming from the value of an index, specifically from the capitalization-weighted value of the whole stock market on that trading day. We will call this the market portfolio.

The portfolio (or individual stock) we want to assess will also have a value on each trading day. We could thus try to use linear regression to express the assessed portfolio as a linear combination of the market value and a constant intercept vector. However, little would be gained by doing just that: we’ll have to at least normalize things a bit, e.g. by working with daily returns rather than raw values, since otherwise matching scales is all the model will spend its capacity (two numbers!) on. And while we are at it, we remember that there used to be a time when the (almost, or by definition) risk-free return offered by central banks was not basically zero. To tease out the “risk factor”, we will use the market return minus this risk-free rate as our feature vector. Our data could thus look like this:

day              market   risk-free
5 January 1970   1.21%    0.029%
6 January 1970   0.62%    0.029%
…                …        …

For the risk-free rate, one could use the treasury bill rate. All these numbers can in principle be retrieved from the historical records (and in practice downloaded from Kenneth French’s homepage).

Let’s call the market return \(R_M\) and the risk-free return \(R_F\). A sensible regression without an intercept term could then read as

\[R \approx R_F + \beta(R_M - R_F), \label{eq:1}\tag{1}\]

where \(R\) is the vector of observed returns of the portfolio or stock we want to assess, and \(\beta\) is computed via the least-squares condition of linear regression, i.e., \(\beta = \argmin\{\abs{R - R_F - b(R_M - R_F)}_2\st b\in\R\}\), as in the last blog post. Notice that the \(R_F\)s make sense here: It’s deviations from this risk-free return that we want to model, and we want \(\beta\) to scale the non-risk-free portion of the market portfolio.

Example.  For a classic 60/40 portfolio with 60% whole market and 40% risk-free return (historically not realistic for private investors, but easy to compute) we have

\[R - R_F= 0.6R_M + 0.4R_F - R_F = 0.6(R_M - R_F)\in\R^{n\times 1}\]

and therefore, by the normal equation,

\[\beta = \bigl((R_M - R_F)^\top (R_M - R_F)\bigr)^{-1} (R_M - R_F)^\top (R - R_F) = 0.6.\]

That seems sensible in this case.
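To see the same thing numerically, here is a minimal sketch with made-up return data (the real data comes below); NumPy’s least-squares solver recovers the same \(\beta\):

import numpy as np

rng = np.random.default_rng(0)

rm = rng.normal(0.5, 4.0, size=1000)  # Hypothetical market returns, in %.
rf = np.full_like(rm, 0.3)  # Hypothetical constant risk-free rate, in %.
r = 0.6 * rm + 0.4 * rf  # The 60/40 portfolio from the example.

# Solve the least-squares problem min over b of |(r - rf) - b(rm - rf)|_2.
(beta,), *_ = np.linalg.lstsq(np.expand_dims(rm - rf, 1), r - rf, rcond=None)
print(beta)  # 0.6, up to floating-point error.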

Finding Alpha

However, \eqref{eq:1} isn’t quite good enough. More precisely, it reads

\[R = R_F + \beta(R_M - R_F) + e, \label{eq:2}\tag{2}\]

with an error term \(e\in\R^n\). As Jensen (1968) argues,

[W]e must be very careful when applying the equation to managed portfolios. If the manager is a superior forecaster (perhaps because of special knowledge not available to others) he will tend to systematically select securities which realize [\(e_j > 0\)].

This touches on a subject glossed over in the last blog post: Most statements about linear regression models depend on certain statistical assumptions, among them that the error terms are elementwise iid, ideally with a mean of zero. There are autocorrelation tests like Durbin-Watson to check whether this holds for a particular dataset. In this particular modeling exercise, we can do better by adding the constant \(\1=(1,\ldots,1)\in\R^n\) intercept vector to the subspace we project on, which turns \eqref{eq:2} into

\[R - R_F = \alpha + \beta(R_M - R_F) + u, \label{eq:3}\tag{3}\]

with an error term \(u\in\R^n\).

Ever wondered where the “alpha” in the clickbait website Seeking Alpha comes from? It is this \(\alpha\), the coefficient of the \(\1\) intercept vector in \eqref{eq:3}. To quote Jensen (1968) again:

Thus if the portfolio manager has an ability to forecast security prices, the intercept, [\(\alpha\), in eq. \eqref{eq:3}] will be positive. Indeed, it represents the average incremental rate of return on the portfolio per unit time which is due solely to the manager’s ability to forecast future security prices. It is interesting to note that a naive random selection buy and hold policy can be expected to yield a zero intercept. In addition if the manager is not doing as well as a random selection buy and hold policy, [\(\alpha\)] will be negative. At first glance it might seem difficult to do worse than a random selection policy, but such results may very well be due to the generation of too many expenses in unsuccessful forecasting attempts.

However, given that we observe a positive intercept in any sample of returns on a portfolio we have the difficulty of judging whether or not this observation was due to mere random chance or to the superior forecasting ability of the portfolio manager. […]

It should be emphasized that in estimating [\(\alpha\)], the measure of performance, we are explicitly allowing for the effects of risk on return as implied by the asset pricing model. Moreover, it should also be noted that if the model is valid, the particular nature of general economic conditions or the particular market conditions (the behavior of \(\pi\)) over the sample or evaluation period has no effect whatsoever on the measure of performance. Thus our measure of performance can be legitimately compared across funds of different risk levels and across different time periods irrespective of general economic and market condition.

About the error term \(u\), first notice that thanks to the intercept term we can expect it to have a mean of zero. Further, Jensen (1968) argues it “should be serially [i.e., elementwise] independent” as otherwise “the manager could increase his return even more by taking account of the information contained in the serial dependence and would therefore eliminate it.”
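Should we want to check that claim on actual residuals, the Durbin-Watson statistic mentioned above is a one-liner. A minimal sketch, where u is assumed to be the residual vector of a fitted model:

import numpy as np


def durbin_watson(u):
    # Values near 2 indicate no first-order autocorrelation; values towards
    # 0 (towards 4) indicate positive (negative) serial correlation.
    return np.sum(np.diff(u) ** 2) / np.sum(u ** 2)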

Just show me the code!

After introducing this model, Jensen (1968) continues with “the data and empirical results”. In ca. 2015 AI parlance, this part could be called “Experiments”. Take a look at Table 1 in the paper to get a list of quaint mutual fund names. Notice too that it’s not immediately obvious how to get the market portfolio’s returns from historical trading data, as companies enter and leave the stock market, and how they leave will play a huge role. (Bankruptcy? Taken private at $420?)

For our purposes though, all of this has been taken care of and Kenneth French’s homepage has the data, including data for each trading day, in usable formats.

Let’s start by getting the data and sanity-checking our 60/40 example above. All of the code in this post can also be downloaded separately or run in a Google Colab.

import os
import re
import urllib.request
import zipfile

import numpy as np

FF3F = (  # Monthly data.
    "https://mba.tuck.dartmouth.edu/pages/faculty/"
    "ken.french/ftp/F-F_Research_Data_Factors_TXT.zip"
)


def download(url=FF3F):
    archive = os.path.basename(url)
    if not os.path.exists(archive):
        print("Retrieving", url)
        urllib.request.urlretrieve(url, archive)
    return archive


def extract(archive, match=re.compile(rb"\d" * 6)):
    with zipfile.ZipFile(archive) as z:
        name, *rest = z.namelist()
        assert not rest
        with z.open(name) as f:
            # Keep only the monthly data lines, which start with a six-digit
            # YYYYMM date; this skips headers and the annual data section.
            return np.loadtxt((line for line in f if match.match(line)), unpack=True)


date, mktmrf, _, _, rf = extract(download())

mkt = mktmrf + rf

# Linear regression using the normal equation.
A = np.stack([np.ones_like(mktmrf), mktmrf], axis=1)
alpha, beta = np.linalg.inv(A.T @ A) @ A.T @ (0.6 * mkt + 0.4 * rf - rf)

print("alpha=%f; beta=%f" % (alpha, beta))

As expected, this prints

alpha=0.000000; beta=0.600000

If we are seeking \(\alpha\), we’ll have to look elsewhere.

Let’s try some real data. The biggest issue is getting hold of the historical returns of real portfolios. Real researchers use data sources like the Center for Research in Security Prices (CRSP), but their data isn’t available for free. Instead, let’s use the data for the iShares S&P Small-Cap 600 Value ETF (IJS) from Yahoo Finance.

# Continuing from above.

import time

IJS = (
    "https://gist.githubusercontent.com/heiner/b222d0985cbebfdfc77288404e6b2735/"
    "raw/08c1cacecbcfcd9e30ce28ee6d3fe3d96c07115c/IJS.csv"
)


def extract_csv(archive):
    with open(archive) as f:
        return np.loadtxt(
            f,
            delimiter=",",
            skiprows=1,  # Header.
            converters={
                # Hacky date handling: turn e.g. "2005-01-03" into 200501 to
                # match the YYYYMM dates of the French data.
                0: lambda s: time.strftime(
                    "%Y%m", time.strptime(s.decode("ascii"), "%Y-%m-%d")
                )
            },
        )


ijs_data = extract_csv(download(IJS))

ijs = ijs_data[:, 5]  # Adj Close (includes dividends).

# Turn into monthly percentage returns.
ijs = 100 * (ijs[1:] / ijs[:-1] - 1)
ijs_date = ijs_data[1:, 0]

ijs_date, indices, ijs_indices = np.intersect1d(date, ijs_date, return_indices=True)

# Regression model for CAPM.
A = np.stack([np.ones_like(ijs_date), mktmrf[indices]], axis=1)
y = ijs[ijs_indices] - rf[indices]
B = np.linalg.inv(A.T @ A) @ A.T @ y
alpha, beta = B

# R^2 and adjusted R^2. Note that np.var(y, ddof=k) divides the sum of
# squared deviations by len(y) - k, so ddof=len(y) - 1 yields the total sum
# of squares, and ddof=1 the unbiased sample variance.
model_err = A @ B - y
ss_err = model_err.T @ model_err
r2 = 1 - ss_err.item() / np.var(y, ddof=len(y) - 1)
adjr2 = 1 - ss_err.item() / (A.shape[0] - A.shape[1]) / np.var(y, ddof=1)

print(
    "CAPM: alpha=%.2f%%; beta=%.2f. R^2=%.1f%%; R_adj^2=%.1f%%. Annualized alpha: %.2f%%"
    % (
        alpha,
        beta,
        100 * r2,
        100 * adjr2,
        ((1 + alpha / 100) ** 12 - 1) * 100,
    )
)

This prints:

CAPM: alpha=0.13%; beta=1.14. R^2=77.8%; R_adj^2=77.7%. Annualized alpha: 1.58%

Since \(\alpha\) is the weight for the constant intercept vector \(\1 = (1,\ldots,1)\), we can think of it as having percentage points as its unit; the annualized value compounds the monthly \(\alpha\) as \((1+\alpha)^{12} - 1\) (with \(\alpha\) as a rate rather than in percent). Note that fees are not included in this calculation. However, as for many ETFs, the fees for IJS are low, currently 0.25% per year. (Managed mutual funds will typically have an annual fee of at least 1%, historically often more than that.)

It seems a bit strange that this ETF tracking the S&P Small-Cap 600 Value index has significant \(\alpha\): Presumably, the index just includes firms based on simple rules, not genius insights by some above-average fund manager. Looking at the \(R^2\) value, we “explain” only about 78% of the variance of the returns of IJS (the usual caveats to the wording “explain” apply).

Clearly more research was needed. Or just a larger subspace for the linear regression to project onto?

Fama & French (1993): More factors.

By the 1980s, research into financial economics had noticed that certain segments of the market outperformed other segments, and thus the market as a whole, on average. There are several possible explanations for this effect with different implications for the future. For example: Are these segments of the market just inherently riskier such that rational traders demand higher expected returns via sufficiently low prices for these stocks? Or were traders just irrationally disinterested in some ‘unsexy’ firms and have perhaps caught on by now (or not, hence TSLA)? The latter is the behavioural explanation, while the former tends to be put forth by proponents of the Efficient-market hypothesis (EMH), which includes Jensen as well as Fama and French. We won’t be getting into this now. Let’s instead take a look at which ‘easily’ identifiable segments of the market have historically outperformed.

Citing previous literature, Fama & French (1993) mention size (market capitalization, i.e., price per share times number of shares), earnings-to-price, and book-to-market equity (book value divided by market value) as variables that appear to have “explanatory power”, which I take to mean that some model that includes these variables has nonzero regression coefficients and a relatively large \(R^2\) or other appropriate statistics.

The specific way in which Fama & French (1993) introduce these variables into the model is through the construction of portfolios that mimic these variables. This line of work contributed to Fama being awarded the Nobel (Memorial) Prize in Economic Sciences in 2013. The specific construction goes as follows:

Take all stocks in the overall market and order them by their size (i.e., market capitalization). Then take the top and bottom halves and call them “big” (B) and “small” (S), respectively.

Next, again take all stocks in the overall market and order them by book-to-market equity. Then take the bottom 30% (“low”, L), the middle 40% (“medium”, M), and the top 30% (“high”, H). In both cases, some care needs to be taken: E.g., how to handle firms dropping in and out of the market, how to define book equity properly in the presence of deferred taxes, and other effects.

Then, construct the six portfolios containing stocks in the intersections of the two size and the three book-to-market equity groups, e.g.

        low   medium   high
small   S/L   S/M      S/H
big     B/L   B/M      B/H

Out of these six building blocks, Fama & French build a size and a book-to-market equity portfolio:

  • The size portfolio is “small minus big” (SMB): each month, the difference between the average return of the three small-stock portfolios S/L, S/M, and S/H and the average return of the three big-stock portfolios B/L, B/M, and B/H.

  • The book-to-market equity portfolio is “high minus low” (HML): each month, the difference between the average return of the two high book-to-market portfolios S/H and B/H and the average return of the two low book-to-market portfolios S/L and B/L. This is also known as the value factor. (A code sketch of both constructions follows below.)

Additionally, the authors use the market portfolio, as “market return minus risk-free return (one-month treasury bill rate)”, in the same way as Jensen (1968).
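In code, constructing the two factors from the six building blocks is just averaging and subtracting monthly return series. A minimal sketch, where the six arrays stand in for the (hypothetical) monthly percentage returns of the building-block portfolios:

import numpy as np

# Hypothetical monthly returns (in %) of the six building-block portfolios.
rng = np.random.default_rng(1)
sl, sm, sh, bl, bm, bh = rng.normal(1.0, 5.0, size=(6, 342))

smb = (sl + sm + sh) / 3 - (bl + bm + bh) / 3  # Size factor: small minus big.
hml = (sh + bh) / 2 - (sl + bl) / 2  # Value factor: high minus low.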

An aside

I’m not sure why SMB and HML need to have their two terms be equally weighted among the splits of the other ordering. The authors mention

[For HML] the difference between the two returns should be largely free of the size factor in returns, focusing instead on the different return behaviors of high- and low-[book-to-market] firms. As testimony to the success of this simple procedure, the correlation between the 1963–1991 monthly mimicking returns for the size and book-to-market factors is only \(- 0.08\).

Taking “correlation” to mean the Pearson correlation coefficient, we can test this using the data from French’s homepage:

date, mktmrf, smb, hml, rf = extract(download())
print(np.corrcoef(smb, hml))

This prints

[[1.         0.12889074]
 [0.12889074 1.        ]]

which implies a coefficient of \(0.13\). In 1993, Fama and French had less data available: The paper uses the 342 months from July 1963 to December 1991. Let’s check with this range:

orig_indices = (196307 <= date) & (date <= 199112)
assert len(smb[orig_indices]) == 342
print(np.corrcoef(smb[orig_indices], hml[orig_indices]))

This yields

[[ 1.         -0.09669641]
 [-0.09669641  1.        ]]

a coefficient of roughly \(-0.10\), not the \(-0.08\) the authors mention, but relatively close. I guess the data has been cleaned a bit since 1993?

As a further aside, accumulating this data and analyzing it was a true feat in 1993. These days, we can do the same using the internet and a few lines of Python (or, spoiler alert, using just a website).

Back to modelling

So what are we to do with SMB and HML? You guessed it – just add them to the regression model. Of course, this makes the subspace we project on larger, which will always decrease the “fraction of variance unexplained”, without necessarily explaining much. However, in the case of IJS it appears to explain a bit:

# Continuing from above.
A = np.stack(
    [np.ones_like(ijs_date), mktmrf[indices], smb[indices], hml[indices]], axis=1
)
y = ijs[ijs_indices] - rf[indices]
B = np.linalg.inv(A.T @ A) @ A.T @ y
alpha, beta_mkt, beta_smb, beta_hml = B

# R^2 and adjusted R^2, computed as before.
model_err = A @ B - y
ss_err = model_err.T @ model_err
r2 = 1 - ss_err.item() / np.var(y, ddof=len(y) - 1)
adjr2 = 1 - ss_err.item() / (A.shape[0] - A.shape[1]) / np.var(y, ddof=1)

print(
    "FF3F: alpha=%.2f%%; beta_mkt=%.2f; beta_smb=%.2f; beta_hml=%.2f."
    " R^2=%.1f%%; R_adj^2=%.1f%%. Annualized alpha: %.2f%%"
    % (
        alpha,
        beta_mkt,
        beta_smb,
        beta_hml,
        100 * r2,
        100 * adjr2,
        ((1 + alpha / 100) ** 12 - 1) * 100,
    )
)

This prints:

FF3F: alpha=0.04%; beta_mkt=0.97; beta_smb=0.79; beta_hml=0.51. R^2=95.8%; R_adj^2=95.8%. Annualized alpha: 0.43%

In other words, we dropped from an (annualized) \(\alpha_{\rm CAPM} = 1.58\%\) to only \(\alpha_{\rm FF3F} = 0.43\%\). The explained fraction of variance has increased to above 95%.

Remembering that Jensen (1968) talked about assessing fund managers with this model, we could try the same with actual managed funds. While I couldn’t produce any impressive results there, Fama and French did go into the question of Luck versus Skill in the Cross-Section of Mutual Fund Returns in a 2010 paper. The results, on average, don’t look good for fund managers’ skill. The story for individual fund managers may be better, but don’t hold your breath.

Was this worth it?

We discussed academic outputs of Jensen, Fama, and French. Fama even got a Nobel for his work on factor models. But nowadays, we can do (parts of) their computations in a few lines of Python.

Actually, it’s even easier than that. The website portfoliovisualizer.com allows us to do all of these computations and more with a few clicks. In that sense, this blog post was perhaps not worth it.

Another question is how useful these models are. This touches on why SMB and HML ‘explain’ returns of portfolios (e.g., the risk explanation vs the behavioural explanation mentioned above, or perhaps both or neither). In 2014, Fama and French presented another updated model with five factors, adding profitability and investment; judged by \(R^2\), this five-factor model ‘explains’ even more of the variance of example portfolios. Other research suggesting alternative factors abounds.
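To give a flavor of the five-factor version, here is how it could be run in our setup, reusing download() and extract() from above. This is a sketch only: the archive URL and its column order (date, Mkt-RF, SMB, HML, RMW, CMA, RF) are assumptions about French’s data library, so verify both before relying on the numbers.

# Continuing from above. The URL is an assumption based on the naming
# scheme of the three-factor archive.
FF5F = (
    "https://mba.tuck.dartmouth.edu/pages/faculty/"
    "ken.french/ftp/F-F_Research_Data_5_Factors_2x3_TXT.zip"
)

date5, mktmrf5, smb5, hml5, rmw, cma, rf5 = extract(download(FF5F))

# Align the five-factor months with the IJS months used above.
_, idx5, idx_ijs = np.intersect1d(date5, ijs_date, return_indices=True)

A5 = np.stack(
    [
        np.ones(len(idx5)),
        mktmrf5[idx5],
        smb5[idx5],
        hml5[idx5],
        rmw[idx5],  # Profitability factor: "robust minus weak".
        cma[idx5],  # Investment factor: "conservative minus aggressive".
    ],
    axis=1,
)
y5 = ijs[ijs_indices][idx_ijs] - rf5[idx5]
alpha5, *betas5 = np.linalg.inv(A5.T @ A5) @ A5.T @ y5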

How well do these models really ‘explain’ the phenomenal historical returns of star investors like Warren Buffett? Given that Buffett is a proponent of the Benjamin Graham school of value investing, could including a value factor like HML perhaps be the key to explaining his success? For the Fama & French five-factor model, we can check portfoliovisualizer.com: With \(R^2 \approx 33\%\) and an annualized \(\alpha\) of 4.87%, the results don’t look too good for the math nerds, but very good for the ‘Oracle of Omaha’.

This is obviously not a new observation. There is even a paper by a number of people from the investment firm AQR, Buffett’s Alpha, that aims to explain Buffett’s success with leverage as well as yet another set of new factors in a linear regression model:

[Buffett’s] alpha became insignificant, however, when we controlled for exposure to the factors “betting against beta” and “quality minus junk.”

Nice as this may sound, it would appear more convincing to this author if the financial analysis community could converge on a small common set of factors instead of seemingly creating them ad hoc. Otherwise, von Neumann’s line comes to mind: “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”

And now what?

We discussed two financial economics papers and the linear regression models they propose, merely to get a sense of what’s done in this field. One may suspect that this research should be useful for more than just amusement; perhaps it could even inform our investment choices? Many good financial advisors will make use of data analyses like this and suggest factor-tilted portfolios. However, value investing, both the factor-based and the Buffett/Munger variety, has trailed the overall market in the last 10–15 years. Statistically, this is to be expected every now and then, so we cannot read too much into it. But it’s possible the market has just caught on, past performance is not indicative of future results, and value investing should be cancelled in 2020. That would at least match the zeitgeist. However, it’s also entirely possible that it’s exactly times like the present that make value investing hard but ultimately worthwhile, and that we should be greedy when others are fearful.

Time will tell.

References