How to engineer investment alpha factors

June 24, 2023


Machine learning has revolutionized the finance industry, allowing professionals to make more informed investment decisions.

But not in the ways most people think.

Machine learning has had the most impact on investment alpha factor engineering, which transforms data into predictive signals that capture market risks. Understanding alpha factor engineering helps you develop more accurate and effective investment strategies.

Today, I'll show you how to use Python to generate an alpha factor using Average True Range (ATR) and measure its performance.

Whether you're a beginner or an experienced professional, you’ll be equipped with the code to get started engineering investment alpha factors like the pros.

Today’s issue was inspired by Stefan Jansen’s excellent book Machine Learning for Algorithmic Trading.

Let's dive in!

How to engineer investment alpha factors

Investment alpha factor engineering is a technique used in machine learning to transform different types of data into predictive signals. These signals, known as alpha factors, are designed to capture the risks that drive market movements, and investors use them to make informed investment decisions.

ATR is a commonly used technical analysis indicator that measures market volatility. Quants often use computational or technical alphas like ATR to isolate risk exposures when rebalancing portfolios.
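For intuition about what TA-Lib computes: a bar's true range is the largest of the high minus the low, the gap from the previous close to the high, and the gap from the previous close to the low. ATR then smooths the true range over a window (14 bars by default in TA-Lib). The sketch below uses Wilder's smoothing seeded with a simple average of the first window; the helper names are hypothetical, and TA-Lib's exact handling of the first rows may differ.

```python
import numpy as np
import pandas as pd


def true_range(high: pd.Series, low: pd.Series, close: pd.Series) -> pd.Series:
    """Largest of high - low, |high - previous close|, |low - previous close|."""
    prev_close = close.shift(1)
    ranges = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1
    )
    tr = ranges.max(axis=1)
    tr.iloc[0] = np.nan  # the first bar has no previous close
    return tr


def wilder_atr(high: pd.Series, low: pd.Series, close: pd.Series, period: int = 14) -> pd.Series:
    """Hypothetical helper: ATR with Wilder's smoothing, seeded by a simple average."""
    tr = true_range(high, low, close).to_numpy()
    atr = np.full(len(tr), np.nan)
    if len(tr) <= period:
        return pd.Series(atr, index=close.index)
    # First value: mean of the first `period` true ranges
    atr[period] = tr[1 : period + 1].mean()
    for i in range(period + 1, len(tr)):
        # Keep (period - 1)/period of the previous ATR and add 1/period of today's TR
        atr[i] = (atr[i - 1] * (period - 1) + tr[i]) / period
    return pd.Series(atr, index=close.index)


high = pd.Series([10.0, 12.0, 12.0, 14.0])
low = pd.Series([9.0, 9.0, 11.0, 11.0])
close = pd.Series([9.5, 11.0, 11.5, 13.0])
print(wilder_atr(high, low, close, period=2).tolist())  # [nan, nan, 2.0, 2.5]
```

Here the true ranges are 3, 1, and 3, so the seed is (3 + 1) / 2 = 2.0 and the next value is (2.0 × 1 + 3) / 2 = 2.5.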

Here's how it’s done.

Imports and setup

We’ll use pandas for data manipulation, the OpenBB SDK for data, and TA-Lib for an easy way to compute ATR. To measure how well ATR works as a factor, we’ll use SciPy's Spearman rank correlation. Matplotlib and Seaborn are for plotting.

First, import the libraries.

```python
import pandas as pd

from openbb_terminal.sdk import openbb
from talib import ATR

from scipy.stats import spearmanr
import matplotlib.pyplot as plt

import seaborn as sns
```

Next, grab the data and do some preprocessing.

```python
# Download a table of stocks from OpenBB's "most_volatile" screener preset
data = openbb.stocks.screener.screener_data(
    preset_loaded="most_volatile"
)

# Keep US stocks priced above $5
universe = data[
    (data.Country == "USA") & (data.Price > 5)
]

stocks = []
for ticker in universe.Ticker.tolist():
    # Daily prices since 2010; drop "Close" so the adjusted close becomes `close` below
    df = (
        openbb
        .stocks
        .load(
            ticker,
            start_date="2010-01-01",
            verbose=False)
        .drop("Close", axis=1)
    )
    df["ticker"] = ticker  # tag every row with its symbol
    stocks.append(df)

# Stack all tickers into one long DataFrame
prices = pd.concat(stocks)
prices.columns = ["open", "high", "low", "close", "volume", "ticker"]
```

The OpenBB SDK has an awesome screener function that downloads a DataFrame of stocks based on pre-built screeners. In this case, I used the “most volatile” screener.

Next, filter the stocks to reduce the universe size. I used country and price.

Then, loop through each ticker and download daily price data starting in 2010. Add the ticker symbol as a column (you’ll see why in a minute), then concatenate the downloaded data together.
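If you want to try the stacking step without network access, here is a minimal offline sketch. `fake_ohlcv` is a hypothetical random-walk generator standing in for `openbb.stocks.load`, and the tickers are made up; only the tag-and-concatenate pattern mirrors the article's loop.

```python
import numpy as np
import pandas as pd


def fake_ohlcv(n_days: int, seed: int) -> pd.DataFrame:
    """Hypothetical stand-in for one downloaded price history (random walk)."""
    rng = np.random.default_rng(seed)
    close = 100 * np.exp(np.cumsum(rng.normal(0, 0.02, n_days)))
    open_ = close * (1 + rng.normal(0, 0.005, n_days))
    dates = pd.bdate_range("2020-01-01", periods=n_days, name="date")
    return pd.DataFrame(
        {
            "open": open_,
            "high": np.maximum(open_, close) * 1.01,
            "low": np.minimum(open_, close) * 0.99,
            "close": close,
            "volume": rng.integers(1_000, 10_000, n_days),
        },
        index=dates,
    )


stocks = []
for i, ticker in enumerate(["AAA", "BBB", "CCC"]):
    df = fake_ohlcv(n_days=600 - 200 * i, seed=i)  # 600, 400, and 200 rows
    df["ticker"] = ticker  # the stacked frame needs this to tell tickers apart
    stocks.append(df)

# One long DataFrame: dates repeat across tickers, the ticker column disambiguates
prices = pd.concat(stocks)
print(prices.shape)  # (1200, 6)
print(prices.groupby("ticker").size().to_dict())  # {'AAA': 600, 'BBB': 400, 'CCC': 200}
```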

From there, there are two more steps.

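```python
# Keep tickers with more than ~2 years of daily bars (2 years x 12 months x 21 trading days)
nobs = prices.groupby("ticker").size()
mask = nobs[nobs > 2 * 12 * 21].index
prices = prices[prices.ticker.isin(mask)]

# Move ticker into the index: MultiIndex ordered (ticker, date)
prices = (
    prices
    .set_index("ticker", append=True)
    .reorder_levels(["ticker", "date"])
)
# Drop repeated (ticker, date) rows; DataFrame.drop_duplicates() would ignore the index
prices = prices[~prices.index.duplicated(keep="first")]
```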

You’ll want a sufficient amount of data for analysis, so keep only the tickers with more than 2 × 12 × 21 = 504 daily observations, roughly two years of trading days.

Finally, append ticker to the index and reorder the levels so the DataFrame has a MultiIndex with ticker first, then date. The last step removes duplicate rows.
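Here is a small self-contained sketch of the filter and the MultiIndex step on made-up data. It also illustrates why de-duplicating on the index is safer than `DataFrame.drop_duplicates()`, which compares column values only: a stock with a flat price on two different days would lose a legitimate row. The threshold is lowered to 1 so the toy data has something to filter.

```python
import pandas as pd

# Made-up long-format data: AAA repeats 2023-01-03 and has a flat price;
# BBB has too few rows and will be filtered out
dates = pd.to_datetime(["2023-01-02", "2023-01-03", "2023-01-03", "2023-01-02"])
prices = pd.DataFrame(
    {"close": [10.0, 10.0, 10.0, 20.0], "ticker": ["AAA", "AAA", "AAA", "BBB"]},
    index=pd.Index(dates, name="date"),
)

# Keep tickers with more than min_obs rows (the article uses 2 * 12 * 21 = 504)
min_obs = 1
nobs = prices.groupby("ticker").size()
prices = prices[prices.ticker.isin(nobs[nobs > min_obs].index)]

# MultiIndex with ticker as the outer level and date as the inner level
prices = prices.set_index("ticker", append=True).reorder_levels(["ticker", "date"])

# Index-based de-duplication keeps both AAA dates...
deduped = prices[~prices.index.duplicated(keep="first")]
print(len(deduped))  # 2
# ...while drop_duplicates compares column values only and keeps a single row
print(len(prices.drop_duplicates()))  # 1
```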

Create the factor

Create a simple function that computes the ATR.

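```python
def atr(data):
    # 14-day Average True Range for one ticker's rows
    df = ATR(data.high, data.low, data.close, timeperiod=14)
    # Normalize to a z-score using this ticker's mean and standard deviation
    return df.sub(df.mean()).div(df.std())

# Apply per ticker so the calculation never mixes prices from different stocks
prices["atr"] = (
    prices
    .groupby("ticker", group_keys=False)
    .apply(atr)
)
```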

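The function first computes the 14-day ATR, then normalizes it to a z-score by subtracting the ticker's mean ATR and dividing by its standard deviation. Next, apply the function to each ticker separately with groupby. Because the mean and standard deviation come from each ticker's full history, every z-score uses some information from later dates.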
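One common alternative to full-history normalization, sketched here as an assumption rather than the article's method, standardizes each value against a trailing window only, so no z-score depends on future prices. The helper name, the 252-day default window, and `min_periods` are illustrative choices; in the article's pipeline you would z-score the raw ATR this way instead of inside `atr()`.

```python
import numpy as np
import pandas as pd


def rolling_zscore(s: pd.Series, window: int = 252, min_periods: int = 63) -> pd.Series:
    """Hypothetical helper: z-score each value against the trailing window ending on that row."""
    roll = s.rolling(window, min_periods=min_periods)
    return (s - roll.mean()) / roll.std()


# Toy raw-ATR panel indexed by (ticker, date)
idx = pd.MultiIndex.from_product(
    [["AAA", "BBB"], pd.bdate_range("2023-01-02", periods=5)], names=["ticker", "date"]
)
raw = pd.Series([1, 2, 3, 4, 10, 5, 5, 6, 7, 8], index=idx, dtype=float)

# transform applies the helper per ticker and keeps the original index
z = raw.groupby(level="ticker").transform(rolling_zscore, window=3, min_periods=3)
print(z.round(3))
```

The first two rows of each ticker are NaN because the window needs three observations before it produces a value.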

Assessing factor performance

Alpha factors should help predict future returns. One common way to test this is to compute the Spearman rank correlation between the factor and future returns, often called the information coefficient (IC).
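Spearman's correlation is simply the ordinary (Pearson) correlation of the two variables' ranks, which makes it robust to outliers and to any monotonic rescaling of the factor. The made-up numbers below check that `scipy.stats.spearmanr` agrees with ranking by hand.

```python
import pandas as pd
from scipy.stats import spearmanr

# Made-up factor values and the next-period returns they are meant to predict
factor = pd.Series([0.5, -1.2, 2.0, 0.1, -0.3])
fwd_ret = pd.Series([0.010, -0.020, 0.015, 0.030, -0.010])

rho, pval = spearmanr(factor, fwd_ret)

# Same number by hand: rank both series, then take the Pearson correlation
manual = factor.rank().corr(fwd_ret.rank())
print(round(rho, 4), round(manual, 4))  # 0.7 0.7
```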

```python
# Trailing returns from 1 day up to about 3 months (63 trading days), computed per ticker
lags = [1, 5, 10, 21, 42, 63]
for lag in lags:
    prices[f"return_{lag}d"] = (
        prices.groupby(level="ticker")
        .close.pct_change(lag)
    )
```

The first step computes each ticker's trailing 1-, 5-, 10-, 21-, 42-, and 63-day returns. Once they are shifted into forward returns at several horizons, they help you understand how the factor's IC decays over time.
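A tiny example shows why the returns are computed per ticker: without the groupby, the first BBB row would be compared with AAA's last close. The two-ticker panel below is made up.

```python
import pandas as pd

# Made-up two-ticker panel indexed by (ticker, date)
idx = pd.MultiIndex.from_product(
    [["AAA", "BBB"], pd.bdate_range("2023-01-02", periods=3)], names=["ticker", "date"]
)
prices = pd.DataFrame({"close": [100.0, 110.0, 121.0, 50.0, 45.0, 54.0]}, index=idx)

# Per-ticker 1-day return: each ticker's first row is NaN instead of a cross-ticker jump
prices["return_1d"] = prices.groupby(level="ticker").close.pct_change(1)
# Without groupby, BBB's first row compares 50.0 with AAA's 121.0
prices["naive_1d"] = prices.close.pct_change(1)
print(prices.round(4))
```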

Now generate the forward returns.

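```python
# Shift each t-day trailing return back t rows, per ticker:
# the value on date d becomes the return from d to d + t
for t in [1, 5, 10, 21]:
    prices[f"target_{t}d"] = (
        prices
        .groupby(level="ticker")[f"return_{t}d"]
        .shift(-t)
    )
```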

The code creates 1-, 5-, 10-, and 21-day forward returns by shifting each ticker's trailing t-day return back t rows, so the value on date d is the return earned from d to d + t.
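To see the alignment concretely, here is a one-ticker toy example with made-up prices that builds 1- and 2-day trailing returns and shifts them into targets. The last t rows of each target are NaN because their future is not yet known.

```python
import pandas as pd

# Made-up closes for one ticker, indexed by (ticker, date) like the article's DataFrame
idx = pd.MultiIndex.from_product(
    [["AAA"], pd.bdate_range("2023-01-02", periods=4)], names=["ticker", "date"]
)
prices = pd.DataFrame({"close": [100.0, 110.0, 99.0, 118.8]}, index=idx)

for t in [1, 2]:
    # Trailing t-day return, realized on each date
    prices[f"return_{t}d"] = prices.groupby(level="ticker").close.pct_change(t)
    # Move it back t rows: the row for date d now holds the return from d to d + t
    prices[f"target_{t}d"] = prices.groupby(level="ticker")[f"return_{t}d"].shift(-t)

print(prices.round(3))
```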

Now, generate the results.

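```python
target = "target_1d"
metric = "atr"
# Scatter plot with marginal distributions: factor vs. 1-day forward return
j = sns.jointplot(x=metric, y=target, data=prices)
plt.tight_layout()

# Spearman rank correlation (IC) and its p-value, ignoring rows with missing values
df = prices[[metric, target]].dropna()
r, p = spearmanr(df[metric], df[target])
print(f"{r:,.2%} ({p:.2%})")
```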

The scatter plot shows the 1-day forward return against the ATR z-score. Most of the returns cluster near zero, even though the distribution of ATR z-scores is positively skewed.

The code also prints the Spearman rank correlation and its p-value. The correlation is negative, which is consistent with the idea that higher volatility tends to come with lower subsequent returns.
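A single correlation covers only the 1-day horizon. To see how the signal decays, you can repeat the calculation for every `target_*` column. Below is a minimal sketch with a hypothetical helper, `ic_by_horizon`, demonstrated on synthetic data rather than real results.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def ic_by_horizon(df: pd.DataFrame, factor: str, targets: list) -> pd.DataFrame:
    """Hypothetical helper: pooled Spearman IC and p-value of `factor` vs. each target."""
    rows = {}
    for target in targets:
        pair = df[[factor, target]].dropna()  # drop rows missing either value
        r, p = spearmanr(pair[factor], pair[target])
        rows[target] = {"ic": r, "p_value": p}
    return pd.DataFrame.from_dict(rows, orient="index")


# Synthetic demo: the 1-day target depends on the factor, the 5-day target is noise
rng = np.random.default_rng(0)
demo = pd.DataFrame({"atr": rng.normal(size=500)})
demo["target_1d"] = -0.5 * demo["atr"] + rng.normal(size=500)
demo["target_5d"] = rng.normal(size=500)
print(ic_by_horizon(demo, "atr", ["target_1d", "target_5d"]))
```

With the article's DataFrame, you would pass `prices`, `"atr"`, and the four `target_*` column names.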

Next steps

This issue walks through the basic steps of feature engineering. An actionable next step is to replace ATR with a different factor. You can use another technical indicator in TA-Lib, incorporate fundamental data like PE or EPS, or even download the Fama-French factors using pandas-datareader.
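As one illustration of swapping the factor, here is a sketch that replaces ATR with a pandas-only volatility measure: the 21-day rolling standard deviation of daily returns, z-scored per ticker with the full-history mean and standard deviation, as `atr()` does. The factor choice, the helper name `realized_vol_zscore`, and the synthetic prices are all hypothetical; the point is that any function mapping one ticker's series to a same-length series plugs into the same pipeline.

```python
import numpy as np
import pandas as pd


def realized_vol_zscore(close: pd.Series, window: int = 21) -> pd.Series:
    """Hypothetical replacement factor: rolling std of daily returns, z-scored per ticker."""
    vol = close.pct_change().rolling(window).std()
    return vol.sub(vol.mean()).div(vol.std())


# Synthetic two-ticker panel indexed by (ticker, date), standing in for `prices`
rng = np.random.default_rng(1)
idx = pd.MultiIndex.from_product(
    [["AAA", "BBB"], pd.bdate_range("2022-01-03", periods=300)], names=["ticker", "date"]
)
prices = pd.DataFrame(
    {"close": 100 * np.exp(np.cumsum(rng.normal(0, 0.02, len(idx))))}, index=idx
)

# transform keeps the original (ticker, date) index, so the result aligns row by row
prices["vol"] = prices.groupby(level="ticker")["close"].transform(realized_vol_zscore)
print(prices["vol"].groupby(level="ticker").agg(["mean", "std"]).round(3))
```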
