How to engineer investment alpha factors

Machine learning has revolutionized the finance industry, allowing professionals to make more informed investment decisions.

But not in the ways most people think.

Machine learning has had the most impact on investment alpha factor engineering, which transforms data into predictive signals that capture market risks. By understanding alpha factor engineering, you can develop accurate and effective investment strategies.

Today, I’ll show you how to use Python to generate an alpha factor using Average True Range (ATR) and measure its performance.

Whether you’re a beginner or an experienced professional, you’ll be equipped with the code to get started engineering investment alpha factors like the pros.

Today’s issue was inspired by Stefan Jansen’s excellent book Machine Learning for Algorithmic Trading.

Let’s dive in!

How to engineer investment alpha factors

Investment alpha factor engineering is a technique used in machine learning to transform different types of data into predictive signals. These signals, known as alpha factors, are designed to capture the risks that drive market movements. Investors then use them to make informed investment decisions.

ATR is a commonly used indicator in technical analysis that measures market volatility. Quants often use computational or technical alphas like ATR for isolating risk exposures during portfolio rebalancing.

Here’s how it’s done.

Imports and set up

We’ll use pandas for data manipulation, the OpenBB SDK for data, and TA-Lib for an easy way to compute ATR. To determine the performance of ATR as a factor, we’ll use the Spearman rank correlation. Matplotlib and Seaborn are for plotting.

First, import the libraries.

import pandas as pd

from openbb_terminal.sdk import openbb
from talib import ATR

from scipy.stats import spearmanr
import matplotlib.pyplot as plt

import seaborn as sns

Next, grab the data and do some preprocessing.

data = openbb.stocks.screener.screener_data(
    preset_loaded="most_volatile"
)

universe = data[
    (data.Country == "USA") & (data.Price > 5)
]

stocks = []
for ticker in universe.Ticker.tolist():
    df = (
        openbb
        .stocks
        .load(
            ticker, 
            start_date="2010-01-01", 
            verbose=False)
        .drop("Close", axis=1)
    )
    df["ticker"] = ticker
    stocks.append(df)

prices = pd.concat(stocks)
prices.columns = ["open", "high", "low", "close", "volume", "ticker"]

The OpenBB SDK has an awesome screener function that downloads a DataFrame of stocks based on pre-built screeners. In this case, I used the “most volatile” screener.

Next, filter the stocks to reduce the universe size. I used country and price.

Then, loop through each ticker and download price data starting in 2010. Add the ticker symbol as a column (you’ll see why in a minute), then concatenate the downloaded data together.

From there, there are two more steps.

nobs = prices.groupby("ticker").size()
mask = nobs[nobs > 2 * 12 * 21].index
prices = prices[prices.ticker.isin(mask)]

prices = (
    prices
    .set_index("ticker", append=True)
    .reorder_levels(["ticker", "date"])
).drop_duplicates()

You’ll want a sufficient amount of data for analysis, so grab only the tickers that have more than two years of daily data (2 * 12 * 21 = 504 trading days).

Finally, add ticker to the index and reorder the levels so you get a MultiIndex DataFrame with ticker first, then date.
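If you want to see what those two steps do without downloading anything, here is a minimal sketch on synthetic data. The tickers AAA and BBB, the date ranges, and the `make_frame` helper are all made up for illustration; only the filtering and indexing calls mirror the code above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the concatenated prices: one long history, one short
long_dates = pd.bdate_range("2018-01-01", periods=600, name="date")
short_dates = pd.bdate_range("2022-01-03", periods=100, name="date")


def make_frame(ticker, dates):
    # Hypothetical helper: a close price that rises by one unit per day
    close = 100.0 + np.arange(len(dates))
    return pd.DataFrame({"close": close, "ticker": ticker}, index=dates)


toy = pd.concat([make_frame("AAA", long_dates), make_frame("BBB", short_dates)])

# Keep tickers with more than 2 years * 12 months * 21 trading days of rows
min_obs = 2 * 12 * 21
nobs = toy.groupby("ticker").size()
keep = nobs[nobs > min_obs].index
toy = toy[toy.ticker.isin(keep)]

# Move ticker into the index and order the levels as (ticker, date)
toy = toy.set_index("ticker", append=True).reorder_levels(["ticker", "date"])

print(list(toy.index.names))
print(toy.index.get_level_values("ticker").unique().tolist())
```

Only AAA survives the filter because BBB has 100 rows, well under the 504-row threshold.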

Create the factor

Create a simple function that computes the ATR.

def atr(data):
    df = ATR(data.high, data.low, data.close, timeperiod=14)
    return df.sub(df.mean()).div(df.std())

prices["atr"] = (
    prices
    .groupby("ticker", group_keys=False)
    .apply(atr)
)

The function first computes the 14-period ATR, then normalizes it to a z-score using the mean and standard deviation of the data it receives. Next, the groupby applies the function to each ticker in the DataFrame separately.
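To make the calculation less of a black box, here is a rough pandas-only sketch of the same idea. It is an approximation, not a drop-in replacement: it averages the true range with a simple rolling mean, while TA-Lib's ATR applies Wilder-style smoothing, so the numbers will differ. The function names `true_range`, `simple_atr`, and `zscore` and the random price series are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def true_range(high, low, close):
    # Largest of: today's range, and the gaps from yesterday's close
    prev_close = close.shift(1)
    ranges = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()],
        axis=1,
    )
    # On the first row there is no previous close, so max() falls back to high - low
    return ranges.max(axis=1)


def simple_atr(high, low, close, window=14):
    # Assumption: a plain rolling mean instead of TA-Lib's Wilder smoothing
    return true_range(high, low, close).rolling(window).mean()


def zscore(series):
    # Same normalization as the atr() function above
    return series.sub(series.mean()).div(series.std())


# Synthetic prices for a single made-up stock
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 60).cumsum())
high = close + rng.uniform(0.1, 1.0, 60)
low = close - rng.uniform(0.1, 1.0, 60)

atr_z = zscore(simple_atr(high, low, close, window=14))
print(atr_z.dropna().describe())
```

The first 13 values are missing because a 14-period average needs 14 observations before it produces a number.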

Assessing factor performance

Alpha factors should be good at predicting future performance. The way to test this assumption is to find the Spearman rank correlation between the factor and future returns.
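If you haven't used the Spearman rank correlation before, this small sketch shows what it measures. It correlates the ranks of two series rather than their raw values, so it picks up any monotonic relationship, not just a linear one. The arrays `x` and `y` are toy data made up for the example.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# Toy data: y always rises with x, but not in a straight line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3

rho, _ = spearmanr(x, y)                          # rank correlation
r_ranks, _ = pearsonr(rankdata(x), rankdata(y))   # Pearson on the ranks
r_linear, _ = pearsonr(x, y)                      # Pearson on raw values

print(f"spearman={rho:.4f} pearson_on_ranks={r_ranks:.4f} pearson={r_linear:.4f}")
```

The Spearman value matches the Pearson correlation of the ranks, while the plain Pearson correlation is lower because the relationship is curved.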

lags = [1, 5, 10, 21, 42, 63]
for lag in lags:
    prices[f"return_{lag}d"] = (
        prices.groupby(level="ticker")
        .close.pct_change(lag)
    )

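The first step creates 1, 5, 10, 21, 42, and 63 day lagged returns. Having several horizons helps you understand how the factor's information coefficient (IC), the rank correlation between the factor and future returns, decays over time.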

Now generate the forward returns.

for t in [1, 5, 10, 21]:
    prices[f"target_{t}d"] = (
        prices
        .groupby(level="ticker")[f"return_{t}d"]
        .shift(-t)
    )

The code creates 1, 5, 10, and 21 day forward returns by shifting the matching lagged returns back in time within each ticker.
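The shift is easy to get backwards, so here is a tiny sketch you can use to check the alignment. The single ticker AAA and its four prices are made up; the two groupby lines are the same pattern as the code above.

```python
import pandas as pd

# One made-up ticker whose price rises 10% every day
idx = pd.MultiIndex.from_product(
    [["AAA"], pd.date_range("2024-01-01", periods=4)],
    names=["ticker", "date"],
)
toy = pd.DataFrame({"close": [100.0, 110.0, 121.0, 133.1]}, index=idx)

# Lagged return: today's close versus yesterday's close
toy["return_1d"] = toy.groupby(level="ticker").close.pct_change(1)

# Forward return: pull tomorrow's lagged return back onto today's row
toy["target_1d"] = toy.groupby(level="ticker")["return_1d"].shift(-1)

print(toy)
```

The lagged return is missing on the first row and the forward return is missing on the last row, which is why the analysis drops missing values before computing the correlation.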

Now, generate the results.

target = "target_1d"
metric = "atr"
j = sns.jointplot(x=metric, y=target, data=prices)
plt.tight_layout()

df = prices[[metric, target]].dropna()
r, p = spearmanr(df[metric], df[target])
print(f"{r:,.2%} ({p:.2%})")


The scatter plot represents the forward return against the ATR z-score. You can see that most of the returns cluster near zero despite positive skew in the distribution of ATR metrics.

The code also prints out the Spearman rank correlation and its p-value. The correlation is negative, which makes sense: the higher the volatility, the lower the forward returns.
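The result above covers only the 1-day target. One way to look at IC decay is to repeat the same pooled Spearman calculation for each forward horizon. The sketch below does that on synthetic data so it runs on its own: the three tickers, the random-walk prices, and the `factor` column are placeholders I made up, and the factor is pure noise, so the ICs it prints should sit near zero and mean nothing. Swap in the real prices and the atr column to get real numbers.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

rng = np.random.default_rng(42)
dates = pd.bdate_range("2022-01-03", periods=300, name="date")

# Synthetic random-walk prices for three made-up tickers
frames = []
for ticker in ["AAA", "BBB", "CCC"]:
    close = 50 * np.exp(rng.normal(0, 0.02, len(dates)).cumsum())
    frames.append(pd.DataFrame({"ticker": ticker, "close": close}, index=dates))

toy = (
    pd.concat(frames)
    .set_index("ticker", append=True)
    .reorder_levels(["ticker", "date"])
)

# Placeholder factor: random noise standing in for the ATR z-score
toy["factor"] = rng.normal(size=len(toy))

rows = []
for t in [1, 5, 10, 21]:
    # t-day return, shifted back t days within each ticker to make it forward-looking
    lagged = toy.groupby(level="ticker").close.pct_change(t)
    forward = lagged.groupby(level="ticker").shift(-t).rename("target")
    pair = pd.concat([toy["factor"], forward], axis=1).dropna()
    r, p = spearmanr(pair["factor"], pair["target"])
    rows.append({"horizon": t, "ic": r, "p_value": p, "n_obs": len(pair)})

ic_table = pd.DataFrame(rows).set_index("horizon")
print(ic_table)
```

Each longer horizon loses a few observations per ticker because the last t rows have no forward return.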

Next steps

This issue walks through the basic steps of feature engineering. An actionable next step is to replace ATR with a different factor. You can use another technical indicator in TA-Lib, incorporate fundamental data like PE or EPS, or even download the Fama-French factors using pandas-datareader.
