# How to engineer investment alpha factors

Machine learning has revolutionized the finance industry, allowing professionals to make more informed investment decisions.

But not in the ways most people think.

Machine learning has had the most impact on investment alpha factor engineering, which transforms data into predictive signals that capture market risks. By understanding alpha factor engineering, you can develop accurate and effective investment strategies.

Today, I’ll show you how to use Python to generate an alpha factor using Average True Range (ATR) and measure its performance.

Whether you’re a beginner or an experienced professional, you’ll be equipped with the code to get started engineering investment alpha factors like the pros.

Today’s issue was inspired by Stefan Jansen’s excellent book Machine Learning for Algorithmic Trading.

Let’s dive in!

## How to engineer investment alpha factors

Investment alpha factor engineering is a machine learning technique for transforming different types of data into predictive signals. These signals, known as alpha factors, are designed to capture the risks that drive market movements, and they are used to make informed investment decisions.

ATR is a commonly used technical analysis indicator that measures market volatility. Quants often use computational or technical alphas like ATR to isolate risk exposures during portfolio rebalancing.
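For reference, ATR is built from the true range: the largest of the high minus the low, the distance from the high to the previous close, and the distance from the low to the previous close. Those values are then averaged with Wilder's smoothing. The pandas sketch below shows that calculation. It is an illustration, not TA-Lib's implementation, and because it seeds the average differently, its earliest values will not match TA-Lib's `ATR` exactly. The function names `true_range` and `atr_wilder` are hypothetical.

```python
import pandas as pd


def true_range(high: pd.Series, low: pd.Series, close: pd.Series) -> pd.Series:
    """Largest of: high - low, |high - prev close|, |low - prev close|."""
    prev_close = close.shift(1)
    ranges = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()],
        axis=1,
    )
    # max(axis=1) skips NaN, so the first bar falls back to high - low
    return ranges.max(axis=1)


def atr_wilder(high, low, close, period: int = 14) -> pd.Series:
    """Average True Range using Wilder's smoothing (alpha = 1 / period).

    Sketch only: seeded with the first true range, unlike TA-Lib,
    which seeds with a simple average of the first `period` values.
    """
    tr = true_range(high, low, close)
    # adjust=False gives the recursion ATR_t = ATR_{t-1} + (TR_t - ATR_{t-1}) / period
    return tr.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()


# Four made-up bars
bars = pd.DataFrame(
    {
        "high": [10.0, 11.0, 12.0, 11.5],
        "low": [9.0, 10.0, 10.5, 10.0],
        "close": [9.5, 10.8, 11.0, 10.2],
    }
)
print(true_range(bars.high, bars.low, bars.close).tolist())  # [1.0, 1.5, 1.5, 1.5]
print(atr_wilder(bars.high, bars.low, bars.close, period=2).tolist())
```

The second bar shows why the previous close matters: the bar's own range is only 1.0, but the gap up from the prior close of 9.5 to the high of 11.0 makes the true range 1.5.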

Here’s how it’s done.

## Imports and setup

We’ll use pandas for data manipulation, the OpenBB SDK for data, and TA-Lib for an easy way to compute ATR. To measure the performance of ATR as a factor, we’ll use the Spearman rank correlation. Matplotlib and Seaborn handle the plotting.

First, import the libraries.

```python
import pandas as pd

from openbb_terminal.sdk import openbb
from talib import ATR

from scipy.stats import spearmanr
import matplotlib.pyplot as plt
import seaborn as sns
```

Next, grab the data and do some preprocessing.

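```python
# Download the results of the "most volatile" pre-built screener
data = openbb.stocks.screener.screener_data("most_volatile")

# Keep US stocks trading above $5
universe = data[
    (data.Country == "USA") & (data.Price > 5)
]

stocks = []
for ticker in universe.Ticker.tolist():
    df = (
        openbb
        .stocks
        .load(
            ticker,
            start_date="2010-01-01",
            verbose=False,
        )
        # Drop the unadjusted close so the adjusted close fills the "close" slot below
        .drop("Close", axis=1)
    )
    # Tag each row with its ticker so the stocks stay distinguishable after concatenation
    df["ticker"] = ticker
    stocks.append(df)

prices = pd.concat(stocks)
prices.columns = ["open", "high", "low", "close", "volume", "ticker"]
```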

The OpenBB SDK has an awesome screener function that downloads a DataFrame of stocks based on pre-built screeners. In this case, I used the “most volatile” screener.

Next, filter the stocks to reduce the size of the universe. I kept US stocks priced above $5.

Then, loop through each ticker and download its price data starting in 2010. Add the ticker symbol as a column (you’ll see why in a minute), then concatenate the downloaded data into a single DataFrame.
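The OpenBB calls need network access. To try the rest of the pipeline offline, here is a minimal sketch that builds the same long-format `prices` frame from synthetic bars. `fake_load` is a hypothetical stand-in for the loader, and its random-walk prices are made up purely for illustration.

```python
import numpy as np
import pandas as pd


def fake_load(ticker: str, n_days: int = 600, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for a price loader: random OHLCV bars with the
    column layout the code above ends up with after dropping "Close"."""
    rng = np.random.default_rng(seed)
    dates = pd.bdate_range("2020-01-01", periods=n_days, name="date")
    close = 50 * np.exp(np.cumsum(rng.normal(0, 0.02, n_days)))
    high = close * (1 + rng.uniform(0, 0.02, n_days))
    low = close * (1 - rng.uniform(0, 0.02, n_days))
    open_ = low + (high - low) * rng.uniform(0, 1, n_days)
    volume = rng.integers(100_000, 1_000_000, n_days)
    return pd.DataFrame(
        {"open": open_, "high": high, "low": low, "close": close, "volume": volume},
        index=dates,
    )


stocks = []
for i, ticker in enumerate(["AAA", "BBB", "CCC"]):
    # CCC gets a short history so the next filtering step has something to drop
    df = fake_load(ticker, n_days=600 if ticker != "CCC" else 300, seed=i)
    df["ticker"] = ticker  # same tagging step as the real loop
    stocks.append(df)

prices = pd.concat(stocks)
print(prices.shape)  # (1500, 6)
```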

From there, there are two more steps.

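```python
# Count the observations per ticker and keep tickers with more than ~two years of daily bars
nobs = prices.groupby("ticker").size()
mask = nobs[nobs > 2 * 12 * 21].index
prices = prices[prices.ticker.isin(mask)]

# Build a (ticker, date) MultiIndex
prices = (
    prices
    .rename_axis("date")  # make sure the date index is named "date"
    .set_index("ticker", append=True)
    .reorder_levels(["ticker", "date"])
).drop_duplicates()
```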

You’ll want a sufficient amount of data for analysis, so keep only the tickers with more than two years of data (2 × 12 × 21 = 504 trading days).

Finally, move the ticker into the index and reorder the levels so the DataFrame has a MultiIndex with ticker first, then date.
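Here is the same filtering and re-indexing logic on a tiny, made-up frame. The threshold is lowered to 3 rows (a hypothetical stand-in for `2 * 12 * 21`) so the effect is visible.

```python
import pandas as pd

# Toy long-format frame: "AAA" has 5 rows, "BBB" has only 2
prices = pd.DataFrame(
    {
        "close": [10.0, 10.5, 10.2, 10.8, 11.0, 20.0, 20.4],
        "ticker": ["AAA"] * 5 + ["BBB"] * 2,
    },
    index=pd.DatetimeIndex(
        ["2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05", "2023-01-06",
         "2023-01-02", "2023-01-03"],
        name="date",
    ),
)

min_obs = 3  # stand-in for 2 * 12 * 21
nobs = prices.groupby("ticker").size()
keep = nobs[nobs > min_obs].index
prices = prices[prices.ticker.isin(keep)]  # drops BBB

prices = prices.set_index("ticker", append=True).reorder_levels(["ticker", "date"])
print(prices.index.get_level_values("ticker").unique().tolist())  # ['AAA']
```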

## Create the factor

Create a simple function that computes the ATR.

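```python
def atr(data):
    # 14-period ATR from TA-Lib, computed on a single ticker's bars
    df = ATR(data.high, data.low, data.close, timeperiod=14)
    # Standardize to a z-score using this ticker's mean and standard deviation
    return df.sub(df.mean()).div(df.std())


# Apply per ticker so ATR never mixes bars from different stocks
prices["atr"] = (
    prices
    .groupby("ticker", group_keys=False)
    .apply(atr)
)
```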

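The function first computes the 14-period ATR, then normalizes it to a z-score. Next, apply the function to each ticker in the DataFrame. Note that the mean and standard deviation are computed over each ticker’s full history, so early values are normalized with information from later dates (look-ahead bias).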
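If look-ahead bias matters for your test, one alternative (swapped in here, not the approach used above) is to standardize each value against a trailing window only. This sketch assumes a raw ATR column named `atr_raw` and a 252-day window; both are hypothetical choices, and the data is random.

```python
import numpy as np
import pandas as pd


def rolling_zscore(s: pd.Series, window: int = 252) -> pd.Series:
    """Z-score each value against the trailing `window` observations only,
    so values from later dates never influence earlier z-scores."""
    mean = s.rolling(window, min_periods=window).mean()
    std = s.rolling(window, min_periods=window).std()
    return (s - mean) / std


# Toy (ticker, date) panel with a made-up raw ATR column
idx = pd.MultiIndex.from_product(
    [["AAA", "BBB"], pd.bdate_range("2023-01-02", periods=300)],
    names=["ticker", "date"],
)
rng = np.random.default_rng(42)
prices = pd.DataFrame({"atr_raw": rng.uniform(0.5, 2.0, len(idx))}, index=idx)

# transform keeps the original index, one z-score series per ticker
prices["atr_z"] = prices.groupby(level="ticker")["atr_raw"].transform(
    rolling_zscore, window=252
)
```

The trade-off is data loss: the first `window - 1` rows of each ticker have no z-score, so a shorter window keeps more history at the cost of noisier normalization.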

## Assessing factor performance

Alpha factors should predict future returns. To test this, compute the Spearman rank correlation between the factor and forward returns. This correlation is known as the information coefficient (IC).
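Spearman’s rank correlation is the Pearson correlation of the ranks, which makes it robust to the outliers common in returns. The sketch below checks that equivalence on synthetic data; the factor and the forward returns are random numbers, not market data.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

rng = np.random.default_rng(7)
factor = pd.Series(rng.normal(size=500))
# Made-up forward returns that load negatively on the factor, plus noise
fwd_ret = -0.5 * factor + pd.Series(rng.normal(size=500))

rho, pval = spearmanr(factor, fwd_ret)

# Same statistic by hand: Pearson correlation of the ranks
rho_manual = factor.rank().corr(fwd_ret.rank())

print(f"scipy: {rho:.4f}, manual: {rho_manual:.4f}, p-value: {pval:.2e}")
```

Because only ranks matter, applying any strictly increasing transformation to the whole factor column leaves the IC unchanged.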

```python
lags = [1, 5, 10, 21, 42, 63]
for lag in lags:
    prices[f"return_{lag}d"] = (
        prices.groupby(level="ticker")
        .close.pct_change(lag)
    )
```

The first step computes trailing 1-, 5-, 10-, 21-, 42-, and 63-day returns for each ticker. These are helpful for understanding how the factor’s IC decays over time.
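Grouping by ticker matters here: without it, `pct_change` would compute a return across the boundary between the last row of one ticker and the first row of the next. The toy example below, with made-up prices, shows the per-ticker calculation.

```python
import pandas as pd

dates = pd.bdate_range("2023-01-02", periods=3)
idx = pd.MultiIndex.from_tuples(
    [("AAA", d) for d in dates] + [("BBB", d) for d in dates],
    names=["ticker", "date"],
)
prices = pd.DataFrame({"close": [100.0, 110.0, 121.0, 50.0, 55.0, 44.0]}, index=idx)

for lag in [1, 2]:
    # Grouping restarts the calculation at each ticker, so BBB's first
    # 1-day return is NaN rather than 50 / 121 - 1
    prices[f"return_{lag}d"] = prices.groupby(level="ticker").close.pct_change(lag)

print(prices)
```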

Now generate the forward returns.

```python
for t in [1, 5, 10, 21]:
    prices[f"target_{t}d"] = (
        prices
        .groupby(level="ticker")[f"return_{t}d"]
        .shift(-t)
    )
```

The code creates 1-, 5-, 10-, and 21-day forward returns by shifting each trailing return back by the same number of days, so the value on date d is the return from d to d + t.
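To see why shifting by `-t` yields a forward return, this sketch compares the shifted trailing return with the return computed directly from future closes, using a single made-up ticker.

```python
import pandas as pd

dates = pd.bdate_range("2023-01-02", periods=5)
idx = pd.MultiIndex.from_product([["AAA"], dates], names=["ticker", "date"])
prices = pd.DataFrame({"close": [100.0, 102.0, 101.0, 105.0, 110.0]}, index=idx)

t = 2
trailing = prices.groupby(level="ticker").close.pct_change(t)
# Moving the trailing t-day return back t rows turns it into the return
# earned over the next t days: close[d + t] / close[d] - 1
prices[f"target_{t}d"] = trailing.groupby(level="ticker").shift(-t)

print(prices)
```

The last `t` rows of each ticker have no forward return because their future prices are not in the data, which is why the evaluation step drops NaN rows.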

Now, generate the results.

```python
target = "target_1d"
metric = "atr"
j = sns.jointplot(x=metric, y=target, data=prices)
plt.tight_layout()

df = prices[[metric, target]].dropna()
r, p = spearmanr(df[metric], df[target])
print(f"{r:,.2%} ({p:.2%})")
```

The scatter plot shows the 1-day forward return against the ATR z-score. Most of the returns cluster near zero, despite the positive skew in the distribution of ATR values.

The code also prints the Spearman rank correlation and its p-value. The correlation is negative, which makes sense: the higher the volatility, the lower the subsequent returns.
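The correlation above pools every (ticker, date) row, so it mixes cross-sectional and time-series variation. Factor-evaluation tools such as Alphalens instead compute the IC separately for each date across the cross-section and then average it. Repeating that calculation for each target horizon shows how the IC decays. The sketch below does both on a synthetic panel; the tickers, factor values, and the small negative loading of returns on the factor are all invented, so its printed numbers say nothing about real ATR performance.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Synthetic panel: 50 hypothetical tickers x 250 business days
rng = np.random.default_rng(0)
idx = pd.MultiIndex.from_product(
    [[f"T{i:02d}" for i in range(50)], pd.bdate_range("2022-01-03", periods=250)],
    names=["ticker", "date"],
)
prices = pd.DataFrame({"atr": rng.normal(size=len(idx))}, index=idx)
for t in [1, 5, 10, 21]:
    # Random targets with a small negative loading on the factor (illustration only)
    prices[f"target_{t}d"] = -0.05 * prices["atr"] + rng.normal(size=len(idx))


def pooled_ic(df: pd.DataFrame, factor: str, target: str) -> float:
    """Spearman correlation over all (ticker, date) rows, as in the code above."""
    clean = df[[factor, target]].dropna()
    return spearmanr(clean[factor], clean[target])[0]


def daily_ic(df: pd.DataFrame, factor: str, target: str) -> pd.Series:
    """Cross-sectional Spearman correlation computed separately for each date."""
    clean = df[[factor, target]].dropna()
    return clean.groupby(level="date").apply(
        lambda g: spearmanr(g[factor], g[target])[0]
    )


for t in [1, 5, 10, 21]:
    target = f"target_{t}d"
    ic = daily_ic(prices, "atr", target)
    print(
        f"{target}: pooled IC {pooled_ic(prices, 'atr', target):+.3f}, "
        f"mean daily IC {ic.mean():+.3f}"
    )
```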

## Next steps

This issue walks through the basic steps of factor engineering. An actionable next step is to replace ATR with a different factor. You can use another technical indicator from TA-Lib, incorporate fundamental data like P/E or EPS, or even download the Fama-French factors using pandas-datareader.
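One way to make that swap easy is to pass the per-ticker factor function as an argument. In this sketch, `add_factor` and `realized_vol` (a 21-day rolling standard deviation of daily returns, z-scored per ticker like `atr()` above) are hypothetical names, and the prices are synthetic.

```python
import numpy as np
import pandas as pd


def realized_vol(df: pd.DataFrame, window: int = 21) -> pd.Series:
    """Hypothetical replacement factor: rolling std of daily close-to-close
    returns, z-scored over the ticker's full history (mirroring atr())."""
    vol = df["close"].pct_change().rolling(window).std()
    return vol.sub(vol.mean()).div(vol.std())


def add_factor(prices: pd.DataFrame, func, name: str) -> pd.DataFrame:
    """Apply a per-ticker factor function and store the result as a column."""
    prices[name] = prices.groupby(level="ticker", group_keys=False).apply(func)
    return prices


# Synthetic (ticker, date) panel for illustration
rng = np.random.default_rng(1)
idx = pd.MultiIndex.from_product(
    [["AAA", "BBB"], pd.bdate_range("2023-01-02", periods=100)],
    names=["ticker", "date"],
)
prices = pd.DataFrame(
    {"close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(idx))))},
    index=idx,
)
prices = add_factor(prices, realized_vol, "vol_21d")
```

Everything downstream (forward returns, Spearman IC, plots) then only needs the new column name in place of `"atr"`.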