Build and run a backtest like the pros

In today’s issue, I’m going to show you how to build a backtest for a trading strategy.

A backtest is a way to test trading ideas against historical market data. It’s a simulation of how the strategy might have performed in the market. Usually, traders will optimize performance metrics like the Sharpe ratio by tweaking input parameters like the lookback period.

Unfortunately, most beginners spend all their time tweaking backtests only to find they don’t work in real life. Even with out-of-sample data, cross-validation, and walk-forward analysis, backtest results are often way off. The majority of trading systems with a positive backtest are actually unprofitable.

Why?

Treat backtesting as an experiment

Suppose your strategy has $0 expected profit (i.e. it trades randomly) but you don’t know it. Ignoring costs, a random strategy will produce positive results in about 50% of cases and negative results in about 50% of cases. Results will rarely be exactly $0.

What do most people do with a negative backtest result? Tweak the parameters until it’s positive. With enough tweaks, even a random strategy will eventually show a profit, which says nothing about how it will trade live. To get around this problem, professionals will backtest their backtest.
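
The coin-flip odds described above can be checked with a quick simulation. This is a minimal sketch, not part of the workflow in this issue: it assumes a zero-edge strategy whose daily P&L is plain Gaussian noise, it treats each "tweak" as an independent re-run (real parameter tweaks are correlated), and the names `random_backtest_pnl` and `n_tweaks` are made up for illustration.

```python
import random

def random_backtest_pnl(rng, n_days=252):
    # total P&L of a zero-edge strategy: sum of daily noise with mean 0
    return sum(rng.gauss(0.0, 1.0) for _ in range(n_days))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n_backtests = 2000

# share of single random backtests that end up positive
single = [random_backtest_pnl(rng) for _ in range(n_backtests)]
share_positive = sum(pnl > 0 for pnl in single) / n_backtests

# share that look positive if you keep the best of 10 "tweaks"
n_tweaks = 10
best_of = [
    max(random_backtest_pnl(rng) for _ in range(n_tweaks))
    for _ in range(200)
]
share_positive_after_tweaking = sum(pnl > 0 for pnl in best_of) / len(best_of)

print(f"positive on first try: {share_positive:.0%}")
print(f"positive after keeping the best of {n_tweaks}: {share_positive_after_tweaking:.0%}")
```

The first number should land near one half, and the second close to 100%, which is why a positive result after tweaking is weak evidence on its own.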

I’m going to show you how.

By the end of this issue, you will know how to:

Set up a backtest with bt
Run a backtest and analyze the results
Perform a Monte Carlo simulation on your backtest

All in Python.

Let’s get started!

Step 1: Imports and setup

bt is a flexible backtesting framework for Python used to test quantitative trading strategies. Import yfinance (for market data) and Matplotlib too.

%matplotlib inline
import bt
import matplotlib.pyplot as plt
import yfinance as yf

Fund managers report their holdings every month. They don’t want to tell investors they lost money on the latest meme stock. So they sell their meme stocks and buy higher quality assets, like bonds.

We might be able to take advantage of this effect by buying bonds toward the end of the month and selling them at the beginning.

Start by getting data for the bond ETF, TLT.

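from datetime import date
today = date.today().isoformat()
# auto_adjust=False keeps the 'Adj Close' column (recent yfinance versions adjust by default)
data = yf.download("TLT", start="2010-01-01", end=today, auto_adjust=False)
# keep only the adjusted close and name the column after the ticker
data = data[['Adj Close']].set_axis(['TLT'], axis=1)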

Next, we need a few helper functions. The first takes daily portfolio weights and builds a bt strategy. bt has a library of built-in algos that take care of the logic for you. This function weights the portfolio based on the input and rebalances daily.

def build_strategy(weights):
    return bt.Strategy(
        'wd',
        [bt.algos.SelectAll(),
         bt.algos.WeighTarget(weights),
         bt.algos.Rebalance()]
    )

The backtest itself is built by a function, build_backtest, that takes the strategy you just built, market data, initial capital, and a commission model. Define the commission model first.

def commission_model(q, p):

    # p is price, q is quantity
    val = abs(q * p)
    if val > 2000:
        return 8.6
    if val > 1000:
        return 4.3
    if val > 100:
        return 1.5
    return 1.0

Your commission model can be anything you want. It just needs price and quantity.
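
As an illustration of that flexibility, here is a hypothetical per-share model. The function name and the rates ($0.005 per share with a $1 minimum) are made-up assumptions for the example, not a real broker's schedule. The only thing taken from the text above is the (quantity, price) argument order.

```python
def per_share_commission(q, p):
    # q is quantity, p is price (same argument order as commission_model)
    # hypothetical rates: $0.005 per share with a $1.00 minimum per trade
    # p is unused here but kept so the signature stays the same
    return max(1.0, 0.005 * abs(q))

# a 100-share trade pays the minimum, a 1,000-share trade pays per share
small_trade = per_share_commission(100, 95.0)
large_trade = per_share_commission(-1000, 95.0)
print(small_trade, large_trade)
```

Taking the absolute value of the quantity means buys and sells are charged the same way.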

def add_dom(df):

    # add the day of month and return
    added = df.copy()
    added["day_of_month"] = df.index.day
    return added

The add_dom function adds a column to the DataFrame with the day of the month. I want to be long TLT for the last week of the month and short during the first week. To do this, I need to know the day of the month.

def add_weights(df, symbol):

    # copy the price column so the weights share its index and column name
    strategy = df[[symbol]].copy()

    # start with no position within the month
    strategy.loc[:] = 0

    # short within the first week of the month
    strategy.loc[df.day_of_month <= 7] = -1

    # long during the last week of the month
    strategy.loc[df.day_of_month >= 23] = 1

    return strategy

The last function weights the portfolio 100% short during the first week of the month and 100% long during the last week of the month. All other days the strategy is out of the market.
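
To see how the two thresholds carve up a month, the sketch below applies the same rule to plain day numbers and counts the days in each state. It counts calendar days rather than trading days, and the helper names `weight_for_day` and `count_positions` are hypothetical, introduced only for this example.

```python
import calendar

def weight_for_day(day_of_month):
    # same rule as add_weights, written for a single day number
    if day_of_month >= 23:
        return 1    # long during the last week
    if day_of_month <= 7:
        return -1   # short during the first week
    return 0        # flat otherwise

def count_positions(year, month):
    # number of calendar days in the month
    n_days = calendar.monthrange(year, month)[1]
    weights = [weight_for_day(d) for d in range(1, n_days + 1)]
    return {
        "short": weights.count(-1),
        "flat": weights.count(0),
        "long": weights.count(1),
    }

print(count_positions(2024, 1))  # a 31-day month
print(count_positions(2023, 2))  # a 28-day month
```

Because the long window always starts on the 23rd, it spans between 6 and 9 calendar days depending on the length of the month, while the short window is always 7.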

initial_capital = 10_000

Finally, set the initial capital.
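
Step 2 calls build_backtest, whose definition does not appear in this issue. The following is a minimal sketch of what it might look like, assuming it simply wraps bt.Backtest and passes the capital and commission function through as the initial_capital and commissions keyword arguments. The integer_positions=False setting is also an assumption, chosen so a small account can hold fractional shares.

```python
def build_backtest(strategy, data, initial_capital, commission_model):
    # imported here so the sketch stands alone; the notebook imports bt at the top
    import bt

    return bt.Backtest(
        strategy,
        data,
        initial_capital=initial_capital,
        commissions=commission_model,
        # assumption: allow fractional shares so a small account can be fully invested
        integer_positions=False,
    )
```

If you would rather trade whole shares only, drop the integer_positions argument and bt falls back to its default.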

Step 2: Run the initial backtest

Now that everything is in place, run the backtest.

# add the day of month
data_with_dom = add_dom(data)

# get the portfolio weights
weights = add_weights(data_with_dom, 'TLT')

# build the bt strategy
strategy = build_strategy(weights)

# build the backtest
backtest = build_backtest(strategy, data, initial_capital, commission_model)

# run the backtest
first_res = bt.run(backtest)

Call first_res.display() to print performance statistics about the strategy. Make note of the daily Sharpe, which we’ll use next.
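
For orientation, a daily Sharpe ratio is commonly computed as the mean daily return divided by its standard deviation, annualized by the square root of the number of trading days in a year. The sketch below assumes a zero risk-free rate and 252 trading days. The function name and the sample returns are illustrative, and the figure bt reports may differ in its details.

```python
import math
import statistics

def annualized_daily_sharpe(daily_returns, periods_per_year=252):
    # mean and sample standard deviation of the daily returns
    mean = statistics.mean(daily_returns)
    std = statistics.stdev(daily_returns)
    # scale to an annual figure; the risk-free rate is assumed to be zero
    return mean / std * math.sqrt(periods_per_year)

# made-up daily returns, for illustration only
returns = [0.01, -0.005, 0.002, 0.007, -0.003]
print(round(annualized_daily_sharpe(returns), 2))
```

A higher value means more return per unit of day-to-day volatility, which is why it is a convenient single number to compare across runs.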

bt makes it easy to plot the results, too.

first_res.plot(figsize=(20, 10))

And the weights.

first_res.plot_weights('wd', figsize=(20, 5))

Step 3: Backtest the backtest

We need one more function.

def shuffle_prices(df):

    # randomly shuffle the prices without replacement
    shuffled = df.sample(frac=1)

    # reset the index
    shuffled.index = df.index

    return shuffled

This shuffles the prices and resets the date index.

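Why do we do this?

Shuffling destroys any relationship between the calendar and the prices, so on shuffled data the strategy can only make money by chance. I’m going to run 1,000 backtests on shuffled prices, plot the resulting Sharpe ratios, and see where the original backtest falls in that distribution.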

runs = 1000
initial_sharpe = first_res['wd'].daily_sharpe
sharpes = []

Set the number of runs and grab the daily Sharpe ratio from the first backtest. Finally, create a list to capture the Sharpe ratios.

for run in range(runs):

    # shuffle the prices
    shuffled = shuffle_prices(data)

    # add the day of month
    shuffled_with_dom = add_dom(shuffled)

    # add the weights
    weights = add_weights(shuffled_with_dom, 'TLT')

    # build the strategy
    strategy = build_strategy(weights)

    # build the backtest on the shuffled prices only, as in Step 2
    backtest = build_backtest(strategy, shuffled, initial_capital, commission_model)

    # run the backtest
    res = bt.run(backtest)

    # accumulate sharpe ratios
    sharpe = res['wd'].daily_sharpe
    sharpes.append(sharpe)

This loop runs the backtest against randomly shuffled prices. It then accumulates the Sharpe ratios which are based on the random data.

Finally, find out where the Sharpe ratio is in the distribution of random backtest results.

dist = plt.hist(sharpes, bins=10)
plt.axvline(initial_sharpe, linestyle='dashed', linewidth=1)

Run a simple significance test by computing an empirical p-value. The p-value is N / runs, where N is the number of random results that are better than our strategy.

N = sum(i > initial_sharpe for i in sharpes)
p_value = N / runs
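
The same calculation on a toy list makes the arithmetic concrete. The Sharpe values below are invented for illustration and are not results from the backtest, and the function name `empirical_p_value` is hypothetical.

```python
def empirical_p_value(random_sharpes, observed_sharpe):
    # share of random results that beat the observed one
    n_better = sum(s > observed_sharpe for s in random_sharpes)
    return n_better / len(random_sharpes)

# invented numbers, for illustration only
toy_sharpes = [-0.4, -0.1, 0.0, 0.2, 0.3, 0.5, 0.6, 0.9, 1.1, 1.4]
toy_observed = 1.0

p = empirical_p_value(toy_sharpes, toy_observed)
print(p)
```

Two of the ten toy values beat 1.0, so the p-value is 0.2, far too high to call that toy result significant.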

Very few randomized tests have a better result than our backtest. Indeed, the p-value is below 1%, which means the result is statistically significant at the 1% level. This gives us some confidence that the strategy can achieve a similar result in real trading.

Well, that's it for today. I hope you enjoyed it.

See you again next week.