Easily cross-validate parameters to boost your trading strategy

Trading strategies often rely on parameters.

Tuning these parameters and cross-validating them effectively can provide a competitive advantage in the market.

However, poorly designed cross-validation schemes can introduce look-ahead bias and other pitfalls that lead to overestimating a strategy's performance.

In today’s newsletter, we’ll use VectorBT PRO to easily implement a variety of sophisticated cross-validation methods with just a few lines of code.

Ready?


VectorBT PRO offers several features that are highly beneficial for traders.

It allows for lightning-fast testing of trading strategies over historical data using a vectorized approach.

The framework supports extensive customization and optimization, enabling traders to fine-tune strategies to specific market conditions or personal trading styles.

VectorBT PRO is designed to handle large datasets and complex analyses efficiently. It is tightly integrated with pandas, which makes it easy to fit into existing data processing pipelines.

Let’s see how it works.

Imports and setup

Let's import VBT PRO and the few libraries relevant to our analysis.

```python
import numpy as np
from pandas.tseries.frequencies import to_offset
import vectorbtpro as vbt

vbt.settings.set_theme("dark")
```

Grab the data for your favorite asset. We’ll use AAPL.

```python
SYMBOL = "AAPL"
START = "2010"
END = "now"
TIMEFRAME = "day"

data = vbt.YFData.pull(
    SYMBOL,
    start=START,
    end=END,
    timeframe=TIMEFRAME
)
```

Cross-validation schema

Next, we'll set up a "splitter," which divides a date range into smaller segments according to a chosen schema. For instance, let's allocate 12 months to training data and the following 12 months to testing data, with a new split starting every 3 months.

```python
TRAIN = 12  # months in each training set
TEST = 12  # months in each testing set
EVERY = 3  # start a new split every 3 months
OFFSET = "MS"  # month-start frequency

splitter = vbt.Splitter.from_ranges(
    data.index,
    every=f"{EVERY}{OFFSET}",
    lookback_period=f"{TRAIN + TEST}{OFFSET}",
    split=(
        # Training set: the first TRAIN months of each range
        vbt.RepFunc(lambda index: index < index[0] + TRAIN * to_offset(OFFSET)),
        # Testing set: the remaining TEST months
        vbt.RepFunc(lambda index: index >= index[0] + TRAIN * to_offset(OFFSET)),
    ),
    set_labels=["train", "test"]
)
splitter.plots().show_png()
```

First, we create a new range every EVERY months, each spanning a lookback period of TRAIN + TEST months.

The split argument defines the training set as the first TRAIN months and the testing set as the subsequent TEST months in each split, while the set_labels argument names these segments.
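As a rough mental model, the rolling schema can be sketched in plain Python. This is a simplified illustration, not how VBT PRO builds its ranges: the hypothetical helper rolling_splits starts each window on the first day of a month, takes the first TRAIN months as the training set and the next TEST months as the testing set, and starts a new window every EVERY months. The exact boundaries the splitter derives from data.index may differ.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a first-of-month date by a whole number of months."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def rolling_splits(first: date, last: date, train: int, test: int, every: int):
    """Hypothetical helper yielding (train_start, test_start, test_end).

    Intervals are half-open: training covers [train_start, test_start) and
    testing covers [test_start, test_end). A new window starts every `every`
    months, and only windows that end on or before `last` are kept.
    """
    start = first
    while add_months(start, train + test) <= last:
        test_start = add_months(start, train)
        yield start, test_start, add_months(test_start, test)
        start = add_months(start, every)


splits = list(rolling_splits(date(2010, 1, 1), date(2014, 1, 1), train=12, test=12, every=3))
for train_start, test_start, test_end in splits[:3]:
    print(f"train [{train_start}, {test_start})  test [{test_start}, {test_end})")
```

Over 2010 to 2014 with the 12/12/3 schema, this yields nine windows; the first trains on 2010 and tests on 2011, and each later window is shifted forward by one quarter.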

The splitter.plots().show_png() command results in the following visualization:

[Figure: output of splitter.plots(), with one subplot showing the training and testing sets of each split and another showing the overlap of each data point across ranges]

In the first subplot, we see that each split (or row) contains adjacent training and testing sets, progressively rolling from past to present.

The second subplot illustrates the overlap of each data point across different ranges. Tip: for non-overlapping testing sets, set EVERY = TEST (in this example, TEST equals TRAIN, so EVERY = 12).
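A little arithmetic shows why the step size matters. In this sketch (hypothetical helpers, with months counted from the start of the first split), split i's testing set covers months [i·EVERY + TRAIN, i·EVERY + TRAIN + TEST). Consecutive testing sets are therefore disjoint exactly when EVERY ≥ TEST. When TRAIN = TEST, as here, stepping by TRAIN happens to satisfy this. When TRAIN < TEST, it does not.

```python
def oos_ranges(n_splits: int, train: int, test: int, every: int):
    """Half-open month intervals [start, end) of each testing set (hypothetical helper)."""
    return [(i * every + train, i * every + train + test) for i in range(n_splits)]


def has_overlapping_oos(train: int, test: int, every: int, n_splits: int = 10) -> bool:
    ranges = oos_ranges(n_splits, train, test, every)
    # Consecutive testing sets overlap when the next one starts before the current one ends
    return any(nxt[0] < cur[1] for cur, nxt in zip(ranges, ranges[1:]))


print(has_overlapping_oos(train=12, test=12, every=3))   # True: the 12/12/3 schema
print(has_overlapping_oos(train=12, test=12, every=12))  # False: EVERY = TEST
print(has_overlapping_oos(train=6, test=12, every=6))    # True: EVERY = TRAIN < TEST
```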

Parameter optimization

Next, we'll create a function that runs a trading strategy within a specified date range using a single parameter set and returns one key metric: the Sharpe ratio. Our strategy will be a simple EMA crossover combined with an ATR trailing stop.
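For intuition, the crossover signals can be sketched without any library. This is a simplified stand-in, not TA-Lib's or VBT PRO's implementation. In this sketch, ema seeds with the simple average of the first period values and returns None during warm-up. crossed_above uses one common definition: at or below on the previous bar, strictly above on the current bar. The ATR trailing stop is not modeled here.

```python
def ema(values, period):
    """EMA with smoothing factor 2 / (period + 1), seeded with an SMA.

    The first `period - 1` outputs are None (warm-up period).
    """
    alpha = 2 / (period + 1)
    out = [None] * (period - 1)
    out.append(sum(values[:period]) / period)  # seed: simple average of the first `period` values
    for v in values[period:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def crossed_above(a, b):
    """True at bar i when a was <= b at bar i - 1 and is > b at bar i."""
    out = [False] * len(a)
    for i in range(1, len(a)):
        if None in (a[i - 1], b[i - 1], a[i], b[i]):
            continue  # no signal while either EMA is still warming up
        out[i] = a[i - 1] <= b[i - 1] and a[i] > b[i]
    return out


prices = [10, 9, 8, 7, 8, 10, 12, 13, 12, 10, 8, 7]
fast, slow = ema(prices, 2), ema(prices, 4)
entries = crossed_above(fast, slow)  # fast EMA crosses above slow EMA
exits = crossed_above(slow, fast)    # fast EMA crosses below slow EMA
print([i for i, e in enumerate(entries) if e], [i for i, x in enumerate(exits) if x])
```

On this toy series, the fast EMA crosses above the slow one at index 5 and back below it at index 9.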

```python
def objective(data, fast_period=10, slow_period=20, atr_period=14, atr_mult=3):
    # Fast and slow EMAs
    fast_ema = data.run("talib:ema", fast_period, short_name="fast_ema", unpack=True)
    slow_ema = data.run("talib:ema", slow_period, short_name="slow_ema", unpack=True)
    # ATR used to size the trailing stop
    atr = data.run("talib:atr", atr_period, unpack=True)
    pf = vbt.PF.from_signals(
        data,
        entries=fast_ema.vbt.crossed_above(slow_ema),  # fast EMA crosses above slow EMA
        exits=fast_ema.vbt.crossed_below(slow_ema),  # fast EMA crosses below slow EMA
        # tsl_stop is a fraction of price, so express the ATR distance relative to the close
        tsl_stop=atr * atr_mult / data.close,
        save_returns=True,
        freq=TIMEFRAME
    )
    return pf.sharpe_ratio

print(objective(data))
```
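The returned metric is the Sharpe ratio. A minimal sketch of the usual annualized definition follows. Two choices are assumptions here: the annualization factor (a hypothetical periods_per_year=365) and the use of the sample standard deviation. VBT PRO's own calculation depends on its frequency and year settings.

```python
import math
import statistics


def annualized_sharpe(returns, periods_per_year=365, risk_free=0.0):
    """Mean excess return over its sample standard deviation, annualized.

    Assumption: `periods_per_year` is the annualization factor for the bar frequency.
    """
    excess = [r - risk_free for r in returns]
    sd = statistics.stdev(excess)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        return float("nan")  # undefined when returns do not vary
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


daily_returns = [0.01, -0.005, 0.002, 0.007, -0.003]
print(round(annualized_sharpe(daily_returns), 3))
```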

By wrapping our function with parameterized, we enable objective to accept lists of parameter values and run every combination. We then wrap the result with another decorator, split, which runs the strategy on each date range defined by the splitter.
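Conceptually, stacking the two wrappers amounts to a nested loop. For every range produced by the splitter, and for every parameter combination that passes the conditions, call the objective and collect the metric under a (split, set, parameters) key. The sketch is a hypothetical single-process stand-in; run_grid and toy_objective are illustrative names. VBT PRO additionally handles chunking, parallel execution, and merging into a pandas object.

```python
from itertools import product


def run_grid(objective, ranges, param_grid, condition=None):
    """Hypothetical stand-in for stacking vbt.parameterized and vbt.split.

    ranges: {(split, set_label): data_slice}
    param_grid: {param_name: list_of_values}
    condition: optional predicate on a dict of parameter values
    Returns {(split, set_label, *param_values): metric}.
    """
    names = list(param_grid)
    results = {}
    for (split, set_label), data_slice in ranges.items():
        for values in product(*param_grid.values()):
            params = dict(zip(names, values))
            if condition is not None and not condition(params):
                continue  # skip combinations that violate the condition
            results[(split, set_label, *values)] = objective(data_slice, **params)
    return results


def toy_objective(data, fast_period, slow_period):
    # Placeholder metric; a real objective would backtest and return e.g. a Sharpe ratio
    return sum(data) / (slow_period - fast_period)


ranges = {(0, "train"): [1, 2, 3], (0, "test"): [4, 5]}
grid = {"fast_period": [2, 3], "slow_period": [5, 6]}
out = run_grid(toy_objective, ranges, grid,
               condition=lambda p: p["slow_period"] - p["fast_period"] >= 3)
print(out)
```

Three of the four combinations satisfy the condition, so each of the two ranges contributes three results.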

```python
param_objective = vbt.parameterized(
    objective,
    merge_func="concat",  # concatenate the results of all parameter combinations
    mono_n_chunks="auto",  # merge parameter combinations into chunks
    execute_kwargs=dict(engine="pathos")  # run chunks in parallel using Pathos
)
cv_objective = vbt.split(
    param_objective,
    splitter=splitter,
    takeable_args=["data"],  # select date range from data
    merge_func="concat",  # concatenate the results of all splits and sets
    execute_kwargs=dict(show_progress=True)
)

sharpe_ratio = cv_objective(
    data,
    vbt.Param(np.arange(10, 50), condition="slow_period - fast_period >= 5"),  # fast_period
    vbt.Param(np.arange(10, 50)),  # slow_period
    vbt.Param(np.arange(10, 50), condition="fast_period <= atr_period <= slow_period"),  # atr_period
    vbt.Param(np.arange(2, 5))  # atr_mult
)
print(sharpe_ratio)
```

This tests over 3 million combinations of date ranges and parameters in just a few minutes.
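The parameter part of that number can be checked directly. The sketch counts the grid from the code above under its two conditions in plain Python, using itertools instead of vbt.Param, and arrives at 32,760 parameter combinations per range. Each combination runs on every training and testing set. With, say, 50 splits, that would be 32,760 × 50 × 2 ≈ 3.3 million backtests. The actual number of splits depends on the data's date range.

```python
from itertools import product

r = range(10, 50)  # mirrors np.arange(10, 50): 10..49
combos = [
    (fast, slow, atr, mult)
    for fast, slow, atr, mult in product(r, r, r, range(2, 5))
    if slow - fast >= 5 and fast <= atr <= slow  # the two vbt.Param conditions
]
print(len(combos))
```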

Analyze the results

Let’s analyze the results by grouping them by fast and slow EMA periods. For each pair, we take the median change in Sharpe ratio from the training set to the testing set, across all splits and the remaining ATR parameters. At least 50% of these cases show a change at least this large. Blue indicates a positive change and red a negative one.
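The grouping step can be illustrated on toy numbers. The sketch keys each Sharpe ratio by (split, fast_period, slow_period), dropping the ATR parameters for brevity. The values are made up for illustration, not results from the backtest. It computes the test-minus-train difference per key and then the median per (fast, slow) pair, mirroring what the pandas groupby in the next code block does on the full result.

```python
from collections import defaultdict
from statistics import median


def median_sharpe_change(train_sr, test_sr):
    """Map (split, fast, slow) -> Sharpe ratio into (fast, slow) -> median test-minus-train change."""
    groups = defaultdict(list)
    for key, train_value in train_sr.items():
        _split, fast, slow = key
        groups[(fast, slow)].append(test_sr[key] - train_value)
    return {pair: median(diffs) for pair, diffs in groups.items()}


# Illustrative numbers only
train_sr = {(0, 10, 20): 1.2, (1, 10, 20): 0.8, (2, 10, 20): 1.0,
            (0, 12, 30): 0.5, (1, 12, 30): 0.9, (2, 12, 30): 0.7}
test_sr = {(0, 10, 20): 0.9, (1, 10, 20): 0.9, (2, 10, 20): 0.6,
           (0, 12, 30): 0.6, (1, 12, 30): 1.0, (2, 12, 30): 0.4}
print(median_sharpe_change(train_sr, test_sr))
```

Here the (10, 20) pair loses a median of about 0.3 in Sharpe ratio out of sample, while (12, 30) gains about 0.1.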

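```python
# Separate training and testing results ("set" is the level holding the set labels)
train_sharpe_ratio = sharpe_ratio.xs("train", level="set")
test_sharpe_ratio = sharpe_ratio.xs("test", level="set")

# Change in Sharpe ratio from training to testing set, per split and parameter combination
sharpe_ratio_diff = test_sharpe_ratio - train_sharpe_ratio
# Median change per (fast_period, slow_period) pair across splits and ATR parameters
sharpe_ratio_diff_median = sharpe_ratio_diff.groupby(
    ["fast_period", "slow_period"]
).median()
sharpe_ratio_diff_median.vbt.heatmap(
    trace_kwargs=dict(colorscale="RdBu", zmid=0)  # center the color scale at zero
).show_png()
```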

The result is a heatmap of the median Sharpe ratio change for each combination of fast and slow EMA periods.

Next steps

Although you might have developed a promising strategy on paper, cross-validating it is essential to confirm its consistent performance over time and to ensure it's not merely a result of random fluctuations. Apply the techniques you learned here to your own strategy.