How to use HDF5 for advanced, ultra fast market data storage

If there’s one thing algorithmic traders cannot get enough of, it’s data.

The data that fuels our strategies is more than just numbers—it’s the lifeblood of our decision-making processes.

And having data available locally—or at least within your control—is a big part of that.

In today’s newsletter, we’ll use the ultra-fast HDF5 file format to store data for research and analysis.

Let’s go!

Hierarchical Data Format (HDF) is a set of file formats (HDF4, HDF5) designed to store and organize large amounts of data in a hierarchical format.

It was originally developed at the U.S. National Center for Supercomputing Applications.

HDF version 5 (HDF5) is an open source file format that supports large, complex, heterogeneous data.

HDF5 uses a directory-like structure, so you can organize data within a single file in much the same way you organize files and folders on your computer.
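
To make the directory analogy concrete, here is a minimal sketch in plain Python that turns slash-separated keys into a nested tree. HDF5 calls the folder-like nodes groups and the leaf nodes datasets. The `build_tree` helper and the `demo_keys` list are hypothetical, written only for illustration; they are not part of HDF5 or pandas.

```python
def build_tree(keys):
    """Turn slash-separated keys into a nested dict, one level per path part."""
    tree = {}
    for key in keys:
        node = tree
        for part in key.strip("/").split("/"):
            # Create the child node if it does not exist yet, then descend.
            node = node.setdefault(part, {})
    return tree


# Illustrative keys: each path part above the last acts like a folder.
demo_keys = [
    "equities/spy/stock_prices",
    "equities/spy/options_prices",
    "futures/ES/2023-12",
]

tree = build_tree(demo_keys)
print(tree)
```

Everything for one asset class sits under one branch, which is the layout used for the rest of this issue.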

We’ll use HDF5 to store market data on stocks, options, and futures.

Imports and set up

Start by importing pandas and the OpenBB SDK. Then set up some variables we’ll use later.

import pandas as pd
from openbb_terminal.sdk import openbb

STOCK_DATA_STORE = "stocks.h5"
FUTURES_DATA_STORE = "futures.h5"
ticker = "SPY"
root = "ES"

Then we’ll use the OpenBB SDK to download historic SPY prices and SPY options data. We’ll download the E-mini S&P 500 futures data in a later step.

# stock price data
spy_equity = openbb.stocks.load(ticker)

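# available option expiration dates
spy_expirations = openbb.stocks.options.expirations(ticker)

# price history for one option contract:
# the second listed expiration at the 440 strike
spy_historic = openbb.stocks.options.hist(
    ticker,
    spy_expirations[1],
    440
)

# current options chains
spy_chains = openbb.stocks.options.chains(ticker)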

Use pandas to store ETF and options data in HDF5

With this data stored in pandas DataFrames, we open the stocks.h5 file using the pandas HDFStore class.

# stock data
with pd.HDFStore(STOCK_DATA_STORE) as store:
    store.put("equities/spy/stock_prices", spy_equity)
    store.put("equities/spy/options_prices", spy_historic)
    store.put("equities/spy/chains", spy_chains)

The Python with statement creates a context that allows you to run a group of statements under the control of a context manager.

Here, we open the stocks.h5 file as a pandas HDFStore. The file is closed automatically when the with block ends.

HDFStore has a method called put, which writes a DataFrame into the HDF5 file under the key we give it.
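
If you want to try put without downloading anything, the sketch below does a round trip with made-up prices in a temporary file. The values, the `demo` key, and the file name are assumptions for illustration only. It needs the PyTables package (`tables`) installed, which pandas uses for HDF5 support.

```python
import os
import tempfile

import pandas as pd

# Made-up prices standing in for the downloaded SPY data.
dates = pd.date_range("2023-01-02", periods=5, freq="B")
demo_prices = pd.DataFrame(
    {
        "Close": [380.0, 381.0, 382.0, 383.0, 384.0],
        "Volume": [100.0, 120.0, 90.0, 110.0, 95.0],
    },
    index=dates,
)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.h5")
    with pd.HDFStore(demo_path) as store:
        # Write the DataFrame under a hierarchical key.
        store.put("equities/demo/stock_prices", demo_prices)
        # keys() lists every stored object, with a leading slash.
        stored_keys = store.keys()
        # Read the same object back.
        round_trip = store["equities/demo/stock_prices"]

print(stored_keys)
print(round_trip.equals(demo_prices))
```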

Getting data out is simple:

with pd.HDFStore(STOCK_DATA_STORE) as store:
    spy_prices = store["equities/spy/stock_prices"]
    spy_options = store["equities/spy/options_prices"]
    spy_chains = store["equities/spy/chains"]
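
Reading by key loads the whole object. If you only need part of it, one option is to store the data in pandas' table format and filter on read with select. The sketch below uses made-up prices and a temporary file; the key and values are assumptions, and the where filter only works for objects written with format="table".

```python
import os
import tempfile

import pandas as pd

# Made-up daily closes for illustration.
dates = pd.date_range("2023-01-02", periods=5, freq="D")
demo_prices = pd.DataFrame(
    {"Close": [380.0, 381.0, 382.0, 383.0, 384.0]},
    index=dates,
)

with tempfile.TemporaryDirectory() as tmp_dir:
    with pd.HDFStore(os.path.join(tmp_dir, "demo.h5")) as store:
        # The table format is queryable; the default fixed format is not.
        store.put("equities/demo/stock_prices", demo_prices, format="table")
        # Load only the rows on or after January 4, 2023.
        recent = store.select(
            "equities/demo/stock_prices",
            where="index >= Timestamp('2023-01-04')",
        )

print(recent)
```

The trade-off is that the table format is generally slower to write than the default fixed format, so it is worth using mainly when you expect to query or append.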

Use pandas to store futures data in HDF5

We can use the OpenBB SDK to retrieve historical futures data for individual futures contracts.

with pd.HDFStore(FUTURES_DATA_STORE) as store:
    for i in range(23, 31):
        expiry = f"20{i}-12"
        df = openbb.futures.historical(
            symbols=[root],
            expiry=expiry,
            start_date="2020-01-01",
            end_date="2022-12-31"
        )
        df.rename(
            columns={
                "Adj Close": expiry
            },
            inplace=True
        )
        prices = df[expiry]

        store.put(f'futures/{root}/{expiry}', prices)

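Here, we use the context manager to open the HDF5 file using pandas. Once open, we loop over the December expirations from 2023 through 2030 and download the data for each contract into a DataFrame. We rename the adjusted close column after the expiry, select it, and save it into the HDF5 file under a key built from the root symbol and the expiry.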
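
To see which expiries and keys the loop produces without calling the API, here is a small sketch in plain Python. The `december_contract_keys` helper is hypothetical, written only to mirror the f-strings used in the loop.

```python
def december_contract_keys(root, first_year=23, last_year=30):
    """Map each December expiry to the HDF5 key used to store it."""
    # range() excludes its end point, so add 1 to include last_year.
    expiries = [f"20{yy}-12" for yy in range(first_year, last_year + 1)]
    return {expiry: f"futures/{root}/{expiry}" for expiry in expiries}


contract_keys = december_contract_keys("ES")
for expiry, key in contract_keys.items():
    print(expiry, key)
```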

We can retrieve it the same way as for SPY.

with pd.HDFStore(FUTURES_DATA_STORE) as store:
    es_prices = store[f"futures/{root}/2023-12"]
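
If you want every contract side by side rather than one at a time, a possible approach is to list the keys under the root and concatenate them into one DataFrame. The sketch below uses made-up prices and a hypothetical root named DEMO in a temporary file, so it runs without the futures download.

```python
import os
import tempfile

import pandas as pd

demo_root = "DEMO"  # hypothetical root symbol for illustration
dates = pd.date_range("2022-12-27", periods=3, freq="D")
# Made-up settlement prices for two December contracts.
demo_contracts = {
    "2023-12": [4000.0, 4010.0, 4020.0],
    "2024-12": [4100.0, 4110.0, 4120.0],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    with pd.HDFStore(os.path.join(tmp_dir, "demo_futures.h5")) as store:
        for expiry, values in demo_contracts.items():
            series = pd.Series(values, index=dates)
            store.put(f"futures/{demo_root}/{expiry}", series)

        # keys() returns full paths with a leading slash.
        prefix = f"/futures/{demo_root}/"
        found = sorted(k for k in store.keys() if k.startswith(prefix))

        # One column per contract, labeled by the last part of the key.
        curve = pd.concat(
            {key.rsplit("/", 1)[-1]: store[key] for key in found},
            axis=1,
        )

print(curve)
```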

Next steps

There are two next steps you can take.

First, check out my other newsletter issue that describes how to build an automated script to store data locally. Next, adapt the script to begin storing data in HDF5 format.
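
As a starting point for the second step, here is a minimal sketch of an incremental update: it appends only the rows that are newer than what the file already holds. The `append_new_rows` helper, the key, and the prices are assumptions for illustration, not the script from the other issue. It relies on the table format, since fixed-format objects cannot be appended to.

```python
import os
import tempfile

import pandas as pd


def append_new_rows(store, key, new_rows):
    """Append rows dated after the last stored row; return how many were written."""
    if key in store:
        # Read only the last stored row to find the most recent timestamp.
        nrows = store.get_storer(key).nrows
        last_stamp = store.select(key, start=nrows - 1).index[-1]
        new_rows = new_rows[new_rows.index > last_stamp]
    if not new_rows.empty:
        store.append(key, new_rows, format="table")
    return len(new_rows)


# Made-up prices: the second call overlaps with the first.
dates = pd.date_range("2023-01-02", periods=5, freq="D")
demo = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=dates)

with tempfile.TemporaryDirectory() as tmp_dir:
    with pd.HDFStore(os.path.join(tmp_dir, "demo.h5")) as store:
        key = "equities/demo/stock_prices"
        first_write = append_new_rows(store, key, demo.iloc[:3])
        second_write = append_new_rows(store, key, demo)
        combined = store[key]

print(first_write, second_write, len(combined))
```

Running the update twice with overlapping data writes each date once, which is what you want from a script that runs on a schedule.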