How to use HDF5 for advanced, ultra fast market data storage

September 2, 2023

If there’s one thing algorithmic traders cannot get enough of, it’s data.

The data that fuels our strategies is more than just numbers—it's the lifeblood of our decision-making processes.

And having data available locally—or at least within your control—is a big part of that.

In today’s newsletter, we’ll use the ultra-fast HDF5 file format to store data for research and analysis.

Let’s go!

Hierarchical Data Format (HDF) is a set of file formats (HDF4, HDF5) designed to store and organize large amounts of data in a hierarchical format.

It was originally developed at the U.S. National Center for Supercomputing Applications.

HDF version 5 (HDF5) is an open source file format that supports large, complex, heterogeneous data.

HDF5 uses a "file directory"-like structure: data lives in named groups inside a single file, so you can organize it much as you would organize files in folders on your computer.
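To make the directory idea concrete, here is a minimal sketch using pandas (which relies on the PyTables package, `tables`, for HDF5 support). The file name, keys, and prices are made up for illustration and are not part of the data used later in this issue.

```python
import os
import tempfile

import pandas as pd

# Synthetic prices standing in for real market data (illustrative only).
dates = pd.date_range("2023-01-02", periods=3, freq="D")
prices = pd.DataFrame({"close": [380.0, 383.5, 381.2]}, index=dates)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")  # hypothetical file name
    with pd.HDFStore(path) as store:
        # Keys behave like paths: "group/subgroup/dataset".
        store.put("equities/spy/stock_prices", prices)
        store.put("futures/es/dec_2023", prices)
        # keys() lists every dataset in the file with its full path.
        keys = sorted(store.keys())

print(keys)
```

Each key comes back with a leading slash, just like an absolute path on disk.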

We’ll use HDF5 to store market data on stocks, options, and futures.

Imports and set up

Start by importing pandas and the OpenBB SDK. Then set up some variables we’ll use later.

import pandas as pd
from openbb_terminal.sdk import openbb

STOCK_DATA_STORE = "stocks.h5"
FUTURES_DATA_STORE = "futures.h5"
ticker = "SPY"
root = "ES"

Then we’ll use the OpenBB SDK to download historic SPY prices and SPY options data. We’ll download the E-mini S&P 500 futures data further down.

# stock price data
spy_equity = openbb.stocks.load(ticker)

# options: available expirations, price history for one contract
# (second listed expiration, 440 strike), and the current chains
spy_expirations = openbb.stocks.options.expirations(ticker)
spy_historic = openbb.stocks.options.hist(
    ticker,
    spy_expirations[1],
    440
)
spy_chains = openbb.stocks.options.chains(ticker)

Use pandas to store ETF and options data in HDF5

With this data stored in pandas DataFrames, we open the stocks.h5 file using the pandas HDFStore class.

# stock data
with pd.HDFStore(STOCK_DATA_STORE) as store:
    store.put("equities/spy/stock_prices", spy_equity)
    store.put("equities/spy/options_prices", spy_historic)
    store.put("equities/spy/chains", spy_chains)

The Python with statement creates a context that allows you to run a group of statements under the control of a context manager.

Here, we open the stocks.h5 file as a pandas HDFStore. The file is closed automatically when the with block ends.

HDFStore has a method called put which allows us to easily store the data in the DataFrame in the HDF5 file.
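If you want to see what the with statement buys you, this small sketch checks the store's `is_open` property inside and after the block. The file name, key, and data are placeholders chosen for the example.

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame({"close": [1.0, 2.0]})  # placeholder data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")  # hypothetical file name
    with pd.HDFStore(path) as store:
        store.put("example/frame", frame)
        open_inside = store.is_open  # file handle is live inside the block
    # Leaving the block closes the file, even if an error was raised inside it.
    open_after = store.is_open

print(open_inside, open_after)
```

Without the context manager you would need to call `store.close()` yourself.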

Getting data out is simple:

with pd.HDFStore(STOCK_DATA_STORE) as store:
    spy_prices = store["equities/spy/stock_prices"]
    spy_options = store["equities/spy/options_prices"]
    spy_chains = store["equities/spy/chains"]
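Reading a key this way loads the whole dataset. If you only need a slice, one option is to write with `format="table"` and query on disk with `select`. The sketch below uses synthetic prices and a hypothetical file name. Note that the code above uses the default format, which does not support this kind of query.

```python
import os
import tempfile

import pandas as pd

# Synthetic daily closes (illustrative only).
dates = pd.date_range("2023-01-02", periods=5, freq="D")
prices = pd.DataFrame(
    {"close": [380.0, 383.5, 381.2, 379.9, 385.1]}, index=dates
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")  # hypothetical file name
    with pd.HDFStore(path) as store:
        # "table" format supports where-queries; the default "fixed" does not.
        store.put("equities/spy/stock_prices", prices, format="table")
        # Only rows on or after the cutoff date are read from disk.
        recent = store.select(
            "equities/spy/stock_prices", where="index >= '2023-01-05'"
        )
    # pd.read_hdf opens the file, reads one key, and closes it again.
    everything = pd.read_hdf(path, key="equities/spy/stock_prices")

print(recent)
```

The trade-off is speed: the fixed format is generally faster to write and read in full, while the table format is the one to pick when you plan to query or append.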

Use pandas to store futures data in HDF5

We can use the OpenBB SDK to retrieve historical futures data for individual futures contracts.

with pd.HDFStore(FUTURES_DATA_STORE) as store:
    # December contracts from 2023 through 2030
    for i in range(23, 31):
        expiry = f"20{i}-12"
        df = openbb.futures.historical(
            symbols=[root],
            expiry=expiry,
            start_date="2020-01-01",
            end_date="2022-12-31"
        )
        # label the price column with the contract expiry
        df.rename(
            columns={
                "Adj Close": expiry
            },
            inplace=True
        )
        prices = df[expiry]

        store.put(f'futures/{root}/{expiry}', prices)

Here, we use the context manager to open the HDF5 file using pandas. Once open, we loop over the December expirations from 2023 through 2030 and download each contract's history into a DataFrame. We rename the Adj Close column to the expiry, then save that column to the HDF5 file under its own key.
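Because each contract sits under its own key, you can stitch them back together later. Here is a rough sketch that finds every key under one root and lines the contracts up side by side. The prices are synthetic and the file name is hypothetical, and it assumes each key holds a single price series as in the loop above.

```python
import os
import tempfile

import pandas as pd

root = "ES"
dates = pd.date_range("2022-12-27", periods=3, freq="D")
# Synthetic prices; real values would come from the download loop.
fake_contracts = {
    "2023-12": pd.Series([3900.0, 3910.0, 3905.0], index=dates),
    "2024-12": pd.Series([3950.0, 3962.0, 3958.0], index=dates),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "futures_demo.h5")  # hypothetical file name
    with pd.HDFStore(path) as store:
        for expiry, series in fake_contracts.items():
            store.put(f"futures/{root}/{expiry}", series)

        # Find every contract stored under this root.
        prefix = f"/futures/{root}/"
        contract_keys = sorted(k for k in store.keys() if k.startswith(prefix))

        # One column per contract, named after the last part of the key.
        curve = pd.concat(
            {k.rsplit("/", 1)[-1]: store[k] for k in contract_keys},
            axis=1,
        )

print(curve)
```

PyTables may emit a NaturalNameWarning for keys like 2023-12 because they are not valid Python identifiers. The data is still stored and read back normally.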

We can retrieve it the same way as for SPY.

with pd.HDFStore(FUTURES_DATA_STORE) as store:
    es_prices = store[f"futures/{root}/2023-12"]

Next steps

There are two next steps you can take.

First, check out my other newsletter issue that describes how to build an automated script to store data locally. Next, adapt the script to begin storing data in HDF5 format.
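As a starting point for the second step, here is one possible sketch of an incremental update. It assumes your script hands you a DataFrame indexed by date on each run. The helper name `append_new_rows`, the key, and the data are all hypothetical.

```python
import os
import tempfile

import pandas as pd

KEY = "equities/spy/stock_prices"  # hypothetical key


def append_new_rows(store, key, frame):
    """Append only the rows dated after the last row already stored."""
    if key in store:
        # Read just the index column to find the latest stored date.
        last_stored = store.select_column(key, "index").max()
        frame = frame[frame.index > last_stored]
    if not frame.empty:
        # append() writes in table format, which supports adding rows.
        store.append(key, frame)
    return len(frame)


# Two synthetic downloads that overlap on 2023-01-04.
batch_1 = pd.DataFrame(
    {"close": [380.0, 383.5, 381.2]},
    index=pd.date_range("2023-01-02", periods=3, freq="D"),
)
batch_2 = pd.DataFrame(
    {"close": [381.2, 379.9, 385.1]},
    index=pd.date_range("2023-01-04", periods=3, freq="D"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")  # hypothetical file name
    with pd.HDFStore(path) as store:
        first = append_new_rows(store, KEY, batch_1)
        second = append_new_rows(store, KEY, batch_2)
        total = len(store[KEY])

print(first, second, total)
```

The overlapping row is skipped on the second run, so running the script twice on the same day does not duplicate data.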
