How to build an automated quant database
How to build an automated quant database
How to build an automated quant database
In today’s issue I’m going to show you how to build your own automated stock price database.
All quants work with data to analyze the markets. Most of them have internal databases. Hedge funds and banks like Jane Street, Man Group, and Goldman Sachs have teams of people managing data.
Building your own stock price database is important if you want to combine data to analyze the markets. Putting stock prices, economic trends, and custom analytics in a database helps you research the markets faster.
Data drives the markets
With the explosion of free market data, there’s never been a better time to start storing it. Once you have it, combine it, transform it, and analyze it in any way you want.
By the end of this issue you will know how to:
- Use SQLite to build a database
- Download stock data for free
- Store the data in a database
- Run it all automatically
I’m going to show you how to do it with Python.
Let’s get started
A quick note
In today’s issue, I use a Python script. By using a script, I can automatically download data using Task Manager (Windows) or CRON (Mac/Linux).
Open your favorite code editor and create a Python file called market_data.py
.
Step 1: Import the libraries
Start by importing the libraries. I like to use yfinance to get stock data. I use SQLite as a database you can store on your computer. It’s an efficient file-based database which makes it perfect for building research tools. It uses standard SQL so you can easily move to a different database like MySQL or Postgres.
To create the database, you need to import the SQLite library. You can use pandas to interact directly with the database.
from sys import argv
1from sys import argv
2
3import pandas as pd
4import yfinance as yf
5import sqlite3
Step 2: Download and store stock price data
The script you are building will do two things:
- Save data within a date range
- Save data for the last trading day
To make it easy, create three functions to download the data, save data within a range, and save data for the last trading day.
First, create a function that downloads data.
1def get_stock_data(symbol, start, end):
2 data = yf.download(symbol, start=start, end=end)
3 data.reset_index(inplace=True)
4 data.rename(columns={
5 "Date": "date",
6 "Open": "open",
7 "Low": "low",
8 "Close": "close",
9 "Adj Close": "adj_close",
10 "Volume": "volume"
11 }, inplace=True)
12 data['symbol'] = symbol
13 return data
The function uses yfinance to download data. Then it converts the date to a column, renames the columns, adds the symbol, and returns a DataFrame. Add the symbol so you can save more than one stock in the same table.
Next, create a function that uses get_stock_data
to get stock data and saves it into the database.
1def save_data_range(symbol, start, end, con):
2 data = get_stock_data(symbol, start, end)
3 data.to_sql(
4 "stock_data",
5 con,
6 if_exists="append",
7 index=False
8 )
The function uses pandas to save the data in the DataFrame into the database. Finally, create a function that saves the last trading day's data into the database.
1def save_last_trading_session(symbol, con):
2 today = pd.Timestamp.today()
3 data = get_stock_data(symbol, today, today)
4 data.to_sql(
5 "stock_data",
6 con,
7 if_exists="append",
8 index=False
9 )
This function grabs data from today and inserts it into the database.
I want to run this script automatically. Before I do, I need to add some code to run it from the command line.
1if __name__ == "__main__":
2 # usage example for bulk insert
3 # python market_data.py bulk SPY 2022-01-01 2022-10-20
4 # usage example for last session
5 # python market_aata.py last SPY
6
7 con = sqlite3.connect("market_data.sqlite")
8
9 if argv[1] == "bulk":
10 symbol = argv[2]
11 start = argv[3]
12 end = argv[4]
13 save_data_range(symbol, start, end, con)
14 print(f"{symbol} saved between {start} and {end}")
15 elif argv[1] == "last":
16 symbol = argv[2]
17 save_last_trading_session(symbol, con)
18 print(f"{symbol} saved")
19 else:
20 print("Enter bulk or last")
The comments have examples of how to run the code at the command line.
The code in your file should look like this.
1from sys import argv
2
3import pandas as pd
4import yfinance as yf
5import sqlite3
6
7def get_stock_data(symbol, start, end):
8 data = yf.download(symbol, start=start, end=end)
9 data.reset_index(inplace=True)
10 data.rename(columns={
11 "Date": "date",
12 "Open": "open",
13 "Low": "low",
14 "Close": "close",
15 "Adj Close": "adj_close",
16 "Volume": "volume"
17 }, inplace=True)
18 data['symbol'] = symbol
19 return data
20
21def save_data_range(symbol, start, end, con):
22 data = get_stock_data(symbol, start, end)
23 data.to_sql(
24 "stock_data",
25 con,
26 if_exists="append",
27 index=False
28 )
29
30def save_last_trading_session(symbol, con):
31 today = pd.Timestamp.today()
32 data = get_stock_data(symbol, today, today)
33 data.to_sql(
34 "stock_data",
35 con,
36 if_exists="append",
37 index=False
38 )
39
40if __name__ == "__main__":
41 # usage example for bulk insert
42 # python market_data.py bulk SPY 2022-01-01 2022-10-20
43 # usage example for last session
44 # python market_aata.py last SPY
45
46 con = sqlite3.connect("market_data.sqlite")
47
48 if argv[1] == "bulk":
49 symbol = argv[2]
50 start = argv[3]
51 end = argv[4]
52 save_data_range(symbol, start, end, con)
53 print(f"{symbol} saved between {start} and {end}")
54 elif argv[1] == "last":
55 symbol = argv[2]
56 save_last_trading_session(symbol, con)
57 print(f"{symbol} saved")
58 else:
59 print("Enter bulk or last")
Remember, to use this code, you need to run it in the terminal.
Step 3: Automate your script
The best way to build your database is to first download history. You do this once to “backfill” the historic data. Then you schedule the script to run every trading day after the market close to capture that day’s data going forward.
There are a lot of references to schedule a Python job on Mac and Windows so I won't cover it in detail here.
Here are two that will work:
Bonus: Query the SQLite database with pandas in a Jupyter Notebook
To get data out of the database, create a Jupyter Notebook in the same directory as the database file. Then you can write SQL as you would normally.
1import sqlite3
2import pandas as pd
3
4# connect to the database
5con = sqlite3.connect("market_data.sqlite")
6
7# simple select statement
8df_1 = pd.read_sql_query("SELECT * from stock_data where symbol='SPY'", con)
9
10# where statement
11df_2 = pd.read_sql_query("SELECT * from stock_data where symbol='SPY' and volume > 100000", con)
You can join with other tables, filter, and group in any way you want.