I’ve been down the rabbit hole with LLMs for about three months now. One of the most exciting things I’ve found is LangChain, which lets you build applications powered by GPT AI. A few weeks ago I showed you how to read PDFs. Today I’m going to show you how to query a SQL database.
Firstly, you’ll populate a SQL database with options data. Next, you’ll use OpenAI’s GPT-3.5, a cutting-edge LLM, to craft SQL queries and extract answers from the data.
Use GPT AI to query a SQL database of options prices
LangChain is a tool that allows users to build applications with LLMs. Using LangChain’s Tools and Agents, you can use LLMs to write complex SQL queries against relational databases. The key idea behind Agents is giving LLMs the ability to use tools in their workflow. This is where LangChain departs from ChatGPT.
LangChain uses GPT AI to interpret the query and return the desired results. It can be used to query a variety of databases, including relational databases, NoSQL databases, and graph databases.
By using LangChain Tools and Agents with SQL, you can streamline your data analysis process and make more informed decisions.
And the best part?
Today you’ll get the code to do it.
Imports and set up
We will use the Agents to interact with a small sample database of options. But first, you need to import pandas, the OpenBB SDK, and SQLAlchemy. You can use these to download and store the data. Then you’ll need the LangChain imports.
import os

import pandas as pd
from openbb_terminal.sdk import openbb
from sqlalchemy import create_engine
from langchain import OpenAI
from langchain.sql_database import SQLDatabase
from langchain.chains import SQLDatabaseChain
from langchain.agents import Tool, load_tools, initialize_agent

os.environ["OPENAI_API_KEY"] = "<your API key>"

engine = create_engine("sqlite:///:memory:")
In today’s issue, I show you how to use a SQLite database. But you can connect to and use any relational database, even remote ones.
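Pointing at a remote database is just a different SQLAlchemy connection URL. A minimal sketch, where the Postgres host, credentials, and database name are placeholders you would swap for your own:

```python
from sqlalchemy import create_engine

# SQLAlchemy URLs follow dialect+driver://user:password@host:port/database.
# The host, credentials, and database name here are placeholders.
remote_url = "postgresql+psycopg2://user:password@db.example.com:5432/options"

# A file-backed SQLite database works the same way as the in-memory one;
# create_engine builds the engine without connecting until first use.
engine = create_engine("sqlite:///options.db")
```

Everything downstream (the SQLDatabase wrapper, the chain, the agent) is unchanged regardless of which engine you pass in.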
Grab the data and create the table.
symbols = ["META", "AMZN", "AAPL", "NFLX", "GOOG"]
prices = openbb.stocks.ca.hist(symbols)

chains = []
for symbol in symbols:
    chain = openbb.stocks.options.chains(symbol)
    chain["symbol"] = symbol
    chain["underlying_last"] = prices.iloc[-1][symbol]
    chains.append(chain)

options_chains = pd.concat(chains)
This code iterates through the FAANG stocks and grabs the price data and all the options chains. You need to get the price data separately because the options chains don’t include it.
Use pandas to store the data into the SQLite database.
options_chains.to_sql("options", con=engine, index=False)
Set up LangChain chains, tools, and agents
The first step is to create an instance of the LLM, a DB, and a SQL DB chain.
llm = OpenAI(temperature=0)
db = SQLDatabase(engine)
sql_chain = SQLDatabaseChain.from_llm(
    llm=llm,
    db=db,
    verbose=True
)
This shows the power of LangChain. You now have a Python object that represents OpenAI’s cutting-edge LLM.
Next, create the custom LangChain tool that uses the SQL Chain to build SQL queries.
sql_tool = Tool(
    name="Options DB",
    func=sql_chain.run,
    description="Query options data.",
)
Since LLMs are notoriously bad at basic math, LangChain has a built-in math tool that helps the LLM do arithmetic. Load it and append the SQL tool you just built.
tools = load_tools(["llm-math"], llm=llm)
tools.append(sql_tool)
Finally, initialize the LangChain agent.
zero_shot_agent = initialize_agent(
    agent="zero-shot-react-description",
    tools=tools,
    llm=llm,
    verbose=True,
    max_iterations=5,
)
This uses a zero-shot ReAct agent with no memory. That means it takes a single prompt and uses the frozen knowledge in the LLM to respond. ReAct is a prompt engineering technique that alternates between “acting” and “reasoning.” You’ll see it do this in the output.
Create prompt to generate the SQL
To run the agent, create a prompt.
prompt = """
What are the last prices of 5 META call options with expiration date greater
than 60 days away and strike price within 5% of the underlying price?

Create a list of the options and include the expiration date, strike price,
and last price. Use that list to create a table using the following template:

Expiration      Strike      Price
------------------------------
expiration_date strike      last
...
expiration_date strike      last

If there are no results, print 'no results.'
"""
I ask the LLM for the last prices of 5 META call options with expiration date greater than 60 days and a strike price within 5% of the underlying. Then I use the prompt to instruct the LLM to format the response in a table format.
Run the agent.

zero_shot_agent.run(prompt)
You will see the LangChain agent explain its reasoning while it constructs the SQL. In the output you can find the query:
SELECT * FROM options
WHERE symbol = 'META'
  AND "optionType" = 'call'
  AND expiration > date('now', '+60 days')
  AND strike BETWEEN underlying_last * 0.95 AND underlying_last * 1.05
LIMIT 5;
From there, the query is executed and the results are output in a table.
Here’s another, more complex prompt.
prompt = """
What are the 5 options from each symbol with expiration date between 40 and
60 days away, a strike price within 5% of the underlying price, open interest
greater than 100, and the difference between the ask and the bid less
than 0.05?
"""
If you’re familiar with SQL, you might expect the SQL to include a GROUP BY clause, which it does not. This demonstrates one of the downsides of using LLMs to generate SQL. But with enough iterations, you will find the right prompt that generates the results you’re looking for.
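For reference, a hand-written query for “top N per group” typically uses a window function rather than GROUP BY, which is easy to spot-check against the generated SQL. Here’s a minimal sketch using Python’s built-in sqlite3 module; the column names mirror the options table, but the rows are made-up toy data:

```python
import sqlite3

# Toy stand-in for the real options table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE options (symbol TEXT, strike REAL, openInterest INTEGER)")
conn.executemany(
    "INSERT INTO options VALUES (?, ?, ?)",
    [
        ("META", 300.0, 150),
        ("META", 305.0, 200),
        ("AAPL", 170.0, 120),
        ("AAPL", 175.0, 90),
    ],
)

# ROW_NUMBER() partitioned by symbol picks the top rows from EACH symbol,
# which a flat WHERE ... LIMIT query cannot do.
query = """
SELECT symbol, strike, openInterest
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY symbol
               ORDER BY openInterest DESC
           ) AS rn
    FROM options
    WHERE openInterest > 100
)
WHERE rn <= 5;
"""
for row in conn.execute(query):
    print(row)
```

Note that window functions require SQLite 3.25 or newer, which ships with recent Python versions.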
There are two action steps today.
The first is to use the code to create a few separate tables with a common key among them. Then write prompts to ask questions that require joins and aggregates.
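A minimal sketch of that multi-table setup, using toy DataFrames in place of real data (the profiles table and its sector column are hypothetical, with symbol as the common key):

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")

# Two toy tables sharing a "symbol" key, so prompts can require a join.
options = pd.DataFrame({"symbol": ["META", "AAPL"], "strike": [300.0, 170.0]})
profiles = pd.DataFrame({"symbol": ["META", "AAPL"], "sector": ["Tech", "Tech"]})

options.to_sql("options", con=engine, index=False)
profiles.to_sql("profiles", con=engine, index=False)
```

With both tables registered, you can point the same SQLDatabase and agent at the engine and prompt for questions like average strike per sector.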
The second is to connect to a remote database with SQLAlchemy and use LangChain in the same way to query it.