
Build your own GPT investment advisor that reads financial statements

LangChain is an LLM framework you can use to ask PDFs questions.

Most people have heard of ChatGPT by now. It’s a conversational bot powered by large language models (LLMs). ChatGPT is very powerful but has limitations. For example, it only accepts text as input and, for most users, cannot search the internet.

Enter LangChain.

LangChain is a Python-based framework for chaining together different methods to interact with LLMs.

Today, I’ll show you how to use LangChain and OpenAI’s GPT model to build your own GPT investment advisor.


LangChain is a language model integration framework that allows developers to create applications using large language models (LLMs). It was developed to reduce the complexity of developing applications with LLMs, making it easier for developers to create powerful applications.

LangChain can be used for financial document analysis and summarization, which is exactly what a GPT investment advisor does. Professionals in the industry use LangChain to create applications that can analyze and summarize large amounts of text and understand natural language.

Instead of poring over dozens of pages of dense financial information yourself, you can have LangChain parse these documents and answer your questions.

Here’s how.

This tutorial is based on some great work by Nicholas Renotte. You’ll need to install a few libraries before you start. You can run the following command to do it.

pip install langchain openai chromadb pypdf

You can use any PDF you want. For this example, I used Apple’s 2023Q2 consolidated financial statement. It’s only a few pages but you can use PDFs with hundreds of pages.

One final note: you’ll need an OpenAI API key. You can sign up here.

Imports and set up

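The LangChain imports include the interface to the OpenAI API, the PDF parser, and vector storage. The vector storage lets you work around the model’s context limit of roughly 4,000 tokens, because only the relevant parts of the document are sent with each question.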

import os

from langchain.llms import OpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.vectorstores import Chroma
from langchain.agents.agent_toolkits import (
    create_vectorstore_agent,
    VectorStoreToolkit,
    VectorStoreInfo,
)

os.environ["OPENAI_API_KEY"] = "YOUR-API-KEY"
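Hardcoding the key in your script makes it easy to leak if you share the file. LangChain’s OpenAI wrapper reads the OPENAI_API_KEY environment variable, so one option is to set it in your shell and only check for it in code. Here is a minimal sketch; load_api_key is a hypothetical helper name, not part of LangChain, and the key in the demo is a made-up stand-in.

```python
import os


def load_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    key = env.get(name, "").strip()
    # Treat the tutorial's placeholder value as "not set".
    if not key or key == "YOUR-API-KEY":
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return key


# Demonstrate with a stand-in mapping so the example runs without a real key.
fake_env = {"OPENAI_API_KEY": "sk-example-not-a-real-key"}
print(load_api_key(fake_env)[:3])
```

In your own script you would call load_api_key() with no arguments so it reads the real environment.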

Now it’s time to parse the PDF and store it in the vector storage.

llm = OpenAI(temperature=0.1, verbose=True)
loader = PyPDFLoader("apple.pdf")
pages = loader.load_and_split()
store = Chroma.from_documents(pages, collection_name="annualreport")

First, download the PDF and put it in the same directory as your code. Then create an instance of the OpenAI LLM. The temperature parameter lets you adjust how creative the model responses are. Since we’re looking for facts from a document, we don’t want much creativity.
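To get a feel for why a low temperature gives more predictable answers, here is a small sketch of the standard technique: divide the model’s raw scores by the temperature before turning them into probabilities. OpenAI does not publish exactly how its API applies the parameter, so treat this as an illustration of the idea. The scores are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.1)  # close to the setting used above
warm = softmax_with_temperature(logits, 1.0)
print([round(p, 3) for p in cold])
print([round(p, 3) for p in warm])
```

At 0.1 almost all of the probability lands on the top-scoring token. At 1.0 the other tokens keep a real chance of being picked, which is where the "creativity" comes from.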

Parse the PDF, split it into pages, and load the document pages into the vector storage.
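If you’re curious what splitting and storing buys you, the toy sketch below mimics the idea with nothing but the standard library. It cuts text into overlapping chunks, then ranks the chunks against a question by word overlap. This is not what Chroma does internally: a real vector store compares learned embeddings, and LangChain’s splitters try to break on natural boundaries. The function names and the sample text are made up for illustration.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=40):
    """Split text into fixed-size character windows that overlap slightly."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


# Hypothetical text standing in for one parsed PDF page.
page = (
    "Net sales by segment include Americas, Europe, Greater China and Japan. "
    "Depreciation and amortization is reported in the cash flow statement. "
    "Shares used in computing earnings per share appear in the last rows."
)
chunks = split_text(page, chunk_size=80, overlap=10)
best = top_k("What were net sales in Greater China?", chunks, k=1)
print(best[0])
```

Only the best-matching chunk needs to go to the LLM, which is how a long PDF fits inside a small context window.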

Next, convert the document vector store into something LangChain can read.

vectorstore_info = VectorStoreInfo(
    name="apple",
    description="Apple quarterly consolidated financials",
    vectorstore=store,
)

toolkit = VectorStoreToolkit(vectorstore_info=vectorstore_info)

agent_executor = create_vectorstore_agent(
    llm=llm,
    toolkit=toolkit,
    verbose=True
)

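Using the store you created above, create a VectorStoreInfo object, which wraps the store with a name and a description. The name and description can be anything you want, but the agent reads them when deciding whether to query the store, so descriptive values help. Next, create the VectorStoreToolkit. The toolkit takes the VectorStoreInfo and exposes it as tools to a LangChain agent.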

Ask the PDF questions

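At the time of writing, the OpenAI GPT LLMs were trained on data up to September 2021. That means the LLM has no knowledge of a document created in 2023. Loading the document into vector storage does not retrain or fine-tune the LLM. Instead, LangChain retrieves the passages of the PDF that are relevant to your prompt and passes them to the LLM along with the question, so it can respond to prompts related to the PDF.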
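What happens at question time can be sketched in a few lines: text pulled from the vector store is pasted into the prompt ahead of your question, and the model’s weights never change. The wording of the template and the build_prompt name are my own assumptions for illustration. LangChain uses its own internal prompt templates.

```python
def build_prompt(question, retrieved_chunks):
    """Combine retrieved document text and the user's question into one prompt."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder text standing in for what the vector store would return.
retrieved = ["Placeholder excerpt from the statement of operations."]
prompt_text = build_prompt("What were total net sales?", retrieved)
print(prompt_text)
```

Because the document travels inside the prompt, swapping in a different PDF only means rebuilding the vector store.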

Let’s do just that.

prompt = input("Enter your search term: ")

When executed, you’ll see a text box for you to enter your prompt. Here are a few questions you can ask:

  • What were net sales in Greater China for the three months ended April 1, 2023?
  • What is the year-over-year percentage change in net income for the three months ended March 26, 2022 and April 1, 2023?
  • What was the total depreciation and amortization for the six months ended April 1, 2023?
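For a question like the year-over-year one, it helps to be able to check the agent’s arithmetic yourself. Here is a minimal sketch of the calculation. The two net income figures are invented placeholders, not Apple’s numbers, so swap in the values from your PDF.

```python
def pct_change(prior, current):
    """Percentage change from the prior period to the current period."""
    if prior == 0:
        raise ValueError("prior period value must be non-zero")
    return (current - prior) / abs(prior) * 100.0


# Hypothetical figures in millions of dollars, for illustration only.
prior_net_income = 25_000
current_net_income = 24_000

change = pct_change(prior_net_income, current_net_income)
print(f"{change:.1f}%")  # prints -4.0%
```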

Finally, run the agent with your question.

response = agent_executor.run(prompt)
print(response)

The agent_executor knows about the LLM and can search the PDF’s contents through the vector store. Neither is trained on the document. You’ll see LangChain running the commands and answering the questions.

You can now use state-of-the-art technology to parse PDFs and quickly get information out of them.