Make a powerful AI agent to convert PDFs to code

I recently posted a PDF with a pairs trading strategy on Twitter.

https://x.com/pyquantnews/status/1891848025846288541

I thought I’d implement the strategy in Python. Pretty quickly, I realized an LLM could do the job much faster than I could.

I started to upload the PDF into ChatGPT, then decided to build an agent workflow instead.

In today’s newsletter, you’ll build an AI agent workflow that reads a PDF, develops an implementation plan, and writes code.

All based on a PDF.

Let’s go!

AI agents can speed up strategy development by turning insights from academic research into code.

We now have the tools to make academic research accessible by converting complex formulas and jargon into code. This involves data ingestion, planning an implementation, and writing code. From there, we can backtest and deploy strategies in live markets.

LlamaIndex is a great framework that lets us use LLMs to build agents. We’ll use it to build three agents. The first reads a PDF describing a pairs trading strategy and extracts a detailed description of how to implement it. The second plans the implementation, and the third writes the code.

Let's see how it works.

Imports and setup

We’ll use asyncio to run the agents asynchronously and LlamaIndex to build them. Download the pairs trading PDF and save it as pairs.pdf in your working directory to follow along with this example.

import os
import asyncio
from dotenv import load_dotenv
from llama_index.llms.openai import OpenAI
from llama_index.core.workflow import Context
from llama_index.core.agent.workflow import (
    FunctionAgent,
    AgentWorkflow,
    AgentOutput,
    ToolCall,
    ToolCallResult,
)
from llama_index.core import SimpleDirectoryReader, GPTVectorStoreIndex

load_dotenv()

llm = OpenAI(model="gpt-4o")

This code sets up our environment and initializes the language model. We load environment variables from a .env file, which should include your OpenAI API key. Then we create an instance of the OpenAI language model using GPT-4o. Every agent in the workflow will share this model.
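load_dotenv() does not raise an error when the .env file or a variable is missing, so a quick check before creating the LLM can save a confusing failure later. Here is a minimal sketch. The helper name require_env is hypothetical (it is not part of LlamaIndex or python-dotenv), and the demo uses a placeholder variable so it runs anywhere.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value


# Demo with a placeholder variable (hypothetical name, not a real key).
os.environ["DEMO_API_KEY"] = "sk-demo"
print(require_env("DEMO_API_KEY"))
```

In the newsletter code, you could call require_env("OPENAI_API_KEY") right after load_dotenv(), assuming your key is stored under that name.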

Define our tool functions

We define asynchronous functions that will serve as tools for our agents.

async def read_pdf_tool(ctx: Context) -> str:
    """Read pairs.pdf and return a detailed description of the strategy."""
    documents = SimpleDirectoryReader(input_files=["pairs.pdf"]).load_data()
    # Build the index from the documents so they are chunked and embedded.
    index = GPTVectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    query = (
        "Extract a detailed description of the pairs trading strategy implementation from this PDF. "
        "Ensure the description is detailed enough to reproduce the strategy in code."
    )
    description = str(query_engine.query(query))
    # Store the description so downstream agents can read it from the state.
    current_state = await ctx.get("state")
    current_state["strategy_description"] = description
    await ctx.set("state", current_state)
    return description


async def build_plan_tool(ctx: Context, plan: str) -> str:
    """Record the implementation plan in the workflow state."""
    current_state = await ctx.get("state")
    current_state["implementation_plan"] = plan
    await ctx.set("state", current_state)
    return "Implementation plan recorded."


async def write_code_tool(ctx: Context, code: str) -> str:
    """Record the generated Python code in the workflow state."""
    current_state = await ctx.get("state")
    current_state["python_code"] = code
    await ctx.set("state", current_state)
    return "Python code recorded."

These functions are tools that our agents will use. The read_pdf_tool extracts information from a PDF file about the pairs trading strategy. The build_plan_tool and write_code_tool update the workflow state with an implementation plan and Python code, respectively. These tools allow our agents to process information and generate outputs with the help of an LLM.
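The state tools all follow the same read, update, write-back pattern. Here is a minimal sketch of that pattern that runs without LlamaIndex or an API key. FakeContext and record are hypothetical stand-ins written for this example. The real Context does more than hold a dictionary.

```python
import asyncio


class FakeContext:
    """Hypothetical stand-in for the workflow Context: an async key-value store."""

    def __init__(self, initial_state: dict):
        self._data = {"state": dict(initial_state)}

    async def get(self, key: str):
        return self._data[key]

    async def set(self, key: str, value) -> None:
        self._data[key] = value


async def record(ctx, field: str, value: str) -> str:
    """Read the shared state, update one field, and write it back."""
    state = await ctx.get("state")
    state[field] = value
    await ctx.set("state", state)
    return f"{field} recorded."


async def demo() -> dict:
    ctx = FakeContext({"implementation_plan": "Not generated yet."})
    await record(ctx, "implementation_plan", "1. Load prices\n2. Compute spread")
    return await ctx.get("state")


final_state = asyncio.run(demo())
print(final_state["implementation_plan"])
```

Because every tool writes to the same state dictionary, a later agent can pick up whatever an earlier agent recorded.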

Create our agents

We define three function agents to handle different tasks in our workflow.

pdf_reader_agent = FunctionAgent(
    name="PDFReaderAgent",
    description=(
        "Reads a PDF file containing a pairs trading strategy and extracts a detailed description "
        "of the strategy implementation."
    ),
    system_prompt=(
        "You are the PDFReaderAgent that can read PDFs containing implementation details of pairs trading strategies "
        "and describe the strategy in detail. Once you read the PDF and describe the implementation details and are "
        "satisfied, you should hand off control to the PlanBuilderAgent to develop an implementation plan. "
        "You should have a detailed description of the strategy before handing off control to the PlanBuilderAgent."
    ),
    llm=llm,
    tools=[read_pdf_tool],
    can_handoff_to=["PlanBuilderAgent"],
)

plan_builder_agent = FunctionAgent(
    name="PlanBuilderAgent",
    description=(
        "Takes the detailed strategy description and builds a detailed plan to implement the strategy in Python."
    ),
    system_prompt=(
        "You are the PlanBuilderAgent. Your task is to analyze the strategy description from the state and generate "
        "a detailed plan outlining the steps, functions, and code structure required to implement the pairs trading strategy. "
        "Include suggested Python libraries for the implementation. Your plan should be in markdown format. Once the plan "
        "is written, you should hand off control to the CodeWriterAgent. Output your plan with no preamble. Just output the plan."
    ),
    llm=llm,
    tools=[build_plan_tool],
    can_handoff_to=["CodeWriterAgent"],
)

code_writer_agent = FunctionAgent(
    name="CodeWriterAgent",
    description=(
        "Takes the implementation plan and writes complete Python code for the pairs trading strategy."
    ),
    system_prompt=(
        "You are the CodeWriterAgent. Your task is to use the implementation plan from the state to write complete "
        "and executable Python code for the pairs trading strategy. Output your Python code with no preamble. Just "
        "output the Python code."
    ),
    llm=llm,
    tools=[write_code_tool],
    can_handoff_to=[],
)

We create three function agents: PDFReaderAgent, PlanBuilderAgent, and CodeWriterAgent. Each agent has a specific role in our workflow. They use the language model and their assigned tools to perform tasks like reading PDFs, building implementation plans, and writing code.

While it looks like a lot of code, each agent follows the same pattern. Each takes a name, a description, and a system prompt. These tell the LLM what the agent does and when to use its tools. Finally, you pass in the LLM, the tools the agent has available, and the downstream agents it can hand off to.

Set up our agent workflow

We create an agent workflow to orchestrate the interaction between our agents.

agent_workflow = AgentWorkflow(
    agents=[pdf_reader_agent, plan_builder_agent, code_writer_agent],
    root_agent=pdf_reader_agent.name,
    initial_state={
        "strategy_description": "Not generated yet.",
        "implementation_plan": "Not generated yet.",
        "python_code": "Not generated yet.",
    },
)

This code sets up our agent workflow. It defines the sequence of agents that will work on our task, starting with the PDFReaderAgent. We also set an initial state for our workflow, which will be updated as the agents perform their tasks. This workflow structure allows our agents to collaborate effectively.
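Handoff targets are plain strings, so it can help to check them yourself before running the workflow. Here is a minimal sketch in plain Python. The helper name reachable_agents is hypothetical, and the mapping simply mirrors the can_handoff_to values from the agents in this example.

```python
def reachable_agents(handoffs: dict[str, list[str]], root: str) -> list[str]:
    """Validate handoff targets and return agents in the order they are reached."""
    if root not in handoffs:
        raise ValueError(f"Unknown root agent {root!r}")
    for source, targets in handoffs.items():
        for target in targets:
            if target not in handoffs:
                raise ValueError(f"{source} hands off to unknown agent {target!r}")

    # Breadth-first walk from the root agent.
    order, queue = [], [root]
    while queue:
        name = queue.pop(0)
        if name in order:
            continue
        order.append(name)
        queue.extend(handoffs[name])
    return order


handoffs = {
    "PDFReaderAgent": ["PlanBuilderAgent"],
    "PlanBuilderAgent": ["CodeWriterAgent"],
    "CodeWriterAgent": [],
}
print(reachable_agents(handoffs, "PDFReaderAgent"))
```

An agent that is missing from the returned list can never be reached from the root, which usually points to a typo in a name.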

Now we can execute the workflow.

user_msg = (
    "Please process the PDF file 'pairs.pdf'. "
    "Extract a detailed description of the pairs trading strategy, build an implementation plan for Python, "
    "and finally generate the complete Python code for the strategy."
)

handler = agent_workflow.run(user_msg=user_msg)
current_agent = None
# Top-level "async for" works in a notebook; in a script, put this loop inside an async function.
async for event in handler.stream_events():
    if hasattr(event, "current_agent_name") and event.current_agent_name != current_agent:
        current_agent = event.current_agent_name
        print(f"\n{'='*50}")
        print(f"🤖 Agent: {current_agent}")
        print(f"{'='*50}\n")
    elif isinstance(event, AgentOutput):
        if event.response.content:
            print("📤 Output:", event.response.content)
        if event.tool_calls:
            print("🛠️  Planning to use tools:", [call.tool_name for call in event.tool_calls])
    elif isinstance(event, ToolCallResult):
        print(f"🔧 Tool Result ({event.tool_name}):")
        print("  Arguments:", event.tool_kwargs)
        print("  Output:", event.tool_output)
    elif isinstance(event, ToolCall):
        print(f"🔨 Calling Tool: {event.tool_name}")
        print("  With arguments:", event.tool_kwargs)

We execute our agent workflow by providing a user message that outlines the task. The code then processes the events streamed by the workflow. It prints information about which agent is currently active, their outputs, tool usage, and results. This allows us to see the step-by-step progress of our workflow as it processes the PDF, builds a plan, and generates code for the pairs trading strategy.
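In a plain Python script, the streaming loop has to run inside a coroutine started with asyncio.run. Here is a minimal sketch of that structure. It uses a hypothetical fake_events generator in place of handler.stream_events(), with simplified dictionary events, so it runs without an API key.

```python
import asyncio


async def fake_events():
    """Hypothetical stand-in for handler.stream_events()."""
    for agent, text in [
        ("PDFReaderAgent", "strategy description"),
        ("PlanBuilderAgent", "implementation plan"),
        ("CodeWriterAgent", "python code"),
    ]:
        await asyncio.sleep(0)  # yield control, as a real stream would
        yield {"agent": agent, "output": text}


async def main() -> list[str]:
    """Consume the event stream and return the agents in the order they ran."""
    seen = []
    current_agent = None
    async for event in fake_events():
        if event["agent"] != current_agent:
            current_agent = event["agent"]
            print(f"Agent: {current_agent}")
            seen.append(current_agent)
        print("  Output:", event["output"])
    return seen


if __name__ == "__main__":
    asyncio.run(main())
```

To adapt this, you would create the handler inside main() and loop over handler.stream_events() instead of the fake generator.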

The workflow prints a lot of text as each agent runs. You should see the generated code near the end.

Your next steps

The first thing to do is test the code the agent creates (which I didn’t do). Even if you come across bugs, you already have 90%+ of the basic scaffolding done! You can also try other LLMs and refine the prompts to get more predictable outputs.
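A cheap first test is to check that the generated code at least parses. Here is a minimal sketch, assuming you have the generated code as a string (for example, the python_code entry in the workflow state) and that the LLM may have wrapped it in a markdown fence. The helper names extract_python and syntax_error are hypothetical.

```python
import ast
import re

FENCE = "`" * 3  # a markdown code fence, built this way to keep the listing readable
FENCED_BLOCK = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_python(text: str) -> str:
    """Return the contents of the first fenced block, or the text itself."""
    match = FENCED_BLOCK.search(text)
    return (match.group(1) if match else text).strip()


def syntax_error(code: str):
    """Return None if the code parses, otherwise a short error message."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


# Made-up sample output, standing in for what an LLM might return.
sample = f"{FENCE}python\nimport math\nprint(math.sqrt(16))\n{FENCE}"
code = extract_python(sample)
print(code)
print("Syntax problem:", syntax_error(code))
```

Parsing only catches syntax errors. You still need to run the strategy against real data to find logic bugs.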