
How to use autoencoders to create feature embeddings

Embeddings are used in neural networks to transform large, sparse data into manageable, dense formats.

In other words, they simplify complex data, making it easier to analyze.

We can use embeddings to capture dense information about drivers of stock returns. This approach is a great way to select pairs and diversify portfolio risk.

By the end of today’s newsletter, you’ll have code to train an autoencoder to build embeddings for stock factors.

This issue is a bit longer than usual, but I hope it helps you get started using autoencoders—let’s go!


Embeddings are compact, dense representations of original high-dimensional stock data, transformed into a lower-dimensional space.

They are created with methods like autoencoders, which retain most of the information contained in features such as volatility or technical indicators. These embeddings are used for clustering, anomaly detection, and predictive modeling.

Embeddings reduce stock features into lower-dimensional vectors that capture key patterns.

This makes them ideal for use in K-means analysis to group similar stocks based on their underlying characteristics.

Let’s see how it works.

Imports and setup

We’ll use some pretty powerful libraries in this issue including PyTorch and Scikit-Learn.

import yfinance as yf
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

Next, we’ll download stock price data to construct our mock portfolio.

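symbols = [
    "AAPL", "MSFT", "GOOGL", "AMZN", "META",
    "TSLA", "BRK-B", "V", "JNJ", "WMT", "JPM",
    "MA", "PG", "UNH", "DIS", "NVDA", "HD",
    "PYPL", "BAC", "VZ", "ADBE", "CMCSA", "NFLX",
    "KO", "NKE", "MRK", "PEP", "T", "PFE", "INTC",
]
# The date range is an assumption; adjust it to the period you want.
# auto_adjust=False keeps the "Adj Close" column in recent yfinance versions.
stock_data = yf.download(
    symbols,
    start="2020-01-01",
    end="2023-12-31",
    auto_adjust=False,
)["Adj Close"]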

We’ll use the stock price data to create a few features.

log_returns = np.log(stock_data / stock_data.shift(1))
moving_avg = stock_data.rolling(window=22).mean()
volatility = stock_data.rolling(window=22).std()

features = pd.concat([log_returns, moving_avg, volatility], axis=1).dropna()
processed_data = (features - features.mean()) / features.std()

Features are patterns in the data we think drive returns. In this example, we’re using log returns, a simple moving average, and volatility.
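To make the feature construction concrete, here is a minimal sketch on synthetic prices. The random-walk data and the ticker names AAA and BBB are made up for illustration and are not real market data. It builds the same three features for the two hypothetical tickers and checks that z-scoring leaves every column with mean 0 and standard deviation 1.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices for two hypothetical tickers (illustration only)
rng = np.random.default_rng(0)
steps = rng.normal(0, 0.01, size=(100, 2))
prices = pd.DataFrame(100 * np.exp(np.cumsum(steps, axis=0)), columns=["AAA", "BBB"])

# Same three features as above, with a prefix so column names stay unique
log_ret = np.log(prices / prices.shift(1)).add_prefix("ret_")
mov_avg = prices.rolling(window=22).mean().add_prefix("ma_")
vol = prices.rolling(window=22).std().add_prefix("vol_")

demo_features = pd.concat([log_ret, mov_avg, vol], axis=1).dropna()

# Z-score each column: subtract its mean, divide by its standard deviation
demo_scaled = (demo_features - demo_features.mean()) / demo_features.std()

print(demo_scaled.shape)  # (79, 6): the first 21 rows lack a full 22-day window
print(demo_scaled.mean().abs().max())  # effectively zero
print(demo_scaled.std().tolist())      # all approximately 1.0
```

The column prefixes are a choice made for this sketch. They keep the three feature groups distinguishable after the concat, which helps when you inspect the data later.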

Build an autoencoder with PyTorch

Let’s convert the normalized feature data into PyTorch tensors and DataLoader objects.

tensor = torch.tensor(processed_data.values, dtype=torch.float32)
dataset = TensorDataset(tensor)
data_loader = DataLoader(dataset, batch_size=32, shuffle=True)

This code converts our features data into a PyTorch tensor, wraps it in a TensorDataset for batch handling, and creates a DataLoader. The DataLoader is used to iterate over the dataset in batches of 32 while shuffling the data to randomize the input during training.
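As a quick illustration of what the DataLoader yields, the sketch below uses a random tensor as a stand-in for the feature matrix (100 rows and 6 columns are arbitrary numbers chosen for the example) and records the shape of every batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the normalized feature matrix: 100 rows, 6 features
demo_tensor = torch.randn(100, 6)
demo_loader = DataLoader(TensorDataset(demo_tensor), batch_size=32, shuffle=True)

# Each item from the loader is a one-element sequence holding the batch tensor,
# which is why the training code later reads data[0]
batch_shapes = [tuple(batch[0].shape) for batch in demo_loader]
print(batch_shapes)  # [(32, 6), (32, 6), (32, 6), (4, 6)]
```

The last batch is smaller because 100 is not a multiple of 32. Shuffling changes which rows land in each batch, not the batch sizes.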

Let’s build the autoencoder.

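class StockAutoencoder(nn.Module):
    def __init__(self, feature_dim):
        super(StockAutoencoder, self).__init__()
        # Encoder: compress the features down to a 10-dimensional embedding
        self.encoder = nn.Sequential(
            nn.Linear(feature_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 10),  # Latent space
        )
        # Decoder: expand the embedding back to the original feature size
        self.decoder = nn.Sequential(
            nn.Linear(10, 32),
            nn.ReLU(),
            nn.Linear(32, 64),
            nn.ReLU(),
            # No ReLU on the output: standardized features can be negative
            nn.Linear(64, feature_dim),
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x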

An autoencoder is a type of neural network that learns to compress (encode) the input data into a smaller representation and then reconstruct (decode) the output to match the input as closely as possible.

This type of network is useful for learning efficient representations (embeddings) of data, which can be used for tasks such as dimensionality reduction, denoising, or anomaly detection.

In the encoder, data is compressed through a series of linear layers: from the original feature dimension to 64, then 32, and finally to a 10-dimensional space.

ReLU activation functions follow the hidden linear layers to introduce non-linearity. This helps the model capture more complex patterns than a purely linear mapping could.

The decoder reconstructs the input data from the 10-dimensional space by gradually expanding the dimensions through linear layers from 10 to 32, then 64, and finally back to the original feature size.

The forward method of the autoencoder sequentially passes an input tensor through the encoder and decoder to produce a reconstructed version of the input.
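To see how the dimensions change along the way, here is a small shape check. It rebuilds the described layer sizes inline so it runs on its own, and it assumes a feature count of 90 purely for illustration (the real value depends on how many columns your feature matrix has).

```python
import torch
import torch.nn as nn

feature_dim_demo = 90  # hypothetical feature count, chosen for this example

# Same layer sizes as described above, built inline so the example stands alone
encoder = nn.Sequential(
    nn.Linear(feature_dim_demo, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 10),
)
decoder = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, feature_dim_demo),
)

batch = torch.randn(32, feature_dim_demo)  # random stand-in for one batch
latent = encoder(batch)
reconstruction = decoder(latent)

print(latent.shape)          # torch.Size([32, 10])
print(reconstruction.shape)  # torch.Size([32, 90])
```

The 10-dimensional `latent` tensor is the embedding. The reconstruction only exists to give the network something to be graded on during training.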

Now let’s train it.

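def train(model, data_loader, epochs=100):
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    model.train()
    for epoch in range(epochs):
        for data in data_loader:
            inputs = data[0]
            optimizer.zero_grad()  # reset gradients from the previous batch
            outputs = model(inputs)  # forward pass: reconstruct the inputs
            loss = criterion(outputs, inputs)  # reconstruction error (MSE)
            loss.backward()  # backward pass: compute gradients
            optimizer.step()  # update the weights
        # Loss of the last batch in this epoch
        print(f"Epoch {epoch+1}, Loss: {loss.item()}")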

feature_dim = processed_data.shape[1]
model = StockAutoencoder(feature_dim)
train(model, data_loader)

This function manages the training of the autoencoder by iteratively adjusting its weights to minimize the loss between its predictions and the actual inputs.

The training loop passes over the entire dataset once per epoch. Within each epoch, it processes the data in batches and uses each batch as both the input and the target, since an autoencoder learns to reproduce its own input.

Here’s what’s happening:

  1. Zeros the gradients to prevent accumulation.
  2. Performs a forward pass to get the reconstructed outputs.
  3. Calculates the loss using MSE.
  4. Performs a backward pass to compute the gradients.
  5. Updates the parameters through the optimizer.
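Step 3 is worth a small worked example. With its default settings, nn.MSELoss averages the squared element-wise differences between the reconstruction and the input. The sketch below does the same calculation by hand with NumPy on made-up numbers.

```python
import numpy as np

# Made-up input rows and reconstructions (2 rows, 3 features)
inputs = np.array([[1.0, 0.0, -1.0],
                   [0.5, 2.0,  0.0]])
outputs = np.array([[0.8, 0.1, -1.2],
                    [0.5, 1.5,  0.3]])

# Mean squared error: average of the squared element-wise differences
squared_errors = (outputs - inputs) ** 2
mse = squared_errors.mean()
print(round(float(mse), 4))  # 0.0717
```

The squared errors sum to 0.43 across six elements, so the loss is 0.43 / 6. A perfect reconstruction would give a loss of 0.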

When you run this, you’ll see the loss printed for each epoch.

Finally, we can extract the embeddings and use them to create clusters.

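def extract_embeddings(model, data_loader):
    embeddings = []
    model.eval()  # switch to evaluation mode
    with torch.no_grad():  # no gradients needed for inference
        for data in data_loader:
            inputs = data[0]
            encoded = model.encoder(inputs)
            embeddings.append(encoded)
    return torch.vstack(embeddings)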

embeddings = extract_embeddings(model, data_loader)
kmeans = KMeans(n_clusters=5, random_state=42).fit(embeddings.numpy())
clusters = kmeans.labels_

This code switches to evaluation mode and disables gradient calculations to extract embeddings from the encoder part of the model.

It iterates over the data loader, feeding input batches through the encoder and collecting the embeddings in a list.

After extracting the embeddings, the function stacks them into a tensor, which is then clustered using K-means into five groups.
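If K-means is new to you, the following sketch shows the same calls on synthetic data. Two well-separated random blobs stand in for the 10-dimensional embeddings (they are fabricated for the example, not model output), and `n_init=10` is set explicitly so the behavior does not depend on the scikit-learn version's default.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic blobs standing in for 10-dimensional embeddings
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 10))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 10))
demo_embeddings = np.vstack([blob_a, blob_b])

demo_kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(demo_embeddings)
demo_labels = demo_kmeans.labels_

# Count how many rows landed in each cluster
cluster_sizes = np.bincount(demo_labels)
print(cluster_sizes)                       # [50 50]
print(demo_kmeans.cluster_centers_.shape)  # (2, 10)
```

Checking the cluster sizes with `np.bincount` is a useful first sanity check on real embeddings too. A cluster with only a handful of rows often signals outliers or too many clusters.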

Reduce the dimensions and analyze the results

Principal Component Analysis (PCA) reduces the dimensionality of the embeddings to principal components. These components capture the directions of maximum variance in the data.

pca = PCA(n_components=2)
embeddings_2d = pca.fit_transform(embeddings.numpy())

The code initializes a PCA model to reduce the dimensionality of the embeddings to two principal components. Then it converts the embeddings into a two-dimensional format so we can plot them.
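Before trusting a two-dimensional plot, it helps to check how much of the variance the two components actually retain. Here is a minimal sketch using scikit-learn's `explained_variance_ratio_` attribute on synthetic data, constructed so that most of the variance sits in two directions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 10-dimensional data where most variance lives in two directions
rng = np.random.default_rng(0)
scales = np.array([5.0, 3.0] + [0.1] * 8)
demo_embeddings = rng.normal(size=(200, 10)) * scales

demo_pca = PCA(n_components=2)
demo_2d = demo_pca.fit_transform(demo_embeddings)

# Fraction of total variance captured by each of the two components
ratios = demo_pca.explained_variance_ratio_
print(demo_2d.shape)  # (200, 2)
print(ratios.round(3))
print(round(float(ratios.sum()), 3))
```

On this synthetic data the two components capture nearly all of the variance by design. On real embeddings the total may be much lower, in which case the scatter plot shows only part of the structure.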

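plt.figure(figsize=(10, 8))
sns.scatterplot(
    x=embeddings_2d[:, 0],
    y=embeddings_2d[:, 1],
    hue=clusters,  # color each point by its K-means cluster
    palette=sns.color_palette("hsv", len(set(clusters))),
)
plt.xlabel("PCA Dimension 1")
plt.ylabel("PCA Dimension 2")
plt.legend(title="Cluster")
plt.show()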

The result visualizes the two-dimensional, PCA-reduced embeddings. Each point is one row of the feature matrix, which in this setup is one trading day described by the features of every stock, positioned according to its values on the first two principal components. The colors represent the different clusters.


Note that these axes don’t represent specific metrics directly. Each principal component is a linear combination of the embedding dimensions, ordered by how much variance it captures. Points that cluster together are more similar with respect to the principal components.

Next steps

As a next step, increase the complexity of the autoencoder by adding more layers or neurons within each layer. Adding another linear layer in both the encoder and decoder might help the autoencoder capture more complex patterns in the stock data.
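One way to experiment with depth is to make the layer sizes a parameter. The sketch below is one possible approach, not a tuned architecture: `build_mlp` and `DeeperAutoencoder` are hypothetical names introduced here, the hidden sizes (128, 64, 32) are arbitrary starting points, and the feature count of 90 is a placeholder.

```python
import torch
import torch.nn as nn

def build_mlp(sizes):
    """Hypothetical helper: stack Linear layers with ReLU between them."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:  # no activation after the final layer
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

class DeeperAutoencoder(nn.Module):
    def __init__(self, feature_dim, hidden=(128, 64, 32), latent_dim=10):
        super().__init__()
        sizes = [feature_dim, *hidden, latent_dim]
        self.encoder = build_mlp(sizes)
        self.decoder = build_mlp(sizes[::-1])  # mirror of the encoder

    def forward(self, x):
        return self.decoder(self.encoder(x))

deeper = DeeperAutoencoder(feature_dim=90)  # 90 is a placeholder feature count
x = torch.randn(8, 90)
print(deeper(x).shape)          # torch.Size([8, 90])
print(deeper.encoder(x).shape)  # torch.Size([8, 10])
```

Because the class exposes the same `encoder` attribute and `forward` method, it should drop into the training and embedding-extraction functions without changes. Compare the final reconstruction loss against the smaller model before concluding that the extra layers help.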