May cohort is now open: How to secure your spot:

High Performance Text Processing in Machine Learning

This talk covers the rapid development of high-performance, scalable text processing solutions for tasks such as classification, semantic analysis, topic modeling, and general machine learning.

We demonstrate how Python modules, in particular the Rosetta Python library, can be used to process, clean, and tokenize large volumes of text data, extract features from it, and finally build statistical models.
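
As a rough illustration of those stages, here is a minimal, standard-library-only sketch of a clean, tokenize, extract-features, model pipeline. It does not use Rosetta, and every function and class name in it (`clean`, `tokenize`, `extract_features`, `featurize`, `NaiveBayes`) is hypothetical. The statistical model is a swapped-in multinomial naive Bayes classifier, chosen only because it fits in a few lines, and the four training sentences are made-up toy data.

```python
import math
import re
from collections import Counter, defaultdict


def clean(text):
    """Lowercase, then replace every character that is not a-z, 0-9 or whitespace with a space."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def tokenize(text):
    """Split cleaned text on runs of whitespace."""
    return text.split()


def extract_features(tokens):
    """Bag-of-words features: token -> number of occurrences in the document."""
    return Counter(tokens)


def featurize(text):
    """Run the clean -> tokenize -> extract_features stages on one document."""
    return extract_features(tokenize(clean(text)))


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, feature_dicts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # class -> token counts
        self.vocab = set()
        for feats, label in zip(feature_dicts, labels):
            self.word_counts[label].update(feats)
            self.vocab.update(feats)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, feats):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of each known token.
            score = math.log(count / n_docs)
            for word, n in feats.items():
                if word not in self.vocab:
                    continue  # tokens never seen in training are ignored
                p = (self.word_counts[label][word] + 1) / (self.total_words[label] + v)
                score += n * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (made up for illustration).
docs = [
    "Stocks rallied as earnings beat estimates!",
    "Shares fell after weak guidance.",
    "The team won the championship game.",
    "Fans cheered the late goal in the match.",
]
labels = ["finance", "finance", "sports", "sports"]

model = NaiveBayes().fit([featurize(d) for d in docs], labels)
print(model.predict(featurize("Earnings and shares")))  # prints: finance
```

Keeping each stage a separate function means any one of them (for example, the tokenizer) can be swapped out without touching the others.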

The Rosetta library focuses on small, simple modules, each with its own command-line interface, that use very little memory and are parallelized with Python's multiprocessing package. We will also touch on LDA (Latent Dirichlet Allocation) topic modeling and two implementations of it: Vowpal Wabbit and Gensim.
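
The small-module pattern can be sketched with the standard library alone. The script below is a hypothetical stand-alone tokenizer, not Rosetta's code: it reads one document per line from standard input and writes each document's tokens as one space-separated line to standard output, in input order. The flags `--n-jobs` and `--batch-size` are illustrative assumptions.

```python
"""Hypothetical stand-alone tokenizer module (a sketch, not Rosetta's code).

Reads one document per line from stdin and writes one line of
space-separated tokens per document to stdout, in input order.
"""
import argparse
import re
import sys
from itertools import islice
from multiprocessing import Pool

# Runs of ASCII lowercase letters and digits; everything else is a separator.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize_line(line):
    """Lowercase one document and return its tokens joined by single spaces."""
    return " ".join(TOKEN_RE.findall(line.lower()))


def batches(lines, size):
    """Yield successive lists of at most `size` lines from any iterable."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def tokenize_stream(lines, n_jobs=2, batch_size=10_000):
    """Tokenize `lines` in parallel, one batch at a time, preserving order."""
    with Pool(processes=n_jobs) as pool:
        for batch in batches(lines, batch_size):
            # pool.map blocks until the whole batch is done, so at most one
            # batch of input and its output are held in memory at once.
            chunksize = max(1, len(batch) // (n_jobs * 4))
            yield from pool.map(tokenize_line, batch, chunksize=chunksize)


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Tokenize one document per line (stdin -> stdout).")
    parser.add_argument("-n", "--n-jobs", type=int, default=2,
                        help="number of worker processes")
    parser.add_argument("-b", "--batch-size", type=int, default=10_000,
                        help="number of lines held in memory at once")
    args = parser.parse_args(argv)
    for tokens in tokenize_stream(sys.stdin, args.n_jobs, args.batch_size):
        sys.stdout.write(tokens + "\n")


if __name__ == "__main__":
    main()
```

Assuming it is saved as `tokenize_cli.py` (a hypothetical name), it would be run as `python tokenize_cli.py --n-jobs 4 < docs.txt > tokens.txt`. Input is read in fixed-size batches and each batch is handed to `Pool.map`, so only one batch of documents is held in memory at a time. `Pool.imap` would also return results lazily, but its task-feeding thread can read ahead through the entire input, which works against the low-memory goal.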

The talk is part presentation and part tutorial built around a real-life example.

Connect With PyQuant News


