Building a Data Pipeline from Scratch

Big data processing with Apache Hadoop, Spark, Storm and friends is all the rage right now. But getting started with one of these systems requires an enormous amount of infrastructure, and there is an overwhelming number of decisions to make. Often you don’t even know what kinds of questions you can or should be answering with your data.

As a first step, Joe describes the types of problems that people typically solve with a data pipeline, such as A/B testing and data warehousing. Then, drawing on his personal experience building data tools at Foursquare and a from-scratch data pipeline at a new startup, he highlights the key questions to ask and the best practices to implement to improve your chances of success.
