Time Series Forecasting with Arima, Ploomber, Python, and Slurm

In this blog, we’ll review how we took a raw .ipynb notebook that does time series forecasting with ARIMA, modularized it into a Ploomber pipeline, and ran parallel jobs on SLURM. You can follow the steps in this guide to deploy it yourself. We used this notebook by Willie Wheeler.
This notebook contains 8 tasks covering the basic steps of modeling: loading data, data validation, pre-processing, data visualization, hyperparameter tuning, model fitting, producing forecasts, and visualizing the forecasts.
We used Soorgeon to automatically modularize the notebook into a Ploomber pipeline. The benefit of using Ploomber is faster experimentation: it caches results from previous runs, and it makes it simple to submit parallel jobs to SLURM to fine-tune the model.
To Run the Pipeline Locally
Install Ploomber first. Then run ploomber examples -n templates/timeseries -o ts to download the example, and cd ts to move into its directory.
Once you have the pipeline locally, you can run a sanity check. Next, we’ll show how to execute it on a SLURM cluster and run jobs in parallel.
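Putting the local steps together, the workflow looks like this (this is a sketch assuming a pip-based Python environment and that the template ships a requirements.txt, as Ploomber examples typically do):

```shell
# install Ploomber (assumes a Python environment with pip available)
pip install ploomber

# download the time series example and move into its directory
ploomber examples -n templates/timeseries -o ts
cd ts

# install the example's dependencies
pip install -r requirements.txt

# sanity check: show the pipeline's status, then run it end to end
ploomber status
ploomber build
```

ploomber build skips tasks whose inputs haven’t changed, which is how the caching mentioned above speeds up repeated runs.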
Setting Up the SLURM Cluster
If you have access to an existing cluster, you can use it. If not, you can follow these simple steps to launch a SLURM cluster with Docker. To submit the pipeline, we’ll use Soopervisor, a tool that exports Ploomber pipelines to SLURM and to other platforms such as Kubernetes, Airflow, and AWS (Amazon Web Services) Batch.
Steps to Launch a SLURM Cluster with Docker
Step 1: Create a SLURM cluster for testing. Create a docker-compose.yml file.
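The docker-compose.yml file defines the containers that make up the test cluster. The exact file comes from the guide linked above; a minimal sketch looks like the following, where the image name is a placeholder for whichever SLURM-on-Docker image the guide uses:

```yaml
# docker-compose.yml -- minimal sketch; the image name is a placeholder,
# use the one from the SLURM-with-Docker guide you are following
version: "3"
services:
  slurm:
    image: <slurm-cluster-image>   # placeholder, not a real image name
    hostname: slurm
    ports:
      - "10022:22"                 # SSH access for submitting jobs
```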
a. Once the file is ready, start the cluster.
b. Connect the cluster to submit the jobs.
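Steps a and b can be sketched as the following commands (the service name "slurm" is an assumption matching the compose sketch above; adjust it to whatever your docker-compose.yml defines):

```shell
# a. start the cluster in the background
docker-compose up -d

# b. open a shell inside the cluster to submit jobs
docker-compose exec slurm /bin/bash
```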
Step 2: Now that you’re inside the cluster, you need to bootstrap it and make sure you have the pipeline you want to run.
a. Get the bootstrapping script and run it.
b. Get the time-series pipeline template.
c. Install requirements and add through Soopervisor.
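Steps a through c can be sketched as follows. The bootstrap script URL is a placeholder (use the one from the original guide); the target name "cluster" matches the directory mentioned below, and soopervisor add with --backend slurm is the command that registers a SLURM target:

```shell
# a. get the bootstrapping script and run it
# (the URL is a placeholder -- use the one from the original guide)
curl -O https://<bootstrap-script-url>/bootstrap.sh
bash bootstrap.sh

# b. get the time series pipeline template
ploomber examples -n templates/timeseries -o ts
cd ts

# c. install requirements and register the SLURM backend with Soopervisor
pip install -r requirements.txt
soopervisor add cluster --backend slurm
```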
This creates a cluster directory with the template that Soopervisor uses to submit SLURM tasks. To convert the pipeline and submit the jobs to the cluster, issue the export command. When the job finishes, you’ll see the outputs in the output directory.
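The export step is a single command (assuming the target was named "cluster" when it was added, and that the template writes its results to an output directory):

```shell
# convert the pipeline and submit the jobs to the cluster
soopervisor export cluster

# once the jobs finish, inspect the results
ls output
```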
Once you’re done, shut down the cluster.
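Shutting down is a single docker-compose command, run from the directory containing the docker-compose.yml file:

```shell
# stop the cluster and remove its containers
docker-compose down
```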
View this article on https://www.kdnuggets.com/.