Forecasting time series is important in many contexts and highly relevant to machine learning practitioners. Take, for example, demand forecasting from which many use cases derive. Almost every manufacturer would benefit from better understanding demand for their products in order to optimise produced quantities. Underproduce and you will lose revenues, overproduce and you will be forced to sell excess produce at a discount. Very related is pricing, which is essentially a demand forecast with a specific focus on price elasticity. Pricing is relevant to virtually all companies.
Training is without a doubt the most important part of developing a machine learning application. It’s when you start realizing whether or not your model is worth it, how your hyperparameters should look like and what do you need to change in your architecture. In general, most machine learning engineers spend quite some time on training, experimenting with different models, tuning their architecture and discovering the best metrics and losses for their problem.
This book covers the building blocks of the most common methods in machine learning. This set of methods is like a toolbox for machine learning engineers. Those entering the field of machine learning should feel comfortable with this toolbox so they have the right tool for a variety of tasks. Each chapter in this book corresponds to a single machine learning method or group of methods. In other words, each chapter focuses on a single tool within the ML toolbox.
The plotting functionality in the popular Python data analysis library Pandas has always been one of my go-to methods for super quick charts. However, the available visualisations have always been fairly basic and not particularly pretty.
Predictive model to correctly forecast future trend is crucial for investment management and algorithmic trading. The use of technical indicators for financial forecasting is quite common among the traders. Input window length is a time frame parameter required to be set when calculating many technical indicators.
Our problem here is to define whether or not a certain news article is fake news. The dataset is comprised of 3997 news articles each includes a title, text, and the target label as a REAL/FAKE binary label. Part of the course was also testing the model on a test dataset but I never received target for this dataset. The accuracy score of cross validation testing within the training dataset was 94%.
It’s hard to imagine a modern, tech-literate business that doesn’t use data analysis, data science, machine learning, or artificial intelligence in some form. NumPy is at the core of all of those fields.
In this post, we are going to work with Pandas iloc, and loc. More specifically, we are going to learn slicing and indexing by iloc and loc examples.
Once we have a dataset loaded as a Pandas dataframe, we often want to start accessing specific parts of the data based on some criteria. For instance, if our dataset contains the result of an experiment comparing different experimental groups, we may want to calculate descriptive statistics for each experimental group separately.
In this tutorial you will learn how to use OpenCV to stream video from a webcam to a web browser/HTML page using Flask and Python.
Python’s pandas library is one of the things that makes Python a great programming language for data analysis. Pandas makes importing, analyzing, and visualizing data much easier. It builds on packages like NumPy and matplotlib to give you a single, convenient, place to do most of your data analysis and visualization work.
What is Pyjanitor? Before we continue learning on how to use Pandas and Pyjanitor to clean our datasets, we will learn about this package. The python package Pyjanitor extends Pandas with a verb-based API. This easy to use API is providing us with convenient data cleaning techniques. Apparently, it started out as a port of the R package janitor. Furthermore, it is inspired by the ease-of-use and expressiveness of the r-package dplyr. Note, there are some different ways how to work with the methods and this post will not cover all of them (see the documentation).
One of the most common mistakes data scientists make when training machine learning models is incorrectly splitting data for training and testing. The train/test split involves splitting data during the model training and evaluation process.
Learner makes this simple with a single parameter selection during the model building process. It’s also simple to set the percentage split between training and testing data for each model trained.
This article introduces how to build a Python and Flask based web application for performing text analytics on internet resources such as blog pages. To perform text analytics I will utilizing Requests for fetching web pages, BeautifulSoup for parsing html and extracting the viewable text and, apply the TextBlob package to calculate a few sentiment scores. The code for this article is hosted on GitHub so please fork and experiment with it.
Searching for pulsars is a labor-intensive process that requires experienced astronomers and trained volunteers for their classification. In this article, we implement machine learning techniques to facilitate the process.
Data pipelines are where most of the time is spent for those working with data because the bulk of a machine learning project involves data collection and cleaning. Loominus gives everyone the power to build the data pipelines critical to any machine learning project.
Teraport is a powerful tool within the Loominus product suite that ingests and stages data. In another post, we’ll discuss the data ingestion APIs. For now we’ll focus on building a powerful data pipeline for feature engineering.
Hugging Face, the NLP startup behind several social AI apps and open source libraries such as PyTorch BERT, just released a new python library called PyTorch Transformers.
Transformers are a new set of techniques used to train highly performing and efficient models for performing natural language processing (NLP) and natural language understanding (NLU) tasks such as questions answering and sentiment analysis. Several of the recent techniques used to improve and advance the performance of NLP models, such as XLNet and BERT, are all based on a variation of Transformer.
What it sounds like 🙂
In today’s tutorial, you will learn how to use Keras’ ImageDataGenerator class to perform data augmentation. I’ll also dispel common confusions surrounding what data augmentation is, why we use data augmentation, and what it does/does not do.
Machine learning is pretty undeniably the hottest topic in data science right now. It’s also the basic concept that underpins some of the most exciting areas in technology, like self-driving cars and predictive analytics. Searches for Machine Learning on Google hit an all-time-high in April of 2019, and they interest hasn’t declined much since.
This tutorial will show you how to develop, completely from scratch, a stand-alone photo editing app to add filters to your photos using Python, Tkinter, and OpenCV!
Panel is an open-source Python library that lets you create custom interactive web apps and dashboards by connecting user-defined widgets to plots, images, tables, or text.
This project refers to Lambda Labs at Lambda School in which students spent the past 5 weeks building production-grade web applications, with some of them utilizing machine learning models as part of their backends.
The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, the size and complexity of the pandas library makes it challenging to discover the best way to accomplish any given task.
The Pattern library is a multipurpose library capable of handling the following tasks:
- Natural Language Processing: Performing tasks such as tokenization, stemming, POS tagging, sentiment analysis, etc.
- Data Mining: It contains APIs to mine data from sites like Twitter, Facebook, Wikipedia, etc.
- Machine Learning: Contains machine learning models such as SVM, KNN, and perceptron, which can be used for classification, regression, and clustering tasks.
In this article, we will see the first two applications of the Pattern library from the above list. We will explore the use of the Pattern Library for NLP by performing tasks such as tokenization, stemming and sentiment analysis. We will also see how the Pattern library can be used for web mining.
Topic Model: In a nutshell, it is a type of statistical model used for tagging abstract “topics” that occur in a collection of documents that best represents the information in them.
Many techniques are used to obtain topic models. This post aims to demonstrate the implementation of LDA: a widely used topic modeling technique.
String manipulations are an essential part of Data Science. The latest release of Vaex adds incredibly fast and memory efficient support for all common string manipulations. Compared to Pandas, the most popular DataFrame library in the Python ecosystem, string operations are up to ~30–100x faster on your quadcore laptop, and up to a 1000 times faster on a 32 core machine.
In this article, we will explore TextBlob, which is another extremely powerful NLP library for Python. TextBlob is built upon NLTK and provides an easy to use interface to the NLTK library. We will see how TextBlob can be used to perform a variety of NLP tasks ranging from parts-of-speech tagging to sentiment analysis, and language translation to text classification.