A common misconception is that the market cannot be predicted and that hedge fund managers are no better than dart-throwing monkeys. Many academic research papers back up this claim with data. This is an overly simplistic view. The fact that some markets cannot be predicted under some experimental settings, such as equities traded on a daily basis, does not mean that no market can be predicted in any setting. Let us try to get an intuitive understanding of what it means to predict the market.
This article will discuss several tips and shortcuts for using iloc to work with a data set that has a large number of columns. Even if you have some experience with using iloc, you should learn a couple of helpful tricks to speed up your own analysis and avoid typing lots of column names in your code.
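As a small illustration of the kind of shortcut meant here, the sketch below picks several column ranges by position instead of typing column names. The DataFrame and its column names are invented for the example, and pairing iloc with NumPy's `np.r_` is one common approach, not necessarily the one the article uses.

```python
import numpy as np
import pandas as pd

# Hypothetical wide data set: 3 rows, 10 columns named col_0 .. col_9.
df = pd.DataFrame(
    np.arange(30).reshape(3, 10),
    columns=[f"col_{i}" for i in range(10)],
)

# A plain positional slice: all rows, columns at positions 2, 3 and 4.
middle = df.iloc[:, 2:5]

# np.r_ concatenates single positions and slices into one index array,
# so non-adjacent column ranges can be selected in a single iloc call.
picked = df.iloc[:, np.r_[0, 3:5, 8:10]]

print(list(picked.columns))
```

The same pattern scales to data sets with hundreds of columns, where listing names by hand becomes error-prone.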
It’s hard to imagine a modern, tech-literate business that doesn’t use data analysis, data science, machine learning, or artificial intelligence in some form. NumPy is at the core of all of those fields.
In this tutorial, you will learn how to detect fire and smoke using Computer Vision, OpenCV, and the Keras Deep Learning library.
This is a collection of Jupyter notebooks based on different topics in the area of quantitative finance. Wow!
What it says on the tin.
From GPS navigation to network-layer link-state routing, Dijkstra’s Algorithm powers some of the most taken-for-granted modern services. Utilizing some basic data structures, let’s get an understanding of what it does, how it accomplishes its goal, and how to implement it in Python (first naively, and then with good asymptotic runtime!)
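For a sense of what the heap-based version looks like, here is a minimal sketch using Python's standard `heapq` module. The graph format (a dict mapping each node to a list of `(neighbor, weight)` pairs) and the example graph are assumptions made for illustration; the article's own implementation may differ.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest distances from source to every reachable node.

    graph maps each node to a list of (neighbor, weight) pairs.
    Weights must be non-negative for Dijkstra's algorithm to be correct.
    """
    dist = {source: 0}
    heap = [(0, source)]  # (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical example graph.
example_graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}
distances = dijkstra(example_graph, "A")
print(distances)
```

Pushing duplicate entries and skipping stale ones on pop avoids the need for a decrease-key operation, which `heapq` does not provide.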
TensorFlow Lite is a framework for running lightweight machine learning models, and it’s perfect for low-power devices like the Raspberry Pi! This video shows how to set up TensorFlow Lite on the Raspberry Pi for running object detection models to locate and identify objects in real-time webcam feeds, videos, or images.
This April, a 1.5 billion dollar Medicare scheme took advantage of hundreds of thousands of seniors in the US. In reality, this is just a small sliver of the billions of dollars healthcare fraud costs both consumers and insurance providers annually.
Healthcare fraud can come from many different directions. Some people might think of the patient who pretends to be injured, but much of the fraud is actually committed by providers (as in the NYT article).
Providers often have financial incentives for performing unnecessary surgeries or claiming work they never did. This leads to many different flavors of fraud that can all be difficult to detect on a claim-by-claim basis.
In this tutorial, you will learn how to train your own traffic sign classifier/recognizer capable of obtaining over 95% accuracy using Keras and Deep Learning.
This article summarizes how to clean up messy currency fields and convert them into a numeric value for further analysis. The concepts illustrated here can also apply to other types of pandas data cleanup tasks.
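A minimal sketch of the idea, assuming a column that mixes currency strings with plain numbers (the values below are invented, and the helper name `clean_currency` is hypothetical):

```python
import pandas as pd

# Hypothetical messy column: currency strings mixed with a real number.
sales = pd.Series(["$1,000.00", "$250.50", "$3,400", 75.0])


def clean_currency(value):
    """Strip '$' and ',' from strings; pass non-string values through."""
    if isinstance(value, str):
        return value.replace("$", "").replace(",", "")
    return value


# Clean each value, then convert the whole column to floats.
sales_numeric = sales.apply(clean_currency).astype(float)
print(sales_numeric)
```

Checking the type before calling `replace` matters because a column read from a spreadsheet can hold strings and numbers side by side.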
Investing has always been associated with large amounts of money, in terms of both the amount invested and the costs associated with it. Here at BUX, we want to make investing accessible to everyone. That is why we recently launched BUX Zero in the Netherlands; other European countries will follow soon! BUX Zero is a zero-commission stock trading app, which makes investing not only accessible but also easy to do directly from your phone.
Bayesian Optimization provides a principled technique based on Bayes' Theorem to direct an efficient and effective search of a global optimization problem. It works by building a probabilistic model of the objective function, called the surrogate function, that is then searched efficiently with an acquisition function before candidate samples are chosen for evaluation on the real objective function.
This document serves as an introduction, crash course, and quick API reference for TensorFlow 2.0.
Much of the data generated today is unstructured and requires processing to generate insights. Some examples of unstructured data are news articles, posts on social media, and search history. The process of analyzing natural language and making sense of it falls under the field of Natural Language Processing (NLP). Sentiment analysis is a common NLP task, which involves classifying texts or parts of texts into a pre-defined sentiment. You will use the Natural Language Toolkit (NLTK), a commonly used NLP library in Python, to analyze textual data.
In this post, we are going to work with Pandas iloc and loc. More specifically, we are going to learn slicing and indexing through iloc and loc examples.
Once we have a dataset loaded as a Pandas dataframe, we often want to start accessing specific parts of the data based on some criteria. For instance, if our dataset contains the result of an experiment comparing different experimental groups, we may want to calculate descriptive statistics for each experimental group separately.
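The sketch below shows the difference between the two indexers and the per-group statistics mentioned above. The experimental data, column names, and row labels are made up for the example.

```python
import pandas as pd

# Hypothetical experiment results, indexed by subject label.
df = pd.DataFrame(
    {
        "group": ["control", "control", "treatment", "treatment"],
        "score": [10.0, 14.0, 20.0, 26.0],
        "age": [31, 45, 28, 39],
    },
    index=["s1", "s2", "s3", "s4"],
)

# loc selects by label or boolean mask: scores of the treatment group.
treatment_scores = df.loc[df["group"] == "treatment", "score"]

# iloc selects by integer position: first two rows, last two columns.
first_rows = df.iloc[:2, 1:]

# Descriptive statistic computed separately for each experimental group.
group_means = df.groupby("group")["score"].mean()
print(group_means)
```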
In this tutorial, we’re going to dig into how to transform data using Python scripts and the command line.
But first, it’s worth asking the question you may be thinking: “How does Python fit into the command line and why would I ever want to interact with Python using the command line when I know I can do all my data science work using IPython notebooks or Jupyter lab?”
Notebooks are great for quick data visualization and exploration, but Python scripts are the way to put anything we learn into production. Let’s say you want to make a website to help people make Hacker News posts with ideal headlines and submission times. To do this, you’ll need scripts.
Logistic regression is the bread-and-butter algorithm for machine learning classification. If you’re a practicing or aspiring data scientist, you’ll want to know the ins and outs of how to use it. Also, Scikit-learn’s LogisticRegression is spitting out warnings about changing the default solver, so this is a great time to learn when to use which solver. 😀
TensorFlow 2 is now live! This tutorial walks you through the process of building a simple CIFAR-10 image classifier using deep learning. In this tutorial, we will:
- Define a model
- Set up a data pipeline
- Train the model
- Accelerate training speed with multiple GPUs
- Add callbacks for monitoring progress/updating learning schedules
The code in this tutorial is available here.
Comparing 5 popular neural net architectures on iOS: VGG16, ResNet50, InceptionV3, GoogleNet, and SqueezeNet using PyTorch.
Since the advent of deep reinforcement learning for game play in 2013, and simulated robotic control shortly after, a multitude of new algorithms have flourished. Most of these are model-free algorithms which can be categorized into three families: deep Q-learning, policy gradients, and Q-value policy gradients.
Onesies with logos of open source software. Your favorite open source software for your favorite munchkin.
Although there are an increasing number of commercial AutoML products, the open-source ecosystem has been innovating here as well. In the early days of the AutoML movement, the focus was on those looking to leverage the power of ML models without a background in data science – citizen data scientists. Today, however, AutoML tools have a lot to offer experts too.
One of the milestones of the investment management application was to implement an end-to-end solution that starts by fetching company stock prices and builds a set of efficient, optimal portfolios using optimisation routines.
In this article, we’ll use some basic machine learning methods to train a bot to play cards against me. The card game that I’m interested in is called Literature, a game similar to Go Fish.
The version of Literature that we implemented is roughly similar to the rules I linked above. Literature is played in two teams, and the teams compete to collect “sets.” A set is a collection of either A – 6 of a suit or 8 – K of a suit (7’s are not included in the game).
The purpose of this article is to introduce the reader to some of the tools used to spot stock market trends.
We will utilize a data set consisting of five years of daily stock market data for Analog Devices. The time period we consider starts on January 1, 2013 and ends on December 31, 2017. We will start analyzing the data using line plots, then introduce candlestick charts. We then introduce patterns visible in the candlestick chart that can be used to spot changes in the market. We add another level of analysis by overlaying moving averages and discussing how these can help confirm trend changes. Finally, we construct a figure that concisely summarizes the stock price data for any company.
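To make the moving-average idea concrete, here is a minimal sketch with pandas. The prices are invented (they are not Analog Devices data), and the window lengths of 2 and 4 days are chosen only to keep the example short; comparing a short and a long average is one simple way to flag a trend, not necessarily the article's method.

```python
import pandas as pd

# Hypothetical daily closing prices, not real market data.
close = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 12.0, 11.0, 10.0],
    index=pd.date_range("2013-01-01", periods=7, freq="D"),
    name="close",
)

# Simple moving averages over a short and a long window.
short_ma = close.rolling(window=2).mean()
long_ma = close.rolling(window=4).mean()

# The short average sitting above the long one hints at an uptrend.
# Days without enough history produce NaN, which compares as False.
uptrend = short_ma > long_ma
print(uptrend)
```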
An introduction to running parallel tasks with Celery, plus how and why we built an API on top of Celery’s Canvas task primitives.
One of the technology goals of Zymergen is to empower biologists to explore genetic edits of microbes in a high throughput and highly automated manner. The Computational Biology team at Zymergen is responsible for building software to help scientists design and execute these genetic edits. (For a brief overview, see our Zymergen 101 tutorial).
In this tutorial you will learn how to use OpenCV to stream video from a webcam to a web browser/HTML page using Flask and Python.
Python’s pandas library is one of the things that makes Python a great programming language for data analysis. Pandas makes importing, analyzing, and visualizing data much easier. It builds on packages like NumPy and matplotlib to give you a single, convenient place to do most of your data analysis and visualization work.
The Machine Learning team at commercetools is excited to release the beta version of our new Image Search API.
Image search (sometimes called reverse image search) is a tool that takes an image as a query and returns a duplicate or similar image as a response. The technology driving this search engine is called computer vision, and advancements in this field are giving way to some compelling product features.
What is Pyjanitor? Before we continue learning how to use Pandas and Pyjanitor to clean our datasets, we will learn about this package. The Python package Pyjanitor extends Pandas with a verb-based API. This easy-to-use API provides us with convenient data cleaning techniques. Apparently, it started out as a port of the R package janitor. Furthermore, it is inspired by the ease of use and expressiveness of the R package dplyr. Note that there are several different ways to work with the methods, and this post will not cover all of them (see the documentation).
In this tutorial, you will learn how to implement a simple scene boundary/shot transition detector with OpenCV.
In this post, which can be read as a follow-up to our ultimate web scraping guide, we will cover almost all the tools Python offers you to web scrape. We will go from the most basic to the most advanced and will cover the pros and cons of each. Of course, we won’t be able to cover every aspect of every tool we discuss, but this post should be enough to give you a good idea of which tool does what, and when to use which.
One of the most common mistakes data scientists make when training machine learning models is incorrectly splitting data for training and testing. The train/test split involves splitting data during the model training and evaluation process.
Learner makes this simple with a single parameter selection during the model building process. It’s also simple to set the percentage split between training and testing data for each model trained.
Systematic trading allows you to test and evaluate your trading ideas before risking your money. By formulating trading ideas as concrete rules, you can evaluate past performance and draw conclusions about the viability of your trading plan.
Following systematic rules provides a consistent approach where you will have some degree of predictability of returns, and perhaps more importantly, it takes emotions and second guessing out of the equation.
At the outset, getting started with professional-grade development and backtesting of systematic strategies can seem daunting. Many resort to simplified software, which will limit your potential.
NAG has developed, in collaboration with Xi-FINTIQ, a CVA demonstration code to show how the NAG Library and NAG Algorithmic Differentiation (AD) tool dco/c++ combined with Origami – a Grid/Cloud Task Execution Framework available through NAG – can work together to solve large scale CVA computations.
What Softmax is, how it’s used, and how to implement it in Python.
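As a quick taste, softmax turns a list of scores into probabilities that sum to 1 by exponentiating each score and dividing by the total. The sketch below uses only the standard library; subtracting the maximum score first is a common numerical-stability trick and is an addition here, not something the blurb states.

```python
import math


def softmax(scores):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    # Subtracting the max leaves the result unchanged mathematically
    # but keeps math.exp from overflowing on large inputs.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs)
```

Larger scores receive larger probabilities, and the ordering of the inputs is preserved in the output.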
Transfer learning is a powerful technique for training deep neural networks that allows one to take knowledge learned about one deep learning problem and apply it to a different, yet similar learning problem.
Using transfer learning can dramatically speed up the rate of deployment for an app you are designing, making both the training and implementation of your deep neural network simpler and easier.
In this tutorial, you will learn how to automatically find learning rates using Keras. This guide provides a Keras implementation of fast.ai’s popular “lr_find” method.
This article introduces how to build a Python and Flask based web application for performing text analytics on internet resources such as blog pages. To perform text analytics, I will use Requests for fetching web pages, BeautifulSoup for parsing HTML and extracting the viewable text, and the TextBlob package to calculate a few sentiment scores. The code for this article is hosted on GitHub, so please fork and experiment with it.
With Python code to scrape, extract, transform and load it into an HDF5 data store to please your future self.
In this tutorial, you will learn how to use Cyclical Learning Rates (CLR) and Keras to train your own neural networks. Using Cyclical Learning Rates you can dramatically reduce the number of experiments required to tune and find an optimal learning rate for your model.
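For intuition, the sketch below computes the learning rate under the "triangular" policy, in which the rate climbs linearly from a lower bound to an upper bound and back over a fixed number of iterations. It is plain Python, not the Keras callback from the tutorial, and the function name and default values (`base_lr`, `max_lr`, `step_size`) are assumptions chosen for illustration.

```python
import math


def triangular_clr(iteration, base_lr=0.001, max_lr=0.006, step_size=2000):
    """Learning rate at a given iteration under the triangular CLR policy.

    step_size is the number of iterations in half a cycle, so the rate
    rises for step_size iterations and then falls for step_size more.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    # x is the normalized distance from the cycle's peak (0 at the peak).
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# Sample the schedule at the start, midpoint, peak, and end of one cycle.
schedule = [triangular_clr(i) for i in (0, 1000, 2000, 4000)]
print(schedule)
```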
Searching for pulsars is a labor-intensive process that requires experienced astronomers and trained volunteers for their classification. In this article, we implement machine learning techniques to facilitate the process.
Data pipelines are where most of the time is spent for those working with data because the bulk of a machine learning project involves data collection and cleaning. Loominus gives everyone the power to build the data pipelines critical to any machine learning project.
Teraport is a powerful tool within the Loominus product suite that ingests and stages data. In another post, we’ll discuss the data ingestion APIs. For now we’ll focus on building a powerful data pipeline for feature engineering.
In this post we will learn how to create a binder so that our data analysis, for instance, can be fully reproduced by other researchers. That is, in this post we will learn how to use binder for reproducible research.
Hugging Face, the NLP startup behind several social AI apps and open source libraries such as PyTorch BERT, just released a new Python library called PyTorch Transformers.
Transformers are a new set of techniques used to train high-performing, efficient models for natural language processing (NLP) and natural language understanding (NLU) tasks such as question answering and sentiment analysis. Several of the recent techniques used to improve and advance the performance of NLP models, such as XLNet and BERT, are based on variations of the Transformer.