This book covers the building blocks of the most common methods in machine learning. This set of methods is like a toolbox for machine learning engineers. Those entering the field of machine learning should feel comfortable with this toolbox so they have the right tool for a variety of tasks. Each chapter in this book corresponds to a single machine learning method or group of methods. In other words, each chapter focuses on a single tool within the ML toolbox.
In order to implement an algorithmic trading strategy though, you have to first narrow down a list of stocks that you want to analyze. This walk-through provides an automated process (using python and logistic regression) for determining the best stocks to algo-trade.
I will dive deeper into the logic and code below, but here is a high-level overview of the process:
- Import the historical data of every stock using yahoo finance.
- Pull in over 32 technical indicators for each stock using the technical analysis library.
- Perform a logistic regression on each stock using 5, 30, and 60 day observation time periods.
- Interpret the results.
Array programming provides a powerful, compact and expressive syntax for accessing, manipulating and operating on data in vectors, matrices and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. It has an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, materials science, engineering, finance and economics.
In this tutorial, you will be learning how to build powerful time-series forecasting model of your own using various kinds of deep learning algorithms such as Dense Neural Networks (DNN), Convolutional Neural Network (CNN) and Recurrent Neural Networks (RNN). Also, this course is an elaboration of the time-series forecasting tutorial by TensorFlow.
LendingClub is the world’s largest peer-to-peer lending platform. Until recently (through the end of 2018), LendingClub published a public dataset of all loans issued since the company’s launch in 2007.
This post reviews NumPy main components and functionality, with attention to the needs of Data Science and Machine Learning practitioners, and people who aspire to become a data professional.
Lessons learned building a profitable algorithmic trading system using Reinforcement Learning techniques.
The plotting functionality in the popular Python data analysis library Pandas has always been one of my go-to methods for super quick charts. However, the available visualisations have always been fairly basic and not particularly pretty.
It’s easy to get carried away with the wealth of data and free open-source tools available for data science. After spending a little bit of time with the quandl financial library and the prophet modeling library, I decided to try some simple stock data exploration. Several days and 1000 lines of Python later, I ended up with a complete stock analysis and prediction tool. Although I am not confident (or foolish) enough to use it to invest in individual stocks, I learned a ton of Python in the process and in the spirit of open-source, want to share my results and code so others can benefit.
MlFinlab is a python package which helps portfolio managers and traders who want to leverage the power of machine learning by providing reproducible, interpretable, and easy to use tools. Adding MlFinLab to your companies pipeline is like adding a department of PhD researchers to your team.
The price of energy changes hourly, which opens up the possibility of temporal arbitrage: buying energy at a low price, storing it, and selling it later at a higher price. To successfully execute any temporal arbitrage strategy, some amount of confidence in future prices is required, to be able to expect to make a profit. In the case of energy arbitrage, the constraints of the energy storage system must also be considered. For example, batteries have limited capacity, limited rate of charging, and are not 100% efficient in that not all of the energy used to charge a battery will be available later for discharge.
This series explains concepts that are fundamental to deep learning and artificial neural networks for beginners. In addition to covering these concepts, we also show how to implement some of the concepts in code using Keras, a neural network API written in Python. We will learn about layers in an artificial neural network, activation functions, backpropagation, convolutional neural networks (CNNs), data augmentation, transfer learning and much more!
Exchange rates API is a simple and lightweight free service for current and historical foreign exchange rates.
In this post we will discuss how to do a time series modelling using ARMA and ARIMA models. Here AR stands for Auto-Regressive and MA stands for Moving Average
Since its emergence in Asia late 2019, the coronavirus COVID-19 pandemic has been devastating. The virus spread to most countries causing severe respiratory infections and many human casualties. The virus also put half of the world population in lockdown which resulted in a slowdown of the world economy and a fall in stock prices.
The goal of this tutorial is to introduce the steps for collecting and analyzing stock data in the context of the coronavirus pandemic. To do this, we will use Python, Google Sheets and Google Finance.
In finance, computation efficiency can be directly converted to trading profits sometimes. Quants are facing the challenges of trading off research efficiency with computation efficiency. Using Python can produce succinct research codes, which improves research efficiency. However, vanilla Python code is known to be slow and not suitable for production. In this post, I explore how to use Python GPU libraries to achieve the state-of-the-art performance in the domain of exotic option pricing.
If you have a relative working in the banking industry, ask the person what annoys him/her most about the job. You will surely receive an answer that is related to the task of data entry i.e. the practice of manually entering serial numbers and names from financial documents into the bank’s database.
Lots of quantitative risk metrics for analyzing your backtest and trading performance. Created by Quantopian for their popular Zipline backtesting framework, this library works totally independently.
The experience while accessing the AI platform and running machine learning (ML) training code on the platform must be smooth and easy for the researchers. Migrating any ML code from a local environment to the platform should not require any refactoring of the code at all. Infrastructure configuration overhead should be minimal. Our mission while developing PyKrylov was to abstract the ML logic from the infrastructure and Krylov core components as much as possible in order to achieve the best experience for the platform users.
One of the milestones of the investment management application was to implement an end to end solution that starts by fetching company stock prices and builds a set of efficient and optimum portfolios using optimisation routines.
Predictive model to correctly forecast future trend is crucial for investment management and algorithmic trading. The use of technical indicators for financial forecasting is quite common among the traders. Input window length is a time frame parameter required to be set when calculating many technical indicators.
Our problem here is to define whether or not a certain news article is fake news. The dataset is comprised of 3997 news articles each includes a title, text, and the target label as a REAL/FAKE binary label. Part of the course was also testing the model on a test dataset but I never received target for this dataset. The accuracy score of cross validation testing within the training dataset was 94%.
Loominus has opened up registration and is offering free accounts for a limited time. There’s a whole slew of new features including private data repos, data pipeline cloning, automated data pipelines, enhanced column type detection, UI/UX improvements, detailed information for active tasks, model stream updates and updated API documentation.
Loominus is an end-to-end platform that helps teams ingest and stage data, build advanced machine learning models with no code and deploy them into production. Loominus makes it easy for individuals and teams without experience building machine learning pipelines to take advantage of machine learning faster. Loominus is equally great for experienced data scientists that need to focus on model selection and tuning.
Since the invention of the automobile, manufacturers have steadily added more safety features and improved car design over time with the goal of keeping drivers safer on the road. Automotive manufacturers have spent millions of dollars researching safety improvements for seatbelts, tires, and pretty much every car piece or part imaginable. Despite all of this investment, driving remains substantially more fatal than alternatives such as air travel in 2019. According to the National Safety Council, approximately 40,000 people died in automotive accidents in the United States alone in 2018. In fact, there were a total of ~500 deaths resulting from plane crashes recorded globally in 2018 — that’s 80 times fewer deaths when compared to car crash fatalities in the US only.
A common misconception is that the market cannot be predicted and that hedge fund managers are no better than dart-throwing monkeys. Many academic research papers back up this claim with data. This is an overly simplistic view. Just because some markets cannot be predicted under some experimental settings, such as equities traded on a daily basis, this does not mean no market can be predicted in any setting. Let us try to get an intuitive understanding of what it means to predict the market.
This article will discuss several tips and shortcuts for using
iloc to work with a data set that has a large number of columns. Even if you have some experience with using
iloc you should learn a couple of helpful tricks to speed up your own analysis and avoid typing lots of column names in your code.
It’s hard to imagine a modern, tech-literate business that doesn’t use data analysis, data science, machine learning, or artificial intelligence in some form. NumPy is at the core of all of those fields.
n this tutorial, you will learn how to detect fire and smoke using Computer Vision, OpenCV, and the Keras Deep Learning library.
This is a collection of Jupyter notebooks based on different topics in the area of quantitative finance. Wow!
What it says on the tin.
From GPS navigation to network-layer link-state routing, Dijkstra’s Algorithm powers some of the most taken-for-granted modern services. Utilizing some basic data structures, let’s get an understanding of what it does, how it accomplishes its goal, and how to implement it in Python (first naively, and then with good asymptotic runtime!)
TensorFlow Lite is a framework for running lightweight machine learning models, and it’s perfect for low-power devices like the Raspberry Pi! This video shows how to set up TensorFlow Lite on the Raspberry Pi for running object detection models to locate and identify objects in real-time webcam feeds, videos, or images.
This April a 1.5 billion dollar medicare scheme took advantage of hundreds of thousands of seniors in the US. In reality, this is just a small sliver of the billions of dollars healthcare fraud costs both consumers and insurance providers annually.
Healthcare fraud can come from many different directions. Some people might think of the patient who pretends to be injured, but actually, much of fraud is caused by providers(as in the NYT article).
Providers often have financial incentives for increasing performing unnecessary surgeries or claiming work they never even did. This leads to many different flavors of fraud that can all be difficult to detect on a claim by claim basis.
In this tutorial, you will learn how to train your own traffic sign classifier/recognizer capable of obtaining over 95% accuracy using Keras and Deep Learning.
This article summarizes how to clean up messy currency fields and convert them into a numeric value for further analysis. The concepts illustrated here can also apply to other types of pandas data cleanup tasks.
Investing was always associated with large amounts of money, both in terms of the invested amount as well as costs associated with it. Here at BUX, we want to make investing accessible to everyone. That is why we recently launched BUX Zero in the Netherlands and other European countries will follow soon! BUX Zero is a zero-commission stock trading app, which makes investing not only accessible but also easy to do directly from your phone.
Bayesian Optimization provides a principled technique based on Bayes Theorem to direct a search of a global optimization problem that is efficient and effective. It works by building a probabilistic model of the objective function, called the surrogate function, that is then searched efficiently with an acquisition function before candidate samples are chosen for evaluation on the real objective function.
This document serves as an introduction, crash course, and quick API reference for TensorFlow 2.0.
A large amount of data that is generated today is unstructured, which requires processing to generate insights. Some examples of unstructured data are news articles, posts on social media, and search history. The process of analyzing natural language and making sense out of it falls under the field of Natural Language Processing (NLP). Sentiment analysis is a common NLP task, which involves classifying texts or parts of texts into a pre-defined sentiment. You will use the Natural Language Toolkit (NLTK), a commonly used NLP library in Python, to analyze textual data.
In this post, we are going to work with Pandas iloc, and loc. More specifically, we are going to learn slicing and indexing by iloc and loc examples.
Once we have a dataset loaded as a Pandas dataframe, we often want to start accessing specific parts of the data based on some criteria. For instance, if our dataset contains the result of an experiment comparing different experimental groups, we may want to calculate descriptive statistics for each experimental group separately.
In this tutorial, we’re going to dig into how to transform data using Python scripts and the command line.
But first, it’s worth asking the question you may be thinking: “How does Python fit into the command line and why would I ever want to interact with Python using the command line when I know I can do all my data science work using IPython notebooks or Jupyter lab?”
Notebooks are great for quick data visualization and exploration, but Python scripts are the way to put anything we learn into production. Let’s say you want to make a website to help people make Hacker News posts with ideal headlines and submission times. To do this, you’ll need scripts.
Logistic regression is the bread-and-butter algorithm for machine learning classification. If you’re a practicing or aspiring data scientist, you’ll want to know the ins and outs of how to use it. Also, Scikit-learn’s
LogisticRegression is spitting out warnings about changing the default solver, so this is a great time to learn when to use which solver. 😀
TensorFlow 2 is now live! This tutorial walks you through the process of building a simple CIFAR-10 image classifier using deep learning. In this tutorial, we will:
- Define a model
- Set up a data pipeline
- Train the model
- Accelerate training speed with multiple GPUs
- Add callbacks for monitoring progress/updating learning schedules
The code in this tutorial is available here.