This article introduces how to build a Python and Flask based web application for performing text analytics on internet resources such as blog pages. To perform text analytics I will utilizing Requests for fetching web pages, BeautifulSoup for parsing html and extracting the viewable text and, apply the TextBlob package to calculate a few sentiment scores. The code for this article is hosted on GitHub so please fork and experiment with it.
Hugging Face, the NLP startup behind several social AI apps and open source libraries such as PyTorch BERT, just released a new python library called PyTorch Transformers.
Transformers are a new set of techniques used to train highly performing and efficient models for performing natural language processing (NLP) and natural language understanding (NLU) tasks such as questions answering and sentiment analysis. Several of the recent techniques used to improve and advance the performance of NLP models, such as XLNet and BERT, are all based on a variation of Transformer.
People often complain about important subjects being covered too little in the news. One such subject is climate change. The scientific consensus is that this is an important problem, and it stands to reason that the more people are aware of it, the better our chances may be of solving it. But how can we assess how widely covered climate change is by various media outlets? We can use Python to do some text analysis!
The Pattern library is a multipurpose library capable of handling the following tasks:
- Natural Language Processing: Performing tasks such as tokenization, stemming, POS tagging, sentiment analysis, etc.
- Data Mining: It contains APIs to mine data from sites like Twitter, Facebook, Wikipedia, etc.
- Machine Learning: Contains machine learning models such as SVM, KNN, and perceptron, which can be used for classification, regression, and clustering tasks.
In this article, we will see the first two applications of the Pattern library from the above list. We will explore the use of the Pattern Library for NLP by performing tasks such as tokenization, stemming and sentiment analysis. We will also see how the Pattern library can be used for web mining.
Topic Model: In a nutshell, it is a type of statistical model used for tagging abstract “topics” that occur in a collection of documents that best represents the information in them.
Many techniques are used to obtain topic models. This post aims to demonstrate the implementation of LDA: a widely used topic modeling technique.
In this article, we will explore TextBlob, which is another extremely powerful NLP library for Python. TextBlob is built upon NLTK and provides an easy to use interface to the NLTK library. We will see how TextBlob can be used to perform a variety of NLP tasks ranging from parts-of-speech tagging to sentiment analysis, and language translation to text classification.