
Sentiment Analysis, Python Machine Learning and Twitter

Sentiment140 is a tool that lets you evaluate a written text to determine whether the writer has a positive or negative opinion about a specific topic. With the 2015 Argentinian presidential election approaching, we are going to evaluate the public image of the main candidates: Sergio Massa, Mauricio Macri and Daniel Scioli. To do so, we will use the Python library Tweepy to collect thousands of tweets that mention them.

What is sentiment analysis?

Sentiment analysis aims to determine the attitude of a speaker or writer with respect to some topic, or the overall contextual polarity of a document. The attitude may be a judgment or evaluation, an affective state, or the intended emotional communication.
Written text can be broadly categorized into two types: facts and opinions.

  • Opinions carry people’s sentiments and feelings. This kind of text can be classified as positive or negative. For example, given the text “I would like to see Macri as president”, we can assume that the speaker has a positive opinion about the candidate. Likewise, given the text “I wouldn’t like to see Macri as president”, we can assume that the speaker has a negative impression of him.
  • Facts, on the other hand, are statements such as “Today Macri visited three neighborhoods”. We should ignore this kind of text because we can’t determine whether the writer has a positive or a negative opinion about the politician.
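To make the opinion/fact distinction concrete, here is a deliberately naive, hypothetical keyword-based classifier. It is not how Sentiment140 works (Sentiment140 uses machine learning); the word lists are made up for illustration. It returns values on the same 0/2/4 polarity scale that Sentiment140 uses, with 2 standing for texts that express no clear opinion, such as the fact above.

```python
# Hypothetical word lists, for illustration only (not Sentiment140's method).
POSITIVE_WORDS = {"like", "love", "support", "great"}
NEGATIVE_WORDS = {"hate", "dislike", "bad", "terrible"}
NEGATIONS = {"not", "no", "never", "don't", "wouldn't", "won't"}


def naive_polarity(text: str) -> int:
    """Return 4 (positive), 0 (negative) or 2 (no clear opinion)."""
    # Normalize curly apostrophes, lowercase, and strip surrounding punctuation.
    words = [
        w.strip('.,!?"“”').lower()
        for w in text.replace("\u2019", "'").split()
    ]
    score = 0
    negate = False
    for word in words:
        if word in NEGATIONS:
            negate = True  # flip the polarity of the next opinion word
        elif word in POSITIVE_WORDS or word in NEGATIVE_WORDS:
            value = 1 if word in POSITIVE_WORDS else -1
            score += -value if negate else value
            negate = False
    if score > 0:
        return 4
    if score < 0:
        return 0
    return 2


print(naive_polarity("I would like to see Macri as president"))     # 4
print(naive_polarity("I wouldn’t like to see Macri as president"))  # 0
print(naive_polarity("Today Macri visited three neighborhoods"))    # 2
```

A rule like this breaks down quickly on sarcasm, slang or longer sentences, which is why a trained classifier such as Sentiment140 is used instead.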

Analyzing sentiment on a regular basis will help you understand people’s feelings towards your company, brand, product, or whatever else you want to analyze.

Sentiment140

There are a few free tools that provide automatic sentiment analysis. One of the most popular is Sentiment140, an API that uses machine learning algorithms to classify tweets. Sentiment140 was created by three Computer Science graduate students at Stanford University (Alec Go, Richa Bhayani and Lei Huang).

How to get the data?

To collect thousands of tweets about each candidate, we use Tweepy, a widely used Python library for accessing the Twitter API.

Installation
The easiest way to install Tweepy is with pip. If you have pip installed, open a command line and type:

pip install tweepy

If not, you can use Git to clone the repository (https://github.com/tweepy/tweepy) and install it manually from the cloned directory.

Get Started with Tweepy
We won’t go into much detail about the library here, but below is an example of how to get tweets about a specific topic and store them in a list:
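A minimal sketch of this step, assuming Tweepy 4.x and Twitter API v2 access with a bearer token (both assumptions; the API has changed since 2015). `collect_tweets` is a hypothetical helper: it pages through `Client.search_recent_tweets`, which returns at most 100 tweets per request, following the `next_token` value in the response metadata until it has enough tweets or there are no more pages.

```python
def collect_tweets(client, query, limit=2500):
    """Collect up to `limit` tweet texts matching `query` into a list.

    `client` is assumed to be a tweepy.Client (Tweepy 4.x).
    """
    texts = []
    next_token = None
    while len(texts) < limit:
        # 100 is the maximum page size for the recent-search endpoint.
        params = {"query": query, "max_results": 100}
        if next_token:
            params["next_token"] = next_token
        response = client.search_recent_tweets(**params)
        # response.data is None when a page has no results.
        for tweet in response.data or []:
            texts.append(tweet.text)
        next_token = (response.meta or {}).get("next_token")
        if not next_token:  # no more pages available
            break
    return texts[:limit]


# Example usage (requires `pip install tweepy` and your own bearer token):
#   import tweepy
#   client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")
#   tweets = collect_tweets(client, "Macri -is:retweet lang:es", limit=2500)
```

Taking the client as a parameter keeps the paging logic separate from authentication, so the same function can be reused for each candidate's query.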

Once we have the tweets (in the example, 2500 tweets about one candidate stored in a list), we need to pass them to the Sentiment140 API to classify them.

Requests

Requests should be sent via HTTP POST to “http://www.sentiment140.com/api/bulkClassifyJson”. The body of the message should be a JSON object. Here’s an example:
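A minimal sketch of building such a body with Python's standard `json` module, assuming the format from the Sentiment140 API documentation: a top-level object whose `data` field is a list of tweet objects, each with a `text` field plus the optional `id`, `query` and `language` fields. The helper name `build_request_body` and the default language value `"es"` are illustrative assumptions.

```python
import json


def build_request_body(texts, query, language="es"):
    """Build a bulk-classification JSON body (format assumed from the API docs)."""
    data = [
        {
            "text": text,
            "id": i,               # optional: lets us match results back to tweets
            "query": query,        # optional: the search keywords, e.g. the candidate
            "language": language,  # optional: "es" assumed for Argentinian tweets
        }
        for i, text in enumerate(texts)
    ]
    return json.dumps({"data": data})


body = build_request_body(
    ["I would like to see Macri as president",
     "I wouldn’t like to see Macri as president"],
    query="Macri",
)
print(body)
```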

Some fields in the request, such as “id”, “query” and “language”, are optional, but it is recommended to provide “query” so that the search keywords themselves (for example, the candidate’s name) do not influence the sentiment.

Response

The response will be the same as the request, except for a new field “polarity” added to each object. In our example the response will be:
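Because each object in the response is the request object plus `polarity`, the results can be matched back to the original tweets through the `id` field. A minimal sketch with a made-up response body; `polarities_by_id` is a hypothetical helper and the sample values are illustrative, not real API output.

```python
import json


def polarities_by_id(response_body):
    """Map each tweet id to the polarity assigned by the service."""
    payload = json.loads(response_body)
    return {item["id"]: item["polarity"] for item in payload["data"]}


# Illustrative response body, not actual API output.
sample_response = json.dumps({"data": [
    {"text": "I would like to see Macri as president", "id": 0,
     "query": "Macri", "polarity": 4},
    {"text": "Today Macri visited three neighborhoods", "id": 1,
     "query": "Macri", "polarity": 2},
]})

print(polarities_by_id(sample_response))  # {0: 4, 1: 2}
```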

The polarity values are:

  • 0 : negative
  • 2 : neutral
  • 4 : positive
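These codes can be turned into readable labels and counted for each candidate. A minimal sketch; `count_labels` is a hypothetical helper name.

```python
from collections import Counter

# Polarity codes returned by Sentiment140, as listed above.
POLARITY_LABELS = {0: "negative", 2: "neutral", 4: "positive"}


def count_labels(polarities):
    """Count how many tweets fall into each sentiment label."""
    counts = Counter(POLARITY_LABELS[p] for p in polarities)
    # Include every label, even those with a count of zero.
    return {label: counts[label] for label in POLARITY_LABELS.values()}


print(count_labels([4, 4, 0, 2, 4]))  # {'negative': 1, 'neutral': 1, 'positive': 3}
```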

There are no explicit limits on the number of tweets in the bulk classification service, but there is a timeout window of 60 seconds: if a request takes more than 60 seconds to process, the server returns a 500 error.
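Since the limit is on processing time rather than on the number of tweets, one way to reduce the risk of hitting the timeout is to split a large list of tweets into several smaller requests. A minimal sketch; the default batch size of 500 is an arbitrary assumption, not a documented limit.

```python
def batches(items, size=500):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


tweets = [f"tweet {i}" for i in range(2500)]
print([len(batch) for batch in batches(tweets, size=1000)])  # [1000, 1000, 500]
```

Each batch can then be sent as a separate request and the results concatenated.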
In the candidates’ example, we used the Python library urllib2 to send the data via HTTP POST. The complete code to evaluate a candidate is the following:
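urllib2 is a Python 2 module; in Python 3 its functionality lives in `urllib.request`. A minimal Python 3 sketch of the POST step under that assumption, reusing the endpoint URL given above. `build_post_request` and `classify_tweets` are hypothetical helper names, and the JSON `Content-Type` header is an assumption rather than a documented requirement.

```python
import json
import urllib.request

API_URL = "http://www.sentiment140.com/api/bulkClassifyJson"


def build_post_request(texts, query):
    """Prepare an HTTP POST with a JSON body for the bulk classification service."""
    body = json.dumps({
        "data": [{"text": t, "id": i, "query": query} for i, t in enumerate(texts)]
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def classify_tweets(texts, query, timeout=60):
    """Send the tweets and return the classified objects (each with "polarity")."""
    request = build_post_request(texts, query)
    # Stop waiting after `timeout` seconds, matching the service's 60-second window.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["data"]


# Example usage (performs a network request):
#   results = classify_tweets(["I would like to see Macri as president"], query="Macri")
#   print([item["polarity"] for item in results])
```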

Result

For a better estimate, we recommend running the program at least three times on different days and combining the results to get a more reliable percentage.
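A minimal sketch of combining several runs, assuming each run has already been reduced to counts per label. Here neutral tweets are left out of the percentages, in line with the earlier point that facts should be ignored; this is one possible choice, not necessarily the one used for the figures in this article. `combine_runs` is a hypothetical helper, and the counts are illustrative, not real data.

```python
from collections import Counter


def combine_runs(runs):
    """Add label counts from several runs and return positive/negative percentages.

    Neutral tweets are excluded because they carry no opinion.
    """
    total = Counter()
    for run in runs:
        total.update(run)
    opinionated = total["positive"] + total["negative"]
    if opinionated == 0:
        return {"positive": 0.0, "negative": 0.0}
    return {
        "positive": round(100 * total["positive"] / opinionated, 1),
        "negative": round(100 * total["negative"] / opinionated, 1),
    }


# Illustrative counts for three runs on different days (not real data).
runs = [
    {"positive": 600, "negative": 400, "neutral": 1500},
    {"positive": 550, "negative": 450, "neutral": 1500},
    {"positive": 650, "negative": 350, "neutral": 1500},
]
print(combine_runs(runs))  # {'positive': 60.0, 'negative': 40.0}
```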
In our example, the result showed the following percentages:
[Figure: candidates-statistics, sentiment percentages for each candidate]