In this part, we’re finally going to visualise our Twitter data. The process of capturing and storing the data is described in the previous parts of this series. If you’re only interested in the processed data set used in this part, you can get it through this link: Tweets – The Interview.
Originally, I wanted to use Power View in Excel, but later I decided to go with Tableau 9.1 in order to get some hands-on experience with it. I was pleasantly surprised by Tableau’s straightforwardness and overall great user experience.
Thanks to Tableau’s built-in connectors, connecting to Hive is simple. On the initial screen, click Hortonworks Hadoop Hive and fill in the connection values:

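If you’d like to sanity-check the connection outside Tableau, here’s a minimal sketch using PyHive from Python. The host, port, and username are assumptions based on a default Hortonworks sandbox; adjust them to your setup.

```python
# Minimal connectivity check against HiveServer2 on the sandbox.
# Host, port, and username below are assumptions for a default
# Hortonworks sandbox -- substitute your own connection values.
from pyhive import hive

conn = hive.Connection(
    host="sandbox.hortonworks.com",  # assumed sandbox hostname
    port=10000,                      # default HiveServer2 port
    username="hive",                 # assumed user
    database="default",
)

cursor = conn.cursor()
cursor.execute("SHOW TABLES")
print(cursor.fetchall())  # tweetsbi should appear in the list
cursor.close()
```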
On the next screen, choose the default schema and search for the tweetsbi table. We should then get a data preview:

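The same preview can be fetched with a plain query; here’s a sketch using PyHive and pandas. Only the tweetsbi table name comes from this series — the connection values are the same assumptions as above.

```python
# Pull the same ten-row preview Tableau shows in its data grid.
import pandas as pd
from pyhive import hive

# Assumed sandbox connection values, as in the sketch above.
conn = hive.Connection(host="sandbox.hortonworks.com", port=10000,
                       username="hive", database="default")

preview = pd.read_sql("SELECT * FROM tweetsbi LIMIT 10", conn)
print(preview)
```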
Clicking Sheet 1 in the bottom-left corner of the window takes us to the editor. Using drag and drop, you should be able to build a similar matrix easily. Beyond that, it’s only a matter of picking an appropriate visualisation from the Show Me menu and shifting dimensions and measures around.

I’ve played around a bit and, even though I’m not very experienced with Tableau, I was able to come up with the reports below in just under two hours.
First, I wanted to see the overall tweet activity during the period in which I was capturing the tweets. For this, I moved the tweets’ timestamp into the columns section and the number of tweets into measures.

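For reference, the aggregation behind this report boils down to a simple GROUP BY. The sketch below assumes the timestamp lives in a string column named ts formatted as yyyy-MM-dd HH:mm:ss — both the column name and the format are assumptions, so adapt them to the actual tweetsbi schema.

```python
# Tweet counts bucketed by hour -- the aggregation Tableau performs
# when the timestamp goes into columns and the record count into rows.
import pandas as pd
from pyhive import hive

conn = hive.Connection(host="sandbox.hortonworks.com", port=10000,
                       username="hive", database="default")

# substr(ts, 1, 13) keeps 'yyyy-MM-dd HH', i.e. one bucket per hour.
activity = pd.read_sql(
    """
    SELECT substr(ts, 1, 13) AS hour, COUNT(*) AS tweets
    FROM tweetsbi
    GROUP BY substr(ts, 1, 13)
    ORDER BY hour
    """,
    conn,
)
print(activity.head())
```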
In the previous part of this series, we attempted to determine the tweets’ sentiment. I was interested in the sentiment ratio grouped by the country the tweets originated from. I went for a map visualisation with dynamically sized pie charts.




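The underlying numbers for the pie charts can be computed directly, too. In the sketch below, the country and sentiment column names are assumptions about the tweetsbi schema.

```python
# Per-country sentiment shares -- the data behind the pie charts.
# The column names `country` and `sentiment` are assumptions.
import pandas as pd
from pyhive import hive

conn = hive.Connection(host="sandbox.hortonworks.com", port=10000,
                       username="hive", database="default")

counts = pd.read_sql(
    "SELECT country, sentiment, COUNT(*) AS tweets "
    "FROM tweetsbi GROUP BY country, sentiment",
    conn,
)

# One row per country, one column per sentiment, values as shares.
shares = counts.pivot_table(index="country", columns="sentiment",
                            values="tweets", fill_value=0)
shares = shares.div(shares.sum(axis=1), axis=0)
print(shares.head())
```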
I wonder how the results would change if I applied proper NLP techniques to the tweet messages. I think I will come back to this topic one day …
I heard Tim Cook say in an Apple keynote that iOS users use their devices more frequently than users of other platforms. Since we know which application each tweet originates from, we can determine the device’s operating system and try to confirm or disconfirm his statement; a sketch of that mapping follows the screenshots below.



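In Tableau this can be done with a calculated field; in code, the mapping might look like the sketch below. The source values shown (e.g. Twitter for iPhone) are the application names Twitter embeds in each tweet, but the matching rules are illustrative, not exhaustive.

```python
# Derive a device OS from a tweet's source application name.
# The matching rules below are illustrative, not exhaustive.
def source_to_os(source: str) -> str:
    s = source.lower()
    if "iphone" in s or "ipad" in s or "ios" in s:
        return "iOS"
    if "android" in s:
        return "Android"
    if "web" in s:
        return "Web"
    return "Other"

print(source_to_os("Twitter for iPhone"))   # iOS
print(source_to_os("Twitter for Android"))  # Android
```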
If we take just the US figures – Android 5,923 tweets; iOS 14,064; 40,524 observations in total – then a chi-squared test of the iOS and Android counts rejects an equal split by a wide margin, which supports Cook’s statement.
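Here is a minimal sketch of that test with SciPy. It makes the simplifying assumption that equal activity would produce a 50/50 split of the iOS and Android tweets — a more careful test would weight the expected counts by each platform’s US market share.

```python
# Chi-squared goodness-of-fit test on the US figures quoted above.
# Null hypothesis: iOS and Android produce tweets at the same rate,
# i.e. an even split of the 19,987 iOS + Android tweets. This
# deliberately ignores relative market share.
from scipy.stats import chisquare

observed = [14064, 5923]       # iOS, Android tweet counts (US)
chi2, p = chisquare(observed)  # expected defaults to an even split

print(f"chi2 = {chi2:.1f}, p = {p:.3g}")
# The tiny p-value rejects equal activity between the two platforms.
```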
Conclusion
In this series, we’ve gone through the whole process of capturing, integrating, and exploring data. For data capture, we used Flume and stored the data in HDFS on a Hadoop sandbox. Data integration was performed using Hive tables and views. And finally, we explored the data and gained insights using Tableau.