Anton Antonov
MathematicaVsR at GitHub
November, 2016
This R-Markdown notebook was made for the R-part of the MathematicaVsR project “Text analysis of Trump tweets”.
The project is based on the blog post [1], and this R-notebook uses the data from [1] and provides statistics extensions or alternatives. For conclusions over those statistics see [1].
Here are the libraries used in this R-notebook. In addition to those in [1], the libraries “vcd” and “arules” are used.
library(plyr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)
library(arules)
library(vcd)

We are not going to repeat the Twitter messages ingestion done in [1] – we are going to use the data frame ingestion result provided in [1].
load(url("http://varianceexplained.org/files/trump_tweets_df.rda"))
#load("./trump_tweets_df.rda")
# Note: the code below assumes the processed data frame `tweets` derived from
# `trump_tweets_df` as in [1], with the device tag in the column `source`.

This section demonstrates a way to derive word-device associations that is an alternative to the approach in [1]. The association rules learning algorithm Apriori is used through the package “arules”.
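As a reminder of what the support, confidence, and lift columns reported by apriori mean, here is a small sketch with made-up counts (the numbers n, nX, nY, nXY are illustrative, not taken from the data):

```r
# Standard Apriori rule-quality measures for a rule {X} => {Y},
# computed from made-up transaction counts (illustrative only):
n   <- 1390  # total number of transactions (tweets)
nX  <- 14    # transactions containing X
nY  <- 762   # transactions containing Y
nXY <- 14    # transactions containing both X and Y

support    <- nXY / n               # P(X and Y)
confidence <- nXY / nX              # estimate of P(Y | X)
lift       <- confidence / (nY / n) # confidence relative to the base rate of Y

round(c(support = support, confidence = confidence, lift = lift), 6)
```

A lift well above 1 means the left-hand-side word raises the probability of the device far beyond its overall frequency, which is what makes the rules below informative.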
First we split the tweet messages into bags of words (baskets).
sres <- strsplit( iconv(tweets$text),"\\s")
sres <- llply( sres, function(x) { x <- unique(x); x[nchar(x)>2] })

The package “arules” does not work directly with lists of lists (in this case, with a list of bags of words, or baskets). We have to derive a binary incidence matrix from the bags of words.
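Here is a small, self-contained illustration of that derivation; the two baskets below are made up, not actual tweets:

```r
# Two hypothetical baskets (bags of words)
baskets <- list(c("win", "big"), c("win", "media", "vote"))

# Long form: basket index paired with each word it contains
df <- data.frame(index = rep(seq_along(baskets), sapply(baskets, length)),
                 word  = unlist(baskets))

# Binary incidence matrix: rows are baskets, columns are words
ct <- xtabs(~ index + word, df)
ct["1", "win"]  # 1: basket 1 contains "win"
ct["2", "big"]  # 0: basket 2 does not contain "big"
```

The same idea is applied to the real data next, with the device tag added to each basket so that it shows up as just another column of the incidence matrix.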
Here we add the device tags to those bags of words and derive a long form of tweet-index and word pairs:
sresDF <- 
  ldply( 1:length(sres), function(i) {
    data.frame( index = i, word = c( tweets$source[i], sres[i][[1]]) )
  })

Next we find the contingency matrix for index vs. word:
wordsCT <- xtabs( ~ index + word, sresDF, sparse = TRUE)

At this point we can use the Apriori algorithm of the package:
rulesRes <- apriori( as.matrix(wordsCT), parameter = list(supp = 0.01, conf = 0.6, maxlen = 2, target = "rules"))

Apriori
Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target   ext
        0.6    0.1    1 none FALSE            TRUE       5    0.01      1      2  rules FALSE
Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE
Absolute minimum support count: 13 
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[6572 item(s), 1390 transaction(s)] done [0.00s].
sorting and recoding items ... [184 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2
Mining stopped (maxlen reached). Only patterns up to a length of 2 returned!
 done [0.00s].
writing ... [171 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

Here are the association rules for “Android” sorted by confidence in descending order:
inspect( subset( sort(rulesRes, by="confidence"), subset = rhs %in% "Android" & confidence > 0.78) )

     lhs                   rhs       support    confidence lift    
[1]  {A.M.}             => {Android} 0.01007194 1.0000000  1.824147
[2]  {@megynkelly}      => {Android} 0.01726619 1.0000000  1.824147
[3]  {@realDonaldTrump} => {Android} 0.08057554 0.9911504  1.808004
[4]  {Wow,}             => {Android} 0.01510791 0.9545455  1.741231
[5]  {time}             => {Android} 0.01366906 0.9500000  1.732940
[6]  {done}             => {Android} 0.01223022 0.9444444  1.722805
[7]  {over}             => {Android} 0.01079137 0.9375000  1.710138
[8]  {president}        => {Android} 0.01007194 0.9333333  1.702537
[9]  {because}          => {Android} 0.01870504 0.9285714  1.693851
[10] {@CNN}             => {Android} 0.01726619 0.9230769  1.683828
[11] {were}             => {Android} 0.01510791 0.9130435  1.665526
[12] {beat}             => {Android} 0.01366906 0.9047619  1.650419
[13] {U.S.}             => {Android} 0.01294964 0.9000000  1.641732
[14] {win}              => {Android} 0.01870504 0.8965517  1.635442
[15] {big}              => {Android} 0.01798561 0.8928571  1.628703
[16] {against}          => {Android} 0.01798561 0.8928571  1.628703
[17] {said}             => {Android} 0.02230216 0.8857143  1.615673
[18] {made}             => {Android} 0.01079137 0.8823529  1.609541
[19] {won}              => {Android} 0.01007194 0.8750000  1.596129
[20] {being}            => {Android} 0.01007194 0.8750000  1.596129
[21] {country}          => {Android} 0.01510791 0.8750000  1.596129
[22] {had}              => {Android} 0.01942446 0.8709677  1.588773
[23] {job}              => {Android} 0.01438849 0.8695652  1.586215
[24] {Republican}       => {Android} 0.02302158 0.8648649  1.577641
[25] {than}             => {Android} 0.02230216 0.8611111  1.570793
[26] {@nytimes}         => {Android} 0.01294964 0.8571429  1.563555
[27] {media}            => {Android} 0.02158273 0.8571429  1.563555
[28] {vote}             => {Android} 0.01654676 0.8518519  1.553903
[29] {You}              => {Android} 0.01223022 0.8500000  1.550525
[30] {more}             => {Android} 0.02446043 0.8500000  1.550525
[31] {jobs}             => {Android} 0.01079137 0.8333333  1.520122
[32] {but}              => {Android} 0.03165468 0.8301887  1.514386
[33] {would}            => {Android} 0.02733813 0.8260870  1.506904
[34] {very}             => {Android} 0.03381295 0.8245614  1.504121
[35] {America}          => {Android} 0.01007194 0.8235294  1.502239
[36] {got}              => {Android} 0.01007194 0.8235294  1.502239
[37] {ever}             => {Android} 0.01294964 0.8181818  1.492484
[38] {total}            => {Android} 0.01294964 0.8181818  1.492484
[39] {Sanders}          => {Android} 0.01582734 0.8148148  1.486342
[40] {totally}          => {Android} 0.01870504 0.8125000  1.482119
[41] {@FoxNews}         => {Android} 0.01798561 0.8064516  1.471086
[42] {Bernie}           => {Android} 0.02374101 0.8048780  1.468216
[43] {Trump}            => {Android} 0.04388489 0.8026316  1.464118
[44] {are}              => {Android} 0.06402878 0.8018018  1.462604
[45] {that}             => {Android} 0.08561151 0.7986577  1.456869
[46] {Ted}              => {Android} 0.02517986 0.7954545  1.451026
[47] {what}             => {Android} 0.01654676 0.7931034  1.446737
[48] {wants}            => {Android} 0.01079137 0.7894737  1.440116
[49] {just}             => {Android} 0.03237410 0.7894737  1.440116
[50] {much}             => {Android} 0.01582734 0.7857143  1.433258

And here are the association rules for “iPhone” sorted by confidence in descending order:
iphRules <- inspect( subset( sort(rulesRes, by="confidence"), subset = rhs %in% "iPhone" & support > 0.01) )

     lhs                         rhs      support    confidence lift    
[1]  {#TrumpPence16}          => {iPhone} 0.01007194 1.0000000  2.213376
[2]  {THANK}                  => {iPhone} 0.01223022 1.0000000  2.213376
[3]  {#ImWithYou}             => {iPhone} 0.01366906 1.0000000  2.213376
[4]  {#VoteTrump}             => {iPhone} 0.01582734 1.0000000  2.213376
[5]  {#AmericaFirst}          => {iPhone} 0.01942446 1.0000000  2.213376
[6]  {Join}                   => {iPhone} 0.02733813 1.0000000  2.213376
[7]  {#Trump2016}             => {iPhone} 0.12302158 0.9500000  2.102707
[8]  {#CrookedHillary}        => {iPhone} 0.01151079 0.9411765  2.083177
[9]  {soon!}                  => {iPhone} 0.01151079 0.9411765  2.083177
[10] {#MakeAmericaGreatAgain} => {iPhone} 0.06546763 0.9100000  2.014172
[11] {#MAGA}                  => {iPhone} 0.01151079 0.8888889  1.967445
[12] {Thank}                  => {iPhone} 0.12086331 0.7850467  1.737603
[13] {you}                    => {iPhone} 0.11151079 0.7142857  1.580983
[14] {tonight}                => {iPhone} 0.01366906 0.6785714  1.501934
[15] {AGAIN!}                 => {iPhone} 0.01798561 0.6410256  1.418831
[16] {New}                    => {iPhone} 0.02086331 0.6304348  1.395389
[17] {you!}                   => {iPhone} 0.02446043 0.6296296  1.393607
[18] {&}                  => {iPhone} 0.03669065 0.6219512  1.376612

Generally speaking, the package “arules” is somewhat awkward to use. For example, extracting the words of the column “lhs” requires some wrangling:
ws <- as.character(unclass(as.character(iphRules$lhs)))
gsub(pattern = "\\{|\\}", "", ws)

 [1] "#TrumpPence16"          "THANK"                  "#ImWithYou"             "#VoteTrump"             "#AmericaFirst"          "Join"                  
 [7] "#Trump2016"             "#CrookedHillary"        "soon!"                  "#MakeAmericaGreatAgain" "#MAGA"                  "Thank"                 
[13] "you"                    "tonight"                "AGAIN!"                 "New"                    "you!"                   "&"                 

References

[1] David Robinson, “Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half”, (2016), VarianceExplained.org.