I’m not following you: Twitter data and R

I wondered how many of the people that I follow on Twitter do not follow me back. A quick way to look at this is with R. OK, a really quick way is to give a 3rd party application access rights to your account to do this for you, but a) that isn’t safe, b) you can’t look at anyone else’s data, and c) this is quantixed – doing nerdy stuff like this is what I do. Now, the great thing about R is the availability of well-written packages to do useful stuff. I quickly found two packages twitteR and rtweet that are designed to harvest Twitter data. I went with rtweet and there were some great guides to setting up OAuth and getting going.

The code below set up my environment and pulled down lists of my followers and my “friends”. I’m looking at my main account and not the quantixed twitter account.


library(rtweet)
library(httpuv)
## setup your appname,api key and api secret
appname <- "whatever_name"
key <- "blah614h"
secret <- "blah614h"
## create token named "twitter_token"
twitter_token <- create_token(
app = appname,
consumer_key = key,
consumer_secret = secret)

clathrin_followers <- get_followers("clathrin", n = "all")
clathrin_followers_names <- lookup_users(clathrin_followers)
clathrin_friends <- get_friends("clathrin")
clathrin_friends_names <- lookup_users(clathrin_friends)

The terminology is that people that follow me are called Followers and people that I follow are called Friends. These are the terms used by Twitter’s API. I have almost 3000 followers and around 1200 friends.

This was a bit strange… I had fewer followers with data than actual followers. Same for friends: missing a few hundred in total. I extracted a list of the Twitter IDs that had no data and tried a few other ways to look them up. All failed. I assume that these are users who have deleted their account (and the Twitter ID stays reserved) or maybe they are suspended for some reason. Very strange.


## noticed something weird
## look at the twitter ids of followers and friends with no data
missing_followers <- setdiff(clathrin_followers$user_id,clathrin_followers_names$user_id)
missing_friends <- setdiff(clathrin_friends$user_id,clathrin_friends_names$user_id)

## find how many real followers/friends are in each set
aub <- union(clathrin_followers_names$user_id,clathrin_friends_names$user_id)
anb <- intersect(clathrin_followers_names$user_id,clathrin_friends_names$user_id)

## make an Euler plot to look at overlap
fit <- euler(c(
"Followers" = nrow(clathrin_followers_names) - length(anb),
"Friends" = nrow(clathrin_friends_names) - length(anb),
"Followers&Friends" = length(anb)))
plot(fit)

In the code above, I arranged in sets the “real Twitter users” who follow me or I follow them. There was an overlap of 882 users, leaving 288 Friends who don’t follow me back – boo hoo!

I next wanted to see who these people are, which is pretty straightforward.


## who are the people I follow who don't follow me back
bonly <- setdiff(clathrin_friends_names$user_id,anb)
no_follow_back <- lookup_users(bonly)

Looking at no_follow_back was interesting. There are a bunch of announcement accounts and people with huge follower counts that I wasn’t surprised do not follow me back. There are a few people on the list with whom I have interacted yet they don’t follow me, which is a bit odd. I guess they could have unfollowed me at some point in the past, but my guess is they were never following me in the first place. It used to be the case that you could only see tweets from people you followed, but the boundaries have blurred a lot in recent years. An intermediary only has to retweet something you have written for someone else to see it and you can then interact, without actually following each other. In fact, my own Twitter experience is mainly through lists, rather than my actual timeline. And to look at tweets in a list you don’t need to follow anyone on there. All of this led me to thinking: maybe other people (who follow me) are wondering why I don’t follow them back… I should look at what I am missing out on.

## who are the people who follow me but I don't follow back
aonly <- setdiff(clathrin_followers_names$user_id,anb)
no_friend_back <- lookup_users(aonly)
## save csvs with all user data for unreciprocated follows
write.csv(no_follow_back, file = "nfb.csv")
write.csv(no_friend_back, file = "nfb2.csv")

With this last bit of code, I was able to save a file for each subset of unreciprocated follows/friends. Again there were some interesting people on this list. I must’ve missed them following me and didn’t follow back.

I used these lists to prune my friends and to follow some interesting new people. The csv files contain the Twitter bio of all the accounts so it’s quick to go through and check who is who and who is worth following. Obviously you can search all of this content for keywords and things you are interested in.

So there you have it. This is my first “all R” post on quantixed – hope you liked it!

The post title is from “I’m Not Following You” the final track from the 1997 LP of the same name from Edwyn Collins.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: