Tag Archives: Twitter

I’m not following you: Twitter data and R

I wondered how many of the people that I follow on Twitter do not follow me back. A quick way to look at this is with R. OK, a really quick way is to give a third-party application access rights to your account to do this for you, but a) that isn’t safe, b) you can’t look at anyone else’s data, and c) this is quantixed – doing nerdy stuff like this is what I do. Now, the great thing about R is the availability of well-written packages to do useful stuff. I quickly found two packages, twitteR and rtweet, that are designed to harvest Twitter data. I went with rtweet and there were some great guides to setting up OAuth and getting going.

The code below sets up my environment and pulls down lists of my followers and my “friends”. I’m looking at my main account and not the quantixed Twitter account.


library(rtweet)
library(httpuv)
## set up your app name, API key and API secret
appname <- "whatever_name"
key <- "blah614h"
secret <- "blah614h"
## create token named "twitter_token"
twitter_token <- create_token(
  app = appname,
  consumer_key = key,
  consumer_secret = secret)

clathrin_followers <- get_followers("clathrin", n = "all")
clathrin_followers_names <- lookup_users(clathrin_followers)
clathrin_friends <- get_friends("clathrin")
clathrin_friends_names <- lookup_users(clathrin_friends)

The terminology is that people that follow me are called Followers and people that I follow are called Friends. These are the terms used by Twitter’s API. I have almost 3000 followers and around 1200 friends.

Something was a bit strange… the lookup returned data for fewer followers than I actually have. It was the same for friends, with a few hundred missing in total. I extracted a list of the Twitter IDs that had no data and tried a few other ways to look them up. All failed. I assume that these are users who have deleted their accounts (and the Twitter ID stays reserved) or maybe they are suspended for some reason. Very strange.


## noticed something weird
## look at the twitter ids of followers and friends with no data
missing_followers <- setdiff(clathrin_followers$user_id,clathrin_followers_names$user_id)
missing_friends <- setdiff(clathrin_friends$user_id,clathrin_friends_names$user_id)

## find how many real followers/friends are in each set
aub <- union(clathrin_followers_names$user_id,clathrin_friends_names$user_id)
anb <- intersect(clathrin_followers_names$user_id,clathrin_friends_names$user_id)

## make an Euler plot to look at overlap
## euler() comes from the eulerr package
library(eulerr)
fit <- euler(c(
  "Followers" = nrow(clathrin_followers_names) - length(anb),
  "Friends" = nrow(clathrin_friends_names) - length(anb),
  "Followers&Friends" = length(anb)))
plot(fit)

In the code above, I sorted the “real Twitter users” into sets: those who follow me and those I follow. There was an overlap of 882 users, leaving 288 Friends who don’t follow me back – boo hoo!
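To make the set arithmetic concrete, here is a minimal sketch with made-up IDs. The vectors `followers` and `friends` are hypothetical stand-ins for the `user_id` columns used above, not real Twitter data.

```r
## toy example with made-up IDs (not real Twitter data)
followers <- c("1", "2", "3", "4")
friends <- c("3", "4", "5")

## mutual follows: IDs present in both sets
both <- intersect(followers, friends)
## friends who don't follow back
friends_only <- setdiff(friends, followers)
## followers who are not followed back
followers_only <- setdiff(followers, friends)

length(both)
friends_only
followers_only
```

The same logic applied to the numbers in this post gives 882 mutual follows plus 288 unreciprocated friends, which is 1170 friends with data.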

I next wanted to see who these people are, which is pretty straightforward.


## who are the people I follow who don't follow me back
bonly <- setdiff(clathrin_friends_names$user_id,anb)
no_follow_back <- lookup_users(bonly)

Looking at no_follow_back was interesting. There are a bunch of announcement accounts and people with huge follower counts, so I wasn’t surprised that they do not follow me back. There are a few people on the list with whom I have interacted yet they don’t follow me, which is a bit odd. I guess they could have unfollowed me at some point in the past, but my guess is they were never following me in the first place. It used to be the case that you could only see tweets from people you followed, but the boundaries have blurred a lot in recent years. An intermediary only has to retweet something you have written for someone else to see it and you can then interact, without actually following each other. In fact, my own Twitter experience is mainly through lists, rather than my actual timeline. And to look at tweets in a list you don’t need to follow anyone on there. All of this led me to thinking: maybe other people (who follow me) are wondering why I don’t follow them back… I should look at what I am missing out on.

## who are the people who follow me but I don't follow back
aonly <- setdiff(clathrin_followers_names$user_id,anb)
no_friend_back <- lookup_users(aonly)
## save csvs with all user data for unreciprocated follows
write.csv(no_follow_back, file = "nfb.csv")
write.csv(no_friend_back, file = "nfb2.csv")

With this last bit of code, I was able to save a file for each subset of unreciprocated follows/friends. Again there were some interesting people on this list. I must’ve missed them following me and didn’t follow back.

I used these lists to prune my friends and to follow some interesting new people. The csv files contain the Twitter bio of all the accounts so it’s quick to go through and check who is who and who is worth following. Obviously you can search all of this content for keywords and things you are interested in.
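A keyword search of the bios could look something like the sketch below. It uses a made-up data frame in place of the saved csv, and it assumes the bio lives in a column called `description` and the handle in `screen_name`; check `names()` on your own data, because these column names are an assumption here.

```r
## toy data frame standing in for the saved csv (made-up accounts)
no_friend_back <- data.frame(
  screen_name = c("cellbio_lab", "football_fan", "microscopy_news"),
  description = c("We study membrane traffic",
                  "Goals, goals, goals",
                  "Imaging and Microscopy updates"),
  stringsAsFactors = FALSE
)

## keep rows whose bio mentions any keyword (case-insensitive)
keywords <- c("membrane", "microscopy")
pattern <- paste(keywords, collapse = "|")
hits <- no_friend_back[grepl(pattern, no_friend_back$description,
                             ignore.case = TRUE), ]
hits$screen_name
```

Joining the keywords with `|` lets a single `grepl()` call match any of them, so the keyword list can grow without changing the rest of the code.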

So there you have it. This is my first “all R” post on quantixed – hope you liked it!

The post title is from “I’m Not Following You” the final track from the 1997 LP of the same name from Edwyn Collins.


You Know My Name (Look Up The Number)

What is your h-index on Twitter?

This thought crossed my mind yesterday when I saw a tweet that was tagged #academicinsults

It occurred to me that a Twitter account is a kind of micro-publishing platform. So what would “publication metrics” look like for Twitter? Twitter makes analytics available, so they can easily be crunched. The main metrics are impressions and engagements per tweet. As I understand it, impressions are the number of times your tweet is served up to people in their feed (boosted by retweets). Engagements are when somebody clicks on the tweet (either a link or to see the thread or whatever). In publication terms, impressions would equate to people downloading your paper and engagements mean that they did something with it, like cite it. This means that an “h-index” for engagements can be calculated with these data.

For those that don’t know, the h-index for a scientist means that he/she has h papers that have been cited h or more times. The Twitter version would be a tweeter that has h tweets that were engaged with h or more times. My data is shown here:

[Figure: TwitterAnalytics]

My Twitter h-index is currently 36. I have 36 tweets that have been engaged with 36 or more times.

So, this is a lot higher than my actual h-index, but obviously there are differences. Papers accrue citations as time goes by, but the information flow on Twitter is so fast that tweets don’t accumulate engagement over time. In that sense, the Twitter h-index is less sensitive to the time a user has been active on Twitter, versus the real h-index which is strongly affected by age of the scientist. Other differences include the fact that I have “published” thousands of tweets and only tens of papers. Also, whether or not more people read my tweets compared to my papers… This is not something I want to think too much about, but it would affect how many engagements it is possible to achieve.

The other thing I looked at was whether replying to somebody actually means more engagement. This would skew the Twitter h-index. I filtered tweets that started with an @ and found that this restricts who sees the tweet, but doesn’t necessarily mean more engagement. Replies make up a very small fraction of the h tweets.
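The plots in this post were made in IgorPro, but the reply filter could be sketched in R roughly as follows. The data frame and its column names (`text`, `engagements`) are made up for illustration; the real analytics export may label its columns differently.

```r
## toy data standing in for the analytics csv (made-up numbers)
tweets <- data.frame(
  text = c("@someone good point", "New post on the blog",
           "@other thanks!", "Paper out today"),
  engagements = c(3, 40, 1, 55),
  stringsAsFactors = FALSE
)

## a reply is taken to be any tweet whose text starts with "@"
is_reply <- grepl("^@", tweets$text)

## fraction of tweets that are replies
mean(is_reply)
## median engagements for non-replies (FALSE) and replies (TRUE)
reply_medians <- tapply(tweets$engagements, is_reply, median)
reply_medians
```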

I’ll leave it to somebody else to calculate the Impact Factor of Twitter. I suspect it is very low, given the sheer volume of tweets.

Please note this post is just for fun. Normal service will (probably) resume in the next post.

Edit: As pointed out in the comments this post is short on “Materials and Methods”. If you want to calculate your own Twitter h-index, go here. When logged in to Twitter, the analytics page should present your data (it may take some time to populate this page after you first view it). A csv can be downloaded from the button on the top-right of the page. I imported this into IgorPro (as always) to generate the plots. The engagements data need to be sorted in descending order and then the h-index can be found by comparing the numbers with their ranked position.
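For anyone who prefers R to IgorPro, the sort-and-compare step described above can be sketched as follows. The function name `h_index` and the example numbers are made up for illustration; with a real export you would pass in the engagements column from the downloaded csv.

```r
## h-index: largest h such that h items each have a count of h or more
h_index <- function(counts) {
  ## drop missing values and sort in descending order
  counts <- sort(counts[!is.na(counts)], decreasing = TRUE)
  ## count the ranks at which the sorted value is at least the rank
  sum(counts >= seq_along(counts))
}

## made-up engagement counts
engagements <- c(10, 8, 5, 4, 3, 0)
h_index(engagements)
```

Because the counts are sorted in descending order while the ranks increase, the comparison is TRUE for the first h positions and FALSE afterwards, so summing it gives h directly. For the made-up numbers above the result is 4.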

The post title is from the quirky B-side to the Let It Be single by The Beatles.