Twitter Archive

Visualization of the Twitter archive including spam.

Visualization of the Twitter archive without spam.

Thank you to everyone who made the 2013 ACRL/NY Symposium a success! We are especially proud that we had a lively Twitter conversation going on as well – and that this Twitter conversation continues! To keep a record of this Twitter activity around #acrlny13 (the event’s hashtag), we are keeping a Twitter Archive of the event. This archive began to collect tweets with the #acrlny13 hashtag the Monday before the event, and it will continue to collect these tweets until one week after the event (longer, if the conversations are still going). Additionally, we have multiple visualization sites of this archive available for you to explore – see the links above.

Why multiple visualization sites, you might ask? Because, thanks to everyone’s interest and support, our event hashtag #acrlny13 was a trending hashtag for nearly 8 hours and so was featured on various Twitter sites (including the Twitter homepage). We nearly topped 200 tweets between 11 am and noon on Friday! As of right now, we have collected 1694 tweets from the event (1580 with spam cleared out) from 222 tweeters, with 504 retweets. Wow! Due to all of this popularity, however, our hashtag became a target for Twitter spammers.

To deal with the spam, we took a multi-faceted approach. Many of the top event tweeters were playing a game of ‘whack-a-mole’ by reporting spam accounts (thank you to @jvinopal for the awesome ‘whack-a-mole’ analogy). This helped to block a lot of these spam accounts from our feed, thanks in part to the large number of people reporting spam; for my part, I was reporting and blocking spammers from 4 different accounts to help speed this along. For the Twitter archive, we also took some steps in the background to keep the visualization clear of spammers. First, we tried filters that blocked tweets containing certain terms (e.g., ‘Paul Walker’, ‘Nelson Mandela’, ‘Christmas guide’). This had limited success. We also tried limiting which tweets were kept in the archive by how many followers the sender had (assuming, for example, that spammers would not have more than 20 followers). Again, this had limited success. Together with the ‘whack-a-mole’, these measures brought spam levels down, but did not eliminate spam entirely.
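The two background filters described above (a term blocklist and a follower-count cutoff) can be sketched in Python as follows. This is a hypothetical illustration, not the code actually used for the archive: the `Tweet` fields and the `looks_like_spam` and `filter_tweets` helpers are assumptions, while the blocked terms and the 20-follower figure come from the examples in this post.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical field names, not the actual archive spreadsheet's column headers.
    from_user: str
    text: str
    followers: int


# Example terms from the post; a real blocklist would keep growing as spam changes.
BLOCKED_TERMS = ("paul walker", "nelson mandela", "christmas guide")
# Assumed cutoff: accounts with 20 or fewer followers are treated as likely spam.
MAX_SPAM_FOLLOWERS = 20


def looks_like_spam(tweet: Tweet) -> bool:
    """Flag a tweet that mentions a blocked term or comes from a low-follower account."""
    text = tweet.text.lower()
    if any(term in text for term in BLOCKED_TERMS):
        return True
    return tweet.followers <= MAX_SPAM_FOLLOWERS


def filter_tweets(tweets: list[Tweet]) -> list[Tweet]:
    """Keep only the tweets that pass both filters."""
    return [t for t in tweets if not looks_like_spam(t)]


tweets = [
    Tweet("librarian1", "Great panel at #acrlny13", 350),
    Tweet("spambot", "Paul Walker news! #acrlny13", 5000),
    Tweet("newcomer", "Hello from #acrlny13", 12),
]
kept = filter_tweets(tweets)
print([t.from_user for t in kept])  # ['librarian1']
```

The example also hints at why both filters had only limited success: the legitimate newcomer with 12 followers is dropped along with the spam account, while a spammer using unlisted wording and a padded follower count would pass both checks.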

Finally, it came down to creating a ‘silo’ for spammers in our actual archive. The archive is based on code that pulls tweets with certain hashtags into a spreadsheet in my Google Drive, which is then used for visualization. To silo the spam, I created a copy of our live archive spreadsheet, removed the spam from that copy, and then used it as the basis of our new Twitter archive visualizations. This does not keep spammers off our hashtag, but it does mean that the visualizations better represent the actual conversation around the event. None of this is perfect, however – and we continue to work on it.
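The silo step amounts to filtering a copy while leaving the original intact. Here is a minimal Python sketch, assuming each archive row is a dict with a hypothetical `from_user` key and that spam has been identified as a hand-curated set of account names. The post describes removing spam by hand in a spreadsheet, so `make_clean_copy` and `spam_accounts` are illustrative stand-ins, not the actual workflow.

```python
import copy


def make_clean_copy(live_rows, spam_accounts):
    """Return a spam-free copy of the archive rows, leaving the live rows untouched."""
    # The 'silo': work on a deep copy so edits never touch the live archive.
    snapshot = copy.deepcopy(live_rows)
    return [row for row in snapshot if row["from_user"] not in spam_accounts]


live = [
    {"from_user": "librarian1", "text": "Great panel at #acrlny13"},
    {"from_user": "spambot", "text": "Christmas guide deals #acrlny13"},
    {"from_user": "librarian2", "text": "RT @librarian1: Great panel at #acrlny13"},
]
# Hypothetical hand-curated set of accounts judged to be spam.
spam_accounts = {"spambot"}

clean = make_clean_copy(live, spam_accounts)
print(len(live), len(clean))  # 3 2
```

Because the live rows are never modified, all of the spam stays in the archive for later study, while the visualizations draw only from the clean copy. With the figures quoted earlier, the difference between the two archives (1694 - 1580 = 114 tweets) is the spam set aside.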

Why not delete the spam entirely? Because, in the archive, the spam tweets are evidence of the national, maybe even international, popularity of our hashtag. They may also prove of interest to future researchers: we will be storing our archive in Academic Commons, the institutional repository of Columbia University, where researchers will be able to access the data. One day they may be interested in what terms were used for Twitter spam – who knows. Finally, I hope to play with this data in tandem with other Twitter archivers to find better ways to block spam from certain feeds and visualizations, so that for the next Symposium we can fence off spammers from our event hashtag more quickly. It may seem odd to archive something that, during the event, we were trying to get rid of. But the chance to play with data containing the spammers that survived our onslaught is too good to miss.

So please continue to join the conversation, and thank you again to everyone who took part. If you have questions or would like to get involved, visit the ACRL/NY homepage.

– Christina Harlow, @cm_harlow, cmh2166@columbia.edu

Thank you to Martin Hawksey for the free and open-source TAGS (Twitter Archiving Google Spreadsheet) work that underpins this Twitter archive and visualization.
