Monday, March 7, 2016

Building a Database of User-Marked Irony

One thing I want to do is put together a database of statements and situations that some people deem "ironic".

One aspect of this involves assigning numeric values to various attributes often associated with irony; my first steps on this are described in this blog post. That blog post also mentions my site https://learnforeverlearn.com/ironycat/ where you can submit your own characterizations of ironic images or situations.

Another strategy is to gather up things that people have already marked as "ironic" in some way. Here, I don't care if the dictionary says it is ironic. Rather, I am interested in things that people think are ironic.

I implemented a simple node.js app that collects the past week's worth of tweets containing certain terms. A run takes about 20-30 minutes for a given search term (calls have to be throttled because of the Twitter API's rate limits; more details on the approach are at the bottom of this post).

The basic idea here is to first get tweets from about seven days ago (seven days back is the documented limit for the Twitter API search), note the maximum tweet id returned, and then repeatedly call the Twitter API over sequential tweet id ranges of a constant width. As far as I could tell, a tweet id width of 10000000000000 seems to cover approximately an hour.
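To make the id-range idea concrete, here is a minimal sketch in Python (not the node.js app itself). It assumes tweet ids follow Twitter's published "snowflake" layout, in which the bits above the low 22 hold milliseconds since a fixed epoch; the function names `tweet_id_to_datetime`, `id_width_to_minutes` and `id_windows` are made up for illustration. Under that assumption, a width of 10000000000000 works out to roughly 40 minutes, which is in the same ballpark as the hour estimated above.

```python
from datetime import datetime, timezone

# Assumption: tweet ids use the snowflake layout (timestamp in the high bits).
TWITTER_EPOCH_MS = 1288834974657  # snowflake epoch, in ms since the Unix epoch
TIMESTAMP_SHIFT = 22              # low 22 bits hold worker and sequence numbers


def tweet_id_to_datetime(tweet_id: int) -> datetime:
    """Approximate creation time (UTC) encoded in a snowflake tweet id."""
    ms = (tweet_id >> TIMESTAMP_SHIFT) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def id_width_to_minutes(width: int) -> float:
    """How many minutes a range of tweet ids of the given width spans."""
    return (width >> TIMESTAMP_SHIFT) / 1000 / 60


def id_windows(start_id: int, end_id: int, width: int = 10_000_000_000_000):
    """Yield (since_id, max_id) pairs that tile the id range (start_id, end_id].

    since_id is exclusive and max_id is inclusive in the search API, so
    consecutive windows neither overlap nor leave gaps.
    """
    low = start_id
    while low < end_id:
        high = min(low + width, end_id)
        yield low, high
        low = high


if __name__ == "__main__":
    print(tweet_id_to_datetime(703978717073293313))
    print(round(id_width_to_minutes(10_000_000_000_000), 1), "minutes per window")
```

Each pair from `id_windows` would be passed as the "since_id" and "max_id" of one search, starting from the maximum id noted in the first step.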

I did this as separate runs for the following search terms for the previous week:

  • "the irony of" (~4800 tweets, ~700/day)
  • "isn't it ironic" (~1200 tweets, ~170/day)
  • #ironic (~1200 tweets, ~170/day)
  • ironically (~14000 tweets, ~2000/day)

There is much playing to do with these data. Just quickly looking through the list of tweets obtained is fascinating. For example, it's obvious that many of the tweets that are user-marked as ironic are examples of hypocrisy - maybe this is because it is an election year. It would be interesting to better quantify just what that fraction is, among other things.

Some of the samples of "self-marked irony" are included below.

Got a certificate this morning for Hugo completing his puppy training. And he's ripped it up. 😩#ironic

(tweet id 703978717073293313)

The irony of my lack of attention when trying to do some reading about attention

(tweet id 704317070054465537)

the irony of life

(there seem to be lots of these; tweet id 704314264660475905)

the irony of walking out of the gym and lighting a smoke

(tweet id 704450646439763968)

The irony of a 9am lecture on attitudes resulting in a very negative attitude

(tweet id 704604911007305729)

The irony of watching a documentary about why I should cook as I eat a frozen pizza.

(tweet id 706335709913866241)

Oh the irony of tweets

(tweet id 704481709497028608)

I'm bored waiting to board... #ironic?

(tweet id 705278919931461633)

Tried to send a message saying "Ah, the perils of modern life." But no, my phone had to say "the peril sofa" #Ironic

(tweet id 704684867133394944)

Needed a spoon for my yogurt, only had knives. Isn't it #ironic @Alanis #donchathink #irony

(tweet id 704666654244888576; btw, lots of Morissette shoutouts with the "isn't it ironic" search)

Obviously, one can also connect to the Twitter streaming API to collect new tweets containing these terms as they are posted (and I need to do that), but the method used here could still be used to fill gaps once that is set up.

Notes on the Twitter API Search

The reference for this information is https://dev.twitter.com/rest/reference/get/search/tweets .

  • You can make up to 180 calls in a 15-minute window, which averages out to at most one call every 5 seconds
  • The most tweets you can get in a single call to the API is 100
  • The farthest back you can go is seven days
  • You can use the "until" parameter to get tweets created before the given date, where the date should be formatted as "YYYY-MM-DD"
  • You can specify both a minimum ("since_id") and maximum ("max_id") tweet id for the search.
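Put together, these limits suggest a throttled paging loop along the following lines. This is a rough Python sketch, not the node.js code I ran: `fetch` is a hypothetical callable that you would supply to perform the actual authenticated request and return a list of tweets (each a dict with an "id"), newest first. The names `search_params` and `collect_window` are also made up for illustration; only "q", "count", "since_id" and "max_id" are parameters of the search call.

```python
import time
from typing import Callable, Dict, List

CALLS_PER_WINDOW = 180
WINDOW_SECONDS = 15 * 60
MIN_INTERVAL = WINDOW_SECONDS / CALLS_PER_WINDOW  # 5 seconds between calls
PAGE_SIZE = 100                                   # most tweets per call


def search_params(query: str, since_id: int, max_id: int) -> Dict:
    """Query parameters for one search call over the id range (since_id, max_id]."""
    return {"q": query, "count": PAGE_SIZE, "since_id": since_id, "max_id": max_id}


def collect_window(
    fetch: Callable[[Dict], List[Dict]],  # hypothetical: performs the real request
    query: str,
    since_id: int,
    max_id: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Page backwards through one id window, pausing after every call."""
    tweets: List[Dict] = []
    while max_id > since_id:
        page = fetch(search_params(query, since_id, max_id))
        sleep(MIN_INTERVAL)  # stay under 180 calls per 15 minutes
        if not page:
            break
        tweets.extend(page)
        # Next page: everything strictly older than the oldest tweet seen so far.
        max_id = min(t["id"] for t in page) - 1
    return tweets
```

Passing `sleep` in as an argument is just a convenience so that the loop can be tried out against a fake `fetch` without actually waiting 5 seconds per call.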