- Instagram’s massive surge in April is likely due to:
  - release of their Android app
  - publicity around the Facebook acquisition
- Twitpic, Yfrog and Lockerz were already in decline before Instagram’s surge and are flattening out
- Pinterest (very slowly) increased its share from 1% to 2% for the year to date
The Twitter “spritzer” stream, a random sample of approximately 1% of the full firehose, is freely available, and I have been collecting it since the new year. Uncertain as to what I wanted to do with the data, I simply wrote the raw JSON stream into ~1GB files, compressed them and stored them in Amazon S3.
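A minimal sketch of the "write the raw stream into files and compress them" step might look like the following. It assumes the stream is already available as an iterable of JSON lines; the function name `write_chunks`, the `spritzer-NNNNN.json.gz` file naming and the choice to gzip while writing (rather than compressing afterwards) are illustrative assumptions, and the upload to S3 is left out.

```python
import gzip
import os


def write_chunks(lines, out_dir, max_bytes=1_000_000_000):
    """Write raw JSON lines into gzip files of roughly max_bytes (uncompressed) each.

    Hypothetical helper: names and file layout are assumptions, not the
    collector actually used for this post.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    written = 0
    try:
        for line in lines:
            # Start a new file for the first line, or once the current one is full.
            if out is None or written >= max_bytes:
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"spritzer-{len(paths):05d}.json.gz")
                out = gzip.open(path, "wt", encoding="utf-8")
                paths.append(path)
                written = 0
            record = line.rstrip("\n") + "\n"
            out.write(record)
            written += len(record.encode("utf-8"))
    finally:
        if out is not None:
            out.close()
    return paths
```

Keeping one tweet per line means each archive can later be processed line by line without loading a whole file into memory.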
As of this morning I have 715 million tweets, weighing in at around 1.6TB of uncompressed JSON data (270GB compressed).
I finally came up with something I wanted from the data: I was interested in extracting links from tweets for a little side project. As a side effect, the market share of social picture sharing services looked really interesting, so I wanted to share! :o)
To extract the data I wrote a simple Elastic Map/Reduce (EMR) job to churn through the compressed JSON archives. For any tweet that featured a link, the job parsed out the host name and counted it in date buckets.
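The core of that job can be sketched in plain Python, outside of EMR. This is only an illustration under some assumptions: each line is one tweet in the classic streaming JSON format (a `created_at` timestamp and links under `entities.urls[].expanded_url`), buckets are per day, and the function names `host_date_pairs` and `count_hosts` are made up for the example.

```python
import json
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse

# Assumed timestamp format, e.g. "Wed Apr 11 12:34:56 +0000 2012".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def host_date_pairs(line):
    """Map step: yield (day, host) for every link found in one raw JSON line."""
    try:
        tweet = json.loads(line)
    except ValueError:
        return  # skip malformed lines
    if not isinstance(tweet, dict) or "created_at" not in tweet:
        return  # skip non-tweet messages such as delete notices
    day = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT).date().isoformat()
    urls = (tweet.get("entities") or {}).get("urls") or []
    for url in urls:
        link = url.get("expanded_url") or url.get("url")
        if not link:
            continue
        try:
            host = urlparse(link).hostname
        except ValueError:
            continue  # skip links that cannot be parsed
        if host:
            yield day, host.lower()


def count_hosts(lines):
    """Reduce step: count occurrences of each (day, host) pair."""
    counts = Counter()
    for line in lines:
        counts.update(host_date_pairs(line))
    return counts
```

In a real Map/Reduce job the two functions would run as separate mapper and reducer stages; here they are simply chained in one process.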
Lastly, I filtered the list for domains I knew to be social picture sharing services. If I have missed any, please shout about it in the comments below. In the case of Yfrog, counts from both yfrog.com and yfrog.us were aggregated. Facebook was left out because it was non-trivial to disambiguate what was being linked to (status, pics, videos, apps etc.).
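The filtering and aggregation step could look something like this sketch. Only yfrog.com and yfrog.us are host names taken from the text above; the other entries in `SERVICES` are assumed examples, and computing share relative to the filtered services only is also an assumption.

```python
from collections import defaultdict

# Host name -> service. Only the two Yfrog hosts come from the post;
# the remaining host names are assumptions for illustration.
SERVICES = {
    "yfrog.com": "Yfrog",
    "yfrog.us": "Yfrog",
    "twitpic.com": "Twitpic",
    "instagr.am": "Instagram",
    "lockerz.com": "Lockerz",
    "pinterest.com": "Pinterest",
}


def share_by_service(host_counts):
    """Keep only known picture-sharing hosts and return each service's share."""
    totals = defaultdict(int)
    for host, n in host_counts.items():
        service = SERVICES.get(host)
        if service is not None:
            totals[service] += n  # yfrog.com and yfrog.us land in one bucket
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {service: n / grand_total for service, n in totals.items()}
```

Running this once per date bucket gives the share of each service over time.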
If you are interested in any further information that I can glean from this data, please let me know. I already have a bunch more charts from this dataset that I’ll post about later this week.