Tag Archives: Twitter

Digging through what Twitter knows about me

I joined Twitter on February 21, 2007, at exactly 15:14:48, and I created my account via the web interface. As you can see, my first tweet was pretty mundane!

I remember discussing this exciting cool “new Web 2.0 site” with Kim Plowright @mildlydiverting in Roo’s office in Hursley a couple of days before, and before long he, Ian and I were all trying this new newness out. It was just before the 2007 SXSWi, where Twitter really started to get on the radar of the geekerati.

But wait a moment! It’s impossible to pull back more than just over the last 3,000 tweets using the API, so how was I able to get all the way back to 5 years ago and display that tweet when I’ve got over 33,000 of them to my name?

It’s a relatively little-known fact that you can ask Twitter to disclose everything they hold associated with your account – and they will (at least, in certain jurisdictions – I’m not sure whether they will do this for every single user but in the EU they are legally bound to do so). I learned about this recently after reading Anne Helmond’s blog entry on the subject, and decided to follow the process through. I first contacted Twitter on April 24, and a few days later faxed (!) them my identity documentation, most of which was “redacted” by me :-) Yesterday, May 11, a very large zip file arrived via email.

I say very large, but actually it was smaller than the information dump that Anne received. Her tweets were delivered as 50MB of files, but mine came in nearer to 9MB zipped – 17MB unzipped. I’d expected a gigantic amount of data in relation to my tweets, but it seems as though they have recently revised their process and now only provide the basic metadata about each one rather than a whole JSON dump.

So, what do you get for your trouble? Here’s the list of contents, as outlined by Twitter’s legal department in their email to me.

- USERNAME-account.txt: Basic information about your Twitter account.
- USERNAME-email-address-history.txt: Any records of changes of the email address on file for your Twitter account.
- USERNAME-tweets.txt: Tweets of your Twitter account.
- USERNAME-favorites.txt: Favorites of your Twitter account.
- USERNAME-dms.txt: Direct messages of your Twitter account.
- USERNAME-contacts.txt: Any contacts imported by your Twitter account.
- USERNAME-following.txt: Accounts followed by your Twitter account.
- USERNAME-followers.txt: Accounts that follow your Twitter account.
- USERNAME-lists_created.txt: Any lists created by your Twitter account.
- USERNAME-lists_subscribed.txt: Any lists subscribed to by your Twitter account.
- USERNAME-lists-member.txt: Any public lists that include your Twitter account.
- USERNAME-saved-searches.txt: Any searches saved by your Twitter account.
- USERNAME-ip.txt: Logins to your Twitter account and associated IP addresses.
- USERNAME-devices.txt: Any records of a mobile device that you registered to your Twitter account.
- USERNAME-facebook-connected.txt: Any records of a Facebook account connected to your Twitter account.
- USERNAME-screen-name-changes.txt: Any records of changes to your Twitter username.
- USERNAME-media.zip: Images uploaded using Twitter’s photo hosting service (attached only if your account has such images).
- other-sources.txt: Links and authenticated API calls that provide information about your Twitter account in real time.

Let’s dig a bit more deeply into just a few of these items; there’s no need to pick everything to pieces.

The “tracking data” is contained in andypiper-devices.txt and andypiper-ipaudit.txt – interesting. The devices file essentially contains information on my phone, presumably for the SMS feature. They know my number and the carrier. The IP address list tracks back to the start of March, so they have 2 months of data on which IPs have been used to access my account. I’ve yet to subject that to much scrutiny to check where those are located; that’s another script I need to write.

I took a look at andypiper-contacts.txt and was astonished to find out how much of my contact data Twitter’s friend finder and mobile apps had slurped up. I mean, I don’t even have all of this in my address book… given the fact that the information contained the sender email addresses for various online retailer newsletters, I’m guessing that Google’s API (I’m a Gmail user) probably coughed up not just my defined contact list, but also all of the email addresses from anyone I’d ever heard from, ever.

Fortunately, there’s a way to remove this information permanently, which Anne has written about. I went ahead and did that, and then Twitter warned me that the Who To Follow suggestions might not be so relevant. That’s OK because I don’t use that feature anyway – and in practice, I’ve noticed no difference in the past 24 hours!

I use DMs a lot for quick communication, particularly with colleagues (it was a pretty reliable way of contacting @andysc when I needed him at IBM!). That’s reflected in the size of andypiper-dms.txt, which is also a scary reminder: I used to delete DMs, but since Twitter now makes it harder to get to and delete them, I’ve stopped, and there’s a lot of private data I wish I’d scrubbed.

Taking a peek at the early tweets in andypiper-tweets.txt, I’m trying to remember when the @reply syntax was formalised and when Twitter themselves started creating links to the other person’s profile. Many of my early tweets refer to @roo and @epred and I don’t think they ever went by those handles. 5 years is a long time.

I mentioned that the format used to deliver the data appears to have changed since Anne made her request. She got a file containing a JSON dump of each tweet including metadata like retweet information, in_reply_to, geo, and so on. By comparison, I now simply have creation info, status ID (the magic that lets you get back to the tweets via the web UI), and the text itself:

********************
user_id: 786491
created_at: Wed Feb 21 15:43:54 +0000 2007
created_via: web
status_id: 5623961
text: overheating in an office with no 
comfort cooling or aircon. About to drink water.

It’s a real shame that they have taken this approach, as it means the data is now far more cumbersome to parse and work with. However, using some shell scripts I did some simple slicing-and-dicing because I was curious how my use of Twitter had grown over time. Here’s a chart showing the numbers of tweets I posted per year (2012 is a “to date” figure of course). It looks like it was slow growth initially but last year I suddenly nearly doubled my output.
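For anyone wanting to try the same slicing, here is a minimal sketch in Python rather than the shell scripts I actually used. It assumes every record looks like the sample above: a row of asterisks, then `key: value` lines, with the tweet text possibly wrapping onto further lines. The function names `parse_tweets` and `tweets_per_year` are made up for this illustration, and the year is simply taken as the last token of `created_at`.

```python
from collections import Counter

# Field names taken from the sample record above. Any other line is treated
# as a continuation of the previous field (tweet text can wrap over lines).
FIELDS = ("user_id", "created_at", "created_via", "status_id", "text")

# The sample record from this post, used as test input.
SAMPLE = """\
********************
user_id: 786491
created_at: Wed Feb 21 15:43:54 +0000 2007
created_via: web
status_id: 5623961
text: overheating in an office with no 
comfort cooling or aircon. About to drink water.
"""


def parse_tweets(raw):
    """Yield one dict per record from the asterisk-separated dump."""
    record, last_key = {}, None
    for line in raw.splitlines():
        stripped = line.strip()
        # Assumption: a line of five or more asterisks starts a new record.
        if len(stripped) >= 5 and set(stripped) == {"*"}:
            if record:
                yield record
            record, last_key = {}, None
            continue
        key, sep, value = line.partition(":")
        if sep and key in FIELDS and key not in record:
            record[key] = value.strip()
            last_key = key
        elif last_key and stripped:
            # Continuation line: append it to the field we saw last.
            record[last_key] = (record[last_key] + " " + stripped).strip()
    if record:
        yield record


def tweets_per_year(records):
    """Count records by the year at the end of the created_at field."""
    return Counter(
        r["created_at"].split()[-1] for r in records if "created_at" in r
    )


if __name__ == "__main__":
    print(tweets_per_year(parse_tweets(SAMPLE)))  # Counter({'2007': 1})
```

To run it over the real file, read the contents of andypiper-tweets.txt into a string and pass that to `parse_tweets` in place of `SAMPLE`. Counting `created_via` instead of the year would give a breakdown by client application in the same way.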

Still considering what other analysis I’d like to do. I can chart out the client applications I’ve used, or make a word cloud showing how my conversational topics have changed over time… now that all of the information is mine, that is. It is just a shame I have to do so much manual munging of the output beforehand.

Oh, and the email I received from Twitter Legal also said:

No records were found of any disclosure to law enforcement of information about your Twitter account.

So, that’s alright then…

Why did I do this? Firstly, because I believe in the Open Web and ownership of my own data. Secondly, because I hope that I’ll now be able to archive this personal history and make it searchable via a tool like ThinkUp (which I’ve been running for a while now, but not for the whole 5 years). Lastly… no, not “because I could”… well OK at least partly because I could… because I believe that companies like Twitter, Facebook, Google and others should be fully transparent with their users about the data they hold, and that going through this currently-slightly-painful procedure will encourage Twitter to put in place formal tools to provide this level of access to everyone in a frictionless manner.

If you’ll excuse me, I’m off to dig around some more…

Several weeks ago – in Lotus

A very quick, and very belated, post to note that I was one of the guests on the episode of the This Week in Lotus podcast recorded March 18th 2011 (episode 43, for those keeping count). It was a fun panel discussion about what was new online and in collaborative and social spaces that week. In particular, we picked off a bunch of topics such as LotusLive supporting the earthquake disaster in Japan, Twitter’s new guidance on developing client apps, and IBM’s broader software capabilities.

You may want to dip in and take a listen… it’s far broader than just being about “Lotus” software, and the regular co-hosts Stuart and Darren are always worth a listen. Give it a try.

A Kind(l)er way of consuming tweets

I picked up an Amazon Kindle 3 over the Christmas period, primarily because I wanted to be able to support a family member who also acquired one. I’d been impressed by the hardware when I’d had a chance to play with a Kindle 3 recently (I’d always thought that the screen refresh and form factor would put me off, but they don’t), and I may also want to dabble in the possibility of developing kindlet applications for the platform. To my mind, despite some limitations, it could be a fantastic slate for displaying relatively-static business content like facts and figures, and of course it is light and has fantastic battery life. I’ve gone for the wifi-only model, not because I wasn’t tempted by the possibility of global free 3G access, but purely because I didn’t consider that I’d need to use it to connect to the wireless much when out-and-about and away from a wifi network.

So far I’ve been very impressed with the device. It is simple, has reasonable usability – although a web interface via Amazon’s website for creating and organising Collections would be exceedingly welcome – and it is definitely encouraging me to read a lot more. It’s a tiny point, but I enjoy the progress bars at the bottom of the page that show me how far I’ve got through each book.

Almost by accident the other day I noticed one of my colleagues retweet a comment from David Singleton:

Now to be fair, this hit me squarely between the eyes – I have the former, and do indeed like the latter. So I just had to ping him and find out more!

Moments later, I had been invited to blootwee.

After a short signup process on the website (hint: it didn’t work brilliantly on the Kindle browser, but it can be done very quickly on a desktop machine), my Kindle refreshed itself with a new document “blootwee for andypiper”.

So what is this doing? Well, essentially, it is scooping my tweets up, grabbing the associated / linked content, creating an ebook, and emailing it to my Kindle – for free. As you will see from the gallery above, the book has tweets at the start, one per page. By following any links, you can jump forward to the point where that web page content is embedded. You can then hit the Back button to return to where you were in the Twitter timeline.

David is currently offering the ability to do this for free on an ad-hoc basis, but he also has some very low-cost paid options to enable this to happen on a daily basis… so you end up essentially with a “newspaper” based on tweets and interesting web pages from your network. The transcoding of web content is not ideal – obviously Flash is not present and image-based content is missing – but it provides a nice way of summarising the content.

I like it. I’m not sure it will become my default way of reading tweets by any means, but what it does give me is a very convenient way of gathering up interesting web content on a daily basis, and reviewing it as I travel. With a 25-hour trip to Australia coming up in the near future, I can see this could be quite useful!

Ping me via Twitter or comment below if you want an invite, and I’ll update this when they are gone.

Notes, because people might ask:

  1. To take a screenshot on the Kindle 3, hit Shift-Alt-G… then hook up via USB and grab the .gif files from the Documents folder.
  2. The linen slip case for my Kindle came from an etsy seller called kindlecovers.
  3. I have a few more images of my Kindle on Flickr.

Digital Local Government

I just saw my mate Dominic Campbell retweet something interesting from Monmouthshire County Council (yes, really!)

For those who don’t know Dominic, a) you NEED to be following him on Twitter, and b) he’s the great guy behind the consultancy FutureGov which runs a whole swathe of events and projects which are about encouraging and supporting government organisations as they come online. I’m a big fan!

Every time I see something like this, I immediately wonder how my local council is doing. So this evening, I had a quick poke at the Hampshire and Rushmoor websites (hint: Rushmoor, you don’t have to require the ‘www’, you can use a redirect), and followed that up with a look at neighbouring Surrey (I’ve just joined the new Digital Surrey committee, incidentally, and spend a lot of time there, so I have a legitimate interest). What I saw didn’t really encourage me, so I found the contact form on the Rushmoor Borough Council website, and for the sake of transparency on my part if not on theirs, here is what I posted:

Details of your comment/complaint:
Do you provide news or information via any social websites such as Twitter, Facebook, YouTube, or others? I’ve observed that several councils have begun to share budget information, provide important news alerts etc via these kinds of services and they would fit well with my lifestyle.
(http://twitter.com/#!/monmouthshirecc and http://www.monmouthshire.gov.uk/site/scripts/news_article.php?newsID=386)

Are there RSS feeds for news from your site? Do you have any APIs for access to local information and data (http://www.sunderland.gov.uk/index.aspx?articleid=4112)

Do you have any kind of digital engagement strategy and where can I find it?

What would you like us to do?
Provide better access to local information online and explain the council’s view of how to use new technology to engage with people.

I’ll be interested to read what they send back! I’ve been extremely disappointed with my local MP’s digital engagement (oh look, his Twitter stream abruptly stops about a month after the date of last year’s General Election, hmm!). Still, at least if the local councils need some help, I know a bunch of very good people to put them in touch with…

The year of consolidation

An interesting year so far in terms of online services ending or merging. I don’t have a good enough memory to mention all of those that have vanished this year, but there are a number of notable examples I thought I’d highlight, mainly because I’ve used them in the past. I last did a short review of some of these consolidations about two years ago.

So where to start… well, I just read the news that drop.io has been acquired by Facebook. It’s a file-sharing service which was incredibly easy to get set up. I wrote about drop.io a couple of years ago and at the time it was an exciting service with a lot of potential, a growing developer community, and some very cool plans like location-sensitive drops, content transcoding, and so on. I guess for me its utility was rapidly eclipsed once I discovered Dropbox which I now use to sync content between 2 laptops, a netbook, a home server and my iPhone, and which my Dogear Nation co-hosts and I use to share our content (not using it yet? try this referral link). It looks like drop.io is effectively closing on December 15th.

Two notable (to me) video services are going, too (well, OK, as I write this, one has gone, and the other one is on its way). Seesmic – the original video version, not the microblogging / update service – is closing. This was a service which wanted to pioneer a “video Twitter” conversation concept, and it was interesting to start off with – I mentioned it in my round-up of online video services back in February 2008. I enjoyed the experiment, and there are a lot of ways in which video online has grown and become an effective way of delivering content, but text has remained my major conversational medium so Seesmic didn’t work out longer term. Of course it has spawned a successful business on the back of Twitter and other sites in the form of Seesmic Web and Desktop clients (and they acquired Ping.fm as well).

Another fun and fascinating video service has gone away – 12seconds.tv has just a page of video static greeting visitors now. I loved that service, although again I struggled to make longer term use of it… but I’m often to be seen sporting my 12seconds t-shirt :-)

In the cases of both Seesmic and 12seconds I’m left to wonder where to re-host my content… kudos to both sites for enabling me to get access to what would otherwise be lost. I suspect I will end up dumping them to YouTube since that isn’t likely to go away in a hurry. Of course the Seesmic videos, particularly the conversational ones, won’t make so much sense without the context.

Vox went the way of the dodo in 2010 as well. As an early adopter I tend to try out most services and I had a small but largely inactive blog over on Vox. I can’t say I’m too sad about its end as I’m perfectly comfortable with a blog at WordPress… it’s funny that Windows Live Spaces bloggers are being migrated to WordPress too – a sign of the times I think, as we’re seeing many of these earlier diverse networks collapse into the larger, more established networks (Vox to SixApart/Typepad; and whilst Windows Live Spaces is hardly backed by a non-established brand in Microsoft, they are obviously refocussing just like everyone else).

The final service worth mentioning, I think, is xMarks. This is a service I only started using in the middle of the year, in an attempt to synchronise my browser content between the iPhone and other devices. The sudden announcement that it was heading for the buffers back in September led to an outpouring of despair and support from the user community, and as a result what was looking like a failure ended up being a near death experience – they initially took user donations, and have now negotiated a sale (so this is more consolidation, in a sense).

So what’s next? Well, the microblogging wars seem to have died out: Twitter has won over former contenders like Jaiku and Pownce, although most online services appear to be integrating their own “updates” concept to continue to seem relevant. The big spaces where I’m personally seeing competition / overlap at the moment are in sites like Tumblr vs Posterous for general content sharing, and in online identity landing pages where about.me, chi.mp and flavors.me want my business. There are a number of fascinating new music-oriented services as well and I think some of those will start to overlap as they add features. The rest of the competition and fight for success seems to me to be in mobile apps and between runtimes on the handhelds. Just a personal point-in-time observation as 2010 starts to draw to a close.

The circle of life played out on the Internet – early innovation and excitement, a plateau of limited success leading to, possibly, monetisation (and/or an explosion of copycats), and a quiet death disappointing a small user community, or heady growth and unlimited stock prices. It’s an interesting space to continue to watch for us early adopters…