There comes a time in any information-generating technology's life when things get serious and someone says, "Hey, what about the records?" That discussion was sparked on Twitter this week by a great post on Bits, Bytes & Archives. How can we capture and preserve Twitter feeds for the future?
The way we use Web 2.0 technology is growing up, and if this stuff is worth saving, it's worth asking some grown-up questions about how we might manage and preserve the information that Twitter and other technologies help us create, share, and access. The big question is how to adapt our established methods and create new methods when the old ones just don't apply. Here's my own first attempt at a Twitter records analysis - how would you tackle this?
Purpose
Every good records decision includes some thought about the process that created the records and the purpose(s) behind them. In my case, there's probably not much historical value to my own Twitter feed, but if I were tweeting recalls for the FDA or interacting with customers for a company, I would need to think about including those official updates in our organization's records universe.
Content and Context
If I decide to capture Twitter records, what could that include?
For the content itself, I can capture a .csv file using a site like Tweetake, but that doesn't capture the look and feel of my own Twitter page. It's probably a minor point, since the people who interact with me usually see my tweets in the context of their own pages or feeds. If I decide to update my profile or page background, though, I might need to consider the value of a snapshot.
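The .csv route is simple enough to reproduce in-house. Here's a minimal Python sketch, assuming the tweets have already been pulled into a list of dictionaries; the column names (`id`, `created_at`, `screen_name`, `text`) are my own hypothetical choices and won't necessarily match Tweetake's export.

```python
import csv
import io

# Hypothetical column set; a real export (e.g. from Tweetake) may differ.
FIELDS = ["id", "created_at", "screen_name", "text"]

def write_tweets_csv(tweets, out):
    """Write a list of tweet dicts to a CSV file-like object.

    Keys not in FIELDS are ignored, missing keys become empty cells, and the
    csv module takes care of quoting commas and quotation marks in tweet text.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for tweet in tweets:
        writer.writerow(tweet)

# Usage: write one tweet to an in-memory buffer and show the result.
buf = io.StringIO()
write_tweets_csv([{"id": 1, "created_at": "2009-03-18", "screen_name": "me",
                   "text": 'Recall notice, "lot 42"'}], buf)
print(buf.getvalue())
```

Plain CSV won't preserve the page's appearance, but it is easy to read back decades from now.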
Follower profiles and tweets
It's possible to capture the updates of your friends and followers in addition to your own, and that might be worth considering under some circumstances, but it could make for a very long list. There are also some potential privacy implications. Most tweets are public, but some accounts are restricted. Do you capture those restricted updates, and if so, do you let the account owner know? There's a certain amount of caveat emptor in any technology like this, but I suspect some of us would feel a little differently knowing that a company or public institution was capturing and saving our profiles and/or lists of our updates.
Conversations
I haven't found an easy way to capture tweeted conversations, although a tool like TweetDeck or FriendFeed might help. Some tweets really only make sense in context, and if you're not capturing both sides of the conversation, is that enough? If an entire group is collaborating via Twitter, would a listing pulled from Twitter search do the trick? You can list @ replies and capture those, but I haven't found a way to group conversations in sequence.
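If both sides of a conversation do get captured, putting them back in sequence is mostly bookkeeping. A minimal Python sketch, assuming each captured tweet records the ID of the tweet it replies to (the Twitter API has exposed this as `in_reply_to_status_id`) and a `created_at` timestamp already normalized to ISO 8601 so it sorts chronologically; the dictionary layout is my own assumption.

```python
from collections import defaultdict

def group_conversations(tweets):
    """Group captured tweets into threads keyed by the ID of the first tweet.

    Assumes each tweet dict has "id", "created_at" (ISO 8601 string) and
    "in_reply_to_status_id" (None if it is not a reply). A reply whose parent
    was never captured starts its own thread.
    """
    by_id = {t["id"]: t for t in tweets}

    def root_of(tweet):
        seen = set()  # guards against malformed reply loops
        while True:
            parent = tweet.get("in_reply_to_status_id")
            if parent is None or parent not in by_id or tweet["id"] in seen:
                return tweet["id"]
            seen.add(tweet["id"])
            tweet = by_id[parent]

    threads = defaultdict(list)
    for tweet in tweets:
        threads[root_of(tweet)].append(tweet)
    # Order each thread by time so the conversation reads in sequence.
    return {root: sorted(ts, key=lambda t: t["created_at"])
            for root, ts in threads.items()}
```

A reply that lands in a thread of its own is a useful flag: it marks a spot where only one side of the exchange made it into the archive.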
Direct Messages
A regular RSS feed of tweets or a crawl of my Twitter page wouldn't grab incoming or outgoing DMs, so those might need to go on the list, too.
Links
I tend to use Twitter to share news links with colleagues and friends. Twitter automatically shortens some URLs with TinyURL, and I choose to shorten others with services like is.gd and bit.ly. As others have pointed out, this creates a preservation challenge because I'm relying on both the URL service to retain the connection to the original link and the host of the linked page to retain the content. If I'm personally linking out to a news story, I'm inclined not to worry too much about long-term retention of those items, but that may not be the case for everyone. If it is important, might I need to retain the shortened URL, the original URL, and/or the linked content? If I'm announcing a new site or post of my own, maintaining the link between the archived tweet and that archived content could be on my list, too.
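If the link does matter, one option is to keep the shortened URL, its expansion, and a checksum of any saved copy of the linked page together in one record. A minimal Python sketch: `resolve_url` simply follows redirects with the standard library's `urllib` (it needs network access, and some services may behave differently), and the record layout is hypothetical. The resolver is a parameter so it can be swapped out or tested offline.

```python
import hashlib
import urllib.request
from datetime import datetime, timezone

def resolve_url(short_url, timeout=10):
    """Follow redirects from a shortened URL and return the final URL."""
    with urllib.request.urlopen(short_url, timeout=timeout) as resp:
        return resp.url  # final URL after any redirects

def link_record(short_url, resolver=resolve_url, content=None):
    """Tie a shortened URL to its expansion at capture time.

    `content` is optional bytes of the linked page, if you decide to keep it;
    its SHA-256 digest lets you confirm later that the saved copy is unchanged.
    """
    return {
        "short_url": short_url,
        "original_url": resolver(short_url),
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "content_sha256": (hashlib.sha256(content).hexdigest()
                           if content is not None else None),
    }
```

Recording the expansion at capture time means the archive no longer depends on the shortening service staying in business.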
Supporting documentation
If I have an official policy or process that governs what gets tweeted, how, and by whom, it makes sense to capture and retain that information, too.
Capture
The Bits, Bytes & Archives post and its comments outline some possibilities for capture, and I'm betting that the archives/RM/IT community can figure this one out if we work together.
I've tried Tweetake and a simple print-to-PDF to capture my tweets, but I think some work with the Twitter API would be needed to make the process more efficient and complete. Tweetake did allow me to capture a full list of my own tweets and some information about my Twitter friends, including name, screen name and profile details, and most recent tweet. If someone developed some standard code to capture similar data, institutions could run their own capture routines on a regular basis.
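Standard capture code could be as simple as a routine that runs on a schedule, asks only for tweets newer than the last one saved, and merges them into a local archive. A minimal Python sketch: `fetch_since` is a hypothetical stand-in for whatever API call or export tool you use (the Twitter API has offered a `since_id` parameter for this kind of request), and the code assumes tweet IDs are numeric and increase over time.

```python
import json

def capture_run(archive, fetch_since):
    """Run one scheduled capture pass and return how many tweets were added.

    `archive` maps tweet id -> tweet dict. `fetch_since(last_id)` is a
    hypothetical callable returning tweets newer than last_id (or everything
    when last_id is None).
    """
    last_id = max(archive) if archive else None
    added = 0
    for tweet in fetch_since(last_id):
        if tweet["id"] not in archive:  # skip anything already captured
            archive[tweet["id"]] = tweet
            added += 1
    return added

def save_archive(archive, path):
    """Write the archive as id-ordered JSON, a plain-text preservation copy."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump([archive[k] for k in sorted(archive)], f,
                  ensure_ascii=False, indent=2)
```

Because the pass skips IDs it already holds, running it twice by accident does no harm.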
It occurs to me that I could also print out screen shots of my Twitter page, but that seems antique. (On the other hand, we do still have my great-grandmother's paper-based diary.)
Retention and Deletion
Depending on the context, I could argue for retention for legal/regulatory reasons (we announced a great new sales deal or got public input on a new rule), program or project reasons (here's the discussion on our presentation), administrative (here's the tweet that made us think twice about that candidate), corporate history (here's how we publicized our activities), or personal history (wow, Mom had a funny hairstyle, and she used this thing called Twitter).
On the retention side, it seems to me that keeping a .csv file or simple HTML file alive is a pretty safe bet if managed and handled properly. On the disposal side, things get a bit trickier. You can build retention periods to meet business needs, and you can click the little trash can icon in Twitter to delete messages when they're no longer needed, but like many things on the Web, tweets are permanent (until they're not) and incredibly temporary (until they're not).
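Building those retention periods into the capture process might look something like this minimal Python sketch. The categories follow the reasons listed above, but the periods are invented placeholders, not recommendations; real numbers come from an approved records schedule. Disposal here only covers your own copies.

```python
from datetime import date

# Hypothetical retention periods in years; None means keep permanently.
RETENTION_YEARS = {
    "legal_regulatory": 7,
    "program_project": 3,
    "administrative": 2,
    "corporate_history": None,
    "personal": None,
}

def add_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def disposal_date(category, created):
    """Date a tweet becomes eligible for disposal, or None if it is permanent."""
    years = RETENTION_YEARS[category]
    return None if years is None else add_years(created, years)

def eligible_for_disposal(category, created, today):
    due = disposal_date(category, created)
    return due is not None and today >= due
```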
Twitter allows you to delete entries, but if your updates are public, it's very possible that someone outside Twitter is also saving them. Twitturly, for example, captures the links you've shared via Twitter and tells you how many others have tweeted the same links. It also captures the text of your tweets, although when I checked this morning, it presented all the links but quoted the text only for the latest messages. When I checked again just now, the text of the older tweets was back. It's a handy site because it's another way to translate all those shortened URLs, but again, I'd have to be concerned about relying on the kindness of strangers to retain that for me.
If you tweeted your US election voting experience, Plodt captured that particular message. Tweets fed into a blog, Facebook, and other networks are also living out there in the ether. I even found an old ARMA conference tweet of mine out on an insurance-related feed because I'd mentioned an insurance company that received an award at the conference. If you're tweeting, you're probably being tracked by Twitturly, Twitterholic, #Hashtags, Retweetist, Twittermap, FriendFeed and others, not to mention any Google cached pages. This doesn't have to keep you from tweeting, but it does mean that "delete" doesn't necessarily mean deleted.
What's a poor Twitterer to do?
As is the case with most electronic records, it seems that the core questions include:
- What is your purpose?
- What are the best ways to accomplish that purpose, and what tools or systems support them?
- What information/data/records get created/captured as part of the process?
- What information/data/records get created or captured to support the systems or tools?
- How long do we need them?
- How long do others need them?
- What do we need to save?
- How do we make sure we can find it when we need it?
- How do we capture and save this stuff?
- You still okay in there, little record? Need to be migrated or reformatted?
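For that last question, a common digital preservation practice is a fixity check: record a checksum for every file when it enters the archive, then recompute and compare on a schedule. A minimal Python sketch using SHA-256 from the standard library; the directory layout and function names are illustrative assumptions.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large archive files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def make_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def check_fixity(root, manifest):
    """Report files whose digests changed, went missing, or were newly added."""
    current = make_manifest(root)
    return {
        "changed": sorted(k for k in manifest
                          if k in current and current[k] != manifest[k]),
        "missing": sorted(k for k in manifest if k not in current),
        "added": sorted(k for k in current if k not in manifest),
    }

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "tweets.csv").write_text("id,text\n1,hello\n", encoding="utf-8")
        baseline = make_manifest(d)
        Path(d, "tweets.csv").write_text("id,text\n1,edited\n", encoding="utf-8")
        print(check_fixity(d, baseline))
```

A changed digest on a .csv or HTML file is a cue to restore from a backup copy; a file that still checks out but that current software can't open is the cue to migrate or reformat.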
I won't pretend that my tweets, blogs, stories, or wild ideas need serious long-term preservation, but I'd argue that some others out there do merit our effort and attention. What should be our next step?
You bring up retention, which I had not thought of before. We are dealing with two types of things: records of an organization and papers of a person. What I tweet, as an individual, falls into personal papers. Anything I keep from my tweets reflects a bit of arrogance about what my posterity, if it even exists, will want. But an organization might need to keep tweets for a while for administrative, legal, or fiscal reasons. I'm glad you brought this up; it is something we all need to think about.
Posted by: Russell D. James, CA | March 19, 2009 at 12:40 AM
A day late and a dollar short, I just found this. This morning I printed all my tweets to PDF, but I'm not sure why. Some of them may be sort of official, others just "stuff."
This is turning out to be an interesting discussion. Why do we keep records/archives? Given the answer to that, how do we cope with the different technologies we use to communicate?
I'm glad you are here :D
Posted by: jana gallatin | July 07, 2009 at 06:35 PM