Friday, November 18, 2011

Is This Bit Rot?

The term "bit rot" gets batted around a lot, but its definition isn't so easy to nail down. ComputerUser's dictionary describes it as "gradual decay of storage media" while the Free On-Line Dictionary of Computing states that it is "a hypothetical disease the existence of which has been deduced from the observation that unused programs or features will often stop working after sufficient time has passed, even if 'nothing has changed.'" Some online technical dictionaries do not include the term. Meanwhile, Wikipedia's bit rot entry is surprisingly short and contains multiple definitions: decay of storage media, decay of data on storage media, and degradation of software programs. (The entry is flagged as needing citations to reliable sources.)

So maybe I'm not alone in wondering what bit rot looks like in a word-processing file. Some of our archival collections, such as the Marshall Green papers and Notgemeinschaft für eine freie Universität records, contain 5.25-inch floppy disks, which often present problems. Today I'm looking at a disk containing files that open on a PC. They suffer from some malady, as indicated by this sample from a letter dated January 21, 1986:

<<<<<>>>>>

Thankó verù mucè foò á lovelù luncheoî anä somå splendiä views® Wå 
imaginå yoõ no÷ iî Indiá anä wondeò iæ yoõ arå listeninç tï somå oæ thå 
samå Indianó witè whoí wå talkeä yearó ago® Thå artistó anä economistó 
werå quitå remarkable¬ buô thå politicaì scientistó useä tï talë abouô 
atomiã bombó foò Indiá witè eager¬ burninç eyeó whilå beinç verù carefuì 
noô tï kilì anù insects® (Severaì haä theiò beardó covereä iî whitå 
silk so that no insect would get caught and be stifled there.)

<<<<<>>>>>

The file name, Enid, has no file extension, so it is difficult to determine what software was used to create it. The sample above is from the rendering in MS Word with Windows character encoding, but no matter what software I open the file in, I get some gibberish. But it isn't all gibberish--the last line is completely legible. The same letter displays correctly in some places, like the "n" in "insect," and incorrectly in others, like the "n" that should end "luncheoî" in the first line. Is this bit rot?

Regardless of the diagnosis, the next question is what to do. Because we can infer what many of the corrupted characters should be, we can match them to their actual counterpart, as in this sample:

Corrupted version    True character
ë                    k
ì                    l
í                    m
î                    n
ï                    o
ð                    p

Then we could use the find-and-replace feature to fix all the corrupted characters. But it is more difficult to infer the correct characters for corrupted numbers. In addition, I assume that the mapping from corrupted to true characters varies from one file to the next--after all, this decay probably doesn't occur in the same predictable manner or at a steady rate. Furthermore, we've got to deal with thousands of corrupted files on hundreds of disks. Lastly, if we did restore all the files, how could we ensure that the researchers studying them understand our restoration process and its implications for the authenticity and reliability of the content?
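One pattern is worth testing before building a find-and-replace list by hand. In the Windows character encoding (Windows-1252) used for the rendering above, each corrupted character in the table sits exactly 128 (0x80) above its true counterpart, which is what you would see if the highest bit of the byte had been set. Some word processors of the period, WordStar among them, used that bit as a marker on the last letter of a word; whether Enid is such a file is an assumption here, not something established. The Python sketch below checks the offset against the table and clears the bit. The function names are illustrative only.

```python
# Pairs copied from the table above: (corrupted character, true character).
TABLE_PAIRS = [("ë", "k"), ("ì", "l"), ("í", "m"),
               ("î", "n"), ("ï", "o"), ("ð", "p")]


def offsets(pairs, encoding="cp1252"):
    """Return the set of byte-value differences between corrupted and true characters."""
    return {bad.encode(encoding)[0] - ord(good) for bad, good in pairs}


def clear_high_bit(data: bytes) -> bytes:
    """Return a copy of data with the highest bit (0x80) of every byte cleared."""
    return bytes(b & 0x7F for b in data)


def restore_text(text: str, encoding: str = "cp1252") -> str:
    """Convenience wrapper for text pasted from a rendering in the given encoding.

    Assumes every character in the text exists in that encoding.
    """
    return clear_high_bit(text.encode(encoding)).decode("ascii")


if __name__ == "__main__":
    # A single offset for all pairs suggests one systematic cause, not random decay.
    print(offsets(TABLE_PAIRS))
    print(restore_text("Thankó verù mucè foò á lovelù luncheoî anä somå splendiä views®"))
```

If the offset held for every byte in a file, digits and punctuation would come back by the same operation with no guessing, and text that is already legible, like the last line of the sample, would pass through unchanged. The sketch does not handle any other formatting codes a word processor may have stored, and it should be run on a copy so the original bytes stay available for comparison.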

If you have answers to any of the above problems, I think Wikipedia needs you to enhance its bit rot entry.

An example of bit rot. Hoover Institution Archives

Wednesday, November 9, 2011

A Note On Hoover History

Recently, the Hoover Tower’s exterior was cleaned and thus brought closer to how it looked when it opened in 1941. The interior? Not even close. After all, we don’t have a radio room any more.

Radio room? Yes, the Hoover Tower was built with a radio room in the blueprints. Funnily enough, none of us was aware of this until a few years ago; when we were preparing an exhibit on the Institution’s 90th anniversary, I found a recording that mentioned it was produced in “the radio room of the Hoover Tower.”

Why did we have a radio room? Good question. From the records we’ve been able to search, the room was there, at least in part, because of World War II. It seems the room’s purpose was to listen to and record foreign broadcasts, in concert with the military. Over the years its purpose changed: radio programs were produced in the room, including Wealth of the West, a McLaughlin Group-like program on the day’s issues. This change prompted Stanford to build a production studio across the street in Memorial Hall. (Fun Archives fact: in the Wealth of the West recording on which the radio room was name-checked, one of the guests mentioned an event at the Commonwealth Club of California.)

Do we still have any of these recordings? Another good question. No, but we have something close.

When the room was spec’d out, the records indicate that the broadcasts were intended to be recorded onto cylinders. This doesn’t quite jibe, though, because by the late 1930s cylinders were an obsolete format. The tower opened in 1941, and it seems odd they would use a format that can record only five minutes at a time when the discs of that era could record up to twenty minutes at a time. In any case, there are no cylinders in our stacks that I’m aware of.

We do, however, have a lot of lacquer discs cut in 1942 of English-language shortwave broadcasts intended for American audiences, from Tokyo, Chungking, Bangkok, and Australia. These discs were cut in San Francisco and accessioned by Hoover in 1958. As it happens, I’m currently working on preserving these discs.

So what happened to the radio room? Yet another good question. I don’t know, except that the area where the room used to be is now the wheelchair entrance to the Tower.

And so the mystery remains.

Original floor plan of the Hoover Tower, including the Radio Room. Hoover Records, Hoover Institution Archives.