Capturing data from Opera Ballet Vlaanderen’s outdated carriers

In the summer of 2017, PACKED vzw attempted to retrieve data from Opera Ballet Vlaanderen’s collection of outdated carriers: CD-Rs, CD-RWs, Zip disks, SyQuest cartridges and magneto-optical disks. Read here how we went about it.

Status

In July and August 2017, Emanuel Lorrain and working student Alex Jaou from PACKED vzw processed 474 carriers. A further 45 carriers couldn’t be processed because no suitable reading equipment had yet been found.

Issue

Opera Ballet Vlaanderen unites the Vlaamse Opera (Flemish Opera) and the Royal Ballet of Flanders, and presents major classical works and new creations from the world of opera, dance and ballet.

The institution has a collection of 519 obsolete and/or unreliable carriers: CD-Rs, CD-RWs, Zip disks, SyQuest cartridges and magneto-optical disks. Opera Ballet Vlaanderen no longer has any equipment that can read the content on these carriers and/or transfer it to more modern storage media.

Following on from the Resurrection Lab project, PACKED vzw started collecting equipment that could read the most common outdated carriers and migrate them to contemporary data storage media. The Opera Ballet Vlaanderen collection was a suitable case for testing the old reading devices and the capture station set-up.

Method

Image 1: creating a logical image.

In some cases, disk images were also created alongside the logical images because they’re a better way of storing data from a carrier in its entirety. This is because the carrier is copied bit by bit when creating a disk image. So it’s not just the content from the carrier (the files) that are saved, but also all the system information that’s available from the carrier. Saving this data means you can stay as close to the authentic carrier as possible.
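The idea of a bit-by-bit copy can be sketched in a few lines of Python. This is only an illustration of the principle, not the tool used in this project (the disk images described further on were made with Disk Copy). The function name `create_disk_image`, the block size and the device path in the usage note are assumptions chosen for the example.

```python
import hashlib


def create_disk_image(source, image_path, block_size=1024 * 1024):
    """Copy `source` byte for byte into `image_path`.

    Returns the SHA-256 checksum of the data that was read, so the image
    can be checked against the carrier later on.
    """
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(image_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # end of the carrier reached
                break
            dst.write(block)
            digest.update(block)
    return digest.hexdigest()
```

On a Unix-like system the source would typically be the raw device of the carrier, for instance `create_disk_image("/dev/disk2", "OB0001.img")`, where `/dev/disk2` is a hypothetical device path. Reading a raw device usually requires administrator rights, and the sketch stops at the first read error instead of skipping damaged sectors.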

The copied carriers were listed in a spreadsheet with the following columns:

  1. UI (unique identifier): the unique identifier consists of the organisation’s initials (OB) followed by a sequential 4-digit number. The numbering started at 1 (0001), so OB0001, for example, refers to the first carrier processed.
  2. Institution: the institution’s official name, Opera Ballet Vlaanderen.
  3. Disk type: the carrier type and capacity.
  4. Information on the carrier: all information written on the carrier, such as on labels or in pen on the carrier itself.
  5. Functional? If the carrier can be read by the reading equipment, it’s considered to be functional.
  6. Copied? This field indicates if all files could be copied from the carrier successfully. If it isn’t possible to copy all files, this is recorded in the ‘notes’ field.
  7. Notes: this column contains all the other relevant information about the carrier, e.g. that there were multiple disks in a box. All information about damaged files, files that can’t be copied or files that encounter a problem during the process is noted here.

Following the capturing process, the carriers were divided into two categories. The first category consists of carriers that were functional and could be fully copied. The second category is for carriers that couldn’t be copied.
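The spreadsheet columns and the two categories above could be modelled as follows. This is a minimal sketch: the field names, the helper functions `make_ui`, `category` and `to_csv`, and the use of CSV as an export format are assumptions made for the example, not a description of the actual spreadsheet.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CarrierRecord:
    ui: str                      # unique identifier, e.g. OB0001
    institution: str             # official name of the institution
    disk_type: str               # carrier type and capacity
    information_on_carrier: str  # text on labels or on the carrier itself
    functional: bool             # could the reading equipment read it?
    copied: bool                 # could all files be copied?
    notes: str = ""              # damaged files, multiple disks in a box, ...


def make_ui(sequence_number, prefix="OB"):
    """Build an identifier from the initials and a 4-digit sequence number."""
    return f"{prefix}{sequence_number:04d}"


def category(record):
    """Return the category a carrier falls into after capturing."""
    if record.functional and record.copied:
        return "fully copied"
    return "not copied"


def to_csv(records):
    """Serialise records to CSV text with one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(CarrierRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

For example, `make_ui(1)` gives `OB0001`, and a record with `functional=True` and `copied=False` ends up in the second category.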

Capturing data from CD-R and CD-RW

We used a MacBook Pro from 2010 running OS X El Capitan to copy data from CD-Rs and CD-RWs, with a DVD-ROM drive from 2004 as the reading device. The advantage of a DVD-ROM drive is that it can read disks but not write to them, so data cannot be changed by accident. The DVD-ROM drive has an IDE port[1], so we used an IDE-USB cable to connect it to a USB port on the MacBook.

Image 2: an IDE-USB cable.
Image 3: the IDE-USB cable has a Molex connector which powers the DVD-ROM drive.
Image 4: DVD-ROM drive connection with laptop.

The collection contains 227 optical disks: CD-Rs and CD-RWs. The majority of these disks were kept in boxes, usually identified with labels or writing on the disks themselves. 57 optical disks were not kept in a box, which clearly resulted in visible damage in the form of scratches. Of these 57 disks, 47 were fully functional and could be copied. In total we were unable to copy 31 disks, of which 10 were not kept in a box. Only 13 disks were unreadable and considered to be not functional.

Most of the problems encountered when trying to copy the content from the optical disks were caused by damaged files. These files sometimes blocked the copying process, which meant we had to restart the process without the damaged files. The majority of the damaged files were images, which mostly had a visible line cutting through them. If we were still able to copy the file, one of the two sections created by this line was missing. We weren’t able to copy some of the damaged files at all. Most of the damaged files had the .tiff extension.
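Restarting the copy by hand each time a damaged file blocks it can be avoided by copying file by file and recording the failures. The sketch below shows one possible approach in Python; it is not the procedure we followed, and the function name `copy_carrier` is an assumption made for the example.

```python
import shutil
from pathlib import Path


def copy_carrier(source_root, target_root):
    """Copy every file under `source_root` to `target_root`.

    Files that cannot be read are skipped. Returns a list of
    (relative path, error message) pairs for the skipped files, which can
    go into the 'notes' column of the inventory.
    """
    source_root = Path(source_root)
    target_root = Path(target_root)
    failed = []
    for path in sorted(source_root.rglob("*")):
        if not path.is_file():
            continue  # directories are created when their files are copied
        relative = path.relative_to(source_root)
        destination = target_root / relative
        destination.parent.mkdir(parents=True, exist_ok=True)
        try:
            shutil.copy2(path, destination)  # also keeps the timestamps
        except OSError as error:
            # Remove the partial copy so it is not mistaken for a good file.
            destination.unlink(missing_ok=True)
            failed.append((str(relative), str(error)))
    return failed
```

A carrier counts as fully copied when the returned list is empty. Note that the sketch does not recreate empty directories, and that a file which copies without a read error can still be visibly corrupt, as the TIFF files described above show.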

Image 5: corrupt TIFF file.
Image 6: corrupt TIFF file.
Type      Functional   Fully copied   Not functional   Not fully copied
CD-R      214          196            12               30
CD-RW     1            1              0                0
total     215          197            12               30

Capturing data from Zip 100 disks

Zip disks are storage media for computers developed by Iomega. They appeared on the market in 1994 and were available with storage capacities of 100 MB, 250 MB and 750 MB. You need a Zip drive to read a Zip disk. Zip drives differ in the capacities they support: a Zip 100 drive can read a Zip 100 disk but not a Zip 250 disk; a Zip 250 drive can read both Zip 250 and Zip 100 disks.

We used a modern MacBook Pro to capture the data from the Zip 100 disks, together with a Zip 100 drive and a Zip 250 drive, both manufactured in 2000. The Zip drives have a USB port, so we could connect them to the laptop with a USB cable. We didn’t need any extra software to read the disks. Because Zip disks don’t have a built-in write protection tab, we placed a write blocker between the Zip drive and the MacBook Pro. A write blocker stops a computer from writing files to the external drive, which ensures that the data on the disk remains authentic.

The reading equipment didn’t function well, however. The eject function didn’t work so we had to disconnect and reconnect the devices each time to manually remove the Zip disk from the equipment. We also weren’t able to remove some of the Zip disks in this way and had to forcibly remove them from the drive by hand. There was one Zip disk that we couldn’t place in the reading equipment at all.

Image 7: Zip 100 disks.
Image 8: a Zip 100 drive.
Image 9: a write blocker.
Image 10: connecting a Zip 100 drive to a write blocker (grey cable), and write blocker to a laptop (blue cable).

The errors in the files were similar to the damage that we encountered with the optical disks. TIFF files had the most errors. When these images were opened from the disk, a visible line split them into sections. When we were able to copy such a file, one of the sections created by this line was missing.

We processed 228 Zip disks in total. 220 of these Zip disks were functional and we were able to fully copy 219 of them.

Type           Functional   Fully copied   Not functional   Not fully copied
Zip disk 100   220          219            8                9
total          220          219            8                9

Capturing data from SyQuest disks

SyQuests are storage media for computers in the form of cartridges developed by SyQuest Technology. They were available in different storage capacities, but the 44 MB, 88 MB and 200 MB versions were particularly popular. They were mainly used for larger files, such as for desktop publishing or digital photography. After 1991, when the 88 MB disks were introduced, they became a de facto standard in the Apple Macintosh world for the storage, transfer and backing up of large amounts of data.[2]

You need a SyQuest drive to read SyQuest disks. Just like for Zip disks, there is a different drive for each storage capacity. A 44 MB SyQuest drive can only read disks up to 44 MB, whereas the 200 MB drive can read 44 MB, 88 MB and 200 MB cartridges, among others. SyQuest drives were connected to a computer via an SCSI connector[3]. Because modern laptops are no longer equipped with this type of connector and we didn’t have an old computer with an SCSI port, we went in search of an older model. We found a Macintosh PowerBook G3 from 1999, which has SCSI, USB and PC Card ports and can run both Mac OS 9 and Mac OS X. Additional software, SCSIProbe 5.2.1, was needed for the computer to communicate with the SCSI device, and this software required the Classic Mac environment (Mac OS 9).

We used both a SyQuest 44 MB and a 200 MB drive. We connected the reading equipment directly to the computer via the SCSI connection. Before the SyQuest disks were placed in the drive, we activated their write protection so that the content on the disk could not be changed. The content of some SyQuests appeared immediately on the computer’s desktop when they were inserted into the drive. This wasn’t the case for other SyQuests, for which we had to use the SCSI software (SCSIProbe).

Image 11: SyQuest disks.
Image 12: two SyQuest drives.
Image 13: SyQuest drives and Macintosh PowerBook G3.
Image 14: write protection tab in the bottom right.
Image 15: the disk is write-protected.

In total we processed 16 SyQuests, of which 1 was 88 MB and 15 were 44 MB. 12 of these 16 cartridges were functional, and we were able to fully copy 9 of them. As well as the logical images, we also created disk images of the disks. For this we used the Disk Copy software, which is installed as standard on Mac OS 9. We weren’t able to create disk images of the disks that weren’t functional, and we didn’t create disk images of the disks that caused problems during copying. Of the 9 fully copied disks, we were able to create disk images for 8. In contrast to the previous carriers, it was difficult to identify any recurring properties of the damaged files. The only recurring feature we saw was in two files called HYDEn.ch, which were both damaged. The ‘n’ in the file name represents a number.

Type                 Functional   Fully copied   Not functional   Not fully copied
SyQuest disk 44MB    11           8              4                7
SyQuest disk 88MB    1            1              0                0
total                12           9              4                7

Capturing data from magneto-optical disks

Magneto-optical disks are a type of optical disk for data storage available in 5.25-inch and 3.5-inch formats. The 5.25-inch version entered the market in 1985; the 3.5-inch version has been around since 1991. Magneto-optical disks were considered to be very reliable because the drive always checks during the writing process that the data is written without any errors. This, however, made them very slow to write to. The 5.25-inch disks had a capacity of 256 MB up to 9.2 GB, distributed over both sides of the disk. The 3.5-inch disks had a capacity of 128 MB up to 1.3 GB, and could only be written on one side.

The collection from Opera Ballet Vlaanderen contained 48 magneto-optical disks; 9 of these had a capacity of 128 MB, 31 had a capacity of 230 MB and 8 had a capacity of 640 MB. They were all 3.5-inch disks. You need a magneto-optical drive to read magneto-optical disks. Reading equipment for 5.25-inch disks can only be connected to a computer via SCSI; reading devices for 3.5-inch disks can be accessed via an SCSI, IDE or USB port. We used a Sony magneto-optical disk unit from 1995 with an SCSI port, which is why we again used the Macintosh PowerBook G3 as our workstation. We used SCSIProbe 5.2.1 so that the computer could communicate with the drive.

Image 16: magneto-optical disks.
Image 17: magneto-optical drive.
Image 18: magneto-optical drive and Macintosh PowerBook G3.

Our equipment was not able to read 640 MB or 230 MB disks. It was however possible to read and fully copy three 128 MB disks. We also made disk images of these three disks using Disk Copy. The remainder of the 128 MB disks were ejected almost immediately by the reading equipment, and so could be damaged or functional, but we weren’t able to determine this yet because the reading equipment wasn’t reliable. Of the three disks that were functional and which we could fully copy, only one was still functional the next day when we tried to read them again. The others wouldn’t load in the operating system or were immediately ejected.

Type                   Functional   Fully copied   Not functional   Not fully copied
3.5" M.O. disk 128MB   3            3              0                0
3.5" M.O. disk 230MB   0            0              0                0
3.5" M.O. disk 640MB   0            0              0                0
total                  3            3              0                0

Results

The Opera Ballet Vlaanderen collection meant we could test workflows and set-ups for capturing data from the most common outdated and/or unreliable storage media. We were able to read 450 of the 519 carriers that we received from the institution. We fully copied 428 of these to contemporary data storage media.

Capturing data from the optical disks went quite smoothly (94% functional and 86% fully copied). We were also successful in capturing data from the Zip disks (96% functional and fully copied), but further testing with other reading equipment was required because of the problems we experienced with carriers being ejected. In the case of the SyQuest disks, it’s difficult to draw reliable conclusions because our sample was too small. We were only able to fully copy the data to other storage media for 56% of these cases. However, since 75% of these disks were functional, it’s difficult to know if this low score was due to the reading equipment or because the quality of the carrier had deteriorated. We have not yet found a good solution for capturing data from the magneto-optical disks.

Type                   Functional   Fully copied   Not functional   Not fully copied
CD-R and CD-RW         215          197            12               30
Zip disk               220          219            8                9
SyQuest disk           12           9              4                7
magneto-optical disk   3            3              0                0
total                  450          428            24               46
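The percentages quoted in the results can be recalculated from the counts given in the sections per carrier type. The short Python sketch below does this. The dictionary `COUNTS` and the function `percentages` are names chosen for the example, and rounding down to whole percentages is an assumption that happens to reproduce the figures in the text.

```python
# (carriers processed, functional, fully copied), from the sections above
COUNTS = {
    "CD-R and CD-RW": (227, 215, 197),
    "Zip disk": (228, 220, 219),
    "SyQuest disk": (16, 12, 9),
}


def percentages(processed, functional, fully_copied):
    """Return (functional %, fully copied %) as whole numbers, rounded down."""
    return (100 * functional // processed, 100 * fully_copied // processed)


for carrier_type, numbers in COUNTS.items():
    functional_pct, copied_pct = percentages(*numbers)
    print(f"{carrier_type}: {functional_pct}% functional, {copied_pct}% fully copied")
```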


Authors: Nastasia Vanderperren (PACKED vzw), Alex Jaou and Rony Vissers (PACKED vzw)
