From Investigation to Implementation

Building a Program for the Large-Scale Digitization of Manuscripts


Accessing the Thomas E. Watson Papers Digital Collection
End-User Experience

Using the Thomas E. Watson Papers Digital Collection, end-users can browse the entire archival collection through the finding aid. Digitized materials are linked from the finding aid by container number: for example, clicking 'Folder 117' takes the end-user to a container view showing thumbnails of every scan in folder 117, from which full-resolution versions of the images can also be viewed and manipulated. The use of JPEG2000 images allows viewers to zoom in on the images to a level of granularity not possible with the JPEG file format, and without the large storage requirements of the TIFF file format.

The Thomas E. Watson Papers Digitization Project went beyond the goals of straightforward large-scale digitization by providing additional functionality through the gathering of item-level records for materials in Series 1. Correspondence and Series 8. Photographs. Because of this added metadata (encoded as described in the Metadata section of this website), end-users can search and browse letters and photographs by names, dates, locations, and subjects.

Systems Overview

The Thomas E. Watson Papers Digital Collection is built using a combination of images delivered via a djatoka JP2 image server and metadata stored in an eXist XML database. Images are viewed and manipulated using an OpenLayers image viewer, and metadata is manipulated and delivered using XQuery, XSLT and PHP.
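To illustrate how a viewer might request part of an image from a djatoka server, here is a minimal sketch that builds a region request using djatoka's OpenURL-style query parameters. The server address and image identifier are hypothetical placeholders, and the helper name djatoka_region_url is ours; the project's actual viewer configuration is not described here.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def djatoka_region_url(base_url, image_id, level, region=None, fmt="image/jpeg"):
    """Build a djatoka getRegion request URL.

    base_url and image_id are hypothetical; region is (y, x, height, width)
    in the order djatoka expects for svc.region.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_id": image_id,
        "svc_id": "info:lanl-repo/svc/getRegion",
        "svc_val_fmt": "info:ofi/fmt:kev:mtx:jpeg2000",
        "svc.format": fmt,
        "svc.level": str(level),   # resolution level to extract
        "svc.rotate": "0",
    }
    if region is not None:
        y, x, height, width = region
        params["svc.region"] = f"{y},{x},{height},{width}"
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server and image identifier, for illustration only.
example_url = djatoka_region_url(
    "http://example.org/adore-djatoka/resolver",
    "http://example.org/images/folder_117_0001.jp2",
    level=3,
    region=(0, 0, 256, 256),
)
print(parse_qs(urlparse(example_url).query)["svc.region"])
```

A tiled viewer such as OpenLayers would issue many requests of this kind, one per visible tile, so that only the portion of the JPEG2000 image currently on screen is decoded and sent.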

Accessing the SHC's Digitized Collections

Using the data gathered over the course of the two grants, the Southern Historical Collection, the UNC-Chapel Hill Library Systems Office, and the Carolina Digital Library and Archives developed an easy-to-implement, low-cost solution for the Web delivery and presentation of archival collections digitized on a large scale. Using this solution, EAD-encoded finding aids are automatically linked to digitized materials, which are delivered using CONTENTdm Digital Collection Management Software.

EAD-encoded finding aids are transformed from XML to HTML using XSLT. During the transformation, a unique ID is generated for each container in the collection. This container identifier consists of the container type and the container number joined with an underscore. (For example, the ID generated for folder number 7 would be 'folder_7'.) When the finding aid loads, a JavaScript routine queries the CONTENTdm API for the finding aid's collection ID. If the API finds any digitized materials for that collection, it returns a list of their container identifiers (e.g., folder_7). The script then walks the finding aid and creates a link to digitized materials wherever it finds a matching container ID.
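The matching logic described above can be sketched as follows. This is an illustration in Python rather than the project's XSLT and JavaScript; the function names, the lowercasing of the container type, and the link URL pattern are all assumptions made for the example.

```python
def container_id(container_type, container_number):
    """Join container type and number with an underscore, e.g. 'folder_7'.

    Lowercasing the type is an assumption of this sketch.
    """
    return f"{container_type.strip().lower()}_{container_number.strip()}"


def links_for_finding_aid(containers, digitized_ids, url_for):
    """Return {container ID: link} for containers that have digitized material.

    containers:    (type, number) pairs found in the finding aid
    digitized_ids: container IDs returned by the delivery system's API
    url_for:       hypothetical callable that builds a link from a container ID
    """
    digitized = set(digitized_ids)
    links = {}
    for container_type, number in containers:
        cid = container_id(container_type, number)
        if cid in digitized:          # link only where a match exists
            links[cid] = url_for(cid)
    return links


# Example: three folders in the finding aid, two of them digitized.
finding_aid = [("Folder", "6"), ("Folder", "7"), ("Folder", "117")]
api_result = ["folder_7", "folder_117"]
links = links_for_finding_aid(
    finding_aid,
    api_result,
    url_for=lambda cid: f"https://example.org/collection/{cid}",  # placeholder URL
)
print(links)
```

Folder 6 receives no link because its identifier is absent from the API result, which mirrors the behavior in which links appear only for containers that have been digitized.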

Using this system, links appear in finding aids only after materials have been digitized and uploaded to CONTENTdm with the appropriate metadata. The metadata uploaded to CONTENTdm (including the collection name and number, the location of the materials within the collection, the container number and title, file name, and hook ID) is automatically generated from collection finding aids by way of another XSLT transformation. This metadata not only allows the automatic linking system to work, but also gives the end-user useful contextual information without requiring staff to painstakingly create metadata records by hand.
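As a rough illustration of deriving container-level metadata from a finding aid, the sketch below walks a tiny, made-up EAD fragment and emits one record per container. The project used an XSLT transformation for this step; the Python version, the record field names, and the sample collection are assumptions for the example. The sample omits XML namespaces, and the file name field is left out because the file naming scheme is not described here.

```python
import xml.etree.ElementTree as ET

# Made-up EAD fragment (no namespace), for illustration only.
EAD_SAMPLE = """
<ead>
  <archdesc>
    <did>
      <unittitle>Example Papers</unittitle>
      <unitid>00000</unitid>
    </did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Series 1. Correspondence</unittitle></did>
        <c02>
          <did>
            <container type="folder">7</container>
            <unittitle>Letters, 1890</unittitle>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def is_component(tag):
    """True for EAD component elements: <c>, <c01>, <c02>, ..."""
    return tag == "c" or (tag.startswith("c") and tag[1:].isdigit())


def records_from_ead(xml_text):
    """Return one metadata record (dict) per container in the finding aid."""
    archdesc = ET.fromstring(xml_text).find("archdesc")
    collection_name = archdesc.findtext("did/unittitle")
    collection_number = archdesc.findtext("did/unitid")
    records = []

    def walk(component, path):
        title = component.findtext("did/unittitle")
        container = component.find("did/container")
        if container is not None:
            number = (container.text or "").strip()
            ctype = container.get("type", "container").lower()
            records.append({
                "collection_name": collection_name,
                "collection_number": collection_number,
                "location": " > ".join(path),     # enclosing series titles
                "container_number": number,
                "container_title": title,
                "hook_id": f"{ctype}_{number}",   # same form as 'folder_7'
            })
        child_path = path + [title] if title else path
        for child in component:
            if is_component(child.tag):
                walk(child, child_path)

    for child in archdesc.find("dsc"):
        if is_component(child.tag):
            walk(child, [])
    return records


records = records_from_ead(EAD_SAMPLE)
print(records)
```

Because the hook ID is built the same way as the container identifier in the HTML finding aid, a record produced like this can be matched back to its container without any hand-entered metadata.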