From Investigation to Implementation

Building a Program for the Large-Scale Digitization of Manuscripts

Legal Considerations

Legal and Ethical Concerns

The SHC must find a way to balance the critical goal of providing greater access to manuscript materials through digitization with the legal and ethical concerns regarding privacy and copyright. Some collections may contain sensitive materials. Examples include student records, correspondence related to refereed journals or grants, financial information, medical or health information, legal records, business records with trade secrets, materials conveying personal information about identified third parties (e.g., extramarital affairs, drug use, and juvenile crime), and sexually explicit or graphic content.

Federal law provides some guidelines in handling privacy concerns. For example, the Census Bureau releases aggregate data about the census as soon afterward as is practical; it does not, by law, release individual census responses for 72 years following the census due to privacy concerns. Following this guideline, collections whose materials were created 72 or more years ago should be free of legal concerns about third-party privacy; however, ethical obligations need to be taken into consideration as well. As a result, in the decision matrix, higher priority is assigned to those collections that do not include materials that may infringe on the privacy of living third parties, as well as those in which such materials could be easily segregated from the bulk of the collection and could be excluded without significantly decreasing the collection's research value.

The University Library and the SHC respect the intellectual property rights of others and do not claim any copyright interest in most SHC collections. The SHC will make the digital reproductions of archival materials available under an assertion of fair use (17 USC 107). However, staff will adhere to a take-down policy that will guide decision making in the wake of copyright infringement claims. The take-down policy will be published on the SHC's website. Higher priority for digitization will be given to those collections unlikely to contain materials that remain under copyright protection, and to those whose copyright-protected materials could be removed without significantly decreasing the research value of the collection as a whole.

Under copyright law, unpublished materials created by authors who have been dead 70 years or more are not protected by copyright; for anonymous authors, the threshold is 120 years or more from the date of creation. Most works published after 1923 are protected. The SHC will assess all collections for risk, particularly noting the presence of works published after 1923, commercially produced sound recordings and moving images, unpublished works by identified literary authors, photographs with credit lines of photographers who are either living or have been deceased fewer than 70 years, and materials with later creation dates that may be under copyright protection.
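The rule of thumb for unpublished materials can be expressed as a small triage function. The following is only a sketch under stated assumptions, not the SHC's actual assessment procedure: the function name, its parameters, and the three status labels are hypothetical, and the 120-year term for anonymous works is assumed to run from the date of creation. Because copyright terms run through the end of the calendar year, the sketch compares whole years strictly, so an item counts as public domain only once the full term has clearly elapsed.

```python
from typing import Optional


def unpublished_work_status(
    assessment_year: int,
    death_year: Optional[int] = None,
    creation_year: Optional[int] = None,
    anonymous: bool = False,
) -> str:
    """Rough copyright triage for one unpublished item (hypothetical helper).

    Returns "public domain", "in copyright", or "unknown".
    """
    if anonymous:
        # Assumption: the anonymous-author term is counted from creation.
        if creation_year is None:
            return "unknown"
        elapsed = assessment_year - creation_year
        return "public domain" if elapsed > 120 else "in copyright"

    # Identified author: the term depends on the year of death.
    if death_year is None:
        return "unknown"
    elapsed = assessment_year - death_year
    return "public domain" if elapsed > 70 else "in copyright"


if __name__ == "__main__":
    # Illustrative years only; they are not drawn from any SHC collection.
    print(unpublished_work_status(2010, death_year=1922))
    print(unpublished_work_status(2010, death_year=1960))
    print(unpublished_work_status(2010))
```

A function like this only covers the unpublished-materials rule; published works, sound recordings, and moving images would need separate handling, as the risk factors listed above suggest.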

Copyright and the Thomas E. Watson Papers Digitization Project: A Case Study

To investigate the copyright component of digitizing archival materials, the Thomas E. Watson Papers Digitization Project staff conducted intensive copyright research on the correspondence in the Watson Papers by gathering basic metadata—e.g., names, dates, and geographical locations—from the materials. The following table depicts the scope of the materials and time taken to record this information:

Total Items    Total Linear Feet          Total Time    Time Per Linear Foot
8,434          7.5 (15 document cases)    90 hours      12 hours

The correspondence included 3,304 names. Using a variety of sources, including Wikipedia, the Social Security Death Index, and print reference works such as biographical dictionaries, staff attempted to identify the correspondents and find their dates of death in order to determine copyright status. They found that 608 correspondents (18.40%) had life dates that placed their materials in the public domain, and 1,101 correspondents (33.32%) had life dates that placed their materials in copyright. Life dates could not be found for 1,571 correspondents (47.55%), and no information at all could be found for 24 correspondents (0.73%). The identification process undertaken by the Watson-Brown Project staff was very time-consuming, requiring more than 14 weeks of dedicated time by a full-time employee to evaluate a relatively small body of materials (when compared with the millions of documents in the SHC).
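The reported percentages can be checked directly from the counts. The short sketch below recomputes each share of the 3,304 correspondents; the dictionary keys are descriptive labels chosen here for illustration, not terms used by the project staff.

```python
# Correspondent counts as reported for the Watson Papers correspondence.
counts = {
    "public domain": 608,
    "in copyright": 1101,
    "life dates not found": 1571,
    "no information found": 24,
}

total = sum(counts.values())

# Share of all correspondents in each category, as a percentage
# rounded to two decimal places.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

# Correspondents whose copyright status could not be determined at all.
unresolved = counts["life dates not found"] + counts["no information found"]
unresolved_share = round(100 * unresolved / total, 2)

if __name__ == "__main__":
    print(f"total correspondents: {total}")
    for label, pct in shares.items():
        print(f"{label}: {pct:.2f}%")
    print(f"status undetermined: {unresolved} ({unresolved_share:.2f}%)")
```

Taken together, the two unresolved categories account for nearly half of the correspondents, which is why name-by-name research leaves so much of the material in an undetermined state.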

At the scale of the SHC's holdings, this methodology of copyright research would be untenable. Furthermore, archivists have no reasonable means of contacting the thousands of descendants of thousands of correspondents. For all practical purposes, it would be impossible to secure copyright permissions for every correspondent whose writing appears in the SHC. Clearly, the SHC staff needed to find a different way to reconcile copyright law with the repository's mission and the needs of its researchers. As a result, the SHC staff is developing a strong and visible policy for addressing copyright complaints and will be prepared to remove material if a claimant notifies the SHC that they hold copyright to a particular digitized document.