The Cultural Heritage Community does not need an education in how quickly information can become obfuscated, obliterated, or impractical to use without proper preservation, curation, and migration; this risk is the very impetus for the community's existence. The age of microfilm was a short blip on the historical timeline of Cultural Heritage Preservation, and already much of the work done during this period is languishing in storage, inaccessible to a target audience with modern expectations. Moreover, much of the work done on microfilm is now considered to be of insufficient quality to qualify as Preservation Grade.
Scanning an image and saving it to a hard drive is trivially easy; creating a Preservation Digital Object (PDO) requires careful consideration and continuous vigilance. Technology evolves rapidly, especially the technologies that control how digital information is created, stored, and retrieved. The goal of a PDO is to capture, wrap, and describe data in a way that enables migration between storage systems and allows the data to be indexed and deciphered by future access systems.
Before undertaking digitization, it is important to survey the variety of material present in the collection. The Getty Research Institute is working toward universal conventions for breaking down a collection in the form of the Cultural Objects Name Authority. Such canonical hierarchies may be useful for planning a digitization program. However, they were not explicitly created for this purpose, and more utility may be gained from a purpose-made breakdown customized to a specific institution. The example below is well suited as a starting point for planning a digitization program and can be made with any standard spreadsheet program, using rough estimates of quantities to guide the process. Here, we have broken the collection down by type, size, and state, guided by which collection attributes call for special digitization considerations.
Establishing the scope of the collection to be digitized is essential both for prioritizing digitization and for properly selecting the hardware and workflows that will be used. For instance, a collection that contains 3D materials is ill suited to systematic digitization via a flatbed scanner. Ideally, institutions should select hardware and workflow software capable of digitizing the majority of the collection. It is imperative for institutions to examine their holdings and look for a solution that is versatile and accommodates the particular needs of their collections. For example, the same high-resolution digital back and raw workflow software can be used to digitize any of the material types categorized above. Using the same hardware and software across a broad collection reduces institutional training requirements and consolidates hardware costs.
It is especially important to make careful note of the type and quantity of “problem children.” These are the outliers of a collection that will require extraordinary time or effort to digitize, whether because of size (e.g., a bound-material collection may include a small number of especially oversized manuscripts) or condition (e.g., fire-damaged or extremely fragile materials).
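A spreadsheet survey of this kind, including its count of problem children, can also be tallied in a few lines of code once an inventory exists. The Python sketch below is a minimal illustration, assuming a hypothetical `SurveyRow` record with a material type, size class, condition, and rough count; the category names, the `"oversized"` and `"fragile"` labels, and all quantities are placeholders rather than real collection data. The helper `totals_by` sums rough counts along one attribute, and the hypothetical `problem_children` helper picks out rows whose size or condition calls for special handling.

```python
from collections import defaultdict, namedtuple

# One row of a collection survey. Field names are hypothetical.
SurveyRow = namedtuple("SurveyRow", ["material", "size", "condition", "count"])

# Illustrative placeholder rows with rough counts; not real collection data.
SURVEY = [
    SurveyRow("bound volume", "standard", "good", 12000),
    SurveyRow("bound volume", "oversized", "good", 150),
    SurveyRow("bound volume", "standard", "fragile", 800),
    SurveyRow("flat print", "standard", "good", 5000),
    SurveyRow("flat print", "oversized", "fragile", 40),
    SurveyRow("3D object", "standard", "good", 600),
]


def totals_by(rows, field):
    """Sum the rough counts for each distinct value of one attribute."""
    totals = defaultdict(int)
    for row in rows:
        totals[getattr(row, field)] += row.count
    return dict(totals)


def problem_children(rows, sizes=("oversized",), conditions=("fragile",)):
    """Return rows whose size or condition calls for special handling."""
    return [r for r in rows if r.size in sizes or r.condition in conditions]


if __name__ == "__main__":
    for field in ("material", "size", "condition"):
        print(field, totals_by(SURVEY, field))
    flagged = problem_children(SURVEY)
    print("problem children:", sum(r.count for r in flagged))
```

Keeping the flagging criteria as parameters means an institution can add its own labels (for example, a condition such as fire damage) without changing the tally logic.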
“We have handscrolls, we have prints, small paintings, large oil paintings, hanging scrolls, six-fold screens; we also have quite an extensive collection of 3D material: bronzes, sculptures, metals of all types. We have a mandate to photograph our entire collection and put it online… by the end of 2014.”
– John Tsantes, Freer|Sackler, The Smithsonian Institution
One should never underestimate the amount of “housekeeping” required to maintain, curate, and provide long-term access to the tremendous quantity of files created during digitization projects. Perpetually maintaining digital objects requires a range of ongoing expenses, such as on-site storage, off-site redundant backups, electricity, maintenance, and migration costs. This ongoing cost is more easily justified for Preservation Digital Objects than for digital objects that do not meet, or have not been verified to meet, preservation standards.
Organizational systems such as an electronic collections catalogue, a collections management system, and/or a digital asset management system help optimize workflow by providing structure to the materials to be imaged, and can also track which collections have been digitized. The ability to assemble collections by likeness and size tremendously increases the efficiency of a digitization program and greatly eases the conversion of collections into Preservation Digital Objects. If the collections catalogue has a digital asset management component, one can characterize the quality of the digital objects using FADGI-type guidelines and separate true Preservation Digital Objects from lower-quality images. This can help guide which digital assets are worth the cost of maintaining.
The result of a digitization program is the creation of Preservation Digital Objects. These PDOs become part of the collection of the institution and must be preserved and properly cared for. When planning a digitization program this obligation and its ramifications must be considered.
The most obvious implication of an expanded collection of PDOs is the need for more digital storage: every PDO requires storage space, so more PDOs require larger pools of storage. Less obvious is that a larger digital collection usually brings more digital traffic in the form of visitors to an institution’s website or collections portal. Handling this increased traffic may require upgraded servers or a more expensive website hosting plan. Those visitors may also need more sophisticated tools (e.g., better filtering, searching, and browsing) as the online collection grows; manually browsing through a few hundred thumbnails is tedious but practical, while manually browsing through a few million is not. Adding such tools to a web platform, or switching to a more sophisticated platform that includes them, can be a lengthy and expensive process.
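The storage side of this growth is simple arithmetic: number of PDOs × average size per PDO × number of stored copies. The Python sketch below is a rough planning aid under stated assumptions. The function names are hypothetical, sizes use decimal units (1 TB = 1000 GB), and the example inputs (20,000 new PDOs per year, 0.5 GB per PDO, three copies such as a primary, an on-site backup, and an off-site backup) are placeholders. Real file sizes depend on resolution, bit depth, and file format. The sketch ignores web bandwidth, derivatives, and migration overhead.

```python
# All inputs are hypothetical; substitute an institution's own figures.


def storage_needed_tb(num_objects, avg_object_gb, copies):
    """Total storage in terabytes (decimal, 1 TB = 1000 GB) across all copies."""
    return num_objects * avg_object_gb * copies / 1000


def projected_storage_tb(objects_per_year, years, avg_object_gb, copies, existing_objects=0):
    """Cumulative storage (TB) at the end of each year of a digitization program."""
    return [
        storage_needed_tb(existing_objects + objects_per_year * year, avg_object_gb, copies)
        for year in range(1, years + 1)
    ]


if __name__ == "__main__":
    # Placeholder scenario: 20,000 new PDOs per year, 0.5 GB each, 3 stored copies.
    print(projected_storage_tb(20_000, 3, 0.5, 3))  # [30.0, 60.0, 90.0]
```

Because storage grows with every year of digitization while older copies must still be kept, the projection rises steadily rather than leveling off. Budgeting for storage is therefore an ongoing commitment, not a one-time purchase.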