Planning Collection Pulls

Before a digitization program begins, clear procedures must be in place for how objects will be pulled from the collection and returned to it after digitization. The usual logistics of retrieving an item (intra-institutional communication, scheduling systems, tracking systems, and logs) are common to other collection pulls, such as patron requests, and are outside the scope of this document. Pulling items specifically for digitization, however, raises special considerations.

Length of Pull vis-à-vis Quality Control

The Standard Raw Digitization Workflow [see Standard Raw Digitization Workflow] calls for a Final Quality Control Stage executed by an individual who has access to the original object. Therefore, it is important that the object remains available to the digitization team not only during the Capture and Initial Quality Control stages, but through the end of the Final Quality Control Stage.

Moreover, the Standard Raw Digitization Workflow is built around efficiency, which is often increased by completing a large batch of Production Capture before proceeding to post-production and Final Quality Control. For example, here is a simplified breakdown of the Standard Raw Workflow when digitizing 100 boxes, each of which contains 400 envelopes:

  1. Production: Technician preps and captures all envelopes from all 100 boxes, creating 40,000 raw files. Initial Quality Control is executed during this capture stage.
  2. Post-Production: Technician applies Capture One CH 8 workflow tools, such as Auto Crop, to the 40,000 raw files. Final Quality Control is executed at this stage.
  3. Processing: Technician processes the 40,000 raw files to TIFFs and places them into the Archive/DAM system.

With a rapid capture system using an 80 MP digital back and the DT RCam Reprographic Camera with Capture One CH 8 software, one could reasonably expect a production rate of 600 envelopes per hour, which is quite impressive, especially given that the results will be true PDOs. Even at that rate, capturing 40,000 raw files takes roughly 67 hours, so with a collection of this size a gap of about two weeks should be expected between the capture of the first box of envelopes and the Final Quality Control for that box. If the boxes are not kept locally available for physical reference, issues discovered during Final Quality Control may require re-pulling the item. Even a technician with a 99.9% success rate will produce 40 errors across 40,000 captures, all of which surface during the Final Quality Control stage. If 10% of those 40 errors require re-imaging the object (as opposed to recropping or adjusting the raw file), then 4 envelopes will need to be pulled again from storage, significantly delaying completion of the project. For this reason, where institutionally feasible, a large window should be granted between Object Pull and Object Return, commensurate with the size of the pull.
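
The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pull_window_estimate` and its parameters are hypothetical, the 8-hour shift length is assumed, and the inputs (40,000 captures, 600 per hour, a 0.1% error rate, 10% of errors needing re-imaging) are the figures quoted above.

```python
def pull_window_estimate(total_captures, captures_per_hour, error_rate,
                         reimage_fraction, hours_per_shift=8.0):
    """Rough planning figures for one pull (hypothetical helper).

    hours_per_shift is an assumption, not a figure from the workflow itself.
    """
    capture_hours = total_captures / captures_per_hour
    expected_errors = total_captures * error_rate
    return {
        "capture_hours": capture_hours,
        "capture_shifts": capture_hours / hours_per_shift,
        "expected_errors": expected_errors,
        # Only some errors need the physical object back under the camera.
        "expected_repulls": expected_errors * reimage_fraction,
    }


# Figures from the example: 40,000 captures, 600/hour, 99.9% success,
# and 10% of errors requiring re-imaging.
estimate = pull_window_estimate(40_000, 600, 0.001, 0.10)
for name, value in estimate.items():
    print(f"{name}: {value:.1f}")
```

With these inputs the sketch gives about 67 capture hours (a little over eight 8-hour shifts), 40 expected errors, and 4 expected re-pulls. Changing the pull size or error rate shows how quickly the required window between Object Pull and Object Return grows.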

Collating By Required Capture Window (PPI & Object Size)

In the Standard Raw Digitization Workflow, very large gains in productivity can be achieved by minimizing PPI-changeover time. Suppose there are one hundred objects to be digitized: 50% are A2-size and require a 300 ppi capture, and 50% are A4-size and require a 600 ppi capture. In this case it is most efficient to digitize all of the objects of one size, and then all of the objects of the other. The least efficient method would be to switch between sizes on every capture. The table below illustrates the total digitization time, assuming a PPI-changeover time of 3 minutes and an object handling/capture/initial-QC time of 30 seconds per object.

| Workflow | Object Handling, Capture, and Initial QC | PPI Changeover Time | Total Time |
|---|---|---|---|
| Collated Workflow (all A4, then all A2) | 30 sec × 100 objects | 3 min × 2 PPI changes | 56 minutes |
| Uncollated Workflow (A4, A2, A4, A2, A4, A2…) | 30 sec × 100 objects | 3 min × 100 PPI changes | 350 minutes |

In the extreme example above, a collated workflow reduces capture time by 84%! Put in different terms, at the rate achieved in the collated workflow a technician would capture 750 objects per 8-hour shift, while at the uncollated rate the same technician would capture only 120 (both figures correspond to roughly 7 hours of actual capture time per shift). That is a massive increase in productivity without any reduction in quality and with fewer opportunities for human error.
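
The figures in this example can be reproduced with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the 420 minutes of capture time per shift is an assumption chosen because it reproduces the 750 and 120 objects quoted above; the text itself states only an 8-hour shift.

```python
def total_minutes(n_objects, n_changeovers, handling_sec=30, changeover_min=3):
    """Total session time: per-object handling plus PPI changeovers."""
    return n_objects * handling_sec / 60 + n_changeovers * changeover_min


def objects_per_shift(batch_minutes, batch_size, capture_minutes_per_shift):
    """Scale the rate achieved on one batch up to a full shift."""
    return batch_size * capture_minutes_per_shift / batch_minutes


collated = total_minutes(100, 2)       # all of one size, then all of the other
uncollated = total_minutes(100, 100)   # a changeover before every capture
reduction = 1 - collated / uncollated

# Assumption: 7 hours (420 minutes) of actual capture in an 8-hour shift.
CAPTURE_MINUTES = 420
print(f"collated: {collated:.0f} min, uncollated: {uncollated:.0f} min")
print(f"time saved: {reduction:.0%}")
print(f"per shift: {objects_per_shift(collated, 100, CAPTURE_MINUTES):.0f} "
      f"vs {objects_per_shift(uncollated, 100, CAPTURE_MINUTES):.0f}")
```

If a full 480 minutes of capture time were assumed instead, the same rates would give roughly 857 and 137 objects per shift; the ratio between the two workflows is unchanged.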

Many collections are not stored in a collated form, and it would be presumptuous of the digitization program to dictate or even influence the manner in which these collections are stored. However, the gains of digitizing in a collated workflow are so significant that it can make sense to temporarily collate the collection prior to digitization and then restore the original structure prior to returning the items to storage. These additional steps will increase physical handling and the possibility of misplacing collection items, and those drawbacks must be evaluated alongside the potentially significant increase in overall efficiency.
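
One way to reduce the risk of misplacing items during a temporary collation is to record each item's original storage position before reordering. The following is a minimal sketch of that idea, not a description of any particular tracking system: the `PulledItem` record, its fields, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PulledItem:
    item_id: str            # hypothetical identifier
    storage_position: int   # order in which the item sits in storage
    size: str
    ppi: int


def collate(items):
    """Group items by capture setup, keeping storage order within a group."""
    return sorted(items, key=lambda it: (it.size, it.ppi, it.storage_position))


def restore(items):
    """Put items back into their original storage order."""
    return sorted(items, key=lambda it: it.storage_position)


def missing_ids(pulled, returned):
    """List items that were pulled but are absent from the return set."""
    returned_ids = {it.item_id for it in returned}
    return [it.item_id for it in pulled if it.item_id not in returned_ids]


# Six envelopes stored in alternating sizes.
pulled = [
    PulledItem(f"env-{n:03d}", n, "A4" if n % 2 else "A2", 600 if n % 2 else 300)
    for n in range(1, 7)
]
capture_order = collate(pulled)
return_order = restore(capture_order)
print([it.item_id for it in capture_order])
print(missing_ids(pulled, return_order))
```

Here the capture order places the three A2 envelopes first and the three A4 envelopes second, the restored order matches the original storage order, and the missing-item check comes back empty.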

The results need not be so extreme to make a significant (>10%) difference in productivity. The digitization technician is at the mercy of the program manager and institutional stakeholders; they can only collate within the items delivered in a particular batch. It is essential, therefore, that maximizing collation and other streamlining operations be incorporated into the scheduling of the digitization program from the outset and at the highest levels of organization. However, administrators and other stakeholders are often unaware of the drastic difference collation can make. It can be useful for those involved in planning collection pulls to see a brief demonstration of the practical steps required to prepare to digitize a particular type/size of object, as compared to the minimal time required to digitize additional similar objects. The change in productivity is so massive that the internal cost to digitize additional homogeneous objects in a collated workflow is nearly free in comparison to additional objects in an entirely uncollated workflow. Proper collation can drastically improve the math behind determining an object’s Digitization ROI.
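
The effect of scheduling on what the technician can achieve can be illustrated with a small sketch. It assumes the technician collates within each delivered batch but cannot reorder across batches, and it reuses the 30-second handling time and 3-minute changeover time from the earlier example; the function names and the batch layouts are hypothetical.

```python
from itertools import groupby

HANDLING_MIN = 0.5     # 30 seconds per object, from the earlier example
CHANGEOVER_MIN = 3.0   # 3 minutes per PPI changeover, from the earlier example


def changeovers_for_schedule(batches):
    """Count setups when each batch is collated internally and the batches
    are captured in delivery order (the first setup counts as a changeover)."""
    sequence = [setup for batch in batches for setup in sorted(batch)]
    return sum(1 for _ in groupby(sequence))


def schedule_minutes(batches):
    n_objects = sum(len(batch) for batch in batches)
    return (n_objects * HANDLING_MIN
            + changeovers_for_schedule(batches) * CHANGEOVER_MIN)


A2 = ("A2", 300)
A4 = ("A4", 600)

# The same 100 objects delivered as ten mixed batches ...
mixed_pulls = [[A2] * 5 + [A4] * 5 for _ in range(10)]
# ... or as ten batches scheduled by capture setup.
grouped_pulls = [[A2] * 10 for _ in range(5)] + [[A4] * 10 for _ in range(5)]

print(changeovers_for_schedule(mixed_pulls), schedule_minutes(mixed_pulls))
print(changeovers_for_schedule(grouped_pulls), schedule_minutes(grouped_pulls))
```

Under these assumptions the mixed pulls require 20 changeovers and 110 minutes even with perfect collation inside every batch, while the pulls scheduled by setup require 2 changeovers and 56 minutes. The difference can only be recovered at the scheduling level.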

“it makes very good sense in organizing digitization to pull objects of like size, even if they are not of the highest importance in the overall collection. If the goal is to digitize the entire collection, it will certainly be more efficient to shoot similarly sized objects in ‘one go’, as it were, rather than prioritizing by order of relevance and constantly changing capture settings.”

– Barbara Katus, Manager of Imaging Services, Pennsylvania Academy of the Fine Arts