Digital Archives Initiative
 


1. Inventory of Digital Objects in the DAI (29 March 2011)
     Notes

a. Compound Objects

These are virtual multi-page documents (monographs, journal issues, etc.) created from original JPEG or JPEG 2000 images. Every compound object has a corresponding PDF file so that it can be downloaded and printed easily. CONTENTdm uses an XML file (.cpd) to organize individual pages into a compound object. Here is a sample .cpd file:

Work Item: Determine the METS standard for the description and long-term preservation of a compound object. Export the metadata (including page-level metadata) from CONTENTdm and convert it to METS. Then store the original, preservation and access copies of the images, together with the associated PDF/A file and the metadata, in an OAIS Archival Information Package (AIP) using the Library of Congress's BagIt tool.

The format and content of the AIP are based on the Digital Object Specifications for submitting AIPs to HathiTrust. HathiTrust mandates the use of MARCXML for descriptive bibliographic elements. Because our CONTENTdm instance relies on Qualified Dublin Core, the AIP creation tool will support output of the descriptive metadata in either MARCXML or Dublin Core.
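As an illustration of the AIP packaging described above, the following is a minimal sketch of a BagIt-style layout built with the Python standard library only. It is not the Library of Congress tool and not the DAI's actual AIP creation tool. The function names (`sha256_of`, `make_bag`), the choice of SHA-256 for the payload manifest, and the BagIt version string are assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data and write the BagIt tag files.

    bag_dir must not already contain a data directory.
    """
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)  # also creates bag_dir if needed

    # Bag declaration; the version string here is an assumption.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to the bag>" line per file.
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(bag_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {relative}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir
```

A source directory for one compound object might hold hypothetical subfolders such as original/, preservation/, access/ and metadata/ (the METS file); calling `make_bag(Path("object_0001"), Path("object_0001_bag"))` would then produce a bag whose manifest can be re-checked later to detect corruption.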

b. Multiple Formats for the Same Media Type
Since the DAI began over six years ago, digital preservation best practices have emerged for the various media types. JPEG 2000 (lossless) has emerged as the preferred long-term preservation format for images and, similarly, PDF/A is preferred for text. Where possible, we are attempting to standardize and normalize all of the file formats associated with each media type in order to make it easier to plan for long-term preservation. In all cases we will preserve the original file format in addition to converting it to a normalized format and, where necessary, to the access format of the day.

For example, our analogue-to-digital video conversion is based on .vob DVD file containers. The .vob files were originally converted to .wmv for access and are now being converted to .mp4 (H.264) because of its better performance for web delivery. In addition, .mp4 (H.264) access and playback is supported by all three primary user access platforms (Windows, Mac and iPad). The iPad, introduced in April 2010, supports neither the .wmv nor the .swf (Flash) video format. For devices that do not support HTML5 with .mp4 (H.264) or .mp3, the DAI returns a .swf (Flash) file (see Supported DAI Video/Audio Formats). Over time, we will replace the remaining .wmv files with .mp4 (H.264) for access and also convert the original .vob files (if possible) to MXF Motion JPEG 2000 for long-term preservation. The original .vob files, which are stored on DVDs, will be copied online to the archive store to guard against media deterioration and obsolescence of the DVDs. A similar approach will be taken for the multiple audio formats. All of these decisions are documented in the DAI Media Type Preservation Plan.
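The fallback rule described above (HTML5 with .mp4 (H.264) or .mp3 where supported, otherwise .swf) can be sketched as a small selection function. This is only an illustration of the rule as stated; how the DAI actually detects device capabilities is not described here, so the `ClientCapabilities` type and the `choose_access_format` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientCapabilities:
    """Hypothetical description of what a requesting device can play."""
    html5_h264: bool  # can play .mp4 (H.264) through HTML5 video
    html5_mp3: bool   # can play .mp3 through HTML5 audio


def choose_access_format(media_type: str, client: ClientCapabilities) -> str:
    """Return the access file extension to serve for a video or audio object."""
    if media_type == "video":
        # Prefer .mp4 (H.264); fall back to Flash when HTML5 video is unavailable.
        return ".mp4" if client.html5_h264 else ".swf"
    if media_type == "audio":
        # Prefer .mp3; fall back to Flash when HTML5 audio is unavailable.
        return ".mp3" if client.html5_mp3 else ".swf"
    raise ValueError(f"unsupported media type: {media_type!r}")
```

For instance, a device modelled as `ClientCapabilities(html5_h264=True, html5_mp3=True)` would be served .mp4 for video, while a browser without HTML5 media support would receive the .swf fallback for both video and audio.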

Work Item: Convert all PDFs to PDF/A. (Completed)
Work Item: Copy all .vob files on DVD to multiple online locations. (Completed)
Work Item: Convert all .vob files to MXF Motion JPEG 2000 (if possible).
Work Item: Create Archival Information Packages (AIPs) using the Library of Congress BagIt tool for all digital objects.

2. Pre-Ingest

a. Digital Format for Memorial University of Newfoundland Electronic Theses and Dissertations E-Theses Project (in progress)
b. Preservation of Original Digital Content (in progress)

i. Media Migration of Digitized Analogue Videos on DVD (in progress)

c. Add Pre-ingest Checksum Service For Content Providers (in progress)

3. Ingest

a. Digital Preservation Format Policy for Submission, Archiving and Dissemination (in progress)
b. Instructions for Creating Long-Term Preservation Copies of Original Word and PowerPoint Documents