
DigiTool: The dawn of a new ERA

Paper by Libero Parisotto, Systems and Technology Librarian and Kandy-Jane Henderson, JCPML Archivist

Presented at the 14th ICAU/SMUG Meetings, Vienna, 24 September 2003.

The new Electronic Research Archive (ERA) was successfully launched on the 18th of August 2003 at the John Curtin Prime Ministerial Library, located at Curtin University of Technology in Perth, Western Australia. It is currently the only website worldwide that is actively using DigiTool version 2. This implementation was based on migrating data from the existing ERA system, which has been in production since 1998.

The new features of DigiTool version 2, including full text searching and the display of hierarchical record structures, have allowed us to continue providing a sophisticated research tool for our users. In addition, we now have a single, flexible and easy-to-use product that not only provides access to our archival resources but also supports the efficient management of those resources.

John Curtin Prime Ministerial Library (JCPML)

JCPML was established by Curtin University of Technology to honour the contribution to Australia of its wartime prime minister and international statesman, John Curtin, whose inspirational leadership and political courage unified Australia during the grim years of World War 2. It has been open to the public since 1998.

It is Australia’s first prime ministerial library and its role is to develop and manage a unique archival collection of scholarly resources focusing on the life and times of John Curtin. The JCPML research collection includes personal papers, oral histories, photographs, moving pictures, sound recordings, copies of official records and other archival material. Nearly 40,000 digital files can now be accessed through ERA. Digital objects include:

  • Images (e.g. a photograph of John Curtin at Buckingham Palace with the King and the Dominion prime ministers in 1944, during his only overseas trip as Prime Minister)
  • Full text (the contents of documents, e.g. the searchable text of Westralian Worker editorials, giving enhanced access to these writings)
  • Audio and video (oral histories, sound recordings, motion pictures, e.g. a speech by John Curtin – sound and text)

 

Early Digitisation Work at JCPML

The driving force behind the development of ERA was the premise that Australia’s first prime ministerial library would be not just a repository holding substantial original records, but an electronic gateway giving people access to John Curtin-related material held in its own collection and in other collections in Australia and around the world.

The requirements were therefore to develop a system infrastructure that allowed records to be scanned and stored locally, and that allowed linking to records held in other collections and accessible as images via the web. OCR facilities were also required. The most critical and most expensive decision we had to make while developing ERA in 1997 was the selection of software providing the requisite search and retrieval capabilities. The system initially selected to store and access the objects was the Electronic Filing System (EFS) from Excalibur Technologies. It was chosen primarily for its search capabilities, especially its "fuzzy" logic, which eliminates the need for 100% clean text for searching purposes, and for its ability to store data in a hierarchical structure.
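Excalibur’s "fuzzy" matching is proprietary and the paper does not describe how it works. As a hedged illustration of the general idea only, the sketch below tolerates OCR errors by accepting any word within a small edit (Levenshtein) distance of the query. The function names and the max_edits threshold are assumptions, not EFS behaviour.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_find(query: str, text: str, max_edits: int = 1) -> list[str]:
    """Return the words of text within max_edits of query, ignoring case.

    Hypothetical helper: a word-level match like this tolerates small OCR
    errors such as 'Curtln' for 'Curtin'.
    """
    q = query.lower()
    return [w for w in text.split() if levenshtein(q, w.lower()) <= max_edits]
```

With the OCR error "Curtln" in the text "Mr Curtln addressed the Parliarnent", fuzzy_find("Curtin", ...) still finds the word, where an exact search would find nothing; "Parliarnent" needs max_edits=2, since "m" was misread as "rn".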

The system was later successfully migrated to RetrievalWare software from the same company. RetrievalWare offered a number of options when digitising, such as producing an image, image and text, or text only, and provided a more stable web environment for our finding aids. However, RetrievalWare’s database maintenance facilities were less suited to our needs than the corresponding facilities for the EFS database. To get the best of both worlds, we had to adopt the less-than-satisfactory procedure of using an EFS database for database management and loading a weekly copy of that database into RetrievalWare as the enquiry database. On 9 February 1999, JCPML unveiled its Electronic Research Archive (ERA) to the public for the first time.

In addition to the access and storage of metadata and objects available through ERA, a separate software application was used to manage the collection. This Australian product, Archive Manager, provided physical and intellectual control of the collection by allowing archive staff to create finding aid documentation, including creator, series and item information.

The Road to DigiTool

The increasing maintenance effort required to sustain our dual EFS/RetrievalWare databases led to some soul-searching. At the same time, we were concerned about the future direction of the vendor of our main University library management system. Ex Libris’ Premier Partnership offer to work together to ensure the cost-effective delivery of scholarly information services was not only timely but well suited to our needs.

Curtin University’s Library and Information Service (LIS) Premier Partnership agreement with Ex Libris, signed in early 2001, led to the implementation of Aleph, MetaLib and SFX by the 1st of July 2002. It also provided the opportunity to develop DigiTool, a Digital Asset Management System (DAMS), as a commercially viable module that could be used not only at JCPML but also in other libraries and collecting institutions that have archival collections. A key element in all these developments has been the technical support provided by the LIS Systems and Technology Unit.

In its early days, version 1 of DigiTool was predominantly an extension of Aleph. Its main components were the Aleph cataloguing library, a Media library and an Object Repository (see Diagram 1). The Media library contained the object metadata, which was made up of a descriptive metadata record and a Z403 Oracle record. The Object Repository could hold the object in an Oracle table, as a file, or as a pointer to a URL.

Diagram 1 Structure of DigiTool version 1 (Ex Libris)

In preparation for the system migration, JCPML started to analyse the metadata from its two systems and concurrently investigated the functionality of DigiTool version 1. The fact that we had implemented Aleph 15.2 in July 2002 stood us in good stead in understanding how DigiTool functioned.

JCPML was concerned to ensure that its records were presented in a way consistent with international and national standards of archival description. Because DigiTool does not support EAD (Encoded Archival Description), JCPML took on the task of mapping its record metadata to EAD and then matching these elements as closely as possible to USMARC, which DigiTool does support.
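The mapping itself is not reproduced in the paper. The Python sketch below shows the general shape of such a crosswalk, assuming a simplified, namespace-free EAD <did> fragment. The EAD_TO_MARC table is hypothetical and is modelled only on the fields visible in the USMARC record in Diagram 6 (092, 245, 260, 700); the <physdesc> value in the sample is invented to show an element with no mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk: EAD element name -> (USMARC tag, subfield code).
# Illustrative only; modelled on the fields visible in Diagram 6.
EAD_TO_MARC = {
    "unitid": ("092", "a"),     # local reference code, e.g. SER0156
    "unittitle": ("245", "a"),  # title
    "unitdate": ("260", "c"),   # date range
    "persname": ("700", "a"),   # personal name of the creator
}


def crosswalk(ead_did: str) -> tuple[list[tuple[str, str, str]], list[str]]:
    """Map the leaf elements of an EAD <did> fragment to (tag, subfield, value).

    Returns the mapped fields sorted by tag, plus the names of leaf elements
    that have no USMARC equivalent in the table and need manual review.
    """
    root = ET.fromstring(ead_did)
    mapped, unmapped = [], []
    for elem in root.iter():
        if elem is root or len(elem) > 0:
            continue  # skip the wrapper and containers such as <origination>
        value = (elem.text or "").strip()
        if elem.tag in EAD_TO_MARC:
            tag, code = EAD_TO_MARC[elem.tag]
            mapped.append((tag, code, value))
        else:
            unmapped.append(elem.tag)
    return sorted(mapped), unmapped


# Hypothetical sample; values taken from the record shown in Diagram 6.
SAMPLE = """<did>
  <unitid>SER0156</unitid>
  <unittitle>Personal possessions of John Curtin.</unittitle>
  <unitdate>1924-1945.</unitdate>
  <origination><persname>Curtin, John.</persname></origination>
  <physdesc>Invented example value</physdesc>
</did>"""
```

Elements returned in the second list, such as <physdesc> here, are the ones for which a USMARC equivalent would still have to be chosen by hand.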

After loading our merged and converted data in early December, and comparing our original specifications (Attachment 1) with this version of DigiTool, JCPML identified a number of additional functional requirements, including:

  • Fully developed hierarchical functionality (i.e. four levels are required to accommodate the creator, series, file and item relationships)
  • Full text searching
  • Highlighting of search terms
  • Display of records based on pre-determined ranking
  • Ability to view objects in a multi-page JPEG format, i.e. a multi-page viewer that allows the user to view either the image or the associated text file where both exist [some documents contain only an image or only a text file]
  • Pattern recognition of text, e.g. fuzzy logic
  • Next/Previous navigation buttons on the full OPAC display
  • Display multiple digital formats attached to a single file or item record

While two US libraries, namely Brandeis University and the University of Maryland, decided to go live with version 1, JCPML felt that it could not undertake a system migration that would result in a loss of functionality. At this time, the JCPML system offered full text searching, archival finding aids and a hierarchical record structure, none of which were included in version 1. The University of Maryland did acknowledge that DigiTool would be improved by the addition of certain features not present in version 1.

With STP (switch to production) originally scheduled for March 2003, it was evident that even with the flurry of work Ex Libris conducted during the earlier part of 2003, there was no way this deadline could be met. In conjunction with Ex Libris, a more realistic STP date of 1 July 2003 was agreed upon.

The development work carried out in the ensuing months involved both Ex Libris, from an application enhancement point of view, and Curtin Library, from a data extraction and testing point of view. The tasks required to bring the product up to our level of expectation were also identified and prioritised and, in the case of the image viewer, carried out by us. In the midst of all this work, Yossi Tissona from Ex Libris found the time to keep us up to date with developments via InterWise training sessions, which gave us the opportunity to comment on and influence these developments.

Version 2 of DigiTool

In early May, version 2 of DigiTool, which was being developed and tested at the Ex Libris site, was ready to be installed on our server. In addition, a full data load with the real links to objects and URLs was also supplied.

Yossi Tissona flew in from Israel to provide us with face-to-face training at JCPML on the 12th and 13th of May. Whilst on site, Yossi completed the installation by coordinating the conversion and data load on our server with Ex Libris in Israel. The configuration tables were also changed so that the indexes reflected our needs.

Yossi provided general training, and the cataloguing GUI training was then provided by Library and Information Service staff experienced in using the Aleph Cataloguing module. Yossi’s visit was most useful in identifying gaps in the product’s functionality and in clarifying for Ex Libris exactly what JCPML required for adequate intellectual control of its records via the GUI and for client access via the OPAC.

New Structure

The diagram below illustrates how the product has matured. The Aleph cataloguing and Media libraries have been replaced by a fully integrated DigiTool application consisting of one library made up of two components:

  • Descriptive Metadata Record (Z00): contains the bibliographic description of either the hierarchical record or the digital object.
  • Technical Metadata Record (Z403): contains the details of the digital object, including copyright and control information.

Diagram 2 New structure for DigiTool version 2 (Ex Libris)

Hierarchy

In this new version, Ex Libris has been able to provide a multi-tier hierarchical record structure. The hierarchy caters for the Creator (grandparent), Series (parent), File (child) and Item (grandchild) levels needed to show the relationships between JCPML records. However, only three tiers can be displayed at a time in the OPAC, for instance grandparent-parent-child or parent-child-grandchild. While JCPML would have preferred a clearer display of all four tiers, the outcome did satisfy our requirements (see Attachment 2).

This hierarchy was achieved using MARC 774 (Constituent Unit Entry) tags (see Diagram 3 below). New York University has also put this feature to good use to express the relationships between the series and sub-series in its collection.
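To make the linking concrete, the sketch below models each record’s 774 $w subfields as a list of child system numbers and derives the tiers shown around a record. It assumes the OPAC displays the viewed record’s parent, the record itself and its children, which reproduces the grandparent-parent-child and parent-child-grandchild examples above. The Record class and the system numbers are hypothetical; the titles come from the Australia Post example in Attachment 2.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    sys_no: str
    level: str
    title: str
    children: list[str] = field(default_factory=list)  # 774 $w system numbers


def parent_of(records: dict[str, Record]) -> dict[str, str]:
    """Invert the downward 774 links into a child -> parent lookup."""
    return {child: rec.sys_no for rec in records.values() for child in rec.children}


def display_tiers(records: dict[str, Record], sys_no: str) -> list[list[str]]:
    """Titles for at most three tiers: parent (if any), the record, its children."""
    parents = parent_of(records)
    rec = records[sys_no]
    tiers = []
    if sys_no in parents:
        tiers.append([records[parents[sys_no]].title])
    tiers.append([rec.title])
    if rec.children:
        tiers.append([records[c].title for c in rec.children])
    return tiers


# Hypothetical system numbers; titles from the Australia Post example.
RECORDS = {
    "1": Record("1", "Creator", "Australia Post", ["2"]),
    "2": Record("2", "Series", "Philatelic items published by Australia Post", ["3"]),
    "3": Record("3", "File", "First day covers commemorating the birth "
                "centenaries of Curtin & Chifley, 1985", ["4"]),
    "4": Record("4", "Item", 'FDC "Carried on Special Flight" with Curtin stamp, '
                "Creswick & Perth postmarks, 25 January 1985"),
}
```

Under this assumption, viewing the series gives creator-series-file (grandparent-parent-child), and viewing the file gives series-file-item (parent-child-grandchild).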

Diagram 3 Linking between records

Full text Searching

In the Basic Search, all fields and the full text are searched concurrently. In the Advanced Search, a drop-down box lets the user search either the full text or all fields.
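A minimal sketch of the two search modes, assuming each record carries a set of descriptive metadata fields plus optional OCR text. The Doc class, the scope names and the simple substring matching are illustrative assumptions, not DigiTool’s implementation.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    sys_no: str
    fields: dict[str, str]  # descriptive metadata, e.g. title, creator
    full_text: str = ""     # OCR text; empty when only an image exists


def search(docs: list[Doc], term: str, scope: str = "basic") -> list[str]:
    """Return the system numbers of matching docs.

    scope="basic" searches metadata fields and full text together, like the
    Basic Search; "fields" or "fulltext" restricts the search, like the
    Advanced Search drop-down box.
    """
    if scope not in ("basic", "fields", "fulltext"):
        raise ValueError(f"unknown scope: {scope}")
    t = term.lower()
    hits = []
    for doc in docs:
        in_fields = any(t in value.lower() for value in doc.fields.values())
        in_text = t in doc.full_text.lower()
        matched = {"basic": in_fields or in_text,
                   "fields": in_fields,
                   "fulltext": in_text}[scope]
        if matched:
            hits.append(doc.sys_no)
    return hits
```

A term found only in OCR text is returned by the Basic Search and by a full text Advanced Search, but not by an Advanced Search restricted to fields.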

Highlighting

Highlighting of search terms has been achieved. Because some of our documents are quite long (one of the longest runs to 301 pages), it is important that highlighting leads the user easily to the section of text containing the searched keyword or phrase.
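A hedged sketch of term highlighting, together with a helper that finds the first page containing a hit so that a long multi-page document can be opened at the right place. The function names and the <b> markup are assumptions, not DigiTool’s actual output.

```python
import re
from typing import Optional


def _pattern(terms: list[str]) -> re.Pattern:
    """Case-insensitive whole-word pattern; longer terms first so phrases win."""
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def highlight(text: str, terms: list[str], before: str = "<b>", after: str = "</b>") -> str:
    """Wrap every occurrence of any search term, keeping its original case."""
    if not terms:
        return text
    return _pattern(terms).sub(lambda m: before + m.group(0) + after, text)


def first_hit_page(pages: list[str], terms: list[str]) -> Optional[int]:
    """Return the 1-based number of the first page containing any term, or None."""
    if not terms:
        return None
    pattern = _pattern(terms)
    for number, page in enumerate(pages, start=1):
        if pattern.search(page):
            return number
    return None
```

Sorting the alternatives longest first means a phrase such as "war cabinet" is highlighted as a whole rather than only its first word.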

Image Viewer

This is a program, designed and written at Curtin LIS, that converts the high-precision TIF images stored in the database to JPEG images ‘on the fly’ when the user views them. This approach satisfies the JCPML requirement to store high-precision images while avoiding the difficulty of providing a TIF viewer to our users. The conversion is fast enough that users are not even aware it is occurring in the background when they view an image.
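The Curtin LIS program itself is not reproduced here. The following is a minimal sketch of the same idea using the Pillow imaging library, assuming the stored TIFF is available as bytes and the returned JPEG bytes are sent straight to the browser. The function name and the quality setting of 85 are assumptions.

```python
from io import BytesIO

from PIL import Image


def tiff_page_to_jpeg(tiff_bytes: bytes, page: int = 0, quality: int = 85) -> bytes:
    """Convert one page of a stored TIFF into JPEG bytes for web display."""
    with Image.open(BytesIO(tiff_bytes)) as img:
        img.seek(page)             # select the requested page of a multi-page TIFF
        rgb = img.convert("RGB")   # JPEG cannot store 1-bit or alpha modes
    out = BytesIO()
    rgb.save(out, format="JPEG", quality=quality)
    return out.getvalue()
```

Because only the page being viewed is converted, and only when it is requested, the archival TIFF remains the single stored master copy.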

Conclusions

It has been a long and tortuous road, to say the least; however, DigiTool version 2 was sufficiently sophisticated and robust for us to go live on the 18th of August 2003. The product still lacks some desired features, including ranking, the Basket (available in version 1) and Next/Previous navigation buttons. However, the expectation is that these features will be available soon. At present, we have a working Digital Asset Management System (DAMS) that allows us to manage:

  • Digital assets (digital objects)
  • The metadata associated with those digital assets
  • Access to those digital assets.

However, just as importantly from our perspective, DigiTool is a single tool that performs double duty:

  • Management tool
  • Resource access tool

The success of DigiTool version 2 as implemented at JCPML reflects the amount of work put in by the JCPML team, the Library and Information Service’s Systems and Technology Unit and the Ex Libris team. DigiTool provides very intuitive search and display interfaces to the content of ERA. The system is open and allows easy linking to and from digital objects held in other systems. It was with a sense of satisfaction and achievement that we heralded the advent of a new ERA. (see Diagram 4)

Diagram 4 The new ERA

ATTACHMENT 1 Preliminary specifications for a new ERA

International standards such as:

  • Dublin Core (http://www.lub.lu.se/cgi-bin/nmdc.pl),
  • Australian Government Locator Service (http://www.aa.gov.au/recordkeeping/gov_online/agls/summary.html),
  • Encoded Archival Description (http://www.loc.gov/ead/ead.html).

Need for a sophisticated search interface, search functionality and display options:

The JCPML would require a sophisticated and powerful search engine which enabled:

  • phrase searching,
  • wildcard searching,
  • pattern recognition of text as a search option – Pattern searching processes plain English queries but tolerates spelling differences in either the body of the text or the queries; it automatically performs pattern expansion on all query words, up to the number of words you set. This type of search is most effective when you have ‘dirty’ OCR data, when you’re looking for a word with variant spellings, or when you’re looking for a word or phrase but are not sure of the spelling,
  • searching across the full content of text files, including contextual metadata,
  • conventional Boolean searching with operators (AND, OR, NOT, AND NOT, NEAR),
  • choice of searching the whole file room or just subsets of the collection – drop-down menus providing choices of file format, item category, date added (e.g. newly added material might be highlighted), etc.,
  • capability to refine searches/search within material already identified,
  • as well as a search function, the capability to browse a hierarchy to select items (see later point re archival hierarchy) – we would like to be able to ‘go to’ creators or accessions within the hierarchy via accession number and keyword-in-label searching,
  • user-friendly interface with good help pages and hints,
  • online tutorial option for assisting users,
  • sophisticated ranking of results by relevance using criteria such as location of search term, frequency and density of occurrence, etc.,
  • capability to save search results to disc or print them out in a useful format,
  • desired options for displaying search results:
      - highlighting of search terms in the items retrieved, plus an option to ‘show best hit’ in associated text fields of items in the results list,
      - navigation buttons for next/previous pages when viewing multi-page image files,
      - display of multiple file types for each item/page, with the ability to switch between them [as in the EFS multi-page viewer but with all types, to include txt & tiff],
      - capability to create dynamic web pages to display the result of a search, e.g. to present results in desired combinations of text + image, text + audio or video, etc., as chosen by the client.
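The ranking requirement names three criteria: location of the search term, and frequency and density of occurrence. One hypothetical way to combine them into a single score is sketched below. The Hit class and the weights (title_bonus, density_weight) are assumptions chosen for illustration, not values from any product.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    sys_no: str
    title: str
    text: str


def relevance(hit: Hit, term: str, title_bonus: float = 5.0,
              density_weight: float = 100.0) -> float:
    """Combine location (term in title), frequency and density into one score."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    words = len(hit.text.split())
    frequency = len(pattern.findall(hit.text))
    density = frequency / words if words else 0.0  # occurrences per word
    location = title_bonus if pattern.search(hit.title) else 0.0
    return location + frequency + density_weight * density


def rank(hits: list[Hit], term: str) -> list[Hit]:
    """Most relevant first; sorted() is stable, so ties keep their input order."""
    return sorted(hits, key=lambda h: relevance(h, term), reverse=True)
```

With these weights, a short record that mentions the term repeatedly outranks a record that only names it in the title, which in turn outranks a long document with a single passing mention; changing the weights shifts that balance.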


Need for system which allows archival context hierarchy to be included:

  • There exists in Australia a de facto standard for the description of archival records that is acknowledged worldwide. It is essential for any system providing access to archival records to be able to present the context of the records in the format of this established hierarchy. See Attachment 2.
  • The development of a system of description for archival records, based on this de facto Australian standard and using EAD in a seamless way, would be worth exploring.

Need for a user-friendly system to input and edit digital records and maintain security of database:

  • Capability to add and access all file types in a seamless way in ERA – i.e. we want to treat text, images, sound and video files in a consistent way plus have the capability to include other file types as identified in the future,
  • Useful to be able to ‘drag and drop’ files to position as required,
  • Useful to be able to add multiple pages per item in one step, i.e. multi-page files,
  • Need for customized metadata fields and ‘save as default’ capability for metadata entry,
  • Automatic indexing from labels, metadata and content,
  • Hide restricted items from non-staff users,
  • No mandatory requirements for structured subject headings or detailed authority files.
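The requirement to hide restricted items from non-staff users amounts to filtering every result list by the viewer’s role before display. A minimal sketch, assuming a single restricted flag per item; the Item class, the flag and the sample titles are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    sys_no: str
    title: str
    restricted: bool = False  # hypothetical flag set by archive staff


def visible_items(items: list[Item], is_staff: bool) -> list[Item]:
    """Staff see every item; other users see only open (unrestricted) items."""
    return [item for item in items if is_staff or not item.restricted]
```

Applying the filter before results are counted or displayed means the public view never reveals that restricted items exist.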

Ability to personalise/corporatise web pages.

Need for synchronised data conversion from RetrievalWare to DigiTool.

Capability to allow deep linking from and to other institutions’ web databases and ERA.

Need to contribute to local and national initiatives to provide on-line access to descriptions of distributed holdings, for example the Curtin Library Database, National Library of Australia RAAMS database and any future National Archives of Australia initiatives in this area.

New ERA should be able to incorporate the intellectual and physical control system, which is currently contained in an Access database “Archive Manager”. See Attachment 2.

Access control that provides users with appropriate access to open and restricted material.

Statistics gathering capability to know exactly what records and what objects are being used.

Availability of open formats for image, audio and video.

ATTACHMENT 2 JCPML Hierarchical Structure

The hierarchical structure has either three or four levels, depending on the degree of detail needed to adequately describe the records. The levels are:
  1. Creator
  2. Series
  3. File
  4. Item

Example of a 3 level hierarchy for creator Frederick McLaughlin

Creator - Frederick McLaughlin
Series - Personal papers of F A McLaughlin
File - Prime Minister’s visit to England via USA, Itinerary and engagements 1944

Example of a 4 level hierarchy for creator Australia Post

Creator - Australia Post
Series - Philatelic items published by Australia Post
File - First day covers commemorating the birth centenaries of Curtin & Chifley, 1985
Item - FDC "Carried on Special Flight" with Curtin stamp, Creswick & Perth postmarks, 25 January 1985

Each creator has one or more series, each series has one or more files, and if further subdivision is required, each file can have one or more items.
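The nesting rules just stated can be checked mechanically. The sketch below, using hypothetical nested dictionaries, reports any record at the wrong level, any creator or series without children, and any item that has children. The level names and the McLaughlin example come from this attachment; the data layout and function name are assumptions.

```python
LEVELS = ["Creator", "Series", "File", "Item"]


def check_hierarchy(node: dict, depth: int = 0) -> list[str]:
    """Return rule violations for a record tree; an empty list means it conforms."""
    problems = []
    expected = LEVELS[depth]
    title = node["title"]
    if node["level"] != expected:
        problems.append(f"{title}: expected {expected}, found {node['level']}")
    children = node.get("children", [])
    # Creators and series must have children; files may stop at three levels.
    if expected in ("Creator", "Series") and not children:
        problems.append(f"{title}: a {expected} needs at least one {LEVELS[depth + 1]}")
    if expected == "Item" and children:
        problems.append(f"{title}: an Item cannot have children")
    if depth + 1 < len(LEVELS):
        for child in children:
            problems.extend(check_hierarchy(child, depth + 1))
    return problems


# The three-level McLaughlin example from this attachment.
MCLAUGHLIN = {
    "level": "Creator", "title": "Frederick McLaughlin",
    "children": [{
        "level": "Series", "title": "Personal papers of F A McLaughlin",
        "children": [{
            "level": "File",
            "title": "Prime Minister’s visit to England via USA, "
                     "Itinerary and engagements 1944",
        }],
    }],
}
```

check_hierarchy(MCLAUGHLIN) returns an empty list, while a creator recorded with no series is reported as needing at least one Series.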

Diagram 5. Example of the hierarchical record structure in ERA


FMT SR
LDR -----nama-22------a-4500
008 ------m19241945|||-----------------|||-d
0922 |a SER0156
24500 |a Personal possessions of John Curtin.
260 |c 1924-1945.
7001 |a Curtin, John.
7741 |w 000001050
7741 |w 000004992
7741 |w 000004993
900 |t John Curtin Prime Ministerial Library.
910 |t Records of John Curtin.
966 000001156
CAT |c 20030706 |l ERA01 |h 2018
CAT |c 20030706 |l ERA01 |h 2048
CAT |c 20030706 |l ERA01 |h 2148
SYS 000000474

Diagram 6. USMARC record showing 774 tags
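The 774 links in a display such as Diagram 6 can be read back programmatically. The following sketch assumes the display format shown there (a three-character tag, optional indicators, then subfields introduced by |) and that each 774 $w holds the system number of a linked child record. The function names are hypothetical.

```python
def subfields(rest: str) -> list[tuple[str, str]]:
    """Split '1 |w 000001050' (indicators, then subfields) into [('w', '000001050')]."""
    return [(part[0], part[1:].strip()) for part in rest.split("|")[1:] if part]


def sys_and_children(display: str) -> tuple[str, list[str]]:
    """Return the record's SYS number and the $w values of its 774 fields."""
    sys_no, children = "", []
    for line in display.splitlines():
        line = line.strip()
        tag, rest = line[:3], line[3:]
        if tag == "SYS":
            sys_no = rest.strip()
        elif tag == "774":
            children += [value for code, value in subfields(rest) if code == "w"]
    return sys_no, children
```

Applied to the record in Diagram 6, this yields system number 000000474 with three linked records: 000001050, 000004992 and 000004993.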

Diagram 7. Linking an object to a Metadata record