From scan to delivery: Special Collections and IIPImage at Utrecht University Library

Guest Post by Edu Hackenitz, University Library Utrecht

The Utrecht University Library Special Collections contain many extensive collections of manuscripts, pre-1901 printed works, more recent rare and valuable printed works, maps and nautical charts. The library takes care of the acquisition, conservation, scanning, cataloging and availability of this material.


Yes, as hard-labouring men we do our own daily scan jobs and do not use services from Google Books.

Project Goals

Main goals of our project:

  • redefining our existing scan workflow and converting it into an automated process, from scanning to delivery to the various public web interfaces.
  • storage in our repository for long-term preservation of all objects, their metadata and OCR full text.
  • an attractive public portal site where our curators could easily add or update content.
  • a separate ‘bookreader’ where every object can be browsed or viewed including the OCR, annotations, metadata and derived PDFs.
  • a separate high performance ‘viewer’ for viewing and zooming our high-resolution images.
  • must work on modern browsers but also on IE7 (ouch…) or higher, and on tablet devices as well. So Flash must not be required.
  • reader, viewer and portal are considered standalone elements, but they are linked together by the DSpace repository identifier (handle) and are also resolvable to our Aleph (Ex Libris) catalog system.
  • multilingual, open source, well performing
  • can be achieved within a year by a small team of developers.

Wow, that sounds pretty ambitious and seems quite a challenge.

Architecture

A few years ago we already built our custom ‘scan production line’, invoking ABBYY FineReader software for OCR and using DSpace as repository software. All administrative and bibliographic metadata, as well as all high-resolution scanned images and OCR, are loaded into the DSpace repository. So this part was merely a rewriting job to bring it up to the latest state of technology.

We decided not to build the public portal ourselves. There are enough open source alternatives, and we selected Drupal as a highly customizable framework or CMS. It has a learning curve for developers, but we are happy with it. Also, Drupal is a very good friend to those who are not very skilled with computers and must add content in multiple languages on a regular basis. So far so good, and no rocket science involved yet.

Next was the selection of a bookreader. We chose the Open Library Bookreader. Considering the amount of time spent on customizing and bug fixing (mostly for IE7), next time we would rather build the bookreader ourselves.

And then the ‘wow factor’, the ‘icing on the cake’, the ‘pièce de résistance’: ‘THE’ viewer.

In the selection phase the usual suspects came along: Zoomify, Microsoft stuff based on DeepZoom, LizardTech, OpenZoom, a lot of JavaScript solutions and of course IIPImage. Although many libraries and archives use Zoomify, we did not think it would fit into our concepts. Neither did Microsoft. We had used LizardTech software for some years, and if you were lucky enough you could get an image to zoom in. Most of the time it just crashed.

We were very impressed by the flexible architecture and simplicity of IIPImage and it ticked most of our boxes.

At the start we had to overcome the problem that the source code was not that easy to compile on Sun Solaris 9 and 10, but with the help of Ruven at IIPImage suddenly a neat Solaris binary popped out of Pandora’s box. It’s amazing and quite a luxury to put a question in the help forum and get a correct answer within minutes. So, once compiled, the IIPImage server is up and running within minutes.
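A quick way to check a freshly started IIPImage server is to ask it for a JPEG rendition of one image. The sketch below only builds such a request URL. The server address and image path are hypothetical placeholders, not values from our setup; FIF, WID and CVT are commands from the IIP protocol.

```python
from urllib.parse import urlencode


def iip_cvt_url(base_url: str, image_path: str, width: int) -> str:
    """Build an IIP request for a JPEG rendition `width` pixels wide."""
    # Order matters in IIP: FIF names the image first, CVT (the output
    # command) comes last.
    params = [("FIF", image_path), ("WID", str(width)), ("CVT", "jpeg")]
    # Keep "/" unescaped so the image path stays readable.
    return base_url + "?" + urlencode(params, safe="/")


# Hypothetical server address and image path, for illustration only.
print(iip_cvt_url("http://localhost/fcgi-bin/iipsrv.fcgi",
                  "/data/scans/example.jp2", 800))
```

Pasting the resulting URL into a browser should return an image if the server and the path are configured correctly.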

Images

The next step was automating the conversion of existing and daily scans to tiled pyramidal TIFFs. We already used ImageMagick, so we started a test batch converting TIFFs, each approx. 100G in size. Woooeeps…that took a long time, and it would probably take a year or more to convert the whole set. Alas, switching to VIPS made no significant difference. This was a major show stopper and we decided to go for another option: convert to JPEG2000 based on Kakadu software.
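To give an idea of what a tiled pyramid involves, here is a minimal sketch that lists the resolution levels and tile counts for one image. It assumes 256 pixel tiles and halving until the image fits in a single tile, which is a common layout but not necessarily the exact one our converters produced. The image size is made up.

```python
import math


def pyramid_levels(width, height, tile=256):
    """Return (width, height, tile_count) per level, full size first."""
    levels = []
    while True:
        tiles = math.ceil(width / tile) * math.ceil(height / tile)
        levels.append((width, height, tiles))
        if width <= tile and height <= tile:
            break  # the whole image now fits in one tile
        # Each next level halves both dimensions, rounding up.
        width = math.ceil(width / 2)
        height = math.ceil(height / 2)
    return levels


# Hypothetical scan of 10000 x 8000 pixels.
for w, h, n in pyramid_levels(10000, 8000):
    print(f"{w} x {h}: {n} tiles")
```

Every level has to be resampled, cut into tiles and written out, for each scan in the collection.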

Can we take software seriously with a name like that? YES WE CAN!

The performance improved spectacularly and with the Kakadu software the project was on the safe side again. The Kakadu software to convert TIFF to JPEG2000 is free, the part to decode it in the IIPImage server is not free but it is cheap. It took us quite a while to get in touch with the guys at Kakadu and obtain the software although we were more than willing to pay. But then again, Australia is down under. We did some Didgeridoo workshops to honour the Australians and within a few weeks we converted the whole set of 1,361,067 TIFFs to JPEG2000 using the following encoding parameters:

-rate 0.5 Clayers=1 Clevels=7 Cprecincts={256,256},{256,256},{256,256},{128,128},{128,128},{64,64},{64,64},{32,32},{16,16} Corder=RPCL ORGgen_plt=yes ORGtparts=R Cblk={64,64} Cuse_sop=yes -no_palette
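For batch use, the parameters above can be wrapped in a small script. The following is a sketch only: it assembles a kdu_compress command line with Kakadu's -i and -o options and the parameters listed above, but does not run it. The file names are hypothetical and this is not our production script.

```python
import shlex

# Encoding parameters as listed above, one argument per list item.
KDU_PARAMS = [
    "-rate", "0.5",
    "Clayers=1",
    "Clevels=7",
    "Cprecincts={256,256},{256,256},{256,256},{128,128},{128,128},"
    "{64,64},{64,64},{32,32},{16,16}",
    "Corder=RPCL",
    "ORGgen_plt=yes",
    "ORGtparts=R",
    "Cblk={64,64}",
    "Cuse_sop=yes",
    "-no_palette",
]


def kdu_compress_command(src: str, dst: str) -> list:
    """Return the argument list for converting one TIFF to JPEG2000."""
    return ["kdu_compress", "-i", src, "-o", dst, *KDU_PARAMS]


# Hypothetical file names, for illustration only.
cmd = kdu_compress_command("scan_0001.tif", "scan_0001.jp2")
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) without a shell keeps the curly braces in Cprecincts and Cblk from being expanded by the shell.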

Finally we shaped the final web applications in PHP, using REST-based services to communicate with the different subsystems and get all the info needed to display the content. A major advantage of the IIPImage concept is its flexibility on the client side. We decided to build a simple detection layer which routes the client to the IIPImage Flash version (if Flash is available) or the JavaScript Mooviewer. So every device gets the content it deserves. Without effort we connected a Zoomify client, but it didn’t add any significant functionality whatsoever, so we stuck to the default IIPImage Flash client.
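The routing decision itself is tiny. As an illustration only (our applications are written in PHP, and the URL layout and handle value below are hypothetical), a detection layer along these lines could look like this in Python:

```python
from urllib.parse import urlencode


def choose_viewer(has_flash: bool, handle: str) -> str:
    """Pick a viewer URL for an object, based on whether Flash is available."""
    # Flash-capable clients get the Flash client, everything else
    # (tablets, for example) falls back to the JavaScript Mooviewer.
    viewer = "flash" if has_flash else "mooviewer"
    # Hypothetical URL layout; the handle identifies the object.
    return f"/viewer/{viewer}?" + urlencode({"handle": handle})


print(choose_viewer(True, "12345/678"))
print(choose_viewer(False, "12345/678"))
```

Because both clients talk to the same IIPImage server, only the front end changes per device. The images are stored and served once.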

For over half a year now we have used IIPImage and it has proven to be stable, simple to use and fast as lightning. Moreover, it is very low maintenance, so the IT department can sit back and relax. It is open source and the maintainers deliver outstanding support. What more is there to be said but that this is not a paid advertisement!

Some highlights from the University Library Utrecht

Manuscript Egmond-Brevier [approx. 1535-1440]

View in Flash

View in Mooviewer

View in Bookreader

Most delicate part of Vande Spieghel zeevaerdt, Vande navigation of the Western Sea… [1584]

View in Flash

View in Mooviewer

View in Bookreader

Imagination of the Christian constellations in the ‘Harmonia Macrocosmica’ by Andreas Cellarius (second part), [1661]

View in Flash

View in Mooviewer

View in Bookreader

Pascaerte of all the Zeecusten Europe, Cornelis Doedsz [around 1620]

View in Flash

View in Mooviewer

View in Bookreader

A map for plundering a treasure fleet!

View in Flash

View in Mooviewer

View in Bookreader

You may also like to visit our public Portal Special Collection at http://bc.library.uu.nl/ or just visit our beautiful city.

Redefining the Special Collections was implemented by Edu Hackenitz, Marina Muilwijk, Martin van Luijt and Jozsef Kiraly at University Library Utrecht.
