Sunday, 29 September 2013

Digitisation workshop

I spent a morning last week at a digitisation workshop hosted by the Wellcome Trust, as part of their Wellcome Digital work. It was in the rather smart new building on Euston Road...

... and consisted of three presentations by Christy Henshaw, Dave Thompson and Matthew Brack, followed by a trip upstairs to see the digitisation studios and equipment. Christy's presentation is available online, as are Dave's [opens a pdf] and Matthew's.

The Wellcome has just finished the first of two three-year phases of its digitisation programme. Much of the work so far has been on a project named Codebreakers, about the history of genetics. Most of it has been carried out on site, with some done by contractors, and with up to eight full- and part-time staff supporting the project alongside three full-time photographers, using a variety of camera and scanner set-ups.

The presentations took us through the digitisation process, image processing and metadata (including copyright and access levels, as well as granular access and creating a structure to help navigate within an item).

They discussed the software used: SDB (Safety Deposit Box, which acts as a gateway to securely stored content and automatically creates administrative metadata about the images as they are ingested into the repository); Player (a custom-built way of displaying digitised content, due to be released as open source by the end of 2013); and Goobi (open source software for managing workflows in digitisation projects).

And metadata: administrative metadata (created automatically by SDB, above) and descriptive metadata (ISAD(G) for archives, MARC for bibliographic records, converted to XML, which then becomes MODS (Metadata Object Description Schema) once it's in the METS (Metadata Encoding and Transmission Standard) file). Still with me? There are lots of helpful explanations in Dave's presentation, above...
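To make the MODS-inside-METS idea a bit more concrete, here is a minimal sketch of my own in Python. It is not the Wellcome's actual workflow or output: the function name, the title and the ID are made up for illustration, and a real METS file carries much more (a structural map and file section, for a start). It only shows how a descriptive MODS record sits inside a METS wrapper.

```python
import xml.etree.ElementTree as ET

# Published namespace URIs for the two standards.
METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def build_mets(title: str, record_id: str) -> ET.Element:
    """Wrap a one-field MODS record in a METS descriptive metadata section.

    Hypothetical helper for illustration only; the result is not a
    complete, schema-valid METS document.
    """
    mets = ET.Element(f"{{{METS_NS}}}mets")
    # dmdSec holds descriptive metadata; mdWrap says which scheme is inside.
    dmd_sec = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": record_id})
    md_wrap = ET.SubElement(dmd_sec, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")
    # The MODS record itself, reduced here to just a title.
    mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    return mets


# Made-up title and ID, purely as an example.
record = build_mets("Example notebook", "DMD_0001")
print(ET.tostring(record, encoding="unicode"))
```

The point of the wrapper is that one METS file can carry the descriptive record alongside the administrative and structural information for the same item.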

And formats:
JPEG2000, because it can be compressed: there simply wasn't enough room to store everything as TIFF. JPEGs are created on the fly as they are needed. They also use PDF, MPEG2 and MP3.
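A rough back-of-envelope sketch shows why storage drives the choice of format. The image dimensions, image count and compression ratio below are all my own hypothetical figures, not numbers from the workshop, and the function names are made up for illustration.

```python
def uncompressed_bytes(width_px: int, height_px: int,
                       channels: int = 3, bits_per_channel: int = 8) -> int:
    """Size of one uncompressed raster image, ignoring file headers."""
    return width_px * height_px * channels * bits_per_channel // 8


def estimated_storage_gb(image_count: int, bytes_per_image: int,
                         compression_ratio: float = 1.0) -> float:
    """Total storage in gigabytes (10**9 bytes) for a collection of images."""
    return image_count * bytes_per_image / compression_ratio / 1e9


# Hypothetical figures only: a 5000 x 7000 pixel, 24-bit colour image.
per_image = uncompressed_bytes(5000, 7000)

# Hypothetical collection of one million images, stored uncompressed ...
uncompressed_total = estimated_storage_gb(1_000_000, per_image)
# ... and with an assumed 10:1 compression ratio.
compressed_total = estimated_storage_gb(1_000_000, per_image, compression_ratio=10.0)

print(f"One image uncompressed: {per_image / 1e6:.0f} MB")
print(f"Collection uncompressed: {uncompressed_total:,.0f} GB")
print(f"Collection at 10:1: {compressed_total:,.0f} GB")
```

Whatever the real figures are, the shape of the sum is the same: storage scales with the number of images, so any compression ratio is multiplied across the whole collection.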

The particularly useful points I came away with were:
  • Metadata, metadata, metadata. There's no point digitising material unless you've already catalogued it. Without metadata, digital objects might as well not exist, as you can't search for them. 50% of Wellcome digitisation project time is spent on cataloguing and metadata. Digitisation is an end-to-end process bringing together objects and metadata; it isn't just about putting books under cameras.
  • Plan! Have a Data Management Plan. What will happen if it all goes wrong? (There was a great egg/custard analogy at this point).
  • Include QA - how do you know that everything has been done to the right standard otherwise?
  • Document your processes and decisions so that other people know what you've done, and you also know when you come back to it for a future project.
  • Share what you've learnt so that others can learn from your mistakes.
  • The actual physical imaging is a very small part of the programme, and it comes last.
  • Bear conservation in mind - most damage to items happens through handling, and digitisation tends to put different handling stresses on items from normal use. Many items will need conservation work before they can be digitised, so factor it into the workflow.
  • Copyright - do a rights risk assessment.
We were allowed to take photos in the Wellcome's studios. This is a copy stand set up for digitising books - the glass plate raises and lowers once the book is underneath. It is from ICAM.

This is a similar copy stand, but used for digitising flat objects, such as archives.

This is a copy stand suitable for use with books that can't be opened sufficiently for the other one to be used. It allows the book to be supported at different angles so that images can be taken.

I had a fantastic morning and learnt a lot. It was also good to have a chance to chat with the other attendees about their digitisation plans (whilst enjoying some rather tasty cookies). My thanks to Christy, Matthew and Dave (and the people upstairs in the studios) for such an informative morning.

Dave's presentation linked to above includes suggested further reading. Christy has also got another presentation online, on digitisation workflows.

1 comment:

  1. This post has been extremely useful to me, thank you for taking the time to write up your session so thoroughly!