Wednesday, 24 May 2017

Future past: researching archives in the digital age

Last week I took part in this research symposium at the Institute of Historical Research in London. It was a great opportunity to find out what other archives are doing about digitization and born digital records, and how academic users of archives are finding their experience. It was a really interesting day, and my notes go on for pages, so I'm going to attempt to pull out some of the common themes that emerged. There were many opportunities during the day to ask questions, get feedback and talk to others, so my notes are a mixture of points made by the speakers and ideas picked up while networking.


The hashtag was #digfuturepast. The symposium was recorded and the recording should be available soon on the IHR website.


Barriers to using digital material

  • Paying for content. Digitization is expensive but academic users are used to having "free" access to collections (actually paid for by their institution). Yet the digitization has to be paid for somehow, whether through institutions funding it themselves, grant funding or commercial companies providing a paid-for service (eg Ancestry).
  • Making copies available. Gone are the days when a student or academic would come into an archive every day for a week or a month to do their research. Pressures of time mean they want to make the most of a single visit and be able to take copies away with them or download copies to use at home, yet it is impossible to digitize everything, and there are various reasons why copies may not be allowed at all, eg copyright, commercial sensitivity or preservation.
  • Poor documentation and/or OCR mean that researchers can't find what they're looking for. They may miss relevant items in a plethora of search results, or not get the result they need at all. A reliance on keyword searching misses the opportunity to search the collection more widely and loses the connection between archival sources.
  • Lack of a seamless user experience makes it hard to use the material, eg legacy systems, different systems for library and archive material, or systems not optimised for finding archival material.
  • Information literacy issues. We can't always assume that researchers will know how to search in our system, so we need to equip them with the tools to do this. We also need to address the common misconceptions found below.


Misconceptions about online access to archives

    • Any online resource is complete and comprehensive. Many only represent a tiny fraction of an archive's holdings, so how do we alert users to this and encourage them to look beyond the digital? It is impossible to digitize everything because of copyright, limited staff and equipment, the need to have metadata available, problems with storing electronic files and so on.
    • Everything will be catalogued. No, digitizing is not the same as cataloguing. Most (all?!) archives have a cataloguing backlog, and, until the material is catalogued, there is no way to access it. This then gives rise to the question about whether it is better to spend resources digitizing some already catalogued material, or catalogue unlisted material that cannot be used at all yet.
    • The digitized version is just the same as the original. No, frequently this isn't the case and there are users who will still need to see the original. This is also one of the reasons why it is vital never to destroy the original.


    Educating researchers

    Time and again the need to educate researchers came up. It was agreed by all present that this is a vital part of training as a historian and that it should be done as early as possible in an academic career. I was pleased by this as we are already doing several of the suggested activities to encourage researchers to engage with our collections.

    Case studies

    • The archivist from Boots Heritage explained how Boots had moved from an entirely internally focused business archive to one that was available to researchers, thanks to funding from the Wellcome Trust to develop a new digital resource aimed at academic researchers. She had found that getting the right tools was essential, so proper cataloguing software (CALM) had been acquired and material was catalogued to stringent standards to make it helpful and meaningful. This included creating authority files to be a repository of information about buildings, brands and people, and for many researchers these have turned out to be the entry point into the collections. Preservation issues affected the usability of some items, and repackaging them into smaller units greatly improved matters. Care had to be taken to protect Boots' interests, so images are watermarked, downloads are prevented, and commercially sensitive information is not available.
    • Transport for London archives are aiming to collect the evidence that every journey matters, including the digital output of the organisation. They took the opportunity presented by needing to archive born digital material to overhaul and restructure their cataloguing. Although this was resource heavy it has created a more useable catalogue for staff and made it much more available to researchers.
    • Kathleen Chater talked about her research into black people in England in the 18th century and how digitized records hadn't helped her solve research problems such as identifying where "black" didn't refer to a person, or finding those instances where a black person was identified using another term. Keyword searches frequently produced unusable quantities of results. One of the more helpful things she did was spend three months going through 10,000 Old Bailey records on microfilm, which also gave her the helpful context of many other cases (eg how common was it for anyone to be convicted of a particular crime). Although the Old Bailey records have now been digitized, they are difficult to search because of OCR problems (the long s) and context is lost.
    • Jo Pugh, a digital development manager at The National Archives, discussed his PhD research into information journeys in archival collections. He related how the problem now isn't amassing information, but restricting what we see. His research had compared how enquiries are formulated by email, phone or Twitter and had looked at how the experts (archivists) worked with researchers to resolve archival queries. He had found that research guides could help to reduce uncertainty, eg by explaining how to get the best out of a search.
    • Tom Scott from Wellcome Collection explained how the context of their collections isn't just medical and so users don't know what's in the collections. Searching digitized collections meant items were isolated from their context: "searchable but not understandable". They wanted to provide access by having a good reading experience, whether in person or online, so had tried to "encapsulate a librarian": building a single domain model from a mix of systems for books, archives etc, and extracting the meaning of enquiries (eg cross references for TB/consumption/tuberculosis). He stressed that it is really important to record the metrics of what people are actually searching for.
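
    As an aside, two of the search problems mentioned above (cross references such as TB/consumption/tuberculosis, and the long s) can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions, not what any of the speakers actually built: the names SYNONYM_GROUPS, normalise, expand_query and search are made up for the example, and it assumes the transcription has kept the long s character (ſ). OCR that has misread it as an "f" would need a different kind of correction.

```python
import re

# Hypothetical cross-reference table: each set lists terms treated as equivalent.
SYNONYM_GROUPS = [
    {"tb", "tuberculosis", "consumption"},
]


def normalise(text):
    """Map the long s to a modern s, then lower-case the text."""
    return text.replace("ſ", "s").lower()


def expand_query(term):
    """Return the normalised term plus any cross-referenced equivalents."""
    term = normalise(term)
    for group in SYNONYM_GROUPS:
        if term in group:
            return set(group)
    return {term}


def search(records, term):
    """Return the records that contain any expanded term as a whole word."""
    terms = expand_query(term)
    alternatives = "|".join(re.escape(t) for t in sorted(terms))
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return [record for record in records if pattern.search(normalise(record))]


records = [
    "Died of conſumption, 1784",
    "Tuberculosis ward register",
    "Shop ledger",
]
matches = search(records, "TB")
```

    Here a search for "TB" would pick up the first two records but not the third. A table like this still has to be compiled by someone who knows the collection, which is really the "encapsulate a librarian" point again.
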
    The symposium rounded off with a discussion of how we could futureproof our collections. My takeaways from the day are:


    • Keep doing our existing work on educating researchers as early as possible, and look at how we can expand that with the resources we have.
    • "Futureproofing requires quality cataloguing" - making sure our cataloguing is up-to-scratch.
    • Assess any digitization project to ensure that high quality metadata is in place first and that it will support the needs of researchers wanting to use our collections.


