The Bodleian Library is one of the oldest libraries in Europe and, in Britain, second in size only to the British Library. First opened to scholars in 1602, it incorporates an earlier library built by the University in the 15th century to house books donated by Humfrey, Duke of Gloucester. Since 1602 it has expanded, slowly at first but with increasing momentum over the last 150 years, to keep pace with the ever-growing accumulation of books, papers, and other materials; the core of the old buildings, however, has remained intact. Together, the Bodleian Libraries hold over 13 million printed items.
The Bodleian Digital Library System and Services (BDLSS) department has used Zabbix since 2015 to monitor its infrastructure, which consists of over 350 virtual servers, multiple storage systems, and two tape robot libraries.
Objective:
In 2018, as part of a new digital preservation policy, BDLSS set out to improve digital preservation across several of its core digital repositories: Oxford Research Archive (ORA), Digital Bodleian (Image Repository), and Bodleian Electronic Archives and Manuscripts (BEAM). Each lacked some digital preservation capabilities, such as fixity (checksum) checking, file format characterisation, file format validation, and virus checking.
“Bodleian Libraries preserves its digital collections with the same level of commitment as it has preserved its physical collections over many centuries. Digital preservation is recognized as a core organizational function that is essential to Bodleian Libraries’ ability to support current and future research, teaching, and learning activities.” – Bodleian Libraries Digital Preservation Policy 2018
The Weston Library, Oxford, United Kingdom
Requirements:
Incorporate digital preservation actions into existing digital repositories, and record key statistics about those repositories so that service owners can be given visual dashboards and periodic reports.
Approach:
A proof of concept was developed that focused on file integrity checking. All content within multiple digital repositories was scanned by a File Integrity Worker, which captured details of every file, such as file size and permissions, and calculated a SHA512 checksum. All of these details were saved into one ElasticSearch index per repository. Comparing monthly indexes allows differences to be identified, such as new files added, files deleted, and files modified (including when a checksum has changed).
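The File Integrity Worker's internals are not described here, so the following is only a minimal Python sketch of the two steps above, under stated assumptions: a repository is a directory tree, each file record holds size, permissions, and a SHA512 checksum, and an in-memory dict keyed by relative path stands in for the per-repository ElasticSearch index. The function names `sha512_of`, `scan_repository`, and `diff_snapshots` are hypothetical.

```python
import hashlib
import stat
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA512 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha512()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_repository(root: str) -> dict[str, dict]:
    """Walk a repository and build one record per file, keyed by relative path.

    Assumption: a plain dict stands in for the ElasticSearch index that the
    proof of concept wrote to.
    """
    root_path = Path(root)
    records = {}
    for path in root_path.rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()
        records[str(path.relative_to(root_path))] = {
            "size": info.st_size,
            "permissions": stat.filemode(info.st_mode),  # e.g. "-rw-r--r--"
            "sha512": sha512_of(path),
        }
    return records


def diff_snapshots(previous: dict, current: dict) -> dict[str, list[str]]:
    """Compare two monthly snapshots and classify files as added, deleted or modified."""
    prev_keys, curr_keys = previous.keys(), current.keys()
    return {
        "added": sorted(curr_keys - prev_keys),
        "deleted": sorted(prev_keys - curr_keys),
        # A file counts as modified if any captured attribute differs,
        # which includes a changed checksum.
        "modified": sorted(p for p in prev_keys & curr_keys if previous[p] != current[p]),
    }
```

Because the checksum is part of every record, a change to a file's content is reported as `modified` even when its size and permissions are unchanged.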
More than 30 key statistics from each ElasticSearch index are sent to the Zabbix monitoring server and recorded; these include calculated forecast items that help predict storage growth.
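How the statistics reach Zabbix is not stated. One plausible route, assumed here, is the Zabbix sender (trapper) protocol, the same one the `zabbix_sender` utility uses. The sketch below encodes a batch of statistics as a single sender request and can send it to a server on the default trapper port 10051. The host name and item keys used in the example are hypothetical and would have to exist as Zabbix trapper items on the server.

```python
import json
import socket
import struct

ZBX_HEADER = b"ZBXD\x01"  # protocol signature plus flags byte (0x01 = standard)


def build_sender_packet(host: str, stats: dict[str, object]) -> bytes:
    """Encode key statistics as one Zabbix sender ("sender data") request."""
    payload = json.dumps({
        "request": "sender data",
        "data": [
            {"host": host, "key": key, "value": str(value)}
            for key, value in stats.items()
        ],
    }).encode("utf-8")
    # 4-byte little-endian data length, then 4 reserved bytes set to zero.
    return ZBX_HEADER + struct.pack("<II", len(payload), 0) + payload


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or fail if the peer closes the connection early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full response")
        buf += chunk
    return buf


def send_stats(server: str, host: str, stats: dict[str, object], port: int = 10051) -> dict:
    """Send the statistics to a Zabbix server or proxy and return its JSON reply."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(build_sender_packet(host, stats))
        header = _recv_exact(sock, 13)
        if header[:4] != b"ZBXD":
            raise ValueError("unexpected response header")
        length, _reserved = struct.unpack("<II", header[5:13])
        return json.loads(_recv_exact(sock, length))
```

Forecasting would then happen server-side: a calculated item can apply Zabbix's `forecast()` function (linear fit by default) to a stored statistic, for example `forecast(/ora-stats/ora.bytes.total,30d,90d)` in Zabbix 5.4+ expression syntax, where the host and key are again hypothetical.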
A recent update adding Kerberos authentication to the Zabbix HTTP agent will enable the Micro Services server to authenticate securely to the ElasticSearch cluster.
A Grafana server then connects to the Zabbix server API to provide a reporting dashboard, helping the service owners of the various digital repositories visualize the statistics gathered in Zabbix.
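Grafana typically reaches Zabbix through the Zabbix plugin for Grafana, which talks to the Zabbix JSON-RPC API at `api_jsonrpc.php`. As an illustration of the kind of call involved (not the plugin's actual code), the sketch below requests a statistic's history with `history.get` and converts the string-valued results into a numeric time series. It assumes an API token sent as a Bearer header, as in Zabbix 6.4 and later (older versions expect the token in an `auth` field of the request body instead); the server URL and item IDs are hypothetical.

```python
import json
import urllib.request

# Hypothetical endpoint; api_jsonrpc.php is Zabbix's standard API path.
ZABBIX_API_URL = "https://zabbix.example.org/zabbix/api_jsonrpc.php"


def build_history_request(item_ids: list[str], time_from: int, time_till: int,
                          value_type: int = 0, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 history.get request for a time window (Unix seconds)."""
    return {
        "jsonrpc": "2.0",
        "method": "history.get",
        "params": {
            "output": "extend",
            "history": value_type,  # 0 = numeric float, 3 = numeric unsigned
            "itemids": item_ids,
            "time_from": time_from,
            "time_till": time_till,
            "sortfield": "clock",
            "sortorder": "ASC",
        },
        "id": request_id,
    }


def call_zabbix_api(payload: dict, token: str, url: str = ZABBIX_API_URL):
    """POST a request and return its "result", raising on a JSON-RPC error."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json-rpc",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        reply = json.load(response)
    if "error" in reply:
        raise RuntimeError(f"Zabbix API error: {reply['error']}")
    return reply["result"]


def to_series(history: list[dict]) -> list[tuple[int, float]]:
    """history.get returns clock and value as strings; convert them to numbers."""
    return [(int(row["clock"]), float(row["value"])) for row in history]
```

A dashboard panel would then plot something like `to_series(call_zabbix_api(build_history_request(ids, start, end), token))`.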
Overview of proof of concept solution
Grafana dashboard connected to Zabbix API
Example Grafana dashboard from proof of concept
Business Outcomes:
The proof of concept has successfully delivered a file integrity microservice, which is tracking over 22 million files from three digital repositories. Over the next two years, the plan is to develop five further microservices:
Virus Checking
Ensuring content is virus-free and safe for everyone to use.
Repository Copy Statistics
Ensuring that independent copies of the digital repository are in sync and consistent with each other.
Backup Verification/Monitoring
Monthly review of backup log files, comparing them with digital repository content to confirm that all items are backed up. Periodic restoration of digital repository content, confirming that all content is restored correctly.
File Format Characterisation
Identifying the file formats and technical characteristics of the content stored in the digital repositories.
File Format Validation
Ensuring the content stored in digital repositories is valid and complies with file format standards.
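Of the planned microservices, file format characterisation is the easiest to illustrate briefly. Dedicated tools such as DROID or Siegfried identify formats by matching byte signatures from the PRONOM registry; the minimal Python sketch below shows only the core idea, checking a handful of well-known magic numbers at the start of a file. The signature table and the function names `identify_bytes` and `identify_file` are illustrative assumptions, not the BDLSS design.

```python
from pathlib import Path

# A few well-known file signatures ("magic numbers") found at offset 0.
# Illustrative only: production tools match against the full PRONOM registry.
SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]


def identify_bytes(header: bytes) -> str:
    """Return a MIME type for the first matching signature, else a generic type."""
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    return "application/octet-stream"


def identify_file(path, read_size: int = 16) -> str:
    """Read only the first few bytes of a file and identify its format."""
    with Path(path).open("rb") as fh:
        return identify_bytes(fh.read(read_size))
```

A real characterisation service would need to go further: formats such as DOCX and EPUB are ZIP containers and would be reported here simply as ZIP, so container inspection and richer signatures are required before the results can feed file format validation.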