Registration & Coffee
ELAG Bootcamps (choose one)
Blend your data with Catmandu & Linked Data Fragments by Carsten Klee and Johann Rolschewski
Building interactive data applications with Shiny by Harrison Dekker and Tim Dennis
Data Quality by Patrick Hochstenbach, Péter Király, Annette Strauch (instructors), Johann Rolschewski and Jakob Voß
Library Data REST APIs: Design to Deploy by Christina Harlow and Erin Fahy
Text mining: Beyond the basics by Eric Lease Morgan
Using metrics with StatsD, Graphite and Grafana in your library by Uwe Dierolf
Guided Tours of NTK
Open Planning Meeting
The ELAG planning meeting is open to anyone interested in how ELAG goes forward, the next conference, and other issues ELAG should consider. You are welcome to share your thoughts!
Registration & Coffee
Keynote Speech: The Delicate Dance of Decentralization and Aggregation
Ruben Verborgh ↓
Ruben Verborgh is a professor of Semantic Web technology at Ghent University – imec and a research affiliate at the Decentralized Information Group at MIT. He aims to build a more intelligent generation of clients for a decentralized Web at the intersection of Linked Data and hypermedia-driven Web APIs. Through the creation of Linked Data Fragments, he introduced a new paradigm for query execution at Web-scale. He has co-authored two books on Linked Data, and contributed to more than 200 publications for international conferences and journals on Web-related topics.
Presentation: Moving from Find & Get and Towards Use & Understanding
Eric Lease Morgan ↓
Considering the ubiquitous nature of networked computers, the traditional role of libraries is not as critical as it used to be. In other words, the time-honored library activities of collection, organization, preservation, and dissemination of books & journals are quickly being supplanted by the ever-present Google search. Thus, the problem to solve is less about finding & getting information and more about using & understanding the information found. We continue to drink from the proverbial firehose. This does not foretell the demise of libraries nor librarians. Instead, it represents an opportunity to provide enhanced and value-added services above and beyond our collections. These services can be articulated as action statements such as but not limited to: analyze; annotate; cite; cluster & classify; compare & contrast; confirm; count & tabulate words, phrases, and ideas; delete; discuss; evaluate; find opposite; find similar; graph & visualize; learn from; plot on a map; plot on a timeline; purchase; rate; read at a distance; read closely; read at scale; review; save; share; summarize; tag; trace idea; transform; etc. This presentation elaborates upon these ideas with an emphasis on the possibilities of natural language processing & text mining in libraries.
Eric Lease Morgan
Eric has been an academic librarian for just under thirty years. He works at the University of Notre Dame, United States.
Ondřej Koch ↓
Let’s take a look at the IT infrastructure of this library and talk about IT architecture in general with practical examples right here where we stand (or sit). How do we do things? How do we solve security issues? What about identities? Do we need a cloud and what kind of cloud? How do we manage our network and computing resources? Let’s look at those „simple and boring“ things like storage, compute and network, and let’s talk about digital signage and AV stuff in here. Why? Because it matters: these are the fundamentals of every library as a building, as a place to study and as a place where data is properly and logically stored so that everyone can get meaningful information out of it.
Ondřej Koch is the Head of ICT at the National Library of Technology. Otherwise, he is a random Linux sysadmin / developer with some IT architecture background who doesn’t like doing ugly things or using terrible undocumented tools, since IT shouldn’t be the voodoo mumbo jumbo it sometimes is.
Presentation: Blending and Deblending Data in the Daily Routine of a University Library
Wolfgang Stille ↓
In libraries, there has been something like a war of opinions about library software over the last couple of years: some (in particular library management) prefer the licensing of commercial software products with strict business models, others (in particular library IT) participate in community-driven open source solutions. Probably, the truth lies somewhere in between, which means that standards, interfaces, and interoperability play an increasingly important role in the business of library IT, and thus have to be open. At the same time, monolithic commercial software solutions implying vendor lock-in are emerging, promising all-in-one, one-stop-shop solutions and obstructing an objective debate between library management and IT staff. The talk gives an experiential report on the past, tries to answer questions about the present and the reasons behind it, and offers some vision of (and hopefully discussion on) the future of library IT.
Wolfgang Stille is a mathematician and a computer scientist. He has been working on digital libraries, semantic technologies and innovative research methods beyond the digital search and finding paradigm for quite some time. Since 2013, he has been head of the Department of Electronic Information Services at the University and State Library of Darmstadt.
Presentation: Hydras to TACOs: Evolving the Stanford Digital Repository
Christina Harlow, Erin Fahy ↓
Stanford University Library has a robust digital library system called the Stanford Digital Repository. This repository holds a little under 500 TB of materials in preservation and online for researchers, capture of scholarly output, and digitized cultural heritage materials. These materials are managed across 90+ codebases serving a variety of functions from self-deposit web applications, to a nearly 10 year old parallel processing framework, to a digital repository assets publication mechanism leading into our Blacklight, Spotlight, and Geoblacklight applications – among other services and needs. At the core of this system is a Fedora 3 store. With Fedora 3 now end-of-lifed, and our system suffering from limited to no horizontal scalability options, we’re revisiting our system and architecture. We are writing it from the start with a goal to have data-forward, distributed microservices and some event-driven processing components. TACO, our new core management API, is the heart of this new architecture, and is currently being developed as a prototype. This talk will walk through the process of analysing our current system via a dataflows analysis; designing a new architecture for our digital library with a wide ranging set of requirements and users; prototyping a core component of our new architecture to be horizontally scalable as well as data & specification driven; then planning how to create ‘seams’ in our current system to migrate towards our new system in an evolutionary fashion instead of a turn-key migration.
Christina Harlow, Erin Fahy
Christina Harlow is a Data Operations Engineer in Digital Library Systems and Services in Stanford, United States.
Presentation: The ARCLib Project: An Open-Source Solution for Long-Term Preservation
Michal Růžička ↓
This talk presents the Czech ARCLib project. One of the main goals of the project is the development of an open-source solution for bit-level and logical preservation of digital documents, respecting national and international standards as well as the needs of all types of libraries in the Czech Republic. The mission of the ARCLib project lies, among other things, in creating a solution that will allow institutions to implement all of the OAIS functional modules and entities, taking each institution’s information model into account. The architecture is planned as open and modular, and the final product will be able to ingest, validate and store data from a majority of the software products used for creating, disseminating and archiving libraries’ digital and digitised data in the Czech Republic.
Michal Růžička has worked in the area of libraries and digital libraries at Masaryk University since 2010; in 2018 he received his Ph.D. in this area. He participated in the pilot low-barrier-LTP project in 2014–2015, and since 2016 he has coordinated the bit-level protection methodology and implementation in the ARCLib project.
Presentation: The Datahub Project: De/blending Museum Data
Matthias Vandermaesen ↓
The Flemish Museums for Fine and Contemporary Art offer an overview of the art production in the Southern Netherlands and Belgium from the Middle Ages to the Twenty-First Century. The Flemish Art Collection is a non-profit organisation tasked with promoting the collection to an international and diverse audience. Delivery of knowledge and expertise curated by the museums is a big challenge. Blending cultural object records stored across various databases and commercial registration systems is non-trivial, and this difficulty prevents opening up the collections across the walls of the museums.
In 2015, the Flemish Art Collection started the Datahub Project. Over the past years, a modern metadata aggregation platform was built, leveraging open source technologies and open standards. This presentation will highlight the architecture of this platform, and the design process.
The Datahub platform is a service-oriented architecture and consists of three major components. The core is a home-grown, reusable metadata aggregator called The Datahub. This web application is built with the Symfony framework. Metadata records are ingested via a RESTful API, stored in a MongoDB database and disseminated via an OAI-PMH endpoint. User-friendly discovery of metadata is covered by Project Blacklight and geared towards museum staff as well as the general public. Finally, we repurposed the Catmandu framework for flexible and extensible setup of ETL pipelines between the registration systems of the data providers, the Datahub and the discovery interface. Since we are exchanging information about cultural heritage objects, we use the LIDO XML exchange format designed and developed by ICOM.
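The dissemination step described above can be illustrated from a harvester's point of view. The following is a minimal sketch, not the Datahub's own code: it builds OAI-PMH ListRecords requests and reads the resumption token from a response. The endpoint URL and the `oai_lido` metadata prefix are assumptions made for illustration only.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In the protocol, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix is not sent again.
    """
    params = {"verb": "ListRecords"}
    if resumption_token:
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumption token of a ListRecords response, or None."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None  # an absent or empty token marks the last page
    return token.text.strip()


# Hypothetical endpoint and prefix, for illustration only.
BASE = "https://datahub.example.org/oai"
first_url = list_records_url(BASE, metadata_prefix="oai_lido")

# A hand-written response fragment standing in for a real server reply.
sample = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListRecords><resumptionToken>page-2</resumptionToken></ListRecords>"
    "</OAI-PMH>"
)
follow_up_url = list_records_url(BASE, resumption_token=next_token(sample))
```

A harvester would repeat the request with each new token until `next_token` returns `None`.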
The project taught us several valuable lessons. What are the benefits of looking across the borders of your own domain? What are key success factors? How do you identify pitfalls? But it also raises a set of new questions. How do we go from here? What’s next?
The tools and the codebase are freely available under a GPLv3 license and are actively documented and maintained on GitHub.
Matthias Vandermaesen is data conservator for the Flemish Art Collection (flemishartcollection.be), an umbrella organisation for five fine arts museums in Belgium: the Royal Museum of Fine Arts Antwerp, the Groeninge Museum Bruges, the Museum of Fine Arts Ghent, Mu.ZEE Ostend and M Leuven. He is involved in ongoing efforts to improve reuse of collection data inside and outside museums. Currently, his main focus is the implementation of digital persistent identifiers for art objects and the design of a datahub architecture which streamlines the flows of information between museums, applications and the public.
Matthias holds a master's degree in History (2003) and a postgraduate degree in Applied Informatics (2004). Previously, he worked as a senior web developer, project manager and functional/technical analyst for various consultancy companies. He is an active contributor to various open source software projects such as the Drupal (http://drupal.org) content management system.
Presentation: ABC: Amsterdam Blended Collections - The Local Amsterdam Cultural Heritage Linked Open Data Network
Lukas Koster ↓
The Library of the University of Amsterdam is a member of the Adamnet Foundation (http://adamnet.nl), an organisation targeted at collaboration between currently 34 Amsterdam based libraries. Participating libraries are of various types: public libraries, higher education institutions, museums, archives, special libraries.
Until now the collaboration consisted of mutual lending and a common catalogue. Recently, the foundation decided to widen its horizon and also focus on the rich cultural heritage collections managed by their members. Among the Adamnet heritage institutions are well known organisations such as Rijksmuseum, International Institute of Social History, Amsterdam City Archive, University of Amsterdam Special Collections, the Amsterdam Museum.
The project “The story of Amsterdam” was started in 2017 with the objective of linking the Amsterdam based heritage collections on the topic of “Amsterdam” in the broadest sense, on the infrastructure level, using linked open data architectures. Target audience: researchers, creative industry, teaching and the general public.
This way the distributed (de-blended) “Amsterdam Collection” can be blended into a virtual unified online reusable collection.
Because of the strong focus on a geographical location (“Amsterdam”) the initial linking is performed on location level (historical and current streets, buildings, etc.). To this end a new central linked open data hub for Amsterdam locations is being developed (https://adamlink.nl, https://data.adamlink.nl/adamnet).
The central linking platform is developed using the HDT based linked data hosting platform Triply (http://triply.cc).
The project is carried out in cooperation with the Dutch National Digital Heritage Network program NDE (http://www.netwerkdigitaalerfgoed.nl/en/), for which it serves as a pilot for the intermediate Linking Layer architecture.
The presentation will discuss the organisational and technical issues of the project on two levels: 1) the central platform (blend/aggregate or de-blend/distribute) and 2) the various local situations of participating institutions, leading to different blending/de-blending approaches, focusing on the Library of the University of Amsterdam Special Collections (using Catmandu as ETL tool for MARC to RDF).
Lukas Koster is Library Systems Coordinator at the Library of the University of Amsterdam, focusing on the dataflow infrastructure.
After a sociology degree and additional training in ICT, he worked as systems designer and developer in institutions of higher education and scientific information. Since 2003 he has been working with library search and discovery systems at the National Library of the Netherlands and the University of Amsterdam.
Lukas’ current activities are:
- managing a project improving the functionality and user experience of the Primo Discovery tool
- implementing a FAIR open library collections policy
- next generation library management platform
- member of the Amsterdam Cultural Heritage Linked Open Data Network project team.
Presentation: In Out, In Out, and Shake It All About: a Moving Story of Data
Jane Stevenson ↓
The Archives Hub blends data. We bring together descriptions of archives, archival resources and repositories in a way that enables us to present an effective and valuable service through our website. We spent two years creating an entirely new system that was built upon the principle of bringing in data from different sources and providing that data for different purposes. I would like to give some insights from our experience of doing this, and consider whether we have created something innovative and with inherent potential for future development. I will talk about the architecture that we wanted to create, the workflow that we believed to be essential to our aims, and the challenges that we faced in being able to create a blend of data that could be successfully deblended in different ways. It required a great deal of thought and planning in terms of what we wanted to achieve, how we should process the data to fulfil those aims, and how we would work with data contributors, who were essential to our success.
Over a year after going live with the new service, have we achieved our aim of more consistent, standardised data, and have we provided the realistic potential for the data to be re-used? I will give examples of where I think we have fulfilled our aims and where we still have issues. I will argue that the ability to blend/deblend relies upon systems and technology, but it also relies upon people and their habits, expectations, understanding and ambitions.
Jane Stevenson is the manager of the Archives Hub service, the UK aggregator for archival descriptions. She is an archivist with a long-term interest in data and data standards and how we can raise awareness of archives through better cataloguing and utilisation of our data.
Presentation: Enriching Library Metadata with API's
Lucas Mak ↓
Given the ever-dwindling resources assigned to metadata creation, individual libraries are hard-pressed to create and maintain high-quality traditional metadata across the board, let alone to prepare and transform legacy data into linked data. When looking inside turns up no additional support, one should look outside for resources that can help mitigate the situation. Nowadays, libraries no longer monopolize metadata creation. More and more special domain communities have set up Wikipedia-like crowd-sourced portals to serve the information needs of their members. At the same time, there are international initiatives in the library community to set up data stores for linked data sets. Can the library tap into these rich information resources, in an efficient way, to enrich library metadata in the traditional way as well as prepare the legacy data for the big migration?
This presentation will discuss how Michigan State University Libraries is able to harvest selected metadata from various library and non-library community-based portals through APIs (Application Programming Interfaces) in a batch and automated fashion to enrich existing metadata of a popular music collection and enhance them with URIs for linked data conversion down the road.
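A batch enrichment of this kind can be sketched roughly as follows. The record layout, the `lookup` callable and the example URI are hypothetical stand-ins for whichever portal API is queried; the abstract does not name the services or fields involved.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]


def enrich(records: Iterable[Record],
           lookup: Callable[[str], Optional[str]]) -> List[Record]:
    """Add a 'uri' field to each record whose 'artist' the lookup resolves.

    Lookups are cached so that each distinct name is requested only once,
    which matters when an external API is rate-limited.
    """
    cache: Dict[str, Optional[str]] = {}
    enriched: List[Record] = []
    for record in records:
        name = record["artist"]
        if name not in cache:
            cache[name] = lookup(name)
        updated = dict(record)  # copy, so the input record stays untouched
        if cache[name] is not None:
            updated["uri"] = cache[name]
        enriched.append(updated)
    return enriched


# Stand-in for a real API call (assumption): a production version would
# issue an HTTP request to the chosen portal and parse its response.
calls: List[str] = []


def fake_lookup(name: str) -> Optional[str]:
    calls.append(name)
    known = {"Example Artist": "https://example.org/artist/1"}
    return known.get(name)


batch = [
    {"title": "First Album", "artist": "Example Artist"},
    {"title": "Second Album", "artist": "Example Artist"},
    {"title": "Other Album", "artist": "Unknown Artist"},
]
result = enrich(batch, fake_lookup)
```

Records with no match are passed through unchanged, so they can be reviewed by hand later.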
Lucas Mak is the Metadata and Catalog Librarian at Michigan State University Libraries. He is interested in application of technologies to enhance efficiency of metadata creation, enrichment, transformation, and maintenance.
Presentation: Rethinking the IT System Architecture
Henrike Berthold ↓
The Saxon State and University Library in Dresden (SLUB) is the university library of Dresden University of Technology (TUD) and the state library of Saxony, with a history starting in 1556. Because of these two roles, it is an independent research institution with a range of tasks. They include services for TUD, such as an open access repository, support for specific research communities, collection and long-term preservation of digital documents published in Saxony, and internal production and processing workflows.
The rapid digitization of the last decade has dramatically changed the usage scenarios of our target groups, the materials we acquire and produce (“Digital first”) and all related internal workflows (e.g. patron-driven acquisition). Depending on the needs, systems were developed based on open-source components or, when none were available, on commercial components or by us. Some IT systems are offered as services and used by third parties. We operate most of the systems ourselves; some are hosted externally. Today, library IT at SLUB operates 50+ business services for our customers and staff. In the data center we run tape storage with a capacity of several PB, an infrastructure for virtual servers, physical servers and central network devices. We cooperate with the data center of the TUD, e.g. to store the data that is managed in the digital long-term preservation system.
This grown IT infrastructure has a number of disadvantages. Operating expenses are high. There are different systems for different media (analog and digital, with and without access restrictions, textual documents and images) with similar workflows.
In the last year, we designed and discussed a future, modern, consolidated system architecture and aligned internal and funded projects according to this target infrastructure.
In the presentation I will present the target IT infrastructure, the background of some design decisions, the challenges we have identified and the projects we currently run to develop our infrastructure towards the target one.
Henrike Berthold has a PhD in computer science. Her research was focused on data management. In 2011 she joined the Saxon State and University Library in Dresden (SLUB) and built up the digital preservation group. In 2016 she took over the lead of the IT department.
Lightning Talks will be limited to 5 min/person.
You can sign up at the Registration Desk in front of the Balling Hall.
Presentation: Pushing SKOS
Felix Ostrowski, Adrian Pohl ↓
Controlled vocabularies and other authority data have been central to library and other knowledge organization processes for a long time.
Currently, Basel Register of Thesauri, Ontologies & Classifications (BARTOC) records more than 2,700 thesauri, ontologies and classifications. These vocabularies are commonly used across many different institutions, implicitly linking resources about the same topic. Individual catalogs can be browsed and searched using these knowledge organization systems.
Vocabularies come in very different forms, being defined in PDF documents, XML or Excel, to name only a few. The popular Simple Knowledge Organization System (SKOS) is a Linked Data vocabulary for describing vocabularies on the web. This makes them dereferenceable and thus self-descriptive. That is great, but is it really all that useful to only know what a topic means, but not which resources about that topic are out there? Given a topic, we currently still have to go and harvest metadata from disparate sources.
What if we could inform a topic itself about corresponding resources?
What if we could harvest a topic for metadata? What if we could even subscribe to topics, receiving corresponding updates via push as they are published?
This presentation introduces the simple knowledge organization hub (skohub), a proof-of-concept web service that allows you to do just that based on current web standards. Given a SKOS vocabulary, the service publishes it on the web, providing RDF serializations along with a human-readable HTML front end. For each topic described in the vocabulary, a Linked Data Notifications inbox is provided, making it possible to publish and receive notifications about resources related to that topic. Finally, WebSub enables subscriptions for push notifications about resources matching a topic.
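To make the inbox idea concrete, the sketch below builds the kind of JSON-LD notification a sender might post to a topic's Linked Data Notifications inbox. It is an illustration only, not skohub's actual payload: the ActivityStreams "Announce" shape and all URIs are assumptions, since Linked Data Notifications does not prescribe a particular vocabulary.

```python
import json


def build_notification(topic_uri, resource_uri, resource_title):
    """Build a JSON-LD notification announcing a resource about a topic.

    The ActivityStreams shape is an assumption made for illustration.
    """
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Announce",
        "object": {"id": resource_uri, "name": resource_title},
        "target": topic_uri,
    }


def as_request(inbox_url, notification):
    """Describe the HTTP POST a sender would make to the topic's inbox.

    Nothing is sent here; the dictionary only shows the request's parts.
    """
    return {
        "method": "POST",
        "url": inbox_url,
        "headers": {"Content-Type": "application/ld+json"},
        "body": json.dumps(notification, sort_keys=True),
    }


# Hypothetical topic, inbox and resource URIs.
topic = "https://vocab.example.org/topics/cataloguing"
note = build_notification(
    topic,
    "https://repo.example.org/items/42",
    "An introduction to cataloguing",
)
request = as_request("https://vocab.example.org/inbox/cataloguing", note)
```

A receiver would store the posted notification and make it available to anyone who reads the inbox or subscribes to the topic.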
Felix Ostrowski, Adrian Pohl
Felix does web research & development. Before founding graphthinking GmbH, he worked as a research assistant at the Berlin School of Library and Information Science and as a software developer and repository manager at the North Rhine-Westphalian Library Service Centre.
Adrian has been working at the North Rhine-Westphalian Library Service Center (hbz) in Cologne, Germany since 2008. He is primarily working on hbz’s linked open data service lobid (https://lobid.org), focusing on project management, metadata and vocabularies.
Presentation: Blend and Deblend Linked Open Data in a Consortium
Jordi Pallarès ↓
Working in a consortium gives us the perspective to see the benefits of blending ideas to create applications from a central data point. In some cases, we have found that an institution wants to deblend, or „not blend“, with the consortium because it sees more benefit in building its own application. We explain our experience with blending and not-blending decisions in the consortium and present two linked open data projects as examples of these two approaches. In one project we blend all the authorities; in the other we do not blend, in the case of the Thesaurus of the University of Barcelona, which uses the SKOS format.
Jordi is a computer scientist working in the library area of the Consorci de Serveis Universitaris de Catalunya (CSUC, www.csuc.cat) as a Library Applications and Documentary Expert. In recent years he has worked on the University Union Catalogue of Catalonia (CCUC, ccuc.cscuc.cat) and all the services involved with it.
Presentation: A Machine for Automatic Subject Indexing Using ToC
Jan Pokorný ↓
The technology developed at the National Library of Technology can extract a document’s table of contents (TOC), generate relevant keywords, and suggest terms for various classification schemes (UDC, DDC, LCC, Conspectus). It can fully or substantially automate the process of generating subject access, unify it across libraries, and significantly increase accuracy and relevancy compared to subject assignments by non-specialist catalogers. Such increased quality in subject access terms is often seen in the superior subject facets generated by discovery systems and library OPAC advanced search forms.
Jan Pokorný is Head of the System Architecture Department at NTK.
Sight-seeing Tour by Historical Tram to the Conference Dinner Venue
Presentation: DeepGreen - Blending Data to Transform the German Scientific Publication Landscape to More Open Access
Thomas Dierkes, Julia Goltz-Fellgiebel ↓
The DFG-funded¹ project „DeepGreen“, which started in 2016, aims to develop a platform using microwebservices that collects and redistributes qualified scientific closed access publications to make these openly accessible via repositories. The service is legally based on – but not solely restricted to – national licence agreements, so-called alliance licences, between international publishers and scientific institutions and universities in Germany. The authors from authorised institutions, and the institutions themselves as legitimate representatives of the authors, are permitted, free of charge, to promptly² add their articles that have appeared in licenced journals to institutional or discipline-specific repositories of their choice.
The generic workflow of the platform is designed as a simple push-forward service which includes
a) the publishers‘ delivery of publications,
b) the analysis of the given metadata to determine matches in a database holding all relevant licence information, and last but not least,
c) the direct transfer to legitimate repositories accordingly.
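Steps b) and c) above can be sketched as a simple matching routine. This is a minimal illustration, not the DeepGreen implementation: the `Article` and `Licence` fields, the matching rule (journal ISSN plus author institution) and all identifiers are assumptions, and embargo periods are deliberately left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Article:
    doi: str
    issn: str                     # journal the article appeared in
    affiliations: FrozenSet[str]  # institution ids found in the metadata


@dataclass(frozen=True)
class Licence:
    issn: str         # licensed journal
    institution: str  # authorised institution
    repository: str   # repository that may receive the article


def route(article: Article, licences: Iterable[Licence]) -> List[str]:
    """Return the repositories entitled to receive the article."""
    targets: List[str] = []
    for licence in licences:
        entitled = (licence.issn == article.issn
                    and licence.institution in article.affiliations)
        if entitled and licence.repository not in targets:
            targets.append(licence.repository)
    return targets


# Hypothetical licence database and publisher delivery.
licences = [
    Licence("1234-5678", "univ-a", "https://repo-a.example.org"),
    Licence("1234-5678", "univ-b", "https://repo-b.example.org"),
    Licence("8765-4321", "univ-a", "https://repo-a.example.org"),
]
article = Article("10.9999/example.1", "1234-5678", frozenset({"univ-a"}))
targets = route(article, licences)
```

An article with no matching licence yields an empty list and is simply not forwarded.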
In this talk, the technical difficulties and the corresponding solutions of the tasks at hand, to automatically blend in legal information with given metadata, are illustrated. Preliminary results with pilot publishing houses are presented and possible shortcomings of the project are discussed. Finally, the outlook of establishing a central, nation-wide service for a liable, automatic transformation of any OA-entitled publication will be given.
¹ German Research Foundation (DFG), http://www.dfg.de/en/index.jsp
² i.e. typically after a short period of time (embargo)
Thomas Dierkes and Julia Goltz-Fellgiebel
Thomas Dierkes studied mathematics and computer science at the University of Münster, Germany. There, he earned a PhD degree in mathematics for his thesis „Reconstruction methods in optical tomography“. Starting from 2000, he worked in the fields of scientific computing and medical imaging at different places (UCL London, Research Centre Jülich, and University of Münster). In 2010, he joined the Zuse Institute Berlin (ZIB) and in 2016, its department „Scientific Information“ as part of the Co-operative Library Network Berlin-Brandenburg (KOBV). Currently, he is working on the effective and liable transformation of open access-enabled scientific publications within the DFG-funded project „DeepGreen“.
Julia Goltz-Fellgiebel studied communication management, modern history plus library and information science in Berlin. From 2007, she worked in different libraries (two hospital libraries, at the European Central Bank, and for the Consortia of Berlin Public Libraries). In 2011 she joined the Co-operative Library Network Berlin-Brandenburg (KOBV) as scientific librarian where she is responsible for the network’s public communications and coordinates the open access transformation project „DeepGreen“.
Presentation: From XML to MARC: RDF behind the scenes
Yann Nicolas ↓
We collect heterogeneous metadata packages from various publishers. Although all of them are in XML, they vary a lot in terms of vocabulary, structure, granularity, precision, and accuracy. It is quite a challenge to cope with this jungle and recycle it to meet the needs of the Sudoc, the French academic union cataloguing system.
How to integrate and enrich these metadata? How to integrate them in order to process them in a regular way, not through ad hoc processes? How to integrate them with specific or generic controlled vocabularies? How to enrich them with author identifiers, for instance?
RDF looks like the ideal solution for integration and enrichment. Metadata are stored in the Virtuoso RDF database and processed through a workflow steered by the Oracle DB. We will illustrate this generic solution with Oxford UP metadata: ONIX records for printed books and KBART package description for ebooks.
So. A relational database as glue and pipeline engine… RDF as internal model… MARC as output… Quite weird… Was this abstract written by an ELAG-specific random text generator?
Lightning Talks will be limited to 5 min/person.
You can sign up at the Registration Desk in front of the Balling Hall.
Presentation: Largest Koha Installation of 1,130 Public Libraries in Turkey
Mengu Yazicioglu ↓
The public libraries in Turkey belong to the Ministry of Culture, and every city in Turkey has many public libraries, more than 1,140 libraries in total now. In 2014, they decided to migrate from a local system to Koha, and after the first period we began to support public libraries widely in a centralized system with Koha. Many new modules were added to the Koha system, and we later migrated to version 3.20.x. More than 5,000,000 duplicate bibliographic records were merged in that period. In this presentation, we would like to share the whole installation process of the largest centralized system running the free open source library system Koha.
Mengu Yazicioglu graduated in Mathematics Education and then earned an M.Sc. degree in Science and Technology Policy Studies from METU. An IT person since 1990 and a fan of free open source software, Mengu is CEO of the software company Devinim.