Introducing basic support for Open Library metadata

Team blog: Developers

In addition to descriptions and labels from Wikidata, we now also extract authors, titles and subtitles from Open Library URLs. If you haven’t heard of it, Open Library is a fabulous project by the Internet Archive that’s both a structured wiki with data about books, and an actual library.

After creating a free user account, you can “check out” up to five books at a time, as long as the Archive holds a physical copy. You can either read them online or download (DRM-protected) PDF files. As of this writing, a staggering 522,358 books are available.

We use Open Library as a free catalog that doesn’t have the onerous licensing terms of WorldCat. In future, we’ll add more metadata fields like publication year, number of pages, and so on. For now, when you add an Open Library URL to an existing review subject page, the result is something like this:

[Screenshot: Open Library imported data]
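To make the extraction step a little more concrete, here is a minimal Python sketch of pulling the three fields out of a record that has already been fetched and parsed as JSON. The function name and the output shape are hypothetical, not our actual implementation. It assumes that work records nest author references as {"author": {"key": ...}} while edition records use {"key": ...} directly, and it only returns author keys, since resolving them to names would need a further lookup.

```python
def extract_basic_metadata(record):
    """Pull title, subtitle and author keys out of an Open Library record.

    `record` is assumed to be the parsed JSON of a work or an edition.
    Missing fields come back as None (or an empty list for authors).
    """
    author_keys = []
    for entry in record.get("authors", []):
        # Assumption: works wrap the reference in an "author" object,
        # editions carry the "key" on the entry itself.
        ref = entry.get("author", entry)
        if isinstance(ref, dict) and "key" in ref:
            author_keys.append(ref["key"])

    return {
        "title": record.get("title"),
        "subtitle": record.get("subtitle"),
        "author_keys": author_keys,
    }
```

Subtitles are often absent, so callers should expect None there rather than an empty string.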

Editions vs. Works

Open Library distinguishes between “editions” and “works”. A work encompasses all translations and releases of a book, while an edition is one specific release. Information like the number of pages and the publication year varies widely across editions, which is why we won’t include it until we have a concept of “editions” on our side.
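The distinction is visible in the URLs themselves. The following Python sketch tells the two apart and derives the matching JSON endpoint. It assumes that work URLs live under /works/ with identifiers ending in “W”, that edition URLs live under /books/ with identifiers ending in “M”, and that appending .json to the record path returns the raw data. The function name is hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumed identifier shapes: "OL<digits>W" for works, "OL<digits>M" for editions.
_PATH_RE = re.compile(r"^/(works|books)/(OL\d+[WM])(?:[/.]|$)")


def classify_open_library_url(url):
    """Return (kind, json_url) for an Open Library URL, or None if unrecognized."""
    parsed = urlparse(url)
    if parsed.hostname not in ("openlibrary.org", "www.openlibrary.org"):
        return None
    match = _PATH_RE.match(parsed.path)
    if match is None:
        return None
    section, olid = match.groups()
    # Reject mismatched pairs such as /works/OL1M.
    expected_suffix = "W" if section == "works" else "M"
    if not olid.endswith(expected_suffix):
        return None
    kind = "work" if section == "works" else "edition"
    return kind, f"https://openlibrary.org/{section}/{olid}.json"
```

Anything after the identifier, such as a title slug, is ignored, so both short and long forms of a URL map to the same record.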

We’ll likely want to generalize that concept, since it applies in other domains as well: the different versions of a movie, the generations of a product, and so on. This is tricky stuff. At what point is a product so different that it merits its own top-level record?

Right now, if you provide multiple Open Library URLs for the same item, we simply mush their information together. Modelling the relations between things without adding too much complexity will be one of the biggest challenges in the future.
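As an illustration of why this kind of merging is lossy, here is one possible strategy sketched in Python. It is a hypothetical example rather than a description of our actual merge logic: the first non-empty title and subtitle win, and author keys are combined without duplicates. The record shape (title, subtitle, author_keys) is assumed for the example.

```python
def merge_metadata(records):
    """Combine per-URL metadata dicts into one flat record (lossy by design)."""
    merged = {"title": None, "subtitle": None, "author_keys": []}
    for record in records:
        for field in ("title", "subtitle"):
            # First non-empty value wins; later values are silently dropped.
            if merged[field] is None and record.get(field):
                merged[field] = record[field]
        for key in record.get("author_keys", []):
            # Keep authors in first-seen order, without duplicates.
            if key not in merged["author_keys"]:
                merged["author_keys"].append(key)
    return merged
```

With two editions that carry different titles, for example a translation and the original, the second title disappears entirely, which is exactly the kind of information a proper “editions” concept would preserve.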


We don’t have an Open Library powered search box on the “New review” page yet, as we do for Wikidata. Adding the search box is relatively straightforward, though the search results can be a bit frustrating due to word stemming rules that don’t play well with autocomplete search. Nonetheless, some search is better than no search, so we’ll add that in the near future.