Team blog: Developers

We now synchronize labels from OpenStreetMap

Supporting OpenStreetMap properly will take many steps (integrated maps for picking review subjects, search by location, etc.), but a first small step is now complete: the site now pulls the labels for review subjects directly from OpenStreetMap for any subject with an OSM way or node URL. For example, this restaurant has an OpenStreetMap link associated with it. If you click the link to OSM, you’ll notice that the name (“Hareburger”) is a property there. We update these names for items with OSM URLs every 24 hours.
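To make the idea concrete, here is a minimal sketch of the two pure steps involved: recognizing an OSM node or way URL, and picking a label from the element's tags. The function names are hypothetical and this is not the site's actual sync code; preferring a language-specific `name:xx` tag over the plain `name` tag is an assumption about what one might want, not something described above.

```javascript
// Hypothetical helpers for illustration only; not the site's actual code.
// Matches OpenStreetMap node and way URLs, the two kinds mentioned above.
const OSM_URL = /^https?:\/\/(?:www\.)?openstreetmap\.org\/(node|way)\/(\d+)/;

// Extract the element type and ID from an OSM URL, or return null.
function parseOSMURL(url) {
  const match = OSM_URL.exec(url);
  return match ? { type: match[1], id: match[2] } : null;
}

// Pick a label from an element's tags. Assumption: prefer a
// language-specific name (e.g. "name:en") and fall back to "name".
function pickLabel(tags, language) {
  if (!tags) return undefined;
  return tags[`name:${language}`] || tags.name;
}
```

The actual network request to fetch the element's tags is left out on purpose, since how the site talks to OpenStreetMap is not covered in this post.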

Next up will be support for opening hours, which are already included for many businesses in OpenStreetMap, and which are an essential property for restaurant reviews.

Lots of copies keep stuff safe

The site has offered database downloads almost from the beginning, but until recently, these did not include uploaded media. As of last week, we create an archive of all uploaded media files every week and offer it here. Note that in order to use these files in a license-compliant manner, you will also need the file metadata from the database dumps.

Thanks to user arx, our database downloads are now also mirrored to IPFS here (may take a while to load). IPFS is a cool project to build a distributed web and very much worth checking out.

If you’d like to run a mirror of some or all content, please go ahead, and please let us know through a pull request against this file. One of the huge benefits of free/open licensing is that we can all work together to ensure that the stuff we create doesn’t go away if one particular website does.

A platform co-operative?

I’m investigating the possibility of turning the site into a platform co-op. The underlying idea is simple: users own the platform and vote democratically on all decisions. I’m a member of a co-op that uses this model for a social network that is part of the larger Mastodon network (a good intro if you’re unfamiliar), and my experience with it has been very positive.

What would this mean? Nothing for the day-to-day use of the site. Folks who want to participate in decision-making could sign up and contribute to costs for hosting and development, but also vote on decisions, e.g., which features to prioritize. There would probably be a free tier for co-op membership as well, just to make sure that active contributors can join up even if they can’t contribute financially.

If this is something you’d like to see, here’s a simple thing you can do: star the project on GitHub. I’d like to use OpenCollective to manage finances for the platform, and they require a threshold of 100 GitHub stars. That seems fair: there should be active user interest before it makes sense to set up a funding/governance model.

Major update to browser extension

The extension is a separate project from this site, but with a related purpose. It’s a Chrome/Chromium browser extension that lets you download reviews you’ve contributed to major websites, including Amazon, Goodreads, IMDB, TripAdvisor, and Yelp. You can publish them on the site as a backup, or keep your own copy.

The most popular feature turns out to be the Quora downloader: tens of thousands of Quora answers have been downloaded and re-published under free licenses with it (see the upload directory).

Today I pushed out a week’s worth of updates to the extension. From a user perspective, the main changes are that IMDB and Amazon extraction works again (design changes had caused the plugins to break), Quora extraction should be better behaved, and all plugins have a clear “busy” indicator when they’re doing stuff.

Under the hood, the extension now uses async/await functions instead of callbacks to make the download flow a lot more understandable, especially for complex plugins like the Quora one which have to monitor changes to a page dynamically made with JavaScript not under the plugin’s control.
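Here is a rough sketch of why async/await helps with pages that keep changing under you. It is not the extension's code: `waitFor` and `collectItems` are hypothetical names, and simple polling stands in for whatever change detection the plugins really use. The point is that the "wait until the page is ready" step reads as one line in the download flow instead of a nested callback.

```javascript
// Resolve after the given number of milliseconds.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Poll `check` until it returns a truthy value and resolve with that value;
// reject if nothing truthy shows up before `timeout` milliseconds pass.
async function waitFor(check, { interval = 50, timeout = 2000 } = {}) {
  const deadline = Date.now() + timeout;
  while (Date.now() < deadline) {
    const result = check();
    if (result) return result;
    await sleep(interval);
  }
  throw new Error('Timed out waiting for condition');
}

// Hypothetical plugin step: wait until the page has rendered at least
// `minimum` items, then hand them back to the caller.
async function collectItems(getItems, minimum) {
  return waitFor(() => {
    const items = getItems();
    return items.length >= minimum ? items : null;
  });
}
```

In a real content script, `getItems` would typically query the DOM; here it is just a function argument so the sketch stays runnable outside a browser.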

If you’ve contributed reviews to other sites, I encourage you to use this extension to keep your own copy (and please report issues you experience). In future, we’ll make it easy to migrate individual reviews to this site or other sites, as well.

Major improvements to media uploading

It’s now possible to upload and insert media files directly from the rich-text editor when writing reviews, blog posts, or anything else. This video is a quick demonstration:

This was a major effort for a few reasons:

  • We needed to add an upload API that handles various failure cases (e.g., incorrect MIME type), batches up errors and passes them along to the application. The API supports multi-file uploads as well, but in the editor we only upload a single file at a time.

  • We needed to design a dialog that’s quick and easy, while handling entry of all required data without taking up too much space for mobile users. The flip to a second page you see in the video seems like a pretty good solution—you only ever see that page if you need to.

  • We needed to add an upload feed so we can keep track of what files are being uploaded.
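The "batch up errors" idea from the first point can be sketched roughly like this. It is an illustration under assumptions, not the real upload API: the allowed MIME types, the shape of the file objects (`name`, `mimetype`, `size`) and the function name are all made up for the example.

```javascript
// Assumed whitelist for illustration; the site's real list is not given here.
const ALLOWED_TYPES = new Set(['image/png', 'image/jpeg', 'video/webm', 'audio/ogg']);

// Check every file and collect all problems instead of stopping at the
// first one, so the application can report them together.
function validateUploads(files) {
  const accepted = [];
  const errors = [];
  for (const file of files) {
    if (!ALLOWED_TYPES.has(file.mimetype))
      errors.push({ file: file.name, message: `Unsupported MIME type: ${file.mimetype}` });
    else if (file.size === 0)
      errors.push({ file: file.name, message: 'File is empty' });
    else
      accepted.push(file);
  }
  return { accepted, errors };
}
```

Returning both lists lets a multi-file upload succeed partially, while a single-file upload from the editor can simply show the first error.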

As you can see in the video, the feature gives credit to the person who created the work you’re uploading, something that tends to go missing on most websites.

Along the way, we’ve also improved the old multi-file upload and the presentation of metadata on review subject pages. For now, you need to switch to “rich text” mode to see the upload button—in future, the plain text markdown editor will get its own toolbar.

HTML5 video and audio support added

We now have support for HTML5 video and audio in reviews and posts. You can insert media by URL using the “Insert media” menu in the rich text editor. Alternatively, you can use the markdown syntax for images — ![alt text](url) — and it will now work for video and audio files as well.

Here’s an example of an embedded audio file:

Keep in mind that this only works for links to files in formats supported by modern browsers (typically mp4/webm/ogv for video, mp3/ogg for audio). YouTube links and such won’t magically work. We may or may not add support for that—YouTube is ubiquitous, but it’s not ideal to have videos that may disappear at any moment embedded in reviews that are meant to be freely reusable forever.

Under the hood, the CommonMark markdown specification does not yet include support for video/audio. As a result, CommonMark compliant parsers like Markdown-It (which we use) don’t support it, either. There was a preexisting plugin, but it had a few issues:

  • clocked in at >100KB due to an unnecessarily complex dependency

  • did not show any fallback text for older browsers

  • included hardcoded English strings

  • did not tokenize audio and video differently from images, making it difficult to integrate with rich-text editors

While some of this may be fixed (I sent a pull request), I ended up writing and publishing a new module, markdown-it-html5-media, that is optimized for our use case. Using image syntax tracks the emerging consensus in the ongoing CommonMark discussion about this topic.
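To illustrate the general approach (not the module's actual source), here is a small standalone sketch: decide from the file extension whether an image-syntax link is really video or audio, and render an HTML5 element with fallback text for older browsers. The function names are hypothetical, the extension lists are taken from the formats mentioned above, and the fallback string is passed in rather than hardcoded so it can be translated.

```javascript
const VIDEO_EXTENSIONS = ['mp4', 'webm', 'ogv'];
const AUDIO_EXTENSIONS = ['mp3', 'ogg'];

// Classify a URL as 'video', 'audio' or 'image' based on its file extension,
// ignoring any query string or fragment.
function guessMediaType(url) {
  const path = url.split(/[?#]/)[0];
  const extension = path.slice(path.lastIndexOf('.') + 1).toLowerCase();
  if (VIDEO_EXTENSIONS.includes(extension)) return 'video';
  if (AUDIO_EXTENSIONS.includes(extension)) return 'audio';
  return 'image';
}

// Escape text for safe use in HTML content and double-quoted attributes.
function escapeHTML(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Render the HTML for one media link. `fallback` is the text shown by
// browsers that do not support the element.
function renderMedia(url, altText, fallback) {
  const type = guessMediaType(url);
  const src = escapeHTML(url);
  if (type === 'image')
    return `<img src="${src}" alt="${escapeHTML(altText)}">`;
  return `<${type} src="${src}" title="${escapeHTML(altText)}" controls>` +
    `${escapeHTML(fallback)}</${type}>`;
}
```

Keeping the type decision in its own function is what makes it possible to treat audio and video differently from images further down the pipeline, for example in a rich-text editor.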

SPARQL that sparkles

We’ve had support for looking up items via Wikidata for a while now. In order to give you the most relevant results, we exclude some stuff from the search that is very unlikely to be of interest: disambiguation pages, categories, templates, and other “meta-content” from Wikipedia.

To do this, we previously had to fire off two requests for every query we sent to Wikidata: one, to the MediaWiki API for Wikidata, using the wbsearchentities module that also powers Wikidata’s own search; the second to the powerful Wikidata Query Service, in order to identify which of the search results should be excluded.

It turns out that the Wikidata Query Service supports directly interfacing with the MediaWiki API, but until recently, the order of the results was lost when performing such a query.

Thanks to Wikimedia Foundation engineer Stas Malyshev, this was fixed a few days ago, so we were able to rewrite our queries to make use of it. Our Wikidata search is now entirely powered by SPARQL, the query language designed for the semantic web.

The result: significantly more responsive Wikidata lookup and simpler code. See the code for details; most of the work is done by the _requestHandler function. In related news, Wikidata has also recently improved the quality of its search results significantly by switching to Elasticsearch.
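For the curious, here is roughly what such a combined query can look like. This is a sketch based on the Wikidata Query Service's MWAPI service, not a copy of our `_requestHandler`; the function names are hypothetical, and the two excluded classes (which I believe are the items for disambiguation pages and categories) are only examples, so check them before relying on the list. The `wikibase:apiOrdinal` binding is what preserves the order of the search results.

```javascript
// Escape a string for use inside a double-quoted SPARQL literal.
function escapeLiteral(text) {
  return text
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

// Build a SPARQL query that runs an entity search through the MediaWiki API
// and filters out meta-content, keeping the original result order.
function buildSearchQuery(term, language = 'en', limit = 10) {
  const lang = escapeLiteral(language);
  return `
SELECT ?item ?itemLabel ?num WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "EntitySearch" ;
                    wikibase:endpoint "www.wikidata.org" ;
                    mwapi:search "${escapeLiteral(term)}" ;
                    mwapi:language "${lang}" .
    ?item wikibase:apiOutputItem mwapi:item .
    ?num wikibase:apiOrdinal true .
  }
  # Example exclusions (assumed IDs): disambiguation pages, categories
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q4167410 }
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q4167836 }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "${lang}" . }
}
ORDER BY ASC(?num)
LIMIT ${Number(limit)}`.trim();
}

// Build the request URL for the public query service endpoint.
function buildRequestURL(term, language) {
  return 'https://query.wikidata.org/sparql?format=json&query=' +
    encodeURIComponent(buildSearchQuery(term, language));
}
```

One request now does the work of two: the search and the filtering happen in the same query, and sorting by `?num` restores the ranking from the search.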

Fun with Node 8.9.0 and jsdoc

The site is powered by Node.js. This post is very much about the internals, so only read on if you care. :-)

We’ve just upgraded the site to a major new release of Node: 8.9.0. Many excellent blog posts have been written about the new features in this series of Node. Personally, I’m most excited about async/await.

In modern web development, you’re often dealing with operations that are synchronous (executed immediately and blocking operation of other code) vs. asynchronous (effectively running “in the background”).

For example, when you run an expensive database query, you don’t want it to keep other visitors of the site waiting—it should run in the background. But the application needs to know when the query is finished. Promises are one way to organize such asynchronous execution sequences.

Unfortunately, as you deal with more complex sequences of events, using only promises can also make code increasingly difficult to read. Here’s an example from the sync-all script, which is run every 24 hours to fetch information from sites like Wikidata and Open Library:

  Thing
    .filterNotStaleOrDeleted()
    .then(things => {
      things.forEach(thing => thing.setURLs(thing.urls));
      let updates = things.map(thing =>
        limit(() => thing.updateActiveSyncs())
      );
      Promise
        .all(updates)
        .then(_updatedThings => {
          console.log('All updates complete.');
        });
    });

What’s going on here? We’re getting a list of all “Things” (review subjects), excluding old and deleted revisions. Then, for each thing, we reset the settings for updating information from external websites. We build an array of asynchronously run promises which contact external websites like Wikidata and Open Library. The limit() call throttles these requests to two at a time.

The main readability problem is the increasing nesting. If you add .catch() blocks to Promises, it can be even more difficult to follow what’s going on, and to make sure all your brackets are in the right place.

Here’s what this sequence looks like with async/await:

  const things = await Thing.filterNotStaleOrDeleted();
  things.forEach(thing => thing.setURLs(thing.urls));
  await Promise.all(things.map(thing =>
    limit(() => thing.updateActiveSyncs())
  ));
  console.log('All updates complete.');

It’s a lot easier to see what’s going on. And this isn’t even accounting for the greater simplicity of success/error handling. Under the hood, async/await works with Promises, and there are many situations where using Promises directly is fine (note even the second version uses Promise.all). But for more complex operations, it really makes a difference.
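If you are wondering what a throttling helper like limit() does, here is a minimal self-contained sketch of the technique. It is not the library the sync script uses; `createLimit` is a hypothetical name, and the real helper may behave differently in the details. The idea is simply to keep a queue and start a new task only when fewer than the allowed number are running.

```javascript
// Return a `limit` function that runs at most `concurrency` tasks at once.
// Each task is a function returning a value or a promise; extra tasks wait
// in a queue. `limit(task)` returns a promise for the task's result.
function createLimit(concurrency) {
  let active = 0;
  const queue = [];

  // Start the next queued task if there is a free slot.
  const next = () => {
    if (active >= concurrency || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    // Free the slot and look at the queue again once the task settles.
    const done = () => {
      active--;
      next();
    };
    Promise.resolve()
      .then(task)
      .then(
        value => { done(); resolve(value); },
        error => { done(); reject(error); }
      );
  };

  return task => new Promise((resolve, reject) => {
    queue.push({ task, resolve, reject });
    next();
  });
}
```

With `const limit = createLimit(2)`, the call pattern matches the snippets above: every update is wrapped in `limit(() => ...)`, and `Promise.all` still waits for all of them, but only two run at any moment.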

While I’m at it, I’m adding standardized documentation in jsdoc format to modules as I go. Essentially, these are code comments in a special syntax that can be used to generate HTML output. You can find the generated result here; it will be updated every 24 hours from the currently deployed codebase.
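As a quick illustration of the comment syntax, here is what a jsdoc block looks like on a small function. The function itself is made up for this example and is not part of the codebase; the `@param` and `@returns` tags are standard jsdoc.

```javascript
/**
 * Clean up a list of URLs: trim whitespace, drop empty entries, and remove
 * duplicates while keeping the first occurrence of each URL.
 *
 * @param {String[]} urls
 *  the URLs to clean up
 * @returns {String[]}
 *  the cleaned-up list, in the original order
 */
function normalizeURLs(urls) {
  return [...new Set(urls.map(url => url.trim()).filter(Boolean))];
}
```

The jsdoc tool reads the comment, not the code, so the description and types are only as accurate as whoever wrote them keeps them.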

New screencast is up

A lot has happened since the last screencast, from October 2016: we got full-text search, integration with Open Library and Wikidata, a rich-text editor, and other goodies. Here’s an updated screencast (YouTube version) that gives a brief overview:

Open Library autocomplete search is live

On the heels of basic Open Library support announced last week, we now have an autocomplete search box for book titles as well. This has necessitated a bit of a redesign of the relevant part of the review form. Here’s what it looks like to perform an Open Library search:

[Image: Open Library search box]

What’s new here is the dropdown that lets you choose between Wikidata and Open Library. In future, other sources like OpenStreetMap may make an appearance here as well.

The actual search is, I think, quite a bit nicer than the title search on Open Library itself. If you search on Open Library directly, you won’t get an autocomplete match for partial titles like “the wealt” that should match “The Wealth of Nations”. In our case this works just fine. Our search is also not sensitive to word order, which often produces more results at the cost of some irrelevant ones.

Unlike Open Library’s search, our search attempts a match against both the stemmed version of the words you enter (e.g., “dog” will match both “dog” and “dogs”) and against the wildcard version (“dog” will also match “dogcatcher”). To do this we have to fire off two requests per query. I’ve put some notes together on OL’s GitHub repository in case there’s interest in building on these improvements for the native search.
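Combining the two requests can be sketched like this. It is an illustration, not our adapter code: `search` stands in for whatever function performs a single request, the trailing `*` for the wildcard variant is an assumption, and so is the `key` property used to spot duplicates.

```javascript
// Merge several result lists into one, keeping only the first occurrence
// of each key so that earlier lists take priority.
function mergeResults(resultLists, getKey = result => result.key) {
  const seen = new Set();
  const merged = [];
  for (const results of resultLists) {
    for (const result of results) {
      const key = getKey(result);
      if (seen.has(key)) continue;
      seen.add(key);
      merged.push(result);
    }
  }
  return merged;
}

// Run the stemmed and the wildcard search in parallel and merge the results.
// `search` is a hypothetical function: (term) => Promise<Array<{key, title}>>.
async function combinedSearch(search, term) {
  const [stemmed, wildcard] = await Promise.all([
    search(term),
    search(`${term}*`)
  ]);
  return mergeResults([stemmed, wildcard]);
}
```

Firing both requests at once means the extra request mostly costs bandwidth rather than waiting time.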

Finally, this search has a little extra feature: you can narrow search results by author by adding an author’s name (partial or full) separated with “;” after the title. This is a bit more obscure than something like “author:”, but I figured it’s nice to have a shortcut for something you may want to do very frequently—and it’s documented in the help that’s shown next to the input.
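Splitting the input on the first “;” is simple enough to show in a few lines. Again, this is a sketch with a hypothetical function name rather than the code behind the search box.

```javascript
// Split "title; author" input into its parts. The author part is optional;
// everything after the first ";" is treated as the author filter.
function parseBookQuery(input) {
  const separator = input.indexOf(';');
  if (separator === -1)
    return { title: input.trim(), author: undefined };
  const title = input.slice(0, separator).trim();
  const author = input.slice(separator + 1).trim();
  return { title, author: author || undefined };
}
```

For example, the input “wealth of nations; smith” would give the title “wealth of nations” and the author filter “smith”, while a trailing “;” with nothing after it is treated like no author at all.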
