SPARQL that sparkles

Team blog: Developers

We’ve had support for looking up items via Wikidata for a while now. In order to give you the most relevant results, we exclude some stuff from the search that is very unlikely to be of interest: disambiguation pages, categories, templates, and other “meta-content” from Wikipedia.

To do this, we previously had to fire off two requests for every query we sent to Wikidata: one, to the MediaWiki API for Wikidata, using the wbsearchentities module that also powers Wikidata’s own search; the second to the powerful Wikidata Query Service, in order to identify which of the search results should be excluded.

It turns out that the Wikidata Query Service supports directly interfacing with the MediaWiki API, but until recently, the order of the results was lost when performing such a query.

Thanks to Wikimedia Foundation engineer Stas Malyshev, this was fixed a few days ago, so we were able to rewrite our queries to make use of it. Our Wikidata search is now entirely powered by SPARQL, the query language designed for the semantic web.

The result: significantly more responsive Wikidata lookup and simpler code. See the code for details; most of the work is done by the _requestHandler function. In related news, Wikidata also has recently significantly improved the quality of search results by switching to ElasticSearch.