A gentle introduction to the Wikidata Query Service (wikidata.org)
104 points by todsacerdoti on Oct 19, 2020 | hide | past | favorite | 20 comments


It has an API too; I mentioned recently in someone's Show HN [0] that I'd thought of doing something similar (showing which fruit & vegetables are in season locally) with Wikidata, since even if it doesn't presently have that data, there's nothing to gain from storing it as if it were proprietary. My 'what's in season' web page or API would then just be a thin domain-specific wrapper over Wikidata.

I've never actually played with it though, don't know if it already has that data, and I don't know what its terms of service permit, so this isn't advice, just something I thought of and will likely never get to.

[0]: https://news.ycombinator.com/item?id=24811316


This is one of the best, most magical things on the Internet. SPARQL over Wikidata is the closest you can get to feeling actual omniscience, and I never truly understood the value of linked data until I played with it. Everyone should tinker with it!
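For a taste of that, here's a small query you can paste into query.wikidata.org that lists every country and its capital (a sketch using the standard WDQS prefixes; P31 = instance of, Q6256 = country, P36 = capital):

```sparql
# Every country and its capital, with English labels
SELECT ?countryLabel ?capitalLabel WHERE {
  ?country wdt:P31 wd:Q6256 ;   # ?country is an instance of "country"
           wdt:P36 ?capital .   # and has a capital
  # The label service turns Q-ids into human-readable names
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?countryLabel
```

Swap the property and class IDs and the same shape answers a huge range of questions.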


I found a front-end to Wikidata:

https://qanswer-frontend.univ-st-etienne.fr/

I tried "who was harry potter's aunt?" on it but that didn't seem to work. I tried googling the same question and Google gave me the correct answer. I think that Google, despite result skew, is pretty powerful sometimes.

I asked Alexa the same question and she returned the correct answer as well.


That seems like a suboptimal implementation, when the info clearly exists:

https://www.wikidata.org/wiki/Q3244512

Google can be disappointing, however, when you want to ask a question few people have thought of. It would be hard to get a clear answer to "which family had the most members in Harry Potter?" etc, whereas that's a query you can write pretty naturally with a graph database, RDF or otherwise.
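That "most members" question can be sketched in SPARQL, assuming characters are linked to their family item via P53 (family); restricting it to one fictional universe would additionally need a filter on something like P1080 (from narrative universe), depending on how the characters are modelled:

```sparql
# Families ranked by number of members linked via P53
SELECT ?family ?familyLabel (COUNT(?member) AS ?members) WHERE {
  ?member wdt:P53 ?family .   # ?member belongs to ?family
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?family ?familyLabel
ORDER BY DESC(?members)
LIMIT 10
```

Whether the Harry Potter families actually surface depends on how thoroughly editors have filled in P53 for those characters.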

I'd never say that I wish we just had massive triple stores and Google had never been invented, but I still maintain that linked data is exciting and expressive.


askplatyp.us is another experimental front-end on top of WDQS. It worked with a slightly reworded question: https://askplatyp.us/?lang=en&q=Who%20is%20the%20aunt%20of%2...


That appears to still just be using 'relative' as a relation, not 'aunt'. You get the same results for 'cousin' etc.


Indeed, I have also been having fun playing around with this. I had no idea that such a service existed until I read this posting - this is what makes HN such a magical resource.


Yep! It's mind-boggling. Wikidata is one of the most exciting projects on the internet to me.


I'm a big believer in Wikidata as an independent and free source of facts, but there are two major flaws in this:

1. It's super tricky to get the queries right. SPARQL is largely unknown in the data community, so it's hard to get help. Plus there are sudden quirks that lead to erroneous results (e.g. not all relationships are shown if you filter the data _some way_ but not in a _slightly different way_).

2. Wikidata facts are often _not_ leveraged by Wikipedia itself. This leads to divergence between the two datasets, and sadly the more popular one is also the one that's unstructured. Unless we properly tie these two services together, we won't get an up-to-date linked database.

I have worked fairly hard on updating Wikidata with local facts (politics, mostly) over the past year or so, and the two issues noted above have consistently dampened my motivation to continue.


Speaking of politics, that's often why Wikidata is not linked to Wikipedia. Most of the smaller language Wikipedias are eager to use Wikidata, as it provides data on a scale they couldn't achieve on their own. But the biggest Wikipedia, the English one, is very resistant to implementing Wikidata support, as they feel their local info is better maintained than Wikidata info. Which is true, but ignores the likelihood that Wikidata would improve very quickly if enwiki started using it. A very short-sighted and selfish attitude.


This is a case of a smaller Wikipedia (a country of 10M people, 400k articles or so). I've talked with our local Wikimedia orgs and there are basically two arguments:

1. It would be fairly difficult to implement an infobox that would load all the data properly (e.g. party membership, by-elections, complex elections, etc.)

2. Some people would be sad to see their thousands of manual edits replaced by an automated source of information.


This is really cool; I didn't realize SPARQL was this easy to work with.

I'm always curious, though, what people use services like this and Wolfram Alpha for beyond just playing around.


We have been using Wikidata on pantheon.world to measure collective memory. In our case we start with all instances of human, trim the list to only the people with 15 or more language editions on Wikipedia, and rank them based on an index. More info here: https://pantheon.world/data/faq
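That first filtering step can be sketched roughly like this (a simplification: `wikibase:sitelinks` counts all sitelinks, not only Wikipedia language editions, and running it unrestricted over all humans may hit the public endpoint's timeout):

```sparql
# Humans (P31 = Q5, "human") with at least 15 sitelinks
SELECT ?person ?personLabel ?sitelinks WHERE {
  ?person wdt:P31 wd:Q5 ;                 # instance of human
          wikibase:sitelinks ?sitelinks . # sitelink count on the item
  FILTER(?sitelinks >= 15)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?sitelinks)
LIMIT 100
```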


Thanks for sharing, that site is awesome.


Not sure if it counts as beyond playing around, but I used Wolfram Alpha quite a bit through school and university: for checking my answers, for working out how to do something I didn't have an answer for (it shows the steps to the solution, not just the solution), or to play around with parameters to get a better understanding of something, 'what if...'. Pretty useful, and for that reason more useful and easier to use than the TI-Nspire CX CAS (I think) that I briefly had.


I imagine voice assistants use similar services to answer questions. Siri uses Wolfram Alpha. Alexa appears to use a number of different backends.


Well, XKCD seemed to use Wolfram Alpha when Mathematica fell over: https://what-if.xkcd.com/62/

> While researching this article, I managed to lock up my copy of Mathematica several times on balloon-related differential equations, and subsequently got my IP address banned from Wolfram|Alpha for making too many requests.


For some reason Wikidata is light on financial info: budgets, expenditure, income, debt, and assets of various orgs, various levels of government, etc. Is it because a triple store is not the best way to store/query financial data, or is there some other reason? Anyone know? A lot of financial data is public, but not much of it makes it into Wikidata.


It's an ecosystem of CC0-licensed, sourced data input and applications.

Regarding triples, note that Wikidata is not purely a triple store: each statement ("triple") can include qualifiers, which are key-value pairs about the core statement. So you can directly express "Company X made profit Y, in year Z, according to source S" etc.
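In WDQS syntax the qualifier structure looks like this (a sketch using real properties: P2295 = net profit, P585 = point in time; the `p:`/`ps:`/`pq:` prefixes address the statement node, its core value, and its qualifiers respectively):

```sparql
# Statements plus their qualifiers, not just the flattened value
SELECT ?company ?companyLabel ?profit ?year WHERE {
  ?company p:P2295 ?stmt .    # the full statement node for "net profit"
  ?stmt ps:P2295 ?profit .    # the core value of the statement
  ?stmt pq:P585 ?year .       # qualifier: which point in time it refers to
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
```

The simpler `wdt:` prefix used in most example queries skips all of this and returns only the "truthy" value.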


There's a property for the budget of a project: https://wikidata.org/wiki/Property:P2769. So the triple store is perfectly capable of storing that kind of data. If you don't find what you're looking for, it just means nobody has cared enough to add it yet.

I adapted one of the example queries to show the most recent entities with a budget: https://query.wikidata.org/#%23%20All%20items%20with%20a%20p...

Looks like mostly research grants. My guess is that someone wrote an integration to add new research grants to Wikidata as soon as they're published.
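A query along those lines might look like this (a sketch, not the exact linked query: P2769 = budget, P585 = point in time as a qualifier where present):

```sparql
# Items with a budget, most recent first by point-in-time qualifier
SELECT ?item ?itemLabel ?budget ?when WHERE {
  ?item p:P2769 ?stmt .              # statement node for "budget"
  ?stmt ps:P2769 ?budget .           # the budget amount
  OPTIONAL { ?stmt pq:P585 ?when . } # qualifier: when the budget applies
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?when)
LIMIT 20
```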



