Finding Data for Public Policy
The Open Knowledge Foundation hosted an event on The Future of Open Data.
It had a range of interesting talks and included some Useful Directories of People and Data.
It also triggered some thoughts...
The Experience of Finding No Data
As a novice user, I often want to ask a simple question like "How has the median UK house price changed over the last 50 years?" Via the Directory of CKAN users I found Birmingham City Council's City Observatory, and through that a data set called "Median house price - Birmingham Constituency".¹
Searching the Directory of CKAN users for "UK", I found an entry for the UK Data Service, which is described as:
Find statistics about the UK population including data on housing, health, migration and the labour market. We provide access to data for England, Northern Ireland, Scotland and...
However, unlike Birmingham City Council's CKAN instance, searching for "house price" on the UK Data Service's site did not return any relevant results.
I have often had this experience with data platforms. You feel pretty confident that if Birmingham City Council has detailed data sets about such an important aspect of how the community operates, then surely the UK Data Service, "including data on housing", will have a similar or better data set.
So the expectation mismatch and subsequent disappointment are interesting emotional experiences.
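The searches described above can also be run programmatically, which makes it easy to see how many results a portal returns for the same query. The following is a minimal sketch, assuming the portal is a CKAN instance that exposes the standard CKAN Action API (`package_search`). The base URL `https://ckan.example.org` is a placeholder, not a real portal, and the helper names `build_search_url`, `summarise` and `search` are my own. I have not checked whether the UK Data Service's site exposes this API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str, rows: int = 5) -> str:
    """Build a CKAN Action API package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarise(response: dict) -> tuple[int, list[str]]:
    """Return (total number of matches, titles of the data sets returned)."""
    result = response["result"]
    titles = [pkg.get("title", pkg.get("name", "")) for pkg in result["results"]]
    return result["count"], titles


def search(base_url: str, query: str, rows: int = 5) -> tuple[int, list[str]]:
    """Run the search over HTTP (needs network access and a real CKAN URL)."""
    with urlopen(build_search_url(base_url, query, rows), timeout=10) as resp:
        return summarise(json.load(resp))


if __name__ == "__main__":
    # Placeholder URL: swap in the base URL of an actual CKAN portal.
    print(build_search_url("https://ckan.example.org", "house price"))
```

A count of zero from `summarise` is the machine-readable version of the disappointment described here: the portal answered, but had nothing to offer for that query.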
Thinking about WikiSim: readers will likely have the same experience at the moment and that's not good.
Thinking about Wikipedia: when you search for a page that you are hoping will exist but doesn't* you have a different experience:
- it shows you a bunch of related interesting pages
- it shows you an option to create the missing page that you were searching for
- your search is usually for a higher level concept rather than a specific data set, so the expectation mismatch is less extreme as it's likely you'll find some of the information you're looking for on related pages.
*Because English Wikipedia is so mature, it is rare not to find what you are looking for in the first place.
Notes on Some Other Speakers
[Rufus Pollock](https://rufuspollock.com/)
Rufus reflected that open data portals are just like very well organised filing cabinets. The data is open but not "accessible" / "understandable" / "usable".
He surmises that the bottleneck is not technical but cognitive.
With LLMs, natural language becomes the query language. He offered "Queryless" as a new tool / product.
[Ihor Samokhodsky](https://policygenome.org/)
I particularly enjoyed Ihor's presentation pointing out that humans also 'hallucinate' (make mistakes) and that AI can detect some of these. Nice 😎
He also has some basic but solid research showing that, as you'd expect, the language used for prompting is very important.
Notes
¹ Which incidentally has a ridiculously long URL?!