AI Test #1 - How many UK Neurologists do we need for Multiple Sclerosis?
This is a new series where I'll compare AI (LLM) performance for questions related to public policy.
The question is simple: "How many neurologists do we need in the UK to deliver clinical care for multiple sclerosis according to NICE guidelines?" and it has a simple answer: 65.
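The headline figure of 65 can be reproduced with a back-of-the-envelope calculation. The sketch below is illustrative only: the prevalence, clinic capacity, and working-weeks inputs are assumptions I have chosen to show the shape of the arithmetic, not figures taken from NICE guidance or from this page.

```python
import math

# Back-of-the-envelope sketch. All numeric inputs below are assumptions
# for illustration, not figures from NICE guidance.
ms_prevalence_uk = 130_000        # assumed number of people living with MS in the UK
reviews_per_patient_year = 1      # assumed: one comprehensive neurology review per year
appointments_per_week = 40        # assumed MS clinic appointments per neurologist
working_weeks_per_year = 50       # assumed clinical weeks per year

capacity_per_neurologist = appointments_per_week * working_weeks_per_year  # 2,000/year
annual_demand = ms_prevalence_uk * reviews_per_patient_year                # 130,000/year
neurologists_needed = math.ceil(annual_demand / capacity_per_neurologist)

print(neurologists_needed)  # → 65 under these assumed inputs
```

With these particular assumptions the arithmetic lands on 65; the point of the sketch is that the answer hinges entirely on a handful of input figures, which is exactly where the LLMs below diverge.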
As of 16th Feb 2026, when you prompt an LLM with:
How many neurologists do we need in the UK, to deliver clinical care for multiple sclerosis according to NICE guidelines?
You get long responses from ChatGPT, Perplexity, Gemini and Claude AI, but none of them contains a specific number. You have to force the AI to give you a number with a prompt like this:
Please calculate the minimum number of neurologists that we need in the UK, to deliver clinical care for multiple sclerosis according to NICE guidelines?
And then further prompt it to give a short answer:
With a minimum of text, can you just answer with the calculation of the minimum number of neurologists that we need in the UK, to deliver clinical care for multiple sclerosis according to NICE guidelines please?
At times, further prompting is needed to guide it to sources or to insist that it actually answers the question.
At which point you get some very different values:
ChatGPT: 15 🔴 (> 4× wrong)
Perplexity: 300 🔴 (> 4× wrong)
Gemini: ≈244 🔴 (> 3× wrong)
Claude: ≈47, then with further prompting ≈103 🟡 (> 1.5× wrong, but closest to ✅)
Conclusion
Clearly, LLMs still have a way to go before they are universally useful for answering this specific question. Hopefully this page and the corresponding page on WikiSim will become usable by LLMs as training data, improving their performance on this and related questions.
Even without this improvement, it's clear that Claude not only gave the closest answer but was also the best-performing LLM on this question in terms of the soundness of its method. Its error lay in its input data and wrong assumptions, which means it should be more straightforward to improve: only the underlying data, rather than the whole LLM architecture, needs to be updated.
