What if government is stuck in a local maximum?
The case for broadening our field of view
This essay is published as part of the Centre for the Edge, an initiative to help public sector leaders support and spread promising alternatives. The Centre for the Edge is being run in partnership between JRF and Kinship Works. You can read more about the wider work and our future plans here.
Introduction
Sometimes machines encounter an error in which they get stuck with a narrow field of vision. Picture a robot vacuum cleaner trapped under a table, with its sensors confused so that it can’t find a way out. The robot cleans the space under the table repeatedly, even obsessively, working again and again on the same area, even though there’s more important work to be done elsewhere.
This is an example of a local maximum problem, and it can also affect human beings. It happens most often when we’re acting like machines ourselves, such as when we’re working in a bureaucratic government department or in a big corporation. Consider a designer who spends months making small improvements to a website, never stopping to realise that an app would do the job much better. Or think of a hotel chain that spends years improving its offer, only for Airbnb to disrupt the entire industry. Like the vacuum cleaner, they are working hard, but their effort is being wasted by a narrow field of view.
In this post, I explore the possibility that government is stuck in a local maximum, limiting our society to a small slice of the social practices we could use to make our lives better — and thereby cutting off viable routes to human flourishing.
I believe this argument holds in the broadest sense, which is to say that social democracy as a whole is at a local maximum, so that we can only make decisive progress again by going beyond the perimeter of our current institutional settlement.
In this post, however, I want to keep things practical by applying the argument to one particular aspect of government, namely the narrow set of methods that public institutions use to decide the kind of work they do and the kind of work they support. By “methods”, I mean standards of evidence and ways of evaluating work, as well as control mechanisms, for example tools for appraising and making decisions about risk.
To represent the argument visually, I am suggesting that the evidence standards and control mechanisms we use in government today limit our field of view to A, and that if we broadened our view to B, better outcomes would be on offer.

Figure 1: Stuck in a local maximum
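The robot's predicament has a direct counterpart in optimisation. As a minimal sketch, the Python below climbs a made-up landscape with two peaks. The `outcome` function, the grid of positions and the `view` parameter are all assumptions invented for illustration, not a model of any real system. A searcher that can only compare immediate neighbours settles on the smaller peak, while one with a wider field of view reaches the taller one.

```python
def outcome(x: int) -> int:
    """Hypothetical outcome landscape (illustrative only).

    It has a small peak at x=2 (height 5) and a taller peak at x=8 (height 10).
    """
    return max(5 - (x - 2) ** 2, 10 - (x - 8) ** 2)


def hill_climb(start: int, view: int, lo: int = 0, hi: int = 10) -> int:
    """Repeatedly move to the best position within `view` steps.

    Stops when nothing inside the current field of view is better
    than the current position.
    """
    x = start
    while True:
        candidates = range(max(lo, x - view), min(hi, x + view) + 1)
        best = max(candidates, key=outcome)
        if outcome(best) <= outcome(x):
            return x  # no visible improvement: a maximum for this field of view
        x = best


narrow = hill_climb(start=1, view=1)  # can only see immediate neighbours
broad = hill_climb(start=1, view=6)   # can see across the valley between peaks

print("narrow view stops at", narrow, "with outcome", outcome(narrow))
print("broad view stops at", broad, "with outcome", outcome(broad))
```

With these invented numbers, the narrow searcher stops at position 2 and the broad one at position 8. Both start from the same place and follow the same rule; only the width of the view differs.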
I build the argument in three steps. First, I explain why the local maximum thesis feels plausible, which is not to say it is proven but just that it is worth taking seriously. Second, I explore what it could mean to broaden our view, again thinking about standards of evidence, and specifically evaluation methods. Third, I make some recommendations.
I hold these ideas lightly and I’m sharing them in the hope of improving them in public debate. Over the course of the project, and with input from others, we plan to refine these ideas with the teams at Kinship Works and JRF. We would be open to partnerships with funders and other organisations with the capacity to make this work happen.
Part 1. Why might we be at a local maximum?
Let’s start by considering why it’s plausible that government is trapped in a small, walled-off possibility space, like the vacuum cleaner. I see four main reasons to think so.
1. Progress can be quick at first, then slow down
First, when systems become restricted by a narrow field of view, progress can be quick at first — the payback for being focused — but then later slows down. When the vacuum cleaner first finds the space under the table, its sensors will show the carpet getting much cleaner. A few hours later, however, progress will stall as it sees diminishing returns.
Over the last two decades, the UK has seen progress slow across a range of performance indicators, from life expectancy to economic growth and poverty. In some of these cases, a slowing of progress was inevitable — humans can only live so long, for example. But in other cases, there are signs that we are seeing diminishing returns from conventional methods.
Notice, for example, how our healthcare system, having been designed to diagnose and treat acute health conditions, such as bacterial infections, has struggled to drive continued progress in quality of life for people with chronic health conditions, such as arthritis. Outcomes like this suggest a local maximum situation — we are working within a narrow subset of the available options.
In some local maximum situations, outcomes don’t just stagnate, they fall. This can happen when a system is dynamic, and conditions change to make a previous local maximum no longer sustainable. This is often a feature of technological change; the hotel that is outdone by Airbnb might find that its revenue doesn’t just stop rising but starts to fall, as Airbnb eats into its market share. The hotel’s previous position — and possibly even its survival — has become unsustainable, unless it can broaden its view to do business in other ways.
We see a pattern like this repeating across many important social and economic outcomes in the UK, and likewise in many of the world’s mature economies. From fiscal sustainability, to mental health, to measures of trust and the perceived legitimacy of public institutions, things are getting worse. It seems the methods and institutional forms we currently employ in government are not only struggling to make progress, they are proving incapable of holding our ground.
2. Public institutions are struggling in revealing ways
Second, when we look at the particular way in which public institutions are struggling, the pattern again suggests that we are stuck with a restrictive and outmoded range of options.
Across government, outcomes seem to be holding up best for tasks that are bureaucratic or technical in nature. In such cases we have seen improvements or continued high performance. Consider how the UK government is doing a better job than ever of collecting taxes, processing passport applications, administering benefit payments and, in the NHS, diagnosing infections and prescribing the right antibiotics.
By contrast, notice how government is struggling with more novel or complex work that is poorly suited to conventional bureaucratic methods and impulses: the management of chronic conditions, for example, or societal challenges like loneliness, or community cohesion.
This does not seem like a coincidence. Public institutions are performing better at the kinds of problems that dominated when these institutions were invented — when we first alighted on the possibility space of the bureaucratic method. They are performing worse on problems that have come to the fore more recently, and for which this narrow space might not contain answers.
It is as though we have picked the low-hanging fruit, the fruit we can reach with conventional methods, and have then failed to discover and spread alternative methods for reaching the rest.
3. Promising alternatives already exist beyond the perimeter
Third, there seem to be promising approaches beyond the perimeter of our conventional methods. Consider, for example, these real examples of promising alternatives that are hard to see with traditional evidence standards:
A relational, peer-based model of care that seems to deliver qualitatively better care at far lower cost than care delivered in an institutional setting.
A community land trust model of affordable housing that seems to significantly reduce transience and delinquency.
A method of participatory budgeting that appears to lead to more effective budget allocations, reducing infant mortality and improving access to sanitation.
Asset-based community-led development projects that seem to reduce social isolation and boost people’s sense of confidence and efficacy.
An experiment in regenerative agriculture which seems to deliver higher yields and big improvements to soil health and water quality.
This is not to say that these alternatives are proven to be better than conventional methods. It is merely to say that there are enough signs of life beyond the perimeter that it is worth us venturing out more often and in a more determined way.
4. The public sector lacks discovery mechanisms
Finally, it would not be at all surprising if our public institutions had got stuck in a local maximum, since we know the public sector lacks the mechanisms that tend to resolve local maximum problems in the private sector. If we think back to our Airbnb example, although the traditional hotel company failed to see beyond its old business model, Airbnb was still able to emerge and spread across the economy, because market entry and competition allow a new entrant to displace incumbents. In the public sector, no such mechanism operates, so old institutions and methods lumber on.
This makes a local maximum problem not just plausible, but highly likely. Indeed, unless the public sector emulates the discovery mechanisms of the private sector — for example, by using evaluation and design methods in especially permissive and imaginative ways — sooner or later, it is all but certain to arrive at a local maximum.
Another telling sign is the experience people have leading promising alternatives. Pioneers who use unconventional methods in the social sector often find that their work is celebrated, but doesn’t spread — it neither disrupts and displaces old methods, nor is it integrated into the way old institutions function.
Indeed, it is worse than this. Pioneers in the social sector report a daily struggle to keep their work alive, let alone to reach scale. Even as their work is celebrated by politicians, with smiling visits from a Minister, they cannot access the local authority capital budget, or they have to spend months writing an irrelevant business case to persuade a junior HM Treasury (HMT) official that their work is legitimate.
Such leaders often cite narrow evidence standards and reporting requirements as one of the main reasons that the system seems blind to their work, or dismissive towards it. The result is that promising alternatives have to bend to the system’s bureaucratic requirements — they are changed more by the system than they change the system around them.
Together, these four reasons suggest that the local maximum thesis is worth taking seriously. And of course if we do find ourselves in this situation — if we are dealing not with a trapped vacuum cleaner, but with a trapped government — then the costs would be high.
Part 2. How could we broaden our field of view?
What would it take to broaden our standards of evidence and control mechanisms so that we can see out beyond the perimeter fence, to find and spread promising alternatives?
In this section I’ll focus on the standards of evidence and control mechanisms that public institutions use to decide what work to do and what work to support. This includes methods for appraising and deciding between work, such as evaluation techniques, methods used to compare options, and ways to measure and manage risk.
I am interested in the methods used to decide what work government does — delivering, procuring, commissioning — but also in methods used to decide what kind of work government supports, for example by licensing, legalising, or simply celebrating and endorsing certain ways of doing things, and not others.
We can picture the methods used to inform government decisions as lenses the government looks through. And when we’re thinking about these lenses, we can ask the following kinds of questions:
What fidelity do our lenses have? How much detail can they see? And what kinds of details can they see?
How broad is their field of view? Can they take in a diverse range of types of work, or is their view narrow and focused?
What is their tolerance? Are they good at distinguishing fine-grained differences or changes? For example, can they distinguish between work that is bad or risky, versus work that is good and safe but merely unfamiliar?
Do the lenses have a tint to them? Or, in more literal terms, are certain value judgements implicit in the methods we use, and are our methods based on certain assumptions? For example, do they imply a particular view of society, or of the way individuals behave?
I propose imposing some structure by mapping these issues onto the framework visualised in Figure 2, which again uses the metaphor of a lens. At the centre are fine-grained distinctions, where we choose between similarly narrow but focused methods. Then, as we progress outwards, we reach less conventional methods that would allow us to see more broadly.
Figure 2: Five levels, from narrow to broad

To make this concrete, in Figure 3 I have mapped certain evaluation and learning methods against the five levels.
Inside level 1, we consider narrow questions. For example: can we refine the conventional methods we already use, such as statistical evaluations and Randomised Controlled Trials, or RCTs, to make them more pragmatic and permissive?
We then move out through:
Using alternative methods.
Pursuing unconventional lines of inquiry.
Adopting more permissive systems and frameworks for learning and improving.
Changing paradigms.
In all cases, we’re interested in ways to evaluate and learn. And, as we move out, we start to consider a broader and more permissive range of methods.
Figure 3: Evaluation methods — five ways of broadening our view
Level 1: How could we improve our traditional methods so that they can see more kinds of work? For example, making RCTs more suited to the complexity of community-led work.
Level 2: How could we use less conventional methods that can recognise the value of a broader range of work? For example, Theory-Based Evaluation and contribution analysis.
Level 3: How could we apply evaluation methods, including levels 1 and 2, to less conventional lines of inquiry? For example, using RCTs or Theory-Based Evaluation to evaluate system change initiatives.
Level 4: How could we use more permissive frameworks and ways of learning? For example, adopting an outcome framework that is agnostic about the way we work, or adopting broader frameworks for learning, such as Human Learning Systems.
Level 5: How could we reframe and broaden the role and purpose of government altogether? For example, going beyond classical economics and statistics to learn from disciplines like political economy and institutional economics, or from ideas like social infrastructure and social capital.
Using the framework to broaden our field of view
How can we use this framework to guide us? We can start by considering where we currently focus our efforts. For instance, if we think of the resources we currently spend in government on evaluation and learning, how much sits at each of the five levels?
I would make the following observations:
Most work sits in level 1 — applying conventional methods to conventional questions, such as assessing the efficacy of a policy intervention, or incrementally improving these methods. We see this in the many thousands of people who are employed or contracted to run evaluations, mostly using traditional methods. A large community of officials also works to refine these methods, and a much smaller subset of that community works on nearby alternatives, such as Theory-Based Evaluation.
Many thousands of hours are spent on options appraisals, which again lean heavily on the same narrow quantitative methods. This is to say that most of the effort that government puts into evaluation goes into fine-grained, lower-level questions.
Because of the way accountability works in government, many thousands of officials are responsible for delivering or choosing between individual programmes and feel accountable for these programmes being delivered without surprises, and ideally with a positive impact. By contrast, very few people are employed to consider higher-level questions, such as: “How diverse is the overall portfolio of work we are funding, in terms of methods, institutional forms, and implicit worldviews?”
Most evaluation work relates to discrete interventions or policies, as opposed to ongoing efforts to improve the way systems or institutions function.
There is relatively little resource dedicated to higher-level methods. For example:
A department might have at most a small strategy team available to explore an approach like Human Learning Systems.
There are some small central teams trialling models such as Test & Learn, or working to develop and improve discovery mechanisms, such as internal markets, or supporting the creation and spread of new methods and institutional forms.
There has been some work done on broader paradigms of government, but mostly sitting outside of the state. This work has been led by a mix of universities, non-state actors such as the British Academy, and foundations, and is often funded independently of government, for example by historic endowments or philanthropists.
It is also instructive to review the literature on the relative strengths and weaknesses of today’s dominant methods, namely quantitative evaluations and associated statistical techniques and trial designs. We can then characterise what government would be like if we overrelied on these methods.
We know, for example, that RCT-style evaluations are good at determining whether a policy, conceived as an “intervention”, worked in controlled conditions. However, RCTs struggle to explain causal mechanisms, work less well in complex and varied contexts, and are not well suited to understanding system-level change or emergent outcomes.
RCTs also tend to over-focus on internal validity — do we know for certain if the intervention worked here? — while under-focusing on external validity — is it likely that the intervention would work somewhere else? These methods also sit within, and reinforce, what we could think of as a medical paradigm of government: a view that government is there to develop solutions to social problems, as a pharmaceutical company develops medicines.
A system that overindexed on RCTs and similar methods would therefore:
Be regularly disappointed by the lack of transferability, getting excited about the impact of an intervention that later turns out not to replicate in other contexts.
Not accumulate a deep understanding of the causal mechanisms and complex dynamics of social systems and social problems.
Tend to over-rely on solutions that can be measured by conventional methods, namely point solutions that are conceived as “interventions” — or as medicines that will cure a social problem, or alleviate its symptoms. Over time, the system would find itself supporting an ever more homogeneous portfolio of this kind of work.
Have blind spots, in particular related to complex social dynamics and emergent outcomes.
Underfund certain kinds of work, in particular work that is more holistic, complex, relational, and system-level.
Underplay the social aspects of learning and knowledge. The system would tend to build a strong bank of technical knowledge but would underinvest in making sure people knew about the findings, and had the know-how to apply them. In other words, the system would overinvest in knowledge and underinvest in practice.
As someone who has worked in and around government for nearly 20 years, and who has used many of these traditional methods myself, I find these descriptions painfully familiar.
Part 3. Recommendations
I have argued that it is plausible that public institutions are stuck in a local maximum and that the narrow methods we use to evaluate work are part of the reason we’re stuck.
If that’s right, what could we do about it? In this final section, I share five recommendations to broaden the methods public institutions use. Each pertains to one of the five levels in the framework I described above; recommendation 1 pertains to level 1 in the framework, and so on.
I then make a sixth recommendation about how we could shift the overall balance of resources to focus more on higher-level questions.
In the recommendations, I reference a series of alternative learning and evaluation methods. These are taken from a guide to broader and more permissive learning methods that we will publish in January 2026 as part of the Centre for the Edge initiative.
1. Build capability around more pragmatic and permissive quantitative evaluation methods
Build capability around more pragmatic and permissive quantitative evaluation methods, so that we can understand the impact of a broader range of work.
One example is pragmatic trials, which put less emphasis on creating lab-like conditions and more on whether work is doable in the real world and whether results are transferable. Pragmatic trials focus less on statistical significance and more on whether outcomes are meaningful; for example, does a person feel their life has improved?
Another useful method is cluster randomised trials, which work with groups, rather than individuals. They are useful for understanding the impact of community- or system-level initiatives.
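To make the distinction concrete, here is a minimal sketch of cluster randomisation in Python. The residents, the neighbourhood names and the `cluster_randomise` and `assign_individuals` helpers are hypothetical, and a real trial would also need stratification, sample size calculations and an analysis that accounts for clustering. The only point shown is that whole groups, not individuals, are assigned to an arm.

```python
import random


def cluster_randomise(clusters, seed=0):
    """Randomly split whole clusters into a treatment arm and a control arm."""
    rng = random.Random(seed)            # seeded so the allocation is reproducible
    shuffled = sorted(clusters)          # sort first so the result does not depend on set order
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": sorted(shuffled[:half]), "control": sorted(shuffled[half:])}


def assign_individuals(residents, arms):
    """Give each person the arm of the cluster they belong to."""
    arm_of_cluster = {c: arm for arm, members in arms.items() for c in members}
    return {person: arm_of_cluster[cluster] for person, cluster in residents.items()}


# Hypothetical residents and the neighbourhood (cluster) each one lives in.
residents = {
    "Ana": "Northfield", "Ben": "Northfield",
    "Cara": "Riverside", "Dev": "Riverside",
    "Eli": "Hilltop", "Fay": "Hilltop",
    "Gus": "Eastbank", "Hana": "Eastbank",
}

arms = cluster_randomise(set(residents.values()))
allocation = assign_individuals(residents, arms)

for person, arm in sorted(allocation.items()):
    print(person, residents[person], arm)
```

Because allocation happens at the neighbourhood level, everyone in a neighbourhood lands in the same arm, which is what lets a trial of this kind assess a community-level initiative rather than an individual treatment.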
Recommendation: Evaluation teams across government should make sure they are well-equipped to use pragmatic and permissive quantitative methods, and they should make sure these methods are not ruled out by wider processes, for example in the evidence requirements that are used during procurement and commissioning.
2. Invest in less conventional methods
Invest in less conventional methods, especially methods that lend themselves to complex and context-dependent work, and to social, relational, and system-level interventions.
Examples include Theory-Based Evaluation and contribution analysis, and the wider class of realist evaluation methods.
Recommendation: Government departments and What Works Centres should invest more in evaluation methods that are suited to complex domains. This should include dedicated funding to support the application and refinement of these methods, training in these methods, and sponsorship for centres of excellence.
3. Apply evaluation methods to unconventional questions
Work systematically to apply evaluation methods — including those described in recommendations 1 and 2 — to unconventional questions, especially efforts at system change and system adaptation.
It is becoming more common to use quantitative methods, including RCTs, to understand the impact of system change. For example, this can mean experimenting with improvements to public institutions, such as the adoption of more empowering management practices, or changes to organisational design, as part of a randomised trial design.
By applying evaluation methods to these kinds of lines of inquiry, we can operate at a higher level of the system than when we are simply evaluating and comparing individual policies or programmes.
Recommendation: Strategy teams across government should support a more systematic effort to evaluate system change. This could include, for example, funding quantitative studies into the adoption of empowering management practices.
4. Adopt more permissive outcome and measurement frameworks
Adopt more permissive outcome and measurement frameworks. Public institutions can now choose from a range of frameworks that allow them to track outcomes over time in a way that is:
Holistic, covering diverse outcomes.
Permissive, being agnostic about how the outcome is being achieved.
Governments can also use methods from the world of social investment, such as outcome-based contracts. They can also adopt broader frameworks for learning across a distributed system, such as Human Learning Systems.
Recommendation: The government should build outcome frameworks around its major priorities, and should make expertise available to support the adoption of Human Learning Systems. The Cabinet Office should develop a procurement framework to pre-authorise partners who can support departments and other public bodies to use these methods.
Finally, the government should review the homogeneity of evidence standards used across departments, and should develop a way to measure methodological diversity. This should be measured periodically as an indicator of system health.
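The essay does not specify how methodological diversity should be measured. As one possible starting point, and purely as an assumption, the sketch below uses Shannon entropy over the share of evaluation spend going to each method, and converts it into an "effective number of methods". The method names and spend figures are invented for illustration, and the `methodological_diversity` helper is hypothetical.

```python
import math


def methodological_diversity(spend_by_method):
    """Return (Shannon entropy, effective number of methods) for a spend breakdown.

    The effective number is exp(entropy): it equals the number of methods when
    spend is spread evenly, and falls towards 1 as spend concentrates on one method.
    """
    total = sum(spend_by_method.values())
    if total <= 0:
        raise ValueError("total spend must be positive")
    shares = [v / total for v in spend_by_method.values() if v > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    return entropy, math.exp(entropy)


# Hypothetical portfolios of evaluation spend (arbitrary units).
concentrated = {
    "RCTs": 90,
    "Theory-Based Evaluation": 5,
    "Contribution analysis": 3,
    "Human Learning Systems": 2,
}
balanced = {
    "RCTs": 25,
    "Theory-Based Evaluation": 25,
    "Contribution analysis": 25,
    "Human Learning Systems": 25,
}

_, effective_concentrated = methodological_diversity(concentrated)
_, effective_balanced = methodological_diversity(balanced)

print("concentrated portfolio, effective methods:", round(effective_concentrated, 2))
print("balanced portfolio, effective methods:", round(effective_balanced, 2))
```

Tracked periodically, a figure like this would show whether a department's portfolio is becoming more or less homogeneous. It says nothing about whether the methods in the portfolio are good ones, so it would complement rather than replace judgement.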
5. Develop alternative ways to frame the role of public institutions
Develop and refine alternative ways to frame the role of public institutions, and develop associated measurement frameworks.
As concerns have grown about the narrowness of our governing settlement, a range of institutions have developed broader paradigms or framings. These are often conceived as going beyond conventional measures, for example by going “beyond GDP”.
Many leading universities now host research centres on these topics, including Oxford’s Wellbeing Research Centre and Centre for Eudaimonia and Human Flourishing, LSE’s Cohesive Capitalism programme, and extensive work on beyond-GDP measures by Cambridge’s Bennett School of Public Policy.
There has also been work by the British Council and the Bennett Institute on the idea of social infrastructure.
Such work reframes the goals and role of government. Rather than seeing government as a machine that optimises for utility or GDP via point solutions, we see government’s role as fostering the conditions for a flourishing human society and natural environment. This work promises a broader canvas, allowing us to think more creatively and openly about what good government looks like.
Recommendation: Leading university institutes should co-host a conference to explore outcomes beyond GDP, and broader framings for the role of government. The goal should be to mature a measurement framework for human flourishing that could guide analytical and evaluation teams in government, helping to broaden our field of view.
Keystone institutions, for example the British Council, Nuffield Foundation and National Lottery Community Fund, should pool resources to fund work towards a similar end — helping to develop ways to measure a healthy society, and socialising these methods across government.
Rebalancing the system
I believe these steps would broaden our field of view, so that a wider range of promising alternatives would be seen and supported. However, I also think it’s clear that resources are imbalanced between the five levels, with relatively too much time and money being spent on lower-level questions and relatively too little being spent on higher-level questions.
I therefore also suggest that the system as a whole rebalance its efforts, spending proportionally more on higher-level questions.
This can be done partly within institutions. For example, each government department should spend proportionately less money on evaluating individual programmes and proportionately more on embedding outcome frameworks or approaches like Human Learning Systems.
The rebalancing can also happen at a system-wide level, supported by overarching institutions such as the UK Research Councils.
How should we think about this higher-level effort? Stepping back, what this article describes is an effort to enhance the government’s ability to discover and adopt better ways of doing things. Which is to say that the mechanisms we use to discover and appraise work in the public sector should themselves be subject to evaluation and experiment.
Put another way, I am saying that we need to discover and scale new discovery and scaling mechanisms.
When the work is framed in this way, there is a direct analogy with the discipline of metascience, which is now well-recognised and supported by a systematic effort by UKRI, among others.
Metascience has attracted increased attention in recent years for much the same reason I discussed at the beginning of this essay: people were worried that science had got stuck in a situation like that of the vacuum cleaner. The concern is that the scientific discovery process and its associated institutions (for example, peer review, academic tenure and research funding processes) have become sclerotic, so that we are seeing diminishing returns and scientific breakthroughs are ever harder to come by.
Metascience responds to this worry by experimenting with improvements to the scientific discovery process itself. We might say it experiments with the way we experiment. For a fuller exposition of this argument, see Michael Nielsen and Kanjun Qiu, A Vision for Metascience.
Maybe the efforts I have described in this post could be thought of as a discipline of metagovernance. And maybe we could define metagovernance, as per Nielsen and Qiu’s definition of metascience, as:
A systematic practice of designing, experimenting with, and spreading new ways to discover how to make society better.
It also seems to me that Nielsen and Qiu’s wider framing of metascience applies just as well to metagovernance. They propose, for example, the idea of a metascience entrepreneur: a person who works creatively to invent and then spread new scientific discovery processes.
They also emphasise the value of design practices in coming up with new scientific discovery processes, for example by ideating, prototyping and iterating new approaches. This leads them to describe metascience as having three complementary aspects — they call metascience “an imaginative design practice, an entrepreneurial discipline, and a research field.” The same could be said of metagovernance.
Finally, therefore, I recommend that the UK Research Councils and other relevant bodies, for example the civil service Policy Profession, apply to metagovernance the approach that is being taken to metascience.
Recommendation: UKRI should convene the Research Councils and other relevant bodies — for example, the Evaluation Taskforce, Policy Lab, Policy Profession, and wider government evaluation community — in a series of workshops and conferences over 2026 to develop a strategy for metagovernance.
Conclusion
With each month that passes, it is harder to deny that our system of government is in, or nearing, a crisis. This is not to say that a crash is inevitable, but that — as with climate change — we risk passing a threshold at which feedback loops, such as collapsing trust and populist government, spiral out of control, making the situation difficult to repair.
In this essay I have argued that this crisis is caused partly by government being stuck with a narrow field of view. We are going over the same redundant methods, again and again, with diminishing returns, often struggling even to hold our current ground.
In such a situation, the only responsible course of action is to urgently broaden our field of view. We need to find and spread better ways to find and spread alternatives.
Maybe it will turn out that our best options were already within our narrow field of vision — maybe, as the cliché goes, we have found the worst system there is, except for all the others.
But it seems increasingly likely that this is fatalistic nonsense, and that what we are really suffering from is a bad case of institutional sclerosis. Better alternatives almost certainly exist, and we should invest serious money, time, and political capital to find them and to support their wider adoption.
For more on the role of narrow evidence requirements in government, see “Are policies like medicines?”
