Automatic for the people?
A prescription for AI 'mainlined into the veins' of government is a bad metaphor for a dangerous policy
No sooner had the Starmer Government’s Action Plan for AI hit the news than Elon Musk was demonstrating what the AI take-over of state power looks like. Although the Action Plan vibes more Third Way than Third Reich, the injection of AI capital into the UK public sector is intended to be just as radical. The Tony Blair Institute, among the few global cheerleaders for Musk’s DOGE project, has its fingerprints all over the Plan: its beneficiaries are the same ‘Founders’ and ‘Frontier’ corporations that the Trump government is currently mobilising, or being mobilised by, in the corridors of global power.
Just after the general election I ran through a report called ‘Governing in the age of AI’, penned by the Tony Blair Institute, Faculty Science Ltd (a UK AI company we will hear more about shortly) and ChatGPT itself. I predicted that this AI junk would prove irresistible to a Government in need of quick fixes for the public sector. And here we are, straight to the vein.
In the six months since that first report, the tech-to-public transfusion has been nursed along by Matt Clifford, AI venture capitalist and entrepreneur, Effective Altruism fellow traveller, and the man credited with introducing Rishi Sunak to the CEOs of OpenAI and Anthropic. While penning the Action Plan, Clifford was also (and may still be - his current status is unclear) a personal investor in Faculty and the Hakluyt AI investment fund. He is of course a properly disinterested expert - why else would he play a leading role in the UK AI Safety Institute? (We’ll also be hearing more about the work of the AISI.)

TL;DR
So what is in the AI pill for every public ill, from potholes to planning delays?
I look at the lack of evidence for public benefit from the deployment of ‘AI’, even in some of its most compelling use cases.
Cutting through grift, the Action Plan has three items: ‘sovereign’ compute that is under foreign control, murky data pools, and ‘world leading’ safety regulation that leaves the UK a playground for the biggest AI bullies.
Big tech’s hunger for public sector contracts can tell us a lot about its sources of power and its weaknesses. I look at why the ‘AI’ project needs public money and public data so badly, and why the UK is such a good place to get it.
A recent contract for educational data, awarded to Faculty, shows the continuities between generative AI (content as data) and predictive AI (people and their behaviours as data). When data of the second kind is acquired by powerful corporations, operating in what the Action Plan calls ‘deep collaboration with the national security community’, we should expect it to be used to discriminate against people where they are most vulnerable.
With the Starmer government fuelling the same politics of resentment that have put Trump into the White House - and put Musk’s engineers into the infrastructures of the US state - it matters more than ever who has hold of public data, and how people in the UK are exposed to data-and-algorithmic power, here and elsewhere, now and in the future.
AI Action Plan: nothing to do with the public interest
Let’s start our comedown with a report from the Joseph Rowntree Foundation, led by Imogen Parker of the Ada Lovelace Institute (ALI). Looking hard for evidence of productivity gains and better policy making, Parker concludes that there has been almost no real-world evaluation of public sector AI. That’s right: almost nothing is known about how 'AI’ is actually being deployed, or with what results, and there seems little interest in finding out. This
lack of monitoring and evaluation of AI within the public sector is compounded by the opaque market environment that exists outside of it… The ability of public sector decision-makers – from procurers in academy chains and NHS foundation trusts to senior leaders in the Civil Service – to make effective decisions about AI is limited by this one-sided market.
A separate review by Bristol Policy (2023) concurred that ‘public buyers are not adequately placed to confidently and responsibly procure AI technologies for the benefit of citizens’ and are ‘under a structural conflict of interest due to their operational interest in the deployment of AI’. And then there’s this:
Despite the widespread use of computational predictions in social settings… there’s remarkably little published evidence of their effectiveness.
That’s from Narayanan and Kapoor, Princeton University computer scientists and authors of AI Snake Oil (here reviewed by Edward Ongweso Jr). They argue that so-called ‘predictive AI’ is not just unproven in public use: when it comes to making decisions about human lives, messy and emergent and complex as they are, automated systems are flawed in principle.
In practice, things also look bad. The Guardian reports that after a year of intensive work with OpenAI, Anthropic, Google and Microsoft, the Department for Work and Pensions has abandoned most of its potential AI projects. The Guardian also separately exposed that an AI system in use by the DWP is systematically biased in how it identifies claimants to be investigated for fraud. So while it’s reassuring that so many bad ideas are being dropped, it’s not at all reassuring to learn that ‘AI’ is already deeply embedded into public decision-making, and is about to be massively accelerated, despite the mainly negative evidence of its value.
There are more clues in this somewhat testy evidence from the (short-lived) Head of the Whitehall AI Incubator - formerly in defence intelligence, though we don’t know if she is returning there. Explaining why so few ‘promising’ applications have progressed to actual use, she has nothing to offer by way of lessons learned or lowered expectations. There is only dismay at the ‘blockers, frustrations and false starts’ that have hindered AI adoption. The point of piloting is not, you see, to test the promise against real-world conditions but to push on regardless: ‘if something fails, we try, try again and find another route to impact’.
Ah yes, impact. My deep dive into ‘AI in education’ research last year found a large literature that has education as its context. But it is overwhelmingly a literature of technical developments being evaluated against technical benchmarks. Such studies are already biased towards positive outcomes - technical systems often do improve against benchmarks over time. But actual use cases are only speculated about, or a token test case may be tagged on with the goal of proving ‘acceptability’ in some well-suited context of practice (‘acceptance’ and ‘trust’ are all that’s needed for the benefits of AI to be unleashed). Meanwhile, evaluating the effects on actual people… is constantly deferred. The same is true of swathes of studies in medicine and the care sector: benefits/impacts exist almost entirely in the conditional tense.
At least one of the first Actions of the Plan is to fund a serious trial into AI support for breast cancer scanning. If you thought the benefits had been proven long ago, that is not exactly the case. Diagnostic imaging has been a promising application for decades with hundreds of studies undertaken, but these efforts have only very recently shown that they can produce efficiencies in real-world diagnostic settings (and there is still some controversy - there’s a further summary below if you’re interested).
The trial looks at whether the use of AI could replace one of the two radiologists who are currently required to assess each scan. This could, on the face of it, save some precious human resources. But meanwhile, the shortfall in trained radiologists is projected to be 40% by 2028. Diagnostics has been a target for privatisation and outsourcing, not least because it is so technology-intensive, and the Society of Radiographers points out that this does not increase capacity for the NHS but introduces new inefficiencies in the deployment of staff and the handling of patient data. A group of leading cancer specialists wrote an open letter to Wes Streeting as he took over at Health, begging him to stop looking for a ‘magic AI bullet’ and start training more radiographers.
So the best outcome of this lengthy trial is that a few radiographers can be moved from one part of the diagnostic process to plug shortfalls somewhere else. And we still will not know how the workflow might have improved if that money had been spent on training radiographers.
With trials like this being lengthy and uncertain, the Action Plan cannot seriously hope to pilot ‘hundreds’ of AI applications for public benefit. Luckily for would-be AI contractors, adoption does not have to wait on the evidence. Chancellor Rachel Reeves is determined to find public sector spending cuts, and in services that rely heavily on skilled workers, this can only be achieved through automation and deskilling. That is the hoped-for magic bullet.
Dozens of contracts for AI and algorithmic/data services have already been awarded, including £330 million to Palantir to run the NHS Federated Data Platform and - before the diagnostics trial is even under way - a £2m contract to Faculty to:
pilot the NHS Artificial Intelligence Deployment Platform (NHS AIDP)… [a] standardised infrastructure for the deployment of mature and certified AI medical imaging technologies at scale in the radiology workflow.
As the diagram shows, the contract gives Faculty the right to process image data and diagnostic information between private providers of ‘AI’ and NHS Trusts. As part of this contract, Faculty was also asked ‘to assist in the evaluation and moderation of tenders’ from ‘mature and certified’ AI vendors. So the data infrastructure is already built, power is already being ceded to the owners of the infrastructure, and vendors are already being plugged in.
Remember this model of value extraction when we come to education. For now, let’s return to the Ada Lovelace Institute report and its radical idea that, if you want to discover the public benefits of a new technology, you need 'qualitative, relational and experiential understanding from both publics and professionals'. That is, you should listen to people working in the system who understand the problems and bottlenecks that ‘AI’ is meant to solve. You should listen to people on the sharp end of the service that ‘AI’ is supposed to improve. (And perhaps, just as an aside, productivity isn’t the only purpose of the public sector). But the AI Action Plan strikes through woke nonsense of this kind. In its drive to ‘rapidly pilot and scale’, the real stakeholders are not the ‘people’ at all, but the technology companies.
‘In every element of the action plan, the government should ask itself: does this benefit people and organisations trying to do new and ambitious things in the UK? If not, we will fail to meet our potential.’
Compute: the UK as data park
Yes, so long as it is new and ambitious, the UK Government is in. So long as it is doing a thing in the UK, the UK Government doesn’t want to own the thing, manage the thing, steer it, tax it, regulate it, or ask what tangible benefits the thing is going to bring to the people who elected it. The UK government is just happy - very very happy - that anyone with ‘AI’ wants to do a thing in the UK. Especially if that thing is a data centre.
By focusing on computing power and data infrastructure, the Action Plan does at least recognise the business end of the AI juggernaut. The compute requirements for generative AI, at every stage of training and inference, are mind-bending. In fact, venture capital has piled into energy infrastructure as well as data centres, anticipating how much new carbon will need to be burned to access all that promised value. Though demand for AI compute is increasingly understood to be an accelerant to the climate crisis, for those on the supply end it is a massive accelerant to profit.
Chip-maker Nvidia became the largest and most profitable company on the planet in 2024. Boss Jensen Huang wants to make Nvidia data centres the ‘intelligence infrastructure of the world’. Amazon Web Services and Google Cloud (that is, providers of computational infrastructure) are the only divisions of their respective corporations that are making serious money from AI. Leaving aside the question of whether the world can sustain such an investment, it would seem a sensible plan for any nation with ‘growth’ ambitions to have its own infrastructure. The Ada Lovelace Institute (again!), in an April 2024 report on the Role of Public Compute, pointed out that:
As the compute intensity of cutting-edge AI research has increased… the participation of smaller firms and academic centres in AI research has decreased… allowing a small number of leading companies, often funded by venture capital, to effectively monopolise AI development. Venture-capital-funded development needs to provide exponential returns, which pushes towards building winner-takes-all products that can provide significant profits in a short period of time.
If the Government really wants to support UK start-ups and UK research, it needs to reduce dependence on these corporate giants. So welcome ‘StarmerGate’ - naturally on a more modest scale than Donald Trump’s $500 billion Stargate project - with its promises of ‘streamlined planning’ for new data centres and measures to ‘accelerate the provisioning of clean power’ to them.
‘Streamlined planning’ means preferential access to land and ‘accelerate the provisioning of clean power’ means preferential access to energy.
But here’s the problem: none of it will necessarily be owned by the UK public. Even the category of ‘sovereign compute’ (‘the smallest component’) is only meant to be ‘owned and/or allocated by the public sector’. The rhetoric is undoubtedly supposed to tickle post-imperial fantasies, talking up the UK as ‘an AI maker not an AI taker’. But in the press release boasting of the £millions of inward investment, the infrastructure companies are not from these shores. One UK-based cloud provider even complained in the FT (paywalled I’m afraid) that ‘by continuing to depend on overseas providers, the UK risks regulatory conflicts, security vulnerabilities and a fragile AI ecosystem dominated by external interests’.
The truth is these data centres will be ‘based within the UK but privately owned and operated’, with profits flowing to the host companies, overwhelmingly US tech giants. If Trump and his best buddies force Starmer to give up on the Digital Services Tax - and things are moving so fast, it may have happened by the time you read this - it’s hard to see what benefits will remain for the UK.
Because, aside from the empire-in-rags rhetoric, the UK is a puny player. The EU is investing €1.5 billion in supercomputer sites, and already questions are being asked about the sustainability of this: chips need to be replaced every 8 months or so, and it’s doubtful any European AI company can generate enough revenue to justify the costs. The sovereign wealth fund of Saudi Arabia recently entered a partnership with Google Cloud services worth hundreds of billions of dollars. But even in Saudi Arabia, where ‘sovereign wealth’ betokens some serious global power, Google will retain full control of the models and the data centres.
In a sign of just how craven a client state the UK has become, the Government recently caved in to demands by the AI industry to exempt AI model training from UK copyright law. Minister Patrick Vallance had the unenviable job of explaining to a posse of national treasures, from Kate Mosse to Kate Bush, why he wants to ‘support the development of world-leading AI models in the UK’ rather more than he wants to support the UK’s genuinely world-leading creative industries.
By contrast, when ‘Stargate’ crowds private and foreign state capital into US data infrastructure, there are obvious benefits to the US state. Massive subsidies for OpenAI, Oracle, Microsoft, Nvidia and Arm to continue their domination of the global AI stack. Support for the techno-dollar as the world’s reserve currency. Capture of data and content from the rest of the world. And all the geopolitical and strategic advantage of that accumulated data, capital and expertise. Even before the recent government-AI merger under DOGE, this Forbes article (approvingly) notes:
Trump is aiming to use the country's huge store of energy reserves to power growth, especially in areas where the U.S. is trying to secure global dominance. The infrastructure will provide crucial capabilities for national security applications, such as cybersecurity enhancements, autonomous defense systems, and AI-powered surveillance.
There is no obvious pay-off to the UK from parking those data-and-surveillance engines on our soil. Like other things the US has parked here - cruise missiles and Elon Musk’s politics for example - data centres are not a sign of strategic influence but of how far the UK must bend to accommodate its more powerful ally. Of course it is hoped that all this ‘compute will create spillover benefits in the form of jobs, investment and new, AI based, service businesses’, but ‘spillover’ acknowledges that it won’t be data centres themselves that create the start-ups - that will require entirely different kinds of investment. Most of the jobs data centres create are in their construction, because the whole point is that they are investments of capital: very few people are needed to keep the GPUs burning.
On the other hand, data centres have significant costs. Globally, they will soon outstrip the power demands of most nation states. Every kilowatt-hour of that power requires around 2 litres of clean water for cooling. These demands lead to local droughts and power outages, as well as toxic piles of e-waste as generations of chips are successively burned out or superseded. Data centres are being resisted by communities in many parts of the US, and by communities in Chile and Uruguay and Mexico where (naturally) the ‘frontier’ companies went prospecting for cheap land beyond their borders.
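To get a rough sense of scale, here is a back-of-envelope sketch using that figure. The facility size and full utilisation are my own assumptions, purely for illustration; only the ~2 litres per kilowatt-hour ratio comes from the claim just cited.

```python
# Back-of-envelope sketch only: a hypothetical 100 MW data centre running flat out,
# using the ~2 litres of cooling water per kWh cited above. Figures are illustrative.
facility_mw = 100                                    # assumed facility size, not from the Action Plan
hours_per_year = 24 * 365
energy_kwh = facility_mw * 1_000 * hours_per_year    # kW x hours in a year
water_litres = energy_kwh * 2                        # ~2 litres per kWh
olympic_pools = water_litres / 2_500_000             # ~2.5 million litres per pool

print(f"Energy: {energy_kwh / 1e9:.2f} TWh/year")
print(f"Water:  {water_litres / 1e9:.2f} billion litres/year (~{olympic_pools:.0f} Olympic pools)")
```

On those assumptions a single site gets through well over a billion litres of clean water a year, which puts the Thames Water story below into context.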
Closer to home, in Ireland, data centres have been using more power than urban homes since 2023 and are threatening to derail that country’s climate goals. The UK has been a small player until now, but The Raconteur still reports that:
In 2021, Slough effectively ran out of power. The construction of a new substation was required to meet [a] data centre’s growing electrical power demands. In 2022, the Greater London Authority warned that the building of new houses in West London had been hampered by a lack of energy capacity, as power was guzzled up by nearby data centres. And in 2023, Thames Water threatened to restrict the water services provided to data centres if they did not curb their consumption.
Let’s notice that the target areas for the new data parks are those ‘post-industrial towns’ where land and construction labour are cheap and there may be less organised opposition. At least some of them will be located in the Freeport Zones - free from local taxes, workers’ rights, safety rules and public transparency - that Starmer’s Government quietly inherited from its predecessor. And let’s consider what ‘coastal Scotland’ might have to offer. Coastal Scotland where, despite a complete lack of respect for energy transition agreements, the Government seems minded to approve a massive new oil and gas drilling field, and to honour every existing licence to ‘drill, baby, drill’. Coastal Scotland is where all that fossil fuel comes ashore.
It’s true that some of the new compute - the smallest ‘sovereign’ part - could be used for projects the Government deems to be in the ‘national interest’. This allocation, says the Action Plan:
could be modelled after the Defense Advanced Research Projects Agency (DARPA) or the Advanced Research and Invention Agency (ARIA) to quickly and independently provide large amounts of compute to high-potential projects of national importance, operating in a way that is strategic and mission driven.
The reference to DARPA - a give-away for the military origins and use cases of ‘AI’ - is also a gesture towards what ‘national importance’, ‘strategic’, and ‘mission-driven’ are likely to mean. But linking DARPA with ARIA, the UK innovation agency headed by one Matt Clifford, is hardly a sign of the UK’s equal status. ARIA receives less than £10 million in funding from the UK state, while DARPA’s budget last year was over $4 billion. So whatever Matt Clifford’s ambitions, there is not much ARIA can offer to counterbalance the demands of the real global hegemon here.
Murky pools: the UK as data source
Other nations have land and power supplies they are willing to trade for AI dollars, but the UK has something rather special. Our public services are among the longest- and best-established in the world, and this provides big tech with two resources: a captive market for data services, and a rich source of public data.
As the ‘best state partner’ for big tech, the UK certainly intends to be a captive customer.
So public sector ‘piloting’ should not be seen as providing an evidence base for the public interest but as a wedge for AI adoption. Feedback from the public may produce some improvements, but once adoption is locked in, the incentives to improve are small. Rather, public sector use cases drive adoption into more lucrative markets such as health insurance, education services, private security and private care homes.
At this point we say hello (again) to Faculty Science Ltd, the company that first rose to attention as the AI partner of the Vote Leave campaign. It has since built a sizeable portfolio in defence and security and other UK government data contracts, including - ironically - helping to tackle ‘disinformation’. We last heard of them in the context of Rishi Sunak’s AI Safety Summit, where for several hundred thousand pounds Faculty ran a two-day chat-athon about AI in schools. In their own write-up of this event, Faculty staff seemed confident of further funding to ‘try out’ some of the use cases they identified. And here in the Action Plan, as one of the ‘great examples of AI use-cases driving tangible benefits across the private and public sectors’, comes the news we have been waiting for:
the Department for Education’s generative AI and rules-based marking tool showed 92% accuracy in a pilot with teachers on year 4 literacy work when drawing from appropriately coded educational data and content [footnote 15]
As the only educational use case it does seem rather lame. And the embedded link goes to an entirely different report. But after diligent research I turned up the promised Use cases for generative AI in education: technical report, and was rewarded to discover that it was indeed Faculty that developed the ‘rules-based marking tool’ and wrote the elusive report. Though I’m afraid it is so bad I’ve left the details in a footnote, to be read only when you have a strong antidote to despair close by.
The point of these exercises in futility is not to build a viable product, and certainly not to find out what teachers need to help with their workload. It is to make the case that AI is essential, and can only be developed with huge piles of educational data:
it is important for [pupil] work to be assessed in both the context of other work from the pupil and, in comparison to the rest of the class or year group… collating these insights over a longer period allows teachers to more effectively monitor progress at both an individual and class level
It’s almost as if learner analytics never happened. Or rather, it’s as if learner analytics has not been promising exactly this for two decades, during which time teachers have gone from a few hours’ admin a week to incessant data work that is displacing time spent with pupils and driving many out of the profession. And doing all of this with AI will require even more data work:
including explicit structures and markers that AI tools can recognize and interpret is critical to enabling [documents] to be more effectively used by AI-powered tools. This could mean tagging learning objectives, error types, and correction suggestions in a standardised format that AI applications can process.
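To see what that extra data work amounts to, here is a purely illustrative sketch of the kind of ‘standardised format’ being asked for. Every field name and code below is my own invention, not Faculty’s or the DfE’s; the point is simply that each tag is a new piece of structured data about a pupil that somebody has to produce, and that the platform then holds.

```python
# Hypothetical example of 'tagged' pupil feedback in a machine-readable format.
# None of the field names or curriculum codes come from Faculty or the DfE.
tagged_feedback = {
    "pupil_id": "anon-0042",                 # pseudonymised identifier
    "year_group": 4,
    "learning_objective": "EN-Y4-COMP-03",   # invented curriculum code, for illustration only
    "error_type": "spelling",
    "span": {"start": 57, "end": 69},        # character offsets in the pupil's text
    "correction_suggestion": "accommodation",
    "teacher_comment": "Check double letters in longer words.",
}
```

Multiply that by every error, every piece of work, every pupil and every year group, and the scale of the data work - and of the resulting data pool - becomes clear.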
Of course Faculty is not alone in demanding more data, and more people doing more data work, so that they can make everything better ‘for people’ while keeping hold of all the data. We have heard the Tony Blair Institute demanding that every classroom becomes a ‘real time data environment’. Another public-spirited organisation, Google Cloud, recently paid Public First to assess the benefits of AI in the public sector, using the same method as the TBI (asking ChatGPT to analyse job classifications and running the results through Tony’s fantasy calculator).
And this bounty will only flow with proper homage to the data gods: ‘modernis[ing] Government IT infrastructure to ensure that AI is built on secure, efficient cloud computing foundations’ (like Google Cloud, perhaps) and ‘link[ing] together public datasets’ through common access points.
And this is where we come to the second big goodie bag the Action Plan is offering. Because if there is one area where the UK does punch above its weight, it is in public datasets. The UK population is similar to that of the US in terms of health and disease, education systems, language and media consumption. But because - unlike the US - we have had a social state for much of the last 75 years, we have the largest bank of healthcare data in the world, and well-established record systems in education, social care, justice and publicly-funded research. All prized resources for private corporations that operate in those fields.
The Action Plan proposes a ‘National Data Library’, which sounds as though we members of the public can pop along and take out some data in our lunch hours. And the Open Data Institute does in fact have a user-first vision for the Library with open standards, public governance, and ‘giving people and communities meaningful tools to have control over how their data is used’. But when the ODI asks, rather pointedly - ‘who are the key audiences for the Library - researchers, AI developers, public servants or members of the public?’ - only one of those groups, I’m afraid, is in the frame.
The Federated Data Platform for NHS data was awarded last year to Palantir - the company established by Peter Thiel to build military spyware and security tech. This contract has failed to win the trust of doctors, or of NHS patients, and The Good Law Project contends that it was not given very ‘careful consideration’ either. The Faculty contract for medical imaging data is meagre by comparison, but the two companies have ongoing partnerships, as well as striking similarities in their military and security origins, and a shared model of playing their ‘security’ credentials to win contracts for sensitive public data.
And Faculty is now fishing in a new data pool. In August 2024 - just as its ‘use case in education’ report landed - the company was awarded a contract worth up to £4.5m, to create an educational ‘content store’:
including curriculum guidance, lesson plans and anonymised pupil assessments which will then be used by AI companies to train their tools. [Faculty will] develop the technical blueprint for a secure content store … with the capability for differential access and charging requirements should those be required.
Meanwhile, it is for other companies to develop the actual products.
You have to wonder what these companies think of being invited to Faculty’s data party, and asked to make tools to Faculty’s recommendations (for a fraction of the money). Still, we educators know nothing about the free market, so we will just have to trust that this is all in the interests of innovation.
Meanwhile, educators should be concerned about the ambition to include ‘pupil content’ and ‘anonymised pupil assessments’ in the data pool. There is at least a plausible reason for National Curriculum content to be used to augment the generation of lesson plans and marking schemes. What this specification seems to envisage is for pupil work to be assessed individually, by the same systems/data architectures that use that pupil work as training data. The error-in, feedback-out pipeline of Faculty’s proof of concept, despite its gaping flaws, may at least reflect a genuine ambition: that old dream of the personalised teaching engine.
In AI Snake Oil, Narayanan and Kapoor make a distinction between ‘predictive’ AI, which uses data about people and their behaviour, and ‘generative’ AI, which uses cultural content. While cultural appropriation has many risks, especially for people of minority cultures, the impact on individuals may be indirect. Predictions based on personal data, on the other hand, have obvious and direct consequences for people’s quality of life: whether they can get insured, whether they are targeted for arrest or investigation, whether they are identified as ‘falling behind’ relative to the national curriculum standards.
You’ll have noticed that the Faculty content store is designed to manage both kinds of data. And there is a clue here. As discussed in other parts of the blog, generative AI use cases are not nearly valuable enough to justify the epic expense of training and maintaining the models. Student essays, boilerplate code and SEO pap are not going to cut it. Either specialist models must be trained or generic models must be retrained with codified expertise if they are to automate anything of actual value.
There is an older, simpler and more proven value proposition, and that is just to get hold of social data. If that data relates to people’s purchasing or political behaviours or media consumption habits, it can be sold on. If it relates to people’s interactions with health, education, care, the criminal justice system, it can be used to model risk within those systems. It can also be used to build alternative, shadow state services in which managing people as risk is the point rather than the method. The fast-fading sheen of generative AI is being used, it seems, as cover for social automation, social discrimination, and control of social data.
Regulation, deregulation, misregulation and safety testing: the UK as cheerleader
So we come to the third aspect of the Action Plan: ‘world-beating’ AI regulation. And this is where the doublespeak really goes full Orwell. Ignorance is knowledge. Regulation is deregulation. ‘Safe’ AI is ‘military’ AI. As if deregulated tax, deregulated planning and deregulated public data were not enough to propitiate the AI gods, the UK is doing its best to help deregulate the rest of the world. When I started this post, despite intense lobbying, the EU had shown at least some regulatory resolve in key areas such as data privacy, energy efficiency and AI safety. As I conclude it, threats of a military nature by the Vice President of the United States, a man who owes his entire career to libertarian AI Founder Peter Thiel, have led the EU to withdraw what was left of its AI Act: effectively to give up on regulating tech capital in the interests of European citizens.
Did it make a difference that the UK was the only other country to stand with Musk and Trump against the Paris declaration on open, ethical, secure and trustworthy AI? We will probably never know.
Again, when I started this post, the UK at least had something called an ‘AI Safety Institute’ that I was fully prepared to satirise. But reality outruns satire. Peter Kyle has just agreed to rename the AISI the ‘AI Security Institute’ and ‘shift[] the UK’s focus on artificial intelligence towards security co-operation rather than a “woke” emphasis on safety concerns’.

The most sympathetic reading of the Action Plan (as detailed here by James Smith on Mastodon) is that it promised to do nothing at all on regulation, pending further developments. But those further developments are very much under way, and Starmer has wisely observed that it is not enough to be a bystander when the bullies own the playground: anyone who does not join in is going to get their head stamped on.
In truth, the UK has dogged the US at every turn, allowing AI companies to sign up to voluntary commitments on safety, and ensuring the ‘AI Safety (now Security) Institute’ was deeply embedded with industry interests. Remember the role of Faculty in helping to select those AI applications ‘mature and certified’ enough to play in their data pools? You wondered, I’m sure, who could provide this kind of certification. And here are two contracts for Faculty (worth £1.3 million) to conduct safety testing for the Institute, in close partnership with the much better endowed US AISI (it really doesn’t matter at this point if you can’t tell them apart).
As the Government’s own note makes clear, ‘safety evaluation’ is a purely technical exercise that compares the performance of new models with existing benchmarks. The OpenAI o1 model, for example, outperforms the ‘reference’ model on cybersecurity tasks, but the report from the UK/US AISI only notes that these superior capabilities ‘can be leveraged to strengthen cyber defences and can also be used maliciously to exploit a computer system’; so it’s not clear how ‘better performance’ translates into ‘better safety’ for users and data subjects. Faculty’s founder, Marc Warner, is not keen on doing even this level of testing. When the contracts were awarded he opined that there was not enough capacity at the UK AISI: rather than being ‘on the hook’ for doing the actual work, his goal was to ‘set standards for the rest of the world’ - presumably the same standards agreed by the US AISI with the US AI industry. Since that august body seems now to have been abolished, it would be interesting to know whether the UK AISI is even in possession of its own rubber stamp.
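For readers unfamiliar with what a ‘purely technical exercise’ looks like in practice, here is a minimal sketch of benchmark-style evaluation: score a new model and a reference model on the same task set and report the difference. The task list and scoring rule are placeholders of my own; nothing here reflects the AISI’s actual test harness.

```python
# Minimal sketch of benchmark-style 'safety evaluation': compare pass rates of a
# new model and a reference model on the same tasks. Tasks and scoring are invented.
from typing import Callable

Model = Callable[[str], str]

def pass_rate(model: Model, tasks: list[tuple[str, str]]) -> float:
    """Fraction of tasks where the model's answer matches the expected answer."""
    hits = sum(1 for prompt, expected in tasks if model(prompt).strip() == expected)
    return hits / len(tasks)

def compare(new_model: Model, reference_model: Model, tasks: list[tuple[str, str]]) -> dict:
    new_score = pass_rate(new_model, tasks)
    ref_score = pass_rate(reference_model, tasks)
    # What this does NOT capture: who uses the capability, on whom, with what consequences.
    return {"new": new_score, "reference": ref_score, "delta": new_score - ref_score}
```

A positive ‘delta’ on, say, cyber tasks is reported as superior capability; whether that capability hardens defences or sharpens attacks is precisely the question the benchmark cannot answer.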
Technical evaluations of model performance, in any case, say nothing about the harms that models may do in real situations. Here is a nice visual summary of the issue, from a recent paper by Gabriel Mukobi at Berkeley.
The gaps between technical evaluation and social mitigation may seem theoretical, but a quick read shows that we are all living through the contents of the first and second gaps, and many people are already plumbing the ‘Catastrophic’ depths of ‘Possible Harms’.
Standing with Elon Musk and JD Vance against ‘open, transparent, ethical and sustainable’ AI is one approach to world-leading regulation, though perhaps not the best one for avoiding Harms. No doubt eager to obscure this capitulation behind a more populist issue, Peter Kyle was sent forth (again) to trumpet that the UK would be the first nation to create a specific law against AI child abuse imagery.
Just to be clear, creating sexualised images of children (and other people) with AI is already illegal, but the legal system is unable to keep up with the tsunami of violent, degrading, fake and exploitative images that has been ‘unleashed’ by synthetic image generation. In the last decade of relentless technological upgrade there have been cuts of 22% in public funding for justice. So the problem is not a lack of laws. Not even the Internet Watch Foundation, which campaigned for legal changes around AI-generated images, thinks these particular changes are going to make a big difference. How will ‘optimised’ AI models be defined, when a generic model can be optimised with a prompt injection, or a database of private images? And has anyone written a ‘manual’ for doing technical stuff since, I don’t know, the 1990s? If accessing the relevant threads on Telegram, Reddit etc etc is too difficult, ChatGPT might be able to help, now that OpenAI has agreed to ‘ease up on AI paternalism’.

I notice that people who spent 2023 reassuring us about guardrails have pivoted to guardrails being childish nonsense. I notice that, once again, it is the ill-defined activity of ‘building AI tools’ that is prohibited, and not the use of ‘foundation’ models to do exactly the same thing. Because, what with OpenAI’s new ‘grown-up’ approach to safety, Meta dissolving its Responsible AI Team, Musk launching an ‘even more uncensored’ Grok, and the Trump administration throwing out Biden’s already weak-as-piss AI safety rules… foundation models don’t really have much left to stop bad actors using them in the worst possible ways.
This is not a drill
Though the media class on this side of the Atlantic seems to be in a state of fascination / dissociation about events over there, the parallels are striking. In Starmer we have a weak, technocratic leader whose tech projects will fail to deliver any service improvements, but will put oodles of public data into private hands. The profits from managing this data will be committed to the kind of politics that keep tech capital ahead: tax and legal loopholes from the state, and the micro-politics of resentment, ignorance and white entitlement, amplified through deregulated social networks. Musk-backed or self-propelled, Farage is waiting in the wings.
It does not take a paranoid imagination to join the dots. There are direct links between AI in the public sector and projects of military and security control: between Palantir having access to your sexual health data and trans people being harassed in US jails; between OpenAI writing lesson plans, and helping to target civilians in Gaza, Syria and Lebanon; between Musk enshittifying twitter, Musk withdrawing Starlink support from Ukrainian troops when the mood takes him, and Musk denying medical services to tens of thousands of people unfortunate enough to depend on USAID. Google Cloud and Azure are running your university services today, and tomorrow they are doing whatever Pete Hegseth decides they should do, in the interests of US billionaires.
There is still time for the UK to wake up and clean up from AI. None of this is in the public interest, and it creates massive opportunities for vulnerable people to be harmed. No more public data should pass into private hands without robust, transparent and well-funded safeguards of public accountability. Institutes like the ALI and ODI have practical solutions for this, independent from the AI industry. And it should be obvious from events over the pond that this is not a drill.
Aside on two UK figures that link several parts of this story
Dominic Cummings started out in AI, famously used Faculty’s services to engineer public opinion behind the Vote Leave campaign, and moved seamlessly on to being the power behind the Boris Johnson throne. After a rapid fall from grace over Covid-19 scandals and lies, these days he can be found advising Elon Musk on UK politics and small Government, and apparently telling Starmer’s Labour party how to crush crime and build infrastructure in partnership with AI.
Lord Peter Mandelson, whose company Global Counsel manages PR for Palantir, is reported in the FT to have advised Starmer to ‘build relations’ with Elon Musk using Nigel Farage as a ‘bridgehead’. As UK Ambassador to AI, sorry, the USA, Mandelson claims to have brokered the ‘signature policy’ of making the UK Safety Institute an offshoot of the US Security State, because the ‘backbone of the special relationship between the US and UK lies in its defence, intelligence and security partnerships’. This is a good fit with his previous adventures spearheading UK profits from the reconstruction of Iraq after another US-UK defence partnership devastated that country.
Aside on public sector productivity
If we cared only for productivity in the public sector, we should still pay attention to the productivity paradox - known and studied since the 1980s, recognised by workers everywhere - that computerisation does not improve productivity. This has been confirmed with AI applications in industries as diverse as construction and agriculture; a recent Upwork study found that 77% of employees in companies deploying AI felt they were working harder. Recent work by economist Daron Acemoglu and colleagues has found that across industries, automation produces modest productivity gains in some areas, but large increases in inequality and precarity across the board.
So when Pat McFadden opines that ‘AI could be the key to improving public sector productivity, which has been flat for two decades’, either he has missed three decades of intensifying data and algorithmic management - and its many failures - or he really does think that AI is something completely different. No relation at all to Tony Blair’s first e-revolution, including the ill-fated electronic patient records system that has still to be delivered, or the plan for digital identity cards that he is still energetically touting. No relation to any of the botched public IT projects of Blair’s first tech revolution that the Independent reported in 2010 had cost the taxpayer nearly £30 billion.
But perhaps we should not see productivity as the main goal of government. As Eryk Salvaggio argues in his Anatomy of an AI Coup, the entire rationale for Elon Musk’s DOGE:
is based on a productivity myth that the goal of bureaucracy is merely what it produces (services, information, governance) and can be isolated from the process through which democracy achieves those ends: debate, deliberation, and consensus
The public sector is supposed to contribute to people’s quality of life, and (within the many limitations of a mass system) the capacity of those same people to have a say over how ‘quality’ is defined and produced. Quality is always debatable, in the best sense of that word, but efficiency (speed, scale, standardisation) might well be antithetical to it. Caring attention is not scalable. Respectful interaction can’t be standardised. And public accountability is clearly wasteful, compared with the kind of tech start-up / monarchy / reich in which the Founder gets to decide how everything is run. Politics is like that.
It’s the opportunity to be involved with the quality of people’s lives that draws most people to work in the public sector. It’s hard to know why else anyone would put up with the low pay, long hours, and being on the front line of social break-down. Doing public sector work is not ‘efficient’ from the perspective of its workers’ own economic self-interest or personal thriving: what if ‘care’ is, precisely, the surplus you bring, over and above the transactional demands of the job? If you have been involved in community work for a long time, as I have, you will know that providers and recipients are not two sides of a transactional arrangement but roles that everyone plays at different times, and often at the same time in different ways (Covid would not have been survivable, for many, without mutual aid). Of course a public service is also a ponderous bureaucracy and a state enterprise, an instrument of social control and a risk management facility, in which this kind of reciprocal community is a distant ideal. But I suspect it is the ideal that keeps people working for it, and the whole thing tottering forwards.
Aside on diagnostic imaging
Diagnostic imaging (‘cancer scans’) is the use-case of choice for every AI champion looking to tug on the nation’s heartstrings - and I apologise for returning to it again. I’ve no expertise (obviously) in radiology, but I can read research. And researchers in quality journals like Nature express frustration that real-world clinical testing lags so far behind technical development; that improved ‘accuracy does not equate to efficiency’ even in highly specialised workflows where AI has been carefully integrated; and that bottlenecks in cancer diagnosis and care could be fixed far more quickly by investing in the workforce than by ‘novel’ technologies that take years to develop, refine and integrate.
A recent review in Nature should be required reading for everyone who believes that ‘AI is curing cancer’ (Peter Kyle, for one example). The reviewers looked at 48 clinical trials, of which only one did not show bias in the study design. Then they undertook a meta-analysis of groups of comparable studies and concluded there was ‘no evidence of AI tools reducing the time on imaging tasks’ in real-world use. Of course data can support clinical decisions - it has been doing so for decades - but only as part of a complex workflow involving many decision points, many kinds of expertise and many patient interactions. Machine learning tools are costly to bring into use, sometimes requiring specialist scanners as well as specialist data training, and all of that comes at the cost of other investments - the actual cancer treatments, the ongoing human costs of care, or the training of highly expert oncologists and cancer nurses.
Commenting on the new £11m NHS trial, the president of the Royal College of Radiologists recognised the potential but:
“while AI holds great promise, it also carries major risks”, she added, acknowledging concerns that AI algorithms would need to be very carefully designed so they produce equally reliable results for different groups of women regardless of their ethnic background. [The issue here is not only the data that AI systems are trained on: if AI detection leads to a few early cancers being identified even earlier, that may only shift resources of treatment towards patients who already present earlier.]
She also voiced concern that, while the trial is not due to produce results for several years, the longstanding shortfall in consultants in the NHS is expected to reach 40% by 2028.
Questions for journalists
I’m not a journalist, I’m an exhausted academic, but I do hope journalists who cover these issues are asking questions like:
What does ‘allocated by the public sector’ actually mean in terms of national computing infrastructure?
How will public interest uses in education, health and infrastructure be ring-fenced and prioritised?
How will profits be taxed?
How will any AI solutions developed with UK compute be regulated and made accountable to the UK public? Under what legislative regime will data be processed?
Will UK AI researchers be able to access public compute without compromising their independence through partnerships with big tech?
Will the National Data Library be managed and governed under the kind of open and accountable rules set out by the Open Data Institute? If not, who will manage and govern public data in the interests of the public?
Does the UK civil service have the expertise to commission and manage such infrastructure projects in the public interest, or will management continue to be outsourced to the tech sector itself?
What investments are being made in the UK water supply and power infrastructure to protect the public from the impacts of increased demand?
That User Research from Faculty: a Titanic fail
Use cases for generative AI in education: technical report is cited by the AI Action Plan as showing ‘92% accuracy in error detection’ with teachers.
The 92% success rate refers to the best error detection achieved in a technical trial. This was with a synthetic dataset of single sentences, to each of which a single grammatical or spelling error had been added.
No real student essays were involved, or real teachers marking them.
The report is a classic of the genre I’ve already described - the ‘technical development, technical benchmark, token use case’ pilot - but there was no technical benchmark. Rather:
‘the process for inserting errors from the NC [National Curriculum] into the essays [sentences] is essentially the error detection pipeline in reverse. Both pipelines were developed simultaneously.’
So, essentially, the developers have prompted ChatGPT to find a small number of basic errors in short sentences - that they introduced themselves by the same process - and this was done with less than 100% accuracy.
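To make the circularity concrete, here is a deliberately crude sketch of an insert-then-detect pipeline. The sentences and ‘error rules’ are my own inventions, and Faculty’s actual pipeline used an LLM rather than string substitution, but the measurement problem is the same: the detector is only ever asked to recover errors that its mirror image put there.

```python
# Toy illustration of a two-way error pipeline: the same rules insert an error and
# then 'detect' it, so 'accuracy' measures self-agreement, not marking of real work.
# The rules and sentence are invented; Faculty's pipeline used an LLM, not this.
import random

ERROR_RULES = {                               # hypothetical 'NC error' substitutions
    "people smile": "people smiles",          # subject-verb agreement
    "their bags": "there bags",               # homophone spelling
}

def insert_error(sentence: str) -> tuple[str, str]:
    """Corrupt a clean sentence with one known error; return (corrupted text, error key)."""
    correct, wrong = random.choice(list(ERROR_RULES.items()))
    return sentence.replace(correct, wrong), correct

def detect_error(sentence: str) -> str | None:
    """The same rules run in reverse: report which known error appears, if any."""
    for correct, wrong in ERROR_RULES.items():
        if wrong in sentence:
            return correct
    return None

clean = "The wealthy people smile as the poor throw their bags on the bunks."
corrupted, inserted = insert_error(clean)
detected = detect_error(corrupted)
print(f"inserted: {inserted!r}, detected: {detected!r}, 'accurate': {detected == inserted}")
```

Whatever fraction of its own insertions the detector recovers becomes the headline ‘accuracy’; no real pupil writing and no teacher judgement enters the loop.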
For contact with actual teachers (possibly as many as 18 of them) you have to track down an even more elusive Faculty report, Use cases for Generative AI in education: User research. It is, if anything, even worse - keep that antidote to despair close by. I finished the technical report and the user research report from Faculty very confused about who had participated in their research, how many of them, and what they had participated with - whether it was ‘examples of feedback the tool had generated’, wireframe screens (as included in the report itself) or a chance to ‘interact with the tool directly’ in some way.
Before anyone got to interact with the tool there was a survey (respondent groups not clearly defined), which opened by introducing ‘a magic tool’ for saving teachers time and then asked about AI.
If I was being picky, I would suggest that this framing doesn’t really avoid respondents ‘being biased by preconceived ideas’. But no further mention is made of the survey, so let’s move on. The use case chosen to progress to Proof of Concept is ‘giving feedback on student essays’, and the PoC seems to be the same technical development described as giving 92% accuracy in error detection using synthetic data and a two-way error/detection pipeline. In the technical report, 6 teachers or ‘super-users’ get to interact with the tool, though the 3 ‘responses’ they each generate are reported as though there are 18 teachers involved. In the user report (try to keep up), the number is 8 teachers (the same? different?) while 5 ‘AI leaders’ and a further 12 teachers are interviewed, some but not all of whom have encountered the tool. Nine students are just told about the tool and asked if they think it is a good idea: on the whole they don’t.
Confused? Don’t worry, because the generated outputs included in this report are so bad that we don’t really need to worry about what a cherry-picked handful of AI-inclined, teaching-adjacent participants made of them.
The PoC uses OpenAI’s GPT-4 to provide the interface and language model, and injects additional data in the form of the National Curriculum specifications for Year 4 literacy (which can be found here), and a number of documents about feedback practice. Two essays from the NC specifications have been used to generate sample essays and feedback on them: a short descriptive piece about the Titanic and an informative piece about pointe ballet shoes. The Titanic piece in the original is vivid and imaginative:
Wealthy people smiled at their beautiful and expensive accomodation while the poor threw their bags on their bunk bed all squashed.
While you look down on the floors and look on the walls and chairs you could see a sparkly clear reflection of yourself. It was so clean you could kiss the floor.
While there are a couple of grammar and spelling issues here, the NC specification does not really focus on them. Instead it provides (or models) detailed, perceptive feedback on the writer’s choices and skills. Faculty’s version of ChatGPT takes a rather different approach.
Having found the errors inserted for it to find, the model concludes that what this student really needs to practise is the formation of regular plural nouns by adding an -s. Even in year one, few pupils have problems with forming regular plurals in English, and this student can correctly use the adjectives ‘well-spoken’ and ‘impeccable’. So a real teacher - rather than a chatbot with an error pipeline - would have concluded that any missing ‘s’ was a slip of the pen.
There’s also a suggested exercise for this student, which is the function of this PoC that supposedly goes beyond what ChatGPT could provide through naive prompting with student work.
You’ll spot that there are a few problems with this as an exercise. In a., what’s actually wrong is the number, not the tense, of the verb (‘the wealthy people smile’ is fine, and so actually is ‘the wealthy person smiles’, unless from surrounding context you know that the sentence should be in the past tense: but you don’t, and anyway, why confuse the tense with the number issue?). There’s also a problem with the number of a different noun (‘accommodations’) if you are a speaker of British English. In b., the problem is not with the verb but with the adjective/adverb. And c. is not incorrect at all.
Meanwhile the writer of the pointe essay, who also has bizarre basic errors introduced to their text, is directed to spelling practice.
Even if these were all real errors (only the last is in the original), and even if it were plausible that a student capable of writing at this level could not spell christmas, and even if ‘beautifully (used as an adverb)’ were incorrect… presenting wrongly-spelled words and asking students to guess what they are supposed to spell is not a recognised approach in literacy teaching.
Remember that these are the two examples that the developers actively chose to present as a proof of concept and to publish in their report. As they admit themselves, even this quality of response:
derives from the inclusion of education-specific feedback guidance within the training data of these models, possibly sourced from across the globe. It should be caveated that this can only be said confidently for feedback on Year 4 English essays, and quality is highly likely to decrease for older year groups and other subjects.
Of course there are far, far better applications for correcting spelling and grammar - you are almost certainly using one now. And that may explain why both developers and participants assume the quality of these outputs is acceptable: everyone knows that computers can do this stuff and no-one is paying much attention. But the proof of concept actually has nothing to do with spelling and grammar: the proof is supposed to be of helping teachers to give personalised feedback on student work. The fact that it doesn’t work for the most favourable use case for automation, further reduced to a basic error production/detection pipeline, does not seem to get in the way of it being cited as the basis for the Government’s strategy on AI in schools.
The most punishing irony of all is that it was Dom Cummings’s friend Michael Gove who, as Education Secretary, drove through sweeping reforms of the UK literacy curriculum towards a rigid focus on ‘correct’ grammar and spelling. Gove did away with the ‘woke’ nonsense of having children acquire grammar naturally by using language - in ways that make sense and stimulate and are enjoyable to them - to insist on learning ‘the rules’ of such things as ‘fronted adverbials’, an approach that even Gove’s original advisers agree was utterly mistaken. The evidence remains as it always was, that teaching grammar does not improve students’ writing.
But meanwhile these reforms have meant that access to creative subjects has been ‘devastated’ for those children subject to the National Curriculum’s demands. Children’s enjoyment of reading has plummeted. And the Gove project of restoring ‘grammar’ schools and faux grammar schools as the gold standard of English education has amplified educational inequalities through the school system.
A belief in grammar as the ‘logic of language’, as representing ‘intelligence’, and as having one correct formulation regardless of context - the dream of a context-free language that could be parsed automatically - was the explicit project of computational linguistics up until the emergence of machine learning as a viable alternative. Its problems proved fairly intractable, due to what this recent MIT review of automated grammar correction calls ‘the inherent difficulties in defining an error according to the perceived communicative intent of the author’. Yes, ChatGPT and its derivatives can perform grammar correction tricks only because they have ingested vast amounts of grammar-teaching and grammar-correcting text, examples of which we can see swilling around half-digested in the Faculty ‘pipeline’. But ‘perceived communicative intent’ is what literacy teachers work with.
ChatGPT can spit out a correct sentence, but it is hopeless at helping students to understand their own inaccuracies and idiosyncrasies, or at inspiring them to write in more powerful ways. And yet, with AI’s insatiable demand for profitable use cases, this focus on what can be automated in the production of language will no doubt be ramped up. Teachers and children alike will be subjected to regimes of linguistic correction, and made to feel stupid even when the ‘smart’ bot gets it hopelessly wrong. The most privileged schools, of course, are exempt from Gove’s curriculum, as they will be exempt from AI ‘assistance’ to their teaching staff. But more than a century on from the Titanic, ‘the poor’ are still being ‘squashed’ into their linguistic bunks.