Among the millions of words written about ChatGPT, LLaMA, Bard and Bing, and the excitement about rethinking assessment / collaborating with artificial minds / disrupting higher education (choose your pulse-raiser), this sentence felt significant to me. It’s from Nature’s Editorial of 23 Jan, and it explains that AI tools will not be credited with (co)authorship of any papers published in Nature journals because:
any attribution of authorship carries with it accountability for the work, and AI tools cannot take such responsibility.
Not everyone concerned with student writing is so clear about the difference between a writer and their tools. The International Baccalaureate (IB) – an alternative to A-levels that includes compulsory ‘theory of knowledge’ – has taken a different line. In its latest guidance to students, it says:
‘as with any quote or material from another source, it must be clear that AI-generated text, image or graph included in a piece of work, has been copied from such software. The software must be credited in the body of the text and appropriately referenced in the bibliography’ (my italics)
If ‘software’ can be ‘quoted’, ‘credited’ and ‘referenced’ like any other ‘source’, it is being treated like a writer, or more like a writer than a typewriter.
As you might expect, the Association for Computational Linguistics brings more detail to the party. ‘Since these tools come from our own field’, the ACL Program Chairs begin confidently, ‘we are in the best position to appreciate their potential problems’. They go on to list a range of ways these tools might be used in writing submitted to ACL publications. ‘Polishing’ and ‘paraphrasing’ along with ‘short-form input’ are trivial and do not require any acknowledgement. Carrying out literature searches and describing ‘widely known concepts’ can be done with AI support but this should be noted in the method. Then we find: ‘if the model outputs read to the authors as new research ideas, that would deserve co-authorship or acknowledgement from a human colleague… we suggest acknowledging the use of the model’. However, ‘a contributor of both ideas and their execution seems to us like the definition of a co-author, which the models cannot be’. So: new ideas = not authorship; new language = not authorship; but new ideas + new language = authorship, which ‘cannot be’. (Does ACL mean that AI authorship is not permitted under its just-invented rules, or conceptually impossible, or just technically not achieved yet? These are philosophically very different propositions, yet all find a place in that small phrase. How tricky this business of fitting ‘ideas’ into ‘language’ can be!)
For ACL, the line between writing tool and intelligent author is not found in the technology itself but in its use, and the line can only be drawn by making a series of value judgements about the outputs. How long is ‘short-form’? How ‘widely’ must a concept be known before writing about it is no longer useful human labour? How extensively (and using what tools?) should human authors ‘check for known sources’ before deciding that a section of generated text (how long a section?) comes not from a single author but from the hive mind, requiring no attribution? It is good to acknowledge the human difficulties – and responsibilities – in making these value judgements. But there is little real guidance here. A lot rests on the difference between sense and style, between ‘ideas’ and their ‘execution’ in ‘language’, a difference that any student of linguistics would regard as deeply problematic.
Nature’s stance at least is clear. I like that it focuses on accountability, on writing as a responsible act and not just a spin on a pattern-matching slot machine. I can’t help feeling that if the conversation about AI writing tools focused more on writing and what it means to write, we might come back around to academic integrity and other scare-quote issues, such as the future of university learning, with more light to shed, and perhaps even with more workable ideas about student assessment.
Why write?
Because I do wonder how students are expected to understand academic integrity, let alone how academics are supposed to design ‘AI-proof’ assignments, if we don't talk about the purposes of writing, both in particular cases and in general terms. When students ask: ‘why is this piece of writing worth my time?’ – which comes now with the real option of farming it out to a chatbot instead of sweating the words in person – we need to look beyond integrity policies and even beyond immediate learning outcomes. Why is writing developmental, or how do we make it so? And what kind of people are developed through the writing we ask them to do? The Nature statement suggests that one answer to this question is ‘accountable people’: people accountable to the words they write.
‘Writing’ in this context means any language recorded as digital text. LLMs can generate computer code, lesson plans, film scripts and poetry as readily as academic essays. The ill-fated Galactica specialised in scientific genres; GPT-4 can be cued up to behave as a coding assistant or a poetry muse. As LLMs are trained on more specific data sets, or with better targeted prompts, they will be able to match different genres more closely. We can’t whack the mole by asking students to submit a video, for example, instead of a standard essay. Presentations and film scripts are very much part of the generative AI repertoire. LLMs do not ‘know’ that there are different genres of writing, or even what a genre is. Genres to LLMs are simply data patterns that emerge at a particular order of scale.
I’ve been lucky enough to work with friends and colleagues who support student writing in practice, many of them members of the Learning Development in HE network, and I’ve always found them deeply thoughtful about the purposes of writing at university, as well as deeply committed to it. They are full of ideas about how writing helps students develop not only strategies for getting better grades but things like voice, stance and identity (Roz Ivanič’s book on Writing and Identity is still a touchstone). In fact, without voice, stance and a sense of identity, strategies for writing well are much harder for students to make sense of. Many learning developers, observing this, have been drawn to ideas from language studies, ideas that I feel are also crucial to getting a handle on Large Language Models (LLMs) and what they do.
The language theory left out of language modelling
We should wonder at what LLMs can do. But our wonder should be directed at what they are actually doing, and not what their somewhat over-heated advocates claim they are doing, or what our own linguistically-oriented minds intuit that they are doing.
Back in the 1980s, when I was a student of AI, few people thought the puzzle of language would be solved by neural nets or crunching linguistic data. Transformational-generative grammars were what everyone was working on. That is, systems of rules for generating meaningful statements in language: ‘Good Old Fashioned’ or symbolic AI. Break the grammatical code that relates form to meaning and machine language would follow, with symbolic thought not far behind. (The ‘grammar’ of how things in the world relate to one another was thought to be close in form to the grammar inherent in language – a touching faith in the classical trivium of grammar, logic and rhetoric.) Except that it didn’t work. The code, if there was a code, remained stubbornly inside its black box. Symbolic thought was still being laboriously hand-coded into expert systems by human thinkers, and the pendulum of research funding and attention swung in more profitable directions. It swung towards training probabilistic models on ever-larger corpora – sets of language data. (The differences between statistical and neural language modelling techniques are important for recent advances in the field, but I’m not going to try to explain them – Wikipedia does it much better. ‘Neural’, by the way, is a metaphor that cries out for some critical discourse analysis.)
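For anyone who wants a concrete feel for what ‘training probabilistic models on corpora’ amounts to, here is a deliberately crude sketch in Python – a toy bigram model, emphatically not how modern LLMs are built, which use neural networks over sub-word tokens at vastly greater scale. It simply counts which word follows which in a tiny corpus and then generates text by sampling from those counts; no meaning enters anywhere, only recorded patterns.

```python
# Toy illustration only: a bigram language model that 'learns' nothing about
# meaning, just which word tends to follow which in its training text.
import random
from collections import defaultdict, Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count, for each word, how often each following word occurs.
bigrams = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    bigrams[current_word][next_word] += 1

def generate(start: str, length: int = 8) -> str:
    """Generate text by repeatedly sampling a likely next word."""
    words = [start]
    for _ in range(length):
        counts = bigrams.get(words[-1])
        if not counts:
            break
        choices, weights = zip(*counts.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the cat sat on the rug . the dog"
```

Scale the corpus up by many orders of magnitude, swap the counting for a neural network and the words for sub-word tokens, and you are in the territory of the systems under discussion: far more fluent, but still patterns over recorded text.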
The turn away from symbolism signalled the end of any attempt to understand language as a system of making and communicating meanings. The problem was no longer a philosophical or linguistic one but purely computational. What was needed was more processing power and better approaches to parallel processing. The assumptions that had been made about language in the age of symbolic processing and Chomskyan grammars were left unexamined in this era, despite the lessons that might have been learned from the failure to implement them in working systems. Only a complete lack of interest in language theory can explain ACL’s assumption that surface features of language are unimportant (‘polish’ and ‘paraphrase’ can safely be outsourced to AI) and that the work of meaning lies inexplicably elsewhere.
Because in language theory, neither symbolic rationalism nor data-crunching empiricism is particularly popular. Computational linguists who venture beyond these twin peaks of AI funding can meet with other ideas. Murray Shanahan meets Wittgenstein, for example, as he explains in this accessible article on LLMs. Wittgenstein did not see language as a system of symbols organised into a grammar, but as a set of devices for coordinating social activity. Deleuze and Guattari make a direct critique of Chomsky’s grammar and its assumptions about language and meaning in the Postulates of Linguistics (the link is to a politicised reading of their work, but an intelligent and entertaining one). If European critical theory is not your thing, Halliday’s functionalism or J. L. Austin’s idea of ‘illocutionary acts’ comes from a more pragmatic philosophical tradition. But all of these theories start from what language does – interpersonally, and in communities – in order to explain what it is. How language positions us as speakers in different ways, in relation to different audiences. How our social identities are constructed and maintained through linguistic acts and a shared memory of them: what we said, and how we said it, and how we were accountable for it afterwards.
As educators we do not have to engage with theory to meet social and developmental ideas about language. Open any academic writing guide and find language treated as a cognitive toolkit but also as a set of cultural and communicative practices. We see students learn the language of their subject and we see that this changes how they think as well as how they write. We notice that this development happens most easily in the context of shared activity and talk, especially with a mix of more and less expert speakers (the ‘see one, say one, teach one’ approach to social learning). If we speculate, then, about how and why language evolved in our sociable species, our day-to-day observations as well as all the respectable theories involve the coordination of shared activities – activities such as teaching and learning, caring and being cared for, procuring food, solving problems, devising rituals, negotiating power.
Language emerges – whether in gesture or vocalisation or both – from bodies organised into social groups. To speak is to conjure shared attention and intention, to say ‘this matters’ and invite other people to agree. Speech/gesture is also distinctive to each body, and each embodied person is accountable for the signs they make. This accountability is not an afterthought to language but what makes language possible, because language also makes lying possible, and lying is an excellent survival strategy for an individual in a social order. Symbolic linguists used to puzzle over the ‘black box’ grammar problem – the conundrum that for words to mean anything, some kind of symbolic system must already be in place. When language is understood as the social management of intention, the puzzle becomes a different one. How did human societies develop the ‘platform of trust’ that was needed for signs to be taken ‘at face value’ and not just as moves in a game-theoretical space?
Unlike the black box grammar problem, however, this puzzle has plausible answers: we can imagine bonds of trust developing through non-linguistic social practices, and linguistic practices emerging alongside them as part of the fabric of sociability. We do not need a leap into symbolic language before these practices would have allowed human groups to thrive. Over time, language would have become useful apart from the immediate activities it coordinated. Words could refer to distant places and events, around which shared intentions could be developed, resources gathered, skills practised. Actions could be described before they were realised, or described and left unrealised, the words themselves becoming a social resource – a plan or a warning, or an entertaining tale. As language became more sophisticated, the ‘platform of trust’ would be maintained by honouring different genres of speech and different requirements for ‘keeping your word’: promises and debts, rituals and favours all imply different kinds of accountability. Keeping your word secures your place in the social order.
Writing and accountability
Writing is language that has been separated from the speaking body, one reason that Socrates considered it such a doubtful practice (‘writing, which is external and depends on signs that belong to others… doesn’t know to whom it should speak and to whom it should not’). Perhaps to close this uncanny gap, writing is made accountable in other ways by the cultural practices that surround it. In European cultures we have laws of copyright, rules of academic referencing, and the ritual use of the signature to guarantee integrity. And in cultures around the globe, the origins of writing seem often to be tied up with accounting and therefore new kinds of accountability.
Writing is language made material. This makes it a new social technology through which new selves are made. Foucault describes new ways of ‘caring for’ and ‘knowing’ the self that writing has produced at different moments in European history. But writing continues – in different cultures, through different genres and styles – to bear traces of its origins in the body. We talk about writing as a ‘hand’ or ‘voice’, an idea we draw on when we encourage students to develop an ‘authentic’ position to write from. Perhaps this is a romantic idea. More pragmatically, or from a post-humanist perspective, we have rules for identifying authors and their words that we ask students to apply. These rules suggest that there may not be a natural connection between the body and its writing after all, but that connections must be made and maintained so that writing can continue to constitute (among other things) academic identities, and claims to academic knowledge. (This does not mean academic writing demands the same accountabilities from all bodies equally, or that academic identities and knowledges are constructed in inclusive ways).
Whatever features of writing are reproduced by the statistical and neural processing work of LLMs, they do not produce a meaningful, accountable relationship between words and world, self and others. LLMs are trained only on words. Not even on words as we know them, but on the data traces of writing – ‘word embeddings’, ‘sub-word modelling’, strings of digital marks and the patterns they make. LLMs produce new strings of data that mimic human language uncannily well, and because we are a linguistic species, we take them as meaningful. We find in them traces of different genres that imply certain kinds of accountability – that is, certain relationships between words and world – whether that is factual accuracy or emotional authenticity, entertaining lies, or practical tips for putting up a shelf. Writing by human writers is not only about the world; it is of the world and accountable in it. Are you laughing at my joke? Did the shelf fall down? But LLMs have no world for their words to be active and accountable in. For LLMs there is only text and its self-replicating patterns.
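To make the point about ‘data traces’ concrete, here is a minimal, hypothetical sketch in Python of the kind of sub-word tokenisation that sits between written language and a model. The vocabulary is made up for this example and belongs to no real system, and real tokenisers are learned from data rather than hand-listed; but the point stands: what a model is trained on is sequences of integers, not words, sentences or a world.

```python
# Hypothetical sketch: a greedy longest-match sub-word tokeniser with a
# made-up vocabulary, showing that the 'text' a model sees is integer IDs.
TOY_VOCAB = ["account", "ability", "writ", "ing", "er", "un", " ", "a", "b",
             "c", "d", "e", "i", "l", "n", "o", "r", "s", "t", "u", "w", "y"]
TOKEN_IDS = {piece: i for i, piece in enumerate(TOY_VOCAB)}
BY_LENGTH = sorted(TOY_VOCAB, key=len, reverse=True)  # longest pieces first

def tokenise(text: str) -> list[int]:
    """Greedily match the longest known fragment at each position.
    (The toy vocabulary only covers the example below.)"""
    ids = []
    while text:
        piece = next(p for p in BY_LENGTH if text.startswith(p))
        ids.append(TOKEN_IDS[piece])
        text = text[len(piece):]
    return ids

print(tokenise("accountability in writing"))
# -> [0, 1, 6, 12, 14, 6, 2, 3]: the model's entire 'view' of the phrase
```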
Now, I think it would be fair to say that some student writing, perhaps a lot of student writing, comes close to this condition of self-replicating text. I’ve read commentary on AI in education that takes it for granted that student writing is always like this. Writing for the sake of writing. Writing that has been cycled through the same search engines and catalogues of authorised text (‘legitimate’), or writing that has been recycled through cut-and-paste raids on other texts, paraphrasing apps, or the latest long-form chatbots (‘illegitimate’), but in either case writing produced only to pass as writing of a certain kind. The role of the grading academic is then to certify that the output is indeed ‘academic writing’, in the same way that we are all being invited to agree that the outputs of generative language models are ‘human writing’. If writing is produced only to pass the academic equivalent of the Turing test, it’s no surprise if students treat it as an exercise in exactly the kind of ‘same but not quite the same’ reproduction that LLMs excel at.
The ‘fluent bullshit’ that emerges from ChatGPT is, says Amit Katwala in Wired, also what emerges from an education system that teaches:
how to very quickly absorb information, confidently regurgitate it in a predetermined format, and then immediately forget it and move on to something else
If that is an accurate picture of university teaching, it is the result of complex forces that university teachers have resisted at least as much as they have accommodated. (Since I first published this post, a friend has drawn my attention to David Wiley’s ten-year campaign for ‘renewable’ rather than ‘disposable’ student assignments: there are some good examples here). But let’s welcome any new reason to challenge these tendencies. What would it mean to produce writing assignments that are less normative, less forgettable, that have a purpose beyond ‘passing’? They would, I think, have to centre accountability. Not to our academic integrity policies, or only incidentally to those, but to words and their connections to the world. It would mean asking students to attend to what their writing is doing, beyond putting its hand out for a particular grade. Who writes like this, and why? What does writing of this kind make happen, or what could it make happen if it had a context beyond the classroom? Why – in students’ own worlds of meanings and purposes – is it worth writing at all?
These are questions about being thoughtful people and accountable social actors as well as about writing. Questions we avoid if we focus on writing tools and their uses. I have tried to address some of them in a post about designing assignments in a world of AI.
Accountable AI?
So far I have argued that LLMs cannot be accountable for the words they produce, and therefore can’t use language as human beings use language – and that, in principle, they never can. I hope I have persuaded you that accountability is not something that can be added on to language, as an ethics team might be added on to an AI corporation and perhaps subtracted again if it gets in the way.
The recent open letter on AI from the pregnantly named Future of Life Institute seems to concur. The letter can be read as a plea for accountability to be imposed on the AI industry - or, more improbably, consented to by actors inside it - but it can equally be read as a denial of the very possibility of accountable AI. It confesses that ‘black-box models with emergent capabilities’ are not ‘accurate, safe, interpretable, transparent, robust, aligned, trustworthy’ and that the AI industry has no plans to make them so. Rather, it is up to bodies outside of the AI industry to create external checks - ‘regulatory authorities’, ‘auditing and certification’, ‘safety research’ and, when all that fails, ‘institutions for coping with the dramatic economic and political disruptions (especially to democracy) that AI will cause’. Exactly how other institutions audit, regulate, research and prepare is not clear when the disruptors can’t explain how their product works or what it might do to the internet of human writing.
As Emily Bender (among others) has persistently argued, there are:
plenty of open questions about how deep neural nets map inputs to outputs, but we'd be much better positioned to study them if the AI labs provided transparency about training data, model architecture, and training regimes. Instead we have a black box.
Compared with symbolic AI or expert systems, which involved a painstaking translation of human expertise into code, probabilistic AI has no real designers. No explicit model of language sits behind it, or emerges from it, to allow for human scrutiny. As Abeba Birhane and Deborah Raji explain:
without naming and recognizing the engineering choices that contribute to the outcomes of these models, it’s almost impossible to acknowledge the related responsibilities.
And even if those engineering choices could be named and recognised:
All of these systems have datasets that they scrape from the internet and data laborers that do some type of task… that’s what this whole field is predicated on. Timnit Gebru (2023)
Millions of decisions are made by these labourers, many of them on piecemeal wages in the global south, and without them LLMs would produce such a slurry of extreme and harmful content they could never be used. And then there are all of us, writing away in our corners of the internet, grist to the word mill. We are all responsible for the word machine, and none of us are.
LLMs are platforms for coordinating linguistic labour. This was less an engineering choice than a business one, taken by investors, consultants, entrepreneurs, shareholders and board members. Their goals are not, or not in any simple way, to make writing better. Their goals are for humans to work more efficiently as producers and to engage more compulsively as consumers with extractive processes such as search, targeted advertising, data mining, digital gig work and prosumption of content. Profit from LLMs will mainly derive not from direct subscriptions but from their integration into these other industries. (No doubt there will be collateral profits in education-specific applications, but more of that in another post.)
LLMs are extraordinary, highly coordinated, intentional feats of human engineering. But the hype around their ‘emergent’ properties and ‘more than human’ reasoning seems intended to deny the human intentions that produced them, in order to deny responsibility for their effects. OpenAI researchers have already identified multiple ways in which these tools could enhance disinformation campaigns. Europol has issued warnings about their use by criminals and fraudsters. And let’s set aside scary future scenarios in which ‘strong’ AI takes over the world: real harm has already been done to real people in the making of these tools. The Kenyan digital piece-workers who cleaned up after the engineers by wading through violent, pornographic, hateful and traumatising content. The white-collar workers whose jobs depend on producing text – emails, ad copy, routine reports and manuals – and whose work has overnight become disposable and insecure. The people at the cutting edge of the global climate crisis, thanks to the indefensible carbon footprint of training and running these models. And there are many credible people arguing that the AI project overall is fundamentally incompatible with racial justice.
Of course these harms matter far more than any failures of linguistic theory. But none of these harms are incidental to the naïve and plainly wrong ideas about human language proposed in the development and marketing of LLMs. Their language model is normative, amplifying writing from the mainstream (i.e. wealthy, English-speaking, disproportionately white and male) internet. It is extractive, treating language as a resource to be mined, not an act of connection and communication. It is unaccountable, with no body to stand behind its utterances, no community to feel the consequences, no needs to negotiate. It has no intentions or values of its own, except those added on to the black box of code by human data workers, and whatever human organisations may be able to provide around its use.
So what is the role of the human organisations that are universities?
Accountable universities
Students are already using LLM applications as writing tools, or embedded in writing tools such as Grammarly or MS Word, or as interfaces on other tools such as Bing search. Of course they are. Universities can’t and shouldn’t try to stop them. Graphical user interfaces were a huge leap forward in the usability and ease (and compulsiveness) of computers. LLMs, to my mind, offer a similar leap forward in the ease (and no doubt compulsiveness) of our interactions with data-based systems.
Graphical user interfaces are illusory, designed to suit the way our human eyes, brains, and hand-eye coordination are wired (command-line interfaces are slow and painstaking and very, very dull). Immersive interfaces are illusory. But even when we are immersed in use, it’s quite easy to bring to mind the illusiveness and virtuality, the designed quality of our sensory experience. LLMs are also illusory, but because the illusion is in language, our language-made brains struggle to register it. Billions of pounds and tens of thousands of hours of human labour are going into refining this illusion: the illusion that these are more than tools or interfaces – that they are our partners in language, our interlocutors. We already spend large parts of our lives engaged in vivid sensory illusions, with mixed results for our physical, mental and sensory health. We should consider the costs and benefits carefully before rushing into a life of dialogue with illusory others.
It is an illusion that matters particularly to university teaching, based as it is in writing and interlocution. And one thing we can do, I think, is make sure that in everything we say about these tools – whether in policies to restrict their use, in teaching assignments encouraging their use, or in critical concerns we pose about their use – we do not promote the illusion. We point out, relentlessly, what it is. And we promote instead the opportunities for dialogue with other engaged, interested and interesting, diverse and imperfect people that are unique to the university experience.
As some educators have suggested, advances in AI may lead to a new affirmation of the value of human thinking, and so of a university education in dialogue with other human beings. But universities are no longer the arbiters of what knowledge matters. As more and more intellectual processes are captured and datafied, as more knowledge practices are streamlined and standardised, globalised and automated, the demand is loud for universities to educate ‘people who can work alongside AI’. And universities must have a response.
We may hope that ‘alongside’ means AI tools will be used in the service of human work that is rewarding, transformative, socially useful. We may hope our graduates become creatives, professionals, thinkers and decision makers, and use LLMs as tools to realise their ideas. But our conscience must tell us that this is not how digital technology has played out so far in the workplace. Not for the majority of people, not even for the majority of graduates. And there are reasons for thinking that recent advances in AI will have even wider and more unpredictable effects than we have seen already.
The question of how to educate human beings ‘alongside’ AI should, I think, be incorporated into curriculum design at the root, drawing on the resources of each subject discipline to address the risks as well as the rewards. The history of teaching machines might be useful, as might alternative imagined futures. Ethics and philosophy (and, yes, even ‘theory of knowledge’) can be put into dialogue with insights from social science, computer science or design, depending on who is in the room. Answers will be as diverse as academic tribes and methods. But the question is too important for individual educators to muddle through with an infographic or two. And students, who have most at stake, must be involved.
ChatGPT and its successors and competitors are already changing work. Universities as workplaces are changing too. Chat rooms and discussion threads are packed with ideas for using LLMs to write lesson plans and lecture notes, with a focus on saving time, improving productivity, or getting on with more interesting tasks. The same models that allow for instant generation of passable essays can provide instant feedback, even design and apply marking rubrics. If academics don’t have time to check every student’s work for signs of artificial writing, software is available to do that job.
There has been some push back. The market leader in plagiarism detection announced this month that an AI-detection service would be embedded into its platform, and UK universities opted en masse to turn it off until they saw better testing and more transparency. This opt-out only appeared after an eleventh-hour intervention by bodies representing universities collectively. Perhaps this is a turning point in higher education’s relationship with tech platforms; perhaps it is only a pause for breath.
While students are held stringently to account for their use of LLMs, how will universities account to students for their own use of these systems? Can they hold out against black-box capabilities being embedded into the platforms they have come to depend on? Who is assessing the risks, and how are those risk assessments and mitigations being shared with the people most affected? These are questions that universities should be attending to with at least as much energy as they are policing students’ use of apps.
Universities have a unique stake in writing as human development. They can shape what writing is and becomes, what it does for us and with us, why and how it matters. The question I started with – ‘What kind of people are developed through the writing we ask them to do?’ – demands a response from universities. It is an area where universities are – we are – unequivocally, still accountable.