Writing as academic practice in a time of Generative AI

Transcript of a paper to the Academic Practice with Technology conference, UCL, June 2023

Jun 28, 2023

This transcript accompanies a video that you can find here. It refers to other posts, so there is some repetition. Also there are some technical details about the labour that goes into GenAI models that did not make it into the video (for lack of time). I’d quite like to look at these in more detail, but for now this is the only place you’ll find them on the substack. Here goes:

Writing in higher education takes many forms but has the common aim of helping students to develop as thinkers and communicators, in particular communities of academic and professional practice. The ability of large language models to produce credible examples of academic writing has led some educators to argue that student writing assignments should be radically rethought, and even abandoned for alternative tasks such as critical evaluation of AI-generated text.

Generative AI tools such as ChatGPT are only the latest writing tools to appear on a timeline that, if we extended it further back than the end of last year, would include spelling and grammar checkers, translation engines, speech-to-text tools, auto-summarisers and paraphrasing apps. All these applications are widely used by student and academic writers, and their use is changing practices of writing just as search engines, library catalogues and apps for referencing and note management are changing practices of reading. There have of course been concerns, even scare stories, about these changes. But until now, the whole value and purpose of academic writing has not been so deeply questioned.

I think it’s important to understand that what large language models do with language - or with the digital textual traces of language - is far removed from what students do with language in purposeful, developmental assignments. I’m sorry if this is familiar, but I don’t see it discussed often in the contexts where uses of Generative AI are being promoted, especially promoted as a kind of partner or collaborator in the production of writing.

Uses of GenAI as writing tools

So what they do is:

1. Produce readable text, based on probabilistic features of a training corpus (i.e. large-context predictive text)

2. Provide a natural language interface on search functions

3. ... in response to user prompts: (a) topic words; (b) genre cues

Readable means the surface features of the text are acceptable to a human reader - grammatical, syntactically correct, and so on. So how do they do it?

GPT-4:
Transformer-based ‘deep learning’ multi-modal model
Hundreds of gigabytes of training text or more (GPT-3 used about 570 GB after filtering; the GPT-4 figure is undisclosed) - CommonCrawl, BookCorpus, Wikipedia and ‘unknown’ other corpora -> ‘tokens’
Unsupervised pre-training -> a rumoured 100 trillion parameters (OpenAI has not disclosed the number) expressing probabilistic relationships (hyper-parameters or ‘weightings’ adjusted)
‘Post-training alignment... results in improved performance on measures of factuality and adherence to desired behavior’
Chat rules and guardrails
User prompt generates text from the model

Both training and text generation require massive computing power…

But also extensive and intensive human design, text input and data labour

The GPT-4 model is trained on data from Common Crawl, BookCorpus, Wikipedia and many more (sources unknown). All this text is broken down into tokens - words and word parts - which are then represented numerically as embeddings. The relationships among all those tokens are mapped as numerical data representing the probability of tokens being in close proximity to one another. ‘Close’ doesn’t have to be that close in a model of this size - the context window is up to 32k tokens. The resulting model is rumoured to handle 100-170 trillion parameters or pieces of numerical data, though OpenAI has not published the figure.
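To make ‘the probability of tokens being in close proximity’ a little more concrete, here is a deliberately tiny sketch. It is not how GPT-4 works: it only counts which token follows which in a three-sentence corpus that I have made up, whereas a transformer learns these relationships across thousands of tokens of context. The function names and the whitespace ‘tokenizer’ are my own assumptions for illustration.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Crude stand-in for a real tokenizer: lowercase, split on whitespace.
    # Real models split text into sub-word pieces instead.
    return text.lower().split()


def train_bigram_model(corpus):
    # Count how often each token is immediately followed by each other token.
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenize(sentence)
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1

    # Turn raw counts into probabilities of the next token.
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {tok: n / total for tok, n in followers.items()}
    return model


# Hypothetical miniature 'training corpus'.
corpus = [
    "students write essays",
    "students write reports",
    "students read essays",
]

model = train_bigram_model(corpus)
print(model["students"])  # 'write' is twice as likely as 'read'
print(model["write"])     # 'essays' and 'reports' are equally likely
```

Even at this scale the point holds: the ‘model’ is nothing more than numbers describing which tokens tend to follow which. There is no topic, purpose or reader anywhere in it.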

But models that are trained only probabilistically, without any human supervision, spit out bile and nonsense. There is an extended period of human supervised training, some of it relatively unskilled and very poorly paid. Data labourers, often in the global south, may be paid as little as $1.50/hour for tagging, reviewing, accepting and rejecting outputs, which is what allows the model to close in on the most factual and acceptable responses, among all the possible responses in its repertoire.

This is accorded just one sentence in the technical release for GPT-4, which I’ve quoted here, but a lot of expert observers suggest that its advantages over GPT-3 in apparent reasoning may be due to more extensive human supervised training at this stage rather than greater scale at the number crunching stage. Then some skilled text engineers code on further ‘guard-rails’ and chat rules, and some of the stylistic features such as the humble apologetic tone.

The computing parts of this process require massive power - but as I hope you can see it also requires massive inputs of human textual labour. I’ve tried to very simply map the kinds of human labour that are required [note, this does not come across in the transcript], where purple represents highly paid technical designers and developers, orange represents low-paid data workers, and green represents all the rest of us. Note that as users we are also providing text, because new models (especially open source models) are often trained on ChatGPT conversations as a short-cut.

All GenAI’s incredible functionality is a product of human design and intention. But none of it looks much like the intentionality we have when we write. And a core part of the process – the probabilistic model – has no human-readable justification, theory, or intention at all. Accountability for any given input and output is impossible to ascribe, hence all the lawsuits about copyright currently progressing through the US courts.

When we think about what the underlying models are doing, it can be helpful to look at what the engineers and designers and developers are doing - I copied this ‘cheat sheet’ from OpenAI and it explains how tweaking some of the hyper-parameters I mentioned produces different kinds of output. In this case there are two, called ‘temperature’ and ‘Top-P’, which adjust how the next token is picked from the probabilities the model outputs, and result in selecting more or less common tokens to follow on when generating text. You can tweak these directly when you choose for Bing Chat to produce responses that are more ‘creative’, ‘balanced’ or ‘precise’, for example.
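For readers who want to see what these two settings do, here is a minimal sketch of temperature and Top-P (‘nucleus’) sampling as they are commonly described. The scores and candidate words are invented, the function names are my own, and I am not claiming this is OpenAI’s or Bing’s actual code. It only shows the general technique: temperature sharpens or flattens the probabilities, and Top-P discards the unlikely tail before a token is picked at random.

```python
import math
import random


def apply_temperature(logits, temperature):
    # Softmax over scores divided by temperature (must be greater than 0).
    # Low temperature sharpens the distribution; high temperature flattens it.
    scaled = [score / temperature for score in logits.values()]
    highest = max(scaled)  # subtracted for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(logits, exps)}


def top_p_filter(probs, top_p):
    # Keep the most likely tokens until their combined probability
    # reaches top_p, then renormalise what is left.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def sample_next_token(logits, temperature=1.0, top_p=1.0, rng=random):
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Hypothetical scores for the word that follows "The student wrote a ...".
logits = {"essay": 2.0, "report": 1.0, "poem": 0.5, "limerick": -1.0}

cool = apply_temperature(logits, 0.5)  # more 'precise'
warm = apply_temperature(logits, 2.0)  # more 'creative'
print(round(cool["essay"], 2), round(warm["essay"], 2))
print(sample_next_token(logits, temperature=1.0, top_p=0.5))
```

With a low temperature or a low Top-P, the most common continuation wins almost every time. Raise either and rarer tokens such as ‘limerick’ get a chance. Nothing about meaning or intention changes, only how the dice are weighted.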

This technology is a platform for coordinating layers and layers of human labour, some of it acknowledged and celebrated, some of it deliberately hidden, some of it contributed by ourselves as user-producers, some of it contributed many decades ago in books and other materials held in online corpora (and some of it undoubtedly still under copyright).

Purposes of GenAI as writing tools

Thinking back to what GenAI is designed to do, the main intended users are business users, and the opportunities offered to them are:

Enhance productivity of text-based work (Who benefits? Who loses?)
Enhance information extraction (by narrowing range, discouraging selectivity and source-checking)
Search engine optimisation (automatic generation of proven clickbait on behalf of the organisation)

For the AI companies themselves, these uses have another purpose, which is to gather data from users in the form of textual prompts and responses, and sequences. New AI models are being trained all the time on the records of ChatGPT interactions with users.

Purposes of student writing tasks

The purposes of student writing tasks are not to produce more text, faster and more efficiently, or to attract more eyeballs. They are to develop students as people. Within that general purpose students may be asked to do some of these things (and more):

1. Express their own understanding and responses to a topic

2. Connect a topic with their own experience and interests

3. Decide ‘what matters’ for themselves, from extended reading

4. Practise epistemic methods e.g. analysis, evaluation, argumentation

5. Construct and reconstruct concepts

6. Communicate original research or new solutions to real-world problems

7. Develop understanding of audiences, purposes, genres and writing styles

8. Develop a voice and identity of their own

9. Encounter different perspectives and know how to negotiate a position

And many more (depending on their subject area)…

Of course tools can be put to uses that were not part of their original design, especially tools as flexible as language models. But as educators we should surely be aware of the underlying incompatibility of their designed purposes and ours.

GenAI as human collaborators

So how should we relate to these platforms? The chat-rule interface engineers want us to relate to them as persons, if not entirely human ones.

Much of the educational hype encourages us to see them as collaborators, or co-authors. Following a spate of co-authoring credits (Stokel-Walker 2023), some journals have reacted against this, though they have not banned their use in ‘preparation’ of papers for peer review. This is what the editorial board of the Nature suite of publications has decided. [Image shows editorial headline ‘Tools such as ChatGPT threaten transparent science: here are our ground rules for their use’]

“any attribution of authorship carries with it accountability for the work, and AI tools cannot take such responsibility”. Nature editorial, January 2023

Others, such as the Association for Computational Linguistics, have taken a different view. They chose to arbitrate a variety of possible uses (my precis, all language taken directly from source):

Assistance purely with the language of the paper: does not need to be disclosed
Short form predictive text: does not need to be disclosed
Low novelty text: check for accuracy and include any citations
New ideas: acknowledge use of the model
New ideas + new text: discouraged

Only one approach - developing new ideas + new text - is considered equivalent to authorship, and for now at least it is ‘discouraged’, though the journal and conference committee also say:

‘you are welcome to make the case to the reviewers that this should be allowed, and that the new content is in fact correct, coherent, original and does not have missing citations’ ACL Submission Guidelines 2023, emphasis mine.

This definition of authorship, and in fact the whole of the ACL guide to authors, shows to my mind a fundamental confusion about the nature of language. What lines can reliably be drawn between ‘language’ and ‘ideas’, thinking and communicating, ‘sense’ and ‘style’? Just from a pragmatic point of view, when is something ‘short’ or ‘long’? What is ‘novelty’, especially to a novice? These are real questions you may have grappled with if you have been involved with any academic integrity policies recently. No doubt also they are being quantified and coded into plagiarism detection services as we speak.

GenAI as student writing aids

But how are students supposed to make sense of rules like these, especially if they are based on (at best) contentious theories of language and writing? What aspects of academic writing can we hand off to these platforms without taking away opportunities for students to develop, to think, to decide for themselves what aspects of writing really matter? Because of course they are using these platforms, pretty much all of them if we believe recent surveys from Sweden and among Russell Group universities in the UK (see also this investigation by The Tab). The Swedish study found:

5,894 Swedish students participated, data collected April 2023
95% familiar with ChatGPT
35% say they use ChatGPT regularly
56% positive about using it in their studies
62% believe using it for examinations is cheating
Over half don’t know if their institution has guidelines; a quarter say their institution does not
Concern about AI development, especially over impacts on the future of education

I think it is useful to lean into the idea of accountability, from the Nature editorial, and to use that as a diagnostic for the kinds of writing that are still valuable and developmental for students. Rather than becoming arbitrators of the many different uses of these platforms that student writers might be making, we can ask how writing helps to make students accountable for ideas and their relationships to them. From my own substack essay, Student Assignments in a Time of Language Modelling:

“Their language model is normative, amplifying writing from the mainstream (i.e. wealthy, English-speaking, disproportionately white and male) internet. It is extractive, treating language as a resource to be mined, not an act of connection and communication. It is unaccountable, with no body to stand behind its utterances, no community to feel the consequences, no needs to negotiate. It has no intentions or values of its own.”

In contrast:

“Accountable assignments centre aspects of human writing that LLMs are not only bad at in their present iteration, but will always be bad at, and so the rewards for using them will always be limited”

In thinking about accountable assignments, I have of course been influenced by the dozens of expert practitioners who have been thinking about the AI problem over the last months and weeks. I’d mention in particular the creative ideas drawn together by Chrissie Nerantzi and her colleagues. But I am also drawing on interviews I did before the AI surge, although the issue was lurking at the edges and interestingly it was from study skills advisers, especially of international students, that I learned the storm was coming.

So my suggestions are these. I explore each in more detail on my substack.

Writing from a standpoint
Writing to make a difference (real change, real readership)
Writing collaboratively
Writing as identity work
Writing as process including peer processes
Live writing
Writing up authentic research
Writing as (topic) mapping

Encouraging critical use of GenAI

Finally I want to mention the critical evaluation of AI writing that so many crib sheets and teacher guides are suggesting. Of course I think students need to know that GenAI is fallible. But its outputs are improving, and its fallibilities are becoming more subtle all the time. Rather than assessing the quality of AI writing on its own terms, I think students should be challenged to understand the underlying technologies and theoretical frameworks, using the resources of their subject discipline. And these are the questions I will leave to go forward to our discussion panel.

Are LLMs what they seem, or are claimed, to be?
Who is accountable for the words they produce?
What is the business model behind their production and use? Who is profiting, and who is being exploited or excluded?
What are the risks to human writing, thinking, and intellectual work in different scenarios of widespread use, as well as the potential benefits?
How might these models amplify bias, inequality, and privatisation, as well as improving access and productivity?
How might they impact on individual wellbeing?
