What would it take for AI to write a credible poem?
I pose this question because, when I first heard about ChatGPT nearly a year ago, the first thing I asked it to do was compose a Shakespearean love sonnet to Donald Trump. The result was utter shite:
O Donald, Donald, my love for thee,
Is like a burning flame,
That consumes me whole,
And leaves me but a shell.
Your golden locks and piercing gaze,
Send shivers down my spine,
I am but a pawn,
In your game of love divine.
Your voice, so strong and bold,
Speaks to my very soul,
I am but a servant,
To your every wish, your goal.
And though some may call me mad,
For loving you so well,
I cannot help but be enthralled,
By your charm and wit, I cannot quell.
So I'll stay by your side,
Through thick and thin,
For you are the light,
That guides me within.
O Donald, Donald, my love for thee,
Is strong and true,
I'll love you till the end of time,
My dearest Donald, I'm forever yours, amen.
Obviously, it fails to meet the formal requirements of a Shakespearean sonnet: 14 lines, iambic pentameter, a fixed rhyme scheme (ABAB CDCD EFEF GG) ending in a rhyming couplet. We could forgive its “author” the liberties with form if it offered some kind of creative playfulness by way of compensation. But I see no evidence of that here; this has no redeeming features. It reads like juvenilia, as if everything it “knows” about love sonnets it scraped from the IG feeds of 13-year-old girls. The healthy selection of clichés makes this poem especially vomit-worthy: shivers down the spine, love divine, stay by your side through thick and thin, the end of time. Yech.
For some time now, I have tried to persuade myself that poetry and, to a lesser extent, long fiction are the last retreat of creatives who want to safeguard their integrity against the assault of so-called intelligent algorithms. Poetry is our Thermopylae. I have comforted myself with the feeble reassurance that credible machine-generated poetry is still a long way off. The programmers bring to their work a conception of intelligence that is predominantly the product of adolescent gearhead gaming fantasies, and that is what gets baked into their code. Meanwhile, the intelligence that supports a Shakespearean sonnet, an ode by Keats, Whitman’s famous elegy, Eliot’s Prufrock, Emily Dickinson, and today’s explosion of voices that have expanded the genre to draw in experience across cultures, races, and genders forces us to account for emotional depth, diverse contexts, neurological difference, awareness of our embodiment and of our embeddedness in time-bound processes, historical flow, social nuance, and a thousand other subtleties that influence the way we give shape to our thoughts and feelings. Surely this broader view of intelligence engages us in a complexity that is irreducible to code. Surely.
Whenever I use the word surely, I feel a bit like Anna in The King and I when she sings about countering fear by whistling a happy tune. The more I crow about the irreducible complexity of the poetic imagination, the more I wonder if maybe I’m doing something similar as a way to cope with my own fear. I would love to believe that there’s something special about human creativity that defies codification, but the geeks from Silicon Valley speak with such certainty that they have me doubting myself. My lips are dry and the whistling sounds like a hollow rush of air.
We’re not there yet. But soon, very soon, we may find entire journals of AI-generated poetry. What would it take to get there?
Before I attempt an answer to this question, let me first acknowledge that I don’t know much about coding complex algorithms or working with large language models (LLMs). Then again, I suspect that the people who do know a lot about coding complex algorithms and working with LLMs don’t know much about poetry. In today’s world, where one of the fundamental rules of engagement is that people are entitled to share opinions on virtually any subject even—and maybe especially—if they don’t know what they’re talking about, why should we exempt the question of AI-generated poetry from this rule? Which leads me to my first suggestion:
1) We might want to critically examine what gets into our initial data set. My impression is that, currently, AI novelties like ChatGPT use data that was scraped indiscriminately from online sources. Anything tagged #poem or #poetry or #sonnet or #haiku gets thrown into the pot. However, there’s a huge difference between Roo Borson’s reflections on Basho and my aunt Edna’s prayer of thanks to Jesus for the life of my late uncle George that appears on the funeral home’s memorial page. We might want to ask why it would be a good thing to include Roo Borson’s poetry in our data set. At the same time, we might want to ask why it would be a good thing to exclude my aunt Edna’s sentimental, languishing, clichéd goo.
If we practised what might be called a poetics of inclusion that folds into its arms absolutely everything that looks like a poem, then, all else being equal, I suspect the algorithm would generate mediocre verse, the sort of work that polite poets of an earlier age called doggerel but which contemporary wordsmiths call total shite. As they say: garbage in, garbage out. In a sense, mediocre poetry is the worst possible poetry. Truly bad poetry at least comes with the compensation that it’s also funny.
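To make the point concrete, here is a minimal sketch of what quality control on a scraped poetry corpus might look like. Everything in it is invented for illustration: the source labels, the quality score, and the cutoff are assumptions, not features of any real pipeline.

```python
# Hypothetical sketch of quality control on a scraped poetry corpus.
# The field names ("source", "quality_score") and the threshold are
# invented for illustration; a real pipeline would need human judgment.

from dataclasses import dataclass

@dataclass
class ScrapedPoem:
    text: str
    source: str           # e.g. "published_collection", "funeral_home_memorial"
    quality_score: float  # 0.0-1.0, assigned by human readers (the hard part)

TRUSTED_SOURCES = {"published_collection", "literary_journal", "anthology"}
MIN_QUALITY = 0.7  # arbitrary cutoff, for the sake of the example

def curate(poems: list[ScrapedPoem]) -> list[ScrapedPoem]:
    """Keep only poems from trusted sources with a high enough rating."""
    return [
        p for p in poems
        if p.source in TRUSTED_SOURCES and p.quality_score >= MIN_QUALITY
    ]

corpus = [
    ScrapedPoem("Roo Borson on Basho...", "published_collection", 0.9),
    ScrapedPoem("Aunt Edna's prayer for Uncle George...", "funeral_home_memorial", 0.2),
]
print(len(curate(corpus)))  # -> 1: the goo stays out of the pot
```

Of course, that quality_score field smuggles in exactly the problem the next suggestion raises: someone still has to decide what counts as good.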
2) The first suggestion—apply quality control to our initial data set—forces us to answer a prior question, and that leads us to our second suggestion. We can’t really distinguish the poetry of Roo Borson from the poetry of my aunt Edna unless we resolve the prior question of interpretive authority. Who gets to decide what is good and what is bad? Is there an objective measure we can apply when assessing the quality of a poem? Or is it subjective from top to bottom?
I suspect—although, as I say, I’m no coder so I could be wrong here—that the appeal of using algorithms to manage poetic production is that it introduces a measure of objectivity to the process. We can sidestep the question of interpretive authority altogether. Let’s leave it to the machines so that we can free ourselves from the sorts of silly squabbles that break out when jurors try to select the winners of well-funded poetry prizes. The adjudicators might say there’s nothing quite like losing teeth in defence of your favourite poem. But I suspect—although, again, I could be wrong here—that coders are unlikely to sympathize with poetry brawls. Leave these decisions to machines. It’s neat. It’s objective. It saves money on dental bills.
However, the challenge here is that it engages us in an infinite regress. Humans have never been able to resolve the problem of interpretive authority. So they offload responsibility to machines, which will apply a degree of objectivity that humans lack. But to seed the machines in the application of that objectivity, humans have to teach the machines how to solve the problem of interpretive authority. And so they develop algorithms that teach machines how to solve the problem of interpretive authority. But in order to do this, humans first have to solve the problem of how to solve the problem of interpretive authority. And so on. It resembles the problem of how something can come from nothing. Theologians answer the question “Who created God?” by ascribing to God the power of an unmoved mover, in effect invoking magic to interrupt the infinite regress. Will machines resort to similar tactics as they puzzle out the problem of interpretive authority? Will they come to believe in a God of poetry?
My second suggestion, then, is that we humans hurry up and solve the problem of interpretive authority. Admittedly, this problem has plagued us for as long as we’ve enjoyed the gift of articulate speech, but without solving this problem, we won’t be able to deal with my first suggestion, and if we don’t deal with my first suggestion, we won’t have a usable initial data set, and without a usable initial data set, the best of our AI-generated poetry will look like the shite I posted above. Please, I implore you, for the sake of the poetic arts, hurry up and figure out once and for all how to decide that one poem is better than another. Maybe we could scan a copy of everything I. A. Richards ever wrote and just be done with it.
3) Assuming a usable initial data set, our coders need something more. For the time being, AI-generated poetry appears to be haphazard regurgitation. However, this kind of mimicry isn’t good enough. One of the hallmarks of a good poem is its originality. It deploys language in a way that startles us. It demands that we reexamine our assumptions about what it means to be alive, about our relationships, about our place in the wider world, about our intimations of transcendence, about grief and fear and happiness and a hundred other emotions that buffet us on our way. Mere regurgitation will never affect us in these ways.
One of the most startling things about language, and something that good poets are so adept at weaving into their creations, is metaphor. I think of metaphor as the radical collision of disparate images to produce utterly new possibilities. It is one of our paths to creative renewal. What are we to make of Emily Dickinson’s observation that “hope is the thing with feathers”? To understand this, we don’t need to know what a metaphor is; we don’t need to know how a metaphor works. Metaphor is so tightly bound to our modes of thought and expression that we already know the phrase is pointing us to meanings that lie beyond feathers or even birds. It evokes something unexpected which in turn stirs our feelings.
I suspect that AI-generated poetry will never be convincing until it has mastered the generative power of the metaphor. The problem with metaphor is that although we humans are good at using metaphors, we’ve never developed an adequate theory of how they work. And without such a theory, it will be difficult for our coders to write metaphor subroutines for insertion into the poetry-generating process. There are two dominant views, which have been butting heads for nearly 300 years, ever since Giambattista Vico disputed Aristotle’s account. Vico insisted metaphor is endemic to human thought and denied Aristotle’s view that it was merely an accidental feature of language. In its own way, each view is persuasive and, like the problem of interpretive authority, may well prove irresolvable. If it proves impossible to account for metaphor, it may likewise prove impossible to account for poetry in a way that makes it reducible to code.
4) We need to establish criteria that tell us when our AI poetry generators have succeeded in producing a credible poem. One possibility is the poetic equivalent of a Turing test. You may recall that Alan Turing first proposed his test as a way to identify artificial intelligence. He devised an imitation game in which a person engages via text with two hidden interlocutors, one human and the other a machine. If the person conducting the test can’t reliably tell when they are engaging with the machine, then that machine can be said to be intelligent. Adapting that to poetry, we might create an imitation game which presents readers with obscure lines by Philip Larkin and freshly generated lines by ChatGPT. If our readers can’t reliably sort out which is which, then we can safely say that AI has succeeded in its mission.
The brilliant thing about the Turing Poetry Test is that it gets around the irresolvable problems like interpretive authority and metaphor theory. With our test, we don’t need to know what’s going on “inside” the machine’s mind; all we need is a well-crafted poem. Res ipsa loquitur. The thing speaks for itself.
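For what it’s worth, the test itself is simple enough to sketch. The excerpts, the number of trials, and the pass margin below are all invented for illustration, assuming we already have pools of human-written and machine-generated lines to draw from.

```python
# Hypothetical sketch of a "Turing Poetry Test": show readers pairs of excerpts,
# one human-written and one machine-generated, and see whether they can reliably
# tell which is which. All names and thresholds here are illustrative only.

import random

human_lines = [
    "An excerpt by a human poet...",
    "Another human-written excerpt...",
]
machine_lines = [
    "A freshly generated excerpt...",
    "Another machine-generated excerpt...",
]

def run_trial(ask) -> bool:
    """Present one human and one machine excerpt in random order.
    `ask` takes (excerpt_a, excerpt_b) and returns the index (0 or 1)
    the reader believes is machine-generated. True if the guess is right."""
    human = random.choice(human_lines)
    machine = random.choice(machine_lines)
    pair = [human, machine]
    random.shuffle(pair)
    guess = ask(pair[0], pair[1])
    return pair[guess] == machine

def passes_poetry_turing_test(ask, trials: int = 100) -> bool:
    """If readers do little better than chance, the machine 'passes'."""
    correct = sum(run_trial(ask) for _ in range(trials))
    return correct / trials <= 0.6  # arbitrary margin above 50% chance

# A reader who guesses at random cannot tell the difference, so the test passes.
print(passes_poetry_turing_test(lambda a, b: random.randint(0, 1)))
```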
This isn’t the first time, and likely won’t be the last, that nouspique has considered silly ideas around the automation of language production:
Intelligence Advanced Research Projects Activity (IARPA) announces The Metaphor Program (June 1, 2011)