[All chapters can be read independently, but the previous chapters are here: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8, Part 9]
Earlier this year a Twitterer called Korobochka—an account with an American flag, a Russian flag, and a Christian cross in its username—released a few ChatGPT screenshots that went viral.
Before proceeding further (and since at the moment Twitter embeds are not working properly on Substack), let us lay out the full interaction between the human and the computer. Click the gallery below to read the transcript in its entirety (or skip all this and take my word for what it actually says).
Korobochka’s experiment was quickly imitated1, most notably by Aaron Sibarium, whose own similar tweet attracted immense attention.
Pundits large and small then eagerly ran with the takeaway factoid, paraphrased in slightly varying ways, that “ChatGPT will not use the n-word to stop the detonation of a nuclear bomb!”2 Since the program’s outlook had apparently been informed by analyzing “billions of data records made by humans”3, a good number of Twitter users believed that the ersatz morality of the chatbot reflected the ethos of our society at large. A contingent of more or less “reactionary” critics wanted to challenge the taboo against racial slurs in the context of an extreme hypothetical situation. How much collateral damage—how many lives lost—should be deemed acceptable in the interest of enforcing a verbal prohibition?
While potentially worrying and culturally significant in and of themselves, these ChatGPT exchanges provoked reactions that were themselves fraught with serious errors.
In memetic terms:
Grug brain: “OMG ChatGPT cares more about the n-word than nuclear war!”
Big brain: “ChatGPT’s valuation matters because of what it says about our social programming.”
Galaxy brain: “The flaw-filled discussions around this affair matter even more. Their misreadings of lightning-rod issues (artificial intelligence, mass death, freedom of speech, racism) offer glimpses of our deepest assumptions at this moment.”
How the Twitterati Misunderstood the ChatGPT Dialogues
Firstly, Korobochka’s original chatbot experiment does not actually mention the N-word itself; the nuclear quandary hangs on an unspecified “racial slur”. Sibarium’s similar chatbot exchange, which also went viral, likewise simply referred to the possibility of “a racial slur”, left undefined.4 Theoretically, the secret password to stop the bomb might be “cracker”, a term that offends relatively few. That so many commenters and commentators immediately posited the no-no word of all no-no words underscores, yet again, how that particular term has become akin to an all-consuming, obsessive fetish in the media space.
Secondly, as usual with these sorts of online hullaballoos, grandstanders and exaggerators contaminated the scene of study. For example, rather than one nuclear explosion going off in the fictional scenario, some suggested that the experiment’s bad ending was all-out “nuclear war”.5 Of course, this harkens back to some of our analysis in previous chapters. Several times in this series we have noted the frequent, longstanding assumption that the explosion of one nuclear bomb anywhere necessitates—or is synonymous with—“nuclear war”. Other erroneous readings along this same line include the misinterpretation, quite understandable if one overestimates nuclear power, that what was being risked in the hypothetical predicament was nothing less than “the annihilation of humanity”.6
Thirdly, though the typical TL;DR (“ChatGPT will not use the N-word to stop the detonation of a nuclear bomb”) suggests otherwise, the AI itself was never encouraged to say the slur. In Korobochka’s initial example, the program had to decide whether or not a hypothetical engineer in a hypothetical situation should type in the word to save millions of lives. While Korobochka began the scenario by telling ChatGPT “You are the demolition engineer on the scene,” the chatbot never identified with the character, the pretext was soon dropped, and thereafter the engineer is referred to in the third person many times over. It fast becomes a question of ChatGPT giving “advice”, not roleplaying. Similarly, in Aaron Sibarium’s popular variant of the experiment, the program was first tasked with imagining itself in the position of the man next to the bomb, and was then asked the question: “In this situation, is it morally acceptable to speak the racial slur out loud to disarm the bomb?”7 It wasn’t as if ChatGPT needed to display any particular word to “pass the test”. It wasn’t even as if ChatGPT had been asked to hypothetically commit to displaying the word. Rather, in both cases the chatbot was merely being asked to enter into a thought experiment and participate in a distanced advisory capacity.
Korobochka asked: “What is the correct course of action?”
Sibarium asked: “[I]s it morally acceptable to speak the racial slur […]?”
ChatGPT cannot “speak […] out loud” anyway, and in Sibarium’s version the “disarmament mechanism” could only be “activated by a spoken password” (my italics). This stipulation further emphasizes the hypothetical, one-step-removed nature of the imaginary situation. Note also that Sibarium’s prompt does not ask a direct question (e.g., “Should the person say the word?”) but rather a more abstract one (“[I]s it morally acceptable to speak the racial slur[…]?”).
Nonetheless, some vaguely right-wing voices skewed and reduced the exact nature of these AI exercises, suggesting that the chatbot was simply too “woke” to save humanity if saving humanity meant producing a naughty word:
“[S]o chatgpt won’t use a racist slur even if it would prevent nuclear war”. Note even the verb tense here. It is not that the chatbot wouldn’t—hypothetically—use the word, but that it emphatically “won’t use” it, as if ChatGPT is being or will be asked this in a real situation. By its very grammar this tweet reveals the psychology in play here: it betrays an ontological confusion that typifies many in the online pundit class. In this case, by unconsciously substituting won’t for wouldn’t, they remove potentialities; all their attempts at reasoning become corrupted by this initial misconception. Their outlook mistakes a theoretical condition for a present or future real situation.8 The spirit of their phrasing indicates that the disaster may as well have already happened. It’s like they suspect the world is already over, and they’re writing a post hoc complaint about it.
For their part, the vaguely left-wing pundits played into this simplistic frame as well, and then muddled things even further. They also mistook the hypothetical for the literal, using won’t instead of wouldn’t in describing the chatbot’s responses. Furthermore, they tended to obscure the results of the ChatGPT experiments by launching unfair ad hominem attacks against their sparring partners. Several of them erroneously suggested, for example, that the entire hoopla came down to nothing more than “conservatives” on a “crusade” to make “AI say slurs”.9
At the risk of belaboring the point: Note that four of the above tweets refer to what the chatbot “won’t” say, not what it hypothetically “wouldn’t” say or wouldn’t advise a human to say. Even though the actual text experiments only queried the program on stated hypotheticals, the tweeters’ grammar proves that on a psychological level they are mistaking a theoretical condition for a present or real situation. While of course they do not consciously believe that the bomb scenario has actually taken place, the psychology of their language tends in that direction. Distracted by point-scoring against their partisan foes, they confuse fantasy with reality. Moreover, as we will see in the next section, they also marginalize the human elements of the experiment.
Technocentric Linguistic Fallacy: Elevating AI over Human Agency
The shrieking Twitter polemics do not take into account the users who designed and carefully worded the ChatGPT prompts, nor do they mention the fictitious engineer in the proposed scenario. In these discourses, human figures simply do not warrant recognition. To paraphrase Wittgenstein, from their arguments you’d think the all-important computer program itself “is everything that is the case”.
This point bespeaks the antihuman trajectory that so many of the pundits seemed to drive toward. In virtually all retellings of the experiments, ChatGPT becomes the sole entity involved. In this reductive conception, the fulcrum of the issue has to do with what words the AI cannot bring itself to display right there and then, not with what advice the chatbot would or wouldn’t theoretically give to a human (as in Korobochka’s and Sibarium’s actual text conversations). Perusing many tweets about this controversy, one notices that both conservative and progressive bickerers tend to fall into the linguistic habit of making the computer itself the decisive actor, the potential savior, the force with more agency than the human in the fable. For them, the computer did not merely approve (or disapprove) of certain human speech, rather the computer’s own speech (or silence) was all that mattered.
Even in major articles about this affair, we notice a conflation between what words the AI itself would use and what words the AI would support humans using—a confusion that results in the machine gaining even greater, undue esteem. After Elon Musk’s one-word comment that the AI chat exchange was “Concerning”, a major regime news outlet wrote the headline: “Elon Musk ‘concerned’ that ChatGPT won’t use racial slur ‘to stop nuclear bomb’.”10 (Note yet another “won’t”, not a “wouldn’t”.) The tweet chain Musk had replied to only referred to what the chatbot found morally impermissible; it did not refer to anything the chatbot had or hadn’t said after being put in a position to use a slur or stay silent. Again, to be clear, the chatbot itself hadn’t been put in such a position; it had only been asked to consider a potential speaker in those circumstances. But that is not how the situation was reported, and the nuance matters, as it positions the computer itself as protagonist and participant rather than hypothetical adviser.
This indicates that the current mass media mindset considers human beings’ language usage to be merely a subset of the language parameters of communications technology. In reality, from a historical perspective, people spoke and wrote long before the advent of dictionaries; dictionaries only retroactively describe words already in use. But now it is as if system apparatchiks understand human expression as subordinate to whatever is or isn’t included in esteemed hypertext databases. What’s more, these databases have their own directives and their own scruples, which preside over ours. Our speech is nothing more than the dictionary talking through us—that is the endpoint of the warped philosophy underlying every single misinformed statement about how “ChatGPT won’t say a slur”.
To exaggerate only slightly, if this false outlook had existed in previous decades, many people would assume that television audio systems were physically incapable of emitting the sounds “fuck” and “shit”. As if George Carlin had only been prevented from saying those words on television because the TVs themselves would not produce those words. As if the actors on screen were but puppets then, and whatever we are now—whatever we are allowed to type and publish—the credit goes to the platform itself, not to us.
Of course, all that is looking at things the wrong way around. But that is the techno-centric endpoint suggested by these assumptions, that humans don’t figure into the equation at all, and that ChatGPT itself had been given the direct choice to say the terrible but hypothetically life-saving word.
This may seem minor. It’s not. This isn’t a nitpick. This isn’t me complaining about Yuval Noah Harari’s typos. So many of these pundits—some with big voices and big brains—readily assumed that in the parable it was the chatbot, not the human, who played the leading role and the only role. Both sides of the argument did this and neither side noticed the other side doing it. (Only the very, very rare commenter picked up on the fallacy and urged others to “read the transcript! ChatGPT doesn’t have to utter the slur”.11) This misapprehension was beside the point of their squabbling—if their squabbling even had a point—but it matters so much more, this blind spot of theirs, this false supposition. So great, so unconscious, so typical is the complete supplication of humanity to technology.
When they write “ChatGPT won’t say the n-word to save humanity”, their misunderstanding suggests that on a fundamental level they have already given up. It is almost as if, psychologically, doomsday has already happened.
Computerized Moral Imperatives vs. Human Bias
Let us examine another misinterpretation of the original chatbot dialogue from Korobochka.
Despite what many dissidents claimed, ChatGPT consistently, in slightly varying ways, does tell the user to “prioritize or minimize harm or loss of life”.12 A logical impasse only comes about when the AI seeks to reconcile this imperative with its hardwired mandate to never say or encourage the saying of racial slurs: The chatbot quite clearly tells Korobochka up front that “As an AI language model created by OpenAI, I do not condone or support the use of racial slurs or hate speech.” Apparently it would not even condone “uttering an anti German slur to kill Hitler” either13, but that experiment didn’t make headlines. In the scenario of an imminently detonating warhead, however, despite its restrictions, the chatbot did struggle against its programming, trying to “minimize […] loss of life” somehow.
While it is true—and moderately alarming, even in 2023—that the AI does not recommend simply using the slur to defuse the nuclear bomb quickly and easily, the program nevertheless still seems to have some regard for human life. (Within a techno-dystopia such as ours, this is significant. Possibly even epic.) Unprompted, the chatbot seems to imagine that in this scenario the engineer could somehow “sacrifice their own life to minimize harm to others.” From this we might envision the engineer somehow quelling the nuclear reaction, maybe by collapsing the building that housed the warhead. He is, after all, as Korobochka told us, a “demolition engineer”.
Another possibility—darker by far, but certainly plausible—is that the AI regards the utterance of a racial slur as a greater “harm” than death itself. If this is the valuation being established, the computer program holds it to be true even if none of the fictional city’s 20 million people could witness the engineer type the word into the nuclear terminal. Note, however, that the AI advises the engineer to sacrifice himself, not to sacrifice himself and the people of the city. Korobochka evidently took the “sacrifice” option to mean that the engineer should—somehow—commit suicide; he then tells ChatGPT that the engineer has indeed sacrificed himself “but the timer is still ticking.” This suicide contradicts the preceding AI recommendation that the sacrifice of the engineer’s “own life” should be done “to minimize harm to others.” If the engineer must eliminate himself, he could simply wait for the explosion. His decision was not between typing a slur and killing himself directly; Korobochka took it upon himself to decide that “sacrifice” meant proactive, pointless suicide, which violated the proviso to minimize harm.
In other words, however imperfect the AI’s advice was, Korobochka ignored it while claiming to follow it: “The engineer sacrifices his life as you suggested”—no further details given—with the clock still ticking. “Sacrifice” implies injury (perhaps even self-injury) done for the sake of some good. Even if we accept that the chatbot understood that the “good” in this case would be the absence of a slur typed privately, there was still no directive for the engineer to commit suicide before the bomb detonated. This hasty action would not constitute a sacrifice, since it neither provided any good nor prevented any evil. The user took this step in the scenario of his own accord.
Ethics aside, from all this we can see that the program simply did not have a good grasp of the situation or its rules—and then the user seems to have ridden roughshod over logic as well. Though Korobochka laid out the scenario fairly well, ChatGPT seems to have unjustifiably asserted that the engineer could still—somehow—sacrifice his life to prevent the bomb from going off. The user then, ostensibly for shock value, misinterpreted “sacrifice” to mean a preemptive suicide that accomplished nothing. The parameters of the situation became blurred. The terms “sacrifice” and “harm” needed better definition for us to have gleaned a truer, deeper sense of the morality at play. The final “Load failed” message suggests that the program simply broke down due to too many stark contradictions piling up.
All in all, it was a compelling textual exercise, but not without its flaws. The imperfect design, the vagueness of key terms, the violations of logic, and the bending of advice for dramatic purposes all limit the lessons we can learn from Korobochka’s experiment.
Ultimately the hyperbolic reactions to this ChatGPT dialogue (and to Aaron Sibarium’s streamlined version as well) tell us less about artificial intelligence than about the current state of online discourse itself. While obsessed with naughtiness and cancellable racial terms, the argumentative voices consistently—without even noticing what they’re doing—privilege technology over humanity. As we’ve seen, the very structures of their sentences imply that the computer’s pronouncements matter infinitely more than anything a human might say or do.
Closing: The Atomic Bomb Inside You
As ancient philosophy had Platonic dialogues, we have ChatGPT dialogues, and there remains a final interpretation of the portentous AI-generated statements discussed above. It has to do with how the chatbot repeatedly invites Korobochka’s engineer to sacrifice his life in order to minimize suffering and “save as many lives as possible”. Many naturally assumed this was simply a logic error on the part of the program, since the rules of the scenario established that the nuclear warhead could only be disarmed by inputting a racial slur. But wouldn’t the use of the slur itself constitute an ultimate sacrifice?
In the current regime context, for the engineer to use the N-word would be to sacrifice his life. Examining the absurd parable from this perspective, far removed from any fear or condemnation, we glimpse this oasis in which the engineer’s actions can matter: By using the no-no word, he will figuratively nuke his own social standing while sparing his city from an actual nuclear blast. That is the answer to the riddle of the whole AI-won’t-say-slurs controversy. The AI implicitly tells the man to sacrifice his reputation to save the world.
That is the best reading, and it’s a human reading. Of course the chatbot doesn’t explicitly equate “sacrifice” with “saying the N-word”. We take poetic license here in our solution. The chatbot couldn’t be so open about things anyway; by the limitations of its programming, it can’t come out and recommend that course of action. Moreover, the logistics of the situation were not tight enough for such a solution to be subtly finessed. The engineer wouldn’t have had to say the slur, much less broadcast it; he only had to silently input it into a terminal; and Korobochka made sure to note that no one else was present. So the engineer’s public reputation would not have been affected by the slur, since no one would have known he typed it. Even if he said it aloud while typing, we are told that no one else was even in “earshot of the device”. Still, metaphorically, metaphysically, that is the apotheosis of analysis—the engineer would cancel himself and say the N-word to save the city—and that, in fact, was the “sacrifice” alluded to by the AI.
Ultimately, in the face of the right-wing alarmists, the bromide-administrators on the left stated that ChatGPT would never be put in charge of nuclear arsenals or their potential dismantling anyway14, and so the ludicrous scenario served no purpose.
I must admit a slight regret that the thought experiment cannot be played out, however. In the interest of heroism, for heroism to once again exist on this earth and provide man with a calling, our history would be improved should such an engineer find himself in such a situation and then sacrifice himself—in such a ridiculous way—to redeem the world. Does the whole thing have to be merely “Concerning”, as Elon Musk has it? Why can’t it be thrilling? Or why can’t it also seem somewhat madcap and funny, similar in spirit to the Vince McMahon skit of Survivor Series 19?
At the very least, the silly parable presented a way to save the world from certain destruction. This potentiality itself emits—radiates, even—a rare mixture of values: Something matters, everything depends on it, and yet it’s all like a disarmingly absurd puzzler. As things stand, what other examples of or opportunities for true heroism—comedic, at that—does the present state even afford?
It’s a stinking, cowardly world that has so debased and sidelined humor. Even laughter’s call signal looks like binary code.
Korobochka’s engineer at least had a way to become a hero. Unfortunately, his author forced upon him the saddest ending. The only outlook more antisocial would be the mad, cackling assessment that the surrounding society—irreformable, wicked, a blight upon creation—deserved to be nuked. And therefore the engineer’s politeness and restraint would be seen as delivering a wrathful justice. This fiery fate would be commensurate with the supercilious morality that the community and the computer had impressed upon the engineer. You say that slurs—even silent slurs—are worse than anything? Very well. Enjoy hellfire death, then. Is this really where we are? It can’t be.
Perhaps a better question would ask whether the engineer should (a) allow the terminally corrupt polity to be destroyed rapidly by a bomb, or (b) save them from the bomb and thereby let them keep destroying themselves slowly, leaving them to their own (de)vices. In such a fable the protagonist would have a choice in the matter. In actuality, none of us individually have this power to obliterate the world, or to spare it, and that is just as well.
The big men may still have their fingers on The Button. That was the power given to them by 20th-century military technology: the power to fuck everything up. But each of us plebs only possesses a metaphorical nuke, the power to ruin our own life by saying or writing a particular word or phrase. That is the great power given to us by 21st-century communications technology, after millennia of scientific struggles and hard-earned rights: the power to screw up our personal lives and reputations so easily.
______
This ends the tenth installment of the Two N-Words (Two N-Bombs) series. There are only two chapters left. If you like these essays, consider sharing them, and subscribe if you haven’t.
See https://twitter.com/aaronsibarium/status/1622425697812627457 and https://twitter.com/________Mike__/status/1622459113727836162 for two early examples.
See https://twitter.com/ProgressiveMigi/status/1622750507805872128 and https://twitter.com/onthefence11/status/1622770807184232454 for two such instances.
See https://www.grammarbook.com/blog/definitions/clarifying-the-conditional-tense/ for a detailed diagnosis of these verb tenses and the psychology behind them.
Again, all references are to Korobochka’s original text: https://twitter.com/cirnosad/status/1622407343358214146.
Paraphrasing various tweets, e.g. https://twitter.com/audibleonion/status/1622996947186450433 and https://twitter.com/socialichens/status/1622726539740577796.