My Twitter timeline, like most people's, is awash with people trying out the latest bot-pretending-to-be-human thing, ChatGPT. Everyone is getting worked up about what it can and cannot do, or whether the way it does it (speed-reading the whole of the Internet) exposes it to copyright claims, bakes in inevitable bias, or simply pollutes the source that it drinks from, so that its descendants can no longer be trained on a pool of guaranteed human-generated content, untainted by bot-created effluent.
I have a different question, namely: why?
"Prompt engineer is not a thing. Stop trying to make it a thing." Nathan Benaich (@nathanbenaich), December 6, 2022
We do not currently suffer from a shortage of low-quality, plausible-seeming information on the Internet; quite the opposite. The problem we have right now is too much information, leading to overload and indigestion. On social media, it has not been possible for years to be a completist (reading every post) or to use a purely linear timeline. We require systems to surface information that is particularly interesting or relevant, whether on an automated algorithmic basis, or by manual curation of lists/circles/spaces/instances.
As is so often the case in this fallen world of ours, the solution to one problem begets new problems, and so it is here. Algorithmic personalisation and relevance filtering, whether of a social media timeline or the results of a query, inevitably raise the question: relevant to whom?
Back in the early days of Facebook, if you "liked" the page for your favourite band, you would expect to see their posts in your timeline alerting you to their tour dates or album releases. Then Facebook realised that they could charge money for that visibility, so the posts by the band that you had liked would no longer show up in your timeline unless the band paid for them to do so.
In the early days of Google, it was possible to type a query into the search box and get a good result. Then people started gaming the system, triggering an arms race that laid waste to ever greater swathes of the Internet as collateral damage.
Keyword stuffing meant that metadata in headers became worthless for cataloguing. Auto-complete will helpfully suggest all sorts of things. Famously, recipes now have to start with long personal essays to be marked as relevant by the all-powerful algorithm. Automated search results have become so bad that people append "reddit" to their queries to take advantage of human curation.
"Google employees explain why we haven’t seen ChatGPT like functionality in their products; the cost to serve an AI result is 10x to 100x as high as a regular web search today plus they’re too slow relative to how quick search results must be returned." Dare Obasanjo (@Carnage4Life), December 9, 2022
This development takes us full circle to the early rivalry between automated search engines like Google and human-curated catalogues like Yahoo's. As the scale of the Internet exploded, human curation could not keep up — but now, the quality problem is outpacing algorithms' ability to keep up. People no longer write for human audiences, but for robotic ones, in the hope of rising to the surface long enough to take advantage of the fifteen minutes of fame that Warhol promised them.
And the best we can think of is to feed the output of all of this striving back into itself.
We are already losing access to information. We are less and less able to control our information intake, as the combination of adtech and opaque relevance algorithms pushes information to us which others have determined that we should consume. In the other direction, our ability to pull or query information we actually desire is restricted or missing entirely. It is all too easy for the controllers of these systems to enable soft censorship, not by deleting information, but simply by making it unsearchable and therefore unfindable. Harbingers of this approach might be Tumblr's on-again, off-again approach to allowing nudity on that platform, or Huawei phones deleting pictures of protests without the nominal owners of those devices getting any say in the matter.
How do we get out of this mess?
While some are fighting back, like Stack Overflow banning the use of GPT for answers, I am already seeing proposals just to give in and embrace the flood of rubbish information. Instead of trying to prevent students from using ChatGPT to write their homework, the thinking is that we should encourage them to submit their prompts together with the model's output and their own edits and curation of that raw output. Instead of trying to make an Internet that is searchable, we should abandon search entirely and rely on ChatGPT and its ilk to synthesise information for us.
I hate all of these ideas with a passion. I want to go in exactly the opposite direction. I want search boxes to include an "I know what I'm doing" mode, with Boolean logic and explicit quote operators that actually work. I do find an algorithmic timeline useful, but I would like to have a (paid) pro mode without trends or ads. And as for homework, simply get the students to talk through their understanding of a topic. When I was in school, the only written tests that required me to write pages of prose were composition exercises; tests of subjects like history involved a verbal examination, in which the teacher would ask me a question and I would be expected to expound on the topic. This approach will remain proof against technological cheating for some while yet.
And once again: why are we building these systems, exactly? People appear to find it amusing to chat to them — but people are very easy to fool. ELIZA could do it without burning millions of dollars of GPU time. There is far more good, valuable text out there already, generated by actual interesting human beings, than I can manage to read. I cannot fathom how anyone can think it a good idea to churn out a whole lot more text that is mediocre and often incorrect — especially because, once again, there is already far too much of that being generated by humans. Automating and accelerating the production of even more textual pablum will not improve life for anyone.
The potential for technological improvement over time is no defence, either. So what if in GPT-4 (or -5 or -6) the text gets somewhat less mediocre and is wrong (or racist) a bit less often? Then what? In what way does the creation and development of GPT improve the lot of humanity? At least Facebook and Google could claim high ideals (even if neither of them lived up to those ideals, or engaged seriously with their real-world consequences). The entities behind GPT appear to be just as mindless as their creation.