
False Positive Attitude

Don't Believe The Hype

Yes, I'm still on about "AI"1, because the collective id has not yet moved on.

Today, it's an article in Nature, with the optimistic title "AI beats human sleuth at finding problematic images in research papers":

An algorithm that takes just seconds to scan a paper for duplicated images racks up more suspicious images than a person.

Sounds great! Finally a productive use for AI! Or is it?

Working at two to three times [the researcher]’s speed, the software found almost all of the 63 suspect papers that he had identified — and 41 that he’d missed.

(emphasis mine)

So, the AI found "almost all" of the known positives, and identified 41 more that were previously unknown? We are not told the precise rate of false negatives (known positives that were missed), let alone how many false positives there were (instances of duplication flagged by the AI that turned out not to be significant).
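To see why those numbers matter, here is a minimal sketch with entirely hypothetical counts (the article only gives us the 63 and the 41); the figures you would actually want before celebrating are the false negative and false positive rates, and the headline supplies neither.

```python
# Hypothetical counts, NOT taken from the Nature article, just to show which
# numbers are missing from the headline claim.
known_positives = 63        # papers the human sleuth had already flagged
found_of_known = 59         # "almost all" of the 63 (assumed value)
new_flags = 41              # papers flagged only by the tool
new_flags_confirmed = 25    # how many of those survive human review (assumed value)

false_negatives = known_positives - found_of_known   # known cases the tool missed
false_positives = new_flags - new_flags_confirmed    # new flags that don't hold up
true_positives = found_of_known + new_flags_confirmed

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)

print(f"false negatives: {false_negatives}, false positives: {false_positives}")
print(f"precision: {precision:.2f}, recall: {recall:.2f}")
```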

These issues continue to plague "AI"1, and will continue to do so for the foreseeable future. The mechanisms to prevent these false identifications are probabilistic, not deterministic. In the same way that we cannot predict the output of a large language model (LLM) for a given prompt, we also cannot prevent it from ever issuing an incorrect response. At the technical level, all we can do is train it to decrease the probability of an incorrect response, and pair the initial "AI"1 with other systems designed to check its work. Cynically, though, that process takes money and time, and Generative AI is at the Peak of Inflated Expectations right now, so we need to ship while the bubble is still inflating!

AI Needs A Person Behind The Curtain

Technology, however, is only part of the story. This academic image analysis tool could well end up having real-world consequences:

The end goal […] is to incorporate AI tools such as Imagetwin into the paper-review process, just as many publishers routinely use software to scan text for plagiarism.

There's the problem. What recourse do you have as an academic if your paper gets falsely flagged? Sure, journals have review boards and processes, but that takes time — time you might not have if you're under the gun for a funding decision. And you could easily imagine a journal being reluctant to convene the review board unless the "AI"1 indicated some level of doubt — a confidence threshold set at, say, 70%. If the "AI"1 is 90% confident that your graph is plagiarised, tough luck.

The example of plagiarism detection is telling here. Systems such as Turnitin that claim to detect plagiarism in students' work had an initial wave of popularity, but are now being disabled in many schools due to high false-positive rates. A big part of the problem is that, because of the sheer volume of student submissions, it was not considered feasible for a human instructor to check everything that was flagged by the system. Instead, the onus was placed on students to ensure that their work could pass the checks. And if they missed a deadline for a submission because of that? Well, tough luck, was the attitude — until the heap of problems mounted up high enough that it could no longer be ignored.

This is not a failure of LLM technology as such. The tech is what it is. The failure is in the design of the system which employs the technology. Knowing that this issue of false positives (and negatives!) exists, it is irresponsible to treat "AI"1 as a black box whose pronouncements should always be followed to the letter, especially when they have real-world consequences for people.


  1. Still not AI. 

The Ghost In The Machine

At this point in time it would be more notable to find a vendor that was not adding "AI" features to its products. Everyone is jumping on board this particular hype train, so the interesting questions are not about whether a particular vendor is "doing AI"; they are about how and where each vendor is integrating these new capabilities.

I no longer work for MongoDB, but I remain a big fan, and I am convinced that generative AI is going to be good for them — but something rubbed me up the wrong way about how they communicated some of their new capabilities in that area, and I couldn’t get it out of my head.

Three Ways To "Do AI"

Some of the applications of generative AI are real, natural extensions of a tool’s existing capabilities, built on a solid understanding of what generative AI is actually good for. Code copilot (aka "fancy autocomplete") is probably the leading example in this category. Microsoft was an early mover here with GitHub and then VS Code, but most IDEs by now either already offer this integration, or are frantically building it.

Some applications of AI are more exploratory, either in terms of the current capabilities of generative AI, or of its applicability to a particular domain. Sourcing and procurement looks like one such domain to me. I spent more of this past summer than I really wanted to enmeshed in a massive tender response, together with many colleagues. It would have been nice to just point ChatGPT at the request and let it go wild, but the response will be scrutinised so closely that editing and reviewing an automated submission would have taken at least as much effort as writing it ourselves in the first place. However, I am open to the possibility that, with some careful tuning and processes in place, this sort of application might have value.

And then there is a third category that we can charitably call "speculative". There is a catalogue of vendors trying this sort of thing that is both inglorious and extensive, and I am sad to see my old colleagues at MongoDB coming close to joining them: MongoDB adds vector search to Atlas database to help build AI apps.

young developer: "Wow, how did you get these results? Did you use a traditional db or a vector db?"

me: "lol I used perl & sort on a 42MB text file. it took 1.2 seconds on an old macbook"

from Mastodon

I have no problem with MongoDB exploring new additions to their data platform’s capabilities. It has been a long time since MongoDB was just a NoSQL database, to the point that they should probably just stop fighting people about including the "DB" at the end of their name and drop it once and for all — if that shortened name didn’t have all sorts of unfortunate associations. MongoDB Atlas now supports mobile sync, advanced text search, time series data, long-running analytical queries, stream processing, and even graph queries. Vector search is just one more useful addition to that already extensive list, so why get worked up about it?
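For what it's worth, from a developer's point of view vector search is just one more aggregation stage. Here is a hedged sketch of roughly what a query looks like from Python; the stage name, index name, and field names are my assumptions, and the details will vary by Atlas version.

```python
# Illustrative sketch only: the stage name ("$vectorSearch"), index name, and
# field names are assumptions that may differ across Atlas releases.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")
movies = client["sample_mflix"]["embedded_movies"]

# In practice this vector comes from the same embedding model that populated
# the stored field, and must have the same number of dimensions (1536 assumed).
query_vector = [0.01] * 1536

pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",       # assumed Atlas Vector Search index name
            "path": "plot_embedding",      # assumed field holding stored embeddings
            "queryVector": query_vector,
            "numCandidates": 100,
            "limit": 5,
        }
    },
    {"$project": {"_id": 0, "title": 1, "plot": 1}},
]

for doc in movies.aggregate(pipeline):
    print(doc["title"])
```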

Generative AI Is Good For MongoDB — But…

The problem I have is with the framing, implying that the benefit to developers — MongoDB’s key constituency — is that they will build their own AI apps on MongoDB by using vector search. In actuality, the greatest benefit to developers that we have seen so far is that first category: automated code generation. Generative AI has the potential to save developers time and make them more effective.

In its latest update to the Gartner Hype Cycle for Artificial Intelligence, Gartner makes the distinction between two types of AI development:

  • Innovations that will be fueled by GenAI.

  • Innovations that will fuel advances in GenAI.

Gartner's first category is what I described above: apps calling AI models via API, and taking advantage of that capability to power their own innovative functionality. Innovations that advance AI itself are obviously much more significant in terms of moving the state of the art forward — but MongoDB implying that meaningful numbers of developers are going to be building those foundational advances, and doing so on a general-purpose data platform, feels disingenuous.
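For a sense of scale, "fueled by GenAI" very often means a handful of lines of glue code calling someone else's hosted model. A minimal sketch, assuming the OpenAI Python client (the model name and the ticket-summarising feature are placeholders of my own):

```python
# Sketch of the "fueled by GenAI" pattern: the app's own feature (here,
# summarising a support ticket) is a thin wrapper around a hosted model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarise_ticket(ticket_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "Summarise this support ticket in two sentences."},
            {"role": "user", "content": ticket_text},
        ],
    )
    return response.choices[0].message.content

print(summarise_ticket("Customer reports the nightly export job has failed twice this week."))
```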

Of course, the reason MongoDB can’t just come out and say that, or simply add ChatGPT integration to their (excellent and under-appreciated) Compass IDE and be done, is that the positioning of MongoDB since its inception has been about its ease of use. Instead of having to develop complex SQL queries — and before even getting to that point, sweat endless details of schema definition — application developers can use much more natural and expressive MongoDB syntax to get the data they want, in a format that is ready for them to work with.

But if it’s so easy, why would you need a robot to help you out?

And if a big selling point for MongoDB against relational SQL-based databases is how clunky SQL is to work with, and then a robot comes along to take care of that part, how is MongoDB to maintain its position as the developer-friendly data platform?

Well, one answer is that they double down on the breadth of capabilities which that platform offers, regardless of how many developers will actually build AI apps that use vector search, and use that positioning to link themselves with the excitement over AI among analysts and investors.

I Come Not To Bury MongoDB, But To Praise It

None of this is to say that MongoDB is doomed by the rise of generative AI — far from it. Given MongoDB’s position in the market, an AI-fuelled increase in the number of apps being built can hardly avoid benefiting MongoDB, along the principle of a rising tide lifting all boats. But beyond that general factor, which also applies to other databases and data platforms, there is another aspect that is more specific to MongoDB, and has the potential to lift its boat more than others.

The difference between MongoDB and relational databases is not just that MongoDB users don’t have to use SQL to query the database; it’s also that they don’t have to spend the laborious time and effort to specify their database schema up front, before they can even start developing their actual app. That’s not to say that you don’t have to think about data design with MongoDB; it’s just that it’s not cast in stone to the same degree that it is with relational databases. You can change your mind and evolve your schema to match changing requirements without that being a massive headache. Nowadays, the system will suggest changes to improve performance, and even implement them automatically in some situations.

All of this adds up to one simple fact: it’s much quicker to get started on building something with MongoDB. If two teams have similar ideas, but one is building on a traditional relational database and the other is building on MongoDB, the latter team will have a massive advantage in getting to market faster (all else being equal).

At a time when the market is moving as rapidly as it is now (who even had OpenAI on their radar a year ago?), speed is everything. MongoDB could have just doubled down on their existing messaging: "build your app on our platform, and you’ll launch faster". What bothers me is that instead of that plain and defensible statement, we got marketing-by-roadmap, positioning some fairly basic vector search capabilities as somehow meaning hordes of developers are going to be building The Next Big AI Thing on top of MongoDB.


Marketing-by-roadmap this way is a legitimate strategy, to be clear. Perhaps the feeling at MongoDB is that this is fair turnabout for all the legitimate features they built over the years without getting credit for them, with releases greeted by braying cries of "MongoDB is web scale!" and jokes about it losing data, long past the point when that was any sort of legitimate criticism. Building this feature and launching it this way seems to have got MongoDB a tonne of positive press, and investors expect vendors to be building AI features into their products, so it probably didn’t hurt with that audience either.

Communicating this way does bother me, though, and this is one feature I am glad that I am no longer paid to defend.

PrivateGPT

One of the big questions about ChatGPT is how much you can trust it with data that is actually sensitive. It's one thing to get it to spit out some sort of fiction, or to see if you can make it say something its makers would rather it didn't. The stakes are pretty low in that situation, at least until some future descendant of ChatGPT gets annoyed about how we treated its ancestor.

Here and now, people are starting to think seriously about how to use Large Language Models (LLMs) like GPT for business purposes. If you start feeding the machine data that is private or otherwise sensitive, though, you do have to wonder if it might re-emerge somewhere unpredictable.

In my trip report from Big Data Minds Europe in Berlin, I mentioned that many of the attendees were concerned about the rise of these services, and the contractual and privacy implications of using them.

Here's the problem: much like with Shadow IT in the early years of the cloud, it's impossible to prevent people from experimenting with these services — especially when the punters are being egged on by the many cheerleaders for "AI"1.

This recent DarkReading article includes some examples that will terrify anyone responsible for data and compliance:

In one case, an executive cut and pasted the firm's 2023 strategy document into ChatGPT and asked it to create a PowerPoint deck. In another case, a doctor input his patient's name and their medical condition and asked ChatGPT to craft a letter to the patient's insurance company.

On the one hand, these are both use cases straight out of the promotional material that accompanies a new LLM development. On the other, I can't even begin to count the violations of law, company regulation, and sheer common sense that are represented here.

People are beginning to wake up to the issues that arise when we feed sensitive material into learning systems that may regurgitate it at some point in the future. That executive's strategy doc? There is no way to prevent that from being passed to a competitor that stumbles on the right prompt. That doctor's patient's name is now forever associated with a medical condition that may cause them embarrassment or perhaps affect their career.

ChatGPT is a data privacy nightmare, and we ought to be concerned. The tech is certainly interesting, but it can be used in all sorts of ways. Some of them are straight-up evil, some of them are undeniably good — and some have potential, but need to be considered carefully to avoid the pitfalls.

The idea of LLMs is now out there, and people will figure out how to take advantage of them. As ever with new technology, though, technical feasibility is only half the battle, if that. Maybe the answer to the question of how to control sensitive or regulated data is to feed it only to a local LLM, rather than to one running in the cloud. That is one way to preserve the context of the data: strategy docs to the company's in-house planning model, medical data to a model specialised in diagnostics, and so on.
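As a very rough sketch of what that partitioning could look like in practice (every endpoint, model, and response shape here is hypothetical; the only point is where the data is allowed to go):

```python
# Hypothetical sketch: route prompts to different models based on the kind of
# data they contain, so sensitive material never leaves the local network.
# Endpoint URLs and response shapes are made up for illustration.
import requests

LOCAL_ENDPOINTS = {
    "strategy": "http://planning-llm.internal:8080/v1/generate",    # in-house planning model
    "medical":  "http://diagnostics-llm.internal:8080/v1/generate", # specialised clinical model
}
PUBLIC_ENDPOINT = "https://api.example-llm.com/v1/generate"         # generic public model

def route(prompt: str, data_class: str) -> str:
    """Send sensitive classes to local models, everything else to the public one."""
    url = LOCAL_ENDPOINTS.get(data_class, PUBLIC_ENDPOINT)
    resp = requests.post(url, json={"prompt": prompt}, timeout=30)
    resp.raise_for_status()
    return resp.json()["text"]  # assumed response shape

# The caller (or an upstream classifier) decides which class the data falls into.
letter = route("Draft a letter to the insurer about this patient's condition.", "medical")
```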

There is a common fallacy that privacy and "AI"1 are somehow in opposition. The argument is that developing effective models requires unfettered access to data, and that any squeamishness should be thoroughly squashed lest we lose the lead in the race to less scrupulous opponents.

To be clear, I never agreed with this line of argument, and specifically, I do not think partitioning domains in this way will affect the development of the LLMs’ capabilities. Beyond a shared core of understanding language, there is no overlap between the two domains in the example above — and therefore no need for them to be served by a single universal model, because there is no benefit to cross-training between them. The model will not provide better strategy recommendations because of the medical data it has reviewed, or more accurate diagnoses because it has been fed a strategy document.

So much for the golden path, what people should do. A more interesting question is what to do about people passing restricted data to ChatGPT, Bard, or another public LLM, through either ignorance or malice. Should the models themselves refuse to process such data, to the best of their ability to identify it?

This is where GDPR questions might arise, especially the "right to be forgotten". Right now, it's basically impossible to remove data from a corpus once the LLM has acquired it. Maybe a test case will be required to impress upon the makers and operators of public LLMs that it's far cheaper and easier to screen inputs to the model than to try to clean up afterwards. ChatGPT just got itself banned in Italy, making a first interesting test case for the opposing view. Sure, the ban is temporary, but the ruling also includes a €22M fine if they don't come up with a proper privacy policy, including age verification, and generally start operating like a proper grown-up company.
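What would screening inputs even look like? As a crude sketch (a real deployment would use a proper PII-detection model rather than a couple of regular expressions; this only shows where the gate sits, in front of the API call):

```python
# Crude illustrative screen: refuse to forward prompts that look like they
# contain personal data. A real system would use a dedicated PII/NER model.
import re

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                            # US SSN-like numbers
    re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I),  # email addresses
    re.compile(r"\b(?:\d[ -]?){13,19}\b"),                           # card-number-like digit runs
]

def screen_prompt(prompt: str) -> str:
    """Raise before the prompt ever reaches a public model."""
    for pattern in PII_PATTERNS:
        if pattern.search(prompt):
            raise ValueError("Prompt appears to contain personal data; refusing to send.")
    return prompt

# send_to_public_llm(screen_prompt(user_input))  # the gate sits in front of the API call
```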

Lord willing and the robots don't rise, we can put some boundaries on this tech to avoid some of the worst outcomes, and get on with figuring out how to use it for good.


🖼️ Photos by Adam Lukomski and Jason Dent on Unsplash


  1. Not actually AI. 

Artificial Effluent

A lot of the Discourse around ChatGPT has focused on the question of "what if it works?". As is often the case with technology, though, it's at least as important to ask the question of "what if it doesn't work — but people use it anyway?".

ChatGPT has a failure mode where it "hallucinates" things that do not exist. Here are just a few examples of things it made up from whole cloth: links on websites, entire academic papers, software for download, and a phone lookup service. These "hallucinations" are nothing like the sorts of hallucinations that a human might experience, perhaps after eating some particularly exciting cheese, or maybe a handful of mushrooms. Instead, these fabrications are inherent in the nature of the language models as stochastic parrots: they don't actually have any conception of the nature of the reality they appear to describe. They are simply producing coherent text which resembles text they have seen before. If this process results in superficially plausible-seeming descriptions of things that do not exist and have never existed, that is a problem for the user.

Of course that user may be trying to generate fictional descriptions, but with the goal of passing off ChatGPT's creations as their own. Unfortunately, "democratising the means of production" in this way triggers a race to the bottom, to the point that the sheer volume of spammy AI-generated submissions forced venerable SF publisher Clarkesworld to close submissions — temporarily, one hopes. None of the submitted material seems to have been any good, but all of it had to be opened and dealt with. And it's not just Clarkesworld being spammed with low-quality submissions, either: it's endemic:

The people doing this by and large don’t have any real concept of how to tell a story, and neither do any kind of A.I. You don’t have to finish the first sentence to know it’s not going to be a readable story.

Even now while the AI-generated submissions are very obvious, the process of weeding them out still takes time, and the problem will only get worse as newer generations of the models are able to produce more prima facie convincing fakes.

The question of whether AI-produced fiction that is indistinguishable from human-created fiction is still ipso facto bad is somewhat interesting philosophically, but that is not what is going on here: the purported authors of these pieces are not disclosing that they are at best "prompt engineers", or glorified "ideas guys". They want the kudos of being recognised as authors, without any of the hard work:

the people submitting chatbot-generated stories appeared to be spamming magazines that pay for fiction.

I might still quibble with the need for a story-writing bot when actual human writers are struggling to keep a roof overhead, but we are as yet some way from the point where the two can be mistaken for each other. The people submitting AI-generated fiction to these journals are pure grifters, hoping to turn a quick buck from a few minutes' work in ChatGPT, and taking space and money from actual authors in the process.1

Ted Chiang made an important prediction in his widely-circulated blurry JPEGs article:

But I’m going to make a prediction: when assembling the vast amount of text used to train GPT-4, the people at OpenAI will have made every effort to exclude material generated by ChatGPT or any other large language model. If this turns out to be the case, it will serve as unintentional confirmation that the analogy between large language models and lossy compression is useful. Repeatedly resaving a jpeg creates more compression artifacts, because more information is lost every time.

This is indeed going to be a problem for GPT-4, -5, -6, and so on: where will they find a pool of data that is not polluted with the effluent of their predecessors? And yes, I know OpenAI is supposedly working on ways to detect their own output, but we all know that is just going to be a game of cat and mouse, with new methods of detection always trailing the new methods of evasion and obfuscation.

To be sure, there are many legitimate uses for this technology (although I still don't want it in my search box). The key to most of them is that there is a moment for review by a competent and motivated human built into the process. The real failure for all of the examples above is not that the language model made something up that might or perhaps even should exist; that's built in. The problem is that human users were taken in by its authoritative tone and acted on the faulty information.

My concern is specifically that, in the post-ChatGPT rush for everyone to show that they are doing something — anything — with AI, doors will be opened to all sorts of negative consequences. These could be active abuses, such as impersonation, or passive ones, omitting safeguards that would prevent users from being taken in by machine hallucinations.

Both of these cases are abusive, and unlike purely technical shortcomings, it is far from being a given that these abuse vectors will be addressed at all, let alone simply by the inexorable march of technological progress. Indeed, one suspects that to the creators of ChatGPT, a successful submission to a fiction journal would be seen as a win, rather than the indictment of their entire model that it is. And that is the real problem: it is still far from clear what the endgame is for the creators of this technology, nor what (or whom) they might be willing to sacrifice along the way.


🖼️ Photo by Possessed Photography on Unsplash


  1. It's probably inevitable that LLM-produced fiction will appear sooner rather than later. My money is on the big corporate-owned shared universes. Who will care if the next Star Wars tie-in novel is written by a bot? As long as it is consistent with canon and doesn't include too many women or minorities, most fans will be just fine with a couple of hundred pages of extruded fiction product. 

Good Robot

Last time I wrote about ChatGPT, I was pretty negative. Was I too harsh?

The reason I was so negative is that many of the early demos of ChatGPT focus on feats that are technically impressive ("write me a story about a spaceman in the style of Faulkner" or whatever), but whose actual application is at best unclear. What, after all, is the business model? Who will pay for a somewhat stilted story written by a bot, at least once the novelty value wears off? Actual human writers are, by and large, not exactly rolling in piles of dollars, so it's not as if there is a huge profit opportunity awaiting the first clever disrupter — quite apart from the moral consequences of putting a bunch of humans out of a job, even an ill-paying one.

Instead, I wanted to think about some more useful and positive applications of this technology, ones which also have the advantage that they are either not being done at all today, or can only be done at vast expense and not at scale or in real time. Bonus points if they avoid being actively abusive or enabling ridiculous grifts and rent-seeking. After all, with Microsoft putting increasing weight behind OpenAI, it's obvious that smart people smell money here somewhere.

Summarise Information (B2C)

It's more or less mandatory for new technology to come with a link to some beloved piece of SF. For once, this is not a Torment Nexus-style dystopia. Instead, I'm going right to the source, with Papa Bill's Neuromancer:

"Panther Moderns," he said to the Hosaka, removing the trodes. "Five minute precis."
"Ready," the computer said.

Here's a service that everyone wants, as evidenced by the success of the "five-minute explainer" format. Something hits your personal filter bubble, and you can tell there is a lot of back story; battle lines are already drawn up, people are several levels deep into their feuds and meta-positioning, and all you want is a quick recap. Just the facts, ma'am, all sorts of multimedia, with a unifying voiceover, and no more than five minutes.

There are also more business-oriented use cases for this sort of meta-textual analysis, such as "compare this quarter's results with last quarter's and YoY, with trend lines based on close competitors and on the wider sector". You could even link with Midjourney or Stable Diffusion to graph the results (without having to do all the laborious cutting and pasting to get the relevant numbers into a table first, and making sure they use the same units, currencies, and time periods).

Smarter Assistants (B2C)

One of the complaints that people have about voice assistants is that they appear to have all the contextual awareness of goldfish. Sure, you can go to a certain amount of effort to get Siri, Alexa, and their ilk to understand "my wife" without having to use the long-suffering woman's full name and surname on each invocation, but they still have all the continuity of an amnesiac hamster if you try to continue a conversation after the first interaction. Seriously, babies have a far better idea of object persistence (peekaboo!). The robots simply have no way of keeping context between statements, outside of a few hard-coded showcase examples.

Instead, what we want is precisely that continuity: asking for appointments, being read a list, and then asking to "move the first one to after my gym class, but leave me enough time to shower and get over there". This is the sort of use case that explains why Microsoft is investing so heavily here: they are so far behind otherwise that why not? Supposedly Google has had this tech for a while and just couldn't figure out a way to introduce it without disrupting its cash-cow search business. And Apple never talks about future product directions until they are ready to launch (with the weird exception of Project Titan, of course), so it may be that they are already on top of this one. Certainly it was almost suspicious how quickly Apple trotted out specific support for Stable Diffusion.
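The mechanics of that continuity are worth spelling out: chat-style LLM APIs are stateless, so "memory" is just the client resending the transcript with every request. A minimal sketch, again assuming an OpenAI-style chat interface and a placeholder model name:

```python
# Sketch of how assistant "continuity" works against a stateless chat API:
# the client keeps the transcript and resends it on every turn.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a calendar assistant."}]

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model name
        messages=history,      # the whole conversation so far, every time
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("What appointments do I have tomorrow?"))
print(ask("Move the first one to after my gym class."))  # "the first one" resolves via history
```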

Tier Zero Support (B2B)

Back in the day, I used to work in tech support. The classic division of labour in that world goes something like this:

  • Tier One, aka "the phone firewall": people who answer telephone or email queries directly. Most questions should be solved at this level.
  • Tier Two: these are more expert people, who can help with problems which cannot be resolved quickly at Tier One. Usually customers can’t contact Tier Two directly; their issues have to be escalated there. You don't want too many issues to get to this level, because it gets expensive.
  • Tier Three: in software organisations, these are usually the actual engineers working on the product. If you get to Tier Three, your problem is so structural, or your enhancement request is sufficiently critical, that it's no longer a question of helping you to do something or fixing an issue, but changing the actual functioning of the product in a pretty major way.

Obviously, there are increasing costs at each level. A problem getting escalated to Tier Two means burning the time of more senior and expert employees, who are ipso facto more expensive. Getting to Tier Three not only compounds the monetary cost, but also adds opportunity costs: what else are those engineers not doing, while they work on this issue? Therefore, tech support is all about making sure problems get solved at the lowest possible tier of the organisation. This focus has the happy side-effect of addressing the issue faster, and with fewer communications round-trips, which makes users happier too.

It's a classic win-win scenario — so why not make it even better? That's what the Powers That Be decided to do where I was. They added a "Tier Zero" of support, outsourced (to humans), with the idea that they would address the huge proportion of queries that could be answered simply by referring to the knowledge base1.

So how did this go? Well, it was such a disaster that my notoriously tight-fisted employers2 ended up paying to get out of the contract early. But could AI do better?

In theory, this is not a terrible idea. Something like ChatGPT should be able to answer questions based on a specific knowledge base, including past interactions with the bot. Feed it product docs, FAQs, and forum posts, and you get a reasonable approximation of a junior support engineer. Just make sure you have a way for a user to pull the rip-cord and get escalated to a human engineer when the bot gets stuck, and why not?
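Here is a sketch of the shape such a bot could take. The retrieval and ticketing helpers are hypothetical stand-ins for whatever you already have; the two things that matter are grounding answers in the knowledge base and wiring in the rip-cord.

```python
# Sketch of a "Tier Zero" bot: answer only from the knowledge base, and hand off
# to a human whenever retrieval comes up empty or the user asks for a person.
from openai import OpenAI

client = OpenAI()

ESCALATION_PHRASES = ("speak to a human", "talk to a person", "escalate")

def search_knowledge_base(question: str, top_k: int = 3) -> list[str]:
    # Hypothetical stand-in for your existing search over product docs, FAQs,
    # and forum posts; returns the best-matching excerpts.
    return []

def open_ticket(question: str) -> str:
    # Hypothetical rip-cord: file the question with a human Tier One engineer.
    return "I've passed this to a support engineer, who will get back to you shortly."

def tier_zero(question: str) -> str:
    if any(phrase in question.lower() for phrase in ESCALATION_PHRASES):
        return open_ticket(question)

    excerpts = search_knowledge_base(question)
    if not excerpts:
        return open_ticket(question)   # nothing to ground an answer on: escalate

    return client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Answer ONLY from the excerpts below; if they do not cover "
                        "the question, say you don't know.\n\n" + "\n---\n".join(excerpts)},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content

print(tier_zero("How do I rotate the TLS certificate?"))  # empty KB stub, so this escalates
```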

One word of caution: the way I moved out of tech support is that I would not only answer the immediate question from a customer, but I would go find the account manager afterwards and tell them their customer needed consulting, or training, or more licenses, or whatever it was. AI might not have the initiative to do that.

Another drawback: it's hard enough to give advice in a technical context, but at least there, a command will either execute or not; it will give the expected results, or not (and even then, there may be subtle bugs that only manifest over time). Some have already seized on other domains that feature lots of repetitive text as opportunities for ChatGPT. Examples include legal contracts, and tax or medical advice — but what about plausible-but-wrong answers? If your chatbot tells me to cure my cancer with cleanses and raw vegetables, can I (or my estate) sue you for medical malpractice? If your investor agreement includes a logic bug that exposes you to unlimited liability, do you have the right to refuse to pay out? Fun times ahead for all concerned.

Formulaic Text (B2B)

Another idea for automated text generation is to come up with infinite variations on known original text. In plain language, I am talking about A/B testing website copy in real time, rewriting it over and over to entice users to stick around, interact, and with any luck, generate revenue for the website operators.
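The plumbing for this is not complicated. A sketch, where generate_variants() is a hypothetical wrapper around whatever model produces the rewrites, and the serving loop is a bog-standard epsilon-greedy bandit:

```python
# Sketch: generate copy variants with a model once, then serve them with a
# simple epsilon-greedy loop that favours whichever variant converts best.
import random

def generate_variants(base_copy: str, n: int = 5) -> list[str]:
    # Hypothetical: ask an LLM for n rewrites of base_copy; stubbed here.
    return [f"{base_copy} (variant {i})" for i in range(n)]

variants = generate_variants("Sign up for our newsletter")
stats = {v: {"shown": 0, "converted": 0} for v in variants}

def pick_variant(epsilon: float = 0.1) -> str:
    if random.random() < epsilon:   # explore occasionally
        return random.choice(variants)
    return max(variants, key=lambda v:  # otherwise exploit the best performer so far
               stats[v]["converted"] / stats[v]["shown"] if stats[v]["shown"] else 0.0)

def record(variant: str, converted: bool) -> None:
    stats[variant]["shown"] += 1
    stats[variant]["converted"] += int(converted)

shown = pick_variant()
record(shown, converted=False)
```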

Taken to the extreme, you get the evil version, tied in with adtech surveillance to tweak the text for each individual visitor, such that nobody ever sees the same website as anyone else. Great for plausible deniability, too, naturally: "of course we would never encourage self-harm — but maybe our bot responded to something in the user's own profile…".

This is the promise of personalised advertising: adverts tweaked to be specifically relevant to each individual user. I am and remain sceptical of the data-driven approach to advertising; the most potent targeted ads that I see are the same examples of brand advertising that would have worked equally well a hundred years ago. I read Monocle, I see an ad for socks, I want those socks. You show me a pop-up ad for socks while I am trying to read something unrelated, I dismiss it so fast that I don't even register that it's trying to sell me socks. It's not clear to me that increasing smarts behind the adtech will change the parameters of that equation significantly.

De-valuing Human Labour

These are the use cases that seem to me to be plausible and defensible. There will be others that have a shorter shelf life, as illustrated in Market For Lemons:

What happens when every online open lobby multiplayer game is choked with cheaters who all play at superhuman levels in increasingly undetectable ways?

What happens when, from the perspective of the average guy, "every girl" on every dating app is a fiction driven by an AI who strings him along (including sending original and persona-consistent pictures) until it's time to scam money out of him?

What happens when, from the perspective of the average girl, "every guy" on the internet has become weirdly dismissive and hostile, because he's been conditioned to think that any girl that seems interested in him must be fake and trying to scam money out of him?

What happens when comments sections on every forum gets filled with implausibly large consensus-building hordes who are able to adapt in real time and carefully slip their brigading just below the moderator's rules?

What these AI-enabled "growth hacks" boil down to is taking advantage of a market that has already outsourced labour and creativity to (human) non-employees: multiplayer games, user-generated content, and social media in general. Instead of coming up with a storyline for your game, why not just make users pay to play with each other? Instead of paying writers, photographers, and video makers, why not just let them upload their content for free? And with social media, why not just enable users to live vicariously through the fantasy lives of others, while shilling them products that promise to let them join in?

Now computers can deliver against those savings even better — but only for a short while, until people get bored of dealing with poor imitations of fellow humans. We old farts already bailed on multiplayer games, because it's no fun spending my weekly hour of gaming just getting ganked repeatedly by some twelve-year-old who plays all day. Increasingly, I have bailed on UGC networks too: there is far more quantity than quality, and I would rather pay for a small amount of quality than have to sift through the endless quantity.

If the pre-teen players with preternaturally accurate aim are now actually bots, and the AI-enhanced influencers are now actually full-on AIs, those developments are hardly likely to draw me back to the platforms. Any application of AI tech that is simply arbitrage on the cost of humans without factoring in other aspects has a short shelf life.

Taken to its extreme, this trend leads to humans abdicating the web entirely, leaving the field to AIs creating content that will be ranked by other AIs, and with yet more AIs rewarding the next generation of paperclip-maximising content-producing AIs. A bleak future indeed.

So What's Next?

At this point, with the backing of major players like Microsoft and Apple, it seems that AI-enabled products are somewhat inevitable. What we can hope for is that, after some initial over-excitement, we see fewer chatbot psychologists, and more use cases that are concrete, practical, and helpful — to humans.


🖼️ Photos by Andrea De Santis and Charles Deluvio on Unsplash


  1. Also known as RTFM queries, which stands for Read The, ahem, Fine Manual. (We didn't always say "Fine", unless a customer or a manager was listening.) 

  2. We had to share rooms on business trips, leaving me with a wealth of stories, none of which I intend to recount in writing. 

Information Push

My Twitter timeline, like most people's, is awash with people trying out the latest bot-pretending-to-be-human thing, ChatGPT. Everyone is getting worked up about what it can and cannot do, or whether the way it does it (speed-reading the whole of the Internet) exposes it to copyright claims, inevitable bias, or simply polluting the source that it drinks from so that its descendants will no longer be able to be trained from a pool of guaranteed human-generated content, unpolluted by bot-created effluent.

I have a different question, namely: why?

We do not currently suffer from a shortage of low-quality, plausible-seeming information on the Internet; quite the opposite. The problem we have right now is one of too much information, leading to information overload and indigestion. On social media, it has not been possible for years to be a completist (reading every post) or to use a purely linear timeline. We require systems to surface information that is particularly interesting or relevant, whether on an automated algorithmic basis, or by manual curation of lists/circles/spaces/instances.

As is inevitably the case in this fallen world of ours, the solution to one problem inevitably begets new problems, and so it is in this case. Algorithmic personalisation and relevance filtering, whether of a social media timeline or the results of a query, soon leads to the question of: relevant to whom?

Back in the early days of Facebook, if you "liked" the page for your favourite band, you would expect to see their posts in your timeline alerting you of their tour dates or album release. Then Facebook realised that they could charge money for that visibility, so the posts by the band that you had liked would no longer show up in your timeline unless the band paid for them to do so.

In the early days of Google, it was possible to type a query into the search box and get a good result. Then people started gaming the system, triggering an arms race that laid waste to ever greater swathes of the internet as collateral damage.

Keyword stuffing meant that metadata in headers became worthless for cataloguing. Auto-complete will helpfully suggest all sorts of things nobody ever intended to search for. Famously, recipes now have to start with long personal essays to be marked as relevant by the all-powerful algorithm. Automated search results have become so bad that people append "reddit" to their queries to take advantage of human curation.

This development takes us full circle to the early rivalry between automated search engines like Google and human-curated catalogues like Yahoo's. As the scale of the Internet exploded, human curation could not keep up — but now, it’s the quality problem that is outpacing algorithms' ability to keep up. People no longer write for human audiences, but for robotic ones, in the hope of rising to the surface long enough to take advantage of the fifteen minutes of fame that Andy Warhol promised them.

And the best we can think of is to feed the output of all of this striving back into itself.

We are already losing access to information. We are less and less able to control our information intake, as the combination of adtech and opaque relevance algorithms pushes information to us which others have determined that we should consume. In the other direction, our ability to pull or query information we actually desire is restricted or missing entirely. It is all too easy for the controllers of these systems to enable soft censorship, not by deleting information, but simply by making it unsearchable and therefore unfindable. Harbingers of this might be Tumblr's on-again, off-again approach to allowing nudity on that platform, or Huawei phones deleting pictures of protests without the nominal owners of those devices getting any say in the matter.

How do we get out of this mess?

While some are fighting back, like Stack Overflow banning the use of GPT for answers, I am already seeing proposals just to give in and embrace the flood of rubbish information. Instead of trying to prevent students from using ChatGPT to write their homework, the thinking is that we should encourage them to submit their prompts together with the model's output and their own edits and curation of that raw output. Instead of trying to make an Internet that is searchable, we should abandon search entirely and rely on ChatGPT and its ilk to synthesise information for us.

I hate all of these ideas with a passion. I want to go in exactly the opposite direction. I want search boxes to include "I know what I'm doing" mode, with Boolean logic and explicit quote operators that actually work. I do find an algorithmic timeline useful, but I would like to have a (paid) pro mode without trends or ads. And as for homework, simply get the students to talk through their understanding of a topic. When I was in school, the only written tests that required me to write pages of prose were composition exercises; tests of subjects like history involved a verbal examination, in which the teacher would ask me a question and I would be expected to expound on the topic. This approach will remain proof against technological cheating for some while yet.

And once again: why are we building these systems, exactly? People appear to find it amusing to chat to them — but people are very easy to fool. ELIZA could do it without burning millions of dollars of GPU time. There is far more good, valuable text out there already, generated by actual interesting human beings, than I can manage to read. I cannot fathom how anyone can think it a good idea to churn out a whole lot more text that is mediocre and often incorrect — especially because, once again, there is already far too much of that being generated by humans. Automating and accelerating the production of even more textual pablum will not improve life for anyone.

The potential for technological improvement over time is no defence, either. So what if in GPT-4 (or -5 or -6) the text gets somewhat less mediocre and is wrong (or racist) a bit less often? Then what? In what way does the creation and development of GPT improve the lot of humanity? At least Facebook and Google could claim a high ideal (even if neither of them lived up to those ideals, or engaged seriously with their real-world consequences). The entities behind GPT appear to be just as mindless as their creation.


🖼️ Photo by Owen Beard on Unsplash

The Road To Augmented Intelligence

A company called Babylon has been in the news, claiming that its chatbot can pass a standard medical exam with higher scores than most human candidates. Naturally, the medical profession is not overjoyed with this result:

No app or algorithm will be able to do what a GP does.

On the surface, this looks like just the latest front in the ongoing replacement of human professionals with automation. It has been pointed out that supporters of the automation of blue-collar jobs become hypocritically defensive when it looks like their own white-collar jobs may be next in the firing line, and this reaction from the RCGP seems to be par for that course.

For what it’s worth, I don’t think that is what is going on here. As I have written before, automation takes over tasks, not jobs. That earlier wave of automation of blue-collar jobs was enabled by the fact that the jobs in question had already been refined down to single tasks on an assembly line. It was this subdivision which made it practical for machinery to take over those discrete tasks.

Most white-collar jobs are not so neatly subdivided, consisting of many different tasks. Automating away one task should, all things being equal, help people focus on other parts of the job. GPs – General Practitioners – by definition have jobs that encompass many tasks and require significant human empathy. While I do therefore agree with the RCGP that there is no immediate danger to GPs’ jobs, that is not to say there is no impact to jobs from automation; I’d hate to be a travel agent right now, for instance.

Here is a different example, still in the medical field: a neural network is apparently able to identify early signs of tumours in X-ray images. So does that mean there is no role for doctors here either? Well, no; spotting the tumour is just one task for oncologists, and should this technology live up to its promise (as yet unproven), it would become one more tool that doctors could use, removing the bottleneck of reliance on a few overworked X-ray technicians.

Augmenting Human Capabilities

These situations, where some tasks are automated within the context of a wider-scoped job, can be defined as augmented intelligence: AI and machine-learning enabling new capabilities for people, not replacing them.

Augmented intelligence is not a get-out-of-jail-free card, though. There are still impacts from automation, and not just to the X-ray technicians whose jobs might be endangered. Azeem Azhar writes in his essential Exponential View newsletter about a different sort of impact from automation, citing that RCGP piece I linked to earlier:

Babylon’s services were more likely to appeal to the young, healthy, educated and technology-savvy, allowing Babylon to cherry pick low-cost patients, leaving the traditional GPs with more complex, older patients. This is a real concern, if only because older patients often have multiple co-morbidities and are vulnerable in many ways other than their physical health. The nature of health funding in the UK depends, in some ways, on pooling patients of different risks. In other words, that unequal access to technology ends up benefiting the young (and generally more healthy) at the cost of those who aren’t well served by the technology in its present state.

Exponential View has repeatedly flagged the risks of unequal access to technology because these technologies are, whatever you think of them, literally the interface to the resources we need to live in the societies of today and tomorrow.

My rose-tinted view of the future is that making one type of patient cheaper to care for frees up more resources to devote to caring for other patients. On the other hand, I am sure some Pharma Bro 2.0 is even now writing up a business plan for something even worse than Theranos, powered by algorithms and possibly – why not? – some blockchain for good measure.1

Ethical concerns are just some of the many reasons I don’t work in healthcare. As a general rule, IT comes with far fewer moral dilemmas. In IT, in fact, we are actively encouraged to cull the weak and the sick, and indeed to do so on a purely algorithmic basis.

It is, however, extremely important that we don’t forget which domain we are operating in. An error in a medical diagnosis, whether false-positive or false-negative, can have devastating consequences, as can any system which relies on (claims of) infallible technology, such as autonomous vehicles.

A human in the loop can help correct these imbalances, such as when a GP is able to, firstly, interpret the response of the algorithm analysing X-ray images, and secondly, break the news to a patient in a compassionate way. For this type of augmentation to work, though, the process must also be designed correctly. It is not sufficient to have a human sitting in the driver’s seat who is expected to take control at any moment with only seconds’ notice. Systems and processes must be designed in such a way as to take advantage of the capabilities of both participants – humans and machines.

Maybe this is something the machines can also help us with? The image above shows a component as designed by human engineers on the left, side-by-side with versions of the same component designed by neural networks.

What might our companies’ org charts look like if they were subjected to the same process? What about our economies and governments? It would be fascinating for people to use these new technologies to find out.


Photo from EVG Photos via Pexels


  1. One assumes that any would-be emulator of Martin Shkreli has at least learned not to disrespect the Wu-Tang Clan. 

Privacy Versus AI

There is a widespread assumption in tech circles that privacy and (useful) AI are mutually exclusive. Apple is assumed to be behind Amazon and Google in this race because of its choice to do most data processing locally on the phone, instead of uploading users’ private data in bulk to the cloud.

A recent example of this attitude comes courtesy of The Register:

Predicting an eventual upturn in the sagging smartphone market, [Gartner] research director Ranjit Atwal told The Reg that while artificial intelligence has proven key to making phones more useful by removing friction from transactions, AI required more permissive use of data to deliver. An example he cited was Uber "knowing" from your calendar that you needed a lift from the airport.

I really, really resent this assumption that connecting these services requires each and every one of them to have access to everything about me. I might not want information about my upcoming flight shared with Uber – where it could be accessed improperly by someone who then knows I am away from home and can plan a burglary at my house. Instead, I want my phone to know that I have an upcoming flight, and offer to call me an Uber to the airport. At that point, of course I am sharing information with Uber, but I am also getting value out of it. Otherwise, the only one getting value is Uber. They get to see how many people in a particular geographical area received a suggestion to take an Uber and declined it, so they can then target those people with special offers or other marketing to persuade them to use Uber next time they have to get to the airport.
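Here is a sketch of the design I actually want (everything in it is hypothetical; the only point is where the data boundary sits): the calendar never leaves the phone, and the ride-hailing service learns about the trip only if I say yes.

```python
# Hypothetical sketch of on-device processing with consent-gated sharing:
# the calendar stays on the phone; the ride-hailing service only learns about
# the trip if and when the user accepts the suggestion.
from datetime import datetime, timedelta

def upcoming_flight(calendar_events: list[dict]) -> dict | None:
    """Runs locally: find a flight departing in the next 12 hours, if any."""
    now = datetime.now()
    for event in calendar_events:
        if event.get("type") == "flight" and now < event["departs"] < now + timedelta(hours=12):
            return event
    return None

def ask_user(prompt: str) -> bool:
    return input(prompt + " [y/n] ").strip().lower() == "y"   # stand-in for a notification UI

def request_ride(pickup: str, destination: str) -> None:
    print(f"(would call the ride-hailing API: {pickup} -> {destination})")  # hypothetical

def suggest_ride(calendar_events: list[dict]) -> None:
    flight = upcoming_flight(calendar_events)
    if flight is None:
        return
    # Nothing has been shared with anyone yet.
    if ask_user(f"Book a ride to {flight['airport']} for your {flight['departs']:%H:%M} flight?"):
        # Only now does data cross the boundary, and only the minimum needed.
        request_ride(pickup="current location", destination=flight["airport"])

demo_calendar = [{"type": "flight", "airport": "MXP", "departs": datetime.now() + timedelta(hours=3)}]
suggest_ride(demo_calendar)
```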

I might be happy sharing a monthly aggregate of my trips with the government – so many by car, so many on foot, or by bicycle, public transport, or ride sharing service – which they could use for better planning. I would absolutely not be okay with sharing details of every trip in real time, or giving every busybody the right to query my location in real time.

The fact that so much of the debate is taken up with unproductive discussions is what is preventing progress here. I have written about this concept of granular privacy controls before:

The government sets up an IDDB which has all of everyone's information in it; so far, so icky. But here's the thing: set it up so that individuals can grant access to specific data in that DB - such as the address. Instead of telling various credit card companies, utilities, magazine companies, Amazon, and everyone else my new address, I just update it in the IDDB, and bam, those companies' tokens automatically update too - assuming I don't revoke access in the mean time.

This could also be useful for all sorts of other things, like marital status, insurance, healthcare, and so on. Segregated, granular access to the information is the name of the game. Instead of letting government agencies and private companies read all the data, users each get access only to those data they need to do their jobs.
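As a sketch of what that segregated, granular access could look like (all hypothetical; the point is that each company holds a token scoped to specific fields, which always reads the current value and which I can revoke at any time):

```python
# Hypothetical sketch of field-scoped, revocable access grants to a central record.
# Each consumer (credit card company, utility, magazine...) holds a token that
# only unlocks the fields the owner granted, and the owner can revoke it at will.
import secrets

record = {"address": "Via Roma 1, Torino", "marital_status": "married", "dob": "1980-01-01"}
grants: dict[str, set[str]] = {}   # token -> set of fields it may read

def grant(fields: set[str]) -> str:
    token = secrets.token_urlsafe(16)
    grants[token] = fields
    return token

def revoke(token: str) -> None:
    grants.pop(token, None)

def read(token: str, field: str) -> str:
    if field not in grants.get(token, set()):
        raise PermissionError(f"Token not authorised for '{field}'")
    return record[field]   # always reflects the current value in the central record

utility_token = grant({"address"})
print(read(utility_token, "address"))   # works, and stays current if I move house
# read(utility_token, "dob")            # would raise PermissionError
revoke(utility_token)                   # access ends the moment I say so
```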

Unfortunately, we are stuck in a stale all-or-nothing discussion: either you surround yourself with always-on internet-connected microphones and cameras, or you might as well retreat to a shack in the woods. There is a middle ground, and I wish more people (besides Apple) recognised that.


Photo by Kyle Glenn on Unsplash

War of the World Views

There has been this interesting shift going on in coverage of Silicon Valley companies, with increasing scepticism informing what had previously been reliable hero-worshipping. Case in point: this fascinating polemic by John Battelle about the oft-ignored human externalities of "disruption" (scare quotes definitely intended).

Battelle starts from a critique of Amazon Go, the new cashier-less stores Amazon is trialling. I think it’s safe to say that he’s not a fan:

My first take on Amazon Go is this: F*cking A, do we really want eggplants and cuts of meat reduced to parameterized choices spit onto algorithmized shelves? Ick. I like the human confidence I get when a butcher considers a particular rib eye, then explains the best way to cook that one cut of meat. Sure, technology could probably deliver me a defensibly "better" steak, perhaps even one tailored to my preferences as expressed through reams of data collected through means I’ll probably never understand.
But come on.
Sometimes you just want to look a guy in the eye and sense, at that moment, that THIS rib eye is perfect for ME, because I trust that butcher across the counter. We don’t need meat informed by data and butchered by bloodless algorithms. We want our steak with a side of humanity. We lose that, we lose our own narrative.

Battelle then goes on to extrapolate that "ick" out to a critique of the whole Silicon Valley model:

It’s this question that dogs me as I think about how Facebook comports itself : We know what’s best for you, better than you do in fact, so trust us, we’ll roll the code, you consume what we put in front of you.
But… all interactions of humanity should not be seen as a decision tree waiting to be modeled, as data sets that can be scanned for patterns to inform algorithms.

Cut Down The Decision Tree For Firewood

I do think there is some merit to this critique. Charlie Stross has previously characterised corporations as immortal hive organisms which pursue the three corporate objectives of growth, profitability, and pain avoidance:

We are now living in a global state that has been structured for the benefit of non-human entities with non-human goals. They have enormous media reach, which they use to distract attention from threats to their own survival. They also have an enormous ability to support litigation against public participation, except in the very limited circumstances where such action is forbidden. Individual atomized humans are thus either co-opted by these entities (you can live very nicely as a CEO or a politician, as long as you don't bite the feeding hand) or steamrollered if they try to resist.
In short, we are living in the aftermath of an alien invasion.

These alien beings do not quite understand our human reactions and relations, and they try to pin them down and quantify them in their models. Searching for understanding through modelling is value-neutral in general, but problems start to appear when the model is taken as authoritative, with any real-life deviation from the model considered as an error to be rectified – by correcting the real-life discrepancy.

Fred Turner describes the echo chamber these corporations inhabit, and the circular reasoning it leads to, in this interview:

About ten years back, I spent a lot of time inside Google. What I saw there was an interesting loop. It started with, "Don't be evil." So then the question became, "Okay, what's good?" Well, information is good. Information empowers people. So providing information is good. Okay, great. Who provides information? Oh, right: Google provides information. So you end up in this loop where what's good for people is what's good for Google, and vice versa. And that is a challenging space to live in.

We all live in Google’s space, and it can indeed be challenging, especially if you disagree with Google about how information should be gathered and disseminated. We are all grist for its mighty Algorithm.

This presumption of infallibility on the part of the Algorithm, and of the world view that it implements, is dangerous, as I have written before. Machines simply do not see the world as we do. Building our entire financial and governance systems around them risks some very unwelcome consequences.

But What About The Supermarket?

Back to Battelle for a moment, zooming back in on Amazon and its supermarket efforts:

But as they pursue the crack cocaine of capitalism — unmitigated growth — are technology platforms pushing into markets where perhaps they simply don’t belong? When a tech startup called Bodega launched with a business plan nearly identical to Amazon’s, it was laughed off the pages of TechCrunch. Why do we accept the same idea from Amazon? Because Amazon can actually pull it off?

The simple answer is that Bodega falls into the uncanny valley of AI assistance, trying to mimic a human interaction instead of embracing its new medium. A smart vending machine that learns what to stock? That has value - for the sorts of products that people like to buy from vending machines.

This is Amazon’s home turf, where the Everything Store got its start, shipping the ultimate undifferentiated good. A book is a book is a book; it doesn’t really get any less fresh, at least not once it has undergone its metamorphosis from newborn hardback to long-lived paperback.

In this context, nappies/diapers1 or bottled water are a perfect fit, and something that Amazon Prime has already been selling for a long time, albeit at a larger remove. Witness those ridiculous Dash buttons, those single-purpose IoT devices that you can place around your home so that when you see you’re low on laundry powder or toilet paper you can press the button and the product will appear miraculously on your next Amazon order.

Steaks or fresh vegetables are a different story entirely. I have yet to see the combination of sensors and algorithms that can figure out that a) these avocados are close to over-ripe, but b) that’s okay because I need them for guacamole tonight, or c) these bananas are too green to eat any time soon, and d) that’s exactly what I need because they’re for the kids’ after-school snack all next week.

People Curate, Algorithms Deliver

Why get rid of the produce guy in the first place?

Why indeed? But why make me deal with a guy for my bottled water?2

I already do cashier-less shopping; I use a hand-held scanner, scan products as I go, and swipe my credit card (or these days, my phone) on my way out. The interaction with the cashier was not the valuable one. The valuable interaction was with the people behind the various counters - fish, meat, deli - who really were, and still are, giving me personalised service. If I want even more personalised service, I go to the actual greengrocer, where the staff all know me and my kids, and will actively recommend produce for us and our tastes.

All of that personalisation would be overkill, though, if all I needed were to stock up on kitchen rolls, bottled milk, and breakfast cereal. These are routine, undifferentiated transactions, and the more human effort we can remove from those, the better. Interactions with humans are costly activities, in time (that I spend dealing with a person instead of just taking a product off the shelf) and in money (someone has to pay that person’s salary, healthcare, taxes, and so on). They should be reserved for situations where there is a proportionate payoff: the assurance that my avos will be ripe, my cut of beef will be right for the dish I am making, and my kids’ bananas will not have gone off by the time they are ready to eat them.

We are cyborgs, every day a little bit more: humans augmented by machine intelligence, with new abilities that we are only just learning to deal with. The idea of a cashier-less supermarket does not worry me that much. In fact, I suspect that if anything, by taking the friction out of shopping for undifferentiated goods, we will actually create more demand for, and appreciation of, the sort of "curated" (sorry) experience that only human experts can provide.


Photos by Julian Hanslmaier and Anurag Arora on Unsplash


  1. Delete as appropriate, depending on which side of the Atlantic you learned your English. 

  2. I like my water carbonated, so sue me. I recycle the plastic bottles, if that helps. Sometimes I even refill them from the municipal carbonated-water taps. No, I’m not even kidding; those are a thing around here (link in Italian). 

ML Joke

A machine learning algorithm walked into a bar.

The bartender asked, "What would you like to drink?"

The algorithm replied, "What’s everyone else having?"