Printing Money

I spent more time than I should have yesterday installing my mother-in-law’s new HP printer, and while I dodged the more obvious scams, I was actually shocked at how bad the experience was. There is absolutely no way that a normal person without significant IT experience could do it. And the worst part is that HP are in my experience the best — okay, least bad — printer manufacturer out there.

I'm going to document what happened in exhaustive detail because I still can't bring myself to believe some of what happened. It's not going to be a fun post. Sorry. If you want a fun post about how terrible printers are, here's one from The Oatmeal.

  • The "quick start" guide only showed the physical steps (remove packaging, connect power cord, add paper) and then offered a QR code to scan to deploy an HP app that would supposedly take care of the rest of the process.
  • The QR code led to a URL that 404'd. In retrospect, this was the moment when I should have packed everything back up and shipped it back to HP.
  • Instead of following through on that much better plan and saving myself several hits to my sanity, some detective work helped me to identify what the app should be and find it in the Google Play Store (my MiL's computer is a Chromebook; this will be significant later).
  • The app's "install new printer" workflow simply scans the local network for printers. Since the step I was trying to accomplish was connecting the printer to Wi-Fi (this model doesn't have an on-board control panel, only an embedded web server), this scan was not particularly helpful.
  • The app's next suggestion was to contact support. Thanks, app.
  • After having checked the box for any additional docs, and finding only reams of pointless legal paperwork documenting the printer's compliance with various standards and treaties, I gingerly loaded up the HP web site to search for something more detailed.
  • The HP website's search function resolutely denied all knowledge of the printer model.
  • A Google search scoped to the HP web site found the printer's product page, which included an actual manual.
  • The manual asked me to connect to the printer's management interface, but at no point included a step-by-step process. By piecing together various bits of information from the doc and some frantic Googling, I finally worked out that I needed to:
    • Connect to the printer's own ad-hoc Wi-Fi network;
    • Print a test page to get its IP address (this step involves holding down the paper feed button for 10 seconds);
    • Connect to that IP address;
    • Reassure the web browser that it's fine to connect to a website that is INSECURE!!1!
    • Not find the menu options from the doc, only some basic information about supplies;
    • Panic;
    • Note a tiny "Login" link hidden away in a corner;
    • Mutter "surely not…"
    • Fail to find any user credentials documented anywhere, or indeed any mention of a login flow;
    • Connect as "admin" with no password on a hunch;
    • Access the full management interface.
  • At this point I was finally able to authenticate the printer to the correct Wi-Fi network, at which point it promptly rebooted and then went catatonic for a worryingly long time before finally connecting.
  • But we're not done yet! The HP printer app claims to be able to set up the local printer on the Chromebook, but as far as I can tell, it doesn't even attempt to do this. However, we have a network connection, I can read out supply levels and what-not, how hard can this be?
  • Despite having Google Cloud Print enabled, nothing was auto-detected, so I created it as IPP (amazingly, this step is actually in the docs).
  • Time for a test print! The Chromebook's print queue showed the doc as PRINTED, but the printer didn’t produce anything, and as far as I could determine, it never hit the printer's own queue.
  • Hang head in hands.
  • Verified that my iPhone can see the printer (via AirPrint) and print to it. This worked first time.
  • Tried deleting the printer and re-creating it; somehow Google Cloud Print started working at this point, so the printer was auto-detected? The resulting config looked identical to what I created by hand, except with a port number specified instead of just an IP address.
  • Does it print now? HAHAHA of course not.
  • Repeat previous few steps with increasing muttering (can't swear or throw things because I am in my mother-in-law's home).
  • Decide to update software:
    • The Chromebook updates, reboots, no change.
    • The printer's product page does not show any firmware at all — unless you tell it you are looking for Windows software. There are official drivers for various Linux distros, but apparently they don't deserve firmware. There is nothing for macOS, because Apple wisely doesn't allow rando third-party printer drivers anywhere near their operating systems. And of course nothing for ChromeOS or "other", why would you ask?
    • Download the firmware from the Windows driver page, upload it to the printer's management UI — which quoth "firmware not valid".
    • Search for any checksum or other way to verify the download, and of course there is none.
    • Attempt to decode the version embedded in the file name, discover that it is almost impossible to persuade ChromeOS to display a file name that long.
    • Eventually decide that the installed and downloaded versions are probably the same, despite the installed one being over a year old.
  • Give up and run away, promising to return with new ideas, or possibly a can of petrol and a Zippo.
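
For anyone retracing these steps, one cheap sanity check is whether the printer is listening for IPP at all, and what the full address looks like once the port is spelled out. The sketch below is only an illustration under stated assumptions: 631 is the registered IPP port, but the "/ipp/print" path is a common convention rather than anything taken from HP's manual, and the helper names are made up for this example.

```python
import socket

IPP_DEFAULT_PORT = 631  # registered port for the Internet Printing Protocol


def ipp_uri(host, port=IPP_DEFAULT_PORT, path="/ipp/print"):
    """Build an explicit IPP URI. The default path is an assumption, not from HP's docs."""
    return f"ipp://{host}:{port}{path}"


def port_is_open(host, port=IPP_DEFAULT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: all count as "not reachable".
        return False
```

Calling `port_is_open` with the IP address from the printer's test page shows whether anything answers on port 631. An open port only proves that something is listening; it says nothing about whether a job sent there will actually be printed.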

Good Robot

Last time I wrote about ChatGPT, I was pretty negative. Was I too harsh?

The reason I was so negative is that many of the early demos of ChatGPT focus on feats that are technically impressive ("write me a story about a spaceman in the style of Faulkner" or whatever), but whose actual application is at best unclear. What, after all, is the business model? Who will pay for a somewhat stilted story written by a bot, at least once the novelty value wears off? Actual human writers are, by and large, not exactly rolling in piles of dollars, so it's not as if there is a huge profit opportunity awaiting the first clever disrupter — quite apart from the moral consequences of putting a bunch of humans out of a job, even an ill-paying one.

Instead, I wanted to think about some more useful and positive applications of this technology, ones which also have the advantage that they are either not being done at all today, or can only be done at vast expense and not at scale or in real time. Bonus points if they avoid being actively abusive or enabling ridiculous grifts and rent-seeking. After all, with Microsoft putting increasing weight behind OpenAI, it's obvious that smart people smell money here somewhere.

Summarise Information (B2C)

It's more or less mandatory for new technology to come with a link to some beloved piece of SF. For once, this is not a Torment Nexus-style dystopia. Instead, I'm going right to the source, with Papa Bill's Neuromancer:

"Panther Moderns," he said to the Hosaka, removing the trodes. "Five minute precis."
"Ready," the computer said.

Here's a service that everyone wants, as evidenced by the success of the "five-minute explainer" format. Something hits your personal filter bubble, and you can tell there is a lot of back story; battle lines are already drawn up, people are several levels deep into their feuds and meta-positioning, and all you want is a quick recap. Just the facts, ma'am, all sorts of multimedia, with a unifying voiceover, and no more than five minutes.

There are also more business-oriented use cases for this sort of meta-textual analysis, such as "compare this quarter's results with last quarter's and YoY, with trend lines based on close competitors and on the wider sector". You could even link with Midjourney or Stable Diffusion to graph the results (without having to do all the laborious cutting and pasting to get the relevant numbers into a table first, and making sure they use the same units, currencies, and time periods).

Smarter Assistants (B2C)

One of the complaints that people have about voice assistants is that they appear to have all the contextual awareness of goldfish. Sure, you can go to a certain amount of effort to get Siri, Alexa, and their ilk to understand "my wife" without having to use the long-suffering woman's full name and surname on each invocation, but they still have all the continuity of an amnesiac hamster if you try to continue a conversation after the first interaction. Seriously, babies have a far better idea of object persistence (peekaboo!). The robots simply have no way of keeping context between statements, outside of a few hard-coded showcase examples.

Instead, what we want is precisely that continuity: asking for appointments, being read a list, and then asking to "move the first one to after my gym class, but leave me enough time to shower and get over there". This is the sort of use case that explains why Microsoft is investing so heavily here: they are so far behind otherwise that why not? Supposedly Google has had this tech for a while and just couldn't figure out a way to introduce it without disrupting its cash-cow search business. And Apple never talks about future product directions until they are ready to launch (with the weird exception of Project Titan, of course), so it may be that they are already on top of this one. Certainly it was almost suspicious how quickly Apple trotted out specific support for Stable Diffusion.

Tier Zero Support (B2B)

Back in the day, I used to work in tech support. The classic division of labour in that world goes something like this:

  • Tier One, aka "the phone firewall": people who answer telephone or email queries directly. Most questions should be solved at this level.
  • Tier Two: these are more expert people, who can help with problems which cannot be resolved quickly at Tier One. Usually customers can’t contact Tier Two directly; their issues have to be escalated there. You don't want too many issues to get to this level, because it gets expensive.
  • Tier Three: in software organisations, these are usually the actual engineers working on the product. If you get to Tier Three, your problem is so structural, or your enhancement request is sufficiently critical, that it's no longer a question of helping you to do something or fixing an issue, but changing the actual functioning of the product in a pretty major way.

Obviously, there are increasing costs at each level. A problem getting escalated to Tier Two means burning the time of more senior and expert employees, who are ipso facto more expensive. Getting to Tier Three not only compounds the monetary cost, but also adds opportunity costs: what else are those engineers not doing, while they work on this issue? Therefore, tech support is all about making sure problems get solved at the lowest possible tier of the organisation. This focus has the happy side-effect of addressing the issue faster, and with fewer communications round-trips, which makes users happier too.

It's a classic win-win scenario — so why not make it even better? That's what the Powers That Be decided to do where I was. They added a "Tier Zero" of support, that was outsourced (to humans), with the idea that they would address the huge proportion of queries that could be answered simply by referring to the knowledge base[1].

So how did this go? Well, it was such a disaster that my notoriously tight-fisted employers[2] ended up paying to get out of the contract early. But could AI do better?

In theory, this is not a terrible idea. Something like ChatGPT should be able to answer questions based on a specific knowledge base, including past interactions with the bot. Feed it product docs, FAQs, and forum posts, and you get a reasonable approximation of a junior support engineer. Just make sure you have a way for a user to pull the rip-cord and get escalated to a human engineer when the bot gets stuck, and why not?
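
The shape of that flow (answer from the knowledge base when there is a good match, hand over to a human otherwise) can be sketched in a few lines. To be clear about the assumptions: the knowledge base entries are invented, and a crude word-overlap lookup stands in for the language model, so this shows only the control flow and the rip-cord, not how a real bot would work.

```python
import re

# Hypothetical knowledge base: article title -> body text.
KNOWLEDGE_BASE = {
    "Reset your password": "Open Settings, choose Security, then select Reset password.",
    "Install the agent": "Download the installer, run it as administrator, then restart.",
}

ESCALATE = "ESCALATE_TO_HUMAN"


def tokens(text):
    """Lower-case a string and return its words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def tier_zero(question, min_overlap=2):
    """Return the best-matching article body, or ESCALATE when nothing matches well."""
    asked = tokens(question)
    if "human" in asked:  # the rip-cord: the user asks for a person
        return ESCALATE
    best_title, best_score = None, 0
    for title, body in KNOWLEDGE_BASE.items():
        # Word overlap is a stand-in for real retrieval, chosen only for brevity.
        score = len(asked & tokens(title + " " + body))
        if score > best_score:
            best_title, best_score = title, score
    if best_title is None or best_score < min_overlap:
        return ESCALATE
    return KNOWLEDGE_BASE[best_title]
```

The design choice worth noting is the `min_overlap` threshold: when the match is weak, the function escalates instead of answering, which is the cheap defence against plausible-but-wrong replies.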

One word of caution: the way I moved out of tech support is that I would not only answer the immediate question from a customer, but I would go find the account manager afterwards and tell them their customer needed consulting, or training, or more licenses, or whatever it was. AI might not have the initiative to do that.

Another drawback: it's hard enough to give advice in a technical context, but at least there, a command will either execute or not; it will give the expected results, or not (and even then, there may be subtle bugs that only manifest over time). Some have already seized on other domains that feature lots of repetitive text as opportunities for ChatGPT. Examples include legal contracts, and tax or medical advice — but what about plausible-but-wrong answers? If your chatbot tells me to cure my cancer with cleanses and raw vegetables, can I (or my estate) sue you for medical malpractice? If your investor agreement includes a logic bug that exposes you to unlimited liability, do you have the right to refuse to pay out? Fun times ahead for all concerned.

Formulaic Text (B2B)

Another idea for automated text generation is to come up with infinite variations on known original text. In plain language, I am talking about A/B testing website copy in real time, rewriting it over and over to entice users to stick around, interact, and with any luck, generate revenue for the website operators.

Taken to the extreme, you get the evil version, tied in with adtech surveillance to tweak the text for each individual visitor, such that nobody ever sees the same website as anyone else. Great for plausible deniability, too, naturally: "of course we would never encourage self-harm — but maybe our bot responded to something in the user's own profile…".

This is the promise of personalised advertising, that is tweaked to be specifically relevant to each individual user. I am and remain sceptical of the data-driven approach to advertising; the most potent targeted ads that I see are the same examples of brand advertising that would have worked equally well a hundred years ago. I read Monocle, I see an ad for socks, I want those socks. You show me a pop-up ad for socks while I am trying to read something unrelated, I dismiss it so fast that I don't even register that it's trying to sell me socks. It's not clear to me that increasing smarts behind the adtech will change the parameters of that equation significantly.

De-valuing Human Labour

These are the use cases that seem to me to be plausible and defensible. There will be others that have a shorter shelf life, as illustrated in Market For Lemons:

What happens when every online open lobby multiplayer game is choked with cheaters who all play at superhuman levels in increasingly undetectable ways?

What happens when, from the perspective of the average guy, "every girl" on every dating app is a fiction driven by an AI who strings him along (including sending original and persona-consistent pictures) until it's time to scam money out of him?

What happens when, from the perspective of the average girl, "every guy" on the internet has become weirdly dismissive and hostile, because he's been conditioned to think that any girl that seems interested in him must be fake and trying to scam money out of him?

What happens when comments sections on every forum gets filled with implausibly large consensus-building hordes who are able to adapt in real time and carefully slip their brigading just below the moderator's rules?

What these AI-enabled "growth hacks" boil down to is taking advantage of a market that has already outsourced labour and creativity to (human) non-employees: multiplayer games, user-generated content, and social media in general. Instead of coming up with a storyline for your game, why not just make users pay to play with each other? Instead of paying writers, photographers, and video makers, why not just let them upload their content for free? And with social media, why not just enable users to live vicariously through the fantasy lives of others, while shilling them products that promise to let them join in?

Now computers can deliver against those savings even better — but only for a short while, until people get bored of dealing with poor imitations of fellow humans. We old farts already bailed on multiplayer games, because it's no fun spending my weekly hour of gaming just getting ganked repeatedly by some twelve-year-old who plays all day. Increasingly, I bailed on UGC networks: there is far more quantity than quality, and I would rather pay for a small amount of quality than have to sift through the endless quantity.

If the pre-teen players with preternaturally accurate aim are now actually bots, and the AI-enhanced influencers are now actually full-on AIs, those developments are hardly likely to draw me back to the platforms. Any application of AI tech that is simply arbitrage on the cost of humans without factoring in other aspects has a short shelf life.

Taken to its extreme, this trend leads to humans abdicating the web entirely, leaving the field to AIs creating content that will be ranked by other AIs, and with yet more AIs rewarding the next generation of paperclip-maximising content-producing AIs. A bleak future indeed.

So What's Next?

At this point, with the backing of major players like Microsoft and Apple, it seems that AI-enabled products are somewhat inevitable. What we can hope for is that, after some initial over-excitement, we see fewer chatbot psychologists, and more use cases that are concrete, practical, and helpful — to humans.


🖼️ Photos by Andrea De Santis and Charles Deluvio on Unsplash


  1. Also known as RTFM queries, which stands for Read The, ahem, Fine Manual. (We didn't always say "Fine", unless a customer or a manager was listening.) 

  2. We had to share rooms on business trips, leaving me with a wealth of stories, none of which I intend to recount in writing. 

Disappearing In The Hills

I have a bunch of stuff I need to talk to people in the US about, but I had forgotten that today was MLK Day, so everything will have to wait one more day. I had an early call with a startup in the Middle East I am consulting with, but then found myself at 10am with an empty schedule for the day — so why not hop on the bike and disappear up into the hills until lunch time?

I had been hoping to climb out of the fog, but it was persistent until I got quite high up — and then I found that it was grey and overcast above the fog anyway. The views were very atmospheric, though, including this Brigadoon-like situation with a village appearing and disappearing amid the shifting billows.

Any day on the bike is a good day, though. I am in a bit of a holding pattern, waiting to get started on a number of projects, so a long(ish — 60km) ride is great for keeping myself from fretting.

I was happy that my legs were cooperating, too, since I was also out on the bike on Sunday, for a group ride on the other side of the Po. This is a part of the world I have never visited, even though it’s only a half-hour drive from me; I just pass through on the train or motorway en route to Milan. It’s a different vibe over there, but they have some fun trails, and it was a good day out — chilly, overcast, and muddy in spots, but at least the forecast rain stayed off.

I especially liked the souvenir for the day — way better than yet another T-shirt!

There are no photos up on the event page yet, and being in a group, I didn’t want to stop to take pictures — but at least I have this wine bottle to remind me…

Pilgrimage

Today’s ride was along part of the old pilgrim route from England through France to Rome, the Via Francigena. There is still a ferry crossing for the use of pilgrims at the Guado di Sigerico — and yes, Italian speakers will be jumping up and down at this point because "guado" means ford, but there is no ford there in modern times, just the ferry. Sigeric himself was an Archbishop of Canterbury who made the pilgrimage down to Rome for his investiture, and someone in his party documented the return leg.

More than a thousand years after Sigeric, it’s still quite common to meet pilgrims walking or cycling the route; it’s also part of the Europe-wide Eurovelo cycling network, as route EV5, appropriately enough named Via Romea Francigena. Sensible pilgrims however avoid setting out at the fag-end of the year, with only single-digit temperatures and overcast skies to look forward to, so today I had the road to myself.

There are still a number of chapels along the route around here, although I suspect they were more for the benefit of farm-workers rather than pilgrims; those tend to stop in towns and cities, just as they always did. Since the advent of mechanisation in farming, most of these field-side chapels are in poor repair. There are no longer armies of farm workers to gather for celebrations in the fields, just the odd tractor — and not even any of those at this time of year.

This particular chapel looks structurally sound from a distance, but as you approach the door, you realise that there is quite a lot of light making its way inside — more than those tiny grated windows could explain.

Sure enough, the roof fell in at some point. On the plus side, this means we can see the remains of the interior frescos a little better.

This is not the only lonely chapel I found today. Not needing the ferry across the Po, I left the pilgrim route at the mouth of the Tidone river and joined the Sentiero del Tidone, which runs along the banks of the eponymous river all the way from its source up in the Apennines down to its confluence into the Po, near the Guado di Sigerico.

This ruined farm-house had a little chapel beside it that someone had gone to some effort to clean up and revive.

This site is a little closer to a main road and still-occupied farms and hamlets, which maybe gives it just enough passing traffic to hang on as a just-about going concern? Or maybe the maintenance of this half-ruined chapel is one person’s project, giving the old building one last lease on life.

Of course on a late-December day these ruins could not help but look sad and brooding. They did not make the same melancholy impression on me when I last came through here in September, with a background that was green, growing, and sunlit, rather than muddy ploughed fields under lowering clouds.

Anyway, I got my miles in, and some thoughts out of my head, so I’m happy with that. Any ride is a good ride!

Information Push

My Twitter timeline, like most people's, is awash with people trying out the latest bot-pretending-to-be-human thing, ChatGPT. Everyone is getting worked up about what it can and cannot do, or whether the way it does it (speed-reading the whole of the Internet) exposes it to copyright claims, inevitable bias, or simply polluting the source that it drinks from so that its descendants will no longer be able to be trained from a pool of guaranteed human-generated content, unpolluted by bot-created effluent.

I have a different question, namely: why?

We do not currently have a problem of lack of low-quality plausible-seeming information on the Internet; quite the opposite. The problem we have right now is one of too much information, leading to information overload and indigestion. On social media, it has not been possible for years to be a completist (reading every post) or to use a purely linear timeline. We require systems to surface information that is particularly interesting or relevant, whether on an automated algorithmic basis, or by manual curation of lists/circles/spaces/instances.

As is inevitably the case in this fallen world of ours, the solution to one problem begets new problems, and so it is in this case. Algorithmic personalisation and relevance filtering, whether of a social media timeline or the results of a query, soon lead to the question: relevant to whom?

Back in the early days of Facebook, if you "liked" the page for your favourite band, you would expect to see their posts in your timeline alerting you of their tour dates or album release. Then Facebook realised that they could charge money for that visibility, so the posts by the band that you had liked would no longer show up in your timeline unless the band paid for them to do so.

In the early days of Google, it was possible to type a query into the search box and get a good result. Then people started gaming the system, triggering an arms race that laid waste to ever greater swathes of the internet as collateral damage.

Keyword stuffing meant that metadata in headers became worthless for cataloguing. Auto-complete will helpfully suggest all sorts of things. Famously, recipes now have to start with long personal essays to be marked as relevant by the all-powerful algorithm. Automated search results have become so bad that people append "reddit" to their queries to take advantage of human curation.

This development takes us full circle to the early rivalry between automated search engines like Google and human-curated catalogues like Yahoo's. As the scale of the Internet exploded, human curation could not keep up — but now, it’s the quality problem that is outpacing algorithms' ability to keep up. People no longer write for human audiences, but for robotic ones, in the hope of rising to the surface long enough to take advantage of the fifteen minutes of fame that Andy Warhol promised them.

And the best we can think of is to feed the output of all of this striving back into itself.

We are already losing access to information. We are less and less able to control our information intake, as the combination of adtech and opaque relevance algorithms pushes information to us which others have determined that we should consume. In the other direction, our ability to pull or query information we actually desire is restricted or missing entirely. It is all too easy for the controllers of these systems to enable soft censorship, not by deleting information, but simply by making it unsearchable and therefore unfindable. Harbingers of this approach might be Tumblr's on-again, off-again approach to allowing nudity on that platform, or Huawei phones deleting pictures of protests without the nominal owners of those devices getting any say in the matter.

How do we get out of this mess?

While some are fighting back, like Stack Overflow banning the use of GPT for answers, I am already seeing proposals just to give in and embrace the flood of rubbish information. Instead of trying to prevent students from using ChatGPT to write their homework, the thinking is that we should encourage them to submit their prompts together with the model's output and their own edits and curation of that raw output. Instead of trying to make an Internet that is searchable, we should abandon search entirely and rely on ChatGPT and its ilk to synthesise information for us.

I hate all of these ideas with a passion. I want to go in exactly the opposite direction. I want search boxes to include "I know what I'm doing" mode, with Boolean logic and explicit quote operators that actually work. I do find an algorithmic timeline useful, but I would like to have a (paid) pro mode without trends or ads. And as for homework, simply get the students to talk through their understanding of a topic. When I was in school, the only written tests that required me to write pages of prose were composition exercises; tests of subjects like history involved a verbal examination, in which the teacher would ask me a question and I would be expected to expound on the topic. This approach will remain proof against technological cheating for some while yet.
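
As a rough illustration of what an "I know what I'm doing" mode could mean, here is a minimal sketch of strict matching. The query syntax is an assumption made for this example: every term is required (implicit AND), a quoted phrase must appear verbatim, and a leading minus sign excludes a term. Matching is by plain substring, which is a deliberate simplification.

```python
import shlex


def parse_query(query):
    """Split a query into required and excluded terms; quoted phrases stay whole."""
    required, excluded = [], []
    # shlex handles the double quotes; a stray apostrophe in a query would need escaping.
    for term in shlex.split(query.lower()):
        if term.startswith("-") and len(term) > 1:
            excluded.append(term[1:])
        else:
            required.append(term)
    return required, excluded


def matches(document, query):
    """True only if every required term appears and no excluded term does."""
    text = document.lower()
    required, excluded = parse_query(query)
    return all(t in text for t in required) and not any(t in text for t in excluded)


def search(documents, query):
    """Return the documents that satisfy the query, in their original order."""
    return [d for d in documents if matches(d, query)]
```

With this sketch, a query such as `"printer setup" chromebook -windows` returns only documents that contain the exact phrase and the word "chromebook" and never mention "windows". Nothing is silently dropped, stemmed, or "helpfully" reinterpreted.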

And once again: why are we building these systems, exactly? People appear to find it amusing to chat to them — but people are very easy to fool. ELIZA could do it without burning millions of dollars of GPU time. There is far more good, valuable text out there already, generated by actual interesting human beings, than I can manage to read. I cannot fathom how anyone can think it a good idea to churn out a whole lot more text that is mediocre and often incorrect — especially because, once again, there is already far too much of that being generated by humans. Automating and accelerating the production of even more textual pablum will not improve life for anyone.

The potential for technological improvement over time is no defence, either. So what if in GPT-4 (or -5 or -6) the text gets somewhat less mediocre and is wrong (or racist) a bit less often? Then what? In what way does the creation and development of GPT improve the lot of humanity? At least Facebook and Google could claim a high ideal (even if neither of them lived up to those ideals, or engaged seriously with their real-world consequences). The entities behind GPT appear to be just as mindless as their creation.


🖼️ Photo by Owen Beard on Unsplash

AWS re:Invent 2022

At this time of year, with the nights drawing in, thoughts turn inevitably to… AWS' annual Las Vegas extravaganza, re:Invent. This year I'm attending remotely again, like it's 2020 or something, which is probably better for my liver, although I am definitely feeling the FOMO.

Day One: Adam Selipsky Keynote

I skipped Monday Night Live due to time zones, but as usual, this first big rock on the re:Invent calendar is a barrage of technical updates, with few hints of broader strategy. That sort of thing comes in the big Tuesday morning keynote with Adam Selipsky.

Last year was his first time taking over after Andy Jassy's ascension to running the whole of Amazon, not just AWS. This year’s delivery was more polished, plus it looks like we have seen the last of the re:Invent House Band. Adam Selipsky himself though was still playing the classics, talking up the benefits of cloud computing for cost savings and using examples such as Carrier or Airbnb to allude to companies' desire to be agile with fewer resources.

Still, it's a bit of a double-take to hear AWS still talking about cloud migration in 2022 — even if, elsewhere in Vegas, there was a memorable endorsement of migration to the cloud from Ukraine's Minister for Digital Transformation. Few AWS customers have to contend with the sorts of stress and time pressure that Mykhailo Fedorov did!

In the keynote, the focus was mostly on exhortations to continue investing in the cloud. I didn't see Andy Jassy's signature move of presenting a slide that shows cloud penetration as still being a tiny proportion of the market, but that was definitely the spirit: no reason to slow down, despite economic headwinds; there's lots more to do.

Murdering the Metaphors

We then got to the first of various metaphors that would be laboriously and at length tortured to breaking point and beyond. The first was space exploration, and admittedly there were some very pretty visuals to go with the point being belaboured: namely, that just like images captured in different wavelengths show different data to astronomers, different techniques used to explore data can deliver additional results.

There were some good customer examples in this segment: Expedia Group making 600B predictions on 70 Petabytes of data, and Pinterest storing 1 Exabyte of data on S3[1]. That sort of scale is admittedly impressive, but this was the first hint that the tempo of this presentation would be slower, with a worse ratio of content to time than we had been used to in the Jassy years.

Tools, Integration, Governance, Insights

This led to a segment on the right tools, integration, and governance for working with data, and the insights that would be possible. The variety of tools is something I had focused on in my report from re:Invent 2021, in which I called out AWS' "one database engine for each use case" approach and questioned whether this was what developers actually wanted.

Initially, it seemed that we were getting more of the same, with Amazon Aurora getting top billing. The metrics in particular were very much down in the weeds, mentioning that Aurora offered 1/10 the cost of commercial DBMS, while also having up to 3x performance of PostgreSQL and 5x the performance of MySQL[2].

We then heard about how customers also need analytics tools such as EMR, MSK, and Redshift, not just transactional ones. Redshift was pitched for high performance on structured data, with 5x better price performance than "other cloud data warehouses" (a not-particularly-veiled dig at Snowflake, here — more of a Jassy move, I felt).

The big announcement in this section was OpenSearch Serverless. This launch means that AWS offers serverless options for all of its analytics services. According to Selipsky, "no-one else can say that". However, it is worth checking the fine print. In common with many "serverless" offerings, OpenSearch Serverless has a minimum spend of 4 OCUs — or roughly $700 per month in real money. Scaling to zero is a key requirement and expectation of serverless, so it is disappointing to see so many offerings like this one that devolve to elastic scalability on top of a fixed base. Valuable, to be sure, but not quite so revolutionary.
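As a back-of-the-envelope check on that minimum, here is a small sketch of the arithmetic. The hourly rate per OCU and the 30-day month are my assumptions for illustration, not figures from the keynote, so check the current price list for your region before relying on them.

```python
# Rough monthly floor for a service billed per capacity unit per hour.
# ASSUMPTIONS (not from the keynote): the hourly rate and a 30-day month.
ASSUMED_USD_PER_OCU_HOUR = 0.24  # hypothetical rate; check current pricing
MIN_OCUS = 4                     # minimum capacity mentioned above
HOURS_PER_MONTH = 24 * 30


def monthly_floor(ocus: int, usd_per_ocu_hour: float,
                  hours: int = HOURS_PER_MONTH) -> float:
    """Cost of keeping `ocus` units running for a whole month with no traffic."""
    return ocus * usd_per_ocu_hour * hours


if __name__ == "__main__":
    floor = monthly_floor(MIN_OCUS, ASSUMED_USD_PER_OCU_HOUR)
    print(f"Idle cost per month: ${floor:,.2f}")
```

With the assumed rate, the floor lands close to the $700 figure, and it is paid whether or not a single query is run. A service that truly scaled to zero would return 0 for an idle month.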

ETL Phone Home

Then things got interesting.

Adam Selipsky gave the example of a retail company running its operations on DynamoDB and Aurora and needing to move data to Redshift for analysis. This is exactly the sort of situation I decried in last year's report for The New Stack: too many single-purpose databases, leaving users trying to copy data back and forth, with the attendant risk of loss of control over their data.

It seems that AWS product managers had been hearing the same feedback that I had, but instead of committing to one general-purpose database, they are doubling down on their best-of-breed approach. To connect the pieces, they enabled federated query in Redshift and Athena to query other services — including third-party ones.

The big announcement was zero-ETL integration between Aurora and Redshift. This was advertised as being "near real time", with latency measured in seconds — good enough for most use cases, although something to be aware of for more demanding situations. The integration also works with multiple Aurora instances all feeding into one Redshift instance, which is what you want. Finally, the integration was advertised as being "all serverless", scaling up and down in response to data volume.

Take Back Control

So that's the integration — but that only addresses questions of technical complexity and maybe cost of storage. What about governance? Removing the need for ETL from one system into another does remove one big issue, which is the creation of a second copy of the data without the access controls and policy enforcement applied to the original. However, there is still a need to track metadata — data about the data itself.

Enter Amazon DataZone, which enables users to discover, catalog, share, and govern data across organisations. What this means in practice is that there is a catalog of available data, with metadata, labels, and descriptions. Authorised consumers of the data can search, browse, and request access, using existing tools: Redshift, Athena, and QuickSight. There is also a partner API for third-party tools; Snowflake and Tableau were mentioned specifically.

The Obligatory AI & ML Segment

I was not the only attendee to note that AWS spent an inordinate amount of time on AI & ML, given AWS' relatively weak position in that market.

Adam Selipsky talked up the "most complete set of machine learning and AI services", as well as claiming that SageMaker is the most popular IDE for ML. A somewhat-interesting example is ML-powered forecasting: take a metric on a dashboard and extend it into the future, using ML to include seasonal fluctuations and so on. Of course this is only slightly more realistic than just using a ruler to extend the line, but at least it saves the time needed to make the line look credibly irregular.

More Metaphors

Then we got another beautiful video segment, which Adam Selipsky used to bridge somehow from underwater exploration to secure global infrastructure and GuardDuty. The main interesting announcement in this segment was Amazon Security Lake, a "dedicated data lake to combine security data at petabyte scale". Data in the lake can be queried with Athena, OpenSearch, and SageMaker, as well as third-party tools.

It didn’t sound like there was massive commitment to this offering, so the whole segment ended up sounding opportunistic. The whole thing reminded me of Tim Bray's recent tale of how AWS never did get into blockchain stuff: as long as people are going to do something, you might as well make it easy.

In this case, what people are doing is dumping all their logs into one place in the hope that they can find the right algorithm to sift them with and find interesting patterns that map to security issues. The most interesting aspect of Security Lake is that it is the first tool to support the new Open Cybersecurity Schema Framework format. This is a nominally open format (Cisco and Splunk were mentioned as contributors), but it is notable that the examples in the OCSF white paper are all drawn from AWS services. OCSF is a new format, only launched in August 2022, so ultimate adoption by the industry is still unclear.

Trekking Towards The End

By this point in the presentation I was definitely flagging, but there was another metaphor to torture, this time about polar exploration. Adam Selipsky contrasted the Scott and Amundsen expeditions, which seemed in remarkably poor taste, what with all the ponies and people dying — although the anecdote about Amundsen bringing a tin-smith to make sure his cans of fuel stayed sealed was admittedly a good one, and the only non-morbid part of the whole segment. Anyway, all of this starvation and death — of the explorers, I mean, not the keynote audience, although if I had gone before breakfast I would have been regretting it by this point — was in service of making the point that specific tools are better than general ones.

We got a tour of what felt like a large proportion of AWS' 600+ instance types, with shade thrown at would-be Graviton competitors that have not yet appeared, more ML references with Inferentia chips, and various stories about HPC. Here it was noticeable that the customer example use case uses Intel Xeon chips, despite all of those earlier Graviton references.

One More Metaphor

There was one more very pretty video on imagination, but it was completely wasted on supply chains and call centres.

There was one last interesting offering, though, building on that earlier point about governance and access. This was AWS Clean Rooms, a solution to enable secure collaboration on datasets without sharing access to the underlying data itself. This is useful when working across organisational boundaries, because instead of copying data (which means losing control over the copy), it reads data in place, and thereby maintains restrictions on that data. QuickSight, SageMaker, and Redshift all integrate with this service at launch.

There was one issue hanging over this whole segment, though. The Clean Rooms example was from advertising, which leads to a potential (perception of) conflict of interest with Amazon's own burgeoning advertising business. Like another new service, AWS Supply Chain, it's easy to imagine this offering being a non-starter simply because of the competitive aspect, much as retailers prefer to work with cloud providers other than AWS.

Turn It To Eleven

All in all, nothing earth-shattering — certainly nothing like Andy Jassy's cavalcade of product announcements, upending client and vendor roadmaps every minute or so. Maybe that is as it should be, though, for an event which is in its eleventh year. And this may well be why Adam Selipsky opted for a different approach to "the cloud is still in its infancy", when it is so clearly a market that is maturing fast. In particular, we are seeing a maturation in the treatment of data, from a purely technical focus on specific tasks to a more holistic lifecycle view. This shift is very much in line with the expectations of the market; however, at least based on this keynote, AWS is playing catch-up rather than defining the field of competition. Notably, all of the governance tools only work with analytical (OLAP) tools, not with real-time transactional (OLTP) tools. Extending governance to OLTP would be a truly transformative move, especially if it can be accomplished without too much of a performance penalty.

The other thing that is maturing is AWS' own approach, moving inexorably up the stack from simple technical building blocks to full-on turnkey business applications. This shift does imply a change in target buyers, though; AWS' old IT audience may have been happy to swipe a credit card, read the docs, and start building, but the new audience they are courting with Supply Chain and Clean Rooms certainly will not. It will be interesting to watch this transformation take place.


  1. It was not clarified how much of that data is used to poison image search engines. 

  2. Relevant because Aurora (and RDS which it is built on) is based on PostgreSQL and MySQL, with custom storage enhancements to give that speed improvement. 

Marketing Without Surveillance

This is a post that I drafted when Facebook released their last results, and never got around to publishing. Why publish it now? For a start, none of this is breaking news, so it remains as relevant as it ever was. More importantly, with the ongoing bonfire of Twitter, the question of whether ad-funded social networks are a good thing or not is more relevant than ever.

My position remains that none of this tracking nonsense is worthwhile. I have never been served a relevant ad through surveillance-driven adtech. Meanwhile, brand advertising works just fine, simply by virtue of the brand being present in the right context: bike gear on a cycling blog, that sort of very limited targeting that only requires a single bit of information about the audience.

Meta Loses Top-10 Ranking by Market Value Amid Worst Month Ever
Social media company falls behind Tencent in value ranking
Facebook parent has lost $513 billion in market cap from peak
Stock has fallen 46% from last year’s record.

What do the terrible results announced by Facebook — I refuse to give in to their desire that we call them Meta — actually mean?

Zuck blamed Apple's ad tracking prevention features for wiping $10B off their bottom line, and there has been a concerted push since to present this as somehow a bad thing, especially for small businesses. I agree with Nick Heer that this framing is pretty gross on Facebook's part, but what I wanted to do today is to discuss alternatives that are open to marketers today.

I'm not in marketing these days, and I never worked directly in the demand-generation side that would get actively involved with this sort of thing — but I have worked closely with those teams and been in the planning meetings, so I have at least an idea of how that business works.

Everything starts with a campaign: you have a particular message you want to get out, you want it to reach a particular audience, and you want some idea of how effective it is. Given those goals, there are different ways to go about running your campaign — different largely in their ethics, rather than in their actual results. Let's take a look.

Alice and Bob work for ACME Widgets Corp. Both of them are launching marketing campaigns for the coming quarter — but they take different approaches, even though they have the same metrics set by their boss, Eve the VP of Marketing.

Alice goes all-in on the surveillance model: her emails have tracking pixels, the links they point to are all gated behind a form that also signs you up for a newsletter, and she places ads that follow users around the web once they have come within her surveillance web. She even messes with the favicon and the hosted fonts on the website in order to be able to track users that way. At the end, thanks to all of this effort, Alice can show Eve attribution metrics with a certain click-through rate for her outreach and a certain acquisition cost per customer, set against their likely lifetime value to ACME.

Bob takes a different tack: his emails are plain text, without even any images — since plenty of people now reflexively block all images in email, or load them through proxies. The links in the email are customised so that Bob can tell which email was the one that triggered the action, but then they go directly to the linked resource. He also buys ads, but instead of direct calls to action, Bob focuses on brand advertising in the sorts of publications that the prospective customers are likely to read. At the end, Bob can also show Eve attribution metrics, click-through rates and customer acquisition costs — but he has got there without irritating prospective customers, or falling foul of either technical countermeasures or policies such as GDPR or CCPA.
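To make Bob's customised links concrete, here is a minimal sketch of one way to do it: add a per-email tag to the query string, and read it back from the requested URL on ACME's own site. The `utm_campaign` parameter name, the example URLs, and the function names are all my assumptions for illustration; nothing here identifies the individual recipient, only which email the click came from.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# ASSUMPTION: "utm_campaign" is used because it is a widely used convention;
# any parameter name that ACME's own web analytics can read would do.
CAMPAIGN_PARAM = "utm_campaign"


def tag_link(url: str, email_id: str) -> str:
    """Return `url` with a campaign tag added, keeping existing parameters."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query[CAMPAIGN_PARAM] = [email_id]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


def campaign_of(url: str) -> Optional[str]:
    """Read the tag back from a requested URL, e.g. in first-party logs."""
    values = parse_qs(urlsplit(url).query).get(CAMPAIGN_PARAM)
    return values[0] if values else None


if __name__ == "__main__":
    link = tag_link("https://example.com/widgets?colour=blue", "2022-q4-email-3")
    print(link)
    print(campaign_of(link))
```

The link still goes straight to the resource, with no redirect through a tracking service and no form in the way. Counting requests per tag gives Bob his click-through numbers from data he already owns.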

Comparing Alice and Bob’s Results

Effectively, Alice and Bob have access to the same metrics; it's just that one of them is going about the process of gathering them honestly. The only data point Bob is missing is the open rate on those emails — but first of all, how useful is that metric in reality? If the indicator that an email was opened is that a tracking pixel was loaded, Alice doesn't know whether the recipient actually read the whole thing, or paged past her email quickly on their way to something they actually wanted. And even assuming that it's an accurate representation of how many people read the text but don't click on any of the links — what can Alice do with that information that Bob would not also do with the information that he sent out X number of emails and Y% of recipients clicked on the call-to-action link? And no, for goodness' sake, the answer is not even more layers of attribution woo that claims to be able to identify whether someone came to the ACME website because they remembered the email, or the billboard ad, or because someone mentioned it to them at work — let alone trying to embed the "read progression" code that far too many websites now include.

Secondly, all of these intrusive metrics now have a firm expiry date stamped on them. On top of the ad tracking prevention, Apple now offers a Private Relay capability in iCloud that hides originating IP addresses. Browsers already no longer report a whole lot of information that they used to, precisely because it was used for creepy tracking stuff. By building her campaigns this way, Alice might achieve her goals today, but soon she will not be able to run campaigns like this, and will have to learn to do things Bob's way anyway.

At the core of Bob's method is turning tracking inside out. Instead of trying to stalk users around the Web, engaging in a constant arms race and violating their clearly expressed preference, Bob simply figures out where his most valuable prospects gather and advertises there. First-party data is enough for his purposes, and while individual ads might be more expensive in CPM, he avoids engaging with an ecosystem that is riddled with fraud. He also does not need to worry that the ACME ad might show up beside some tin-foil-hatter YouTube channel and get bad press that way — and the time he doesn't spend micro-managing ad placement can be spent more productively on creating better copy, or an entire other campaign.

Context matters in other ways, too: when a prospective customer is reading about the latest political crisis, famine, or natural disaster, they are not in a widget-buying mood, so showing them a widget ad is counter-productive anyway. Instead, Bob puts his widget ads in widget blogs, places them with streamers who test widgets, and gets hosts of widget-focused podcasts to read out his ads. All of these channels have very limited tracking; podcasts offer none at all, unless Bob creates a special landing page or discount code for listeners of each podcast. And yet, those are some of the most expensive ad slots around, because the context makes them very strong indicators of desire to buy.

Eve looks at the campaign performance numbers presented by a haggard Alice and a relaxed Bob, remembers the news stories about Apple and Google clamping down further on ad tracking, and suggests gently to Alice that maybe she should sit with Bob and figure out how to get the job done without the crutch of surveillance ad tech.


🖼️ Photos by Charles Deluvio and Headway on Unsplash

Retracing My Steps

Another ride report post! This time, I decided on the spur of the moment to try a route I hadn't ridden before. It turned out to be a wee bit longer than I had really allowed for, which made me slightly late for family Sunday lunch — oops. I had also forgotten to charge my Apple Watch, so this ride went unrecorded, but I'm pretty sure the distance was around 80km, so not bad. The highest point was around 550m, but there was a fair bit of up and down, so the total vert would be quite a bit more.

Two of the things that make me happiest are bicycles and mountains, though, so riding up into the mountains like this does me an enormous amount of good. Here are some of the highlights of Sunday's ride.

I had only just left the tarmac when I saw three deer bouncing through the wispy fog that was still drifting across the ploughed fields. They moved fast enough that by the time I had stopped and got my phone out, I needed the 3x zoom — and one of the deer got away entirely. For such an extreme shot from a phone camera, I'm not unhappy with the results.

I also love that the scenery looks pretty wild in this framing, but actually it's still pretty close to a bunch of warehouses and factories, a true liminal space. The early part of this route is stitched together from tracks between fields to avoid busy roads, but it's still pretty close to industrial areas.

A little further along, and with the sun burning off the last vestiges of the mist, I stopped again because I liked the view of the river rippling across the stones. After this stop, though, I hit some pretty technical riding and had to concentrate on where I was putting my wheels. Some rain had finally arrived after the long drought, and then motorbikes (ugh) had come through, so all the mud was churned up into mire.

On my mountain bike I'd probably have been fine, but the Bianchi has some intermediate gravel tyres that are pretty smooth in the centre, with only a little bit of tread on the sides, as well as being narrower than MTB tyres. This is the sort of terrain where I'm glad to have proper pedals that I can unclip from and ride along with my feet free just in case I lose my balance and need to put a foot down in a hurry. Anyway, I got through without too much trouble, despite a lot of slipping and sliding. I did have to stop to clear out the plug of mud between rear wheel and frame once I got out of the woods, and then I walked the bike along the edge of one field that had been ploughed right to the river's edge, not leaving any smooth terrain to ride on.

Nothing much to say about this tower, I just always like the look of it. This is also where the trail finally starts to climb out of the plain.

This is an old railway bridge, and because the road bridge is just upstream, it's reserved for walking and riding. It's not at all signposted, either, so you have to know it's there; I rarely see anyone else on it.

One of the reasons I ride a gravel bike is so that I can spend as little time as possible sharing the road with cars. It's tough to avoid that when it comes to river crossings, though! One newer bridge around here has a cycle path slung underneath it, and one of the busier bridges carved out a cycle path in a redesign, but this one is the best of all.

After that I rode properly up into the hills, climbing up out of the Nure valley and over the watershed down into the Trebbia valley before heading home. Unfortunately the day clouded over a bit too, so although I did stop to take a few more shots, they aren't nearly so scenic. I did want to share this one, though, because that rocky outcrop in the middle distance already featured in a past ride report.

Business Case In The Clouds

A perennial problem in tech is people building something that is undeniably cool, but is not a viable product. The most common definition of "viable" revolves around the size and accessibility of the target market, but there are other factors as well: sustainability, profitability, growth versus funding, and so on.

I am as vulnerable as the next tech guy to this disease, which is just one of many reasons why I stay firmly away from consumer tech. I know myself well enough to be aware that I would fall in love with something that is perfectly suited to my needs and desires — and therefore has a minuscule target market made up of me and a handful of other weirdos.

One of the factors that makes this a constant ongoing problem, as opposed to one that we as an industry can resolve and move on from, is that advancing tech continuously expands the frontiers of what is possible, but market positioning does not evolve in the same direction or at the same speed. If something simply can't be done, you won't even get to the "promising demo video on Kickstarter" stage. If on the other hand you can bodge together some components from the smartphone supply chain into something that at least looks like it sort of works, you might fool yourself and others into thinking you have a product on your hands.

The thing is, a product is a lot more than just the technology. There are a ton of very important questions that need to be answered — and answered very convincingly, with data to back up the answers — before you have an actual product. Here are some of the key questions:

  • How many people will buy one?
  • How much are they willing to pay?
  • Given those two numbers, can we even manufacture our potential product at a cost that lets us turn a profit? If we have investors, what are their expectations for the size of that profit?
  • Are there any regulations that would bar us from entering a market (geographical or otherwise)? How much would it cost to comply with those regulations? Are we still profitable after paying those costs?
  • How are we planning to do customer acquisition? If we have a broad market and a low-cost product, we're going to want to blanket that segment with advertising and have as self-service a sales channel as possible. On the other hand, if we are going high-end and bespoke, we need an equally bespoke sales channel. Both options cost money, and they are largely mutually exclusive. And again, that cost comes out of our profit margin.
  • What's the next step? Is this just a one-shot campaign, or do we have plans for a follow-on product, or an expansion to the product family?
  • Who are our competitors? Do they set expectations for our potential customers?
  • How might those competitors react? Can they lower their own prices enough that we have to reduce ours and erode our profit margin? Can they cross-promote with other products while we are stuck being a one-trick pony?

These are just some of the obvious questions, the ones that you should not move a single step forward without being able to answer. There are all sorts of second- and third-order follow-ups to these. Nevertheless, things-that-are-not-viable-products keep showing up, simply because they are possible and technically cool.
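The first few questions in that list boil down to arithmetic, which can be sketched as follows. Every name and number here is hypothetical, chosen only to show how quickly a competitor's price cut or a compliance bill can flip the answer.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProductCase:
    """Hypothetical inputs for a first-pass viability check."""
    buyers: int              # how many people will buy one?
    price: float             # how much are they willing to pay?
    unit_cost: float         # cost to manufacture one unit
    acquisition_cost: float  # sales and marketing spend per customer
    compliance_cost: float   # one-off cost of meeting regulations

    def profit(self) -> float:
        """Total profit after per-unit and one-off costs."""
        margin = self.price - self.unit_cost - self.acquisition_cost
        return self.buyers * margin - self.compliance_cost

    def is_viable(self, investor_target: float = 0.0) -> bool:
        """True if profit meets whatever the investors expect."""
        return self.profit() >= investor_target


if __name__ == "__main__":
    # All figures below are made up for illustration.
    base = ProductCase(buyers=10_000, price=200.0, unit_cost=120.0,
                       acquisition_cost=40.0, compliance_cost=250_000.0)
    print(base.profit(), base.is_viable())

    # A competitor cuts prices and we have to follow.
    squeezed = replace(base, price=170.0)
    print(squeezed.profit(), squeezed.is_viable())
```

None of the inputs is a technology question, and a small change to any one of them can turn a profit into a loss.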

Possible, Just Not Viable

One example of how this process can play out would be Google Stadia (RIP). At the time of its launch, everyone was focused on technical feasibility:

[...] streaming games from datacenters like they’re Netflix titles has been unproven tech, and previous attempts have failed. And in places like the US with fixed ISP data caps, how would those hold up to 4-20 GB per hour data usage?

[...] there was one central question. Would it even work?

Some early reviewers did indeed find that the streaming performance was not up to scratch, but all the long-term reports I heard from people like James Whatley were that the streaming was not the problem:

The gamble was always: can Google get good at games faster than games can get good at streaming. And I guess we know (we always knew) the answer now. To be clear: the technology is genuinely fantastic but it was an innovation that is looking - now even more overtly - for a problem to solve.

As far as we can tell from the outside (and it will be fascinating to read the tell-all book when it comes out), Google fixated on the technical aspect of the problem. In fairness, they were and are almost uniquely well-placed to make the technology work that enables game streaming: data centers everywhere, fast network connections, and in-house expertise on low-latency data streaming. The part which apparently did not get sufficient attention was how to turn those technical capabilities into a product that would sell.

Manufacturing hardware is already not Google's strong suit. Sure, they make various phones and smart home devices, but they are bit-players in terms of volume, preferring to supply software to an ecosystem of OEMs. However, what really appears to have sunk Stadia is the pricing strategy. The combination of both a monthly subscription and having to buy individual games appears to have been a deal-killer, especially in the face of other streaming services from long-established players such as Microsoft or Sony which only charge a subscription fee.

To recap: Google built some legitimately very cool technology, but priced it in a way that made it unattractive to its target customers. Those customers were already well-served by established suppliers, who enjoyed positive reputations — as opposed to Google's reputation for killing services, one that has been further reinforced by the whole Stadia fiasco. Finally, there was no uniquely compelling reason to adopt Stadia — no exclusives, no special integration with other Google services, just "isn't it cool to play games streamed from the cloud instead of running on your local console?" Gamers already own consoles or game on their phones, especially the ones with the sort of fat broadband connection required to enable Stadia to work; there is not a massive untapped market to expand into here.

So much for Google. Can Facebook — sorry, Meta — do any better?

Open Questions In An Open World

Facebook rebranded as Meta to underline its commitment to a bright AR/VR future in the Metaverse (okay, and to jettison the increasingly stale and negative branding of the Blue App). The question is, will it work?

Early indications are not good: "Meta’s flagship metaverse app is too buggy and employees are barely using it, says exec in charge." Always a sign of success when even the people building the thing can't find a reason to spend time with it. Then again, in fairness, the NYT reports that spending time in Meta's Horizon VR service was "surprisingly fun", so who knows.

The key point is that the issue with Meta is not one of technical feasibility. AR/VR are possible-ish today, and will undoubtedly get better soon. Better display tech, better battery life, and better bandwidth are all coming anyway, driven by the demands of the smartphone ecosystem, and all of that will also benefit the VR services. AR is probably a bit further out, except for industrial applications, due to the need for further miniaturisation if it's going to be accepted by users.

The relevant questions for Meta are not tech questions. Benedict Evans made the same point discussing Netflix:

As I look at discussions of Netflix today, all of the questions that matter are TV industry questions. How many shows, in what genres, at what quality level? What budgets? What do the stars earn? Do you go for awards or breadth? What happens when this incumbent pulls its shows? When and why would they give them back? How do you interact with Disney? These are not Silicon Valley questions - they’re LA and New York questions.

The same factors apply to Horizon. It's a given that Meta can build this thing; the tech exists or is already on the roadmap, and they have (or can easily buy) the infrastructure and expertise. The questions that remain are all "but why, tho" questions:

  • Who will use Horizon? How many of these people exist?
  • How will Horizon pay for itself? Subscriptions — in exchange for what value? Advertising — in what new formats?
  • What's the plan for customer acquisition? Meta keeps trying to integrate its existing services, with unified messaging across Facebook, Instagram, and WhatsApp, but it doesn't really seem to be getting anywhere with consumers.
  • Following on from that point, is any of this going to be profitable at Meta's scale? That qualification is important: to move the needle for Zuckerberg & co., this thing has to rope in hundreds of millions of users. It can't just hit a Kickstarter milestone and declare victory.
  • What competitors are out there, and what expectations have they already set? If Valve failed to get traction with VR when everybody was locked down at home and there was a new VR-exclusive Half-Life game[1], what does that say about the addressable market?

None of these are questions that can be answered based on technical capabilities. It doesn't matter how good the display tech in the headsets is, or whether engineers figure out how to give Horizon avatars innovative features such as, oh I don't know, legs. What matters is what people can do in Horizon that they can't do today, IRL or in Flatland. Nobody will don a VR headset to look at Instagram photos; that works better on a phone. And while some people will certainly try to become VR influencers, that is a specialised skill requiring a ton of support; it's not going to be every aspiring singer, model, or fitness instructor who is going to make that transition. Meta will need a clear and convincing answer that is not "what if work meetings but worse in every way".

So there you have it, one failed product and one that is still unproven, both cautionary tales of putting the tech before the actual product.


  1. I love this devastating quote from PCGamesN: "Half-Life: Alyx, [...] artfully crafted though it was, [...] had all the cultural impact of a Michael Bublé album." Talk about vicious!