Showing all posts tagged aws:

AWS re:Invent 2022

At this time of year, with the nights drawing in, thoughts turn inevitably to… AWS' annual Las Vegas extravaganza, re:Invent. This year I'm attending remotely again, like it's 2020 or something, which is probably better for my liver, although I am definitely feeling the FOMO.

Day One: Adam Selipsky Keynote

I skipped Monday Night Live due to time zones, but as usual, this first big rock on the re:Invent calendar is a barrage of technical updates, with few hints of broader strategy. That sort of thing comes in the big Tuesday morning keynote with Adam Selipsky.

Last year was his first time taking over after Andy Jassy's ascension to running the whole of Amazon, not just AWS. This year’s delivery was more polished, plus it looks like we have seen the last of the re:Invent House Band. Adam Selipsky himself though was still playing the classics, talking up the benefits of cloud computing for cost savings and using examples such as Carrier or Airbnb to allude to companies' desire to be agile with fewer resources.

Still, it's a bit of a double-take to hear AWS still talking about cloud migration in 2022 — even if, elsewhere in Vegas, there was a memorable endorsement of migration to the cloud from Ukraine's Minister for Digital Transformation. Few AWS customers have to contend with the sorts of stress and time pressure that Mykhailo Fedorov did!

In the keynote, the focus was mostly on exhortations to continue investing in the cloud. I didn't see Andy Jassy's signature move of presenting a slide that shows cloud penetration as still being a tiny proportion of the market, but that was definitely the spirit: no reason to slow down, despite economic headwinds; there's lots more to do.

Murdering the Metaphors

We then got to the first of various metaphors that would be laboriously and at length tortured to breaking point and beyond. The first was space exploration, and admittedly there were some very pretty visuals to go with the point being belaboured: namely, that just like images captured in different wavelengths show different data to astronomers, different techniques used to explore data can deliver additional results.

There were some good customer examples in this segment: Expedia Group making 600B predictions on 70 Petabytes of data, and Pinterest storing 1 Exabyte of data on S31. That sort of scale is admittedly impressive, but this was the first hint that the tempo of this presentation would be slower, with a worse ratio of content to time than we had been used to in the Jassy years.

Tools, Integration, Governance, Insights

This led to a segment on the right tools, integration, and governance for working with data, and the insights that would be possible. The variety of tools is something I had focused on in my report from re:Invent 2021, in which I called out AWS' "one database engine for each use case" approach and questioned whether this was what developers actually wanted.

Initially, it seemed that we were getting more of the same, with Amazon Aurora getting top billing. The metrics in particular were very much down in the weeds, mentioning that Aurora offered 1/10 the cost of commercial DBMS, while also having up to 3x performance of PostgreSQL and 5x the performance of MySQL2.

We then heard about how customers also need analytics tools, not just transactional ones, such as EMR, MSK, and Redshift for high performance on structured data - 5x better price performance than "other cloud data warehouses" (a not-particularly-veiled dig at Snowflake, here — more of a Jassy move, I felt).

The big announcement in this section was OpenSearch Serverless. This launch means that AWS offers serverless options for all of its analytics services. According to Selipsky, "no-one else can say that". However, it is worth checking the fine print. In common with many "serverless" offerings, OpenSearch Serverless has a minimum spend of 4 OCUs — or $700 in real money. Scaling to zero is a key requirement and expectation of serverless, so it is disappointing to see so many offerings like this one that devolve to elastic scalability on top of a fixed base. Valuable, to be sure, but not quite so revolutionary.

ETL Phone Home

Then things got interesting.

Adam Selipsky made an example of a retail company running its operations on DynamoDB and Aurora and needing to move data to Redshift for analysis. This is exactly the sort of situation I decried in last year's report for The New Stack: too many single-purpose databases, leaving users trying to copy data back and forth, with the attendant risk of loss of control over their data.

It seems that AWS product managers had been hearing the same feedback that I had, but instead of committing to one general-purpose database, they are doubling down on their best-of-breed approach. Instead, they enabled federated query in Redshift and Athena to query other services — including third-party ones.

The big announcement was zero-ETL integration between Aurora and Redshift. This was advertised as being "near real time", with latency measured in seconds — good enough for most use cases, although something to be aware of for more demanding situations. The integration also works with multiple Aurora instances all feeding into one Redshift instance, which is what you want. Finally, the integration was advertised as being "all serverless", scaling up and down in response to data volume.

Take Back Control

So that's the integration — but that only addresses questions of technical complexity and maybe cost of storage. What about governance? Removing the need for ETL from one system into another does remove one big issue, which is the creation of a second copy of the data without the access controls and policy enforcement applied to the original. However, there is still a need to track metadata — data about the data itself.

Enter Amazon DataZone, which enables users to discover, catalog, share, and govern data across organisations. What this means in practice is that there is a catalog of available data, with metadata, labels, and descriptions. Authorised consumers of the data can search, browse, and request access, using existing tools: Redshift, Athena, and Quicksight. There is also a partner API for third-party tools; Snowflake and Tableau were mentioned specifically.

The Obligatory AI & ML Segment

I was not the only attendee to note that AWS spent an inordinate amount of time on AI & ML, given AWS' relatively weak position in that market.

Adam Selipsky talked up the "most complete set of machine learning and AI services", as well as claiming that Sagemaker is the most popular IDE for ML. A somewhat-interesting example is ML-powered forecasting: take a metric on a dashboard and extend it into the future, using ML to include seasonal fluctuations and so on. Of course this is only slightly more realistic than just using a ruler to extend the line, but at least it saves the time needed to make the line look credibly irregular.

More Metaphors

Then we got another beautiful video segment, which Adam Selipsky used to bridge somehow from underwater exploration to secure global infrastructure and GuardDuty. The main interesting announcement in this segment was Amazon SecurityLake, a "dedicated data lake to combine security data at petabyte scale". Data in the lake can be queried with Athena, OpenSearch, and Sagemaker, as well as third-party tools.

This sounds very much like Tim Bray's recent tale of how AWS never did get into blockchain stuff: as long as people are going to do something, you might as well make it easy.

In this case, what people are doing is dumping all their logs into one place in the hope that they can find the right algorithm to sift them with and find interesting patterns that map to security issues. The most interesting aspect of SecurityLake is that it is the first tool to support the new Open Cybersecurity Schema Framework format. This is a nominally open format (Cisco and Splunk were mentioned as contributors), but it is notable that the examples in the OCSF white paper are all drawn from AWS services. OCSF is a new format, only launched in August 2022, so ultimate adoption by the industry is still unclear.

Trekking Towards The End

By this point in the presentation I was definitely flagging, but there was another metaphor to torture, this time about polar exploration. Adam Selipsky contrasted the Scott and Amundsen expeditions, which seemed in remarkably poor taste, what with all the ponies and people dying — although the anecdote about Amundsen bringing a tin-smith to make sure his cans of fuel stayed sealed was admittedly a good one, and the only non-morbid part of the whole segment. Anyway, all of this starvation and death — of the explorers, I mean, not the keynote audience, although if I had gone before breakfast I would have been regretting it by this point — was in service of making the point that specific tools are better.

We got a tour of what felt like a large proportion of AWS' 600+ instance types, with shade thrown at would-be Graviton competitors that have not yet appeared, more ML references with Inferentia chips, and various stories about HPC. Here it was noticeable that the customer example use case uses Intel Xeon chips, despite all of those earlier Graviton references.

One More Metaphor

There was one more very pretty video on imagination, but it was completely wasted on supply chains and call centres.

There was one last interesting offering, though, building on that earlier point about governance and access. This was AWS Clean Rooms, a solution to enable secure collaboration on datasets without sharing access to the underlying data itself. This is useful when working across organisational boundaries, because instead of copying data (which means losing control over the copy), it reads data in place, and thereby maintains restrictions on that data. Quicksight, Sagemaker, and Redshift all integrate with this service at launch.

There was one issue hanging over this whole segment, though. The Clean Rooms example was from advertising, which leads to a potential (perception of) conflict of interest with Amazon's own burgeoning advertising business. Like another new service, AWS Supply Chain, it's easy to imagine this offering being a non-starter simply because of the competitive aspect, much like retailers prefer to work with other cloud providers than AWS.

Turn It To Eleven

All in all, nothing earth-shattering — certainly nothing like Andy Jassy's cavalcade of product announcements, upending client and vendor roadmaps every minute or so. Maybe that is as it should be, though, for an event which is in its eleventh year. And this may well be why Adam Selipsky opted for a different approach to "the cloud is still in its infancy", when it is so clearly a market that is maturing fast. In particular, we are seeing a maturation in the treatment of data, from a purely technical focus on specific tasks to a more holistic lifecycle view. This shift is very much in line with the expectations of the market; however, at least based on this keynote, AWS is playing catch-up rather than defining the field of competition. In particular, all of the governance tools only work with analytical (OLAP) tools, not with real-time transactional (OLTP) tools. That would be a truly transformative move, especially if it can be accomplished without too much of a performance penalty.

The other thing that is maturing is AWS' own approach, moving inexorably up the stack from simple technical building blocks to full-on turnkey business applications. This shift does imply a change in target buyers, though; AWS' old IT audience may have been happy to swipe a credit card, read the docs, and start building, but the new audience they are quoting with Supply Chain and Clean Rooms certainly will not. It will be interesting to watch this transformation take place.


  1. It was not clarified how much of that data is used to poison image search engines. 

  2. Relevant because Aurora (and RDS which it is built on) is based on PostgreSQL and MySQL, with custom storage enhancements to give that speed improvement. 

Amazon's Private Cloud

Last week was AWS re:Invent, and I’m still dealing with the email hangover.1 AWS always announce a thousand and one new offerings and services at their show, and this year was no exception. However, there is one announcement that I wanted to reflect upon briefly, out of however many there were during the week.

AWS Outposts are billed as letting users "Run AWS infrastructure on-premises for a truly consistent hybrid experience". This of course provoked a certain amount of hilarity in the parts of Twitter that have been earnestly debating the existence of hybrid cloud since the term was first coined.

On the surface, it might indeed seem somewhat strange for AWS, the archetypal public cloud in most people’s minds, to start offering hardware to be deployed on customers’ premises. However, to me it makes perfect sense.

Pace some ten-year-old marketing slogans which have not aged well, most companies do not start out with a hybrid cloud strategy. Instead, they find themselves forced by circumstances to formulate one in order to deal with all of the various departments that are out there doing their own thing. In this situation, the hybrid cloud strategy is simply recognition that different teams have different requirements and have made their own decisions based on those. All that corporate IT can do is try to gain overall visibility and attempt to ensure that all the various flavours of compute infrastructure are at least being used in ways which are sane, secure, and fiscally responsible (the order of the priorities may change, but that’s the list).

Some of the more wild-eyed predictions around hybrid cloud instead expected that workloads would be easily moved, not only between on- and off-premises compute infrastructure, but even between different cloud providers. In fact, it would be so easy that it would be possible to make minute-by-minute assessments of the cost of running workloads with different providers, and move them from one to another in order to take advantage of lower prices.

Obviously, that did not happen.

For the cloud broker model to work, several laws of both economics and physics would have to be suspended or circumvented, and nobody seems to have made the requisite breakthroughs.

To take just a few of the more obvious objections:

The Speed Of Light

Moving any meaningful amount of data around the public internet still takes time. If you are used to your local 100 Gb-E LAN, it can be easy to forget this, but it is going to be a factor out there in the wild wild Web. This objection was obvious when we were talking about moving monolithic VMs around, but even if you assume truly immutable infrastructure, you are still going to have to shift at least some snapshot of the application state, and that adds up fast – let alone the rate of configuration drift of your "immutable" infrastructure with each new micro-release.

Transparent Pricing

The units of measure of different cloud providers are not easily comparable. How does the performance of an AWS M5 instance compare to an Azure Dv2-series? Well, you’d better know before you move production over there… And AWS has 24 instance types, whereas Azure has 7 different series, each with sub-types and options – and let’s not even talk about all the weird and wonderful single-use configurations in your local VMware or Openstack service catalogue! How portable is your workload, really?

Leaving Money On The Table

Or let’s take it from the other side: assume you have carefully architected your thing to use only minimum-common-denominator components that are, if not identical, at least similar enough across all of the various substrates they might find themselves running on. By definition, this means that you are not taking full advantage of the more advanced capabilities of each of those platforms. This limitation is not only at the ingredient level; you also have to make worst-case assumptions about the sorts of network bandwidth and latency you might have access to, or the sort of regulatory and policy compliance environment that you might find yourself operating within.

For all of these reasons and more, the dream of real-time cloud pricing arbitrage died a quick death, regardless of whether individual companies might use different cloud providers in various parts of their business.

Amazon Outposts is not that. For a start, despite running physically on the customer’s premises, it is driven entirely from the (remote) AWS control plane. Instead, it has the potential to address concerns about physical location together with associated concerns about latency and legal jurisdiction. Being AWS (with some help from VMware) it avoids the concern about different units of measure. For now, it only goes part of the way to resolving the final question about minimum common denominator ingredients, since at launch it only supports EC2. Additional features are expected shortly, however, especially including various storage options.

So yes, hybrid cloud. Turns out, it’s not only still a thing, but you can even get it from AWS. Who’d have thunk it?


  1. I managed to avoid any hangovers of the alcoholic variety; staying well hydrated in Las Vegas is good for multiple purposes. My inbox, however, is a mess

Enterprise IT on the shelf

Cross-posted to my work blog and to Linkedin


If there has been one overarching theme of the last few years in IT, it has been the changing relationship between enterprise IT departments and the users that they support.

Users have always wanted more IT faster, and this has always driven advances in the field. Minicomputers were the shadow IT of their day, democratising access to computing that had previously been locked up in mainframes. (By the way, did you know that the mainframe is fifty years young and still going strong?)

Departments would purchase their own minicomputers to avoid having to share time on the big corporate machines with others. This new breed of machine introduced application compatibility for the first time. In other words, it was no longer necessary to program for a specific machine. Higher-level languages also made that task of programming much easier.

Microcomputers and personal desktop computers were the next step in that evolution. At this stage it became feasible for people to have their own personal machine and run their own tasks in their own time, and for a while IT departments lost much of their control. The arrival of computer networks swung the balance the other way, until the widespread adoption of mobile devices started the swing back again.

Seen in this way, cloud computing is just the latest move in a long dance. The tempo is increasing, however, and it becomes more critical to make the right moves.

One make-or-break move is the very first public one, when a company decides to shift at least some of its workloads to the public cloud. It’s important to remember that Amazon was not designed to be traditional IT and trying to treat it that way is a route to failure.

IT-on-the-shelf.jpeg.jpg

To get an idea of the sort of problems we want to avoid, here’s an example from a completely different domain. If you have ever furnished a house or a flat, the odds are good that you have wandered around IKEA, feeling lost and disoriented, and possibly having a furious argument with your significant other as well.

Assuming the shopping trip didn’t end in mayhem and disaster - and personally I always count it as a success when I get out of IKEA without either of those - you may well have bought an Expedit shelving unit. The things are ubiquitous, together with their cousins, the Billy shelving units. I should know, I own both.

The bad news is, IKEA is discontinuing the Expedit and replacing it with a slightly different unit, the Kallax. This has infuriated customers who liked being able to replace or extend their existing furniture with additional bits.

What has this got to do with IT? What IKEA has done is break backwards compatibility in their products: you can no longer just get “more of the same", and unless you are furnishing an entire new home, you will probably have to deal with both the old and the new model at the same time.

Enterprise IT departments are facing the same problem with cloud computing. They want to take advantage of the fantastic capabilities of this new model, but they need to do it without breaking the things that are working for their users today. They don’t have the luxury that startups do of engineering their entire operation from the ground up for cloud. They have a history, and all sorts of things that are built on top of that history.

On the other hand, they can’t just treat a virtual server in the public cloud as being the same as the physical blade server humming away in their datacenter. For a start, much of the advantage of the public cloud is based around a fundamentally different operating model. It has been said that servers used to be pets, given individual names, pampered and hand-reared, while in the cloud we treat them like cattle, giving them numbers and putting them down as soon as it’s convenient.

The public cloud is great, but it works best for certain workloads. On the other hand, there are plenty of workloads that are still better off running on-premises, or even (gasp!) directly on physical hardware. The trick is knowing the difference, and managing your entire IT estate that way.

This is part and parcel of BMC’s New IT: make it easy for users to get what they need, when they need it. To find out more about what BMC can do to make your cloud initiative successful, please visit www.bmc.com/cloud.