
The curve points the way to our future


Just a few days ago, I wrote a post about how technology and services do not stand still. Whatever model we come up with based on how things are right now will soon be obsolete, unless it can accommodate change.

One of the places where we can see that is with the adoption curve of Docker and other container architectures. Anyone who thought that there might be time to relax, having weathered the virtualisation and cloud storms, is in for a rude awakening.

Who is using Docker?

Sure, the latest Docker adoption survey still shows that most adoption is in development, with 47% of respondents classifying themselves as "Developer or Dev Mgr", and a further 15% as "DevOps or Release Eng". In comparison, only 12% of respondents were in "SysAdmin / Ops / SRE" roles.

Also, 56% of respondents are from companies with fewer than 100 employees. This makes sense: long-established companies have too much history to be able to adopt the hot new thing in a hurry, no matter what benefits it might promise.

What does happen is that small teams within those big companies start using the new cool tech in the lab or for skunkworks projects. Corporate IT can maybe ignore these science experiments for a while, but eventually, between the pressure of those research projects going into production, and new hires coming in from smaller startups that have been working with the new technology stack for some time, they will have to figure out how they are going to support it in production.

Shipping containers

If the teams in charge of production operations have not been paying attention, this can turn into "Good news for Dev, bad news for Ops", as my colleague Sahil wrote on the official Moogsoft blog. When it comes to Docker specifically, one important factor for Ops is that containers tend to be very short-lived, continuing and accelerating a trend that VMs introduced. Where physical servers had a lifespan of years, VMs might last for months - but containers have been reported to live only a quarter as long as VMs.

That’s a huge change in operational tempo. Given that shorter release cycles and faster scaling (up and down) in response to demand are among the main benefits that people are looking for from Docker adoption, this rapid churn of containers is likely to continue and even accelerate.
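To make the disposable model concrete, here is a minimal sketch - assuming the Docker SDK for Python and a local Docker daemon, neither of which the original survey discussion mentions - in which a container exists only for the duration of a single task and is deleted the moment it exits:

```python
# A short-lived, disposable container: created for one task,
# removed automatically on exit, nothing left behind to manage.
# Requires the Docker SDK for Python (pip install docker) and a
# running Docker daemon.
import docker

client = docker.from_env()

output = client.containers.run(
    "alpine:latest",       # small base image
    ["echo", "job done"],  # the single task this container performs
    remove=True,           # delete the container as soon as it exits
)
print(output.decode().strip())  # -> "job done"
```

Nothing here survives long enough to be patched, backed up, or forgotten - which is exactly the shift in operational tempo that the adoption numbers point to.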

VMs were sometimes used for short-duration tasks, but far more often they were forklifted physical servers, shoe-horned into the same operational model. This meant that VMs could sometimes have an even longer lifespan than physical servers, as it was possible for them simply to be forgotten.

Container-based architectures are sufficiently different that there is far less risk of this happening. Also, the combination of experience and generational turnover mean that IT people are far more comfortable with the cloud as an operational model, so there is less risk of backsliding.

The Bow Wave

The legacy enterprise IT departments that do not keep up with the new operational tempo will find themselves in the position of the military, struggling to adapt to new realities because of its organisational structure. Armed forces set up for Cold War battles of tanks, fighters and missiles struggle to deal with insurgents armed with cheap AK-47s and repurposed consumer technology such as mobile phones and drones.

In this analogy, shadow IT is the insurgency, able to pop up from nowhere and be just as effective as - if not more so than - the big, expensive technological solutions adopted by corporate. On top of that, the spiralling costs of supporting that technological legacy will force changes sooner or later. This is known as the "bow wave" of technological renewal:

"A modernization bow wave typically forms as the overall defense budget declines and modernization programs are delayed or stretched in the future," writes Todd Harrison of the Center for Strategic and International Studies. He continues: "As this happens the underlying assumption is that funding will become available to cover these deferred costs." These delays push costs into the future, like a ship’s bow pushes a wave forward at sea.

(from here)

What do we do?

The solution is not to throw out everything in the data centre, starting with the mainframe. Judiciously adapted, upgraded, and integrated, old tech can last a very long time. There are B-52 bombers that have been flown by three generations of the same family. In the same way, ancient systems like SABRE have been running since the 1960s, and still (eventually) underpin every modern Web 3.0 travel-planning web site you care to name.

What is required is actually something much harder: thought and consideration.

Change is going to happen. It’s better to make plans up front that allow for change, so that we can surf the wave of change. Organisations that wipe out trying to handle (or worse, resist) change that they had not planned for may never surface again.

Not NoOps, but SmartOps

Or, Don't work harder, work smarter

I have always been irritated by some of the more extreme rhetoric around DevOps. I especially hate the way DevOps often gets simplified into blaming everything that went wrong in the past on the Ops team, and explicitly minimising their role in the future. At its extreme, this tendency is encapsulated by the NoOps movement.

This is why I was heartened to read "There is no such thing as NoOps", by the reliably acerbic IT Skeptic.

Annoyingly enough, in terms of the original terminology, I quite agree that we need to get rid of Ops. Back in the day, there was a distinction between admins and ops. The sysadmins were the senior people, with deep skills and experience, who generally spent their time planning and analysing rather than executing. The operators were typically juniors, often proto-sysadmins working through an apprenticeship.

Getting rid of ops in that meaning of the word makes perfect sense. The major cause of outages is human error - and not necessarily the fairly obvious moment when the poor overworked ops realise, one oh-no-second after hitting Enter, that the login was not where they thought it was. What leads to these human-mediated outages is complexity: the valid change that is made here but not there, or the upgrade that is applied to one component but never flows down to later stages of the lifecycle. These are the types of human error that either cause failures on deployment, or produce those more subtle issues which only show up under load, or every second Thursday, or only when the customer's name has a Y in it.

There have been many attempts to reduce the incidence of these moments by enforcing policies, reviews, and procedures. However, because none of them eliminates the weakest link in the chain - the human one - none of these well-meaning attempts has succeeded. Instead of saying "it will work this time, really!", we should aim to eliminate downtime and improve performance by removing every possible human intervention and hand-over, allowing one single original design to propagate everywhere automatically.
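As a toy illustration of that "one design, propagated automatically" idea - no real tool implied, and every name and path below is invented - the desired state is defined exactly once, and each environment's configuration is rendered from it, leaving no hand-over step in which a human can forget to apply a change:

```python
# Toy single-source-of-truth propagation: the design lives in one
# place, and every environment's config is generated from it rather
# than edited by hand.
from pathlib import Path

DESIRED_STATE = {
    "app_version": "2.4.1",
    "max_connections": 200,
    "tls": "required",
}

ENVIRONMENTS = ("dev", "staging", "production")

def render(env: str, state: dict) -> str:
    """Render the single design into one environment's config file."""
    lines = [f"# generated for {env} - do not edit by hand"]
    lines += [f"{key} = {value}" for key, value in state.items()]
    return "\n".join(lines) + "\n"

for env in ENVIRONMENTS:
    target = Path(f"configs/{env}/app.conf")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render(env, DESIRED_STATE))
    print(f"{env}: regenerated from the single design")
```

Change the design once, and the "made here but not there" class of error simply has nowhere to occur.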

So yes, we get rid of ops by automating their jobs - what I once heard a sysadmin friend describe to a colleague as "monkey-compatible tasks", basically low-value-added, tactical, hands-on-keyboard activity. However, that does not mean that there is no role for IT! It simply means that IT's role is no longer in execution, or in other words, as the bottleneck in every request.

Standard requests should not require hands-on-keyboard intervention from IT.
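What might that look like? Here is a hypothetical sketch - every request type and function in it is invented for illustration - in which standard, pre-approved requests are executed automatically, and only the genuinely non-standard ones reach a human:

```python
# Hypothetical self-service request routing: routine requests run
# pre-approved automation; only exceptions escalate to a person.
from typing import Callable

def create_dev_vm(req: dict) -> str:
    return f"provisioned a dev VM for {req['user']}"

def grant_repo_access(req: dict) -> str:
    return f"granted {req['user']} access to {req['repo']}"

PLAYBOOKS: dict[str, Callable[[dict], str]] = {
    "dev-vm": create_dev_vm,
    "repo-access": grant_repo_access,
}

def handle(request: dict) -> str:
    action = PLAYBOOKS.get(request["type"])
    if action is None:
        # The non-standard case: this is where IT's advisory,
        # sysadmin-style role comes in - not routine keyboard work.
        return "escalated to a human for review"
    return action(request)

print(handle({"type": "dev-vm", "user": "alice"}))
print(handle({"type": "move-the-mainframe", "user": "bob"}))
```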

This is what all these WhateverOps movements are about: preventing IT from becoming a bottleneck to other departments, whether the developers in the case of DevOps, or the GRC team in the case of SecOps that I keep banging on about lately, or whatever other variation you like.

IT still has a very important role to play, but it is not the operator's role, it is the sysadmin's role: to plan, to strategise, to have a deep understanding of the infrastructure. Ultimately, IT's role is to advise other teams on how best to achieve their goals, and to emplace and maintain the automation that lets them do that - much as sysadmins in the past would have worked to train their junior operators to deliver on requests.

The thing is, sysadmins can't wait to be rid of scut work themselves - nothing would make them happier! But the state of the art today makes that difficult to achieve. DevOps et al are the friend of IT, not its enemy, at least when they're done right. Done wrong, they are the developer's enemy too.

In that sense, I say yes to NoOps - but let's not throw the baby out with the bathwater! Any developer trying to do completely without an IT team will soon find that they no longer have any time to develop, because they are so busy with all this extraneous activity, managing their infrastructure1, keeping it compliant, updating components, and all the thousand and one tasks IT performs to keep the lights on.


  1. No, Docker, "the cloud", or whatever fad comes next will not obviate this problem; there will always be some level of infrastructure that needs to be looked after. Even if it works completely lights-out in the normal way of things, someone will need to understand it well enough to fix it when (not if) it breaks. That person is IT, no matter which department they sit in. 

Signalling

I've been blogging a lot about messaging lately, which I suppose is to be expected from someone in marketing. In particular, I have been focusing on how messaging can go wrong.

The process I outlined in "SMAC my pitch up" went something like this:

  • Thought Leaders (spit) come up with a cool new concept
  • Thought Leaders discuss the concept amongst themselves, coming up with jargon, abbreviations, and acronyms (oh my!)
  • Thought Leaders launch the concept on an unsuspecting world, forgetting to translate from jargon, abbreviations and acronyms
  • Followers regurgitate half-understood jargon, abbreviations and acronyms
  • Much clarity is lost

Now the cynical take is that the Followers are doing this in an effort to be perceived as Thought Leaders themselves - and there is certainly some of that going on. However, my new corollary to the theory is that many Followers are not interested in the concept at all. They are name-checking the concept to signal to their audience that they are aware of it and gain credibility for other initiatives, not to jump on the bandwagon of the original concept. This isn't the same thing as "cloudwashing", because that is at least about cloud. This is about using the cloud language to justify doing something completely different.

This is how we end up with actual printed books purporting to explain what is happening in the world of mobile and social. By the time the text is finalised it's already obsolete, never mind printed and distributed - but that's not the point. The point is to be seen as someone knowledgeable about up-to-date topics so that other, more traditional recommendations gain some reflected shine from the new concept.

The audience is in on this too. There will always be rubes taken in by a silver-tongued visionary with a high-concept presentation, but a significant part of the audience is signalling - to other audience members and to outsiders who are aware of their presence in that audience - that they too are aware of the new shiny concept.

It's cover - a way of saying "it's not that I don't know what the kids are up to, it's that I have decided to do something different". This is how I explain the difficulties in adoption of new concepts such as cloud computing1 or DevOps. It's not the operational difficulties - breaking down the silos, interrupting the blamestorms, reconciling all the differing priorities; it's that many of the people talking about those topics are using them as cover for something different.


Images from Morguefile, which I am using as an experiment.


  1. Which my fingers insist on typing as "clod computing", something that is far more widespread but not really what we should be encouraging as an industry. 

DevOps is killing us

I came across this interesting article about the changes that DevOps brings to the developer role. Because of my sysadmin background, I had tended to focus on the Ops side of DevOps. I had simply not realised that developers might object to DevOps!

I knew sysadmins often didn’t like DevOps, of course. Generalising wildly, sysadmins are not happy with DevOps because it means they have to give non-sysadmins access to the systems. This is not just jealousy (although there is often some of that), but a very real awareness that incentives are not necessarily aligned. Developers want change, sysadmins want stability.

Actually, that point is important. Let me emphasise it some more.

Developers want change, sysadmins want stability

Typical pre-DevOps scenario: developers code up an application, and it works. It passes all the testing: functional, performance, and user-acceptance. Now it's time to deploy it in production - and suddenly the sysadmins are being difficult, complaining about processes running as root and world-writable directories, or talking about maintenance windows for the deployment. Developers just want the code that they have spent all this time working on to get out there, and the sysadmins are in the way.

From the point of view of the sysadmins, it’s a bit different. They just got all the systems how they like them, and now developers are asking for the keys? Not only that, but their stuff is all messy, with processes running as root, world-writable directories, and goodness knows what. When the sysadmins point out these issues and propose reasonable corrections, the devs get all huffy, and before you know it, the meeting has turned into a blamestorm.


The DevOps movement attempts to address this by getting developers more involved in operations: instead of throwing their code over the proverbial wall between Dev and Ops, they have to handle not just the deployment of that code but also its ongoing support and maintenance. In other words, developers have to start carrying pagers.

The default sysadmin assumption is that developers can't wait to get the root password and go joy-riding in the carefully maintained data centre - and because I have a sysadmin background, sell to sysadmins, and hang out with sysadmin types, I had unconsciously bought into that. However, now that someone points it out, it does make sense that developers would not want to take up that pager…