Showing all posts tagged bmc:

Not Biting My Tongue

I spend a lot of time explaining enterprise buyers and vendors. There are often perfectly good reasons for doing something in a way that is now considered old-fashioned or uncool. Especially for vendors, the argument of "people still buy X! for money!" is a powerful incentive to continue making X.

Where things go wrong is when stodgy enterprise vendors put on their dad-jeans and go down to the skate park.

Case in point: BMC trying to jump on the AIOps bandwagon. The whole thing is a pretty spectacular case study in missing the point, but I think this paragraph is the nadir:

As mentioned above, AIOps platforms should encompass the IT disciplines of Performance Management, Service Management, Automation, and Process Improvement, along with technologies such as monitoring, service desk, capacity management, cloud computing, SaaS, mobility, IoT and more.

If you’re not familiar with AIOps, it’s a model that Gartner came up with (paid link, unless you’re a Gartner subscriber) to describe some shifts in the IT operations market. The old category of ITOA had been broadened to the point that it was effectively meaningless, and AIOps recognises a new approach to the topic.

The first thing to know about AIOps is that the “AI” bit does not stand for Artificial Intelligence. This is somewhat surprising these days, when everyone and their dog claims AI, Machine Learning, or other poorly-understood snake-oil! Anyway, AIOps actually stands for Algorithmic IT Operations. AIOps solutions sit at the intersection of monitoring, service desk, and automation. The idea is that they ingest monitoring data, apply algorithms to help operators find valuable needles in the haystack of alerts, sync with service desk systems to plug in to existing processes, and trigger automated diagnostic and resolution activities.
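To make the "apply algorithms to the haystack of alerts" step a little more concrete, here is a minimal sketch in Python. It is not Gartner's definition or any vendor's actual algorithm: the `Alert` fields, the 60-second window, and the word-overlap threshold are all assumptions picked for illustration. The only point is that the grouping is driven by the alert data itself rather than by a pre-built model of the infrastructure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float  # seconds since some epoch
    source: str       # hypothetical: host or service that emitted the alert
    message: str


def tokens(alert: Alert) -> set[str]:
    """Lower-cased words of the message, used as a crude similarity signal."""
    return set(alert.message.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_alerts(alerts, window=60.0, min_similarity=0.5):
    """Greedily group alerts that are close in time and similar in wording."""
    clusters: list[list[Alert]] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        for cluster in clusters:
            last = cluster[-1]
            close_in_time = alert.timestamp - last.timestamp <= window
            similar = jaccard(tokens(alert), tokens(last)) >= min_similarity
            if close_in_time and similar:
                cluster.append(alert)
                break
        else:
            # No existing cluster matched, so this alert starts a new one.
            clusters.append([alert])
    return clusters
```

Each resulting cluster is a candidate for a single ticket or automated action, instead of one ticket per raw alert. Note that alerts from different sources can land in the same cluster, which is exactly what you want when nobody has documented how those sources relate.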

So far so good - but here’s why it’s so laughable for BMC to claim AIOps.

BMC’s whole model is BSM - Business Service Management. Where the centre of AIOps is the algorithms, the centre of BSM is the CMDB.

The model for applying BSM goes something like this:

  1. Fully populate CMDB: define service models & document infrastructure
  2. When an alert comes in, determine which infrastructure element it came from, then walk the service model to determine what the cause and effect are
  3. Create a ticket in the ITSM suite to track resolution

Note the hidden assumptions, even in this grossly over-simplified version:

  1. The CMDB can be fully populated given finite time and effort
  2. All alerts relate to known elements, and all elements have known dependencies
  3. Every failure has one cause and falls within one group’s area of responsibility
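The three-step model and its assumptions can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any BMC product is implemented: the `CMDB` dictionary, the element names, and the ticket format are all hypothetical.

```python
# Hypothetical CMDB: each configuration item maps to the items that depend on it.
CMDB = {
    "san-01": ["db-01"],
    "db-01": ["app-01", "app-02"],
    "app-01": ["online-banking"],
    "app-02": ["online-banking"],
    "online-banking": [],
}


def impacted_by(element, cmdb):
    """Step 2: walk the service model outward from the alerting element."""
    if element not in cmdb:
        # Assumption 2 fails: the alert came from something nobody documented.
        raise LookupError(f"{element!r} is not in the CMDB")
    impacted, stack = [], list(cmdb[element])
    while stack:
        item = stack.pop()
        if item not in impacted:
            impacted.append(item)
            stack.extend(cmdb.get(item, []))
    return impacted


def open_ticket(element, cmdb):
    """Step 3: build a ticket record; a real ITSM suite would be called here."""
    return {"root_cause": element, "impacted": sorted(impacted_by(element, cmdb))}
```

An alert from `db-01` produces a tidy ticket listing both app servers and the banking service. An alert from an element that was never entered in the CMDB, say a short-lived container, stops the whole process at the lookup. The ticket also has exactly one `root_cause` field, which is the third assumption baked directly into the data structure.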

In today’s IT, precisely none of these assumptions hold true. No matter how much effort and how many auto-discovery tools are thrown at the task, the CMDB will always be a snapshot in time[1]. Jorge Luis Borges famously documented the logical endpoint of this progression:

... In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.

purportedly from Suárez Miranda, Travels of Prudent Men, Book Four, Ch. XLV, Lérida, 1658

There is also a timing factor: what happens if an alert comes in between a change occurring and being documented? Another question is, what happens if operators simply don’t have visibility into part of the infrastructure - say, managed hosting, or outside telco networks? And finally, the big one: what if there is no one root cause? Modern architectures are sufficiently robust and resilient that it’s quite rare for any one macro-event to take them out. What gets you is usually a combination of a number of smaller issues, all occurring together in some unforeseen way.

The whole architecture of BSM is built around assumptions that are less and less true. This is not to say that individual products within that suite don’t have value, but the old BSM model is no longer fit for purpose. The result is an example of “shipping the org chart”: the CMDB is at the core and Remedy is the interface, because that is what the organisation demands. However, you can’t just drape AIOps over the old suite and call it good! Radical changes are required, not weak attempts to shoe-horn existing “IT disciplines” into the new mold.

AIOps represents the algorithmic convergence of ITOM & ITSM. BSM, in contrast, treats them as discrete steps in a sequential process: first monitoring, then correlation against the service model, then ticketing. This is Waterfall thinking applied to IT Ops, where today’s IT infrastructures demand Agile thinking.

The most relevant question for users is, of course, “do I trust a legacy vendor to deliver a new model that is so radically different from what it has built its entire strategy around?”

The answer is simple, because it’s determined by the entire structure and market position of all the Big Four vendors. Like its peers, BMC makes its revenue in the old model of IT. As long as there is money to be made by doing the same things it has always done, there is enormous inertia to work against (the Innovator’s Dilemma in action). It takes an existential threat to disturb that sort of equilibrium. It was not until ServiceNow was seriously threatening the Remedy user base that BMC started to offer SaaS options and subscription pricing. It will take an equivalent upheaval in its business for any legacy vendor to adopt a radically new strategy like AIOps. These days, customers can’t wait for one vendor to see the writing on the wall; they need to move at the speed their customers require.

Much as I would like to believe that we have got BMC running scared, I don’t think that’s the case - so they will continue along their very profitable way. This is of course exactly how it should be! If they were to jump on every new bandwagon, their shareholders would be rightly furious. They absolutely should focus on doing what they do well.

But that does not include doing AIOps. If you’re a practitioner looking at this, I hope it’s obvious who you want to go with: the people who are creating the new model and are steeped in what is required to deliver and adopt it, or the ones who see a keyword trending on Google and write a quick ambulance-chasing blog post - or claim that Remedy is a key part of AIOps.

  1. Which is why BMC’s own automation products have their separate real-time operational data stores, which sync with the CMDB on a schedule. 

Social Professionals

This morning I found an interesting promoted tweet in my timeline. I added some magnification around the bit that caught my attention:

This isn’t interesting so much because of the subject matter - I no longer work for BMC, and even when I did, I had very little to do with Remedy. It’s the logo there, in the magnified area.

Notice how it’s different from the logo at the top of the tweet? The orange one is the new BMC logo, while the blue one is the old logo. The rebranding happened more than a year ago, and though it takes time for a change like that to make its way through all the products, Remedyforce has indeed been rebranded. However, even the product page is confused, with an outdated screenshot (looks like the same one as in the tweet) at the top of the page, but a link to a demo in the sidebar that uses a rebranded screenshot.

This sort of thing happens all too often in large companies, as generalists simply cannot keep up with everything and delegate to specialists. The results, however, can be ugly, as in this case. The web and social media teams are now far removed from people who actually know and understand the products that they are pushing, so they end up using screenshots that may be a year old without even realising it. Worse, maybe they do realise it - web design people may well pick up on the different logos - but don’t have any channel to request updated screenshots in a timely manner.

Startups are different.

At startups people care deeply about what they are doing. I’m sure there are exceptions, people who are just in it for the gamble and the hope of a big payoff on IPO day, but by and large people join startups because they care about solving a particular problem. I just read a fantastic piece by Steve Albini on this very topic:

“Like a bakery opens because a guy wants to make bread. A tavern opens because a guy wants to serve beer to people. That’s why people start businesses."

In this environment, everyone is close enough to everyone else, and is emotionally invested enough, that things like this should not happen.

So what? It’s just a screenshot!

It’s never “just” anything. It’s a symptom of a way of doing things. In a big enough organisation, this sort of disconnect happens all over. R&D gets out of touch with how customers are actually using the products, or what they expect from the next version. Finance has no view into how customers like and expect to pay for the products they use. This is how disruption happens and keeps on happening, even though by this point everyone knows at least the Twitter version of the theory.

Why do you hate BMC???

I’m not picking on BMC in particular[1], it just happened to be the example that caught my eye today. I know the web and social teams there, and I know they will be mortified when someone brings this to their attention, and work hard to fix it. The problem is not with the people or their professionalism; the problem is with the structure they are placed into.

This gives me the opportunity to trot out one of my favourite quotes:

"A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects."

-- R. A. Heinlein

History has shown over and over that massive centralised command and control looks good in theory, but tends to get messy in practice. The way things work best is not with massive, monolithic structures that attempt to do everything. Instead, look for small teams of people who own and care deeply about every aspect of something, and make it easy for them to work well together.

Today this sort of focus is easier than ever, as the technical underpinnings are there to enable good integration between different services. The technical term is “composable services”. Take an example: I work for a startup, but we still need to do expenses. However, we didn’t build or buy some creeping Orrible thing; we contract with a third-party vendor who takes care of that. They give us a fantastic app that we can use to take pictures of receipts; then the app OCRs them, we tag them, and we get reimbursed. It’s fantastic.

Same thing with travel: we have a service that takes care of all of that, giving users a pleasant experience while delivering low prices (I checked) and compliance with company policies.


Wait, didn’t you just undermine your own argument?

It might look like I just contradicted myself. I started out railing against the separate web and social media teams that are too far away from the product teams, but still within the same company. Then I started praising actual external companies, that aren’t even under the same company umbrella! So which is it: is specialisation good, or bad?

The key difference is in the Steve Albini quote above. People who care deeply about something focus on that one thing. The people at our travel service care deeply about that, and when I had some questions during the early days of adoption, they were answered rapidly and in a way that made it clear to me that I was dealing with someone who really cared and knew what they were talking about, not someone who was just going through the motions or delivering against a number they had been given.

Conclusion (finally!)

Social media represent the public face of an organisation. Handing that over to professionals may seem like a good idea, but ultimately it’s a self-defeating move. Most social media pros are good at social media. If you go looking for advice about how to get more reach for your blog posts or whatever, you quickly find that it’s all inside baseball: people using social media to promote their blogs about social media, so they can attend events about social media and discuss the nuts & bolts of social media.

If you want to use social media to have a conversation about something else, all of this is of relatively limited utility. And if you’re a company, remember that people come to social media to have conversations, not to be sent press releases. Whatever you are selling - bread, beer, or software - your social media “guru” won’t be able to answer questions or jump into conversations if they don’t understand and care about that specific thing.

If you want your social media efforts to be effective, everyone in the company should be doing it, not a small nominated group of pros. This is the only way you can get real engagement and true conversations going.

Reaction to this post - from my own wife, no less - in a follow-up here.

As one chapter ends, another begins

I haven’t blogged in ages - which is a good thing, I hasten to add! It’s just that I have been drinking from the firehose at my new gig. It’s now more than a month since I started at Moogsoft, and I think I can begin to talk about what it all means.

I joined Moogsoft from BMC, but it’s important to note that I did not join BMC; I wound up there as part of the BladeLogic acquisition. BladeLogic was my first startup, and it was a huge amount of fun, a great learning experience, and probably my period of fastest professional development to date. Before BladeLogic I was at Mercury, but I quit to join BladeLogic, due in no small part to the acquisition by HP[1].

What is BladeLogic?

Both BladeLogic[2] Operations Manager (BLOM) and Incident.MOOG are innovative products in their place and time. BladeLogic, together with Opsware, redefined what server configuration management meant, and both companies went on to be acquired by larger “Big 4” IT vendors: Opsware by HP, and a year or so later, BladeLogic by BMC.

For a while both products thrived in their new environment, but in the last few years, both have been flagging. There are many reasons for this, from internal politics at both BMC and HP acting as distraction, to the rise of open-source configuration management tools such as Chef and Puppet. However, I wonder if those tools were simply the end of an era.

This is a known pattern: technologies reach their peak right before they get displaced by their successor technologies. The speed record for steam engines was set in 1938, but a diesel engine had already exceeded that speed in 1936, and by the 1950s diesel locomotives were well on track to replace steam traction[3].

This pattern even agrees with disruption theory: investment continues in the old technology, but it simply becomes overly complex and uneconomical compared to simpler, (initially) low-end competitors.

Pets vs Cattle

This disruption is exactly what I see happening now. The BladeLogic model of painstaking management of single servers is still relevant, but mainly for legacy or specialised systems. Much new-build development is already moving to disposable or even stateless VMs or containers, according to the classic “pets vs cattle” model.


In this brave new world, there is very little need to worry about the configuration of your “cattle” containers. Something like BladeLogic is arguably overkill, and users should instead focus on their provisioning process.

Of course it’s not quite as simple as that. Cloud zealots have been talking about replaceable short-lived cloud servers for a while, but it hasn’t really happened outside of some rather specific situations. The lifetime of VMs in the cloud has often ended up being much longer than this model would suggest, meaning that there is plenty of time for their configurations to drift and to require management in place. Part of the reason for this is that management processes and techniques are still based on the paradigm of a persistent physical server. Much of this Weltanschauung has been adopted wholesale for virtual and cloud-based servers without much reconsideration.

There is also the topic of security and policy compliance to be considered. Given long system lifetimes, it is not sufficient to be able to validate that something was deployed correctly, as its configuration may have drifted away from that known-good state. The desired state may also change as vendors release updates and patches, or new security vulnerabilities are disclosed. In all of these cases, some mechanism is needed to check for differences between the current live configuration and the required configuration, and to bring the system into compliance with that desired state.
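As a rough illustration of that check-and-remediate loop, here is a minimal Python sketch. It is not how BladeLogic or any other product does it; the flat dictionaries and the setting names in the usage note are invented for the example, and a real tool would have to read the live values from the system itself.

```python
def find_drift(desired: dict, live: dict) -> dict:
    """Return {setting: (desired_value, live_value)} for every difference.

    A setting missing on one side is reported with None on that side
    (a simplification: None cannot be used as a real value here).
    """
    drift = {}
    for key in desired.keys() | live.keys():
        want, have = desired.get(key), live.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


def remediate(desired: dict, live: dict) -> dict:
    """Return a copy of the live configuration brought back into compliance."""
    fixed = dict(live)
    for key, (want, _have) in find_drift(desired, live).items():
        if want is None:
            fixed.pop(key, None)  # setting is not part of the desired state
        else:
            fixed[key] = want
    return fixed
```

With a desired state of `{"sshd.PermitRootLogin": "no"}` and a live system that reports `{"sshd.PermitRootLogin": "yes", "telnetd.enabled": "true"}`, `find_drift` flags both settings and `remediate` returns the desired state again. Because the desired state is just another input, a new patch or a newly disclosed vulnerability only means changing `desired` and running the same comparison again.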

However, this is now. As Docker and other container-centric infrastructure technologies become more prevalent, and as business functions continue to migrate from legacy to new-build applications, I would expect that that paradigm will evolve to replaceable plug&play infrastructure components, and do so everywhere, not just at the “unicorn” companies.

What does it all mean?

Lots of smart people are working hard to enable infrastructure to be managed as code. One of the characteristics of code is that you don’t change it in production, you develop it offline, then release it and don’t change it until you overwrite with a new version. The big variables that I think will affect the speed of the transition to this new model are firstly, the rate of replacement of legacy applications, and secondly, the evolution of IT management processes and culture to take advantage of new tools.

BladeLogic itself has the opportunity to evolve to have a role in the new model, of course. Regardless, BladeLogic was a huge part of my career development - and just huge fun, if I’m honest - so I will be watching development of the IT infrastructure management market intently, but no longer from the front lines.

  1. I’d say my fears on that score have been amply borne out. 

  2. The Wikipedia entry for BladeLogic now redirects to BMC, which is not especially helpful. 

  3. Sorry - not sorry. 

Spring is a time for new beginnings

(╯°□°)╯︵ ┻━┻

After 7 years, it's time to move on. Today is my last day at BMC, then I get a whole one day off - and even that only because it's a public holiday - before I start my next gig.

Today I hand in my badge and gun - er, I mean, MacBook - and on Monday morning, bright and early, I will be in San Francisco to start my new job at Moogsoft. I could not be more excited!

It's definitely a wrench to leave after so many years, and so many different roles. In particular, I have to admit that I will miss the view from BMC's Milan offices, with the Alps in the background:

Onwards to new adventures!