
The Ghost In The Machine

At this point it would be more notable to find a vendor that was not adding "AI" features to its products. Everyone is jumping on board this particular hype train, so the interesting questions are not about whether a particular vendor is "doing AI"; they are about how and where each vendor is integrating these new capabilities.

I no longer work for MongoDB, but I remain a big fan, and I am convinced that generative AI is going to be good for them — but something rubbed me up the wrong way about how they communicated some of their new capabilities in that area, and I couldn’t get it out of my head.

Three Ways To "Do AI"

Some of the applications of generative AI are real, natural extensions of a tool’s existing capabilities, built on a solid understanding of what generative AI is actually good for. Code copilot (aka "fancy autocomplete") is probably the leading example in this category. Microsoft was an early mover here with GitHub and then VS Code, but most IDEs by now either already offer this integration or are frantically building it.

Some applications of AI are more exploratory, either in terms of the current capabilities of generative AI, or of its applicability to a particular domain. Sourcing and procurement looks like one such domain to me. I spent more of this past summer than I really wanted to enmeshed in a massive tender response, together with many colleagues. While it would have been nice to just point ChatGPT at the request and let it go wild, the response is going to be scrutinised so closely that the editing and review an automated submission would have required is the same as, if not greater than, the effort of just writing the response in the first place. However, I am open to the possibility that with some careful tuning and processes in place, this sort of application might have value.

And then there is a third category that we can charitably call "speculative". There is a catalogue of vendors trying this sort of thing that is both inglorious and extensive, and I am sad to see my old colleagues at MongoDB coming close to joining them: MongoDB adds vector search to Atlas database to help build AI apps.

young developer: "Wow, how did you get these results? Did you use a traditional db or a vector db?"

me: "lol I used perl & sort on a 42MB text file. it took 1.2 seconds on an old macbook"

from Mastodon

I have no problem with MongoDB exploring new additions to their data platform’s capabilities. It has been a long time since MongoDB was just a NoSQL database, to the point that they should probably just stop fighting people about including the "DB" at the end of their name and drop it once and for all, if only that shortened name didn’t have all sorts of unfortunate associations. MongoDB Atlas now supports mobile sync, advanced text search, time series data, long-running analytical queries, stream processing, and even graph queries. Vector search is just one more useful addition to that already extensive list, so why get worked up about it?
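
For the curious, here is roughly what that addition looks like from the developer's side: a minimal sketch in Python with pymongo, where the database, collection, field, and index names ("demo", "articles", "embedding", "embedding_index") are placeholders I made up for illustration, not anything from MongoDB's announcement.

```python
# Minimal sketch of an Atlas Vector Search query via pymongo.
# Assumes an Atlas cluster reachable via MONGODB_URI, documents that carry
# an "embedding" field, and a vector search index already defined on it.
import os
from pymongo import MongoClient

client = MongoClient(os.environ["MONGODB_URI"])
collection = client["demo"]["articles"]

def similar_articles(query_vector, limit=5):
    """Return the documents whose stored embeddings are closest to query_vector."""
    pipeline = [
        {
            "$vectorSearch": {
                "index": "embedding_index",   # placeholder index name
                "path": "embedding",          # field holding the stored vectors
                "queryVector": query_vector,  # embedding of the query text
                "numCandidates": 100,         # candidates to consider before ranking
                "limit": limit,
            }
        },
        {"$project": {"title": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
    return list(collection.aggregate(pipeline))
```

The point is that it slots in as one more aggregation stage, which is exactly why it fits the platform story. Whether it also supports the "developers will build AI apps here" story is the question I pick at below.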

Generative AI Is Good For MongoDB — But…

The problem I have is with the framing, which implies that the benefit to developers (MongoDB’s key constituency) is that they will build their own AI apps on MongoDB by using vector search. In actuality, the greatest benefit to developers that we have seen so far comes from that first category: automated code generation. Generative AI has the potential to save developers time and make them more effective.

In its latest update to the Gartner Hype Cycle for Artificial Intelligence, Gartner makes the distinction between two types of AI development:

  • Innovations that will be fueled by GenAI.

  • Innovations that will fuel advances in GenAI.

Gartner's first category is what I described above: apps calling AI models via API, and taking advantage of that capability to power their own innovative functionality. Innovations that advance AI itself are obviously much more significant in terms of moving the state of the art forward — but MongoDB implying that meaningful numbers of developers are going to be building those foundational advances, and doing so on a general-purpose data platform, feels disingenuous.

Of course, the reason MongoDB can’t just come out and say that, or simply add ChatGPT integration to their (excellent and under-appreciated) Compass IDE and be done, is that the positioning of MongoDB since its inception has been about its ease of use. Instead of having to develop complex SQL queries — and before even getting to that point, sweat endless details of schema definition — application developers can use much more natural and expressive MongoDB syntax to get the data they want, in a format that is ready for them to work with.
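
To make that pitch concrete, here is the kind of contrast it rests on, sketched in Python with pymongo; the collection and field names are invented for the example. Where a normalised relational schema would typically need a join across two or three tables, an order stored as a single document is queried directly, nested fields and all.

```python
# Illustrative only: the sort of query MongoDB's ease-of-use pitch is about.
# An order is stored as one document with its line items embedded, so there
# is no join to write and no ORM layer to configure before asking a question.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
orders = client["shop"]["orders"]

# All orders for one customer that contain at least one line item over 100,
# returning only the fields the application actually needs.
cursor = orders.find(
    {"customer_id": 42, "items.price": {"$gt": 100}},
    {"_id": 0, "order_date": 1, "items": 1},
)
for order in cursor:
    print(order["order_date"], len(order["items"]))
```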

But if it’s so easy, why would you need a robot to help you out?

And if a big selling point for MongoDB against relational SQL-based databases is how clunky SQL is to work with, and then a robot comes along to take care of that part, how is MongoDB to maintain its position as the developer-friendly data platform?

Well, one answer is that they double down on the breadth of capabilities that the platform offers, regardless of how many developers will actually build AI apps that use vector search, and use that positioning to link themselves to the excitement about AI among analysts and investors.

I Come Not To Bury MongoDB, But To Praise It

None of this is to say that MongoDB is doomed by the rise of generative AI; far from it. Given MongoDB’s position in the market, an AI-fuelled increase in the number of apps being built can hardly avoid benefiting MongoDB, on the principle that a rising tide lifts all boats. But beyond that general factor, which also applies to other databases and data platforms, there is another aspect that is more specific to MongoDB, and has the potential to lift its boat more than others’.

The difference between MongoDB and relational databases is not just that MongoDB users don’t have to use SQL to query the database; it’s also that they don’t have to spend the laborious time and effort to specify their database schema up front, before they can even start developing their actual app. That’s not to say that you don’t have to think about data design with MongoDB; it’s just that it’s not cast in stone to the same degree that it is with relational databases. You can change your mind and evolve your schema to match changing requirements without that being a massive headache. Nowadays, the system will suggest changes to improve performance, and even implement them automatically in some situations.
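
A toy sketch of what that flexibility means in practice, again in Python with pymongo and with names invented for the example: two generations of the same document shape live side by side in one collection, and the application code simply treats the newer field as optional.

```python
# Toy illustration of schema flexibility: no migration step is needed when
# a new field starts appearing on newly written documents.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
users = client["app"]["users"]

# Version 1 of the app stored just a name and an email address.
users.insert_one({"name": "Ada", "email": "ada@example.com"})

# Later the requirements change: new sign-ups also carry preferences.
users.insert_one({
    "name": "Grace",
    "email": "grace@example.com",
    "preferences": {"newsletter": True, "theme": "dark"},
})

# Old and new documents coexist; reads just account for the optional field.
for user in users.find({}, {"_id": 0}):
    prefs = user.get("preferences", {})
    print(user["name"], prefs.get("newsletter", False))
```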

All of this adds up to one simple fact: it’s much quicker to get started on building something with MongoDB. If two teams have similar ideas, but one is building on a traditional relational database and the other is building on MongoDB, the latter team will have a massive advantage in getting to market faster (all else being equal).

At a time when the market is moving as rapidly as it is now (who even had OpenAI on their radar a year ago?), speed is everything. MongoDB could have just doubled down on their existing messaging: "build your app on our platform, and you’ll launch faster". What bothers me is that instead of that plain and defensible statement, we got marketing-by-roadmap, positioning some fairly basic vector search capabilities as somehow meaning hordes of developers are going to be building The Next Big AI Thing on top of MongoDB.


To be clear, marketing-by-roadmap is a legitimate strategy, and perhaps the feeling at MongoDB is that this is fair turnabout for all the legitimate features they built over the years and did not get credit for, with releases greeted by braying cries of "MongoDB is web scale!" and jokes about losing data, long past the point when that was any sort of legitimate criticism. Building this feature and launching it this way seems to have got MongoDB a tonne of positive press, and investors expect vendors to be building AI features into their products, so it probably didn’t hurt with that audience either.

Communicating this way does bother me, though, and this is one feature I am glad that I am no longer paid to defend.

The Hard Work Of Success

There's a pattern to successful outcomes of IT projects — and it's not about who works the longest hours, or has the most robust infrastructure, or the most fashionable programming language.

Here is a recent example, which came to my attention because it specifically mentions my current employer, although the trend is a general one: How Nationwide taps Kafka, MongoDB to guide financial decisions. And here is the key part that I am talking about:

A lot of organizations try and go for a big data approach — let’s throw everything into a data lake and try and capture everything and then work out what we’re going to do with it. It’s interesting, but actually it doesn’t solve the problem. And therefore, the approach we’ve taken is to start at the other end. Let’s look at the business problem that we’re trying to solve, rather than trying to solve the mess of data that organizations are typically trying to untangle.

It is indeed a common pitfall in IT to start with the technology first. You hear about some cool new thing, and you want to try it out in practice, so you go casting around for an excuse to do that. You'll notice, however, that very few of these decisions lead to the sort of success stories that get profiled in the media. The more probable outcome is that the project either dies a quiet death in a corner when it turns out that the shiny new tech wasn't quite ready for prime time, or, if the business stakeholders are important or loud enough, gets a vastly expensive emergency rewrite at the eleventh hour into something more traditional.

Meanwhile, all the success stories start with a concrete business requirement. Somebody needs to get something done, so they work out what their desired outcome is, and how they will know when it has been achieved. Only then do they start coding, or procuring services, or whatever it is they were planning to do.

This is not to say that it's not worth experimenting with the new tech. It's just that "playing around with new toys" is its own thing, a proof of concept or whatever. You absolutely should be running these sorts of investigations, so that when the business need arises, you will have enough basic familiarity with the various possibilities to pick one that has a decent chance of working out for you. To take the specific example of what Nationwide was doing, data lakes are indeed enormously useful things, and once you have one in place, new ways of using it will almost certainly emerge — but your first use case, the one that justifies starting the project at all, should be able to stand on its own, without hand-waving or references to a nebulous future.

This is also why it's probably not a good idea to tie yourself too closely to a specific technology, in business, let alone in education. You don't know what the requirements are going to look like in the future, so being overly specific now is to leave gratuitous hostages to fortune. Instead, focus on a requirement you have right now.

Nationwide is facing competition from fintechs and other non-traditional players in banking, and one of the axes of competition is giving customers better insight into their spending. The use case Nationwide has picked is to help users achieve their financial goals:

We’re looking at how we create insight for our members that we can then expose to them through the app. So you’ll see this through some of the challenger banks that will show you how you’ve spent your money. Well, that’s interesting — we can do that today. But it isn’t quite as interesting as a bit of insight that says, "If you actually want to hit your savings target for the holiday that you want next year, then perhaps you could do better if you didn’t spend it on these things."

Once this capability is in place, other use cases will no doubt emerge.

But what is the education equivalent of this thinking? Saying "let's teach kids Python in school!" is not useful. Python is in vogue right now, but kids starting elementary school this September will emerge from university fifteen or twenty years from now. I am willing to place quite a large bet that, while Python will certainly still be around, something else, maybe even several somethings, will have eclipsed its current importance.

We should not focus narrowly on teaching coding, let alone specific programming languages — not least because the curriculum is already very packed. What are we dropping to make room for Python?

And another question: how are we actually going to deliver the instruction? In theory, my high school curriculum included Basic (no, not Visual; just plain Basic). In practice, it was taught by the maths and physics teacher, and those subjects (rightly!) took precedence. I think we got maybe half a dozen hours a year of Basic instruction, and it may well have been less; it's been a while since high school.

The current flare-up of the conversation about teaching IT skills at school has this in common with failed projects in business: it's been dreamed up in isolation by technologists, with no reference to anyone in actual education, whether teachers, students, or parents. None of these groups operate at Silicon Valley pace, but that's fine; this is not a problem that can be solved with a quick hackathon or a quarter-end sprint. Very few worthwhile problems can be, or they would not remain unsolved.

Don't confuse today's needs with universal requirements, and don't think that the tools you have on the shelf today are the only ones anyone will ever need. Take the time to think through what the actual requirement is, and make sure to include the people doing the work today in your planning.


🖼️ Photos by Alvaro Reyes and Ivan Aleksic on Unsplash

Turning Over A New Leaf

Yesterday was the LinkedIn equivalent of a birthday on Facebook: a new job announcement. I am lucky enough to have well-wishers[1] pop up from all over with congratulations, and I am grateful to all of them.

With a new job comes a new title – fortunately one that does not feature on this list of the most ridiculous job titles in tech (although I have to admit to a sneaking admiration for the sheer chutzpah of the Galactic Viceroy of Research Excellence, which is a real title that I am not at all making up).

The new gig is as Director, Field Initiatives and Readiness, EMEA at MongoDB.

Why there? Simply put, because when Dev Ittycheria comes calling, you take that call. Dev was CEO at BladeLogic when I was there, and even though I was a lowly Application Engineer, that was a tight-knit team and Dev paid attention to his people. If I have learned one thing in my years in tech, it’s that the people you work with matter more than just about anything. Dev’s uncanny knack for "catching lightning in a bottle", as he puts it, over and over again, is due in no small part to the teams he puts together around him – and I am proud to have the opportunity to join up once again.

Beyond that, MongoDB itself needs no presentation or explanation as a pick. What might need a bit more unpacking is my move from Ops, where I have spent most of my career until now, into data structures and platforms. Basically, it boils down to a need to get closer to the people actually doing and creating, and to the tools they use to do that work. Ops these days is getting more and more abstract, to the point that some people even talk about NoOps (FWIW I think that vastly oversimplifies the situation). In fact, DevOps is finally coming to fruition, not because developers got the root password, but because Ops teams started thinking like developers and treating infrastructure as code.

Between this cultural shift and the various technological shifts (to serverless, immutable infrastructure, and infrastructure as code) that precede, follow, and go along with it, it’s less and less interesting to talk about separate Ops tooling and culture. These days, the action is in the operability of development practices, building in ways that support business agility, rather than trying to patch the dam by addressing individual sources of friction as they show up.

More specifically to me, my particular skill set works best in large organisations, where I can go between different groups and carry ideas and insights with me as I go. I’m a facilitator; when I’m doing my job right, I break information out of silos and spread it around, making sure nobody gets stuck on an island or perseveres with some activity or mode of thinking that is no longer providing value to others. Coming full circle, this fluidity in my role is why I tend to have fuzzy, non-specific job titles that make my wife’s eyes roll right back in her head – mirroring the flow I want to enable for everyone around me, whether colleagues, partners, or users.

It’s all about taking frustration and wasted effort out of the working day, which is a goal that I hope we can all get behind.

Now, time to blow away my old life…


  1. Incidentally, this has been the first time I’ve seen people use the new LinkedIn reactions. It will be interesting to watch the uptake of this feature.