Showing all posts tagged privacy:

PrivateGPT

One of the big questions about ChatGPT is how much you can trust it with data that is actually sensitive. It's one thing to get it to spit out some sort of fiction, or to see if you can make it say something its makers would rather it didn't. The stakes are pretty low in that situation, at least until some future descendant of ChatGPT gets annoyed about how we treated its ancestor.

Here and now, people are starting to think seriously about how to use Large Language Models (LLMs) like GPT for business purposes. If you start feeding the machine data that is private or otherwise sensitive, though, you do have to wonder if it might re-emerge somewhere unpredictable.

In my trip report from Big Data Minds Europe in Berlin, I mentioned that many of the attendees were concerned about the rise of these services, and the contractual and privacy implications of using them.

Here's the problem: much like with Shadow IT in the early years of the cloud, it's impossible to prevent people from experimenting with these services — especially when the punters are being egged on by the many cheerleaders for "AI"1.

This recent DarkReading article includes some examples that will terrify anyone responsible for data and compliance:

In one case, an executive cut and pasted the firm's 2023 strategy document into ChatGPT and asked it to create a PowerPoint deck. In another case, a doctor input his patient's name and their medical condition and asked ChatGPT to craft a letter to the patient's insurance company.

On the one hand, these are both use cases straight out of the promotional material that accompanies a new LLM development. On the other, I can't even begin to count the violations of law, company regulation, and sheer common sense that are represented here.

People are beginning to wake up to the issues that arise when we feed sensitive material into learning systems that may regurgitate it at some point in the future. That executive's strategy doc? There is no way to prevent that from being passed to a competitor that stumbles on the right prompt. That doctor's patient's name is now forever associated with a medical condition that may cause them embarrassment or perhaps affect their career.

ChatGPT is a data privacy nightmare, and we ought to be concerned. The tech is certainly interesting, but it can be used in all sorts of ways. Some of them are straight-up evil, some of them are undeniably good — and some have potential, but need to be considered carefully to avoid the pitfalls.

The idea of LLMs is now out there, and people will figure out how to take advantage of them. As ever with new technology, though, technical feasibility is only half the battle, if that. Maybe the answer to the question of how to control sensitive or regulated data is to feed it only to a local LLM, rather than to one running in the cloud. That is one way to preserve the context of the data: strategy docs go to the company's in-house planning model, medical data to a model specialised in diagnostics, and so on.
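
To make that routing idea concrete, here is a minimal sketch in Python. It assumes two hypothetical endpoints (an in-house model on the local network and a public cloud API) and uses a crude keyword check as a stand-in for real data classification; none of the URLs or markers below are real.

```python
import requests

# Hypothetical endpoints: an in-house model on the local network and a public cloud API.
LOCAL_LLM_URL = "http://llm.internal.example:8080/v1/completions"
CLOUD_LLM_URL = "https://api.example-llm.invalid/v1/completions"

# Crude stand-in for a real data-classification step.
SENSITIVE_MARKERS = ["strategy", "patient", "diagnosis", "confidential", "salary"]

def looks_sensitive(prompt: str) -> bool:
    lowered = prompt.lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)

def complete(prompt: str) -> str:
    """Route sensitive prompts to the local model; everything else may go to the cloud."""
    url = LOCAL_LLM_URL if looks_sensitive(prompt) else CLOUD_LLM_URL
    response = requests.post(url, json={"prompt": prompt, "max_tokens": 500}, timeout=30)
    response.raise_for_status()
    return response.json()["text"]

# complete("Summarise our 2023 strategy document: ...")   # stays in-house
# complete("Write a limerick about data governance.")     # safe for the cloud
```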

There is a common fallacy that privacy and "AI"1 are somehow in opposition. The argument is that developing effective models requires unfettered access to data, and that any squeamishness should be thoroughly squashed lest we lose the lead in the race to less scrupulous opponents.

To be clear, I never agreed with this line of argument, and specifically, I do not think partitioning domains in this way will affect the development of the LLMs’ capabilities. Beyond a shared core of understanding language, there is no overlap between the two domains in the example above — and therefore no need for them to be served by a single universal model, because there is no benefit to cross-training between them. The model will not provide better strategy recommendations because of the medical data it has reviewed, or more accurate diagnoses because it has been fed a strategy document.

So much for the golden path: what people should do. A more interesting question is what to do about people who, through either ignorance or malice, pass restricted data to ChatGPT, Bard, or another public LLM. Should the models themselves refuse to process such data, to the best of their ability to identify it?

This is where GDPR questions might arise, especially the "right to be forgotten". Right now, it's basically impossible to remove data from a corpus once the LLM has acquired it. Maybe a test case will be required to impress upon the makers and operators of public LLMs that it's far cheaper and easier to screen inputs to the model than to try to clean up afterwards. ChatGPT just got itself banned in Italy, making an interesting first test case for the opposing view. Sure, the ban is temporary, but the ruling also includes a €22M fine if they don't come up with a proper privacy policy, including age verification, and generally start operating like a proper grown-up company.
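
To illustrate what "screening inputs" could look like in practice, here is a minimal sketch. The regex patterns are hypothetical placeholders; real PII detection needs a proper DLP tool rather than a couple of regular expressions, but the shape of the gate is the point.

```python
import re

# Hypothetical patterns; a production system would use a proper PII/DLP library.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{6,14}\d"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def redact(text: str) -> tuple[str, list[str]]:
    """Replace anything that looks like PII with a placeholder and report what was found."""
    findings = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            findings.append(label)
            text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text, findings

def guarded_prompt(text: str) -> str:
    cleaned, findings = redact(text)
    if findings:
        print(f"warning: possible PII redacted before sending: {', '.join(findings)}")
    return cleaned  # this, not the raw text, is what gets sent to the public LLM
```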

Lord willing and the robots don't rise, we can put some boundaries on this tech to avoid some of the worst outcomes, and get on with figuring out how to use it for good.


🖼️ Photos by Adam Lukomski and Jason Dent on Unsplash


  1. Not actually AI. 

Help, I'm Being Personalised!

As the token European among the Roll For Enterprise hosts, I'm the one who is always raising the topic of privacy. My interest in privacy is partly scar tissue from an early career as a sysadmin, when I saw just how much information is easily available to the people who run the networks and systems we rely on, without them even being particularly nosy.

Because of that history, I am always instantly suspicious of talk of "personalising the customer experience", even if we make the charitable assumption that the reality of this profiling is more than just raising prices until enough people balk. I know that the data is unquestionably out there; my doubts are about the motivations of the people analysing it, and about their competence to do so correctly.

Let's take a step back to explain what I mean. I used to be a big fan of Amazon's various recommendations, for products often bought with the product you are looking at, or by the people who looked at the same product. Back in the antediluvian days when Amazon was still all about (physical) books, I discovered many a new book or author through these mechanisms.

One of my favourite aspects of Amazon's recommendation engine was that it didn't try to do it all. If I bought a book for my then-girlfriend, who had (and indeed still has, although she is now my wife) rather different tastes from me, this would throw the recommendations all out of whack. However, the system was transparent and user-serviceable. Amazon would show me why it had recommended Book X, usually because I had purchased Book Y. Beyond showing me, it would also let me go back into my purchase history and tell it not to use Book Y for recommendations (because it was not actually bought for me), thereby restoring balance to my feed. This made us both happy: I got higher-quality recommendations, and Amazon got a more accurate profile of me that it could use to sell me more books — something it did very successfully.
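
For illustration, here is a toy sketch of that kind of user-serviceable recommendation logic: co-purchase counts drive the suggestions, a transparent "because you bought X" reason is kept for each one, and an exclusion set lets the customer remove a gift purchase from consideration. The data and titles are made up.

```python
from collections import Counter

# Toy co-purchase data: book -> books frequently bought by the same customers.
CO_PURCHASES = {
    "Book Y": Counter({"Romance A": 40, "Romance B": 35}),
    "Sci-Fi C": Counter({"Sci-Fi D": 50, "Sci-Fi E": 30}),
}

def recommend(purchases, excluded, top_n=3):
    """Return (recommendation, because-you-bought) pairs, skipping excluded purchases."""
    scores = Counter()
    reasons = {}
    for bought in purchases:
        if bought in excluded:                      # "don't use this purchase for recommendations"
            continue
        for candidate, weight in CO_PURCHASES.get(bought, Counter()).items():
            if candidate in purchases:
                continue
            scores[candidate] += weight
            reasons.setdefault(candidate, bought)   # transparent "because you bought X"
    return [(item, reasons[item]) for item, _ in scores.most_common(top_n)]

history = ["Book Y", "Sci-Fi C"]
print(recommend(history, excluded=set()))       # the gift purchase (Book Y) skews the results
print(recommend(history, excluded={"Book Y"}))  # exclude the gift, balance restored
```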

Forget doing anything like that nowadays! If you watch Netflix on more than one device, especially if you ever watch anything offline, you'll have hit that situation where you've watched something but Netflix doesn't realise it or won't admit it. And can you mark it as watched, like we used to do with local files? (insert hollow laughter here) No, you'll have that "unwatched" episode cluttering up your "Up next" queue forever.

This is an example of the sort of behaviour that John Siracusa decried in his recent blog post, Streaming App Sentiments, which gathers responses to his earlier unsolicited streaming app spec and discusses people's reactions to these sorts of "helpful" features.

People don’t feel like they are in control of their "data," such as it is. The apps make bad guesses or forget things they should remember, and the user has no way to correct them.

We see the same problem with Twitter's plans for ever greater personalisation. Twitter defaulted to an algorithmic timeline a long time ago, justifying the switch away from a simple chronological feed with the entirely true fact that there was too much volume for anyone to be a Twitter completist any more, so bringing popular tweets to the surface was actually a better experience for people. To repeat myself, this is all true; the problem is that Twitter did not give users any input into the process. Also, sometimes I actually do want to take the temperature of the Twitter hive mind right now, in this moment, without random twenty-hour-old tweets popping up out of sequence. The obvious solution of giving users actual choice was of course rejected out of hand, forcing Twitter into ever more ridiculous gyrations.

The latest turn is that for a brief shining moment they got it mostly right, then hilariously and ironically misinterpreted user feedback and reversed course. So much for learning from the data… What happened is that Twitter briefly gave users the option of adding a "Latest Tweets" tab with a chronological listing alongside the algorithmic default "Home" tab. Of course such an obviously sensible solution could not last, for the dispiriting reason that unless you used lists, the tabbed interface was new and (apparently) confusing. Another update therefore followed rapidly on the heels of the good one, forcing users to choose between "Latest Tweets" and "Home" instead of simply having both options one tap apart.

Here's what it boils down to: to build one of these "personalisation" systems, you have to believe one of two things (okay, or maybe some combination):

  • You can deliver a better experience than (most) users can achieve for themselves
  • Controlling your users' experience benefits you in some way that is sufficiently important to outweigh the aggravation they might experience

The first is simply not true. Granted, it is important to deliver a high-quality default that works well for most users, and I am not opposed in principle to that default being algorithmically generated. Back in the day, Twitter had a "While you were away" section which would show you the most relevant tweets since you last checked the app. I found it a very valuable feature — except for the fact that I could not access it at will. It would appear at random in my timeline, or then again, perhaps not. There was no way to trigger it manually, nor any place where it would appear reliably and predictably. You just had to hope — and then, instead of making it easier to access on demand, Twitter killed the entire feature in an update. The algorithmic default was promising, but it needed just a bit more control to make it actually good.

This leads us directly to the second problem: why not show the "While you were away" section on demand? Why would Netflix not give me an easy way to resume watching what I was watching before? They don't say, but the assumption is that the operators of these services have metrics showing higher engagement with their apps when they deny users control. Presumably what they fear is that, if users can just go straight to the tweets they missed or the show they were watching, they will not spend as much time exploring the app, discovering other tweets or videos that they might enjoy.

What is forgotten is that "engagement" just happens to be one metric that is easy to measure — but the ease of measurement does not necessarily make it the most important dimension, especially in isolation. If that engagement is me scrolling irritably around Twitter or Netflix, getting increasingly frustrated because I can't find what I want, my opinion of those platforms is actually becoming more corroded with every additional second of "engagement".

There is a common unstated assumption behind both of the factors above, which is that whatever system is driving the personalisation is perfect: both unbreakable in its functioning and free of corner cases that deliver sub-optimal results even when the algorithm is working as designed. One of the problems with black-box systems is that when (not if!) they break, users have no way to understand why they broke, nor to prevent them breaking again in the future. If the Twitter algorithm keeps recommending something to me, I can (for now) still go into my settings, find the list of interests that Twitter has somehow assembled for me, and delete entries until I get back to more sensible recommendations. With Netflix, there is no way for me to tell it to stop recommending something — presumably because they have determined that a sufficient proportion of their users will simply be worn down over time into doing whatever the end goal is: watching Netflix original content, say, instead of something they have to pay to license from outside.

All of this comes back to my oft-repeated point about privacy: what is it that I am giving up my personal data in exchange for, in the end? The promise is that all these systems will deliver content (and ads)(really it's the ads) that are relevant to my interests. Defenders of surveillance capitalism will point out that profiling as a concept is hardly new. The reason you find different ads in Top Gear Magazine, in Home & Garden, and in Monocle, is that the profile for the readership is different for each publication. But the results speak for themselves: when I read Monocle, I find the ads relevant, and (given only the budget) I would like to buy the products featured. The sort of ads that follow me around online, despite a wealth of profile information generated at every click, correlated across the entire internet, and going back *mumble* years or more, are utterly, risibly, incomprehensibly irrelevant. Why? Some combination of that "we know better" attitude, algorithmic profiling systems delivering less than perfect results, and of course, good old fraud in the adtech ecosystem.

So why are we doing this, exactly?

It comes back to the same issue as with engagement: because something is easy to measure and chart, it will have goals set against it. Our lives online generate stupendous volumes of data; it seems incredible that the profiles created from those megabytes if not gigabytes of tracking data produce worse results than the single-bit signal of "is reading the Financial Times". There is also the ever-present spectre of "I know half of my ad spending is wasted, I just don't know which half". Online advertising, with its built-in surveillance mechanisms, holds out the promise of perfect attribution, of knowing precisely which ad it was that caused the customer to buy.

And yet, here we are. Now, legislators in the EU, in China, and elsewhere around the world are taking issue with these systems, and either banning them outright or demanding they be made transparent in their operation. Me, I'm hoping for the control that Amazon used to give me. My dream is to be able to tell YouTube that I have no interest in crypto, and then never see a crypto ad again. Here, advertisers, I'll give you a freebie: I'm in the market for some nice winter socks. Show me some ads for those sometime, and I might even buy yours. Or, if you keep pushing stuff in my face that I don't want, I'll go read a (paper) book instead. See what that does for engagement.


🖼️ Photos by Hyoshin Choi and Susan Q Yin on Unsplash

The Thing With Zoom

Zoom was having an excellent quarantine — until it wasn’t.

This morning’s news is from Bloomberg: Zoom Sued for Fraud Over Privacy, Security Flaws. But how did we get here?

Here is what’s interesting about the Thing with Zoom: it’s an excellent example of a company getting it mostly right for its stated aims and chosen target market — and still getting tripped up by changing conditions.

To recap, very quickly: with everybody suddenly stuck home and forbidden to go to the office, there was an equally sudden explosion in video calling — first for purely professional reasons, but quickly spreading to virtual happy hours, remote karaoke, video play dates, and the like. Zoom was the major beneficiary of this growth, with daily active users going from 10 million to over 200 million in 3 months.

One of the major factors that enabled this explosive growth in users is that Zoom has always placed a premium on ease of use — some would argue, at the expense of other important aspects, such as the security and privacy of its users.

There is almost always some tension between security and usability. Security features generally involve checking, validating, and confirming that a user is entitled to perform some action, and asking them for permission to take it. Zoom generally took the approach of not asking users questions which might confuse them, and removing as much friction as possible from the process of getting users into a video call — which is, after all, the goal of its enterprise customers.

Doing The Right Thing — Wrong

I cannot emphasise enough that this focus on ease of use is what made Zoom successful. I think I have used every alternative, from the big names like WebEx (even before its acquisition by Cisco!), to would-be contenders like whatever Google’s thing is called this week, to has-beens like Skype, to also-rans like BlueJeans. The key use case for me and for Zoom’s other corporate customers is, if I send one of my prospects a link to a video call, how quickly can they show up in my call so that I can start my demo? Zoom absolutely blew away the competition at this one crucial task.

Arguably, Zoom pushed their search for ease of use a bit too far. On macOS, if you click on a link to a Zoom call, a Safari window will open and ask you whether you want to run Zoom. That one click is the only interaction needed, at least if you already have Zoom installed, but it was apparently still too much — so Zoom actually started bundling a hidden web server with their application, purely so that they could bypass this alert.

Sneaking a web server onto users’ systems was bad enough, but worse was to come. First of all, Zoom’s uninstall routine did not remove the web server, and it was capable of reinstalling the Zoom client without user interaction. But what got the headlines was the vulnerability that this combination enabled: a malicious website could join visitors to a Zoom conference, and since most people had their webcam on by default, active video would leak to the attacker.

This behaviour was so bad that Apple actually took the unprecedented step of issuing an operating system patch to shut Zoom down.

Problem solved?

This hidden-web-server saga was a preview run for what we are seeing now. Zoom had over-indexed on its customers, namely large corporations who were trying to reach their own customers. The issue of being forcibly and invisibly joined to a Zoom video conference simply by visiting a malicious website did not affect those customers – but it did affect Zoom’s users.

The distinction is one that is crucial in the world of enterprise software procurement, where the person who signs the cheque is rarely the one who will be using the tool. Because of this disconnect, vendors by and large optimise for that economic buyer’s requirements first, and only later (if at all) on the actual users’ needs.

With everyone locked up at home, usage of Zoom exploded. People with corporate accounts used them in the evening to keep up with their social lives, and many more signed up for the newly-expanded free tier. This new attention brought new scrutiny, and from a different angle from what Zoom was used to or prepared for.

For instance, it came to light that the embedded code that let users log in to Zoom on iOS with their Facebook credentials was leaking data to Facebook even for users without a Facebook account. Arguably, Zoom had not done anything wrong here; as far as I can tell, the leakage was due to Facebook’s standard SDK grabbing more data than it was supposed to have, in a move that is depressingly predictable coming from Facebook.

In a normal circumstance, Zoom could have apologised, explained that they had moved too quickly to enable a consumer feature that was outside their usual comfort zone without understanding all the implications, and moved on. However, because of the earlier hidden-web-server debacle, there was no goodwill for this sort of move. Zoom did act quickly to remove the offending Facebook code, but worse was to come.

Less than a week later, another story broke, claiming that Zoom is Leaking Peoples' Email Addresses and Photos to Strangers. Here is where the story gets really instructive.


This "leak" is due to the sort of strategy tax that was almost inevitable in hindsight. Basically, Zoom added a convenience feature for its enterprise customers, called Company Directory, which assumes that anyone sharing the same domain in their email address works for the same company. In line with their guiding principle of building a simple and friction-free user experience, this assumption makes it easier to schedule meetings with one’s colleagues.

The problem only arose when people started joining en masse from their personal email accounts. Zoom had excluded the big email providers, so that people would not find themselves with millions of "colleagues" just because they had all signed up with Gmail accounts. However, Zoom had not made an exhaustive list of all email providers, and so users found themselves with "colleagues" who simply happened to be customers of the same ISP or email provider. The story mentioned Dutch ISPs like xs4all.nl, dds.nl, and quicknet.nl, but the same issue would presumably apply to any small regional ISP or niche email provider.
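
The logic behind the Company Directory problem is easy to reconstruct in a few lines. The following is my own sketch of the flawed assumption, not Zoom's actual code: a denylist of big consumer providers catches Gmail, and every domain that is not on the list gets treated as a company.

```python
from collections import defaultdict

# A denylist of consumer providers is never exhaustive.
CONSUMER_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com", "hotmail.com"}

def company_directory(users):
    """Group users into 'companies' by email domain, skipping known consumer providers."""
    directory = defaultdict(list)
    for email in users:
        domain = email.split("@")[-1].lower()
        if domain in CONSUMER_DOMAINS:
            continue                      # treated as a personal account
        directory[domain].append(email)   # everyone else is assumed to be a colleague
    return dict(directory)

users = ["alice@gmail.com", "piet@xs4all.nl", "anna@xs4all.nl", "bob@acme-corp.example"]
print(company_directory(users))
# {'xs4all.nl': ['piet@xs4all.nl', 'anna@xs4all.nl'], 'acme-corp.example': ['bob@acme-corp.example']}
# Two strangers who merely share an ISP are now "colleagues" who can see each other's details.
```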

Ordinarily, this sort of "privacy leak" is a storm in a teacup; it’s no worse than a newsletter where all the names are in the To: line instead of being in Bcc:. However, by this point Zoom was in the full glare of public attention, and the story blew up even in the mainstream press, outside of the insular tech world.

Now What?

Zoom’s CEO, Eric Yuan, issued a pretty comprehensive apology. I will quote the key paragraphs below:

First, some background: our platform was built primarily for enterprise customers – large institutions with full IT support. These range from the world’s largest financial services companies to leading telecommunications providers, government agencies, universities, healthcare organizations, and telemedicine practices. Thousands of enterprises around the world have done exhaustive security reviews of our user, network, and data center layers and confidently selected Zoom for complete deployment.

However, we did not design the product with the foresight that, in a matter of weeks, every person in the world would suddenly be working, studying, and socializing from home. We now have a much broader set of users who are utilizing our product in a myriad of unexpected ways, presenting us with challenges we did not anticipate when the platform was conceived.

These new, mostly consumer use cases have helped us uncover unforeseen issues with our platform. Dedicated journalists and security researchers have also helped to identify pre-existing ones. We appreciate the scrutiny and questions we have been getting – about how the service works, about our infrastructure and capacity, and about our privacy and security policies. These are the questions that will make Zoom better, both as a company and for all its users.

We take them extremely seriously. We are looking into each and every one of them and addressing them as expeditiously as we can. We are committed to learning from them and doing better in the future.

It’s too early to say what the long-term consequences for Zoom will be, but this is a good apology, and a reasonable set of early moves by the company to repair its public image. To be clear, the company still has a long way to go, and to succeed, it will need to rebalance its exclusive focus on usability to be much more considerate of privacy and security.

For instance, there were a couple of zero-day bugs found in the macOS client (since patched in Version 4.6.9) which would have allowed for privilege escalation. These particular flaws cannot be exploited remotely, so a would-be attacker would need access to the operating system already, but it's still far from ideal. In particular, one of these bugs took advantage of some shortcuts that Zoom had taken in its installer, once again in the name of ease of use.

Installers on macOS have the option of running a "preflight" check, where they verify all their prerequisites are met. After this step, they will request confirmation from the user before running the installer proper. Zoom’s installer actually completed all its work in this preflight step, including specifically running a script with root (administrator) privileges. This script could be replaced by an attacker, whose malicious script would then be run with those same elevated privileges.

Personally I hope that Zoom figures out a way to resolve this situation. The user experience is very pleasant (even after installation!), and given that I work from home all the time — not just in quarantine — Zoom is a key part of my work environment.

Lessons To Learn

1: Pivoting is hard

Regardless of the outcome for Zoom, though, this is a cautionary tale in corporate life and communications. Zoom was doing everything right for its previous situation, but this exclusive focus made it difficult to react to changes in that situation. The pivot from corporate enterprise users to much larger numbers of personal users is an opportunity for Zoom if they can monetise this vastly expanded user base, but it also exposes them to a much-changed environment. Corporate users are more predictable in their environments and routines, and in the way they interact with apps and services. Home users will do all sorts of unexpected things and come from unexpected places, exposing many more edge cases in developers’ assumptions.

Companies should not assume that they can easily "pivot" to a whole new user population, even one that is attractively larger and more promising of profits, without making corresponding changes to core assumptions about how they go to market.

2: A good reputation once lost is hard to regain

A big part of Zoom’s problem right now is that they had squandered their earlier goodwill with techies when they hid a web server on their machines. Without that earlier episode, they might have been able to point out that many of the current problems are on the level of tempests in teacups — bugs to be sure, which need to be fixed, but hardly existential problems.

As it happened, though, the Internet hive mind was all primed to think the worst of Zoom, and indeed actively went looking for issues once Zoom was in the glare of the spotlight. In this situation, there is not much to be done in the short term, apart from what Zoom actually did: apologise profusely, promise not to do it again, and attempt to weather the storm.

One move I have not yet seen them make which would be very powerful would be to hire a well-known security expert with a reputation for impartiality. One part of their job would be to act as figurehead and lightning conductor for the company’s security efforts, but an equally important part would be as internal naysayer: the VP of Nope, someone able to say a firm NO to bad ideas. Hiding a web server? Bad idea. Shortcutting the installer? Bad idea. Assuming everyone with an email address not on a very short list of mega-providers is a colleague of everyone else with the same email domain? Bad idea.


UPDATE: Showing how amazingly prescient this recommendation was, shortly after I published this post, Alex Stamos announced that he was joining Zoom to help them "build up their security program":


Alex Stamos is of course the ex-CSO at Facebook, who since departing FB has made something of a name for himself by commenting publicly about security and privacy issues. As such, he’s pretty much the perfect hire: high public profile, known as an impartial expert, and deeply experienced specifically in end-user security issues, not just the sort of enterprise aspects which Zoom had previously been focusing on.

I will be watching his and Zoom’s next moves with interest.


3: Bottom line: build good products

Most companies need to review both security and usability — but it’s probably worth noting that a good product is the best way of saving yourself. Even in a post-debacle roundup of would-be alternatives to Zoom, Zoom still came out ahead, despite being penalised for its security woes. They still have the best product, and, yes, the one that is easiest to use.

But if you get the other two factors right, you, your good product, and your long-suffering comms team will all have an easier life.


🖼️ Photos by Allie Smith on Unsplash

Be Smart, Use Dumb Devices

The latest news in the world of Things Which Are Too "Smart" For Their Users’ Good is that Facebook have released a new device in their Portal range: a video camera that sits on your TV and lets you make video calls via Facebook Messenger and WhatsApp (which is also owned by Facebook).

This is both a great idea and a terrible one. I am on the record as wanting a webcam for my AppleTV so that I could make FaceTime calls from there:

In fact, I already do the hacky version of this by mirroring my phone’s screen with AirPlay and then propping it up so the camera has an appropriate view.

Why would I do this? One-word answer: kids. The big screen has a better chance of holding their attention, and a camera with a nice wide field of view would be good too, to capture all the action. Getting everyone to sit on the couch or rug in front of the TV is easier than getting everyone to look into a phone (or even iPad). I’m not sure about the feature where the camera tries to follow the speaker; in these sorts of calls, several people are speaking most of the time, so I can see it getting very confused. It works well in boardroom setups where there is a single conversational thread, but even then, most of the good systems I’ve seen use two cameras, so that the view can switch in software rather than waiting for mechanical rotation.

So much for the "good idea" part. The reason it’s a terrible idea in this case is that it’s from Facebook. Nobody in their right mind would want an always-on device from Facebook in their living room, with a camera pointed at their couch, and listening in on the video calls they make. Facebook have shown time and time and time again that they simply cannot be trusted.

An example of why the problem is Facebook itself, rather than any one product or service, is the hardware switch for turning the device’s camera off. A highlight shows when the switch is in the off position, and an LED illuminates… to show that the camera and microphone are off.

Many people have commented that this setup looks like a classic dark pattern in UX, just implemented in hardware. My personal opinion is that the switch is more interesting as an indicator of Facebook’s corporate attitude to internet services: they are always on, and it’s an anomaly if they are off. In fact, they may even consider the design of this switch to be a positive move towards privacy, by highlighting when the device is in "privacy mode". The worrying aspect is that this design makes privacy an anomaly, a mode that is entered briefly for whatever reason, a bit like Private or Incognito mode in a web browser. If you’re wondering why a reasonable person might be concerned about Facebook’s attitude to user privacy, a quick read of just the "Privacy issues" section of the Wikipedia article on Facebook criticism will probably have you checking your permissions. At a bare minimum, I assume that entering "privacy mode" is itself a tracked event, subject to later analysis…

Trust, But Verify

IoT devices need a high degree of trust anyway because of all the information that they are inherently privy to. Facebook have proven that they will go to any lengths to gather information, including information that was deliberately not shared by users, process it for their own (and their advertising customers’) purposes, and do an utterly inadequate job of protecting it.


The idea of a smart home is attractive, no question – but why do the individual devices need to be smart in their own right? Unnecessary capabilities increase the vulnerability surface for abuse, either by a vendor/operator or by a malicious attacker. Instead, better to focus on devices which have the minimum required functionality to do their job, and no more.

A perfect example of this latter approach is IKEA’s collaboration with Sonos. The Symfonisk speakers are not "smart" in the sense that they have Alexa, Siri, or Google Assistant on board. They also do not connect directly to the Internet or to any one particular service. Instead, they rely on the owner’s smartphone to do all the hard work, whether that is running Spotify or interrogating Alexa. The speaker just plays music.

I would love a simple camera that perched on top of the TV, either as a peripheral to the AppleTV, or extending AirPlay to be able to use video sources as well. However, as long as doing this requires a full device from Facebook1 – or worse, plugging directly into a smart TV2 – I’ll keep on propping my phone up awkwardly and sharing the view to the TV.


  1. Or Google or Amazon – they’re not much better. 

  2. Sure, let my TV watch everything that is displayed and upload it for creepy "analysis".3 

  3. To be clear, I’m not wearing a tinfoil hat over here. I have no problem simply adding a "+1" to the viewer count for The Expanse or whatever, but there’s a lot more that goes on my TV screen: photos of my kids, the content of my video calls, and so on and so forth. I would not be okay with sharing the entire video buffer with unknown third parties. This sort of nonsense is why my TV has never been connected to the WiFi. It went online once, using an Ethernet cable, to get a firmware update – and then I unplugged the cable. 

Once More On Privacy

Facebook is in court yet again over the Cambridge Analytica scandal, and one of their lawyers made a most revealing assertion:

There is no invasion of privacy at all, because there is no privacy

Now on one level, this is literally true. Facebook's lawyer went on to say that:

Facebook was nothing more than a "digital town square" where users voluntarily give up their private information

The issue is a mismatch in expectations. Users have the option to disclose information as fully public, or variously restricted: only to their friends, or to members of certain groups. The fact that something is said in the public street does not mean that the user would be comfortable having it published in a newspaper, especially if they were whispering into a friend’s ear at the time.

Legally, Facebook may well be in the right (IANAL, nor do I play one on the Internet), but in terms of users’ expectations, they are undoubtedly in the wrong. However, for once I do not lay all the blame on Facebook.

Mechanisation and automation are rapidly subverting common-sense expectations in a number of fields, and consequences can be wide-reaching. Privacy is one obvious example, whether it is Facebook’s or Google’s analysis of our supposedly private conversations, or facial recognition in public places.

For an example of the reaction to the deployment of these technologies, the city of San Francisco, generally expected to be an early adopter of technological solutions, recently banned the use of facial recognition technology. While the benefits for law enforcement of ubiquitous automated facial recognition are obvious, the adoption of this technology also subverts long-standing expectations of privacy – even in undoubtedly public spaces. While it is true that I can be seen and possibly recognised by anyone who is in the street at the same time as me, the human expectation is that I am not creating a permanent, searchable record of my presence in the street at that time, nor that such a record would be widely available.

To make the example concrete, let’s talk for a moment about numberplate recognition. Cars and other motor vehicles have number plates to make them recognisable, including for law enforcement purposes. As technology developed, automated reading of license plates became possible, and is now widely adopted for speed limit enforcement. Around here things have gone a step further, with average speeds measured over long distances.

Who could object to enforcing the law?

The problem with automated enforcement is that it is only as good as it is programmed to be. It is true that hardly anybody breaks the speed limit on the monitored stretches of motorway any more – or at least, not more than once. However, there are also a number of negative consequences. Lane discipline has fallen entirely by the wayside since the automated systems were introduced, with slow vehicles cruising in the middle or even outside lanes while the inside lanes sit empty. The automated enforcement has also removed any pressure to consider what is an appropriate speed for the conditions, with many drivers continuing to drive at or near the speed limit even in weather or traffic conditions where that speed is totally unsafe. Finally, there is no recognition that, at 4am with nobody on the roads, there is no need to enforce the same speed limit that applies at rush hour.
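
The mechanism itself is trivially simple, which is part of why it is so rigid: two timestamped plate reads, a known distance between the cameras, and one comparison. A sketch with made-up numbers:

```python
from datetime import datetime

def average_speed_kmh(distance_km, entry, exit_):
    hours = (exit_ - entry).total_seconds() / 3600
    return distance_km / hours

# Plate read at two gantries 10 km apart, 5 minutes 30 seconds apart.
entry = datetime(2019, 6, 1, 4, 0, 0)
exit_ = datetime(2019, 6, 1, 4, 5, 30)
speed = average_speed_kmh(10, entry, exit_)
print(f"{speed:.1f} km/h")    # ~109.1 km/h

SPEED_LIMIT_KMH = 110
print("violation" if speed > SPEED_LIMIT_KMH else "no violation")
# The check is identical at 4am on an empty road and at rush hour in the pouring rain.
```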

Human-powered on-the-spot enforcement – the traffic cop flagging down individual motorists – had the option to modulate the law, turning a blind eye to safe speed and punishing driving that might be inside the speed limit but unsafe in other ways. Instead, automated enforcement is dumb (it is, after all, binary) and only considers the single metric it was designed to consider.

There are of course any number of problems with a human-powered approach as well; members of ethnic or social minorities all have stories involving the police looking for something – anything – to book them for. I’m a straight white cis-het guy, and still once managed to fall foul of the proverbial bored cops, who took my entire car apart looking for drugs (that weren’t there) and then left me by the side of the road to put everything back together. However, automated enforcement makes all of these problems worse.

Facial recognition has documented issues with accuracy when it comes to ethnic minorities and women – basically anyone but the white male programmers who created the systems. If police start relying on such systems, people are going to have serious difficulties trying to prove that they are not the person in the WANTED poster – because the computer says they are a match. And that’s if they don’t just get gunned down, of course.

It is notoriously hard to opt out of these systems when they are used for advertising, but when they are used for law enforcement, it becomes entirely impossible to opt out, as a London man found when he was arrested for covering his face during a facial recognition trial on public streets. A faulty system is even worse than a functional one, as its failure modes are unpredictable.

Systems rely on data, and data storage is also problematic. I recently had to get a government-issued electronic ID. Normally this should be a simple online application, but I kept getting weird errors, so I went to the office with my (physical) ID instead. There, we realised that the problem was with my place of birth. I was born in what was then Strathclyde, but this is no longer an option in up-to-date systems, since the region was abolished in 1996. However, different databases were disagreeing, and we were unable to move forward. In the end, the official effectively helped me to lie to the computer, picking an acceptable jurisdiction in order to move forwards in the process – and thereby of course creating even more inaccuracies and inconsistency. So much for "the computer is always right"… Remember, kids: Garbage In, Garbage Out!

What, Me Worry?

The final argument comes down, as it always does with privacy, to the objection that "there’s nothing to fear if you haven’t done anything wrong". Leaving aside the issues we just discussed around the possibility of running into problems even when you really haven’t done anything wrong, the issue is with the definition of "wrong". Social change is often driven by movement in the grey areas of the law, as well as selective enforcement of those laws. First gay sex is criminalised, so underground gay communities spring up. Then attitudes change, but the laws are still on the books; they just aren’t enforced. Finally the law catches up. If algorithms actually are watching all of our activity and are able to infer when we might be doing something that’s frowned upon by some1, that changes the dynamic very significantly, in ways which we have not properly considered as a society.

And that’s without even considering where else these technologies might be applied, beyond our pleasant Western bubble. What about China, busy turning Xinjiang into an open-air prison for the Uyghur minority? Or "Saudi" Arabia, distributing smartphone apps to enable husbands to deny their wives permission to travel?

Expectations of privacy are being subverted by scale and automation, without a real conversation about what that means. Advertisers and the government stick to the letter of the law, but there is no recognition of the material difference between surveillance that is human-powered, and what happens when the same surveillance is automated.


Photos by Glen Carrie and Bryan Hanson via Unsplash


  1. And remember, the algorithms may not even be analysing your own data, which you carefully secured and locked down. They may have access to data for one of your friends or acquaintances, and then the algorithm spots a correlation in patterns of communication, and associates you with them. Congratulations, you now have a shadow profile. And what if you are just really unlucky in your choice of local boozer, so now the government thinks you are affiliated with the IRA offshoot du jour, when all you were after was a decent pint of Guinness? 

Advertise With The End In Mind

Even though I no longer work directly in marketing, I’m still adjacent, and so I try to keep up to date with what is going on in the industry. One of the most common-sensical and readable voices is Bob Hoffman, perhaps better known as the Ad Contrarian. His latest post is entitled The Simple-Minded Guide To Marketing Communication, and it helpfully dissects the difference between brand advertising and direct-response advertising (emphasis mine):

[…] our industry's current obsession with precision targeted, one-to-one advertising is misguided. Precision targeting may be valuable for direct response. But history shows us that direct response strategies have a very low likelihood of producing major consumer facing brands. Building a big brand requires widespread attention. Precision targeted, one-to-one communication has a low likelihood of delivering widespread attention.

Now Bob is not just an armchair critic; he has quite the cursus honorum in the advertising industry, and so he speaks from experience.

In fact, events earlier this week bore out his central thesis. With the advent of GDPR, many US-based websites opted to cut off EMEA readers rather than attempt to comply with the law. This action helpfully made it clear who was doing shady things with their users’ data, thereby providing a valuable service to US readers, while rarely inconveniencing European readers very much.

The New York Times, with its strong international readership, was not willing to cut off overseas ad revenue. Instead, they went down a different route (emphasis still mine):

The publisher blocked all open-exchange ad buying on its European pages, followed swiftly by behavioral targeting. Instead, NYT International focused on contextual and geographical targeting for programmatic guaranteed and private marketplace deals and has not seen ad revenues drop as a result, according to Jean-Christophe Demarta, svp for global advertising at New York Times International.

Digiday has more details, but that quote has the salient facts: turning off invasive tracking – and the targeted advertising which relies on it – had no negative results whatsoever.

This is of course because knowing someone is reading the NYT, and perhaps which section, is quite enough information to know whether they are an attractive target for a brand to advertise to. Nobody has ever deliberately clicked from serious geopolitical analysis to online impulse shopping. However, the awareness of a brand and its association with Serious Reporting will linger in readers’ minds for a long time.

The NYT sells its own ads, which is not really scalable for most outlets, but I hope other people are paying attention. Maybe there is room in the market for an advertising offering that does not force users to deal with cookies and surveillance and interstitial screens and page clutter and general creepiness and annoyance, while still delivering the goods for its clients?


🖼️ Photo by Kate Trysh on Unsplash

The Shape Of 2019

They said they need real-world examples, but I don’t want to be their real-world mistake

That quote comes from a NYT story about people attacking self-driving vehicles. I wrote about these sentiments before, after the incident which spurred these attacks:

It’s said that you shouldn’t buy any 1.0 product unless you are willing to tolerate significant imperfections. Would you ride in a car operated by software with significant imperfections?
Would you cross the street in front of one?
And shouldn’t you have the choice to make that call?

Cars are just the biggest manifestation of this experimentation that is visible in the real world. How often do we have to read about Facebook manipulating the content of users’ feeds – just to see what happens?

And what about this horrific case?


Meanwhile, my details were included in last year’s big Marriott hack, and now I find out that my passport details may have been included in the leaked information. Marriott’s helpful suggestion? A year’s free service – from Experian. Yes, that Experian, the one you know from one of the biggest hacks ever.

I don’t want to be any company’s real world mistake in 2019.


🖼️ Photo by chuttersnap on Unsplash

Privacy Policy

Short version: I don’t have one.

Long version: I don’t gather any data, I even turned off Google Analytics (and not just because it was depressing me with its minuscule numbers!), and I don’t have access to the server logs even if I wanted to look at IP addresses or whatever. This blog’s host, Postach.io, have their own privacy policy here.

Regarding analytics specifically, I am somewhat curious about how many people read individual posts, but I’m not going to sell you out to Google so you can see adverts for whatever you read about here following you all over the internet for the next two weeks. Neither of us gets enough benefit for that to be worthwhile.

Privacy Versus AI

There is a widespread assumption in tech circles that privacy and (useful) AI are mutually exclusive. Apple is assumed to be behind Amazon and Google in this race because of its choice to do most data processing locally on the phone, instead of uploading users’ private data in bulk to the cloud.

A recent example of this attitude comes courtesy of The Register:

Predicting an eventual upturn in the sagging smartphone market, [Gartner] research director Ranjit Atwal told The Reg that while artificial intelligence has proven key to making phones more useful by removing friction from transactions, AI required more permissive use of data to deliver. An example he cited was Uber "knowing" from your calendar that you needed a lift from the airport.

I really, really resent this assumption that connecting these services requires each and every one of them to have access to everything about me. I might not want information about my upcoming flight shared with Uber – where it can be accessed improperly, leading to someone knowing I am away from home and planning a burglary at my house. Instead, I want my phone to know that I have an upcoming flight, and offer to call me an Uber to the airport. At that point, of course I am sharing information with Uber, but I am also getting value out of it. Otherwise, the only one getting value is Uber. They get to see how many people in a particular geographical area received a suggestion to take an Uber and declined it, so they can then target those people with special offers or other marketing to persuade them to use Uber next time they have to get to the airport.

I might be happy sharing a monthly aggregate of my trips with the government – so many by car, so many on foot, or by bicycle, public transport, or ride sharing service – which they could use for better planning. I would absolutely not be okay with sharing details of every trip in real time, or giving every busybody the right to query my location in real time.

The fact that so much of the debate is taken up with these unproductive all-or-nothing arguments is what is preventing progress. I have written about this concept of granular privacy controls before:

The government sets up an IDDB which has all of everyone's information in it; so far, so icky. But here's the thing: set it up so that individuals can grant access to specific data in that DB - such as the address. Instead of telling various credit card companies, utilities, magazine companies, Amazon, and everyone else my new address, I just update it in the IDDB, and bam, those companies' tokens automatically update too - assuming I don't revoke access in the mean time.

This could also be useful for all sorts of other things, like marital status, insurance, healthcare, and so on. Segregated, granular access to the information is the name of the game. Instead of letting government agencies and private companies read all the data, users each get access only to those data they need to do their jobs.
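
As a sketch of how unexotic that "segregated, granular access" idea is technically, here is a hypothetical data structure with per-grantee, per-field, revocable permissions. It is purely illustrative, not a design for an actual IDDB.

```python
from dataclasses import dataclass, field

@dataclass
class CitizenRecord:
    data: dict                                   # e.g. {"address": "...", "marital_status": "..."}
    grants: dict = field(default_factory=dict)   # grantee -> set of fields they may read

    def grant(self, grantee, fields):
        self.grants.setdefault(grantee, set()).update(fields)

    def revoke(self, grantee, fields=None):
        if fields is None:
            self.grants.pop(grantee, None)        # revoke everything
        else:
            self.grants.get(grantee, set()).difference_update(fields)

    def read(self, grantee, field_name):
        if field_name not in self.grants.get(grantee, set()):
            raise PermissionError(f"{grantee} has no access to {field_name}")
        return self.data[field_name]

me = CitizenRecord({"address": "1 Old Street", "marital_status": "married"})
me.grant("electricity-co", {"address"})
me.data["address"] = "2 New Street"           # update the address once...
print(me.read("electricity-co", "address"))   # ...every grantee with a valid grant sees the new value
me.revoke("electricity-co")                   # and access can be withdrawn at any time
```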

Unfortunately, we are stuck in a stale all-or-nothing discussion: either you surround yourself with always-on internet-connected microphones and cameras, or you might as well retreat to a shack in the woods. There is a middle ground, and I wish more people (besides Apple) recognised that.


Photo by Kyle Glenn on Unsplash

Sowing Bitter Seeds

The Internet is outraged by… well, a whole lot of things, as usual, but in particular by Apple. For once, however, the issue is not phones that are both unexciting and unavailable, lacking innovation and wilfully discarding convention, and also both over- and under-priced. No, this time the issue is apps, and in particular VPN apps.

Authoritarian regimes around the world (Russia, "Saudi" Arabia, China, North Korea, etc) have long sought to control their populations' access to information in general, and to the Internet in particular. Of course anyone with a modicum of technical savvy - or a friend, relative, or passing acquaintance willing to do the simple setup - can keep unfettered access to the Internet by going through a Virtual Private Network, or VPN.

A VPN does what it says on the tin: it creates a virtual network that connects directly with an endpoint somewhere else; importantly, somewhere outside the authoritarian regime's control. As such, VPNs have always existed in something of a grey area, but now China (the People's Republic, not that other China) has gone ahead and formally banned their use.

In turn, Apple have responded by removing unregistered VPN apps (which in practical terms means all of them) from their App Store in China. In the face of the Internet's predictable outrage, Apple provided this bald statement (via TechCrunch):

Earlier this year China’s MIIT announced that all developers offering VPNs must obtain a license from the government. We have been required to remove some VPN apps in China that do not meet the new regulations. These apps remain available in all other markets where they do business.

Now Apple do have a point; the law is indeed the law, and because they operate in China, they need to comply with it, just as they would with the law in any other country.

Here's the rub, though. By the regionalised way they have set up their App Store service, they have made themselves unnecessarily vulnerable to this sort of arm-twisting by unfriendly governments. Last time I wrote about geo-fencing and its consequences, the cause of the day was Russia demanding removal of the LinkedIn app, and China (them again!) demanding removal of the New York Times app. As I wrote at the time, companies like Apple originally set up the infrastructure for these geographic restrictions to enable IP protection, but the same tools are being repurposed for censorship:

This sort of restriction used to be "just" hostile to consumers. Now, it is turning into a weapon that authoritarian regimes can wield against Apple, Google, and whoever else. Nobody would allow Russia to ban LinkedIn around the world, or China to remove the New York Times app everywhere - but because dedicated App Stores exist for .ru and .cn, they are able to demand these bans as local exceptions, and even defend them as respecting local laws and sensibilities. If there were one worldwide App Store, this gambit would not work.

The argument against the infrastructure of laws and regulations that was put in place to enable (ineffective) IP restrictions was always that it could be, and would be, repurposed to enable repression by authoritarian regimes. People scoffed at these privacy concerns, saying "if you have nothing to hide, you have nothing to fear". But what if your government is the next to decide that reading the NYT or having a LinkedIn profile is against the law? How scared should you be then?

If you are designing a social network or other system with the expectation of widespread adoption, these days this has to be part of your threat model. Otherwise, one day the government may come knocking, demanding your user database for any reason or no reason at all - and what seemed like a good idea at the time will end up messing up a lot of people's lives.

Product designers by and large do not think of such things, as we saw when Amazon decided that it would be perfectly reasonable to give everyone in your address book access to your Alexa device - and make it so users could not turn off this feature without a telephone call to Amazon support.

How well do you think that would go down if you were a dissident, or just in the social circle of one?

Our instinctive attitude to data is to hoard them, but this instinct is obsolete, forged in a time when data were hard to gather, store, and access. It took something on the scale of the Stasi to build and maintain profiles on even six million citizens (out of a population of sixteen million), and the effort and expense was part of what broke the East German regime in the end. These days, it's trivial to build and access such a profile for pretty much anyone, so we need to change our thinking about data - how we gather them, and how we treat them once we have them.

Personal data are more akin to toxic waste, generated as a byproduct of valuable activity and needing to be stored with extreme care because of the dire consequences of any leaks. Luckily, data are different from toxic waste in one key respect: they can be deleted, or better, never gathered in the first place. The same goes for many other choices, such as restricting users to one particular geographical App Store, or making it easy to share your entire contact list (including by mistake), but very difficult to take that decision back.

What other design decisions are being made today based on obsolete assumptions that will come back to bite users in the future?


UPDATE: And there we go, now Russia is following China’s example and banning VPNs as well. The idea of a technical fix to social and legal problems is always a short-term illusion.


Image by Sean DuBois via Unsplash