The privacy delusion
14 May 2009
Nello Cristianini, Professor of Artificial Intelligence in the Department of Engineering Mathematics, suspects that most of us are blissfully unaware of just how much we are spied upon.
In order to be useful, information needs to be gathered, stored or transmitted, then processed and finally acted upon. Every step of this chain has undergone major transformations in recent decades and can now be done automatically, cheaply and very efficiently by machines. As we conduct our lives in this new and empowering digital age, we leave behind a permanent trail of personal data that is never deleted, never lost. That trail is carefully analysed – and even traded – so that our behaviour can be modelled and, in some cases, influenced. Never before has so much data been collected about so many people with such ease.
The AOL case study
Most users of the internet are not aware that it is standard practice for all search engines to gather and analyse a log of every query that each user makes. Together with the content and time of the query, search engines collect information (cookies, IP addresses and, in some cases, user login details) that allows analysts to identify the machine from which the query was made. On 4 August 2006, AOL’s research labs released on one of its websites a file, intended for research purposes, containing the search logs of over 650,000 users. Each user was identified by a unique ID number, so that it was possible to connect queries performed by the same person but not to identify who that person was. A few days later, acknowledging that this was an error, AOL removed the file from public access and those responsible were later fired, but the data are still available at various internet locations, if you know where to look.
This incident gives us a rare glimpse into the often invisible backroom of online businesses, providing direct experience of the trail we leave behind in ‘transaction space’ every day.
Let us follow user 98280, probably a couple using the same computer. There seems to be an abusive male and a pregnant female, possibly addicted to cocaine. The query log reveals a series of sessions alternating in topic from ‘ovulation calculator’ to ‘pregnancy calendar’ and ‘effect of addictions on foetus’, with totally different queries for ‘girls gone wild’ and ‘fine black girls’. Sadly, we also see queries like ‘dealing with spouse that has bipolar disorder’, ‘coping with abusive spouse’ and ‘prayers for relationship problems’. Slowly the story unfolds, revealing the most intimate details and anxieties, and even the intentions, of two people mistakenly under the impression of being in total privacy.
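How little machinery such a reconstruction needs can be shown with a minimal sketch. It assumes a hypothetical log layout of (anonymous ID, time, query text); the user ID and three of the queries are quoted above, while the timestamps, the second user and the row format are invented for illustration and are not taken from the AOL file.

```python
from collections import defaultdict

# Hypothetical log rows: (anonymous_id, query_time, query_text).
# ID 98280 and its queries are quoted in the article; the timestamps,
# the second user and this row layout are assumptions for illustration.
LOG = [
    (98280, "2006-03-01 21:14", "ovulation calculator"),
    (11111, "2006-03-01 21:20", "cheap flights"),
    (98280, "2006-03-09 08:02", "pregnancy calendar"),
    (98280, "2006-04-02 23:41", "coping with abusive spouse"),
]


def group_by_user(rows):
    """Collect each anonymous ID's queries into one time-ordered history."""
    histories = defaultdict(list)
    for user_id, when, query in rows:
        histories[user_id].append((when, query))
    for history in histories.values():
        history.sort()  # these timestamps sort correctly as plain strings
    return dict(histories)


histories = group_by_user(LOG)
for when, query in histories[98280]:
    print(when, query)
```

The ID carries no name, yet once the queries sit side by side in time order the story they tell does the identifying.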
In the AOL query log example, we only have anonymous search queries for a three-month period, but search engines keep data for much longer than that. Often they also have the names and addresses of their users and, if they provide e-shopping services, their bank details. Furthermore, much of this personal information is readily available to various organisations and can be purchased (or rented) like any other commodity.
Do you want to buy a list of 10,000 alcohol-drinking, pet-owning, frequent flyers from the UK? Names and home addresses? Many companies can help you. One such company will charge a basic rate of £1,700 for a list, more for each additional consumer attribute you want included. The company boasts a database of 40 million individuals living in 22 million households in the UK. Modern pattern analysis by means of intelligent software that uses statistics, artificial intelligence and efficient algorithms can then detect subtle trends and anomalies in these data, again allowing predictions to be made about our future behaviour.
The corporate mission of Google is ‘to organise the world’s information and make it universally accessible and useful’. Already it provides (for free) web search, book search and email facilities, as well as online calendar, video, document and photograph storage, and much more. For many of these services you have to give your name and email address and, for the services you pay for, you also need to provide your address and banking information.
Google has also acquired a company, DoubleClick, whose business is to track the behaviour of users over multiple partner websites. Connecting the behaviour of a user when shopping for holidays with the behaviour of the same user when reading the news or searching for a house can multiply the power of the inferences to be drawn about them. Could Google slowly turn into Big Brother, keeping track of its users and deciding what information those users will become aware of? Try sending yourself an email using Google mail, about something you might not want to share with others, and see what appears, almost instantly, in the right-hand column of your screen. It’s scary!
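The idea of connecting one user's behaviour across several websites can be illustrated with a short sketch. It assumes that each partner site reports events tagged with the same cookie identifier; the site names, cookie IDs, event format and function names are all hypothetical, and nothing here describes how DoubleClick or Google actually store or process data.

```python
from collections import defaultdict

# Hypothetical events reported by partner sites: (cookie_id, site, action).
# All values are invented for illustration.
EVENTS = [
    ("c-42", "holidays.example", "viewed: beach package"),
    ("c-42", "news.example", "read: interest rates rise"),
    ("c-77", "news.example", "read: football results"),
    ("c-42", "homes.example", "searched: three-bedroom house"),
]


def build_profiles(events):
    """Merge events from every site into one profile per cookie ID."""
    profiles = defaultdict(lambda: defaultdict(list))
    for cookie_id, site, action in events:
        profiles[cookie_id][site].append(action)
    return {cid: dict(sites) for cid, sites in profiles.items()}


def cross_site_ids(profiles, min_sites=2):
    """Return the cookie IDs seen on at least `min_sites` different sites."""
    return sorted(cid for cid, sites in profiles.items() if len(sites) >= min_sites)


profiles = build_profiles(EVENTS)
print(cross_site_ids(profiles))
```

No single site in this toy example knows much, but the merged profile for one cookie shows someone planning a holiday, following interest rates and looking for a house, all at once.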
My own hunch is that Big Brother… will turn out to be not a greedy power-seeker but a relentless bureaucrat obsessed with efficiency
(Vance Packard, 1966)
Today’s technology allows us to collect and exploit a vast amount of diverse data about individuals and groups, and we should remember that states, not just businesses, are engaged in gathering information about our activities. We are creating a new type of society in which the notion of privacy is very different from the one we are used to, and therefore expect. As we sleep-walk irreversibly into this new world, we should develop concepts, laws and values to help us exploit all that information technology has to offer us, without creating a nightmare for our children. We need to be aware that we are venturing into a completely unexplored world – and there will be no going back.
To read an in-depth version of this article, visit: www.see-a-pattern.org