“I no longer look at somebody’s CV to determine if we will interview them or not,” declares Teri Morse, who oversees the recruitment of 30,000 people each year at Xerox Services. Instead, her team analyses personal data to determine the fate of job candidates.
She is not alone. “Big data” and complex algorithms are increasingly taking decisions out of the hands of individual interviewers – a trend that has far-reaching consequences for job seekers and recruiters alike.
What is big data?
It’s a bit of a misnomer: “Ex-Guardian writer Simon Rogers once said, ‘Big data is data that is one bit too much for you to be comfortable’, and this is probably the best definition I’ve read.
“The volume of data is not irrelevant, but not as important as it sounds. More important is the ability to link diverse datasets with each other.”
– Giuseppe Sollazzo, senior systems analyst at St George’s, University of London, and member of the open data user group
It’s about time we dropped the “big” jargon: “It’s just data. Data volume and velocity is exponentially increasing, but has always been too big to easily store and process … One thing that has changed in the last few years is the recognition from decision makers – not just analysts – that data is a valuable resource.”
– Tom Smith, director of the Oxford Consultants for Social Inclusion
Why is it important for government?
“Joining up public sector data sources can make government more efficient, save money, identify fraud and help public bodies better serve their citizens.”
– Claire Vyvyan, executive director and general manager of public sector at Dell UK
“Data can enable government to do existing things more cheaply, do existing things better and do new things we don’t currently do.”
– Tom Heath, head of research at the Open Data Institute
The importance of algorithms in our lives today cannot be overstated. They are used virtually everywhere, from financial institutions to dating sites. But some algorithms shape and control our world more than others — and these ten are the most significant.
Just a quick refresher before we get started. Though there’s no formal definition, computer scientists describe algorithms as a set of rules that define a sequence of operations. They’re a series of instructions that tell a computer how it’s supposed to solve a problem or achieve a certain goal. A good way to think of algorithms is by visualizing a flowchart.
Every reservation is analyzed by Airbnb’s algorithm, which combs it for red flags — new listings that are signing up reservations at a suspicious clip; messages that include the term Western Union — and assigns it a trust score. If the trust score is too low, someone from Airbnb’s trust and safety team will follow up to ensure that everything is on the level. And of course, these efforts are supplemented by Airbnb’s user reviews and comments, which can also steer unsuspecting renters away from dicey properties.
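The red-flag screening described above can be illustrated with a minimal sketch. This is not Airbnb's actual algorithm, which the article does not disclose: the `Reservation` fields, the penalty weights and the review threshold are all hypothetical, chosen only to show how a rule-based trust score might route low-scoring bookings to a human team.

```python
from dataclasses import dataclass

# All values below are assumptions for illustration; the article names only
# two example signals (fast-booking new listings, "Western Union" in messages).
SUSPICIOUS_TERMS = ("western union",)
NEW_LISTING_DAYS = 7
MAX_BOOKINGS_PER_DAY_FOR_NEW_LISTING = 3
REVIEW_THRESHOLD = 0.5


@dataclass
class Reservation:
    listing_age_days: int   # how long the listing has existed
    bookings_last_24h: int  # reservations the listing took in the last day
    message: str            # text exchanged between guest and host


def trust_score(r: Reservation) -> float:
    """Start from 1.0 and subtract a penalty for each red flag found."""
    score = 1.0
    if any(term in r.message.lower() for term in SUSPICIOUS_TERMS):
        score -= 0.6
    is_new = r.listing_age_days < NEW_LISTING_DAYS
    if is_new and r.bookings_last_24h > MAX_BOOKINGS_PER_DAY_FOR_NEW_LISTING:
        score -= 0.4
    return max(score, 0.0)


def needs_manual_review(r: Reservation) -> bool:
    """Low scores are handed to a human reviewer rather than auto-rejected."""
    return trust_score(r) < REVIEW_THRESHOLD
```

In this sketch a message mentioning Western Union is enough on its own to trigger a follow-up, while a burst of bookings on a new listing lowers the score without crossing the threshold. A real system would presumably weigh many more signals.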
All this seems to work well enough for Airbnb. But as the sharing economy spreads across so many other markets — from car rides to house cleaning — other companies could benefit from this storehouse of data as well. After all, if someone wanted to sign up as a Lyft driver, it would be good to know that they had been banned from Airbnb. That prospect has led some to predict the dawn of a fully reputation-based economy — one in which your behavior and track record follow you from service to service, a kind of FICO score for the sharing economy that would let both platforms and individuals know how trustworthy you are based on your history and activity.
HELSINKI, Finland — If there’s something you’d like to know about Helsinki, someone in the city administration most likely has the answer. For more than a century, this city has funded its own statistics bureaus to keep data on the population, businesses, building permits, and most other things you can think of. Today, that information is stored and made freely available on the internet by an appropriately named agency, City of Helsinki Urban Facts.
There’s a potential problem, though. Helsinki may be Finland’s capital and largest city, with 620,000 people. But it’s only one of more than a dozen municipalities in a metropolitan area of almost 1.5 million. So in terms of urban data, if you’re only looking at Helsinki, you’re missing out on more than half of the picture.
Helsinki and three of its neighboring cities are now banding together to solve that problem. Through an entity called Helsinki Region Infoshare, they are bringing together their data so that a fuller picture of the metro area can come into view.
That’s not all. At the same time these datasets are going regional, they’re also going “open.” Helsinki Region Infoshare publishes all of its data in formats that make it easy for software developers, researchers, journalists and others to analyze, combine or turn into web-based or mobile applications that citizens may find useful.
I Know Where You Were Last Summer: London’s public bike data is telling everyone where you’ve been [vartree.blogspot.co.uk]
This article is about a publicly available dataset of bicycle journey data that contains enough information to track the movements of individual cyclists across London for a six-month period just over a year ago.
I’ll also explore how this dataset could be linked with other datasets to identify the actual people who made each of these journeys, and the privacy concerns this kind of linking raises.
It probably won’t surprise you to learn that there is a publicly available Transport For London dataset that contains records of bike journeys for London’s bicycle hire scheme. What may surprise you is that this record includes unique customer identifiers, as well as the location and date/time for the start and end of each journey. The public dataset currently covers a period of six months between 2012 and 2013.
What are the consequences of this? It means that someone who has access to the data can extract and analyse the journeys made by individual cyclists within London during that time, and with a little effort, it’s possible to find the actual people who have made the journeys.
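To make the extraction step concrete, here is a minimal sketch of how journeys could be grouped by customer identifier. The rows and the field order are made up for illustration and do not reflect the real Transport for London schema. The only thing taken from the article is that each record carries a customer identifier plus the location and date/time of the start and end of the journey.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows, in the assumed order:
# (customer_id, start_station, start_time, end_station, end_time)
SAMPLE_JOURNEYS = [
    ("cust-17", "Station A", "2012-09-03T08:02", "Station B", "2012-09-03T08:19"),
    ("cust-42", "Station C", "2012-09-03T09:10", "Station D", "2012-09-03T09:31"),
    ("cust-17", "Station B", "2012-09-03T17:45", "Station A", "2012-09-03T18:01"),
    ("cust-17", "Station A", "2012-09-04T08:05", "Station B", "2012-09-04T08:21"),
]


def journeys_by_customer(records):
    """Group trips under each customer ID, ordered by start time."""
    grouped = defaultdict(list)
    for customer_id, *trip in records:
        grouped[customer_id].append(tuple(trip))
    for trips in grouped.values():
        # ISO 8601 timestamps sort chronologically as plain strings.
        trips.sort(key=lambda t: t[1])
    return dict(grouped)


def most_common_route(trips):
    """Return the (start, end) station pair a customer used most often."""
    routes = Counter((start, end) for start, _, end, _ in trips)
    return routes.most_common(1)[0][0]


grouped = journeys_by_customer(SAMPLE_JOURNEYS)
print(most_common_route(grouped["cust-17"]))
```

A repeated morning route like the one for the made-up customer `cust-17` is the kind of pattern that hints at a home or workplace, which is why a stable identifier in a public dataset is a privacy concern.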
Five years ago, a team of researchers from Google announced a remarkable achievement in one of the world’s top scientific journals, Nature. Without needing the results of a single medical check-up, they were nevertheless able to track the spread of influenza across the US. What’s more, they could do it more quickly than the Centers for Disease Control and Prevention (CDC). Google’s tracking had only a day’s delay, compared with the week or more it took for the CDC to assemble a picture based on reports from doctors’ surgeries. Google was faster because it was tracking the outbreak by finding a correlation between what people searched for online and whether they had flu symptoms.
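The idea of a correlation between search activity and illness can be shown with a small sketch. It computes a Pearson correlation coefficient between two weekly series. The numbers are invented for illustration, and this is not a description of Google's actual method, which the passage above does not detail.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly figures: flu-related searches and reported cases.
weekly_searches = [120, 150, 310, 480, 450, 260]
weekly_cases = [30, 41, 85, 140, 128, 70]

r = pearson(weekly_searches, weekly_cases)
print(f"correlation: {r:.3f}")
```

A coefficient close to 1 means the two series rise and fall together. It says nothing about why they do, so a strong correlation alone does not show that the searches were made by people who actually had flu.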
As researchers contemplate mining the students’ details, however, the university is grappling with ethical issues raised by the collection and analysis of these huge data sets, known familiarly as Big Data, said L. Rafael Reif, the president of M.I.T.
For instance, he said, serious privacy breaches could hypothetically occur if someone were to correlate the personal forum postings of online students with institutional records that the university had de-identified for research purposes.