We swim in a sea of data … and the sea level is rising rapidly.
Tens of millions of connected people, billions of sensors, trillions of transactions now work to create unimaginable amounts of information. An equivalent amount of data is generated by people simply going about their lives, creating what the McKinsey Global Institute calls “digital exhaust”—data given off as a byproduct of other activities such as their Internet browsing and searching or moving around with their smartphone in their pocket.
Human-created information is only part of the story, and a shrinking part at that. Machines and implanted sensors in oceans, in the soil, in pallets of products, in gambling casino chips, in pet collars, and countless other places are generating data and sharing it directly with data “readers” and other machines, with no human intervention involved.
The projected growth of data from all kinds of sources is staggering—to the point where some worry that in the foreseeable future our digital systems of storage and dissemination will not be able to keep up with the simple act of finding places to keep the data and move it around to all those who are interested in it.
Government leaders, scientists, corporate leaders, health officials, and education specialists are anxious to see if new kinds of analysis of large data sets can yield insights into how people behave, what they might buy, and how they might respond to new products, services, and public policy programs.
In March 2012, the White House Office of Science and Technology Policy (OSTP) announced a Big Data Research and Development Initiative, reporting that six U.S. government agencies would spend more than $200 million to help the government better organize and analyze large volumes of digital data. The project is designed to focus on building technologies to collect, store and manage huge quantities of data. OSTP wants to use the technology to accelerate discovery in science and engineering fields and improve national security and education, the White House said.
How could Big Data be significant? A 2011 industry report by global management consulting firm McKinsey argued that five new kinds of value might come from abundant data: 1) creating transparency in organizational activities that can be used to increase efficiency; 2) enabling more thorough analysis of employee and systems performances in ways that allow experiments and feedback; 3) segmenting populations in order to customize actions; 4) replacing/supporting human decision making with automated algorithms; and 5) innovating new business models, products, and services.
“Our research finds that data can create significant value for the world economy, enhancing the productivity and competitiveness of companies and the public sector and creating substantial economic surplus for consumers. For instance, if US health care could use Big Data creatively and effectively to drive efficiency and quality, we estimate that the potential value from data in the sector could be more than $300 billion in value every year, two-thirds of which would be in the form of reducing national health care expenditures by about 8%. In the private sector, we estimate, for example, that a retailer using Big Data to the full has the potential to increase its operating margin by more than 60%.”
Indeed, the race to come up with special analytics and algorithms for working with data is driving more and more corporate activity. As the Economist reported:
“Data are becoming the new raw material of business: an economic input almost on a par with capital and labour. ‘Every day I wake up and ask, “how can I flow data better, manage data better, analyze data better?”’ says Rollin Ford, the CIO of Wal-Mart.”
Craig Mundie, chief research and strategy officer at Microsoft, was quoted later in the same story musing about the emergence of a “data-centered economy.”
While enthusiasts see great potential for using Big Data, privacy advocates are worried as more and more data is collected about people—both as they knowingly disclose things through venues such as social media postings and as they unknowingly share digital details about themselves as they march through life. Not only do the advocates worry about profiling, they also worry that those who crunch Big Data with algorithms might draw the wrong conclusions about who someone is, how she might behave in the future, and how to apply the correlations that will emerge in the data analysis.
There are also plenty of technical problems. Much of the data being generated now is “unstructured” and sloppily organized. Getting it into shape for analysis is no tiny task.
Imagine where we might be in 2020. The Pew Research Center’s Internet & American Life Project and Elon University’s Imagining the Internet Center asked digital stakeholders to weigh two scenarios for 2020, select the one most likely to evolve, and elaborate on the choice. One sketched out a relatively positive future where Big Data are drawn together in ways that will improve social, political, and economic intelligence. The other expressed the view that Big Data could cause more problems than it solves between now and 2020.
Respondents to our query rendered a decidedly split verdict.
53% agreed with the first statement:
Thanks to many changes, including the building of "the Internet of Things," human and machine analysis of large data sets will improve social, political, and economic intelligence by 2020. The rise of what is known as "Big Data" will facilitate things like "nowcasting" (real-time "forecasting" of events); the development of "inferential software" that assesses data patterns to project outcomes; and the creation of algorithms for advanced correlations that enable new understanding of the world. Overall, the rise of Big Data is a huge positive for society in nearly all respects.
39% agreed with the second statement, which posited:
Thanks to many changes, including the building of "the Internet of Things," human and machine analysis of Big Data will cause more problems than it solves by 2020. The existence of huge data sets for analysis will engender false confidence in our predictive powers and will lead many to make significant and hurtful mistakes. Moreover, analysis of Big Data will be misused by powerful people and institutions with selfish agendas who manipulate findings to make the case for what they want. And the advent of Big Data has a harmful impact because it serves the majority (at times inaccurately) while diminishing the minority and ignoring important outliers. Overall, the rise of Big Data is a big negative for society in nearly all respects.
Respondents were not allowed to select both scenarios; the question was framed this way in order to encourage a spirited and deeply considered written elaboration about the potential of a future with unimaginable amounts of data available to people and organizations. While about half agreed with the statement that Big Data will yield a positive future, many who chose that view observed that this choice is their hope more than their prediction. A significant number of the survey participants said that, whichever scenario they chose, they expect the true outcome in 2020 will be a little bit of both.
Respondents were asked to read the alternative visions and give narrative explanations for their answers using the following guideline questions: “What impact will Big Data have in 2020? What are the positives, negatives, and shades of grey in the likely future you anticipate? How will use of Big Data change analysis of the world, change the way business decisions are made, change the way that people are understood?”
Here are some of the major themes and arguments they made:
Those who see mostly positives for the future of Big Data share the upside
By 2020, the use of Big Data will improve our understanding of ourselves and the world.
“Media and regulators are demonizing Big Data and its supposed threat to privacy,” noted Jeff Jarvis, professor, pundit and blogger. “Such moral panics have occurred often thanks to changes in technology...But the moral of the story remains: there is value to be found in this data, value in our newfound publicness. Google's founders have urged government regulators not to require them to quickly delete searches because, in their patterns and anomalies, they have found the ability to track the outbreak of the flu before health officials could and they believe that by similarly tracking a pandemic, millions of lives could be saved. Demonizing data, big or small, is demonizing knowledge, and that is never wise.”
Sean Mead, director of analytics at Mead, Mead & Clark, Interbrand, added: “Large, publicly available data sets, easier tools, wider distribution of analytics skills, and early stage artificial intelligence software will lead to a burst of economic activity and increased productivity comparable to that of the Internet and PC revolutions of the mid to late 1990s. Social movements will arise to free up access to large data repositories, to restrict the development and use of AIs, and to 'liberate' AIs.”
David Weinberger of Harvard University’s Berkman Center observed, “We are just beginning to understand the range of problems Big Data can solve, even though it means acknowledging that we're less unpredictable, free, madcap creatures than we'd like to think. It also raises the prospect that some of our most important knowledge will consist of truths we can't understand because our pathetic human brains are just too small.”
“Big Data is the new oil,” said Bryan Trogdon, an entrepreneur and user-experience professional. “The companies, governments, and organizations that are able to mine this resource will have an enormous advantage over those that don't. With speed, agility, and innovation determining the winners and losers, Big Data allows us to move from a mindset of 'measure twice, cut once' to one of 'place small bets fast.'”
“Nowcasting,” real-time data analysis, and pattern recognition will surely get better.
Hal Varian, chief economist at Google, wrote: “I'm a big believer in nowcasting. Nearly every large company has a real-time data warehouse and has more timely data on the economy than our government agencies. In the next decade we will see a public/private partnership that allows the government to take advantage of some of these private-sector data stores. This is likely to lead to a better informed, more pro-active fiscal and monetary policy.”
“Global climate change will make it imperative that we proceed in this direction of nowcasting to make our societies more nimble and adaptive to both human-caused environmental events and extreme weather events or decadal scale changes,” wrote Gina Maranto, co-director for ecosystem science and coordinator, graduate program in environmental science at the University of Miami. “Coupled with the data, though, we must have a much better understanding of decision making, which means extending knowledge about cognitive biases, about boundary work (scientists, citizens, and policymakers working together to weigh options on the basis not only of empirical evidence but also of values).”
And Tiffany Shlain, director and producer of the film Connected and founder of The Webby Awards, maintained: “Big Data allows us to see patterns we have never seen before. This will clearly show us interdependence and connections that will lead to a new way of looking at everything. It will let us see the ‘real-time’ cause and effect of our actions. What we buy, eat, donate, and throw away will be visual in a real-time map to see the ripple effect of our actions. That could only lead to more-conscious behavior.”
The good of Big Data will outweigh the bad. User innovation could lead the way, with “do-it-yourself analytics.”
“The Internet magnifies the good, bad, and ugly of everyday life,” said danah boyd, senior researcher for Microsoft Research. “Of course these things will be used for good. And of course they'll be used for bad and ugly. Science fiction gives us plenty of templates for imagining where that will go. But that dichotomy gets us nowhere. What will be interesting is how social dynamics, economic exchange, and information access are inflected in new ways that open up possibilities that we cannot yet imagine. This will mean a loss of some aspects of society that we appreciate but also usher in new possibilities.”
“Do-it-yourself analytics will help more people analyze and forecast than ever before,” observed Marjory S. Blumenthal, associate provost at Georgetown University and adjunct staff officer at RAND. “This will have a variety of societal benefits and further innovation. It will also contribute to new kinds of crime.”
Some say the limitations of Big Data must be recognized
Open access to tools and data “transparency” are necessary for people to provide information checks and balances. Are they enough?
“Big Data gives me hope about the possibilities of technology,” said Tom Hood, CEO of the Maryland Association of CPAs. “Transparency, accountability, and the ‘wisdom of the crowd’ are all possible with the advent of Big Data combined with the tools to access and analyze the data in real time.”
Richard Lowenberg, director and broadband planner for the 1st-Mile Institute, urged, “Big Data should be developed within a context of openness and improved understandings of dynamic, complex whole ecosystems. There are difficult matters that must be addressed, which will take time and support, including: public- and private-sector entities agreeing to share data; providing frequently updated meta-data; openness and transparency; cost recovery; and technical standards.”
The Internet of Things will diffuse intelligence, but lots of technical hurdles must be overcome.
Fred Hapgood, a tech consultant who ran MIT’s Nanosystems group in the 1990s, said, “I tend to think of the Internet of Things as multiplying points of interactivity—sensors and/or actuators—throughout the social landscape. As the cost of connectivity goes down the number of these points will go up, diffusing intelligence everywhere.”
An anonymous respondent wrote, “With the right legal and normative framework, the Internet of Things should make an astounding contribution to human life. The biggest obstacles to success are technological and behavioral; we need a rapid conversion to IPv6, and we need cooperation among all stakeholders to make the Internet of Things work. We also need global standards, not just US standards and practices, which draw practical and effective lines about how such a data trove may and may not be used consistent with human rights.”
An anonymous survey participant said, “Apparently this 'Internet of Things' idea is beginning to encourage yet another round of cow-eyed Utopian thinking. Big Data will yield some successes and a lot of failures, and most people will continue merely to muddle along, hoping not to be mugged too frequently by the well-intentioned (or not) entrepreneurs and bureaucrats who delight in trying to use this shiny new toy to fix the world.”
In the end, humans just won’t be able to keep up
Jeff Eisenach, managing director, Navigant Economics LLC, a consulting business, formerly a senior policy expert with the US Federal Trade Commission, had this to say: “Big Data will not be so big. Most data will remain proprietary, or reside in incompatible formats and inaccessible databases where it cannot be used in 'real time.' The gap between what is theoretically possible and what is done (in terms of using real-time data to understand and forecast cultural, economic, and social phenomena) will continue to grow.”
Humans, rather than machines, will still be the most capable of extracting insight and making judgments using Big Data. Statistics can still lie.
“By 2020, most insights and significant advances will still be the result of trained, imaginative, inquisitive, and insightful minds,” wrote Donald G. Barnes, visiting professor at Guangxi University in China.
David D. Burstein, founder of Generation18, a youth-run voter-engagement organization, said, “As long as the growth of Big Data is coupled with growth of refined curation and curators it will be an asset. Without those curators the data will become more and more plentiful, more overwhelming and [it will] confuse our political and social conversations by an overabundance of numbers that can make any point we want to make them make.”
Those who see mostly negatives between now and 2020 share the downside
Take off the rose-colored glasses: Big Data has the potential for significant negative impacts that may be impossible to avoid. “How to Lie with the Internet of Things” will be a best-seller.
“There is a need to think a bit more about the distribution of the harms that flow from the rise of big, medium, and little data gatherers, brokers, and users,” observed communications expert Oscar Gandy. “If ‘Big Data’ could be used primarily for social benefit, rather than the pursuit of profit (and the social-control systems that support that effort), then I could ‘sign on’ to the data-driven future and its expression through the Internet of Things.”
“We can now make catastrophic miscalculations in nanoseconds and broadcast them universally. We have lost the balance inherent in 'lag time,'” added Marcia Richards Suelzer, senior analyst at Wolters Kluwer.
An anonymous survey participant wrote, “Big Data will generate misinformation and will be manipulated by people or institutions to display the findings they want. The general public will not understand the underlying conflicts and will naively trust the output. This is already happening and will only get worse as Big Data continues to evolve.” Another anonymous respondent joked, “Upside: How to Lie with the Internet of Things becomes an underground bestseller.”
We won’t have the human or technological capacity to analyze Big Data accurately and efficiently by 2020.
“A lot of 'Big Data' today is biased and missing context, as it's based on convenience samples or subsets,” said Dan Ness, principal research analyst at MetaFacts. “We're seeing valiant, yet misguided attempts to apply the deep datasets to things that have limited relevance or applicability. They're being stretched to answer the wrong questions. I'm optimistic that by 2020, this will be increasingly clear and there will be true information pioneers who will think outside the Big Data box and base decisions on a broader and balanced view. Instead of relying on the 'lamppost light,' they will develop and use the equivalent of focused flashlights.”
Mark Watson, senior engineer for Netflix, said, “I expect this will be quite transformative for society, though perhaps not quite in just the next eight years.”
And Christian Huitema, distinguished engineer with Microsoft, said, “It will take much more than ten years to master the extraction of actual knowledge from Big Data sets.”
Respondents are concerned about the motives of governments and corporations, the entities that have the most data and the incentive to analyze it. Manipulation and surveillance are at the heart of their Big Data agendas.
“The world is too complicated to be usefully encompassed in such an undifferentiated Big Idea. Whose ‘Big Data’ are we talking about? Wall Street, Google, the NSA? I am small, so generally I do not like Big,” wrote John Pike, director of GlobalSecurity.org.
An anonymous survey participant wrote, “Data aggregation is growing today for two main purposes: National security apparatus and ever-more-focused marketing (including political) databases. Neither of these are intended for the benefit of individual network users but rather look at users as either potential terrorists or as buyers of goods and services.”
Another anonymous respondent said, “Money will drive access to large data sets and the power needed to analyze and act on the results of the analysis. The end result will, in most cases, be more effective targeting of people with the goal of having them consume more goods, which I believe is a negative for society. I would not call that misuse, but I would call it a self-serving agenda.”
Another wrote, “It is unquestionably a great time to be a mathematician who is thrilled by unwieldy data sets. While many can be used in constructive, positive ways to improve life and services for many, Big Data will predominantly be used to feed people ads based on their behavior and friends, to analyze risk potential for health and other forms of insurance, and to essentially compartmentalize people and expose them more intensely to fewer and fewer things.”
The rich will profit from Big Data and the poor will not.
Brian Harvey, a lecturer at the University of California-Berkeley, wrote, “The collection of information is going to benefit the rich, at the expense of the poor. I suppose that for a few people that counts as a positive outcome, but your two choices should have been ‘will mostly benefit the rich’ or ‘will mostly benefit the poor,’ rather than ‘good for society’ and ‘bad for society.’ There's no such thing as ‘society.’ There's only wealth and poverty, and class struggle. And yes, I know about farmers in Africa using their cell phones to track prices for produce in the big cities. That's great, but it's not enough.”
Frank Odasz, president of Lone Eagle Consulting, said, “The politics of control and the politics of appearances will continue to make the rich richer and diminish the grassroots and disenfranchised until the politics of transparency make it necessary for the top down to partner meaningfully with the bottom up in visible, measurable ways. The grassroots boom in bottom-up innovation will increasingly find new ways to self-organize as evidenced in 2011 by the Occupy Wall Street and Arab Spring movements.”
Purposeful education about Big Data might include priming for the anticipation of manipulation. Maybe trust features can be built in.
Heywood Sloane, principal at CogniPower, said, “This isn't really a question about the Internet or Big Data—it's a question about who and how much people might abuse it (or anything else), intentionally or otherwise. That is a question that is always there—thus there is a need for countervailing forces, competition, transparency, scrutiny, and/or other ways to guard against abuse. And then be prepared to misjudge sometimes.”
“Never underestimate the stupidity and basic sinfulness of humanity,” reminded Tom Rule, educator, technology consultant, and musician based in Macon, Georgia.
Barry Parr, owner and analyst for MediaSavvy, contributed this thought: “Better information is seldom the solution to any real-world social problems. It may be the solution to lots of business problems, but it's unlikely that the benefits will accrue to the public. We're more likely to lose privacy and freedom from the rise of Big Data.”
And an anonymous respondent commented, “Data is misused today for many reasons, the solution is not to restrict the collection of data, but rather to raise the level of awareness and education about how data can be misused and how to be confident that data is being fairly represented and actually answers the questions you think it does.”
Some share comprehensive views
A number of respondents articulated a view that could be summarized as: Humans seem to think they know more than they actually know. Still, despite all of our flaws, this new way of looking at the big picture could help. One version of this kind of summary thought was written by Stowe Boyd, principal at Stowe Boyd and The Messengers, a research, consulting, and media business based in New York City:
Overall, the growth of the ‘Internet of Things’ and ‘Big Data’ will feed the development of new capabilities in sensing, understanding, and manipulating the world. However, the underlying analytic machinery (like Bruce Sterling's Engines of Meaning) will still require human cognition and curation to connect dots and see the big picture.
And there will be dark episodes, too, since the brightest light casts the darkest shadow. There are opportunities for terrible applications, like the growth of the surveillance society, where the authorities watch everything and analyze our actions, behavior, and movements looking for patterns of illegality, something like a real-time Minority Report.
On the other side, access to more large data can also be a blessing, so social advocacy groups may be able to amass information at a low- or zero-cost that would be unaffordable today. For example, consider the bottom-up creation of an alternative food system, outside the control of multinational agribusiness, and connecting local and regional food producers and consumers. Such a system, what I and others call Food Tech, might come together based on open data about people's consumption, farmers' production plans, and regional, cooperative logistics tools. So it will be a mixed bag, like most human technological advances.
The view expressed by Jerry Michalski, founder and president of Sociate and consultant for the Institute for the Future, weaves in the good, bad, and in between in a practical way:
Humans consistently seem to think they know more than they actually know in retrospect. Our understanding of technological effects, for example, lags by many decades the inexorable effects of implementation. See Jerry Mander's great page about whether we would have let the car drive our evolution as much as it did had we known the consequences back then (in his book In the Absence of the Sacred).
So the best-intentioned of humans will try to use Big Data to solve Big Problems, but are unlikely to do well at it. Big Ideas have driven innumerable bad decisions over time. Think of the Domino Theory, Eugenics, and racial superiority theories—even Survival of the Fittest. These all have led us into mess after mess.
Meanwhile, the worst-intentioned will have at hand immensely powerful ways to do harm, from hidden manipulation of the population to all sorts of privacy invasions. A bunch of dystopian sci-fi movies don't seem like they're that far away from our reality. Also, data coming out of fMRI experiments will convince us we know how people make decisions, leading to more mistaken policies.
There are a few bright spots on the horizon. When crowds of people work openly with one another around real data, they can make real progress. See Wikipedia, OpenStreetMap, CureTogether, PatientsLikeMe, and many other projects that weren't possible pre-Internet. We need small groups empowered by Big Data, then coordinating with other small groups everywhere to find what works pragmatically.
Finally, Google's use of Big Data has found remarkably simple answers to thorny problems like spell checking and translation, not to mention nascent insights on pandemic tracking and more. I fear Google's monolithic power, but admire their more clear-cut approach.
Finally, Patrick Tucker, deputy editor of The Futurist magazine and director of communications for the World Future Society, sees the range of changes adding up to a new dimension he calls the “knowable future” extracted from the things that machines know better about us than we know ourselves:
Computer science, data-mining, and a growing network of sensors and information-collection software programs are giving rise to a phenomenal occurrence, the knowable future. The rate by which we can predict aspects of the future is quickening as rapidly as is the spread of the Internet, because the two are inexorably linked. The Internet is turning prediction into an equation. In research centers across the country, mathematicians, statisticians, and computer scientists are using a global network of sensors and informational collection devices and programs to plot ever more credible and detailed forecasts and scenarios.
Computer-aided prediction comes in a wide variety of forms and guises, from AI programs that chart potential flu outbreaks to expensive (yet imperfect) quant algorithms that anticipate outbreaks of stock market volatility. But the basic process is not dramatically different from what plays out when the human brain makes a prediction. These systems analyze sensed data in the context of stored information to extrapolate a pattern the same way the early earthquake warning system used its network of sensors to detect the P wave and thus project the S wave.
What differs between these systems, between human predictors and machine predictors, is the sensing tools. Humans are limited to two eyes, two ears, and a network of nerve endings. Computers can sense via a much wider menagerie of data collection tools.
Many firms have gotten a lot better at predicting human patterns using those sense tools. Some, like Google, are already household names. In the coming years, Google is going to leverage the massive amount of user data that it collects on a minute-by-minute basis to extrapolate trends in human activity and thus predict future activity. Google has been doing this with some success in terms of flu for several years now with its popular Flu Trends program. It works exactly how you would imagine that it would. ‘We have found a close relationship between how many people search for flu-related topics and how many people actually have flu symptoms,’ says Google on its Flu Trends Web site. As Nicholas Christakis described in his book Connected, querying activity and social network activity can reveal infectious disease trends long before data on those trends is released to the public by prudent government agencies.
But does the same phenomenon, i.e., more querying equals more activity, hold true for subjects beyond influenza? Consider that in 2010 two Notre Dame researchers, Zhi Da and Pengjie (Paul) Gao, showed that querying activity around particular companies can, somewhat reliably, predict a stock price increase for those companies.
In many ways, Google is already in the process of becoming the world’s first prediction engine, since prediction is key to its business model anyway. Not everyone realizes that Google makes 28% of its revenue through its AdSense program, which shows different ads to different users on the basis of different search terms. Better personalization in terms of display ads is a function of prediction. Anticipating user behaviors, questions, and moods strikes at the very heart of Google’s mission to ‘organize the world’s information.’
…. Services like Facebook and Google+ may help us to understand a lot more about our lives and our relationships than we did before these services came into existence. But Facebook’s view into our lives and how our various social circles interact will always be clearer than will ours. The question becomes, who else gets to look through that microscope?
There are dangers associated with this phenomenon. Moveon.org president Eli Pariser, in his recently released book, The Filter Bubble, describes it as a type of ‘informational determinism,’ the inevitable result of too much Web personalization. The Filter Bubble is a state where ‘What you've clicked on in the past determines what you see next—a Web history you're doomed to repeat. You can get stuck in a static, ever-narrowing version of yourself--an endless you-loop.’
Google and Facebook are only the most obvious offenders. They’re conspicuous because they’re using that data to vend services to you. But you can always opt out of using Facebook, as millions already have. And while cutting Google out of your life isn’t as easy as it was a decade ago, there are ways to use Google anonymously, and, indeed, to find information without using it at all. These are networks we opt in or out of….
Futurist machines are taking over the job of inventing the future. Their predictions have consequences in the real world because our interaction with the future as individuals, groups, and nations is an expression of both personal and national identity. Regardless of what may or may not happen, the future as an idea continually shapes buying, voting, and social behavior. The future is becoming increasingly knowable. We sit on the verge of a potentially tremendous revolution in science and technology. But even those aspects of the future that are the most potentially beneficial to humankind will have disastrous effects if we fail to plan for them.