July 20, 2012

The Future of Big Data

Experts predict big possibilities for ‘Big Data’ – and some problems, too

While many see promise in the future of data analysis, some fear that work with gigantic stores of information could lead to privacy abuses and mistaken forecasts

The growing technological ability to collect and analyze massive sets of information, known as Big Data, could lead to revolutionary changes in business, political and social enterprises, according to a new survey of Internet experts and stakeholders.

But while leading technologists and researchers around the world look forward to the positive impact of Big Data, many also worry about potential drawbacks.

A new Pew Internet/Elon University survey of 1,021 Internet experts, observers and stakeholders measured current opinions about the potential impact of human and machine analysis of newly emerging large data sets in the years ahead. The survey is an opt-in, online canvassing. Some 53% of those surveyed predicted that the rise of Big Data is likely to be “a huge positive for society in nearly all respects” by the year 2020. Some 39% of survey participants said it is likely to be “a big negative.”

“The analysts who expect we will see a mostly positive future say collection and analysis of Big Data will improve our understanding of ourselves and the world,” said researcher Lee Rainie, director of the Pew Research Center’s Internet & American Life Project. “They predict that the continuing development of real-time data analysis and enhanced pattern recognition could bring revolutionary change to personal life, to the business world and to government.”

Survey respondent Hal Varian, chief economist at Google, said, “This is likely to lead to a better informed, more pro-active fiscal and monetary policy.” Bryan Trogdan, a consultant and entrepreneur, said, “Big Data is the new oil.” And David Weinberger of Harvard University’s Berkman Center observed, “We are just beginning to understand the range of problems Big Data can solve, even though it means acknowledging that we’re less unpredictable, free, madcap creatures than we’d like to think. It also raises the prospect that some of our most important knowledge will consist of truths we can’t understand because our pathetic human brains are just too small.”

As with all technological evolution, the experts also anticipate some negative outcomes. “The experts responding to this survey noted that the people controlling the resources to collect, manage and sort large data sets are generally governments or corporations with their own agendas to meet,” said Janna Anderson, director of Elon’s Imagining the Internet Center and a co-author of the study. “They also say there’s a glut of data and a shortage of human curators with the tools to sort it well, there are too many variables to be considered, the data can be manipulated or misread, and much of it is proprietary and unlikely to be shared.”

Survey participant John Pike, director of GlobalSecurity.org, said, “The world is too complicated to be usefully encompassed in such an undifferentiated Big Idea. Whose ‘Big Data’ are we talking about? Wall Street, Google, the NSA? I am small, so generally I do not like Big.”

Survey respondent danah boyd, a Microsoft research scientist and expert on the societal impacts of the Internet, observed, “The Internet magnifies the good, bad and ugly of everyday life. Of course these things will be used for good. And of course they’ll be used for bad and ugly. Science fiction gives us plenty of templates for imagining where that will go. What will be interesting is how social dynamics, economic exchange and information access are inflected in new ways that open up possibilities that we cannot yet imagine. This will mean a loss of some aspects of society that we appreciate but also usher in new possibilities.”

This is the seventh report generated out of an analysis of the results of a Web-based survey fielded in fall 2011 to gather opinions on eight Internet issues from a select group of experts and the highly engaged Internet public. (Details can be found here: http://www.elon.edu/e-web/predictions/expertsurveys/)

Following is a wide-ranging selection of respondents’ remarks:

“I’m a big believer in nowcasting. Nearly every large company has a real-time data warehouse and has more timely data on the economy than our government agencies. In the next decade we will see a public/private partnership that allows the government to take advantage of some of these private sector data stores. This is likely to lead to a better informed, more pro-active fiscal and monetary policy.” —Hal Varian, chief economist at Google

“Big Data allows us to see patterns we have never seen before. This will clearly show us interdependence and connections that will lead to a new way of looking at everything. It will let us see the ‘real-time’ cause and effect of our actions. What we buy, eat, donate, and throw away will be visual in a real-time map to see the ripple effect of our actions. That could only lead to more-conscious behavior.” —Tiffany Shlain, director and producer of the film ‘Connected’ and founder of The Webby Awards

“Global climate change will make it imperative that we proceed in this direction of nowcasting to make our societies more nimble and adaptive to both human-caused environmental events and extreme weather events or decadal scale changes. Coupled with the data, though, we must have a much better understanding of decision making, which means extending knowledge about cognitive biases, about boundary work (scientists, citizens, and policymakers working together to weigh options on the basis not only of empirical evidence but also of values).” —Gina Maranto, co-director for ecosystem science and coordinator, graduate program in environmental science at the University of Miami

“Media and regulators are demonizing Big Data and its supposed threat to privacy. Such moral panics have occurred often thanks to changes in technology…But the moral of the story remains: there is value to be found in this data, value in our newfound publicness. Google’s founders have urged government regulators not to require them to quickly delete searches because, in their patterns and anomalies, they have found the ability to track the outbreak of the flu before health officials could and they believe that by similarly tracking a pandemic, millions of lives could be saved. Demonizing data, big or small, is demonizing knowledge, and that is never wise.” —Jeff Jarvis, professor, pundit and blogger

“Large, publicly available data sets, easier tools, wider distribution of analytics skills, and early stage artificial intelligence software will lead to a burst of economic activity and increased productivity comparable to that of the Internet and PC revolutions of the mid to late 1990s. Social movements will arise to free up access to large data repositories, to restrict the development and use of AIs, and to ‘liberate’ AIs.” —Sean Mead, director of analytics at Mead, Mead & Clark, Interbrand

“The world is too complicated to be usefully encompassed in such an undifferentiated Big Idea. Whose ‘Big Data’ are we talking about? Wall Street, Google, the NSA? I am small, so generally I do not like Big.” —John Pike, director of GlobalSecurity.org

“We can now make catastrophic miscalculations in nanoseconds and broadcast them universally. We have lost the balance inherent in ‘lag time.’” —Marcia Richards Suelzer, senior analyst at Wolters Kluwer

“Better information is seldom the solution to any real-world social problems. It may be the solution to lots of business problems, but it’s unlikely that the benefits will accrue to the public. We’re more likely to lose privacy and freedom from the rise of Big Data.” —Barry Parr, owner and analyst for MediaSavvy

“Big Data will not be so big. Most data will remain proprietary, or reside in incompatible formats and inaccessible databases where it cannot be used in ‘real time.’ The gap between what is theoretically possible and what is done (in terms of using real-time data to understand and forecast cultural, economic and social phenomena) will continue to grow.” —Jeff Eisenach, managing director, Navigant Economics LLC, a consulting business; formerly a senior policy expert with the US Federal Trade Commission

“Never underestimate the stupidity and basic sinfulness of humanity.” —Tom Rule, educator, technology consultant, and musician based in Macon, Georgia

“More information will be beneficial in all sorts of ways we can’t even fathom right now. Namely because we don’t have the data.” —John Capone, freelance writer and journalist; former editor of MediaPost Communications publications

“The huge prospects for the ‘Internet of Things’ tip me to checking the first choice. I tend to think of the Internet of Things as multiplying points of interactivity—sensors and/or actuators—throughout the social landscape. As the cost of connectivity goes down the number of these points will go up, diffusing intelligence everywhere.” —Fred Hapgood, technology author and consultant; moderator of the Nanosystems Interest Group at MIT in the 1990s

“Data that is much more available in quantity, cost, and quality will be a marked feature of the coming decade, but much of that will be ‘Little Data,’ which is useful mostly or entirely only locally (for practical or privacy concerns). I will want data possibly related to my health kept as private as possible. My house should enable control for light, heat, sound, image, etc. that enhances my experiences and convenience, and saves resources. For example, lighting will increasingly respond to occupancy or ‘presence’ (not just that someone is present, but who they are, how many they are, and what activity engaged in), and so provide better lighting services, automatically, and at less net energy than before. However, who outside the building should care about the details? No one. Big Data will be a net plus, but a sizeable amount of problems will be created by it as well, particularly around security and privacy.” —Bruce Nordman, research scientist at Lawrence Berkeley National Laboratory

“Big Data should be developed within a context of openness and improved understandings of dynamic, complex whole ecosystems. There are difficult matters that must be addressed, which will take time and support, including: public and private sector entities agreeing to share data; providing frequently updated meta-data; openness and transparency; cost recovery; and technical standards.” —Richard Lowenberg, director, broadband planner 1st-Mile Institute; network activist since early 1970s

“The real power of ‘Big Data’ will come depending largely on the degree to which it is held in private hands or openly available. Openly available data, and widespread tools for manipulating it, will create new ways of understanding and governing ourselves as individuals and as societies.” —Alex Halavais, associate professor at Quinnipiac University; vice president of the Association of Internet Researchers; author of Search Engine Society

“In order for Big Data to have a positive impact on society overall, it has to be transparent. Ordinary citizens would have to be able to query the data set and discover real answers, regardless of the light that shows on individuals or corporations or governments. There is too much at stake for these parties to allow open, transparent access to this data. As long as some data sets or parts of data sets are hidden, there is room for misuse and manipulation. I think this manipulation is sure to take place. Unless Big Data is democratized on a massive scale, it will overall have a negative impact on society. Right now, I don’t see much hope for such a democratization.” —Nathan Swartzendruber, technology education at SWON Libraries Consortium

Respondents were allowed to keep their remarks anonymous if they chose to do so. Following are predictive statements selected from the hundreds of anonymous comments from survey participants:

“If Big Data is not also Wide Data (that is, dispersed among as many players and citizens as possible) then it will be a negative overall.”

“The few people who will understand the dangers of ‘Big Data’ will have high cognitive abilities and training. The general population will continue to rely on crappy results because they know no better.”

“Collection is likely to be imperceptible to most, unless law and regulation make it overt and provide the individual choice. Analysis likely will suffer from a divorce in knowledge and context between the orderers and the providers of the analysis. No example currently is better than that between the avaricious ignorance of bank executives and the technologists’ naiveté about the realities of collateralized debt obligations. Reliance will lead to increasingly unstable processes where only those able to use Big Data will be able to protect themselves, with the individual increasingly at risk. Rapid program stock trading is a current, pernicious example.”

“We will become more addicted to what the databases tell us. It might impair risk-taking for the good. We’ll depend more on models than instincts.”

“Big Data is not well matched to tiny minds. The data sets now exceed the capabilities of most businesspeople to know what to do with, about, and for the data. This will lead to huge abuse and misapplication.”

“We still haven’t figured out the implications of chaos theory, and if ‘Big Data’ and futurecasting aren’t perfect examples of chaos-based information, then I don’t know what is. Generically, we’re not prepared for this great a lack of privacy; we’re even less prepared for data of this magnitude available only to the powerful, rich, or connected.”

“Legal protections for the citizenry (in those jurisdictions which are not decidedly autocratic) are lacking, and will be essential to prevent corporate or governmental abuse of the insights available about people through widely aggregated data, as well as through new surveillance techniques.”

“The old lesson that correlation is not causation seems never to be learned. The control over data means that inaccurate data is hard to identify and correct. I see that the problems will only increase with the size of the datasets. Most emphasis seems to be given to doing clever things with data rather than ensuring its validity or giving the right people control over it.”

“The fact that most data is unstructured is a huge issue, and I doubt that we will solve the problems associated with getting meaning from that morass.” Another anonymous survey participant wrote, “Certainly in 2020 Big Data will be more risky than trustworthy. We just won’t have enough experience—the equivalent of the 100-year flood in forecasting terms—and so our systems will ‘look good’ on some basic problems but prove to make whoppers of mistakes.”

The findings reflect the reactions in an online, opt-in survey of a diverse set of 1,021 technology stakeholders and critics who were asked to choose one of two provided scenarios and explain their choice. While 53 percent selected the statement that Big Data will be “a huge positive for society in nearly all respects,” a significant number of the survey participants who selected that scenario said the true outcome will be a little bit of both scenarios, and many said that, while they chose the first scenario as a “vote” for what they hope will happen, they actually expect the outcome will be closer to the second scenario.

53% agreed with the statement:

Thanks to many changes, including the building of “the Internet of Things,” human and machine analysis of large data sets will improve social, political, and economic intelligence by 2020. The rise of what is known as “Big Data” will facilitate things like “nowcasting” (real-time “forecasting” of events); the development of “inferential software” that assesses data patterns to project outcomes; and the creation of algorithms for advanced correlations that enable new understanding of the world. Overall, the rise of Big Data is a huge positive for society in nearly all respects.

39% agreed with the alternate statement, which posited:

Thanks to many changes, including the building of “the Internet of Things,” human and machine analysis of Big Data will cause more problems than it solves by 2020. The existence of huge data sets for analysis will engender false confidence in our predictive powers and will lead many to make significant and hurtful mistakes. Moreover, analysis of Big Data will be misused by powerful people and institutions with selfish agendas who manipulate findings to make the case for what they want. And the advent of Big Data has a harmful impact because it serves the majority (at times inaccurately) while diminishing the minority and ignoring important outliers. Overall, the rise of Big Data is a big negative for society in nearly all respects.

Note: A total of 8% did not respond. The survey results are based on a non-random online sample of 1,021 Internet experts and other Internet users, recruited via email invitation, conference invitation, or link shared on Twitter, Google Plus or Facebook. Since the data are based on a non-random sample, a margin of error cannot be computed, and the results are not projectable to any population other than the people participating in this sample. The “predictive” scenarios used in this tension pair were created to elicit thoughtful responses to commonly found speculative futures thinking on this topic in 2011; this is not a formal forecast. Many respondents remarked that both scenarios will happen to a certain degree.

The Imagining the Internet Center (http://www.imaginingtheInternet.org) is an initiative of Elon University’s School of Communications. The center’s research holds a mirror to humanity’s use of communications technologies, informs policy development, exposes potential futures and provides a historic record. Imagining the Internet is directed by Janna Quitney Anderson, an associate professor of communications.

The Pew Research Center’s Internet & American Life Project (http://www.pewInternet.org), directed by Lee Rainie, is a nonprofit, non-partisan “fact tank” that provides information on the issues, attitudes and trends shaping America and the world. It produces reports exploring the impact of the Internet on families, communities, work and home, daily life, education, health care and civic and political life.