July 20, 2012

The Future of Big Data

Main Findings: Influence of Big Data in 2020

One major sign of the sanctification of Big Data as a topic of interest with vast potential emerged in March this year when the National Science Foundation and National Institutes of Health joined forces “to develop new methods to derive knowledge from data; construct new infrastructure to manage, curate and serve data to communities; and forge new approaches for associated education and training,” NSF Director Subra Suresh announced in a letter to researchers in engineering, computer science, and information science.1 He said the “program aims to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting information from large, diverse, distributed, and heterogeneous data sets in order to accelerate progress in science and engineering research.”

The effort could hardly begin soon enough. Many are excited about the prospects for analyzing Big Data. Rick Smolan, creator of the “Day in the Life” photography series, is in the middle of a project he calls “The Human Face of Big Data,” documenting the collection and uses of data. He says that Big Data has the potential to be “humanity’s dashboard,” an intelligent tool that can help combat poverty, crime, and pollution.2

Still, there is uncertainty about how effective it will be. One illustration is a recent survey of chief marketing officers at major corporations: 75% of survey respondents said they believed that leveraging data will help their companies dramatically improve their business, yet more than half said they currently lack the tools to mine true customer insights from the data generated by digital and offline efforts.3 In the survey, 58% of respondents said they lacked the skills and technology to perform analytics on marketing data, and more than 70% said they aren’t able to leverage the value of customer data.

Evidence of Big Data use already appears in everyday life:

  • Every time Google suggests a spelling change in a search query, it’s because previous queries on the same subject used different spellings that were found more useful. The firm’s analysis of trillions of search queries yields those spell-change suggestions.4 Google economist Hal Varian has talked about how the firm’s ability to spot trends in search queries allows it to forecast economic and public health trends.
  • Every time someone gets a call from a credit/debit card company about “unusual activity” on their cards, the call is arriving because firms are churning through billions of transactions looking for anomalies in consumer behavior that are potentially associated with fraud or identity theft.5
  • In April, Forbes ran through examples of Big Data operations at well-known firms:6 “Netflix, for example, takes all of its customers’ viewing habits and movie ratings and runs them through a sophisticated algorithm to generate the 5-star recommendation system tailored for each subscriber. Amazon.com does a form of this, too. Online dating site OKCupid generates a steady stream of often hilarious insights into modern romance by sifting through its user profiles looking for correlations. An iPhone app called Ness uses your own social network and preferences to generate a personalized restaurant search engine.”
  • The “Target Snafu” got major attention at an O’Reilly Strata conference on Big Data last spring. As Patrick Tucker blogged from the conference for the World Future Society: The New York Times reported in February that retailer Target “used customer data and predictive analytics to figure out that one of their customers was pregnant, and even more remarkably, what trimester she was in. They emailed her some promotional material and the girl’s father discovered his daughter was pregnant based on the coupons she started receiving from a big box retailer, which gave rise to an awkward conversation, no doubt.”7

This growing focus on Big Data prompted us to pose two scenarios eliciting expert views on how things might unfold by the year 2020.

After being asked to choose one of the two 2020 scenarios presented in this survey question, respondents were also asked, “What impact will Big Data have in 2020? What are the positives, negatives, and shades of grey in the likely future you anticipate? How will use of Big Data change analysis of the world, change the way business decisions are made, change the way that people are understood?”

A number of survey participants questioned the language used to describe the positive-outcome scenario. “Massive increases in the volume and availability of data will certainly improve the power of analytical and predictive tools, but there will not be some kind of major shift to more ‘knowable’ outcomes,” wrote an anonymous respondent. “Having more data does not change the fact that there are too many interoperating variables for meaningful prediction to be possible for many things—e.g. weather. To the extent that people are saying that certain things are more predictable than they used to be they are either lying or using magical thinking.”

Another wrote, “As part of the Big Data sector, I have only modest expectations of its positive impacts. There is very little evidence that there is growing practice of ‘evidence based decision-making.’” However, another anonymous respondent disagreed, saying, “As more people enter the digital age there will be more minds working to improve the way people communicate, curate information, and even predict events. Studying how people interact based on time, day, use of language, and comparing it to particular events it would not be impossible to say there could be patterns for identification of future events. If such a process of identifying algorithmic processes doesn’t occur by 2020, it will be well on its way.”

Others directed their predictions to the scenarios. What follows is a selection from the hundreds of written responses survey participants shared when answering this question. About half of the expert survey respondents elected to remain anonymous, not taking credit for their remarks. Because people’s expertise is an important element of their participation in the conversation, the formal report primarily includes the comments of those who took credit for what they said. The full set of expert responses, anonymous and not, can be found online at http://www.imaginingtheInternet.org. The selected statements that follow here are grouped under headings that indicate some of the major themes emerging from the overall responses.

By 2020 we should be seeing progress in the use of Big Data to improve our understanding of ourselves and the world

Many respondents felt sure that Big Data analysis will have progressed to the point in 2020 where practical, everyday applications of it will show up in people’s and organizations’ lives and provide help.

Bryan Trogdon, entrepreneur and Semantic Web evangelist, said, “Big Data is the new oil. The companies, governments and organizations that are able to mine this resource will have an enormous advantage over those that don’t. With speed, agility, and innovation determining the winners and losers, Big Data allows us to move from a mindset of ‘measure twice, cut once’ to one of ‘place small bets fast.’”

Paul Jones, clinical associate professor at the University of North Carolina-Chapel Hill, says a lot of evolution will take place in the next few years. “I expect misuse and regulation responding to that misuse in the near term,” he wrote. “By 2020 behaviors and actions surrounding Big Data will be normalized and a lot less scary and chaotic. The rewards that can be reaped from understanding the world through Big Data are giant and capable of changing society for the better.” Ross Rader, board member of the Canadian Internet Registration Authority, agreed. “Only by 2020 will we have enough of a fundamental understanding to start truly doing great things with Big Data. We will make a lot of mistakes in the next ten years, endure predictions about ‘The Death of Big Data’ and slowly but surely, we will develop the tools and understanding necessary to turn the rise of Big Data into a positive force for change.”

Amber Case, CEO of Geoloqi, expects positive progress. “When data can’t speak to each other, time and effort is wasted,” she pointed out. “In many cases the use of analytics is a way of understanding long-term trends or identifying emergent behavior that may evolve into long-term problems. As with any natural process, there will be mistakes and errors, but there will also be great benefits, one of which is the reduction of the time and space it takes to get work done or understand a process.”

The ongoing evolution of code is seen as a plus. Laura Lee Dooley, online engagement architect and strategist for the World Resources Institute, wrote, “Building on XML, we will enhance and enforce a structured language labeling method for data gathering so data can be incorporated into datasets more easily and seamlessly. This would allow more time for analysis, requiring less time for data formatting and cleaning. This will also enable researchers to quickly respond to information needs by providing mashups of data which can inform quick decision-making.”

Don Hausrath, retired from the US Information Agency, sees positives. “Big Data will prevail,” he wrote. “Be it the design of war strategies by a UNIVAC in Bethesda during the Vietnam War, to the design of the BART System in Berkeley, gaming systems using sophisticated data sets are better at identifying solutions. In fact, the use of non-traditional statistical analysis in the BART System contributed to the winning of a Nobel Prize in Economics to one of the consultants. It is absolutely false that Big Data will diminish our lives. The use of modern statistical analysis is such that nuanced results are not only possible, but routine.”

“More information will be beneficial in all sorts of ways we can’t even fathom right now, namely because we don’t have the data,” said John Capone, freelance writer and journalist, former editor of MediaPost Communications publications.

An anonymous respondent said, “Time to embrace something that is bigger than our brains, but also to put our brains to use to manage the input and control the analysis. A win-win. We think harder and get smarter.”

Some respondents shared their enthusiasm about the benefits of real-time data. Hal Varian, chief economist at Google, noted, “I’m a big believer in nowcasting. Nearly every large company has a real-time data warehouse and has more timely data on the economy than our government agencies. In the next decade we will see a public/private partnership that allows the government to take advantage of some of these private sector data stores. This is likely to lead to a better informed, more pro-active fiscal and monetary policy.”

Gina Maranto, co-director for ecosystem science and policy at the University of Miami, said, “I believe, with Hans Rosling, that the more data we analyze, the better off we will be. Global climate change will make it imperative that we proceed in this direction of nowcasting to make our societies more nimble and adaptive to both human caused environmental events (e.g., Deepwater Horizon) and extreme weather events or decadal scale changes such as droughts. Coupled with the data, though, we must have a much better understanding of decision making, which means extending knowledge about cognitive biases, about boundary work (scientists, citizens, and policymakers working together to weigh options on the basis not only of empirical evidence but also of values).”

Tiffany Shlain, director and producer of the film ‘Connected’ and founder of The Webby Awards, wrote, “Big Data allows us to see patterns we have never seen before. This will clearly show us interdependence and connections that will lead to a new way of looking at everything. It will let us see the ‘real-time’ cause and effect of our actions. What we buy, eat, donate, and throw away will be visual in a real-time map to see the ripple effect of our actions. That could only lead to more-conscious behavior.”

Some responses that concentrated on the Internet of Things (a source of Big Data) came from people arguing that we will see impressive gains

While a number of respondents expressed little confidence in much additional, useful development of the Internet of Things by 2020, many see it developing. The Internet of Things is the mixture of connected “smart objects”—devices with IP-enabled sensors and readers, RFID tags, and other identifying digital information that can feed material to machines for analysis.

“The huge prospects for the ‘Internet of Things’ tip me to checking the first choice,” wrote Fred Hapgood, technology author and consultant and moderator of the Nanosystems Interest Group at MIT in the 1990s. “I tend to think of the Internet of Things as multiplying points of interactivity—sensors and/or actuators—throughout the social landscape. As the cost of connectivity goes down the number of these points will go up, diffusing intelligence everywhere.”

An anonymous survey participant wrote, “With the right legal and normative framework, the Internet of Things should make an astounding contribution to human life. The biggest obstacles to success are technological and behavioral, we need a rapid conversion to IPv6, and we need cooperation among all stakeholders to make the Internet of Things work. We also need global standards, not just US standards and practices, which draw practical and effective lines about how such a data trove may and may not be used consistent with human rights.”

Bob Frankston, computing pioneer, co-developer of VisiCalc, and ACM Fellow, noted, “The Internet of Things is less about massive data than meta objects. We’ll have to learn how to hide from Big Data in plain sight. I do worry about the tyranny of the major less because of Big Data than because of today’s self-terrorized society seeking solace in the past.”

Bruce Nordman, research scientist at Lawrence Berkeley National Laboratory and Internet Engineering Task Force group leader, wrote, “This topic relates directly to some of my own work on the Internet of Things. Data that is much more available in quantity, cost, and quality will be a marked feature of the coming decade, but much of that will be ‘Little Data,’ which is useful mostly or entirely only locally (for practical or privacy concerns). I will want data possibly related to my health kept as private as possible. My house should enable control for light, heat, sound, image, etc. that enhances my experiences and convenience, and saves resources. For example, lighting will increasingly respond to occupancy or ‘presence’ (not just that someone is present, but who they are, how many they are, and what activity engaged in), and so provide better lighting services, automatically, and at less net energy than before. However, who outside the building should care about the details? No one. Big Data will be a net plus, but a sizeable amount of problems will be created by it as well, particularly around security and privacy.”

An anonymous respondent wrote, “We are on a path that will make very large datasets available to study the world around us. The emergence of ubiquitous, high-speed wireless environments will enable the deployment of low-cost sensors. These sensors will provide unprecedented quantities of data. Businesses are presently leading the way in ‘predictive analytics.’ Government has recently become attentive to such tools. In the near term (2020), these Big Data sets will begin coming on-line and professional analysts will begin using the information to make informed policy choices. Over the longer-term, the potential for abuse is strong. It is unclear that politically driven people will possess the will or skill to properly interpret data analyses. Any real abuses are likely to accumulate in the more distant future, beyond 2020.”

Internet Society leader Rajnesh Singh, regional director for Asia, warned, “Embedding Internet technology in various ‘things’ will help us improve our lives. However, it is equally important to ensure that we use this responsibly and not too much power and control is held by any one entity. There must be appropriate checks and balances, accountability, and transparency. A lot more work needs to be done by all stakeholders to ensure we get there, and use such technology for the advancement of mankind—not its control.”

An anonymous respondent said, “A risk is that the Big Data available could be used—in a Wild West of privacy rights—as a new gold mine for aggressive Internet companies. It will depend very much on the capacity governments (and the future Internet governance bodies) will have to avoid the risk that the data provided by the Internet of Things (IoT) will become the same that is today the data derived from search engines. In terms of benefits from IoT for the environment, I don’t believe that their impact will be so relevant as you could believe. It will take a lot of time to standardize and to integrate existing networks and the databases of IoT. None of the existing companies will accept being marginalized via the IoT game; there will be fierce resistance to integration.”

Charlie Breindahl, a part-time lecturer at the University of Copenhagen, predicted, “Most things—even the cheapest and most banal, such as paper clips—will carry an individual identity at some point in the future. We are already in the middle of the revolution and we now have available for analysis an unprecedented amount of data. We should get used to the idea that the important question is how much data we can afford to throw away, not how much data we can afford to collect. We now know much more than we used to know, but our knowledge merely points to new needs for research.”

Barry Chudakov, a consultant and visiting research fellow in the McLuhan Program in Culture and Technology at the University of Toronto, developed the following scenario: “By 2020 our every movement (or click or emotion) is someone’s business model. We will first build narratives and then a worldview around that. Considering the ability to take vast quantities of data and find meaning in it through pattern-finding and analytics, we will eventually employ these analytics not only in finance, healthcare, marketing and IT, but in what we hear, see and encounter as the world goes by us and through us. There will be a dawning reality that our identities are already tied to our data. In essence, in some measure our identities are our data. Big data and the Internet of Things become an arbiter, a shibboleth, an agent of triage. As the world becomes increasingly interconnected, information holds things together: it is a binding agent for systems. As such it is not only a new decider of what’s important or not, it is a new proxy; it can stand in place of anyone. By 2020 data becomes a new belief system. In human history we’ve had this sort of binder before and we used the Latin base religare, meaning to bind together, to embody this concept. Information, in the form of Big Data and the Internet of Things, becomes religion.”

A doubtful anonymous respondent observed, “Apparently this ‘Internet of Things’ idea is beginning to encourage yet another round of cow-eyed Utopian thinking. Big Data will yield some successes and a lot of failures, and most people will continue merely to muddle along, hoping not to be mugged too frequently by the well-intentioned (or not) entrepreneurs and bureaucrats who delight in trying to use this shiny new toy to fix the world.”

Many expect or at least hope that the good will outweigh the bad; but some worry the balance of impacts will tip the other way

Many respondents in this sampling had a strong sense of both the benefits and problems that will emerge as Big Data becomes a great reality in corporate, government, and social life. They spoke about both dimensions of impact. Some tended to accentuate the positive even as they cautioned about coping with the negative; others worried that things would turn out more bad than good.

Here is how danah boyd, senior researcher with professional affiliations and work based at Microsoft Research, sees the balance of forces: “The Internet magnifies the good, bad, and ugly of everyday life. Of course these things will be used for good. And of course they’ll be used for bad and ugly. Science fiction gives us plenty of templates for imagining where that will go. But that dichotomy gets us nowhere. What will be interesting is how social dynamics, economic exchange, and information access are inflected in new ways that open up possibilities that we cannot yet imagine. This will mean a loss of some aspects of society that we appreciate but also usher in new possibilities.”

Marjory S. Blumenthal, associate provost at Georgetown University and adjunct staff officer at RAND Corporation, predicted, “Do-it-yourself analytics will help more people analyze and forecast than ever before. This will have a variety of societal benefits and further innovation. It will also contribute to new kinds of crime.”

Professional programmer Seth Finkelstein responded, “This is a question where I want to answer both. The ‘choices’ above are both true in their descriptions. I finally went with ‘negative’ because I’ve been advocating for years that data-mining businesses are not good models for government. But this is just the latest version of ‘computers and society.’”

Perry Hewitt, director of digital communications and communications services at Harvard University, wrote, “‘Nowcasting’ is sure to stumble many times before it stands, and companies will control software tools in ways that make us all profoundly and correctly suspicious. However, fearing Big Data feels like fearing fire: it exists, its capacity to do damage is enormous, and yet it illuminates such that there is no going back. For every health care data aggregator that makes us cringe, there is, one hopes, an Esther Duflo [a MacArthur Foundation Fellow for her work on improving the lives of the world’s poorest people]. Using data can inform social solutions.”

Larry Lannom, director of information management technology and vice president at the Corporation for National Research Initiatives, wrote, “Added data will enhance our understanding of the physical world and the real-time tracking of objects in motion, e.g., shipments and inventories, and will increase the efficiency of various economic activities. Privacy will continue to be a large challenge.”

Mark Walsh, cofounder of geniusrocket.com, said, “Sadly, this is a question that will definitely have different answers by category. IBM Smarter Planet will make energy use and traffic congestion get better. Big Data works. Politicians will be fed Big Data results by lobbyists to support a given conclusion, and bad things will happen. On and on down the line you will see that dichotomy: Business vs. lobbyists. One will work for positive, one for negative.”

Ted M. Coopman, a faculty member at San Jose State University and member of the executive committee of the Association of Internet Researchers, explained, “While the ability to process huge amounts of data will bring many benefits, the lack of a theoretical coherency and understanding of how large and complex systems work will cause major problems to arise. The focus of Big Data on financial markets has not increased our understanding of how our complex and global economies work. Being able to identify variables does not lead to an understanding of them. Massive complex systems are very hard to predict. Moreover, just because we understand more does not mean we can take actions that do not create more friction or introduce variables that result in unintended consequences. At the end of the data you must act on the data and that is where we run into problems. There will always be more known unknowns and unknown unknowns than known knowns. I think that more data will only increase the former more than the latter.”

Sam Punnett, president of FAD Research Inc., observed, “As with any new technology its arrival is a mixed blessing fraught with the peril of our decision-making organizations to utilize new potentials. The two most obvious cases in point are intelligence-gathering systems used for national security and the information systems currently employed to manage international financial markets. Both have manifested unintended consequences—in the one case a failure to properly act on information available and in the other hugely intricate and extreme fluctuations in markets that no one can explain. I am optimistic for the potential of the Internet of Things deployed on a manageable scale. The great caution with more-ambitious systems is an over reliance on seemingly rational systems to provide total unassailable solutions. The potential for these systems to be abused or to fail to take into account unforeseen circumstances is real, underscoring the need for the design of such systems to be founded on well-considered principles addressing information privacy and civil liberties as well as the realization that the systems are constructs from data using rules. Rules created by people, with all their foibles and imperfections, are subject to the occasional ‘black swan’ conditions of unimagined outcomes.”

Caroline Haythornthwaite, director and professor at the School of Library, Archival, and Information Studies of the University of British Columbia, wrote, “With any change there are equal and opposite reactions. Greater data aggregation will create privacy issues; greater visualizations will hide algorithms for generating these appealing data presentations.”

She warned: “As Herbert Simon said a number of years ago, algorithms will disappear into machines and then not be reexamined.”

Stephen Masiclat, associate professor of communications, Syracuse University, predicted, “Big Data use will be the norm for all business and an increasing sector of the population will eventually be in the business of explaining Big Data insights to people not trained to understand the statistical mechanics and limits of the systems. This will not be a universal good: in America especially people dislike the idea of classification. As our data become more granular and our analysis more refined, we’ll likely see more class stratification driven by marketers and other business operations. But the benefits will very likely outweigh these negatives as we will be able to do more things more cost-effectively with the insights gained from more data.”

Open access to tools and data ‘transparency’ are necessary for people to provide information checks and balances. Are they enough to tilt the impacts in a positive direction?

Some respondents said the future will be positive if access to data is offered on an equal basis to all, and even “private” organizations make most of their data sets or all of them open and free. This is often referred to as data “transparency.” An anonymous respondent wrote, “I’m personally very involved in this trend and I am thrilled how consistently people are pushing for open data.” Another wrote, “If Big Data is not also Wide Data (that is, dispersed among as many players and citizens as possible) then it will be a negative overall.”

Alex Halavais, vice president of the Association of Internet Researchers and author of Search Engine Society, wrote, “The real power of ‘Big Data’ will come depending largely on the degree to which it is held in private hands or openly available. Openly available data, and widespread tools for manipulating it, will create new ways of understanding and governing ourselves as individuals and as societies.”

Cyprien Lomas, director at The Learning Centre for Land and Food Systems at the University of British Columbia, urged, “Along with the rise of Big Data should come equal and open access to the data so that assumptions can be checked and double checked and to foster a culture of looking for results in data. Access to the same data should allow for thousands of parallel experiments to be run by amateurs. This ecosystem should allow the discovery of new patterns and meanings in the Big Data.”

Tom Hood, CEO of the Maryland Association of CPAs, responded, “Big Data gives me hope about the possibilities of technology. Transparency, accountability, and the ‘wisdom of the crowd’ are all possible with the advent of Big Data combined with the tools to access and analyze the data in real time. Many examples are already in progress. In the accounting profession there is the advent of XBRL (eXtensible Business Reporting Language), an open source, and standardized business reporting language, which is a subset of XML. This is already being deployed with mandatory financial reporting with the SEC, FDIC, and is being proposed for government spending accountability via the DATA Act of 2011 (Digital Accountability and Transparency Act). The use of XBRL is also being used for reducing the compliance burden for businesses and government with many governments around the world (Netherlands, Australia, New Zealand, and the UK). The risks of a negative scenario revolve around data integrity and security issues. If these allow for manipulation and distortion from those in power, then the public trust will erode and a very negative scenario will rise. However, if the data is set free and there are tools for the many to access and analyze the data, then I would expect the positive scenario to be the most likely.”

Donald Neal, senior research programmer at the University of Waikato, based in Hamilton, New Zealand, and others see promise in enabling all to easily understand the world better through data. Neal wrote, “One consequence of ‘The Cloud’ is that tools for Big Data analysis could be available to anyone.”

Nathan Swartzendruber, technology education at SWON Libraries Consortium, warned the data must be open. “In order for Big Data to have a positive impact on society overall, it has to be transparent,” he said. “Ordinary citizens would have to be able to query the data set and discover real answers, regardless of the light that shows on individuals or corporations or governments. There is too much at stake for these parties to allow open, transparent access to this data. As long as some data sets or parts of data sets are hidden, there is room for misuse and manipulation. I think this manipulation is sure to take place. Unless Big Data is democratized on a massive scale, it will overall have a negative impact on society. Right now, I don’t see much hope for such a democratization.”

Richard Lowenberg, director of the 1st-Mile Institute and network activist since the early 1970s, noted, “Big Data should be developed within a context of openness and improved understandings of dynamic, complex whole ecosystems. There are difficult matters that must be addressed, which will take time and support, including: public and private sector entities agreeing to share data; providing frequently updated meta-data; openness and transparency; cost recovery; and technical standards.”

Cathy Cavanaugh, associate professor of educational technology at the University of Florida-Gainesville, predicted that in this sort of world, “because people will be able to quickly create their own data manipulation apps, public datasets will be used widely for answering questions. In many cases, data analysis will augment user satisfaction and ratings, and in some cases archived user satisfaction and reports will be analyzed as the data, bringing balance to decision-making through the use of large amounts of objective and subjective information.”

But Sean Mead, director of solutions architecture, valuation, and analytics for Mead, Mead & Clark, Interbrand, expects there will have to be a public outcry to open data to the public and there may be an AI liberation movement. “Large, publicly available data sets, easier tools, wider distribution of analytics skills, and early stage artificial intelligence software will lead to a burst of economic activity and increased productivity comparable to that of the Internet and PC revolutions of the mid to late 1990s,” he predicted. “Social movements will arise to free up access to large data repositories, to restrict the development and use of AIs, and to ‘liberate’ AIs.”

Some respondents said they don’t think most people will be able to identify or assess complex data sets, with or without tools and open access.

An anonymous respondent wrote, “Collection is likely to be imperceptible to most, unless law and regulation make it overt and provide the individual choice. Analysis likely will suffer from a divorce in knowledge and context between the orderers and the providers of the analysis. No example currently is better than that between the avaricious ignorance of bank executives and the technologists’ naiveté about the realities of collateralized debt obligations. Reliance will lead to increasingly unstable processes where only those able to use Big Data will be able to protect themselves, with the individual increasingly at risk. Rapid program stock trading is a current, pernicious example.”  

Another anonymous survey participant said, “The false-positive rate will continue to grow, but the general population doesn’t understand ROC curves or false-positive/true-positive comparisons today or in the future. The few people who will understand the dangers of ‘Big Data’ will have high cognitive abilities and training. The general population will continue to rely on crappy results because they know no better.”

Take off the rose-colored glasses, some argued. Big Data has the potential for significant “distribution of harms” that may be impossible to avoid

For a number of respondents, the upside of Big Data is not yet clear enough compared to the foreseeable difficulties it will create.

Longtime technology analyst Oscar Gandy, emeritus professor of communication at the University of Pennsylvania, was one of the forceful advocates of this view: “I recently published a book Coming to Terms with Chance that largely mirrors the arguments at the core of the second option. In that book and in my view more generally, there is a need to think a bit more about the distribution of the harms that flow from the rise of big, medium, and little data gatherers, brokers, and users. If ‘Big Data’ could be used primarily for social benefit, rather than the pursuit of profit (and the social control systems that support that effort), then I could ‘sign on’ to the data driven future and its expression through the Internet of Things.”

Michael Goodson, assistant project scientist at the University of California-Davis, wrote, “My answer is a reluctant acknowledgment of what I perceive as human nature. Basing my opinion on how effectively marketing works on many people—convincing them to do things other than what is in their personal interest—it seems likely that powerful people and institutions will use all of the data at their disposal to affect events according to their interests.”

“While the rise of Big Data yields some positives, I fear that it will mostly result in increased surveillance and more-targeted marketing efforts,” wrote Melinda Blau, freelance journalist and the author of 13 books, including Consequential Strangers: The Power of People Who Don’t Seem to Matter But Really Do.

A warning tone was taken by Sivasubramanian Muthusamy, president of the Internet Society India chapter in Chennai and founder and CEO of InternetStudio. “The Internet and the Internet of Things together with the accompanying explosion in the capacity to process data will indeed facilitate positive progress, but at the same time, the data explosion will definitely cause more problems than it solves in future,” he wrote. “Separating necessary data from unnecessary data will pose peculiar challenges. Also, data analysis alone does not guarantee optimal decisions and optimal outcomes because there are several factors beyond data—a point that is prone to be missed in the quest for more and more data. Such volumes of data call for more elaborate data management infrastructure and complex tools for analysis, which will inevitably leave all the data in the custody of very large enterprises, good and bad, and in the hands of governments. There is tremendous power associated with such a wealth of information. It is unlikely that this power will always be used with infallible ethical standards. In particular the rise of Big Data is likely to lead to a situation where everyone is tracked every moment everywhere, out of a pointless concern for security and in a misguided quest for control.”

James A. Danowski, a professor of communication at Northwestern University and program planner for Open-Source Intelligence and Web Mining 2011, wrote, “Mining, analytics, shortening time to predicting trends are the intensive focus of most of the information segments of the knowledge sector. Misuse will increase as cyberwarfare, as one manifestation, is projected to become much more prevalent and more state-sponsored than it currently is. Government intelligence sources are currently funding research to detect deception in social media and develop ways to counteract it, technologies which can be readily repurposed for manipulation of new ‘public opinion’ sources.”

An anonymous survey participant said, “The hype surrounding Big Data is almost as frenzied as the hype surrounding the efficient markets theory in the 1990s, and look where that led us. While improvements in data collection and processing will lead to vast improvements in our understanding in many areas, it is not a panacea. We have had decades of experience analyzing corporate financials, stock exchanges, and market indices, and yet we still cannot predict what will happen next. Many dynamic systems, of which the stock market is one, do not lend themselves to predictable patterns, no matter how much data you gather or how much computing power you apply. I worry that overconfidence in Big Data and all-seeing algorithms will lead to terrible errors. And I worry that these systems can be gamed, made to deliver false or misleading results.”

One anonymous respondent joked, “Upside: How to Lie with the Internet of Things becomes an underground bestseller.”

Another anonymous respondent wrote, “The manpower required to appropriately tag and accurately merge all current data sets is prohibitively excessive. And that doesn’t consider the new data sets being created every day. Consequently, Big Data will generate misinformation and will be manipulated by people or institutions to display the findings they want. The general public will not understand the underlying conflicts and will naively trust the output. This is already happening and will only get worse as Big Data continues to evolve.”

Several people expressed concerns for the human individual in a world of Big Data. An anonymous respondent said, “We will become more addicted to what the databases tell us. It might impair risk-taking for the good. We’ll depend more on models than instincts.”

Leara Rhodes, an associate professor of journalism and international communications at the University of Georgia, said, “Any data can be misused, information is power, and if someone has a lot of information, whether or not it is accurate, complete, or truthful is harder and harder to prove. Group thinking takes over. Diversity in thinking cycles is so important that to override it with the majority point of view will be harmful to our society and it will push people to conform and not maintain their cultural identities.”

An anonymous respondent wrote, “Somewhere along the timeline of life, logical basic understandings that root each of us to the others need to take precedence. I don’t think technology or Big Data can do that or should try to tell us the future or how or what to believe. I’m not ready for that conceptual change. Technology moves fast, people not so much.”

Stan Stark, a consultant at Heuroes Consulting, responded, “Too much trust will be given to predictive analytics of Big Data, thereby clouding and ‘greying’ decisions made by big business to the detriment of their performance in customer service arenas. They will ‘assume’ their analytics are correct in all decision making and lose focus on ‘pre’ Big Data techniques that were more personalized.”

An anonymous survey respondent said, “We still haven’t figured out the implications of chaos theory, and if ‘Big Data’ and futurecasting aren’t perfect examples of chaos-based information, then I don’t know what is. Generically, we’re not prepared for this great a lack of privacy; we’re even less prepared for data of this magnitude available only to the powerful, rich, or connected.”

An anonymous participant wrote, “Two points: First: Big Data is not Big Knowledge. We are opening a fire hose of data pointed at ourselves, but aside from developing higher density storage, we’re not doing much in terms of handling it. Major challenges here will be developing ‘perceptual filters’ for these data flows (analogous to the ones in our minds that allow us to, for example, not spend the whole day paying attention to the fact we’re wearing socks): throwing away data points that are not likely to become knowledge, that are not likely to ever be accessed, that will only serve to take up space on a hard drive and complicate further analysis of interesting events. Second: legal protections for the citizenry (in those jurisdictions which are not decidedly autocratic) are lacking, and will be essential to prevent corporate or governmental abuse of the insights available about people through widely aggregated data, as well as through new surveillance techniques.”  

Another anonymous respondent noted, “In 2020, few people understand ‘Big Data’ as no more than conventional 20th century statistics applied to variables measuring highly superficial and ephemeral presences in physical and cyberspace. This information will continue to be imbued with magical power to predict, but will nonetheless fail to determine the unplanned behavior of individuals who are subject to highly emergent social cues. The rise of Big Data is not a negative, but many will lose interest as the cost to acquire and maintain the data exceeds the derived benefit.”  

Jon Lebkowsky, principal at Polycot Associates LLC and president of the Electronic Frontier Foundation-Austin, observed, “We’ve seen so many situations where the gloss of statistical analysis misrepresents the reality of the data analyzed. It’s too easy to bend the analysis to serve a specific goal or intention. I’m also concerned about individual data ownership and privacy issues in the world of Big Data. This is an area where the outcomes could probably be improved by regulation, but regulation is currently out of style.”

One anonymous respondent shared a criticism that many survey respondents raised—that organizations that possess data are not going to be swapping files with each other, even when it is of benefit to the greater good. “I don’t believe there will be large data sets that are shared across corporations, government, and universities at a widespread level like discussed in the above scenarios.”

Marcia Richards Suelzer, senior writer and analyst at Wolters Kluwer, warned, “The biggest risk is the speed and access that the Internet provides. We can now make catastrophic miscalculations in nanoseconds and broadcast them universally. We have lost the balance inherent in ‘lag time.’”

And Barry Parr, owner and analyst for MediaSavvy, said, “Better information is seldom the solution to any real-world social problems. It may be the solution to lots of business problems, but it’s unlikely that the benefits will accrue to the public. We’re more likely to lose privacy and freedom from the rise of Big Data.”

We won’t have the human or technological capacity to analyze Big Data accurately and efficiently. Analysts might be looking for insight in all the wrong places

Some challenged the 2020 timeline presented in the scenario descriptions. Mark Watson, senior engineer for Netflix, said, “I expect this will be quite transformative for society, though perhaps not quite in just the next eight years.”

Others who argued a similar line described what they think is a fundamental mismatch between the volumes of data being generated and human capacity—even with the assistance of machines—to work with large sets of data, to share sets of data, and to derive significant, accurate results.

Mike Liebhold, senior researcher and distinguished fellow at The Institute for the Future, predicted, “The constraints of appreciating the benefits of Big Data will be the speed of adoption of open APIs, linked data, and interoperable metadata. Continued concerns over privacy and security will constrain the utility of Big Data for inference visualization and personal analytics.”

Christian Huitema, distinguished engineer at Microsoft, said, “Unsupervised machine learning is hard. There are many examples of supervised machine learning, but these are driven by subject-matter experts that guide the machine towards specific discoveries. It will take much more than ten years to master the extraction of actual knowledge from Big Data sets.”

Bill St. Arnaud, consultant at SURFnet, the national education and research network building The Netherlands’ next-generation Internet, noted, “The benefits and impacts will be much smaller and take a longer time to develop. Manipulating and correlating Big Data sets is hard work.”

And an anonymous respondent said, “The fact that most data is unstructured is a huge issue, and I doubt that we will solve the problems associated with getting meaning from that morass.” Another anonymous survey participant wrote, “Certainly in 2020 Big Data will be more risky than trustworthy. We just won’t have enough experience—the equivalent of the 100-year flood in forecasting terms—and so our systems will ‘look good’ on some basic problems but prove to make whoppers of mistakes.”

Dan Ness, principal research analyst at MetaFacts, producers of the Technology User Profile, told a tale in his response: “There’s an old story about a passerby who comes across a drunk man standing under a lamppost looking for his keys. The passerby joins in the search and doesn’t see anything. He asks and learns that the keys didn’t fall anywhere near the lamppost, but that the drunk was looking near the lamppost because that’s where the light was. A lot of ‘Big Data’ today is biased and missing context, as it’s based on convenience samples or subsets. We’re seeing valiant, yet misguided attempts to apply the deep datasets to things that have limited relevance or applicability. They’re being stretched to answer the wrong questions. I’m optimistic that by 2020, this will be increasingly clear and there will be true information pioneers who will think outside the Big Data box and base decisions on a broader and balanced view. Instead of relying on the ‘lamppost light,’ they will develop and use the equivalent of focused flashlights.”

Seattle-based consultant Tom Whitmore said, “There will be a rising need not for statistical analysts, but for people who will do ‘forensic data analysis’—what was actually measured to generate this datum that I’m looking at, and how close is it to what I really wanted to see measured? As more and more large data sets get generated, there will be more and more of a problem with this. Everyone knows what a datum is, and what a comparison is. And it’s very clear if one begins to look that the definitions used by different people have very different implications for the meaning that can be derived. Exploratory data analysis can show you what’s interesting in a large batch of numbers, whether the interesting things that are discovered reflect something useful about the labels attached to those numbers is a completely different question, and a number without appropriate descriptive attachments isn’t a datum—it’s just a number.”

Futurist John Smart says Big Data will be a huge positive, but not until the semantic Web becomes fully functional, around 2030. “Lots of folks and companies will overclaim what Big Data can do for us in the next decade, and are already doing so, but that’s just a mild negative. Such hype causes overinvestment in underperforming platforms and other problems, but they are mild. Once we have cybertwins (semi-intelligent agents) interfacing with us and a valuecosm in 2030, all the smallest social values groups will have their own online lobbies, and be able to find subcultures that support and advance their values. In the meantime, expect the typical chaos, hype, and inefficiencies that tech innovation always brings.”

A number of respondents said the second, negative scenario will be likely in 2020, but by 2030 or after we may have adapted and evolved to reach the point at which the positive scenario will be most prevalent. An anonymous survey participant said, “Option one would be desirable, but option two is more likely, at least for 2020. In 2020, many questions related to justice, majority vs. minority decisions, etc., will not be solved and the algorithms will still be too machine-like and not humanized enough. Option one might be a long-term scenario.” 

Jonathan Grudin, principal researcher at Microsoft, predicted, “Data mining will be used more, but by 2020 it will still be in fairly limited ways for limited purposes, and won’t have that much of an effect, though of course those marketing it will amplify the benefits. But it will probably be 2030 before it really gets powerful. Will the effects be a net plus or minus? For twenty years the direct marketing and other people have been doing this, has that been a net plus or minus? I like the advances in weather and traffic prediction. I like it when my supermarket actually offers me free items or heavily discounted items that I have actually bought there in the past, rather than random coupons. I don’t expect data mining to make massive advances over this kind of stuff by 2020. It will be here faster than you think. It is like three releases of the Mac or Windows OS ago—how revolutionary have the changes been since then? I guess we have an iPad and a Kinect now, but none of this has radically transformed the lives of ‘most’ people for good or ill.”

J. Meryl Krieger, a sociologist at Indiana University Purdue University-Indianapolis, said, “We don’t have the resources to process the data and analyze it adequately for its meaning. Also, vast quantities of quantitative information are lovely, but without the contextualization and detail that come from interviews, observation, and other qualitative techniques that vast quantity of information is essentially meaningless. In other words, that’s nice but so what?  Until we commit adequate resources (which currently are not available—I point out the emphasis of our current society towards monetization and specialization) towards interpretation and explanation ‘The Internet of Things’ remains a great idea and that’s about it. In terms of values, it totally depends on where you sit. There are going to be people who are terrified of what comes out of finding out what people actually do; they are much more interested in having the world reflect what they know and understand and find ‘difference’ to be incredibly threatening. Such folks will always try to manipulate data sets. On the other hand are the idealists who likewise want diversity to always be a good thing and will try to manipulate data sets to reflect their vision. Integrity in scholarship is the key here – way too many people have an agenda they are pursuing. This is the threat to ‘The Internet of Things’ not the information itself.”

Tapio Varis, professor emeritus at the University of Tampere and principal research associate with the UN Educational, Scientific, and Cultural Organization (UNESCO), noted, “The general lack of trust and confidence and the gigantic misuse of existing Big Data for monitoring and intelligence will slow down and backset progress.”

Rich Tatum, the research analyst for Zondervan, a religious publishing house, agreed that major trust issues lie ahead. “Such analysis will enable deception on an ever greater scale,” he wrote. “What will matter most will not be who you trust for news, or what outlet you trust, but who owns the data you use for news. And this kind of data and analysis will not be cheap.”

Nikki Reynolds, director of instructional technology services at Hamilton College, says overconfidence is a big risk. “We are already using data modeling to make big mistakes,” she pointed out. “In most ways, I doubt that the use of Big Data will be any more or less faulty than our current uses of the data and models we have accessible today. The real estate and sub-prime mortgage disasters are a clear case of those problems now. The best analyses of the roots of those mistakes I have read and heard point to overconfidence in poorly understood, very complex risk models, and the refusal to recognize that the worst will happen. When the probability of a situation occurring is 1,000 to one, that means the situation will occur, just not very often, over the long run, although possibly on two or more consecutive occasions over the short run. People seem not to pay attention to that when making decisions. Perhaps we are not really good at thinking about the long run, and ultimately having to put the ‘disaster recovery plan’ into action. In any case, I don’t think Big Data will make the problem of poor judgment when assessing the consequences of risk any smaller. Anything forecast based on any data is just a model, and not an event controller. We, as a species, will continue to make decisions on a short runway and we will get caught out. So, will Big Data make the consequences of our mistakes worse? Yes, and no. While the ramifications of a mistake may become more far-reaching, it will also be much harder to ‘hide’ mistakes and their consequences, because of the level of connectivity that we have. We’ve seen the accelerating potential of that connectivity in recent political events, and even in recent environmental disasters. When someone starts a protest, others know of it right away, from the immediate observers. Decisions about whether to join the protest don’t have to wait for the publication of a newspaper, or the ‘film at eleven.’ When an oil rig blows, the whole world knows within hours. Governments and scientists swing into action immediately—not always smoothly and certainly not always cooperatively, but the response is immediate. I’m betting that our ability to respond to crises is going to increase just as rapidly, perhaps more rapidly, than our ability to create crises. I hope I’m right.”

Jeniece Lusk, an assistant research director with a PhD in applied sociology who works at an Atlanta information technology company, wrote, “As an applied sociologist, I religiously believe in the ability of the humans who must interpret and create these data sets to muck it all up, intentionally or not. The media, the research, the Internet is all driven by humans. Human error can mess up even the best of data collection, analysis, and dissemination. Beyond that, we are unable to ever suggest or predict in a generalizable way until confidence intervals are 100% and the Census doesn’t need to impute data; we’re not going to be able to read the future or become psychic-statisticians. Unfortunately, you won’t be able to convince some audiences or segments of that, because if someone with an authoritative position tells you something that a computer calculated, you might as well call it absolute truth (unless it doesn’t match up with their belief system, of course).”

And one participant noted that we don’t use the data we already have. “Modern society already ignores a century of social science research when determining programs and policies,” said Cheryl Russell, editorial director for New Strategist Publications and author of the Demo Memo Blog. “It is doubtful that the leaders of 2020 will be any more able or willing than our leaders today to use social science findings to improve our lives.”

Some concentrated their focus on the role of human judgment in the process of Big Data analysis and response

An anonymous respondent said, “The old lesson that correlation is not causation seems never to be learned. The control over data means that inaccurate data is hard to identify and correct. I see that the problems will only increase with the size of the datasets. Most emphasis seems to be given to doing clever things with data rather than ensuring its validity or giving the right people control over it.”

Michel J. Menou, visiting professor at the department of information studies at University College London, noted, “The intelligence of systems cannot substitute for the intelligence of the individuals and organizations that use them. Since efforts are focused on the development of technology at the expense of education, consciousness raising, and democratic control, negative effects are the more likely to occur.”

Tom Rule, an educator and technology consultant based in Macon, Georgia, wrote: “Never underestimate the stupidity and basic sinfulness of humanity.”

William L. Schrader, an independent consultant who founded PSINet in 1989, provided a few more details. “The fact is: people are people,” he wrote. “The rich get richer and the powerful stay that way. All tools will be used by the rich for gain and by the powerful to remain so. However, the activists in the world will also have access to Big Data, and big tools; in fact, it will be innovators and activists who create those very tools. In the end, the bag is always mixed; much as the Internet brought us distance learning and distance medicine (as predicted in the 1980s), it also brought humans global access to child pornography, the opportunity to phish financial and identity information for illegal activity, and to assist governments in monitoring and controlling their populations. Simultaneously, we saw how the Internet played an integral role in the overthrow of several governments during 2009-2011 and that activity will continue. Yes, the answer is ‘both,’ positive and negative.”

Miguel Alcaine, head of the International Telecommunication Union’s area office, Tegucigalpa, Honduras, responded, “If some high-placed people believe this type of technology can predict the unpredictable, there will be cases where this technology will be overextended and misused. Human judgment cannot be replaced by technology, the former being responsible for decisions.”

David D. Burstein, founder of Generation18, a youth-run voter-engagement organization, said the human element trumps all of the technology. “As long as the growth of Big Data is coupled with growth of refined curation and curators it will be an asset,” he wrote. “Without those curators the data will become more and more plentiful, more overwhelming and confuse our political and social conversations by an overabundance of numbers that can make any point we want to make them make.”

Donald G. Barnes, visiting professor at Guangxi University in China and former director of the Science Advisory Board at the US Environmental Protection Agency, noted, “Big Data has possibilities and will result in some, but limited, number of discoveries. However, a vision of relying on results of the analysis of Big Data as the major source of breakthrough information and insights is unwarranted. Past and current examples of the analysis of Big Data suggest that we should be cautious about the fruitfulness of this type of analysis; e.g., the Department of Homeland Security being hamstrung by the torrent of information intercepted on the Internet and the limited payoff from the use of massive information sources in combinatorial chemistry and bioinformatics. The underlying problem is one of signal-to-noise; i.e., with more information, the challenge of detecting the signal can be even larger. By 2020, most insights and significant advances will still be the result of trained, imaginative, inquisitive, and insightful minds.”

David Kirschner, a PhD candidate and research assistant at Nanyang Technological University in Singapore, wrote, “People put way too much faith in statistics and quantitative analysis of giant data sets. It leads us to believe we can predict and forecast much better than we actually can. Forecasting leads to people assuming outcomes that don’t necessarily happen, and that has real-life implications for people who win or lose based on these predictions. We also assume that we can trust interpretations of this data. Interpretations are made by people, people in positions of power who have their own agendas, and those are the interpretations people generally trust. Not smart, but we don’t know any better because we believe what we’re told by ‘experts’ and we don’t have any means to find out what’s really going on, reasons for this or that. It’s all efficiently masked in bureaucracy. Very dangerous!”

Jeffrey Alexander, senior science and technology policy analyst at the Center for Science, Technology & Economic Development at SRI International, said the human factor in analysis is crucial. “While 2020 is too soon for the emergence of true artificial intelligence and predictive power, the ability to manipulate social, physical, and informational inputs on a large scale will reveal new insights into behaviors and human development,” he wrote. “The greater danger lies beyond 2020, when machine learning may become so effective that it crowds out human judgment.”

An anonymous respondent predicted a withdrawal by some, writing, “I expect a backlash against Big Data to occur sooner rather than later, and expect to see a movement toward people reducing their presence on the grid. There is still a large percentage of the population who have a low level of Internet presence (anyone over the age of 45, for example) and this will offer the necessary contrasts for this to occur.”

People are concerned about the power agendas of governments and corporations, the interests with the most Big Data resources

A variety of responses focused on the collectors of Big Data and their motives. Among those for whom that was the framework for their answers, many were wary and full of warning about how the data could be used.

Ed Lyell, a professor at Adams State College, wrote: “I see two major negatives overwhelming the positive. 1) Our trust in econometric models made the great world economic crash more likely. By getting better and better at predicting the specifics of the near future we had models that ignored big system changes and the power of market corruption by those on top. Predictive models are all subject to a movement to the mean, ignoring the rising system change caused by falling off a cliff not seen by the models and not looked for by the humans trusting models. 2) Like 1984 and Brave New World, books my generation knows well, I have seen government, and even more big business use massive personalized data to control people, to not just respond to their needs but to create needs. Government has made us fearful and willing to accept increasing limits on personal freedom because of our insecurity. The wealthy elite (top 2% or so) can purchase TV ads and other media to get Congress elected for their purposes. Now big corporations can be even more active on the top of the table. This increasing power shifted to the top is moving the United States away from democracy into what I now see as a plutocracy. The RFIDs in our clothes and products make tracking the outliers easier and perhaps put people at more risk in the future.”

An anonymous respondent wrote, “The correct choice depends largely on our collective choice. In the end, I selected the more pessimistic scenario because that is the choice we are in right now—the one where corporations with no sense of values, morality, or conscience make choices for humans (choices affecting humans, but motivated by mere profit for shareholders). Consider the number of TV programs dedicated to ‘investing’ money compared to the number of TV programs dedicated to ending poverty. As things stand right now, there is little doubt that the second option is the correct one. But fortunately, that could change if humans decide to take charge, and return corporations to their subservient role.”

Another anonymous survey participant predicted, “Both outcomes will occur, concurrently, in many complex intertwined ways. Even liberal governments will feel compelled to accumulate and use data against their citizens, in many of these countries corporations run amok will do the job.”

Julia Takahashi, editor and publisher at Diisynology.com, wrote, “By 2020, most Internet users will be used to receiving algorithmic recommendations and will either give them little notice or will have found ways to circumvent them. In the United States a large majority of people dislike feeling that they are being manipulated or presented with fewer choices and the online retailing community is going to have to deal with this. At a community, regional, state, or national planning level there will be more use of Big Data and it will have to compete with political attitudes which seem to be trending towards suspicion of ‘Big Data.’  Corporations will most likely be the largest users of Big Data and may find that the data out is only as good as the data in and the suppositions that went into planning the output. I think we will see some major mistakes.”

Most of the respondents who commented with concerns over government and/or corporate control of data chose to remain anonymous. Here are more of their observations:

— “I started dealing with data aggregation in the 1970s and have a copy of the 1970s US Health Education and Welfare report on computers, privacy, and databases on the bookshelf where I am typing this. Data aggregation is growing today for two main purposes: National security apparatus and ever more focused marketing (including political) databases. Neither of these are intended for the benefit of individual network users but rather look at users as either potential terrorists or as buyers of goods and services. Already it costs a lot for people to fetch the results of some of these things, even simple things like credit scores are available to the data subject only for a fee. Information is power, and power will cost money.”  

— “Whenever corporations or governments get involved with anything, they rarely behave in what could be considered an ‘altruistic’ fashion. Corporations will monopolize Big Data to make money; an unethical government administration could use it to wreak havoc on private lives, which I believe is already happening in the United States, under the auspices of preventing child pornography and exploitation. While certainly a worthwhile endeavor, there are implications to eroding citizens’ rights to privacy to carry out sting operations. This sets an uneasy precedent where suspicion of activity can trump proof, and entrapment follows closely behind. I see this being a major problem for journalists and political bloggers in the future.”  

— “The false confidence already plagues risk-management ‘professionals.’ No one looking at the Big Databases predicted the criminal activities of the financial sector, anything could be changed to look a certain way, and any channel to data could be clogged, muddled, or dirtied to the point where an independent analysis is undermined. Files have and will be deleted on demand. There is no moral code in the algorithms, no ethics, no enforcement. These tools are only indexes pointing to areas of further research. Without a more robust system of checks and balances and independent watchdogs, these systems will not guarantee fidelity to the truth.”

— “Money will drive access to large data sets and the power needed to analyze and act on the results of the analysis. The end result will, in most cases, be more effective targeting of people with the goal of having them consume more goods, which I believe is a negative for society. I would not call that misuse, but I would call it a self-serving agenda.”

— “Data is not information, and information is not knowledge, and knowledge is not wisdom. Conducting things as they had been conducted up-to-date, the finest information will serve ‘elastic’ statistics, neo-Nazi supremacy visions, wars based on ‘reliable intelligence relative to mass destruction weapons’ or faked president elections. The ethic-control thing will turn more and more important as the power of the Internet gives power to some men.”

— “Unless some major political upheaval changes the balance of power in the world, Big Data will be primarily in the hands of the increasingly small group of the rich and powerful. The tendency of those with immense power is to use tools such as Big Data to increase their power. Therefore, if the current direction of international power structures continues and power is increasingly concentrated in the hands of a few, the capabilities of Big Data will be used to further augment that power and will not be used for the good of the community.”

— “The majority of Big Data is and will continue to be in the hands of corporate interests which by definition are selfish bastards.” 

— “Big Data will probably be a cause of reduced freedom and privacy, and it will give advantages to companies that can spend money on analysis.”

— “It is unquestionably a great time to be a mathematician who is thrilled by unwieldy data sets. While many can be used in constructive, positive ways to improve life and services for many, Big Data will predominantly be used to feed people ads based on their behavior and friends, to analyze risk potential for health and other forms of insurance, and to essentially compartmentalize people and expose them more intensely to fewer and fewer things.” 

— “Humanity will always have greed and corruption and deception, and this can only be mitigated by insightful analysis of fact, open sharing of information and logical decision making.”

Some predict that algorithms will most negatively impact the lives of those who are already disadvantaged

Other answers were related to those in the previous section in that they explored the motives and behaviors of the data collectors. But their material concentrated on the effect of Big Data on those who are not themselves powerful.

Fred Stutzman, a postdoctoral fellow at Carnegie Mellon University and creator of the software Freedom, Anti-Social, and ClaimID, said, “We must stay mindful that Big Data is not a complete lens, especially when interpreting the human condition.”

Steve Sawyer, professor and associate dean of research at Syracuse University and an expert with more than 20 years of research on the Internet, computing, and work, wrote, “Our vision of the data is based on our vision of the world, and this vision is not very broad-minded when it comes to Big Data. We tend to emphasize the parietal insights of a particular form of economic thinking, and we tend to frame social analyses through a form of soft colonialism. Such bias, combined with the arrogance of technical competence, will create huge disparities between ‘what the data say’ and the lives of billions of people.”

Brian Harvey, a lecturer at the University of California-Berkeley, noted, “The collection of information is going to benefit the rich, at the expense of the poor. I suppose that for a few people that counts as a positive outcome, but your two choices should have been ‘will mostly benefit the rich’ or ‘will mostly benefit the poor,’ rather than ‘good for society’ and ‘bad for society.’  There’s no such thing as ‘society.’ There’s only wealth and poverty, and class struggle. And yes, I know about farmers in Africa using their cell phones to track prices for produce in the big cities. That’s great, but it’s not enough.”

Ebenezer Baldwin Bowles, owner and managing editor of corndancer.com, wrote, “With Big Data comes Great Power, and neither shall be used wisely for the common good. The objective is not to reveal opportunity for the elimination of scarcity among the many, but to identify fertile ground for exploitation and control.”

Paul McFate, an online communications specialist based in Provo, Utah, said, “New media channels will continue to splinter consumers and enhance the social divide. Intelligent people will use the information well, but the average person will continue to look for bright shiny objects that will entertain. Abusive people will continue to abuse. Providing access to data does not change moral behavior.”

Daren C. Brabham, an assistant professor of communications at the University of North Carolina-Chapel Hill, said, “Our reliance on algorithms is already proven to be problematic, evidenced by the fickle nature of the stock markets and other things. As we keep funneling the best and brightest mathematicians into algorithm-focused professions (like finance), we’ll continue to abstract real labor and real human concerns further away from real consequences and circumstances. This is a massive ethical problem, too.”

Paul Gardner-Stephen, a telecommunications fellow at Flinders University, observed, “While many benefits will arise from an Internet of Things, while the things remain in the possession of very few centralised interests, it will present great potential for abuse. History tells us that where such potential exists and sufficient money and civil-control can be made by the abuses, that such abuses will continue in increasing measure. Face recognition and tracking alone present the simple means to create an almost inescapable police state. Hats and coats such as those worn by Spy-vs-Spy will become more appealing, although ultimately ineffective as statistical and probabilistic algorithms allow the tracking of even persons cloaked (literally or by other means).”

Arthur Asa Berger, professor emeritus of communications at San Francisco State University, said, “While the Internet does allow dissidents to have a voice, for the main part they are not heard relative to the power of the dominant elites, members of the ruling class, etc.”

Frank Odasz, president of Lone Eagle Consulting, a company specializing in Internet training for rural, remote, and indigenous learners, wrote, “The politics of control and the politics of appearances will continue to make the rich richer and diminish the grassroots and disenfranchised until the politics of transparency make it necessary for the top down to partner meaningfully with the bottom up in visible, measurable ways. The grassroots boom in bottom-up innovation will increasingly find new ways to self-organize as evidenced in 2011 by the Occupy Wall Street and Arab Spring movements.”

David A.H. Brown, executive director of Brown Governance Inc., a consulting business based in Toronto, Canada, noted, “Democratization is the issue; this has tremendous implications for social structure and social order (increasing pressure by ‘have-nots’ on ‘elites’) as well as privacy, family, and culture. A big unanswered question is who will control Big Data?  Whomever controls the information will have greater power and influence, and they may use this for positive or negative results.”

Purposeful education about Big Data might include warnings that anticipate manipulation of data analysis outcomes; trust features might be built into the data scrutiny

Some respondents wondered if some negatives of Big Data might be mitigated by more serious study and purposeful planning.

John Horrigan, vice president of TechNet, a research organization, said, “‘Big Data’ is very much undiscovered country for citizens and policymakers, and its beneficial potential depends on getting governance and citizen education right. The tech sector is typically pretty good at ushering in new applications in a secure way. But it is easy to underplay the importance of educating the public on what this all means, not least to foster widespread use of the affordances of ‘Big Data’ (e.g., health care delivery, home energy management). So a word of caution is in order if such efforts are not undertaken.”

Maureen Hilyard, development programme coordinator for the New Zealand High Commission and vice chair of the board of the Pacific Chapter of the Internet Society, responded, “The big issue to do with the Internet is trust in where the information comes from and how it is used. As long as people know who is offering the information and can trust its source, then there is a better understanding of what the world is about. However, confidence in the technology declines when it is misused and people are harmed as a result of miscommunication or false communication. Education in the appropriate use of the Internet, taking more flexible advantage of the diversity of access that the World Wide Web offers, and the security of online information during financial transactions, are what I see are the big issues for the future, to ensure that the Internet is used appropriately and safely, and provides the positive impacts for which it has the greatest potential.”

Hugh F. Cline, adjunct professor of sociology and education at Columbia University, wrote, “It will be necessary to regulate these activities to ensure that they are used for the benefits of all peoples. Furthermore, it will be necessary to educate ourselves to ensure that we can recognize abuses and self-servicing frauds.”

An anonymous respondent commented, “Data is misused today for many reasons, the solution is not to restrict the collection of data, but rather to raise the level of awareness and education about how data can be misused and how to be confident that data is being fairly represented and actually answers the questions you think it does.”    

John Kelly of the Monitor Group says that savvy individuals can fight back if those in power misrepresent to the public what the data is showing. “The positive outcomes for ‘Big Data’ will depend on the general availability of powerful analytic and visualization tools and widespread fluency in their use. Even if consumer and pro-consumer data dashboards are less advanced than those employed by commercial business and government intelligence agencies, the ability of individuals and small organizations to develop analytically rigorous counter narratives complete with dynamic 3D graphs that go up on YouTube and get picked up by CNN could provide a check on the presumptuous uses of data mining.”

Marcel Bullinga, futurist and author of Welcome to the Future Cloud – 2025 in 100 Predictions, observed, “Big Data can be manipulated as well as Small Data. It is not about big or small! It is about embedding all data with trust and privacy features. We must develop the ‘Cloud Seal’ and wrap all data in such a notary-like seal.”

Heywood Sloane, principal at CogniPower, a consulting business, said, “This isn’t really a question about the Internet or Big Data—it’s a question about who and how much people might abuse it (or anything else), intentionally or otherwise. That is a question that is always there—thus there is a need for countervailing forces, competition, transparency, scrutiny, and/or other ways to guard against abuse. And then be prepared to misjudge sometimes.”

Humans seem to think they know more than they actually know, but despite all of our flaws, looking at the big picture usually helps

In addition to the longer observations by Stowe Boyd, Jerry Michalski, and Patrick Tucker that were shared in the opening “Overview” of this report (on pages 9-12), several respondents wrote extended and thoughtful answers that included historical perspective and general observations about human nature and society.

Kevin Novak, co-chair for the eGov Working Group of the World Wide Web Consortium, speaker and author on electronic government and consultant to the World Bank on the eTransform Initiative, observed: “Society often finds itself in a perplexed state as it attempts to understand large issues, solutions, and items given the diversity of environments, actions, and opinions throughout diverse cultures of the world. It is often challenging, given this diversity, to determine what is the best course of action/plan to move forward. A growing mass of data can help better inform decision-making, identify trends, and connect data bits to see a larger picture than what was previously known. Big Data will, however, offer challenges unless the tools, methods, and technologies are developed that can aid in relating unstructured data together to tell a story. The tools, methods, and technologies will be the challenge in 2020, not the availability of the data itself. Society will continue to struggle with privacy. As more and more of our lives are archived and mined on the Web, opportunists will continue to explore ways to exploit the available data for their own, not-so-honest means. How we respond and manage should continue to be a major focus in the Internet community through 2020. We must understand the challenges and opportunities, know the gaps that exist, and offer the best chances for addressing these.”

Michael Castengera, senior lecturer at the Grady College of Journalism, wrote: “At one point, futurists talked about the development of a ‘global brain’ through the Internet. That scenario may appear to be hyperbole but not if you differentiate between the definitions of ‘brain’ and ‘mind.’  The brain now creates its own version of algorithms, which allow for advanced correlations and new understanding. In the continuing evolution of the Internet, the algorithms more and more connected disjointed data in a way that mimics the brain’s connecting synapses. As developments continue, the analogy could be drawn of an Internet that has an autonomous and semi-autonomous nervous system. Control of that development will be held by major institutions—not just corporations, but the direction of that development will be determined by individuals. It is the debate about what is the nature of the Internet and what it is to be—a debate that will pit individuals against institutions. My concern is that individuality will be lost to an institutional hegemony not because of their ‘selfish agendas’ so much as individuals’ complacency and acceptance of continuing personal intrusions.”

And an anonymous respondent wrote: “The more that data sets are open and accessible, entrepreneurial Web-savvy types will harness that raw material for different ends, and many times these may be philanthropic. We will see more visual representations of large data sets that will enable people to see the impacts of their activities as they play out in other parts of the world. Big Data will be used to forecast and predict, more simulations will be played out, and these simulations will help people to understand the complexity of our correlation to each other, as beings on this planet and beyond. People will try to ‘fix’ or ‘game’ scenarios based on simulations. We’ve already seen this in the past decade with the Wall Street crisis, but systems of this size and complexity are dynamic and self-regenerative. The realization of dynamic and emergent systems as a natural order will cause people to realize the foolishness of trying to game systems to the Nth degree. We will see the rise of more algorithmic thinking among average people, and the application of increasingly sophisticated algorithms to make sense of large-scale financial, environmental, epidemiological, and other forms of data. Innovations will be lauded as long as they register a blip in the range of large-scale emergent phenomena.”

  1. See “Core Techniques and Technologies for Advancing Big Data Science & Engineering.” National Science Foundation. Available at: http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504767
  2. Steve Lohr, “The Age of Big Data.” New York Times. Feb. 11, 2012. Available at http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html
  3. “CMOs: Big data is a game changer, but most say they’re not in the game yet.” Tech Journal. March 30, 2012. Available at: http://www.techjournal.org/2012/03/cmos-big-data-is-a-game-changer-but-most-say-theyre-not-in-the-game-yet-infographic/?utm_source=contactology&utm_medium=email&utm_campaign=eWire_%2803-30-12%29
  4. See for more details: http://answers.google.com/answers/threadview/id/526503.html
  5. “7 reasons your credit card gets blocked.” Fox Business News. Available at: http://www.foxbusiness.com/personal-finance/2010/08/06/reasons-credit-card-gets-blocked/
  6. Bruce Upbin, “How Intuit Uses Big Data for The Little Guy.” Forbes. April 26, 2012. Available at: http://www.forbes.com/sites/bruceupbin/2012/04/26/how-intuit-uses-big-data-for-the-little-guy/
  7. Patrick Tucker, “The Three Things You Need to Know About Big Data, Right Now.” World Future Society. Available at: http://www.wfs.org/content/three-things-you-need-know-about-big-data-right-now