From IEEE Communications Magazine February 2017
Communications, ComSoc, and Big Data: What and What Next?
Last year in the President’s Pages we covered three new technology areas in which ComSoc was involved: 5G, IoT, and Fog. This article describes Big Data, a relatively new area for ComSoc. Leading the IEEE Big Data Initiative is David Belanger, who is also assisting ComSoc by taking a critical role in guiding and supporting its transition from an Initiative to a viable program.
David Belanger is currently a Senior Research Fellow at Stevens Institute of Technology. He continues his work in Big Data Technology, Applications, and Governance, and is a leader in the Business Intelligence & Analysis Master’s Degree program. He leads the IEEE Big Data Initiative (bigdata.ieee.org), which addresses activities ranging from a new data repository to standards and educational activities in support of IEEE in the area of Big Data. In addition, he is involved in several national and international activities supporting Big Data issues in policy and education; sits on the advisory boards of several companies, journals, and university programs; and is active in research and speaking on topics related to Big Data. He currently holds 31 patents.
Dr. Belanger retired as Chief Scientist of AT&T Labs, and Vice President of Information, Software, & Systems Research. He created the AT&T InfoLab, a very early (1995) participant in “Big Data” research and practice. Prior to that, he led the Software Engineering Research Department at Bell Labs. He holds a Ph.D. in Mathematics from Case Western Reserve University. Among his awards are the AT&T Science and Technology Medal for contributions in very large scale information mining technology; AT&T Fellow for “lifetime contributions in software, software tools, and information mining”; and the IEEE Communications Society Industrial Innovator Award.
For those of us who have been involved in the evolution toward Big Data over the past two decades, one of the most eye-opening changes has been the ability to do things that were previously too difficult, too costly, or just impossible. An example is the ability to discover rare events, e.g., fraud, from massive amounts of data. Prior to the ability to move and store terabytes of data, analysts were required to sample data, apply aggregate thresholding tests to the data, and/or spend large amounts on hardware and communications. Even though the amount of data has grown by several orders of magnitude over two decades, it is now possible to isolate specific instances by their own, specific behaviors. Combining this flow of data with advanced analytics, sometimes deep learning, has created opportunities in areas like medicine, video analysis, finance, and telecom, many of which are just emerging into production.
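To make the contrast between aggregate thresholding and per-instance behavior concrete, here is a minimal Python sketch. It is an illustration only, not a description of any production fraud system: the (account, amount) record layout, the function names, the z-score cutoff of 3, the minimum history of five records, and the sample data are all assumptions introduced for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def global_threshold_flags(records, limit):
    """Aggregate test: flag every record whose amount exceeds one global limit."""
    return [(account, amount) for account, amount in records if amount > limit]


def per_account_flags(records, z_cutoff=3.0, min_history=5):
    """Flag records that deviate strongly from that account's own earlier records.

    Each account is compared only with its own history, so a small account
    and a large account are judged against different baselines.
    """
    history = defaultdict(list)
    flagged = []
    for account, amount in records:
        past = history[account]
        if len(past) >= min_history:
            mu = mean(past)
            sigma = pstdev(past)
            # Skip the test when the history has no spread (sigma == 0).
            if sigma > 0 and abs(amount - mu) / sigma > z_cutoff:
                flagged.append((account, amount))
        past.append(amount)
    return flagged


# Hypothetical data: one account with small amounts, one with large amounts,
# followed by an unusual record for the small account and a routine one
# for the large account.
records = (
    [("small", 10.0 + i % 3) for i in range(10)]
    + [("large", 5000.0 + 100 * (i % 3)) for i in range(10)]
    + [("small", 400.0), ("large", 5100.0)]
)

if __name__ == "__main__":
    print("global threshold:", len(global_threshold_flags(records, 1000.0)), "records flagged")
    print("per-account test:", per_account_flags(records))
```

With this sample data, a single global limit of 1000 flags every routine record of the large account and misses the unusual 400 on the small account, while the per-account test flags only that one record. Keeping a history for every account is exactly what was impractical before terabytes of data could be moved and stored cheaply.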
This revolution in the use of data is dependent on a variety of things, including: advances in the cost and power of the computers used, particularly in parallel and distributed computing; the availability of software that scales to manage the huge amounts of data; enormous improvements in the analytic techniques available; and improvements in the understanding of valuable uses for the data. At the front of this list belongs the ability to move the data so that it is available where and when it is most useful, along with the role that communications, and the communications industry, have played in the evolution of Big Data.
Over time, communications has been central to the evolution of Big Data in at least three ways. The first, and probably most obvious, is the movement of the data itself. The amount of bandwidth, both for access and long haul, has been increasing exponentially for at least two decades, as has the ease of access to the bandwidth. Looking at the accompanying figure, one can clearly see a correlation between step function changes in the communications technologies available, e.g., Internet, IP, 3G/4G cellular, WiFi, and the amount of data that is available to be analyzed. It is also not hard to argue that, as access to these technologies has become easier, e.g., through the web, smartphones, and apps, both the amount of data generated by the “crowd” and the volume of results it consumes have exploded.
But this is not the whole story. There are very significant changes in the sources of the data, and in the nature of the data from those sources. Transactional data has long been the core of data used for corporate analysis, e.g., credit card swipes, telephony call detail data, and operational data from industries ranging from finance to manufacturing to web purchases to hospitals. This type of data is still at the heart of much corporate data analysis. Increasingly though, it is augmented by data that is far more detailed, perhaps from search engines, social networks, or even clickstreams and cookies, and is often unstructured, e.g., Twitter tweets and other textual data. This data is detailed enough to allow analysts to consider the behavior of users, as well as their purchasing decisions. We are now at the beginning of yet another step function change in the underlying communications technology, and this will, in turn, drive a new generation of applications, a new and much larger set of sources for data and consumers of the results of analysis, and even more convenient and widespread access to the data and its results. The technologies behind this change include social networking, 5th generation wireless (5G), software defined networks (SDN)/network function virtualization (NFV), Fog, and IoT (Internet of Things)/CPS (cyber-physical systems), among others. Some of these have already changed the way we communicate, and the way we do business. We have seen only the tip of the iceberg.
A second way that communications has influenced the evolution of Big Data is the contributions made by communications corporations to the technologies and applications in Big Data. During the early days of the current Big Data era, i.e., the mid-1990s, the technologies and uses of Big Data were led by a few industries. Significant among them were telecom and finance. Joined by the various cyber industries a bit later and medicine more recently, they are still among the leaders in this space. The applications that drive the communications industry include: network operations, service operations, customer marketing and service, fraud and security, location-based applications, and many more. These applications have been driven by successively larger and more compelling sources of data. On top of this are ever larger, and incredibly rich, sources of data created by very different forms of communications such as social networking (e.g., Facebook, Twitter).
The third way that communications has contributed to Big Data is one that is often overlooked. The policy implications of the availability of such large, and very detailed, sets of data, along with the associated advances in analytics applied to the data, have been significant, critical, and often contentious. These include things like governance, policy, compliance, organization, privacy/security, and the societal impact of applications. What we can expect in these areas is going to determine where big data can actually be leveraged. Some big issues are just starting to evolve, and policy has not yet kept up. An easy place to think about this is in the area of location data. Accurate locations of individuals using GPS or other technologies are now held by a variety of organizations. This data has many critical applications for first responders and law enforcement organizations, as well as less critical but important and widely used applications in social interactions, marketing, manufacturing, and many more.
We are now on the cusp of yet another set of changes in the communications systems that will certainly increase the amount, and value, of big data by several orders of magnitude. In fact, it is likely to lead to what McKinsey and others are referring to as the digitization of our world. The communication technologies involved are led by 5G wireless, which is scheduled to be widely available in the US, and several other countries, within the next few years; and the associated Internet of Things, a potentially far larger, and very different, version of a “crowd” (a “crowd” of things as well as people), which will once again fundamentally change how we think about data, its availability, and its uses. We can expect that the communications infrastructure will support terabit/second communications, tens of billions of sensors operating off very low power sources, and latency in the single-digit milliseconds. This will surely support new levels of predictive analysis. It also means that real-time, immersive, augmented reality and gaming applications across wide areas will appear. One could think of, for example, computer games that can be played involving physical as well as virtual motion, across continents, in real time. This could include the use of huge databases that support realism in terms of video, audio, and structured data.
As you might expect, the IEEE Communications Society (ComSoc) is very active in Big Data on a number of fronts. Several of our volunteers, including myself, are actively involved in the IEEE Big Data Initiative (bigdata.ieee.org) with activities including: support/proposal of new and existing publications, conferences, and workshops; new standards and educational proposals; an exciting new data repository (https://ieee-dataport.org/) now being trialed; outreach (e.g., Collabratec, Facebook, Twitter, podcasts); analytics contests at selected IEEE conferences; and many other activities. In addition, ComSoc has a Technical Committee on Big Data (TCBD, http://bdpan.committees.comsoc.org/) actively involved in many Big Data issues related to communications.
Big Data is a prominent topic in nearly every major ComSoc conference, including Globecom, ICC, and NOMS, in the form of speakers, panelists, special sessions, and contests. In addition, ComSoc offers a large suite of webinars, webcasts, white papers, local and international conferences, and other educational programs available to all members on the topics described in this column. To go more deeply into these and other communication areas, I encourage you to visit www.comsoc.org and look at the large catalog of learning material in the areas discussed, and more.
Looking to the future, I expect ComSoc to be deeply involved in a variety of Big Data activities. For example, as the IEEE Big Data Initiative transitions from Initiative status, ComSoc will have a critical role as one of the primary IEEE Societies guiding and supporting that transition. ComSoc has been among the leading Societies in support of the activities of the Initiative. Together with the several other IEEE Societies that have been active in the Initiative, ComSoc must ensure, through volunteers, funding, and active support, that the various activities already initiated are expanded and made permanent. For example, the data repository IEEE Dataport is now in trial. Success will depend on an active user community, both contributors and users of the data, on involvement of volunteers to provide guidance in the product evolution, and, where necessary, on funding until it becomes self-supporting. The analytical contests currently being held at ComSoc and other IEEE Society conferences are aimed at learning how IEEE can more effectively use its valuable storehouse of data. This process will take a while, and we need to be leaders as it evolves by providing sources of data, expertise of volunteers, and continued conference support. In addition, Big Data will continue to be an essential component of conferences, standards, and educational activities. ComSoc, because of the role that communications plays in Big Data and its future, must take a leadership role going forward. Some newer challenges, for example ensuring the quality and survivability of large, complex Big Data systems, are areas that ComSoc should take the lead in addressing.
So what can we expect in the combination of communications and big data? As described above, some trajectories are clear. Much faster, lower-latency, lower-power communications will lead to orders of magnitude more data of all forms. Increasingly powerful techniques for management, analysis, and visualization of data will emerge, leading to applications that we probably can’t even imagine today. With the increase in machine learning based applications embedded in large systems of systems, we can expect much more research in deploying and operating such systems. Finally, we can expect a dynamism that we haven’t come close to yet, driven by more open data about anything and everything, more innovative consumer and business applications, and a world in which we can communicate with nearly everything.