Challenging the Big Data Tidal Wave in the Cloud and Social Media
Emergence of New Information Technology Ecosystem
For several decades, the Internet has been growing explosively and generating tremendous benefits for our world. As pointed out by Werbach (1997), the Internet is fundamentally different from other communications technologies (e.g., the traditional telephone network). Because the Internet has an open and flexible architecture, it can provide an endless spiral of connectivity; that is, any form of network can connect to and share data with other networks through the Internet. As a result, the services provided through the Internet are separated from the underlying infrastructure to a much greater extent than with other media.
Cloud computing (Harmer et al., 2009; Hayes, 2008; Milojicic, 2008; Weiss, 2007) has emerged as a new generation of business infrastructure. Unlike the traditional wired, client/server-based system architecture, this platform consists of wireless and cloud-based system environments. It supports new business models, such as user-driven purchase and click-to-install on any device. It also creates new service deployment models by enabling lower total cost of ownership (TCO), scalability, and short time-to-usage.
People can communicate and interact with anyone, anytime and anywhere, using smartphones. Smartphones (e.g., iPhone and Galaxy), with their rich application support, are one of the fastest-growing segments of the mobile communication industry. Unlike traditional cellular phones, today's smartphones are used for sharing information (e.g., social networking and geographic location services) and enjoying entertainment (e.g., games and sports), a combination called infotainment (Moy et al., 2005). The wide adoption of smartphones has opened new opportunities for business organizations and is driving innovation in business. As a consequence, businesses are expected to experience increased productivity and efficiency.
Another area of distinctive growth in the IT ecosystem is social media (Cusumano, 2011; Häsel, 2011; Violino, 2011; Foster et al., 2010; Beckman, 2010; Hathi, 2009). With mobile cloud computing and social media tools, the world is changing and becoming more intelligent and interconnected. These phenomena have become a revolutionary driving force for the development of a new digital era, the era of big data. People are coming to enjoy an intelligent digital life.
Evolution of Computing Platforms
Modern enterprises (Sturdevant, 2011) rely heavily on information systems. Mainframe systems, introduced in the 1960s and 1970s, were timesharing systems that served many connected terminals from a large and powerful data processing system. In the 1980s, personal computers (and workstations) were connected to each other as networked PCs, but they still communicated only within the company, using private networking software. In the 1990s, Internet-based enterprise information systems were introduced. Employees could use the enterprise information systems through the Internet regardless of geographical distance. Instead of private networking software, Web-based standards and protocols were embedded in the enterprise systems.
Recently, we have seen the emergence of a new enterprise information system platform, called the cloud computing platform (Cusumano, 2011). This platform uses the concept of the grid (Kurdi et al., 2008; Abramson et al., 2002), which builds a virtual supercomputer by connecting many networked computers and aggregating their resources (e.g., CPUs, storage, power supplies, and network interfaces) so that they can be used collectively. Cloud computing has been made possible by the shift to Internet technologies that are built on Web-based standards and protocols. Figure 1 compares the architectures of the mainframe, client/server, and cloud computing platforms.
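The idea of aggregating the resources of many networked computers can be illustrated with a minimal sketch. The `Node` class, its fields, and the machine names are hypothetical and chosen only for illustration; a real grid would also have to handle scheduling, failures, and network cost, which are ignored here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """One networked computer contributing resources (hypothetical model)."""
    name: str
    cpus: int
    storage_gb: int

def aggregate(nodes):
    """Sum the resources of all nodes into one virtual pool."""
    return {
        "cpus": sum(n.cpus for n in nodes),
        "storage_gb": sum(n.storage_gb for n in nodes),
    }

# Three illustrative machines; the figures are invented for the example.
nodes = [Node("pc-a", 4, 500), Node("pc-b", 8, 1000), Node("pc-c", 2, 250)]
pool = aggregate(nodes)
print(pool)
```

From the user's point of view, the pool behaves like a single large machine, even though no individual node offers that capacity.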
For the last couple of decades, the client/server architecture (Abdul-Fatah, 2002) has been the main architecture of the Internet. It is built on a distributed environment. Virtualization technologies have been introduced in server systems to create virtual forms of operating systems, storage devices, and network resources. Virtualization can occur at different levels, such as users, applications, processors, storage devices, and networks. It allows multiple users to share access to different devices, much like one computer controlling other machines and consolidating information to improve efficiency.
Cloud computing (Mell and Grance, 2011) is a form of virtualization that involves data outsourcing with no up-front cost and provides just-in-time services. It is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. It provides resources over the Internet on demand and eliminates the cost of in-house infrastructure. The key drivers for cloud computing are bandwidth increases in networks, cost reductions in storage systems, and advances in database systems.
As shown in Figure 2, cloud computing has three typical types of business models (Sotomayor et al., 2009): Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). In SaaS, customers can use applications, but cannot control the operating system, hardware, or network infrastructure. In PaaS, customers can use the hosting environment (i.e., servers) as well as their own applications, but still cannot control the operating system, other hardware, or network infrastructure. In IaaS, customers can use fundamental computing resources, such as processing power, storage, and network components, and they can also control the operating system, storage, deployed applications, and possibly selected networking components.
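The division of control among the three models can be summarized as a small lookup table. The sketch below is one reading of the paragraph above, not a normative definition: the layer names are hypothetical labels, treating deployed applications as customer-controlled under PaaS is an interpretation, and networking is left out of IaaS because the text describes it only as "possibly" controllable.

```python
# Layers the customer controls under each service model, as read from the
# paragraph above. Labels are illustrative, not standard terminology.
CUSTOMER_CONTROLS = {
    "SaaS": set(),                       # customer only uses the application
    "PaaS": {"applications"},            # assumption: own deployed applications
    "IaaS": {"applications", "operating system", "storage"},
}

def customer_controls(model, layer):
    """Return True if the customer controls `layer` under `model`."""
    return layer in CUSTOMER_CONTROLS[model]

for model in ("SaaS", "PaaS", "IaaS"):
    print(model, sorted(CUSTOMER_CONTROLS[model]))
```

Reading the table from top to bottom shows the pattern: each model hands the customer strictly more of the stack than the one before it.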
Big Data as a New Tidal Wave
Big data has come into the spotlight with the popular use of social media (Lacho and Marinello, 2010) such as Twitter (Weng et al., 2010; Jansen et al., 2009), Facebook (Chiu et al., 2008), and Flickr (Cha et al., 2009), which are becoming more prevalent and have created a new way of life for people (Hathi, 2009). According to IBM (2011), there are more than 2 billion Internet users and 4.6 billion mobile phone users in the world. Facebook (Foster et al., 2010) has more than 500 million users and creates 30 billion pieces of content every month. Every day, about 340 million messages are exchanged on Twitter. As a result, we are living in the "Age of Big Data."
What is big data? There is no single definition of big data, but, broadly speaking, it is the tidal wave of data, characterized not only by volume but also by velocity and variety, that comes from cloud computing and social media. Narrowly, it can be defined as data sets whose size is beyond the ability of typical database software tools to capture, store, manage, analyze, and visualize. However, the size of a data set that qualifies as big data changes over time, and the definition varies by industry sector. Big data today ranges from a few dozen terabytes (TB) to multiple petabytes (PB).
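To give a sense of the scale involved, the following sketch converts between the two units and checks a data set against a sector-specific threshold. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), and the 30 TB threshold is a hypothetical value standing in for "a few dozen terabytes"; as noted above, the actual threshold depends on the industry sector.

```python
TB = 10**12  # bytes in one terabyte (decimal convention, an assumption)
PB = 10**15  # bytes in one petabyte

def exceeds_threshold(size_bytes, threshold_bytes):
    """Return True if a data set is at least as large as the given threshold."""
    return size_bytes >= threshold_bytes

# Hypothetical sector threshold standing in for "a few dozen terabytes".
sector_threshold = 30 * TB

print(2 * PB // TB)                                 # petabytes expressed in TB
print(exceeds_threshold(2 * PB, sector_threshold))  # a multi-petabyte data set
print(exceeds_threshold(5 * TB, sector_threshold))  # a 5 TB data set
```

Passing the threshold as a parameter, instead of fixing it in the function, reflects the point that no single size defines big data for every sector.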
There are two types of data: structured and unstructured. Structured data refers to data with a high degree of organization, so that each element is identifiable, such as data in a database. Unstructured data is the opposite. Typical types of unstructured data include video clips, weblogs, and social media feeds. For example, email is a type of unstructured data because a message generally does not address precisely one subject or follow a fixed data format. Data in spreadsheets, on the other hand, is an example of structured data because it can be arranged in a database system. In reality, about 80 percent of data in the business world is unstructured. It may be data that we have already collected but could not process with current data mining tools.
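The practical difference between the two types can be seen in a minimal sketch. The records, the email text, and the search pattern below are invented for illustration. With structured rows, a value is retrieved by its field name; with free text, the same value has to be recovered by pattern matching, which works only if the writer happened to phrase it in the expected way.

```python
import re

# Structured: every record has the same named fields, as in a database table.
structured_rows = [
    {"customer": "A", "amount": 120.0},
    {"customer": "B", "amount": 80.0},
]
total = sum(row["amount"] for row in structured_rows)

# Unstructured: an email body mixes several subjects and has no fixed fields.
email_body = (
    "Hi team, the order from customer A came to about 120 dollars. "
    "Also, is anyone free for lunch on Friday?"
)
# A hand-written pattern is needed to pull out the amount (illustrative only).
amounts = [float(m) for m in re.findall(r"\b(\d+(?:\.\d+)?) dollars\b", email_body)]

print(total)
print(amounts)
```

If the email had said "roughly $120" instead, the pattern would find nothing, which is the kind of fragility that makes unstructured data hard for conventional tools to process.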
The characteristics of big data are large volume, high velocity, and wide variety. Data volume is expanding due to the increase of social media, online data collection, and location data, to name a few. Velocity is also accelerating with additional online activity and usage of sensor-enabled devices: the pace of business activity and competitive pressure increases as companies begin to use data on a more frequent basis, including streaming data. Variety reflects the mix of structured and unstructured data, such as video clips, weblogs, and social media feeds.
The Challenge of Big Data to Business Firms
The Internet is the backbone of our society, while mobile cloud computing is a central source of social change. Social media has created big data, which is beyond the ability of typical database software tools to capture, store, manage, analyze, and visualize. Today, businesses are challenged by big data because it grows so large that it becomes awkward to work with using on-hand database management tools. However, big data has big potential in that it can generate significant value across sectors such as health care, retail, manufacturing, and the public sector.
- Abdul-Fatah, I. (2002). Performance of CORBA-based client-server architectures. IEEE Transactions on Parallel and Distributed Systems, 13 (2), pp. 111-127.
- Abramson, D., Buyya, R., and Giddy, J. (2002). A computational economy for grid computing and its implementation in the Nimrod-G resource broker. Future Generation Computer Systems, 18 (8), pp. 1061-1074.
- Beckman, M. (2010). Enterprise security vs. social media. System iNEWS, SystemiNetwork.com, pp. 21-27.
- Cha, M., Mislove, A., and Gummadi, K. (2009). A measurement-driven analysis of information propagation in the Flickr social network. In Proceedings of the 18th International Conference on World Wide Web.
- Chiu, P., Cheung, C., and Lee, M. (2008). Online social networks: Why do we use Facebook? Communications in Computer and Information Science, 19, pp. 67-74.
- Cusumano, M.A. (2011). Technology strategy and management: Platform wars come to social media. Communications of the ACM, 54 (4), pp. 31-33.
- Foster, M. K., Francescucci, A., and West, B.C. (2010). Why users participate in online social networks. International Journal of e-Business Management, 4 (1), pp. 3-19.
- Harmer, T., Wright, P., Cunningham, C., and Perrott, R. (2009). Provider-independent use of the cloud. In Proceedings of the 15th International European Conference on Parallel and Distributed Computing, p. 465.
- Häsel, M. (2011). OpenSocial: An enabler for social applications on the Web. Communications of the ACM, 54 (1), pp. 139-144.
- Hathi, S. (2009). How social networking increases collaboration at IBM. Strategic Communication Management, 14 (1), pp. 32-35.
- Hayes, B. (2008). Cloud computing. Communications of the ACM, Issue 7, pp. 9-11.
- IBM. (2011). Better business outcomes with business analytics. White Paper, IBM Software Group.
- Jansen, B.J., Zhang, M., Sobel, K., and Chowdury, A. (2009). Twitter power-tweets as electronic word-of-mouth. Journal of the American Society for Information Science and Technology, 60 (11), pp. 2169-2188.
- Kurdi, H., Li, M., and Al-Raweshidy, H. (2008). A classification of emerging and traditional grid systems. Distributed Systems Online, Issue 3.
- Lacho, K. J., and Marinello, C. (2010). How small business owners can use social networking to promote their business. Entrepreneurial Executive, 15, pp. 127-133.
- Mell, P., and Grance, T. (2011). The NIST definition of cloud computing. Special Publication 800-145. Retrieved from http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf.
- Milojicic, D. (2008). Cloud computing: Interview with Russ Daniels and Franco Travostino. IEEE Internet Computing, Issue 5, pp. 7-9.
- Moy, P., Xenos, M., and Hess, V. (2005). Communication and citizenship: Mapping the political effects of infotainment. Mass Communication and Society, 8 (2), pp. 111-131.
- SAP. (2011). SaaS, PaaS, cloud computing: The next generation of enterprise software, Presentation Slides, SAP.
- Sotomayor, B., Montero, R., Llorente, I., and Foster, I. (2009). Virtual infrastructure management in private and hybrid clouds. IEEE Internet Computing, 13, pp. 14-22.
- Sturdevant, C. (2011). Socializing the enterprise. eWeek, 28 (1), p. 34.
- Violino, B. (2011). Social media trends. Communications of the ACM, 54 (2), p. 17.
- Weiss, A. (2007). Computing in the clouds. netWorker, 4, pp. 16-25.
- Weng, J., Lim, E., Jiang, J., and He, Q. (2010). Twitterrank: Finding topic-sensitive influential Twitterers. In Proceedings of the Third ACM International Conference on Websearch and Data Mining, ACM.
- Werbach, K. (1997). Digital tornado: The Internet and telecommunications policy. Working Paper, FCC Office of Plans and Policy, No. 29.