Abstract:
Big Data concerns large-volume, complex, growing data
sets with multiple, autonomous sources. With the fast
development of networking, data storage, and the data
collection capacity, Big Data is now rapidly expanding in
all science and engineering domains, including physical,
biological and biomedical sciences. This article presents a
HACE theorem that characterizes the features of the Big
Data revolution, and proposes a Big Data processing
model, from the data mining perspective. This data-driven
model involves demand-driven aggregation of information
sources, mining and analysis, user interest modeling, and
security and privacy considerations. We analyze the
challenging issues in the data-driven model and also in the
Big Data revolution.
Introduction
Dr. Yan Mo won the 2012 Nobel Prize in Literature. This is
probably the most controversial Nobel prize of this
category, as Mo speaks Chinese, lives in a socialist
country, and has the Chinese government's support.
Searching on Google with "Yan Mo Nobel Prize", we get
1,050,000 web pointers on the Internet (as of January 3,
2013). "For all praises as well as criticisms," said Mo
recently, "I am grateful." What types of praises and
criticisms has Mo actually received over his 31-year writing
career? As comments keep coming on the Internet and in
various news media, can we summarize all types of
opinions in different media in a real-time fashion, including
updated, cross-referenced discussions by critics? This
type of summarization program is an excellent example of
Big Data processing, as the information comes from
multiple, heterogeneous, autonomous sources with
complex and evolving relationships, and keeps growing.
Along with the above example, the era of Big Data has
arrived (Nature Editorial 2008; Mervis J. 2012; Labrinidis
and Jagadish 2012). Every day, 2.5 quintillion bytes of
data are created and 90% of the data in the world today
were produced within the past two years (IBM 2012). Our
capability for data generation has never been so powerful
and enormous ever since the invention of information
technology in the early 19th century. As another example,
on October 4, 2012, the first presidential debate between
President Barack Obama and Governor Mitt Romney
triggered more than 10 million tweets within two hours
(Twitter Blog 2012). Among all these tweets, the specific
moments that generated the most discussions actually
revealed the public interests, such as the discussions
about Medicare and vouchers. Such online discussions
provide a new means to sense public interests and
generate feedback in real time, and are more appealing
than generic media, such as radio or TV
broadcasting. Another example is Flickr, a public picture
sharing site, which received 1.8 million photos per day, on
average, from February to March 2012 (Michel F. 2012).
Assuming the size of each photo is 2 megabytes (MB), this
resulted in 3.6 terabytes (TB) of storage every single day. As
"a picture is worth a thousand words," the billions of
pictures on Flickr are a treasure trove for us to explore
human society, social events, public affairs, disasters, etc.,
but only if we have the power to harness the enormous amount
of data. The above examples demonstrate the rise of Big
Data applications where data collection has grown
tremendously and is beyond the ability of commonly used
software tools to capture, manage, and process within a
tolerable elapsed time. The most fundamental challenge
for Big Data applications is to explore the large
volumes of data and extract useful information or
knowledge for future actions (Rajaraman and Ullman,
2011). In many situations, the knowledge extraction
process has to be very efficient and close to real-time
because storing all observed data is nearly infeasible.
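
The figures quoted in the examples above can be checked with a few
lines of arithmetic. The following is a minimal sketch in Python. It
assumes decimal units (1 TB = 10^6 MB) and treats the 10 million
tweets as a lower bound; the function and constant names are ours,
chosen for illustration only.

```python
# Back-of-envelope check of two figures quoted in the Introduction.

PHOTOS_PER_DAY = 1_800_000    # from the text: 1.8 million photos per day
PHOTO_SIZE_MB = 2             # assumed in the text: 2 MB per photo
MB_PER_TB = 1_000_000         # our assumption: decimal units, 1 TB = 10^6 MB

DEBATE_TWEETS = 10_000_000    # from the text: "more than 10 million tweets"
DEBATE_SECONDS = 2 * 60 * 60  # from the text: within two hours


def daily_storage_tb(photos_per_day: int, photo_size_mb: float) -> float:
    """Return the storage needed per day, in terabytes."""
    return photos_per_day * photo_size_mb / MB_PER_TB


def average_rate_per_second(events: int, seconds: int) -> float:
    """Return the average number of events per second over a window."""
    return events / seconds


if __name__ == "__main__":
    storage = daily_storage_tb(PHOTOS_PER_DAY, PHOTO_SIZE_MB)
    rate = average_rate_per_second(DEBATE_TWEETS, DEBATE_SECONDS)
    print(f"Flickr: {storage:.1f} TB of new photos per day")
    print(f"Debate: at least {rate:.0f} tweets per second on average")
```

With binary units (1 TB = 2^20 MB) the Flickr figure would come out
somewhat lower, so the 3.6 TB estimate should be read as an order of
magnitude rather than an exact requirement.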
