What is Big Data – Know More about Big Data
“Big data is like teenage sex: everyone talks about it, nobody really
knows how to do it, everyone thinks everyone else is doing it, so everyone
claims they are doing it.” – Dan Ariely
Click Here to know : Big Data Implementation - Java-Based Tools
What is Big Data, you may ask, and more importantly, why is it the latest
trend in nearly every business domain? Is it just hype, or is it here to stay?
As a matter of fact, “Big Data” is a pretty straightforward
term: it is just what it says, a very large data set. How large? The honest
answer is “as large as you can imagine”!
How can a data set be so massively big? Because the
data may come from everywhere and arrive at enormous rates: RFID sensors that
gather traffic data, sensors used to gather weather information, GPRS packets
from cell phones, posts to social media sites, digital pictures and videos,
online purchase transaction records, you name it! Big Data is an enormous
data set that may contain information from every possible source producing
data we are interested in.
Nevertheless, Big Data is more than simply a matter of
size; it is an opportunity to find insights in new and emerging types of data
and content, to make businesses more agile, and to answer questions that were
previously considered beyond our reach. That is why Big Data is characterized
by four main aspects: Volume, Variety, Velocity, and Veracity (Value), known as
“the four V's of Big Data”. Let's briefly
examine what each one stands for and what challenges it presents:
Volume
Volume refers to the amount of
content a business must be able to capture, store, and access. An estimated 90%
of the world's data has been generated in the past two years alone.
Organizations today are overwhelmed with volumes of data, easily amassing
terabytes, even petabytes, of information of all types, much of which needs to
be organized, secured, and analyzed.
Variety
An estimated 80% of the world's data is semi-structured.
Sensors, smart devices, and social media generate this data
through Web pages, weblog files, social-media forums, audio, video, click
streams, e-mails, documents, sensor systems, and so on. Traditional analytics
solutions work very well with structured information, for example data in a
relational database with a well-formed schema. Variety in data types represents
a fundamental shift in how data must be stored and analyzed to
support today's decision-making and insight processes. Thus Variety represents
the various types of data that cannot easily be captured and managed in a
traditional relational database but can be readily stored and analyzed with Big
Data technologies.
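To make the contrast concrete, here is a minimal schema-on-read sketch in Python. The sample records and field names are purely illustrative; the point is that each record keeps whatever fields it arrived with, and the "schema" is discovered from the data rather than imposed up front, as a fixed relational table would require:

```python
import json

# Hypothetical sample records with differing fields, as might arrive from
# web logs, sensors, and social media (all names are illustrative).
raw_events = [
    '{"type": "click", "page": "/home", "user": "u1"}',
    '{"type": "sensor", "temp_c": 21.5, "device": "d42"}',
    '{"type": "post", "text": "great product!", "likes": 17}',
]

# Schema-on-read: parse each record and work with whatever fields it has,
# rather than forcing every record into one rigid table up front.
events = [json.loads(line) for line in raw_events]

# The "schema" emerges from the data itself: collect the field names
# actually observed for each event type.
schema_by_type = {}
for event in events:
    schema_by_type.setdefault(event["type"], set()).update(event.keys())

for etype in sorted(schema_by_type):
    print(etype, sorted(schema_by_type[etype]))
```

This is essentially what document stores and Big Data query engines do at scale: defer schema decisions to read time, so new field combinations never require a table migration.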
Velocity
Velocity requires analyzing data in
near real time, aka “sometimes 2 minutes is too late!”. Gaining a competitive
edge means identifying a trend or opportunity minutes or even seconds before
your competitor does. Another example is time-sensitive processing such as
fraud detection, where information must be analyzed as it streams into your
enterprise in order to maximize its value. Time-sensitive data has a very short
shelf life, compelling organizations to analyze it in near real time.
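As a toy illustration of streaming analysis, the sketch below flags a payment card that transacts too often within a sliding time window, a classic velocity check used in fraud detection. The window size, limit, and sample stream are invented for the example:

```python
from collections import deque

class VelocityCheck:
    """Flag a card that makes more than `limit` transactions within a
    sliding `window_s`-second window (thresholds are illustrative)."""

    def __init__(self, window_s=60, limit=3):
        self.window_s = window_s
        self.limit = limit
        self.seen = {}  # card_id -> deque of recent timestamps

    def observe(self, card_id, ts):
        q = self.seen.setdefault(card_id, deque())
        q.append(ts)
        # Expire events that have fallen out of the window.
        while q and ts - q[0] > self.window_s:
            q.popleft()
        return len(q) > self.limit  # True => suspicious

checker = VelocityCheck()
stream = [("c1", 0), ("c1", 10), ("c1", 20), ("c1", 25), ("c2", 30)]
flags = [checker.observe(card, ts) for card, ts in stream]
print(flags)  # [False, False, False, True, False]
```

Because each event is evaluated the moment it arrives, the decision is made while the data still has value, rather than in a batch report hours later.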
Veracity (Value)
Acting on data is how we create
opportunities and derive value. Data is all about supporting decisions, so when
you are looking at decisions that can have a major impact on your business, you
want as much information as possible to support your case.
Nevertheless, the volume of data alone does not give decision makers enough
trust to act upon it. The truthfulness and quality of data
are the most important frontier for fueling new insights and ideas. Thus
establishing trust is probably the biggest challenge a Big Data solution must
overcome to provide a solid foundation for successful decision making.
While the existing installed base of
business intelligence and data warehouse solutions was not engineered to
support the four V's, Big Data solutions are being developed to address these
challenges.
The Importance of Big Data and What You Can Accomplish
The
real issue is not that you are acquiring large amounts of data. It's what you
do with the data that counts. The hopeful vision is that organizations will be
able to take data from any source, harness relevant data and analyze it to find
answers that enable 1) cost reductions, 2) time reductions, 3) new product
development and optimized offerings, and 4) smarter business decision making.
For instance, by combining big data and high-powered analytics, it is possible
to:
- Determine root causes of failures, issues and
defects in near-real time, potentially saving billions of dollars
annually.
- Optimize routes for many thousands of package
delivery vehicles while they are on the road.
- Analyze millions of SKUs to determine prices
that maximize profit and clear inventory.
- Generate retail coupons at the point of sale
based on the customer's current and past purchases.
- Send tailored recommendations to mobile
devices while customers are in the right area to take advantage of offers.
- Recalculate entire risk portfolios in minutes.
- Quickly identify customers who matter the
most.
- Use clickstream analysis and data mining to
detect fraudulent behavior.
Challenges
Many
organizations are concerned that the amount of amassed data is becoming so
large that it is difficult to find the most valuable pieces of information.
- What if your data volume gets so large and
varied you don't know how to deal with it?
- Do you store all your data?
- Do you analyze it all?
- How can you find out which data points are
really important?
- How can you use it to your best advantage?
Until
recently, organizations were limited to using subsets of their data, or were
constrained to simplistic analyses, because the sheer volume of data
overwhelmed their processing platforms. But what is the point of collecting
and storing terabytes of data if you can't analyze it in full context, or if
you have to wait hours or days to get results? On the other hand, not all
business questions are better answered by bigger data. You now have two
choices:
A. Incorporate massive data volumes in the analysis. If the answers you're seeking will be better provided by
analyzing all of your data, go for it. High-performance technologies that
extract value from massive amounts of data are here today, such as grid
computing, in-database processing, and in-memory analytics.
B. Determine upfront which data is relevant. Traditionally, the trend has been to store everything
(some call it data hoarding) and only when you query the data do you discover
what is relevant. We now have the ability to apply analytics on the front end
to determine relevance based on context. This type of analysis determines which
data should be included in analytical processes and what can be placed in
low-cost storage for later use if needed.
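Option B can be sketched as a simple ingest-time router. The relevance rule below is invented for illustration; a real deployment would use business rules or a trained model, but the shape is the same: score each record as it arrives and send it either to the hot analytics path or to low-cost storage.

```python
# Hypothetical ingest-time filter: decide relevance up front and route
# each record to the analytics pipeline or to cheap cold storage.
def is_relevant(record):
    # Illustrative rule: keep error events and high-value purchases.
    if record.get("level") == "error":
        return True
    return record.get("amount", 0) >= 100

incoming = [
    {"level": "info", "msg": "heartbeat"},
    {"level": "error", "msg": "payment gateway timeout"},
    {"event": "purchase", "amount": 250},
    {"event": "purchase", "amount": 5},
]

hot_path = [r for r in incoming if is_relevant(r)]
cold_storage = [r for r in incoming if not is_relevant(r)]
print(len(hot_path), len(cold_storage))  # 2 2
```

Nothing is thrown away: the cold-storage records remain available for later queries, but the expensive analytical processing only sees the data judged relevant in context.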