Basic understanding of Big Data and its terminology

This article is intended to define big data and its related terms. It first covers the three V’s of big data (high-volume, high-velocity, and high-variety) before describing how big data can aid better decision-making in an organization. It defines a variety of big data terms, with a closer look at the different methods for analyzing unstructured data and at how data mining is performed to find meaningful patterns. Ultimately, you should come away with a basic understanding of big data and its terminology.

Content Summary

Three V’s of Big Data
The Other V’s of Big Data
Structured and Unstructured Data
Data Mining
Key Message
Evaluation

Three V’s of Big Data

According to Forbes magazine, we’re creating 2.5 quintillion bytes of data EACH DAY. In fact, 90 percent of all the data in the WORLD was generated in the last three years alone. And, the pace is accelerating. We’re talking COLOSSAL amounts of data.

This brings up the concept of “big data.” Here’s what you need to know about it.

Big data describes the large volume of data that affects the world every day. According to the technology analyst firm Gartner Group, it’s defined by what are called the three V’s … “high-volume, high-velocity, and/or high-variety information assets that demand cost-effective forms of information processing and that enable insight, decision-making, and process automation.”

Got it? I… Just kidding. That’s a lot to understand, so let’s break it down.

High-volume data is driven by activities that generate massive amounts of information. It’s those quintillions of bytes, and it’s so much that it can’t be managed by traditional database systems.

High-velocity data refers to the rate at which this information is flowing into your organization. It’s coming at you fast, in continuous waves.

And, high-variety data is information that comes from multiple diverse sources. So, your data isn’t neatly structured and organized.

Here’s where the second part of the definition comes in. You need to process data, organize it, develop insight from it, and make better decisions with it. The challenge is to shorten the amount of time it takes to handle this information and then quickly use it to make informed decisions. You need to know the conditions as of NOW to respond appropriately.

The Other V’s of Big Data

Incidentally, there are some other V’s that are related to big data.

Data veracity refers to how certain and precise your data is. This is important, because, obviously, you can’t make GOOD decisions from BAD data.

Data value is directly related to data analytics. If you’re spending a lot of money manipulating data, you should know how you’re using it. So, what value is your data generating for you?

Data viability deals with the varying levels of importance within the MASS of data. What parts of it deliver the most insight for you?

Related to that is data visibility. It’s estimated that 90 percent of big data is collected but not used. Another name for this is “dark data.” It’s not being used to derive insights or help make decisions. So, there was basically no point to collecting it in the first place.

Big data can also GO dark … and this is called “perishable data.”

According to IBM, 60 percent of data loses its value immediately, meaning it’s useless if it’s not analyzed right away.

Data also perishes when you discard it. You never know when data might be useful. And, once your source data’s gone, it’s gone forever. So, remember: “When you can, keep everything.”

Now that you know the V’s of big data, it’s time to become familiar with some other key terms:

Structured and Unstructured Data

First, there’s structured and unstructured data.

Structured data is anything that can be put into a relational database, where it’ll be organized in a way that relates to other data in a table format. Unstructured data, on the other hand, is everything that CAN’T be organized that way, such as emails, online posts, media, and so on.
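To make the difference concrete, here is a minimal sketch in Python. The table name, column names, sample rows, and email text are all made up for illustration. It shows that structured rows can be summarized with a direct query, while free text has to be interpreted before it tells you anything.

```python
import sqlite3

# Structured data: rows with fixed columns fit a relational table.
# The table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Ana", 25.0), (2, "Ben", 40.0), (3, "Ana", 15.0)],
)

# Because the structure is known, a query can relate and summarize rows directly.
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)

# Unstructured data: free text has no columns, so it must be interpreted first.
# This keyword check is a deliberately crude stand-in for real text analysis.
email = "Hi team, my order arrived late and the box was damaged. Please advise."
mentions_problem = any(word in email.lower() for word in ("late", "damaged"))
```

The keyword check is only a placeholder. Real analysis of unstructured data usually needs far more than a word search.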

So, what do you do with this data? You analyze it … potentially in three different ways.

One … Descriptive analytics. This collates the data … like how you analyze your personal spending and develop a monthly budget.

Two … Predictive analytics. This uses advanced statistical techniques to forecast future events using current data … like using your budget and cost projections to estimate spending for next year.

And, three … Prescriptive analytics. This builds on the predictive analytics to determine what the outcomes would be for various actions … like analyzing your predictions to see what the best options might be for reducing your personal expenditures next year.
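The personal budget example can be sketched in a few lines of Python. Everything here is an assumption for illustration: the spending figures, the savings attached to each action, and the choice of a simple straight-line trend as the "advanced statistical technique."

```python
from statistics import mean

# Hypothetical monthly spending for the past six months (made-up numbers).
monthly_spending = [2000, 2100, 2050, 2200, 2150, 2300]

# Descriptive: summarize what has already happened.
average_spend = mean(monthly_spending)


def linear_trend(values):
    """Fit a straight line through the values with ordinary least squares."""
    xs = range(len(values))
    x_mean, y_mean = mean(xs), mean(values)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Predictive: extend the trend one month into the future.
slope, intercept = linear_trend(monthly_spending)
next_month_forecast = intercept + slope * len(monthly_spending)

# Prescriptive: compare the predicted outcome of candidate actions.
# Each action's monthly saving is an assumed figure.
actions = {"cancel subscriptions": 60, "cook at home": 150, "do nothing": 0}
outcomes = {name: next_month_forecast - saving for name, saving in actions.items()}
best_action = min(outcomes, key=outcomes.get)
```

Each step builds on the one before it: the summary describes the past, the trend line projects it forward, and the comparison of actions uses that projection to suggest what to do.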

Typically, the person who does all this is a data scientist. This is someone who can take all this raw big data, simplify it, make sense out of it, and derive useful insights about the organization.

Data Mining

Another useful term is data mining, which involves finding meaningful patterns in big data using statistical methods, artificial intelligence, machine learning, and algorithms.

Statistical methods encompass a wide range of techniques used to analyze all kinds of data formats.

Artificial intelligence refers to programs that can perform tasks that normally require human intelligence, such as visual perception, speech recognition, and decision-making. It can simulate or imitate human behavior.

Machine learning is an application of artificial intelligence that gives systems the ability to automatically learn and improve from experience … without being explicitly programmed to do so. It accesses data and uses it to learn for itself.

This is done using algorithms, which are step-by-step procedures for performing calculations, data processing, and automated reasoning. Essentially, an algorithm is a set of rules or instructions given to an artificial intelligence program to help it learn on its own.
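Here is a small Python sketch of what "learning from experience" can look like. It is one simple technique (a nearest-centroid classifier), not the only way machine learning works, and the viewer data and labels are invented for illustration. Notice that nobody writes a rule like "more than 6 hours means heavy." The program works out the boundary from the examples.

```python
from statistics import mean

# Hypothetical training examples: (hours watched per week, label).
examples = [
    (1, "casual"), (2, "casual"), (3, "casual"),
    (9, "heavy"), (10, "heavy"), (12, "heavy"),
]


def train(examples):
    """Learn one average value (centroid) per label from the examples."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(model, value):
    """Pick the label whose learned centroid is closest to the value."""
    return min(model, key=lambda label: abs(model[label] - value))


model = train(examples)
```

Calling `predict(model, 4)` picks whichever label's learned average is nearest to 4. Feeding `train` more examples changes those averages, which is the "improves from experience" part.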

All this is what allows a website to show ads that interest you or a video service to suggest movies you might like. And, it’s all kept in a data warehouse, the system for storing data that’s going to be analyzed and reported. This is typically built from data produced by the organization’s normal operational systems.

Next, cloud computing is when software and/or data is hosted and running on remote servers accessible from anywhere on the internet, or … the “cloud.”

The internet of things, or IoT, is the growing network of connected computing devices that are now embedded into formerly standalone items, such as appliances, home sensors, automobiles, and even wearables.

Finally, there are a host of tools and programs you might hear about … things like Hadoop, Cassandra, Impala, NoSQL, the R programming language, Spark … and many more. Thankfully, you don’t need to remember these unless you’re an IT guru. Just know that there are lots of programming tools out there to help manage your information.

Big data is here to stay … and it’s just going to keep getting bigger and more significant. But, when it comes to big data, … it’s not the amount of data that’s important. It’s what organizations DO with this data that matters. So, get your money’s worth. Learn as much as you can. And, ultimately, use big data to create insights that lead to better decisions for YOUR organization.

Key Message

  • Big data describes the large volume of data that affects the world every day, and it’s defined by what are called the three V’s—high-volume, high-velocity, and high-variety.
    • High-volume data is driven by activities that generate massive amounts of information.
    • High-velocity data refers to the rate at which this information is flowing into your organization.
    • High-variety data is information that comes from multiple diverse sources.
  • Data needs to be processed, organized and used to develop insight and make better decisions.
    • The challenge is to shorten the amount of time it takes to handle this information and then quickly use it to make informed decisions.
  • The other V’s of big data:
    • Data veracity refers to how certain and precise your data is.
    • Data value is directly related to data analytics and what value your data is generating for you.
    • Data viability deals with the mass of data’s various levels of importance and which parts deliver the most insight.
    • Data visibility involves data that is not being used to derive insights or help make decisions.
      • It’s estimated that 90 percent of big data is collected but not used; this unused data is referred to as dark data.
  • Perishable data refers to the 60 percent of data that loses its value immediately, and the term also covers data that has been discarded.
  • Structured data is anything that can be put into a relational database, where it’ll be organized in a way that relates to other data in a table format.
  • Unstructured data is everything that can’t be organized in a table format but can be analyzed in three ways:
    • Descriptive analytics collates the data.
    • Predictive analytics uses advanced statistical techniques to forecast future events using current data.
    • Prescriptive analytics builds on predictive analytics to determine what the outcomes would be for various actions.
      • The analysis is typically conducted by a data scientist, someone who can use big data, simplify it, make sense of it, and derive useful insights about the organization.
  • Data mining involves finding meaningful patterns in big data using statistical methods, artificial intelligence, machine learning, and algorithms.
    • Statistical methods encompass a wide range of techniques used to analyze all kinds of data formats.
    • Artificial intelligence refers to programs that can perform tasks that normally require human intelligence.
    • Machine learning is an application of artificial intelligence that gives systems the ability to automatically learn and improve from experience by accessing data and using it to learn for itself.
      • This is done using algorithms, which are step-by-step procedures for performing calculations, data processing, and automated reasoning.
  • A data warehouse is a system for storing data that’s going to be analyzed and reported.
  • Cloud computing is when software and/or data is hosted and running on remote servers accessible from anywhere on the internet, or the “cloud.”
  • The internet of things, or IoT, is the network of connected computing devices that are now embedded into formerly standalone items.

Evaluation

Question 1

High-_____________ data refers to the rate at which information is flowing into your organization.

A. Value
B. Variety
C. Velocity
D. Volume

Correct Answer:
C. Velocity
Answer Description:
High-velocity data refers to the rate at which information is flowing into your organization.

Question 2

Dark data refers to the 60 percent of data that loses its value immediately.

A. True
B. False

Correct Answer:
B. False
Answer Description:
The 60 percent of data that loses its value immediately is called perishable data.

Question 3

Data ____________ refers to how certain and precise your data is.

A. Value
B. Veracity
C. Viability
D. Visibility

Correct Answer:
B. Veracity

Question 4

Which of the following are part of the three V’s of big data? (Check all that apply.)

A. Data viability
B. Data visibility
C. High-variety data
D. High-velocity data
E. High-volume data

Correct Answer:
C. High-variety data
D. High-velocity data
E. High-volume data

Question 5

Data visibility involves data that is not being used to derive insights or help make decisions.

A. True
B. False

Correct Answer:
A. True

Question 6

Algorithms are step-by-step procedures for performing calculations, data processing, and automated reasoning.

A. True
B. False

Correct Answer:
A. True

Question 7

________________ refers to programs that can perform tasks that normally require human intelligence.

A. Artificial intelligence
B. Algorithms
C. Machine learning
D. Statistical methods

Correct Answer:
A. Artificial intelligence

Question 8

___________ data is anything that can be put into a relational database, where it’ll be organized in a way that relates to other data in a table format.

A. Dark
B. Perishable
C. Structured
D. Unstructured

Correct Answer:
C. Structured

Question 9

A data warehouse is a system for storing data that’s not going to be analyzed or reported.

A. True
B. False

Correct Answer:
B. False

Question 10

Which of the following are ways to find meaningful patterns in big data through data mining? (Check all that apply.)

A. Algorithms
B. Artificial intelligence
C. Machine learning
D. Statistical methods

Correct Answer:
A. Algorithms
B. Artificial intelligence
C. Machine learning
D. Statistical methods

Published by Julie Robert
