What is Big Data?

The term big data refers to the massive amounts of data – both structured and unstructured – that inundate organisations on a day-to-day basis.

Typically, big data is so large, and accumulates so fast, that traditional data storage and processing applications are inadequate.

The big data industry helps organisations capture and analyse their big data, so that those organisations can make more informed business decisions.

How “Big” is Big Data?

Big data is not defined by any particular size or volume. What counts as big data will depend on the organisation. Big data for a small flower shop will be different to big data for a multinational organisation.

However, most companies offering solutions for big data cater for extremely large amounts of data.

Here are some statistics/estimates regarding big data:

  • Most companies in the US store over 100 terabytes (100,000 gigabytes) of data.
  • The New York Stock Exchange captures 1 terabyte of trade data during each trading session.
  • Google processes 3.5 billion requests per day.
  • Google stores 10 exabytes of data (10 billion gigabytes).
  • Facebook ingests more than 500 terabytes of new data every day.
  • Amazon has over 1.4 million servers (where they host over a billion gigabytes of data).

And these figures have probably already been superseded many times over by now.

Furthermore, International Data Corporation (IDC) predicts that the world’s data is doubling in size every two years, and by 2020 our planet will have created 44 zettabytes (44 trillion gigabytes) of data.
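As a rough sanity check of that doubling rate, the arithmetic can be sketched in a few lines of Python. The 4.4-zettabyte baseline for 2013 is an assumed figure used here purely for illustration:

```python
# Rough sanity check: if global data doubles every two years,
# an assumed ~4.4 ZB baseline in 2013 projects forward to 2020 as follows.
baseline_zb = 4.4        # assumed starting size, in zettabytes (illustrative)
years = 2020 - 2013      # 7 years of growth
doubling_period = 2      # years per doubling

projected_zb = baseline_zb * 2 ** (years / doubling_period)
print(round(projected_zb, 1))  # ~49.8 ZB, in the same ballpark as the 44 ZB estimate
```

The projection lands in the tens of zettabytes, which is consistent with the order of magnitude IDC predicted.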

The “3 Vs” of Big Data (or is it 7, or 12?)

In 2001, industry analyst Doug Laney published a research note outlining the 3Vs of big data. The 3Vs are a framework for understanding and dealing with big data.

The 3Vs are:

  • Volume. The amount of data available to organisations has become enormous. Data sets are increasingly gathered by cheap and numerous information-sensing mobile devices, aerial sensors (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks. And it continues to grow. Data is increasingly coming from new channels, and the depth and breadth of data available on any given entity is increasing.
  • Velocity. The rate at which new data becomes available is also increasing. Many applications make data available in real time. RFID tags, sensors and smart metering are pushing companies to capture and interpret data as soon as it's received.
  • Variety. Data is available in a plethora of formats. Data can be structured, unstructured, or semi-structured. It could be structured data stored in a relational database. It could be unstructured data found in documents, email, video, audio, etc. And each of these can be stored in any number of different file formats. An image, for example, could be stored in one of hundreds of different image formats (e.g. JPEG, GIF, PNG, TIFF).
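To make the Variety distinction concrete, here is a minimal Python sketch showing the same customer information in structured, semi-structured and unstructured form. The field names and values are invented purely for illustration:

```python
import json

# Structured: a fixed schema, as a relational database row would hold it.
structured_row = ("C1042", "2016-03-01", 19.99)

# Semi-structured: self-describing JSON, where fields can vary per record.
semi_structured = json.loads(
    '{"customer": "C1042", "date": "2016-03-01", '
    '"amount": 19.99, "tags": ["flowers", "gift"]}'
)

# Unstructured: free text, with no schema at all to query against.
unstructured = "C1042 rang on 1 March to say the flowers arrived late."

# Semi-structured fields become addressable once parsed; the free text does not.
print(semi_structured["tags"])
```

A traditional relational database handles the first shape well; big data solutions must cope with all three.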

Many analysts and big data companies have expanded on the 3Vs of big data – some list 4Vs, some 5Vs, adding terms such as Veracity and Value. There are even lists of 7Vs and 12Vs.

However, the 3Vs provide a starting point in analysing the issue and finding a solution.

The “Hard”ware Problem

The traditional way of dealing with more data has been to add more hardware (i.e. storage space). This might be fine when dealing with crucial data such as customer details, product sales, etc.

But a lot of the big data available could be viewed as “nice to have” data. Or perhaps it’s potentially crucial data, but it’s unclear what value the data will have until it’s been collected and analysed.

In this case, the cost/benefit of adding more hardware might start to look weak.

But businesses – especially marketing departments – might not be willing to throw away perfectly good data. After all, by analysing their big data, they could find opportunities that they never knew existed. And taking advantage of the right opportunity could increase – or potentially transform – their business. So they need a solution.

And that’s one of the reasons why big data has become big business.

The Big Data Industry

The big data industry provides solutions to help businesses capture and analyse their big data. Solutions generally have a heavy focus on performance and scalability.

Some solutions involve massively parallel software running on multiple servers. Other solutions involve one or more NoSQL databases so that unstructured/semi-structured data can be captured and analysed.
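The “massively parallel” idea can be illustrated with a toy map-reduce-style word count. This is only a sketch of the pattern: each chunk stands in for data held on a separate server, and real systems (such as Hadoop) run the map step on all servers simultaneously rather than in a loop:

```python
from collections import Counter

def map_chunk(chunk):
    """Map step: count words within one chunk (runs per-server in real systems)."""
    return Counter(chunk.split())

def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one overall tally."""
    total = Counter()
    for partial in partials:
        total += partial
    return total

# Each chunk stands in for data held on a separate server.
chunks = ["big data big business", "data data everywhere"]
counts = reduce_counts(map_chunk(c) for c in chunks)
print(counts["data"])  # 3
```

The key property is that the map step needs no coordination between chunks, which is what lets the work scale out across many machines.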

One of the defining characteristics of a big data solution is its ability to find and analyse patterns.

When data is stored in a structured way, such as in a typical relational database, users can query the database for the data they want. In other words, they know what data they’re looking for.
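A minimal sqlite3 sketch shows what such a known-in-advance query looks like. The table and values here are invented for illustration:

```python
import sqlite3

# A structured query: the user knows exactly what data they want.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("roses", 19.99), ("tulips", 9.99), ("roses", 24.99)],
)

# The question is decided in advance: total revenue per product.
totals = dict(conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product"
))
conn.close()
print(totals)
```

The schema dictates both what can be stored and what questions can be asked of it, which is exactly the assumption that breaks down with big data.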

With big data, the user doesn’t always know what to look for. The data is so wide and varied that nobody in the business knows its true value. This is where big data analytical solutions can provide invaluable insights into the data being generated on a day-to-day basis.