It’s one of those terms that gets bandied around a lot. But do you truly understand what it means? I’m going to suggest various meanings to help you get a handle on what big data is. Understanding the various possible meanings is more helpful than trying to get your head around one very fuzzy definition.
Big data is a challenge
According to Gil Press, the term “big data” first appeared in 1997. Engineers at NASA had vast amounts of data, but weren’t able to visualise them because computational resources were limited. This leads to the first, most basic definition of big data: volumes of data so vast that current technology has trouble processing them.
This is all well and good, but “current” in 1997 is very different from “current” today. It also differs from one person, or organisation, to the next: what counts as “big data” to me probably wouldn’t make Google or IBM bat an eyelid.
Unsatisfactory as it may seem, it does describe a general trend, and a challenge that arises from it: with massive increases in data collection, hardware and software are struggling to keep pace.
Tearing down the walls between datasets
The bigness of data is not only about the volume, but about the variety of data. We’re seeing a paradigm shift from data warehouses to data lakes. In the past, data had to fit into a pre-determined structure to be useful. Now, you can sling your data into a data lake and let algorithms pull out on-the-fly queries. The walls between datasets are being torn down.
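If you like to see ideas in code, here is a minimal sketch of the “sling it in, query it later” approach. It is only an illustration under my own assumptions: the records, the field names and the `query` helper are all made up, and a real data lake involves far more than a Python list.

```python
import json

# Hypothetical raw records, stored exactly as they arrived, with no shared schema.
raw_lake = [
    '{"source": "till", "date": "2024-03-01", "amount": 42.5}',
    '{"source": "web", "date": "2024-03-01", "page": "/shoes", "visits": 17}',
    '{"source": "till", "date": "2024-03-02", "amount": 10.0}',
]


def query(lake, predicate, field):
    """Impose structure only at read time: parse, filter, then pick one field."""
    results = []
    for line in lake:
        record = json.loads(line)
        # Records that lack the requested field are skipped rather than rejected.
        if predicate(record) and field in record:
            results.append(record[field])
    return results


# An on-the-fly question nobody planned for when the data was stored.
till_amounts = query(raw_lake, lambda r: r.get("source") == "till", "amount")
```

The point is that nothing forced the till data and the web data into the same shape up front; the structure is decided by whoever asks the question.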
Tearing down the walls between people
This leads to another challenge: making people from various departments work together to draw meaning out of their various datasets. A good example of how to do this (and how challenging it is) can be found in a case study on Fraport – the company that operates Frankfurt Airport.
They started organising annual Smart Data Labs in 2015, each of which focussed on a specific question that would help the business in some way. For example, optimising the accuracy of plane arrival predictions could avoid ground crew waiting unnecessarily for late planes or – the opposite scenario – planes waiting unnecessarily for ground crew.
The organisation of the Smart Data Lab bore all the hallmarks of a change-management exercise: having to tread carefully around managers of various departments to ensure they didn’t fear the new transparency and were happy to have light shone on potential mistakes; prising datasets out of departmental control; getting buy-in; getting senior management on board. It sounds like bloody hard work!
(You can find the whole story in Part XIII of Digital Marketplaces Unleashed by Claudia Linnhoff-Popien, Ralf Schneider and Michael Zaddach (2017). If you’re really that interested, get in touch and I might be able to help you access a digital copy.)
Big data is a macro-trend
If you had gone straight to Wikipedia instead of coming here, you might have noticed the graphic by Hilbert and Lopez. It shows how, around 2002, we truly entered the digital age. That was when, they estimate, the amount of data stored digitally overtook analogue storage such as vinyl records and video cassettes. Since then, there has been an explosion in the quantity of data collected. Think of all the MP3s, videos, images, audio, server log files, purchase histories; not to mention everything we post on social media and the metadata that we (often inadvertently) post along with it such as our location. This is pitched at a global level, and it’s something we as individuals and businesses cannot change but can only hope to adapt to. That’s what makes it a macro-trend.
You’re part of it
The nice thing about this perspective is that everyone can relate to it. Data is being collected about all of us, and indeed by all of us. Your data needn’t itself be “big” (at least not by Google or Amazon’s standards), but it’s still part of a massive quantity of data worldwide.
A trend is something that changes over time. So another reason you’re part of this macro-trend is that you probably wouldn’t have had access to this data just a few years ago.
Take a small-time shopkeeper. They could keep tabs on the number of customers coming into their boutique with a simple door counter, then compare this with basic daily revenue data, website and social-media analytics data, and weather data. This information is only available to mere mortals these days because hardware is powerful and cheap (the connected door counter, the laptop, the cloud-computing facilities) and data sources (such as the weather) are readily available. The advent of social media and ease of access to Google Analytics are also factors.
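To make the shopkeeper example a bit more concrete, here is a rough sketch of how those sources might be lined up by date. Everything in it is hypothetical: the figures are invented, and the function name `revenue_per_visitor_by_weather` is just my own label for one possible comparison.

```python
from collections import defaultdict

# Hypothetical daily figures; every number here is invented for illustration.
door_counts = {"2024-03-01": 120, "2024-03-02": 80, "2024-03-03": 150}
revenue = {"2024-03-01": 600.0, "2024-03-02": 480.0, "2024-03-03": 600.0}
weather = {"2024-03-01": "sunny", "2024-03-02": "rainy", "2024-03-03": "sunny"}


def revenue_per_visitor_by_weather(door_counts, revenue, weather):
    """Join the three sources on date, then compute revenue per visitor by weather."""
    totals = defaultdict(lambda: [0.0, 0])  # weather -> [total revenue, total visitors]
    for day, visitors in door_counts.items():
        if day not in revenue or day not in weather:
            continue  # skip days that are missing from any source
        totals[weather[day]][0] += revenue[day]
        totals[weather[day]][1] += visitors
    return {w: rev / vis for w, (rev, vis) in totals.items() if vis > 0}


summary = revenue_per_visitor_by_weather(door_counts, revenue, weather)
```

With these made-up numbers, rainy days bring in fewer visitors but more revenue per head, which is exactly the kind of pattern a door counter alone would never show you.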
So in this sense, “big” doesn’t mean whether you feel like what you’re doing is very big: it’s more about the volume of the data on a macro-level, and the level at which this is all playing out. You’re part of it!
Big data is an opportunity
Yet another way of approaching big data is to ask: “what is it good for?” What opportunities does it present? Those NASA scientists must have been trying to get some kind of benefit out of visualising that “big” data; something that they couldn’t have done with, well, “small data”. Why should we continue to push the boundaries so the snake can swallow ever greater bundles of data?
The standard answer to this is that you can harvest insights you wouldn’t otherwise have been able to. The Fraport example above shows that there are commercial advantages to be had from sifting through data for gold. The shopkeeper example shows that this isn’t just something for large organisations.
Frankly: it’s the way things are going. In business, they’ve started calling it “accountability”, a phrase previously reserved for answering to an electorate or even a higher being at the pearly gates. Taken with a pinch of salt, though, it does make sense to do A/B tests on content, get to know people who use your products and services better, and get a little bit closer to understanding what they want. On a personal level, if you can collect data to prove your worth (as vulgar as that may seem to some), your survival chances at work are higher.
An important point is also that it helps you think about why you are doing what you are doing. There should be some strategy behind your actions.
As new technologies ripen (AI, chatbots, voice interfaces: I’m looking at you) everything will be about data. For example: if you ask a chatbot to book you a table at a restaurant, it won’t phone up for you and speak to a waiter who will then pencil in your time in their book. No, it will be dealing with databases and these will need to be made accessible via APIs for this future tech.
AI will be the backbone for many technologies: so it could automatically go through all the data it can get access to and come up with insights, revise your customer segments, create content for you and start targeting your ads at appropriate audiences. And so much more.
Anyway: I’m starting to waffle. You get the picture.
Big data is an industry
Big Pharma, Big Oil … Big Data. They say that data is the new oil and you only have to look at the likes of Facebook, Amazon and Google to understand why.
But the definition of big data is even bigger
Perhaps you’re surprised that I haven’t given you the Vs yet </british_humour>. The three, sometimes four, Vs explain why big data is not just about “big”. Alongside volume, there are variety, velocity, some say veracity. Some add on “value”. Just go away and Google those.
If you thought that was all there is to it: sorry. Apparently the three or four Vs are only the first part of a three-part definition. If you want the lowdown on that: check out this article.
So hopefully that will help you think about big data and understand it better.
Thanks for reading this far and if you’ve got any questions, corrections or suggestions, get in touch!
(Featured image: photo copyright John Heaven)