Until recently, the most pressing issue facing the data-driven community was the cost of data collection and storage. Technological advancements in automated collection and the plummeting cost of data storage have largely eliminated these issues and created a flood of captured data.
Now, organizations have nearly unlimited access to both structured data (such as production line statistics and financial data) and unstructured data (such as brand impressions and social network mentions), all of which can be stored indefinitely. The problem has evolved from one of scarcity to one of choice paralysis, caused by an overabundance of data sources and uncertainty about the value of individual data points.
We know that every data point can tell a story, but early in the process of becoming data-driven it is critically important to separate out “noise” from the key data sources that are going to have the most impact on our business. We want to be capturing everything possible, but how can we be assured that the data we use is the right and relevant data?
In my experience, the best way to avoid drowning in too much data is to start the process by defining clear KPIs that have a demonstrated impact on your business and THEN finding the data points that support those KPIs. Choice overload in this context often stems from an organization looking at the raw data it collects and trying to parse out what is important from an enormous data set. By first focusing on what you want to achieve, you can narrow your selection to the data points that are most relevant to your goals.
This TED Talk by Susan Etlinger explains why, as we receive more and more data, we need to deepen our critical thinking skills: it’s harder to understand things than it is to count them.
Defining your KPIs and selecting the data points that support them comes with its own set of challenges. Sometimes the link between data sets and KPIs is clear but at other times it may be less so.
As an example, we can explore KPIs for a production line. The first step is to narrow your focus. Do you want your production line to be faster? More efficient? Less labor-dependent? Better at quality control?
While you may answer “yes” to all of the above, it is important to break down your desire for improvement into achievable KPIs. Only once you have determined what aspect of the production line you want to improve should you delve into your data to find the most relevant points.
For example, if you choose to focus on lowering the cost of production, you will want to look closely at the data you are gathering on labor, procurement and automation. Starting from where you want to be, it is easier to determine the data points that will help you get there. In this example, we can see how unstructured data from social media mentions will obviously be irrelevant. If you are focused just on lowering cost, data referring to your brand’s image should be of little concern.
You aren’t discarding the unstructured data, just filing it away as unnecessary to achieving this particular goal at this particular time.
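One way to picture this "file it away, don't discard it" step is a small catalog that tags each data source with the KPIs it supports. The sketch below is only an illustration: the source names, the KPI labels, and the `split_sources_for_kpi` helper are all hypothetical, and a real catalog would be built from your own systems.

```python
# Hypothetical catalog: each data source is tagged with the KPIs it supports.
DATA_SOURCES = {
    "labor_hours": {"production_cost", "throughput"},
    "procurement_spend": {"production_cost"},
    "automation_uptime": {"production_cost", "throughput"},
    "defect_rate": {"quality"},
    "social_media_mentions": {"brand_awareness"},
}


def split_sources_for_kpi(kpi, catalog=DATA_SOURCES):
    """Return (relevant, parked) source names for one KPI, each sorted."""
    relevant = sorted(name for name, kpis in catalog.items() if kpi in kpis)
    # Parked sources are kept in the catalog, just not used for this KPI.
    parked = sorted(name for name in catalog if name not in relevant)
    return relevant, parked


relevant, parked = split_sources_for_kpi("production_cost")
print("Use now:", relevant)
print("Filed away for later:", parked)
```

Nothing is deleted here: switching the focus to another KPI later only changes which sources land in the first list.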
For production lines, defining KPIs is relatively straightforward. Issues arise, however, when you approach something like marketing ROI, where there are potentially thousands of interconnected data points whose relevance is less clear. This is where the problem of “too much data” becomes very real.
But the solution of defining your KPIs and selecting data to support them still applies. In trying to determine marketing efficiency, you are going to want to look at the programs you are running, marketing expenses, click-through rates, first touches, etc. All of these data points will be relevant to supporting your marketing efficiency KPI.
But what about brand impressions, social media mentions, and the quality of conversations around your brand? Surely these data points have value in telling a complete story of your marketing efficiency, even if they are less tangible than your click-through rates on advertisements. How do you decide which data points should be considered for your KPIs?
An effective method for determining which data points are critical to your KPIs is to set up A/B testing. Choose your KPI, in this example marketing efficiency, and create two sets of data points against which to track your progress toward meeting it. Over time you can see which data sources matter more to meeting that KPI, and ultimately to improving your bottom line.
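As a rough sketch of that comparison, the snippet below scores two candidate sets of data points by how closely each one moves with the KPI over time. It swaps in a simple mean absolute Pearson correlation as the score, which is an assumption made for illustration rather than a formal A/B test, and correlation alone does not prove that a data source drives the KPI. All series names and numbers are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(var_x * var_y)


def score_set(data_points, kpi_series):
    """Mean absolute correlation between each data point and the KPI."""
    total = sum(abs(pearson(series, kpi_series)) for series in data_points.values())
    return total / len(data_points)


# Made-up weekly figures, for illustration only.
kpi = [1.0, 1.2, 1.1, 1.5, 1.7, 1.6]  # hypothetical marketing efficiency KPI

set_a = {
    "click_through_rate": [2.0, 2.3, 2.2, 2.9, 3.3, 3.1],
    "first_touches": [100, 118, 111, 149, 172, 160],
}
set_b = {
    "brand_impressions": [5000, 4800, 5300, 5100, 4900, 5200],
    "social_mentions": [40, 55, 38, 47, 52, 41],
}

score_a = score_set(set_a, kpi)
score_b = score_set(set_b, kpi)
print(f"Set A score: {score_a:.2f}")
print(f"Set B score: {score_b:.2f}")
print("Tracks the KPI more closely:", "A" if score_a > score_b else "B")
```

In practice you would rerun a comparison like this as new periods of data arrive, since a set that looks weak over six weeks may look different over six months.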
This is a task where automation tools can be a valuable asset. Setting up a program to monitor and assess the validity of your KPIs can help you understand how those KPIs are impacted by various data sources, and thus tell you which data points are the most important to your organization’s success.
My suggestion for how best to implement this practice in the real world is to stick to the “rule of five”: