The year 2023 is projected to generate 120 zettabytes (ZB) of data, an increase of 23 ZB, or almost 24%, from last year’s 97 ZB. In the Zettabyte Era, how do you sift through the eye-watering amount of data at your fingertips, let alone leverage it in decision-making?
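The growth figure above is simple arithmetic. The quick check below uses only the two projections quoted in the opening line (97 ZB last year, 120 ZB in 2023) to confirm that a 23 ZB increase is just under 24% of last year’s volume.

```python
# Projected global data volumes quoted in the article, in zettabytes (ZB)
last_year_zb = 97
this_year_zb = 120

# Absolute growth, and growth relative to last year's volume
increase_zb = this_year_zb - last_year_zb
growth_pct = increase_zb / last_year_zb * 100

print(f"Increase: {increase_zb} ZB ({growth_pct:.1f}%)")  # Increase: 23 ZB (23.7%)
```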
Gerald Valentin of StoryIQ was back for First Philippine Holdings Corporation’s Friday DEAL (Drop Everything and Learn), “Turning Data into Action,” where he discussed the finer points of handling big data with over 100 participants from the Lopez Group.
Data is defined as a collection of facts, including words, numbers and observations. Data, however, is not information: data is raw, while information is data that has been processed and sorted. Insight, in turn, comes from understanding that information. For example, Employee A’s salary is data. Put her salary and her colleagues’ salaries in a table, highest first and lowest last, and that’s information. It becomes an insight when you know, for instance, the ranks of Employee A’s colleagues who take home the lowest pay.
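The salary example maps neatly onto a few lines of Python. This is an illustrative sketch only: the names, job ranks and salary figures are hypothetical, and “insight” is reduced here to one simple question, namely which ranks the lowest-paid colleagues hold.

```python
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    rank: str    # job rank (hypothetical)
    salary: int  # monthly salary (hypothetical figures, no currency implied)

# Data: raw, unsorted facts. All names, ranks and amounts are made up.
employees = [
    Employee("A", "Manager", 90_000),
    Employee("B", "Associate", 35_000),
    Employee("C", "Senior Associate", 55_000),
    Employee("D", "Associate", 32_000),
    Employee("E", "Manager", 95_000),
]

# Information: the same data, processed and sorted with the highest salary first
by_salary = sorted(employees, key=lambda e: e.salary, reverse=True)
for e in by_salary:
    print(f"{e.name:>2}  {e.rank:<17}{e.salary:>8,}")

# Insight (one possible question): which ranks do the lowest-paid colleagues hold?
lowest_paid = by_salary[-2:]
lowest_ranks = sorted({e.rank for e in lowest_paid})
print("Ranks of the two lowest-paid employees:", lowest_ranks)
```

The sorting step alone produces information; the insight only appears once a question is asked of the sorted table.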
According to the StoryIQ lead trainer, our eagerness to generate insights makes us prone to the traps of data analysis. These include cognitive bias (also known as unconscious bias, which is ingrained in us from a young age), availability or recency bias (the tendency to remember recent events rather than historical ones, or the last few items in a series), the sunk-cost fallacy (the tendency to stick with a strategy because one has already invested time and effort in it) and confirmation bias (the tendency to process only the information that supports our hunch or belief).
Countermeasures must be put in place to limit the effect of our biases on our analysis. Before anything else, ask yourself whether you are biased in your research.
Five whys
Valentin encouraged the participants not to be afraid to ask questions to clarify what the objectives are and what needs to be done. Like a child who endlessly asks why, we should ask questions, even if only to satisfy our curiosity.
Methods such as the 5 Whys as well as the 5W1H (what, why, who, when, where and how) will help us obtain a clearer picture of the problem. At this stage, he said, we are simply asking the right questions and not yet dealing with the data.
When a Menti survey asked the participants what they do first when analyzing data, they said they would “study,” “confirm,” “validate,” “process,” “arrange” and “understand” the data.
The usual process starts with data collection, followed by data cleaning, data analysis, visualization and reporting.
However, we should actually start by defining the problem. The survey responses, which mostly referred to the data itself, show our tendency to jump straight into solution mode when confronted with a problem.
The problem statement
“Defining the problem should be the first step before even going to data collection,” Valentin stressed.
The problem statement, defined as a concise description of the issue to be addressed and why addressing it is valuable, should answer one or more of these questions: Will it increase revenue? Will it cut costs? Will it lessen the risks involved? Will it benefit customers and the community?
If the time dedicated to the undertaking can be shortened, then that’s another plus for the business.
The resulting statement must be narrow enough to be solved with the resources at hand, yet broad enough to leave room to play around with ideas. Make it too broad and it will be difficult to keep track of what you’re trying to solve; make it too narrow and the possible solutions will be limited.
Between the two, the second option can be considered the lesser evil: “…Go with too narrow, at least you can come up with solutions that directly address the problem,” Valentin explained.
Key elements
A well-structured problem statement should include three key elements: the stakeholders (the people doing or receiving the action), the action itself, and the new state (the result of the action).
An example is this problem statement for Netflix: “How do we maximize 2023 profits?” The streaming service lost more than a million subscribers last year after cracking down on account sharing, and it faces stiff competition from the likes of Disney and Amazon Prime Video.
Applying the elements of person, action and result, then asking the 5 Whys and 5W1H questions, yields this clearer, more specific problem statement: “How do we attract new subscribers, while retaining existing subscribers, to hit profit targets in 2023?”
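One way to make sure none of the three elements is left out is to treat a problem statement as a small record with one field per element. This is a hypothetical sketch, not a StoryIQ template: the `ProblemStatement` class and its `as_question` method are illustrative names, and the field values are taken from the Netflix example above.

```python
from dataclasses import dataclass, fields

@dataclass
class ProblemStatement:
    stakeholders: str  # who does or receives the action
    action: str        # what will be done
    new_state: str     # the result the action should produce

    def __post_init__(self) -> None:
        # A well-structured statement needs all three elements to be filled in
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"missing element: {f.name}")

    def as_question(self) -> str:
        # Assemble the three elements into a single "How do ...?" question
        return f"How do {self.stakeholders} {self.action} to {self.new_state}?"

netflix = ProblemStatement(
    stakeholders="we",
    action="attract new subscribers while retaining existing subscribers",
    new_state="hit profit targets in 2023",
)
print(netflix.as_question())
```

Validating in `__post_init__` means an incomplete statement fails as soon as it is written down, rather than surfacing later in the analysis.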
“We want to define the why before jumping to how because we need to regain control of the analytics process, not the data in control of us,” Valentin said. “Find your whys first if you are afraid of the amount of data you are dealing with.”
To generate solutions, use the 5 Whys and 5W1H to break the issue down with the issue tree framework: the main problem is the root, which is then segmented into smaller and smaller sub-issues, represented by the branches and subbranches. The goal is to separate what you know from what you don’t know and to identify possible solutions. At this point, you can start collecting your data.
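An issue tree is an ordinary tree data structure, so the root-and-branches breakdown can be sketched directly. In this hypothetical example, the root is the Netflix problem statement from earlier, while the sub-issues and the `known` flags are made up for illustration. Listing the unknown leaves shows which data still needs to be collected.

```python
from dataclasses import dataclass, field
from typing import Iterator

@dataclass
class Issue:
    question: str
    known: bool = False  # hypothetical flag: do we already have the answer?
    children: list["Issue"] = field(default_factory=list)

def leaves(node: Issue) -> Iterator[Issue]:
    """Yield the smallest sub-issues, i.e. nodes with no further branches."""
    if not node.children:
        yield node
        return
    for child in node.children:
        yield from leaves(child)

def show(node: Issue, depth: int = 0) -> None:
    """Print the tree root first, indenting each level of branches."""
    status = ""
    if not node.children:
        status = " [known]" if node.known else " [unknown]"
    print("  " * depth + "- " + node.question + status)
    for child in node.children:
        show(child, depth + 1)

# Root: the Netflix problem statement; branches and sub-branches are hypothetical
root = Issue(
    "How do we attract new subscribers, while retaining existing subscribers, "
    "to hit profit targets in 2023?",
    children=[
        Issue("How do we attract new subscribers?", children=[
            Issue("Who are the people we want to reach?"),
            Issue("Where do prospective subscribers hear about us?"),
        ]),
        Issue("How do we retain existing subscribers?", children=[
            Issue("Why do subscribers cancel?"),
            Issue("When do most cancellations happen?", known=True),
        ]),
    ],
)

show(root)
# The unknown leaves point to the data that still needs to be collected
to_collect = [leaf.question for leaf in leaves(root) if not leaf.known]
print("Data to collect:", to_collect)
```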
Building blocks
“We always start with the data. But how do you know it’s the correct data if it did not go through the process? For all we know, we’re using or analyzing the wrong data to begin with,” Valentin stated.
The analysis that follows data cleaning yields insights: an understanding or realization about a situation that results in a change in perspective or belief.
For example, say you are a fan of Product X. An insight could lead you to start believing in Product Y, or it could make you even more convinced of Product X’s superiority.
Observations are not insights; they are the building blocks of insights. To get insights, one has to dig deeper, ask the 5 Whys and connect everything.
Once you have your insights, which are made up of data, context and background, you will be able to give recommendations, or actionable items, to your stakeholders, Valentin said.