Data Quality - Are we asking the right questions about our data?
In the analytics world, data quality is a hot topic. We need our data to solve problems, but before it can do that, it needs to be “quality data.” The current conversation around data quality is all about making sure the data is clean, consistent, complete, and well formatted. But the question we should be asking is:
Can we tell whether this data is able to answer our questions and solve our problems BEFORE we spend hours on cleaning, experimentation, and modeling?
Why do we have to wait until the data is analyzed to know if the models/predictions/insights really are representative of the business? Why should we spend time and resources tackling problems when we aren’t sure they can be solved?
We need to better understand our data in its raw form. We need an objective measure of how well our data supports the problems we want to solve before we begin our analysis.
At Compellon, we see “data quality” as the ability of the data to answer your business question, not how perfect your data is. Does the data contain the information related to what you are investigating?
“You can have data without information, but you cannot have information without data.” – Daniel Keys Moran
What do we mean by information?
We refer to information as the ability of your data to describe your business. Information is not a bunch of individual columns of data working independently, but rather the interactions within a group, or across multiple groups, of columns. At our core, we identify these groups without relying on assumptions or biases introduced by humans.
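To make the idea of columns working together a little more concrete, here is a minimal sketch using mutual information, a standard quantity from information theory. This is not Compellon's algorithm; the helper `mutual_information` and the two toy flags are made up purely for illustration, and the data is hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from paired samples of discrete values."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Hypothetical toy data: the target is 1 only when exactly one flag is set.
flag_a = [0, 0, 1, 1] * 25
flag_b = [0, 1, 0, 1] * 25
target = [a ^ b for a, b in zip(flag_a, flag_b)]

mi_a = mutual_information(flag_a, target)                         # flag A alone
mi_b = mutual_information(flag_b, target)                         # flag B alone
mi_pair = mutual_information(list(zip(flag_a, flag_b)), target)   # both together

print(f"A alone: {mi_a:.2f} bits, B alone: {mi_b:.2f} bits, A and B: {mi_pair:.2f} bits")
```

In this toy case each flag on its own shares no information with the target, yet the pair describes it completely. Looking at columns one at a time would miss that.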
Let’s go back to the beginning. Compellon’s technology was built out of failure (read more about it here) and the need for a new approach. With our Smart Plug and Play AI, we have created that approach—one that has transformed conventional analytics from a creative/subjective process to an objective process (i.e., a much better approach). It leverages information theory and other super geeky stuff, utilizing our AI to autonomously search for the information in the data that describes your target.
Say you want to learn where your boss likes to eat for lunch. To find out, you will need to gather information from people who know her. So, you ask her husband, family, friends, coworkers, local restaurant owners: anyone who might have information about her lunch habits. The tricky part is finding the best combination of people that provides enough information to know where she likes to eat. Sometimes that can be a single individual, but often it is a disparate group, and not always the obvious combination. Her best friend and her husband may each know a lot about her, but most of it’s the same stuff, so you just select one. Let’s say her husband. Her assistant and the doorman may not know as much as her husband or best friend, but what they know is different. If you want to get as much information as possible without overlap, you may choose to talk to her husband, her assistant, and the doorman. Together, they form the best group: one that contains enough of the information needed to determine where your boss will eat.
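The lunch example can be sketched in a few lines of code. What follows is a simple greedy selection based on joint mutual information, offered as an illustration of the "most information, least overlap" idea and not as a description of how Compellon's AI works. The names `entropy`, `joint_information`, and `greedy_select` are hypothetical, and so is the data: four yes/no facts decide the restaurant, the husband and the best friend both know the same two, and the assistant and the doorman each know one of the others.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(values):
    """Shannon entropy in bits of a list of discrete (hashable) values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def joint_information(columns, target):
    """I(columns; target) = H(columns) + H(target) - H(columns, target)."""
    rows = list(zip(*columns))
    return entropy(rows) + entropy(target) - entropy(list(zip(rows, target)))


def greedy_select(sources, target, min_gain=1e-9):
    """Repeatedly add the source that contributes the most new information."""
    selected, covered = [], 0.0
    remaining = dict(sources)
    while remaining:
        chosen_cols = [sources[name] for name in selected]
        gains = {
            name: joint_information(chosen_cols + [col], target) - covered
            for name, col in remaining.items()
        }
        best = max(gains, key=gains.get)
        if gains[best] <= min_gain:
            break  # everyone left only repeats what we already know
        selected.append(best)
        covered += gains[best]
        del remaining[best]
    return selected, covered


# Hypothetical data: every combination of four yes/no facts, equally likely.
facts = list(product([0, 1], repeat=4))
restaurant = [8 * f1 + 4 * f2 + 2 * f3 + f4 for f1, f2, f3, f4 in facts]
sources = {
    "husband": [(f1, f2) for f1, f2, f3, f4 in facts],
    "best_friend": [(f1, f2) for f1, f2, f3, f4 in facts],  # same knowledge as husband
    "assistant": [f3 for f1, f2, f3, f4 in facts],
    "doorman": [f4 for f1, f2, f3, f4 in facts],
}

selected, covered = greedy_select(sources, restaurant)
print(selected, f"{covered:.1f} of {entropy(restaurant):.1f} bits")
```

With this made-up data the sketch picks the husband, then the assistant, then the doorman, and skips the best friend because she adds nothing new. Keep in mind that a greedy search like this is only a heuristic: it can miss groups whose members are uninformative individually but informative together.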
We built Iris Pro because our AI finds the best group of information before it builds a custom model for your data, and we thought, “Man, wouldn’t EVERYONE want to know this before they start any analytics project…even if they aren’t using our platform?”
Now, you can know:
How much information in your data supports your target, and
What the best group of variables is for building your model from the beginning.
Then, you can decide if it’s enough to answer your question and move forward with the right variables, change the scope of your project to align with the information you do have, or go back for more data that may contain the information needed to answer your question.
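One generic way to put a number on "how much information supports your target" is the uncertainty coefficient: the share of the target's entropy that a set of columns accounts for. The sketch below is an assumption-laden illustration, not Iris Pro's actual score. The function name `coverage` and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy in bits of a list of discrete (hashable) values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def coverage(features, target):
    """Uncertainty coefficient I(features; target) / H(target), from 0 to 1."""
    h_target = entropy(target)
    if h_target == 0:
        raise ValueError("target never varies, so there is nothing to explain")
    rows = list(zip(*features))
    shared = entropy(rows) + h_target - entropy(list(zip(rows, target)))
    return shared / h_target


# Hypothetical data: four equally likely outcomes, and two columns
# that each reveal one half of what distinguishes them.
outcome = [0, 1, 2, 3] * 10
high_bit = [o // 2 for o in outcome]
low_bit = [o % 2 for o in outcome]

partial = coverage([high_bit], outcome)
full = coverage([high_bit, low_bit], outcome)
print(f"one column: {partial:.0%}, both columns: {full:.0%}")
```

How much coverage counts as "enough" depends on the question you are asking, so no threshold is suggested here. Estimates like this also tend to run high on small samples with many distinct values, so treat them as a rough guide.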
Using our Smart Plug and Play AI and analytics advisory tool, Iris Pro, you can streamline your data science process, drastically reduce time to modeling, and quickly evaluate the information in your data to determine how well it supports your business analysis.
Now you can proceed with more confidence, knowing that your resource-intensive analytics projects rest on data that can support them and that your models are trained with the best group of variables.
Learn more about Iris Pro here.