High-quality, trustworthy data is more important now than ever. Epidemiologists, data scientists, researchers and healthcare professionals are turning to data to track the spread of COVID-19 and ultimately find a vaccine to prevent future outbreaks. These data citizens must act fast and be proactive; they do not have time to waste on bad data.
Similarly, businesses today are being dramatically affected by COVID-19. With the uncertainty of the economy, organizations need to be able to act fast and rethink strategies, budgets and targets in order to be more effective in this new COVID-19 world.
Unfortunately, many organizations are struggling to act quickly during this time because they are unable to efficiently access trusted data. In fact, many business analysts spend more time searching for data than generating business insights. According to a survey conducted by Forrester, business analysts spend more than 70% of their time finding, understanding and accessing data, which can slow down analyses and, ultimately, innovation. This inefficiency could lead to missed opportunities, disruption and even financial loss, so it is crucial that organizations use the highest quality data available when making business decisions.
Introducing Data Scoring
At Collibra, we recognize that the need for quality data is greater now than ever, and we are here to help. We want to enable you to effectively and efficiently make data-driven decisions using the most trustworthy data available in order to navigate any business challenge you are facing.
That is why we are excited to announce Data Scoring, a new feature of Collibra Data Catalog that helps organizations increase productivity and make smarter decisions. Data Scoring provides insights that enable analysts to compare options while they shop for data in Collibra Data Catalog and find the data they need faster. Typically, business analysts have to manually sort through every potentially useful data set to evaluate the quality of the data and then ask for permission to access it, which slows down time to insight. This lengthy, laborious process leads many business analysts to simply pick the data sets they are familiar with and hope that the data is of sufficient quality, which means they often aren’t using the highest quality data available in their analyses.
A credit score for your column
Data Scoring enhances the data shopping experience, enabling business analysts to compare similar data using scores that indicate relative data quality and quickly assess which data set is the best fit for their purpose. More specifically, Data Scoring uses data profiling information collected during ingestion to generate scores by column that represent data quality.
To understand Data Scoring better, let’s think about a credit score as an analogy. In the US, a credit score is a number ranging from 300 to 850 that represents an individual’s creditworthiness. This score is based on the consumer’s credit history, including the number of open accounts, total levels of debt, and repayment history. Lenders use credit scores to evaluate the probability that an individual will repay loans in a timely manner. The higher the credit score, the more attractive the individual is to lenders and the more likely they are to be approved for a new credit card or a loan.
Similarly, Data Scoring helps analysts quickly assess the data quality of one column relative to other available columns containing similar data. And like credit scores, the higher the score, the better the quality of data in that column, and the more confident business analysts can feel using that data in their analysis. Data Scoring enables analysts to produce more accurate analyses and ensure that critical business decisions are made using the highest quality data available.
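To make the idea concrete, here is a minimal sketch of how a per-column score could be derived from profiling information. This is not Collibra's actual scoring method, which this post does not describe; the `ColumnProfile` class, the `profile_column` and `score_column` functions, and the "share of usable values" formula are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class ColumnProfile:
    """Hypothetical profiling summary for a single column."""
    total: int
    null_count: int
    invalid_count: int

    @property
    def pct_null(self) -> float:
        return 100.0 * self.null_count / self.total if self.total else 0.0

    @property
    def pct_invalid(self) -> float:
        return 100.0 * self.invalid_count / self.total if self.total else 0.0


def profile_column(
    values: Sequence[Optional[str]],
    is_valid: Callable[[str], bool],
) -> ColumnProfile:
    """Count null and invalid values; None and "" are treated as null here."""
    null_count = 0
    invalid_count = 0
    for value in values:
        if value is None or value == "":
            null_count += 1
        elif not is_valid(value):
            invalid_count += 1
    return ColumnProfile(len(values), null_count, invalid_count)


def score_column(profile: ColumnProfile) -> float:
    """Illustrative score from 0 to 100: the share of values that are usable."""
    if profile.total == 0:
        return 0.0
    usable = profile.total - profile.null_count - profile.invalid_count
    return round(100.0 * usable / profile.total, 1)


if __name__ == "__main__":
    # Toy email column: one null value and one value that fails the check.
    emails = ["a@example.com", None, "not-an-email", "b@example.org"]
    profile = profile_column(emails, lambda v: "@" in v)
    print(profile.pct_null, profile.pct_invalid, score_column(profile))
```

As with a credit score, a higher number in this sketch simply means fewer problem values in the column; a real scoring model would likely weigh more signals than these two.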
Cliff goes comparison shopping
Let’s take a look at an example. Meet Cliff, a marketing analyst tasked by the CMO to build a marketing campaign for a new product launch. This is not a straightforward task because Cliff must look for data sets that include a range of data such as customer name, contact information, where they live, what products they’ve bought in the past and how often. To do this, he’ll need to access multiple data sets within Collibra Data Catalog. He wants to use the best data available, but knows that some of the data sets he wants to use may have columns with inaccurate or incomplete data. He turns to Data Scoring for help.
He begins his search by typing “customer” into the search bar on the home screen. When the search results come up, Cliff clicks on the Customer data domain, and he can see details on what type of data is mapped to that domain. In Collibra Data Catalog, there are multiple data sources that contain the data Cliff needs, so he clicks on “Find a data source.”
He types in the data concepts (the types of data) he wants to use in his analysis: customer email, full name, address, business region, revenue and product name. He then sees multiple data sources with a data score for each data concept. Cliff scans the list and sees that one data source, “CustomerProductSales,” has the highest score for Address. This is the data that Cliff is most interested in, as he wants to segment his analysis by geography.
Cliff wants to better understand the score, so he clicks on the score for Address to learn more. This shows him additional details such as the percentage of null, anonymous and invalid values. Because the percentages are low, he feels comfortable that the quality of the Address data is high.
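Cliff's comparison can be pictured as a simple ranking of data sources by their score for one data concept. The sketch below is illustrative only: the score values are made up, “ExampleSourceB” and “ExampleSourceC” are hypothetical names (only “CustomerProductSales” appears in this post), and `rank_sources` is not a Collibra API.

```python
from typing import Dict, List, Tuple

# Made-up scores for illustration, keyed by data source and then data concept.
SCORES: Dict[str, Dict[str, float]] = {
    "CustomerProductSales": {"Address": 94.0, "Full name": 88.0},
    "ExampleSourceB": {"Address": 71.5, "Full name": 90.0},
    "ExampleSourceC": {"Full name": 65.0},  # no Address column at all
}


def rank_sources(
    scores: Dict[str, Dict[str, float]], concept: str
) -> List[Tuple[str, float]]:
    """Return (source, score) pairs for one concept, highest score first."""
    candidates = [
        (source, by_concept[concept])
        for source, by_concept in scores.items()
        if concept in by_concept  # skip sources that lack this concept
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for source, score in rank_sources(SCORES, "Address"):
        print(f"{source}: {score}")
```

Ranking by a single concept mirrors what Cliff does here: Address matters most to his geographic segmentation, so that score drives his choice of data source.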
Now that he knows which data source to use, he finds the data set containing data from that source, adds the data set to his data basket and checks out. Because of Data Scoring, Cliff is confident in the quality of the data he requests. The data owner grants Cliff access to the data set, and he uses the data to build an amazing campaign to launch the new product. Cliff is happy, the CMO is happy, and the company sees more sales from Cliff’s marketing campaign than from any campaign in the past.
Enable smarter decision making across your organization
In these times, it is crucial to access the highest quality data available as quickly as possible. With Data Scoring, we help increase productivity at your organization by enabling all your “Cliffs” to find the data they need much faster and be confident that this data is the best available for their purpose. Cliff no longer needs to manually sort through and request access to numerous data sets; instead, he can effectively and efficiently access data he trusts. Data Scoring ultimately enables smarter decision making, which will help companies adapt and react to the current economic and social climate.
Aurko Joshi is passionate about building products that empower organizations to revolutionize the way they use data.