It’s Like Amazon, But for Data: The Key to Sustainable Self-Service BI
The well-deserved buzz around self-service business intelligence (BI) stems from its claim to democratize the large volumes of data across the enterprise, making them accessible and consumable by everyone for reporting and visualization. And the darlings of the startup community are capitalizing on this claim. Nevertheless, we can learn from an article by the leading Airbnb experts, as well as from plenty of other market research in this area: self-service analytics, in itself, is not enough to transform corporate employees into responsible data collaborators, or data citizens, as we call them, simply because of its limited scope. There is a need to understand the “entire data ecosystem, from the production of an event log to its consumption in a visualization” and merely having a self-service BI setup does not achieve that.
Luckily for our customers, this need is naturally met by the Collibra data governance platform. Our core capability, acting as a system of record for data, positions us to catalog, link, and visualize the connections and relationships across a multitude of data assets and elements of various kinds. The result is an actionable map of the data landscape, or ecosystem, which can then be strategically maintained and collaboratively, methodically evolved. Among other great functionalities fueling this vision, Collibra is known for its lineage and traceability diagrams. They enable intuitive, visual tracking of data flows from their original physical data sources through the various stages of processing and enrichment, and associate them with the key business concepts and metrics in the data glossary. The tentacles of our diagrams can stretch all the way to the various integrated systems, composite reports, and executive insights where this data is leveraged for business strategy, allowing proper assessment of its quality, articulation scores, ownership, and other important metrics. This ultimately confirms the validity of the data and reports and allows for easy investigation and mitigation of potential issues with the data flows, such as data quality issues. Such capability is the key to sustainable trust in the results of your analytics.
As self-service BI promotes data democratization for business users across the enterprise data landscape, it is important to ensure that self-service implementations are properly maintained so that they consistently provide a fresh flow of trusted data to their users, resulting in relevant data insights communicated at the executive level. As self-service BI usage snowballs, the risk of report mismanagement and redundancy grows, as does the need for consolidation, efficiency, and proper collaboration among business users.
Because our team recognizes business users’ need for trusted data and their desire for well-maintained BI systems, our priority is to provide seamless in-product integration with commonly used BI applications. The goal is to offer Collibra as the means of collaboration over the logical data sets that source BI reports, and ultimately as a system of record for BI reports. Thus Tableau, a well-known self-service BI pioneer, has become our first frontier in the effort to offer properly maintained data accessibility to business analysts and other BI consumers.
Many of our longstanding customers have been using ad hoc integrations of Collibra and Tableau, supported via our extendable connector APIs. The common use case for such integrations was to offer content from Collibra as a contextual glossary for Tableau reports. This capability provides context for the common industry language around the Tableau visualizations, making them easier to understand. Now, with Collibra Catalog empowering business analysts to create, manage, profile, and categorize our pliable, logical data sets, we’re ready to go even deeper into facilitating a properly enabled ecosystem that makes maintaining BI services a breeze.
In line with the revolutionary optimization of e-commerce led by Amazon, marked by unprecedented convenience and superior user experience, we’re calling our next-level Tableau integration use case “Shop for Data.” In it, we position Collibra as a governed data catalog for Tableau Server. The use case scenario is quite simple:
- Users can “shop” for data sets in Collibra, similar to how they browse for goods on Amazon
- Once they find the data sets they need, they can add them to their data basket and “check out”
- Upon checkout, they obtain access to the data sets and can export them to Tableau to be used there for reporting, analytics, and data visualization
Now let’s look at the process in more detail. To begin, users are equipped to find the data needed for their Tableau reporting intuitively, by browsing the Data Dictionary or by having our algorithms search the data sources and sets registered in Collibra Catalog. These registered data assets are tied to a variety of physical data sources, ranging from Excel spreadsheets to SQL databases, Redshift, or Hadoop. Our source registration capabilities can “dive” into the data lake, too. During registration, our Spark-powered profiling mechanism allows for a quick assessment of the incoming data landscape to foster informed and quick decision making.
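To make the idea of profiling concrete, here is a minimal sketch of the kind of column-level summary a profiler might produce. It is plain Python rather than Spark, and it is not Collibra's implementation: the function names `profile_column` and `profile_table`, the chosen statistics, and the sample rows are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter
from typing import Any


def profile_column(values: list[Any]) -> dict[str, Any]:
    """Summarize one column: row count, nulls, distinct values, top value."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    most_common = counts.most_common(1)[0][0] if counts else None
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "most_common": most_common,
    }


def profile_table(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Profile every column that appears in a list of row dictionaries."""
    columns = sorted({name for row in rows for name in row})
    return {
        name: profile_column([row.get(name) for row in rows])
        for name in columns
    }


# Hypothetical sample of an incoming data source.
sample_rows = [
    {"customer_id": 1, "country": "US"},
    {"customer_id": 2, "country": "BE"},
    {"customer_id": 3, "country": None},
    {"customer_id": 4, "country": "US"},
]

profile = profile_table(sample_rows)
print(profile["country"])
```

Even a summary this small shows, at a glance, that the `country` column has a missing value and only two distinct entries, which is the sort of signal that helps decide whether a source is worth registering.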
Such profiling results can be stored along with the data samples, allowing the right data to be scouted for quick discovery and assessed at a glance. Business users can mix and match the contents of data sources to create the logical data sets in alignment with their internal standards.
In Collibra, you can select columns that represent the appropriate metadata assets from different data sources and either group them into a single logical data set or add them to an existing data set.
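Conceptually, a logical data set is a named collection of column references that may span several physical sources. The sketch below models that idea in plain Python. It does not use the Collibra API, and `ColumnRef`, `LogicalDataSet`, and the source names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnRef:
    """Points at one column in one physical data source."""
    source: str
    table: str
    column: str


@dataclass
class LogicalDataSet:
    """A named grouping of columns that may come from different sources."""
    name: str
    columns: list[ColumnRef] = field(default_factory=list)

    def add_columns(self, refs: list[ColumnRef]) -> None:
        """Append column references, skipping any that are already present."""
        for ref in refs:
            if ref not in self.columns:
                self.columns.append(ref)

    def sources(self) -> set[str]:
        """Return the distinct physical sources behind this data set."""
        return {ref.source for ref in self.columns}


# Group columns from two hypothetical sources into a new logical data set.
revenue = LogicalDataSet("customer_revenue")
revenue.add_columns([
    ColumnRef("crm_db", "customers", "customer_id"),
    ColumnRef("sales.xlsx", "orders", "amount"),
])

# Later, add more columns to the existing data set; the duplicate is ignored.
revenue.add_columns([
    ColumnRef("crm_db", "customers", "customer_id"),
    ColumnRef("crm_db", "customers", "country"),
])
print(len(revenue.columns), sorted(revenue.sources()))
```

Storing references rather than copies of the data keeps the logical data set lightweight and leaves the physical sources as the single place where the data lives.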
The collaborative aspects of Collibra allow users to cooperate and crowdsource the data going into the BI tool before it’s published. Transparency and access to the data sets can be regulated to ensure that democratization does not undermine the safe and secure handling of data deemed sensitive. While the data is properly protected, it can also be branded as trusted or fit for use by authorized data consumers. We call this discipline data certification, and we mark certified data sets with a green ribbon on the Catalog landing page.
The data sets, managed in Collibra as technical assets with the proper metadata representing them, can be discovered through the platform-wide search engine or from the Catalog’s landing page, which offers recommendations based on your browsing history and usage patterns. The data certification mechanism allows for proper governance of these data sets. It is powered by a configurable certification workflow, which takes the data set through stages of approval with various stakeholders.
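A staged approval process like this can be thought of as a small state machine. The following sketch is one way to picture it, not a description of Collibra's workflow engine: the stage names, the `REVIEW_ORDER` list, and the functions `advance` and `run_workflow` are assumptions for illustration.

```python
from __future__ import annotations

from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    STEWARD_REVIEW = "steward_review"
    OWNER_REVIEW = "owner_review"
    CERTIFIED = "certified"
    REJECTED = "rejected"


# The "configurable" part: reorder or shorten this list to change the workflow.
REVIEW_ORDER = [
    Stage.DRAFT,
    Stage.STEWARD_REVIEW,
    Stage.OWNER_REVIEW,
    Stage.CERTIFIED,
]


def advance(stage: Stage, approved: bool, order: list[Stage] = REVIEW_ORDER) -> Stage:
    """Move one step forward on approval, or to REJECTED otherwise."""
    if stage is Stage.REJECTED or stage == order[-1]:
        raise ValueError(f"{stage.name} is a final stage")
    if not approved:
        return Stage.REJECTED
    return order[order.index(stage) + 1]


def run_workflow(decisions: list[bool], order: list[Stage] = REVIEW_ORDER) -> Stage:
    """Apply one decision per stage, starting from the first stage in the order.

    The first decision stands for submitting the draft; later ones are reviews.
    """
    stage = order[0]
    for approved in decisions:
        stage = advance(stage, approved, order)
        if stage is Stage.REJECTED:
            break
    return stage


print(run_workflow([True, True, True]))   # every stakeholder approves
print(run_workflow([True, False]))        # the steward rejects
```

Keeping the stage order in a plain list is what makes the sketch configurable: an organization that needs an extra legal review would add one stage to the list without touching the transition logic.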
After identifying the data set best fit for their Tableau report, users can add it to the data basket and request access through our intuitive shopping cart experience, in which requesting access to the data is equivalent to the shopping cart checkout process. The data basket is a personal “shopping cart” that stores the data set “shopping lists” for a given user. In this process, which some call data provisioning, the data governance functionality of the Collibra platform empowers the user to automatically identify and reach the stakeholders and stewards responsible for gatekeeping this particular subset of corporate data, so they can review and appropriately handle the access request based on their involvement. After the check-out process in Collibra is complete, the data set may, as one form of granting access, be loaded into the self-service BI tool, in our case Tableau, to generate BI reports that can be trusted. Once approval for use is granted, the data sets are ready to be seamlessly transferred from the Catalog to Tableau Server and made available to the authorized requesters, e.g. business analysts, so they can extract meaningful and truthful insights from this trusted data in no time.
If the data set was previously published to a Tableau server, its permissions will be updated so that the requester obtains proper access to use the published data set in Tableau workbooks to create visualizations.
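The basket, checkout, approval, and publishing steps described above can be sketched end to end in a few lines. This is an in-memory illustration only: it calls neither the Collibra nor the Tableau API, and `Basket`, `AccessRequest`, `BiServer`, and the user and data set names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AccessRequest:
    requester: str
    data_set: str
    approved: bool = False  # set by a steward or owner during review


@dataclass
class Basket:
    """A user's personal 'shopping cart' of data sets."""
    owner: str
    items: list[str] = field(default_factory=list)

    def add(self, data_set: str) -> None:
        if data_set not in self.items:
            self.items.append(data_set)

    def checkout(self) -> list[AccessRequest]:
        """Turn every item into an access request and empty the basket."""
        requests = [AccessRequest(self.owner, name) for name in self.items]
        self.items.clear()
        return requests


class BiServer:
    """In-memory stand-in for a BI server; tracks who may use each data set."""

    def __init__(self) -> None:
        self.permissions: dict[str, set[str]] = {}

    def grant(self, request: AccessRequest) -> str:
        """Publish the data set, or only update permissions if it already exists."""
        if not request.approved:
            raise PermissionError(f"{request.data_set} is not approved yet")
        already_published = request.data_set in self.permissions
        self.permissions.setdefault(request.data_set, set()).add(request.requester)
        return "permissions_updated" if already_published else "published"


server = BiServer()
outcomes = []
for user in ("ana", "ben"):
    basket = Basket(user)
    basket.add("sales_2017")
    for request in basket.checkout():
        request.approved = True  # stands in for the steward's review
        outcomes.append(server.grant(request))

print(outcomes)  # ['published', 'permissions_updated']
```

The first approved request publishes the data set, while the second only widens its permissions, which mirrors the behavior described for a data set that already exists on the server.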
In closing, let’s recap the benefits of registering your data sources in Collibra Catalog to form trusted data sets for consumption by corporate analytics. First and foremost, it will replicate the enterprise data governance standards around people and processes that are set in Collibra, ensuring that proper data maintenance continues in the self-service BI realm. It will regulate the use and certification of the data sets, and it will connect data users and consumers with data owners, producers, and stakeholders. It will also enrich the data sets with business context, appropriate metadata tags, and other descriptors, and it will offer matching of similar data sets, which provides for faster data discovery and de-duplication of data sets and their derived byproducts, such as reports.
For business users, this means that they can trust the data in their reports and insights, and they don’t have to reinvent the wheel each time they need a new report, because they have a system of record to search to find out whether someone else has already created it. The result is savings in the time and money spent on constructing reports that already exist. It also saves the time technical users and administrators spend on maintaining, communicating about, and provisioning the data warehouses they own. Users can also track the data lineage and traceability of their reports and data sources, and their linkage with other data assets, such as associated business terms. The collaboration capabilities of the Collibra platform can also seamlessly connect users to the experts for every data set, so that their data-related questions are answered, crowdsourced, and instantly documented, improving productivity and efficiency around enterprise data and its reports.
The Amazon-like capabilities of the Collibra platform can be used to commission corporate self-service BI and other data management systems, creating “a marathon, not a sprint” type of experience for users and the long-lasting, sustainable success and precision that we’re seeing at the leading digitized companies.
Ellie is a passionate product manager, data warehousing aficionado, self-proclaimed fitness enthusiast, up-and-coming entrepreneur, and occasional comedian. Prior to spending the last ten years as a multifunctional product leader, championing data management products from inception to successful adoption with high-profile enterprise customers, she learned the art of building software as a consultant, programmer, and system architect. Ellie holds a Marketing MBA from New York University’s Stern School of Business and a bachelor’s degree in Computer Science, also from NYU.