IT’S YOUR DATA, AND THIS IS YOUR BLOG

Welcome to the Collibra Blog, where CDOs, data stewards, and data citizens go to learn about true data governance.

It’s Like Amazon, But for Data: The Key to Sustainable Self-Service BI

The well-deserved buzz around self-service business intelligence (BI) is tied to the concept of democratizing the large volumes of data across the enterprise, making it accessible and consumable by everyone. And the darlings of the startup community are capitalizing on this concept. As we can learn from an article by the leading Airbnb experts, as well as from plenty of other market research in this area, self-service analytics in itself is not enough to transform corporate employees into responsible data collaborators, or data citizens, as we call them. There is a need to understand the “entire data ecosystem, from the production of an event log to its consumption in a visualization,” which isn’t achievable by merely having a self-service BI setup.

Luckily for our customers, that need is naturally met by the Collibra data governance platform. Our core capability acts as a system of record for data: it catalogs, links, and visualizes the connections and relationships across a multitude of data assets and elements of various kinds, creating an actionable map of the data landscape, or ecosystem, that can then be strategically maintained and collaboratively, methodically evolved.

Among other functionality fueling this vision, Collibra is known for its lineage and traceability diagrams. These enable intuitive, visual tracking of the data flow from the original physical data sources through the various stages of processing and enrichment, and associate the data with the key business concepts and metrics in the data glossary. The tentacles of our diagrams can stretch all the way to the various integrated systems, composite reports, and executive insights where this data is leveraged for business strategy, allowing proper assessment of its quality, articulation scores, ownership, and other important metrics. This ultimately confirms the validity of the data and reports and allows for easy investigation and mitigation of issues with the data flow. Such capability is the key to sustainable trust in the results of your analytics.

As self-service BI promotes data democratization for business users across the enterprise data landscape, it is important to ensure that self-service implementations are properly maintained, so that they consistently provide a fresh flow of trusted data to their users and result in relevant data insights communicated at the executive level. As self-service BI usage snowballs, the risk of report mismanagement and redundancy grows, as does the need for consolidation, efficiency, and proper collaboration among business users.

Because our team recognized the business user’s need for trusted data and well-maintained BI systems, our priority is to provide a seamless in-product integration with commonly used BI applications, offering Collibra as the means of collaboration over the logical data sets that source BI reports, and ultimately as a system of record for BI reports. Thus Tableau, a well-known self-service BI pioneer, has become our first frontier in the effort to offer properly maintained data accessibility to business analysts and other BI consumers.

Many of our longstanding customers have been using ad hoc integrations of Collibra and Tableau, supported via our extensible connector APIs. The common use case for such integrations was to offer content from Collibra as a contextual glossary for Tableau reports. This provides context for the common industry language around the Tableau visualizations, making them easier to understand. Now, with Collibra Catalog empowering business analysts to create, manage, profile, and categorize pliable data sets, we’re ready to go even deeper into facilitating a properly enabled ecosystem that makes maintaining BI services a breeze.

In line with the revolution in e-commerce led by Amazon, marked by unprecedented convenience and a superior user experience, we’re calling our next-level Tableau integration use case “Shop for Data.” In it, we position Collibra as a governed data catalog for the Tableau server. The use case scenario is quite simple:

  1. Users can “shop” for data sets in Collibra, similar to how they browse for goods on Amazon
  2. Once they find the data sets they need, they can add them to their data basket and “checkout”
  3. Upon checkout, they can export the data sets to Tableau so they can use them for reporting, analytics, and data visualization
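
To make the three steps above concrete, here is a minimal sketch of the shop, basket, and checkout flow in plain Python. It is an illustration only: `DataSet`, `DataBasket`, and `search` are hypothetical names invented for this example and are not part of the Collibra or Tableau APIs, and the catalog is a hard-coded list rather than a real one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataSet:
    """A catalog entry; the fields are illustrative, not Collibra's schema."""
    name: str
    tags: frozenset = frozenset()


@dataclass
class DataBasket:
    """A per-user 'shopping cart' of data sets awaiting checkout."""
    owner: str
    items: list = field(default_factory=list)

    def add(self, data_set: DataSet) -> None:
        # Adding the same data set twice should not duplicate the request.
        if data_set not in self.items:
            self.items.append(data_set)

    def checkout(self) -> list:
        # Checkout empties the basket and returns what should be exported.
        exported, self.items = self.items, []
        return exported


def search(catalog: list, keyword: str) -> list:
    """Step 1: 'shop' by matching a keyword against names and tags."""
    keyword = keyword.lower()
    return [
        ds for ds in catalog
        if keyword in ds.name.lower() or keyword in {t.lower() for t in ds.tags}
    ]


# A made-up catalog used only for this example.
catalog = [
    DataSet("Customer Orders", frozenset({"sales"})),
    DataSet("Web Traffic", frozenset({"marketing"})),
    DataSet("Sales Targets", frozenset({"sales", "finance"})),
]

basket = DataBasket(owner="analyst@example.com")
for hit in search(catalog, "sales"):   # step 1: browse the catalog
    basket.add(hit)                    # step 2: add to the data basket
to_export = basket.checkout()          # step 3: hand off to the BI tool
print([ds.name for ds in to_export])
```

In a real deployment the checkout step would also trigger the access request described later in this post, rather than returning the data sets immediately.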

Now let’s look at the process in more detail. To begin, users can find the data needed for their Tableau reporting intuitively, by browsing the Data Dictionary or by letting our algorithms search through the data sources and sets registered in Collibra Catalog. These registered data assets are tied to a variety of physical data sources, ranging from Excel spreadsheets to SQL databases, Redshift, or Hadoop. Users can dive into the data lake, too. Our Spark-powered profiling mechanism allows for quick assessment of the data landscape to foster informed and quick decision making.

Such profiling results can be stored along with the data samples, so the right data can be discovered quickly and assessed at a glance. Business users can mix and match the contents of data sources to create logical data sets in alignment with their internal standards.
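
For readers who have not seen a data profile before, the sketch below shows the kind of per-column summary a profiler typically stores. It is written in plain Python rather than Spark, and it says nothing about how Collibra’s profiler is actually implemented; the function names and the sample rows are assumptions made for this example.

```python
def profile_column(values):
    """Summarize one column: row count, missing values, distinct values, range."""
    present = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    columns = {key for row in rows for key in row}
    return {
        col: profile_column([row.get(col) for row in rows])
        for col in sorted(columns)
    }


# Made-up sample rows standing in for a data sample stored with the profile.
sample = [
    {"region": "EMEA", "revenue": 120},
    {"region": "APAC", "revenue": None},
    {"region": "EMEA", "revenue": 95},
]
profiles = profile_table(sample)
print(profiles["revenue"])
```

A summary like this is what lets an analyst judge at a glance whether a column is complete enough to build a report on.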

In Collibra, you can select columns that represent the appropriate metadata assets from different data sources, and either group them into a single logical data set or add them to an existing data set.

The collaborative aspects of Collibra allow users to cooperate on and crowdsource the data going into the BI tool before it’s published. Transparency and access to the data sets can be controlled to ensure that democratization does not undermine safe and secure handling of data deemed sensitive. We call this discipline data certification, and we mark certified data sets with a green ribbon on the Catalog landing page.

The data sets, along with their Collibra metadata, can be found from the landing page, which offers recommendations based on your browsing history and usage patterns. The data certification mechanism allows for proper governance and is powered by a configurable certification workflow that takes the data set through stages of approval with various stakeholders.
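
A staged approval workflow of this kind can be pictured as a small state machine. The sketch below is one way to model it, under stated assumptions: the stage names in `STAGES` and the `CertificationRequest` class are hypothetical and chosen for illustration, since a real workflow is configured per organization.

```python
from dataclasses import dataclass, field

# Hypothetical approval stages; a real workflow would be configured per organization.
STAGES = ["draft", "steward_review", "owner_review", "certified"]


@dataclass
class CertificationRequest:
    data_set: str
    stage: str = "draft"
    history: list = field(default_factory=list)

    def approve(self, approver: str) -> str:
        """Advance one stage and record who approved it."""
        if self.stage == "certified":
            raise ValueError(f"{self.data_set} is already certified")
        self.history.append((self.stage, approver))
        self.stage = STAGES[STAGES.index(self.stage) + 1]
        return self.stage

    def reject(self, approver: str) -> str:
        """Send the data set back to draft so it can be reworked."""
        self.history.append((self.stage, f"rejected by {approver}"))
        self.stage = "draft"
        return self.stage

    @property
    def certified(self) -> bool:
        # Only a certified data set would earn the green ribbon.
        return self.stage == "certified"


request = CertificationRequest("Sales Targets")
request.approve("author")         # draft -> steward_review
request.approve("data steward")   # steward_review -> owner_review
request.approve("data owner")     # owner_review -> certified
print(request.stage, request.certified)
```

Keeping the history alongside the current stage means the record of who approved what is available later for audit.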

After identifying the data set best suited to their Tableau report, users can add it to the data basket to obtain proper access to it. We modeled this on the familiar shopping cart experience: requesting access to the data is equivalent to checking out. The data basket is a personal “shopping cart” that stores the data “shopping lists” for a given user. In this process, which some call data provisioning, the data governance functionality of the Collibra platform automatically identifies and notifies the stakeholders and stewards responsible for gatekeeping this particular subset of corporate data, so they can review and handle the request according to their role. Once the checkout process in Collibra is complete and approval for the data usage is granted, the data sets can be loaded into the self-service BI tool, in our case Tableau, to generate BI reports that can be trusted. The data sets are transferred from the Catalog to the Tableau server and made available to the authorized business analysts who requested them, so they can extract meaningful and truthful insights in no time.

If the data set was previously published to a Tableau server, its permissions are updated so that the requester obtains proper access to use the published data set in Tableau workbooks to create visualizations.
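
The publish-or-update rule just described can be sketched in a few lines. This is an assumption-laden illustration: the `server` dictionary is an in-memory stand-in for a Tableau server, `provision` is a hypothetical function name, and no real Tableau or Collibra API is called.

```python
def provision(server: dict, data_set: str, requester: str) -> str:
    """Publish the data set if needed, then make sure the requester has access."""
    if data_set in server:
        # Already published: only the permissions change.
        server[data_set].add(requester)
        return "permissions updated"
    # First request: publish the data set with the requester as its first user.
    server[data_set] = {requester}
    return "published"


# In-memory stand-in mapping each published data set to its permitted users.
server = {}
first = provision(server, "Sales Targets", "alice")
second = provision(server, "Sales Targets", "bob")
print(first, second, sorted(server["Sales Targets"]))
```

Because the data set is published once and shared, later requesters reuse the same governed copy instead of creating a duplicate.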

In closing, let’s recap the benefits of registering your data in Collibra Catalog so it can then be consumed for corporate analytics. First and foremost, it replicates the enterprise data governance standards around people and processes that are set in Collibra, ensuring that proper data maintenance continues in the self-service BI world of Tableau. It regulates the use and certification of the data sets, and it connects data users and consumers with data owners and stakeholders. It also connects the data sets with their business context, the appropriate metadata tags, and other descriptors, and it links similar data sets, which makes for faster data discovery and de-duplication of data and its derived byproducts, such as reports.

For business users, this means that they can trust the data in their reports and insights, and they don’t have to reinvent the wheel each time they need a new report, because they have a system of record to search through. The result is savings in the time and money spent on constructing reports that have already been created by someone else. It also saves the time technical users and administrators spend on maintaining, communicating about, and provisioning the data warehouses they own. Users can also track the data lineage and traceability of their reports, data sources, and linkage with other data assets. The collaboration capabilities of the Collibra platform can also seamlessly connect users to the experts for every data set, so that any data-related questions are answered, crowdsourced, and instantly documented, improving productivity and efficiency around enterprise data and its reports.

The Amazon-like capabilities of the Collibra platform can be used to commission corporate self-service BI, and other data management systems, to create “a marathon, not a sprint” type of experience for users and to bring about the long-lasting, sustainable success and precision that we’re seeing with the leading digitized companies.

Ellie is a passionate product manager, data warehousing aficionado, self-proclaimed fitness enthusiast, up-and-coming entrepreneur, and occasional comedian. Prior to spending the last ten years as a multifunctional product leader, championing data management products from inception to successful adoption with high-profile enterprise customers, she learned the art of building software as a consultant, programmer, and system architect. Ellie holds a Marketing MBA from New York University’s Stern School of Business and a bachelor’s degree in Computer Science, also from NYU.