My data product journey as a freshly graduated data scientist

On the 3rd of August, I started my professional journey as a data scientist in Collibra’s Data Office. Our mission is to make data meaningful, and in this blog post I’d like to share how I’ve moved the needle on that mission. Specifically, I’ll write about what I’ve learned as I launched my first data product with my colleagues in Sales Engineering all the way from idea to production, including the cross-functional collaboration between our departments. 

Involving the business: the world of the sales engineer

Sales engineers play a vital role in helping our customers on their Data Intelligence journey. They refine the customer’s needs, identify business value and connect it to our platform’s capabilities. If you are new to Data Intelligence, the sales engineer will show you how that vision translates into “a day in the life of” a business analyst, data scientist, data steward or data engineer. They show how the software works and help you through the journey of demos, proofs of concept, integration scenarios, technical questions, training and more.

Although Collibra has excellent sales engineers, even they can’t always grasp all of a customer’s challenges in a short time frame. Imagine a visit to the doctor: the diagnosis is based on a short physical visit, but any symptoms occurring before or after the visit are hard to examine. In the same way, how could monitoring data help sales engineers better understand a customer’s challenge?

A good data product starts with a business idea

I started my collaboration with a sales engineer who is passionate about data. Together, we envisioned a data product that could make the sales engineers’ lives easier. Looking back, this is an important first lesson learned: all good data products start with a business idea! And collaboration is crucial; there is no substitute for experience, and the sales engineer’s experience is a key ingredient in our business. By taking the first step in creating a data product together with a sales engineer, I immediately achieved buy-in and ownership.

Achieving buy-in and ownership is critical because a data product should always have an owner. A data product owner, who in this instance was a sales engineer, is someone who represents their part of the business and its needs while the product is being developed. In our case, the data product owner knocked on the data office’s door holding a Jupyter Notebook, excited to roll this product out to her colleagues.

Collaboratively iterating a prototype made the idea come alive

We held weekly meetings with the data product owner to identify the required insights. Before long, a data product took shape that surfaced the usage of assets, domains and features.

We built the data product in a very agile manner. First, all initially proposed insights were rapidly put into a data pipeline and immediately visualized, without worrying about code cleanliness or efficiency. This went fast: we had a working iteration within two days, which allowed the data product owner to consume what they had asked for and interact with the insights. This quick implementation raised the bar on what is possible.

Architectural overview

Collibra’s product uses Pendo to track usage data, which is stored in the data lake. A data pipeline written in Python transforms the raw data from an AWS S3 bucket into usable insights in AWS Redshift. The pipeline is Dockerised and runs on a Kubernetes cluster, updating the data five times a day. Finally, this data is visualized in a Tableau dashboard that is visible to all sales engineers.
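To make the transformation step more concrete, here is a minimal sketch in Python. It is not the production pipeline: the `UsageEvent` fields, the `summarize_feature_usage` function and the sample values are all hypothetical, and reading from S3 and loading into Redshift are left out so the example stays self-contained. It only illustrates the idea of turning raw usage events into aggregated rows that a dashboard can consume.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """One raw usage record. The field names are hypothetical."""
    account: str
    user_id: str
    feature: str
    minutes: float


def summarize_feature_usage(events):
    """Aggregate raw events into one row per (account, feature)."""
    totals = Counter()
    for event in events:
        totals[(event.account, event.feature)] += event.minutes
    # Most-used features first; ties are broken by (account, feature).
    ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [
        {"account": account, "feature": feature, "minutes": minutes}
        for (account, feature), minutes in ordered
    ]


# Made-up sample events, standing in for raw data read from the data lake.
raw_events = [
    UsageEvent("acme", "u1", "search", 12.0),
    UsageEvent("acme", "u2", "search", 8.0),
    UsageEvent("acme", "u1", "lineage", 5.0),
]
rows = summarize_feature_usage(raw_events)
```

In a real pipeline, rows like these would be written to a warehouse table on every scheduled run, so the dashboard only needs to read pre-aggregated data.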

Going from a hypothetical state (the ideas) to a directly consumable Tableau dashboard opened a new world of possibilities. New ideas were raised and discussed, resulting in a better data product. As a next step, we involved additional sales engineers. Their fresh view generated additional insights and feedback. We used the same agile method to continue this co-creation process.

Out of the sandbox and into production

Once our ideas were implemented, we decided to move from the discovery phase into production. Code cleanliness, code efficiency and uptime now became very important. The data product also had to be correctly registered in our internal Collibra environment. In addition, we needed proper documentation on how to use it and how to add instances, and we had to organize training and create instructional videos. These last steps are crucial for adoption.

Consulting with legal counsel on proper data use is critical prior to implementation. Data privacy and ownership issues abound in data use cases. Data owners, as well as data product developers, don’t necessarily analyze these use cases from a legal perspective. In this case, we worked with legal counsel as a partner to avoid compliance and proprietary roadblocks.

What comes in the box?

The data product is delivered as a Tableau dashboard where the sales engineer can interactively play with the data. It starts with key metrics on total hands-on time, the number of active users and the date on which the data was last refreshed. This last metric makes it possible to compare printouts of the dashboard and also gives an idea of how recent the processed data is. Furthermore, the data product shows insights on the most used assets, domains and features. Finally, search terms are shown in a word cloud, giving a better idea of what users are looking for and what might be missing.
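The key metrics and the word cloud input described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the dashboard's actual logic: the `(user_id, minutes)` session shape, the function names and the sample values (including the refresh date) are hypothetical.

```python
from collections import Counter
from datetime import date


def key_metrics(sessions, refreshed_on):
    """Compute headline metrics from (user_id, minutes) tuples."""
    sessions = list(sessions)
    return {
        "total_hands_on_minutes": sum(minutes for _, minutes in sessions),
        "active_users": len({user_id for user_id, _ in sessions}),
        "last_refreshed": refreshed_on.isoformat(),
    }


def search_term_weights(queries):
    """Count normalised search terms; the counts can size words in a word cloud."""
    terms = (query.strip().lower() for query in queries)
    return Counter(term for term in terms if term)


# Made-up sample data for illustration only.
sessions = [("u1", 30), ("u2", 15), ("u1", 10)]
metrics = key_metrics(sessions, date(2020, 11, 2))
weights = search_term_weights(["Customer", "customer ", "GDPR", ""])
```

Normalising case and whitespace before counting keeps variants such as "Customer" and "customer " from showing up as separate words in the cloud.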

 

Conclusion

I’ve learned that a data product is never fully defined at the start, but takes shape during the collaborative process. Collaboration is key to a successful data product. More than 15 people were involved, and the data product’s success depended on each of them. I would like to thank all collaborators involved in making data meaningful for our sales engineers. On to the next project! #OneCollibra
