Freddie Mac creates a one-stop shop for data with Collibra

Freddie Mac’s mission is to make home ownership possible for everyone; in 2019 it helped 480,000 first-time homebuyers. Achieving this mission involves collecting and processing vast amounts of data. To help manage this wealth of data, Freddie Mac turned to Collibra as their one-stop shop for data, relying on it in particular for their intake process, business glossary and data lineage needs.

Vikram Chopra is the Senior Manager, Single-Family, Data & Decisions at Freddie Mac. He led the selection of Collibra to address their data governance and data stewardship needs, and now leads the enhancements and adoption of Collibra across all Freddie Mac divisions.

“Our data ecosystem transformation pivoted on Collibra,” said Vikram.

Collibra quickly improved the collaboration within and across different teams after the implementation of the single-family business glossary. “Before Collibra our business glossary, business terms and definitions were in Excel spreadsheets,” said Vikram.

Now, more than 2,000 Freddie Mac data users have access to the business glossary on their computers and mobile devices.

“In addition to transforming the business glossary from spreadsheets, we built key integrations to make Collibra the universal metadata repository,” said Vikram. “We have also added collaborative workflows to engage with our business users and keep the business metadata evergreen.”

Data consumers, who make up approximately one third of the Freddie Mac organization, use Collibra for a range of data-related use cases.

“This is truly a transformation, not only in the data governance and metadata space, but overall, how we think of data at Freddie Mac,” said Vikram. “Data users can now get an integrated business view of their technical metadata, data quality, data movement controls, and all business metadata in a single central universal platform.”

Data transformation challenges

This transformation did not come without challenges. For one, Freddie Mac had been collecting data for years, and growth had recently become exponential. Their data ecosystem is multi-generational: they still have COBOL mainframes, relational and star schema-based data warehouses, and data marts. They also have approved provisioning points for system-to-system integrations that use publish-subscribe patterns.

“We also implemented Hadoop’s analytics platform for our big data needs,” said Vikram. “Having a multi-generational data ecosystem not only creates a predicament for us, it also creates complications for our data consumers. It can get overwhelming for data consumers to figure out where to get the data they need.”

The solution involved the transition to a self-governed cloud native data lake, making all data available in a single place. This transformation provides security, resiliency, scalability, reliability and availability.

“We can realize those benefits by modernizing our data ecosystem to a cloud native ecosystem. But what are the risks if you don’t do it right?” said Vikram. “The business is dependent on the delivery of this data, their innovation, their analytics, their predictive analysis needs this data. And they are relying on the data team, us, to provision this data seamlessly so that the business can be run smoothly.”

Designing a modern cloud ecosystem

Their approach to designing and implementing a modern cloud native ecosystem solution involved three steps. At the core of designing a well-governed data lake is the requirement that the new data ecosystem deliver ongoing relevance and value creation. It should drive creativity and innovation and reduce the time to market for new products.

The first requirement was alignment between the overarching organizational business objectives and those responsible for provisioning the data and implementing the data lake.

“We refreshed our data strategy, to ensure it aligns with our divisional business objectives, and with the departmental mission of a cloud native data strategy, because we were heavy users of Collibra already,” said Vikram. “Next, we built the cloud onboarding solution with Collibra in the front. And finally, we added integrations where we needed them so that the data pipelines are flowing smoothly and are optimized at every point.”

Defining the data strategy

The foundational blocks of Freddie Mac’s data strategy were managing content, trusting the data source and empowering the business. Collibra plays a key role in each of these foundational blocks. Collibra manages the content, that is, the data about the data, using data quality scorecards and metrics for the data elements. This gives consumers confidence that they can trust the data sources.

Collibra also empowers data users to find the data quickly through the data information platform. The new centralized infrastructure strategically empowers the business by increasing data consistency, availability and sustainability across the enterprise.

Placing Collibra in the front

“Having a data strategy helps us communicate where we are going and why we are going there,” said Vikram. “The next step in our data transformation journey was to design the flow for the data lake hydration process that we pivoted on Collibra. Collibra has become a well-established data governance and metadata platform at Freddie Mac. The metadata from our legacy data platforms, data warehouses, data marts, applications in relational databases, as well as Hadoop is already integrated into Collibra. The technical metadata from these data sources is supplemented with business metadata, such as definitions, taxonomy, data classifications, data quality, and controls.”

They added custom workflows to increase collaboration across teams and to keep the business metadata evergreen. They placed Collibra in front of the data lake onboarding process to leverage the existing metadata. Combined with their new intake process, this allowed them to accelerate the transformation to their new data information platform.

“The metadata, data classifications, and technical metadata from Collibra powers the access management engine and the data pipelines to hydrate our data lake,” said Vikram. “As the data lake gets hydrated with approved data sets, the data in the lake is already curated. It is ready for access and the metadata from S3 buckets, parquet files and Snowflake is available in Collibra for data consumers, universally.”

Adding the integrations

They are also integrating Collibra with their own data quality processes. They will be able to measure the quality of data as it flows into the lake, analyzing across the quality dimensions of accuracy, completeness and validity. Through Collibra’s Data Catalog, they can then provide these metrics to their data consumers directly.
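To make the quality dimensions concrete, the following is a minimal sketch of how completeness and validity might be scored for a single data element. It is not Freddie Mac's implementation or a Collibra API: the `QualityScore` class, the `score_column` function, and the ZIP code rule are all hypothetical names chosen for illustration. Accuracy is left out because measuring it requires comparison against a trusted reference source.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class QualityScore:
    """Hypothetical container for two quality dimensions of one data element."""
    completeness: float  # share of values that are present
    validity: float      # share of present values that pass the rule


def score_column(values: Sequence[Optional[str]],
                 is_valid: Callable[[str], bool]) -> QualityScore:
    """Score one column of values (illustrative only, not a vendor API)."""
    total = len(values)
    if total == 0:
        return QualityScore(completeness=0.0, validity=0.0)
    # A value counts as present when it is neither None nor an empty string.
    present = [v for v in values if v is not None and v != ""]
    completeness = len(present) / total
    # Validity is measured only over the values that are present.
    validity = (sum(1 for v in present if is_valid(v)) / len(present)
                if present else 0.0)
    return QualityScore(completeness=completeness, validity=validity)


# Assumed rule for illustration: a valid ZIP code is exactly five digits.
zip_codes = ["20190", "2201", None, "22102"]
score = score_column(zip_codes, lambda v: len(v) == 5 and v.isdigit())
print(score)
```

Scores like these could then be published alongside each data element so that consumers see them next to the business definition.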

Then the team designed and developed an intake workflow in Collibra. The registration workflow provides the mechanism to identify and classify data sets that need to be provisioned in the cloud.

Their workflow identifies and validates the data structures that need to be transitioned, then initiates the data lake hydration business process by bringing together different parts of the organization (data consumers, project teams, data owners, and data governance teams) to create alignment across the board.

Using pre-built pattern-based pipelines, the metadata flows from Collibra into the lake using Okera, a universal data authorization application, as the engine for fine-grained access management. The integration between Collibra and Okera delivers the data and the masking details at the data element level.
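The idea of masking driven by data classifications can be illustrated with a short sketch. This is an assumption-laden example, not the Okera or Collibra API: the classification labels, the `MASKING_RULES` table, and the `mask_record` function are hypothetical, and a real deployment would enforce such rules inside the access management engine rather than in application code.

```python
from typing import Callable, Dict

# Hypothetical classification labels mapped to masking functions.
MASKING_RULES: Dict[str, Callable[[str], str]] = {
    "public": lambda v: v,                                           # shown as-is
    "confidential": lambda v: "*" * max(len(v) - 4, 0) + v[-4:],     # keep last 4
    "restricted": lambda v: "REDACTED",                              # fully hidden
}


def mask_record(record: Dict[str, str],
                classifications: Dict[str, str]) -> Dict[str, str]:
    """Apply a masking rule to each data element based on its classification.

    Elements with no classification, or with an unknown label, fall back to
    the most restrictive rule.
    """
    masked = {}
    for field, value in record.items():
        label = classifications.get(field, "restricted")
        rule = MASKING_RULES.get(label, MASKING_RULES["restricted"])
        masked[field] = rule(value)
    return masked


# Illustrative data only; field names and labels are assumptions.
record = {"loan_id": "L-1001", "ssn": "123456789", "state": "VA"}
classifications = {"loan_id": "public", "ssn": "confidential"}
masked = mask_record(record, classifications)
print(masked)
```

Defaulting unclassified elements to the most restrictive rule is a design choice made for this sketch; it means a data element that has not yet been through the intake workflow is never exposed by accident.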

The tremendous benefits

With the Collibra-based solution in place for a few months now, Freddie Mac has started to realize tremendous benefits from it. They have reinforced a business culture built on the idea that data is a product, with value that the data information platform can deliver. They focus on adding features that their users will be able to use immediately.

They are developing new business capabilities through the innovations that a modern, agile delivery approach makes possible. By leveraging Collibra, they are accelerating the simplification of their data ecosystem, with a focus on the business. Technical and business metadata, data quality controls, and even operational metadata are all available in Collibra.

“This promotes common understanding of data because this rich metadata is able to provide data consumers a universal context on the data origination, its availability and how to get access to what data,” said Vikram. “Having an integrated business view of this rich content is very powerful and empowers our business users.”