As organizations increasingly use predictive analytics to anticipate business outcomes, improve performance, and ultimately increase profitability, governing those models becomes more important. Yet today, many organizations lack the transparency and oversight they need for true predictive model governance.
Model governance (covering both the research phase of a new model and its ongoing oversight) defines a set of processes, roles and responsibilities, and insights to ensure that users of a model understand its risks, limitations, and major assumptions, and that the use of its output is appropriate. In short, predictive model governance is arguably the most important element of a predictive modelling QA framework, as it manages risk and ensures the appropriate sign-offs.
Making the results and methods as transparent as possible helps make the models more effective and raises the standard of quality by encouraging researchers to undertake more thorough checks. In addition, opening up the approach and results to a wide range of external experts invites them to challenge, debate, and further improve existing models. This ultimately drives consistency across the organization through the re-use of analyses and models that have already proven valuable.
Taking a closer look at the typical process of creating a predictive model, we can identify several common steps:
Model Request
- Request or determine the business objective of the model to be created
Model Build & Validation
- Find, understand, and trust the data
- Request access to the required data
- Develop and validate the model
Model Communication and Monitoring
- Analyze the results and produce the related reporting
- Govern and monitor the model
A solid governance platform plays an important role in optimizing several of the steps above. It also makes researchers more effective in what they do, making their results more impactful.
First off, an organization can use a governance system to keep track of the different model requests across the organization and to build a solid process around how it handles them.
Secondly, it helps organizations make use of a data dictionary, data catalog, and business glossary. A governance platform can drastically reduce the time spent finding what data is available in the organization and give that data additional context. In addition, policies and data quality metrics can tell the researcher how good the quality of the related data is.
Lastly, once an organization completes the analysis and creates the model, a governance platform can help certify and communicate the resulting report and model to the necessary people while providing transparency. A governance platform also serves as a common place to find comments and questions regarding the model or report.
When a data source changes, predictive models built on that source can become unreliable, and organizations should update them accordingly. But to complete this task, they first have to understand which models, reports, and business processes rely on the data. Using impact analysis in the governance system, people can discover upfront which models, business processes, or reports a data source change will affect, and can anticipate the necessary changes.
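Under the hood, impact analysis of this kind amounts to walking a dependency graph from the changed source to everything downstream of it. The sketch below is purely illustrative (the asset names and graph structure are invented, and this is not any vendor's API), assuming each asset records which assets consume it:

```python
from collections import deque

# Hypothetical lineage graph: each key maps to the assets that consume it
# directly (its immediate downstream dependents).
downstream = {
    "crm_db.customers": ["churn_dataset"],
    "churn_dataset": ["churn_model"],
    "churn_model": ["retention_report", "campaign_process"],
}

def impact_of(asset):
    """Breadth-first walk collecting every asset affected by a change to `asset`."""
    impacted, seen = [], set()
    queue = deque(downstream.get(asset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        impacted.append(node)
        queue.extend(downstream.get(node, []))
    return impacted

print(impact_of("crm_db.customers"))
# → ['churn_dataset', 'churn_model', 'retention_report', 'campaign_process']
```

A change to the CRM table thus surfaces not just the dataset built on it, but the model, report, and business process further downstream, before anything breaks.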
Another common issue is sharing models between researchers. Multiple researchers often work on the same source data and/or datasets, but are unaware of which datasets and models other researchers have previously provisioned and used for analysis (and which are therefore readily available). A searchable business glossary and catalog in the governance system, containing this information, can help researchers work more effectively.
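Conceptually, such a searchable catalog is a collection of metadata entries that can be matched on descriptions and tags. A minimal sketch, with invented entries and fields purely for illustration:

```python
# Hypothetical catalog entries: name, owner, tags, and a free-text description.
catalog = [
    {"name": "churn_dataset", "owner": "team_a", "tags": ["customer", "churn"],
     "description": "Monthly customer snapshot used for churn modelling"},
    {"name": "sales_2023", "owner": "team_b", "tags": ["sales"],
     "description": "Aggregated sales figures per region"},
]

def search(term):
    """Return names of catalog entries whose tags or description match `term`."""
    term = term.lower()
    return [entry["name"] for entry in catalog
            if term in entry["description"].lower() or term in entry["tags"]]

print(search("churn"))  # → ['churn_dataset']
```

A real catalog adds access controls, lineage, and quality metrics on top, but the core value is exactly this: a researcher can find an existing dataset instead of provisioning a duplicate.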
The trend in data privacy is clear: it is imperative that organizations have transparency and policies in place. The same holds for predictive models. Not all data elements can be used freely when building models. To avoid unintended (and unwanted) use of data elements, organizations should classify their data and link it to policies and processes. An enterprise metadata repository and governance system is key in this exercise.
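In practice, this classification step can be as simple as a registry mapping each data element to a sensitivity class, checked against a policy before features enter a model. The classes, element names, and policy below are all invented for illustration:

```python
# Hypothetical classification registry: data element -> sensitivity class.
classification = {
    "email": "PII",
    "birth_date": "PII",
    "region": "public",
    "purchase_total": "internal",
}

# Hypothetical policy: sensitivity classes a modelling use case may consume.
allowed_for_modelling = {"public", "internal"}

def vet_features(features):
    """Split requested features into permitted and blocked lists per policy."""
    permitted = [f for f in features
                 if classification.get(f) in allowed_for_modelling]
    blocked = [f for f in features if f not in permitted]
    return permitted, blocked

permitted, blocked = vet_features(["email", "region", "purchase_total"])
print(permitted, blocked)  # → ['region', 'purchase_total'] ['email']
```

The point is that the check happens automatically and upfront, rather than relying on each researcher to remember which elements are off-limits.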
How Collibra can help
The Collibra searchable data catalog and data dictionary help researchers discover what data and datasets are available and whether the data is of sufficient quality. Collibra also provides business context to further improve their understanding. Furthermore, researchers can request access to the required data, while governed processes can be used to grant that access.
Within Collibra, a predictive model is a type of asset (in line with other asset types, such as reports and policies) that organizations can link to business requirements, the parameters used, the dataset(s), the report(s) containing the information, the system of record, and other attributes essential to understanding the model and providing transparency into its risks, limitations, and assumptions. In addition, the Collibra Data Governance Center can accommodate all essential lifecycle workflows.
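The general pattern, independent of any product, is to treat the model as one node in a typed graph of assets. The sketch below is not the Collibra API; it is a generic illustration with invented names, assuming assets carry attributes and named relations to other assets:

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    """A generic governed asset: a model, dataset, report, policy, etc."""
    name: str
    asset_type: str                                 # e.g. "Predictive Model"
    attributes: dict = field(default_factory=dict)  # risks, limitations, ...
    relations: dict = field(default_factory=dict)   # relation name -> [Asset]

# Hypothetical model asset with its documented limitations and assumptions.
churn_model = Asset(
    name="Customer Churn Model",
    asset_type="Predictive Model",
    attributes={"limitations": "Trained on 2022 data only",
                "assumptions": "Stable pricing during forecast horizon"},
)
churn_model.relations["uses dataset"] = [Asset("churn_dataset", "Data Set")]
churn_model.relations["feeds report"] = [Asset("Retention Report", "Report")]

# Transparency view: everything the model is linked to, in one place.
for relation, assets in churn_model.relations.items():
    for asset in assets:
        print(f"{churn_model.name} {relation} {asset.name}")
```

Because risks, limitations, and relations live on the asset itself, anyone consuming the model's output can inspect them without tracking down the original researcher.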
Privacy classification of data elements, and linking them to privacy policies and processes, is an integral part of the Collibra platform. This further enhances the organization's ability to protect the required elements and avoid the related risks.
Exposing the models in a business glossary and linking them to the related business language helps other researchers as well as knowledge workers easily find, understand, and trust the model and the data it uses. In addition, it provides them with the relevant policies and business rules in place to ensure correct usage.
Predictive models are important assets to the organization, and building governance around them is an essential step in creating value and trust. Strong predictive model governance also lets researchers limit the time spent finding, understanding, and trusting the data they require, and focus on the core of their jobs: building predictive models. Furthermore, strong governance limits the risks involved in predictive modelling by providing transparency into the models themselves and the privacy limitations of the data used.