Barriers on the Road to Big Data Governance
Big data projects are very much in vogue. Many organizations, across a variety of industries, are making significant investments in the hope of finding insight and creating value. But when it comes to a plan of execution, most organizations rely on a team of data scientists and a polluted data lake. While it’s possible to be successful with this approach, it is essentially the equivalent of planning for your retirement by buying a single lottery ticket. It’s high risk, and there is a strong possibility that it will yield mediocre results or fail altogether. High-risk strategies can (and often should) be part of the portfolio for many organizations. However, for your organization to achieve its full potential with big data, you must diversify your strategies rather than concentrate on a single high-risk approach. You should democratize analytics by spreading usable data and analysis capability throughout the organization so that the results of that analysis can be applied immediately. Analytics must become woven into daily business processes and practices. And that requires big data governance.
Big data governance may sound like a lofty goal, but in reality, it is not. It does, however, require some infrastructure in order to put it into practice. And like all infrastructure, it demands an investment of money, time, and resources. Justifying the investment is simple at the macro level (many people use a business case), but challenging to do when its implementation is spread across many silos and many different people throughout the organization. Understanding how the approaches to breadth and scale of analysis succeed or fail is critical to designing a data governance path that will mesh with your organization, its culture, and its processes.
3 Roadblocks to Big Data Governance
As organizations start down the path to big data governance, they often encounter barriers along the way. While these barriers vary from organization to organization, I’ve found that they generally fall into three areas:
- Absence of a data democracy
- Lack of urgency by IT
- Data haves and data have nots
Absence of a Data Democracy
Democratizing the use of data so that anyone can become a data citizen is not a new idea. This was the promise of various data management approaches over the years, from data warehouses to operational data stores to analytics embedded within business applications. And while all of these approaches represented significant improvements in the ability of the organization to use data, none of them really took that use out of the hands of specialists, whether centrally located or in individual business units. In short, the promise of self-service analytics remains virtually unrealized as data remains siloed in its home systems. Integrating applications across the organization is a powerful way to provide transactional automation, but despite many attempts, it has been unable to provide the insight and analytics needed to empower decision making. And as the amount and scale of data grow, these “in flight” approaches are unable to keep pace.
Lack of Urgency by IT
The IT organization, typically the steward of early big data initiatives, is often not aggressive about building out the infrastructure required to support big data governance. From its perspective, the big data initiative may already be a clear win because it is significantly less costly to store large amounts of data in a Hadoop cluster than in some of the proprietary alternatives. Given that IT already views this initiative as a win, the urgency of improving it further may be somewhat dulled.
Another challenge is that IT does not know the data. They are experts in the machinery that manipulates the data, but the data itself is just “grist for the mill.” The people who know the data are the people who produce and consume it as part of their day-to-day jobs. And these people, the data citizens, are spread throughout the organization, representing a powerful resource if they can be organized to work together.
Data Haves and Data Have Nots
Many organizations find a few big wins with big data: situations where a team or group gains the ability to analyze data that they never had before. However, the vast majority of the organization fails to see the benefits, and as a whole, there is an overall sense of disappointment with the project. The data science team is also frustrated, not because they cannot consistently produce high-quality analysis, but because finding and evaluating the data takes far too long and requires too much manual effort. Research has shown that data scientists spend 60% of their time looking for the data they need. The same research shows that this is the task they enjoy the least. So the upshot is that we have expensive and scarce resources spending most of their day performing a task they dislike. Not exactly a recipe for success.
It’s clear that the road to big data governance is full of obstacles. But the good news is that they are obstacles you can overcome with the right strategy and the right data governance platform. In my next post, I’ll start to explore the steps you can take to break down the barriers and start down the road to making big data and analytics a success. In the meantime, if you’re looking to hear more about big data governance, join me, Aaron Zornes from The MDM Institute, and Stephen Gatchell, Chief Data Officer, Engineering Analytics & Data Lake at EMC, for a free webcast entitled “Data Governance: The Key to Taming the Big Data Beast.” You can register here. I hope to “see” you there.