Data Quality Comes Before Data Visualization
Several months ago, I was slated to meet members of a client’s Business Intelligence team for happy hour. I decided to get there early, so imagine my surprise when I walked into the room 15 minutes ahead of schedule – and everyone was already seated and participating in a lively discussion! I took my seat, and the senior director asked, “What does data governance mean to you?” My candid answer…
1. Data governance is a red-headed stepchild
The unfamiliar guy seated next to me threw his hands up in exasperation. One of the other directors pointed at him and said, “Meet our Director of Data Governance.” The data governance guy was exasperated because I had nailed his primary issue with one statement. Basically, data governance – along with the accompanying data quality and master data management initiatives – is frequently ignored in favor of putting cool, interactive reports and dashboards into the hands of the users as quickly as possible.
I then went on to elaborate about the importance of a data governance strategy – and how engaged resources and consistent data and processes trump tool selection every time. After my impassioned explanation, the senior director simply said, “You passed”…and we went back to a more casual relationship-building session!
2. Regional focus yields higher adoption
Fast forward a few months, and that Director of Data Governance and I have become both friends and allies. We were sitting down, over coffee this time, to discuss his progress rolling out data governance for this large, regionally distributed organization. I was amazed at his progress! It helps that he is both highly competent and charismatic enough to “sell” data governance to his own organization. He now has regional data governance boards with multiple data stewards representing the different lines of business. These regional boards roll up to a national data governance board that keeps an eye on the decisions across regions. These data stewards have enthusiastically taken on their expanded roles – many while still fulfilling their “day job” responsibilities. Some of them also recognize the new career path possibilities as they commit to becoming more integral to the data decision-making for their organization.
Adoption of the new processes is also very high because “national” engaged “regional” to be part of the solution rather than forcing national mandates down the regions’ throats.
Ah, but data governance and BI projects are never short stories. Rather than stop at a happy ending, we have a plot twist that many of you can already guess: The data QUALITY, to put it delicately, leaves a lot to be desired! And that brings me to the next crucial point…
3. Data Quality comes before Data Visualization
My friend and I commiserated that both regional and national offices were proceeding with data visualization tool purchases and projects without having the firm foundations of good data and consistent data definitions across the enterprise. We all want to race to put those glitzy, interactive reports in the hands of the same decision-makers who determine our budgets for more projects. We want to impress these decision-makers with the rapid ROI.
The loss of credibility…and subsequent loss of project funding…can be devastating if the “numbers do not add up”. In some cases, the numbers add up just fine, but they differ across regions or even across reports in the same region. That usually indicates that the organization needs to resolve data DEFINITIONS as part of an overall Data Governance strategy. By doing the definitions first, you save the re-work of having to change data visualizations (and possibly even the ETL layer) later.
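To make the "numbers differ across reports" problem concrete, here is a minimal sketch in Python of a check that flags a metric whose value disagrees between reports for the same region. Everything in it is an assumption for illustration: the function name find_metric_conflicts, the field names (region, report, metric, value), and the sample figures are made up and do not come from the client described here.

```python
from collections import defaultdict

def find_metric_conflicts(report_rows, tolerance=0.0):
    """Return metrics whose reported values disagree within a region.

    Hypothetical helper: each row is a dict with the keys
    'region', 'report', 'metric', and 'value'.
    """
    # Collect every report's value for each (region, metric) pair.
    values = defaultdict(dict)
    for row in report_rows:
        values[(row["region"], row["metric"])][row["report"]] = row["value"]

    # A pair is in conflict when its highest and lowest values differ
    # by more than the allowed tolerance.
    conflicts = {}
    for key, by_report in values.items():
        if max(by_report.values()) - min(by_report.values()) > tolerance:
            conflicts[key] = by_report
    return conflicts

# Made-up sample data: two reports per region for the same metric.
rows = [
    {"region": "East", "report": "sales_dashboard", "metric": "active_customers", "value": 1200},
    {"region": "East", "report": "finance_summary", "metric": "active_customers", "value": 1150},
    {"region": "West", "report": "sales_dashboard", "metric": "active_customers", "value": 900},
    {"region": "West", "report": "finance_summary", "metric": "active_customers", "value": 900},
]

conflicts = find_metric_conflicts(rows)
for (region, metric), by_report in conflicts.items():
    print(f"{region} / {metric}: {by_report}")
```

A check like this only tells you that two reports disagree. Deciding which definition of "active customer" is the right one is still a conversation for the data governance board.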
So you do the front-end work to get consistent data definitions across the enterprise – you win, right? Well, it’s about this time you may discover that the data QUALITY is suspect. My friend and I agreed that the best method to address data quality issues is to…
4. Get Data Profiling as close to the source as possible
Left unchecked, regional and/or ungoverned data can find its way into integrated data sources without the appropriate data validations and cleansing. The length of the identification, feedback, and data correction cycle is frequently too long. End users discover data errors in reports, or perhaps national BI resources discover the errors via data profiling tools. However, the best people to discover and correct data quality errors – or at least facilitate the corrections – are the regional-level data stewards. The feedback loop is much tighter because these regional team members are much closer to the source data systems and generally have better context and contacts to expedite the data corrections.
My friend and I discussed using a national-based implementation team to deploy data profiling tools, and provide appropriate training, to these regional data stewards. At a minimum, the “data profiling expert” should physically sit alongside the “subject matter expert” to quickly identify data anomalies and define/refine the appropriate business rules to address those anomalies (I’m a fan of this style of “pair programming” where you combine a technical resource with a domain expert to look at a single screen). In some cases, the cleansing can happen during the ETL stage, but it is always best to close the feedback loop by going back to the source to correct the data (if possible).
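For readers who have not seen data profiling up close, here is a rough sketch in Python of what the profiling expert and the subject matter expert might look at together: a quick summary of one column, plus a handful of business rules run against each record. This is not any particular profiling tool. The function names, column names, rules, and sample records are all hypothetical and exist only to illustrate the idea.

```python
from collections import Counter

def profile_column(records, column):
    """Summarize one column: row count, missing values, distinct values."""
    values = [record.get(column) for record in records]
    present = [v for v in values if v is not None and v != ""]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(3),
    }

def find_rule_violations(records, rules):
    """Return (record index, rule name) for every business rule that fails."""
    violations = []
    for index, record in enumerate(records):
        for name, check in rules.items():
            if not check(record):
                violations.append((index, name))
    return violations

# Made-up records standing in for an extract from a regional source system.
sample = [
    {"customer_id": "C001", "state": "TX", "order_total": 125.50},
    {"customer_id": "C002", "state": "tx", "order_total": -10.00},
    {"customer_id": "", "state": "CA", "order_total": 40.00},
]

# Hypothetical business rules a subject matter expert might define or refine.
rules = {
    "customer_id_present": lambda r: bool(r.get("customer_id")),
    "state_is_uppercase": lambda r: str(r.get("state", "")).isupper(),
    "order_total_non_negative": lambda r: r.get("order_total", 0) >= 0,
}

state_profile = profile_column(sample, "state")
violations = find_rule_violations(sample, rules)
print(state_profile)
print(violations)
```

The profile surfaces the anomaly (three distinct state values where the expert expects two), and the rule list gives the regional data steward specific records to take back to the source system.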
Data Governance can be a hard sell to executives because the end result isn’t about the glitz and glamor of cool data visualizations. Instead, it is about building the FOUNDATION to support CORRECT and CONSISTENT glitzy and glamorous data visualizations! If you need assistance establishing the business case for a Data Governance strategy, implementing a strategy, or even building the architecture, ETL, and visualizations to make your data work for you…drop me an email! ProKarma will competently and enthusiastically help you deliver on your BI initiatives!
Photo Courtesy of celebrityabc via Flickr and shared via Creative Commons CC BY-SA 2.0