Practical Ways to Improve Credit Data Quality
Despite unremitting regulatory pressure and millions of dollars of investment in data governance, many banks have made little real progress in addressing the root causes of “dirty” credit data.
The consequences of this failure are enormous: data quality issues undermine basic credit risk management. For example, without reliable reference data, such as single-client identifiers or product codes, banks struggle to connect exposures across different products to the same counterparty. Without timely delivery of critical data, such as mark-to-market collateral values, net exposure calculations may be incorrect.
Indeed, poor data often means that credit officers spend more time debating the accuracy of underlying data than they do addressing the critical credit issues that the data should inform. At a large bank, we found credit officers spent, on average, one to two hours every day checking and remediating exposure and collateral data.
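To make the point about client identifiers and collateral values concrete, the following is a minimal Python sketch, not a description of any bank's actual system. The record layout, field names, the 30-day staleness threshold, and the function net_exposure_by_client are all assumptions made for illustration. It groups exposures by client identifier, nets collateral against them, flags stale collateral valuations, and sets aside records that cannot be linked to a counterparty.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical exposure and collateral records; the schema is illustrative only.
exposures = [
    {"client_id": "C001", "product": "term_loan", "exposure": 5_000_000.0},
    {"client_id": "C001", "product": "fx_forward", "exposure": 1_200_000.0},
    {"client_id": "C002", "product": "revolver", "exposure": 3_000_000.0},
    {"client_id": None, "product": "trade_finance", "exposure": 750_000.0},
]
collateral = [
    {"client_id": "C001", "value": 2_000_000.0, "valued_on": date(2020, 6, 1)},
    {"client_id": "C002", "value": 1_000_000.0, "valued_on": date(2020, 3, 1)},
]


def net_exposure_by_client(exposures, collateral, as_of, max_age_days=30):
    """Aggregate gross and net exposure per client and surface data gaps."""
    gross = defaultdict(float)
    unlinked = []  # exposures with no client identifier cannot be aggregated
    for row in exposures:
        if row["client_id"] is None:
            unlinked.append(row)
        else:
            gross[row["client_id"]] += row["exposure"]

    collateral_value = defaultdict(float)
    stale = set()  # clients whose collateral valuation is older than the threshold
    for row in collateral:
        collateral_value[row["client_id"]] += row["value"]
        if as_of - row["valued_on"] > timedelta(days=max_age_days):
            stale.add(row["client_id"])

    report = {}
    for client_id, gross_amount in gross.items():
        report[client_id] = {
            "gross": gross_amount,
            # Net exposure is floored at zero (over-collateralized positions).
            "net": max(gross_amount - collateral_value[client_id], 0.0),
            "stale_collateral": client_id in stale,
        }
    return report, unlinked


report, unlinked = net_exposure_by_client(exposures, collateral, as_of=date(2020, 6, 15))
for client_id, figures in report.items():
    print(client_id, figures)
print("Unlinked exposures:", len(unlinked))
```

In this sketch the trade finance exposure drops out of every client total because it carries no identifier, and the net figure for C002 rests on a valuation more than three months old. Both are the kinds of gaps the text describes.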
COVID Has Been a Wake-Up Call
The COVID-19 crisis has served as a wake-up call for risk departments, underscoring the urgent need for better credit data. In the absence of clean data, credit officers have scrambled to answer critical questions about primary exposures to vulnerable industries such as airlines, retail, and oil and gas. Many have been at a loss to explain potential secondary exposures to affected industries via supply chains or investment funds. Others have struggled to track drawdown behavior across products at a client level, connecting the dots only after a considerable delay. At one bank, it took a week to pull together total exposure to oil across all products.
The crisis has shown that banks with better credit risk data can see potential problems earlier and react more quickly, getting ahead of peers with poor data. They can also operate more efficiently, dispensing with legions of “data elves” that manually validate and clean dirty data.
The benefits of clean credit data have not been lost on regulators. Even before the current crisis, many regulators had become more proactive in pushing banks to show credible plans for addressing the root causes of credit data issues. Amid continuing economic uncertainty, banks need to be able to quickly respond when the next spasm strikes. Regulators are watching.
Indeed, the ECB’s recent letter to Significant Institutions underscores its concern about banks’ ability to manage distressed portfolios.
Progress on Cleaning Data Has Been Limited
Many banks do not have a solid operational approach for their critical data processes. At most banks, the lack of process-related KPIs to track drivers of poor data (e.g., manual uploads, adjustments, hand-offs, checks and controls) is revealing. Banks are not applying the same level of operational rigor to critical report generation processes that they apply to, say, mortgage or other key customer processes.
The systems in place to ensure proper data “ingestion” into risk processing environments do not work well. In fact, traditional ETL (Extract-Transform-Load) tools have become a major cause of data indigestion. These systems struggle to control the quality of incoming data. For large banks, multiple layers of business rules (amounting to millions of lines of code) parse thousands of feeds per day from dozens — sometimes hundreds — of different data sources, creating an impenetrable barrier for those diagnosing the root causes of data issues.
Banks need to rethink their approach to cleaning up credit data as a matter of urgency. Based on practices at peers that are making progress, we have four recommendations for next-generation credit data remediation.
Recommendation One: Source Consolidation
Leading banks are consolidating upstream data repositories to create single sources of truth or “golden sources” for critical types of data. Source consolidation reduces the effort that risk departments (and others) need to expend in data ingestion by decreasing the number of feeds by as much as 75%. Building golden sources for reference data domains (for example, book, product, party, legal entities, instruments) is particularly beneficial because so much risk data remediation work typically involves sorting through inconsistent reference data to create comparable data sets.
Leaders are also ensuring that the data in these “golden sources” are accurate, up-to-date and complete, reducing the effort involved in downstream data clean-up by as much as 95%. Since “accurate” may have a different meaning for risk than for data owners, a few banks have put in place service level agreements to define specific data quality standards.
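A service level agreement of this kind can be expressed as explicit, testable thresholds. The sketch below is one possible shape, assuming a hypothetical QualitySla structure with a completeness floor and a maximum data age; the field names, thresholds, and sample party records are invented for illustration and are not taken from any bank's actual agreement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class QualitySla:
    """Hypothetical SLA terms agreed between a data owner and risk."""
    required_fields: tuple
    min_completeness: float  # share of records with every required field populated
    max_age: timedelta       # maximum acceptable time since the last refresh


def check_sla(records, last_refreshed, now, sla):
    """Measure a golden-source extract against the SLA and list any breaches."""
    total = len(records)
    complete = sum(
        all(rec.get(field) not in (None, "") for field in sla.required_fields)
        for rec in records
    )
    completeness = complete / total if total else 0.0
    age = now - last_refreshed

    breaches = []
    if completeness < sla.min_completeness:
        breaches.append("completeness")
    if age > sla.max_age:
        breaches.append("timeliness")
    return {"completeness": completeness, "age": age, "breaches": breaches}


# Illustrative party reference data; one record lacks a legal entity.
party_records = [
    {"party_id": "P1", "legal_entity": "LE-01", "country": "DE"},
    {"party_id": "P2", "legal_entity": "LE-02", "country": "FR"},
    {"party_id": "P3", "legal_entity": "", "country": "IT"},
    {"party_id": "P4", "legal_entity": "LE-04", "country": "ES"},
]
sla = QualitySla(
    required_fields=("party_id", "legal_entity", "country"),
    min_completeness=0.95,
    max_age=timedelta(hours=24),
)
result = check_sla(
    party_records,
    last_refreshed=datetime(2020, 6, 15, 6, 0),
    now=datetime(2020, 6, 15, 8, 0),
    sla=sla,
)
print(result)
```

Writing the standard down in this form gives risk and the data owner the same unambiguous definition of "complete" and "up-to-date". Accuracy checks would need a reference to compare against and are left out of this sketch.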
Recommendation Two: Treat Your (Data) Indigestion
Pioneer banks are replacing their existing, over-engineered ETL tools with modern data orchestration layers, often leveraging tools from cloud storage providers that are highly customizable and less expensive than traditional tools.
These orchestration layers are typically simpler, more efficient and more flexible than older data ingestion systems. They contain far fewer business rules and have “user friendly” libraries that enable non-technical users to have a clear understanding of previously “hard-coded” adjustments.
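One way to picture a rule library that non-technical users can read is to store each rule as data with a plain-language description, rather than burying it in transformation code. The following Python sketch is an assumption about how such a library might look, not a depiction of any specific orchestration product; the rule names, the KNOWN_PRODUCT_CODES set, and the record fields are hypothetical.

```python
# Hypothetical reference list of valid product codes.
KNOWN_PRODUCT_CODES = {"LOAN", "REVOLVER", "FX_FWD", "TRADE_FIN"}

# Each rule is a small, named entry with a description a business user can read.
RULES = [
    {
        "name": "client_id_present",
        "description": "Every record must carry a client identifier.",
        "check": lambda rec: bool(rec.get("client_id")),
    },
    {
        "name": "known_product_code",
        "description": "The product code must appear in the reference list.",
        "check": lambda rec: rec.get("product_code") in KNOWN_PRODUCT_CODES,
    },
    {
        "name": "non_negative_exposure",
        "description": "Exposure must be a number greater than or equal to zero.",
        "check": lambda rec: isinstance(rec.get("exposure"), (int, float))
        and rec["exposure"] >= 0,
    },
]


def validate(record, rules=RULES):
    """Return the names of all rules the record fails (empty list if clean)."""
    return [rule["name"] for rule in rules if not rule["check"](record)]


def describe_rules(rules=RULES):
    """Produce a plain-language catalogue of the rules applied at ingestion."""
    return [f'{rule["name"]}: {rule["description"]}' for rule in rules]


clean = {"client_id": "C001", "product_code": "LOAN", "exposure": 100.0}
dirty = {"client_id": "", "product_code": "XYZ", "exposure": -5}

print(validate(clean))
print(validate(dirty))
for line in describe_rules():
    print(line)
```

Because the rules are listed in one place with descriptions, a reviewer can see exactly which adjustments and checks are applied to a feed without reading the surrounding pipeline code.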
Recommendation Three: Measure, Measure, Measure
Leading banks are also deploying incentives and penalties in order to encourage ownership and accountability among data providers. Corporate goodwill and good intentions are often not enough. Without carrots and sticks, it is tough to motivate busy executives to dedicate time and effort to clean data.
A few years ago, after a series of reporting issues, a leading European bank set up a central data quality team to monitor and control data inputs. This team flagged “dirty data” issues and communicated them back to data providers. If these issues persisted, the offending data providers received a punitive charge for internal reporting purposes until the problem was fixed. Senior executive bonuses could be directly affected by a failure to respond. After one year, the bank noticed a dramatic turnaround in its data quality, with errors and restatements falling by 80%. Moreover, the scheme created much greater awareness of (and cultural aversion to) dirty data.
Leading banks like the one mentioned above are measuring data quality with a level of precision that allows them to identify the root causes of problems. Ideally, these measurements should rely on controls and checks carried out at the source of data to give risk and other users advance warning of issues.
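Measurement at this level of precision can start from something as simple as a failure rate per source system and per check. The sketch below assumes that source-level controls emit one (source, rule, passed) result per record checked; the source names, rule names, and the quality_scorecard function are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter


def quality_scorecard(check_results):
    """Summarize check outcomes per (source, rule), worst failure rate first.

    check_results: iterable of (source, rule, passed) tuples, assumed to be
    emitted by controls running at the source of the data.
    """
    totals = Counter()
    failures = Counter()
    for source, rule, passed in check_results:
        totals[(source, rule)] += 1
        if not passed:
            failures[(source, rule)] += 1

    rows = [
        {
            "source": source,
            "rule": rule,
            "checked": count,
            "failed": failures[(source, rule)],
            "failure_rate": failures[(source, rule)] / count,
        }
        for (source, rule), count in totals.items()
    ]
    return sorted(rows, key=lambda row: row["failure_rate"], reverse=True)


# Illustrative results from two hypothetical source systems.
results = (
    [("loans_system", "client_id_present", True)] * 3
    + [("loans_system", "client_id_present", False)]
    + [("collateral_system", "valuation_fresh", True)] * 2
    + [("collateral_system", "valuation_fresh", False)] * 2
    + [("loans_system", "known_product_code", True)] * 4
)

scorecard = quality_scorecard(results)
for row in scorecard:
    print(row)
```

Ranking by failure rate points attention at the specific source and rule responsible for most of the dirty data, which is the starting point for assigning ownership and, where a bank chooses to, incentives or charges.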
Recommendation Four: Clean Up the Clean-Up Crews
Leading banks are revisiting the need for large teams of manual data fixers. Many risk departments have built clean-up crews to validate, correct and enrich data. These armies of data fixers often constitute a cheap and expedient way to deal with dirty data.
But the remedy can be worse than the disease over the long run. Clean-up crews reinforce a culture of manual workarounds that can become self-perpetuating and delay a proper reckoning with the root causes of dirty data.
Re-Thinking Credit Data
Banks can no longer afford to ignore the knocking and rattling coming from their credit engines. Poor credit data — like contaminated fuel in a combustion engine — can undermine performance, make it difficult to keep up with competitors and ultimately cause a complete breakdown.
Against the backdrop of the current crisis, banks need to revisit their existing programs and take decisive action to clean up credit data once and for all.