What Is Data Quality Management?
The data quality dashboard is green. The pipeline ran. The row counts match. The expected schemas validated. The completeness check passed.
Sales says the numbers are wrong. They have been wrong for a week.
I have been on the side of this where it looks like a pipeline problem. You start with the watermark, you trace the late-arriving partitions, you check the ingestion-lag dashboard, and the lag is within tolerance. Late data is annoying but it is not what is wrong here. The data is on time. The data is the right shape. The data still does not match what the business sees in its own systems.
Data quality programs fail the way pipeline runs fail when you are looking at the wrong window. The signal is technically clean. The signal is not what the consumer needed.
Step One — The Wrong Assumption
"We need better quality rules. Add another check."
"If the dashboard says green and the user says wrong, we just need more rules." — Every data quality program, year two
The first instinct is always to add coverage. Another null check. Another range constraint. Another freshness threshold. The rule library doubles every year. The dashboard gets denser. The proportion of "green" tiles approaches one hundred percent.
The proportion of correct numbers in the business does not move with it. The reason is that the rules are measuring whether the data is what the pipeline expected. The business question is whether the data is what the consumer expected. Those are not the same question, and the gap between them is where data quality programs go to die.
Step Two — The Partial Signal
Three of four signals look fine. The fourth is the shape, not the count.
The standard data quality dimensions — completeness, validity, uniqueness, timeliness — cover most of the failure modes you can detect inside the pipeline. They will catch the row that is missing a required field, the value that does not match the schema, the duplicate that broke the join, the partition that arrived after the SLA.
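To make the four in-pipeline dimensions concrete, here is a minimal sketch of what each check tends to look like. The table shape, field names, sample rows, and the six-hour SLA are all assumptions invented for illustration; they do not come from any particular tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample batch; field names and values are illustrative only.
NOW = datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": 1, "customer_id": "C-1001", "amount": 40.0, "loaded_at": NOW - timedelta(hours=1)},
    {"order_id": 2, "customer_id": "C-1002", "amount": 15.5, "loaded_at": NOW - timedelta(hours=2)},
    {"order_id": 3, "customer_id": "C-1003", "amount": 99.9, "loaded_at": NOW - timedelta(hours=3)},
]

def completeness(rows, required):
    """Every required field is present and not None."""
    return all(r.get(f) is not None for r in rows for f in required)

def validity(rows):
    """Values have the type and range the pipeline expects."""
    return all(
        isinstance(r.get("customer_id"), str)
        and isinstance(r.get("amount"), (int, float))
        and r["amount"] >= 0
        for r in rows
    )

def uniqueness(rows, key):
    """No duplicate keys that would fan out a join."""
    keys = [r[key] for r in rows]
    return len(keys) == len(set(keys))

def timeliness(rows, now, sla):
    """Every row landed within the SLA window."""
    return all(now - r["loaded_at"] <= sla for r in rows)

def run_checks(rows, now=NOW):
    """Run the four technical checks and return one pass/fail flag per dimension."""
    return {
        "completeness": completeness(rows, ["order_id", "customer_id", "amount"]),
        "validity": validity(rows),
        "uniqueness": uniqueness(rows, "order_id"),
        "timeliness": timeliness(rows, now, timedelta(hours=6)),
    }

if __name__ == "__main__":
    for name, passed in run_checks(rows).items():
        print(f"{name}: {'green' if passed else 'red'}")
```

Every one of these checks asks whether the data is what the pipeline expected. None of them asks what the field is supposed to mean.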
What they do not catch is the dimension that matters most for the consumer: conformance to the meaning the consumer assigned to the field. The pipeline thinks customer_id is a string. The consumer thinks customer_id is the canonical identifier that joins to the CRM. Both can be true and the data can still be wrong, because the producer started emitting a different ID system in March and never told anyone, and the field name is unchanged.
The dashboard is green. The consumer is broken. Three of four signals can come back clean while the fourth — the one nobody measured — is the only one that mattered.
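One way to put a number on that missing signal is to measure how many customer_id values still join to the consumer's reference set. The sketch below assumes a hypothetical set of CRM identifiers and two invented batches, one from before the producer's change and one from after. The 99 percent threshold is an arbitrary placeholder, not a recommendation.

```python
# Hypothetical CRM identifiers; in practice this is the consumer's reference set.
crm_ids = {"C-1001", "C-1002", "C-1003"}

# Invented batches. After the change the producer emits ids from a different
# system under the same field name. Every value is still a valid string.
february = ["C-1001", "C-1002", "C-1003", "C-1001"]
march = ["acct_9f2", "acct_a71", "C-1003", "acct_b08"]

def is_valid_shape(customer_id):
    """The technical check: a non-empty string passes."""
    return isinstance(customer_id, str) and len(customer_id) > 0

def join_rate(customer_ids, reference):
    """Share of ids that resolve against the consumer's reference set."""
    if not customer_ids:
        return 0.0
    matched = sum(1 for c in customer_ids if c in reference)
    return matched / len(customer_ids)

def conforms(customer_ids, reference, min_rate=0.99):
    """The semantic check: do the ids still mean what the consumer expects?"""
    return join_rate(customer_ids, reference) >= min_rate

if __name__ == "__main__":
    print(all(is_valid_shape(c) for c in march))  # True: the shape check stays green
    print(join_rate(february, crm_ids))           # 1.0
    print(join_rate(march, crm_ids))              # 0.25
    print(conforms(march, crm_ids))               # False
```

The check only works if the consumer supplies the reference set. That dependency is the point: the signal lives outside the pipeline.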
Step Three — The Failed Fix
You add a contract test. The producer doesn't know they own a contract.
So the team writes a contract test. Schema tests in dbt, expectations in Great Expectations, a JSON Schema sitting next to the table definition. The contract says: customer_id must be the CRM canonical identifier; the producer is responsible for emitting it correctly.
Then you discover the producer is a service team three reorgs away that has never heard of the contract test, never agreed to be bound by it, and is shipping a refactor next sprint that will rename the field. The contract is in your repo. The producer is not.
This is the moment most data quality programs break. The tooling now exists, the rule is now codified, and the relationship that the rule depends on does not exist. You can run the test as often as you want; it cannot enforce a contract on a party that did not sign it.
Fig. 1 — Three of four signals stay green. The fourth was never a signal you could measure inside the pipeline.
Step Four — The Real Failure
It was never a measurement gap. It was a relationship that nobody owned.
The actual failure is in the social structure around the data, not in the data. The producer has incentives that are not aligned with the consumer's expectations. The platform team owns the pipeline but not the contract. The consumer owns the report but not the source. There is no one whose job description includes maintaining the agreement between the two.
This is the failure that backup admins, DBAs, and pipeline engineers all eventually recognize: the technical system is doing exactly what it was built to do, and the thing that is broken is one layer up, in a workflow or an ownership map that was never written down. The dashboards that monitor the technical system cannot see the failure because the failure is not in the technical system.
The clean version of data quality is not a longer list of rules. It is a defined contract between every named producer and every named consumer, with explicit responsibilities, an SLA, and a process for changing the contract that does not involve the consumer finding out by way of a wrong number on Tuesday.
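A contract of that kind can be written down as data before it is enforced by anything. The sketch below is one possible shape, not a standard: the class names, team names, and fields are all hypothetical. It records a named producer and consumer, an SLA, and who must approve or be notified before a change ships.

```python
from dataclasses import dataclass, field
from datetime import timedelta

@dataclass(frozen=True)
class FieldContract:
    """Hypothetical record of an agreement about one field."""
    table: str
    field_name: str
    meaning: str
    producer: str
    consumer: str
    freshness_sla: timedelta
    approvers: frozenset
    notify: frozenset

@dataclass
class ChangeRequest:
    """A proposed change to a contracted field, tracked against the protocol."""
    contract: FieldContract
    description: str
    approved_by: set = field(default_factory=set)
    notified: set = field(default_factory=set)

    def missing_steps(self):
        """List every protocol step that has not happened yet."""
        gaps = []
        for who in sorted(self.contract.approvers - self.approved_by):
            gaps.append(f"approval needed from {who}")
        for who in sorted(self.contract.notify - self.notified):
            gaps.append(f"notification not sent to {who}")
        return gaps

    def can_ship(self):
        return not self.missing_steps()

# Illustrative values only; the team names are invented.
contract = FieldContract(
    table="orders",
    field_name="customer_id",
    meaning="CRM canonical customer identifier",
    producer="checkout-service team",
    consumer="sales-analytics team",
    freshness_sla=timedelta(hours=6),
    approvers=frozenset({"checkout-service team", "sales-analytics team"}),
    notify=frozenset({"data-platform team"}),
)

if __name__ == "__main__":
    change = ChangeRequest(contract, "rename customer_id to account_id")
    change.approved_by.add("checkout-service team")
    print(change.can_ship())  # False
    for gap in change.missing_steps():
        print(gap)
```

The code is the easy part. The record is only worth something if both named parties have actually agreed to what it says.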
Step Five — The Definition
Now the definition lands.
Data quality management is the discipline of maintaining the contract between producers and consumers of data — through measurement, yes, but more importantly through named ownership, change protocols, and remediation pathways for when the contract breaks. Quality is not a property of the data. It is a property of the relationship around it.
Most definitions describe data quality through its dimensions: accuracy, completeness, consistency, timeliness, validity, uniqueness. These are the metrics. The metrics are not the discipline. A team can score perfectly on every dimension and still ship wrong numbers, because what was measured was technical conformance and what was needed was business meaning.
The discipline is the contract. The metrics are how you tell whether the contract is being honored.
What Solix Enforces
Quality lives at the boundary, not at the table.
In this category, the Solix Common Data Platform enforces the contract layer between the systems of record and the analytical, governance, and AI consumers downstream. When a record is captured into the governed platform, it carries its provenance, its retention rule, its access policy, and its semantic contract with it. These are bound at the boundary, not inferred at the table.
This matters most at the moments quality programs typically fail: when a source system is replaced, when a producer reorgs, when a regulator asks for the lineage of a specific decision. The contract survives the source. The data quality program does not have to start over every time the upstream changes.
Three things to do this week
- Pick one report nobody trusts and trace it back to its named producer. Walk every join, every field, every transformation. The exercise almost always surfaces a producer who does not know they are a producer, or a contract that was implied and never agreed. The producer-consumer pair you find is the one to formalize first.
- Audit your data quality rules for technical-only coverage. Count how many rules check the shape of the data versus how many check whether the data means what the consumer expects. If more than 80 percent of the rules are technical, the dashboard will keep going green while the business keeps going wrong.
- Establish a change protocol for one critical field. Pick the most-used field in the most-used table. Write the protocol: who must approve a change, who must be notified, what happens to the contract test, what the rollback path is. The protocol becomes the template. Without one, every change is a fire drill.
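The rule audit in the second item can start as a simple tally. The sketch below assumes someone has already tagged each rule by hand as "technical" or "semantic". The rule names and inventory are invented, and the 80 percent cutoff is the one from the list above.

```python
from collections import Counter

# Hypothetical rule inventory; tagging each rule is a manual judgment call.
rules = [
    {"name": "orders.customer_id is not null", "kind": "technical"},
    {"name": "orders.amount is non-negative", "kind": "technical"},
    {"name": "orders.order_id is unique", "kind": "technical"},
    {"name": "orders loaded within 6 hours", "kind": "technical"},
    {"name": "orders schema matches definition", "kind": "technical"},
    {"name": "orders.customer_id joins to CRM", "kind": "semantic"},
]

def technical_share(rules):
    """Fraction of rules that only check the shape of the data."""
    counts = Counter(r["kind"] for r in rules)
    total = sum(counts.values())
    return counts["technical"] / total if total else 0.0

def needs_attention(rules, threshold=0.80):
    """True when the rule library is weighted past the threshold toward shape checks."""
    return technical_share(rules) > threshold

if __name__ == "__main__":
    print(f"technical share: {technical_share(rules):.0%}")  # technical share: 83%
    print(needs_attention(rules))                             # True
```

The tally is only as honest as the tagging. A rule that checks a format and is labeled "semantic" will hide exactly the gap the audit is meant to expose.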
About the author
Barry Kunst is VP of Marketing at Solix Technologies. He writes about enterprise data lifecycle, application retirement, and modernization in systems that have outlived their original mandate. Earlier in his career he supported IBM zSeries ecosystems for CA Technologies' multi-billion-dollar mainframe business, with first-hand exposure to lifecycle risk at scale.
