An Introduction to Data Quality
A new analyst joins the team. Their first question is the most useful one anyone asks all year: what does data quality actually mean here?
The senior team gives four different answers in the same meeting. None of them are wrong. None of them agree.
I have seen the four answers many times. The data engineer says quality is whether the pipeline produces what was specified. The analyst says quality is whether the numbers match the source of truth. The compliance partner says quality is whether the data meets the regulatory standard. The product manager says quality is whether the dashboard supports the decision. Each of them is accurate within their own job. None of them is the same answer.
This has the same shape as ETL pipeline debugging, where every team's definition of "the bug" is locally consistent and globally incompatible. The data engineer fixes the pipeline. The analyst fixes the join. The compliance partner fixes the field. The product manager rewrites the dashboard. None of them is wrong. None of them, alone, fixes the problem. The problem sits at the layer where the four definitions are supposed to align, and that layer is rarely staffed.
Step One — The Wrong Assumption
"Data quality is the accuracy and completeness of the data."
"Data quality is the property of data being fit for use — accurate, complete, consistent, timely."
The standard definition is correct, useful, and structurally insufficient as a starting point. It treats quality as a property of the data — the data is or is not accurate, complete, consistent, timely. The implication is that if the data has these properties, quality is met.
The structural failure is that "accurate," "complete," "consistent," and "timely" are not properties of the data alone. They are properties relative to a use. Data accurate enough for a marketing dashboard is not accurate enough for a financial close. Data complete enough for a customer-success workflow is not complete enough for a regulatory filing. Quality is the relationship between the data and the use, and the standard introductions describe one half of the relationship and treat the other as a constant.
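A minimal Python sketch can make the relational point concrete: the same measured error passes for one use and fails for another. The tolerance figures and the names `TOLERANCE_BY_USE` and `fit_for_use` are illustrative assumptions, not values taken from any standard.

```python
# Hypothetical tolerances: the largest relative error each use can absorb.
# These numbers are illustrative assumptions, not industry standards.
TOLERANCE_BY_USE = {
    "marketing_dashboard": 0.05,   # within 5% is enough to read a trend
    "financial_close": 0.0001,     # within 0.01% or the books do not tie out
}


def relative_error(reported: float, reference: float) -> float:
    """Relative difference between a reported value and its reference."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(reported - reference) / abs(reference)


def fit_for_use(reported: float, reference: float, use: str) -> bool:
    """True when the error is within what the named use can tolerate."""
    return relative_error(reported, reference) <= TOLERANCE_BY_USE[use]


# Same data, same error (1.2%), two different verdicts.
reported_revenue = 1_012_000.0
reference_revenue = 1_000_000.0
for use in TOLERANCE_BY_USE:
    print(use, fit_for_use(reported_revenue, reference_revenue, use))
```

The data did not change between the two calls. Only the use did, which is the half of the relationship the standard definition treats as a constant.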
Step Two — The Partial Signal
The quality dimensions are well measured. What goes unmeasured is whose dimension you are measuring.
The DAMA-DMBOK quality dimensions — accuracy, completeness, consistency, timeliness, validity, uniqueness — are real and useful. Each can be measured. Each has tooling. Each appears on most data quality dashboards. A program that measures all six is doing better than a program that measures none.
What the dimensions do not specify is the consumer whose use defines them. Accuracy against what reference? Completeness for which use case? Consistency across which sources? Timeliness for which workflow? The dimensions describe categories of measurement; they do not, on their own, specify the binding to a particular consumer's expectation. Two dashboards built on the same dataset can both score as "high quality" on their own checks while the data is substantively wrong for one of the two consumers, because that consumer's use was never bound to the measurement.
This is the partial signal in introductory programs. The dimensions are taught. The consumer-specific binding is left as an exercise.
Step Three — The Failed Fix
The team builds dashboards for each dimension. The CFO still gets a wrong number.
The natural response to the gap is to build out the dimensions further. More accuracy rules. More completeness checks. More consistency assertions across sources. The dashboard expands. The proportion of green tiles approaches one hundred percent.
The CFO still gets a wrong number, because the wrong number is not produced by a violation of any of the rules. It is produced by a definition mismatch between two systems that each pass their own checks. Marketing's customer is anyone who registered. Finance's customer is anyone with a paid invoice. Both definitions are internally consistent. Neither system is failing its own quality checks. The CFO's question, which assumed "customer" meant one thing, gets two answers, both technically correct.
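The mismatch is easy to reproduce in a few lines. The sketch below applies the two definitions from the paragraph above to one shared set of records; the `Person` type and the sample rows are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: int
    registered: bool
    paid_invoices: int


# Hypothetical records visible to both systems.
PEOPLE = [
    Person(1, registered=True, paid_invoices=2),
    Person(2, registered=True, paid_invoices=0),
    Person(3, registered=True, paid_invoices=1),
    Person(4, registered=True, paid_invoices=0),
]


def marketing_customers(people):
    """Marketing's definition: anyone who registered."""
    return [p for p in people if p.registered]


def finance_customers(people):
    """Finance's definition: anyone with a paid invoice."""
    return [p for p in people if p.paid_invoices > 0]


marketing_count = len(marketing_customers(PEOPLE))
finance_count = len(finance_customers(PEOPLE))

# Both functions are correct against their own definition,
# yet "how many customers do we have?" gets two answers.
print("marketing:", marketing_count, "finance:", finance_count)
```

No accuracy, completeness, or consistency rule fires here. Every row is valid, and the two counts still disagree, because the disagreement lives in the definition and not in the data.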
The fix did not fix anything because it built more measurement at the dimension layer and skipped the consumer-binding layer. The dimensions are necessary; they are not the discipline.
Fig. 1 — Dimensions are the loud number. The producer-consumer pair is the unit that has to bind them.
Step Four — The Real Failure
It was never about the dimensions. It was about the relationship the dimensions are supposed to measure.
The actual structure of data quality is two-layered. The dimensions are the measurement. The contract between producer and consumer is the thing being measured. Quality is the alignment between what the consumer expected and what the producer delivered, with the dimensions as the language for describing the alignment. Programs that work at the dimension layer alone produce dashboards that say healthy and consumers who do not trust the numbers.
The clean introduction to data quality is therefore: quality is a property of a relationship. The producer is one party. The consumer is the other. The dimensions are the dictionary the two parties use to specify what they expect. Without the relationship, the dimensions describe nothing in particular. With the relationship, the dimensions are operational.
This framing changes what an introductory program builds first. Not a profiler. Not a dimension dashboard. A registry of producer-consumer pairs and the contracts between them. The dimensions then attach to the contracts, the contracts to the relationships, the relationships to the consumers whose decisions the data is supposed to support.
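What such a registry might look like can be sketched briefly. The version below is an assumption-laden illustration, not a reference design: the class names `Expectation`, `Contract`, and `ContractRegistry`, the 0.0 to 1.0 scoring of each dimension, and the sample thresholds are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Expectation:
    """One contract term, phrased in the vocabulary of a dimension."""
    dimension: str      # e.g. "completeness" or "timeliness"
    minimum: float      # lowest acceptable score, assumed to run 0.0 to 1.0


@dataclass
class Contract:
    """What one producer promises one consumer about one dataset."""
    producer: str
    consumer: str
    dataset: str
    expectations: List[Expectation] = field(default_factory=list)

    def unmet(self, measured: Dict[str, float]) -> List[str]:
        """Dimensions whose measured score is missing or below the minimum."""
        return [
            e.dimension
            for e in self.expectations
            if measured.get(e.dimension, 0.0) < e.minimum
        ]


class ContractRegistry:
    """Registry keyed by the producer-consumer pair, not by the dataset alone."""

    def __init__(self) -> None:
        self._contracts: Dict[Tuple[str, str, str], Contract] = {}

    def register(self, contract: Contract) -> None:
        key = (contract.producer, contract.consumer, contract.dataset)
        self._contracts[key] = contract

    def for_dataset(self, dataset: str) -> List[Contract]:
        return [c for c in self._contracts.values() if c.dataset == dataset]


registry = ContractRegistry()
registry.register(Contract("crm", "marketing", "customers",
                           [Expectation("completeness", 0.90)]))
registry.register(Contract("crm", "finance", "customers",
                           [Expectation("completeness", 0.999),
                            Expectation("timeliness", 0.95)]))

# One set of measurements, evaluated against each consumer's contract.
measured = {"completeness": 0.97, "timeliness": 0.99}
for contract in registry.for_dataset("customers"):
    print(contract.consumer, contract.unmet(measured))
```

The design choice worth noticing is the key: the same dataset appears twice, once per consumer, so a single completeness score of 0.97 satisfies marketing's contract and fails finance's.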
Step Five — The Definition
Now the definition lands.
Data quality is the fitness of data for the use a specific consumer makes of it — described through the dimensions of accuracy, completeness, consistency, timeliness, validity, and uniqueness, and measured against a contract between the producer and the consumer. Quality is a property of the relationship. The dimensions are how the relationship is specified.
Most introductions describe quality as a property of data. The standard definition is not wrong; it is incomplete in the way that produces the most expensive failure mode. Programs that internalize the standard definition build dimension dashboards. Programs that internalize the relational definition build producer-consumer contracts and use the dimensions as the contract language.
Programs of the first kind produce healthy-looking metrics and consumer mistrust. Programs of the second kind produce fewer metrics and consumer trust. After a year, the operational difference is enormous.
What Solix Enforces
Quality lives at the boundary, not in the middle of the table.
Solix's governance and archival platform enforces the contract layer between producers and consumers, bound at the boundary the data crosses. When records leave a system of record, their schema, semantic contract, and intended consumer constraints travel with them in metadata, retention policy, and access controls. The dimension dashboards on the consumer side become diffs against the contract, not measurements in isolation.
For SAP ECC, Oracle E-Business Suite, custom application retirement, and the AI training pipelines that depend on historical records being faithful to their original semantics, the same pattern applies. The contract survives the source. The quality program does not have to start over every time the upstream changes.
Three things to do this week
- List the producer-consumer pairs for your three most important reports. For each report, name the producer (the system that emits the data), the consumer (the team that uses the report), and the contract (what the producer promises and the consumer expects). The exercise reveals which pairs have a real contract and which have an implicit one. Implicit contracts are where the failures live.
- Pick one dimension on one report and audit its consumer-binding. Completeness on the most-used dashboard is a good place to start. Ask the dashboard owner what completeness threshold they actually need, then compare it to what the pipeline measures. The mismatch is the size of the consumer-binding gap on that one report. Multiply by the number of reports.
- Stop building dimension dashboards until you have producer-consumer pairs. Adding more dimensions to a system without producer-consumer pairs produces more measurements that consumers do not trust. The honest move is to invest in the relational layer first. The dimensions then attach to relationships, which is the only layer that turns measurement into trust. Without the relationships, the dashboards report numbers nobody is willing to act on.
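The first two exercises can be started with very little tooling. The sketch below is one possible shape for them, with hypothetical report names, systems, fields, and rows: it flags pairs whose contract was never written down, then measures completeness twice on the same rows, once as the pipeline checks it and once as the consumer needs it.

```python
from typing import NamedTuple, Optional


class ReportPair(NamedTuple):
    report: str
    producer: str
    consumer: str
    contract: Optional[str]   # None means the promise was never written down


# Hypothetical inventory for three reports.
PAIRS = [
    ReportPair("monthly_revenue", "billing_db", "finance",
               "amounts reconciled to the ledger"),
    ReportPair("active_customers", "crm", "marketing", None),
    ReportPair("churn_forecast", "events_pipeline", "product", None),
]


def implicit_contracts(pairs):
    """Reports whose producer-consumer contract is only implicit."""
    return [p.report for p in pairs if p.contract is None]


def completeness(rows, required_fields):
    """Share of rows in which every required field is present and non-empty."""
    if not rows:
        return 0.0
    complete = sum(
        all(row.get(f) not in (None, "") for f in required_fields)
        for row in rows
    )
    return complete / len(rows)


# Hypothetical rows behind the most-used dashboard.
ROWS = [
    {"customer_id": 1, "region": "EMEA"},
    {"customer_id": 2, "region": ""},
    {"customer_id": 3, "region": "APAC"},
    {"customer_id": 4, "region": "AMER"},
]

# Assumption: the pipeline only checks the key, the consumer also needs region.
pipeline_view = completeness(ROWS, ["customer_id"])
consumer_view = completeness(ROWS, ["customer_id", "region"])
binding_gap = pipeline_view - consumer_view

print("implicit contracts:", implicit_contracts(PAIRS))
print("pipeline:", pipeline_view, "consumer:", consumer_view, "gap:", binding_gap)
```

The pipeline's tile is green at full completeness while a quarter of the rows are unusable for the consumer. That difference is the consumer-binding gap for this one report.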
About the author
Barry writes Solix's lived-narrative series — engineer-voiced reads on data lifecycle, archival, and governance, drawn from real failure modes across mainframe ops, DBA work, integration, and modernization. This piece draws on ETL pipeline ops because the contract-tests-pass-bug-ships pattern shows up earliest in pipeline integration work, where the contract is local and the bug is at the consumer.
