What Is Metadata?
The data catalog has 14,000 assets. The adoption dashboard says 41% of business users have logged in at least once. The catalog team is asking for budget to ingest two more sources next quarter.
The CFO's analyst still asks the same Slack channel where to find the revenue table.
I have seen this shape in Airflow investigations that start from the logs, on DAGs that are technically working. Every task has its run metadata, its operator metadata, its schedule metadata. The web UI displays all of it. The number of fields is impressive. The only one anyone clicks on is the duration column, because that is the field that answers the question they actually had.
Metadata catalogs fail this same way. The catalog is well populated, the technical schema is accurate, and adoption stalls because the catalog is answering "what does this column type look like" while the consumer is asking "is this the table I should trust for revenue."
Step One — The Wrong Assumption
"We need more coverage. Connect another source."
"Adoption is stuck because we don't have all the assets cataloged yet. Once we hit 100% coverage we'll see the lift." — Quarterly catalog roadmap, year two
This is the answer that buys you another quarter of catalog work and another quarter of flat adoption. The premise is that adoption is a function of coverage. Once every table is cataloged, the consumer will find the catalog useful, the flywheel will spin, and the program will deliver.
The premise is wrong. Adoption is not stuck on coverage. Adoption is stuck because the catalog has solved a different problem than the consumer needed solved. The catalog has the technical metadata of every table. The consumer needs to know which of two tables with similar names is the one finance considers official, who owns the discrepancies, and what the SLA is when the data is wrong. Those are not technical-metadata questions. Adding the next ten thousand tables does not move them.
Step Two — The Partial Signal
Three of four metadata layers are well populated. The fourth is the one consumers ask about.
Most catalogs do well on technical metadata: column names, types, sources, partitioning, freshness, last-modified. They do reasonably well on operational metadata: pipeline runs, owners-of-record, recent failures. They do moderately well on lineage metadata: where the data came from, how it was transformed.
They do badly on the layer the business actually needs, which is semantic metadata: what does this dataset mean in the context of the business, which of several similar tables is the canonical one for a given decision, what is the trust level, what is the change protocol, who decides when the meaning evolves.
This is the Airflow-logs lesson again: the metadata you can capture automatically is not the metadata your consumer is going to use. The automatic capture is necessary; it is not sufficient. The fourth layer requires people, agreements, and review cadences, none of which an ingestion pipeline can produce on its own.
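To make the four layers concrete, here is a minimal sketch of a catalog entry grouped by layer. The `CatalogEntry` class, the `missing_layers` helper, and the sample table name are all hypothetical, invented for illustration; no particular catalog product stores metadata this way.

```python
from dataclasses import dataclass, field

# Layer names follow the four-layer model described above.
LAYERS = ("technical", "operational", "lineage", "semantic")


@dataclass
class CatalogEntry:
    """One catalog asset with its metadata grouped by layer (illustrative shape)."""
    name: str
    technical: dict = field(default_factory=dict)    # column names, types, freshness
    operational: dict = field(default_factory=dict)  # pipeline runs, owner of record
    lineage: dict = field(default_factory=dict)      # upstream sources, transformations
    semantic: dict = field(default_factory=dict)     # meaning, canonical status, steward


def missing_layers(entry: CatalogEntry) -> list[str]:
    """Return the layers that carry no metadata for this entry."""
    return [layer for layer in LAYERS if not getattr(entry, layer)]


# Hypothetical asset: everything an ingestion pipeline can fill in is present.
revenue = CatalogEntry(
    name="finance.revenue_daily",
    technical={"columns": {"amount": "DECIMAL(18,2)", "booked_at": "DATE"}},
    operational={"last_run": "success", "owner_of_record": "data-eng"},
    lineage={"upstream": ["erp.invoices"]},
)

print(missing_layers(revenue))  # ['semantic']
```

The entry looks complete to an ingestion job and is still empty on the one layer the CFO's analyst was asking about.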
Step Three — The Failed Fix
You add a glossary. Nobody updates the glossary.
The catalog team eventually adds a business glossary — a layer where humans can document what a term means, who owns it, and which physical asset is the canonical implementation. The glossary launches with seventy entries, written by the data team, in two weeks.
Six months later, the glossary has eighty entries. Seven of them have a "last reviewed" date inside the last quarter. The rest are unchanged from launch. The terms that have evolved with the business — revenue, customer, active user — are technically present in the glossary and substantively wrong, because the definition that was true at launch is not the definition the business currently uses, and there is no review cadence that catches the drift.
The glossary did not fail because it was the wrong tool. It failed because the operating model around it does not exist. There is no named steward per term, no review cadence, no consequence for a stale definition. The catalog became a place where definitions go to expire.
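The drift is easy to measure once each term carries a steward and a review date. The sketch below flags terms with no named steward or with a review older than the cadence. The `GlossaryTerm` shape, the 90-day cadence, and the sample stewards and dates are assumptions made up for this example, not figures from any real glossary.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class GlossaryTerm:
    term: str
    steward: Optional[str]  # None means nobody is named for this term
    last_reviewed: date


def stale_terms(terms, today, cadence_days=90):
    """Return names of terms with no steward or a review older than the cadence."""
    cutoff = today - timedelta(days=cadence_days)
    return [
        t.term
        for t in terms
        if t.steward is None or t.last_reviewed < cutoff
    ]


# Hypothetical glossary rows.
terms = [
    GlossaryTerm("revenue", None, date(2025, 1, 15)),
    GlossaryTerm("customer", "finance-ops", date(2025, 1, 15)),
    GlossaryTerm("active user", "product-analytics", date(2025, 6, 20)),
]

print(stale_terms(terms, today=date(2025, 7, 15)))  # ['revenue', 'customer']
```

A report like this does not fix anything by itself. It only makes the missing operating model visible: someone still has to own the list and act on it.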
Fig. 1: Coverage is the loud symptom. Stewardship is the cause and the only fix.
Step Four — The Real Failure
It was never a tooling gap. It was a stewardship model that was never instantiated.
The actual failure is upstream of the catalog, in the definition of what stewardship means in this organization and who does it. The catalog tooling is fine. The technical metadata pipeline is fine. The lineage graph is fine. None of those address the question of who, specifically, on what cadence, with what authority, decides what a term means and updates the catalog when it changes.
Most catalog programs assume stewardship will emerge from making the catalog visible. It does not. Stewardship is a job, not a side effect. Without a named steward per domain, with capacity to do the work, with a review cadence and an escalation path when the business meaning shifts, the catalog drifts at exactly the speed the business changes — which is faster than it gets reviewed.
This is the same lesson backup admins, DBAs, and pipeline engineers keep encountering: the technical layer is a precondition for solving the problem, and it is not the same as solving the problem. The problem is in the operating model.
Step Five — The Definition
Now the definition lands.
Metadata is the data about data — structured at four layers (technical, operational, lineage, semantic) — whose value to the business depends entirely on the stewardship model around the semantic layer. The first three layers can be automated. The fourth is a job description.
The textbook definition of metadata says it is "data about data." That is true and almost useless on its own. The useful definition specifies what about the data (technical schema, operational history, lineage, business meaning) and acknowledges that the business-meaning layer requires a different operating model from the others.
Tooling solves the first three layers. Stewardship solves the fourth. Most catalog programs ship the tooling and skip the stewardship.
What Solix Enforces
Metadata that survives the source system is the only kind that pays back.
What Solix's archival and governance platform enforces in this category is the survival of metadata past the lifespan of the source system that produced it. When SAP ECC is decommissioned, the metadata about its records does not disappear with it. When a custom application is retired, the lineage of its data is preserved with the records themselves.
This matters for the same reason data quality contracts matter: the moment a source system is replaced, every catalog entry pointing at it becomes a tombstone unless the metadata was captured at the boundary, under policy, with the records. The catalog is a viewport on the metadata. The metadata has to live somewhere governed.
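The tombstone condition can be written down as a simple check. This is a sketch under stated assumptions: `CatalogPointer`, the `metadata_archived` flag, and the asset names are hypothetical and do not describe Solix's data model or any product API.

```python
from dataclasses import dataclass


@dataclass
class CatalogPointer:
    asset: str
    source_system: str
    metadata_archived: bool  # True if metadata was captured with the records


def tombstones(pointers, retired_systems):
    """Return assets that point at a retired system and have no archived metadata."""
    return [
        p.asset
        for p in pointers
        if p.source_system in retired_systems and not p.metadata_archived
    ]


# Hypothetical catalog pointers.
pointers = [
    CatalogPointer("sap_ecc.sales_orders", "SAP ECC", metadata_archived=False),
    CatalogPointer("sap_ecc.invoices", "SAP ECC", metadata_archived=True),
    CatalogPointer("crm.accounts", "CRM", metadata_archived=False),
]

print(tombstones(pointers, retired_systems={"SAP ECC"}))  # ['sap_ecc.sales_orders']
```

Running a check like this before decommissioning, rather than after, is the difference between capturing metadata at the boundary and discovering the gap once the source is gone.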
Three things to do this week
- Pick one ambiguous business term and ask three teams what it means. Pick something everyone uses, like 'active customer' or 'recognized revenue'. Ask three teams independently. The variance you find is the size of the semantic-metadata problem your catalog is not currently solving. The variance is almost always larger than anyone expects.
- Find the steward of your most-cited table. If there is no named steward, name one. If the named steward has not reviewed the entry this year, the entry is technically present and substantively expired. Stewardship is a job; if no one is doing the job, the catalog is documentation theater.
- Cap your catalog ingestion until the glossary review cadence works. Ingesting more sources without a working stewardship model produces more entries that will go stale. The honest move is to freeze coverage, fix the operating model, then resume. The number nobody wants to commit to is the operating one.
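The third item can be turned into an explicit gate. The sketch below uses the glossary figures from earlier in this piece (seven of eighty entries reviewed inside the last quarter). The 80% threshold and both function names are assumptions chosen for illustration; pick the number your stewards can actually commit to.

```python
def review_rate(reviewed_in_cadence: int, total_terms: int) -> float:
    """Share of glossary terms reviewed within the agreed cadence."""
    if total_terms == 0:
        return 0.0
    return reviewed_in_cadence / total_terms


def may_ingest_new_source(reviewed_in_cadence: int, total_terms: int,
                          min_rate: float = 0.8) -> bool:
    """Allow new ingestion only when the review cadence is demonstrably working."""
    return review_rate(reviewed_in_cadence, total_terms) >= min_rate


# Seven of eighty entries reviewed in the last quarter, as in the glossary example.
rate = review_rate(7, 80)
print(f"{rate:.2%}")                  # 8.75%
print(may_ingest_new_source(7, 80))   # False
```

The value of the gate is less the code than the commitment: a published threshold turns the freeze into a rule the roadmap has to respect.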
About the author
Barry Kunst is VP of Marketing at Solix Technologies. He writes about enterprise data lifecycle, application retirement, and modernization in systems that have outlived their original mandate. Earlier in his career he supported IBM zSeries ecosystems for CA Technologies' multi-billion-dollar mainframe business, with first-hand exposure to lifecycle risk at scale.