SAP archiving before S/4HANA migration: what to move out before you move up
The HANA pre-sizing report comes back, and a room of competent people goes very quiet. The number is eighteen times the estimate. Someone at the back, the FICO lead who has been with the company longer than most of the project, says the words out loud: we are about to buy in-memory database licenses to store invoices from 2009.
The board approved the migration twelve months ago on a number that assumed a clean, current operational dataset. The actual dataset is twelve years of FI documents, eight years of stuck workflow runs, a complete history of every printout an SAP job has ever produced, and a SOH table whose row count nobody has audited since the last upgrade.
And so the project pivots. Not to a smaller migration. To a data-reduction workstream that should have started eighteen months ago.
This is the shape of the conversation in almost every SAP shop preparing for S/4HANA in 2026. The migration was scoped as a technical project. The data underneath it was scoped as a project for later. The order is exactly backward, and the cost of getting it backward is paid in HANA memory dollars, every quarter, forever.
Step One — The Wrong Assumption
We'll archive after we cut over.
"We're not going to slow the migration down for an archiving project. We'll get on S/4 first and clean up later."
— Standard pre-migration steering position, every SAP program in 2025
It is the most common pre-migration position, and it is wrong in a specific way. The migration carries the data it inherits. The bigger the dataset that crosses the cutover, the bigger the HANA system that gets sized to host it, the bigger the license tier, the longer the conversion window, the longer the test cycles, and the bigger the recurring cost the business absorbs from the day the migration ends. "Archive later" sounds operationally neutral. It is not. It is a decision to set the recurring cost ceiling at the level of every record the company has ever produced, including the ones it has been legally allowed to delete for a decade.
Step Two — The Partial Signal
The sizing number arrives, and the wrong question gets asked.
The early signal that this is happening is the HANA sizing report. The report converts the existing ECC database into a projected HANA memory footprint, and in most shops it comes back surprisingly large. The instinctive response — across architecture, infra, and the SAP partner running the migration — is to negotiate the sizing. More nodes. Bigger nodes. A different deployment model. A scale-out instead of scale-up.
None of those responses addresses the question the sizing report is actually trying to surface: how much of this data needs to be in memory at all? Operational HANA was designed to hold current, queried data. The pre-sizing exercise is the moment to find out how much of the database is not that.
Step Three — The Failed Fix
Buying more HANA memory to make the problem fit.
The failed fix is procurement. The program calls SAP, calls the hyperscaler, calls the implementation partner, and works out a deal for a larger HANA target. The migration proceeds. The number on the invoice changes; the underlying data-placement question is untouched. Twelve months after cutover, the same room of competent people will be staring at a HANA growth report that adds another node every quarter, because no archiving has been turned on, no retention rule has been authored, and the system is by design loading every new document into memory and keeping it there.
The fix did not fix anything because the cost driver was never the size of the system — it was the absence of any discipline for what does not belong in the system. Without that discipline the HANA bill grows monotonically with operational time.
Fig. 1 — When the archiving design slips to "after migration," the migration carries the unreduced data — and the HANA cost ceiling is set by history, not by current operations.
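The growth pattern described above can be made concrete with a toy model. What follows is a minimal sketch, not a sizing method: the starting footprint, the quarterly inflow, and the residence window are all hypothetical figures chosen for illustration, and the function name is invented here. It compares a system where nothing ever leaves memory with one where each quarter's documents are archived once they age past a residence window.

```python
def footprint_by_quarter(start_gb, inflow_gb_per_quarter, quarters,
                         residence_quarters=None):
    """Project the in-memory footprint (GB) at the end of each quarter.

    residence_quarters=None models "no archiving": nothing ever leaves.
    Otherwise each quarter's inflow is archived once it is older than
    residence_quarters. For simplicity the starting footprint is treated
    as live data that stays in place.
    """
    sizes = []
    for q in range(1, quarters + 1):
        if residence_quarters is None:
            live_quarters = q                      # every quarter still resident
        else:
            live_quarters = min(q, residence_quarters)  # older inflow has moved out
        sizes.append(start_gb + inflow_gb_per_quarter * live_quarters)
    return sizes


# Hypothetical inputs: 2,000 GB at cutover, 150 GB of new documents per quarter.
no_archiving = footprint_by_quarter(2000, 150, 12)
with_archiving = footprint_by_quarter(2000, 150, 12, residence_quarters=8)

for q, (a, b) in enumerate(zip(no_archiving, with_archiving), start=1):
    print(f"Q{q:>2}: no archiving {a} GB | with archiving {b} GB")
```

Under these assumed inputs the unarchived line keeps climbing by the full inflow every quarter, while the archived line flattens once the residence window fills. The real curve depends on document mix and compression, which this sketch ignores.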
Step Four — The Real Failure
The actual failure is treating data reduction as a post-cutover cleanup.
The real failure is sequencing. SAP shops that arrive at S/4HANA with a healthy operational footprint do the archiving design before the migration design — not after. They identify the top archive objects (typically FI_DOCUMNT, MM_MATBEL, SD_VBAK, RV_LIKP, WORKITEM, and the IDoc and spool history tables), they author the retention rules in SAP ILM,[1] they stand up the nearline target,[2] and they run the reductions while the legacy system is still the system of record. By the time the migration starts, the dataset crossing the cutover is the dataset the business actually operates on, plus whatever the retention rules require to remain live.
The shops that arrive in the opposite order — migrate first, archive later — pay HANA prices on the unreduced dataset for as long as the platform runs, and they pay them on top of an implementation cost that was sized to a smaller number than what they actually have to host.
Step Five — The Definition
Now the definition lands.
SAP archiving is the SAP-certified discipline of moving completed business documents out of the operational database, governed by retention rules authored in SAP ILM, with continued read access through SAP standard transactions and reporting — typically into a nearline (NLS) tier for low-cost queryable retention.
Notice what is and is not in the definition. The data does not leave the SAP world; it leaves the in-memory tier. Retention is not implemented by a custom job; it is governed by an ILM policy that auditors can read. And the read path through standard transactions stays intact, which is the property that lets the business stop paying for the document while continuing to be able to find it.
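The distinction between "leaves the in-memory tier" and "leaves the SAP world" can be sketched as a small classification rule. This is an illustration of the logic only, not the SAP ILM API: `RetentionRule`, `classify`, and `add_years` are hypothetical names, and the two-year residence and ten-year retention periods are assumed values. Actual periods depend on jurisdiction and document type and belong in the signed-off ILM policy.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionRule:
    archive_object: str
    residence_years: int   # years a completed document stays in the operational DB
    retention_years: int   # years it must stay readable before destruction is allowed


def add_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def classify(rule, completed_on, today, legal_hold=False):
    """Return 'live', 'archive', or 'destroy-eligible' for one document."""
    past_retention = today >= add_years(completed_on, rule.retention_years)
    past_residence = today >= add_years(completed_on, rule.residence_years)
    if past_retention and not legal_hold:
        return "destroy-eligible"
    if past_residence:
        return "archive"       # out of memory, still readable from the nearline tier
    return "live"


# Hypothetical rule and dates, for illustration only.
rule = RetentionRule("FI_DOCUMNT", residence_years=2, retention_years=10)
today = date(2026, 1, 15)
print(classify(rule, date(2009, 6, 30), today))                   # destroy-eligible
print(classify(rule, date(2009, 6, 30), today, legal_hold=True))  # archive
print(classify(rule, date(2023, 3, 1), today))                    # archive
print(classify(rule, date(2025, 5, 1), today))                    # live
```

The point of writing the rule as data rather than as a custom job is the one the definition makes: a reviewer can read the periods directly, and a legal hold changes the outcome without changing the code.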
SAP-certified archiving + ILM + NLS, designed as a single pre-migration motion.
What Solix runs in this category is the full SAP-certified archiving pipeline — BC-ILM-SE and BC-ILM-NLS certified — wrapped around the data-reduction sequence the migration needs. ILM rules are authored against the live ECC system. The archive objects are scoped, run, and verified. The nearline target carries the cold tier with full ADK round-trip. The dataset that crosses the cutover is the reduced one. The HANA bill that follows is sized to the reduced one, every quarter, for as long as the platform runs.
The mechanism is not a Solix invention; it is the SAP-defined ILM-and-NLS pattern. What Solix supplies is the discipline to run it on the calendar where it actually matters — before the migration carries the data forward — and the certified plumbing that keeps the SAP read path intact while it happens.
Three things to do this week
- Run a HANA pre-sizing report with retention rules applied. Re-run the standard pre-sizing exercise with a retention model overlaid against your top archive objects. The delta between the raw size and the retention-applied size is the data-reduction prize, in HANA memory dollars per year. Most SAP shops have never seen this comparison drawn.
- Identify the top five archive objects driving your footprint. In almost every ECC dataset, five archive objects carry 70–85% of the residual data volume: FI_DOCUMNT, MM_MATBEL, SD_VBAK or RV_LIKP, the workflow history (WORKITEM), and the spool/IDoc archive history. Scope them first; the long tail is small relative to those five.
- Author retention before the migration, not after it. Stand the ILM policy up against the live ECC system, get the policy reviewed and signed off by records-management and the responsible compliance owners, and run the archive jobs while ECC is still the system of record. Migration after that point crosses the reduced dataset; migration before that point bakes the unreduced one into HANA permanently.
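The first two items above reduce to simple arithmetic once the per-object volumes are known. The sketch below assumes a hypothetical inventory: the gigabyte figures are invented, `OTHER` is a placeholder for the long tail rather than a real archive object, and the function names are illustrative. It computes the raw size, the retention-applied size, the delta between them, and a ranking of objects by archivable volume.

```python
# Hypothetical inventory: (archive object, total GB, GB still inside the residence window).
# All figures are invented for illustration; OTHER stands in for the long tail.
inventory = [
    ("FI_DOCUMNT", 900, 180),
    ("MM_MATBEL", 600, 120),
    ("SD_VBAK", 400, 100),
    ("WORKITEM", 350, 35),
    ("IDOC", 250, 25),
    ("OTHER", 500, 400),
]


def sizing_delta(rows):
    """Return (raw GB, retention-applied GB, delta GB) for the inventory."""
    raw = sum(total for _, total, _ in rows)
    retained = sum(live for _, _, live in rows)
    return raw, retained, raw - retained


def top_objects(rows, n=5):
    """Rank archive objects by archivable volume (total minus live), largest first."""
    ranked = sorted(rows, key=lambda row: row[1] - row[2], reverse=True)
    return [(name, total - live) for name, total, live in ranked[:n]]


raw, retained, delta = sizing_delta(inventory)
print(f"raw: {raw} GB, retention-applied: {retained} GB, delta: {delta} GB")
for name, archivable in top_objects(inventory):
    print(f"{name:<12} {archivable} GB archivable")
```

In practice the inputs would come from the system's own table and archive-object analysis rather than a hand-written list. The sketch only shows the comparison the first bullet asks for.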
References
[1] SAP Help Portal: SAP Information Lifecycle Management (ILM), Overview. SAP ILM is the framework that governs retention, legal hold, and destruction across SAP data.
[2] SAP Help Portal: Nearline Storage for SAP BW and SAP BW/4HANA. Nearline storage is the SAP-defined tier for cold, queryable retention outside of HANA memory.
[3] SAP Help Portal: Data Archiving with the Archive Development Kit (ADK). The ADK is the engine that handles read-back of archived documents from standard transactions.
[4] Gartner press: Gartner Forecasts Worldwide Data Management Software Spending. Gartner's data-management forecast tracks the structural shift of spend toward retention-tier and storage-optimized platforms.
[5] EUR-Lex (official EU law): GDPR Article 5(1)(e), Storage limitation. The storage-limitation principle makes retention not just a cost lever but a compliance obligation under EU law.
About the author
Barry Kunst writes Solix's lived-narrative series — engineer-voiced reads on data lifecycle, archival, and governance, drawn from real failure modes across mainframe ops, DBA work, integration, and modernization. This piece draws on a series of SAP basis and FICO conversations during ECC-to-S/4HANA pre-sizing, where the gap between estimated and actual data volumes is almost always the moment the archiving design moves from "later" to "this quarter."
- Solix Leadership
- Forbes Technology Council
- MIT
