Database archiving on a tiered storage architecture is moving to center stage in this era of global warming, accelerating data storage demand, rising energy costs, and overcrowded physical data centers. From an archiving point of view, business data comes in three forms: structured data from formal databases, unstructured data from informal sources, and semi-structured data (such as email) that contains unstructured content in a structured metadata wrapper. While semi-structured data is claiming the greatest attention at present, structured data growth is accelerating as well, and from a data archiving standpoint it raises its own important and often neglected issues.
Gartner estimates that 80% of the structured data in most enterprise data centers is inactive. Gartner also estimates that a staggering 50% of enterprise data centers worldwide will run out of power and cooling capacity this year (2008), due in part to the out-of-control growth of storage systems and subsystems. And experts estimate that by next year (2009), growing energy costs will emerge as the second largest line item on 70% of enterprise IT operating budgets worldwide. Given that many IT organizations are entering their 2009 budget process now, and that pressures to "do more with less" are only increasing, data archiving is no longer an "ideal" or a "visionary statement." It is an issue that must be addressed this year, and in many cases immediately. In the greater picture, scientists are issuing shrill warnings that if we do not cut our carbon load drastically, global warming will rise to disaster levels.
Inactive data occupies valuable Tier 1 and Tier 2 disk space, an effect greatly multiplied by the database copies that proliferate in a large IT environment. It adds extra compute load, decreasing performance; increases power use and heat loads; shortens the lifespan of primary processing systems; and forces premature migration to expensive, larger disk storage systems. It can also have a marked effect on the lifespan of the physical data center itself, since most data centers are replaced or rebuilt because they have run out of space, power, or cooling long before they are physically obsolete.
Conversely, archiving, which means moving the data to a lower tier in a multi-tier system (for instance, to inexpensive SATA disks), and possibly to a secondary site accessed through the network, can relieve pressure on the data center dramatically, extend the life of primary processors, and delay the purchase of new high-performance storage. Comprehensively purging inactive data from all secondary copies can have a dramatic positive effect on the data center, and it can decrease litigation risk by removing information that might be used against the organization in court. Data that is completely inactive but needed for compliance can be moved to tape or to un-powered disks kept off site.
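The archiving step described above amounts to a simple policy: find rows untouched past some cutoff, copy them to a lower tier, then purge them from primary storage. A minimal sketch of that policy, using an in-memory SQLite database with a hypothetical `orders` table and a hypothetical `last_accessed` column (the table name, column, and one-year cutoff are illustrative assumptions, not from the source):

```python
import sqlite3
from datetime import datetime, timedelta

# The in-memory database stands in for the primary (Tier 1) store;
# the orders_archive table stands in for a lower, cheaper storage tier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, last_accessed TEXT)")
conn.execute("CREATE TABLE orders_archive (id INTEGER, last_accessed TEXT)")

now = datetime(2008, 6, 1)
rows = [
    (1, (now - timedelta(days=400)).isoformat()),  # inactive
    (2, (now - timedelta(days=10)).isoformat()),   # active
    (3, (now - timedelta(days=900)).isoformat()),  # inactive
]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

# Illustrative policy: anything untouched for a year is "inactive".
cutoff = (now - timedelta(days=365)).isoformat()

# Copy inactive rows to the archive tier, then purge them from Tier 1.
conn.execute(
    "INSERT INTO orders_archive SELECT * FROM orders WHERE last_accessed < ?",
    (cutoff,),
)
conn.execute("DELETE FROM orders WHERE last_accessed < ?", (cutoff,))
conn.commit()

active = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0]
print(active, archived)  # 1 2
```

In a production environment the same copy-then-purge pattern would run against the real database and tiered storage, with the cutoff set by the organization's retention and compliance policies.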
This can delay, or in some cases eliminate, the need for major server and storage system upgrades, saving not only the hardware purchase price but also networking, power and cooling, server provisioning and administration, and disposal of old systems. It also has a dual positive impact on the environment: it cuts demand for energy that is often generated by burning coal, natural gas, or petroleum, and it delays the disposal of older systems, which often end up leaking pollutants into groundwater.
Most IT departments have no idea of the energy requirements of the various boxes in their data centers, which means they have a major opportunity to reduce their energy use, and save on their operating budgets, by optimizing that energy use. The first step is to analyze and optimize the energy demands of each system. In many cases, data archiving will play a major role in such efforts. Thus, identifying and archiving the huge volumes of inactive data found in most environments is a green strategy in both senses of that word, and it can have a major positive impact on both the IT budget and the data center's carbon footprint, a clear win for all concerned.
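The energy analysis described above can begin with simple back-of-the-envelope arithmetic. The sketch below estimates the yearly power-and-cooling savings from retiring storage arrays after archiving; every figure in it (per-array draw, cooling overhead, utility rate, array counts) is an illustrative assumption, not a measured or sourced value:

```python
# Back-of-the-envelope energy model for a storage tier.
# All constants below are illustrative assumptions for the sketch.
ARRAY_POWER_KW = 6.0    # assumed draw of one Tier 1 disk array, in kW
COOLING_OVERHEAD = 0.8  # assumed cooling watts needed per IT watt
COST_PER_KWH = 0.10     # assumed utility rate, dollars per kWh
HOURS_PER_YEAR = 24 * 365

def annual_cost(arrays: int) -> float:
    """Yearly power-plus-cooling cost of running this many arrays."""
    total_kw = arrays * ARRAY_POWER_KW * (1 + COOLING_OVERHEAD)
    return total_kw * HOURS_PER_YEAR * COST_PER_KWH

# If archiving the inactive data lets us retire 4 of 10 arrays:
savings = annual_cost(10) - annual_cost(6)
print(round(savings))  # 37843
```

Plugging in an organization's own measured wattages and utility rates turns this from an illustration into a first-pass business case for the archiving project.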