
Storage Tiers – Putting Data in Its Place

I’m frequently surprised by the number of companies that haven’t transitioned to a tiered storage structure.  All data is not created equal.  While a powerful database may place extreme demands on storage, word processing documents do not.

As we move into a new world of “big data”, more emphasis needs to be placed on deciding which class of disk each type of data should reside on.  Although there are no universally accepted standards for storage tier designations, the breakdown frequently goes as follows:

Tier 0 – Solid state devices

Tier 1 – 15K RPM SAS or FC Disks

Tier 2 – 10K RPM SAS or FC Disks

Tier 3 – 7200 or 5400 RPM SATA (a.k.a. NL-SAS) Disks

So why is a tiering strategy important for large quantities of storage?  Let’s take a look at similar storage models for 1 petabyte of data:
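
As a rough frame of reference, here is a minimal sketch of how such a comparison might be modeled: one layout places the entire petabyte on Tier 1 disk, the other spreads it across tiers.  The drive capacities, unit prices, and tier mix are illustrative assumptions only; they are not vendor quotes, and not the figures behind the estimate below.

```python
import math

# Rough drive-cost model for 1 PB of raw capacity.  Drive sizes, unit prices,
# and the tier mix are illustrative assumptions only; they are not vendor
# quotes, and not the figures behind the estimate quoted in the text.

PETABYTE_TB = 1000   # raw capacity target, in terabytes

# (capacity per drive in TB, assumed unit price in USD)
TIER1_DRIVE = (0.6, 450)   # 600 GB 15K RPM SAS
TIER2_DRIVE = (0.9, 350)   # 900 GB 10K RPM SAS
TIER3_DRIVE = (2.0, 250)   # 2 TB 7200 RPM NL-SAS

def drive_cost(capacity_tb: float, drive: tuple) -> int:
    """Cost of enough drives to hold the capacity, at the assumed unit price."""
    size_tb, unit_price = drive
    return math.ceil(capacity_tb / size_tb) * unit_price

# Model A: the entire petabyte lives on Tier 1 disk.
all_tier1 = drive_cost(PETABYTE_TB, TIER1_DRIVE)

# Model B: a tiered layout with 15% Tier 1, 35% Tier 2, and 50% Tier 3.
tiered = (drive_cost(PETABYTE_TB * 0.15, TIER1_DRIVE)
          + drive_cost(PETABYTE_TB * 0.35, TIER2_DRIVE)
          + drive_cost(PETABYTE_TB * 0.50, TIER3_DRIVE))

print(f"All Tier 1 drive cost: ${all_tier1:,}")
print(f"Tiered drive cost:     ${tiered:,}")
print(f"Difference:            ${all_tier1 - tiered:,}")
```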

The difference in disk drive expense alone comes to more than $225,000, or around 30% of the equipment purchase price.  In addition, there are other issues to consider.

Pros:  

  • Reduces the initial purchase price by 25% or more
  • Improves energy efficiency by 25% – 35%, lowering operational cost and cooling requirements
  • Delivers substantial savings from reduced data center floorspace requirements
  • Increases overall performance for all applications and databases
  • Offers greater scalability and flexibility for matching storage requirements to business growth patterns
  • Provides additional resources for performance improvements (an increased number of ports, cache, controller power, etc.)
  • Brings a high degree of modularity that makes technical obsolescence easier to avoid
  • May moderate the demand for technical staff needed to manage continual storage growth

Cons: 

  • Requires automated, policy-based data migration software to operate efficiently
  • Should employ enterprise-class frames for Tiers 0/1 and midrange arrays for Tiers 2/3
  • Incurs approximately a 15% cost premium for the enterprise-class storage that supports Tier 0/1 disks
  • Introduces a more complex storage architecture that requires good planning and design
  • Needs at least a rudimentary data classification effort for maximum effectiveness

So does the end justify the effort?  That is for each company to decide.  If data storage growth is fairly flat, it may be questionable whether the additional effort and expense are worth it.  However, if you are staggering under a 30% – 50% CAGR in storage growth like most companies, the cost reduction, increased scalability, and performance improvements may well justify the effort.

Big Data – Data Preservation or Simply Corporate Hoarding?

Several years ago my mother passed away.  As one of her children, I was faced with the challenge of helping clean out her home before it was put up for sale.  As we struggled to empty each room, I was both amazed and appalled by what we found.  There were artifacts from almost every year in school, bank statements from the 1950s, yellowing newspaper clippings, and greeting cards of all types and vintages.  Occasionally we’d find a piece that was worth our attention, but the vast majority of saved documents were just waste – pieces of useless information tucked away “just in case” they might someday be needed again.

Unfortunately, many corporations engage in the same sort of “hoarding”.  Vast quantities of low-value data and obsolete information are retained on spinning disk or archived on tape media forever, “just in case” they may be needed.  Multiple copies of databases, outdated binaries from application updates, copies of log files, ancient directories and files that were never deleted – all continue to consume capacity and resources.

Perhaps this strategy worked in years past, but it has long outlived its usefulness.  At industry-typical growth rates, the 2.5 petabytes of storage you struggle with today will explode to over 1.0 exabyte within 15 years!  That’s roughly a 400-fold increase in your need for storage capacity, backup and recovery, SAN fabric bandwidth, data center floor space, power and cooling, storage management, staffing, disaster recovery, and related support items.  The list of resources impacted by storage growth is extensive.  In a previous post I identified 46 separate areas that are directly affected by storage growth and must be scaled accordingly.  A 400x expansion will demand a simply stunning amount of hardware, software, facilities, support services, and other critical resources.  Deduplication, compression, and other size-reduction methods may provide temporary relief, but in most cases they merely defer the problem rather than eliminate it.
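
For concreteness, here is a quick calculation of how those growth rates compound from a 2.5 PB starting point over 15 years.  The rates used are the 30% – 50% CAGR range mentioned earlier; the 400-fold figure corresponds to growth near the top of that range.

```python
# How storage growth compounds over 15 years from a 2.5 PB starting point.
# The growth rates are the 30% - 50% CAGR range cited in the text; the point
# is the size of the multiplier, not a precise forecast.

START_PB = 2.5   # petabytes under management today
YEARS = 15

for cagr in (0.30, 0.40, 0.50):
    multiplier = (1 + cagr) ** YEARS
    future_pb = START_PB * multiplier
    print(f"{cagr:.0%} CAGR: {multiplier:5.0f}x -> {future_pb:7,.0f} PB "
          f"({future_pb / 1000:.2f} EB)")

# At a 50% CAGR the multiplier works out to roughly 440x, which is where the
# "over 1.0 exabyte within 15 years" figure comes from.
```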

The solution is obvious – reduce the amount of data being saved.  Determine what is truly relevant and save only information that has demonstrable residual value.  This requires a system of data classification, and a method for managing, migrating, and ultimately expiring files.  

Unfortunately, that is much easier said than done.  Attempt to perform data categorization manually and you’ll quickly be overwhelmed by the tsunami of data flooding the IT department.  Purchase one of the emerging commercial tools for data categorization, and you may be frustrated by how much content is evaluated incorrectly and assigned to the wrong categories.

Regardless of the challenges, there are very few viable alternatives to data classification for managing massive amounts of information.  Far greater emphasis should be placed on identifying and destroying low- or no-value files.  (Is there really sound justification for saving last Thursday’s cafeteria menu, or for knowing who won Employee-of-the-Month last July?)  Invest in an automated, policy-based management product that allows data to be demoted down through the storage tiers and ultimately destroyed, based on pre-defined company criteria.  Something has to “give”, or the quantity of retained data will eventually outpace future IT budget allocations for storage.
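
To make the idea concrete, here is a minimal sketch of the kind of age-based rule such a product might apply.  The tier names, age thresholds, scan path, and the use of last-access time as the only criterion are assumptions for illustration; a real policy engine would also draw on classification labels, business value, and legal-hold requirements.

```python
import os
import time
from pathlib import Path

# Minimal, illustrative age-based retention policy.  Tier names, thresholds,
# and the scan path are assumptions, not a real product's configuration.
DAY_SECONDS = 86400
POLICY = [                     # (maximum age in days, action)
    (90,           "keep on current tier"),
    (365,          "demote to Tier 2"),
    (1095,         "demote to Tier 3"),
    (float("inf"), "flag for deletion review"),
]

def evaluate(path: Path) -> str:
    """Pick the policy action for a file based on its last-access age."""
    age_days = (time.time() - path.stat().st_atime) / DAY_SECONDS
    for max_age_days, action in POLICY:
        if age_days <= max_age_days:
            return action
    return "flag for deletion review"   # defensive fallback

def scan(root: str) -> None:
    """Walk a directory tree and report what the policy would do with each file."""
    for dirpath, _subdirs, files in os.walk(root):
        for name in files:
            path = Path(dirpath) / name
            print(f"{evaluate(path):26}  {path}")

if __name__ == "__main__":
    scan("/data/departmental_shares")   # hypothetical share to evaluate
```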

In the end, the winning strategy will be to continually manage information retention, establishing an equilibrium and working toward a goal of near-zero storage growth.  It’s time to make data classification by value and projected “shelf-life” a part of the organization’s culture.