
SAN Fabric for the Next Generation

There’s a quiet revolution going on in large data centers. It’s not as visible or flashy as virtualization or deduplication, but it is at least equally important.

As its name implies, SAN “fabric” is a dedicated network that allows servers, storage arrays, backup & recovery systems, replication devices, and other equipment to pass data between systems. Traditionally this has consisted of 4Gbps Fibre Channel and 1Gbps Ethernet channels. However, a new family of 8Gbps and 16Gbps Fibre Channel, 6Gbps and 12Gbps SAS, and 10Gbps Ethernet links is quietly replacing legacy fabric with links capable of 2 to 4 times the performance.

The following is a comparison of the maximum throughput rates of various SAN fabric links:

(Chart: A comparison of available SAN channel speeds.)

Performance ranges from the relatively outdated 1Gbps channel (Ethernet or FC), capable of supporting data transfers of up to 100 MB per second, to 16Gbps Fibre Channel, capable of handling 1940 MB per second. Since all are capable of full duplex (bi-directional) operation, the sustainable throughput rate is actually twice the speed indicated in the chart. If these blazing new speeds are still insufficient, 10Gbps Ethernet, 12Gbps SAS, and 16Gbps Fibre Channel can be “trunked” – bundled together to produce an aggregate bandwidth equal to the sum of the individual channels tied together. (For example, eight 16Gbps FC channels can be bundled to create a 128Gbps “trunk”.)

In addition to higher channel speeds, 10Gbps Ethernet and 16Gbps Fibre Channel both implement a 64b/66b encoding scheme rather than the 8b/10b scheme used by lower-performance channels. Encoding improves the reliability of data transmission, but at a cost: 8b/10b encoding reduces available bandwidth by 20%, while 64b/66b encoding reduces it by only 3.03%. The newer scheme therefore transfers data significantly more efficiently.
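As a rough check, the throughput figures quoted above can be reproduced by taking the nominal channel speed, removing the encoding overhead, and converting bits to bytes. The sketch below uses the nominal rates and encoding schemes discussed in this post; actual line rates vary slightly by standard, so treat it as illustrative arithmetic rather than a specification.

```python
# Approximate one-way payload throughput: nominal rate, minus encoding
# overhead, converted from bits to bytes. Figures are illustrative.

ENCODING_EFFICIENCY = {"8b/10b": 8 / 10, "64b/66b": 64 / 66}

def effective_mb_per_sec(nominal_gbps, encoding):
    """Payload throughput in MB/s for one direction of the link."""
    return nominal_gbps * 1e9 * ENCODING_EFFICIENCY[encoding] / 8 / 1e6

links = [
    ("1 Gbps FC/Ethernet", 1, "8b/10b"),
    ("4 Gbps FC", 4, "8b/10b"),
    ("8 Gbps FC", 8, "8b/10b"),
    ("10 Gbps Ethernet", 10, "64b/66b"),
    ("16 Gbps FC", 16, "64b/66b"),
]

for name, gbps, enc in links:
    one_way = effective_mb_per_sec(gbps, enc)
    # Full duplex doubles the sustainable rate; an 8-channel trunk multiplies it again.
    print(f"{name:18s} {one_way:6.0f} MB/s one-way, "
          f"{2 * one_way:6.0f} MB/s full duplex, "
          f"{8 * one_way:6.0f} MB/s as an 8-channel trunk")
```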

While 8/16Gbps Fibre Channel and 10Gbps Ethernet are changing the game at the front-end, SAS is revolutionizing the back-end disk drive connections as well. For over a decade, enterprise-grade disks had 2Gbps or 4Gbps ports and were attached to a Fibre Channel Arbitrated Loop (FC-AL). Like any loop-based technology, it delivered maximum speed under light traffic, but performance dropped off as demand increased. Under heavy load, the back-end bus could become a bottleneck.

SAS changes that for two reasons. First, it uses switched technology, so every device attached to the controller “owns” 100% of the bus bandwidth; the latency “dog-leg” pattern found on busy FC-AL buses is eliminated. Second, current SAS drives are shipping with 6Gbps ports, which are 50% faster than 4Gbps Fibre Channel. Just over the horizon are 12Gbps SAS speeds that will offer three times the bandwidth of 4Gbps Fibre Channel to the disks, and do it over switched (isolated) channels.

Recent improvements in fabric performance will support emerging SSD technology, and allow SANs to gracefully scale to support storage arrays staggering under a growth rate of 40% – 50% per year.

“Big Data” Challenges our Perspective of Technology

It’s easy to hold onto the idea that IT is all about systems, networks, and software. This has been accepted wisdom for the past 50 years. It’s a comfortable concept, but one that is increasingly inaccurate and downright dangerous as we move into an era of “big data”! In today’s world it’s not about systems, networks, applications, or the datacenter – it’s all about the data!

For decades, accumulated data was treated as a simple by-product of information processing activities. However, there is growing awareness that stored information is not just digital “raw material”, but a corporate asset containing vast amounts of innate value. Like any other high-value asset, it can be bought, sold, traded, stolen, enhanced, or destroyed.

A good analogy for today’s large-scale storage array is a gold mine. Data is the gold embedded in the mine. The storage arrays are the “mine” that houses and protects the resident data. Complex and sophisticated hardware, software, tools, and skill-sets are simply the means used to locate, manipulate, and extract the “gold” (data assets) from its surrounding environment. The presence of high-value “nuggets” is the sole reason the mining operation exists. If there were no “gold”, the equipment used to extract and manipulate it would be of little value.

This presents a new paradigm. For years storage was treated as a secondary peripheral, considered only when new systems or applications were being deployed. Today storage has an identity of its own, independent of the other systems and software in the environment.

Data is no longer just a commodity or some type of operational residue left over from the computing process. “Big Data” forces a shift in focus from deploying and administering IT assets to managing high-value data assets. It places data at the center of concentric rings, ensuring that security, recoverability, accessibility, performance, data manipulation, and other aspects of data retention are each addressed as distinct disciplines with their own requirements. Information must now be captured, identified, valued, classified, assigned to resources, protected, managed according to policy, and ultimately purged from the system after its value to the organization has been expended.

This requires a fundamental change in corporate culture. As we move into an era of “big data”, the entire organization must become aware of information’s value as an asset and embrace the shift away from technology-centric approaches to IT management. Just like the gold in the analogy above, users must recognize that all data is not “created equal”; it delivers different levels of value to an organization for specific periods of time. For example, financial records typically have a high level of inherent value and retain it for a defined period of time. (The Sarbanes-Oxley Act requires publicly traded companies to retain audit-related documents for no less than seven years after the completion of an audit. Companies in violation can face fines of up to $10 million, and their executives prison sentences of up to 20 years.)

However, differences in value must be recognized and managed accordingly. Last week’s memo about the cafeteria’s luncheon specials must not be retained and managed in the same fashion as an employee’s personnel record. When entered into the system, information should be classified according to a well-defined set of guidelines. With that classification it can be assigned to an appropriate storage tier, backed up on a regular schedule, kept available on active storage as long as necessary, and later written to low-cost archive media to meet regulatory and litigation compliance needs. Once data no longer delivers value to the organization, it can be expired by policy, freeing up expensive resources for re-use.
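To make the lifecycle concrete, here is a minimal sketch of the kind of policy-driven handling described above. The data classes, tier names, and retention periods are purely illustrative assumptions, not a reference to any particular product or regulation.

```python
# A minimal sketch of policy-driven data lifecycle handling.
# All classes, tiers, and retention periods below are hypothetical examples.

from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class RetentionPolicy:
    storage_tier: str         # where the data lives while active
    archive_after: timedelta  # move to low-cost archive media after this age
    expire_after: timedelta   # purge entirely after this age

POLICIES = {
    "financial_record": RetentionPolicy("tier1", timedelta(days=365), timedelta(days=7 * 365)),
    "personnel_record": RetentionPolicy("tier2", timedelta(days=2 * 365), timedelta(days=10 * 365)),
    "general_memo":     RetentionPolicy("tier3", timedelta(days=30), timedelta(days=180)),
}

def lifecycle_action(data_class, created, today):
    """Return the action the policy dictates for a file of a given class and age."""
    policy = POLICIES[data_class]
    age = today - created
    if age >= policy.expire_after:
        return "expire"
    if age >= policy.archive_after:
        return "archive to tape"
    return f"keep on {policy.storage_tier}"

print(lifecycle_action("general_memo", date(2011, 1, 3), date(2011, 11, 1)))      # expire
print(lifecycle_action("financial_record", date(2011, 1, 3), date(2011, 11, 1)))  # keep on tier1
```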

This approach moves IT emphasis away from building systems tactically by simply adding more of the same, and toward sophisticated management tools and utilities that automate the process. Clearly articulated processes and procedures must replace “tribal lore” and anecdotal knowledge in managing the data repositories of tomorrow.

“Big Data” ushers in an entirely new way of thinking about information as stored, high-value assets.  It forces IT Departments to re-evaluate their approach for management of data resources on a massive scale.   At a data growth rate of 35% to 50% per year, business-as-usual is no longer an option.  As aptly noted in a Bob Dylan song, “the times they are a-changin”.   We must adapt accordingly, or suffer the consequences.

Tape vs. Disk – It’s Time for a Truce

Over the past couple of years we’ve heard an enthusiastic debate over whether tape is dead, obsolete, or simply relegated to some secondary role like regulatory compliance or litigation response. A lot of the “uproar” has been created by highly vocal deduplication companies marketing their products by sowing fear, uncertainty, and doubt (FUD) among users. And why not, since most of these vendors have no significant presence in the tape backup market and therefore little to lose?

However, there are companies with legitimate concerns about managing their backup process and the direction they should take.   They need to understand what the facts are and why one approach would be superior to another.

With this goal in mind, let’s take a look at two popular backup & recovery devices – the LTO-5 tape drive and the 3.0 TB SATA disk, and see how they compare.

Specification                   | Quantum Internal half-high LTO-5 | Seagate Constellation ES.2 ST33000651SS
--------------------------------|----------------------------------|----------------------------------------
Average drive cost-per-unit     | $1,450                           | $495
Typical media cost-per-unit     | $54 per tape                     | N/A
Native formatted capacity       | 1500 GB                          | 3000 GB
Native sustained transfer rate  | 140 MB/s                         | 155 MB/s
Data buffer size                | 256 MB                           | 64 MB
Average file access time        | 56 sec.                          | 12.16 ms
Interfaces available            | 6 Gb/s SAS                       | 6 Gb/s SAS
Typical duty cycle              | 8 hrs/day                        | 24 hrs/day
Encryption                      | AES 256-bit                      | AES 256-bit
Power consumption – Idle        | 6.7 Watts                        | 7.4 Watts
Power consumption – Typical     | 23.1 Watts                       | 11.3 Watts
Drive MTBF                      | 50,000 hours at 100% duty cycle  | 1,200,000 hours
Media MTBF                      | 30 years                         | –
Non-recoverable error rate      | 1 bit in 1×10^17 bits            | 1 sector in 1×10^15 bits
Warranty                        | 3-year                           | 3-year
Storage cost-per-GB             | $0.036                           | $0.165
5-year total for 1.0 Petabyte   | $37,495                          | $165,330

No surprises here. Simply doing the math indicates the cost to store 1 Petabyte of data for 5 years would be more than four times higher on spinning disk than on tape media. Granted, there are other factors involved in the process, but most offset each other. Both a tape library and a disk array take data center floor space and infrastructure resources. Both consume power and require cooling. Each system must be managed by skilled IT specialists. Deduplication may reduce disk capacity requirements (reducing cost), but so will tape compression and/or increasing the tape drive’s duty cycle from 8 to 12 hours per day. Surprisingly, the only major variable over time is the cost of the media, which is heavily weighted in favor of tape.
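For reference, the 5-year totals in the table can be roughly reconstructed from the unit prices and capacities listed above. The sketch below simply counts how many cartridges or drives are needed to hold 1 PB native and multiplies by unit cost; it lands within a rounding error of the table’s figures, so treat it as illustrative arithmetic only.

```python
# Rough reconstruction of the 5-year, 1 PB media cost comparison above.
# Unit prices and capacities are taken from the table.

import math

PETABYTE_GB = 1_000_000

# Tape: one LTO-5 drive plus enough 1500 GB cartridges to hold 1 PB native.
tape_drive_cost = 1_450
tape_media_cost = 54
tape_capacity_gb = 1_500
tapes_needed = math.ceil(PETABYTE_GB / tape_capacity_gb)
tape_total = tape_drive_cost + tapes_needed * tape_media_cost

# Disk: enough 3 TB SATA drives to hold 1 PB native.
disk_cost = 495
disk_capacity_gb = 3_000
disks_needed = math.ceil(PETABYTE_GB / disk_capacity_gb)
disk_total = disks_needed * disk_cost

print(f"Tape: {tapes_needed} cartridges, ~${tape_total:,}")
print(f"Disk: {disks_needed} drives,    ~${disk_total:,}")
print(f"Disk costs ~{disk_total / tape_total:.1f}x more")
```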

In the foreseeable future the 4TB SATA disk will make the above calculations somewhat more favorable for the disk drive.  However, we expect to see the LTO-6 tape drive in production in the second half of 2012, increasing the tape drive’s sustained transfer rate by 30% and tape media capacity by 47%.  This will bring the above tape vs. disk comparison back into close alignment.

The sensible strategy is to develop a backup and recovery system that incorporates both technologies, to capitalize on the strengths of both.   Using disk “pools” to aggregate nightly backups (whether deduplicated or not) ensures backup windows can be met, and greatly improves data restoration time.  Backing up directly to tape from the “disk pools” allows streaming data to be sustained for maximum performance and transfers data to the lowest-cost media available for long-term archiving, disaster recovery, regulatory compliance, and litigation response.

It’s time to put this argument to bed. Both tape drives and SATA disk should play a role in a well-designed, highly optimized backup and recovery system. The “war” is over, and for once both combatants won!

16 Gbps Fibre Channel – Do the Benefits Outweigh the Cost?

With today’s technology there can be no status quo.  As the IT industry advances, so must each organization’s efforts to embrace new equipment, applications, and approaches.  Without an ongoing process of improvement, IT infrastructures progressively become outdated and the business group they support grows incrementally less effective.

In September of 2010, the INCITS T11.2 Committee ratified the standard for 16Gbps Fibre Channel, ushering in the next generation of SAN fabric. Unlike Ethernet, Fibre Channel is designed for one specific purpose – low-overhead transmission of block data. While this capability may be less important for smaller requirements where convenience and simplicity are paramount, it is critical for larger datacenters where massive storage repositories must be managed, migrated, and protected. For this environment, 16Gbps offers more than twice the usable bandwidth of the current 8Gbps SAN and roughly 60% more than the recently released 10Gbps Ethernet with FCoE (Fibre Channel over Ethernet).

But is an investment in 16Gbps Fibre Channel justified? If a company has reached a point where SAN fabric is approaching saturation or SAN equipment is approaching retirement, then definitely yes! Here is how 16Gbps stacks up against both slower Fibre Channel implementations and 10Gbps Ethernet.

Emulex Model | Port Speed | Protocol      | Average HBA/NIC Price | Transfer Rate | Transfer Time for 1TB | Bandwidth Cost per MB/sec. | Bandwidth Difference
-------------|------------|---------------|-----------------------|---------------|-----------------------|----------------------------|---------------------
LPe16002     | 16 Gbps    | Fibre Channel | $1,808                | 1939 MB/sec.  | 1.43 Hrs.             | $0.93                      | 160%
OCe11102     | 10 Gbps    | Ethernet      | $1,522                | 1212 MB/sec.  | 2.29 Hrs.             | $1.26                      | 100%
LPe12002     | 8 Gbps     | Fibre Channel | $1,223                | 800 MB/sec.   | 3.47 Hrs.             | $1.53                      | 65%
LPe11000     | 4 Gbps     | Fibre Channel | $891                  | 400 MB/sec.   | 6.94 Hrs.             | $2.23                      | 32%

This table highlights several differences between 4/8/16 Gbps Fibre Channel and 10Gbps Ethernet with FCoE technology (sometimes marketed as Unified Storage). The street prices for one popular I/O controller manufacturer clearly indicate relatively small differences between controller prices, particularly for the faster controllers. Although the 16Gbps HBA delivers roughly 60% more throughput than the 10Gbps Ethernet controller, it costs only about 19% more!
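The cost-per-bandwidth column can be verified directly from the table’s street prices and transfer rates. A quick sketch, with values copied from the table above:

```python
# Recomputing "Bandwidth Cost per MB/sec." from the table's prices and rates.

controllers = {
    # model: (price in USD, sustained transfer rate in MB/s)
    "LPe16002 (16Gb FC)": (1808, 1939),
    "OCe11102 (10GbE)":   (1522, 1212),
    "LPe12002 (8Gb FC)":  (1223, 800),
    "LPe11000 (4Gb FC)":  (891, 400),
}

for model, (price, rate) in controllers.items():
    print(f"{model:20s} ${price / rate:.2f} per MB/s of bandwidth")

# 16Gb FC vs. 10GbE: more throughput for a modest price premium.
p16, r16 = controllers["LPe16002 (16Gb FC)"]
p10, r10 = controllers["OCe11102 (10GbE)"]
print(f"Throughput premium: {r16 / r10 - 1:.0%}, price premium: {p16 / p10 - 1:.0%}")
```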

However, a far more important issue is that 16Gbps fibre channel is backward compatible with existing 4/8 Gbps SAN equipment.  This allows segments of the SAN to be gradually upgraded to leading-edge technology without having to suffer the financial impact of legacy equipment rip-and-replace approaches.

In addition to providing a robust, purpose-built infrastructure for moving large blocks of data, 16Gbps Fibre Channel also offers lower power consumption per port, a simplified cabling infrastructure, and the ability to “trunk” (combine) channel bandwidth up to 128Gbps! It doubles the number of ports and the available bandwidth in the same 4U rack space for edge switches, providing potential savings of over $3,300 per edge switch.

Even more significant is that 16Gbps provides the additional performance necessary to support the next generation of storage, which will be based on 6Gbps and 12Gbps SAS disk drives. Unlike legacy FC storage, which was based upon 4Gbps FC-AL (arbitrated loop) connections, the new SAS arrays use switched connections. Switching provides a point-to-point connection for each disk drive, ensuring every 6Gbps SAS connection (or, in the near future, 12Gbps SAS connection) has a direct path to the SAN fabric. This eliminates back-end saturation of the shared FC-AL buses found in legacy arrays, and will place far greater demand for storage channel performance on the SAN fabric.

So do the benefits of 16Gbps Fibre Channel outweigh its modest price premium? Like many things in life – it depends! Block-based 16Gbps Fibre Channel SAN fabric is not for every storage requirement, but neither is 10Gbps FCoE or iSCSI. For a departmental storage requirement, or an environment where NAS or iSCSI has previously been deployed, replacing the incumbent protocol with 16Gbps Fibre Channel may or may not have merit. However, large SAN storage arrays are particularly dependent on high-performance equipment specifically designed for efficient data transfers. This is an arena where the capabilities and attributes of 16Gbps Fibre Channel will shine.

In any case, the best protection against making a poor choice is to thoroughly research the strengths and weaknesses of each technology, and to seek guidance from a vendor-neutral storage expert with a subject-matter-expert level of understanding of the storage industry and its technology.

Boot-from-SAN gives Internal Disk the Boot!

It is somewhat surprising just how many skilled IT specialists still shy away from replacing traditional internal boot disks with a Boot-from-SAN process. I realize old habits die hard, and there’s something reassuring about having the O/S find the default boot block without needing human intervention. However, the price organizations pay for this convenience is not justifiable. It simply adds waste, complexity, and unnecessary expense to their computing environment.

Traditionally, servers have relied on internal disk to initiate their boot-up process. At start-up, the system BIOS executes a self-test, starts primitive services like video output and basic I/O operations, then goes to a pre-defined disk block where the MBR (Master Boot Record) is located. For most systems, the Stage 1 boot loader resides in the first block of the default disk drive. The BIOS loads this code into system memory and executes it; the Stage 1 loader then loads the Stage 2 boot instructions and ultimately starts the operating system.

Due to the importance of the boot process and the common practice of loading the operating system on the same disk, two disk drives in a RAID1 (disk mirroring) configuration are commonly used to ensure high availability.

Ok, so far so good.  Then what’s the problem?

The problem is the disks themselves. Unlike virtually every other subsystem in the server, these are electro-mechanical devices with the following undesirable characteristics:

  • Power & Cooling – Unlike other solid-state components, these devices take a disproportionately large amount of power to start and operate.  A mirrored pair of 300GB, 15K RPM disks will consume around .25 amps of power and need 95.6 BTUs for cooling.  Each system with internal disk has its own miniature “space heater” that aggravates efforts to keep sensitive solid state components cool.
  • Physical Space – Each 3.5 inch drive is 1” x 4.0” x 5.76” (or 23.04 cubic inches) in size, so a mirrored pair of disks in a server represents an obstacle of 46.08 cubic inches that requires physical space, provisions for mounting, power connections, air flow routing, and vibration dampening to reduce fatigue on itself and other internal components.
  • Under-utilized Capacity – As disk drive technology continues to advance, it becomes more economical to manufacture higher capacity disk drives than maintain an inventory of lower capacity disks.  Therefore servers today are commonly shipped with 300GB or 450GB boot drives.  The problem is that Windows Server 2008 (or similar) only needs < 100GB of space, so 66% of the disk’s capacity is wasted.
  • Backup & Recovery – Initially everyone plans to keep only the O/S, patches and updates, log files, and related utilities on the boot disk.  However, the local disk is far too convenient and eventually has other files “temporarily” put on it as well.  Unfortunately some companies don’t include boot disks in their backup schedule, and risk losing valuable content if both disks are corrupted.  (Note:  RAID1 protects data from individual disk failures but not corruption.)

Boot-from-SAN does not involve a PXE or tftp boot over the network.  It is an HBA BIOS setting that allows SAN disk to be recognized very early in the boot process as a valid boot device, then points the server to that location for the Stage 1 Boot Loader code.  It eliminates any need for internal disk devices and moves the process to shared storage on the SAN.  It also facilitates the rapid replacement of failed servers (all data and applications remain on the SAN), and is particularly useful for blade systems (where server “real-estate” is at a premium and optimal airflow is crucial).

The most common argument against Boot-from-SAN is “what if the SAN is not available?” On the surface this sounds like a valid point, but what is the chance of that occurring with well-designed SAN storage? How would it be any different from the internal boot disk array failing to start? Even if the system booted internally and the O/S loaded, how much work could a server do if it could not connect to the SAN? The consequences of any system failing to come up to an operational state are the same, whether it uses a Boot-from-SAN process or boots from internal disks.

For a handful of servers, this may not be a very big deal. However, when you consider the impact on a datacenter running thousands of servers, the problem becomes obvious. For every thousand servers, Boot-from-SAN eliminates the expense of two thousand internal disks, 240 amps of current, and the need for 655,300 BTUs of cooling; it also greatly simplifies equipment rack airflow, reclaims 200TB of inaccessible space, and measurably improves storage manageability and data backup protection.
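As a back-of-the-envelope check on the scale argument, the per-server figures from the bullet list above (two mirrored 300GB boot disks, of which the operating system needs less than 100GB) can be multiplied out across a 1,000-server datacenter; anything beyond those per-server numbers is an illustrative assumption.

```python
# Scaling the per-server boot-disk waste described above to 1,000 servers.

servers = 1_000
disks_per_server = 2          # RAID1 mirrored boot pair
disk_capacity_gb = 300
os_footprint_gb = 100         # usable space actually needed for the O/S

internal_disks_eliminated = servers * disks_per_server
# A mirrored pair exposes the capacity of one disk; the O/S uses only part of it.
wasted_gb_per_server = disk_capacity_gb - os_footprint_gb
wasted_tb_total = servers * wasted_gb_per_server / 1_000

print(f"Internal disks eliminated: {internal_disks_eliminated:,}")
print(f"Stranded capacity reclaimed: ~{wasted_tb_total:,.0f} TB")
```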

Boot-from-SAN capability is built into most modern HBA BIOSes and is supported by almost every operating system and storage array on the market. Implementing this valuable tool should measurably improve the efficiency of your data center operation.

Datacenter Optimization? It’s a Target-Rich Environment!

We’re struggling back from the depths of recession, but IT budgets remain tight. The business community is demanding an ever-increasing amount of functionality from the datacenter. Managing IT today is an exercise in being between a rock and a hard place.

However, in the midst of this seemingly impossible situation, there are bright spots. Most datacenters are a veritable treasure trove of opportunities for efficiency improvements. Some examples include:

  • Typically over 90% of all data sitting on active disk has not been accessed in over 6-months.
  • An average physical server (non-virtualized) is less than 30% utilized.
  • Floor-space for a 42U equipment rack costs around $3000 per month (or $36,000 per year, per rack). That equates to over $850 per U (1.75 inches), per year!
  • Replacing rack servers with blade servers can reduce the amount of rack space by at least 36% (typically much more).
  • According to Intel, upgrading servers to newer, more powerful systems can yield a consolidation ratio of between 4:1 and 7:1.
  • Standardizing on 45U equipment racks, rather than 42U racks, will reduce your datacenter footprint by one rack for every 14 racks installed.
  • The purchase price for multi-tiered storage equipment is normally 25% – 35% less than traditional storage arrays.
  • Replacing boot disks with boot-from-SAN technology may eliminate literally hundreds of underutilized disks, along with the power and cooling they require.
  • 2.5 inch disk drives need 40% less energy (and cooling) than a 3.5 inch disk drive equivalent of the same capacity.
  • In a properly managed environment, LTO tape media will store seldom-used data for up to 30 years with a 99.999% recovery rate – at under $0.03 per GB!
  • A well-designed SAN topology can lower the fibre channel port cost from $1800-per-port to around $300-per-port (an 80%+ reduction in cost)!
  • Well-designed and properly delivered IT training can increase productivity by 17% – 21% per FTE.

So where to start? The quickest way is to perform a high-level assessment of the datacenter to identify the most promising opportunities. This can be done by internal personnel, but it is a task most effectively handled by an outside IT consulting firm that specializes in datacenter optimization. Such a firm can devote the time necessary to complete the task promptly, and is not biased by the day-to-day familiarity with the equipment that may mask issues. Additionally, a professional datacenter consultant can deliver an industry-wide perspective, suggest best practices, and offer out-of-the-box thinking that is not influenced by an organization’s current culture.

Once all areas for improvement have been exposed and documented, the data should be transitioned into the architectural development and planning cycle to ensure any changes will not adversely impact other areas of operation, and are executed on a manageable and sustainable timeline.

 Unfortunately there is no “magic cure” for a difficult economy or anemic IT budgets.  However, most datacenters offer more than enough opportunity to enable you to shave 20% to 30% off the operating budget.

Storage Tiers – Putting Data in Its Place

I’m frequently surprised by the number of companies who haven’t transitioned to a tiered storage structure.  All data is not created equal.  While a powerful database may place extreme demand on storage, word processing documents do not. 

As we move into a new world of “big data”, more emphasis needs to be placed on making good decisions about what class of disk each type of data should reside on. Although there are no universally accepted standards for storage tier designations, the breakdown frequently goes as follows:

Tier 0 – Solid state devices

Tier 1 – 15K RPM SAS or FC Disks

Tier 2 – 10K RPM SAS or FC Disks

Tier 3 – 7200 or 5400 RPM SATA (a.k.a. – NL-SAS) Disks
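Below is a minimal sketch of how data might be mapped onto these tiers, assuming a simple rule set based on required IOPS and access recency; the thresholds and attributes are illustrative assumptions, not industry standards.

```python
# Hypothetical tier-assignment rules keyed to the tier list above.

def assign_tier(iops_required, days_since_last_access):
    """Map a dataset's performance need and access recency to a storage tier."""
    if iops_required > 10_000:
        return "Tier 0 (solid state)"
    if iops_required > 2_000:
        return "Tier 1 (15K RPM SAS/FC)"
    if days_since_last_access <= 90:
        return "Tier 2 (10K RPM SAS/FC)"
    return "Tier 3 (7200/5400 RPM SATA / NL-SAS)"

print(assign_tier(iops_required=25_000, days_since_last_access=1))   # busy database
print(assign_tier(iops_required=50, days_since_last_access=400))     # stale documents
```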

So why is a tiering strategy important for large quantities of storage?  Let’s take a look at similar storage models for 1 petabyte of data:

The difference in disk drive expense alone is over $225,000, or around 30% of the equipment purchase price. In addition, there are other issues to consider.

Pros:  

  • Reduces the initial purchase price by 25% or more
  • Improving energy efficiency by 25% – 35%  lowers operational cost and cooling requirements
  • Substantial savings from reduced data center floorspace requirements
  • Increased overall performance for all applications and databases
  • Greater scalability and flexibility for matching storage requirements to business growth patterns
  • Provides additional resources for performance improvements (an increased number of ports, cache, controller power, etc.)
  • A high degree of modularity facilitates better avoidance of technical obsolescence
  • May moderate the demand for technical staff necessary to manage continual storage growth                                                                              

Cons: 

  • Requires automated, policy-based data migration software to operate efficiently.
  • Should employ enterprise-class frames for Tiers 0/1 and midrange arrays for Tiers 2/3
  • Incurs approximately a 15% cost premium for enterprise-class storage to support Tier 0/1 disks
  • Implements a more complex storage architecture that requires good planning and design
  • Needs at least a rudimentary data classification effort for maximum effectiveness

So does the end justify the effort? That is for each company to decide. If data storage growth is fairly stagnant, it may be questionable whether the additional effort and expense are worth it. However, if you are staggering under a 30% – 50% CAGR storage growth rate like most companies, the cost reduction, increased scalability, and performance improvements achieved may well justify the effort.

Big Data – Data Preservation or Simply Corporate Hoarding?

Several years ago my Mother passed away.  As one of her children, I was faced with the challenge of helping clean out her home prior to it being put up for sale.  As we struggled to empty out each room, I was both amazed and appalled by what we found.  There were artifacts from almost every year in school, bank statements from the 1950s, yellowing newspaper clippings, and greeting cards of all types and vintages.  Occasionally we’d find a piece that was worth our attention, but the vast majority of saved documents were just waste – pieces of useless information tucked away “just in case” they might someday be needed again.

Unfortunately, many corporations engage in the same sort of “hoarding”. Vast quantities of low-value data and obsolete information are retained on spinning disk or archived on tape media forever, “just in case” they may be needed. Multiple copies of databases, outdated binaries from application updates, copies of log files, ancient directories and files that were never deleted – all continue to consume capacity and resources.

Perhaps this strategy worked in years past, but it has long outlived its usefulness. At the average industry growth rate, the 2.5 Petabytes of storage you struggle with today will explode to over 1.0 Exabyte within 15 years! That’s a 400-fold increase in your need for storage capacity, backup and recovery, SAN fabric bandwidth, data center floor space, power and cooling, storage management, staffing, disaster recovery, and related support items. The list of resources impacted by storage growth is extensive. In a previous post I identified (46) separate areas that are directly affected by storage growth and must be scaled accordingly. A 400x expansion will result in a simply stunning amount of hardware, software, facilities, support services, and other critical resources needed to support this rate of growth. Deduplication, compression, and other size-reduction methods may provide temporary relief, but in most cases they simply defer the problem rather than eliminate it.
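The growth arithmetic is easy to verify: 1.0 Exabyte is 400 times 2.5 Petabytes, and reaching it in roughly 15 years corresponds to the upper end of the growth rates cited in these posts. A quick sketch:

```python
# How many years does it take 2.5 PB to reach 1 EB at various annual growth rates?

import math

start_pb = 2.5
target_pb = 1_000               # 1 Exabyte expressed in Petabytes
factor = target_pb / start_pb   # 400x

for annual_growth in (0.35, 0.40, 0.50):
    years = math.log(factor) / math.log(1 + annual_growth)
    print(f"At {annual_growth:.0%} CAGR: ~{years:.0f} years to grow {factor:.0f}x")
```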

The solution is obvious – reduce the amount of data being saved.  Determine what is truly relevant and save only information that has demonstrable residual value.  This requires a system of data classification, and a method for managing, migrating, and ultimately expiring files.  

Unfortunately, that is much easier said than done. Attempt to perform data categorization manually and you’ll quickly be overwhelmed by the tsunami of data flooding the IT department. Purchase one of the emerging commercial tools for data categorization, and you may be frustrated by how much content is evaluated incorrectly and assigned to the wrong categories.

Regardless of the challenges, there are very few viable alternatives to data classification for maintaining massive amounts of information. Far greater emphasis should be placed on identifying and destroying low-value or no-value files. (Is there really sound justification for saving last Thursday’s cafeteria menu, or for knowing who won Employee of the Month last July?) Invest in an automated, policy-based management product that allows data to be demoted through the storage tiers and ultimately destroyed, based on pre-defined company criteria. Something has to give, or the quantity of retained data will eventually outpace future IT budget allocations for storage.

In the end, the winning strategy will be to continually manage information retention, establishing an equilibrium and working toward a goal of near-zero storage growth. It’s time to make data classification by value and projected “shelf life” a part of the organization’s culture.

(50) Energy Saving Tips for the Data Center

It’s a simple truth – “Big Data” produces big power bills. In many areas, the cost of energy for ongoing data center operations equals the purchase cost of the IT equipment itself. In today’s economy, “going green” offers some very attractive incentives for saving money through conservation practices, with the side benefit of helping save the planet we all live on.

The following is a collection of tips to save power in the data center.  Some are simply common sense and others take time, knowledge, and a budgetary commitment to implement.  As many of these as possible should be incorporated into an energy optimization culture that continually searches for ways to reduce power consumption and the associated cooling requirements.

1.  Purchase Energy Efficient Disk Technology

A new generation of disk drives features advanced capabilities such as optimized caching, intelligent control circuitry, energy-optimized motors, and other power reduction techniques. Not only should you ask for energy-efficient equipment for your projects, but also ensure your purchasing department is aware of the differences and their importance to your organization.

2.  Create a Tiered Storage System

Assigning data to different classes of disk subsystems, based on the value of the information, can result in significant energy savings. Solid-state disks and lower-RPM disk drives consume far less power per TB than standard disks.

3.  Automated, Policy-Based Migration

This software utility is a major enabler for multi-tiered storage. It monitors file characteristics and will automatically migrate data “behind the scenes” to an appropriate class of disk once a specific set of criteria is met.

4.  Implement Storage Virtualization

Virtualization creates an abstraction of physical storage and allows the servers to see available disk as one large storage pool. It provides access to all available storage, offers greater flexibility and simplifies the management of heterogeneous subsystems.

5.  Employ Thin Provisioning

Databases and some applications require contiguous storage space assigned for future growth. Thin provisioning facilitates the allocation of virtual storage, which appears to the database as contiguous physical storage even though physical capacity is consumed only as data is actually written.
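A toy illustration of the idea, assuming a simple block-allocation model: the database sees the full provisioned size, but physical blocks are consumed only as regions are written. The sizes and block granularity are arbitrary illustrative values.

```python
class ThinVolume:
    """Reports its full provisioned size; allocates physical blocks on first write."""

    def __init__(self, provisioned_gb, block_mb=256):
        self.provisioned_gb = provisioned_gb
        self.block_mb = block_mb
        self.allocated_blocks = set()   # physical blocks actually backed by disk

    def write(self, offset_gb):
        """Writing to a region allocates its backing block on first touch."""
        block = int(offset_gb * 1024 // self.block_mb)
        self.allocated_blocks.add(block)

    @property
    def physical_gb(self):
        return len(self.allocated_blocks) * self.block_mb / 1024

vol = ThinVolume(provisioned_gb=2_000)          # the database sees a 2 TB volume
for mb in range(0, 100 * 1024, vol.block_mb):   # but only the first 100 GB is written
    vol.write(mb / 1024)

print(f"Provisioned: {vol.provisioned_gb} GB, physically allocated: {vol.physical_gb:.0f} GB")
```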

6.  Power Down Inactive Equipment

Unused systems and storage that have been left running in a data center continue to consume power and generate heat without providing any useful work. An assumption that “someone might need to access it” is a poor reason for leaving inactive equipment up and running 365 days per year.

7.  Retire Legacy Systems

Outdated equipment can be another big consumer of energy.  Develop a program to annually retire aging storage that contains low-capacity disks, inefficient circuit components, and little or no power conservation circuitry.

8.  Optimize RAID Array Configuration

Legacy RAID5 3+1 or high performance RAID10 configurations that are not warranted waste large amounts of capacity and power with little tangible benefit.  Selective deployment of RAID technology increases usable space and reduces power/cooling requirements.

9.  Clean Out Unwanted Data

Over time, systems become a retirement home for unused files, core dumps, outdated logs, roll-back files, non work-related content, and other unnecessary information.  Files can be automatically scanned to identify and remove unwanted or outdated data that provides no value to the company.
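Here is a minimal sketch of such a scan, assuming a hypothetical /data mount point, a 180-day staleness threshold, and a few name patterns for obviously low-value files; a production cleanup job should report its findings and obtain sign-off before deleting anything.

```python
# Flag stale, low-value files (old core dumps, temp/backup copies) for review.

import os
import time

ROOT = "/data"                                  # hypothetical starting point
STALE_AFTER = 180 * 24 * 3600                   # 180 days, in seconds
PATTERNS = ("core.", ".tmp", ".bak", ".old")    # name fragments to flag

now = time.time()
candidates = []
for dirpath, _dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        try:
            info = os.stat(path)
        except OSError:
            continue                            # file vanished or is unreadable
        stale = (now - info.st_mtime) > STALE_AFTER
        if stale and any(p in name for p in PATTERNS):
            candidates.append((path, info.st_size))

total_gb = sum(size for _, size in candidates) / 1e9
print(f"{len(candidates)} stale files flagged, ~{total_gb:.1f} GB reclaimable")
```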

10.  Clean Up File Systems

Like data, file systems and directories should be periodically scanned to ensure that defunct applications, outdated directories, and temporary updates have been purged from storage.

11.  Periodically Update I/O Firmware

Manufacturers regularly improve their firmware to ensure bugs are fixed, security holes are patched, and performance is optimized. Current firmware ensures that controllers work at optimal efficiency, and less work for the controller can translate into less power consumed.

12.  Clean Up the Backup Process

Examine the backup schedule and exclusion lists to ensure all identified areas are still relevant. Your backup system may be regularly processing and backing up directories that contain obsolete files, irrelevant directories (e.g. /temp), or system content that never changes.

13.  Replace Missing Floor Tiles and Blank Panels

Missing floor tiles and equipment rack filler panels reduce the positive cooling pressure produced by the cooling system and can significantly disrupt airflow patterns through rack-mounted equipment.

14. Eliminate Air Pressure Blockage

Also check under the raised floor for collections of debris that can restrict airflow going to, or through equipment racks.  The harder an air conditioning system must work to move air through a facility, the more energy will be consumed.

15.  Increase Temperature and Humidity Settings

Confirm temperature and humidity are set to the correct levels. Review equipment manufacturers’ specifications to ensure settings do not exceed manufacturer recommendations.

16. Turn Off Video Monitors

If video monitors are not in use, they should be turned off.  Monitors are usually left on 24-hrs a day whether they’re being used or not, consuming power and generating heat without providing value.

17.  Minimize/Eliminate Server  Internal Disk Drives

Servers are usually purchased with internal disks installed for the operating system, binaries, swap space, and other system needs. Whenever practical, eliminate internal disks by using Boot-from-SAN technologies to better utilize capacity and more efficiently manage power consumption.

18.  Reclaim Orphaned LUNS

Storage tends to collect areas of allocated, but unused or abandoned storage space over time. Periodic review and reclamation of these spaces can result in significant storage savings.

19.  Revise Data Retention Policy

An organizational policy of “save everything” is usually the worst of approaches.  Implement a program of saving only data that has verified business value, or is necessary to retain for litigation protection and regulatory compliance.

20. Increase User Consumption Awareness

End users’ bad habits can have a significant impact on storage consumption. Educate users on the value of content management, space utilization, and data cleanup once a file is no longer needed.

21. Facilities Operational Staff Training

Every operational staff member should be trained in the proper operation of equipment, conservation methods, and the energy optimization objective established by the organization.  Energy management must be a part of the corporate culture.

22. Require Periodic Performance Optimization

A poorly performing server, fabric, or storage structure will consume additional power and cooling. Periodic performance tuning will optimize server and storage operations, achieving the same goals while requiring the systems to do less work.

23. Disk Spares Assignment

Over-provisioning of disk spares consumes storage resources without adding measurable value.  Storage industry best practices recommend one disk spare for every 30-32 disks.  RAID array selection may dictate more or less need to be allocated.  Follow manufacturer recommendations for spares.

24. High Efficiency Power Supplies

High-efficiency power supplies improve efficiency from the typical 60-70% to over 90%. In most circumstances, exact replacements are available for the most popular system and storage power supplies.

25. Channel Port Speed Optimization

Implementation of high speed ports and following recommended fan-out ratios allow you to provide an appropriate amount of bandwidth with a minimum number of resources, which translates into lower power and cooling demands.

26. High Capacity Disk Drives

Advanced disk development is dramatically increasing physical disk capacity.  As long as IOPS (I/Os per second) is not a requirement, larger disks of the same rotational speed can be deployed to double or even triple capacity for the same energy consumption.

27. Centralize Storage Management

Over time, management tools offering point-solutions tend to proliferate, along with servers and storage.  Centralization and consolidation of management tools into comprehensive suites can eliminate multiple under-utilized monitors and reduce excess power consumption.

28. Use Electronically Commutated Motors

Wherever possible, replace condensing units or fan powered boxes using mechanical brushes with electronically commutated motors.  Eliminating the brush mechanism and adding automatic turn-down circuitry found in most EC Motors can yield a reduction in power consumption of up to 45%.

29. Equipment Consolidation

Legacy servers and storage systems have proliferated over the past two decades.  Frequent over-provisioning of systems leads to servers and storage that are grossly underutilized. Consolidation permits additional legacy systems to be retired.

30. Deploy Arrays Built from 2.5 Inch Drives

Three or more 2.5 inch disks can fit in the same physical space as one 3.5 inch drive. They have a much smaller spinning mass, so they can provide twice the storage capacity for the same power consumption.

31. Real-Time Data Compression

Some primary storage systems can perform real-time compression on the data stream. For certain types of data this can produce a reduction of 2:1 or more in the amount of storage space consumed.

32. Manage Data Copy Proliferation

Without careful monitoring, duplicate copies of data proliferate like rabbits.  IT management should review each department’s data requirements and ensure only a reasonable number of copies exist.

33. Data De-duplication

This backup technology identifies patterns in the data stream and replaces duplicate data with a pointer to the original copy. This can significantly reduce the amount of disk backup space required.
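A toy illustration of the mechanism: the backup stream is cut into fixed-size chunks, each chunk is fingerprinted, and only the first copy of any chunk is physically stored. The chunk size and sample data are arbitrary; real products use far more sophisticated chunking and indexing.

```python
# Toy block-level deduplication: duplicate chunks become references.

import hashlib

CHUNK_SIZE = 4096
store = {}          # fingerprint -> chunk payload, stored only once

def ingest(stream):
    """Split a stream into chunks, keep new chunks, return a list of references."""
    refs = []
    for i in range(0, len(stream), CHUNK_SIZE):
        chunk = stream[i:i + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)    # duplicate chunks are not stored again
        refs.append(fingerprint)
    return refs

# Two nightly "backups" that are mostly identical.
backup_monday = b"A" * 40_960 + b"B" * 4_096
backup_tuesday = b"A" * 40_960 + b"C" * 4_096
references = [ingest(backup_monday), ingest(backup_tuesday)]

logical = len(backup_monday) + len(backup_tuesday)
physical = sum(len(chunk) for chunk in store.values())
print(f"Logical: {logical} bytes, physical after dedup: {physical} bytes "
      f"({logical / physical:.1f}:1 reduction)")
```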

34. Data Classification

This is a process that categorizes different types of information by business value. Once this process has been completed, data can be assigned an appropriate level of disk performance and cost.

35. Solid-State Drives

Solid-state disks dramatically reduce power consumption by eliminating electro-mechanical components and rotating platters. SSD power consumption is minuscule compared to that of traditional disk drives.

36. MAID Technology

MAID technology powers down the storage array to an idle state if no activity has been detected within a specific period of time. It is valuable when infrequently accessed data is involved.

37. Use High-Capacity Tape Drives

High-capacity tape drives hold larger amounts of data and, when installed in tape libraries, minimize the number of cartridge changes. Since a robotic arm is an electromechanical device, minimizing tape changes reduces the amount of energy consumed by the tape library.

38. Convert to Direct DC Power

Significant energy loss occurs when AC power goes through multiple conversion steps between the initial distribution point and the system power supply. Converting to (or designing for) direct DC power delivered to the equipment racks can save up to 30% in power consumption.

39. Capacity on Demand

Avoid deploying Capacity-on-Demand capabilities unless absolutely required.  Inactivated processors, memory, and other resources typically consume energy without providing additional business value until an activation license is provided.

40. Consider Storage-as-a-Service

If your operational model supports it, consider migrating some storage requirements to the Cloud.  When storage is purchased from an external provider, organizations pay only for the storage they use and therefore are only charged for the energy necessary to run storage capacity they’ve purchased.

41. Consolidation of NAS Systems

NAS storage has proliferated within most organizations due to its modest cost, installation flexibility, and ease of deployment. Consolidating multiple stand-alone units into larger NAS systems will improve efficiency, simplify management, and minimize power consumption.

42. Greater use of Granular Scaling

Select storage equipment that facilitates scaling capacity in relatively small increments.  Installing full frames of disk storage before its capacity is required consumes large amounts of power without adding any business value.

43. Consolidate SAN Fabrics

Consolidating multiple SAN fabrics into a single shared SAN fabric eliminates switch/director duplication, simplifies manageability, and increases device utilization.

44. Continuous Data Streaming to Tape

Ensure the backup streams sent to tape devices are robust enough to allow continuous streaming, rather than requiring frequent starts and stops. Also configure disk pools to consolidate data and ensure tape drives can be driven at maximum speed for the shortest period possible. Streaming data to tape requires less energy and significantly reduces the backup window.

45. Back Up to Tape Media

Disk pools for backup are recommended for speed and efficiency, but inactive data should be off-loaded to tape media as soon as possible to minimize energy consumption.  Once it has been written to tape, data can be archived for future use without consuming power or occupying spinning disk space.

46. Equipment Rack Height

Increasing equipment rack height by a few inches lowers the total number of power supplies installed in the racks. Replacing 42U racks with 45U racks adds 3U per frame and frees up expensive data center floor space – roughly 6 frames’ worth per 100 racks.
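The arithmetic behind that claim, sketched out (rack heights taken from the tip above; the rest is simple counting):

```python
# How much usable space do 45U racks add over 42U racks across a 100-rack floor?

racks = 100
extra_u_per_rack = 45 - 42                  # each 45U rack holds 3U more than a 42U rack
extra_u_total = racks * extra_u_per_rack    # 300U of additional capacity per 100 racks
equivalent_frames = extra_u_total // 45     # whole 45U frames' worth of floor space freed

print(f"{extra_u_total}U gained across {racks} racks, about {equivalent_frames} fewer 45U frames needed")
```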

47. Update Legacy Lighting Systems

Update legacy lighting systems to modern, energy efficient technology and install occupancy sensors in the data center to ensure lighting is only being used as required.

48. Adopt a Cold Aisle/Hot Aisle Configuration

Creating a designated cold aisle/hot aisle system works more efficiently by preventing hot and cold air from mixing. With a cold/hot aisle organization, cold air is directed into the equipment racks on one side while hot air is purged from the other.

49. Use Ambient Air For Cooling

Using ambient air for cooling takes advantage of the differential between local atmospheric temperatures and the heat generated by electronic equipment. If relatively dry air is present and temperatures are moderate, it may be advantageous to leverage prevailing conditions for cooling, rather than being totally dependent upon mechanical cooling systems.

50. Measure Your Power Consumption

According to Peter Drucker, “What gets measured gets managed.” If you don’t set clear objectives and deploy the proper measurement tools to track your progress, there is a very good chance you’ll never achieve your company’s energy reduction goals.

As with many things, “your mileage may vary” when implementing any of the above tips.  Start with the easiest and most obvious, then work forward from there.  And as mentioned above, make energy conservation a part of your operational culture.  Escalating energy costs and higher power demands are problems that will probably not go away in the foreseeable future.

“Big Data” Getting Bigger? Beware of the Ripple Effect…

Everyone seems to be concerned about the “tsunami of data” that is overwhelming the IT world. However, relatively few people appear to be worried about the “ripple effect” of this growth on other areas that are directly or indirectly impacted by this phenomenon.

Storage growth does not occur in a vacuum.   Every gigabyte of data written to disk must also be backed up, managed, transferred, secured, analyzed, protected, and supported.   It has a “ripple effect” that can spread throughout the organization, creating problems and resource shortages in many other areas.

A case in point is the backup & recovery process. Every gigabyte stored must be scheduled for backup, so if we’re experiencing 50% CAGR data growth, we are also subject to a 50% growth rate in demand for backup & recovery services. In addition, most companies keep more than one copy of data in the form of supplementary backups, clones, replications, and other forms of duplication. Therefore a single gigabyte of data can exist in multiple places throughout the organization.

(Figure: The ripple effect of storage growth – areas that are directly or indirectly affected by storage growth.)

The picture above identifies at least (36) specific areas that are impacted by data growth.  I’m sure there are others.  Gone are the days when problems could quickly be resolved by “just buying more disk”.

It’s time to “think outside the box”.  This is no longer a localized issue that can be solved by stove-piped departments and back-room technologists.  It is an enterprise-wide challenge that needs the creative minds of many individuals from diverse areas of the organization.  Consider bringing in independent Subject Matter Experts from the outside to analyze complex problems, stimulate creative thinking, and discuss how others have attacked similar challenges.

In today’s world of “big data”, there needs to be far greater emphasis on comprehensive planning, designing in architectural efficiency, minimizing the impact on IT infrastructure, and improving the manageability of our entire IT environment.  Your future depends on it.