Solid State Disks – Beyond the Sticker Shock!
NOTE: My original article contained embedded calculation errors that significantly distorted the end results. These problems have since been corrected. I apologize to anyone who was accidentally misled by this information, and sincerely thank those diligent readers who brought the issues to my attention.
Some issues seem so obvious they’re hardly worth considering. Everyone knows that Solid State Drives (SSD) are more energy-efficient than spinning disk. They don’t employ rotating platters, electro-mechanical motors and mechanical head movement for data storage, so they must consume less power – right? However, everyone also knows the cost of SSD is so outrageous that they can only be deployed for super-critical high performance applications. But does the reputation of having exorbitant prices still apply?
While these considerations may seem intuitive, they are not entirely accurate. Comparing the Total Cost of Ownership (TCO) for traditional electro-mechanical disks vs. Solid State Disks provides a clearer picture of the comparative costs of each technology.
- For accuracy, this analysis compares the purchase price (CAPEX) and power consumption (OPEX) of only the disk drives, and does not include the expense of entire storage arrays, rack space, cooling equipment, etc.
- It uses the drive’s current “street price” for comparison. Individual vendor pricing may be significantly different, but the ratio between disk and SSD cost should remain fairly constant.
- The dollar amounts shown on the graph represent a 5-year operational lifecycle, which is fairly typical for production storage equipment.
- Energy consumption for cooling has also been included in the cost estimate, since cooling requires roughly the same amount of energy as the drives themselves consume to maintain them in an operational state.
- 100 TB of storage capacity was arbitrarily selected to illustrate the effect of cost on a typical mid-sized SAN storage array.
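The assumptions above can be turned into a rough calculation. The following is only a sketch of the method, not the spreadsheet behind the graph: the function names, the default electricity rate of $0.10 per kWh, and the around-the-clock duty cycle are illustrative assumptions, and real drive prices and wattages would have to be supplied by the reader.

```python
import math

HOURS_PER_YEAR = 24 * 365  # assumes the drives are powered on around the clock


def drives_needed(target_tb, drive_capacity_gb):
    """Drives required to reach the target raw capacity (1 TB = 1000 GB)."""
    return math.ceil(target_tb * 1000 / drive_capacity_gb)


def five_year_tco(price_per_drive, watts_per_drive, drive_capacity_gb,
                  target_tb=100, years=5, usd_per_kwh=0.10, cooling_factor=1.0):
    """Purchase price plus energy cost for the drives alone.

    cooling_factor=1.0 models the rule of thumb that every Watt consumed
    by a drive needs roughly one more Watt of cooling.
    usd_per_kwh is a placeholder rate, not a figure from this article.
    """
    count = drives_needed(target_tb, drive_capacity_gb)
    capex = count * price_per_drive
    total_watts = count * watts_per_drive * (1.0 + cooling_factor)
    kwh = total_watts / 1000 * HOURS_PER_YEAR * years
    opex = kwh * usd_per_kwh
    return {"drives": count, "capex": capex, "opex": opex, "total": capex + opex}
```

Calling `five_year_tco` once per drive model with its street price, typical operating Watts, and capacity gives the numbers needed for a side-by-side comparison.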
The following graph illustrates the combined purchase price, plus energy consumption costs for several popular electro-mechanical and Solid State Devices.
From the above comparison, several conclusions can be drawn:
SSDs are Still Expensive – Solid State Drives remain an expensive alternative storage medium, but the price differential between SSD and electro-mechanical drives is coming down. As of this writing there is only an x5 price difference between the 800GB SSD and the 600GB, 15K RPM drive. While this is still a significant gap, it is far less than the staggering x10 to x20 price differential seen 3-4 years ago.
SSDs are very “Green” – A comparison of the Watts consumed during a drive’s “typical operation” indicates that SSD consumes about 25% less energy than 10K RPM, 2.5-inch drives, and about 75% less power than 15K RPM, 3.5-inch disks. Given that a) each Watt used by the disk requires roughly 1 Watt of power for cooling to remove the heat that is produced, and b) the cost per kWh continues to rise every year, this significant difference becomes a factor over a storage array’s 5-year lifecycle.
Extreme IOPS is a Bonus – Although more expensive, SSDs are capable of delivering 10 to 20 times more I/Os per second, potentially providing a dramatic increase in storage performance.
Electro-Mechanical Disks Cost Differential – There is a surprisingly small cost differential between 3.5 inch, 15K RPM drives and 2.5 inch 10K RPM drives. This may justify eliminating 10K disks altogether and deploying a less complex 2-tiered array using only 15K RPM disks and 7.2K disks.
Legacy 3.5 Inch Disks – Low capacity legacy storage devices (<146GB) in a 3.5-inch drive form-factor consume too much energy to be practical in a modern, energy-efficient data center (this includes server internal disks). Any legacy disk drive smaller than 300 GB should be retired.
SATA/NL-SAS Disks are Inexpensive – This simply re-affirms what’s already known about SATA/NL-SAS disks. They are specifically designed to be inexpensive, modest performance devices capable of storing vast amounts of low-demand content on-line.
The incursion of Solid State Disks into the industry’s storage mainstream will have interesting ramifications not only for the current SAN/NAS arrays, but also may impact a diverse set of technologies that have been designed to tolerate the limitations of an electro-mechanical storage world. As they say, “It’s a brave new world”.
Widespread deployment of SSD will have a dramatic impact on the storage technology itself. If SSDs can be implemented in a cost-effective fashion, why would anyone need an expensive and complex automated tiering system to demote data across multiple layers of disk? Given the speed of SSDs, will our current efforts to reduce RAID rebuild times still be necessary? If I/O bottlenecks are eliminated at the disk drive, what impact will it have on array controllers, data fabric, and HBAs/NICs residing upstream of the arrays?
While it is disappointing to find SSD technology still commands a healthy premium over electro-mechanical drives, don’t expect that to remain the case forever. As the technology matures, prices will decline as user acceptance grows and production volumes increase. Don’t be surprised to see SSD technology eventually eliminate the mechanical disk’s 40-year dominance over the computer industry.
For those of you interested in examining the comparison calculations, I’ve included the following spreadsheet excerpts, which contain the detailed information used to create the graph.
Rumors of Fibre Channel’s Death are Greatly Exaggerated
If you believed all the media hype and vendor pontifications three years ago, you would have thought for sure that Fibre Channel was teetering on the edge of oblivion. According to industry hype, 10Gbps Ethernet and the FCoE protocol were certain to be the demise of Fibre Channel. One Analyst even went so far as to state, “IP based storage networking technologies represent the future of storage”. Well as they say, “Don’t believe everything you read”.
In spite of a media blitz designed to convince everyone that Fibre Channel was going extinct, industry shipments and FC implementation by IT storage professionals continued to blossom. As 16Gbps Fibre Channel rapidly grew in acceptance, the excitement around 10GbE diminished. In a Dell’Oro Group report for 4Q12, fibre channel Director, switch, and adapter revenues surpassed $650 million, while FCoE champion Cisco suffered through soft quarterly results.
So what makes Fibre Channel network technology so resilient?
• Simplicity – FCP was designed with a singular purpose in mind, and does not have to contend with a complex protocol stack.
• Performance – a native 16Gbps FC port is 40% faster than a 10GbE network, and it too can be trunked to provide aggregate ISL bandwidth up to 128 Gbps.
• Low Latency – FC fabric is not penalized by the additional 2-hop latency imposed by routing data packets through a NAS server before it’s written to disk.
• Parity of Cost – The dramatic reduction in expense promised by FCoE has failed to materialize. The complexity and cost of pushing data at NN_Ghz is fairly consistent, regardless of what protocol is used.
• Efficiency – Having a Fibre Channel back-end network supports such capabilities as LAN-less backup technology, high speed data migration, block-level storage virtualization, and in-fabric encryption.
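The performance bullet above can be sanity-checked with a little arithmetic. This sketch assumes the commonly quoted serial line rates of 14.025 GBaud for 16Gbps FC and 10.3125 GBaud for 10GbE, both with 64b/66b encoding. Those figures come from the interface standards rather than from this article, so treat them as assumptions; the function names are purely illustrative.

```python
FC16_LINE_RATE_GBAUD = 14.025     # assumed 16Gbps FC serial line rate
TEN_GBE_LINE_RATE_GBAUD = 10.3125  # assumed 10GbE (10GBASE-R) serial line rate


def payload_gbps(line_rate_gbaud, data_bits=64, coded_bits=66):
    """Usable bit rate after 64b/66b encoding overhead is removed."""
    return line_rate_gbaud * data_bits / coded_bits


def percent_faster(fast_gbps, slow_gbps):
    """How much faster the first rate is than the second, in percent."""
    return (fast_gbps / slow_gbps - 1.0) * 100.0


def links_for_aggregate(aggregate_gbps, nominal_link_gbps=16):
    """Number of trunked links needed to reach a nominal aggregate bandwidth."""
    return aggregate_gbps // nominal_link_gbps


fc = payload_gbps(FC16_LINE_RATE_GBAUD)
ge = payload_gbps(TEN_GBE_LINE_RATE_GBAUD)
print(f"16Gbps FC payload: {fc:.1f} Gbps, 10GbE payload: {ge:.1f} Gbps")
print(f"FC advantage: {percent_faster(fc, ge):.0f}%")
print(f"Links in a 128 Gbps trunk: {links_for_aggregate(128)}")
```

Under these assumptions the advantage works out to the mid-30s in percent, which is in the same neighborhood as the roughly 40% quoted above, and a 128 Gbps trunk corresponds to eight 16Gbps links.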
An excellent indicator that Fibre Channel is not falling from favor is Cisco’s recent announcement of their new 16Gbps MDS 9710 Multilayer Director and MultiService Fabric Switch. Cisco was a major proponent of 10GbE and the FCoE protocol, and failed to update their aging MDS 9500 family of Fibre Channel Directors and FC switches. (http://searchstorage.techtarget.com/news/2240182444/Cisco-FC-director-and-switch-moves-to-16-Gbps-new-chassis) This left Brocade with a lion’s share of a rapidly growing 16 Gbps Fibre Channel market. For Brocade, it produced a record quarter for FC switch revenues, while Cisco struggled with sagging sales.
Another factor in FC longevity is how little the average IT department actually needs extremely high-bandwidth storage network capabilities. Prior to 10GbE technology, Ethernet LANs performed quite well at 1GbE (or some trunked variation of 1GbE). The majority of the fibre channel world still depends upon 4Gbps FC, with 8Gbps technology recently starting to make significant inroads in the data center. Given the fairly leisurely pace of migration to higher performance for SAN and NAS fabric technology, and except for a fairly small percentage of IT departments that actually require high performance / high throughput, the lure of a faster interface alone is limited.
So which network technology will win? Who knows (or even cares)? There are usually bigger issues to overcome than what the back-end “plumbing” is made of. It’s far more important to implement the most appropriate technology for the task at hand. That could be Ethernet, Fibre Channel, Infiniband, or some other future network scheme. The key is to select your approach based on functionality and efficiency, not what is being hyped as “the next great thing” in the industry. In spite of all the hyperbole, Fibre Channel isn’t going away any time soon.
As Samuel Clemens (aka Mark Twain) said after hearing that his obituary had been published in the New York Journal, “The reports of my death are greatly exaggerated”.
Drive Down Costs with a Storage Refresh
Like most other things, technology suffers from advancing age. That leading-edge wonder of just a few years ago is today’s mainstream system. This aging process creates great headaches for IT departments, who constantly see “the bar” being moved upward. Just when it seems like the computing environment is under control, equipment needs to be updated.
Unless a company is well disciplined in enforcing their technical refresh cycle, the aging process can also lure some organizations into a trap. The thinking goes something like this – “Why not put off a technology update by a year or two? Budgets are tight, the IT staff is overworked, and things seem to be going along just fine.” It makes sense, doesn’t it?
Well, not exactly. If you look beyond the purchase and migration expenses, there are other major cost factors to consider.
Power Reduction: There have been major changes in storage device energy efficiency over the past decade. Five years ago the 300GB, 15K RPM 3.5-inch drive was leading-edge technology. Today, that disk has been superseded by 2.5-inch disks of the same speed and capacity. Other than its physical size, other major changes are the disk’s interface (33% faster than Fibre Channel) and its power consumption (about 70% less than a 3.5-inch drive). For 100TB of raw storage, $3577 per year could be saved by reduced power consumption alone.
Cooling Cost Reduction: A by-product of consuming power is heat, and the systems used to remove that heat consume power too. The following chart compares the cost for cooling 100TB of 3.5-inch disks with the same capacity provided by 2.5-inch disks. Using 2.5-inch disks, cooling costs could be reduced by $3548 per year, per 100TB of storage.
Floor Space Reduction: Another significant data center cost is for floor space. This expense can vary widely, depending on the type of resources provided and level of high availability guaranteed by the Service Level Agreement. For the purpose of cost comparison, we’ll take a fairly conservative $9600 per equipment rack per year. We will also assume fractional amounts are available, although in the real world full rack pricing might be required. Given the higher density provided by 2.5-inch disks, a cost savings of $9,371 would be achieved.
In the example above, simply replacing aging 300GB, 15K RPM 3.5-inch FC disk drives with the latest 300GB, 15K RPM 2.5-inch FC disk drives will yield the following operational costs (OPEX) savings:
Reduced power $ 3,577
Reduced cooling $ 3,548
Less floor space $ 9,371
Total Savings $ 16,496 per 100TB of storage
Over a storage array’s standard 5-year service cycle, OPEX savings could total $82K or more.
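For anyone who wants to check the arithmetic, the totals above can be reproduced in a few lines. The annual figures are the ones listed in the table; the dictionary and function names are just illustrative.

```python
# Annual OPEX savings per 100TB, taken from the table above (US dollars).
ANNUAL_SAVINGS_USD = {
    "Reduced power": 3577,
    "Reduced cooling": 3548,
    "Less floor space": 9371,
}


def total_annual_savings(items):
    """Sum of all yearly savings line items."""
    return sum(items.values())


def lifecycle_savings(items, years=5):
    """Savings accumulated over the array's service cycle, ignoring inflation."""
    return total_annual_savings(items) * years


print(f"Per year:     ${total_annual_savings(ANNUAL_SAVINGS_USD):,}")
print(f"Over 5 years: ${lifecycle_savings(ANNUAL_SAVINGS_USD):,}")
```

This simple version holds energy and floor-space prices flat for all five years, so it is likely on the conservative side if those costs rise.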
Additional benefits from a storage refresh might also include tiering storage (typically yielding around a 30% savings over non-tiered storage), reduced support contract costs, and less time spent managing older, more labor-intensive storage subsystems. There is also an opportunity for capital expense (CAPEX) savings by cleverly designing cost-optimized equipment, but that’s a story for a future article.
Don’t be misled into thinking that a delay of your storage technical refresh cycle will save money. In the end it could be a very costly decision.
Disaster Recovery Strategy for the 21st Century
Blade servers, virtualization, solid state disks, and 16Gbps fibre channel – it’s challenging to keep up with today’s advanced technology. The complexity and sophistication of emerging products can be dizzying. In most cases we’ve learned how to cope with these changes, but there are a few areas where we still cling to vestiges of the past. One of these relics of past decades is the impenetrable, monolithic data center.
The data center traces its roots back to the mainframe, when all computing resources were housed in a single, highly specialized facility designed specifically to support processing operations. Since there was little or no effort to classify data, these bastions of data processing were over-designed to ensure the most critical requirements were supported. This model was well-suited for mainframes and centralized computing, but it falls well short of meeting the needs of our modern IT environments.
Traditional data center facilities provide a one-size-fits-all solution. At an average $700 to $1500 per square foot, they are expensive to build. They lack the scalability and flexibility to respond to dynamic market changes and shifts in technology. Since these require massive investments of capital, they must be built not only to contain today’s IT equipment, but also satisfy growth requirements for 25 years or more. The end result is a tremendous waste of capacity, corporate funds tied up for decades, risky assumptions about the direction and needs of future IT technology, the build-out of a one-size-fits-all facility, and a price tag that puts disaster recovery redundancy well beyond the reach of most companies.
An excellent solution to this problem is already a proven technology – the Portable Modular Data Center. These are typically self-contained data center modules that contain a comprehensive set of power, cooling, security, and internal infrastructure to support a dozen or more equipment racks per module with up to 30kW of power per rack. These units are relatively inexpensive, highly scalable, simple to deploy, energy efficient (Green), and factory constructed to ensure consistent quality and reproducible technology. As modules, they can be deployed incrementally as requirements dictate, avoiding major one-time capital expenditures for facilities.
Their inherent modularity and scalability make them an excellent choice for incrementally building out finely-tuned disaster recovery facilities. Here is an example of how modular data centers can be leveraged to cost-effectively provide Disaster Recovery protection of an organization’s data assets.
- Mission Critical Operations (typically 10% to 15%)
These are applications and data that might severely cripple the organization if they were not available for any significant period of time.
Strategy – Deploy synchronous replication technology to maintain an up-to-date mirror image of the data that could be brought to operational status within a matter of minutes.
Solution – Deploy one or more Portable Modular Data Center units within 30 miles (to minimize latency) and run synchronous replication between the primary data center and the modular facility. Since 20-30 miles of separation would protect from a local disaster, but not a region-wide event, it might be worthwhile to replicate asynchronously from the modular data center to some remote (out-of-region) location. A small amount of data might be lost in the event of a disaster (due to asynchronous delay), but processing could still be brought back on-line quickly with minimal loss of data and only a limited interruption to operations.
- Vital Operations (typically 20% to 25%)
These applications and data are very important to the organization, but an outage of several hours would not financially cripple the business.
Strategy – Deploy an asynchronous replication mechanism outside the region to ensure an almost-up-to-date copy of data is available for rapid recovery.
Solution – Deploy one or more Portable Modular Data Center units anywhere in the country and run asynchronous replication between the primary data center and the remote modular facility. Since distance is not a limiting factor for asynchronous replication, the modular facility could be installed anywhere. This protects from disasters occurring not only locally, but within the region as well. A small amount of data might be lost in the event of a disaster (due to asynchronous delay), but applications and databases could still be recovered quickly with minimal loss of data and only a limited interruption to operations.
- Sensitive Operations (typically 20% to 30%)
These applications and data are important to the organization, but an outage of several days to one week would have only a negligible financial impact on the business.
Strategy – (same as above) Use the same asynchronous replication mechanism outside the region to ensure an almost-up-to-date copy of data is available for rapid recovery.
Solution – Add one or more Portable Modular Data Center units to the above facility (as required) and run asynchronous replication between the primary data center and the remote modular facility.
- Non-Critical Operations (typically 40% or more)
These applications and data are incidental to the organization and can be recovered when time is available. An outage of several weeks would have little impact on the business.
Strategy – (same as above) Use the same asynchronous replication mechanism outside the region to ensure an almost-up-to-date copy of data is available for rapid recovery.
Solution – Deploy one or more Portable Modular Data Center units anywhere in the country and run asynchronous replication between the primary data center and a remote modular facility.
Note: Since non-critical applications and data tend to be passive, non-critical operations might also be a viable candidate for transitioning to an Infrastructure-as-a-Service (IaaS) provider.
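The four classes above amount to a lookup from "how long can this be down?" to a replication approach. Here is a minimal sketch of that idea. The tier names, shares, and strategies come from the list above, but the hour thresholds (1 hour, 24 hours, one week) are my own illustrative cut-offs, since the text only speaks of minutes, several hours, days to a week, and weeks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTier:
    name: str
    typical_share: str
    replication: str
    site: str


TIERS = [
    RecoveryTier("Mission Critical", "10% to 15%",
                 "synchronous, optionally asynchronous onward out of region",
                 "modular unit within about 30 miles"),
    RecoveryTier("Vital", "20% to 25%", "asynchronous",
                 "modular unit outside the region"),
    RecoveryTier("Sensitive", "20% to 30%", "asynchronous",
                 "additional modular unit at the same remote facility"),
    RecoveryTier("Non-Critical", "40% or more", "asynchronous",
                 "remote modular unit, or possibly an IaaS provider"),
]


def tier_for_outage(max_outage_hours):
    """Pick a tier from the longest outage the business can tolerate.

    The numeric thresholds are assumptions for illustration only.
    """
    if max_outage_hours < 1:
        return TIERS[0]
    if max_outage_hours < 24:
        return TIERS[1]
    if max_outage_hours <= 24 * 7:
        return TIERS[2]
    return TIERS[3]


for hours in (0.25, 4, 72, 500):
    tier = tier_for_outage(hours)
    print(f"{hours:>6} h tolerable outage -> {tier.name}: {tier.replication}")
```

In practice the classification would be driven by a business impact analysis rather than a single number, but even a crude table like this makes it easier to see how little of the environment needs the most expensive protection.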
Modular Data Centers are the obvious enabler for the above Disaster Recovery strategy. They allow you to deploy only the data center resource you need, when you need it. They are less expensive than either leased or built facilities, and can be scaled as required by the business.
It’s time for the IT industry to abandon their outdated concepts of what a data center should be and focus on what is needed by each class of data. The day of raised-floor mainframe “bunkers” has passed. It’s time to start managing data center resource deployment as carefully as we manage server and storage deployment. Portable Modular Data Centers allow you to implement efficient, cost-effective IT production facilities in a logical sequence, without breaking the bank in the process.
Consultant, Contractor, or Staff Augmentation – Do You Know the Difference?
The world of IT is becoming remarkably complex, and companies grow increasingly reliant on outside knowledge and skills for assistance. But when you enter into really uncharted waters and need someone you can trust, who will you call? Unfortunately there are a lot of companies in the industry claiming to be technical experts for everything from “Big Data” to “Desktop Virtualization”. How do you identify the serious resources from the technical wannabes?
That is an interesting question. Ever since the industry’s rush to identify and fix Y2K problems over a decade ago, the line between Consultant, Contractor, and Staff Augmentation has blurred. The recession of the past few years further masked the distinction between roles, since many laid-off IT employees simply re-branded themselves as “Independent Consultants” in an attempt to secure short-term project work.
So what are the differences between Staff Augmentation, Contractors, and Consultants? Let’s start with a definition. According to Wikipedia:
“Staff Augmentation is an outsourcing strategy which is used to staff a project and respond to business objectives. The technique consists of evaluating the existing staff and then determining which additional skills are required. One possible advantage of this approach is that it may leverage existing resources as well as utilize outsourced services and contract workers.”
“An Independent Contractor is a natural person, business, or corporation that provides goods or services to another entity under terms specified in a contract or within a verbal agreement. Unlike an employee, an independent contractor does not work regularly for an employer but works as and when required, during which time he or she may be subject to the Law of Agency. Independent contractors are usually paid on a freelance basis.”
“A Consultant (from Latin: consultare “to discuss”) is a professional who provides professional or expert advice in a particular area such as security (electronic or physical), management, accountancy, law (tax law, in particular), human resources, marketing (and public relations), finance, engineering, or any of many other specialized fields. A consultant is usually an expert or a professional in a specific field and has a wide knowledge of the subject matter.”
…Wikipedia Online Dictionary
Staff augmentation is based on the concept of a “faceless, replaceable skill” that is available for an entire category of labor (Administrator, Engineer, Programmer, Database Administrator, Web Designer, etc.). Since IT relies on a large labor pool of technical skills, these are relatively low priced roles. Since participants are required to have only prerequisite skills in their specialty and no other unique capabilities, they can be hired and released pretty much on demand. Rates are dictated by current market prices, and range from $35 – $95 per hour.
IT contractors are further up the scale in capabilities and value. They are typically companies that deliver a complete service or system to solve a clearly defined problem. This may be a particular operation, type of application, virtualized infrastructure, or network operation. In many instances it is delivered as a complete package, including hardware, software, utilities, installation, configuration, and testing. Contractor services may be purchased on a per-project or a time-and-materials basis and are consistent with similar projects. Bundled labor rates within a specified package or service are in the $125 to $185 per hour range.
At the top of the pyramid is the IT consultant. This is a professional service offering highly developed skills and extensive experience in a specialized field. In addition to being a Subject Matter Expert for a particular technology or service, IT consultants typically have an extensive knowledge of related activities that include business operations, project management, associated technologies, industry best practices, quality assurance, security, and other operations. They are sought out by organizations for their comprehensive understanding of business-critical operations or other activity that can have industry-changing ramifications. Since these are highly specialized skills, they command rates from $225 to $450 per hour or more. Although consultants are expensive, they return value to the company that can far exceed their billable rate.
Clearly it’s in the client’s best interests to understand the differences and capabilities of each category. Unfortunately, these titles are frequently intermixed and tossed around somewhat indiscriminately by organizations. Unless due diligence is performed beforehand, occasionally some hapless company will think they landed a senior Consultant for $85 per hr. (plus expenses), when they actually contracted Staff Augmentation. This can quickly become the root cause of poor performance, lackluster productivity, poor organization, missed objectives, and ultimately a failed project.
Technical personnel do not automatically become senior consultants just because that’s a label they’ve anointed themselves with. Buyer beware! Engaging the proper skill-set can either be a game-changer, or a “boat anchor” for the project.
Modular Datacenter Units – The End of Traditional Enterprise Datacenters?
Traditional brick and mortar datacenters have been a mainstay of enterprise computing since the day of the mainframe. IT systems were kept in isolation in windowless, highly secure facilities that provided a constant temperature and humidity environment on a 7×24 basis. Although the cost of building new datacenters continues to increase substantially, until now relatively few options have been available.
However, with the development of the portable modular datacenter, the day of the traditional datacenter may be coming to an end. While there are several variations on the market, the most promising appears to be the completely built out facility. New datacenter modules are built from ISO standard shipping containers. They incorporate chillers, power and communications buses, forced air cooling, equipment racks, and all other components necessary for a modern datacenter. These units can be trucked to any location, moved into position on a concrete pad, connected to external resources, and be ready for systems build-out on short notice. They can be configured to operate as a singular unit, multiple units, and even as stacked arrays of modular datacenter units.
In addition to serving as a modular replacement for traditional brick-and-mortar datacenters, there are other possibilities for Portable Modular Datacenters:
RAPID DEPLOYMENT MODULES – For situations where rapid implementation is a key driver, or when companies simply can’t wait the 18-24 months for a new datacenter build-out.
COST CONTAINMENT – Situations where minimizing the cost for building a new datacenter facility is a primary objective
DISASTER RECOVERY – A highly flexible, cost-effective IT environment that can be deployed remotely for a Disaster Recovery solution
CAPACITY-ON-DEMAND – Modular, self-contained units that permit companies to add new datacenter capacity only-as-required (Capacity-as-a-Service?)
TEMPORARY FACILITIES – Allows companies to continue to support ongoing IT operations while a permanent datacenter facility is built
SEGREGATED SYSTEMS – Enables complete isolation of specific IT operations in an otherwise shared environment (Community Cloud?)
DYNAMIC MARKETS – A solution for highly volatile markets where future capacity requirements are difficult to predict
EMERGENCY CAPACITY – Available for relatively rapid deployment when an organization’s primary datacenter runs out of floor space
SYNCHRONOUS REPLICATION – Allows the implementation of a small nearby replication site within 40 km of the primary datacenter to support replication while maintaining database consistency
MOBILE SYSTEMS – A portable IT solution that could be relocated to a different region in response to changing corporate needs or an impending disaster (such as a major hurricane).
PREFABRICATED SUB-SYSTEMS – A transportable platform for high growth companies who must buy integrated sub-systems from an external vendor, rather than building the equipment themselves.
REPURPOSING OF BUILDINGS – Modular units may be installed within existing buildings that are sitting idle, as long as adequate resources (power and communications) are available.
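The 40 km limit in the synchronous replication item above is driven by latency, since every write has to be acknowledged by the remote site before it completes. As a rough illustration, the sketch below estimates the propagation delay added by the fiber alone. It assumes light travels through optical fiber at roughly 5 microseconds per kilometer, which is a common rule of thumb and not a figure from this article, and it ignores switch, array, and protocol overhead.

```python
FIBER_DELAY_US_PER_KM = 5.0  # assumed rule of thumb for light in optical fiber


def round_trip_ms(distance_km, us_per_km=FIBER_DELAY_US_PER_KM):
    """Propagation delay for one write acknowledgement (out and back), in ms."""
    return 2 * distance_km * us_per_km / 1000.0


def added_latency_ms(distance_km, round_trips_per_write=1):
    """Delay added to each synchronous write; some protocols need more than
    one round trip per write, so that count is left as a parameter."""
    return round_trip_ms(distance_km) * round_trips_per_write


for km in (10, 40, 400):
    print(f"{km:>4} km -> about {added_latency_ms(km):.2f} ms added per write")
```

At 40 km the fiber adds well under a millisecond per write, while at several hundred kilometers the penalty grows to multiple milliseconds on every write, which is why longer distances are usually handled asynchronously.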
Another big benefit to portable modular datacenter units is that they’re built in a factory to exact specification. As such, they benefit from repetitive manufacturing processes and ongoing quality assurance reviews. Each module features the same level of quality and reliability as its peers. This is in sharp contrast to traditional brick-and-mortar datacenters, which are normally built as one-off custom configurations.
The concept of portable modular datacenter units is pretty clever. If there are any downsides to this technology they are not readily apparent. Although this represents a relatively new approach, it appears to be distinctly superior to what’s been done in the past. Don’t be surprised to see a new modular datacenter unit being installed on a concrete pad near you in the foreseeable future.
Storage System Refresh – Making a Case for Mandatory Retirement
It’s hard to retire a perfectly good storage array. Budgets are tight, there’s a backlog of new projects in the queue, people are on vacation, and migration planning can be difficult. As long as there is not a compelling reason to take it out of service, it is far easier to simply leave it alone and focus on more pressing issues.
While this may be the path of least resistance, it can come at a high price. There are a number of good reasons why upgrading storage arrays to modern technology may yield superior results and possibly save money too!
Capacity – When your aging disk array was installed several years ago, 300 GB, 10K RPM, FC disk drives were mainstream technology. It was amazing to realize you could squeeze up to 45 TB in a single 42U equipment rack! Times have changed. The same 10K RPM disk drive has tripled in capacity, providing 900 GB in the same 3.5 inch disk drive “footprint”. It’s now possible to get 135 TB (three times the capacity) into the same equipment rack configuration. Since data center rack space currently costs around $3000 per month, that upgrade alone will dramatically increase capacity without incurring any increase in floor-space cost.
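The rack arithmetic in the capacity example is easy to verify. This short sketch uses only the figures quoted above (45 TB per rack, 300 GB and 900 GB drives, $3000 per rack per month) and treats 1 TB as 1000 GB; the function names are illustrative.

```python
def drives_in_rack(rack_capacity_tb, drive_capacity_gb):
    """Number of drives implied by a rack's raw capacity (1 TB = 1000 GB)."""
    return round(rack_capacity_tb * 1000 / drive_capacity_gb)


def rack_capacity_tb(drive_count, drive_capacity_gb):
    """Raw capacity of a rack holding the given number of drives."""
    return drive_count * drive_capacity_gb / 1000


def monthly_cost_per_tb(rack_cost_per_month, capacity_tb):
    """Floor-space cost attributed to each TB of raw capacity."""
    return rack_cost_per_month / capacity_tb


drives = drives_in_rack(45, 300)          # drive count in the old rack
new_tb = rack_capacity_tb(drives, 900)    # same slots, larger drives
print(f"{drives} drives per rack, {new_tb:.0f} TB after the upgrade")
print(f"Floor space: ${monthly_cost_per_tb(3000, 45):.2f} per TB-month before, "
      f"${monthly_cost_per_tb(3000, new_tb):.2f} after")
```

The rack costs the same either way, so the floor-space cost per TB falls to one third of what it was.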
Density – Previous generation arrays packaged from (12) to (15) 3.5 inch FC or SATA disk drives into a single rack-mountable 4U array. Modern disk arrays support from (16) 3.5 inch disks per 3U tray, to (25) 2.5 inch disks in a 2U tray. Special ultra-high density configurations may house up to (60) FC, SAS, or SATA disk drives in a 4U enclosure. As above, increasing storage density within an equipment rack significantly increases capacity while requiring no additional data center floor-space.
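To put the density figures on a common footing, the sketch below converts each enclosure type into drives per rack unit and drives per 42U rack. It makes the simplifying assumption that the whole rack is filled with disk trays, with no space set aside for controllers, switches, or power distribution, so real-world counts will be somewhat lower.

```python
RACK_UNITS = 42

# (description, drives per enclosure, enclosure height in U), from the text above
ENCLOSURES = [
    ("legacy 3.5 inch, 4U", 15, 4),
    ("modern 3.5 inch, 3U", 16, 3),
    ("modern 2.5 inch, 2U", 25, 2),
    ("ultra-high density, 4U", 60, 4),
]


def drives_per_unit(drives, height_u):
    """Drive density expressed per rack unit."""
    return drives / height_u


def drives_per_rack(drives, height_u, rack_units=RACK_UNITS):
    """Drives in a rack filled with whole enclosures of one type."""
    return (rack_units // height_u) * drives


for label, drives, height in ENCLOSURES:
    print(f"{label:<24} {drives_per_unit(drives, height):5.2f} drives/U, "
          f"{drives_per_rack(drives, height):4d} drives/rack")
```

Note that the legacy case works out to 150 drives per rack, which lines up with the 45 TB of 300 GB drives mentioned in the capacity example.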
Energy Efficiency – Since the EPA’s IT energy efficiency study in 2007 (Report to Congress on Server and Data Center Energy Efficiency, Public Law 109-431), IT manufacturers have increased efforts to improve the energy efficiency of their products. This has resulted in disk drives that consume from 25% to 33% less energy, and storage array controllers lowering power consumption by up to 30%. That has had a significant impact on energy costs, including not only the power to run the equipment, but also power to operate the cooling systems needed to purge residual heat from the environment.
Controller Performance – Storage array controllers are little more than specialized servers designed specifically to manage such functions as I/O ports, disk mapping, RAID and cache operations, and execution of array-centric internal applications (such as thin provisioning and snapshots). Like any other server, storage controllers have benefited from advances in technology over the past few years. The current generation of disk arrays contains storage controllers with 3 to 5 times the processing power of their predecessors.
Driver Compatibility – As newer technologies emerge, they tend to focus on developing software compatibility with the most recently released products and systems on the market. With the passage of time, it becomes less likely for storage arrays to be supported by the latest and greatest technology on the market. This may not impact daily operations, but it creates challenges when a need arises to integrate aging arrays with state-of-the-art systems.
Reliability – Common wisdom used to be that disk failure characteristics could be accurately represented by a “bathtub graph”. The theory was that the potential for failure was high when a disk was new, flattened out at a low probability throughout the disk’s useful life, then took a sharp upswing as the disk approached end-of-life. This model implied that extending disk service life had no detrimental effects until the disks approached end-of-life.
However, over the past decade, detailed studies by Google and other large organizations with massive disk farms have shown the “bathtub graph” model to be incorrect. Actual failure rates in the field indicate the probability of a disk failure increases by 10% – 20% for every year the disk is in service. In other words, the probability of failure climbs steadily over the disk’s service life rather than staying flat, so extending disk service life greatly increases the risk of disk failure.
Service Contracts – Many popular storage arrays are covered by standard three-year warranties. This creates a dilemma, since the useful service life of most storage equipment is considered to be either four or five years. When the original warranty expires, companies must decide whether to extend the existing support contract (at a significantly higher cost), or transition to a time & materials basis for support (which can result in some very costly repairs).
Budgetary Impact – For equipment like disk arrays, it is far too easy to fixate on replacement costs (CAPEX), and ignore ongoing operational expenses (OPEX). Deferring replacement may avoid large upfront expenditures, but it slowly bleeds the IT budget to death by forcing it to maintain increasingly inefficient, fault-prone, and power-hungry equipment.
The solution is to establish a program of rolling equipment replenishment on a four- or five-year cycle. By regularly upgrading 20% to 25% of all systems each year, the IT budget is more manageable, equipment failures are controlled, and technical obsolescence remains in check.
Getting rid of familiar things can be difficult. But unlike your favorite slippers, the LazyBoy recliner, or your special coffee cup, keeping outdated storage arrays in service well beyond their prime can cost your organization plenty.
Tape vs. Disk – It’s Time for a Truce
Over the past couple of years we’ve heard an enthusiastic debate over whether Tape is dead, obsolete, or simply relegated to some secondary role like regulatory compliance or litigation response. A lot of the “uproar” has been caused by highly vocal deduplication companies marketing their products by creating fear, uncertainty, and doubt (FUD) among users. And why not, since most of these vendors do not have a significant presence in the tape backup market and therefore little to lose?
However, there are companies with legitimate concerns about managing their backup process and the direction they should take. They need to understand what the facts are and why one approach would be superior to another.
With this goal in mind, let’s take a look at two popular backup & recovery devices – the LTO-5 tape drive and the 3.0 TB SATA disk, and see how they compare.
|Specification||LTO-5 Tape Drive||3.0 TB SATA Disk|
|Average drive cost-per-unit||$1,450||$495|
|Typical media cost-per-unit||$54 per tape||N/A|
|Native formatted capacity||1500 GB||3000 GB|
|Native sustained transfer rate||140 MB/s||155 MB/s|
|Data buffer size||256 MB||64 MB|
|Average file access time||56 sec.||12.16 ms|
|Interfaces available||6 Gb/s SAS||6 Gb/s SAS|
|Typical duty cycle||8-hrs/day||24-hrs/day|
|AES 256-bit encryption|
|Power consumption – Idle||6.7 Watts||7.4 Watts|
|Power consumption – Typical||23.1 Watts||11.3 Watts|
|Drive MTBF||50,000 hours at 100% duty cycle||1,200,000 hours|
|Non-recoverable Error Rate||1 in 1 × 10^17 bits||1 sector per 1 × 10^15 bits|
|5-year total for 1.0 Petabyte||$37,495||$165,330|
No surprises here. Simply doing the math indicates that storing 1 Petabyte of data for 5 years would cost more than four times as much on spinning disk as on tape media. Granted there are other factors involved in the process, but most offset each other. Both a tape library and a disk array take data center floor space and infrastructure resources. Both consume power and require cooling. Each system must be managed by skilled IT specialists. Deduplication may reduce disk capacity requirements (reducing cost) but so will tape compression and/or increasing the tape drive’s duty cycle from 8 to 12 hours per day. Surprisingly the only major variable over time is the cost of the media, which is heavily weighted in favor of tape.
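The arithmetic behind that bottom line can be approximated with a short sketch. It assumes decimal units (1 Petabyte = 1,000,000 GB), a single tape drive, whole units of media, and no power or library costs; the function names are illustrative only. Under those assumptions the disk figure reproduces the table's $165,330 exactly, while the tape figure comes out at $37,468, slightly under the table's $37,495, so the original calculation evidently included a small additional item not shown here.

```python
import math

# Assumption: 1.0 Petabyte expressed in decimal gigabytes.
CAPACITY_TO_STORE_GB = 1_000_000


def tape_cost(drive_price=1450, tape_price=54, tape_capacity_gb=1500, drives=1):
    """Cost of the drive(s) plus enough LTO-5 cartridges to hold the data."""
    tapes = math.ceil(CAPACITY_TO_STORE_GB / tape_capacity_gb)
    return drives * drive_price + tapes * tape_price


def disk_cost(disk_price=495, disk_capacity_gb=3000):
    """Cost of enough 3.0 TB SATA disks to hold the data (no RAID overhead)."""
    disks = math.ceil(CAPACITY_TO_STORE_GB / disk_capacity_gb)
    return disks * disk_price


tape_total = tape_cost()
disk_total = disk_cost()
print(f"Tape: ${tape_total:,}")
print(f"Disk: ${disk_total:,}")
print(f"Disk costs {disk_total / tape_total:.1f}x as much as tape")
```

Changing `tape_capacity_gb` or `disk_capacity_gb` makes it easy to see how newer media generations shift the ratio.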
In the foreseeable future the 4TB SATA disk will make the above calculations somewhat more favorable for the disk drive. However, we expect to see the LTO-6 tape drive in production in the second half of 2012, increasing the tape drive’s sustained transfer rate by 30% and tape media capacity by 47%. This will bring the above tape vs. disk comparison back into close alignment.
The sensible strategy is to develop a backup and recovery system that incorporates both technologies, to capitalize on the strengths of both. Using disk “pools” to aggregate nightly backups (whether deduplicated or not) ensures backup windows can be met, and greatly improves data restoration time. Backing up directly to tape from the “disk pools” allows streaming data to be sustained for maximum performance and transfers data to the lowest-cost media available for long-term archiving, disaster recovery, regulatory compliance, and litigation response.
It’s time to put this argument to bed. Both tape drives and SATA disk should play a role in a well-designed, highly optimized backup and recovery system. The “war” is over, and for once both combatants won!
16 Gbps Fibre Channel – Do the Benefits Outweigh the Cost?
With today’s technology there can be no status quo. As the IT industry advances, so must each organization’s efforts to embrace new equipment, applications, and approaches. Without an ongoing process of improvement, IT infrastructures progressively become outdated and the business group they support grows incrementally less effective.
In September of 2010, the INCITS T11.2 Committee ratified the standard for 16Gbps Fibre Channel, ushering in the next generation of SAN fabric. Unlike Ethernet, Fibre Channel is designed for one specific purpose – low overhead transmission of block data. While this capability may be less important for smaller requirements where convenience and simplicity are paramount, it is critical for larger datacenters where massive storage repositories must be managed, migrated, and protected. For this environment, 16Gbps offers more than twice the bandwidth of the current 8Gbps SAN and roughly 60% more bandwidth than the recently released 10Gbps Ethernet with FCoE (Fibre Channel over Ethernet).
But is an investment in 16Gbps Fibre Channel justified? If a company has reached a point where SAN fabric is approaching saturation or SAN equipment is approaching retirement, then definitely yes! Here is how 16Gbps stacks up against both slower Fibre Channel implementations and 10Gbps Ethernet.
|Adapter||Port Speed||Protocol||Average HBA/NIC Price||Transfer Rate||Transfer Time for 10TB||Bandwidth Cost ($ per MB/sec.)||Bandwidth Relative to 10Gbps Ethernet|
|LPe16002||16 Gbps||Fibre Channel||$1,808||1939 MB/sec.||1.43 Hrs.||$0.93||160%|
|OCe11102||10 Gbps||Ethernet||$1,522||1212 MB/sec.||2.29 Hrs.||$1.26||100%|
|LPe12002||8 Gbps||Fibre Channel||$1,223||800 MB/sec.||3.47 Hrs.||$1.53||65%|
|LPe11000||4 Gbps||Fibre Channel||$891||400 MB/sec.||6.94 Hrs.||$2.23||32%|
This table highlights several differences between 4/8/16 Gbps Fibre Channel and 10Gbps Ethernet with FCoE technology (sometimes marketed as Unified Storage). The street prices from a popular I/O controller manufacturer clearly indicate there are relatively small differences between controller prices, particularly for the faster controllers. Although the 16Gbps HBA completes the same transfer in roughly 40% less time than the 10Gbps adapter, it costs only about 19% more!
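The derived columns in the table can be reproduced with a few lines of arithmetic. The sketch below takes the street prices and sustained rates straight from the table; the function names and the decimal conversion (1 TB = 1,000,000 MB) are assumptions made for illustration. At these rates, the hour figures listed in the table correspond to moving 10 TB of data; a single terabyte takes one tenth as long.

```python
# Street price (USD) and sustained transfer rate (MB/sec.) from the table.
ADAPTERS = {
    "16 Gbps Fibre Channel": (1808, 1939),
    "10 Gbps Ethernet": (1522, 1212),
    "8 Gbps Fibre Channel": (1223, 800),
    "4 Gbps Fibre Channel": (891, 400),
}


def cost_per_mb_per_sec(price_usd, rate_mb_s):
    """Adapter price divided by the bandwidth it delivers."""
    return price_usd / rate_mb_s


def transfer_hours(size_tb, rate_mb_s):
    """Hours needed to move size_tb terabytes (decimal units assumed)."""
    megabytes = size_tb * 1_000_000
    return megabytes / rate_mb_s / 3600


def relative_bandwidth(rate_mb_s, baseline_mb_s=1212):
    """Bandwidth as a fraction of the 10 Gbps Ethernet baseline."""
    return rate_mb_s / baseline_mb_s


for name, (price, rate) in ADAPTERS.items():
    print(
        f"{name:22s} ${cost_per_mb_per_sec(price, rate):.2f} per MB/sec.  "
        f"{transfer_hours(10, rate):.2f} hrs for 10 TB  "
        f"{relative_bandwidth(rate):.0%} of 10 Gbps Ethernet"
    )
```

Dividing price by delivered bandwidth, rather than comparing list prices alone, is what makes the faster adapters look like the better value.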
However, a far more important issue is that 16Gbps fibre channel is backward compatible with existing 4/8 Gbps SAN equipment. This allows segments of the SAN to be gradually upgraded to leading-edge technology without having to suffer the financial impact of legacy equipment rip-and-replace approaches.
In addition to providing a robust, purpose-built infrastructure for migrating large blocks of data, it also offers lower power consumption per port, a simplified cabling infrastructure, and the ability to “trunk” (combine) channel bandwidth up to 128Gbps! It doubles the number of ports and available bandwidth in the same 4U rack space for edge switches, providing the potential for a saving of over $3300 per edge switch.
Even more significant is that 16Gbps provides the additional performance necessary to support the next generation of storage, which will be based on 6Gbps and 12Gbps SAS disk drives. Unlike legacy FC storage, which was based upon 4Gbps FC-AL arbitrated loops, the new SAS arrays are on switched connections. Switching provides a point-to-point connection for each disk drive, ensuring every 6Gbps SAS connection (or in the near future, 12Gbps SAS connection) will have a direct connection to the SAN fabric. This eliminates backend saturation of legacy array FC-AL shared busses, and will place far greater demand for storage channel performance on the SAN fabric.
So do the benefits of 16Gbps Fibre Channel outweigh its modest price premium? Like many things in life – it depends! Block-based 16Gbps Fibre Channel SAN fabric is not for every storage requirement, but neither is Ethernet-based 10Gbps FCoE or iSCSI. If it is a departmental storage requirement or an environment where NAS or iSCSI has previously been deployed, then replacing the incumbent protocol with 16Gbps Fibre Channel may or may not have merit. However, large SAN storage arrays are particularly dependent on high performance equipment specifically designed for efficient data transfers. This is an arena where the capabilities and attributes of 16Gbps Fibre Channel will shine.
In any case, the best protection against making a poor choice is to thoroughly research the strengths and weaknesses of each technology and seek out professional guidance from a vendor-neutral storage expert with a Subject Matter Expert level understanding of the storage industry and its technology.
Boot-from-SAN gives Internal Disk the Boot!
It is somewhat surprising just how many skilled IT specialists still shy away from replacing traditional internal boot disks with a Boot-from-SAN process. I realize old habits die hard and there’s something reassuring about having the O/S find the default boot-block without needing human intervention. However, the price organizations pay for this convenience is not justifiable. It simply adds waste, complexity, and unnecessary expense to their computing environment.
Traditionally servers have relied on internal disk for initiating their boot-up processes. At start-up, the system BIOS executes a self-test, starts primitive services like the video output and basic I/O operations, then goes to a pre-defined disk block where the MBR (Master Boot Record) is located. For most systems, the Stage 1 Boot Loader resides on the first block of the default disk drive. The BIOS loads this data into system memory, which then continues to load Stage 2 Boot instructions and ultimately start the Operating System.
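To make the "first block" concrete, here is a minimal sketch of how the classic BIOS/MBR sector is laid out: 446 bytes of Stage 1 boot code, four 16-byte partition entries, and the two-byte signature 0x55 0xAA that the BIOS checks before handing over control. The helper names `parse_mbr` and `build_demo_sector` are hypothetical, and the demo sector is synthetic (one bootable partition of type 0x83) rather than read from a real disk. UEFI/GPT systems use a different layout that this sketch does not cover.

```python
import struct

SECTOR_SIZE = 512
BOOT_CODE_SIZE = 446          # Stage 1 boot loader code
PARTITION_ENTRY_SIZE = 16     # four entries follow the boot code
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR


def parse_mbr(sector: bytes) -> dict:
    """Split a 512-byte MBR into boot code and non-empty partition entries."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    partitions = []
    for index in range(4):
        start = BOOT_CODE_SIZE + index * PARTITION_ENTRY_SIZE
        entry = sector[start:start + PARTITION_ENTRY_SIZE]
        status, ptype = entry[0], entry[4]
        first_lba, sector_count = struct.unpack_from("<II", entry, 8)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append({
                "bootable": status == 0x80,
                "type": ptype,
                "first_lba": first_lba,
                "sectors": sector_count,
            })
    return {"boot_code": sector[:BOOT_CODE_SIZE], "partitions": partitions}


def build_demo_sector() -> bytes:
    """Create a synthetic MBR with one bootable partition for illustration."""
    sector = bytearray(SECTOR_SIZE)
    sector[446:462] = struct.pack(
        "<B3sB3sII", 0x80, b"\x00" * 3, 0x83, b"\x00" * 3, 2048, 204800
    )
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)


mbr = parse_mbr(build_demo_sector())
print(mbr["partitions"])
```

Whether that sector lives on an internal drive or on a SAN LUN makes no difference to the firmware, as long as the device is presented as bootable.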
Due to the importance of the boot process and the common practice of loading the operating system on the same disk, two disk drives in a RAID1 (disk mirroring) configuration are commonly used to ensure high availability.
Ok, so far so good. Then what’s the problem?
The problem is the disks themselves. Unlike virtually every subsystem in the server, these are electro/mechanical devices with the following undesirable issues:
- Power & Cooling – Unlike solid-state components, these devices take a disproportionately large amount of power to start and operate. A mirrored pair of 300GB, 15K RPM disks will draw around 0.25 amps of current and need 95.6 BTUs for cooling. Each system with internal disk has its own miniature “space heater” that aggravates efforts to keep sensitive solid-state components cool.
- Physical Space – Each 3.5 inch drive is 1” x 4.0” x 5.76” (or 23.04 cubic inches) in size, so a mirrored pair of disks in a server represents an obstacle of 46.08 cubic inches that requires physical space, provisions for mounting, power connections, air flow routing, and vibration dampening to reduce fatigue on itself and other internal components.
- Under-utilized Capacity – As disk drive technology continues to advance, it becomes more economical to manufacture higher capacity disk drives than maintain an inventory of lower capacity disks. Therefore servers today are commonly shipped with 300GB or 450GB boot drives. The problem is that Windows Server 2008 (or similar) needs less than 100GB of space, so at least 66% of even a 300GB disk’s capacity is wasted.
- Backup & Recovery – Initially everyone plans to keep only the O/S, patches and updates, log files, and related utilities on the boot disk. However, the local disk is far too convenient and eventually has other files “temporarily” put on it as well. Unfortunately some companies don’t include boot disks in their backup schedule, and risk losing valuable content if both disks are corrupted. (Note: RAID1 protects data from individual disk failures but not corruption.)
Boot-from-SAN does not involve a PXE or tftp boot over the network. It is an HBA BIOS setting that allows SAN disk to be recognized very early in the boot process as a valid boot device, then points the server to that location for the Stage 1 Boot Loader code. It eliminates any need for internal disk devices and moves the process to shared storage on the SAN. It also facilitates the rapid replacement of failed servers (all data and applications remain on the SAN), and is particularly useful for blade systems (where server “real-estate” is at a premium and optimal airflow is crucial).
The most common argument used against Boot-from-SAN is “what if the SAN is not available”. On the surface it sounds like a valid point, but what is the chance of that occurring with well-designed SAN storage? Why would that be any different than if the internal boot disk array failed to start? Even if the system started internally and the O/S loaded, how much work could a server do if it could not connect to the SAN? The consequences of any system failing to come up to an operational state are the same, regardless of whether it uses a Boot-from-SAN process or boots up from internal disks.
For a handful of servers, this may not be a very big deal. However, when you consider the impact on a datacenter running thousands of servers, the problem becomes obvious. For every thousand servers, Boot-from-SAN eliminates the expense of two thousand internal disks, 240 amps of current, the need for 655,300 BTUs of cooling, and 200TB of inaccessible space. It also greatly simplifies equipment rack airflow, and measurably improves storage manageability and data backup protection.
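As a rough cross-check, the per-server figures quoted in the bullet list earlier can be scaled linearly to a thousand servers. This is only a sketch: the constant names are illustrative, and it assumes every server has one mirrored pair of 300GB disks with a 100GB operating system footprint. On that basis the wasted capacity matches the 200TB cited, while the current and cooling totals come out near 250 amps and 95,600 BTUs, which suggests the totals in the text were derived from somewhat different per-disk inputs.

```python
SERVERS = 1000

# Per-server inputs quoted in the bullet list (one mirrored pair per server).
DISKS_PER_SERVER = 2
AMPS_PER_PAIR = 0.25
BTU_PER_PAIR = 95.6
DISK_VOLUME_CUBIC_IN = 1.0 * 4.0 * 5.76
USABLE_GB_PER_PAIR = 300      # RAID1 exposes the capacity of one disk
OS_FOOTPRINT_GB = 100         # assumed upper bound for the operating system

totals = {
    "disks": SERVERS * DISKS_PER_SERVER,
    "amps": SERVERS * AMPS_PER_PAIR,
    "btu": SERVERS * BTU_PER_PAIR,
    "volume_cubic_ft": SERVERS * DISKS_PER_SERVER * DISK_VOLUME_CUBIC_IN / 1728,
    "unused_tb": SERVERS * (USABLE_GB_PER_PAIR - OS_FOOTPRINT_GB) / 1000,
}

for label, value in totals.items():
    print(f"{label:16s} {value:,.1f}")
```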
Boot-from-SAN capability is built into most modern HBA BIOSes and is supported by almost every operating system and storage array on the market. Implementing this valuable tool should measurably improve the efficiency of your data center operation.