Blog Archives
Enhanced Commodity Storage – Do You Believe in Magic?
With predictable regularity, someone surfaces on the Web claiming to have discovered a way to turn slow SATA arrays into high-performance storage. Their method usually involves layering complex, sophisticated software onto the array to reallocate and optimize system resources. While there may be a few circumstances where this works, in reality the result is usually just the opposite.
The problem with this concept is similar to the kit car world of several decades ago. At the time, kit-built sports cars were all the rage. Automobile enthusiasts were intrigued by the idea of building a phenomenal sports car by mounting a sleek fiberglass body on the chassis of a humble Volkswagen Beetle. As long as the workmanship was good, the end result could rival the appearance of a Ferrari, Ford GT-40, or Lamborghini!
However, this grand illusion disappeared the minute its proud owner started the engine. Despite its stunning appearance, the kit car was still built on top of an anemic VW bug chassis, power train, and suspension!
Today we see a similar illusion being promoted by vendors claiming to offer “commodity storage” capable of delivering the same high performance as complex SAN and NAS systems. Overly enthusiastic suppliers push the virtues of cheap “commodity” storage arrays with amazing capabilities as a differentiator in this highly competitive market. The myth is perpetuated by a general lack of understanding of underlying disk technology, combined with the pressure of shrinking IT budgets and a growing demand for storage capacity.
According to this technical fantasy, underlying hardware limitations don’t count. In theory, if you simply run a bunch of complex software functions on the storage array controllers, you somehow repeal the laws of physics and get “something for nothing”.
That sounds appealing, but it just doesn’t work that way. Like the kit car’s humble chassis, the hardware limitations of the underlying disk technology govern the array’s capabilities: throughput, reliability, scalability, and price.
• Drive Latencies – the inherent delay incurred while the read/write heads seek and the platters rotate to the requested sector varies significantly between drive classes.
For example, comparing a 300GB, 15K RPM SAS disk to a 3TB, 7200 RPM SATA disk: the SAS drive averages roughly 2 ms of rotational latency plus a typical 3–4 ms of seek time, while the SATA drive averages over 4 ms of rotational latency plus a typical 8–9 ms of seek time. The net result is that the SATA drive delivers well under half the random IOPS of its SAS counterpart (the sketch following this list shows the arithmetic).
• Controller Overhead – Masking SATA performance by adding processor capability may not be the answer either. Call it what you will – controller, SP, NAS head, or something else – a storage controller is simply a dedicated server performing specialized storage operations. This means controllers can become overburdened when multiple sophisticated applications are loaded onto them. More complex processes also mean the controller consumes additional internal resources (memory, bandwidth, cache, I/O queues, etc.). As real-time capabilities like thin provisioning, automated tiering, deduplication, and data compression are added, the array’s throughput diminishes.
• “Magic” Cache – This is another area where plenty of smoke and mirrors can be found. Regardless of the marketing hype, cache is still governed by the laws of physics and has predictable characteristics. If you put a large amount of cache in front of slow SATA disk, your systems will run really fast – as long as the requested data is already in cache. When it isn’t, the request must go out to the slow SATA disks through the same retrieval process as any other disk access. The same is true when cache is periodically flushed to disk to protect data integrity. Cache is a great tool that can significantly enhance the performance of a storage array. However, it is expensive, and it will never act as a “black box” that somehow makes slow SATA disks perform like 15K RPM SAS disks (the sketch below quantifies the effect of hit rate).
• Other Differences – Additional differentiators between “commodity storage” and high performance storage include available I/Os per second, disk latency, RAID level selected, IOPS per GB capability, MTBF reliability, and the Bit Error Rate.
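To make the latency arithmetic concrete, here is a minimal Python sketch that estimates average access time and random IOPS from basic drive mechanics, and shows how cache hit rate determines effective latency. The seek times are typical published figures for each drive class, not measurements, and the 0.1 ms cache latency is an illustrative assumption.

```python
# Estimate random-access performance from basic drive mechanics.
# Seek times below are typical published figures for each class (assumed).

def avg_access_ms(rpm: float, avg_seek_ms: float) -> float:
    """Average access time = average seek + half a revolution of rotational latency."""
    ms_per_rev = 60_000.0 / rpm
    return avg_seek_ms + ms_per_rev / 2.0

def random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Small random I/Os per second a single drive can sustain."""
    return 1000.0 / avg_access_ms(rpm, avg_seek_ms)

def effective_latency_ms(hit_rate: float, cache_ms: float, disk_ms: float) -> float:
    """Average latency seen by the host with a cache in front of the disk."""
    return hit_rate * cache_ms + (1.0 - hit_rate) * disk_ms

sas_ms = avg_access_ms(rpm=15_000, avg_seek_ms=3.5)   # ~5.5 ms
sata_ms = avg_access_ms(rpm=7_200, avg_seek_ms=8.5)   # ~12.7 ms

print(f"15K SAS  : {sas_ms:.1f} ms/access, ~{random_iops(15_000, 3.5):.0f} IOPS")
print(f"7.2K SATA: {sata_ms:.1f} ms/access, ~{random_iops(7_200, 8.5):.0f} IOPS")

# Cache only hides SATA latency while requests hit; every miss pays full price.
for hit in (0.95, 0.80, 0.50):
    print(f"hit rate {hit:.0%}: effective latency "
          f"{effective_latency_ms(hit, 0.1, sata_ms):.2f} ms")
```

The numbers explain why cached demos look so good: at a 95% hit rate the effective latency is excellent, but as the working set outgrows the cache and the hit rate falls, the average request drifts back toward raw SATA latency.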
When citing the benefits of “tricked out” commodity storage, champions of this approach usually point to obscure white papers written by social media providers, universities, and research labs. These may make interesting reading, but they seldom have much in common with production IT operations in “the real world”. Most universities and research labs struggle with restricted funding and must turn to highly creative (and sometimes unusual) methods to coax specific functions from less-than-optimal equipment. Large social media providers seldom suffer from budget constraints, but they create non-standard solutions to meet highly specialized, stable, and predictable usage scenarios. These may illustrate interesting uses of technology, but they have little value for mainstream IT operations.
As with most things in life, “you can’t get something for nothing”, and the idea of somehow enhancing commodity storage to meet all enterprise data requirements is no exception.
Tape vs. Disk – It’s Time for a Truce
Over the past couple of years we’ve heard an enthusiastic debate over whether tape is dead, obsolete, or simply relegated to some secondary role like regulatory compliance or litigation response. A lot of the “uproar” has been caused by highly vocal deduplication companies marketing their products by creating fear, uncertainty, and doubt (FUD) among users. And why not, since most of these vendors have no significant presence in the tape backup market and therefore have little to lose?
However, there are companies with legitimate concerns about managing their backup process and the direction they should take. They need to understand what the facts are and why one approach would be superior to another.
With this goal in mind, let’s take a look at two popular backup & recovery devices – the LTO-5 tape drive and the 3.0 TB SATA disk, and see how they compare.
| Specification | Quantum internal half-height LTO-5 | Seagate Constellation ES.2 ST33000651SS |
|---|---|---|
| Average drive cost per unit | $1,450 | $495 |
| Typical media cost per unit | $54 per tape | N/A |
| Native formatted capacity | 1,500 GB | 3,000 GB |
| Native sustained transfer rate | 140 MB/s | 155 MB/s |
| Data buffer size | 256 MB | 64 MB |
| Average file access time | 56 sec. | 12.16 ms |
| Interfaces available | 6 Gb/s SAS | 6 Gb/s SAS |
| Typical duty cycle | 8 hrs/day | 24 hrs/day |
| Encryption | AES 256-bit | AES 256-bit |
| Power consumption – idle | 6.7 W | 7.4 W |
| Power consumption – typical | 23.1 W | 11.3 W |
| Drive MTBF | 50,000 hours (100% duty cycle) | 1,200,000 hours |
| Media MTBF | 30 years | N/A |
| Non-recoverable error rate | 1 bit per 1×10^17 bits read | 1 sector per 1×10^15 bits read |
| Warranty | 3 years | 3 years |
| Storage cost per GB | $0.036 | $0.165 |
| 5-year total for 1.0 petabyte | $37,495 | $165,330 |
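Before turning to cost, the error-rate row is worth quantifying. The sketch below estimates the probability of hitting at least one unrecoverable error while reading a full unit of media, treating each published rate as a uniform per-bit event rate (a simplification) and using a Poisson approximation.

```python
import math

def p_unrecoverable(bytes_read: float, bits_per_error: float) -> float:
    """Poisson approximation: P(at least one error) = 1 - exp(-expected events)."""
    expected_events = bytes_read * 8.0 / bits_per_error
    return 1.0 - math.exp(-expected_events)

TB = 1e12  # decimal terabyte, in bytes

# Full read of a 3 TB SATA drive (spec: ~1 sector per 1e15 bits read):
print(f"full 3 TB SATA read : {p_unrecoverable(3.0 * TB, 1e15):.1%}")

# Full read of a 1.5 TB LTO-5 cartridge (spec: ~1 per 1e17 bits read):
print(f"full LTO-5 cartridge: {p_unrecoverable(1.5 * TB, 1e17):.3%}")
```

That works out to a little over 2% per full read of the SATA drive versus roughly 0.01% per cartridge, which is why the bit error rate matters for long-term retention and for large RAID rebuilds.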
No surprises here. Simply doing the math indicates that storing 1 petabyte of data for 5 years would cost more than four times as much on spinning disk as on tape media. Granted, there are other factors involved in the process, but most offset each other. Both a tape library and a disk array occupy data center floor space and consume infrastructure resources. Both draw power and require cooling. Each system must be managed by skilled IT specialists. Deduplication may reduce disk capacity requirements (reducing cost), but so will tape compression and/or increasing the tape drive’s duty cycle from 8 to 12 hours per day. Surprisingly, the only major variable over time is the cost of the media, which weighs heavily in favor of tape.
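For readers who want to check the math, here is a quick sketch of the media-cost arithmetic from the table. The drive and cartridge counts are illustrative assumptions (one tape drive, no disk enclosure or controller costs), so the totals land close to, rather than exactly on, the table’s figures.

```python
import math

PB_GB = 1_000_000  # one decimal petabyte, in gigabytes

# Tape: $54 per LTO-5 cartridge holding 1,500 GB native.
tape_per_gb = 54 / 1500                 # $0.036/GB
cartridges = math.ceil(PB_GB / 1500)    # 667 cartridges
tape_total = cartridges * 54 + 1_450    # media plus one drive (assumption)

# Disk: $495 per 3,000 GB drive.
disk_per_gb = 495 / 3000                # $0.165/GB
drives = math.ceil(PB_GB / 3000)        # 334 drives
disk_total = drives * 495

print(f"tape: ${tape_per_gb:.3f}/GB, ~${tape_total:,} per PB")
print(f"disk: ${disk_per_gb:.3f}/GB, ~${disk_total:,} per PB")
print(f"disk costs {disk_total / tape_total:.1f}x more than tape")
```

Running it gives roughly $37,000 for tape versus $165,330 for disk, a ratio of about 4.4x, in line with the table.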
In the foreseeable future the 4TB SATA disk will make the above calculations somewhat more favorable for the disk drive. However, we expect to see the LTO-6 tape drive in production in the second half of 2012, increasing the tape drive’s sustained transfer rate by 30% and tape media capacity by 47%. This will bring the above tape vs. disk comparison back into close alignment.
The sensible strategy is to develop a backup and recovery system that incorporates both technologies, to capitalize on the strengths of both. Using disk “pools” to aggregate nightly backups (whether deduplicated or not) ensures backup windows can be met, and greatly improves data restoration time. Backing up directly to tape from the “disk pools” allows streaming data to be sustained for maximum performance and transfers data to the lowest-cost media available for long-term archiving, disaster recovery, regulatory compliance, and litigation response.
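As a sanity check on the disk-pool design, here is a small sketch of the backup-window arithmetic using the sustained rates from the table. The 10 TB nightly dataset and the drive counts are hypothetical.

```python
# Backup-window arithmetic using sustained rates from the comparison table.
# The nightly dataset size and drive counts are hypothetical.

def hours_to_move(dataset_gb: float, mb_per_sec: float, streams: int = 1) -> float:
    """Hours to move a dataset at a sustained per-stream rate across N streams."""
    seconds = dataset_gb * 1000.0 / (mb_per_sec * streams)
    return seconds / 3600.0

nightly_gb = 10_000  # hypothetical 10 TB nightly backup

# Stage 1: land the backup on a disk pool (e.g., striped across 8 SATA drives).
print(f"to disk pool (8 x 155 MB/s): {hours_to_move(nightly_gb, 155, 8):.1f} h")

# Stage 2: stream from the pool to tape during the day (2 LTO-5 drives).
# Sequential reads from the pool keep the tape drives streaming at full rate,
# avoiding the start/stop behavior that random small-file backups cause.
print(f"pool to tape (2 x 140 MB/s): {hours_to_move(nightly_gb, 140, 2):.1f} h")
```

In this scenario the pool absorbs the nightly backup in a couple of hours, comfortably inside the window, while the tape drives drain it over the following workday at their full streaming rate.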
It’s time to put this argument to bed. Both tape drives and SATA disk should play a role in a well-designed, highly optimized backup and recovery system. The “war” is over, and for once both combatants won!