Enhanced Commodity Storage – Do You Believe in Magic?
With predictable regularity someone surfaces on the Web claiming to have discovered a way to turn slow SATA arrays into high performance storage. Their method usually involves adding complex and sophisticated software to reallocate and optimize system resources. While there may be a few circumstances where this might work, in reality it is usually just the opposite.
The problem with this concept is similar to the kit car world several decades ago. At the time, kit-build sports cars were all the rage. Automobile enthusiasts were intrigued by the idea of building a phenomenal sports car by mounting a sleek fiberglass body on the chassis of a humble Volkswagen Beetle. Done properly, the results were amazing! As long as their workmanship was good, the end results would rival the appearance of a Ferrari, Ford GT-40, or Lamborghini!
However, this grand illusion disappeared the minute its proud owner started the engine. Despite its stunning appearance, the kit car was still built on top of an anemic VW bug chassis, power train, and suspension!
Today we see a similar illusion being promoted by vendors claiming to offer “commodity storage” capable of delivering the same high performance as complex SAN and NAS systems. Overly enthusiastic suppliers push the virtues of cheap “commodity” storage arrays with amazing capabilities as a differentiator in this highly competitive market. The myth is perpetuated within the industry by a general lack of understanding of the underlying disk technology characteristics, and a desperate need to manage shrinking IT budgets, coupled with a growing demand for storage capacity.
According to this technical fantasy, underlying hardware limitations don’t count. In theory, if you simply run a bunch of complex software functions on the storage array controllers, you somehow repeal the laws of physics and get “something for nothing”.
That sounds appealing, but it unfortunately just doesn’t work that way. Like the kit car’s Achilles heel, hardware limitations of underlying disk technology govern the array’s capabilities, throughput, reliability, scalability, and price.
• Drive Latencies – the inherent latency incurred to move read/write heads and rotate disks until the appropriate sector address is available can vary significantly.
For example, comparing performance of a 300GB, 15K RPM SAS disk to a 3TB 7200 RPM SATA disk produces the following results:
• Controller Overhead – Masking SATA performance by adding processor capabilities may not be the answer either. Call it what you will – Controller, SP, NAS head, or something else. A storage controller is simply a dedicated server performing specialized storage operations. This means controllers can become overburdened by loading multiple sophisticated applications on them. More complex processes also mean the controller consumes additional internal resources (memory, bandwidth, cache, I/O queues, etc.). As real-time capabilities like thin provisioning, automated tiering, deduplication and data compression applications are added, the array’s throughput will diminish.
• “Magic” Cache – This is another area where lots of smoke-and-mirrors can be found. Regardless of the marketing hype, cache is still governed by the laws of physics and has predictable characteristics. If you put a large amount of cache in front of slow SATA disk, your systems will run really fast – as long as requested data is already located in cache. When it isn’t, you must go out to slow SATA disk and utilize the same data retrieval process as every disk access. The same is true when cache is periodically flushed to disk to protect data integrity. Cache is a great tool that can significantly enhance the performance of a storage array. However, it is expensive, and will never act as a “black box” that somehow makes slow SATA disk perform like 15K RPM SAS disks.
• Other Differences – Additional differentiators between “commodity storage” and high performance storage include available I/Os per second, disk latency, RAID level selected, IOPS per GB capability, MTBF reliability, and the Bit Error Rate.
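The latency and cache points above can be put into rough numbers. The following is a minimal sketch, not a benchmark: the rotational latency formula (half a revolution on average) is standard, but the cache hit time, the disk miss time, and the function names are assumptions chosen purely for illustration.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds: the time for half a revolution."""
    return 0.5 * 60_000.0 / rpm


def effective_access_ms(hit_ratio: float, cache_ms: float, disk_ms: float) -> float:
    """Expected access time when a fraction `hit_ratio` of requests is served from cache."""
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


if __name__ == "__main__":
    for label, rpm in (("15K RPM SAS", 15_000), ("7200 RPM SATA", 7_200)):
        latency = avg_rotational_latency_ms(rpm)
        print(f"{label}: {latency:.2f} ms average rotational latency")

    # Assumed figures, for illustration only: 0.1 ms per cache hit,
    # 12 ms per miss that has to go out to a SATA disk.
    for hit_ratio in (0.99, 0.90, 0.50):
        access = effective_access_ms(hit_ratio, cache_ms=0.1, disk_ms=12.0)
        print(f"cache hit ratio {hit_ratio:.0%}: {access:.2f} ms expected access time")
```

Even with these made-up timings the pattern is clear: as the hit ratio falls, the expected access time climbs quickly toward the speed of the underlying disk.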
When citing the benefits of “tricked out” commodity storage, champions of this approach usually point to obscure white papers written by social media providers, universities, and research labs. These may serve as interesting reading, but seldom have much in common with production IT operations and “the real world”. Most universities and research labs struggle with restricted funding, and must turn to highly creative (and sometimes unusual) methods to achieve specific functions from less-than-optimal equipment. Large social media providers seldom suffer from budget constraints, but create non-standard solutions to meet highly specialized, stable, and predictable user scenarios. This may illustrate an interesting use of technology, but has little value for mainstream IT operations.
As with most things in life, “you can’t get something for nothing”, and the idea of somehow enhancing commodity storage to meet all enterprise data requirements is no exception.
SAN Fabric for the Next Generation
There’s a quiet revolution going on in large data centers. It’s not as visible or flashy as virtualization or deduplication, but it is at least equal in importance.
As its name implies, SAN “fabric” is a dedicated network that allows servers, storage arrays, backup & recovery systems, replication devices, and other equipment to pass data between systems. Traditionally this has consisted of 4Gbps Fibre Channel and 1Gbps Ethernet channels. However, a new family of 8Gbps and 16Gbps Fibre Channel, 6Gbps and 12Gbps SAS, and 10Gbps Ethernet links is quietly replacing legacy fabric with links capable of 2 – 4 times the performance.
The following is a comparison of the maximum throughput rates of various SAN fabric links:
Performance ranges from the relatively outdated 1Gbps channel (Ethernet or FC) capable of supporting data transfers of up to 100 MB per second, to 16Gbps Fibre Channel capable of handling 1940 MB per second. Since all are capable of full duplex (bi-directional) operations, the sustainable throughput rate is actually twice the speed indicated in the chart. If these blazing new speeds are still insufficient, 10Gbps Ethernet, 12Gbps SAS, and 16Gbps Fibre Channel can be “trunked” – bundled together to produce an aggregate bandwidth equal to the number of individual channels tied together. (For example, eight 16Gbps FC channels can be bundled to create a 128Gbps “trunk”.)
In addition to high channel speeds, 10Gbps Ethernet and 16Gbps Fibre Channel both implement a 64b/66b encoding scheme, rather than the 8b/10b encoding scheme used by lower performance channels. The encoding process improves the quality of the data transmission, but at a cost. An 8b/10b encoding process decreases available bandwidth by 20%, while 64b/66b encoding only reduces bandwidth by 3.03%. This significantly increases data transfer efficiency.
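The encoding arithmetic above is easy to check. The sketch below is a simplification that treats each link's nominal speed as its raw line rate (real signaling rates differ somewhat from the marketing names), and the function names are my own. Under that assumption it reproduces the throughput figures quoted in this article.

```python
from fractions import Fraction

# Payload bits carried per transmitted bit for each encoding scheme.
ENCODINGS = {
    "8b/10b": Fraction(8, 10),
    "64b/66b": Fraction(64, 66),
}


def overhead_percent(encoding: str) -> float:
    """Share of the raw bandwidth consumed by the encoding, in percent."""
    return float((1 - ENCODINGS[encoding]) * 100)


def payload_mb_per_sec(nominal_gbps: int, encoding: str) -> float:
    """One-direction payload rate in MB/sec.

    Assumption: the nominal speed is used as the raw line rate.
    """
    megabits_per_sec = nominal_gbps * 1000 * ENCODINGS[encoding]
    return float(megabits_per_sec / 8)


def trunk_gbps(links: int, nominal_gbps: int) -> int:
    """Aggregate nominal bandwidth of several identical links bundled together."""
    return links * nominal_gbps


if __name__ == "__main__":
    for name in ENCODINGS:
        print(f"{name}: {overhead_percent(name):.2f}% overhead")
    for gbps, enc in ((1, "8b/10b"), (4, "8b/10b"), (8, "8b/10b"),
                      (10, "64b/66b"), (16, "64b/66b")):
        print(f"{gbps:>2} Gbps ({enc}): {payload_mb_per_sec(gbps, enc):.0f} MB/sec")
    print(f"Eight 16Gbps links trunked: {trunk_gbps(8, 16)} Gbps")
```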
While 8/16Gbps Fibre Channel and 10Gbps Ethernet are changing the game at the front-end, SAS is revolutionizing the back-end disk drive connections as well. For over a decade, enterprise-grade disks had 2Gbps or 4Gbps ports, and were attached to a Fibre Channel Arbitrated Loop (FC-AL). As with any loop-based technology, low traffic enjoyed maximum speed but performance dropped off as demand increased. Under heavy load conditions, the back-end bus could become a bottleneck.
SAS will change that for two reasons. First, it uses switched technology, so every device attached to the controller “owns” 100% of the bus bandwidth. The latency “dog leg pattern” found on busy FC-AL busses is eliminated. Second, current SAS drives are shipping with 6Gbps ports, which are 50% faster than 4Gbps Fibre Channel. Just over the horizon are 12Gbps SAS speeds that will offer three times the bandwidth of 4Gbps Fibre Channel (a 200% increase), and do it over switched (isolated) channels.
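The difference between a shared loop and switched links can be illustrated with a deliberately crude model. It assumes that active devices on a loop split its bandwidth evenly, and it ignores arbitration overhead, controller limits, and the fact that a single drive rarely saturates its port. The function names are hypothetical.

```python
def per_device_gbps_shared_loop(loop_gbps: float, active_devices: int) -> float:
    """Bandwidth available to each active device when all of them share one loop.

    Assumption: the loop's bandwidth is divided evenly among active devices.
    """
    if active_devices < 1:
        raise ValueError("active_devices must be at least 1")
    return loop_gbps / active_devices


def per_device_gbps_switched(port_gbps: float, active_devices: int) -> float:
    """Bandwidth available to each device on its own switched link.

    The result does not depend on how many other devices are busy.
    """
    if active_devices < 1:
        raise ValueError("active_devices must be at least 1")
    return port_gbps


if __name__ == "__main__":
    for busy in (1, 4, 16):
        loop = per_device_gbps_shared_loop(4, busy)
        switched = per_device_gbps_switched(6, busy)
        print(f"{busy:>2} busy drives: 4Gbps loop {loop:.2f} Gbps each, "
              f"6Gbps switched SAS {switched:.2f} Gbps each")
```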
Recent improvements in fabric performance will support emerging SSD technology, and allow SANs to gracefully scale to support storage arrays staggering under a growth rate of 40% – 50% per year.
16 Gbps Fibre Channel – Do the Benefits Outweigh the Cost?
With today’s technology there can be no status quo. As the IT industry advances, so must each organization’s efforts to embrace new equipment, applications, and approaches. Without an ongoing process of improvement, IT infrastructures progressively become outdated and the business group they support grows incrementally less effective.
In September of 2010, the INCITS T11.2 Committee ratified the standard for 16Gbps Fibre Channel, ushering in the next generation of SAN fabric. Unlike Ethernet, Fibre Channel is designed for one specific purpose – low overhead transmission of block data. While this capability may be less important for smaller requirements where convenience and simplicity are paramount, it is critical for larger datacenters where massive storage repositories must be managed, migrated, and protected. For this environment, 16Gbps offers more than twice the bandwidth of the current 8Gbps SAN and 60% more bandwidth than the recently released 10Gbps Ethernet with FCoE (Fibre Channel over Ethernet).
But is an investment in 16Gbps Fibre Channel justified? If a company has reached a point where SAN fabric is approaching saturation or SAN equipment is approaching retirement, then definitely yes! Here is how 16Gbps stacks up against both slower Fibre Channel implementations and 10Gbps Ethernet.
|Adapter Model||Port Speed||Protocol||Average HBA/NIC Price||Transfer Rate||Transfer Time for 10TB||Price per MB/sec.||Bandwidth vs. 10Gbps Ethernet|
|LPe16002||16 Gbps||Fibre Channel||$1,808||1939 MB/sec.||1.43 Hrs.||$0.93||160%|
|OCe11102||10 Gbps||Ethernet||$1,522||1212 MB/sec.||2.29 Hrs.||$1.26||100%|
|LPe12002||8 Gbps||Fibre Channel||$1,223||800 MB/sec.||3.47 Hrs.||$1.53||65%|
|LPe11000||4 Gbps||Fibre Channel||$891||400 MB/sec.||6.94 Hrs.||$2.23||32%|
This table highlights several differences between 4/8/16 Gbps Fibre Channel and 10Gbps Ethernet with FCoE technology (sometimes marketed as Unified Storage). The street prices for a popular I/O controller manufacturer clearly indicate there are relatively small differences between controller prices, particularly for the faster controllers. Although the 16Gbps HBA is roughly 60% quicker than the 10Gbps adapter, it is only about 19% more expensive!
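For anyone who wants to rerun the comparison with their own prices, here is a small sketch of the arithmetic behind the derived columns. The prices and transfer rates are the ones quoted in the table. The helper names and the decimal definition of a terabyte (1 TB = 1,000,000 MB) are my assumptions. Under them, moving 10 TB at 1939 MB/sec takes roughly 1.43 hours.

```python
# (label, average adapter price in USD, transfer rate in MB/sec), from the table.
ADAPTERS = [
    ("16 Gbps Fibre Channel", 1808, 1939),
    ("10 Gbps Ethernet", 1522, 1212),
    ("8 Gbps Fibre Channel", 1223, 800),
    ("4 Gbps Fibre Channel", 891, 400),
]


def transfer_hours(data_tb: float, rate_mb_per_sec: float) -> float:
    """Hours needed to move `data_tb` terabytes (assumes 1 TB = 1,000,000 MB)."""
    return data_tb * 1_000_000 / rate_mb_per_sec / 3600


def price_per_mb_per_sec(price_usd: float, rate_mb_per_sec: float) -> float:
    """Adapter price divided by its transfer rate."""
    return price_usd / rate_mb_per_sec


def percent_more(value: float, baseline: float) -> float:
    """How much larger `value` is than `baseline`, in percent."""
    return (value / baseline - 1) * 100


if __name__ == "__main__":
    baseline_rate = 1212  # 10 Gbps Ethernet
    for label, price, rate in ADAPTERS:
        print(f"{label}: {transfer_hours(10, rate):.2f} hrs per 10 TB, "
              f"${price_per_mb_per_sec(price, rate):.2f} per MB/sec, "
              f"{100 * rate / baseline_rate:.0f}% of 10GbE bandwidth")
    print(f"16G FC vs 10GbE: {percent_more(1939, 1212):.0f}% faster, "
          f"{percent_more(1808, 1522):.0f}% more expensive")
```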
However, a far more important issue is that 16Gbps fibre channel is backward compatible with existing 4/8 Gbps SAN equipment. This allows segments of the SAN to be gradually upgraded to leading-edge technology without having to suffer the financial impact of legacy equipment rip-and-replace approaches.
In addition to providing a robust, purpose-built infrastructure for migrating large blocks of data, it also offers lower power consumption per port, a simplified cabling infrastructure, and the ability to “trunk” (combine) channel bandwidth up to 128Gbps! It doubles the number of ports and available bandwidth in the same 4U rack space for edge switches, providing the potential for a saving of over $3300 per edge switch.
Even more significant is that 16Gbps provides the additional performance necessary to support the next generation of storage, which will be based on 6Gbps and 12Gbps SAS disk drives. Unlike legacy FC storage, which was based upon 4Gbps FC-AL arbitrated loops, the new SAS arrays are on switched connections. Switching provides a point-to-point connection for each disk drive, ensuring every 6Gbps SAS connection (or in the near future, 12Gbps SAS connection) will have a direct connection to the SAN fabric. This eliminates backend saturation of legacy array FC-AL shared busses, and will place far greater demand for storage channel performance on the SAN fabric.
So do the benefits of 16Gbps Fibre Channel outweigh its modest price premium? Like many things in life – it depends! Block-based 16Gbps Fibre Channel SAN fabric is not for every storage requirement, but neither is Ethernet-based 10Gbps FCoE or iSCSI. If it is a departmental storage requirement or an environment where NAS or iSCSI has previously been deployed, then replacing the incumbent protocol with 16Gbps Fibre Channel may or may not have merit. However, large SAN storage arrays are particularly dependent on high performance equipment specifically designed for efficient data transfers. This is an arena where the capabilities and attributes of 16Gbps Fibre Channel will shine.
In any case, the best protection against making a poor choice is to thoroughly research the strengths and weaknesses of each technology and seek out professional guidance from a vendor-neutral storage expert with a Subject Matter Expert level understanding of the storage industry and its technology.
FCoE? Thanks, but No Thanks!
I may be a bit “slow on the uptake”, but I’m struggling to understand industry claims that FCoE (Fibre Channel over Ethernet) is superior to having storage traffic sent over Fibre Channel. As a 34-year IT industry veteran and SAN storage specialist, it is my belief the only thing Ethernet data communications and SAN fabric transmissions may have in common is the label “network”. Therefore I’m puzzled why anyone feels “Unified Computing” is a more desirable solution for either Ethernet or SAN traffic. (Other than vendors who want you to buy their FCoE products.)
For the past couple of years we’ve been flooded with claims that “Unified Computing” (A.K.A. – Fibre Channel over Ethernet, or FCoE) is superior to separate Ethernet and SAN fabric networks. Webcasts and the trade press are awash with comments about the benefits and advantages of this new technology. If you believe everything you read, then FCoE should simply be sweeping the industry, making segregated Ethernet and SAN fabric channels a thing of the past. It’s not.
But will it? When I examined some of the claims in greater detail, they just didn’t add up. The following is a matrix of popular “benefits” presented for FCoE, and my corresponding response as to why I question the validity of each claim.
|Reduces the number of adapters and cables that are deployed||On the surface this sounds logical, but it really doesn’t make much sense if you think about it. If a network (LAN or SAN) is designed for 30% average throughput with spikes of up to 70%, then it will still need (2) cables to support the configuration (70% + 70% = 140% of a single cable’s capacity). Unless your system is relatively small and/or the network is seriously underutilized, multiple cables will still be required. In addition FCoE will require some type of Quality-of-Service utility to ensure one service will not “starve” another, adding both additional complexity and greater expense.|
|Higher performance from 10Gbps network||This is also a compelling argument if performance is compared to 4Gbps Fibre Channel. But why, when 8Gbps FC is the current standard? Due to its more efficient protocol, 8Gbps performance is very similar to that of 10Gbps Ethernet. More significantly, now that 16Gbps Fibre Channel is shipping, FCoE over 10Gbps Ethernet is the technology playing “catch up”.|
|40Gbps and 100Gbps Ethernet interfaces are coming||This is a meaningless claim unless you’re doing extreme computing. 8Gbps Fibre Channel has been shipping for a couple of years, yet it is still being adopted at a leisurely pace. If there is no rush to upgrade from 4Gbps to 8Gbps FC (a 100% increase), why then will there be a rush to deploy 40Gbps Ethernet (a 300% increase) or 100Gbps Ethernet (a 900% increase) over 10Gbps Ethernet? Even 16Gbps Fibre Channel delivers 160% of the bandwidth of 10Gbps Ethernet. 20Gbps and 40Gbps InfiniBand have also been around for quite a while. If raw channel speed is a major industry requirement, then why hasn’t InfiniBand become a dominant network technology?|
|More efficient 64/66 encoding||If throughput is crucial, there is a logical argument for using 10Gbps FCoE (that uses 64/66 encoding) rather than 4Gbps or 8Gbps Fibre Channel (which has the less efficient 8/10 encoding). However, the latest 16Gbps Fibre Channel (and above) employs 64/66 encoding too, so this “benefit” is no longer relevant.|
|Greater flexibility||Hmmm… I’m not certain how merging two dissimilar technologies onto a single network medium will provide “greater flexibility”. In most cases just the opposite occurs.|
|Lower power and cooling||Since their component count, general circuit layout, and optical drivers are very similar, just what is it that makes FCoE have “lower power and cooling”? (Please don’t say that it’s because it needs fewer cables. Passive Fibre cabling really doesn’t consume much power!) 🙂|
|Simplified Infrastructure||This might be true, as long as you’re running low demand systems that only require a single cable. However, if traffic load needs two or more cables, then all bets are off.|
|Better compatibility with virtualized servers||Why? How does running multiple virtual servers over FCoE provide better compatibility than running multiple virtual servers over NPIV? What unique attribute is it that makes FCoE more compatible?|
|Availability of network security tools||This is an interesting argument. The reason we have more Ethernet security tools is that as an external facing technology, more people are trying to hack it. It is true that fibre channel has fewer security tools, but if they are sufficient to provide excellent storage security, why does having more of them matter?|
|Lower cost||Really? What numbers were they looking at? A quick search on Google Shopping shows both FCoE NICs and 8Gbps HBAs are priced roughly the same. Several months ago we also estimated the total cost of an enterprise architecture using both technologies, and found that the FCoE configuration ran about 50% higher than 8Gbps Fibre Channel! So much for being less expensive!|
|Familiarity within the enterprise||True, but what does familiarity have to do with it? There are lots of people familiar with copying data to DVDs, but that doesn’t make DVDs a better choice for data center backup and recovery. A specialized application like NetBackup or TSM will do a far better job of enterprise backup and recovery, even if only a few IT backup specialists are familiar with them. “Dumbing down” an IT operation to save money is a questionable tactic if user performance is sacrificed in the process.|
|Interface with the Cloud||In what way? The TCP/IP protocol is not native to WAN communications infrastructure, so 10Gbps Ethernet must be converted into something else on each end, just like Fibre Channel. For an internal Cloud connection, TCP/IP is not native to the SAN storage either, so 10Gbps Ethernet must be converted into a block storage format and back in the array, as well.|
|Simplified management and integration with tools||Whoever claimed this as a “benefit” apparently knew little about the breadth and depth of storage management tools available on the market today.|
|No proprietary tools needed to install||I have no idea what proprietary tools they’re referring to for installing Fibre Channel. Last time I did a Fibre Channel installation we used exactly the same tools that were used for high-speed Ethernet interconnections.|
|Lossless Ethernet||Hmmm… If I push the Ethernet standard far enough to compensate for its inherent “best effort” characteristics, doesn’t it just end up looking a lot like the Fibre Channel Protocol (which is a well-established, proven technology)?|
|Operational efficiencies and performance enhancements||If I run FCP (or any protocol) over any other protocol I incur two types of delays – conversion latency, and the consumption of extra CPU cycles. How does adding overhead improve either efficiency or performance?|
|People and skill consolidation||This is an argument typically presented by people with a limited understanding of the complexity of modern SAN storage. Ethernet LANs and SAN FC Fabric have very little in common, other than both support data traffic. Assigning Ethernet LAN specialists to manage enterprise SAN fabric makes no more sense than having SAN specialists manage corporate network communications.|
|Ubiquitous computing||This is a benefit? Stored data is the most valuable asset a corporation or agency owns. While it may be important to offer ubiquitous computing to the user community, maintaining, protecting, and optimizing data assets should be a carefully orchestrated activity provided by highly trained storage specialists!|
|Cost-effective network||Do your own comprehensive cost comparison and see if you agree. My estimate indicated identical functionality from 10Gbps FCoE would cost around 50% more than an equivalent 8Gbps Fibre Channel configuration.|
|Pervasive skill set||Like the “people and skill consolidation” myth above, this is based on a misguided assumption that operating a SAN fabric is somehow similar to operating an Ethernet data communications network. It is not.|
|Simplified interoperability||This may be true – if you can tolerate the latency and performance penalties associated with having one technology host another. As long as server farms are fairly small and storage requirements are modest, making performance compromises for the sake of convenience isn’t an issue. However, it rapidly grows in difficulty as stored data volume increases.|
|Reduces capital and operational costs||As above, do your own price estimates for identical functionality from 10Gbps Ethernet and 8Gbps FC. I think you may be surprised.|
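The cable arithmetic in the first row of the matrix (70% + 70% = 140% of one cable) can be sketched as follows. This is a worst-case model that assumes the peaks of every traffic type can coincide. The function name and the integer-percent inputs are my own choices for illustration.

```python
import math


def links_needed(peak_percent_per_traffic_type: list[int]) -> int:
    """Minimum number of identical links needed to carry coinciding traffic peaks.

    Each entry is one traffic type's peak load, expressed as a whole-number
    percentage of a single link's capacity. Assumption: all peaks can occur
    at the same time.
    """
    total_percent = sum(peak_percent_per_traffic_type)
    return max(1, math.ceil(total_percent / 100))


if __name__ == "__main__":
    # LAN and SAN traffic, each spiking to 70% of one cable's capacity.
    print(links_needed([70, 70]))
    # Lightly loaded networks really can share one cable.
    print(links_needed([30, 30]))
```

With the matrix's 70% spikes the converged network still needs two cables. Only when the combined peaks fit within a single link does the consolidation argument hold.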
What seems to be missing from these discussions is:
- Vulnerability created by having both data communications and storage traffic over the same medium. If there is an external attack on the Ethernet network, all computing activities will be brought to a halt. If there is a critical firmware bug, both data and SAN traffic are impacted. Troubleshooting becomes much more complex and time-consuming.
- The importance of keeping dissimilar technologies separate so they’re allowed to evolve at their own pace. If both storage traffic and data communications are dependent upon Ethernet, then each is constrained by the evolution of the other. If one requires more capacity and the other doesn’t, you’re forced to buy the consolidated infrastructure in its entirety.
- Dissimilar skill sets and areas of responsibility managed by different IT specialists. Ask most LAN specialists how to zone a fabric or allocate LUNs and you’ll get a blank stare. Ask most SAN specialists how to configure a router or use a packet sniffer, and you’ll probably get a similar response. SAN storage and SAN fabric management are activities that are inextricably linked. Splitting areas of responsibility between a LAN Group and SAN Group is a recipe for operational inefficiency, troubleshooting complexities, and reduced staff productivity.
- If industry adoption of FCoE has been widespread, then why do IT industry research groups keep reporting sluggish sales? Also, why do Fibre Channel equipment sales remain robust?
I have no illusions about there being lots of things I don’t know, so I could be wrong about this too. If you feel there are other compelling reasons why FCoE will dominate the industry, I’d love to hear them.