“Big Data” Challenges Our Perspective of Technology
It’s easy to hold onto the notion that IT is all about systems, networks, and software. That has been the accepted wisdom for the past 50 years. It’s a comfortable concept, but one that is increasingly inaccurate and downright dangerous as we move into an era of “big data”! In today’s world it’s not about the systems, networks, applications, or the datacenter – it’s all about the data!
For decades, accumulated data was treated as simply a by-product of information processing activities. However, there is growing awareness that stored information is not just digital “raw material”, but a corporate asset containing vast amounts of innate value. Like any other high-value asset, it can be bought or sold, traded, stolen, enhanced, or destroyed.
A good analogy for today’s large-scale storage array is a gold mine. Data is the gold – the nuggets embedded in the rock. The storage arrays containing that data are the “mine” that houses and protects it. The complex and sophisticated hardware, software, and skill-sets around it are simply the tools used to locate, manipulate, and extract the “gold” (data assets) from its surrounding environment. The presence of high-value “nuggets” is the sole reason the mining operation exists. If there were no “gold”, the equipment used to extract and manipulate it would be of little value.
This presents a new paradigm. For years storage was treated as a secondary peripheral, considered only when new systems or applications were being deployed. Today storage has an identity of its own, independent of the other systems and software in the environment.
Data is no longer just a commodity or some type of operational residue left over from the computing process. “Big Data” forces a shift in focus from deploying and administering IT assets to managing high-value data assets. It dictates that data assets sit at the center of concentric rings, with security, recoverability, accessibility, performance, data manipulation, and other aspects of data retention each addressed as a distinct discipline with its own requirements. Information must now be captured, identified, valued, classified, assigned to resources, protected, managed according to policy, and ultimately purged from the system after its value to the organization has been expended.
This requires a fundamental change in corporate culture. As we move into an era of “big data”, the entire organization must be aware of information’s value as an asset, and of the shift away from technology-centric approaches to IT management. Just like the gold in the analogy above, users must recognize that not all data is “created equal”; different data delivers different levels of value to an organization, and for different periods of time. For example, financial records typically have a high level of inherent value and retain it for a defined period of time. (The Sarbanes-Oxley Act requires publicly traded companies to maintain audit-related documents for no less than seven years after the completion of an audit. Companies in violation can face fines of up to $10 million, and executives can face prison sentences of up to 20 years.)
These differences in value must be recognized and managed accordingly. Last week’s memo about the cafeteria’s luncheon specials should not be retained and managed in the same fashion as an employee’s personnel record. When entered into the system, information should be classified according to a well-defined set of guidelines. With that classification it can be assigned to an appropriate storage tier, backed up on a regular schedule, kept available on active storage for as long as necessary, and later written to low-cost archive media to meet regulatory and litigation compliance needs. Once data no longer delivers value to the organization, it can be expired by policy, freeing up expensive resources for re-use.
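As a minimal sketch of what classification-driven retention might look like, the example below maps hypothetical data classes to storage tiers, backup schedules, and retention periods, and decides when an item may be purged. The classes, tiers, and periods are illustrative assumptions only, not recommendations for any particular organization.

```python
from datetime import date, timedelta

# Hypothetical classification policy: each data class maps to a storage tier,
# a backup schedule, and a retention period. All values are illustrative.
RETENTION_POLICY = {
    "financial_record": {"tier": "primary",  "backup": "daily",  "retain_years": 7},
    "personnel_record": {"tier": "primary",  "backup": "daily",  "retain_years": 30},
    "general_email":    {"tier": "nearline", "backup": "weekly", "retain_years": 3},
    "cafeteria_memo":   {"tier": "archive",  "backup": "none",   "retain_years": 0},
}

DEFAULT_POLICY = {"tier": "nearline", "backup": "weekly", "retain_years": 1}

def placement_for(data_class):
    """Return the tier, backup schedule, and retention period for a data class."""
    return RETENTION_POLICY.get(data_class, DEFAULT_POLICY)

def is_expired(data_class, created, today):
    """True once an object has outlived its retention period and can be purged."""
    policy = placement_for(data_class)
    return today - created > timedelta(days=365 * policy["retain_years"])

# Last week's cafeteria memo is already eligible for expiration, while a
# financial record from 2010 must still be retained.
print(is_expired("cafeteria_memo", date(2012, 6, 1), date(2012, 6, 15)))    # True
print(is_expired("financial_record", date(2010, 1, 1), date(2012, 6, 15)))  # False
```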
This approach moves IT’s emphasis away from building systems tactically by simply adding more of the same, and toward sophisticated management tools and utilities that automate the process. Clearly articulated processes and procedures must replace “tribal lore” and anecdotal knowledge in managing the data repositories of tomorrow.
“Big Data” ushers in an entirely new way of thinking about information as a stored, high-value asset. It forces IT departments to re-evaluate their approach to managing data resources on a massive scale. At a data growth rate of 35% to 50% per year, business as usual is no longer an option. As aptly noted in a Bob Dylan song, “the times they are a-changin’”. We must adapt accordingly, or suffer the consequences.
Tape vs. Disk – It’s Time for a Truce
Over the past couple of years we’ve heard an enthusiastic debate over whether tape is dead, obsolete, or simply relegated to secondary roles such as regulatory compliance and litigation response. Much of the “uproar” has been created by highly vocal deduplication companies marketing their products by spreading fear, uncertainty, and doubt (FUD) among users. And why not, since most of these vendors have no significant presence in the tape backup market and therefore little to lose?
However, there are companies with legitimate concerns about managing their backup process and the direction they should take. They need to understand what the facts are and why one approach would be superior to another.
With this goal in mind, let’s take a look at two popular backup and recovery devices – the LTO-5 tape drive and the 3.0 TB SATA disk drive – and see how they compare.
| Specification | Quantum internal half-height LTO-5 | Seagate Constellation ES.2 ST33000651SS |
| --- | --- | --- |
| Average drive cost per unit | $1,450 | $495 |
| Typical media cost per unit | $54 per tape | N/A |
| Native formatted capacity | 1,500 GB | 3,000 GB |
| Native sustained transfer rate | 140 MB/s | 155 MB/s |
| Data buffer size | 256 MB | 64 MB |
| Average file access time | 56 sec. | 12.16 ms |
| Interfaces available | 6 Gb/s SAS | 6 Gb/s SAS |
| Typical duty cycle | 8 hrs/day | 24 hrs/day |
| Encryption | AES 256-bit | AES 256-bit |
| Power consumption – idle | 6.7 watts | 7.4 watts |
| Power consumption – typical | 23.1 watts | 11.3 watts |
| Drive MTBF | 50,000 hours at 100% duty cycle | 1,200,000 hours |
| Media MTBF | 30 years | N/A |
| Non-recoverable error rate | 1 in 1×10^17 bits | 1 sector per 1×10^15 bits |
| Warranty | 3 years | 3 years |
| Storage cost per GB | $0.036 | $0.165 |
| 5-year total for 1.0 petabyte | $37,495 | $165,330 |
No surprises here. Simply doing the math shows that storing 1 petabyte of data for 5 years would cost more than four times as much on spinning disk as on tape media. Granted, there are other factors involved, but most offset each other. Both a tape library and a disk array consume data center floor space and infrastructure resources. Both draw power and require cooling. Each must be managed by skilled IT specialists. Deduplication may reduce disk capacity requirements (reducing cost), but so will tape compression and/or increasing the tape drive’s duty cycle from 8 to 12 hours per day. Surprisingly, the only major variable over time is the cost of the media, which is heavily weighted in favor of tape.
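The arithmetic behind the two cost rows is easy to verify. The short sketch below rederives the cost-per-GB figures and the 1-petabyte totals from the table’s own prices and native capacities; the tape total lands within rounding distance of the figure above.

```python
# Back-of-the-envelope check of the cost rows in the table above, using only
# the table's own prices and native capacities.
import math

PETABYTE_GB = 1_000_000  # 1.0 PB expressed in GB (decimal)

# LTO-5 tape: $54 cartridge holding 1,500 GB native, plus one $1,450 drive
tape_cost_per_gb = 54 / 1500                   # = $0.036 per GB
tapes_needed = math.ceil(PETABYTE_GB / 1500)   # 667 cartridges
tape_total = tapes_needed * 54 + 1450          # ~ $37,500 for 1 PB

# 3 TB SATA disk: $495 per 3,000 GB drive, no separate media cost
disk_cost_per_gb = 495 / 3000                  # = $0.165 per GB
disks_needed = math.ceil(PETABYTE_GB / 3000)   # 334 drives
disk_total = disks_needed * 495                # = $165,330 for 1 PB

print(f"Tape: ${tape_cost_per_gb:.3f}/GB, ${tape_total:,} per PB")
print(f"Disk: ${disk_cost_per_gb:.3f}/GB, ${disk_total:,} per PB")
print(f"Disk premium: {disk_total / tape_total:.1f}x")  # roughly 4.4x
```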
In the foreseeable future, the 4 TB SATA disk will make the above calculations somewhat more favorable to the disk drive. However, we expect to see the LTO-6 tape drive in production in the second half of 2012, increasing the tape drive’s sustained transfer rate by 30% and tape media capacity by 47%. This will bring the tape vs. disk comparison back into close alignment.
The sensible strategy is to develop a backup and recovery system that incorporates both technologies and capitalizes on the strengths of each. Using disk “pools” to aggregate nightly backups (whether deduplicated or not) ensures backup windows can be met and greatly improves data restoration times. Backing up from the “disk pools” directly to tape keeps data streaming at maximum performance and moves it onto the lowest-cost media available for long-term archiving, disaster recovery, regulatory compliance, and litigation response.
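Purely as an illustration of that disk-to-disk-to-tape staging flow, here is a small sketch. The class name, pool threshold, and job sizes are hypothetical stand-ins rather than any real backup product’s API; only the 140 MB/s streaming rate is taken from the LTO-5 figures above.

```python
# Illustrative disk-to-disk-to-tape (D2D2T) staging flow; all names and
# thresholds are hypothetical examples.

LTO5_STREAM_MB_S = 140      # native sustained rate from the table above
STAGE_THRESHOLD_GB = 512    # drain the pool once enough data is queued to keep the drive streaming

class DiskPool:
    """Fast, random-access landing zone for nightly backup jobs."""
    def __init__(self):
        self.staged_gb = 0.0

    def ingest(self, job_name, size_gb):
        # Disk absorbs many small, concurrent jobs, so the backup window is met.
        self.staged_gb += size_gb
        print(f"staged {job_name}: {size_gb} GB (pool now {self.staged_gb:.0f} GB)")

    def drain_to_tape(self):
        # One large sequential transfer keeps the tape drive streaming at full speed.
        hours = (self.staged_gb * 1024) / LTO5_STREAM_MB_S / 3600
        print(f"wrote {self.staged_gb:.0f} GB to tape in ~{hours:.1f} h of streaming")
        self.staged_gb = 0.0

pool = DiskPool()
for name, size in [("mail-server", 180), ("file-share", 220), ("erp-db", 150)]:
    pool.ingest(name, size)
    if pool.staged_gb >= STAGE_THRESHOLD_GB:
        pool.drain_to_tape()
```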
It’s time to put this argument to bed. Both tape drives and SATA disk should play a role in a well-designed, highly optimized backup and recovery system. The “war” is over, and for once both combatants won!
16 Gbps Fibre Channel – Do the Benefits Outweigh the Cost?
With today’s technology there can be no status quo. As the IT industry advances, so must each organization’s efforts to embrace new equipment, applications, and approaches. Without an ongoing process of improvement, IT infrastructures progressively become outdated and the business group they support grows incrementally less effective.
In September of 2010, the INCITS T11.2 Committee ratified the standard for 16Gbps Fibre Channel, ushering in the next generation of SAN fabric. Unlike Ethernet, Fibre Channel is designed for one specific purpose – low overhead transmission of block data. While this capability may be less important for smaller requirements where convenience and simplicity are paramount, it is critical for larger datacenters where massive storage repositories must be managed, migrated, and protected. For this environment, 16Gbps offers more than twice the bandwidth of the current 8Gbps SAN and 40% more bandwidth than the recently released 10Gbps Ethernet with FCoE (Fibre Channel over Ethernet).
But is an investment in 16Gbps Fibre Channel justified? If a company has reached the point where its SAN fabric is approaching saturation or its SAN equipment is approaching retirement, then definitely yes! Here is how 16Gbps stacks up against both slower Fibre Channel implementations and 10Gbps Ethernet.
| Emulex Model | Port Speed | Protocol | Average HBA/NIC Price | Transfer Rate | Transfer Time for 1 TB | Bandwidth Cost per MB/s | Relative Bandwidth (10 GbE = 100%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LPe16002 | 16 Gbps | Fibre Channel | $1,808 | 1,939 MB/s | 1.43 hrs | $0.93 | 160% |
| OCe11102 | 10 Gbps | Ethernet (FCoE) | $1,522 | 1,212 MB/s | 2.29 hrs | $1.26 | 100% |
| LPe12002 | 8 Gbps | Fibre Channel | $1,223 | 800 MB/s | 3.47 hrs | $1.53 | 65% |
| LPe11000 | 4 Gbps | Fibre Channel | $891 | 400 MB/s | 6.94 hrs | $2.23 | 32% |
This table highlights several differences between 4/8/16 Gbps Fibre Channel and 10Gbps Ethernet with FCoE technology (sometimes marketed as Unified Storage). Street prices from a popular I/O controller manufacturer clearly indicate that the differences in controller prices are relatively small, particularly among the faster controllers. Although the 16Gbps HBA is 40% quicker, it is only 17% more expensive!
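The two derived columns in the table (bandwidth cost per MB/s and relative bandwidth) follow directly from the price and throughput figures; the short sketch below reproduces that arithmetic from the table’s own numbers.

```python
# Recompute the derived columns of the HBA table above from street price and
# measured throughput alone (figures taken from the table).

adapters = [
    ("LPe16002", "16 Gb FC",      1808, 1939),
    ("OCe11102", "10 GbE / FCoE", 1522, 1212),
    ("LPe12002", "8 Gb FC",       1223,  800),
    ("LPe11000", "4 Gb FC",        891,  400),
]

baseline_mb_s = 1212  # the table normalizes bandwidth against the 10 GbE adapter

for model, link, price, mb_s in adapters:
    cost_per_mb_s = price / mb_s            # dollars per MB/s of sustained throughput
    relative = mb_s / baseline_mb_s * 100   # percent of 10 GbE throughput
    print(f"{model} ({link}): ${cost_per_mb_s:.2f} per MB/s, {relative:.0f}% of 10 GbE")
```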
However, a far more important point is that 16Gbps Fibre Channel is backward compatible with existing 4/8 Gbps SAN equipment. This allows segments of the SAN to be upgraded gradually to leading-edge technology without the financial impact of a rip-and-replace approach to legacy equipment.
In addition to providing a robust, purpose-built infrastructure for moving large blocks of data, 16Gbps Fibre Channel offers lower power consumption per port, a simplified cabling infrastructure, and the ability to “trunk” (combine) channels into as much as 128Gbps of bandwidth. It also doubles the number of ports and the available bandwidth in the same 4U rack space for edge switches, offering potential savings of over $3,300 per edge switch.
Even more significant, 16Gbps provides the additional performance needed to support the next generation of storage, which will be based on 6Gbps and 12Gbps SAS disk drives. Unlike legacy FC storage, which was built on 4Gbps FC-AL arbitrated loops, the new SAS arrays use switched connections. Switching provides a point-to-point connection for each disk drive, ensuring that every 6Gbps SAS connection (or, in the near future, 12Gbps SAS connection) has a direct path to the SAN fabric. This eliminates the back-end saturation of legacy arrays’ shared FC-AL busses, and it will place far greater demands for storage channel performance on the SAN fabric.
So do the benefits of 16Gbps Fibre Channel outweigh its modest price premium? Like many things in life – it depends! Block-based 16Gbps Fibre Channel SAN fabric is not for every storage requirement, but neither is 10Gbps FCoE or iSCSI. For a departmental storage requirement, or an environment where NAS or iSCSI has previously been deployed, replacing the incumbent protocol with 16Gbps Fibre Channel may or may not have merit. However, large SAN storage arrays are particularly dependent on high-performance equipment specifically designed for efficient data transfers. This is an arena where the capabilities and attributes of 16Gbps Fibre Channel will shine.
In any case, the best protection against making a poor choice is to thoroughly research the strengths and weaknesses of each technology, and to seek guidance from a vendor-neutral storage specialist with a subject-matter expert’s understanding of the storage industry and its technology.