How to Avoid the Shortcomings of Virtual Tape Libraries

July 14, 2008

VTLs can sometimes make life easier for the enterprise user - but not always. As George Crump, founder of Storage Switzerland, points out, it's not nearly as simple as it seems to make these devices work well. And, if not done right, it can be worse than not having them at all.

It started innocently enough. Users were struggling with backing up to tape. Disk prices, especially with SATA (Serial ATA) technology, were going down while capacities were going up. Users started using SATA disk as a fast cache to tape to help improve backup speeds. Recovering from disk was also faster and easier than recovering from tape, which made disk attractive for recovery. The problem most users ran into was that they could not afford to buy enough disk to hold all of their backups (which would have eliminated the need for tape altogether).

A manual transfer of the data from the disk array, through the backup server, to a tape library became necessary. Additionally, if tape was to be truly eliminated, that backup data had to move electronically across the network to another disk array at a remote site. The problem with using basic disk for this was that most WAN segments were too small (and still are) to carry the amount of data that a traditional backup application can produce in a backup window.
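To get a feel for why a WAN link becomes the bottleneck, the arithmetic can be sketched in a few lines of Python. The backup size, link speed, and efficiency factor below are hypothetical values chosen for illustration, not figures from any particular site.

```python
def transfer_hours(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to move data_tb (decimal terabytes) over a link_mbps link.

    `efficiency` is an assumed fraction of the raw link speed that is
    actually usable after protocol overhead and competing traffic.
    """
    bits = data_tb * 1e12 * 8                  # terabytes -> bits
    usable_bps = link_mbps * 1e6 * efficiency  # megabits/s -> usable bits/s
    return bits / usable_bps / 3600            # seconds -> hours

# Hypothetical example: a 2 TB nightly backup over a 45 Mbit/s WAN link.
hours = transfer_hours(2, 45)
print(f"{hours:.1f} hours to replicate one night's backup")
```

Under these assumptions a single night's backup would take several days to cross the link, far longer than any backup window.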

VTLs are complex
First, VTLs continue to be very complex to implement in any environment. Most are SAN (storage area network)-based and, as a result, bring all of the complexities of a Fibre Channel SAN into the equation. A great many customers, to this day, have a very limited number of servers attached to a SAN - let alone backed up via a SAN. It is too complex to be cost-effective, especially in large shops where more FC is found. VTLs made matters worse by adding the complexity of integrating an often foreign device into the SAN. This addition creates challenges with zoning and partitioning the SAN so that the backup process and production storage can peacefully coexist.

Purchasing your VTL solution from your SAN supplier does not remove the problem. Most of the larger primary storage manufacturers have either acquired or OEM'd that technology, and the integration has not been perfected yet. As a result, even if the logo on your VTL solution matches the logo on your SAN array, it is still very complex to implement and maintain.

VTLs increase total backup completion times
Many data centers do not consider data to be protected until a second copy of the backup is made for DR (disaster recovery) purposes. In the case of VTLs, that means that backup data has to be on a piece of tape media, as well as on disk. Since most VTL solutions are not supplied with capacity optimization (data deduplication or, at a minimum, compression), the move from disk storage to tape has to happen quite frequently - most commonly every night.

Even those VTL solutions that have added data deduplication to their systems are struggling. There is typically no back-end integration to the tape device. Because the data is under the control of the backup application, the application has to pull it from the VTL back through the backup server and then send it out to the tape library. Essentially, getting data to tape now requires three transfers instead of one: from the backup server to the VTL, from the VTL back to the backup server, and from the backup server to the tape library. This results in three times the opportunity for transfer failure, and in the VTL world it happens on almost every backup.
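The "three times the opportunity" point can be made concrete with a rough calculation. The sketch below assumes each transfer fails independently with the same probability; the 2 percent failure rate is a hypothetical figure used only for illustration.

```python
def chance_of_any_failure(per_transfer_failure: float, transfers: int) -> float:
    """Probability that at least one of `transfers` independent transfers fails."""
    return 1 - (1 - per_transfer_failure) ** transfers

# The three hops a backup takes to reach tape when it passes through a VTL.
VTL_PATH = [
    "backup server -> VTL",
    "VTL -> backup server",
    "backup server -> tape library",
]

# Hypothetical per-transfer failure rate, for illustration only.
p = 0.02
direct = chance_of_any_failure(p, 1)
via_vtl = chance_of_any_failure(p, len(VTL_PATH))
print(f"direct to tape: {direct:.3f}, via VTL: {via_vtl:.3f}")
```

For small failure rates the chance of a failed job through the VTL path comes out at just under three times the direct-to-tape figure, which matches the rule of thumb.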

VTLs extend time to DR
Also with VTLs, there is limited ability to create an electronic DR copy. Many VTL solutions have some capability to replicate data to another unit, but the backup set stored on disk is quite large, and its size makes replication prohibitive except across the shortest links. In many cases, the DR copy was simply made by directing the backup application to copy from one VTL to another VTL at the other end of the WAN link. Some VTLs could communicate directly with each other, but then the backup application had no awareness of the second copy of the data, and the data set was still too large. The vaulting option most likely to be chosen remained copying the data to another set of tapes (a fourth time the data has to move), then putting the tapes on a truck and sending the truck to the vault.

Most VTL restores are not from disk
In the VTL world, most moves to the vault were via a tape on a truck, and that tape had to be created soon after the backup to adhere to a DR policy. This meant that another big driver for disk-based backups, recovery from disk, was eliminated. Most VTLs create virtual pieces of media that have bar codes, which correspond to the real media and bar codes in the library. Most backup applications can handle only one occurrence of a backup set and one bar code instance. As a result, when the copy to tape was made, the backup set that was on the disk was eliminated and was no longer available for recoveries. So even if you could afford a large disk cache, you often could not use it for recoveries.

VTLs do not reduce tape media expenditures
Since the move to tape has to happen almost the moment the backup job is complete, and then a second set of tapes is created for DR off-site, the amount of media used is not reduced. In fact, some VTLs that move data directly to tape as a background process cannot properly calculate the capacity differences between storing the backup set on a non-compressed disk section of the VTL, and then moving it to the compressed tape side of the VTL. To compensate, they will recommend turning off tape drive compression, effectively doubling the tape media requirement. Or they recommend not using the background transfer to tape and letting the application manage it.
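The effect of switching off tape drive compression on the media count is simple arithmetic, sketched here in Python. The backup size, the cartridge capacity, and the 2:1 compression ratio are assumptions for illustration; "doubling" holds only if the data would otherwise have compressed at about 2:1.

```python
import math

def cartridges_needed(backup_gb: int, native_gb: int, compression_ratio: float = 1.0) -> int:
    """Cartridges required for backup_gb of data.

    compression_ratio=1.0 models drive compression turned off.
    """
    effective_gb = native_gb * compression_ratio
    return math.ceil(backup_gb / effective_gb)

# Hypothetical figures: 16,000 GB of backup data, 800 GB native cartridges.
backup_gb = 16_000
native_gb = 800
with_compression = cartridges_needed(backup_gb, native_gb, compression_ratio=2.0)
without_compression = cartridges_needed(backup_gb, native_gb)
print(with_compression, without_compression)
```

The same calculation applies again to the second set of tapes created for off-site DR.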

VTLs offer inefficient deduplication
To compensate for their lack of value, VTLs have attempted to add data deduplication as a product feature. The challenge is that most of these systems perform deduplication as a post-process, which further complicates matters by creating separate storage areas to resource and manage (de-duped and non-de-duped). Post-processing also, by definition, delays replication, so the DR copy is not secure until the post-process deduplication completes.

VTLs are complex to install, and they do not address the key requirements of their purchasers: faster backups, electronic vaulting, recoveries from disk, and media cost reduction.

So what are some solutions?
The first and simplest solution is to try to eliminate tape altogether or at least lessen the number of times it is needed in the environment. This is achievable today by using a capacity-optimized disk backup target that leverages a technology like data deduplication. Data deduplication identifies redundant data segments before they are written to disk. Instead of writing those redundant segments, it stores a pointer to the original segment. Since the data in this week's full backup is likely to be very similar to the data in last week's full backup, storage efficiencies of 20 times are not uncommon, meaning that 200TB of backup data can be stored in about 10TB of disk.
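The segment-and-pointer idea can be illustrated with a minimal Python sketch. It assumes fixed-size segments and SHA-256 fingerprints, which is a simplification; commercial products use their own segmenting and fingerprinting schemes, and the class and method names here are hypothetical.

```python
import hashlib

class DedupStore:
    """Toy deduplicating store: each unique segment is kept once."""

    def __init__(self, segment_size: int = 4096):
        self.segment_size = segment_size
        self.segments = {}  # fingerprint -> segment bytes
        self.backups = {}   # backup name -> ordered list of fingerprints (pointers)

    def write(self, name: str, data: bytes) -> int:
        """Store a backup and return the number of new bytes written to disk."""
        new_bytes = 0
        pointers = []
        for i in range(0, len(data), self.segment_size):
            segment = data[i:i + self.segment_size]
            fp = hashlib.sha256(segment).hexdigest()
            if fp not in self.segments:    # unseen segment: store it
                self.segments[fp] = segment
                new_bytes += len(segment)
            pointers.append(fp)            # seen or not, record a pointer
        self.backups[name] = pointers
        return new_bytes

    def read(self, name: str) -> bytes:
        """Rebuild a backup by following its pointers."""
        return b"".join(self.segments[fp] for fp in self.backups[name])

# Two weekly "full backups" of 100 segments that differ in only 5 segments.
week1 = b"".join(bytes([i]) * 4096 for i in range(100))
week2 = week1[:4096 * 95] + b"".join(bytes([200 + i]) * 4096 for i in range(5))

store = DedupStore()
first = store.write("week1", week1)
second = store.write("week2", week2)
print(first, second)
```

In this toy case the second full backup adds only the five changed segments to disk, yet both backups can still be read back in full.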

Similar to VTLs, the more sophisticated data deduplication devices can replicate between one another. Unlike a VTL, a data deduplication device stores only the unique segments, and it replicates only those unique segments, which makes electronic vaulting over very modest WAN links practical (and now commonplace in the segment). As with VTLs, the backup process is not complete until a DR copy of the backup is made. With a deduplication system that leverages this form of electronic vaulting, especially one that does in-line data deduplication, this happens very quickly.
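Why replication between deduplication devices is so light on the WAN can be shown with a short Python sketch. It models each site as a dictionary of fingerprint to segment, which is an assumption for illustration and not how any specific product exchanges data.

```python
import hashlib

def fingerprint(segment: bytes) -> str:
    """Content fingerprint used to recognize a segment at either site."""
    return hashlib.sha256(segment).hexdigest()

def replicate(local_segments: dict, remote_segments: dict) -> int:
    """Send only segments the remote site lacks; return bytes sent over the WAN."""
    sent = 0
    for fp, segment in local_segments.items():
        if fp not in remote_segments:
            remote_segments[fp] = segment
            sent += len(segment)
    return sent

# Hypothetical state: the vault already holds 95 of tonight's 100 segments.
tonight = [bytes([i]) * 4096 for i in range(100)]
local = {fingerprint(s): s for s in tonight}
remote = {fingerprint(s): s for s in tonight[:95]}

wan_bytes = replicate(local, remote)
print(f"sent {wan_bytes} of {sum(len(s) for s in tonight)} bytes")
```

Only the five segments the vault has not seen cross the link; after the call both sites hold the same set of segments.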

Because data is stored more efficiently on disk, the move to tape may be eliminated entirely in some data centers. Most data centers will still need to move data to tape, so the value of these capacity-optimized devices is that this move can be far less frequent. There is also less risk of data loss because each backup job is replicated to the remote data deduplication system at the vault. The tape is then used for long-term retention. Less media is used because data is sent to tape less often. The data is electronically vaulted to another data deduplication system, and the tape created locally for long-term storage can become the final vault copy. A reduction of media expenditures by two-thirds is not uncommon with data deduplication systems.

Lastly, some data deduplication systems are IP-based. Integration into the existing environment is easy; users are not forced into a tape protocol (FC). Most IT staffs have a much deeper knowledge base of IP than FC, and the configuration of the environment is simpler and far less intrusive. Implementation time is measured in hours as opposed to days.

In-line data deduplication
In-line data deduplication (in which data is deduplicated as it is processed, before it lands on disk) keeps the process simple because it does not require separate storage areas. With most systems, replication to DR can happen as data is being written to the deduplication system in the primary location. The net result is that data deduplication systems continue to address the immediate pain that IT professionals are facing in the backup process. They are improving reliability and simplifying a very complex process.

VTLs are a tape-augmentation solution. Big shops can and will afford them, for now. A VTL can act as a fast cache to tape in a large consolidated data center. Here, backup virtualization, which would include virtualizing data deduplication systems, is a better strategy to look at.

George Crump is the founder of Storage Switzerland, an analyst company focused on the virtualization and storage marketplaces. An industry veteran for over 25 years, he has held engineering and sales positions at various IT industry manufacturers and integrators. Prior to founding Storage Switzerland, Crump was chief technology officer at one of the nation's largest integrators. He can be reached at georgeacrump@mac.com.
