Cloud storage is for blocks too, not just files

One of the misconceptions about cloud storage is that it is only useful for storing files. This assumption comes from the popularity of file-sharing services and the fact that it’s difficult to conceive of data objects that aren’t files because that’s what people use.

But Windows Azure Storage works very well for storing fingerprints and fingerprints definitely are not files—they are logical packages of contiguous blocks of data. Blocks are an addressing mechanism that operating systems use to calculate where to put data to maintain system performance. CiS systems exchange blocks, not files, with servers. On the cloud side, CiS systems exchange objects with Windows Azure Storage—and fingerprints, chock full of blocks, are the objects used by the Microsoft HCS solution. Just as the operating system translates files into blocks in order to store them on storage devices, CiS translates blocks into fingerprints so they can be stored in the cloud.

Source of Information : Rethinking Enterprise Storage

A big breakthrough: Cloud snapshots

The Microsoft HCS solution incorporates elements from backup, dedupe, and snapshot technologies to create a highly automated data protection system based on cloud snapshots. A cloud snapshot is like a storage snapshot but where the snapshot data is stored in Windows Azure Storage instead of in a storage array. Cloud snapshots provide system administrators with a tool they already know and love—snapshots—and extend them across the hybrid cloud boundary.


Fingerprints in the cloud
The data objects that are stored as snapshots in the cloud are called fingerprints. Fingerprints are logical data containers that are created early in the data lifecycle when data is moved out of the input queue in the CiS system. While CiS systems store and serve block data to servers, they manage the data internally as fingerprints.

Just as backup processes work by copying newly written data to tapes or disk, cloud snapshots
work by copying newly made fingerprints to Windows Azure Storage. One of the biggest differences between backup and cloud snapshots is that backup transforms the data by copying it into a different data format, whereas cloud snapshots copy fingerprints as-is without changing the data format. This means that fingerprints in Windows Azure Storage can be directly accessed by the CiS system and used for any storage management purpose.

Cloud snapshots work like incremental-only backups insofar that fingerprints only need to be uploaded once to Windows Azure Storage. Replication services in Windows Azure Storage makes multiple copies of the data as protection against failures. With most backup systems, there are many different backup data sets that need to be tracked and managed, but with cloud snapshots, there is only a single repository of fingerprints. In addition, there is no need to create synthetic full tapes because all the fingerprints needed to be recovered are located in the same Windows Azure Storage bucket.


Scheduling cloud snapshots
IT teams can flexibly configure their CiS systems to perform automated cloud snapshots to meet a broad range of requirements. Unlike tape backup systems that necessarily tie data expiration to tape rotation schedules, cloud snapshots can be assigned any expiration period. For instance, if the IT team decides they want to keep all cloud snapshot data for a minimum of three months, they can do it without having to worry about which tapes to use. Also, if the IT team wants to upload data more frequently, they can run cloud snapshots several times a day.


Efficiency improvements with cloud snapshots
Cloud snapshots eliminate tape problems and operator errors because there are no tapes to manage, lose, or go bad. No tapes need to be loaded for the next backup operation, no tapes are transferred off site, there are no tape names and labels to worry about, and no courier services need to be engaged. The arcane best practices that were developed for tape backup no longer apply to cloud snapshots. This is an enormous time saver for the IT team and removes them from the drudgery of managing tapes, tape equipment, and backup processes. Data protection with cloud snapshots also eliminates the need to make full or synthetic full tapes. The incremental-only approach of cloud snapshots means that a minimal amount of data is copied and transferred. In addition, the fact that data is deduped on-premises before it is snapshotted means the amount of data that is uploaded is minimized.


Comparing cloud snapshots
The biggest difference between cloud snapshots with the Microsoft HCS solution and other backup products is the integration with Windows Azure Storage. Cloud snapshots improve data protection in three important ways:

1. Off-site automation. Cloud snapshots automatically copy data off site to Windows Azure Storage.

2. Access to off-site data. Cloud snapshot data stored off site is quickly accessed on premises.

3. Unlimited data storage and retention. The amount of backup data that can be retained on Windows Azure Storage is virtually unlimited.

Remote replication can be used to enhance disk-based backup and snapshot solutions by automating off-site data protection. The biggest difference between cloud snapshots and replication-empowered solutions is that replication has the added expense of remote systems and facilities overhead, including the cost of managing disk capacities and replication links.


Remote office data protection
Cloud snapshots are also effective for automating data protection in remote and branch offices (ROBOs). These locations often do not have skilled IT team members on site to manage backup, and as a result, it is common for companies with many ROBOs to have significant gaps in their data protection.

Installing the Microsoft HCS solution in ROBO locations allows the IT team to completely automate data protection in Windows Azure Storage. This highlights another important architectural advantage—the many to one (N:1) relationship of on-premises locations to cloud storage. This design makes it possible for a Microsoft HCS solution at a corporate data center to access data from any of the ROBO locations. In addition, alerts from CiS systems running in the ROBOs can be sent to the IT team so they can remotely troubleshoot any problems that arise.


The role of local snapshots
CiS systems also provide local snapshots that are stored on the CiS system. Although local and cloud snapshots are managed independently, the first step in performing a cloud snapshot is running a local snapshot. In other words, all the data that is snapped to the cloud is also snapped locally first. The IT team can schedule local snapshots to run on a regular schedule—many times a day and on demand.


Looking beyond disaster protection
Snapshot technology is based on a system of pointers that provide access to all the versions of data stored by the system. The Microsoft HCS solution has pointers that provides access to all the fingerprints stored on the CiS system and in Windows Azure Storage.

The fingerprints and pointers in a Microsoft HCS solution are useful for much more than disaster protection and accessing point-in-time copies of data. Together they form a hybrid data management system that spans the hybrid cloud boundary. A set of pointers accompanies every cloud snapshot that is uploaded to Windows Azure Storage, referencing the fingerprints that are stored there. The system of pointers and fingerprints in the cloud is a portable data volume that uses Windows Azure Storage for both protection and portability.

This hybrid data management system enables additional data and storage management functions beyond backup and disaster recovery. For example, data tiering and archiving both take advantage of it to manage data growth and drive storage efficiency. 

Source of Information : Rethinking Enterprise Storage

For the love of snapshots

Snapshot technology is an alternative to backup that was first made popular by NetApp in their storage systems. Snapshots are a system of pointers to internal storage locations that maintain access to older versions of data. Snapshots are commonly described as making point-in-time copies of data. With snapshots, storage administrators are able to recreate data as it existed at various times in the past.

Snapshot technology is widely appreciated by IT teams everywhere for having saved them innumerable hours that they would have spent restoring data from backup tapes. It’s no wonder that snapshot technology has become a key element of storage infrastructures and is one of the most heavily utilized features on most business-class storage systems.

While IT teams have largely replaced backups with snapshots for restoring historical versions of data, the two technologies are often used together in backup scenarios. Snapshots are used to capture updates to data and then backup processes capture the data from the snapshot. This keeps backups from interfering with active production applications and their data.

One problem with snapshots is that they consume additional storage capacity on primary storage that has to be planned for. The amount of snapshot data depends on the breadth of changed data and the frequency of snapshots. As data growth consumes more and more capacity the amount of snapshot data also tends to increase and IT teams may be surprised to discover they are running out of primary storage capacity. A remedy for this is deleting snapshot data, but that means fewer versions of data are available to restore than expected.

In many cases, that may not be a huge problem, but there could be times when not being able to restore previous versions of data could cause problems for the IT team. Otherwise, the ease that snapshot capacity can be returned to free space depends on the storage system and may not be as simple as expected.

Source of Information : Rethinking Enterprise Storage

Dedupe makes a big difference

A breakthrough in virtual tape technology came when dedupe technology was integrated with VTLs. Like previous-generation VTLs, dedupe VTLs require backup software products to generate backup data, but the dedupe function eliminates redundant data from backup streams. This translates directly into backup storage capacity savings and makes them much more cost-competitive with tape systems. Not only that, but dedupe VTLs improve backup  performance by simultaneously backing up a larger number of servers and by keeping more backup copies readily available online. Many organizations
happily replaced their tape backup systems with dedupe VTLs.

While dedupe VTLs have transformed backup for many IT teams, it has done relatively little to make disaster recovery easier. In most cases, tape copies still need to be made for off-site protection and the challenges of restoring data from tape are the same whether they were generated by tape drives or a dedupe VTL. However, like incremental-only backup solutions, some dedupe VTLs can also replicate data off site to another remote dedupe VTL, eliminating the need to make off-site tape copies—with the familiar caveats that remote replication adds additional systems and facilities costs as well as being more complicated to manage.


Dedupe variations: source and primary dedupe
Due to the success of dedupe backup systems, most people associate dedupe technology with target-side backup protection, but the technology can be successfully implemented other ways as well. Source dedupe implements dedupe technology before sending it over the network to be backed up. The main advantage of source dedupe is that it consumes far less bandwidth to transfer data and the main disadvantage is that it takes more processing resources on the server where the dedupe process runs.

Primary dedupe is the application of dedupe technology for primary production data, as opposed to being limited to backup data. The main advantage of primary dedupe is that it reduces the amount of capacity consumed on primary storage— which tends to be the most expensive storage in the data center. The main disadvantage of primary dedupe is the performance impact of running dedupe on production data.

Source of Information : Rethinking Enterprise Storage

Incremental-only backup

The incremental-only approach to backup makes a single full backup copy and thereafter makes incremental backup copies to capture newly written data. If synthetic full tapes are not made, this approach leads to horrendously long and troublesome restores because every tape that was ever made might be needed for recovery. This implies copies need to be made of every tape in case they fail and also requires them to be stored in different locations, which means it might be necessary to have multiple copies at each location to account for media failures and so on and so forth. (It’s funny what disaster paranoia will lead you to think about.)

That’s why backup vendors developed disk-based, incremental-only backup systems that automatically copy backup data from a backup system at one site to another system at a remote location. When a disaster happens at the primary site, a full recovery can be made at the remote site from backup data in the remote system.

Incremental-only backup solutions integrate database, replication, and backup software along with the redundant hardware systems and facilities overhead at the remote site. Like other disk-based backup systems, they have capacity limitations that restrict the amount of backup data that can be kept, requiring management diligence and planning. Incremental-only backup systems are effective for solving backup problems, but, if the IT team also wants to reduce the cost of storage, incremental-only systems probably don’t fit the bill.

Source of Information : Rethinking Enterprise Storage

Virtual tape

The desire to reduce the dependency on tape for recovery gave rise to the development of virtual tape libraries (VTLs) that use disk drives for storing backup data by emulating tapes and tape hardware. Off-site storage of backup data is accomplished by copying virtual tapes onto physical tapes and transporting them to an off-site facility. This backup design is called disk-to-disk-to-tape, or D2D2T—where the first disk (D) is in file server disk storage, the second disk (D) is in a virtual tape system, and tape refers to tape drives and media.

VTLs significantly improve the automation of backup processes and provide good backup performance, but are more expensive than tape backup systems. Because the storage capacity of virtual tape products is limited, it might not be possible to backup as many servers or retain as much backup data as desired. For cost, capacity, and performance reasons, VTLs were mostly used in niche environments until dedupe technology was integrated with them and made them more widely applicable.

Source of Information : Rethinking Enterprise Storage

The many complications and risks of tape

Magnetic tape technology was adopted for backup many years ago because it met most of the physical storage requirements, primarily by being portable so that it could be transported to an off-site facility. This gave rise to a sizeable ecosystem of related backup technologies and services, including tape media, tape drives, autoloaders, large scale libraries, device and subsystem firmware, peripheral interfaces, protocols, cables, backup software with numerous agents and options, off-site storage service providers, courier services, and a wide variety of consulting practices to help companies of all sizes understand how to implement and use it all effectively.


Tape media
Tape complexity starts with its physical construction. In one respect, it is almost miraculous that tape engineers have been able to design and manufacture media that meets so many challenging and conflicting requirements. Magnetic tape is a long ribbon of multiple laminated layers, including a microscopically jagged layer of extremely small metallic particles that record the data and a super-smooth base layer of polyester-like material that gives the media its strength and flexibility. It must be able to tolerate being wound and unwound and pulled and positioned through a high-tension alignment mechanism without losing the integrity of its dimensions. Manufacturing data grade magnetic tapes involves sophisticated chemistry, magnetics, materials, and processes.

Unfortunately, there are many environmental threats to tape, mostly because metals tend to oxidize and break apart. Tape manufacturers are moving to increase the environmental range that their products can withstand, but historically, they have recommended storing them in a fairly narrow humidity and temperature range. There is no question that the IT teams with the most success using tape take care to restrict its exposure to increased temperatures and humidity. Also, as the density of tape increases, vibration during transport has become a factor, resulting in new packaging and handling requirements. Given that tapes are stored in warehouses prior to being purchased and that they are regularly transported by courier services and stored off-site, there are environmental variables beyond the IT team’s control—and that makes people suspicious of its reliability.

Tape’s metallic layer is abrasive to tape recording heads and constantly causes wear and tear to them. Over time the heads wear out, sometimes much faster than expected. It can be very difficult to determine if the problem is head wear, tape defects, or dirty tape heads. Sometimes the only remedy is to replace both the tape heads and all the tapes. The time, effort, and cost involved in managing
wear-and-tear issues can be a sizeable burden on the IT group with no possible return on that investment to the organization. Tape aficionados are very careful about the tapes they buy and how they care for them, but many IT leaders no longer think it is worthwhile to maintain tapes and tape equipment.


Media management and rotation
Transporting tapes also exposes them to the risk of being lost, misplaced, or stolen. The exposure
to the organization from lost tapes can be extremely negative, especially if they contain customer account information, financial data, or logon credentials. Businesses that have lost tapes in-transit have not only had to pay for extensive customer notification and education programs, but they have also suffered the loss of reputation.

Backup software determines the order that tapes are used, as well as the generation of tape names. Unfortunately, tapes are sometimes mislabeled which can lead to incomplete backup coverage, as well as making restores and recoveries more challenging. It sounds like a simpleproblem to solve, but when you consider that multiple tapes may have been used as part of a single backup job and that some tapes (or copies of tapes) are off site and cannot be physically checked, it turns out that there is not always a fast way to clear up any confusion.

Tape rotation is the schedule that is used by backup software to determine which tapes should be used for the next backup operation. If an administrator improperly loads the wrong tape in a tape drive, the backup software may not run, which means new data is not protected.

Conversely, the backup software may choose to overwrite existing data on the tape, making it impossible to recover any of it. A similar problem occurs when a backup administrator erroneously deletes tape records from the backup system’s database or erases the wrong tapes. Backup only works correctly when the database used to track data on tape accurately reflects the data that is recorded on tapes.

These sorts of problems are well-known to backup administrators and are more common that one might think. Backup administration and tape management tends to be repetitive, uninteresting work which sets the stage for operator oversights and errors. This is the reality of tape backup and it is why automated data protection with the Microsoft HCS solution from

Microsoft is such an important breakthrough. It removes the responsibility for error- prone processes from people who would rather be doing something else. When you look at all the problems with tape, it is highly questionable as an infrastructure technology. Infrastructures should be dependable above all else and yet, that is the consistent weakness of tape technology in nearly all its facets.


Synthetic full backups
An alternative to making full backup copies is to make what are called synthetic full copies, which aggregate data from multiple tapes or disk-based backups onto a tape (or tapes) that contains all the data that would be captured if a full backup were to be run. They reduce the time needed to complete backup processing, but they still consume administrative resources and suffer from the same gremlins that haunt all tape processes.

The real issue is why it should be necessary to make so many copies of data that have already been made so many times before. Considering the incredible advances in computing technology over the years, it seems absurd that more intelligence could not be applied to data protection, and it highlights the fundamental weakness of tape as a portable media for off-site storage.


Restoring from tape
It would almost be comical if it weren’t so vexing, but exceptions are normal where recovering from tape is concerned. Things often go wrong with backup that keeps it from completing as expected. It’s never a problem until it’s time to recover data and then it can suddenly become extremely important in an unpleasant sort of way. Data that was skipped during backup cannot be recovered. Even worse, tape failures during recovery prevents data from being restored.

Unpleasant surprises tend to be just the beginning of a long detour where restores are concerned. Fortunately, there may be copies from earlier backup jobs that are available to recover. Unfortunately, several weeks or months of data could be lost. When this happens, somebody has a lot of reconstruction work to do to recreate the data that couldn’t be restored.

One thing to expect from disaster recovery is that more tapes will need to be used than assumed. Another is that two different administrators are likely to vary the process enough so that the tapes they use are different—as well as the time they spend before deciding the job is done, which implies the job is never completely finished. Most people who have conducted a disaster recovery would say there was unfinished business that they didn’t have time to figure out. Their efforts were good enough—they passed the test—but unknown problems were still lurking.

Source of Information : Rethinking Enterprise Storage

Cloud storage is for blocks too, not just files

One of the misconceptions about cloud storage is that it is only useful for storing files. This assumption comes from the popularity of file...