Backup Replication Best Practices

Backup infrastructures that use disk to back up data on premises, rather than shipping tape copies offsite, must replicate their data to a disaster recovery or secondary site to mitigate the risk of losing data when the primary site is destroyed in a disaster.

Popular backup solutions such as Avamar usually include a replication feature that logically copies data from one or more source backup servers to a destination or target backup server. In addition, Avamar deduplicates data at the source server, transferring only unique data to the target server and encrypting it during transmission. Avamar replication is accomplished via asynchronous IP transfer and can be configured to run on a schedule.

Some best practices for Avamar replication are:

1. Replicate during periods of low backup activity and outside routine server maintenance windows.
2. Replicate all backup clients.
3. Avoid filtering backup data, because filters may inadvertently exclude backups.
4. Ensure available bandwidth is adequate to replicate all daily changed data within a four-hour window.
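As a rough illustration (the figures here are assumptions, not Avamar sizing guidance): if deduplication leaves about 500 GB of unique changed data per day, the replication link must sustain roughly 500 GB ÷ 4 hours ≈ 35 MB/s, or about 280 Mb/s, before allowing for protocol overhead.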

Backing Up Virtual Machines Using Avamar Image-Level Backup

Avamar can back up virtual machines using either guest-level backup or image-level backup.

The advantages of VMware guest-level backup are that it lets backup administrators use identical backup methods for physical and virtual machines, which reduces administrative complexity, and that it provides the highest level of data deduplication, which reduces the amount of backup data across the virtual machines.

The second way to back up virtual machines is Avamar image-level backup. It is faster and more efficient, and it also supports file-level restores.

Avamar integrates with VMware VADP (vStorage APIs for Data Protection) to provide image-level backups. Integration is achieved through the Avamar VMware Image plug-in. Simply put, the VMware Image backup creates a temporary snapshot of the virtual machine and uses a virtual machine proxy to perform the image backup.

Backups can occur while the virtual machines are powered on or off. Since the backup is handled by a proxy, it consumes no CPU cycles on the target virtual machines.

Avamar provides two ways for restoring virtual machine data: image restores, which can restore an entire image or selected drives; and file-level restores, which can restore specific folders or files.

However, file-level restores are supported only on Windows and Linux. In addition, they have the following limitations:

1. File-level restores are more resource intensive and are best used to restore a relatively small amount of data. In fact, you cannot restore more than 5,000 folders or files.

2. The latest VMware Tools must be installed on the target virtual machine in order to successfully restore files and folders.

3. Unsupported virtual disk configurations include dynamic disks, GPT disks, deduplicated NTFS, ReFS, extended partitions, bootloaders, and encrypted or compressed partitions.

4. ACLs are not restored.

5. Symbolic links cannot be restored.

6. When restoring files or folders to the original virtual machine, only SCSI disks are supported; IDE disks are not supported.

If you must restore folders or files but run into the limitations mentioned above, you can restore an entire image or selected drives to a temporary location (for example, a new temporary virtual machine), then copy those files and folders to the desired location after the restore.

Harvard Club of Worcester

I was recently appointed as the president of the Harvard-Radcliffe Club of Worcester for the year 2015/2016.

The Harvard-Radcliffe Club of Worcester was founded in 1906 to increase fellowship opportunities for Harvard alumni in Worcester and Central Massachusetts.

As I begin my term for the 2015/2016 season, I invite all the alumni in the area to join and become active members of the club.   In 2014/2015, the club hosted a number of gatherings and good times were shared by all.  The club board worked hard to provide a range of events that appeal to a diversity of interests.

The club has a number of great events planned for the coming months. There are two exciting events scheduled in October:

(1)   Family-friendly apple picking, picnic, and wine tours at Nashoba Valley Winery on Sunday, October 4th at 10:30 a.m.

(2)   Fall Cocktail Party and recent graduates “Welcome to Your City” event on Tuesday, October 27th  at 5:30 p.m. at Bocado Tapas Wine Bar.

In November, we will cheer on the Harvard Crimson Men’s Basketball Team as they take on the Holy Cross Crusaders at the Hart Center in Holy Cross (Sunday, November 29th at 1:00 p.m.)

In December, we’ll have our annual Harvard Club Holiday Dinner at Wachusett Mountain; in January, we’ll have our Harvard Alumni Global Networking Night at the Worcester Club; in February, we’re going on a trip to the newly renovated Harvard Art Museums; in March, we’ll cheer on the Harvard Crimson Men’s Basketball Team; in April, we’ll do community service in the Worcester area; in May, we’ll have our family-friendly brunch at Green Hill Park; and finally, in June, we’ll hold our annual meeting and election of officers.

In addition to club events, we have an active Schools and Scholarships Committee that assists the Harvard Admissions Committee by interviewing local candidates for admission and offering networking opportunities and support for Central Massachusetts students attending the College.  The Committee also administers the Harvard Book Award, given to outstanding juniors at local high schools.

If interested in any of these events, please email me at jonas.palencia@alumni.harvard.edu.

Data Protection Best Practices

Data protection is the process of safeguarding information from threats to data integrity and availability.  These threats include hardware errors, software bugs, operator errors, hardware loss, user errors, security breaches, and acts of God.

Data protection is crucial to the operation of any company, and a sound data protection strategy must be in place. Following is my checklist for a good data protection strategy, covering implementation and operation:

1. Backup and disaster recovery (DR) should be part of the overall design of the IT infrastructure. Network, storage, and compute resources must be allocated in the planning process. Small and inexperienced companies often treat backup and DR as an afterthought.

2. Classify data and applications according to importance. It is easier and more cost-effective to apply the necessary protection when data is classified properly.

3. As for which backup technology to use – tape, disk, or cloud – the answer depends on several factors, including the size of the company and the budget. For companies with budget constraints, tape backup with off-site storage generally provides the most affordable option for general data protection. For medium-sized companies, a cloud backup service can provide a disk-based backup target over an Internet connection or can be used as a replication target. For large companies with multiple sites, on-premises disk-based backup with remote WAN-based replication to another company site or a cloud service may be the best option.

4. Use the snapshot technology that comes with the storage array. Snapshots are the fastest way to restore data.

5. Use the disk mirroring, array mirroring, and WAN-based array replication technologies that come with the storage array to protect against hardware and site failures.

6. Use continuous data protection (CDP) when granular rollback is required.

7.  Perform disaster recovery tests at least once a year to make sure the data can be restored within planned time frames and that the right data is being protected and replicated.

8. Document backup and restore policies – including how often backups occur (e.g., daily), the backup method (e.g., full, incremental, synthetic full), and the retention period (e.g., 3 months). Policies must be approved by upper management and communicated to users. Also document all disaster recovery procedures and processes.

9. Monitor all backup and replication jobs on a daily basis and address failed jobs right away.

10.  Processes must be in place to ensure that newly provisioned machines are being backed up.  Too often, users assume that data and applications are backed up automatically.

11. Encrypt data at rest and data in motion.

12. Employ third party auditors to check data integrity and to check if the technology and processes work as advertised.

A good data protection strategy combines the right tools, well-trained personnel, and effective processes and techniques to safeguard data.

Enterprise File Sync and Share

With the increased use of mobile devices (iPhones, iPads, Android phones, tablets, etc.) in the enterprise, a platform that lets employees synchronize files across their various devices is becoming a necessity. Employees also need a platform where they can easily share files both inside and outside the organization. Some have been using this technology unbeknownst to the IT department; the cloud-based file sync and share app Dropbox has been especially popular in this area. The issue with these cloud-based sync-and-share apps is that, for corporate data that is sensitive and regulated, they can pose a serious problem for the company.

Enterprises must have a solution in their own internal data center where the IT department can control, secure, protect, back up, and manage the data. IT vendors have been offering such products over the last several years. Some examples of enterprise file sync and share are EMC Syncplicity, Egnyte Enterprise File Sharing, Citrix ShareFile, and Accellion Kiteworks.

A good enterprise file sync and share application must have the following characteristics:

1. Security. Data must be protected from malware, and it must be encrypted in transit and at rest. The application must integrate with Active Directory for authentication, and there must be a mechanism to remotely lock and/or wipe devices.
2. The application and data must be supported by WAN acceleration so that users do not perceive slowness.
3. Interoperability with Microsoft Office, SharePoint, and other document management systems.
4. Support for major endpoint devices (Android, Apple, Windows).
5. Ability to house data internally and in the cloud.
6. Finally, the app should be easy to use. Users’ files should be easy to access, edit, share, and restore, or else people will revert to the cloud-based apps they find easier to use.

The Battle Between External Cloud Providers and Internal IT Departments

Nowadays, when business units require computing resources for a new software application, they have a choice between using an external provider and using the company’s internal IT department. Gone are the days when they relied solely on the IT department to provide compute and storage resources. Business units are now empowered because of the growing reliability and ubiquity of external cloud providers such as Amazon Web Services (AWS).

Services provided by external providers are generally easy to use and fast to provision. As long as you have a credit card, a Windows or Linux server can be running within a few hours, if not minutes. Compare that with internal IT departments, which usually take days, if not weeks, to spin one up. Large companies in particular have to follow a bureaucratic procedure that takes weeks to complete.

Because of this, business units that are under pressure to deliver an application or service to end users end up using external providers. This is the fast-growing “shadow IT.” More often than not, IT departments do not know about it until they are called in to troubleshoot issues, such as fixing a slow network connection or restoring data after a security breach or data loss.

Using external providers can be good for the company. They have their merits, such as fast provisioning and the ability to scale up quickly, but they also have their limitations. Security, vendor lock-in, and integration with on-premises applications and databases are some of the concerns. Some business units do not understand the implications for the company’s network, which may impact users during normal business hours. Some do not consider backup and disaster recovery. For regulated companies, compliance and data protection are important; they should be able to tell the auditors where the data resides and where it is replicated. Also, as the use of compute and storage scales up, the cost grows.

External cloud providers are here to stay, and their innovation and services will keep getting better. The future, as I foresee it, will be a hybrid model – a combination of external providers and internal IT. The key for companies is to provide guidelines and policies on when to use an external provider versus internal IT. For instance, a proof-of-concept application may be well suited to an external cloud because it is fast to provision. An application used by only a few users that needs no integration with existing applications is another candidate. An application that integrates with the company’s internal SAP system, on the other hand, is better suited to the internal cloud. These policies must be clearly communicated to business units.

IT departments, for their part, must provide a good level of service to the business, streamline the provisioning process, adopt technologies that let them respond to the business quickly, and offer internal cloud services that match those of external providers. That way, business units will choose internal IT over external providers.

Integrating Riverbed Steelfusion with EMC VNX

SteelFusion is an appliance-based IT infrastructure for remote offices. SteelFusion eliminates the need for physical servers, storage, and backup infrastructure at remote offices by consolidating them into the data center. Virtual servers located in the data center are projected to the branch offices, giving branch office users access to servers and data with LAN-like performance.

SteelFusion uses VMware to project virtual servers and data to the branch office. A robust VMware infrastructure usually relies on Fibre Channel block-based storage such as EMC VNX. The advantage of using EMC VNX, or any robust storage platform, is its data protection features, such as redundancy and snapshots.

To protect data using EMC VNX array-based snapshots, and so that data can be backed up and restored using third-party backup software, follow these guidelines:

1. When configuring storage and LUNs, use RAID Groups instead of Storage Pools. Storage Pool snapshots do not currently integrate well with SteelFusion.

2. Create reserved LUNs to be used for snapshots.

3. When adding the VNX storage array information to the SteelFusion Core appliance, make sure to select ‘Type: EMC CLARiiON’, not EMC VNX.

For more information, consult the Riverbed documentation.

Migrating Data to Isilon NAS

Isilon has made it easy to migrate data from NetApp filers to Isilon clusters. It provides a utility called isi_vol_copy that copies files, including their metadata and ACL (access control list) information, via the NDMP protocol. The utility runs from the Isilon command-line interface, so there is no need for a separate host running migration tools such as robocopy, which can be slower and more difficult to manage.

isi_vol_copy is versatile enough to perform a full baseline copy of the data and then update the deltas daily using the incremental switch until the day of the cutover. Since Isilon’s OneFS is BSD-based, the incremental copy jobs can be scheduled via crontab.

The load can also be distributed by running the isi_vol_copy utility on multiple nodes on the Isilon cluster.

The syntax of the command is:

isi_vol_copy <source_filer>:<directory> -full|incr -sa username:password <destination_directory_on_Isilon>
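
For illustration, a baseline copy followed by incremental updates might look like the following (the filer name, credentials, and paths are placeholders, not values from a real environment):

isi_vol_copy netapp01:/vol/vol1/home -full -sa ndmpuser:password /ifs/data/home

isi_vol_copy netapp01:/vol/vol1/home -incr -sa ndmpuser:password /ifs/data/home

The incremental run can then be scheduled from the OneFS crontab, for example nightly at 1:00 a.m.:

0 1 * * * isi_vol_copy netapp01:/vol/vol1/home -incr -sa ndmpuser:password /ifs/data/home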

Using Isilon as VMware Datastore

I recently implemented a VMware farm utilizing Isilon as the backend datastore. Although Isilon’s specialty is sequential-access I/O workloads such as file services, it can also be used as storage for random-access I/O workloads, such as datastores for VMware farms. I recommend it, though, only for low- to mid-tier VMware farms.

Isilon scale-out storage supports both iSCSI and NFS implementations. However, the NFS implementation is far superior to iSCSI. The advantages of NFS are:

1. simplicity – managing virtual machines at the file level is simpler than managing LUNs,
2. rapid storage provisioning – instead of managing LUNs, all VMDK files may be stored on a single file export, eliminating the need to balance workloads across multiple LUNs,
3. higher storage utilization rates – VMDK files are thin-provisioned by default when using a NAS-based datastore.

In addition, Isilon supports only software iSCSI initiators.

Isilon supports VAAI (vStorage APIs for Array Integration), which offloads I/O-intensive tasks (such as Storage vMotion, virtual disk cloning, NAS-based VM snapshots, and instant VM provisioning) from the ESXi host directly to the Isilon storage cluster, resulting in faster overall completion times. Isilon also supports VASA (vStorage APIs for Storage Awareness), which presents the underlying storage capabilities to vCenter.

When using NFS datastores, it is very important to follow the implementation best practices published by EMC Isilon. Some of the important best practices are:

1. Connect the Isilon nodes and ESXi hosts to the same physical switches on the same subnet. The underlying network infrastructure should also be redundant, with redundant switches, for example.
2. Use 10 Gb Ethernet connectivity to achieve optimal performance.
3. Segment NFS traffic so that other traffic, such as virtual machine network traffic or management network traffic, does not share bandwidth with NFS traffic.
4. Use separate vSwitches for NFS traffic on the VMware hosts and dedicated NICs for NFS storage.
5. Use a SmartConnect zone to load-balance across multiple Isilon nodes and to provide dynamic failover and failback of client connections across the Isilon storage nodes.
6. Enable the VASA features and functions to simplify and automate storage resource management.
7. To achieve higher aggregate I/O, create multiple datastores, with each datastore mounted via a separate FQDN/SmartConnect pool and network interface on the Isilon cluster (see the example below).
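
As a concrete sketch of item 7, each datastore can be mounted from the ESXi command line against its own SmartConnect zone name. The zone names, export paths, and datastore labels below are placeholders, and esxcli options may vary slightly between ESXi releases:

esxcli storage nfs add --host=ds1.isilon.example.com --share=/ifs/vmware/datastore1 --volume-name=isilon-ds1

esxcli storage nfs add --host=ds2.isilon.example.com --share=/ifs/vmware/datastore2 --volume-name=isilon-ds2

Mounting each datastore through a different SmartConnect zone or IP pool spreads the NFS sessions across different Isilon nodes and network interfaces.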

2015 Storage Trends

The world of data storage has seen significant innovation over the years. This year, companies will continue to adopt these storage technologies and storage vendors will continue to innovate and develop exciting products and services. Here are my top 5 storage trends for this year:

1. Software-defined storage (SDS), or storage virtualization, will start to see wide adoption for tier-2 and tier-3 storage. Virtual storage appliances such as Nutanix and Virtual SAN–style solutions such as VMware Virtual SAN will find their way into companies looking for simple converged solutions.

2. The cost of flash storage will continue to drop, driving its deployment for tier-1, I/O-intensive applications such as VDI. Flash will also continue to be used as server-side flash and in hybrid or tiered appliances.

3. Small and medium companies will make headway in utilizing the cloud for storage, but mostly as backup and sync-and-share applications.

4. Storage vendors will release products with integrated data protection including encryption, archiving, replication, backup, and disaster recovery.

5. Finally, the demand for storage will continue to grow because of the explosion of big data, the “internet of things”, and large enterprises building redundant data centers.