Protecting Data Located at Remote Sites

One of the challenges of remote offices with limited bandwidth and plenty of data is how to protect that data. Building a local backup infrastructure can be cost-prohibitive, so the best option is usually to back up the data to the company’s data center or to a cloud provider.

But how do you initially bring the data to the backup server without impacting the business users who share the wide area network (WAN)?

There are three options:

1. The first option is to “seed” the initial backup: run the first backup locally to a USB drive, ship the drive to the data center, copy the data onto the backup server, then perform subsequent backups over the WAN to the data center.

2. Use the WAN to back up the data, but throttle the bandwidth until the initial backup completes. WAN utilization will stay low, but the backup may take some time to complete.

3. Use the WAN to back up the data, divvied up into smaller chunks. So that users are not affected during business hours, run the backup jobs only during off-hours and on weekends. This may also take some time to complete. (A rough sketch combining options 2 and 3 follows.)
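To make options 2 and 3 concrete, here is a minimal Python sketch of a throttled, off-hours transfer using rsync. The paths, host name, bandwidth cap, and business-hours window are assumptions for illustration only.

    import datetime
    import subprocess

    # Hypothetical source directory and destination host -- adjust for your environment.
    SOURCE = "/data/"
    DEST = "backup.example.com:/backups/remote-office/"

    def off_hours() -> bool:
        """True on weekends or outside an assumed 6 a.m.-8 p.m. business window."""
        now = datetime.datetime.now()
        return now.weekday() >= 5 or now.hour < 6 or now.hour >= 20

    if off_hours():
        # --bwlimit caps the transfer at ~5,000 KB/s so business traffic is
        # not starved; --partial lets an interrupted transfer resume, which
        # matters for multi-day initial backups.
        subprocess.run(
            ["rsync", "-a", "--partial", "--bwlimit=5000", SOURCE, DEST],
            check=True,
        )

Run from cron every hour, a script like this chips away at the initial backup in chunks without touching the WAN during the business day.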

Book Review: The Industries of the Future

I came across this book while browsing the New Arrivals section at a local bookstore. As a technology enthusiast, I was drawn to the title. However, the other reason I wanted to read this book was to find an answer to the question “How do we prepare our children for the future?” As the father of a teenage daughter, I would like to provide her with all the opportunities and exposure she needs to make the right career choice and be better prepared for the future.

The author Alec Ross states in the introduction, “This book is about the next economy. It is written for everyone who wants to know how the next wave of innovation and globalization will affect our countries, our societies, and ourselves.”

The industries of the future are:

1. Robotics. Robots have been around for many years, but ubiquitous network connectivity, the availability of big data, and faster processors are driving significant progress in robotics.

2. Genomics. If the last century was the age of physics, the coming century will be the age of biology. The sequencing of the human genome has opened the door to many opportunities in the life sciences.

3. Blockchains. The financial industry and the way we handle commerce will be transformed by this technology.

4. Cybersecurity. The Internet will be the next place where war between nations will be waged.

5. Big Data. Use of predictive analytics or other advanced methods to extract value from data will allow us to “perform predictions of outcomes and behaviors” and alter the way we live.

There is nothing new about these technologies. What made the book really worth reading, however, were the examples, anecdotes, and interesting stories Ross tells. The author has traveled extensively around the world and has first-hand experience with these technologies.

Back to the question, “How do we prepare our children for the future?” The best thing we can do is encourage them to pursue careers in science and technology, and let them travel so they will be comfortable in a multicultural world.

Translating Business Problems into Technology Solutions

One of the most important jobs of IT Consultants/Architects is to translate business problems into technology solutions. Companies must continually solve business problems to remain competitive, and exponential advances in information technology give them powerful new tools for doing so.

But translating business problems into technology solutions is often hard, because there is usually a disconnect between business people and technology people. Business people speak of vision, strategy, processes, and functional requirements, whereas technology folks speak about programming, infrastructure, big data, and technical requirements. People who understand the business are typically not fluent in technology, and vice versa: technology folks often do not understand business challenges. The two have very different perspectives: business folks are concerned with business opportunities, business climate, and business objectives, while technology folks are concerned with technology challenges, technical resources, and technical skills.

To be successful, IT Consultants/Architects should bridge this gap and provide businesses the services and solutions they need. They need to translate business objectives into actions. To do this, they should be able to identify business problems, determine the requirements for solving them, determine the technology available to help, and architect the best solution. In addition, they should be able to identify strategic partners who can help move the project forward, and anticipate likely barriers.

Most importantly though, IT Consultants/Architects should be able to manage expectations. It’s always better to under-promise and over-deliver.

Object Storage

A couple of days ago, a business user asked me if our enterprise IT provides object-based storage. I had heard the term object storage before, but I knew little about it; I only knew it’s a type of storage that is data aware. I replied, “No, we don’t offer it yet.” But in the back of my mind, I was asking myself: should we be offering object storage to our users? Are we so far behind that we haven’t implemented this cool technology? Is our business losing its competitive advantage because we haven’t been using it?

As I researched the topic, I came to understand what it entails, along with its advantages and disadvantages.

Object storage is one of the hot technologies expected to see growing adoption this year. As Wikipedia defines it, object storage “is a storage architecture that manages data as objects, as opposed to other storage architectures like file systems which manage data as a file hierarchy and block storage which manages data as blocks within sectors and tracks. Each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier.”

Its extended metadata allows for some intelligence in the data. For example, a user or application can tag a data object with what type of file it is, how it should be used, who will use it, what it contains, how long it should live, and so on. That metadata could, in turn, inform a backup application that the object is classified, or that it should be deleted on a certain date. This makes tasks like automation and management simpler for the administrator.

The globally unique identifier allows a server or end user to retrieve the data without needing to know its physical or hierarchical location. This makes object storage useful for long-term data retention, backup, file sharing, and cloud applications. In fact, Facebook uses object storage when you upload a picture.
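As a concrete illustration of both ideas, here is a minimal sketch against Amazon S3 (a widely used object store) via the boto3 library. The bucket name, key, and metadata tags are made up for this example.

    import boto3

    s3 = boto3.client("s3")  # credentials come from the environment

    # Store an object along with descriptive metadata tags.
    with open("family-trip-001.jpg", "rb") as f:
        s3.put_object(
            Bucket="example-archive-bucket",        # hypothetical bucket
            Key="photos/2016/family-trip-001.jpg",  # unique identifier
            Body=f,
            Metadata={
                "filetype": "photo",
                "owner": "jdoe",
                "delete-after": "2026-01-01",
                "classification": "personal",
            },
        )

    # Retrieve the object by its key alone -- no knowledge of its
    # physical or hierarchical location is needed.
    obj = s3.get_object(Bucket="example-archive-bucket",
                        Key="photos/2016/family-trip-001.jpg")
    print(obj["Metadata"])  # a backup tool could act on these tags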

One drawback of object storage is performance: slow throughput and high latency, due in part to the amount of metadata. Another drawback is that data consistency is achieved slowly. Whenever an object is updated, the change has to propagate to all of the replicas, and it takes time before the latest version becomes available. With these properties, object storage is well suited for data that doesn’t change much, like backups, archives, video, and audio files. That’s why it’s heavily used by Facebook, Spotify, and other cloud companies: once you upload a picture or a music file, it rarely changes and stays around indefinitely.

Object storage may be one of the hottest technologies in the storage space, but for now, I don’t see compelling use cases in enterprise IT. Object storage is unsuitable for data that changes frequently, and file systems and block storage handle both rarely changing and frequently changing data just fine. Enterprise backup systems are also versatile enough for long-term data retention and backups. Object storage may provide more information about the data, but storage administrators’ primary concerns are to deliver the data faster and more efficiently, and to protect its integrity.

Object storage’s distributed nature enables IT shops to use low-cost storage, but in reality, NAS and SAN remain prevalent in enterprise IT because they are reliable and easier to manage.

We need well-defined use cases and compelling advantages before object-based storage will be widely used in enterprise IT.

How to Restore from Replicated Data

When the primary backup server goes down due to a hardware error, a site disaster, or other causes, the only way to restore is from the replicated data, assuming the backup server was configured to replicate to a disaster recovery (DR) or secondary site.

In Avamar, replicated data is restored from the REPLICATE domain of the target Avamar server. All restores of replicated data are directed restores, because from the point of view of the Avamar target server, the restore destination is a different machine from the original one.

The procedure to restore files and directories is:

  1. Re-register and activate the client to the Avamar replication target server.
  2. Perform the file/directory restore:
    • Select the data that you want to restore from the replicated backups for the clients within the REPLICATE domain.
    • Select Actions > Restore Now.
    • On the Restore Options window, notice that the destination choice is blank; a new client must be selected.
    • Click Browse and select a client and destination from among the listed clients. Note that these are clients activated with the target server, not clients under the REPLICATE domain.

If the Windows or UNIX/Linux server was itself lost in the disaster, then build a new server first and follow the procedure above to restore files and directories to it. The other option is to perform a bare-metal restore, which Avamar supports on Windows Server 2008 and above.

Backup Replication Best Practices

Backup infrastructures that back up data to disk on premises, and do not store tape copies offsite, must replicate their data to a disaster recovery or secondary site to mitigate the risk of losing data when the primary site goes away in a disaster.

Popular backup solutions such as Avamar usually include a replication feature that logically copies data from one or more source backup servers to a destination or target backup server. In addition, Avamar deduplicates at the source server, transferring only unique data to the target server and encrypting the data in transit. Avamar replication is accomplished via asynchronous IP transfer and can be configured to run on a schedule.

Some of the best practices of Avamar replication are:

1. Replicate during periods of low backup activity and outside of routine server maintenance.
2. Replicate all backup clients.
3. Avoid filtering backup data, because filtering may inadvertently exclude backups.
4. Ensure available bandwidth is adequate to replicate all daily changed data within a 4-hour window (a quick back-of-the-envelope check is sketched below).
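To make item 4 concrete, here is a back-of-the-envelope bandwidth check in Python. The daily change rate and deduplication ratio are made-up assumptions; substitute your own measurements.

    # Assumptions for illustration only.
    daily_change_gb = 500   # data changed per day across all clients
    dedup_ratio = 0.3       # fraction still unique after source-side dedup
    window_hours = 4        # replication must finish within this window

    unique_gb = daily_change_gb * dedup_ratio
    required_mbps = unique_gb * 8 * 1000 / (window_hours * 3600)
    print(f"Required sustained WAN bandwidth: {required_mbps:.0f} Mb/s")
    # 500 GB/day at 30% unique -> 150 GB in 4 hours, or roughly 83 Mb/s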

Backing Up Virtual Machines Using Avamar Image-Level Backup

Avamar can back up virtual machines using either guest-level backup or image-level backup.

The advantages of VMware guest backup are that it allows backup administrators to leverage identical backup methods for physical and virtual machines, which reduces administrative complexity, and it provides the highest level of data deduplication, which reduces the amount of backup data across the virtual machines.

The second way to back up virtual machines is via the Avamar image-level backup. It is faster and more efficient, and it also supports file-level restores.

Avamar integrates with VMware VADP (vStorage APIs for Data Protection) to provide image-level backups. Integration is achieved through the Avamar VMware Image plug-in. Simply put, the VMware Image backup creates a temporary snapshot of the virtual machine and uses a virtual machine proxy to perform the image backup.
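To give a feel for the snapshot step, here is a rough pyVmomi sketch that creates and then removes a temporary, quiesced snapshot. This is not Avamar’s actual proxy code; the vCenter host, credentials, and VM name are placeholders.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()  # lab only; validate certs in production
    si = SmartConnect(host="vcenter.example.com", user="backup-svc",
                      pwd="secret", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        vm = next(v for v in view.view if v.name == "app-server-01")

        # A quiesced snapshot without memory freezes the virtual disks while
        # the VM keeps running, so a proxy can read a consistent image.
        WaitForTask(vm.CreateSnapshot_Task(
            name="temp-backup-snap",
            description="temporary snapshot for image backup",
            memory=False, quiesce=True))

        # ... an image backup would read the frozen disks here ...

        # Remove the temporary snapshot once the backup completes.
        snap = vm.snapshot.rootSnapshotList[0].snapshot
        WaitForTask(snap.RemoveSnapshot_Task(removeChildren=False))
    finally:
        Disconnect(si)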

Backup can occur while the virtual machines are powered on or off. Since the backup is handled by a proxy, CPU cycles of the target virtual machines are not used.

Avamar provides two ways for restoring virtual machine data: image restores, which can restore an entire image or selected drives; and file-level restores, which can restore specific folders or files.

However, file-level restores are supported only on Windows and Linux. In addition, they have the following limitations:

1. File-level restores are more resource-intensive and are best used to restore relatively small amounts of data. In fact, you cannot restore more than 5,000 folders or files.

2. The latest VMware Tools must be installed on the target virtual machine in order to successfully restore files and folders.

3. Several virtual disk configurations are not supported: dynamic disks, GPT disks, deduplicated NTFS, ReFS, extended partitions, bootloaders, and encrypted or compressed partitions.

4. ACLs are not restored.

5. Symbolic links cannot be restored.

6. When restoring files or folders to the original virtual machine, only SCSI disks are supported; IDE disks are not supported.

If you must restore folders or files and you run into the limitations mentioned above, you can restore an entire image or selected drives to a temporary location (for example, a new temporary virtual machine), then copy those files and folders to the desired location after the restore.

Harvard Club of Worcester

I was recently appointed as the president of the Harvard-Radcliffe Club of Worcester for the year 2015/2016.

The Harvard-Radcliffe Club of Worcester was founded in 1906 to increase fellowship opportunities for Harvard alumni in Worcester and Central Massachusetts.

As I begin my term for the 2015/2016 season, I invite all the alumni in the area to join and become active members of the club. In 2014/2015, the club hosted a number of gatherings and good times were shared by all. The club board worked hard to provide a range of events that appeal to a diversity of interests.

The club has a number of great events planned for the coming months. There are two exciting events scheduled in October:

(1)   Family-friendly apple picking, picnic, and wine tours at Nashoba Valley Winery on Sunday, October 4th at 10:30 a.m.

(2)   Fall Cocktail Party and recent graduates “Welcome to Your City” event on Tuesday, October 27th at 5:30 p.m. at Bocado Tapas Wine Bar.

In November, we will cheer on the Harvard Crimson Men’s Basketball Team as they take on the Holy Cross Crusaders at the Hart Center in Holy Cross (Sunday, November 29th at 1:00 p.m.)

In December, we’ll have our annual Harvard Club Holiday Dinner at Wachusett Mountain; in January, we’ll have our Harvard Alumni Global Networking Night at the Worcester Club; in February, we’re going on a trip to the newly renovated Harvard Art Museums; in March, we’ll cheer on the Harvard Crimson Men’s Basketball Team; in April, we’ll do community service in the Worcester area; in May, we’ll have our family-friendly brunch at Green Hill Park; and finally, in June, we’ll hold our annual meeting and election of officers.

In addition to club events, we have an active Schools and Scholarships Committee that assists the Harvard Admissions Committee by interviewing local candidates for admission and offering networking opportunities and support for Central Massachusetts students attending the College.  The Committee also administers the Harvard Book Award, given to outstanding juniors at local high schools.

If interested in any of these events, please email me at jonas.palencia@alumni.harvard.edu.

Data Protection Best Practices

Data protection is the process of safeguarding information from threats to data integrity and availability.  These threats include hardware errors, software bugs, operator errors, hardware loss, user errors, security breaches, and acts of God.

Data protection is crucial to the operation of any company, and a sound data protection strategy must be in place. Following is my checklist for a good data protection strategy, including implementation and operation:

1. Backup and disaster recovery (DR) should be part of the overall design of the IT infrastructure. Network, storage, and compute resources must be allocated in the planning process. Small and inexperienced companies too often treat backup and DR as an afterthought.

2. Classify data and applications according to importance. It is more cost-effective and easier to apply the necessary protection when data are classified properly.

3. As for which backup technology to use (tape, disk, or cloud), the answer depends on several factors, including the size of the company and the budget. For companies with budget constraints, tape backup with off-site storage generally provides the most affordable option for general data protection. For medium-sized companies, a cloud backup service can provide a disk-based backup target via an Internet connection or be used as a replication target. For large companies with multiple sites, on-premises disk-based backup with WAN-based replication to another company site or a cloud service may be the best option.

4. Use snapshot technology that comes with the storage array. Snapshots are the fastest way to restore data.

5. Use disk mirroring, array mirroring, and WAN-based array replication technology that come with the storage array to protect against hardware / site failures.

6. Use continuous data protection (CDP) when granular rollback is required.

7. Perform disaster recovery tests at least once a year to make sure the data can be restored within planned time frames and that the right data is being protected and replicated.

8. Document backup and restore policies – including how often the backup occurs (e.g. daily), the backup method (e.g. full, incremental, synthetic full), and the retention period (e.g. 3 months); a small policy-as-code sketch follows this list. Policies must be approved by upper management and communicated to users. Document all disaster recovery procedures and processes as well.

9. Monitor all backup and replication jobs on a daily basis and address the ones that failed right away.

10. Processes must be in place to ensure that newly provisioned machines are being backed up. Too often, users assume that data and applications are backed up automatically.

11. Encrypt data at rest and data in motion.

12. Employ third party auditors to check data integrity and to check if the technology and processes work as advertised.
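As mentioned in item 8, here is a minimal sketch of backup policies captured as code so they can be reviewed and versioned. The policy names, schedules, and retention periods are made-up examples, not recommendations.

    from dataclasses import dataclass

    @dataclass
    class BackupPolicy:
        name: str            # which systems the policy covers
        schedule: str        # e.g. "daily", "weekly"
        method: str          # "full", "incremental", or "synthetic full"
        retention_days: int  # how long backups are kept

    # Hypothetical examples of documented, reviewable policies.
    policies = [
        BackupPolicy("file-servers", "daily", "incremental", 90),
        BackupPolicy("file-servers", "weekly", "synthetic full", 90),
        BackupPolicy("databases", "daily", "full", 365),
    ]

    for p in policies:
        print(f"{p.name}: {p.schedule} {p.method}, keep {p.retention_days} days")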

A good data protection strategy consists of the right tools, well-trained personnel, and effective processes and techniques to safeguard data.

Enterprise File Sync and Share

Due to the increased use of mobile devices (iPhone, iPad, Android, tablets, etc.) in the enterprise, a platform where employees can synchronize files between their various devices is becoming a necessity. In addition, employees need a platform where they can easily share files both inside and outside of the organization. Some employees have been using this technology unbeknownst to the IT department; the cloud-based file sync-and-share app Dropbox has been especially popular in this area. The issue with these cloud-based sync-and-share apps is that storing sensitive or regulated corporate data on them can pose a serious problem for the company.

Enterprises must have a solution in their own internal data center where the IT department can control, secure, protect, back up, and manage the data. IT vendors have been offering these products over the last several years. Some examples of enterprise file sync and share are EMC Syncplicity, Egnyte Enterprise File Sharing, Citrix ShareFile, and Accellion Kiteworks.

A good enterprise file sync and share application must have the following characteristics:

1. Security. Data must be protected from malware, and it must be encrypted in transit and at rest. The application must integrate with Active Directory for authentication, and there must be a mechanism to remotely lock and/or wipe devices.
2. The application and data must be delivered with WAN acceleration, so users do not perceive slowness.
3. Interoperability with Microsoft Office, SharePoint, and other document management systems.
4. Support for major endpoint devices (Android, Apple, Windows).
5. Ability to house data internally and in the cloud.
6. Finally, the app should be easy to use. Users’ files should be easy to access, edit, share, and restore, or else people will revert to the cloud-based apps they find so easy to use.