Category Archives: IT Strategy

Data Protection Best Practices

Data protection is the process of safeguarding information from threats to its integrity and availability. These threats include hardware failures, software bugs, operator and user errors, lost or stolen hardware, security breaches, and natural disasters.

Data protection is crucial to the operation of any company, so a sound data protection strategy must be in place. Following is my checklist for a good data protection strategy, covering design, implementation, and operation:

1. Backup and disaster recovery (DR) should be part of the overall design of the IT infrastructure. Network, storage, and compute resources must be allocated during the planning process. Small and inexperienced companies often treat backup and DR as an afterthought.

2. Classify data and applications according to importance. It is easier and more cost-effective to apply the right level of protection when data are classified properly.

3. The choice of backup technology – tape, disk, or cloud – depends on several factors, including the size of the company and the budget. For companies with budget constraints, tape backup with off-site storage generally provides the most affordable option for general data protection. For medium-sized companies, a cloud backup service can provide a disk-based backup target over an Internet connection or serve as a replication target. For large companies with multiple sites, on-premises disk-based backup with WAN-based replication to another company site or to a cloud service may be the best option.

4. Use snapshot technology that comes with the storage array. Snapshots are the fastest way to restore data.

5. Use the disk mirroring, array mirroring, and WAN-based array replication technology that comes with the storage array to protect against hardware and site failures.

6. Use continuous data protection (CDP) when granular rollback is required.

7. Perform disaster recovery tests at least once a year to verify that data can be restored within the planned time frames and that the right data is being protected and replicated.

8. Document backup and restore policies – including how often the backup occurs (e.g., daily), the backup method (e.g., full, incremental, synthetic full), and the retention period (e.g., 3 months). Policies must be approved by upper management and communicated to users. Document all disaster recovery procedures and processes as well.

9. Monitor all backup and replication jobs daily and address failed jobs right away (a simple monitoring sketch follows this list).

10.  Processes must be in place to ensure that newly provisioned machines are being backed up.  Too often, users assume that data and applications are backed up automatically.

11. Encrypt data at rest and data in motion.

12. Employ third-party auditors to verify data integrity and to confirm that the technology and processes work as advertised.
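
As a rough illustration of items 8 and 9, the sketch below expresses documented backup policies as data and flags failed jobs from the last daily run for follow-up. The policy values and job records are made-up examples; a real implementation would pull job status from the backup software's reporting interface.

```python
# Minimal sketch: represent documented backup policies as data and flag
# failed jobs from the last daily run. Policies and job records are
# made-up examples, not output from any specific backup product.
from datetime import date

# Item 8: documented policies (frequency, method, retention).
BACKUP_POLICIES = {
    "file-servers": {"frequency": "daily", "method": "incremental", "retention_days": 90},
    "erp-database": {"frequency": "daily", "method": "full", "retention_days": 365},
    "dev-vms": {"frequency": "weekly", "method": "synthetic full", "retention_days": 30},
}

# Item 9: job results as they might be reported by the backup software.
last_night_jobs = [
    {"policy": "file-servers", "status": "success", "run_date": date(2015, 3, 2)},
    {"policy": "erp-database", "status": "failed", "run_date": date(2015, 3, 2)},
]

def failed_jobs(jobs):
    """Return the jobs that need immediate follow-up."""
    return [job for job in jobs if job["status"] != "success"]

for job in failed_jobs(last_night_jobs):
    policy = BACKUP_POLICIES[job["policy"]]
    print(f"FOLLOW UP: {job['policy']} ({policy['method']}) failed on {job['run_date']}")
```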

A good data protection strategy combines the right tools, well-trained personnel, and effective processes and techniques to safeguard data.

Enterprise File Sync and Share

With the increased use of mobile devices (iPhone, iPad, Android tablets, etc.) in the enterprise, a platform that lets employees synchronize files across their various devices is becoming a necessity. They also need a platform where they can easily share files both inside and outside of the organization. Some employees have been using this technology unbeknownst to the IT department; the cloud-based sync-and-share app Dropbox has been especially popular in this area. The issue with these consumer cloud apps is that storing sensitive or regulated corporate data in them can pose a serious problem for the company.

Enterprises need a solution in their own internal data center where the IT department can control, secure, protect, back up, and manage the data. IT vendors have been offering such products for the last several years. Some examples of enterprise file sync and share are EMC Syncplicity, Egnyte Enterprise File Sharing, Citrix ShareFile, and Accellion Kiteworks.

A good enterprise file sync and share application must have the following characteristics:

1. Security. Data must be protected from malware, and it must be encrypted in transit and at rest. The application must integrate with Active Directory for authentication (a minimal authentication sketch follows this list), and there must be a mechanism to remotely lock and/or wipe devices.
2. The application and data should be delivered with WAN acceleration so that users do not perceive slowness.
3. Interoperability with Microsoft Office, SharePoint, and other document management systems.
4. Support for major endpoint devices (Android, Apple, Windows).
5. Ability to house data internally and in the cloud.
6. Finally, the app should be easy to use. Users’ files should be easy to access, edit, share, and restore, or else people will revert to the consumer cloud apps they already find easy to use.
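
To make the Active Directory requirement in item 1 concrete, here is a minimal sketch of how a sync-and-share back end might authenticate a user against AD over LDAPS using the Python ldap3 library. The domain controller, domain, and group names are hypothetical placeholders, not values from any particular product.

```python
# Minimal sketch: authenticate a sync-and-share user against Active Directory
# over LDAPS using the ldap3 library. The domain controller, domain, and
# group DN below are hypothetical placeholders.
from ldap3 import Server, Connection, ALL, SUBTREE

AD_SERVER = "ldaps://dc01.example.corp"      # assumed domain controller
BASE_DN = "DC=example,DC=corp"
REQUIRED_GROUP = "CN=FileSyncUsers,OU=Groups,DC=example,DC=corp"

def authenticate(username: str, password: str) -> bool:
    """Bind as the user, then verify membership in the sync-and-share group."""
    server = Server(AD_SERVER, get_info=ALL)
    # A simple bind with the user's own credentials validates the password.
    conn = Connection(server, user=f"{username}@example.corp", password=password)
    if not conn.bind():
        return False                          # wrong password or unknown user
    # Check group membership before granting access to shared files.
    conn.search(BASE_DN,
                f"(&(sAMAccountName={username})(memberOf={REQUIRED_GROUP}))",
                search_scope=SUBTREE, attributes=["cn"])
    authorized = len(conn.entries) == 1
    conn.unbind()
    return authorized

if __name__ == "__main__":
    print(authenticate("jdoe", "s3cret"))     # True only for group members
```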

The Battle Between External Cloud Providers and Internal IT Departments

Nowadays, when business units require computing resources for a new software application, they have a choice between using an external provider and using the company’s internal IT department. Gone are the days when they relied solely on the IT department to provide compute and storage resources. Business units are now empowered because of the growing reliability and ubiquity of external cloud providers such as Amazon Web Services (AWS).

Services from external providers are generally easy to use and fast to provision. As long as you have a credit card, a Windows or Linux server can be running within a few hours, if not minutes. Compare that to internal IT departments, which usually take days, if not weeks, to spin one up. Large companies in particular have to follow a bureaucratic procedure that can take weeks to complete.
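
As an illustration of how fast provisioning can be with an external provider, the sketch below launches a single Linux instance on AWS EC2 using the boto3 SDK. The region, AMI ID, and key pair name are hypothetical placeholders that would come from your own account.

```python
# Minimal sketch: provision a Linux server on AWS EC2 in a few API calls.
# The region, AMI ID, and key pair name below are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder Linux AMI
    InstanceType="t2.micro",
    KeyName="my-keypair",              # assumed existing key pair
    MinCount=1,
    MaxCount=1,
)

instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched {instance_id}; it is typically running within minutes.")

# Wait until the instance is actually running before handing it to the team.
waiter = ec2.get_waiter("instance_running")
waiter.wait(InstanceIds=[instance_id])
```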

Because of this, business units that are under pressure to deliver the application or service to end users end up using external providers. This is the fast-growing “shadow IT.” More often than not, IT departments do not know about it until they are called in to troubleshoot issues, such as fixing a slow network connection or restoring data after a security breach or data loss.

Using external providers can be good for the company. They have merits such as fast provisioning and the ability to scale up quickly, but they also have limitations. Security, vendor lock-in, and integration with on-premises applications and databases are some of the concerns. Some business units do not understand the implications for the company’s network, which may impact users during normal business hours. Some do not consider backup and disaster recovery. For regulated companies, compliance and data protection are important: they should be able to tell auditors where the data resides and where it is replicated. Also, as the use of compute and storage scales up, so does the cost.

External cloud providers are here to stay, and their innovation and services will only get better. The future as I foresee it is a hybrid model – a combination of external providers and internal IT. The key for companies is to provide guidelines and policies on when to use an external provider versus internal IT. For instance, a proof-of-concept application may be well suited to an external cloud because it is fast to provision. An application used by only a few users that needs no integration with existing applications is another candidate. An application that integrates with the company’s internal SAP system, on the other hand, is better suited to the internal cloud. These policies must be clearly communicated to business units.

IT departments, for their part, must provide a good level of service to the business, streamline the provisioning process, adopt technologies that let them respond to the business quickly, and offer internal cloud services that match what external providers offer. That way, business units will choose internal IT rather than external providers.

2015 Storage Trends

The world of data storage has seen significant innovation over the years. This year, companies will continue to adopt these storage technologies and storage vendors will continue to innovate and develop exciting products and services. Here are my top 5 storage trends for this year:

1. Software-defined storage (SDS), or storage virtualization, will start to see significant adoption for tier-2 and tier-3 storage. Virtual storage appliances such as Nutanix and Virtual SAN-like solutions such as VMware Virtual SAN will find their way into companies looking for simple converged solutions.

2. The cost of flash storage will continue to drop, driving its deployment for tier-1, I/O-intensive applications such as VDI. Flash will also continue to be deployed as server-side flash and in hybrid or tiered appliances.

3. Small and medium-sized companies will make headway in using the cloud for storage, mostly for backup and sync-and-share applications.

4. Storage vendors will release products with integrated data protection including encryption, archiving, replication, backup, and disaster recovery.

5. Finally, the demand for storage will continue to grow because of the explosion of big data, the “internet of things”, and large enterprises building redundant data centers.

Data-centric Security

Data is one of the most important assets of an organization; hence, it must be secured and protected. Data typically goes in and out of an organization’s internal network in order to conduct business and do valuable work. These days, data reside in the cloud and travel to employees’ mobile devices and business partners’ networks. Laptops and USB drives containing sensitive information sometimes get lost or stolen.

In order to protect the data, security must travel with the data. For a long time, the focus of security has been on the network and on the devices where the data resides. Infrastructure security such as firewalls and intrusion prevention systems is not enough anymore. The focus should now shift to protecting the data itself.

Data-centric security is very useful in dealing with data breaches, especially with data containing sensitive information such as personally identifiable information, financial information and credit card numbers, health information and intellectual property data.

The key to data-centric security is strong encryption: if the public or hackers get hold of sensitive data, it shows up as garbled information that is practically useless to them. To implement robust data-centric security, the following should be considered:

1. Strong data-at-rest encryption on the server/storage side and in applications and databases (a minimal encryption sketch follows this list).
2. Strong in-transit encryption using public key infrastructure (PKI).
3. Effective management of encryption keys.
4. Centralized control of security policy that enforces standards and protection for data stored on endpoint devices and on central servers and storage.
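
To make the data-at-rest item concrete, here is a minimal sketch of authenticated symmetric encryption using the Fernet recipe from the Python cryptography library. The sample plaintext is made up, and in practice the key would be kept in a key management system (item 3), not next to the data.

```python
# Minimal sketch: encrypt sensitive data at rest with authenticated
# symmetric encryption (Fernet, from the "cryptography" library).
# The sample plaintext is made up; keys belong in a key manager.
from cryptography.fernet import Fernet

# Key generation would normally happen once, inside a key management system.
key = Fernet.generate_key()
fernet = Fernet(key)

# A stand-in for sensitive data read from a file or database record.
plaintext = b"ssn=123-45-6789, cardholder=Jane Doe"

ciphertext = fernet.encrypt(plaintext)   # unreadable to anyone without the key
print(ciphertext[:40], b"...")

# Only a holder of the key can recover the original data.
assert fernet.decrypt(ciphertext) == plaintext
```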

Enterprise Search Using Google Search Appliance

One of the pain points for companies these days is how difficult it is to find relevant information inside their corporate network. I often hear people complain that it is easier to find information on the Internet using Google or Bing than inside the enterprise.

Well, Google has been selling its Google Search Appliance (GSA) for many years. GSA brings Google’s superior search technology to the corporate network. It even has the familiar look and feel that people have grown accustomed to when searching the Internet.

GSA can index and serve content located on internal websites, documents on file servers, and Microsoft SharePoint repositories.

I recently replaced an old GSA and quickly remembered how easy and fast it is to deploy. The GSA hardware is a souped-up Dell server with a bright yellow casing. Racking the hardware is a snap, and it comes with instructions on where to plug in the network interfaces. The initial setup is done via a back-to-back network connection to a laptop, where network settings such as the IP address, netmask, gateway, time server, mail server, etc. are configured.

Once the GSA is accessible on the network, the only other thing to do is configure the initial crawl of the web servers and/or file systems, which may take a couple of hours. After the documents are indexed, the appliance is ready to answer user search requests.
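
For illustration, search requests can also be issued programmatically over the GSA’s HTTP search protocol, with results returned as XML. The sketch below uses Python’s requests library; the appliance hostname, collection, and front-end names are assumed placeholders.

```python
# Minimal sketch: query a Google Search Appliance over its HTTP search
# protocol and list result titles and URLs. The hostname, collection,
# and front end below are hypothetical placeholders.
import requests
import xml.etree.ElementTree as ET

GSA_HOST = "https://gsa.example.corp"   # assumed appliance hostname

params = {
    "q": "expense report policy",       # the user's search terms
    "site": "default_collection",       # collection to search (placeholder)
    "client": "default_frontend",       # front end that shapes the results
    "output": "xml_no_dtd",             # ask for XML results
    "num": 10,
}

resp = requests.get(f"{GSA_HOST}/search", params=params, timeout=30)
resp.raise_for_status()

root = ET.fromstring(resp.text)
for result in root.iter("R"):           # each <R> element is one result
    title = result.findtext("T")
    url = result.findtext("U")
    print(title, "->", url)
```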

The search appliance has many advanced features and can be customized to your needs. For instance, you can customize the behavior and appearance of the search page, turn the auto-completion feature on or off, and configure security settings so that content is available only to properly authenticated users, among many other options.

Internal search engines such as the Google Search Appliance will increase the productivity of corporate employees by helping them save time looking for information.

Redefining Data Center In A Box

Data center in a box is traditionally defined as a “type of data center in which portable, mobile, and modular information nodes are self-contained within a cargo container. It is designed and packaged for quick deployment and acquisition of data center solutions in organizations or facilities, including remote off-site locations.” A data center in a box usually contains equipment from large storage, compute, and network vendors such as EMC, NetApp, Dell, and Cisco, pieced together to form the IT infrastructure. VCE (Virtual Computing Environment), for instance, offers Vblock, a bundled product containing EMC storage, Cisco servers, and VMware. NetApp has a similar offering called FlexPod.

But new, innovative companies such as SimpliVity, Nutanix, and Scale Computing are changing the definition of data center in a box. They are creating purpose-built products from the ground up that incorporate not just compute, storage, and network, but also additional services such as data deduplication, WAN optimization, and backup – all in a box.

For instance, SimpliVity’s product, called OmniCube, is “a powerful data center building block that assimilates the core functions of server, storage and networking in addition to a wide range of advanced functionality including: native VM-level backup, WAN optimization, bandwidth efficient replication for DR, cache accelerated performance, and cloud integration.”

These products will further simplify the design, implementation, and operation of IT infrastructure. With these boxes, there is no storage area network (SAN) to manage and no additional appliances, such as WAN accelerators, to deploy. A few virtual machine (VM) administrators can manage all the boxes in a cluster from the VMware server virtualization management user interface.

Data center in a box will continue to evolve and will change how we view and manage IT infrastructure for years to come.

IT Infrastructure Qualification and Compliance

One of the requirements for building and operating an IT infrastructure in a highly regulated industry (such as the pharmaceutical industry, which is regulated by the FDA) is to qualify, or validate, the servers, network, and storage as they are being built. Once built, any change to the infrastructure should go through a change control procedure.

Building the infrastructure and making changes to it should be verified and documented so that the work can be easily managed and traced. These activities are really not that different from best practices for operating an IT infrastructure, or even from the ITIL processes.

The FDA does not dictate how to perform IT infrastructure qualification or validation, as long as you have reasonable, documented procedures.

The problem is that some companies overdo their validation and change control processes. The common problems I’ve seen are: (1) too many signatures required to make a change; (2) no automated way to handle the documentation – many still route paper documents; and (3) the people who perform the checks and balances sometimes do not understand the technology.

The result is that IT personnel get overwhelmed with paperwork and bureaucracy. This discourages them from making critical changes to the infrastructure, such as applying security patches on time. It also makes IT personnel reluctant to bring newer or leading-edge technologies into their infrastructure.

Fortunately, the International Society for Pharmaceutical Engineering (ISPE) has published a Good Automated Manufacturing Practice (GAMP) guidance on IT Infrastructure Control and Compliance. Companies can create their own IT infrastructure qualification program and procedures based on the GAMP guidance document. They should be simple but comprehensive enough to cover all the bases. It is also important that these procedures be periodically reviewed and streamlined to achieve an optimized procedure.

IT Infrastructure for Remote Offices

When designing the IT infrastructure (servers, storage, and network) of small remote offices, infrastructure architects at large enterprises often face the question: what is the best IT infrastructure solution for remote sites? A low-cost, simple, secure, and easy-to-support solution always comes to mind, but a positive end-user experience, in terms of network and application performance and user friendliness, should also be among the top priorities when building the infrastructure.

Most small sites just need access to enterprise applications and to file and print services. Network infrastructure definitely needs to be built – the site’s local area network (LAN), wireless access points, a wide area network (WAN) connection to the enterprise data center, and access to the Internet. The bigger question, though, is: should servers and storage be installed at the site?

There are technologies, such as WAN accelerators and “sync and share” applications, that make it possible to forgo installing servers and storage at remote sites without sacrificing a positive end-user experience. For instance, Riverbed WAN accelerator products tremendously improve access to files and applications from remote sites to the enterprise data center. These products can even serve up remote datastores for VMware farms. “Sync and share” applications are Dropbox-like applications (such as EMC Syncplicity); enterprises can build a storage-as-a-service solution on their internal infrastructure, which eliminates the need to install file servers or storage appliances at the remote sites.

The decision to “install servers” or “go serverless” at remote sites still depends on many factors. It should be made on a case-by-case basis rather than with a cookie-cutter solution. Some of the criteria to consider are the number of people at the site and its growth projection, the storage size requirement, the available WAN bandwidth, the presence or absence of local IT support, office politics, and country- or region-specific regulations requiring data to remain local. If these issues are factored in, a better solution can be designed for remote offices.

Big Data

There is a lot of hype around big data these days: it promises the next big revolution in information technology, one that will change the way we do business and have a big impact on the economy, science, and society at large. In fact, big data is currently at the “peak of inflated expectations” on the Gartner technology hype cycle.

Big data “refers to our burgeoning ability to crunch vast collections of information, analyze it instantly, and draw sometimes profoundly surprising conclusions from it.” It answers questions that are sometimes not so obvious.

Big data definitely has tremendous potential, and after the hype has subsided, entities that do not take advantage of its power will be left behind. In fact, big data is already being used by technology companies such as Google, Amazon, and Facebook, and IT vendors such as Oracle, EMC, and IBM have started offering big data solutions for enterprises.

There are three drivers making big data possible:

First, a robust and cheap IT infrastructure – powerful server platforms that crunch data, advanced storage systems that hold huge amounts of data, and ubiquitous networks (Wi-Fi, 4G, fiber, etc.).

Second, the explosion of data from mobile devices, social networks, web searches, sensors, and many other sources.

Lastly, the proliferation of powerful analytics and data mining tools suited to big data, such as Hadoop, MapReduce, and NoSQL databases, with many more tools yet to be created. These tools will only get better and better.
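
To give a flavor of the MapReduce model mentioned above, here is a toy, single-machine sketch of its map and reduce phases counting word occurrences in Python. Real Hadoop/MapReduce jobs distribute these same two phases across a cluster; this is only an illustration of the idea.

```python
# Toy sketch of the MapReduce idea: count word occurrences.
# Real frameworks (e.g., Hadoop) run the map and reduce phases in parallel
# across many machines; this single-process version only illustrates the model.
from collections import defaultdict
from itertools import chain

documents = [
    "big data is big",
    "data drives decisions",
]

def map_phase(doc):
    """Emit (word, 1) pairs for every word in a document."""
    return [(word, 1) for word in doc.split()]

def reduce_phase(pairs):
    """Group the pairs by word and sum the counts."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

all_pairs = chain.from_iterable(map_phase(d) for d in documents)
print(reduce_phase(all_pairs))   # {'big': 2, 'data': 2, 'is': 1, ...}
```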

I recently read the book entitled “Big Data: A Revolution That Will Transform How We Live, Work, and Think” by Viktor Mayer-Schönberger and Kenneth Cukier.

The book is spot on in its predictions. With big data, there will be yet another paradigm shift in how we understand the world: “what” becomes more important than “why.” Big data also means processing complete data sets rather than just samples, and accepting results that are less than perfectly accurate.

The book also talks about the dark side of big data, such as the loss of privacy. It discusses how big data predictions can be used to police and punish individuals, and how organizations may blindly defer to what the data says without understanding its limitations.

I highly recommend the book to those who would like to fully understand big data and its implications.