Important Responsibilities of IT Infrastructure Operations

The main function of IT operations is to keep IT services running smoothly and efficiently. It would be nirvana if IT infrastructure services just worked perfectly throughout their lifespan after they are initially installed and configured. The reality, however, is that hardware fails, bugs are found, features need to be added, security flaws are discovered, patches need to be applied, usage fluctuates, data needs to be protected, upgrades need to be done, and demand grows. The following are the important jobs of IT operations:

Monitoring

Monitoring is the only way to keep track of the health and availability of systems. It can be accomplished by looking at a system’s health via a dashboard or console, or via specialized monitoring appliances. One major component of monitoring is alerting via email or pager when there is a major issue. Monitoring input can also come from incident tickets generated by machines or end users, which can surface problems that are not apparent from machine monitoring alone. System logs can also be used for trending and monitoring, as they can bring to light flaws in the system.
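
As an illustration, here is a minimal health-check-and-alert sketch in Python. The service URLs, the SMTP relay, and the alert address are hypothetical placeholders, and a real environment would normally rely on a dedicated monitoring platform rather than a hand-rolled script like this.

    # Minimal monitoring sketch: poll HTTP health endpoints and e-mail an alert
    # on failure. Hostnames, SMTP relay, and addresses are hypothetical.
    import smtplib
    import time
    import urllib.request
    from email.message import EmailMessage

    SERVICES = {
        "intranet": "http://intranet.example.com/health",
        "wiki": "http://wiki.example.com/health",
    }

    def is_healthy(url, timeout=5):
        """Return True if the endpoint answers with HTTP 200 within the timeout."""
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status == 200
        except Exception:
            return False

    def send_alert(service, url):
        """Send a simple e-mail alert through an internal SMTP relay."""
        msg = EmailMessage()
        msg["Subject"] = f"ALERT: {service} is down"
        msg["From"] = "monitor@example.com"
        msg["To"] = "oncall@example.com"
        msg.set_content(f"Health check failed for {service} at {url}")
        with smtplib.SMTP("smtp.example.com") as smtp:
            smtp.send_message(msg)

    if __name__ == "__main__":
        while True:                      # poll every five minutes
            for name, url in SERVICES.items():
                if not is_healthy(url):
                    send_alert(name, url)
            time.sleep(300)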

Troubleshooting

Once issues are detected, IT operations should be able to troubleshoot them and fix them as soon as possible to bring the service back online. Issues that are more complicated to fix should be escalated to vendors, higher-level support, or engineers and developers.

Change Control

IT operations should not make any changes (such as configuration changes, hardware replacements, or upgrades) without following the proper change control procedure. More than 50% of outages are caused by changes to systems. IT services are often tightly integrated with other systems, and a change to one system can affect the others. Subject matter experts for the various systems should verify that a change will not affect their systems. Planning and testing are vital steps in performing changes.

Capacity planning

IT operations should monitor and trend the utilization of resources (compute, storage, network) and allocate them to ensure there is enough capacity to serve demand. They should be able to forecast growth and provision resources so that capacity is available when it is needed.
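
As a simple illustration of trending, the sketch below fits a straight line to historical utilization samples and estimates how many weeks remain before an assumed 80% threshold is reached. The sample data points and the threshold are made up for the example; in practice this trending usually comes from a monitoring or analytics tool.

    # Capacity-trending sketch: fit a line to weekly utilization samples and
    # estimate how long until an (assumed) 80% threshold is reached.
    # The sample data points are hypothetical.
    from statistics import mean

    weeks = [1, 2, 3, 4, 5, 6]
    storage_pct = [52, 54, 57, 59, 62, 64]   # observed % of storage pool used

    def linear_fit(xs, ys):
        """Ordinary least-squares slope and intercept."""
        x_bar, y_bar = mean(xs), mean(ys)
        slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
                 / sum((x - x_bar) ** 2 for x in xs))
        return slope, y_bar - slope * x_bar

    slope, intercept = linear_fit(weeks, storage_pct)
    threshold = 80.0
    weeks_left = (threshold - intercept) / slope - weeks[-1]
    print(f"Growing ~{slope:.1f}% per week; "
          f"about {weeks_left:.0f} weeks until {threshold:.0f}% utilization")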

Performance optimization

IT operations should optimize IT services and ensure efficient use of resources. The goal is to provide an excellent user experience for these services. Mechanisms such as redundancy, local load balancing, global load balancing, and caching improve utilization, efficiency, and the end-user experience.
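
As one small example of these mechanisms, the sketch below shows a tiny time-to-live (TTL) cache that keeps recently fetched results in memory so repeated requests do not hit the backend. The 60-second TTL and the expensive_lookup function are illustrative assumptions; a real deployment would more likely use a CDN or a dedicated in-memory cache.

    # Caching sketch: memoize expensive lookups for a short TTL so repeated
    # requests are served from memory instead of the backend.
    import time

    class TTLCache:
        def __init__(self, ttl_seconds=60):
            self.ttl = ttl_seconds
            self._store = {}          # key -> (value, expiry timestamp)

        def get(self, key, compute):
            """Return the cached value, recomputing it when the entry expires."""
            value, expires = self._store.get(key, (None, 0))
            if time.time() >= expires:
                value = compute(key)
                self._store[key] = (value, time.time() + self.ttl)
            return value

    def expensive_lookup(key):
        """Stand-in for a slow database query or remote API call (hypothetical)."""
        time.sleep(0.5)
        return f"result-for-{key}"

    cache = TTLCache(ttl_seconds=60)
    print(cache.get("user:42", expensive_lookup))   # slow: computed
    print(cache.get("user:42", expensive_lookup))   # fast: served from cache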

Backups

In addition to keeping IT services running smoothly, IT operations should also protect these systems and their data by backing them up and replicating them to a remote site. The goal is to bring the systems back online in as little time as possible after a catastrophic failure.

Security

IT operations should also be responsible for securing the systems. Because this is an enormous task, many companies employ a dedicated Security Operations Center (SOC) that watches for security breaches.

Automation

One of the goals of IT operations is to automate most manual activities via scripting and self-healing mechanisms. This enables the team to focus on higher-value tasks and not get bogged down in repetitive work.
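
A very common automation pattern is a self-healing job that restarts a service when its health check fails. The sketch below illustrates the idea; the service name (nginx), the health URL, and the use of systemd are assumptions, and many shops would implement this in their configuration-management or monitoring platform instead.

    # Self-healing sketch: if the web service stops answering its health check,
    # restart it via systemd. Service name and health URL are assumed.
    import logging
    import subprocess
    import urllib.request

    SERVICE = "nginx"
    HEALTH_URL = "http://localhost/healthz"

    logging.basicConfig(level=logging.INFO)

    def healthy():
        try:
            with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
                return resp.status == 200
        except Exception:
            return False

    if not healthy():
        logging.warning("%s failed its health check; restarting", SERVICE)
        subprocess.run(["systemctl", "restart", SERVICE], check=True)
    else:
        logging.info("%s is healthy", SERVICE)

Run from cron every few minutes, a job like this handles routine restarts without paging anyone, leaving alerts for the failures that automation cannot fix.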

Mitigating Insider Threats

With all the news about security breaches, we often hear about external cyber attacks, but internal attacks are widely underreported. Studies show that between 45% and 60% of all attacks are carried out by insiders. In addition, insider attacks are harder to detect and prevent because the access and activity come from trusted users and systems.

Why is this so common, and why is it so hard to mitigate? The following reasons have been cited to explain why there are so many incidents of internal security breaches:

1. Companies don’t employ data protection, don’t apply patches on time, or don’t enforce any security policies/standards (such as using complex passwords). Some companies wrongly assume that installing a firewall can protect them from inside intruders.

2. Data is outside the control of IT security, such as when it resides in the cloud.

3. The greatest cause of security breaches is also the weakest link in the security chain – people. There are two types of people in this weak link:

a. People who are vulnerable, such as careless users who use USB drives, send sensitive data through public email services, or sacrifice security in favor of convenience. Most of the time, these users are not aware that their accounts have already been compromised via malware, phishing attacks, or stolen credentials gleaned from social networks.

b. People who have their own agenda, or what we call malicious users. These individuals want to steal and sell competitive data or intellectual property for money, or they may have a personal vendetta against the organization.

There are, however, proven measures to lessen the impact of insider threats:

1. Monitor the users, especially those who hold the potential for greatest damage – top executives, contractors, vendors, at-risk employees, and IT administrators.

2. Learn the way they access data, create a baseline, and detect any anomalous behavior (a minimal sketch follows this list).

3. When divergent behavior is detected, such as an unauthorized download or server log-in, take an action such as blocking or quarantining the user.
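
A minimal sketch of the baseline-and-detect idea, assuming you already collect per-user counts of daily file downloads: flag any day that is far above the user’s historical average. The sample history and the three-standard-deviation rule are illustrative only; commercial user-behavior-analytics tools use much richer models.

    # Anomaly-detection sketch: flag a user whose daily download count is more
    # than three standard deviations above their baseline. Sample data is
    # hypothetical.
    from statistics import mean, stdev

    history = {   # user -> downloads per day over the past two weeks
        "alice": [12, 9, 14, 11, 10, 13, 12, 9, 11, 10, 12, 14, 10, 11],
        "bob":   [3, 4, 2, 5, 3, 4, 3, 2, 4, 3, 5, 4, 3, 2],
    }

    def is_anomalous(user, todays_count, threshold_sigmas=3.0):
        baseline = history[user]
        mu, sigma = mean(baseline), stdev(baseline)
        return todays_count > mu + threshold_sigmas * max(sigma, 1.0)

    for user, today in [("alice", 13), ("bob", 250)]:
        if is_anomalous(user, today):
            print(f"ALERT: {user} downloaded {today} files today -- quarantine the account")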

It should be noted that when an individual is caught compromising security, more often than not the damage has already been done. The challenge is to be proactive so that the breach does not happen in the first place.

An article in Harvard Business Review has argued that psychology is the key to detecting internal cyber threats.

In essence, companies should focus on understanding and anticipating human behavior, for example by analyzing employee language (in email, chat, and text) continuously and in real time. The author contends that “certain negative emotions, stressors, and conflicts have long been associated with incidents of workplace aggression, employee turnover, absenteeism, accidents, fraud, sabotage, and espionage.”

Applying big data analytics and artificial intelligence to employee language in email, chat, voice, text logs, and other digital communications may uncover worrisome content, meaning, language patterns, and deviations in behavior, making it easier to spot indications that a user is a security risk or may perform malicious activity in the future.
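
As an extremely simplified illustration of this idea, the sketch below scans message text for a small list of disgruntlement-related keywords and counts how often each sender uses them. The keyword list and messages are hypothetical; real programs apply natural-language processing and behavioral models, and must respect privacy and policy constraints.

    # Language-analytics sketch: count how often each sender uses words that are
    # associated with disgruntlement. Keywords and messages are hypothetical.
    from collections import Counter
    import re

    RISK_WORDS = {"unfair", "revenge", "quit", "steal", "hate"}

    messages = [
        ("carol", "The new project timeline looks fine to me."),
        ("dave",  "This review was unfair. I hate this place and I want revenge."),
    ]

    scores = Counter()
    for sender, text in messages:
        words = re.findall(r"[a-z']+", text.lower())
        scores[sender] += sum(1 for word in words if word in RISK_WORDS)

    for sender, score in scores.items():
        if score >= 2:
            print(f"Review communications from {sender} (risk-word count: {score})")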

(ISC)2 Security Congress 2016

I recently attended the (ISC)2 Annual Security Congress (in conjunction with ASIS International) in Orlando, Florida. (ISC)2 Security Congress is a premier 4-day conference attended by hundreds of IT security professionals from around the world. This year featured a line-up of excellent speakers including keynote speeches from journalist Ted Koppel and foreign policy expert Elliott Abrams.

Here are the top IT security topics I gathered from the conference:

  1. Cloud security. As more and more companies are migrating to the cloud, IT security professionals are seeking the best practices for securing applications and data in the cloud.
  2. IoT (Internet of Things) security. It’s still a wild west out there. Manufacturers are making IoT devices (sensors, cameras, appliances, etc.) that are insecure, and there is a lack of standardization. People are putting devices on the Internet with default settings and passwords, which makes them vulnerable. Inside most companies, there is usually no process for putting these IoT devices on the network.
  3. Ransomware. It is getting more prevalent and sophisticated. Some perpetrators have a solid business model around it, including a call center/help desk to help victims pay the ransom and recover their data.
  4. Resiliency. It’s better to build your network for resiliency. Every company will be a victim of an attack at some point, even with the best defenses in place. Resilient networks are those that can recover quickly after a breach.
  5. Common sense security. There are plenty of discussions on using time-tested security practices such as hardening of devices (replacing default passwords for instance), patching on time, and constant security awareness for users.
  6. Cyberwar. There is a mounting occurrence of cyber incidents, and the next big threat to our civilization is cyberwar. Bad actors (state-sponsored hackers, hacktivists, criminals, etc.) may be able to hack into the industrial systems that control our electrical and water supplies and disrupt or destroy them.
  7. Shortage of cybersecurity experts. The industry is predicting a shortage of cybersecurity professionals in the near future.

Hyper-converged Infrastructure: Hype or For Real?

One of the hottest emerging technologies in IT is hyper-converged infrastructure (HCI). What is the hype all about? Is it here to stay?

As defined by Techtarget, hyper-converged infrastructure (HCI) is a system with a software-centric architecture that tightly integrates compute, storage, networking, and virtualization resources (hypervisor, virtual storage, virtual networking), along with other technologies such as data protection and deduplication, in a commodity hardware box (usually x86) supported by a single vendor.

Hyper-convergence grew out of the concept of converged infrastructure; engineers took it a little further with a very small hardware footprint, tight integration of components, and simplified management. It is a relatively new technology, and on the technology adoption curve it is still at the early-adopter stage.

Nutanix was the first vendor to offer a hyper-converged solution, followed by SimpliVity and Scale Computing. Not to be outdone, VMware developed EVO:RAIL, then opened it up for hardware vendors to OEM the product. Major vendors, including EMC, NetApp, Dell, HP, and Hitachi, began selling EVO:RAIL products.

One of the best HCI products that I’ve seen is VxRail. Jointly engineered by VMware and EMC, the “VxRail appliance family takes full advantage of VMware Hyper-Converged Software capabilities and provides additional hardware and lifecycle management features and rich EMC data services, delivered in a turnkey appliance with integrated support.”

What are the advantages of HCI, and where can it be used? Customers who are looking to start small and scale out over time will find an HCI solution very attractive. It is a perfect fit for small to medium-sized companies that want to build their own data centers without spending a huge amount of money. It is simple (because it eliminates a lot of hardware clutter) and highly scalable (because it can be grown easily by adding small, standardized x86 nodes). Since it is scalable, it eases the burden of growth. Finally, its performance is comparable to that of big infrastructures, because leveraging SSD storage and bringing the data close to the compute enables high IOPS at very low latency.

References:

1. Techtarget
2. VMware Hyper-Converged Infrastructure: What’s All the Fuss About?

Replicating Massive NAS Data to a Disaster Recovery Site

Replicating Network Attached Storage (NAS) data to a Disaster Recovery (DR) site is quite easy when using big-name NAS appliances such as NetApp or Isilon. Replication software is already built into these appliances – SnapMirror for NetApp and SyncIQ for Isilon. It just needs to be licensed to work.

But how do you replicate terabytes, even petabytes, of data to a DR site when the Wide Area Network (WAN) bandwidth is a limiting factor? Replicating a petabyte of data may take weeks, if not months, to complete even on a 622 Mbps WAN link, leaving the company’s DR plan vulnerable.
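
The arithmetic behind that estimate is straightforward, as the short calculation below shows. It assumes the link can be fully and continuously utilized, which in practice it rarely is, so real transfers take even longer.

    # Rough WAN transfer-time estimate for 1 PB over a 622 Mbps link,
    # assuming 100% link utilization (real-world throughput is lower).
    data_bits = 1e15 * 8        # 1 PB expressed in bits
    link_bps = 622e6            # 622 Mbps
    seconds = data_bits / link_bps
    print(f"{seconds / 86400:.0f} days, or about {seconds / (86400 * 30):.1f} months")
    # -> roughly 149 days, or about 5 months, before replication completes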

One way to accomplish this is to use a temporary swing array: (1) replicate the data from the source array to the swing array locally, (2) ship the swing array to the DR site, (3) copy the data to the destination array, and finally (4) resync the source array with the destination array.

On NetApp, this is accomplished by using the SnapMirror resync command. On Isilon, this is accomplished by turning on the “target-compare-initial” option in SyncIQ, which compares the files between the source and destination arrays and sends only the data that differ over the wire.

With this technique, huge amounts of company data sitting on NAS devices can be protected at the DR site right away.

Protecting Data Located at Remote Sites

One of the challenges of remote offices with limited bandwidth and plenty of data is how to protect that data. Building a local backup infrastructure can be cost-prohibitive, and usually the best option is to back up the data to the company’s data center or to a cloud provider.

But how do you initially bring the data to the backup server without impacting the business users using the wide area network (WAN)?

There are three options:

1. The first option is to “seed” the initial backup. Start the backup locally to a USB drive, ship the drive to the data center, copy the data, then perform subsequent backups to the data center.

2. Use the WAN to back up the data but throttle the bandwidth until the backup completes. WAN utilization will be low, but it may take some time to finish.

3. Use the WAN to back up the data, but divvy it up into smaller chunks. So that users are not affected during business hours, run the backup jobs only during off-hours and on weekends (a minimal chunking sketch follows this list). This may also take some time to complete.
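
Here is a minimal sketch of the third option, assuming hypothetical directory names and a fixed number of nightly backup windows: split the top-level directories into batches and assign one batch per off-hours window, handing each batch to whatever backup tool is in use.

    # Chunking sketch: divide top-level directories into nightly batches so the
    # initial backup runs only during off-hours. Directory names are hypothetical.
    directories = ["finance", "engineering", "hr", "marketing", "legal",
                   "sales", "operations", "archive"]
    nightly_windows = 4

    batches = [directories[i::nightly_windows] for i in range(nightly_windows)]
    for night, batch in enumerate(batches, start=1):
        # each batch would be handed to the backup tool's job scheduler
        print(f"Night {night}: back up {', '.join(batch)}")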

Book Review: The Industries of the Future

I came across this book while browsing the New Arrivals section at a local bookstore. As a technology enthusiast, I was intrigued by the title. However, the other reason I wanted to read this book was to find an answer to the question “How do we prepare our children for the future?” As a father of a teenage daughter, I would like to provide her with all the opportunities and exposure she needs to make the right career choice and be better prepared for the future.

The author Alec Ross states in the introduction, “This book is about the next economy. It is written for everyone who wants to know how the next wave of innovation and globalization will affect our countries, our societies, and ourselves.”

The industries of the future are:

1. Robotics. Robots have been around for many years, but the ubiquity of network connections, the availability of big data, and faster processors are driving significant progress in robotics.

2. Genomics. If the last century was the age of physics, the coming century will be the age of biology. The sequencing of the genome has opened the door to many opportunities in the life sciences.

3. Blockchains. The financial industry and the way we handle commerce will be transformed by this technology.

4. Cybersecurity. The Internet will be the next place where war between nations will be waged.

5. Big Data. Use of predictive analytics or other advanced methods to extract value from data will allow us to “perform predictions of outcomes and behaviors” and alter the way we live.

There is nothing new about these technologies. However, what makes the book really worth reading are the examples, anecdotes, and interesting stories told by Ross. The author has traveled extensively around the world and has first-hand experience with these technologies.

Back to the question, “How do we prepare our children for the future?” —  the best thing we can do is to encourage them to pursue a career in science and technology and allow them to travel so they will be comfortable in a multicultural world.

Translating Business Problems into Technology Solutions

One of the most important jobs of IT Consultants/Architects is to translate business problems into technology solutions. Companies today and in the future will need to solve business problems to remain competitive, and exponential advances in information technology will enable them to do so.

But translating business problems into technology solutions is often hard. Most of the time there is a disconnect between business people and technology people. For example, business people speak of vision, strategy, processes, and functional requirements, whereas technology folks speak about programming, infrastructure, big data, and technical requirements. In addition, people who understand the business typically are not well versed in technology, and vice versa – technology folks often do not understand business challenges. The two groups have different perspectives: business folks are concerned about business opportunities, business climate, and business objectives, while technology folks are concerned about technology challenges, technical resources, and technical skills.

To be successful, IT Consultants/Architects should bridge the gap and provide businesses with the services and solutions they need. They need to translate business objectives into actions. To do this, they should be able to identify business problems, determine the requirements for solving them, determine the technology available to help, and architect the best solution. In addition, they should be able to identify strategic partners that will help move the project forward and determine likely barriers.

Most importantly though, IT Consultants/Architects should be able to manage expectations. It’s always better to under promise and over deliver.

Object Storage

A couple of days ago, a business user asked me if our enterprise IT provides object-based storage. I had heard the term object storage before, but I had little knowledge about it. I only knew it was a type of storage that is data aware. I replied, “No, we don’t offer it yet.” But in the back of my mind, I was asking myself: should we be offering object storage to our users? Are we so far behind that we haven’t implemented this cool technology? Is our business losing its competitive advantage because we haven’t been using it?

As I researched the topic, I came to understand what it entails, along with its advantages and disadvantages.

Object storage is one of the hot technologies expected to grow in adoption this year. As defined by Wikipedia, object storage “is a storage architecture that manages data as objects, as opposed to other storage architectures like file systems which manage data as a file hierarchy and block storage which manages data as blocks within sectors and tracks. Each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier.”

Its extended metadata allows for some intelligence about the data. For example, a user or application can tag a data object with what type of file it is, how it should be used, who will use it, what it contains, how long it should live, and so on. That metadata could, in turn, inform a backup application, for instance, that the object is classified or that it should be deleted on a certain date. This makes tasks like automation and management simpler for the administrator.

The globally unique identifier allows a server or end user to retrieve the data without needing to know its physical or hierarchical location. This makes object storage useful for long-term data retention, backup, file sharing, and cloud applications. In fact, Facebook uses object storage when you upload a picture.
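
To make the model concrete, here is a toy in-memory object store that bundles the data, its metadata, and a globally unique identifier, and retrieves objects by ID alone. It only illustrates the concept; it is not how any particular product implements object storage.

    # Toy object store: each object bundles data, rich metadata, and a UUID,
    # and is retrieved by its identifier rather than by a path in a hierarchy.
    import uuid
    from datetime import date

    class ObjectStore:
        def __init__(self):
            self._objects = {}                    # object_id -> (data, metadata)

        def put(self, data, **metadata):
            object_id = str(uuid.uuid4())         # globally unique identifier
            self._objects[object_id] = (data, metadata)
            return object_id

        def get(self, object_id):
            return self._objects[object_id]

    store = ObjectStore()
    photo_id = store.put(
        b"...jpeg bytes...",
        content_type="image/jpeg",
        owner="marketing",
        classification="public",
        delete_after=str(date(2026, 12, 31)),     # retention hint for a backup tool
    )
    data, meta = store.get(photo_id)              # no path or hierarchy needed
    print(photo_id, meta["classification"])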

One drawback of object storage is performance – slower throughput and higher latency due to the amount of metadata. Another drawback is that data consistency is achieved slowly: whenever an object is updated, the change has to be propagated to all of the replicas, and it takes time before the latest version becomes available. With these properties, object storage is well suited for data that doesn’t change much, like backups, archives, video, and audio files. That’s why it’s heavily used by Facebook, Spotify, and other cloud companies: once you upload a picture or music file, it rarely changes and it stays around forever.

Object storage may be one of the hottest technologies in the storage space, but for now, I don’t see compelling use cases in enterprise IT. Object storage is unsuitable for data that changes frequently, and file systems and block storage do just fine for both data that rarely changes and data that changes often. Enterprise backup systems are also versatile for long-term data retention and backups. Object storage may provide more information about the data, but storage administrators’ primary concerns are to deliver data faster and more efficiently, and to protect its integrity.

Object storage’s distributed nature enables IT shops to use low-cost storage, but in reality, in enterprise IT, NAS and SAN remain prevalent because they are reliable and easier to manage.

We need well-defined use cases and compelling advantages for object-based storage to be widely used in enterprise IT.

How to Restore from Replicated Data

When the primary backup server goes down due to hardware error, a site disaster, or other causes, the only way to restore is via the replicated data, assuming the backup server was configured to replicate to a DR (Disaster Recovery) or secondary site.

In Avamar, replicated data is restored from the REPLICATE domain of the target Avamar server. All restores of replicated data are directed restores, because from the point of view of the Avamar target server, the restore destination is a different machine from the original one.

The procedure to restore files and directories is:

  1. Re-register and activate the client server with the Avamar replication target server.
  2. Perform file/directory restore.
    • Select the data that you want to restore from the replicated backups for the clients within the REPLICATE domain.
    • Select Actions > Restore Now.
    • On the Restore Options window, notice that the only destination choice is blank, so a new client must be selected.
    • Click Browse and select a client and destination from among the listed clients. Note that these clients are activated with the target server and are not under the REPLICATE domain.

If the Windows or UNIX/Linux server was part of the disaster, one way to restore the data is to build a new server first, then follow the procedure above to restore files and directories to that server. The other way is to perform a bare metal restore, which Avamar supports on Windows Server 2008 and above.