Category Archives: IT Management

Important Responsibilities of IT Infrastructure Operations

The main function of IT operations is to keep IT services running smoothly and efficiently. It would be nirvana if IT infrastructure services simply worked perfectly throughout their lifespan after they are initially installed and configured. In reality, however, hardware fails, bugs are found, features need to be added, security flaws are discovered, patches need to be applied, usage fluctuates, data needs to be protected, upgrades need to be performed, and demand grows. The following are the most important jobs of IT operations:

Monitoring

Monitoring is the only way to keep track of the health and availability of systems. It can be accomplished by looking at a system’s health via a dashboard or console, or via specialized monitoring appliances. One major component of monitoring is alerting via email or pager when there is a major issue. Monitoring input can also come from incident tickets generated by machines or end users, which may surface problems that machine monitoring misses. System logs can also be used for trending and monitoring, since they can bring to light flaws in the system.
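
As a minimal sketch of the alerting idea, a cron-driven script might probe a service and email the on-call team when the check fails. The URL and recipient below are placeholders, and the sketch assumes curl and a local mail command are available; a real environment would rely on a proper monitoring platform.

#!/bin/bash
# Minimal availability check, run from cron every few minutes.
# The URL and alert recipient are placeholders.
URL="https://app.example.com/health"
ALERT_TO="ops@example.com"

if ! curl -sf --max-time 10 "$URL" > /dev/null; then
    echo "$(date): health check failed for $URL" | mail -s "ALERT: service health check failed" "$ALERT_TO"
fi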

Troubleshooting

Once issues are detected, IT operations should troubleshoot them and restore the service as soon as possible. Issues that are more complicated to fix should be escalated to vendors, higher-level support, or engineers and developers.

Change Control

IT operations should not make any changes (such as configuration changes, hardware replacements, or upgrades) without following the proper change control procedure. More than half of all outages are caused by changes to a system. IT services are often tightly integrated with other systems, and a change to one system can affect the others. Subject matter experts for the various systems should verify that a change will not affect their system. Planning and testing are vital steps in performing changes.

Capacity planning

IT operations should monitor and trend the utilization of resources (compute, storage, network) and allocate resources to ensure there is enough capacity to meet demand. They should be able to forecast needs and allocate resources so that capacity is available when it is needed.
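
A minimal sketch of the trending idea, with a placeholder filesystem and threshold: record utilization daily so it can be graphed or projected later, and warn when usage crosses a line.

#!/bin/bash
# Record daily filesystem utilization so it can be trended,
# and warn when usage crosses a threshold. The filesystem,
# log path, and 80% threshold are placeholders.
FS="/data"
LOG="/var/log/capacity_data.csv"
THRESHOLD=80

USED=$(df -P "$FS" | awk 'NR==2 {print $5}' | tr -d '%')
echo "$(date +%F),$USED" >> "$LOG"

if [ "$USED" -ge "$THRESHOLD" ]; then
    echo "$FS is at ${USED}% - review the capacity trend in $LOG"
fi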

Performance optimization

IT operations should optimize IT services and ensure efficient use of resources. The goal is to provide an excellent user experience for these services. Mechanisms such as redundancy, local load balancing, global load balancing, and caching improve utilization, efficiency, and the end-user experience.

Backups

In addition to keeping IT services running smoothly, IT operations should also protect these systems and their data by backing them up and replicating them to a remote site. The goal is to bring the systems back online in as little time as possible after a catastrophic failure.
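
A minimal sketch of that idea, with placeholder paths and hostnames: copy the data to a local backup area, then replicate it off-site. Real backup tooling adds catalogs, retention, scheduling, and verification on top of this.

#!/bin/bash
# Nightly backup sketch: copy application data locally, then
# replicate it to a remote (DR) site. All paths and hostnames
# are placeholders.
SRC="/srv/appdata/"
LOCAL_BACKUP="/backup/appdata/"
REMOTE="backup1.dr-site.example.com:/backup/appdata/"

rsync -a --delete "$SRC" "$LOCAL_BACKUP"
rsync -a --delete "$LOCAL_BACKUP" "$REMOTE"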

Security

IT operations should also be responsible for securing the systems. Because this is such an enormous task, many companies employ a dedicated Security Operations Center (SOC) to watch for security breaches.

Automation

One of the goals of IT operations is to automate most manual activities via scripting and self-healing mechanisms. This lets the team focus on higher-value tasks instead of getting bogged down in repetitive work.
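
As a small, hedged illustration of a self-healing mechanism on a RHEL 6 style host: if a service is found down, restart it and log the action. The service name is only an example, and a real setup would tie this into the monitoring and ticketing systems.

#!/bin/bash
# Self-healing sketch: restart the web server if it is not running.
# "httpd" is just an example service name.
SERVICE="httpd"

if ! service "$SERVICE" status > /dev/null 2>&1; then
    logger -t self-heal "$SERVICE was down, attempting restart"
    service "$SERVICE" start
fi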

Mitigating Insider Threats

With all the news about security breaches, we often hear about external cyber attacks, but internal attacks go widely unreported. Studies show that between 45% and 60% of all attacks are carried out by insiders. In addition, insider attacks are harder to detect and prevent because the access and activity come from trusted systems and accounts.

Why is this so common, and why is it so hard to mitigate? The following reasons have been cited to explain why there are so many incidents of internal security breaches:

1. Companies don’t employ data protection, don’t apply patches on time, or don’t enforce any security policies/standards (such as using complex passwords). Some companies wrongly assume that installing a firewall can protect them from inside intruders.

2. Data is outside the control of IT security, such as when it resides in the cloud.

3. The greatest source of security breaches is also the weakest link in the security chain: people. There are two types of people in this weak link:

a. People who are vulnerable, such as careless users who copy data to USB drives, send sensitive data through public email services, or sacrifice security in favor of convenience. Most of the time, these users are not even aware that their account has already been compromised via malware, phishing attacks, or stolen credentials gleaned from social networks.

b. People who have their own agenda, or what we call malicious users. These individuals want to steal and sell competitive data or intellectual property for money, or they may have a personal vendetta against the organization.

There are, however, proven measures to lessen the impact of insider threats:

1. Monitor the users, especially those with the greatest potential for damage: top executives, contractors, vendors, at-risk employees, and IT administrators.

2. Learn how they access data, create a baseline, and detect any anomalous behavior (a toy illustration of this baseline-and-detect idea is sketched after this list).

3. When divergent behavior is detected, such as an unauthorized download or server log-in, take action, such as blocking or quarantining the user.
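
A toy sketch of the idea in item 2, assuming ssh logins on a RHEL/CentOS host and a baseline file of previously seen "user address" pairs; the baseline path is a placeholder, and real user-behavior analytics products work from far richer signals than this.

#!/bin/bash
# Flag users logging in over ssh from source addresses not present
# in a previously built baseline. /var/log/secure is the RHEL/CentOS
# ssh log; the baseline file path is a placeholder.
BASELINE="/var/lib/insider-check/baseline.txt"

grep "Accepted" /var/log/secure \
  | awk '{for (i=1; i<=NF; i++) {if ($i=="for") u=$(i+1); if ($i=="from") ip=$(i+1)}; print u, ip}' \
  | sort -u \
  | while read user ip; do
        grep -q "^$user $ip$" "$BASELINE" || echo "ANOMALY: $user logged in from unseen address $ip"
    done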

It should be noted that when an individual is caught compromising security, more often than not the damage has already been done. The challenge is to be proactive so that the breach does not happen in the first place.

An article in Harvard Business Review has argued that psychology is the key to detecting internal cyber threats.

In essence, companies should focus on understanding and anticipating human behavior, for example by analyzing employee language (in email, chat, and text) continuously and in real time. The author contends that “certain negative emotions, stressors, and conflicts have long been associated with incidents of workplace aggression, employee turnover, absenteeism, accidents, fraud, sabotage, and espionage.”

Applying big data analytics and artificial intelligence to employees’ language in email, chat, voice, and text logs and other digital communications may uncover worrisome content, meaning, language patterns, and deviations in behavior, making it easier to spot signs that a user is a security risk or may perform malicious activity in the future.

Translating Business Problems into Technology Solutions

One of the most important jobs of IT Consultants/Architects is to translate business problems into technology solutions. Companies today and in the future will need to solve business problems to remain competitive, and exponential advances in information technology will give them new means to do so.

But translating business problems into technology solutions is often hard. Most of the time there is a disconnect between business people and technology people. Business people speak of vision, strategy, processes, and functional requirements, whereas technology folks speak about programming, infrastructure, big data, and technical requirements. In addition, people who understand the business typically are not fluent in technology, and vice versa: technology folks often do not understand business challenges. The two groups have very different perspectives. Business folks are concerned with business opportunities, business climate, and business objectives, while technology folks are concerned with technology challenges, technical resources, and technical skills.

To be successful, IT Consultants/Architects should bridge this gap and provide businesses the services and solutions they need. They need to translate business objectives into actions. To do this, they should be able to identify the business problems, determine the requirements for solving them, determine the technology available to help, and architect the best solution. In addition, they should be able to identify strategic partners who will help move the project forward and anticipate likely barriers.

Most importantly though, IT Consultants/Architects should be able to manage expectations. It’s always better to under promise and over deliver.

Enterprise File Sync and Share

Due to the increased use of mobile devices (iPhone, iPad, Android, tablets, etc.) in the enterprise, a platform where employees can synchronize files between their various devices is becoming a necessity. In addition, employees need a platform where they can easily share files both inside and outside the organization. Some employees have been using this technology unbeknownst to the IT department; the cloud-based sync-and-share app Dropbox has been especially popular in this area. The issue with these cloud-based sync-and-share apps is that putting sensitive or regulated corporate data in them can pose a serious problem for the company.

Enterprises must have a solution in their own internal data center where the IT department can control, secure, protect, back up, and manage the data. IT vendors have been offering such products over the last several years. Some examples of enterprise file sync and share are EMC Syncplicity, Egnyte Enterprise File Sharing, Citrix ShareFile, and Accellion Kiteworks.

A good enterprise file sync and share application must have the following characteristics:

1. Security. Data must be protected from malware, and it must be encrypted in transit and at rest (a quick command-line spot check of in-transit encryption is sketched after this list). The application must integrate with Active Directory for authentication, and there must be a mechanism to remotely lock and/or wipe devices.
2. Application and data access must be supported by WAN acceleration so users do not perceive slowness.
3. Interoperability with Microsoft Office, SharePoint, and other document management systems.
4. Support for major endpoint devices (Android, Apple, Windows).
5. Ability to house data internally and in the cloud.
6. Finally, the app should be easy to use. Users’ files should be easy to access, edit, share, and restore, or else people will revert to the consumer cloud apps they already find easy to use.
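
As a small illustration of the in-transit encryption requirement in item 1, the certificate of a (hypothetical) sync endpoint can be spot-checked from the command line; sync.example.com is a placeholder, and this is only a sanity check, not a substitute for a proper security review.

$ openssl s_client -connect sync.example.com:443 -servername sync.example.com \
      < /dev/null 2> /dev/null | openssl x509 -noout -subject -issuer -dates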

The Battle Between External Cloud Providers and Internal IT Departments

Nowadays, when business units need computing resources for a new software application, they have a choice between using an external provider and using the company’s internal IT department. Gone are the days when they relied solely on the IT department for compute and storage resources. Business units are now empowered by the growing reliability and ubiquity of external cloud providers such as Amazon Web Services (AWS).

Services provided by external providers are generally easy to use and fast to provision. As long as you have a credit card, a Windows or Linux server can be running within a few hours, if not minutes. Compare that to internal IT departments, which usually take days, if not weeks, to spin one up. Large companies in particular often have a bureaucratic procedure that takes weeks to complete.
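
To make the speed difference concrete, a single AWS CLI command is enough to request a small Linux server, assuming an account and credentials are already configured; the AMI, key pair, and security group IDs below are placeholders.

$ aws ec2 run-instances \
      --image-id ami-12345678 \
      --instance-type t2.micro \
      --key-name my-keypair \
      --security-group-ids sg-12345678 \
      --count 1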

Because of this, business units that are under pressure to deliver an application or service to end users end up using external providers. This is the fast-growing “shadow IT.” More often than not, IT departments do not know about it until they are called in to troubleshoot issues, such as a slow network connection, or to restore data after a security breach or data loss.

Using external providers can be good for the company. They have merits such as fast provisioning and the ability to scale up quickly, but they also have limitations. Security, vendor lock-in, and integration with on-premise applications and databases are some of the concerns. Some business units do not understand the implications for the company’s network, which may impact users during normal business hours. Some do not consider backup and disaster recovery. For regulated companies, compliance and data protection are important: they should be able to tell auditors where the data resides and where it is replicated. Also, as the use of compute and storage scales up, it gets more costly.

External cloud providers are here to stay, and their innovation and services will only get better. The future as I foresee it is a hybrid model: a combination of external providers and internal IT. The key for companies is to provide guidelines and policies on when to use an external provider versus internal IT. For instance, a proof-of-concept application may be well suited to an external cloud because it is fast to provision. An application used by only a few users that needs no integration with existing applications is another good candidate. An application that integrates with the company’s internal SAP system, on the other hand, is better suited to the internal cloud. These policies must be well communicated to business units.

IT departments, for their part, must provide a good level of service to the business, streamline the provisioning process, adopt technologies that can respond to the business quickly, and offer internal cloud services that match those of external providers. That way, business units will have a compelling reason to use internal IT rather than external providers.

Data-centric Security

Data is one of the most important assets of an organization; hence, it must be secured and protected. Data typically flows in and out of an organization’s internal network in the course of doing business and valuable work. These days, data resides in the cloud and travels to employees’ mobile devices and to business partners’ networks. Laptops and USB drives containing sensitive information sometimes get lost or stolen.

To protect the data, security must travel with the data. For a long time the focus of security has been on the network and on the devices where the data resides. Infrastructure security such as firewalls and intrusion prevention systems is no longer enough. The focus should now shift to protecting the data itself.

Data-centric security is very useful in dealing with data breaches, especially with data containing sensitive information such as personally identifiable information, financial information and credit card numbers, health information and intellectual property data.

The key to data-centric security is strong encryption: if the public or hackers get hold of sensitive data, it shows up as garbled information that is useless to them. To implement robust data-centric security, the following should be considered (a minimal at-rest encryption sketch follows the list):

1. Strong at-rest encryption on the server/storage side, in applications, and in databases.
2. Strong in-transit encryption using public key infrastructure (PKI).
3. Effective management of encryption keys.
4. Centralized control of security policy that enforces standards and protection for data stored on endpoint devices and on central servers and storage.
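
A minimal sketch of item 1 using OpenSSL, with placeholder filenames; in a real deployment the key would live in a key management system rather than a local file, which is what item 3 addresses.

$ openssl rand -out secret.key 32                 # generate a 256-bit key
$ openssl enc -aes-256-cbc -salt -in customers.csv -out customers.csv.enc -pass file:secret.key
$ openssl enc -d -aes-256-cbc -in customers.csv.enc -out customers.decrypted.csv -pass file:secret.key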

Enterprise Search Using Google Search Appliance

One of the pain points for companies these days is how difficult it is to find relevant information inside their corporate network. I often hear people complain that it is easier to find information on the Internet using Google or Bing than inside the enterprise.

Well, Google has been selling its Google Search Appliance (GSA) for many years. The GSA brings Google’s superior search technology to a corporate network, with the familiar look and feel that people are accustomed to when searching the Internet.

The GSA can index and serve content located on internal websites, documents located on file servers, and Microsoft SharePoint repositories.

I recently replaced an old GSA and quickly remembered how easy and fast it is to deploy. The GSA hardware is a souped-up Dell server in a bright yellow casing. Racking the hardware is a snap, and it comes with instructions on where to plug in the network interfaces. The initial setup is done via a back-to-back network connection to a laptop, where network settings such as the IP address, netmask, gateway, time server, and mail server are configured.

Once the GSA is accessible on the network, the only other thing to do is configure the initial crawl of the web servers and/or file systems; the crawl may take a couple of hours. Once the documents are indexed, the appliance is ready to answer user search requests.

The search appliance has many advanced features and can be customized to your needs. For instance, you can customize the behavior and appearance of the search page, turn the auto-completion feature on or off, and configure security settings so that content is available only to properly authenticated users, among many other options.

Internal search engines such as the Google Search Appliance will increase the productivity of corporate employees by helping them save time looking for information.

Installing High Performance Computing Cluster

A high performance computing (HPC) cluster is often needed to analyze data from scientific instruments. For instance, I recently set up an HPC cluster running Red Hat Enterprise Linux 6.5, consisting of several nodes, which will be used to analyze data generated by a gene sequencer.

Basically, to build the cluster you need several machines with fast, multi-core processors, lots of memory, a high speed network to connect the nodes, and large, fast data storage. You also need to install an operating system, such as Red Hat or CentOS Linux, and configure tools and utilities such as kickstart, ssh, NFS, and NIS. Finally, cluster or queueing software is needed to manage jobs and fully utilize the compute resources. One of the commonly used open source options is Son of Grid Engine (SGE), an offshoot of the popular Sun Grid Engine.
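
As a hedged sketch of two of those plumbing steps, passwordless ssh between nodes and an NFS share exported from the head node might be set up roughly as follows (node names and paths are placeholders):

# On the head node: create an ssh key and push it to each compute node.
$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
$ for node in node01 node02 node03; do ssh-copy-id "$node"; done

# Export a shared directory (e.g., for home directories or the SGE root)
# via NFS, then mount it on each compute node.
$ echo "/export/shared  node*(rw,no_root_squash)" >> /etc/exports
$ exportfs -ra
$ ssh node01 "mount -t nfs headnode:/export/shared /export/shared"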

An excellent write up for setting up an HPC cluster can be found at this Admin article.

The latest Son of Grid Engine version (as of this writing) is 8.1.7 and can be downloaded from the Son of Grid Engine Project Site.

Since the environment I set up runs Red Hat Enterprise Linux 6.5, I downloaded and installed the following RPMs:

gridengine-8.1.7-1.el6.x86_64.rpm
gridengine-execd-8.1.7-1.el6.x86_64.rpm
gridengine-qmaster-8.1.7-1.el6.x86_64.rpm
gridengine-qmon-8.1.7-1.el6.x86_64.rpm

After installing the RPMs, I installed and configured the qmaster, then installed the execution daemon (execd) on all the nodes. I also ran a simple test to verify that the cluster was working by issuing the following commands:

$ qsub /opt/sge/examples/jobs/simple.sh    # submit the bundled example job
$ qstat                                    # confirm the job is queued or running
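
Beyond the bundled example, a typical submission wraps the real work in a small job script with scheduler directives. The script below is a hypothetical sketch; the parallel environment name (“smp”), the job name, and the log directory all depend on the site’s configuration.

#!/bin/bash
#$ -N align_sample01           # job name
#$ -cwd                        # run from the submission directory
#$ -pe smp 8                   # request 8 slots in the "smp" parallel environment
#$ -o logs/$JOB_NAME.out       # stdout log
#$ -e logs/$JOB_NAME.err       # stderr log

# Placeholder for the real analysis command, e.g. an aligner run.
echo "running analysis on $(hostname) with $NSLOTS slots"

Such a script would be submitted with qsub, monitored with qstat, and its resource usage reviewed after completion with qacct -j <job_id>.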

MIT Sloan CIO Symposium

I recently attended the MIT Sloan CIO Symposium held in Cambridge, MA on May 21st, 2014. The event was well attended by Chief Information Officers (CIOs), VPs, business executives, academics, entrepreneurs, and professionals from companies all over the world. The speaker lineup was superb, the symposium was content-rich, and the #mitcio Twitter hashtag was trending during the event.

I enjoyed the symposium because of the combination of academic and business perspectives from a diverse set of speakers. Hot topics such as security, big data, robotics and self-driving cars and their implications for society, and the evolving role of CIOs dominated the conversation.

The key takeaways for me are the following:

1. The future role of CIOs and IT professionals in general will be that of service brokers. They will increasingly serve as in-house IT service providers and as brokers between business managers and external cloud service providers.

2. On the issue of “build vs buy, and when does it make sense to build your own system,” the answer is: when it is a source of competitive advantage, or when what you build will differentiate your business from everyone else’s.

3. CIOs increasingly have to work closely with the business to deliver on technology promises rather than focusing on the technology alone. They should have a seat at the executive table, stay in front of their organizations, and talk to boards regularly. They should communicate the risks of IT investments and demonstrate their benefit to the business.

4. To maximize and communicate the business value of IT, use the following sentence when explaining the benefits of IT to business: “We are going to do ___, to make ___ better, as measured by ___, and it is worth ____.” Also, consider “you” and “your” as the most important words when translating the business value of IT.

5. In terms of the future of technology, everything is becoming data-fied. Erik Brynjolfsson, co-author of the book “The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies,” said that “We are at the cusp of a 10 year period where we go from machines not really understanding us to being able to.” Hence the tremendous advances we are seeing in robotics and self-driving cars. With all this technological progress, we also have to think about how our culture, laws, ethics, and economics will be affected. For instance, how will employment be affected by robots that can do most repetitive tasks? The advice from the panel: “creative lifelong learners will always be in demand.”

Redefining Data Center In A Box

Data center in a box is traditionally defined as a “type of data center in which portable, mobile, and modular information nodes are self-contained within a cargo container. It is designed and packaged for quick deployment and acquisition of data center solutions in organizations or facilities, including remote off-site locations.” A data center in a box usually contains equipment from large storage, compute, and network vendors such as EMC, NetApp, Dell, and Cisco, pieced together to form the IT infrastructure. VCE (the Virtual Computing Environment company), for instance, offers Vblock, a bundled product containing EMC storage, Cisco servers, and VMware. NetApp has a similar offering called FlexPod.

But innovative new companies such as SimpliVity, Nutanix, and Scale Computing are changing the definition of data center in a box. They are building purpose-built products from the ground up that incorporate not just compute, storage, and network, but additional services such as data deduplication, WAN optimization, and backup, all in one box.

For instance, SimpliVity’s product, OmniCube, is described as “a powerful data center building block that assimilates the core functions of server, storage and networking in addition to a wide range of advanced functionality including: native VM-level backup, WAN optimization, bandwidth efficient replication for DR, cache accelerated performance, and cloud integration.”

These products further simplify the design, implementation, and operation of IT infrastructure. With these boxes, there is no storage area network (SAN) to manage and no additional appliances, such as WAN accelerators, to deploy. A few virtual machine (VM) administrators can manage all the boxes in a cluster from the VMware server virtualization management user interface.

Data center in a box will continue to evolve and will change how we view and manage IT infrastructure for years to come.