Category Archives: IT Strategy

The Five Stages of Crisis Management: COVID-19 in the US

I recently attended the virtual ISC2 Security Congress 2020. One of the keynote addresses, on crisis management, was given by Harvard Kennedy School Professor Juliette Kayyem, a former Assistant Secretary at the Department of Homeland Security.

Crisis management is central to cybersecurity. When there is a breach or security incident, crisis management is invoked to minimize damage. A well-executed crisis management program leads to a successful resolution in a short period of time.

I’d like to share this chart presented by Ms. Kayyem on the five stages of how COVID-19 was managed in the US, which mirror the five stages of cybersecurity crisis management: Protection > Prevention > Response > Recovery > Resiliency.

The keynote address can be viewed here: https://securitycongress.brighttalk.live/keynote-november-18/

Building and Operating a Private Cloud

The future of computing is both public and private cloud. It may seem that public cloud – dominated by AWS, Microsoft Azure and Google – is now the norm, but companies will continue to run workloads that are best suited to be on premises. There are three main reasons for this. First, with large volumes of data generated on premises, storage and compute need to be close to the data, since latency and distance are major issues. Second, studies show that the consumption-based services offered by public clouds can become quite expensive beyond a certain usage threshold, which is why some companies are moving storage and compute back on premises to save money. Third, compliance and data sovereignty requirements imposed by organizations and governments are easier to control on premises.
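The "certain threshold" in the second point is a break-even point. The sketch below makes it concrete. It compares a pure pay-per-use rate with an on-premises cost made of a fixed monthly amount plus a smaller variable rate. All prices are hypothetical placeholders, not quotes from any provider.

```python
def monthly_cost_cloud(units: float, rate_per_unit: float) -> float:
    """Pure consumption pricing: pay only for what you use."""
    return units * rate_per_unit


def monthly_cost_onprem(units: float, fixed: float, rate_per_unit: float) -> float:
    """Amortized hardware and facilities (fixed) plus a small variable cost (power, etc.)."""
    return fixed + units * rate_per_unit


def break_even_units(cloud_rate: float, onprem_fixed: float, onprem_rate: float) -> float:
    """Usage level above which on-premises becomes cheaper.

    Solves cloud_rate * u == onprem_fixed + onprem_rate * u for u.
    """
    if cloud_rate <= onprem_rate:
        raise ValueError("cloud never costs more per unit; no break-even point")
    return onprem_fixed / (cloud_rate - onprem_rate)


# Hypothetical: $0.10/unit in the cloud vs. $20,000/month fixed + $0.02/unit on premises.
u = break_even_units(cloud_rate=0.10, onprem_fixed=20_000, onprem_rate=0.02)
print(f"break-even at {u:,.0f} units/month")
```

Below the break-even point the cloud is cheaper. Above it, every additional unit widens the gap in favor of on premises.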

The challenge for most companies is how to build and operate an on-premises private cloud. Most companies have traditional data centers that are no longer suited to delivering and maintaining reliable compute, storage, and network services for rapidly changing business requirements. They should build and operate a private cloud that has the same characteristics as a public cloud, including on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service (read my blog on the Characteristics of a True Private Cloud).

Fortunately, companies such as VMware now offer a single integrated solution that enables companies to build and operate a private cloud. For instance, VMware Cloud Foundation, running on underlying hyperconverged hardware (e.g., Dell EMC VxRail), provides software-defined services for compute, storage, networking, security, and cloud management to run traditional or containerized enterprise applications. It extends the VMware vSphere server virtualization platform with integrated software-defined storage (vSAN), networking (NSX), cloud management (vRealize Suite), and security capabilities that can be consumed flexibly on premises. In addition, VMware Cloud Foundation delivers IT automation based on blueprints (templates) that embed both automation and policy. When a blueprint is executed, it automatically orchestrates the provisioning and lifecycle of all the components it defines.
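VMware's actual blueprint format is not reproduced here. The sketch below is a hypothetical, simplified model of the idea: a blueprint lists components, the components each one depends on, and a policy, and an orchestrator provisions them in dependency order. The component names, policies, and the `provision` placeholder are all invented for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical blueprint: each component declares its dependencies and a policy.
BLUEPRINT = {
    "network": {"depends_on": [], "policy": {"segment": "app-tier"}},
    "storage": {"depends_on": [], "policy": {"tier": "gold"}},
    "db_vm":   {"depends_on": ["network", "storage"], "policy": {"cpu": 4, "ram_gb": 16}},
    "web_vm":  {"depends_on": ["network", "db_vm"], "policy": {"cpu": 2, "ram_gb": 8}},
}


def provision(name: str, policy: dict) -> str:
    """Placeholder for the real provisioning call (API request, script, etc.)."""
    return f"provisioned {name} with {policy}"


def provision_order(blueprint: dict) -> list[str]:
    """Order components so that every dependency is provisioned first."""
    graph = {name: spec["depends_on"] for name, spec in blueprint.items()}
    return list(TopologicalSorter(graph).static_order())


def execute_blueprint(blueprint: dict) -> list[str]:
    return [provision(name, blueprint[name]["policy"]) for name in provision_order(blueprint)]


for line in execute_blueprint(BLUEPRINT):
    print(line)
```

Because the policy travels with each component, the orchestrator applies the same settings every time the blueprint runs.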

Although some features are still not on par with the public cloud (such as the ease of self-service provisioning), private clouds will continue to improve as more technologies are integrated into or built on top of this foundation.

Using Artificial Intelligence in Cyber Security Applications

Artificial Intelligence (AI) and Machine Learning (ML) play critical roles in cyber security.  More and more cyber security applications are utilizing AI and ML to enhance their effectiveness.  The following are some of the applications that are taking advantage of ML algorithms.

Phishing Prevention. Phishing is a fraudulent attempt to obtain sensitive data by disguising oneself as a trustworthy entity. Detecting phishing is a classification problem: the training data fed into the ML system must contain both phishing and legitimate website classes. Using a learning algorithm, the system can then classify URLs it has never seen before.
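Below is a minimal sketch of that classification idea. It makes two assumptions: a handful of hand-picked URL features, and a tiny made-up training set. A real system would use thousands of labeled URLs and richer features. The sketch fits a logistic regression with plain stochastic gradient descent and then scores URLs that are not in the training set.

```python
import math
import re
from urllib.parse import urlparse


def features(url: str) -> list[float]:
    """Hand-picked lexical features (illustrative only, not a vetted feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return [
        1.0,                                               # bias term
        len(url) / 100.0,                                  # long URLs are more suspicious
        float("@" in url),                                 # '@' can hide the real host
        float(re.fullmatch(r"[\d.]+", host) is not None),  # raw IP address instead of a domain
        host.count(".") / 5.0,                             # many subdomains
        float(host.count("-") > 1),                        # hyphen-stuffed host names
        float(parsed.scheme != "https"),                   # no TLS
    ]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(samples: list[tuple[str, int]], epochs: int = 500, lr: float = 0.5) -> list[float]:
    """Logistic regression fitted with stochastic gradient descent (label 1 = phishing)."""
    w = [0.0] * len(features(samples[0][0]))
    for _ in range(epochs):
        for url, label in samples:
            x = features(url)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (label - p) * xi for wi, xi in zip(w, x)]
    return w


def phishing_probability(w: list[float], url: str) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(url))))


# Tiny made-up training set; real systems use thousands of labeled URLs.
TRAINING = [
    ("https://www.example.com/login", 0),
    ("https://mail.example.org/inbox", 0),
    ("https://docs.python.org/3/library/", 0),
    ("https://github.com/settings", 0),
    ("http://192.168.10.5/paypal/verify", 1),
    ("http://secure-login-update.example-bank.verify.com/account", 1),
    ("http://example.com@203.0.113.9/signin", 1),
    ("http://free-gift-card-claim-now.info/win", 1),
]

weights = train(TRAINING)
for url in ["http://198.51.100.7/bank/login", "https://www.wikipedia.org/"]:
    print(f"{phishing_probability(weights, url):.2f}  {url}")
```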

Botnet Detection. A botnet is an organized, automated army of compromised machines (zombies) that can be used for DDoS attacks, sending spam, or spreading viruses. Machine learning is now being used to detect and recognize botnets in order to prevent attacks.

User Authentication. Authentication verifies the identity of a user, process, or device so that only legitimate users can access resources and services. Machine learning is now being used for adaptive authentication, which learns each user’s behavior.

Incident Forecasting. Predicting an incident before it occurs can save a company embarrassment and money. Machine learning algorithms fed with incident reports and external data can now help predict hacking incidents before they occur.

Cyber Ratings. Cyber ratings are used to assess the effectiveness of an organization’s cyber security infrastructure. Machine learning calculates these ratings by aggregating a multitude of security data gathered from the web.

Spam Filtering. Unwanted emails clogging users’ inboxes must be filtered out with dependable, robust antispam features. Machine learning methods are now among the most effective ways of detecting and filtering spam.
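One classic machine learning method for spam filtering is multinomial naive Bayes. The sketch below trains on a made-up four-message corpus and uses Laplace (add-one) smoothing. Production filters train on large corpora and combine many more signals.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, words: list[str], label: str) -> float:
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total = sum(self.word_counts[label].values())
        # log P(label) + sum of log P(word | label); logs avoid floating-point underflow
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for w in words:
            score += math.log((self.word_counts[label][w] + 1) / (total + vocab_size))
        return score

    def classify(self, text: str) -> str:
        words = tokenize(text)
        return max(("spam", "ham"), key=lambda label: self._log_score(words, label))


nb = NaiveBayesSpamFilter()
nb.train("Win a free prize now, claim your free gift", "spam")
nb.train("Cheap meds, limited offer, buy now", "spam")
nb.train("Meeting moved to 3pm, see agenda attached", "ham")
nb.train("Can you review the quarterly report before Friday", "ham")

print(nb.classify("Claim your free prize now"))                    # spam
print(nb.classify("Please review the agenda before the meeting"))  # ham
```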

Malware Detection. Malware is becoming more complex and spreading more quickly, so detecting it with signatures alone is no longer sufficient. Machine learning techniques are now being used for malware detection because of their ability to keep pace with malware evolution.

Intrusion Detection. Intrusion detection identifies unusual access to, or attacks on, internal networks. Machine learning techniques such as pattern classification, self-organizing maps, and fuzzy logic are being used to detect intrusions.

User Behavior Monitoring. User behavior monitoring is an approach to preventing and detecting insider threats. Machine learning techniques can help create a behavioral profile for each employee and raise an early warning when an insider threat is observed.
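A full behavioral model is beyond the scope of this post. The core idea, learning a per-user baseline and alerting on large deviations, can be sketched with a simple z-score. This is a statistical stand-in for the ML techniques mentioned above, not one of them. The metric (files downloaded per day), the 3-standard-deviation threshold, and the sample history are all hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], today: float, threshold: float = 3.0) -> bool:
    """Flag `today` if it is more than `threshold` standard deviations above the user's baseline."""
    if len(history) < 2:
        return False  # not enough data to build a baseline yet
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today > mu
    return (today - mu) / sigma > threshold


# Hypothetical: files one employee downloaded per day over the last two weeks.
baseline = [12, 15, 9, 14, 11, 13, 10, 12, 16, 11, 14, 13, 12, 10]
print(is_anomalous(baseline, 14))   # False: within the normal range
print(is_anomalous(baseline, 240))  # True: possible data exfiltration
```

Real user-behavior analytics tools model many signals at once, such as login times, locations, and resources accessed. The principle of comparing against the user's own history is the same.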

Maintaining High Level of Information Security During the COVID-19 Pandemic

As more people are forced to work from home during this pandemic, it is important to maintain a high level of security to safeguard the company’s information assets as well as its employees. Endpoints such as laptops that are not connected to the corporate network are more vulnerable when used at home. Stressed-out employees are more prone to social-engineering attacks, and they may visit sites that are usually blocked by the corporate firewall. Not surprisingly, bad actors are taking advantage of this opportunity.

To mitigate these risks, the company’s security office should work with the IT department in implementing the following security measures:

  1. Enhance user security awareness with creative ways of getting users to pay attention to the message, such as short videos instead of just emails. Emphasize COVID-19-themed scams, phishing emails, and fraudulent websites.
  2. Identify and monitor high-risk user groups. Some users, such as those working with personally identifiable information (PII) or other confidential data, pose more risk than others, and their activity should be closely monitored. 
  3. Make sure all laptops have the latest security patches.  Critical servers that are accessed remotely should also have the latest security patches.
  4. Critical servers should only be accessed via a virtual private network (VPN).
  5. Users connecting to the corporate network via VPN should use multi-factor authentication (MFA). Corporate applications in the cloud should also use MFA.
  6. If your Virtual Desktop Infrastructure (VDI) can handle the load, users should use virtual desktops in accessing corporate applications.
  7. To support the massive number of users working remotely, IT should add capacity to network bandwidth, VDI, VPN, and MFA services.
  8. Validate and adjust incident-response (IR) and business-continuity (BC)/disaster-recovery (DR) plans.
  9. Expand monitoring of data access and endpoints, since the usual detection mechanisms, such as IDS/IPS and proxies, do not protect users working from home.
  10. Clarify incident-response protocols. When a breach occurs, security teams must know how to report and take action on it.

Source: https://www.mckinsey.com/business-functions/risk/our-insights/cybersecurity-tactics-for-the-coronavirus-pandemic

Selecting the Right HCI Solution

The popularity of hyperconverged infrastructure (HCI) systems is fueled not only by better, faster, and cheaper CPUs and flash storage, but also by better orchestration of compute and storage resources, horizontal scaling, and the elasticity to adjust to changing workloads.

Hyperconverged infrastructures are scale-out systems whose nodes are added and aggregated into a single platform. Each node performs compute, storage, and networking functions and runs virtualization software. HCI enables the software-defined data center.

But what are the considerations in buying the right solution for your use case? Here are some guidelines:

1. Closely inspect how the system implements reliability and resiliency. How does it protect system configuration and data? Implementations include replication, erasure coding, distribution of state information across multiple nodes to enable automatic failover, etc.

2. Does it have self-healing capabilities?

3. Can it perform non-disruptive upgrades?

4. Does it support VMware vSphere, as well as Microsoft Hyper-V and open source hypervisors like KVM?

5. Does the storage support auto-tiering?

6. Since migrations affect virtual machine performance, how does the system maintain data locality as virtual machines move from one host to another?

7. What are the network configuration options? How is the network managed? Does it have self-optimizing network capabilities?

8. How is performance affected when backups and restores are performed?

9. What is the performance impact if the system is deployed across multiple geographical regions?

10. What are the data protection and recovery capabilities, such as snapshots and replication of workloads locally, to remote data centers and in the cloud?

11. Does it deduplicate the data, which minimizes the amount of data stored?

12. Does it have the option to extend to public clouds?

13. What are its management capabilities? Does it provide a single intuitive console for managing the HCI, or does it include a plug-in for a hypervisor management tool such as vCenter to perform management tasks?

14. Does it have APIs that third-party tools and custom scripts can use to automate tasks?

15. Does it have a monitoring, alerting, and reporting system that analyzes performance and errors and supports capacity planning?

Finally, look at the vendor itself: its future in the HCI space, its product roadmap, support policies, and cost model (lease, outright purchase, pay as you go, etc.).
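The choice in item 1 between replication and erasure coding has a direct capacity cost. The arithmetic sketch below assumes 3-way replication versus a 4+2 erasure-coding layout, on a hypothetical 100 TB of raw capacity. Both are common configurations, but check what your vendor actually uses.

```python
def usable_tb_replication(raw_tb: float, copies: int) -> float:
    """Each block is stored `copies` times; tolerates copies - 1 failures."""
    return raw_tb / copies


def usable_tb_erasure(raw_tb: float, data: int, parity: int) -> float:
    """Each stripe has `data` data fragments plus `parity` parity fragments; tolerates `parity` failures."""
    return raw_tb * data / (data + parity)


raw = 100.0
print(f"3-way replication: {usable_tb_replication(raw, 3):.1f} TB usable, survives 2 failures")
print(f"EC 4+2:            {usable_tb_erasure(raw, 4, 2):.1f} TB usable, survives 2 failures")
```

Both layouts tolerate two failures, but erasure coding yields roughly twice the usable capacity. The trade-off is that erasure coding generally spends more CPU and network on writes and rebuilds. This is why the performance questions in the list above are worth asking.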

Optimizing AWS Cost

One of the advantages of using the cloud is cost savings, since you pay only for what you use. However, many companies still waste cloud resources and end up paying for services they don’t use. Many people are stuck in old ways of running IT infrastructure, such as overprovisioning and keeping servers on 24×7 even when they are idle most of the time.

There are several ways you can optimize AWS in order to save money.

1. Right sizing

With AWS you can right-size your services to meet exactly the capacity you need without overprovisioning. On the compute side, select the EC2 instance type appropriate for the application, and provision only as many instances as you need. When demand for compute increases, you can scale up (larger instances) or scale out (more instances). For instance, during periods of low demand, run only a couple of EC2 instances, and during periods of high demand, automatically provision additional EC2 instances to meet the load.
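The sketch below roughly illustrates why scaling with demand costs less than provisioning for the peak around the clock. The hourly demand curve, the per-instance capacity, the minimum of two instances, and the hourly price are all hypothetical. Only the instance-hour arithmetic carries over.

```python
import math

PRICE_PER_INSTANCE_HOUR = 0.10  # hypothetical on-demand price (USD)
REQS_PER_INSTANCE = 500         # hypothetical requests/sec one instance can serve

# Hypothetical demand (requests/sec) for each hour of one day: quiet nights, busy afternoons.
hourly_demand = [200] * 8 + [1500] * 4 + [3000] * 6 + [1000] * 6


def instances_needed(demand: float, min_instances: int = 2) -> int:
    """Enough instances to serve the load, never fewer than a small floor for availability."""
    return max(min_instances, math.ceil(demand / REQS_PER_INSTANCE))


fixed = instances_needed(max(hourly_demand)) * len(hourly_demand)  # sized for the peak, 24x7
scaled = sum(instances_needed(d) for d in hourly_demand)           # scaled hour by hour

print(f"peak-provisioned: {fixed} instance-hours, ${fixed * PRICE_PER_INSTANCE_HOUR:.2f}/day")
print(f"auto-scaled:      {scaled} instance-hours, ${scaled * PRICE_PER_INSTANCE_HOUR:.2f}/day")
```

With these made-up numbers, scaling hour by hour needs 76 instance-hours instead of 144, roughly half the compute bill.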

On the storage side, AWS offers multiple storage tiers. For instance, you can store frequently accessed objects in the S3 Standard storage class, less frequently accessed objects in S3 Standard-Infrequent Access (S3 Standard-IA), and archive data in Glacier. Finally, delete data that you don’t need.
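An S3 lifecycle configuration can automate moving objects between tiers and deleting stale data. The sketch below builds one as the Python dict that boto3's `put_bucket_lifecycle_configuration` accepts. The `logs/` prefix, the 30/90/365-day thresholds, and the bucket name are hypothetical choices.

```python
import json


def build_lifecycle_rules(prefix: str, ia_days: int = 30, glacier_days: int = 90,
                          expire_days: int = 365) -> dict:
    """S3 Standard -> Standard-IA -> Glacier -> delete, for objects under `prefix`."""
    if ia_days < 30:
        # S3 requires objects to stay in S3 Standard at least 30 days before moving to Standard-IA.
        raise ValueError("STANDARD_IA transitions need at least 30 days")
    if not ia_days < glacier_days < expire_days:
        raise ValueError("thresholds must increase: IA < Glacier < expiration")
    return {
        "Rules": [
            {
                "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_days},
            }
        ]
    }


config = build_lifecycle_rules("logs/")
print(json.dumps(config, indent=2))
# To apply it (requires boto3 and AWS credentials; the bucket name is hypothetical):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-example-bucket", LifecycleConfiguration=config)
```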

2. Reserve capacity

If you know that you will be using AWS for a long period of time, you can commit to reserved capacity and save a lot of money compared with equivalent on-demand capacity.

Reserved Instances are available with three payment options: All Upfront (AURI), Partial Upfront (PURI), or No Upfront (NURI). The larger the upfront payment, the greater the discount. To maximize your savings, you can pay all up front and receive the largest discount. Partial Upfront RIs offer lower discounts but let you spend less up front. Lastly, you can pay nothing up front and receive a smaller discount, which frees up capital to spend on other projects.
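Computing the effective cost of each payment option over the term makes them easy to compare. The prices below are invented for illustration; real rates vary by instance type, Region, and term length. Only the arithmetic carries over.

```python
HOURS_PER_YEAR = 8760

# Hypothetical 1-year prices for one instance: (upfront USD, hourly USD).
OPTIONS = {
    "On-Demand":       (0.0,   0.100),
    "No Upfront":      (0.0,   0.070),
    "Partial Upfront": (300.0, 0.034),
    "All Upfront":     (580.0, 0.000),
}


def total_cost(upfront: float, hourly: float, hours: int = HOURS_PER_YEAR) -> float:
    """Effective cost over the term: upfront payment plus hourly charges."""
    return upfront + hourly * hours


on_demand = total_cost(*OPTIONS["On-Demand"])
for name, (upfront, hourly) in OPTIONS.items():
    cost = total_cost(upfront, hourly)
    print(f"{name:16} ${cost:8.2f}/yr  saves {1 - cost / on_demand:5.1%}")
```

Note that the All Upfront savings materialize only if the instance actually runs for the whole term.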

3. Use spot market

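If you have applications that are not time-sensitive, such as non-critical batch workloads, you may be able to save a lot of money by using Amazon EC2 Spot Instances, which run on spare Amazon EC2 computing capacity. Spot originally worked like an auction in which you bid on that spare capacity. Since late 2017, AWS has instead set Spot prices that adjust gradually based on long-term supply and demand. In exchange for the discount, AWS can reclaim a Spot Instance with a two-minute interruption notice.

Since Spot Instances are often available at a steep discount compared with On-Demand pricing, you can significantly reduce the cost of running your applications.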
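AWS can reclaim a Spot Instance with only a two-minute warning, so batch jobs that run on Spot should be able to stop and resume. The checkpointing sketch below makes two assumptions: the job is a list of independent work items, and the checkpoint path points at storage that outlives the instance (in practice something like EFS or a file synced to S3). The `process` callable is a placeholder for the real work.

```python
import json
import os
from typing import Callable


def run_batch(items: list[str], checkpoint_path: str, process: Callable[[str], None]) -> list[str]:
    """Process items in order, recording progress so a restarted job skips finished work."""
    done: list[str] = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)["done"]
    finished = set(done)
    for item in items:
        if item in finished:
            continue  # completed before the previous interruption
        process(item)
        done.append(item)
        finished.add(item)
        # Write to a temp file, then rename over the old checkpoint, so an
        # interruption never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"done": done}, f)
        os.replace(tmp_path, checkpoint_path)
    return done
```

Suppose the instance is reclaimed mid-run. Running the same job on a new Spot Instance against the same checkpoint file then resumes at the first unfinished item. An item that was interrupted partway through is processed again, so `process` should be idempotent.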

4. Cleanup unused services

One of the best ways to save money is to turn off unused and idle resources. These include EC2 instances with no network or CPU activity for the past few days, load balancers with no traffic, unattached block storage (EBS) volumes, piles of old snapshots, and unassociated Elastic IPs. For instance, one company analyzed its usage patterns and found that it could power off a number of EC2 instances during certain periods, thereby minimizing its costs.
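Finding unattached EBS volumes and unassociated Elastic IPs can be scripted. The functions below are a sketch that filters data shaped like the responses of boto3's `describe_volumes` and `describe_addresses`. In those responses, a volume's `State` is `available` when it is not attached, and an Elastic IP has no `AssociationId` when it is not associated. The sample data is made up so the code runs without AWS credentials.

```python
def unattached_volumes(volumes: list[dict]) -> list[str]:
    """EBS volumes in the 'available' state are not attached to any instance but are still billed."""
    return [v["VolumeId"] for v in volumes if v.get("State") == "available"]


def unassociated_eips(addresses: list[dict]) -> list[str]:
    """Elastic IPs without an AssociationId are not attached to an instance or network interface."""
    return [a["PublicIp"] for a in addresses if "AssociationId" not in a]


# Made-up sample data shaped like boto3 responses. With credentials you would use e.g.
#   ec2 = boto3.client("ec2")
#   volumes = ec2.describe_volumes()["Volumes"]
#   addresses = ec2.describe_addresses()["Addresses"]
volumes = [
    {"VolumeId": "vol-0aaa", "State": "in-use"},
    {"VolumeId": "vol-0bbb", "State": "available"},
]
addresses = [
    {"PublicIp": "203.0.113.10", "AllocationId": "eipalloc-1", "AssociationId": "eipassoc-1"},
    {"PublicIp": "203.0.113.11", "AllocationId": "eipalloc-2"},
]

print(unattached_volumes(volumes))   # ['vol-0bbb']
print(unassociated_eips(addresses))  # ['203.0.113.11']
```

In larger accounts, `describe_volumes` results can span multiple pages, so a real script would iterate with a paginator.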

One thing you should do on a regular basis is monitor and analyze your usage. AWS provides several tools to track your costs, such as Amazon CloudWatch (which collects and tracks metrics, monitors log files, and sets alarms), AWS Trusted Advisor (which looks for opportunities to save money, such as idle or underutilized instances), and AWS Cost Explorer (which lets you analyze your costs and usage).

Reference: https://aws.amazon.com/pricing/cost-optimization/

AWS Cloud Architecture Best Practices

AWS services have many capabilities.  When migrating existing applications to the cloud or creating new applications for the cloud, it is important to know these AWS capabilities in order to architect the most resilient, efficient, and scalable solution for your applications.

Cloud architecture and on-premises architecture differ in many ways. In the cloud, you treat infrastructure as configurable, flexible software rather than as hardware. You need a different mindset when architecting in the cloud, because the cloud solves problems differently.

You have to consider the following design principles in AWS cloud:

  1. Design for failure by implementing redundancy everywhere. Components fail all the time, and even whole sites sometimes fail. For example, if you deploy redundant web/application servers in different Availability Zones, your application will be more resilient when one Availability Zone fails.
  2. Implement scalability. One of the advantages of the cloud over on-premises infrastructure is the ability to grow and shrink resources depending on demand. AWS supports scaling resources vertically and horizontally, and you can automate horizontal scaling with Auto Scaling.
  3. Use the AWS storage service that fits your use case. AWS has several storage services with different properties, costs, and functionality. Amazon S3 is used for web applications that need large-scale storage capacity and performance, and also for backup and disaster recovery. Amazon Glacier is used for data archiving and long-term backup. Amazon EBS provides block storage for EC2 instances, including those running mission-critical applications. Amazon EFS (Elastic File System) provides NFS file shares; for SMB shares, AWS offers Amazon FSx for Windows File Server.
  4. Choose the right database solution. Match technology to the workload: Amazon RDS is for relational databases, Amazon DynamoDB is for NoSQL databases, and Amazon Redshift is for data warehousing.
  5. Use caching to improve the end-user experience. Caching minimizes redundant data retrieval operations, making future requests faster. Amazon CloudFront is a content delivery network that caches your website content at edge locations around the world. Amazon ElastiCache provides in-memory caching of data for mission-critical database applications.
  6. Implement defense-in-depth security. This means building security into every layer. Under the AWS Shared Responsibility Model, AWS is in charge of securing the cloud infrastructure (including the physical and hypervisor layers), while the customer is in charge of most layers from the operating system up to the application. This means the customer is still responsible for patching the OS and making the application as secure as possible. AWS provides security tools that help secure your application, such as IAM, security groups, network ACLs, and CloudTrail.
  7. Use parallel processing. For instance, issue requests concurrently using multiple threads instead of sequentially. Another example is to deploy multiple web or application servers behind load balancers so that multiple servers can process requests at once.
  8. Decouple your applications. IT systems should be designed to reduce interdependencies, so that a change or failure in one component does not cascade to other components. Let components interact with each other only through standard APIs.
  9. Automate your environment. Removing manual processes improves the system’s stability and consistency. AWS offers many automation tools to ensure that your infrastructure can respond quickly to changes.
  10. Optimize for cost. Ensure that your resources are sized appropriately (they can scale in and out based on need) and that you are taking advantage of different pricing options.
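The caching principle in item 5 is often implemented as the cache-aside pattern. The application checks the cache first, falls back to the database on a miss, and then populates the cache. In the minimal in-process sketch below, a Python dict with a time-to-live stands in for ElastiCache, and a hypothetical `load_from_db` function stands in for the database query.

```python
import time
from typing import Any, Callable


class CacheAside:
    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 60.0) -> None:
        self.loader = loader  # called only on a cache miss
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, Any]] = {}  # key -> (expiry time, value)

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is not None and time.monotonic() < entry[0]:
            return entry[1]                                     # hit: skip the database
        value = self.loader(key)                                # miss or expired: read from the source
        self._store[key] = (time.monotonic() + self.ttl, value)  # populate the cache
        return value


db_reads = []


def load_from_db(key: str) -> str:
    """Hypothetical slow database lookup."""
    db_reads.append(key)
    return f"profile-for-{key}"


cache = CacheAside(load_from_db, ttl_seconds=30)
cache.get("user42")
cache.get("user42")
print(db_reads)  # ['user42']: the second call was served from the cache
```

The TTL bounds how stale a cached value can get. Writes to the database would normally also invalidate the matching cache key.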
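Item 7's idea of issuing requests concurrently rather than sequentially can be sketched with Python's `concurrent.futures`. Here `fetch` is a hypothetical stand-in that sleeps to simulate network latency. In practice it would call S3, an API, or another service.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(key: str) -> str:
    """Hypothetical I/O-bound call; sleeping simulates 0.1 s of network latency."""
    time.sleep(0.1)
    return f"data-{key}"


keys = [f"object-{i}" for i in range(10)]

start = time.perf_counter()
sequential = [fetch(k) for k in keys]          # one after another: ~10 x 0.1 s
seq_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    parallel = list(pool.map(fetch, keys))     # requests overlap; results keep input order
par_time = time.perf_counter() - start

print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Threads help here because the work is I/O-bound. CPU-bound work in Python would call for processes or more servers instead.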

Sources: AWS Certified Solutions Architect Official Study Guide; Global Knowledge Architecting on AWS 5.1 Student Guide

Ensuring Reliability of Your Apps on the Amazon Cloud

On February 28, 2017, the Amazon Simple Storage Service (S3) in the Northern Virginia (US-EAST-1) Region went down after a technician entered an incorrect command. Many websites and applications that rely on S3 went down with it. Full details about the outage can be found here: https://aws.amazon.com/message/41926/

Although Amazon Web Services (AWS) could have prevented this outage, a well-architected site should not have been affected by it. AWS allows subscribers to use multiple Availability Zones (and even redundancy across multiple Regions), so that when one goes down, applications can keep running on the others. S3 is a regional service that already spreads data across Availability Zones, so riding out a region-wide S3 outage like this one requires redundancy in a second Region.

It is very important to have a well-architected framework when using the cloud. AWS provides one that is based on five pillars:

  • Security – The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.
  • Reliability – The ability of a system to recover from infrastructure or service failures, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.
  • Performance Efficiency – The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.
  • Cost Optimization – The ability to avoid or eliminate unneeded cost or suboptimal resources.
  • Operational Excellence – The ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.

For the companies that were affected by the outage, applying the “reliability” principle, in this case by replicating data to a different Region (for example, with S3 Cross-Region Replication) and failing over to it, could have shielded them from the outage.
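At the application level, using a second Region often comes down to a read path that fails over when the primary is unavailable. The minimal sketch below assumes the data is already replicated to a second Region. The `read_primary`/`read_replica` callables and the Region pairing are hypothetical stand-ins for real client calls.

```python
from typing import Callable, Sequence, Tuple


class AllRegionsFailed(Exception):
    pass


def read_with_failover(key: str,
                       readers: Sequence[Tuple[str, Callable[[str], bytes]]]) -> Tuple[str, bytes]:
    """Try each Region in order; return (region, data) from the first one that answers."""
    errors = []
    for region, read in readers:
        try:
            return region, read(key)
        except Exception as exc:  # in real code, catch the client's specific error types
            errors.append(f"{region}: {exc}")
    raise AllRegionsFailed("; ".join(errors))


def read_primary(key: str) -> bytes:
    raise ConnectionError("service unavailable")  # simulate the regional outage


def read_replica(key: str) -> bytes:
    return b"replicated copy of " + key.encode()


region, data = read_with_failover(
    "report.pdf", [("us-east-1", read_primary), ("us-west-2", read_replica)])
print(region, data)  # us-west-2 b'replicated copy of report.pdf'
```

Replication is asynchronous, so the replica may lag the primary slightly. Applications that fail over should tolerate reading slightly older data.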

Securing Your Apps on Amazon AWS

One thing to keep in mind when putting your company’s applications in the cloud, specifically on Amazon AWS, is that you are still largely responsible for securing them. Amazon AWS has solid security in place, but you should not hand security over to Amazon on the assumption that your applications are totally secure just because they are hosted there. In fact, Amazon AWS has a shared responsibility model, depicted by this diagram:

Source:  Amazon AWS

Amazon AWS is responsible for the physical and infrastructure security, including hypervisor, compute, storage, and network security; and the customer is responsible for application security, data security, Operating System (OS) patching and hardening, network and firewall configuration, identity and access management, and client and server-side data encryption.

That said, Amazon AWS provides a slew of security services to make your applications more secure. It provides AWS IAM for identity and access management, security groups that act as virtual firewalls for EC2 instances (servers), network ACLs that act as firewalls for your subnets, SSL/TLS encryption for data in transit, and user activity logging (AWS CloudTrail) for auditing. As a customer, you need to understand, design, and configure these security settings to make your applications secure.
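Configuring these controls is the customer's job, so it helps to audit them. The sketch below scans data shaped like boto3's `describe_security_groups` response for inbound rules that open a sensitive port to the whole internet (`0.0.0.0/0`). The list of sensitive ports is a hypothetical policy, and the sample group is made up so the code runs without credentials.

```python
SENSITIVE_PORTS = {22: "SSH", 3389: "RDP", 3306: "MySQL", 5432: "PostgreSQL"}  # hypothetical policy


def open_to_world(groups: list[dict]) -> list[str]:
    """Return findings for inbound rules that allow 0.0.0.0/0 to a sensitive port."""
    findings = []
    for sg in groups:
        for perm in sg.get("IpPermissions", []):
            if not any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])):
                continue  # rule is not open to the whole IPv4 internet
            if perm.get("IpProtocol") == "-1":
                findings.append(f"{sg['GroupId']}: all traffic open to the internet")
                continue
            from_port, to_port = perm.get("FromPort", 0), perm.get("ToPort", 65535)
            for port, name in SENSITIVE_PORTS.items():
                if from_port <= port <= to_port:
                    findings.append(f"{sg['GroupId']}: {name} ({port}) open to the internet")
    return findings


# Made-up sample shaped like boto3.client("ec2").describe_security_groups()["SecurityGroups"].
groups = [{
    "GroupId": "sg-0123",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432, "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    ],
}]

print(open_to_world(groups))  # ['sg-0123: SSH (22) open to the internet']
```

A fuller audit would also check `Ipv6Ranges` for `::/0`.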

In addition, Amazon AWS provides advanced security services so that you don’t have to build them yourself, including AWS Directory Service for authentication; AWS KMS for key management; AWS WAF, a web application firewall that filters HTTP(S) requests; and AWS Shield for DDoS mitigation.

There is really no perfect security, but securing your infrastructure at every layer tremendously improves the security of your data and applications in the cloud.

Building an Enterprise Private Cloud

Businesses use public clouds such as Amazon AWS, VMware vCloud, or Microsoft Azure because they are relatively easy to use, they are fast to deploy, resources can be bought on demand, and, most importantly, they are relatively cheap (because there is no operational overhead in building, managing, and refreshing an on-premises infrastructure). But public clouds have downsides, such as security and compliance concerns, diminished control of data, data locality issues, and network latency and bandwidth. On-premises infrastructure is still the most cost-effective option for regulated data and for applications with predictable workloads (such as ERP, local databases, end-user productivity tools, etc.).

However, businesses and end users expect and demand cloud-like services from their IT departments for the applications that are best suited to on-premises deployment. So IT departments should build and deliver an infrastructure that has the characteristics of a public cloud (fast, easy, on-demand, elastic, etc.) and the reliability and security of on-premises infrastructure – an enterprise private cloud.

Building an enterprise cloud is now possible because of the following technology advancements:

  1. Hyperconverged solutions
  2. Orchestration tools
  3. Flash storage

When building an enterprise cloud, keep in mind the following:

  1. It should be 100% virtualized.
  2. There should be a mechanism for self-service provisioning, monitoring, billing, and chargeback.
  3. A lot of operational functions should be automated.
  4. Compute and storage should be able to scale out.
  5. It should be resilient – no single point of failure.
  6. Security should be integrated in the infrastructure.
  7. There should be a single management platform.
  8. Data protection and disaster recovery should be integrated in the infrastructure.
  9. It should be application-centric instead of infrastructure-centric.
  10. Finally, it should be able to support legacy applications as well as modern apps.
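The billing and chargeback in item 2 can start as simple rate-card arithmetic over what each department has provisioned. The rates and VM inventory below are hypothetical. A real private cloud would pull this usage data from its management platform.

```python
from collections import defaultdict

# Hypothetical monthly rate card (USD).
RATES = {"vcpu": 15.00, "ram_gb": 4.00, "storage_gb": 0.10}

# Hypothetical inventory: (department, vCPUs, RAM GB, storage GB) per VM.
VMS = [
    ("finance", 4, 16, 200),
    ("finance", 2, 8, 100),
    ("hr",      2, 4, 50),
]


def monthly_chargeback(vms: list[tuple[str, int, int, int]]) -> dict[str, float]:
    """Sum each department's provisioned resources multiplied by the rate card."""
    bill: dict[str, float] = defaultdict(float)
    for dept, vcpu, ram, disk in vms:
        bill[dept] += vcpu * RATES["vcpu"] + ram * RATES["ram_gb"] + disk * RATES["storage_gb"]
    return dict(bill)


for dept, amount in monthly_chargeback(VMS).items():
    print(f"{dept:8} ${amount:,.2f}")
```

Charging for provisioned rather than consumed resources is the simplest model. It also gives departments a reason to release VMs they no longer need.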