Using AWS S3 Storage Service as On-premises NAS

One of the fastest ways to start using cloud storage services such as AWS S3 is the AWS File Storage Gateway.

File Storage Gateway is a hybrid cloud storage service that provides on-premises access to virtually unlimited cloud storage in AWS S3.  It presents one or more AWS S3 buckets and their objects as a mountable NFS or SMB share to one or more on-premises clients.  In effect, you get an on-premises NAS that keeps hot data in a local cache while the data ultimately resides in AWS S3.  The main advantages of using File Storage Gateway are:

  1. Data on AWS S3 can be tiered and lifecycled into more cost-effective storage classes (see the lifecycle sketch after this list)
  2. Data can be processed both on-premises and in AWS, using legacy on-prem applications and Amazon EC2-based applications
  3. Data can be shared by users located in multiple geographic locations
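
As a rough illustration of item 1 above, here is a minimal Python sketch of a lifecycle policy applied to the bucket behind the gateway, assuming boto3 is installed and AWS credentials are configured; the bucket name and day thresholds are made up for illustration.

import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-file-gateway-bucket",                          # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-cold-data",
                "Filter": {"Prefix": ""},                     # apply to every object in the bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access after 30 days
                    {"Days": 180, "StorageClass": "GLACIER"},     # archive after 180 days
                ],
            }
        ]
    },
)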

One disadvantage of using File Storage Gateway is that it is not optimized for a large number of users or connections.  It is designed for a small number of users (about 100 connections per gateway) but a high volume of data (in the TB to PB range).

Using Artificial Intelligence in Cyber Security Applications

Artificial Intelligence (AI) and Machine Learning (ML) play critical roles in cyber security.  More and more cyber security applications are utilizing AI and ML to enhance their effectiveness.  The following are some of the applications that are taking advantage of ML algorithms.

Phishing Prevention. Phishing is a fraudulent attempt to obtain sensitive data by disguising oneself as a trustworthy entity. Detecting phishing attacks is a classification problem: training data fed into the ML system must contain both phishing and legitimate website classes. Using a learning algorithm, the system can then detect previously unseen or unclassified URLs.
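
For illustration, here is a minimal Python sketch of such a classifier, assuming scikit-learn is installed; the URLs and labels are invented and far too few for real training.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set: 1 = phishing, 0 = legitimate.
urls = [
    "http://paypa1-secure-login.example.ru/verify",
    "http://account-update.example.tk/bank",
    "https://www.wikipedia.org/",
    "https://www.amazon.com/gp/cart",
]
labels = [1, 1, 0, 0]

# Character n-grams capture look-alike tokens and odd domain patterns in URLs.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    LogisticRegression(max_iter=1000),
)
model.fit(urls, labels)

# Score a previously unseen URL; the output is the estimated probability of phishing.
print(model.predict_proba(["http://secure-paypa1.example.net/login"])[0][1])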

Botnet Detection. A botnet is an organized, automated army of zombie machines that can be used for DDoS attacks, sending spam, or spreading viruses.  Machine learning is now being used to detect and recognize botnets in order to prevent attacks.

User Authentication. Authentication verifies the identity of a user, process, or device to allow only legitimate users to use resources and services. Machine learning is now being used for adaptive authentication by learning a user’s behavior.

Incident Forecasting. Predicting an incident before it occurs can save a company’s reputation and money.  Machine learning algorithms fed with incident reports and external data can now predict hacking incidents before they occur.

Cyber Ratings. Cyber ratings are used to assess the effectiveness of a cyber security infrastructure. Machine learning calculates cyber security ratings by gathering a multitude of security data from across the web.

Spam Filtering. Unwanted emails clogging users’ inboxes have to be eliminated using more dependable and robust antispam features.  Machine learning methods are now among the most effective ways of detecting and filtering spam emails.
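
As a toy illustration, here is a hedged Python sketch of a Naive Bayes spam filter, again assuming scikit-learn; the sample emails are invented.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "You won a free prize, claim your reward now",      # spam
    "Cheap meds, limited time offer, click here",       # spam
    "Meeting moved to 3pm, agenda attached",            # ham
    "Please review the attached quarterly report",      # ham
]
labels = [1, 1, 0, 0]

# Word counts plus multinomial Naive Bayes: the classic baseline spam filter.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["Claim your free reward today"]))   # likely prints [1] (spam)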

Malware Detection. Malware is getting more complex and is being distributed more quickly.  Detecting it using signatures is no longer sufficient.  Machine learning techniques are now being used for malware detection because of their ability to keep pace with malware evolution.

Intrusion Detection.  Intrusion detection identifies unusual access or attacks in order to secure internal networks. Machine learning techniques such as pattern classification, self-organizing maps, and fuzzy logic are being used to detect intrusions.

User Behavior Monitoring. User behavior monitoring is an approach to insider threat prevention and detection. Machine learning techniques can help create an employee behavioral profile and set off an early warning when an insider threat is observed.
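
To make this concrete, here is a hypothetical Python sketch using an Isolation Forest from scikit-learn to flag activity that deviates from a user's baseline profile; the feature values are invented.

import numpy as np
from sklearn.ensemble import IsolationForest

# Each row is one user-day: [logins, files_accessed, MB_downloaded, after_hours_logins]
baseline_activity = np.array([
    [3, 40, 120, 0],
    [2, 35, 100, 0],
    [4, 50, 150, 1],
    [3, 45, 130, 0],
])

# Learn the employee's normal behavioral profile from historical activity.
detector = IsolationForest(contamination=0.1, random_state=42)
detector.fit(baseline_activity)

# Score today's activity; a result of -1 flags an outlier worth an early warning.
today = np.array([[15, 800, 20000, 6]])   # e.g., a mass download after hours
print(detector.predict(today))            # likely prints [-1]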

Improving Security of Backup Data

One of the best defenses against ransomware is to back up data and verify its integrity regularly.  If your data has been compromised by ransomware, you can always restore it from backup.  However, hackers using ransomware are increasingly targeting primary backups. Adding an air gap to a secondary copy of the backup can mitigate this.

An air gap is a security measure that protects backup data from intrusion, malicious software, and direct cyber attacks.  The idea is to place a secondary copy of the backups behind a private network that is not physically connected to the wider network (i.e., behind an air gap). These air-gapped secondary backups preserve backup copies and can be used to restore data that has been attacked by ransomware.

One example of an air gap implementation comes from DellEMC.  In the figure below, the Data Domain primary backup storage (Source) is replicated to a Data Domain secondary backup storage (Target) inside a vault.  The vault is self-contained and self-secured.  It is air-gapped except during replication cycles.  It also has encryption and data protection controls, including mutual authentication of source and target, data-at-rest encryption, data-in-motion encryption, replication channel encryption, Data Domain hardening, and immutable data (using retention lock). In addition, it contains applications that scan for security issues and test critical apps.

DellEMC Cyber Recovery


VMware Instant Recovery

When a virtual machine crashes, there are two ways to recover it quickly: using a VMware snapshot copy, or restoring an image-level backup.  Most VMware environments, though, do not usually keep snapshots of their virtual machines (VMs) because of the increased usage of primary storage, which can be costly.   On the other hand, restoring an image-level backup the traditional way can take longer, since the image has to be copied back from the protection storage to the primary storage.

However, most backup solutions nowadays, including NetBackup, Avamar/Data Domain, and Veeam, support VMware instant recovery, where you can restore VMs immediately by running them directly from backup files.  The way it works is that the virtual machine image backup is staged to a temporary NFS share on the protection storage system (e.g., Data Domain).   You can then use the vSphere Client to power on the virtual machine (whose datastore is NFS-mounted on the ESXi host), then initiate a vMotion of the virtual machine to the primary datastore within vCenter.
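
To sketch those last two steps in code rather than through the vSphere Client, here is a hypothetical pyVmomi snippet; the hostnames, credentials, VM name, and datastore name are made up, and the staging and registration of the backup on the NFS share is assumed to have already been done by the backup software.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()     # lab only; use verified certificates in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    # Walk the inventory and return the first managed object with this name.
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.DestroyView()

vm = find_by_name(vim.VirtualMachine, "restored-vm01")        # VM registered from the NFS staging share
target_ds = find_by_name(vim.Datastore, "primary-datastore")  # production datastore

WaitForTask(vm.PowerOnVM_Task())                              # bring the workload up immediately
spec = vim.vm.RelocateSpec(datastore=target_ds)
WaitForTask(vm.RelocateVM_Task(spec=spec))                    # Storage vMotion off the staging share

Disconnect(si)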

Since there is no need to extract the virtual machine from the backup file and copy it to production storage, you can perform a restore from any restore point in a matter of minutes. VMware instant recovery helps improve recovery time objectives (RTO) and minimizes disruption and downtime for critical workloads.

There are also other uses for instant recovery. You can use it to verify the backup image, verify an application, test a patch on a restored virtual machine before you apply the patch to production systems, and perform granular restore of individual files and folders.

Unlike primary storage, protection storage such as Data Domain is usually slow.  However, newer Data Domain releases have improved random I/O (thanks to additional flash SSDs), higher IOPS, and better latency, enabling faster instant access and restore of VMs.

AWS Cloud Native Security Services

Unlike legacy and most on-prem IT infrastructure, the AWS cloud was built with security in mind.  AWS is responsible for the security “of” the cloud, including hardware, hypervisors, and networks.  Customers are still responsible for the security of their data and applications “in” the cloud.

To help customers, AWS offers numerous cloud native security tools.  The diagram below, which I derived from the AWS Online Summit on May 13, 2020, depicts AWS services that customers can use when implementing the five NIST Cybersecurity Framework functions (Identify, Protect, Detect, Respond, and Recover) to secure their data and applications in the cloud.

Source: AWS Summit May 2020 Security Presentation

Maintaining a High Level of Information Security During the COVID-19 Pandemic

As more people are forced to work from home during this pandemic, it is important to maintain a high level of security to safeguard the company’s information assets as well as its employees.  Endpoints such as laptops are more vulnerable when used at home, disconnected from the corporate network.  Stressed-out employees are more prone to social-engineering attacks.  They may visit sites that are usually blocked by the corporate firewall. Not surprisingly, bad actors are taking advantage of this opportunity.

To mitigate these risks, the company’s security office should work with the IT department in implementing the following security measures:

  1. Enhance user security awareness by using creative ways to make users pay attention to the message, such as short videos instead of just email.  Emphasize COVID-19-themed scams, phishing emails, and websites.
  2. Identify and monitor high-risk user groups. Some users, such as those working with personally identifiable information (PII) or other confidential data, pose more risk than others, and their activity should be closely monitored. 
  3. Make sure all laptops have the latest security patches.  Critical servers that are accessed remotely should also have the latest security patches.
  4. Critical servers should only be accessed via a virtual private network (VPN).
  5. Users connecting to the corporate network via VPN should use multi-factor authentication (MFA). Corporate applications in the cloud should also use MFA.
  6. If your Virtual Desktop Infrastructure (VDI) can handle the load, users should use virtual desktops to access corporate applications.
  7. To support the large number of users working remotely, IT should add capacity to the network bandwidth, VDI, VPN, and MFA services.
  8. Validate and adjust incident-response (IR) and business-continuity (BC)/disaster-recovery (DR) plans.
  9. Expand monitoring of data access and endpoints, since the usual detection mechanisms such as IDS/IPS, proxies, etc. will not protect users working from home.
  10. Clarify incident-response protocols. When a breach occurs, security teams must know how to report and take action on it.

Source: https://www.mckinsey.com/business-functions/risk/our-insights/cybersecurity-tactics-for-the-coronavirus-pandemic?cid=other-eml-alt-mip mck&hlkid=cc61f434b9354af8aaf986862aa59350&hctky=3124098&hdpid=fd48c3f4-6cf9-4203-bfae-3df232c30bb7

Encrypting In-flight Oracle RMAN Database Backup via DD Boost

To secure Oracle database backups sent from a DB server to a Data Domain system, DD Boost for RMAN encryption can be enabled so that RMAN backup data is encrypted after deduplication at the Oracle server and before it is transmitted across the network. Since the encryption happens after deduplication and before the segment leaves the Oracle server (in-flight encryption), deduplication ratios will not suffer on the Data Domain system. In contrast, if Oracle RMAN encryption is used, the data will not deduplicate because it is encrypted first, so the deduplication ratio will suffer.

In-flight encryption enables applications to encrypt in-flight backup or restore data over the network from the Data Domain system. When configured, the client is able to use TLS to encrypt the session between the client and the Data Domain system.

To enable in-flight encryption for backup and restore operations over a LAN, run the following command on the Data Domain:

# ddboost clients add client-list [encryption-strength {medium | high} authentication-mode {one-way | two-way | anonymous}]

This command can enable encryption for a single client or for a set of clients.

The specific cipher suite used is either ADH-AES256-SHA, if the HIGH encryption option is selected, or ADH-AES128-SHA, if the MEDIUM encryption option is selected.

The authentication-mode option is used to configure the minimum authentication requirement. A client trying to connect by using a weaker authentication setting will be blocked. Both one-way and two-way authentication require the client to be knowledgeable about certificates.

For example:

# ddboost clients add db1.domain.com db2.domain.com encryption-strength high authentication-mode anonymous

To verify:

# ddboost clients show config
Client          Encryption Strength  Authentication Mode
*               none                 none
db1.domain.com  high                 anonymous
db2.domain.com  high                 anonymous

Using BoostFS to Back Up Databases

If your company is using a DellEMC Data Domain appliance to back up your databases, you are probably familiar with DD Boost technology. DD Boost increases backup speed while decreasing network bandwidth utilization.  In the case of Oracle, DD Boost has a plugin that integrates directly into RMAN: RMAN backs up via the DD Boost plugin to the Data Domain. It is the fastest and most efficient method of backing up Oracle databases.

However, some database administrators are still more comfortable performing cold backups.  These backups are usually dumped to the Data Domain via an NFS mount.   This is not the most efficient way to back up large databases, since the data is not deduplicated before being sent over the network and thus consumes a lot of bandwidth.

Luckily, DellEMC created the product BoostFS (Data Domain Boost Filesystem) which provides a general file-system interface to the DD Boost library, allowing standard backup applications to take advantage of DD Boost features.   In the case of database cold backup, instead of using NFS to mount the Data Domain, you can use BoostFS to stream the cold backups to the Data Domain, thus increasing backup speed and decreasing network bandwidth utilization. In addition, you can also take advantage of its load-balancing feature as well as in-flight encryption.

To implement BoostFS, follow these steps:

1. BoostFS depends on FUSE, so before installing the DDBoostFS package, install fuse and fuse-libs first.

2. Edit the configuration file /opt/emc/boostfs/etc/boostfs.conf, specifying the Data Domain hostname, the storage unit, the username, the security option, and whether to allow users other than the owner of the mount to access it.  The last option is useful if you are using the same storage unit for multiple machines.

3. Create the lockbox file, if you specified lockbox as the security option.  This is the most popular choice.

4. Verify that the host has access to the storage unit using the command /opt/emc/boostfs/bin/boostfs lockbox show-hosts

5. Mount the new BoostFS storage unit using the command /opt/emc/boostfs/bin/boostfs mount

6. To retain the mount after reboots, add the boostfs entry to /etc/fstab

For more information, visit the DellEMC support site.

Automating Security

One of the most exploited security weaknesses that lead to data breaches is device misconfiguration. Some common misconfigurations are:

  • Not changing the default passwords
  • Not cleaning up unused user accounts
  • Failing to remove unused / temporary access
  • Inability to cope with changes
  • Overly complex policies
  • Creating incorrect or non-compliant policies
  • Changing the wrong policies

Unlike security device flaws, misconfigurations can be mitigated by enforcing strict procedures and by automation. Automating security configuration eliminates human error in a complex and rapidly changing environment.  For instance, operating system images can be defined as templates that have been hardened with the necessary configurations.  Orchestration tools such as Puppet, Ansible, or Chef are then used to deploy and enforce them automatically.
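
As a toy illustration of the idea (not how Puppet, Ansible, or Chef work internally), the following Python sketch compares a device's settings against a hardened baseline and flags drift automatically; all names and values are invented.

# Hardened baseline derived from the template image.
HARDENED_BASELINE = {
    "admin_password_changed": True,    # no default passwords
    "unused_accounts": 0,              # stale accounts removed
    "telnet_enabled": False,           # insecure services disabled
}

def audit_device(name, config):
    # Return the settings that drift from the hardened baseline.
    return [
        f"{name}: {key} is {config.get(key)!r}, expected {expected!r}"
        for key, expected in HARDENED_BASELINE.items()
        if config.get(key) != expected
    ]

if __name__ == "__main__":
    device = {"admin_password_changed": False, "unused_accounts": 3, "telnet_enabled": False}
    for finding in audit_device("edge-fw-01", device):
        print(finding)                 # each finding is a misconfiguration to remediate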

How to Permanently Delete Data in the Cloud

In the pre-cloud era, to permanently delete data, the sectors on the physical disk had to be overwritten multiple times with zeros and ones to make sure the data was unrecoverable. If the device was not going to be re-used, it had to be degaussed. The Department of Defense standard, DoD 5220.22-M, goes as far as destroying the physical disk through melting, crushing, incineration, or shredding to completely get rid of the data.

But these techniques do not work for data in the cloud. First, cloud customers generally do not have access to the provider’s data centers to get at the physical disks. Second, cloud customers do not know where their data is written, i.e., which specific sectors of the disk, or which physical disks for that matter. In addition, drives may reside on different arrays located in multiple availability zones, and data might even be replicated across different regions.

The only way to permanently erase data in the cloud is via crypto-shredding. It works by deleting the encryption keys used to encrypt the data. Once the encryption keys are gone, the data cannot be recovered. So it is imperative that data be encrypted even before it is put in the cloud; unencrypted data in the cloud is impossible to permanently delete. As a cloud customer, it is also important that you, and not the cloud provider, own and manage the encryption keys.
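
To make the idea concrete, here is a minimal Python sketch of crypto-shredding using the cryptography package's Fernet recipe; the data and key handling are illustrative only, and a real deployment would keep keys in a key management system under the customer's control.

from cryptography.fernet import Fernet, InvalidToken

key = Fernet.generate_key()                       # keep this key under YOUR control, not the provider's
ciphertext = Fernet(key).encrypt(b"customer record: account=12345")

# ... only the ciphertext is uploaded to the cloud ...

# Crypto-shredding: destroy every copy of the key.
key = None

# Even with full access to the ciphertext, the data is now unrecoverable.
try:
    Fernet(Fernet.generate_key()).decrypt(ciphertext)   # a different key cannot decrypt it
except InvalidToken:
    print("data is unrecoverable without the original key")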