By David Piet PhD, VMware Cloud Global Account Specialist SA Lead – AWS
By Dhaval Shah, Sr. Solutions Architect – AWS

Many organizations reach a stage where they have to revisit their disaster recovery (DR) strategy for a variety of reasons, including aging infrastructure that needs an upgrade, colocation space lease renewal, or an expansion to the scope of applications that need a DR strategy in case of an actual event.

While organizations are evaluating the benefits of moving to the cloud for their primary needs, a sudden need to invest in a DR site upgrade can place an additional burden on the IT budget and slow work on new business opportunities.

Our customer, a Fortune 500 company, was exploring options for a new DR solution due to an expiring lease and aging infrastructure. They were also looking for ways to reduce costs as well as their maintenance and operational management overhead. The customer’s production workloads run on the VMware vSphere hypervisor and use a NetApp filer.

This post provides a solution that can help organizations, similar to our customer, migrate their DR site to Amazon Web Services (AWS) with minimal changes to their applications, leveraging the VMware Cloud Disaster Recovery (VCDR) solution and a variety of AWS services.

This approach allows customers to quickly move to AWS with a lower learning curve and lower total cost of ownership (TCO). The solution assumes customers currently run their infrastructure on VMware and use some sort of shared storage, such as Common Internet File System (CIFS) shares.

The Disaster Recovery Solution

The disaster recovery solution shown in Figure 1 consists of three major components:

  • VCDR to manage the failover of the on-premises virtual machines (VMs).
  • NetApp Cloud Volumes ONTAP (CVO) to manage the CIFS shares.
  • AWS Landing Zone to manage the demilitarized zone (DMZ) layer, storage layer, and connectivity across the various services.

Let’s take a brief look at what each of these components does on its own.

Figure 1 – Solution architecture diagram of the implementation.

VCDR Overview

VCDR is a fully integrated disaster recovery solution that is natively built into VMware Cloud on AWS. It’s an on-demand DR service that provides an easy-to-use software-as-a-service (SaaS) solution. You can use VCDR to protect your on-premises vSphere VMs by replicating them into AWS and then, in a disaster event, failing your workloads over into VMware Cloud on AWS.

To configure VCDR as your DR solution, you first deploy the DRaaS Connector, which is an OVA appliance, into your production vSphere site. Virtual machines that you add to a DR protection group are replicated into the cloud-based VCDR services.

The DRaaS Orchestrator and the Scale-out Cloud File System reside within the cloud-based services. In an actual disaster recovery event, or a DR simulation, the Scale-out Cloud File System is mounted to a software-defined data center (SDDC) within VMware Cloud on AWS, and your workloads then run in AWS.

There are two models customers can use for their disaster recovery SDDC within VMware Cloud on AWS:

  • The first is to run in a pilot light mode. In this model, you’d have a minimum number of VMware Cloud on AWS hosts running at all times; in the event of DR failover, the cluster will automatically scale up with Elastic Distributed Resource Scheduler (EDRS) as the VMs come online. In an operational model like this, you’d typically have subscriptions associated with the pilot light hosts and, in a failover, run the scaled-up hosts as on-demand.
  • The second model is to run in a just-in-time (JIT) deployment mode, meaning that when a disaster occurs you provision a new SDDC, configure it accordingly, and then fail your workloads over into it. This is a lower cost model because there are no hosts running around the clock; however, it takes additional time and personnel to deploy the environment at the time of an event. The hosts that get provisioned will typically run as on-demand.
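
To make the trade-off between the two models concrete, here is a minimal Python sketch of the cost arithmetic. It is an illustration only: the hourly rates, host counts, and failover duration are made-up placeholders, and it ignores VCDR service fees, storage, and data transfer. Substitute your own pricing before drawing any conclusions.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # average number of hours in a month


@dataclass(frozen=True)
class HostRates:
    """Hypothetical hourly prices per host; replace with your actual rates."""
    subscription: float  # committed (subscription) rate
    on_demand: float     # pay-as-you-go rate


def pilot_light_monthly_cost(rates, pilot_hosts, failover_hosts, failover_hours):
    """Pilot hosts run all month on subscription; any extra hosts run
    on demand only for the duration of the failover."""
    steady = pilot_hosts * rates.subscription * HOURS_PER_MONTH
    extra_hosts = max(failover_hosts - pilot_hosts, 0)
    burst = extra_hosts * rates.on_demand * failover_hours
    return steady + burst


def jit_monthly_cost(rates, failover_hosts, failover_hours):
    """No hosts run until a disaster is declared; all hosts are on demand."""
    return failover_hosts * rates.on_demand * failover_hours


# Example with placeholder numbers: 2 pilot hosts, 6 hosts needed in a
# failover that lasts 48 hours.
rates = HostRates(subscription=5.0, on_demand=8.0)
pilot = pilot_light_monthly_cost(rates, pilot_hosts=2, failover_hosts=6, failover_hours=48)
jit = jit_monthly_cost(rates, failover_hosts=6, failover_hours=48)
```

In a month with no failover, the JIT cost in this model is zero while the pilot light cost is the steady subscription charge. What the arithmetic does not capture is the extra recovery time and staff effort that JIT requires.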

NetApp Cloud Volumes ONTAP Overview

NetApp Cloud Volumes ONTAP is a software-only version of Data ONTAP, which is the data management operating system from NetApp used on physical NetApp storage appliances.

With Cloud Volumes ONTAP, the operating system has been customized to run as an Amazon Elastic Compute Cloud (Amazon EC2) instance. With Cloud Volumes ONTAP on AWS, you can spin up a new enterprise-class data management system in the cloud in minutes.

Cloud Volumes ONTAP includes features such as SMB and NFS multi-protocol support, local snapshots, efficiencies with compression and dedupe, and SnapMirror migration to cloud with efficiencies and snapshots intact. You can find more information regarding CVO features on the AWS Marketplace pages for NetApp Cloud Manager and Cloud Volumes ONTAP for AWS.

Solution Overview

Putting the pieces back together, the overall solution is as follows. The continuous replication of both VMs and the NetApp volumes keeps the DR environment in sync. In the case of a DR event, the VMs would fail over to VMware Cloud on AWS, and in turn the VMs would mount the CVO volumes directly to their respective filesystems using the AWS Transit Gateway for connectivity.

Figure 2 illustrates traffic flow management using AWS Transit Gateway, which provides the flexibility to manage and control east-west and north-south network traffic, as well as the scalability to add AWS services or external services as additional attachments.

Figure 2 – Network routes through the Transit Gateway.
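
As a rough illustration of how a route table steers traffic between attachments, the following Python sketch picks the most specific matching route for a destination address, which is how Transit Gateway route evaluation behaves. The CIDR ranges and attachment names are hypothetical and do not come from the customer's environment.

```python
import ipaddress

# Hypothetical route table: destination CIDR -> attachment name.
ROUTES = {
    "10.10.0.0/16": "sddc-attachment",
    "10.20.0.0/16": "cvo-vpc-attachment",
    "192.168.0.0/16": "onprem-vpn-attachment",
    "0.0.0.0/0": "firewall-vpc-attachment",
}


def resolve_attachment(destination, routes=ROUTES):
    """Return the attachment of the most specific route that matches
    destination, or None when no route matches."""
    addr = ipaddress.ip_address(destination)
    best = None  # (network, attachment) of the longest prefix seen so far
    for cidr, attachment in routes.items():
        network = ipaddress.ip_network(cidr)
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, attachment)
    return best[1] if best else None
```

With this table, a recovered VM in the SDDC that mounts a share at 10.20.4.15 is sent to the CVO VPC attachment, while traffic to an address outside every named range falls through to the default route and the firewall VPC.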

AWS Configuration Workflow

The high-level workflow for the AWS components of the VCDR solution is:

  • Create an AWS account and select a Region for the DR infrastructure deployment. In our case, the customer selected us-west-2 for their DR site.
  • Create users and AWS Identity and Access Management (IAM) roles based on organizational need and best practices.
  • Deploy basic networking components: virtual private clouds (VPCs), virtual private networks (VPNs), AWS Transit Gateway, and transit VPC with firewall appliance.
  • Configure Transit Gateway routes and VPC security groups to connect all of the components.
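
Before deploying the networking components, it can help to confirm that the address ranges you plan to connect do not overlap, since routing between overlapping ranges is ambiguous. The sketch below is one way to check a plan using Python's standard library. The network names and CIDR blocks are hypothetical examples, not the customer's actual ranges.

```python
import ipaddress
from itertools import combinations


def find_overlaps(plan):
    """Return sorted pairs of network names whose CIDR blocks overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    return sorted(
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    )


# Hypothetical address plan; transit-vpc is deliberately placed inside
# the cvo-vpc range to show what a conflict looks like.
plan = {
    "on-premises": "192.168.0.0/16",
    "sddc": "10.10.0.0/16",
    "cvo-vpc": "10.20.0.0/16",
    "transit-vpc": "10.20.128.0/20",
}
conflicts = find_overlaps(plan)
```

An empty result means every range in the plan is distinct. Here the check reports the cvo-vpc and transit-vpc pair, which would need to be renumbered before attaching both to the Transit Gateway.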

VCDR Configuration Workflow

Here is the basic flow of work that was done:

  • Deploy a minimum-sized cluster in the VMware Cloud Disaster Recovery portal. These hosts serve as the pilot light compute on AWS in a DR event. For reference, our customer deployed a two-node cluster as their pilot light.
  • In the same portal, create your protected site, which is the on-premises VMware environment that will be using VCDR as its DR target. As part of this process, the DRaaS Connector will get downloaded to your on-premises site.
  • Once the two sites are successfully paired, all that remains is creating your protection group policies. These policies define which VMs are protected, their respective recovery point objectives (RPOs), and the retention period for the backups.
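
The relationship between the settings in a protection group policy can be sketched in a few lines of Python. This is a hypothetical model for reasoning about RPO and retention, not the VCDR API. It makes the simplifying assumption that the snapshot interval equals the RPO, and the group name, VM names, and durations are invented for the example.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ProtectionPolicy:
    """Hypothetical model of a protection group policy."""
    name: str
    vm_names: tuple
    rpo: timedelta        # snapshot interval, i.e. worst-case data loss
    retention: timedelta  # how long each snapshot is kept

    def snapshots_retained(self):
        """Approximate number of recovery points held at steady state."""
        return self.retention // self.rpo


# Example: snapshots every 4 hours, each kept for 7 days.
policy = ProtectionPolicy(
    name="tier1-apps",
    vm_names=("app01", "db01"),
    rpo=timedelta(hours=4),
    retention=timedelta(days=7),
)
recovery_points = policy.snapshots_retained()  # 168 hours // 4 hours = 42
```

Tightening the RPO or lengthening retention both increase the number of recovery points kept, which is worth weighing against the storage the backups consume.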

CVO Configuration Workflow

Here is the basic flow of the work that was done to configure CVO:

  • Create the Amazon EC2 instances on which to install Cloud Manager, along with the corresponding routes and firewall rules.
  • Create the EC2 instances on which to install NetApp CVO to manage the CIFS shares.
  • Set up SnapMirror replication to sync the on-premises CIFS shares to AWS.
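
When planning the initial sync of the CIFS shares, a back-of-the-envelope estimate of the transfer time is useful. The Python sketch below is generic arithmetic, not a SnapMirror feature. The data size, link speed, and utilization factor are assumptions to replace with your own figures, and it does not account for the compression and deduplication savings that SnapMirror preserves.

```python
def baseline_transfer_hours(data_tib, link_mbps, utilization=0.7):
    """Estimate the hours needed to copy data_tib tebibytes over a link
    of link_mbps megabits per second at the given average utilization."""
    if link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_mbps must be positive and utilization in (0, 1]")
    bits = data_tib * 1024**4 * 8                  # tebibytes -> bits
    effective_bps = link_mbps * 1_000_000 * utilization
    return bits / effective_bps / 3600             # seconds -> hours


# Example: 10 TiB over a 1 Gbps link at 70% utilization is roughly 35 hours.
hours = baseline_transfer_hours(data_tib=10, link_mbps=1000)
```

After the baseline copy completes, subsequent updates transfer only changed data, so the steady-state load on the link is typically much lower than this first pass.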

Conclusion

There are many reasons why customers rethink their existing disaster recovery strategies, and when that time comes they often gravitate towards familiar solutions and technology.

In the design we laid out in this post, customers who are currently using the VMware vSphere hypervisor and NetApp storage are able to deploy like-for-like technology in AWS, while fully decommissioning their on-premises DR facility by using VMware’s VCDR with VMware Cloud on AWS and NetApp’s Cloud Volumes ONTAP in Amazon EC2.