VMware SRM 5 with EMC Symmetrix – What’s New (Part 2)

Hi,

The first part of this blog series covers SRM 5 and the Symmetrix SRA functionality up to the point of running an SRM failover test.

http://itzikr.wordpress.com/2011/10/08/vmware-srm-5-with-emc-symmetrix-%e2%80%93-what%e2%80%99s-new-part-1/

This part will cover the actual Failover and Failback (ReProtect).

Gold Copies for Failover

  • SRA 5.0 has changed to create two separate files for Gold Copy information:
    • EmcSrdfSraProtectionSiteGoldcopyConfig.xml
    • EmcSrdfSraRecoverySiteGoldcopyConfig.xml
  • It is important to note that in this release, for recovery side gold copy only TF/Mirror (TF Clone emulation) is supported. SNAP and CLONE are not. This will be resolved in 5.1. Protected side gold copies are supported for all TF mechanisms.
  • Located:
    • %ProgramData%\EMC\EmcSrdfSra\Config\
  • Supported with all replication modes
    • SRDF/A, SRDF/S, SRDF/STAR
    • Same requirements as for Test Failover
      • E.G., SRDF/A and TimeFinder/Snap require 5875 and Write Pacing
  • New with the SRA 5.0 is the ability to create a gold copy on the Protection side as well as the Recovery side.
    • Configuration of the recovery side gold copy can be done with VSI like with SRM 4.
    • The ability to create/edit the protection side gold copy options file is not yet in VSI but is planned. Manual editing is required in this case.
  • If one or both of these options files are configured, a gold copy will be created on one or both of the sides during a Failover operation. If neither is configured, no gold copy is created.
    • The files that should be edited are the ones on the Recovery side SRM server
  • Default behavior of adapter is to continue on with Failover if gold copy creation fails. This can be changed by editing a parameter in the global options file.
    • <FailoverIfGoldCopyFails>
  • There are a few differences in behavior between Test Failover and Gold Copy
    • If consistency protection is not enabled, test failover fails whereas gold copy will succeed.
    • Test failover performs a “consistent” split or activate whereas gold copy doesn’t need to.
      • For example, if the RDF link is in “Transmit Idle” state, the adapter cannot perform a consistent split on the BCVs or a consistent activate on clones and snaps. Therefore test failover fails, whereas the gold copy operation detects this scenario and performs a normal split.
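
To make the <FailoverIfGoldCopyFails> setting a bit more tangible, here is a minimal Python sketch that reads it from the text of the global options file. Only the element name comes from the SRA; the root element, the Yes/No values and the helper names are my assumptions for illustration, so check them against the real file on your SRM server.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real layout of the global options file may differ.
SAMPLE_GLOBAL_OPTIONS = """<EmcSrdfSraGlobalOptions>
  <FailoverIfGoldCopyFails>Yes</FailoverIfGoldCopyFails>
</EmcSrdfSraGlobalOptions>
"""


def local_name(tag):
    """Strip an XML namespace such as '{uri}Name' down to 'Name'."""
    return tag.rsplit("}", 1)[-1]


def read_option(xml_text, option_name):
    """Return the stripped text of the first element named option_name, or None."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if local_name(element.tag) == option_name:
            return (element.text or "").strip()
    return None


def failover_continues_if_gold_copy_fails(xml_text):
    """A missing element falls back to the default described above (continue)."""
    value = read_option(xml_text, "FailoverIfGoldCopyFails")
    if value is None:
        return True
    # Assumption: the option is expressed as Yes/No (case-insensitive).
    return value.lower() == "yes"
```

Calling failover_continues_if_gold_copy_fails() with the contents of the file from the Recovery side SRM server tells you whether a failed gold copy would stop the failover.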

Planned Migration

Planned Migration Recovery Plan execution


  • Previously the SRA performed an RDF swap of devices if possible after failover
    • No longer occurs with failover for SRDF/A or SRDF/S devices due to new “reprotect” operation
    • STAR devices will be reconfigured to reverse replication though during failover

Disaster Recovery

  • For an array, cluster or site failure, the “Disaster Recovery” option must be used
    • Rides through failures on the protected side, unlike “Planned Migration”, which will fail upon any errors
    • SRA will try to reconfigure STAR environment if possible
    • Certain failure scenarios will cause STAR Cascaded to become Concurrent

Reprotect

  • In SRM 4.x the SRA performed an RDF Swap (when possible) after failover by default
  • SRM 5 has increased the granularity of operations and has enforced this to be a separate operation: Reprotect
  • Reprotect may not be possible if there was a failure on the protected side
    • Usually only fully functional in Planned Migration scenarios
    • Storage operations may fail

Reprotect SRDF/A And SRDF/S

  • After failover device pairs are in “FailedOver” state
    • Replication direction (though suspended) remains A -> B
  • The SRA performs a swap during Reprotect and reverses replication
    • Replication is resumed but in the opposite direction B -> A
    • Device personalities change: R1 becomes R2 and vice versa
  • Protection groups and recovery plans are automatically updated and reversed
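
As a toy model of the swap described above (not the SRA's code), the following Python sketch shows what changes for a device pair during Reprotect. The RdfPair class and the "Replicating" label are made up for illustration; real SRDF pair states have their own names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RdfPair:
    r1_site: str  # site whose device currently has the R1 personality
    r2_site: str  # site whose device currently has the R2 personality
    state: str    # "FailedOver" after failover; "Replicating" is an illustrative label


def reprotect(pair):
    """Swap personalities and resume replication in the opposite direction."""
    if pair.state != "FailedOver":
        raise ValueError("reprotect expects a pair in the FailedOver state")
    return replace(pair, r1_site=pair.r2_site, r2_site=pair.r1_site,
                   state="Replicating")
```

For a pair that failed over from site A to site B, reprotect(RdfPair("A", "B", "FailedOver")) returns a pair whose R1 is on B and whose R2 is on A.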

Reprotect – SRDF/STAR

  • For STAR, replication is already reversed and reconfigured during failover
    • This assumes there were no site/storage failures; if there were, manual intervention may be necessary
  • Protection groups and recovery plans are automatically updated and reversed

  • Reprotect offers Force Cleanup similar to test recovery
    • Only available after one failed reprotect
    • If needed, usually means manual intervention with the storage will be required to resume replication

Reprotect – Failback

  • After reprotect has been successfully executed failback can occur
  • Failback is no different than failover and is executed in the same way



VMware SRM 5 with EMC Symmetrix – What’s New (Part 1)

Hi,

A lot has changed since I published my SRM failback post (http://itzikr.wordpress.com/2011/01/10/srm-automatic-failback-using-emc-symmetrix-vmax/)

SRM 5 has finally been released and it now includes a built-in failback (aka ReProtect), so this post will try to capture some of the new SRM 5 features and then expand on the (very soon to be released) Symmetrix SRA (Storage Replication Adapter).

Before we go ahead, I also wanted to thank Cody Hosterman (@codyhosterman). Cody is a Snr Systems Engineer who is responsible for many, many things, but in the context of the vSpecialist team he makes sure we get the info we need before it leaves the door. Thank you, Cody, for being so patient!

So, first things first: here are the new scalability features within SRM 5.

                                                          Maximum   Enforced
Protected virtual machines total                          1000      No
Protected virtual machines in a single protection group   500       No
Protection groups                                         250       No
Simultaneous running recovery plans                       30        No
vSphere Replicated virtual machines                       500       No
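
Since none of these maximums is enforced by SRM, it can be handy to check a planned environment yourself. A minimal Python sketch, assuming you already have the counts at hand (the dictionary keys and the function name are my own):

```python
# Maximums from the table above; the key names are hypothetical.
SRM5_LIMITS = {
    "protected_vms_total": 1000,
    "protected_vms_per_group": 500,
    "protection_groups": 250,
    "concurrent_recovery_plans": 30,
    "vsphere_replicated_vms": 500,
}


def over_limit(inventory):
    """Return {name: (actual, maximum)} for every count above its maximum."""
    return {
        name: (count, SRM5_LIMITS[name])
        for name, count in inventory.items()
        if name in SRM5_LIMITS and count > SRM5_LIMITS[name]
    }
```

An empty result means every count you passed in is within the published maximums.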

 

Planned migration is quite an important feature generically speaking, but it has no meaning when using an EMC Symmetrix array because SRDF will ALWAYS make sure the data is consistent prior to the failover.

Some other new features which are “under the hood”

  • IPv6
    • SRM will support IPv6 for all network links.
    • vSphere Replication will support communication over IPv6 if underlying ESXi servers support IPv6.
  • Single UI – don’t need to use two clients or linked mode
  • IP Customization performance increase
    • The command line doesn’t change for bulk imports, but the actual action of customization is much faster.
  • In-guest callouts
    • Since 1.0 you could execute a script that was held on the SRM server; now you can also do it inside the guest.

API:

  • Existing API on recovery site preserved
  • New API on both protected and recovery sides
  • Protected Site API set includes:
    • List replicated datastores / protection groups / resources / VM
    • Query the status of protection for a VM or VMs
    • Protect or unprotect one or more VM
    • Status of protection group
  • Recovery Side API includes:    
    • Recovery Plan info
    • Start / cancel, list / answer prompts
    • Get XML representation of historical run of plan
    • Get basic result information of a plan (name, start, stop, etc.)

       

SRDF SRA 5.0

 

Requirements for SRDF Storage Adapter

  • DMX and VMAX storage arrays
    • DMX-1/2 running Enginuity operating environment 5671
    • DMX-3 and DMX-4 running Enginuity operating environment 5771 or later
    • VMAX running Enginuity operating environment 5874    
  • Management of Symmetrix array is done in-band
    • Solutions Enabler version 7.3.1 or later. There is a special SYMAPI preference that allows the SRA to discover remote devices even if the RDF state is partitioned. This preference is new in 7.3.1 and this behavior is required by SRM 5.
    • Must be the 32-bit version
  • Solutions Enabler is required on server running VMware SRM
  • Host running Solutions Enabler is required at each site
    • Can be the VMware SRM Server if it has connection to the storage array
  • Solutions Enabler in a client / server configuration
    • Host providing SYMAPI service needs to be configured
    • SSL connections between client and server recommended
  • Solutions Enabler Virtual Machine Appliance makes deployment of server easier
    • Virtual Appliances can be SE only or include SMC/SPA as well
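
The Enginuity levels above can be turned into a quick pre-flight check. This is only a sketch: I am treating every listed level as a minimum, although the list says "or later" only for DMX-3 and DMX-4, and the family labels and function name are my own.

```python
# Levels from the requirements list above, treated as minimums (assumption).
MINIMUM_ENGINUITY = {
    "DMX-1": 5671,
    "DMX-2": 5671,
    "DMX-3": 5771,
    "DMX-4": 5771,
    "VMAX": 5874,
}


def meets_sra_minimum(array_family, enginuity_level):
    """Return True if the array's Enginuity level is at or above the listed level."""
    try:
        minimum = MINIMUM_ENGINUITY[array_family]
    except KeyError:
        raise ValueError("unknown array family: %r" % (array_family,))
    return enginuity_level >= minimum
```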

Supported Functionality and Restrictions

  • SRDF/S, SRDF/A, SRDF/STAR Concurrent/Cascaded modes are supported
  • Support for enterprise consistency
    • SRDF/S ECA is supported
    • SRDF/A MSC is supported
      • Provides consistency across multiple SRDF/A groups
      • All SRDF/A groups in the MSC session need to be managed by a single SRM instance
  • TimeFinder used for testing recovery plans
    • TimeFinder/Mirror and TimeFinder/Clone are fully supported
    • TimeFinder/Snap is fully supported with SRDF/S
    • TimeFinder/Snap is supported with SRDF/A with restrictions
      • 5875 with write pacing enabled on R1

Logging

  • VMware SRM maintains logs on the vCenter Server
    • Location determined by VMware
  • Adapter logs located with SRM Logs (NEW LOCATION)
    • Default log location: %ProgramData%\VMware\VMware vCenter Site Recovery Manager\Logs\SRAs\EMC Symmetrix
  • Log file name is EmcSrdfSra_<date>.log
  • API log is symapi-<date>.log
    • Available on the Symmetrix API server handling the client requests
  • Troubleshooting requires the logs listed above
    • Both protection and recovery side files are required

       

SRDF/STAR Support

 

  • Newly supported with SRA version 5.0; previously only two-site solutions were allowed
  • SRDF/STAR uses one of the following RDF capabilities to mirror the same production data synchronously to one remote site and asynchronously to another remote site:
    • Concurrent SRDF configuration: A single source (R1) device is remotely mirrored to two target (R2) devices at the same time.
    • Cascaded SRDF configuration: It consists of a primary site (SiteA) replicating data to a secondary site (SiteB) and the secondary site (SiteB) replicating the same data to a tertiary site (SiteC).
  • SRDF/STAR topology:
    • Workload site: It is the primary data center where the production workload is running.
    • Sync target site: It is the secondary site usually located in the same region as the workload site. The production data is mirrored to this site using synchronous replication.
    • Async target site: It is the secondary site in a distant location. The production data is mirrored to this site using asynchronous replication.
  • STAR Site operations: Operations performed on the workload site or target site.
    • Connect: Begin SRDF/Star synchronization.
    • Protect: Enable SRDF consistency protection for a target site.
    • Disconnect: Suspend SRDF/Star synchronization.
    • Unprotect: Disable SRDF/Star consistency protection to the specific target site.
    • Switch: Switch workload operations to a target site.

SRDF STAR Considerations

  • The user is expected to set up the STAR group before any adapter operations.
  • SRA supports Failover for STAR devices between the workload site and sync target site only.
    • The async target site is considered a bunker site, and it is assumed that no host will be connected to control it.
  • SRA supports Test Failover for STAR devices at the sync target site only.
  • The STAR commands in the SRA might take multiple hours, depending on the amount of replication data.

SRDF/STAR Concurrent Setup

  1. The R1 devices must be configured as concurrent dynamic devices.
  2. Create an RDF1-type composite group on the control host at the workload site.
  3. Add devices to the composite group from those SRDF groups that represent the concurrent links for SRDF/Star configuration.
  4. Create two SRDF group names – one SRDF group name for all synchronous links and one for all asynchronous links.

  5. For each source SRDF group that you added to the composite group, define corresponding empty recovery RDF groups (static or dynamic) at both the remote sites.

SRDF/STAR Cascaded Setup

  1. The R1 devices must be configured as cascaded dynamic devices.
  2. Create an RDF1-type composite group on the control host at the workload site.
  3. Add devices to the composite group from those SRDF groups that represent the cascaded links for the SRDF/Star configuration.
  4. Create one SRDF group name for all synchronous links.
  5. For each source SRDF group that you added to the composite group, define a corresponding empty recovery SRDF group (static or dynamic) at the workload site.

SRDF/STAR Setup

  • Create SRDF/STAR options file specifying the names of each SRDF/Star site and the required parameters.
  • Perform the symstar setup operation.

  • Create the matching R2 or R21 composite groups needed for recovery operations at the synchronous and asynchronous target sites.

Device Discovery

  • Dynamic RDF devices
  • SRDF/S and SRDF/A
  • Adaptive Copy is still NOT supported
  • SRDF/STAR
    • STAR/Concurrent or Cascaded; Diskless Cascaded is not supported
    • Must be in STAR Configuration; standalone Concurrent or Cascaded configurations are not allowed

[WARNING]: Non-STAR Cascaded and Concurrent Devices are not supported. Skipping this device

Device Discovery – Consistency Groups

  • Consistency groups are required for all devices!
    • Devices are filtered out of discovery only if they lack a consistency group on one of the two sides. If they have no group on either side they will still appear, but they will all be grouped in the same protection group, leading to a possibly incorrect configuration.
    • Even single devices without dependencies must be in a group
    • New requirement from SRM 5.0
    • Match the consistency groups to protection groups

Use VSI 5.0 SRA Utilities to create groups

  • Only supported to create groups for SRDF/A and SRDF/S devices
  • STAR consistency groups must be created manually
    • First group on workload site created manually
    • Secondary and tertiary groups, on Sync and Async site SYMAPI servers respectively, are easily created using “symstar buildcg” command
  • For SRDF/A pairs, the RDF Daemon must be enabled on both SYMAPI servers
    • Also set “SYMAPI_USE_RDFD” to ENABLE on both in options file
  • SRDF/S Algorithm
    1. VSI scans for VMFS volumes on Symmetrix devices
    2. All virtual machines are discovered for that VMFS
    3. If any VM spans multiple datastores, all underlying Symmetrix devices are in CG
    4. If one datastore spans multiple Symmetrix devices (must be same Symmetrix) all devices are included in CG
    5. Any devices used as RDMs by the previous VMs are included in CG
    6. All devices in steps 3-5 are in one CG
    7. All or some of the previewed CGs can be created
    8. All or some of the CGs can be merged if desired
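
The SRDF/S grouping rules above are essentially a "connected components" problem: any two devices linked through a datastore, a VM or an RDM end up in the same CG. Here is a minimal Python sketch of that idea using a union-find structure. It is my own illustration, not VSI's actual implementation, and the input dictionaries and device IDs are hypothetical.

```python
def suggest_consistency_groups(datastore_devices, vm_datastores, vm_rdms):
    """Group device IDs that are linked by a datastore, a VM or an RDM.

    datastore_devices: {datastore: [device, ...]}
    vm_datastores:     {vm: [datastore, ...]}
    vm_rdms:           {vm: [device, ...]}
    Returns a sorted list of sorted device lists, one per suggested CG.
    """
    parent = {}

    def find(device):
        parent.setdefault(device, device)
        while parent[device] != device:
            parent[device] = parent[parent[device]]  # path compression
            device = parent[device]
        return device

    def link_all(devices):
        devices = list(devices)
        for device in devices:
            find(device)  # register even single devices
        for device in devices[1:]:
            root_a, root_b = find(devices[0]), find(device)
            if root_a != root_b:
                parent[root_b] = root_a

    # A datastore spanning several devices ties those devices together.
    for devices in datastore_devices.values():
        link_all(devices)

    # A VM ties together the devices behind all its datastores and its RDMs.
    for vm in sorted(set(vm_datastores) | set(vm_rdms)):
        linked = [dev for ds in vm_datastores.get(vm, [])
                  for dev in datastore_devices.get(ds, [])]
        linked.extend(vm_rdms.get(vm, []))
        link_all(linked)

    groups = {}
    for device in list(parent):
        groups.setdefault(find(device), set()).add(device)
    return sorted(sorted(group) for group in groups.values())
```

A VM that spans two datastores pulls all of their devices into one suggested group, while an unrelated VM with an RDM gets its own group. Merging groups afterwards would simply be a union of the returned lists.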

  • SRDF/A Algorithm and considerations
    • Very similar to SRDF/S with some special considerations
    • RDF operations cannot be performed on a subset of devices contained in a single RA group with SRDF/A. This means, all of the devices within an RA group must be part of any RDF operation.
    • Non-VMware associated Symmetrix devices may be affected if they are in the same RA group
      • Avoid this when possible—dedicate certain RA groups to only VMware devices when using SRDF/A

Array Manager Configuration

  • Array pairs must be enabled before devices can be discovered
  • Pairs can only be enabled if the remote array manager for that pair is also configured
    • I.E., set up array managers on both sides first, then enable the pairs
  • No need to enable array pairs in STAR configuration to Async site, only Sync site pair is necessary

  • Once pairs are enabled devices can be discovered by the SRA on the “Devices” tab
    • Device IDs, replication direction and consistency groups will be reported

Protection Groups and Recovery Plans

  • Datastore groups are defined by similar rules as VSI uses to suggest consistency groups
  • Create protection group
    • Select datastore group(s)

Recovery Plans

  • Can include one or more protection groups

Note how you can enter both protected site, and recovery site IP – which will help in both failover, and failback.

Recovery Plans – VM Dependencies


Test Failover


  • Requires Virtual Storage Integrator 5.0 SRA Utilities
  • TimeFinder configuration pairing saved to options file
    • Located:
      • %ProgramData%\EMC\EmcSrdfSra\Config\EmcSrdfSraTestFailoverConfig.xml
  • To use R2 devices for Failover instead of TimeFinder copies set the following global option to “yes”:
    • “TestFailoverWithoutLocalSnapshots”
    • Located here:
      • %ProgramData%\EMC\EmcSrdfSra\Config\EmcSrdfSraGlobalOptions.xml
  • Must still log on to Recovery site to create pairings with VSI.
    • Notwithstanding the new architecture of SRM 5 user interface

Test Failover – General Considerations

 

  • TimeFinder/Mirror
    • The adapter requires the BCV pairs to be fully established prior to test failover.
  • TimeFinder/Snap or Clone
    • The adapter doesn’t require any TF relationships between the device pairs prior to ‘test failover – start’ operation.
      • Must be configured in options file though
    • If a relationship exists for the input RDF2 devices, the adapter analyses all existing relationships during ‘test failover – start’ operation and creates/recreates new/existing sessions.
      • For Snap, if “recreate” is not supported in the microcode, the adapter terminates any activated snap sessions and creates new sessions.

Test Failover Considerations with STAR

  • Test Failover can only be performed on the Sync site.
    • Not supported with the Async site
  • TestFailoverWithoutLocalSnapshots is NOT allowed in conjunction with STAR (in the current release)
    • The SRA will ignore the setting and look for TimeFinder device pairings.
    • If no device pairings are defined, the test will fail
  • Other Test Failover advanced options are supported:
    • TestFailoverForce (not supported for RDF devices with the links in “Split” state, in the current release of the SRDF Adapter.)
    • TerminateCopySessions
  • These are the STAR modes allowed by the SRA for test failover (and failover) with STAR:
STAR State   Sync Target Site   Async Target Site
Protected    Protected          Protected
Tripped      PathFailed         PathFailed
Tripped      PathFailed         Protected
Tripped      Protected          PathFailed
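
The table above is easy to encode as a lookup if you want to pre-check a STAR configuration before kicking off a test. A minimal Python sketch; the state strings are taken from the table, while the function name is my own and how you obtain the current states is left out.

```python
# (STAR state, sync target site state, async target site state), from the table above.
ALLOWED_STAR_STATES = {
    ("Protected", "Protected", "Protected"),
    ("Tripped", "PathFailed", "PathFailed"),
    ("Tripped", "PathFailed", "Protected"),
    ("Tripped", "Protected", "PathFailed"),
}


def star_state_allowed(star_state, sync_site_state, async_site_state):
    """Return True if the combination is one the SRA allows for (test) failover."""
    return (star_state, sync_site_state, async_site_state) in ALLOWED_STAR_STATES
```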

Test Failover Example: SRDF/STAR Cascaded

  • Configure device pairings with VSI
    • Available in vCenter inventory at ESX or Cluster level

    • Choose the TimeFinder mode and device pairs, then save

Test Failover Example: SRDF/STAR Cascaded

  • Initiate test of recovery plan
    • “Replicate recent changes to recovery site” is a non-operation for the SRDF SRA. The issued command “SyncOnce” is accepted but not used as SRDF constantly synchronizes storage.

Test Failover Example: SRDF/STAR Cascaded

  • “Force Cleanup” is required when test failover storage operations fail, for example due to:
    • Incorrect options file
    • Licensing issue
    • Etc…
  • “Force Cleanup” is not available upon the first “Cleanup” attempt after a test. “Cleanup” operation must fail once before this option can be selected.
    • Will allow process to complete regardless of storage operation errors

       

PowerPath/VE 5.4 SP2 P01

Hi,

So for those of you who haven’t upgraded to vSphere 5 yet (which is supported by PowerPath/VE 5.7) and are still on vSphere 4.1, we have released a hot fix for PP/VE 5.4 SP2 known as “P01”.

 

PowerPath/VE 5.4 SP2 P01

PowerPath/VE 5.4 SP2 P01 has the following new features:

  • Support for EMC Symmetrix® VMAXe™ arrays with Enginuity™ version 5875.198.148.e.

and here’s what this release fixes:

01