RDM intro and Migrating virtual machines with Raw Device Mappings

First, let's look at a common use case for Raw Device Mappings (RDMs).

Microsoft Failover Clustering typically requires RDM volumes for its shared disks, especially in a multi-host setup where the cluster nodes run on different ESXi hosts (cluster-across-boxes).

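In this setup, physical compatibility mode is used. In this mode the VM's SCSI commands are passed almost directly to the LUN, so the I/O path has less virtualization overhead than a regular virtual disk, although in practice the performance difference is usually small.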

A Raw Device Mapping (RDM) is a mapping file that gives a virtual machine direct access to a LUN on an iSCSI or Fibre Channel storage system. The mapping file is placed in a VMFS volume and acts as a proxy for the raw physical storage device. It contains metadata that controls disk access to the physical device, and the virtual machine uses it to reach the storage device directly. An RDM gives you some of the advantages of direct access to a physical device while keeping some of the advantages of a virtual disk in VMFS.
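To make the mapping-file idea a bit more concrete: on an ESXi host the mapping file is normally created with vmkfstools, where -z creates a physical compatibility (pass-through) mapping and -r creates a virtual compatibility one. The sketch below only builds that command line and does not run it. The helper name, the LUN identifier and the datastore path are hypothetical placeholders, not values from a real environment.

```python
import shlex

# vmkfstools flags for the two RDM compatibility modes.
RDM_FLAGS = {
    "physical": "-z",  # pass-through RDM (physical compatibility mode)
    "virtual": "-r",   # virtual compatibility mode
}


def build_rdm_command(device: str, mapping_file: str, mode: str = "physical") -> str:
    """Return the vmkfstools command line that would create an RDM mapping file."""
    if mode not in RDM_FLAGS:
        raise ValueError(f"unknown RDM mode: {mode!r}")
    args = ["vmkfstools", RDM_FLAGS[mode], device, mapping_file]
    # Quote each argument so paths with spaces survive a copy/paste into a shell.
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(build_rdm_command(
        "/vmfs/devices/disks/naa.EXAMPLE",                  # hypothetical LUN identifier
        "/vmfs/volumes/datastore1/node1/quorum-rdm.vmdk",   # hypothetical mapping file path
    ))
```

The resulting .vmdk mapping file is what gets attached to the VM as an existing disk; the LUN itself is never copied into the datastore.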

Now, let's look in more detail at migrating virtual machines that use RDMs.

Migrating virtual machines with RDMs can be performed in three ways:

  1. Warm migration (vMotion), with the virtual machine powered on.
  2. Cold migration, with the virtual machine powered off.
  3. Storage migration (Storage vMotion), with the virtual machine powered on.
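The three options above boil down to two questions: is the VM powered on, and is it the storage that is being moved? A minimal sketch of that decision, with a hypothetical function name and simplified inputs, could look like this:

```python
def migration_method(powered_on: bool, moving_storage: bool) -> str:
    """Pick the migration type from the list above for a VM with RDMs."""
    if not powered_on:
        # A powered-off VM can only be moved with a cold migration.
        return "cold migration"
    # A powered-on VM uses Storage vMotion when its storage moves,
    # and plain vMotion when only the host changes.
    return "Storage vMotion" if moving_storage else "vMotion"


if __name__ == "__main__":
    print(migration_method(powered_on=True, moving_storage=False))
```

This ignores real-world constraints such as whether both hosts can see the mapped LUN, so treat it as a reading aid rather than a pre-check.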

To be honest, this process is painful and not an easy task to carry out.

Fortunately, VMware addresses some of the common questions that arise when migrating virtual machines that use Raw Device Mappings (RDMs).

Permanent Device Loss (PDL) and All-Paths-Down (APD)

VM Component Protection (VMCP) is a storage-related vSphere HA feature, introduced in vSphere 6.0, that protects virtual machines from storage accessibility problems.

VMCP can handle two types of storage failure:

PDL: Occurs when the storage array returns a SCSI sense code indicating that the device is permanently unavailable (for example, a failed LUN).

APD: Usually related to an underlying storage or networking issue. It differs from a PDL in that the host doesn't have enough information to determine whether the device loss is temporary or permanent.
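The difference between the two conditions can be sketched as a small classifier: if the array has returned a sense code that means "this device is gone", it is a PDL; if the host simply has no working paths and no such code, it is an APD. The class and function names below are hypothetical, and the sense-code set contains a single illustrative entry (0x5/0x25/0x0, ILLEGAL REQUEST / LOGICAL UNIT NOT SUPPORTED); the full list that ESXi treats as PDL should be taken from VMware's documentation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (sense key, ASC, ASCQ) tuples treated as "permanently gone".
# Illustrative only: one well-known example, not the complete list.
PDL_SENSE_CODES = {(0x5, 0x25, 0x0)}


@dataclass
class DeviceState:
    reachable_paths: int                          # paths to the LUN that still respond
    sense: Optional[Tuple[int, int, int]] = None  # last sense data from the array, if any


def classify(state: DeviceState) -> str:
    """Return 'PDL', 'APD' or 'OK' for a device as seen from one host."""
    if state.sense in PDL_SENSE_CODES:
        # The array explicitly said the device is unavailable.
        return "PDL"
    if state.reachable_paths == 0:
        # No paths and no explanation: the loss may be temporary or permanent.
        return "APD"
    return "OK"


if __name__ == "__main__":
    print(classify(DeviceState(reachable_paths=0)))
    print(classify(DeviceState(reachable_paths=0, sense=(0x5, 0x25, 0x0))))
```

The ordering matters: the sense code is checked first, because a PDL gives the host a definite answer while an APD is what remains when no answer arrives.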

The URL below provides the information needed to drill down into the issue and the resolution steps:

[source: VMware]