Monday, 14 March 2016

How to map ESXi vmdk files to SCSI devices in a Linux (RHEL, CentOS etc.) guest OS

If you have a big VM with more than one virtual SCSI controller and many virtual disks (usually a critical VM, e.g. a database) and you have to do maintenance on some vmdk, you need to know how it maps to a Linux disk in the guest OS so that you remove the right one.

The tool most often used to map a vmdk to a Linux disk is lsscsi. But if you have changed the SCSI controller type in the meantime, the order of the SCSI devices in Linux has changed too, and you can't rely on the lsscsi output any more.


The most reliable way to find the ESXi mapping to SCSI devices in Linux is to match pciSlotNumber (ESXi) against Physical Slot (Linux).

I am an RPM-based Linux user (CentOS, RHEL, Fedora etc.)

Install packages:

# yum -y install pciutils lshw

Check businfo on Linux:

# lshw -businfo | grep scsi

pci@0000:00:10.0  scsi0        storage    53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI
scsi@0:0.0.0      /dev/sda     disk       42GB SCSI Disk
scsi@0:0.0.0,1                 volume     500MiB Linux filesystem partition
scsi@0:0.0.0,2    /dev/sda2    volume     39GiB Linux LVM Physical Volume partition
scsi@0:0.1.0      /dev/sdb     disk       1073MB SCSI Disk
pci@0000:02:01.0  scsi3        storage    53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI
scsi@3:0.0.0      /dev/sdc     disk       1073MB SCSI Disk
pci@0000:0b:00.0  scsi4        storage    PVSCSI SCSI Controller
scsi@4:0.0.0      /dev/sdd     disk       1073MB SCSI Disk
scsi@4:0.1.0      /dev/sdg     disk       1073MB SCSI Disk
pci@0000:13:00.0  scsi5        storage    PVSCSI SCSI Controller
scsi@5:0.0.0      /dev/sde     disk       1073MB SCSI Disk
scsi@5:0.1.0      /dev/sdf     disk       1073MB SCSI Disk
                  scsi2        storage   
scsi@2:0.0.0      /dev/cdrom   disk       DVD-RAM writer
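
The lshw output above can be reduced to one line per disk with a small awk filter. This is only a sketch: the function name lshw_ctrl_disks is made up for illustration, and it assumes the column layout shown above (controller lines starting with pci@, disk lines starting with scsi@ and carrying the class "disk").

```shell
# Hypothetical helper: reads `lshw -businfo` output on stdin and prints
#   <PCI address> <SCSI host> <block device>
# for every disk that hangs off a PCI-attached SCSI controller.
lshw_ctrl_disks() {
    awk '
        # Controller line, e.g. "pci@0000:13:00.0  scsi5  storage ..."
        $1 ~ /^pci@/ && $2 ~ /^scsi[0-9]+$/ {
            pci[substr($2, 5)] = substr($1, 5)   # host number -> PCI address
            next
        }
        # Disk line, e.g. "scsi@5:0.1.0  /dev/sdf  disk ..."
        $1 ~ /^scsi@[0-9]+:/ && $3 == "disk" {
            host = substr($1, 6)                 # drop the "scsi@" prefix
            sub(/:.*/, "", host)                 # keep the host number only
            if (host in pci) print pci[host], "scsi" host, $2
        }
    '
}
```

It would be used as `lshw -businfo | lshw_ctrl_disks`. Devices whose controller has no pci@ line (the DVD drive above) are skipped.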
 


Check the PCI slot on the Linux box:

# lspci -v | grep -A 2 "13:00.0" | grep Phy
    Physical Slot: 224
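
Instead of grepping for one address at a time, the Physical Slot of every PCI device can be listed in one pass. A minimal sketch, assuming the lspci -v layout shown in this post (an unindented header line per device, followed by indented detail lines); lspci_slots is a hypothetical name.

```shell
# Hypothetical helper: reads `lspci -v` (or `lspci -D -v`) output on stdin
# and prints "<PCI address> <Physical Slot>" for each device that has one.
lspci_slots() {
    awk '
        /^[0-9a-f:]+\.[0-9a-f] / { addr = $1 }        # unindented device header
        /Physical Slot:/          { print addr, $NF } # indented detail line
    '
}
```

Running it as `lspci -D -v | lspci_slots` makes lspci print the PCI domain as well, so the addresses have the same 0000:13:00.0 form that lshw uses.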


Log in to the ESXi host:

# grep -i 'scsi[0-9]\.pciSlotNumber' /vmfs/volumes/a*/kb-c02/*.vmx
 
/vmfs/volumes/a0754ea2-4f1bb7f5/kb-c02/kb-c02.vmx:scsi0.pciSlotNumber = "16"
/vmfs/volumes/a0754ea2-4f1bb7f5/kb-c02/kb-c02.vmx:scsi1.pciSlotNumber = "33"
/vmfs/volumes/a0754ea2-4f1bb7f5/kb-c02/kb-c02.vmx:scsi3.pciSlotNumber = "192"
/vmfs/volumes/a0754ea2-4f1bb7f5/kb-c02/kb-c02.vmx:scsi2.pciSlotNumber = "224"

Based on that information, ESXi controller scsi2 (pciSlotNumber 224) is the controller in Physical Slot 224 in Linux, so the vmdk disks:

# grep -i 'scsi2:[0-9]*\.fileName' /vmfs/volumes/a*/kb-c02/*.vmx
 

/vmfs/volumes/a0754ea2-4f1bb7f5/kb-c02/kb-c02.vmx:scsi2:0.fileName = "kb-c02_4.vmdk"
/vmfs/volumes/a0754ea2-4f1bb7f5/kb-c02/kb-c02.vmx:scsi2:1.fileName = "kb-c02_5.vmdk"
 

are mapped to:

pci@0000:13:00.0  scsi5        storage    PVSCSI SCSI Controller
scsi@5:0.0.0      /dev/sde     disk       1073MB SCSI Disk
scsi@5:0.1.0      /dev/sdf     disk       1073MB SCSI Disk
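
The whole lookup can be sketched as one awk join over three saved files. Everything named here is an assumption made for illustration: the function name map_vmdk, the idea of saving the command outputs to files, and in particular the rule that the unit number in scsiX:Y equals the SCSI target number in the lshw bus info (scsi@host:channel.target.lun), which agrees with the example above but is not verified beyond it.

```shell
# Hypothetical helper. Usage: map_vmdk LSPCI_OUT LSHW_OUT VMX_FILE
#   LSPCI_OUT  saved output of `lspci -D -v` from the guest
#   LSHW_OUT   saved output of `lshw -businfo` from the guest
#   VMX_FILE   copy of the VM .vmx file from the ESXi host
map_vmdk() {
    awk '
        FNR == 1 { pass++ }                      # 1 = lspci, 2 = lshw, 3 = vmx

        # lspci: Physical Slot -> PCI address
        pass == 1 && /^[0-9a-f:]+\.[0-9a-f] / { addr = $1 }
        pass == 1 && /Physical Slot:/          { slot2addr[$NF] = addr }

        # lshw: PCI address -> SCSI host, and host:target -> block device
        pass == 2 && $1 ~ /^pci@/ && $2 ~ /^scsi[0-9]+$/ {
            addr2host[substr($1, 5)] = substr($2, 5)
        }
        pass == 2 && $1 ~ /^scsi@/ && $3 == "disk" {
            split(substr($1, 6), id, /[:.]/)     # host, channel, target, lun
            dev[id[1] ":" id[3]] = $2
        }

        # vmx: controller -> pciSlotNumber, controller:unit -> vmdk file
        pass == 3 && $1 ~ /^scsi[0-9]+\.pciSlotNumber$/ {
            split($0, q, "\""); split($1, k, /\./); ctrlslot[k[1]] = q[2]
        }
        pass == 3 && $1 ~ /^scsi[0-9]+:[0-9]+\.fileName$/ {
            split($0, q, "\""); split($1, k, /\./); vmdk[k[1]] = q[2]
        }

        END {
            for (cu in vmdk) {
                split(cu, k, ":")                # k[1] = scsiN, k[2] = unit
                if (!(k[1] in ctrlslot)) continue
                slot = ctrlslot[k[1]]
                if (!(slot in slot2addr)) continue
                if (!(slot2addr[slot] in addr2host)) continue
                host = addr2host[slot2addr[slot]]
                print cu, vmdk[cu], "->", "scsi" host, dev[host ":" k[2]]
            }
        }
    ' "$1" "$2" "$3" | sort
}
```

The sketch matches pciSlotNumber and Physical Slot literally, so it silently skips controllers whose numbers differ between ESXi and the guest (the 1184 versus 161 case described later in this post).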



WARNING: The PCI slot number changes when you change the SCSI controller type. This means that in some corner cases your VM will not boot; see below. PCI slot numbers are assigned from ranges that depend on the SCSI controller type, and LSI Logic Parallel gets the lowest numbers. (I didn't check BusLogic Parallel, which is supported only with 32-bit OSes.)

Let's say our VM is configured as follows, is bootable, and is up and running:



The VM config file looks like this:

# grep -i 'scsi[0-9]\.pciSlotNumber' /vmfs/volumes/a*/kb-c02/*.vmx
 
scsi0.pciSlotNumber = "16"
scsi1.pciSlotNumber = "33"
scsi3.pciSlotNumber = "34"
scsi2.pciSlotNumber = "192"


# grep -i 'scsi[0-9]\.virtualDev' /vmfs/volumes/a*/kb-c02/*.vmx

 
scsi1.virtualDev = "lsilogic"
scsi3.virtualDev = "lsilogic"
scsi2.virtualDev = "pvscsi"
scsi0.virtualDev = "lsilogic"


Someone decided to change the type of SCSI controller 0 to 'LSI Logic SAS':




The VM config now looks like this, and the VM can no longer boot because the SCSI controller order has changed. This is not unique to Linux; a Windows guest shows the same behaviour.

# grep -i 'scsi[0-9]\.pciSlotNumber' /vmfs/volumes/a*/kb-c02/*.vmx
 
scsi0.pciSlotNumber = "224"
scsi1.pciSlotNumber = "33"
scsi3.pciSlotNumber = "34"
scsi2.pciSlotNumber = "192"


# grep -i 'scsi[0-9]\.virtualDev' /vmfs/volumes/a*/kb-c02/*.vmx

 
scsi1.virtualDev = "lsilogic"
scsi3.virtualDev = "lsilogic"
scsi2.virtualDev = "pvscsi"
scsi0.virtualDev = "lsisas1068"
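
One way to spot this kind of change before powering the VM on is to list the controllers sorted by pciSlotNumber and compare the order before and after the edit. This is a sketch under an assumption: that the guest enumerates controllers in ascending slot order, which the example above suggests but does not prove. The function name vmx_slot_order is hypothetical.

```shell
# Hypothetical helper: reads .vmx lines (or grep output with a file name
# prefix) on stdin and prints "<pciSlotNumber> <controller>" sorted by
# ascending slot number.
vmx_slot_order() {
    sed -n 's/^.*\(scsi[0-9][0-9]*\)\.pciSlotNumber *= *"\([0-9][0-9]*\)".*$/\2 \1/p' |
        sort -n
}
```

With the first config above, scsi0 (slot 16) sorts first; after the change to LSI Logic SAS it sorts last (slot 224), behind scsi1, scsi3 and scsi2.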


In VM hardware version 11, VMware introduced a new .vmx parameter, scsiX.sasWWID (I haven't found it in earlier VM hardware versions):

# grep -i 'scsi[0-9]\.sasWWID' /vmfs/volumes/a*/kb-c02/*.vmx
 
scsi2.sasWWID = "50 05 05 62 80 46 0d 80"
scsi3.sasWWID = "50 05 05 62 80 46 0c 80"
scsi0.sasWWID = "50 05 05 62 80 46 0f 80"

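For PVSCSI controllers this parameter is exposed to the guest in the PCI capabilities, as the Device Serial Number (the same bytes with the two 4-byte halves swapped). Unfortunately, it is not exposed for LSI controllers: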

# lspci -v | grep -A 12 "0b:00.0"
 
0b:00.0 Serial Attached SCSI controller: VMware PVSCSI SCSI Controller (rev 02)
    Subsystem: VMware PVSCSI SCSI Controller
    Physical Slot: 192
    Flags: bus master, fast devsel, latency 0, IRQ 19
    I/O ports at 5000 [size=8]
    Memory at fd3f8000 (64-bit, non-prefetchable) [size=32K]
    [virtual] Expansion ROM at fd300000 [disabled] [size=64K]
    Capabilities: [40] Express Endpoint, MSI 00
    Capabilities: [7c] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [94] Power Management version 3
    Capabilities: [9c] MSI-X: Enable+ Count=24 Masked-
    Capabilities: [100] Device Serial Number 80-46-0d-80-50-05-05-62


# lspci -v | grep -A 12 "02:01.0"
 
02:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 01)
    Subsystem: VMware LSI Logic Parallel SCSI Controller
    Physical Slot: 33
    Flags: bus master, medium devsel, latency 64, IRQ 19
    I/O ports at 2000 [size=256]
    Memory at fd5c0000 (64-bit, non-prefetchable) [size=128K]
    Memory at fd5e0000 (64-bit, non-prefetchable) [size=128K]
    [virtual] Expansion ROM at fd500000 [disabled] [size=16K]
    Capabilities: [f8] PCI Advanced Features
    Kernel driver in use: mptspi
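
Comparing scsi2.sasWWID = "50 05 05 62 80 46 0d 80" with the Device Serial Number 80-46-0d-80-50-05-05-62 of the PVSCSI controller above, the two values hold the same bytes with the two 4-byte halves swapped. A minimal sketch of that conversion, assuming the pattern holds beyond this single pair; wwid_to_serial is a hypothetical name.

```shell
# Hypothetical helper: converts a scsiX.sasWWID value from the .vmx file
# into the "Device Serial Number" format printed by lspci -v.
# Assumption: the serial number is the WWID with its two halves swapped,
# as seen in the one example pair in this post.
wwid_to_serial() {
    # $1 holds eight space-separated hex bytes, e.g. "50 05 05 62 80 46 0d 80"
    set -- $1                                    # split into $1 .. $8
    echo "$5-$6-$7-$8-$1-$2-$3-$4" | tr 'A-F' 'a-f'
}
```

The result can then be searched for in the guest, for example with `lspci -v | grep -B 12 "Device Serial Number 80-46-0d-80-50-05-05-62"`.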




Another corner case: when we start with an LSI Logic SAS controller and attach three PVSCSI controllers, the last assigned slot is 1184. The same happens when the three added controllers are LSI Logic SAS.

# grep -i 'scsi[0-9]\.pciSlotNumber' /vmfs/volumes/a*/kb-c03/*.vmx
 
scsi0.pciSlotNumber = "160"
scsi1.pciSlotNumber = "224"
scsi2.pciSlotNumber = "256"
scsi3.pciSlotNumber = "1184"


# grep -i 'scsi[0-9]\.virtualDev' /vmfs/volumes/a*/kb-c03/*.vmx

scsi0.virtualDev = "lsisas1068"
scsi1.virtualDev = "pvscsi"
scsi2.virtualDev = "pvscsi"
scsi3.virtualDev = "pvscsi"


or 

# grep -i 'scsi[0-9]\.virtualDev' /vmfs/volumes/a*/kb-c03/*.vmx
 
scsi0.virtualDev = "lsisas1068"
scsi1.virtualDev = "lsisas1068"
scsi2.virtualDev = "lsisas1068"
scsi3.virtualDev = "lsisas1068"


The pciSlotNumber = "1184" will be translated to Physical Slot: 161


# lspci -v | grep -A 2 "04:00.0"
 
04:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
    Subsystem: VMware SAS Controller
    Physical Slot: 161


# lspci -v | grep -A 2 "04:00.0"
 
04:00.0 Serial Attached SCSI controller: VMware PVSCSI SCSI Controller (rev 02)
    Subsystem: VMware PVSCSI SCSI Controller
    Physical Slot: 161


WARNING: Even though the pciSlotNumber is as high as 1184, Linux sees it as Physical Slot 161. This means that your Linux drives will be enumerated in a different order, and the last controller will show up as the second one in Linux!
I am not sure whether this is a bug or a feature. I checked it on VMware ESXi 6.0.0 build-3029758.
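
For scripts that compare the two numbers, the translation can be sketched as a small shell function. It uses the rule of thumb suggested in the comments on this post (subtract 1023 when the pciSlotNumber is above 1023), which is only an observed heuristic that matches the two reported pairs, 1184 to 161 and 1248 to 225; it is not documented VMware behaviour. The function name guest_slot is hypothetical.

```shell
# Hypothetical helper: translate a .vmx pciSlotNumber into the Physical Slot
# the guest is expected to report. The "subtract 1023" rule is an observed
# heuristic only, checked against 1184 -> 161 and 1248 -> 225.
guest_slot() {
    if [ "$1" -gt 1023 ]; then
        echo $(( $1 - 1023 ))
    else
        echo "$1"
    fi
}
```

For example, `guest_slot 1184` prints 161, while slot numbers up to 1023 are passed through unchanged.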




3 comments:

  1. I've observed the same behavior with the pciSlotNumber 1184 showing up at 161. I thought I was going crazy, but was glad to see your post with the same issue. Hopefully this is consistent behavior, some scripts I've written assume now that 1184 = 161.

  2. I've had this behavior where the PCI slots 1184 and 1248 show up as 161 and 225 respectively. From what I could observe, when the slot number is over 1023, Linux wraps the count around, so if you subtract 1023, it will give you the correct number in the guest OS.

    1184-1023 = 161
    1248-1023 = 225

    Hope this helps anyone else who runs into this in the future. You could adapt scripts to check if the number is over 1023 and subtract that number.

    1. Awesome find, I too thought I am going crazy :D
      Btw windows does same thing ;)
