Monday 26 May 2014

Notes and thoughts after reading 'Storage Implementation in vSphere 5.0' by Mostafa Khalil.

I really enjoyed reading this book and found a lot of troubleshooting tips and hints in it. Mostafa has long experience in VMware support and access to VMware developers, so the book offers great real-life examples, with internals explained by the people who implemented important parts of the code. I highly recommend this book, which dives deep into vSphere storage implementations.


1. How to list the WWNN and WWPN of the HBA adapters:

# esxcfg-mpath -b | grep WWNN | sed 's/.*fc//;s/Target.*$//'

2. How to list Adapter WWPN and Target WWPN:

#  esxcfg-mpath -b | grep WWNN | sed 's/.*fc//' | awk '{ print $1, " ",$4, " ",$5, " ",$6, " ",$9, " ",$10 }'
Adapter:   WWPN: 20:00:00:06:f6:30:b4:ed Target:   WWPN: 50:06:01:64:3d:e0:2e:a0
Adapter:   WWPN: 20:00:00:06:f6:30:b4:ee Target:   WWPN: 50:06:01:65:3d:e0:2e:a0
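
For clarity, here is the same sed/awk pipeline run against a single sample esxcfg-mpath -b line (the WWNs below are made up for illustration, not taken from a real host):

```shell
# A sample path line in the format printed by "esxcfg-mpath -b" (made-up WWNs).
line='vmhba1:C0:T0:L0 LUN:0 state:active fc Adapter: WWNN: 20:00:00:06:f6:30:b4:ed WWPN: 21:00:00:06:f6:30:b4:ed Target: WWNN: 50:06:01:60:bd:e0:2e:a0 WWPN: 50:06:01:64:3d:e0:2e:a0'

# Strip everything up to the "fc" transport tag, then print the Adapter/Target
# labels and the two WWPN values - the same fields the awk above selects.
result=$(echo "$line" | sed 's/.*fc//' \
    | awk '{ print $1, " ",$4, " ",$5, " ",$6, " ",$9, " ",$10 }')
echo "$result"
```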


VMware recommends single-initiator zoning, but single-initiator/multiple-target zoning is also acceptable as long as the storage vendor supports it.

3. How to list HBA adapters:

# esxcfg-scsidevs -a
# esxcli storage core adapter list

HBA Name  Driver  Link State  UID                                   Description
--------  ------  ----------  ------------------------------------  ------------------------------------------------------
vmhba0    mptsas  link-n/a    sas.5588d09267486000                  (0:1:0.0) LSI Logic / Symbios Logic LSI1064E
vmhba1    fnic    link-up     fc.20000025b5011109:20000025b501a009  (0:10:0.0) Cisco Systems Inc Cisco VIC FCoE HBA Driver
vmhba2    fnic    link-up     fc.20000025b5011109:20000025b501b109  (0:11:0.0) Cisco Systems Inc Cisco VIC FCoE HBA Driver


4. FCoE notes:

"FCoE runs directly on Ethernet (not on top of TCP or IP like iSCSI) as a Layer 3 protocol and cannot be routed. Based on that fact, both initiators and targets (native FCoE targets) must be on the same network. If native FC targets are accessible via FCoE switches, the latter must be on the same network as the FCoE initiators."

"Any link supporting DCBX must have LLDP enabled on both ends of the link for Transmit/Receive (Tx/Rx). If LLDP to be disabled on a port for either Rx or Tx; DCBX TLV within received LLDP frames are ignored. That is the reason why the NIC must be bound to the vSwitch. Frames are forwarded to the Datacenter Bridging Daemon (DCBD) to DCBX via the CDP vmkernel module. The latter does both CDP and LLDP."

"FCoE Initialization Protocol (FIP), Jumbo frame (actually baby jumbo frames, which are configured on the physical switch, are used to accomodate the FC frame payload which is 2112 bytes long), FCoE, and DCBX modules are enabled in ESXi 5 Software FCoE initiator by default."

"You can configure up to four (4) SW FCoE adapters on a single vSphere 5 host."

"The number assigned to the vmhba is a hint to whether it is Hardware or Software FCoE Adapter. vmhba numbers lower that 32 are assigned to Hardware (SCSI-related) Adapters, for example, SCSI HBA, RAID Controller, FC HBA, HW FCoE, and HW iSCSI HBA. vmhba numbers 32 and higher are assigned to Software Adapters and non-SCSI Adapters, for example, SW FCoE, SW iSCSI Adapters, IDE, SATA, and USB storage controllers."

5. How to list vml and the device IDs to which they link:

# ls -la /dev/disks/
# ls -la /vmfs/devices/disks/

iSCSI notes:

6. How to check for iSCSI targets:

# esxcli iscsi adapter target list

7. How to identify the SW iSCSI initiator's VMkernel ports:

# esxcli iscsi logicalnetworkportal list 
 
8. How to dump SW iSCSI database:

# vmkiscsid --dump-db=<file-name> 
The dump includes the following sections:
* ISID: iSCSI Session ID information
* InitiatorNodes: iSCSI Initiator information
* Targets: iSCSI Targets information
* discovery: Target Discovery information
* ifaces: The iSCSI Network configuration including the vmnic and vmknic names 

PSA notes:

"Runtime Name, as the name indicates, does not persist between host reboots. This is due the possibility that any of the components that make up that name may change due to hardware or connectivity changes. For example, a host might have an additional HBA added or another HBA removed, which would change the number assumed by the HBA."

9. How to list paths to a device:

# esxcli storage nmp path list -d naa.123123123 | grep fc 

If you are using PowerPath you can run:

# /opt/emc/powerpath/bin/powermt display dev=all
 
10. How to identify the NAA ID using the vml ID:

# esxcfg-scsidevs -l -d vml.123123 | grep Display

VMFS notes:

11. /vmfs/devices is a symbolic link to /dev on ESXi 5.x 

12. How to re-create a partition table using partedUtil:

a. a healthy GPT VMFS partition table looks like this:

# partedUtil getptbl /dev/disks/naa.60000970000292600926533030313237

gpt
31392 255 63 504322560
1 2048 504322047 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
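
The second line of the getptbl output is the disk geometry: cylinders, heads, sectors per track, and the total sector count. Since sectors are 512 bytes, the last field gives a quick sanity check of the LUN size; a sketch using the numbers above:

```shell
# Total sector count is the last field of the geometry line above.
sectors=504322560
bytes=$((sectors * 512))                    # sectors are 512 bytes each
gib=$((bytes / 1024 / 1024 / 1024))
echo "approximately ${gib} GiB"             # about 240 GiB for this LUN
```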


b. here is what the partedUtil output looks like after the partition has been deleted. In this case, after a rescan, the affected LUN disappears from the vSphere Client.

# partedUtil getptbl /dev/disks/naa.60000970000292600926533030313237

gpt
31392 255 63 504322560


c. list partition type GUIDs:

# partedUtil showGuids
 Partition Type       GUID
 vmfs                 AA31E02A400F11DB9590000C2911D1B8
 vmkDiagnostic        9D27538040AD11DBBF97000C2911D1B8
 VMware Reserved      9198EFFC31C011DB8F78000C2911D1B8
 Basic Data           EBD0A0A2B9E5443387C068B6B72699C7
 Linux Swap           0657FD6DA4AB43C484E50933C84B4F4F
 Linux Lvm            E6D6D379F50744C2A23C238F2A3DF928
 Linux Raid           A19D880F05FC4D3BA006743F0F84911E
 Efi System           C12A7328F81F11D2BA4B00A0C93EC93B
 Microsoft Reserved   E3C9E3160B5C4DB8817DF92DF00215AE
 Unused Entry         00000000000000000000000000000000  

 
d. get the usable sectors for the affected LUN:

# partedUtil getUsableSectors /dev/disks/naa.60000970000292600926533030313237
34 504322526
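
The two numbers returned are the first and last usable sectors. The last one becomes the partition's end sector in the next step, while the start sector is 2048 for 1 MB alignment. A sketch assembling the setptbl argument string from the values above (it only prints the command, since partedUtil exists only on ESXi):

```shell
dev="/dev/disks/naa.60000970000292600926533030313237"
usable="34 504322526"                 # output of "partedUtil getUsableSectors"
last=${usable##* }                    # keep the second field: last usable sector
start=2048                            # 1 MB-aligned start sector
vmfs_guid="AA31E02A400F11DB9590000C2911D1B8"   # VMFS partition type GUID
spec="1 $start $last $vmfs_guid 0"    # partition number, start, end, GUID, attribute

# Print the command instead of running it.
echo partedUtil setptbl "$dev" gpt "$spec"
```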

e. create a new VMFS partition table:

# partedUtil setptbl "/dev/disks/naa.60000970000292600926533030313237" gpt "1 2048 504322526 AA31E02A400F11DB9590000C2911D1B8 0"
gpt
0 0 0 0
1 2048 504322526 AA31E02A400F11DB9590000C2911D1B8 0


f. check that the partition was created:

# partedUtil getptbl "/dev/disks/naa.60000970000292600926533030313237"
gpt
31392 255 63 504322560
1 2048 504322526 AA31E02A400F11DB9590000C2911D1B8 vmfs 0


g. rescan datastores:

# esxcfg-rescan -A
or
# vmkfstools -V

h. check /var/log/vmkernel.log for potential issues:

# tail -20 /var/log/vmkernel.log

i. refresh the vSphere Client storage/datastores pane; the affected LUN should reappear

j. if you get the error "The primary GPT table is corrupt, but the backup appears OK, so that will be used", run:

# partedUtil fix /dev/disks/naa.60000970000292600926533030313237

VMware does not offer data recovery services. 
Please see VMware KB: http://kb.vmware.com/kb/1015413


Virtual Disks and RDMs:

PVSCSI limitations:

- if you hot-add or hot-remove a virtual disk attached to a PVSCSI controller, you must rescan the SCSI bus from within the GOS (guest OS).
- if the virtual disks attached to the PVSCSI controller have snapshots, they do not benefit from the performance improvement
- if the ESXi host memory is overcommitted, the VM does not benefit from the PVSCSI performance improvement
- PVSCSI controllers are not supported for the GOS boot device
- MSCS clusters are not supported with PVSCSI.

When a VM snapshot is created, the parent disk remains unmodified. When the VM is powered on, the parent disk is opened with read-only locks. The same is done by VADP (vStorage APIs for Data Protection) when it backs up a virtual disk while the VM is running: the read-only lock allows multiple readers to access and open the parent virtual disk for reads, so the backup software can copy the parent disk.

.vmdk - virtual disk - the file without the -flat suffix is the descriptor file; the file with -flat is the extent file holding the data. The file with -delta is a delta disk extent to which new data is written after a snapshot is taken; its type is vmfsSparse.

.vmsn - VM snapshot file - this is the actual snapshot file, which holds the state of the VM configuration. It combines the original, unmodified content of both the vmx and vmxf files. If the VM was powered on at the time of taking the snapshot and you chose to snapshot the VM's memory, this file also includes the memory contents and the CPU state.
  
.vmsd - virtual machine snapshot dictionary - defines the snapshot hierarchy. This file is blank before a snapshot is taken.

.vmx - virtual machine configuration file - describes the VM structure and virtual hardware

.vmxf - virtual machine foundry file - holds information used by the vSphere Client when it connects to the ESXi host directly. This is a subset of the information stored in the vCenter database.

Perennial Reservation:

Having MSCS cluster nodes spread over several ESXi hosts necessitates the use of passthrough RDMs, which are shared among all hosts on which a relevant cluster node can run. As a result, each of these hosts has some RDMs reserved, whereas the remaining RDMs are reserved by the other hosts. At boot time, the LUN discovery and device claiming processes require a response from each LUN. Such a response takes much longer for LUNs reserved by other hosts, which results in an excessively prolonged boot time for all hosts in this configuration. The same issue may also affect the time a rescan operation takes to complete.

A perennial reservation is a device property that tells an ESXi host that a given LUN is permanently reserved by another host. At boot time or upon rescanning, the host does not wait for a response from a LUN in that state, which improves boot and rescan times on ESXi 5 hosts sharing MSCS LUNs. (This property cannot be set via Host Profiles in ESXi 5.0.)

# esxcli storage core device setconfig -d naa.123123 --perennially-reserved=true
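
With many shared RDMs it is convenient to script this. A minimal sketch that only prints the esxcli command for each LUN (the NAA IDs are placeholders; on a real ESXi host, drop the echo to actually apply the setting):

```shell
# Placeholder NAA IDs of the MSCS shared RDM LUNs - replace with your own.
rdm_luns="naa.60000970000292600926533030313237 naa.60000970000292600926533030313238"

cmds=$(for lun in $rdm_luns; do
    # echo only prints the command; remove it to execute on a real host
    echo esxcli storage core device setconfig -d "$lun" --perennially-reserved=true
done)
echo "$cmds"
```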
