Tuesday 20 May 2014

Troubleshooting vSphere Storage - Mike Preston - REVIEW

I have just finished reading "Troubleshooting vSphere Storage" by Mike Preston. My first impression is that this book only scratches the surface of the huge and complex topic of storage troubleshooting in VMware infrastructure. In general I recommend it because it is a cheap, one-evening read. I found some inaccurate information, and I hope it will be corrected in a potential second edition. I learnt or refreshed some topics. The author seems to have real experience, but he did not delve deep enough.

CONS:

I don't agree with the author that NFS is a distributed file system in the same sense as VMFS.
The sentence "NFS, like VMFS, is also a distributed file system and has been around for nearly 20 years" is inaccurate IMHO, or at least too simplified.

Please see: 
http://en.wikipedia.org/wiki/Network_File_System.

NFS stands for Network File System, but it is a protocol: a distributed file system PROTOCOL.
You cannot format a disk with NFS. You can, for example, build a DIY Nexenta storage box with the ZFS filesystem and export a volume to an ESXi host via the NFS protocol. (BTW: both NFS and ZFS were invented by Sun Microsystems.) You can also export an NTFS volume from a Windows machine or an ext4 filesystem from Linux to ESXi hosts via the NFS protocol.

Maybe I'm too pedantic, but the command:

cat /var/log/vmkernel.log | grep SCSI | grep -i Failed

can be replaced by:

egrep SCSI /var/log/vmkernel.log | egrep -i Failed

You save one process ;-) and some typing.
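
If the goal is to save processes, both filters can run inside a single one. This is only a minimal sketch, assuming awk is available in the shell you are using. The function name scsi_failures is made up for illustration. It keeps the semantics of the commands above: a case-sensitive match on SCSI and a case-insensitive match on Failed.

```shell
# Hypothetical helper: print lines that contain "SCSI" (case-sensitive)
# and "failed" (case-insensitive) using one awk process instead of a pipeline.
# The log path defaults to the one used in the commands above.
scsi_failures() {
    awk '/SCSI/ && tolower($0) ~ /failed/' "${1:-/var/log/vmkernel.log}"
}

# Usage: scsi_failures                 (reads /var/log/vmkernel.log)
#        scsi_failures /path/to/file   (reads a copy of the log)
```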

You can use short links to VMware KB articles instead of long copy/pasted URLs:
http://kb.vmware.com/kb/289902

The sentence "In order for ESXi to properly host VMs on NFS server, we need to have a minimum of read/write permission." is inaccurate.
You must set the "no_root_squash" option in addition to read/write permission. With read/write permission only, you can still create the datastore, but you will not be able to create any virtual machines on it.

Please see:
http://www.vmware.com/files/pdf/VMware_NFS_BestPractices_WP_EN.pdf
http://kb.vmware.com/kb/1005948
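
As an illustration of the point above, here is a small sketch that checks whether an export option list contains both required options. It assumes a Linux-style comma-separated option string such as the one found in /etc/exports. Other NFS servers express the same settings differently. The function name export_opts_ok is hypothetical.

```shell
# Hypothetical helper: succeeds only when a Linux-style export option list
# (for example "rw,no_root_squash,sync") contains both rw and no_root_squash.
export_opts_ok() {
    opts=",$1,"          # pad with commas so whole options are matched
    case "$opts" in
        *,rw,*) ;;
        *) return 1 ;;
    esac
    case "$opts" in
        *,no_root_squash,*) return 0 ;;
        *) return 1 ;;
    esac
}

export_opts_ok "rw,no_root_squash,sync" && echo "options look usable for ESXi"
export_opts_ok "rw,sync" || echo "no_root_squash is missing"
```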

The sentence "Take for instance RAID5. When we issue one single write I/O to a group of drives in a RAID5 array, due to parity and spanning, the array actually incurs four writes or four IOPs." is inaccurate.

For a single write operation on RAID5, the controller performs two disk reads (old data and old parity) and two disk writes (new data and new parity). The WRITE penalty is therefore 4 I/Os, but only two of them are writes.
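
To see what the write penalty means for sizing, here is a minimal arithmetic sketch. The function name backend_iops and the workload figures are made up for illustration. It only applies the usual rule of thumb that every front-end write costs "penalty" back-end I/Os, while a read costs one.

```shell
# Hypothetical helper: back-end disk IOPS generated by a front-end workload.
# $1 = front-end read IOPS, $2 = front-end write IOPS, $3 = RAID write penalty
backend_iops() {
    echo $(( $1 + $2 * $3 ))
}

# Made-up example: 700 reads/s and 300 writes/s on RAID5 (penalty 4)
backend_iops 700 300 4   # prints 1900
```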

PROS:

I like the information about vCenter Storage Views (Reports and Maps).
"Storage View reports are a great way to view just how much snapshot space any VMs are using. That being said, what is displayed by default is not all that has been collected. If we right-click along the column header, we can see that there are many other columns that can be added and removed from report, which can be resized, reordered, and sorted in any manner we prefer."

"You can filter any column using the filter box on the top right-hand corner. Simply select the column you would like to filter from selecting the down arrow and type in your expression in the filter box"

"Tip: Most of the storage logging in ESXi is enabled by default; however, there are some things that are not. Look for parameters starting with SCSI.Log in the host's advanced settings to see what is enabled and what is not. Simply toggling these values 1 or 0 turns them on and off."

vCenter Server Storage filters:

"Apart from Host Rescan Filter, the other filters do just as they describe; filter out LUNs. Although these filters are in place to protect us from inadvertently destroying or corrupting data, sometimes they introduce troubles when we actually have a legitimate reason to see the LUNs that are being filtered. Take for instance setting up a MSCS inside vSphere. In order to do so, we need to map the same raw LUN or RDM to all virtual machines participating in the cluster. With RDM Filter running under its default configuration, only the host which is running the first VM to see the RDM would see this storage, and storage filters would filter that RDM on all other hosts. This is certainly a valid scenario where we would need to disable the RDM Filter in order to gain visibility to the LUN from other VMs on other hosts."

Open the VI Client -> go to the Administration drop-down menu -> vCenter Server Settings -> Advanced Settings -> add the appropriate key, e.g. config.vpxd.filter.rdmFilter with the value false

"Tip: The maximum LUN ID used in a storage rescan is configurable by an advanced setting within ESXi called Disk.MaxLun. This setting is located in the Advanced Settings section of the host's Configuration tab and must be set on a per-host basis. While lowering the default value can increase the speed of rescans and boot up, there is a risk that an administrator will provision a LUN with ID greater than what they have set in Disk.MaxLun. ESXi will never see or discover that LUN. This will occur even if you are under the configuration maximum of 256 LUNs per host. If we experience issues with discovering block storage and can't access the LUN initially, Disk.MaxLun is a great setting to check."

# esxcli system settings advanced list -o /Disk/MaxLUN
   Path: /Disk/MaxLUN
   Type: integer
   Int Value: 256
   Default Int Value: 256
   Min Value: 1
   Max Value: 256
   String Value:
   Default String Value:
   Valid Characters:
   Description: Maximum LUN id (N+1) that is scanned on a target
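
Reading the Description field literally ("Maximum LUN id (N+1)"), a value of 256 should mean that LUN IDs 0 through 255 are scanned. The sketch below encodes that reading as an assumption, not as documented behaviour. The function name lun_is_scanned is hypothetical.

```shell
# Hypothetical helper: succeeds if a LUN ID falls inside the scanned range,
# ASSUMING IDs 0 .. (MaxLUN - 1) are scanned, per the "(N+1)" description.
# $1 = LUN ID, $2 = value of /Disk/MaxLUN
lun_is_scanned() {
    [ "$1" -ge 0 ] && [ "$1" -lt "$2" ]
}

lun_is_scanned 255 256 && echo "LUN 255 is scanned with the default value"
lun_is_scanned 40 32 || echo "LUN 40 is skipped when MaxLUN is lowered to 32"
```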


Table of required ports for NFS and iSCSI
Service  Port  Protocol
NFS      111   udp/tcp
NFS      2049  udp/tcp  <- ESXi 5.1.0 and above
iSCSI    3260  tcp

Enable verbose iSCSI logging:

# vmkiscsid -x "insert into internal (key,value) VALUES ('option.LogLevel','999');"

Disable verbose iSCSI logging:

# vmkiscsid -x "delete from internal where key='option.LogLevel';"

Enable verbose NFS logging:

# esxcfg-advcfg -s 1 /NFS/LogNfsStat3

Disable verbose NFS logging:

# esxcfg-advcfg -s 0 /NFS/LogNfsStat3

List, enable, and disable verbose NFS logging using esxcli:

# esxcli system settings advanced list -o /NFS/LogNfsStat3
# esxcli system settings advanced set -o /NFS/LogNfsStat3 -i 1
# esxcli system settings advanced set -o /NFS/LogNfsStat3 -i 0

Check for 'pending reservations' (I use the -s switch for more readable output):

# esxcfg-info -s | egrep -B16 -i "pending reservations"

Reset the LUN to clear a pending reservation:

# vmkfstools --lock lunreset /vmfs/devices/disks/naa.xxxyyyzzz

VMware KB on how to change the PVSCSI queue in a Windows OS:
http://kb.vmware.com/kb/1017423

"In order to confirm that we have Jumbo Frames set up properly in our environment, we can pass an MTU size to our vmkping command that we send to the storage array[...] using the standard Jumbo Frames MTU of 9000 and header size of 28 bytes, we would use the following command to test our configuration:"

# vmkping -s 8972 -d 1.2.3.4
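
The 8972 above is simply the MTU minus the 28 bytes of headers (a 20-byte IPv4 header without options plus an 8-byte ICMP header). A tiny sketch of that arithmetic follows. The function name icmp_payload is made up for illustration.

```shell
# Hypothetical helper: largest ICMP payload that fits in one frame of the given MTU.
# 28 bytes = 20-byte IPv4 header (no options) + 8-byte ICMP header.
icmp_payload() {
    echo $(( $1 - 28 ))
}

icmp_payload 9000   # prints 8972, the value passed to vmkping -s
icmp_payload 1500   # prints 1472, the equivalent for a standard MTU
```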

Verbose logging for fibre channel:

Advanced Settings:

# esxcli system settings advanced list -o /Scsi/LogCmdErrors
# esxcli system settings advanced list -o /Scsi/LogScsiAborts
# esxcli system settings advanced set -o /Scsi/LogScsiAborts -i 1

Scsi.LogCmdErrors
Scsi.LogScsiAborts <- for ESXi 5.0.0 [build-821926] this setting is available only from the GUI

SUMMARY:

I learnt and refreshed a lot of information from the storage troubleshooting area. My next reading will be "Storage Implementation in vSphere 5.0" by Mostafa Khalil and "vSphere High Performance Cookbook" by Prasenjit Sarkar.







