Friday, 15 August 2014

Notes from the INF-STO2564 session provided by Aboubacar Diare from HP.

ATS - Atomic Test & Set primitive.
SCSI-2 reservation - the entire volume is locked.
ATS - only the affected block is locked.
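
A quick way to confirm whether a given LUN actually supports ATS (and the other VAAI primitives) is the esxcli VAAI status query - the naa ID below is just a placeholder for one of your devices; the output lists the ATS, Clone, Zero and Delete status per device:

# esxcli storage core device vaai status get -d naa.60060160xxxxxxxxxxxxxxxxxxxxxxxx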

VMFS3 layout - resources (inodes, blocks, sub-blocks, etc.) are organized into clusters; each cluster has an associated lock and metadata. Clusters form cluster groups, and cluster groups repeat to make up the filesystem.

ATS Values:
* space locality - ESXi hosts strive for contiguous blocks for the objects they manage
* reduced storage resource contention
* larger VMFS datastore sizes
* higher VM density
* reduced datastore management
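
To check whether an existing datastore already uses ATS-only locking, vmkfstools can print the volume mode - on an ATS-only VMFS5 volume it reports something like "Mode: public ATS-only" (the datastore name below is a placeholder):

# vmkfstools -Ph -v 10 /vmfs/volumes/MyDatastore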

ATS Caveats:

* vMotion vs. Storage vMotion, e.g.:

1.) on esxi1 we create 10 VMs which take up 500 GB at the beginning of a 1 TB volume
2.) host esxi1 is accessing this contiguous area
3.) on esxi2 we create a VM which allocates 100 GB on the same 1 TB volume, right after the 500 GB area accessed by esxi1. Both hosts work fine without any storage contention.
4.) now we vMotion 5 VMs to esxi2
5.) the esxi2 host gets access to the contiguous space previously accessed only by esxi1
6.) now the hosts start contending for resources in different areas of the disk - space locality is disturbed.

Do vMotion only when it is needed, e.g. on a DRS recommendation, instead of causing multiple hosts to access disk regions which other hosts will need, increasing the potential for contention on these resources. From this point of view Storage vMotion is preferable to classic vMotion.

It doesn't mean don't use vMotion !!! It means think about space locality when you scale out a vSphere cluster !!!

* free capacity (filling up VMFS) - keep free capacity on the datastore; ESXi doesn't like full datastores ;)
* Storage vMotion forces ESXi to find contiguous space.

UNMAP primitive:

# esxcli system settings advanced set -i 0 -o /VMFS3/EnableBlockDelete
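
Before changing it, you can check the current value of the same advanced option (this only shows the flag, it does not trigger any reclaim):

# esxcli system settings advanced list -o /VMFS3/EnableBlockDelete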

# vmkfstools -y X    (X = 1-100, percentage of free space to reclaim; default 60%)

VMware disabled automatic UNMAP - only manual reclaiming is possible.
Reclaiming space has a huge impact on performance: the UNMAP operation creates a balloon file equal in size to the space to be de-allocated, and this balloon file causes many write I/Os.

Avoid X > 90% !!!

You may not get back the space which you expected to get back!
The percentage is derived from the free space available on the VMFS datastore, e.g.:

a 2 TB datastore that is 50% full, with vmkfstools -y 50, will attempt to reclaim 0.5 TB (512 GB) = 50% of the 1 TB of free space.
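
Putting it together, a conservative manual reclaim run would look like the sketch below - vmkfstools -y is meant to be executed from inside the datastore's root directory, and the datastore name is a placeholder:

# cd /vmfs/volumes/MyDatastore
# vmkfstools -y 60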

VAAI Consideration/Hidden Benefits:

* fewer commands sent to the storage array.

The UNMAP command is different and depends on the array implementation - it could result in more commands being sent than without the UNMAP primitive.

Some VAAI implementations will NOT work if the datastore is not aligned (an alignment check follows this list):

* VMFS3 datastores are aligned to 64 kB
* VMFS5 datastores are aligned to 1 MB
* some arrays may reject non-aligned VAAI operations
* concurrent clone operations:
-- may be throttled by the array at some threshold (the array's current backend processing limit)
-- limit the number of concurrent clone/zero operations to 3 or 4.
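
A simple way to check how a datastore's partition is aligned is partedUtil - the start sector of the VMFS partition tells you the alignment (with 512-byte sectors, a start sector of 2048 means 1 MB alignment, 128 means 64 kB). The naa ID below is a placeholder:

# partedUtil getptbl /vmfs/devices/disks/naa.60060160xxxxxxxxxxxxxxxxxxxxxxxx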

VAAI and array replication - expect some performance degradation for VAAI operations on replicated LUNs.

The Fixed I/O path policy is NOT ALUA aware and is not recommended for ALUA arrays.
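
You can quickly see which SATP and path selection policy a LUN is currently claimed with (and therefore whether an ALUA device ended up with the Fixed policy), for example:

# esxcli storage nmp device list -d naa.60060160xxxxxxxxxxxxxxxxxxxxxxxx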

Changing MPIO settings for multiple LUNs:

# for i in $(esxcli storage nmp device list | grep '^naa.6001') ; do esxcli storage nmp device set -P VMW_PSP_RR -d $i; done

# for i in $(esxcli storage nmp device list | grep '^naa.6001') ; do esxcli storage nmp psp roundrobin deviceconfig set -t iops -I 1 -d $i; done
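
The loops above only touch LUNs that already exist. To have newly presented LUNs claimed with Round Robin automatically, the default PSP for the relevant SATP can be changed as well - a sketch assuming the array is claimed by VMW_SATP_ALUA (verify your SATP first with the device list command):

# esxcli storage nmp satp set -s VMW_SATP_ALUA -P VMW_PSP_RR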


FIXED_AP - ALUA aware.
ESXi 5 rolled the FIXED_AP functionality into the FIXED I/O path policy. FIXED_AP is only applicable to ALUA-capable arrays.

Is FIXED in ESXi 5 recommended for ALUA arrays?
NO for general usage (risk of LUN thrashing). YES as a tool for quickly restoring balance in an unbalanced array LUN configuration.

Round Robin - a lower IOPS setting is generally better; a higher IOPS value can help sequential workloads.
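
To verify which IOPS value a Round Robin device is currently using (again, the naa ID is a placeholder):

# esxcli storage nmp psp roundrobin deviceconfig get -d naa.60060160xxxxxxxxxxxxxxxxxxxxxxxx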
  
 

Tuesday, 12 August 2014

No access to VM console; Unable to connect to the MKS: A general system error occurred: Internal error

All 16 hosts in the cluster had been up and running for a long time without any issue - uptime 300+ days. On all hosts we could not get access to the VM console. When opening a VM console from the vSphere Client we got the error "Unable to connect to the MKS: A general system error occurred: Internal error".
We also could not vMotion VMs to other ESXi hosts in the cluster.


We logged in to the ESXi hosts and noticed that the root ramdisk was full:

# vdf -h | tail -6
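
Another view of the same problem is the ramdisk list, which shows the size and usage of the root, etc, tmp and other ramdisks:

# esxcli system visorfs ramdisk list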


The uptime of the ESXi hosts was impressive:

# uptime


When we tried to get information about the virtual machines using the vim-cmd command, we got an error:

# vim-cmd vmsvc/getallvms



To figure out what consumed the space on the root ramdisk, we ran the command:

# find / -size +10k -exec du -h {} \; | egrep -v volumes | egrep -v disks  | less

I spotted a lot of EMCProvider logs in /opt/emc/cim/log:

# ls -l | head -5


And bingo! These logs were eating the space:

# du -h /opt/emc/cim/log/


It seems that the EMCProvider logs had not been rotated and filled up the root ramdisk. I couldn't find any parameter in the provider's conf file to set up rotation of the EMCProvider logs - it is more a feature than a bug ;)

We deleted logs older than 200 days (eventually we deleted all EMCProvider logs older than 1 day) on the ESXi hosts in the cluster:

# cd /opt/emc/cim/log/
# find . -name '*.log' -mtime +200 -exec rm -f {} \;
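
Since we could not find any rotation setting, one possible stop-gap (we did not deploy it, so treat it purely as a sketch - the schedule and the 30-day retention below are made up) would be to add the same cleanup to the root crontab on each host; note that on ESXi such crontab changes do not survive a reboot unless they are re-applied, e.g. from /etc/rc.local.d/local.sh:

# echo '0 3 * * * find /opt/emc/cim/log -name "*.log" -mtime +30 -exec rm -f {} \;' >> /var/spool/cron/crontabs/root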

We got some free space back on root and were able to get access to some VM consoles, but some VMs started to show another error: 'Unable to connect to the MKS: Failed to connect to server fqdn.com:902':


We identified that the VMs encountering the error above were located on 3 ESXi hosts.

We noticed that on the affected ESXi hosts nothing was listening on port 902, even though we already had enough free space on the root ramdisk:

# esxcli network ip connection list | grep :902

 
VMs which no longer encountered the VM console access issue were located on ESXi hosts where 'busybox' was listening on port 902:

 
We decided to put the affected ESXi hosts into MM (Maintenance Mode) and reboot them. After the ESXi host reboot, 'busybox' started to listen on port 902 and the VM console issue was gone.
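
A lighter alternative which we did not test: since it is busybox inetd that spawns the authd listener on port 902, sending it a SIGHUP to re-read its configuration might bring the listener back without a full reboot - purely a guess on our side:

# ps | grep inetd
# kill -HUP <inetd_PID>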

The main take-away is that a full root ramdisk is an abnormal condition. Remembering that in the *nix world everything is a file may explain why some hosts could not create a TCP socket for port 902 while root was full - and why the listener did not come back on its own even after we freed some space on the root ramdisk.

Here are all the steps in one printscreen:



 The End.