Troubleshooting

VMware Update Manager reports "error code 99" during scan operation

After updating my lab to VMware vSphere 6.0 U2, one of my hosts continuously thrown an error during an update scan.

Host returns ESX error code 99, unhandled exception has occurred

The first thing I’ve checked was the esxupdate.log on the affected ESXi host. This is the output, that was logged during a scan operation.

2016-04-04T13:42:13Z esxupdate: vmware.runcommand: INFO: runcommand called with: args = '['/sbin/esxcfg-advcfg', '-q', '-g', '/UserVars/EsximageNetTimeout']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
2016-04-04T13:42:14Z esxupdate: vmware.runcommand: INFO: runcommand called with: args = '['/sbin/esxcfg-advcfg', '-q', '-g', '/UserVars/EsximageNetRetries']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
2016-04-04T13:42:14Z esxupdate: vmware.runcommand: INFO: runcommand called with: args = '['/sbin/esxcfg-advcfg', '-q', '-g', '/UserVars/EsximageNetRateLimit']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
2016-04-04T13:42:14Z esxupdate: esxupdate: INFO: ---
Command: scan
Args: ['scan']
Options: {'nosigcheck': None, 'retry': 5, 'loglevel': None, 'cleancache': None, 'viburls': None, 'meta': ['http://vum.lab.local:9084/vum/repository/hostupdate/10960002/metadata_1456989617.zip', 'http://vum.lab.local:9084/vum/repository/hostupdate/vmw/vmw-ESXi-6.0.0-metadata.zip'], 'proxyurl': None, 'timeout': 30.0, 'cachesize': None, 'hamode': True, 'maintenancemode': None}
2016-04-04T13:42:14Z esxupdate: BootBankInstaller.pyc: DEBUG: Creating an empty ImageProfile for bootbank /bootbank
2016-04-04T13:42:14Z esxupdate: vmware.runcommand: INFO: runcommand called with: args = '['/sbin/bootOption', '-rp']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
2016-04-04T13:42:14Z esxupdate: vmware.runcommand: INFO: runcommand called with: args = '['/sbin/bootOption', '-ro']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
2016-04-04T13:42:14Z esxupdate: downloader: DEBUG: Downloading http://vum.lab.local:9084/vum/repository/hostupdate/10960002/metadata_1456989617.zip to /tmp/tmpWW6WJC...
2016-04-04T13:42:17Z esxupdate: Metadata.pyc: INFO: Unrecognized file vendor-index.xml in Metadata file
2016-04-04T13:42:17Z esxupdate: downloader: DEBUG: Downloading http://vum.lab.local:9084/vum/repository/hostupdate/vmw/vmw-ESXi-6.0.0-metadata.zip to /tmp/tmpKfdI64...
2016-04-04T13:42:20Z esxupdate: Metadata.pyc: INFO: Unrecognized file vendor-index.xml in Metadata file
2016-04-04T13:42:21Z esxupdate: BootBankInstaller.pyc: DEBUG: Creating an empty ImageProfile for bootbank /bootbank
2016-04-04T13:42:22Z esxupdate: vmware.runcommand: INFO: runcommand called with: args = '['/usr/sbin/vsish', '-e', '-p', 'cat', '/hardware/bios/dmiInfo']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
2016-04-04T13:42:22Z esxupdate: vmware.runcommand: INFO: runcommand called with: args = '['/sbin/smbiosDump']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
2016-04-04T13:42:22Z esxupdate: BootBankInstaller.pyc: DEBUG: Creating an empty ImageProfile for bootbank /bootbank
2016-04-04T13:42:22Z esxupdate: BootBankInstaller.pyc: DEBUG: Creating an empty ImageProfile for bootbank /bootbank
2016-04-04T13:42:23Z esxupdate: BootBankInstaller.pyc: DEBUG: Creating an empty ImageProfile for bootbank /bootbank
2016-04-04T13:42:23Z esxupdate: Transaction: DEBUG: Populating VIB list from all VIBs in metadata http://vum.lab.local:9084/vum/repository/hostupdate/10960002/metadata_1456989617.zip; depots:
2016-04-04T13:42:23Z esxupdate: downloader: DEBUG: Downloading http://vum.lab.local:9084/vum/repository/hostupdate/10960002/metadata_1456989617.zip to /tmp/tmpFQrdX3...
2016-04-04T13:42:23Z esxupdate: Metadata.pyc: INFO: Unrecognized file vendor-index.xml in Metadata file
2016-04-04T13:42:23Z esxupdate: Transaction: DEBUG: Populating VIB list from all VIBs in metadata http://vum.lab.local:9084/vum/repository/hostupdate/vmw/vmw-ESXi-6.0.0-metadata.zip; depots:
2016-04-04T13:42:23Z esxupdate: downloader: DEBUG: Downloading http://vum.lab.local:9084/vum/repository/hostupdate/vmw/vmw-ESXi-6.0.0-metadata.zip to /tmp/tmplZxgm6...
2016-04-04T13:42:23Z esxupdate: Metadata.pyc: INFO: Unrecognized file vendor-index.xml in Metadata file
2016-04-04T13:42:24Z esxupdate: esxupdate: ERROR: An unexpected exception was caught:
2016-04-04T13:42:24Z esxupdate: esxupdate: ERROR: Traceback (most recent call last):
2016-04-04T13:42:24Z esxupdate: esxupdate: ERROR:   File "/usr/sbin/esxupdate", line 238, in main
2016-04-04T13:42:24Z esxupdate: esxupdate: ERROR:     cmd.Run()
2016-04-04T13:42:24Z esxupdate: esxupdate: ERROR:   File "/build/mts/release/bora-3620759/bora/build/esx/release/vmvisor/sys-boot/lib/python2.7/site-packages/vmware/esx5update/Cmdline.py", line 113, in Run
2016-04-04T13:42:24Z esxupdate: esxupdate: ERROR:   File "/build/mts/release/bora-3620759/bora/build/esx/release/vmvisor/sys-boot/lib/python2.7/site-packages/vmware/esx5update/MetadataScanner.py", line 244, in Scan
2016-04-04T13:42:24Z esxupdate: esxupdate: ERROR:   File "/build/mts/release/bora-3620759/bora/build/esx/release/vmvisor/sys-boot/lib/python2.7/site-packages/vmware/esx5update/MetadataScanner.py", line 106, in _generateOperationData
2016-04-04T13:42:24Z esxupdate: esxupdate: ERROR:   File "/build/mts/release/bora-3620759/bora/build/esx/release/vmvisor/sys-boot/lib/python2.7/site-packages/vmware/esx5update/MetadataScanner.py", line 88, in _getInstallProfile
2016-04-04T13:42:24Z esxupdate: esxupdate: ERROR: AttributeError: 'NoneType' object has no attribute 'Copy'
2016-04-04T13:42:24Z esxupdate: esxupdate: DEBUG: <<<

You might notice the “Unrecognized file vendor-index.xml in Metadata file” error. I also found this error message on the other hosts, so I excluded it from further research. It was unlikely, that this error was related to the observed problem. I started searching differences between the hosts and found out, that the output of “esxcli software vib list” was different on the faulty host.

Data Protector: Exchange backup failes because of database lock

Today I had a customer call, where a Exchange 2010 backup repeatedly failed. HPE Data Protector was unable to create a differential or incremental backup. For each database, the following error was logged:

[Minor] From: OB2BAR_E2010_BAR@exchangeserver.domain.tld "MS Exchange 2010+ Server"  Time: 21.03.2016 20:00:27
[170:313] 	One or more copies of database DATABASE are already being backed up in a different session.

Interestingly, there was no other backup session running. But the night before, the backup jobs failed because of a network failure.

ALE OmniSwitch stack does not form due to incompatible licenses

Today I saw an interesting behaviour of two Alcatel-Lucent Enterprise OmniSwitch 6450. Both switches has been configured as a stack, but one of the switches showed a flashing ID after the startup, and the stack was not formed. While I checked the logs and the status of the stack, I noticed that the slot number was incorrect. Furthermore the status showed “INC-LIC”.

-> show stack topology
                                         Link A  Link A          Link B  Link B
NI      Role      State   Saved  Link A  Remote  Remote  Link B  Remote  Remote
                          Slot   State   NI      Port    State   NI      Port
----+-----------+--------+------+-------+-------+-------+-------+-------+-------
   1 PRIMARY     RUNNING    1    UP       1001   StackB  DOWN        0        0
1001 PASS-THRU   INC-LIC    2    DOWN        0        0  UP          1   StackA

-> show log swlog
<snip>
THU MAR 03 13:07:29 2016  STACK-MANAGER    info == SM == Stack Port A Status Changed: DOWN
THU MAR 03 13:07:29 2016  STACK-MANAGER    info == SM == NI 0 down notification sent to LAG
THU MAR 03 13:08:41 2016  STACK-MANAGER    info == SM == Stack Port A Status Changed: UP
THU MAR 03 13:08:41 2016  STACK-MANAGER    info == SM == Stack Port A MAC Frames TX/RX Enabled
THU MAR 03 13:08:42 2016  STACK-MANAGER    info  Retaining Module Id for slot 2 unit 0 as 1
THU MAR 03 13:08:46 2016  STACK-MANAGER    info == SM == An element enters passthru mode (incompatible license)
<snip>

According to the stack status and the switch logs, there seems to be a problem with the licenses. So I checked the installed licenses on both switches. On switch showed Metro license:

Using VCSA as remote syslog - Don't forget the log rotation!

Important note: It seems that vCenter Server Appliance updates revert the changes. Please check the settings after each update!

The VMware vCenter Server Appliance (VCSA) can act as a remote syslog destition for ESXi hosts. This is very handy for troubleshooting and I really recommend to use this feature.  But VMware ESXi hosts can be really chatty and therefore it’s a good idea to keep an eye on the free disk space of the VCSA.

Chicken-and-egg problem: 3PAR VSP 4.3 MU1 & 3PAR OS 3.2.1 MU3

Since monday I’m helping a customer to put two HP 3PAR StoreServ 7200c into operation. Both StoreServs came factory-installed with 3PAR OS 3.2.1 MU3, which is available since July 2015. Usually, the first thing you do is to deploy the 3PAR Service Processor (SP). These days this is (in most cases) a Virtual Service Processor (VSP). The SP is used to initialize the storage system. Later, the SP reports to HP and it’s used for maintenance tasks like shutdown the StoreServ, install updates and patches. There are only a few cases in which you start the Out-of-the-Box (OOTB) procedure of the StoreServ without having a VSP. I deployed two (one VSP for each StoreServ) VSPs, started the Service Processor Setup Wizard, entered the StoreServ serial number and got this message:

Microsoft Exchange 2013 shows blank ECP & OWA after changes to SSL certificates

This issue is described in KB2971270 and is fixed in CU6.

I ran a couple of times in this error. After applying changes to SSL certificates (add, replace or delete a SSL certificate) and rebooting the server, the event log is flooded with events from source “HttpEvent” and event id 15021. The message says:

An error occurred while using SSL configuration for endpoint 0.0.0.0:444. The error status code is contained within the returned data.

DataCore mirrored virtual disks full recovery fails repeatedly

Last sunday a customer suffered a power outage for a few hours. Unfortunately the DataCore Storage Server in the affected datacenter weren’t shutdown and therefore it crashed. After the power was back, the Storage Server was started and the recoveries for the mirrored virtual disks started. Hours later, three mirrored virtual disks were still running full recoveries and the recovery for each of them failed repeatedly.

Patrick Terlisten/ vcloudnine.de/ Creative Commons CC0

Patrick Terlisten/ vcloudnine.de/ Creative Commons CC0

Windows guest customization fails after cloning a VM

Last week I got a call from a customer. The customer has tried to deploy new Citrix XenApp servers, and because the VMware template was a bit outdated, he tried to clone a provisioned and running Citrix XenApp VM. During this, the customer applied a guest customization specification to customize the guest OS (IP address, hostname etc). Until this point everything was fine. But after the clone process, the guest customization started, but never finished.

vCenter Server Appliance: Troubleshooting full database partition

A customer of mine had within 6 months twice a full database partition on a VMware vCenter Server Appliance. After the first outage, the customer increased the size of the partition which is mounted to /storage/db. Some months later, some days ago, the vCSA became unresponsive again. Again because of a filled up database partition. The customer increased the size of the database partition again  (~ 200 GB!!) and today I had time to take a look at this nasty vCSA.