vCenter Server migration to 6.7 fails with "Failed to check VMware STS. The SSL certificate of STS service cannot be verified"
There are still customers out there running vCenter Server on a Windows host. This year, even though most customers have put projects on hold, I managed to migrate some of them to the vCenter Server Appliance.
A few days ago I had a meeting with one of my favorite customers to migrate their vCenter Server 6.5 to a vCenter Server Appliance 6.7 U3l. They had stayed on 6.5 because of some legacy ESXi 5.5 hosts, but once they had removed those hosts from their vCenter, we were able to start the migration.
Healthcheck & Stage 1
We did a health check the day before, so we were pretty sure that everything would go smoothly. Stage 1 was easy: we deployed the appliance (X-Large… *cough* *cough*) and moved on to stage 2. We had about 20 GB of data to migrate. IMHO nothing fancy; I have migrated vCenters with more data.
Stage 2… and FAIL
To make a long story short: Stage 2 failed pretty hard with an unrecoverable error.
Encountered an internal error.
Traceback (most recent call last):
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 1641, in main
    vmidentityFB.boot()
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 345, in boot
    self.checkSTS(self.__stsRetryCount, self.__stsRetryInterval)
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 1179, in checkSTS
    raise Exception('Failed to initialize Secure Token Server.')
Exception: Failed to initialize Secure Token Server.
Deep dive into the log files
The message indicated a problem with the Secure Token Server (STS). Without any troubleshooting, we tried a second time, which failed with the same message. Time to dive into the logs…
The log file of vmidentity-firstboot.py was pretty helpful: its last lines pointed us in the right direction.
Failed to check VMware STS.
com.vmware.vim.sso.client.exception.CertificateValidationException: The SSL certificate of STS service cannot be verified
This message led us to VMware KB76144 (“Failed to check VMware STS. The SSL certificate of STS service cannot be verified” while upgrading VCSA from 6.5 to 6.7/7.0)
According to the KB article, the cause of our problem was the certificate in the STS_INTERNAL_SSL_CERT store, which is used by the STS. And sure enough: this vCenter had been upgraded from 5.5 at some point in the past.
So we checked the certificate stores and found further evidence that this certificate was our main problem: the certificate in the STS_INTERNAL_SSL_CERT store had expired a few days earlier.
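If you want to check for this condition before starting a migration, you can dump a store as text with `vecs-cli entry list --store STS_INTERNAL_SSL_CERT --text` and look at the "Not After" date. Here is a minimal Python sketch that scans such a dump and reports expired entries. It assumes that each entry starts with an `Alias :` line and that the certificate details contain an OpenSSL-style `Not After : Mon DD HH:MM:SS YYYY GMT` line; the exact layout may differ between builds. The helper name `expired_entries` and the sample text are made up for illustration.

```python
import re
import ssl
import time

# Assumed dump layout (not verified against every vCenter build):
# an "Alias :" line starts each entry, and the embedded OpenSSL text
# contains a "Not After : <Mon DD HH:MM:SS YYYY GMT>" line.
ALIAS_RE = re.compile(r"^\s*Alias\s*:\s*(\S+)")
NOT_AFTER_RE = re.compile(
    r"Not After\s*:\s*(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4} GMT)"
)


def expired_entries(vecs_text, now=None):
    """Return (alias, not_after) pairs whose Not After date lies before `now`.

    Hypothetical helper; `now` is seconds since the epoch (defaults to the current time).
    """
    now = time.time() if now is None else now
    expired = []
    alias = None
    for line in vecs_text.splitlines():
        alias_match = ALIAS_RE.match(line)
        if alias_match:
            # Remember which entry the following certificate lines belong to.
            alias = alias_match.group(1)
            continue
        date_match = NOT_AFTER_RE.search(line)
        if date_match:
            not_after = date_match.group(1)
            # ssl.cert_time_to_seconds parses "%b %d %H:%M:%S %Y GMT" as UTC.
            if ssl.cert_time_to_seconds(not_after) < now:
                expired.append((alias, not_after))
    return expired


if __name__ == "__main__":
    # Made-up sample in the assumed layout, not real output from our vCenter.
    sample = (
        "Alias :\t__MACHINE_CERT\n"
        "Entry type :\tPrivate Key\n"
        "            Not Before: Jun 10 12:00:00 2010 GMT\n"
        "            Not After : Jun 10 12:00:00 2020 GMT\n"
    )
    for alias, not_after in expired_entries(sample):
        print(f"{alias}: expired on {not_after}")
```

Parsing the text dump keeps the sketch free of third-party libraries; if you export the certificate to a file anyway (as in the commands further down), a proper X.509 parser would be the more robust choice.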
Fortunately, KB76144 offers a simple solution to this problem. In short:
- remove the certificate from the STS_INTERNAL_SSL_CERT store, and
- re-create that entry using the certificate and key from the MACHINE_SSL_CERT store.
It’s DNS… or NTP… or an expired certificate
Because we had a Windows-based vCenter, we had to adapt the commands from the KB article to Windows.
C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry getcert --store MACHINE_SSL_CERT --alias __MACHINE_CERT --output c:\temp\machine_ssl.crt
C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry getkey --store MACHINE_SSL_CERT --alias __MACHINE_CERT --output c:\temp\machine_ssl.key
C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry getcert --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT --output c:\temp\STS_INTERNAL_SSL_CERT-__MACHINE_CERT.crt
C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry getkey --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT --output c:\temp\STS_INTERNAL_SSL_CERT-__MACHINE_CERT.key
C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry delete --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT -y
Deleted entry with alias [__MACHINE_CERT] in store [STS_INTERNAL_SSL_CERT] successfully
C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry create --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT --cert c:\temp\machine_ssl.crt --key c:\temp\machine_ssl.key
Entry with alias [__MACHINE_CERT] in store [STS_INTERNAL_SSL_CERT] was created successfully
The last step was to restart the VMware STS service. After that, we retried the migration, and this time it went smoothly.
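If you have to repeat the vecs-cli steps above on several Windows vCenters, they can be scripted. Below is a minimal Python sketch, assuming the install path from the prompt shown above and c:\temp as working directory; the helper names `sts_cert_fix_commands` and `run` are hypothetical, and the script has not been tested against a live vCenter. It keeps the order we used (export, back up, delete, re-create), only prints the commands unless `dry_run=False` is passed, and leaves out the STS restart, which you still have to do yourself.

```python
import subprocess
from pathlib import PureWindowsPath

# Install directory taken from the console prompt above; adjust if needed.
VMAFDD_DIR = PureWindowsPath(r"C:\Program Files\VMware\vCenter Server\vmafdd")
VECS_CLI = str(VMAFDD_DIR / "vecs-cli")
ALIAS = "__MACHINE_CERT"


def sts_cert_fix_commands(work_dir=r"c:\temp"):
    """Build the vecs-cli calls from the walkthrough, in execution order."""
    d = PureWindowsPath(work_dir)
    machine_crt = str(d / "machine_ssl.crt")
    machine_key = str(d / "machine_ssl.key")
    backup = str(d / f"STS_INTERNAL_SSL_CERT-{ALIAS}")
    sts = ["--store", "STS_INTERNAL_SSL_CERT", "--alias", ALIAS]
    machine = ["--store", "MACHINE_SSL_CERT", "--alias", ALIAS]
    return [
        # 1. Export the machine SSL certificate and private key.
        [VECS_CLI, "entry", "getcert", *machine, "--output", machine_crt],
        [VECS_CLI, "entry", "getkey", *machine, "--output", machine_key],
        # 2. Back up the expired STS_INTERNAL_SSL_CERT entry before touching it.
        [VECS_CLI, "entry", "getcert", *sts, "--output", backup + ".crt"],
        [VECS_CLI, "entry", "getkey", *sts, "--output", backup + ".key"],
        # 3. Delete the old entry and re-create it from the machine SSL cert and key.
        [VECS_CLI, "entry", "delete", *sts, "-y"],
        [VECS_CLI, "entry", "create", *sts, "--cert", machine_crt, "--key", machine_key],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(subprocess.list2cmdline(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing step,
            # so the delete never runs if the export or backup failed.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(sts_cert_fix_commands())
```

Building the commands as argument lists (instead of one long string) avoids quoting trouble with the spaces in "Program Files", and the dry-run default lets you review the exact calls before anything is deleted.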