Wrong iovDisableIR setting on ProLiant Gen8 might cause a PSOD
TL;DR: There’s a script at the bottom of the page that fixes the issue.
Some days ago, this HPE customer advisory caught my attention:
And there is also a corresponding VMware KB article:
ESXi host fails with intermittent NMI PSOD on HP ProLiant Gen8 servers
It isn’t clear why this setting was changed, but in VMware ESXi 5.5 Patch 10, 6.0 Patch 4, 6.0 U3, and 6.5, the Intel IOMMU’s interrupt remapper functionality was disabled by default. So if you are running one of these ESXi versions on an HPE ProLiant Gen8, you should check whether you are affected.
To be clear: only HPE ProLiant Gen8 models are affected, not newer (Gen9) or older (G6, G7) models.
Currently there is no resolution, only a workaround: the iovDisableIR setting must be set to FALSE. If it’s set to TRUE, the Intel IOMMU’s interrupt remapper functionality is disabled.
To check the setting, SSH to each host and query the current value with esxcli:
[root@esx1:~] esxcli system settings kernel list -o iovDisableIR
Name          Type  Description                                Configured  Runtime  Default
------------  ----  -----------------------------------------  ----------  -------  -------
iovDisableIR  Bool  Disable Interrupt Routing in the IOMMU...  FALSE       FALSE    TRUE
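If you only care about a single host, you can read the value and apply the workaround right there in the SSH session. The following is a minimal sketch, not a tested tool: the function names are my own (hypothetical), the parser assumes the column layout shown above (the last three fields are Configured, Runtime, and Default), and the set command follows the VMware KB article, so please double-check it there before use. The change only becomes active after a reboot.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of esxcli.

# Read the output of "esxcli system settings kernel list -o iovDisableIR"
# on stdin and print the Runtime value. Assumes the column order shown
# above, where the last three fields are Configured, Runtime and Default.
iov_runtime_value() {
    awk '$1 == "iovDisableIR" { print $(NF - 1) }'
}

# Apply the workaround on the local host (takes effect after a reboot).
iov_set_false() {
    esxcli system settings kernel set --setting=iovDisableIR --value=FALSE
}

# Usage on an ESXi host:
#   esxcli system settings kernel list -o iovDisableIR | iov_runtime_value
#   iov_set_false
```

Both functions are only defined here, nothing runs until you call them. If your ESXi build prints the columns in a different order, the awk field index has to be adjusted.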
I have written a small PowerCLI script that uses the Get-EsxCli cmdlet to check all hosts in a cluster. The script only reports the iovDisableIR setting; it doesn’t change it.
<#
    .SYNOPSIS
    This script checks if the iovDisableIR setting is set to FALSE.

    .DESCRIPTION
    The script checks the current setting of the Intel IOMMU interrupt remapper (iovDisableIR).
    The script needs a single parameter:
    - vSphere Cluster

    History
    v0.2: Under development

    .EXAMPLE
    Get-iovDisableIRSetting -Cluster LAB

    .NOTES
    Author: Patrick Terlisten, patrick@blazilla.de, Twitter @PTerlisten
    This script is provided 'AS IS' with no warranty expressed or implied. Run at your own risk.
    This work is licensed under a Creative Commons Attribution NonCommercial ShareAlike 4.0
    International License (https://creativecommons.org/licenses/by-nc-sa/4.0/).

    ESXi hosts running ESXi 5.5 Patch 10, 6.0 Patch 4, 6.0 U3, or 6.5 may fail with a purple diagnostic screen
    caused by non-maskable-interrupts (NMI) on HPE ProLiant Gen8 Servers.

    Important vendor links:
    ESXi host fails with intermittent NMI PSOD on HP ProLiant Gen8 servers (2149043)
    https://kb.vmware.com/kb/2149043
    Advisory: (Revision) VMware - HPE ProLiant Gen8 Servers running VMware ESXi 5.5 Patch 10, VMware ESXi 6.0 Patch 4,
    or VMware ESXi 6.5 May Experience Purple Screen Of Death (PSOD): LINT1 Motherboard Interrupt
    http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c05392947

    .LINK
    http://www.vcloudnine.de
#>
#Requires -Version 3.0
#Requires -Module VMware.VimAutomation.Core

# Parameter: name of the vSphere cluster
Param (
    [Parameter(Mandatory = $true)]
    [string]$cluster
)

# Get the ESXi hosts from the given cluster
$Hosts = Get-Cluster $cluster | Get-VMHost

# Create an empty array for the PowerShell objects
$Results = @()

# Loop through the VMHosts
$Hosts | ForEach-Object {
    $EsxCliv2 = Get-EsxCli -V2 -VMHost $_
    $Arguments = $EsxCliv2.system.settings.kernel.list.CreateArgs()
    $Arguments.option = "iovDisableIR"
    $Output = $EsxCliv2.system.settings.kernel.list.Invoke($Arguments)

    # Get the vendor and model of the current VMHost
    $Model = $_ | Get-View | Select-Object @{N = 'Model'; E = {$_.Hardware.SystemInfo.Vendor + ' ' + $_.Hardware.SystemInfo.Model}}

    # Properties for the PowerShell object
    $PSOProps = @{
        VMHost       = $_.Name
        Model        = $Model.Model
        # "Configured" is the value that has been set; it can differ from "Runtime" until the host is rebooted
        iovDisableIR = $Output.Configured
    }
    $Results += New-Object -TypeName psobject -Property $PSOProps
}

# Get the affected hosts by filtering for iovDisableIR = TRUE and server models ending in "Gen8".
# The @() wrapper makes sure .Count is reliable even if only a single host matches.
$AffectedHosts = @($Results | Where-Object {$_.iovDisableIR -eq 'TRUE' -and $_.Model -like '*Gen8'} | Select-Object VMHost)
$Count = $AffectedHosts.Count

If ($Count -gt 0) {
    Write-Host `n
    Write-Host -ForegroundColor Red "$Count hosts are affected. Please set iovDisableIR to FALSE on the affected hosts. The following hosts are affected:"
    $AffectedHosts
}
else {
    Write-Host `n
    Write-Host -ForegroundColor Green "None of your hosts seems to be affected."
}
Here’s another script that analyzes and fixes the issue.
<#
    .SYNOPSIS
    This script checks if the iovDisableIR setting is set to FALSE. If not, it will set iovDisableIR to FALSE.

    .DESCRIPTION
    The script checks the current setting of the Intel IOMMU interrupt remapper (iovDisableIR) and changes the setting
    if necessary.
    The script needs a single parameter:
    - vSphere Cluster

    History
    v1.0: First Release

    .EXAMPLE
    Fix-iovDisableIRSetting -Cluster LAB

    .NOTES
    Author: Patrick Terlisten, patrick@blazilla.de, Twitter @PTerlisten
    This script is provided 'AS IS' with no warranty expressed or implied. Run at your own risk.
    This work is licensed under a Creative Commons Attribution NonCommercial ShareAlike 4.0
    International License (https://creativecommons.org/licenses/by-nc-sa/4.0/).

    ESXi hosts running ESXi 5.5 Patch 10, 6.0 Patch 4, 6.0 U3, or 6.5 may fail with a purple diagnostic screen
    caused by non-maskable-interrupts (NMI) on HPE ProLiant Gen8 Servers.

    Important vendor links:
    ESXi host fails with intermittent NMI PSOD on HP ProLiant Gen8 servers (2149043)
    https://kb.vmware.com/kb/2149043
    Advisory: (Revision) VMware - HPE ProLiant Gen8 Servers running VMware ESXi 5.5 Patch 10, VMware ESXi 6.0 Patch 4,
    or VMware ESXi 6.5 May Experience Purple Screen Of Death (PSOD): LINT1 Motherboard Interrupt
    http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c05392947

    .LINK
    http://www.vcloudnine.de
#>
#Requires -Version 3.0
#Requires -Module VMware.VimAutomation.Core

# Parameter: name of the vSphere cluster
Param (
    [Parameter(Mandatory = $true)]
    [string]$cluster
)

# Get the ESXi hosts from the given cluster
$Hosts = Get-Cluster $cluster | Get-VMHost

# Create an empty array for the PowerShell objects
$Results = @()

# Loop through the VMHosts
$Hosts | ForEach-Object {
    $EsxCliv2 = Get-EsxCli -V2 -VMHost $_
    $Arguments = $EsxCliv2.system.settings.kernel.list.CreateArgs()
    $Arguments.option = "iovDisableIR"
    $Output = $EsxCliv2.system.settings.kernel.list.Invoke($Arguments)

    # Get the vendor and model of the current VMHost
    $Model = $_ | Get-View | Select-Object @{N = 'Model'; E = {$_.Hardware.SystemInfo.Vendor + ' ' + $_.Hardware.SystemInfo.Model}}

    # Properties for the PowerShell object
    $PSOProps = @{
        VMHost       = $_.Name
        Model        = $Model.Model
        # The value of "Runtime" represents the current active mode of iovDisableIR
        iovDisableIR = $Output.Runtime
    }
    $Results += New-Object -TypeName psobject -Property $PSOProps
}

# Get the affected hosts by filtering for iovDisableIR = TRUE and server models ending in "Gen8".
# The @() wrapper makes sure .Count is reliable even if only a single host matches.
$AffectedHosts = @($Results | Where-Object {$_.iovDisableIR -eq 'TRUE' -and $_.Model -like '*Gen8'} | Select-Object VMHost)
$Count = $AffectedHosts.Count

If ($Count -gt 0) {
    Write-Host `n
    Write-Host -ForegroundColor Red "The following hosts are affected."
    $AffectedHosts | Format-Table
}
else {
    Write-Host `n
    Write-Host -ForegroundColor Green "None of your hosts seems to be affected."
    Write-Host `n
    # Nothing to fix, so leave the script here
    return
}

# Change the iovDisableIR setting
Write-Host -ForegroundColor Green "Setting iovDisableIR to FALSE."
$AffectedHosts.VMHost | ForEach-Object {
    # Keep the name of the current host; inside the catch block, $_ is the error record
    $VMHostName = $_
    Write-Host `n
    Write-Host -ForegroundColor Green "Processing host $VMHostName."
    try {
        # Change iovDisableIR
        $EsxCliv2 = Get-EsxCli -V2 -VMHost $VMHostName
        $Arguments = $EsxCliv2.system.settings.kernel.set.CreateArgs()
        $Arguments.setting = "iovDisableIR"
        $Arguments.value = $false
        $EsxCliv2.system.settings.kernel.set.Invoke($Arguments)
    }
    catch {
        Write-Host -ForegroundColor Red "Oops... something went wrong with ${VMHostName}: $($_.Exception.Message)"
    }
    Write-Host -ForegroundColor Green "Finished processing host $VMHostName."
}
Write-Host `n
Write-Host -ForegroundColor Green "Script finished. Please reboot each host."