From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH] x86/nmi: Make external NMI injection reliably crash the host
Date: Wed, 27 Aug 2014 00:01:14 +0100
Message-ID: <53FD11BA.6060305@citrix.com>
References: <1409047805-17893-1-git-send-email-ross.lagerwall@citrix.com>
 <53FCB083.2050603@terremark.com>
 <53FCBB10.6050108@citrix.com>
 <53FD0156.1010708@terremark.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <53FD0156.1010708@terremark.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Don Slutz
Cc: Ross Lagerwall, Keir Fraser, Jan Beulich, Xen-devel
List-Id: xen-devel@lists.xenproject.org

On 26/08/2014 22:51, Don Slutz wrote:
> On 08/26/14 12:51, Andrew Cooper wrote:
>> On 26/08/14 17:06, Don Slutz wrote:
>>> On 08/26/14 06:10, Ross Lagerwall wrote:
>>>> Change the watchdog handler to only "tick" if the corresponding perf
>>>> counter has overflowed; otherwise, return false from the NMI
>>>> handler to
>>>> indicate that the NMI is not a watchdog tick and let the other
>>>> handlers
>>>> handle it. This allows externally injected NMIs to reliably crash the
>>>> host rather than be swallowed by the watchdog handler.
>>> If a crash kernel has been setup via kexec, does this change to
>>> "crash host" ends up jumping into the crash kernel?
>>>
>>> -Don Slutz
>> No - this has no change of behaviour as to how Xen proceeds after it has
>> decided to panic().
>>
>> It does however change whether Xen decided to panic, depending on
>> whether the NMI was a result of the watchdog, or some otherwise
>> unidentified NMI.
>>
>> Basically, without this change, the "inject fatal NMI" option in most
>> IPMI controllers doesn't work in combination with running the Xen
>> watchdog. Only certain HP systems appear to set the IOCK bit in the
>> system control port B when injecting an NMI. All other systems just
>> send an NMI with no change to the control ports, which get eaten by the
>> watchdog logic.
>>
>> This patch changes the watchdog logic to only consider an NMI as a
>> watchdog tick if the perf counter confirms that it injected the NMI.
>
> Well, that is useful information. Looks like I was not clear. I am
> reading
>
>> as to how Xen proceeds after it has
> > > decided to panic().
>
>
> As a yes, but you start with a no. And I am getting "crash host" to
> mean "calls panic()".
>
> -Don Slutz
>
>> ~Andrew
>

Allow me to try again.

This patch will alter how NMIs are classified. It does not alter the
actions of a particular classification of NMI.

Before this patch, any NMI which did not explicitly set the IOCK/SERR
bit in the system control port B would be considered a watchdog NMI,
and ignored if the watchdog was active. The vast majority of "inject
NMI" options from IPMI controllers do not set the IOCK/SERR bit.

After this patch is applied, NMIs which are received but not generated
by the watchdog performance counters will be considered as external
NMIs *even if* the IOCK/SERR bits are not set.

The action taken upon discovery of these NMIs is still controlled by
the nmi=fatal/dom0/ignore command line option, and in the case of
nmi=fatal, panic() is still called as before.

Realistically, it means that, with the NMI watchdog enabled, using the
"inject NMI" button on your Dell/SuperMicro/IBM/Quanta/Intel IPMI
interface will be classified as an external NMI rather than a watchdog
NMI, and in the case of nmi=fatal, will call panic().

(Certain HP servers are the only ones we have encountered which
reliably set the IOCK bit when injecting an NMI from the iLO interface)

~Andrew
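
For illustration, the classification change described above can be
sketched roughly as follows. This is a minimal standalone sketch, not
Xen's actual code: every type and function name here (nmi_info,
classify_before, classify_after, external_nmi_action) is hypothetical,
and checking the IOCK/SERR bits before the perf counter is an assumed
simplification.

```c
#include <stdbool.h>

/* Hypothetical names throughout; a sketch, not Xen's implementation. */
enum nmi_class  { NMI_WATCHDOG, NMI_EXTERNAL };
enum nmi_opt    { NMI_OPT_FATAL, NMI_OPT_DOM0, NMI_OPT_IGNORE };
enum nmi_action { ACT_PANIC, ACT_FORWARD_DOM0, ACT_IGNORE };

struct nmi_info {
    bool watchdog_active;         /* NMI watchdog is enabled */
    bool perf_counter_overflowed; /* watchdog perf counter raised this NMI */
    bool iock_or_serr_set;        /* IOCK/SERR bit set in control port B */
};

/* Before the patch: with the watchdog active, any NMI without the
 * IOCK/SERR bit is assumed to be a watchdog tick and is swallowed. */
static enum nmi_class classify_before(const struct nmi_info *n)
{
    if (n->iock_or_serr_set)
        return NMI_EXTERNAL;
    return n->watchdog_active ? NMI_WATCHDOG : NMI_EXTERNAL;
}

/* After the patch: an NMI only counts as a watchdog tick if the perf
 * counter confirms it; everything else is external, even when the
 * IOCK/SERR bits are clear. */
static enum nmi_class classify_after(const struct nmi_info *n)
{
    if (n->iock_or_serr_set)
        return NMI_EXTERNAL;
    if (n->watchdog_active && n->perf_counter_overflowed)
        return NMI_WATCHDOG;
    return NMI_EXTERNAL;
}

/* Unchanged by the patch: what happens to an external NMI is still
 * selected by the nmi=fatal/dom0/ignore command line option. */
static enum nmi_action external_nmi_action(enum nmi_opt opt)
{
    switch (opt) {
    case NMI_OPT_FATAL: return ACT_PANIC;
    case NMI_OPT_DOM0:  return ACT_FORWARD_DOM0;
    default:            return ACT_IGNORE;
    }
}
```

In the typical IPMI "inject NMI" case (watchdog active, no counter
overflow, IOCK/SERR clear), classify_before() yields NMI_WATCHDOG while
classify_after() yields NMI_EXTERNAL, so with nmi=fatal the external
path reaches panic().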