From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: Xen-4.3 - curious crash
Date: Wed, 29 Jan 2014 10:30:06 +0000
Message-ID: <52E8D82E.4060604@citrix.com>
References: <52E81237.3010302@citrix.com> <52E8CD580200007800117D7E@nat28.tlf.novell.com> <1390985477.15103.4.camel@dagon.hellion.org.uk> <52E8D17E0200007800117D9F@nat28.tlf.novell.com> <1390987522.31814.38.camel@kazak.uk.xensource.com> <52E8DB2B0200007800117DCA@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <52E8DB2B0200007800117DCA@nat28.tlf.novell.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: Ian Campbell , Xen-devel List
List-Id: xen-devel@lists.xenproject.org

On 29/01/14 09:42, Jan Beulich wrote:
>>>> On 29.01.14 at 10:25, Ian Campbell wrote:
>> On Wed, 2014-01-29 at 09:01 +0000, Jan Beulich wrote:
>>>>>> On 29.01.14 at 09:51, Ian Campbell wrote:
>>>> On Wed, 2014-01-29 at 08:43 +0000, Jan Beulich wrote:
>>>>> An interrupt not properly restoring EFLAGS.IF (or actually one not
>>>>> properly restoring all of EFLAGS) would be very odd. About as odd
>>>>> as a cosmic radiation induced bit flip resulting in some other
>>>>> misbehavior.
>>>> Isn't it also the effect of a missing spin_unlock(_irqrestore)? Or does
>>>> something else catch that first?
>>> A missing plain spin_unlock() wouldn't have any effect on IF. And
>>> a missing spin_unlock_irqrestore() would have an effect on IF in
>>> the interrupt handler, but with the return being through an IRET
>>> something would need to actively modify the flags on the stack
>>> that IRET uses in order to affect the interrupted code's EFLAGS.
>> Ah, I mistakenly thought that this issue was happening on that return
>> path (i.e. before the IRET).
> Right - the problem is that we're having two return paths to
> consider here: The outer one (wanting to return to the guest)
> explicitly used STI a few instructions before the crash. And it
> would need to be an inner one (hardware interrupt) that would
> have to fail to restore IF properly, and for that to happen the
> EFLAGS image used by that exit path's IRET would need to get
> corrupted.
>
> Jan
>

This issue has been seen exactly once, on an otherwise perfectly stable
server, which has been running stably since.

I certainly have no evidence to rule out cosmic radiation.  I suppose
all that can be done at this point is to wait and see whether it reoccurs.

~Andrew
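
For illustration only, the argument in the quoted text (a handler that
leaves IF clear cannot affect the interrupted code, because IRET reloads
EFLAGS from the image saved on the stack) can be modelled as a toy C
sketch. This is not Xen code: struct toy_cpu, struct toy_frame and every
function below are hypothetical names invented for the example, and only
the IF bit (bit 9 of EFLAGS) is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_IF (1u << 9)   /* interrupt enable flag, bit 9 of EFLAGS */

/* Toy CPU (hypothetical): only the flags register is modelled. */
struct toy_cpu {
    uint32_t eflags;
};

/* Toy interrupt frame (hypothetical): the EFLAGS image saved on entry. */
struct toy_frame {
    uint32_t saved_eflags;
};

typedef void (*toy_handler_t)(struct toy_cpu *, struct toy_frame *);

/* Interrupt delivery: save the live EFLAGS, then clear IF (interrupt gate). */
static void toy_interrupt_entry(struct toy_cpu *cpu, struct toy_frame *f)
{
    f->saved_eflags = cpu->eflags;
    cpu->eflags &= ~X86_EFLAGS_IF;
}

/* IRET: reload EFLAGS from the saved image, whatever the live value is. */
static void toy_iret(struct toy_cpu *cpu, const struct toy_frame *f)
{
    cpu->eflags = f->saved_eflags;
}

/* Models a missing spin_unlock_irqrestore(): IF stays clear in the handler. */
static void handler_missing_irqrestore(struct toy_cpu *cpu, struct toy_frame *f)
{
    (void)f;
    cpu->eflags &= ~X86_EFLAGS_IF;
}

/* Models the unlikely case: the saved EFLAGS image itself gets corrupted. */
static void handler_corrupts_frame(struct toy_cpu *cpu, struct toy_frame *f)
{
    (void)cpu;
    f->saved_eflags &= ~X86_EFLAGS_IF;
}

/* Returns true if the interrupted code still has IF set after the IRET. */
static bool if_after_interrupt(toy_handler_t handler)
{
    struct toy_cpu cpu = { .eflags = X86_EFLAGS_IF };  /* outer path ran STI */
    struct toy_frame frame;

    toy_interrupt_entry(&cpu, &frame);
    handler(&cpu, &frame);
    toy_iret(&cpu, &frame);
    return (cpu.eflags & X86_EFLAGS_IF) != 0;
}

/* Checks both cases of the argument in this toy model. */
static void toy_check_all(void)
{
    assert(if_after_interrupt(handler_missing_irqrestore));  /* IF survives */
    assert(!if_after_interrupt(handler_corrupts_frame));     /* IF is lost */
}
```

Calling toy_check_all() exercises both cases: in this model the missing
irqrestore is invisible to the interrupted code, and only a change to the
saved image makes the interrupted code resume with IF clear.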