From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH] x86/IO-APIC: refine EOI-ing of migrating level interrupts
Date: Fri, 18 Nov 2011 18:57:17 +0000
Message-ID: <4EC6AA8D.5090607@citrix.com>
References: <4EC273B40200007800061145@nat28.tlf.novell.com> <4EC53288.5010400@citrix.com> <4EC625F90200007800061BFE@nat28.tlf.novell.com> <4EC69D83.6010704@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <4EC69D83.6010704@citrix.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen-devel@lists.xensource.com, Jan Beulich
List-Id: xen-devel@lists.xenproject.org

On 18/11/11 18:01, Andrew Cooper wrote:
> On 18/11/11 08:31, Jan Beulich wrote:
>> Now that this is in, could you try (again on the offending system)
>> whether adding e.g. a WARN_ON(vector != desc->arch.old_vector)
>> prior to the just added call to eoi_IO_APIC_irq() (but inside the
>> surrounding if()) would ever trigger (obviously you'd want to make
>> sure that the code path actually gets executed at all - perhaps
>> counting and printing the count once in a while would be the easiest
>> thing to do)?
>>
>> If it does, we obviously need to stay with passing in vector. If not,
>> we'd need to do another round of code inspection to determine
>> whether indeed there's no race when relying on just the stored
>> data.
>>
>> Thanks, Jan
> So long as you also check for arch.old_vector != IRQ_UNASSIGNED_VECTOR,
> this appears to be fine.
>
> I will sort out a patch to change this behavior
>

Wait, actually not. It turns out that there is some race condition
which causes this assertion not to hold. Over the space of 2 hours,
with 16 guests and dom0 each trying to stress their storage over a line
level interrupt, there have been 5 cases where vector != old_vector.
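
For illustration, the check discussed above (compare the vector passed in
against the stored old_vector, skip the comparison while old_vector is
unassigned, and count how often the path runs) might look roughly like
the standalone sketch below. It is not the Xen code: the struct, the
counters, the function name, the reporting interval and the value used
for IRQ_UNASSIGNED_VECTOR are all assumptions made so that the example
compiles on its own.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumption: stand-in value for the "no vector assigned" sentinel. */
#define IRQ_UNASSIGNED_VECTOR (-1)

/* Hypothetical: report the counters once every this many calls. */
#define EOI_CHECK_REPORT_INTERVAL 1024UL

/* Hypothetical, cut-down stand-in for the arch part of an IRQ descriptor. */
struct arch_irq_desc_sketch {
    int old_vector;            /* vector in use before the move started */
};

/* Hypothetical debug counters; nothing like this exists in the thread. */
struct eoi_check_stats {
    unsigned long calls;       /* times the checked code path was reached */
    unsigned long mismatches;  /* times vector != old_vector was observed */
};

/*
 * Returns true when the vector handed in by the caller disagrees with the
 * stored old_vector. An unassigned old_vector is never counted as a
 * mismatch, following the extra condition mentioned in the quoted reply.
 */
static bool check_eoi_vector(const struct arch_irq_desc_sketch *arch,
                             int vector, struct eoi_check_stats *stats)
{
    bool mismatch = false;

    stats->calls++;

    if (arch->old_vector != IRQ_UNASSIGNED_VECTOR &&
        vector != arch->old_vector) {
        mismatch = true;
        stats->mismatches++;
    }

    /* Print the totals now and then to confirm the path is exercised. */
    if (stats->calls % EOI_CHECK_REPORT_INTERVAL == 0)
        printf("eoi check: %lu calls, %lu mismatches\n",
               stats->calls, stats->mismatches);

    return mismatch;
}
```

A caller would invoke check_eoi_vector() just before the EOI and warn
when it returns true; a non-zero mismatch count over a long run
corresponds to the cases reported here.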
I presume it is some race condition where the scheduler is attempting
to move IRQs between PCPUs while they are already in a half-moved
state. I will attempt to work out what is causing this race condition,
but I have some more important bugs to deal with at the moment.

I guess we can make do with the kludge of passing the lapic vector
into hw_irq_handler.end until the race condition is identified.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com