From mboxrd@z Thu Jan 1 00:00:00 1970
From: Fengzhe Zhang
Subject: Re: xen irq unmask bug brainstroming
Date: Wed, 16 Feb 2011 12:12:55 +0800
Message-ID: <4D5B4EC7.1070509@intel.com>
References: <1A42CE6F5F474C41B63392A5F80372B2335E8D61@shsmsx501.ccr.corp.intel.com> <4D5A63730200007800031F7C@vpn.id2.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <4D5A63730200007800031F7C@vpn.id2.novell.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Jan Beulich
Cc: Jeremy Fitzhardinge, xen-devel, "Dong, Eddie", "Li, Xin"
List-Id: xen-devel@lists.xenproject.org

On 2011/2/15 18:28, Jan Beulich wrote:
>>>> On 15.02.11 at 07:28, "Zhang, Fengzhe" wrote:
>> Hi, we found a bug related to xen spin unlock ipi. Looking forward to
>> brainstorming for a clean fixup.
>>
>> How the bug happens:
>> 1. Dom0 poweroff.
>> 2. CPU0 takes down other CPUs.
>> 3. IRQs are unmasked in function fixup_irqs on other CPUs.
>> 4. IPI IRQ for "lock_kicker_irq" is unmasked (which should never happen).
>> 5. Other CPUs receives lock_kicker_irq and dummy_handler (handler for ipi
>> XEN_SPIN_UNLOCK_VECTOR) is invoked.
>> 6. Dummy_handler reports bug and crashes Dom0.
>>
>> Main cause:
>> Function fixup_irqs masks and then unmasks each irq when taking cpus down.
>> And Xen irq_chip structure does not distinguish disable_ops from mask_ops. So
>> when the lock_kicker_irq is unmasked, it is effectively re-enabled.
>>
>> A possible fixup:
>> Provide a dedicated disable_ops for xen irq_chip structure. Prevent
>> unmask_ops to enable irqs that are disabled.
>
> Other alternatives (based on what we do in non-pvops, where we
> don't have this problem): Either mark the kicker IRQ properly as
> IRQ_PER_CPU (IRQF_PERCPU is being passed, but this additionally
> requires CONFIG_IRQ_PER_CPU to be set), and then exclude
> per-CPU IRQs from being fixed up (which they obviously should be).
>
> Or don't use the kernel's IRQ subsystem altogether, and instead
> directly map the kick logic to event channels. (This is what we do,
> but we have the per-CPU handling above in place nevertheless
> to cover IPIs and timer vIRQ.)
>
> Jan
>

Can we safely set CONFIG_IRQ_PER_CPU in current pvops kernel?

-Fengzhe
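
For reference, the "possible fixup" quoted above can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: it is not kernel code, it does not use the real struct irq_chip, and every name in it (toy_irq, toy_disable, toy_unmask_guarded, toy_fixup_irqs, and so on) is hypothetical. It only shows why a mask-then-unmask pass re-enables a disabled IRQ when disable and mask are not distinguished, and how a separate "disabled" flag would avoid that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT the kernel's struct irq_chip; all names are made up. */
struct toy_irq {
    bool masked;    /* delivery is currently blocked */
    bool disabled;  /* the owner asked for this IRQ to stay off */
};

static void toy_mask(struct toy_irq *irq)
{
    irq->masked = true;
}

/* Models the reported behaviour: unmask always re-opens delivery. */
static void toy_unmask_naive(struct toy_irq *irq)
{
    irq->masked = false;
}

/* Models a dedicated disable op: remember the intent and block delivery. */
static void toy_disable(struct toy_irq *irq)
{
    irq->disabled = true;
    irq->masked = true;
}

/* Models the proposed guard: unmask leaves a disabled IRQ masked. */
static void toy_unmask_guarded(struct toy_irq *irq)
{
    if (!irq->disabled)
        irq->masked = false;
}

/* Models the mask-then-unmask pass described for fixup_irqs. */
static void toy_fixup_irqs(struct toy_irq *irqs, size_t n,
                           void (*unmask)(struct toy_irq *))
{
    assert(irqs != NULL && unmask != NULL);
    for (size_t i = 0; i < n; i++) {
        toy_mask(&irqs[i]);
        unmask(&irqs[i]);
    }
}
```

With toy_unmask_naive, an IRQ that went through toy_disable comes out of toy_fixup_irqs unmasked, which mirrors step 4 of the report. With toy_unmask_guarded it stays masked, while an ordinary IRQ is still unmasked as before.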