* xen irq unmask bug brainstorming
@ 2011-02-15  6:28 Zhang, Fengzhe
From: Zhang, Fengzhe @ 2011-02-15  6:28 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel, Dong, Eddie, Li, Xin

Hi, we found a bug related to the Xen spin-unlock IPI and would like to brainstorm a clean fix.

How the bug happens:
1. Dom0 powers off.
2. CPU0 takes down the other CPUs.
3. IRQs are unmasked in fixup_irqs() on the other CPUs.
4. The IPI IRQ for "lock_kicker_irq" is unmasked (which should never happen).
5. The other CPUs receive lock_kicker_irq, and dummy_handler (the handler for the XEN_SPIN_UNLOCK_VECTOR IPI) is invoked.
6. dummy_handler reports a bug and crashes Dom0.

Main cause:
fixup_irqs() masks and then unmasks each IRQ when taking CPUs down, and the Xen irq_chip structure does not distinguish disable_ops from mask_ops. So when lock_kicker_irq is unmasked, it is effectively re-enabled.
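
The effect can be reproduced in a toy user-space model. This is only a sketch: the toy_* names and the two-field state are made up for illustration and are not the Xen irq_chip code. It assumes, as described above, that disabling falls back to masking and that the fixup pass unmasks without looking at the disabled state.

```c
#include <stdbool.h>

/* Toy model of one IRQ line; hypothetical names, not kernel code. */
struct toy_irq {
    bool disabled;   /* software state: set when the IRQ is disabled */
    bool hw_masked;  /* what the (virtual) interrupt source sees */
};

/* The chip only offers mask/unmask. */
static void toy_mask(struct toy_irq *irq)   { irq->hw_masked = true; }
static void toy_unmask(struct toy_irq *irq) { irq->hw_masked = false; }

/* No dedicated disable op: disabling is implemented as masking. */
static void toy_disable(struct toy_irq *irq)
{
    irq->disabled = true;
    toy_mask(irq);
}

/* Mimics the mask-then-unmask pass described for fixup_irqs. */
static void toy_fixup(struct toy_irq *irq)
{
    toy_mask(irq);
    toy_unmask(irq);   /* never consults irq->disabled */
}

/* True when an interrupt on this line would reach its handler. */
static bool toy_can_fire(const struct toy_irq *irq)
{
    return !irq->hw_masked;
}
```

Calling toy_disable() and then toy_fixup() on the same line leaves "disabled" set while toy_can_fire() returns true, which is the state in which the dummy handler gets invoked.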

A possible fixup:
Provide a dedicated disable_ops for the Xen irq_chip structure, and prevent unmask_ops from enabling IRQs that are disabled.
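
A minimal sketch of that idea, again as a user-space toy rather than a patch: the guarded_* names are hypothetical, and the only point shown is that the unmask path checks a separate "disabled" flag before touching the mask.

```c
#include <stdbool.h>

/* Toy model of one IRQ line; hypothetical names, not kernel code. */
struct guarded_irq {
    bool disabled;   /* set only by the dedicated disable op */
    bool hw_masked;  /* what the (virtual) interrupt source sees */
};

static void guarded_mask(struct guarded_irq *irq)
{
    irq->hw_masked = true;
}

/* Unmask refuses to open a line that was explicitly disabled. */
static void guarded_unmask(struct guarded_irq *irq)
{
    if (irq->disabled)
        return;
    irq->hw_masked = false;
}

/* Dedicated disable/enable ops track state separately from masking. */
static void guarded_disable(struct guarded_irq *irq)
{
    irq->disabled = true;
    irq->hw_masked = true;
}

static void guarded_enable(struct guarded_irq *irq)
{
    irq->disabled = false;
    irq->hw_masked = false;
}

/* Same mask-then-unmask pass as before, now harmless for disabled IRQs. */
static void guarded_fixup(struct guarded_irq *irq)
{
    guarded_mask(irq);
    guarded_unmask(irq);
}
```

With this split, a fixup pass over a disabled line leaves it masked, while a line that was enabled before the pass comes back unmasked as it does today.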

-Fengzhe Zhang

* Re: xen irq unmask bug brainstorming
@ 2011-02-15 10:28 ` Jan Beulich
From: Jan Beulich @ 2011-02-15 10:28 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Fengzhe Zhang; +Cc: xen-devel, Eddie Dong, Xin Li

>>> On 15.02.11 at 07:28, "Zhang, Fengzhe" <fengzhe.zhang@intel.com> wrote:
> Hi, we found a bug related to xen spin unlock ipi. Looking forward to 
> brainstorming for a clean fixup.
> 
> How the bug happens:
> 1. Dom0 poweroff.
> 2. CPU0 takes down other CPUs.
> 3. IRQs are unmasked in function fixup_irqs on other CPUs.
> 4. IPI IRQ for "lock_kicker_irq" is unmasked (which should never happen).
> 5. Other CPUs receives lock_kicker_irq and dummy_handler (handler for ipi 
> XEN_SPIN_UNLOCK_VECTOR) is invoked.
> 6. Dummy_handler reports bug and crashes Dom0.
> 
> Main cause:
> Function fixup_irqs masks and then unmasks each irq when taking cpus down. 
> And Xen irq_chip structure does not distinguish disable_ops from mask_ops. So 
> when the lock_kicker_irq is unmasked, it is effectively re-enabled.
> 
> A possible fixup:
> Provide a dedicated disable_ops for xen irq_chip structure. Prevent 
> unmask_ops to enable irqs that are disabled.

Other alternatives (based on what we do in non-pvops, where we
don't have this problem): either mark the kicker IRQ properly as
IRQ_PER_CPU (IRQF_PERCPU is already being passed, but this additionally
requires CONFIG_IRQ_PER_CPU to be set) and then exclude
per-CPU IRQs from being fixed up (as they obviously should be).

Or don't use the kernel's IRQ subsystem at all, and instead
map the kick logic directly to event channels. (This is what we do,
but we nevertheless have the per-CPU handling above in place
to cover IPIs and the timer vIRQ.)

Jan

* Re: xen irq unmask bug brainstorming
@ 2011-02-16  4:12   ` Fengzhe Zhang
From: Fengzhe Zhang @ 2011-02-16  4:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jeremy Fitzhardinge, xen-devel, Dong, Eddie, Li, Xin

On 2011/2/15 18:28, Jan Beulich wrote:
> Other alternatives (based on what we do in non-pvops, where we
> don't have this problem): Either mark the kicker IRQ properly as
> IRQ_PER_CPU (IRQF_PERCPU is being passed, but this additionally
> requires CONFIG_IRQ_PER_CPU to be set), and then exclude
> per-CPU IRQs from being fixed up (which they obviously should be).
>
> Or don't use the kernel's IRQ subsystem altogether, and instead
> directly map the kick logic to event channels. (This is what we do,
> but we have the per-CPU handling above in place nevertheless
> to cover IPIs and timer vIRQ.)
>
> Jan
>

Can we safely set CONFIG_IRQ_PER_CPU in the current pvops kernel?

-Fengzhe

* Re: xen irq unmask bug brainstorming
@ 2011-02-16  8:22     ` Jan Beulich
From: Jan Beulich @ 2011-02-16  8:22 UTC (permalink / raw)
  To: Fengzhe Zhang; +Cc: Jeremy Fitzhardinge, xen-devel, Eddie Dong, Xin Li

>>> On 16.02.11 at 05:12, Fengzhe Zhang <fengzhe.zhang@intel.com> wrote:
> Can we safely set CONFIG_IRQ_PER_CPU in current pvops kernel?

I think so, but you'll need to get this accepted by the x86 maintainers
anyway, so perhaps asking for their opinion would be useful.

Jan
