* [patch, -rc5-mm3] fix irqpoll some more
[not found] ` <1149497459.23209.39.camel@localhost.localdomain>
@ 2006-06-05 8:49 ` Ingo Molnar
2006-06-05 13:05 ` Alan Cox
2006-06-06 13:25 ` Steven Rostedt
0 siblings, 2 replies; 8+ messages in thread
From: Ingo Molnar @ 2006-06-05 8:49 UTC (permalink / raw)
To: Alan Cox, Andrew Morton; +Cc: arjan, linux-kernel
* Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> Ar Sul, 2006-06-04 am 23:00 -0700, ysgrifennodd akpm@osdl.org:
> > irqpoll/irqfixup ignored IRQ_DISABLED but that could cause lockups. So
> > listen to desc->depth to correctly honor disable_irq(). Also, when an
> > interrupt it screaming, set IRQ_DISABLED but do not touch ->depth.
> >
> > Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
>
> NAK
>
> The moment you do this you as good as lose irqpoll support because the
> IRQ being disabled is unrelated to the IRQ delivery line when dealing
> with faults of this nature.
hm, agreed. I really sent it more as an RFC. The real fix is to do the
disable_irq_handler()/enable_irq_handler() thing. [which is also a much
nicer interface for more advanced MSI concepts]
> There is a saner fix - walk all the IRQs that are not disabled and
> only if that produces no claim (ie the box is about to die otherwise)
> then walk the disabled ones. Its a slow path at that point in error
> recovery so the performance isn't critical.
yeah. How about the patch below? [builds and boots on x86, but no real
irqpoll testing was done as i dont have such a problem-system.]
Ingo
---------------
Subject: fix irqpoll some more
From: Ingo Molnar <mingo@elte.hu>
implement Alan's suggestion of irqpoll doing a second pass over
disabled irq lines if we didnt manage to handle the screaming
interrupt.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/irq/spurious.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)
Index: linux/kernel/irq/spurious.c
===================================================================
--- linux.orig/kernel/irq/spurious.c
+++ linux/kernel/irq/spurious.c
@@ -18,8 +18,9 @@ static int irqfixup __read_mostly;
*/
static int misrouted_irq(int irq, struct pt_regs *regs)
{
- int i, ok = 0, work = 0;
+ int i, ok = 0, work = 0, first_pass = 1;
+repeat:
for (i = 1; i < NR_IRQS; i++) {
struct irq_desc *desc = irq_desc + i;
struct irqaction *action;
@@ -61,7 +62,7 @@ static int misrouted_irq(int irq, struct
* IRQ_DISABLED - which might have been set due to
* a screaming interrupt.
*/
- if (desc->depth) {
+ if (first_pass && desc->depth) {
spin_unlock(&desc->lock);
continue;
}
@@ -90,6 +91,25 @@ static int misrouted_irq(int irq, struct
desc->chip->end(i);
spin_unlock(&desc->lock);
}
+ /*
+ * HACK:
+ *
+ * In the first pass we dont touch handlers that are behind
+ * a disabled IRQ line. In the second pass (having no other
+ * choice) we ignore the disabled state of IRQ lines. We've
+ * got a screaming interrupt, so we have the choice between
+ * a real lockup happening due to that screaming interrupt,
+ * against a theoretical locking that becomes possible if we
+ * ignore a disabled IRQ line.
+ *
+ * FIXME: proper disable_irq_handler() API would remove the
+ * need for this hack.
+ */
+ if (!ok && first_pass) {
+ first_pass = 0;
+ goto repeat;
+ }
+
/* So the caller can adjust the irq error counts */
return ok;
}
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch, -rc5-mm3] fix irqpoll some more
2006-06-05 13:05 ` Alan Cox
@ 2006-06-05 12:52 ` Ingo Molnar
0 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2006-06-05 12:52 UTC (permalink / raw)
To: Alan Cox; +Cc: Andrew Morton, arjan, linux-kernel
* Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> Unfortunately we've got a load of handlers that just blindly say "Yes
> I handled something" so they bogusly cause completion to be assumed.
>
> We'd actually have to know if the IRQ source had "gone away" on the
> chip I fear.
ok. Then lets keep what we have right now, and first do the
disable/enable_irq_handler() API, and once the commonly used drivers are
covered by the new disable/enable API, apply the unconditional
desc->depth check to irqpoll?
Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch, -rc5-mm3] fix irqpoll some more
2006-06-05 8:49 ` [patch, -rc5-mm3] fix irqpoll some more Ingo Molnar
@ 2006-06-05 13:05 ` Alan Cox
2006-06-05 12:52 ` Ingo Molnar
2006-06-06 13:25 ` Steven Rostedt
1 sibling, 1 reply; 8+ messages in thread
From: Alan Cox @ 2006-06-05 13:05 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andrew Morton, arjan, linux-kernel
Unfortunately we've got a load of handlers that just blindly say "Yes I
handled something" so they bogusly cause completion to be assumed.
We'd actually have to know if the IRQ source had "gone away" on the chip
I fear.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch, -rc5-mm3] fix irqpoll some more
2006-06-05 8:49 ` [patch, -rc5-mm3] fix irqpoll some more Ingo Molnar
2006-06-05 13:05 ` Alan Cox
@ 2006-06-06 13:25 ` Steven Rostedt
2006-06-06 13:53 ` Ingo Molnar
1 sibling, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2006-06-06 13:25 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Alan Cox, Andrew Morton, arjan, linux-kernel
On Mon, 2006-06-05 at 10:49 +0200, Ingo Molnar wrote:
> + /*
> + * HACK:
> + *
> + * In the first pass we dont touch handlers that are behind
> + * a disabled IRQ line. In the second pass (having no other
> + * choice) we ignore the disabled state of IRQ lines. We've
> + * got a screaming interrupt, so we have the choice between
> + * a real lockup happening due to that screaming interrupt,
> + * against a theoretical locking that becomes possible if we
> + * ignore a disabled IRQ line.
FYI, with irqpoll on in my i386 SMP machine, I hit this theoretical
locking every time in the vortex driver.
-- Steve
> + *
> + * FIXME: proper disable_irq_handler() API would remove the
> + * need for this hack.
> + */
> + if (!ok && first_pass) {
> + first_pass = 0;
> + goto repeat;
> + }
> +
> /* So the caller can adjust the irq error counts */
> return ok;
> }
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch, -rc5-mm3] fix irqpoll some more
2006-06-06 13:25 ` Steven Rostedt
@ 2006-06-06 13:53 ` Ingo Molnar
2006-06-06 14:40 ` Steven Rostedt
0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2006-06-06 13:53 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Alan Cox, Andrew Morton, arjan, linux-kernel
* Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 2006-06-05 at 10:49 +0200, Ingo Molnar wrote:
>
> > + /*
> > + * HACK:
> > + *
> > + * In the first pass we dont touch handlers that are behind
> > + * a disabled IRQ line. In the second pass (having no other
> > + * choice) we ignore the disabled state of IRQ lines. We've
> > + * got a screaming interrupt, so we have the choice between
> > + * a real lockup happening due to that screaming interrupt,
> > + * against a theoretical locking that becomes possible if we
> > + * ignore a disabled IRQ line.
>
> FYI, with irqpoll on in my i386 SMP machine, I hit this theoretical
> locking every time in the vortex driver.
you got a lockup/deadlock every time, due to the bug that the lock
validator pointed out?
Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch, -rc5-mm3] fix irqpoll some more
2006-06-06 13:53 ` Ingo Molnar
@ 2006-06-06 14:40 ` Steven Rostedt
2006-06-06 14:59 ` Ingo Molnar
0 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2006-06-06 14:40 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Alan Cox, Andrew Morton, arjan, linux-kernel
On Tue, 2006-06-06 at 15:53 +0200, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > On Mon, 2006-06-05 at 10:49 +0200, Ingo Molnar wrote:
> >
> > > + /*
> > > + * HACK:
> > > + *
> > > + * In the first pass we dont touch handlers that are behind
> > > + * a disabled IRQ line. In the second pass (having no other
> > > + * choice) we ignore the disabled state of IRQ lines. We've
> > > + * got a screaming interrupt, so we have the choice between
> > > + * a real lockup happening due to that screaming interrupt,
> > > + * against a theoretical locking that becomes possible if we
> > > + * ignore a disabled IRQ line.
> >
> > FYI, with irqpoll on in my i386 SMP machine, I hit this theoretical
> > locking every time in the vortex driver.
>
> you got a lockup/deadlock every time, due to the bug that the lock
> validator pointed out?
>
The one with the spin_lock being used outside of interrupts disabled?
Yep. But remember, this only happens with irqpoll where it doesn't
honor disabled interrupts, so this was expected behavior.
Here's the back trace. I had to disable your lockdep to turn NMI back
on.
[<c0103d37>] show_stack_log_lvl+0xa7/0xf0
[<c0103f40>] show_registers+0x1c0/0x260
[<c0104357>] die_nmi+0xd7/0x160
[<c0110835>] nmi_watchdog_tick+0x1c5/0x210
[<c0104079>] do_nmi+0x99/0x2a0
[<c01035ee>] nmi_stack_correct+0x1d/0x22
[<f8898933>] boomerang_interrupt+0x33/0x460 [3c59x]
[<c01435d5>] note_interrupt+0x155/0x260
[<c0143dfd>] handle_edge_irq+0x11d/0x150
[<c010529f>] do_IRQ+0x4f/0xa0
[<c0103462>] common_interrupt+0x1a/0x20
[<f8895c46>] mdio_read+0xb6/0xf0 [3c59x]
[<f885a2e1>] mii_link_ok+0x21/0x50 [mii]
[<f885a3a6>] mii_check_media+0x36/0x200 [mii]
[<f88956ab>] vortex_check_media+0x3b/0x90 [3c59x]
[<f8895b1e>] vortex_timer+0x41e/0x490 [3c59x]
[<c0128799>] run_timer_softirq+0xc9/0x1a0
[<c0123ff4>] __do_softirq+0x74/0xf0
[<c01240c5>] do_softirq+0x55/0x60
[<c012422d>] irq_exit+0x4d/0x50
[<c01100e4>] smp_apic_timer_interrupt+0x64/0x70
[<c01034f3>] apic_timer_interrupt+0x1f/0x24
[<c010160d>] cpu_idle+0x4d/0xb0
[<c010f170>] start_secondary+0x450/0x500
[<00000000>] 0x0
-- Steve
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch, -rc5-mm3] fix irqpoll some more
2006-06-06 14:40 ` Steven Rostedt
@ 2006-06-06 14:59 ` Ingo Molnar
2006-06-06 15:32 ` Steven Rostedt
0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2006-06-06 14:59 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Alan Cox, Andrew Morton, arjan, linux-kernel
* Steven Rostedt <rostedt@goodmis.org> wrote:
> > you got a lockup/deadlock every time, due to the bug that the lock
> > validator pointed out?
> >
>
> The one with the spin_lock being used outside of interrupts disabled?
> Yep. But remember, this only happens with irqpoll where it doesn't
> honor disabled interrupts, so this was expected behavior.
>
> Here's the back trace. I had to disable your lockdep to turn NMI back
> on.
>
> [<c0103d37>] show_stack_log_lvl+0xa7/0xf0
> [<c0103f40>] show_registers+0x1c0/0x260
> [<c0104357>] die_nmi+0xd7/0x160
> [<c0110835>] nmi_watchdog_tick+0x1c5/0x210
> [<c0104079>] do_nmi+0x99/0x2a0
> [<c01035ee>] nmi_stack_correct+0x1d/0x22
> [<f8898933>] boomerang_interrupt+0x33/0x460 [3c59x]
> [<c01435d5>] note_interrupt+0x155/0x260
> [<c0143dfd>] handle_edge_irq+0x11d/0x150
> [<c010529f>] do_IRQ+0x4f/0xa0
> [<c0103462>] common_interrupt+0x1a/0x20
> [<f8895c46>] mdio_read+0xb6/0xf0 [3c59x]
> [<f885a2e1>] mii_link_ok+0x21/0x50 [mii]
> [<f885a3a6>] mii_check_media+0x36/0x200 [mii]
> [<f88956ab>] vortex_check_media+0x3b/0x90 [3c59x]
> [<f8895b1e>] vortex_timer+0x41e/0x490 [3c59x]
> [<c0128799>] run_timer_softirq+0xc9/0x1a0
> [<c0123ff4>] __do_softirq+0x74/0xf0
> [<c01240c5>] do_softirq+0x55/0x60
> [<c012422d>] irq_exit+0x4d/0x50
> [<c01100e4>] smp_apic_timer_interrupt+0x64/0x70
> [<c01034f3>] apic_timer_interrupt+0x1f/0x24
> [<c010160d>] cpu_idle+0x4d/0xb0
> [<c010f170>] start_secondary+0x450/0x500
> [<00000000>] 0x0
ouch. Then this problem is more urgent than i thought. We definitely
need the disable_irq_handler(irq, handler) infrastructure and it should
be used in the most common drivers affected (vortex/3c59x, forcedeth and
IDE basically).
Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch, -rc5-mm3] fix irqpoll some more
2006-06-06 14:59 ` Ingo Molnar
@ 2006-06-06 15:32 ` Steven Rostedt
0 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2006-06-06 15:32 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Alan Cox, Andrew Morton, arjan, linux-kernel
On Tue, 2006-06-06 at 16:59 +0200, Ingo Molnar wrote:
> *
>
> ouch. Then this problem is more urgent than i thought. We definitely
> need the disable_irq_handler(irq, handler) infrastructure and it should
> be used in the most common drivers affected (vortex/3c59x, forcedeth and
> IDE basically).
>
Ingo, did you take a look at my alternative? It postpones the unmasking
of the interrupt till the enable_irq is done. It also doesn't have the
need to update 300+ calls to disable/enable_irq.
Also, the disable_irq_handler doesn't solve the case if the misrouted
irq came from the device that its driver called the disable_irq_handler.
This doesn't affect irqpoll, but for those cases that have broken irq
setups needing irqfixup this can still allow an irq storm.
-- Steve
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2006-06-06 15:33 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200606050600.k5560GdU002338@shell0.pdx.osdl.net>
[not found] ` <1149497459.23209.39.camel@localhost.localdomain>
2006-06-05 8:49 ` [patch, -rc5-mm3] fix irqpoll some more Ingo Molnar
2006-06-05 13:05 ` Alan Cox
2006-06-05 12:52 ` Ingo Molnar
2006-06-06 13:25 ` Steven Rostedt
2006-06-06 13:53 ` Ingo Molnar
2006-06-06 14:40 ` Steven Rostedt
2006-06-06 14:59 ` Ingo Molnar
2006-06-06 15:32 ` Steven Rostedt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox