* [patch, -rc5-mm3] fix irqpoll some more [not found] ` <1149497459.23209.39.camel@localhost.localdomain> @ 2006-06-05 8:49 ` Ingo Molnar 2006-06-05 13:05 ` Alan Cox 2006-06-06 13:25 ` Steven Rostedt 0 siblings, 2 replies; 8+ messages in thread From: Ingo Molnar @ 2006-06-05 8:49 UTC (permalink / raw) To: Alan Cox, Andrew Morton; +Cc: arjan, linux-kernel * Alan Cox <alan@lxorguk.ukuu.org.uk> wrote: > Ar Sul, 2006-06-04 am 23:00 -0700, ysgrifennodd akpm@osdl.org: > > irqpoll/irqfixup ignored IRQ_DISABLED but that could cause lockups. So > > listen to desc->depth to correctly honor disable_irq(). Also, when an > > interrupt it screaming, set IRQ_DISABLED but do not touch ->depth. > > > > Signed-off-by: Ingo Molnar <mingo@elte.hu> > > Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> > > NAK > > The moment you do this you as good as lose irqpoll support because the > IRQ being disabled is unrelated to the IRQ delivery line when dealing > with faults of this nature. hm, agreed. I really sent it more as an RFC. The real fix is to do the disable_irq_handler()/enable_irq_handler() thing. [which is also a much nicer interface for more advanced MSI concepts] > There is a saner fix - walk all the IRQs that are not disabled and > only if that produces no claim (ie the box is about to die otherwise) > then walk the disabled ones. Its a slow path at that point in error > recovery so the performance isn't critical. yeah. How about the patch below? [builds and boots on x86, but no real irqpoll testing was done as i dont have such a problem-system.] Ingo --------------- Subject: fix irqpoll some more From: Ingo Molnar <mingo@elte.hu> implement Alan's suggestion of irqpoll doing a second pass over disabled irq lines if we didnt manage to handle the screaming interrupt. Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/irq/spurious.c | 24 ++++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-) Index: linux/kernel/irq/spurious.c =================================================================== --- linux.orig/kernel/irq/spurious.c +++ linux/kernel/irq/spurious.c @@ -18,8 +18,9 @@ static int irqfixup __read_mostly; */ static int misrouted_irq(int irq, struct pt_regs *regs) { - int i, ok = 0, work = 0; + int i, ok = 0, work = 0, first_pass = 1; +repeat: for (i = 1; i < NR_IRQS; i++) { struct irq_desc *desc = irq_desc + i; struct irqaction *action; @@ -61,7 +62,7 @@ static int misrouted_irq(int irq, struct * IRQ_DISABLED - which might have been set due to * a screaming interrupt. */ - if (desc->depth) { + if (first_pass && desc->depth) { spin_unlock(&desc->lock); continue; } @@ -90,6 +91,25 @@ static int misrouted_irq(int irq, struct desc->chip->end(i); spin_unlock(&desc->lock); } + /* + * HACK: + * + * In the first pass we dont touch handlers that are behind + * a disabled IRQ line. In the second pass (having no other + * choice) we ignore the disabled state of IRQ lines. We've + * got a screaming interrupt, so we have the choice between + * a real lockup happening due to that screaming interrupt, + * against a theoretical locking that becomes possible if we + * ignore a disabled IRQ line. + * + * FIXME: proper disable_irq_handler() API would remove the + * need for this hack. + */ + if (!ok && first_pass) { + first_pass = 0; + goto repeat; + } + /* So the caller can adjust the irq error counts */ return ok; } ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch, -rc5-mm3] fix irqpoll some more 2006-06-05 8:49 ` [patch, -rc5-mm3] fix irqpoll some more Ingo Molnar @ 2006-06-05 13:05 ` Alan Cox 2006-06-05 12:52 ` Ingo Molnar 2006-06-06 13:25 ` Steven Rostedt 1 sibling, 1 reply; 8+ messages in thread From: Alan Cox @ 2006-06-05 13:05 UTC (permalink / raw) To: Ingo Molnar; +Cc: Andrew Morton, arjan, linux-kernel Unfortunately we've got a load of handlers that just blindly say "Yes I handled something" so they bogusly cause completion to be assumed. We'd actually have to know if the IRQ source had "gone away" on the chip I fear. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch, -rc5-mm3] fix irqpoll some more 2006-06-05 13:05 ` Alan Cox @ 2006-06-05 12:52 ` Ingo Molnar 0 siblings, 0 replies; 8+ messages in thread From: Ingo Molnar @ 2006-06-05 12:52 UTC (permalink / raw) To: Alan Cox; +Cc: Andrew Morton, arjan, linux-kernel * Alan Cox <alan@lxorguk.ukuu.org.uk> wrote: > Unfortunately we've got a load of handlers that just blindly say "Yes > I handled something" so they bogusly cause completion to be assumed. > > We'd actually have to know if the IRQ source had "gone away" on the > chip I fear. ok. Then lets keep what we have right now, and first do the disable/enable_irq_handler() API, and once the commonly used drivers are covered by the new disable/enable API, apply the unconditional desc->depth check to irqpoll? Ingo ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch, -rc5-mm3] fix irqpoll some more 2006-06-05 8:49 ` [patch, -rc5-mm3] fix irqpoll some more Ingo Molnar 2006-06-05 13:05 ` Alan Cox @ 2006-06-06 13:25 ` Steven Rostedt 2006-06-06 13:53 ` Ingo Molnar 1 sibling, 1 reply; 8+ messages in thread From: Steven Rostedt @ 2006-06-06 13:25 UTC (permalink / raw) To: Ingo Molnar; +Cc: Alan Cox, Andrew Morton, arjan, linux-kernel On Mon, 2006-06-05 at 10:49 +0200, Ingo Molnar wrote: > + /* > + * HACK: > + * > + * In the first pass we dont touch handlers that are behind > + * a disabled IRQ line. In the second pass (having no other > + * choice) we ignore the disabled state of IRQ lines. We've > + * got a screaming interrupt, so we have the choice between > + * a real lockup happening due to that screaming interrupt, > + * against a theoretical locking that becomes possible if we > + * ignore a disabled IRQ line. FYI, with irqpoll on in my i386 SMP machine, I hit this theoretical locking every time in the vortex driver. -- Steve > + * > + * FIXME: proper disable_irq_handler() API would remove the > + * need for this hack. > + */ > + if (!ok && first_pass) { > + first_pass = 0; > + goto repeat; > + } > + > /* So the caller can adjust the irq error counts */ > return ok; > } ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch, -rc5-mm3] fix irqpoll some more 2006-06-06 13:25 ` Steven Rostedt @ 2006-06-06 13:53 ` Ingo Molnar 2006-06-06 14:40 ` Steven Rostedt 0 siblings, 1 reply; 8+ messages in thread From: Ingo Molnar @ 2006-06-06 13:53 UTC (permalink / raw) To: Steven Rostedt; +Cc: Alan Cox, Andrew Morton, arjan, linux-kernel * Steven Rostedt <rostedt@goodmis.org> wrote: > On Mon, 2006-06-05 at 10:49 +0200, Ingo Molnar wrote: > > > + /* > > + * HACK: > > + * > > + * In the first pass we dont touch handlers that are behind > > + * a disabled IRQ line. In the second pass (having no other > > + * choice) we ignore the disabled state of IRQ lines. We've > > + * got a screaming interrupt, so we have the choice between > > + * a real lockup happening due to that screaming interrupt, > > + * against a theoretical locking that becomes possible if we > > + * ignore a disabled IRQ line. > > FYI, with irqpoll on in my i386 SMP machine, I hit this theoretical > locking every time in the vortex driver. you got a lockup/deadlock every time, due to the bug that the lock validator pointed out? Ingo ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch, -rc5-mm3] fix irqpoll some more 2006-06-06 13:53 ` Ingo Molnar @ 2006-06-06 14:40 ` Steven Rostedt 2006-06-06 14:59 ` Ingo Molnar 0 siblings, 1 reply; 8+ messages in thread From: Steven Rostedt @ 2006-06-06 14:40 UTC (permalink / raw) To: Ingo Molnar; +Cc: Alan Cox, Andrew Morton, arjan, linux-kernel On Tue, 2006-06-06 at 15:53 +0200, Ingo Molnar wrote: > * Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Mon, 2006-06-05 at 10:49 +0200, Ingo Molnar wrote: > > > > > + /* > > > + * HACK: > > > + * > > > + * In the first pass we dont touch handlers that are behind > > > + * a disabled IRQ line. In the second pass (having no other > > > + * choice) we ignore the disabled state of IRQ lines. We've > > > + * got a screaming interrupt, so we have the choice between > > > + * a real lockup happening due to that screaming interrupt, > > > + * against a theoretical locking that becomes possible if we > > > + * ignore a disabled IRQ line. > > > > FYI, with irqpoll on in my i386 SMP machine, I hit this theoretical > > locking every time in the vortex driver. > > you got a lockup/deadlock every time, due to the bug that the lock > validator pointed out? > The one with the spin_lock being used outside of interrupts disabled? Yep. But remember, this only happens with irqpoll where it doesn't honor disabled interrupts, so this was expected behavior. Here's the back trace. I had to disable your lockdep to turn NMI back on. [<c0103d37>] show_stack_log_lvl+0xa7/0xf0 [<c0103f40>] show_registers+0x1c0/0x260 [<c0104357>] die_nmi+0xd7/0x160 [<c0110835>] nmi_watchdog_tick+0x1c5/0x210 [<c0104079>] do_nmi+0x99/0x2a0 [<c01035ee>] nmi_stack_correct+0x1d/0x22 [<f8898933>] boomerang_interrupt+0x33/0x460 [3c59x] [<c01435d5>] note_interrupt+0x155/0x260 [<c0143dfd>] handle_edge_irq+0x11d/0x150 [<c010529f>] do_IRQ+0x4f/0xa0 [<c0103462>] common_interrupt+0x1a/0x20 [<f8895c46>] mdio_read+0xb6/0xf0 [3c59x] [<f885a2e1>] mii_link_ok+0x21/0x50 [mii] [<f885a3a6>] mii_check_media+0x36/0x200 [mii] [<f88956ab>] vortex_check_media+0x3b/0x90 [3c59x] [<f8895b1e>] vortex_timer+0x41e/0x490 [3c59x] [<c0128799>] run_timer_softirq+0xc9/0x1a0 [<c0123ff4>] __do_softirq+0x74/0xf0 [<c01240c5>] do_softirq+0x55/0x60 [<c012422d>] irq_exit+0x4d/0x50 [<c01100e4>] smp_apic_timer_interrupt+0x64/0x70 [<c01034f3>] apic_timer_interrupt+0x1f/0x24 [<c010160d>] cpu_idle+0x4d/0xb0 [<c010f170>] start_secondary+0x450/0x500 [<00000000>] 0x0 -- Steve ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch, -rc5-mm3] fix irqpoll some more 2006-06-06 14:40 ` Steven Rostedt @ 2006-06-06 14:59 ` Ingo Molnar 2006-06-06 15:32 ` Steven Rostedt 0 siblings, 1 reply; 8+ messages in thread From: Ingo Molnar @ 2006-06-06 14:59 UTC (permalink / raw) To: Steven Rostedt; +Cc: Alan Cox, Andrew Morton, arjan, linux-kernel * Steven Rostedt <rostedt@goodmis.org> wrote: > > you got a lockup/deadlock every time, due to the bug that the lock > > validator pointed out? > > > > The one with the spin_lock being used outside of interrupts disabled? > Yep. But remember, this only happens with irqpoll where it doesn't > honor disabled interrupts, so this was expected behavior. > > Here's the back trace. I had to disable your lockdep to turn NMI back > on. > > [<c0103d37>] show_stack_log_lvl+0xa7/0xf0 > [<c0103f40>] show_registers+0x1c0/0x260 > [<c0104357>] die_nmi+0xd7/0x160 > [<c0110835>] nmi_watchdog_tick+0x1c5/0x210 > [<c0104079>] do_nmi+0x99/0x2a0 > [<c01035ee>] nmi_stack_correct+0x1d/0x22 > [<f8898933>] boomerang_interrupt+0x33/0x460 [3c59x] > [<c01435d5>] note_interrupt+0x155/0x260 > [<c0143dfd>] handle_edge_irq+0x11d/0x150 > [<c010529f>] do_IRQ+0x4f/0xa0 > [<c0103462>] common_interrupt+0x1a/0x20 > [<f8895c46>] mdio_read+0xb6/0xf0 [3c59x] > [<f885a2e1>] mii_link_ok+0x21/0x50 [mii] > [<f885a3a6>] mii_check_media+0x36/0x200 [mii] > [<f88956ab>] vortex_check_media+0x3b/0x90 [3c59x] > [<f8895b1e>] vortex_timer+0x41e/0x490 [3c59x] > [<c0128799>] run_timer_softirq+0xc9/0x1a0 > [<c0123ff4>] __do_softirq+0x74/0xf0 > [<c01240c5>] do_softirq+0x55/0x60 > [<c012422d>] irq_exit+0x4d/0x50 > [<c01100e4>] smp_apic_timer_interrupt+0x64/0x70 > [<c01034f3>] apic_timer_interrupt+0x1f/0x24 > [<c010160d>] cpu_idle+0x4d/0xb0 > [<c010f170>] start_secondary+0x450/0x500 > [<00000000>] 0x0 ouch. Then this problem is more urgent than i thought. We definitely need the disable_irq_handler(irq, handler) infrastructure and it should be used in the most common drivers affected (vortex/3c59x, forcedeth and IDE basically). Ingo ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch, -rc5-mm3] fix irqpoll some more 2006-06-06 14:59 ` Ingo Molnar @ 2006-06-06 15:32 ` Steven Rostedt 0 siblings, 0 replies; 8+ messages in thread From: Steven Rostedt @ 2006-06-06 15:32 UTC (permalink / raw) To: Ingo Molnar; +Cc: Alan Cox, Andrew Morton, arjan, linux-kernel On Tue, 2006-06-06 at 16:59 +0200, Ingo Molnar wrote: > * > > ouch. Then this problem is more urgent than i thought. We definitely > need the disable_irq_handler(irq, handler) infrastructure and it should > be used in the most common drivers affected (vortex/3c59x, forcedeth and > IDE basically). > Ingo, did you take a look at my alternative? It postpones the unmasking of the interrupt till the enable_irq is done. It also doesn't have the need to update 300+ calls to disable/enable_irq. Also, the disable_irq_handler doesn't solve the case if the misrouted irq came from the device that its driver called the disable_irq_handler. This doesn't affect irqpoll, but for those cases that have broken irq setups needing irqfixup this can still allow an irq storm. -- Steve ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2006-06-06 15:33 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200606050600.k5560GdU002338@shell0.pdx.osdl.net>
[not found] ` <1149497459.23209.39.camel@localhost.localdomain>
2006-06-05 8:49 ` [patch, -rc5-mm3] fix irqpoll some more Ingo Molnar
2006-06-05 13:05 ` Alan Cox
2006-06-05 12:52 ` Ingo Molnar
2006-06-06 13:25 ` Steven Rostedt
2006-06-06 13:53 ` Ingo Molnar
2006-06-06 14:40 ` Steven Rostedt
2006-06-06 14:59 ` Ingo Molnar
2006-06-06 15:32 ` Steven Rostedt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox