public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [patch, -rc5-mm3] fix irqpoll some more
       [not found] ` <1149497459.23209.39.camel@localhost.localdomain>
@ 2006-06-05  8:49   ` Ingo Molnar
  2006-06-05 13:05     ` Alan Cox
  2006-06-06 13:25     ` Steven Rostedt
  0 siblings, 2 replies; 8+ messages in thread
From: Ingo Molnar @ 2006-06-05  8:49 UTC (permalink / raw)
  To: Alan Cox, Andrew Morton; +Cc: arjan, linux-kernel


* Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> Ar Sul, 2006-06-04 am 23:00 -0700, ysgrifennodd akpm@osdl.org:
> > irqpoll/irqfixup ignored IRQ_DISABLED but that could cause lockups.  So
> > listen to desc->depth to correctly honor disable_irq().  Also, when an
> > interrupt it screaming, set IRQ_DISABLED but do not touch ->depth.
> > 
> > Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
> 
> NAK
> 
> The moment you do this you as good as lose irqpoll support because the 
> IRQ being disabled is unrelated to the IRQ delivery line when dealing 
> with faults of this nature.

hm, agreed. I really sent it more as an RFC. The real fix is to do the 
disable_irq_handler()/enable_irq_handler() thing. [which is also a much 
nicer interface for more advanced MSI concepts]

> There is a saner fix - walk all the IRQs that are not disabled and 
> only if that produces no claim (ie the box is about to die otherwise) 
> then walk the disabled ones. Its a slow path at that point in error 
> recovery so the performance isn't critical.

yeah. How about the patch below? [builds and boots on x86, but no real 
irqpoll testing was done as i dont have such a problem-system.]

	Ingo

---------------
Subject: fix irqpoll some more
From: Ingo Molnar <mingo@elte.hu>

implement Alan's suggestion of irqpoll doing a second pass over
disabled irq lines if we didnt manage to handle the screaming
interrupt.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/irq/spurious.c |   24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

Index: linux/kernel/irq/spurious.c
===================================================================
--- linux.orig/kernel/irq/spurious.c
+++ linux/kernel/irq/spurious.c
@@ -18,8 +18,9 @@ static int irqfixup __read_mostly;
  */
 static int misrouted_irq(int irq, struct pt_regs *regs)
 {
-	int i, ok = 0, work = 0;
+	int i, ok = 0, work = 0, first_pass = 1;
 
+repeat:
 	for (i = 1; i < NR_IRQS; i++) {
 		struct irq_desc *desc = irq_desc + i;
 		struct irqaction *action;
@@ -61,7 +62,7 @@ static int misrouted_irq(int irq, struct
 		 * IRQ_DISABLED - which might have been set due to
 		 * a screaming interrupt.
 		 */
-		if (desc->depth) {
+		if (first_pass && desc->depth) {
 			spin_unlock(&desc->lock);
 			continue;
 		}
@@ -90,6 +91,25 @@ static int misrouted_irq(int irq, struct
 			desc->chip->end(i);
 		spin_unlock(&desc->lock);
 	}
+	/*
+	 * HACK:
+	 *
+	 * In the first pass we dont touch handlers that are behind
+	 * a disabled IRQ line. In the second pass (having no other
+	 * choice) we ignore the disabled state of IRQ lines. We've
+	 * got a screaming interrupt, so we have the choice between
+	 * a real lockup happening due to that screaming interrupt,
+	 * against a theoretical locking that becomes possible if we
+	 * ignore a disabled IRQ line.
+	 *
+	 * FIXME: proper disable_irq_handler() API would remove the
+	 * need for this hack.
+	 */
+	if (!ok && first_pass) {
+		first_pass = 0;
+		goto repeat;
+	}
+
 	/* So the caller can adjust the irq error counts */
 	return ok;
 }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch, -rc5-mm3] fix irqpoll some more
  2006-06-05 13:05     ` Alan Cox
@ 2006-06-05 12:52       ` Ingo Molnar
  0 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2006-06-05 12:52 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andrew Morton, arjan, linux-kernel


* Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> Unfortunately we've got a load of handlers that just blindly say "Yes 
> I handled something" so they bogusly cause completion to be assumed.
> 
> We'd actually have to know if the IRQ source had "gone away" on the 
> chip I fear.

ok. Then lets keep what we have right now, and first do the 
disable/enable_irq_handler() API, and once the commonly used drivers are 
covered by the new disable/enable API, apply the unconditional 
desc->depth check to irqpoll?

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch, -rc5-mm3] fix irqpoll some more
  2006-06-05  8:49   ` [patch, -rc5-mm3] fix irqpoll some more Ingo Molnar
@ 2006-06-05 13:05     ` Alan Cox
  2006-06-05 12:52       ` Ingo Molnar
  2006-06-06 13:25     ` Steven Rostedt
  1 sibling, 1 reply; 8+ messages in thread
From: Alan Cox @ 2006-06-05 13:05 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, arjan, linux-kernel

Unfortunately we've got a load of handlers that just blindly say "Yes I
handled something" so they bogusly cause completion to be assumed.

We'd actually have to know if the IRQ source had "gone away" on the chip
I fear.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch, -rc5-mm3] fix irqpoll some more
  2006-06-05  8:49   ` [patch, -rc5-mm3] fix irqpoll some more Ingo Molnar
  2006-06-05 13:05     ` Alan Cox
@ 2006-06-06 13:25     ` Steven Rostedt
  2006-06-06 13:53       ` Ingo Molnar
  1 sibling, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2006-06-06 13:25 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Alan Cox, Andrew Morton, arjan, linux-kernel

On Mon, 2006-06-05 at 10:49 +0200, Ingo Molnar wrote:

> +	/*
> +	 * HACK:
> +	 *
> +	 * In the first pass we dont touch handlers that are behind
> +	 * a disabled IRQ line. In the second pass (having no other
> +	 * choice) we ignore the disabled state of IRQ lines. We've
> +	 * got a screaming interrupt, so we have the choice between
> +	 * a real lockup happening due to that screaming interrupt,
> +	 * against a theoretical locking that becomes possible if we
> +	 * ignore a disabled IRQ line.

FYI,  with irqpoll on in my i386 SMP machine, I hit this theoretical
locking every time in the vortex driver.

-- Steve

> +	 *
> +	 * FIXME: proper disable_irq_handler() API would remove the
> +	 * need for this hack.
> +	 */
> +	if (!ok && first_pass) {
> +		first_pass = 0;
> +		goto repeat;
> +	}
> +
>  	/* So the caller can adjust the irq error counts */
>  	return ok;
>  }



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch, -rc5-mm3] fix irqpoll some more
  2006-06-06 13:25     ` Steven Rostedt
@ 2006-06-06 13:53       ` Ingo Molnar
  2006-06-06 14:40         ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2006-06-06 13:53 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Alan Cox, Andrew Morton, arjan, linux-kernel


* Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 2006-06-05 at 10:49 +0200, Ingo Molnar wrote:
> 
> > +	/*
> > +	 * HACK:
> > +	 *
> > +	 * In the first pass we dont touch handlers that are behind
> > +	 * a disabled IRQ line. In the second pass (having no other
> > +	 * choice) we ignore the disabled state of IRQ lines. We've
> > +	 * got a screaming interrupt, so we have the choice between
> > +	 * a real lockup happening due to that screaming interrupt,
> > +	 * against a theoretical locking that becomes possible if we
> > +	 * ignore a disabled IRQ line.
> 
> FYI, with irqpoll on in my i386 SMP machine, I hit this theoretical 
> locking every time in the vortex driver.

you got a lockup/deadlock every time, due to the bug that the lock 
validator pointed out?

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch, -rc5-mm3] fix irqpoll some more
  2006-06-06 13:53       ` Ingo Molnar
@ 2006-06-06 14:40         ` Steven Rostedt
  2006-06-06 14:59           ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2006-06-06 14:40 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Alan Cox, Andrew Morton, arjan, linux-kernel

On Tue, 2006-06-06 at 15:53 +0200, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Mon, 2006-06-05 at 10:49 +0200, Ingo Molnar wrote:
> > 
> > > +	/*
> > > +	 * HACK:
> > > +	 *
> > > +	 * In the first pass we dont touch handlers that are behind
> > > +	 * a disabled IRQ line. In the second pass (having no other
> > > +	 * choice) we ignore the disabled state of IRQ lines. We've
> > > +	 * got a screaming interrupt, so we have the choice between
> > > +	 * a real lockup happening due to that screaming interrupt,
> > > +	 * against a theoretical locking that becomes possible if we
> > > +	 * ignore a disabled IRQ line.
> > 
> > FYI, with irqpoll on in my i386 SMP machine, I hit this theoretical 
> > locking every time in the vortex driver.
> 
> you got a lockup/deadlock every time, due to the bug that the lock 
> validator pointed out?
> 

The one with the spin_lock being used outside of interrupts disabled?
Yep.  But remember, this only happens with irqpoll where it doesn't
honor disabled interrupts, so this was expected behavior.

Here's the back trace. I had to disable your lockdep to turn NMI back
on.

 [<c0103d37>] show_stack_log_lvl+0xa7/0xf0
 [<c0103f40>] show_registers+0x1c0/0x260
 [<c0104357>] die_nmi+0xd7/0x160
 [<c0110835>] nmi_watchdog_tick+0x1c5/0x210
 [<c0104079>] do_nmi+0x99/0x2a0
 [<c01035ee>] nmi_stack_correct+0x1d/0x22
 [<f8898933>] boomerang_interrupt+0x33/0x460 [3c59x]
 [<c01435d5>] note_interrupt+0x155/0x260
 [<c0143dfd>] handle_edge_irq+0x11d/0x150
 [<c010529f>] do_IRQ+0x4f/0xa0
 [<c0103462>] common_interrupt+0x1a/0x20
 [<f8895c46>] mdio_read+0xb6/0xf0 [3c59x]
 [<f885a2e1>] mii_link_ok+0x21/0x50 [mii]
 [<f885a3a6>] mii_check_media+0x36/0x200 [mii]
 [<f88956ab>] vortex_check_media+0x3b/0x90 [3c59x]
 [<f8895b1e>] vortex_timer+0x41e/0x490 [3c59x]
 [<c0128799>] run_timer_softirq+0xc9/0x1a0
 [<c0123ff4>] __do_softirq+0x74/0xf0
 [<c01240c5>] do_softirq+0x55/0x60
 [<c012422d>] irq_exit+0x4d/0x50
 [<c01100e4>] smp_apic_timer_interrupt+0x64/0x70
 [<c01034f3>] apic_timer_interrupt+0x1f/0x24
 [<c010160d>] cpu_idle+0x4d/0xb0
 [<c010f170>] start_secondary+0x450/0x500
 [<00000000>] 0x0


-- Steve



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch, -rc5-mm3] fix irqpoll some more
  2006-06-06 14:40         ` Steven Rostedt
@ 2006-06-06 14:59           ` Ingo Molnar
  2006-06-06 15:32             ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2006-06-06 14:59 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Alan Cox, Andrew Morton, arjan, linux-kernel


* Steven Rostedt <rostedt@goodmis.org> wrote:

> > you got a lockup/deadlock every time, due to the bug that the lock 
> > validator pointed out?
> > 
> 
> The one with the spin_lock being used outside of interrupts disabled? 
> Yep.  But remember, this only happens with irqpoll where it doesn't 
> honor disabled interrupts, so this was expected behavior.
> 
> Here's the back trace. I had to disable your lockdep to turn NMI back 
> on.
> 
>  [<c0103d37>] show_stack_log_lvl+0xa7/0xf0
>  [<c0103f40>] show_registers+0x1c0/0x260
>  [<c0104357>] die_nmi+0xd7/0x160
>  [<c0110835>] nmi_watchdog_tick+0x1c5/0x210
>  [<c0104079>] do_nmi+0x99/0x2a0
>  [<c01035ee>] nmi_stack_correct+0x1d/0x22
>  [<f8898933>] boomerang_interrupt+0x33/0x460 [3c59x]
>  [<c01435d5>] note_interrupt+0x155/0x260
>  [<c0143dfd>] handle_edge_irq+0x11d/0x150
>  [<c010529f>] do_IRQ+0x4f/0xa0
>  [<c0103462>] common_interrupt+0x1a/0x20
>  [<f8895c46>] mdio_read+0xb6/0xf0 [3c59x]
>  [<f885a2e1>] mii_link_ok+0x21/0x50 [mii]
>  [<f885a3a6>] mii_check_media+0x36/0x200 [mii]
>  [<f88956ab>] vortex_check_media+0x3b/0x90 [3c59x]
>  [<f8895b1e>] vortex_timer+0x41e/0x490 [3c59x]
>  [<c0128799>] run_timer_softirq+0xc9/0x1a0
>  [<c0123ff4>] __do_softirq+0x74/0xf0
>  [<c01240c5>] do_softirq+0x55/0x60
>  [<c012422d>] irq_exit+0x4d/0x50
>  [<c01100e4>] smp_apic_timer_interrupt+0x64/0x70
>  [<c01034f3>] apic_timer_interrupt+0x1f/0x24
>  [<c010160d>] cpu_idle+0x4d/0xb0
>  [<c010f170>] start_secondary+0x450/0x500
>  [<00000000>] 0x0

ouch. Then this problem is more urgent than i thought. We definitely 
need the disable_irq_handler(irq, handler) infrastructure and it should 
be used in the most common drivers affected (vortex/3c59x, forcedeth and 
IDE basically).

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch, -rc5-mm3] fix irqpoll some more
  2006-06-06 14:59           ` Ingo Molnar
@ 2006-06-06 15:32             ` Steven Rostedt
  0 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2006-06-06 15:32 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Alan Cox, Andrew Morton, arjan, linux-kernel

On Tue, 2006-06-06 at 16:59 +0200, Ingo Molnar wrote:
> *
> 
> ouch. Then this problem is more urgent than i thought. We definitely 
> need the disable_irq_handler(irq, handler) infrastructure and it should 
> be used in the most common drivers affected (vortex/3c59x, forcedeth and 
> IDE basically).
> 

Ingo, did you take a look at my alternative?  It postpones the unmasking
of the interrupt till the enable_irq is done.  It also doesn't have the
need to update 300+ calls to disable/enable_irq.

Also, the disable_irq_handler doesn't solve the case if the misrouted
irq came from the device that its driver called the disable_irq_handler.
This doesn't affect irqpoll, but for those cases that have broken irq
setups needing irqfixup this can still allow an irq storm.

-- Steve



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2006-06-06 15:33 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <200606050600.k5560GdU002338@shell0.pdx.osdl.net>
     [not found] ` <1149497459.23209.39.camel@localhost.localdomain>
2006-06-05  8:49   ` [patch, -rc5-mm3] fix irqpoll some more Ingo Molnar
2006-06-05 13:05     ` Alan Cox
2006-06-05 12:52       ` Ingo Molnar
2006-06-06 13:25     ` Steven Rostedt
2006-06-06 13:53       ` Ingo Molnar
2006-06-06 14:40         ` Steven Rostedt
2006-06-06 14:59           ` Ingo Molnar
2006-06-06 15:32             ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox