* [RFC] [PATCH] Fix misrouted interrupts deadlocks
@ 2006-11-10 13:55 Pavel Emelianov
2006-11-10 14:12 ` Ingo Molnar
2006-11-20 19:23 ` Vivek Goyal
0 siblings, 2 replies; 8+ messages in thread
From: Pavel Emelianov @ 2006-11-10 13:55 UTC (permalink / raw)
To: Andrew Morton, mingo, Linux Kernel Mailing List; +Cc: Kirill Korotaev
[-- Attachment #1: Type: text/plain, Size: 1221 bytes --]
While testing kernel on machine with "irqpoll" option
I've caught such a lockup:
__do_IRQ()
spin_lock(&desc->lock);
desc->chip->ack(); /* IRQ is ACKed */
note_interrupt()
misrouted_irq()
handle_IRQ_event()
if (...)
local_irq_enable_in_hardirq();
/* interrupts are enabled from now */
...
__do_IRQ() /* same IRQ we've started from */
spin_lock(&desc->lock); /* LOCKUP */
Looking at misrouted_irq() code I've found that a potential
deadlock like this can also take place:
1CPU:
__do_IRQ()
spin_lock(&desc->lock); /* irq = A */
misrouted_irq()
for (i = 1; i < NR_IRQS; i++) {
spin_lock(&desc->lock); /* irq = B */
if (desc->status & IRQ_INPROGRESS) {
2CPU:
__do_IRQ()
spin_lock(&desc->lock); /* irq = B */
misrouted_irq()
for (i = 1; i < NR_IRQS; i++) {
spin_lock(&desc->lock); /* irq = A */
if (desc->status & IRQ_INPROGRESS) {
As the second lock on booth CPUs is taken before checking that
this irq is being handled in another processor this may cause
a deadlock. This issue is only theoretical.
I propose the attached patch to fix booth problems: when trying
to handle misrouted IRQ active desc->lock may be unlocked.
Please comment.
[-- Attachment #2: diff-misrouted-irq-lockup --]
[-- Type: text/plain, Size: 536 bytes --]
--- ./kernel/irq/spurious.c.irqlockup 2006-11-09 11:19:10.000000000 +0300
+++ ./kernel/irq/spurious.c 2006-11-10 16:53:38.000000000 +0300
@@ -147,7 +147,11 @@ void note_interrupt(unsigned int irq, st
if (unlikely(irqfixup)) {
/* Don't punish working computers */
if ((irqfixup == 2 && irq == 0) || action_ret == IRQ_NONE) {
- int ok = misrouted_irq(irq);
+ int ok;
+
+ spin_unlock(&desc->lock);
+ ok = misrouted_irq(irq);
+ spin_lock(&desc->lock);
if (action_ret == IRQ_NONE)
desc->irqs_unhandled -= ok;
}
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [RFC] [PATCH] Fix misrouted interrupts deadlocks 2006-11-10 13:55 [RFC] [PATCH] Fix misrouted interrupts deadlocks Pavel Emelianov @ 2006-11-10 14:12 ` Ingo Molnar 2006-11-10 14:31 ` Pavel Emelianov 2006-11-20 19:23 ` Vivek Goyal 1 sibling, 1 reply; 8+ messages in thread From: Ingo Molnar @ 2006-11-10 14:12 UTC (permalink / raw) To: Pavel Emelianov Cc: Andrew Morton, Linux Kernel Mailing List, mingo, Kirill Korotaev On Fri, 2006-11-10 at 16:55 +0300, Pavel Emelianov wrote: > - int ok = misrouted_irq(irq); > + int ok; > + > + spin_unlock(&desc->lock); > + ok = misrouted_irq(irq); > + spin_lock(&desc->lock); your fix looks reasonable to me - it's a thinko to call misrouted_irq() with the descriptor lock still held. (btw., how did you find it - lockdep spinlock debugging or NMI watchdog?) Acked-by: Ingo Molnar <mingo@redhat.com> Ingo ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC] [PATCH] Fix misrouted interrupts deadlocks 2006-11-10 14:12 ` Ingo Molnar @ 2006-11-10 14:31 ` Pavel Emelianov 0 siblings, 0 replies; 8+ messages in thread From: Pavel Emelianov @ 2006-11-10 14:31 UTC (permalink / raw) To: Ingo Molnar Cc: Andrew Morton, Linux Kernel Mailing List, mingo, Kirill Korotaev Ingo Molnar wrote: > On Fri, 2006-11-10 at 16:55 +0300, Pavel Emelianov wrote: >> - int ok = misrouted_irq(irq); >> + int ok; >> + >> + spin_unlock(&desc->lock); >> + ok = misrouted_irq(irq); >> + spin_lock(&desc->lock); > > your fix looks reasonable to me - it's a thinko to call misrouted_irq() > with the descriptor lock still held. (btw., how did you find it - > lockdep spinlock debugging or NMI watchdog?) It was NMI watchdog who reported the deadlock. With lockdep turned on it wouldn't be caught - local_irq_enable_in_hardirq() is nothing but a "do { } while (0)" if CONFIG_LOCKDEP=y :) The second issue (with 2 cpus involved) was discovered by code examining. > Acked-by: Ingo Molnar <mingo@redhat.com> > > Ingo > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC] [PATCH] Fix misrouted interrupts deadlocks 2006-11-10 13:55 [RFC] [PATCH] Fix misrouted interrupts deadlocks Pavel Emelianov 2006-11-10 14:12 ` Ingo Molnar @ 2006-11-20 19:23 ` Vivek Goyal 2006-11-20 19:56 ` Vivek Goyal 1 sibling, 1 reply; 8+ messages in thread From: Vivek Goyal @ 2006-11-20 19:23 UTC (permalink / raw) To: Pavel Emelianov Cc: Andrew Morton, mingo, Linux Kernel Mailing List, Kirill Korotaev On Fri, Nov 10, 2006 at 04:55:48PM +0300, Pavel Emelianov wrote: > As the second lock on booth CPUs is taken before checking that > this irq is being handled in another processor this may cause > a deadlock. This issue is only theoretical. > > I propose the attached patch to fix booth problems: when trying > to handle misrouted IRQ active desc->lock may be unlocked. > > Please comment. > --- ./kernel/irq/spurious.c.irqlockup 2006-11-09 11:19:10.000000000 +0300 > +++ ./kernel/irq/spurious.c 2006-11-10 16:53:38.000000000 +0300 > @@ -147,7 +147,11 @@ void note_interrupt(unsigned int irq, st > if (unlikely(irqfixup)) { > /* Don't punish working computers */ > if ((irqfixup == 2 && irq == 0) || action_ret == IRQ_NONE) { > - int ok = misrouted_irq(irq); > + int ok; > + > + spin_unlock(&desc->lock); > + ok = misrouted_irq(irq); > + spin_lock(&desc->lock); Hi Pavel, Till -rc5, I was able to boot a kernel with irqpoll option and in -rc6 I can't. It simply hangs. I suspect it is realted to this change. I have yet to confirm that. But before that one observation. Not at every place note_interrupt() is called with desc->lock() held. For example, handle_level_irq(). I enabled spin lock debugging and I run into following BUG(). PID hash table entries: 256 (order: 8, 2048 bytes) time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer. time.c: Detected 3000.218 MHz processor. ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- swapper/0 is trying to release lock (&irq_desc_lock_class) at: [<ffffffff8104c673>] note_interrupt+0x7a/0x22b but there are no more locks to release! other info that might help us debug this: no locks held by swapper/0. stack backtrace: Call Trace: [<ffffffff8100a6f9>] show_trace+0x34/0x47 [<ffffffff8100a71e>] dump_stack+0x12/0x17 [<ffffffff8103caba>] print_unlock_inbalance_bug+0xfb/0x106 [<ffffffff8103e6e5>] lock_release+0x89/0x128 [<ffffffff81332d96>] _spin_unlock+0x17/0x20 [<ffffffff8104c673>] note_interrupt+0x7a/0x22b [<ffffffff8104d131>] handle_level_irq+0xab/0xea [<ffffffff8100b776>] do_IRQ+0xf4/0x132 [<ffffffff81009956>] ret_from_intr+0x0/0xf DWARF2 unwinder stuck at ret_from_intr+0x0/0xf Leftover inexact backtrace: <IRQ> <EOI> [<ffffffff8159f61d>] start_kernel+0x178/0x2f6 [<ffffffff8159f625>] start_kernel+0x180/0x2f6 [<ffffffff8159f61d>] start_kernel+0x178/0x2f6 [<ffffffff8159f13e>] _sinittext+0x13e/0x142 BUG: spinlock lockup on CPU#0, swapper/0, ffffffff81586140 Call Trace: [<ffffffff8100a6f9>] show_trace+0x34/0x47 [<ffffffff8100a71e>] dump_stack+0x12/0x17 [<ffffffff811457c8>] _raw_spin_lock+0xca/0xe8 [<ffffffff8104d139>] handle_level_irq+0xb3/0xea [<ffffffff8100b776>] do_IRQ+0xf4/0x132 [<ffffffff81009956>] ret_from_intr+0x0/0xf DWARF2 unwinder stuck at ret_from_intr+0x0/0xf Leftover inexact backtrace: <IRQ> <EOI> [<ffffffff8159f61d>] start_kernel+0x178/0x2f6 [<ffffffff8159f625>] start_kernel+0x180/0x2f6 [<ffffffff8159f61d>] start_kernel+0x178/0x2f6 [<ffffffff8159f13e>] _sinittext+0x13e/0x142 Thanks Vivek ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC] [PATCH] Fix misrouted interrupts deadlocks 2006-11-20 19:23 ` Vivek Goyal @ 2006-11-20 19:56 ` Vivek Goyal 2006-11-20 20:52 ` Vivek Goyal ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Vivek Goyal @ 2006-11-20 19:56 UTC (permalink / raw) To: Pavel Emelianov Cc: Andrew Morton, mingo, Linux Kernel Mailing List, Kirill Korotaev On Mon, Nov 20, 2006 at 02:23:35PM -0500, Vivek Goyal wrote: > On Fri, Nov 10, 2006 at 04:55:48PM +0300, Pavel Emelianov wrote: > > As the second lock on booth CPUs is taken before checking that > > this irq is being handled in another processor this may cause > > a deadlock. This issue is only theoretical. > > > > I propose the attached patch to fix booth problems: when trying > > to handle misrouted IRQ active desc->lock may be unlocked. > > > > Please comment. > > > --- ./kernel/irq/spurious.c.irqlockup 2006-11-09 11:19:10.000000000 +0300 > > +++ ./kernel/irq/spurious.c 2006-11-10 16:53:38.000000000 +0300 > > @@ -147,7 +147,11 @@ void note_interrupt(unsigned int irq, st > > if (unlikely(irqfixup)) { > > /* Don't punish working computers */ > > if ((irqfixup == 2 && irq == 0) || action_ret == IRQ_NONE) { > > - int ok = misrouted_irq(irq); > > + int ok; > > + > > + spin_unlock(&desc->lock); > > + ok = misrouted_irq(irq); > > + spin_lock(&desc->lock); > > Hi Pavel, > > Till -rc5, I was able to boot a kernel with irqpoll option and in -rc6 I > can't. It simply hangs. I suspect it is realted to this change. I have yet > to confirm that. But before that one observation. > Hi Pavel, If I backout your changes, everything works fine. So it looks like that the problem I am facing is because of your patch but I don't have a logical explanation yet that why the problem is there. Just realasing a lock which is not currently acquired should not hang the system? Thanks Vivek ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC] [PATCH] Fix misrouted interrupts deadlocks 2006-11-20 19:56 ` Vivek Goyal @ 2006-11-20 20:52 ` Vivek Goyal 2006-11-21 8:01 ` Pavel Emelianov 2006-11-21 8:10 ` Pavel Emelianov 2 siblings, 0 replies; 8+ messages in thread From: Vivek Goyal @ 2006-11-20 20:52 UTC (permalink / raw) To: Pavel Emelianov Cc: Pavel Emelianov, Andrew Morton, mingo, Kirill Korotaev, linux kernel mailing list On Mon, Nov 20, 2006 at 02:56:52PM -0500, Vivek Goyal wrote: > On Mon, Nov 20, 2006 at 02:23:35PM -0500, Vivek Goyal wrote: > > On Fri, Nov 10, 2006 at 04:55:48PM +0300, Pavel Emelianov wrote: > > > As the second lock on booth CPUs is taken before checking that > > > this irq is being handled in another processor this may cause > > > a deadlock. This issue is only theoretical. > > > > > > I propose the attached patch to fix booth problems: when trying > > > to handle misrouted IRQ active desc->lock may be unlocked. > > > > > > Please comment. > > > > > --- ./kernel/irq/spurious.c.irqlockup 2006-11-09 11:19:10.000000000 +0300 > > > +++ ./kernel/irq/spurious.c 2006-11-10 16:53:38.000000000 +0300 > > > @@ -147,7 +147,11 @@ void note_interrupt(unsigned int irq, st > > > if (unlikely(irqfixup)) { > > > /* Don't punish working computers */ > > > if ((irqfixup == 2 && irq == 0) || action_ret == IRQ_NONE) { > > > - int ok = misrouted_irq(irq); > > > + int ok; > > > + > > > + spin_unlock(&desc->lock); > > > + ok = misrouted_irq(irq); > > > + spin_lock(&desc->lock); > > > > Hi Pavel, > > > > Till -rc5, I was able to boot a kernel with irqpoll option and in -rc6 I > > can't. It simply hangs. I suspect it is realted to this change. I have yet > > to confirm that. But before that one observation. > > > > Hi Pavel, > > If I backout your changes, everything works fine. So it looks like that > the problem I am facing is because of your patch but I don't have a logical > explanation yet that why the problem is there. Just realasing a lock > which is not currently acquired should not hang the system? > Some more data regarding this issue. For my system it gets locked in following sequence. handle_level_irq() { spin_unlock(desc->lock) ...... note_interrupt() { /* Called without desc->lock held */ spin_unlock(desc->lock) misrouted_irq() spin_lock(desc->lock) } spin_lock(desc->lock) /* Lockup */ } So basically problems seems to be due to calling conventions of note_interrupt(). In few cases we call it with desc->lock held and in other cases we call it with desc->lock not held. I guess, note_interrupt() should restore the desc->lock status back to the state of the lock when function was entered so that caller does not lockup. Thanks Vivek ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC] [PATCH] Fix misrouted interrupts deadlocks 2006-11-20 19:56 ` Vivek Goyal 2006-11-20 20:52 ` Vivek Goyal @ 2006-11-21 8:01 ` Pavel Emelianov 2006-11-21 8:10 ` Pavel Emelianov 2 siblings, 0 replies; 8+ messages in thread From: Pavel Emelianov @ 2006-11-21 8:01 UTC (permalink / raw) To: vgoyal; +Cc: Andrew Morton, mingo, Linux Kernel Mailing List, Kirill Korotaev Vivek Goyal wrote: > On Mon, Nov 20, 2006 at 02:23:35PM -0500, Vivek Goyal wrote: >> On Fri, Nov 10, 2006 at 04:55:48PM +0300, Pavel Emelianov wrote: >>> As the second lock on booth CPUs is taken before checking that >>> this irq is being handled in another processor this may cause >>> a deadlock. This issue is only theoretical. >>> >>> I propose the attached patch to fix booth problems: when trying >>> to handle misrouted IRQ active desc->lock may be unlocked. >>> >>> Please comment. >>> --- ./kernel/irq/spurious.c.irqlockup 2006-11-09 11:19:10.000000000 +0300 >>> +++ ./kernel/irq/spurious.c 2006-11-10 16:53:38.000000000 +0300 >>> @@ -147,7 +147,11 @@ void note_interrupt(unsigned int irq, st >>> if (unlikely(irqfixup)) { >>> /* Don't punish working computers */ >>> if ((irqfixup == 2 && irq == 0) || action_ret == IRQ_NONE) { >>> - int ok = misrouted_irq(irq); >>> + int ok; >>> + >>> + spin_unlock(&desc->lock); >>> + ok = misrouted_irq(irq); >>> + spin_lock(&desc->lock); >> Hi Pavel, >> >> Till -rc5, I was able to boot a kernel with irqpoll option and in -rc6 I >> can't. It simply hangs. I suspect it is realted to this change. I have yet >> to confirm that. But before that one observation. >> > > Hi Pavel, > > If I backout your changes, everything works fine. So it looks like that > the problem I am facing is because of your patch but I don't have a logical > explanation yet that why the problem is there. Just realasing a lock > which is not currently acquired should not hang the system? Without this patch my kernel hanged in another place. I'll look over the interrupt code again. I suspect that just another fix is required. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC] [PATCH] Fix misrouted interrupts deadlocks 2006-11-20 19:56 ` Vivek Goyal 2006-11-20 20:52 ` Vivek Goyal 2006-11-21 8:01 ` Pavel Emelianov @ 2006-11-21 8:10 ` Pavel Emelianov 2 siblings, 0 replies; 8+ messages in thread From: Pavel Emelianov @ 2006-11-21 8:10 UTC (permalink / raw) To: vgoyal, mingo; +Cc: Andrew Morton, Linux Kernel Mailing List, Kirill Korotaev > Hi Pavel, > > If I backout your changes, everything works fine. So it looks like that > the problem I am facing is because of your patch but I don't have a logical > explanation yet that why the problem is there. Just realasing a lock > which is not currently acquired should not hang the system? Hm... A simple grep over the code showed that note_interrupt is called w/o desc->lock in all places but __do_IRQ(). And this looks like an error at least for the following reason: note_interrupt() calls __report_bad_irq() and __report_bad_irq() does require desc->lock to be held. So I suppose that we have to do spin_lock(&desc->lock) before calling note_interrupt(). I'll prepare a patch in a moment. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2006-11-21 8:15 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-11-10 13:55 [RFC] [PATCH] Fix misrouted interrupts deadlocks Pavel Emelianov 2006-11-10 14:12 ` Ingo Molnar 2006-11-10 14:31 ` Pavel Emelianov 2006-11-20 19:23 ` Vivek Goyal 2006-11-20 19:56 ` Vivek Goyal 2006-11-20 20:52 ` Vivek Goyal 2006-11-21 8:01 ` Pavel Emelianov 2006-11-21 8:10 ` Pavel Emelianov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox