* 2.6.21-rc7-mm2 "irqpoll" seems to be broken
@ 2007-04-26 9:36 Vivek Goyal
2007-04-26 15:24 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: Vivek Goyal @ 2007-04-26 9:36 UTC (permalink / raw)
To: linux kernel mailing list, Morton Andrew Morton
Hi,
I am booting 2.6.21-rc7-mm2 on x86_64 box with "irqpoll" command line option
and it panics. I can reproduce this problem easily on this box. Please
let me know if serial console output is required.
2.6.21-rc7 works just fine. So problem seems to be in some -mm patch.
Unable to handle kernel NULL pointer dereference at 0000000000000009 RIP:
[<ffffffff8025bc5e>] note_interrupt+0x5d/0x21b
PGD 1032c5067 PUD 1032c4067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.21-rc7-mm2 #1
RIP: 0010:[<ffffffff8025bc5e>] [<ffffffff8025bc5e>] note_interrupt+0x5d/0x21b
RSP: 0018:ffff810100cbff08 EFLAGS: 00010002
RAX: 0000000000000000 RBX: ffffffff807e2d40 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff807e2d40 RDI: 0000000000000004
RBP: ffffffff807e2d40 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000010 R11: 00000000000000a0 R12: ffff810104192f40
R13: ffffffff807e2d84 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff810100854140(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000009 CR3: 00000001032c6000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff810100cb8000, task ffff810100cb7500)
Stack: 0000000000000004 0000000400000000 0000000000000000 ffffffff807e2d40
0000000000000004 ffff810104192f40 ffffffff807e2d84 0000000000000000
0000000000000000 ffffffff8025c7c5 ffff810100cb9e98 ffff810100cb9e98
Call Trace:
[<ffffffff8025c7c5>] handle_edge_irq+0xf9/0x127
[<ffffffff8020c2f9>] do_IRQ+0xf1/0x160
[<ffffffff8020a141>] ret_from_intr+0x0/0xa
[<ffffffff80208fd9>] mwait_idle+0x42/0x45
[<ffffffff80208f2f>] cpu_idle+0xbd/0xe0
Code: f6 40 09 10 75 09 45 85 ff 0f 85 3d 01 00 00 49 c7 c4 c0 2b
RIP [<ffffffff8025bc5e>] note_interrupt+0x5d/0x21b
RSP <ffff810100cbff08>
CR2: 0000000000000009
Kernel panic - not syncing: Aiee, killing interrupt handler!
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken
2007-04-26 9:36 2.6.21-rc7-mm2 "irqpoll" seems to be broken Vivek Goyal
@ 2007-04-26 15:24 ` Andrew Morton
2007-04-26 15:43 ` Bernhard Walle
2007-04-30 8:48 ` Vivek Goyal
0 siblings, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2007-04-26 15:24 UTC (permalink / raw)
To: vgoyal; +Cc: linux kernel mailing list, Bernhard Walle
On Thu, 26 Apr 2007 15:06:20 +0530 Vivek Goyal <vgoyal@in.ibm.com> wrote:
> Hi,
>
> I am booting 2.6.21-rc7-mm2 on x86_64 box with "irqpoll" command line option
> and it panics. I can reproduce this problem easily on this box. Please
> let me know if serial console output is required.
>
> 2.6.21-rc7 works just fine. So problem seems to be in some -mm patch.
>
> Unable to handle kernel NULL pointer dereference at 0000000000000009 RIP:
> [<ffffffff8025bc5e>] note_interrupt+0x5d/0x21b
> PGD 1032c5067 PUD 1032c4067 PMD 0
> Oops: 0000 [1] SMP
> CPU 1
> Modules linked in:
> Pid: 0, comm: swapper Not tainted 2.6.21-rc7-mm2 #1
> RIP: 0010:[<ffffffff8025bc5e>] [<ffffffff8025bc5e>] note_interrupt+0x5d/0x21b
> RSP: 0018:ffff810100cbff08 EFLAGS: 00010002
> RAX: 0000000000000000 RBX: ffffffff807e2d40 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffffffff807e2d40 RDI: 0000000000000004
> RBP: ffffffff807e2d40 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000010 R11: 00000000000000a0 R12: ffff810104192f40
> R13: ffffffff807e2d84 R14: 0000000000000000 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff810100854140(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000009 CR3: 00000001032c6000 CR4: 00000000000006e0
> Process swapper (pid: 0, threadinfo ffff810100cb8000, task ffff810100cb7500)
> Stack: 0000000000000004 0000000400000000 0000000000000000 ffffffff807e2d40
> 0000000000000004 ffff810104192f40 ffffffff807e2d84 0000000000000000
> 0000000000000000 ffffffff8025c7c5 ffff810100cb9e98 ffff810100cb9e98
> Call Trace:
> [<ffffffff8025c7c5>] handle_edge_irq+0xf9/0x127
> [<ffffffff8020c2f9>] do_IRQ+0xf1/0x160
> [<ffffffff8020a141>] ret_from_intr+0x0/0xa
> [<ffffffff80208fd9>] mwait_idle+0x42/0x45
> [<ffffffff80208f2f>] cpu_idle+0xbd/0xe0
>
>
> Code: f6 40 09 10 75 09 45 85 ff 0f 85 3d 01 00 00 49 c7 c4 c0 2b
> RIP [<ffffffff8025bc5e>] note_interrupt+0x5d/0x21b
> RSP <ffff810100cbff08>
> CR2: 0000000000000009
> Kernel panic - not syncing: Aiee, killing interrupt handler!
>
hm. I'd be suspecting
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/add-irqf_irqpoll-flag-common-code.patch
and
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/add-irqf_irqpoll-flag-on-x86_64.patch
But because x86_64 doesn't implement IRQ_PER_CPU it's hard to see how we
got into note_interrupt as a result of that patch.
Adding the `noirqdebug' boot option would be interesting, perhaps.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken
2007-04-26 15:24 ` Andrew Morton
@ 2007-04-26 15:43 ` Bernhard Walle
2007-04-30 8:48 ` Vivek Goyal
1 sibling, 0 replies; 12+ messages in thread
From: Bernhard Walle @ 2007-04-26 15:43 UTC (permalink / raw)
To: vgoyal; +Cc: linux kernel mailing list, Andrew Morton
[-- Attachment #1: Type: text/plain, Size: 813 bytes --]
* Andrew Morton <akpm@linux-foundation.org> [2007-04-26 17:24]:
>
> hm. I'd be suspecting
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/add-irqf_irqpoll-flag-common-code.patch
> and
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/add-irqf_irqpoll-flag-on-x86_64.patch
I aggree that it's likely that the patches cause the problem. However,
Vivek, could you try without the patches if the problem is really in
theses patches?
Thanks,
Bernhard
--
SUSE LINUX Products GmbH Tel. +49 (911) 74053-0
Maxfeldstr. 5 GF: Markus Rex
90409 Nürnberg, Germany HRB 16746 (AG Nürnberg)
OpenPGP DDAF6454: F61F 34CC 09CA FB82 C9F6 BA4B 8865 3696 DDAF 6454
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken
2007-04-26 15:24 ` Andrew Morton
2007-04-26 15:43 ` Bernhard Walle
@ 2007-04-30 8:48 ` Vivek Goyal
2007-05-01 19:20 ` Bernhard Walle
[not found] ` <20070502221932.GA488@suse.de>
1 sibling, 2 replies; 12+ messages in thread
From: Vivek Goyal @ 2007-04-30 8:48 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux kernel mailing list, Bernhard Walle
On Thu, Apr 26, 2007 at 08:24:05AM -0700, Andrew Morton wrote:
> On Thu, 26 Apr 2007 15:06:20 +0530 Vivek Goyal <vgoyal@in.ibm.com> wrote:
>
> > Hi,
> >
> > I am booting 2.6.21-rc7-mm2 on x86_64 box with "irqpoll" command line option
> > and it panics. I can reproduce this problem easily on this box. Please
> > let me know if serial console output is required.
> >
> > 2.6.21-rc7 works just fine. So problem seems to be in some -mm patch.
> >
> > Unable to handle kernel NULL pointer dereference at 0000000000000009 RIP:
> > [<ffffffff8025bc5e>] note_interrupt+0x5d/0x21b
> > PGD 1032c5067 PUD 1032c4067 PMD 0
> > Oops: 0000 [1] SMP
> > CPU 1
> > Modules linked in:
> > Pid: 0, comm: swapper Not tainted 2.6.21-rc7-mm2 #1
> > RIP: 0010:[<ffffffff8025bc5e>] [<ffffffff8025bc5e>] note_interrupt+0x5d/0x21b
> > RSP: 0018:ffff810100cbff08 EFLAGS: 00010002
> > RAX: 0000000000000000 RBX: ffffffff807e2d40 RCX: 0000000000000000
> > RDX: 0000000000000000 RSI: ffffffff807e2d40 RDI: 0000000000000004
> > RBP: ffffffff807e2d40 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000010 R11: 00000000000000a0 R12: ffff810104192f40
> > R13: ffffffff807e2d84 R14: 0000000000000000 R15: 0000000000000000
> > FS: 0000000000000000(0000) GS:ffff810100854140(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > CR2: 0000000000000009 CR3: 00000001032c6000 CR4: 00000000000006e0
> > Process swapper (pid: 0, threadinfo ffff810100cb8000, task ffff810100cb7500)
> > Stack: 0000000000000004 0000000400000000 0000000000000000 ffffffff807e2d40
> > 0000000000000004 ffff810104192f40 ffffffff807e2d84 0000000000000000
> > 0000000000000000 ffffffff8025c7c5 ffff810100cb9e98 ffff810100cb9e98
> > Call Trace:
> > [<ffffffff8025c7c5>] handle_edge_irq+0xf9/0x127
> > [<ffffffff8020c2f9>] do_IRQ+0xf1/0x160
> > [<ffffffff8020a141>] ret_from_intr+0x0/0xa
> > [<ffffffff80208fd9>] mwait_idle+0x42/0x45
> > [<ffffffff80208f2f>] cpu_idle+0xbd/0xe0
> >
> >
> > Code: f6 40 09 10 75 09 45 85 ff 0f 85 3d 01 00 00 49 c7 c4 c0 2b
> > RIP [<ffffffff8025bc5e>] note_interrupt+0x5d/0x21b
> > RSP <ffff810100cbff08>
> > CR2: 0000000000000009
> > Kernel panic - not syncing: Aiee, killing interrupt handler!
> >
>
> hm. I'd be suspecting
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/add-irqf_irqpoll-flag-common-code.patch
> and
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/add-irqf_irqpoll-flag-on-x86_64.patch
>
> But because x86_64 doesn't implement IRQ_PER_CPU it's hard to see how we
> got into note_interrupt as a result of that patch.
>
> Adding the `noirqdebug' boot option would be interesting, perhaps.
"noirqdebug" gets rid of the problem. But that also effectively nullifies
"irqpoll" parameter.
Interestingly on another x86_64 machine this problem does not occur. So
something is dependent on hardware.
I put some debug statements on note_interrupt() and found that desc->action
is a NULL pointer and that's why the problem. Above patch acesses
desc->action->flags, hence it ends up accessing a NULL pointer.
handle_edge_irq() already makes sure that desc->action is not null, still
note_interrupt() is receiving desc->action as null, that's strange. On my
system this is happening for irq 4 and /proc/interrupt shows that it is
coming from "serial".
Thanks
Vivek
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken
2007-04-30 8:48 ` Vivek Goyal
@ 2007-05-01 19:20 ` Bernhard Walle
[not found] ` <20070502221932.GA488@suse.de>
1 sibling, 0 replies; 12+ messages in thread
From: Bernhard Walle @ 2007-05-01 19:20 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Andrew Morton, linux kernel mailing list
Hello Vivek,
* Vivek Goyal <vgoyal@in.ibm.com> [2007-04-30 10:48]:
> >
> handle_edge_irq() already makes sure that desc->action is not null, still
> note_interrupt() is receiving desc->action as null, that's strange. On my
> system this is happening for irq 4 and /proc/interrupt shows that it is
> coming from "serial".
from reading the code I also cannot this. However, I'm trying to
reproduce the problem here. I hope I find a machine where this also
happens.
Thanks,
Bernhard
--
SUSE LINUX Products GmbH Tel. +49 (911) 74053-0
Maxfeldstr. 5 GF: Markus Rex
90409 Nürnberg, Germany HRB 16746 (AG Nürnberg)
OpenPGP DDAF6454: F61F 34CC 09CA FB82 C9F6 BA4B 8865 3696 DDAF 6454
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken
[not found] ` <20070502221932.GA488@suse.de>
@ 2007-05-08 17:18 ` Vivek Goyal
2007-05-14 14:05 ` Bernhard Walle
0 siblings, 1 reply; 12+ messages in thread
From: Vivek Goyal @ 2007-05-08 17:18 UTC (permalink / raw)
To: Bernhard Walle; +Cc: Morton Andrew Morton, linux kernel mailing list
On Thu, May 03, 2007 at 12:19:32AM +0200, Bernhard Walle wrote:
> * Vivek Goyal <vgoyal@in.ibm.com> [2007-04-30 10:48]:
> >
> > handle_edge_irq() already makes sure that desc->action is not null, still
> > note_interrupt() is receiving desc->action as null, that's strange. On my
> > system this is happening for irq 4 and /proc/interrupt shows that it is
> > coming from "serial".
>
> Unfortunately, I couldn't reproduce this here. Vivek, do you have time
> to take a look at this at your site? For the meanwhile, should I
> create a patch that checks for desc->action in note_interrupt(), too?
>
Hi Bernhard,
I can reproduce this problem only on one machine. I think there is some
race condition and your code somehow just exposes it.
I put few WARN_ON(!desc->action) in handle_edge_irq() and what I find
that after handle_IRQ_event(), desc->action has become null. That means
in the meantime somebody has gone ahead and modified the desc. This must
have happened because we have release desc->lock while running
handle_IRQ_event().
This means there is a race somewhere. It is verified by the fact that
this problem does not occur if same system is booted with only one
cpu (maxcpus=1).
Thanks
Vivek
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken
2007-05-08 17:18 ` Vivek Goyal
@ 2007-05-14 14:05 ` Bernhard Walle
2007-05-17 13:05 ` Vivek Goyal
2007-05-22 8:37 ` [PATCH] Fix crash with irqpoll due to the IRQF_IRQPOLL flag Bernhard Walle
0 siblings, 2 replies; 12+ messages in thread
From: Bernhard Walle @ 2007-05-14 14:05 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Morton Andrew Morton, linux kernel mailing list
* Vivek Goyal <vgoyal@in.ibm.com> [2007-05-08 19:18]:
> On Thu, May 03, 2007 at 12:19:32AM +0200, Bernhard Walle wrote:
> > * Vivek Goyal <vgoyal@in.ibm.com> [2007-04-30 10:48]:
> > >
> > > handle_edge_irq() already makes sure that desc->action is not null, still
> > > note_interrupt() is receiving desc->action as null, that's strange. On my
> > > system this is happening for irq 4 and /proc/interrupt shows that it is
> > > coming from "serial".
> >
> > Unfortunately, I couldn't reproduce this here. Vivek, do you have time
> > to take a look at this at your site? For the meanwhile, should I
> > create a patch that checks for desc->action in note_interrupt(), too?
>
> I can reproduce this problem only on one machine. I think there is some
> race condition and your code somehow just exposes it.
thanks for finding that out. Could you try/review out the patch below?
As the lock is only aquired when irqfixup == 2 it shouldn't impact
performance of a 'normal' system.
Thanks,
Bernhard
---
kernel/irq/spurious.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -145,10 +145,20 @@ void note_interrupt(unsigned int irq, st
}
if (unlikely(irqfixup)) {
- /* Don't punish working computers */
- if ((irqfixup == 2 && ((irq == 0) ||
- (desc->action->flags & IRQF_IRQPOLL))) ||
- action_ret == IRQ_NONE) {
+ int call_misrouted_irq = action_ret == IRQ_NONE;
+
+ if (!call_misrouted_irq && irqfixup == 2) {
+ if (irq == 0)
+ call_misrouted_irq = 1;
+ else {
+ spin_lock(&desc->lock);
+ if (desc->action && (desc->action->flags & IRQF_IRQPOLL))
+ call_misrouted_irq = 1;
+ spin_unlock(&desc->lock);
+ }
+ }
+
+ if (call_misrouted_irq) {
int ok = misrouted_irq(irq);
if (action_ret == IRQ_NONE)
desc->irqs_unhandled -= ok;
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken
2007-05-14 14:05 ` Bernhard Walle
@ 2007-05-17 13:05 ` Vivek Goyal
2007-05-17 21:56 ` Bernhard Walle
2007-05-22 8:37 ` [PATCH] Fix crash with irqpoll due to the IRQF_IRQPOLL flag Bernhard Walle
1 sibling, 1 reply; 12+ messages in thread
From: Vivek Goyal @ 2007-05-17 13:05 UTC (permalink / raw)
To: Bernhard Walle; +Cc: Morton Andrew Morton, linux kernel mailing list
On Mon, May 14, 2007 at 04:05:15PM +0200, Bernhard Walle wrote:
> * Vivek Goyal <vgoyal@in.ibm.com> [2007-05-08 19:18]:
> > On Thu, May 03, 2007 at 12:19:32AM +0200, Bernhard Walle wrote:
> > > * Vivek Goyal <vgoyal@in.ibm.com> [2007-04-30 10:48]:
> > > >
> > > > handle_edge_irq() already makes sure that desc->action is not null, still
> > > > note_interrupt() is receiving desc->action as null, that's strange. On my
> > > > system this is happening for irq 4 and /proc/interrupt shows that it is
> > > > coming from "serial".
> > >
> > > Unfortunately, I couldn't reproduce this here. Vivek, do you have time
> > > to take a look at this at your site? For the meanwhile, should I
> > > create a patch that checks for desc->action in note_interrupt(), too?
> >
> > I can reproduce this problem only on one machine. I think there is some
> > race condition and your code somehow just exposes it.
>
> thanks for finding that out. Could you try/review out the patch below?
> As the lock is only aquired when irqfixup == 2 it shouldn't impact
> performance of a 'normal' system.
>
Hi Bernhard,
It does fix up my problem. I have modified your patch a bit. I think
new version is little more clear. What do you think?
Thanks
Vivek
o System crashes if booted with irqpoll command line option.
o Problem happens because Inside note_interrupt() we are accessing
desc->action->flag without taking the desc->lock. While accessing it
somebody goes ahead and unregisters the irq handler hence desc->action
is NULL. By the time note_interrupt() checks it, it crashes.
o In my system it is irq 4 seriving to serial driver.
o Take the desc->lock before accessing desc->action->flag.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---
linux-2.6.21-git12-root/kernel/irq/spurious.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff -puN kernel/irq/spurious.c~fix-irqpoll-crash kernel/irq/spurious.c
--- linux-2.6.21-git12/kernel/irq/spurious.c~fix-irqpoll-crash 2007-05-17 17:36:50.000000000 +0530
+++ linux-2.6.21-git12-root/kernel/irq/spurious.c 2007-05-17 17:53:52.000000000 +0530
@@ -138,6 +138,8 @@ report_bad_irq(unsigned int irq, struct
void note_interrupt(unsigned int irq, struct irq_desc *desc,
irqreturn_t action_ret)
{
+ int call_misrouted_irq = 0;
+
if (unlikely(action_ret != IRQ_HANDLED)) {
desc->irqs_unhandled++;
if (unlikely(action_ret != IRQ_NONE))
@@ -146,9 +148,24 @@ void note_interrupt(unsigned int irq, st
if (unlikely(irqfixup)) {
/* Don't punish working computers */
- if ((irqfixup == 2 && ((irq == 0) ||
- (desc->action->flags & IRQF_IRQPOLL))) ||
- action_ret == IRQ_NONE) {
+ if (action_ret == IRQ_NONE)
+ /* Nobody handled irq. Possibly a misrouted one. */
+ call_misrouted_irq = 1;
+ else if (irqfixup == 2) {
+ /* irqpoll is enabled. Is this the irq driving
+ * polling.
+ */
+ if (irq == 0)
+ call_misrouted_irq = 1;
+ else {
+ spin_lock(&desc->lock);
+ if (desc->action &&
+ (desc->action->flags & IRQF_IRQPOLL))
+ call_misrouted_irq = 1;
+ spin_unlock(&desc->lock);
+ }
+ }
+ if (call_misrouted_irq) {
int ok = misrouted_irq(irq);
if (action_ret == IRQ_NONE)
desc->irqs_unhandled -= ok;
_
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken
2007-05-17 13:05 ` Vivek Goyal
@ 2007-05-17 21:56 ` Bernhard Walle
0 siblings, 0 replies; 12+ messages in thread
From: Bernhard Walle @ 2007-05-17 21:56 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Morton Andrew Morton, linux kernel mailing list
* Vivek Goyal <vgoyal@in.ibm.com> [2007-05-17 15:05]:
> On Mon, May 14, 2007 at 04:05:15PM +0200, Bernhard Walle wrote:
> > * Vivek Goyal <vgoyal@in.ibm.com> [2007-05-08 19:18]:
> > > On Thu, May 03, 2007 at 12:19:32AM +0200, Bernhard Walle wrote:
> > > > * Vivek Goyal <vgoyal@in.ibm.com> [2007-04-30 10:48]:
> > > > >
> > > > > handle_edge_irq() already makes sure that desc->action is not null, still
> > > > > note_interrupt() is receiving desc->action as null, that's strange. On my
> > > > > system this is happening for irq 4 and /proc/interrupt shows that it is
> > > > > coming from "serial".
> > > >
> > > > Unfortunately, I couldn't reproduce this here. Vivek, do you have time
> > > > to take a look at this at your site? For the meanwhile, should I
> > > > create a patch that checks for desc->action in note_interrupt(), too?
> > >
> > > I can reproduce this problem only on one machine. I think there is some
> > > race condition and your code somehow just exposes it.
> >
> > thanks for finding that out. Could you try/review out the patch below?
> > As the lock is only aquired when irqfixup == 2 it shouldn't impact
> > performance of a 'normal' system.
>
> It does fix up my problem. I have modified your patch a bit. I think
> new version is little more clear. What do you think?
Aggreed. Thanks for spotting that problem out!
Bernhard
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] Fix crash with irqpoll due to the IRQF_IRQPOLL flag
2007-05-14 14:05 ` Bernhard Walle
2007-05-17 13:05 ` Vivek Goyal
@ 2007-05-22 8:37 ` Bernhard Walle
2007-05-23 16:01 ` Linus Torvalds
1 sibling, 1 reply; 12+ messages in thread
From: Bernhard Walle @ 2007-05-22 8:37 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, Vivek Goyal
o System crashes if booted with irqpoll command line option.
o Problem happens because Inside note_interrupt() we are accessing
desc->action->flag without taking the desc->lock. While accessing it
somebody goes ahead and unregisters the irq handler hence desc->action
is NULL. By the time note_interrupt() checks it, it crashes.
o In that system it is irq 4 seriving to serial driver.
o Take the desc->lock before accessing desc->action->flag.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---
linux-2.6.21-git12-root/kernel/irq/spurious.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff -puN kernel/irq/spurious.c~fix-irqpoll-crash kernel/irq/spurious.c
--- linux-2.6.21-git12/kernel/irq/spurious.c~fix-irqpoll-crash 2007-05-17 17:36:50.000000000 +0530
+++ linux-2.6.21-git12-root/kernel/irq/spurious.c 2007-05-17 17:53:52.000000000 +0530
@@ -138,6 +138,8 @@ report_bad_irq(unsigned int irq, struct
void note_interrupt(unsigned int irq, struct irq_desc *desc,
irqreturn_t action_ret)
{
+ int call_misrouted_irq = 0;
+
if (unlikely(action_ret != IRQ_HANDLED)) {
desc->irqs_unhandled++;
if (unlikely(action_ret != IRQ_NONE))
@@ -146,9 +148,24 @@ void note_interrupt(unsigned int irq, st
if (unlikely(irqfixup)) {
/* Don't punish working computers */
- if ((irqfixup == 2 && ((irq == 0) ||
- (desc->action->flags & IRQF_IRQPOLL))) ||
- action_ret == IRQ_NONE) {
+ if (action_ret == IRQ_NONE)
+ /* Nobody handled irq. Possibly a misrouted one. */
+ call_misrouted_irq = 1;
+ else if (irqfixup == 2) {
+ /* irqpoll is enabled. Is this the irq driving
+ * polling.
+ */
+ if (irq == 0)
+ call_misrouted_irq = 1;
+ else {
+ spin_lock(&desc->lock);
+ if (desc->action &&
+ (desc->action->flags & IRQF_IRQPOLL))
+ call_misrouted_irq = 1;
+ spin_unlock(&desc->lock);
+ }
+ }
+ if (call_misrouted_irq) {
int ok = misrouted_irq(irq);
if (action_ret == IRQ_NONE)
desc->irqs_unhandled -= ok;
_
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Fix crash with irqpoll due to the IRQF_IRQPOLL flag
2007-05-22 8:37 ` [PATCH] Fix crash with irqpoll due to the IRQF_IRQPOLL flag Bernhard Walle
@ 2007-05-23 16:01 ` Linus Torvalds
2007-05-24 5:28 ` Vivek Goyal
0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2007-05-23 16:01 UTC (permalink / raw)
To: Bernhard Walle; +Cc: Andrew Morton, linux-kernel, Vivek Goyal
On Tue, 22 May 2007, Bernhard Walle wrote:
>
> o System crashes if booted with irqpoll command line option.
>
> o Problem happens because Inside note_interrupt() we are accessing
> desc->action->flag without taking the desc->lock. While accessing it
> somebody goes ahead and unregisters the irq handler hence desc->action
> is NULL. By the time note_interrupt() checks it, it crashes.
I absolutely _detest_ patches that make already complex and unreadable
code even more so. Especially conditionals. Please don't do that.
If you need a variable for a conditional, make it be an implicit one from
an inline function, and aim for making it readable.
So how about instead writing it out as a nice self-explanatory inline
function? I can almost guarantee that this generates no worse code, and it
also makes it easy to explain things like "we don't bother with the lock,
because we don't care enough".
Untested, but I think the point of the patch is obvious. Anybody want to
test it, send it back to me, and fix the bug while making the code more
readable?
I think we should instate a hard new rule:
- if a bugfix doesn't make the code more readable, it's not a bugfix at
all.
There was a reason for the bug in the first place, and that reason is
likely that the code was hard to think about. So making it even _harder_
to think about isn't really fixing the deeper problem!
Linus
---
kernel/irq/spurious.c | 46 +++++++++++++++++++++++++++++++++++++---------
1 files changed, 37 insertions(+), 9 deletions(-)
diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c
index b0d81aa..bd9e272 100644
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -135,6 +135,39 @@ report_bad_irq(unsigned int irq, struct irq_desc *desc, irqreturn_t action_ret)
}
}
+static inline int try_misrouted_irq(unsigned int irq, struct irq_desc *desc, irqreturn_t action_ret)
+{
+ struct irqaction *action;
+
+ if (!irqfixup)
+ return 0;
+
+ /* We didn't actually handle the IRQ - see if it was misrouted? */
+ if (action_ret == IRQ_NONE)
+ return 1;
+
+ /*
+ * But for 'irqfixup == 2' we also do it for handled interrupts if
+ * they are marked as IRQF_IRQPOLL (or for irq zero, which is the
+ * traditional PC timer interrupt.. Legacy)
+ */
+ if (irqfixup < 2)
+ return 0;
+
+ if (!irq)
+ return 1;
+
+ /*
+ * Since we don't get the descriptor lock, "action" can
+ * change under us. We don't really care, but we don't
+ * want to follow a NULL pointer. So tell the compiler to
+ * just load it once by using a barrier.
+ */
+ action = desc->action;
+ barrier();
+ return action && (action->flags & IRQF_IRQPOLL);
+}
+
void note_interrupt(unsigned int irq, struct irq_desc *desc,
irqreturn_t action_ret)
{
@@ -144,15 +177,10 @@ void note_interrupt(unsigned int irq, struct irq_desc *desc,
report_bad_irq(irq, desc, action_ret);
}
- if (unlikely(irqfixup)) {
- /* Don't punish working computers */
- if ((irqfixup == 2 && ((irq == 0) ||
- (desc->action->flags & IRQF_IRQPOLL))) ||
- action_ret == IRQ_NONE) {
- int ok = misrouted_irq(irq);
- if (action_ret == IRQ_NONE)
- desc->irqs_unhandled -= ok;
- }
+ if (unlikely(try_misrouted_irq(irq, desc, action_ret))) {
+ int ok = misrouted_irq(irq);
+ if (action_ret == IRQ_NONE)
+ desc->irqs_unhandled -= ok;
}
desc->irq_count++;
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] Fix crash with irqpoll due to the IRQF_IRQPOLL flag
2007-05-23 16:01 ` Linus Torvalds
@ 2007-05-24 5:28 ` Vivek Goyal
0 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2007-05-24 5:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Bernhard Walle, Andrew Morton, linux-kernel
On Wed, May 23, 2007 at 09:01:04AM -0700, Linus Torvalds wrote:
>
>
> On Tue, 22 May 2007, Bernhard Walle wrote:
> >
> > o System crashes if booted with irqpoll command line option.
> >
> > o Problem happens because Inside note_interrupt() we are accessing
> > desc->action->flag without taking the desc->lock. While accessing it
> > somebody goes ahead and unregisters the irq handler hence desc->action
> > is NULL. By the time note_interrupt() checks it, it crashes.
>
> I absolutely _detest_ patches that make already complex and unreadable
> code even more so. Especially conditionals. Please don't do that.
>
> If you need a variable for a conditional, make it be an implicit one from
> an inline function, and aim for making it readable.
>
> So how about instead writing it out as a nice self-explanatory inline
> function? I can almost guarantee that this generates no worse code, and it
> also makes it easy to explain things like "we don't bother with the lock,
> because we don't care enough".
>
> Untested, but I think the point of the patch is obvious. Anybody want to
> test it, send it back to me, and fix the bug while making the code more
> readable?
>
Hi Linus,
I tested it. It works fine. And yes, this patch is more readable.
Thanks
Vivek
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2007-05-24 5:29 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-26 9:36 2.6.21-rc7-mm2 "irqpoll" seems to be broken Vivek Goyal
2007-04-26 15:24 ` Andrew Morton
2007-04-26 15:43 ` Bernhard Walle
2007-04-30 8:48 ` Vivek Goyal
2007-05-01 19:20 ` Bernhard Walle
[not found] ` <20070502221932.GA488@suse.de>
2007-05-08 17:18 ` Vivek Goyal
2007-05-14 14:05 ` Bernhard Walle
2007-05-17 13:05 ` Vivek Goyal
2007-05-17 21:56 ` Bernhard Walle
2007-05-22 8:37 ` [PATCH] Fix crash with irqpoll due to the IRQF_IRQPOLL flag Bernhard Walle
2007-05-23 16:01 ` Linus Torvalds
2007-05-24 5:28 ` Vivek Goyal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).