* [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
@ 2008-06-17 21:57 Johannes Berg
2008-06-17 23:08 ` David Miller
2008-06-17 23:55 ` Linus Torvalds
0 siblings, 2 replies; 18+ messages in thread
From: Johannes Berg @ 2008-06-17 21:57 UTC (permalink / raw)
To: Linux Kernel list
Cc: Michael Buesch, David Ellingsworth, linux-wireless, Ingo Molnar,
Linus Torvalds
This warning has started to trigger with mac80211 because it can, under
some circumstances, use spin_lock_bh() protected sections within
irq-disabled sections. Is that a bug?
Interestingly, the warning was never noticed until somebody ran the code
on UP/NO-PREEMPT because local_bh_enable_ip() does not contain such a
warning. Also, because softirq disabling is refcounted (via the preempt
counter), it ought to be safe to nest it and even use within
irqs-disabled sections, as far as I can tell.
Also, if you're going to treat IRQs being enabled as a bug, there's no
point in disabling them right afterwards, is there?
If you're going to reject this patch, I'll post one that adds the
warning to local_bh_enable_ip() to allow detecting this for everybody
and not just those poor people running UP/NO-PREEMPT :)
(and why doesn't local_bh_enable just call local_bh_enable_ip anyway? or
both "call" a common static inline?)
---
kernel/softirq.c | 3 ---
1 file changed, 3 deletions(-)
--- everything.orig/kernel/softirq.c 2008-06-17 23:48:25.000000000 +0200
+++ everything/kernel/softirq.c 2008-06-17 23:48:56.000000000 +0200
@@ -137,10 +137,7 @@ void local_bh_enable(void)
unsigned long flags;
WARN_ON_ONCE(in_irq());
-#endif
- WARN_ON_ONCE(irqs_disabled());
-#ifdef CONFIG_TRACE_IRQFLAGS
local_irq_save(flags);
#endif
/*
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-17 21:57 [PATCH/RFC] remove irqs_disabled warning from local_bh_enable Johannes Berg
@ 2008-06-17 23:08 ` David Miller
2008-06-17 23:55 ` Linus Torvalds
1 sibling, 0 replies; 18+ messages in thread
From: David Miller @ 2008-06-17 23:08 UTC (permalink / raw)
To: johannes; +Cc: linux-kernel, mb, david, linux-wireless, mingo, torvalds
From: Johannes Berg <johannes@sipsolutions.net>
Date: Tue, 17 Jun 2008 23:57:14 +0200
> This warning has started to trigger with mac80211 because it can, under
> some circumstances, use spin_lock_bh() protected sections within
> irq-disabled sections. Is that a bug?
It is a bug. The only legal nesting is base --> BH --> IRQ locking.
> Also, if you're going to treat IRQs being enabled as a bug, there's no
> point in disabling them right afterwards, is there?
It's treating IRQs being "disabled" as a bug. It expects them to
be enabled, always.
> If you're going to reject this patch, I'll post one that adds the
> warning to local_bh_enable_ip() to allow detecting this for everybody
> and not just those poor people running UP/NO-PREEMPT :)
That might be useful :)
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-17 21:57 [PATCH/RFC] remove irqs_disabled warning from local_bh_enable Johannes Berg
2008-06-17 23:08 ` David Miller
@ 2008-06-17 23:55 ` Linus Torvalds
2008-06-18 7:01 ` Peter Zijlstra
2008-06-18 7:29 ` Johannes Berg
1 sibling, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2008-06-17 23:55 UTC (permalink / raw)
To: Johannes Berg
Cc: Linux Kernel list, Michael Buesch, David Ellingsworth,
linux-wireless, Ingo Molnar
On Tue, 17 Jun 2008, Johannes Berg wrote:
>
> This warning has started to trigger with mac80211 because it can, under
> some circumstances, use spin_lock_bh() protected sections within
> irq-disabled sections. Is that a bug?
Yes, it's a bug.
Why? Not because of the "spin_lock_bh()" itself, but because of the
_unlock_, which does a "local_bh_enable_ip()", which in turn will check
the whole "do_softirq()" if it was the last softirq_count.
And you must not do softirq's when hard-irq's were disabled!
So it should in theory be ok (but perhaps a bit odd) to do something like
spin_lock_irq(&irq_lock);
..do something..
spin_lock_bh(&bh_lock);
spin_unlock_irq(&irq_lock);
.. do something else ..
spin_unlock_bh(&bh_lock);
where the "spin_lock_bh()" itself is in an irq-locked context - as long as
the "spin_unlock_bh()" is *not*.
See?
Linus
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-17 23:55 ` Linus Torvalds
@ 2008-06-18 7:01 ` Peter Zijlstra
2008-06-18 7:29 ` Johannes Berg
1 sibling, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2008-06-18 7:01 UTC (permalink / raw)
To: Linus Torvalds
Cc: Johannes Berg, Linux Kernel list, Michael Buesch,
David Ellingsworth, linux-wireless, Ingo Molnar
On Tue, 2008-06-17 at 16:55 -0700, Linus Torvalds wrote:
>
> On Tue, 17 Jun 2008, Johannes Berg wrote:
> >
> > This warning has started to trigger with mac80211 because it can, under
> > some circumstances, use spin_lock_bh() protected sections within
> > irq-disabled sections. Is that a bug?
>
> Yes, it's a bug.
>
> Why? Not because of the "spin_lock_bh()" itself, but because of the
> _unlock_, which does a "local_bh_enable_ip()", which in turn will check
> the whole "do_softirq()" if it was the last softirq_count.
>
> And you must not do softirq's when hard-irq's were disabled!
>
> So it should in theory be ok (but perhaps a bit odd) to do something like
>
> spin_lock_irq(&irq_lock);
> ..do something..
> spin_lock_bh(&bh_lock);
> spin_unlock_irq(&irq_lock);
> .. do something else ..
> spin_unlock_bh(&bh_lock);
>
> where the "spin_lock_bh()" itself is in an irq-locked context - as long as
> the "spin_unlock_bh()" is *not*.
I would suggest discouraging such madne^Wcreativity, its gains are
dubious at best and it doesn't make the locking any more obvious and
could be an indication of messy locking to begin with.
So I would like to see Johannes' other patch that allows all of us to
enjoy the warning he ran into ;-)
Peter
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-17 23:55 ` Linus Torvalds
2008-06-18 7:01 ` Peter Zijlstra
@ 2008-06-18 7:29 ` Johannes Berg
2008-06-20 13:46 ` Ingo Molnar
1 sibling, 1 reply; 18+ messages in thread
From: Johannes Berg @ 2008-06-18 7:29 UTC (permalink / raw)
To: Linus Torvalds
Cc: Linux Kernel list, Michael Buesch, David Ellingsworth,
linux-wireless, Ingo Molnar
On Tue, 2008-06-17 at 16:55 -0700, Linus Torvalds wrote:
> > This warning has started to trigger with mac80211 because it can, under
> > some circumstances, use spin_lock_bh() protected sections within
> > irq-disabled sections. Is that a bug?
>
> Yes, it's a bug.
You know, that actually makes sense to me.
> Why? Not because of the "spin_lock_bh()" itself, but because of the
> _unlock_, which does a "local_bh_enable_ip()", which in turn will check
> the whole "do_softirq()" if it was the last softirq_count.
Yeah, that makes sense to me as well. In fact, I pondered a bit over the
code until posting this.
> And you must not do softirq's when hard-irq's were disabled!
>
> So it should in theory be ok (but perhaps a bit odd) to do something like
>
> spin_lock_irq(&irq_lock);
> ..do something..
> spin_lock_bh(&bh_lock);
> spin_unlock_irq(&irq_lock);
> .. do something else ..
> spin_unlock_bh(&bh_lock);
>
> where the "spin_lock_bh()" itself is in an irq-locked context - as long as
> the "spin_unlock_bh()" is *not*.
>
> See?
Well, yes, from an API POV I do see that. And it's actually what I
thought up front. The thing that started to confuse me is that
local_bh_enable_ip reads like this:
void local_bh_enable_ip(unsigned long ip)
{
#ifdef CONFIG_TRACE_IRQFLAGS
unsigned long flags;
WARN_ON_ONCE(in_irq());
local_irq_save(flags);
#endif
[...]
if (unlikely(!in_interrupt() && local_softirq_pending()))
do_softirq();
dec_preempt_count();
#ifdef CONFIG_TRACE_IRQFLAGS
local_irq_restore(flags);
#endif
So doesn't it actually run do_softirq() when interrupts are still
disabled? If I look deeper into the code that appears to be a rather odd
trick to avoid locking and __do_softirq will unconditionally enable
IRQs, so I guess here's my answer...
How about the patch below?
johannes
From: Johannes Berg <johannes@sipsolutions.net>
Subject: clean up and comment local_bh_enable code
There's no need to use local_irq_save() over local_irq_disable() in the
local_bh_enable code since it is a bug to call it with irqs disabled and
do_softirq will enable irqs if there is any pending work. Consolidate
the code from local_bh_enable and ..._ip to avoid having a disconnect
between them in the warnings they trigger that is currently there. Also
always trigger the warning on in_irq(), not just in the trace-irqflags
case.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
---
I made it a static inline so that runtime behaviour isn't affected by
another function call, but that can be argued.
Maybe local_bh_enable() should just be a static inline somewhere in a
header file, that would get rid of one export_symbol as well.
kernel/softirq.c | 48 +++++++++++-------------------------------------
1 file changed, 11 insertions(+), 37 deletions(-)
--- everything.orig/kernel/softirq.c 2008-06-17 23:48:25.000000000 +0200
+++ everything/kernel/softirq.c 2008-06-18 09:13:21.000000000 +0200
@@ -131,23 +131,17 @@ void _local_bh_enable(void)
EXPORT_SYMBOL(_local_bh_enable);
-void local_bh_enable(void)
+static inline void _local_bh_enable_ip(unsigned long ip)
{
+ WARN_ON_ONCE(in_irq() || irqs_disabled());
#ifdef CONFIG_TRACE_IRQFLAGS
- unsigned long flags;
-
- WARN_ON_ONCE(in_irq());
-#endif
- WARN_ON_ONCE(irqs_disabled());
-
-#ifdef CONFIG_TRACE_IRQFLAGS
- local_irq_save(flags);
+ local_irq_disable();
#endif
/*
* Are softirqs going to be turned on now:
*/
if (softirq_count() == SOFTIRQ_OFFSET)
- trace_softirqs_on((unsigned long)__builtin_return_address(0));
+ trace_softirqs_on(ip);
/*
* Keep preemption disabled until we are done with
* softirq processing:
@@ -159,40 +153,20 @@ void local_bh_enable(void)
dec_preempt_count();
#ifdef CONFIG_TRACE_IRQFLAGS
- local_irq_restore(flags);
+ local_irq_enable();
#endif
preempt_check_resched();
}
+
+void local_bh_enable(void)
+{
+ _local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+}
EXPORT_SYMBOL(local_bh_enable);
void local_bh_enable_ip(unsigned long ip)
{
-#ifdef CONFIG_TRACE_IRQFLAGS
- unsigned long flags;
-
- WARN_ON_ONCE(in_irq());
-
- local_irq_save(flags);
-#endif
- /*
- * Are softirqs going to be turned on now:
- */
- if (softirq_count() == SOFTIRQ_OFFSET)
- trace_softirqs_on(ip);
- /*
- * Keep preemption disabled until we are done with
- * softirq processing:
- */
- sub_preempt_count(SOFTIRQ_OFFSET - 1);
-
- if (unlikely(!in_interrupt() && local_softirq_pending()))
- do_softirq();
-
- dec_preempt_count();
-#ifdef CONFIG_TRACE_IRQFLAGS
- local_irq_restore(flags);
-#endif
- preempt_check_resched();
+ _local_bh_enable_ip(ip);
}
EXPORT_SYMBOL(local_bh_enable_ip);
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-18 7:29 ` Johannes Berg
@ 2008-06-20 13:46 ` Ingo Molnar
2008-06-20 15:27 ` Ingo Molnar
0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2008-06-20 13:46 UTC (permalink / raw)
To: Johannes Berg
Cc: Linus Torvalds, Linux Kernel list, Michael Buesch,
David Ellingsworth, linux-wireless
* Johannes Berg <johannes@sipsolutions.net> wrote:
> Subject: clean up and comment local_bh_enable code
>
> There's no need to use local_irq_save() over local_irq_disable() in
> the local_bh_enable code since it is a bug to call it with irqs
> disabled and do_softirq will enable irqs if there is any pending work.
> Consolidate the code from local_bh_enable and ..._ip to avoid having a
> disconnect between them in the warnings they trigger that is currently
> there. Also always trigger the warning on in_irq(), not just in the
> trace-irqflags case.
applied to tip/core/softirq for testing, thanks Johannes.
Ingo
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-20 13:46 ` Ingo Molnar
@ 2008-06-20 15:27 ` Ingo Molnar
2008-06-20 15:36 ` Michael Buesch
2008-06-20 15:43 ` Johannes Berg
0 siblings, 2 replies; 18+ messages in thread
From: Ingo Molnar @ 2008-06-20 15:27 UTC (permalink / raw)
To: Johannes Berg
Cc: Linus Torvalds, Linux Kernel list, Michael Buesch,
David Ellingsworth, linux-wireless
* Ingo Molnar <mingo@elte.hu> wrote:
> > Subject: clean up and comment local_bh_enable code
> >
> > There's no need to use local_irq_save() over local_irq_disable() in
> > the local_bh_enable code since it is a bug to call it with irqs
> > disabled and do_softirq will enable irqs if there is any pending
> > work. Consolidate the code from local_bh_enable and ..._ip to avoid
> > having a disconnect between them in the warnings they trigger that
> > is currently there. Also always trigger the warning on in_irq(), not
> > just in the trace-irqflags case.
>
> applied to tip/core/softirq for testing, thanks Johannes.
ok, -tip testing found that your patch triggers a new warning on an old
testbox that uses 3c59x vortex and netlogging:
----->
calling vortex_init+0x0/0xb0
PCI: Found IRQ 10 for device 0000:00:0b.0
PCI: Sharing IRQ 10 with 0000:00:0a.0
PCI: Sharing IRQ 10 with 0000:00:0b.1
3c59x: Donald Becker and others.
0000:00:0b.0: 3Com PCI 3c556 Laptop Tornado at e0800400.
PCI: Enabling bus mastering for device 0000:00:0b.0
initcall vortex_init+0x0/0xb0 returned 0 after 47 msecs
...
calling init_netconsole+0x0/0x1b0
netconsole: local port 4444
netconsole: local IP 10.0.1.9
netconsole: interface eth0
netconsole: remote port 4444
netconsole: remote IP 10.0.1.16
netconsole: remote ethernet address 00:19:xx:xx:xx:xx
netconsole: device eth0 not up yet, forcing it
eth0: setting half-duplex.
eth0: setting full-duplex.
------------[ cut here ]------------
WARNING: at kernel/softirq.c:137 local_bh_enable_ip+0xd1/0xe0()
Pid: 1, comm: swapper Not tainted 2.6.26-rc6-tip #2091
[<c0125ecf>] warn_on_slowpath+0x4f/0x70
[<c0126834>] ? release_console_sem+0x1b4/0x1d0
[<c0126d00>] ? vprintk+0x2a0/0x450
[<c012fde5>] ? __mod_timer+0xa5/0xc0
[<c046f7fd>] ? mdio_sync+0x3d/0x50
[<c0160ef6>] ? marker_probe_cb+0x46/0xa0
[<c0126ed7>] ? printk+0x27/0x50
[<c046f4c3>] ? vortex_set_duplex+0x43/0xc0
[<c046f521>] ? vortex_set_duplex+0xa1/0xc0
[<c0471b92>] ? vortex_timer+0xe2/0x3e0
[<c012b361>] local_bh_enable_ip+0xd1/0xe0
[<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
[<c0471b92>] vortex_timer+0xe2/0x3e0
[<c014743b>] ? trace_hardirqs_on+0xb/0x10
[<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
[<c012f8b2>] run_timer_softirq+0x162/0x1c0
[<c0471ab0>] ? vortex_timer+0x0/0x3e0
[<c012b361>] local_bh_enable_ip+0xd1/0xe0
[<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
[<c0471b92>] vortex_timer+0xe2/0x3e0
[<c014743b>] ? trace_hardirqs_on+0xb/0x10
[<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
[<c012f8b2>] run_timer_softirq+0x162/0x1c0
[<c0471ab0>] ? vortex_timer+0x0/0x3e0
[<c0471ab0>] ? vortex_timer+0x0/0x3e0
[<c012b60a>] __do_softirq+0x9a/0x160
[<c012b570>] ? __do_softirq+0x0/0x160
[<c0106775>] call_on_stack+0x15/0x30
[<c012b4f5>] ? irq_exit+0x55/0x60
[<c0106e85>] ? do_IRQ+0x85/0xd0
[<c0147391>] ? trace_hardirqs_on_caller+0xc1/0x160
[<c0104888>] ? common_interrupt+0x28/0x30
[<c08d8ac8>] ? mutex_unlock+0x8/0x10
[<c08d8180>] ? _cond_resched+0x10/0x30
[<c07a3be7>] ? netpoll_setup+0x117/0x390
[<c0cbfcfe>] ? init_netconsole+0x14e/0x1b0
[<c013d539>] ? ktime_get+0x19/0x40
[<c0c9bab2>] ? kernel_init+0x1b2/0x2c0
[<c0cbfbb0>] ? init_netconsole+0x0/0x1b0
[<c0396aa4>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c0103f12>] ? restore_nocheck_notrace+0x0/0xe
[<c0c9b900>] ? kernel_init+0x0/0x2c0
[<c0c9b900>] ? kernel_init+0x0/0x2c0
[<c0104aa7>] ? kernel_thread_helper+0x7/0x10
=======================
---[ end trace 37f9c502aff112e0 ]---
console [netcon0] enabled
netconsole: network logging started
initcall init_netconsole+0x0/0x1b0 returned 0 after 2914 msecs
real bug or false positive?
config is at:
http://redhat.com/~mingo/misc/config-Fri_Jun_20_16_51_07_CEST_2008.bad
Ingo
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-20 15:27 ` Ingo Molnar
@ 2008-06-20 15:36 ` Michael Buesch
2008-06-20 15:55 ` Ingo Molnar
2008-06-20 15:43 ` Johannes Berg
1 sibling, 1 reply; 18+ messages in thread
From: Michael Buesch @ 2008-06-20 15:36 UTC (permalink / raw)
To: Ingo Molnar
Cc: Johannes Berg, Linus Torvalds, Linux Kernel list,
David Ellingsworth, linux-wireless
On Friday 20 June 2008 17:27:48 Ingo Molnar wrote:
> [<c012b361>] local_bh_enable_ip+0xd1/0xe0
> [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
> [<c0471b92>] vortex_timer+0xe2/0x3e0
> real bug or false positive?
Well, a timer runs with IRQs disabled, no? So this would be a bug.
--
Greetings Michael.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-20 15:27 ` Ingo Molnar
2008-06-20 15:36 ` Michael Buesch
@ 2008-06-20 15:43 ` Johannes Berg
2008-06-20 15:46 ` Johannes Berg
1 sibling, 1 reply; 18+ messages in thread
From: Johannes Berg @ 2008-06-20 15:43 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Linux Kernel list, Michael Buesch,
David Ellingsworth, linux-wireless
[-- Attachment #1: Type: text/plain, Size: 2499 bytes --]
On Fri, 2008-06-20 at 17:27 +0200, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
>
> > > Subject: clean up and comment local_bh_enable code
> > >
> > > There's no need to use local_irq_save() over local_irq_disable() in
> > > the local_bh_enable code since it is a bug to call it with irqs
> > > disabled and do_softirq will enable irqs if there is any pending
> > > work. Consolidate the code from local_bh_enable and ..._ip to avoid
> > > having a disconnect between them in the warnings they trigger that
> > > is currently there. Also always trigger the warning on in_irq(), not
> > > just in the trace-irqflags case.
> >
> > applied to tip/core/softirq for testing, thanks Johannes.
>
> ok, -tip testing found that your patch triggers a new warning on an old
> testbox that uses 3c59x vortex and netlogging:
>
> ----->
> calling vortex_init+0x0/0xb0
> PCI: Found IRQ 10 for device 0000:00:0b.0
> PCI: Sharing IRQ 10 with 0000:00:0a.0
> PCI: Sharing IRQ 10 with 0000:00:0b.1
> 3c59x: Donald Becker and others.
> 0000:00:0b.0: 3Com PCI 3c556 Laptop Tornado at e0800400.
> PCI: Enabling bus mastering for device 0000:00:0b.0
> initcall vortex_init+0x0/0xb0 returned 0 after 47 msecs
> ...
> calling init_netconsole+0x0/0x1b0
> netconsole: local port 4444
> netconsole: local IP 10.0.1.9
> netconsole: interface eth0
> netconsole: remote port 4444
> netconsole: remote IP 10.0.1.16
> netconsole: remote ethernet address 00:19:xx:xx:xx:xx
> netconsole: device eth0 not up yet, forcing it
> eth0: setting half-duplex.
> eth0: setting full-duplex.
> ------------[ cut here ]------------
> WARNING: at kernel/softirq.c:137 local_bh_enable_ip+0xd1/0xe0()
> Pid: 1, comm: swapper Not tainted 2.6.26-rc6-tip #2091
> [<c0125ecf>] warn_on_slowpath+0x4f/0x70
> [<c0126834>] ? release_console_sem+0x1b4/0x1d0
> [<c0126d00>] ? vprintk+0x2a0/0x450
Now you can't printk in irq context any more?
vprintk:
/*
* Try to acquire and then immediately release the
* console semaphore. The release will do all the
* actual magic (print out buffers, wake up klogd,
* etc).
*
* The acquire_console_semaphore_for_printk() function
* will release 'logbuf_lock' regardless of whether it
* actually gets the semaphore or not.
*/
if (acquire_console_semaphore_for_printk(this_cpu))
release_console_sem();
johannes
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-20 15:43 ` Johannes Berg
@ 2008-06-20 15:46 ` Johannes Berg
0 siblings, 0 replies; 18+ messages in thread
From: Johannes Berg @ 2008-06-20 15:46 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Linux Kernel list, Michael Buesch,
David Ellingsworth, linux-wireless
[-- Attachment #1: Type: text/plain, Size: 2165 bytes --]
On Fri, 2008-06-20 at 17:43 +0200, Johannes Berg wrote:
> On Fri, 2008-06-20 at 17:27 +0200, Ingo Molnar wrote:
> > * Ingo Molnar <mingo@elte.hu> wrote:
> >
> > > > Subject: clean up and comment local_bh_enable code
> > > >
> > > > There's no need to use local_irq_save() over local_irq_disable() in
> > > > the local_bh_enable code since it is a bug to call it with irqs
> > > > disabled and do_softirq will enable irqs if there is any pending
> > > > work. Consolidate the code from local_bh_enable and ..._ip to avoid
> > > > having a disconnect between them in the warnings they trigger that
> > > > is currently there. Also always trigger the warning on in_irq(), not
> > > > just in the trace-irqflags case.
> > >
> > > applied to tip/core/softirq for testing, thanks Johannes.
> >
> > ok, -tip testing found that your patch triggers a new warning on an old
> > testbox that uses 3c59x vortex and netlogging:
> >
> > ----->
> > calling vortex_init+0x0/0xb0
> > PCI: Found IRQ 10 for device 0000:00:0b.0
> > PCI: Sharing IRQ 10 with 0000:00:0a.0
> > PCI: Sharing IRQ 10 with 0000:00:0b.1
> > 3c59x: Donald Becker and others.
> > 0000:00:0b.0: 3Com PCI 3c556 Laptop Tornado at e0800400.
> > PCI: Enabling bus mastering for device 0000:00:0b.0
> > initcall vortex_init+0x0/0xb0 returned 0 after 47 msecs
> > ...
> > calling init_netconsole+0x0/0x1b0
> > netconsole: local port 4444
> > netconsole: local IP 10.0.1.9
> > netconsole: interface eth0
> > netconsole: remote port 4444
> > netconsole: remote IP 10.0.1.16
> > netconsole: remote ethernet address 00:19:xx:xx:xx:xx
> > netconsole: device eth0 not up yet, forcing it
> > eth0: setting half-duplex.
> > eth0: setting full-duplex.
> > ------------[ cut here ]------------
> > WARNING: at kernel/softirq.c:137 local_bh_enable_ip+0xd1/0xe0()
> > Pid: 1, comm: swapper Not tainted 2.6.26-rc6-tip #2091
> > [<c0125ecf>] warn_on_slowpath+0x4f/0x70
> > [<c0126834>] ? release_console_sem+0x1b4/0x1d0
> > [<c0126d00>] ? vprintk+0x2a0/0x450
>
> Now you can't printk in irq context any more?
Or I'm just confused by x86 stack dumps.
johannes
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-20 15:36 ` Michael Buesch
@ 2008-06-20 15:55 ` Ingo Molnar
2008-06-20 16:01 ` Michael Buesch
0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2008-06-20 15:55 UTC (permalink / raw)
To: Michael Buesch
Cc: Johannes Berg, Linus Torvalds, Linux Kernel list,
David Ellingsworth, linux-wireless
* Michael Buesch <mb@bu3sch.de> wrote:
> On Friday 20 June 2008 17:27:48 Ingo Molnar wrote:
> > [<c012b361>] local_bh_enable_ip+0xd1/0xe0
> > [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
> > [<c0471b92>] vortex_timer+0xe2/0x3e0
>
> > real bug or false positive?
>
> Well, a timer runs with IRQs disabled, no? So this would be a bug.
indeed - agreed :) [no time for me to fix it, but can test any rfc patch.]
so score +1 for Johannes's patch. I think if it works out fine and there
are no false positives in further testing i'll put it up for pull into
v2.6.26 as well, as it is catching real bugs.
Ingo
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-20 15:55 ` Ingo Molnar
@ 2008-06-20 16:01 ` Michael Buesch
2008-06-20 16:18 ` Michael Buesch
0 siblings, 1 reply; 18+ messages in thread
From: Michael Buesch @ 2008-06-20 16:01 UTC (permalink / raw)
To: Ingo Molnar
Cc: Johannes Berg, Linus Torvalds, Linux Kernel list,
David Ellingsworth, linux-wireless
On Friday 20 June 2008 17:55:41 Ingo Molnar wrote:
>
> * Michael Buesch <mb@bu3sch.de> wrote:
>
> > On Friday 20 June 2008 17:27:48 Ingo Molnar wrote:
> > > [<c012b361>] local_bh_enable_ip+0xd1/0xe0
> > > [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
> > > [<c0471b92>] vortex_timer+0xe2/0x3e0
> >
> > > real bug or false positive?
> >
> > Well, a timer runs with IRQs disabled, no? So this would be a bug.
>
> indeed - agreed :) [no time for me to fix it, but can test any rfc patch.]
A quick workaround always is to convert the lock into an _irqsafe lock.
Although it introduces higher overhead (interrupt-wise), it prevents the bug.
A real fix would require to understand the locking in the driver, which I
don't, as I never looked at the driver. :)
--
Greetings Michael.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-20 16:01 ` Michael Buesch
@ 2008-06-20 16:18 ` Michael Buesch
2008-06-23 8:41 ` Ingo Molnar
0 siblings, 1 reply; 18+ messages in thread
From: Michael Buesch @ 2008-06-20 16:18 UTC (permalink / raw)
To: Ingo Molnar
Cc: Johannes Berg, Linus Torvalds, Linux Kernel list,
David Ellingsworth, linux-wireless
On Friday 20 June 2008 18:01:09 Michael Buesch wrote:
> On Friday 20 June 2008 17:55:41 Ingo Molnar wrote:
> >
> > * Michael Buesch <mb@bu3sch.de> wrote:
> >
> > > On Friday 20 June 2008 17:27:48 Ingo Molnar wrote:
> > > > [<c012b361>] local_bh_enable_ip+0xd1/0xe0
> > > > [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
> > > > [<c0471b92>] vortex_timer+0xe2/0x3e0
> > >
> > > > real bug or false positive?
> > >
> > > Well, a timer runs with IRQs disabled, no? So this would be a bug.
> >
> > indeed - agreed :) [no time for me to fix it, but can test any rfc patch.]
>
> A quick workaround always is to convert the lock into an _irqsafe lock.
> Although it introduces higher overhead (interrupt-wise), it prevents the bug.
> A real fix would require to understand the locking in the driver, which I
> don't, as I never looked at the driver. :)
However, looking at the driver I think the fix actually is trivial:
Index: wireless-testing/drivers/net/3c59x.c
===================================================================
--- wireless-testing.orig/drivers/net/3c59x.c 2008-05-16 00:26:29.000000000 +0200
+++ wireless-testing/drivers/net/3c59x.c 2008-06-20 18:16:55.000000000 +0200
@@ -1768,9 +1768,10 @@ vortex_timer(unsigned long data)
case XCVR_MII: case XCVR_NWAY:
{
ok = 1;
- spin_lock_bh(&vp->lock);
+ /* Interrupts are already disabled */
+ spin_lock(&vp->lock);
vortex_check_media(dev, 0);
- spin_unlock_bh(&vp->lock);
+ spin_unlock(&vp->lock);
}
break;
default: /* Other media types handled by Tx timeouts. */
vp->lock is also taken in hardware IRQ context, so we _have_ to always
use irqsafe locking. As we run in a timer with IRQs disabled,
we can simply use spin_lock.
--
Greetings Michael.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-20 16:18 ` Michael Buesch
@ 2008-06-23 8:41 ` Ingo Molnar
2008-06-23 9:22 ` Michael Buesch
2008-06-27 5:33 ` Jeff Garzik
0 siblings, 2 replies; 18+ messages in thread
From: Ingo Molnar @ 2008-06-23 8:41 UTC (permalink / raw)
To: Michael Buesch
Cc: Johannes Berg, Linus Torvalds, Linux Kernel list,
David Ellingsworth, linux-wireless, Jeff Garzik, netdev
* Michael Buesch <mb@bu3sch.de> wrote:
> On Friday 20 June 2008 18:01:09 Michael Buesch wrote:
> > On Friday 20 June 2008 17:55:41 Ingo Molnar wrote:
> > >
> > > * Michael Buesch <mb@bu3sch.de> wrote:
> > >
> > > > On Friday 20 June 2008 17:27:48 Ingo Molnar wrote:
> > > > > [<c012b361>] local_bh_enable_ip+0xd1/0xe0
> > > > > [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
> > > > > [<c0471b92>] vortex_timer+0xe2/0x3e0
> > > >
> > > > > real bug or false positive?
> > > >
> > > > Well, a timer runs with IRQs disabled, no? So this would be a bug.
> > >
> > > indeed - agreed :) [no time for me to fix it, but can test any rfc patch.]
> >
> > A quick workaround always is to convert the lock into an _irqsafe lock.
> > Although it introduces higher overhead (interrupt-wise), it prevents the bug.
> > A real fix would require to understand the locking in the driver, which I
> > don't, as I never looked at the driver. :)
>
>
> However, looking at the driver I think the fix actually is trivial:
>
> Index: wireless-testing/drivers/net/3c59x.c
> ===================================================================
> --- wireless-testing.orig/drivers/net/3c59x.c 2008-05-16 00:26:29.000000000 +0200
> +++ wireless-testing/drivers/net/3c59x.c 2008-06-20 18:16:55.000000000 +0200
> @@ -1768,9 +1768,10 @@ vortex_timer(unsigned long data)
> case XCVR_MII: case XCVR_NWAY:
> {
> ok = 1;
> - spin_lock_bh(&vp->lock);
> + /* Interrupts are already disabled */
> + spin_lock(&vp->lock);
> vortex_check_media(dev, 0);
> - spin_unlock_bh(&vp->lock);
> + spin_unlock(&vp->lock);
> }
> break;
> default: /* Other media types handled by Tx timeouts. */
>
>
> vp->lock is also taken in hardware IRQ context, so we _have_ to always
> use irqsafe locking. As we run in a timer with IRQs disabled, we can
> simply use spin_lock.
thanks Michael! I have applied your fix to the tip/out-of-tree
local/non-propagated fixlets branch for more testing. It takes up to 4
hours to trigger this warning on that testbox, so i should know it
whether it fixes the bug later today.
Cc:-ed Jeff, he might want to pick up this fix, find the tidied up
commit below. (3c59x is still present in various older laptops, so
-stable candidate too i guess.)
Ingo
---------------------->
commit 24a5454d9f7863b00143760197f0ec29c8234289
Author: Michael Buesch <mb@bu3sch.de>
Date: Fri Jun 20 18:18:34 2008 +0200
net, vortex: fix lockup
Ingo Molnar reported:
-tip testing found that Johannes Berg's "softirq: remove irqs_disabled
warning from local_bh_enable" enhancement to lockdep triggers a new
warning on an old testbox that uses 3c59x vortex and netlogging:
----->
calling vortex_init+0x0/0xb0
PCI: Found IRQ 10 for device 0000:00:0b.0
PCI: Sharing IRQ 10 with 0000:00:0a.0
PCI: Sharing IRQ 10 with 0000:00:0b.1
3c59x: Donald Becker and others.
0000:00:0b.0: 3Com PCI 3c556 Laptop Tornado at e0800400.
PCI: Enabling bus mastering for device 0000:00:0b.0
initcall vortex_init+0x0/0xb0 returned 0 after 47 msecs
...
calling init_netconsole+0x0/0x1b0
netconsole: local port 4444
netconsole: local IP 10.0.1.9
netconsole: interface eth0
netconsole: remote port 4444
netconsole: remote IP 10.0.1.16
netconsole: remote ethernet address 00:19:xx:xx:xx:xx
netconsole: device eth0 not up yet, forcing it
eth0: setting half-duplex.
eth0: setting full-duplex.
------------[ cut here ]------------
WARNING: at kernel/softirq.c:137 local_bh_enable_ip+0xd1/0xe0()
Pid: 1, comm: swapper Not tainted 2.6.26-rc6-tip #2091
[<c0125ecf>] warn_on_slowpath+0x4f/0x70
[<c0126834>] ? release_console_sem+0x1b4/0x1d0
[<c0126d00>] ? vprintk+0x2a0/0x450
[<c012fde5>] ? __mod_timer+0xa5/0xc0
[<c046f7fd>] ? mdio_sync+0x3d/0x50
[<c0160ef6>] ? marker_probe_cb+0x46/0xa0
[<c0126ed7>] ? printk+0x27/0x50
[<c046f4c3>] ? vortex_set_duplex+0x43/0xc0
[<c046f521>] ? vortex_set_duplex+0xa1/0xc0
[<c0471b92>] ? vortex_timer+0xe2/0x3e0
[<c012b361>] local_bh_enable_ip+0xd1/0xe0
[<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
[<c0471b92>] vortex_timer+0xe2/0x3e0
[<c014743b>] ? trace_hardirqs_on+0xb/0x10
[<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
[<c012f8b2>] run_timer_softirq+0x162/0x1c0
[<c0471ab0>] ? vortex_timer+0x0/0x3e0
[<c012b361>] local_bh_enable_ip+0xd1/0xe0
[<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
[<c0471b92>] vortex_timer+0xe2/0x3e0
[<c014743b>] ? trace_hardirqs_on+0xb/0x10
[<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
[<c012f8b2>] run_timer_softirq+0x162/0x1c0
[<c0471ab0>] ? vortex_timer+0x0/0x3e0
[<c0471ab0>] ? vortex_timer+0x0/0x3e0
[<c012b60a>] __do_softirq+0x9a/0x160
[<c012b570>] ? __do_softirq+0x0/0x160
[<c0106775>] call_on_stack+0x15/0x30
[<c012b4f5>] ? irq_exit+0x55/0x60
[<c0106e85>] ? do_IRQ+0x85/0xd0
[<c0147391>] ? trace_hardirqs_on_caller+0xc1/0x160
[<c0104888>] ? common_interrupt+0x28/0x30
[<c08d8ac8>] ? mutex_unlock+0x8/0x10
[<c08d8180>] ? _cond_resched+0x10/0x30
[<c07a3be7>] ? netpoll_setup+0x117/0x390
[<c0cbfcfe>] ? init_netconsole+0x14e/0x1b0
[<c013d539>] ? ktime_get+0x19/0x40
[<c0c9bab2>] ? kernel_init+0x1b2/0x2c0
[<c0cbfbb0>] ? init_netconsole+0x0/0x1b0
[<c0396aa4>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c0103f12>] ? restore_nocheck_notrace+0x0/0xe
[<c0c9b900>] ? kernel_init+0x0/0x2c0
[<c0c9b900>] ? kernel_init+0x0/0x2c0
[<c0104aa7>] ? kernel_thread_helper+0x7/0x10
=======================
---[ end trace 37f9c502aff112e0 ]---
console [netcon0] enabled
netconsole: network logging started
initcall init_netconsole+0x0/0x1b0 returned 0 after 2914 msecs
looking at the driver I think the bug is real and the fix actually
is trivial.
vp->lock is also taken in hardware IRQ context, so we _have_ to always
use irqsafe locking. As we run in a timer with IRQs disabled,
we can simply use spin_lock.
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
index 2edda8c..aabad8c 100644
--- a/drivers/net/3c59x.c
+++ b/drivers/net/3c59x.c
@@ -1768,9 +1768,10 @@ vortex_timer(unsigned long data)
case XCVR_MII: case XCVR_NWAY:
{
ok = 1;
- spin_lock_bh(&vp->lock);
+ /* Interrupts are already disabled */
+ spin_lock(&vp->lock);
vortex_check_media(dev, 0);
- spin_unlock_bh(&vp->lock);
+ spin_unlock(&vp->lock);
}
break;
default: /* Other media types handled by Tx timeouts. */
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-23 8:41 ` Ingo Molnar
@ 2008-06-23 9:22 ` Michael Buesch
2008-06-27 5:33 ` Jeff Garzik
1 sibling, 0 replies; 18+ messages in thread
From: Michael Buesch @ 2008-06-23 9:22 UTC (permalink / raw)
To: Ingo Molnar
Cc: Johannes Berg, Linus Torvalds, Linux Kernel list,
David Ellingsworth, linux-wireless, Jeff Garzik, netdev
On Monday 23 June 2008 10:41:23 Ingo Molnar wrote:
> ---------------------->
> commit 24a5454d9f7863b00143760197f0ec29c8234289
> Author: Michael Buesch <mb@bu3sch.de>
> Date: Fri Jun 20 18:18:34 2008 +0200
>
> net, vortex: fix lockup
>
> Ingo Molnar reported:
>
> -tip testing found that Johannes Berg's "softirq: remove irqs_disabled
> warning from local_bh_enable" enhancement to lockdep triggers a new
> warning on an old testbox that uses 3c59x vortex and netlogging:
>
> ----->
> calling vortex_init+0x0/0xb0
> PCI: Found IRQ 10 for device 0000:00:0b.0
> PCI: Sharing IRQ 10 with 0000:00:0a.0
> PCI: Sharing IRQ 10 with 0000:00:0b.1
> 3c59x: Donald Becker and others.
> 0000:00:0b.0: 3Com PCI 3c556 Laptop Tornado at e0800400.
> PCI: Enabling bus mastering for device 0000:00:0b.0
> initcall vortex_init+0x0/0xb0 returned 0 after 47 msecs
> ...
> calling init_netconsole+0x0/0x1b0
> netconsole: local port 4444
> netconsole: local IP 10.0.1.9
> netconsole: interface eth0
> netconsole: remote port 4444
> netconsole: remote IP 10.0.1.16
> netconsole: remote ethernet address 00:19:xx:xx:xx:xx
> netconsole: device eth0 not up yet, forcing it
> eth0: setting half-duplex.
> eth0: setting full-duplex.
> ------------[ cut here ]------------
> WARNING: at kernel/softirq.c:137 local_bh_enable_ip+0xd1/0xe0()
> Pid: 1, comm: swapper Not tainted 2.6.26-rc6-tip #2091
> [<c0125ecf>] warn_on_slowpath+0x4f/0x70
> [<c0126834>] ? release_console_sem+0x1b4/0x1d0
> [<c0126d00>] ? vprintk+0x2a0/0x450
> [<c012fde5>] ? __mod_timer+0xa5/0xc0
> [<c046f7fd>] ? mdio_sync+0x3d/0x50
> [<c0160ef6>] ? marker_probe_cb+0x46/0xa0
> [<c0126ed7>] ? printk+0x27/0x50
> [<c046f4c3>] ? vortex_set_duplex+0x43/0xc0
> [<c046f521>] ? vortex_set_duplex+0xa1/0xc0
> [<c0471b92>] ? vortex_timer+0xe2/0x3e0
> [<c012b361>] local_bh_enable_ip+0xd1/0xe0
> [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
> [<c0471b92>] vortex_timer+0xe2/0x3e0
> [<c014743b>] ? trace_hardirqs_on+0xb/0x10
> [<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
> [<c012f8b2>] run_timer_softirq+0x162/0x1c0
> [<c0471ab0>] ? vortex_timer+0x0/0x3e0
> [<c012b361>] local_bh_enable_ip+0xd1/0xe0
> [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
> [<c0471b92>] vortex_timer+0xe2/0x3e0
> [<c014743b>] ? trace_hardirqs_on+0xb/0x10
> [<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
> [<c012f8b2>] run_timer_softirq+0x162/0x1c0
> [<c0471ab0>] ? vortex_timer+0x0/0x3e0
> [<c0471ab0>] ? vortex_timer+0x0/0x3e0
> [<c012b60a>] __do_softirq+0x9a/0x160
> [<c012b570>] ? __do_softirq+0x0/0x160
> [<c0106775>] call_on_stack+0x15/0x30
> [<c012b4f5>] ? irq_exit+0x55/0x60
> [<c0106e85>] ? do_IRQ+0x85/0xd0
> [<c0147391>] ? trace_hardirqs_on_caller+0xc1/0x160
> [<c0104888>] ? common_interrupt+0x28/0x30
> [<c08d8ac8>] ? mutex_unlock+0x8/0x10
> [<c08d8180>] ? _cond_resched+0x10/0x30
> [<c07a3be7>] ? netpoll_setup+0x117/0x390
> [<c0cbfcfe>] ? init_netconsole+0x14e/0x1b0
> [<c013d539>] ? ktime_get+0x19/0x40
> [<c0c9bab2>] ? kernel_init+0x1b2/0x2c0
> [<c0cbfbb0>] ? init_netconsole+0x0/0x1b0
> [<c0396aa4>] ? trace_hardirqs_on_thunk+0xc/0x10
> [<c0103f12>] ? restore_nocheck_notrace+0x0/0xe
> [<c0c9b900>] ? kernel_init+0x0/0x2c0
> [<c0c9b900>] ? kernel_init+0x0/0x2c0
> [<c0104aa7>] ? kernel_thread_helper+0x7/0x10
> =======================
> ---[ end trace 37f9c502aff112e0 ]---
> console [netcon0] enabled
> netconsole: network logging started
> initcall init_netconsole+0x0/0x1b0 returned 0 after 2914 msecs
>
> looking at the driver I think the bug is real and the fix actually
> is trivial.
>
> vp->lock is also taken in hardware IRQ context, so we _have_ to always
> use irqsafe locking. As we run in a timer with IRQs disabled,
> we can simply use spin_lock.
>
> Cc: <stable@kernel.org>
Signed-off-by: Michael Buesch <mb@bu3sch.de>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
>
> diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
> index 2edda8c..aabad8c 100644
> --- a/drivers/net/3c59x.c
> +++ b/drivers/net/3c59x.c
> @@ -1768,9 +1768,10 @@ vortex_timer(unsigned long data)
> case XCVR_MII: case XCVR_NWAY:
> {
> ok = 1;
> - spin_lock_bh(&vp->lock);
> + /* Interrupts are already disabled */
> + spin_lock(&vp->lock);
> vortex_check_media(dev, 0);
> - spin_unlock_bh(&vp->lock);
> + spin_unlock(&vp->lock);
> }
> break;
> default: /* Other media types handled by Tx timeouts. */
>
>
--
Greetings Michael.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-23 8:41 ` Ingo Molnar
2008-06-23 9:22 ` Michael Buesch
@ 2008-06-27 5:33 ` Jeff Garzik
2008-06-27 10:53 ` Ingo Molnar
1 sibling, 1 reply; 18+ messages in thread
From: Jeff Garzik @ 2008-06-27 5:33 UTC (permalink / raw)
To: Ingo Molnar
Cc: Michael Buesch, Johannes Berg, Linus Torvalds, Linux Kernel list,
David Ellingsworth, linux-wireless, netdev
Ingo Molnar wrote:
> * Michael Buesch <mb@bu3sch.de> wrote:
>
>> On Friday 20 June 2008 18:01:09 Michael Buesch wrote:
>>> On Friday 20 June 2008 17:55:41 Ingo Molnar wrote:
>>>> * Michael Buesch <mb@bu3sch.de> wrote:
>>>>
>>>>> On Friday 20 June 2008 17:27:48 Ingo Molnar wrote:
>>>>>> [<c012b361>] local_bh_enable_ip+0xd1/0xe0
>>>>>> [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
>>>>>> [<c0471b92>] vortex_timer+0xe2/0x3e0
>>>>>> real bug or false positive?
>>>>> Well, a timer runs with IRQs disabled, no? So this would be a bug.
>>>> indeed - agreed :) [no time for me to fix it, but can test any rfc patch.]
>>> A quick workaround always is to convert the lock into an _irqsafe lock.
>>> Although it introduces higher overhead (interrupt-wise), it prevents the bug.
>>> A real fix would require to understand the locking in the driver, which I
>>> don't, as I never looked at the driver. :)
>>
>> However, looking at the driver I think the fix actually is trivial:
>>
>> Index: wireless-testing/drivers/net/3c59x.c
>> ===================================================================
>> --- wireless-testing.orig/drivers/net/3c59x.c 2008-05-16 00:26:29.000000000 +0200
>> +++ wireless-testing/drivers/net/3c59x.c 2008-06-20 18:16:55.000000000 +0200
>> @@ -1768,9 +1768,10 @@ vortex_timer(unsigned long data)
>> case XCVR_MII: case XCVR_NWAY:
>> {
>> ok = 1;
>> - spin_lock_bh(&vp->lock);
>> + /* Interrupts are already disabled */
>> + spin_lock(&vp->lock);
>> vortex_check_media(dev, 0);
>> - spin_unlock_bh(&vp->lock);
>> + spin_unlock(&vp->lock);
>> }
>> break;
>> default: /* Other media types handled by Tx timeouts. */
>>
>>
>> vp->lock is also taken in hardware IRQ context, so we _have_ to always
>> use irqsafe locking. As we run in a timer with IRQs disabled, we can
>> simply use spin_lock.
>
> thanks Michael! I have applied your fix to the tip/out-of-tree
> local/non-propagated fixlets branch for more testing. It takes up to 4
> hours to trigger this warning on that testbox, so i should know it
> whether it fixes the bug later today.
>
> Cc:-ed Jeff, he might want to pick up this fix, find the tidied up
> commit below. (3c59x is still present in various older laptops, so
> -stable candidate too i guess.)
>
> Ingo
>
> ---------------------->
> commit 24a5454d9f7863b00143760197f0ec29c8234289
> Author: Michael Buesch <mb@bu3sch.de>
> Date: Fri Jun 20 18:18:34 2008 +0200
>
> net, vortex: fix lockup
>
> Ingo Molnar reported:
>
> -tip testing found that Johannes Berg's "softirq: remove irqs_disabled
> warning from local_bh_enable" enhancement to lockdep triggers a new
> warning on an old testbox that uses 3c59x vortex and netlogging:
>
> ----->
> calling vortex_init+0x0/0xb0
> PCI: Found IRQ 10 for device 0000:00:0b.0
> PCI: Sharing IRQ 10 with 0000:00:0a.0
> PCI: Sharing IRQ 10 with 0000:00:0b.1
> 3c59x: Donald Becker and others.
> 0000:00:0b.0: 3Com PCI 3c556 Laptop Tornado at e0800400.
> PCI: Enabling bus mastering for device 0000:00:0b.0
> initcall vortex_init+0x0/0xb0 returned 0 after 47 msecs
> ...
> calling init_netconsole+0x0/0x1b0
> netconsole: local port 4444
> netconsole: local IP 10.0.1.9
> netconsole: interface eth0
> netconsole: remote port 4444
> netconsole: remote IP 10.0.1.16
> netconsole: remote ethernet address 00:19:xx:xx:xx:xx
> netconsole: device eth0 not up yet, forcing it
> eth0: setting half-duplex.
> eth0: setting full-duplex.
> ------------[ cut here ]------------
> WARNING: at kernel/softirq.c:137 local_bh_enable_ip+0xd1/0xe0()
> Pid: 1, comm: swapper Not tainted 2.6.26-rc6-tip #2091
> [<c0125ecf>] warn_on_slowpath+0x4f/0x70
> [<c0126834>] ? release_console_sem+0x1b4/0x1d0
> [<c0126d00>] ? vprintk+0x2a0/0x450
> [<c012fde5>] ? __mod_timer+0xa5/0xc0
> [<c046f7fd>] ? mdio_sync+0x3d/0x50
> [<c0160ef6>] ? marker_probe_cb+0x46/0xa0
> [<c0126ed7>] ? printk+0x27/0x50
> [<c046f4c3>] ? vortex_set_duplex+0x43/0xc0
> [<c046f521>] ? vortex_set_duplex+0xa1/0xc0
> [<c0471b92>] ? vortex_timer+0xe2/0x3e0
> [<c012b361>] local_bh_enable_ip+0xd1/0xe0
> [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
> [<c0471b92>] vortex_timer+0xe2/0x3e0
> [<c014743b>] ? trace_hardirqs_on+0xb/0x10
> [<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
> [<c012f8b2>] run_timer_softirq+0x162/0x1c0
> [<c0471ab0>] ? vortex_timer+0x0/0x3e0
> [<c012b361>] local_bh_enable_ip+0xd1/0xe0
> [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
> [<c0471b92>] vortex_timer+0xe2/0x3e0
> [<c014743b>] ? trace_hardirqs_on+0xb/0x10
> [<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
> [<c012f8b2>] run_timer_softirq+0x162/0x1c0
> [<c0471ab0>] ? vortex_timer+0x0/0x3e0
> [<c0471ab0>] ? vortex_timer+0x0/0x3e0
> [<c012b60a>] __do_softirq+0x9a/0x160
> [<c012b570>] ? __do_softirq+0x0/0x160
> [<c0106775>] call_on_stack+0x15/0x30
> [<c012b4f5>] ? irq_exit+0x55/0x60
> [<c0106e85>] ? do_IRQ+0x85/0xd0
> [<c0147391>] ? trace_hardirqs_on_caller+0xc1/0x160
> [<c0104888>] ? common_interrupt+0x28/0x30
> [<c08d8ac8>] ? mutex_unlock+0x8/0x10
> [<c08d8180>] ? _cond_resched+0x10/0x30
> [<c07a3be7>] ? netpoll_setup+0x117/0x390
> [<c0cbfcfe>] ? init_netconsole+0x14e/0x1b0
> [<c013d539>] ? ktime_get+0x19/0x40
> [<c0c9bab2>] ? kernel_init+0x1b2/0x2c0
> [<c0cbfbb0>] ? init_netconsole+0x0/0x1b0
> [<c0396aa4>] ? trace_hardirqs_on_thunk+0xc/0x10
> [<c0103f12>] ? restore_nocheck_notrace+0x0/0xe
> [<c0c9b900>] ? kernel_init+0x0/0x2c0
> [<c0c9b900>] ? kernel_init+0x0/0x2c0
> [<c0104aa7>] ? kernel_thread_helper+0x7/0x10
> =======================
> ---[ end trace 37f9c502aff112e0 ]---
> console [netcon0] enabled
> netconsole: network logging started
> initcall init_netconsole+0x0/0x1b0 returned 0 after 2914 msecs
>
> looking at the driver I think the bug is real and the fix actually
> is trivial.
>
> vp->lock is also taken in hardware IRQ context, so we _have_ to always
> use irqsafe locking. As we run in a timer with IRQs disabled,
> we can simply use spin_lock.
>
> Cc: <stable@kernel.org>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
>
> diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
> index 2edda8c..aabad8c 100644
> --- a/drivers/net/3c59x.c
> +++ b/drivers/net/3c59x.c
> @@ -1768,9 +1768,10 @@ vortex_timer(unsigned long data)
> case XCVR_MII: case XCVR_NWAY:
> {
> ok = 1;
> - spin_lock_bh(&vp->lock);
> + /* Interrupts are already disabled */
> + spin_lock(&vp->lock);
> vortex_check_media(dev, 0);
> - spin_unlock_bh(&vp->lock);
> + spin_unlock(&vp->lock);
applied
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-27 5:33 ` Jeff Garzik
@ 2008-06-27 10:53 ` Ingo Molnar
2008-06-27 10:56 ` Johannes Berg
0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2008-06-27 10:53 UTC (permalink / raw)
To: Jeff Garzik
Cc: Michael Buesch, Johannes Berg, Linus Torvalds, Linux Kernel list,
David Ellingsworth, linux-wireless, netdev
* Jeff Garzik <jgarzik@pobox.com> wrote:
> applied
track record of 4 days of testing: the warning indeed went away and no
other funnies in this area happened on that box. Thanks,
Ingo
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable
2008-06-27 10:53 ` Ingo Molnar
@ 2008-06-27 10:56 ` Johannes Berg
0 siblings, 0 replies; 18+ messages in thread
From: Johannes Berg @ 2008-06-27 10:56 UTC (permalink / raw)
To: Ingo Molnar
Cc: Jeff Garzik, Michael Buesch, Linus Torvalds, Linux Kernel list,
David Ellingsworth, linux-wireless, netdev
[-- Attachment #1: Type: text/plain, Size: 352 bytes --]
On Fri, 2008-06-27 at 12:53 +0200, Ingo Molnar wrote:
> * Jeff Garzik <jgarzik@pobox.com> wrote:
>
> > applied
>
> track record of 4 days of testing: the warning indeed went away and no
> other funnies in this area happened on that box. Thanks,
I've been running with my change for a while too now and never noticed
anything.
johannes
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2008-06-27 10:57 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-17 21:57 [PATCH/RFC] remove irqs_disabled warning from local_bh_enable Johannes Berg
2008-06-17 23:08 ` David Miller
2008-06-17 23:55 ` Linus Torvalds
2008-06-18 7:01 ` Peter Zijlstra
2008-06-18 7:29 ` Johannes Berg
2008-06-20 13:46 ` Ingo Molnar
2008-06-20 15:27 ` Ingo Molnar
2008-06-20 15:36 ` Michael Buesch
2008-06-20 15:55 ` Ingo Molnar
2008-06-20 16:01 ` Michael Buesch
2008-06-20 16:18 ` Michael Buesch
2008-06-23 8:41 ` Ingo Molnar
2008-06-23 9:22 ` Michael Buesch
2008-06-27 5:33 ` Jeff Garzik
2008-06-27 10:53 ` Ingo Molnar
2008-06-27 10:56 ` Johannes Berg
2008-06-20 15:43 ` Johannes Berg
2008-06-20 15:46 ` Johannes Berg
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox