From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff Garzik Subject: Re: [PATCH/RFC] remove irqs_disabled warning from local_bh_enable Date: Fri, 27 Jun 2008 01:33:08 -0400 Message-ID: <48647B94.5000006@pobox.com> References: <1213739834.3803.137.camel@johannes.berg> <20080620155541.GC6656@elte.hu> <200806201801.09858.mb@bu3sch.de> <200806201818.34904.mb@bu3sch.de> <20080623084123.GA32688@elte.hu> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Michael Buesch , Johannes Berg , Linus Torvalds , Linux Kernel list , David Ellingsworth , linux-wireless , netdev@vger.kernel.org To: Ingo Molnar Return-path: Received: from srv5.dvmed.net ([207.36.208.214]:56719 "EHLO mail.dvmed.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753063AbYF0Fdc (ORCPT ); Fri, 27 Jun 2008 01:33:32 -0400 In-Reply-To: <20080623084123.GA32688@elte.hu> Sender: netdev-owner@vger.kernel.org List-ID: Ingo Molnar wrote: > * Michael Buesch wrote: > >> On Friday 20 June 2008 18:01:09 Michael Buesch wrote: >>> On Friday 20 June 2008 17:55:41 Ingo Molnar wrote: >>>> * Michael Buesch wrote: >>>> >>>>> On Friday 20 June 2008 17:27:48 Ingo Molnar wrote: >>>>>> [] local_bh_enable_ip+0xd1/0xe0 >>>>>> [] _spin_unlock_bh+0x2f/0x40 >>>>>> [] vortex_timer+0xe2/0x3e0 >>>>>> real bug or false positive? >>>>> Well, a timer runs with IRQs disabled, no? So this would be a bug. >>>> indeed - agreed :) [no time for me to fix it, but can test any rfc patch.] >>> A quick workaround always is to convert the lock into an _irqsafe lock. >>> Although it introduces higher overhead (interrupt-wise), it prevents the bug. >>> A real fix would require to understand the locking in the driver, which I >>> don't, as I never looked at the driver. :) >> >> However, looking at the driver I think the fix actually is trivial: >> >> Index: wireless-testing/drivers/net/3c59x.c >> =================================================================== >> --- wireless-testing.orig/drivers/net/3c59x.c 2008-05-16 00:26:29.000000000 +0200 >> +++ wireless-testing/drivers/net/3c59x.c 2008-06-20 18:16:55.000000000 +0200 >> @@ -1768,9 +1768,10 @@ vortex_timer(unsigned long data) >> case XCVR_MII: case XCVR_NWAY: >> { >> ok = 1; >> - spin_lock_bh(&vp->lock); >> + /* Interrupts are already disabled */ >> + spin_lock(&vp->lock); >> vortex_check_media(dev, 0); >> - spin_unlock_bh(&vp->lock); >> + spin_unlock(&vp->lock); >> } >> break; >> default: /* Other media types handled by Tx timeouts. */ >> >> >> vp->lock is also taken in hardware IRQ context, so we _have_ to always >> use irqsafe locking. As we run in a timer with IRQs disabled, we can >> simply use spin_lock. > > thanks Michael! I have applied your fix to the tip/out-of-tree > local/non-propagated fixlets branch for more testing. It takes up to 4 > hours to trigger this warning on that testbox, so i should know it > whether it fixes the bug later today. > > Cc:-ed Jeff, he might want to pick up this fix, find the tidied up > commit below. (3c59x is still present in various older laptops, so > -stable candidate too i guess.) > > Ingo > > ----------------------> > commit 24a5454d9f7863b00143760197f0ec29c8234289 > Author: Michael Buesch > Date: Fri Jun 20 18:18:34 2008 +0200 > > net, vortex: fix lockup > > Ingo Molnar reported: > > -tip testing found that Johannes Berg's "softirq: remove irqs_disabled > warning from local_bh_enable" enhancement to lockdep triggers a new > warning on an old testbox that uses 3c59x vortex and netlogging: > > -----> > calling vortex_init+0x0/0xb0 > PCI: Found IRQ 10 for device 0000:00:0b.0 > PCI: Sharing IRQ 10 with 0000:00:0a.0 > PCI: Sharing IRQ 10 with 0000:00:0b.1 > 3c59x: Donald Becker and others. > 0000:00:0b.0: 3Com PCI 3c556 Laptop Tornado at e0800400. > PCI: Enabling bus mastering for device 0000:00:0b.0 > initcall vortex_init+0x0/0xb0 returned 0 after 47 msecs > ... > calling init_netconsole+0x0/0x1b0 > netconsole: local port 4444 > netconsole: local IP 10.0.1.9 > netconsole: interface eth0 > netconsole: remote port 4444 > netconsole: remote IP 10.0.1.16 > netconsole: remote ethernet address 00:19:xx:xx:xx:xx > netconsole: device eth0 not up yet, forcing it > eth0: setting half-duplex. > eth0: setting full-duplex. > ------------[ cut here ]------------ > WARNING: at kernel/softirq.c:137 local_bh_enable_ip+0xd1/0xe0() > Pid: 1, comm: swapper Not tainted 2.6.26-rc6-tip #2091 > [] warn_on_slowpath+0x4f/0x70 > [] ? release_console_sem+0x1b4/0x1d0 > [] ? vprintk+0x2a0/0x450 > [] ? __mod_timer+0xa5/0xc0 > [] ? mdio_sync+0x3d/0x50 > [] ? marker_probe_cb+0x46/0xa0 > [] ? printk+0x27/0x50 > [] ? vortex_set_duplex+0x43/0xc0 > [] ? vortex_set_duplex+0xa1/0xc0 > [] ? vortex_timer+0xe2/0x3e0 > [] local_bh_enable_ip+0xd1/0xe0 > [] _spin_unlock_bh+0x2f/0x40 > [] vortex_timer+0xe2/0x3e0 > [] ? trace_hardirqs_on+0xb/0x10 > [] ? trace_hardirqs_on_caller+0x88/0x160 > [] run_timer_softirq+0x162/0x1c0 > [] ? vortex_timer+0x0/0x3e0 > [] local_bh_enable_ip+0xd1/0xe0 > [] _spin_unlock_bh+0x2f/0x40 > [] vortex_timer+0xe2/0x3e0 > [] ? trace_hardirqs_on+0xb/0x10 > [] ? trace_hardirqs_on_caller+0x88/0x160 > [] run_timer_softirq+0x162/0x1c0 > [] ? vortex_timer+0x0/0x3e0 > [] ? vortex_timer+0x0/0x3e0 > [] __do_softirq+0x9a/0x160 > [] ? __do_softirq+0x0/0x160 > [] call_on_stack+0x15/0x30 > [] ? irq_exit+0x55/0x60 > [] ? do_IRQ+0x85/0xd0 > [] ? trace_hardirqs_on_caller+0xc1/0x160 > [] ? common_interrupt+0x28/0x30 > [] ? mutex_unlock+0x8/0x10 > [] ? _cond_resched+0x10/0x30 > [] ? netpoll_setup+0x117/0x390 > [] ? init_netconsole+0x14e/0x1b0 > [] ? ktime_get+0x19/0x40 > [] ? kernel_init+0x1b2/0x2c0 > [] ? init_netconsole+0x0/0x1b0 > [] ? trace_hardirqs_on_thunk+0xc/0x10 > [] ? restore_nocheck_notrace+0x0/0xe > [] ? kernel_init+0x0/0x2c0 > [] ? kernel_init+0x0/0x2c0 > [] ? kernel_thread_helper+0x7/0x10 > ======================= > ---[ end trace 37f9c502aff112e0 ]--- > console [netcon0] enabled > netconsole: network logging started > initcall init_netconsole+0x0/0x1b0 returned 0 after 2914 msecs > > looking at the driver I think the bug is real and the fix actually > is trivial. > > vp->lock is also taken in hardware IRQ context, so we _have_ to always > use irqsafe locking. As we run in a timer with IRQs disabled, > we can simply use spin_lock. > > Cc: > Signed-off-by: Ingo Molnar > > diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c > index 2edda8c..aabad8c 100644 > --- a/drivers/net/3c59x.c > +++ b/drivers/net/3c59x.c > @@ -1768,9 +1768,10 @@ vortex_timer(unsigned long data) > case XCVR_MII: case XCVR_NWAY: > { > ok = 1; > - spin_lock_bh(&vp->lock); > + /* Interrupts are already disabled */ > + spin_lock(&vp->lock); > vortex_check_media(dev, 0); > - spin_unlock_bh(&vp->lock); > + spin_unlock(&vp->lock); applied