RE: Fw: Badness in local_bh_enable at kernel/softirq.c:119

* RE: Fw: Badness in local_bh_enable at kernel/softirq.c:119
@ 2003-10-01  8:19 Feldman, Scott
  2003-10-01  8:37 ` David S. Miller
  2003-10-01 14:40 ` Randy.Dunlap
  0 siblings, 2 replies; 8+ messages in thread
From: Feldman, Scott @ 2003-10-01  8:19 UTC
  To: David S. Miller; +Cc: jgarzik, akpm, netdev, cramerj

> Why do you even need to use IRQ locking here?
> 
> Your e1000 netdev->hard_start_xmit method doesn't need to do 
> anything special, why does this timer code?  I suppose you 
> need to synchronize with e1000_clean_tx_irq() in the non-NAPI 
> case right?  If so, that's not being accomplished by what 
> your code is doing.  If nobody else takes that xmit_lock in 
> an IRQ disabling manner, the e1000 timer code doing so 
> doesn't make any difference.
> 
> I have an idea for attacking the problem, once you figure out 
> what kind of locking you really need.  Do whatever you need 
> to do to synchronize on the hardware side, but instead of 
> directly freeing the SKB, add each one to a list.  A pointer 
> to the head of this list is stored on the stack of the timer 
> routine, and passed down into the TX purger.
> 
> Then at the top level you can drop all your locks, re-enable 
> hw IRQs and whatever else you need to do, then pass the SKBs 
> in the list off to dev_kfree_skb_irq() (this is the 
> appropriate routine to call to free an SKB from a timer 
> handler, which runs in soft interrupt context).

Chris can jump in here anytime.  :-)

Synchronizing on the hardware side is stumping me.  We have the list of
skbs you describe, but I'm concerned about unmapping the skb buffers if
hardware is right in the middle of some DMA  on one of the buffers.
Some archs really don't like hardware accessing unmapped buffers.

Here's what I'm thinking: when link down is detected in the timer, just
trick hardware into thinking link is still up (ILOS - Invert Loss of
Signal).  No locking, no disabling of interrupts.  Hardware will do the
natural thing by completing the outstanding sends and also provide the
interrupts so we can clean/return skbs as normal (e1000_clean_tx_irq).
Something like:

<timer>
	if lost link
		if outstanding Tx work
			set ILOS		// h/w thinks link is up, DMA continues
			mdelay(10)
			clear ILOS		// h/w thinks link is down

The mdelay(10) is terrible, but we've already got that in the current
tx_flush routine.
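
For illustration, the pseudocode above can be modelled in plain C. This is
only a userspace sketch under stated assumptions, not e1000 driver code:
struct fake_adapter, fake_delay_and_drain() and watchdog_flush() are
hypothetical names, the CTRL_ILOS bit value is a placeholder, and the
"hardware" is simulated by draining the pending count while the bit is set.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit value for illustration only (assumption). */
#define CTRL_ILOS 0x00000080u

/* Hypothetical stand-in for the adapter state the watchdog looks at. */
struct fake_adapter {
	uint32_t ctrl;		/* simulated device control register */
	int link_up;		/* link state seen by the timer */
	unsigned int tx_pending;	/* sends queued to hardware */
};

/*
 * Stand-in for mdelay(10): while ILOS is set the simulated hardware
 * believes link is up and completes the outstanding sends.
 */
static void fake_delay_and_drain(struct fake_adapter *a)
{
	if (a->ctrl & CTRL_ILOS)
		a->tx_pending = 0;
}

/* Timer body: no locking and no interrupt disabling, as proposed. */
static void watchdog_flush(struct fake_adapter *a)
{
	uint32_t ctrl;

	assert(a != 0);
	if (a->link_up || a->tx_pending == 0)
		return;		/* nothing stuck, nothing to do */

	ctrl = a->ctrl;
	a->ctrl = ctrl | CTRL_ILOS;	/* h/w thinks link is up */
	fake_delay_and_drain(a);
	a->ctrl = ctrl;			/* h/w thinks link is down again */
}
```

In the real driver the completed sends would be returned by the normal
e1000_clean_tx_irq path; the model only shows the order of the register
writes around the delay.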

Chris, what am I missing?  I didn't include the ANE business for
clarity.

-scott


* Re: Fw: Badness in local_bh_enable at kernel/softirq.c:119
  2003-10-01  8:19 Fw: Badness in local_bh_enable at kernel/softirq.c:119 Feldman, Scott
@ 2003-10-01  8:37 ` David S. Miller
  2003-10-01 14:40 ` Randy.Dunlap
  1 sibling, 0 replies; 8+ messages in thread
From: David S. Miller @ 2003-10-01  8:37 UTC
  To: Feldman, Scott; +Cc: jgarzik, akpm, netdev, cramerj

On Wed, 1 Oct 2003 01:19:41 -0700
"Feldman, Scott" <scott.feldman@intel.com> wrote:

> Synchronizing on the hardware side is stumping me.  We have the list of
> skbs you describe, but I'm concerned about unmapping the skb buffers if
> hardware is right in the middle of some DMA  on one of the buffers.
> Some archs really don't like hardware accessing unmapped buffers.

Good point, if the e1000 accesses the DMA buffer after the unmap
it will cause many arch's to signal PCI errors since the IOMMU
will no longer have a valid translation for those DMA requests.

> Here's what I'm thinking: when link down is detected in the timer, just
> trick hardware into thinking link is still up (ILOS - Invert Loss of
> Signal).  No locking, no disabling of interrupts.  Hardware will do the
> natural thing by completing the outstanding sends and also provide the
> interrupts so we can clean/return skbs as normal (e1000_clean_tx_irq).

If you can make that work, it's the simplest fix.


* Re: Fw: Badness in local_bh_enable at kernel/softirq.c:119
  2003-10-01  8:19 Fw: Badness in local_bh_enable at kernel/softirq.c:119 Feldman, Scott
  2003-10-01  8:37 ` David S. Miller
@ 2003-10-01 14:40 ` Randy.Dunlap
  1 sibling, 0 replies; 8+ messages in thread
From: Randy.Dunlap @ 2003-10-01 14:40 UTC
  To: Feldman, Scott; +Cc: davem, jgarzik, akpm, netdev, cramerj

On Wed, 1 Oct 2003 01:19:41 -0700 "Feldman, Scott" <scott.feldman@intel.com> wrote:

| Chris can jump in here anytime.  :-)
| 
| Synchronizing on the hardware side is stumping me.  We have the list of
| skbs you describe, but I'm concerned about unmapping the skb buffers if
| hardware is right in the middle of some DMA  on one of the buffers.
| Some archs really don't like hardware accessing unmapped buffers.
| 
| Here's what I'm thinking: when link down is detected in the timer, just
| trick hardware into thinking link is still up (ILOS - Invert Loss of
| Signal).  No locking, no disabling of interrupts.  Hardware will do the
| natural thing by completing the outstanding sends and also provide the
| interrupts so we can clean/return skbs as normal (e1000_clean_tx_irq).
| Something like:
| 
| <timer>
| 	if lost link
| 		if outstanding Tx work
| 			set ILOS		// h/w thinks link is up, DMA continues
| 			mdelay(10)
| 			clear ILOS		// h/w thinks link is down
| 			
| The mdelay(10) is terrible, but we've already got that in the current
| tx_flush routine.
| 
| Chris, what am I missing?  I didn't include the ANE business for
| clarity.

What happens if the link comes back up (live) during the mdelay
period?  Tiny race?  Just a delay until it's corrected?

--
~Randy


* RE: Fw: Badness in local_bh_enable at kernel/softirq.c:119
@ 2003-09-30 17:27 Feldman, Scott
  2003-10-01  6:51 ` David S. Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Feldman, Scott @ 2003-09-30 17:27 UTC
  To: David S. Miller; +Cc: jgarzik, akpm, netdev, cramerj

> Sorry, in case it isn't painfully obvious, instead of hinting
> at it let me state explicitly that ->xmit_lock is a BH 
> disabling lock not an IRQ disabling one.
> 
> Therefore, e1000's IRQ disabling when grabbing that lock is
> buggy and needs to be changed to BH disabling.
> 
> If it needs to disable IRQs for its own internal locking, it
> needs to do so such that such IRQ disabling internal locks
> are not held while kfree_skb() is being invoked.
> 
> Calling kfree_skb() with IRQs disabled in the e1000 driver is
> the cause of this bug.

Thanks David for your help.
 
> Jeff, if you pushed these e1000 updates that make it grab
> ->xmit_lock() with disabling IRQs instead of BH into 2.4.x
> trees too, beware!

The e1000 driver has been like this (broken) for quite a while.  Recent
updates haven't messed with this code.

This gets back to the problem of trying to flush any queued transmits
when we lose link.  The e1000 hardware stops DMA when link loss is
detected, so any work queued to hardware is "stuck", and therefore we
don't release the associated skb resources until we regain link.  This
causes problems when we're sitting under a failover setup like bonding
or ANS.

At this point, I'm leaning towards removing the offending code in the
timer callback now, and taking a step back to solve the bigger problem,
either with a better locking scheme, or a new plan on how to flush the
"stuck" work.  We don't need kernel panics when you trip over the
Ethernet cable!  Sound like a plan?

@@ -1278,41 +1278,6 @@
 		e1000_leave_82542_rst(adapter);
 }
 
-static void
-e1000_tx_flush(struct e1000_adapter *adapter)
-{
-	uint32_t ctrl, tctl, txcw, icr;
-
-	e1000_irq_disable(adapter);
-
-	if(adapter->hw.mac_type < e1000_82543) {
-		/* Transmit Unit Reset */
-		tctl = E1000_READ_REG(&adapter->hw, TCTL);
-		E1000_WRITE_REG(&adapter->hw, TCTL, tctl | E1000_TCTL_RST);
-		E1000_WRITE_REG(&adapter->hw, TCTL, tctl);
-		e1000_clean_tx_ring(adapter);
-		e1000_configure_tx(adapter);
-	} else {
-		txcw = E1000_READ_REG(&adapter->hw, TXCW);
-		E1000_WRITE_REG(&adapter->hw, TXCW, txcw & ~E1000_TXCW_ANE);
-
-		ctrl = E1000_READ_REG(&adapter->hw, CTRL);
-		E1000_WRITE_REG(&adapter->hw, CTRL, ctrl | E1000_CTRL_SLU |
-				E1000_CTRL_ILOS);
-
-		mdelay(10);
-
-		e1000_clean_tx_irq(adapter);
-		E1000_WRITE_REG(&adapter->hw, CTRL, ctrl);
-		E1000_WRITE_REG(&adapter->hw, TXCW, txcw);
-
-		/* clear the link status change interrupts this caused */
-		icr = E1000_READ_REG(&adapter->hw, ICR);
-	}
-
-	e1000_irq_enable(adapter);
-}
-
 /* need to wait a few seconds after link up to get diagnostic information from the phy */
 
 static void
@@ -1414,15 +1379,6 @@
 	e1000_update_stats(adapter);
 	e1000_update_adaptive(&adapter->hw);
 
-	if(!netif_carrier_ok(netdev)) {
-		if(E1000_DESC_UNUSED(txdr) + 1 < txdr->count) {
-			unsigned long flags;
-			spin_lock_irqsave(&netdev->xmit_lock, flags);
-			e1000_tx_flush(adapter);
-			spin_unlock_irqrestore(&netdev->xmit_lock, flags);
-		}
-	}
-
 	/* Dynamic mode for Interrupt Throttle Rate (ITR) */
 	if(adapter->hw.mac_type >= e1000_82540 && adapter->itr == 1) {
 		/* Symmetric Tx/Rx gets a reduced ITR=2000; Total

-scott


* Re: Fw: Badness in local_bh_enable at kernel/softirq.c:119
  2003-09-30 17:27 Feldman, Scott
@ 2003-10-01  6:51 ` David S. Miller
  0 siblings, 0 replies; 8+ messages in thread
From: David S. Miller @ 2003-10-01  6:51 UTC
  To: Feldman, Scott; +Cc: jgarzik, akpm, netdev, cramerj

On Tue, 30 Sep 2003 10:27:08 -0700
"Feldman, Scott" <scott.feldman@intel.com> wrote:

> At this point, I'm leaning towards removing the offending code in the
> timer callback now, and taking a step back to solve the bigger problem,
> either with a better locking scheme, or a new plan on how to flush the
> "stuck" work.  We don't need kernel panics when you trip over the
> Ethernet cable!  Sound like a plan?

Why do you even need to use IRQ locking here?

Your e1000 netdev->hard_start_xmit method doesn't need to do anything
special, why does this timer code?  I suppose you need to synchronize
with e1000_clean_tx_irq() in the non-NAPI case right?  If so, that's
not being accomplished by what your code is doing.  If nobody else
takes that xmit_lock in an IRQ disabling manner, the e1000 timer code
doing so doesn't make any difference.

I have an idea for attacking the problem, once you figure out what
kind of locking you really need.  Do whatever you need to do to
synchronize on the hardware side, but instead of directly freeing
the SKB, add each one to a list.  A pointer to the head of this list
is stored on the stack of the timer routine, and passed down into
the TX purger.

Then at the top level you can drop all your locks, re-enable hw IRQs
and whatever else you need to do, then pass the SKBs in the list off
to dev_kfree_skb_irq() (this is the appropriate routine to call to
free an SKB from a timer handler, which runs in soft interrupt
context).
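
A minimal userspace sketch of the deferred-free pattern described above,
written in C for illustration. Everything here is a hypothetical model:
struct fake_skb, struct fake_ring, purge_tx(), timer_routine() and
fake_free_skb() are made-up names, the "locked" flag stands in for whatever
locks and hw IRQ masking the driver really needs, and fake_free_skb() stands
in for dev_kfree_skb_irq().

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 8

/* Hypothetical stand-in for struct sk_buff. */
struct fake_skb {
	struct fake_skb *next;	/* link used only for the deferred list */
	int freed;
};

/* Hypothetical Tx ring; "locked" models locks held / hw IRQs off. */
struct fake_ring {
	struct fake_skb *slots[RING_SLOTS];
	int locked;
};

/* Counts frees that happened while the ring was still locked. */
static int freed_while_locked;

/* Stand-in for dev_kfree_skb_irq(). */
static void fake_free_skb(struct fake_ring *r, struct fake_skb *skb)
{
	if (r->locked)
		freed_while_locked++;
	skb->freed = 1;
}

/* TX purger: detach each buffer and chain it on *head, never free here. */
static void purge_tx(struct fake_ring *r, struct fake_skb **head)
{
	size_t i;

	assert(r->locked);
	for (i = 0; i < RING_SLOTS; i++) {
		struct fake_skb *skb = r->slots[i];

		if (!skb)
			continue;
		r->slots[i] = NULL;
		skb->next = *head;
		*head = skb;
	}
}

/* Timer routine: the list head lives on this function's stack. */
static int timer_routine(struct fake_ring *r)
{
	struct fake_skb *head = NULL;
	int n = 0;

	r->locked = 1;		/* take locks, mask hw IRQs */
	purge_tx(r, &head);
	r->locked = 0;		/* drop locks, re-enable hw IRQs */

	while (head) {		/* only now hand the buffers to the free routine */
		struct fake_skb *skb = head;

		head = skb->next;
		fake_free_skb(r, skb);
		n++;
	}
	return n;
}
```

The point of the split is that nothing is freed inside the locked region;
the model does not address how to make sure hardware is done with a buffer
before it is detached.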


* Fw: Badness in local_bh_enable at kernel/softirq.c:119
@ 2003-09-29 21:36 Andrew Morton
  2003-09-30  5:49 ` David S. Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2003-09-29 21:36 UTC
  To: netdev; +Cc: cramerj, scott.feldman



Badness in local_bh_enable at kernel/softirq.c:119
Call Trace:
 [<c01253d7>] local_bh_enable+0x93/0x96
 [<c032d0fb>] xprt_write_space+0xfb/0x158
 [<c02cf42a>] sock_wfree+0x48/0x4a
 [<c02cf3e2>] sock_wfree+0x0/0x4a
 [<c02d02df>] __kfree_skb+0x49/0xda
 [<c020bcc0>] __delay+0x14/0x18
 [<c02752a0>] e1000_clean_tx_irq+0x1f0/0x1f6
 [<c0273a07>] e1000_tx_flush+0x69/0xd0
 [<c0273d08>] e1000_watchdog+0xba/0x340
 [<c011be80>] scheduler_tick+0x5a6/0x5ac
 [<c0273c4e>] e1000_watchdog+0x0/0x340
 [<c0129846>] run_timer_softirq+0xe8/0x1cc
 [<c0116903>] smp_apic_timer_interrupt+0x147/0x14c
 [<c0125341>] do_softirq+0xc9/0xcc
 [<c01253ac>] local_bh_enable+0x68/0x96
 [<c02e63a2>] rt_run_flush+0xa4/0xda
 [<c031f95b>] fib_netdev_event+0x57/0x8b
 [<c012ea23>] notifier_call_chain+0x27/0x40
 [<c02d3e6b>] netdev_state_change+0x37/0x52
 [<c02de65a>] linkwatch_run_queue+0xce/0xe2
 [<c02de694>] linkwatch_event+0x26/0x2c
 [<c0131504>] worker_thread+0x212/0x314
 [<c02de66e>] linkwatch_event+0x0/0x2c
 [<c011c598>] default_wake_function+0x0/0x2e
 [<c01094b6>] ret_from_fork+0x6/0x14
 [<c011c598>] default_wake_function+0x0/0x2e
 [<c01312f2>] worker_thread+0x0/0x314
 [<c010739d>] kernel_thread_helper+0x5/0xc

It happened while NFS was having trouble communicating with the server.



Due to this:

		spin_lock_irqsave(&netdev->xmit_lock, flags);
		e1000_tx_flush(adapter);
		spin_unlock_irqrestore(&netdev->xmit_lock, flags);

I'd have thought that calling kfree_skb() under xmit_lock would be a big
ranking bug..  But the reason why the kernel dropped this backtrace is that
local_bh_enable() will unconditionally enable interrupts, so this driver is
exposed to a deadlock.

Other parts of the kernel do not take xmit_lock with irq's disabled, so a
simple spin_lock() here may suffice.

Oh, it's taking xmit_lock in a timer handler.  I give up.
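
A toy userspace model of the call pattern in the trace, in C, may help show
why the warning fires. It assumes that the check behind "Badness in
local_bh_enable" trips when local_bh_enable() runs while IRQs are disabled,
and that freeing the skb reaches a BH disable/enable pair (the sock_wfree
and xprt_write_space frames above). All model_* names and the two flush_*
helpers are hypothetical; none of this is kernel code.

```c
#include <assert.h>

static int irqs_off;	/* models "hardware IRQs disabled on this CPU" */
static int bh_count;	/* models the bottom-half disable depth */
static int badness;	/* number of times the modelled check tripped */

static void model_local_bh_disable(void)
{
	bh_count++;
}

static void model_local_bh_enable(void)
{
	assert(bh_count > 0);
	bh_count--;
	if (irqs_off)		/* assumed condition behind the warning */
		badness++;
}

/* Freeing an skb whose destructor takes and drops a BH-disabling lock. */
static void model_kfree_skb(void)
{
	model_local_bh_disable();
	model_local_bh_enable();
}

/* What the timer did: free while holding an IRQ-disabling lock. */
static int flush_with_irqs_off(void)
{
	badness = 0;
	irqs_off = 1;		/* spin_lock_irqsave() */
	model_kfree_skb();
	irqs_off = 0;		/* spin_unlock_irqrestore() */
	return badness;
}

/* The same free with only bottom halves disabled around it. */
static int flush_with_bh_off(void)
{
	badness = 0;
	model_local_bh_disable();
	model_kfree_skb();
	model_local_bh_enable();
	return badness;
}
```

In the model, flush_with_irqs_off() trips the check once and
flush_with_bh_off() does not, which is only meant to illustrate the
difference between the two locking flavours.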


* Re: Fw: Badness in local_bh_enable at kernel/softirq.c:119
  2003-09-29 21:36 Andrew Morton
@ 2003-09-30  5:49 ` David S. Miller
  2003-09-30 11:53   ` David S. Miller
  0 siblings, 1 reply; 8+ messages in thread
From: David S. Miller @ 2003-09-30  5:49 UTC
  To: Andrew Morton; +Cc: netdev, cramerj, scott.feldman

On Mon, 29 Sep 2003 14:36:42 -0700
Andrew Morton <akpm@osdl.org> wrote:

> Oh, it's taking xmit_lock in a timer handler.  I give up.

I think e1000 should rethink what it is doing :-)


* Re: Fw: Badness in local_bh_enable at kernel/softirq.c:119
  2003-09-30  5:49 ` David S. Miller
@ 2003-09-30 11:53   ` David S. Miller
  0 siblings, 0 replies; 8+ messages in thread
From: David S. Miller @ 2003-09-30 11:53 UTC
  To: David S. Miller; +Cc: jgarzik, akpm, netdev, cramerj, scott.feldman

On Mon, 29 Sep 2003 22:49:29 -0700
"David S. Miller" <davem@redhat.com> wrote:

> On Mon, 29 Sep 2003 14:36:42 -0700
> Andrew Morton <akpm@osdl.org> wrote:
> 
> > Oh, it's taking xmit_lock in a timer handler.  I give up.
> 
> I think e1000 should rethink what it is doing :-)

Sorry, in case it isn't painfully obvious, instead of hinting
at it let me state explicitly that ->xmit_lock is a BH disabling
lock not an IRQ disabling one.

Therefore, e1000's IRQ disabling when grabbing that lock is
buggy and needs to be changed to BH disabling.

If it needs to disable IRQs for its own internal locking, it
needs to do so such that such IRQ disabling internal locks
are not held while kfree_skb() is being invoked.

Calling kfree_skb() with IRQs disabled in the e1000 driver is
the cause of this bug.

Jeff, if you pushed these e1000 updates that make it grab
->xmit_lock() with disabling IRQs instead of BH into 2.4.x
trees too, beware!

