netdev.vger.kernel.org archive mirror
* RE: Fw: Badness in local_bh_enable at kernel/softirq.c:119
@ 2003-10-01  8:19 Feldman, Scott
  2003-10-01  8:37 ` David S. Miller
  2003-10-01 14:40 ` Randy.Dunlap
  0 siblings, 2 replies; 8+ messages in thread
From: Feldman, Scott @ 2003-10-01  8:19 UTC
  To: David S. Miller; +Cc: jgarzik, akpm, netdev, cramerj

> Why do you even need to use IRQ locking here?
> 
> Your e1000 netdev->hard_start_xmit method doesn't need to do 
> anything special, why does this timer code?  I suppose you 
> need to synchronize with e1000_clean_tx_irq() in the non-NAPI 
> case right?  If so, that's not being accomplished by what 
> your code is doing.  If nobody else takes that xmit_lock in 
> an IRQ disabling manner, the e1000 timer code doing so 
> doesn't make any difference.
> 
> I have an idea for attacking the problem, once you figure out 
> what kind of locking you really need.  Do whatever you need 
> to do to synchronize on the hardware side, but instead of 
> directly freeing the SKB, add each one to a list.  A pointer 
> to the head of this list is stored on the stack of the timer 
> routine, and passed down into the TX purger.
> 
> Then at the top level you can drop all your locks, re-enable 
> hw IRQs and whatever else you need to do, then pass the SKBs 
> in the list off to dev_kfree_skb_irq() (this is the 
> appropriate routine to call to free an SKB from a timer 
> handler, which runs in soft interrupt context).
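
A minimal userspace sketch of the list idea quoted above, not e1000 code.
The purger chains buffers onto a list whose head lives on the caller's
stack, and nothing is freed until the lock has been dropped.  All names
here (fake_skb, fake_lock, fake_free_skb, purge_tx, timer_flush) are
hypothetical stand-ins; in the driver the final free would be
dev_kfree_skb_irq(), as the quoted text says.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff; only the link field matters. */
struct fake_skb {
	struct fake_skb *next;
	int id;
};

/* Hypothetical stand-ins for the driver lock and for bookkeeping. */
static int lock_held;
static int freed_count;
static int freed_under_lock;

static void fake_lock(void)   { lock_held = 1; }
static void fake_unlock(void) { lock_held = 0; }

/* Stand-in for the real free routine; records frees done under the lock. */
static void fake_free_skb(struct fake_skb *skb)
{
	if (lock_held)
		freed_under_lock++;	/* the case the thread wants to avoid */
	freed_count++;
	free(skb);
}

/* "TX purger": unhook buffers from the ring and chain them onto *head. */
static void purge_tx(struct fake_skb **ring, size_t n, struct fake_skb **head)
{
	for (size_t i = 0; i < n; i++) {
		struct fake_skb *skb = ring[i];

		if (!skb)
			continue;
		ring[i] = NULL;
		skb->next = *head;
		*head = skb;
	}
}

/* "Timer routine": the list head lives on this function's stack. */
static int timer_flush(struct fake_skb **ring, size_t n)
{
	struct fake_skb *head = NULL;
	int count = 0;

	fake_lock();
	purge_tx(ring, n, &head);	/* collect only, no freeing here */
	fake_unlock();

	/* All locks dropped: now it is safe to hand the buffers back. */
	while (head) {
		struct fake_skb *next = head->next;

		fake_free_skb(head);
		head = next;
		count++;
	}
	return count;
}
```

The point of the split is that purge_tx() never calls the free routine,
so whatever locking the hardware side ends up needing cannot be held
across a free.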

Chris can jump in here anytime.  :-)

Synchronizing on the hardware side is stumping me.  We have the list of
skbs you describe, but I'm concerned about unmapping the skb buffers if
hardware is right in the middle of some DMA on one of the buffers.
Some archs really don't like hardware accessing unmapped buffers.

Here's what I'm thinking: when link down is detected in the timer, just
trick hardware into thinking link is still up (ILOS - Invert Loss of
Signal).  No locking, no disabling of interrupts.  Hardware will do the
natural thing by completing the outstanding sends and also provide the
interrupts so we can clean/return skbs as normal (e1000_clean_tx_irq).
Something like:

<timer>
	if lost link
		if outstanding Tx work
			set ILOS		// h/w thinks link is up, DMA continues
			mdelay(10)
			clear ILOS		// h/w thinks link is down
			
The mdelay(10) is terrible, but we've already got that in the current
tx_flush routine.
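
To make the ILOS idea concrete, here is a toy userspace model, assuming
the hardware drains its pending sends only while it believes link is up.
It is a sketch under that assumption, not driver code: struct sketch_hw,
SKETCH_CTRL_ILOS (and its bit value), hw_run, sketch_delay and
watchdog_lost_link are all hypothetical, and sketch_delay stands in for
the mdelay(10).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit for this sketch; not taken from the driver headers. */
#define SKETCH_CTRL_ILOS 0x00000080u

/* Toy model of the adapter state the timer cares about. */
struct sketch_hw {
	uint32_t ctrl;		/* control register image */
	int link_up;		/* 1 if the wire really has link */
	unsigned tx_pending;	/* sends queued to hardware */
};

/* ILOS inverts the loss-of-signal input, so it flips what hardware sees. */
static int hw_sees_link(const struct sketch_hw *hw)
{
	int inverted = (hw->ctrl & SKETCH_CTRL_ILOS) != 0;

	return hw->link_up != inverted;
}

/* Toy DMA engine: completes pending sends only while it sees link. */
static void hw_run(struct sketch_hw *hw)
{
	if (hw_sees_link(hw))
		hw->tx_pending = 0;
}

/* Stand-in for mdelay(10): give the modelled hardware time to run. */
static void sketch_delay(struct sketch_hw *hw)
{
	hw_run(hw);
}

/* Timer path from the pseudocode: no locks, no interrupt disabling. */
static void watchdog_lost_link(struct sketch_hw *hw)
{
	if (hw->link_up)
		return;			/* link is fine, nothing to do */
	if (hw->tx_pending == 0)
		return;			/* no outstanding Tx work */

	hw->ctrl |= SKETCH_CTRL_ILOS;	/* h/w thinks link is up */
	sketch_delay(hw);
	hw->ctrl &= ~SKETCH_CTRL_ILOS;	/* h/w thinks link is down */
}
```

In the real driver the completed sends would be reaped by the normal
interrupt path (e1000_clean_tx_irq) rather than by the timer itself;
the model only shows the register toggle around the delay.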

Chris, what am I missing?  I left out the ANE business for
clarity.

-scott

* RE: Fw: Badness in local_bh_enable at kernel/softirq.c:119
@ 2003-09-30 17:27 Feldman, Scott
  2003-10-01  6:51 ` David S. Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Feldman, Scott @ 2003-09-30 17:27 UTC
  To: David S. Miller; +Cc: jgarzik, akpm, netdev, cramerj

> Sorry, in case it isn't painfully obvious, instead of hinting
> at it let me state explicitly that ->xmit_lock is a BH 
> disabling lock not an IRQ disabling one.
> 
> Therefore, e1000's IRQ disabling when grabbing that lock is
> buggy and needs to be changed to BH disabling.
> 
> If it needs to disable IRQs for its own internal locking, it
> needs to do so such that such IRQ disabling internal locks 
> are not held while kfree_skb() is being invoked.
> 
> Calling kfree_skb() with IRQs disabled in the e1000 driver is
> the cause of this bug.

Thanks David for your help.
 
> Jeff, if you pushed these e1000 updates that make it grab
> ->xmit_lock() with disabling IRQs instead of BH into 2.4.x
> trees too, beware!

The e1000 driver has been like this (broken) for quite a while.  Recent
updates haven't messed with this code.

This gets back to the problem of trying to flush any queued transmits
when we lose link.  The e1000 hardware stops DMA when link loss is
detected, so any work queued to hardware is "stuck", and therefore we
don't release the associated skb resources until we regain link.  This
causes problems when we're sitting under a failover setup like bonding
or ANS.

At this point, I'm leaning towards removing the offending code in the
timer callback now, and taking a step back to solve the bigger problem,
either with a better locking scheme, or a new plan on how to flush the
"stuck" work.  We don't need kernel panics when you trip over the
Ethernet cable!  Sound like a plan?

@@ -1278,41 +1278,6 @@
 		e1000_leave_82542_rst(adapter);
 }
 
-static void
-e1000_tx_flush(struct e1000_adapter *adapter)
-{
-	uint32_t ctrl, tctl, txcw, icr;
-
-	e1000_irq_disable(adapter);
-
-	if(adapter->hw.mac_type < e1000_82543) {
-		/* Transmit Unit Reset */
-		tctl = E1000_READ_REG(&adapter->hw, TCTL);
-		E1000_WRITE_REG(&adapter->hw, TCTL, tctl | E1000_TCTL_RST);
-		E1000_WRITE_REG(&adapter->hw, TCTL, tctl);
-		e1000_clean_tx_ring(adapter);
-		e1000_configure_tx(adapter);
-	} else {
-		txcw = E1000_READ_REG(&adapter->hw, TXCW);
-		E1000_WRITE_REG(&adapter->hw, TXCW, txcw & ~E1000_TXCW_ANE);
-
-		ctrl = E1000_READ_REG(&adapter->hw, CTRL);
-		E1000_WRITE_REG(&adapter->hw, CTRL, ctrl | E1000_CTRL_SLU |
-				E1000_CTRL_ILOS);
-
-		mdelay(10);
-
-		e1000_clean_tx_irq(adapter);
-		E1000_WRITE_REG(&adapter->hw, CTRL, ctrl);
-		E1000_WRITE_REG(&adapter->hw, TXCW, txcw);
-
-		/* clear the link status change interrupts this caused */
-		icr = E1000_READ_REG(&adapter->hw, ICR);
-	}
-
-	e1000_irq_enable(adapter);
-}
-
 /* need to wait a few seconds after link up to get diagnostic information from the phy */
 
 static void
@@ -1414,15 +1379,6 @@
 	e1000_update_stats(adapter);
 	e1000_update_adaptive(&adapter->hw);
 
-	if(!netif_carrier_ok(netdev)) {
-		if(E1000_DESC_UNUSED(txdr) + 1 < txdr->count) {
-			unsigned long flags;
-			spin_lock_irqsave(&netdev->xmit_lock, flags);
-			e1000_tx_flush(adapter);
-			spin_unlock_irqrestore(&netdev->xmit_lock, flags);
-		}
-	}
-
 	/* Dynamic mode for Interrupt Throttle Rate (ITR) */
 	if(adapter->hw.mac_type >= e1000_82540 && adapter->itr == 1) {
 		/* Symmetric Tx/Rx gets a reduced ITR=2000; Total

-scott

* Fw: Badness in local_bh_enable at kernel/softirq.c:119
@ 2003-09-29 21:36 Andrew Morton
  2003-09-30  5:49 ` David S. Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2003-09-29 21:36 UTC
  To: netdev; +Cc: cramerj, scott.feldman



Badness in local_bh_enable at kernel/softirq.c:119
Call Trace:
 [<c01253d7>] local_bh_enable+0x93/0x96
 [<c032d0fb>] xprt_write_space+0xfb/0x158
 [<c02cf42a>] sock_wfree+0x48/0x4a
 [<c02cf3e2>] sock_wfree+0x0/0x4a
 [<c02d02df>] __kfree_skb+0x49/0xda
 [<c020bcc0>] __delay+0x14/0x18
 [<c02752a0>] e1000_clean_tx_irq+0x1f0/0x1f6
 [<c0273a07>] e1000_tx_flush+0x69/0xd0
 [<c0273d08>] e1000_watchdog+0xba/0x340
 [<c011be80>] scheduler_tick+0x5a6/0x5ac
 [<c0273c4e>] e1000_watchdog+0x0/0x340
 [<c0129846>] run_timer_softirq+0xe8/0x1cc
 [<c0116903>] smp_apic_timer_interrupt+0x147/0x14c
 [<c0125341>] do_softirq+0xc9/0xcc
 [<c01253ac>] local_bh_enable+0x68/0x96
 [<c02e63a2>] rt_run_flush+0xa4/0xda
 [<c031f95b>] fib_netdev_event+0x57/0x8b
 [<c012ea23>] notifier_call_chain+0x27/0x40
 [<c02d3e6b>] netdev_state_change+0x37/0x52
 [<c02de65a>] linkwatch_run_queue+0xce/0xe2
 [<c02de694>] linkwatch_event+0x26/0x2c
 [<c0131504>] worker_thread+0x212/0x314
 [<c02de66e>] linkwatch_event+0x0/0x2c
 [<c011c598>] default_wake_function+0x0/0x2e
 [<c01094b6>] ret_from_fork+0x6/0x14
 [<c011c598>] default_wake_function+0x0/0x2e
 [<c01312f2>] worker_thread+0x0/0x314
 [<c010739d>] kernel_thread_helper+0x5/0xc

It happened while NFS was having trouble communicating with the server.



Due to this:

		spin_lock_irqsave(&netdev->xmit_lock, flags);
		e1000_tx_flush(adapter);
		spin_unlock_irqrestore(&netdev->xmit_lock, flags);

I'd have thought that calling kfree_skb() under xmit_lock would be a big
ranking bug.  But the reason why the kernel dropped this backtrace is that
local_bh_enable() will unconditionally enable interrupts, so this driver is
exposed to a deadlock.

Other parts of the kernel do not take xmit_lock with IRQs disabled, so a
simple spin_lock() here may suffice.

Oh, it's taking xmit_lock in a timer handler.  I give up.


end of thread, other threads:[~2003-10-01 14:40 UTC | newest]

Thread overview: 8+ messages
2003-10-01  8:19 Fw: Badness in local_bh_enable at kernel/softirq.c:119 Feldman, Scott
2003-10-01  8:37 ` David S. Miller
2003-10-01 14:40 ` Randy.Dunlap
  -- strict thread matches above, loose matches on Subject: below --
2003-09-30 17:27 Feldman, Scott
2003-10-01  6:51 ` David S. Miller
2003-09-29 21:36 Andrew Morton
2003-09-30  5:49 ` David S. Miller
2003-09-30 11:53   ` David S. Miller
