netdev.vger.kernel.org archive mirror
* LLTX and netif_stop_queue
@ 2004-12-17 21:57 Roland Dreier
  2004-12-18  5:44 ` David S. Miller
  0 siblings, 1 reply; 62+ messages in thread
From: Roland Dreier @ 2004-12-17 21:57 UTC (permalink / raw)
  To: netdev, davem; +Cc: openib-general

While testing my IP-over-InfiniBand driver, I discovered that if a net
device sets NETIF_F_LLTX, it seems the device's hard_start_xmit method
can be called even after a netif_stop_queue().

This is because in the LLTX case, qdisc_restart() holds no locks while
calling hard_start_xmit, so something like the following can happen:

    CPU 1                                CPU 2

    qdisc_restart:
      drop queue lock
      call hard_start_xmit()
        net driver:
          acquire TX lock
          queue packet to HW

                                        acquire queue lock...
                                        qdisc_restart:
                                          drop queue lock
                                          call hard_start_xmit:

          queue full, call netif_stop_queue()
          release TX lock

                                            net driver:
                                              acquire TX lock
                                              queue is already full!

Is my understanding correct?  If so, it seems the patch below would
make sense.  (e1000 seems to handle this properly already.)
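
For reference, the LLTX branch of qdisc_restart() looks roughly like the
sketch below (simplified and paraphrased from net/sched/sch_generic.c of
this era -- not the verbatim source).  With NETIF_F_LLTX set, no lock at
all is held across the queue-state check and the call into the driver:

    /* nolock is true when the device sets NETIF_F_LLTX */
    if (!nolock) {
        if (!spin_trylock(&dev->xmit_lock))
            goto collision;
        /* Remember that the driver is grabbed by us. */
        dev->xmit_lock_owner = smp_processor_id();
    }
    /* In the LLTX case, nothing at all is held past this point. */
    spin_unlock(&dev->queue_lock);

    if (!netif_queue_stopped(dev)) {
        /* Window: another CPU can run this same path, reach the
         * driver first and fill the TX ring before the call below
         * takes the driver's private tx_lock. */
        ret = dev->hard_start_xmit(skb, dev);
    }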

Thanks,
  Roland


Since tg3 and sungem now use lockless TX (NETIF_F_LLTX), it's possible
for their hard_start_xmit method to be called even after they call
netif_stop_queue.  Therefore a full queue no longer indicates a bug --
this patch fixes the comment and removes the KERN_ERR printk.

Signed-off-by: Roland Dreier <roland@topspin.com>

Index: linux-bk/drivers/net/sungem.c
===================================================================
--- linux-bk.orig/drivers/net/sungem.c	2004-12-16 15:56:19.000000000 -0800
+++ linux-bk/drivers/net/sungem.c	2004-12-17 13:46:43.307064457 -0800
@@ -976,12 +976,10 @@
 		return NETDEV_TX_LOCKED;
 	}
 
-	/* This is a hard error, log it. */
+	/* This may happen, since we have NETIF_F_LLTX set */
 	if (TX_BUFFS_AVAIL(gp) <= (skb_shinfo(skb)->nr_frags + 1)) {
 		netif_stop_queue(dev);
 		spin_unlock_irqrestore(&gp->tx_lock, flags);
-		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
-		       dev->name);
 		return NETDEV_TX_BUSY;
 	}
 
Index: linux-bk/drivers/net/tg3.c
===================================================================
--- linux-bk.orig/drivers/net/tg3.c	2004-12-16 15:56:06.000000000 -0800
+++ linux-bk/drivers/net/tg3.c	2004-12-17 13:46:25.952622672 -0800
@@ -3076,12 +3076,10 @@
 		return NETDEV_TX_LOCKED; 
 	} 
 
-	/* This is a hard error, log it. */
+	/* This may happen, since we have NETIF_F_LLTX set */
 	if (unlikely(TX_BUFFS_AVAIL(tp) <= (skb_shinfo(skb)->nr_frags + 1))) {
 		netif_stop_queue(dev);
 		spin_unlock_irqrestore(&tp->tx_lock, flags);
-		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
-		       dev->name);
 		return NETDEV_TX_BUSY;
 	}
 


* Re: LLTX and netif_stop_queue
  2004-12-17 21:57 LLTX and netif_stop_queue Roland Dreier
@ 2004-12-18  5:44 ` David S. Miller
  2004-12-18 15:35   ` Roland Dreier
  2004-12-19 19:31   ` jamal
  0 siblings, 2 replies; 62+ messages in thread
From: David S. Miller @ 2004-12-18  5:44 UTC (permalink / raw)
  To: Roland Dreier; +Cc: netdev, openib-general

On Fri, 17 Dec 2004 13:57:40 -0800
Roland Dreier <roland@topspin.com> wrote:

> While testing my IP-over-InfiniBand driver, I discovered that if a net
> device sets NETIF_F_LLTX, it seems the device's hard_start_xmit method
> can be called even after a netif_stop_queue().
> 
> This is because in the LLTX case, qdisc_restart() holds no locks while
> calling hard_start_xmit, so something like the following can happen:

 ...
>  	if (TX_BUFFS_AVAIL(gp) <= (skb_shinfo(skb)->nr_frags + 1)) {
>  		netif_stop_queue(dev);
>  		spin_unlock_irqrestore(&gp->tx_lock, flags);
> -		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
> -		       dev->name);
>  		return NETDEV_TX_BUSY;
>  	}

I understand the bug, but we're not going to fix it this way.
This is a crucial invariant that we need to check for because it
indicates a pretty serious state error except in this bug case
you've discovered.

Perhaps one way to fix this is to add a pointer to a spinlock to
the netdev struct, and have the upper level grab that when
NETIF_F_LLTX is set when doing queue state checks.  Actually, that
could end up being racy too.

If we can't find a good fix for this, short of removing the (necessary)
debugging message, we might have to pull NETIF_F_LLTX out or disable it
temporarily until we figure out a way.


* Re: LLTX and netif_stop_queue
  2004-12-18  5:44 ` David S. Miller
@ 2004-12-18 15:35   ` Roland Dreier
  2004-12-18 17:58     ` Roland Dreier
  2004-12-19 19:33     ` jamal
  2004-12-19 19:31   ` jamal
  1 sibling, 2 replies; 62+ messages in thread
From: Roland Dreier @ 2004-12-18 15:35 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, openib-general

    David> I understand the bug, but we're not going to fix it this
    David> way.  This is a crucial invariant that we need to check for
    David> because it indicates a pretty serious state error except in
    David> this bug case you've discovered.

OK, that's fair enough.  I was pretty bummed myself when I realized
hard_start_xmit was getting called even after I stopped the queue.
Thanks for confirming my analysis.

    David> Perhaps one way to fix this is to add a pointer to a
    David> spinlock to the netdev struct, and have the upper level
    David> grab that when NETIF_F_LLTX is set when doing queue state
    David> checks.  Actually, that could end up being racy too.

I may be missing something, but it seems to me that we get all of the
benefits of LLTX by just documenting that device drivers can use the
xmit_lock member of struct net_device to synchronize other parts of
the driver against hard_start_xmit.  I guess the driver also should
set xmit_lock_owner to -1 after it acquires xmit_lock.

This would mean that NETIF_F_LLTX can go away, and LLTX drivers would
just replace their use of their own private tx_lock with xmit_lock
(except in hard_start_xmit, where the upper layer already holds xmit_lock).

By the way, am I correct in thinking that the use of xmit_lock_owner
in qdisc_restart() is racy?

    if (!spin_trylock(&dev->xmit_lock)) {
    /* get the lock */

                                            if (!spin_trylock(&dev->xmit_lock)) {
                                            /* fail */
                                                if (dev->xmit_lock_owner == smp_processor_id()) {
                                                /* test the wrong value */

    /* set the value too late */
    dev->xmit_lock_owner = smp_processor_id();

I don't see a simple way to fix this unfortunately.

Thanks,
  Roland


* Re: Re: LLTX and netif_stop_queue
  2004-12-18 15:35   ` Roland Dreier
@ 2004-12-18 17:58     ` Roland Dreier
  2004-12-18 18:26       ` Roland Dreier
  2004-12-19 19:33     ` jamal
  1 sibling, 1 reply; 62+ messages in thread
From: Roland Dreier @ 2004-12-18 17:58 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, openib-general

    Roland> I may be missing something, but it seems to me that we get
    Roland> all of the benefits of LLTX by just documenting that
    Roland> device drivers can use the xmit_lock member of struct
    Roland> net_device to synchronize other parts of the driver
    Roland> against hard_start_xmit.  I guess the driver also should
    Roland> set xmit_lock_owner to -1 after it acquires xmit_lock.

Thinking about this a little more, I realize that there's no reason
for the driver to set xmit_lock_owner -- if the driver is able to
acquire the lock, then xmit_lock_owner will already be -1.

So it seems LLTX can be replaced by just having drivers use
net_device.xmit_lock instead of their own private tx_lock.  Assuming
this works (and I don't see anything wrong with it) then this seems
like a pretty nice solution: we remove some code from the networking
core and get rid of all the "trylock" logic in drivers' hard_start_xmit.
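
To illustrate (a hypothetical sketch using sungem's names, not a tested
patch), the entry of an LLTX driver's hard_start_xmit today is the
trylock dance:

    local_irq_save(flags);
    if (!spin_trylock(&gp->tx_lock)) {
        /* Tell the core to requeue the skb and retry. */
        local_irq_restore(flags);
        return NETDEV_TX_LOCKED;
    }

With the core holding dev->xmit_lock for us, all of that disappears, and
the driver's other paths (interrupt handler, multicast list upload) would
take dev->xmit_lock where they take gp->tx_lock today.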

 - Roland


* Re: Re: LLTX and netif_stop_queue
  2004-12-18 17:58     ` Roland Dreier
@ 2004-12-18 18:26       ` Roland Dreier
  0 siblings, 0 replies; 62+ messages in thread
From: Roland Dreier @ 2004-12-18 18:26 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, openib-general

    Roland> So it seems LLTX can be replaced by just having drivers
    Roland> use net_device.xmit_lock instead of their own private
    Roland> tx_lock.  Assuming this works (and I don't see anything
    Roland> wrong with it) then this seems like a pretty nice
    Roland> solution: we remove some code from the networking core and
    Roland> get rid of all the "trylock" logic in drivers'
    Roland> hard_start_xmit.

Actually trying it instead of talking out of my ass...

Just doing this naively without changing the net core can deadlock
because the net core acquires dev->xmit_lock without disabling
interrupts.  So if the driver tries to use xmit_lock in its interrupt
handler, it will deadlock if the interrupt occurred during
hard_start_xmit.  Even doing local_irq_save() in hard_start_xmit isn't
good enough, because there's still a window between the net core's
call to hard_start_xmit and the actual local_irq_save where xmit_lock
is held with interrupts on.
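
In trace form (a hypothetical driver whose interrupt handler also wants
xmit_lock):

    net core (BHs disabled, hard IRQs still on):
      spin_lock(&dev->xmit_lock);
      dev->hard_start_xmit(skb, dev);
        <hardware IRQ fires on this CPU>
        driver interrupt handler:
          spin_lock_irqsave(&dev->xmit_lock, flags);
          /* spins forever: the lock is held by the very context
           * we just interrupted -- deadlock */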

Maybe it makes sense to change NETIF_F_LLTX to NETIF_F_TX_IRQ_DIS or
something like that and have that flag mean "disable interrupts when
calling hard_start_xmit."  (We could just do this unconditionally but
I'm not sure if any drivers rely on having interrupts enabled during
hard_start_xmit and I'm worried about making a change in semantics
like that -- not to mention some drivers may not need interrupts
disabled and may not want the cost).

 - Roland


* Re: LLTX and netif_stop_queue
  2004-12-18  5:44 ` David S. Miller
  2004-12-18 15:35   ` Roland Dreier
@ 2004-12-19 19:31   ` jamal
  2004-12-19 19:54     ` jamal
                       ` (2 more replies)
  1 sibling, 3 replies; 62+ messages in thread
From: jamal @ 2004-12-19 19:31 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, openib-general

On Sat, 2004-12-18 at 00:44, David S. Miller wrote:

> Perhaps one way to fix this is to add a pointer to a spinlock to
> the netdev struct, and have the upper level grab that when
> NETIF_F_LLTX is set when doing queue state checks.  Actually, that
> could end up being racy too.

How about releasing the qlock only when the LLTX transmit lock is
grabbed? That should bring it to par with what it was originally.

cheers,
jamal


* Re: LLTX and netif_stop_queue
  2004-12-18 15:35   ` Roland Dreier
  2004-12-18 17:58     ` Roland Dreier
@ 2004-12-19 19:33     ` jamal
  1 sibling, 0 replies; 62+ messages in thread
From: jamal @ 2004-12-19 19:33 UTC (permalink / raw)
  To: Roland Dreier; +Cc: netdev, David S. Miller, openib-general

On Sat, 2004-12-18 at 10:35, Roland Dreier wrote:

> By the way, am I correct in thinking that the use of xmit_lock_owner
> in qdisc_restart() is racy?

No.

>     if (!spin_trylock(&dev->xmit_lock)) {
>     /* get the lock */
> 
>                                             if (!spin_trylock(&dev->xmit_lock)) {
>                                             /* fail */
>                                                 if (dev->xmit_lock_owner == smp_processor_id()) {
>                                                 /* test the wrong value */
> 
>     /* set the value too late */
>     dev->xmit_lock_owner = smp_processor_id();
> 

The setting is protected by the queue lock. So no race.

cheers,
jamal


* Re: LLTX and netif_stop_queue
  2004-12-19 19:31   ` jamal
@ 2004-12-19 19:54     ` jamal
  2004-12-19 20:02       ` Jamal Hadi Salim
  2004-12-19 22:35     ` Roland Dreier
  2004-12-22 18:49     ` Eric Lemoine
  2 siblings, 1 reply; 62+ messages in thread
From: jamal @ 2004-12-19 19:54 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, openib-general


On Sun, 2004-12-19 at 14:31, jamal wrote:

> How about releasing the qlock only when the LLTX transmit lock is
> grabbed? That should bring it to par with what it was originally.

Something like the two attached patches... one showing how to do it for
e1000.  Untested (not even compiled).

cheers,
jamal

[-- Attachment #2: p1 --]

--- a/net/sched/bak.sch_generic.c	2004-12-19 13:46:19.799356432 -0500
+++ b/net/sched/sch_generic.c	2004-12-19 13:49:14.384815408 -0500
@@ -128,12 +128,11 @@
 			}
 			/* Remember that the driver is grabbed by us. */
 			dev->xmit_lock_owner = smp_processor_id();
+			/* And release queue */
+			spin_unlock(&dev->queue_lock);
 		}
 		
 		{
-			/* And release queue */
-			spin_unlock(&dev->queue_lock);
-
 			if (!netif_queue_stopped(dev)) {
 				int ret;
 				if (netdev_nit)
@@ -141,15 +140,14 @@
 
 				ret = dev->hard_start_xmit(skb, dev);
 				if (ret == NETDEV_TX_OK) { 
+					dev->xmit_lock_owner = -1;
 					if (!nolock) {
-						dev->xmit_lock_owner = -1;
 						spin_unlock(&dev->xmit_lock);
 					}
 					spin_lock(&dev->queue_lock);
 					return -1;
 				}
 				if (ret == NETDEV_TX_LOCKED && nolock) {
-					spin_lock(&dev->queue_lock);
 					goto collision; 
 				}
 			}

[-- Attachment #3: p2 --]

--- a/drivers/net/e1000/b.e1000_main.c	2004-12-19 13:50:59.481838232 -0500
+++ b/drivers/net/e1000/e1000_main.c	2004-12-19 13:53:34.326298296 -0500
@@ -1809,6 +1809,10 @@
  		return NETDEV_TX_LOCKED; 
  	} 
 
+	/* new from sch_generic for LLTX */
+	spin_unlock(&dev->queue_lock);
+	dev->xmit_lock_owner = smp_processor_id();
+
 	/* need: count + 2 desc gap to keep tail from touching
 	 * head, otherwise try next time */
 	if(E1000_DESC_UNUSED(&adapter->tx_ring) < count + 2) {



* Re: LLTX and netif_stop_queue
  2004-12-19 19:54     ` jamal
@ 2004-12-19 20:02       ` Jamal Hadi Salim
  0 siblings, 0 replies; 62+ messages in thread
From: Jamal Hadi Salim @ 2004-12-19 20:02 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, openib-general



On Sun, 2004-12-19 at 14:54, jamal wrote:
> On Sun, 2004-12-19 at 14:31, jamal wrote:
> 
> > How about releasing the qlock only when the LLTX transmit lock is
> > grabbed? That should bring it to par with what it was originally.
> 
> Something like the two attached patches... one showing how to do it for
> e1000.  Untested (not even compiled).

After attempting to compile, p2 take2.

cheers,
jamal

[-- Attachment #2: p2-take2 --]

--- 2610-rc3-bk12/drivers/net/e1000/bak.e1000_main.c	2004-12-19 13:50:59.000000000 -0500
+++ 2610-rc3-bk12/drivers/net/e1000/e1000_main.c	2004-12-19 13:58:17.000000000 -0500
@@ -1809,6 +1809,10 @@
  		return NETDEV_TX_LOCKED; 
  	} 
 
+	/* new from sch_generic for LLTX */
+	spin_unlock(&netdev->queue_lock);
+	netdev->xmit_lock_owner = smp_processor_id();
+
 	/* need: count + 2 desc gap to keep tail from touching
 	 * head, otherwise try next time */
 	if(E1000_DESC_UNUSED(&adapter->tx_ring) < count + 2) {



* Re: LLTX and netif_stop_queue
  2004-12-19 19:31   ` jamal
  2004-12-19 19:54     ` jamal
@ 2004-12-19 22:35     ` Roland Dreier
  2004-12-19 23:06       ` jamal
  2004-12-22 18:49     ` Eric Lemoine
  2 siblings, 1 reply; 62+ messages in thread
From: Roland Dreier @ 2004-12-19 22:35 UTC (permalink / raw)
  To: hadi; +Cc: netdev, David S. Miller, openib-general

    jamal> How about releasing the qlock only when the LLTX transmit
    jamal> lock is grabbed? That should bring it to par with what it
    jamal> was originally.

This seems a little risky.  I can't point to a specific deadlock but
it doesn't seem right on general principles to unlock in a different
order than you nested the locks when acquiring them -- if I understand
correctly, you're suggesting lock(queue_lock), lock(tx_lock),
unlock(queue_lock), unlock(tx_lock).

Thanks,
  Roland


* Re: LLTX and netif_stop_queue
  2004-12-19 22:35     ` Roland Dreier
@ 2004-12-19 23:06       ` jamal
  2004-12-19 23:16         ` Roland Dreier
  0 siblings, 1 reply; 62+ messages in thread
From: jamal @ 2004-12-19 23:06 UTC (permalink / raw)
  To: Roland Dreier; +Cc: netdev, David S. Miller, openib-general

On Sun, 2004-12-19 at 17:35, Roland Dreier wrote:
>     jamal> How about releasing the qlock only when the LLTX transmit
>     jamal> lock is grabbed? That should bring it to par with what it
>     jamal> was originally.
> 
> This seems a little risky.  I can't point to a specific deadlock but
> it doesn't seem right on general principles to unlock in a different
> order than you nested the locks when acquiring them -- if I understand
> correctly, you're suggesting lock(queue_lock), lock(tx_lock),
> unlock(queue_lock), unlock(tx_lock).

There is no deadlock.  That's exactly how things work.  Try the patches
I posted.

cheers,
jamal


* Re: LLTX and netif_stop_queue
  2004-12-19 23:06       ` jamal
@ 2004-12-19 23:16         ` Roland Dreier
  0 siblings, 0 replies; 62+ messages in thread
From: Roland Dreier @ 2004-12-19 23:16 UTC (permalink / raw)
  To: hadi; +Cc: netdev, David S. Miller, openib-general

    Roland> This seems a little risky.  I can't point to a specific
    Roland> deadlock but it doesn't seem right on general principles
    Roland> to unlock in a different order than you nested the locks
    Roland> when acquiring them -- if I understand correctly, you're
    Roland> suggesting lock(queue_lock), lock(tx_lock),
    Roland> unlock(queue_lock), unlock(tx_lock).

    jamal> There is no deadlock.  That's exactly how things work.  Try
    jamal> the patches I posted.

Thinking about it more, I guess it's OK.  I was just thinking about the
basic general rule that locks need to be acquired/released in LIFO
order to avoid deadlock (e.g. lock(A), lock(B), unlock(B), unlock(A)).

However, in this case unlocking queue_lock after acquiring the driver's
tx_lock seems to be OK because the driver does a trylock on tx_lock in
the xmit path, so it can't deadlock.  At worst the trylock will just
fail to get the lock.
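
Roughly, the combined sequence with Jamal's patch becomes (a sketch,
using sungem's gp->tx_lock to stand in for the driver's private lock):

    spin_lock(&dev->queue_lock);
    /* ... dequeue skb from the qdisc ... */
    if (spin_trylock(&gp->tx_lock)) {
        /* queue_lock is released only once the driver's lock is
         * held, so no other CPU can overtake us into the NIC. */
        spin_unlock(&dev->queue_lock);
        /* ... queue the packet to the hardware ... */
        spin_unlock(&gp->tx_lock);
        spin_lock(&dev->queue_lock);    /* back for qdisc work */
    }

The trylock is what makes the non-LIFO release order safe.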

Thanks,
  Roland


* Re: LLTX and netif_stop_queue
  2004-12-19 19:31   ` jamal
  2004-12-19 19:54     ` jamal
  2004-12-19 22:35     ` Roland Dreier
@ 2004-12-22 18:49     ` Eric Lemoine
  2004-12-23  4:29       ` David S. Miller
  2 siblings, 1 reply; 62+ messages in thread
From: Eric Lemoine @ 2004-12-22 18:49 UTC (permalink / raw)
  To: hadi; +Cc: netdev, David S. Miller, openib-general


On 19 Dec 2004 14:31:15 -0500, jamal <hadi@cyberus.ca> wrote:
> On Sat, 2004-12-18 at 00:44, David S. Miller wrote:
> 
> > Perhaps one way to fix this is to add a pointer to a spinlock to
> > the netdev struct, and have the upper level grab that when
> > NETIF_F_LLTX is set when doing queue state checks.  Actually, that
> > could end up being racy too.
> 
> How about releasing the qlock only when the LLTX transmit lock is
> grabbed? That should bring it to par with what it was originally.

I don't like the idea of releasing inside the driver a lock taken
outside.  That might be just me...

Instead, I would suggest having LLTX drivers check whether the queue is
stopped after they grab their private tx lock and before they check tx
ring fullness.  That way we close the race window but keep the driver
bug check around.

See attached sungem patch.

-- 
Eric

[-- Attachment #2: sungem-lltx.patch --]

===== drivers/net/sungem.c 1.71 vs edited =====
--- 1.71/drivers/net/sungem.c	2004-11-14 10:45:36 +01:00
+++ edited/drivers/net/sungem.c	2004-12-22 19:34:07 +01:00
@@ -976,6 +976,12 @@
 		return NETDEV_TX_LOCKED;
 	}
 
+	/* This handles a LLTX-related race condition */
+	if (netif_queue_stopped(dev)) {
+		spin_unlock_irqrestore(&gp->tx_lock, flags);
+		return NETDEV_TX_BUSY;
+	}
+
 	/* This is a hard error, log it. */
 	if (TX_BUFFS_AVAIL(gp) <= (skb_shinfo(skb)->nr_frags + 1)) {
 		netif_stop_queue(dev);



* Re: LLTX and netif_stop_queue
  2004-12-22 18:49     ` Eric Lemoine
@ 2004-12-23  4:29       ` David S. Miller
  2004-12-23  4:38         ` Roland Dreier
  2004-12-23  9:10         ` Eric Lemoine
  0 siblings, 2 replies; 62+ messages in thread
From: David S. Miller @ 2004-12-23  4:29 UTC (permalink / raw)
  To: Eric Lemoine; +Cc: netdev, hadi, openib-general

On Wed, 22 Dec 2004 19:49:48 +0100
Eric Lemoine <eric.lemoine@gmail.com> wrote:

> Instead, I would suggest having LLTX drivers check whether the queue is
> stopped after they grab their private tx lock and before they check tx
> ring fullness.  That way we close the race window but keep the driver
> bug check around.
> 
> See attached sungem patch.

That sounds about right.  Nice idea.  It solves the race, and retains
the error state check.

I'll apply Eric's patch, and do something similar in the other LLTX
drivers (except loopback, which has no "queue" per se and so doesn't need
this stuff).


* Re: LLTX and netif_stop_queue
  2004-12-23  4:29       ` David S. Miller
@ 2004-12-23  4:38         ` Roland Dreier
  2004-12-23  9:10         ` Eric Lemoine
  1 sibling, 0 replies; 62+ messages in thread
From: Roland Dreier @ 2004-12-23  4:38 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eric Lemoine, hadi, openib-general

    David> That sounds about right.  Nice idea.  It solves the race,
    David> and retains the error state check.

Great, I've made a similar change to the IP-over-IB driver.

Thanks,
  Roland


* Re: LLTX and netif_stop_queue
  2004-12-23  4:29       ` David S. Miller
  2004-12-23  4:38         ` Roland Dreier
@ 2004-12-23  9:10         ` Eric Lemoine
  2004-12-23 16:37           ` Patrick McHardy
  1 sibling, 1 reply; 62+ messages in thread
From: Eric Lemoine @ 2004-12-23  9:10 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, hadi, openib-general

On Wed, 22 Dec 2004 20:29:19 -0800, David S. Miller <davem@davemloft.net> wrote:
> On Wed, 22 Dec 2004 19:49:48 +0100
> Eric Lemoine <eric.lemoine@gmail.com> wrote:
> 
> > Instead, I would suggest having LLTX drivers check whether the queue is
> > stopped after they grab their private tx lock and before they check tx
> > ring fullness.  That way we close the race window but keep the driver
> > bug check around.
> >
> > See attached sungem patch.
> 
> That sounds about right.  Nice idea.  It solves the race, and retains
> the error state check.
> 
> I'll apply Eric's patch, and do something similar in the other LLTX
> drivers (except loopback, which has no "queue" per se and so doesn't need
> this stuff).

Dave,

I still have one concern with the LLTX code (and it may be that the
correct patch is Jamal's):

Without LLTX we do: lock(queue_lock), lock(xmit_lock),
release(queue_lock), release(xmit_lock).  With LLTX (without Jamal's
patch) we do: lock(queue_lock), release(queue_lock), lock(tx_lock),
release(tx_lock).  LLTX doesn't look correct because it creates a race
condition window between the two lock-protected sections.  So you
may want to reconsider Jamal's patch or pull out LLTX...

Thanks, 

-- 
Eric


* Re: LLTX and netif_stop_queue
  2004-12-23  9:10         ` Eric Lemoine
@ 2004-12-23 16:37           ` Patrick McHardy
  2004-12-23 18:11             ` Eric Lemoine
  2004-12-24 16:10             ` Eric Lemoine
  0 siblings, 2 replies; 62+ messages in thread
From: Patrick McHardy @ 2004-12-23 16:37 UTC (permalink / raw)
  To: Eric Lemoine; +Cc: netdev, hadi, David S. Miller, openib-general

Eric Lemoine wrote:
> I still have one concern with the LLTX code (and it may be that the
> correct patch is Jamal's):
> 
> Without LLTX we do: lock(queue_lock), lock(xmit_lock),
> release(queue_lock), release(xmit_lock).  With LLTX (without Jamal's
> patch) we do: lock(queue_lock), release(queue_lock), lock(tx_lock),
> release(tx_lock).  LLTX doesn't look correct because it creates a race
> condition window between the two lock-protected sections.  So you
> may want to reconsider Jamal's patch or pull out LLTX...

You're right, it can cause packet reordering if something like this
happens:

CPU1			CPU2
lock(queue_lock)	
dequeue
unlock(queue_lock)
			lock(queue_lock)
			dequeue
			unlock(queue_lock)
			lock(xmit_lock)
			hard_start_xmit
			unlock(xmit_lock)
lock(xmit_lock)
hard_start_xmit
unlock(xmit_lock)

Jamal's patch should fix this.

Regards
Patrick


* Re: LLTX and netif_stop_queue
  2004-12-23 16:37           ` Patrick McHardy
@ 2004-12-23 18:11             ` Eric Lemoine
  2004-12-24 16:10             ` Eric Lemoine
  1 sibling, 0 replies; 62+ messages in thread
From: Eric Lemoine @ 2004-12-23 18:11 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, hadi, David S. Miller, openib-general

On Thu, 23 Dec 2004 17:37:24 +0100, Patrick McHardy <kaber@trash.net> wrote:
> Eric Lemoine wrote:
> > I still have one concern with the LLTX code (and it may be that the
> > correct patch is Jamal's):
> >
> > Without LLTX we do: lock(queue_lock), lock(xmit_lock),
> > release(queue_lock), release(xmit_lock).  With LLTX (without Jamal's
> > patch) we do: lock(queue_lock), release(queue_lock), lock(tx_lock),
> > release(tx_lock).  LLTX doesn't look correct because it creates a race
> > condition window between the two lock-protected sections.  So you
> > may want to reconsider Jamal's patch or pull out LLTX...
> 
> You're right, it can cause packet reordering 

That's exactly what I was thinking of.

> if something like this
> happens:
> 
> CPU1                    CPU2
> lock(queue_lock)
> dequeue
> unlock(queue_lock)
>                         lock(queue_lock)
>                         dequeue
>                         unlock(queue_lock)
>                         lock(xmit_lock)
>                         hard_start_xmit
>                         unlock(xmit_lock)
> lock(xmit_lock)
> hard_start_xmit
> unlock(xmit_lock)
> 
> Jamal's patch should fix this.



-- 
Eric


* Re: LLTX and netif_stop_queue
  2004-12-23 16:37           ` Patrick McHardy
  2004-12-23 18:11             ` Eric Lemoine
@ 2004-12-24 16:10             ` Eric Lemoine
  2004-12-28 13:31               ` jamal
  1 sibling, 1 reply; 62+ messages in thread
From: Eric Lemoine @ 2004-12-24 16:10 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, hadi, David S. Miller, openib-general

On Thu, 23 Dec 2004 17:37:24 +0100, Patrick McHardy <kaber@trash.net> wrote:
> Eric Lemoine wrote:
> > I still have one concern with the LLTX code (and it may be that the
> > correct patch is Jamal's):
> >
> > Without LLTX we do: lock(queue_lock), lock(xmit_lock),
> > release(queue_lock), release(xmit_lock).  With LLTX (without Jamal's
> > patch) we do: lock(queue_lock), release(queue_lock), lock(tx_lock),
> > release(tx_lock).  LLTX doesn't look correct because it creates a race
> > condition window between the two lock-protected sections.  So you
> > may want to reconsider Jamal's patch or pull out LLTX...
> 
> You're right, it can cause packet reordering if something like this
> happens:
> 
> CPU1                    CPU2
> lock(queue_lock)
> dequeue
> unlock(queue_lock)
>                         lock(queue_lock)
>                         dequeue
>                         unlock(queue_lock)
>                         lock(xmit_lock)
>                         hard_start_xmit
>                         unlock(xmit_lock)
> lock(xmit_lock)
> hard_start_xmit
> unlock(xmit_lock)
> 
> Jamal's patch should fix this.

Yes but requiring drivers to release a lock that they should not even
be aware of doesn't sound good. Another way would be to keep
dev->queue_lock grabbed when entering start_xmit() and let the driver
drop it (and re-acquire it before it returns) only if it wishes so.
Although I don't like this too much either, that's the best way I can
think of up to now...

-- 
Eric


* Re: LLTX and netif_stop_queue
  2004-12-24 16:10             ` Eric Lemoine
@ 2004-12-28 13:31               ` jamal
  2005-01-02 23:30                 ` Eric Lemoine
  0 siblings, 1 reply; 62+ messages in thread
From: jamal @ 2004-12-28 13:31 UTC (permalink / raw)
  To: Eric Lemoine; +Cc: netdev, Patrick McHardy, openib-general, David S. Miller

On Fri, 2004-12-24 at 11:10, Eric Lemoine wrote:

> Yes but requiring drivers to release a lock that they should not even
> be aware of doesn't sound good. Another way would be to keep
> dev->queue_lock grabbed when entering start_xmit() and let the driver
> drop it (and re-acquire it before it returns) only if it wishes so.
> Although I don't like this too much either, that's the best way I can
> think of up to now...

I am not a big fan of that patch either, but I can't think of a cleaner
way to do it.
The violation already happens with the LLTX flag.  So maybe a big warning
that says "Do this only if your driver is LLTX enabled".  The other way to
do it is to put a check to see if LLTX is enabled before releasing that
lock -- but why the extra cycles?  The driver writer should know.
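
For what it's worth, the "extra cycles" variant would just be something
like this at the top of the driver's start_xmit (hypothetical, not from
a posted patch):

    /* Only drop the core's queue lock if we really are LLTX. */
    if (dev->features & NETIF_F_LLTX)
        spin_unlock(&dev->queue_lock);

i.e. one extra test per packet that an LLTX-aware driver writer
shouldn't need.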

cheers,
jamal


* Re: LLTX and netif_stop_queue
  2004-12-28 13:31               ` jamal
@ 2005-01-02 23:30                 ` Eric Lemoine
  2005-01-03  7:41                   ` Eric Lemoine
  2005-01-03 15:04                   ` jamal
  0 siblings, 2 replies; 62+ messages in thread
From: Eric Lemoine @ 2005-01-02 23:30 UTC (permalink / raw)
  To: hadi; +Cc: netdev, Patrick McHardy, openib-general, David S. Miller


On 28 Dec 2004 08:31:57 -0500, jamal <hadi@cyberus.ca> wrote:
> On Fri, 2004-12-24 at 11:10, Eric Lemoine wrote:
> 
> > Yes but requiring drivers to release a lock that they should not even
> > be aware of doesn't sound good. Another way would be to keep
> > dev->queue_lock grabbed when entering start_xmit() and let the driver
> > drop it (and re-acquire it before it returns) only if it wishes so.
> > Although I don't like this too much either, that's the best way I can
> > think of up to now...
> 
> I am not a big fan of that patch either, but I can't think of a cleaner
> way to do it.
> The violation already happens with the LLTX flag.  So maybe a big warning
> that says "Do this only if your driver is LLTX enabled".  The other way to
> do it is to put a check to see if LLTX is enabled before releasing that
> lock -- but why the extra cycles?  The driver writer should know.

Two (untested) patches implementing what I described above.

The first patch (sch_generic.patch) keeps queue_lock held in
qdisc_restart() when calling hard_start_xmit() of a LLTX driver. The
second (sungem.patch) makes sungem release queue_lock after grabbing
its private tx lock.

Note that the modifications made to qdisc_restart() are not compatible
with the current LLTX drivers. All LLTX drivers must be modified along
sungem.patch's lines. Take sungem.patch as an example of how things
must be done.

Patches apply to current davem bk snapshot.

-- 
Eric

[-- Attachment #2: sch_generic.patch --]

===== net/sched/sch_generic.c 1.32 vs edited =====
--- 1.32/net/sched/sch_generic.c	2004-12-28 05:51:20 +01:00
+++ edited/net/sched/sch_generic.c	2005-01-02 23:55:18 +01:00
@@ -104,9 +104,17 @@
 		 * locking again. These checks are worth it because
 		 * even uncongested locks can be quite expensive.
 		 * The driver can do trylock like here too, in case
-		 * of lock congestion it should return -1 and the packet
-		 * will be requeued.
+		 * of lock congestion it should return NETDEV_TX_LOCKED
+		 * and the packet will be requeued.
+		 * 
+		 * Note: when the driver has LLTX set queue_lock cannot
+		 * be dropped in here if we want to avoid having packets
+		 * passing each other during transmit. queue_lock can only
+		 * be dropped in hard_start_xmit() after private tx lock 
+		 * has been grabbed. If hard_start_xmit() does release 
+		 * queue_lock it must grab it again before it returns.
 		 */
+
 		if (!nolock) {
 			if (!spin_trylock(&dev->xmit_lock)) {
 			collision:
@@ -128,12 +136,12 @@
 			}
 			/* Remember that the driver is grabbed by us. */
 			dev->xmit_lock_owner = smp_processor_id();
+
+			/* And release queue */
+			spin_unlock(&dev->queue_lock);
 		}
 		
 		{
-			/* And release queue */
-			spin_unlock(&dev->queue_lock);
-
 			if (!netif_queue_stopped(dev)) {
 				int ret;
 				if (netdev_nit)
@@ -144,14 +152,12 @@
 					if (!nolock) {
 						dev->xmit_lock_owner = -1;
 						spin_unlock(&dev->xmit_lock);
+						spin_lock(&dev->queue_lock);
 					}
-					spin_lock(&dev->queue_lock);
 					return -1;
 				}
-				if (ret == NETDEV_TX_LOCKED && nolock) {
-					spin_lock(&dev->queue_lock);
+				if (ret == NETDEV_TX_LOCKED && nolock)
 					goto collision; 
-				}
 			}
 
 			/* NETDEV_TX_BUSY - we need to requeue */
@@ -159,8 +165,8 @@
 			if (!nolock) { 
 				dev->xmit_lock_owner = -1;
 				spin_unlock(&dev->xmit_lock);
+				spin_lock(&dev->queue_lock);
 			} 
-			spin_lock(&dev->queue_lock);
 			q = dev->qdisc;
 		}
 

[-- Attachment #3: sungem.patch --]

===== drivers/net/sungem.c 1.71 vs edited =====
--- 1.71/drivers/net/sungem.c	2004-11-14 10:45:36 +01:00
+++ edited/drivers/net/sungem.c	2005-01-03 00:00:22 +01:00
@@ -950,6 +950,7 @@
 	return 0;
 }
 
+/* Called with dev->queue_lock held */
 static int gem_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct gem *gp = dev->priv;
@@ -976,8 +977,16 @@
 		return NETDEV_TX_LOCKED;
 	}
 
+	/* We can now drop queue_lock since tx_lock protects us. But remember
+	 * that queue_lock must be reacquired before we return. Moreover we
+	 * must reacquire it before releasing tx_lock to avoid being called
+	 * with the queue stopped.
+	 */
+	spin_unlock(&dev->queue_lock);
+
 	/* This is a hard error, log it. */
 	if (TX_BUFFS_AVAIL(gp) <= (skb_shinfo(skb)->nr_frags + 1)) {
+		spin_lock(&dev->queue_lock);
 		netif_stop_queue(dev);
 		spin_unlock_irqrestore(&gp->tx_lock, flags);
 		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
@@ -1066,6 +1075,7 @@
 		       dev->name, entry, skb->len);
 	mb();
 	writel(gp->tx_new, gp->regs + TXDMA_KICK);
+	spin_lock(&dev->queue_lock);
 	spin_unlock_irqrestore(&gp->tx_lock, flags);
 
 	dev->trans_start = jiffies;



* Re: LLTX and netif_stop_queue
  2005-01-02 23:30                 ` Eric Lemoine
@ 2005-01-03  7:41                   ` Eric Lemoine
  2005-01-03 15:04                   ` jamal
  1 sibling, 0 replies; 62+ messages in thread
From: Eric Lemoine @ 2005-01-03  7:41 UTC (permalink / raw)
  To: hadi; +Cc: netdev, Patrick McHardy, openib-general, David S. Miller

> Two (untested) patches implementing what I described above.
> 
> The first patch (sch_generic.patch) keeps queue_lock held in
> qdisc_restart() when calling hard_start_xmit() of a LLTX driver. The
> second (sungem.patch) makes sungem release queue_lock after grabbing
> its private tx lock.
> 
> Note that the modifications made to qdisc_restart() are not compatible
> with the current LLTX drivers. All LLTX drivers must be modified along
> sungem.patch's lines. Take sungem.patch as an example of how things
> must be done.

Correction: the current LLTX drivers need not be modified to work
correctly.  However, as an optimisation, they can be modified along
sungem.patch's lines.

Thanks,
-- 
Eric


* Re: LLTX and netif_stop_queue
  2005-01-02 23:30                 ` Eric Lemoine
  2005-01-03  7:41                   ` Eric Lemoine
@ 2005-01-03 15:04                   ` jamal
  2005-01-03 15:48                     ` Eric Lemoine
  2005-01-03 15:57                     ` Roland Dreier
  1 sibling, 2 replies; 62+ messages in thread
From: jamal @ 2005-01-03 15:04 UTC (permalink / raw)
  To: Eric Lemoine, Andi Kleen
  Cc: Patrick McHardy, David S. Miller, roland, netdev, openib-general

this piece:
-----
+
+                       /* And release queue */
+                       spin_unlock(&dev->queue_lock);
                }
---------

Isn't there a possibility (higher as you increase the number of
processors) that you will spin for a long time in _the driver_ waiting
for the queue lock?

Maybe we need a flag passed to the driver saying "the queue lock has
already been grabbed".  LLTX is a good alias but insufficient (if you run
your driver on an old kernel which supports LLTX but not the idea of
locking the queue for you).

Andi or anyone else - has anyone really done performance analysis of LLTX
(with anything other than loopback) and shown it has any value? Maybe
time to just shoot the damn thing.

cheers,
jamal

On Sun, 2005-01-02 at 18:30, Eric Lemoine wrote:
> On 28 Dec 2004 08:31:57 -0500, jamal <hadi@cyberus.ca> wrote:
> > On Fri, 2004-12-24 at 11:10, Eric Lemoine wrote:
> > 
> > > Yes but requiring drivers to release a lock that they should not even
> > > be aware of doesn't sound good. Another way would be to keep
> > > dev->queue_lock grabbed when entering start_xmit() and let the driver
> > > drop it (and re-acquire it before it returns) only if it wishes so.
> > > Although I don't like this too much either, that's the best way I can
> > > think of up to now...
> > 
> > I am not a big fan of that patch either, but I can't think of a cleaner
> > way to do it.
> > The violation already happens with the LLTX flag.  So maybe a big warning
> > that says "Do this only if your driver is LLTX enabled".  The other way to
> > do it is to put a check to see if LLTX is enabled before releasing that
> > lock -- but why the extra cycles?  The driver writer should know.
> 
> Two (untested) patches implementing what I described above.
> 
> The first patch (sch_generic.patch) keeps queue_lock held in
> qdisc_restart() when calling hard_start_xmit() of a LLTX driver. The
> second (sungem.patch) makes sungem release queue_lock after grabbing
> its private tx lock.
> 
> Note that the modifications made to qdisc_restart() are not compatible
> with the current LLTX drivers. All LLTX drivers must be modified along
> sungem.patch's lines. Take sungem.patch as an example of how things
> must be done.
> 
> Patches apply to current davem bk snapshot.


* Re: LLTX and netif_stop_queue
  2005-01-03 15:04                   ` jamal
@ 2005-01-03 15:48                     ` Eric Lemoine
  2005-01-03 15:57                     ` Roland Dreier
  1 sibling, 0 replies; 62+ messages in thread
From: Eric Lemoine @ 2005-01-03 15:48 UTC (permalink / raw)
  To: hadi; +Cc: netdev, Andi Kleen, openib-general, Patrick McHardy,
	David S. Miller

On 03 Jan 2005 10:04:21 -0500, jamal <hadi@cyberus.ca> wrote:
> this piece:
> -----
> +
> +                       /* And release queue */
> +                       spin_unlock(&dev->queue_lock);
>                 }
> ---------
> 
> Isn't there a possibility (higher as you increase the number of processors)
> that you will spin for a long time in _the driver_ waiting for the queue
> lock?

I don't see how LLTX is different from non-LLTX wrt the time spent
spinning on queue_lock.  What am I missing?

> 
> Maybe we need a flag passed to the driver saying "the queue lock has
> already been grabbed".  LLTX is a good alias but insufficient (if you run
> your driver on an old kernel which supports LLTX but not the idea of
> locking the queue for you).
> 
> Andi or anyone else - has anyone really done performance analysis of LLTX
> (with anything other than loopback) and shown it has any value? Maybe
> time to just shoot the damn thing.

I do not like LLTX too much either as I can't see a cleaner way to get
rid of that packet reordering race condition.

-- 
Eric


* Re: LLTX and netif_stop_queue
  2005-01-03 15:04                   ` jamal
  2005-01-03 15:48                     ` Eric Lemoine
@ 2005-01-03 15:57                     ` Roland Dreier
  2005-01-03 16:41                       ` Eric Lemoine
  1 sibling, 1 reply; 62+ messages in thread
From: Roland Dreier @ 2005-01-03 15:57 UTC (permalink / raw)
  To: hadi
  Cc: netdev, Eric Lemoine, Andi Kleen, openib-general, Patrick McHardy,
	David S. Miller

    jamal> Andi or anyone else - has anyone really done performance
    jamal> analysis of LLTX (with anything other than loopback) and
    jamal> shown it has any value? Maybe time to just shoot the damn
    jamal> thing.

For the IPoIB driver, it seems to be worth a couple percent.  It's
hard to get really consistent results but on my pair of test systems,
TCP throughput with netperf goes up by about 30 Mbps out of roughly
1900 Mbps total.  It's probably not worth the complexity in the net
core if all drivers only get that level of improvement.

 - Roland


* Re: LLTX and netif_stop_queue
  2005-01-03 15:57                     ` Roland Dreier
@ 2005-01-03 16:41                       ` Eric Lemoine
  2005-01-03 16:54                         ` Roland Dreier
  0 siblings, 1 reply; 62+ messages in thread
From: Eric Lemoine @ 2005-01-03 16:41 UTC (permalink / raw)
  To: Roland Dreier
  Cc: netdev, hadi, Andi Kleen, openib-general, Patrick McHardy,
	David S. Miller

On Mon, 03 Jan 2005 07:57:14 -0800, Roland Dreier <roland@topspin.com> wrote:
>     jamal> Andi or anyone else - has anyone really done performance
>     jamal> analysis of LLTX (with anything other than loopback) and
>     jamal> shown it has any value? Maybe time to just shoot the damn
>     jamal> thing.
> 
> For the IPoIB driver, it seems to be worth a couple percent.  It's
> hard to get really consistent results but on my pair of test systems,
> TCP throughput with netperf goes up by about 30 Mbps out of roughly
> 1900 Mbps total.  It's probably not worth the complexity in the net
> core if all drivers only get that level of improvement.
> 
>  - Roland
> 

What are your machines? In particular, how many CPUs do they have?

-- 
Eric


* Re: LLTX and netif_stop_queue
  2005-01-03 16:41                       ` Eric Lemoine
@ 2005-01-03 16:54                         ` Roland Dreier
  2005-01-03 17:07                           ` Eric Lemoine
  0 siblings, 1 reply; 62+ messages in thread
From: Roland Dreier @ 2005-01-03 16:54 UTC (permalink / raw)
  To: Eric Lemoine
  Cc: netdev, hadi, Andi Kleen, openib-general, Patrick McHardy,
	David S. Miller

    Eric> What are your machines? In particular, how many CPUs do they have?

Dual Xeons with HT on, so they look like 4 CPUs.

 - R.


* Re: LLTX and netif_stop_queue
  2005-01-03 16:54                         ` Roland Dreier
@ 2005-01-03 17:07                           ` Eric Lemoine
  2005-01-03 17:12                             ` Grant Grundler
  0 siblings, 1 reply; 62+ messages in thread
From: Eric Lemoine @ 2005-01-03 17:07 UTC (permalink / raw)
  To: Roland Dreier
  Cc: netdev, hadi, Andi Kleen, openib-general, Patrick McHardy,
	David S. Miller

On Mon, 03 Jan 2005 08:54:59 -0800, Roland Dreier <roland@topspin.com> wrote:
>     Eric> What are your machines? In particular, how many CPUs do they have?
> 
> Dual Xeons with HT on, so they look like 4 CPUs.

If I understand correctly, LLTX aims at avoiding cache misses on lock
variables (because of cacheline bouncing). So the effect of LLTX
should increase as the number of CPUs not sharing the same cache
increases. And two CPUs might not be enough...

-- 
Eric


* Re: Re: LLTX and netif_stop_queue
  2005-01-03 17:07                           ` Eric Lemoine
@ 2005-01-03 17:12                             ` Grant Grundler
  2005-01-04  4:18                               ` jamal
  0 siblings, 1 reply; 62+ messages in thread
From: Grant Grundler @ 2005-01-03 17:12 UTC (permalink / raw)
  To: Eric Lemoine
  Cc: netdev, hadi, Andi Kleen, openib-general, David S. Miller,
	Patrick McHardy

On Mon, Jan 03, 2005 at 06:07:00PM +0100, Eric Lemoine wrote:
> If I understand correctly, LLTX aims at avoiding cache misses on lock
> variables (because of cacheline bouncing). So the effect of LLTX
> should increase as the number of CPUs not sharing the same cache
> increases. And two CPUs might not be enough...

Cacheline bouncing usually starts to show up with > 4 CPUs (i.e. 8 or more).
Some workloads that Jamal cares about (routing) only need 2 CPUs.

grant


* Re: Re: LLTX and netif_stop_queue
  2005-01-03 17:12                             ` Grant Grundler
@ 2005-01-04  4:18                               ` jamal
  2005-01-19 22:47                                 ` David S. Miller
  0 siblings, 1 reply; 62+ messages in thread
From: jamal @ 2005-01-04  4:18 UTC (permalink / raw)
  To: Grant Grundler
  Cc: netdev, Eric Lemoine, Andi Kleen, openib-general, David S. Miller,
	Patrick McHardy

On Mon, 2005-01-03 at 12:12, Grant Grundler wrote:

> Some workloads that Jamal cares about (routing) only need 2 CPUs.

What bothers me is the complexity this has introduced more than
the workload.  OTOH, the 1-2% improvement posted by Roland is good
justification.

cheers,
jamal


* Re: Re: LLTX and netif_stop_queue
  2005-01-04  4:18                               ` jamal
@ 2005-01-19 22:47                                 ` David S. Miller
  2005-01-19 23:18                                   ` Stephen Hemminger
  0 siblings, 1 reply; 62+ messages in thread
From: David S. Miller @ 2005-01-19 22:47 UTC (permalink / raw)
  To: hadi; +Cc: netdev, eric.lemoine, ak, openib-general, kaber

On 03 Jan 2005 23:18:14 -0500
jamal <hadi@cyberus.ca> wrote:

> On Mon, 2005-01-03 at 12:12, Grant Grundler wrote:
> 
> > Some workloads that Jamal cares about (routing) only need 2 CPUs.
> 
> What bothers me is the complexity this has introduced more than
> the workload.  OTOH, the 1-2% improvement posted by Roland is good
> justification.

I think I'm going to put in something like Eric's patch and fix
up the other LLTX drivers as per his sungem patch.

There is a part of me that does want to yank LLTX for non-loopback
out of the tree.


* Re: Re: LLTX and netif_stop_queue
  2005-01-19 22:47                                 ` David S. Miller
@ 2005-01-19 23:18                                   ` Stephen Hemminger
  2005-01-19 23:41                                     ` David S. Miller
  2005-01-20  0:46                                     ` [PATCH]: was " David S. Miller
  0 siblings, 2 replies; 62+ messages in thread
From: Stephen Hemminger @ 2005-01-19 23:18 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, eric.lemoine, hadi, ak, openib-general, kaber

On Wed, 19 Jan 2005 14:47:11 -0800
"David S. Miller" <davem@davemloft.net> wrote:

> On 03 Jan 2005 23:18:14 -0500
> jamal <hadi@cyberus.ca> wrote:
> 
> > On Mon, 2005-01-03 at 12:12, Grant Grundler wrote:
> > 
> > > Some workloads that Jamal cares about (routing) only need 2 CPUs.
> > 
> > What bothers me is the complexity this has introduced more than
> > the workload.  OTOH, the 1-2% improvement posted by Roland is good
> > justification.
> 
> I think I'm going to put in something like Eric's patch and fix
> up the other LLTX drivers as per his sungem patch.
> 
> There is a part of me that does want to yank LLTX for non-loopback
> out of the tree.

Wondering: why not just give the drivers a way to lock dev->queue_lock
in the interrupt handler, and change the xmit to do spin_lock_irqsave?

Any driver that assumes it is being called with IRQs enabled in transmit
is probably already busted anyway.

-- 
Stephen Hemminger	<shemminger@osdl.org>


* Re: Re: LLTX and netif_stop_queue
  2005-01-19 23:18                                   ` Stephen Hemminger
@ 2005-01-19 23:41                                     ` David S. Miller
  2005-01-20  0:02                                       ` [openib-general] " Jeff Garzik
  2005-01-20  0:47                                       ` Francois Romieu
  2005-01-20  0:46                                     ` [PATCH]: was " David S. Miller
  1 sibling, 2 replies; 62+ messages in thread
From: David S. Miller @ 2005-01-19 23:41 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, eric.lemoine, hadi, ak, openib-general, kaber

On Wed, 19 Jan 2005 15:18:53 -0800
Stephen Hemminger <shemminger@osdl.org> wrote:

> Wondering: why not just give the drivers a way to lock dev->queue_lock
> in the interrupt handler, and change the xmit to do spin_lock_irqsave?
> 
> Any driver that assumes it is being called with IRQs enabled in transmit
> is probably already busted anyway.

Yes, that's an idea.

Effectively, LLTX is working around the fact that dev->xmit_lock is
BH disabling instead of IRQ disabling.  And there is no fundamental
reason for that.

Usually we strive for BH disabling locks in order to decrease the
amount of hard IRQ disabling time in the kernel.  But here, as soon
as we call into ->hard_start_xmit(), the driver typically turns hard
IRQs off anyways.

There are other things using this though, such as multicast list
upload.

If we change dev->xmit_lock to be an IRQ disabling lock, then drivers
can use it in place of their private tx_lock.
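
Concretely (a hypothetical fragment, not from any posted patch), a
driver's TX-completion path would then look like:

    spin_lock_irqsave(&dev->xmit_lock, flags);
    /* Reap completed TX descriptors; wake the queue if there is room. */
    if (netif_queue_stopped(dev) && tx_ring_has_room(priv))
        netif_wake_queue(dev);
    spin_unlock_irqrestore(&dev->xmit_lock, flags);

with tx_ring_has_room() standing in for whatever ring accounting the
driver already does.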

We would still need something special for loopback, which wants and
needs no locking at all.

Also, changing dev->xmit_lock to be IRQ disabling is not 100% trivial.
We'd need to verify the side-effects this has on gems like the
spin_unlock_wait(&dev->xmit_lock) in the Tulip winbond power management
code.  There is another fun case in the 6pack hamradio driver where it
is doing IRQ disabling when taking dev->xmit_lock.

Originally, dev->xmit_lock was added so that drivers that were SMP dumb
could stay that way.  Thus preserving the guarantee that there would be
only one active call into the dev->hard_start_xmit method across the
entire system.  I don't think any of that is relevant any longer.  All
of our network drivers are pretty clean in this regard.


* Re: [openib-general] Re: LLTX and netif_stop_queue
  2005-01-19 23:41                                     ` David S. Miller
@ 2005-01-20  0:02                                       ` Jeff Garzik
  2005-01-20  0:46                                         ` Stephen Hemminger
  2005-01-20  0:47                                       ` Francois Romieu
  1 sibling, 1 reply; 62+ messages in thread
From: Jeff Garzik @ 2005-01-20  0:02 UTC (permalink / raw)
  To: David S. Miller
  Cc: Stephen Hemminger, hadi, iod00d, eric.lemoine, roland, netdev, ak,
	openib-general, kaber

David S. Miller wrote:
> Usually we strive for BH disabling locks in order to decrease the
> amount of hard IRQ disabling time in the kernel.  But here, as soon
> as we call into ->hard_start_xmit(), the driver typically turns hard
> IRQs off anyways.
[...]
> > Also, changing dev->xmit_lock to be IRQ disabling is not 100% trivial.


Agreed on both counts...


* Re: [openib-general] Re: LLTX and netif_stop_queue
  2005-01-20  0:02                                       ` [openib-general] " Jeff Garzik
@ 2005-01-20  0:46                                         ` Stephen Hemminger
  2005-01-20  0:47                                           ` David S. Miller
  0 siblings, 1 reply; 62+ messages in thread
From: Stephen Hemminger @ 2005-01-20  0:46 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: David S. Miller, hadi, iod00d, eric.lemoine, roland, netdev, ak,
	openib-general, kaber

On Wed, 19 Jan 2005 19:02:52 -0500
Jeff Garzik <jgarzik@pobox.com> wrote:

> David S. Miller wrote:
> > Usually we strive for BH disabling locks in order to decrease the
> > amount of hard IRQ disabling time in the kernel.  But here, as soon
> > as we call into ->hard_start_xmit(), the driver typically turns hard
> > IRQs off anyways.
> [...]
> > Also, changing dev->xmit_lock to be IRQ disabling is not 100% trivial.
> 
> 
> Agreed on both counts...

But in the end it would be safer than the current LLTX usage in drivers,
which seems like it is opening up problems.


-- 
Stephen Hemminger	<shemminger@osdl.org>


* [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-19 23:18                                   ` Stephen Hemminger
  2005-01-19 23:41                                     ` David S. Miller
@ 2005-01-20  0:46                                     ` David S. Miller
  2005-01-20  3:14                                       ` Andi Kleen
                                                         ` (2 more replies)
  1 sibling, 3 replies; 62+ messages in thread
From: David S. Miller @ 2005-01-20  0:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, eric.lemoine, hadi, ak, openib-general, kaber


OK, here is what something like this might look like.

In summary:

1) dev->xmit_lock is now IRQ disabling instead of BH disabling
2) Drivers can use dev->xmit_lock in place of their private
   driver_priv->tx_lock and this is effectively the same gain
   LLTX gave drivers sans the race condition which started this
   thread.
3) NETDEV_TX_LOCKED is gone
4) LLTX stays, but this means that the driver's TX routine is
   100% lockless.  It is only to be used by loopback and other
   queueless software devices such as tunnels.

BTW, one next step would be to implement things as Alexey
prescribes in net/core/dev.c, that is to make all software
devices %100 lockless and use LLTX.  Then all the locking code
under his comment in dev_queue_xmit() for the queueless device
case can be deleted.  We can assert the LLTX flag being set in
that code path in order to catch unconverted software devices.

Also, this new semantic of LLTX means that it should never be
seen set in qdisc_restart() any longer.  So we could remove all
of that LLTX complexity from that code and assert on it not being
set.  That would be quite a welcome cleanup.

Comments?

===== Documentation/networking/netdevices.txt 1.5 vs edited =====
--- 1.5/Documentation/networking/netdevices.txt	2004-09-12 16:55:06 -07:00
+++ edited/Documentation/networking/netdevices.txt	2005-01-19 16:09:50 -08:00
@@ -45,10 +45,9 @@
 	Synchronization: dev->xmit_lock spinlock.
 	When the driver sets NETIF_F_LLTX in dev->features this will be
 	called without holding xmit_lock. In this case the driver 
-	has to lock by itself when needed. It is recommended to use a try lock
-	for this and return -1 when the spin lock fails. 
-	The locking there should also properly protect against 
-	set_multicast_list
+	has to execute its transmission routine in a completely lockless
+	manner.  It is recommended only for queueless devices such as
+	loopback and tunnels.
 	Context: BHs disabled
 	Notes: netif_queue_stopped() is guaranteed false
 	Return codes: 
@@ -56,8 +55,6 @@
 	o NETDEV_TX_BUSY Cannot transmit packet, try later 
 	  Usually a bug, means queue start/stop flow control is broken in
 	  the driver. Note: the driver must NOT put the skb in its DMA ring.
-	o NETDEV_TX_LOCKED Locking failed, please retry quickly.
-	  Only valid when NETIF_F_LLTX is set.
 
 dev->tx_timeout:
 	Synchronization: dev->xmit_lock spinlock.
===== drivers/infiniband/ulp/ipoib/ipoib.h 1.3 vs edited =====
--- 1.3/drivers/infiniband/ulp/ipoib/ipoib.h	2005-01-15 14:01:50 -08:00
+++ edited/drivers/infiniband/ulp/ipoib/ipoib.h	2005-01-19 15:50:47 -08:00
@@ -104,10 +104,10 @@
 };
 
 /*
- * Device private locking: tx_lock protects members used in TX fast
- * path (and we use LLTX so upper layers don't do extra locking).
- * lock protects everything else.  lock nests inside of tx_lock (ie
- * tx_lock must be acquired first if needed).
+ * Device private locking: netdev->xmit_lock protects members used
+ * in TX fast path.
+ * lock protects everything else.  lock nests inside of xmit_lock (ie
+ * xmit_lock must be acquired first if needed).
  */
 struct ipoib_dev_priv {
 	spinlock_t lock;
@@ -150,7 +150,6 @@
 
 	struct ipoib_buf *rx_ring;
 
-	spinlock_t        tx_lock;
 	struct ipoib_buf *tx_ring;
 	unsigned          tx_head;
 	unsigned          tx_tail;
===== drivers/infiniband/ulp/ipoib/ipoib_ib.c 1.4 vs edited =====
--- 1.4/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2005-01-15 14:01:50 -08:00
+++ edited/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2005-01-19 15:51:36 -08:00
@@ -247,12 +247,12 @@
 
 		dev_kfree_skb_any(tx_req->skb);
 
-		spin_lock_irqsave(&priv->tx_lock, flags);
+		spin_lock_irqsave(&dev->xmit_lock, flags);
 		++priv->tx_tail;
 		if (netif_queue_stopped(dev) &&
 		    priv->tx_head - priv->tx_tail <= IPOIB_TX_RING_SIZE / 2)
 			netif_wake_queue(dev);
-		spin_unlock_irqrestore(&priv->tx_lock, flags);
+		spin_unlock_irqrestore(&dev->xmit_lock, flags);
 
 		if (wc->status != IB_WC_SUCCESS &&
 		    wc->status != IB_WC_WR_FLUSH_ERR)
===== drivers/infiniband/ulp/ipoib/ipoib_main.c 1.3 vs edited =====
--- 1.3/drivers/infiniband/ulp/ipoib/ipoib_main.c	2005-01-15 14:01:48 -08:00
+++ edited/drivers/infiniband/ulp/ipoib/ipoib_main.c	2005-01-19 15:53:27 -08:00
@@ -411,7 +411,7 @@
 
 	/*
 	 * We can only be called from ipoib_start_xmit, so we're
-	 * inside tx_lock -- no need to save/restore flags.
+	 * inside dev->xmit_lock -- no need to save/restore flags.
 	 */
 	spin_lock(&priv->lock);
 
@@ -483,7 +483,7 @@
 
 	/*
 	 * We can only be called from ipoib_start_xmit, so we're
-	 * inside tx_lock -- no need to save/restore flags.
+	 * inside dev->xmit_lock -- no need to save/restore flags.
 	 */
 	spin_lock(&priv->lock);
 
@@ -526,27 +526,11 @@
 	spin_unlock(&priv->lock);
 }
 
+/* Called with dev->xmit_lock held and IRQs disabled.  */
 static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_neigh *neigh;
-	unsigned long flags;
-
-	local_irq_save(flags);
-	if (!spin_trylock(&priv->tx_lock)) {
-		local_irq_restore(flags);
-		return NETDEV_TX_LOCKED;
-	}
-
-	/*
-	 * Check if our queue is stopped.  Since we have the LLTX bit
-	 * set, we can't rely on netif_stop_queue() preventing our
-	 * xmit function from being called with a full queue.
-	 */
-	if (unlikely(netif_queue_stopped(dev))) {
-		spin_unlock_irqrestore(&priv->tx_lock, flags);
-		return NETDEV_TX_BUSY;
-	}
 
 	if (skb->dst && skb->dst->neighbour) {
 		if (unlikely(!*to_ipoib_neigh(skb->dst->neighbour))) {
@@ -601,7 +585,6 @@
 	}
 
 out:
-	spin_unlock_irqrestore(&priv->tx_lock, flags);
 
 	return NETDEV_TX_OK;
 }
@@ -797,7 +780,7 @@
 	dev->addr_len 		 = INFINIBAND_ALEN;
 	dev->type 		 = ARPHRD_INFINIBAND;
 	dev->tx_queue_len 	 = IPOIB_TX_RING_SIZE * 2;
-	dev->features            = NETIF_F_VLAN_CHALLENGED | NETIF_F_LLTX;
+	dev->features            = NETIF_F_VLAN_CHALLENGED;
 
 	/* MTU will be reset when mcast join happens */
 	dev->mtu 		 = IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN;
@@ -812,7 +795,6 @@
 	priv->dev = dev;
 
 	spin_lock_init(&priv->lock);
-	spin_lock_init(&priv->tx_lock);
 
 	init_MUTEX(&priv->mcast_mutex);
 	init_MUTEX(&priv->vlan_mutex);
===== drivers/net/sungem.c 1.72 vs edited =====
--- 1.72/drivers/net/sungem.c	2004-11-05 15:56:15 -08:00
+++ edited/drivers/net/sungem.c	2005-01-19 15:58:26 -08:00
@@ -835,9 +835,9 @@
 		}
 
 		/* Run TX completion thread */
-		spin_lock(&gp->tx_lock);
+		spin_lock(&dev->xmit_lock);
 		gem_tx(dev, gp, gp->status);
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&dev->xmit_lock);
 
 		spin_unlock_irqrestore(&gp->lock, flags);
 
@@ -932,12 +932,12 @@
 	       readl(gp->regs + MAC_RXCFG));
 
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	gp->reset_task_pending = 2;
 	schedule_work(&gp->reset_task);
 
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 }
 
@@ -955,7 +955,6 @@
 	struct gem *gp = dev->priv;
 	int entry;
 	u64 ctrl;
-	unsigned long flags;
 
 	ctrl = 0;
 	if (skb->ip_summed == CHECKSUM_HW) {
@@ -969,17 +968,9 @@
 			(csum_stuff_off << 21));
 	}
 
-	local_irq_save(flags);
-	if (!spin_trylock(&gp->tx_lock)) {
-		/* Tell upper layer to requeue */
-		local_irq_restore(flags);
-		return NETDEV_TX_LOCKED;
-	}
-
 	/* This is a hard error, log it. */
 	if (TX_BUFFS_AVAIL(gp) <= (skb_shinfo(skb)->nr_frags + 1)) {
 		netif_stop_queue(dev);
-		spin_unlock_irqrestore(&gp->tx_lock, flags);
 		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
 		       dev->name);
 		return NETDEV_TX_BUSY;
@@ -1066,7 +1057,6 @@
 		       dev->name, entry, skb->len);
 	mb();
 	writel(gp->tx_new, gp->regs + TXDMA_KICK);
-	spin_unlock_irqrestore(&gp->tx_lock, flags);
 
 	dev->trans_start = jiffies;
 
@@ -1097,11 +1087,11 @@
 	}
 
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	dev->mtu = new_mtu;
 	gp->reset_task_pending = 1;
 	schedule_work(&gp->reset_task);
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	flush_scheduled_work();
@@ -1111,7 +1101,7 @@
 
 #define STOP_TRIES 32
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_stop(struct gem *gp)
 {
 	int limit;
@@ -1137,7 +1127,7 @@
 		printk(KERN_ERR "%s: SW reset is ghetto.\n", gp->dev->name);
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_start_dma(struct gem *gp)
 {
 	unsigned long val;
@@ -1162,7 +1152,7 @@
 }
 
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 // XXX dbl check what that function should do when called on PCS PHY
 static void gem_begin_auto_negotiation(struct gem *gp, struct ethtool_cmd *ep)
 {
@@ -1249,7 +1239,7 @@
 /* A link-up condition has occurred, initialize and enable the
  * rest of the chip.
  *
- * Must be invoked under gp->lock and gp->tx_lock.
+ * Must be invoked under gp->lock and dev->xmit_lock.
  */
 static int gem_set_link_modes(struct gem *gp)
 {
@@ -1356,7 +1346,7 @@
 	return 0;
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static int gem_mdio_link_not_up(struct gem *gp)
 {
 	switch (gp->lstate) {
@@ -1414,7 +1404,7 @@
 
 	netif_poll_disable(gp->dev);
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&gp->dev->xmit_lock);
 
 	if (gp->hw_running && gp->opened) {
 		netif_stop_queue(gp->dev);
@@ -1430,7 +1420,7 @@
 	}
 	gp->reset_task_pending = 0;
 
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&gp->dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 	netif_poll_enable(gp->dev);
 }
@@ -1444,7 +1434,7 @@
 		return;
 
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&gp->dev->xmit_lock);
 
 	/* If the link of task is still pending, we just
 	 * reschedule the link timer
@@ -1514,11 +1504,11 @@
 restart:
 	mod_timer(&gp->link_timer, jiffies + ((12 * HZ) / 10));
 out_unlock:
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&gp->dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_clean_rings(struct gem *gp)
 {
 	struct gem_init_block *gb = gp->init_block;
@@ -1569,7 +1559,7 @@
 	}
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_init_rings(struct gem *gp)
 {
 	struct gem_init_block *gb = gp->init_block;
@@ -1619,7 +1609,7 @@
 	wmb();
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_init_phy(struct gem *gp)
 {
 	u32 mifcfg;
@@ -1757,7 +1747,7 @@
 	}
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_init_dma(struct gem *gp)
 {
 	u64 desc_dma = (u64) gp->gblock_dvma;
@@ -1795,7 +1785,7 @@
 		       gp->regs + RXDMA_BLANK);
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static u32
 gem_setup_multicast(struct gem *gp)
 {
@@ -1838,7 +1828,7 @@
 	return rxcfg;
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_init_mac(struct gem *gp)
 {
 	unsigned char *e = &gp->dev->dev_addr[0];
@@ -1916,7 +1906,7 @@
 	writel(0xffffffff, gp->regs + MAC_MCMASK);
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_init_pause_thresholds(struct gem *gp)
 {
        	u32 cfg;
@@ -2052,7 +2042,7 @@
 	return 0;
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_init_hw(struct gem *gp, int restart_link)
 {
 	/* On Apple's gmac, I initialize the PHY only after
@@ -2150,11 +2140,11 @@
 
 	if (!gp->wake_on_lan) {
 		spin_lock_irqsave(&gp->lock, flags);
-		spin_lock(&gp->tx_lock);
+		spin_lock(&gp->dev->xmit_lock);
 		gem_stop(gp);
 		writel(MAC_TXRST_CMD, gp->regs + MAC_TXRST);
 		writel(MAC_RXRST_CMD, gp->regs + MAC_RXRST);
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&gp->dev->xmit_lock);
 		spin_unlock_irqrestore(&gp->lock, flags);
 	}
 
@@ -2202,9 +2192,9 @@
 		unsigned long flags;
 
 		spin_lock_irqsave(&gp->lock, flags);
-		spin_lock(&gp->tx_lock);
+		spin_lock(&gp->dev->xmit_lock);
 		gem_stop(gp);
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&gp->dev->xmit_lock);
 		spin_unlock_irqrestore(&gp->lock, flags);
 	}
 }
@@ -2265,9 +2255,9 @@
 
 		/* Reset the chip */
 		spin_lock_irq(&gp->lock);
-		spin_lock(&gp->tx_lock);
+		spin_lock(&gp->dev->xmit_lock);
 		gem_stop(gp);
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&gp->dev->xmit_lock);
 		spin_unlock_irq(&gp->lock);
 
 		gp->hw_running = 1;
@@ -2281,7 +2271,7 @@
 		printk(KERN_ERR "%s: failed to request irq !\n", gp->dev->name);
 
 		spin_lock_irq(&gp->lock);
-		spin_lock(&gp->tx_lock);
+		spin_lock(&gp->dev->xmit_lock);
 #ifdef CONFIG_PPC_PMAC
 		if (!hw_was_up && gp->pdev->vendor == PCI_VENDOR_ID_APPLE)
 			gem_apple_powerdown(gp);
@@ -2290,14 +2280,14 @@
 		gp->pm_timer.expires = jiffies + 10*HZ;
 		add_timer(&gp->pm_timer);
 		up(&gp->pm_sem);
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&gp->dev->xmit_lock);
 		spin_unlock_irq(&gp->lock);
 
 		return -EAGAIN;
 	}
 
        	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&gp->dev->xmit_lock);
 
 	/* Allocate & setup ring buffers */
 	gem_init_rings(gp);
@@ -2307,7 +2297,7 @@
 
 	gp->opened = 1;
 
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&gp->dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	up(&gp->pm_sem);
@@ -2328,7 +2318,7 @@
 
 	/* Stop traffic, mark us closed */
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&gp->dev->xmit_lock);
 
 	gp->opened = 0;	
 
@@ -2343,7 +2333,7 @@
 	/* Bye, the pm timer will finish the job */
 	free_irq(gp->pdev->irq, (void *) dev);
 
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&gp->dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	/* Fire the PM timer that will shut us down in about 10 seconds */
@@ -2374,7 +2364,7 @@
 	/* If the driver is opened, we stop the DMA */
 	if (gp->opened) {
 		spin_lock_irq(&gp->lock);
-		spin_lock(&gp->tx_lock);
+		spin_lock(&gp->dev->xmit_lock);
 
 		/* Stop traffic, mark us closed */
 		netif_device_detach(dev);
@@ -2385,7 +2375,7 @@
 		/* Get rid of ring buffers */
 		gem_clean_rings(gp);
 
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&gp->dev->xmit_lock);
 		spin_unlock_irq(&gp->lock);
 
 		if (gp->pdev->vendor == PCI_VENDOR_ID_APPLE)
@@ -2419,14 +2409,14 @@
 		}
 #endif /* CONFIG_PPC_PMAC */
 		spin_lock_irq(&gp->lock);
-		spin_lock(&gp->tx_lock);
+		spin_lock(&gp->dev->xmit_lock);
 
 		gem_stop(gp);
 		gp->hw_running = 1;
 		gem_init_rings(gp);
 		gem_init_hw(gp, 1);
 
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&gp->dev->xmit_lock);
 		spin_unlock_irq(&gp->lock);
 
 		netif_device_attach(dev);
@@ -2447,7 +2437,7 @@
 	struct net_device_stats *stats = &gp->net_stats;
 
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	if (gp->hw_running) {
 		stats->rx_crc_errors += readl(gp->regs + MAC_FCSERR);
@@ -2467,7 +2457,7 @@
 		writel(0, gp->regs + MAC_LCOLL);
 	}
 
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	return &gp->net_stats;
@@ -2483,7 +2473,7 @@
 		return;
 		
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	netif_stop_queue(dev);
 
@@ -2508,7 +2498,7 @@
 
 	netif_wake_queue(dev);
 
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 }
 
@@ -2540,7 +2530,7 @@
 
 		/* Return current PHY settings */
 		spin_lock_irq(&gp->lock);
-		spin_lock(&gp->tx_lock);
+		spin_lock(&dev->xmit_lock);
 		cmd->autoneg = gp->want_autoneg;
 		cmd->speed = gp->phy_mii.speed;
 		cmd->duplex = gp->phy_mii.duplex;			
@@ -2552,7 +2542,7 @@
 		 */
 		if (cmd->advertising == 0)
 			cmd->advertising = cmd->supported;
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&dev->xmit_lock);
 		spin_unlock_irq(&gp->lock);
 	} else { // XXX PCS ?
 		cmd->supported =
@@ -2592,9 +2582,9 @@
 	      
 	/* Apply settings and restart link process. */
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	gem_begin_auto_negotiation(gp, cmd);
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	return 0;
@@ -2609,9 +2599,9 @@
 
 	/* Restart link process. */
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	gem_begin_auto_negotiation(gp, NULL);
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	return 0;
@@ -2863,7 +2853,6 @@
 	gp->msg_enable = DEFAULT_MSG;
 
 	spin_lock_init(&gp->lock);
-	spin_lock_init(&gp->tx_lock);
 	init_MUTEX(&gp->pm_sem);
 
 	init_timer(&gp->link_timer);
@@ -2899,9 +2888,9 @@
 		gem_apple_powerup(gp);
 #endif
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	gem_stop(gp);
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	/* Fill up the mii_phy structure (even if we won't use it) */
@@ -2967,11 +2956,11 @@
 
 	/* Detect & init PHY, start autoneg */
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	gp->hw_running = 1;
 	gem_init_phy(gp);
 	gem_begin_auto_negotiation(gp, NULL);
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	if (gp->phy_type == phy_mii_mdio0 ||
@@ -2982,7 +2971,7 @@
 	pci_set_drvdata(pdev, dev);
 
 	/* GEM can do it all... */
-	dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_LLTX;
+	dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM;
 	if (pci_using_dac)
 		dev->features |= NETIF_F_HIGHDMA;
 
===== drivers/net/sungem.h 1.18 vs edited =====
--- 1.18/drivers/net/sungem.h	2004-09-15 08:16:05 -07:00
+++ edited/drivers/net/sungem.h	2005-01-19 15:54:08 -08:00
@@ -953,7 +953,6 @@
 
 struct gem {
 	spinlock_t lock;
-	spinlock_t tx_lock;
 	void __iomem *regs;
 	int rx_new, rx_old;
 	int tx_new, tx_old;
===== drivers/net/tg3.c 1.227 vs edited =====
--- 1.227/drivers/net/tg3.c	2005-01-17 13:40:31 -08:00
+++ edited/drivers/net/tg3.c	2005-01-19 15:43:48 -08:00
@@ -2816,9 +2816,9 @@
 
 	/* run TX completion thread */
 	if (sblk->idx[0].tx_consumer != tp->tx_cons) {
-		spin_lock(&tp->tx_lock);
+		spin_lock(&netdev->xmit_lock);
 		tg3_tx(tp);
-		spin_unlock(&tp->tx_lock);
+		spin_unlock(&netdev->xmit_lock);
 	}
 
 	spin_unlock_irqrestore(&tp->lock, flags);
@@ -2939,7 +2939,7 @@
 	tg3_netif_stop(tp);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&tp->dev->xmit_lock);
 
 	restart_timer = tp->tg3_flags2 & TG3_FLG2_RESTART_TIMER;
 	tp->tg3_flags2 &= ~TG3_FLG2_RESTART_TIMER;
@@ -2949,7 +2949,7 @@
 
 	tg3_netif_start(tp);
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&tp->dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	if (restart_timer)
@@ -3048,6 +3048,7 @@
 		(base + len + 8 < base));
 }
 
+/* dev->xmit_lock is held and IRQs are disabled.  */
 static int tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct tg3 *tp = netdev_priv(dev);
@@ -3055,39 +3056,12 @@
 	unsigned int i;
 	u32 len, entry, base_flags, mss;
 	int would_hit_hwbug;
-	unsigned long flags;
 
 	len = skb_headlen(skb);
 
-	/* No BH disabling for tx_lock here.  We are running in BH disabled
-	 * context and TX reclaim runs via tp->poll inside of a software
-	 * interrupt.  Rejoice!
-	 *
-	 * Actually, things are not so simple.  If we are to take a hw
-	 * IRQ here, we can deadlock, consider:
-	 *
-	 *       CPU1		CPU2
-	 *   tg3_start_xmit
-	 *   take tp->tx_lock
-	 *			tg3_timer
-	 *			take tp->lock
-	 *   tg3_interrupt
-	 *   spin on tp->lock
-	 *			spin on tp->tx_lock
-	 *
-	 * So we really do need to disable interrupts when taking
-	 * tx_lock here.
-	 */
-	local_irq_save(flags);
-	if (!spin_trylock(&tp->tx_lock)) { 
-		local_irq_restore(flags);
-		return NETDEV_TX_LOCKED; 
-	} 
-
 	/* This is a hard error, log it. */
 	if (unlikely(TX_BUFFS_AVAIL(tp) <= (skb_shinfo(skb)->nr_frags + 1))) {
 		netif_stop_queue(dev);
-		spin_unlock_irqrestore(&tp->tx_lock, flags);
 		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
 		       dev->name);
 		return NETDEV_TX_BUSY;
@@ -3224,7 +3198,7 @@
 						entry, len,
 						last_plus_one,
 						&start, mss))
-			goto out_unlock;
+			goto out;
 
 		entry = start;
 	}
@@ -3236,9 +3210,8 @@
 	if (TX_BUFFS_AVAIL(tp) <= (MAX_SKB_FRAGS + 1))
 		netif_stop_queue(dev);
 
-out_unlock:
+out:
     	mmiowb();
-	spin_unlock_irqrestore(&tp->tx_lock, flags);
 
 	dev->trans_start = jiffies;
 
@@ -3273,7 +3246,7 @@
 
 	tg3_netif_stop(tp);
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	tg3_halt(tp);
 
@@ -3283,7 +3256,7 @@
 
 	tg3_netif_start(tp);
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	return 0;
@@ -5574,7 +5547,7 @@
 	unsigned long flags;
 
 	spin_lock_irqsave(&tp->lock, flags);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&tp->dev->xmit_lock);
 
 	/* All of this garbage is because when using non-tagged
 	 * IRQ status the mailbox/status_block protocol the chip
@@ -5590,7 +5563,7 @@
 
 	if (!(tr32(WDMAC_MODE) & WDMAC_MODE_ENABLE)) {
 		tp->tg3_flags2 |= TG3_FLG2_RESTART_TIMER;
-		spin_unlock(&tp->tx_lock);
+		spin_unlock(&tp->dev->xmit_lock);
 		spin_unlock_irqrestore(&tp->lock, flags);
 		schedule_work(&tp->reset_task);
 		return;
@@ -5659,7 +5632,7 @@
 		tp->asf_counter = tp->asf_multiplier;
 	}
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&tp->dev->xmit_lock);
 	spin_unlock_irqrestore(&tp->lock, flags);
 
 	tp->timer.expires = jiffies + tp->timer_offset;
@@ -5672,12 +5645,12 @@
 	int err;
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	tg3_disable_ints(tp);
 	tp->tg3_flags &= ~TG3_FLAG_INIT_COMPLETE;
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	/* The placement of this call is tied
@@ -5696,7 +5669,7 @@
 	}
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	err = tg3_init_hw(tp);
 	if (err) {
@@ -5716,7 +5689,7 @@
 		tp->tg3_flags |= TG3_FLAG_INIT_COMPLETE;
 	}
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	if (err) {
@@ -5726,11 +5699,11 @@
 	}
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	tg3_enable_ints(tp);
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	netif_start_queue(dev);
@@ -5978,7 +5951,7 @@
 	del_timer_sync(&tp->timer);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 #if 0
 	tg3_dump_state(tp);
 #endif
@@ -5992,7 +5965,7 @@
 		  TG3_FLAG_GOT_SERDES_FLOWCTL);
 	netif_carrier_off(tp->dev);
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	free_irq(dev->irq, dev);
@@ -6296,9 +6269,9 @@
 	struct tg3 *tp = netdev_priv(dev);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	__tg3_set_rx_mode(dev);
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 }
 
@@ -6322,7 +6295,7 @@
 	memset(p, 0, TG3_REGDUMP_LEN);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 #define __GET_REG32(reg)	(*(p)++ = tr32(reg))
 #define GET_REG32_LOOP(base,len)		\
@@ -6372,7 +6345,7 @@
 #undef GET_REG32_LOOP
 #undef GET_REG32_1
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 }
 
@@ -6496,7 +6469,7 @@
 	}
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	tp->link_config.autoneg = cmd->autoneg;
 	if (cmd->autoneg == AUTONEG_ENABLE) {
@@ -6510,7 +6483,7 @@
   	}
   
 	tg3_setup_phy(tp, 1);
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
   
 	return 0;
@@ -6627,7 +6600,7 @@
   
 	tg3_netif_stop(tp);
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
   
 	tp->rx_pending = ering->rx_pending;
 
@@ -6640,7 +6613,7 @@
 	tg3_halt(tp);
 	tg3_init_hw(tp);
 	tg3_netif_start(tp);
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
   
 	return 0;
@@ -6661,7 +6634,7 @@
   
 	tg3_netif_stop(tp);
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	if (epause->autoneg)
 		tp->tg3_flags |= TG3_FLAG_PAUSE_AUTONEG;
 	else
@@ -6677,7 +6650,7 @@
 	tg3_halt(tp);
 	tg3_init_hw(tp);
 	tg3_netif_start(tp);
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
   
 	return 0;
@@ -6803,14 +6776,14 @@
 	struct tg3 *tp = netdev_priv(dev);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	tp->vlgrp = grp;
 
 	/* Update RX_MODE_KEEP_VLAN_TAG bit in RX_MODE register. */
 	__tg3_set_rx_mode(dev);
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 }
 
@@ -6819,10 +6792,10 @@
 	struct tg3 *tp = netdev_priv(dev);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	if (tp->vlgrp)
 		tp->vlgrp->vlan_devices[vid] = NULL;
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 }
 #endif
@@ -8241,7 +8214,6 @@
 
 	if (pci_using_dac)
 		dev->features |= NETIF_F_HIGHDMA;
-	dev->features |= NETIF_F_LLTX;
 #if TG3_VLAN_TAG_USED
 	dev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
 	dev->vlan_rx_register = tg3_vlan_rx_register;
@@ -8283,7 +8255,6 @@
 	tp->grc_mode |= GRC_MODE_BSWAP_NONFRM_DATA;
 #endif
 	spin_lock_init(&tp->lock);
-	spin_lock_init(&tp->tx_lock);
 	spin_lock_init(&tp->indirect_lock);
 	INIT_WORK(&tp->reset_task, tg3_reset_task, tp);
 
@@ -8496,23 +8467,23 @@
 	del_timer_sync(&tp->timer);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	tg3_disable_ints(tp);
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	netif_device_detach(dev);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	tg3_halt(tp);
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	err = tg3_set_power_state(tp, state);
 	if (err) {
 		spin_lock_irq(&tp->lock);
-		spin_lock(&tp->tx_lock);
+		spin_lock(&dev->xmit_lock);
 
 		tg3_init_hw(tp);
 
@@ -8522,7 +8493,7 @@
 		netif_device_attach(dev);
 		tg3_netif_start(tp);
 
-		spin_unlock(&tp->tx_lock);
+		spin_unlock(&dev->xmit_lock);
 		spin_unlock_irq(&tp->lock);
 	}
 
@@ -8547,7 +8518,7 @@
 	netif_device_attach(dev);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	tg3_init_hw(tp);
 
@@ -8558,7 +8529,7 @@
 
 	tg3_netif_start(tp);
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	return 0;
===== drivers/net/tg3.h 1.52 vs edited =====
--- 1.52/drivers/net/tg3.h	2004-11-15 15:41:31 -08:00
+++ edited/drivers/net/tg3.h	2005-01-19 15:37:29 -08:00
@@ -1980,12 +1980,11 @@
 	 * lock: Held during all operations except TX packet
 	 *       processing.
 	 *
-	 * tx_lock: Held during tg3_start_xmit{,_4gbug} and tg3_tx
+	 * dev->xmit_lock: Held during tg3_start_xmit and tg3_tx
 	 *
 	 * If you want to shut up all asynchronous processing you must
-	 * acquire both locks, 'lock' taken before 'tx_lock'.  IRQs must
-	 * be disabled to take 'lock' but only softirq disabling is
-	 * necessary for acquisition of 'tx_lock'.
+	 * acquire both locks, 'lock' taken before 'xmit_lock'.  IRQs must
+	 * be disabled to take either lock.
 	 */
 	spinlock_t			lock;
 	spinlock_t			indirect_lock;
@@ -2003,8 +2002,6 @@
 	u32				tx_prod;
 	u32				tx_cons;
 	u32				tx_pending;
-
-	spinlock_t			tx_lock;
 
 	struct tg3_tx_buffer_desc	*tx_ring;
 	struct tx_ring_info		*tx_buffers;
===== drivers/net/e1000/e1000.h 1.58 vs edited =====
--- 1.58/drivers/net/e1000/e1000.h	2004-10-25 18:13:57 -07:00
+++ edited/drivers/net/e1000/e1000.h	2005-01-19 15:45:31 -08:00
@@ -209,7 +209,6 @@
 
 	/* TX */
 	struct e1000_desc_ring tx_ring;
-	spinlock_t tx_lock;
 	uint32_t txd_cmd;
 	uint32_t tx_int_delay;
 	uint32_t tx_abs_int_delay;
===== drivers/net/e1000/e1000_main.c 1.145 vs edited =====
--- 1.145/drivers/net/e1000/e1000_main.c	2005-01-10 21:03:21 -08:00
+++ edited/drivers/net/e1000/e1000_main.c	2005-01-19 15:46:15 -08:00
@@ -520,9 +520,6 @@
 	if(pci_using_dac)
 		netdev->features |= NETIF_F_HIGHDMA;
 
- 	/* hard_start_xmit is safe against parallel locking */
- 	netdev->features |= NETIF_F_LLTX; 
- 
 	/* before reading the EEPROM, reset the controller to 
 	 * put the device in a known good starting state */
 	
@@ -732,7 +729,6 @@
 
 	atomic_set(&adapter->irq_sem, 1);
 	spin_lock_init(&adapter->stats_lock);
-	spin_lock_init(&adapter->tx_lock);
 
 	return 0;
 }
@@ -1308,7 +1304,7 @@
 
 	/* Check for Promiscuous and All Multicast modes */
 
-	spin_lock_irqsave(&adapter->tx_lock, flags);
+	spin_lock_irqsave(&netdev->xmit_lock, flags);
 
 	rctl = E1000_READ_REG(hw, RCTL);
 
@@ -1359,7 +1355,7 @@
 	if(hw->mac_type == e1000_82542_rev2_0)
 		e1000_leave_82542_rst(adapter);
 
-	spin_unlock_irqrestore(&adapter->tx_lock, flags);
+	spin_unlock_irqrestore(&netdev->xmit_lock, flags);
 }
 
 /* Need to wait a few seconds after link up to get diagnostic information from
@@ -1786,6 +1782,8 @@
 }
 
 #define TXD_USE_COUNT(S, X) (((S) >> (X)) + 1 )
+
+/* Called with dev->xmit_lock held and interrupts disabled.  */
 static int
 e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 {
@@ -1794,7 +1792,6 @@
 	unsigned int max_txd_pwr = E1000_MAX_TXD_PWR;
 	unsigned int tx_flags = 0;
 	unsigned int len = skb->len;
-	unsigned long flags;
 	unsigned int nr_frags = 0;
 	unsigned int mss = 0;
 	int count = 0;
@@ -1838,18 +1835,10 @@
 	if(adapter->pcix_82544)
 		count += nr_frags;
 
- 	local_irq_save(flags); 
- 	if (!spin_trylock(&adapter->tx_lock)) { 
- 		/* Collision - tell upper layer to requeue */ 
- 		local_irq_restore(flags); 
- 		return NETDEV_TX_LOCKED; 
- 	} 
-
 	/* need: count + 2 desc gap to keep tail from touching
 	 * head, otherwise try next time */
 	if(unlikely(E1000_DESC_UNUSED(&adapter->tx_ring) < count + 2)) {
 		netif_stop_queue(netdev);
-		spin_unlock_irqrestore(&adapter->tx_lock, flags);
 		return NETDEV_TX_BUSY;
 	}
 
@@ -1857,7 +1846,6 @@
 		if(unlikely(e1000_82547_fifo_workaround(adapter, skb))) {
 			netif_stop_queue(netdev);
 			mod_timer(&adapter->tx_fifo_stall_timer, jiffies);
-			spin_unlock_irqrestore(&adapter->tx_lock, flags);
 			return NETDEV_TX_BUSY;
 		}
 	}
@@ -1884,7 +1872,6 @@
 	if(unlikely(E1000_DESC_UNUSED(&adapter->tx_ring) < MAX_SKB_FRAGS + 2))
 		netif_stop_queue(netdev);
 
-	spin_unlock_irqrestore(&adapter->tx_lock, flags);
 	return NETDEV_TX_OK;
 }
 
@@ -2234,13 +2221,13 @@
 
 	tx_ring->next_to_clean = i;
 
-	spin_lock(&adapter->tx_lock);
+	spin_lock(&netdev->xmit_lock);
 
 	if(unlikely(cleaned && netif_queue_stopped(netdev) &&
 		    netif_carrier_ok(netdev)))
 		netif_wake_queue(netdev);
 
-	spin_unlock(&adapter->tx_lock);
+	spin_unlock(&netdev->xmit_lock);
 
 	return cleaned;
 }
===== include/linux/netdevice.h 1.96 vs edited =====
--- 1.96/include/linux/netdevice.h	2005-01-17 13:13:31 -08:00
+++ edited/include/linux/netdevice.h	2005-01-19 16:07:45 -08:00
@@ -76,7 +76,6 @@
 /* Driver transmit return codes */
 #define NETDEV_TX_OK 0		/* driver took care of packet */
 #define NETDEV_TX_BUSY 1	/* driver tx path was busy*/
-#define NETDEV_TX_LOCKED -1	/* driver tx lock was already taken */
 
 /*
  *	Compute the worst case header length according to the protocols
@@ -415,7 +414,7 @@
 #define NETIF_F_HW_VLAN_FILTER	512	/* Receive filtering on VLAN */
 #define NETIF_F_VLAN_CHALLENGED	1024	/* Device cannot handle VLAN packets */
 #define NETIF_F_TSO		2048	/* Can offload TCP/IP segmentation */
-#define NETIF_F_LLTX		4096	/* LockLess TX */
+#define NETIF_F_LLTX		4096	/* Do not grab xmit_lock during ->hard_start_xmit */
 
 	/* Called after device is detached from network. */
 	void			(*uninit)(struct net_device *dev);
@@ -894,9 +893,11 @@
 
 static inline void netif_tx_disable(struct net_device *dev)
 {
-	spin_lock_bh(&dev->xmit_lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->xmit_lock, flags);
 	netif_stop_queue(dev);
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irqrestore(&dev->xmit_lock, flags);
 }
 
 /* These functions live elsewhere (drivers/net/net_init.c, but related) */
===== net/atm/clip.c 1.43 vs edited =====
--- 1.43/net/atm/clip.c	2004-10-01 14:50:07 -07:00
+++ edited/net/atm/clip.c	2005-01-19 16:04:55 -08:00
@@ -97,7 +97,7 @@
 		printk(KERN_CRIT "!clip_vcc->entry (clip_vcc %p)\n",clip_vcc);
 		return;
 	}
-	spin_lock_bh(&entry->neigh->dev->xmit_lock);	/* block clip_start_xmit() */
+	spin_lock_irq(&entry->neigh->dev->xmit_lock);	/* block clip_start_xmit() */
 	entry->neigh->used = jiffies;
 	for (walk = &entry->vccs; *walk; walk = &(*walk)->next)
 		if (*walk == clip_vcc) {
@@ -121,7 +121,7 @@
 	printk(KERN_CRIT "ATMARP: unlink_clip_vcc failed (entry %p, vcc "
 	  "0x%p)\n",entry,clip_vcc);
 out:
-	spin_unlock_bh(&entry->neigh->dev->xmit_lock);
+	spin_unlock_irq(&entry->neigh->dev->xmit_lock);
 }
 
 /* The neighbour entry n->lock is held. */
===== net/core/dev.c 1.181 vs edited =====
--- 1.181/net/core/dev.c	2005-01-13 20:41:04 -08:00
+++ edited/net/core/dev.c	2005-01-19 16:00:08 -08:00
@@ -1190,7 +1190,7 @@
 
 #define HARD_TX_LOCK(dev, cpu) {			\
 	if ((dev->features & NETIF_F_LLTX) == 0) {	\
-		spin_lock(&dev->xmit_lock);		\
+		spin_lock_irq(&dev->xmit_lock);		\
 		dev->xmit_lock_owner = cpu;		\
 	}						\
 }
@@ -1198,7 +1198,7 @@
 #define HARD_TX_UNLOCK(dev) {				\
 	if ((dev->features & NETIF_F_LLTX) == 0) {	\
 		dev->xmit_lock_owner = -1;		\
-		spin_unlock(&dev->xmit_lock);		\
+		spin_unlock_irq(&dev->xmit_lock);	\
 	}						\
 }
 
===== net/core/dev_mcast.c 1.4 vs edited =====
--- 1.4/net/core/dev_mcast.c	2004-10-20 01:37:15 -07:00
+++ edited/net/core/dev_mcast.c	2005-01-19 16:05:35 -08:00
@@ -93,9 +93,9 @@
 
 void dev_mc_upload(struct net_device *dev)
 {
-	spin_lock_bh(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 	__dev_mc_upload(dev);
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 }
 
 /*
@@ -107,7 +107,7 @@
 	int err = 0;
 	struct dev_mc_list *dmi, **dmip;
 
-	spin_lock_bh(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 
 	for (dmip = &dev->mc_list; (dmi = *dmip) != NULL; dmip = &dmi->next) {
 		/*
@@ -139,13 +139,13 @@
 			 */
 			__dev_mc_upload(dev);
 			
-			spin_unlock_bh(&dev->xmit_lock);
+			spin_unlock_irq(&dev->xmit_lock);
 			return 0;
 		}
 	}
 	err = -ENOENT;
 done:
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 	return err;
 }
 
@@ -160,7 +160,7 @@
 
 	dmi1 = (struct dev_mc_list *)kmalloc(sizeof(*dmi), GFP_ATOMIC);
 
-	spin_lock_bh(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 	for (dmi = dev->mc_list; dmi != NULL; dmi = dmi->next) {
 		if (memcmp(dmi->dmi_addr, addr, dmi->dmi_addrlen) == 0 &&
 		    dmi->dmi_addrlen == alen) {
@@ -176,7 +176,7 @@
 	}
 
 	if ((dmi = dmi1) == NULL) {
-		spin_unlock_bh(&dev->xmit_lock);
+		spin_unlock_irq(&dev->xmit_lock);
 		return -ENOMEM;
 	}
 	memcpy(dmi->dmi_addr, addr, alen);
@@ -189,11 +189,11 @@
 
 	__dev_mc_upload(dev);
 	
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 	return 0;
 
 done:
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 	if (dmi1)
 		kfree(dmi1);
 	return err;
@@ -205,7 +205,7 @@
 
 void dev_mc_discard(struct net_device *dev)
 {
-	spin_lock_bh(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 	
 	while (dev->mc_list != NULL) {
 		struct dev_mc_list *tmp = dev->mc_list;
@@ -216,7 +216,7 @@
 	}
 	dev->mc_count = 0;
 
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 }
 
 #ifdef CONFIG_PROC_FS
@@ -251,7 +251,7 @@
 	struct dev_mc_list *m;
 	struct net_device *dev = v;
 
-	spin_lock_bh(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 	for (m = dev->mc_list; m; m = m->next) {
 		int i;
 
@@ -263,7 +263,7 @@
 
 		seq_putc(seq, '\n');
 	}
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 	return 0;
 }
 
===== net/core/netpoll.c 1.24 vs edited =====
--- 1.24/net/core/netpoll.c	2005-01-13 20:41:04 -08:00
+++ edited/net/core/netpoll.c	2005-01-19 16:05:56 -08:00
@@ -188,7 +188,7 @@
 		return;
 	}
 
-	spin_lock(&np->dev->xmit_lock);
+	spin_lock_irq(&np->dev->xmit_lock);
 	np->dev->xmit_lock_owner = smp_processor_id();
 
 	/*
@@ -197,7 +197,7 @@
 	 */
 	if (netif_queue_stopped(np->dev)) {
 		np->dev->xmit_lock_owner = -1;
-		spin_unlock(&np->dev->xmit_lock);
+		spin_unlock_irq(&np->dev->xmit_lock);
 
 		netpoll_poll(np);
 		goto repeat;
@@ -205,7 +205,7 @@
 
 	status = np->dev->hard_start_xmit(skb, np->dev);
 	np->dev->xmit_lock_owner = -1;
-	spin_unlock(&np->dev->xmit_lock);
+	spin_unlock_irq(&np->dev->xmit_lock);
 
 	/* transmit busy */
 	if(status) {
===== net/core/pktgen.c 1.21 vs edited =====
--- 1.21/net/core/pktgen.c	2005-01-10 11:32:10 -08:00
+++ edited/net/core/pktgen.c	2005-01-19 16:07:58 -08:00
@@ -2664,12 +2664,11 @@
 		}
 	}
 	
-	spin_lock_bh(&odev->xmit_lock);
+	spin_lock_irq(&odev->xmit_lock);
 	if (!netif_queue_stopped(odev)) {
 		u64 now;
 
 		atomic_inc(&(pkt_dev->skb->users));
-retry_now:
 		ret = odev->hard_start_xmit(pkt_dev->skb, odev);
 		if (likely(ret == NETDEV_TX_OK)) {
 			pkt_dev->last_ok = 1;    
@@ -2677,10 +2676,6 @@
 			pkt_dev->seq_num++;
 			pkt_dev->tx_bytes += pkt_dev->cur_pkt_size;
 			
-		} else if (ret == NETDEV_TX_LOCKED 
-			   && (odev->features & NETIF_F_LLTX)) {
-			cpu_relax();
-			goto retry_now;
 		} else {  /* Retry it next time */
 			
 			atomic_dec(&(pkt_dev->skb->users));
@@ -2716,7 +2711,7 @@
 		pkt_dev->next_tx_ns = 0;
         }
 
-	spin_unlock_bh(&odev->xmit_lock);
+	spin_unlock_irq(&odev->xmit_lock);
 	
 	/* If pkt_dev->count is zero, then run forever */
 	if ((pkt_dev->count != 0) && (pkt_dev->sofar >= pkt_dev->count)) {
===== net/sched/sch_generic.c 1.33 vs edited =====
--- 1.33/net/sched/sch_generic.c	2005-01-13 20:41:07 -08:00
+++ edited/net/sched/sch_generic.c	2005-01-19 16:10:49 -08:00
@@ -99,16 +99,11 @@
 	if ((skb = q->dequeue(q)) != NULL) {
 		unsigned nolock = (dev->features & NETIF_F_LLTX);
 		/*
-		 * When the driver has LLTX set it does its own locking
-		 * in start_xmit. No need to add additional overhead by
-		 * locking again. These checks are worth it because
-		 * even uncongested locks can be quite expensive.
-		 * The driver can do trylock like here too, in case
-		 * of lock congestion it should return -1 and the packet
-		 * will be requeued.
+		 * When the driver has LLTX set it does not require any
+		 * locking in start_xmit.
 		 */
 		if (!nolock) {
-			if (!spin_trylock(&dev->xmit_lock)) {
+			if (!spin_trylock_irq(&dev->xmit_lock)) {
 			collision:
 				/* So, someone grabbed the driver. */
 				
@@ -143,22 +138,18 @@
 				if (ret == NETDEV_TX_OK) { 
 					if (!nolock) {
 						dev->xmit_lock_owner = -1;
-						spin_unlock(&dev->xmit_lock);
+						spin_unlock_irq(&dev->xmit_lock);
 					}
 					spin_lock(&dev->queue_lock);
 					return -1;
 				}
-				if (ret == NETDEV_TX_LOCKED && nolock) {
-					spin_lock(&dev->queue_lock);
-					goto collision; 
-				}
 			}
 
 			/* NETDEV_TX_BUSY - we need to requeue */
 			/* Release the driver */
 			if (!nolock) { 
 				dev->xmit_lock_owner = -1;
-				spin_unlock(&dev->xmit_lock);
+				spin_unlock_irq(&dev->xmit_lock);
 			} 
 			spin_lock(&dev->queue_lock);
 			q = dev->qdisc;
@@ -186,7 +177,7 @@
 {
 	struct net_device *dev = (struct net_device *)arg;
 
-	spin_lock(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 	if (dev->qdisc != &noop_qdisc) {
 		if (netif_device_present(dev) &&
 		    netif_running(dev) &&
@@ -200,7 +191,7 @@
 				dev_hold(dev);
 		}
 	}
-	spin_unlock(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 
 	dev_put(dev);
 }
@@ -224,17 +215,17 @@
 
 static void dev_watchdog_up(struct net_device *dev)
 {
-	spin_lock_bh(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 	__netdev_watchdog_up(dev);
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 }
 
 static void dev_watchdog_down(struct net_device *dev)
 {
-	spin_lock_bh(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 	if (del_timer(&dev->watchdog_timer))
 		__dev_put(dev);
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 }
 
 /* "NOOP" scheduler: the best scheduler, recommended for all interfaces
===== net/sched/sch_teql.c 1.16 vs edited =====
--- 1.16/net/sched/sch_teql.c	2004-10-21 22:21:24 -07:00
+++ edited/net/sched/sch_teql.c	2005-01-19 16:06:11 -08:00
@@ -301,12 +301,12 @@
 
 		switch (teql_resolve(skb, skb_res, slave)) {
 		case 0:
-			if (spin_trylock(&slave->xmit_lock)) {
+			if (spin_trylock_irq(&slave->xmit_lock)) {
 				slave->xmit_lock_owner = smp_processor_id();
 				if (!netif_queue_stopped(slave) &&
 				    slave->hard_start_xmit(skb, slave) == 0) {
 					slave->xmit_lock_owner = -1;
-					spin_unlock(&slave->xmit_lock);
+					spin_unlock_irq(&slave->xmit_lock);
 					master->slaves = NEXT_SLAVE(q);
 					netif_wake_queue(dev);
 					master->stats.tx_packets++;
@@ -314,7 +314,7 @@
 					return 0;
 				}
 				slave->xmit_lock_owner = -1;
-				spin_unlock(&slave->xmit_lock);
+				spin_unlock_irq(&slave->xmit_lock);
 			}
 			if (netif_queue_stopped(dev))
 				busy = 1;

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [openib-general] Re: LLTX and netif_stop_queue
  2005-01-20  0:46                                         ` Stephen Hemminger
@ 2005-01-20  0:47                                           ` David S. Miller
  0 siblings, 0 replies; 62+ messages in thread
From: David S. Miller @ 2005-01-20  0:47 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: jgarzik, hadi, iod00d, eric.lemoine, roland, netdev, ak,
	openib-general, kaber

On Wed, 19 Jan 2005 16:46:14 -0800
Stephen Hemminger <shemminger@osdl.org> wrote:

> But in the end it would be safer than the current LLTX usage in drivers,
> which seems to be opening up problems.

Agreed, see my prototype patch in a posting I just made.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [openib-general] Re: LLTX and netif_stop_queue
  2005-01-19 23:41                                     ` David S. Miller
  2005-01-20  0:02                                       ` [openib-general] " Jeff Garzik
@ 2005-01-20  0:47                                       ` Francois Romieu
  2005-01-20  0:52                                         ` David S. Miller
  1 sibling, 1 reply; 62+ messages in thread
From: Francois Romieu @ 2005-01-20  0:47 UTC (permalink / raw)
  To: David S. Miller
  Cc: Stephen Hemminger, hadi, iod00d, eric.lemoine, roland, netdev, ak,
	openib-general, kaber

David S. Miller <davem@davemloft.net> :
[...]
> Originally, dev->xmit_lock was added so that drivers that were SMP dumb
> could stay that way.  Thus preserving the guarantee that there would be
> only one active call into the dev->hard_start_xmit method across the
> entire system.  I don't think any of that is relevant any longer.  All
> of our network drivers are pretty clean in this regard.

(nit)

Almost all. I used the fact that dev->hard_start_xmit was issued in
a BH-disabled context to exchange spin_lock_irqsave for ordered ops
on ring indexes so as to sync hard_start_xmit and the irq handler in
the r8169 driver. It is a bit sick, but Jon Mason reported that
avoiding the irqsave made a noticeable difference on his 4-way
ppc64, and nobody complained about it.
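
For anyone who has not seen the trick, here is a bare-bones sketch
of the barrier-based scheme (types and names are illustrative, this
is not the actual r8169 code):

#define RING_SIZE 256

struct desc {
	u32 opts;			/* stands in for the real layout */
};

struct ring {
	struct desc	desc[RING_SIZE];
	unsigned int	tx_head;	/* written only by the xmit path */
	unsigned int	tx_tail;	/* written only by the IRQ handler */
};

/* Producer (hard_start_xmit): fill the descriptor, then publish the
 * index, so the IRQ handler can never see a stale descriptor.
 */
static void ring_publish(struct ring *r, const struct desc *d)
{
	r->desc[r->tx_head % RING_SIZE] = *d;
	smp_wmb();		/* descriptor contents before index */
	r->tx_head++;
}

/* Consumer (IRQ handler): snapshot the index, then read only the
 * descriptors it covers.
 */
static void ring_reap(struct ring *r)
{
	unsigned int head = r->tx_head;

	smp_rmb();		/* index before descriptor contents */
	while (r->tx_tail != head) {
		/* ... complete r->desc[r->tx_tail % RING_SIZE] ... */
		r->tx_tail++;
	}
}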

--
Ueimor

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [openib-general] Re: LLTX and netif_stop_queue
  2005-01-20  0:47                                       ` Francois Romieu
@ 2005-01-20  0:52                                         ` David S. Miller
  2005-01-20  1:17                                           ` Francois Romieu
  0 siblings, 1 reply; 62+ messages in thread
From: David S. Miller @ 2005-01-20  0:52 UTC (permalink / raw)
  To: Francois Romieu
  Cc: shemminger, hadi, iod00d, eric.lemoine, roland, netdev, ak,
	openib-general, kaber

On Thu, 20 Jan 2005 01:47:53 +0100
Francois Romieu <romieu@fr.zoreil.com> wrote:

> David S. Miller <davem@davemloft.net> :
> [...]
> > Originally, dev->xmit_lock was added so that drivers that were SMP dumb
> > could stay that way.  Thus preserving the guarantee that there would be
> > only one active call into the dev->hard_start_xmit method across the
> > entire system.  I don't think any of that is relevant any longer.  All
> > of our network drivers are pretty clean in this regard.
> 
> (nit)
> 
> Almost all. I used the fact that dev->hard_start_xmit was issued in
> a BH-disabled context to exchange spin_lock_irqsave for ordered ops
> on ring indexes so as to sync hard_start_xmit and the irq handler in
> the r8169 driver. It is a bit sick, but Jon Mason reported that
> avoiding the irqsave made a noticeable difference on his 4-way
> ppc64, and nobody complained about it.

Hmmm... ok then.  Unfortunately, my prototype patch I just posted
will make IRQs get disabled in the ->hard_start_xmit() path.

BTW, in your close() routine you do this:

	/* Give a racing hard_start_xmit a few cycles to complete. */
	set_current_state(TASK_UNINTERRUPTIBLE);
	schedule_timeout(1);

This is no guarantee of progress.  You might instead want to
do a synchronize_kernel() or similar, which does in fact
guarantee a quiescent state.

Or if my patch goes in use spin_unlock_wait(&netdev->xmit_lock) ;-)
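
In driver terms the suggestion amounts to something like this
(hypothetical close() body, only the quiescing part shown):

/* Sketch only: wait for any racing hard_start_xmit to finish
 * instead of sleeping and hoping.
 */
static int example_close(struct net_device *dev)
{
	netif_stop_queue(dev);

	/* Any sender still inside hard_start_xmit holds
	 * dev->xmit_lock; this spins until it is released.
	 */
	spin_unlock_wait(&dev->xmit_lock);

	/* ... now safe to free rings and quiesce the chip ... */
	return 0;
}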

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [openib-general] Re: LLTX and netif_stop_queue
  2005-01-20  0:52                                         ` David S. Miller
@ 2005-01-20  1:17                                           ` Francois Romieu
  0 siblings, 0 replies; 62+ messages in thread
From: Francois Romieu @ 2005-01-20  1:17 UTC (permalink / raw)
  To: David S. Miller
  Cc: shemminger, hadi, iod00d, eric.lemoine, roland, netdev, ak,
	openib-general, kaber

David S. Miller <davem@davemloft.net> :
[...]
> Hmmm... ok then.  Unfortunately, my prototype patch I just posted
> will make IRQs get disabled in the ->hard_start_xmit() path.

I'm completely fine with the prototype patch. It will allow cutting
some code from the driver, and the r8169 has room for improvement anyway.

Point taken for the close() routine.

--
Ueimor

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20  0:46                                     ` [PATCH]: was " David S. Miller
@ 2005-01-20  3:14                                       ` Andi Kleen
  2005-01-20  7:05                                         ` David S. Miller
  2005-01-20  3:43                                       ` Roland Dreier
  2005-01-20  4:01                                       ` jamal
  2 siblings, 1 reply; 62+ messages in thread
From: Andi Kleen @ 2005-01-20  3:14 UTC (permalink / raw)
  To: David S. Miller
  Cc: Stephen Hemminger, hadi, iod00d, eric.lemoine, roland, netdev, ak,
	openib-general, kaber

On Wed, Jan 19, 2005 at 04:46:40PM -0800, David S. Miller wrote:
> Ok, here is what something like this might look like.
> 
> In summary:
> 
> 1) dev->xmit_lock is now IRQ disabling instead of BH disabling
> 2) Drivers can use dev->xmit_lock in place of their private
>    driver_priv->tx_lock and this is effectively the same gain
>    LLTX gave drivers sans the race condition which started this
>    thread.
> 3) NETDEV_TX_LOCKED is gone
> 4) LLTX stays, but this means that the driver's TX routine is
>    %100 lockless.  It is only to be used by loopback and other
>    queueless software devices such as tunnels.

Looks good to me and much cleaner than what I initially did.
Thanks.

-Andi

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20  0:46                                     ` [PATCH]: was " David S. Miller
  2005-01-20  3:14                                       ` Andi Kleen
@ 2005-01-20  3:43                                       ` Roland Dreier
  2005-01-20  7:05                                         ` David S. Miller
  2005-01-20  4:01                                       ` jamal
  2 siblings, 1 reply; 62+ messages in thread
From: Roland Dreier @ 2005-01-20  3:43 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, eric.lemoine, hadi, ak, openib-general, kaber,
	Stephen Hemminger

Looks reasonable to me.  I'll try the IPoIB changes out to make sure
it still works.

Thanks,
  Roland

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20  0:46                                     ` [PATCH]: was " David S. Miller
  2005-01-20  3:14                                       ` Andi Kleen
  2005-01-20  3:43                                       ` Roland Dreier
@ 2005-01-20  4:01                                       ` jamal
  2005-01-20  5:18                                         ` David S. Miller
  2 siblings, 1 reply; 62+ messages in thread
From: jamal @ 2005-01-20  4:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, eric.lemoine, ak, openib-general, kaber,
	Stephen Hemminger


One thing I didn't quite follow, Dave, in some of the drivers, for
example in the e1000:

-----
/* Called with dev->xmit_lock held and interrupts disabled.  */
 static int
 e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
---

Who calls that with dev->xmit_lock held and interrupts disabled?
Shouldn't the spin_unlock(&netdev->xmit_lock); be right at the top of
that routine now?

cheers,
jamal

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20  4:01                                       ` jamal
@ 2005-01-20  5:18                                         ` David S. Miller
  0 siblings, 0 replies; 62+ messages in thread
From: David S. Miller @ 2005-01-20  5:18 UTC (permalink / raw)
  To: hadi; +Cc: netdev, eric.lemoine, ak, openib-general, kaber, shemminger

On 19 Jan 2005 23:01:47 -0500
jamal <hadi@cyberus.ca> wrote:

> -----
> /* Called with dev->xmit_lock held and interrupts disabled.  */
>  static int
>  e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
> ---
> 
> Who calls that with dev->xmit_lock held and interrupts disabled?

qdisc_restart() and dev_queue_xmit(), via netdev->hard_start_xmit().

> Shouldn't the spin_unlock(&netdev->xmit_lock); be right at the top of
> that routine now?

Nope, the idea now is that netdev->xmit_lock replaces the driver
private tx_lock.
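
Condensed from the sch_generic.c and dev.c hunks in the patch, the
caller side now looks roughly like this (the wrapper name hard_tx
is made up; the real code is open-coded in the two call sites):

static int hard_tx(struct sk_buff *skb, struct net_device *dev)
{
	int ret;

	spin_lock_irq(&dev->xmit_lock);		/* HARD_TX_LOCK */
	dev->xmit_lock_owner = smp_processor_id();

	ret = dev->hard_start_xmit(skb, dev);	/* e.g. e1000_xmit_frame */

	dev->xmit_lock_owner = -1;
	spin_unlock_irq(&dev->xmit_lock);	/* HARD_TX_UNLOCK */

	return ret;
}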

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20  3:43                                       ` Roland Dreier
@ 2005-01-20  7:05                                         ` David S. Miller
  2005-01-20 13:51                                           ` Tommy Christensen
                                                             ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: David S. Miller @ 2005-01-20  7:05 UTC (permalink / raw)
  To: Roland Dreier
  Cc: netdev, eric.lemoine, hadi, ak, openib-general, kaber, shemminger

On Wed, 19 Jan 2005 19:43:42 -0800
Roland Dreier <roland@topspin.com> wrote:

> Looks reasonable to me.  I'll try the IPoIB changes out to make sure
> it still works.

Great.

I actually tested my patch out, and there was a deadlock added
to the tg3 and sungem drivers, which is fixed in this updated
version.

Basically, since the multicast code now takes dev->xmit_lock,
the ->set_multicast_list() routines should not grab it
anymore.

To prevent an ABBA deadlock, I had to remove the acquisition of
the usual priv->lock in the tg3 and sungem multicast list settings.
Grabbing priv->lock was overkill, and this new scheme of just
holding dev->xmit_lock is OK (see the schematic after the list
below), because:

1) Messing with the RX mode and multicast hash is safe even
   in parallel with RX packet processing.

2) All code paths that want to shut up all chip programming
   code paths still take both dev->xmit_lock and priv->lock
   thus still able to block out ->set_multicast_list() processing.
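
Spelled out, the cycle this avoids looks like the following
(schematic only, with tg3 names used for illustration):

/*
 *	CPU 1 (dev_mc_upload)		CPU 2 (tg3_timer)
 *	take dev->xmit_lock		take tp->lock
 *	->set_multicast_list()		take dev->xmit_lock (spins)
 *	take tp->lock (spins)		... ABBA deadlock
 *
 * With ->set_multicast_list() holding only dev->xmit_lock, CPU 1
 * never reaches for tp->lock and the cycle cannot form.  Paths
 * that need both locks still take tp->lock first, then
 * dev->xmit_lock.
 */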

So I'm going to check this in and push upstream.  Let me know
if folks find any other errors.

Thanks.

===== Documentation/networking/netdevices.txt 1.5 vs edited =====
--- 1.5/Documentation/networking/netdevices.txt	2004-09-12 16:55:06 -07:00
+++ edited/Documentation/networking/netdevices.txt	2005-01-19 16:09:50 -08:00
@@ -45,10 +45,9 @@
 	Synchronization: dev->xmit_lock spinlock.
 	When the driver sets NETIF_F_LLTX in dev->features this will be
 	called without holding xmit_lock. In this case the driver 
-	has to lock by itself when needed. It is recommended to use a try lock
-	for this and return -1 when the spin lock fails. 
-	The locking there should also properly protect against 
-	set_multicast_list
+	has to execute its transmission routine in a completely lockless
+	manner.  It is recommended only for queueless devices such as
+	loopback and tunnels.
 	Context: BHs disabled
 	Notes: netif_queue_stopped() is guaranteed false
 	Return codes: 
@@ -56,8 +55,6 @@
 	o NETDEV_TX_BUSY Cannot transmit packet, try later 
 	  Usually a bug, means queue start/stop flow control is broken in
 	  the driver. Note: the driver must NOT put the skb in its DMA ring.
-	o NETDEV_TX_LOCKED Locking failed, please retry quickly.
-	  Only valid when NETIF_F_LLTX is set.
 
 dev->tx_timeout:
 	Synchronization: dev->xmit_lock spinlock.
===== drivers/infiniband/ulp/ipoib/ipoib.h 1.3 vs edited =====
--- 1.3/drivers/infiniband/ulp/ipoib/ipoib.h	2005-01-15 14:01:50 -08:00
+++ edited/drivers/infiniband/ulp/ipoib/ipoib.h	2005-01-19 15:50:47 -08:00
@@ -104,10 +104,10 @@
 };
 
 /*
- * Device private locking: tx_lock protects members used in TX fast
- * path (and we use LLTX so upper layers don't do extra locking).
- * lock protects everything else.  lock nests inside of tx_lock (ie
- * tx_lock must be acquired first if needed).
+ * Device private locking: netdev->xmit_lock protects members used
+ * in TX fast path.
+ * lock protects everything else.  lock nests inside of xmit_lock (ie
+ * xmit_lock must be acquired first if needed).
  */
 struct ipoib_dev_priv {
 	spinlock_t lock;
@@ -150,7 +150,6 @@
 
 	struct ipoib_buf *rx_ring;
 
-	spinlock_t        tx_lock;
 	struct ipoib_buf *tx_ring;
 	unsigned          tx_head;
 	unsigned          tx_tail;
===== drivers/infiniband/ulp/ipoib/ipoib_ib.c 1.4 vs edited =====
--- 1.4/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2005-01-15 14:01:50 -08:00
+++ edited/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2005-01-19 15:51:36 -08:00
@@ -247,12 +247,12 @@
 
 		dev_kfree_skb_any(tx_req->skb);
 
-		spin_lock_irqsave(&priv->tx_lock, flags);
+		spin_lock_irqsave(&dev->xmit_lock, flags);
 		++priv->tx_tail;
 		if (netif_queue_stopped(dev) &&
 		    priv->tx_head - priv->tx_tail <= IPOIB_TX_RING_SIZE / 2)
 			netif_wake_queue(dev);
-		spin_unlock_irqrestore(&priv->tx_lock, flags);
+		spin_unlock_irqrestore(&dev->xmit_lock, flags);
 
 		if (wc->status != IB_WC_SUCCESS &&
 		    wc->status != IB_WC_WR_FLUSH_ERR)
===== drivers/infiniband/ulp/ipoib/ipoib_main.c 1.3 vs edited =====
--- 1.3/drivers/infiniband/ulp/ipoib/ipoib_main.c	2005-01-15 14:01:48 -08:00
+++ edited/drivers/infiniband/ulp/ipoib/ipoib_main.c	2005-01-19 15:53:27 -08:00
@@ -411,7 +411,7 @@
 
 	/*
 	 * We can only be called from ipoib_start_xmit, so we're
-	 * inside tx_lock -- no need to save/restore flags.
+	 * inside dev->xmit_lock -- no need to save/restore flags.
 	 */
 	spin_lock(&priv->lock);
 
@@ -483,7 +483,7 @@
 
 	/*
 	 * We can only be called from ipoib_start_xmit, so we're
-	 * inside tx_lock -- no need to save/restore flags.
+	 * inside dev->xmit_lock -- no need to save/restore flags.
 	 */
 	spin_lock(&priv->lock);
 
@@ -526,27 +526,11 @@
 	spin_unlock(&priv->lock);
 }
 
+/* Called with dev->xmit_lock held and IRQs disabled.  */
 static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_neigh *neigh;
-	unsigned long flags;
-
-	local_irq_save(flags);
-	if (!spin_trylock(&priv->tx_lock)) {
-		local_irq_restore(flags);
-		return NETDEV_TX_LOCKED;
-	}
-
-	/*
-	 * Check if our queue is stopped.  Since we have the LLTX bit
-	 * set, we can't rely on netif_stop_queue() preventing our
-	 * xmit function from being called with a full queue.
-	 */
-	if (unlikely(netif_queue_stopped(dev))) {
-		spin_unlock_irqrestore(&priv->tx_lock, flags);
-		return NETDEV_TX_BUSY;
-	}
 
 	if (skb->dst && skb->dst->neighbour) {
 		if (unlikely(!*to_ipoib_neigh(skb->dst->neighbour))) {
@@ -601,7 +585,6 @@
 	}
 
 out:
-	spin_unlock_irqrestore(&priv->tx_lock, flags);
 
 	return NETDEV_TX_OK;
 }
@@ -797,7 +780,7 @@
 	dev->addr_len 		 = INFINIBAND_ALEN;
 	dev->type 		 = ARPHRD_INFINIBAND;
 	dev->tx_queue_len 	 = IPOIB_TX_RING_SIZE * 2;
-	dev->features            = NETIF_F_VLAN_CHALLENGED | NETIF_F_LLTX;
+	dev->features            = NETIF_F_VLAN_CHALLENGED;
 
 	/* MTU will be reset when mcast join happens */
 	dev->mtu 		 = IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN;
@@ -812,7 +795,6 @@
 	priv->dev = dev;
 
 	spin_lock_init(&priv->lock);
-	spin_lock_init(&priv->tx_lock);
 
 	init_MUTEX(&priv->mcast_mutex);
 	init_MUTEX(&priv->vlan_mutex);
===== drivers/net/sungem.c 1.72 vs edited =====
--- 1.72/drivers/net/sungem.c	2004-11-05 15:56:15 -08:00
+++ edited/drivers/net/sungem.c	2005-01-19 22:29:14 -08:00
@@ -835,9 +835,9 @@
 		}
 
 		/* Run TX completion thread */
-		spin_lock(&gp->tx_lock);
+		spin_lock(&dev->xmit_lock);
 		gem_tx(dev, gp, gp->status);
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&dev->xmit_lock);
 
 		spin_unlock_irqrestore(&gp->lock, flags);
 
@@ -932,12 +932,12 @@
 	       readl(gp->regs + MAC_RXCFG));
 
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	gp->reset_task_pending = 2;
 	schedule_work(&gp->reset_task);
 
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 }
 
@@ -955,7 +955,6 @@
 	struct gem *gp = dev->priv;
 	int entry;
 	u64 ctrl;
-	unsigned long flags;
 
 	ctrl = 0;
 	if (skb->ip_summed == CHECKSUM_HW) {
@@ -969,17 +968,9 @@
 			(csum_stuff_off << 21));
 	}
 
-	local_irq_save(flags);
-	if (!spin_trylock(&gp->tx_lock)) {
-		/* Tell upper layer to requeue */
-		local_irq_restore(flags);
-		return NETDEV_TX_LOCKED;
-	}
-
 	/* This is a hard error, log it. */
 	if (TX_BUFFS_AVAIL(gp) <= (skb_shinfo(skb)->nr_frags + 1)) {
 		netif_stop_queue(dev);
-		spin_unlock_irqrestore(&gp->tx_lock, flags);
 		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
 		       dev->name);
 		return NETDEV_TX_BUSY;
@@ -1066,7 +1057,6 @@
 		       dev->name, entry, skb->len);
 	mb();
 	writel(gp->tx_new, gp->regs + TXDMA_KICK);
-	spin_unlock_irqrestore(&gp->tx_lock, flags);
 
 	dev->trans_start = jiffies;
 
@@ -1097,11 +1087,11 @@
 	}
 
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	dev->mtu = new_mtu;
 	gp->reset_task_pending = 1;
 	schedule_work(&gp->reset_task);
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	flush_scheduled_work();
@@ -1111,7 +1101,7 @@
 
 #define STOP_TRIES 32
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_stop(struct gem *gp)
 {
 	int limit;
@@ -1137,7 +1127,7 @@
 		printk(KERN_ERR "%s: SW reset is ghetto.\n", gp->dev->name);
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_start_dma(struct gem *gp)
 {
 	unsigned long val;
@@ -1162,7 +1152,7 @@
 }
 
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 // XXX dbl check what that function should do when called on PCS PHY
 static void gem_begin_auto_negotiation(struct gem *gp, struct ethtool_cmd *ep)
 {
@@ -1249,7 +1239,7 @@
 /* A link-up condition has occurred, initialize and enable the
  * rest of the chip.
  *
- * Must be invoked under gp->lock and gp->tx_lock.
+ * Must be invoked under gp->lock and dev->xmit_lock.
  */
 static int gem_set_link_modes(struct gem *gp)
 {
@@ -1356,7 +1346,7 @@
 	return 0;
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static int gem_mdio_link_not_up(struct gem *gp)
 {
 	switch (gp->lstate) {
@@ -1414,7 +1404,7 @@
 
 	netif_poll_disable(gp->dev);
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&gp->dev->xmit_lock);
 
 	if (gp->hw_running && gp->opened) {
 		netif_stop_queue(gp->dev);
@@ -1430,7 +1420,7 @@
 	}
 	gp->reset_task_pending = 0;
 
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&gp->dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 	netif_poll_enable(gp->dev);
 }
@@ -1444,7 +1434,7 @@
 		return;
 
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&gp->dev->xmit_lock);
 
 	/* If the link of task is still pending, we just
 	 * reschedule the link timer
@@ -1514,11 +1504,11 @@
 restart:
 	mod_timer(&gp->link_timer, jiffies + ((12 * HZ) / 10));
 out_unlock:
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&gp->dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_clean_rings(struct gem *gp)
 {
 	struct gem_init_block *gb = gp->init_block;
@@ -1569,7 +1559,7 @@
 	}
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_init_rings(struct gem *gp)
 {
 	struct gem_init_block *gb = gp->init_block;
@@ -1619,7 +1609,7 @@
 	wmb();
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_init_phy(struct gem *gp)
 {
 	u32 mifcfg;
@@ -1757,7 +1747,7 @@
 	}
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_init_dma(struct gem *gp)
 {
 	u64 desc_dma = (u64) gp->gblock_dvma;
@@ -1795,7 +1785,7 @@
 		       gp->regs + RXDMA_BLANK);
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under dev->xmit_lock. */
 static u32
 gem_setup_multicast(struct gem *gp)
 {
@@ -1838,7 +1828,7 @@
 	return rxcfg;
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_init_mac(struct gem *gp)
 {
 	unsigned char *e = &gp->dev->dev_addr[0];
@@ -1916,7 +1906,7 @@
 	writel(0xffffffff, gp->regs + MAC_MCMASK);
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_init_pause_thresholds(struct gem *gp)
 {
        	u32 cfg;
@@ -2052,7 +2042,7 @@
 	return 0;
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_init_hw(struct gem *gp, int restart_link)
 {
 	/* On Apple's gmac, I initialize the PHY only after
@@ -2150,11 +2140,11 @@
 
 	if (!gp->wake_on_lan) {
 		spin_lock_irqsave(&gp->lock, flags);
-		spin_lock(&gp->tx_lock);
+		spin_lock(&gp->dev->xmit_lock);
 		gem_stop(gp);
 		writel(MAC_TXRST_CMD, gp->regs + MAC_TXRST);
 		writel(MAC_RXRST_CMD, gp->regs + MAC_RXRST);
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&gp->dev->xmit_lock);
 		spin_unlock_irqrestore(&gp->lock, flags);
 	}
 
@@ -2202,9 +2192,9 @@
 		unsigned long flags;
 
 		spin_lock_irqsave(&gp->lock, flags);
-		spin_lock(&gp->tx_lock);
+		spin_lock(&gp->dev->xmit_lock);
 		gem_stop(gp);
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&gp->dev->xmit_lock);
 		spin_unlock_irqrestore(&gp->lock, flags);
 	}
 }
@@ -2265,9 +2255,9 @@
 
 		/* Reset the chip */
 		spin_lock_irq(&gp->lock);
-		spin_lock(&gp->tx_lock);
+		spin_lock(&gp->dev->xmit_lock);
 		gem_stop(gp);
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&gp->dev->xmit_lock);
 		spin_unlock_irq(&gp->lock);
 
 		gp->hw_running = 1;
@@ -2281,7 +2271,7 @@
 		printk(KERN_ERR "%s: failed to request irq !\n", gp->dev->name);
 
 		spin_lock_irq(&gp->lock);
-		spin_lock(&gp->tx_lock);
+		spin_lock(&gp->dev->xmit_lock);
 #ifdef CONFIG_PPC_PMAC
 		if (!hw_was_up && gp->pdev->vendor == PCI_VENDOR_ID_APPLE)
 			gem_apple_powerdown(gp);
@@ -2290,14 +2280,14 @@
 		gp->pm_timer.expires = jiffies + 10*HZ;
 		add_timer(&gp->pm_timer);
 		up(&gp->pm_sem);
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&gp->dev->xmit_lock);
 		spin_unlock_irq(&gp->lock);
 
 		return -EAGAIN;
 	}
 
        	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&gp->dev->xmit_lock);
 
 	/* Allocate & setup ring buffers */
 	gem_init_rings(gp);
@@ -2307,7 +2297,7 @@
 
 	gp->opened = 1;
 
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&gp->dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	up(&gp->pm_sem);
@@ -2328,7 +2318,7 @@
 
 	/* Stop traffic, mark us closed */
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&gp->dev->xmit_lock);
 
 	gp->opened = 0;	
 
@@ -2343,7 +2333,7 @@
 	/* Bye, the pm timer will finish the job */
 	free_irq(gp->pdev->irq, (void *) dev);
 
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&gp->dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	/* Fire the PM timer that will shut us down in about 10 seconds */
@@ -2374,7 +2364,7 @@
 	/* If the driver is opened, we stop the DMA */
 	if (gp->opened) {
 		spin_lock_irq(&gp->lock);
-		spin_lock(&gp->tx_lock);
+		spin_lock(&gp->dev->xmit_lock);
 
 		/* Stop traffic, mark us closed */
 		netif_device_detach(dev);
@@ -2385,7 +2375,7 @@
 		/* Get rid of ring buffers */
 		gem_clean_rings(gp);
 
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&gp->dev->xmit_lock);
 		spin_unlock_irq(&gp->lock);
 
 		if (gp->pdev->vendor == PCI_VENDOR_ID_APPLE)
@@ -2419,14 +2409,14 @@
 		}
 #endif /* CONFIG_PPC_PMAC */
 		spin_lock_irq(&gp->lock);
-		spin_lock(&gp->tx_lock);
+		spin_lock(&gp->dev->xmit_lock);
 
 		gem_stop(gp);
 		gp->hw_running = 1;
 		gem_init_rings(gp);
 		gem_init_hw(gp, 1);
 
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&gp->dev->xmit_lock);
 		spin_unlock_irq(&gp->lock);
 
 		netif_device_attach(dev);
@@ -2447,7 +2437,7 @@
 	struct net_device_stats *stats = &gp->net_stats;
 
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	if (gp->hw_running) {
 		stats->rx_crc_errors += readl(gp->regs + MAC_FCSERR);
@@ -2467,12 +2457,13 @@
 		writel(0, gp->regs + MAC_LCOLL);
 	}
 
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	return &gp->net_stats;
 }
 
+/* Called with dev->xmit_lock held and IRQs disabled.  */
 static void gem_set_multicast(struct net_device *dev)
 {
 	struct gem *gp = dev->priv;
@@ -2482,9 +2473,6 @@
 	if (!gp->hw_running)
 		return;
 		
-	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
-
 	netif_stop_queue(dev);
 
 	rxcfg = readl(gp->regs + MAC_RXCFG);
@@ -2507,9 +2495,6 @@
 	writel(rxcfg, gp->regs + MAC_RXCFG);
 
 	netif_wake_queue(dev);
-
-	spin_unlock(&gp->tx_lock);
-	spin_unlock_irq(&gp->lock);
 }
 
 static void gem_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info)
@@ -2540,7 +2525,7 @@
 
 		/* Return current PHY settings */
 		spin_lock_irq(&gp->lock);
-		spin_lock(&gp->tx_lock);
+		spin_lock(&dev->xmit_lock);
 		cmd->autoneg = gp->want_autoneg;
 		cmd->speed = gp->phy_mii.speed;
 		cmd->duplex = gp->phy_mii.duplex;			
@@ -2552,7 +2537,7 @@
 		 */
 		if (cmd->advertising == 0)
 			cmd->advertising = cmd->supported;
-		spin_unlock(&gp->tx_lock);
+		spin_unlock(&dev->xmit_lock);
 		spin_unlock_irq(&gp->lock);
 	} else { // XXX PCS ?
 		cmd->supported =
@@ -2592,9 +2577,9 @@
 	      
 	/* Apply settings and restart link process. */
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	gem_begin_auto_negotiation(gp, cmd);
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	return 0;
@@ -2609,9 +2594,9 @@
 
 	/* Restart link process. */
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	gem_begin_auto_negotiation(gp, NULL);
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	return 0;
@@ -2863,7 +2848,6 @@
 	gp->msg_enable = DEFAULT_MSG;
 
 	spin_lock_init(&gp->lock);
-	spin_lock_init(&gp->tx_lock);
 	init_MUTEX(&gp->pm_sem);
 
 	init_timer(&gp->link_timer);
@@ -2899,9 +2883,9 @@
 		gem_apple_powerup(gp);
 #endif
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	gem_stop(gp);
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	/* Fill up the mii_phy structure (even if we won't use it) */
@@ -2967,11 +2951,11 @@
 
 	/* Detect & init PHY, start autoneg */
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	gp->hw_running = 1;
 	gem_init_phy(gp);
 	gem_begin_auto_negotiation(gp, NULL);
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	if (gp->phy_type == phy_mii_mdio0 ||
@@ -2982,7 +2966,7 @@
 	pci_set_drvdata(pdev, dev);
 
 	/* GEM can do it all... */
-	dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_LLTX;
+	dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM;
 	if (pci_using_dac)
 		dev->features |= NETIF_F_HIGHDMA;
 
===== drivers/net/sungem.h 1.18 vs edited =====
--- 1.18/drivers/net/sungem.h	2004-09-15 08:16:05 -07:00
+++ edited/drivers/net/sungem.h	2005-01-19 15:54:08 -08:00
@@ -953,7 +953,6 @@
 
 struct gem {
 	spinlock_t lock;
-	spinlock_t tx_lock;
 	void __iomem *regs;
 	int rx_new, rx_old;
 	int tx_new, tx_old;
===== drivers/net/tg3.c 1.227 vs edited =====
--- 1.227/drivers/net/tg3.c	2005-01-17 13:40:31 -08:00
+++ edited/drivers/net/tg3.c	2005-01-19 16:41:08 -08:00
@@ -2816,9 +2816,9 @@
 
 	/* run TX completion thread */
 	if (sblk->idx[0].tx_consumer != tp->tx_cons) {
-		spin_lock(&tp->tx_lock);
+		spin_lock(&netdev->xmit_lock);
 		tg3_tx(tp);
-		spin_unlock(&tp->tx_lock);
+		spin_unlock(&netdev->xmit_lock);
 	}
 
 	spin_unlock_irqrestore(&tp->lock, flags);
@@ -2939,7 +2939,7 @@
 	tg3_netif_stop(tp);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&tp->dev->xmit_lock);
 
 	restart_timer = tp->tg3_flags2 & TG3_FLG2_RESTART_TIMER;
 	tp->tg3_flags2 &= ~TG3_FLG2_RESTART_TIMER;
@@ -2949,7 +2949,7 @@
 
 	tg3_netif_start(tp);
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&tp->dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	if (restart_timer)
@@ -3048,6 +3048,7 @@
 		(base + len + 8 < base));
 }
 
+/* dev->xmit_lock is held and IRQs are disabled.  */
 static int tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct tg3 *tp = netdev_priv(dev);
@@ -3055,39 +3056,12 @@
 	unsigned int i;
 	u32 len, entry, base_flags, mss;
 	int would_hit_hwbug;
-	unsigned long flags;
 
 	len = skb_headlen(skb);
 
-	/* No BH disabling for tx_lock here.  We are running in BH disabled
-	 * context and TX reclaim runs via tp->poll inside of a software
-	 * interrupt.  Rejoice!
-	 *
-	 * Actually, things are not so simple.  If we are to take a hw
-	 * IRQ here, we can deadlock, consider:
-	 *
-	 *       CPU1		CPU2
-	 *   tg3_start_xmit
-	 *   take tp->tx_lock
-	 *			tg3_timer
-	 *			take tp->lock
-	 *   tg3_interrupt
-	 *   spin on tp->lock
-	 *			spin on tp->tx_lock
-	 *
-	 * So we really do need to disable interrupts when taking
-	 * tx_lock here.
-	 */
-	local_irq_save(flags);
-	if (!spin_trylock(&tp->tx_lock)) { 
-		local_irq_restore(flags);
-		return NETDEV_TX_LOCKED; 
-	} 
-
 	/* This is a hard error, log it. */
 	if (unlikely(TX_BUFFS_AVAIL(tp) <= (skb_shinfo(skb)->nr_frags + 1))) {
 		netif_stop_queue(dev);
-		spin_unlock_irqrestore(&tp->tx_lock, flags);
 		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
 		       dev->name);
 		return NETDEV_TX_BUSY;
@@ -3224,7 +3198,7 @@
 						entry, len,
 						last_plus_one,
 						&start, mss))
-			goto out_unlock;
+			goto out;
 
 		entry = start;
 	}
@@ -3236,9 +3210,8 @@
 	if (TX_BUFFS_AVAIL(tp) <= (MAX_SKB_FRAGS + 1))
 		netif_stop_queue(dev);
 
-out_unlock:
+out:
     	mmiowb();
-	spin_unlock_irqrestore(&tp->tx_lock, flags);
 
 	dev->trans_start = jiffies;
 
@@ -3273,7 +3246,7 @@
 
 	tg3_netif_stop(tp);
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	tg3_halt(tp);
 
@@ -3283,7 +3256,7 @@
 
 	tg3_netif_start(tp);
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	return 0;
@@ -5574,7 +5547,7 @@
 	unsigned long flags;
 
 	spin_lock_irqsave(&tp->lock, flags);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&tp->dev->xmit_lock);
 
 	/* All of this garbage is because when using non-tagged
 	 * IRQ status the mailbox/status_block protocol the chip
@@ -5590,7 +5563,7 @@
 
 	if (!(tr32(WDMAC_MODE) & WDMAC_MODE_ENABLE)) {
 		tp->tg3_flags2 |= TG3_FLG2_RESTART_TIMER;
-		spin_unlock(&tp->tx_lock);
+		spin_unlock(&tp->dev->xmit_lock);
 		spin_unlock_irqrestore(&tp->lock, flags);
 		schedule_work(&tp->reset_task);
 		return;
@@ -5659,7 +5632,7 @@
 		tp->asf_counter = tp->asf_multiplier;
 	}
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&tp->dev->xmit_lock);
 	spin_unlock_irqrestore(&tp->lock, flags);
 
 	tp->timer.expires = jiffies + tp->timer_offset;
@@ -5672,12 +5645,12 @@
 	int err;
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	tg3_disable_ints(tp);
 	tp->tg3_flags &= ~TG3_FLAG_INIT_COMPLETE;
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	/* The placement of this call is tied
@@ -5696,7 +5669,7 @@
 	}
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	err = tg3_init_hw(tp);
 	if (err) {
@@ -5716,7 +5689,7 @@
 		tp->tg3_flags |= TG3_FLAG_INIT_COMPLETE;
 	}
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	if (err) {
@@ -5726,11 +5699,11 @@
 	}
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	tg3_enable_ints(tp);
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	netif_start_queue(dev);
@@ -5978,7 +5951,7 @@
 	del_timer_sync(&tp->timer);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 #if 0
 	tg3_dump_state(tp);
 #endif
@@ -5992,7 +5965,7 @@
 		  TG3_FLAG_GOT_SERDES_FLOWCTL);
 	netif_carrier_off(tp->dev);
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	free_irq(dev->irq, dev);
@@ -6291,15 +6264,10 @@
 	}
 }
 
+/* Called with dev->xmit_lock held and IRQs disabled.  */
 static void tg3_set_rx_mode(struct net_device *dev)
 {
-	struct tg3 *tp = netdev_priv(dev);
-
-	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
 	__tg3_set_rx_mode(dev);
-	spin_unlock(&tp->tx_lock);
-	spin_unlock_irq(&tp->lock);
 }
 
 #define TG3_REGDUMP_LEN		(32 * 1024)
@@ -6322,7 +6290,7 @@
 	memset(p, 0, TG3_REGDUMP_LEN);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 #define __GET_REG32(reg)	(*(p)++ = tr32(reg))
 #define GET_REG32_LOOP(base,len)		\
@@ -6372,7 +6340,7 @@
 #undef GET_REG32_LOOP
 #undef GET_REG32_1
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 }
 
@@ -6496,7 +6464,7 @@
 	}
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	tp->link_config.autoneg = cmd->autoneg;
 	if (cmd->autoneg == AUTONEG_ENABLE) {
@@ -6510,7 +6478,7 @@
   	}
   
 	tg3_setup_phy(tp, 1);
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
   
 	return 0;
@@ -6627,7 +6595,7 @@
   
 	tg3_netif_stop(tp);
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
   
 	tp->rx_pending = ering->rx_pending;
 
@@ -6640,7 +6608,7 @@
 	tg3_halt(tp);
 	tg3_init_hw(tp);
 	tg3_netif_start(tp);
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
   
 	return 0;
@@ -6661,7 +6629,7 @@
   
 	tg3_netif_stop(tp);
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	if (epause->autoneg)
 		tp->tg3_flags |= TG3_FLAG_PAUSE_AUTONEG;
 	else
@@ -6677,7 +6645,7 @@
 	tg3_halt(tp);
 	tg3_init_hw(tp);
 	tg3_netif_start(tp);
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
   
 	return 0;
@@ -6803,14 +6771,14 @@
 	struct tg3 *tp = netdev_priv(dev);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	tp->vlgrp = grp;
 
 	/* Update RX_MODE_KEEP_VLAN_TAG bit in RX_MODE register. */
 	__tg3_set_rx_mode(dev);
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 }
 
@@ -6819,10 +6787,10 @@
 	struct tg3 *tp = netdev_priv(dev);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	if (tp->vlgrp)
 		tp->vlgrp->vlan_devices[vid] = NULL;
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 }
 #endif
@@ -8241,7 +8209,6 @@
 
 	if (pci_using_dac)
 		dev->features |= NETIF_F_HIGHDMA;
-	dev->features |= NETIF_F_LLTX;
 #if TG3_VLAN_TAG_USED
 	dev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
 	dev->vlan_rx_register = tg3_vlan_rx_register;
@@ -8283,7 +8250,6 @@
 	tp->grc_mode |= GRC_MODE_BSWAP_NONFRM_DATA;
 #endif
 	spin_lock_init(&tp->lock);
-	spin_lock_init(&tp->tx_lock);
 	spin_lock_init(&tp->indirect_lock);
 	INIT_WORK(&tp->reset_task, tg3_reset_task, tp);
 
@@ -8496,23 +8462,23 @@
 	del_timer_sync(&tp->timer);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	tg3_disable_ints(tp);
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	netif_device_detach(dev);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	tg3_halt(tp);
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	err = tg3_set_power_state(tp, state);
 	if (err) {
 		spin_lock_irq(&tp->lock);
-		spin_lock(&tp->tx_lock);
+		spin_lock(&dev->xmit_lock);
 
 		tg3_init_hw(tp);
 
@@ -8522,7 +8488,7 @@
 		netif_device_attach(dev);
 		tg3_netif_start(tp);
 
-		spin_unlock(&tp->tx_lock);
+		spin_unlock(&dev->xmit_lock);
 		spin_unlock_irq(&tp->lock);
 	}
 
@@ -8547,7 +8513,7 @@
 	netif_device_attach(dev);
 
 	spin_lock_irq(&tp->lock);
-	spin_lock(&tp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	tg3_init_hw(tp);
 
@@ -8558,7 +8524,7 @@
 
 	tg3_netif_start(tp);
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&tp->lock);
 
 	return 0;
===== drivers/net/tg3.h 1.52 vs edited =====
--- 1.52/drivers/net/tg3.h	2004-11-15 15:41:31 -08:00
+++ edited/drivers/net/tg3.h	2005-01-19 15:37:29 -08:00
@@ -1980,12 +1980,11 @@
 	 * lock: Held during all operations except TX packet
 	 *       processing.
 	 *
-	 * tx_lock: Held during tg3_start_xmit{,_4gbug} and tg3_tx
+	 * dev->xmit_lock: Held during tg3_start_xmit and tg3_tx
 	 *
 	 * If you want to shut up all asynchronous processing you must
-	 * acquire both locks, 'lock' taken before 'tx_lock'.  IRQs must
-	 * be disabled to take 'lock' but only softirq disabling is
-	 * necessary for acquisition of 'tx_lock'.
+	 * acquire both locks, 'lock' taken before 'xmit_lock'.  IRQs must
+	 * be disabled to take either lock.
 	 */
 	spinlock_t			lock;
 	spinlock_t			indirect_lock;
@@ -2003,8 +2002,6 @@
 	u32				tx_prod;
 	u32				tx_cons;
 	u32				tx_pending;
-
-	spinlock_t			tx_lock;
 
 	struct tg3_tx_buffer_desc	*tx_ring;
 	struct tx_ring_info		*tx_buffers;
===== drivers/net/e1000/e1000.h 1.58 vs edited =====
--- 1.58/drivers/net/e1000/e1000.h	2004-10-25 18:13:57 -07:00
+++ edited/drivers/net/e1000/e1000.h	2005-01-19 15:45:31 -08:00
@@ -209,7 +209,6 @@
 
 	/* TX */
 	struct e1000_desc_ring tx_ring;
-	spinlock_t tx_lock;
 	uint32_t txd_cmd;
 	uint32_t tx_int_delay;
 	uint32_t tx_abs_int_delay;
===== drivers/net/e1000/e1000_main.c 1.145 vs edited =====
--- 1.145/drivers/net/e1000/e1000_main.c	2005-01-10 21:03:21 -08:00
+++ edited/drivers/net/e1000/e1000_main.c	2005-01-19 16:37:55 -08:00
@@ -291,7 +291,9 @@
 			e1000_phy_reset(&adapter->hw);
 	}
 
+	spin_lock_irq(&netdev->xmit_lock);
 	e1000_set_multi(netdev);
+	spin_unlock_irq(&netdev->xmit_lock);
 
 	e1000_restore_vlan(adapter);
 
@@ -520,9 +522,6 @@
 	if(pci_using_dac)
 		netdev->features |= NETIF_F_HIGHDMA;
 
- 	/* hard_start_xmit is safe against parallel locking */
- 	netdev->features |= NETIF_F_LLTX; 
- 
 	/* before reading the EEPROM, reset the controller to 
 	 * put the device in a known good starting state */
 	
@@ -732,7 +731,6 @@
 
 	atomic_set(&adapter->irq_sem, 1);
 	spin_lock_init(&adapter->stats_lock);
-	spin_lock_init(&adapter->tx_lock);
 
 	return 0;
 }
@@ -1293,6 +1291,8 @@
  * list or the network interface flags are updated.  This routine is
  * responsible for configuring the hardware for proper multicast,
  * promiscuous mode, and all-multi behavior.
+ *
+ * Called with netdev->xmit_lock held and IRQs disabled.
  **/
 
 static void
@@ -1304,12 +1304,9 @@
 	uint32_t rctl;
 	uint32_t hash_value;
 	int i;
-	unsigned long flags;
 
 	/* Check for Promiscuous and All Multicast modes */
 
-	spin_lock_irqsave(&adapter->tx_lock, flags);
-
 	rctl = E1000_READ_REG(hw, RCTL);
 
 	if(netdev->flags & IFF_PROMISC) {
@@ -1358,8 +1355,6 @@
 
 	if(hw->mac_type == e1000_82542_rev2_0)
 		e1000_leave_82542_rst(adapter);
-
-	spin_unlock_irqrestore(&adapter->tx_lock, flags);
 }
 
 /* Need to wait a few seconds after link up to get diagnostic information from
@@ -1786,6 +1781,8 @@
 }
 
 #define TXD_USE_COUNT(S, X) (((S) >> (X)) + 1 )
+
+/* Called with dev->xmit_lock held and interrupts disabled.  */
 static int
 e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 {
@@ -1794,7 +1791,6 @@
 	unsigned int max_txd_pwr = E1000_MAX_TXD_PWR;
 	unsigned int tx_flags = 0;
 	unsigned int len = skb->len;
-	unsigned long flags;
 	unsigned int nr_frags = 0;
 	unsigned int mss = 0;
 	int count = 0;
@@ -1838,18 +1834,10 @@
 	if(adapter->pcix_82544)
 		count += nr_frags;
 
- 	local_irq_save(flags); 
- 	if (!spin_trylock(&adapter->tx_lock)) { 
- 		/* Collision - tell upper layer to requeue */ 
- 		local_irq_restore(flags); 
- 		return NETDEV_TX_LOCKED; 
- 	} 
-
 	/* need: count + 2 desc gap to keep tail from touching
 	 * head, otherwise try next time */
 	if(unlikely(E1000_DESC_UNUSED(&adapter->tx_ring) < count + 2)) {
 		netif_stop_queue(netdev);
-		spin_unlock_irqrestore(&adapter->tx_lock, flags);
 		return NETDEV_TX_BUSY;
 	}
 
@@ -1857,7 +1845,6 @@
 		if(unlikely(e1000_82547_fifo_workaround(adapter, skb))) {
 			netif_stop_queue(netdev);
 			mod_timer(&adapter->tx_fifo_stall_timer, jiffies);
-			spin_unlock_irqrestore(&adapter->tx_lock, flags);
 			return NETDEV_TX_BUSY;
 		}
 	}
@@ -1884,7 +1871,6 @@
 	if(unlikely(E1000_DESC_UNUSED(&adapter->tx_ring) < MAX_SKB_FRAGS + 2))
 		netif_stop_queue(netdev);
 
-	spin_unlock_irqrestore(&adapter->tx_lock, flags);
 	return NETDEV_TX_OK;
 }
 
@@ -2234,13 +2220,13 @@
 
 	tx_ring->next_to_clean = i;
 
-	spin_lock(&adapter->tx_lock);
+	spin_lock(&netdev->xmit_lock);
 
 	if(unlikely(cleaned && netif_queue_stopped(netdev) &&
 		    netif_carrier_ok(netdev)))
 		netif_wake_queue(netdev);
 
-	spin_unlock(&adapter->tx_lock);
+	spin_unlock(&netdev->xmit_lock);
 
 	return cleaned;
 }
@@ -2819,7 +2805,10 @@
 
 	if(wufc) {
 		e1000_setup_rctl(adapter);
+
+		spin_lock_irq(&netdev->xmit_lock);
 		e1000_set_multi(netdev);
+		spin_unlock_irq(&netdev->xmit_lock);
 
 		/* turn on all-multi mode if wake on multicast is enabled */
 		if(adapter->wol & E1000_WUFC_MC) {
===== include/linux/netdevice.h 1.96 vs edited =====
--- 1.96/include/linux/netdevice.h	2005-01-17 13:13:31 -08:00
+++ edited/include/linux/netdevice.h	2005-01-19 16:07:45 -08:00
@@ -76,7 +76,6 @@
 /* Driver transmit return codes */
 #define NETDEV_TX_OK 0		/* driver took care of packet */
 #define NETDEV_TX_BUSY 1	/* driver tx path was busy*/
-#define NETDEV_TX_LOCKED -1	/* driver tx lock was already taken */
 
 /*
  *	Compute the worst case header length according to the protocols
@@ -415,7 +414,7 @@
 #define NETIF_F_HW_VLAN_FILTER	512	/* Receive filtering on VLAN */
 #define NETIF_F_VLAN_CHALLENGED	1024	/* Device cannot handle VLAN packets */
 #define NETIF_F_TSO		2048	/* Can offload TCP/IP segmentation */
-#define NETIF_F_LLTX		4096	/* LockLess TX */
+#define NETIF_F_LLTX		4096	/* Do not grab xmit_lock during ->hard_start_xmit */
 
 	/* Called after device is detached from network. */
 	void			(*uninit)(struct net_device *dev);
@@ -894,9 +893,11 @@
 
 static inline void netif_tx_disable(struct net_device *dev)
 {
-	spin_lock_bh(&dev->xmit_lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->xmit_lock, flags);
 	netif_stop_queue(dev);
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irqrestore(&dev->xmit_lock, flags);
 }
 
 /* These functions live elsewhere (drivers/net/net_init.c, but related) */
===== net/atm/clip.c 1.43 vs edited =====
--- 1.43/net/atm/clip.c	2004-10-01 14:50:07 -07:00
+++ edited/net/atm/clip.c	2005-01-19 16:04:55 -08:00
@@ -97,7 +97,7 @@
 		printk(KERN_CRIT "!clip_vcc->entry (clip_vcc %p)\n",clip_vcc);
 		return;
 	}
-	spin_lock_bh(&entry->neigh->dev->xmit_lock);	/* block clip_start_xmit() */
+	spin_lock_irq(&entry->neigh->dev->xmit_lock);	/* block clip_start_xmit() */
 	entry->neigh->used = jiffies;
 	for (walk = &entry->vccs; *walk; walk = &(*walk)->next)
 		if (*walk == clip_vcc) {
@@ -121,7 +121,7 @@
 	printk(KERN_CRIT "ATMARP: unlink_clip_vcc failed (entry %p, vcc "
 	  "0x%p)\n",entry,clip_vcc);
 out:
-	spin_unlock_bh(&entry->neigh->dev->xmit_lock);
+	spin_unlock_irq(&entry->neigh->dev->xmit_lock);
 }
 
 /* The neighbour entry n->lock is held. */
===== net/core/dev.c 1.181 vs edited =====
--- 1.181/net/core/dev.c	2005-01-13 20:41:04 -08:00
+++ edited/net/core/dev.c	2005-01-19 16:00:08 -08:00
@@ -1190,7 +1190,7 @@
 
 #define HARD_TX_LOCK(dev, cpu) {			\
 	if ((dev->features & NETIF_F_LLTX) == 0) {	\
-		spin_lock(&dev->xmit_lock);		\
+		spin_lock_irq(&dev->xmit_lock);		\
 		dev->xmit_lock_owner = cpu;		\
 	}						\
 }
@@ -1198,7 +1198,7 @@
 #define HARD_TX_UNLOCK(dev) {				\
 	if ((dev->features & NETIF_F_LLTX) == 0) {	\
 		dev->xmit_lock_owner = -1;		\
-		spin_unlock(&dev->xmit_lock);		\
+		spin_unlock_irq(&dev->xmit_lock);	\
 	}						\
 }
 
===== net/core/dev_mcast.c 1.4 vs edited =====
--- 1.4/net/core/dev_mcast.c	2004-10-20 01:37:15 -07:00
+++ edited/net/core/dev_mcast.c	2005-01-19 16:05:35 -08:00
@@ -93,9 +93,9 @@
 
 void dev_mc_upload(struct net_device *dev)
 {
-	spin_lock_bh(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 	__dev_mc_upload(dev);
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 }
 
 /*
@@ -107,7 +107,7 @@
 	int err = 0;
 	struct dev_mc_list *dmi, **dmip;
 
-	spin_lock_bh(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 
 	for (dmip = &dev->mc_list; (dmi = *dmip) != NULL; dmip = &dmi->next) {
 		/*
@@ -139,13 +139,13 @@
 			 */
 			__dev_mc_upload(dev);
 			
-			spin_unlock_bh(&dev->xmit_lock);
+			spin_unlock_irq(&dev->xmit_lock);
 			return 0;
 		}
 	}
 	err = -ENOENT;
 done:
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 	return err;
 }
 
@@ -160,7 +160,7 @@
 
 	dmi1 = (struct dev_mc_list *)kmalloc(sizeof(*dmi), GFP_ATOMIC);
 
-	spin_lock_bh(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 	for (dmi = dev->mc_list; dmi != NULL; dmi = dmi->next) {
 		if (memcmp(dmi->dmi_addr, addr, dmi->dmi_addrlen) == 0 &&
 		    dmi->dmi_addrlen == alen) {
@@ -176,7 +176,7 @@
 	}
 
 	if ((dmi = dmi1) == NULL) {
-		spin_unlock_bh(&dev->xmit_lock);
+		spin_unlock_irq(&dev->xmit_lock);
 		return -ENOMEM;
 	}
 	memcpy(dmi->dmi_addr, addr, alen);
@@ -189,11 +189,11 @@
 
 	__dev_mc_upload(dev);
 	
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 	return 0;
 
 done:
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 	if (dmi1)
 		kfree(dmi1);
 	return err;
@@ -205,7 +205,7 @@
 
 void dev_mc_discard(struct net_device *dev)
 {
-	spin_lock_bh(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 	
 	while (dev->mc_list != NULL) {
 		struct dev_mc_list *tmp = dev->mc_list;
@@ -216,7 +216,7 @@
 	}
 	dev->mc_count = 0;
 
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 }
 
 #ifdef CONFIG_PROC_FS
@@ -251,7 +251,7 @@
 	struct dev_mc_list *m;
 	struct net_device *dev = v;
 
-	spin_lock_bh(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 	for (m = dev->mc_list; m; m = m->next) {
 		int i;
 
@@ -263,7 +263,7 @@
 
 		seq_putc(seq, '\n');
 	}
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 	return 0;
 }
 
===== net/core/netpoll.c 1.24 vs edited =====
--- 1.24/net/core/netpoll.c	2005-01-13 20:41:04 -08:00
+++ edited/net/core/netpoll.c	2005-01-19 16:05:56 -08:00
@@ -188,7 +188,7 @@
 		return;
 	}
 
-	spin_lock(&np->dev->xmit_lock);
+	spin_lock_irq(&np->dev->xmit_lock);
 	np->dev->xmit_lock_owner = smp_processor_id();
 
 	/*
@@ -197,7 +197,7 @@
 	 */
 	if (netif_queue_stopped(np->dev)) {
 		np->dev->xmit_lock_owner = -1;
-		spin_unlock(&np->dev->xmit_lock);
+		spin_unlock_irq(&np->dev->xmit_lock);
 
 		netpoll_poll(np);
 		goto repeat;
@@ -205,7 +205,7 @@
 
 	status = np->dev->hard_start_xmit(skb, np->dev);
 	np->dev->xmit_lock_owner = -1;
-	spin_unlock(&np->dev->xmit_lock);
+	spin_unlock_irq(&np->dev->xmit_lock);
 
 	/* transmit busy */
 	if(status) {
===== net/core/pktgen.c 1.21 vs edited =====
--- 1.21/net/core/pktgen.c	2005-01-10 11:32:10 -08:00
+++ edited/net/core/pktgen.c	2005-01-19 16:07:58 -08:00
@@ -2664,12 +2664,11 @@
 		}
 	}
 	
-	spin_lock_bh(&odev->xmit_lock);
+	spin_lock_irq(&odev->xmit_lock);
 	if (!netif_queue_stopped(odev)) {
 		u64 now;
 
 		atomic_inc(&(pkt_dev->skb->users));
-retry_now:
 		ret = odev->hard_start_xmit(pkt_dev->skb, odev);
 		if (likely(ret == NETDEV_TX_OK)) {
 			pkt_dev->last_ok = 1;    
@@ -2677,10 +2676,6 @@
 			pkt_dev->seq_num++;
 			pkt_dev->tx_bytes += pkt_dev->cur_pkt_size;
 			
-		} else if (ret == NETDEV_TX_LOCKED 
-			   && (odev->features & NETIF_F_LLTX)) {
-			cpu_relax();
-			goto retry_now;
 		} else {  /* Retry it next time */
 			
 			atomic_dec(&(pkt_dev->skb->users));
@@ -2716,7 +2711,7 @@
 		pkt_dev->next_tx_ns = 0;
         }
 
-	spin_unlock_bh(&odev->xmit_lock);
+	spin_unlock_irq(&odev->xmit_lock);
 	
 	/* If pkt_dev->count is zero, then run forever */
 	if ((pkt_dev->count != 0) && (pkt_dev->sofar >= pkt_dev->count)) {
===== net/sched/sch_generic.c 1.33 vs edited =====
--- 1.33/net/sched/sch_generic.c	2005-01-13 20:41:07 -08:00
+++ edited/net/sched/sch_generic.c	2005-01-19 16:25:02 -08:00
@@ -99,17 +99,11 @@
 	if ((skb = q->dequeue(q)) != NULL) {
 		unsigned nolock = (dev->features & NETIF_F_LLTX);
 		/*
-		 * When the driver has LLTX set it does its own locking
-		 * in start_xmit. No need to add additional overhead by
-		 * locking again. These checks are worth it because
-		 * even uncongested locks can be quite expensive.
-		 * The driver can do trylock like here too, in case
-		 * of lock congestion it should return -1 and the packet
-		 * will be requeued.
+		 * When the driver has LLTX set it does not require any
+		 * locking in start_xmit.
 		 */
 		if (!nolock) {
-			if (!spin_trylock(&dev->xmit_lock)) {
-			collision:
+			if (!spin_trylock_irq(&dev->xmit_lock)) {
 				/* So, someone grabbed the driver. */
 				
 				/* It may be transient configuration error,
@@ -143,22 +137,18 @@
 				if (ret == NETDEV_TX_OK) { 
 					if (!nolock) {
 						dev->xmit_lock_owner = -1;
-						spin_unlock(&dev->xmit_lock);
+						spin_unlock_irq(&dev->xmit_lock);
 					}
 					spin_lock(&dev->queue_lock);
 					return -1;
 				}
-				if (ret == NETDEV_TX_LOCKED && nolock) {
-					spin_lock(&dev->queue_lock);
-					goto collision; 
-				}
 			}
 
 			/* NETDEV_TX_BUSY - we need to requeue */
 			/* Release the driver */
 			if (!nolock) { 
 				dev->xmit_lock_owner = -1;
-				spin_unlock(&dev->xmit_lock);
+				spin_unlock_irq(&dev->xmit_lock);
 			} 
 			spin_lock(&dev->queue_lock);
 			q = dev->qdisc;
@@ -186,7 +176,7 @@
 {
 	struct net_device *dev = (struct net_device *)arg;
 
-	spin_lock(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 	if (dev->qdisc != &noop_qdisc) {
 		if (netif_device_present(dev) &&
 		    netif_running(dev) &&
@@ -200,7 +190,7 @@
 				dev_hold(dev);
 		}
 	}
-	spin_unlock(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 
 	dev_put(dev);
 }
@@ -224,17 +214,17 @@
 
 static void dev_watchdog_up(struct net_device *dev)
 {
-	spin_lock_bh(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 	__netdev_watchdog_up(dev);
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 }
 
 static void dev_watchdog_down(struct net_device *dev)
 {
-	spin_lock_bh(&dev->xmit_lock);
+	spin_lock_irq(&dev->xmit_lock);
 	if (del_timer(&dev->watchdog_timer))
 		__dev_put(dev);
-	spin_unlock_bh(&dev->xmit_lock);
+	spin_unlock_irq(&dev->xmit_lock);
 }
 
 /* "NOOP" scheduler: the best scheduler, recommended for all interfaces
===== net/sched/sch_teql.c 1.16 vs edited =====
--- 1.16/net/sched/sch_teql.c	2004-10-21 22:21:24 -07:00
+++ edited/net/sched/sch_teql.c	2005-01-19 16:06:11 -08:00
@@ -301,12 +301,12 @@
 
 		switch (teql_resolve(skb, skb_res, slave)) {
 		case 0:
-			if (spin_trylock(&slave->xmit_lock)) {
+			if (spin_trylock_irq(&slave->xmit_lock)) {
 				slave->xmit_lock_owner = smp_processor_id();
 				if (!netif_queue_stopped(slave) &&
 				    slave->hard_start_xmit(skb, slave) == 0) {
 					slave->xmit_lock_owner = -1;
-					spin_unlock(&slave->xmit_lock);
+					spin_unlock_irq(&slave->xmit_lock);
 					master->slaves = NEXT_SLAVE(q);
 					netif_wake_queue(dev);
 					master->stats.tx_packets++;
@@ -314,7 +314,7 @@
 					return 0;
 				}
 				slave->xmit_lock_owner = -1;
-				spin_unlock(&slave->xmit_lock);
+				spin_unlock_irq(&slave->xmit_lock);
 			}
 			if (netif_queue_stopped(dev))
 				busy = 1;

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20  3:14                                       ` Andi Kleen
@ 2005-01-20  7:05                                         ` David S. Miller
  0 siblings, 0 replies; 62+ messages in thread
From: David S. Miller @ 2005-01-20  7:05 UTC (permalink / raw)
  To: Andi Kleen
  Cc: shemminger, hadi, iod00d, eric.lemoine, roland, netdev, ak,
	openib-general, kaber

On Thu, 20 Jan 2005 04:14:23 +0100
Andi Kleen <ak@suse.de> wrote:

> Looks good to me and much cleaner than what I initially did.
> Thanks.

Thanks for the review Andi.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20  7:05                                         ` David S. Miller
@ 2005-01-20 13:51                                           ` Tommy Christensen
  2005-01-20 21:34                                             ` David S. Miller
  2005-01-20 21:41                                             ` David S. Miller
  2005-01-20 16:56                                           ` Stephen Hemminger
  2005-01-20 19:16                                           ` Jeff Garzik
  2 siblings, 2 replies; 62+ messages in thread
From: Tommy Christensen @ 2005-01-20 13:51 UTC (permalink / raw)
  To: David S. Miller
  Cc: Roland Dreier, shemminger, hadi, iod00d, eric.lemoine, netdev, ak,
	openib-general, kaber

On Thu, 2005-01-20 at 08:05, David S. Miller wrote:
> So I'm going to check this in and push upstream.  Let me know
> if folks find any other errors.

Hi Dave,
Sorry to be the one with bad news.
I am in favour of this idea, but there are still issues that
need to be sorted out.

> ===== drivers/net/sungem.c 1.72 vs edited =====
> --- 1.72/drivers/net/sungem.c	2004-11-05 15:56:15 -08:00
> +++ edited/drivers/net/sungem.c	2005-01-19 22:29:14 -08:00

> @@ -932,12 +932,12 @@
>  	       readl(gp->regs + MAC_RXCFG));
>  
>  	spin_lock_irq(&gp->lock);
> -	spin_lock(&gp->tx_lock);
> +	spin_lock(&dev->xmit_lock);
>  
>  	gp->reset_task_pending = 2;
>  	schedule_work(&gp->reset_task);
>  
> -	spin_unlock(&gp->tx_lock);
> +	spin_unlock(&dev->xmit_lock);
>  	spin_unlock_irq(&gp->lock);
>  }
>  

->tx_timeout() can't take dev->xmit_lock, since dev_watchdog
already has it held.
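
To spell out the loop: after this patch, dev_watchdog() in
net/sched/sch_generic.c has roughly this shape (abridged sketch, not
the verbatim source):

	static void dev_watchdog(unsigned long arg)
	{
		struct net_device *dev = (struct net_device *)arg;

		spin_lock_irq(&dev->xmit_lock);	/* watchdog takes it first */
		if (dev->qdisc != &noop_qdisc) {
			if (netif_queue_stopped(dev) &&
			    time_after(jiffies, dev->trans_start +
					        dev->watchdog_timeo))
				/* Runs with dev->xmit_lock held, so a
				 * driver that takes dev->xmit_lock in
				 * its ->tx_timeout() deadlocks on
				 * itself. */
				dev->tx_timeout(dev);
		}
		spin_unlock_irq(&dev->xmit_lock);
	}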

A lot more serious is the fact that ->tx_timeout() and
->hard_start_xmit() are no longer allowed to do this:

  spin_lock_irq()
  ...
  spin_unlock_irq()

since that would leave us with IRQs enabled while still
holding the xmit_lock.
This would have to be fixed for non-LLTX drivers as well.
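
In other words (hypothetical driver code; the foo_* names are made up
for illustration):

	/* Called with dev->xmit_lock held and IRQs disabled. */
	static int foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
	{
		struct foo_priv *fp = netdev_priv(dev);

		spin_lock_irq(&fp->lock);
		/* ... set up TX descriptors ... */
		spin_unlock_irq(&fp->lock);	/* BUG: unconditionally
						 * re-enables IRQs while the
						 * caller still holds
						 * dev->xmit_lock */
		return NETDEV_TX_OK;
	}

Every such driver would have to switch to the state-preserving form:

	unsigned long flags;

	spin_lock_irqsave(&fp->lock, flags);
	/* ... */
	spin_unlock_irqrestore(&fp->lock, flags);	/* restores the
							 * caller's IRQ state */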

-Tommy

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20  7:05                                         ` David S. Miller
  2005-01-20 13:51                                           ` Tommy Christensen
@ 2005-01-20 16:56                                           ` Stephen Hemminger
  2005-01-21 10:54                                             ` Lennert Buytenhek
  2005-01-20 19:16                                           ` Jeff Garzik
  2 siblings, 1 reply; 62+ messages in thread
From: Stephen Hemminger @ 2005-01-20 16:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, eric.lemoine, hadi, ak, openib-general, kaber

You might want to enforce that LLTX devices like tunnels need to have
no transmit queue (tx_queue_len == 0)?

I can run a test on 8 way with e1000 if it would help.
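
Just to sketch the kind of check I mean -- hypothetical, say in
register_netdevice(), not something in the current patch:

	if ((dev->features & NETIF_F_LLTX) && dev->tx_queue_len) {
		/* An LLTX device does its own queueing and locking; a
		 * qdisc queue stacked on top just adds a second queue. */
		printk(KERN_WARNING "%s: LLTX device with tx_queue_len %lu, "
		       "forcing it to 0\n", dev->name, dev->tx_queue_len);
		dev->tx_queue_len = 0;
	}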

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20  7:05                                         ` David S. Miller
  2005-01-20 13:51                                           ` Tommy Christensen
  2005-01-20 16:56                                           ` Stephen Hemminger
@ 2005-01-20 19:16                                           ` Jeff Garzik
  2 siblings, 0 replies; 62+ messages in thread
From: Jeff Garzik @ 2005-01-20 19:16 UTC (permalink / raw)
  To: David S. Miller
  Cc: Roland Dreier, shemminger, hadi, iod00d, eric.lemoine, netdev, ak,
	openib-general, kaber

Reminder:  double-check that the locking rules in 
Documentation/networking/netdevices.txt still apply; update the text 
file where you changed things.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20 13:51                                           ` Tommy Christensen
@ 2005-01-20 21:34                                             ` David S. Miller
  2005-01-20 21:56                                               ` Grant Grundler
  2005-01-20 21:41                                             ` David S. Miller
  1 sibling, 1 reply; 62+ messages in thread
From: David S. Miller @ 2005-01-20 21:34 UTC (permalink / raw)
  To: Tommy Christensen
  Cc: roland, shemminger, hadi, iod00d, eric.lemoine, netdev, ak,
	openib-general, kaber

On Thu, 20 Jan 2005 14:51:25 +0100
Tommy Christensen <tommy.christensen@tpack.net> wrote:

> A lot more serious is the fact that ->tx_timeout() and
> ->hard_start_xmit() are no longer allowed to do this:
> 
>   spin_lock_irq()
>   ...
>   spin_unlock_irq()
> 
> since that would leave us with irq's enabled while still
> holding the xmit_lock.
> This would have to be fixed for non-LLTX drivers as well.

Even worse is that this breaks the acenic driver too because
it does this:

	unsigned long maxjiff = jiffies + 3*HZ;

	if (time_before(jiffies, maxjiff)) {
		barrier();
		cpu_relax();
		goto restart;
	}

in its ->hard_start_xmit() routine.

I was auditing spin_lock_irq() usage in ->hard_start_xmit()
routines when I caught this.

This one isn't impossible to fix though.  We can replace the
jiffies games with a udelay/mdelay and a "maxloop" variable.
Any takers?
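
Roughly like this (untested sketch; "maxloop", the 1 ms step and the
got_descriptor test all stand in for acenic's own logic):

	int maxloop = 3000;		/* ~3 s in 1 ms steps, like 3*HZ */

restart:
	/* ... try to claim a TX descriptor ... */
	if (!got_descriptor && --maxloop > 0) {
		mdelay(1);	/* calibrated busy-wait; unlike jiffies it
				 * does not depend on the (possibly masked)
				 * timer interrupt advancing */
		goto restart;
	}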

I'll keep working on the spin_lock_irq() audit, then work on
the other problems Tommy found.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20 13:51                                           ` Tommy Christensen
  2005-01-20 21:34                                             ` David S. Miller
@ 2005-01-20 21:41                                             ` David S. Miller
  1 sibling, 0 replies; 62+ messages in thread
From: David S. Miller @ 2005-01-20 21:41 UTC (permalink / raw)
  To: Tommy Christensen
  Cc: roland, shemminger, hadi, iod00d, eric.lemoine, netdev, ak,
	openib-general, kaber

On Thu, 20 Jan 2005 14:51:25 +0100
Tommy Christensen <tommy.christensen@tpack.net> wrote:

> A lot more serious is the fact that ->tx_timeout() and
> ->hard_start_xmit() are no longer allowed to do this:

Sigh, there are even more issues.  Now dev_kfree_skb()
is illegal in ->hard_start_xmit() because interrupts are
disabled, and there are many such cases in the simpler
net drivers.
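
The deferring helpers do exist already, so each call site could be
converted like this:

	dev_kfree_skb_any(skb);	/* picks dev_kfree_skb_irq() when IRQs are
				 * off: the skb goes on the per-cpu
				 * completion queue and is freed from the
				 * NET_TX softirq */

Converting every such call site is exactly the fallout in question.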

I think we might need to revert the change over to IRQ-disabling
dev->xmit_lock and rethink this.  This is just too
much fallout.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20 21:34                                             ` David S. Miller
@ 2005-01-20 21:56                                               ` Grant Grundler
  2005-01-21  1:01                                                 ` David S. Miller
  0 siblings, 1 reply; 62+ messages in thread
From: Grant Grundler @ 2005-01-20 21:56 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, eric.lemoine, hadi, ak, openib-general, Tommy Christensen,
	kaber, shemminger

On Thu, Jan 20, 2005 at 01:34:20PM -0800, David S. Miller wrote:
> Even worse is that this breaks the acenic driver too because
> it does this:
> 
> 	unsigned long maxjiff = jiffies + 3*HZ;
> 
> 	if (time_before(jiffies, maxjiff)) {
> 		barrier();
> 		cpu_relax();
> 		goto restart;
> 	}
> 
> in it's ->hard_start_xmit() routine.
> 
> I was auditing spin_lock_irq() usage in ->hard_start_xmit()
> routines when I caught this.
> 
> This one isn't impossible to fix though.  We can replace the
> jiffies games with a udelay/mdelay and a "maxloop" variable.
> Any takers?

I saw the next mail suggesting to revert the changes because of more fallout.

But if/when acenic needs changes like those proposed above, I can implement
and test the proposed changes if folks have the patience to wait a week or so.
acenic is nowhere on my "official HP supported NICs" list, but I happen
to have the HW/infrastructure to test them.

grant

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20 21:56                                               ` Grant Grundler
@ 2005-01-21  1:01                                                 ` David S. Miller
  2005-01-22  3:17                                                   ` Roland Dreier
  0 siblings, 1 reply; 62+ messages in thread
From: David S. Miller @ 2005-01-21  1:01 UTC (permalink / raw)
  To: Grant Grundler
  Cc: tommy.christensen, roland, shemminger, hadi, iod00d, eric.lemoine,
	netdev, ak, openib-general, kaber

On Thu, 20 Jan 2005 13:56:50 -0800
Grant Grundler <iod00d@hp.com> wrote:

> I saw the next mail suggesting to revert the changes because of more fallout.

Yeah, that's what I ended up just pushing to Linus.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20 16:56                                           ` Stephen Hemminger
@ 2005-01-21 10:54                                             ` Lennert Buytenhek
  2005-01-26  6:27                                               ` David S. Miller
  0 siblings, 1 reply; 62+ messages in thread
From: Lennert Buytenhek @ 2005-01-21 10:54 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, Roland Dreier, hadi, iod00d, eric.lemoine,
	netdev, ak, openib-general, kaber

On Thu, Jan 20, 2005 at 08:56:11AM -0800, Stephen Hemminger wrote:

> You might want to enforce that LLTX devices like tunnels need to
> have no transmit queue (tx_queue_len == 0)?

I didn't look at the original LLTX patch, so I'm speaking out of
ignorance, but since we're on this subject anyway: I noticed that
the ip_gre tunneling driver contains this comment:

<quote>
   1. The most important issue is detecting local dead loops.
   They would cause complete host lockup in transmit, which
   would be "resolved" by stack overflow or, if queueing is enabled,
   with infinite looping in net_bh.

   We cannot track such dead loops during route installation;
   it is an infeasible task. The most general solution would be
   to keep an skb->encapsulation counter (a sort of local ttl),
   and silently drop the packet when it expires. It is the best
   solution, but it supposes maintaining a new variable in ALL
   skbs, even if no tunneling is used.

   Current solution: the t->recursion lock breaks dead loops. It looks
   like the dev->tbusy flag, but I preferred a new variable, because
   the semantics are different. One day, when hard_start_xmit
   is multithreaded, we will have to use skb->encapsulation.
</quote>

If multiple CPUs can call into the tunneling drivers without taking
any locks, we'd need some extra locking in there, or just do what
Alexey describes and keep track of recursion in the skb.
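
For reference, the t->recursion guard is just a plain counter in the
tunnel's ->hard_start_xmit() -- abridged sketch, not the verbatim
source:

	static int ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
	{
		struct ip_tunnel *tunnel = (struct ip_tunnel *)dev->priv;

		if (tunnel->recursion++) {	/* re-entered our own xmit:
						 * a local dead loop */
			tunnel->stat.collisions++;
			dev_kfree_skb(skb);
			tunnel->recursion--;
			return 0;
		}
		/* ... encapsulate and transmit ... */
		tunnel->recursion--;
		return 0;
	}

Note the counter is neither atomic nor locked -- it is only safe while
at most one CPU at a time can be inside the xmit routine, which is
exactly the property LLTX takes away.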

The recursion detection in dev_queue_xmit would perhaps also want to
use such a per-skb mechanism instead of using dev->xmit_lock_owner.

(You really _do_ want the protection that this mechanism offers.)


--L

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-21  1:01                                                 ` David S. Miller
@ 2005-01-22  3:17                                                   ` Roland Dreier
  2005-01-22  3:53                                                     ` David S. Miller
  0 siblings, 1 reply; 62+ messages in thread
From: Roland Dreier @ 2005-01-22  3:17 UTC (permalink / raw)
  To: David S. Miller
  Cc: Grant Grundler, tommy.christensen, shemminger, hadi, eric.lemoine,
	netdev, ak, openib-general, kaber

So what's the plan here with LLTX?  As far as I can tell, 2.6.11-rc2
went out with the changes that disable IRQs in hard_start_xmit...

 - R.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-22  3:17                                                   ` Roland Dreier
@ 2005-01-22  3:53                                                     ` David S. Miller
  0 siblings, 0 replies; 62+ messages in thread
From: David S. Miller @ 2005-01-22  3:53 UTC (permalink / raw)
  To: Roland Dreier
  Cc: iod00d, tommy.christensen, shemminger, hadi, eric.lemoine, netdev,
	ak, openib-general, kaber

On Fri, 21 Jan 2005 19:17:07 -0800
Roland Dreier <roland@topspin.com> wrote:

> So what's the plan here with LLTX?  As far as I can tell, 2.6.11-rc2
> went out with the changes that disable IRQs in hard_start_xmit...

Linus just didn't pull the backout of that stuff in time.
I'll repush it to him tonight.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-21 10:54                                             ` Lennert Buytenhek
@ 2005-01-26  6:27                                               ` David S. Miller
  2005-01-26 13:25                                                 ` Lennert Buytenhek
  0 siblings, 1 reply; 62+ messages in thread
From: David S. Miller @ 2005-01-26  6:27 UTC (permalink / raw)
  To: Lennert Buytenhek
  Cc: shemminger, roland, hadi, iod00d, eric.lemoine, netdev, ak,
	openib-general, kaber

On Fri, 21 Jan 2005 11:54:52 +0100
Lennert Buytenhek <buytenh@wantstofly.org> wrote:

> If multiple CPUs can call into the tunneling drivers without taking
> any locks, we'd need some extra locking in there, or just do what
> Alexey describes and keep track of recursion in the skb.

Another idea is that, just as loopback made its statistics
per-cpu for LLTX support, this recursion variable could be per-cpu
as well.
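
(The loopback precedent, sketched in the style of
drivers/net/loopback.c rather than copied verbatim:)

	static DEFINE_PER_CPU(struct net_device_stats, loopback_stats);

	static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
	{
		struct net_device_stats *lb_stats;

		/* get_cpu() disables preemption, so the per-cpu slot
		 * stays ours until put_cpu() */
		lb_stats = &per_cpu(loopback_stats, get_cpu());
		lb_stats->tx_packets++;
		lb_stats->tx_bytes += skb->len;
		put_cpu();

		/* ... loop the packet back via netif_rx() ... */
		return 0;
	}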

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-26  6:27                                               ` David S. Miller
@ 2005-01-26 13:25                                                 ` Lennert Buytenhek
  2005-01-27  6:32                                                   ` David S. Miller
  0 siblings, 1 reply; 62+ messages in thread
From: Lennert Buytenhek @ 2005-01-26 13:25 UTC (permalink / raw)
  To: David S. Miller
  Cc: shemminger, roland, hadi, iod00d, eric.lemoine, netdev, ak,
	openib-general, kaber

On Tue, Jan 25, 2005 at 10:27:05PM -0800, David S. Miller wrote:

> > If multiple CPUs can call into the tunneling drivers without taking
> > any locks, we'd need some extra locking in there, or just do what
> > Alexey describes and keep track of recursion in the skb.
> 
> Another idea is that, just as loopback made its statistics
> per-cpu for LLTX support, this recursion variable could be per-cpu
> as well.

I've thought about this a bit, and the only sane way of doing recursion
detection that doesn't involve 'struct net_device' would be to keep track
of the recursion depth (perhaps per-CPU, as you suggest) and toss the
packet when it exceeds some arbitrary limit, right?

To reproduce the current behaviour more closely you'd have to keep a
small per-CPU array of 'struct net_device *' pointers as a kind of
recursion stack, and toss the packet when you hit a net_device that's
already on the list.  But that seems like slight overkill.
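
The depth-counter variant would then be something like this (the name
and the limit are made up):

	static DEFINE_PER_CPU(int, xmit_recursion);
	#define XMIT_RECURSION_LIMIT	3	/* arbitrary */

	/* Fragment of the transmit path; BH is already disabled, so
	 * the counter is private to this CPU for the duration: */
	if (__get_cpu_var(xmit_recursion) > XMIT_RECURSION_LIMIT) {
		kfree_skb(skb);			/* assume a local dead loop */
		return NETDEV_TX_OK;		/* the packet is gone either way */
	}
	__get_cpu_var(xmit_recursion)++;
	dev->hard_start_xmit(skb, dev);		/* may re-enter this path */
	__get_cpu_var(xmit_recursion)--;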


--L

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-26 13:25                                                 ` Lennert Buytenhek
@ 2005-01-27  6:32                                                   ` David S. Miller
  2005-01-27  7:16                                                     ` Andi Kleen
  0 siblings, 1 reply; 62+ messages in thread
From: David S. Miller @ 2005-01-27  6:32 UTC (permalink / raw)
  To: Lennert Buytenhek
  Cc: shemminger, roland, hadi, iod00d, eric.lemoine, netdev, ak,
	openib-general, kaber

On Wed, 26 Jan 2005 14:25:12 +0100
Lennert Buytenhek <buytenh@wantstofly.org> wrote:

> I've thought about this a bit, and the only sane way of doing recursion
> detection that doesn't involve 'struct net_device' would be to keep track
> of the recursion depth (perhaps per-CPU, as you suggest) and toss the
> packet when it exceeds some arbitrary limit, right?

Yes, that's the idea.

> To reproduce the current behaviour more closely you'd have to keep a
> small per-CPU array of 'struct net_device *' pointers as a kind of
> recursion stack, and toss the packet when you hit a net_device that's
> already on the list.  But that seems like slight overkill.

Indeed.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-27  6:32                                                   ` David S. Miller
@ 2005-01-27  7:16                                                     ` Andi Kleen
  2005-01-27  7:22                                                       ` David S. Miller
  0 siblings, 1 reply; 62+ messages in thread
From: Andi Kleen @ 2005-01-27  7:16 UTC (permalink / raw)
  To: David S. Miller
  Cc: Lennert Buytenhek, shemminger, roland, hadi, iod00d, eric.lemoine,
	netdev, ak, openib-general, kaber

On Wed, Jan 26, 2005 at 10:32:47PM -0800, David S. Miller wrote:
> On Wed, 26 Jan 2005 14:25:12 +0100
> Lennert Buytenhek <buytenh@wantstofly.org> wrote:
> 
> > I've thought about this a bit, and the only sane way of doing recursion
> > detection that doesn't involve 'struct net_device' would be to keep track
> > of the recursion depth (perhaps per-CPU, as you suggest) and toss the
> > packet when it exceeds some arbitrary limit, right?
> 
> Yes, that's the idea.

Per-CPU data only works in a preemptive kernel if you have preemption
disabled all the time. Do you?

That seems unlikely to me.

-Andi

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-27  7:16                                                     ` Andi Kleen
@ 2005-01-27  7:22                                                       ` David S. Miller
  2005-01-27  8:26                                                         ` Andi Kleen
  0 siblings, 1 reply; 62+ messages in thread
From: David S. Miller @ 2005-01-27  7:22 UTC (permalink / raw)
  To: Andi Kleen
  Cc: buytenh, shemminger, roland, hadi, iod00d, eric.lemoine, netdev,
	ak, openib-general, kaber

On Thu, 27 Jan 2005 08:16:45 +0100
Andi Kleen <ak@suse.de> wrote:

> > Yes, that's the idea.
> 
> Per-CPU data only works in a preemptive kernel if you have preemption
> disabled all the time. Do you?
> 
> That seems unlikely to me.

BH is disabled in these code paths (specifically we're talking
about ->hard_start_xmit()), as that is where the recursion
check goes.

Otherwise, loopback's LLTX is broken as it relies on this property
as well.
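
(The shape of dev_queue_xmit() in net/core/dev.c, abridged:)

	int dev_queue_xmit(struct sk_buff *skb)
	{
		/* ... */

		/* Disable soft irqs for various locks below.  This
		 * also makes preempt_count nonzero, so preemption is
		 * off from here on. */
		local_bh_disable();

		/* Both the qdisc path and the queue-less LLTX path
		 * reach ->hard_start_xmit() before BH is re-enabled,
		 * so per-cpu data (loopback's stats, a recursion
		 * counter) stays stable across the call. */

		/* ... */
	}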

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-27  7:22                                                       ` David S. Miller
@ 2005-01-27  8:26                                                         ` Andi Kleen
  0 siblings, 0 replies; 62+ messages in thread
From: Andi Kleen @ 2005-01-27  8:26 UTC (permalink / raw)
  To: David S. Miller
  Cc: Andi Kleen, buytenh, shemminger, roland, hadi, iod00d,
	eric.lemoine, netdev, openib-general, kaber

On Wed, Jan 26, 2005 at 11:22:27PM -0800, David S. Miller wrote:
> On Thu, 27 Jan 2005 08:16:45 +0100
> Andi Kleen <ak@suse.de> wrote:
> 
> > > Yes, that's the idea.
> > 
> > Per-CPU data only works in a preemptive kernel if you have preemption
> > disabled all the time. Do you?
> > 
> > That seems unlikely to me.
> 
> BH is disabled in these code paths (specifically we're talking
> about ->hard_start_xmit()), as that is where the recursion
> check goes.

Hmm, but hard_start_xmit is allowed to enable preemption again, isn't it?

A safer alternative may be to change the CPU mask of the current process
temporarily.

-Andi

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread

Thread overview: 62+ messages
2004-12-17 21:57 LLTX and netif_stop_queue Roland Dreier
2004-12-18  5:44 ` David S. Miller
2004-12-18 15:35   ` Roland Dreier
2004-12-18 17:58     ` Roland Dreier
2004-12-18 18:26       ` Roland Dreier
2004-12-19 19:33     ` jamal
2004-12-19 19:31   ` jamal
2004-12-19 19:54     ` jamal
2004-12-19 20:02       ` Jamal Hadi Salim
2004-12-19 22:35     ` Roland Dreier
2004-12-19 23:06       ` jamal
2004-12-19 23:16         ` Roland Dreier
2004-12-22 18:49     ` Eric Lemoine
2004-12-23  4:29       ` David S. Miller
2004-12-23  4:38         ` Roland Dreier
2004-12-23  9:10         ` Eric Lemoine
2004-12-23 16:37           ` Patrick McHardy
2004-12-23 18:11             ` Eric Lemoine
2004-12-24 16:10             ` Eric Lemoine
2004-12-28 13:31               ` jamal
2005-01-02 23:30                 ` Eric Lemoine
2005-01-03  7:41                   ` Eric Lemoine
2005-01-03 15:04                   ` jamal
2005-01-03 15:48                     ` Eric Lemoine
2005-01-03 15:57                     ` Roland Dreier
2005-01-03 16:41                       ` Eric Lemoine
2005-01-03 16:54                         ` Roland Dreier
2005-01-03 17:07                           ` Eric Lemoine
2005-01-03 17:12                             ` Grant Grundler
2005-01-04  4:18                               ` jamal
2005-01-19 22:47                                 ` David S. Miller
2005-01-19 23:18                                   ` Stephen Hemminger
2005-01-19 23:41                                     ` David S. Miller
2005-01-20  0:02                                       ` [openib-general] " Jeff Garzik
2005-01-20  0:46                                         ` Stephen Hemminger
2005-01-20  0:47                                           ` David S. Miller
2005-01-20  0:47                                       ` Francois Romieu
2005-01-20  0:52                                         ` David S. Miller
2005-01-20  1:17                                           ` Francois Romieu
2005-01-20  0:46                                     ` [PATCH]: was " David S. Miller
2005-01-20  3:14                                       ` Andi Kleen
2005-01-20  7:05                                         ` David S. Miller
2005-01-20  3:43                                       ` Roland Dreier
2005-01-20  7:05                                         ` David S. Miller
2005-01-20 13:51                                           ` Tommy Christensen
2005-01-20 21:34                                             ` David S. Miller
2005-01-20 21:56                                               ` Grant Grundler
2005-01-21  1:01                                                 ` David S. Miller
2005-01-22  3:17                                                   ` Roland Dreier
2005-01-22  3:53                                                     ` David S. Miller
2005-01-20 21:41                                             ` David S. Miller
2005-01-20 16:56                                           ` Stephen Hemminger
2005-01-21 10:54                                             ` Lennert Buytenhek
2005-01-26  6:27                                               ` David S. Miller
2005-01-26 13:25                                                 ` Lennert Buytenhek
2005-01-27  6:32                                                   ` David S. Miller
2005-01-27  7:16                                                     ` Andi Kleen
2005-01-27  7:22                                                       ` David S. Miller
2005-01-27  8:26                                                         ` Andi Kleen
2005-01-20 19:16                                           ` Jeff Garzik
2005-01-20  4:01                                       ` jamal
2005-01-20  5:18                                         ` David S. Miller
