* LLTX and netif_stop_queue
From: Roland Dreier @ 2004-12-17 21:57 UTC
To: netdev, davem; +Cc: openib-general

While testing my IP-over-InfiniBand driver, I discovered that if a net
device sets NETIF_F_LLTX, it seems the device's hard_start_xmit method
can be called even after a netif_stop_queue().
This is because in the LLTX case, qdisc_restart() holds no locks while
calling hard_start_xmit, so something like the following can happen:
	CPU 1					CPU 2

	qdisc_restart:
	  drop queue lock
	  call hard_start_xmit()
	net driver:
	  acquire TX lock
	  queue packet to HW
						acquire queue lock...
						qdisc_restart:
						  drop queue lock
						  call hard_start_xmit:
	  queue full, call netif_stop_queue()
	  release TX lock
						net driver:
						  acquire TX lock
						  queue is already full!
Is my understanding correct? If so it seems the patch below would
make sense. (e1000 seems to handle this properly already)
Thanks,
Roland
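
To make the window concrete, here is a minimal paraphrase of the
2.6-era qdisc_restart() LLTX path described above.  This is only a
sketch, not the verbatim net/sched/sch_generic.c source; requeueing
and error handling are omitted:

/* Sketch of the NETIF_F_LLTX ("nolock") branch of qdisc_restart();
 * simplified from the structure discussed in this thread. */
static int qdisc_restart_sketch(struct net_device *dev)
{
	struct Qdisc *q = dev->qdisc;
	struct sk_buff *skb;

	/* Entered with dev->queue_lock held. */
	if ((skb = q->dequeue(q)) == NULL)
		return 0;

	/* LLTX: dev->xmit_lock is never taken; only queue_lock is
	 * dropped before calling into the driver. */
	spin_unlock(&dev->queue_lock);

	/* Race window: another CPU can fill the TX ring and call
	 * netif_stop_queue() right here, so this check can observe a
	 * stale "queue awake" state and the driver's hard_start_xmit
	 * runs against a full ring. */
	if (!netif_queue_stopped(dev))
		dev->hard_start_xmit(skb, dev);

	spin_lock(&dev->queue_lock);
	return -1;
}
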
Since tg3 and sungem now use lockless TX (NETIF_F_LLTX), it's possible
for their hard_start_xmit method to be called even after they call
netif_stop_queue. Therefore a full queue no longer indicates a bug --
this patch fixes the comment and removes the KERN_ERR printk.
Signed-off-by: Roland Dreier <roland@topspin.com>
Index: linux-bk/drivers/net/sungem.c
===================================================================
--- linux-bk.orig/drivers/net/sungem.c 2004-12-16 15:56:19.000000000 -0800
+++ linux-bk/drivers/net/sungem.c 2004-12-17 13:46:43.307064457 -0800
@@ -976,12 +976,10 @@
 		return NETDEV_TX_LOCKED;
 	}
 
-	/* This is a hard error, log it. */
+	/* This may happen, since we have NETIF_F_LLTX set */
 	if (TX_BUFFS_AVAIL(gp) <= (skb_shinfo(skb)->nr_frags + 1)) {
 		netif_stop_queue(dev);
 		spin_unlock_irqrestore(&gp->tx_lock, flags);
-		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
-		       dev->name);
 		return NETDEV_TX_BUSY;
 	}
 
Index: linux-bk/drivers/net/tg3.c
===================================================================
--- linux-bk.orig/drivers/net/tg3.c 2004-12-16 15:56:06.000000000 -0800
+++ linux-bk/drivers/net/tg3.c 2004-12-17 13:46:25.952622672 -0800
@@ -3076,12 +3076,10 @@
 		return NETDEV_TX_LOCKED;
 	}
 
-	/* This is a hard error, log it. */
+	/* This may happen, since we have NETIF_F_LLTX set */
 	if (unlikely(TX_BUFFS_AVAIL(tp) <= (skb_shinfo(skb)->nr_frags + 1))) {
 		netif_stop_queue(dev);
 		spin_unlock_irqrestore(&tp->tx_lock, flags);
-		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
-		       dev->name);
 		return NETDEV_TX_BUSY;
 	}
 

* Re: LLTX and netif_stop_queue
From: David S. Miller @ 2004-12-18  5:44 UTC
To: Roland Dreier; +Cc: netdev, openib-general

On Fri, 17 Dec 2004 13:57:40 -0800
Roland Dreier <roland@topspin.com> wrote:

> While testing my IP-over-InfiniBand driver, I discovered that if a net
> device sets NETIF_F_LLTX, it seems the device's hard_start_xmit method
> can be called even after a netif_stop_queue().
>
> This is because in the LLTX case, qdisc_restart() holds no locks while
> calling hard_start_xmit, so something like the following can happen:
 ...
> 	if (TX_BUFFS_AVAIL(gp) <= (skb_shinfo(skb)->nr_frags + 1)) {
> 		netif_stop_queue(dev);
> 		spin_unlock_irqrestore(&gp->tx_lock, flags);
> -		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
> -		       dev->name);
> 		return NETDEV_TX_BUSY;
> 	}

I understand the bug, but we're not going to fix it this way.
This is a crucial invariant that we need to check for, because it
indicates a pretty serious state error except in this bug case you've
discovered.

Perhaps one way to fix this is to add a pointer to a spinlock to the
netdev struct, and have the upper level grab that for NETIF_F_LLTX
devices when doing queue state checks.  Actually, that could end up
being racy too.

If we can't find a good fix for this, besides removing the necessary
debugging message, we might have to pull NETIF_F_LLTX out or disable it
temporarily until we figure out a way.

* Re: LLTX and netif_stop_queue
From: Roland Dreier @ 2004-12-18 15:35 UTC
To: David S. Miller; +Cc: netdev, openib-general

    David> I understand the bug, but we're not going to fix it this
    David> way.  This is a crucial invariant that we need to check
    David> for, because it indicates a pretty serious state error
    David> except in this bug case you've discovered.

OK, that's fair enough.  I was pretty bummed myself when I realized
hard_start_xmit was getting called even after I stopped the queue.
Thanks for confirming my analysis.

    David> Perhaps one way to fix this is to add a pointer to a
    David> spinlock to the netdev struct, and have the upper level
    David> grab that for NETIF_F_LLTX devices when doing queue state
    David> checks.  Actually, that could end up being racy too.

I may be missing something, but it seems to me that we get all of the
benefits of LLTX by just documenting that device drivers can use the
xmit_lock member of struct net_device to synchronize other parts of
the driver against hard_start_xmit.  I guess the driver also should
set xmit_lock_owner to -1 after it acquires xmit_lock.

This would mean that NETIF_F_LLTX can go away, and LLTX drivers would
just replace their use of their own private tx_lock with xmit_lock
(except in hard_start_xmit, where the upper layer already holds
xmit_lock).

By the way, am I correct in thinking that the use of xmit_lock_owner
in qdisc_restart() is racy?

	CPU 1					CPU 2

	if (!spin_trylock(&dev->xmit_lock)) {
	/* get the lock */
						if (!spin_trylock(&dev->xmit_lock)) {
						/* fail */
						if (dev->xmit_lock_owner == smp_processor_id()) {
						/* test the wrong value */
	/* set the value too late */
	dev->xmit_lock_owner = smp_processor_id();

I don't see a simple way to fix this unfortunately.

Thanks,
  Roland
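
As a rough illustration of what Roland is proposing, a driver's
TX-completion path would take dev->xmit_lock where it previously took
a private tx_lock.  This is only a sketch: foo_priv, foo_reap_ring(),
and FOO_TX_RING_SIZE are made-up names, and (as the follow-ups below
show) the pattern is not actually safe from hard-IRQ context, because
the core takes xmit_lock with only BHs disabled:

/* Hypothetical driver completion path synchronizing with
 * hard_start_xmit via dev->xmit_lock instead of a private tx_lock. */
struct foo_priv {
	unsigned int tx_head;
	unsigned int tx_tail;
};

/* Made-up helper: reclaims completed descriptors, returns new tail. */
static unsigned int foo_reap_ring(struct foo_priv *fp);

static void foo_tx_complete(struct net_device *dev)
{
	struct foo_priv *fp = netdev_priv(dev);
	unsigned long flags;

	spin_lock_irqsave(&dev->xmit_lock, flags);
	fp->tx_tail = foo_reap_ring(fp);
	if (netif_queue_stopped(dev) &&
	    fp->tx_head - fp->tx_tail < FOO_TX_RING_SIZE / 2)
		netif_wake_queue(dev);
	spin_unlock_irqrestore(&dev->xmit_lock, flags);
}
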

* Re: Re: LLTX and netif_stop_queue
From: Roland Dreier @ 2004-12-18 17:58 UTC
To: David S. Miller; +Cc: netdev, openib-general

    Roland> I may be missing something, but it seems to me that we get
    Roland> all of the benefits of LLTX by just documenting that
    Roland> device drivers can use the xmit_lock member of struct
    Roland> net_device to synchronize other parts of the driver
    Roland> against hard_start_xmit.  I guess the driver also should
    Roland> set xmit_lock_owner to -1 after it acquires xmit_lock.

Thinking about this a little more, I realize that there's no reason
for the driver to set xmit_lock_owner -- if the driver is able to
acquire the lock, then xmit_lock_owner will already be -1.

So it seems LLTX can be replaced by just having drivers use
net_device.xmit_lock instead of their own private tx_lock.  Assuming
this works (and I don't see anything wrong with it) then this seems
like a pretty nice solution: we remove some code from the networking
core and get rid of all the "trylock" logic in drivers'
hard_start_xmit.

 - Roland

* Re: Re: LLTX and netif_stop_queue
From: Roland Dreier @ 2004-12-18 18:26 UTC
To: David S. Miller; +Cc: netdev, openib-general

    Roland> So it seems LLTX can be replaced by just having drivers
    Roland> use net_device.xmit_lock instead of their own private
    Roland> tx_lock.  Assuming this works (and I don't see anything
    Roland> wrong with it) then this seems like a pretty nice
    Roland> solution: we remove some code from the networking core
    Roland> and get rid of all the "trylock" logic in drivers'
    Roland> hard_start_xmit.

Actually trying it instead of talking out of my ass...

Just doing this naively without changing the net core can deadlock
because the net core acquires dev->xmit_lock without disabling
interrupts.  So if the driver tries to use xmit_lock in its interrupt
handler, it will deadlock if the interrupt occurred during
hard_start_xmit.

Even doing local_irq_save() in hard_start_xmit isn't good enough,
because there's still a window between the net core's call to
hard_start_xmit and the actual local_irq_save where xmit_lock is held
with interrupts on.

Maybe it makes sense to change NETIF_F_LLTX to NETIF_F_TX_IRQ_DIS or
something like that and have that flag mean "disable interrupts when
calling hard_start_xmit."  (We could just do this unconditionally, but
I'm not sure if any drivers rely on having interrupts enabled during
hard_start_xmit and I'm worried about making a change in semantics
like that -- not to mention some drivers may not need interrupts
disabled and may not want the cost.)

 - Roland
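
To make the deadlock concrete: if a driver converted along the lines
sketched earlier takes dev->xmit_lock in its hard-IRQ handler, and the
interrupt lands on the CPU that is currently inside hard_start_xmit
(where the core holds xmit_lock with only BHs disabled), the handler
spins on a lock its own CPU already holds.  A sketch, with a
hypothetical foo_interrupt() using the 2.6-era handler signature:

/* Hypothetical IRQ handler for a driver naively converted to use
 * dev->xmit_lock.  If this interrupt fires while the same CPU holds
 * dev->xmit_lock around hard_start_xmit() (taken by the core with
 * hard IRQs still enabled), the spin_lock_irqsave() below never
 * succeeds: self-deadlock. */
static irqreturn_t foo_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
	struct net_device *dev = dev_id;
	unsigned long flags;

	spin_lock_irqsave(&dev->xmit_lock, flags);	/* can spin forever */
	netif_wake_queue(dev);				/* completion work */
	spin_unlock_irqrestore(&dev->xmit_lock, flags);

	return IRQ_HANDLED;
}
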

* Re: LLTX and netif_stop_queue
From: jamal @ 2004-12-19 19:33 UTC
To: Roland Dreier; +Cc: netdev, David S. Miller, openib-general

On Sat, 2004-12-18 at 10:35, Roland Dreier wrote:

> By the way, am I correct in thinking that the use of xmit_lock_owner
> in qdisc_restart() is racy?

No.

> 	CPU 1					CPU 2
>
> 	if (!spin_trylock(&dev->xmit_lock)) {
> 	/* get the lock */
> 						if (!spin_trylock(&dev->xmit_lock)) {
> 						/* fail */
> 						if (dev->xmit_lock_owner == smp_processor_id()) {
> 						/* test the wrong value */
> 	/* set the value too late */
> 	dev->xmit_lock_owner = smp_processor_id();

The setting is protected by the queue lock.  So no race.

cheers,
jamal

* Re: LLTX and netif_stop_queue
From: jamal @ 2004-12-19 19:31 UTC
To: David S. Miller; +Cc: netdev, openib-general

On Sat, 2004-12-18 at 00:44, David S. Miller wrote:

> Perhaps one way to fix this is to add a pointer to a spinlock to
> the netdev struct, and have the upper level grab that for
> NETIF_F_LLTX devices when doing queue state checks.  Actually, that
> could end up being racy too.

How about releasing the qlock only when the LLTX transmit lock is
grabbed?  That should bring it to par with what it was originally.

cheers,
jamal

* Re: LLTX and netif_stop_queue
From: jamal @ 2004-12-19 19:54 UTC
To: David S. Miller; +Cc: netdev, openib-general

On Sun, 2004-12-19 at 14:31, jamal wrote:

> How about releasing the qlock only when the LLTX transmit lock is
> grabbed?  That should bring it to par with what it was originally.

Something like the two attached patches... one showing how to do it
for e1000.  Untested (not even compiled).

cheers,
jamal

[-- Attachment #2: p1 --]

--- a/net/sched/bak.sch_generic.c	2004-12-19 13:46:19.799356432 -0500
+++ b/net/sched/sch_generic.c	2004-12-19 13:49:14.384815408 -0500
@@ -128,12 +128,11 @@
 		}
 		/* Remember that the driver is grabbed by us. */
 		dev->xmit_lock_owner = smp_processor_id();
+		/* And release queue */
+		spin_unlock(&dev->queue_lock);
 	}
 	{
-		/* And release queue */
-		spin_unlock(&dev->queue_lock);
-
 		if (!netif_queue_stopped(dev)) {
 			int ret;
 			if (netdev_nit)
@@ -141,15 +140,14 @@
 			ret = dev->hard_start_xmit(skb, dev);
 			if (ret == NETDEV_TX_OK) {
+				dev->xmit_lock_owner = -1;
 				if (!nolock) {
-					dev->xmit_lock_owner = -1;
 					spin_unlock(&dev->xmit_lock);
 				}
 				spin_lock(&dev->queue_lock);
 				return -1;
 			}
 			if (ret == NETDEV_TX_LOCKED && nolock) {
-				spin_lock(&dev->queue_lock);
 				goto collision;
 			}
 		}

[-- Attachment #3: p2 --]

--- a/drivers/net/e1000/b.e1000_main.c	2004-12-19 13:50:59.481838232 -0500
+++ b/drivers/net/e1000/e1000_main.c	2004-12-19 13:53:34.326298296 -0500
@@ -1809,6 +1809,10 @@
 		return NETDEV_TX_LOCKED;
 	}
 
+	/* new from sch_generic for LLTX */
+	spin_unlock(&dev->queue_lock);
+	dev->xmit_lock_owner = smp_processor_id();
+
 	/* need: count + 2 desc gap to keep tail from touching
 	 * head, otherwise try next time */
 	if(E1000_DESC_UNUSED(&adapter->tx_ring) < count + 2) {

* Re: LLTX and netif_stop_queue
From: Jamal Hadi Salim @ 2004-12-19 20:02 UTC
To: David S. Miller; +Cc: netdev, openib-general

On Sun, 2004-12-19 at 14:54, jamal wrote:

> On Sun, 2004-12-19 at 14:31, jamal wrote:
>
> > How about releasing the qlock only when the LLTX transmit lock is
> > grabbed?  That should bring it to par with what it was originally.
>
> Something like the two attached patches... one showing how to do it
> for e1000.  Untested (not even compiled).

After attempting to compile, p2 take2.

cheers,
jamal

[-- Attachment #2: p2-take2 --]

--- 2610-rc3-bk12/drivers/net/e1000/bak.e1000_main.c	2004-12-19 13:50:59.000000000 -0500
+++ 2610-rc3-bk12/drivers/net/e1000/e1000_main.c	2004-12-19 13:58:17.000000000 -0500
@@ -1809,6 +1809,10 @@
 		return NETDEV_TX_LOCKED;
 	}
 
+	/* new from sch_generic for LLTX */
+	spin_unlock(&netdev->queue_lock);
+	netdev->xmit_lock_owner = smp_processor_id();
+
 	/* need: count + 2 desc gap to keep tail from touching
 	 * head, otherwise try next time */
 	if(E1000_DESC_UNUSED(&adapter->tx_ring) < count + 2) {

* Re: LLTX and netif_stop_queue
From: Roland Dreier @ 2004-12-19 22:35 UTC
To: hadi; +Cc: netdev, David S. Miller, openib-general

    jamal> How about releasing the qlock only when the LLTX transmit
    jamal> lock is grabbed?  That should bring it to par with what it
    jamal> was originally.

This seems a little risky.  I can't point to a specific deadlock, but
it doesn't seem right on general principles to unlock in a different
order than you nested the locks when acquiring them -- if I understand
correctly, you're suggesting lock(queue_lock), lock(tx_lock),
unlock(queue_lock), unlock(tx_lock).

Thanks,
  Roland

* Re: LLTX and netif_stop_queue
From: jamal @ 2004-12-19 23:06 UTC
To: Roland Dreier; +Cc: netdev, David S. Miller, openib-general

On Sun, 2004-12-19 at 17:35, Roland Dreier wrote:

>     jamal> How about releasing the qlock only when the LLTX transmit
>     jamal> lock is grabbed?  That should bring it to par with what
>     jamal> it was originally.
>
> This seems a little risky.  I can't point to a specific deadlock,
> but it doesn't seem right on general principles to unlock in a
> different order than you nested the locks when acquiring them -- if
> I understand correctly, you're suggesting lock(queue_lock),
> lock(tx_lock), unlock(queue_lock), unlock(tx_lock).

There is no deadlock.  That's exactly how things work.
Try the patches I posted.

cheers,
jamal

* Re: LLTX and netif_stop_queue
From: Roland Dreier @ 2004-12-19 23:16 UTC
To: hadi; +Cc: netdev, David S. Miller, openib-general

    Roland> This seems a little risky.  I can't point to a specific
    Roland> deadlock, but it doesn't seem right on general principles
    Roland> to unlock in a different order than you nested the locks
    Roland> when acquiring them -- if I understand correctly, you're
    Roland> suggesting lock(queue_lock), lock(tx_lock),
    Roland> unlock(queue_lock), unlock(tx_lock).

    jamal> There is no deadlock.  That's exactly how things work.
    jamal> Try the patches I posted.

Thinking about it more, I guess it's OK.  I was just thinking about
the basic general rule that locks need to be acquired/released in LIFO
order to avoid deadlock (eg lock(A), lock(B), unlock(B), unlock(A)).

However, in this case unlocking queue_lock after acquiring the
driver's tx_lock seems to be OK because the driver does a trylock on
tx_lock in the xmit path, so it can't deadlock.  At worst the trylock
will just fail to get the lock.

Thanks,
  Roland
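
A compact sketch of the ordering Roland is reasoning about, under
jamal's scheme.  core_xmit_sketch() and foo_priv are made up for
illustration; the point is that the only acquisition of the driver's
tx_lock while queue_lock is held is a trylock, so the out-of-LIFO
release cannot produce a blocking cycle:

/* jamal's lock ordering: lock(queue_lock), trylock(tx_lock),
 * unlock(queue_lock), unlock(tx_lock).  Because the core never spins
 * on tx_lock while holding queue_lock, no lock-order cycle can block;
 * on contention the trylock simply fails and the packet is requeued. */
struct foo_priv {
	spinlock_t tx_lock;
};

static int core_xmit_sketch(struct net_device *dev, struct sk_buff *skb)
{
	struct foo_priv *fp = netdev_priv(dev);

	spin_lock(&dev->queue_lock);
	if (!spin_trylock(&fp->tx_lock)) {
		spin_unlock(&dev->queue_lock);
		return NETDEV_TX_LOCKED;	/* requeue skb, retry later */
	}
	spin_unlock(&dev->queue_lock);	/* released out of LIFO order */

	/* ... queue skb to the hardware under fp->tx_lock here ... */

	spin_unlock(&fp->tx_lock);
	return NETDEV_TX_OK;
}
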

* Re: LLTX and netif_stop_queue
From: Eric Lemoine @ 2004-12-22 18:49 UTC
To: hadi; +Cc: netdev, David S. Miller, openib-general

On 19 Dec 2004 14:31:15 -0500, jamal <hadi@cyberus.ca> wrote:

> On Sat, 2004-12-18 at 00:44, David S. Miller wrote:
>
> > Perhaps one way to fix this is to add a pointer to a spinlock to
> > the netdev struct, and have the upper level grab that for
> > NETIF_F_LLTX devices when doing queue state checks.  Actually,
> > that could end up being racy too.
>
> How about releasing the qlock only when the LLTX transmit lock is
> grabbed?  That should bring it to par with what it was originally.

I don't like the idea of releasing inside the driver a lock taken
outside.  That might be just me...

Instead, I would suggest having LLTX drivers check whether the queue
is stopped after they grab their private tx lock and before they check
tx ring fullness.  That way we close the race window but keep the
driver bug check around.

See attached sungem patch.

-- 
Eric

[-- Attachment #2: sungem-lltx.patch --]

===== drivers/net/sungem.c 1.71 vs edited =====
--- 1.71/drivers/net/sungem.c	2004-11-14 10:45:36 +01:00
+++ edited/drivers/net/sungem.c	2004-12-22 19:34:07 +01:00
@@ -976,6 +976,12 @@
 		return NETDEV_TX_LOCKED;
 	}
 
+	/* This handles a LLTX-related race condition */
+	if (netif_queue_stopped(dev)) {
+		spin_unlock_irqrestore(&gp->tx_lock, flags);
+		return NETDEV_TX_BUSY;
+	}
+
 	/* This is a hard error, log it. */
 	if (TX_BUFFS_AVAIL(gp) <= (skb_shinfo(skb)->nr_frags + 1)) {
 		netif_stop_queue(dev);

* Re: LLTX and netif_stop_queue
From: David S. Miller @ 2004-12-23  4:29 UTC
To: Eric Lemoine; +Cc: netdev, hadi, openib-general

On Wed, 22 Dec 2004 19:49:48 +0100
Eric Lemoine <eric.lemoine@gmail.com> wrote:

> Instead, I would suggest having LLTX drivers check whether the queue
> is stopped after they grab their private tx lock and before they
> check tx ring fullness.  That way we close the race window but keep
> the driver bug check around.
>
> See attached sungem patch.

That sounds about right.  Nice idea.  It solves the race, and retains
the error state check.

I'll apply Eric's patch, and do something similar in the other LLTX
drivers (except loopback, which has no "queue" per se so doesn't need
this stuff).

* Re: LLTX and netif_stop_queue
From: Roland Dreier @ 2004-12-23  4:38 UTC
To: David S. Miller; +Cc: netdev, Eric Lemoine, hadi, openib-general

    David> That sounds about right.  Nice idea.  It solves the race,
    David> and retains the error state check.

Great, I've made a similar change to the IP-over-IB driver.

Thanks,
  Roland

* Re: LLTX and netif_stop_queue
From: Eric Lemoine @ 2004-12-23  9:10 UTC
To: David S. Miller; +Cc: netdev, hadi, openib-general

On Wed, 22 Dec 2004 20:29:19 -0800, David S. Miller
<davem@davemloft.net> wrote:

> That sounds about right.  Nice idea.  It solves the race, and
> retains the error state check.
>
> I'll apply Eric's patch, and do something similar in the other LLTX
> drivers (except loopback, which has no "queue" per se so doesn't
> need this stuff).

Dave,

I still have one concern with the LLTX code (and it may be that the
correct patch is Jamal's):

Without LLTX we do: lock(queue_lock), lock(xmit_lock),
release(queue_lock), release(xmit_lock).  With LLTX (without Jamal's
patch) we do: lock(queue_lock), release(queue_lock), lock(tx_lock),
release(tx_lock).  LLTX doesn't look correct because it creates a race
condition window between the two lock-protected sections.  So you may
want to reconsider Jamal's patch or pull out LLTX...

Thanks,

-- 
Eric

* Re: LLTX and netif_stop_queue
From: Patrick McHardy @ 2004-12-23 16:37 UTC
To: Eric Lemoine; +Cc: netdev, hadi, David S. Miller, openib-general

Eric Lemoine wrote:

> I still have one concern with the LLTX code (and it may be that the
> correct patch is Jamal's):
>
> Without LLTX we do: lock(queue_lock), lock(xmit_lock),
> release(queue_lock), release(xmit_lock).  With LLTX (without Jamal's
> patch) we do: lock(queue_lock), release(queue_lock), lock(tx_lock),
> release(tx_lock).  LLTX doesn't look correct because it creates a
> race condition window between the two lock-protected sections.  So
> you may want to reconsider Jamal's patch or pull out LLTX...

You're right, it can cause packet reordering if something like this
happens:

	CPU1				CPU2

	lock(queue_lock)
	dequeue
	unlock(queue_lock)
					lock(queue_lock)
					dequeue
					unlock(queue_lock)
					lock(xmit_lock)
					hard_start_xmit
					unlock(xmit_lock)
	lock(xmit_lock)
	hard_start_xmit
	unlock(xmit_lock)

Jamal's patch should fix this.

Regards,
Patrick

* Re: LLTX and netif_stop_queue
From: Eric Lemoine @ 2004-12-23 18:11 UTC
To: Patrick McHardy; +Cc: netdev, hadi, David S. Miller, openib-general

On Thu, 23 Dec 2004 17:37:24 +0100, Patrick McHardy <kaber@trash.net> wrote:

> You're right, it can cause packet reordering

That's exactly what I was thinking of.

> if something like this happens:
>
> 	CPU1				CPU2
>
> 	lock(queue_lock)
> 	dequeue
> 	unlock(queue_lock)
> 					lock(queue_lock)
> 					dequeue
> 					unlock(queue_lock)
> 					lock(xmit_lock)
> 					hard_start_xmit
> 					unlock(xmit_lock)
> 	lock(xmit_lock)
> 	hard_start_xmit
> 	unlock(xmit_lock)
>
> Jamal's patch should fix this.

-- 
Eric

* Re: LLTX and netif_stop_queue
From: Eric Lemoine @ 2004-12-24 16:10 UTC
To: Patrick McHardy; +Cc: netdev, hadi, David S. Miller, openib-general

On Thu, 23 Dec 2004 17:37:24 +0100, Patrick McHardy <kaber@trash.net> wrote:

> You're right, it can cause packet reordering if something like this
> happens:
>
> 	CPU1				CPU2
>
> 	lock(queue_lock)
> 	dequeue
> 	unlock(queue_lock)
> 					lock(queue_lock)
> 					dequeue
> 					unlock(queue_lock)
> 					lock(xmit_lock)
> 					hard_start_xmit
> 					unlock(xmit_lock)
> 	lock(xmit_lock)
> 	hard_start_xmit
> 	unlock(xmit_lock)
>
> Jamal's patch should fix this.

Yes, but requiring drivers to release a lock that they should not even
be aware of doesn't sound good.  Another way would be to keep
dev->queue_lock grabbed when entering start_xmit() and let the driver
drop it (and re-acquire it before it returns) only if it wishes so.
Although I don't like this too much either, that's the best way I can
think of up to now...

-- 
Eric

* Re: LLTX and netif_stop_queue
From: jamal @ 2004-12-28 13:31 UTC
To: Eric Lemoine; +Cc: netdev, Patrick McHardy, openib-general, David S. Miller

On Fri, 2004-12-24 at 11:10, Eric Lemoine wrote:

> Yes, but requiring drivers to release a lock that they should not
> even be aware of doesn't sound good.  Another way would be to keep
> dev->queue_lock grabbed when entering start_xmit() and let the
> driver drop it (and re-acquire it before it returns) only if it
> wishes so.  Although I don't like this too much either, that's the
> best way I can think of up to now...

I am not a big fan of that patch either, but I can't think of a
cleaner way to do it.

The violation already happens with the LLTX flag.  So maybe a big
warning that says "Do this only if your driver is LLTX enabled".  The
other way to do it is to put a check to see if LLTX is enabled before
releasing that lock -- but why the extra cycles?  The driver writer
should know.

cheers,
jamal

* Re: LLTX and netif_stop_queue
From: Eric Lemoine @ 2005-01-02 23:30 UTC
To: hadi; +Cc: netdev, Patrick McHardy, openib-general, David S. Miller

On 28 Dec 2004 08:31:57 -0500, jamal <hadi@cyberus.ca> wrote:

> I am not a big fan of that patch either, but I can't think of a
> cleaner way to do it.
>
> The violation already happens with the LLTX flag.  So maybe a big
> warning that says "Do this only if your driver is LLTX enabled".
> The other way to do it is to put a check to see if LLTX is enabled
> before releasing that lock -- but why the extra cycles?  The driver
> writer should know.

Two (untested) patches implementing what I described above.

The first patch (sch_generic.patch) keeps queue_lock held in
qdisc_restart() when calling hard_start_xmit() of a LLTX driver.  The
second (sungem.patch) makes sungem release queue_lock after grabbing
its private tx lock.

Note that the modifications made to qdisc_restart() are not compatible
with the current LLTX drivers.  All LLTX drivers must be modified
along sungem.patch's lines.  Take sungem.patch as an example of how
things must be done.

Patches apply to the current davem bk snapshot.

-- 
Eric

[-- Attachment #2: sch_generic.patch --]

===== net/sched/sch_generic.c 1.32 vs edited =====
--- 1.32/net/sched/sch_generic.c	2004-12-28 05:51:20 +01:00
+++ edited/net/sched/sch_generic.c	2005-01-02 23:55:18 +01:00
@@ -104,9 +104,17 @@
 	 * locking again. These checks are worth it because
 	 * even uncongested locks can be quite expensive.
 	 * The driver can do trylock like here too, in case
-	 * of lock congestion it should return -1 and the packet
-	 * will be requeued.
+	 * of lock congestion it should return NETDEV_TX_LOCKED
+	 * and the packet will be requeued.
+	 *
+	 * Note: when the driver has LLTX set queue_lock cannot
+	 * be dropped in here if we want to avoid having packets
+	 * passing eachover during transmit. queue_lock can only
+	 * be dropped in hard_start_xmit() after private tx lock
+	 * has been grabbed. If hard_start_xmit() does release
+	 * queue_lock it must grab it again before it returns.
 	 */
+
 	if (!nolock) {
 		if (!spin_trylock(&dev->xmit_lock)) {
 		collision:
@@ -128,12 +136,12 @@
 		}
 		/* Remember that the driver is grabbed by us. */
 		dev->xmit_lock_owner = smp_processor_id();
+
+		/* And release queue */
+		spin_unlock(&dev->queue_lock);
 	}
 	{
-		/* And release queue */
-		spin_unlock(&dev->queue_lock);
-
 		if (!netif_queue_stopped(dev)) {
 			int ret;
 			if (netdev_nit)
@@ -144,14 +152,12 @@
 			if (!nolock) {
 				dev->xmit_lock_owner = -1;
 				spin_unlock(&dev->xmit_lock);
+				spin_lock(&dev->queue_lock);
 			}
-			spin_lock(&dev->queue_lock);
 			return -1;
 		}
-		if (ret == NETDEV_TX_LOCKED && nolock) {
-			spin_lock(&dev->queue_lock);
+		if (ret == NETDEV_TX_LOCKED && nolock)
 			goto collision;
-		}
 	}
 
 	/* NETDEV_TX_BUSY - we need to requeue */
@@ -159,8 +165,8 @@
 	if (!nolock) {
 		dev->xmit_lock_owner = -1;
 		spin_unlock(&dev->xmit_lock);
+		spin_lock(&dev->queue_lock);
 	}
-	spin_lock(&dev->queue_lock);
 	q = dev->qdisc;
 }

[-- Attachment #3: sungem.patch --]

===== drivers/net/sungem.c 1.71 vs edited =====
--- 1.71/drivers/net/sungem.c	2004-11-14 10:45:36 +01:00
+++ edited/drivers/net/sungem.c	2005-01-03 00:00:22 +01:00
@@ -950,6 +950,7 @@
 	return 0;
 }
 
+/* Called with dev->queue_lock held */
 static int gem_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct gem *gp = dev->priv;
@@ -976,8 +977,16 @@
 		return NETDEV_TX_LOCKED;
 	}
 
+	/* We can now drop queue_lock since tx_lock protects us. But remember
+	 * that queue_lock must be reacquired before we return. Moreover we
+	 * must reacquire it before releasing tx_lock to avoid being called
+	 * with the queue stopped.
+	 */
+	spin_unlock(&dev->queue_lock);
+
 	/* This is a hard error, log it. */
 	if (TX_BUFFS_AVAIL(gp) <= (skb_shinfo(skb)->nr_frags + 1)) {
+		spin_lock(&dev->queue_lock);
 		netif_stop_queue(dev);
 		spin_unlock_irqrestore(&gp->tx_lock, flags);
 		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
@@ -1066,6 +1075,7 @@
 		dev->name, entry, skb->len);
 	mb();
 	writel(gp->tx_new, gp->regs + TXDMA_KICK);
+	spin_lock(&dev->queue_lock);
 	spin_unlock_irqrestore(&gp->tx_lock, flags);
 
 	dev->trans_start = jiffies;

* Re: LLTX and netif_stop_queue
From: Eric Lemoine @ 2005-01-03  7:41 UTC
To: hadi; +Cc: netdev, Patrick McHardy, openib-general, David S. Miller

> Two (untested) patches implementing what I described above.
>
> The first patch (sch_generic.patch) keeps queue_lock held in
> qdisc_restart() when calling hard_start_xmit() of a LLTX driver.
> The second (sungem.patch) makes sungem release queue_lock after
> grabbing its private tx lock.
>
> Note that the modifications made to qdisc_restart() are not
> compatible with the current LLTX drivers.  All LLTX drivers must be
> modified along sungem.patch's lines.  Take sungem.patch as an
> example of how things must be done.

Correction: the current LLTX drivers need not be modified to work
correctly.  However, as an optimisation, they can be modified along
sungem.patch's lines.

Thanks,

-- 
Eric

* Re: LLTX and netif_stop_queue
From: jamal @ 2005-01-03 15:04 UTC
To: Eric Lemoine, Andi Kleen; +Cc: Patrick McHardy, David S. Miller, roland, netdev, openib-general

This piece:

-----
+
+	/* And release queue */
+	spin_unlock(&dev->queue_lock);
 }
---------

Isn't there a possibility (higher as you increase the number of
processors) that you will spin for a long time in _the driver_
waiting for the queue lock?

Maybe we need a flag saying "the queue lock has already been grabbed"
passed to the driver.  LLTX is a good alias but insufficient (if you
run your driver on an old kernel which supports LLTX but not the idea
of locking the queue for you).

Andi or anyone else -- has anyone really done performance analysis of
LLTX (with anything other than loopback) and shown it has any value?
Maybe time to just shoot the damn thing.

cheers,
jamal

On Sun, 2005-01-02 at 18:30, Eric Lemoine wrote:

> Two (untested) patches implementing what I described above.
>
> The first patch (sch_generic.patch) keeps queue_lock held in
> qdisc_restart() when calling hard_start_xmit() of a LLTX driver.
> The second (sungem.patch) makes sungem release queue_lock after
> grabbing its private tx lock.
>
> Note that the modifications made to qdisc_restart() are not
> compatible with the current LLTX drivers.  All LLTX drivers must be
> modified along sungem.patch's lines.  Take sungem.patch as an
> example of how things must be done.
>
> Patches apply to the current davem bk snapshot.

* Re: LLTX and netif_stop_queue
From: Eric Lemoine @ 2005-01-03 15:48 UTC
To: hadi; +Cc: netdev, Andi Kleen, openib-general, Patrick McHardy, David S. Miller

On 03 Jan 2005 10:04:21 -0500, jamal <hadi@cyberus.ca> wrote:

> Isn't there a possibility (higher as you increase the number of
> processors) that you will spin for a long time in _the driver_
> waiting for the queue lock?

I don't see how LLTX is different from non-LLTX wrt the time spent
spinning on queue_lock.  What am I missing?

> Andi or anyone else -- has anyone really done performance analysis
> of LLTX (with anything other than loopback) and shown it has any
> value?  Maybe time to just shoot the damn thing.

I do not like LLTX too much either, as I can't see a cleaner way to
get rid of that packet-reordering race condition.

-- 
Eric

* Re: LLTX and netif_stop_queue
From: Roland Dreier @ 2005-01-03 15:57 UTC
To: hadi; +Cc: netdev, Eric Lemoine, Andi Kleen, openib-general, Patrick McHardy, David S. Miller

    jamal> Andi or anyone else -- has anyone really done performance
    jamal> analysis of LLTX (with anything other than loopback) and
    jamal> shown it has any value?  Maybe time to just shoot the damn
    jamal> thing.

For the IPoIB driver, it seems to be worth a couple percent.  It's
hard to get really consistent results, but on my pair of test systems,
TCP throughput with netperf goes up by about 30 Mbps out of roughly
1900 Mbps total.  It's probably not worth the complexity in the net
core if all drivers only get that level of improvement.

 - Roland

* Re: LLTX and netif_stop_queue
From: Eric Lemoine @ 2005-01-03 16:41 UTC
To: Roland Dreier; +Cc: netdev, hadi, Andi Kleen, openib-general, Patrick McHardy, David S. Miller

On Mon, 03 Jan 2005 07:57:14 -0800, Roland Dreier <roland@topspin.com> wrote:

> For the IPoIB driver, it seems to be worth a couple percent.  It's
> hard to get really consistent results, but on my pair of test
> systems, TCP throughput with netperf goes up by about 30 Mbps out of
> roughly 1900 Mbps total.  It's probably not worth the complexity in
> the net core if all drivers only get that level of improvement.

What are your machines?  In particular, how many CPUs do they have?

-- 
Eric

* Re: LLTX and netif_stop_queue
From: Roland Dreier @ 2005-01-03 16:54 UTC
To: Eric Lemoine; +Cc: netdev, hadi, Andi Kleen, openib-general, Patrick McHardy, David S. Miller

    Eric> What are your machines?  In particular, how many CPUs do
    Eric> they have?

Dual Xeons with HT on, so they look like 4 CPUs.

 - R.

* Re: LLTX and netif_stop_queue
From: Eric Lemoine @ 2005-01-03 17:07 UTC
To: Roland Dreier; +Cc: netdev, hadi, Andi Kleen, openib-general, Patrick McHardy, David S. Miller

On Mon, 03 Jan 2005 08:54:59 -0800, Roland Dreier <roland@topspin.com> wrote:

>     Eric> What are your machines?  In particular, how many CPUs do
>     Eric> they have?
>
> Dual Xeons with HT on, so they look like 4 CPUs.

If I understand correctly, LLTX aims at avoiding cache misses on lock
variables (because of cacheline bouncing).  So the effect of LLTX
should increase as the number of CPUs not sharing the same cache
increases.  And two CPUs might not be enough...

-- 
Eric

* Re: Re: LLTX and netif_stop_queue
From: Grant Grundler @ 2005-01-03 17:12 UTC
To: Eric Lemoine; +Cc: netdev, hadi, Andi Kleen, openib-general, David S. Miller, Patrick McHardy

On Mon, Jan 03, 2005 at 06:07:00PM +0100, Eric Lemoine wrote:

> If I understand correctly, LLTX aims at avoiding cache misses on
> lock variables (because of cacheline bouncing).  So the effect of
> LLTX should increase as the number of CPUs not sharing the same
> cache increases.  And two CPUs might not be enough...

Cacheline bouncing usually starts to show up with > 4 CPUs (ie 8 or
more).  Some workloads that Jamal cares about (routing) only need 2
CPUs.

grant

* Re: Re: LLTX and netif_stop_queue
From: jamal @ 2005-01-04  4:18 UTC
To: Grant Grundler; +Cc: netdev, Eric Lemoine, Andi Kleen, openib-general, David S. Miller, Patrick McHardy

On Mon, 2005-01-03 at 12:12, Grant Grundler wrote:

> Some workloads that Jamal cares about (routing) only need 2 CPUs.

What bothers me is more the complexity this has introduced than the
workload.  OTOH, the 1-2% improvement posted by Roland is good
justification.

cheers,
jamal

* Re: Re: LLTX and netif_stop_queue
From: David S. Miller @ 2005-01-19 22:47 UTC
To: hadi; +Cc: netdev, eric.lemoine, ak, openib-general, kaber

On 03 Jan 2005 23:18:14 -0500
jamal <hadi@cyberus.ca> wrote:

> On Mon, 2005-01-03 at 12:12, Grant Grundler wrote:
>
> > Some workloads that Jamal cares about (routing) only need 2 CPUs.
>
> What bothers me is more the complexity this has introduced than the
> workload.  OTOH, the 1-2% improvement posted by Roland is good
> justification.

I think I'm going to put in something like Eric's patch and fix up the
other LLTX drivers as per his sungem patch.

There is a part of me that does want to yank LLTX for non-loopback out
of the tree.

* Re: Re: LLTX and netif_stop_queue
From: Stephen Hemminger @ 2005-01-19 23:18 UTC
To: David S. Miller; +Cc: netdev, eric.lemoine, hadi, ak, openib-general, kaber

On Wed, 19 Jan 2005 14:47:11 -0800
"David S. Miller" <davem@davemloft.net> wrote:

> I think I'm going to put in something like Eric's patch and fix up
> the other LLTX drivers as per his sungem patch.
>
> There is a part of me that does want to yank LLTX for non-loopback
> out of the tree.

Wondering: why not just have the drivers have a way to lock
dev->queue_lock in the interrupt handler, and change the xmit path to
do spin_lock_irqsave?

Any driver that assumes it is being called with IRQs enabled in
transmit is probably already busted anyway.

-- 
Stephen Hemminger <shemminger@osdl.org>

* Re: Re: LLTX and netif_stop_queue
From: David S. Miller @ 2005-01-19 23:41 UTC
To: Stephen Hemminger; +Cc: netdev, eric.lemoine, hadi, ak, openib-general, kaber

On Wed, 19 Jan 2005 15:18:53 -0800
Stephen Hemminger <shemminger@osdl.org> wrote:

> Wondering: why not just have the drivers have a way to lock
> dev->queue_lock in the interrupt handler, and change the xmit path
> to do spin_lock_irqsave?
>
> Any driver that assumes it is being called with IRQs enabled in
> transmit is probably already busted anyway.

Yes, that's an idea.  Effectively, LLTX is working around the fact
that dev->xmit_lock is BH disabling instead of IRQ disabling.  And
there is no fundamental reason for that.

Usually we strive for BH disabling locks in order to decrease the
amount of hard IRQ disabling time in the kernel.  But here, as soon as
we call into ->hard_start_xmit(), the driver typically turns hard IRQs
off anyways.  There are other things using this though, such as
multicast list upload.

If we change dev->xmit_lock to be an IRQ disabling lock, then drivers
can use it in place of their private tx_lock.  We would still need
something special for loopback, which wants and needs no locking at
all.

Also, changing dev->xmit_lock to be IRQ disabling is not %100 trivial.
We'd need to verify the side-effects this has on gems like the
spin_unlock_wait(&dev->xmit_lock) in the Tulip winbond power
management code.  There is another fun case in the 6pack hamradio
driver where it is doing IRQ disabling when taking dev->xmit_lock.

Originally, dev->xmit_lock was added so that drivers that were SMP
dumb could stay that way.  Thus preserving the guarantee that there
would be only one active call into the dev->hard_start_xmit method
across the entire system.  I don't think any of that is relevant any
longer.  All of our network drivers are pretty clean in this regard.

* Re: [openib-general] Re: LLTX and netif_stop_queue
From: Jeff Garzik @ 2005-01-20  0:02 UTC
To: David S. Miller; +Cc: Stephen Hemminger, hadi, iod00d, eric.lemoine, roland, netdev, ak, openib-general, kaber

David S. Miller wrote:

> Usually we strive for BH disabling locks in order to decrease the
> amount of hard IRQ disabling time in the kernel.  But here, as soon
> as we call into ->hard_start_xmit(), the driver typically turns hard
> IRQs off anyways.
[...]
> Also, changing dev->xmit_lock to be IRQ disabling is not %100
> trivial.

Agreed on both counts...

* Re: [openib-general] Re: LLTX and netif_stop_queue
From: Stephen Hemminger @ 2005-01-20  0:46 UTC
To: Jeff Garzik; +Cc: David S. Miller, hadi, iod00d, eric.lemoine, roland, netdev, ak, openib-general, kaber

On Wed, 19 Jan 2005 19:02:52 -0500
Jeff Garzik <jgarzik@pobox.com> wrote:

> David S. Miller wrote:
> > Usually we strive for BH disabling locks in order to decrease the
> > amount of hard IRQ disabling time in the kernel.  But here, as
> > soon as we call into ->hard_start_xmit(), the driver typically
> > turns hard IRQs off anyways.
> [...]
> > Also, changing dev->xmit_lock to be IRQ disabling is not %100
> > trivial.
>
> Agreed on both counts...

But in the end it would be safer than the current LLTX usage in
drivers, which seems like it is opening up problems.

-- 
Stephen Hemminger <shemminger@osdl.org>

* Re: [openib-general] Re: LLTX and netif_stop_queue
From: David S. Miller @ 2005-01-20  0:47 UTC
To: Stephen Hemminger; +Cc: jgarzik, hadi, iod00d, eric.lemoine, roland, netdev, ak, openib-general, kaber

On Wed, 19 Jan 2005 16:46:14 -0800
Stephen Hemminger <shemminger@osdl.org> wrote:

> But in the end it would be safer than the current LLTX usage in
> drivers, which seems like it is opening up problems.

Agreed, see my prototype patch in a posting I just made.

* Re: [openib-general] Re: LLTX and netif_stop_queue
From: Francois Romieu @ 2005-01-20  0:47 UTC
To: David S. Miller; +Cc: Stephen Hemminger, hadi, iod00d, eric.lemoine, roland, netdev, ak, openib-general, kaber

David S. Miller <davem@davemloft.net> :
[...]
> Originally, dev->xmit_lock was added so that drivers that were SMP
> dumb could stay that way.  Thus preserving the guarantee that there
> would be only one active call into the dev->hard_start_xmit method
> across the entire system.  I don't think any of that is relevant any
> longer.  All of our network drivers are pretty clean in this regard.

(nit)

Almost all.  I used the fact that dev->hard_start_xmit was issued in a
BH-disabled context to exchange spin_lock_irqsave for ordered ops on
ring indexes, so as to sync hard_start_xmit and the IRQ handler in the
r8169 driver.  It is a bit sick, but Jon Mason reported it made a
noticeable difference to avoid the irqsave on his 4-way ppc64, and
nobody complained about it.

-- 
Ueimor

* Re: [openib-general] Re: LLTX and netif_stop_queue
From: David S. Miller @ 2005-01-20  0:52 UTC
To: Francois Romieu; +Cc: shemminger, hadi, iod00d, eric.lemoine, roland, netdev, ak, openib-general, kaber

On Thu, 20 Jan 2005 01:47:53 +0100
Francois Romieu <romieu@fr.zoreil.com> wrote:

> Almost all.  I used the fact that dev->hard_start_xmit was issued in
> a BH-disabled context to exchange spin_lock_irqsave for ordered ops
> on ring indexes, so as to sync hard_start_xmit and the IRQ handler
> in the r8169 driver.  It is a bit sick, but Jon Mason reported it
> made a noticeable difference to avoid the irqsave on his 4-way
> ppc64, and nobody complained about it.

Hmmm... ok then.  Unfortunately, my prototype patch I just posted will
make IRQs get disabled in the ->hard_start_xmit() path.

BTW, in your close() routine you do this:

	/* Give a racing hard_start_xmit a few cycles to complete. */
	set_current_state(TASK_UNINTERRUPTIBLE);
	schedule_timeout(1);

This is no guarantee of progress.  You might instead want to do a
synchronize_kernel() or similar, which does in fact guarantee a
quiescent state.  Or, if my patch goes in, use
spin_unlock_wait(&netdev->xmit_lock) ;-)

* Re: [openib-general] Re: LLTX and netif_stop_queue
From: Francois Romieu @ 2005-01-20  1:17 UTC
To: David S. Miller; +Cc: shemminger, hadi, iod00d, eric.lemoine, roland, netdev, ak, openib-general, kaber

David S. Miller <davem@davemloft.net> :
[...]
> Hmmm... ok then.  Unfortunately, my prototype patch I just posted
> will make IRQs get disabled in the ->hard_start_xmit() path.

I'm completely fine with the prototype patch.  It will allow cutting
some code in the driver, and the r8169 has room for improvement
anyway.

Point taken for the close() routine.

-- 
Ueimor

* [PATCH]: was Re: LLTX and netif_stop_queue
From: David S. Miller @ 2005-01-20  0:46 UTC
To: Stephen Hemminger; +Cc: netdev, eric.lemoine, hadi, ak, openib-general, kaber

Ok, here is what something like this might look like.  In summary:

1) dev->xmit_lock is now IRQ disabling instead of BH disabling

2) Drivers can use dev->xmit_lock in place of their private
   driver_priv->tx_lock and this is effectively the same gain LLTX
   gave drivers sans the race condition which started this thread.

3) NETDEV_TX_LOCKED is gone

4) LLTX stays, but this means that the driver's TX routine is %100
   lockless.  It is only to be used by loopback and other queueless
   software devices such as tunnels.

BTW, one next step would be to implement things as Alexey prescribes
in net/core/dev.c, that is to make all software devices %100 lockless
and use LLTX.  Then all the locking code under his comment in
dev_queue_xmit() for the queueless device case can be deleted.  We can
assert the LLTX flag being set in that code path in order to catch
unconverted software devices.

Also, this new semantic of LLTX means that it should never be seen set
in qdisc_restart() any longer.  So we could remove all of that LLTX
complexity from that code and assert on it not being set.  That would
be quite a welcome cleanup.

Comments?

===== Documentation/networking/netdevices.txt 1.5 vs edited =====
--- 1.5/Documentation/networking/netdevices.txt	2004-09-12 16:55:06 -07:00
+++ edited/Documentation/networking/netdevices.txt	2005-01-19 16:09:50 -08:00
@@ -45,10 +45,9 @@
 Synchronization: dev->xmit_lock spinlock.
 	When the driver sets NETIF_F_LLTX in dev->features this will be
 	called without holding xmit_lock. In this case the driver
-	has to lock by itself when needed. It is recommended to use a try lock
-	for this and return -1 when the spin lock fails.
-	The locking there should also properly protect against
-	set_multicast_list
+	has to execute it's transmission routine in a completely lockless
+	manner. It is recommended only for queueless devices such
+	loopback and tunnels.
 	Context: BHs disabled
 	Notes: netif_queue_stopped() is guaranteed false
 	Return codes:
@@ -56,8 +55,6 @@
 	o NETDEV_TX_BUSY Cannot transmit packet, try later
 	  Usually a bug, means queue start/stop flow control is broken in
 	  the driver. Note: the driver must NOT put the skb in its DMA ring.
-	o NETDEV_TX_LOCKED Locking failed, please retry quickly.
-	  Only valid when NETIF_F_LLTX is set.
 
 dev->tx_timeout:
 	Synchronization: dev->xmit_lock spinlock.
===== drivers/infiniband/ulp/ipoib/ipoib.h 1.3 vs edited =====
--- 1.3/drivers/infiniband/ulp/ipoib/ipoib.h	2005-01-15 14:01:50 -08:00
+++ edited/drivers/infiniband/ulp/ipoib/ipoib.h	2005-01-19 15:50:47 -08:00
@@ -104,10 +104,10 @@
 };
 
 /*
- * Device private locking: tx_lock protects members used in TX fast
- * path (and we use LLTX so upper layers don't do extra locking).
- * lock protects everything else.  lock nests inside of tx_lock (ie
- * tx_lock must be acquired first if needed).
+ * Device private locking: netdev->xmit_lock protects members used
+ * in TX fast path.
+ * lock protects everything else.  lock nests inside of xmit_lock (ie
+ * xmit_lock must be acquired first if needed).
 */
 struct ipoib_dev_priv {
 	spinlock_t lock;
@@ -150,7 +150,6 @@
 
 	struct ipoib_buf *rx_ring;
 
-	spinlock_t        tx_lock;
 	struct ipoib_buf *tx_ring;
 	unsigned          tx_head;
 	unsigned          tx_tail;
===== drivers/infiniband/ulp/ipoib/ipoib_ib.c 1.4 vs edited =====
--- 1.4/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2005-01-15 14:01:50 -08:00
+++ edited/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2005-01-19 15:51:36 -08:00
@@ -247,12 +247,12 @@
 
 	dev_kfree_skb_any(tx_req->skb);
 
-	spin_lock_irqsave(&priv->tx_lock, flags);
+	spin_lock_irqsave(&dev->xmit_lock, flags);
 	++priv->tx_tail;
 	if (netif_queue_stopped(dev) &&
 	    priv->tx_head - priv->tx_tail <= IPOIB_TX_RING_SIZE / 2)
 		netif_wake_queue(dev);
-	spin_unlock_irqrestore(&priv->tx_lock, flags);
+	spin_unlock_irqrestore(&dev->xmit_lock, flags);
 
 	if (wc->status != IB_WC_SUCCESS &&
 	    wc->status != IB_WC_WR_FLUSH_ERR)
===== drivers/infiniband/ulp/ipoib/ipoib_main.c 1.3 vs edited =====
--- 1.3/drivers/infiniband/ulp/ipoib/ipoib_main.c	2005-01-15 14:01:48 -08:00
+++ edited/drivers/infiniband/ulp/ipoib/ipoib_main.c	2005-01-19 15:53:27 -08:00
@@ -411,7 +411,7 @@
 
 	/*
 	 * We can only be called from ipoib_start_xmit, so we're
-	 * inside tx_lock -- no need to save/restore flags.
+	 * inside dev->xmit_lock -- no need to save/restore flags.
 	 */
 	spin_lock(&priv->lock);
@@ -483,7 +483,7 @@
 
 	/*
 	 * We can only be called from ipoib_start_xmit, so we're
-	 * inside tx_lock -- no need to save/restore flags.
+	 * inside dev->xmit_lock -- no need to save/restore flags.
 	 */
 	spin_lock(&priv->lock);
@@ -526,27 +526,11 @@
 	spin_unlock(&priv->lock);
 }
 
+/* Called with dev->xmit_lock held and IRQs disabled. */
 static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_neigh *neigh;
-	unsigned long flags;
-
-	local_irq_save(flags);
-	if (!spin_trylock(&priv->tx_lock)) {
-		local_irq_restore(flags);
-		return NETDEV_TX_LOCKED;
-	}
-
-	/*
-	 * Check if our queue is stopped.  Since we have the LLTX bit
-	 * set, we can't rely on netif_stop_queue() preventing our
-	 * xmit function from being called with a full queue.
-	 */
-	if (unlikely(netif_queue_stopped(dev))) {
-		spin_unlock_irqrestore(&priv->tx_lock, flags);
-		return NETDEV_TX_BUSY;
-	}
 
 	if (skb->dst && skb->dst->neighbour) {
 		if (unlikely(!*to_ipoib_neigh(skb->dst->neighbour))) {
@@ -601,7 +585,6 @@
 	}
 
 out:
-	spin_unlock_irqrestore(&priv->tx_lock, flags);
 	return NETDEV_TX_OK;
 }
 
@@ -797,7 +780,7 @@
 	dev->addr_len    = INFINIBAND_ALEN;
 	dev->type        = ARPHRD_INFINIBAND;
 	dev->tx_queue_len = IPOIB_TX_RING_SIZE * 2;
-	dev->features    = NETIF_F_VLAN_CHALLENGED | NETIF_F_LLTX;
+	dev->features    = NETIF_F_VLAN_CHALLENGED;
 
 	/* MTU will be reset when mcast join happens */
 	dev->mtu = IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN;
@@ -812,7 +795,6 @@
 	priv->dev = dev;
 
 	spin_lock_init(&priv->lock);
-	spin_lock_init(&priv->tx_lock);
 
 	init_MUTEX(&priv->mcast_mutex);
 	init_MUTEX(&priv->vlan_mutex);
===== drivers/net/sungem.c 1.72 vs edited =====
--- 1.72/drivers/net/sungem.c	2004-11-05 15:56:15 -08:00
+++ edited/drivers/net/sungem.c	2005-01-19 15:58:26 -08:00
@@ -835,9 +835,9 @@
 	}
 
 	/* Run TX completion thread */
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	gem_tx(dev, gp, gp->status);
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 
 	spin_unlock_irqrestore(&gp->lock, flags);
@@ -932,12 +932,12 @@
 	       readl(gp->regs + MAC_RXCFG));
 
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 
 	gp->reset_task_pending = 2;
 	schedule_work(&gp->reset_task);
 
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 }
@@ -955,7 +955,6 @@
 	struct gem *gp = dev->priv;
 	int entry;
 	u64 ctrl;
-	unsigned long flags;
 
 	ctrl = 0;
 	if (skb->ip_summed == CHECKSUM_HW) {
@@ -969,17 +968,9 @@
 			(csum_stuff_off << 21));
 	}
 
-	local_irq_save(flags);
-	if (!spin_trylock(&gp->tx_lock)) {
-		/* Tell upper layer to requeue */
-		local_irq_restore(flags);
-		return NETDEV_TX_LOCKED;
-	}
-
 	/* This is a hard error, log it. */
 	if (TX_BUFFS_AVAIL(gp) <= (skb_shinfo(skb)->nr_frags + 1)) {
 		netif_stop_queue(dev);
-		spin_unlock_irqrestore(&gp->tx_lock, flags);
 		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
 		       dev->name);
 		return NETDEV_TX_BUSY;
@@ -1066,7 +1057,6 @@
 		dev->name, entry, skb->len);
 	mb();
 	writel(gp->tx_new, gp->regs + TXDMA_KICK);
-	spin_unlock_irqrestore(&gp->tx_lock, flags);
 
 	dev->trans_start = jiffies;
@@ -1097,11 +1087,11 @@
 	}
 
 	spin_lock_irq(&gp->lock);
-	spin_lock(&gp->tx_lock);
+	spin_lock(&dev->xmit_lock);
 	dev->mtu = new_mtu;
 	gp->reset_task_pending = 1;
 	schedule_work(&gp->reset_task);
-	spin_unlock(&gp->tx_lock);
+	spin_unlock(&dev->xmit_lock);
 	spin_unlock_irq(&gp->lock);
 
 	flush_scheduled_work();
@@ -1111,7 +1101,7 @@
 
 #define STOP_TRIES 32
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_stop(struct gem *gp)
 {
 	int limit;
@@ -1137,7 +1127,7 @@
 	printk(KERN_ERR "%s: SW reset is ghetto.\n", gp->dev->name);
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 static void gem_start_dma(struct gem *gp)
 {
 	unsigned long val;
@@ -1162,7 +1152,7 @@
 }
 
-/* Must be invoked under gp->lock and gp->tx_lock. */
+/* Must be invoked under gp->lock and dev->xmit_lock. */
 // XXX dbl check what that function should do when called on PCS PHY
 static void gem_begin_auto_negotiation(struct gem *gp, struct ethtool_cmd *ep)
 {
@@ -1249,7 +1239,7 @@
 /* A link-up condition has occurred, initialize and enable the
  * rest of the chip.
  *
- * Must be invoked under gp->lock and gp->tx_lock.
+ * Must be invoked under gp->lock and dev->xmit_lock. */ static int gem_set_link_modes(struct gem *gp) { @@ -1356,7 +1346,7 @@ return 0; } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static int gem_mdio_link_not_up(struct gem *gp) { switch (gp->lstate) { @@ -1414,7 +1404,7 @@ netif_poll_disable(gp->dev); spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); if (gp->hw_running && gp->opened) { netif_stop_queue(gp->dev); @@ -1430,7 +1420,7 @@ } gp->reset_task_pending = 0; - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); netif_poll_enable(gp->dev); } @@ -1444,7 +1434,7 @@ return; spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); /* If the link of task is still pending, we just * reschedule the link timer @@ -1514,11 +1504,11 @@ restart: mod_timer(&gp->link_timer, jiffies + ((12 * HZ) / 10)); out_unlock: - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static void gem_clean_rings(struct gem *gp) { struct gem_init_block *gb = gp->init_block; @@ -1569,7 +1559,7 @@ } } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static void gem_init_rings(struct gem *gp) { struct gem_init_block *gb = gp->init_block; @@ -1619,7 +1609,7 @@ wmb(); } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static void gem_init_phy(struct gem *gp) { u32 mifcfg; @@ -1757,7 +1747,7 @@ } } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static void gem_init_dma(struct gem *gp) { u64 desc_dma = (u64) gp->gblock_dvma; @@ -1795,7 +1785,7 @@ gp->regs + RXDMA_BLANK); } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static u32 gem_setup_multicast(struct gem *gp) { @@ -1838,7 +1828,7 @@ return rxcfg; } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static void gem_init_mac(struct gem *gp) { unsigned char *e = &gp->dev->dev_addr[0]; @@ -1916,7 +1906,7 @@ writel(0xffffffff, gp->regs + MAC_MCMASK); } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static void gem_init_pause_thresholds(struct gem *gp) { u32 cfg; @@ -2052,7 +2042,7 @@ return 0; } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. 
*/ static void gem_init_hw(struct gem *gp, int restart_link) { /* On Apple's gmac, I initialize the PHY only after @@ -2150,11 +2140,11 @@ if (!gp->wake_on_lan) { spin_lock_irqsave(&gp->lock, flags); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); gem_stop(gp); writel(MAC_TXRST_CMD, gp->regs + MAC_TXRST); writel(MAC_RXRST_CMD, gp->regs + MAC_RXRST); - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irqrestore(&gp->lock, flags); } @@ -2202,9 +2192,9 @@ unsigned long flags; spin_lock_irqsave(&gp->lock, flags); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); gem_stop(gp); - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irqrestore(&gp->lock, flags); } } @@ -2265,9 +2255,9 @@ /* Reset the chip */ spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); gem_stop(gp); - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); gp->hw_running = 1; @@ -2281,7 +2271,7 @@ printk(KERN_ERR "%s: failed to request irq !\n", gp->dev->name); spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); #ifdef CONFIG_PPC_PMAC if (!hw_was_up && gp->pdev->vendor == PCI_VENDOR_ID_APPLE) gem_apple_powerdown(gp); @@ -2290,14 +2280,14 @@ gp->pm_timer.expires = jiffies + 10*HZ; add_timer(&gp->pm_timer); up(&gp->pm_sem); - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); return -EAGAIN; } spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); /* Allocate & setup ring buffers */ gem_init_rings(gp); @@ -2307,7 +2297,7 @@ gp->opened = 1; - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); up(&gp->pm_sem); @@ -2328,7 +2318,7 @@ /* Stop traffic, mark us closed */ spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); gp->opened = 0; @@ -2343,7 +2333,7 @@ /* Bye, the pm timer will finish the job */ free_irq(gp->pdev->irq, (void *) dev); - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); /* Fire the PM timer that will shut us down in about 10 seconds */ @@ -2374,7 +2364,7 @@ /* If the driver is opened, we stop the DMA */ if (gp->opened) { spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); /* Stop traffic, mark us closed */ netif_device_detach(dev); @@ -2385,7 +2375,7 @@ /* Get rid of ring buffers */ gem_clean_rings(gp); - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); if (gp->pdev->vendor == PCI_VENDOR_ID_APPLE) @@ -2419,14 +2409,14 @@ } #endif /* CONFIG_PPC_PMAC */ spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); gem_stop(gp); gp->hw_running = 1; gem_init_rings(gp); gem_init_hw(gp, 1); - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); netif_device_attach(dev); @@ -2447,7 +2437,7 @@ struct net_device_stats *stats = &gp->net_stats; spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&dev->xmit_lock); if (gp->hw_running) { stats->rx_crc_errors += readl(gp->regs + MAC_FCSERR); @@ -2467,7 +2457,7 @@ writel(0, gp->regs + MAC_LCOLL); } - spin_unlock(&gp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&gp->lock); return &gp->net_stats; @@ -2483,7 +2473,7 @@ return; spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&dev->xmit_lock); netif_stop_queue(dev); @@ -2508,7 +2498,7 @@ 
netif_wake_queue(dev); - spin_unlock(&gp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&gp->lock); } @@ -2540,7 +2530,7 @@ /* Return current PHY settings */ spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&dev->xmit_lock); cmd->autoneg = gp->want_autoneg; cmd->speed = gp->phy_mii.speed; cmd->duplex = gp->phy_mii.duplex; @@ -2552,7 +2542,7 @@ */ if (cmd->advertising == 0) cmd->advertising = cmd->supported; - spin_unlock(&gp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&gp->lock); } else { // XXX PCS ? cmd->supported = @@ -2592,9 +2582,9 @@ /* Apply settings and restart link process. */ spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&dev->xmit_lock); gem_begin_auto_negotiation(gp, cmd); - spin_unlock(&gp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&gp->lock); return 0; @@ -2609,9 +2599,9 @@ /* Restart link process. */ spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&dev->xmit_lock); gem_begin_auto_negotiation(gp, NULL); - spin_unlock(&gp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&gp->lock); return 0; @@ -2863,7 +2853,6 @@ gp->msg_enable = DEFAULT_MSG; spin_lock_init(&gp->lock); - spin_lock_init(&gp->tx_lock); init_MUTEX(&gp->pm_sem); init_timer(&gp->link_timer); @@ -2899,9 +2888,9 @@ gem_apple_powerup(gp); #endif spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&dev->xmit_lock); gem_stop(gp); - spin_unlock(&gp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&gp->lock); /* Fill up the mii_phy structure (even if we won't use it) */ @@ -2967,11 +2956,11 @@ /* Detect & init PHY, start autoneg */ spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&dev->xmit_lock); gp->hw_running = 1; gem_init_phy(gp); gem_begin_auto_negotiation(gp, NULL); - spin_unlock(&gp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&gp->lock); if (gp->phy_type == phy_mii_mdio0 || @@ -2982,7 +2971,7 @@ pci_set_drvdata(pdev, dev); /* GEM can do it all... */ - dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_LLTX; + dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM; if (pci_using_dac) dev->features |= NETIF_F_HIGHDMA; ===== drivers/net/sungem.h 1.18 vs edited ===== --- 1.18/drivers/net/sungem.h 2004-09-15 08:16:05 -07:00 +++ edited/drivers/net/sungem.h 2005-01-19 15:54:08 -08:00 @@ -953,7 +953,6 @@ struct gem { spinlock_t lock; - spinlock_t tx_lock; void __iomem *regs; int rx_new, rx_old; int tx_new, tx_old; ===== drivers/net/tg3.c 1.227 vs edited ===== --- 1.227/drivers/net/tg3.c 2005-01-17 13:40:31 -08:00 +++ edited/drivers/net/tg3.c 2005-01-19 15:43:48 -08:00 @@ -2816,9 +2816,9 @@ /* run TX completion thread */ if (sblk->idx[0].tx_consumer != tp->tx_cons) { - spin_lock(&tp->tx_lock); + spin_lock(&netdev->xmit_lock); tg3_tx(tp); - spin_unlock(&tp->tx_lock); + spin_unlock(&netdev->xmit_lock); } spin_unlock_irqrestore(&tp->lock, flags); @@ -2939,7 +2939,7 @@ tg3_netif_stop(tp); spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&tp->dev->xmit_lock); restart_timer = tp->tg3_flags2 & TG3_FLG2_RESTART_TIMER; tp->tg3_flags2 &= ~TG3_FLG2_RESTART_TIMER; @@ -2949,7 +2949,7 @@ tg3_netif_start(tp); - spin_unlock(&tp->tx_lock); + spin_unlock(&tp->dev->xmit_lock); spin_unlock_irq(&tp->lock); if (restart_timer) @@ -3048,6 +3048,7 @@ (base + len + 8 < base)); } +/* dev->xmit_lock is held and IRQs are disabled. 
*/ static int tg3_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct tg3 *tp = netdev_priv(dev); @@ -3055,39 +3056,12 @@ unsigned int i; u32 len, entry, base_flags, mss; int would_hit_hwbug; - unsigned long flags; len = skb_headlen(skb); - /* No BH disabling for tx_lock here. We are running in BH disabled - * context and TX reclaim runs via tp->poll inside of a software - * interrupt. Rejoice! - * - * Actually, things are not so simple. If we are to take a hw - * IRQ here, we can deadlock, consider: - * - * CPU1 CPU2 - * tg3_start_xmit - * take tp->tx_lock - * tg3_timer - * take tp->lock - * tg3_interrupt - * spin on tp->lock - * spin on tp->tx_lock - * - * So we really do need to disable interrupts when taking - * tx_lock here. - */ - local_irq_save(flags); - if (!spin_trylock(&tp->tx_lock)) { - local_irq_restore(flags); - return NETDEV_TX_LOCKED; - } - /* This is a hard error, log it. */ if (unlikely(TX_BUFFS_AVAIL(tp) <= (skb_shinfo(skb)->nr_frags + 1))) { netif_stop_queue(dev); - spin_unlock_irqrestore(&tp->tx_lock, flags); printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n", dev->name); return NETDEV_TX_BUSY; @@ -3224,7 +3198,7 @@ entry, len, last_plus_one, &start, mss)) - goto out_unlock; + goto out; entry = start; } @@ -3236,9 +3210,8 @@ if (TX_BUFFS_AVAIL(tp) <= (MAX_SKB_FRAGS + 1)) netif_stop_queue(dev); -out_unlock: +out: mmiowb(); - spin_unlock_irqrestore(&tp->tx_lock, flags); dev->trans_start = jiffies; @@ -3273,7 +3246,7 @@ tg3_netif_stop(tp); spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); tg3_halt(tp); @@ -3283,7 +3256,7 @@ tg3_netif_start(tp); - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); return 0; @@ -5574,7 +5547,7 @@ unsigned long flags; spin_lock_irqsave(&tp->lock, flags); - spin_lock(&tp->tx_lock); + spin_lock(&tp->dev->xmit_lock); /* All of this garbage is because when using non-tagged * IRQ status the mailbox/status_block protocol the chip @@ -5590,7 +5563,7 @@ if (!(tr32(WDMAC_MODE) & WDMAC_MODE_ENABLE)) { tp->tg3_flags2 |= TG3_FLG2_RESTART_TIMER; - spin_unlock(&tp->tx_lock); + spin_unlock(&tp->dev->xmit_lock); spin_unlock_irqrestore(&tp->lock, flags); schedule_work(&tp->reset_task); return; @@ -5659,7 +5632,7 @@ tp->asf_counter = tp->asf_multiplier; } - spin_unlock(&tp->tx_lock); + spin_unlock(&tp->dev->xmit_lock); spin_unlock_irqrestore(&tp->lock, flags); tp->timer.expires = jiffies + tp->timer_offset; @@ -5672,12 +5645,12 @@ int err; spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); tg3_disable_ints(tp); tp->tg3_flags &= ~TG3_FLAG_INIT_COMPLETE; - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); /* The placement of this call is tied @@ -5696,7 +5669,7 @@ } spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); err = tg3_init_hw(tp); if (err) { @@ -5716,7 +5689,7 @@ tp->tg3_flags |= TG3_FLAG_INIT_COMPLETE; } - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); if (err) { @@ -5726,11 +5699,11 @@ } spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); tg3_enable_ints(tp); - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); netif_start_queue(dev); @@ -5978,7 +5951,7 @@ del_timer_sync(&tp->timer); spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); #if 0 tg3_dump_state(tp); #endif @@ -5992,7 +5965,7 @@ TG3_FLAG_GOT_SERDES_FLOWCTL); 
netif_carrier_off(tp->dev); - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); free_irq(dev->irq, dev); @@ -6296,9 +6269,9 @@ struct tg3 *tp = netdev_priv(dev); spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); __tg3_set_rx_mode(dev); - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); } @@ -6322,7 +6295,7 @@ memset(p, 0, TG3_REGDUMP_LEN); spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); #define __GET_REG32(reg) (*(p)++ = tr32(reg)) #define GET_REG32_LOOP(base,len) \ @@ -6372,7 +6345,7 @@ #undef GET_REG32_LOOP #undef GET_REG32_1 - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); } @@ -6496,7 +6469,7 @@ } spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); tp->link_config.autoneg = cmd->autoneg; if (cmd->autoneg == AUTONEG_ENABLE) { @@ -6510,7 +6483,7 @@ } tg3_setup_phy(tp, 1); - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); return 0; @@ -6627,7 +6600,7 @@ tg3_netif_stop(tp); spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); tp->rx_pending = ering->rx_pending; @@ -6640,7 +6613,7 @@ tg3_halt(tp); tg3_init_hw(tp); tg3_netif_start(tp); - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); return 0; @@ -6661,7 +6634,7 @@ tg3_netif_stop(tp); spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); if (epause->autoneg) tp->tg3_flags |= TG3_FLAG_PAUSE_AUTONEG; else @@ -6677,7 +6650,7 @@ tg3_halt(tp); tg3_init_hw(tp); tg3_netif_start(tp); - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); return 0; @@ -6803,14 +6776,14 @@ struct tg3 *tp = netdev_priv(dev); spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); tp->vlgrp = grp; /* Update RX_MODE_KEEP_VLAN_TAG bit in RX_MODE register. 
*/ __tg3_set_rx_mode(dev); - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); } @@ -6819,10 +6792,10 @@ struct tg3 *tp = netdev_priv(dev); spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); if (tp->vlgrp) tp->vlgrp->vlan_devices[vid] = NULL; - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); } #endif @@ -8241,7 +8214,6 @@ if (pci_using_dac) dev->features |= NETIF_F_HIGHDMA; - dev->features |= NETIF_F_LLTX; #if TG3_VLAN_TAG_USED dev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX; dev->vlan_rx_register = tg3_vlan_rx_register; @@ -8283,7 +8255,6 @@ tp->grc_mode |= GRC_MODE_BSWAP_NONFRM_DATA; #endif spin_lock_init(&tp->lock); - spin_lock_init(&tp->tx_lock); spin_lock_init(&tp->indirect_lock); INIT_WORK(&tp->reset_task, tg3_reset_task, tp); @@ -8496,23 +8467,23 @@ del_timer_sync(&tp->timer); spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); tg3_disable_ints(tp); - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); netif_device_detach(dev); spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); tg3_halt(tp); - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); err = tg3_set_power_state(tp, state); if (err) { spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); tg3_init_hw(tp); @@ -8522,7 +8493,7 @@ netif_device_attach(dev); tg3_netif_start(tp); - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); } @@ -8547,7 +8518,7 @@ netif_device_attach(dev); spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + spin_lock(&dev->xmit_lock); tg3_init_hw(tp); @@ -8558,7 +8529,7 @@ tg3_netif_start(tp); - spin_unlock(&tp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&tp->lock); return 0; ===== drivers/net/tg3.h 1.52 vs edited ===== --- 1.52/drivers/net/tg3.h 2004-11-15 15:41:31 -08:00 +++ edited/drivers/net/tg3.h 2005-01-19 15:37:29 -08:00 @@ -1980,12 +1980,11 @@ * lock: Held during all operations except TX packet * processing. * - * tx_lock: Held during tg3_start_xmit{,_4gbug} and tg3_tx + * dev->xmit_lock: Held during tg3_start_xmit and tg3_tx * * If you want to shut up all asynchronous processing you must - * acquire both locks, 'lock' taken before 'tx_lock'. IRQs must - * be disabled to take 'lock' but only softirq disabling is - * necessary for acquisition of 'tx_lock'. + * acquire both locks, 'lock' taken before 'xmit_lock'. IRQs must + * be disabled to take either lock. 
*/ spinlock_t lock; spinlock_t indirect_lock; @@ -2003,8 +2002,6 @@ u32 tx_prod; u32 tx_cons; u32 tx_pending; - - spinlock_t tx_lock; struct tg3_tx_buffer_desc *tx_ring; struct tx_ring_info *tx_buffers; ===== drivers/net/e1000/e1000.h 1.58 vs edited ===== --- 1.58/drivers/net/e1000/e1000.h 2004-10-25 18:13:57 -07:00 +++ edited/drivers/net/e1000/e1000.h 2005-01-19 15:45:31 -08:00 @@ -209,7 +209,6 @@ /* TX */ struct e1000_desc_ring tx_ring; - spinlock_t tx_lock; uint32_t txd_cmd; uint32_t tx_int_delay; uint32_t tx_abs_int_delay; ===== drivers/net/e1000/e1000_main.c 1.145 vs edited ===== --- 1.145/drivers/net/e1000/e1000_main.c 2005-01-10 21:03:21 -08:00 +++ edited/drivers/net/e1000/e1000_main.c 2005-01-19 15:46:15 -08:00 @@ -520,9 +520,6 @@ if(pci_using_dac) netdev->features |= NETIF_F_HIGHDMA; - /* hard_start_xmit is safe against parallel locking */ - netdev->features |= NETIF_F_LLTX; - /* before reading the EEPROM, reset the controller to * put the device in a known good starting state */ @@ -732,7 +729,6 @@ atomic_set(&adapter->irq_sem, 1); spin_lock_init(&adapter->stats_lock); - spin_lock_init(&adapter->tx_lock); return 0; } @@ -1308,7 +1304,7 @@ /* Check for Promiscuous and All Multicast modes */ - spin_lock_irqsave(&adapter->tx_lock, flags); + spin_lock_irqsave(&netdev->xmit_lock, flags); rctl = E1000_READ_REG(hw, RCTL); @@ -1359,7 +1355,7 @@ if(hw->mac_type == e1000_82542_rev2_0) e1000_leave_82542_rst(adapter); - spin_unlock_irqrestore(&adapter->tx_lock, flags); + spin_unlock_irqrestore(&netdev->xmit_lock, flags); } /* Need to wait a few seconds after link up to get diagnostic information from @@ -1786,6 +1782,8 @@ } #define TXD_USE_COUNT(S, X) (((S) >> (X)) + 1 ) + +/* Called with dev->xmit_lock held and interrupts disabled. */ static int e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev) { @@ -1794,7 +1792,6 @@ unsigned int max_txd_pwr = E1000_MAX_TXD_PWR; unsigned int tx_flags = 0; unsigned int len = skb->len; - unsigned long flags; unsigned int nr_frags = 0; unsigned int mss = 0; int count = 0; @@ -1838,18 +1835,10 @@ if(adapter->pcix_82544) count += nr_frags; - local_irq_save(flags); - if (!spin_trylock(&adapter->tx_lock)) { - /* Collision - tell upper layer to requeue */ - local_irq_restore(flags); - return NETDEV_TX_LOCKED; - } - /* need: count + 2 desc gap to keep tail from touching * head, otherwise try next time */ if(unlikely(E1000_DESC_UNUSED(&adapter->tx_ring) < count + 2)) { netif_stop_queue(netdev); - spin_unlock_irqrestore(&adapter->tx_lock, flags); return NETDEV_TX_BUSY; } @@ -1857,7 +1846,6 @@ if(unlikely(e1000_82547_fifo_workaround(adapter, skb))) { netif_stop_queue(netdev); mod_timer(&adapter->tx_fifo_stall_timer, jiffies); - spin_unlock_irqrestore(&adapter->tx_lock, flags); return NETDEV_TX_BUSY; } } @@ -1884,7 +1872,6 @@ if(unlikely(E1000_DESC_UNUSED(&adapter->tx_ring) < MAX_SKB_FRAGS + 2)) netif_stop_queue(netdev); - spin_unlock_irqrestore(&adapter->tx_lock, flags); return NETDEV_TX_OK; } @@ -2234,13 +2221,13 @@ tx_ring->next_to_clean = i; - spin_lock(&adapter->tx_lock); + spin_lock(&netdev->xmit_lock); if(unlikely(cleaned && netif_queue_stopped(netdev) && netif_carrier_ok(netdev))) netif_wake_queue(netdev); - spin_unlock(&adapter->tx_lock); + spin_unlock(&netdev->xmit_lock); return cleaned; } ===== include/linux/netdevice.h 1.96 vs edited ===== --- 1.96/include/linux/netdevice.h 2005-01-17 13:13:31 -08:00 +++ edited/include/linux/netdevice.h 2005-01-19 16:07:45 -08:00 @@ -76,7 +76,6 @@ /* Driver transmit return codes */ #define NETDEV_TX_OK 0 /* 
driver took care of packet */ #define NETDEV_TX_BUSY 1 /* driver tx path was busy*/ -#define NETDEV_TX_LOCKED -1 /* driver tx lock was already taken */ /* * Compute the worst case header length according to the protocols @@ -415,7 +414,7 @@ #define NETIF_F_HW_VLAN_FILTER 512 /* Receive filtering on VLAN */ #define NETIF_F_VLAN_CHALLENGED 1024 /* Device cannot handle VLAN packets */ #define NETIF_F_TSO 2048 /* Can offload TCP/IP segmentation */ -#define NETIF_F_LLTX 4096 /* LockLess TX */ +#define NETIF_F_LLTX 4096 /* Do not grab xmit_lock during ->hard_start_xmit */ /* Called after device is detached from network. */ void (*uninit)(struct net_device *dev); @@ -894,9 +893,11 @@ static inline void netif_tx_disable(struct net_device *dev) { - spin_lock_bh(&dev->xmit_lock); + unsigned long flags; + + spin_lock_irqsave(&dev->xmit_lock, flags); netif_stop_queue(dev); - spin_unlock_bh(&dev->xmit_lock); + spin_unlock_irqrestore(&dev->xmit_lock, flags); } /* These functions live elsewhere (drivers/net/net_init.c, but related) */ ===== net/atm/clip.c 1.43 vs edited ===== --- 1.43/net/atm/clip.c 2004-10-01 14:50:07 -07:00 +++ edited/net/atm/clip.c 2005-01-19 16:04:55 -08:00 @@ -97,7 +97,7 @@ printk(KERN_CRIT "!clip_vcc->entry (clip_vcc %p)\n",clip_vcc); return; } - spin_lock_bh(&entry->neigh->dev->xmit_lock); /* block clip_start_xmit() */ + spin_lock_irq(&entry->neigh->dev->xmit_lock); /* block clip_start_xmit() */ entry->neigh->used = jiffies; for (walk = &entry->vccs; *walk; walk = &(*walk)->next) if (*walk == clip_vcc) { @@ -121,7 +121,7 @@ printk(KERN_CRIT "ATMARP: unlink_clip_vcc failed (entry %p, vcc " "0x%p)\n",entry,clip_vcc); out: - spin_unlock_bh(&entry->neigh->dev->xmit_lock); + spin_unlock_irq(&entry->neigh->dev->xmit_lock); } /* The neighbour entry n->lock is held. 
*/ ===== net/core/dev.c 1.181 vs edited ===== --- 1.181/net/core/dev.c 2005-01-13 20:41:04 -08:00 +++ edited/net/core/dev.c 2005-01-19 16:00:08 -08:00 @@ -1190,7 +1190,7 @@ #define HARD_TX_LOCK(dev, cpu) { \ if ((dev->features & NETIF_F_LLTX) == 0) { \ - spin_lock(&dev->xmit_lock); \ + spin_lock_irq(&dev->xmit_lock); \ dev->xmit_lock_owner = cpu; \ } \ } @@ -1198,7 +1198,7 @@ #define HARD_TX_UNLOCK(dev) { \ if ((dev->features & NETIF_F_LLTX) == 0) { \ dev->xmit_lock_owner = -1; \ - spin_unlock(&dev->xmit_lock); \ + spin_unlock_irq(&dev->xmit_lock); \ } \ } ===== net/core/dev_mcast.c 1.4 vs edited ===== --- 1.4/net/core/dev_mcast.c 2004-10-20 01:37:15 -07:00 +++ edited/net/core/dev_mcast.c 2005-01-19 16:05:35 -08:00 @@ -93,9 +93,9 @@ void dev_mc_upload(struct net_device *dev) { - spin_lock_bh(&dev->xmit_lock); + spin_lock_irq(&dev->xmit_lock); __dev_mc_upload(dev); - spin_unlock_bh(&dev->xmit_lock); + spin_unlock_irq(&dev->xmit_lock); } /* @@ -107,7 +107,7 @@ int err = 0; struct dev_mc_list *dmi, **dmip; - spin_lock_bh(&dev->xmit_lock); + spin_lock_irq(&dev->xmit_lock); for (dmip = &dev->mc_list; (dmi = *dmip) != NULL; dmip = &dmi->next) { /* @@ -139,13 +139,13 @@ */ __dev_mc_upload(dev); - spin_unlock_bh(&dev->xmit_lock); + spin_unlock_irq(&dev->xmit_lock); return 0; } } err = -ENOENT; done: - spin_unlock_bh(&dev->xmit_lock); + spin_unlock_irq(&dev->xmit_lock); return err; } @@ -160,7 +160,7 @@ dmi1 = (struct dev_mc_list *)kmalloc(sizeof(*dmi), GFP_ATOMIC); - spin_lock_bh(&dev->xmit_lock); + spin_lock_irq(&dev->xmit_lock); for (dmi = dev->mc_list; dmi != NULL; dmi = dmi->next) { if (memcmp(dmi->dmi_addr, addr, dmi->dmi_addrlen) == 0 && dmi->dmi_addrlen == alen) { @@ -176,7 +176,7 @@ } if ((dmi = dmi1) == NULL) { - spin_unlock_bh(&dev->xmit_lock); + spin_unlock_irq(&dev->xmit_lock); return -ENOMEM; } memcpy(dmi->dmi_addr, addr, alen); @@ -189,11 +189,11 @@ __dev_mc_upload(dev); - spin_unlock_bh(&dev->xmit_lock); + spin_unlock_irq(&dev->xmit_lock); return 0; done: - spin_unlock_bh(&dev->xmit_lock); + spin_unlock_irq(&dev->xmit_lock); if (dmi1) kfree(dmi1); return err; @@ -205,7 +205,7 @@ void dev_mc_discard(struct net_device *dev) { - spin_lock_bh(&dev->xmit_lock); + spin_lock_irq(&dev->xmit_lock); while (dev->mc_list != NULL) { struct dev_mc_list *tmp = dev->mc_list; @@ -216,7 +216,7 @@ } dev->mc_count = 0; - spin_unlock_bh(&dev->xmit_lock); + spin_unlock_irq(&dev->xmit_lock); } #ifdef CONFIG_PROC_FS @@ -251,7 +251,7 @@ struct dev_mc_list *m; struct net_device *dev = v; - spin_lock_bh(&dev->xmit_lock); + spin_lock_irq(&dev->xmit_lock); for (m = dev->mc_list; m; m = m->next) { int i; @@ -263,7 +263,7 @@ seq_putc(seq, '\n'); } - spin_unlock_bh(&dev->xmit_lock); + spin_unlock_irq(&dev->xmit_lock); return 0; } ===== net/core/netpoll.c 1.24 vs edited ===== --- 1.24/net/core/netpoll.c 2005-01-13 20:41:04 -08:00 +++ edited/net/core/netpoll.c 2005-01-19 16:05:56 -08:00 @@ -188,7 +188,7 @@ return; } - spin_lock(&np->dev->xmit_lock); + spin_lock_irq(&np->dev->xmit_lock); np->dev->xmit_lock_owner = smp_processor_id(); /* @@ -197,7 +197,7 @@ */ if (netif_queue_stopped(np->dev)) { np->dev->xmit_lock_owner = -1; - spin_unlock(&np->dev->xmit_lock); + spin_unlock_irq(&np->dev->xmit_lock); netpoll_poll(np); goto repeat; @@ -205,7 +205,7 @@ status = np->dev->hard_start_xmit(skb, np->dev); np->dev->xmit_lock_owner = -1; - spin_unlock(&np->dev->xmit_lock); + spin_unlock_irq(&np->dev->xmit_lock); /* transmit busy */ if(status) { ===== net/core/pktgen.c 1.21 vs edited ===== --- 1.21/net/core/pktgen.c 2005-01-10 
11:32:10 -08:00 +++ edited/net/core/pktgen.c 2005-01-19 16:07:58 -08:00 @@ -2664,12 +2664,11 @@ } } - spin_lock_bh(&odev->xmit_lock); + spin_lock_irq(&odev->xmit_lock); if (!netif_queue_stopped(odev)) { u64 now; atomic_inc(&(pkt_dev->skb->users)); -retry_now: ret = odev->hard_start_xmit(pkt_dev->skb, odev); if (likely(ret == NETDEV_TX_OK)) { pkt_dev->last_ok = 1; @@ -2677,10 +2676,6 @@ pkt_dev->seq_num++; pkt_dev->tx_bytes += pkt_dev->cur_pkt_size; - } else if (ret == NETDEV_TX_LOCKED - && (odev->features & NETIF_F_LLTX)) { - cpu_relax(); - goto retry_now; } else { /* Retry it next time */ atomic_dec(&(pkt_dev->skb->users)); @@ -2716,7 +2711,7 @@ pkt_dev->next_tx_ns = 0; } - spin_unlock_bh(&odev->xmit_lock); + spin_unlock_irq(&odev->xmit_lock); /* If pkt_dev->count is zero, then run forever */ if ((pkt_dev->count != 0) && (pkt_dev->sofar >= pkt_dev->count)) { ===== net/sched/sch_generic.c 1.33 vs edited ===== --- 1.33/net/sched/sch_generic.c 2005-01-13 20:41:07 -08:00 +++ edited/net/sched/sch_generic.c 2005-01-19 16:10:49 -08:00 @@ -99,16 +99,11 @@ if ((skb = q->dequeue(q)) != NULL) { unsigned nolock = (dev->features & NETIF_F_LLTX); /* - * When the driver has LLTX set it does its own locking - * in start_xmit. No need to add additional overhead by - * locking again. These checks are worth it because - * even uncongested locks can be quite expensive. - * The driver can do trylock like here too, in case - * of lock congestion it should return -1 and the packet - * will be requeued. + * When the driver has LLTX set it does not require any + * locking in start_xmit. */ if (!nolock) { - if (!spin_trylock(&dev->xmit_lock)) { + if (!spin_trylock_irq(&dev->xmit_lock)) { collision: /* So, someone grabbed the driver. */ @@ -143,22 +138,18 @@ if (ret == NETDEV_TX_OK) { if (!nolock) { dev->xmit_lock_owner = -1; - spin_unlock(&dev->xmit_lock); + spin_unlock_irq(&dev->xmit_lock); } spin_lock(&dev->queue_lock); return -1; } - if (ret == NETDEV_TX_LOCKED && nolock) { - spin_lock(&dev->queue_lock); - goto collision; - } } /* NETDEV_TX_BUSY - we need to requeue */ /* Release the driver */ if (!nolock) { dev->xmit_lock_owner = -1; - spin_unlock(&dev->xmit_lock); + spin_unlock_irq(&dev->xmit_lock); } spin_lock(&dev->queue_lock); q = dev->qdisc; @@ -186,7 +177,7 @@ { struct net_device *dev = (struct net_device *)arg; - spin_lock(&dev->xmit_lock); + spin_lock_irq(&dev->xmit_lock); if (dev->qdisc != &noop_qdisc) { if (netif_device_present(dev) && netif_running(dev) && @@ -200,7 +191,7 @@ dev_hold(dev); } } - spin_unlock(&dev->xmit_lock); + spin_unlock_irq(&dev->xmit_lock); dev_put(dev); } @@ -224,17 +215,17 @@ static void dev_watchdog_up(struct net_device *dev) { - spin_lock_bh(&dev->xmit_lock); + spin_lock_irq(&dev->xmit_lock); __netdev_watchdog_up(dev); - spin_unlock_bh(&dev->xmit_lock); + spin_unlock_irq(&dev->xmit_lock); } static void dev_watchdog_down(struct net_device *dev) { - spin_lock_bh(&dev->xmit_lock); + spin_lock_irq(&dev->xmit_lock); if (del_timer(&dev->watchdog_timer)) __dev_put(dev); - spin_unlock_bh(&dev->xmit_lock); + spin_unlock_irq(&dev->xmit_lock); } /* "NOOP" scheduler: the best scheduler, recommended for all interfaces ===== net/sched/sch_teql.c 1.16 vs edited ===== --- 1.16/net/sched/sch_teql.c 2004-10-21 22:21:24 -07:00 +++ edited/net/sched/sch_teql.c 2005-01-19 16:06:11 -08:00 @@ -301,12 +301,12 @@ switch (teql_resolve(skb, skb_res, slave)) { case 0: - if (spin_trylock(&slave->xmit_lock)) { + if (spin_trylock_irq(&slave->xmit_lock)) { slave->xmit_lock_owner = smp_processor_id(); if 
(!netif_queue_stopped(slave) && slave->hard_start_xmit(skb, slave) == 0) { slave->xmit_lock_owner = -1; - spin_unlock(&slave->xmit_lock); + spin_unlock_irq(&slave->xmit_lock); master->slaves = NEXT_SLAVE(q); netif_wake_queue(dev); master->stats.tx_packets++; @@ -314,7 +314,7 @@ return 0; } slave->xmit_lock_owner = -1; - spin_unlock(&slave->xmit_lock); + spin_unlock_irq(&slave->xmit_lock); } if (netif_queue_stopped(dev)) busy = 1; ^ permalink raw reply [flat|nested] 62+ messages in thread
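Taken together, points 1), 2) and 4) of the message above make the contract for a converted driver quite small. As a rough sketch only -- MYDRV_TX_RING_SIZE and the mydrv_* names are invented, and the head/tail accounting just mirrors the IPoIB hunks -- the TX and TX-reclaim paths of such a driver reduce to something like this:

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/jiffies.h>

#define MYDRV_TX_RING_SIZE 64		/* hypothetical ring size */

struct mydrv_priv {
	unsigned tx_head;		/* next descriptor to fill */
	unsigned tx_tail;		/* next descriptor to reclaim */
};

static void mydrv_post_tx_desc(struct mydrv_priv *mp, struct sk_buff *skb)
{
	/* Hypothetical: fill a DMA descriptor and kick the hardware. */
}

/* Called with dev->xmit_lock held and IRQs disabled -- no trylock,
 * no NETDEV_TX_LOCKED, no private tx_lock. */
static int mydrv_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct mydrv_priv *mp = netdev_priv(dev);

	/* No other CPU can be in the TX path while we hold
	 * dev->xmit_lock, so a full ring here really would mean
	 * broken queue start/stop flow control. */
	if (unlikely(mp->tx_head - mp->tx_tail >= MYDRV_TX_RING_SIZE)) {
		netif_stop_queue(dev);
		return NETDEV_TX_BUSY;
	}

	mydrv_post_tx_desc(mp, skb);
	++mp->tx_head;
	dev->trans_start = jiffies;

	if (mp->tx_head - mp->tx_tail == MYDRV_TX_RING_SIZE)
		netif_stop_queue(dev);	/* reclaim will wake us */

	return NETDEV_TX_OK;
}

/* TX reclaim, e.g. from the device interrupt handler.  It takes
 * dev->xmit_lock itself, so queue wakeups are ordered against the
 * stop in mydrv_start_xmit(). */
static void mydrv_tx_complete(struct net_device *dev)
{
	struct mydrv_priv *mp = netdev_priv(dev);
	unsigned long flags;

	spin_lock_irqsave(&dev->xmit_lock, flags);
	++mp->tx_tail;
	if (netif_queue_stopped(dev) &&
	    mp->tx_head - mp->tx_tail <= MYDRV_TX_RING_SIZE / 2)
		netif_wake_queue(dev);
	spin_unlock_irqrestore(&dev->xmit_lock, flags);
}

Note what the scheme buys back: because the core holds dev->xmit_lock around hard_start_xmit(), a full ring with the queue awake is once again a genuine driver bug, which is why the "Tx Ring full when queue awake!" printk survives in the tg3 and sungem hunks above.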
* Re: [PATCH]: was Re: LLTX and netif_stop_queue 2005-01-20 0:46 ` [PATCH]: was " David S. Miller @ 2005-01-20 3:14 ` Andi Kleen 2005-01-20 7:05 ` David S. Miller 2005-01-20 3:43 ` Roland Dreier 2005-01-20 4:01 ` jamal 2 siblings, 1 reply; 62+ messages in thread From: Andi Kleen @ 2005-01-20 3:14 UTC (permalink / raw) To: David S. Miller Cc: Stephen Hemminger, hadi, iod00d, eric.lemoine, roland, netdev, ak, openib-general, kaber On Wed, Jan 19, 2005 at 04:46:40PM -0800, David S. Miller wrote: > Ok, here is what something like this might look like. > > In summary: > > 1) dev->xmit_lock is now IRQ disabling instead of BH disabling > 2) Drivers can use dev->xmit_lock in place of their private > driver_priv->tx_lock and this is effectively the same gain > LLTX gave drivers sans the race condition which started this > thread. > 3) NETDEV_TX_LOCKED is gone > 4) LLTX stays, but this means that the driver's TX routine is > 100% lockless. It is only to be used by loopback and other > queueless software devices such as tunnels. Looks good to me and much cleaner than what I initially did. Thanks. -Andi ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH]: was Re: LLTX and netif_stop_queue 2005-01-20 3:14 ` Andi Kleen @ 2005-01-20 7:05 ` David S. Miller 0 siblings, 0 replies; 62+ messages in thread From: David S. Miller @ 2005-01-20 7:05 UTC (permalink / raw) To: Andi Kleen Cc: shemminger, hadi, iod00d, eric.lemoine, roland, netdev, ak, openib-general, kaber On Thu, 20 Jan 2005 04:14:23 +0100 Andi Kleen <ak@suse.de> wrote: > Looks good to me and much cleaner than what I initially did. > Thanks. Thanks for the review, Andi. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH]: was Re: LLTX and netif_stop_queue 2005-01-20 0:46 ` [PATCH]: was " David S. Miller 2005-01-20 3:14 ` Andi Kleen @ 2005-01-20 3:43 ` Roland Dreier 2005-01-20 7:05 ` David S. Miller 2005-01-20 4:01 ` jamal 2 siblings, 1 reply; 62+ messages in thread From: Roland Dreier @ 2005-01-20 3:43 UTC (permalink / raw) To: David S. Miller Cc: netdev, eric.lemoine, hadi, ak, openib-general, kaber, Stephen Hemminger Looks reasonable to me. I'll try the IPoIB changes out to make sure it still works. Thanks, Roland ^ permalink raw reply [flat|nested] 62+ messages in thread
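The updated patch that follows reworks the multicast paths, and the hazard it removes is a classic ABBA inversion between a driver's private lock and dev->xmit_lock. A miniature illustration, with invented mydrv_* names standing in for the tg3/sungem code:

#include <linux/netdevice.h>

struct mydrv_priv {
	spinlock_t lock;	/* private chip lock, like tg3's tp->lock */
};

/*
 * The documented ordering is: private lock first, then dev->xmit_lock.
 * Since the multicast upload path now calls ->set_multicast_list()
 * while holding dev->xmit_lock, a handler that then took the private
 * lock could deadlock against any path using the normal ordering:
 *
 *	CPU1 (timer/reset path)           CPU2 (dev_mc_upload)
 *	spin_lock_irqsave(&mp->lock)      spin_lock_irq(&dev->xmit_lock)
 *	spin_lock(&dev->xmit_lock)        ->set_multicast_list(dev)
 *	  ... spins, held by CPU2 ...       spin_lock(&mp->lock)
 *	                                    ... spins, held by CPU1: ABBA.
 */

static void mydrv_load_mc_hash(struct mydrv_priv *mp,
			       struct dev_mc_list *mc_list)
{
	/* Hypothetical: walk mc_list and program the hardware hash. */
}

/* Called with dev->xmit_lock held and IRQs disabled.  No private lock:
 * programming the RX mode / multicast hash is safe against concurrent
 * RX processing. */
static void mydrv_set_multicast(struct net_device *dev)
{
	struct mydrv_priv *mp = netdev_priv(dev);

	mydrv_load_mc_hash(mp, dev->mc_list);
}

Paths that really must fence off the handler can still do so by taking dev->xmit_lock (together with the private lock) themselves -- point 2) of the message below.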
* Re: [PATCH]: was Re: LLTX and netif_stop_queue 2005-01-20 3:43 ` Roland Dreier @ 2005-01-20 7:05 ` David S. Miller 2005-01-20 13:51 ` Tommy Christensen ` (2 more replies) 0 siblings, 3 replies; 62+ messages in thread From: David S. Miller @ 2005-01-20 7:05 UTC (permalink / raw) To: Roland Dreier Cc: netdev, eric.lemoine, hadi, ak, openib-general, kaber, shemminger On Wed, 19 Jan 2005 19:43:42 -0800 Roland Dreier <roland@topspin.com> wrote: > Looks reasonable to me. I'll try the IPoIB changes out to make sure > it still works. Great. I actually tested my patch out, and there was a deadlock added to the tg3 and sungem drivers, which is fixed in this updated version. Basically, since the multicast code now takes dev->xmit_lock, the ->set_multicast_list() routines should not grab it anymore. To prevent ABBA deadlock, I had to remove the acquisition of the usual priv->lock in tg3 and sungem multicast list settings. Grabbing priv->lock was overkill, and this new scheme of just holding dev->xmit_lock is OK, because: 1) Messing with the RX mode and multicast hash is safe even in parallel with RX packet processing. 2) All code paths that want to shut up all chip programming code paths still take both dev->xmit_lock and priv->lock, and are thus still able to block out ->set_multicast_list() processing. So I'm going to check this in and push upstream. Let me know if folks find any other errors. Thanks. ===== Documentation/networking/netdevices.txt 1.5 vs edited ===== --- 1.5/Documentation/networking/netdevices.txt 2004-09-12 16:55:06 -07:00 +++ edited/Documentation/networking/netdevices.txt 2005-01-19 16:09:50 -08:00 @@ -45,10 +45,9 @@ Synchronization: dev->xmit_lock spinlock. When the driver sets NETIF_F_LLTX in dev->features this will be called without holding xmit_lock. In this case the driver - has to lock by itself when needed. It is recommended to use a try lock - for this and return -1 when the spin lock fails. - The locking there should also properly protect against - set_multicast_list + has to execute its transmission routine in a completely lockless + manner. It is recommended only for queueless devices such as + loopback and tunnels. Context: BHs disabled Notes: netif_queue_stopped() is guaranteed false Return codes: @@ -56,8 +55,6 @@ o NETDEV_TX_BUSY Cannot transmit packet, try later Usually a bug, means queue start/stop flow control is broken in the driver. Note: the driver must NOT put the skb in its DMA ring. - o NETDEV_TX_LOCKED Locking failed, please retry quickly. - Only valid when NETIF_F_LLTX is set. dev->tx_timeout: Synchronization: dev->xmit_lock spinlock. ===== drivers/infiniband/ulp/ipoib/ipoib.h 1.3 vs edited ===== --- 1.3/drivers/infiniband/ulp/ipoib/ipoib.h 2005-01-15 14:01:50 -08:00 +++ edited/drivers/infiniband/ulp/ipoib/ipoib.h 2005-01-19 15:50:47 -08:00 @@ -104,10 +104,10 @@ }; /* - * Device private locking: tx_lock protects members used in TX fast - * path (and we use LLTX so upper layers don't do extra locking). - * lock protects everything else. lock nests inside of tx_lock (ie - * tx_lock must be acquired first if needed). + * Device private locking: netdev->xmit_lock protects members used + * in TX fast path. + * lock protects everything else. lock nests inside of xmit_lock (ie + * xmit_lock must be acquired first if needed).
*/ struct ipoib_dev_priv { spinlock_t lock; @@ -150,7 +150,6 @@ struct ipoib_buf *rx_ring; - spinlock_t tx_lock; struct ipoib_buf *tx_ring; unsigned tx_head; unsigned tx_tail; ===== drivers/infiniband/ulp/ipoib/ipoib_ib.c 1.4 vs edited ===== --- 1.4/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2005-01-15 14:01:50 -08:00 +++ edited/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2005-01-19 15:51:36 -08:00 @@ -247,12 +247,12 @@ dev_kfree_skb_any(tx_req->skb); - spin_lock_irqsave(&priv->tx_lock, flags); + spin_lock_irqsave(&dev->xmit_lock, flags); ++priv->tx_tail; if (netif_queue_stopped(dev) && priv->tx_head - priv->tx_tail <= IPOIB_TX_RING_SIZE / 2) netif_wake_queue(dev); - spin_unlock_irqrestore(&priv->tx_lock, flags); + spin_unlock_irqrestore(&dev->xmit_lock, flags); if (wc->status != IB_WC_SUCCESS && wc->status != IB_WC_WR_FLUSH_ERR) ===== drivers/infiniband/ulp/ipoib/ipoib_main.c 1.3 vs edited ===== --- 1.3/drivers/infiniband/ulp/ipoib/ipoib_main.c 2005-01-15 14:01:48 -08:00 +++ edited/drivers/infiniband/ulp/ipoib/ipoib_main.c 2005-01-19 15:53:27 -08:00 @@ -411,7 +411,7 @@ /* * We can only be called from ipoib_start_xmit, so we're - * inside tx_lock -- no need to save/restore flags. + * inside dev->xmit_lock -- no need to save/restore flags. */ spin_lock(&priv->lock); @@ -483,7 +483,7 @@ /* * We can only be called from ipoib_start_xmit, so we're - * inside tx_lock -- no need to save/restore flags. + * inside dev->xmit_lock -- no need to save/restore flags. */ spin_lock(&priv->lock); @@ -526,27 +526,11 @@ spin_unlock(&priv->lock); } +/* Called with dev->xmit_lock held and IRQs disabled. */ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_neigh *neigh; - unsigned long flags; - - local_irq_save(flags); - if (!spin_trylock(&priv->tx_lock)) { - local_irq_restore(flags); - return NETDEV_TX_LOCKED; - } - - /* - * Check if our queue is stopped. Since we have the LLTX bit - * set, we can't rely on netif_stop_queue() preventing our - * xmit function from being called with a full queue. 
- */ - if (unlikely(netif_queue_stopped(dev))) { - spin_unlock_irqrestore(&priv->tx_lock, flags); - return NETDEV_TX_BUSY; - } if (skb->dst && skb->dst->neighbour) { if (unlikely(!*to_ipoib_neigh(skb->dst->neighbour))) { @@ -601,7 +585,6 @@ } out: - spin_unlock_irqrestore(&priv->tx_lock, flags); return NETDEV_TX_OK; } @@ -797,7 +780,7 @@ dev->addr_len = INFINIBAND_ALEN; dev->type = ARPHRD_INFINIBAND; dev->tx_queue_len = IPOIB_TX_RING_SIZE * 2; - dev->features = NETIF_F_VLAN_CHALLENGED | NETIF_F_LLTX; + dev->features = NETIF_F_VLAN_CHALLENGED; /* MTU will be reset when mcast join happens */ dev->mtu = IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN; @@ -812,7 +795,6 @@ priv->dev = dev; spin_lock_init(&priv->lock); - spin_lock_init(&priv->tx_lock); init_MUTEX(&priv->mcast_mutex); init_MUTEX(&priv->vlan_mutex); ===== drivers/net/sungem.c 1.72 vs edited ===== --- 1.72/drivers/net/sungem.c 2004-11-05 15:56:15 -08:00 +++ edited/drivers/net/sungem.c 2005-01-19 22:29:14 -08:00 @@ -835,9 +835,9 @@ } /* Run TX completion thread */ - spin_lock(&gp->tx_lock); + spin_lock(&dev->xmit_lock); gem_tx(dev, gp, gp->status); - spin_unlock(&gp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irqrestore(&gp->lock, flags); @@ -932,12 +932,12 @@ readl(gp->regs + MAC_RXCFG)); spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&dev->xmit_lock); gp->reset_task_pending = 2; schedule_work(&gp->reset_task); - spin_unlock(&gp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&gp->lock); } @@ -955,7 +955,6 @@ struct gem *gp = dev->priv; int entry; u64 ctrl; - unsigned long flags; ctrl = 0; if (skb->ip_summed == CHECKSUM_HW) { @@ -969,17 +968,9 @@ (csum_stuff_off << 21)); } - local_irq_save(flags); - if (!spin_trylock(&gp->tx_lock)) { - /* Tell upper layer to requeue */ - local_irq_restore(flags); - return NETDEV_TX_LOCKED; - } - /* This is a hard error, log it. */ if (TX_BUFFS_AVAIL(gp) <= (skb_shinfo(skb)->nr_frags + 1)) { netif_stop_queue(dev); - spin_unlock_irqrestore(&gp->tx_lock, flags); printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n", dev->name); return NETDEV_TX_BUSY; @@ -1066,7 +1057,6 @@ dev->name, entry, skb->len); mb(); writel(gp->tx_new, gp->regs + TXDMA_KICK); - spin_unlock_irqrestore(&gp->tx_lock, flags); dev->trans_start = jiffies; @@ -1097,11 +1087,11 @@ } spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&dev->xmit_lock); dev->mtu = new_mtu; gp->reset_task_pending = 1; schedule_work(&gp->reset_task); - spin_unlock(&gp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&gp->lock); flush_scheduled_work(); @@ -1111,7 +1101,7 @@ #define STOP_TRIES 32 -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static void gem_stop(struct gem *gp) { int limit; @@ -1137,7 +1127,7 @@ printk(KERN_ERR "%s: SW reset is ghetto.\n", gp->dev->name); } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static void gem_start_dma(struct gem *gp) { unsigned long val; @@ -1162,7 +1152,7 @@ } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ // XXX dbl check what that function should do when called on PCS PHY static void gem_begin_auto_negotiation(struct gem *gp, struct ethtool_cmd *ep) { @@ -1249,7 +1239,7 @@ /* A link-up condition has occurred, initialize and enable the * rest of the chip. * - * Must be invoked under gp->lock and gp->tx_lock. 
+ * Must be invoked under gp->lock and dev->xmit_lock. */ static int gem_set_link_modes(struct gem *gp) { @@ -1356,7 +1346,7 @@ return 0; } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static int gem_mdio_link_not_up(struct gem *gp) { switch (gp->lstate) { @@ -1414,7 +1404,7 @@ netif_poll_disable(gp->dev); spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); if (gp->hw_running && gp->opened) { netif_stop_queue(gp->dev); @@ -1430,7 +1420,7 @@ } gp->reset_task_pending = 0; - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); netif_poll_enable(gp->dev); } @@ -1444,7 +1434,7 @@ return; spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); /* If the link of task is still pending, we just * reschedule the link timer @@ -1514,11 +1504,11 @@ restart: mod_timer(&gp->link_timer, jiffies + ((12 * HZ) / 10)); out_unlock: - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static void gem_clean_rings(struct gem *gp) { struct gem_init_block *gb = gp->init_block; @@ -1569,7 +1559,7 @@ } } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static void gem_init_rings(struct gem *gp) { struct gem_init_block *gb = gp->init_block; @@ -1619,7 +1609,7 @@ wmb(); } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static void gem_init_phy(struct gem *gp) { u32 mifcfg; @@ -1757,7 +1747,7 @@ } } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static void gem_init_dma(struct gem *gp) { u64 desc_dma = (u64) gp->gblock_dvma; @@ -1795,7 +1785,7 @@ gp->regs + RXDMA_BLANK); } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under dev->xmit_lock. */ static u32 gem_setup_multicast(struct gem *gp) { @@ -1838,7 +1828,7 @@ return rxcfg; } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static void gem_init_mac(struct gem *gp) { unsigned char *e = &gp->dev->dev_addr[0]; @@ -1916,7 +1906,7 @@ writel(0xffffffff, gp->regs + MAC_MCMASK); } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. */ static void gem_init_pause_thresholds(struct gem *gp) { u32 cfg; @@ -2052,7 +2042,7 @@ return 0; } -/* Must be invoked under gp->lock and gp->tx_lock. */ +/* Must be invoked under gp->lock and dev->xmit_lock. 
*/ static void gem_init_hw(struct gem *gp, int restart_link) { /* On Apple's gmac, I initialize the PHY only after @@ -2150,11 +2140,11 @@ if (!gp->wake_on_lan) { spin_lock_irqsave(&gp->lock, flags); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); gem_stop(gp); writel(MAC_TXRST_CMD, gp->regs + MAC_TXRST); writel(MAC_RXRST_CMD, gp->regs + MAC_RXRST); - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irqrestore(&gp->lock, flags); } @@ -2202,9 +2192,9 @@ unsigned long flags; spin_lock_irqsave(&gp->lock, flags); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); gem_stop(gp); - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irqrestore(&gp->lock, flags); } } @@ -2265,9 +2255,9 @@ /* Reset the chip */ spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); gem_stop(gp); - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); gp->hw_running = 1; @@ -2281,7 +2271,7 @@ printk(KERN_ERR "%s: failed to request irq !\n", gp->dev->name); spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); #ifdef CONFIG_PPC_PMAC if (!hw_was_up && gp->pdev->vendor == PCI_VENDOR_ID_APPLE) gem_apple_powerdown(gp); @@ -2290,14 +2280,14 @@ gp->pm_timer.expires = jiffies + 10*HZ; add_timer(&gp->pm_timer); up(&gp->pm_sem); - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); return -EAGAIN; } spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); /* Allocate & setup ring buffers */ gem_init_rings(gp); @@ -2307,7 +2297,7 @@ gp->opened = 1; - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); up(&gp->pm_sem); @@ -2328,7 +2318,7 @@ /* Stop traffic, mark us closed */ spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); gp->opened = 0; @@ -2343,7 +2333,7 @@ /* Bye, the pm timer will finish the job */ free_irq(gp->pdev->irq, (void *) dev); - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); /* Fire the PM timer that will shut us down in about 10 seconds */ @@ -2374,7 +2364,7 @@ /* If the driver is opened, we stop the DMA */ if (gp->opened) { spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); /* Stop traffic, mark us closed */ netif_device_detach(dev); @@ -2385,7 +2375,7 @@ /* Get rid of ring buffers */ gem_clean_rings(gp); - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); if (gp->pdev->vendor == PCI_VENDOR_ID_APPLE) @@ -2419,14 +2409,14 @@ } #endif /* CONFIG_PPC_PMAC */ spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&gp->dev->xmit_lock); gem_stop(gp); gp->hw_running = 1; gem_init_rings(gp); gem_init_hw(gp, 1); - spin_unlock(&gp->tx_lock); + spin_unlock(&gp->dev->xmit_lock); spin_unlock_irq(&gp->lock); netif_device_attach(dev); @@ -2447,7 +2437,7 @@ struct net_device_stats *stats = &gp->net_stats; spin_lock_irq(&gp->lock); - spin_lock(&gp->tx_lock); + spin_lock(&dev->xmit_lock); if (gp->hw_running) { stats->rx_crc_errors += readl(gp->regs + MAC_FCSERR); @@ -2467,12 +2457,13 @@ writel(0, gp->regs + MAC_LCOLL); } - spin_unlock(&gp->tx_lock); + spin_unlock(&dev->xmit_lock); spin_unlock_irq(&gp->lock); return &gp->net_stats; } +/* Called with dev->xmit_lock held and IRQs disabled. 
 */
 static void gem_set_multicast(struct net_device *dev)
 {
     struct gem *gp = dev->priv;
@@ -2482,9 +2473,6 @@
     if (!gp->hw_running)
         return;

-    spin_lock_irq(&gp->lock);
-    spin_lock(&gp->tx_lock);
-
     netif_stop_queue(dev);

     rxcfg = readl(gp->regs + MAC_RXCFG);
@@ -2507,9 +2495,6 @@
     writel(rxcfg, gp->regs + MAC_RXCFG);

     netif_wake_queue(dev);
-
-    spin_unlock(&gp->tx_lock);
-    spin_unlock_irq(&gp->lock);
 }

 static void gem_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info)
@@ -2540,7 +2525,7 @@
         /* Return current PHY settings */
         spin_lock_irq(&gp->lock);
-        spin_lock(&gp->tx_lock);
+        spin_lock(&dev->xmit_lock);
         cmd->autoneg = gp->want_autoneg;
         cmd->speed = gp->phy_mii.speed;
         cmd->duplex = gp->phy_mii.duplex;
@@ -2552,7 +2537,7 @@
          */
         if (cmd->advertising == 0)
             cmd->advertising = cmd->supported;
-        spin_unlock(&gp->tx_lock);
+        spin_unlock(&dev->xmit_lock);
         spin_unlock_irq(&gp->lock);
     } else { // XXX PCS ?
         cmd->supported =
@@ -2592,9 +2577,9 @@

     /* Apply settings and restart link process. */
     spin_lock_irq(&gp->lock);
-    spin_lock(&gp->tx_lock);
+    spin_lock(&dev->xmit_lock);
     gem_begin_auto_negotiation(gp, cmd);
-    spin_unlock(&gp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&gp->lock);

     return 0;
@@ -2609,9 +2594,9 @@

     /* Restart link process. */
     spin_lock_irq(&gp->lock);
-    spin_lock(&gp->tx_lock);
+    spin_lock(&dev->xmit_lock);
     gem_begin_auto_negotiation(gp, NULL);
-    spin_unlock(&gp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&gp->lock);

     return 0;
@@ -2863,7 +2848,6 @@
     gp->msg_enable = DEFAULT_MSG;

     spin_lock_init(&gp->lock);
-    spin_lock_init(&gp->tx_lock);
     init_MUTEX(&gp->pm_sem);

     init_timer(&gp->link_timer);
@@ -2899,9 +2883,9 @@
     gem_apple_powerup(gp);
 #endif
     spin_lock_irq(&gp->lock);
-    spin_lock(&gp->tx_lock);
+    spin_lock(&dev->xmit_lock);
     gem_stop(gp);
-    spin_unlock(&gp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&gp->lock);

     /* Fill up the mii_phy structure (even if we won't use it) */
@@ -2967,11 +2951,11 @@

     /* Detect & init PHY, start autoneg */
     spin_lock_irq(&gp->lock);
-    spin_lock(&gp->tx_lock);
+    spin_lock(&dev->xmit_lock);
     gp->hw_running = 1;
     gem_init_phy(gp);
     gem_begin_auto_negotiation(gp, NULL);
-    spin_unlock(&gp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&gp->lock);

     if (gp->phy_type == phy_mii_mdio0 ||
@@ -2982,7 +2966,7 @@
     pci_set_drvdata(pdev, dev);

     /* GEM can do it all... */
-    dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_LLTX;
+    dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM;
     if (pci_using_dac)
         dev->features |= NETIF_F_HIGHDMA;

===== drivers/net/sungem.h 1.18 vs edited =====
--- 1.18/drivers/net/sungem.h  2004-09-15 08:16:05 -07:00
+++ edited/drivers/net/sungem.h  2005-01-19 15:54:08 -08:00
@@ -953,7 +953,6 @@

 struct gem {
     spinlock_t lock;
-    spinlock_t tx_lock;
     void __iomem *regs;
     int rx_new, rx_old;
     int tx_new, tx_old;

===== drivers/net/tg3.c 1.227 vs edited =====
--- 1.227/drivers/net/tg3.c  2005-01-17 13:40:31 -08:00
+++ edited/drivers/net/tg3.c  2005-01-19 16:41:08 -08:00
@@ -2816,9 +2816,9 @@

     /* run TX completion thread */
     if (sblk->idx[0].tx_consumer != tp->tx_cons) {
-        spin_lock(&tp->tx_lock);
+        spin_lock(&netdev->xmit_lock);
         tg3_tx(tp);
-        spin_unlock(&tp->tx_lock);
+        spin_unlock(&netdev->xmit_lock);
     }

     spin_unlock_irqrestore(&tp->lock, flags);
@@ -2939,7 +2939,7 @@
     tg3_netif_stop(tp);

     spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&tp->dev->xmit_lock);

     restart_timer = tp->tg3_flags2 & TG3_FLG2_RESTART_TIMER;
     tp->tg3_flags2 &= ~TG3_FLG2_RESTART_TIMER;
@@ -2949,7 +2949,7 @@

     tg3_netif_start(tp);

-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&tp->dev->xmit_lock);
     spin_unlock_irq(&tp->lock);

     if (restart_timer)
@@ -3048,6 +3048,7 @@
            (base + len + 8 < base));
 }

+/* dev->xmit_lock is held and IRQs are disabled. */
 static int tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
     struct tg3 *tp = netdev_priv(dev);
@@ -3055,39 +3056,12 @@
     unsigned int i;
     u32 len, entry, base_flags, mss;
     int would_hit_hwbug;
-    unsigned long flags;

     len = skb_headlen(skb);

-    /* No BH disabling for tx_lock here.  We are running in BH disabled
-     * context and TX reclaim runs via tp->poll inside of a software
-     * interrupt.  Rejoice!
-     *
-     * Actually, things are not so simple.  If we are to take a hw
-     * IRQ here, we can deadlock, consider:
-     *
-     *       CPU1           CPU2
-     *   tg3_start_xmit
-     *   take tp->tx_lock
-     *                  tg3_timer
-     *                  take tp->lock
-     *   tg3_interrupt
-     *   spin on tp->lock
-     *                  spin on tp->tx_lock
-     *
-     * So we really do need to disable interrupts when taking
-     * tx_lock here.
-     */
-    local_irq_save(flags);
-    if (!spin_trylock(&tp->tx_lock)) {
-        local_irq_restore(flags);
-        return NETDEV_TX_LOCKED;
-    }
-
     /* This is a hard error, log it. */
     if (unlikely(TX_BUFFS_AVAIL(tp) <= (skb_shinfo(skb)->nr_frags + 1))) {
         netif_stop_queue(dev);
-        spin_unlock_irqrestore(&tp->tx_lock, flags);
         printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
                dev->name);
         return NETDEV_TX_BUSY;
     }
@@ -3224,7 +3198,7 @@
                        entry, len, last_plus_one, &start, mss))
-            goto out_unlock;
+            goto out;

         entry = start;
     }
@@ -3236,9 +3210,8 @@
     if (TX_BUFFS_AVAIL(tp) <= (MAX_SKB_FRAGS + 1))
         netif_stop_queue(dev);

-out_unlock:
+out:
     mmiowb();
-    spin_unlock_irqrestore(&tp->tx_lock, flags);

     dev->trans_start = jiffies;
@@ -3273,7 +3246,7 @@
     tg3_netif_stop(tp);

     spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&dev->xmit_lock);

     tg3_halt(tp);

@@ -3283,7 +3256,7 @@

     tg3_netif_start(tp);

-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&tp->lock);

     return 0;
@@ -5574,7 +5547,7 @@
     unsigned long flags;

     spin_lock_irqsave(&tp->lock, flags);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&tp->dev->xmit_lock);

     /* All of this garbage is because when using non-tagged
      * IRQ status the mailbox/status_block protocol the chip
@@ -5590,7 +5563,7 @@
         if (!(tr32(WDMAC_MODE) & WDMAC_MODE_ENABLE)) {
             tp->tg3_flags2 |= TG3_FLG2_RESTART_TIMER;
-            spin_unlock(&tp->tx_lock);
+            spin_unlock(&tp->dev->xmit_lock);
             spin_unlock_irqrestore(&tp->lock, flags);
             schedule_work(&tp->reset_task);
             return;
@@ -5659,7 +5632,7 @@
         tp->asf_counter = tp->asf_multiplier;
     }

-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&tp->dev->xmit_lock);
     spin_unlock_irqrestore(&tp->lock, flags);

     tp->timer.expires = jiffies + tp->timer_offset;
@@ -5672,12 +5645,12 @@
     int err;

     spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&dev->xmit_lock);

     tg3_disable_ints(tp);
     tp->tg3_flags &= ~TG3_FLAG_INIT_COMPLETE;

-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&tp->lock);

     /* The placement of this call is tied
@@ -5696,7 +5669,7 @@
     }

     spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&dev->xmit_lock);

     err = tg3_init_hw(tp);
     if (err) {
@@ -5716,7 +5689,7 @@
         tp->tg3_flags |= TG3_FLAG_INIT_COMPLETE;
     }

-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&tp->lock);

     if (err) {
@@ -5726,11 +5699,11 @@
     }

     spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&dev->xmit_lock);

     tg3_enable_ints(tp);

-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&tp->lock);

     netif_start_queue(dev);
@@ -5978,7 +5951,7 @@
     del_timer_sync(&tp->timer);

     spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&dev->xmit_lock);
 #if 0
     tg3_dump_state(tp);
 #endif
@@ -5992,7 +5965,7 @@
               TG3_FLAG_GOT_SERDES_FLOWCTL);
     netif_carrier_off(tp->dev);

-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&tp->lock);

     free_irq(dev->irq, dev);
@@ -6291,15 +6264,10 @@
     }
 }

+/* Called with dev->xmit_lock held and IRQs disabled. */
 static void tg3_set_rx_mode(struct net_device *dev)
 {
-    struct tg3 *tp = netdev_priv(dev);
-
-    spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
     __tg3_set_rx_mode(dev);
-    spin_unlock(&tp->tx_lock);
-    spin_unlock_irq(&tp->lock);
 }

 #define TG3_REGDUMP_LEN  (32 * 1024)
@@ -6322,7 +6290,7 @@
     memset(p, 0, TG3_REGDUMP_LEN);

     spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&dev->xmit_lock);

 #define __GET_REG32(reg) (*(p)++ = tr32(reg))
 #define GET_REG32_LOOP(base,len) \
@@ -6372,7 +6340,7 @@
 #undef GET_REG32_LOOP
 #undef GET_REG32_1

-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&tp->lock);
 }
@@ -6496,7 +6464,7 @@
     }

     spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&dev->xmit_lock);

     tp->link_config.autoneg = cmd->autoneg;
     if (cmd->autoneg == AUTONEG_ENABLE) {
@@ -6510,7 +6478,7 @@
     }

     tg3_setup_phy(tp, 1);
-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&tp->lock);

     return 0;
@@ -6627,7 +6595,7 @@
     tg3_netif_stop(tp);

     spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&dev->xmit_lock);

     tp->rx_pending = ering->rx_pending;
@@ -6640,7 +6608,7 @@
     tg3_halt(tp);
     tg3_init_hw(tp);
     tg3_netif_start(tp);
-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&tp->lock);

     return 0;
@@ -6661,7 +6629,7 @@
     tg3_netif_stop(tp);

     spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&dev->xmit_lock);

     if (epause->autoneg)
         tp->tg3_flags |= TG3_FLAG_PAUSE_AUTONEG;
     else
@@ -6677,7 +6645,7 @@
     tg3_halt(tp);
     tg3_init_hw(tp);
     tg3_netif_start(tp);
-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&tp->lock);

     return 0;
@@ -6803,14 +6771,14 @@
     struct tg3 *tp = netdev_priv(dev);

     spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&dev->xmit_lock);

     tp->vlgrp = grp;

     /* Update RX_MODE_KEEP_VLAN_TAG bit in RX_MODE register. */
     __tg3_set_rx_mode(dev);

-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&tp->lock);
 }
@@ -6819,10 +6787,10 @@
     struct tg3 *tp = netdev_priv(dev);

     spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&dev->xmit_lock);
     if (tp->vlgrp)
         tp->vlgrp->vlan_devices[vid] = NULL;
-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&tp->lock);
 }
 #endif
@@ -8241,7 +8209,6 @@
     if (pci_using_dac)
         dev->features |= NETIF_F_HIGHDMA;
-    dev->features |= NETIF_F_LLTX;
 #if TG3_VLAN_TAG_USED
     dev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
     dev->vlan_rx_register = tg3_vlan_rx_register;
@@ -8283,7 +8250,6 @@
     tp->grc_mode |= GRC_MODE_BSWAP_NONFRM_DATA;
 #endif
     spin_lock_init(&tp->lock);
-    spin_lock_init(&tp->tx_lock);
     spin_lock_init(&tp->indirect_lock);
     INIT_WORK(&tp->reset_task, tg3_reset_task, tp);
@@ -8496,23 +8462,23 @@
     del_timer_sync(&tp->timer);

     spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&dev->xmit_lock);
     tg3_disable_ints(tp);
-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&tp->lock);

     netif_device_detach(dev);

     spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&dev->xmit_lock);
     tg3_halt(tp);
-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&tp->lock);

     err = tg3_set_power_state(tp, state);
     if (err) {
         spin_lock_irq(&tp->lock);
-        spin_lock(&tp->tx_lock);
+        spin_lock(&dev->xmit_lock);

         tg3_init_hw(tp);

@@ -8522,7 +8488,7 @@
         netif_device_attach(dev);
         tg3_netif_start(tp);

-        spin_unlock(&tp->tx_lock);
+        spin_unlock(&dev->xmit_lock);
         spin_unlock_irq(&tp->lock);
     }

@@ -8547,7 +8513,7 @@
     netif_device_attach(dev);

     spin_lock_irq(&tp->lock);
-    spin_lock(&tp->tx_lock);
+    spin_lock(&dev->xmit_lock);

     tg3_init_hw(tp);

@@ -8558,7 +8524,7 @@

     tg3_netif_start(tp);

-    spin_unlock(&tp->tx_lock);
+    spin_unlock(&dev->xmit_lock);
     spin_unlock_irq(&tp->lock);

     return 0;

===== drivers/net/tg3.h 1.52 vs edited =====
--- 1.52/drivers/net/tg3.h  2004-11-15 15:41:31 -08:00
+++ edited/drivers/net/tg3.h  2005-01-19 15:37:29 -08:00
@@ -1980,12 +1980,11 @@
      * lock: Held during all operations except TX packet
      *       processing.
      *
-     * tx_lock: Held during tg3_start_xmit{,_4gbug} and tg3_tx
+     * dev->xmit_lock: Held during tg3_start_xmit and tg3_tx
      *
      * If you want to shut up all asynchronous processing you must
-     * acquire both locks, 'lock' taken before 'tx_lock'.  IRQs must
-     * be disabled to take 'lock' but only softirq disabling is
-     * necessary for acquisition of 'tx_lock'.
+     * acquire both locks, 'lock' taken before 'xmit_lock'.  IRQs must
+     * be disabled to take either lock.
      */
     spinlock_t lock;
     spinlock_t indirect_lock;
@@ -2003,8 +2002,6 @@
     u32 tx_prod;
     u32 tx_cons;
     u32 tx_pending;
-
-    spinlock_t tx_lock;

     struct tg3_tx_buffer_desc *tx_ring;
     struct tx_ring_info *tx_buffers;

===== drivers/net/e1000/e1000.h 1.58 vs edited =====
--- 1.58/drivers/net/e1000/e1000.h  2004-10-25 18:13:57 -07:00
+++ edited/drivers/net/e1000/e1000.h  2005-01-19 15:45:31 -08:00
@@ -209,7 +209,6 @@
     /* TX */
     struct e1000_desc_ring tx_ring;
-    spinlock_t tx_lock;
     uint32_t txd_cmd;
     uint32_t tx_int_delay;
     uint32_t tx_abs_int_delay;

===== drivers/net/e1000/e1000_main.c 1.145 vs edited =====
--- 1.145/drivers/net/e1000/e1000_main.c  2005-01-10 21:03:21 -08:00
+++ edited/drivers/net/e1000/e1000_main.c  2005-01-19 16:37:55 -08:00
@@ -291,7 +291,9 @@
         e1000_phy_reset(&adapter->hw);
     }

+    spin_lock_irq(&netdev->xmit_lock);
     e1000_set_multi(netdev);
+    spin_unlock_irq(&netdev->xmit_lock);

     e1000_restore_vlan(adapter);

@@ -520,9 +522,6 @@
     if(pci_using_dac)
         netdev->features |= NETIF_F_HIGHDMA;

-    /* hard_start_xmit is safe against parallel locking */
-    netdev->features |= NETIF_F_LLTX;
-
     /* before reading the EEPROM, reset the controller to
      * put the device in a known good starting state */

@@ -732,7 +731,6 @@
     atomic_set(&adapter->irq_sem, 1);

     spin_lock_init(&adapter->stats_lock);
-    spin_lock_init(&adapter->tx_lock);

     return 0;
 }
@@ -1293,6 +1291,8 @@
 * list or the network interface flags are updated. This routine is
 * responsible for configuring the hardware for proper multicast,
 * promiscuous mode, and all-multi behavior.
+ *
+ * Called with netdev->xmit_lock held and IRQs disabled.
 **/

 static void
@@ -1304,12 +1304,9 @@
     uint32_t rctl;
     uint32_t hash_value;
     int i;
-    unsigned long flags;

     /* Check for Promiscuous and All Multicast modes */

-    spin_lock_irqsave(&adapter->tx_lock, flags);
-
     rctl = E1000_READ_REG(hw, RCTL);

     if(netdev->flags & IFF_PROMISC) {
@@ -1358,8 +1355,6 @@

     if(hw->mac_type == e1000_82542_rev2_0)
         e1000_leave_82542_rst(adapter);
-
-    spin_unlock_irqrestore(&adapter->tx_lock, flags);
 }

 /* Need to wait a few seconds after link up to get diagnostic information from
@@ -1786,6 +1781,8 @@
 }

 #define TXD_USE_COUNT(S, X) (((S) >> (X)) + 1 )
+
+/* Called with dev->xmit_lock held and interrupts disabled. */
 static int
 e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 {
@@ -1794,7 +1791,6 @@
     unsigned int max_txd_pwr = E1000_MAX_TXD_PWR;
     unsigned int tx_flags = 0;
     unsigned int len = skb->len;
-    unsigned long flags;
     unsigned int nr_frags = 0;
     unsigned int mss = 0;
     int count = 0;
@@ -1838,18 +1834,10 @@
     if(adapter->pcix_82544)
         count += nr_frags;

-    local_irq_save(flags);
-    if (!spin_trylock(&adapter->tx_lock)) {
-        /* Collision - tell upper layer to requeue */
-        local_irq_restore(flags);
-        return NETDEV_TX_LOCKED;
-    }
-
     /* need: count + 2 desc gap to keep tail from touching
      * head, otherwise try next time */
     if(unlikely(E1000_DESC_UNUSED(&adapter->tx_ring) < count + 2)) {
         netif_stop_queue(netdev);
-        spin_unlock_irqrestore(&adapter->tx_lock, flags);
         return NETDEV_TX_BUSY;
     }

@@ -1857,7 +1845,6 @@
         if(unlikely(e1000_82547_fifo_workaround(adapter, skb))) {
             netif_stop_queue(netdev);
             mod_timer(&adapter->tx_fifo_stall_timer, jiffies);
-            spin_unlock_irqrestore(&adapter->tx_lock, flags);
             return NETDEV_TX_BUSY;
         }
     }
@@ -1884,7 +1871,6 @@
     if(unlikely(E1000_DESC_UNUSED(&adapter->tx_ring) < MAX_SKB_FRAGS + 2))
         netif_stop_queue(netdev);

-    spin_unlock_irqrestore(&adapter->tx_lock, flags);
     return NETDEV_TX_OK;
 }
@@ -2234,13 +2220,13 @@

     tx_ring->next_to_clean = i;

-    spin_lock(&adapter->tx_lock);
+    spin_lock(&netdev->xmit_lock);

     if(unlikely(cleaned && netif_queue_stopped(netdev) &&
                 netif_carrier_ok(netdev)))
         netif_wake_queue(netdev);

-    spin_unlock(&adapter->tx_lock);
+    spin_unlock(&netdev->xmit_lock);

     return cleaned;
 }
@@ -2819,7 +2805,10 @@

     if(wufc) {
         e1000_setup_rctl(adapter);
+
+        spin_lock_irq(&netdev->xmit_lock);
         e1000_set_multi(netdev);
+        spin_unlock_irq(&netdev->xmit_lock);

         /* turn on all-multi mode if wake on multicast is enabled */
         if(adapter->wol & E1000_WUFC_MC) {

===== include/linux/netdevice.h 1.96 vs edited =====
--- 1.96/include/linux/netdevice.h  2005-01-17 13:13:31 -08:00
+++ edited/include/linux/netdevice.h  2005-01-19 16:07:45 -08:00
@@ -76,7 +76,6 @@
 /* Driver transmit return codes */
 #define NETDEV_TX_OK 0      /* driver took care of packet */
 #define NETDEV_TX_BUSY 1    /* driver tx path was busy*/
-#define NETDEV_TX_LOCKED -1 /* driver tx lock was already taken */

 /*
  * Compute the worst case header length according to the protocols
@@ -415,7 +414,7 @@
 #define NETIF_F_HW_VLAN_FILTER 512   /* Receive filtering on VLAN */
 #define NETIF_F_VLAN_CHALLENGED 1024 /* Device cannot handle VLAN packets */
 #define NETIF_F_TSO 2048             /* Can offload TCP/IP segmentation */
-#define NETIF_F_LLTX 4096            /* LockLess TX */
+#define NETIF_F_LLTX 4096            /* Do not grab xmit_lock during ->hard_start_xmit */

     /* Called after device is detached from network. */
     void (*uninit)(struct net_device *dev);
@@ -894,9 +893,11 @@

 static inline void netif_tx_disable(struct net_device *dev)
 {
-    spin_lock_bh(&dev->xmit_lock);
+    unsigned long flags;
+
+    spin_lock_irqsave(&dev->xmit_lock, flags);
     netif_stop_queue(dev);
-    spin_unlock_bh(&dev->xmit_lock);
+    spin_unlock_irqrestore(&dev->xmit_lock, flags);
 }

 /* These functions live elsewhere (drivers/net/net_init.c, but related) */

===== net/atm/clip.c 1.43 vs edited =====
--- 1.43/net/atm/clip.c  2004-10-01 14:50:07 -07:00
+++ edited/net/atm/clip.c  2005-01-19 16:04:55 -08:00
@@ -97,7 +97,7 @@
         printk(KERN_CRIT "!clip_vcc->entry (clip_vcc %p)\n",clip_vcc);
         return;
     }
-    spin_lock_bh(&entry->neigh->dev->xmit_lock); /* block clip_start_xmit() */
+    spin_lock_irq(&entry->neigh->dev->xmit_lock); /* block clip_start_xmit() */
     entry->neigh->used = jiffies;
     for (walk = &entry->vccs; *walk; walk = &(*walk)->next)
         if (*walk == clip_vcc) {
@@ -121,7 +121,7 @@
     printk(KERN_CRIT "ATMARP: unlink_clip_vcc failed (entry %p, vcc "
       "0x%p)\n",entry,clip_vcc);
 out:
-    spin_unlock_bh(&entry->neigh->dev->xmit_lock);
+    spin_unlock_irq(&entry->neigh->dev->xmit_lock);
 }

 /* The neighbour entry n->lock is held. */

===== net/core/dev.c 1.181 vs edited =====
--- 1.181/net/core/dev.c  2005-01-13 20:41:04 -08:00
+++ edited/net/core/dev.c  2005-01-19 16:00:08 -08:00
@@ -1190,7 +1190,7 @@

 #define HARD_TX_LOCK(dev, cpu) { \
     if ((dev->features & NETIF_F_LLTX) == 0) { \
-        spin_lock(&dev->xmit_lock); \
+        spin_lock_irq(&dev->xmit_lock); \
         dev->xmit_lock_owner = cpu; \
     } \
 }
@@ -1198,7 +1198,7 @@
 #define HARD_TX_UNLOCK(dev) { \
     if ((dev->features & NETIF_F_LLTX) == 0) { \
         dev->xmit_lock_owner = -1; \
-        spin_unlock(&dev->xmit_lock); \
+        spin_unlock_irq(&dev->xmit_lock); \
     } \
 }

===== net/core/dev_mcast.c 1.4 vs edited =====
--- 1.4/net/core/dev_mcast.c  2004-10-20 01:37:15 -07:00
+++ edited/net/core/dev_mcast.c  2005-01-19 16:05:35 -08:00
@@ -93,9 +93,9 @@

 void dev_mc_upload(struct net_device *dev)
 {
-    spin_lock_bh(&dev->xmit_lock);
+    spin_lock_irq(&dev->xmit_lock);
     __dev_mc_upload(dev);
-    spin_unlock_bh(&dev->xmit_lock);
+    spin_unlock_irq(&dev->xmit_lock);
 }

 /*
@@ -107,7 +107,7 @@
     int err = 0;
     struct dev_mc_list *dmi, **dmip;

-    spin_lock_bh(&dev->xmit_lock);
+    spin_lock_irq(&dev->xmit_lock);

     for (dmip = &dev->mc_list; (dmi = *dmip) != NULL; dmip = &dmi->next) {
         /*
@@ -139,13 +139,13 @@
              */
             __dev_mc_upload(dev);

-            spin_unlock_bh(&dev->xmit_lock);
+            spin_unlock_irq(&dev->xmit_lock);
             return 0;
         }
     }
     err = -ENOENT;
 done:
-    spin_unlock_bh(&dev->xmit_lock);
+    spin_unlock_irq(&dev->xmit_lock);
     return err;
 }

@@ -160,7 +160,7 @@

     dmi1 = (struct dev_mc_list *)kmalloc(sizeof(*dmi), GFP_ATOMIC);

-    spin_lock_bh(&dev->xmit_lock);
+    spin_lock_irq(&dev->xmit_lock);
     for (dmi = dev->mc_list; dmi != NULL; dmi = dmi->next) {
         if (memcmp(dmi->dmi_addr, addr, dmi->dmi_addrlen) == 0 &&
             dmi->dmi_addrlen == alen) {
@@ -176,7 +176,7 @@
     }

     if ((dmi = dmi1) == NULL) {
-        spin_unlock_bh(&dev->xmit_lock);
+        spin_unlock_irq(&dev->xmit_lock);
         return -ENOMEM;
     }
     memcpy(dmi->dmi_addr, addr, alen);
@@ -189,11 +189,11 @@

     __dev_mc_upload(dev);

-    spin_unlock_bh(&dev->xmit_lock);
+    spin_unlock_irq(&dev->xmit_lock);
     return 0;

 done:
-    spin_unlock_bh(&dev->xmit_lock);
+    spin_unlock_irq(&dev->xmit_lock);
     if (dmi1)
         kfree(dmi1);
     return err;
@@ -205,7 +205,7 @@

 void dev_mc_discard(struct net_device *dev)
 {
-    spin_lock_bh(&dev->xmit_lock);
+    spin_lock_irq(&dev->xmit_lock);

     while (dev->mc_list != NULL) {
         struct dev_mc_list *tmp = dev->mc_list;
@@ -216,7 +216,7 @@
     }
     dev->mc_count = 0;

-    spin_unlock_bh(&dev->xmit_lock);
+    spin_unlock_irq(&dev->xmit_lock);
 }

 #ifdef CONFIG_PROC_FS
@@ -251,7 +251,7 @@
     struct dev_mc_list *m;
     struct net_device *dev = v;

-    spin_lock_bh(&dev->xmit_lock);
+    spin_lock_irq(&dev->xmit_lock);
     for (m = dev->mc_list; m; m = m->next) {
         int i;

@@ -263,7 +263,7 @@
         seq_putc(seq, '\n');
     }

-    spin_unlock_bh(&dev->xmit_lock);
+    spin_unlock_irq(&dev->xmit_lock);
     return 0;
 }

===== net/core/netpoll.c 1.24 vs edited =====
--- 1.24/net/core/netpoll.c  2005-01-13 20:41:04 -08:00
+++ edited/net/core/netpoll.c  2005-01-19 16:05:56 -08:00
@@ -188,7 +188,7 @@
         return;
     }

-    spin_lock(&np->dev->xmit_lock);
+    spin_lock_irq(&np->dev->xmit_lock);
     np->dev->xmit_lock_owner = smp_processor_id();

     /*
@@ -197,7 +197,7 @@
      */
     if (netif_queue_stopped(np->dev)) {
         np->dev->xmit_lock_owner = -1;
-        spin_unlock(&np->dev->xmit_lock);
+        spin_unlock_irq(&np->dev->xmit_lock);

         netpoll_poll(np);
         goto repeat;
@@ -205,7 +205,7 @@

     status = np->dev->hard_start_xmit(skb, np->dev);
     np->dev->xmit_lock_owner = -1;
-    spin_unlock(&np->dev->xmit_lock);
+    spin_unlock_irq(&np->dev->xmit_lock);

     /* transmit busy */
     if(status) {

===== net/core/pktgen.c 1.21 vs edited =====
--- 1.21/net/core/pktgen.c  2005-01-10 11:32:10 -08:00
+++ edited/net/core/pktgen.c  2005-01-19 16:07:58 -08:00
@@ -2664,12 +2664,11 @@
         }
     }

-    spin_lock_bh(&odev->xmit_lock);
+    spin_lock_irq(&odev->xmit_lock);
     if (!netif_queue_stopped(odev)) {
         u64 now;

         atomic_inc(&(pkt_dev->skb->users));
-retry_now:
         ret = odev->hard_start_xmit(pkt_dev->skb, odev);
         if (likely(ret == NETDEV_TX_OK)) {
             pkt_dev->last_ok = 1;
@@ -2677,10 +2676,6 @@
             pkt_dev->seq_num++;
             pkt_dev->tx_bytes += pkt_dev->cur_pkt_size;

-        } else if (ret == NETDEV_TX_LOCKED
-                   && (odev->features & NETIF_F_LLTX)) {
-            cpu_relax();
-            goto retry_now;
         } else {  /* Retry it next time */
             atomic_dec(&(pkt_dev->skb->users));
@@ -2716,7 +2711,7 @@
         pkt_dev->next_tx_ns = 0;
     }

-    spin_unlock_bh(&odev->xmit_lock);
+    spin_unlock_irq(&odev->xmit_lock);

     /* If pkt_dev->count is zero, then run forever */
     if ((pkt_dev->count != 0) && (pkt_dev->sofar >= pkt_dev->count)) {

===== net/sched/sch_generic.c 1.33 vs edited =====
--- 1.33/net/sched/sch_generic.c  2005-01-13 20:41:07 -08:00
+++ edited/net/sched/sch_generic.c  2005-01-19 16:25:02 -08:00
@@ -99,17 +99,11 @@
     if ((skb = q->dequeue(q)) != NULL) {
         unsigned nolock = (dev->features & NETIF_F_LLTX);
         /*
-         * When the driver has LLTX set it does its own locking
-         * in start_xmit. No need to add additional overhead by
-         * locking again. These checks are worth it because
-         * even uncongested locks can be quite expensive.
-         * The driver can do trylock like here too, in case
-         * of lock congestion it should return -1 and the packet
-         * will be requeued.
+         * When the driver has LLTX set it does not require any
+         * locking in start_xmit.
          */
         if (!nolock) {
-            if (!spin_trylock(&dev->xmit_lock)) {
-            collision:
+            if (!spin_trylock_irq(&dev->xmit_lock)) {
                 /* So, someone grabbed the driver. */

                 /* It may be transient configuration error,
@@ -143,22 +137,18 @@
             if (ret == NETDEV_TX_OK) {
                 if (!nolock) {
                     dev->xmit_lock_owner = -1;
-                    spin_unlock(&dev->xmit_lock);
+                    spin_unlock_irq(&dev->xmit_lock);
                 }
                 spin_lock(&dev->queue_lock);
                 return -1;
             }
-            if (ret == NETDEV_TX_LOCKED && nolock) {
-                spin_lock(&dev->queue_lock);
-                goto collision;
-            }
         }

         /* NETDEV_TX_BUSY - we need to requeue */
         /* Release the driver */
         if (!nolock) {
             dev->xmit_lock_owner = -1;
-            spin_unlock(&dev->xmit_lock);
+            spin_unlock_irq(&dev->xmit_lock);
         }
         spin_lock(&dev->queue_lock);
         q = dev->qdisc;
@@ -186,7 +176,7 @@
 {
     struct net_device *dev = (struct net_device *)arg;

-    spin_lock(&dev->xmit_lock);
+    spin_lock_irq(&dev->xmit_lock);
     if (dev->qdisc != &noop_qdisc) {
         if (netif_device_present(dev) &&
             netif_running(dev) &&
@@ -200,7 +190,7 @@
             dev_hold(dev);
         }
     }
-    spin_unlock(&dev->xmit_lock);
+    spin_unlock_irq(&dev->xmit_lock);

     dev_put(dev);
 }
@@ -224,17 +214,17 @@

 static void dev_watchdog_up(struct net_device *dev)
 {
-    spin_lock_bh(&dev->xmit_lock);
+    spin_lock_irq(&dev->xmit_lock);
     __netdev_watchdog_up(dev);
-    spin_unlock_bh(&dev->xmit_lock);
+    spin_unlock_irq(&dev->xmit_lock);
 }

 static void dev_watchdog_down(struct net_device *dev)
 {
-    spin_lock_bh(&dev->xmit_lock);
+    spin_lock_irq(&dev->xmit_lock);
     if (del_timer(&dev->watchdog_timer))
         __dev_put(dev);
-    spin_unlock_bh(&dev->xmit_lock);
+    spin_unlock_irq(&dev->xmit_lock);
 }

 /* "NOOP" scheduler: the best scheduler, recommended for all interfaces

===== net/sched/sch_teql.c 1.16 vs edited =====
--- 1.16/net/sched/sch_teql.c  2004-10-21 22:21:24 -07:00
+++ edited/net/sched/sch_teql.c  2005-01-19 16:06:11 -08:00
@@ -301,12 +301,12 @@

         switch (teql_resolve(skb, skb_res, slave)) {
         case 0:
-            if (spin_trylock(&slave->xmit_lock)) {
+            if (spin_trylock_irq(&slave->xmit_lock)) {
                 slave->xmit_lock_owner = smp_processor_id();
                 if (!netif_queue_stopped(slave) &&
                     slave->hard_start_xmit(skb, slave) == 0) {
                     slave->xmit_lock_owner = -1;
-                    spin_unlock(&slave->xmit_lock);
+                    spin_unlock_irq(&slave->xmit_lock);
                     master->slaves = NEXT_SLAVE(q);
                     netif_wake_queue(dev);
                     master->stats.tx_packets++;
@@ -314,7 +314,7 @@
                     return 0;
                 }
                 slave->xmit_lock_owner = -1;
-                spin_unlock(&slave->xmit_lock);
+                spin_unlock_irq(&slave->xmit_lock);
             }
             if (netif_queue_stopped(dev))
                 busy = 1;

^ permalink raw reply	[flat|nested] 62+ messages in thread
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20  7:05 ` David S. Miller
@ 2005-01-20 13:51   ` Tommy Christensen
  2005-01-20 21:34     ` David S. Miller
  2005-01-20 21:41     ` David S. Miller
  2005-01-20 16:56   ` Stephen Hemminger
  2005-01-20 19:16   ` Jeff Garzik
  2 siblings, 2 replies; 62+ messages in thread
From: Tommy Christensen @ 2005-01-20 13:51 UTC (permalink / raw)
  To: David S. Miller
  Cc: Roland Dreier, shemminger, hadi, iod00d, eric.lemoine, netdev, ak,
	openib-general, kaber

On Thu, 2005-01-20 at 08:05, David S. Miller wrote:
> So I'm going to check this in and push upstream.  Let me know
> if folks find any other errors.

Hi Dave

Sorry to be the one with bad news. I am in favour of this idea,
but there are still issues that need to be sorted out.

> ===== drivers/net/sungem.c 1.72 vs edited =====
> --- 1.72/drivers/net/sungem.c  2004-11-05 15:56:15 -08:00
> +++ edited/drivers/net/sungem.c  2005-01-19 22:29:14 -08:00
> @@ -932,12 +932,12 @@
>             readl(gp->regs + MAC_RXCFG));
>
>      spin_lock_irq(&gp->lock);
> -    spin_lock(&gp->tx_lock);
> +    spin_lock(&dev->xmit_lock);
>
>      gp->reset_task_pending = 2;
>      schedule_work(&gp->reset_task);
>
> -    spin_unlock(&gp->tx_lock);
> +    spin_unlock(&dev->xmit_lock);
>      spin_unlock_irq(&gp->lock);
>  }
>

->tx_timeout() can't take dev->xmit_lock, since dev_watchdog
already has it held.

A lot more serious is the fact that ->tx_timeout() and
->hard_start_xmit() are no longer allowed to do this:

	spin_lock_irq()
	...
	spin_unlock_irq()

since that would leave us with irq's enabled while still
holding the xmit_lock.
This would have to be fixed for non-LLTX drivers as well.

-Tommy

^ permalink raw reply	[flat|nested] 62+ messages in thread
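To make the second problem concrete, here is a minimal sketch of the
pattern Tommy describes; struct my_priv, its lock, and the function
names are hypothetical stand-ins, not code from any posted driver:

#include <linux/netdevice.h>
#include <linux/spinlock.h>

/* Hypothetical driver-private state with its own TX spinlock. */
struct my_priv {
    spinlock_t lock;
};

/* Broken under the new rules: the core calls this with dev->xmit_lock
 * held and IRQs already disabled, so spin_unlock_irq() below re-enables
 * IRQs while the caller still holds dev->xmit_lock. */
static int broken_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
    struct my_priv *priv = netdev_priv(dev);

    spin_lock_irq(&priv->lock);
    /* ... queue skb to the hardware ring ... */
    spin_unlock_irq(&priv->lock);    /* unconditionally enables IRQs! */
    return NETDEV_TX_OK;
}

/* Safe variant: save and restore the caller's IRQ state instead. */
static int fixed_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
    struct my_priv *priv = netdev_priv(dev);
    unsigned long flags;

    spin_lock_irqsave(&priv->lock, flags);
    /* ... queue skb to the hardware ring ... */
    spin_unlock_irqrestore(&priv->lock, flags);
    return NETDEV_TX_OK;
}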
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20 13:51   ` Tommy Christensen
@ 2005-01-20 21:34     ` David S. Miller
  2005-01-20 21:56       ` Grant Grundler
  0 siblings, 1 reply; 62+ messages in thread
From: David S. Miller @ 2005-01-20 21:34 UTC (permalink / raw)
  To: Tommy Christensen
  Cc: roland, shemminger, hadi, iod00d, eric.lemoine, netdev, ak,
	openib-general, kaber

On Thu, 20 Jan 2005 14:51:25 +0100
Tommy Christensen <tommy.christensen@tpack.net> wrote:

> A lot more serious is the fact that ->tx_timeout() and
> ->hard_start_xmit() are no longer allowed to do this:
>
> 	spin_lock_irq()
> 	...
> 	spin_unlock_irq()
>
> since that would leave us with irq's enabled while still
> holding the xmit_lock.
> This would have to be fixed for non-LLTX drivers as well.

Even worse is that this breaks the acenic driver too because
it does this:

	unsigned long maxjiff = jiffies + 3*HZ;

	if (time_before(jiffies, maxjiff)) {
		barrier();
		cpu_relax();
		goto restart;
	}

in its ->hard_start_xmit() routine.

I was auditing spin_lock_irq() usage in ->hard_start_xmit()
routines when I caught this.

This one isn't impossible to fix though.  We can replace the
jiffies games with a udelay/mdelay and a "maxloop" variable.
Any takers?

I'll keep working on the spin_lock_irq() audit, then work on
the other problems Tommy found.

^ permalink raw reply	[flat|nested] 62+ messages in thread
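One possible shape for that fix, as an untested sketch: bound the
busy-wait by an iteration count instead of by jiffies, which cannot
advance while this CPU runs with IRQs off. struct ace_private,
tx_ring_full(), the loop bound, and the 10us step are all stand-ins:

#include <linux/delay.h>

static int wait_for_tx_room(struct ace_private *ap)
{
    int maxloop = 300000;          /* roughly 3 seconds at 10us/loop */

    while (tx_ring_full(ap)) {     /* stand-in for the real ring test */
        if (--maxloop <= 0)
            return -1;             /* timed out, give up */
        udelay(10);
        cpu_relax();
    }
    return 0;
}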
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20 21:34     ` David S. Miller
@ 2005-01-20 21:56       ` Grant Grundler
  2005-01-21  1:01         ` David S. Miller
  0 siblings, 1 reply; 62+ messages in thread
From: Grant Grundler @ 2005-01-20 21:56 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, eric.lemoine, hadi, ak, openib-general, Tommy Christensen,
	kaber, shemminger

On Thu, Jan 20, 2005 at 01:34:20PM -0800, David S. Miller wrote:
> Even worse is that this breaks the acenic driver too because
> it does this:
>
> 	unsigned long maxjiff = jiffies + 3*HZ;
>
> 	if (time_before(jiffies, maxjiff)) {
> 		barrier();
> 		cpu_relax();
> 		goto restart;
> 	}
>
> in its ->hard_start_xmit() routine.
>
> I was auditing spin_lock_irq() usage in ->hard_start_xmit()
> routines when I caught this.
>
> This one isn't impossible to fix though.  We can replace the
> jiffies games with a udelay/mdelay and a "maxloop" variable.
> Any takers?

I saw the next mail suggesting to revert the changes because of more
fallout.

But if/when acenic needs changes like those proposed above, I can
implement and test proposed changes if folks have the patience to wait
a week or so.  acenic is nowhere on my "official HP supported NICs"
list, but I happen to have the HW/infrastructure to test them.

grant

^ permalink raw reply	[flat|nested] 62+ messages in thread
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20 21:56       ` Grant Grundler
@ 2005-01-21  1:01         ` David S. Miller
  2005-01-22  3:17           ` Roland Dreier
  0 siblings, 1 reply; 62+ messages in thread
From: David S. Miller @ 2005-01-21  1:01 UTC (permalink / raw)
  To: Grant Grundler
  Cc: tommy.christensen, roland, shemminger, hadi, iod00d, eric.lemoine,
	netdev, ak, openib-general, kaber

On Thu, 20 Jan 2005 13:56:50 -0800
Grant Grundler <iod00d@hp.com> wrote:

> I saw the next mail suggesting to revert the changes because of more fallout.

Yeah, that's what I ended up just pushing to Linus.

^ permalink raw reply	[flat|nested] 62+ messages in thread
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-21  1:01         ` David S. Miller
@ 2005-01-22  3:17           ` Roland Dreier
  2005-01-22  3:53             ` David S. Miller
  0 siblings, 1 reply; 62+ messages in thread
From: Roland Dreier @ 2005-01-22  3:17 UTC (permalink / raw)
  To: David S. Miller
  Cc: Grant Grundler, tommy.christensen, shemminger, hadi, eric.lemoine,
	netdev, ak, openib-general, kaber

So what's the plan here with LLTX?  As far as I can tell, 2.6.11-rc2
went out with the changes that disable IRQs in hard_start_xmit...

 - R.

^ permalink raw reply	[flat|nested] 62+ messages in thread
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-22  3:17           ` Roland Dreier
@ 2005-01-22  3:53             ` David S. Miller
  0 siblings, 0 replies; 62+ messages in thread
From: David S. Miller @ 2005-01-22  3:53 UTC (permalink / raw)
  To: Roland Dreier
  Cc: iod00d, tommy.christensen, shemminger, hadi, eric.lemoine, netdev,
	ak, openib-general, kaber

On Fri, 21 Jan 2005 19:17:07 -0800
Roland Dreier <roland@topspin.com> wrote:

> So what's the plan here with LLTX?  As far as I can tell, 2.6.11-rc2
> went out with the changes that disable IRQs in hard_start_xmit...

Linus just didn't pull the backout of that stuff in time.
I'll repush it to him tonight.

^ permalink raw reply	[flat|nested] 62+ messages in thread
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20 13:51   ` Tommy Christensen
  2005-01-20 21:34     ` David S. Miller
@ 2005-01-20 21:41     ` David S. Miller
  1 sibling, 0 replies; 62+ messages in thread
From: David S. Miller @ 2005-01-20 21:41 UTC (permalink / raw)
  To: Tommy Christensen
  Cc: roland, shemminger, hadi, iod00d, eric.lemoine, netdev, ak,
	openib-general, kaber

On Thu, 20 Jan 2005 14:51:25 +0100
Tommy Christensen <tommy.christensen@tpack.net> wrote:

> A lot more serious is the fact that ->tx_timeout() and
> ->hard_start_xmit() are no longer allowed to do this:

Sigh, there are even more issues.  Now dev_kfree_skb() is illegal in
->hard_start_xmit() because interrupts are disabled, and there are
many such cases in the simpler net drivers.

I think we might need to revert the change over to IRQ disabling
dev->xmit_lock and rethink this.  This is just too much fallout.

^ permalink raw reply	[flat|nested] 62+ messages in thread
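For drivers that really must free an skb from the TX path, the
context-safe variants already exist; a small sketch of the
substitution (the wrapper function itself is hypothetical):

#include <linux/netdevice.h>

/* Runs under the IRQ-disabled dev->xmit_lock, where plain
 * dev_kfree_skb() is illegal. */
static void drop_tx_skb(struct sk_buff *skb)
{
    /* dev_kfree_skb_irq() defers the free to softirq context;
     * dev_kfree_skb_any() picks the safe path at runtime. */
    dev_kfree_skb_any(skb);
}

The fallout Dave refers to is that every such dev_kfree_skb() call in
every affected driver would have to be audited and converted by hand.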
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20  7:05 ` David S. Miller
  2005-01-20 13:51   ` Tommy Christensen
@ 2005-01-20 16:56   ` Stephen Hemminger
  2005-01-21 10:54     ` Lennert Buytenhek
  2005-01-20 19:16   ` Jeff Garzik
  2 siblings, 1 reply; 62+ messages in thread
From: Stephen Hemminger @ 2005-01-20 16:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, eric.lemoine, hadi, ak, openib-general, kaber

You might want to enforce that LLTX devices like tunnels need to
have no transmit queue (tx_queue_len == 0)?

I can run a test on 8 way with e1000 if it would help.

^ permalink raw reply	[flat|nested] 62+ messages in thread
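A sketch of where such a check could conceivably live, e.g. at
register_netdevice() time; this is only an illustration of the idea,
not a posted patch:

/* Hypothetical sanity check: an LLTX device should not also have a
 * qdisc transmit queue. */
if ((dev->features & NETIF_F_LLTX) && dev->tx_queue_len != 0) {
    printk(KERN_WARNING "%s: NETIF_F_LLTX set, forcing tx_queue_len "
           "from %lu to 0\n", dev->name, dev->tx_queue_len);
    dev->tx_queue_len = 0;
}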
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20 16:56   ` Stephen Hemminger
@ 2005-01-21 10:54     ` Lennert Buytenhek
  2005-01-26  6:27       ` David S. Miller
  0 siblings, 1 reply; 62+ messages in thread
From: Lennert Buytenhek @ 2005-01-21 10:54 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, Roland Dreier, hadi, iod00d, eric.lemoine, netdev,
	ak, openib-general, kaber

On Thu, Jan 20, 2005 at 08:56:11AM -0800, Stephen Hemminger wrote:

> You might want to enforce that LLTX devices like tunnels need to
> have no transmit queue (tx_queue_len == 0)?

I didn't look at the original LLTX patch, so I'm speaking out of
ignorance, but since we're on this subject anyway: I noticed that the
ip_gre tunneling driver contains this comment:

<quote>
   1. The most important issue is detecting local dead loops.
   They would cause complete host lockup in transmit, which
   would be "resolved" by stack overflow or, if queueing is enabled,
   with infinite looping in net_bh.

   We cannot track such dead loops during route installation,
   it is infeasible task.  The most general solutions would be
   to keep skb->encapsulation counter (sort of local ttl),
   and silently drop packet when it expires.  It is the best
   solution, but it supposes maintaing new variable in ALL
   skb, even if no tunneling is used.

   Current solution: t->recursion lock breaks dead loops.  It looks
   like dev->tbusy flag, but I preferred new variable, because
   the semantics is different.  One day, when hard_start_xmit
   will be multithreaded we will have to use skb->encapsulation.
</quote>

If multiple CPUs can call into the tunneling drivers without taking
any locks, we'd need some extra locking in there, or just do what
Alexey describes and keep track of recursion in the skb.

The recursion detection in dev_queue_xmit would perhaps also want to
use such a per-skb mechanism instead of using dev->xmit_lock_owner.
(You really _do_ want the protection that this mechanism offers.)


--L

^ permalink raw reply	[flat|nested] 62+ messages in thread
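For reference, the t->recursion break that the comment describes
amounts to a per-device guard along these lines; this is a simplified
sketch, not the literal ip_gre.c code, and my_tunnel and
do_tunnel_xmit() are stand-ins:

static int tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
{
    struct my_tunnel *tunnel = dev->priv;
    int ret;

    if (tunnel->recursion++) {
        /* dead loop detected: drop rather than recurse forever */
        tunnel->recursion--;
        dev_kfree_skb(skb);
        return 0;
    }

    ret = do_tunnel_xmit(skb, dev);    /* may re-enter dev_queue_xmit() */

    tunnel->recursion--;
    return ret;
}

As the quoted comment itself notes, a per-device counter like this is
only safe while hard_start_xmit() is serialized per device, which is
exactly the property LLTX removes.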
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-21 10:54     ` Lennert Buytenhek
@ 2005-01-26  6:27       ` David S. Miller
  2005-01-26 13:25         ` Lennert Buytenhek
  0 siblings, 1 reply; 62+ messages in thread
From: David S. Miller @ 2005-01-26  6:27 UTC (permalink / raw)
  To: Lennert Buytenhek
  Cc: shemminger, roland, hadi, iod00d, eric.lemoine, netdev, ak,
	openib-general, kaber

On Fri, 21 Jan 2005 11:54:52 +0100
Lennert Buytenhek <buytenh@wantstofly.org> wrote:

> If multiple CPUs can call into the tunneling drivers without taking
> any locks, we'd need some extra locking in there, or just do what
> Alexey describes and keep track of recursion in the skb.

Another idea is that, just like how loopback made its statistics
per-cpu for LLTX support, this recursion variable could be per-cpu
as well.

^ permalink raw reply	[flat|nested] 62+ messages in thread
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-26  6:27       ` David S. Miller
@ 2005-01-26 13:25         ` Lennert Buytenhek
  2005-01-27  6:32           ` David S. Miller
  0 siblings, 1 reply; 62+ messages in thread
From: Lennert Buytenhek @ 2005-01-26 13:25 UTC (permalink / raw)
  To: David S. Miller
  Cc: shemminger, roland, hadi, iod00d, eric.lemoine, netdev, ak,
	openib-general, kaber

On Tue, Jan 25, 2005 at 10:27:05PM -0800, David S. Miller wrote:

> > If multiple CPUs can call into the tunneling drivers without taking
> > any locks, we'd need some extra locking in there, or just do what
> > Alexey describes and keep track of recursion in the skb.
>
> Another idea is that, just like how loopback made its statistics
> per-cpu for LLTX support, this recursion variable could be per-cpu
> as well.

I've thought about this a bit, and the only sane way of doing recursion
detection that doesn't involve 'struct net_device' would be to keep track
of the recursion depth (perhaps per-CPU as you suggest) and toss the
packet when it exceeds some arbitrary value, right?

To reproduce the current behaviour more closely you'd have to keep a
small per-CPU array of 'struct net_device *' pointers as a kind of
recursion stack, and toss the packet when you hit a net_device that's
already on the list.  But that seems like slight overkill.


--L

^ permalink raw reply	[flat|nested] 62+ messages in thread
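A minimal sketch of the per-CPU depth counter under discussion; the
limit of 4 is an arbitrary example value, do_tunnel_xmit() is a
stand-in, and the scheme assumes BH stays disabled across
hard_start_xmit() so the counter cannot migrate between CPUs (Dave's
point in the follow-up below):

#include <linux/percpu.h>
#include <linux/netdevice.h>

static DEFINE_PER_CPU(int, xmit_recursion);

#define XMIT_RECURSION_LIMIT 4    /* arbitrary example value */

static int guarded_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
{
    int ret;

    /* BH-disabled context: no preemption, so __get_cpu_var() is safe */
    if (__get_cpu_var(xmit_recursion) >= XMIT_RECURSION_LIMIT) {
        dev_kfree_skb(skb);    /* dead loop: toss the packet */
        return 0;
    }

    __get_cpu_var(xmit_recursion)++;
    ret = do_tunnel_xmit(skb, dev);
    __get_cpu_var(xmit_recursion)--;

    return ret;
}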
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-26 13:25         ` Lennert Buytenhek
@ 2005-01-27  6:32           ` David S. Miller
  2005-01-27  7:16             ` Andi Kleen
  0 siblings, 1 reply; 62+ messages in thread
From: David S. Miller @ 2005-01-27  6:32 UTC (permalink / raw)
  To: Lennert Buytenhek
  Cc: shemminger, roland, hadi, iod00d, eric.lemoine, netdev, ak,
	openib-general, kaber

On Wed, 26 Jan 2005 14:25:12 +0100
Lennert Buytenhek <buytenh@wantstofly.org> wrote:

> I've thought about this a bit, and the only sane way of doing recursion
> detection that doesn't involve 'struct net_device' would be to keep track
> of the recursion depth (perhaps per-CPU as you suggest) and tossing the
> packet when it exceeds some random value, right?

Yes, that's the idea.

> To reproduce the current behaviour more closely you'd have to keep a
> small per-CPU array of 'struct net_device *' pointers as a kind of
> recursion stack, and toss the packet when you hit a net_device that's
> already on the list.  But that seems like slight overkill.

Indeed.

^ permalink raw reply	[flat|nested] 62+ messages in thread
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-27  6:32           ` David S. Miller
@ 2005-01-27  7:16             ` Andi Kleen
  2005-01-27  7:22               ` David S. Miller
  0 siblings, 1 reply; 62+ messages in thread
From: Andi Kleen @ 2005-01-27  7:16 UTC (permalink / raw)
  To: David S. Miller
  Cc: Lennert Buytenhek, shemminger, roland, hadi, iod00d, eric.lemoine,
	netdev, ak, openib-general, kaber

On Wed, Jan 26, 2005 at 10:32:47PM -0800, David S. Miller wrote:
> On Wed, 26 Jan 2005 14:25:12 +0100
> Lennert Buytenhek <buytenh@wantstofly.org> wrote:
>
> > I've thought about this a bit, and the only sane way of doing recursion
> > detection that doesn't involve 'struct net_device' would be to keep track
> > of the recursion depth (perhaps per-CPU as you suggest) and tossing the
> > packet when it exceeds some random value, right?
>
> Yes, that's the idea.

per CPU only works in a preemptive kernel if you have preemption
disabled all the time.  Do you?

Seems not likely to me.

-Andi

^ permalink raw reply	[flat|nested] 62+ messages in thread
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-27  7:16             ` Andi Kleen
@ 2005-01-27  7:22               ` David S. Miller
  2005-01-27  8:26                 ` Andi Kleen
  0 siblings, 1 reply; 62+ messages in thread
From: David S. Miller @ 2005-01-27  7:22 UTC (permalink / raw)
  To: Andi Kleen
  Cc: buytenh, shemminger, roland, hadi, iod00d, eric.lemoine, netdev,
	ak, openib-general, kaber

On Thu, 27 Jan 2005 08:16:45 +0100
Andi Kleen <ak@suse.de> wrote:

> > Yes, that's the idea.
>
> per CPU only works in a preemptive kernel if you have preemption
> disabled all the time.  Do you?
>
> Seems not likely to me.

BH is disabled in these code paths (specifically we're talking
about ->hard_start_xmit()), as that is where the recursion
check goes.

Otherwise, loopback's LLTX is broken as it relies on this
property as well.

^ permalink raw reply	[flat|nested] 62+ messages in thread
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-27  7:22               ` David S. Miller
@ 2005-01-27  8:26                 ` Andi Kleen
  0 siblings, 0 replies; 62+ messages in thread
From: Andi Kleen @ 2005-01-27  8:26 UTC (permalink / raw)
  To: David S. Miller
  Cc: Andi Kleen, buytenh, shemminger, roland, hadi, iod00d, eric.lemoine,
	netdev, openib-general, kaber

On Wed, Jan 26, 2005 at 11:22:27PM -0800, David S. Miller wrote:
> On Thu, 27 Jan 2005 08:16:45 +0100
> Andi Kleen <ak@suse.de> wrote:
>
> > > Yes, that's the idea.
> >
> > per CPU only works in a preemptive kernel if you have preemption
> > disabled all the time.  Do you?
> >
> > Seems not likely to me.
>
> BH is disabled in these code paths (specifically we're talking
> about ->hard_start_xmit()), as that is where the recursion
> check goes.

Hmm, but hard_start_xmit is allowed to enable preemption again,
isn't it?

A safer alternative may be to change the cpu mask of the current
process temporarily.

-Andi

^ permalink raw reply	[flat|nested] 62+ messages in thread
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20  7:05 ` David S. Miller
  2005-01-20 13:51   ` Tommy Christensen
  2005-01-20 16:56   ` Stephen Hemminger
@ 2005-01-20 19:16   ` Jeff Garzik
  2 siblings, 0 replies; 62+ messages in thread
From: Jeff Garzik @ 2005-01-20 19:16 UTC (permalink / raw)
  To: David S. Miller
  Cc: Roland Dreier, shemminger, hadi, iod00d, eric.lemoine, netdev, ak,
	openib-general, kaber

Reminder: double-check that the locking rules in
Documentation/networking/netdevices.txt still apply; update the text
file where you changed things.

^ permalink raw reply	[flat|nested] 62+ messages in thread
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20  0:46 ` [PATCH]: was " David S. Miller
  2005-01-20  3:14   ` Andi Kleen
  2005-01-20  3:43   ` Roland Dreier
@ 2005-01-20  4:01   ` jamal
  2005-01-20  5:18     ` David S. Miller
  2 siblings, 1 reply; 62+ messages in thread
From: jamal @ 2005-01-20  4:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, eric.lemoine, ak, openib-general, kaber, Stephen Hemminger

One thing I didn't quite follow, Dave: in some of the drivers,
for example in the e1000:

-----
/* Called with dev->xmit_lock held and interrupts disabled. */
static int
e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
---

Who calls that with dev->xmit_lock held and interrupts disabled?
Shouldn't the spin_unlock(&netdev->xmit_lock); be right at the top of
that routine now?

cheers,
jamal

^ permalink raw reply	[flat|nested] 62+ messages in thread
* Re: [PATCH]: was Re: LLTX and netif_stop_queue
  2005-01-20  4:01   ` jamal
@ 2005-01-20  5:18     ` David S. Miller
  0 siblings, 0 replies; 62+ messages in thread
From: David S. Miller @ 2005-01-20  5:18 UTC (permalink / raw)
  To: hadi; +Cc: netdev, eric.lemoine, ak, openib-general, kaber, shemminger

On 19 Jan 2005 23:01:47 -0500
jamal <hadi@cyberus.ca> wrote:

> -----
> /* Called with dev->xmit_lock held and interrupts disabled. */
> static int
> e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
> ---
>
> Who calls that with dev->xmit_lock held and interrupts disabled?

qdisc_restart() and dev_queue_xmit(), via netdev->hard_start_xmit().

> Shouldn't the spin_unlock(&netdev->xmit_lock); be right at the top of
> that routine now?

Nope, the idea now is that netdev->xmit_lock replaces the driver
private tx_lock.

^ permalink raw reply	[flat|nested] 62+ messages in thread