* [ofa-general] [PATCH 3/3][NET_BATCH] kill dev->gso_skb @ 2007-10-08 18:27 jamal 0 siblings, 0 replies; 2+ messages in thread From: jamal @ 2007-10-08 18:27 UTC (permalink / raw) To: David Miller Cc: johnpol, peter.p.waskiewicz.jr, kumarkr, herbert, gaagaan, Robert.Olsson, netdev, rdreier, mcarlson, randy.dunlap, jagana, general, mchan, tgraf, jeff, sri, shemminger, kaber [-- Attachment #1: Type: text/plain, Size: 97 bytes --] This patch removes dev->gso_skb as it is no longer necessary with batching code. cheers, jamal [-- Attachment #2: 03-kill-dev-gso-skb.patch --] [-- Type: text/x-patch, Size: 2277 bytes --] [NET_BATCH] kill dev->gso_skb The batching code does what gso used to batch at the drivers. There is no more need for gso_skb. If for whatever reason the requeueing is a bad idea we are going to leave packets in dev->blist (and still not need dev->gso_skb) Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> --- commit 3a6202d62adff75b85d6ca0f8fd491abf9d63f4b tree 80497c538bdb3eab6ab81ff1dec1c3263da79826 parent 63381156b35719a364d0f81fec487e6263fb5467 author Jamal Hadi Salim <hadi@cyberus.ca> Mon, 08 Oct 2007 09:11:02 -0400 committer Jamal Hadi Salim <hadi@cyberus.ca> Mon, 08 Oct 2007 09:11:02 -0400 include/linux/netdevice.h | 3 --- net/sched/sch_generic.c | 12 ------------ 2 files changed, 0 insertions(+), 15 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b31df5c..4ddc6eb 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -577,9 +577,6 @@ struct net_device struct list_head qdisc_list; unsigned long tx_queue_len; /* Max frames per queue allowed */ - /* Partially transmitted GSO packet. 
*/ - struct sk_buff *gso_skb; - /* ingress path synchronizer */ spinlock_t ingress_lock; struct Qdisc *qdisc_ingress; diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 96bfdcb..9112ea0 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -172,13 +172,6 @@ static int xmit_get_pkts(struct net_device *dev, struct sk_buff *skb; int count = dev->xmit_win; - if (count && dev->gso_skb) { - skb = dev->gso_skb; - dev->gso_skb = NULL; - count -= xmit_count_skbs(skb); - __skb_queue_tail(pktlist, skb); - } - while (count > 0) { skb = q->dequeue(q); if (!skb) @@ -647,7 +640,6 @@ void dev_activate(struct net_device *dev) void dev_deactivate(struct net_device *dev) { struct Qdisc *qdisc; - struct sk_buff *skb; spin_lock_bh(&dev->queue_lock); qdisc = dev->qdisc; @@ -655,15 +647,11 @@ void dev_deactivate(struct net_device *dev) qdisc_reset(qdisc); - skb = dev->gso_skb; - dev->gso_skb = NULL; if (!skb_queue_empty(&dev->blist)) skb_queue_purge(&dev->blist); dev->xmit_win = 1; spin_unlock_bh(&dev->queue_lock); - kfree_skb(skb); - dev_watchdog_down(dev); /* Wait for outstanding dev_queue_xmit calls. */ [-- Attachment #3: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply related [flat|nested] 2+ messages in thread
* [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000
@ 2007-09-14  9:00 Krishna Kumar
  2007-09-16 23:17 ` [ofa-general] " David Miller
  0 siblings, 1 reply; 2+ messages in thread
From: Krishna Kumar @ 2007-09-14  9:00 UTC (permalink / raw)
  To: johnpol, herbert, hadi, kaber, shemminger, davem
  Cc: jagana, Robert.Olsson, rick.jones2, xma, gaagaan, netdev, rdreier,
      peter.p.waskiewicz.jr, mcarlson, jeff, mchan, general, kumarkr,
      tgraf, randy.dunlap, Krishna Kumar, sri

This set of patches implements the batching xmit capability, and adds
support for batching in IPoIB and E1000 (the E1000 driver changes are
ported, thanks to changes taken from Jamal's code from an old kernel).

List of changes from previous revision:
----------------------------------------
1. [Dave] Enable batching as default (change in register_netdev).
2. [Randy] Update documentation (however the ethtool cmd to get/set
   batching is not implemented, hence I am guessing the usage).
3. [KK] When changing tx_batch_skb, qdisc xmits need to be blocked
   since qdisc_restart() drops queue_lock before calling the driver
   xmit, and the driver could find blist changing under it.
4. [KK] sched: requeue could wrongly requeue an skb already put in the
   batching list (in case a single skb was sent to the device but not
   sent as the device was full, resulting in the skb getting added to
   blist). This also results in a slight optimization of batching
   behavior, where getting skbs #2 onwards doesn't require checking
   for gso_skb, as that is the first skb that is processed.
5. [KK] Change documentation to explain this behavior.
6. [KK] sched: Fix panic when GSO is enabled in driver.
7. [KK] IPoIB: Small optimization in ipoib_ib_handle_tx_wc.
8. [KK] netdevice: Needed to change NETIF_F_GSO_SHIFT/NETIF_F_GSO_MASK
   as BATCH_SKBS is now defined as 65536 (earlier it was using 8192,
   which was taken up by NETIF_F_NETNS_LOCAL).

Will submit in the next 1-2 days:
---------------------------------
1. [Auke] Enable batching in e1000e.
Extras that I can do later:
---------------------------
1. [Patrick] Use skb_blist statically in netdevice. This could also be
   used to integrate GSO and batching.
2. [Evgeniy] Useful to splice lists in dev_add_skb_to_blist (and this
   can be done for regular xmits of GSO skbs too, for #1 above).

Patches are described as:
Mail  0/10: This mail
Mail  1/10: HOWTO documentation
Mail  2/10: Introduce skb_blist, NETIF_F_BATCH_SKBS, use single API
            for batching/no-batching, etc.
Mail  3/10: Modify qdisc_run() to support batching
Mail  4/10: Add ethtool support to enable/disable batching
Mail  5/10: IPoIB: Header file changes to use batching
Mail  6/10: IPoIB: CM & Multicast changes
Mail  7/10: IPoIB: Verbs changes to use batching
Mail  8/10: IPoIB: Internal post and work completion handler
Mail  9/10: IPoIB: Implement the new batching capability
Mail 10/10: E1000: Implement the new batching capability

Issues:
-------
The retransmission problem reported earlier seems to happen when mthca
is used as the underlying device, but when I tested ehca the
retransmissions dropped to normal levels (around 2 times the regular
code). The performance improvement is around 55% for TCP.

Please review and provide feedback, and consider for inclusion.

Thanks,

- KK

----------------------------------------------------
TCP  (original throughput, batching throughput, % change)
----
Size:32     Procs:1      2728      3544     29.91
Size:128    Procs:1     11803     13679     15.89
Size:512    Procs:1     43279     49665     14.75
Size:4096   Procs:1    147952    101246    -31.56
Size:16384  Procs:1    149852    141897     -5.30

Size:32     Procs:4     10562     11349      7.45
Size:128    Procs:4     41010     40832     -0.43
Size:512    Procs:4     75374    130943     73.72
Size:4096   Procs:4    167996    368218    119.18
Size:16384  Procs:4    123176    379524    208.11

Size:32     Procs:8     21125     21990      4.09
Size:128    Procs:8     77419     78605      1.53
Size:512    Procs:8    234678    265047     12.94
Size:4096   Procs:8    218063    367604     68.57
Size:16384  Procs:8    184283    370972    101.30

Average: 1509300 -> 2345115 = 55.38%
----------------------------------------------------
* [ofa-general] Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000
  2007-09-14  9:00 [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000 Krishna Kumar
@ 2007-09-16 23:17 ` David Miller
  2007-09-17  0:29   ` jamal
  0 siblings, 1 reply; 2+ messages in thread
From: David Miller @ 2007-09-16 23:17 UTC (permalink / raw)
  To: krkumar2
  Cc: johnpol, jagana, peter.p.waskiewicz.jr, kumarkr, herbert, gaagaan,
      Robert.Olsson, netdev, rdreier, hadi, mcarlson, jeff, general,
      mchan, tgraf, randy.dunlap, sri, shemminger, kaber

From: Krishna Kumar <krkumar2@in.ibm.com>
Date: Fri, 14 Sep 2007 14:30:58 +0530

> This set of patches implements the batching xmit capability, and
> adds support for batching in IPoIB and E1000 (E1000 driver changes
> is ported, thanks to changes taken from Jamal's code from an old
> kernel).

The only major complaint I have about this patch series is that the
IPoIB part should just be one big changeset. Otherwise the tree is not
bisectable; for example, the initial IPoIB header file change breaks
the build. The tree must compile and work properly after every single
patch.

On a lower priority, I question the indirection of skb_blist by making
it a pointer. For what? Saving 12 bytes on 64-bit? That kmalloc()'d
thing is a nearly guaranteed cache and/or TLB miss. Just inline the
thing; we generally don't do crap like this anywhere else.
* Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000
  2007-09-16 23:17 ` [ofa-general] " David Miller
@ 2007-09-17  0:29 ` jamal
  2007-09-23 17:53   ` [PATCHES] TX batching jamal
  0 siblings, 1 reply; 2+ messages in thread
From: jamal @ 2007-09-17  0:29 UTC (permalink / raw)
  To: David Miller
  Cc: krkumar2, johnpol, herbert, kaber, shemminger, jagana,
      Robert.Olsson, rick.jones2, xma, gaagaan, netdev, rdreier,
      peter.p.waskiewicz.jr, mcarlson, jeff, mchan, general, kumarkr,
      tgraf, randy.dunlap, sri

On Sun, 2007-16-09 at 16:17 -0700, David Miller wrote:

> The only major complaint I have about this patch series is that
> the IPoIB part should just be one big changeset.

Dave, you do realize that I have been investing my time working on
batching as well, right?

cheers,
jamal
* [PATCHES] TX batching
  2007-09-17  0:29 ` jamal
@ 2007-09-23 17:53 ` jamal
  2007-09-23 17:56   ` [ofa-general] [PATCH 1/4] [NET_SCHED] explict hold dev tx lock jamal
  0 siblings, 1 reply; 2+ messages in thread
From: jamal @ 2007-09-23 17:53 UTC (permalink / raw)
  To: David Miller
  Cc: krkumar2, johnpol, herbert, kaber, shemminger, jagana,
      Robert.Olsson, rick.jones2, xma, gaagaan, netdev, rdreier,
      peter.p.waskiewicz.jr, mcarlson, jeff, mchan, general, kumarkr,
      tgraf, randy.dunlap, sri

I had plenty of time this weekend, so I have been doing a _lot_ of
testing. My next emails will send a set of patches:

Patch 1: Introduces explicit tx locking
Patch 2: Introduces batching interface
Patch 3: Core uses batching interface
Patch 4: Get rid of dev->gso_skb

Testing
-------
Each of these patches has been performance tested, and the results
are in the logs on a per-patch basis. My system-under-test hardware
is a 2xdual-core opteron with a couple of tg3s. My test tool
generates udp traffic of different sizes for up to 60 seconds per
run, or a total of 30M packets. I have 4 threads, each running on a
specific CPU, which keep all the CPUs as busy as they can sending
packets targeted at a directly connected box's udp discard port. All
4 CPUs target a single tg3 to send. The receiving box has a tc rule
which counts and drops all incoming udp packets to the discard port -
this allows me to make sure that the receiver is not the bottleneck
in the testing. Packet sizes sent are {64B, 128B, 256B, 512B, 1024B}.
Each packet size run is repeated 10 times to ensure that there are no
transients. The average of all 10 runs is then computed and
collected.

I have not run testing on patch #4 because I had to let the machine
go, but I will have some access to it tomorrow early morning when I
can run some tests.

Comments
--------
I am trying to kill ->hard_batch_xmit(), but it would be tricky to do
without it for LLTX drivers. Anything I try will require a few extra
checks.
OTOH, I could kill LLTX for the drivers I am using that are LLTX and
then drop that interface, or I could say "no support for LLTX". I am
in a dilemma. Dave, please let me know if this meets your desires to
allow devices which are SG and able to compute CSUM to benefit, just
in case I misunderstood. Herbert, if you can look at at least patch 4
I will appreciate it.

More patches to follow - I didn't want to overload people by dumping
too many patches. Most of the patches below are ready to go; some
need more testing and others need a little porting from an earlier
kernel:
- tg3 driver (tested and works well, but don't want to send
- tun driver
- pktgen
- netiron driver
- e1000 driver
- ethtool interface
- There is at least one other driver promised to me

I am also going to update the two documents I posted earlier.
Hopefully I can do that today.

cheers,
jamal
* [ofa-general] [PATCH 1/4] [NET_SCHED] explict hold dev tx lock
  2007-09-23 17:53 ` [PATCHES] TX batching jamal
@ 2007-09-23 17:56 ` jamal
  2007-09-23 17:58   ` [ofa-general] [PATCH 2/4] [NET_BATCH] Introduce batching interface jamal
  0 siblings, 1 reply; 2+ messages in thread
From: jamal @ 2007-09-23 17:56 UTC (permalink / raw)
  To: David Miller
  Cc: johnpol, peter.p.waskiewicz.jr, kumarkr, herbert, gaagaan,
      Robert.Olsson, netdev, rdreier, mcarlson, randy.dunlap, jagana,
      general, mchan, tgraf, jeff, sri, shemminger, kaber

[-- Attachment #1: Type: text/plain, Size: 141 bytes --]

I have submitted this before, but here it is again. Against net-2.6.24
from yesterday, for this and all following patches.

cheers,
jamal

[-- Attachment #2: patch10f4 --]
[-- Type: text/plain, Size: 3260 bytes --]

[NET_SCHED] explict hold dev tx lock

For N cpus, with full-throttle traffic on all N CPUs, funneling
traffic to the same ethernet device, the device's queue lock is
contended by all N CPUs constantly. The TX lock, on the other hand, is
only contended by a max of 2 CPUs.

In the current mode of qdisc operation, when all N CPUs contend for
the dequeue region and one of them (after all the work) enters the
dequeue region, we may end up aborting the path if we are unable to
get the tx lock, and go back to contend for the queue lock. As N goes
up, this gets worse.

The changes in this patch result in a small increase in performance
on a 4-CPU (2xdual-core) machine with no irq binding. My tests are
UDP based and keep all 4 CPUs busy all the time for the period of the
test. Both e1000 and tg3 showed similar behavior. I expect higher
gains with more CPUs.

Summary below with different UDP packet sizes and the resulting pps
seen. Note that at around 200 bytes the two don't seem that much
different, and we are approaching wire speed (with plenty of CPU
available; e.g. at 512B the app is sitting at 80% idle in both cases).
+------------+--------+--------+--------+--------+--------+
| pktsize    |  64B   |  128B  |  256B  |  512B  | 1024B  |
+------------+--------+--------+--------+--------+--------+
| Original   | 467482 | 463061 | 388267 | 216308 | 114704 |
|            |        |        |        |        |        |
| txlock     | 468922 | 464060 | 388298 | 216316 | 114709 |
+------------+--------+--------+--------+--------+--------+

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>

---
commit b0e36991c5850dfe930f80ee508b08fdcabc18d1
tree b1787bba26f80a325298f89d1ec882cc5ab524ae
parent 42765047105fdd496976bc1784d22eec1cd9b9aa
author Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 09:09:17 -0400
committer Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 09:09:17 -0400

 net/sched/sch_generic.c |   19 ++-----------------
 1 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index e970e8e..95ae119 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -134,34 +134,19 @@ static inline int qdisc_restart(struct net_device *dev)
 {
 	struct Qdisc *q = dev->qdisc;
 	struct sk_buff *skb;
-	unsigned lockless;
 	int ret;
 
 	/* Dequeue packet */
 	if (unlikely((skb = dev_dequeue_skb(dev, q)) == NULL))
 		return 0;
 
-	/*
-	 * When the driver has LLTX set, it does its own locking in
-	 * start_xmit. These checks are worth it because even uncongested
-	 * locks can be quite expensive. The driver can do a trylock, as
-	 * is being done here; in case of lock contention it should return
-	 * NETDEV_TX_LOCKED and the packet will be requeued.
-	 */
-	lockless = (dev->features & NETIF_F_LLTX);
-
-	if (!lockless && !netif_tx_trylock(dev)) {
-		/* Another CPU grabbed the driver tx lock */
-		return handle_dev_cpu_collision(skb, dev, q);
-	}
 
 	/* And release queue */
 	spin_unlock(&dev->queue_lock);
 
+	HARD_TX_LOCK(dev, smp_processor_id());
 	ret = dev_hard_start_xmit(skb, dev);
-
-	if (!lockless)
-		netif_tx_unlock(dev);
+	HARD_TX_UNLOCK(dev);
 
 	spin_lock(&dev->queue_lock);
 	q = dev->qdisc;

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
* [ofa-general] [PATCH 2/4] [NET_BATCH] Introduce batching interface
  2007-09-23 17:56 ` [ofa-general] [PATCH 1/4] [NET_SCHED] explict hold dev tx lock jamal
@ 2007-09-23 17:58 ` jamal
  2007-09-23 18:00   ` [PATCH 3/4][NET_BATCH] net core use batching jamal
  0 siblings, 1 reply; 2+ messages in thread
From: jamal @ 2007-09-23 17:58 UTC (permalink / raw)
  To: David Miller
  Cc: johnpol, peter.p.waskiewicz.jr, kumarkr, herbert, gaagaan,
      Robert.Olsson, netdev, rdreier, mcarlson, randy.dunlap, jagana,
      general, mchan, tgraf, jeff, sri, shemminger, kaber

[-- Attachment #1: Type: text/plain, Size: 77 bytes --]

This patch introduces the netdevice interface for batching.

cheers,
jamal

[-- Attachment #2: patch20f4 --]
[-- Type: text/plain, Size: 8823 bytes --]

[NET_BATCH] Introduce batching interface

This patch introduces the netdevice interface for batching.

A typical driver dev->hard_start_xmit() has 4 parts:
a) packet formatting (example: vlan, mss, descriptor counting, etc.)
b) chip-specific formatting
c) enqueueing the packet on a DMA ring
d) IO operations to complete the packet transmit: telling the DMA
   engine to chew on it, tx completion interrupts, etc.

[For code cleanliness/readability sake, regardless of this work, one
should break dev->hard_start_xmit() into those 4 functions anyway.]

With the API introduced in this patch, a driver which has all 4 parts
and needs to support batching is advised to split its
dev->hard_start_xmit() in the following manner:
1) use its dev->hard_prep_xmit() method to achieve #a
2) use its dev->hard_end_xmit() method to achieve #d
3) #b and #c can stay in ->hard_start_xmit() (or whichever way you
   want to do this)

Note: there are drivers which need not support either of the two
methods (example: the tun driver I patched), so the two methods are
optional.

The core will first do the packet formatting by invoking your
supplied dev->hard_prep_xmit() method.
It will then pass you the packet via your dev->hard_start_xmit()
method, and lastly will invoke your dev->hard_end_xmit() when it
completes passing you all the packets queued for you.
dev->hard_prep_xmit() is invoked without holding any tx lock, but the
rest are under TX_LOCK().

LLTX presents a challenge in that we have to introduce a deviation
from the norm: the ->hard_batch_xmit() method. An LLTX driver
presents us with ->hard_batch_xmit(), to which we pass a list of
packets in a dev->blist skb queue. It is then the responsibility of
->hard_batch_xmit() to exercise steps #b and #c for all packets, and
#d when the batching is complete. Step #a is already done for you by
the time you get the packets in dev->blist.

Lastly, an xmit_win variable is introduced to ensure that when we
pass the driver a list of packets it will swallow all of them - which
is useful because we don't requeue to the qdisc (and avoids burning
unnecessary cpu cycles or introducing any strange re-ordering). The
driver tells us how much space it has for descriptors by setting this
variable when it invokes netif_wake_queue.

Some decisions I had to make:
- every driver will have an xmit_win variable, and the core will set
  it to 1, which means the behavior of non-batching drivers stays the
  same.
- the batch list, blist, is no longer a pointer; this wastes a little
  extra memory, which I plan to recoup by killing gso_skb in later
  patches.

There's a lot of history and reasoning on why batching in a document
I am writing, which I may submit as a patch. Thomas Graf (who
probably doesn't know this) gave me the impetus to start looking at
this back in 2004 when he invited me to the linux conference he was
organizing. Parts of what I presented at SUCON in 2004 talk about
batching. Herbert Xu forced me to take a second look around 2.6.18 -
refer to my netconf 2006 presentation. Krishna Kumar provided me with
more motivation in May 2007 when he posted on netdev and engaged me.
Sridhar Samudrala, Krishna Kumar, Matt Carlson, Michael Chan, Jeremy Ethridge, Evgeniy Polyakov, Sivakumar Subramani, and David Miller, have contributed in one or more of {bug fixes, enhancements, testing, lively discussion}. The Broadcom and netiron folks have been outstanding in their help. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> --- commit ab4b07ef2e4069c115c9c1707d86ae2344a5ded5 tree 994b42b03bbfcc09ac8b7670c53c12e0b2a71dc7 parent b0e36991c5850dfe930f80ee508b08fdcabc18d1 author Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 10:30:32 -0400 committer Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 10:30:32 -0400 include/linux/netdevice.h | 17 +++++++ net/core/dev.c | 106 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 123 insertions(+), 0 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index cf89ce6..443cded 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -453,6 +453,7 @@ struct net_device #define NETIF_F_NETNS_LOCAL 8192 /* Does not change network namespaces */ #define NETIF_F_MULTI_QUEUE 16384 /* Has multiple TX/RX queues */ #define NETIF_F_LRO 32768 /* large receive offload */ +#define NETIF_F_BTX 65536 /* Capable of batch tx */ /* Segmentation offload features */ #define NETIF_F_GSO_SHIFT 16 @@ -578,6 +579,15 @@ struct net_device void *priv; /* pointer to private data */ int (*hard_start_xmit) (struct sk_buff *skb, struct net_device *dev); + /* hard_batch_xmit is needed for LLTX, kill it when those + * disappear or better kill it now and dont support LLTX + */ + int (*hard_batch_xmit) (struct net_device *dev); + int (*hard_prep_xmit) (struct sk_buff *skb, + struct net_device *dev); + void (*hard_end_xmit) (struct net_device *dev); + int xmit_win; + /* These may be needed for future network-power-down code. 
*/ unsigned long trans_start; /* Time (in jiffies) of last Tx */ @@ -592,6 +602,7 @@ struct net_device /* delayed register/unregister */ struct list_head todo_list; + struct sk_buff_head blist; /* device index hash chain */ struct hlist_node index_hlist; @@ -1022,6 +1033,12 @@ extern int dev_set_mac_address(struct net_device *, struct sockaddr *); extern int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev); +extern int dev_batch_xmit(struct net_device *dev); +extern int prepare_gso_skb(struct sk_buff *skb, + struct net_device *dev, + struct sk_buff_head *skbs); +extern int xmit_prepare_skb(struct sk_buff *skb, + struct net_device *dev); extern int netdev_budget; diff --git a/net/core/dev.c b/net/core/dev.c index 91c31e6..25d01fd 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1531,6 +1531,110 @@ static int dev_gso_segment(struct sk_buff *skb) return 0; } +int prepare_gso_skb(struct sk_buff *skb, struct net_device *dev, + struct sk_buff_head *skbs) +{ + int tdq = 0; + do { + struct sk_buff *nskb = skb->next; + + skb->next = nskb->next; + nskb->next = NULL; + + if (dev->hard_prep_xmit) { + /* note: skb->cb is set in hard_prep_xmit(), + * it should not be trampled somewhere + * between here and the driver picking it + * The VLAN code wrongly assumes it owns it + * so the driver needs to be careful; for + * good handling look at tg3 driver .. + */ + int ret = dev->hard_prep_xmit(nskb, dev); + if (ret != NETDEV_TX_OK) + continue; + } + /* Driver likes this packet .. 
*/ + tdq++; + __skb_queue_tail(skbs, nskb); + } while (skb->next); + skb->destructor = DEV_GSO_CB(skb)->destructor; + kfree_skb(skb); + + return tdq; +} + +int xmit_prepare_skb(struct sk_buff *skb, struct net_device *dev) +{ + struct sk_buff_head *skbs = &dev->blist; + + if (netif_needs_gso(dev, skb)) { + if (unlikely(dev_gso_segment(skb))) { + kfree_skb(skb); + return 0; + } + if (skb->next) + return prepare_gso_skb(skb, dev, skbs); + } + + if (dev->hard_prep_xmit) { + int ret = dev->hard_prep_xmit(skb, dev); + if (ret != NETDEV_TX_OK) + return 0; + } + __skb_queue_tail(skbs, skb); + return 1; +} + +int dev_batch_xmit(struct net_device *dev) +{ + struct sk_buff_head *skbs = &dev->blist; + int rc = NETDEV_TX_OK; + struct sk_buff *skb; + int orig_w = dev->xmit_win; + int orig_pkts = skb_queue_len(skbs); + + if (dev->hard_batch_xmit) { /* only for LLTX devices */ + rc = dev->hard_batch_xmit(dev); + } else { + while ((skb = __skb_dequeue(skbs)) != NULL) { + if (!list_empty(&ptype_all)) + dev_queue_xmit_nit(skb, dev); + rc = dev->hard_start_xmit(skb, dev); + if (unlikely(rc)) + break; + /* + * XXX: multiqueue may need closer srutiny.. + */ + if (unlikely(netif_queue_stopped(dev) || + netif_subqueue_stopped(dev, skb->queue_mapping))) { + rc = NETDEV_TX_BUSY; + break; + } + } + } + + /* driver is likely buggy and lied to us on how much + * space it had. Damn you driver .. 
+ */ + if (unlikely(skb_queue_len(skbs))) { + printk(KERN_WARNING "Likely bug %s %s (%d) " + "left %d/%d window now %d, orig %d\n", + dev->name, rc?"busy":"locked", + netif_queue_stopped(dev), + skb_queue_len(skbs), + orig_pkts, + dev->xmit_win, + orig_w); + rc = NETDEV_TX_BUSY; + } + + if (orig_pkts > skb_queue_len(skbs)) + if (dev->hard_end_xmit) + dev->hard_end_xmit(dev); + + return rc; +} + int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev) { if (likely(!skb->next)) { @@ -3565,6 +3669,8 @@ int register_netdevice(struct net_device *dev) } } + dev->xmit_win = 1; + skb_queue_head_init(&dev->blist); /* * nil rebuild_header routine, * that should be never called and used as just bug trap. [-- Attachment #3: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply related [flat|nested] 2+ messages in thread
* [PATCH 3/4][NET_BATCH] net core use batching
  2007-09-23 17:58 ` [ofa-general] [PATCH 2/4] [NET_BATCH] Introduce batching interface jamal
@ 2007-09-23 18:00 ` jamal
  2007-09-23 18:02   ` [ofa-general] [PATCH 4/4][NET_SCHED] kill dev->gso_skb jamal
  0 siblings, 1 reply; 2+ messages in thread
From: jamal @ 2007-09-23 18:00 UTC (permalink / raw)
  To: David Miller
  Cc: krkumar2, johnpol, herbert, kaber, shemminger, jagana,
      Robert.Olsson, rick.jones2, xma, gaagaan, netdev, rdreier,
      peter.p.waskiewicz.jr, mcarlson, jeff, mchan, general, kumarkr,
      tgraf, randy.dunlap, sri

[-- Attachment #1: Type: text/plain, Size: 71 bytes --]

This patch adds the usage of batching within the core.

cheers,
jamal

[-- Attachment #2: patch30f4 --]
[-- Type: text/plain, Size: 6919 bytes --]

[NET_BATCH] net core use batching

This patch adds the usage of batching within the core. The same test
methodology used in introducing txlock is used, with the following
results on different kernels:

+------------+--------+--------+--------+--------+--------+
|            |  64B   |  128B  |  256B  |  512B  | 1024B  |
+------------+--------+--------+--------+--------+--------+
| Original   | 467482 | 463061 | 388267 | 216308 | 114704 |
|            |        |        |        |        |        |
| txlock     | 468922 | 464060 | 388298 | 216316 | 114709 |
|            |        |        |        |        |        |
| tg3nobtx   | 468012 | 464079 | 388293 | 216314 | 114704 |
|            |        |        |        |        |        |
| tg3btxdr   | 480794 | 475102 | 388298 | 216316 | 114705 |
|            |        |        |        |        |        |
| tg3btxco   | 481059 | 475423 | 388285 | 216308 | 114706 |
+------------+--------+--------+--------+--------+--------+

The first two rows, "Original" and "txlock", were introduced in an
earlier patch and demonstrate a slight increase in performance with
txlock. "tg3nobtx" shows the tg3 driver with no changes to support
batching. The purpose of this test is to demonstrate the effect of
introducing the core changes to a driver that doesn't support them.
Although this patch brings down perfomance slightly compared to txlock for such netdevices, it is still better compared to just the original kernel. "tg3btxdr" demonstrates the effect of using ->hard_batch_xmit() with tg3 driver. "tg3btxco" demonstrates the effect of letting the core do all the work. As can be seen the last two are not very different in performance. The difference is ->hard_batch_xmit() introduces a new method which is intrusive. I have #if-0ed some of the old functions so the patch is more readable. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> --- commit e26705f6ef7db034df7af3f4fccd7cd40b8e46e0 tree b99c469497a0145ca5c0651dc4229ce17da5b31c parent 6b8e2f76f86c35a6b2cee3698c633d20495ae0c0 author Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 11:35:25 -0400 committer Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 11:35:25 -0400 net/sched/sch_generic.c | 127 +++++++++++++++++++++++++++++++++++++++++++---- 1 files changed, 115 insertions(+), 12 deletions(-) diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 95ae119..86a3f9d 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -56,6 +56,7 @@ static inline int qdisc_qlen(struct Qdisc *q) return q->q.qlen; } +#if 0 static inline int dev_requeue_skb(struct sk_buff *skb, struct net_device *dev, struct Qdisc *q) { @@ -110,6 +111,97 @@ static inline int handle_dev_cpu_collision(struct sk_buff *skb, return ret; } +#endif + +static inline int handle_dev_cpu_collision(struct net_device *dev) +{ + if (unlikely(dev->xmit_lock_owner == smp_processor_id())) { + if (net_ratelimit()) + printk(KERN_WARNING + "Dead loop on netdevice %s, fix it urgently!\n", + dev->name); + return 1; + } + __get_cpu_var(netdev_rx_stat).cpu_collision++; + return 0; +} + +static inline int +dev_requeue_skbs(struct sk_buff_head *skbs, struct net_device *dev, + struct Qdisc *q) +{ + + struct sk_buff *skb; + + while ((skb = __skb_dequeue(skbs)) != NULL) + q->ops->requeue(skb, q); + + 
netif_schedule(dev); + return 0; +} + +static inline int +xmit_islocked(struct sk_buff_head *skbs, struct net_device *dev, + struct Qdisc *q) +{ + int ret = handle_dev_cpu_collision(dev); + + if (ret) { + if (!skb_queue_empty(skbs)) + skb_queue_purge(skbs); + return qdisc_qlen(q); + } + + return dev_requeue_skbs(skbs, dev, q); +} + +static int xmit_count_skbs(struct sk_buff *skb) +{ + int count = 0; + for (; skb; skb = skb->next) { + count += skb_shinfo(skb)->nr_frags; + count += 1; + } + return count; +} + +static int xmit_get_pkts(struct net_device *dev, + struct Qdisc *q, + struct sk_buff_head *pktlist) +{ + struct sk_buff *skb; + int count = dev->xmit_win; + + if (count && dev->gso_skb) { + skb = dev->gso_skb; + dev->gso_skb = NULL; + count -= xmit_count_skbs(skb); + __skb_queue_tail(pktlist, skb); + } + + while (count > 0) { + skb = q->dequeue(q); + if (!skb) + break; + + count -= xmit_count_skbs(skb); + __skb_queue_tail(pktlist, skb); + } + + return skb_queue_len(pktlist); +} + +static int xmit_prepare_pkts(struct net_device *dev, + struct sk_buff_head *tlist) +{ + struct sk_buff *skb; + struct sk_buff_head *flist = &dev->blist; + + while ((skb = __skb_dequeue(tlist)) != NULL) + xmit_prepare_skb(skb, dev); + + return skb_queue_len(flist); +} /* * NOTE: Called under dev->queue_lock with locally disabled BH. @@ -130,22 +222,27 @@ static inline int handle_dev_cpu_collision(struct sk_buff *skb, * >0 - queue is not empty. 
* */ -static inline int qdisc_restart(struct net_device *dev) + +static inline int qdisc_restart(struct net_device *dev, + struct sk_buff_head *tpktlist) { struct Qdisc *q = dev->qdisc; - struct sk_buff *skb; - int ret; + int ret = 0; - /* Dequeue packet */ - if (unlikely((skb = dev_dequeue_skb(dev, q)) == NULL)) - return 0; + ret = xmit_get_pkts(dev, q, tpktlist); + if (!ret) + return 0; - /* And release queue */ + /* We got em packets */ spin_unlock(&dev->queue_lock); + /* prepare to embark */ + xmit_prepare_pkts(dev, tpktlist); + + /* bye packets ....*/ HARD_TX_LOCK(dev, smp_processor_id()); - ret = dev_hard_start_xmit(skb, dev); + ret = dev_batch_xmit(dev); HARD_TX_UNLOCK(dev); spin_lock(&dev->queue_lock); @@ -158,8 +255,8 @@ static inline int qdisc_restart(struct net_device *dev) break; case NETDEV_TX_LOCKED: - /* Driver try lock failed */ - ret = handle_dev_cpu_collision(skb, dev, q); + /* Driver lock failed */ + ret = xmit_islocked(&dev->blist, dev, q); break; default: @@ -168,7 +265,7 @@ static inline int qdisc_restart(struct net_device *dev) printk(KERN_WARNING "BUG %s code %d qlen %d\n", dev->name, ret, q->q.qlen); - ret = dev_requeue_skb(skb, dev, q); + ret = dev_requeue_skbs(&dev->blist, dev, q); break; } @@ -177,8 +274,11 @@ static inline int qdisc_restart(struct net_device *dev) void __qdisc_run(struct net_device *dev) { + struct sk_buff_head tpktlist; + skb_queue_head_init(&tpktlist); + do { - if (!qdisc_restart(dev)) + if (!qdisc_restart(dev, &tpktlist)) break; } while (!netif_queue_stopped(dev)); @@ -564,6 +664,9 @@ void dev_deactivate(struct net_device *dev) skb = dev->gso_skb; dev->gso_skb = NULL; + if (!skb_queue_empty(&dev->blist)) + skb_queue_purge(&dev->blist); + dev->xmit_win = 1; spin_unlock_bh(&dev->queue_lock); kfree_skb(skb); ^ permalink raw reply related [flat|nested] 2+ messages in thread
* [ofa-general] [PATCH 4/4][NET_SCHED] kill dev->gso_skb
  2007-09-23 18:00 ` [PATCH 3/4][NET_BATCH] net core use batching jamal
@ 2007-09-23 18:02 ` jamal
  2007-10-07 18:39   ` [ofa-general] [PATCH 3/3][NET_BATCH] " jamal
  0 siblings, 1 reply; 2+ messages in thread

From: jamal @ 2007-09-23 18:02 UTC (permalink / raw)
To: David Miller
Cc: johnpol, peter.p.waskiewicz.jr, kumarkr, herbert, gaagaan,
	Robert.Olsson, netdev, rdreier, mcarlson, randy.dunlap, jagana,
	general, mchan, tgraf, jeff, sri, shemminger, kaber

[-- Attachment #1: Type: text/plain, Size: 98 bytes --]

This patch removes dev->gso_skb as it is no longer necessary with
batching code.

cheers,
jamal

[-- Attachment #2: patch40f4 --]
[-- Type: text/plain, Size: 2277 bytes --]

[NET_SCHED] kill dev->gso_skb

The batching code does what gso used to batch at the drivers.
There is no more need for gso_skb. If for whatever reason the
requeueing is a bad idea we are going to leave packets in dev->blist
(and still not need dev->gso_skb)

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>

---
commit c6d2d61a73e1df5daaa294876f62454413fcb0af
tree 1d7bf650096a922a6b6a4e7d6810f83320eb94dd
parent e26705f6ef7db034df7af3f4fccd7cd40b8e46e0
author Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 12:25:10 -0400
committer Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 12:25:10 -0400

 include/linux/netdevice.h |    3 ---
 net/sched/sch_generic.c   |   12 ------------
 2 files changed, 0 insertions(+), 15 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 443cded..7811729 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -560,9 +560,6 @@ struct net_device
 	struct list_head	qdisc_list;
 	unsigned long		tx_queue_len;	/* Max frames per queue allowed */
 
-	/* Partially transmitted GSO packet. */
-	struct sk_buff		*gso_skb;
-
 	/* ingress path synchronizer */
 	spinlock_t		ingress_lock;
 	struct Qdisc		*qdisc_ingress;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 86a3f9d..b4e1607 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -172,13 +172,6 @@ static int xmit_get_pkts(struct net_device *dev,
 	struct sk_buff *skb;
 	int count = dev->xmit_win;
 
-	if (count && dev->gso_skb) {
-		skb = dev->gso_skb;
-		dev->gso_skb = NULL;
-		count -= xmit_count_skbs(skb);
-		__skb_queue_tail(pktlist, skb);
-	}
-
 	while (count > 0) {
 		skb = q->dequeue(q);
 		if (!skb)
@@ -654,7 +647,6 @@ void dev_activate(struct net_device *dev)
 void dev_deactivate(struct net_device *dev)
 {
 	struct Qdisc *qdisc;
-	struct sk_buff *skb;
 
 	spin_lock_bh(&dev->queue_lock);
 	qdisc = dev->qdisc;
@@ -662,15 +654,11 @@ void dev_deactivate(struct net_device *dev)
 
 	qdisc_reset(qdisc);
 
-	skb = dev->gso_skb;
-	dev->gso_skb = NULL;
 	if (!skb_queue_empty(&dev->blist))
 		skb_queue_purge(&dev->blist);
 	dev->xmit_win = 1;
 	spin_unlock_bh(&dev->queue_lock);
 
-	kfree_skb(skb);
-
 	dev_watchdog_down(dev);
 
 	/* Wait for outstanding dev_queue_xmit calls. */
* [ofa-general] [PATCH 3/3][NET_BATCH] kill dev->gso_skb
  2007-09-23 18:02 ` [ofa-general] [PATCH 4/4][NET_SCHED] kill dev->gso_skb jamal
@ 2007-10-07 18:39 ` jamal
  0 siblings, 0 replies; 2+ messages in thread

From: jamal @ 2007-10-07 18:39 UTC (permalink / raw)
To: David Miller
Cc: johnpol, peter.p.waskiewicz.jr, kumarkr, herbert, gaagaan,
	Robert.Olsson, netdev, rdreier, mcarlson, randy.dunlap, jagana,
	general, mchan, tgraf, jeff, sri, shemminger, kaber

[-- Attachment #1: Type: text/plain, Size: 96 bytes --]

This patch removes dev->gso_skb as it is no longer necessary with
batching code.

cheers,
jamal

[-- Attachment #2: oct07-p3of3 --]
[-- Type: text/plain, Size: 2277 bytes --]

[NET_BATCH] kill dev->gso_skb

The batching code does what gso used to batch at the drivers.
There is no more need for gso_skb. If for whatever reason the
requeueing is a bad idea we are going to leave packets in dev->blist
(and still not need dev->gso_skb)

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>

---
commit 7ebf50f0f43edd4897b88601b4133612fc36af61
tree 5d942ecebc14de6254ab3c812d542d524e148e92
parent cd602aa5f84fcef6359852cd99c95863eeb91015
author Jamal Hadi Salim <hadi@cyberus.ca> Sun, 07 Oct 2007 09:30:19 -0400
committer Jamal Hadi Salim <hadi@cyberus.ca> Sun, 07 Oct 2007 09:30:19 -0400

 include/linux/netdevice.h |    3 ---
 net/sched/sch_generic.c   |   12 ------------
 2 files changed, 0 insertions(+), 15 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b31df5c..4ddc6eb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -577,9 +577,6 @@ struct net_device
 	struct list_head	qdisc_list;
 	unsigned long		tx_queue_len;	/* Max frames per queue allowed */
 
-	/* Partially transmitted GSO packet. */
-	struct sk_buff		*gso_skb;
-
 	/* ingress path synchronizer */
 	spinlock_t		ingress_lock;
 	struct Qdisc		*qdisc_ingress;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 80ac56b..772e7fe 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -172,13 +172,6 @@ static int xmit_get_pkts(struct net_device *dev,
 	struct sk_buff *skb;
 	int count = dev->xmit_win;
 
-	if (count && dev->gso_skb) {
-		skb = dev->gso_skb;
-		dev->gso_skb = NULL;
-		count -= xmit_count_skbs(skb);
-		__skb_queue_tail(pktlist, skb);
-	}
-
 	while (count > 0) {
 		skb = q->dequeue(q);
 		if (!skb)
@@ -659,7 +652,6 @@ void dev_activate(struct net_device *dev)
 void dev_deactivate(struct net_device *dev)
 {
 	struct Qdisc *qdisc;
-	struct sk_buff *skb;
 
 	spin_lock_bh(&dev->queue_lock);
 	qdisc = dev->qdisc;
@@ -667,15 +659,11 @@ void dev_deactivate(struct net_device *dev)
 
 	qdisc_reset(qdisc);
 
-	skb = dev->gso_skb;
-	dev->gso_skb = NULL;
 	if (!skb_queue_empty(&dev->blist))
 		skb_queue_purge(&dev->blist);
 	dev->xmit_win = 1;
 	spin_unlock_bh(&dev->queue_lock);
 
-	kfree_skb(skb);
-
 	dev_watchdog_down(dev);
 
 	/* Wait for outstanding dev_queue_xmit calls. */
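The `dev_deactivate()` hunk above captures why `gso_skb` can go away: instead of freeing one cached partially-transmitted packet with `kfree_skb()`, shutdown drains the whole batch list with `skb_queue_purge(&dev->blist)`. Here is a minimal user-space sketch of that purge-on-deactivate pattern; `node`, `blist_add`, and `blist_purge` are hypothetical names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for sk_buff_head and its entries. */
struct node {
	struct node *next;
};

struct blist {
	struct node *head;
	int qlen;
};

static int live_nodes;	/* outstanding allocations, tracked for the demo */

/* Queue an entry (order does not matter for a purge). */
static void blist_add(struct blist *q, struct node *n)
{
	n->next = q->head;
	q->head = n;
	q->qlen++;
}

/* Mirrors skb_queue_purge(): unlink and free every queued entry,
 * leaving the list empty. No per-entry bookkeeping survives, which
 * is exactly what made the separate gso_skb pointer redundant. */
static void blist_purge(struct blist *q)
{
	struct node *n;

	while ((n = q->head) != NULL) {
		q->head = n->next;
		q->qlen--;
		free(n);
		live_nodes--;
	}
}
```

The design point: once every pending packet, GSO or not, lives on one list, a single purge covers all of them, so there is no special case left to clean up.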
end of thread, other threads: [~2007-10-08 18:27 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-08 18:27 [ofa-general] [PATCH 3/3][NET_BATCH] kill dev->gso_skb jamal
  -- strict thread matches above, loose matches on Subject: below --
2007-09-14  9:00 [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000 Krishna Kumar
2007-09-16 23:17 ` [ofa-general] " David Miller
2007-09-17  0:29   ` jamal
2007-09-23 17:53     ` [PATCHES] TX batching jamal
2007-09-23 17:56       ` [ofa-general] [PATCH 1/4] [NET_SCHED] explict hold dev tx lock jamal
2007-09-23 17:58       ` [ofa-general] [PATCH 2/4] [NET_BATCH] Introduce batching interface jamal
2007-09-23 18:00       ` [PATCH 3/4][NET_BATCH] net core use batching jamal
2007-09-23 18:02         ` [ofa-general] [PATCH 4/4][NET_SCHED] kill dev->gso_skb jamal
2007-10-07 18:39           ` [ofa-general] [PATCH 3/3][NET_BATCH] " jamal