* [ofa-general] [PATCH 3/3][NET_BATCH] kill dev->gso_skb @ 2007-10-08 18:27 jamal 0 siblings, 0 replies; 2+ messages in thread From: jamal @ 2007-10-08 18:27 UTC (permalink / raw) To: David Miller Cc: johnpol, peter.p.waskiewicz.jr, kumarkr, herbert, gaagaan, Robert.Olsson, netdev, rdreier, mcarlson, randy.dunlap, jagana, general, mchan, tgraf, jeff, sri, shemminger, kaber [-- Attachment #1: Type: text/plain, Size: 97 bytes --] This patch removes dev->gso_skb as it is no longer necessary with batching code. cheers, jamal [-- Attachment #2: 03-kill-dev-gso-skb.patch --] [-- Type: text/x-patch, Size: 2277 bytes --] [NET_BATCH] kill dev->gso_skb The batching code does what gso used to batch at the drivers. There is no more need for gso_skb. If for whatever reason the requeueing is a bad idea we are going to leave packets in dev->blist (and still not need dev->gso_skb) Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> --- commit 3a6202d62adff75b85d6ca0f8fd491abf9d63f4b tree 80497c538bdb3eab6ab81ff1dec1c3263da79826 parent 63381156b35719a364d0f81fec487e6263fb5467 author Jamal Hadi Salim <hadi@cyberus.ca> Mon, 08 Oct 2007 09:11:02 -0400 committer Jamal Hadi Salim <hadi@cyberus.ca> Mon, 08 Oct 2007 09:11:02 -0400 include/linux/netdevice.h | 3 --- net/sched/sch_generic.c | 12 ------------ 2 files changed, 0 insertions(+), 15 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b31df5c..4ddc6eb 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -577,9 +577,6 @@ struct net_device struct list_head qdisc_list; unsigned long tx_queue_len; /* Max frames per queue allowed */ - /* Partially transmitted GSO packet. 
*/ - struct sk_buff *gso_skb; - /* ingress path synchronizer */ spinlock_t ingress_lock; struct Qdisc *qdisc_ingress; diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 96bfdcb..9112ea0 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -172,13 +172,6 @@ static int xmit_get_pkts(struct net_device *dev, struct sk_buff *skb; int count = dev->xmit_win; - if (count && dev->gso_skb) { - skb = dev->gso_skb; - dev->gso_skb = NULL; - count -= xmit_count_skbs(skb); - __skb_queue_tail(pktlist, skb); - } - while (count > 0) { skb = q->dequeue(q); if (!skb) @@ -647,7 +640,6 @@ void dev_activate(struct net_device *dev) void dev_deactivate(struct net_device *dev) { struct Qdisc *qdisc; - struct sk_buff *skb; spin_lock_bh(&dev->queue_lock); qdisc = dev->qdisc; @@ -655,15 +647,11 @@ void dev_deactivate(struct net_device *dev) qdisc_reset(qdisc); - skb = dev->gso_skb; - dev->gso_skb = NULL; if (!skb_queue_empty(&dev->blist)) skb_queue_purge(&dev->blist); dev->xmit_win = 1; spin_unlock_bh(&dev->queue_lock); - kfree_skb(skb); - dev_watchdog_down(dev); /* Wait for outstanding dev_queue_xmit calls. */ [-- Attachment #3: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply related [flat|nested] 2+ messages in thread
* [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000
@ 2007-09-14  9:00 Krishna Kumar
  2007-09-16 23:17 ` [ofa-general] " David Miller
  0 siblings, 1 reply; 2+ messages in thread
From: Krishna Kumar @ 2007-09-14  9:00 UTC (permalink / raw)
  To: johnpol, herbert, hadi, kaber, shemminger, davem
  Cc: jagana, Robert.Olsson, rick.jones2, xma, gaagaan, netdev, rdreier,
      peter.p.waskiewicz.jr, mcarlson, jeff, mchan, general, kumarkr,
      tgraf, randy.dunlap, Krishna Kumar, sri

This set of patches implements the batching xmit capability, and adds
support for batching in IPoIB and E1000 (the E1000 driver changes are
ported, thanks to changes taken from Jamal's code from an old kernel).

List of changes from previous revision:
----------------------------------------
1. [Dave] Enable batching as default (change in register_netdev).
2. [Randy] Update documentation (however the ethtool cmd to get/set
   batching is not implemented, hence I am guessing the usage).
3. [KK] When changing tx_batch_skb, qdisc xmits need to be blocked
   since qdisc_restart() drops queue_lock before calling the driver
   xmit, and the driver could find blist changing under it.
4. [KK] sched: requeue could wrongly requeue an skb already put in the
   batching list (in case a single skb was sent to the device but not
   sent as the device was full, resulting in the skb getting added to
   blist). This also results in a slight optimization of batching
   behavior, where getting skbs #2 onwards doesn't require checking
   for gso_skb, as that is the first skb that is processed.
5. [KK] Change documentation to explain this behavior.
6. [KK] sched: Fix panic when GSO is enabled in driver.
7. [KK] IPoIB: Small optimization in ipoib_ib_handle_tx_wc.
8. [KK] netdevice: Needed to change NETIF_F_GSO_SHIFT/NETIF_F_GSO_MASK
   as BATCH_SKBS is now defined as 65536 (earlier it was using 8192,
   which was taken up by NETIF_F_NETNS_LOCAL).

Will submit in the next 1-2 days:
---------------------------------
1. [Auke] Enable batching in e1000e.
Extras that I can do later:
---------------------------
1. [Patrick] Use skb_blist statically in netdevice. This could also be
   used to integrate GSO and batching.
2. [Evgeniy] Useful to splice lists in dev_add_skb_to_blist (and this
   can be done for regular xmits of GSO skbs too, for #1 above).

Patches are described as:
Mail  0/10: This mail
Mail  1/10: HOWTO documentation
Mail  2/10: Introduce skb_blist, NETIF_F_BATCH_SKBS, use single API
            for batching/no-batching, etc.
Mail  3/10: Modify qdisc_run() to support batching
Mail  4/10: Add ethtool support to enable/disable batching
Mail  5/10: IPoIB: Header file changes to use batching
Mail  6/10: IPoIB: CM & Multicast changes
Mail  7/10: IPoIB: Verbs changes to use batching
Mail  8/10: IPoIB: Internal post and work completion handler
Mail  9/10: IPoIB: Implement the new batching capability
Mail 10/10: E1000: Implement the new batching capability

Issues:
-------
The retransmission problem reported earlier seems to happen when mthca
is used as the underlying device, but when I tested ehca the
retransmissions dropped to normal levels (around 2 times the regular
code). The performance improvement is around 55% for TCP.

Please review and provide feedback, and consider for inclusion.

Thanks,

- KK

----------------------------------------------------
TCP  (original throughput, batching throughput, % change)
----
Size:32     Procs:1      2728      3544     29.91
Size:128    Procs:1     11803     13679     15.89
Size:512    Procs:1     43279     49665     14.75
Size:4096   Procs:1    147952    101246    -31.56
Size:16384  Procs:1    149852    141897     -5.30

Size:32     Procs:4     10562     11349      7.45
Size:128    Procs:4     41010     40832     -0.43
Size:512    Procs:4     75374    130943     73.72
Size:4096   Procs:4    167996    368218    119.18
Size:16384  Procs:4    123176    379524    208.11

Size:32     Procs:8     21125     21990      4.09
Size:128    Procs:8     77419     78605      1.53
Size:512    Procs:8    234678    265047     12.94
Size:4096   Procs:8    218063    367604     68.57
Size:16384  Procs:8    184283    370972    101.30

Average: 1509300 -> 2345115 = 55.38%
----------------------------------------------------
* [ofa-general] Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000
  2007-09-14  9:00 [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000 Krishna Kumar
@ 2007-09-16 23:17 ` David Miller
  2007-09-17  0:29   ` jamal
  0 siblings, 1 reply; 2+ messages in thread
From: David Miller @ 2007-09-16 23:17 UTC (permalink / raw)
  To: krkumar2
  Cc: johnpol, jagana, peter.p.waskiewicz.jr, kumarkr, herbert, gaagaan,
      Robert.Olsson, netdev, rdreier, hadi, mcarlson, jeff, general,
      mchan, tgraf, randy.dunlap, sri, shemminger, kaber

From: Krishna Kumar <krkumar2@in.ibm.com>
Date: Fri, 14 Sep 2007 14:30:58 +0530

> This set of patches implements the batching xmit capability, and
> adds support for batching in IPoIB and E1000 (E1000 driver changes
> is ported, thanks to changes taken from Jamal's code from an old
> kernel).

The only major complaint I have about this patch series is that the
IPoIB part should just be one big changeset. Otherwise the tree is not
bisectable; for example, the initial IPoIB header file change breaks
the build. The tree must compile and work properly after every single
patch.

On a lower priority, I question the indirection of skb_blist by making
it a pointer. For what? Saving 12 bytes on 64-bit? That kmalloc()'d
thing is a nearly guaranteed cache and/or TLB miss. Just inline the
thing; we generally don't do crap like this anywhere else.
* Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000
  2007-09-16 23:17 ` [ofa-general] " David Miller
@ 2007-09-17  0:29 ` jamal
  2007-09-23 17:53   ` [PATCHES] TX batching jamal
  0 siblings, 1 reply; 2+ messages in thread
From: jamal @ 2007-09-17  0:29 UTC (permalink / raw)
  To: David Miller
  Cc: krkumar2, johnpol, herbert, kaber, shemminger, jagana,
      Robert.Olsson, rick.jones2, xma, gaagaan, netdev, rdreier,
      peter.p.waskiewicz.jr, mcarlson, jeff, mchan, general, kumarkr,
      tgraf, randy.dunlap, sri

On Sun, 2007-16-09 at 16:17 -0700, David Miller wrote:

> The only major complaint I have about this patch series is that
> the IPoIB part should just be one big changeset.

Dave, you do realize that I have been investing my time working on
batching as well, right?

cheers,
jamal
* [PATCHES] TX batching
  2007-09-17  0:29 ` jamal
@ 2007-09-23 17:53 ` jamal
  2007-09-23 17:56   ` [ofa-general] [PATCH 1/4] [NET_SCHED] explict hold dev tx lock jamal
  0 siblings, 1 reply; 2+ messages in thread
From: jamal @ 2007-09-23 17:53 UTC (permalink / raw)
  To: David Miller
  Cc: krkumar2, johnpol, herbert, kaber, shemminger, jagana,
      Robert.Olsson, rick.jones2, xma, gaagaan, netdev, rdreier,
      peter.p.waskiewicz.jr, mcarlson, jeff, mchan, general, kumarkr,
      tgraf, randy.dunlap, sri

I had plenty of time this weekend, so I have been doing a _lot_ of
testing. My next emails will send a set of patches:

Patch 1: Introduces explicit tx locking
Patch 2: Introduces batching interface
Patch 3: Core uses batching interface
Patch 4: Get rid of dev->gso_skb

Testing
-------
Each of these patches has been performance tested, and the results
are in the logs on a per-patch basis. My system-under-test hardware
is a 2xdual-core opteron with a couple of tg3s. My test tool
generates udp traffic of different sizes for up to 60 seconds per
run, or a total of 30M packets. I have 4 threads, each running on a
specific CPU, which keep all the CPUs as busy as they can sending
packets targeted at a directly connected box's udp discard port. All
4 CPUs target a single tg3 to send. The receiving box has a tc rule
which counts and drops all incoming udp packets to the discard port -
this allows me to make sure that the receiver is not the bottleneck
in the testing. Packet sizes sent are {64B, 128B, 256B, 512B, 1024B}.
Each packet size run is repeated 10 times to ensure that there are no
transients. The average of all 10 runs is then computed and
collected.

I have not run testing on patch #4 because I had to let the machine
go, but I will have some access to it tomorrow early morning when I
can run some tests.

Comments
--------
I am trying to kill ->hard_batch_xmit(), but it would be tricky to do
without it for LLTX drivers. Anything I try will require a few extra
checks.
OTOH, I could kill LLTX for the drivers I am using that are LLTX and
then drop that interface, or I could say "no support for LLTX". I am
in a dilemma. Dave, please let me know if this meets your desires to
allow devices which are SG and able to compute CSUM to benefit, just
in case I misunderstood. Herbert, if you can look at at least patch 4
I will appreciate it.

More patches to follow - I didn't want to overload people by dumping
too many patches. Most of the patches below are ready to go; some
need more testing and others need a little porting from an earlier
kernel:
- tg3 driver (tested and works well, but don't want to send
- tun driver
- pktgen
- netiron driver
- e1000 driver
- ethtool interface
- There is at least one other driver promised to me

I am also going to update the two documents I posted earlier.
Hopefully I can do that today.

cheers,
jamal
* [ofa-general] [PATCH 1/4] [NET_SCHED] explict hold dev tx lock
  2007-09-23 17:53 ` [PATCHES] TX batching jamal
@ 2007-09-23 17:56 ` jamal
  2007-09-23 17:58   ` [ofa-general] [PATCH 2/4] [NET_BATCH] Introduce batching interface jamal
  0 siblings, 1 reply; 2+ messages in thread
From: jamal @ 2007-09-23 17:56 UTC (permalink / raw)
  To: David Miller
  Cc: johnpol, peter.p.waskiewicz.jr, kumarkr, herbert, gaagaan,
      Robert.Olsson, netdev, rdreier, mcarlson, randy.dunlap, jagana,
      general, mchan, tgraf, jeff, sri, shemminger, kaber

[-- Attachment #1: Type: text/plain, Size: 141 bytes --]

I have submitted this before, but here it is again. Against net-2.6.24
from yesterday, for this and all following patches.

cheers,
jamal

[-- Attachment #2: patch10f4 --]
[-- Type: text/plain, Size: 3260 bytes --]

[NET_SCHED] explict hold dev tx lock

For N cpus, with full-throttle traffic on all N CPUs, funneling
traffic to the same ethernet device, the device's queue lock is
contended by all N CPUs constantly. The TX lock, on the other hand, is
only contended by a max of 2 CPUs.

In the current mode of qdisc operation, when all N CPUs contend for
the dequeue region and one of them (after all the work) enters the
dequeue region, we may end up aborting the path if we are unable to
get the tx lock, and go back to contend for the queue lock. As N goes
up, this gets worse.

The changes in this patch result in a small increase in performance
on a 4-CPU (2xdual-core) machine with no irq binding. My tests are
UDP based and keep all 4 CPUs busy all the time for the period of the
test. Both e1000 and tg3 showed similar behavior. I expect higher
gains with more CPUs.

Summary below with different UDP packet sizes and the resulting pps
seen. Note that at around 200 bytes the two don't seem that much
different, and we are approaching wire speed (with plenty of CPU
available; e.g. at 512B the app is sitting at 80% idle in both cases).
+------------+--------+--------+--------+--------+--------+
| pktsize    |  64B   |  128B  |  256B  |  512B  | 1024B  |
+------------+--------+--------+--------+--------+--------+
| Original   | 467482 | 463061 | 388267 | 216308 | 114704 |
|            |        |        |        |        |        |
| txlock     | 468922 | 464060 | 388298 | 216316 | 114709 |
+------------+--------+--------+--------+--------+--------+

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>

---
commit b0e36991c5850dfe930f80ee508b08fdcabc18d1
tree b1787bba26f80a325298f89d1ec882cc5ab524ae
parent 42765047105fdd496976bc1784d22eec1cd9b9aa
author Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 09:09:17 -0400
committer Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 09:09:17 -0400

 net/sched/sch_generic.c |   19 ++-----------------
 1 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index e970e8e..95ae119 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -134,34 +134,19 @@ static inline int qdisc_restart(struct net_device *dev)
 {
 	struct Qdisc *q = dev->qdisc;
 	struct sk_buff *skb;
-	unsigned lockless;
 	int ret;
 
 	/* Dequeue packet */
 	if (unlikely((skb = dev_dequeue_skb(dev, q)) == NULL))
 		return 0;
 
-	/*
-	 * When the driver has LLTX set, it does its own locking in
-	 * start_xmit. These checks are worth it because even uncongested
-	 * locks can be quite expensive. The driver can do a trylock, as
-	 * is being done here; in case of lock contention it should return
-	 * NETDEV_TX_LOCKED and the packet will be requeued.
-	 */
-	lockless = (dev->features & NETIF_F_LLTX);
-
-	if (!lockless && !netif_tx_trylock(dev)) {
-		/* Another CPU grabbed the driver tx lock */
-		return handle_dev_cpu_collision(skb, dev, q);
-	}
 
 	/* And release queue */
 	spin_unlock(&dev->queue_lock);
 
+	HARD_TX_LOCK(dev, smp_processor_id());
 	ret = dev_hard_start_xmit(skb, dev);
-
-	if (!lockless)
-		netif_tx_unlock(dev);
+	HARD_TX_UNLOCK(dev);
 
 	spin_lock(&dev->queue_lock);
 	q = dev->qdisc;

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
* [ofa-general] [PATCH 2/4] [NET_BATCH] Introduce batching interface
  2007-09-23 17:56 ` [ofa-general] [PATCH 1/4] [NET_SCHED] explict hold dev tx lock jamal
@ 2007-09-23 17:58 ` jamal
  2007-09-23 18:00   ` [PATCH 3/4][NET_BATCH] net core use batching jamal
  0 siblings, 1 reply; 2+ messages in thread
From: jamal @ 2007-09-23 17:58 UTC (permalink / raw)
  To: David Miller
  Cc: johnpol, peter.p.waskiewicz.jr, kumarkr, herbert, gaagaan,
      Robert.Olsson, netdev, rdreier, mcarlson, randy.dunlap, jagana,
      general, mchan, tgraf, jeff, sri, shemminger, kaber

[-- Attachment #1: Type: text/plain, Size: 77 bytes --]

This patch introduces the netdevice interface for batching.

cheers,
jamal

[-- Attachment #2: patch20f4 --]
[-- Type: text/plain, Size: 8823 bytes --]

[NET_BATCH] Introduce batching interface

This patch introduces the netdevice interface for batching.

A typical driver dev->hard_start_xmit() has 4 parts:
a) packet formatting (example: vlan, mss, descriptor counting, etc.)
b) chip-specific formatting
c) enqueueing the packet on a DMA ring
d) IO operations to complete the packet transmit: telling the DMA
   engine to chew on it, tx completion interrupts, etc.

[For code cleanliness/readability sake, regardless of this work, one
should break dev->hard_start_xmit() into those 4 functions anyway.]

With the API introduced in this patch, a driver which has all 4 parts
and needs to support batching is advised to split its
dev->hard_start_xmit() in the following manner:
1) use its dev->hard_prep_xmit() method to achieve #a
2) use its dev->hard_end_xmit() method to achieve #d
3) #b and #c can stay in ->hard_start_xmit() (or whichever way you
   want to do this)

Note: there are drivers which need not support either of the two
methods (example: the tun driver I patched), so the two methods are
optional.

The core will first do the packet formatting by invoking your
supplied dev->hard_prep_xmit() method.
It will then pass you the packet via your dev->hard_start_xmit()
method, and lastly will invoke your dev->hard_end_xmit() when it
completes passing you all the packets queued for you.
dev->hard_prep_xmit() is invoked without holding any tx lock, but the
rest are under TX_LOCK().

LLTX presents a challenge in that we have to introduce a deviation
from the norm: the ->hard_batch_xmit() method. An LLTX driver
presents us with ->hard_batch_xmit(), to which we pass a list of
packets in a dev->blist skb queue. It is then the responsibility of
->hard_batch_xmit() to exercise steps #b and #c for all packets, and
#d when the batching is complete. Step #a is already done for you by
the time you get the packets in dev->blist.

Lastly, an xmit_win variable is introduced to ensure that when we
pass the driver a list of packets it will swallow all of them - which
is useful because we don't requeue to the qdisc (and avoids burning
unnecessary cpu cycles or introducing any strange re-ordering). The
driver tells us how much space it has for descriptors by setting this
variable when it invokes netif_wake_queue.

Some decisions I had to make:
- every driver will have an xmit_win variable, and the core will set
  it to 1, which means the behavior of non-batching drivers stays the
  same.
- the batch list, blist, is no longer a pointer; this wastes a little
  extra memory, which I plan to recoup by killing gso_skb in later
  patches.

There's a lot of history and reasoning on why batching in a document
I am writing, which I may submit as a patch. Thomas Graf (who
probably doesn't know this) gave me the impetus to start looking at
this back in 2004 when he invited me to the linux conference he was
organizing. Parts of what I presented at SUCON in 2004 talk about
batching. Herbert Xu forced me to take a second look around 2.6.18 -
refer to my netconf 2006 presentation. Krishna Kumar provided me with
more motivation in May 2007 when he posted on netdev and engaged me.
Sridhar Samudrala, Krishna Kumar, Matt Carlson, Michael Chan, Jeremy Ethridge, Evgeniy Polyakov, Sivakumar Subramani, and David Miller, have contributed in one or more of {bug fixes, enhancements, testing, lively discussion}. The Broadcom and netiron folks have been outstanding in their help. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> --- commit ab4b07ef2e4069c115c9c1707d86ae2344a5ded5 tree 994b42b03bbfcc09ac8b7670c53c12e0b2a71dc7 parent b0e36991c5850dfe930f80ee508b08fdcabc18d1 author Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 10:30:32 -0400 committer Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 10:30:32 -0400 include/linux/netdevice.h | 17 +++++++ net/core/dev.c | 106 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 123 insertions(+), 0 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index cf89ce6..443cded 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -453,6 +453,7 @@ struct net_device #define NETIF_F_NETNS_LOCAL 8192 /* Does not change network namespaces */ #define NETIF_F_MULTI_QUEUE 16384 /* Has multiple TX/RX queues */ #define NETIF_F_LRO 32768 /* large receive offload */ +#define NETIF_F_BTX 65536 /* Capable of batch tx */ /* Segmentation offload features */ #define NETIF_F_GSO_SHIFT 16 @@ -578,6 +579,15 @@ struct net_device void *priv; /* pointer to private data */ int (*hard_start_xmit) (struct sk_buff *skb, struct net_device *dev); + /* hard_batch_xmit is needed for LLTX, kill it when those + * disappear or better kill it now and dont support LLTX + */ + int (*hard_batch_xmit) (struct net_device *dev); + int (*hard_prep_xmit) (struct sk_buff *skb, + struct net_device *dev); + void (*hard_end_xmit) (struct net_device *dev); + int xmit_win; + /* These may be needed for future network-power-down code. 
*/ unsigned long trans_start; /* Time (in jiffies) of last Tx */ @@ -592,6 +602,7 @@ struct net_device /* delayed register/unregister */ struct list_head todo_list; + struct sk_buff_head blist; /* device index hash chain */ struct hlist_node index_hlist; @@ -1022,6 +1033,12 @@ extern int dev_set_mac_address(struct net_device *, struct sockaddr *); extern int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev); +extern int dev_batch_xmit(struct net_device *dev); +extern int prepare_gso_skb(struct sk_buff *skb, + struct net_device *dev, + struct sk_buff_head *skbs); +extern int xmit_prepare_skb(struct sk_buff *skb, + struct net_device *dev); extern int netdev_budget; diff --git a/net/core/dev.c b/net/core/dev.c index 91c31e6..25d01fd 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1531,6 +1531,110 @@ static int dev_gso_segment(struct sk_buff *skb) return 0; } +int prepare_gso_skb(struct sk_buff *skb, struct net_device *dev, + struct sk_buff_head *skbs) +{ + int tdq = 0; + do { + struct sk_buff *nskb = skb->next; + + skb->next = nskb->next; + nskb->next = NULL; + + if (dev->hard_prep_xmit) { + /* note: skb->cb is set in hard_prep_xmit(), + * it should not be trampled somewhere + * between here and the driver picking it + * The VLAN code wrongly assumes it owns it + * so the driver needs to be careful; for + * good handling look at tg3 driver .. + */ + int ret = dev->hard_prep_xmit(nskb, dev); + if (ret != NETDEV_TX_OK) + continue; + } + /* Driver likes this packet .. 
*/ + tdq++; + __skb_queue_tail(skbs, nskb); + } while (skb->next); + skb->destructor = DEV_GSO_CB(skb)->destructor; + kfree_skb(skb); + + return tdq; +} + +int xmit_prepare_skb(struct sk_buff *skb, struct net_device *dev) +{ + struct sk_buff_head *skbs = &dev->blist; + + if (netif_needs_gso(dev, skb)) { + if (unlikely(dev_gso_segment(skb))) { + kfree_skb(skb); + return 0; + } + if (skb->next) + return prepare_gso_skb(skb, dev, skbs); + } + + if (dev->hard_prep_xmit) { + int ret = dev->hard_prep_xmit(skb, dev); + if (ret != NETDEV_TX_OK) + return 0; + } + __skb_queue_tail(skbs, skb); + return 1; +} + +int dev_batch_xmit(struct net_device *dev) +{ + struct sk_buff_head *skbs = &dev->blist; + int rc = NETDEV_TX_OK; + struct sk_buff *skb; + int orig_w = dev->xmit_win; + int orig_pkts = skb_queue_len(skbs); + + if (dev->hard_batch_xmit) { /* only for LLTX devices */ + rc = dev->hard_batch_xmit(dev); + } else { + while ((skb = __skb_dequeue(skbs)) != NULL) { + if (!list_empty(&ptype_all)) + dev_queue_xmit_nit(skb, dev); + rc = dev->hard_start_xmit(skb, dev); + if (unlikely(rc)) + break; + /* + * XXX: multiqueue may need closer srutiny.. + */ + if (unlikely(netif_queue_stopped(dev) || + netif_subqueue_stopped(dev, skb->queue_mapping))) { + rc = NETDEV_TX_BUSY; + break; + } + } + } + + /* driver is likely buggy and lied to us on how much + * space it had. Damn you driver .. 
+ */ + if (unlikely(skb_queue_len(skbs))) { + printk(KERN_WARNING "Likely bug %s %s (%d) " + "left %d/%d window now %d, orig %d\n", + dev->name, rc?"busy":"locked", + netif_queue_stopped(dev), + skb_queue_len(skbs), + orig_pkts, + dev->xmit_win, + orig_w); + rc = NETDEV_TX_BUSY; + } + + if (orig_pkts > skb_queue_len(skbs)) + if (dev->hard_end_xmit) + dev->hard_end_xmit(dev); + + return rc; +} + int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev) { if (likely(!skb->next)) { @@ -3565,6 +3669,8 @@ int register_netdevice(struct net_device *dev) } } + dev->xmit_win = 1; + skb_queue_head_init(&dev->blist); /* * nil rebuild_header routine, * that should be never called and used as just bug trap. [-- Attachment #3: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply related [flat|nested] 2+ messages in thread
* [PATCH 3/4][NET_BATCH] net core use batching
  2007-09-23 17:58 ` [ofa-general] [PATCH 2/4] [NET_BATCH] Introduce batching interface jamal
@ 2007-09-23 18:00 ` jamal
  2007-09-23 18:02   ` [ofa-general] [PATCH 4/4][NET_SCHED] kill dev->gso_skb jamal
  0 siblings, 1 reply; 2+ messages in thread
From: jamal @ 2007-09-23 18:00 UTC (permalink / raw)
  To: David Miller
  Cc: krkumar2, johnpol, herbert, kaber, shemminger, jagana,
      Robert.Olsson, rick.jones2, xma, gaagaan, netdev, rdreier,
      peter.p.waskiewicz.jr, mcarlson, jeff, mchan, general, kumarkr,
      tgraf, randy.dunlap, sri

[-- Attachment #1: Type: text/plain, Size: 71 bytes --]

This patch adds the usage of batching within the core.

cheers,
jamal

[-- Attachment #2: patch30f4 --]
[-- Type: text/plain, Size: 6919 bytes --]

[NET_BATCH] net core use batching

This patch adds the usage of batching within the core. The same test
methodology used in introducing txlock is used, with the following
results on different kernels:

+------------+--------+--------+--------+--------+--------+
|            |  64B   |  128B  |  256B  |  512B  | 1024B  |
+------------+--------+--------+--------+--------+--------+
| Original   | 467482 | 463061 | 388267 | 216308 | 114704 |
|            |        |        |        |        |        |
| txlock     | 468922 | 464060 | 388298 | 216316 | 114709 |
|            |        |        |        |        |        |
| tg3nobtx   | 468012 | 464079 | 388293 | 216314 | 114704 |
|            |        |        |        |        |        |
| tg3btxdr   | 480794 | 475102 | 388298 | 216316 | 114705 |
|            |        |        |        |        |        |
| tg3btxco   | 481059 | 475423 | 388285 | 216308 | 114706 |
+------------+--------+--------+--------+--------+--------+

The first two rows, "Original" and "txlock", were introduced in an
earlier patch and demonstrate a slight increase in performance with
txlock. "tg3nobtx" shows the tg3 driver with no changes to support
batching. The purpose of this test is to demonstrate the effect of
introducing the core changes to a driver that doesn't support them.
Although this patch brings down perfomance slightly compared to txlock for such netdevices, it is still better compared to just the original kernel. "tg3btxdr" demonstrates the effect of using ->hard_batch_xmit() with tg3 driver. "tg3btxco" demonstrates the effect of letting the core do all the work. As can be seen the last two are not very different in performance. The difference is ->hard_batch_xmit() introduces a new method which is intrusive. I have #if-0ed some of the old functions so the patch is more readable. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> --- commit e26705f6ef7db034df7af3f4fccd7cd40b8e46e0 tree b99c469497a0145ca5c0651dc4229ce17da5b31c parent 6b8e2f76f86c35a6b2cee3698c633d20495ae0c0 author Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 11:35:25 -0400 committer Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 11:35:25 -0400 net/sched/sch_generic.c | 127 +++++++++++++++++++++++++++++++++++++++++++---- 1 files changed, 115 insertions(+), 12 deletions(-) diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 95ae119..86a3f9d 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -56,6 +56,7 @@ static inline int qdisc_qlen(struct Qdisc *q) return q->q.qlen; } +#if 0 static inline int dev_requeue_skb(struct sk_buff *skb, struct net_device *dev, struct Qdisc *q) { @@ -110,6 +111,97 @@ static inline int handle_dev_cpu_collision(struct sk_buff *skb, return ret; } +#endif + +static inline int handle_dev_cpu_collision(struct net_device *dev) +{ + if (unlikely(dev->xmit_lock_owner == smp_processor_id())) { + if (net_ratelimit()) + printk(KERN_WARNING + "Dead loop on netdevice %s, fix it urgently!\n", + dev->name); + return 1; + } + __get_cpu_var(netdev_rx_stat).cpu_collision++; + return 0; +} + +static inline int +dev_requeue_skbs(struct sk_buff_head *skbs, struct net_device *dev, + struct Qdisc *q) +{ + + struct sk_buff *skb; + + while ((skb = __skb_dequeue(skbs)) != NULL) + q->ops->requeue(skb, q); + + 
netif_schedule(dev); + return 0; +} + +static inline int +xmit_islocked(struct sk_buff_head *skbs, struct net_device *dev, + struct Qdisc *q) +{ + int ret = handle_dev_cpu_collision(dev); + + if (ret) { + if (!skb_queue_empty(skbs)) + skb_queue_purge(skbs); + return qdisc_qlen(q); + } + + return dev_requeue_skbs(skbs, dev, q); +} + +static int xmit_count_skbs(struct sk_buff *skb) +{ + int count = 0; + for (; skb; skb = skb->next) { + count += skb_shinfo(skb)->nr_frags; + count += 1; + } + return count; +} + +static int xmit_get_pkts(struct net_device *dev, + struct Qdisc *q, + struct sk_buff_head *pktlist) +{ + struct sk_buff *skb; + int count = dev->xmit_win; + + if (count && dev->gso_skb) { + skb = dev->gso_skb; + dev->gso_skb = NULL; + count -= xmit_count_skbs(skb); + __skb_queue_tail(pktlist, skb); + } + + while (count > 0) { + skb = q->dequeue(q); + if (!skb) + break; + + count -= xmit_count_skbs(skb); + __skb_queue_tail(pktlist, skb); + } + + return skb_queue_len(pktlist); +} + +static int xmit_prepare_pkts(struct net_device *dev, + struct sk_buff_head *tlist) +{ + struct sk_buff *skb; + struct sk_buff_head *flist = &dev->blist; + + while ((skb = __skb_dequeue(tlist)) != NULL) + xmit_prepare_skb(skb, dev); + + return skb_queue_len(flist); +} /* * NOTE: Called under dev->queue_lock with locally disabled BH. @@ -130,22 +222,27 @@ static inline int handle_dev_cpu_collision(struct sk_buff *skb, * >0 - queue is not empty. 
* */ -static inline int qdisc_restart(struct net_device *dev) + +static inline int qdisc_restart(struct net_device *dev, + struct sk_buff_head *tpktlist) { struct Qdisc *q = dev->qdisc; - struct sk_buff *skb; - int ret; + int ret = 0; - /* Dequeue packet */ - if (unlikely((skb = dev_dequeue_skb(dev, q)) == NULL)) - return 0; + ret = xmit_get_pkts(dev, q, tpktlist); + if (!ret) + return 0; - /* And release queue */ + /* We got em packets */ spin_unlock(&dev->queue_lock); + /* prepare to embark */ + xmit_prepare_pkts(dev, tpktlist); + + /* bye packets ....*/ HARD_TX_LOCK(dev, smp_processor_id()); - ret = dev_hard_start_xmit(skb, dev); + ret = dev_batch_xmit(dev); HARD_TX_UNLOCK(dev); spin_lock(&dev->queue_lock); @@ -158,8 +255,8 @@ static inline int qdisc_restart(struct net_device *dev) break; case NETDEV_TX_LOCKED: - /* Driver try lock failed */ - ret = handle_dev_cpu_collision(skb, dev, q); + /* Driver lock failed */ + ret = xmit_islocked(&dev->blist, dev, q); break; default: @@ -168,7 +265,7 @@ static inline int qdisc_restart(struct net_device *dev) printk(KERN_WARNING "BUG %s code %d qlen %d\n", dev->name, ret, q->q.qlen); - ret = dev_requeue_skb(skb, dev, q); + ret = dev_requeue_skbs(&dev->blist, dev, q); break; } @@ -177,8 +274,11 @@ static inline int qdisc_restart(struct net_device *dev) void __qdisc_run(struct net_device *dev) { + struct sk_buff_head tpktlist; + skb_queue_head_init(&tpktlist); + do { - if (!qdisc_restart(dev)) + if (!qdisc_restart(dev, &tpktlist)) break; } while (!netif_queue_stopped(dev)); @@ -564,6 +664,9 @@ void dev_deactivate(struct net_device *dev) skb = dev->gso_skb; dev->gso_skb = NULL; + if (!skb_queue_empty(&dev->blist)) + skb_queue_purge(&dev->blist); + dev->xmit_win = 1; spin_unlock_bh(&dev->queue_lock); kfree_skb(skb); ^ permalink raw reply related [flat|nested] 2+ messages in thread
* [ofa-general] [PATCH 4/4][NET_SCHED] kill dev->gso_skb
  2007-09-23 18:00 ` [PATCH 3/4][NET_BATCH] net core use batching jamal
@ 2007-09-23 18:02 ` jamal
  2007-10-07 18:39   ` [ofa-general] [PATCH 3/3][NET_BATCH] " jamal
  0 siblings, 1 reply; 2+ messages in thread

From: jamal @ 2007-09-23 18:02 UTC (permalink / raw)
To: David Miller
Cc: johnpol, peter.p.waskiewicz.jr, kumarkr, herbert, gaagaan,
	Robert.Olsson, netdev, rdreier, mcarlson, randy.dunlap, jagana,
	general, mchan, tgraf, jeff, sri, shemminger, kaber

[-- Attachment #1: Type: text/plain, Size: 98 bytes --]

This patch removes dev->gso_skb as it is no longer necessary with
batching code.

cheers,
jamal

[-- Attachment #2: patch40f4 --]
[-- Type: text/plain, Size: 2277 bytes --]

[NET_SCHED] kill dev->gso_skb

The batching code does what gso used to batch at the drivers.
There is no more need for gso_skb. If for whatever reason the
requeueing is a bad idea we are going to leave packets in dev->blist
(and still not need dev->gso_skb)

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>

---
commit c6d2d61a73e1df5daaa294876f62454413fcb0af
tree 1d7bf650096a922a6b6a4e7d6810f83320eb94dd
parent e26705f6ef7db034df7af3f4fccd7cd40b8e46e0
author Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 12:25:10 -0400
committer Jamal Hadi Salim <hadi@cyberus.ca> Sun, 23 Sep 2007 12:25:10 -0400

 include/linux/netdevice.h |    3 ---
 net/sched/sch_generic.c   |   12 ------------
 2 files changed, 0 insertions(+), 15 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 443cded..7811729 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -560,9 +560,6 @@ struct net_device
 	struct list_head	qdisc_list;
 	unsigned long		tx_queue_len;	/* Max frames per queue allowed */
 
-	/* Partially transmitted GSO packet. */
-	struct sk_buff		*gso_skb;
-
 	/* ingress path synchronizer */
 	spinlock_t		ingress_lock;
 	struct Qdisc		*qdisc_ingress;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 86a3f9d..b4e1607 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -172,13 +172,6 @@ static int xmit_get_pkts(struct net_device *dev,
 	struct sk_buff *skb;
 	int count = dev->xmit_win;
 
-	if (count && dev->gso_skb) {
-		skb = dev->gso_skb;
-		dev->gso_skb = NULL;
-		count -= xmit_count_skbs(skb);
-		__skb_queue_tail(pktlist, skb);
-	}
-
 	while (count > 0) {
 		skb = q->dequeue(q);
 		if (!skb)
@@ -654,7 +647,6 @@ void dev_activate(struct net_device *dev)
 void dev_deactivate(struct net_device *dev)
 {
 	struct Qdisc *qdisc;
-	struct sk_buff *skb;
 
 	spin_lock_bh(&dev->queue_lock);
 	qdisc = dev->qdisc;
@@ -662,15 +654,11 @@ void dev_deactivate(struct net_device *dev)
 
 	qdisc_reset(qdisc);
 
-	skb = dev->gso_skb;
-	dev->gso_skb = NULL;
 	if (!skb_queue_empty(&dev->blist))
 		skb_queue_purge(&dev->blist);
 	dev->xmit_win = 1;
 	spin_unlock_bh(&dev->queue_lock);
 
-	kfree_skb(skb);
-
 	dev_watchdog_down(dev);
 
 	/* Wait for outstanding dev_queue_xmit calls. */
* [ofa-general] [PATCH 3/3][NET_BATCH] kill dev->gso_skb
  2007-09-23 18:02 ` [ofa-general] [PATCH 4/4][NET_SCHED] kill dev->gso_skb jamal
@ 2007-10-07 18:39 ` jamal
  0 siblings, 0 replies; 2+ messages in thread

From: jamal @ 2007-10-07 18:39 UTC (permalink / raw)
To: David Miller
Cc: johnpol, peter.p.waskiewicz.jr, kumarkr, herbert, gaagaan,
	Robert.Olsson, netdev, rdreier, mcarlson, randy.dunlap, jagana,
	general, mchan, tgraf, jeff, sri, shemminger, kaber

[-- Attachment #1: Type: text/plain, Size: 96 bytes --]

This patch removes dev->gso_skb as it is no longer necessary with
batching code.

cheers,
jamal

[-- Attachment #2: oct07-p3of3 --]
[-- Type: text/plain, Size: 2277 bytes --]

[NET_BATCH] kill dev->gso_skb

The batching code does what gso used to batch at the drivers.
There is no more need for gso_skb. If for whatever reason the
requeueing is a bad idea we are going to leave packets in dev->blist
(and still not need dev->gso_skb)

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>

---
commit 7ebf50f0f43edd4897b88601b4133612fc36af61
tree 5d942ecebc14de6254ab3c812d542d524e148e92
parent cd602aa5f84fcef6359852cd99c95863eeb91015
author Jamal Hadi Salim <hadi@cyberus.ca> Sun, 07 Oct 2007 09:30:19 -0400
committer Jamal Hadi Salim <hadi@cyberus.ca> Sun, 07 Oct 2007 09:30:19 -0400

 include/linux/netdevice.h |    3 ---
 net/sched/sch_generic.c   |   12 ------------
 2 files changed, 0 insertions(+), 15 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b31df5c..4ddc6eb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -577,9 +577,6 @@ struct net_device
 	struct list_head	qdisc_list;
 	unsigned long		tx_queue_len;	/* Max frames per queue allowed */
 
-	/* Partially transmitted GSO packet. */
-	struct sk_buff		*gso_skb;
-
 	/* ingress path synchronizer */
 	spinlock_t		ingress_lock;
 	struct Qdisc		*qdisc_ingress;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 80ac56b..772e7fe 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -172,13 +172,6 @@ static int xmit_get_pkts(struct net_device *dev,
 	struct sk_buff *skb;
 	int count = dev->xmit_win;
 
-	if (count && dev->gso_skb) {
-		skb = dev->gso_skb;
-		dev->gso_skb = NULL;
-		count -= xmit_count_skbs(skb);
-		__skb_queue_tail(pktlist, skb);
-	}
-
 	while (count > 0) {
 		skb = q->dequeue(q);
 		if (!skb)
@@ -659,7 +652,6 @@ void dev_activate(struct net_device *dev)
 void dev_deactivate(struct net_device *dev)
 {
 	struct Qdisc *qdisc;
-	struct sk_buff *skb;
 
 	spin_lock_bh(&dev->queue_lock);
 	qdisc = dev->qdisc;
@@ -667,15 +659,11 @@ void dev_deactivate(struct net_device *dev)
 
 	qdisc_reset(qdisc);
 
-	skb = dev->gso_skb;
-	dev->gso_skb = NULL;
 	if (!skb_queue_empty(&dev->blist))
 		skb_queue_purge(&dev->blist);
 	dev->xmit_win = 1;
 	spin_unlock_bh(&dev->queue_lock);
 
-	kfree_skb(skb);
-
 	dev_watchdog_down(dev);
 
 	/* Wait for outstanding dev_queue_xmit calls. */
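The `dev_deactivate()` hunk above captures why `gso_skb` can go away: instead of freeing one cached partially-transmitted packet with `kfree_skb()`, shutdown drains the whole batch list with `skb_queue_purge(&dev->blist)`. Here is a minimal user-space sketch of that purge-on-deactivate pattern; `node`, `blist_add`, and `blist_purge` are hypothetical names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for sk_buff_head and its entries. */
struct node {
	struct node *next;
};

struct blist {
	struct node *head;
	int qlen;
};

static int live_nodes;	/* outstanding allocations, tracked for the demo */

/* Queue an entry (order does not matter for a purge). */
static void blist_add(struct blist *q, struct node *n)
{
	n->next = q->head;
	q->head = n;
	q->qlen++;
}

/* Mirrors skb_queue_purge(): unlink and free every queued entry,
 * leaving the list empty. No per-entry bookkeeping survives, which
 * is exactly what made the separate gso_skb pointer redundant. */
static void blist_purge(struct blist *q)
{
	struct node *n;

	while ((n = q->head) != NULL) {
		q->head = n->next;
		q->qlen--;
		free(n);
		live_nodes--;
	}
}
```

The design point: once every pending packet, GSO or not, lives on one list, a single purge covers all of them, so there is no special case left to clean up.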
end of thread, other threads: [~2007-10-08 18:27 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-08 18:27 [ofa-general] [PATCH 3/3][NET_BATCH] kill dev->gso_skb jamal
  -- strict thread matches above, loose matches on Subject: below --
2007-09-14  9:00 [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000 Krishna Kumar
2007-09-16 23:17 ` [ofa-general] " David Miller
2007-09-17  0:29   ` jamal
2007-09-23 17:53     ` [PATCHES] TX batching jamal
2007-09-23 17:56       ` [ofa-general] [PATCH 1/4] [NET_SCHED] explict hold dev tx lock jamal
2007-09-23 17:58       ` [ofa-general] [PATCH 2/4] [NET_BATCH] Introduce batching interface jamal
2007-09-23 18:00       ` [PATCH 3/4][NET_BATCH] net core use batching jamal
2007-09-23 18:02         ` [ofa-general] [PATCH 4/4][NET_SCHED] kill dev->gso_skb jamal
2007-10-07 18:39           ` [ofa-general] [PATCH 3/3][NET_BATCH] " jamal