public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000
@ 2007-09-14  9:00 Krishna Kumar
  2007-09-14  9:01 ` [PATCH 1/10 REV5] [Doc] HOWTO Documentation for batching Krishna Kumar
                   ` (11 more replies)
  0 siblings, 12 replies; 103+ messages in thread
From: Krishna Kumar @ 2007-09-14  9:00 UTC (permalink / raw)
  To: johnpol, herbert, hadi, kaber, shemminger, davem
  Cc: jagana, Robert.Olsson, rick.jones2, xma, gaagaan, netdev, rdreier,
	peter.p.waskiewicz.jr, mcarlson, jeff, mchan, general, kumarkr,
	tgraf, randy.dunlap, Krishna Kumar, sri

This set of patches implements the batching xmit capability, and adds support
for batching in IPoIB and E1000 (E1000 driver changes is ported, thanks to
changes taken from Jamal's code from an old kernel).

List of changes from previous revision:
----------------------------------------
1. [Dave] Enable batching as default (change in register_netdev).
2. [Randy] Update documentation (however ethtool cmd to get/set batching is
	not implemented, hence I am guessing the usage).
3. [KK] When changing tx_batch_skb, qdisc xmits need to be blocked since
	qdisc_restart() drops queue_lock before calling driver xmit, and
	driver could find blist change under it.
4. [KK] sched: requeue could wrongly requeue skb already put in the batching
	list (in case a single skb was sent to the device but not sent as the
	device was full, resulting in the skb getting added to blist). This
	also results in slight optimization of batching behavior where for
	getting skbs #2 onwards don't require to check for gso_skb as that
	is the first skb that is processed.
4. [KK] Change documentation to explain this behavior.
5. [KK] sched: Fix panic when GSO is enabled in driver.
6. [KK] IPoIB: Small optimization in ipoib_ib_handle_tx_wc
7. [KK] netdevice: Needed to change NETIF_F_GSO_SHIFT/NETIF_F_GSO_MASK as
	BATCH_SKBS is now defined as 65536 (earlier it was using 8192 which
	was taken up by NETIF_F_NETNS_LOCAL).


Will submit in the next 1-2 days:
---------------------------------
1. [Auke] Enable batching in e1000e.


Extras that I can do later:
---------------------------
1. [Patrick] Use skb_blist statically in netdevice. This could also be used
	to integrate GSO and batching.
2. [Evgeniy] Useful to splice lists dev_add_skb_to_blist (and this can be
	done for regular xmit's of GSO skbs too for #1 above).

Patches are described as:
		 Mail 0/10:  This mail
		 Mail 1/10:  HOWTO documentation
		 Mail 2/10:  Introduce skb_blist, NETIF_F_BATCH_SKBS, use
		 	     single API for batching/no-batching, etc.
		 Mail 3/10:  Modify qdisc_run() to support batching
		 Mail 4/10:  Add ethtool support to enable/disable batching
		 Mail 5/10:  IPoIB: Header file changes to use batching
		 Mail 6/10:  IPoIB: CM & Multicast changes
		 Mail 7/10:  IPoIB: Verbs changes to use batching
		 Mail 8/10:  IPoIB: Internal post and work completion handler
		 Mail 9/10:  IPoIB: Implement the new batching capability
		 Mail 10/10: E1000: Implement the new batching capability

Issues:
--------
The retransmission problem reported earlier seems to happen when mthca is
used as the underlying device, but when I tested ehca the retransmissions
dropped to normal levels (around 2 times the regular code). The performance
improvement is around 55% for TCP.

Please review and provide feedback; and consider for inclusion.

Thanks,

- KK

----------------------------------------------------
			TCP
			----
Size:32 Procs:1		2728	3544	29.91
Size:128 Procs:1	11803	13679	15.89
Size:512 Procs:1	43279	49665	14.75
Size:4096 Procs:1	147952	101246	-31.56
Size:16384 Procs:1	149852	141897	-5.30

Size:32 Procs:4		10562	11349	7.45
Size:128 Procs:4	41010	40832	-.43
Size:512 Procs:4	75374	130943	73.72
Size:4096 Procs:4	167996	368218	119.18
Size:16384 Procs:4	123176	379524	208.11

Size:32 Procs:8		21125	21990	4.09
Size:128 Procs:8	77419	78605	1.53
Size:512 Procs:8	234678	265047	12.94
Size:4096 Procs:8	218063	367604	68.57
Size:16384 Procs:8	184283	370972	101.30

Average:	1509300 -> 2345115 = 55.38%
----------------------------------------------------

^ permalink raw reply	[flat|nested] 103+ messages in thread
* [PATCH 1/3] [NET_BATCH] Introduce batching interface
@ 2007-10-08 18:24 jamal
  0 siblings, 0 replies; 103+ messages in thread
From: jamal @ 2007-10-08 18:24 UTC (permalink / raw)
  To: David Miller
  Cc: krkumar2, johnpol, herbert, kaber, shemminger, jagana,
	Robert.Olsson, rick.jones2, xma, gaagaan, netdev, rdreier,
	peter.p.waskiewicz.jr, mcarlson, jeff, mchan, general, kumarkr,
	tgraf, randy.dunlap, sri

[-- Attachment #1: Type: text/plain, Size: 76 bytes --]

This patch introduces the netdevice interface for batching.

cheers,
jamal


[-- Attachment #2: 01-introduce-batching-interface.patch --]
[-- Type: text/x-patch, Size: 6320 bytes --]

[NET_BATCH] Introduce batching interface

This patch introduces the netdevice interface for batching.

BACKGROUND
---------

A driver dev->hard_start_xmit() has 4 typical parts:
a) packet formating (example vlan, mss, descriptor counting etc)
b) chip specific formatting
c) enqueueing the packet on a DMA ring
d) IO operations to complete packet transmit, tell DMA engine to chew on,
tx completion interupts, set last tx time, etc

[For code cleanliness/readability sake, regardless of this work,
one should break the dev->hard_start_xmit() into those 4 functions
anyways].

INTRODUCING API
---------------

With the api introduced in this patch, a driver which has all
4 parts and needing to support batching is advised to split its
dev->hard_start_xmit() in the following manner:
1)Remove #d from dev->hard_start_xmit() and put it in
dev->hard_end_xmit() method.
2)#b and #c can stay in ->hard_start_xmit() (or whichever way you want
to do this)
3) #a is deffered to future work to reduce confusion (since it holds
on its own).

Note: There are drivers which may need not support any of the two
approaches (example the tun driver i patched) so the methods are
optional.

xmit_win variable is set by the driver to tell the core how much space
it has to take on new skbs. It is introduced to ensure that when we pass
the driver a list of packets it will swallow all of them - which is
useful because we dont requeue to the qdisc (and avoids burning
unnecessary cpu cycles or introducing any strange re-ordering). The driver
tells us when it invokes netif_wake_queue how much space it has for
descriptors by setting this variable.

Refer to the driver howto for more details.

THEORY OF OPERATION
-------------------

1. Core dequeues from qdiscs upto dev->xmit_win packets. Fragmented
and GSO packets are accounted for as well.
2. Core grabs TX_LOCK
3. Core loop for all skbs:
                    invokes driver dev->hard_start_xmit()
4. Core invokes driver dev->hard_end_xmit()

ACKNOWLEDGEMENT AND SOME HISTORY
--------------------------------

There's a lot of history and reasoning of "why batching" in a document
i am writting which i may submit as a patch.
Thomas Graf (who doesnt know this probably) gave me the impetus to
start looking at this back in 2004 when he invited me to the linux
conference he was organizing. Parts of what i presented in SUCON in
2004 talk about batching. Herbert Xu forced me to take a second look around
2.6.18 - refer to my netconf 2006 presentation. Krishna Kumar provided
me with more motivation in May 2007 when he posted on netdev and engaged
me.
Sridhar Samudrala, Krishna Kumar, Matt Carlson, Michael Chan,
Jeremy Ethridge, Evgeniy Polyakov, Sivakumar Subramani, David Miller,
and Patrick McHardy, Jeff Garzik and Bill Fink have contributed in one or
more of {bug fixes, enhancements, testing, lively discussion}. The
Broadcom and neterion folks have been outstanding in their help.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>

---
commit 8b9960cfbb98d48c0ac30b2c00d27c25217adb26
tree d8ab164757cee431f71a6011df4b09e0c382e2da
parent cef811600354e9f41f059570fc877d19a1daed99
author Jamal Hadi Salim <hadi@cyberus.ca> Mon, 08 Oct 2007 09:03:42 -0400
committer Jamal Hadi Salim <hadi@cyberus.ca> Mon, 08 Oct 2007 09:03:42 -0400

 include/linux/netdevice.h |   11 +++++++++++
 net/core/dev.c            |   40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 91cd3f3..b31df5c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -467,6 +467,7 @@ struct net_device
 #define NETIF_F_NETNS_LOCAL	8192	/* Does not change network namespaces */
 #define NETIF_F_MULTI_QUEUE	16384	/* Has multiple TX/RX queues */
 #define NETIF_F_LRO		32768	/* large receive offload */
+#define NETIF_F_BTX		65536	/* Capable of batch tx */
 
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
@@ -595,6 +596,9 @@ struct net_device
 	void			*priv;	/* pointer to private data	*/
 	int			(*hard_start_xmit) (struct sk_buff *skb,
 						    struct net_device *dev);
+	void			(*hard_end_xmit) (struct net_device *dev);
+	int			xmit_win;
+
 	/* These may be needed for future network-power-down code. */
 	unsigned long		trans_start;	/* Time (in jiffies) of last Tx	*/
 
@@ -609,6 +613,7 @@ struct net_device
 
 	/* delayed register/unregister */
 	struct list_head	todo_list;
+	struct sk_buff_head     blist;
 	/* device index hash chain */
 	struct hlist_node	index_hlist;
 
@@ -1044,6 +1049,12 @@ extern int		dev_set_mac_address(struct net_device *,
 					    struct sockaddr *);
 extern int		dev_hard_start_xmit(struct sk_buff *skb,
 					    struct net_device *dev);
+extern int		dev_batch_xmit(struct net_device *dev);
+extern int		prepare_gso_skb(struct sk_buff *skb,
+					struct net_device *dev,
+					struct sk_buff_head *skbs);
+extern int		xmit_prepare_skb(struct sk_buff *skb,
+					 struct net_device *dev);
 
 extern int		netdev_budget;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 1aa0704..164977f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1559,6 +1559,44 @@ out_kfree_skb:
 	return 0;
 }
 
+int dev_batch_xmit(struct net_device *dev)
+{
+	struct sk_buff_head *skbs = &dev->blist;
+	int rc = NETDEV_TX_OK;
+	int tqd = 0;
+	struct sk_buff *skb;
+	int orig_w = dev->xmit_win;
+	int orig_pkts = skb_queue_len(skbs);
+
+	while ((skb = __skb_dequeue(skbs)) != NULL) {
+		rc = dev_hard_start_xmit(skb, dev);
+		if (unlikely(rc))
+			break;
+		tqd++;
+	}
+
+	/* driver is likely buggy and lied to us on how much
+	 * space it had. Damn you driver ..
+	*/
+	if (unlikely(skb_queue_len(skbs))) {
+		printk(KERN_WARNING "Likely bug %s %s (%d) "
+				"left %d/%d window now %d, orig %d\n",
+			dev->name, rc?"busy":"locked",
+			netif_queue_stopped(dev),
+			skb_queue_len(skbs),
+			orig_pkts,
+			dev->xmit_win,
+			orig_w);
+			rc = NETDEV_TX_BUSY;
+	}
+
+	if (tqd)
+		if (dev->hard_end_xmit)
+			dev->hard_end_xmit(dev);
+
+	return rc;
+}
+
 /**
  *	dev_queue_xmit - transmit a buffer
  *	@skb: buffer to transmit
@@ -3581,6 +3619,8 @@ int register_netdevice(struct net_device *dev)
 		}
 	}
 
+	dev->xmit_win = 1;
+	skb_queue_head_init(&dev->blist);
 	ret = netdev_register_kobject(dev);
 	if (ret)
 		goto err_uninit;

^ permalink raw reply related	[flat|nested] 103+ messages in thread

end of thread, other threads:[~2007-11-14  8:28 UTC | newest]

Thread overview: 103+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-09-14  9:00 [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000 Krishna Kumar
2007-09-14  9:01 ` [PATCH 1/10 REV5] [Doc] HOWTO Documentation for batching Krishna Kumar
2007-09-14 18:37   ` [ofa-general] " Randy Dunlap
2007-09-17  4:10     ` Krishna Kumar2
2007-09-17  4:13       ` [ofa-general] " Jeff Garzik
2007-09-14  9:01 ` [PATCH 2/10 REV5] [core] Add skb_blist & support " Krishna Kumar
2007-09-14 12:46   ` [ofa-general] " Evgeniy Polyakov
2007-09-17  3:51     ` Krishna Kumar2
2007-09-14  9:01 ` [PATCH 3/10 REV5] [sched] Modify qdisc_run to support batching Krishna Kumar
2007-09-14 12:15   ` [ofa-general] " Evgeniy Polyakov
2007-09-17  3:49     ` Krishna Kumar2
2007-09-14  9:02 ` [PATCH 4/10 REV5] [ethtool] Add ethtool support Krishna Kumar
2007-09-14  9:02 ` [PATCH 5/10 REV5] [IPoIB] Header file changes Krishna Kumar
2007-09-14  9:03 ` [PATCH 6/10 REV5] [IPoIB] CM & Multicast changes Krishna Kumar
2007-09-14  9:03 ` [PATCH 7/10 REV5] [IPoIB] Verbs changes Krishna Kumar
2007-09-14  9:03 ` [PATCH 8/10 REV5] [IPoIB] Post and work completion handler changes Krishna Kumar
2007-09-14  9:04 ` [PATCH 9/10 REV5] [IPoIB] Implement batching Krishna Kumar
2007-09-14  9:04 ` [PATCH 10/10 REV5] [E1000] " Krishna Kumar
2007-09-14 12:47   ` [ofa-general] " Evgeniy Polyakov
2007-09-17  3:56     ` Krishna Kumar2
2007-11-13 21:28   ` [ofa-general] " Kok, Auke
2007-11-14  8:30     ` Krishna Kumar2
2007-09-14 12:49 ` [ofa-general] Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000 Evgeniy Polyakov
2007-09-16 23:17 ` David Miller
2007-09-17  0:29   ` jamal
2007-09-17  1:02     ` David Miller
2007-09-17  2:14       ` [ofa-general] " jamal
2007-09-17  2:25         ` David Miller
2007-09-17  3:01           ` jamal
2007-09-17  3:13             ` David Miller
2007-09-17 12:51               ` jamal
2007-09-17 16:37                 ` [ofa-general] " David Miller
2007-09-17  4:46           ` Krishna Kumar2
2007-09-23 17:53     ` [PATCHES] TX batching jamal
2007-09-23 17:56       ` [ofa-general] [PATCH 1/4] [NET_SCHED] explict hold dev tx lock jamal
2007-09-23 17:58         ` [ofa-general] [PATCH 2/4] [NET_BATCH] Introduce batching interface jamal
2007-09-23 18:00           ` [PATCH 3/4][NET_BATCH] net core use batching jamal
2007-09-23 18:02             ` [ofa-general] [PATCH 4/4][NET_SCHED] kill dev->gso_skb jamal
2007-09-30 18:53               ` [ofa-general] [PATCH 3/3][NET_SCHED] " jamal
2007-10-07 18:39               ` [ofa-general] [PATCH 3/3][NET_BATCH] " jamal
2007-09-30 18:52             ` [ofa-general] [PATCH 2/3][NET_BATCH] net core use batching jamal
2007-10-01  4:11               ` Bill Fink
2007-10-01 13:30                 ` jamal
2007-10-02  4:25                   ` [ofa-general] " Bill Fink
2007-10-02 13:20                     ` jamal
2007-10-03  5:29                       ` [ofa-general] " Bill Fink
2007-10-03 13:42                         ` jamal
2007-10-01 10:42               ` [ofa-general] " Patrick McHardy
2007-10-01 13:21                 ` jamal
2007-10-08  5:03                   ` Krishna Kumar2
2007-10-08 13:17                     ` jamal
2007-10-09  3:09                       ` [ofa-general] " Krishna Kumar2
2007-10-09 13:10                         ` jamal
2007-10-07 18:38             ` [ofa-general] " jamal
2007-09-30 18:51           ` [ofa-general] [PATCH 1/4] [NET_BATCH] Introduce batching interface jamal
2007-09-30 18:54             ` [ofa-general] Re: [PATCH 1/3] " jamal
2007-10-07 18:36           ` [ofa-general] " jamal
2007-10-08  9:59             ` Krishna Kumar2
2007-10-08 13:49               ` jamal
2007-09-24 19:12         ` [ofa-general] RE: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock Waskiewicz Jr, Peter P
2007-09-24 22:51           ` jamal
2007-09-24 22:57             ` Waskiewicz Jr, Peter P
2007-09-24 23:38               ` [ofa-general] " jamal
2007-09-24 23:47                 ` Waskiewicz Jr, Peter P
2007-09-25  0:14                   ` [ofa-general] " Stephen Hemminger
2007-09-25  0:31                     ` [ofa-general] " Waskiewicz Jr, Peter P
2007-09-25 13:15                     ` [ofa-general] " jamal
2007-09-25 15:24                       ` Stephen Hemminger
2007-09-25 22:14                         ` jamal
2007-09-25 22:43                           ` jamal
2007-09-25 13:08                   ` [ofa-general] " jamal
2007-10-08  4:51                 ` [ofa-general] " David Miller
2007-10-08 13:34                   ` jamal
2007-10-08 14:22                     ` parallel networking (was Re: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock) Jeff Garzik
2007-10-08 15:18                       ` [ofa-general] " jamal
2007-10-08 21:11                       ` [ofa-general] Re: parallel networking David Miller
2007-10-08 22:30                         ` jamal
2007-10-08 22:33                           ` David Miller
2007-10-08 22:35                             ` [ofa-general] " Waskiewicz Jr, Peter P
2007-10-08 23:42                             ` [ofa-general] " jamal
2007-10-09  1:53                         ` Jeff Garzik
2007-10-09 14:59                           ` Michael Krause
2007-10-08 21:05                     ` [PATCH 1/4] [NET_SCHED] explict hold dev tx lock David Miller
2007-09-23 18:19       ` [PATCHES] TX batching Jeff Garzik
2007-09-23 19:11         ` [ofa-general] " jamal
2007-09-23 19:36           ` Kok, Auke
2007-09-23 21:20             ` jamal
2007-09-24  7:00               ` Kok, Auke
2007-09-24 22:38                 ` jamal
2007-09-24 22:52                   ` [ofa-general] " Kok, Auke
2007-09-24 22:54           ` [DOC] Net batching driver howto jamal
2007-09-25 20:16             ` [ofa-general] " Randy Dunlap
2007-09-25 22:28               ` jamal
2007-09-25  0:15           ` [PATCHES] TX batching Jeff Garzik
2007-09-30 18:50       ` [ofa-general] " jamal
2007-09-30 19:19         ` [ofa-general] " jamal
2007-10-07 18:34       ` [ofa-general] " jamal
2007-10-08 12:51         ` [ofa-general] " Evgeniy Polyakov
2007-10-08 14:05           ` jamal
2007-10-09  8:14             ` Krishna Kumar2
2007-10-09 13:25               ` jamal
2007-09-17  4:08   ` [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000 Krishna Kumar2
  -- strict thread matches above, loose matches on Subject: below --
2007-10-08 18:24 [PATCH 1/3] [NET_BATCH] Introduce batching interface jamal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox