public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: jamal <hadi@cyberus.ca>
To: David Miller <davem@davemloft.net>
Cc: johnpol@2ka.mipt.ru, peter.p.waskiewicz.jr@intel.com,
	kumarkr@linux.ibm.com, herbert@gondor.apana.org.au,
	gaagaan@gmail.com, Robert.Olsson@data.slu.se,
	netdev@vger.kernel.org, rdreier@cisco.com, mcarlson@broadcom.com,
	randy.dunlap@oracle.com, jagana@us.ibm.com,
	general@lists.openfabrics.org, mchan@broadcom.com, tgraf@suug.ch,
	jeff@garzik.org, sri@us.ibm.com, shemminger@linux-foundation.org,
	kaber@trash.net
Subject: [ofa-general] [PATCH 1/3] [NET_BATCH] Introduce batching interface
Date: Sun, 07 Oct 2007 14:36:23 -0400	[thread overview]
Message-ID: <1191782183.4394.62.camel@localhost> (raw)
In-Reply-To: <1190570317.4256.59.camel@localhost>

[-- Attachment #1: Type: text/plain, Size: 77 bytes --]

This patch introduces the netdevice interface for batching.

cheers,
jamal



[-- Attachment #2: oct07-p1of3 --]
[-- Type: text/plain, Size: 7403 bytes --]

[NET_BATCH] Introduce batching interface

This patch introduces the netdevice interface for batching.

BACKGROUND
---------

A driver dev->hard_start_xmit() has 4 typical parts:
a) packet formating (example vlan, mss, descriptor counting etc)
b) chip specific formatting
c) enqueueing the packet on a DMA ring
d) IO operations to complete packet transmit, tell DMA engine to chew on,
tx completion interupts, set last tx time, etc

[For code cleanliness/readability sake, regardless of this work,
one should break the dev->hard_start_xmit() into those 4 functions
anyways].

INTRODUCING API
---------------

With the api introduced in this patch, a driver which has all
4 parts and needing to support batching is advised to split its
dev->hard_start_xmit() in the following manner:
1)Remove #d from dev->hard_start_xmit() and put it in
dev->hard_end_xmit() method.
2)#b and #c can stay in ->hard_start_xmit() (or whichever way you want
to do this)
3) #a is deffered to future work to reduce confusion (since it holds
on its own).

Note: There are drivers which may need not support any of the two
approaches (example the tun driver i patched) so the methods are
optional.

xmit_win variable is set by the driver to tell the core how much space
it has to take on new skbs. It is introduced to ensure that when we pass
the driver a list of packets it will swallow all of them - which is
useful because we dont requeue to the qdisc (and avoids burning
unnecessary cpu cycles or introducing any strange re-ordering). The driver
tells us when it invokes netif_wake_queue how much space it has for
descriptors by setting this variable.

Refer to the driver howto for more details.

THEORY OF OPERATION
-------------------

1. Core dequeues from qdiscs upto dev->xmit_win packets. Fragmented
and GSO packets are accounted for as well.
2. Core grabs TX_LOCK
3. Core loop for all skbs:
                    invokes driver dev->hard_start_xmit()
4. Core invokes driver dev->hard_end_xmit()

ACKNOWLEDGEMENT AND SOME HISTORY
--------------------------------

There's a lot of history and reasoning of "why batching" in a document
i am writting which i may submit as a patch.
Thomas Graf (who doesnt know this probably) gave me the impetus to
start looking at this back in 2004 when he invited me to the linux
conference he was organizing. Parts of what i presented in SUCON in
2004 talk about batching. Herbert Xu forced me to take a second look around
2.6.18 - refer to my netconf 2006 presentation. Krishna Kumar provided
me with more motivation in May 2007 when he posted on netdev and engaged
me.
Sridhar Samudrala, Krishna Kumar, Matt Carlson, Michael Chan,
Jeremy Ethridge, Evgeniy Polyakov, Sivakumar Subramani, David Miller,
and Patrick McHardy, Jeff Garzik and Bill Fink have contributed in one or 
more of {bug fixes, enhancements, testing, lively discussion}. The 
Broadcom and neterion folks have been outstanding in their help.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>

---
commit 0a0762e2c615a980af284e86d9729d233e1bf7f4
tree c27fec824a9e75ffbb791647bdb595c082a54990
parent 190674ff1fe0b7bddf038c2bfddf45b9c6418e2a
author Jamal Hadi Salim <hadi@cyberus.ca> Sun, 07 Oct 2007 08:51:10 -0400
committer Jamal Hadi Salim <hadi@cyberus.ca> Sun, 07 Oct 2007 08:51:10 -0400

 include/linux/netdevice.h |   11 ++++++
 net/core/dev.c            |   83 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 94 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 91cd3f3..b31df5c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -467,6 +467,7 @@ struct net_device
 #define NETIF_F_NETNS_LOCAL	8192	/* Does not change network namespaces */
 #define NETIF_F_MULTI_QUEUE	16384	/* Has multiple TX/RX queues */
 #define NETIF_F_LRO		32768	/* large receive offload */
+#define NETIF_F_BTX		65536	/* Capable of batch tx */
 
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
@@ -595,6 +596,9 @@ struct net_device
 	void			*priv;	/* pointer to private data	*/
 	int			(*hard_start_xmit) (struct sk_buff *skb,
 						    struct net_device *dev);
+	void			(*hard_end_xmit) (struct net_device *dev);
+	int			xmit_win;
+
 	/* These may be needed for future network-power-down code. */
 	unsigned long		trans_start;	/* Time (in jiffies) of last Tx	*/
 
@@ -609,6 +613,7 @@ struct net_device
 
 	/* delayed register/unregister */
 	struct list_head	todo_list;
+	struct sk_buff_head     blist;
 	/* device index hash chain */
 	struct hlist_node	index_hlist;
 
@@ -1044,6 +1049,12 @@ extern int		dev_set_mac_address(struct net_device *,
 					    struct sockaddr *);
 extern int		dev_hard_start_xmit(struct sk_buff *skb,
 					    struct net_device *dev);
+extern int		dev_batch_xmit(struct net_device *dev);
+extern int		prepare_gso_skb(struct sk_buff *skb,
+					struct net_device *dev,
+					struct sk_buff_head *skbs);
+extern int		xmit_prepare_skb(struct sk_buff *skb,
+					 struct net_device *dev);
 
 extern int		netdev_budget;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index d998646..04df3fb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1517,6 +1517,87 @@ static int dev_gso_segment(struct sk_buff *skb)
 	return 0;
 }
 
+int prepare_gso_skb(struct sk_buff *skb, struct net_device *dev,
+		    struct sk_buff_head *skbs)
+{
+	int tdq = 0;
+	do {
+		struct sk_buff *nskb = skb->next;
+
+		skb->next = nskb->next;
+		nskb->next = NULL;
+
+		/* Driver likes this packet .. */
+		tdq++;
+		__skb_queue_tail(skbs, nskb);
+	} while (skb->next);
+	skb->destructor = DEV_GSO_CB(skb)->destructor;
+	kfree_skb(skb);
+
+	return tdq;
+}
+
+int xmit_prepare_skb(struct sk_buff *skb, struct net_device *dev)
+{
+	struct sk_buff_head *skbs = &dev->blist;
+
+	if (netif_needs_gso(dev, skb)) {
+		if (unlikely(dev_gso_segment(skb))) {
+			kfree_skb(skb);
+			return 0;
+		}
+		if (skb->next)
+			return prepare_gso_skb(skb, dev, skbs);
+	}
+
+	__skb_queue_tail(skbs, skb);
+	return 1;
+}
+
+int dev_batch_xmit(struct net_device *dev)
+{
+	struct sk_buff_head *skbs = &dev->blist;
+	int rc = NETDEV_TX_OK;
+	struct sk_buff *skb;
+	int orig_w = dev->xmit_win;
+	int orig_pkts = skb_queue_len(skbs);
+
+	while ((skb = __skb_dequeue(skbs)) != NULL) {
+		if (!list_empty(&ptype_all))
+			dev_queue_xmit_nit(skb, dev);
+		rc = dev->hard_start_xmit(skb, dev);
+		if (unlikely(rc))
+			break;
+		/* * XXX: multiqueue may need closer srutiny.. */
+		if (unlikely(netif_queue_stopped(dev) ||
+		     netif_subqueue_stopped(dev, skb->queue_mapping))) {
+			rc = NETDEV_TX_BUSY;
+			break;
+		}
+	}
+
+	/* driver is likely buggy and lied to us on how much
+	 * space it had. Damn you driver ..
+	*/
+	if (unlikely(skb_queue_len(skbs))) {
+		printk(KERN_WARNING "Likely bug %s %s (%d) "
+				"left %d/%d window now %d, orig %d\n",
+			dev->name, rc?"busy":"locked",
+			netif_queue_stopped(dev),
+			skb_queue_len(skbs),
+			orig_pkts,
+			dev->xmit_win,
+			orig_w);
+			rc = NETDEV_TX_BUSY;
+	}
+
+	if (orig_pkts > skb_queue_len(skbs))
+		if (dev->hard_end_xmit)
+			dev->hard_end_xmit(dev);
+
+	return rc;
+}
+
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	if (likely(!skb->next)) {
@@ -3553,6 +3634,8 @@ int register_netdevice(struct net_device *dev)
 		}
 	}
 
+	dev->xmit_win = 1;
+	skb_queue_head_init(&dev->blist);
 	ret = netdev_register_kobject(dev);
 	if (ret)
 		goto err_uninit;

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



  parent reply	other threads:[~2007-10-07 18:36 UTC|newest]

Thread overview: 102+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-09-14  9:00 [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000 Krishna Kumar
2007-09-14  9:01 ` [PATCH 1/10 REV5] [Doc] HOWTO Documentation for batching Krishna Kumar
2007-09-14 18:37   ` [ofa-general] " Randy Dunlap
2007-09-17  4:10     ` Krishna Kumar2
2007-09-17  4:13       ` [ofa-general] " Jeff Garzik
2007-09-14  9:01 ` [PATCH 2/10 REV5] [core] Add skb_blist & support " Krishna Kumar
2007-09-14 12:46   ` [ofa-general] " Evgeniy Polyakov
2007-09-17  3:51     ` Krishna Kumar2
2007-09-14  9:01 ` [PATCH 3/10 REV5] [sched] Modify qdisc_run to support batching Krishna Kumar
2007-09-14 12:15   ` [ofa-general] " Evgeniy Polyakov
2007-09-17  3:49     ` Krishna Kumar2
2007-09-14  9:02 ` [PATCH 4/10 REV5] [ethtool] Add ethtool support Krishna Kumar
2007-09-14  9:02 ` [PATCH 5/10 REV5] [IPoIB] Header file changes Krishna Kumar
2007-09-14  9:03 ` [PATCH 6/10 REV5] [IPoIB] CM & Multicast changes Krishna Kumar
2007-09-14  9:03 ` [PATCH 7/10 REV5] [IPoIB] Verbs changes Krishna Kumar
2007-09-14  9:03 ` [PATCH 8/10 REV5] [IPoIB] Post and work completion handler changes Krishna Kumar
2007-09-14  9:04 ` [PATCH 9/10 REV5] [IPoIB] Implement batching Krishna Kumar
2007-09-14  9:04 ` [PATCH 10/10 REV5] [E1000] " Krishna Kumar
2007-09-14 12:47   ` [ofa-general] " Evgeniy Polyakov
2007-09-17  3:56     ` Krishna Kumar2
2007-11-13 21:28   ` [ofa-general] " Kok, Auke
2007-11-14  8:30     ` Krishna Kumar2
2007-09-14 12:49 ` [ofa-general] Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000 Evgeniy Polyakov
2007-09-16 23:17 ` David Miller
2007-09-17  0:29   ` jamal
2007-09-17  1:02     ` David Miller
2007-09-17  2:14       ` [ofa-general] " jamal
2007-09-17  2:25         ` David Miller
2007-09-17  3:01           ` jamal
2007-09-17  3:13             ` David Miller
2007-09-17 12:51               ` jamal
2007-09-17 16:37                 ` [ofa-general] " David Miller
2007-09-17  4:46           ` Krishna Kumar2
2007-09-23 17:53     ` [PATCHES] TX batching jamal
2007-09-23 17:56       ` [ofa-general] [PATCH 1/4] [NET_SCHED] explict hold dev tx lock jamal
2007-09-23 17:58         ` [ofa-general] [PATCH 2/4] [NET_BATCH] Introduce batching interface jamal
2007-09-23 18:00           ` [PATCH 3/4][NET_BATCH] net core use batching jamal
2007-09-23 18:02             ` [ofa-general] [PATCH 4/4][NET_SCHED] kill dev->gso_skb jamal
2007-09-30 18:53               ` [ofa-general] [PATCH 3/3][NET_SCHED] " jamal
2007-10-07 18:39               ` [ofa-general] [PATCH 3/3][NET_BATCH] " jamal
2007-09-30 18:52             ` [ofa-general] [PATCH 2/3][NET_BATCH] net core use batching jamal
2007-10-01  4:11               ` Bill Fink
2007-10-01 13:30                 ` jamal
2007-10-02  4:25                   ` [ofa-general] " Bill Fink
2007-10-02 13:20                     ` jamal
2007-10-03  5:29                       ` [ofa-general] " Bill Fink
2007-10-03 13:42                         ` jamal
2007-10-01 10:42               ` [ofa-general] " Patrick McHardy
2007-10-01 13:21                 ` jamal
2007-10-08  5:03                   ` Krishna Kumar2
2007-10-08 13:17                     ` jamal
2007-10-09  3:09                       ` [ofa-general] " Krishna Kumar2
2007-10-09 13:10                         ` jamal
2007-10-07 18:38             ` [ofa-general] " jamal
2007-09-30 18:51           ` [ofa-general] [PATCH 1/4] [NET_BATCH] Introduce batching interface jamal
2007-09-30 18:54             ` [ofa-general] Re: [PATCH 1/3] " jamal
2007-10-07 18:36           ` jamal [this message]
2007-10-08  9:59             ` Krishna Kumar2
2007-10-08 13:49               ` jamal
2007-09-24 19:12         ` [ofa-general] RE: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock Waskiewicz Jr, Peter P
2007-09-24 22:51           ` jamal
2007-09-24 22:57             ` Waskiewicz Jr, Peter P
2007-09-24 23:38               ` [ofa-general] " jamal
2007-09-24 23:47                 ` Waskiewicz Jr, Peter P
2007-09-25  0:14                   ` [ofa-general] " Stephen Hemminger
2007-09-25  0:31                     ` [ofa-general] " Waskiewicz Jr, Peter P
2007-09-25 13:15                     ` [ofa-general] " jamal
2007-09-25 15:24                       ` Stephen Hemminger
2007-09-25 22:14                         ` jamal
2007-09-25 22:43                           ` jamal
2007-09-25 13:08                   ` [ofa-general] " jamal
2007-10-08  4:51                 ` [ofa-general] " David Miller
2007-10-08 13:34                   ` jamal
2007-10-08 14:22                     ` parallel networking (was Re: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock) Jeff Garzik
2007-10-08 15:18                       ` [ofa-general] " jamal
2007-10-08 21:11                       ` [ofa-general] Re: parallel networking David Miller
2007-10-08 22:30                         ` jamal
2007-10-08 22:33                           ` David Miller
2007-10-08 22:35                             ` [ofa-general] " Waskiewicz Jr, Peter P
2007-10-08 23:42                             ` [ofa-general] " jamal
2007-10-09  1:53                         ` Jeff Garzik
2007-10-09 14:59                           ` Michael Krause
2007-10-08 21:05                     ` [PATCH 1/4] [NET_SCHED] explict hold dev tx lock David Miller
2007-09-23 18:19       ` [PATCHES] TX batching Jeff Garzik
2007-09-23 19:11         ` [ofa-general] " jamal
2007-09-23 19:36           ` Kok, Auke
2007-09-23 21:20             ` jamal
2007-09-24  7:00               ` Kok, Auke
2007-09-24 22:38                 ` jamal
2007-09-24 22:52                   ` [ofa-general] " Kok, Auke
2007-09-24 22:54           ` [DOC] Net batching driver howto jamal
2007-09-25 20:16             ` [ofa-general] " Randy Dunlap
2007-09-25 22:28               ` jamal
2007-09-25  0:15           ` [PATCHES] TX batching Jeff Garzik
2007-09-30 18:50       ` [ofa-general] " jamal
2007-09-30 19:19         ` [ofa-general] " jamal
2007-10-07 18:34       ` [ofa-general] " jamal
2007-10-08 12:51         ` [ofa-general] " Evgeniy Polyakov
2007-10-08 14:05           ` jamal
2007-10-09  8:14             ` Krishna Kumar2
2007-10-09 13:25               ` jamal
2007-09-17  4:08   ` [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000 Krishna Kumar2

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1191782183.4394.62.camel@localhost \
    --to=hadi@cyberus.ca \
    --cc=Robert.Olsson@data.slu.se \
    --cc=davem@davemloft.net \
    --cc=gaagaan@gmail.com \
    --cc=general@lists.openfabrics.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=jagana@us.ibm.com \
    --cc=jeff@garzik.org \
    --cc=johnpol@2ka.mipt.ru \
    --cc=kaber@trash.net \
    --cc=kumarkr@linux.ibm.com \
    --cc=mcarlson@broadcom.com \
    --cc=mchan@broadcom.com \
    --cc=netdev@vger.kernel.org \
    --cc=peter.p.waskiewicz.jr@intel.com \
    --cc=randy.dunlap@oracle.com \
    --cc=rdreier@cisco.com \
    --cc=shemminger@linux-foundation.org \
    --cc=sri@us.ibm.com \
    --cc=tgraf@suug.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox