All of lore.kernel.org
 help / color / mirror / Atom feed
From: jamal <hadi@cyberus.ca>
To: David Miller <davem@davemloft.net>
Cc: xma@us.ibm.com, rdreier@cisco.com, ak@suse.de,
	krkumar2@in.ibm.com, netdev@vger.kernel.org,
	netdev-owner@vger.kernel.org, ashwin.chaugule@celunite.com,
	Evgeniy Polyakov <johnpol@2ka.mipt.ru>,
	Gagan Arneja <gagan@vmware.com>
Subject: [WIP] [PATCH] WAS Re: [RFC] New driver API to speed up small packets xmits
Date: Tue, 15 May 2007 18:17:47 -0400	[thread overview]
Message-ID: <1179267467.4080.33.camel@localhost> (raw)
In-Reply-To: <20070515.143207.80029051.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 2196 bytes --]

On Tue, 2007-15-05 at 14:32 -0700, David Miller wrote:
>  An efficient qdisc-->driver
> transfer during netif_wake_queue() could help solve some of that,
> as is being discussed here.

Ok, heres the approach i discussed at netconf.
It needs net-2.6 and the patch i posted earlier to clean up
qdisc_restart() [1].
I havent ported over all the bits from 2.6.18,  but this works.
Krishna and i have colluded privately on working together. I just need
to reproduce the patches, so here is the core.
A lot of the code in the core could be aggragated later - right now i am
worried about correctness.
I will post a patch for tun device in a few minutes
that i use to test on my laptop (i need to remove some debugs) to show
an example.
I also plan to post a patch for e1000 - but that will take more
than a few minutes.
the e1000 driver has changed quiet a bit since 2.6.18, so it is
consuming.

What does a driver need to do to get batched-to?

1) On initialization (probe probably)
 a) set NETIF_F_BTX in its dev->features at startup
 i.e dev->features |= NETIF_F_BTX
 b) initialize the batch queue i.e something like
skb_queue_head_init(&dev->blist);
c) set dev->xmit_win to something reasonable like
maybe half the DMA ring size or tx_queuelen

2) create a new method for batch txmit.
This loops on dev->blist and stashes onto hardware.
All return codes like NETDEV_TX_OK etc still apply.

3) set the dev->xmit_win which provides hints on how much
data to send from the core to the driver. Some suggestions:
a)on doing a netif_stop, set it to 1
b)on netif_wake_queue set it to the max available space

Of course, to work, all this requires that the driver to have a
threshold for waking up tx path; like  drivers such as e1000 or tg3 do
in order to invoke netif_wake_queue (example look at TX_WAKE_THRESHOLD
usage in e1000).


feedback welcome (preferably in the form of patches).

Anyone with a really nice tool to measure CPU improvement will help
a great deal in quantifying things. As i have said earlier, I never saw
any throughput improvement. But like T/GSO it may be just CPU savings
(as was suggested at netconf).

cheers,
jamal
[1] http://marc.info/?l=linux-netdev&m=117914954911959&w=2


[-- Attachment #2: batch0 --]
[-- Type: text/x-patch, Size: 4512 bytes --]

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f671cd2..7205748 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -325,6 +325,7 @@ struct net_device
 #define NETIF_F_VLAN_CHALLENGED	1024	/* Device cannot handle VLAN packets */
 #define NETIF_F_GSO		2048	/* Enable software GSO. */
 #define NETIF_F_LLTX		4096	/* LockLess TX */
+#define NETIF_F_BTX		8192	/* Capable of batch tx */
 
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
@@ -450,6 +451,11 @@ struct net_device
 	void			*priv;	/* pointer to private data	*/
 	int			(*hard_start_xmit) (struct sk_buff *skb,
 						    struct net_device *dev);
+	int			(*hard_batch_xmit) (struct sk_buff_head *list,
+						    struct net_device *dev);
+	int			(*hard_prep_xmit) (struct sk_buff *skb,
+						    struct net_device *dev);
+	int			xmit_win;
 	/* These may be needed for future network-power-down code. */
 	unsigned long		trans_start;	/* Time (in jiffies) of last Tx	*/
 
@@ -466,6 +472,10 @@ struct net_device
 	struct list_head	todo_list;
 	/* device index hash chain */
 	struct hlist_node	index_hlist;
+	/*XXX: Fix eventually to not allocate if device not
+	 *batch capable
+	*/
+	struct sk_buff_head	blist;
 
 	struct net_device	*link_watch_next;
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index ed80054..61fa301 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -85,10 +85,12 @@ static inline int
 do_dev_requeue(struct sk_buff *skb, struct net_device *dev, struct Qdisc *q)
 {
 
-	if (unlikely(skb->next))
-		dev->gso_skb = skb;
-	else
-		q->ops->requeue(skb, q);
+	if (skb) {
+		if (unlikely(skb->next))
+			dev->gso_skb = skb;
+		else
+			q->ops->requeue(skb, q);
+	}
 	/* XXX: Could netif_schedule fail? Or is that fact we are
 	 * requeueing imply the hardware path is closed
 	 * and even if we fail, some interupt will wake us
@@ -116,7 +118,10 @@ tx_islocked(struct sk_buff *skb, struct net_device *dev, struct Qdisc *q)
 	int ret = handle_dev_cpu_collision(dev);
 
 	if (ret == SCHED_TX_DROP) {
-		kfree_skb(skb);
+		if (skb) /* we are not batching */
+			kfree_skb(skb);
+		else if (!skb_queue_empty(&dev->blist))
+			skb_queue_purge(&dev->blist);
 		return qdisc_qlen(q);
 	}
 
@@ -195,10 +200,99 @@ static inline int qdisc_restart(struct net_device *dev)
 	return do_dev_requeue(skb, dev, q);
 }
 
+static int try_get_tx_pkts(struct net_device *dev, struct Qdisc *q, int count)
+{
+	struct sk_buff *skb;
+	struct sk_buff_head *skbs = &dev->blist;
+	int tdq = count;
+
+	/* 
+	 * very unlikely, but who knows ..
+	 * If this happens we dont try to grab more pkts
+	 */
+	if (!skb_queue_empty(&dev->blist))
+		return skb_queue_len(&dev->blist);
+
+	if (dev->gso_skb) {
+		count--;
+		__skb_queue_head(skbs, dev->gso_skb);
+		dev->gso_skb = NULL;
+	}
+
+	while (count) {
+		skb = q->dequeue(q);
+		if (!skb)
+			break;
+		count--;
+		__skb_queue_head(skbs, skb);
+	}
+
+	return tdq - count;
+}
+
+static inline int try_tx_pkts(struct net_device *dev)
+{
+
+	return dev->hard_batch_xmit(&dev->blist, dev);
+
+}
+
+/* same comments as in qdisc_restart apply;
+ * at some point use shared code with qdisc_restart*/
+int batch_qdisc_restart(struct net_device *dev)
+{
+	struct Qdisc *q = dev->qdisc;
+	unsigned lockless = (dev->features & NETIF_F_LLTX);
+	int count = dev->xmit_win;
+	int ret = 0;
+
+	ret = try_get_tx_pkts(dev, q, count);
+
+	if (ret == 0)
+		return qdisc_qlen(q);
+
+	/* we have packets to send! */
+	if (!lockless) {
+		if (!netif_tx_trylock(dev))
+			return tx_islocked(NULL, dev, q);
+	}
+
+	/* all clear .. */
+	spin_unlock(&dev->queue_lock);
+
+	ret = NETDEV_TX_BUSY;
+	if (!netif_queue_stopped(dev))
+		ret = try_tx_pkts(dev);
+
+	if (!lockless)
+		netif_tx_unlock(dev);
+
+	spin_lock(&dev->queue_lock);
+
+	q = dev->qdisc;
+
+	/* most likely result, packet went ok */
+	if (ret == NETDEV_TX_OK)
+		return qdisc_qlen(q);
+	/* only for lockless drivers .. */
+	if (ret == NETDEV_TX_LOCKED && lockless)
+		return tx_islocked(NULL, dev, q);
+
+	if (unlikely(ret != NETDEV_TX_BUSY && net_ratelimit()))
+		printk(KERN_WARNING " BUG %s code %d qlen %d\n",
+		       dev->name, ret, q->q.qlen);
+
+	return do_dev_requeue(NULL, dev, q);
+}
+
 void __qdisc_run(struct net_device *dev)
 {
+	unsigned batching = (dev->features & NETIF_F_BTX);
+
 	do {
-		if (!qdisc_restart(dev))
+		if (!batching && !qdisc_restart(dev))
+			break;
+		else if (!batch_qdisc_restart(dev))
 			break;
 	} while (!netif_queue_stopped(dev));
 

  reply	other threads:[~2007-05-15 22:17 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <OF0CAD6D87.DBE62968-ON872572DC.0073646A-882572DC.0073BEC2@us.ibm.com>
2007-05-15 21:17 ` [RFC] New driver API to speed up small packets xmits Roland Dreier
     [not found]   ` <OFF5654BB8.74EC8DCB-ON872572DC.00752079-882572DC.00756B23@us.ibm.com>
2007-05-15 21:25     ` Roland Dreier
     [not found]       ` <OF21D475A2.5E5C88DE-ON872572DC.00763DE4-882572DC.00768A7E@us.ibm.com>
2007-05-15 21:38         ` David Miller
2007-05-15 21:32     ` David Miller
2007-05-15 22:17       ` jamal [this message]
2007-05-15 22:48         ` [WIP] [PATCH] WAS " jamal
2007-05-16  0:50           ` jamal
2007-05-16 22:12         ` Sridhar Samudrala
2007-05-16 22:52           ` jamal
2007-05-17  3:25             ` jamal
2007-05-18 12:07               ` jamal
2007-05-17  4:03           ` Krishna Kumar2
2007-05-16 21:44             ` Sridhar Samudrala
2007-05-17  5:01               ` Krishna Kumar2
     [not found]       ` <OF6757F56D.EE5984FD-ON872572DC.0081026C-882572DC.00814B8F@us.ibm.com>
2007-05-15 23:36         ` David Miller
2007-05-21  7:56       ` Herbert Xu
     [not found]         ` <OF9ABCD08D.2CD1B193-ON872572E3.007A6FC1-882572E3.007ACE1A@us.ibm.com>
2007-05-22 22:36           ` David Miller
     [not found]             ` <OFCF3EB7F8.9740C0C7-ON872572E3.007DADF6-882572E3.007E0E7B@us.ibm.com>
2007-05-22 23:04               ` David Miller
2007-05-22 23:12             ` Herbert Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1179267467.4080.33.camel@localhost \
    --to=hadi@cyberus.ca \
    --cc=ak@suse.de \
    --cc=ashwin.chaugule@celunite.com \
    --cc=davem@davemloft.net \
    --cc=gagan@vmware.com \
    --cc=johnpol@2ka.mipt.ru \
    --cc=krkumar2@in.ibm.com \
    --cc=netdev-owner@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=rdreier@cisco.com \
    --cc=xma@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.