From: jamal <hadi@cyberus.ca>
To: David Miller <davem@davemloft.net>
Cc: xma@us.ibm.com, rdreier@cisco.com, ak@suse.de,
krkumar2@in.ibm.com, netdev@vger.kernel.org,
netdev-owner@vger.kernel.org, ashwin.chaugule@celunite.com,
Evgeniy Polyakov <johnpol@2ka.mipt.ru>,
Gagan Arneja <gagan@vmware.com>
Subject: [WIP] [PATCH] WAS Re: [RFC] New driver API to speed up small packets xmits
Date: Tue, 15 May 2007 18:17:47 -0400
Message-ID: <1179267467.4080.33.camel@localhost>
In-Reply-To: <20070515.143207.80029051.davem@davemloft.net>
[-- Attachment #1: Type: text/plain, Size: 2196 bytes --]
On Tue, 2007-15-05 at 14:32 -0700, David Miller wrote:
> An efficient qdisc-->driver
> transfer during netif_wake_queue() could help solve some of that,
> as is being discussed here.
OK, here's the approach I discussed at netconf.
It needs net-2.6 plus the patch I posted earlier to clean up
qdisc_restart() [1].
I haven't ported over all the bits from 2.6.18, but this works.
Krishna and I have privately agreed to work on this together. I still
need to regenerate the patches; for now, here is the core.
A lot of the code in the core could be aggregated later; right now I am
focused on correctness.
In a few minutes I will post, as an example, a patch for the tun device
that I use for testing on my laptop (I need to remove some debug code
first).
I also plan to post a patch for e1000, but that will take more than a
few minutes: the e1000 driver has changed quite a bit since 2.6.18, so
porting it is time-consuming.
What does a driver need to do to receive batched transmits?
1) On initialization (probably in probe):
a) set NETIF_F_BTX in dev->features,
i.e. dev->features |= NETIF_F_BTX
b) initialize the batch queue, i.e. something like
skb_queue_head_init(&dev->blist);
c) set dev->xmit_win to something reasonable, such as
half the DMA ring size or dev->tx_queue_len
2) Implement the new batch transmit method (dev->hard_batch_xmit).
It loops over dev->blist and stashes the packets onto the hardware.
All return codes such as NETDEV_TX_OK still apply.
3) Keep dev->xmit_win updated; it hints to the core how much data
to send to the driver. Some suggestions:
a) on netif_stop_queue, set it to 1
b) on netif_wake_queue, set it to the maximum available space
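The three steps can be sketched as a small user-space model. This is not kernel code: model_pkt, model_list, model_dev, model_probe, model_batch_xmit and the MODEL_*/RING_SIZE constants are hypothetical stand-ins for sk_buff, sk_buff_head, net_device, NETIF_F_BTX and the NETDEV_TX_* codes, and the "ring" is just an array. It shows blist being drained onto the ring, leftovers staying on blist (reported as BUSY) when the ring fills, and xmit_win being set to 1 on stop or, as a choice of this sketch, to the remaining free space otherwise.

```c
/*
 * User-space model of the driver side of the batching API.
 * Hypothetical stand-ins, not kernel code: model_pkt ~ sk_buff,
 * model_list ~ sk_buff_head, model_dev ~ net_device.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_F_BTX	8192	/* same bit the patch gives NETIF_F_BTX */
#define MODEL_TX_OK	0	/* stands in for NETDEV_TX_OK */
#define MODEL_TX_BUSY	1	/* stands in for NETDEV_TX_BUSY */
#define RING_SIZE	8	/* hypothetical DMA ring size */

struct model_pkt {
	struct model_pkt *next;
	int id;
};

/* FIFO playing the role of dev->blist */
struct model_list {
	struct model_pkt *head, *tail;
	int qlen;
};

struct model_dev {
	unsigned long features;
	struct model_list blist;	/* filled by the core */
	int ring[RING_SIZE];		/* "hardware": ids of queued packets */
	int ring_used;			/* descriptors in use */
	int xmit_win;			/* hint to the core */
	bool stopped;			/* models netif_queue_stopped() */
};

static void list_add_tail(struct model_list *l, struct model_pkt *p)
{
	p->next = NULL;
	if (l->tail)
		l->tail->next = p;
	else
		l->head = p;
	l->tail = p;
	l->qlen++;
}

static struct model_pkt *list_dequeue(struct model_list *l)
{
	struct model_pkt *p = l->head;

	if (p) {
		l->head = p->next;
		if (!l->head)
			l->tail = NULL;
		l->qlen--;
	}
	return p;
}

/* Step 1: probe-time setup */
static void model_probe(struct model_dev *dev)
{
	dev->features |= MODEL_F_BTX;			/* 1a */
	dev->blist.head = dev->blist.tail = NULL;	/* 1b */
	dev->blist.qlen = 0;
	dev->ring_used = 0;
	dev->stopped = false;
	dev->xmit_win = RING_SIZE / 2;			/* 1c: half the ring */
}

/* Steps 2 and 3: drain blist onto the ring, then refresh xmit_win */
static int model_batch_xmit(struct model_dev *dev)
{
	struct model_pkt *p;
	int ret = MODEL_TX_OK;

	while (dev->blist.qlen > 0) {
		if (dev->ring_used == RING_SIZE) {
			/* no room: leave the rest on blist for a retry */
			ret = MODEL_TX_BUSY;
			break;
		}
		p = list_dequeue(&dev->blist);
		dev->ring[dev->ring_used++] = p->id;
	}

	if (dev->ring_used == RING_SIZE) {
		dev->stopped = true;	/* models netif_stop_queue() */
		dev->xmit_win = 1;	/* 3a */
	} else {
		dev->xmit_win = RING_SIZE - dev->ring_used;
	}
	return ret;
}
```

A real driver would also program DMA descriptors and hold its TX lock; the model only captures the blist and xmit_win bookkeeping.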
Of course, for all this to work, the driver must have a threshold for
waking up the TX path before it invokes netif_wake_queue, as drivers
such as e1000 and tg3 do (for an example, look at how e1000 uses
TX_WAKE_THRESHOLD).
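The wake threshold can be illustrated with another hypothetical user-space model. The names model_ring, model_free_descs, model_tx_clean and MODEL_WAKE_THRESHOLD are invented for this sketch, and the threshold value 4 is arbitrary, not e1000's actual TX_WAKE_THRESHOLD. The completion path reclaims descriptors, wakes the queue only once enough room has built up, and then publishes that free space as xmit_win (suggestion 3b).

```c
/*
 * Hypothetical model of a TX-completion path with a wake threshold.
 * Not kernel code; all names and values are illustrative.
 */
#include <assert.h>
#include <stdbool.h>

#define MODEL_RING_SIZE		16
#define MODEL_WAKE_THRESHOLD	4	/* arbitrary for this sketch */

struct model_ring {
	int used;	/* descriptors given to hardware, not yet reclaimed */
	bool stopped;	/* models netif_queue_stopped() */
	int xmit_win;	/* batching hint published to the core */
};

static int model_free_descs(const struct model_ring *r)
{
	return MODEL_RING_SIZE - r->used;
}

/* Reclaim 'done' completed descriptors; wake only past the threshold */
static void model_tx_clean(struct model_ring *r, int done)
{
	assert(done >= 0 && done <= r->used);
	r->used -= done;

	if (r->stopped && model_free_descs(r) >= MODEL_WAKE_THRESHOLD) {
		r->stopped = false;			/* models netif_wake_queue() */
		r->xmit_win = model_free_descs(r);	/* max available space */
	}
}
```

Waking on a threshold rather than on every reclaimed descriptor is what lets the core hand over a worthwhile batch; waking at a single free slot would turn every batch into one packet.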
Feedback is welcome (preferably in the form of patches).
Anyone with a really nice tool for measuring CPU utilization would help
a great deal in quantifying things. As I have said earlier, I never saw
any throughput improvement; but, as with TSO/GSO, the gain may be purely
CPU savings (as was suggested at netconf).
cheers,
jamal
[1] http://marc.info/?l=linux-netdev&m=117914954911959&w=2
[-- Attachment #2: batch0 --]
[-- Type: text/x-patch, Size: 4512 bytes --]
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f671cd2..7205748 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -325,6 +325,7 @@ struct net_device
#define NETIF_F_VLAN_CHALLENGED 1024 /* Device cannot handle VLAN packets */
#define NETIF_F_GSO 2048 /* Enable software GSO. */
#define NETIF_F_LLTX 4096 /* LockLess TX */
+#define NETIF_F_BTX 8192 /* Capable of batch tx */
/* Segmentation offload features */
#define NETIF_F_GSO_SHIFT 16
@@ -450,6 +451,11 @@ struct net_device
void *priv; /* pointer to private data */
int (*hard_start_xmit) (struct sk_buff *skb,
struct net_device *dev);
+ int (*hard_batch_xmit) (struct sk_buff_head *list,
+ struct net_device *dev);
+ int (*hard_prep_xmit) (struct sk_buff *skb,
+ struct net_device *dev);
+ int xmit_win;
/* These may be needed for future network-power-down code. */
unsigned long trans_start; /* Time (in jiffies) of last Tx */
@@ -466,6 +472,10 @@ struct net_device
struct list_head todo_list;
/* device index hash chain */
struct hlist_node index_hlist;
+ /*XXX: Fix eventually to not allocate if device not
+ *batch capable
+ */
+ struct sk_buff_head blist;
struct net_device *link_watch_next;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index ed80054..61fa301 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -85,10 +85,12 @@ static inline int
do_dev_requeue(struct sk_buff *skb, struct net_device *dev, struct Qdisc *q)
{
- if (unlikely(skb->next))
- dev->gso_skb = skb;
- else
- q->ops->requeue(skb, q);
+ if (skb) {
+ if (unlikely(skb->next))
+ dev->gso_skb = skb;
+ else
+ q->ops->requeue(skb, q);
+ }
/* XXX: Could netif_schedule fail? Or is that fact we are
* requeueing imply the hardware path is closed
* and even if we fail, some interupt will wake us
@@ -116,7 +118,10 @@ tx_islocked(struct sk_buff *skb, struct net_device *dev, struct Qdisc *q)
int ret = handle_dev_cpu_collision(dev);
if (ret == SCHED_TX_DROP) {
- kfree_skb(skb);
+ if (skb) /* we are not batching */
+ kfree_skb(skb);
+ else if (!skb_queue_empty(&dev->blist))
+ skb_queue_purge(&dev->blist);
return qdisc_qlen(q);
}
@@ -195,10 +200,99 @@ static inline int qdisc_restart(struct net_device *dev)
return do_dev_requeue(skb, dev, q);
}
+static int try_get_tx_pkts(struct net_device *dev, struct Qdisc *q, int count)
+{
+ struct sk_buff *skb;
+ struct sk_buff_head *skbs = &dev->blist;
+ int tdq = count;
+
+ /*
+ * very unlikely, but who knows ..
+ * If this happens we don't try to grab more pkts
+ */
+ if (!skb_queue_empty(&dev->blist))
+ return skb_queue_len(&dev->blist);
+
+ if (dev->gso_skb) {
+ count--;
+ __skb_queue_head(skbs, dev->gso_skb);
+ dev->gso_skb = NULL;
+ }
+
+ while (count) {
+ skb = q->dequeue(q);
+ if (!skb)
+ break;
+ count--;
+ __skb_queue_tail(skbs, skb);
+ }
+
+ return tdq - count;
+}
+
+static inline int try_tx_pkts(struct net_device *dev)
+{
+
+ return dev->hard_batch_xmit(&dev->blist, dev);
+
+}
+
+/* same comments as in qdisc_restart apply;
+ * at some point use shared code with qdisc_restart*/
+int batch_qdisc_restart(struct net_device *dev)
+{
+ struct Qdisc *q = dev->qdisc;
+ unsigned lockless = (dev->features & NETIF_F_LLTX);
+ int count = dev->xmit_win;
+ int ret = 0;
+
+ ret = try_get_tx_pkts(dev, q, count);
+
+ if (ret == 0)
+ return qdisc_qlen(q);
+
+ /* we have packets to send! */
+ if (!lockless) {
+ if (!netif_tx_trylock(dev))
+ return tx_islocked(NULL, dev, q);
+ }
+
+ /* all clear .. */
+ spin_unlock(&dev->queue_lock);
+
+ ret = NETDEV_TX_BUSY;
+ if (!netif_queue_stopped(dev))
+ ret = try_tx_pkts(dev);
+
+ if (!lockless)
+ netif_tx_unlock(dev);
+
+ spin_lock(&dev->queue_lock);
+
+ q = dev->qdisc;
+
+ /* most likely result, packet went ok */
+ if (ret == NETDEV_TX_OK)
+ return qdisc_qlen(q);
+ /* only for lockless drivers .. */
+ if (ret == NETDEV_TX_LOCKED && lockless)
+ return tx_islocked(NULL, dev, q);
+
+ if (unlikely(ret != NETDEV_TX_BUSY && net_ratelimit()))
+ printk(KERN_WARNING " BUG %s code %d qlen %d\n",
+ dev->name, ret, q->q.qlen);
+
+ return do_dev_requeue(NULL, dev, q);
+}
+
void __qdisc_run(struct net_device *dev)
{
+ unsigned batching = (dev->features & NETIF_F_BTX);
+
do {
- if (!qdisc_restart(dev))
+ if (!batching && !qdisc_restart(dev))
+ break;
+ else if (batching && !batch_qdisc_restart(dev))
break;
} while (!netif_queue_stopped(dev));
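On the core side, the heart of the patch is try_get_tx_pkts(). The following hypothetical user-space model mirrors its rules as read from the patch: leftovers already on blist are retried before anything new is dequeued, a pending GSO packet is taken ahead of fresh qdisc packets, and at most xmit_win packets are taken. The names model_fifo, fifo_push, fifo_pop and model_get_tx_pkts are made up, integers stand in for sk_buffs, and -1 means "no pending GSO packet". The model appends at the tail so packets reach the driver in the order the qdisc released them; the count > 0 guard on the GSO path is an addition of this sketch.

```c
/*
 * Hypothetical model of the core's batch dequeue. Not kernel code.
 */
#include <assert.h>

#define MODEL_MAX 32

/* Ring-buffer FIFO standing in for both the qdisc and dev->blist */
struct model_fifo {
	int pkt[MODEL_MAX];
	int head, len;
};

static void fifo_push(struct model_fifo *f, int id)
{
	assert(f->len < MODEL_MAX);
	f->pkt[(f->head + f->len) % MODEL_MAX] = id;
	f->len++;
}

static int fifo_pop(struct model_fifo *f, int *id)
{
	if (f->len == 0)
		return 0;
	*id = f->pkt[f->head];
	f->head = (f->head + 1) % MODEL_MAX;
	f->len--;
	return 1;
}

/* Returns the number of packets now waiting on blist */
static int model_get_tx_pkts(struct model_fifo *qdisc, struct model_fifo *blist,
			     int *gso, int count)
{
	int id;

	/* leftovers from an earlier BUSY: retry them, take nothing new */
	if (blist->len > 0)
		return blist->len;

	/* a pending GSO packet goes ahead of fresh qdisc packets */
	if (*gso >= 0 && count > 0) {
		fifo_push(blist, *gso);
		*gso = -1;
		count--;
	}

	/* take up to 'count' more, preserving qdisc order */
	while (count > 0 && fifo_pop(qdisc, &id)) {
		fifo_push(blist, id);
		count--;
	}
	return blist->len;
}
```

Because blist is empty whenever new packets are dequeued, returning blist->len here equals the number of packets taken in this call, which is what the patch computes as tdq - count.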