netdev.vger.kernel.org archive mirror
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>,
	netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	Tom Herbert <therbert@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Florian Westphal <fw@strlen.de>,
	Daniel Borkmann <dborkman@redhat.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	John Fastabend <john.r.fastabend@intel.com>
Subject: [RFC net-next PATCH V2 2/3] qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE
Date: Thu, 04 Sep 2014 14:55:49 +0200	[thread overview]
Message-ID: <20140904125438.4108.38160.stgit@dragon> (raw)
In-Reply-To: <20140904125247.4108.8132.stgit@dragon>

Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
sending/processing an entire skb list.

This patch implements qdisc bulk dequeue, by allowing multiple packets
to be dequeued in dequeue_skb().

The optimization principle for this is twofold: (1) amortize locking
cost and (2) avoid the expensive tailptr update that notifies the HW.
 (1) Several packets are dequeued while holding the qdisc root_lock,
amortizing the locking cost over several packets.  The dequeued SKB
list is processed under the TXQ lock in dev_hard_start_xmit(), thus
also amortizing the cost of the TXQ lock.
 (2) Furthermore, dev_hard_start_xmit() will utilize the skb->xmit_more
API to delay the HW tailptr update, which also reduces the cost per
packet.

One restriction of the new API is that every SKB must belong to the
same TXQ.  This patch takes the easy way out, by restricting bulk
dequeue to qdiscs with the TCQ_F_ONETXQUEUE flag, which specifies that
the qdisc only has a single TXQ attached.

Some detail about the flow: dev_hard_start_xmit() will process the skb
list, and transmit packets individually towards the driver (see
xmit_one()).  In case the driver stops midway through the list, the
remaining skb list is returned by dev_hard_start_xmit().  In
sch_direct_xmit() this returned list is requeued by dev_requeue_skb().

The patch also tries to limit the number of bytes dequeued, based on
the driver's BQL limits.  It also stops dequeuing when it sees a GSO
packet (both real GSO and segmented GSO skb lists).

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

---
V2:
 - Restructure functions, splitting out functionality
 - Use the BQL byte limit to avoid overshooting driver limits, which
   caused too-large skb lists to sit on the q->gso_skb requeue slot.

 net/sched/sch_generic.c |   67 +++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 19696eb..a0c8070 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -56,6 +56,67 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 	return 0;
 }
 
+static inline bool qdisc_may_bulk(const struct Qdisc *qdisc,
+				  const struct sk_buff *skb)
+{
+	return (qdisc->flags & TCQ_F_ONETXQUEUE);
+}
+
+static inline struct sk_buff *qdisc_dequeue_validate(struct Qdisc *qdisc)
+{
+	struct sk_buff *skb = qdisc->dequeue(qdisc);
+
+	if (skb != NULL)
+		skb = validate_xmit_skb(skb, qdisc_dev(qdisc));
+
+	return skb;
+}
+
+static inline struct sk_buff *qdisc_bulk_dequeue_skb(struct Qdisc *q,
+						     struct sk_buff *head)
+{
+	struct sk_buff *new, *skb = head;
+	struct netdev_queue *txq = q->dev_queue;
+	int bytelimit = netdev_tx_avail_queue(txq);
+	int limit = 5;
+
+	if (bytelimit <= 0)
+		return head;
+
+	do {
+		if (skb->next || skb_is_gso(skb)) {
+			/* Stop processing if the skb is already a skb
+			 * list (e.g a segmented GSO packet) or a real
+			 * GSO packet */
+			break;
+		}
+		new = qdisc_dequeue_validate(q);
+		if (new) {
+			skb->next = new;
+			skb = new;
+			bytelimit -= skb->len;
+			/* list length is capped by 'limit' below */
+			/* One problem here is it is difficult to
+			 * requeue the "new" dequeued skb, e.g. in
+			 * case of GSO, thus a "normal" packet can
+			 * have a GSO packet on its ->next ptr.
+			 *
+			 * Requeue is difficult because if requeuing
+			 * on q->gso_skb, then a second requeue can
+			 * happen from sch_direct_xmit e.g. if driver
+			 * returns NETDEV_TX_BUSY, which would
+			 * overwrite this requeue.
+			 */
+		}
+	} while (new && --limit && (bytelimit > 0));
+	skb = head;
+
+	return skb;
+}
+
+/* Note that dequeue_skb can possibly return a SKB list (via skb->next).
+ * A requeued skb (via q->gso_skb) can also be a SKB list.
+ */
 static inline struct sk_buff *dequeue_skb(struct Qdisc *q)
 {
 	struct sk_buff *skb = q->gso_skb;
@@ -71,9 +132,9 @@ static inline struct sk_buff *dequeue_skb(struct Qdisc *q)
 			skb = NULL;
 	} else {
 		if (!(q->flags & TCQ_F_ONETXQUEUE) || !netif_xmit_frozen_or_stopped(txq)) {
-			skb = q->dequeue(q);
-			if (skb)
-				skb = validate_xmit_skb(skb, qdisc_dev(q));
+			skb = qdisc_dequeue_validate(q);
+			if (skb && qdisc_may_bulk(q, skb))
+				skb = qdisc_bulk_dequeue_skb(q, skb);
 		}
 	}
 

Thread overview: 11+ messages
2014-09-04 12:54 [RFC net-next PATCH V2 0/3] qdisc bulk dequeuing and utilizing delayed tailptr updates Jesper Dangaard Brouer
2014-09-04 12:54 ` [RFC net-next PATCH V2 1/3] net: Functions to report space available in device TX queues Jesper Dangaard Brouer
2014-09-04 12:55 ` Jesper Dangaard Brouer [this message]
2014-09-04 13:17   ` [RFC net-next PATCH V2 2/3] qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE Florian Westphal
2014-09-04 17:09     ` Cong Wang
2014-09-04 13:29   ` Eric Dumazet
2014-09-05  8:28     ` Jesper Dangaard Brouer
2014-09-06 12:55       ` Eric Dumazet
2014-09-04 16:59   ` Tom Herbert
2014-09-04 12:56 ` [RFC net-next PATCH V2 3/3] qdisc: debug statements while testing prev-patch Jesper Dangaard Brouer
2014-09-04 13:44 ` [RFC net-next PATCH V2 0/3] qdisc bulk dequeuing and utilizing delayed tailptr updates Jamal Hadi Salim
