From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, therbert@google.com,
"David S. Miller" <davem@davemloft.net>,
Alexander Duyck <alexander.h.duyck@intel.com>,
toke@toke.dk, Florian Westphal <fw@strlen.de>,
jhs@mojatatu.com, Dave Taht <dave.taht@gmail.com>,
John Fastabend <john.r.fastabend@intel.com>,
Daniel Borkmann <dborkman@redhat.com>,
Hannes Frederic Sowa <hannes@stressinduktion.org>,
brouer@redhat.com
Subject: Re: [net-next PATCH 1/1 V4] qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE
Date: Wed, 24 Sep 2014 19:58:31 +0200
Message-ID: <20140924195831.6fb91051@redhat.com>
In-Reply-To: <1411579395.15395.41.camel@edumazet-glaptop2.roam.corp.google.com>
On Wed, 24 Sep 2014 10:23:15 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-09-24 at 18:12 +0200, Jesper Dangaard Brouer wrote:
> > Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
> > sending/processing an entire skb list.
> >
> > This patch implements qdisc bulk dequeue, by allowing multiple packets
> > to be dequeued in dequeue_skb().
> >
> > The optimization principle for this is twofold: (1) amortize locking
> > cost and (2) avoid the expensive tailptr update for notifying HW.
> > (1) Several packets are dequeued while holding the qdisc root_lock,
> > amortizing the locking cost over several packets. The dequeued SKB list
> > is processed under the TXQ lock in dev_hard_start_xmit(), thus also
> > amortizing the cost of the TXQ lock.
> > (2) Furthermore, dev_hard_start_xmit() will utilize the skb->xmit_more
> > API to delay the HW tailptr update, which also reduces the cost per
> > packet.
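For illustration (not part of this patch): a driver honouring
skb->xmit_more typically defers the doorbell/tailptr MMIO until the
last packet of a burst, roughly like the sketch below (the ring field
names are made up for the example):

  /* Post the descriptor, but only ring the doorbell (expensive MMIO)
   * when no further packets follow or the queue is about to stop.
   */
  if (!skb->xmit_more || netif_xmit_stopped(txq))
          writel(ring->next_to_use, ring->tail_reg);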
> >
> > One restriction of the new API is that every SKB must belong to the
> > same TXQ. This patch takes the easy way out by restricting bulk
> > dequeue to qdiscs with the TCQ_F_ONETXQUEUE flag, which specifies that
> > the qdisc has only a single TXQ attached.
> >
> > Some detail about the flow: dev_hard_start_xmit() will process the skb
> > list, and transmit packets individually towards the driver (see
> > xmit_one()). In case the driver stops midway through the list, the
> > remaining skb list is returned by dev_hard_start_xmit(). In
> > sch_direct_xmit() this returned list is requeued by dev_requeue_skb().
> >
> > To avoid overshooting the HW limits, which results in requeuing, the
> > patch limits the number of bytes dequeued, based on the driver's BQL
> > limits. In effect, bulking will only happen for BQL-enabled drivers.
> > Besides the byte limit from BQL, bulking is also limited to a maximum
> > of 8 packets to avoid any issues with available descriptors in HW.
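(For reference, the qdisc_avail_bulklimit() helper used in the hunk
below lives in the header part of the patch, presumably elided by the
[...] further down; a minimal sketch of its intended shape, assuming
BQL's dql state on the netdev_queue, would be:

  static inline int qdisc_avail_bulklimit(const struct netdev_queue *txq)
  {
  #ifdef CONFIG_BQL
          /* Drivers not using BQL never raise the limit above 0 */
          return dql_avail(&txq->dql);
  #else
          return 0;
  #endif
  }

i.e. without BQL the byte budget stays at 0 and bulking never kicks in.)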
> >
> > For now, as a conservative approach, don't bulk dequeue GSO and
> > segmented GSO packets, because testing showed requeuing occurring
> > with segmented GSO packets.
> >
> > Joint work with Hannes, Daniel and Florian.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> >
> > ---
> > V4:
> > - Patch rewritten in the Red Hat Neuchatel office, joint work with
> > Hannes, Daniel and Florian.
> > - Conservative approach of only bulking on BQL-enabled drivers
> > - No user-tunable parameter, but limit bulking to 8 packets.
> > - For now, avoid bulking GSO packets.
> >
[...]
> > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> > index 19696eb..6fba089 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -56,6 +56,42 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
> > return 0;
> > }
> >
> > +static struct sk_buff *try_bulk_dequeue_skb(struct Qdisc *q,
> > + struct sk_buff *head_skb,
> > + int bytelimit)
> > +{
> > + struct sk_buff *skb, *tail_skb = head_skb;
> > + int budget = 8; /* Arbitrary, but conservatively chosen limit */
> > +
> > + while (bytelimit > 0 && --budget > 0) {
> > + /* For now, don't bulk dequeue GSO (or GSO segmented) pkts */
>
> Hmm... this is a serious limitation.
We plan to remove this restriction at a later point; for now it is just
being conservative.
> > + if (tail_skb->next || skb_is_gso(tail_skb))
> > + break;
> > +
> > + skb = q->dequeue(q);
> > + if (!skb)
> > + break;
> > +
> > + bytelimit -= skb->len; /* covers GSO len */
>
> Not really, use qdisc_pkt_len(skb) instead, to have a better byte count.
Understood, will update patch tomorrow.
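I.e. the accounting line would become something like this (sketch of
the planned change, not the final patch):

  bytelimit -= qdisc_pkt_len(skb); /* enqueue-time length, covers GSO bytes */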
> > + skb = validate_xmit_skb(skb, qdisc_dev(q));
> > + if (!skb)
> > + break;
> > +
> > + /* "skb" can be a skb list after validate call above
> > + * (GSO segmented), but it is okay to append it to
> > + * current tail_skb->next, because next round will exit
> > + * in-case "tail_skb->next" is a skb list.
> > + */
>
> It would be trivial to change validate_xmit_skb() to return the head and
> tail of the chain. Walking the chain would hit hot cache lines, but it
> is better not having to walk it.
Yes, we could do this when we later add support for GSO segmented packets.
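Roughly, with a hypothetical three-argument validate_xmit_skb() that
also hands back the tail of the (possibly segmented) chain, the loop
could append without walking the list:

  struct sk_buff *tail;

  skb = validate_xmit_skb(skb, qdisc_dev(q), &tail); /* hypothetical API */
  if (!skb)
          break;
  tail_skb->next = skb;
  tail_skb = tail; /* no list walk needed */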
> > + tail_skb->next = skb;
> > + tail_skb = skb;
>
> So that here we do : tail_skb = tail;
>
> > + }
> > +
> > + return head_skb;
> > +}
> > +
> > +/* Note that dequeue_skb can possibly return a SKB list (via skb->next).
> > + * A requeued skb (via q->gso_skb) can also be a SKB list.
> > + */
> > static inline struct sk_buff *dequeue_skb(struct Qdisc *q)
> > {
> > struct sk_buff *skb = q->gso_skb;
> > @@ -70,10 +106,17 @@ static inline struct sk_buff *dequeue_skb(struct Qdisc *q)
> > } else
> > skb = NULL;
> > } else {
> > - if (!(q->flags & TCQ_F_ONETXQUEUE) || !netif_xmit_frozen_or_stopped(txq)) {
> > + if (!(q->flags & TCQ_F_ONETXQUEUE) ||
> > + !netif_xmit_frozen_or_stopped(txq)) {
> > + int bytelimit = qdisc_avail_bulklimit(txq);
> > +
> > skb = q->dequeue(q);
> > - if (skb)
> > + if (skb) {
> > + bytelimit -= skb->len;
>
> qdisc_pkt_len(skb)
Okay, will update patch tomorrow.
> > skb = validate_xmit_skb(skb, qdisc_dev(q));
> > + }
> > + if (skb && qdisc_may_bulk(q))
> > + skb = try_bulk_dequeue_skb(q, skb, bytelimit);
> > }
> > }
> >
>
> It looks good, but we really need to take care of TSO packets.
As noted above, TSO packets are planned as a future feature. Let's
first see if this works well, without introducing any HoL blocking
issues or other regressions.
> pktgen is nice, but do not represent the majority of the traffic we send
> from high performance host where we want this bulk dequeue thing ;)
This patch is actually targeted towards more normal use cases.
Pktgen cannot even use this work, as it bypasses the qdisc layer...
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer