From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>,
netdev@vger.kernel.org, therbert@google.com,
hannes@stressinduktion.org, fw@strlen.de, dborkman@redhat.com,
jhs@mojatatu.com, alexander.duyck@gmail.com,
john.r.fastabend@intel.com, dave.taht@gmail.com, toke@toke.dk,
brouer@redhat.com
Subject: Re: Quota in __qdisc_run() (was: qdisc: validate skb without holding lock)
Date: Tue, 7 Oct 2014 15:30:50 +0200
Message-ID: <20141007153050.792c9743@redhat.com>
In-Reply-To: <1412686038.11091.111.camel@edumazet-glaptop2.roam.corp.google.com>
On Tue, 07 Oct 2014 05:47:18 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-10-07 at 09:34 +0200, Jesper Dangaard Brouer wrote:
> > On Fri, 03 Oct 2014 16:30:44 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > > Another problem we need to address is the quota in __qdisc_run()
> > > is no longer meaningful, if each qdisc_restart() can pump many packets.
> >
> > I fully agree. My earlier "magic" packet limit was covering/papering
> > over this issue.
>
> Although quota was multiplied by 7 or 8 in worst case ?
Yes, exactly; not a very elegant solution ;-)
> > > An idea would be to use the bstats (or cpu_qstats if applicable)
> >
> > Please elaborate some more, as I don't completely follow (feel free to
> > show with a patch ;-)).
> >
>
> I was hoping John could finish the percpu stats before I do that.
>
> Problem with q->bstats.packets is that TSO packets with 45 MSS add 45 to
> this counter.
>
> Using a time quota would be better, but : jiffies is too big, and
> local_clock() might be too expensive.
Hannes hacked up this patch for me... (didn't finish testing)
The basic idea is that we want to keep/restore the quota fairness between
qdiscs, which we sort of broke with commit 5772e9a346 ("qdisc: bulk
dequeue support for qdiscs with TCQ_F_ONETXQUEUE").
We chose not to account for the number of packets inside the TSO/GSO
packets ("skb_gso_segs"), as the previous fairness also had this "defect".
You might view this as a short-term solution, until you can fix it with
your q->bstats.packets or time quota?
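
To make the accounting idea concrete, here is a minimal userspace sketch
(not kernel code). It assumes a hypothetical "struct pkt" and "struct fifo"
standing in for struct sk_buff and a qdisc, and run() only mirrors the shape
of the __qdisc_run() loop. Each dequeued packet costs one quota unit,
whatever its length, so a GSO packet still counts as one. Unlike the patch
below, the sketch only charges quota when a packet was actually dequeued.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only what the sketch needs. */
struct pkt {
	int len;		/* bytes; a GSO packet carries its full length */
	struct pkt *next;
};

/* Hypothetical FIFO standing in for a qdisc. */
struct fifo {
	struct pkt *head;
};

static struct pkt *fifo_dequeue(struct fifo *q)
{
	struct pkt *p = q->head;

	if (p) {
		q->head = p->next;
		p->next = NULL;
	}
	return p;
}

/* Dequeue one packet, then chain more while both the byte budget and the
 * shared quota allow it. Every dequeued packet costs one quota unit.
 */
static struct pkt *bulk_dequeue(struct fifo *q, int bytelimit, int *quota)
{
	struct pkt *head = fifo_dequeue(q);
	struct pkt *tail = head;

	if (!head)
		return NULL;
	--*quota;
	bytelimit -= head->len;

	while (bytelimit > 0 && *quota > 0) {
		struct pkt *n = fifo_dequeue(q);

		if (!n)
			break;
		bytelimit -= n->len;
		--*quota;
		tail->next = n;
		tail = n;
	}
	return head;
}

/* Same shape as the __qdisc_run() loop: stop once the quota is used up.
 * Returns the number of packets handed out during this run.
 */
static int run(struct fifo *q, int bytelimit, int quota)
{
	struct pkt *list;
	int sent = 0;

	while ((list = bulk_dequeue(q, bytelimit, &quota)) != NULL) {
		for (; list; list = list->next)
			sent++;
		if (quota <= 0)
			break;	/* the kernel would __netif_schedule(q) here */
	}
	return sent;
}
```

With ten 100-byte packets queued, run(&q, 1000, 4) hands out four packets
and leaves six queued, even though the byte limit alone would have allowed
all ten to go out in one bulk.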
[RFC PATCH] net_sched: restore quota limits after bulk dequeue
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -57,17 +57,19 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
static void try_bulk_dequeue_skb(struct Qdisc *q,
struct sk_buff *skb,
- const struct netdev_queue *txq)
+ const struct netdev_queue *txq,
+ int *quota)
{
int bytelimit = qdisc_avail_bulklimit(txq) - skb->len;
- while (bytelimit > 0) {
+ while (bytelimit > 0 && *quota > 0) {
struct sk_buff *nskb = q->dequeue(q);
if (!nskb)
break;
bytelimit -= nskb->len; /* covers GSO len */
+ --*quota;
skb->next = nskb;
skb = nskb;
}
@@ -77,7 +79,7 @@ static void try_bulk_dequeue_skb(struct Qdisc *q,
/* Note that dequeue_skb can possibly return a SKB list (via skb->next).
* A requeued skb (via q->gso_skb) can also be a SKB list.
*/
-static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate)
+static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate, int *quota)
{
struct sk_buff *skb = q->gso_skb;
const struct netdev_queue *txq = q->dev_queue;
@@ -87,18 +89,25 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate)
/* check the reason of requeuing without tx lock first */
txq = skb_get_tx_queue(txq->dev, skb);
if (!netif_xmit_frozen_or_stopped(txq)) {
+ struct sk_buff *iskb = skb;
+
q->gso_skb = NULL;
q->q.qlen--;
- } else
+ do
+ --*quota;
+ while ((iskb = iskb->next));
+ } else {
skb = NULL;
+ }
/* skb in gso_skb were already validated */
*validate = false;
} else {
if (!(q->flags & TCQ_F_ONETXQUEUE) ||
!netif_xmit_frozen_or_stopped(txq)) {
skb = q->dequeue(q);
+ --*quota;
if (skb && qdisc_may_bulk(q))
- try_bulk_dequeue_skb(q, skb, txq);
+ try_bulk_dequeue_skb(q, skb, txq, quota);
}
}
return skb;
@@ -204,7 +213,7 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
* >0 - queue is not empty.
*
*/
-static inline int qdisc_restart(struct Qdisc *q)
+static inline int qdisc_restart(struct Qdisc *q, int *quota)
{
struct netdev_queue *txq;
struct net_device *dev;
@@ -213,7 +222,7 @@ static inline int qdisc_restart(struct Qdisc *q)
bool validate;
/* Dequeue packet */
- skb = dequeue_skb(q, &validate);
+ skb = dequeue_skb(q, &validate, quota);
if (unlikely(!skb))
return 0;
@@ -230,13 +239,13 @@ void __qdisc_run(struct Qdisc *q)
{
int quota = weight_p;
- while (qdisc_restart(q)) {
+ while (qdisc_restart(q, &quota)) {
/*
* Ordered by possible occurrence: Postpone processing if
* 1. we've exceeded packet quota
* 2. another process needs the CPU;
*/
- if (--quota <= 0 || need_resched()) {
+ if (quota <= 0 || need_resched()) {
__netif_schedule(q);
break;
}
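
For the requeue path in dequeue_skb(), the intent is to charge one quota
unit per skb on the q->gso_skb list. A minimal userspace sketch of that
walk follows, again with a hypothetical "struct pkt" in place of struct
sk_buff and a made-up helper name charge_requeued(). The point to get right
is that the iterator advances from itself, not from the list head.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff. */
struct pkt {
	struct pkt *next;
};

/* Charge one quota unit per packet on a requeued list.
 * Returns the number of packets charged.
 */
static int charge_requeued(const struct pkt *head, int *quota)
{
	const struct pkt *it;
	int n = 0;

	for (it = head; it; it = it->next) {
		--*quota;
		n++;
	}
	return n;
}
```

The quota may go negative here, which is fine for the caller: the
__qdisc_run() loop only checks for quota <= 0.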