netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
To: Michel Machado <michel@digirati.com.br>
Cc: Nishanth Devarajan <ndev2021@gmail.com>,
	xiyou.wangcong@gmail.com, jhs@mojatatu.com, jiri@resnulli.us,
	davem@davemloft.net, netdev@vger.kernel.org, doucette@bu.edu
Subject: Re: [PATCH v3 net-next] net/sched: add skbprio scheduler
Date: Tue, 10 Jul 2018 11:33:16 -0300	[thread overview]
Message-ID: <20180710143314.GF10923@localhost.localdomain> (raw)
In-Reply-To: <9322b915-8193-7011-e9ba-e7b22ceb08ba@digirati.com.br>

On Tue, Jul 10, 2018 at 10:03:22AM -0400, Michel Machado wrote:
...
> > You can get 64 different priorities by stacking sch_prio, btw. And if
> > you implement drop_from_tail() as part of Qdisc, you can even get it
> > working for this cascading case too.
> 
>    A solution would be to add another flag to switch between the current
> prio_classify() and a new one to just use skb->priority as in skbprio. This

Sounds promising.

> way we don't risk breaking applications that rely on tcf_classify() and this
> odd behavior that I found in prio_classify():
> 
>         band = TC_H_MIN(band) - 1;
>         if (band >= q->bands)
>                 return q->queues[q->prio2band[0]];
>         return q->queues[band];
> 
>    When band is zero, it returns q->queues[q->prio2band[0]] instead of
> q->queues[band] as it would for other bands less than q->bands.

Agreed, this looks odd. It came from 1d8ae3fdeb00 ("pkt_sched: Remove
RR scheduler."):

        band = TC_H_MIN(band) - 1;
        if (band >= q->bands)
-               band = q->prio2band[0];
-out:
-       if (q->mq)
-               skb_set_queue_mapping(skb, band);
+               return q->queues[q->prio2band[0]];
+
        return q->queues[band];
 }

I can see how it made sense before the change, but not after.

> 
> > > > >      3. The queues of sch_prio.c are struct Qdisc, which don't have a method
> > > > > to drop at its tail.
> > > > 
> > > > That can be implemented, most likely as prio_tail_drop() as above.
> > > 
> > >     struct Qdisc represents *all* qdiscs. My knowledge of the other qdiscs is
> > > limited, but not all qdiscs may have a meaningful method to drop at the
> > > tail. For example: a qdisc that works over flows may not know with flow is
> > 
> > True, but it doesn't mean you have to implement it for all available qdiscs.
> 
>    If it is not implemented for all available qdiscs and the flag to drop at
> the tail is on, sch_prio.c would need to issue a log message whenever a
> packet goes into one of the subqueues that don't drop at the tail and have a
> failsafe behavior.

That's fine. pr_warn_ratelimit() is probably what we need for logging
the error, so it a) doesn't flood kernel log and b) gets activated
even if the sysadmin later try again with another qdisc (as opposed to
pr_warn_once).

For the failsafe behavior, it probably can then just drop the incoming
packet. It is not what you want, yes, but it's an easy way out out of
a non-expected situation and that works well enough.

> 
> > > the tail. Not to mention that this would be a widespread patch to only
> > > support this new prio qdisc. It would be prudent to wait for the production
> > > success of the proposed, self-contained qdisc before making this commitment.
> > 
> > On the other hand, by adding another qdisc you're adding more work
> > that one needs to do when dealing with qdisc infrastructure, such as
> > updating enqueue() prototype, for example.
> > 
> > Once this new qdisc is in, it won't be easy to deprecate it.
> 
>    We need to choose between (1) having skbprio that has some duplicate code
> with sch_prio.c and (2) adding flags to sch_prio.c and make a major
> refactoring of the schedule subsystem to add drop an the tail to qdiscs.

Yes,

> 
>    I mean major because we are not just talking about adding the method
> dequeue_tail() to struct Qdisc and adding dequeue_tail() to all qdiscs. One

I think it is. :-)

> will need to come up with definitions of dequeue_tail() for qdiscs that
> don't naturally have it and even rewrite the data structures of qdiscs. To
> substantiate this last point, consider sch_fifo.c, one of the simplest
> qdiscs available. sch_fifo.c keeps its packets in sch->q, which is of type
> struct qdisc_skb_head. struct qdisc_skb_head doesn't set skb->prev, so it
> cannot drop at the tail without walking through its list.

Yes but this would only be needed for the qdiscs that you want to
support with this flag. Nobody said you need to implement it on all
qdiscs that we have...

> 
>    I do understand the motivation for minimizing duplicate code. But the
> small amount of duplicate code that skbprio adds is cheaper than refactoring
> the scheduler system to only support this new sch_prio.c.

I'm afraid that without the code for option (2) above, this discussion
will become subjective. I'll wait for other opinions here.

Cheers,
  Marcelo

  reply	other threads:[~2018-07-10 14:33 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-07 10:13 [PATCH v3 net-next] net/sched: add skbprio scheduler Nishanth Devarajan
2018-07-09 15:44 ` Marcelo Ricardo Leitner
2018-07-09 18:18   ` Michel Machado
2018-07-09 19:53     ` Marcelo Ricardo Leitner
2018-07-09 21:03       ` Michel Machado
2018-07-09 21:40         ` Marcelo Ricardo Leitner
2018-07-10 14:03           ` Michel Machado
2018-07-10 14:33             ` Marcelo Ricardo Leitner [this message]
2018-07-11  2:25           ` Cong Wang
2018-07-11 19:33             ` Marcelo Ricardo Leitner
2018-07-13  6:05               ` Cong Wang
2018-07-13 13:04                 ` Marcelo Ricardo Leitner
2018-07-13 18:26                   ` Cong Wang
2018-07-14  4:39                     ` Marcelo Ricardo Leitner
2018-07-17  6:41                       ` Cong Wang
2018-07-11  2:32       ` Cong Wang
2018-07-11 18:37         ` Marcelo Ricardo Leitner
2018-07-13  5:07           ` Cong Wang
2018-07-13 13:00             ` Marcelo Ricardo Leitner
2018-07-13 18:17               ` Cong Wang
2018-07-14  4:51                 ` Marcelo Ricardo Leitner
2018-07-17  5:36                   ` Cong Wang
2018-07-11  2:38     ` Cong Wang
2018-07-11  2:57 ` Cong Wang
2018-07-11 15:24   ` Michel Machado
2018-07-19 18:39     ` Cong Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180710143314.GF10923@localhost.localdomain \
    --to=marcelo.leitner@gmail.com \
    --cc=davem@davemloft.net \
    --cc=doucette@bu.edu \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=michel@digirati.com.br \
    --cc=ndev2021@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).