From: Jarek Poplawski <jarkao2@gmail.com>
To: John Fastabend <john.r.fastabend@intel.com>
Cc: "davem@davemloft.net" <davem@davemloft.net>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"hadi@cyberus.ca" <hadi@cyberus.ca>,
"shemminger@vyatta.com" <shemminger@vyatta.com>,
"tgraf@infradead.org" <tgraf@infradead.org>,
"eric.dumazet@gmail.com" <eric.dumazet@gmail.com>,
"bhutchings@solarflare.com" <bhutchings@solarflare.com>,
"nhorman@tuxdriver.com" <nhorman@tuxdriver.com>
Subject: Re: [net-next-2.6 PATCH v2 3/3] net_sched: implement a root container qdisc sch_mclass
Date: Mon, 3 Jan 2011 18:02:44 +0100
Message-ID: <20110103170244.GA1843@del.dom.local>
In-Reply-To: <4D2161FF.4070804@intel.com>
On Sun, Jan 02, 2011 at 09:43:27PM -0800, John Fastabend wrote:
> On 12/30/2010 3:37 PM, Jarek Poplawski wrote:
> > John Fastabend wrote:
> >> This implements a mclass 'multi-class' queueing discipline that by
> >> default creates multiple mq qdisc's one for each traffic class. Each
> >> mq qdisc then owns a range of queues per the netdev_tc_txq mappings.
> >
> > Is it really necessary to add one more abstraction layer for this,
> > probably not most often used (or even asked by users), functionality?
> > Why mclass can't simply do these few things more instead of attaching
> > (and changing) mq?
> >
>
> The statistics work nicely when the mq qdisc is used.
Well, I sometimes add leaf qdiscs only to get class stats with less
typing, too ;-)
>
> qdisc mclass 8002: root tc 4 map 0 1 2 3 0 1 2 3 1 1 1 1 1 1 1 1
> queues:(0:1) (2:3) (4:5) (6:15)
> Sent 140 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> qdisc mq 8003: parent 8002:1
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> qdisc mq 8004: parent 8002:2
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> qdisc mq 8005: parent 8002:3
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> qdisc mq 8006: parent 8002:4
> Sent 140 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> qdisc sfq 8007: parent 8005:1 limit 127p quantum 1514b
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> qdisc sfq 8008: parent 8005:2 limit 127p quantum 1514b
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
>
> The mclass gives the statistics for the interface, and the statistics on each mq qdisc give the statistics for each traffic class. Also, when using the mq qdisc with this abstraction, other qdiscs can be grafted onto the queues. For example, sch_sfq is used in the above example.
IMHO, these tc offsets and counts simply make a two-level hierarchy
(classes with leaf subclasses), similar to (or simpler than) other
classful qdiscs, which manage it all inside one module. Of course,
we could think of another way to organize the code, but that should
rather have been done at the beginning of the schedulers' design. The
mq qdisc broke the design a bit by adding a fake root, but I doubt we
should go deeper unless it's necessary. Doing mclass (or something) as
a more complex alternative to mq should be enough. Why couldn't mclass
graft sch_sfq the same way as mq?
>
> Although I am not too hung up on this use case, it does seem to be a good abstraction to me. Is it strictly necessary, though? No; the class statistics of mclass could be used to get stats per traffic class.
I am not too hung up on this either, particularly if it's OK with
others, especially DaveM ;-)
>
> > ...
> >> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> >> index 0af57eb..723ee52 100644
> >> --- a/include/net/sch_generic.h
> >> +++ b/include/net/sch_generic.h
> >> @@ -50,6 +50,7 @@ struct Qdisc {
> >> #define TCQ_F_INGRESS 4
> >> #define TCQ_F_CAN_BYPASS 8
> >> #define TCQ_F_MQROOT 16
> >> +#define TCQ_F_MQSAFE 32
> >
> > If every other qdisc added a flag for qdiscs it likes...
> >
>
> then we run out of bits and get unneeded complexity. I think I will drop the MQSAFE bit completely and let user space catch this. The worst that should happen is the noop qdisc is used.
Maybe you're right. On the other hand, flags are usually added for
more general purposes, and optimal/wrong configs are a matter of
documentation.
>
> >> @@ -709,7 +709,13 @@ static void attach_default_qdiscs(struct net_device *dev)
> >> dev->qdisc = txq->qdisc_sleeping;
> >> atomic_inc(&dev->qdisc->refcnt);
> >> } else {
> >> - qdisc = qdisc_create_dflt(txq, &mq_qdisc_ops, TC_H_ROOT);
> >> + if (dev->num_tc)
> >
> > Actually, where is this num_tc expected to be set? I can see it inside
> > mclass only, with unsetting on destruction, but I'm probably missing something.
>
> Either through mclass, as you noted, or a driver could set the num_tc. One of the RFCs I sent out has ixgbe setting the num_tc when DCB was enabled.
OK, I probably missed this second possibility in the last version.
...
> >> + /* Unwind attributes on failure */
> >> + u8 unwnd_tc = dev->num_tc;
> >> + u8 unwnd_map[16];
> >
> > [TC_MAX_QUEUE] ?
>
> Actually TC_BITMASK+1 is probably more accurate. This array maps the skb priority to a traffic class after the priority is masked with TC_BITMASK.
>
> >
> >> + struct netdev_tc_txq unwnd_txq[16];
> >> +
>
> Although unwnd_txq should be TC_MAX_QUEUE.
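For reference, the unwind pattern discussed above (snapshot the tc attributes, apply the new ones step by step, restore on any failure) can be sketched in plain user-space C. This is only an illustration, not the patch's code: struct tc_state, struct tc_txq, set_tc_queue() and tc_state_apply() are hypothetical names, and the values TC_BITMASK = 15 and TC_MAX_QUEUE = 16 are assumed from the "16" used in the quoted hunks. The snapshot arrays are sized TC_BITMASK + 1 and TC_MAX_QUEUE as suggested in the replies.

```c
#include <assert.h>

/* Assumed values, taken from the literal 16 in the quoted patch. */
#define TC_BITMASK   15
#define TC_MAX_QUEUE 16

/* Hypothetical stand-in for struct netdev_tc_txq. */
struct tc_txq {
	unsigned short count;
	unsigned short offset;
};

/* Hypothetical container for the per-device tc attributes. */
struct tc_state {
	unsigned char num_tc;
	unsigned char prio_tc_map[TC_BITMASK + 1]; /* priority -> tc */
	struct tc_txq tc_to_txq[TC_MAX_QUEUE];     /* tc -> queue range */
};

/* Set one tc's queue range; fails if the range does not fit. */
static int set_tc_queue(struct tc_state *dev, unsigned int tc,
			unsigned short count, unsigned short offset,
			unsigned int num_queues)
{
	if (tc >= dev->num_tc || count == 0 ||
	    (unsigned int)offset + count > num_queues)
		return -1;
	dev->tc_to_txq[tc].count = count;
	dev->tc_to_txq[tc].offset = offset;
	return 0;
}

/* Apply req to dev; on any failure restore dev from the snapshot. */
static int tc_state_apply(struct tc_state *dev, const struct tc_state *req,
			  unsigned int num_queues)
{
	struct tc_state snapshot;
	unsigned int i;

	assert(dev && req);
	snapshot = *dev; /* unwind attributes on failure */

	if (req->num_tc > TC_MAX_QUEUE)
		return -1;
	dev->num_tc = req->num_tc;

	for (i = 0; i < req->num_tc; i++)
		if (set_tc_queue(dev, i, req->tc_to_txq[i].count,
				 req->tc_to_txq[i].offset, num_queues))
			goto err_unwind;

	/* All TC_BITMASK + 1 priorities are mapped, not just num_tc. */
	for (i = 0; i <= TC_BITMASK; i++) {
		if (req->prio_tc_map[i] >= req->num_tc)
			goto err_unwind;
		dev->prio_tc_map[i] = req->prio_tc_map[i];
	}
	return 0;

err_unwind:
	*dev = snapshot;
	return -1;
}
```

Copying the whole structure keeps the snapshot and the live state in step if the array bounds change later, which is one reason to prefer named constants over a bare 16.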
...
> >> + /* Always use supplied priority mappings */
> >> + for (i = 0; i < 16; i++) {
> >
> > i < qopt->num_tc ?
>
> Nope, TC_BITMASK+1 here. If we only have 4 tcs for example we still need to map all 16 priority values to a tc.
OK, anyway, all these '16' should be 'upgraded'.
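To make the point about the loop bound concrete, here is a small user-space C sketch of the lookup as I understand it from this thread: the skb priority is masked with TC_BITMASK, the result indexes a TC_BITMASK + 1 entry map to get the traffic class, and the class selects a queue range. The struct and function names are hypothetical, TC_BITMASK = 15 and TC_MAX_QUEUE = 16 are assumed, and the sample values are the ones from the tc output quoted earlier, reading "queues:(0:1)" as first:last (also an assumption).

```c
#include <assert.h>

/* Assumed values, taken from the literal 16 in the quoted patch. */
#define TC_BITMASK   15
#define TC_MAX_QUEUE 16

/* Hypothetical stand-in for struct netdev_tc_txq. */
struct tc_txq {
	unsigned short count;
	unsigned short offset;
};

/* Hypothetical container for the per-device tc attributes. */
struct tc_config {
	unsigned char num_tc;
	unsigned char prio_tc_map[TC_BITMASK + 1]; /* priority -> tc */
	struct tc_txq tc_to_txq[TC_MAX_QUEUE];     /* tc -> queue range */
};

/* Sample config from the tc output in this thread:
 * tc 4 map 0 1 2 3 0 1 2 3 1 1 1 1 1 1 1 1
 * queues:(0:1) (2:3) (4:5) (6:15)
 */
static struct tc_config sample_config(void)
{
	struct tc_config cfg = {
		.num_tc = 4,
		.prio_tc_map = { 0, 1, 2, 3, 0, 1, 2, 3,
				 1, 1, 1, 1, 1, 1, 1, 1 },
		.tc_to_txq = { { 2, 0 }, { 2, 2 }, { 2, 4 }, { 10, 6 } },
	};
	return cfg;
}

/* Every priority value lands on some tc, even with only 4 tcs. */
static unsigned char prio_to_tc(const struct tc_config *cfg,
				unsigned int priority)
{
	assert(cfg);
	return cfg->prio_tc_map[priority & TC_BITMASK];
}

/* First and last tx queue owned by a traffic class. */
static unsigned int tc_first_queue(const struct tc_config *cfg,
				   unsigned char tc)
{
	assert(cfg && tc < cfg->num_tc);
	return cfg->tc_to_txq[tc].offset;
}

static unsigned int tc_last_queue(const struct tc_config *cfg,
				  unsigned char tc)
{
	assert(cfg && tc < cfg->num_tc);
	return cfg->tc_to_txq[tc].offset + cfg->tc_to_txq[tc].count - 1u;
}
```

With this sample config, priority 5 maps to tc 1 (queues 2 to 3) and priority 3 maps to tc 3 (queues 6 to 15). If the loop stopped at num_tc, priorities 4 through 15 would have no defined class.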
Thanks,
Jarek P.