From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
To: Vladimir Oltean <vladimir.oltean@nxp.com>,
netdev@vger.kernel.org, John Fastabend <john.fastabend@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Claudiu Manoil <claudiu.manoil@nxp.com>,
Camelia Groza <camelia.groza@nxp.com>,
Xiaoliang Yang <xiaoliang.yang_1@nxp.com>,
Gerhard Engleder <gerhard@engleder-embedded.com>,
Alexander Duyck <alexander.duyck@gmail.com>,
Kurt Kanzenbach <kurt@linutronix.de>,
Ferenc Fejes <ferenc.fejes@ericsson.com>,
Tony Nguyen <anthony.l.nguyen@intel.com>,
Jesse Brandeburg <jesse.brandeburg@intel.com>,
Jacob Keller <jacob.e.keller@intel.com>
Subject: Re: [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup
Date: Tue, 24 Jan 2023 17:11:29 -0800 [thread overview]
Message-ID: <87tu0fh0zi.fsf@intel.com> (raw)
In-Reply-To: <20230120141537.1350744-1-vladimir.oltean@nxp.com>
Hi Vladimir,
Sorry for the delay. I had to sleep on this for a bit.
Vladimir Oltean <vladimir.oltean@nxp.com> writes:
> I realize that this patch set will start a flame war, but there are
> things about the mqprio qdisc that I simply don't understand, so in an
> attempt to explain how I see things should be done, I've made some
> patches to the code. I hope the reviewers will be patient enough with me :)
>
> I need to touch mqprio because I'm preparing a patch set for Frame
> Preemption (an IEEE 802.1Q feature). A disagreement started with
> Vinicius here:
> https://patchwork.kernel.org/project/netdevbpf/patch/20220816222920.1952936-3-vladimir.oltean@nxp.com/#24976672
>
> regarding how TX packet prioritization should be handled. Vinicius said
> that for some Intel NICs, prioritization at the egress scheduler stage
> is fundamentally attached to TX queues rather than traffic classes.
>
> In other words, in the "popular" mqprio configuration documented by him:
>
> $ tc qdisc replace dev $IFACE parent root handle 100 mqprio \
> num_tc 3 \
> map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
> queues 1@0 1@1 2@2 \
> hw 0
>
> there are 3 Linux traffic classes and 4 TX queues. The TX queues are
> organized in strict priority fashion, like this: TXQ 0 has highest prio
> (hardware dequeue precedence for TX scheduler), TXQ 3 has lowest prio.
> Packets classified by Linux to TC 2 are hashed between TXQ 2 and TXQ 3,
> but the hardware has higher precedence for TXQ2 over TXQ 3, and Linux
> doesn't know that.
>
> I am surprised by this fact, and this isn't how ENETC works at all.
> For ENETC, we try to prioritize on TCs rather than TXQs, and TC 7 has
> higher priority than TC 7. For us, groups of TXQs that map to the same
> TC have the same egress scheduling priority. It is possible (and maybe
> useful) to have 2 TXQs per TC - one TXQ per CPU). Patch 07/11 tries to
> make that more clear.
>
That makes me think, making "queues" visible on mqprio/taprio perhaps
was a mistake. Perhaps if we only had the "prio to tc" map, and relied
on drivers implementing .ndo_select_queue() that would be less
problematic. And for devices with tens/hundreds of queues, this "no
queues to the user exposed" sounds like a better model. Anyway... just
wondering.
Perhaps something to think about for mqprio/taprio/etc "the next generation" ;-)
> Furthermore (and this is really the biggest point of contention), myself
> and Vinicius have the fundamental disagreement whether the 802.1Qbv
> (taprio) gate mask should be passed to the device driver per TXQ or per
> TC. This is what patch 11/11 is about.
>
I think that I was being annoying because I believed that some
implementation detail of the netdev prio_tc_map and the way that netdev
select TX queues (the "core of how mqprio works") would leak, and it
would be easier/more correct to make other vendors adapt themselves to
the "Intel"/"queues have priorities" model. But I stand corrected, as
you (and others) have proven.
In short, I am not opposed to this idea. This capability operation
really opens some possibilities. The patches look clean.
I'll play with the patches later in the week, quite swamped at this
point.
> Again, I'm not *certain* that my opinion on this topic is correct
> (and it sure is confusing to see such a different approach for Intel).
> But I would appreciate any feedback.
And that reminds me, I owe you a beverage of your choice. For all your
effort.
>
> Vladimir Oltean (11):
> net/sched: mqprio: refactor nlattr parsing to a separate function
> net/sched: mqprio: refactor offloading and unoffloading to dedicated
> functions
> net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to
> pkt_sched.h
> net/sched: mqprio: allow offloading drivers to request queue count
> validation
> net/sched: mqprio: add extack messages for queue count validation
> net: enetc: request mqprio to validate the queue counts
> net: enetc: act upon the requested mqprio queue configuration
> net/sched: taprio: pass mqprio queue configuration to ndo_setup_tc()
> net: enetc: act upon mqprio queue config in taprio offload
> net/sched: taprio: validate that gate mask does not exceed number of
> TCs
> net/sched: taprio: only calculate gate mask per TXQ for igc
>
> drivers/net/ethernet/freescale/enetc/enetc.c | 67 ++--
> .../net/ethernet/freescale/enetc/enetc_qos.c | 27 +-
> drivers/net/ethernet/intel/igc/igc_main.c | 17 +
> include/net/pkt_cls.h | 10 -
> include/net/pkt_sched.h | 16 +
> net/sched/sch_mqprio.c | 298 +++++++++++-------
> net/sched/sch_taprio.c | 57 ++--
> 7 files changed, 310 insertions(+), 182 deletions(-)
>
> --
> 2.34.1
>
Cheers,
--
Vinicius
next prev parent reply other threads:[~2023-01-25 1:15 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-20 14:15 [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 01/11] net/sched: mqprio: refactor nlattr parsing to a separate function Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 02/11] net/sched: mqprio: refactor offloading and unoffloading to dedicated functions Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 03/11] net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to pkt_sched.h Vladimir Oltean
2023-01-25 13:09 ` Kurt Kanzenbach
2023-01-25 13:16 ` Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 04/11] net/sched: mqprio: allow offloading drivers to request queue count validation Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 05/11] net/sched: mqprio: add extack messages for " Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 06/11] net: enetc: request mqprio to validate the queue counts Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 07/11] net: enetc: act upon the requested mqprio queue configuration Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 08/11] net/sched: taprio: pass mqprio queue configuration to ndo_setup_tc() Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 09/11] net: enetc: act upon mqprio queue config in taprio offload Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 10/11] net/sched: taprio: validate that gate mask does not exceed number of TCs Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 11/11] net/sched: taprio: only calculate gate mask per TXQ for igc Vladimir Oltean
2023-01-25 1:11 ` Vinicius Costa Gomes
2023-01-23 18:22 ` [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup Jacob Keller
2023-01-24 14:26 ` Vladimir Oltean
2023-01-24 22:30 ` Jacob Keller
2023-01-23 21:21 ` Gerhard Engleder
2023-01-23 21:31 ` Vladimir Oltean
2023-01-23 22:20 ` Gerhard Engleder
2023-01-25 1:11 ` Vinicius Costa Gomes [this message]
2023-01-25 13:10 ` Vladimir Oltean
2023-01-25 22:47 ` Vinicius Costa Gomes
2023-01-26 20:39 ` Vladimir Oltean
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87tu0fh0zi.fsf@intel.com \
--to=vinicius.gomes@intel.com \
--cc=alexander.duyck@gmail.com \
--cc=anthony.l.nguyen@intel.com \
--cc=camelia.groza@nxp.com \
--cc=claudiu.manoil@nxp.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=ferenc.fejes@ericsson.com \
--cc=gerhard@engleder-embedded.com \
--cc=jacob.e.keller@intel.com \
--cc=jesse.brandeburg@intel.com \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=kurt@linutronix.de \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=vladimir.oltean@nxp.com \
--cc=xiaoliang.yang_1@nxp.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).