From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
To: Vladimir Oltean <vladimir.oltean@nxp.com>,
netdev@vger.kernel.org, John Fastabend <john.fastabend@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Claudiu Manoil <claudiu.manoil@nxp.com>,
Camelia Groza <camelia.groza@nxp.com>,
Xiaoliang Yang <xiaoliang.yang_1@nxp.com>,
Gerhard Engleder <gerhard@engleder-embedded.com>,
Alexander Duyck <alexander.duyck@gmail.com>,
Kurt Kanzenbach <kurt@linutronix.de>,
Ferenc Fejes <ferenc.fejes@ericsson.com>,
Tony Nguyen <anthony.l.nguyen@intel.com>,
Jesse Brandeburg <jesse.brandeburg@intel.com>,
Jacob Keller <jacob.e.keller@intel.com>
Subject: Re: [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup
Date: Tue, 24 Jan 2023 17:11:29 -0800 [thread overview]
Message-ID: <87tu0fh0zi.fsf@intel.com> (raw)
In-Reply-To: <20230120141537.1350744-1-vladimir.oltean@nxp.com>
Hi Vladimir,
Sorry for the delay. I had to sleep on this for a bit.
Vladimir Oltean <vladimir.oltean@nxp.com> writes:
> I realize that this patch set will start a flame war, but there are
> things about the mqprio qdisc that I simply don't understand, so in an
> attempt to explain how I see things should be done, I've made some
> patches to the code. I hope the reviewers will be patient enough with me :)
>
> I need to touch mqprio because I'm preparing a patch set for Frame
> Preemption (an IEEE 802.1Q feature). A disagreement started with
> Vinicius here:
> https://patchwork.kernel.org/project/netdevbpf/patch/20220816222920.1952936-3-vladimir.oltean@nxp.com/#24976672
>
> regarding how TX packet prioritization should be handled. Vinicius said
> that for some Intel NICs, prioritization at the egress scheduler stage
> is fundamentally attached to TX queues rather than traffic classes.
>
> In other words, in the "popular" mqprio configuration documented by him:
>
> $ tc qdisc replace dev $IFACE parent root handle 100 mqprio \
> num_tc 3 \
> map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
> queues 1@0 1@1 2@2 \
> hw 0
>
> there are 3 Linux traffic classes and 4 TX queues. The TX queues are
> organized in strict priority fashion, like this: TXQ 0 has highest prio
> (hardware dequeue precedence for TX scheduler), TXQ 3 has lowest prio.
> Packets classified by Linux to TC 2 are hashed between TXQ 2 and TXQ 3,
> but the hardware has higher precedence for TXQ2 over TXQ 3, and Linux
> doesn't know that.
>
> I am surprised by this fact, and this isn't how ENETC works at all.
> For ENETC, we try to prioritize on TCs rather than TXQs, and TC 7 has
> higher priority than TC 7. For us, groups of TXQs that map to the same
> TC have the same egress scheduling priority. It is possible (and maybe
> useful) to have 2 TXQs per TC - one TXQ per CPU). Patch 07/11 tries to
> make that more clear.
>
That makes me think, making "queues" visible on mqprio/taprio perhaps
was a mistake. Perhaps if we only had the "prio to tc" map, and relied
on drivers implementing .ndo_select_queue() that would be less
problematic. And for devices with tens/hundreds of queues, this "no
queues to the user exposed" sounds like a better model. Anyway... just
wondering.
Perhaps something to think about for mqprio/taprio/etc "the next generation" ;-)
> Furthermore (and this is really the biggest point of contention), myself
> and Vinicius have the fundamental disagreement whether the 802.1Qbv
> (taprio) gate mask should be passed to the device driver per TXQ or per
> TC. This is what patch 11/11 is about.
>
I think that I was being annoying because I believed that some
implementation detail of the netdev prio_tc_map and the way that netdev
select TX queues (the "core of how mqprio works") would leak, and it
would be easier/more correct to make other vendors adapt themselves to
the "Intel"/"queues have priorities" model. But I stand corrected, as
you (and others) have proven.
In short, I am not opposed to this idea. This capability operation
really opens some possibilities. The patches look clean.
I'll play with the patches later in the week, quite swamped at this
point.
> Again, I'm not *certain* that my opinion on this topic is correct
> (and it sure is confusing to see such a different approach for Intel).
> But I would appreciate any feedback.
And that reminds me, I owe you a beverage of your choice. For all your
effort.
>
> Vladimir Oltean (11):
> net/sched: mqprio: refactor nlattr parsing to a separate function
> net/sched: mqprio: refactor offloading and unoffloading to dedicated
> functions
> net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to
> pkt_sched.h
> net/sched: mqprio: allow offloading drivers to request queue count
> validation
> net/sched: mqprio: add extack messages for queue count validation
> net: enetc: request mqprio to validate the queue counts
> net: enetc: act upon the requested mqprio queue configuration
> net/sched: taprio: pass mqprio queue configuration to ndo_setup_tc()
> net: enetc: act upon mqprio queue config in taprio offload
> net/sched: taprio: validate that gate mask does not exceed number of
> TCs
> net/sched: taprio: only calculate gate mask per TXQ for igc
>
> drivers/net/ethernet/freescale/enetc/enetc.c | 67 ++--
> .../net/ethernet/freescale/enetc/enetc_qos.c | 27 +-
> drivers/net/ethernet/intel/igc/igc_main.c | 17 +
> include/net/pkt_cls.h | 10 -
> include/net/pkt_sched.h | 16 +
> net/sched/sch_mqprio.c | 298 +++++++++++-------
> net/sched/sch_taprio.c | 57 ++--
> 7 files changed, 310 insertions(+), 182 deletions(-)
>
> --
> 2.34.1
>
Cheers,
--
Vinicius
next prev parent reply other threads:[~2023-01-25 1:15 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-20 14:15 [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 01/11] net/sched: mqprio: refactor nlattr parsing to a separate function Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 02/11] net/sched: mqprio: refactor offloading and unoffloading to dedicated functions Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 03/11] net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to pkt_sched.h Vladimir Oltean
2023-01-25 13:09 ` Kurt Kanzenbach
2023-01-25 13:16 ` Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 04/11] net/sched: mqprio: allow offloading drivers to request queue count validation Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 05/11] net/sched: mqprio: add extack messages for " Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 06/11] net: enetc: request mqprio to validate the queue counts Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 07/11] net: enetc: act upon the requested mqprio queue configuration Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 08/11] net/sched: taprio: pass mqprio queue configuration to ndo_setup_tc() Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 09/11] net: enetc: act upon mqprio queue config in taprio offload Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 10/11] net/sched: taprio: validate that gate mask does not exceed number of TCs Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 11/11] net/sched: taprio: only calculate gate mask per TXQ for igc Vladimir Oltean
2023-01-25 1:11 ` Vinicius Costa Gomes
2023-01-23 18:22 ` [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup Jacob Keller
2023-01-24 14:26 ` Vladimir Oltean
2023-01-24 22:30 ` Jacob Keller
2023-01-23 21:21 ` Gerhard Engleder
2023-01-23 21:31 ` Vladimir Oltean
2023-01-23 22:20 ` Gerhard Engleder
2023-01-25 1:11 ` Vinicius Costa Gomes [this message]
2023-01-25 13:10 ` Vladimir Oltean
2023-01-25 22:47 ` Vinicius Costa Gomes
2023-01-26 20:39 ` Vladimir Oltean
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87tu0fh0zi.fsf@intel.com \
--to=vinicius.gomes@intel.com \
--cc=alexander.duyck@gmail.com \
--cc=anthony.l.nguyen@intel.com \
--cc=camelia.groza@nxp.com \
--cc=claudiu.manoil@nxp.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=ferenc.fejes@ericsson.com \
--cc=gerhard@engleder-embedded.com \
--cc=jacob.e.keller@intel.com \
--cc=jesse.brandeburg@intel.com \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=kurt@linutronix.de \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=vladimir.oltean@nxp.com \
--cc=xiaoliang.yang_1@nxp.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.