From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
To: Vladimir Oltean <vladimir.oltean@nxp.com>,
	netdev@vger.kernel.org, John Fastabend <john.fastabend@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Claudiu Manoil <claudiu.manoil@nxp.com>,
	Camelia Groza <camelia.groza@nxp.com>,
	Xiaoliang Yang <xiaoliang.yang_1@nxp.com>,
	Gerhard Engleder <gerhard@engleder-embedded.com>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	Kurt Kanzenbach <kurt@linutronix.de>,
	Ferenc Fejes <ferenc.fejes@ericsson.com>,
	Tony Nguyen <anthony.l.nguyen@intel.com>,
	Jesse Brandeburg <jesse.brandeburg@intel.com>,
	Jacob Keller <jacob.e.keller@intel.com>
Subject: Re: [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup
Date: Tue, 24 Jan 2023 17:11:29 -0800
Message-ID: <87tu0fh0zi.fsf@intel.com>
In-Reply-To: <20230120141537.1350744-1-vladimir.oltean@nxp.com>

Hi Vladimir,

Sorry for the delay. I had to sleep on this for a bit.

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> I realize that this patch set will start a flame war, but there are
> things about the mqprio qdisc that I simply don't understand, so in an
> attempt to explain how I think things should be done, I've made some
> patches to the code. I hope the reviewers will be patient enough with me :)
>
> I need to touch mqprio because I'm preparing a patch set for Frame
> Preemption (an IEEE 802.1Q feature). A disagreement started with
> Vinicius here:
> https://patchwork.kernel.org/project/netdevbpf/patch/20220816222920.1952936-3-vladimir.oltean@nxp.com/#24976672
>
> regarding how TX packet prioritization should be handled. Vinicius said
> that for some Intel NICs, prioritization at the egress scheduler stage
> is fundamentally attached to TX queues rather than traffic classes.
>
> In other words, in the "popular" mqprio configuration documented by him:
>
> $ tc qdisc replace dev $IFACE parent root handle 100 mqprio \
>       num_tc 3 \
>       map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
>       queues 1@0 1@1 2@2 \
>       hw 0
>
> there are 3 Linux traffic classes and 4 TX queues. The TX queues are
> organized in strict priority fashion, like this: TXQ 0 has highest prio
> (hardware dequeue precedence for TX scheduler), TXQ 3 has lowest prio.
> Packets classified by Linux to TC 2 are hashed between TXQ 2 and TXQ 3,
> but the hardware has higher precedence for TXQ 2 over TXQ 3, and Linux
> doesn't know that.
>
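
(As an aside, to make the example above concrete: the stack resolves
the map in two steps, skb->priority to a TC, then a flow hash within
that TC's queue range. A simplified sketch, mirroring what
netdev_get_prio_tc_map() and skb_tx_hash() do rather than quoting the
kernel code verbatim:

	int tc = netdev_get_prio_tc_map(dev, skb->priority); /* prio 3 -> TC 0 */
	u16 first = dev->tc_to_txq[tc].offset;	/* TC 2 -> offset 2 */
	u16 count = dev->tc_to_txq[tc].count;	/* TC 2 -> count 2 */

	/* flow hash within the TC's range: TC 2 -> TXQ 2 or TXQ 3 */
	u16 txq = first + reciprocal_scale(skb_get_hash(skb), count);

Nothing in this path knows that the hardware dequeues TXQ 2 ahead of
TXQ 3.)
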
> I am surprised by this fact, and this isn't how ENETC works at all.
> For ENETC, we try to prioritize on TCs rather than TXQs, and TC 7 has
> higher priority than TC 0. For us, groups of TXQs that map to the same
> TC have the same egress scheduling priority. It is possible (and maybe
> useful) to have 2 TXQs per TC (one TXQ per CPU). Patch 07/11 tries to
> make that clearer.
>

That makes me think that exposing "queues" in mqprio/taprio was perhaps
a mistake. If we only had the "prio to tc" map and relied on drivers
implementing .ndo_select_queue(), that would be less problematic. And
for devices with tens or hundreds of queues, this "no queues exposed to
the user" approach sounds like a better model. Anyway... just
wondering.

Perhaps something to think about for mqprio/taprio/etc "the next generation" ;-)
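
To illustrate, here is a hypothetical driver-side .ndo_select_queue()
for such a "no queues exposed" model. This is not from any in-tree
driver, and the "one TXQ per TC per CPU" layout is purely an assumption
for the sake of the example:

	#include <linux/netdevice.h>
	#include <linux/smp.h>

	/* Hypothetical: the driver allocated its TXQs as num_tc groups of
	 * nr_cpu_ids queues each. Only the prio->tc map comes from user
	 * space; which TXQ serves a TC stays a driver-internal detail.
	 */
	static u16 example_select_queue(struct net_device *dev,
					struct sk_buff *skb,
					struct net_device *sb_dev)
	{
		int tc = netdev_get_prio_tc_map(dev, skb->priority);

		return tc * nr_cpu_ids + raw_smp_processor_id();
	}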

> Furthermore (and this is really the biggest point of contention),
> Vinicius and I fundamentally disagree about whether the 802.1Qbv
> (taprio) gate mask should be passed to the device driver per TXQ or
> per TC. This is what patch 11/11 is about.
>

I think I was being annoying because I believed that some
implementation detail of the netdev prio_tc_map, and of the way the
netdev core selects TX queues (the "core of how mqprio works"), would
leak, and that it would be easier/more correct to make other vendors
adapt to the "Intel"/"queues have priorities" model. But I stand
corrected, as you (and others) have shown.

In short, I am not opposed to this idea. This capability operation
really opens up some possibilities. The patches look clean.
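
For queue-based hardware like igc, the per-TC gate mask from patch
11/11 would then need to be expanded into a per-TXQ one somewhere. A
minimal sketch of that conversion (a hypothetical helper, assuming the
netdev's tc_to_txq map is authoritative; not the actual sch_taprio.c
code):

	#include <linux/netdevice.h>
	#include <linux/bits.h>

	/* Hypothetical: bit i of @tc_gates means "TC i may transmit";
	 * the result has one bit set per TXQ that maps to an open TC.
	 */
	static u32 tc_gates_to_txq_gates(struct net_device *dev, u32 tc_gates)
	{
		u32 txq_gates = 0;
		int tc, i;

		for (tc = 0; tc < netdev_get_num_tc(dev); tc++) {
			if (!(tc_gates & BIT(tc)))
				continue;

			for (i = 0; i < dev->tc_to_txq[tc].count; i++)
				txq_gates |= BIT(dev->tc_to_txq[tc].offset + i);
		}

		return txq_gates;
	}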

I'll play with the patches later in the week; I'm quite swamped at the
moment.

> Again, I'm not *certain* that my opinion on this topic is correct
> (and it sure is confusing to see such a different approach for Intel).
> But I would appreciate any feedback.

And that reminds me: I owe you a beverage of your choice, for all your
effort.

>
> Vladimir Oltean (11):
>   net/sched: mqprio: refactor nlattr parsing to a separate function
>   net/sched: mqprio: refactor offloading and unoffloading to dedicated
>     functions
>   net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to
>     pkt_sched.h
>   net/sched: mqprio: allow offloading drivers to request queue count
>     validation
>   net/sched: mqprio: add extack messages for queue count validation
>   net: enetc: request mqprio to validate the queue counts
>   net: enetc: act upon the requested mqprio queue configuration
>   net/sched: taprio: pass mqprio queue configuration to ndo_setup_tc()
>   net: enetc: act upon mqprio queue config in taprio offload
>   net/sched: taprio: validate that gate mask does not exceed number of
>     TCs
>   net/sched: taprio: only calculate gate mask per TXQ for igc
>
>  drivers/net/ethernet/freescale/enetc/enetc.c  |  67 ++--
>  .../net/ethernet/freescale/enetc/enetc_qos.c  |  27 +-
>  drivers/net/ethernet/intel/igc/igc_main.c     |  17 +
>  include/net/pkt_cls.h                         |  10 -
>  include/net/pkt_sched.h                       |  16 +
>  net/sched/sch_mqprio.c                        | 298 +++++++++++-------
>  net/sched/sch_taprio.c                        |  57 ++--
>  7 files changed, 310 insertions(+), 182 deletions(-)
>
> -- 
> 2.34.1
>

Cheers,
-- 
Vinicius

Thread overview: 25+ messages
2023-01-20 14:15 [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 01/11] net/sched: mqprio: refactor nlattr parsing to a separate function Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 02/11] net/sched: mqprio: refactor offloading and unoffloading to dedicated functions Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 03/11] net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to pkt_sched.h Vladimir Oltean
2023-01-25 13:09   ` Kurt Kanzenbach
2023-01-25 13:16     ` Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 04/11] net/sched: mqprio: allow offloading drivers to request queue count validation Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 05/11] net/sched: mqprio: add extack messages for " Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 06/11] net: enetc: request mqprio to validate the queue counts Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 07/11] net: enetc: act upon the requested mqprio queue configuration Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 08/11] net/sched: taprio: pass mqprio queue configuration to ndo_setup_tc() Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 09/11] net: enetc: act upon mqprio queue config in taprio offload Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 10/11] net/sched: taprio: validate that gate mask does not exceed number of TCs Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 11/11] net/sched: taprio: only calculate gate mask per TXQ for igc Vladimir Oltean
2023-01-25  1:11   ` Vinicius Costa Gomes
2023-01-23 18:22 ` [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup Jacob Keller
2023-01-24 14:26   ` Vladimir Oltean
2023-01-24 22:30     ` Jacob Keller
2023-01-23 21:21 ` Gerhard Engleder
2023-01-23 21:31   ` Vladimir Oltean
2023-01-23 22:20     ` Gerhard Engleder
2023-01-25  1:11 ` Vinicius Costa Gomes [this message]
2023-01-25 13:10   ` Vladimir Oltean
2023-01-25 22:47     ` Vinicius Costa Gomes
2023-01-26 20:39       ` Vladimir Oltean
