netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
To: Vladimir Oltean <olteanv@gmail.com>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>,
	Oleksij Rempel <linux@rempel-privat.de>,
	netdev@vger.kernel.org, andrew@lunn.ch, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	horms@kernel.org, nbd@nbd.name, sean.wang@mediatek.com,
	Mark-MC.Lee@mediatek.com, lorenzo.bianconi83@gmail.com
Subject: Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
Date: Tue, 17 Dec 2024 10:11:46 +0100	[thread overview]
Message-ID: <Z2FAUuOh4jrA0uGu@lore-desk> (raw)
In-Reply-To: <20241216231311.odozs4eki7bbagwp@skbuf>

[-- Attachment #1: Type: text/plain, Size: 11264 bytes --]

> On Mon, Dec 16, 2024 at 11:28:18PM +0100, Lorenzo Bianconi wrote:
> > > ndo_setup_tc_conduit() does not have the same instruments to offload
> > > what port_setup_tc() can offload. It is not involved in all data paths
> > > that port_setup_tc() has to handle. Please ack this. So if port_setup_tc()
> > 
> > Can you please elaborate on this? Both (->ndo_setup_tc_conduit() and
> > ->port_setup_tc()) refer to the same DSA user port (please take a look the
> > callback signature).
> 
> I'd be just repeating what I've said a few times before. Your proposed
> ndo_setup_tc_conduit() appears to be configuring conduit resources
> (QDMA, GDM) for mt7530 user port tc offload, as if it is in complete and
> exclusive control of the user port data path. But as long as there are
> packets in the user port data path that bypass those conduit QoS resources
> (like for example mt7530 switch forwards packet from one port to another,
> bypassing the GDM1 in your drawing[1]), it isn't a good model. Forget
> about ndo_setup_tc_conduit(), it isn't a good tc command to run in the
> first place. The tc command you're trying to make to do what you want is
> supposed to _also_ include the mt7530 packets forwarded from one port to
> another in its QoS mix. It applies at the _egress_ of the mt7530 port.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23020f04932701d5c8363e60756f12b43b8ed752
> 
> Let me try to add some squiggles based on your diagram, to clarify what
> is my understanding and complaint.
> 
>                ┌───────┐                                   ┌───────┐
>                │ QDMA2 │                                   │ QDMA1 │
>                └───┬───┘                                   └───┬───┘
>                    │                                           │
>            ┌───────▼───────────────────────────────────────────▼────────┐
>            │                                                            │
>            │       P5                                          P0       │
>            │                                                            │
>            │                                                            │
>            │                                                            │    ┌──────┐
>            │                                                         P3 ├────► GDM3 │
>            │                                                            │
> ┌─────┐    │                                                            │
> │ PPE ◄────┤ P4                        PSE                              │
> └─────┘    │                                                            │
>            │                                                            │    ┌──────┐
>            │                                                         P9 ├────► GDM4 │
>            │                                                            │    └──────┘
>            │                                                            │
>            │        P2                                         P1       │
>            └─────────┬─────────────────────────────────────────┬────────┘
>                      │                                         │
>                  ┌───▼──┐                                   ┌──▼───┐
>                  │ GDM2 │                                   │ GDM1 │
>                  └──────┘                                   └──┬───┘
>                                                                │
>                                                 ┌──────────────▼───────────────┐
>                                                 │            CPU port          │
>                                                 │   ┌─────────┘                │
>                                                 │   │         MT7530           │
>                                                 │   │                          │
>                                                 │   ▼         x                │
>                                                 │   ┌─────┐ ┌─┘                │
>                                                 │  lan1  lan2  lan3  lan4      │
>                                                 └───│──────────────────────────┘
>                                                     ▼
> 
> When you add an offloaded Qdisc to the egress of lan1, the expectation
> is that packets from lan2 obey it too (offloaded tc goes hand in hand
> with offloaded bridge). Whereas, by using GDM1/QDMA resources, you are
> breaking that expectation, because packets from lan2 bridged by MT7530
> don't go to GDM1 (the "x").

ack, I got your point. I was assuming to cover this case (traffic from lan2 to
lan1) maintaining the port_setup_tc() callback in dsa_user_setup_qdisc() (this
traffic is not managed by ndo_setup_tc_conduit() callback). If this approach is
not ok, I guess we will need to revisit the approach.

> 
> > > returns -EOPNOTSUPP, the entire dsa_user_setup_qdisc() should return
> > > -EOPNOTSUPP, UNLESS you install packet traps on all other offloaded data
> > > paths in the switch, such that all packets that egress the DSA user port
> > > are handled by ndo_setup_tc_conduit()'s instruments.
> > 
> > Uhm, do you mean we are changing the user expected result in this way?
> > It seems to me the only case we are actually changing is if port_setup_tc()
> > callback is not supported by the DSA switch driver while ndo_setup_tc_conduit()
> > one is supported by the mac chip. In this case the previous implementation
> > returns -EOPNOTSUPP while the proposed one does not report any error.
> > Do we really care about this case? If so, I guess we can rework
> > dsa_user_setup_qdisc().
> 
> See above, there's nothing to rework.
> 
> > > > nope, I am not saying the current Qdisc DSA infrastructure is wrong, it just
> > > > does not allow to exploit all hw capabilities available on EN7581 when the
> > > > traffic is routed from the WAN port to a given DSA switch port.
> > > 
> > > And I don't believe it should, in this way.
> > 
> > Can you please elaborate on this? IIUC it seems even Oleksij has a use-case
> > for this.
> 
> See above, I'm also waiting for Oleksij's answer but I don't expect you
> 2 to be talking about the same thing. If there's some common infrastructure
> to be shared, my understanding is it has nothing to do with ndo_setup_tc_conduit().

ack

> 
> > > We need something as the root Qdisc of the conduit which exposes its
> > > hardware capabilities. I just assumed that would be a simple (and single)
> > > ETS, you can correct me if I am wrong.
> > > 
> > > On conduit egress, what is the arbitration scheme between the traffic
> > > destined towards each DSA user port (channel, as the driver calls them)?
> > > How can this be best represented?
> > 
> > The EN7581 supports up to 32 different 'channels' (each of them support 8
> > different hw queues). You can define an ETS and/or TBF Qdisc for each channel.
> > My idea is to associate a channel to each DSA switch port, so the user can
> > define independent QoS policies for each DSA ports (e.g. shape at 100Mbps lan0,
> > apply ETS on lan1, ...) configuring the mac chip instead of the hw switch.
> > The kernel (if the traffic is not offloaded) or the PPE block (if the traffic
> > is offloaded) updates the channel and queue information in the DMA descriptor
> > (please take a look to [0] for the first case).
> 
> But you call it a MAC chip because between the GDM1 and the MT7530 there's
> an in-chip Ethernet MAC (GMII netlist), with a fixed packet rate, right?

With "mac chip" I mean the set of PSE/PPE and QDMA blocks in the diagram
(what is managed by airoha_eth driver). There is no other chip in between
of GDM1 and MT7530 switch (sorry for the confusion).

> I'm asking again, are the channels completely independent of one another,
> or are they consuming shared bandwidth in a way that with your proposal
> is just not visible? If there is a GMII between the GDM1 and the MT7530,
> how come the bandwidth between the channels is not shared in any way?

Channels are logically independent.
GDM1 is connected to the MT7530 switch via a fixed speed link (10Gbps, similar
to what we have for other MediaTek chipset, like MT7988 [0]). The fixed link speed
is higher than the sum of DSA port link speeds (on my development boards I have
4 DSA ports @ 1Gbps);

> And if there is no GMII or similar MAC interface, we need to take 100
> steps back and discuss why was the DSA model chosen for this switch, and
> not a freeform switchdev driver where the conduit is not discrete?
> 
> I'm not sure what to associate these channels with. Would it be wrong to
> think of each channel as a separate DSA conduit? Because for example there
> is API to customize the user port <-> conduit assignment.
> 
> > > IIUC, in your patch set, you expose the conduit hardware QoS capabilities
> > > as if they can be perfectly virtualized among DSA user ports, and as if
> > > each DSA user port can have its own ETS root Qdisc, completely independent
> > > of each other, as if the packets do not serialize on the conduit <-> CPU
> > > port link, and as if that is not a bottleneck. Is that really the case?
> > 
> > correct
> 
> Very interesting but will need more than one word for an explanation :)

I mean your paragraph above well describes hw architecture.

> 
> > > If so (but please explain how), maybe you really need your own root Qdisc
> > > driver, with one class per DSA user port, and those classes have ETS
> > > attached to them.
> > 
> > Can you please clarify what do you mean with 'root Qdisc driver'?
> 
> Quite literally, an implementer of struct Qdisc_ops whose parent can
> only be TC_H_ROOT. I was implying you'd have to create an abstract
> software model of the QoS capabilities of the QDMA and the GDM port such
> that we all understand them, a netlink attribute scheme for configuring
> those QoS parameters, and then a QoS offload mechanism through which
> they are communicated to compatible hardware. But let's leave that aside
> until it becomes more clear what you have.
> 

ack, fine.

Regards,
Lorenzo

[0] https://github.com/openwrt/openwrt/blob/main/target/linux/mediatek/files-6.6/arch/arm64/boot/dts/mediatek/mt7988a.dtsi#L1531

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

  reply	other threads:[~2024-12-17  9:11 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-11 15:31 [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC Lorenzo Bianconi
2024-12-11 15:31 ` [RFC net-next 1/5] net: airoha: Enable Tx drop capability for each Tx DMA ring Lorenzo Bianconi
2024-12-11 15:31 ` [RFC net-next 2/5] net: airoha: Introduce ndo_select_queue callback Lorenzo Bianconi
2024-12-11 15:31 ` [RFC net-next 3/5] net: dsa: Introduce ndo_setup_tc_conduit callback Lorenzo Bianconi
2024-12-11 15:31 ` [RFC net-next 4/5] net: airoha: Add sched ETS offload support Lorenzo Bianconi
2024-12-12 14:37   ` Davide Caratti
2024-12-12 17:04     ` Lorenzo Bianconi
2024-12-11 15:31 ` [RFC net-next 5/5] net: airoha: Add sched TBF " Lorenzo Bianconi
2024-12-11 15:41 ` [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC Vladimir Oltean
2024-12-12  9:19   ` Lorenzo Bianconi
2024-12-12 15:06     ` Vladimir Oltean
2024-12-12 17:03       ` Lorenzo Bianconi
2024-12-12 18:46         ` Vladimir Oltean
2024-12-16 12:09           ` Lorenzo Bianconi
2024-12-16 15:49             ` Vladimir Oltean
2024-12-16 18:14               ` Oleksij Rempel
2024-12-16 19:01               ` Lorenzo Bianconi
2024-12-16 19:23                 ` Oleksij Rempel
2024-12-16 21:44                   ` Lorenzo Bianconi
2024-12-17  8:46                     ` Oleksij Rempel
2024-12-16 19:46                 ` Vladimir Oltean
2024-12-16 22:28                   ` Lorenzo Bianconi
2024-12-16 23:13                     ` Vladimir Oltean
2024-12-17  9:11                       ` Lorenzo Bianconi [this message]
2024-12-17  9:30                         ` Vladimir Oltean
2024-12-17 10:01                           ` Lorenzo Bianconi
2024-12-17 10:17                             ` Vladimir Oltean
2024-12-17 10:23                               ` Oleksij Rempel
2024-12-16 23:24                 ` Andrew Lunn
2024-12-17  9:38                   ` Oleksij Rempel
2024-12-17 11:54                     ` Vladimir Oltean
2024-12-17 12:22                       ` Oleksij Rempel
2024-12-17 13:28                         ` Vladimir Oltean

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z2FAUuOh4jrA0uGu@lore-desk \
    --to=lorenzo.bianconi@redhat.com \
    --cc=Mark-MC.Lee@mediatek.com \
    --cc=andrew@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux@rempel-privat.de \
    --cc=lorenzo.bianconi83@gmail.com \
    --cc=lorenzo@kernel.org \
    --cc=nbd@nbd.name \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=pabeni@redhat.com \
    --cc=sean.wang@mediatek.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).