From: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
To: Vladimir Oltean <olteanv@gmail.com>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>,
Oleksij Rempel <linux@rempel-privat.de>,
netdev@vger.kernel.org, andrew@lunn.ch, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
horms@kernel.org, nbd@nbd.name, sean.wang@mediatek.com,
Mark-MC.Lee@mediatek.com, lorenzo.bianconi83@gmail.com
Subject: Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
Date: Tue, 17 Dec 2024 10:11:46 +0100 [thread overview]
Message-ID: <Z2FAUuOh4jrA0uGu@lore-desk> (raw)
In-Reply-To: <20241216231311.odozs4eki7bbagwp@skbuf>
[-- Attachment #1: Type: text/plain, Size: 11264 bytes --]
> On Mon, Dec 16, 2024 at 11:28:18PM +0100, Lorenzo Bianconi wrote:
> > > ndo_setup_tc_conduit() does not have the same instruments to offload
> > > what port_setup_tc() can offload. It is not involved in all data paths
> > > that port_setup_tc() has to handle. Please ack this. So if port_setup_tc()
> >
> > Can you please elaborate on this? Both (->ndo_setup_tc_conduit() and
> > ->port_setup_tc()) refer to the same DSA user port (please take a look the
> > callback signature).
>
> I'd be just repeating what I've said a few times before. Your proposed
> ndo_setup_tc_conduit() appears to be configuring conduit resources
> (QDMA, GDM) for mt7530 user port tc offload, as if it is in complete and
> exclusive control of the user port data path. But as long as there are
> packets in the user port data path that bypass those conduit QoS resources
> (like for example mt7530 switch forwards packet from one port to another,
> bypassing the GDM1 in your drawing[1]), it isn't a good model. Forget
> about ndo_setup_tc_conduit(), it isn't a good tc command to run in the
> first place. The tc command you're trying to make to do what you want is
> supposed to _also_ include the mt7530 packets forwarded from one port to
> another in its QoS mix. It applies at the _egress_ of the mt7530 port.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23020f04932701d5c8363e60756f12b43b8ed752
>
> Let me try to add some squiggles based on your diagram, to clarify what
> is my understanding and complaint.
>
> ┌───────┐ ┌───────┐
> │ QDMA2 │ │ QDMA1 │
> └───┬───┘ └───┬───┘
> │ │
> ┌───────▼───────────────────────────────────────────▼────────┐
> │ │
> │ P5 P0 │
> │ │
> │ │
> │ │ ┌──────┐
> │ P3 ├────► GDM3 │
> │ │
> ┌─────┐ │ │
> │ PPE ◄────┤ P4 PSE │
> └─────┘ │ │
> │ │ ┌──────┐
> │ P9 ├────► GDM4 │
> │ │ └──────┘
> │ │
> │ P2 P1 │
> └─────────┬─────────────────────────────────────────┬────────┘
> │ │
> ┌───▼──┐ ┌──▼───┐
> │ GDM2 │ │ GDM1 │
> └──────┘ └──┬───┘
> │
> ┌──────────────▼───────────────┐
> │ CPU port │
> │ ┌─────────┘ │
> │ │ MT7530 │
> │ │ │
> │ ▼ x │
> │ ┌─────┐ ┌─┘ │
> │ lan1 lan2 lan3 lan4 │
> └───│──────────────────────────┘
> ▼
>
> When you add an offloaded Qdisc to the egress of lan1, the expectation
> is that packets from lan2 obey it too (offloaded tc goes hand in hand
> with offloaded bridge). Whereas, by using GDM1/QDMA resources, you are
> breaking that expectation, because packets from lan2 bridged by MT7530
> don't go to GDM1 (the "x").
ack, I got your point. I was assuming to cover this case (traffic from lan2 to
lan1) maintaining the port_setup_tc() callback in dsa_user_setup_qdisc() (this
traffic is not managed by ndo_setup_tc_conduit() callback). If this approach is
not ok, I guess we will need to revisit the approach.
>
> > > returns -EOPNOTSUPP, the entire dsa_user_setup_qdisc() should return
> > > -EOPNOTSUPP, UNLESS you install packet traps on all other offloaded data
> > > paths in the switch, such that all packets that egress the DSA user port
> > > are handled by ndo_setup_tc_conduit()'s instruments.
> >
> > Uhm, do you mean we are changing the user expected result in this way?
> > It seems to me the only case we are actually changing is if port_setup_tc()
> > callback is not supported by the DSA switch driver while ndo_setup_tc_conduit()
> > one is supported by the mac chip. In this case the previous implementation
> > returns -EOPNOTSUPP while the proposed one does not report any error.
> > Do we really care about this case? If so, I guess we can rework
> > dsa_user_setup_qdisc().
>
> See above, there's nothing to rework.
>
> > > > nope, I am not saying the current Qdisc DSA infrastructure is wrong, it just
> > > > does not allow to exploit all hw capabilities available on EN7581 when the
> > > > traffic is routed from the WAN port to a given DSA switch port.
> > >
> > > And I don't believe it should, in this way.
> >
> > Can you please elaborate on this? IIUC it seems even Oleksij has a use-case
> > for this.
>
> See above, I'm also waiting for Oleksij's answer but I don't expect you
> 2 to be talking about the same thing. If there's some common infrastructure
> to be shared, my understanding is it has nothing to do with ndo_setup_tc_conduit().
ack
>
> > > We need something as the root Qdisc of the conduit which exposes its
> > > hardware capabilities. I just assumed that would be a simple (and single)
> > > ETS, you can correct me if I am wrong.
> > >
> > > On conduit egress, what is the arbitration scheme between the traffic
> > > destined towards each DSA user port (channel, as the driver calls them)?
> > > How can this be best represented?
> >
> > The EN7581 supports up to 32 different 'channels' (each of them support 8
> > different hw queues). You can define an ETS and/or TBF Qdisc for each channel.
> > My idea is to associate a channel to each DSA switch port, so the user can
> > define independent QoS policies for each DSA ports (e.g. shape at 100Mbps lan0,
> > apply ETS on lan1, ...) configuring the mac chip instead of the hw switch.
> > The kernel (if the traffic is not offloaded) or the PPE block (if the traffic
> > is offloaded) updates the channel and queue information in the DMA descriptor
> > (please take a look to [0] for the first case).
>
> But you call it a MAC chip because between the GDM1 and the MT7530 there's
> an in-chip Ethernet MAC (GMII netlist), with a fixed packet rate, right?
With "mac chip" I mean the set of PSE/PPE and QDMA blocks in the diagram
(what is managed by airoha_eth driver). There is no other chip in between
of GDM1 and MT7530 switch (sorry for the confusion).
> I'm asking again, are the channels completely independent of one another,
> or are they consuming shared bandwidth in a way that with your proposal
> is just not visible? If there is a GMII between the GDM1 and the MT7530,
> how come the bandwidth between the channels is not shared in any way?
Channels are logically independent.
GDM1 is connected to the MT7530 switch via a fixed speed link (10Gbps, similar
to what we have for other MediaTek chipset, like MT7988 [0]). The fixed link speed
is higher than the sum of DSA port link speeds (on my development boards I have
4 DSA ports @ 1Gbps);
> And if there is no GMII or similar MAC interface, we need to take 100
> steps back and discuss why was the DSA model chosen for this switch, and
> not a freeform switchdev driver where the conduit is not discrete?
>
> I'm not sure what to associate these channels with. Would it be wrong to
> think of each channel as a separate DSA conduit? Because for example there
> is API to customize the user port <-> conduit assignment.
>
> > > IIUC, in your patch set, you expose the conduit hardware QoS capabilities
> > > as if they can be perfectly virtualized among DSA user ports, and as if
> > > each DSA user port can have its own ETS root Qdisc, completely independent
> > > of each other, as if the packets do not serialize on the conduit <-> CPU
> > > port link, and as if that is not a bottleneck. Is that really the case?
> >
> > correct
>
> Very interesting but will need more than one word for an explanation :)
I mean your paragraph above well describes hw architecture.
>
> > > If so (but please explain how), maybe you really need your own root Qdisc
> > > driver, with one class per DSA user port, and those classes have ETS
> > > attached to them.
> >
> > Can you please clarify what do you mean with 'root Qdisc driver'?
>
> Quite literally, an implementer of struct Qdisc_ops whose parent can
> only be TC_H_ROOT. I was implying you'd have to create an abstract
> software model of the QoS capabilities of the QDMA and the GDM port such
> that we all understand them, a netlink attribute scheme for configuring
> those QoS parameters, and then a QoS offload mechanism through which
> they are communicated to compatible hardware. But let's leave that aside
> until it becomes more clear what you have.
>
ack, fine.
Regards,
Lorenzo
[0] https://github.com/openwrt/openwrt/blob/main/target/linux/mediatek/files-6.6/arch/arm64/boot/dts/mediatek/mt7988a.dtsi#L1531
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
next prev parent reply other threads:[~2024-12-17 9:11 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-11 15:31 [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC Lorenzo Bianconi
2024-12-11 15:31 ` [RFC net-next 1/5] net: airoha: Enable Tx drop capability for each Tx DMA ring Lorenzo Bianconi
2024-12-11 15:31 ` [RFC net-next 2/5] net: airoha: Introduce ndo_select_queue callback Lorenzo Bianconi
2024-12-11 15:31 ` [RFC net-next 3/5] net: dsa: Introduce ndo_setup_tc_conduit callback Lorenzo Bianconi
2024-12-11 15:31 ` [RFC net-next 4/5] net: airoha: Add sched ETS offload support Lorenzo Bianconi
2024-12-12 14:37 ` Davide Caratti
2024-12-12 17:04 ` Lorenzo Bianconi
2024-12-11 15:31 ` [RFC net-next 5/5] net: airoha: Add sched TBF " Lorenzo Bianconi
2024-12-11 15:41 ` [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC Vladimir Oltean
2024-12-12 9:19 ` Lorenzo Bianconi
2024-12-12 15:06 ` Vladimir Oltean
2024-12-12 17:03 ` Lorenzo Bianconi
2024-12-12 18:46 ` Vladimir Oltean
2024-12-16 12:09 ` Lorenzo Bianconi
2024-12-16 15:49 ` Vladimir Oltean
2024-12-16 18:14 ` Oleksij Rempel
2024-12-16 19:01 ` Lorenzo Bianconi
2024-12-16 19:23 ` Oleksij Rempel
2024-12-16 21:44 ` Lorenzo Bianconi
2024-12-17 8:46 ` Oleksij Rempel
2024-12-16 19:46 ` Vladimir Oltean
2024-12-16 22:28 ` Lorenzo Bianconi
2024-12-16 23:13 ` Vladimir Oltean
2024-12-17 9:11 ` Lorenzo Bianconi [this message]
2024-12-17 9:30 ` Vladimir Oltean
2024-12-17 10:01 ` Lorenzo Bianconi
2024-12-17 10:17 ` Vladimir Oltean
2024-12-17 10:23 ` Oleksij Rempel
2024-12-16 23:24 ` Andrew Lunn
2024-12-17 9:38 ` Oleksij Rempel
2024-12-17 11:54 ` Vladimir Oltean
2024-12-17 12:22 ` Oleksij Rempel
2024-12-17 13:28 ` Vladimir Oltean
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z2FAUuOh4jrA0uGu@lore-desk \
--to=lorenzo.bianconi@redhat.com \
--cc=Mark-MC.Lee@mediatek.com \
--cc=andrew@lunn.ch \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=linux@rempel-privat.de \
--cc=lorenzo.bianconi83@gmail.com \
--cc=lorenzo@kernel.org \
--cc=nbd@nbd.name \
--cc=netdev@vger.kernel.org \
--cc=olteanv@gmail.com \
--cc=pabeni@redhat.com \
--cc=sean.wang@mediatek.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).