From: Ido Schimmel <idosch@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Simon Horman <horms@kernel.org>, Amit Cohen <amcohen@nvidia.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Petr Machata <petrm@nvidia.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Network Development <netdev@vger.kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
bpf <bpf@vger.kernel.org>, mlxsw <mlxsw@nvidia.com>
Subject: Re: [PATCH net-next 00/12] mlxsw: Preparations for XDP support
Date: Mon, 17 Feb 2025 11:35:01 +0200
Message-ID: <Z7MCxTDyVWGpRtOv@shredder>
In-Reply-To: <20250215081043.063e995a@kernel.org>

On Sat, Feb 15, 2025 at 08:10:43AM -0800, Jakub Kicinski wrote:
> On Sat, 15 Feb 2025 14:02:52 +0000 Simon Horman wrote:
> > > TBH I also feel a little ambivalent about adding advanced software
> > > features to mlxsw. You have a dummy device off which you hang the NAPIs,
> > > the page pools, and now the RXQ objects. That already works poorly with
> > > our APIs. How are you going to handle the XDP side? Program per port,
> > > I hope? But the basic fact remains that only fallback traffic goes thru
> > > the XDP program which is not the normal Linux model, routing is after
> > > XDP.
> > >
> > > On one hand it'd be great if upstream switch drivers could benefit from
> > > the advanced features. On the other the HW is clearly not capable of
> > > delivering in line with how NICs work, so we're signing up for a stream
> > > of corner cases, bugs and incompatibility. Dunno.
> >
> > FWIIW, I do think that as this driver is actively maintained by the vendor,
> > and this is a grey zone, it is reasonable to allow the vendor to decide if
> > they want the burden of this complexity to gain some performance.
>
> Yes, I left this series in PW for an extra couple of days expecting
> a discussion but I suppose my email was taken as a final judgment.

Yes.

> The object separation can be faked more accurately, and analyzed
> (in the cover letter) to give us more confidence that the divergence
> won't create problems.

Unlike regular NICs, this device has more ports than Rx queues, so we
cannot associate an Rx queue with a net device. Like you said, this is
why NAPI instances and RXQ objects are associated with a dummy net
device. However, there are already drivers such as mtk that have the
same problem and do the same thing. The only API change that we made in
this regard is adding a net device argument to xdp_build_skb_from_buff()
instead of having it use rxq->dev.
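
To illustrate why the device has to be passed in, here is a minimal
userspace sketch. All of the names are hypothetical stand-ins and not
the kernel's structures or the actual signature from the series. It
only shows that a queue shared by several ports can report nothing but
the dummy device it was registered with, so the caller must supply the
real one.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's structures. */
struct sketch_netdev {
	const char *name;
};

struct sketch_rxq {
	struct sketch_netdev *dev;	/* device the queue was registered with */
};

struct sketch_skb {
	struct sketch_netdev *dev;
};

/* Device taken from the queue: only right when a queue has one owner. */
static void sketch_build_from_rxq(struct sketch_skb *skb,
				  const struct sketch_rxq *rxq)
{
	skb->dev = rxq->dev;
}

/* Device passed by the caller, who resolved it from the Rx port. */
static void sketch_build_with_dev(struct sketch_skb *skb,
				  struct sketch_netdev *dev)
{
	skb->dev = dev;
}
```

With a queue registered against the dummy device, the first helper
always yields the dummy device, while the second yields whichever port
device the caller resolved.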

Regarding the invocation of XDP programs, they are of course invoked on
a per-port basis. It's just that the driver first needs to look up the
XDP program in an internal array based on the Rx port in the completion
info.
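
The lookup described above could look roughly like the following
userspace sketch. Every identifier here (the sketch_* types, the array
size, the field names) is an assumption made for illustration and is
not taken from the mlxsw code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from the driver. */
#define SKETCH_MAX_PORTS 8

struct sketch_xdp_prog {
	int id;
};

struct sketch_cqe {
	uint16_t rx_local_port;	/* Rx port reported in the completion info */
};

struct sketch_pci {
	/* One slot per port; NULL means no program is attached to it. */
	struct sketch_xdp_prog *port_prog[SKETCH_MAX_PORTS];
};

/* Return the program attached to the port the packet arrived on, or NULL
 * if the port is out of range or has no program attached.
 */
static struct sketch_xdp_prog *
sketch_prog_lookup(const struct sketch_pci *pci, const struct sketch_cqe *cqe)
{
	if (cqe->rx_local_port >= SKETCH_MAX_PORTS)
		return NULL;
	return pci->port_prog[cqe->rx_local_port];
}
```

A NULL result would mean the packet takes the regular Rx path with no
program run on it.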

Regarding motivation, one use case we thought about is telemetry. For
example, today you can configure a tc filter with a sample action that
will mirror one out of N packets to the CPU. The driver identifies such
packets according to the trap ID in the completion info and then passes
them to the psample module with various metadata that it extracted from
the completion info (e.g., latency, egress queue occupancy, if sampled
on egress). Some users don't want to process these packets locally, but
instead have them sent together with the metadata to a server for
processing. If XDP programs had access to this metadata we could do this
on the CPU with relatively low overhead. However, this is not supported
with tc-bpf, so you might tell me that it shouldn't be supported with
XDP either.

Thread overview: 22+ messages
2025-02-04 11:04 [PATCH net-next 00/12] mlxsw: Preparations for XDP support Petr Machata
2025-02-04 11:04 ` [PATCH net-next 01/12] mlxsw: core: Remove debug prints Petr Machata
2025-02-04 11:04 ` [PATCH net-next 02/12] mlxsw: Check Rx local port in PCI code Petr Machata
2025-02-04 11:04 ` [PATCH net-next 03/12] mlxsw: Add struct mlxsw_pci_rx_pkt_info Petr Machata
2025-02-04 11:04 ` [PATCH net-next 04/12] mlxsw: pci: Use mlxsw_pci_rx_pkt_info Petr Machata
2025-02-04 11:05 ` [PATCH net-next 05/12] mlxsw: pci: Add a separate function for syncing buffers for CPU Petr Machata
2025-02-04 11:05 ` [PATCH net-next 06/12] mlxsw: pci: Store maximum number of ports Petr Machata
2025-02-04 11:05 ` [PATCH net-next 07/12] mlxsw: pci: Add PCI ports array Petr Machata
2025-02-04 11:05 ` [PATCH net-next 08/12] mlxsw: Add APIs to init/fini PCI port Petr Machata
2025-02-04 11:05 ` [PATCH net-next 09/12] mlxsw: pci: Initialize XDP Rx queue info per RDQ Petr Machata
2025-02-04 11:05 ` [PATCH net-next 10/12] mlxsw: spectrum: Initialize PCI port with the relevant netdevice Petr Machata
2025-02-04 11:05 ` [PATCH net-next 11/12] mlxsw: Set some SKB fields in bus driver Petr Machata
2025-02-04 11:05 ` [PATCH net-next 12/12] mlxsw: Validate local port from CQE in PCI code Petr Machata
2025-02-04 15:56 ` [PATCH net-next 00/12] mlxsw: Preparations for XDP support Alexei Starovoitov
2025-02-04 15:59 ` Amit Cohen
2025-02-04 16:02 ` Alexei Starovoitov
2025-02-04 17:26 ` Amit Cohen
2025-02-05 17:09 ` Jakub Kicinski
2025-02-15 14:02 ` Simon Horman
2025-02-15 16:10 ` Jakub Kicinski
2025-02-16 9:26 ` Simon Horman
2025-02-17 9:35 ` Ido Schimmel [this message]