From: Tariq Toukan <tariqt@nvidia.com>
To: Leon Romanovsky <leon@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>,
"Saeed Mahameed" <saeedm@nvidia.com>,
Tariq Toukan <tariqt@nvidia.com>
Cc: Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Mark Bloch <mbloch@nvidia.com>, <linux-kernel@vger.kernel.org>,
<linux-rdma@vger.kernel.org>, <netdev@vger.kernel.org>,
Gal Pressman <gal@nvidia.com>,
Dragos Tatulea <dtatulea@nvidia.com>,
Moshe Shemesh <moshe@nvidia.com>, Shay Drory <shayd@nvidia.com>,
Alexei Lazar <alazar@nvidia.com>
Subject: [PATCH mlx5-next V2 0/9] mlx5-next updates 2026-03-09
Date: Mon, 9 Mar 2026 11:34:26 +0200
Message-ID: <20260309093435.1850724-1-tariqt@nvidia.com>

Hi,

This series contains shared mlx5 updates in preparation for upcoming
features.

The first patch, by Alex, contains IFC changes in preparation for an
upcoming feature.

The last patch moves a definition to expose a HW constant, so that it
can later be used by the core and Eth drivers as well.

Patches 2 to 8, by Shay, introduce mlx5 infrastructure for SD switchdev
and LAG support. A detailed description by Shay follows below.

Regards,
Tariq

This series adds shared infrastructure to enable the Socket Direct (SD)
single-netdev switchdev transition and LAG support in subsequent patches.

Currently, LAG is not supported in Socket Direct configurations, and
BlueField-3/4 devices that use SD for North-South traffic operate with
two distinct eSwitches per physical port. This forces the use of separate
IPs and MAC addresses for each NUMA node, complicates network
configuration, and requires firmware to handle MPFS with different
inner and outer packets for communication.

The goal is to expose a single external IP address (single MAC address)
per physical port while maintaining SD's bandwidth and latency benefits.
This means having a single eSwitch per physical port, managing all
physical ports via a merged eSwitch with multiple vports. This enables
the creation of a single FDB, which results in a single RDMA device to
be used by DOCA/HWS/OVS.

To achieve this, the LAG infrastructure needs changes, since the current
implementation assumes a fixed mapping between device indices and LAG
ports, which breaks with SD's multi-device-per-port model.
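
A small userspace model, not mlx5 code, of why a fixed mapping breaks:
if a per-LAG slot is derived from the physical port, a second SD device
behind the same port has no slot of its own, whereas a lookup keyed by a
per-device identifier such as vhca_id (the key this series adopts for
peer_miss_rule) never collides. struct sd_dev, place_by_port() and
find_by_vhca() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PORTS 2

/* Hypothetical model of an SD device; this is NOT an mlx5 structure. */
struct sd_dev {
	int phys_port;	/* physical port the device sits behind */
	int vhca_id;	/* unique per device */
};

/*
 * Fixed-mapping placement: the slot is derived from the physical port.
 * Returns how many devices were placed before the first collision.
 */
static int place_by_port(const struct sd_dev *devs, int n,
			 const struct sd_dev *slots[NUM_PORTS])
{
	int i;

	for (i = 0; i < NUM_PORTS; i++)
		slots[i] = NULL;
	for (i = 0; i < n; i++) {
		int slot = devs[i].phys_port;

		assert(slot >= 0 && slot < NUM_PORTS);
		if (slots[slot])
			break;	/* slot already taken by a sibling SD device */
		slots[slot] = &devs[i];
	}
	return i;
}

/* Keyed lookup: every device has its own vhca_id, so nothing collides. */
static const struct sd_dev *find_by_vhca(const struct sd_dev *devs, int n,
					 int vhca_id)
{
	for (int i = 0; i < n; i++)
		if (devs[i].vhca_id == vhca_id)
			return &devs[i];
	return NULL;
}
```

For two SD devices behind physical port 0, place_by_port() places only
the first one, while find_by_vhca() still resolves each device by its
own vhca_id.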

This series prepares the groundwork by:

1. Adding IFC bits for silent mode query and the VHCA RX destination
   type, needed for SD device coordination and cross-VHCA traffic
   steering.
2. Converting the LAG pf array to an xarray and using xa_alloc for
   dynamic index management. This decouples LAG indexing from physical
   device indices, allowing flexible device membership.
3. Converting the peer_miss_rule array to an xarray keyed by vhca_id.
4. Introducing a LAG variant of the device index helpers that produces
   unique identifiers even when multiple devices share the same
   physical port.
5. Adding VHCA RX flow destination support for steering traffic to a
   specific VHCA's receive path.
6. Moving LAG demux table ownership to the LAG layer, with APIs for
   SW-only LAG modes where firmware cannot create the demux table.
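
The dynamic index management in item 2 can be pictured with another
userspace model, again not the driver's code: indices are handed out the
way xa_alloc() does it, taking the lowest free index within a limit
range, and are released on removal, so a device's LAG index depends on
membership rather than on its physical device index. struct lag_table,
lag_table_alloc(), lag_table_erase() and LAG_MAX_DEVS are hypothetical;
in the kernel this role is played by an allocating xarray.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define LAG_MAX_DEVS 8	/* hypothetical capacity for this model */

/* Userspace stand-in for an allocating xarray: index -> device. */
struct lag_table {
	void *entries[LAG_MAX_DEVS];
};

/*
 * Mimics xa_alloc(): store @entry at the lowest free index in
 * [min, max] and report that index through @id. Returns 0 or -EBUSY.
 * NULL marks a free slot here, so @entry must be non-NULL (unlike
 * xa_alloc(), which can also reserve an index for a NULL entry).
 */
static int lag_table_alloc(struct lag_table *t, unsigned int *id,
			   void *entry, unsigned int min, unsigned int max)
{
	assert(entry && min <= max && max < LAG_MAX_DEVS);
	for (unsigned int i = min; i <= max; i++) {
		if (!t->entries[i]) {
			t->entries[i] = entry;
			*id = i;
			return 0;
		}
	}
	return -EBUSY;	/* no free index in the requested range */
}

/* Mimics xa_erase(): free the index and return the old entry. */
static void *lag_table_erase(struct lag_table *t, unsigned int id)
{
	void *old;

	assert(id < LAG_MAX_DEVS);
	old = t->entries[id];
	t->entries[id] = NULL;
	return old;
}
```

For example, if devices a and b join and a then leaves, the next device
to join receives index 0 again, regardless of which physical function it
is.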

A follow-up series will build on this infrastructure to implement:
- SD single-netdev switchdev mode transition with a shared FDB
  corresponding to the SD group.
- LAG support, enabling bonding of SD groups.

Since the follow-up series is large (~20 patches), the shared code
between RDMA and net is sent in advance to avoid overloading the
shared branch tree.

V2:
- Add one more patch, #9.
- Use kvfree() instead of kfree() in mlx5_esw_lag_demux_rule_create().
- Fix a condition check to use > instead of >= in
  mlx5_ib_set_vport_rep().
- Fix the author of patch #4.
- Link to V1: https://lore.kernel.org/all/20260308065559.1837449-1-tariqt@nvidia.com/

Alexei Lazar (1):
  net/mlx5: Add IFC bits for shared headroom pool PBMC support

Shay Drory (7):
  net/mlx5: Add silent mode set/query and VHCA RX IFC bits
  net/mlx5: LAG, replace pf array with xarray
  net/mlx5: LAG, use xa_alloc to manage LAG device indices
  net/mlx5: E-switch, modify peer miss rule index to vhca_id
  net/mlx5: LAG, replace mlx5_get_dev_index with LAG sequence number
  net/mlx5: Add VHCA RX flow destination support for FW steering
  {net/RDMA}/mlx5: Add LAG demux table API and vport demux rules

Tariq Toukan (1):
  net/mlx5: Expose MLX5_UMR_ALIGN definition

drivers/infiniband/hw/mlx5/ib_rep.c | 24 +-
drivers/infiniband/hw/mlx5/main.c | 21 +-
drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 -
drivers/infiniband/hw/mlx5/mr.c | 1 -
.../mellanox/mlx5/core/diag/fs_tracepoint.c | 3 +
.../net/ethernet/mellanox/mlx5/core/en_tc.c | 9 +-
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 14 +-
.../mellanox/mlx5/core/eswitch_offloads.c | 103 ++-
.../net/ethernet/mellanox/mlx5/core/fs_cmd.c | 6 +-
.../net/ethernet/mellanox/mlx5/core/fs_core.c | 17 +-
.../ethernet/mellanox/mlx5/core/lag/debugfs.c | 3 +-
.../net/ethernet/mellanox/mlx5/core/lag/lag.c | 684 ++++++++++++++----
.../net/ethernet/mellanox/mlx5/core/lag/lag.h | 49 +-
.../net/ethernet/mellanox/mlx5/core/lag/mp.c | 20 +-
.../ethernet/mellanox/mlx5/core/lag/mpesw.c | 15 +-
.../mellanox/mlx5/core/lag/port_sel.c | 28 +-
.../net/ethernet/mellanox/mlx5/core/lib/sd.c | 2 +-
include/linux/mlx5/device.h | 1 +
include/linux/mlx5/fs.h | 10 +-
include/linux/mlx5/lag.h | 21 +
include/linux/mlx5/mlx5_ifc.h | 26 +-
21 files changed, 850 insertions(+), 208 deletions(-)
create mode 100644 include/linux/mlx5/lag.h
base-commit: 385a06f74ff7a03e3fb0b15fb87cfeb052d75073
--
2.44.0