public inbox for netdev@vger.kernel.org
From: Tariq Toukan <tariqt@nvidia.com>
To: Leon Romanovsky <leon@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>,
	"Saeed Mahameed" <saeedm@nvidia.com>,
	Tariq Toukan <tariqt@nvidia.com>
Cc: Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Mark Bloch <mbloch@nvidia.com>, <linux-kernel@vger.kernel.org>,
	<linux-rdma@vger.kernel.org>, <netdev@vger.kernel.org>,
	Gal Pressman <gal@nvidia.com>,
	Dragos Tatulea <dtatulea@nvidia.com>,
	Moshe Shemesh <moshe@nvidia.com>, Shay Drory <shayd@nvidia.com>,
	Alexei Lazar <alazar@nvidia.com>
Subject: [PATCH mlx5-next V2 0/9] mlx5-next updates 2026-03-09
Date: Mon, 9 Mar 2026 11:34:26 +0200
Message-ID: <20260309093435.1850724-1-tariqt@nvidia.com>

Hi,

This series contains mlx5 shared updates as preparation for upcoming
features.

The first patch, by Alex, contains IFC changes as preparation for an
upcoming feature.
The last patch moves a definition to expose a HW constant so that it
can later be used by the core and Eth drivers as well.

Patches 2 to 8 by Shay introduce mlx5 infrastructure for SD switchdev
and LAG support.
Detailed description by Shay below.

Regards,
Tariq

This series adds shared infrastructure to enable Socket Direct (SD)
single-netdev switchdev transition and LAG support in subsequent patches.

Currently, LAG is not supported in Socket Direct configurations, and
BlueField-3/4 utilizing SD for North-South traffic operates with two
distinct eSwitches per physical port. This forces the use of separate
IPs and MAC addresses for each NUMA node, complicating network
configuration and requiring firmware to handle MPFS with different
inner and outer packets for communication.

The goal is to expose a single external IP address (single MAC address)
per physical port while maintaining SD's bandwidth and latency benefits.
This means having a single eswitch per physical port managing all
physical ports via a merged eswitch with multiple vports. This enables
creation of a single FDB, which results in a single RDMA device to be
used by DOCA/HWS/OVS.

To achieve this, the LAG infrastructure needs changes since the current
implementation assumes a fixed mapping between device indices and LAG
ports, which breaks with SD's multi-device-per-port model.
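
As a rough illustration of the problem described above, here is a
minimal userspace sketch, not driver code. It assumes a table whose slot
number is derived from the physical port; struct demo_dev, struct
fixed_table and fixed_table_add() are hypothetical names made up for
this example and do not exist in mlx5.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PORTS 2

/* Hypothetical stand-in for a device; not a real mlx5 structure. */
struct demo_dev {
	int port;	/* physical port the device sits behind */
	int vhca_id;	/* per-device identifier */
};

/* Fixed table: the slot number is the physical port index. */
struct fixed_table {
	const struct demo_dev *slot[MAX_PORTS];
};

/* Returns true on success, false if the slot is out of range or taken. */
static bool fixed_table_add(struct fixed_table *t, const struct demo_dev *d)
{
	if (d->port < 0 || d->port >= MAX_PORTS || t->slot[d->port])
		return false;
	t->slot[d->port] = d;
	return true;
}
```

With two devices behind port 0, as in an SD group, the second
fixed_table_add() call fails because both map to the same slot, while a
device behind port 1 still fits.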

This series prepares the groundwork by:

1. Adding IFC bits for silent mode query and VHCA RX destination type,
   needed for SD device coordination and cross-VHCA traffic steering.

2. Converting the LAG pf array to xarray and using xa_alloc for dynamic
   index management. This decouples LAG indexing from physical device
   indices, allowing flexible device membership.

3. Converting the peer_miss_rule array to an xarray keyed by vhca_id.

4. Introducing a LAG variant of the device index helpers that produce
   unique identifiers even when multiple devices share the same
   physical port.

5. Adding VHCA RX flow destination support for steering traffic to a
   specific VHCA's receive path.

6. Moving LAG demux table ownership to the LAG layer with APIs for
   SW-only LAG modes where firmware cannot create the demux table.
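
The idea behind item 2, handing out indices dynamically instead of
deriving them from the physical device index, can be sketched in plain
userspace C. This is only an illustration under stated assumptions: it
uses a small fixed-size table rather than the kernel xarray, it always
picks the lowest free index, and struct id_table together with the
id_table_*() helpers are hypothetical names, not part of the series.

```c
#include <stddef.h>

#define ID_LIMIT 8

/* Hypothetical userspace stand-in for an ID-keyed container. */
struct id_table {
	const void *entry[ID_LIMIT];
};

/*
 * Store a non-NULL entry at the lowest free index and return that
 * index, or -1 if the entry is NULL or the table is full.
 */
static int id_table_alloc(struct id_table *t, const void *entry)
{
	if (!entry)
		return -1;
	for (int i = 0; i < ID_LIMIT; i++) {
		if (!t->entry[i]) {
			t->entry[i] = entry;
			return i;
		}
	}
	return -1;
}

/* Release an index so that a later allocation can reuse it. */
static void id_table_erase(struct id_table *t, int id)
{
	if (id >= 0 && id < ID_LIMIT)
		t->entry[id] = NULL;
}

/* Look up the entry stored at an index, or NULL if none. */
static const void *id_table_load(const struct id_table *t, int id)
{
	return (id >= 0 && id < ID_LIMIT) ? t->entry[id] : NULL;
}
```

Because the index is assigned at insertion time, two devices that share
a physical port simply receive different indices, and an index released
by a departing device can be reused by the next one.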

A follow-up series will build on this infrastructure to implement:
- SD single-netdev switchdev mode transition with a shared FDB
  corresponding to the SD group.
- LAG support enabling bonding of SD groups.

Since the follow-up series is large (~20 patches), the shared code
between RDMA and net is sent in advance to avoid overloading the
shared branch tree.

V2:
- Add one more patch (#9).
- Use kvfree() instead of kfree() in mlx5_esw_lag_demux_rule_create().
- Fix a condition check to > instead of >= in
  mlx5_ib_set_vport_rep().
- Fix author of patch #4.
- Link to V1: https://lore.kernel.org/all/20260308065559.1837449-1-tariqt@nvidia.com/

Alexei Lazar (1):
  net/mlx5: Add IFC bits for shared headroom pool PBMC support

Shay Drory (7):
  net/mlx5: Add silent mode set/query and VHCA RX IFC bits
  net/mlx5: LAG, replace pf array with xarray
  net/mlx5: LAG, use xa_alloc to manage LAG device indices
  net/mlx5: E-switch, modify peer miss rule index to vhca_id
  net/mlx5: LAG, replace mlx5_get_dev_index with LAG sequence number
  net/mlx5: Add VHCA RX flow destination support for FW steering
  {net/RDMA}/mlx5: Add LAG demux table API and vport demux rules

Tariq Toukan (1):
  net/mlx5: Expose MLX5_UMR_ALIGN definition

 drivers/infiniband/hw/mlx5/ib_rep.c           |  24 +-
 drivers/infiniband/hw/mlx5/main.c             |  21 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |   1 -
 drivers/infiniband/hw/mlx5/mr.c               |   1 -
 .../mellanox/mlx5/core/diag/fs_tracepoint.c   |   3 +
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |   9 +-
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  14 +-
 .../mellanox/mlx5/core/eswitch_offloads.c     | 103 ++-
 .../net/ethernet/mellanox/mlx5/core/fs_cmd.c  |   6 +-
 .../net/ethernet/mellanox/mlx5/core/fs_core.c |  17 +-
 .../ethernet/mellanox/mlx5/core/lag/debugfs.c |   3 +-
 .../net/ethernet/mellanox/mlx5/core/lag/lag.c | 684 ++++++++++++++----
 .../net/ethernet/mellanox/mlx5/core/lag/lag.h |  49 +-
 .../net/ethernet/mellanox/mlx5/core/lag/mp.c  |  20 +-
 .../ethernet/mellanox/mlx5/core/lag/mpesw.c   |  15 +-
 .../mellanox/mlx5/core/lag/port_sel.c         |  28 +-
 .../net/ethernet/mellanox/mlx5/core/lib/sd.c  |   2 +-
 include/linux/mlx5/device.h                   |   1 +
 include/linux/mlx5/fs.h                       |  10 +-
 include/linux/mlx5/lag.h                      |  21 +
 include/linux/mlx5/mlx5_ifc.h                 |  26 +-
 21 files changed, 850 insertions(+), 208 deletions(-)
 create mode 100644 include/linux/mlx5/lag.h


base-commit: 385a06f74ff7a03e3fb0b15fb87cfeb052d75073
-- 
2.44.0

