netdev.vger.kernel.org archive mirror
* [PATCH rdma-next 00/12] Multi-plane support for mlx5
@ 2024-06-16 16:08 Leon Romanovsky
  2024-06-16 16:08 ` [PATCH rdma-next 01/12] RDMA/core: Create "issm*" device nodes only when SMI is supported Leon Romanovsky
                   ` (13 more replies)
  0 siblings, 14 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Eric Dumazet, Jakub Kicinski, linux-kernel,
	linux-rdma, Mark Zhang, netdev, Paolo Abeni, Saeed Mahameed,
	Tariq Toukan

From: Leon Romanovsky <leonro@nvidia.com>

From Mark,

This patchset adds support for IB sub devices, along with the mlx5
implementation.

An IB sub device provides a subset of the functionality of its parent.
Currently only type "SMI" is supported: an SMI device provides the SMI
(QP0) interface and shares the same VPort with its parent. It allows the
subnet manager to configure the VPort through this interface when the
parent doesn't support SMI.

In the mlx5 case, when multi-plane is supported, a logical mlx5 port
that aggregates multiple physical plane ports is presented to provide
higher bandwidth. As SMI is per physical port, an mlx5 SMI device is
needed to represent the physical plane ports and provide SMI capability.

A sub device can be added or deleted with the rdma tool. When an mlx5 SMI
device is created, all its ports are created.

Examples:
$ rdma dev add smi1 type SMI parent ibp8s0f1
$ rdma dev show smi1
2: smi1: node_type ca fw 20.38.0458 node_guid 9803:9b03:009f:d20f
sys_image_guid 9803:9b03:009f:d20e type smi parent ibp8s0f1
$ rdma link show
...
link smi1/1 state INIT physical_state LINK_UP
link smi1/2 state INIT physical_state LINK_UP
link smi1/3 state INIT physical_state LINK_UP
link smi1/4 state INIT physical_state LINK_UP
$ rdma dev del smi1

Mark Zhang (12):
  RDMA/core: Create "issm*" device nodes only when SMI is supported
  net/mlx5: mlx5_ifc update for multi-plane support
  RDMA/mlx5: Add support to multi-plane device and port
  RDMA/core: Support IB sub device with type "SMI"
  RDMA: Set type of rdma_ah to IB for a SMI sub device
  RDMA/core: Create GSI QP only when CM is supported
  RDMA/mlx5: Support plane device and driver APIs to add and delete it
  RDMA/nldev: Add support to add/delete a sub IB device through netlink
  RDMA/nldev: Add support to dump device type and parent device if
    exists
  RDMA/mlx5: Add plane index support when querying PTYS registers
  net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports
  RDMA/mlx5: Support per-plane port IB counters by querying PPCNT
    register

 drivers/infiniband/core/agent.c               |  32 ++-
 drivers/infiniband/core/device.c              |  68 +++++++
 drivers/infiniband/core/mad.c                 |   9 +-
 drivers/infiniband/core/nldev.c               |  69 +++++++
 drivers/infiniband/core/user_mad.c            |  29 +--
 drivers/infiniband/core/uverbs_main.c         |   3 +-
 drivers/infiniband/hw/mlx5/cmd.c              |  12 +-
 drivers/infiniband/hw/mlx5/cmd.h              |   2 +-
 drivers/infiniband/hw/mlx5/mad.c              |  71 +++++--
 drivers/infiniband/hw/mlx5/main.c             | 182 ++++++++++++++++--
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |  10 +
 drivers/infiniband/hw/mlx5/qp.c               |   7 +-
 drivers/infiniband/hw/mlx5/qpc.c              |  13 +-
 .../net/ethernet/mellanox/mlx5/core/en/port.c |   2 +-
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |   2 +-
 .../mellanox/mlx5/core/ipoib/ethtool.c        |   2 +-
 .../net/ethernet/mellanox/mlx5/core/port.c    |  10 +-
 .../net/ethernet/mellanox/mlx5/core/vport.c   |   1 +
 include/linux/mlx5/device.h                   |   1 +
 include/linux/mlx5/driver.h                   |   1 +
 include/linux/mlx5/mlx5_ifc.h                 |  61 +++++-
 include/linux/mlx5/port.h                     |   5 +-
 include/rdma/ib_verbs.h                       |  45 +++++
 include/uapi/rdma/rdma_netlink.h              |  13 ++
 24 files changed, 574 insertions(+), 76 deletions(-)

-- 
2.45.2



* [PATCH rdma-next 01/12] RDMA/core: Create "issm*" device nodes only when SMI is supported
  2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
  2024-06-16 16:08 ` [PATCH mlx5-next 02/12] net/mlx5: mlx5_ifc update for multi-plane support Leon Romanovsky
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC
  To: Jason Gunthorpe
  Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan

From: Mark Zhang <markzhang@nvidia.com>

For an IB port, create its issm device node only when it has SMI
capability. In the following patches, mlx5 is going to support IB
devices without this capability.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/user_mad.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 2ed749f50a29..f760dfffa188 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -1321,15 +1321,17 @@ static int ib_umad_init_port(struct ib_device *device, int port_num,
 	if (ret)
 		goto err_cdev;
 
-	ib_umad_init_port_dev(&port->sm_dev, port, device);
-	port->sm_dev.devt = base_issm;
-	dev_set_name(&port->sm_dev, "issm%d", port->dev_num);
-	cdev_init(&port->sm_cdev, &umad_sm_fops);
-	port->sm_cdev.owner = THIS_MODULE;
-
-	ret = cdev_device_add(&port->sm_cdev, &port->sm_dev);
-	if (ret)
-		goto err_dev;
+	if (rdma_cap_ib_smi(device, port_num)) {
+		ib_umad_init_port_dev(&port->sm_dev, port, device);
+		port->sm_dev.devt = base_issm;
+		dev_set_name(&port->sm_dev, "issm%d", port->dev_num);
+		cdev_init(&port->sm_cdev, &umad_sm_fops);
+		port->sm_cdev.owner = THIS_MODULE;
+
+		ret = cdev_device_add(&port->sm_cdev, &port->sm_dev);
+		if (ret)
+			goto err_dev;
+	}
 
 	return 0;
 
@@ -1345,9 +1347,13 @@ static int ib_umad_init_port(struct ib_device *device, int port_num,
 static void ib_umad_kill_port(struct ib_umad_port *port)
 {
 	struct ib_umad_file *file;
+	bool has_smi = false;
 	int id;
 
-	cdev_device_del(&port->sm_cdev, &port->sm_dev);
+	if (rdma_cap_ib_smi(port->ib_dev, port->port_num)) {
+		cdev_device_del(&port->sm_cdev, &port->sm_dev);
+		has_smi = true;
+	}
 	cdev_device_del(&port->cdev, &port->dev);
 
 	mutex_lock(&port->file_mutex);
@@ -1373,7 +1379,8 @@ static void ib_umad_kill_port(struct ib_umad_port *port)
 	ida_free(&umad_ida, port->dev_num);
 
 	/* balances device_initialize() */
-	put_device(&port->sm_dev);
+	if (has_smi)
+		put_device(&port->sm_dev);
 	put_device(&port->dev);
 }
 
-- 
2.45.2



* [PATCH mlx5-next 02/12] net/mlx5: mlx5_ifc update for multi-plane support
  2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
  2024-06-16 16:08 ` [PATCH rdma-next 01/12] RDMA/core: Create "issm*" device nodes only when SMI is supported Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
  2024-06-16 16:08 ` [PATCH mlx5-next 03/12] RDMA/mlx5: Add support to multi-plane device and port Leon Romanovsky
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC
  To: Jason Gunthorpe
  Cc: Mark Zhang, Eric Dumazet, Jakub Kicinski, linux-rdma, netdev,
	Paolo Abeni, Saeed Mahameed, Tariq Toukan

From: Mark Zhang <markzhang@nvidia.com>

Add new fields to support the mlx5 multi-plane feature. Actual support
will be added in the following patches.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/linux/mlx5/mlx5_ifc.h | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 09d9d87d62c6..61738990e399 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -793,7 +793,7 @@ struct mlx5_ifc_ads_bits {
 	u8         reserved_at_2[0xe];
 	u8         pkey_index[0x10];
 
-	u8         reserved_at_20[0x8];
+	u8         plane_index[0x8];
 	u8         grh[0x1];
 	u8         mlid[0x7];
 	u8         rlid[0x10];
@@ -1990,7 +1990,8 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8	   reserved_at_c0[0x8];
 	u8	   migration_multi_load[0x1];
 	u8	   migration_tracking_state[0x1];
-	u8	   reserved_at_ca[0x6];
+	u8	   multiplane_qp_ud[0x1];
+	u8	   reserved_at_cb[0x5];
 	u8	   migration_in_chunks[0x1];
 	u8	   reserved_at_d1[0xf];
 
@@ -4172,7 +4173,8 @@ struct mlx5_ifc_hca_vport_context_bits {
 	u8         has_smi[0x1];
 	u8         has_raw[0x1];
 	u8         grh_required[0x1];
-	u8         reserved_at_104[0xc];
+	u8         reserved_at_104[0x4];
+	u8         num_port_plane[0x8];
 	u8         port_physical_state[0x4];
 	u8         vport_state_policy[0x4];
 	u8         port_state[0x4];
@@ -7692,7 +7694,7 @@ struct mlx5_ifc_mad_ifc_in_bits {
 	u8         op_mod[0x10];
 
 	u8         remote_lid[0x10];
-	u8         reserved_at_50[0x8];
+	u8         plane_index[0x8];
 	u8         port[0x8];
 
 	u8         reserved_at_60[0x20];
@@ -9621,7 +9623,9 @@ struct mlx5_ifc_ptys_reg_bits {
 	u8         an_disable_cap[0x1];
 	u8         reserved_at_3[0x5];
 	u8         local_port[0x8];
-	u8         reserved_at_10[0xd];
+	u8         reserved_at_10[0x8];
+	u8         plane_ind[0x4];
+	u8         reserved_at_1c[0x1];
 	u8         proto_mask[0x3];
 
 	u8         an_status[0x4];
-- 
2.45.2



* [PATCH mlx5-next 03/12] RDMA/mlx5: Add support to multi-plane device and port
  2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
  2024-06-16 16:08 ` [PATCH rdma-next 01/12] RDMA/core: Create "issm*" device nodes only when SMI is supported Leon Romanovsky
  2024-06-16 16:08 ` [PATCH mlx5-next 02/12] net/mlx5: mlx5_ifc update for multi-plane support Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
  2024-06-16 16:08 ` [PATCH rdma-next 04/12] RDMA/core: Support IB sub device with type "SMI" Leon Romanovsky
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC
  To: Jason Gunthorpe
  Cc: Mark Zhang, Eric Dumazet, Jakub Kicinski, linux-rdma, netdev,
	Paolo Abeni, Saeed Mahameed, Tariq Toukan

From: Mark Zhang <markzhang@nvidia.com>

When multi-plane is supported, a logical port, which is an aggregation
of multiple physical plane ports, is exposed for data transmission.
Compared with a normal mlx5 IB port, this logical port supports all
functionality except Subnet Management.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/main.c             | 60 ++++++++++++++++---
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |  2 +
 .../net/ethernet/mellanox/mlx5/core/vport.c   |  1 +
 include/linux/mlx5/driver.h                   |  1 +
 4 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index a7003316d438..55eb60715b48 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1388,7 +1388,13 @@ static int mlx5_query_hca_port(struct ib_device *ibdev, u32 port,
 	props->sm_sl		= rep->sm_sl;
 	props->state		= rep->vport_state;
 	props->phys_state	= rep->port_physical_state;
-	props->port_cap_flags	= rep->cap_mask1;
+
+	props->port_cap_flags = rep->cap_mask1;
+	if (dev->num_plane) {
+		props->port_cap_flags |= IB_PORT_SM_DISABLED;
+		props->port_cap_flags &= ~IB_PORT_SM;
+	}
+
 	props->gid_tbl_len	= mlx5_get_gid_table_len(MLX5_CAP_GEN(mdev, gid_table_size));
 	props->max_msg_sz	= 1 << MLX5_CAP_GEN(mdev, log_max_msg);
 	props->pkey_tbl_len	= mlx5_to_sw_pkey_sz(MLX5_CAP_GEN(mdev, pkey_table_size));
@@ -2807,6 +2813,23 @@ static int mlx5_ib_event_slave_port(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
+static int mlx5_ib_get_plane_num(struct mlx5_core_dev *mdev, u8 *num_plane)
+{
+	struct mlx5_hca_vport_context vport_ctx;
+	int err;
+
+	*num_plane = 0;
+	if (!MLX5_CAP_GEN(mdev, ib_virt))
+		return 0;
+
+	err = mlx5_query_hca_vport_context(mdev, 0, 1, 0, &vport_ctx);
+	if (err)
+		return err;
+
+	*num_plane = vport_ctx.num_plane;
+	return 0;
+}
+
 static int set_has_smi_cap(struct mlx5_ib_dev *dev)
 {
 	struct mlx5_hca_vport_context vport_ctx;
@@ -2817,10 +2840,14 @@ static int set_has_smi_cap(struct mlx5_ib_dev *dev)
 		return 0;
 
 	for (port = 1; port <= dev->num_ports; port++) {
-		if (!MLX5_CAP_GEN(dev->mdev, ib_virt)) {
+		if (dev->num_plane) {
+			dev->port_caps[port - 1].has_smi = false;
+			continue;
+		} else if (!MLX5_CAP_GEN(dev->mdev, ib_virt)) {
 			dev->port_caps[port - 1].has_smi = true;
 			continue;
 		}
+
 		err = mlx5_query_hca_vport_context(dev->mdev, 0, port, 0,
 						   &vport_ctx);
 		if (err) {
@@ -3026,6 +3053,11 @@ static u32 get_core_cap_flags(struct ib_device *ibdev,
 	if (rep->grh_required)
 		ret |= RDMA_CORE_CAP_IB_GRH_REQUIRED;
 
+	if (dev->num_plane)
+		return ret | RDMA_CORE_CAP_PROT_IB | RDMA_CORE_CAP_IB_MAD |
+			RDMA_CORE_CAP_IB_CM | RDMA_CORE_CAP_IB_SA |
+			RDMA_CORE_CAP_AF_IB;
+
 	if (ll == IB_LINK_LAYER_INFINIBAND)
 		return ret | RDMA_CORE_PORT_IBA_IB;
 
@@ -4507,11 +4539,18 @@ static int mlx5r_probe(struct auxiliary_device *adev,
 	dev = ib_alloc_device(mlx5_ib_dev, ib_dev);
 	if (!dev)
 		return -ENOMEM;
+
+	if (ll == IB_LINK_LAYER_INFINIBAND) {
+		ret = mlx5_ib_get_plane_num(mdev, &dev->num_plane);
+		if (ret)
+			goto fail;
+	}
+
 	dev->port = kcalloc(num_ports, sizeof(*dev->port),
 			     GFP_KERNEL);
 	if (!dev->port) {
-		ib_dealloc_device(&dev->ib_dev);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto fail;
 	}
 
 	dev->mdev = mdev;
@@ -4523,14 +4562,17 @@ static int mlx5r_probe(struct auxiliary_device *adev,
 		profile = &pf_profile;
 
 	ret = __mlx5_ib_add(dev, profile);
-	if (ret) {
-		kfree(dev->port);
-		ib_dealloc_device(&dev->ib_dev);
-		return ret;
-	}
+	if (ret)
+		goto fail_ib_add;
 
 	auxiliary_set_drvdata(adev, dev);
 	return 0;
+
+fail_ib_add:
+	kfree(dev->port);
+fail:
+	ib_dealloc_device(&dev->ib_dev);
+	return ret;
 }
 
 static void mlx5r_remove(struct auxiliary_device *adev)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index a6f2b679a7e9..d97d6bc2dbaa 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1189,6 +1189,8 @@ struct mlx5_ib_dev {
 #ifdef CONFIG_MLX5_MACSEC
 	struct mlx5_macsec macsec;
 #endif
+
+	u8 num_plane;
 };
 
 static inline struct mlx5_ib_cq *to_mibcq(struct mlx5_core_cq *mcq)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 1005bb6935b6..0d5f750faa45 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -737,6 +737,7 @@ int mlx5_query_hca_vport_context(struct mlx5_core_dev *dev,
 	rep->grh_required = MLX5_GET_PR(hca_vport_context, ctx, grh_required);
 	rep->sys_image_guid = MLX5_GET64_PR(hca_vport_context, ctx,
 					    system_image_guid);
+	rep->num_plane = MLX5_GET_PR(hca_vport_context, ctx, num_port_plane);
 
 ex:
 	kvfree(out);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 145e2fb1b832..2889ece6c808 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -917,6 +917,7 @@ struct mlx5_hca_vport_context {
 	u16			qkey_violation_counter;
 	u16			pkey_violation_counter;
 	bool			grh_required;
+	u8			num_plane;
 };
 
 #define STRUCT_FIELD(header, field) \
-- 
2.45.2



* [PATCH rdma-next 04/12] RDMA/core: Support IB sub device with type "SMI"
  2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
                   ` (2 preceding siblings ...)
  2024-06-16 16:08 ` [PATCH mlx5-next 03/12] RDMA/mlx5: Add support to multi-plane device and port Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
  2024-06-29  0:14   ` Zhu Yanjun
  2024-06-16 16:08 ` [PATCH rdma-next 05/12] RDMA: Set type of rdma_ah to IB for a SMI sub device Leon Romanovsky
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC
  To: Jason Gunthorpe
  Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan

From: Mark Zhang <markzhang@nvidia.com>

This patch adds 2 APIs, as well as driver operations, to support adding
and deleting an IB sub device, which provides part of the functionality
of its parent.

A sub device has a type; a sub device with type "SMI" provides the SMI
capability through umad for its parent, meaning uverbs is not
supported.

A sub device cannot live without a parent. So when a parent is
released, all its sub devices are released as well.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/device.c      | 68 +++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c |  3 +-
 include/rdma/ib_verbs.h               | 43 +++++++++++++++++
 include/uapi/rdma/rdma_netlink.h      |  5 ++
 4 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 55aa7aa32d4a..8547cab50b23 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -641,6 +641,11 @@ struct ib_device *_ib_alloc_device(size_t size)
 		BIT_ULL(IB_USER_VERBS_CMD_REG_MR) |
 		BIT_ULL(IB_USER_VERBS_CMD_REREG_MR) |
 		BIT_ULL(IB_USER_VERBS_CMD_RESIZE_CQ);
+
+	mutex_init(&device->subdev_lock);
+	INIT_LIST_HEAD(&device->subdev_list_head);
+	INIT_LIST_HEAD(&device->subdev_list);
+
 	return device;
 }
 EXPORT_SYMBOL(_ib_alloc_device);
@@ -1461,6 +1466,18 @@ EXPORT_SYMBOL(ib_register_device);
 /* Callers must hold a get on the device. */
 static void __ib_unregister_device(struct ib_device *ib_dev)
 {
+	struct ib_device *sub, *tmp;
+
+	mutex_lock(&ib_dev->subdev_lock);
+	list_for_each_entry_safe_reverse(sub, tmp,
+					 &ib_dev->subdev_list_head,
+					 subdev_list) {
+		list_del(&sub->subdev_list);
+		ib_dev->ops.del_sub_dev(sub);
+		ib_device_put(ib_dev);
+	}
+	mutex_unlock(&ib_dev->subdev_lock);
+
 	/*
 	 * We have a registration lock so that all the calls to unregister are
 	 * fully fenced, once any unregister returns the device is truely
@@ -2597,6 +2614,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 		ops->uverbs_no_driver_id_binding;
 
 	SET_DEVICE_OP(dev_ops, add_gid);
+	SET_DEVICE_OP(dev_ops, add_sub_dev);
 	SET_DEVICE_OP(dev_ops, advise_mr);
 	SET_DEVICE_OP(dev_ops, alloc_dm);
 	SET_DEVICE_OP(dev_ops, alloc_hw_device_stats);
@@ -2631,6 +2649,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, dealloc_ucontext);
 	SET_DEVICE_OP(dev_ops, dealloc_xrcd);
 	SET_DEVICE_OP(dev_ops, del_gid);
+	SET_DEVICE_OP(dev_ops, del_sub_dev);
 	SET_DEVICE_OP(dev_ops, dereg_mr);
 	SET_DEVICE_OP(dev_ops, destroy_ah);
 	SET_DEVICE_OP(dev_ops, destroy_counters);
@@ -2727,6 +2746,55 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 }
 EXPORT_SYMBOL(ib_set_device_ops);
 
+int ib_add_sub_device(struct ib_device *parent,
+		      enum rdma_nl_dev_type type,
+		      const char *name)
+{
+	struct ib_device *sub;
+	int ret = 0;
+
+	if (!parent->ops.add_sub_dev || !parent->ops.del_sub_dev)
+		return -EOPNOTSUPP;
+
+	if (!ib_device_try_get(parent))
+		return -EINVAL;
+
+	sub = parent->ops.add_sub_dev(parent, type, name);
+	if (IS_ERR(sub)) {
+		ib_device_put(parent);
+		return PTR_ERR(sub);
+	}
+
+	sub->type = type;
+	sub->parent = parent;
+
+	mutex_lock(&parent->subdev_lock);
+	list_add_tail(&sub->subdev_list, &parent->subdev_list_head);
+	mutex_unlock(&parent->subdev_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL(ib_add_sub_device);
+
+int ib_del_sub_device_and_put(struct ib_device *sub)
+{
+	struct ib_device *parent = sub->parent;
+
+	if (!parent)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&parent->subdev_lock);
+	list_del(&sub->subdev_list);
+	mutex_unlock(&parent->subdev_lock);
+
+	ib_device_put(sub);
+	parent->ops.del_sub_dev(sub);
+	ib_device_put(parent);
+
+	return 0;
+}
+EXPORT_SYMBOL(ib_del_sub_device_and_put);
+
 #ifdef CONFIG_INFINIBAND_VIRT_DMA
 int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents)
 {
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 495d5a5d0373..bc099287de9a 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -1114,7 +1114,8 @@ static int ib_uverbs_add_one(struct ib_device *device)
 	struct ib_uverbs_device *uverbs_dev;
 	int ret;
 
-	if (!device->ops.alloc_ucontext)
+	if (!device->ops.alloc_ucontext ||
+	    device->type == RDMA_DEVICE_TYPE_SMI)
 		return -EOPNOTSUPP;
 
 	uverbs_dev = kzalloc(sizeof(*uverbs_dev), GFP_KERNEL);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 477bf9dd5e71..bebc2d22f466 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2661,6 +2661,18 @@ struct ib_device_ops {
 	 */
 	int (*get_numa_node)(struct ib_device *dev);
 
+	/**
+	 * add_sub_dev - Add a sub IB device
+	 */
+	struct ib_device *(*add_sub_dev)(struct ib_device *parent,
+					 enum rdma_nl_dev_type type,
+					 const char *name);
+
+	/**
+	 * del_sub_dev - Delete a sub IB device
+	 */
+	void (*del_sub_dev)(struct ib_device *sub_dev);
+
 	DECLARE_RDMA_OBJ_SIZE(ib_ah);
 	DECLARE_RDMA_OBJ_SIZE(ib_counters);
 	DECLARE_RDMA_OBJ_SIZE(ib_cq);
@@ -2771,6 +2783,15 @@ struct ib_device {
 	char iw_ifname[IFNAMSIZ];
 	u32 iw_driver_flags;
 	u32 lag_flags;
+
+	/* A parent device has a list of sub-devices */
+	struct mutex subdev_lock;
+	struct list_head subdev_list_head;
+
+	/* A sub device has a type and a parent */
+	enum rdma_nl_dev_type type;
+	struct ib_device *parent;
+	struct list_head subdev_list;
 };
 
 static inline void *rdma_zalloc_obj(struct ib_device *dev, size_t size,
@@ -4820,4 +4841,26 @@ static inline u16 rdma_get_udp_sport(u32 fl, u32 lqpn, u32 rqpn)
 
 const struct ib_port_immutable*
 ib_port_immutable_read(struct ib_device *dev, unsigned int port);
+
+/** ib_add_sub_device - Add a sub IB device on an existing one
+ *
+ * @parent: The IB device that needs to add a sub device
+ * @type: The type of the new sub device
+ * @name: The name of the new sub device
+ *
+ *
+ * Return 0 on success, an error code otherwise
+ */
+int ib_add_sub_device(struct ib_device *parent,
+		      enum rdma_nl_dev_type type,
+		      const char *name);
+
+
+/** ib_del_sub_device_and_put - Delete an IB sub device while holding a 'get'
+ *
+ * @sub: The sub device that is going to be deleted
+ *
+ * Return 0 on success, an error code otherwise
+ */
+int ib_del_sub_device_and_put(struct ib_device *sub);
 #endif /* IB_VERBS_H */
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index a214fc259f28..d15ee16be722 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -602,4 +602,9 @@ enum rdma_nl_counter_mask {
 	RDMA_COUNTER_MASK_QP_TYPE = 1,
 	RDMA_COUNTER_MASK_PID = 1 << 1,
 };
+
+/* Supported rdma device types. */
+enum rdma_nl_dev_type {
+	RDMA_DEVICE_TYPE_SMI = 1,
+};
 #endif /* _UAPI_RDMA_NETLINK_H */
-- 
2.45.2



* [PATCH rdma-next 05/12] RDMA: Set type of rdma_ah to IB for a SMI sub device
  2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
                   ` (3 preceding siblings ...)
  2024-06-16 16:08 ` [PATCH rdma-next 04/12] RDMA/core: Support IB sub device with type "SMI" Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
  2024-06-16 16:08 ` [PATCH rdma-next 06/12] RDMA/core: Create GSI QP only when CM is supported Leon Romanovsky
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC
  To: Jason Gunthorpe
  Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan

From: Mark Zhang <markzhang@nvidia.com>

An address handle created on an SMI port has type IB, as an SMI
port is used for SMI management through umad.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/rdma/ib_verbs.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index bebc2d22f466..c20571618798 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4660,6 +4660,8 @@ static inline enum rdma_ah_attr_type rdma_ah_find_type(struct ib_device *dev,
 			return RDMA_AH_ATTR_TYPE_OPA;
 		return RDMA_AH_ATTR_TYPE_IB;
 	}
+	if (dev->type == RDMA_DEVICE_TYPE_SMI)
+		return RDMA_AH_ATTR_TYPE_IB;
 
 	return RDMA_AH_ATTR_TYPE_UNDEFINED;
 }
-- 
2.45.2



* [PATCH rdma-next 06/12] RDMA/core: Create GSI QP only when CM is supported
  2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
                   ` (4 preceding siblings ...)
  2024-06-16 16:08 ` [PATCH rdma-next 05/12] RDMA: Set type of rdma_ah to IB for a SMI sub device Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
  2024-06-16 16:08 ` [PATCH rdma-next 07/12] RDMA/mlx5: Support plane device and driver APIs to add and delete it Leon Romanovsky
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC
  To: Jason Gunthorpe
  Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan

From: Mark Zhang <markzhang@nvidia.com>

A GSI QP is not needed if the port doesn't support connection
management. In the following patches, mlx5 is going to support IB ports
that don't support CM.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/agent.c | 32 ++++++++++++++++++++++----------
 drivers/infiniband/core/mad.c   |  9 ++++++---
 2 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index f82b4260de42..3bb46696731e 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -59,7 +59,16 @@ __ib_get_agent_port(const struct ib_device *device, int port_num)
 	struct ib_agent_port_private *entry;
 
 	list_for_each_entry(entry, &ib_agent_port_list, port_list) {
-		if (entry->agent[1]->device == device &&
+		/* Need to check both agent[0] and agent[1], as an agent port
+		 * may only have one of them
+		 */
+		if (entry->agent[0] &&
+		    entry->agent[0]->device == device &&
+		    entry->agent[0]->port_num == port_num)
+			return entry;
+
+		if (entry->agent[1] &&
+		    entry->agent[1]->device == device &&
 		    entry->agent[1]->port_num == port_num)
 			return entry;
 	}
@@ -172,14 +181,16 @@ int ib_agent_port_open(struct ib_device *device, int port_num)
 		}
 	}
 
-	/* Obtain send only MAD agent for GSI QP */
-	port_priv->agent[1] = ib_register_mad_agent(device, port_num,
-						    IB_QPT_GSI, NULL, 0,
-						    &agent_send_handler,
-						    NULL, NULL, 0);
-	if (IS_ERR(port_priv->agent[1])) {
-		ret = PTR_ERR(port_priv->agent[1]);
-		goto error3;
+	if (rdma_cap_ib_cm(device, port_num)) {
+		/* Obtain send only MAD agent for GSI QP */
+		port_priv->agent[1] = ib_register_mad_agent(device, port_num,
+							    IB_QPT_GSI, NULL, 0,
+							    &agent_send_handler,
+							    NULL, NULL, 0);
+		if (IS_ERR(port_priv->agent[1])) {
+			ret = PTR_ERR(port_priv->agent[1]);
+			goto error3;
+		}
 	}
 
 	spin_lock_irqsave(&ib_agent_port_list_lock, flags);
@@ -212,7 +223,8 @@ int ib_agent_port_close(struct ib_device *device, int port_num)
 	list_del(&port_priv->port_list);
 	spin_unlock_irqrestore(&ib_agent_port_list_lock, flags);
 
-	ib_unregister_mad_agent(port_priv->agent[1]);
+	if (port_priv->agent[1])
+		ib_unregister_mad_agent(port_priv->agent[1]);
 	if (port_priv->agent[0])
 		ib_unregister_mad_agent(port_priv->agent[0]);
 
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 674344eb8e2f..7439e47ff951 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2983,9 +2983,12 @@ static int ib_mad_port_open(struct ib_device *device,
 		if (ret)
 			goto error6;
 	}
-	ret = create_mad_qp(&port_priv->qp_info[1], IB_QPT_GSI);
-	if (ret)
-		goto error7;
+
+	if (rdma_cap_ib_cm(device, port_num)) {
+		ret = create_mad_qp(&port_priv->qp_info[1], IB_QPT_GSI);
+		if (ret)
+			goto error7;
+	}
 
 	snprintf(name, sizeof(name), "ib_mad%u", port_num);
 	port_priv->wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
-- 
2.45.2



* [PATCH rdma-next 07/12] RDMA/mlx5: Support plane device and driver APIs to add and delete it
  2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
                   ` (5 preceding siblings ...)
  2024-06-16 16:08 ` [PATCH rdma-next 06/12] RDMA/core: Create GSI QP only when CM is supported Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
  2024-06-16 16:08 ` [PATCH rdma-next 08/12] RDMA/nldev: Add support to add/delete a sub IB device through netlink Leon Romanovsky
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC
  To: Jason Gunthorpe
  Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan

From: Mark Zhang <markzhang@nvidia.com>

This patch implements the driver APIs "add_sub_dev" and "del_sub_dev",
which add and delete a plane device respectively.
An mlx5 plane device is an RDMA SMI device; it provides the SMI capability
through user MAD for its parent, the logical multi-plane aggregated
device. For a plane port:
- It supports QP0 only;
- When adding a plane device, all plane ports are added;
- For some commands like mad_ifc, both plane_index and native portnum
  are needed;
- When querying or modifying a plane port context, the native portnum
  must be used, as the query/modify_hca_vport_context command doesn't
  support plane ports.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/cmd.c     |  12 ++-
 drivers/infiniband/hw/mlx5/cmd.h     |   2 +-
 drivers/infiniband/hw/mlx5/mad.c     |   2 +-
 drivers/infiniband/hw/mlx5/main.c    | 116 ++++++++++++++++++++++++++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |   8 ++
 drivers/infiniband/hw/mlx5/qp.c      |   7 +-
 drivers/infiniband/hw/mlx5/qpc.c     |  13 ++-
 7 files changed, 147 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c
index 1d0c8d5e745b..895b62cc528d 100644
--- a/drivers/infiniband/hw/mlx5/cmd.c
+++ b/drivers/infiniband/hw/mlx5/cmd.c
@@ -177,7 +177,7 @@ int mlx5_cmd_xrcd_dealloc(struct mlx5_core_dev *dev, u32 xrcdn, u16 uid)
 	return mlx5_cmd_exec_in(dev, dealloc_xrcd, in);
 }
 
-int mlx5_cmd_mad_ifc(struct mlx5_core_dev *dev, const void *inb, void *outb,
+int mlx5_cmd_mad_ifc(struct mlx5_ib_dev *dev, const void *inb, void *outb,
 		     u16 opmod, u8 port)
 {
 	int outlen = MLX5_ST_SZ_BYTES(mad_ifc_out);
@@ -195,12 +195,18 @@ int mlx5_cmd_mad_ifc(struct mlx5_core_dev *dev, const void *inb, void *outb,
 
 	MLX5_SET(mad_ifc_in, in, opcode, MLX5_CMD_OP_MAD_IFC);
 	MLX5_SET(mad_ifc_in, in, op_mod, opmod);
-	MLX5_SET(mad_ifc_in, in, port, port);
+	if (dev->ib_dev.type == RDMA_DEVICE_TYPE_SMI) {
+		MLX5_SET(mad_ifc_in, in, plane_index, port);
+		MLX5_SET(mad_ifc_in, in, port,
+			 smi_to_native_portnum(dev, port));
+	} else {
+		MLX5_SET(mad_ifc_in, in, port, port);
+	}
 
 	data = MLX5_ADDR_OF(mad_ifc_in, in, mad);
 	memcpy(data, inb, MLX5_FLD_SZ_BYTES(mad_ifc_in, mad));
 
-	err = mlx5_cmd_exec_inout(dev, mad_ifc, in, out);
+	err = mlx5_cmd_exec_inout(dev->mdev, mad_ifc, in, out);
 	if (err)
 		goto out;
 
diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h
index 93a971a40d11..e5cd31270443 100644
--- a/drivers/infiniband/hw/mlx5/cmd.h
+++ b/drivers/infiniband/hw/mlx5/cmd.h
@@ -54,7 +54,7 @@ int mlx5_cmd_detach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid,
 			u32 qpn, u16 uid);
 int mlx5_cmd_xrcd_alloc(struct mlx5_core_dev *dev, u32 *xrcdn, u16 uid);
 int mlx5_cmd_xrcd_dealloc(struct mlx5_core_dev *dev, u32 xrcdn, u16 uid);
-int mlx5_cmd_mad_ifc(struct mlx5_core_dev *dev, const void *inb, void *outb,
+int mlx5_cmd_mad_ifc(struct mlx5_ib_dev *dev, const void *inb, void *outb,
 		     u16 opmod, u8 port);
 int mlx5_cmd_uar_alloc(struct mlx5_core_dev *dev, u32 *uarn, u16 uid);
 int mlx5_cmd_uar_dealloc(struct mlx5_core_dev *dev, u32 uarn, u16 uid);
diff --git a/drivers/infiniband/hw/mlx5/mad.c b/drivers/infiniband/hw/mlx5/mad.c
index 3e43687a7f6f..ead836d159d3 100644
--- a/drivers/infiniband/hw/mlx5/mad.c
+++ b/drivers/infiniband/hw/mlx5/mad.c
@@ -69,7 +69,7 @@ static int mlx5_MAD_IFC(struct mlx5_ib_dev *dev, int ignore_mkey,
 	if (ignore_bkey || !in_wc)
 		op_modifier |= 0x2;
 
-	return mlx5_cmd_mad_ifc(dev->mdev, in_mad, response_mad, op_modifier,
+	return mlx5_cmd_mad_ifc(dev, in_mad, response_mad, op_modifier,
 				port);
 }
 
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 55eb60715b48..3a653998bd88 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -313,6 +313,14 @@ struct mlx5_core_dev *mlx5_ib_get_native_port_mdev(struct mlx5_ib_dev *ibdev,
 	struct mlx5_ib_multiport_info *mpi;
 	struct mlx5_ib_port *port;
 
+	if (ibdev->ib_dev.type == RDMA_DEVICE_TYPE_SMI) {
+		if (native_port_num)
+			*native_port_num = smi_to_native_portnum(ibdev,
+								 ib_port_num);
+		return ibdev->mdev;
+
+	}
+
 	if (!mlx5_core_mp_enabled(ibdev->mdev) ||
 	    ll != IB_LINK_LAYER_ETHERNET) {
 		if (native_port_num)
@@ -1378,6 +1386,9 @@ static int mlx5_query_hca_port(struct ib_device *ibdev, u32 port,
 
 	/* props being zeroed by the caller, avoid zeroing it here */
 
+	if (ibdev->type == RDMA_DEVICE_TYPE_SMI)
+		port = smi_to_native_portnum(dev, port);
+
 	err = mlx5_query_hca_vport_context(mdev, 0, port, 0, rep);
 	if (err)
 		goto out;
@@ -1393,7 +1404,8 @@ static int mlx5_query_hca_port(struct ib_device *ibdev, u32 port,
 	if (dev->num_plane) {
 		props->port_cap_flags |= IB_PORT_SM_DISABLED;
 		props->port_cap_flags &= ~IB_PORT_SM;
-	}
+	} else if (ibdev->type == RDMA_DEVICE_TYPE_SMI)
+		props->port_cap_flags &= ~IB_PORT_CM_SUP;
 
 	props->gid_tbl_len	= mlx5_get_gid_table_len(MLX5_CAP_GEN(mdev, gid_table_size));
 	props->max_msg_sz	= 1 << MLX5_CAP_GEN(mdev, log_max_msg);
@@ -2843,7 +2855,8 @@ static int set_has_smi_cap(struct mlx5_ib_dev *dev)
 		if (dev->num_plane) {
 			dev->port_caps[port - 1].has_smi = false;
 			continue;
-		} else if (!MLX5_CAP_GEN(dev->mdev, ib_virt)) {
+		} else if (!MLX5_CAP_GEN(dev->mdev, ib_virt) ||
+			dev->ib_dev.type == RDMA_DEVICE_TYPE_SMI) {
 			dev->port_caps[port - 1].has_smi = true;
 			continue;
 		}
@@ -3057,6 +3070,8 @@ static u32 get_core_cap_flags(struct ib_device *ibdev,
 		return ret | RDMA_CORE_CAP_PROT_IB | RDMA_CORE_CAP_IB_MAD |
 			RDMA_CORE_CAP_IB_CM | RDMA_CORE_CAP_IB_SA |
 			RDMA_CORE_CAP_AF_IB;
+	else if (ibdev->type == RDMA_DEVICE_TYPE_SMI)
+		return ret | RDMA_CORE_CAP_IB_MAD | RDMA_CORE_CAP_IB_SMI;
 
 	if (ll == IB_LINK_LAYER_INFINIBAND)
 		return ret | RDMA_CORE_PORT_IBA_IB;
@@ -3093,6 +3108,9 @@ static int mlx5_port_immutable(struct ib_device *ibdev, u32 port_num,
 		return err;
 
 	if (ll == IB_LINK_LAYER_INFINIBAND) {
+		if (ibdev->type == RDMA_DEVICE_TYPE_SMI)
+			port_num = smi_to_native_portnum(dev, port_num);
+
 		err = mlx5_query_hca_vport_context(dev->mdev, 0, port_num, 0,
 						   &rep);
 		if (err)
@@ -3892,12 +3910,18 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
 	return err;
 }
 
+static struct ib_device *mlx5_ib_add_sub_dev(struct ib_device *parent,
+					     enum rdma_nl_dev_type type,
+					     const char *name);
+static void mlx5_ib_del_sub_dev(struct ib_device *sub_dev);
+
 static const struct ib_device_ops mlx5_ib_dev_ops = {
 	.owner = THIS_MODULE,
 	.driver_id = RDMA_DRIVER_MLX5,
 	.uverbs_abi_ver	= MLX5_IB_UVERBS_ABI_VERSION,
 
 	.add_gid = mlx5_ib_add_gid,
+	.add_sub_dev = mlx5_ib_add_sub_dev,
 	.alloc_mr = mlx5_ib_alloc_mr,
 	.alloc_mr_integrity = mlx5_ib_alloc_mr_integrity,
 	.alloc_pd = mlx5_ib_alloc_pd,
@@ -3912,6 +3936,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 	.dealloc_pd = mlx5_ib_dealloc_pd,
 	.dealloc_ucontext = mlx5_ib_dealloc_ucontext,
 	.del_gid = mlx5_ib_del_gid,
+	.del_sub_dev = mlx5_ib_del_sub_dev,
 	.dereg_mr = mlx5_ib_dereg_mr,
 	.destroy_ah = mlx5_ib_destroy_ah,
 	.destroy_cq = mlx5_ib_destroy_cq,
@@ -4201,7 +4226,9 @@ static int mlx5_ib_stage_ib_reg_init(struct mlx5_ib_dev *dev)
 {
 	const char *name;
 
-	if (!mlx5_lag_is_active(dev->mdev))
+	if (dev->sub_dev_name)
+		name = dev->sub_dev_name;
+	else if (!mlx5_lag_is_active(dev->mdev))
 		name = "mlx5_%d";
 	else
 		name = "mlx5_bond_%d";
@@ -4462,6 +4489,89 @@ const struct mlx5_ib_profile raw_eth_profile = {
 		     NULL),
 };
 
+static const struct mlx5_ib_profile plane_profile = {
+	STAGE_CREATE(MLX5_IB_STAGE_INIT,
+		     mlx5_ib_stage_init_init,
+		     mlx5_ib_stage_init_cleanup),
+	STAGE_CREATE(MLX5_IB_STAGE_CAPS,
+		     mlx5_ib_stage_caps_init,
+		     mlx5_ib_stage_caps_cleanup),
+	STAGE_CREATE(MLX5_IB_STAGE_NON_DEFAULT_CB,
+		     mlx5_ib_stage_non_default_cb,
+		     NULL),
+	STAGE_CREATE(MLX5_IB_STAGE_QP,
+		     mlx5_init_qp_table,
+		     mlx5_cleanup_qp_table),
+	STAGE_CREATE(MLX5_IB_STAGE_SRQ,
+		     mlx5_init_srq_table,
+		     mlx5_cleanup_srq_table),
+	STAGE_CREATE(MLX5_IB_STAGE_DEVICE_RESOURCES,
+		     mlx5_ib_dev_res_init,
+		     mlx5_ib_dev_res_cleanup),
+	STAGE_CREATE(MLX5_IB_STAGE_BFREG,
+		     mlx5_ib_stage_bfrag_init,
+		     mlx5_ib_stage_bfrag_cleanup),
+	STAGE_CREATE(MLX5_IB_STAGE_IB_REG,
+		     mlx5_ib_stage_ib_reg_init,
+		     mlx5_ib_stage_ib_reg_cleanup),
+};
+
+static struct ib_device *mlx5_ib_add_sub_dev(struct ib_device *parent,
+					     enum rdma_nl_dev_type type,
+					     const char *name)
+{
+	struct mlx5_ib_dev *mparent = to_mdev(parent), *mplane;
+	enum rdma_link_layer ll;
+	int ret;
+
+	if (mparent->smi_dev)
+		return ERR_PTR(-EEXIST);
+
+	ll = mlx5_port_type_cap_to_rdma_ll(MLX5_CAP_GEN(mparent->mdev,
+							port_type));
+	if (type != RDMA_DEVICE_TYPE_SMI || !mparent->num_plane ||
+	    ll != IB_LINK_LAYER_INFINIBAND ||
+	    !MLX5_CAP_GEN_2(mparent->mdev, multiplane_qp_ud))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	mplane = ib_alloc_device(mlx5_ib_dev, ib_dev);
+	if (!mplane)
+		return ERR_PTR(-ENOMEM);
+
+	mplane->port = kcalloc(mparent->num_plane * mparent->num_ports,
+			       sizeof(*mplane->port), GFP_KERNEL);
+	if (!mplane->port) {
+		ret = -ENOMEM;
+		goto fail_kcalloc;
+	}
+
+	mplane->ib_dev.type = type;
+	mplane->mdev = mparent->mdev;
+	mplane->num_ports = mparent->num_plane;
+	mplane->sub_dev_name = name;
+
+	ret = __mlx5_ib_add(mplane, &plane_profile);
+	if (ret)
+		goto fail_ib_add;
+
+	mparent->smi_dev = mplane;
+	return &mplane->ib_dev;
+
+fail_ib_add:
+	kfree(mplane->port);
+fail_kcalloc:
+	ib_dealloc_device(&mplane->ib_dev);
+	return ERR_PTR(ret);
+}
+
+static void mlx5_ib_del_sub_dev(struct ib_device *sub_dev)
+{
+	struct mlx5_ib_dev *mdev = to_mdev(sub_dev);
+
+	to_mdev(sub_dev->parent)->smi_dev = NULL;
+	__mlx5_ib_remove(mdev, mdev->profile, MLX5_IB_STAGE_MAX);
+}
+
 static int mlx5r_mp_probe(struct auxiliary_device *adev,
 			  const struct auxiliary_device_id *id)
 {
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index d97d6bc2dbaa..bf25ddb17bce 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1191,6 +1191,8 @@ struct mlx5_ib_dev {
 #endif
 
 	u8 num_plane;
+	struct mlx5_ib_dev *smi_dev;
+	const char *sub_dev_name;
 };
 
 static inline struct mlx5_ib_cq *to_mibcq(struct mlx5_core_cq *mcq)
@@ -1698,4 +1700,10 @@ static inline bool mlx5_umem_needs_ats(struct mlx5_ib_dev *dev,
 int set_roce_addr(struct mlx5_ib_dev *dev, u32 port_num,
 		  unsigned int index, const union ib_gid *gid,
 		  const struct ib_gid_attr *attr);
+
+static inline u32 smi_to_native_portnum(struct mlx5_ib_dev *dev, u32 port)
+{
+	return (port - 1) / dev->num_ports + 1;
+}
+
 #endif /* MLX5_IB_H */
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index be288cc7a3c0..66d9b44a6991 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -4219,7 +4219,12 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 
 	/* todo implement counter_index functionality */
 
-	if (is_sqp(qp->type))
+	if (dev->ib_dev.type == RDMA_DEVICE_TYPE_SMI && is_qp0(qp->type)) {
+		MLX5_SET(ads, pri_path, vhca_port_num,
+			 smi_to_native_portnum(dev, qp->port));
+		if (cur_state == IB_QPS_INIT && new_state == IB_QPS_RTR)
+			MLX5_SET(ads, pri_path, plane_index, qp->port);
+	} else if (is_sqp(qp->type))
 		MLX5_SET(ads, pri_path, vhca_port_num, qp->port);
 
 	if (attr_mask & IB_QP_PORT)
diff --git a/drivers/infiniband/hw/mlx5/qpc.c b/drivers/infiniband/hw/mlx5/qpc.c
index d9cf6982d645..d3dcc272200a 100644
--- a/drivers/infiniband/hw/mlx5/qpc.c
+++ b/drivers/infiniband/hw/mlx5/qpc.c
@@ -249,7 +249,8 @@ int mlx5_qpc_create_qp(struct mlx5_ib_dev *dev, struct mlx5_core_qp *qp,
 	if (err)
 		goto err_cmd;
 
-	mlx5_debug_qp_add(dev->mdev, qp);
+	if (dev->ib_dev.type != RDMA_DEVICE_TYPE_SMI)
+		mlx5_debug_qp_add(dev->mdev, qp);
 
 	return 0;
 
@@ -307,7 +308,8 @@ int mlx5_core_destroy_qp(struct mlx5_ib_dev *dev, struct mlx5_core_qp *qp)
 {
 	u32 in[MLX5_ST_SZ_DW(destroy_qp_in)] = {};
 
-	mlx5_debug_qp_remove(dev->mdev, qp);
+	if (dev->ib_dev.type != RDMA_DEVICE_TYPE_SMI)
+		mlx5_debug_qp_remove(dev->mdev, qp);
 
 	destroy_resource_common(dev, qp);
 
@@ -504,7 +506,9 @@ int mlx5_init_qp_table(struct mlx5_ib_dev *dev)
 	spin_lock_init(&table->lock);
 	INIT_RADIX_TREE(&table->tree, GFP_ATOMIC);
 	xa_init(&table->dct_xa);
-	mlx5_qp_debugfs_init(dev->mdev);
+
+	if (dev->ib_dev.type != RDMA_DEVICE_TYPE_SMI)
+		mlx5_qp_debugfs_init(dev->mdev);
 
 	table->nb.notifier_call = rsc_event_notifier;
 	mlx5_notifier_register(dev->mdev, &table->nb);
@@ -517,7 +521,8 @@ void mlx5_cleanup_qp_table(struct mlx5_ib_dev *dev)
 	struct mlx5_qp_table *table = &dev->qp_table;
 
 	mlx5_notifier_unregister(dev->mdev, &table->nb);
-	mlx5_qp_debugfs_cleanup(dev->mdev);
+	if (dev->ib_dev.type != RDMA_DEVICE_TYPE_SMI)
+		mlx5_qp_debugfs_cleanup(dev->mdev);
 }
 
 int mlx5_core_qp_query(struct mlx5_ib_dev *dev, struct mlx5_core_qp *qp,
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH rdma-next 08/12] RDMA/nldev: Add support to add/delete a sub IB device through netlink
  2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
                   ` (6 preceding siblings ...)
  2024-06-16 16:08 ` [PATCH rdma-next 07/12] RDMA/mlx5: Support plane device and driver APIs to add and delete it Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
  2024-06-16 16:08 ` [PATCH rdma-next 09/12] RDMA/nldev: Add support to dump device type and parent device if exists Leon Romanovsky
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan

From: Mark Zhang <markzhang@nvidia.com>

Add new netlink commands and attributes to support adding and deleting
a sub IB device with admin privilege.

Examples:
$ rdma dev add smi1 type SMI parent ibp8s0f1
$ rdma dev del smi1

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/nldev.c  | 59 ++++++++++++++++++++++++++++++++
 include/uapi/rdma/rdma_netlink.h |  6 ++++
 2 files changed, 65 insertions(+)

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index bc79ee630d8d..b5f87e7a1cfd 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -167,6 +167,7 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
 	[RDMA_NLDEV_ATTR_STAT_HWCOUNTER_DYNAMIC] = { .type = NLA_U8 },
 	[RDMA_NLDEV_SYS_ATTR_PRIVILEGED_QKEY_MODE] = { .type = NLA_U8 },
 	[RDMA_NLDEV_ATTR_DRIVER_DETAILS]	= { .type = NLA_U8 },
+	[RDMA_NLDEV_ATTR_DEV_TYPE]		= { .type = NLA_U8 },
 };
 
 static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -2548,6 +2549,56 @@ static int nldev_stat_get_counter_status_doit(struct sk_buff *skb,
 	return ret;
 }
 
+static int nldev_newdev(struct sk_buff *skb, struct nlmsghdr *nlh,
+			struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+	enum rdma_nl_dev_type type;
+	struct ib_device *parent;
+	char name[IFNAMSIZ] = {};
+	u32 parentid;
+	int ret;
+
+	ret = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
+			  nldev_policy, extack);
+	if (ret || !tb[RDMA_NLDEV_ATTR_DEV_INDEX] ||
+		!tb[RDMA_NLDEV_ATTR_DEV_NAME] || !tb[RDMA_NLDEV_ATTR_DEV_TYPE])
+		return -EINVAL;
+
+	nla_strscpy(name, tb[RDMA_NLDEV_ATTR_DEV_NAME], sizeof(name));
+	type = nla_get_u8(tb[RDMA_NLDEV_ATTR_DEV_TYPE]);
+	parentid = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+	parent = ib_device_get_by_index(sock_net(skb->sk), parentid);
+	if (!parent)
+		return -EINVAL;
+
+	ret = ib_add_sub_device(parent, type, name);
+	ib_device_put(parent);
+
+	return ret;
+}
+
+static int nldev_deldev(struct sk_buff *skb, struct nlmsghdr *nlh,
+			struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+	struct ib_device *device;
+	u32 devid;
+	int ret;
+
+	ret = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
+			  nldev_policy, extack);
+	if (ret || !tb[RDMA_NLDEV_ATTR_DEV_INDEX])
+		return -EINVAL;
+
+	devid = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+	device = ib_device_get_by_index(sock_net(skb->sk), devid);
+	if (!device)
+		return -EINVAL;
+
+	return ib_del_sub_device_and_put(device);
+}
+
 static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
 	[RDMA_NLDEV_CMD_GET] = {
 		.doit = nldev_get_doit,
@@ -2646,6 +2697,14 @@ static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
 	[RDMA_NLDEV_CMD_STAT_GET_STATUS] = {
 		.doit = nldev_stat_get_counter_status_doit,
 	},
+	[RDMA_NLDEV_CMD_NEWDEV] = {
+		.doit = nldev_newdev,
+		.flags = RDMA_NL_ADMIN_PERM,
+	},
+	[RDMA_NLDEV_CMD_DELDEV] = {
+		.doit = nldev_deldev,
+		.flags = RDMA_NL_ADMIN_PERM,
+	},
 };
 
 void __init nldev_init(void)
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index d15ee16be722..bd52fb325e22 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -301,6 +301,10 @@ enum rdma_nldev_command {
 
 	RDMA_NLDEV_CMD_RES_SRQ_GET_RAW,
 
+	RDMA_NLDEV_CMD_NEWDEV,
+
+	RDMA_NLDEV_CMD_DELDEV,
+
 	RDMA_NLDEV_NUM_OPS
 };
 
@@ -564,6 +568,8 @@ enum rdma_nldev_attr {
 	 */
 	RDMA_NLDEV_ATTR_RES_SUBTYPE,		/* string */
 
+	RDMA_NLDEV_ATTR_DEV_TYPE,		/* u8 */
+
 	/*
 	 * Always the end
 	 */
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH rdma-next 09/12] RDMA/nldev: Add support to dump device type and parent device if exists
  2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
                   ` (7 preceding siblings ...)
  2024-06-16 16:08 ` [PATCH rdma-next 08/12] RDMA/nldev: Add support to add/delete a sub IB device through netlink Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
  2024-06-16 16:08 ` [PATCH mlx5-next 10/12] RDMA/mlx5: Add plane index support when querying PTYS registers Leon Romanovsky
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan

From: Mark Zhang <markzhang@nvidia.com>

If a device has a specific type or a parent device, dump them as well.

Example:
$ rdma dev show smi1
3: smi1: node_type ca fw 20.38.1002 node_guid 9803:9b03:009f:d5ef sys_image_guid 9803:9b03:009f:d5ee type smi parent ibp8s0f1

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/nldev.c  | 10 ++++++++++
 include/uapi/rdma/rdma_netlink.h |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index b5f87e7a1cfd..025efce540a7 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -168,6 +168,7 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
 	[RDMA_NLDEV_SYS_ATTR_PRIVILEGED_QKEY_MODE] = { .type = NLA_U8 },
 	[RDMA_NLDEV_ATTR_DRIVER_DETAILS]	= { .type = NLA_U8 },
 	[RDMA_NLDEV_ATTR_DEV_TYPE]		= { .type = NLA_U8 },
+	[RDMA_NLDEV_ATTR_PARENT_NAME]		= { .type = NLA_NUL_STRING },
 };
 
 static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -302,6 +303,15 @@ static int fill_dev_info(struct sk_buff *msg, struct ib_device *device)
 	if (nla_put_u8(msg, RDMA_NLDEV_ATTR_DEV_DIM, device->use_cq_dim))
 		return -EMSGSIZE;
 
+	if (device->type &&
+	    nla_put_u8(msg, RDMA_NLDEV_ATTR_DEV_TYPE, device->type))
+		return -EMSGSIZE;
+
+	if (device->parent &&
+	    nla_put_string(msg, RDMA_NLDEV_ATTR_PARENT_NAME,
+			   dev_name(&device->parent->dev)))
+		return -EMSGSIZE;
+
 	/*
 	 * Link type is determined on first port and mlx4 device
 	 * which can potentially have two different link type for the same
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index bd52fb325e22..4b69242d7848 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -570,6 +570,8 @@ enum rdma_nldev_attr {
 
 	RDMA_NLDEV_ATTR_DEV_TYPE,		/* u8 */
 
+	RDMA_NLDEV_ATTR_PARENT_NAME,		/* string */
+
 	/*
 	 * Always the end
 	 */
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH mlx5-next 10/12] RDMA/mlx5: Add plane index support when querying PTYS registers
  2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
                   ` (8 preceding siblings ...)
  2024-06-16 16:08 ` [PATCH rdma-next 09/12] RDMA/nldev: Add support to dump device type and parent device if exists Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
  2024-06-16 16:08 ` [PATCH mlx5-next 11/12] net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports Leon Romanovsky
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mark Zhang, Eric Dumazet, Jakub Kicinski, linux-rdma, netdev,
	Paolo Abeni, Saeed Mahameed, Tariq Toukan

From: Mark Zhang <markzhang@nvidia.com>

Support the new "plane_ind" field when querying port PTYS registers.
This is needed when querying the rate of a plane port.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/main.c                    | 12 +++++++-----
 drivers/net/ethernet/mellanox/mlx5/core/en/port.c    |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |  2 +-
 .../net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/port.c       | 10 ++++++----
 include/linux/mlx5/port.h                            |  5 +++--
 6 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 3a653998bd88..4a0380e711ea 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -542,10 +542,10 @@ static int mlx5_query_port_roce(struct ib_device *device, u32 port_num,
 	 */
 	if (dev->is_rep)
 		err = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN,
-					   1);
+					   1, 0);
 	else
 		err = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN,
-					   mdev_port_num);
+					   mdev_port_num, 0);
 	if (err)
 		goto out;
 	ext = !!MLX5_GET_ETH_PROTO(ptys_reg, out, true, eth_proto_capability);
@@ -1372,11 +1372,11 @@ static int mlx5_query_hca_port(struct ib_device *ibdev, u32 port,
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 	struct mlx5_core_dev *mdev = dev->mdev;
 	struct mlx5_hca_vport_context *rep;
+	u8 vl_hw_cap, plane_index = 0;
 	u16 max_mtu;
 	u16 oper_mtu;
 	int err;
 	u16 ib_link_width_oper;
-	u8 vl_hw_cap;
 
 	rep = kzalloc(sizeof(*rep), GFP_KERNEL);
 	if (!rep) {
@@ -1386,8 +1386,10 @@ static int mlx5_query_hca_port(struct ib_device *ibdev, u32 port,
 
 	/* props being zeroed by the caller, avoid zeroing it here */
 
-	if (ibdev->type == RDMA_DEVICE_TYPE_SMI)
+	if (ibdev->type == RDMA_DEVICE_TYPE_SMI) {
+		plane_index = port;
 		port = smi_to_native_portnum(dev, port);
+	}
 
 	err = mlx5_query_hca_vport_context(mdev, 0, port, 0, rep);
 	if (err)
@@ -1419,7 +1421,7 @@ static int mlx5_query_hca_port(struct ib_device *ibdev, u32 port,
 		props->port_cap_flags2 = rep->cap_mask2;
 
 	err = mlx5_query_ib_port_oper(mdev, &ib_link_width_oper,
-				      &props->active_speed, port);
+				      &props->active_speed, port, plane_index);
 	if (err)
 		goto out;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
index b4efc780e297..5f6a0605e4ae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
@@ -41,7 +41,7 @@ void mlx5_port_query_eth_autoneg(struct mlx5_core_dev *dev, u8 *an_status,
 	*an_disable_cap = 0;
 	*an_disable_admin = 0;
 
-	if (mlx5_query_port_ptys(dev, out, sizeof(out), MLX5_PTYS_EN, 1))
+	if (mlx5_query_port_ptys(dev, out, sizeof(out), MLX5_PTYS_EN, 1, 0))
 		return;
 
 	*an_status = MLX5_GET(ptys_reg, out, an_status);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 3320f12ba2db..f57e0184c12b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1195,7 +1195,7 @@ static int mlx5e_ethtool_get_link_ksettings(struct mlx5e_priv *priv,
 	bool ext;
 	int err;
 
-	err = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN, 1);
+	err = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN, 1, 0);
 	if (err) {
 		netdev_err(priv->netdev, "%s: query port ptys failed: %d\n",
 			   __func__, err);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
index 779d92b762d3..b8aadbea1312 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
@@ -215,7 +215,7 @@ static int mlx5i_get_link_ksettings(struct net_device *netdev,
 	int speed, ret;
 
 	ret = mlx5_query_ib_port_oper(mdev, &ib_link_width_oper, &ib_proto_oper,
-				      1);
+				      1, 0);
 	if (ret)
 		return ret;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index 7fba1c46e2ac..50931584132b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -144,11 +144,13 @@ int mlx5_set_port_caps(struct mlx5_core_dev *dev, u8 port_num, u32 caps)
 EXPORT_SYMBOL_GPL(mlx5_set_port_caps);
 
 int mlx5_query_port_ptys(struct mlx5_core_dev *dev, u32 *ptys,
-			 int ptys_size, int proto_mask, u8 local_port)
+			 int ptys_size, int proto_mask,
+			 u8 local_port, u8 plane_index)
 {
 	u32 in[MLX5_ST_SZ_DW(ptys_reg)] = {0};
 
 	MLX5_SET(ptys_reg, in, local_port, local_port);
+	MLX5_SET(ptys_reg, in, plane_ind, plane_index);
 	MLX5_SET(ptys_reg, in, proto_mask, proto_mask);
 	return mlx5_core_access_reg(dev, in, sizeof(in), ptys,
 				    ptys_size, MLX5_REG_PTYS, 0, 0);
@@ -167,13 +169,13 @@ int mlx5_set_port_beacon(struct mlx5_core_dev *dev, u16 beacon_duration)
 }
 
 int mlx5_query_ib_port_oper(struct mlx5_core_dev *dev, u16 *link_width_oper,
-			    u16 *proto_oper, u8 local_port)
+			    u16 *proto_oper, u8 local_port, u8 plane_index)
 {
 	u32 out[MLX5_ST_SZ_DW(ptys_reg)];
 	int err;
 
 	err = mlx5_query_port_ptys(dev, out, sizeof(out), MLX5_PTYS_IB,
-				   local_port);
+				   local_port, plane_index);
 	if (err)
 		return err;
 
@@ -1114,7 +1116,7 @@ int mlx5_port_query_eth_proto(struct mlx5_core_dev *dev, u8 port, bool ext,
 	if (!eproto)
 		return -EINVAL;
 
-	err = mlx5_query_port_ptys(dev, out, sizeof(out), MLX5_PTYS_EN, port);
+	err = mlx5_query_port_ptys(dev, out, sizeof(out), MLX5_PTYS_EN, port, 0);
 	if (err)
 		return err;
 
diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h
index 26092c78a985..e68d42b8ce65 100644
--- a/include/linux/mlx5/port.h
+++ b/include/linux/mlx5/port.h
@@ -155,10 +155,11 @@ struct mlx5_port_eth_proto {
 
 int mlx5_set_port_caps(struct mlx5_core_dev *dev, u8 port_num, u32 caps);
 int mlx5_query_port_ptys(struct mlx5_core_dev *dev, u32 *ptys,
-			 int ptys_size, int proto_mask, u8 local_port);
+			 int ptys_size, int proto_mask,
+			 u8 local_port, u8 plane_index);
 
 int mlx5_query_ib_port_oper(struct mlx5_core_dev *dev, u16 *link_width_oper,
-			    u16 *proto_oper, u8 local_port);
+			    u16 *proto_oper, u8 local_port, u8 plane_index);
 void mlx5_toggle_port_link(struct mlx5_core_dev *dev);
 int mlx5_set_port_admin_status(struct mlx5_core_dev *dev,
 			       enum mlx5_port_status status);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH mlx5-next 11/12] net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports
  2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
                   ` (9 preceding siblings ...)
  2024-06-16 16:08 ` [PATCH mlx5-next 10/12] RDMA/mlx5: Add plane index support when querying PTYS registers Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
  2024-06-16 16:08 ` [PATCH mlx5-next 12/12] RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register Leon Romanovsky
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mark Zhang, Eric Dumazet, Jakub Kicinski, linux-rdma, netdev,
	Paolo Abeni, Saeed Mahameed, Tariq Toukan

From: Mark Zhang <markzhang@nvidia.com>

This patch adds new fields to support multi-plane and the extended port
counters group. Actual support will be added in the next patch.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/linux/mlx5/mlx5_ifc.h | 47 +++++++++++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 61738990e399..5fea7b747607 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2651,6 +2651,46 @@ struct mlx5_ifc_ib_port_cntrs_grp_data_layout_bits {
 	u8         port_xmit_wait[0x20];
 };
 
+struct mlx5_ifc_ib_ext_port_cntrs_grp_data_layout_bits {
+	u8         reserved_at_0[0x300];
+
+	u8         port_xmit_data_high[0x20];
+
+	u8         port_xmit_data_low[0x20];
+
+	u8         port_rcv_data_high[0x20];
+
+	u8         port_rcv_data_low[0x20];
+
+	u8         port_xmit_pkts_high[0x20];
+
+	u8         port_xmit_pkts_low[0x20];
+
+	u8         port_rcv_pkts_high[0x20];
+
+	u8         port_rcv_pkts_low[0x20];
+
+	u8         reserved_at_400[0x80];
+
+	u8         port_unicast_xmit_pkts_high[0x20];
+
+	u8         port_unicast_xmit_pkts_low[0x20];
+
+	u8         port_multicast_xmit_pkts_high[0x20];
+
+	u8         port_multicast_xmit_pkts_low[0x20];
+
+	u8         port_unicast_rcv_pkts_high[0x20];
+
+	u8         port_unicast_rcv_pkts_low[0x20];
+
+	u8         port_multicast_rcv_pkts_high[0x20];
+
+	u8         port_multicast_rcv_pkts_low[0x20];
+
+	u8         reserved_at_580[0x240];
+};
+
 struct mlx5_ifc_eth_per_tc_prio_grp_data_layout_bits {
 	u8         transmit_queue_high[0x20];
 
@@ -4543,6 +4583,7 @@ union mlx5_ifc_eth_cntrs_grp_data_layout_auto_bits {
 	struct mlx5_ifc_eth_per_tc_prio_grp_data_layout_bits eth_per_tc_prio_grp_data_layout;
 	struct mlx5_ifc_eth_per_tc_congest_prio_grp_data_layout_bits eth_per_tc_congest_prio_grp_data_layout;
 	struct mlx5_ifc_ib_port_cntrs_grp_data_layout_bits ib_port_cntrs_grp_data_layout;
+	struct mlx5_ifc_ib_ext_port_cntrs_grp_data_layout_bits ib_ext_port_cntrs_grp_data_layout;
 	struct mlx5_ifc_phys_layer_cntrs_bits phys_layer_cntrs;
 	struct mlx5_ifc_phys_layer_statistical_cntrs_bits phys_layer_statistical_cntrs;
 	u8         reserved_at_0[0x7c0];
@@ -9851,8 +9892,10 @@ struct mlx5_ifc_ppcnt_reg_bits {
 	u8         grp[0x6];
 
 	u8         clr[0x1];
-	u8         reserved_at_21[0x1c];
-	u8         prio_tc[0x3];
+	u8         reserved_at_21[0x13];
+	u8         plane_ind[0x4];
+	u8         reserved_at_38[0x3];
+	u8         prio_tc[0x5];
 
 	union mlx5_ifc_eth_cntrs_grp_data_layout_auto_bits counter_set;
 };
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH mlx5-next 12/12] RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register
  2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
                   ` (10 preceding siblings ...)
  2024-06-16 16:08 ` [PATCH mlx5-next 11/12] net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
  2024-06-28 16:00 ` [PATCH rdma-next 00/12] Multi-plane support for mlx5 Jason Gunthorpe
  2024-07-01 12:36 ` Leon Romanovsky
  13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mark Zhang, Eric Dumazet, Jakub Kicinski, linux-rdma, netdev,
	Paolo Abeni, Saeed Mahameed, Tariq Toukan

From: Mark Zhang <markzhang@nvidia.com>

Support per-plane port counters by querying the PPCNT register with the
"extended port counters" group, as the query_vport_counter command
doesn't support plane ports.
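
As a rough illustration of how the extended counters are consumed: each
counter is laid out as a "_high" and a "_low" 32-bit field, and the patch
reads the pair as one 64-bit value starting at the "_high" field. The
sketch below shows that combination in plain C. It assumes "_high" holds
the upper 32 bits, and it leaves out the final cpu_to_be64() conversion.
The patch shifts the two data counters right by 2; that this converts
octets into 4-octet units is an assumption, since the commit message
does not say so. All function names are hypothetical.

```c
#include <stdint.h>

/* Join the "_high" and "_low" halves of a counter into one 64-bit value. */
static uint64_t sketch_join_counter(uint32_t high, uint32_t low)
{
	return ((uint64_t)high << 32) | low;
}

/* Data counters: the patch shifts the joined value right by 2. */
static uint64_t sketch_data_counter(uint32_t high, uint32_t low)
{
	return sketch_join_counter(high, low) >> 2;
}
```

Packet counters would use the joined value as-is; only the xmit/rcv data
counters go through the shifted variant.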

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mad.c | 69 +++++++++++++++++++++++++++-----
 include/linux/mlx5/device.h      |  1 +
 2 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mad.c b/drivers/infiniband/hw/mlx5/mad.c
index ead836d159d3..1b6c5e37d169 100644
--- a/drivers/infiniband/hw/mlx5/mad.c
+++ b/drivers/infiniband/hw/mlx5/mad.c
@@ -147,8 +147,39 @@ static void pma_cnt_assign(struct ib_pma_portcounters *pma_cnt,
 			     vl_15_dropped);
 }
 
-static int query_ib_ppcnt(struct mlx5_core_dev *dev, u8 port_num, void *out,
-			  size_t sz)
+static void pma_cnt_ext_assign_ppcnt(struct ib_pma_portcounters_ext *cnt_ext,
+				     void *out)
+{
+	void *out_pma = MLX5_ADDR_OF(ppcnt_reg, out,
+				     counter_set);
+
+#define MLX5_GET_EXT_CNTR(counter_name)			\
+	MLX5_GET64(ib_ext_port_cntrs_grp_data_layout,	\
+		   out_pma, counter_name##_high)
+
+	cnt_ext->port_xmit_data =
+		cpu_to_be64(MLX5_GET_EXT_CNTR(port_xmit_data) >> 2);
+	cnt_ext->port_rcv_data =
+		cpu_to_be64(MLX5_GET_EXT_CNTR(port_rcv_data) >> 2);
+
+	cnt_ext->port_xmit_packets =
+		cpu_to_be64(MLX5_GET_EXT_CNTR(port_xmit_pkts));
+	cnt_ext->port_rcv_packets =
+		cpu_to_be64(MLX5_GET_EXT_CNTR(port_rcv_pkts));
+
+	cnt_ext->port_unicast_xmit_packets =
+		cpu_to_be64(MLX5_GET_EXT_CNTR(port_unicast_xmit_pkts));
+	cnt_ext->port_unicast_rcv_packets =
+		cpu_to_be64(MLX5_GET_EXT_CNTR(port_unicast_rcv_pkts));
+
+	cnt_ext->port_multicast_xmit_packets =
+		cpu_to_be64(MLX5_GET_EXT_CNTR(port_multicast_xmit_pkts));
+	cnt_ext->port_multicast_rcv_packets =
+		cpu_to_be64(MLX5_GET_EXT_CNTR(port_multicast_rcv_pkts));
+}
+
+static int query_ib_ppcnt(struct mlx5_core_dev *dev, u8 port_num, u8 plane_num,
+			  void *out, size_t sz, bool ext)
 {
 	u32 *in;
 	int err;
@@ -160,8 +191,14 @@ static int query_ib_ppcnt(struct mlx5_core_dev *dev, u8 port_num, void *out,
 	}
 
 	MLX5_SET(ppcnt_reg, in, local_port, port_num);
-
-	MLX5_SET(ppcnt_reg, in, grp, MLX5_INFINIBAND_PORT_COUNTERS_GROUP);
+	MLX5_SET(ppcnt_reg, in, plane_ind, plane_num);
+
+	if (ext)
+		MLX5_SET(ppcnt_reg, in, grp,
+			 MLX5_INFINIBAND_EXTENDED_PORT_COUNTERS_GROUP);
+	else
+		MLX5_SET(ppcnt_reg, in, grp,
+			 MLX5_INFINIBAND_PORT_COUNTERS_GROUP);
 	err = mlx5_core_access_reg(dev, in, sz, out,
 				   sz, MLX5_REG_PPCNT, 0, 0);
 
@@ -189,7 +226,8 @@ static int process_pma_cmd(struct mlx5_ib_dev *dev, u32 port_num,
 		mdev_port_num = 1;
 	}
 	if (MLX5_CAP_GEN(dev->mdev, num_ports) == 1 &&
-	    !mlx5_core_mp_enabled(mdev)) {
+	    !mlx5_core_mp_enabled(mdev) &&
+	    dev->ib_dev.type != RDMA_DEVICE_TYPE_SMI) {
 		/* set local port to one for Function-Per-Port HCA. */
 		mdev = dev->mdev;
 		mdev_port_num = 1;
@@ -208,7 +246,8 @@ static int process_pma_cmd(struct mlx5_ib_dev *dev, u32 port_num,
 	if (in_mad->mad_hdr.attr_id == IB_PMA_PORT_COUNTERS_EXT) {
 		struct ib_pma_portcounters_ext *pma_cnt_ext =
 			(struct ib_pma_portcounters_ext *)(out_mad->data + 40);
-		int sz = MLX5_ST_SZ_BYTES(query_vport_counter_out);
+		int sz = max(MLX5_ST_SZ_BYTES(query_vport_counter_out),
+			     MLX5_ST_SZ_BYTES(ppcnt_reg));
 
 		out_cnt = kvzalloc(sz, GFP_KERNEL);
 		if (!out_cnt) {
@@ -216,10 +255,18 @@ static int process_pma_cmd(struct mlx5_ib_dev *dev, u32 port_num,
 			goto done;
 		}
 
-		err = mlx5_core_query_vport_counter(mdev, 0, 0, mdev_port_num,
-						    out_cnt);
-		if (!err)
-			pma_cnt_ext_assign(pma_cnt_ext, out_cnt);
+		if (dev->ib_dev.type == RDMA_DEVICE_TYPE_SMI) {
+			err = query_ib_ppcnt(mdev, mdev_port_num,
+					     port_num, out_cnt, sz, 1);
+			if (!err)
+				pma_cnt_ext_assign_ppcnt(pma_cnt_ext, out_cnt);
+		} else {
+			err = mlx5_core_query_vport_counter(mdev, 0, 0,
+							    mdev_port_num,
+							    out_cnt);
+			if (!err)
+				pma_cnt_ext_assign(pma_cnt_ext, out_cnt);
+		}
 	} else {
 		struct ib_pma_portcounters *pma_cnt =
 			(struct ib_pma_portcounters *)(out_mad->data + 40);
@@ -231,7 +278,7 @@ static int process_pma_cmd(struct mlx5_ib_dev *dev, u32 port_num,
 			goto done;
 		}
 
-		err = query_ib_ppcnt(mdev, mdev_port_num, out_cnt, sz);
+		err = query_ib_ppcnt(mdev, mdev_port_num, 0, out_cnt, sz, 0);
 		if (!err)
 			pma_cnt_assign(pma_cnt, out_cnt);
 	}
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index d7bb31d9a446..68bd1b4737ea 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1466,6 +1466,7 @@ enum {
 	MLX5_PER_TRAFFIC_CLASS_CONGESTION_GROUP = 0x13,
 	MLX5_PHYSICAL_LAYER_STATISTICAL_GROUP = 0x16,
 	MLX5_INFINIBAND_PORT_COUNTERS_GROUP   = 0x20,
+	MLX5_INFINIBAND_EXTENDED_PORT_COUNTERS_GROUP = 0x21,
 };
 
 enum {
-- 
2.45.2
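
A note on the arithmetic in pma_cnt_ext_assign_ppcnt() above: the two data counters are shifted right by 2 and every value is stored with cpu_to_be64(). The sketch below illustrates that conversion in plain userspace C. It assumes the register reports data in octets while the PMA PortCountersExtended data fields count 32-bit words; that unit is inferred from the ">> 2" and is not stated in the patch. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an octet count into 32-bit-word units (assumed PMA data unit). */
static uint64_t octets_to_pma_words(uint64_t octets)
{
	return octets >> 2;
}

/*
 * Store a host-order 64-bit value as big-endian bytes. This is a portable
 * stand-in for what cpu_to_be64() achieves in the kernel.
 */
static void put_be64(uint64_t value, unsigned char out[8])
{
	int i;

	for (i = 0; i < 8; i++)
		out[i] = (unsigned char)(value >> (56 - 8 * i));
}
```

The packet counters in the patch are copied without the shift, only the data counters are rescaled.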



* Re: [PATCH rdma-next 00/12] Multi-plane support for mlx5
  2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
                   ` (11 preceding siblings ...)
  2024-06-16 16:08 ` [PATCH mlx5-next 12/12] RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register Leon Romanovsky
@ 2024-06-28 16:00 ` Jason Gunthorpe
  2024-07-01 12:36 ` Leon Romanovsky
  13 siblings, 0 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2024-06-28 16:00 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Leon Romanovsky, Eric Dumazet, Jakub Kicinski, linux-kernel,
	linux-rdma, Mark Zhang, netdev, Paolo Abeni, Saeed Mahameed,
	Tariq Toukan

On Sun, Jun 16, 2024 at 07:08:32PM +0300, Leon Romanovsky wrote:
> Mark Zhang (12):
>   RDMA/core: Create "issm*" device nodes only when SMI is supported
>   net/mlx5: mlx5_ifc update for multi-plane support
>   RDMA/mlx5: Add support to multi-plane device and port
>   RDMA/core: Support IB sub device with type "SMI"
>   RDMA: Set type of rdma_ah to IB for a SMI sub device
>   RDMA/core: Create GSI QP only when CM is supported
>   RDMA/mlx5: Support plane device and driver APIs to add and delete it
>   RDMA/nldev: Add support to add/delete a sub IB device through netlink
>   RDMA/nldev: Add support to dump device type and parent device if
>     exists
>   RDMA/mlx5: Add plane index support when querying PTYS registers
>   net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports
>   RDMA/mlx5: Support per-plane port IB counters by querying PPCNT
>     register

This all seems quite straightforward, Leon are you going to put this
on a shared branch with all the IFC stuff/etc?

Thanks,
Jason


* Re: [PATCH rdma-next 04/12] RDMA/core: Support IB sub device with type "SMI"
  2024-06-16 16:08 ` [PATCH rdma-next 04/12] RDMA/core: Support IB sub device with type "SMI" Leon Romanovsky
@ 2024-06-29  0:14   ` Zhu Yanjun
  2024-07-01 11:55     ` Leon Romanovsky
  0 siblings, 1 reply; 17+ messages in thread
From: Zhu Yanjun @ 2024-06-29  0:14 UTC (permalink / raw)
  To: Leon Romanovsky, Jason Gunthorpe
  Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan

On 2024/6/17 0:08, Leon Romanovsky wrote:
> From: Mark Zhang <markzhang@nvidia.com>
> 
> This patch adds 2 APIs, as well as driver operations to support adding
> and deleting an IB sub device, which provides part of the functionality
> of its parent.
> 
> A sub device has a type; for a sub device with type "SMI", it provides
> the smi capability through umad for its parent, meaning uverb is not
> supported.
> 
> A sub device cannot live without a parent. So when a parent is
> released, all its sub devices are released as well.
> 
> Signed-off-by: Mark Zhang <markzhang@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>   drivers/infiniband/core/device.c      | 68 +++++++++++++++++++++++++++
>   drivers/infiniband/core/uverbs_main.c |  3 +-
>   include/rdma/ib_verbs.h               | 43 +++++++++++++++++
>   include/uapi/rdma/rdma_netlink.h      |  5 ++
>   4 files changed, 118 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 55aa7aa32d4a..8547cab50b23 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -641,6 +641,11 @@ struct ib_device *_ib_alloc_device(size_t size)
>   		BIT_ULL(IB_USER_VERBS_CMD_REG_MR) |
>   		BIT_ULL(IB_USER_VERBS_CMD_REREG_MR) |
>   		BIT_ULL(IB_USER_VERBS_CMD_RESIZE_CQ);
> +
> +	mutex_init(&device->subdev_lock);
> +	INIT_LIST_HEAD(&device->subdev_list_head);
> +	INIT_LIST_HEAD(&device->subdev_list);
> +
>   	return device;
>   }
>   EXPORT_SYMBOL(_ib_alloc_device);
> @@ -1461,6 +1466,18 @@ EXPORT_SYMBOL(ib_register_device);
>   /* Callers must hold a get on the device. */
>   static void __ib_unregister_device(struct ib_device *ib_dev)
>   {
> +	struct ib_device *sub, *tmp;
> +
> +	mutex_lock(&ib_dev->subdev_lock);
> +	list_for_each_entry_safe_reverse(sub, tmp,
> +					 &ib_dev->subdev_list_head,
> +					 subdev_list) {
> +		list_del(&sub->subdev_list);
> +		ib_dev->ops.del_sub_dev(sub);
> +		ib_device_put(ib_dev);
> +	}
> +	mutex_unlock(&ib_dev->subdev_lock);
> +
>   	/*
>   	 * We have a registration lock so that all the calls to unregister are
>   	 * fully fenced, once any unregister returns the device is truely
> @@ -2597,6 +2614,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
>   		ops->uverbs_no_driver_id_binding;
>   
>   	SET_DEVICE_OP(dev_ops, add_gid);
> +	SET_DEVICE_OP(dev_ops, add_sub_dev);
>   	SET_DEVICE_OP(dev_ops, advise_mr);
>   	SET_DEVICE_OP(dev_ops, alloc_dm);
>   	SET_DEVICE_OP(dev_ops, alloc_hw_device_stats);
> @@ -2631,6 +2649,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
>   	SET_DEVICE_OP(dev_ops, dealloc_ucontext);
>   	SET_DEVICE_OP(dev_ops, dealloc_xrcd);
>   	SET_DEVICE_OP(dev_ops, del_gid);
> +	SET_DEVICE_OP(dev_ops, del_sub_dev);
>   	SET_DEVICE_OP(dev_ops, dereg_mr);
>   	SET_DEVICE_OP(dev_ops, destroy_ah);
>   	SET_DEVICE_OP(dev_ops, destroy_counters);
> @@ -2727,6 +2746,55 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
>   }
>   EXPORT_SYMBOL(ib_set_device_ops);
>   
> +int ib_add_sub_device(struct ib_device *parent,
> +		      enum rdma_nl_dev_type type,
> +		      const char *name)
> +{
> +	struct ib_device *sub;
> +	int ret = 0;
> +
> +	if (!parent->ops.add_sub_dev || !parent->ops.del_sub_dev)
> +		return -EOPNOTSUPP;
> +
> +	if (!ib_device_try_get(parent))
> +		return -EINVAL;
> +
> +	sub = parent->ops.add_sub_dev(parent, type, name);
> +	if (IS_ERR(sub)) {
> +		ib_device_put(parent);
> +		return PTR_ERR(sub);
> +	}
> +
> +	sub->type = type;
> +	sub->parent = parent;
> +
> +	mutex_lock(&parent->subdev_lock);
> +	list_add_tail(&parent->subdev_list_head, &sub->subdev_list);
> +	mutex_unlock(&parent->subdev_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(ib_add_sub_device);
> +
> +int ib_del_sub_device_and_put(struct ib_device *sub)
> +{
> +	struct ib_device *parent = sub->parent;
> +
> +	if (!parent)
> +		return -EOPNOTSUPP;
> +
> +	mutex_lock(&parent->subdev_lock);

A mutex_destroy() call for subdev_lock is missing. Once a mutex has been
initialized, mutex_destroy() should be called when it is no longer used.
Other mutexes in this file, for example unregistration_lock and
compat_devs_mutex, are destroyed in ib_device_release().

Perhaps subdev_lock could also be destroyed in ib_device_release()?

Zhu Yanjun

> +	list_del(&sub->subdev_list);
> +	mutex_unlock(&parent->subdev_lock);
> +
> +	ib_device_put(sub);
> +	parent->ops.del_sub_dev(sub);
> +	ib_device_put(parent);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(ib_del_sub_device_and_put);
> +
>   #ifdef CONFIG_INFINIBAND_VIRT_DMA
>   int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents)
>   {
> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> index 495d5a5d0373..bc099287de9a 100644
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -1114,7 +1114,8 @@ static int ib_uverbs_add_one(struct ib_device *device)
>   	struct ib_uverbs_device *uverbs_dev;
>   	int ret;
>   
> -	if (!device->ops.alloc_ucontext)
> +	if (!device->ops.alloc_ucontext ||
> +	    device->type == RDMA_DEVICE_TYPE_SMI)
>   		return -EOPNOTSUPP;
>   
>   	uverbs_dev = kzalloc(sizeof(*uverbs_dev), GFP_KERNEL);
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 477bf9dd5e71..bebc2d22f466 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -2661,6 +2661,18 @@ struct ib_device_ops {
>   	 */
>   	int (*get_numa_node)(struct ib_device *dev);
>   
> +	/**
> +	 * add_sub_dev - Add a sub IB device
> +	 */
> +	struct ib_device *(*add_sub_dev)(struct ib_device *parent,
> +					 enum rdma_nl_dev_type type,
> +					 const char *name);
> +
> +	/**
> +	 * del_sub_dev - Delete a sub IB device
> +	 */
> +	void (*del_sub_dev)(struct ib_device *sub_dev);
> +
>   	DECLARE_RDMA_OBJ_SIZE(ib_ah);
>   	DECLARE_RDMA_OBJ_SIZE(ib_counters);
>   	DECLARE_RDMA_OBJ_SIZE(ib_cq);
> @@ -2771,6 +2783,15 @@ struct ib_device {
>   	char iw_ifname[IFNAMSIZ];
>   	u32 iw_driver_flags;
>   	u32 lag_flags;
> +
> +	/* A parent device has a list of sub-devices */
> +	struct mutex subdev_lock;
> +	struct list_head subdev_list_head;
> +
> +	/* A sub device has a type and a parent */
> +	enum rdma_nl_dev_type type;
> +	struct ib_device *parent;
> +	struct list_head subdev_list;
>   };
>   
>   static inline void *rdma_zalloc_obj(struct ib_device *dev, size_t size,
> @@ -4820,4 +4841,26 @@ static inline u16 rdma_get_udp_sport(u32 fl, u32 lqpn, u32 rqpn)
>   
>   const struct ib_port_immutable*
>   ib_port_immutable_read(struct ib_device *dev, unsigned int port);
> +
> +/** ib_add_sub_device - Add a sub IB device on an existing one
> + *
> + * @parent: The IB device that needs to add a sub device
> + * @type: The type of the new sub device
> + * @name: The name of the new sub device
> + *
> + *
> + * Return 0 on success, an error code otherwise
> + */
> +int ib_add_sub_device(struct ib_device *parent,
> +		      enum rdma_nl_dev_type type,
> +		      const char *name);
> +
> +
> +/** ib_del_sub_device_and_put - Delete an IB sub device while holding a 'get'
> + *
> + * @sub: The sub device that is going to be deleted
> + *
> + * Return 0 on success, an error code otherwise
> + */
> +int ib_del_sub_device_and_put(struct ib_device *sub);
>   #endif /* IB_VERBS_H */
> diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
> index a214fc259f28..d15ee16be722 100644
> --- a/include/uapi/rdma/rdma_netlink.h
> +++ b/include/uapi/rdma/rdma_netlink.h
> @@ -602,4 +602,9 @@ enum rdma_nl_counter_mask {
>   	RDMA_COUNTER_MASK_QP_TYPE = 1,
>   	RDMA_COUNTER_MASK_PID = 1 << 1,
>   };
> +
> +/* Supported rdma device types. */
> +enum rdma_nl_dev_type {
> +	RDMA_DEVICE_TYPE_SMI = 1,
> +};
>   #endif /* _UAPI_RDMA_NETLINK_H */
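
For readers following the sub-device bookkeeping in the quoted patch, here is a minimal userspace sketch of a parent keeping its sub devices on an intrusive list. It is an illustration, not the kernel code: all toy_* names are hypothetical, locking and reference counting are left out, and toy_list_del() re-initializes the removed node instead of poisoning it. The one kernel fact it relies on is that list_add_tail() takes the new entry as its first argument and the list head as its second.

```c
#include <assert.h>
#include <stddef.h>

struct toy_list { struct toy_list *next, *prev; };

static void toy_list_init(struct toy_list *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert new_node right before head, i.e. at the tail of the list. */
static void toy_list_add_tail(struct toy_list *new_node, struct toy_list *head)
{
	new_node->prev = head->prev;
	new_node->next = head;
	head->prev->next = new_node;
	head->prev = new_node;
}

/* Unlink entry and leave it as an empty, self-pointing node. */
static void toy_list_del(struct toy_list *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	toy_list_init(entry);
}

struct toy_dev {
	struct toy_dev *parent;           /* set only for a sub device */
	struct toy_list subdev_list_head; /* parent side: list of sub devices */
	struct toy_list subdev_list;      /* sub side: link in parent's list */
};

static void toy_dev_init(struct toy_dev *d)
{
	d->parent = NULL;
	toy_list_init(&d->subdev_list_head);
	toy_list_init(&d->subdev_list);
}

static void toy_add_sub(struct toy_dev *parent, struct toy_dev *sub)
{
	sub->parent = parent;
	toy_list_add_tail(&sub->subdev_list, &parent->subdev_list_head);
}

static void toy_del_sub(struct toy_dev *sub)
{
	toy_list_del(&sub->subdev_list);
	sub->parent = NULL;
}

static size_t toy_count_subs(const struct toy_dev *parent)
{
	const struct toy_list *pos;
	size_t n = 0;

	for (pos = parent->subdev_list_head.next;
	     pos != &parent->subdev_list_head; pos = pos->next)
		n++;
	return n;
}
```

In the real code the list operations run under subdev_lock and each sub device holds a reference on its parent; neither is modeled here.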



* Re: [PATCH rdma-next 04/12] RDMA/core: Support IB sub device with type "SMI"
  2024-06-29  0:14   ` Zhu Yanjun
@ 2024-07-01 11:55     ` Leon Romanovsky
  0 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-07-01 11:55 UTC (permalink / raw)
  To: Zhu Yanjun
  Cc: Jason Gunthorpe, Mark Zhang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
	Tariq Toukan

On Sat, Jun 29, 2024 at 08:14:56AM +0800, Zhu Yanjun wrote:
> On 2024/6/17 0:08, Leon Romanovsky wrote:
> > From: Mark Zhang <markzhang@nvidia.com>
> > 
> > This patch adds 2 APIs, as well as driver operations to support adding
> > and deleting an IB sub device, which provides part of the functionality
> > of its parent.
> > 
> > A sub device has a type; for a sub device with type "SMI", it provides
> > the smi capability through umad for its parent, meaning uverb is not
> > supported.
> > 
> > A sub device cannot live without a parent. So when a parent is
> > released, all its sub devices are released as well.
> > 
> > Signed-off-by: Mark Zhang <markzhang@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >   drivers/infiniband/core/device.c      | 68 +++++++++++++++++++++++++++
> >   drivers/infiniband/core/uverbs_main.c |  3 +-
> >   include/rdma/ib_verbs.h               | 43 +++++++++++++++++
> >   include/uapi/rdma/rdma_netlink.h      |  5 ++
> >   4 files changed, 118 insertions(+), 1 deletion(-)

<...>

> > +int ib_del_sub_device_and_put(struct ib_device *sub)
> > +{
> > +	struct ib_device *parent = sub->parent;
> > +
> > +	if (!parent)
> > +		return -EOPNOTSUPP;
> > +
> > +	mutex_lock(&parent->subdev_lock);
> 
> A mutex_destroy() call for subdev_lock is missing. Once a mutex has been
> initialized, mutex_destroy() should be called when it is no longer used.
> Other mutexes in this file, for example unregistration_lock and
> compat_devs_mutex, are destroyed in ib_device_release().
> 
> Perhaps subdev_lock could also be destroyed in ib_device_release()?

Thanks, I will add this fixup to the series.

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 7aaf2b4c1844..7b418c717f29 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -503,6 +503,7 @@ static void ib_device_release(struct device *device)
                          rcu_head);
        }

+       mutex_destroy(&dev->subdev_lock);
        mutex_destroy(&dev->unregistration_lock);
        mutex_destroy(&dev->compat_devs_mutex);

Thanks
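
As an illustration of the init/destroy pairing discussed here, the following is a userspace analogue using POSIX threads, not kernel code. It only shows the pattern: the lock is initialized where the object is allocated and destroyed in the release path. struct toy_device and both functions are hypothetical names for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct toy_device {
	pthread_mutex_t subdev_lock;
	int nr_subdevs;
};

/* Allocation path: the lock is initialized together with the object. */
static struct toy_device *toy_device_alloc(void)
{
	struct toy_device *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	if (pthread_mutex_init(&dev->subdev_lock, NULL)) {
		free(dev);
		return NULL;
	}
	return dev;
}

/* Release path: destroy the lock before the memory goes away. */
static int toy_device_release(struct toy_device *dev)
{
	int ret = pthread_mutex_destroy(&dev->subdev_lock);

	free(dev);
	return ret;
}
```

Keeping the destroy call next to the other teardown steps makes it easier to see that every lock set up at allocation time is accounted for at release time.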

> 
> Zhu Yanjun
> 
> > +	list_del(&sub->subdev_list);
> > +	mutex_unlock(&parent->subdev_lock);
> > +
> > +	ib_device_put(sub);
> > +	parent->ops.del_sub_dev(sub);
> > +	ib_device_put(parent);
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL(ib_del_sub_device_and_put);
> > +
> >   #ifdef CONFIG_INFINIBAND_VIRT_DMA
> >   int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents)
> >   {
> > diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> > index 495d5a5d0373..bc099287de9a 100644
> > --- a/drivers/infiniband/core/uverbs_main.c
> > +++ b/drivers/infiniband/core/uverbs_main.c
> > @@ -1114,7 +1114,8 @@ static int ib_uverbs_add_one(struct ib_device *device)
> >   	struct ib_uverbs_device *uverbs_dev;
> >   	int ret;
> > -	if (!device->ops.alloc_ucontext)
> > +	if (!device->ops.alloc_ucontext ||
> > +	    device->type == RDMA_DEVICE_TYPE_SMI)
> >   		return -EOPNOTSUPP;
> >   	uverbs_dev = kzalloc(sizeof(*uverbs_dev), GFP_KERNEL);
> > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> > index 477bf9dd5e71..bebc2d22f466 100644
> > --- a/include/rdma/ib_verbs.h
> > +++ b/include/rdma/ib_verbs.h
> > @@ -2661,6 +2661,18 @@ struct ib_device_ops {
> >   	 */
> >   	int (*get_numa_node)(struct ib_device *dev);
> > +	/**
> > +	 * add_sub_dev - Add a sub IB device
> > +	 */
> > +	struct ib_device *(*add_sub_dev)(struct ib_device *parent,
> > +					 enum rdma_nl_dev_type type,
> > +					 const char *name);
> > +
> > +	/**
> > +	 * del_sub_dev - Delete a sub IB device
> > +	 */
> > +	void (*del_sub_dev)(struct ib_device *sub_dev);
> > +
> >   	DECLARE_RDMA_OBJ_SIZE(ib_ah);
> >   	DECLARE_RDMA_OBJ_SIZE(ib_counters);
> >   	DECLARE_RDMA_OBJ_SIZE(ib_cq);
> > @@ -2771,6 +2783,15 @@ struct ib_device {
> >   	char iw_ifname[IFNAMSIZ];
> >   	u32 iw_driver_flags;
> >   	u32 lag_flags;
> > +
> > +	/* A parent device has a list of sub-devices */
> > +	struct mutex subdev_lock;
> > +	struct list_head subdev_list_head;
> > +
> > +	/* A sub device has a type and a parent */
> > +	enum rdma_nl_dev_type type;
> > +	struct ib_device *parent;
> > +	struct list_head subdev_list;
> >   };
> >   static inline void *rdma_zalloc_obj(struct ib_device *dev, size_t size,
> > @@ -4820,4 +4841,26 @@ static inline u16 rdma_get_udp_sport(u32 fl, u32 lqpn, u32 rqpn)
> >   const struct ib_port_immutable*
> >   ib_port_immutable_read(struct ib_device *dev, unsigned int port);
> > +
> > +/** ib_add_sub_device - Add a sub IB device on an existing one
> > + *
> > + * @parent: The IB device that needs to add a sub device
> > + * @type: The type of the new sub device
> > + * @name: The name of the new sub device
> > + *
> > + *
> > + * Return 0 on success, an error code otherwise
> > + */
> > +int ib_add_sub_device(struct ib_device *parent,
> > +		      enum rdma_nl_dev_type type,
> > +		      const char *name);
> > +
> > +
> > +/** ib_del_sub_device_and_put - Delete an IB sub device while holding a 'get'
> > + *
> > + * @sub: The sub device that is going to be deleted
> > + *
> > + * Return 0 on success, an error code otherwise
> > + */
> > +int ib_del_sub_device_and_put(struct ib_device *sub);
> >   #endif /* IB_VERBS_H */
> > diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
> > index a214fc259f28..d15ee16be722 100644
> > --- a/include/uapi/rdma/rdma_netlink.h
> > +++ b/include/uapi/rdma/rdma_netlink.h
> > @@ -602,4 +602,9 @@ enum rdma_nl_counter_mask {
> >   	RDMA_COUNTER_MASK_QP_TYPE = 1,
> >   	RDMA_COUNTER_MASK_PID = 1 << 1,
> >   };
> > +
> > +/* Supported rdma device types. */
> > +enum rdma_nl_dev_type {
> > +	RDMA_DEVICE_TYPE_SMI = 1,
> > +};
> >   #endif /* _UAPI_RDMA_NETLINK_H */
> 


* Re: [PATCH rdma-next 00/12] Multi-plane support for mlx5
  2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
                   ` (12 preceding siblings ...)
  2024-06-28 16:00 ` [PATCH rdma-next 00/12] Multi-plane support for mlx5 Jason Gunthorpe
@ 2024-07-01 12:36 ` Leon Romanovsky
  13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-07-01 12:36 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky
  Cc: Eric Dumazet, Jakub Kicinski, linux-kernel, linux-rdma,
	Mark Zhang, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan


On Sun, 16 Jun 2024 19:08:32 +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> From Mark,
> 
> This patchset adds support to IB sub device and mlx5 implementation.
> 
> An IB sub device provides a subset of the functionality of its parent.
> Currently type "SMI" is supported: an SMI device provides the SMI (QP0)
> interface and shares the same VPort with its parent. It allows the subnet
> manager to configure the VPort through this interface when the parent
> doesn't support SMI.
> 
> [...]

Applied, thanks!

[01/12] RDMA/core: Create "issm*" device nodes only when SMI is supported
        https://git.kernel.org/rdma/rdma/c/50660c5197f52b
[02/12] net/mlx5: mlx5_ifc update for multi-plane support
        https://git.kernel.org/rdma/rdma/c/65528cfb21fdb6
[03/12] RDMA/mlx5: Add support to multi-plane device and port
        https://git.kernel.org/rdma/rdma/c/2a5db20fa53219
[04/12] RDMA/core: Support IB sub device with type "SMI"
        https://git.kernel.org/rdma/rdma/c/f3b5c2b823fbd8
[05/12] RDMA: Set type of rdma_ah to IB for a SMI sub device
        https://git.kernel.org/rdma/rdma/c/66862e38a557b3
[06/12] RDMA/core: Create GSI QP only when CM is supported
        https://git.kernel.org/rdma/rdma/c/6d4498d1745128
[07/12] RDMA/mlx5: Support plane device and driver APIs to add and delete it
        https://git.kernel.org/rdma/rdma/c/39351acd72e775
[08/12] RDMA/nldev: Add support to add/delete a sub IB device through netlink
        https://git.kernel.org/rdma/rdma/c/201dfa2d8129a6
[09/12] RDMA/nldev: Add support to dump device type and parent device if exists
        https://git.kernel.org/rdma/rdma/c/1bc00c7c0ae33e
[10/12] RDMA/mlx5: Add plane index support when querying PTYS registers
        https://git.kernel.org/rdma/rdma/c/d6caf3986716c3
[11/12] net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports
        https://git.kernel.org/rdma/rdma/c/db9e43f6580613
[12/12] RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register
        https://git.kernel.org/rdma/rdma/c/ac3a5e5f01eb40

Best regards,
-- 
Leon Romanovsky <leonro@nvidia.com>



Thread overview: 17+ messages
2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
2024-06-16 16:08 ` [PATCH rdma-next 01/12] RDMA/core: Create "issm*" device nodes only when SMI is supported Leon Romanovsky
2024-06-16 16:08 ` [PATCH mlx5-next 02/12] net/mlx5: mlx5_ifc update for multi-plane support Leon Romanovsky
2024-06-16 16:08 ` [PATCH mlx5-next 03/12] RDMA/mlx5: Add support to multi-plane device and port Leon Romanovsky
2024-06-16 16:08 ` [PATCH rdma-next 04/12] RDMA/core: Support IB sub device with type "SMI" Leon Romanovsky
2024-06-29  0:14   ` Zhu Yanjun
2024-07-01 11:55     ` Leon Romanovsky
2024-06-16 16:08 ` [PATCH rdma-next 05/12] RDMA: Set type of rdma_ah to IB for a SMI sub device Leon Romanovsky
2024-06-16 16:08 ` [PATCH rdma-next 06/12] RDMA/core: Create GSI QP only when CM is supported Leon Romanovsky
2024-06-16 16:08 ` [PATCH rdma-next 07/12] RDMA/mlx5: Support plane device and driver APIs to add and delete it Leon Romanovsky
2024-06-16 16:08 ` [PATCH rdma-next 08/12] RDMA/nldev: Add support to add/delete a sub IB device through netlink Leon Romanovsky
2024-06-16 16:08 ` [PATCH rdma-next 09/12] RDMA/nldev: Add support to dump device type and parent device if exists Leon Romanovsky
2024-06-16 16:08 ` [PATCH mlx5-next 10/12] RDMA/mlx5: Add plane index support when querying PTYS registers Leon Romanovsky
2024-06-16 16:08 ` [PATCH mlx5-next 11/12] net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports Leon Romanovsky
2024-06-16 16:08 ` [PATCH mlx5-next 12/12] RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register Leon Romanovsky
2024-06-28 16:00 ` [PATCH rdma-next 00/12] Multi-plane support for mlx5 Jason Gunthorpe
2024-07-01 12:36 ` Leon Romanovsky
