* [PATCH rdma-next 01/12] RDMA/core: Create "issm*" device nodes only when SMI is supported
2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
2024-06-16 16:08 ` [PATCH mlx5-next 02/12] net/mlx5: mlx5_ifc update for multi-plane support Leon Romanovsky
` (12 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan
From: Mark Zhang <markzhang@nvidia.com>
For an IB port, create its issm device node only when the port has SMI
capability. In the following patches, mlx5 is going to support IB devices
without this capability.
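A minimal userspace sketch of what this means for umad consumers (the
/dev/infiniband/issm<N> path naming below follows the usual ib_umad
convention; the helper itself is only illustrative):
  #include <errno.h>
  #include <fcntl.h>
  #include <stdio.h>
  /* Open the issm node for umad device <dev_num>; after this change the
   * node simply does not exist on ports without SMI capability.
   */
  static int open_issm(int dev_num)
  {
          char path[64];
          int fd;
          snprintf(path, sizeof(path), "/dev/infiniband/issm%d", dev_num);
          fd = open(path, O_RDWR);
          if (fd < 0 && errno == ENOENT)
                  fprintf(stderr, "%s: port has no SMI capability\n", path);
          return fd;
  }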
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/user_mad.c | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 2ed749f50a29..f760dfffa188 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -1321,15 +1321,17 @@ static int ib_umad_init_port(struct ib_device *device, int port_num,
if (ret)
goto err_cdev;
- ib_umad_init_port_dev(&port->sm_dev, port, device);
- port->sm_dev.devt = base_issm;
- dev_set_name(&port->sm_dev, "issm%d", port->dev_num);
- cdev_init(&port->sm_cdev, &umad_sm_fops);
- port->sm_cdev.owner = THIS_MODULE;
-
- ret = cdev_device_add(&port->sm_cdev, &port->sm_dev);
- if (ret)
- goto err_dev;
+ if (rdma_cap_ib_smi(device, port_num)) {
+ ib_umad_init_port_dev(&port->sm_dev, port, device);
+ port->sm_dev.devt = base_issm;
+ dev_set_name(&port->sm_dev, "issm%d", port->dev_num);
+ cdev_init(&port->sm_cdev, &umad_sm_fops);
+ port->sm_cdev.owner = THIS_MODULE;
+
+ ret = cdev_device_add(&port->sm_cdev, &port->sm_dev);
+ if (ret)
+ goto err_dev;
+ }
return 0;
@@ -1345,9 +1347,13 @@ static int ib_umad_init_port(struct ib_device *device, int port_num,
static void ib_umad_kill_port(struct ib_umad_port *port)
{
struct ib_umad_file *file;
+ bool has_smi = false;
int id;
- cdev_device_del(&port->sm_cdev, &port->sm_dev);
+ if (rdma_cap_ib_smi(port->ib_dev, port->port_num)) {
+ cdev_device_del(&port->sm_cdev, &port->sm_dev);
+ has_smi = true;
+ }
cdev_device_del(&port->cdev, &port->dev);
mutex_lock(&port->file_mutex);
@@ -1373,7 +1379,8 @@ static void ib_umad_kill_port(struct ib_umad_port *port)
ida_free(&umad_ida, port->dev_num);
/* balances device_initialize() */
- put_device(&port->sm_dev);
+ if (has_smi)
+ put_device(&port->sm_dev);
put_device(&port->dev);
}
--
2.45.2
* [PATCH mlx5-next 02/12] net/mlx5: mlx5_ifc update for multi-plane support
2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
2024-06-16 16:08 ` [PATCH rdma-next 01/12] RDMA/core: Create "issm*" device nodes only when SMI is supported Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
2024-06-16 16:08 ` [PATCH mlx5-next 03/12] RDMA/mlx5: Add support to multi-plane device and port Leon Romanovsky
` (11 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Zhang, Eric Dumazet, Jakub Kicinski, linux-rdma, netdev,
Paolo Abeni, Saeed Mahameed, Tariq Toukan
From: Mark Zhang <markzhang@nvidia.com>
Add new fields to support the mlx5 multi-plane feature. Actual support
will be added in the following patches.
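The layout changes are pure bit bookkeeping: each new field takes its
width out of an existing reserved range, so the offsets of the fields
that follow do not move. A standalone check of that arithmetic for two
of the hunks below (illustrative only):
  #include <assert.h>
  int main(void)
  {
          /* hca_vport_context: reserved_at_104[0xc] is split into
           * reserved_at_104[0x4] + num_port_plane[0x8]
           */
          assert(0x4 + 0x8 == 0xc);
          /* cmd_hca_cap_2: reserved_at_ca[0x6] is split into
           * multiplane_qp_ud[0x1] + reserved_at_cb[0x5]
           */
          assert(0x1 + 0x5 == 0x6);
          return 0;
  }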
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
include/linux/mlx5/mlx5_ifc.h | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 09d9d87d62c6..61738990e399 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -793,7 +793,7 @@ struct mlx5_ifc_ads_bits {
u8 reserved_at_2[0xe];
u8 pkey_index[0x10];
- u8 reserved_at_20[0x8];
+ u8 plane_index[0x8];
u8 grh[0x1];
u8 mlid[0x7];
u8 rlid[0x10];
@@ -1990,7 +1990,8 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
u8 reserved_at_c0[0x8];
u8 migration_multi_load[0x1];
u8 migration_tracking_state[0x1];
- u8 reserved_at_ca[0x6];
+ u8 multiplane_qp_ud[0x1];
+ u8 reserved_at_cb[0x5];
u8 migration_in_chunks[0x1];
u8 reserved_at_d1[0xf];
@@ -4172,7 +4173,8 @@ struct mlx5_ifc_hca_vport_context_bits {
u8 has_smi[0x1];
u8 has_raw[0x1];
u8 grh_required[0x1];
- u8 reserved_at_104[0xc];
+ u8 reserved_at_104[0x4];
+ u8 num_port_plane[0x8];
u8 port_physical_state[0x4];
u8 vport_state_policy[0x4];
u8 port_state[0x4];
@@ -7692,7 +7694,7 @@ struct mlx5_ifc_mad_ifc_in_bits {
u8 op_mod[0x10];
u8 remote_lid[0x10];
- u8 reserved_at_50[0x8];
+ u8 plane_index[0x8];
u8 port[0x8];
u8 reserved_at_60[0x20];
@@ -9621,7 +9623,9 @@ struct mlx5_ifc_ptys_reg_bits {
u8 an_disable_cap[0x1];
u8 reserved_at_3[0x5];
u8 local_port[0x8];
- u8 reserved_at_10[0xd];
+ u8 reserved_at_10[0x8];
+ u8 plane_ind[0x4];
+ u8 reserved_at_1c[0x1];
u8 proto_mask[0x3];
u8 an_status[0x4];
--
2.45.2
* [PATCH mlx5-next 03/12] RDMA/mlx5: Add support to multi-plane device and port
2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
2024-06-16 16:08 ` [PATCH rdma-next 01/12] RDMA/core: Create "issm*" device nodes only when SMI is supported Leon Romanovsky
2024-06-16 16:08 ` [PATCH mlx5-next 02/12] net/mlx5: mlx5_ifc update for multi-plane support Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
2024-06-16 16:08 ` [PATCH rdma-next 04/12] RDMA/core: Support IB sub device with type "SMI" Leon Romanovsky
` (10 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Zhang, Eric Dumazet, Jakub Kicinski, linux-rdma, netdev,
Paolo Abeni, Saeed Mahameed, Tariq Toukan
From: Mark Zhang <markzhang@nvidia.com>
When multi-plane is supported, a logical port, which is an aggregation of
multiple physical plane ports, is exposed for data transmission.
Compared with a normal mlx5 IB port, this logical port supports all
functionality except Subnet Management.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/main.c | 60 ++++++++++++++++---
drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 +
.../net/ethernet/mellanox/mlx5/core/vport.c | 1 +
include/linux/mlx5/driver.h | 1 +
4 files changed, 55 insertions(+), 9 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index a7003316d438..55eb60715b48 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1388,7 +1388,13 @@ static int mlx5_query_hca_port(struct ib_device *ibdev, u32 port,
props->sm_sl = rep->sm_sl;
props->state = rep->vport_state;
props->phys_state = rep->port_physical_state;
- props->port_cap_flags = rep->cap_mask1;
+
+ props->port_cap_flags = rep->cap_mask1;
+ if (dev->num_plane) {
+ props->port_cap_flags |= IB_PORT_SM_DISABLED;
+ props->port_cap_flags &= ~IB_PORT_SM;
+ }
+
props->gid_tbl_len = mlx5_get_gid_table_len(MLX5_CAP_GEN(mdev, gid_table_size));
props->max_msg_sz = 1 << MLX5_CAP_GEN(mdev, log_max_msg);
props->pkey_tbl_len = mlx5_to_sw_pkey_sz(MLX5_CAP_GEN(mdev, pkey_table_size));
@@ -2807,6 +2813,23 @@ static int mlx5_ib_event_slave_port(struct notifier_block *nb,
return NOTIFY_OK;
}
+static int mlx5_ib_get_plane_num(struct mlx5_core_dev *mdev, u8 *num_plane)
+{
+ struct mlx5_hca_vport_context vport_ctx;
+ int err;
+
+ *num_plane = 0;
+ if (!MLX5_CAP_GEN(mdev, ib_virt))
+ return 0;
+
+ err = mlx5_query_hca_vport_context(mdev, 0, 1, 0, &vport_ctx);
+ if (err)
+ return err;
+
+ *num_plane = vport_ctx.num_plane;
+ return 0;
+}
+
static int set_has_smi_cap(struct mlx5_ib_dev *dev)
{
struct mlx5_hca_vport_context vport_ctx;
@@ -2817,10 +2840,14 @@ static int set_has_smi_cap(struct mlx5_ib_dev *dev)
return 0;
for (port = 1; port <= dev->num_ports; port++) {
- if (!MLX5_CAP_GEN(dev->mdev, ib_virt)) {
+ if (dev->num_plane) {
+ dev->port_caps[port - 1].has_smi = false;
+ continue;
+ } else if (!MLX5_CAP_GEN(dev->mdev, ib_virt)) {
dev->port_caps[port - 1].has_smi = true;
continue;
}
+
err = mlx5_query_hca_vport_context(dev->mdev, 0, port, 0,
&vport_ctx);
if (err) {
@@ -3026,6 +3053,11 @@ static u32 get_core_cap_flags(struct ib_device *ibdev,
if (rep->grh_required)
ret |= RDMA_CORE_CAP_IB_GRH_REQUIRED;
+ if (dev->num_plane)
+ return ret | RDMA_CORE_CAP_PROT_IB | RDMA_CORE_CAP_IB_MAD |
+ RDMA_CORE_CAP_IB_CM | RDMA_CORE_CAP_IB_SA |
+ RDMA_CORE_CAP_AF_IB;
+
if (ll == IB_LINK_LAYER_INFINIBAND)
return ret | RDMA_CORE_PORT_IBA_IB;
@@ -4507,11 +4539,18 @@ static int mlx5r_probe(struct auxiliary_device *adev,
dev = ib_alloc_device(mlx5_ib_dev, ib_dev);
if (!dev)
return -ENOMEM;
+
+ if (ll == IB_LINK_LAYER_INFINIBAND) {
+ ret = mlx5_ib_get_plane_num(mdev, &dev->num_plane);
+ if (ret)
+ goto fail;
+ }
+
dev->port = kcalloc(num_ports, sizeof(*dev->port),
GFP_KERNEL);
if (!dev->port) {
- ib_dealloc_device(&dev->ib_dev);
- return -ENOMEM;
+ ret = -ENOMEM;
+ goto fail;
}
dev->mdev = mdev;
@@ -4523,14 +4562,17 @@ static int mlx5r_probe(struct auxiliary_device *adev,
profile = &pf_profile;
ret = __mlx5_ib_add(dev, profile);
- if (ret) {
- kfree(dev->port);
- ib_dealloc_device(&dev->ib_dev);
- return ret;
- }
+ if (ret)
+ goto fail_ib_add;
auxiliary_set_drvdata(adev, dev);
return 0;
+
+fail_ib_add:
+ kfree(dev->port);
+fail:
+ ib_dealloc_device(&dev->ib_dev);
+ return ret;
}
static void mlx5r_remove(struct auxiliary_device *adev)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index a6f2b679a7e9..d97d6bc2dbaa 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1189,6 +1189,8 @@ struct mlx5_ib_dev {
#ifdef CONFIG_MLX5_MACSEC
struct mlx5_macsec macsec;
#endif
+
+ u8 num_plane;
};
static inline struct mlx5_ib_cq *to_mibcq(struct mlx5_core_cq *mcq)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 1005bb6935b6..0d5f750faa45 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -737,6 +737,7 @@ int mlx5_query_hca_vport_context(struct mlx5_core_dev *dev,
rep->grh_required = MLX5_GET_PR(hca_vport_context, ctx, grh_required);
rep->sys_image_guid = MLX5_GET64_PR(hca_vport_context, ctx,
system_image_guid);
+ rep->num_plane = MLX5_GET_PR(hca_vport_context, ctx, num_port_plane);
ex:
kvfree(out);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 145e2fb1b832..2889ece6c808 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -917,6 +917,7 @@ struct mlx5_hca_vport_context {
u16 qkey_violation_counter;
u16 pkey_violation_counter;
bool grh_required;
+ u8 num_plane;
};
#define STRUCT_FIELD(header, field) \
--
2.45.2
* [PATCH rdma-next 04/12] RDMA/core: Support IB sub device with type "SMI"
2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
` (2 preceding siblings ...)
2024-06-16 16:08 ` [PATCH mlx5-next 03/12] RDMA/mlx5: Add support to multi-plane device and port Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
2024-06-29 0:14 ` Zhu Yanjun
2024-06-16 16:08 ` [PATCH rdma-next 05/12] RDMA: Set type of rdma_ah to IB for a SMI sub device Leon Romanovsky
` (9 subsequent siblings)
13 siblings, 1 reply; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan
From: Mark Zhang <markzhang@nvidia.com>
This patch adds two APIs, as well as driver operations, to support adding
and deleting an IB sub device, which provides part of the functionality of
its parent.
A sub device has a type; a sub device of type "SMI" provides the SMI
capability through umad for its parent, meaning uverbs is not supported.
A sub device cannot live without a parent, so when a parent is released,
all its sub devices are released as well.
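A rough sketch of how a provider driver is expected to wire the new ops
(the my_drv_* names and the private structure are placeholders, and the
registration details are elided; the mlx5 implementation later in this
series follows this shape):
  #include <rdma/ib_verbs.h>
  /* Hypothetical driver-private structure, for illustration only. */
  struct my_drv_dev {
          struct ib_device ib_dev;
  };
  static struct ib_device *my_drv_add_sub_dev(struct ib_device *parent,
                                              enum rdma_nl_dev_type type,
                                              const char *name)
  {
          struct my_drv_dev *sub;
          /* Only the SMI type introduced by this series is handled here. */
          if (type != RDMA_DEVICE_TYPE_SMI)
                  return ERR_PTR(-EOPNOTSUPP);
          sub = ib_alloc_device(my_drv_dev, ib_dev);
          if (!sub)
                  return ERR_PTR(-ENOMEM);
          /* ... set up ports and ops, then register the device under "name";
           * the core fills in ->type and ->parent once this callback returns.
           */
          return &sub->ib_dev;
  }
  static void my_drv_del_sub_dev(struct ib_device *sub_dev)
  {
          /* Unregister and free everything allocated in add_sub_dev. */
          ib_dealloc_device(sub_dev);
  }
  static const struct ib_device_ops my_drv_ops = {
          .add_sub_dev = my_drv_add_sub_dev,
          .del_sub_dev = my_drv_del_sub_dev,
  };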
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/device.c | 68 +++++++++++++++++++++++++++
drivers/infiniband/core/uverbs_main.c | 3 +-
include/rdma/ib_verbs.h | 43 +++++++++++++++++
include/uapi/rdma/rdma_netlink.h | 5 ++
4 files changed, 118 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 55aa7aa32d4a..8547cab50b23 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -641,6 +641,11 @@ struct ib_device *_ib_alloc_device(size_t size)
BIT_ULL(IB_USER_VERBS_CMD_REG_MR) |
BIT_ULL(IB_USER_VERBS_CMD_REREG_MR) |
BIT_ULL(IB_USER_VERBS_CMD_RESIZE_CQ);
+
+ mutex_init(&device->subdev_lock);
+ INIT_LIST_HEAD(&device->subdev_list_head);
+ INIT_LIST_HEAD(&device->subdev_list);
+
return device;
}
EXPORT_SYMBOL(_ib_alloc_device);
@@ -1461,6 +1466,18 @@ EXPORT_SYMBOL(ib_register_device);
/* Callers must hold a get on the device. */
static void __ib_unregister_device(struct ib_device *ib_dev)
{
+ struct ib_device *sub, *tmp;
+
+ mutex_lock(&ib_dev->subdev_lock);
+ list_for_each_entry_safe_reverse(sub, tmp,
+ &ib_dev->subdev_list_head,
+ subdev_list) {
+ list_del(&sub->subdev_list);
+ ib_dev->ops.del_sub_dev(sub);
+ ib_device_put(ib_dev);
+ }
+ mutex_unlock(&ib_dev->subdev_lock);
+
/*
* We have a registration lock so that all the calls to unregister are
* fully fenced, once any unregister returns the device is truely
@@ -2597,6 +2614,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
ops->uverbs_no_driver_id_binding;
SET_DEVICE_OP(dev_ops, add_gid);
+ SET_DEVICE_OP(dev_ops, add_sub_dev);
SET_DEVICE_OP(dev_ops, advise_mr);
SET_DEVICE_OP(dev_ops, alloc_dm);
SET_DEVICE_OP(dev_ops, alloc_hw_device_stats);
@@ -2631,6 +2649,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, dealloc_ucontext);
SET_DEVICE_OP(dev_ops, dealloc_xrcd);
SET_DEVICE_OP(dev_ops, del_gid);
+ SET_DEVICE_OP(dev_ops, del_sub_dev);
SET_DEVICE_OP(dev_ops, dereg_mr);
SET_DEVICE_OP(dev_ops, destroy_ah);
SET_DEVICE_OP(dev_ops, destroy_counters);
@@ -2727,6 +2746,55 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
}
EXPORT_SYMBOL(ib_set_device_ops);
+int ib_add_sub_device(struct ib_device *parent,
+ enum rdma_nl_dev_type type,
+ const char *name)
+{
+ struct ib_device *sub;
+ int ret = 0;
+
+ if (!parent->ops.add_sub_dev || !parent->ops.del_sub_dev)
+ return -EOPNOTSUPP;
+
+ if (!ib_device_try_get(parent))
+ return -EINVAL;
+
+ sub = parent->ops.add_sub_dev(parent, type, name);
+ if (IS_ERR(sub)) {
+ ib_device_put(parent);
+ return PTR_ERR(sub);
+ }
+
+ sub->type = type;
+ sub->parent = parent;
+
+ mutex_lock(&parent->subdev_lock);
+ list_add_tail(&parent->subdev_list_head, &sub->subdev_list);
+ mutex_unlock(&parent->subdev_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL(ib_add_sub_device);
+
+int ib_del_sub_device_and_put(struct ib_device *sub)
+{
+ struct ib_device *parent = sub->parent;
+
+ if (!parent)
+ return -EOPNOTSUPP;
+
+ mutex_lock(&parent->subdev_lock);
+ list_del(&sub->subdev_list);
+ mutex_unlock(&parent->subdev_lock);
+
+ ib_device_put(sub);
+ parent->ops.del_sub_dev(sub);
+ ib_device_put(parent);
+
+ return 0;
+}
+EXPORT_SYMBOL(ib_del_sub_device_and_put);
+
#ifdef CONFIG_INFINIBAND_VIRT_DMA
int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents)
{
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 495d5a5d0373..bc099287de9a 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -1114,7 +1114,8 @@ static int ib_uverbs_add_one(struct ib_device *device)
struct ib_uverbs_device *uverbs_dev;
int ret;
- if (!device->ops.alloc_ucontext)
+ if (!device->ops.alloc_ucontext ||
+ device->type == RDMA_DEVICE_TYPE_SMI)
return -EOPNOTSUPP;
uverbs_dev = kzalloc(sizeof(*uverbs_dev), GFP_KERNEL);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 477bf9dd5e71..bebc2d22f466 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2661,6 +2661,18 @@ struct ib_device_ops {
*/
int (*get_numa_node)(struct ib_device *dev);
+ /**
+ * add_sub_dev - Add a sub IB device
+ */
+ struct ib_device *(*add_sub_dev)(struct ib_device *parent,
+ enum rdma_nl_dev_type type,
+ const char *name);
+
+ /**
+ * del_sub_dev - Delete a sub IB device
+ */
+ void (*del_sub_dev)(struct ib_device *sub_dev);
+
DECLARE_RDMA_OBJ_SIZE(ib_ah);
DECLARE_RDMA_OBJ_SIZE(ib_counters);
DECLARE_RDMA_OBJ_SIZE(ib_cq);
@@ -2771,6 +2783,15 @@ struct ib_device {
char iw_ifname[IFNAMSIZ];
u32 iw_driver_flags;
u32 lag_flags;
+
+ /* A parent device has a list of sub-devices */
+ struct mutex subdev_lock;
+ struct list_head subdev_list_head;
+
+ /* A sub device has a type and a parent */
+ enum rdma_nl_dev_type type;
+ struct ib_device *parent;
+ struct list_head subdev_list;
};
static inline void *rdma_zalloc_obj(struct ib_device *dev, size_t size,
@@ -4820,4 +4841,26 @@ static inline u16 rdma_get_udp_sport(u32 fl, u32 lqpn, u32 rqpn)
const struct ib_port_immutable*
ib_port_immutable_read(struct ib_device *dev, unsigned int port);
+
+/** ib_add_sub_device - Add a sub IB device on an existing one
+ *
+ * @parent: The IB device that needs to add a sub device
+ * @type: The type of the new sub device
+ * @name: The name of the new sub device
+ *
+ *
+ * Return 0 on success, an error code otherwise
+ */
+int ib_add_sub_device(struct ib_device *parent,
+ enum rdma_nl_dev_type type,
+ const char *name);
+
+
+/** ib_del_sub_device_and_put - Delete an IB sub device while holding a 'get'
+ *
+ * @sub: The sub device that is going to be deleted
+ *
+ * Return 0 on success, an error code otherwise
+ */
+int ib_del_sub_device_and_put(struct ib_device *sub);
#endif /* IB_VERBS_H */
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index a214fc259f28..d15ee16be722 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -602,4 +602,9 @@ enum rdma_nl_counter_mask {
RDMA_COUNTER_MASK_QP_TYPE = 1,
RDMA_COUNTER_MASK_PID = 1 << 1,
};
+
+/* Supported rdma device types. */
+enum rdma_nl_dev_type {
+ RDMA_DEVICE_TYPE_SMI = 1,
+};
#endif /* _UAPI_RDMA_NETLINK_H */
--
2.45.2
* Re: [PATCH rdma-next 04/12] RDMA/core: Support IB sub device with type "SMI"
2024-06-16 16:08 ` [PATCH rdma-next 04/12] RDMA/core: Support IB sub device with type "SMI" Leon Romanovsky
@ 2024-06-29 0:14 ` Zhu Yanjun
2024-07-01 11:55 ` Leon Romanovsky
0 siblings, 1 reply; 17+ messages in thread
From: Zhu Yanjun @ 2024-06-29 0:14 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe
Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan
On 2024/6/17 0:08, Leon Romanovsky wrote:
> From: Mark Zhang <markzhang@nvidia.com>
>
> This patch adds two APIs, as well as driver operations, to support adding
> and deleting an IB sub device, which provides part of the functionality
> of its parent.
>
> A sub device has a type; a sub device of type "SMI" provides the SMI
> capability through umad for its parent, meaning uverbs is not
> supported.
>
> A sub device cannot live without a parent, so when a parent is
> released, all its sub devices are released as well.
>
> Signed-off-by: Mark Zhang <markzhang@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> drivers/infiniband/core/device.c | 68 +++++++++++++++++++++++++++
> drivers/infiniband/core/uverbs_main.c | 3 +-
> include/rdma/ib_verbs.h | 43 +++++++++++++++++
> include/uapi/rdma/rdma_netlink.h | 5 ++
> 4 files changed, 118 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 55aa7aa32d4a..8547cab50b23 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -641,6 +641,11 @@ struct ib_device *_ib_alloc_device(size_t size)
> BIT_ULL(IB_USER_VERBS_CMD_REG_MR) |
> BIT_ULL(IB_USER_VERBS_CMD_REREG_MR) |
> BIT_ULL(IB_USER_VERBS_CMD_RESIZE_CQ);
> +
> + mutex_init(&device->subdev_lock);
> + INIT_LIST_HEAD(&device->subdev_list_head);
> + INIT_LIST_HEAD(&device->subdev_list);
> +
> return device;
> }
> EXPORT_SYMBOL(_ib_alloc_device);
> @@ -1461,6 +1466,18 @@ EXPORT_SYMBOL(ib_register_device);
> /* Callers must hold a get on the device. */
> static void __ib_unregister_device(struct ib_device *ib_dev)
> {
> + struct ib_device *sub, *tmp;
> +
> + mutex_lock(&ib_dev->subdev_lock);
> + list_for_each_entry_safe_reverse(sub, tmp,
> + &ib_dev->subdev_list_head,
> + subdev_list) {
> + list_del(&sub->subdev_list);
> + ib_dev->ops.del_sub_dev(sub);
> + ib_device_put(ib_dev);
> + }
> + mutex_unlock(&ib_dev->subdev_lock);
> +
> /*
> * We have a registration lock so that all the calls to unregister are
> * fully fenced, once any unregister returns the device is truely
> @@ -2597,6 +2614,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
> ops->uverbs_no_driver_id_binding;
>
> SET_DEVICE_OP(dev_ops, add_gid);
> + SET_DEVICE_OP(dev_ops, add_sub_dev);
> SET_DEVICE_OP(dev_ops, advise_mr);
> SET_DEVICE_OP(dev_ops, alloc_dm);
> SET_DEVICE_OP(dev_ops, alloc_hw_device_stats);
> @@ -2631,6 +2649,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
> SET_DEVICE_OP(dev_ops, dealloc_ucontext);
> SET_DEVICE_OP(dev_ops, dealloc_xrcd);
> SET_DEVICE_OP(dev_ops, del_gid);
> + SET_DEVICE_OP(dev_ops, del_sub_dev);
> SET_DEVICE_OP(dev_ops, dereg_mr);
> SET_DEVICE_OP(dev_ops, destroy_ah);
> SET_DEVICE_OP(dev_ops, destroy_counters);
> @@ -2727,6 +2746,55 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
> }
> EXPORT_SYMBOL(ib_set_device_ops);
>
> +int ib_add_sub_device(struct ib_device *parent,
> + enum rdma_nl_dev_type type,
> + const char *name)
> +{
> + struct ib_device *sub;
> + int ret = 0;
> +
> + if (!parent->ops.add_sub_dev || !parent->ops.del_sub_dev)
> + return -EOPNOTSUPP;
> +
> + if (!ib_device_try_get(parent))
> + return -EINVAL;
> +
> + sub = parent->ops.add_sub_dev(parent, type, name);
> + if (IS_ERR(sub)) {
> + ib_device_put(parent);
> + return PTR_ERR(sub);
> + }
> +
> + sub->type = type;
> + sub->parent = parent;
> +
> + mutex_lock(&parent->subdev_lock);
> + list_add_tail(&parent->subdev_list_head, &sub->subdev_list);
> + mutex_unlock(&parent->subdev_lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(ib_add_sub_device);
> +
> +int ib_del_sub_device_and_put(struct ib_device *sub)
> +{
> + struct ib_device *parent = sub->parent;
> +
> + if (!parent)
> + return -EOPNOTSUPP;
> +
> + mutex_lock(&parent->subdev_lock);
mutex_destroy of subdev_lock is missing. When a mutex is initialized and
used, mutex_destroy should be called once the lock is no longer needed.
Other mutexes in this file, for example unregistration_lock and
compat_devs_mutex, have mutex_destroy called on them in ib_device_release.
Perhaps mutex_destroy could also be called for subdev_lock in
ib_device_release?
Zhu Yanjun
> + list_del(&sub->subdev_list);
> + mutex_unlock(&parent->subdev_lock);
> +
> + ib_device_put(sub);
> + parent->ops.del_sub_dev(sub);
> + ib_device_put(parent);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(ib_del_sub_device_and_put);
> +
> #ifdef CONFIG_INFINIBAND_VIRT_DMA
> int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents)
> {
> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> index 495d5a5d0373..bc099287de9a 100644
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -1114,7 +1114,8 @@ static int ib_uverbs_add_one(struct ib_device *device)
> struct ib_uverbs_device *uverbs_dev;
> int ret;
>
> - if (!device->ops.alloc_ucontext)
> + if (!device->ops.alloc_ucontext ||
> + device->type == RDMA_DEVICE_TYPE_SMI)
> return -EOPNOTSUPP;
>
> uverbs_dev = kzalloc(sizeof(*uverbs_dev), GFP_KERNEL);
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 477bf9dd5e71..bebc2d22f466 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -2661,6 +2661,18 @@ struct ib_device_ops {
> */
> int (*get_numa_node)(struct ib_device *dev);
>
> + /**
> + * add_sub_dev - Add a sub IB device
> + */
> + struct ib_device *(*add_sub_dev)(struct ib_device *parent,
> + enum rdma_nl_dev_type type,
> + const char *name);
> +
> + /**
> + * del_sub_dev - Delete a sub IB device
> + */
> + void (*del_sub_dev)(struct ib_device *sub_dev);
> +
> DECLARE_RDMA_OBJ_SIZE(ib_ah);
> DECLARE_RDMA_OBJ_SIZE(ib_counters);
> DECLARE_RDMA_OBJ_SIZE(ib_cq);
> @@ -2771,6 +2783,15 @@ struct ib_device {
> char iw_ifname[IFNAMSIZ];
> u32 iw_driver_flags;
> u32 lag_flags;
> +
> + /* A parent device has a list of sub-devices */
> + struct mutex subdev_lock;
> + struct list_head subdev_list_head;
> +
> + /* A sub device has a type and a parent */
> + enum rdma_nl_dev_type type;
> + struct ib_device *parent;
> + struct list_head subdev_list;
> };
>
> static inline void *rdma_zalloc_obj(struct ib_device *dev, size_t size,
> @@ -4820,4 +4841,26 @@ static inline u16 rdma_get_udp_sport(u32 fl, u32 lqpn, u32 rqpn)
>
> const struct ib_port_immutable*
> ib_port_immutable_read(struct ib_device *dev, unsigned int port);
> +
> +/** ib_add_sub_device - Add a sub IB device on an existing one
> + *
> + * @parent: The IB device that needs to add a sub device
> + * @type: The type of the new sub device
> + * @name: The name of the new sub device
> + *
> + *
> + * Return 0 on success, an error code otherwise
> + */
> +int ib_add_sub_device(struct ib_device *parent,
> + enum rdma_nl_dev_type type,
> + const char *name);
> +
> +
> +/** ib_del_sub_device_and_put - Delete an IB sub device while holding a 'get'
> + *
> + * @sub: The sub device that is going to be deleted
> + *
> + * Return 0 on success, an error code otherwise
> + */
> +int ib_del_sub_device_and_put(struct ib_device *sub);
> #endif /* IB_VERBS_H */
> diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
> index a214fc259f28..d15ee16be722 100644
> --- a/include/uapi/rdma/rdma_netlink.h
> +++ b/include/uapi/rdma/rdma_netlink.h
> @@ -602,4 +602,9 @@ enum rdma_nl_counter_mask {
> RDMA_COUNTER_MASK_QP_TYPE = 1,
> RDMA_COUNTER_MASK_PID = 1 << 1,
> };
> +
> +/* Supported rdma device types. */
> +enum rdma_nl_dev_type {
> + RDMA_DEVICE_TYPE_SMI = 1,
> +};
> #endif /* _UAPI_RDMA_NETLINK_H */
* Re: [PATCH rdma-next 04/12] RDMA/core: Support IB sub device with type "SMI"
2024-06-29 0:14 ` Zhu Yanjun
@ 2024-07-01 11:55 ` Leon Romanovsky
0 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-07-01 11:55 UTC (permalink / raw)
To: Zhu Yanjun
Cc: Jason Gunthorpe, Mark Zhang, David S. Miller, Eric Dumazet,
Jakub Kicinski, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan
On Sat, Jun 29, 2024 at 08:14:56AM +0800, Zhu Yanjun wrote:
> On 2024/6/17 0:08, Leon Romanovsky wrote:
> > From: Mark Zhang <markzhang@nvidia.com>
> >
> > This patch adds two APIs, as well as driver operations, to support adding
> > and deleting an IB sub device, which provides part of the functionality
> > of its parent.
> >
> > A sub device has a type; a sub device of type "SMI" provides the SMI
> > capability through umad for its parent, meaning uverbs is not
> > supported.
> >
> > A sub device cannot live without a parent, so when a parent is
> > released, all its sub devices are released as well.
> >
> > Signed-off-by: Mark Zhang <markzhang@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> > drivers/infiniband/core/device.c | 68 +++++++++++++++++++++++++++
> > drivers/infiniband/core/uverbs_main.c | 3 +-
> > include/rdma/ib_verbs.h | 43 +++++++++++++++++
> > include/uapi/rdma/rdma_netlink.h | 5 ++
> > 4 files changed, 118 insertions(+), 1 deletion(-)
<...>
> > +int ib_del_sub_device_and_put(struct ib_device *sub)
> > +{
> > + struct ib_device *parent = sub->parent;
> > +
> > + if (!parent)
> > + return -EOPNOTSUPP;
> > +
> > + mutex_lock(&parent->subdev_lock);
>
> mutex_destroy of subdev_lock is missing. When a mutex is initialized and
> used, mutex_destroy should be called once the lock is no longer needed.
> Other mutexes in this file, for example unregistration_lock and
> compat_devs_mutex, have mutex_destroy called on them in ib_device_release.
>
> Perhaps mutex_destroy could also be called for subdev_lock in
> ib_device_release?
Thanks, I will add this fixup to the series.
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 7aaf2b4c1844..7b418c717f29 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -503,6 +503,7 @@ static void ib_device_release(struct device *device)
rcu_head);
}
+ mutex_destroy(&dev->subdev_lock);
mutex_destroy(&dev->unregistration_lock);
mutex_destroy(&dev->compat_devs_mutex);
Thanks
>
> Zhu Yanjun
>
> > + list_del(&sub->subdev_list);
> > + mutex_unlock(&parent->subdev_lock);
> > +
> > + ib_device_put(sub);
> > + parent->ops.del_sub_dev(sub);
> > + ib_device_put(parent);
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL(ib_del_sub_device_and_put);
> > +
> > #ifdef CONFIG_INFINIBAND_VIRT_DMA
> > int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents)
> > {
> > diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> > index 495d5a5d0373..bc099287de9a 100644
> > --- a/drivers/infiniband/core/uverbs_main.c
> > +++ b/drivers/infiniband/core/uverbs_main.c
> > @@ -1114,7 +1114,8 @@ static int ib_uverbs_add_one(struct ib_device *device)
> > struct ib_uverbs_device *uverbs_dev;
> > int ret;
> > - if (!device->ops.alloc_ucontext)
> > + if (!device->ops.alloc_ucontext ||
> > + device->type == RDMA_DEVICE_TYPE_SMI)
> > return -EOPNOTSUPP;
> > uverbs_dev = kzalloc(sizeof(*uverbs_dev), GFP_KERNEL);
> > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> > index 477bf9dd5e71..bebc2d22f466 100644
> > --- a/include/rdma/ib_verbs.h
> > +++ b/include/rdma/ib_verbs.h
> > @@ -2661,6 +2661,18 @@ struct ib_device_ops {
> > */
> > int (*get_numa_node)(struct ib_device *dev);
> > + /**
> > + * add_sub_dev - Add a sub IB device
> > + */
> > + struct ib_device *(*add_sub_dev)(struct ib_device *parent,
> > + enum rdma_nl_dev_type type,
> > + const char *name);
> > +
> > + /**
> > + * del_sub_dev - Delete a sub IB device
> > + */
> > + void (*del_sub_dev)(struct ib_device *sub_dev);
> > +
> > DECLARE_RDMA_OBJ_SIZE(ib_ah);
> > DECLARE_RDMA_OBJ_SIZE(ib_counters);
> > DECLARE_RDMA_OBJ_SIZE(ib_cq);
> > @@ -2771,6 +2783,15 @@ struct ib_device {
> > char iw_ifname[IFNAMSIZ];
> > u32 iw_driver_flags;
> > u32 lag_flags;
> > +
> > + /* A parent device has a list of sub-devices */
> > + struct mutex subdev_lock;
> > + struct list_head subdev_list_head;
> > +
> > + /* A sub device has a type and a parent */
> > + enum rdma_nl_dev_type type;
> > + struct ib_device *parent;
> > + struct list_head subdev_list;
> > };
> > static inline void *rdma_zalloc_obj(struct ib_device *dev, size_t size,
> > @@ -4820,4 +4841,26 @@ static inline u16 rdma_get_udp_sport(u32 fl, u32 lqpn, u32 rqpn)
> > const struct ib_port_immutable*
> > ib_port_immutable_read(struct ib_device *dev, unsigned int port);
> > +
> > +/** ib_add_sub_device - Add a sub IB device on an existing one
> > + *
> > + * @parent: The IB device that needs to add a sub device
> > + * @type: The type of the new sub device
> > + * @name: The name of the new sub device
> > + *
> > + *
> > + * Return 0 on success, an error code otherwise
> > + */
> > +int ib_add_sub_device(struct ib_device *parent,
> > + enum rdma_nl_dev_type type,
> > + const char *name);
> > +
> > +
> > +/** ib_del_sub_device_and_put - Delete an IB sub device while holding a 'get'
> > + *
> > + * @sub: The sub device that is going to be deleted
> > + *
> > + * Return 0 on success, an error code otherwise
> > + */
> > +int ib_del_sub_device_and_put(struct ib_device *sub);
> > #endif /* IB_VERBS_H */
> > diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
> > index a214fc259f28..d15ee16be722 100644
> > --- a/include/uapi/rdma/rdma_netlink.h
> > +++ b/include/uapi/rdma/rdma_netlink.h
> > @@ -602,4 +602,9 @@ enum rdma_nl_counter_mask {
> > RDMA_COUNTER_MASK_QP_TYPE = 1,
> > RDMA_COUNTER_MASK_PID = 1 << 1,
> > };
> > +
> > +/* Supported rdma device types. */
> > +enum rdma_nl_dev_type {
> > + RDMA_DEVICE_TYPE_SMI = 1,
> > +};
> > #endif /* _UAPI_RDMA_NETLINK_H */
>
* [PATCH rdma-next 05/12] RDMA: Set type of rdma_ah to IB for a SMI sub device
2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
` (3 preceding siblings ...)
2024-06-16 16:08 ` [PATCH rdma-next 04/12] RDMA/core: Support IB sub device with type "SMI" Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
2024-06-16 16:08 ` [PATCH rdma-next 06/12] RDMA/core: Create GSI QP only when CM is supported Leon Romanovsky
` (8 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan
From: Mark Zhang <markzhang@nvidia.com>
An address handle created on an SMI port has type IB, as an SMI
port is used for SMI management through umad.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
include/rdma/ib_verbs.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index bebc2d22f466..c20571618798 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4660,6 +4660,8 @@ static inline enum rdma_ah_attr_type rdma_ah_find_type(struct ib_device *dev,
return RDMA_AH_ATTR_TYPE_OPA;
return RDMA_AH_ATTR_TYPE_IB;
}
+ if (dev->type == RDMA_DEVICE_TYPE_SMI)
+ return RDMA_AH_ATTR_TYPE_IB;
return RDMA_AH_ATTR_TYPE_UNDEFINED;
}
--
2.45.2
* [PATCH rdma-next 06/12] RDMA/core: Create GSI QP only when CM is supported
2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
` (4 preceding siblings ...)
2024-06-16 16:08 ` [PATCH rdma-next 05/12] RDMA: Set type of rdma_ah to IB for a SMI sub device Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
2024-06-16 16:08 ` [PATCH rdma-next 07/12] RDMA/mlx5: Support plane device and driver APIs to add and delete it Leon Romanovsky
` (7 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan
From: Mark Zhang <markzhang@nvidia.com>
A GSI QP is not needed if the port doesn't support connection management.
In the following patches, mlx5 is going to support IB ports that don't
support CM.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/agent.c | 32 ++++++++++++++++++++++----------
drivers/infiniband/core/mad.c | 9 ++++++---
2 files changed, 28 insertions(+), 13 deletions(-)
diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index f82b4260de42..3bb46696731e 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -59,7 +59,16 @@ __ib_get_agent_port(const struct ib_device *device, int port_num)
struct ib_agent_port_private *entry;
list_for_each_entry(entry, &ib_agent_port_list, port_list) {
- if (entry->agent[1]->device == device &&
+ /* Need to check both agent[0] and agent[1], as an agent port
+ * may only have one of them
+ */
+ if (entry->agent[0] &&
+ entry->agent[0]->device == device &&
+ entry->agent[0]->port_num == port_num)
+ return entry;
+
+ if (entry->agent[1] &&
+ entry->agent[1]->device == device &&
entry->agent[1]->port_num == port_num)
return entry;
}
@@ -172,14 +181,16 @@ int ib_agent_port_open(struct ib_device *device, int port_num)
}
}
- /* Obtain send only MAD agent for GSI QP */
- port_priv->agent[1] = ib_register_mad_agent(device, port_num,
- IB_QPT_GSI, NULL, 0,
- &agent_send_handler,
- NULL, NULL, 0);
- if (IS_ERR(port_priv->agent[1])) {
- ret = PTR_ERR(port_priv->agent[1]);
- goto error3;
+ if (rdma_cap_ib_cm(device, port_num)) {
+ /* Obtain send only MAD agent for GSI QP */
+ port_priv->agent[1] = ib_register_mad_agent(device, port_num,
+ IB_QPT_GSI, NULL, 0,
+ &agent_send_handler,
+ NULL, NULL, 0);
+ if (IS_ERR(port_priv->agent[1])) {
+ ret = PTR_ERR(port_priv->agent[1]);
+ goto error3;
+ }
}
spin_lock_irqsave(&ib_agent_port_list_lock, flags);
@@ -212,7 +223,8 @@ int ib_agent_port_close(struct ib_device *device, int port_num)
list_del(&port_priv->port_list);
spin_unlock_irqrestore(&ib_agent_port_list_lock, flags);
- ib_unregister_mad_agent(port_priv->agent[1]);
+ if (port_priv->agent[1])
+ ib_unregister_mad_agent(port_priv->agent[1]);
if (port_priv->agent[0])
ib_unregister_mad_agent(port_priv->agent[0]);
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 674344eb8e2f..7439e47ff951 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2983,9 +2983,12 @@ static int ib_mad_port_open(struct ib_device *device,
if (ret)
goto error6;
}
- ret = create_mad_qp(&port_priv->qp_info[1], IB_QPT_GSI);
- if (ret)
- goto error7;
+
+ if (rdma_cap_ib_cm(device, port_num)) {
+ ret = create_mad_qp(&port_priv->qp_info[1], IB_QPT_GSI);
+ if (ret)
+ goto error7;
+ }
snprintf(name, sizeof(name), "ib_mad%u", port_num);
port_priv->wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
--
2.45.2
* [PATCH rdma-next 07/12] RDMA/mlx5: Support plane device and driver APIs to add and delete it
2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
` (5 preceding siblings ...)
2024-06-16 16:08 ` [PATCH rdma-next 06/12] RDMA/core: Create GSI QP only when CM is supported Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
2024-06-16 16:08 ` [PATCH rdma-next 08/12] RDMA/nldev: Add support to add/delete a sub IB device through netlink Leon Romanovsky
` (6 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan
From: Mark Zhang <markzhang@nvidia.com>
This patch implements the driver APIs "add_sub_dev" and "del_sub_dev", to
add and delete a plane device respectively.
An mlx5 plane device is an RDMA SMI device; it provides the SMI capability
through user MAD for its parent, the logical multi-plane aggregated
device. For a plane port:
- It supports QP0 only;
- When adding a plane device, all plane ports are added;
- For some commands, like mad_ifc, both the plane_index and the native
  portnum are needed;
- When querying or modifying a plane port context, the native portnum
  must be used, as the query/modify_hca_vport_context command doesn't
  support plane ports; the plane-to-native port mapping is illustrated
  below.
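A self-contained illustration of that plane-to-native port mapping,
mirroring the smi_to_native_portnum() helper added in mlx5_ib.h (the
plane count below is just an example):
  #include <stdio.h>
  /* For the SMI sub device, num_ports equals the parent's num_plane, so
   * each consecutive group of num_plane plane ports maps back onto one
   * native parent port.
   */
  static unsigned int smi_to_native_portnum(unsigned int num_ports,
                                            unsigned int port)
  {
          return (port - 1) / num_ports + 1;
  }
  int main(void)
  {
          unsigned int num_plane = 2;     /* example: two planes per port */
          unsigned int port;
          for (port = 1; port <= num_plane; port++)
                  printf("plane port %u -> native port %u\n", port,
                         smi_to_native_portnum(num_plane, port));
          /* prints: plane port 1 -> native port 1
           *         plane port 2 -> native port 1
           */
          return 0;
  }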
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/cmd.c | 12 ++-
drivers/infiniband/hw/mlx5/cmd.h | 2 +-
drivers/infiniband/hw/mlx5/mad.c | 2 +-
drivers/infiniband/hw/mlx5/main.c | 116 ++++++++++++++++++++++++++-
drivers/infiniband/hw/mlx5/mlx5_ib.h | 8 ++
drivers/infiniband/hw/mlx5/qp.c | 7 +-
drivers/infiniband/hw/mlx5/qpc.c | 13 ++-
7 files changed, 147 insertions(+), 13 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c
index 1d0c8d5e745b..895b62cc528d 100644
--- a/drivers/infiniband/hw/mlx5/cmd.c
+++ b/drivers/infiniband/hw/mlx5/cmd.c
@@ -177,7 +177,7 @@ int mlx5_cmd_xrcd_dealloc(struct mlx5_core_dev *dev, u32 xrcdn, u16 uid)
return mlx5_cmd_exec_in(dev, dealloc_xrcd, in);
}
-int mlx5_cmd_mad_ifc(struct mlx5_core_dev *dev, const void *inb, void *outb,
+int mlx5_cmd_mad_ifc(struct mlx5_ib_dev *dev, const void *inb, void *outb,
u16 opmod, u8 port)
{
int outlen = MLX5_ST_SZ_BYTES(mad_ifc_out);
@@ -195,12 +195,18 @@ int mlx5_cmd_mad_ifc(struct mlx5_core_dev *dev, const void *inb, void *outb,
MLX5_SET(mad_ifc_in, in, opcode, MLX5_CMD_OP_MAD_IFC);
MLX5_SET(mad_ifc_in, in, op_mod, opmod);
- MLX5_SET(mad_ifc_in, in, port, port);
+ if (dev->ib_dev.type == RDMA_DEVICE_TYPE_SMI) {
+ MLX5_SET(mad_ifc_in, in, plane_index, port);
+ MLX5_SET(mad_ifc_in, in, port,
+ smi_to_native_portnum(dev, port));
+ } else {
+ MLX5_SET(mad_ifc_in, in, port, port);
+ }
data = MLX5_ADDR_OF(mad_ifc_in, in, mad);
memcpy(data, inb, MLX5_FLD_SZ_BYTES(mad_ifc_in, mad));
- err = mlx5_cmd_exec_inout(dev, mad_ifc, in, out);
+ err = mlx5_cmd_exec_inout(dev->mdev, mad_ifc, in, out);
if (err)
goto out;
diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h
index 93a971a40d11..e5cd31270443 100644
--- a/drivers/infiniband/hw/mlx5/cmd.h
+++ b/drivers/infiniband/hw/mlx5/cmd.h
@@ -54,7 +54,7 @@ int mlx5_cmd_detach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid,
u32 qpn, u16 uid);
int mlx5_cmd_xrcd_alloc(struct mlx5_core_dev *dev, u32 *xrcdn, u16 uid);
int mlx5_cmd_xrcd_dealloc(struct mlx5_core_dev *dev, u32 xrcdn, u16 uid);
-int mlx5_cmd_mad_ifc(struct mlx5_core_dev *dev, const void *inb, void *outb,
+int mlx5_cmd_mad_ifc(struct mlx5_ib_dev *dev, const void *inb, void *outb,
u16 opmod, u8 port);
int mlx5_cmd_uar_alloc(struct mlx5_core_dev *dev, u32 *uarn, u16 uid);
int mlx5_cmd_uar_dealloc(struct mlx5_core_dev *dev, u32 uarn, u16 uid);
diff --git a/drivers/infiniband/hw/mlx5/mad.c b/drivers/infiniband/hw/mlx5/mad.c
index 3e43687a7f6f..ead836d159d3 100644
--- a/drivers/infiniband/hw/mlx5/mad.c
+++ b/drivers/infiniband/hw/mlx5/mad.c
@@ -69,7 +69,7 @@ static int mlx5_MAD_IFC(struct mlx5_ib_dev *dev, int ignore_mkey,
if (ignore_bkey || !in_wc)
op_modifier |= 0x2;
- return mlx5_cmd_mad_ifc(dev->mdev, in_mad, response_mad, op_modifier,
+ return mlx5_cmd_mad_ifc(dev, in_mad, response_mad, op_modifier,
port);
}
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 55eb60715b48..3a653998bd88 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -313,6 +313,14 @@ struct mlx5_core_dev *mlx5_ib_get_native_port_mdev(struct mlx5_ib_dev *ibdev,
struct mlx5_ib_multiport_info *mpi;
struct mlx5_ib_port *port;
+ if (ibdev->ib_dev.type == RDMA_DEVICE_TYPE_SMI) {
+ if (native_port_num)
+ *native_port_num = smi_to_native_portnum(ibdev,
+ ib_port_num);
+ return ibdev->mdev;
+
+ }
+
if (!mlx5_core_mp_enabled(ibdev->mdev) ||
ll != IB_LINK_LAYER_ETHERNET) {
if (native_port_num)
@@ -1378,6 +1386,9 @@ static int mlx5_query_hca_port(struct ib_device *ibdev, u32 port,
/* props being zeroed by the caller, avoid zeroing it here */
+ if (ibdev->type == RDMA_DEVICE_TYPE_SMI)
+ port = smi_to_native_portnum(dev, port);
+
err = mlx5_query_hca_vport_context(mdev, 0, port, 0, rep);
if (err)
goto out;
@@ -1393,7 +1404,8 @@ static int mlx5_query_hca_port(struct ib_device *ibdev, u32 port,
if (dev->num_plane) {
props->port_cap_flags |= IB_PORT_SM_DISABLED;
props->port_cap_flags &= ~IB_PORT_SM;
- }
+ } else if (ibdev->type == RDMA_DEVICE_TYPE_SMI)
+ props->port_cap_flags &= ~IB_PORT_CM_SUP;
props->gid_tbl_len = mlx5_get_gid_table_len(MLX5_CAP_GEN(mdev, gid_table_size));
props->max_msg_sz = 1 << MLX5_CAP_GEN(mdev, log_max_msg);
@@ -2843,7 +2855,8 @@ static int set_has_smi_cap(struct mlx5_ib_dev *dev)
if (dev->num_plane) {
dev->port_caps[port - 1].has_smi = false;
continue;
- } else if (!MLX5_CAP_GEN(dev->mdev, ib_virt)) {
+ } else if (!MLX5_CAP_GEN(dev->mdev, ib_virt) ||
+ dev->ib_dev.type == RDMA_DEVICE_TYPE_SMI) {
dev->port_caps[port - 1].has_smi = true;
continue;
}
@@ -3057,6 +3070,8 @@ static u32 get_core_cap_flags(struct ib_device *ibdev,
return ret | RDMA_CORE_CAP_PROT_IB | RDMA_CORE_CAP_IB_MAD |
RDMA_CORE_CAP_IB_CM | RDMA_CORE_CAP_IB_SA |
RDMA_CORE_CAP_AF_IB;
+ else if (ibdev->type == RDMA_DEVICE_TYPE_SMI)
+ return ret | RDMA_CORE_CAP_IB_MAD | RDMA_CORE_CAP_IB_SMI;
if (ll == IB_LINK_LAYER_INFINIBAND)
return ret | RDMA_CORE_PORT_IBA_IB;
@@ -3093,6 +3108,9 @@ static int mlx5_port_immutable(struct ib_device *ibdev, u32 port_num,
return err;
if (ll == IB_LINK_LAYER_INFINIBAND) {
+ if (ibdev->type == RDMA_DEVICE_TYPE_SMI)
+ port_num = smi_to_native_portnum(dev, port_num);
+
err = mlx5_query_hca_vport_context(dev->mdev, 0, port_num, 0,
&rep);
if (err)
@@ -3892,12 +3910,18 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
return err;
}
+static struct ib_device *mlx5_ib_add_sub_dev(struct ib_device *parent,
+ enum rdma_nl_dev_type type,
+ const char *name);
+static void mlx5_ib_del_sub_dev(struct ib_device *sub_dev);
+
static const struct ib_device_ops mlx5_ib_dev_ops = {
.owner = THIS_MODULE,
.driver_id = RDMA_DRIVER_MLX5,
.uverbs_abi_ver = MLX5_IB_UVERBS_ABI_VERSION,
.add_gid = mlx5_ib_add_gid,
+ .add_sub_dev = mlx5_ib_add_sub_dev,
.alloc_mr = mlx5_ib_alloc_mr,
.alloc_mr_integrity = mlx5_ib_alloc_mr_integrity,
.alloc_pd = mlx5_ib_alloc_pd,
@@ -3912,6 +3936,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
.dealloc_pd = mlx5_ib_dealloc_pd,
.dealloc_ucontext = mlx5_ib_dealloc_ucontext,
.del_gid = mlx5_ib_del_gid,
+ .del_sub_dev = mlx5_ib_del_sub_dev,
.dereg_mr = mlx5_ib_dereg_mr,
.destroy_ah = mlx5_ib_destroy_ah,
.destroy_cq = mlx5_ib_destroy_cq,
@@ -4201,7 +4226,9 @@ static int mlx5_ib_stage_ib_reg_init(struct mlx5_ib_dev *dev)
{
const char *name;
- if (!mlx5_lag_is_active(dev->mdev))
+ if (dev->sub_dev_name)
+ name = dev->sub_dev_name;
+ else if (!mlx5_lag_is_active(dev->mdev))
name = "mlx5_%d";
else
name = "mlx5_bond_%d";
@@ -4462,6 +4489,89 @@ const struct mlx5_ib_profile raw_eth_profile = {
NULL),
};
+static const struct mlx5_ib_profile plane_profile = {
+ STAGE_CREATE(MLX5_IB_STAGE_INIT,
+ mlx5_ib_stage_init_init,
+ mlx5_ib_stage_init_cleanup),
+ STAGE_CREATE(MLX5_IB_STAGE_CAPS,
+ mlx5_ib_stage_caps_init,
+ mlx5_ib_stage_caps_cleanup),
+ STAGE_CREATE(MLX5_IB_STAGE_NON_DEFAULT_CB,
+ mlx5_ib_stage_non_default_cb,
+ NULL),
+ STAGE_CREATE(MLX5_IB_STAGE_QP,
+ mlx5_init_qp_table,
+ mlx5_cleanup_qp_table),
+ STAGE_CREATE(MLX5_IB_STAGE_SRQ,
+ mlx5_init_srq_table,
+ mlx5_cleanup_srq_table),
+ STAGE_CREATE(MLX5_IB_STAGE_DEVICE_RESOURCES,
+ mlx5_ib_dev_res_init,
+ mlx5_ib_dev_res_cleanup),
+ STAGE_CREATE(MLX5_IB_STAGE_BFREG,
+ mlx5_ib_stage_bfrag_init,
+ mlx5_ib_stage_bfrag_cleanup),
+ STAGE_CREATE(MLX5_IB_STAGE_IB_REG,
+ mlx5_ib_stage_ib_reg_init,
+ mlx5_ib_stage_ib_reg_cleanup),
+};
+
+static struct ib_device *mlx5_ib_add_sub_dev(struct ib_device *parent,
+ enum rdma_nl_dev_type type,
+ const char *name)
+{
+ struct mlx5_ib_dev *mparent = to_mdev(parent), *mplane;
+ enum rdma_link_layer ll;
+ int ret;
+
+ if (mparent->smi_dev)
+ return ERR_PTR(-EEXIST);
+
+ ll = mlx5_port_type_cap_to_rdma_ll(MLX5_CAP_GEN(mparent->mdev,
+ port_type));
+ if (type != RDMA_DEVICE_TYPE_SMI || !mparent->num_plane ||
+ ll != IB_LINK_LAYER_INFINIBAND ||
+ !MLX5_CAP_GEN_2(mparent->mdev, multiplane_qp_ud))
+ return ERR_PTR(-EOPNOTSUPP);
+
+ mplane = ib_alloc_device(mlx5_ib_dev, ib_dev);
+ if (!mplane)
+ return ERR_PTR(-ENOMEM);
+
+ mplane->port = kcalloc(mparent->num_plane * mparent->num_ports,
+ sizeof(*mplane->port), GFP_KERNEL);
+ if (!mplane->port) {
+ ret = -ENOMEM;
+ goto fail_kcalloc;
+ }
+
+ mplane->ib_dev.type = type;
+ mplane->mdev = mparent->mdev;
+ mplane->num_ports = mparent->num_plane;
+ mplane->sub_dev_name = name;
+
+ ret = __mlx5_ib_add(mplane, &plane_profile);
+ if (ret)
+ goto fail_ib_add;
+
+ mparent->smi_dev = mplane;
+ return &mplane->ib_dev;
+
+fail_ib_add:
+ kfree(mplane->port);
+fail_kcalloc:
+ ib_dealloc_device(&mplane->ib_dev);
+ return ERR_PTR(ret);
+}
+
+static void mlx5_ib_del_sub_dev(struct ib_device *sub_dev)
+{
+ struct mlx5_ib_dev *mdev = to_mdev(sub_dev);
+
+ to_mdev(sub_dev->parent)->smi_dev = NULL;
+ __mlx5_ib_remove(mdev, mdev->profile, MLX5_IB_STAGE_MAX);
+}
+
static int mlx5r_mp_probe(struct auxiliary_device *adev,
const struct auxiliary_device_id *id)
{
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index d97d6bc2dbaa..bf25ddb17bce 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1191,6 +1191,8 @@ struct mlx5_ib_dev {
#endif
u8 num_plane;
+ struct mlx5_ib_dev *smi_dev;
+ const char *sub_dev_name;
};
static inline struct mlx5_ib_cq *to_mibcq(struct mlx5_core_cq *mcq)
@@ -1698,4 +1700,10 @@ static inline bool mlx5_umem_needs_ats(struct mlx5_ib_dev *dev,
int set_roce_addr(struct mlx5_ib_dev *dev, u32 port_num,
unsigned int index, const union ib_gid *gid,
const struct ib_gid_attr *attr);
+
+static inline u32 smi_to_native_portnum(struct mlx5_ib_dev *dev, u32 port)
+{
+ return (port - 1) / dev->num_ports + 1;
+}
+
#endif /* MLX5_IB_H */
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index be288cc7a3c0..66d9b44a6991 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -4219,7 +4219,12 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
/* todo implement counter_index functionality */
- if (is_sqp(qp->type))
+ if (dev->ib_dev.type == RDMA_DEVICE_TYPE_SMI && is_qp0(qp->type)) {
+ MLX5_SET(ads, pri_path, vhca_port_num,
+ smi_to_native_portnum(dev, qp->port));
+ if (cur_state == IB_QPS_INIT && new_state == IB_QPS_RTR)
+ MLX5_SET(ads, pri_path, plane_index, qp->port);
+ } else if (is_sqp(qp->type))
MLX5_SET(ads, pri_path, vhca_port_num, qp->port);
if (attr_mask & IB_QP_PORT)
diff --git a/drivers/infiniband/hw/mlx5/qpc.c b/drivers/infiniband/hw/mlx5/qpc.c
index d9cf6982d645..d3dcc272200a 100644
--- a/drivers/infiniband/hw/mlx5/qpc.c
+++ b/drivers/infiniband/hw/mlx5/qpc.c
@@ -249,7 +249,8 @@ int mlx5_qpc_create_qp(struct mlx5_ib_dev *dev, struct mlx5_core_qp *qp,
if (err)
goto err_cmd;
- mlx5_debug_qp_add(dev->mdev, qp);
+ if (dev->ib_dev.type != RDMA_DEVICE_TYPE_SMI)
+ mlx5_debug_qp_add(dev->mdev, qp);
return 0;
@@ -307,7 +308,8 @@ int mlx5_core_destroy_qp(struct mlx5_ib_dev *dev, struct mlx5_core_qp *qp)
{
u32 in[MLX5_ST_SZ_DW(destroy_qp_in)] = {};
- mlx5_debug_qp_remove(dev->mdev, qp);
+ if (dev->ib_dev.type != RDMA_DEVICE_TYPE_SMI)
+ mlx5_debug_qp_remove(dev->mdev, qp);
destroy_resource_common(dev, qp);
@@ -504,7 +506,9 @@ int mlx5_init_qp_table(struct mlx5_ib_dev *dev)
spin_lock_init(&table->lock);
INIT_RADIX_TREE(&table->tree, GFP_ATOMIC);
xa_init(&table->dct_xa);
- mlx5_qp_debugfs_init(dev->mdev);
+
+ if (dev->ib_dev.type != RDMA_DEVICE_TYPE_SMI)
+ mlx5_qp_debugfs_init(dev->mdev);
table->nb.notifier_call = rsc_event_notifier;
mlx5_notifier_register(dev->mdev, &table->nb);
@@ -517,7 +521,8 @@ void mlx5_cleanup_qp_table(struct mlx5_ib_dev *dev)
struct mlx5_qp_table *table = &dev->qp_table;
mlx5_notifier_unregister(dev->mdev, &table->nb);
- mlx5_qp_debugfs_cleanup(dev->mdev);
+ if (dev->ib_dev.type != RDMA_DEVICE_TYPE_SMI)
+ mlx5_qp_debugfs_cleanup(dev->mdev);
}
int mlx5_core_qp_query(struct mlx5_ib_dev *dev, struct mlx5_core_qp *qp,
--
2.45.2
* [PATCH rdma-next 08/12] RDMA/nldev: Add support to add/delete a sub IB device through netlink
2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
` (6 preceding siblings ...)
2024-06-16 16:08 ` [PATCH rdma-next 07/12] RDMA/mlx5: Support plane device and driver APIs to add and delete it Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
2024-06-16 16:08 ` [PATCH rdma-next 09/12] RDMA/nldev: Add support to dump device type and parent device if exists Leon Romanovsky
` (5 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan
From: Mark Zhang <markzhang@nvidia.com>
Add new netlink commands and attributes to support adding and deleting
a sub IB device with admin privilege.
Examples:
$ rdma dev add smi1 type SMI parent ibp8s0f1
$ rdma dev del smi1
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/nldev.c | 59 ++++++++++++++++++++++++++++++++
include/uapi/rdma/rdma_netlink.h | 6 ++++
2 files changed, 65 insertions(+)
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index bc79ee630d8d..b5f87e7a1cfd 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -167,6 +167,7 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
[RDMA_NLDEV_ATTR_STAT_HWCOUNTER_DYNAMIC] = { .type = NLA_U8 },
[RDMA_NLDEV_SYS_ATTR_PRIVILEGED_QKEY_MODE] = { .type = NLA_U8 },
[RDMA_NLDEV_ATTR_DRIVER_DETAILS] = { .type = NLA_U8 },
+ [RDMA_NLDEV_ATTR_DEV_TYPE] = { .type = NLA_U8 },
};
static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -2548,6 +2549,56 @@ static int nldev_stat_get_counter_status_doit(struct sk_buff *skb,
return ret;
}
+static int nldev_newdev(struct sk_buff *skb, struct nlmsghdr *nlh,
+ struct netlink_ext_ack *extack)
+{
+ struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+ enum rdma_nl_dev_type type;
+ struct ib_device *parent;
+ char name[IFNAMSIZ] = {};
+ u32 parentid;
+ int ret;
+
+ ret = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
+ nldev_policy, extack);
+ if (ret || !tb[RDMA_NLDEV_ATTR_DEV_INDEX] ||
+ !tb[RDMA_NLDEV_ATTR_DEV_NAME] || !tb[RDMA_NLDEV_ATTR_DEV_TYPE])
+ return -EINVAL;
+
+ nla_strscpy(name, tb[RDMA_NLDEV_ATTR_DEV_NAME], sizeof(name));
+ type = nla_get_u8(tb[RDMA_NLDEV_ATTR_DEV_TYPE]);
+ parentid = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+ parent = ib_device_get_by_index(sock_net(skb->sk), parentid);
+ if (!parent)
+ return -EINVAL;
+
+ ret = ib_add_sub_device(parent, type, name);
+ ib_device_put(parent);
+
+ return ret;
+}
+
+static int nldev_deldev(struct sk_buff *skb, struct nlmsghdr *nlh,
+ struct netlink_ext_ack *extack)
+{
+ struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+ struct ib_device *device;
+ u32 devid;
+ int ret;
+
+ ret = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
+ nldev_policy, extack);
+ if (ret || !tb[RDMA_NLDEV_ATTR_DEV_INDEX])
+ return -EINVAL;
+
+ devid = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+ device = ib_device_get_by_index(sock_net(skb->sk), devid);
+ if (!device)
+ return -EINVAL;
+
+ return ib_del_sub_device_and_put(device);
+}
+
static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
[RDMA_NLDEV_CMD_GET] = {
.doit = nldev_get_doit,
@@ -2646,6 +2697,14 @@ static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
[RDMA_NLDEV_CMD_STAT_GET_STATUS] = {
.doit = nldev_stat_get_counter_status_doit,
},
+ [RDMA_NLDEV_CMD_NEWDEV] = {
+ .doit = nldev_newdev,
+ .flags = RDMA_NL_ADMIN_PERM,
+ },
+ [RDMA_NLDEV_CMD_DELDEV] = {
+ .doit = nldev_deldev,
+ .flags = RDMA_NL_ADMIN_PERM,
+ },
};
void __init nldev_init(void)
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index d15ee16be722..bd52fb325e22 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -301,6 +301,10 @@ enum rdma_nldev_command {
RDMA_NLDEV_CMD_RES_SRQ_GET_RAW,
+ RDMA_NLDEV_CMD_NEWDEV,
+
+ RDMA_NLDEV_CMD_DELDEV,
+
RDMA_NLDEV_NUM_OPS
};
@@ -564,6 +568,8 @@ enum rdma_nldev_attr {
*/
RDMA_NLDEV_ATTR_RES_SUBTYPE, /* string */
+ RDMA_NLDEV_ATTR_DEV_TYPE, /* u8 */
+
/*
* Always the end
*/
--
2.45.2
* [PATCH rdma-next 09/12] RDMA/nldev: Add support to dump device type and parent device if exists
2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
` (7 preceding siblings ...)
2024-06-16 16:08 ` [PATCH rdma-next 08/12] RDMA/nldev: Add support to add/delete a sub IB device through netlink Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
2024-06-16 16:08 ` [PATCH mlx5-next 10/12] RDMA/mlx5: Add plane index support when querying PTYS registers Leon Romanovsky
` (4 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan
From: Mark Zhang <markzhang@nvidia.com>
If a device has a specific type or a parent device, dump them as well.
Example:
$ rdma dev show smi1
3: smi1: node_type ca fw 20.38.1002 node_guid 9803:9b03:009f:d5ef sys_image_guid 9803:9b03:009f:d5ee type smi parent ibp8s0f1
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/nldev.c | 10 ++++++++++
include/uapi/rdma/rdma_netlink.h | 2 ++
2 files changed, 12 insertions(+)
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index b5f87e7a1cfd..025efce540a7 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -168,6 +168,7 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
[RDMA_NLDEV_SYS_ATTR_PRIVILEGED_QKEY_MODE] = { .type = NLA_U8 },
[RDMA_NLDEV_ATTR_DRIVER_DETAILS] = { .type = NLA_U8 },
[RDMA_NLDEV_ATTR_DEV_TYPE] = { .type = NLA_U8 },
+ [RDMA_NLDEV_ATTR_PARENT_NAME] = { .type = NLA_NUL_STRING },
};
static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -302,6 +303,15 @@ static int fill_dev_info(struct sk_buff *msg, struct ib_device *device)
if (nla_put_u8(msg, RDMA_NLDEV_ATTR_DEV_DIM, device->use_cq_dim))
return -EMSGSIZE;
+ if (device->type &&
+ nla_put_u8(msg, RDMA_NLDEV_ATTR_DEV_TYPE, device->type))
+ return -EMSGSIZE;
+
+ if (device->parent &&
+ nla_put_string(msg, RDMA_NLDEV_ATTR_PARENT_NAME,
+ dev_name(&device->parent->dev)))
+ return -EMSGSIZE;
+
/*
* Link type is determined on first port and mlx4 device
* which can potentially have two different link type for the same
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index bd52fb325e22..4b69242d7848 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -570,6 +570,8 @@ enum rdma_nldev_attr {
RDMA_NLDEV_ATTR_DEV_TYPE, /* u8 */
+ RDMA_NLDEV_ATTR_PARENT_NAME, /* string */
+
/*
* Always the end
*/
--
2.45.2
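On the consumer side, a libnl receive callback for RDMA_NLDEV_CMD_GET replies
could pick up the two new attributes roughly as below; this is a sketch that
assumes the updated <rdma/rdma_netlink.h> uapi header, and the function name and
output format are illustrative:

#include <stdio.h>
#include <netlink/netlink.h>
#include <netlink/msg.h>
#include <netlink/attr.h>
#include <rdma/rdma_netlink.h>

/* NL_CB_VALID callback: print device type and parent name, when present. */
static int dev_info_cb(struct nl_msg *msg, void *arg)
{
	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];

	/* nldev replies carry attributes directly after the nlmsghdr */
	if (nlmsg_parse(nlmsg_hdr(msg), 0, tb, RDMA_NLDEV_ATTR_MAX - 1, NULL))
		return NL_SKIP;

	if (tb[RDMA_NLDEV_ATTR_DEV_TYPE])
		printf("type %u\n", nla_get_u8(tb[RDMA_NLDEV_ATTR_DEV_TYPE]));
	if (tb[RDMA_NLDEV_ATTR_PARENT_NAME])
		printf("parent %s\n",
		       nla_get_string(tb[RDMA_NLDEV_ATTR_PARENT_NAME]));
	return NL_OK;
}

Registered with nl_socket_modify_cb(sk, NL_CB_VALID, NL_CB_CUSTOM, dev_info_cb,
NULL), this is enough to recover the "type smi parent ibp8s0f1" fields from the
example output in the changelog above.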
* [PATCH mlx5-next 10/12] RDMA/mlx5: Add plane index support when querying PTYS registers
2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
` (8 preceding siblings ...)
2024-06-16 16:08 ` [PATCH rdma-next 09/12] RDMA/nldev: Add support to dump device type and parent device if exists Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
2024-06-16 16:08 ` [PATCH mlx5-next 11/12] net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports Leon Romanovsky
` (3 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Zhang, Eric Dumazet, Jakub Kicinski, linux-rdma, netdev,
Paolo Abeni, Saeed Mahameed, Tariq Toukan
From: Mark Zhang <markzhang@nvidia.com>
Support the new "plane_ind" field when querying port PTYS registers.
This is needed when querying the rate of a plane port.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/main.c | 12 +++++++-----
drivers/net/ethernet/mellanox/mlx5/core/en/port.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 2 +-
.../net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/port.c | 10 ++++++----
include/linux/mlx5/port.h | 5 +++--
6 files changed, 19 insertions(+), 14 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 3a653998bd88..4a0380e711ea 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -542,10 +542,10 @@ static int mlx5_query_port_roce(struct ib_device *device, u32 port_num,
*/
if (dev->is_rep)
err = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN,
- 1);
+ 1, 0);
else
err = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN,
- mdev_port_num);
+ mdev_port_num, 0);
if (err)
goto out;
ext = !!MLX5_GET_ETH_PROTO(ptys_reg, out, true, eth_proto_capability);
@@ -1372,11 +1372,11 @@ static int mlx5_query_hca_port(struct ib_device *ibdev, u32 port,
struct mlx5_ib_dev *dev = to_mdev(ibdev);
struct mlx5_core_dev *mdev = dev->mdev;
struct mlx5_hca_vport_context *rep;
+ u8 vl_hw_cap, plane_index = 0;
u16 max_mtu;
u16 oper_mtu;
int err;
u16 ib_link_width_oper;
- u8 vl_hw_cap;
rep = kzalloc(sizeof(*rep), GFP_KERNEL);
if (!rep) {
@@ -1386,8 +1386,10 @@ static int mlx5_query_hca_port(struct ib_device *ibdev, u32 port,
/* props being zeroed by the caller, avoid zeroing it here */
- if (ibdev->type == RDMA_DEVICE_TYPE_SMI)
+ if (ibdev->type == RDMA_DEVICE_TYPE_SMI) {
+ plane_index = port;
port = smi_to_native_portnum(dev, port);
+ }
err = mlx5_query_hca_vport_context(mdev, 0, port, 0, rep);
if (err)
@@ -1419,7 +1421,7 @@ static int mlx5_query_hca_port(struct ib_device *ibdev, u32 port,
props->port_cap_flags2 = rep->cap_mask2;
err = mlx5_query_ib_port_oper(mdev, &ib_link_width_oper,
- &props->active_speed, port);
+ &props->active_speed, port, plane_index);
if (err)
goto out;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
index b4efc780e297..5f6a0605e4ae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
@@ -41,7 +41,7 @@ void mlx5_port_query_eth_autoneg(struct mlx5_core_dev *dev, u8 *an_status,
*an_disable_cap = 0;
*an_disable_admin = 0;
- if (mlx5_query_port_ptys(dev, out, sizeof(out), MLX5_PTYS_EN, 1))
+ if (mlx5_query_port_ptys(dev, out, sizeof(out), MLX5_PTYS_EN, 1, 0))
return;
*an_status = MLX5_GET(ptys_reg, out, an_status);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 3320f12ba2db..f57e0184c12b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1195,7 +1195,7 @@ static int mlx5e_ethtool_get_link_ksettings(struct mlx5e_priv *priv,
bool ext;
int err;
- err = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN, 1);
+ err = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN, 1, 0);
if (err) {
netdev_err(priv->netdev, "%s: query port ptys failed: %d\n",
__func__, err);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
index 779d92b762d3..b8aadbea1312 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
@@ -215,7 +215,7 @@ static int mlx5i_get_link_ksettings(struct net_device *netdev,
int speed, ret;
ret = mlx5_query_ib_port_oper(mdev, &ib_link_width_oper, &ib_proto_oper,
- 1);
+ 1, 0);
if (ret)
return ret;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index 7fba1c46e2ac..50931584132b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -144,11 +144,13 @@ int mlx5_set_port_caps(struct mlx5_core_dev *dev, u8 port_num, u32 caps)
EXPORT_SYMBOL_GPL(mlx5_set_port_caps);
int mlx5_query_port_ptys(struct mlx5_core_dev *dev, u32 *ptys,
- int ptys_size, int proto_mask, u8 local_port)
+ int ptys_size, int proto_mask,
+ u8 local_port, u8 plane_index)
{
u32 in[MLX5_ST_SZ_DW(ptys_reg)] = {0};
MLX5_SET(ptys_reg, in, local_port, local_port);
+ MLX5_SET(ptys_reg, in, plane_ind, plane_index);
MLX5_SET(ptys_reg, in, proto_mask, proto_mask);
return mlx5_core_access_reg(dev, in, sizeof(in), ptys,
ptys_size, MLX5_REG_PTYS, 0, 0);
@@ -167,13 +169,13 @@ int mlx5_set_port_beacon(struct mlx5_core_dev *dev, u16 beacon_duration)
}
int mlx5_query_ib_port_oper(struct mlx5_core_dev *dev, u16 *link_width_oper,
- u16 *proto_oper, u8 local_port)
+ u16 *proto_oper, u8 local_port, u8 plane_index)
{
u32 out[MLX5_ST_SZ_DW(ptys_reg)];
int err;
err = mlx5_query_port_ptys(dev, out, sizeof(out), MLX5_PTYS_IB,
- local_port);
+ local_port, plane_index);
if (err)
return err;
@@ -1114,7 +1116,7 @@ int mlx5_port_query_eth_proto(struct mlx5_core_dev *dev, u8 port, bool ext,
if (!eproto)
return -EINVAL;
- err = mlx5_query_port_ptys(dev, out, sizeof(out), MLX5_PTYS_EN, port);
+ err = mlx5_query_port_ptys(dev, out, sizeof(out), MLX5_PTYS_EN, port, 0);
if (err)
return err;
diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h
index 26092c78a985..e68d42b8ce65 100644
--- a/include/linux/mlx5/port.h
+++ b/include/linux/mlx5/port.h
@@ -155,10 +155,11 @@ struct mlx5_port_eth_proto {
int mlx5_set_port_caps(struct mlx5_core_dev *dev, u8 port_num, u32 caps);
int mlx5_query_port_ptys(struct mlx5_core_dev *dev, u32 *ptys,
- int ptys_size, int proto_mask, u8 local_port);
+ int ptys_size, int proto_mask,
+ u8 local_port, u8 plane_index);
int mlx5_query_ib_port_oper(struct mlx5_core_dev *dev, u16 *link_width_oper,
- u16 *proto_oper, u8 local_port);
+ u16 *proto_oper, u8 local_port, u8 plane_index);
void mlx5_toggle_port_link(struct mlx5_core_dev *dev);
int mlx5_set_port_admin_status(struct mlx5_core_dev *dev,
enum mlx5_port_status status);
--
2.45.2
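With the extended prototype, a caller that wants the operational IB width and
speed of a plane port passes the plane index explicitly; a minimal kernel-side
sketch (the wrapper name is illustrative, not from the series):

#include <linux/mlx5/driver.h>
#include <linux/mlx5/port.h>

/* Query operational IB link width/speed of plane "plane_index" behind the
 * native port "local_port"; plane_index 0 keeps the existing behaviour,
 * which is what all callers converted in this patch pass.
 */
static int query_plane_oper_rate(struct mlx5_core_dev *mdev, u8 local_port,
				 u8 plane_index, u16 *width, u16 *speed)
{
	return mlx5_query_ib_port_oper(mdev, width, speed, local_port,
				       plane_index);
}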
* [PATCH mlx5-next 11/12] net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports
2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
` (9 preceding siblings ...)
2024-06-16 16:08 ` [PATCH mlx5-next 10/12] RDMA/mlx5: Add plane index support when querying PTYS registers Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
2024-06-16 16:08 ` [PATCH mlx5-next 12/12] RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register Leon Romanovsky
` (2 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Zhang, Eric Dumazet, Jakub Kicinski, linux-rdma, netdev,
Paolo Abeni, Saeed Mahameed, Tariq Toukan
From: Mark Zhang <markzhang@nvidia.com>
This patch adds new fields to support multi-plane and the extended port
counters group. Actual support will be added in the next patch.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
include/linux/mlx5/mlx5_ifc.h | 47 +++++++++++++++++++++++++++++++++--
1 file changed, 45 insertions(+), 2 deletions(-)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 61738990e399..5fea7b747607 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2651,6 +2651,46 @@ struct mlx5_ifc_ib_port_cntrs_grp_data_layout_bits {
u8 port_xmit_wait[0x20];
};
+struct mlx5_ifc_ib_ext_port_cntrs_grp_data_layout_bits {
+ u8 reserved_at_0[0x300];
+
+ u8 port_xmit_data_high[0x20];
+
+ u8 port_xmit_data_low[0x20];
+
+ u8 port_rcv_data_high[0x20];
+
+ u8 port_rcv_data_low[0x20];
+
+ u8 port_xmit_pkts_high[0x20];
+
+ u8 port_xmit_pkts_low[0x20];
+
+ u8 port_rcv_pkts_high[0x20];
+
+ u8 port_rcv_pkts_low[0x20];
+
+ u8 reserved_at_400[0x80];
+
+ u8 port_unicast_xmit_pkts_high[0x20];
+
+ u8 port_unicast_xmit_pkts_low[0x20];
+
+ u8 port_multicast_xmit_pkts_high[0x20];
+
+ u8 port_multicast_xmit_pkts_low[0x20];
+
+ u8 port_unicast_rcv_pkts_high[0x20];
+
+ u8 port_unicast_rcv_pkts_low[0x20];
+
+ u8 port_multicast_rcv_pkts_high[0x20];
+
+ u8 port_multicast_rcv_pkts_low[0x20];
+
+ u8 reserved_at_580[0x240];
+};
+
struct mlx5_ifc_eth_per_tc_prio_grp_data_layout_bits {
u8 transmit_queue_high[0x20];
@@ -4543,6 +4583,7 @@ union mlx5_ifc_eth_cntrs_grp_data_layout_auto_bits {
struct mlx5_ifc_eth_per_tc_prio_grp_data_layout_bits eth_per_tc_prio_grp_data_layout;
struct mlx5_ifc_eth_per_tc_congest_prio_grp_data_layout_bits eth_per_tc_congest_prio_grp_data_layout;
struct mlx5_ifc_ib_port_cntrs_grp_data_layout_bits ib_port_cntrs_grp_data_layout;
+ struct mlx5_ifc_ib_ext_port_cntrs_grp_data_layout_bits ib_ext_port_cntrs_grp_data_layout;
struct mlx5_ifc_phys_layer_cntrs_bits phys_layer_cntrs;
struct mlx5_ifc_phys_layer_statistical_cntrs_bits phys_layer_statistical_cntrs;
u8 reserved_at_0[0x7c0];
@@ -9851,8 +9892,10 @@ struct mlx5_ifc_ppcnt_reg_bits {
u8 grp[0x6];
u8 clr[0x1];
- u8 reserved_at_21[0x1c];
- u8 prio_tc[0x3];
+ u8 reserved_at_21[0x13];
+ u8 plane_ind[0x4];
+ u8 reserved_at_38[0x3];
+ u8 prio_tc[0x5];
union mlx5_ifc_eth_cntrs_grp_data_layout_auto_bits counter_set;
};
--
2.45.2
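As a quick consistency check on the hunks above: the new
ib_ext_port_cntrs_grp_data_layout starts with a 0x300-bit reserved area, then
eight 0x20-bit high/low words up to bit 0x400, a 0x80-bit gap, eight more
0x20-bit words up to bit 0x580 and a final 0x240-bit pad, i.e.
0x580 + 0x240 = 0x7c0 bits in total, which matches the 0x7c0-bit counter_set
union it is added to. In ppcnt_reg, the old split after clr (0x1c + 0x3) and the
new one (0x13 + 0x4 + 0x3 + 0x5) both end at bit 0x40, so carving out the 4-bit
plane_ind field (and widening prio_tc to 5 bits) does not change the register
size.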
* [PATCH mlx5-next 12/12] RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register
2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
` (10 preceding siblings ...)
2024-06-16 16:08 ` [PATCH mlx5-next 11/12] net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports Leon Romanovsky
@ 2024-06-16 16:08 ` Leon Romanovsky
2024-06-28 16:00 ` [PATCH rdma-next 00/12] Multi-plane support for mlx5 Jason Gunthorpe
2024-07-01 12:36 ` Leon Romanovsky
13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-06-16 16:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Zhang, Eric Dumazet, Jakub Kicinski, linux-rdma, netdev,
Paolo Abeni, Saeed Mahameed, Tariq Toukan
From: Mark Zhang <markzhang@nvidia.com>
Support per-plane port counters by querying the PPCNT register with the
"extended port counters" group, as the query_vport_counter command doesn't
support plane ports.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/mad.c | 69 +++++++++++++++++++++++++++-----
include/linux/mlx5/device.h | 1 +
2 files changed, 59 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/mad.c b/drivers/infiniband/hw/mlx5/mad.c
index ead836d159d3..1b6c5e37d169 100644
--- a/drivers/infiniband/hw/mlx5/mad.c
+++ b/drivers/infiniband/hw/mlx5/mad.c
@@ -147,8 +147,39 @@ static void pma_cnt_assign(struct ib_pma_portcounters *pma_cnt,
vl_15_dropped);
}
-static int query_ib_ppcnt(struct mlx5_core_dev *dev, u8 port_num, void *out,
- size_t sz)
+static void pma_cnt_ext_assign_ppcnt(struct ib_pma_portcounters_ext *cnt_ext,
+ void *out)
+{
+ void *out_pma = MLX5_ADDR_OF(ppcnt_reg, out,
+ counter_set);
+
+#define MLX5_GET_EXT_CNTR(counter_name) \
+ MLX5_GET64(ib_ext_port_cntrs_grp_data_layout, \
+ out_pma, counter_name##_high)
+
+ cnt_ext->port_xmit_data =
+ cpu_to_be64(MLX5_GET_EXT_CNTR(port_xmit_data) >> 2);
+ cnt_ext->port_rcv_data =
+ cpu_to_be64(MLX5_GET_EXT_CNTR(port_rcv_data) >> 2);
+
+ cnt_ext->port_xmit_packets =
+ cpu_to_be64(MLX5_GET_EXT_CNTR(port_xmit_pkts));
+ cnt_ext->port_rcv_packets =
+ cpu_to_be64(MLX5_GET_EXT_CNTR(port_rcv_pkts));
+
+ cnt_ext->port_unicast_xmit_packets =
+ cpu_to_be64(MLX5_GET_EXT_CNTR(port_unicast_xmit_pkts));
+ cnt_ext->port_unicast_rcv_packets =
+ cpu_to_be64(MLX5_GET_EXT_CNTR(port_unicast_rcv_pkts));
+
+ cnt_ext->port_multicast_xmit_packets =
+ cpu_to_be64(MLX5_GET_EXT_CNTR(port_multicast_xmit_pkts));
+ cnt_ext->port_multicast_rcv_packets =
+ cpu_to_be64(MLX5_GET_EXT_CNTR(port_multicast_rcv_pkts));
+}
+
+static int query_ib_ppcnt(struct mlx5_core_dev *dev, u8 port_num, u8 plane_num,
+ void *out, size_t sz, bool ext)
{
u32 *in;
int err;
@@ -160,8 +191,14 @@ static int query_ib_ppcnt(struct mlx5_core_dev *dev, u8 port_num, void *out,
}
MLX5_SET(ppcnt_reg, in, local_port, port_num);
-
- MLX5_SET(ppcnt_reg, in, grp, MLX5_INFINIBAND_PORT_COUNTERS_GROUP);
+ MLX5_SET(ppcnt_reg, in, plane_ind, plane_num);
+
+ if (ext)
+ MLX5_SET(ppcnt_reg, in, grp,
+ MLX5_INFINIBAND_EXTENDED_PORT_COUNTERS_GROUP);
+ else
+ MLX5_SET(ppcnt_reg, in, grp,
+ MLX5_INFINIBAND_PORT_COUNTERS_GROUP);
err = mlx5_core_access_reg(dev, in, sz, out,
sz, MLX5_REG_PPCNT, 0, 0);
@@ -189,7 +226,8 @@ static int process_pma_cmd(struct mlx5_ib_dev *dev, u32 port_num,
mdev_port_num = 1;
}
if (MLX5_CAP_GEN(dev->mdev, num_ports) == 1 &&
- !mlx5_core_mp_enabled(mdev)) {
+ !mlx5_core_mp_enabled(mdev) &&
+ dev->ib_dev.type != RDMA_DEVICE_TYPE_SMI) {
/* set local port to one for Function-Per-Port HCA. */
mdev = dev->mdev;
mdev_port_num = 1;
@@ -208,7 +246,8 @@ static int process_pma_cmd(struct mlx5_ib_dev *dev, u32 port_num,
if (in_mad->mad_hdr.attr_id == IB_PMA_PORT_COUNTERS_EXT) {
struct ib_pma_portcounters_ext *pma_cnt_ext =
(struct ib_pma_portcounters_ext *)(out_mad->data + 40);
- int sz = MLX5_ST_SZ_BYTES(query_vport_counter_out);
+ int sz = max(MLX5_ST_SZ_BYTES(query_vport_counter_out),
+ MLX5_ST_SZ_BYTES(ppcnt_reg));
out_cnt = kvzalloc(sz, GFP_KERNEL);
if (!out_cnt) {
@@ -216,10 +255,18 @@ static int process_pma_cmd(struct mlx5_ib_dev *dev, u32 port_num,
goto done;
}
- err = mlx5_core_query_vport_counter(mdev, 0, 0, mdev_port_num,
- out_cnt);
- if (!err)
- pma_cnt_ext_assign(pma_cnt_ext, out_cnt);
+ if (dev->ib_dev.type == RDMA_DEVICE_TYPE_SMI) {
+ err = query_ib_ppcnt(mdev, mdev_port_num,
+ port_num, out_cnt, sz, 1);
+ if (!err)
+ pma_cnt_ext_assign_ppcnt(pma_cnt_ext, out_cnt);
+ } else {
+ err = mlx5_core_query_vport_counter(mdev, 0, 0,
+ mdev_port_num,
+ out_cnt);
+ if (!err)
+ pma_cnt_ext_assign(pma_cnt_ext, out_cnt);
+ }
} else {
struct ib_pma_portcounters *pma_cnt =
(struct ib_pma_portcounters *)(out_mad->data + 40);
@@ -231,7 +278,7 @@ static int process_pma_cmd(struct mlx5_ib_dev *dev, u32 port_num,
goto done;
}
- err = query_ib_ppcnt(mdev, mdev_port_num, out_cnt, sz);
+ err = query_ib_ppcnt(mdev, mdev_port_num, 0, out_cnt, sz, 0);
if (!err)
pma_cnt_assign(pma_cnt, out_cnt);
}
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index d7bb31d9a446..68bd1b4737ea 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1466,6 +1466,7 @@ enum {
MLX5_PER_TRAFFIC_CLASS_CONGESTION_GROUP = 0x13,
MLX5_PHYSICAL_LAYER_STATISTICAL_GROUP = 0x16,
MLX5_INFINIBAND_PORT_COUNTERS_GROUP = 0x20,
+ MLX5_INFINIBAND_EXTENDED_PORT_COUNTERS_GROUP = 0x21,
};
enum {
--
2.45.2
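A note on the counter extraction above, with a small illustrative helper that is
not part of the patch and assumes the mlx5_ifc layout added in the previous
patch: each extended counter is a 64-bit-aligned high/low dword pair, so
MLX5_GET_EXT_CNTR() can fetch both halves with one big-endian 64-bit read
starting at the _high field. The ">> 2" applied only to port_xmit_data and
port_rcv_data converts octets into the 4-octet units that the PMA
PortXmitData/PortRcvData counters are specified in, the same scaling the
existing vport-counter path applies.

#include <linux/types.h>
#include <linux/mlx5/device.h>
#include <linux/mlx5/mlx5_ifc.h>

/* Read one extended counter (here: port_xmit_data, in octets) from a PPCNT
 * query output; equivalent to MLX5_GET_EXT_CNTR(port_xmit_data) above.
 */
static u64 ppcnt_ext_xmit_data(void *ppcnt_out)
{
	void *set = MLX5_ADDR_OF(ppcnt_reg, ppcnt_out, counter_set);

	return MLX5_GET64(ib_ext_port_cntrs_grp_data_layout, set,
			  port_xmit_data_high);
}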
* Re: [PATCH rdma-next 00/12] Multi-plane support for mlx5
2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
` (11 preceding siblings ...)
2024-06-16 16:08 ` [PATCH mlx5-next 12/12] RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register Leon Romanovsky
@ 2024-06-28 16:00 ` Jason Gunthorpe
2024-07-01 12:36 ` Leon Romanovsky
13 siblings, 0 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2024-06-28 16:00 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Leon Romanovsky, Eric Dumazet, Jakub Kicinski, linux-kernel,
linux-rdma, Mark Zhang, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan
On Sun, Jun 16, 2024 at 07:08:32PM +0300, Leon Romanovsky wrote:
> Mark Zhang (12):
> RDMA/core: Create "issm*" device nodes only when SMI is supported
> net/mlx5: mlx5_ifc update for multi-plane support
> RDMA/mlx5: Add support to multi-plane device and port
> RDMA/core: Support IB sub device with type "SMI"
> RDMA: Set type of rdma_ah to IB for a SMI sub device
> RDMA/core: Create GSI QP only when CM is supported
> RDMA/mlx5: Support plane device and driver APIs to add and delete it
> RDMA/nldev: Add support to add/delete a sub IB device through netlink
> RDMA/nldev: Add support to dump device type and parent device if
> exists
> RDMA/mlx5: Add plane index support when querying PTYS registers
> net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports
> RDMA/mlx5: Support per-plane port IB counters by querying PPCNT
> register
This all seems quite straightforward. Leon, are you going to put this
on a shared branch with all the IFC stuff/etc?
Thanks,
Jason
* Re: [PATCH rdma-next 00/12] Multi-plane support for mlx5
2024-06-16 16:08 [PATCH rdma-next 00/12] Multi-plane support for mlx5 Leon Romanovsky
` (12 preceding siblings ...)
2024-06-28 16:00 ` [PATCH rdma-next 00/12] Multi-plane support for mlx5 Jason Gunthorpe
@ 2024-07-01 12:36 ` Leon Romanovsky
13 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2024-07-01 12:36 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky
Cc: Eric Dumazet, Jakub Kicinski, linux-kernel, linux-rdma,
Mark Zhang, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan
On Sun, 16 Jun 2024 19:08:32 +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> From Mark,
>
> This patchset adds support for IB sub devices and the mlx5 implementation.
>
> An IB sub device provides a subset of the functionalities of its parent.
> Currently type "SMI" is supported: an SMI device provides the SMI (QP0)
> interface and shares the same VPort with its parent; it allows the subnet
> manager to configure the VPort through this interface when the parent
> doesn't support SMI.
>
> [...]
Applied, thanks!
[01/12] RDMA/core: Create "issm*" device nodes only when SMI is supported
https://git.kernel.org/rdma/rdma/c/50660c5197f52b
[02/12] net/mlx5: mlx5_ifc update for multi-plane support
https://git.kernel.org/rdma/rdma/c/65528cfb21fdb6
[03/12] RDMA/mlx5: Add support to multi-plane device and port
https://git.kernel.org/rdma/rdma/c/2a5db20fa53219
[04/12] RDMA/core: Support IB sub device with type "SMI"
https://git.kernel.org/rdma/rdma/c/f3b5c2b823fbd8
[05/12] RDMA: Set type of rdma_ah to IB for a SMI sub device
https://git.kernel.org/rdma/rdma/c/66862e38a557b3
[06/12] RDMA/core: Create GSI QP only when CM is supported
https://git.kernel.org/rdma/rdma/c/6d4498d1745128
[07/12] RDMA/mlx5: Support plane device and driver APIs to add and delete it
https://git.kernel.org/rdma/rdma/c/39351acd72e775
[08/12] RDMA/nldev: Add support to add/delete a sub IB device through netlink
https://git.kernel.org/rdma/rdma/c/201dfa2d8129a6
[09/12] RDMA/nldev: Add support to dump device type and parent device if exists
https://git.kernel.org/rdma/rdma/c/1bc00c7c0ae33e
[10/12] RDMA/mlx5: Add plane index support when querying PTYS registers
https://git.kernel.org/rdma/rdma/c/d6caf3986716c3
[11/12] net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports
https://git.kernel.org/rdma/rdma/c/db9e43f6580613
[12/12] RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register
https://git.kernel.org/rdma/rdma/c/ac3a5e5f01eb40
Best regards,
--
Leon Romanovsky <leonro@nvidia.com>