* [PATCH rdma-next 0/3] Add net namespace awareness to device registration
@ 2025-06-17 8:44 Leon Romanovsky
2025-06-17 8:44 ` [PATCH rdma-next 1/3] RDMA/core: Extend RDMA device registration to be net namespace aware Leon Romanovsky
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Leon Romanovsky @ 2025-06-17 8:44 UTC
To: Jason Gunthorpe; +Cc: Dennis Dalessandro, linux-rdma, Mark Bloch, Parav Pandit
From Mark:
Introduce net namespace awareness to RDMA device registration and update
relevant users accordingly.
Currently, RDMA devices are always registered in the initial network namespace,
even when their associated devlink devices have been moved to
a different namespace via devlink reload.
This results in inconsistent behavior and namespace mismatches.
So in this series, we update the RDMA core to optionally accept
a net namespace during device allocation, allowing drivers to associate
the RDMA device with the correct namespace. In addition, we ensure that
IPoIB inherits the namespace from the underlying RDMA device, maintaining
consistency across the RDMA stack.
Thanks
Mark Bloch (3):
RDMA/core: Extend RDMA device registration to be net namespace aware
RDMA/mlx5: Allocate IB device with net namespace supplied from core
dev
RDMA/ipoib: Use parent rdma device net namespace
drivers/infiniband/core/device.c | 14 ++++++++++++--
drivers/infiniband/hw/mlx5/ib_rep.c | 3 ++-
drivers/infiniband/hw/mlx5/main.c | 6 ++++--
drivers/infiniband/sw/rdmavt/vt.c | 2 +-
drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 ++
.../net/ethernet/mellanox/mlx5/core/lib/mlx5.h | 5 -----
include/linux/mlx5/driver.h | 5 +++++
include/rdma/ib_verbs.h | 15 +++++++++++++--
8 files changed, 39 insertions(+), 13 deletions(-)
--
2.49.0
* [PATCH rdma-next 1/3] RDMA/core: Extend RDMA device registration to be net namespace aware
2025-06-17 8:44 [PATCH rdma-next 0/3] Add net namespace awareness to device registration Leon Romanovsky
@ 2025-06-17 8:44 ` Leon Romanovsky
2025-06-17 8:44 ` [PATCH rdma-next 2/3] RDMA/mlx5: Allocate IB device with net namespace supplied from core dev Leon Romanovsky
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2025-06-17 8:44 UTC
To: Jason Gunthorpe; +Cc: Mark Bloch, Dennis Dalessandro, linux-rdma, Parav Pandit
From: Mark Bloch <mbloch@nvidia.com>
Presently, RDMA devices are always registered within the init network
namespace, even if the associated devlink device's namespace was
changed via a devlink reload. This mismatch leads to discrepancies
between the network namespace of the devlink device and that of the
RDMA device.
Therefore, extend the RDMA device allocation API to optionally take
the net namespace. This isn't limited to devices that support devlink
but allows all users to provide the network namespace if they need to
do so.
If a network namespace is provided during device allocation, it's up
to the caller to make sure the namespace stays valid until
ib_register_device() is called.
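
A minimal userspace sketch of the allocation rule described above, not the
kernel code itself. The struct layouts and the model_ib_alloc_device() name
are stand-ins invented for illustration; only the selection rule (force
init_net while ib_devices_shared_netns is set, otherwise use the namespace
supplied by the caller) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-ins for kernel types; not the real definitions. */
struct net {
	int id;
};

struct ib_device {
	struct net *rdma_net;
};

static struct net init_net = { .id = 0 };

/* Models the RDMA core's shared/exclusive netns mode flag. */
static bool ib_devices_shared_netns = true;

/*
 * Hypothetical model of _ib_alloc_device(size, net): in shared mode the
 * device is always placed in init_net, otherwise in the caller's namespace.
 */
static struct ib_device *model_ib_alloc_device(size_t size, struct net *net)
{
	struct ib_device *device;

	assert(size >= sizeof(*device));
	device = calloc(1, size);
	if (!device)
		return NULL;

	device->rdma_net = ib_devices_shared_netns ? &init_net : net;
	return device;
}
```

In this model, existing callers that pass &init_net see no change in either
mode, which is the backward-compatibility property the patch relies on.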
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/device.c | 14 ++++++++++++--
drivers/infiniband/sw/rdmavt/vt.c | 2 +-
include/rdma/ib_verbs.h | 11 +++++++++--
3 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 79d8e6fce487..1ca6a9b7ba1a 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -584,6 +584,8 @@ static void rdma_init_coredev(struct ib_core_device *coredev,
/**
* _ib_alloc_device - allocate an IB device struct
* @size:size of structure to allocate
+ * @net: network namespace device should be located in, namespace
+ * must stay valid until ib_register_device() is completed.
*
* Low-level drivers should use ib_alloc_device() to allocate &struct
* ib_device. @size is the size of the structure to be allocated,
@@ -591,7 +593,7 @@ static void rdma_init_coredev(struct ib_core_device *coredev,
* ib_dealloc_device() must be used to free structures allocated with
* ib_alloc_device().
*/
-struct ib_device *_ib_alloc_device(size_t size)
+struct ib_device *_ib_alloc_device(size_t size, struct net *net)
{
struct ib_device *device;
unsigned int i;
@@ -608,7 +610,15 @@ struct ib_device *_ib_alloc_device(size_t size)
return NULL;
}
- rdma_init_coredev(&device->coredev, device, &init_net);
+ /* ib_devices_shared_netns can't change while we have active namespaces
+ * in the system which means either init_net is passed or the user has
+ * no idea what they are doing.
+ *
+ * To avoid breaking backward compatibility, when in shared mode,
+ * force to init the device in the init_net.
+ */
+ net = ib_devices_shared_netns ? &init_net : net;
+ rdma_init_coredev(&device->coredev, device, net);
INIT_LIST_HEAD(&device->event_handler_list);
spin_lock_init(&device->qp_open_list_lock);
diff --git a/drivers/infiniband/sw/rdmavt/vt.c b/drivers/infiniband/sw/rdmavt/vt.c
index 5499025e8a0a..d22d610c2696 100644
--- a/drivers/infiniband/sw/rdmavt/vt.c
+++ b/drivers/infiniband/sw/rdmavt/vt.c
@@ -49,7 +49,7 @@ struct rvt_dev_info *rvt_alloc_device(size_t size, int nports)
{
struct rvt_dev_info *rdi;
- rdi = container_of(_ib_alloc_device(size), struct rvt_dev_info, ibdev);
+ rdi = container_of(_ib_alloc_device(size, &init_net), struct rvt_dev_info, ibdev);
if (!rdi)
return rdi;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5e70a5cf35c3..77cea846eb2d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2914,11 +2914,18 @@ struct ib_block_iter {
unsigned int __pg_bit; /* alignment of current block */
};
-struct ib_device *_ib_alloc_device(size_t size);
+struct ib_device *_ib_alloc_device(size_t size, struct net *net);
#define ib_alloc_device(drv_struct, member) \
container_of(_ib_alloc_device(sizeof(struct drv_struct) + \
BUILD_BUG_ON_ZERO(offsetof( \
- struct drv_struct, member))), \
+ struct drv_struct, member)), \
+ &init_net), \
+ struct drv_struct, member)
+
+#define ib_alloc_device_with_net(drv_struct, member, net) \
+ container_of(_ib_alloc_device(sizeof(struct drv_struct) + \
+ BUILD_BUG_ON_ZERO(offsetof( \
+ struct drv_struct, member)), net), \
struct drv_struct, member)
void ib_dealloc_device(struct ib_device *device);
--
2.49.0
* [PATCH rdma-next 2/3] RDMA/mlx5: Allocate IB device with net namespace supplied from core dev
2025-06-17 8:44 [PATCH rdma-next 0/3] Add net namespace awareness to device registration Leon Romanovsky
2025-06-17 8:44 ` [PATCH rdma-next 1/3] RDMA/core: Extend RDMA device registration to be net namespace aware Leon Romanovsky
@ 2025-06-17 8:44 ` Leon Romanovsky
2025-06-17 8:44 ` [PATCH rdma-next 3/3] RDMA/ipoib: Use parent rdma device net namespace Leon Romanovsky
2025-06-26 12:14 ` [PATCH rdma-next 0/3] Add net namespace awareness to device registration Leon Romanovsky
3 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2025-06-17 8:44 UTC
To: Jason Gunthorpe; +Cc: Mark Bloch, Dennis Dalessandro, linux-rdma, Parav Pandit
From: Mark Bloch <mbloch@nvidia.com>
Use the new ib_alloc_device_with_net() API to allocate the IB device
so that it is properly bound to the network namespace obtained via
mlx5_core_net(). This change ensures correct namespace association
(e.g., for containerized setups).
Additionally, expose mlx5_core_net() so that the RDMA driver can use it.
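
The flow can be pictured with the following userspace sketch. It is an
illustration under assumptions, not the driver code: the struct layouts are
simplified stand-ins, every model_* name is hypothetical, and the sketch
assumes the RDMA core is not in shared netns mode. It shows the namespace
being read from the core device's devlink instance and handed to the
allocator, with the driver structure recovered from the embedded ib_device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for kernel types; layouts are assumptions. */
struct net {
	int id;
};

struct devlink {
	struct net *net;
};

struct mlx5_core_dev {
	struct devlink devlink;
};

struct ib_device {
	struct net *rdma_net;
};

struct mlx5_ib_dev {
	struct ib_device ib_dev;	/* must stay the first member */
	struct mlx5_core_dev *mdev;
};

_Static_assert(offsetof(struct mlx5_ib_dev, ib_dev) == 0,
	       "ib_dev must be the first member");

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Models mlx5_core_net(): the namespace of the devlink instance. */
static struct net *model_mlx5_core_net(struct mlx5_core_dev *dev)
{
	return dev->devlink.net;
}

/* Models the allocator in exclusive netns mode. */
static struct ib_device *model_alloc(size_t size, struct net *net)
{
	struct ib_device *device = calloc(1, size);

	if (device)
		device->rdma_net = net;
	return device;
}

/* Models a probe path that allocates with the core device's namespace. */
static struct mlx5_ib_dev *model_mlx5r_probe(struct mlx5_core_dev *mdev)
{
	struct ib_device *ibdev;
	struct mlx5_ib_dev *dev;

	assert(mdev != NULL);
	ibdev = model_alloc(sizeof(struct mlx5_ib_dev),
			    model_mlx5_core_net(mdev));
	if (!ibdev)
		return NULL;

	dev = model_container_of(ibdev, struct mlx5_ib_dev, ib_dev);
	dev->mdev = mdev;
	return dev;
}
```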
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/ib_rep.c | 3 ++-
drivers/infiniband/hw/mlx5/main.c | 6 ++++--
drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h | 5 -----
include/linux/mlx5/driver.h | 5 +++++
4 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/ib_rep.c b/drivers/infiniband/hw/mlx5/ib_rep.c
index 49af1cfbe6d1..cc8859d3c2f5 100644
--- a/drivers/infiniband/hw/mlx5/ib_rep.c
+++ b/drivers/infiniband/hw/mlx5/ib_rep.c
@@ -88,7 +88,8 @@ mlx5_ib_vport_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep)
else
return mlx5_ib_set_vport_rep(lag_master, rep, vport_index);
- ibdev = ib_alloc_device(mlx5_ib_dev, ib_dev);
+ ibdev = ib_alloc_device_with_net(mlx5_ib_dev, ib_dev,
+ mlx5_core_net(lag_master));
if (!ibdev)
return -ENOMEM;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index b0aa6c8f218c..d0ddb24aeb64 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4826,7 +4826,8 @@ static struct ib_device *mlx5_ib_add_sub_dev(struct ib_device *parent,
!MLX5_CAP_GEN_2(mparent->mdev, multiplane_qp_ud))
return ERR_PTR(-EOPNOTSUPP);
- mplane = ib_alloc_device(mlx5_ib_dev, ib_dev);
+ mplane = ib_alloc_device_with_net(mlx5_ib_dev, ib_dev,
+ mlx5_core_net(mparent->mdev));
if (!mplane)
return ERR_PTR(-ENOMEM);
@@ -4940,7 +4941,8 @@ static int mlx5r_probe(struct auxiliary_device *adev,
num_ports = max(MLX5_CAP_GEN(mdev, num_ports),
MLX5_CAP_GEN(mdev, num_vhca_ports));
- dev = ib_alloc_device(mlx5_ib_dev, ib_dev);
+ dev = ib_alloc_device_with_net(mlx5_ib_dev, ib_dev,
+ mlx5_core_net(mdev));
if (!dev)
return -ENOMEM;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
index 37d5f445598c..b111ccd03b02 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
@@ -45,11 +45,6 @@ int mlx5_crdump_enable(struct mlx5_core_dev *dev);
void mlx5_crdump_disable(struct mlx5_core_dev *dev);
int mlx5_crdump_collect(struct mlx5_core_dev *dev, u32 *cr_data);
-static inline struct net *mlx5_core_net(struct mlx5_core_dev *dev)
-{
- return devlink_net(priv_to_devlink(dev));
-}
-
static inline struct net_device *mlx5_uplink_netdev_get(struct mlx5_core_dev *mdev)
{
return mdev->mlx5e_res.uplink_netdev;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index e6ba8f4f4bd1..3475d33c75f4 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1349,4 +1349,9 @@ enum {
};
bool mlx5_wc_support_get(struct mlx5_core_dev *mdev);
+
+static inline struct net *mlx5_core_net(struct mlx5_core_dev *dev)
+{
+ return devlink_net(priv_to_devlink(dev));
+}
#endif /* MLX5_DRIVER_H */
--
2.49.0
* [PATCH rdma-next 3/3] RDMA/ipoib: Use parent rdma device net namespace
2025-06-17 8:44 [PATCH rdma-next 0/3] Add net namespace awareness to device registration Leon Romanovsky
2025-06-17 8:44 ` [PATCH rdma-next 1/3] RDMA/core: Extend RDMA device registration to be net namespace aware Leon Romanovsky
2025-06-17 8:44 ` [PATCH rdma-next 2/3] RDMA/mlx5: Allocate IB device with net namespace supplied from core dev Leon Romanovsky
@ 2025-06-17 8:44 ` Leon Romanovsky
2025-06-26 12:14 ` [PATCH rdma-next 0/3] Add net namespace awareness to device registration Leon Romanovsky
3 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2025-06-17 8:44 UTC
To: Jason Gunthorpe; +Cc: Mark Bloch, Dennis Dalessandro, linux-rdma, Parav Pandit
From: Mark Bloch <mbloch@nvidia.com>
Use the net namespace of the underlying rdma device.
Now that the rdma device's namespace is honored, the ipoib
netdev also runs in the same net namespace as the
rdma device.
Add an API to read the net namespace of the rdma device
so that ULPs such as IPoIB can use it to initialize their
netdevs.
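
A rough userspace model of the ordering this change depends on: the netdev's
namespace is set from the parent rdma device before the netdev is registered.
The struct layouts and all model_* helpers are hypothetical stand-ins for
rdma_dev_net(), dev_net_set() and register_netdev(); they only mirror the
sequence, not the real implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types; layouts are assumptions. */
struct net {
	int id;
};

struct ib_device {
	struct net *rdma_net;
};

struct net_device {
	struct net *nd_net;
	bool registered;
};

/* Models rdma_dev_net(): read the namespace of the rdma device. */
static struct net *model_rdma_dev_net(const struct ib_device *device)
{
	return device->rdma_net;
}

/* Models dev_net_set(): only meaningful before registration. */
static void model_dev_net_set(struct net_device *ndev, struct net *net)
{
	assert(!ndev->registered);
	ndev->nd_net = net;
}

/* Models register_netdev(): the netdev becomes visible in nd_net. */
static int model_register_netdev(struct net_device *ndev)
{
	if (ndev->nd_net == NULL)
		return -1;
	ndev->registered = true;
	return 0;
}

/* Models the relevant tail of ipoib_add_port(). */
static int model_ipoib_add_port(struct ib_device *hca, struct net_device *ndev)
{
	model_dev_net_set(ndev, model_rdma_dev_net(hca));
	return model_register_netdev(ndev);
}
```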
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 ++
include/rdma/ib_verbs.h | 4 ++++
2 files changed, 6 insertions(+)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index f2f5465f2a90..7acafc5c0e09 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -2577,6 +2577,8 @@ static struct net_device *ipoib_add_port(const char *format,
ndev->rtnl_link_ops = ipoib_get_link_ops();
+ dev_net_set(ndev, rdma_dev_net(hca));
+
result = register_netdev(ndev);
if (result) {
pr_warn("%s: couldn't register ipoib port %d; error %d\n",
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 77cea846eb2d..2288387089cd 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4872,6 +4872,10 @@ bool rdma_dev_access_netns(const struct ib_device *device,
const struct net *net);
bool rdma_dev_has_raw_cap(const struct ib_device *dev);
+static inline struct net *rdma_dev_net(struct ib_device *device)
+{
+ return read_pnet(&device->coredev.rdma_net);
+}
#define IB_ROCE_UDP_ENCAP_VALID_PORT_MIN (0xC000)
#define IB_ROCE_UDP_ENCAP_VALID_PORT_MAX (0xFFFF)
--
2.49.0
* Re: [PATCH rdma-next 0/3] Add net namespace awareness to device registration
2025-06-17 8:44 [PATCH rdma-next 0/3] Add net namespace awareness to device registration Leon Romanovsky
` (2 preceding siblings ...)
2025-06-17 8:44 ` [PATCH rdma-next 3/3] RDMA/ipoib: Use parent rdma device net namespace Leon Romanovsky
@ 2025-06-26 12:14 ` Leon Romanovsky
3 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2025-06-26 12:14 UTC
To: Jason Gunthorpe, Leon Romanovsky
Cc: Dennis Dalessandro, linux-rdma, Mark Bloch, Parav Pandit
On Tue, 17 Jun 2025 11:44:00 +0300, Leon Romanovsky wrote:
> From Mark:
> Introduce net namespace awareness to RDMA device registration and update
> relevant users accordingly.
>
> Currently, RDMA devices are always registered in the initial network namespace,
> even when their associated devlink devices have been moved to
> a different namespace via devlink reload.
>
> [...]
Applied, thanks!
[1/3] RDMA/core: Extend RDMA device registration to be net namespace aware
https://git.kernel.org/rdma/rdma/c/8cffca866ba86c
[2/3] RDMA/mlx5: Allocate IB device with net namespace supplied from core dev
https://git.kernel.org/rdma/rdma/c/611d08207d3135
[3/3] RDMA/ipoib: Use parent rdma device net namespace
https://git.kernel.org/rdma/rdma/c/f1208b05574f63
Best regards,
--
Leon Romanovsky <leon@kernel.org>