public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] RDMA/rxe: Add the support that rxe can work in net namespace
@ 2026-03-06  8:24 Zhu Yanjun
  2026-03-06  8:24 ` [PATCH 1/4] RDMA/rxe: Add testcase for net namespace rxe Zhu Yanjun
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Zhu Yanjun @ 2026-03-06  8:24 UTC (permalink / raw)
  To: jgg, leon, zyjzyj2000, yanjun.zhu, dsahern, linux-rdma,
	linux-kselftest

Currently rxe does not work correctly in network namespaces.

When the rdma_rxe module is loaded, a UDP socket listening on port
4791 is created in init_net. When users run:

    ip link add ... type rxe

inside another network namespace, the RXE RDMA link is created but it
cannot function properly because the underlying UDP socket belongs to
init_net. Other network namespaces cannot use that socket.

To address this issue, this series introduces net namespace support
for rxe and moves socket management to be per network namespace.

The series first introduces per-net namespace management for the IPv4
and IPv6 sockets used by rxe. The sockets are created when the network
namespace becomes active and are released when the namespace is
destroyed.

Based on this infrastructure, rxe RDMA links are then created and
destroyed within each network namespace. This ensures that both the
UDP sockets and RDMA links are correctly scoped to the namespace in
which they are used.

With these changes, rxe RDMA links can be created and used both in
init_net and in other network namespaces, and resources are properly
cleaned up during namespace teardown.

The series also includes a selftest to verify RXE functionality in
network namespaces.

Zhu Yanjun (4):
  RDMA/rxe: Add testcase for net namespace rxe
  RDMA/nldev: Add dellink function pointer
  RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets
  RDMA/rxe: Support RDMA link creation and destruction per net namespace

 MAINTAINERS                                   |   1 +
 drivers/infiniband/core/nldev.c               |   6 +
 drivers/infiniband/sw/rxe/Makefile            |   3 +-
 drivers/infiniband/sw/rxe/rxe.c               |  41 ++++-
 drivers/infiniband/sw/rxe/rxe_net.c           | 122 ++++++++++----
 drivers/infiniband/sw/rxe/rxe_net.h           |   9 +-
 drivers/infiniband/sw/rxe/rxe_ns.c            | 156 ++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_ns.h            |  17 ++
 include/rdma/rdma_netlink.h                   |   2 +
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/rdma/Makefile         |   5 +
 tools/testing/selftests/rdma/config           |   3 +
 .../selftests/rdma/rping_between_netns.sh     |  57 +++++++
 tools/testing/selftests/rdma/rxe_ipv6.sh      |  47 ++++++
 .../testing/selftests/rdma/socket_with_rxe.sh |  64 +++++++
 15 files changed, 493 insertions(+), 41 deletions(-)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_ns.c
 create mode 100644 drivers/infiniband/sw/rxe/rxe_ns.h
 create mode 100644 tools/testing/selftests/rdma/Makefile
 create mode 100644 tools/testing/selftests/rdma/config
 create mode 100755 tools/testing/selftests/rdma/rping_between_netns.sh
 create mode 100755 tools/testing/selftests/rdma/rxe_ipv6.sh
 create mode 100755 tools/testing/selftests/rdma/socket_with_rxe.sh

-- 
2.52.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/4] RDMA/rxe: Add testcase for net namespace rxe
  2026-03-06  8:24 [PATCH 0/4] RDMA/rxe: Add the support that rxe can work in net namespace Zhu Yanjun
@ 2026-03-06  8:24 ` Zhu Yanjun
  2026-03-07  1:10   ` David Ahern
  2026-03-06  8:24 ` [PATCH 2/4] RDMA/nldev: Add dellink function pointer Zhu Yanjun
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Zhu Yanjun @ 2026-03-06  8:24 UTC (permalink / raw)
  To: jgg, leon, zyjzyj2000, yanjun.zhu, dsahern, linux-rdma,
	linux-kselftest

Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
 MAINTAINERS                                   |  1 +
 tools/testing/selftests/Makefile              |  1 +
 tools/testing/selftests/rdma/Makefile         |  5 ++
 tools/testing/selftests/rdma/config           |  3 +
 .../selftests/rdma/rping_between_netns.sh     | 57 +++++++++++++++++
 tools/testing/selftests/rdma/rxe_ipv6.sh      | 47 ++++++++++++++
 .../testing/selftests/rdma/socket_with_rxe.sh | 64 +++++++++++++++++++
 7 files changed, 178 insertions(+)
 create mode 100644 tools/testing/selftests/rdma/Makefile
 create mode 100644 tools/testing/selftests/rdma/config
 create mode 100755 tools/testing/selftests/rdma/rping_between_netns.sh
 create mode 100755 tools/testing/selftests/rdma/rxe_ipv6.sh
 create mode 100755 tools/testing/selftests/rdma/socket_with_rxe.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 61bf550fd37c..3f3aca470d77 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -24509,6 +24509,7 @@ L:	linux-rdma@vger.kernel.org
 S:	Supported
 F:	drivers/infiniband/sw/rxe/
 F:	include/uapi/rdma/rdma_user_rxe.h
+F:	tools/testing/selftests/rdma/
 
 SOFTLOGIC 6x10 MPEG CODEC
 M:	Bluecherry Maintainers <maintainers@bluecherrydvr.com>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 450f13ba4cca..110e07c0d99d 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -94,6 +94,7 @@ TARGETS += proc
 TARGETS += pstore
 TARGETS += ptrace
 TARGETS += openat2
+TARGETS += rdma
 TARGETS += resctrl
 TARGETS += riscv
 TARGETS += rlimits
diff --git a/tools/testing/selftests/rdma/Makefile b/tools/testing/selftests/rdma/Makefile
new file mode 100644
index 000000000000..362e97f0fb3e
--- /dev/null
+++ b/tools/testing/selftests/rdma/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+TEST_PROGS := rping_between_netns.sh \
+		rxe_ipv6.sh \
+		socket_with_rxe.sh
+include ../lib.mk
diff --git a/tools/testing/selftests/rdma/config b/tools/testing/selftests/rdma/config
new file mode 100644
index 000000000000..4ffb814e253b
--- /dev/null
+++ b/tools/testing/selftests/rdma/config
@@ -0,0 +1,3 @@
+CONFIG_TUN
+CONFIG_VETH
+CONFIG_RDMA_RXE
diff --git a/tools/testing/selftests/rdma/rping_between_netns.sh b/tools/testing/selftests/rdma/rping_between_netns.sh
new file mode 100755
index 000000000000..80b4249dba55
--- /dev/null
+++ b/tools/testing/selftests/rdma/rping_between_netns.sh
@@ -0,0 +1,57 @@
+#!/bin/sh
+
+# Notes:
+#
+# 1. Before running this script, please disable the firewall, as it may
+# block UDP port 4791.
+
+# 2. This test script depends on the veth and tun drivers. Before running
+#  the script, please verify that both drivers are available by executing:
+#
+# modinfo veth
+#
+# Make sure these commands return valid module information.
+
+#1. Check if rping can work or not
+exec > /dev/null
+ip netns add test1
+ip netns ls
+ip link add veth-a type veth peer name veth-b
+ip l
+ip link set veth-a netns test1
+ip l
+ip netns exec test1 ip l set veth-a up
+ip netns exec test1 ip addr add 1.1.1.1/24 dev veth-a
+ip netns exec test1 ip l
+ip netns exec test1 ip -4 a
+ip netns exec test1 rdma link add rxe0 type rxe netdev veth-a
+
+#check if socket exist or not
+ip netns exec test1 ss -lun | grep :4791
+
+ip netns exec test1 rdma link
+ip link set veth-b up
+ip addr add 1.1.1.2/24 dev veth-b
+ping -c 3 1.1.1.1 || exit 1
+ip netns exec test1 rping -s -a 1.1.1.1&
+rdma link add rxe1 type rxe netdev veth-b
+rdma link
+
+#check if socket exist or not
+ss -lun | grep :4791
+
+rping -c -a 1.1.1.1 -d -v -C 3 || exit 1
+ip netns ls
+rdma link del rxe1
+
+#check if socket exist or not
+ss -lun | grep :4791
+
+ip netns exec test1 ss -lun | grep :4791
+ip netns exec test1 rdma link del rxe0
+ip netns exec test1 ss -lun | grep :4791
+ip netns del test1
+ip netns ls
+
+modprobe -v -r veth
+modprobe -v -r rdma_rxe
diff --git a/tools/testing/selftests/rdma/rxe_ipv6.sh b/tools/testing/selftests/rdma/rxe_ipv6.sh
new file mode 100755
index 000000000000..9337ac4fd13f
--- /dev/null
+++ b/tools/testing/selftests/rdma/rxe_ipv6.sh
@@ -0,0 +1,47 @@
+#!/bin/sh
+
+# Notes:
+#
+# 1. Before running this script, please disable the firewall, as it may
+# block UDP port 4791.
+
+# 2. This test script depends on the veth and tun drivers. Before running
+#  the script, please verify that both drivers are available by executing:
+#
+# modinfo tun
+# modinfo veth
+#
+# Make sure these commands return valid module information.
+
+# 3. ipv6 test.
+# While RXE is conventionally deployed over IPv4, it maintains
+# native support for IPv6. However, IPv6 implementations typically
+# receive less validation and performance tuning in standard use cases.
+exec > /dev/null
+# 1) create ipv6 net namespace
+ip netns add net6
+ip link add veth0 type veth peer name veth1
+ip link set veth1 netns net6
+ip netns exec net6 ip addr add 2001:db8::1/64 dev veth1
+ip netns exec net6 ip link set veth1 up
+
+# 2) Add rdma link
+ip netns exec net6 rdma link add rxe6 type rxe netdev veth1
+
+# 3) check IPv6 UDP 4791 listening port
+if ! ip netns exec net6 ss -ul6n | grep :4791; then
+	echo "Error: udp port 4791 exists"
+	exit 1
+fi
+
+# 4) Delete rxe link
+ip netns exec net6 rdma link del rxe6
+if ip netns exec net6 ss -ul6n | grep :4791; then  # result should be null
+	echo "Error: udp port 4791 exists"
+	exit 1
+fi
+
+# 5) delete net6
+ip netns del net6
+
+modprobe -v -r rdma_rxe
diff --git a/tools/testing/selftests/rdma/socket_with_rxe.sh b/tools/testing/selftests/rdma/socket_with_rxe.sh
new file mode 100755
index 000000000000..676aec63babd
--- /dev/null
+++ b/tools/testing/selftests/rdma/socket_with_rxe.sh
@@ -0,0 +1,64 @@
+#!/bin/sh
+
+# Notes:
+#
+# 1. Before running this script, please disable the firewall, as it may
+# block UDP port 4791.
+
+# 2. This test script depends on the veth and tun drivers. Before running
+#  the script, please verify that both drivers are available by executing:
+#
+# modinfo tun
+#
+# Make sure these commands return valid module information.
+
+# Check if socket exist or not
+exec > /dev/null
+ip tuntap add mode tun tun0
+ip -4 a
+ip addr add 1.1.1.1/24 dev tun0
+ip link set tun0 up
+ip -4 a
+rdma link add rxe0 type rxe netdev tun0
+rdma link
+ret=`ss -lun | grep :4791`
+if [ X"$ret" == X"" ]; then
+	echo "Error: udp port 4791 does not exist"
+	exit 1
+fi
+
+ip tuntap add mode tun tun1
+ip -4 a
+ip addr add 2.2.2.2/24 dev tun1
+ip link set tun1 up
+rdma link add rxe1 type rxe netdev tun1
+rdma link
+ret=`ss -lun | grep :4791`
+if [ X"$ret" == X"" ]; then
+	echo "Error: udp port 4791 does not exist"
+	exit 1
+fi
+
+rdma link del rxe1
+rdma link
+ret=`ss -lun | grep :4791`
+if [ X"$ret" == X"" ]; then
+	echo "Error: udp port 4791 doese not exist"
+	exit 1
+fi
+
+rdma link del rxe0
+rdma link
+if ss -lun | grep :4791; then
+	echo "Error: udp port 4791 exists"
+	exit 1
+fi
+
+ip addr del 2.2.2.2/24 dev tun1
+ip tuntap del mode tun tun1
+
+ip addr del 1.1.1.1/24 dev tun0
+ip tuntap del mode tun tun0
+
+modprobe -v -r tun
+modprobe -v -r rdma_rxe
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/4] RDMA/nldev: Add dellink function pointer
  2026-03-06  8:24 [PATCH 0/4] RDMA/rxe: Add the support that rxe can work in net namespace Zhu Yanjun
  2026-03-06  8:24 ` [PATCH 1/4] RDMA/rxe: Add testcase for net namespace rxe Zhu Yanjun
@ 2026-03-06  8:24 ` Zhu Yanjun
  2026-03-06  8:24 ` [PATCH 3/4] RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets Zhu Yanjun
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Zhu Yanjun @ 2026-03-06  8:24 UTC (permalink / raw)
  To: jgg, leon, zyjzyj2000, yanjun.zhu, dsahern, linux-rdma,
	linux-kselftest

The newlink function pointer is added. And the sock listening on port 4791
is added in the newlink function. So the dellink function is needed to
remove the sock.

Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
 drivers/infiniband/core/nldev.c | 6 ++++++
 include/rdma/rdma_netlink.h     | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 2220a2dfab24..48684930660a 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -1824,6 +1824,12 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		return -EINVAL;
 	}
 
+	if (device->link_ops) {
+		err = device->link_ops->dellink(device);
+		if (err)
+			return err;
+	}
+
 	ib_unregister_device_and_put(device);
 	return 0;
 }
diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h
index 326deaf56d5d..2fd1358ea57d 100644
--- a/include/rdma/rdma_netlink.h
+++ b/include/rdma/rdma_netlink.h
@@ -5,6 +5,7 @@
 
 #include <linux/netlink.h>
 #include <uapi/rdma/rdma_netlink.h>
+#include <rdma/ib_verbs.h>
 
 struct ib_device;
 
@@ -126,6 +127,7 @@ struct rdma_link_ops {
 	struct list_head list;
 	const char *type;
 	int (*newlink)(const char *ibdev_name, struct net_device *ndev);
+	int (*dellink)(struct ib_device *dev);
 };
 
 void rdma_link_register(struct rdma_link_ops *ops);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/4] RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets
  2026-03-06  8:24 [PATCH 0/4] RDMA/rxe: Add the support that rxe can work in net namespace Zhu Yanjun
  2026-03-06  8:24 ` [PATCH 1/4] RDMA/rxe: Add testcase for net namespace rxe Zhu Yanjun
  2026-03-06  8:24 ` [PATCH 2/4] RDMA/nldev: Add dellink function pointer Zhu Yanjun
@ 2026-03-06  8:24 ` Zhu Yanjun
  2026-03-07  1:10   ` David Ahern
  2026-03-06  8:24 ` [PATCH 4/4] RDMA/rxe: Support RDMA link creation and destruction per net namespace Zhu Yanjun
  2026-03-06  8:27 ` [PATCH 0/4] RDMA/rxe: Add the support that rxe can work in " Zhu Yanjun
  4 siblings, 1 reply; 12+ messages in thread
From: Zhu Yanjun @ 2026-03-06  8:24 UTC (permalink / raw)
  To: jgg, leon, zyjzyj2000, yanjun.zhu, dsahern, linux-rdma,
	linux-kselftest

Add a net namespace implementation file to rxe to manage the
lifecycle of IPv4 and IPv6 sockets per network namespace.

This implementation handles the creation and destruction of the
sockets both for init_net and for dynamically created network
namespaces. The sockets are initialized when a namespace becomes
active and are properly released when the namespace is removed.

This change provides the infrastructure needed for rxe to operate
correctly in environments using multiple network namespaces.

Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
 drivers/infiniband/sw/rxe/Makefile |   3 +-
 drivers/infiniband/sw/rxe/rxe_ns.c | 134 +++++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_ns.h |  17 ++++
 3 files changed, 153 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_ns.c
 create mode 100644 drivers/infiniband/sw/rxe/rxe_ns.h

diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile
index 93134f1d1d0c..3977f4f13258 100644
--- a/drivers/infiniband/sw/rxe/Makefile
+++ b/drivers/infiniband/sw/rxe/Makefile
@@ -22,6 +22,7 @@ rdma_rxe-y := \
 	rxe_mcast.o \
 	rxe_task.o \
 	rxe_net.o \
-	rxe_hw_counters.o
+	rxe_hw_counters.o \
+	rxe_ns.o
 
 rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o
diff --git a/drivers/infiniband/sw/rxe/rxe_ns.c b/drivers/infiniband/sw/rxe/rxe_ns.c
new file mode 100644
index 000000000000..29d08899dcda
--- /dev/null
+++ b/drivers/infiniband/sw/rxe/rxe_ns.c
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ */
+
+#include <net/sock.h>
+#include <net/netns/generic.h>
+#include <net/net_namespace.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/pid_namespace.h>
+#include <net/udp_tunnel.h>
+
+#include "rxe_ns.h"
+
+/*
+ * Per network namespace data
+ */
+struct rxe_ns_sock {
+	struct sock __rcu *rxe_sk4;
+	struct sock __rcu *rxe_sk6;
+};
+
+/*
+ * Index to store custom data for each network namespace.
+ */
+static unsigned int rxe_pernet_id;
+
+/*
+ * Called for every existing and added network namespaces
+ */
+static int __net_init rxe_ns_init(struct net *net)
+{
+	/*
+	 * create (if not present) and access data item in network namespace
+	 * (net) using the id (net_id)
+	 */
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+
+	rcu_assign_pointer(ns_sk->rxe_sk4, NULL); /* initialize sock 4 socket */
+	rcu_assign_pointer(ns_sk->rxe_sk6, NULL); /* initialize sock 6 socket */
+	synchronize_rcu();
+
+	return 0;
+}
+
+static void __net_exit rxe_ns_exit(struct net *net)
+{
+	/*
+	 * called when the network namespace is removed
+	 */
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+	struct sock *rxe_sk4 = NULL;
+	struct sock *rxe_sk6 = NULL;
+
+	rcu_read_lock();
+	rxe_sk4 = rcu_dereference(ns_sk->rxe_sk4);
+	rxe_sk6 = rcu_dereference(ns_sk->rxe_sk6);
+	rcu_read_unlock();
+
+	/* close socket */
+	if (rxe_sk4 && rxe_sk4->sk_socket) {
+		udp_tunnel_sock_release(rxe_sk4->sk_socket);
+		rcu_assign_pointer(ns_sk->rxe_sk4, NULL);
+		synchronize_rcu();
+	}
+
+	if (rxe_sk6 && rxe_sk6->sk_socket) {
+		udp_tunnel_sock_release(rxe_sk6->sk_socket);
+		rcu_assign_pointer(ns_sk->rxe_sk6, NULL);
+		synchronize_rcu();
+	}
+}
+
+/*
+ * callback to make the module network namespace aware
+ */
+static struct pernet_operations rxe_net_ops __net_initdata = {
+	.init = rxe_ns_init,
+	.exit = rxe_ns_exit,
+	.id = &rxe_pernet_id,
+	.size = sizeof(struct rxe_ns_sock),
+};
+
+struct sock *rxe_ns_pernet_sk4(struct net *net)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+	struct sock *sk;
+
+	rcu_read_lock();
+	sk = rcu_dereference(ns_sk->rxe_sk4);
+	rcu_read_unlock();
+
+	return sk;
+}
+
+void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+
+	rcu_assign_pointer(ns_sk->rxe_sk4, sk);
+	synchronize_rcu();
+}
+
+struct sock *rxe_ns_pernet_sk6(struct net *net)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+	struct sock *sk;
+
+	rcu_read_lock();
+	sk = rcu_dereference(ns_sk->rxe_sk6);
+	rcu_read_unlock();
+
+	return sk;
+}
+
+void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+
+	rcu_assign_pointer(ns_sk->rxe_sk6, sk);
+	synchronize_rcu();
+}
+
+int __init rxe_namespace_init(void)
+{
+	return register_pernet_subsys(&rxe_net_ops);
+}
+
+void __exit rxe_namespace_exit(void)
+{
+	unregister_pernet_subsys(&rxe_net_ops);
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_ns.h b/drivers/infiniband/sw/rxe/rxe_ns.h
new file mode 100644
index 000000000000..da5bfcea1274
--- /dev/null
+++ b/drivers/infiniband/sw/rxe/rxe_ns.h
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ */
+
+#ifndef RXE_NS_H
+#define RXE_NS_H
+
+struct sock *rxe_ns_pernet_sk4(struct net *net);
+struct sock *rxe_ns_pernet_sk6(struct net *net);
+void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk);
+void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk);
+int __init rxe_namespace_init(void);
+void __exit rxe_namespace_exit(void);
+
+#endif /* RXE_NS_H */
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/4] RDMA/rxe: Support RDMA link creation and destruction per net namespace
  2026-03-06  8:24 [PATCH 0/4] RDMA/rxe: Add the support that rxe can work in net namespace Zhu Yanjun
                   ` (2 preceding siblings ...)
  2026-03-06  8:24 ` [PATCH 3/4] RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets Zhu Yanjun
@ 2026-03-06  8:24 ` Zhu Yanjun
  2026-03-07  1:10   ` David Ahern
  2026-03-07  1:12   ` yanjun.zhu
  2026-03-06  8:27 ` [PATCH 0/4] RDMA/rxe: Add the support that rxe can work in " Zhu Yanjun
  4 siblings, 2 replies; 12+ messages in thread
From: Zhu Yanjun @ 2026-03-06  8:24 UTC (permalink / raw)
  To: jgg, leon, zyjzyj2000, yanjun.zhu, dsahern, linux-rdma,
	linux-kselftest

After introducing dellink handling and per-net namespace management
for IPv4 and IPv6 sockets, extend rxe to create and destroy RDMA links
within each network namespace.

With this change, RDMA links can be instantiated both in init_net and
in other network namespaces. The lifecycle of the RDMA link is now tied
to the corresponding namespace and is properly cleaned up when the
namespace or link is removed.

This ensures rxe behaves correctly in multi-namespace environments and
keeps socket and RDMA link resources consistent across namespace
creation and teardown.

Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
 drivers/infiniband/sw/rxe/rxe.c     |  41 +++++++++-
 drivers/infiniband/sw/rxe/rxe_net.c | 122 +++++++++++++++++++++-------
 drivers/infiniband/sw/rxe/rxe_net.h |   9 +-
 drivers/infiniband/sw/rxe/rxe_ns.c  |  22 +++++
 4 files changed, 154 insertions(+), 40 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index e891199cbdef..f74a66948a37 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -8,6 +8,8 @@
 #include <net/addrconf.h>
 #include "rxe.h"
 #include "rxe_loc.h"
+#include "rxe_net.h"
+#include "rxe_ns.h"
 
 MODULE_AUTHOR("Bob Pearson, Frank Zago, John Groves, Kamal Heib");
 MODULE_DESCRIPTION("Soft RDMA transport");
@@ -200,6 +202,8 @@ void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
 	port->mtu_cap = ib_mtu_enum_to_int(mtu);
 }
 
+static struct rdma_link_ops rxe_link_ops;
+
 /* called by ifc layer to create new rxe device.
  * The caller should allocate memory for rxe by calling ib_alloc_device.
  */
@@ -208,6 +212,7 @@ int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name,
 {
 	rxe_init(rxe, ndev);
 	rxe_set_mtu(rxe, mtu);
+	rxe->ib_dev.link_ops = &rxe_link_ops;
 
 	return rxe_register_device(rxe, ibdev_name, ndev);
 }
@@ -231,6 +236,10 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
 		goto err;
 	}
 
+	err = rxe_net_init(ndev);
+	if (err)
+		return err;
+
 	err = rxe_net_add(ibdev_name, ndev);
 	if (err) {
 		rxe_err("failed to add %s\n", ndev->name);
@@ -240,9 +249,17 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
 	return err;
 }
 
+static int rxe_dellink(struct ib_device *dev)
+{
+	rxe_net_del(dev);
+
+	return 0;
+}
+
 static struct rdma_link_ops rxe_link_ops = {
 	.type = "rxe",
 	.newlink = rxe_newlink,
+	.dellink = rxe_dellink,
 };
 
 static int __init rxe_module_init(void)
@@ -253,15 +270,29 @@ static int __init rxe_module_init(void)
 	if (err)
 		return err;
 
-	err = rxe_net_init();
+	rdma_link_register(&rxe_link_ops);
+
+	err = rxe_register_notifier();
 	if (err) {
-		rxe_destroy_wq();
-		return err;
+		pr_err("Failed to register netdev notifier\n");
+		goto err_wq;
+	}
+
+	err = rxe_namespace_init();
+	if (err) {
+		pr_err("Failed to register net namespace notifier\n");
+		goto err_notifier;;
 	}
 
-	rdma_link_register(&rxe_link_ops);
 	pr_info("loaded\n");
 	return 0;
+
+err_notifier:
+	rxe_net_exit(); /* unregister notifier */
+err_wq:
+	rdma_link_unregister(&rxe_link_ops);
+	rxe_destroy_wq();
+	return err;
 }
 
 static void __exit rxe_module_exit(void)
@@ -271,6 +302,8 @@ static void __exit rxe_module_exit(void)
 	rxe_net_exit();
 	rxe_destroy_wq();
 
+	rxe_namespace_exit();
+
 	pr_info("unloaded\n");
 }
 
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 0bd0902b11f7..ba5bc171a58e 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -17,8 +17,7 @@
 #include "rxe.h"
 #include "rxe_net.h"
 #include "rxe_loc.h"
-
-static struct rxe_recv_sockets recv_sockets;
+#include "rxe_ns.h"
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 /*
@@ -114,7 +113,7 @@ static struct dst_entry *rxe_find_route4(struct rxe_qp *qp,
 	memcpy(&fl.daddr, daddr, sizeof(*daddr));
 	fl.flowi4_proto = IPPROTO_UDP;
 
-	rt = ip_route_output_key(&init_net, &fl);
+	rt = ip_route_output_key(dev_net(ndev), &fl);
 	if (IS_ERR(rt)) {
 		rxe_dbg_qp(qp, "no route to %pI4\n", &daddr->s_addr);
 		return NULL;
@@ -138,8 +137,8 @@ static struct dst_entry *rxe_find_route6(struct rxe_qp *qp,
 	memcpy(&fl6.daddr, daddr, sizeof(*daddr));
 	fl6.flowi6_proto = IPPROTO_UDP;
 
-	ndst = ipv6_stub->ipv6_dst_lookup_flow(sock_net(recv_sockets.sk6->sk),
-					       recv_sockets.sk6->sk, &fl6,
+	ndst = ipv6_stub->ipv6_dst_lookup_flow(dev_net(ndev),
+					       rxe_ns_pernet_sk6(dev_net(ndev)), &fl6,
 					       NULL);
 	if (IS_ERR(ndst)) {
 		rxe_dbg_qp(qp, "no route to %pI6\n", daddr);
@@ -624,6 +623,43 @@ int rxe_net_add(const char *ibdev_name, struct net_device *ndev)
 	return 0;
 }
 
+#define SK_REF_FOR_TUNNEL	2
+
+static void rxe_sock_put(struct sock *sk,
+					void (*set_sk)(struct net *, struct sock *),
+					struct net_device *ndev)
+{
+	if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL) {
+		__sock_put(sk);
+	} else {
+		rxe_release_udp_tunnel(sk->sk_socket);
+		sk = NULL;
+		set_sk(dev_net(ndev), sk);
+	}
+}
+
+void rxe_net_del(struct ib_device *dev)
+{
+	struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
+	struct net_device *ndev;
+	struct sock *sk;
+
+	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
+	if (!ndev)
+		return;
+
+	sk = rxe_ns_pernet_sk4(dev_net(ndev));
+	if (sk)
+		rxe_sock_put(sk, rxe_ns_pernet_set_sk4, ndev);
+
+	sk = rxe_ns_pernet_sk6(dev_net(ndev));
+	if (sk)
+		rxe_sock_put(sk, rxe_ns_pernet_set_sk6, ndev);
+
+	dev_put(ndev);
+}
+#undef SK_REF_FOR_TUNNEL
+
 static void rxe_port_event(struct rxe_dev *rxe,
 			   enum ib_event_type event)
 {
@@ -680,6 +716,7 @@ static int rxe_notify(struct notifier_block *not_blk,
 	switch (event) {
 	case NETDEV_UNREGISTER:
 		ib_unregister_device_queued(&rxe->ib_dev);
+		rxe_net_del(&rxe->ib_dev);
 		break;
 	case NETDEV_CHANGEMTU:
 		rxe_dbg_dev(rxe, "%s changed mtu to %d\n", ndev->name, ndev->mtu);
@@ -709,66 +746,91 @@ static struct notifier_block rxe_net_notifier = {
 	.notifier_call = rxe_notify,
 };
 
-static int rxe_net_ipv4_init(void)
+static int rxe_net_ipv4_init(struct net_device *ndev)
 {
-	recv_sockets.sk4 = rxe_setup_udp_tunnel(&init_net,
-				htons(ROCE_V2_UDP_DPORT), false);
-	if (IS_ERR(recv_sockets.sk4)) {
-		recv_sockets.sk4 = NULL;
+	struct sock *sk;
+	struct socket *sock;
+
+	sk = rxe_ns_pernet_sk4(dev_net(ndev));
+	if (sk) {
+		sock_hold(sk);
+		return 0;
+	}
+
+	sock = rxe_setup_udp_tunnel(dev_net(ndev), htons(ROCE_V2_UDP_DPORT), false);
+	if (IS_ERR(sock)) {
 		pr_err("Failed to create IPv4 UDP tunnel\n");
 		return -1;
 	}
+	rxe_ns_pernet_set_sk4(dev_net(ndev), sock->sk);
 
 	return 0;
 }
 
-static int rxe_net_ipv6_init(void)
+static int rxe_net_ipv6_init(struct net_device *ndev)
 {
 #if IS_ENABLED(CONFIG_IPV6)
+	struct sock *sk;
+	struct socket *sock;
 
-	recv_sockets.sk6 = rxe_setup_udp_tunnel(&init_net,
-						htons(ROCE_V2_UDP_DPORT), true);
-	if (PTR_ERR(recv_sockets.sk6) == -EAFNOSUPPORT) {
-		recv_sockets.sk6 = NULL;
+	sk = rxe_ns_pernet_sk6(dev_net(ndev));
+	if (sk) {
+		sock_hold(sk);
+		return 0;
+	}
+
+	sock = rxe_setup_udp_tunnel(dev_net(ndev), htons(ROCE_V2_UDP_DPORT), true);
+	if (PTR_ERR(sock) == -EAFNOSUPPORT) {
 		pr_warn("IPv6 is not supported, can not create a UDPv6 socket\n");
 		return 0;
 	}
 
-	if (IS_ERR(recv_sockets.sk6)) {
-		recv_sockets.sk6 = NULL;
+	if (IS_ERR(sock)) {
 		pr_err("Failed to create IPv6 UDP tunnel\n");
 		return -1;
 	}
+
+	rxe_ns_pernet_set_sk6(dev_net(ndev), sock->sk);
+
 #endif
 	return 0;
 }
 
+int rxe_register_notifier(void)
+{
+	int err;
+
+	err = register_netdevice_notifier(&rxe_net_notifier);
+	if (err) {
+		pr_err("Failed to register netdev notifier\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 void rxe_net_exit(void)
 {
-	rxe_release_udp_tunnel(recv_sockets.sk6);
-	rxe_release_udp_tunnel(recv_sockets.sk4);
 	unregister_netdevice_notifier(&rxe_net_notifier);
 }
 
-int rxe_net_init(void)
+int rxe_net_init(struct net_device *ndev)
 {
 	int err;
 
-	recv_sockets.sk6 = NULL;
-
-	err = rxe_net_ipv4_init();
+	err = rxe_net_ipv4_init(ndev);
 	if (err)
 		return err;
-	err = rxe_net_ipv6_init();
+
+	err = rxe_net_ipv6_init(ndev);
 	if (err)
 		goto err_out;
-	err = register_netdevice_notifier(&rxe_net_notifier);
-	if (err) {
-		pr_err("Failed to register netdev notifier\n");
-		goto err_out;
-	}
+
 	return 0;
+
 err_out:
-	rxe_net_exit();
+	/* If ipv6 error, release ipv4 resource */
+	udp_tunnel_sock_release(rxe_ns_pernet_sk4(dev_net(ndev))->sk_socket);
+	rxe_ns_pernet_set_sk4(dev_net(ndev), NULL);
 	return err;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h
index 45d80d00f86b..56249677d692 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.h
+++ b/drivers/infiniband/sw/rxe/rxe_net.h
@@ -11,14 +11,11 @@
 #include <net/if_inet6.h>
 #include <linux/module.h>
 
-struct rxe_recv_sockets {
-	struct socket *sk4;
-	struct socket *sk6;
-};
-
 int rxe_net_add(const char *ibdev_name, struct net_device *ndev);
+void rxe_net_del(struct ib_device *dev);
 
-int rxe_net_init(void);
+int rxe_register_notifier(void);
+int rxe_net_init(struct net_device *ndev);
 void rxe_net_exit(void);
 
 #endif /* RXE_NET_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_ns.c b/drivers/infiniband/sw/rxe/rxe_ns.c
index 29d08899dcda..1ff34167a295 100644
--- a/drivers/infiniband/sw/rxe/rxe_ns.c
+++ b/drivers/infiniband/sw/rxe/rxe_ns.c
@@ -39,7 +39,9 @@ static int __net_init rxe_ns_init(struct net *net)
 	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
 
 	rcu_assign_pointer(ns_sk->rxe_sk4, NULL); /* initialize sock 4 socket */
+#if IS_ENABLED(CONFIG_IPV6)
 	rcu_assign_pointer(ns_sk->rxe_sk6, NULL); /* initialize sock 6 socket */
+#endif /* IPV6 */
 	synchronize_rcu();
 
 	return 0;
@@ -52,11 +54,15 @@ static void __net_exit rxe_ns_exit(struct net *net)
 	 */
 	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
 	struct sock *rxe_sk4 = NULL;
+#if IS_ENABLED(CONFIG_IPV6)
 	struct sock *rxe_sk6 = NULL;
+#endif
 
 	rcu_read_lock();
 	rxe_sk4 = rcu_dereference(ns_sk->rxe_sk4);
+#if IS_ENABLED(CONFIG_IPV6)
 	rxe_sk6 = rcu_dereference(ns_sk->rxe_sk6);
+#endif
 	rcu_read_unlock();
 
 	/* close socket */
@@ -66,11 +72,13 @@ static void __net_exit rxe_ns_exit(struct net *net)
 		synchronize_rcu();
 	}
 
+#if IS_ENABLED(CONFIG_IPV6)
 	if (rxe_sk6 && rxe_sk6->sk_socket) {
 		udp_tunnel_sock_release(rxe_sk6->sk_socket);
 		rcu_assign_pointer(ns_sk->rxe_sk6, NULL);
 		synchronize_rcu();
 	}
+#endif
 }
 
 /*
@@ -103,6 +111,7 @@ void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk)
 	synchronize_rcu();
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
 struct sock *rxe_ns_pernet_sk6(struct net *net)
 {
 	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
@@ -123,6 +132,19 @@ void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
 	synchronize_rcu();
 }
 
+#else /* IPV6 */
+
+struct sock *rxe_ns_pernet_sk6(struct net *net)
+{
+	return NULL;
+}
+
+void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
+{
+}
+
+#endif /* IPV6 */
+
 int __init rxe_namespace_init(void)
 {
 	return register_pernet_subsys(&rxe_net_ops);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] RDMA/rxe: Add the support that rxe can work in net namespace
  2026-03-06  8:24 [PATCH 0/4] RDMA/rxe: Add the support that rxe can work in net namespace Zhu Yanjun
                   ` (3 preceding siblings ...)
  2026-03-06  8:24 ` [PATCH 4/4] RDMA/rxe: Support RDMA link creation and destruction per net namespace Zhu Yanjun
@ 2026-03-06  8:27 ` Zhu Yanjun
  4 siblings, 0 replies; 12+ messages in thread
From: Zhu Yanjun @ 2026-03-06  8:27 UTC (permalink / raw)
  To: jgg, leon, zyjzyj2000, dsahern, linux-rdma, linux-kselftest


在 2026/3/6 0:24, Zhu Yanjun 写道:
> Currently rxe does not work correctly in network namespaces.
>
> When the rdma_rxe module is loaded, a UDP socket listening on port
> 4791 is created in init_net. When users run:
>
>      ip link add ... type rxe
>
> inside another network namespace, the RXE RDMA link is created but it
> cannot function properly because the underlying UDP socket belongs to
> init_net. Other network namespaces cannot use that socket.
>
> To address this issue, this series introduces net namespace support
> for rxe and moves socket management to be per network namespace.
>
> The series first introduces per-net namespace management for the IPv4
> and IPv6 sockets used by rxe. The sockets are created when the network
> namespace becomes active and are released when the namespace is
> destroyed.
>
> Based on this infrastructure, rxe RDMA links are then created and
> destroyed within each network namespace. This ensures that both the
> UDP sockets and RDMA links are correctly scoped to the namespace in
> which they are used.
>
> With these changes, rxe RDMA links can be created and used both in
> init_net and in other network namespaces, and resources are properly
> cleaned up during namespace teardown.
>
> The series also includes a selftest to verify RXE functionality in
> network namespaces.

The selftest results are as below:

"

# make -C tools/testing/selftests TARGETS=rdma run_tests

make: Entering directory '/root/Development/linux/tools/testing/selftests'

make[1]: Nothing to be done for 'all'.
TAP version 13
1..3
# timeout set to 45
# selftests: rdma: rping_between_netns.sh
# server DISCONNECT EVENT...
# wait for RDMA_READ_ADV state 10
ok 1 selftests: rdma: rping_between_netns.sh
# timeout set to 45
# selftests: rdma: rxe_ipv6.sh
ok 2 selftests: rdma: rxe_ipv6.sh
# timeout set to 45
# selftests: rdma: socket_with_rxe.sh
ok 3 selftests: rdma: socket_with_rxe.sh
make: Leaving directory '/root/Development/linux/tools/testing/selftests'

"

Best Regards,

Zhu Yanjun

>
> Zhu Yanjun (4):
>    RDMA/rxe: Add testcase for net namespace rxe
>    RDMA/nldev: Add dellink function pointer
>    RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets
>    RDMA/rxe: Support RDMA link creation and destruction per net namespace
>
>   MAINTAINERS                                   |   1 +
>   drivers/infiniband/core/nldev.c               |   6 +
>   drivers/infiniband/sw/rxe/Makefile            |   3 +-
>   drivers/infiniband/sw/rxe/rxe.c               |  41 ++++-
>   drivers/infiniband/sw/rxe/rxe_net.c           | 122 ++++++++++----
>   drivers/infiniband/sw/rxe/rxe_net.h           |   9 +-
>   drivers/infiniband/sw/rxe/rxe_ns.c            | 156 ++++++++++++++++++
>   drivers/infiniband/sw/rxe/rxe_ns.h            |  17 ++
>   include/rdma/rdma_netlink.h                   |   2 +
>   tools/testing/selftests/Makefile              |   1 +
>   tools/testing/selftests/rdma/Makefile         |   5 +
>   tools/testing/selftests/rdma/config           |   3 +
>   .../selftests/rdma/rping_between_netns.sh     |  57 +++++++
>   tools/testing/selftests/rdma/rxe_ipv6.sh      |  47 ++++++
>   .../testing/selftests/rdma/socket_with_rxe.sh |  64 +++++++
>   15 files changed, 493 insertions(+), 41 deletions(-)
>   create mode 100644 drivers/infiniband/sw/rxe/rxe_ns.c
>   create mode 100644 drivers/infiniband/sw/rxe/rxe_ns.h
>   create mode 100644 tools/testing/selftests/rdma/Makefile
>   create mode 100644 tools/testing/selftests/rdma/config
>   create mode 100755 tools/testing/selftests/rdma/rping_between_netns.sh
>   create mode 100755 tools/testing/selftests/rdma/rxe_ipv6.sh
>   create mode 100755 tools/testing/selftests/rdma/socket_with_rxe.sh
>
-- 
Best Regards,
Yanjun.Zhu


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/4] RDMA/rxe: Add testcase for net namespace rxe
  2026-03-06  8:24 ` [PATCH 1/4] RDMA/rxe: Add testcase for net namespace rxe Zhu Yanjun
@ 2026-03-07  1:10   ` David Ahern
  0 siblings, 0 replies; 12+ messages in thread
From: David Ahern @ 2026-03-07  1:10 UTC (permalink / raw)
  To: Zhu Yanjun, jgg, leon, zyjzyj2000, linux-rdma, linux-kselftest

On 3/6/26 1:24 AM, Zhu Yanjun wrote:
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> ---
>  MAINTAINERS                                   |  1 +
>  tools/testing/selftests/Makefile              |  1 +
>  tools/testing/selftests/rdma/Makefile         |  5 ++
>  tools/testing/selftests/rdma/config           |  3 +
>  .../selftests/rdma/rping_between_netns.sh     | 57 +++++++++++++++++
>  tools/testing/selftests/rdma/rxe_ipv6.sh      | 47 ++++++++++++++
>  .../testing/selftests/rdma/socket_with_rxe.sh | 64 +++++++++++++++++++
>  7 files changed, 178 insertions(+)
>  create mode 100644 tools/testing/selftests/rdma/Makefile
>  create mode 100644 tools/testing/selftests/rdma/config
>  create mode 100755 tools/testing/selftests/rdma/rping_between_netns.sh
>  create mode 100755 tools/testing/selftests/rdma/rxe_ipv6.sh
>  create mode 100755 tools/testing/selftests/rdma/socket_with_rxe.sh
> 

Test patch should be last since it relies on the next 3 patches to work.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] RDMA/rxe: Support RDMA link creation and destruction per net namespace
  2026-03-06  8:24 ` [PATCH 4/4] RDMA/rxe: Support RDMA link creation and destruction per net namespace Zhu Yanjun
@ 2026-03-07  1:10   ` David Ahern
  2026-03-07  8:00     ` Zhu Yanjun
  2026-03-07  1:12   ` yanjun.zhu
  1 sibling, 1 reply; 12+ messages in thread
From: David Ahern @ 2026-03-07  1:10 UTC (permalink / raw)
  To: Zhu Yanjun, jgg, leon, zyjzyj2000, linux-rdma, linux-kselftest

On 3/6/26 1:24 AM, Zhu Yanjun wrote:
> @@ -253,15 +270,29 @@ static int __init rxe_module_init(void)
>  	if (err)
>  		return err;
>  
> -	err = rxe_net_init();
> +	rdma_link_register(&rxe_link_ops);
> +
> +	err = rxe_register_notifier();
>  	if (err) {
> -		rxe_destroy_wq();
> -		return err;
> +		pr_err("Failed to register netdev notifier\n");

drop the error message; rxe_register_notifier already logs it.

> +		goto err_wq;

		goto err_notifier;

err_wq is misleading since the wq init did not fail and neither did the
link register.


> +	}
> +
> +	err = rxe_namespace_init();
> +	if (err) {
> +		pr_err("Failed to register net namespace notifier\n");
> +		goto err_notifier;;

		goto err_namespace_init;

note that you have 2 ';' in your goto statement.

>  	}
>  
> -	rdma_link_register(&rxe_link_ops);

why move this register up? doing here after all initializations are
complete seems more appropriate to me.

>  	pr_info("loaded\n");
>  	return 0;
> +
> +err_notifier:
> +	rxe_net_exit(); /* unregister notifier */
> +err_wq:
> +	rdma_link_unregister(&rxe_link_ops);
> +	rxe_destroy_wq();
> +	return err;
>  }
>  
>  static void __exit rxe_module_exit(void)
> @@ -271,6 +302,8 @@ static void __exit rxe_module_exit(void)
>  	rxe_net_exit();
>  	rxe_destroy_wq();
>  
> +	rxe_namespace_exit();
> +
>  	pr_info("unloaded\n");
>  }
>  
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 0bd0902b11f7..ba5bc171a58e 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -17,8 +17,7 @@
>  #include "rxe.h"
>  #include "rxe_net.h"
>  #include "rxe_loc.h"
> -
> -static struct rxe_recv_sockets recv_sockets;
> +#include "rxe_ns.h"
>  
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  /*
> @@ -114,7 +113,7 @@ static struct dst_entry *rxe_find_route4(struct rxe_qp *qp,
>  	memcpy(&fl.daddr, daddr, sizeof(*daddr));
>  	fl.flowi4_proto = IPPROTO_UDP;
>  
> -	rt = ip_route_output_key(&init_net, &fl);
> +	rt = ip_route_output_key(dev_net(ndev), &fl);

past struct net *net into both of the find_route functions. That is what
both of them care about, and then you can have rxe_find_route set the
namespace once.


>  	if (IS_ERR(rt)) {
>  		rxe_dbg_qp(qp, "no route to %pI4\n", &daddr->s_addr);
>  		return NULL;
> @@ -138,8 +137,8 @@ static struct dst_entry *rxe_find_route6(struct rxe_qp *qp,
>  	memcpy(&fl6.daddr, daddr, sizeof(*daddr));
>  	fl6.flowi6_proto = IPPROTO_UDP;
>  
> -	ndst = ipv6_stub->ipv6_dst_lookup_flow(sock_net(recv_sockets.sk6->sk),
> -					       recv_sockets.sk6->sk, &fl6,
> +	ndst = ipv6_stub->ipv6_dst_lookup_flow(dev_net(ndev),
> +					       rxe_ns_pernet_sk6(dev_net(ndev)), &fl6,

and doing my comment above means you have 1 net reference and not 2
dev_net(ndev) changes here.

>  					       NULL);
>  	if (IS_ERR(ndst)) {
>  		rxe_dbg_qp(qp, "no route to %pI6\n", daddr);
> @@ -624,6 +623,43 @@ int rxe_net_add(const char *ibdev_name, struct net_device *ndev)
>  	return 0;
>  }
>  
> +#define SK_REF_FOR_TUNNEL	2
> +
> +static void rxe_sock_put(struct sock *sk,
> +					void (*set_sk)(struct net *, struct sock *),
> +					struct net_device *ndev)
> +{
> +	if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL) {
> +		__sock_put(sk);
> +	} else {
> +		rxe_release_udp_tunnel(sk->sk_socket);
> +		sk = NULL;
> +		set_sk(dev_net(ndev), sk);
> +	}
> +}
> +
> +void rxe_net_del(struct ib_device *dev)
> +{
> +	struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
> +	struct net_device *ndev;
> +	struct sock *sk;
	struct net *net;

> +
> +	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
> +	if (!ndev)
> +		return;

	net = dev_net(ndev);

then use just net in the calls below. This code is not operating under
RTNL, only the rdma nldev semaphore meaning the netdev can change
namespaces on you in between calls.

> +
> +	sk = rxe_ns_pernet_sk4(dev_net(ndev));
> +	if (sk)
> +		rxe_sock_put(sk, rxe_ns_pernet_set_sk4, ndev);
> +
> +	sk = rxe_ns_pernet_sk6(dev_net(ndev));
> +	if (sk)
> +		rxe_sock_put(sk, rxe_ns_pernet_set_sk6, ndev);
> +
> +	dev_put(ndev);
> +}
> +#undef SK_REF_FOR_TUNNEL
> +
>  static void rxe_port_event(struct rxe_dev *rxe,
>  			   enum ib_event_type event)
>  {
> @@ -680,6 +716,7 @@ static int rxe_notify(struct notifier_block *not_blk,
>  	switch (event) {
>  	case NETDEV_UNREGISTER:
>  		ib_unregister_device_queued(&rxe->ib_dev);
> +		rxe_net_del(&rxe->ib_dev);

make sure you have a test case for this -- the netdevice changing
namespaces on the rxe device.

>  		break;
>  	case NETDEV_CHANGEMTU:
>  		rxe_dbg_dev(rxe, "%s changed mtu to %d\n", ndev->name, ndev->mtu);
> @@ -709,66 +746,91 @@ static struct notifier_block rxe_net_notifier = {
>  	.notifier_call = rxe_notify,
>  };
>  
> -static int rxe_net_ipv4_init(void)
> +static int rxe_net_ipv4_init(struct net_device *ndev)

pass in struct net *net

>  {
> -	recv_sockets.sk4 = rxe_setup_udp_tunnel(&init_net,
> -				htons(ROCE_V2_UDP_DPORT), false);
> -	if (IS_ERR(recv_sockets.sk4)) {
> -		recv_sockets.sk4 = NULL;
> +	struct sock *sk;
> +	struct socket *sock;
> +
> +	sk = rxe_ns_pernet_sk4(dev_net(ndev));
> +	if (sk) {
> +		sock_hold(sk);
> +		return 0;
> +	}
> +
> +	sock = rxe_setup_udp_tunnel(dev_net(ndev), htons(ROCE_V2_UDP_DPORT), false);
> +	if (IS_ERR(sock)) {
>  		pr_err("Failed to create IPv4 UDP tunnel\n");
>  		return -1;
>  	}
> +	rxe_ns_pernet_set_sk4(dev_net(ndev), sock->sk);
>  
>  	return 0;
>  }
>  
> -static int rxe_net_ipv6_init(void)
> +static int rxe_net_ipv6_init(struct net_device *ndev)

same here, input argument should be struct net *net
>  {
>  #if IS_ENABLED(CONFIG_IPV6)
> +	struct sock *sk;
> +	struct socket *sock;
>  
> -	recv_sockets.sk6 = rxe_setup_udp_tunnel(&init_net,
> -						htons(ROCE_V2_UDP_DPORT), true);
> -	if (PTR_ERR(recv_sockets.sk6) == -EAFNOSUPPORT) {
> -		recv_sockets.sk6 = NULL;
> +	sk = rxe_ns_pernet_sk6(dev_net(ndev));
> +	if (sk) {
> +		sock_hold(sk);
> +		return 0;
> +	}
> +
> +	sock = rxe_setup_udp_tunnel(dev_net(ndev), htons(ROCE_V2_UDP_DPORT), true);
> +	if (PTR_ERR(sock) == -EAFNOSUPPORT) {
>  		pr_warn("IPv6 is not supported, can not create a UDPv6 socket\n");
>  		return 0;
>  	}
>  
> -	if (IS_ERR(recv_sockets.sk6)) {
> -		recv_sockets.sk6 = NULL;
> +	if (IS_ERR(sock)) {
>  		pr_err("Failed to create IPv6 UDP tunnel\n");
>  		return -1;
>  	}
> +
> +	rxe_ns_pernet_set_sk6(dev_net(ndev), sock->sk);
> +
>  #endif
>  	return 0;
>  }
>  
> +int rxe_register_notifier(void)
> +{
> +	int err;
> +
> +	err = register_netdevice_notifier(&rxe_net_notifier);
> +	if (err) {
> +		pr_err("Failed to register netdev notifier\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
>  void rxe_net_exit(void)
>  {
> -	rxe_release_udp_tunnel(recv_sockets.sk6);
> -	rxe_release_udp_tunnel(recv_sockets.sk4);
>  	unregister_netdevice_notifier(&rxe_net_notifier);
>  }
>  
> -int rxe_net_init(void)
> +int rxe_net_init(struct net_device *ndev)
>  {
>  	int err;

	struct net *net = dev_net(ndev);

>  
> -	recv_sockets.sk6 = NULL;
> -
> -	err = rxe_net_ipv4_init();
> +	err = rxe_net_ipv4_init(ndev);
>  	if (err)
>  		return err;
> -	err = rxe_net_ipv6_init();
> +
> +	err = rxe_net_ipv6_init(ndev);
>  	if (err)
>  		goto err_out;
> -	err = register_netdevice_notifier(&rxe_net_notifier);
> -	if (err) {
> -		pr_err("Failed to register netdev notifier\n");
> -		goto err_out;
> -	}
> +
>  	return 0;
> +
>  err_out:
> -	rxe_net_exit();
> +	/* If ipv6 error, release ipv4 resource */
> +	udp_tunnel_sock_release(rxe_ns_pernet_sk4(dev_net(ndev))->sk_socket);
> +	rxe_ns_pernet_set_sk4(dev_net(ndev), NULL);
>  	return err;
>  }
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h
> index 45d80d00f86b..56249677d692 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.h
> +++ b/drivers/infiniband/sw/rxe/rxe_net.h
> @@ -11,14 +11,11 @@
>  #include <net/if_inet6.h>
>  #include <linux/module.h>
>  
> -struct rxe_recv_sockets {
> -	struct socket *sk4;
> -	struct socket *sk6;
> -};
> -
>  int rxe_net_add(const char *ibdev_name, struct net_device *ndev);
> +void rxe_net_del(struct ib_device *dev);
>  
> -int rxe_net_init(void);
> +int rxe_register_notifier(void);
> +int rxe_net_init(struct net_device *ndev);
>  void rxe_net_exit(void);
>  
>  #endif /* RXE_NET_H */
> diff --git a/drivers/infiniband/sw/rxe/rxe_ns.c b/drivers/infiniband/sw/rxe/rxe_ns.c
> index 29d08899dcda..1ff34167a295 100644
> --- a/drivers/infiniband/sw/rxe/rxe_ns.c
> +++ b/drivers/infiniband/sw/rxe/rxe_ns.c

All of the changes to this file belong in the previous patch.

> @@ -39,7 +39,9 @@ static int __net_init rxe_ns_init(struct net *net)
>  	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
>  
>  	rcu_assign_pointer(ns_sk->rxe_sk4, NULL); /* initialize sock 4 socket */
> +#if IS_ENABLED(CONFIG_IPV6)
>  	rcu_assign_pointer(ns_sk->rxe_sk6, NULL); /* initialize sock 6 socket */
> +#endif /* IPV6 */
>  	synchronize_rcu();
>  
>  	return 0;
> @@ -52,11 +54,15 @@ static void __net_exit rxe_ns_exit(struct net *net)
>  	 */
>  	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
>  	struct sock *rxe_sk4 = NULL;
> +#if IS_ENABLED(CONFIG_IPV6)
>  	struct sock *rxe_sk6 = NULL;
> +#endif
>  
>  	rcu_read_lock();
>  	rxe_sk4 = rcu_dereference(ns_sk->rxe_sk4);
> +#if IS_ENABLED(CONFIG_IPV6)
>  	rxe_sk6 = rcu_dereference(ns_sk->rxe_sk6);
> +#endif
>  	rcu_read_unlock();
>  
>  	/* close socket */
> @@ -66,11 +72,13 @@ static void __net_exit rxe_ns_exit(struct net *net)
>  		synchronize_rcu();
>  	}
>  
> +#if IS_ENABLED(CONFIG_IPV6)
>  	if (rxe_sk6 && rxe_sk6->sk_socket) {
>  		udp_tunnel_sock_release(rxe_sk6->sk_socket);
>  		rcu_assign_pointer(ns_sk->rxe_sk6, NULL);
>  		synchronize_rcu();
>  	}
> +#endif
>  }
>  
>  /*
> @@ -103,6 +111,7 @@ void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk)
>  	synchronize_rcu();
>  }
>  
> +#if IS_ENABLED(CONFIG_IPV6)
>  struct sock *rxe_ns_pernet_sk6(struct net *net)
>  {
>  	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> @@ -123,6 +132,19 @@ void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
>  	synchronize_rcu();
>  }
>  
> +#else /* IPV6 */
> +
> +struct sock *rxe_ns_pernet_sk6(struct net *net)
> +{
> +	return NULL;
> +}
> +
> +void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
> +{
> +}
> +
> +#endif /* IPV6 */
> +
>  int __init rxe_namespace_init(void)
>  {
>  	return register_pernet_subsys(&rxe_net_ops);


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/4] RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets
  2026-03-06  8:24 ` [PATCH 3/4] RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets Zhu Yanjun
@ 2026-03-07  1:10   ` David Ahern
  2026-03-07  8:00     ` Zhu Yanjun
  0 siblings, 1 reply; 12+ messages in thread
From: David Ahern @ 2026-03-07  1:10 UTC (permalink / raw)
  To: Zhu Yanjun, jgg, leon, zyjzyj2000, linux-rdma, linux-kselftest

On 3/6/26 1:24 AM, Zhu Yanjun wrote:
> diff --git a/drivers/infiniband/sw/rxe/rxe_ns.c b/drivers/infiniband/sw/rxe/rxe_ns.c
> new file mode 100644
> index 000000000000..29d08899dcda
> --- /dev/null
> +++ b/drivers/infiniband/sw/rxe/rxe_ns.c
> @@ -0,0 +1,134 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
> + * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
> + */
> +
> +#include <net/sock.h>
> +#include <net/netns/generic.h>
> +#include <net/net_namespace.h>
> +#include <linux/module.h>
> +#include <linux/skbuff.h>
> +#include <linux/pid_namespace.h>
> +#include <net/udp_tunnel.h>
> +
> +#include "rxe_ns.h"
> +
> +/*
> + * Per network namespace data
> + */
> +struct rxe_ns_sock {
> +	struct sock __rcu *rxe_sk4;
> +	struct sock __rcu *rxe_sk6;
> +};
> +
> +/*
> + * Index to store custom data for each network namespace.
> + */
> +static unsigned int rxe_pernet_id;
> +
> +/*
> + * Called for every existing and added network namespaces
> + */
> +static int __net_init rxe_ns_init(struct net *net)
> +{
> +	/*
> +	 * create (if not present) and access data item in network namespace
> +	 * (net) using the id (net_id)
> +	 */

this comment is not needed; does not really convey anything useful. I
would like this function to have the comment from my patch:

	/* defer socket create in the namespace to the first
	 * device create.
	 */

this makes it clear why init and exit are not symmetrical.

> +	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> +
> +	rcu_assign_pointer(ns_sk->rxe_sk4, NULL); /* initialize sock 4 socket */
> +	rcu_assign_pointer(ns_sk->rxe_sk6, NULL); /* initialize sock 6 socket */
> +	synchronize_rcu();

I believe the core network namespace code ensures the memory is
initialized, so this is not needed.

> +
> +	return 0;
> +}
> +
> +static void __net_exit rxe_ns_exit(struct net *net)
> +{
> +	/*
> +	 * called when the network namespace is removed
> +	 */
> +	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> +	struct sock *rxe_sk4 = NULL;
> +	struct sock *rxe_sk6 = NULL;

initialization is not needed since both are set before use.

> +
> +	rcu_read_lock();
> +	rxe_sk4 = rcu_dereference(ns_sk->rxe_sk4);
> +	rxe_sk6 = rcu_dereference(ns_sk->rxe_sk6);
> +	rcu_read_unlock();
> +
> +	/* close socket */
> +	if (rxe_sk4 && rxe_sk4->sk_socket) {

how can rxe_sk4 be non-NULL and yet sk_socket become NULL?

> +		udp_tunnel_sock_release(rxe_sk4->sk_socket);
> +		rcu_assign_pointer(ns_sk->rxe_sk4, NULL);

if you flip the order

		rcu_assign_pointer(ns_sk->rxe_sk4, NULL);
		/* udp_tunnel_sock_release calls synchronize_rcu */
		udp_tunnel_sock_release(rxe_sk4->sk_socket);


you should be able to drop the synchronize_rcu here:

> +		synchronize_rcu();

> +	}
> +
> +	if (rxe_sk6 && rxe_sk6->sk_socket) {

same here.

> +		udp_tunnel_sock_release(rxe_sk6->sk_socket);
> +		rcu_assign_pointer(ns_sk->rxe_sk6, NULL);
> +		synchronize_rcu();> +	}
> +}
> +


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] RDMA/rxe: Support RDMA link creation and destruction per net namespace
  2026-03-06  8:24 ` [PATCH 4/4] RDMA/rxe: Support RDMA link creation and destruction per net namespace Zhu Yanjun
  2026-03-07  1:10   ` David Ahern
@ 2026-03-07  1:12   ` yanjun.zhu
  1 sibling, 0 replies; 12+ messages in thread
From: yanjun.zhu @ 2026-03-07  1:12 UTC (permalink / raw)
  To: jgg, leon, zyjzyj2000, dsahern, linux-rdma, linux-kselftest

On 3/6/26 12:24 AM, Zhu Yanjun wrote:
> After introducing dellink handling and per-net namespace management
> for IPv4 and IPv6 sockets, extend rxe to create and destroy RDMA links
> within each network namespace.
> 
> With this change, RDMA links can be instantiated both in init_net and
> in other network namespaces. The lifecycle of the RDMA link is now tied
> to the corresponding namespace and is properly cleaned up when the
> namespace or link is removed.
> 
> This ensures rxe behaves correctly in multi-namespace environments and
> keeps socket and RDMA link resources consistent across namespace
> creation and teardown.
> 
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> ---
>   drivers/infiniband/sw/rxe/rxe.c     |  41 +++++++++-
>   drivers/infiniband/sw/rxe/rxe_net.c | 122 +++++++++++++++++++++-------
>   drivers/infiniband/sw/rxe/rxe_net.h |   9 +-
>   drivers/infiniband/sw/rxe/rxe_ns.c  |  22 +++++
>   4 files changed, 154 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index e891199cbdef..f74a66948a37 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -8,6 +8,8 @@
>   #include <net/addrconf.h>
>   #include "rxe.h"
>   #include "rxe_loc.h"
> +#include "rxe_net.h"
> +#include "rxe_ns.h"
>   
>   MODULE_AUTHOR("Bob Pearson, Frank Zago, John Groves, Kamal Heib");
>   MODULE_DESCRIPTION("Soft RDMA transport");
> @@ -200,6 +202,8 @@ void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
>   	port->mtu_cap = ib_mtu_enum_to_int(mtu);
>   }
>   
> +static struct rdma_link_ops rxe_link_ops;
> +
>   /* called by ifc layer to create new rxe device.
>    * The caller should allocate memory for rxe by calling ib_alloc_device.
>    */
> @@ -208,6 +212,7 @@ int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name,
>   {
>   	rxe_init(rxe, ndev);
>   	rxe_set_mtu(rxe, mtu);
> +	rxe->ib_dev.link_ops = &rxe_link_ops;
>   
>   	return rxe_register_device(rxe, ibdev_name, ndev);
>   }
> @@ -231,6 +236,10 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
>   		goto err;
>   	}
>   
> +	err = rxe_net_init(ndev);
> +	if (err)
> +		return err;
> +
>   	err = rxe_net_add(ibdev_name, ndev);
>   	if (err) {
>   		rxe_err("failed to add %s\n", ndev->name);
> @@ -240,9 +249,17 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
>   	return err;
>   }
>   
> +static int rxe_dellink(struct ib_device *dev)
> +{
> +	rxe_net_del(dev);
> +
> +	return 0;
> +}
> +
>   static struct rdma_link_ops rxe_link_ops = {
>   	.type = "rxe",
>   	.newlink = rxe_newlink,
> +	.dellink = rxe_dellink,
>   };
>   
>   static int __init rxe_module_init(void)
> @@ -253,15 +270,29 @@ static int __init rxe_module_init(void)
>   	if (err)
>   		return err;
>   
> -	err = rxe_net_init();
> +	rdma_link_register(&rxe_link_ops);
> +
> +	err = rxe_register_notifier();
>   	if (err) {
> -		rxe_destroy_wq();
> -		return err;
> +		pr_err("Failed to register netdev notifier\n");
> +		goto err_wq;
> +	}
> +
> +	err = rxe_namespace_init();
> +	if (err) {
> +		pr_err("Failed to register net namespace notifier\n");
> +		goto err_notifier;;

Claude also helps to find "Double semicolon typo".

>   	}
>   
> -	rdma_link_register(&rxe_link_ops);
>   	pr_info("loaded\n");
>   	return 0;
> +
> +err_notifier:
> +	rxe_net_exit(); /* unregister notifier */
> +err_wq:
> +	rdma_link_unregister(&rxe_link_ops);
> +	rxe_destroy_wq();
> +	return err;
>   }
>   
>   static void __exit rxe_module_exit(void)
> @@ -271,6 +302,8 @@ static void __exit rxe_module_exit(void)
>   	rxe_net_exit();
>   	rxe_destroy_wq();
>   
> +	rxe_namespace_exit();
> +
>   	pr_info("unloaded\n");
>   }
>   
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 0bd0902b11f7..ba5bc171a58e 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -17,8 +17,7 @@
>   #include "rxe.h"
>   #include "rxe_net.h"
>   #include "rxe_loc.h"
> -
> -static struct rxe_recv_sockets recv_sockets;
> +#include "rxe_ns.h"
>   
>   #ifdef CONFIG_DEBUG_LOCK_ALLOC
>   /*
> @@ -114,7 +113,7 @@ static struct dst_entry *rxe_find_route4(struct rxe_qp *qp,
>   	memcpy(&fl.daddr, daddr, sizeof(*daddr));
>   	fl.flowi4_proto = IPPROTO_UDP;
>   
> -	rt = ip_route_output_key(&init_net, &fl);
> +	rt = ip_route_output_key(dev_net(ndev), &fl);
>   	if (IS_ERR(rt)) {
>   		rxe_dbg_qp(qp, "no route to %pI4\n", &daddr->s_addr);
>   		return NULL;
> @@ -138,8 +137,8 @@ static struct dst_entry *rxe_find_route6(struct rxe_qp *qp,
>   	memcpy(&fl6.daddr, daddr, sizeof(*daddr));
>   	fl6.flowi6_proto = IPPROTO_UDP;
>   
> -	ndst = ipv6_stub->ipv6_dst_lookup_flow(sock_net(recv_sockets.sk6->sk),
> -					       recv_sockets.sk6->sk, &fl6,
> +	ndst = ipv6_stub->ipv6_dst_lookup_flow(dev_net(ndev),
> +					       rxe_ns_pernet_sk6(dev_net(ndev)), &fl6,
>   					       NULL);
>   	if (IS_ERR(ndst)) {
>   		rxe_dbg_qp(qp, "no route to %pI6\n", daddr);
> @@ -624,6 +623,43 @@ int rxe_net_add(const char *ibdev_name, struct net_device *ndev)
>   	return 0;
>   }
>   
> +#define SK_REF_FOR_TUNNEL	2
> +
> +static void rxe_sock_put(struct sock *sk,
> +					void (*set_sk)(struct net *, struct sock *),
> +					struct net_device *ndev)
> +{
> +	if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL) {
> +		__sock_put(sk);
> +	} else {
> +		rxe_release_udp_tunnel(sk->sk_socket);
> +		sk = NULL;
> +		set_sk(dev_net(ndev), sk);
> +	}
> +}
> +
> +void rxe_net_del(struct ib_device *dev)
> +{
> +	struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
> +	struct net_device *ndev;
> +	struct sock *sk;
> +
> +	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
> +	if (!ndev)
> +		return;
> +
> +	sk = rxe_ns_pernet_sk4(dev_net(ndev));
> +	if (sk)
> +		rxe_sock_put(sk, rxe_ns_pernet_set_sk4, ndev);
> +
> +	sk = rxe_ns_pernet_sk6(dev_net(ndev));
> +	if (sk)
> +		rxe_sock_put(sk, rxe_ns_pernet_set_sk6, ndev);
> +
> +	dev_put(ndev);
> +}
> +#undef SK_REF_FOR_TUNNEL
> +
>   static void rxe_port_event(struct rxe_dev *rxe,
>   			   enum ib_event_type event)
>   {
> @@ -680,6 +716,7 @@ static int rxe_notify(struct notifier_block *not_blk,
>   	switch (event) {
>   	case NETDEV_UNREGISTER:
>   		ib_unregister_device_queued(&rxe->ib_dev);
> +		rxe_net_del(&rxe->ib_dev);
>   		break;
>   	case NETDEV_CHANGEMTU:
>   		rxe_dbg_dev(rxe, "%s changed mtu to %d\n", ndev->name, ndev->mtu);
> @@ -709,66 +746,91 @@ static struct notifier_block rxe_net_notifier = {
>   	.notifier_call = rxe_notify,
>   };
>   
> -static int rxe_net_ipv4_init(void)
> +static int rxe_net_ipv4_init(struct net_device *ndev)
>   {
> -	recv_sockets.sk4 = rxe_setup_udp_tunnel(&init_net,
> -				htons(ROCE_V2_UDP_DPORT), false);
> -	if (IS_ERR(recv_sockets.sk4)) {
> -		recv_sockets.sk4 = NULL;
> +	struct sock *sk;
> +	struct socket *sock;
> +
> +	sk = rxe_ns_pernet_sk4(dev_net(ndev));
> +	if (sk) {
> +		sock_hold(sk);
> +		return 0;
> +	}
> +
> +	sock = rxe_setup_udp_tunnel(dev_net(ndev), htons(ROCE_V2_UDP_DPORT), false);
> +	if (IS_ERR(sock)) {
>   		pr_err("Failed to create IPv4 UDP tunnel\n");
>   		return -1;
>   	}
> +	rxe_ns_pernet_set_sk4(dev_net(ndev), sock->sk);
>   
>   	return 0;
>   }
>   
> -static int rxe_net_ipv6_init(void)
> +static int rxe_net_ipv6_init(struct net_device *ndev)
>   {
>   #if IS_ENABLED(CONFIG_IPV6)
> +	struct sock *sk;
> +	struct socket *sock;
>   
> -	recv_sockets.sk6 = rxe_setup_udp_tunnel(&init_net,
> -						htons(ROCE_V2_UDP_DPORT), true);
> -	if (PTR_ERR(recv_sockets.sk6) == -EAFNOSUPPORT) {
> -		recv_sockets.sk6 = NULL;
> +	sk = rxe_ns_pernet_sk6(dev_net(ndev));
> +	if (sk) {
> +		sock_hold(sk);
> +		return 0;
> +	}
> +
> +	sock = rxe_setup_udp_tunnel(dev_net(ndev), htons(ROCE_V2_UDP_DPORT), true);
> +	if (PTR_ERR(sock) == -EAFNOSUPPORT) {
>   		pr_warn("IPv6 is not supported, can not create a UDPv6 socket\n");
>   		return 0;
>   	}
>   
> -	if (IS_ERR(recv_sockets.sk6)) {
> -		recv_sockets.sk6 = NULL;
> +	if (IS_ERR(sock)) {
>   		pr_err("Failed to create IPv6 UDP tunnel\n");
>   		return -1;
>   	}
> +
> +	rxe_ns_pernet_set_sk6(dev_net(ndev), sock->sk);
> +
>   #endif
>   	return 0;
>   }
>   
> +int rxe_register_notifier(void)
> +{
> +	int err;
> +
> +	err = register_netdevice_notifier(&rxe_net_notifier);
> +	if (err) {
> +		pr_err("Failed to register netdev notifier\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
>   void rxe_net_exit(void)
>   {
> -	rxe_release_udp_tunnel(recv_sockets.sk6);
> -	rxe_release_udp_tunnel(recv_sockets.sk4);
>   	unregister_netdevice_notifier(&rxe_net_notifier);
>   }
>   
> -int rxe_net_init(void)
> +int rxe_net_init(struct net_device *ndev)
>   {
>   	int err;
>   
> -	recv_sockets.sk6 = NULL;
> -
> -	err = rxe_net_ipv4_init();
> +	err = rxe_net_ipv4_init(ndev);
>   	if (err)
>   		return err;
> -	err = rxe_net_ipv6_init();
> +
> +	err = rxe_net_ipv6_init(ndev);
>   	if (err)
>   		goto err_out;
> -	err = register_netdevice_notifier(&rxe_net_notifier);
> -	if (err) {
> -		pr_err("Failed to register netdev notifier\n");
> -		goto err_out;
> -	}
> +
>   	return 0;
> +
>   err_out:
> -	rxe_net_exit();
> +	/* If ipv6 error, release ipv4 resource */
> +	udp_tunnel_sock_release(rxe_ns_pernet_sk4(dev_net(ndev))->sk_socket);
> +	rxe_ns_pernet_set_sk4(dev_net(ndev), NULL);
>   	return err;

Thanks. Claude helps to find the above problem. I will fix it.
The following should be better.
"
         /* If ipv6 error, release ipv4 resource */
         sk = rxe_ns_pernet_sk4(dev_net(ndev));
           if (sk)
                   rxe_sock_put(sk, rxe_ns_pernet_set_sk4, ndev);
"

Zhu Yanjun

>   }
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h
> index 45d80d00f86b..56249677d692 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.h
> +++ b/drivers/infiniband/sw/rxe/rxe_net.h
> @@ -11,14 +11,11 @@
>   #include <net/if_inet6.h>
>   #include <linux/module.h>
>   
> -struct rxe_recv_sockets {
> -	struct socket *sk4;
> -	struct socket *sk6;
> -};
> -
>   int rxe_net_add(const char *ibdev_name, struct net_device *ndev);
> +void rxe_net_del(struct ib_device *dev);
>   
> -int rxe_net_init(void);
> +int rxe_register_notifier(void);
> +int rxe_net_init(struct net_device *ndev);
>   void rxe_net_exit(void);
>   
>   #endif /* RXE_NET_H */
> diff --git a/drivers/infiniband/sw/rxe/rxe_ns.c b/drivers/infiniband/sw/rxe/rxe_ns.c
> index 29d08899dcda..1ff34167a295 100644
> --- a/drivers/infiniband/sw/rxe/rxe_ns.c
> +++ b/drivers/infiniband/sw/rxe/rxe_ns.c
> @@ -39,7 +39,9 @@ static int __net_init rxe_ns_init(struct net *net)
>   	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
>   
>   	rcu_assign_pointer(ns_sk->rxe_sk4, NULL); /* initialize sock 4 socket */
> +#if IS_ENABLED(CONFIG_IPV6)
>   	rcu_assign_pointer(ns_sk->rxe_sk6, NULL); /* initialize sock 6 socket */
> +#endif /* IPV6 */
>   	synchronize_rcu();
>   
>   	return 0;
> @@ -52,11 +54,15 @@ static void __net_exit rxe_ns_exit(struct net *net)
>   	 */
>   	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
>   	struct sock *rxe_sk4 = NULL;
> +#if IS_ENABLED(CONFIG_IPV6)
>   	struct sock *rxe_sk6 = NULL;
> +#endif
>   
>   	rcu_read_lock();
>   	rxe_sk4 = rcu_dereference(ns_sk->rxe_sk4);
> +#if IS_ENABLED(CONFIG_IPV6)
>   	rxe_sk6 = rcu_dereference(ns_sk->rxe_sk6);
> +#endif
>   	rcu_read_unlock();
>   
>   	/* close socket */
> @@ -66,11 +72,13 @@ static void __net_exit rxe_ns_exit(struct net *net)
>   		synchronize_rcu();
>   	}
>   
> +#if IS_ENABLED(CONFIG_IPV6)
>   	if (rxe_sk6 && rxe_sk6->sk_socket) {
>   		udp_tunnel_sock_release(rxe_sk6->sk_socket);
>   		rcu_assign_pointer(ns_sk->rxe_sk6, NULL);
>   		synchronize_rcu();
>   	}
> +#endif
>   }
>   
>   /*
> @@ -103,6 +111,7 @@ void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk)
>   	synchronize_rcu();
>   }
>   
> +#if IS_ENABLED(CONFIG_IPV6)
>   struct sock *rxe_ns_pernet_sk6(struct net *net)
>   {
>   	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> @@ -123,6 +132,19 @@ void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
>   	synchronize_rcu();
>   }
>   
> +#else /* IPV6 */
> +
> +struct sock *rxe_ns_pernet_sk6(struct net *net)
> +{
> +	return NULL;
> +}
> +
> +void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
> +{
> +}
> +
> +#endif /* IPV6 */
> +
>   int __init rxe_namespace_init(void)
>   {
>   	return register_pernet_subsys(&rxe_net_ops);


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] RDMA/rxe: Support RDMA link creation and destruction per net namespace
  2026-03-07  1:10   ` David Ahern
@ 2026-03-07  8:00     ` Zhu Yanjun
  0 siblings, 0 replies; 12+ messages in thread
From: Zhu Yanjun @ 2026-03-07  8:00 UTC (permalink / raw)
  To: David Ahern, jgg, leon, zyjzyj2000, linux-rdma, linux-kselftest,
	yanjun.zhu@linux.dev


在 2026/3/6 17:10, David Ahern 写道:
> On 3/6/26 1:24 AM, Zhu Yanjun wrote:
>> @@ -253,15 +270,29 @@ static int __init rxe_module_init(void)
>>   	if (err)
>>   		return err;
>>   
>> -	err = rxe_net_init();
>> +	rdma_link_register(&rxe_link_ops);
>> +
>> +	err = rxe_register_notifier();
>>   	if (err) {
>> -		rxe_destroy_wq();
>> -		return err;
>> +		pr_err("Failed to register netdev notifier\n");
> drop the error message; rxe_register_notifier already logs it.
>
>> +		goto err_wq;
> 		goto err_notifier;
>
> err_wq is misleading since the wq init did not fail and neither did the
> link register.
>
>
>> +	}
>> +
>> +	err = rxe_namespace_init();
>> +	if (err) {
>> +		pr_err("Failed to register net namespace notifier\n");
>> +		goto err_notifier;;
> 		goto err_namespace_init;
>
> note that you have 2 ';' in your goto statement.
>
>>   	}
>>   
>> -	rdma_link_register(&rxe_link_ops);
> why move this register up? doing here after all initializations are
> complete seems more appropriate to me.
>
>>   	pr_info("loaded\n");
>>   	return 0;
>> +
>> +err_notifier:
>> +	rxe_net_exit(); /* unregister notifier */
>> +err_wq:
>> +	rdma_link_unregister(&rxe_link_ops);
>> +	rxe_destroy_wq();
>> +	return err;
>>   }
>>   
>>   static void __exit rxe_module_exit(void)
>> @@ -271,6 +302,8 @@ static void __exit rxe_module_exit(void)
>>   	rxe_net_exit();
>>   	rxe_destroy_wq();
>>   
>> +	rxe_namespace_exit();
>> +
>>   	pr_info("unloaded\n");
>>   }
>>   
>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
>> index 0bd0902b11f7..ba5bc171a58e 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>> @@ -17,8 +17,7 @@
>>   #include "rxe.h"
>>   #include "rxe_net.h"
>>   #include "rxe_loc.h"
>> -
>> -static struct rxe_recv_sockets recv_sockets;
>> +#include "rxe_ns.h"
>>   
>>   #ifdef CONFIG_DEBUG_LOCK_ALLOC
>>   /*
>> @@ -114,7 +113,7 @@ static struct dst_entry *rxe_find_route4(struct rxe_qp *qp,
>>   	memcpy(&fl.daddr, daddr, sizeof(*daddr));
>>   	fl.flowi4_proto = IPPROTO_UDP;
>>   
>> -	rt = ip_route_output_key(&init_net, &fl);
>> +	rt = ip_route_output_key(dev_net(ndev), &fl);
> past struct net *net into both of the find_route functions. That is what
> both of them care about, and then you can have rxe_find_route set the
> namespace once.
>
>
>>   	if (IS_ERR(rt)) {
>>   		rxe_dbg_qp(qp, "no route to %pI4\n", &daddr->s_addr);
>>   		return NULL;
>> @@ -138,8 +137,8 @@ static struct dst_entry *rxe_find_route6(struct rxe_qp *qp,
>>   	memcpy(&fl6.daddr, daddr, sizeof(*daddr));
>>   	fl6.flowi6_proto = IPPROTO_UDP;
>>   
>> -	ndst = ipv6_stub->ipv6_dst_lookup_flow(sock_net(recv_sockets.sk6->sk),
>> -					       recv_sockets.sk6->sk, &fl6,
>> +	ndst = ipv6_stub->ipv6_dst_lookup_flow(dev_net(ndev),
>> +					       rxe_ns_pernet_sk6(dev_net(ndev)), &fl6,
> and doing my comment above means you have 1 net reference and not 2
> dev_net(ndev) changes here.
>
>>   					       NULL);
>>   	if (IS_ERR(ndst)) {
>>   		rxe_dbg_qp(qp, "no route to %pI6\n", daddr);
>> @@ -624,6 +623,43 @@ int rxe_net_add(const char *ibdev_name, struct net_device *ndev)
>>   	return 0;
>>   }
>>   
>> +#define SK_REF_FOR_TUNNEL	2
>> +
>> +static void rxe_sock_put(struct sock *sk,
>> +					void (*set_sk)(struct net *, struct sock *),
>> +					struct net_device *ndev)
>> +{
>> +	if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL) {
>> +		__sock_put(sk);
>> +	} else {
>> +		rxe_release_udp_tunnel(sk->sk_socket);
>> +		sk = NULL;
>> +		set_sk(dev_net(ndev), sk);
>> +	}
>> +}
>> +
>> +void rxe_net_del(struct ib_device *dev)
>> +{
>> +	struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
>> +	struct net_device *ndev;
>> +	struct sock *sk;
> 	struct net *net;
>
>> +
>> +	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
>> +	if (!ndev)
>> +		return;
> 	net = dev_net(ndev);
>
> then use just net in the calls below. This code is not operating under
> RTNL, only the rdma nldev semaphore meaning the netdev can change
> namespaces on you in between calls.
>
>> +
>> +	sk = rxe_ns_pernet_sk4(dev_net(ndev));
>> +	if (sk)
>> +		rxe_sock_put(sk, rxe_ns_pernet_set_sk4, ndev);
>> +
>> +	sk = rxe_ns_pernet_sk6(dev_net(ndev));
>> +	if (sk)
>> +		rxe_sock_put(sk, rxe_ns_pernet_set_sk6, ndev);
>> +
>> +	dev_put(ndev);
>> +}
>> +#undef SK_REF_FOR_TUNNEL
>> +
>>   static void rxe_port_event(struct rxe_dev *rxe,
>>   			   enum ib_event_type event)
>>   {
>> @@ -680,6 +716,7 @@ static int rxe_notify(struct notifier_block *not_blk,
>>   	switch (event) {
>>   	case NETDEV_UNREGISTER:
>>   		ib_unregister_device_queued(&rxe->ib_dev);
>> +		rxe_net_del(&rxe->ib_dev);
> make sure you have a test case for this -- the netdevice changing
> namespaces on the rxe device.

All the mentioned problems are fix in the latest commit.

Zhu Yanjun

>
>>   		break;
>>   	case NETDEV_CHANGEMTU:
>>   		rxe_dbg_dev(rxe, "%s changed mtu to %d\n", ndev->name, ndev->mtu);
>> @@ -709,66 +746,91 @@ static struct notifier_block rxe_net_notifier = {
>>   	.notifier_call = rxe_notify,
>>   };
>>   
>> -static int rxe_net_ipv4_init(void)
>> +static int rxe_net_ipv4_init(struct net_device *ndev)
> pass in struct net *net
>
>>   {
>> -	recv_sockets.sk4 = rxe_setup_udp_tunnel(&init_net,
>> -				htons(ROCE_V2_UDP_DPORT), false);
>> -	if (IS_ERR(recv_sockets.sk4)) {
>> -		recv_sockets.sk4 = NULL;
>> +	struct sock *sk;
>> +	struct socket *sock;
>> +
>> +	sk = rxe_ns_pernet_sk4(dev_net(ndev));
>> +	if (sk) {
>> +		sock_hold(sk);
>> +		return 0;
>> +	}
>> +
>> +	sock = rxe_setup_udp_tunnel(dev_net(ndev), htons(ROCE_V2_UDP_DPORT), false);
>> +	if (IS_ERR(sock)) {
>>   		pr_err("Failed to create IPv4 UDP tunnel\n");
>>   		return -1;
>>   	}
>> +	rxe_ns_pernet_set_sk4(dev_net(ndev), sock->sk);
>>   
>>   	return 0;
>>   }
>>   
>> -static int rxe_net_ipv6_init(void)
>> +static int rxe_net_ipv6_init(struct net_device *ndev)
> same here, input argument should be struct net *net
>>   {
>>   #if IS_ENABLED(CONFIG_IPV6)
>> +	struct sock *sk;
>> +	struct socket *sock;
>>   
>> -	recv_sockets.sk6 = rxe_setup_udp_tunnel(&init_net,
>> -						htons(ROCE_V2_UDP_DPORT), true);
>> -	if (PTR_ERR(recv_sockets.sk6) == -EAFNOSUPPORT) {
>> -		recv_sockets.sk6 = NULL;
>> +	sk = rxe_ns_pernet_sk6(dev_net(ndev));
>> +	if (sk) {
>> +		sock_hold(sk);
>> +		return 0;
>> +	}
>> +
>> +	sock = rxe_setup_udp_tunnel(dev_net(ndev), htons(ROCE_V2_UDP_DPORT), true);
>> +	if (PTR_ERR(sock) == -EAFNOSUPPORT) {
>>   		pr_warn("IPv6 is not supported, can not create a UDPv6 socket\n");
>>   		return 0;
>>   	}
>>   
>> -	if (IS_ERR(recv_sockets.sk6)) {
>> -		recv_sockets.sk6 = NULL;
>> +	if (IS_ERR(sock)) {
>>   		pr_err("Failed to create IPv6 UDP tunnel\n");
>>   		return -1;
>>   	}
>> +
>> +	rxe_ns_pernet_set_sk6(dev_net(ndev), sock->sk);
>> +
>>   #endif
>>   	return 0;
>>   }
>>   
>> +int rxe_register_notifier(void)
>> +{
>> +	int err;
>> +
>> +	err = register_netdevice_notifier(&rxe_net_notifier);
>> +	if (err) {
>> +		pr_err("Failed to register netdev notifier\n");
>> +		return -1;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>   void rxe_net_exit(void)
>>   {
>> -	rxe_release_udp_tunnel(recv_sockets.sk6);
>> -	rxe_release_udp_tunnel(recv_sockets.sk4);
>>   	unregister_netdevice_notifier(&rxe_net_notifier);
>>   }
>>   
>> -int rxe_net_init(void)
>> +int rxe_net_init(struct net_device *ndev)
>>   {
>>   	int err;
> 	struct net *net = dev_net(ndev);
>
>>   
>> -	recv_sockets.sk6 = NULL;
>> -
>> -	err = rxe_net_ipv4_init();
>> +	err = rxe_net_ipv4_init(ndev);
>>   	if (err)
>>   		return err;
>> -	err = rxe_net_ipv6_init();
>> +
>> +	err = rxe_net_ipv6_init(ndev);
>>   	if (err)
>>   		goto err_out;
>> -	err = register_netdevice_notifier(&rxe_net_notifier);
>> -	if (err) {
>> -		pr_err("Failed to register netdev notifier\n");
>> -		goto err_out;
>> -	}
>> +
>>   	return 0;
>> +
>>   err_out:
>> -	rxe_net_exit();
>> +	/* If ipv6 error, release ipv4 resource */
>> +	udp_tunnel_sock_release(rxe_ns_pernet_sk4(dev_net(ndev))->sk_socket);
>> +	rxe_ns_pernet_set_sk4(dev_net(ndev), NULL);
>>   	return err;
>>   }
>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h
>> index 45d80d00f86b..56249677d692 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_net.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_net.h
>> @@ -11,14 +11,11 @@
>>   #include <net/if_inet6.h>
>>   #include <linux/module.h>
>>   
>> -struct rxe_recv_sockets {
>> -	struct socket *sk4;
>> -	struct socket *sk6;
>> -};
>> -
>>   int rxe_net_add(const char *ibdev_name, struct net_device *ndev);
>> +void rxe_net_del(struct ib_device *dev);
>>   
>> -int rxe_net_init(void);
>> +int rxe_register_notifier(void);
>> +int rxe_net_init(struct net_device *ndev);
>>   void rxe_net_exit(void);
>>   
>>   #endif /* RXE_NET_H */
>> diff --git a/drivers/infiniband/sw/rxe/rxe_ns.c b/drivers/infiniband/sw/rxe/rxe_ns.c
>> index 29d08899dcda..1ff34167a295 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_ns.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_ns.c
> All of the changes to this file belong in the previous patch.
>
>> @@ -39,7 +39,9 @@ static int __net_init rxe_ns_init(struct net *net)
>>   	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
>>   
>>   	rcu_assign_pointer(ns_sk->rxe_sk4, NULL); /* initialize sock 4 socket */
>> +#if IS_ENABLED(CONFIG_IPV6)
>>   	rcu_assign_pointer(ns_sk->rxe_sk6, NULL); /* initialize sock 6 socket */
>> +#endif /* IPV6 */
>>   	synchronize_rcu();
>>   
>>   	return 0;
>> @@ -52,11 +54,15 @@ static void __net_exit rxe_ns_exit(struct net *net)
>>   	 */
>>   	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
>>   	struct sock *rxe_sk4 = NULL;
>> +#if IS_ENABLED(CONFIG_IPV6)
>>   	struct sock *rxe_sk6 = NULL;
>> +#endif
>>   
>>   	rcu_read_lock();
>>   	rxe_sk4 = rcu_dereference(ns_sk->rxe_sk4);
>> +#if IS_ENABLED(CONFIG_IPV6)
>>   	rxe_sk6 = rcu_dereference(ns_sk->rxe_sk6);
>> +#endif
>>   	rcu_read_unlock();
>>   
>>   	/* close socket */
>> @@ -66,11 +72,13 @@ static void __net_exit rxe_ns_exit(struct net *net)
>>   		synchronize_rcu();
>>   	}
>>   
>> +#if IS_ENABLED(CONFIG_IPV6)
>>   	if (rxe_sk6 && rxe_sk6->sk_socket) {
>>   		udp_tunnel_sock_release(rxe_sk6->sk_socket);
>>   		rcu_assign_pointer(ns_sk->rxe_sk6, NULL);
>>   		synchronize_rcu();
>>   	}
>> +#endif
>>   }
>>   
>>   /*
>> @@ -103,6 +111,7 @@ void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk)
>>   	synchronize_rcu();
>>   }
>>   
>> +#if IS_ENABLED(CONFIG_IPV6)
>>   struct sock *rxe_ns_pernet_sk6(struct net *net)
>>   {
>>   	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
>> @@ -123,6 +132,19 @@ void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
>>   	synchronize_rcu();
>>   }
>>   
>> +#else /* IPV6 */
>> +
>> +struct sock *rxe_ns_pernet_sk6(struct net *net)
>> +{
>> +	return NULL;
>> +}
>> +
>> +void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
>> +{
>> +}
>> +
>> +#endif /* IPV6 */
>> +
>>   int __init rxe_namespace_init(void)
>>   {
>>   	return register_pernet_subsys(&rxe_net_ops);

-- 
Best Regards,
Yanjun.Zhu


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/4] RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets
  2026-03-07  1:10   ` David Ahern
@ 2026-03-07  8:00     ` Zhu Yanjun
  0 siblings, 0 replies; 12+ messages in thread
From: Zhu Yanjun @ 2026-03-07  8:00 UTC (permalink / raw)
  To: David Ahern, jgg, leon, zyjzyj2000, linux-rdma, linux-kselftest,
	yanjun.zhu@linux.dev


在 2026/3/6 17:10, David Ahern 写道:
> On 3/6/26 1:24 AM, Zhu Yanjun wrote:
>> diff --git a/drivers/infiniband/sw/rxe/rxe_ns.c b/drivers/infiniband/sw/rxe/rxe_ns.c
>> new file mode 100644
>> index 000000000000..29d08899dcda
>> --- /dev/null
>> +++ b/drivers/infiniband/sw/rxe/rxe_ns.c
>> @@ -0,0 +1,134 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
>> +/*
>> + * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
>> + * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
>> + */
>> +
>> +#include <net/sock.h>
>> +#include <net/netns/generic.h>
>> +#include <net/net_namespace.h>
>> +#include <linux/module.h>
>> +#include <linux/skbuff.h>
>> +#include <linux/pid_namespace.h>
>> +#include <net/udp_tunnel.h>
>> +
>> +#include "rxe_ns.h"
>> +
>> +/*
>> + * Per network namespace data
>> + */
>> +struct rxe_ns_sock {
>> +	struct sock __rcu *rxe_sk4;
>> +	struct sock __rcu *rxe_sk6;
>> +};
>> +
>> +/*
>> + * Index to store custom data for each network namespace.
>> + */
>> +static unsigned int rxe_pernet_id;
>> +
>> +/*
>> + * Called for every existing and added network namespaces
>> + */
>> +static int __net_init rxe_ns_init(struct net *net)
>> +{
>> +	/*
>> +	 * create (if not present) and access data item in network namespace
>> +	 * (net) using the id (net_id)
>> +	 */
> this comment is not needed; does not really convey anything useful. I
> would like this function to have the comment from my patch:
>
> 	/* defer socket create in the namespace to the first
> 	 * device create.
> 	 */
>
> this makes it clear why init and exit are not symmetrical.
>
>> +	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
>> +
>> +	rcu_assign_pointer(ns_sk->rxe_sk4, NULL); /* initialize sock 4 socket */
>> +	rcu_assign_pointer(ns_sk->rxe_sk6, NULL); /* initialize sock 6 socket */
>> +	synchronize_rcu();
> I believe the core network namespace code ensures the memory is
> initialized, so this is not needed.
>
>> +
>> +	return 0;
>> +}
>> +
>> +static void __net_exit rxe_ns_exit(struct net *net)
>> +{
>> +	/*
>> +	 * called when the network namespace is removed
>> +	 */
>> +	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
>> +	struct sock *rxe_sk4 = NULL;
>> +	struct sock *rxe_sk6 = NULL;
> initialization is not needed since both are set before use.
>
>> +
>> +	rcu_read_lock();
>> +	rxe_sk4 = rcu_dereference(ns_sk->rxe_sk4);
>> +	rxe_sk6 = rcu_dereference(ns_sk->rxe_sk6);
>> +	rcu_read_unlock();
>> +
>> +	/* close socket */
>> +	if (rxe_sk4 && rxe_sk4->sk_socket) {
> how can rxe_sk4 be non-NULL and yet sk_socket become NULL?
>
>> +		udp_tunnel_sock_release(rxe_sk4->sk_socket);
>> +		rcu_assign_pointer(ns_sk->rxe_sk4, NULL);
> if you flip the order
>
> 		rcu_assign_pointer(ns_sk->rxe_sk4, NULL);
> 		/* udp_tunnel_sock_release calls synchronize_rcu */
> 		udp_tunnel_sock_release(rxe_sk4->sk_socket);
>
>
> you should be able to drop the synchronize_rcu here:
>
>> +		synchronize_rcu();
>> +	}
>> +
>> +	if (rxe_sk6 && rxe_sk6->sk_socket) {
> same here.

All the mentioned problems are fix in the latest commit.

Zhu Yanjun

>
>> +		udp_tunnel_sock_release(rxe_sk6->sk_socket);
>> +		rcu_assign_pointer(ns_sk->rxe_sk6, NULL);
>> +		synchronize_rcu();> +	}
>> +}
>> +

-- 
Best Regards,
Yanjun.Zhu


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-03-07  8:00 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-06  8:24 [PATCH 0/4] RDMA/rxe: Add the support that rxe can work in net namespace Zhu Yanjun
2026-03-06  8:24 ` [PATCH 1/4] RDMA/rxe: Add testcase for net namespace rxe Zhu Yanjun
2026-03-07  1:10   ` David Ahern
2026-03-06  8:24 ` [PATCH 2/4] RDMA/nldev: Add dellink function pointer Zhu Yanjun
2026-03-06  8:24 ` [PATCH 3/4] RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets Zhu Yanjun
2026-03-07  1:10   ` David Ahern
2026-03-07  8:00     ` Zhu Yanjun
2026-03-06  8:24 ` [PATCH 4/4] RDMA/rxe: Support RDMA link creation and destruction per net namespace Zhu Yanjun
2026-03-07  1:10   ` David Ahern
2026-03-07  8:00     ` Zhu Yanjun
2026-03-07  1:12   ` yanjun.zhu
2026-03-06  8:27 ` [PATCH 0/4] RDMA/rxe: Add the support that rxe can work in " Zhu Yanjun

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox