Re: [PATCHv2 1/1] RDMA/rxe: Add the support that rxe can work in net namespace

public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed

From: Zhu Yanjun <yanjun.zhu@linux.dev>
To: jgg@ziepe.ca, leon@kernel.org, zyjzyj2000@gmail.com,
	dsahern@kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [PATCHv2 1/1] RDMA/rxe: Add the support that rxe can work in net namespace
Date: Tue, 3 Mar 2026 20:44:28 -0800	[thread overview]
Message-ID: <28be84a9-ad21-4a29-8199-a155e63e4cd8@linux.dev> (raw)
In-Reply-To: <20260304041607.11685-1-yanjun.zhu@linux.dev>

在 2026/3/3 20:16, Zhu Yanjun 写道:
> When run "ip link add" command to add a rxe rdma link in a net
> namespace, normally this rxe rdma link can not work in a net
> name space.
> 
> The root cause is that a sock listening on udp port 4791 is created
> in init_net when the rdma_rxe module is loaded into kernel. That is,
> the sock listening on udp port 4791 is created in init_net. Other net
> namespace is difficult to use this sock.
> 
> The following commits will solve this problem.
> 
> In the first commit, move the creating sock listening on udp port 4791
> from module_init function to rdma link creating functions. That is,
> after the module rdma_rxe is loaded, the sock will not be created.
> When run "rdma link add ..." command, the sock will be created. So
> when creating a rdma link in the net namespace, the sock will be
> created in this net namespace.
> 
> In the second commit, the functions udp4_lib_lookup and udp6_lib_lookup
> will check the sock exists in the net namespace or not. If yes, rdma
> link will increase the reference count of this sock, then continue other
> jobs instead of creating a new sock to listen on udp port 4791. Since the
> network notifier is global, when the module rdma_rxe is loaded, this
> notifier will be registered.
> 
> After the rdma link is created, the command "rdma link del" is to
> delete rdma link at the same time the sock is checked. If the reference
> count of this sock is greater than the sock reference count needed by
> udp tunnel, the sock reference count is decreased by one. If equal, it
> indicates that this rdma link is the last one. As such, the udp tunnel
> is shut down and the sock is closed. The above work should be
> implemented in linkdel function. But currently no dellink function in
> rxe. So the 3rd commit addes dellink function pointer. And the 4th
> commit implements the dellink function in rxe.
> 
> To now, it is not necessary to keep a global variable to store the sock
> listening udp port 4791. This global variable can be replaced by the
> functions udp4_lib_lookup and udp6_lib_lookup totally. Because the
> function udp6_lib_lookup is in the fast path, a member variable l_sk6
> is added to store the sock. If l_sk6 is NULL, udp6_lib_lookup is called
> to lookup the sock, then the sock is stored in l_sk6, in the future,it
> can be used directly.
> 
> All the above work has been done in init_net. And it can also work in
> the net namespace. So the init_net is replaced by the individual net
> namespace. This is what the 6th commit does. Because rxe device is
> dependent on the net device and the sock listening on udp port 4791,
> every rxe device is in exclusive mode in the individual net namespace.
> Other rdma netns operations will be considerred in the future.
> 
> In the 7th commit, the register_pernet_subsys/unregister_pernet_subsys
> functions are added. When a new net namespace is created, the init
> function will initialize the sk4 and sk6 socks. Then the 2 socks will
> be released when the net namespace is destroyed. The functions
> rxe_ns_pernet_sk4/rxe_ns_pernet_set_sk4 will get and set sk4 in the net
> namespace. The functions rxe_ns_pernet_sk6/rxe_ns_pernet_set_sk6 will
> handle sk6. Then sk4 and sk6 are used in the previous commits.
> 
> As the sk4 and sk6 in pernet namespace can be accessed, it is not
> necessary to add a new l_sk6. As such, in the 8th commit, the l_sk6 is
> replaced with the sk6 in pernet namespace.
> 
> Test steps:
> 1) Suppose that 2 NICs are in 2 different net namespaces.
> 
>    # ip netns exec net0 ip link
>    3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
>       link/ether 00:1e:67:a0:22:3f brd ff:ff:ff:ff:ff:ff
>       altname enp5s0
> 
>    # ip netns exec net1 ip link
>    4: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
>       link/ether f8:e4:3b:3b:e4:10 brd ff:ff:ff:ff:ff:ff
> 
> 2) Add rdma link in the different net namespace
>      net0:
>      # ip netns exec net0 rdma link add rxe0 type rxe netdev eno2
> 
>      net1:
>      # ip netns exec net1 rdma link add rxe1 type rxe netdev eno3
> 
> 3) Run rping test.
>      net0
>      # ip netns exec net0 rping -s -a 192.168.2.1 -C 1&
>      [1] 1737
>      # ip netns exec net1 rping -c -a 192.168.2.1 -d -v -C 1
>      verbose
>      count 1
>      ...
>      ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
>      ...
> 
> 4) Remove the rdma links from the net namespaces.
>      net0:
>      # ip netns exec net0 ss -lu
>      State     Recv-Q    Send-Q    Local Address:Port    Peer Address:Port    Process
>      UNCONN    0         0         0.0.0.0:4791          0.0.0.0:*
>      UNCONN    0         0         [::]:4791             [::]:*
> 
>      # ip netns exec net0 rdma link del rxe0
> 
>      # ip netns exec net0 ss -lu
>      State     Recv-Q    Send-Q    Local Address:Port    Peer Address:Port    Process
> 
>      net1:
>      # ip netns exec net0 ss -lu
>      State     Recv-Q    Send-Q    Local Address:Port    Peer Address:Port    Process
>      UNCONN    0         0         0.0.0.0:4791          0.0.0.0:*
>      UNCONN    0         0         [::]:4791             [::]:*
> 
>      # ip netns exec net1 rdma link del rxe1
> 
>      # ip netns exec net0 ss -lu
>      State     Recv-Q    Send-Q    Local Address:Port    Peer Address:Port    Process
> 
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Hi, all

The script in the link 
https://github.com/zhuyj/linux/blob/6.19-net-namespace/net_ns_rxe.sh can 
make tests in linux distributions.

BTW, please disable firewall before making tests.

Zhu Yanjun

> ---
> 1. Use net_generic;
> 2. Add ipv6 check;
> 3. Use a function to handle ipv4/6 socket;
> ---
>   drivers/infiniband/core/nldev.c     |   6 ++
>   drivers/infiniband/sw/rxe/Makefile  |   3 +-
>   drivers/infiniband/sw/rxe/rxe.c     |  32 +++++-
>   drivers/infiniband/sw/rxe/rxe_net.c | 132 +++++++++++++++++------
>   drivers/infiniband/sw/rxe/rxe_net.h |   9 +-
>   drivers/infiniband/sw/rxe/rxe_ns.c  | 156 ++++++++++++++++++++++++++++
>   drivers/infiniband/sw/rxe/rxe_ns.h  |  17 +++
>   include/rdma/rdma_netlink.h         |   2 +
>   8 files changed, 316 insertions(+), 41 deletions(-)
>   create mode 100644 drivers/infiniband/sw/rxe/rxe_ns.c
>   create mode 100644 drivers/infiniband/sw/rxe/rxe_ns.h
> 
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index 2220a2dfab24..48684930660a 100644
> --- a/drivers/infiniband/core/nldev.c
> +++ b/drivers/infiniband/core/nldev.c
> @@ -1824,6 +1824,12 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   		return -EINVAL;
>   	}
>   
> +	if (device->link_ops) {
> +		err = device->link_ops->dellink(device);
> +		if (err)
> +			return err;
> +	}
> +
>   	ib_unregister_device_and_put(device);
>   	return 0;
>   }
> diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile
> index 93134f1d1d0c..3977f4f13258 100644
> --- a/drivers/infiniband/sw/rxe/Makefile
> +++ b/drivers/infiniband/sw/rxe/Makefile
> @@ -22,6 +22,7 @@ rdma_rxe-y := \
>   	rxe_mcast.o \
>   	rxe_task.o \
>   	rxe_net.o \
> -	rxe_hw_counters.o
> +	rxe_hw_counters.o \
> +	rxe_ns.o
>   
>   rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index e891199cbdef..165155f9be6d 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -8,6 +8,8 @@
>   #include <net/addrconf.h>
>   #include "rxe.h"
>   #include "rxe_loc.h"
> +#include "rxe_net.h"
> +#include "rxe_ns.h"
>   
>   MODULE_AUTHOR("Bob Pearson, Frank Zago, John Groves, Kamal Heib");
>   MODULE_DESCRIPTION("Soft RDMA transport");
> @@ -200,6 +202,8 @@ void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
>   	port->mtu_cap = ib_mtu_enum_to_int(mtu);
>   }
>   
> +static struct rdma_link_ops rxe_link_ops;
> +
>   /* called by ifc layer to create new rxe device.
>    * The caller should allocate memory for rxe by calling ib_alloc_device.
>    */
> @@ -208,6 +212,7 @@ int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name,
>   {
>   	rxe_init(rxe, ndev);
>   	rxe_set_mtu(rxe, mtu);
> +	rxe->ib_dev.link_ops = &rxe_link_ops;
>   
>   	return rxe_register_device(rxe, ibdev_name, ndev);
>   }
> @@ -231,6 +236,10 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
>   		goto err;
>   	}
>   
> +	err = rxe_net_init(ndev);
> +	if (err)
> +		return err;
> +
>   	err = rxe_net_add(ibdev_name, ndev);
>   	if (err) {
>   		rxe_err("failed to add %s\n", ndev->name);
> @@ -240,9 +249,17 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
>   	return err;
>   }
>   
> +static int rxe_dellink(struct ib_device *dev)
> +{
> +	rxe_net_del(dev);
> +
> +	return 0;
> +}
> +
>   static struct rdma_link_ops rxe_link_ops = {
>   	.type = "rxe",
>   	.newlink = rxe_newlink,
> +	.dellink = rxe_dellink,
>   };
>   
>   static int __init rxe_module_init(void)
> @@ -253,13 +270,20 @@ static int __init rxe_module_init(void)
>   	if (err)
>   		return err;
>   
> -	err = rxe_net_init();
> +	rdma_link_register(&rxe_link_ops);
> +	err = rxe_register_notifier();
>   	if (err) {
> +		pr_err("Failed to register netdev notifier\n");
>   		rxe_destroy_wq();
> -		return err;
> +		return -1;
> +	}
> +
> +	err = rxe_namespace_init();
> +	if (err) {
> +		pr_err("Failed to register net namespace notifier\n");
> +		return -1;
>   	}
>   
> -	rdma_link_register(&rxe_link_ops);
>   	pr_info("loaded\n");
>   	return 0;
>   }
> @@ -271,6 +295,8 @@ static void __exit rxe_module_exit(void)
>   	rxe_net_exit();
>   	rxe_destroy_wq();
>   
> +	rxe_namespace_exit();
> +
>   	pr_info("unloaded\n");
>   }
>   
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 0bd0902b11f7..bf50a298c9ba 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -17,8 +17,7 @@
>   #include "rxe.h"
>   #include "rxe_net.h"
>   #include "rxe_loc.h"
> -
> -static struct rxe_recv_sockets recv_sockets;
> +#include "rxe_ns.h"
>   
>   #ifdef CONFIG_DEBUG_LOCK_ALLOC
>   /*
> @@ -106,7 +105,7 @@ static struct dst_entry *rxe_find_route4(struct rxe_qp *qp,
>   					 struct in_addr *daddr)
>   {
>   	struct rtable *rt;
> -	struct flowi4 fl = { { 0 } };
> +	struct flowi4 fl = {};
>   
>   	memset(&fl, 0, sizeof(fl));
>   	fl.flowi4_oif = ndev->ifindex;
> @@ -114,7 +113,7 @@ static struct dst_entry *rxe_find_route4(struct rxe_qp *qp,
>   	memcpy(&fl.daddr, daddr, sizeof(*daddr));
>   	fl.flowi4_proto = IPPROTO_UDP;
>   
> -	rt = ip_route_output_key(&init_net, &fl);
> +	rt = ip_route_output_key(dev_net(ndev), &fl);
>   	if (IS_ERR(rt)) {
>   		rxe_dbg_qp(qp, "no route to %pI4\n", &daddr->s_addr);
>   		return NULL;
> @@ -130,7 +129,7 @@ static struct dst_entry *rxe_find_route6(struct rxe_qp *qp,
>   					 struct in6_addr *daddr)
>   {
>   	struct dst_entry *ndst;
> -	struct flowi6 fl6 = { { 0 } };
> +	struct flowi6 fl6 = {};
>   
>   	memset(&fl6, 0, sizeof(fl6));
>   	fl6.flowi6_oif = ndev->ifindex;
> @@ -138,8 +137,8 @@ static struct dst_entry *rxe_find_route6(struct rxe_qp *qp,
>   	memcpy(&fl6.daddr, daddr, sizeof(*daddr));
>   	fl6.flowi6_proto = IPPROTO_UDP;
>   
> -	ndst = ipv6_stub->ipv6_dst_lookup_flow(sock_net(recv_sockets.sk6->sk),
> -					       recv_sockets.sk6->sk, &fl6,
> +	ndst = ipv6_stub->ipv6_dst_lookup_flow(dev_net(ndev),
> +					       rxe_ns_pernet_sk6(dev_net(ndev)), &fl6,
>   					       NULL);
>   	if (IS_ERR(ndst)) {
>   		rxe_dbg_qp(qp, "no route to %pI6\n", daddr);
> @@ -624,6 +623,50 @@ int rxe_net_add(const char *ibdev_name, struct net_device *ndev)
>   	return 0;
>   }
>   
> +#define SK_REF_FOR_TUNNEL	2
> +
> +static void rxe_sock_put(struct sock *sk,
> +					void (*set_sk)(struct net *, struct sock *),
> +					struct net_device *ndev)
> +{
> +	if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL) {
> +		__sock_put(sk);
> +	} else {
> +		rxe_release_udp_tunnel(sk->sk_socket);
> +		sk = NULL;
> +		set_sk(dev_net(ndev), sk);
> +	}
> +}
> +
> +void rxe_net_del(struct ib_device *dev)
> +{
> +	struct sock *sk;
> +	struct rxe_dev *rxe;
> +	struct net_device *ndev;
> +
> +	rxe = container_of(dev, struct rxe_dev, ib_dev);
> +
> +	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
> +	if (!ndev)
> +		return;
> +
> +	sk = rxe_ns_pernet_sk4(dev_net(ndev));
> +	if (!sk)
> +		goto err_out;
> +
> +	rxe_sock_put(sk, rxe_ns_pernet_set_sk4, ndev);
> +
> +	sk = rxe_ns_pernet_sk6(dev_net(ndev));
> +	if (!sk)
> +		goto err_out;
> +
> +	rxe_sock_put(sk, rxe_ns_pernet_set_sk6, ndev);
> +
> +err_out:
> +	dev_put(ndev);
> +}
> +#undef SK_REF_FOR_TUNNEL
> +
>   static void rxe_port_event(struct rxe_dev *rxe,
>   			   enum ib_event_type event)
>   {
> @@ -680,6 +723,7 @@ static int rxe_notify(struct notifier_block *not_blk,
>   	switch (event) {
>   	case NETDEV_UNREGISTER:
>   		ib_unregister_device_queued(&rxe->ib_dev);
> +		rxe_net_del(&rxe->ib_dev);
>   		break;
>   	case NETDEV_CHANGEMTU:
>   		rxe_dbg_dev(rxe, "%s changed mtu to %d\n", ndev->name, ndev->mtu);
> @@ -709,66 +753,92 @@ static struct notifier_block rxe_net_notifier = {
>   	.notifier_call = rxe_notify,
>   };
>   
> -static int rxe_net_ipv4_init(void)
> +static int rxe_net_ipv4_init(struct net_device *ndev)
>   {
> -	recv_sockets.sk4 = rxe_setup_udp_tunnel(&init_net,
> -				htons(ROCE_V2_UDP_DPORT), false);
> -	if (IS_ERR(recv_sockets.sk4)) {
> -		recv_sockets.sk4 = NULL;
> +	struct sock *sk;
> +	struct socket *sock;
> +
> +	sk = rxe_ns_pernet_sk4(dev_net(ndev));
> +	if (sk) {
> +		sock_hold(sk);
> +		return 0;
> +	}
> +
> +	sock = rxe_setup_udp_tunnel(dev_net(ndev), htons(ROCE_V2_UDP_DPORT), false);
> +	if (IS_ERR(sock)) {
>   		pr_err("Failed to create IPv4 UDP tunnel\n");
>   		return -1;
>   	}
> +	rxe_ns_pernet_set_sk4(dev_net(ndev), sock->sk);
>   
>   	return 0;
>   }
>   
> -static int rxe_net_ipv6_init(void)
> +static int rxe_net_ipv6_init(struct net_device *ndev)
>   {
>   #if IS_ENABLED(CONFIG_IPV6)
> +	struct sock *sk;
> +	struct socket *sock;
> +
> +	sk = rxe_ns_pernet_sk6(dev_net(ndev));
> +	if (sk) {
> +		sock_hold(sk);
> +		return 0;
> +	}
>   
> -	recv_sockets.sk6 = rxe_setup_udp_tunnel(&init_net,
> -						htons(ROCE_V2_UDP_DPORT), true);
> -	if (PTR_ERR(recv_sockets.sk6) == -EAFNOSUPPORT) {
> -		recv_sockets.sk6 = NULL;
> +	sock = rxe_setup_udp_tunnel(dev_net(ndev), htons(ROCE_V2_UDP_DPORT), true);
> +	if (PTR_ERR(sock) == -EAFNOSUPPORT) {
>   		pr_warn("IPv6 is not supported, can not create a UDPv6 socket\n");
>   		return 0;
>   	}
>   
> -	if (IS_ERR(recv_sockets.sk6)) {
> -		recv_sockets.sk6 = NULL;
> +	if (IS_ERR(sock)) {
>   		pr_err("Failed to create IPv6 UDP tunnel\n");
>   		return -1;
>   	}
> +
> +	rxe_ns_pernet_set_sk6(dev_net(ndev), sock->sk);
> +
>   #endif
>   	return 0;
>   }
>   
> +int rxe_register_notifier(void)
> +{
> +	int err;
> +
> +	err = register_netdevice_notifier(&rxe_net_notifier);
> +	if (err) {
> +		pr_err("Failed to register netdev notifier\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
>   void rxe_net_exit(void)
>   {
> -	rxe_release_udp_tunnel(recv_sockets.sk6);
> -	rxe_release_udp_tunnel(recv_sockets.sk4);
>   	unregister_netdevice_notifier(&rxe_net_notifier);
>   }
>   
> -int rxe_net_init(void)
> +int rxe_net_init(struct net_device *ndev)
>   {
>   	int err;
>   
> -	recv_sockets.sk6 = NULL;
> -
> -	err = rxe_net_ipv4_init();
> +	err = rxe_net_ipv4_init(ndev);
>   	if (err)
>   		return err;
> -	err = rxe_net_ipv6_init();
> +
> +	err = rxe_net_ipv6_init(ndev);
>   	if (err)
>   		goto err_out;
> -	err = register_netdevice_notifier(&rxe_net_notifier);
> -	if (err) {
> -		pr_err("Failed to register netdev notifier\n");
> -		goto err_out;
> -	}
> +
>   	return 0;
> +
>   err_out:
> +	/* If ipv6 error, release ipv4 resource */
> +	udp_tunnel_sock_release(rxe_ns_pernet_sk4(dev_net(ndev))->sk_socket);
> +	rxe_ns_pernet_set_sk4(dev_net(ndev), NULL);
>   	rxe_net_exit();
>   	return err;
>   }
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h
> index 45d80d00f86b..56249677d692 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.h
> +++ b/drivers/infiniband/sw/rxe/rxe_net.h
> @@ -11,14 +11,11 @@
>   #include <net/if_inet6.h>
>   #include <linux/module.h>
>   
> -struct rxe_recv_sockets {
> -	struct socket *sk4;
> -	struct socket *sk6;
> -};
> -
>   int rxe_net_add(const char *ibdev_name, struct net_device *ndev);
> +void rxe_net_del(struct ib_device *dev);
>   
> -int rxe_net_init(void);
> +int rxe_register_notifier(void);
> +int rxe_net_init(struct net_device *ndev);
>   void rxe_net_exit(void);
>   
>   #endif /* RXE_NET_H */
> diff --git a/drivers/infiniband/sw/rxe/rxe_ns.c b/drivers/infiniband/sw/rxe/rxe_ns.c
> new file mode 100644
> index 000000000000..1ff34167a295
> --- /dev/null
> +++ b/drivers/infiniband/sw/rxe/rxe_ns.c
> @@ -0,0 +1,156 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
> + * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
> + */
> +
> +#include <net/sock.h>
> +#include <net/netns/generic.h>
> +#include <net/net_namespace.h>
> +#include <linux/module.h>
> +#include <linux/skbuff.h>
> +#include <linux/pid_namespace.h>
> +#include <net/udp_tunnel.h>
> +
> +#include "rxe_ns.h"
> +
> +/*
> + * Per network namespace data
> + */
> +struct rxe_ns_sock {
> +	struct sock __rcu *rxe_sk4;
> +	struct sock __rcu *rxe_sk6;
> +};
> +
> +/*
> + * Index to store custom data for each network namespace.
> + */
> +static unsigned int rxe_pernet_id;
> +
> +/*
> + * Called for every existing and added network namespaces
> + */
> +static int __net_init rxe_ns_init(struct net *net)
> +{
> +	/*
> +	 * create (if not present) and access data item in network namespace
> +	 * (net) using the id (net_id)
> +	 */
> +	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> +
> +	rcu_assign_pointer(ns_sk->rxe_sk4, NULL); /* initialize sock 4 socket */
> +#if IS_ENABLED(CONFIG_IPV6)
> +	rcu_assign_pointer(ns_sk->rxe_sk6, NULL); /* initialize sock 6 socket */
> +#endif /* IPV6 */
> +	synchronize_rcu();
> +
> +	return 0;
> +}
> +
> +static void __net_exit rxe_ns_exit(struct net *net)
> +{
> +	/*
> +	 * called when the network namespace is removed
> +	 */
> +	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> +	struct sock *rxe_sk4 = NULL;
> +#if IS_ENABLED(CONFIG_IPV6)
> +	struct sock *rxe_sk6 = NULL;
> +#endif
> +
> +	rcu_read_lock();
> +	rxe_sk4 = rcu_dereference(ns_sk->rxe_sk4);
> +#if IS_ENABLED(CONFIG_IPV6)
> +	rxe_sk6 = rcu_dereference(ns_sk->rxe_sk6);
> +#endif
> +	rcu_read_unlock();
> +
> +	/* close socket */
> +	if (rxe_sk4 && rxe_sk4->sk_socket) {
> +		udp_tunnel_sock_release(rxe_sk4->sk_socket);
> +		rcu_assign_pointer(ns_sk->rxe_sk4, NULL);
> +		synchronize_rcu();
> +	}
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> +	if (rxe_sk6 && rxe_sk6->sk_socket) {
> +		udp_tunnel_sock_release(rxe_sk6->sk_socket);
> +		rcu_assign_pointer(ns_sk->rxe_sk6, NULL);
> +		synchronize_rcu();
> +	}
> +#endif
> +}
> +
> +/*
> + * callback to make the module network namespace aware
> + */
> +static struct pernet_operations rxe_net_ops __net_initdata = {
> +	.init = rxe_ns_init,
> +	.exit = rxe_ns_exit,
> +	.id = &rxe_pernet_id,
> +	.size = sizeof(struct rxe_ns_sock),
> +};
> +
> +struct sock *rxe_ns_pernet_sk4(struct net *net)
> +{
> +	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> +	struct sock *sk;
> +
> +	rcu_read_lock();
> +	sk = rcu_dereference(ns_sk->rxe_sk4);
> +	rcu_read_unlock();
> +
> +	return sk;
> +}
> +
> +void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk)
> +{
> +	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> +
> +	rcu_assign_pointer(ns_sk->rxe_sk4, sk);
> +	synchronize_rcu();
> +}
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> +struct sock *rxe_ns_pernet_sk6(struct net *net)
> +{
> +	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> +	struct sock *sk;
> +
> +	rcu_read_lock();
> +	sk = rcu_dereference(ns_sk->rxe_sk6);
> +	rcu_read_unlock();
> +
> +	return sk;
> +}
> +
> +void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
> +{
> +	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> +
> +	rcu_assign_pointer(ns_sk->rxe_sk6, sk);
> +	synchronize_rcu();
> +}
> +
> +#else /* IPV6 */
> +
> +struct sock *rxe_ns_pernet_sk6(struct net *net)
> +{
> +	return NULL;
> +}
> +
> +void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
> +{
> +}
> +
> +#endif /* IPV6 */
> +
> +int __init rxe_namespace_init(void)
> +{
> +	return register_pernet_subsys(&rxe_net_ops);
> +}
> +
> +void __exit rxe_namespace_exit(void)
> +{
> +	unregister_pernet_subsys(&rxe_net_ops);
> +}
> diff --git a/drivers/infiniband/sw/rxe/rxe_ns.h b/drivers/infiniband/sw/rxe/rxe_ns.h
> new file mode 100644
> index 000000000000..da5bfcea1274
> --- /dev/null
> +++ b/drivers/infiniband/sw/rxe/rxe_ns.h
> @@ -0,0 +1,17 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
> + * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
> + */
> +
> +#ifndef RXE_NS_H
> +#define RXE_NS_H
> +
> +struct sock *rxe_ns_pernet_sk4(struct net *net);
> +struct sock *rxe_ns_pernet_sk6(struct net *net);
> +void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk);
> +void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk);
> +int __init rxe_namespace_init(void);
> +void __exit rxe_namespace_exit(void);
> +
> +#endif /* RXE_NS_H */
> diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h
> index 326deaf56d5d..2fd1358ea57d 100644
> --- a/include/rdma/rdma_netlink.h
> +++ b/include/rdma/rdma_netlink.h
> @@ -5,6 +5,7 @@
>   
>   #include <linux/netlink.h>
>   #include <uapi/rdma/rdma_netlink.h>
> +#include <rdma/ib_verbs.h>
>   
>   struct ib_device;
>   
> @@ -126,6 +127,7 @@ struct rdma_link_ops {
>   	struct list_head list;
>   	const char *type;
>   	int (*newlink)(const char *ibdev_name, struct net_device *ndev);
> +	int (*dellink)(struct ib_device *dev);
>   };
>   
>   void rdma_link_register(struct rdma_link_ops *ops);

next prev parent reply	other threads:[~2026-03-04  4:44 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-04  4:16 [PATCHv2 1/1] RDMA/rxe: Add the support that rxe can work in net namespace Zhu Yanjun
2026-03-04  4:44 ` Zhu Yanjun [this message]
2026-03-04 19:29   ` David Ahern
2026-03-05  3:29     ` Zhu Yanjun
2026-03-05 18:58       ` David Ahern
2026-03-05 23:15         ` Yanjun.Zhu
2026-03-05 23:41           ` David Ahern
2026-03-06 21:29             ` yanjun.zhu
2026-03-05 18:54 ` David Ahern
2026-03-06  2:38   ` Yanjun.Zhu
2026-03-06  2:58 ` kernel test robot
2026-03-06 21:28   ` yanjun.zhu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=28be84a9-ad21-4a29-8199-a155e63e4cd8@linux.dev \
    --to=yanjun.zhu@linux.dev \
    --cc=dsahern@kernel.org \
    --cc=jgg@ziepe.ca \
    --cc=leon@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=zyjzyj2000@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox