From: Kuniyuki Iwashima <kuniyu@google.com>
To: Zhu Yanjun <zyjzyj2000@gmail.com>, Jason Gunthorpe <jgg@ziepe.ca>,
Leon Romanovsky <leon@kernel.org>
Cc: David Ahern <dsahern@kernel.org>,
Kuniyuki Iwashima <kuniyu@google.com>,
Kuniyuki Iwashima <kuni1840@gmail.com>,
linux-rdma@vger.kernel.org,
syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
Subject: [PATCH v2 1/2] RDMA/rxe: Fix null-ptr-deref in kernel_sock_shutdown().
Date: Sat, 25 Apr 2026 06:04:13 +0000
Message-ID: <20260425060436.2316620-2-kuniyu@google.com>
In-Reply-To: <20260425060436.2316620-1-kuniyu@google.com>

syzbot reported a null-ptr-deref in kernel_sock_shutdown(). [0]

The problem is that ->newlink() and ->dellink() can be called
concurrently with no synchronisation, leading to an sk leak,
a double free, etc.

Currently, we defer UDP tunnel allocation to the first device
creation, but doing this safely would require per-netns locking.

Let's allocate the UDP tunnels in the __net_init hook instead.

Now the extra sock_hold() and __sock_put() are no longer needed.

Note that rxe_ns_pernet_sk6() is broken and will be fixed
in the following patch.

[0]:
Oops: general protection fault, probably for non-canonical address 0xdffffc000000000d: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f]
CPU: 3 UID: 0 PID: 12652 Comm: syz.7.1709 Tainted: G L syzkaller #0 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:kernel_sock_shutdown+0x47/0x70 net/socket.c:3785
Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 33 48 b8 00 00 00 00 00 fc ff df 4c 8b 63 20 49 8d 7c 24 68 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 1a 49 8b 44 24 68 89 ee 48 89 df 5b 5d 41 5c e9 46
RSP: 0018:ffffc9000566f180 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff888058587240 RCX: 0000000000000000
RDX: 000000000000000d RSI: ffffffff895ced12 RDI: 0000000000000068
RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed1006d98945
R10: ffff888036cc4a2b R11: 0000003683c25c00 R12: 0000000000000000
R13: ffff88805c998000 R14: 0000000000000002 R15: 0000000000000018
FS: 00007f1306d976c0(0000) GS:ffff8880d65db000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1306d97d58 CR3: 00000000404f1000 CR4: 0000000000352ef0
DR0: ffffffffffffffff DR1: 00000000000001f8 DR2: 0000000000000002
DR3: ffffffffefffff15 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
udp_tunnel_sock_release+0x68/0x80 net/ipv4/udp_tunnel_core.c:202
rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
rxe_sock_put+0xae/0x130 drivers/infiniband/sw/rxe/rxe_net.c:639
rxe_net_del+0x83/0x120 drivers/infiniband/sw/rxe/rxe_net.c:660
rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
nldev_dellink+0x289/0x3c0 drivers/infiniband/core/nldev.c:1849
rdma_nl_rcv_msg+0x392/0x6f0 drivers/infiniband/core/netlink.c:195
rdma_nl_rcv_skb.constprop.0.isra.0+0x2cb/0x410 drivers/infiniband/core/netlink.c:239
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x585/0x850 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x8b0/0xda0 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:787 [inline]
__sock_sendmsg net/socket.c:802 [inline]
____sys_sendmsg+0x9e1/0xb70 net/socket.c:2698
___sys_sendmsg+0x190/0x1e0 net/socket.c:2752
__sys_sendmsg+0x170/0x220 net/socket.c:2784
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x10b/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f1305f9c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f1306d97028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f1306216090 RCX: 00007f1305f9c819
RDX: 0000000000000000 RSI: 00002000000002c0 RDI: 0000000000000003
RBP: 00007f1306032c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f1306216128 R14: 00007f1306216090 R15: 00007ffd8ecad288
</TASK>
Modules linked in:
Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69ea344f.a00a0220.17a17.0040.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
v2: Set up UDP tunnels in __net_init instead of adding mutex.
v1: https://lore.kernel.org/all/20260424013759.728288-1-kuniyu@google.com/
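
For reviewers, here is a minimal userspace sketch of the check-then-act
pattern that the removed rxe_sock_put() relied on, next to the
allocate-once lifetime this patch moves to. It is only a model under
stated assumptions: every model_* name and MODEL_REF_FOR_TUNNEL are
hypothetical, the concurrency is replayed as a fixed interleaving on a
single thread, and "release" is just a counter rather than a real
udp_tunnel_sock_release() call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_sock {
	int refcnt;	/* stands in for sk->sk_refcnt */
	int releases;	/* how many times the tunnel was released */
};

struct model_netns {
	struct model_sock *sk;	/* stands in for the per-netns socket pointer */
};

#define MODEL_REF_FOR_TUNNEL 2	/* assumed threshold, for illustration only */

/* Step 1 of the old put path: decide from the refcount whether to release. */
static int model_put_check(const struct model_sock *sk)
{
	return sk->refcnt <= MODEL_REF_FOR_TUNNEL;
}

/* Step 2 of the old put path: act on the decision taken in step 1. */
static void model_put_act(struct model_netns *ns, struct model_sock *sk,
			  int release)
{
	if (release) {
		sk->releases++;
		ns->sk = NULL;
	} else {
		sk->refcnt--;
	}
}

/*
 * Two deleters both fetch the socket and run step 1 before either runs
 * step 2, which is possible when nothing serialises them.
 * Returns the number of releases performed.
 */
static int model_racy_dellink(void)
{
	struct model_sock sk = { .refcnt = MODEL_REF_FOR_TUNNEL, .releases = 0 };
	struct model_netns ns = { .sk = &sk };
	struct model_sock *a = ns.sk;
	struct model_sock *b = ns.sk;
	int rel_a = model_put_check(a);
	int rel_b = model_put_check(b);

	model_put_act(&ns, a, rel_a);
	model_put_act(&ns, b, rel_b);
	return sk.releases;
}

/*
 * Allocate-once lifetime: the socket is set up when the netns is
 * initialised and released when it exits; link creation and deletion
 * never change its lifetime. Returns the number of releases performed.
 */
static int model_eager_lifetime(void)
{
	struct model_sock sk = { .refcnt = 1, .releases = 0 };
	struct model_netns ns = { .sk = NULL };

	ns.sk = &sk;		/* netns init: set up once */
	assert(ns.sk != NULL);	/* newlink/dellink would only read ns.sk */
	ns.sk->releases++;	/* netns exit: release once */
	ns.sk = NULL;
	return sk.releases;
}
```

In this model, model_racy_dellink() releases the same socket twice,
while model_eager_lifetime() releases it exactly once.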
---
drivers/infiniband/sw/rxe/rxe.c | 6 --
drivers/infiniband/sw/rxe/rxe_net.c | 126 ++--------------------------
drivers/infiniband/sw/rxe/rxe_net.h | 5 +-
drivers/infiniband/sw/rxe/rxe_ns.c | 90 +++++++++-----------
drivers/infiniband/sw/rxe/rxe_ns.h | 1 -
5 files changed, 47 insertions(+), 181 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index b0714f9abe3d..111ba4e57261 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -236,10 +236,6 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
goto err;
}
- err = rxe_net_init(ndev);
- if (err)
- return err;
-
err = rxe_net_add(ibdev_name, ndev);
if (err) {
rxe_err("failed to add %s\n", ndev->name);
@@ -251,8 +247,6 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
static int rxe_dellink(struct ib_device *dev)
{
- rxe_net_del(dev);
-
return 0;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 50a2cb5405e2..9080d4c893a1 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -256,8 +256,8 @@ static int rxe_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
return 0;
}
-static struct socket *rxe_setup_udp_tunnel(struct net *net, __be16 port,
- bool ipv6)
+struct sock *rxe_setup_udp_tunnel(struct net *net, __be16 port,
+ bool ipv6)
{
int err;
struct socket *sock;
@@ -285,13 +285,12 @@ static struct socket *rxe_setup_udp_tunnel(struct net *net, __be16 port,
/* Setup UDP tunnel */
setup_udp_tunnel_sock(net, sock, &tnl_cfg);
- return sock;
+ return sock->sk;
}
-static void rxe_release_udp_tunnel(struct socket *sk)
+void rxe_release_udp_tunnel(struct sock *sk)
{
- if (sk)
- udp_tunnel_sock_release(sk);
+ udp_tunnel_sock_release(sk->sk_socket);
}
static void prepare_udp_hdr(struct sk_buff *skb, __be16 src_port,
@@ -629,43 +628,6 @@ int rxe_net_add(const char *ibdev_name, struct net_device *ndev)
return 0;
}
-static void rxe_sock_put(struct sock *sk,
- void (*set_sk)(struct net *, struct sock *),
- struct net *net)
-{
- if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL) {
- __sock_put(sk);
- } else {
- rxe_release_udp_tunnel(sk->sk_socket);
- sk = NULL;
- set_sk(net, sk);
- }
-}
-
-void rxe_net_del(struct ib_device *dev)
-{
- struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
- struct net_device *ndev;
- struct sock *sk;
- struct net *net;
-
- ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
- if (!ndev)
- return;
-
- net = dev_net(ndev);
-
- sk = rxe_ns_pernet_sk4(net);
- if (sk)
- rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
-
- sk = rxe_ns_pernet_sk6(net);
- if (sk)
- rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
-
- dev_put(ndev);
-}
-
static void rxe_port_event(struct rxe_dev *rxe,
enum ib_event_type event)
{
@@ -722,7 +684,6 @@ static int rxe_notify(struct notifier_block *not_blk,
switch (event) {
case NETDEV_UNREGISTER:
ib_unregister_device_queued(&rxe->ib_dev);
- rxe_net_del(&rxe->ib_dev);
break;
case NETDEV_CHANGEMTU:
rxe_dbg_dev(rxe, "%s changed mtu to %d\n", ndev->name, ndev->mtu);
@@ -752,56 +713,6 @@ static struct notifier_block rxe_net_notifier = {
.notifier_call = rxe_notify,
};
-static int rxe_net_ipv4_init(struct net *net)
-{
- struct sock *sk;
- struct socket *sock;
-
- sk = rxe_ns_pernet_sk4(net);
- if (sk) {
- sock_hold(sk);
- return 0;
- }
-
- sock = rxe_setup_udp_tunnel(net, htons(ROCE_V2_UDP_DPORT), false);
- if (IS_ERR(sock)) {
- pr_err("Failed to create IPv4 UDP tunnel\n");
- return -1;
- }
- rxe_ns_pernet_set_sk4(net, sock->sk);
-
- return 0;
-}
-
-static int rxe_net_ipv6_init(struct net *net)
-{
-#if IS_ENABLED(CONFIG_IPV6)
- struct sock *sk;
- struct socket *sock;
-
- sk = rxe_ns_pernet_sk6(net);
- if (sk) {
- sock_hold(sk);
- return 0;
- }
-
- sock = rxe_setup_udp_tunnel(net, htons(ROCE_V2_UDP_DPORT), true);
- if (PTR_ERR(sock) == -EAFNOSUPPORT) {
- pr_warn("IPv6 is not supported, can not create a UDPv6 socket\n");
- return 0;
- }
-
- if (IS_ERR(sock)) {
- pr_err("Failed to create IPv6 UDP tunnel\n");
- return -1;
- }
-
- rxe_ns_pernet_set_sk6(net, sock->sk);
-
-#endif
- return 0;
-}
-
int rxe_register_notifier(void)
{
int err;
@@ -819,30 +730,3 @@ void rxe_net_exit(void)
{
unregister_netdevice_notifier(&rxe_net_notifier);
}
-
-int rxe_net_init(struct net_device *ndev)
-{
- struct net *net;
- struct sock *sk;
- int err;
-
- net = dev_net(ndev);
-
- err = rxe_net_ipv4_init(net);
- if (err)
- return err;
-
- err = rxe_net_ipv6_init(net);
- if (err)
- goto err_out;
-
- return 0;
-
-err_out:
- /* If ipv6 error, release ipv4 resource */
- sk = rxe_ns_pernet_sk4(net);
- if (sk)
- rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
-
- return err;
-}
diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h
index 56249677d692..592b0e577f32 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.h
+++ b/drivers/infiniband/sw/rxe/rxe_net.h
@@ -11,11 +11,12 @@
#include <net/if_inet6.h>
#include <linux/module.h>
+struct sock *rxe_setup_udp_tunnel(struct net *net, __be16 port, bool ipv6);
+void rxe_release_udp_tunnel(struct sock *sk);
+
int rxe_net_add(const char *ibdev_name, struct net_device *ndev);
-void rxe_net_del(struct ib_device *dev);
int rxe_register_notifier(void);
-int rxe_net_init(struct net_device *ndev);
void rxe_net_exit(void);
#endif /* RXE_NET_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_ns.c b/drivers/infiniband/sw/rxe/rxe_ns.c
index 8b9d734229b2..06eb2e2387a1 100644
--- a/drivers/infiniband/sw/rxe/rxe_ns.c
+++ b/drivers/infiniband/sw/rxe/rxe_ns.c
@@ -7,8 +7,10 @@
#include <linux/skbuff.h>
#include <linux/pid_namespace.h>
#include <net/udp_tunnel.h>
+#include <rdma/ib_verbs.h>
#include "rxe_ns.h"
+#include "rxe_net.h"
/*
* Per network namespace data
@@ -23,40 +25,54 @@ struct rxe_ns_sock {
*/
static unsigned int rxe_pernet_id;
-/*
- * Called for every existing and added network namespaces
- */
-static int rxe_ns_init(struct net *net)
+static __net_init int rxe_ns_init(struct net *net)
{
- /* defer socket create in the namespace to the first
- * device create.
- */
+ struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+ struct sock *sk;
+ int err = 0;
+
+ sk = rxe_setup_udp_tunnel(net, htons(ROCE_V2_UDP_DPORT), false);
+ if (IS_ERR(sk)) {
+ err = PTR_ERR(sk);
+ goto out;
+ }
+
+ RCU_INIT_POINTER(ns_sk->rxe_sk4, sk);
+
+#if IS_ENABLED(CONFIG_IPV6)
+ sk = rxe_setup_udp_tunnel(net, htons(ROCE_V2_UDP_DPORT), true);
+ if (IS_ERR(sk)) {
+ err = PTR_ERR(sk);
+ if (err == -EAFNOSUPPORT) {
+ err = 0;
+ goto out;
+ }
+
+ sk = rcu_dereference_protected(ns_sk->rxe_sk4, 1);
+ rxe_release_udp_tunnel(sk);
+ goto out;
+ }
- return 0;
+ RCU_INIT_POINTER(ns_sk->rxe_sk6, sk);
+#endif
+out:
+ return err;
}
-static void rxe_ns_exit(struct net *net)
+static __net_exit void rxe_ns_exit(struct net *net)
{
- /* called when the network namespace is removed
- */
struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
struct sock *sk;
- rcu_read_lock();
- sk = rcu_dereference(ns_sk->rxe_sk4);
- rcu_read_unlock();
- if (sk) {
- rcu_assign_pointer(ns_sk->rxe_sk4, NULL);
- udp_tunnel_sock_release(sk->sk_socket);
- }
+ sk = rcu_dereference_protected(ns_sk->rxe_sk4, 1);
+ RCU_INIT_POINTER(ns_sk->rxe_sk4, NULL);
+ rxe_release_udp_tunnel(sk);
#if IS_ENABLED(CONFIG_IPV6)
- rcu_read_lock();
- sk = rcu_dereference(ns_sk->rxe_sk6);
- rcu_read_unlock();
+ sk = rcu_dereference_protected(ns_sk->rxe_sk6, 1);
if (sk) {
- rcu_assign_pointer(ns_sk->rxe_sk6, NULL);
- udp_tunnel_sock_release(sk->sk_socket);
+ RCU_INIT_POINTER(ns_sk->rxe_sk6, NULL);
+ rxe_release_udp_tunnel(sk);
}
#endif
}
@@ -71,26 +87,6 @@ static struct pernet_operations rxe_net_ops = {
.size = sizeof(struct rxe_ns_sock),
};
-struct sock *rxe_ns_pernet_sk4(struct net *net)
-{
- struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
- struct sock *sk;
-
- rcu_read_lock();
- sk = rcu_dereference(ns_sk->rxe_sk4);
- rcu_read_unlock();
-
- return sk;
-}
-
-void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk)
-{
- struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
-
- rcu_assign_pointer(ns_sk->rxe_sk4, sk);
- synchronize_rcu();
-}
-
#if IS_ENABLED(CONFIG_IPV6)
struct sock *rxe_ns_pernet_sk6(struct net *net)
{
@@ -103,14 +99,6 @@ struct sock *rxe_ns_pernet_sk6(struct net *net)
return sk;
}
-
-void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
-{
- struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
-
- rcu_assign_pointer(ns_sk->rxe_sk6, sk);
- synchronize_rcu();
-}
#endif /* IPV6 */
int rxe_namespace_init(void)
diff --git a/drivers/infiniband/sw/rxe/rxe_ns.h b/drivers/infiniband/sw/rxe/rxe_ns.h
index 4da2709e6b71..7f48d624fa05 100644
--- a/drivers/infiniband/sw/rxe/rxe_ns.h
+++ b/drivers/infiniband/sw/rxe/rxe_ns.h
@@ -3,7 +3,6 @@
#ifndef RXE_NS_H
#define RXE_NS_H
-struct sock *rxe_ns_pernet_sk4(struct net *net);
void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk);
#if IS_ENABLED(CONFIG_IPV6)
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
Thread overview: 22+ messages
2026-04-25 6:04 [PATCH v2 0/2] RDMA/rxe: Fix per-netns UDP tunnel issues Kuniyuki Iwashima
2026-04-25 6:04 ` Kuniyuki Iwashima [this message]
2026-04-25 15:47 ` [PATCH v2 1/2] RDMA/rxe: Fix null-ptr-deref in kernel_sock_shutdown() David Ahern
2026-04-25 20:55 ` Kuniyuki Iwashima
2026-04-26 16:40 ` David Ahern
2026-04-25 21:25 ` Zhu Yanjun
2026-04-26 16:42 ` David Ahern
2026-04-27 2:57 ` Zhu Yanjun
2026-04-27 3:10 ` Kuniyuki Iwashima
2026-04-27 3:53 ` Zhu Yanjun
2026-04-27 14:38 ` David Ahern
2026-04-27 20:20 ` yanjun.zhu
2026-04-28 0:52 ` Kuniyuki Iwashima
2026-04-28 0:58 ` David Ahern
2026-04-28 2:15 ` Zhu Yanjun
2026-04-28 5:12 ` Zhu Yanjun
2026-04-28 5:22 ` Kuniyuki Iwashima
2026-04-28 6:30 ` Zhu Yanjun
2026-04-28 6:39 ` Kuniyuki Iwashima
2026-04-28 16:56 ` yanjun.zhu
2026-04-25 6:04 ` [PATCH v2 2/2] RDMA/rxe: Fix up RCU usage for rxe_ns_pernet_sk6() Kuniyuki Iwashima
2026-04-25 21:26 ` Zhu Yanjun