* Re: [syzbot] [net?] general protection fault in kernel_sock_shutdown (4)
[not found] <69ea344f.a00a0220.17a17.0040.GAE@google.com>
@ 2026-04-24 18:08 ` Arjan van de Ven
2026-04-25 1:12 ` Arjan van de Ven
` (2 subsequent siblings)
3 siblings, 0 replies; 20+ messages in thread
From: Arjan van de Ven @ 2026-04-24 18:08 UTC (permalink / raw)
To: netdev
Cc: syzbot+d8f76778263ab65c2b21, dsahern, edumazet, akpm,
linux-kernel, syzkaller-bugs, Arjan van de Ven, linux-rdma,
Zhu Yanjun, Jason Gunthorpe, Leon Romanovsky
Unfortunately the AI had a burp and did not write out the proper URL
for analysis data; it should have been
http://oops.fenrus.org/reports/lkml/69ea344f.a00a0220.17a17.0040.GAE_google.com/report.html
and in addition, it made a candidate patch (below)
From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH] RDMA/rxe: fix double-release race on UDP tunnel socket teardown
This patch is based on a BUG as reported at
https://lore.kernel.org/r/69ea344f.a00a0220.17a17.0040.GAE@google.com.
The Soft RoCE (RXE) driver stores per-network-namespace UDP tunnel
sockets for IPv4 and IPv6 encapsulation. Two independent code paths
tear these sockets down: rxe_ns_exit(), called when a network
namespace is destroyed, and rxe_net_del(), called when an RDMA link
is deleted via netlink. Both paths read the per-namespace socket
pointer and call udp_tunnel_sock_release() on it.
A time-of-check/time-of-use (TOCTOU) race exists in rxe_net_del().
It reads the socket pointer via rxe_ns_pernet_sk4(), then passes it
to rxe_sock_put() for release. If rxe_ns_exit() runs concurrently
between the read and the release, it clears the pointer and calls
udp_tunnel_sock_release() first, causing sock_release() to set
sock->ops = NULL. When rxe_net_del() then calls
udp_tunnel_sock_release() on the same socket, kernel_sock_shutdown()
dereferences the now-NULL sock->ops, triggering a KASAN null-ptr-deref
at offset 0x68 (the shutdown function pointer in struct proto_ops).
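
The interleaving described above can be replayed deterministically in
a small userspace model. This is only a sketch: struct res, struct
slot and every function name below are hypothetical stand-ins, not RXE
or kernel code, and the two "paths" are run by hand in a single thread
rather than concurrently.

```c
#include <stddef.h>

/* Hypothetical stand-in for the socket: counts how often it is released. */
struct res {
	int release_count;
};

/* Hypothetical stand-in for the per-namespace pointer slot. */
struct slot {
	struct res *ptr;
};

/* Plain read of the slot; the pointer stays published afterwards. */
static struct res *slot_read(struct slot *s)
{
	return s->ptr;
}

/* Clear the slot, then release whatever pointer the caller holds. */
static void slot_clear_and_release(struct slot *s, struct res *r)
{
	s->ptr = NULL;
	r->release_count++;
}

/*
 * Replay the problematic ordering by hand:
 *   path A reads the pointer (check),
 *   path B runs in the window, reads the same pointer and releases it,
 *   path A then releases its stale copy (use).
 * Returns the number of releases the resource has seen.
 */
static int replay_racy_interleaving(struct slot *s)
{
	struct res *seen_by_a = slot_read(s);
	struct res *seen_by_b = slot_read(s);

	if (seen_by_b)
		slot_clear_and_release(s, seen_by_b);
	if (seen_by_a)
		slot_clear_and_release(s, seen_by_a);

	return seen_by_a ? seen_by_a->release_count : 0;
}
```

In this model a slot that starts out pointing at a fresh struct res
ends up with two releases, because path A never re-checks the slot
after path B has cleared it.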
A minimal alternative would guard against NULL sock->ops inside
udp_tunnel_sock_release() before calling kernel_sock_shutdown(). That
treats the symptom rather than the root cause and leaves the
double-release of socket state intact.
Add rxe_ns_pernet_take_sk4() and rxe_ns_pernet_take_sk6() which use
xchg() to atomically swap the per-namespace socket pointer to NULL
and return the old value. Replace the non-atomic reads in
rxe_net_del() with these take variants, and release the socket
directly via udp_tunnel_sock_release() without going through
rxe_sock_put().
Whichever teardown path executes take first claims ownership of the
socket; the second caller gets NULL and skips the release, closing
the double-release window.
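
The ownership handoff described above can be sketched in userspace
with C11 atomics. This is an analogue under stated assumptions, not
the kernel code: atomic_exchange() stands in for the kernel's xchg(),
and struct slot, slot_take(), slot_teardown() and count_release() are
hypothetical names. It models only who gets to release the pointer; it
says nothing about socket reference counting.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-namespace socket slot. */
struct slot {
	_Atomic(void *) ptr;
};

/*
 * Atomically swap the slot to NULL and return the previous value.
 * Of several concurrent callers, at most one observes the non-NULL
 * pointer; every other caller gets NULL.
 */
static void *slot_take(struct slot *s)
{
	return atomic_exchange(&s->ptr, NULL);
}

/*
 * Teardown helper: release the resource only if this caller won the
 * exchange. Returns 1 if it released, 0 if another path took the
 * pointer first.
 */
static int slot_teardown(struct slot *s, void (*release)(void *))
{
	void *p = slot_take(s);

	if (!p)
		return 0;
	release(p);
	return 1;
}

/* Hypothetical release callback: bumps the int the pointer refers to. */
static void count_release(void *p)
{
	(*(int *)p)++;
}
```

Calling slot_teardown() twice on the same slot releases the resource
once: the first call returns 1, the second sees NULL and returns 0.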
Link: https://lore.kernel.org/r/69ea344f.a00a0220.17a17.0040.GAE@google.com
Oops-Analysis: http://oops.fenrus.org/reports/lkml/69ea344f.a00a0220.17a17.0040.GAE_google.com/report.html
Fixes: 13f2a53c2a71 ("RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets")
Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
Assisted-by: GitHub Copilot patcher:claude linux-kernel-oops-x86.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Zhu Yanjun <zyjzyj2000@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
---
drivers/infiniband/sw/rxe/rxe_net.c | 8 ++++----
drivers/infiniband/sw/rxe/rxe_ns.c | 14 ++++++++++++++
drivers/infiniband/sw/rxe/rxe_ns.h | 7 +++++++
3 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 50a2cb5405e22..4f604636cb7b4 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -655,13 +655,13 @@ void rxe_net_del(struct ib_device *dev)
 
 	net = dev_net(ndev);
 
-	sk = rxe_ns_pernet_sk4(net);
+	sk = rxe_ns_pernet_take_sk4(net);
 	if (sk)
-		rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
+		udp_tunnel_sock_release(sk->sk_socket);
 
-	sk = rxe_ns_pernet_sk6(net);
+	sk = rxe_ns_pernet_take_sk6(net);
 	if (sk)
-		rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
+		udp_tunnel_sock_release(sk->sk_socket);
 
 	dev_put(ndev);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_ns.c b/drivers/infiniband/sw/rxe/rxe_ns.c
index 8b9d734229b24..d9d376e3c670f 100644
--- a/drivers/infiniband/sw/rxe/rxe_ns.c
+++ b/drivers/infiniband/sw/rxe/rxe_ns.c
@@ -91,6 +91,13 @@ void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk)
 	synchronize_rcu();
 }
+struct sock *rxe_ns_pernet_take_sk4(struct net *net)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+
+	return xchg((__force struct sock **)&ns_sk->rxe_sk4, NULL);
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 struct sock *rxe_ns_pernet_sk6(struct net *net)
 {
@@ -111,6 +118,13 @@ void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
 	rcu_assign_pointer(ns_sk->rxe_sk6, sk);
 	synchronize_rcu();
 }
+
+struct sock *rxe_ns_pernet_take_sk6(struct net *net)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+
+	return xchg((__force struct sock **)&ns_sk->rxe_sk6, NULL);
+}
 #endif /* IPV6 */
 int rxe_namespace_init(void)
diff --git a/drivers/infiniband/sw/rxe/rxe_ns.h b/drivers/infiniband/sw/rxe/rxe_ns.h
index 4da2709e6b714..9d9a5106b77c8 100644
--- a/drivers/infiniband/sw/rxe/rxe_ns.h
+++ b/drivers/infiniband/sw/rxe/rxe_ns.h
@@ -5,10 +5,17 @@
struct sock *rxe_ns_pernet_sk4(struct net *net);
void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk);
+struct sock *rxe_ns_pernet_take_sk4(struct net *net);
#if IS_ENABLED(CONFIG_IPV6)
void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk);
struct sock *rxe_ns_pernet_sk6(struct net *net);
+struct sock *rxe_ns_pernet_take_sk6(struct net *net);
#else /* IPv6 */
static inline struct sock *rxe_ns_pernet_sk6(struct net *net)
{
@@ -18,6 +25,10 @@ static inline void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
{
}
+static inline struct sock *rxe_ns_pernet_take_sk6(struct net *net)
+{
+	return NULL;
+}
#endif /* IPv6 */
int rxe_namespace_init(void);
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [syzbot] [net?] general protection fault in kernel_sock_shutdown (4)
[not found] <69ea344f.a00a0220.17a17.0040.GAE@google.com>
2026-04-24 18:08 ` [syzbot] [net?] general protection fault in kernel_sock_shutdown (4) Arjan van de Ven
@ 2026-04-25 1:12 ` Arjan van de Ven
2026-04-25 1:14 ` Kuniyuki Iwashima
2026-05-06 13:48 ` [syzbot] [rdma] " syzbot
2026-05-07 3:52 ` syzbot
3 siblings, 1 reply; 20+ messages in thread
From: Arjan van de Ven @ 2026-04-25 1:12 UTC (permalink / raw)
To: kuniyu
Cc: Arjan van de Ven, linux-rdma, linux-kernel, Zhu Yanjun,
Jason Gunthorpe, Leon Romanovsky
Unfortunately the AI had a burp and did not write out the proper URL
for analysis data; it should have been
http://oops.fenrus.org/reports/lkml/69ea344f.a00a0220.17a17.0040.GAE_google.com/report.html
and in addition, it made a candidate patch (below)
From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH] RDMA/rxe: fix double-release race on UDP tunnel socket teardown
This patch is based on a BUG as reported at
https://lore.kernel.org/r/69ea344f.a00a0220.17a17.0040.GAE@google.com.
The Soft RoCE (RXE) driver stores per-network-namespace UDP tunnel
sockets for IPv4 and IPv6 encapsulation. Two independent code paths
tear these sockets down: rxe_ns_exit(), called when a network
namespace is destroyed, and rxe_net_del(), called when an RDMA link
is deleted via netlink. Both paths read the per-namespace socket
pointer and call udp_tunnel_sock_release() on it.
A time-of-check/time-of-use (TOCTOU) race exists in rxe_net_del().
It reads the socket pointer via rxe_ns_pernet_sk4(), then passes it
to rxe_sock_put() for release. If rxe_ns_exit() runs concurrently
between the read and the release, it clears the pointer and calls
udp_tunnel_sock_release() first, causing sock_release() to set
sock->ops = NULL. When rxe_net_del() then calls
udp_tunnel_sock_release() on the same socket, kernel_sock_shutdown()
dereferences the now-NULL sock->ops, triggering a KASAN null-ptr-deref
at offset 0x68 (the shutdown function pointer in struct proto_ops).
A minimal alternative would guard against NULL sock->ops inside
udp_tunnel_sock_release() before calling kernel_sock_shutdown(). That
treats the symptom rather than the root cause and leaves the
double-release of socket state intact.
Add rxe_ns_pernet_take_sk4() and rxe_ns_pernet_take_sk6() which use
xchg() to atomically swap the per-namespace socket pointer to NULL
and return the old value. Replace the non-atomic reads in
rxe_net_del() with these take variants, and release the socket
directly via udp_tunnel_sock_release() without going through
rxe_sock_put().
Whichever teardown path executes take first claims ownership of the
socket; the second caller gets NULL and skips the release, closing
the double-release window.
Link: https://lore.kernel.org/r/69ea344f.a00a0220.17a17.0040.GAE@google.com
Oops-Analysis: http://oops.fenrus.org/reports/lkml/69ea344f.a00a0220.17a17.0040.GAE_google.com/report.html
Fixes: 13f2a53c2a71 ("RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets")
Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
Assisted-by: GitHub Copilot patcher:claude linux-kernel-oops-x86.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Zhu Yanjun <zyjzyj2000@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
---
drivers/infiniband/sw/rxe/rxe_net.c | 8 ++++----
drivers/infiniband/sw/rxe/rxe_ns.c | 14 ++++++++++++++
drivers/infiniband/sw/rxe/rxe_ns.h | 7 +++++++
3 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 50a2cb5405e22..4f604636cb7b4 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -655,13 +655,13 @@ void rxe_net_del(struct ib_device *dev)
 
 	net = dev_net(ndev);
 
-	sk = rxe_ns_pernet_sk4(net);
+	sk = rxe_ns_pernet_take_sk4(net);
 	if (sk)
-		rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
+		udp_tunnel_sock_release(sk->sk_socket);
 
-	sk = rxe_ns_pernet_sk6(net);
+	sk = rxe_ns_pernet_take_sk6(net);
 	if (sk)
-		rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
+		udp_tunnel_sock_release(sk->sk_socket);
 
 	dev_put(ndev);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_ns.c b/drivers/infiniband/sw/rxe/rxe_ns.c
index 8b9d734229b24..d9d376e3c670f 100644
--- a/drivers/infiniband/sw/rxe/rxe_ns.c
+++ b/drivers/infiniband/sw/rxe/rxe_ns.c
@@ -91,6 +91,13 @@ void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk)
 	synchronize_rcu();
 }
+struct sock *rxe_ns_pernet_take_sk4(struct net *net)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+
+	return xchg((__force struct sock **)&ns_sk->rxe_sk4, NULL);
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 struct sock *rxe_ns_pernet_sk6(struct net *net)
 {
@@ -111,6 +118,13 @@ void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
 	rcu_assign_pointer(ns_sk->rxe_sk6, sk);
 	synchronize_rcu();
 }
+
+struct sock *rxe_ns_pernet_take_sk6(struct net *net)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+
+	return xchg((__force struct sock **)&ns_sk->rxe_sk6, NULL);
+}
 #endif /* IPV6 */
 int rxe_namespace_init(void)
diff --git a/drivers/infiniband/sw/rxe/rxe_ns.h b/drivers/infiniband/sw/rxe/rxe_ns.h
index 4da2709e6b714..9d9a5106b77c8 100644
--- a/drivers/infiniband/sw/rxe/rxe_ns.h
+++ b/drivers/infiniband/sw/rxe/rxe_ns.h
@@ -5,10 +5,17 @@
struct sock *rxe_ns_pernet_sk4(struct net *net);
void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk);
+struct sock *rxe_ns_pernet_take_sk4(struct net *net);
#if IS_ENABLED(CONFIG_IPV6)
void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk);
struct sock *rxe_ns_pernet_sk6(struct net *net);
+struct sock *rxe_ns_pernet_take_sk6(struct net *net);
#else /* IPv6 */
static inline struct sock *rxe_ns_pernet_sk6(struct net *net)
{
@@ -18,6 +25,10 @@ static inline void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
{
}
+static inline struct sock *rxe_ns_pernet_take_sk6(struct net *net)
+{
+	return NULL;
+}
#endif /* IPv6 */
int rxe_namespace_init(void);
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [syzbot] [net?] general protection fault in kernel_sock_shutdown (4)
2026-04-25 1:12 ` Arjan van de Ven
@ 2026-04-25 1:14 ` Kuniyuki Iwashima
0 siblings, 0 replies; 20+ messages in thread
From: Kuniyuki Iwashima @ 2026-04-25 1:14 UTC (permalink / raw)
To: Arjan van de Ven
Cc: linux-rdma, linux-kernel, Zhu Yanjun, Jason Gunthorpe,
Leon Romanovsky
On Fri, Apr 24, 2026 at 6:11 PM Arjan van de Ven <arjan@linux.intel.com> wrote:
>
>
> Unfortunately the AI had a burp and did not write out the proper URL
> for analysis data; it should have been
>
> http://oops.fenrus.org/reports/lkml/69ea344f.a00a0220.17a17.0040.GAE_google.com/report.html
>
> and in addition, it made a candidate patch (below)
>
> From: Arjan van de Ven <arjan@linux.intel.com>
> Subject: [PATCH] RDMA/rxe: fix double-release race on UDP tunnel socket teardown
>
> This patch is based on a BUG as reported at
> https://lore.kernel.org/r/69ea344f.a00a0220.17a17.0040.GAE@google.com.
>
> The Soft RoCE (RXE) driver stores per-network-namespace UDP tunnel
> sockets for IPv4 and IPv6 encapsulation. Two independent code paths
> tear these sockets down: rxe_ns_exit(), called when a network
> namespace is destroyed, and rxe_net_del(), called when an RDMA link
> is deleted via netlink. Both paths read the per-namespace socket
> pointer and call udp_tunnel_sock_release() on it.
>
> A time-of-check/time-of-use (TOCTOU) race exists in rxe_net_del().
> It reads the socket pointer via rxe_ns_pernet_sk4(), then passes it
> to rxe_sock_put() for release. If rxe_ns_exit() runs concurrently
> between the read and the release, it clears the pointer and calls
> udp_tunnel_sock_release() first, causing sock_release() to set
> sock->ops = NULL. When rxe_net_del() then calls
> udp_tunnel_sock_release() on the same socket, kernel_sock_shutdown()
> dereferences the now-NULL sock->ops, triggering a KASAN null-ptr-deref
> at offset 0x68 (the shutdown function pointer in struct proto_ops).
>
> A minimal alternative would guard against NULL sock->ops inside
> udp_tunnel_sock_release() before calling kernel_sock_shutdown(). That
> treats the symptom rather than the root cause and leaves the
> double-release of socket state intact.
>
> Add rxe_ns_pernet_take_sk4() and rxe_ns_pernet_take_sk6() which use
> xchg() to atomically swap the per-namespace socket pointer to NULL
> and return the old value. Replace the non-atomic reads in
> rxe_net_del() with these take variants, and release the socket
> directly via udp_tunnel_sock_release() without going through
> rxe_sock_put().
>
> Whichever teardown path executes take first claims ownership of the
> socket; the second caller gets NULL and skips the release, closing
> the double-release window.
>
> Link: https://lore.kernel.org/r/69ea344f.a00a0220.17a17.0040.GAE@google.com
> Oops-Analysis: http://oops.fenrus.org/reports/lkml/69ea344f.a00a0220.17a17.0040.GAE_google.com/report.html
> Fixes: 13f2a53c2a71 ("RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets")
> Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
> Assisted-by: GitHub Copilot patcher:claude linux-kernel-oops-x86.
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> Cc: linux-rdma@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Zhu Yanjun <zyjzyj2000@gmail.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Leon Romanovsky <leon@kernel.org>
>
> ---
> drivers/infiniband/sw/rxe/rxe_net.c | 8 ++++----
> drivers/infiniband/sw/rxe/rxe_ns.c | 14 ++++++++++++++
> drivers/infiniband/sw/rxe/rxe_ns.h | 7 +++++++
> 3 files changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 50a2cb5405e22..4f604636cb7b4 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -655,13 +655,13 @@ void rxe_net_del(struct ib_device *dev)
>
> net = dev_net(ndev);
>
> - sk = rxe_ns_pernet_sk4(net);
> + sk = rxe_ns_pernet_take_sk4(net);
> if (sk)
> - rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
> + udp_tunnel_sock_release(sk->sk_socket);
This leaks sk->sk_refcnt, no AI slop please.
I'm working on the right fix.
Thanks.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4)
[not found] <69ea344f.a00a0220.17a17.0040.GAE@google.com>
2026-04-24 18:08 ` [syzbot] [net?] general protection fault in kernel_sock_shutdown (4) Arjan van de Ven
2026-04-25 1:12 ` Arjan van de Ven
@ 2026-05-06 13:48 ` syzbot
2026-05-06 14:28 ` Zhu Yanjun
2026-05-07 3:52 ` syzbot
3 siblings, 1 reply; 20+ messages in thread
From: syzbot @ 2026-05-06 13:48 UTC (permalink / raw)
To: akpm, arjan, davem, dsahern, edumazet, horms, jgg, kuba, kuni1840,
kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
syzkaller-bugs, yanjun.zhu, zyjzyj2000
syzbot has found a reproducer for the following issue on:
HEAD commit: 74fe02ce122a Merge tag 'wq-for-7.1-rc2-fixes' of git://git..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16e895ce580000
kernel config: https://syzkaller.appspot.com/x/.config?x=59da38148f3a3d24
dashboard link: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13a613ba580000
Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-74fe02ce.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/c0a591d96864/vmlinux-74fe02ce.xz
kernel image: https://storage.googleapis.com/syzbot-assets/9f94fb623cd1/bzImage-74fe02ce.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
Oops: general protection fault, probably for non-canonical address 0xdffffc000000000d: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f]
CPU: 3 UID: 0 PID: 5986 Comm: syz.3.20 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:kernel_sock_shutdown+0x47/0x70 net/socket.c:3785
Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 33 48 b8 00 00 00 00 00 fc ff df 4c 8b 63 20 49 8d 7c 24 68 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 1a 49 8b 44 24 68 89 ee 48 89 df 5b 5d 41 5c ff e0
RSP: 0018:ffffc9000391f180 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff88802a2a0040 RCX: ffffffff8b8b72bd
RDX: 000000000000000d RSI: ffffffff89553b32 RDI: 0000000000000068
RBP: 0000000000000002 R08: 0000000000000001 R09: fffff52000723dfc
R10: ffffc9000391efe7 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8880311b8000 R14: 0000000000000002 R15: 0000000000000018
FS: 00007f602d1fe6c0(0000) GS:ffff8880d6675000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561c522a6000 CR3: 000000002e99e000 CR4: 0000000000352ef0
Call Trace:
<TASK>
udp_tunnel_sock_release+0x68/0x80 net/ipv4/udp_tunnel_core.c:202
rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
rxe_sock_put+0xae/0x130 drivers/infiniband/sw/rxe/rxe_net.c:639
rxe_net_del+0x83/0x120 drivers/infiniband/sw/rxe/rxe_net.c:660
rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
nldev_dellink+0x289/0x3c0 drivers/infiniband/core/nldev.c:1849
rdma_nl_rcv_msg+0x392/0x6f0 drivers/infiniband/core/netlink.c:195
rdma_nl_rcv_skb.constprop.0.isra.0+0x2cb/0x410 drivers/infiniband/core/netlink.c:239
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x585/0x850 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x8b0/0xda0 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:787 [inline]
__sock_sendmsg net/socket.c:802 [inline]
____sys_sendmsg+0x9e1/0xb70 net/socket.c:2698
___sys_sendmsg+0x190/0x1e0 net/socket.c:2752
__sys_sendmsg+0x170/0x220 net/socket.c:2784
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x10b/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f602db9cdd9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f602d1fe028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f602de16090 RCX: 00007f602db9cdd9
RDX: 0000000000000000 RSI: 00002000000002c0 RDI: 0000000000000007
RBP: 00007f602dc32d69 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f602de16128 R14: 00007f602de16090 R15: 00007ffc1d89c428
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:kernel_sock_shutdown+0x47/0x70 net/socket.c:3785
Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 33 48 b8 00 00 00 00 00 fc ff df 4c 8b 63 20 49 8d 7c 24 68 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 1a 49 8b 44 24 68 89 ee 48 89 df 5b 5d 41 5c ff e0
RSP: 0018:ffffc9000391f180 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff88802a2a0040 RCX: ffffffff8b8b72bd
RDX: 000000000000000d RSI: ffffffff89553b32 RDI: 0000000000000068
RBP: 0000000000000002 R08: 0000000000000001 R09: fffff52000723dfc
R10: ffffc9000391efe7 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8880311b8000 R14: 0000000000000002 R15: 0000000000000018
FS: 00007f602d1fe6c0(0000) GS:ffff8880d6675000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561c522a6000 CR3: 000000002e99e000 CR4: 0000000000352ef0
----------------
Code disassembly (best guess):
0: fc cld
1: ff lcall (bad)
2: df 48 89 fisttps -0x77(%rax)
5: fa cli
6: 48 c1 ea 03 shr $0x3,%rdx
a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1)
e: 75 33 jne 0x43
10: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
17: fc ff df
1a: 4c 8b 63 20 mov 0x20(%rbx),%r12
1e: 49 8d 7c 24 68 lea 0x68(%r12),%rdi
23: 48 89 fa mov %rdi,%rdx
26: 48 c1 ea 03 shr $0x3,%rdx
* 2a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction
2e: 75 1a jne 0x4a
30: 49 8b 44 24 68 mov 0x68(%r12),%rax
35: 89 ee mov %ebp,%esi
37: 48 89 df mov %rbx,%rdi
3a: 5b pop %rbx
3b: 5d pop %rbp
3c: 41 5c pop %r12
3e: ff e0 jmp *%rax
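
The faulting address in this report follows from the KASAN shadow
check visible in the disassembly: the address about to be read (RDI =
0x68) is shifted right by 3 (RDX = 0xd) and added to the shadow base
held in RAX (0xdffffc0000000000). A minimal sketch of that arithmetic
follows; the two constants are copied from the movabs and shr
instructions above, and the function names are hypothetical helpers,
not kernel APIs.

```c
#include <stdint.h>

/* Shadow base as loaded by the movabs in the disassembly (x86-64). */
#define SHADOW_OFFSET      0xdffffc0000000000ULL
/* One shadow byte covers 8 bytes of memory, hence the shr $0x3. */
#define SHADOW_SCALE_SHIFT 3

/* Shadow byte address that the instrumentation checks for addr. */
static uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* First address of the 8-byte range covered by a shadow byte. */
static uint64_t range_start(uint64_t shadow)
{
	return (shadow - SHADOW_OFFSET) << SHADOW_SCALE_SHIFT;
}
```

With these helpers, shadow_addr(0x68) is 0xdffffc000000000d, the
non-canonical address in the Oops line, and range_start() of that
value is 0x68, the start of the range KASAN reports.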
---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4)
2026-05-06 13:48 ` [syzbot] [rdma] " syzbot
@ 2026-05-06 14:28 ` Zhu Yanjun
2026-05-06 15:19 ` Kuniyuki Iwashima
0 siblings, 1 reply; 20+ messages in thread
From: Zhu Yanjun @ 2026-05-06 14:28 UTC (permalink / raw)
To: syzbot, akpm, arjan, davem, dsahern, edumazet, horms, jgg, kuba,
kuni1840, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
syzkaller-bugs, zyjzyj2000
Cc: Kuniyuki Iwashima
On 2026/5/6 6:48, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: 74fe02ce122a Merge tag 'wq-for-7.1-rc2-fixes' of git://git..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16e895ce580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=59da38148f3a3d24
> dashboard link: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13a613ba580000
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-74fe02ce.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/c0a591d96864/vmlinux-74fe02ce.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/9f94fb623cd1/bzImage-74fe02ce.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc000000000d: 0000 [#1] SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f]
Thanks a lot. IIRC, a fix for this problem is already in progress. The link is
https://patchwork.kernel.org/project/linux-rdma/patch/20260424013759.728288-1-kuniyu@google.com/
Hi, Kuniyuki Iwashima
I think you are fixing this problem. I hope that we can see your commit
very soon.
Zhu Yanjun
> CPU: 3 UID: 0 PID: 5986 Comm: syz.3.20 Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> RIP: 0010:kernel_sock_shutdown+0x47/0x70 net/socket.c:3785
> Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 33 48 b8 00 00 00 00 00 fc ff df 4c 8b 63 20 49 8d 7c 24 68 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 1a 49 8b 44 24 68 89 ee 48 89 df 5b 5d 41 5c ff e0
> RSP: 0018:ffffc9000391f180 EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: ffff88802a2a0040 RCX: ffffffff8b8b72bd
> RDX: 000000000000000d RSI: ffffffff89553b32 RDI: 0000000000000068
> RBP: 0000000000000002 R08: 0000000000000001 R09: fffff52000723dfc
> R10: ffffc9000391efe7 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff8880311b8000 R14: 0000000000000002 R15: 0000000000000018
> FS: 00007f602d1fe6c0(0000) GS:ffff8880d6675000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000561c522a6000 CR3: 000000002e99e000 CR4: 0000000000352ef0
> Call Trace:
> <TASK>
> udp_tunnel_sock_release+0x68/0x80 net/ipv4/udp_tunnel_core.c:202
> rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> rxe_sock_put+0xae/0x130 drivers/infiniband/sw/rxe/rxe_net.c:639
> rxe_net_del+0x83/0x120 drivers/infiniband/sw/rxe/rxe_net.c:660
> rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> nldev_dellink+0x289/0x3c0 drivers/infiniband/core/nldev.c:1849
> rdma_nl_rcv_msg+0x392/0x6f0 drivers/infiniband/core/netlink.c:195
> rdma_nl_rcv_skb.constprop.0.isra.0+0x2cb/0x410 drivers/infiniband/core/netlink.c:239
> netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
> netlink_unicast+0x585/0x850 net/netlink/af_netlink.c:1344
> netlink_sendmsg+0x8b0/0xda0 net/netlink/af_netlink.c:1894
> sock_sendmsg_nosec net/socket.c:787 [inline]
> __sock_sendmsg net/socket.c:802 [inline]
> ____sys_sendmsg+0x9e1/0xb70 net/socket.c:2698
> ___sys_sendmsg+0x190/0x1e0 net/socket.c:2752
> __sys_sendmsg+0x170/0x220 net/socket.c:2784
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0x10b/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f602db9cdd9
> Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f602d1fe028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 00007f602de16090 RCX: 00007f602db9cdd9
> RDX: 0000000000000000 RSI: 00002000000002c0 RDI: 0000000000000007
> RBP: 00007f602dc32d69 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007f602de16128 R14: 00007f602de16090 R15: 00007ffc1d89c428
> </TASK>
> Modules linked in:
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:kernel_sock_shutdown+0x47/0x70 net/socket.c:3785
> Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 33 48 b8 00 00 00 00 00 fc ff df 4c 8b 63 20 49 8d 7c 24 68 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 1a 49 8b 44 24 68 89 ee 48 89 df 5b 5d 41 5c ff e0
> RSP: 0018:ffffc9000391f180 EFLAGS: 00010202
>
> RAX: dffffc0000000000 RBX: ffff88802a2a0040 RCX: ffffffff8b8b72bd
> RDX: 000000000000000d RSI: ffffffff89553b32 RDI: 0000000000000068
> RBP: 0000000000000002 R08: 0000000000000001 R09: fffff52000723dfc
> R10: ffffc9000391efe7 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff8880311b8000 R14: 0000000000000002 R15: 0000000000000018
> FS: 00007f602d1fe6c0(0000) GS:ffff8880d6675000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000561c522a6000 CR3: 000000002e99e000 CR4: 0000000000352ef0
> ----------------
> Code disassembly (best guess):
> 0: fc cld
> 1: ff lcall (bad)
> 2: df 48 89 fisttps -0x77(%rax)
> 5: fa cli
> 6: 48 c1 ea 03 shr $0x3,%rdx
> a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1)
> e: 75 33 jne 0x43
> 10: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
> 17: fc ff df
> 1a: 4c 8b 63 20 mov 0x20(%rbx),%r12
> 1e: 49 8d 7c 24 68 lea 0x68(%r12),%rdi
> 23: 48 89 fa mov %rdi,%rdx
> 26: 48 c1 ea 03 shr $0x3,%rdx
> * 2a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction
> 2e: 75 1a jne 0x4a
> 30: 49 8b 44 24 68 mov 0x68(%r12),%rax
> 35: 89 ee mov %ebp,%esi
> 37: 48 89 df mov %rbx,%rdi
> 3a: 5b pop %rbx
> 3b: 5d pop %rbp
> 3c: 41 5c pop %r12
> 3e: ff e0 jmp *%rax
>
>
> ---
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.
--
Best Regards,
Yanjun.Zhu
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4)
2026-05-06 14:28 ` Zhu Yanjun
@ 2026-05-06 15:19 ` Kuniyuki Iwashima
0 siblings, 0 replies; 20+ messages in thread
From: Kuniyuki Iwashima @ 2026-05-06 15:19 UTC (permalink / raw)
To: Zhu Yanjun
Cc: syzbot, akpm, arjan, davem, dsahern, edumazet, horms, jgg, kuba,
kuni1840, leon, linux-kernel, linux-rdma, netdev, pabeni,
syzkaller-bugs, zyjzyj2000
On Wed, May 6, 2026 at 7:28 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
>
>
> On 2026/5/6 6:48, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
> >
> > HEAD commit: 74fe02ce122a Merge tag 'wq-for-7.1-rc2-fixes' of git://git..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16e895ce580000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=59da38148f3a3d24
> > dashboard link: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> > compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13a613ba580000
> >
> > Downloadable assets:
> > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-74fe02ce.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/c0a591d96864/vmlinux-74fe02ce.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/9f94fb623cd1/bzImage-74fe02ce.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> >
> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000000d: 0000 [#1] SMP KASAN NOPTI
> > KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f]
>
> Thanks a lot. IIRC, a fix for this problem is already in progress. The link is
> https://patchwork.kernel.org/project/linux-rdma/patch/20260424013759.728288-1-kuniyu@google.com/
>
> Hi, Kuniyuki Iwashima
>
> I think you are fixing this problem. I hope that we can see your commit
> very soon.
Yes, I was sidetracked but will respin v3 this week.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4)
[not found] <69ea344f.a00a0220.17a17.0040.GAE@google.com>
` (2 preceding siblings ...)
2026-05-06 13:48 ` [syzbot] [rdma] " syzbot
@ 2026-05-07 3:52 ` syzbot
2026-05-07 12:50 ` [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink() Edward Adam Davis
2026-05-14 5:15 ` [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4) Zhu Yanjun
3 siblings, 2 replies; 20+ messages in thread
From: syzbot @ 2026-05-07 3:52 UTC (permalink / raw)
To: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
kuni1840, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
syzkaller-bugs, yanjun.zhu, zyjzyj2000
syzbot has found a reproducer for the following issue on:
HEAD commit: 735d2f48cada Add linux-next specific files for 20260506
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=14f0e56a580000
kernel config: https://syzkaller.appspot.com/x/.config?x=a88880f0f312e277
dashboard link: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=125c9f6c580000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=166580ec580000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/e65b731bdb98/disk-735d2f48.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/60db2f3d3f2f/vmlinux-735d2f48.xz
kernel image: https://storage.googleapis.com/syzbot-assets/55da282f7ab4/bzImage-735d2f48.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
rdma_rxe: rxe_newlink: failed to add lo
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
CPU: 1 UID: 0 PID: 5982 Comm: syz.3.20 Not tainted syzkaller #0 PREEMPT_{RT,(full)}
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
RIP: 0010:kernel_sock_shutdown+0x2a/0x70 net/socket.c:3803
Code: f3 0f 1e fa 41 57 41 56 41 54 53 89 f3 49 89 fe 49 bc 00 00 00 00 00 fc ff df e8 e1 25 c5 f8 4d 8d 7e 20 4c 89 f8 48 c1 e8 03 <42> 80 3c 20 00 74 08 4c 89 ff e8 27 bf 2e f9 4d 8b 3f 49 83 c7 68
RSP: 0018:ffffc900015ef090 EFLAGS: 00010202
RAX: 0000000000000004 RBX: 0000000000000002 RCX: ffff88802dd89ec0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: dffffc0000000000 R11: ffffed1007cc8979 R12: dffffc0000000000
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000020
FS: 000055556d432500(0000) GS:ffff888125dca000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b34563fff CR3: 0000000042b1c000 CR4: 00000000003526f0
Call Trace:
<TASK>
udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
nldev_dellink+0x304/0x3d0 drivers/infiniband/core/nldev.c:1849
rdma_nl_rcv_msg drivers/infiniband/core/netlink.c:-1 [inline]
rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
rdma_nl_rcv+0x6d7/0xa10 drivers/infiniband/core/netlink.c:259
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x780/0x920 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1895
sock_sendmsg_nosec+0x112/0x150 net/socket.c:797
__sock_sendmsg net/socket.c:812 [inline]
____sys_sendmsg+0x55c/0x870 net/socket.c:2716
___sys_sendmsg+0x2a5/0x360 net/socket.c:2770
__sys_sendmsg net/socket.c:2802 [inline]
__do_sys_sendmsg net/socket.c:2807 [inline]
__se_sys_sendmsg net/socket.c:2805 [inline]
__x64_sys_sendmsg+0x1c3/0x2a0 net/socket.c:2805
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f89172fcdd9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe8bf8c018 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f8917575fa0 RCX: 00007f89172fcdd9
RDX: 0000000000000000 RSI: 00002000000002c0 RDI: 0000000000000006
RBP: 00007f8917392d69 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f8917575fac R14: 00007f8917575fa0 R15: 00007f8917575fa0
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:kernel_sock_shutdown+0x2a/0x70 net/socket.c:3803
Code: f3 0f 1e fa 41 57 41 56 41 54 53 89 f3 49 89 fe 49 bc 00 00 00 00 00 fc ff df e8 e1 25 c5 f8 4d 8d 7e 20 4c 89 f8 48 c1 e8 03 <42> 80 3c 20 00 74 08 4c 89 ff e8 27 bf 2e f9 4d 8b 3f 49 83 c7 68
RSP: 0018:ffffc900015ef090 EFLAGS: 00010202
RAX: 0000000000000004 RBX: 0000000000000002 RCX: ffff88802dd89ec0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: dffffc0000000000 R11: ffffed1007cc8979 R12: dffffc0000000000
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000020
FS: 000055556d432500(0000) GS:ffff888125dca000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000042b1c000 CR4: 00000000003526f0
----------------
Code disassembly (best guess):
0: f3 0f 1e fa endbr64
4: 41 57 push %r15
6: 41 56 push %r14
8: 41 54 push %r12
a: 53 push %rbx
b: 89 f3 mov %esi,%ebx
d: 49 89 fe mov %rdi,%r14
10: 49 bc 00 00 00 00 00 movabs $0xdffffc0000000000,%r12
17: fc ff df
1a: e8 e1 25 c5 f8 call 0xf8c52600
1f: 4d 8d 7e 20 lea 0x20(%r14),%r15
23: 4c 89 f8 mov %r15,%rax
26: 48 c1 e8 03 shr $0x3,%rax
* 2a: 42 80 3c 20 00 cmpb $0x0,(%rax,%r12,1) <-- trapping instruction
2f: 74 08 je 0x39
31: 4c 89 ff mov %r15,%rdi
34: e8 27 bf 2e f9 call 0xf92ebf60
39: 4d 8b 3f mov (%r15),%r15
3c: 49 83 c7 68 add $0x68,%r15
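For reference, the faulting address in the Oops line can be cross-checked against the disassembly above. The sketch below is an illustration only, not kernel code: it reproduces the shadow-address arithmetic visible in the listing (shift the checked address right by 3, then add the constant loaded into %r12). The function names are hypothetical; the two constants are taken from the disassembly. Since R14 (the sock argument) is 0 in the register dump, the checked address is 0x20, which appears to be the load of sock->ops.

```c
#include <stdint.h>

/* One shadow byte covers 8 bytes of memory (the "shr $0x3" above). */
#define KASAN_SHADOW_SCALE_SHIFT 3
/* Constant loaded into %r12 by the movabs in the disassembly. */
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* Hypothetical helper: shadow byte address checked for a given address. */
static inline uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* Hypothetical helper: first byte of the 8-byte range behind a shadow address. */
static inline uint64_t kasan_shadow_to_mem(uint64_t shadow)
{
	return (shadow - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT;
}
```

Under these assumptions, kasan_shadow_addr(0x20) gives 0xdffffc0000000004, the non-canonical address in this report, and kasan_shadow_to_mem() of that value gives 0x20, the start of the reported range [0x20-0x27]. The same arithmetic on 0x68 gives 0xdffffc000000000d, the address in the earlier report quoted in this thread.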
---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
* [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
2026-05-07 3:52 ` syzbot
@ 2026-05-07 12:50 ` Edward Adam Davis
2026-05-07 13:25 ` Zhu Yanjun
2026-05-13 18:17 ` Leon Romanovsky
2026-05-14 5:15 ` [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4) Zhu Yanjun
1 sibling, 2 replies; 20+ messages in thread
From: Edward Adam Davis @ 2026-05-07 12:50 UTC (permalink / raw)
To: syzbot+d8f76778263ab65c2b21
Cc: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
kuni1840, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
syzkaller-bugs, yanjun.zhu, zyjzyj2000
We must serialize calls to nldev_dellink() or risk a crash as syzbot
reported:
KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
Call Trace:
udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
Fixes: a60e3f3d6fba ("RDMA/nldev: Add dellink function pointer")
Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
Tested-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
drivers/infiniband/core/nldev.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 96c745d5bac4..3cb3cb7629fe 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -1816,6 +1816,8 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
return err;
}
+static DEFINE_MUTEX(nldev_dellink_mutex);
+
static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
@@ -1846,7 +1848,9 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
* implicitly scoped to the driver supporting dynamic link deletion like RXE.
*/
if (device->link_ops && device->link_ops->dellink) {
+ mutex_lock(&nldev_dellink_mutex);
err = device->link_ops->dellink(device);
+ mutex_unlock(&nldev_dellink_mutex);
if (err)
return err;
}
--
2.43.0
* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
2026-05-07 12:50 ` [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink() Edward Adam Davis
@ 2026-05-07 13:25 ` Zhu Yanjun
2026-05-07 13:40 ` Edward Adam Davis
2026-05-13 18:17 ` Leon Romanovsky
1 sibling, 1 reply; 20+ messages in thread
From: Zhu Yanjun @ 2026-05-07 13:25 UTC (permalink / raw)
To: Edward Adam Davis, syzbot+d8f76778263ab65c2b21,
yanjun.zhu@linux.dev
Cc: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
kuni1840, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
syzkaller-bugs, zyjzyj2000
On 2026/5/7 5:50, Edward Adam Davis wrote:
> We must serialize calls to nldev_dellink() or risk a crash as syzbot
> reported:
>
> KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
> Call Trace:
> udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
> rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
> rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
>
> Fixes: a60e3f3d6fba ("RDMA/nldev: Add dellink function pointer")
> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> Tested-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Thanks a lot. This looks like a good solution. Since the issue is
reproducible, have you sent this commit to syzbot for verification?
Thanks,
Zhu Yanjun
> ---
> drivers/infiniband/core/nldev.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index 96c745d5bac4..3cb3cb7629fe 100644
> --- a/drivers/infiniband/core/nldev.c
> +++ b/drivers/infiniband/core/nldev.c
> @@ -1816,6 +1816,8 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
> return err;
> }
>
> +static DEFINE_MUTEX(nldev_dellink_mutex);
> +
> static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
> struct netlink_ext_ack *extack)
> {
> @@ -1846,7 +1848,9 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
> * implicitly scoped to the driver supporting dynamic link deletion like RXE.
> */
> if (device->link_ops && device->link_ops->dellink) {
> + mutex_lock(&nldev_dellink_mutex);
> err = device->link_ops->dellink(device);
> + mutex_unlock(&nldev_dellink_mutex);
> if (err)
> return err;
> }
--
Best Regards,
Yanjun.Zhu
* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
2026-05-07 13:25 ` Zhu Yanjun
@ 2026-05-07 13:40 ` Edward Adam Davis
2026-05-07 14:11 ` Zhu Yanjun
0 siblings, 1 reply; 20+ messages in thread
From: Edward Adam Davis @ 2026-05-07 13:40 UTC (permalink / raw)
To: yanjun.zhu
Cc: akpm, arjan, davem, dsahern, eadavis, edumazet, hdanton, horms,
jgg, kuba, kuni1840, kuniyu, leon, linux-kernel, linux-rdma,
netdev, pabeni, syzbot+d8f76778263ab65c2b21, syzkaller-bugs,
zyjzyj2000
On Thu, 7 May 2026 06:25:54 -0700, Zhu Yanjun wrote:
> > We must serialize calls to nldev_dellink() or risk a crash as syzbot
> > reported:
> >
> > KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
> > Call Trace:
> > udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> > rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> > rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
> > rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
> > rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> >
> > Fixes: a60e3f3d6fba ("RDMA/nldev: Add dellink function pointer")
> > Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> > Tested-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> > Signed-off-by: Edward Adam Davis <eadavis@qq.com>
>
> Thanks a lot. This looks like a good solution. Since the issue is
> reproducible,
>
> have you sent this commit to syzbot for verification?
The patch has been verified by syzbot.
BR,
Edward
* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
2026-05-07 13:40 ` Edward Adam Davis
@ 2026-05-07 14:11 ` Zhu Yanjun
0 siblings, 0 replies; 20+ messages in thread
From: Zhu Yanjun @ 2026-05-07 14:11 UTC (permalink / raw)
To: Edward Adam Davis, yanjun.zhu@linux.dev
Cc: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
kuni1840, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
syzbot+d8f76778263ab65c2b21, syzkaller-bugs, zyjzyj2000
On 2026/5/7 6:40, Edward Adam Davis wrote:
> On Thu, 7 May 2026 06:25:54 -0700, Zhu Yanjun wrote:
>>> We must serialize calls to nldev_dellink() or risk a crash as syzbot
>>> reported:
>>>
>>> KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
>>> Call Trace:
>>> udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
>>> rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
>>> rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
>>> rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
>>> rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
>>>
>>> Fixes: a60e3f3d6fba ("RDMA/nldev: Add dellink function pointer")
>>> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
>>> Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
>>> Tested-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
>>> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
>> Thanks a lot. This looks like a good solution. Since the issue is
>> reproducible,
>>
>> have you sent this commit to syzbot for verification?
> The patch has been verified by syzbot.
Thanks a lot.
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Zhu Yanjun
>
> BR,
> Edward
>
--
Best Regards,
Yanjun.Zhu
* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
2026-05-07 12:50 ` [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink() Edward Adam Davis
2026-05-07 13:25 ` Zhu Yanjun
@ 2026-05-13 18:17 ` Leon Romanovsky
2026-05-13 23:46 ` Jason Gunthorpe
1 sibling, 1 reply; 20+ messages in thread
From: Leon Romanovsky @ 2026-05-13 18:17 UTC (permalink / raw)
To: syzbot+d8f76778263ab65c2b21, Edward Adam Davis
Cc: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
kuniyu, linux-kernel, linux-rdma, netdev, pabeni, syzkaller-bugs,
yanjun.zhu, zyjzyj2000, Kuniyuki Iwashima
On Thu, 07 May 2026 20:50:10 +0800, Edward Adam Davis wrote:
> We must serialize calls to nldev_dellink() or risk a crash as syzbot
> reported:
>
> Call Trace:
> udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
> rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
> rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
>
> [...]
Applied, thanks!
[1/1] RDMA/nldev: add mutual exclusion in nldev_dellink()
https://git.kernel.org/rdma/rdma/c/0b28000b64f40d
Best regards,
--
Leon Romanovsky <leon@kernel.org>
* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
2026-05-13 18:17 ` Leon Romanovsky
@ 2026-05-13 23:46 ` Jason Gunthorpe
2026-05-14 7:31 ` Edward Adam Davis
0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2026-05-13 23:46 UTC (permalink / raw)
To: Leon Romanovsky
Cc: syzbot+d8f76778263ab65c2b21, Edward Adam Davis, akpm, arjan,
davem, dsahern, edumazet, hdanton, horms, kuba, kuniyu,
linux-kernel, linux-rdma, netdev, pabeni, syzkaller-bugs,
yanjun.zhu, zyjzyj2000
On Wed, May 13, 2026 at 02:17:28PM -0400, Leon Romanovsky wrote:
>
> On Thu, 07 May 2026 20:50:10 +0800, Edward Adam Davis wrote:
> > We must serialize calls to nldev_dellink() or risk a crash as syzbot
> > reported:
> >
> > Call Trace:
> > udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> > rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> > rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
> > rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
> > rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> >
> > [...]
>
> Applied, thanks!
>
> [1/1] RDMA/nldev: add mutual exclusion in nldev_dellink()
> https://git.kernel.org/rdma/rdma/c/0b28000b64f40d
This seems like a rxe bug, I would have expected the lock to be inside
rxe to protect its racy implementation of rxe_net_del(), which looks
like it is possibly also triggered by NETDEV_UNREGISTER...
ie it should not change nldev_dellink().
Jason
* Re: [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4)
2026-05-07 3:52 ` syzbot
2026-05-07 12:50 ` [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink() Edward Adam Davis
@ 2026-05-14 5:15 ` Zhu Yanjun
1 sibling, 0 replies; 20+ messages in thread
From: Zhu Yanjun @ 2026-05-14 5:15 UTC (permalink / raw)
To: syzbot, akpm, arjan, davem, dsahern, edumazet, hdanton, horms,
jgg, kuba, kuni1840, kuniyu, leon, linux-kernel, linux-rdma,
netdev, pabeni, syzkaller-bugs, zyjzyj2000
syz test: https://github.com/zhuyj/linux null-ptr-deref_kernel_sock_shutdown
* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
2026-05-13 23:46 ` Jason Gunthorpe
@ 2026-05-14 7:31 ` Edward Adam Davis
2026-05-14 11:50 ` Jason Gunthorpe
0 siblings, 1 reply; 20+ messages in thread
From: Edward Adam Davis @ 2026-05-14 7:31 UTC (permalink / raw)
To: jgg
Cc: akpm, arjan, davem, dsahern, eadavis, edumazet, hdanton, horms,
kuba, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
syzbot+d8f76778263ab65c2b21, syzkaller-bugs, yanjun.zhu,
zyjzyj2000
On Wed, 13 May 2026 20:46:55 -0300, Jason Gunthorpe wrote:
> On Wed, May 13, 2026 at 02:17:28PM -0400, Leon Romanovsky wrote:
> >
> > On Thu, 07 May 2026 20:50:10 +0800, Edward Adam Davis wrote:
> > > We must serialize calls to nldev_dellink() or risk a crash as syzbot
> > > reported:
> > >
> > > Call Trace:
> > > udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> > > rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> > > rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
> > > rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
> > > rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> > >
> > > [...]
> >
> > Applied, thanks!
> >
> > [1/1] RDMA/nldev: add mutual exclusion in nldev_dellink()
> > https://git.kernel.org/rdma/rdma/c/0b28000b64f40d
>
> This seems like a rxe bug, I would have expected the lock to be inside
> rxe to protect its racy implementation of rxe_net_del(), which looks
> like it is possibly also triggered by NETDEV_UNREGISTER...
No, it was triggered by RDMA_NLDEV_CMD_DELLINK, you can see the "call trace".
>
> ie it should not change nldev_dellink().
While this could be fixed within RXE, the same issue would affect any other
RXE-like submodule that later supports the "dellink" interface. Handling it
within nldev_dellink() is therefore more appropriate.
Edward
* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
2026-05-14 7:31 ` Edward Adam Davis
@ 2026-05-14 11:50 ` Jason Gunthorpe
2026-05-14 13:58 ` David Ahern
0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2026-05-14 11:50 UTC (permalink / raw)
To: Edward Adam Davis
Cc: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, kuba,
kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
syzbot+d8f76778263ab65c2b21, syzkaller-bugs, yanjun.zhu,
zyjzyj2000
On Thu, May 14, 2026 at 03:31:22PM +0800, Edward Adam Davis wrote:
> On Wed, 13 May 2026 20:46:55 -0300, Jason Gunthorpe wrote:
> > On Wed, May 13, 2026 at 02:17:28PM -0400, Leon Romanovsky wrote:
> > >
> > > On Thu, 07 May 2026 20:50:10 +0800, Edward Adam Davis wrote:
> > > > We must serialize calls to nldev_dellink() or risk a crash as syzbot
> > > > reported:
> > > >
> > > > Call Trace:
> > > > udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> > > > rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> > > > rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
> > > > rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
> > > > rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> > > >
> > > > [...]
> > >
> > > Applied, thanks!
> > >
> > > [1/1] RDMA/nldev: add mutual exclusion in nldev_dellink()
> > > https://git.kernel.org/rdma/rdma/c/0b28000b64f40d
> >
> > This seems like a rxe bug, I would have expected the lock to be inside
> > rxe to protect its racy implementation of rxe_net_del(), which looks
> > like it is possibly also triggered by NETDEV_UNREGISTER...
> No, it was triggered by RDMA_NLDEV_CMD_DELLINK, you can see the "call trace".
> >
> > ie it should not change nldev_dellink().
> While this could be fixed within RXE, the same issue would affect any other
> RXE-like submodule that later supports the "dellink" interface. Handling it
> within nldev_dellink() is therefore more appropriate.
Why would other modules have an issue? The problem is rxe's racy
refcounting scheme for its lazy socket creation. There is nothing
wrong with nldev, and now you've created some nasty BKL in the nldev
code to fix rxe while ignoring its other races.
Jason
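To illustrate the kind of driver-local locking being suggested here, the following is a minimal user-space sketch, not the RXE code and not a proposed patch. It assumes a lazily created, reference-counted shared resource guarded by one lock, so that every caller of the put path, whichever path it comes from, is serialized inside the owner rather than in the caller. All names (fake_pernet, fake_sock_get, fake_sock_put) are hypothetical, a pthread mutex stands in for a kernel mutex, and malloc/free stand in for socket creation and release.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-namespace state; not the kernel's data structure. */
struct fake_pernet {
	pthread_mutex_t lock;	/* serializes get/put for this slot */
	void *sk;		/* stand-in for the shared tunnel socket */
	unsigned int users;	/* number of outstanding references */
	unsigned int releases;	/* how many times the resource was torn down */
};

/* Take a reference, creating the resource on first use. */
static inline void *fake_sock_get(struct fake_pernet *pn)
{
	void *sk;

	pthread_mutex_lock(&pn->lock);
	if (pn->users == 0)
		pn->sk = malloc(1);	/* lazy creation; may fail */
	if (pn->sk)
		pn->users++;
	sk = pn->sk;
	pthread_mutex_unlock(&pn->lock);
	return sk;
}

/* Drop a reference; only the last put releases, and only once. */
static inline void fake_sock_put(struct fake_pernet *pn)
{
	pthread_mutex_lock(&pn->lock);
	if (pn->users > 0 && --pn->users == 0) {
		free(pn->sk);
		pn->sk = NULL;	/* later callers see no resource */
		pn->releases++;
	}
	pthread_mutex_unlock(&pn->lock);
}
```

A caller would initialize the slot with `struct fake_pernet pn = { .lock = PTHREAD_MUTEX_INITIALIZER };`. In this sketch an extra or concurrent put cannot release the resource twice, because the pointer and the count are only examined and cleared under the lock.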
* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
2026-05-14 11:50 ` Jason Gunthorpe
@ 2026-05-14 13:58 ` David Ahern
2026-05-14 14:14 ` Jason Gunthorpe
0 siblings, 1 reply; 20+ messages in thread
From: David Ahern @ 2026-05-14 13:58 UTC (permalink / raw)
To: Jason Gunthorpe, Edward Adam Davis
Cc: akpm, arjan, davem, edumazet, hdanton, horms, kuba, kuniyu, leon,
linux-kernel, linux-rdma, netdev, pabeni,
syzbot+d8f76778263ab65c2b21, syzkaller-bugs, yanjun.zhu,
zyjzyj2000
On 5/14/26 5:50 AM, Jason Gunthorpe wrote:
> On Thu, May 14, 2026 at 03:31:22PM +0800, Edward Adam Davis wrote:
>> On Wed, 13 May 2026 20:46:55 -0300, Jason Gunthorpe wrote:
>>> On Wed, May 13, 2026 at 02:17:28PM -0400, Leon Romanovsky wrote:
>>>>
>>>> On Thu, 07 May 2026 20:50:10 +0800, Edward Adam Davis wrote:
>>>>> We must serialize calls to nldev_dellink() or risk a crash as syzbot
>>>>> reported:
>>>>>
>>>>> Call Trace:
>>>>> udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
>>>>> rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
>>>>> rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
>>>>> rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
>>>>> rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
>>>>>
>>>>> [...]
>>>>
>>>> Applied, thanks!
>>>>
>>>> [1/1] RDMA/nldev: add mutual exclusion in nldev_dellink()
>>>> https://git.kernel.org/rdma/rdma/c/0b28000b64f40d
>>>
>>> This seems like a rxe bug, I would have expected the lock to be inside
>>> rxe to protect its racy implementation of rxe_net_del(), which looks
>>> like it is possibly also triggered by NETDEV_UNREGISTER...
>> No, it was triggered by RDMA_NLDEV_CMD_DELLINK, you can see the "call trace".
That is not Jason's point. Code-wise:
rxe_dellink -> rxe_net_del
netdev NETDEV_UNREGISTER:
rxe_notify -> rxe_net_del
both can lead to the same problem
>>>
>>> ie it should not change nldev_dellink().
>> While this could be fixed within RXE, the same issue would affect any other
>> RXE-like submodule that later supports the "dellink" interface. Handling it
>> within nldev_dellink() is therefore more appropriate.
>
> Why would other modules have an issue? The problem is rxe's racy
> refcounting scheme for its lazy socket creation. There is nothing
> wrong with nldev, and now you've created some nasty BKL in the nldev
> code to fix rxe while ignoring its other races.
+1
* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
2026-05-14 13:58 ` David Ahern
@ 2026-05-14 14:14 ` Jason Gunthorpe
2026-05-14 14:26 ` David Ahern
0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2026-05-14 14:14 UTC (permalink / raw)
To: David Ahern
Cc: Edward Adam Davis, akpm, arjan, davem, edumazet, hdanton, horms,
kuba, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
syzbot+d8f76778263ab65c2b21, syzkaller-bugs, yanjun.zhu,
zyjzyj2000
On Thu, May 14, 2026 at 07:58:18AM -0600, David Ahern wrote:
> On 5/14/26 5:50 AM, Jason Gunthorpe wrote:
> > On Thu, May 14, 2026 at 03:31:22PM +0800, Edward Adam Davis wrote:
> >> On Wed, 13 May 2026 20:46:55 -0300, Jason Gunthorpe wrote:
> >>> On Wed, May 13, 2026 at 02:17:28PM -0400, Leon Romanovsky wrote:
> >>>>
> >>>> On Thu, 07 May 2026 20:50:10 +0800, Edward Adam Davis wrote:
> >>>>> We must serialize calls to nldev_dellink() or risk a crash as syzbot
> >>>>> reported:
> >>>>>
> >>>>> Call Trace:
> >>>>> udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> >>>>> rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> >>>>> rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
> >>>>> rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
> >>>>> rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> >>>>>
> >>>>> [...]
> >>>>
> >>>> Applied, thanks!
> >>>>
> >>>> [1/1] RDMA/nldev: add mutual exclusion in nldev_dellink()
> >>>> https://git.kernel.org/rdma/rdma/c/0b28000b64f40d
> >>>
> >>> This seems like a rxe bug, I would have expected the lock to be inside
> >>> rxe to protect its racy implementation of rxe_net_del(), which looks
> >>> like it is possibly also triggered by NETDEV_UNREGISTER...
> >> No, it was triggered by RDMA_NLDEV_CMD_DELLINK, you can see the "call trace".
>
> That is not Jason's point. Code-wise:
>
> rxe_dellink -> rxe_net_del
>
> netdev NETDEV_UNREGISTER:
> rxe_notify -> rxe_net_del
>
> both can lead to the same problem
>
> >>>
> >>> ie it should not change nldev_dellink().
> >> While this could be fixed within RXE, the same issue would affect any other
> >> RXE-like submodule that later supports the "dellink" interface. Handling it
> >> within nldev_dellink() is therefore more appropriate.
> >
> > Why would other modules have an issue? The problem is rxe's racy
> > refcounting scheme for its lazy socket creation. There is nothing
> > wrong with nldev, and now you've created some nasty BKL in the nldev
> > code to fix rxe while ignoring its other races.
>
> +1
Edward, please come up with a fixup on top of this since it was already
applied
Jason
* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
2026-05-14 14:14 ` Jason Gunthorpe
@ 2026-05-14 14:26 ` David Ahern
2026-05-14 15:46 ` Zhu Yanjun
0 siblings, 1 reply; 20+ messages in thread
From: David Ahern @ 2026-05-14 14:26 UTC (permalink / raw)
To: Jason Gunthorpe, Zhu Yanjun
Cc: Edward Adam Davis, akpm, arjan, davem, edumazet, hdanton, horms,
kuba, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
syzbot+d8f76778263ab65c2b21, syzkaller-bugs, yanjun.zhu,
zyjzyj2000
On 5/14/26 8:14 AM, Jason Gunthorpe wrote:
>
> Edward, please come up with a fixup on top of this since it was already
> applied
>
Zhu Yanjun: As author of the patch that introduced the bug and
maintainer of the rxe code, why have you not addressed this problem? It
has been well known for many weeks now and multiple people have
attempted fixes. Seems like you need to step up and take care of it.
* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
2026-05-14 14:26 ` David Ahern
@ 2026-05-14 15:46 ` Zhu Yanjun
0 siblings, 0 replies; 20+ messages in thread
From: Zhu Yanjun @ 2026-05-14 15:46 UTC (permalink / raw)
To: David Ahern, Jason Gunthorpe, Zhu Yanjun
Cc: Edward Adam Davis, akpm, arjan, davem, edumazet, hdanton, horms,
kuba, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
syzbot+d8f76778263ab65c2b21, syzkaller-bugs, zyjzyj2000
On 2026/5/14 7:26, David Ahern wrote:
> On 5/14/26 8:14 AM, Jason Gunthorpe wrote:
>> Edward, please come up with a fixup on top of this since it was already
>> applied
>>
> Zhu Yanjun: As author of the patch that introduced the bug and
> maintainer of the rxe code, why have you not addressed this problem? It
> has been well known for many weeks now and multiple people have
I am aware of the issue and have been following the discussion and
proposed fixes.
I did not want to rush a change without fully understanding the
implications for RXE behavior and existing users. I am currently
reviewing the proposed approaches and working on a proper fix.
I appreciate everyone who helped investigate and test the issue.
Zhu Yanjun
> attempted fixes. Seems like you need to step up and take care of it.
>
end of thread, other threads:[~2026-05-14 15:48 UTC | newest]
Thread overview: 20+ messages
[not found] <69ea344f.a00a0220.17a17.0040.GAE@google.com>
2026-04-24 18:08 ` [syzbot] [net?] general protection fault in kernel_sock_shutdown (4) Arjan van de Ven
2026-04-25 1:12 ` Arjan van de Ven
2026-04-25 1:14 ` Kuniyuki Iwashima
2026-05-06 13:48 ` [syzbot] [rdma] " syzbot
2026-05-06 14:28 ` Zhu Yanjun
2026-05-06 15:19 ` Kuniyuki Iwashima
2026-05-07 3:52 ` syzbot
2026-05-07 12:50 ` [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink() Edward Adam Davis
2026-05-07 13:25 ` Zhu Yanjun
2026-05-07 13:40 ` Edward Adam Davis
2026-05-07 14:11 ` Zhu Yanjun
2026-05-13 18:17 ` Leon Romanovsky
2026-05-13 23:46 ` Jason Gunthorpe
2026-05-14 7:31 ` Edward Adam Davis
2026-05-14 11:50 ` Jason Gunthorpe
2026-05-14 13:58 ` David Ahern
2026-05-14 14:14 ` Jason Gunthorpe
2026-05-14 14:26 ` David Ahern
2026-05-14 15:46 ` Zhu Yanjun
2026-05-14 5:15 ` [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4) Zhu Yanjun