Linux RDMA and InfiniBand development
* Re: [syzbot] [net?] general protection fault in kernel_sock_shutdown (4)
       [not found] <69ea344f.a00a0220.17a17.0040.GAE@google.com>
@ 2026-04-24 18:08 ` Arjan van de Ven
  2026-04-25  1:12 ` Arjan van de Ven
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 31+ messages in thread
From: Arjan van de Ven @ 2026-04-24 18:08 UTC (permalink / raw)
  To: netdev
  Cc: syzbot+d8f76778263ab65c2b21, dsahern, edumazet, akpm,
	linux-kernel, syzkaller-bugs, Arjan van de Ven, linux-rdma,
	Zhu Yanjun, Jason Gunthorpe, Leon Romanovsky


Unfortunately the AI had a burp and did not write out the proper URL
for analysis data; it should have been

http://oops.fenrus.org/reports/lkml/69ea344f.a00a0220.17a17.0040.GAE_google.com/report.html

and in addition, it made a candidate patch (below)

From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH] RDMA/rxe: fix double-release race on UDP tunnel socket teardown

This patch is based on a BUG as reported at
https://lore.kernel.org/r/69ea344f.a00a0220.17a17.0040.GAE@google.com.

The Soft RoCE (RXE) driver stores per-network-namespace UDP tunnel
sockets for IPv4 and IPv6 encapsulation. Two independent code paths
tear these sockets down: rxe_ns_exit(), called when a network
namespace is destroyed, and rxe_net_del(), called when an RDMA link
is deleted via netlink. Both paths read the per-namespace socket
pointer and call udp_tunnel_sock_release() on it.

A time-of-check/time-of-use (TOCTOU) race exists in rxe_net_del().
It reads the socket pointer via rxe_ns_pernet_sk4(), then passes it
to rxe_sock_put() for release. If rxe_ns_exit() runs concurrently
between the read and the release, it clears the pointer and calls
udp_tunnel_sock_release() first, causing sock_release() to set
sock->ops = NULL. When rxe_net_del() then calls
udp_tunnel_sock_release() on the same socket, kernel_sock_shutdown()
dereferences the now-NULL sock->ops, triggering a KASAN null-ptr-deref
at offset 0x68 (the shutdown function pointer in struct proto_ops).

A minimal alternative would guard against NULL sock->ops inside
udp_tunnel_sock_release() before calling kernel_sock_shutdown(). That
treats the symptom rather than the root cause and leaves the
double-release of socket state intact.

Add rxe_ns_pernet_take_sk4() and rxe_ns_pernet_take_sk6() which use
xchg() to atomically swap the per-namespace socket pointer to NULL
and return the old value. Replace the non-atomic reads in
rxe_net_del() with these take variants, and release the socket
directly via udp_tunnel_sock_release() without going through
rxe_sock_put().

Whichever teardown path executes take first claims ownership of the
socket; the second caller gets NULL and skips the release, closing
the double-release window.

Link: https://lore.kernel.org/r/69ea344f.a00a0220.17a17.0040.GAE@google.com
Oops-Analysis: http://oops.fenrus.org/reports/lkml/69ea344f.a00a0220.17a17.0040.GAE_google.com/report.html
Fixes: 13f2a53c2a71 ("RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets")
Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
Assisted-by: GitHub Copilot patcher:claude linux-kernel-oops-x86.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Zhu Yanjun <zyjzyj2000@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>

---
 drivers/infiniband/sw/rxe/rxe_net.c |    8 ++++----
 drivers/infiniband/sw/rxe/rxe_ns.c  |   14 ++++++++++++++
 drivers/infiniband/sw/rxe/rxe_ns.h  |    7 +++++++
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 50a2cb5405e22..4f604636cb7b4 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -655,13 +655,13 @@ void rxe_net_del(struct ib_device *dev)
 
 	net = dev_net(ndev);
 
-	sk = rxe_ns_pernet_sk4(net);
+	sk = rxe_ns_pernet_take_sk4(net);
 	if (sk)
-		rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
+		udp_tunnel_sock_release(sk->sk_socket);
 
-	sk = rxe_ns_pernet_sk6(net);
+	sk = rxe_ns_pernet_take_sk6(net);
 	if (sk)
-		rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
+		udp_tunnel_sock_release(sk->sk_socket);
 
 	dev_put(ndev);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_ns.c b/drivers/infiniband/sw/rxe/rxe_ns.c
index 8b9d734229b24..d9d376e3c670f 100644
--- a/drivers/infiniband/sw/rxe/rxe_ns.c
+++ b/drivers/infiniband/sw/rxe/rxe_ns.c
@@ -91,6 +91,13 @@ void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk)
 	synchronize_rcu();
 }
 
+struct sock *rxe_ns_pernet_take_sk4(struct net *net)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+
+	return xchg((__force struct sock **)&ns_sk->rxe_sk4, NULL);
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 struct sock *rxe_ns_pernet_sk6(struct net *net)
 {
@@ -111,6 +118,13 @@ void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
 	rcu_assign_pointer(ns_sk->rxe_sk6, sk);
 	synchronize_rcu();
 }
+
+struct sock *rxe_ns_pernet_take_sk6(struct net *net)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+
+	return xchg((__force struct sock **)&ns_sk->rxe_sk6, NULL);
+}
 #endif /* IPV6 */
 
 int rxe_namespace_init(void)
diff --git a/drivers/infiniband/sw/rxe/rxe_ns.h b/drivers/infiniband/sw/rxe/rxe_ns.h
index 4da2709e6b714..9d9a5106b77c8 100644
--- a/drivers/infiniband/sw/rxe/rxe_ns.h
+++ b/drivers/infiniband/sw/rxe/rxe_ns.h
@@ -5,10 +5,17 @@
 
 struct sock *rxe_ns_pernet_sk4(struct net *net);
 void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk);
+struct sock *rxe_ns_pernet_take_sk4(struct net *net);
 
 #if IS_ENABLED(CONFIG_IPV6)
 void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk);
 struct sock *rxe_ns_pernet_sk6(struct net *net);
+struct sock *rxe_ns_pernet_take_sk6(struct net *net);
 #else /* IPv6 */
 static inline struct sock *rxe_ns_pernet_sk6(struct net *net)
 {
@@ -18,6 +25,10 @@ static inline void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
 {
 }
 
+static inline struct sock *rxe_ns_pernet_take_sk6(struct net *net)
+{
+	return NULL;
+}
 #endif /* IPv6 */
 
 int rxe_namespace_init(void);
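
The "take" idea described in the commit message above can be sketched in
userspace C. This is only an illustration under stated assumptions: it uses
C11 atomic_exchange() as a stand-in for the kernel's xchg(), and every name in
it (fake_sock, pernet_slot, take_sock, teardown_path) is hypothetical and not
part of the rxe driver. It shows the claim-once behaviour only and says
nothing about socket reference counting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket; counts how often it was released. */
struct fake_sock {
	int release_count;
};

/* Hypothetical stand-in for the per-namespace socket pointer. */
static _Atomic(struct fake_sock *) pernet_slot;

/* Atomically swap the slot to NULL and return the previous value. */
static struct fake_sock *take_sock(void)
{
	return atomic_exchange(&pernet_slot, NULL);
}

/*
 * One teardown path: release the socket only if this caller won the
 * exchange. Returns 1 if it released the socket, 0 if the slot was
 * already empty.
 */
static int teardown_path(void)
{
	struct fake_sock *sk = take_sock();

	if (!sk)
		return 0;

	sk->release_count++;	/* placeholder for the real release call */
	return 1;
}
```

If two callers invoke teardown_path() against the same populated slot, the
exchange hands the pointer to exactly one of them, so release_count ends up
at 1 regardless of the order in which they run.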

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [syzbot] [net?] general protection fault in kernel_sock_shutdown (4)
       [not found] <69ea344f.a00a0220.17a17.0040.GAE@google.com>
  2026-04-24 18:08 ` [syzbot] [net?] general protection fault in kernel_sock_shutdown (4) Arjan van de Ven
@ 2026-04-25  1:12 ` Arjan van de Ven
  2026-04-25  1:14   ` Kuniyuki Iwashima
  2026-05-06 13:48 ` [syzbot] [rdma] " syzbot
  2026-05-07  3:52 ` syzbot
  3 siblings, 1 reply; 31+ messages in thread
From: Arjan van de Ven @ 2026-04-25  1:12 UTC (permalink / raw)
  To: kuniyu
  Cc: Arjan van de Ven, linux-rdma, linux-kernel, Zhu Yanjun,
	Jason Gunthorpe, Leon Romanovsky


Unfortunately the AI had a burp and did not write out the proper URL
for analysis data; it should have been

http://oops.fenrus.org/reports/lkml/69ea344f.a00a0220.17a17.0040.GAE_google.com/report.html

and in addition, it made a candidate patch (below)

From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH] RDMA/rxe: fix double-release race on UDP tunnel socket teardown

This patch is based on a BUG as reported at
https://lore.kernel.org/r/69ea344f.a00a0220.17a17.0040.GAE@google.com.

The Soft RoCE (RXE) driver stores per-network-namespace UDP tunnel
sockets for IPv4 and IPv6 encapsulation. Two independent code paths
tear these sockets down: rxe_ns_exit(), called when a network
namespace is destroyed, and rxe_net_del(), called when an RDMA link
is deleted via netlink. Both paths read the per-namespace socket
pointer and call udp_tunnel_sock_release() on it.

A time-of-check/time-of-use (TOCTOU) race exists in rxe_net_del().
It reads the socket pointer via rxe_ns_pernet_sk4(), then passes it
to rxe_sock_put() for release. If rxe_ns_exit() runs concurrently
between the read and the release, it clears the pointer and calls
udp_tunnel_sock_release() first, causing sock_release() to set
sock->ops = NULL. When rxe_net_del() then calls
udp_tunnel_sock_release() on the same socket, kernel_sock_shutdown()
dereferences the now-NULL sock->ops, triggering a KASAN null-ptr-deref
at offset 0x68 (the shutdown function pointer in struct proto_ops).

A minimal alternative would guard against NULL sock->ops inside
udp_tunnel_sock_release() before calling kernel_sock_shutdown(). That
treats the symptom rather than the root cause and leaves the
double-release of socket state intact.

Add rxe_ns_pernet_take_sk4() and rxe_ns_pernet_take_sk6() which use
xchg() to atomically swap the per-namespace socket pointer to NULL
and return the old value. Replace the non-atomic reads in
rxe_net_del() with these take variants, and release the socket
directly via udp_tunnel_sock_release() without going through
rxe_sock_put().

Whichever teardown path executes take first claims ownership of the
socket; the second caller gets NULL and skips the release, closing
the double-release window.

Link: https://lore.kernel.org/r/69ea344f.a00a0220.17a17.0040.GAE@google.com
Oops-Analysis: http://oops.fenrus.org/reports/lkml/69ea344f.a00a0220.17a17.0040.GAE_google.com/report.html
Fixes: 13f2a53c2a71 ("RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets")
Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
Assisted-by: GitHub Copilot patcher:claude linux-kernel-oops-x86.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Zhu Yanjun <zyjzyj2000@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>

---
 drivers/infiniband/sw/rxe/rxe_net.c |    8 ++++----
 drivers/infiniband/sw/rxe/rxe_ns.c  |   14 ++++++++++++++
 drivers/infiniband/sw/rxe/rxe_ns.h  |    7 +++++++
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 50a2cb5405e22..4f604636cb7b4 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -655,13 +655,13 @@ void rxe_net_del(struct ib_device *dev)
 
 	net = dev_net(ndev);
 
-	sk = rxe_ns_pernet_sk4(net);
+	sk = rxe_ns_pernet_take_sk4(net);
 	if (sk)
-		rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
+		udp_tunnel_sock_release(sk->sk_socket);
 
-	sk = rxe_ns_pernet_sk6(net);
+	sk = rxe_ns_pernet_take_sk6(net);
 	if (sk)
-		rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
+		udp_tunnel_sock_release(sk->sk_socket);
 
 	dev_put(ndev);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_ns.c b/drivers/infiniband/sw/rxe/rxe_ns.c
index 8b9d734229b24..d9d376e3c670f 100644
--- a/drivers/infiniband/sw/rxe/rxe_ns.c
+++ b/drivers/infiniband/sw/rxe/rxe_ns.c
@@ -91,6 +91,13 @@ void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk)
 	synchronize_rcu();
 }
 
+struct sock *rxe_ns_pernet_take_sk4(struct net *net)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+
+	return xchg((__force struct sock **)&ns_sk->rxe_sk4, NULL);
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 struct sock *rxe_ns_pernet_sk6(struct net *net)
 {
@@ -111,6 +118,13 @@ void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
 	rcu_assign_pointer(ns_sk->rxe_sk6, sk);
 	synchronize_rcu();
 }
+
+struct sock *rxe_ns_pernet_take_sk6(struct net *net)
+{
+	struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+
+	return xchg((__force struct sock **)&ns_sk->rxe_sk6, NULL);
+}
 #endif /* IPV6 */
 
 int rxe_namespace_init(void)
diff --git a/drivers/infiniband/sw/rxe/rxe_ns.h b/drivers/infiniband/sw/rxe/rxe_ns.h
index 4da2709e6b714..9d9a5106b77c8 100644
--- a/drivers/infiniband/sw/rxe/rxe_ns.h
+++ b/drivers/infiniband/sw/rxe/rxe_ns.h
@@ -5,10 +5,17 @@
 
 struct sock *rxe_ns_pernet_sk4(struct net *net);
 void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk);
+struct sock *rxe_ns_pernet_take_sk4(struct net *net);
 
 #if IS_ENABLED(CONFIG_IPV6)
 void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk);
 struct sock *rxe_ns_pernet_sk6(struct net *net);
+struct sock *rxe_ns_pernet_take_sk6(struct net *net);
 #else /* IPv6 */
 static inline struct sock *rxe_ns_pernet_sk6(struct net *net)
 {
@@ -18,6 +25,10 @@ static inline void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
 {
 }
 
+static inline struct sock *rxe_ns_pernet_take_sk6(struct net *net)
+{
+	return NULL;
+}
 #endif /* IPv6 */
 
 int rxe_namespace_init(void);

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [syzbot] [net?] general protection fault in kernel_sock_shutdown (4)
  2026-04-25  1:12 ` Arjan van de Ven
@ 2026-04-25  1:14   ` Kuniyuki Iwashima
  0 siblings, 0 replies; 31+ messages in thread
From: Kuniyuki Iwashima @ 2026-04-25  1:14 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-rdma, linux-kernel, Zhu Yanjun, Jason Gunthorpe,
	Leon Romanovsky

On Fri, Apr 24, 2026 at 6:11 PM Arjan van de Ven <arjan@linux.intel.com> wrote:
>
>
> Unfortunately the AI had a burp and did not write out the proper URL
> for analysis data; it should have been
>
> http://oops.fenrus.org/reports/lkml/69ea344f.a00a0220.17a17.0040.GAE_google.com/report.html
>
> and in addition, it made a candidate patch (below)
>
> From: Arjan van de Ven <arjan@linux.intel.com>
> Subject: [PATCH] RDMA/rxe: fix double-release race on UDP tunnel socket teardown
>
> This patch is based on a BUG as reported at
> https://lore.kernel.org/r/69ea344f.a00a0220.17a17.0040.GAE@google.com.
>
> The Soft RoCE (RXE) driver stores per-network-namespace UDP tunnel
> sockets for IPv4 and IPv6 encapsulation. Two independent code paths
> tear these sockets down: rxe_ns_exit(), called when a network
> namespace is destroyed, and rxe_net_del(), called when an RDMA link
> is deleted via netlink. Both paths read the per-namespace socket
> pointer and call udp_tunnel_sock_release() on it.
>
> A time-of-check/time-of-use (TOCTOU) race exists in rxe_net_del().
> It reads the socket pointer via rxe_ns_pernet_sk4(), then passes it
> to rxe_sock_put() for release. If rxe_ns_exit() runs concurrently
> between the read and the release, it clears the pointer and calls
> udp_tunnel_sock_release() first, causing sock_release() to set
> sock->ops = NULL. When rxe_net_del() then calls
> udp_tunnel_sock_release() on the same socket, kernel_sock_shutdown()
> dereferences the now-NULL sock->ops, triggering a KASAN null-ptr-deref
> at offset 0x68 (the shutdown function pointer in struct proto_ops).
>
> A minimal alternative would guard against NULL sock->ops inside
> udp_tunnel_sock_release() before calling kernel_sock_shutdown(). That
> treats the symptom rather than the root cause and leaves the
> double-release of socket state intact.
>
> Add rxe_ns_pernet_take_sk4() and rxe_ns_pernet_take_sk6() which use
> xchg() to atomically swap the per-namespace socket pointer to NULL
> and return the old value. Replace the non-atomic reads in
> rxe_net_del() with these take variants, and release the socket
> directly via udp_tunnel_sock_release() without going through
> rxe_sock_put().
>
> Whichever teardown path executes take first claims ownership of the
> socket; the second caller gets NULL and skips the release, closing
> the double-release window.
>
> Link: https://lore.kernel.org/r/69ea344f.a00a0220.17a17.0040.GAE@google.com
> Oops-Analysis: http://oops.fenrus.org/reports/lkml/69ea344f.a00a0220.17a17.0040.GAE_google.com/report.html
> Fixes: 13f2a53c2a71 ("RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets")
> Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
> Assisted-by: GitHub Copilot patcher:claude linux-kernel-oops-x86.
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> Cc: linux-rdma@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Zhu Yanjun <zyjzyj2000@gmail.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Leon Romanovsky <leon@kernel.org>
>
> ---
>  drivers/infiniband/sw/rxe/rxe_net.c |    8 ++++----
>  drivers/infiniband/sw/rxe/rxe_ns.c  |   14 ++++++++++++++
>  drivers/infiniband/sw/rxe/rxe_ns.h  |    7 +++++++
>  3 files changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 50a2cb5405e22..4f604636cb7b4 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -655,13 +655,13 @@ void rxe_net_del(struct ib_device *dev)
>
>         net = dev_net(ndev);
>
> -       sk = rxe_ns_pernet_sk4(net);
> +       sk = rxe_ns_pernet_take_sk4(net);
>         if (sk)
> -               rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
> +               udp_tunnel_sock_release(sk->sk_socket);

This leaks sk->sk_refcnt, no AI slop please.

I'm working on the right fix.

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4)
       [not found] <69ea344f.a00a0220.17a17.0040.GAE@google.com>
  2026-04-24 18:08 ` [syzbot] [net?] general protection fault in kernel_sock_shutdown (4) Arjan van de Ven
  2026-04-25  1:12 ` Arjan van de Ven
@ 2026-05-06 13:48 ` syzbot
  2026-05-06 14:28   ` Zhu Yanjun
  2026-05-07  3:52 ` syzbot
  3 siblings, 1 reply; 31+ messages in thread
From: syzbot @ 2026-05-06 13:48 UTC (permalink / raw)
  To: akpm, arjan, davem, dsahern, edumazet, horms, jgg, kuba, kuni1840,
	kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzkaller-bugs, yanjun.zhu, zyjzyj2000

syzbot has found a reproducer for the following issue on:

HEAD commit:    74fe02ce122a Merge tag 'wq-for-7.1-rc2-fixes' of git://git..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16e895ce580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=59da38148f3a3d24
dashboard link: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13a613ba580000

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-74fe02ce.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/c0a591d96864/vmlinux-74fe02ce.xz
kernel image: https://storage.googleapis.com/syzbot-assets/9f94fb623cd1/bzImage-74fe02ce.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com

Oops: general protection fault, probably for non-canonical address 0xdffffc000000000d: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f]
CPU: 3 UID: 0 PID: 5986 Comm: syz.3.20 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:kernel_sock_shutdown+0x47/0x70 net/socket.c:3785
Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 33 48 b8 00 00 00 00 00 fc ff df 4c 8b 63 20 49 8d 7c 24 68 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 1a 49 8b 44 24 68 89 ee 48 89 df 5b 5d 41 5c ff e0
RSP: 0018:ffffc9000391f180 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff88802a2a0040 RCX: ffffffff8b8b72bd
RDX: 000000000000000d RSI: ffffffff89553b32 RDI: 0000000000000068
RBP: 0000000000000002 R08: 0000000000000001 R09: fffff52000723dfc
R10: ffffc9000391efe7 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8880311b8000 R14: 0000000000000002 R15: 0000000000000018
FS:  00007f602d1fe6c0(0000) GS:ffff8880d6675000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561c522a6000 CR3: 000000002e99e000 CR4: 0000000000352ef0
Call Trace:
 <TASK>
 udp_tunnel_sock_release+0x68/0x80 net/ipv4/udp_tunnel_core.c:202
 rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
 rxe_sock_put+0xae/0x130 drivers/infiniband/sw/rxe/rxe_net.c:639
 rxe_net_del+0x83/0x120 drivers/infiniband/sw/rxe/rxe_net.c:660
 rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
 nldev_dellink+0x289/0x3c0 drivers/infiniband/core/nldev.c:1849
 rdma_nl_rcv_msg+0x392/0x6f0 drivers/infiniband/core/netlink.c:195
 rdma_nl_rcv_skb.constprop.0.isra.0+0x2cb/0x410 drivers/infiniband/core/netlink.c:239
 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
 netlink_unicast+0x585/0x850 net/netlink/af_netlink.c:1344
 netlink_sendmsg+0x8b0/0xda0 net/netlink/af_netlink.c:1894
 sock_sendmsg_nosec net/socket.c:787 [inline]
 __sock_sendmsg net/socket.c:802 [inline]
 ____sys_sendmsg+0x9e1/0xb70 net/socket.c:2698
 ___sys_sendmsg+0x190/0x1e0 net/socket.c:2752
 __sys_sendmsg+0x170/0x220 net/socket.c:2784
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x10b/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f602db9cdd9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f602d1fe028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f602de16090 RCX: 00007f602db9cdd9
RDX: 0000000000000000 RSI: 00002000000002c0 RDI: 0000000000000007
RBP: 00007f602dc32d69 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f602de16128 R14: 00007f602de16090 R15: 00007ffc1d89c428
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:kernel_sock_shutdown+0x47/0x70 net/socket.c:3785
Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 33 48 b8 00 00 00 00 00 fc ff df 4c 8b 63 20 49 8d 7c 24 68 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 1a 49 8b 44 24 68 89 ee 48 89 df 5b 5d 41 5c ff e0
RSP: 0018:ffffc9000391f180 EFLAGS: 00010202

RAX: dffffc0000000000 RBX: ffff88802a2a0040 RCX: ffffffff8b8b72bd
RDX: 000000000000000d RSI: ffffffff89553b32 RDI: 0000000000000068
RBP: 0000000000000002 R08: 0000000000000001 R09: fffff52000723dfc
R10: ffffc9000391efe7 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8880311b8000 R14: 0000000000000002 R15: 0000000000000018
FS:  00007f602d1fe6c0(0000) GS:ffff8880d6675000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561c522a6000 CR3: 000000002e99e000 CR4: 0000000000352ef0
----------------
Code disassembly (best guess):
   0:	fc                   	cld
   1:	ff                   	lcall  (bad)
   2:	df 48 89             	fisttps -0x77(%rax)
   5:	fa                   	cli
   6:	48 c1 ea 03          	shr    $0x3,%rdx
   a:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1)
   e:	75 33                	jne    0x43
  10:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
  17:	fc ff df
  1a:	4c 8b 63 20          	mov    0x20(%rbx),%r12
  1e:	49 8d 7c 24 68       	lea    0x68(%r12),%rdi
  23:	48 89 fa             	mov    %rdi,%rdx
  26:	48 c1 ea 03          	shr    $0x3,%rdx
* 2a:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1) <-- trapping instruction
  2e:	75 1a                	jne    0x4a
  30:	49 8b 44 24 68       	mov    0x68(%r12),%rax
  35:	89 ee                	mov    %ebp,%esi
  37:	48 89 df             	mov    %rbx,%rdi
  3a:	5b                   	pop    %rbx
  3b:	5d                   	pop    %rbp
  3c:	41 5c                	pop    %r12
  3e:	ff e0                	jmp    *%rax
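
The faulting address in the report can be reproduced from the disassembly
above: the KASAN check shifts the target address right by 3 and adds the
constant loaded by the movabs instruction. The sketch below redoes that
arithmetic in plain C. The macro and function names are hypothetical; the
constant and the shift are taken from the disassembly itself.

```c
#include <assert.h>
#include <stdint.h>

/* Constant loaded by "movabs $0xdffffc0000000000,%rax" in the dump. */
#define SHADOW_OFFSET		0xdffffc0000000000ULL
/* Shift applied by "shr $0x3,%rdx" in the dump. */
#define SHADOW_SCALE_SHIFT	3

/* Address of the shadow byte checked by "cmpb $0x0,(%rdx,%rax,1)". */
static uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}
```

With R12 (the loaded sock->ops) equal to 0, the checked address is 0x68, so
the shifted value is 0xd (matching RDX) and the shadow address is
0xdffffc000000000d, the non-canonical address named in the Oops line.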


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4)
  2026-05-06 13:48 ` [syzbot] [rdma] " syzbot
@ 2026-05-06 14:28   ` Zhu Yanjun
  2026-05-06 15:19     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 31+ messages in thread
From: Zhu Yanjun @ 2026-05-06 14:28 UTC (permalink / raw)
  To: syzbot, akpm, arjan, davem, dsahern, edumazet, horms, jgg, kuba,
	kuni1840, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzkaller-bugs, zyjzyj2000
  Cc: Kuniyuki Iwashima


On 2026/5/6 6:48, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit:    74fe02ce122a Merge tag 'wq-for-7.1-rc2-fixes' of git://git..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16e895ce580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=59da38148f3a3d24
> dashboard link: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13a613ba580000
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-74fe02ce.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/c0a591d96864/vmlinux-74fe02ce.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/9f94fb623cd1/bzImage-74fe02ce.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc000000000d: 0000 [#1] SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f]

Thanks a lot. IIRC, a fix for this problem is already in progress. The link is
https://patchwork.kernel.org/project/linux-rdma/patch/20260424013759.728288-1-kuniyu@google.com/

Hi, Kuniyuki Iwashima

I think you are fixing this problem. I hope that we can see your commit 
very soon.

Zhu Yanjun

> CPU: 3 UID: 0 PID: 5986 Comm: syz.3.20 Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> RIP: 0010:kernel_sock_shutdown+0x47/0x70 net/socket.c:3785
> Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 33 48 b8 00 00 00 00 00 fc ff df 4c 8b 63 20 49 8d 7c 24 68 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 1a 49 8b 44 24 68 89 ee 48 89 df 5b 5d 41 5c ff e0
> RSP: 0018:ffffc9000391f180 EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: ffff88802a2a0040 RCX: ffffffff8b8b72bd
> RDX: 000000000000000d RSI: ffffffff89553b32 RDI: 0000000000000068
> RBP: 0000000000000002 R08: 0000000000000001 R09: fffff52000723dfc
> R10: ffffc9000391efe7 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff8880311b8000 R14: 0000000000000002 R15: 0000000000000018
> FS:  00007f602d1fe6c0(0000) GS:ffff8880d6675000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000561c522a6000 CR3: 000000002e99e000 CR4: 0000000000352ef0
> Call Trace:
>   <TASK>
>   udp_tunnel_sock_release+0x68/0x80 net/ipv4/udp_tunnel_core.c:202
>   rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
>   rxe_sock_put+0xae/0x130 drivers/infiniband/sw/rxe/rxe_net.c:639
>   rxe_net_del+0x83/0x120 drivers/infiniband/sw/rxe/rxe_net.c:660
>   rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
>   nldev_dellink+0x289/0x3c0 drivers/infiniband/core/nldev.c:1849
>   rdma_nl_rcv_msg+0x392/0x6f0 drivers/infiniband/core/netlink.c:195
>   rdma_nl_rcv_skb.constprop.0.isra.0+0x2cb/0x410 drivers/infiniband/core/netlink.c:239
>   netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
>   netlink_unicast+0x585/0x850 net/netlink/af_netlink.c:1344
>   netlink_sendmsg+0x8b0/0xda0 net/netlink/af_netlink.c:1894
>   sock_sendmsg_nosec net/socket.c:787 [inline]
>   __sock_sendmsg net/socket.c:802 [inline]
>   ____sys_sendmsg+0x9e1/0xb70 net/socket.c:2698
>   ___sys_sendmsg+0x190/0x1e0 net/socket.c:2752
>   __sys_sendmsg+0x170/0x220 net/socket.c:2784
>   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>   do_syscall_64+0x10b/0xf80 arch/x86/entry/syscall_64.c:94
>   entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f602db9cdd9
> Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f602d1fe028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 00007f602de16090 RCX: 00007f602db9cdd9
> RDX: 0000000000000000 RSI: 00002000000002c0 RDI: 0000000000000007
> RBP: 00007f602dc32d69 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007f602de16128 R14: 00007f602de16090 R15: 00007ffc1d89c428
>   </TASK>
> Modules linked in:
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:kernel_sock_shutdown+0x47/0x70 net/socket.c:3785
> Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 33 48 b8 00 00 00 00 00 fc ff df 4c 8b 63 20 49 8d 7c 24 68 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 1a 49 8b 44 24 68 89 ee 48 89 df 5b 5d 41 5c ff e0
> RSP: 0018:ffffc9000391f180 EFLAGS: 00010202
>
> RAX: dffffc0000000000 RBX: ffff88802a2a0040 RCX: ffffffff8b8b72bd
> RDX: 000000000000000d RSI: ffffffff89553b32 RDI: 0000000000000068
> RBP: 0000000000000002 R08: 0000000000000001 R09: fffff52000723dfc
> R10: ffffc9000391efe7 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff8880311b8000 R14: 0000000000000002 R15: 0000000000000018
> FS:  00007f602d1fe6c0(0000) GS:ffff8880d6675000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000561c522a6000 CR3: 000000002e99e000 CR4: 0000000000352ef0
> ----------------
> Code disassembly (best guess):
>     0:	fc                   	cld
>     1:	ff                   	lcall  (bad)
>     2:	df 48 89             	fisttps -0x77(%rax)
>     5:	fa                   	cli
>     6:	48 c1 ea 03          	shr    $0x3,%rdx
>     a:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1)
>     e:	75 33                	jne    0x43
>    10:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
>    17:	fc ff df
>    1a:	4c 8b 63 20          	mov    0x20(%rbx),%r12
>    1e:	49 8d 7c 24 68       	lea    0x68(%r12),%rdi
>    23:	48 89 fa             	mov    %rdi,%rdx
>    26:	48 c1 ea 03          	shr    $0x3,%rdx
> * 2a:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1) <-- trapping instruction
>    2e:	75 1a                	jne    0x4a
>    30:	49 8b 44 24 68       	mov    0x68(%r12),%rax
>    35:	89 ee                	mov    %ebp,%esi
>    37:	48 89 df             	mov    %rbx,%rdi
>    3a:	5b                   	pop    %rbx
>    3b:	5d                   	pop    %rbp
>    3c:	41 5c                	pop    %r12
>    3e:	ff e0                	jmp    *%rax
>
>
> ---
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.

-- 
Best Regards,
Yanjun.Zhu


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4)
  2026-05-06 14:28   ` Zhu Yanjun
@ 2026-05-06 15:19     ` Kuniyuki Iwashima
  0 siblings, 0 replies; 31+ messages in thread
From: Kuniyuki Iwashima @ 2026-05-06 15:19 UTC (permalink / raw)
  To: Zhu Yanjun
  Cc: syzbot, akpm, arjan, davem, dsahern, edumazet, horms, jgg, kuba,
	kuni1840, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzkaller-bugs, zyjzyj2000

On Wed, May 6, 2026 at 7:28 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
>
>
> On 2026/5/6 6:48, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
> >
> > HEAD commit:    74fe02ce122a Merge tag 'wq-for-7.1-rc2-fixes' of git://git..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16e895ce580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=59da38148f3a3d24
> > dashboard link: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> > compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13a613ba580000
> >
> > Downloadable assets:
> > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-74fe02ce.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/c0a591d96864/vmlinux-74fe02ce.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/9f94fb623cd1/bzImage-74fe02ce.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> >
> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000000d: 0000 [#1] SMP KASAN NOPTI
> > KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f]
>
> Thanks a lot. IIRC, a fix for this problem is already in progress. The link is
> https://patchwork.kernel.org/project/linux-rdma/patch/20260424013759.728288-1-kuniyu@google.com/
>
> Hi, Kuniyuki Iwashima
>
> I think you are fixing this problem. I hope that we can see your commit
> very soon.

Yes, I was sidetracked but will respin v3 this week.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4)
       [not found] <69ea344f.a00a0220.17a17.0040.GAE@google.com>
                   ` (2 preceding siblings ...)
  2026-05-06 13:48 ` [syzbot] [rdma] " syzbot
@ 2026-05-07  3:52 ` syzbot
  2026-05-07 12:50   ` [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink() Edward Adam Davis
  2026-05-14  5:15   ` [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4) Zhu Yanjun
  3 siblings, 2 replies; 31+ messages in thread
From: syzbot @ 2026-05-07  3:52 UTC (permalink / raw)
  To: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
	kuni1840, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzkaller-bugs, yanjun.zhu, zyjzyj2000

syzbot has found a reproducer for the following issue on:

HEAD commit:    735d2f48cada Add linux-next specific files for 20260506
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=14f0e56a580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a88880f0f312e277
dashboard link: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=125c9f6c580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=166580ec580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/e65b731bdb98/disk-735d2f48.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/60db2f3d3f2f/vmlinux-735d2f48.xz
kernel image: https://storage.googleapis.com/syzbot-assets/55da282f7ab4/bzImage-735d2f48.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com

rdma_rxe: rxe_newlink: failed to add lo
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
CPU: 1 UID: 0 PID: 5982 Comm: syz.3.20 Not tainted syzkaller #0 PREEMPT_{RT,(full)} 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
RIP: 0010:kernel_sock_shutdown+0x2a/0x70 net/socket.c:3803
Code: f3 0f 1e fa 41 57 41 56 41 54 53 89 f3 49 89 fe 49 bc 00 00 00 00 00 fc ff df e8 e1 25 c5 f8 4d 8d 7e 20 4c 89 f8 48 c1 e8 03 <42> 80 3c 20 00 74 08 4c 89 ff e8 27 bf 2e f9 4d 8b 3f 49 83 c7 68
RSP: 0018:ffffc900015ef090 EFLAGS: 00010202
RAX: 0000000000000004 RBX: 0000000000000002 RCX: ffff88802dd89ec0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: dffffc0000000000 R11: ffffed1007cc8979 R12: dffffc0000000000
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000020
FS:  000055556d432500(0000) GS:ffff888125dca000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b34563fff CR3: 0000000042b1c000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
 rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
 rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
 rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
 rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
 nldev_dellink+0x304/0x3d0 drivers/infiniband/core/nldev.c:1849
 rdma_nl_rcv_msg drivers/infiniband/core/netlink.c:-1 [inline]
 rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
 rdma_nl_rcv+0x6d7/0xa10 drivers/infiniband/core/netlink.c:259
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x780/0x920 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1895
 sock_sendmsg_nosec+0x112/0x150 net/socket.c:797
 __sock_sendmsg net/socket.c:812 [inline]
 ____sys_sendmsg+0x55c/0x870 net/socket.c:2716
 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2770
 __sys_sendmsg net/socket.c:2802 [inline]
 __do_sys_sendmsg net/socket.c:2807 [inline]
 __se_sys_sendmsg net/socket.c:2805 [inline]
 __x64_sys_sendmsg+0x1c3/0x2a0 net/socket.c:2805
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f89172fcdd9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe8bf8c018 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f8917575fa0 RCX: 00007f89172fcdd9
RDX: 0000000000000000 RSI: 00002000000002c0 RDI: 0000000000000006
RBP: 00007f8917392d69 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f8917575fac R14: 00007f8917575fa0 R15: 00007f8917575fa0
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:kernel_sock_shutdown+0x2a/0x70 net/socket.c:3803
Code: f3 0f 1e fa 41 57 41 56 41 54 53 89 f3 49 89 fe 49 bc 00 00 00 00 00 fc ff df e8 e1 25 c5 f8 4d 8d 7e 20 4c 89 f8 48 c1 e8 03 <42> 80 3c 20 00 74 08 4c 89 ff e8 27 bf 2e f9 4d 8b 3f 49 83 c7 68
RSP: 0018:ffffc900015ef090 EFLAGS: 00010202
RAX: 0000000000000004 RBX: 0000000000000002 RCX: ffff88802dd89ec0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: dffffc0000000000 R11: ffffed1007cc8979 R12: dffffc0000000000
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000020
FS:  000055556d432500(0000) GS:ffff888125dca000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000042b1c000 CR4: 00000000003526f0
----------------
Code disassembly (best guess):
   0:	f3 0f 1e fa          	endbr64
   4:	41 57                	push   %r15
   6:	41 56                	push   %r14
   8:	41 54                	push   %r12
   a:	53                   	push   %rbx
   b:	89 f3                	mov    %esi,%ebx
   d:	49 89 fe             	mov    %rdi,%r14
  10:	49 bc 00 00 00 00 00 	movabs $0xdffffc0000000000,%r12
  17:	fc ff df
  1a:	e8 e1 25 c5 f8       	call   0xf8c52600
  1f:	4d 8d 7e 20          	lea    0x20(%r14),%r15
  23:	4c 89 f8             	mov    %r15,%rax
  26:	48 c1 e8 03          	shr    $0x3,%rax
* 2a:	42 80 3c 20 00       	cmpb   $0x0,(%rax,%r12,1) <-- trapping instruction
  2f:	74 08                	je     0x39
  31:	4c 89 ff             	mov    %r15,%rdi
  34:	e8 27 bf 2e f9       	call   0xf92ebf60
  39:	4d 8b 3f             	mov    (%r15),%r15
  3c:	49 83 c7 68          	add    $0x68,%r15
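
The non-canonical address in the Oops line and the KASAN range can be tied together with a little arithmetic. The following is a minimal userspace sketch, not kernel code: the function names are made up for illustration, while the offset 0xdffffc0000000000 and the shift by 3 are taken from the movabs and shr instructions in the disassembly above. With R14 = 0 and R15 = 0x20 in the register dump, the shadow lookup for address 0x20 lands on 0xdffffc0000000004, which suggests the socket pointer itself was NULL in this report. The earlier report in this thread (0xdffffc000000000d, range 0x68-0x6f) follows the same arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Shadow offset as loaded by "movabs $0xdffffc0000000000,%r12" above. */
#define SHADOW_OFFSET 0xdffffc0000000000ULL
/* One shadow byte describes 8 bytes of memory ("shr $0x3,%rax"). */
#define SHADOW_SHIFT 3

/* Shadow address the instrumented code reads before touching addr. */
static uint64_t shadow_of(uint64_t addr)
{
	return (addr >> SHADOW_SHIFT) + SHADOW_OFFSET;
}

/* First byte of the 8-byte range described by a shadow address. */
static uint64_t range_start(uint64_t shadow)
{
	assert(shadow >= SHADOW_OFFSET);
	return (shadow - SHADOW_OFFSET) << SHADOW_SHIFT;
}

/* Last byte of the 8-byte range described by a shadow address. */
static uint64_t range_end(uint64_t shadow)
{
	return range_start(shadow) + 7;
}
```

Passing the faulting address from the Oops line to range_start() and range_end() gives the bounds that KASAN prints as the null-ptr-deref range.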


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
  2026-05-07  3:52 ` syzbot
@ 2026-05-07 12:50   ` Edward Adam Davis
  2026-05-07 13:25     ` Zhu Yanjun
  2026-05-13 18:17     ` Leon Romanovsky
  2026-05-14  5:15   ` [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4) Zhu Yanjun
  1 sibling, 2 replies; 31+ messages in thread
From: Edward Adam Davis @ 2026-05-07 12:50 UTC (permalink / raw)
  To: syzbot+d8f76778263ab65c2b21
  Cc: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
	kuni1840, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzkaller-bugs, yanjun.zhu, zyjzyj2000

We must serialize calls to nldev_dellink() or risk a crash as syzbot
reported:

KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
Call Trace:
 udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
 rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
 rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
 rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
 rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
 
Fixes: a60e3f3d6fba ("RDMA/nldev: Add dellink function pointer")
Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
Tested-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
 drivers/infiniband/core/nldev.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 96c745d5bac4..3cb3cb7629fe 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -1816,6 +1816,8 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	return err;
 }
 
+static DEFINE_MUTEX(nldev_dellink_mutex);
+
 static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			  struct netlink_ext_ack *extack)
 {
@@ -1846,7 +1848,9 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	 * implicitly scoped to the driver supporting dynamic link deletion like RXE.
 	 */
 	if (device->link_ops && device->link_ops->dellink) {
+		mutex_lock(&nldev_dellink_mutex);
 		err = device->link_ops->dellink(device);
+		mutex_unlock(&nldev_dellink_mutex);
 		if (err)
 			return err;
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
  2026-05-07 12:50   ` [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink() Edward Adam Davis
@ 2026-05-07 13:25     ` Zhu Yanjun
  2026-05-07 13:40       ` Edward Adam Davis
  2026-05-13 18:17     ` Leon Romanovsky
  1 sibling, 1 reply; 31+ messages in thread
From: Zhu Yanjun @ 2026-05-07 13:25 UTC (permalink / raw)
  To: Edward Adam Davis, syzbot+d8f76778263ab65c2b21,
	yanjun.zhu@linux.dev
  Cc: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
	kuni1840, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzkaller-bugs, zyjzyj2000


On 2026/5/7 5:50, Edward Adam Davis wrote:
> We must serialize calls to nldev_dellink() or risk a crash as syzbot
> reported:
>
> KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
> Call Trace:
>   udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
>   rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
>   rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
>   rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
>   rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
>   
> Fixes: a60e3f3d6fba ("RDMA/nldev: Add dellink function pointer")
> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> Tested-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> Signed-off-by: Edward Adam Davis <eadavis@qq.com>

Thanks a lot. This looks like a good solution. Since the issue is
reproducible, have you sent this commit to syzbot for verification?

Thanks,

Zhu Yanjun

> ---
>   drivers/infiniband/core/nldev.c | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index 96c745d5bac4..3cb3cb7629fe 100644
> --- a/drivers/infiniband/core/nldev.c
> +++ b/drivers/infiniband/core/nldev.c
> @@ -1816,6 +1816,8 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   	return err;
>   }
>   
> +static DEFINE_MUTEX(nldev_dellink_mutex);
> +
>   static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   			  struct netlink_ext_ack *extack)
>   {
> @@ -1846,7 +1848,9 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   	 * implicitly scoped to the driver supporting dynamic link deletion like RXE.
>   	 */
>   	if (device->link_ops && device->link_ops->dellink) {
> +		mutex_lock(&nldev_dellink_mutex);
>   		err = device->link_ops->dellink(device);
> +		mutex_unlock(&nldev_dellink_mutex);
>   		if (err)
>   			return err;
>   	}

-- 
Best Regards,
Yanjun.Zhu


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
  2026-05-07 13:25     ` Zhu Yanjun
@ 2026-05-07 13:40       ` Edward Adam Davis
  2026-05-07 14:11         ` Zhu Yanjun
  0 siblings, 1 reply; 31+ messages in thread
From: Edward Adam Davis @ 2026-05-07 13:40 UTC (permalink / raw)
  To: yanjun.zhu
  Cc: akpm, arjan, davem, dsahern, eadavis, edumazet, hdanton, horms,
	jgg, kuba, kuni1840, kuniyu, leon, linux-kernel, linux-rdma,
	netdev, pabeni, syzbot+d8f76778263ab65c2b21, syzkaller-bugs,
	zyjzyj2000

On Thu, 7 May 2026 06:25:54 -0700, Zhu Yanjun wrote:
> > We must serialize calls to nldev_dellink() or risk a crash as syzbot
> > reported:
> >
> > KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
> > Call Trace:
> >   udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> >   rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> >   rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
> >   rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
> >   rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> >
> > Fixes: a60e3f3d6fba ("RDMA/nldev: Add dellink function pointer")
> > Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> > Tested-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> > Signed-off-by: Edward Adam Davis <eadavis@qq.com>
> 
> Thanks a lot. This looks like a good solution. Since the issue is
> reproducible, have you sent this commit to syzbot for verification?
The patch has been verified by syzbot.

BR,
Edward


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
  2026-05-07 13:40       ` Edward Adam Davis
@ 2026-05-07 14:11         ` Zhu Yanjun
  0 siblings, 0 replies; 31+ messages in thread
From: Zhu Yanjun @ 2026-05-07 14:11 UTC (permalink / raw)
  To: Edward Adam Davis, yanjun.zhu@linux.dev
  Cc: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
	kuni1840, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzbot+d8f76778263ab65c2b21, syzkaller-bugs, zyjzyj2000


On 2026/5/7 6:40, Edward Adam Davis wrote:
> On Thu, 7 May 2026 06:25:54 -0700, Zhu Yanjun wrote:
>>> We must serialize calls to nldev_dellink() or risk a crash as syzbot
>>> reported:
>>>
>>> KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
>>> Call Trace:
>>>    udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
>>>    rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
>>>    rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
>>>    rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
>>>    rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
>>>
>>> Fixes: a60e3f3d6fba ("RDMA/nldev: Add dellink function pointer")
>>> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
>>> Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
>>> Tested-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
>>> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
>> Thanks a lot. This looks like a good solution. Since the issue is
>> reproducible, have you sent this commit to syzbot for verification?
> The patch has been verified by syzbot.

Thanks a lot.

Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Zhu Yanjun

>
> BR,
> Edward
>
-- 
Best Regards,
Yanjun.Zhu


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
  2026-05-07 12:50   ` [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink() Edward Adam Davis
  2026-05-07 13:25     ` Zhu Yanjun
@ 2026-05-13 18:17     ` Leon Romanovsky
  2026-05-13 23:46       ` Jason Gunthorpe
  1 sibling, 1 reply; 31+ messages in thread
From: Leon Romanovsky @ 2026-05-13 18:17 UTC (permalink / raw)
  To: syzbot+d8f76778263ab65c2b21, Edward Adam Davis
  Cc: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
	kuniyu, linux-kernel, linux-rdma, netdev, pabeni, syzkaller-bugs,
	yanjun.zhu, zyjzyj2000, Kuniyuki Iwashima


On Thu, 07 May 2026 20:50:10 +0800, Edward Adam Davis wrote:
> We must serialize calls to nldev_dellink() or risk a crash as syzbot
> reported:
> 
> Call Trace:
>  udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
>  rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
>  rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
>  rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
>  rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> 
> [...]

Applied, thanks!

[1/1] RDMA/nldev: add mutual exclusion in nldev_dellink()
      https://git.kernel.org/rdma/rdma/c/0b28000b64f40d

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
  2026-05-13 18:17     ` Leon Romanovsky
@ 2026-05-13 23:46       ` Jason Gunthorpe
  2026-05-14  7:31         ` Edward Adam Davis
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2026-05-13 23:46 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: syzbot+d8f76778263ab65c2b21, Edward Adam Davis, akpm, arjan,
	davem, dsahern, edumazet, hdanton, horms, kuba, kuniyu,
	linux-kernel, linux-rdma, netdev, pabeni, syzkaller-bugs,
	yanjun.zhu, zyjzyj2000

On Wed, May 13, 2026 at 02:17:28PM -0400, Leon Romanovsky wrote:
> 
> On Thu, 07 May 2026 20:50:10 +0800, Edward Adam Davis wrote:
> > We must serialize calls to nldev_dellink() or risk a crash as syzbot
> > reported:
> > 
> > Call Trace:
> >  udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> >  rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> >  rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
> >  rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
> >  rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> > 
> > [...]
> 
> Applied, thanks!
> 
> [1/1] RDMA/nldev: add mutual exclusion in nldev_dellink()
>       https://git.kernel.org/rdma/rdma/c/0b28000b64f40d

This seems like a rxe bug, I would have expected the lock to be inside
rxe to protect its racy implementation of rxe_net_del(), which looks
like it is possibly also triggered by NETDEV_UNREGISTER...

ie it should not change nldev_dellink().

Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4)
  2026-05-07  3:52 ` syzbot
  2026-05-07 12:50   ` [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink() Edward Adam Davis
@ 2026-05-14  5:15   ` Zhu Yanjun
  2026-05-16  5:44     ` Zhu Yanjun
  1 sibling, 1 reply; 31+ messages in thread
From: Zhu Yanjun @ 2026-05-14  5:15 UTC (permalink / raw)
  To: syzbot, akpm, arjan, davem, dsahern, edumazet, hdanton, horms,
	jgg, kuba, kuni1840, kuniyu, leon, linux-kernel, linux-rdma,
	netdev, pabeni, syzkaller-bugs, zyjzyj2000

syz test: https://github.com/zhuyj/linux null-ptr-deref_kernel_sock_shutdown


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
  2026-05-13 23:46       ` Jason Gunthorpe
@ 2026-05-14  7:31         ` Edward Adam Davis
  2026-05-14 11:50           ` Jason Gunthorpe
  0 siblings, 1 reply; 31+ messages in thread
From: Edward Adam Davis @ 2026-05-14  7:31 UTC (permalink / raw)
  To: jgg
  Cc: akpm, arjan, davem, dsahern, eadavis, edumazet, hdanton, horms,
	kuba, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzbot+d8f76778263ab65c2b21, syzkaller-bugs, yanjun.zhu,
	zyjzyj2000

On Wed, 13 May 2026 20:46:55 -0300, Jason Gunthorpe wrote:
> On Wed, May 13, 2026 at 02:17:28PM -0400, Leon Romanovsky wrote:
> >
> > On Thu, 07 May 2026 20:50:10 +0800, Edward Adam Davis wrote:
> > > We must serialize calls to nldev_dellink() or risk a crash as syzbot
> > > reported:
> > >
> > > Call Trace:
> > >  udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> > >  rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> > >  rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
> > >  rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
> > >  rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> > >
> > > [...]
> >
> > Applied, thanks!
> >
> > [1/1] RDMA/nldev: add mutual exclusion in nldev_dellink()
> >       https://git.kernel.org/rdma/rdma/c/0b28000b64f40d
> 
> This seems like a rxe bug, I would have expected the lock to be inside
> rxe to protect its racy implementation of rxe_net_del(), which looks
> like it is possibly also triggered by NETDEV_UNREGISTER...
No, it was triggered by RDMA_NLDEV_CMD_DELLINK; you can see that in the call trace.
> 
> ie it should not change nldev_dellink().
While this could be fixed within RXE, the same issue would affect any other
RXE-like submodule that later supports the "dellink" interface. Handling it
within nldev_dellink() is therefore more appropriate.

Edward


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
  2026-05-14  7:31         ` Edward Adam Davis
@ 2026-05-14 11:50           ` Jason Gunthorpe
  2026-05-14 13:58             ` David Ahern
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2026-05-14 11:50 UTC (permalink / raw)
  To: Edward Adam Davis
  Cc: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, kuba,
	kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzbot+d8f76778263ab65c2b21, syzkaller-bugs, yanjun.zhu,
	zyjzyj2000

On Thu, May 14, 2026 at 03:31:22PM +0800, Edward Adam Davis wrote:
> On Wed, 13 May 2026 20:46:55 -0300, Jason Gunthorpe wrote:
> > On Wed, May 13, 2026 at 02:17:28PM -0400, Leon Romanovsky wrote:
> > >
> > > On Thu, 07 May 2026 20:50:10 +0800, Edward Adam Davis wrote:
> > > > We must serialize calls to nldev_dellink() or risk a crash as syzbot
> > > > reported:
> > > >
> > > > Call Trace:
> > > >  udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> > > >  rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> > > >  rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
> > > >  rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
> > > >  rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> > > >
> > > > [...]
> > >
> > > Applied, thanks!
> > >
> > > [1/1] RDMA/nldev: add mutual exclusion in nldev_dellink()
> > >       https://git.kernel.org/rdma/rdma/c/0b28000b64f40d
> > 
> > This seems like a rxe bug, I would have expected the lock to be inside
> > rxe to protect its racy implementation of rxe_net_del(), which looks
> > like it is possibly also triggered by NETDEV_UNREGISTER...
> No, it was triggered by RDMA_NLDEV_CMD_DELLINK, you can see the "call trace".
> > 
> > ie it should not change nldev_dellink().
> While this could be fixed within RXE, the same issue affects all other
> RXE-like submodules when they subsequently support the "dellink" interface,
> therefore, handling this within nldev_dellink() is relatively more appropriate.

Why would other modules have an issue? The problem is rxe's racy
refcounting scheme for its lazy socket creation. There is nothing
wrong with nldev, and now you've created some nasty BKL in the nldev
code to fix rxe while ignoring its other races.

Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
  2026-05-14 11:50           ` Jason Gunthorpe
@ 2026-05-14 13:58             ` David Ahern
  2026-05-14 14:14               ` Jason Gunthorpe
  0 siblings, 1 reply; 31+ messages in thread
From: David Ahern @ 2026-05-14 13:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Edward Adam Davis
  Cc: akpm, arjan, davem, edumazet, hdanton, horms, kuba, kuniyu, leon,
	linux-kernel, linux-rdma, netdev, pabeni,
	syzbot+d8f76778263ab65c2b21, syzkaller-bugs, yanjun.zhu,
	zyjzyj2000

On 5/14/26 5:50 AM, Jason Gunthorpe wrote:
> On Thu, May 14, 2026 at 03:31:22PM +0800, Edward Adam Davis wrote:
>> On Wed, 13 May 2026 20:46:55 -0300, Jason Gunthorpe wrote:
>>> On Wed, May 13, 2026 at 02:17:28PM -0400, Leon Romanovsky wrote:
>>>>
>>>> On Thu, 07 May 2026 20:50:10 +0800, Edward Adam Davis wrote:
>>>>> We must serialize calls to nldev_dellink() or risk a crash as syzbot
>>>>> reported:
>>>>>
>>>>> Call Trace:
>>>>>  udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
>>>>>  rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
>>>>>  rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
>>>>>  rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
>>>>>  rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
>>>>>
>>>>> [...]
>>>>
>>>> Applied, thanks!
>>>>
>>>> [1/1] RDMA/nldev: add mutual exclusion in nldev_dellink()
>>>>       https://git.kernel.org/rdma/rdma/c/0b28000b64f40d
>>>
>>> This seems like a rxe bug, I would have expected the lock to be inside
>>> rxe to protect its racy implementation of rxe_net_del(), which looks
>>> like it is possibly also triggered by NETDEV_UNREGISTER...
>> No, it was triggered by RDMA_NLDEV_CMD_DELLINK, you can see the "call trace".

That is not Jason's point. Code-wise:

rxe_dellink -> rxe_net_del

netdev NETDEV_UNREGISTER:
 rxe_notify -> rxe_net_del

both can lead to the same problem
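
The two call paths above can be modelled in userspace to show why a guard at the nldev level only covers one of them. This is a minimal sketch under stated assumptions: every name below (fake_sock, fake_pernet, fake_net_del and so on) is hypothetical and stands in for the rxe structures only loosely, it ignores rxe's socket refcounting entirely, and it uses a C11 atomic exchange rather than the lock discussed in this thread, purely to keep the example dependency-free. It is not the fix that was applied or proposed here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a UDP tunnel socket; not a kernel type. */
struct fake_sock {
	int release_count;	/* how many times the socket was released */
};

/* Hypothetical stand-in for the per-namespace state holding the socket. */
struct fake_pernet {
	_Atomic(struct fake_sock *) sk4;
};

/* Models the release step; a second call on the same socket is the bug. */
static void fake_sock_release(struct fake_sock *sk)
{
	assert(sk->release_count == 0);
	sk->release_count++;
}

/*
 * Shared teardown. The pointer is claimed atomically, so whichever
 * caller gets here first owns the release and the other sees NULL.
 * Returns 1 if this call released the socket, 0 otherwise.
 */
static int fake_net_del(struct fake_pernet *ns)
{
	struct fake_sock *sk =
		atomic_exchange(&ns->sk4, (struct fake_sock *)NULL);

	if (!sk)
		return 0;
	fake_sock_release(sk);
	return 1;
}

/* Path 1: link deletion requested over netlink. */
static int fake_dellink(struct fake_pernet *ns)
{
	return fake_net_del(ns);
}

/* Path 2: netdev notifier reacting to an unregister event. */
static int fake_notify_unregister(struct fake_pernet *ns)
{
	return fake_net_del(ns);
}
```

Because the guard sits inside the shared teardown function, both entry points are covered without either caller needing to know about the other.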

>>>
>>> ie it should not change nldev_dellink().
>> While this could be fixed within RXE, the same issue affects all other
>> RXE-like submodules when they subsequently support the "dellink" interface,
>> therefore, handling this within nldev_dellink() is relatively more appropriate.
> 
> Why would other modules have an issue? The problem is rxe's racy
> refcounting scheme for its lazy socket creation. There is nothing
> wrong with nldev, and now you've created some nasty BKL in the nldev
> code to fix rxe while ignoring its other races.

+1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
  2026-05-14 13:58             ` David Ahern
@ 2026-05-14 14:14               ` Jason Gunthorpe
  2026-05-14 14:26                 ` David Ahern
  2026-05-16 12:40                 ` Edward Adam Davis
  0 siblings, 2 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2026-05-14 14:14 UTC (permalink / raw)
  To: David Ahern
  Cc: Edward Adam Davis, akpm, arjan, davem, edumazet, hdanton, horms,
	kuba, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzbot+d8f76778263ab65c2b21, syzkaller-bugs, yanjun.zhu,
	zyjzyj2000

On Thu, May 14, 2026 at 07:58:18AM -0600, David Ahern wrote:
> On 5/14/26 5:50 AM, Jason Gunthorpe wrote:
> > On Thu, May 14, 2026 at 03:31:22PM +0800, Edward Adam Davis wrote:
> >> On Wed, 13 May 2026 20:46:55 -0300, Jason Gunthorpe wrote:
> >>> On Wed, May 13, 2026 at 02:17:28PM -0400, Leon Romanovsky wrote:
> >>>>
> >>>> On Thu, 07 May 2026 20:50:10 +0800, Edward Adam Davis wrote:
> >>>>> We must serialize calls to nldev_dellink() or risk a crash as syzbot
> >>>>> reported:
> >>>>>
> >>>>> Call Trace:
> >>>>>  udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> >>>>>  rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> >>>>>  rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
> >>>>>  rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
> >>>>>  rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> >>>>>
> >>>>> [...]
> >>>>
> >>>> Applied, thanks!
> >>>>
> >>>> [1/1] RDMA/nldev: add mutual exclusion in nldev_dellink()
> >>>>       https://git.kernel.org/rdma/rdma/c/0b28000b64f40d
> >>>
> >>> This seems like a rxe bug, I would have expected the lock to be inside
> >>> rxe to protect its racy implementation of rxe_net_del(), which looks
> >>> like it is possibly also triggered by NETDEV_UNREGISTER...
> >> No, it was triggered by RDMA_NLDEV_CMD_DELLINK, you can see the "call trace".
> 
> That is not Jason's point. Code-wise:
> 
> rxe_dellink -> rxe_net_del
> 
> netdev NETDEV_UNREGISTER:
>  rxe_notify -> rxe_net_del
> 
> both can lead to the same problem
> 
> >>>
> >>> ie it should not change nldev_dellink().
> >> While this could be fixed within RXE, the same issue affects all other
> >> RXE-like submodules when they subsequently support the "dellink" interface,
> >> therefore, handling this within nldev_dellink() is relatively more appropriate.
> > 
> > Why would other modules have an issue? The problem is rxe's racy
> > refcounting scheme for its lazy socket creation. There is nothing
> > wrong with nldev, and now you've created some nasty BKL in the nldev
> > code to fix rxe while ignoring its other races.
> 
> +1

Edward, please come up with a fixup on top of this, since it was already
applied.

Jason
 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
  2026-05-14 14:14               ` Jason Gunthorpe
@ 2026-05-14 14:26                 ` David Ahern
  2026-05-14 15:46                   ` Zhu Yanjun
  2026-05-16 12:40                 ` Edward Adam Davis
  1 sibling, 1 reply; 31+ messages in thread
From: David Ahern @ 2026-05-14 14:26 UTC (permalink / raw)
  To: Jason Gunthorpe, Zhu Yanjun
  Cc: Edward Adam Davis, akpm, arjan, davem, edumazet, hdanton, horms,
	kuba, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzbot+d8f76778263ab65c2b21, syzkaller-bugs, yanjun.zhu,
	zyjzyj2000

On 5/14/26 8:14 AM, Jason Gunthorpe wrote:
> 
> Edward, please come up with a fixup on top of this, since it was already
> applied.
> 

Zhu Yanjun: As author of the patch that introduced the bug and
maintainer of the rxe code, why have you not addressed this problem? It
has been well known for many weeks now and multiple people have
attempted fixes. Seems like you need to step up and take care of it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
  2026-05-14 14:26                 ` David Ahern
@ 2026-05-14 15:46                   ` Zhu Yanjun
  0 siblings, 0 replies; 31+ messages in thread
From: Zhu Yanjun @ 2026-05-14 15:46 UTC (permalink / raw)
  To: David Ahern, Jason Gunthorpe, Zhu Yanjun
  Cc: Edward Adam Davis, akpm, arjan, davem, edumazet, hdanton, horms,
	kuba, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzbot+d8f76778263ab65c2b21, syzkaller-bugs, zyjzyj2000


On 2026/5/14 7:26, David Ahern wrote:
> On 5/14/26 8:14 AM, Jason Gunthorpe wrote:
>> Edward, please come with a fixup on top of this since it was already
>> applied
>>
> Zhu Yanjun: As author of the patch that introduced the bug and
> maintainer of the rxe code, why have you not addressed this problem? It
> has been well known for many weeks now and multiple people have
> attempted fixes. Seems like you need to step up and take care of it.

I am aware of the issue and have been following the discussion and
proposed fixes.

I did not want to rush a change without fully understanding the
implications for RXE behavior and existing users. I am currently
reviewing the proposed approaches and working on a proper fix.

I appreciate everyone who helped investigate and test the issue.

Zhu Yanjun


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4)
  2026-05-14  5:15   ` [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4) Zhu Yanjun
@ 2026-05-16  5:44     ` Zhu Yanjun
  2026-05-16  7:02       ` syzbot
  0 siblings, 1 reply; 31+ messages in thread
From: Zhu Yanjun @ 2026-05-16  5:44 UTC (permalink / raw)
  To: syzbot, akpm, arjan, davem, dsahern, edumazet, hdanton, horms,
	jgg, kuba, kuni1840, kuniyu, leon, linux-kernel, linux-rdma,
	netdev, pabeni, syzkaller-bugs, zyjzyj2000, yanjun.zhu@linux.dev

#syz test: https://github.com/zhuyj/linux null-ptr-deref_kernel_sock_shutdown

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4)
  2026-05-16  5:44     ` Zhu Yanjun
@ 2026-05-16  7:02       ` syzbot
  2026-05-16 18:40         ` Zhu Yanjun
  0 siblings, 1 reply; 31+ messages in thread
From: syzbot @ 2026-05-16  7:02 UTC (permalink / raw)
  To: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
	kuni1840, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzkaller-bugs, yanjun.zhu, zyjzyj2000

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
lost connection to test machine



Tested on:

commit:         bb9ed28b RDMA/rxe: Fix null-ptr-deref in kernel_sock_s..
git tree:       https://github.com/zhuyj/linux null-ptr-deref_kernel_sock_shutdown
console output: https://syzkaller.appspot.com/x/log.txt?x=16617cc8580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=eeba87c808be946b
dashboard link: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8

Note: no patches were applied.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
  2026-05-14 14:14               ` Jason Gunthorpe
  2026-05-14 14:26                 ` David Ahern
@ 2026-05-16 12:40                 ` Edward Adam Davis
  2026-05-16 14:00                   ` [PATCH RDMA v2] RDMA/rxe: add mutual exclusion in rxe_net_del() Edward Adam Davis
  1 sibling, 1 reply; 31+ messages in thread
From: Edward Adam Davis @ 2026-05-16 12:40 UTC (permalink / raw)
  To: jgg
  Cc: akpm, arjan, davem, dsahern, eadavis, edumazet, hdanton, horms,
	kuba, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzbot+d8f76778263ab65c2b21, syzkaller-bugs, yanjun.zhu,
	zyjzyj2000

On Thu, 14 May 2026 11:14:09 -0300, Jason Gunthorpe wrote:
> On Thu, May 14, 2026 at 07:58:18AM -0600, David Ahern wrote:
> > On 5/14/26 5:50 AM, Jason Gunthorpe wrote:
> > > On Thu, May 14, 2026 at 03:31:22PM +0800, Edward Adam Davis wrote:
> > >> On Wed, 13 May 2026 20:46:55 -0300, Jason Gunthorpe wrote:
> > >>> On Wed, May 13, 2026 at 02:17:28PM -0400, Leon Romanovsky wrote:
> > >>>>
> > >>>> On Thu, 07 May 2026 20:50:10 +0800, Edward Adam Davis wrote:
> > >>>>> We must serialize calls to nldev_dellink() or risk a crash as syzbot
> > >>>>> reported:
> > >>>>>
> > >>>>> Call Trace:
> > >>>>>  udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> > >>>>>  rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> > >>>>>  rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
> > >>>>>  rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
> > >>>>>  rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> > >>>>>
> > >>>>> [...]
> > >>>>
> > >>>> Applied, thanks!
> > >>>>
> > >>>> [1/1] RDMA/nldev: add mutual exclusion in nldev_dellink()
> > >>>>       https://git.kernel.org/rdma/rdma/c/0b28000b64f40d
> > >>>
> > >>> This seems like a rxe bug, I would have expected the lock to be inside
> > >>> rxe to protect its racy implementation of rxe_net_del(), which looks
> > >>> like it is possibly also triggered by NETDEV_UNREGISTER...
> > >> No, it was triggered by RDMA_NLDEV_CMD_DELLINK, you can see the "call trace".
> >
> > That is not Jason's point. Code-wise:
> >
> > rxe_dellink -> rxe_net_del
> >
> > netdev NETDEV_UNREGISTER:
> >  rxe_notify -> rxe_net_del
> >
> > both can lead to the same problem
> >
> > >>>
> > >>> ie it should not change nldev_dellink().
> > >> While this could be fixed within RXE, the same issue affects all other
> > >> RXE-like submodules when they subsequently support the "dellink" interface,
> > >> therefore, handling this within nldev_dellink() is relatively more appropriate.
> > >
> > > Why would other modules have an issue? The problem is rxe's racy
> > > refcounting scheme for its lazy socket creation. There is nothing
> > > wrong with nldev, and now you've created some nasty BKL in the nldev
> > > code to fix rxe while ignoring its other races.
> >
> > +1
> 
> Edward, please come with a fixup on top of this since it was already
> applied
OK.

Edward


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH RDMA v2] RDMA/rxe: add mutual exclusion in rxe_net_del()
  2026-05-16 12:40                 ` Edward Adam Davis
@ 2026-05-16 14:00                   ` Edward Adam Davis
  2026-05-16 14:31                     ` Zhu Yanjun
  2026-05-16 23:40                     ` Yanjun.Zhu
  0 siblings, 2 replies; 31+ messages in thread
From: Edward Adam Davis @ 2026-05-16 14:00 UTC (permalink / raw)
  To: eadavis
  Cc: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
	kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzbot+d8f76778263ab65c2b21, syzkaller-bugs, yanjun.zhu,
	zyjzyj2000

We must serialize calls to rxe_net_del() or risk a crash as syzbot
reported:

KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
Call Trace:
 udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
 rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
 rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
 rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
 rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254

Jason Gunthorpe suggested placing the lock within rxe to protect its racy
implementation of rxe_net_del(), which looks like it is possibly also
triggered by NETDEV_UNREGISTER.

The patch addressing this issue in nldev_dellink() has already been
applied (0b28000b64f4); however, since the fix has now been relocated
to rxe, the corresponding remedial code in nldev has been removed.

Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
Fixes: 0b28000b64f4 ("RDMA/nldev: Add mutual exclusion in nldev_dellink()")
Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
v1 -> v2: serialize calls to rxe net del

 drivers/infiniband/core/nldev.c     | 4 ----
 drivers/infiniband/sw/rxe/rxe_net.c | 7 ++++++-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 3cb3cb7629fe..96c745d5bac4 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -1816,8 +1816,6 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	return err;
 }
 
-static DEFINE_MUTEX(nldev_dellink_mutex);
-
 static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			  struct netlink_ext_ack *extack)
 {
@@ -1848,9 +1846,7 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	 * implicitly scoped to the driver supporting dynamic link deletion like RXE.
 	 */
 	if (device->link_ops && device->link_ops->dellink) {
-		mutex_lock(&nldev_dellink_mutex);
 		err = device->link_ops->dellink(device);
-		mutex_unlock(&nldev_dellink_mutex);
 		if (err)
 			return err;
 	}
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 50a2cb5405e2..92847e955ca2 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -642,6 +642,8 @@ static void rxe_sock_put(struct sock *sk,
 	}
 }
 
+static DEFINE_MUTEX(rxe_net_del_mutex);
+
 void rxe_net_del(struct ib_device *dev)
 {
 	struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
@@ -649,9 +651,10 @@ void rxe_net_del(struct ib_device *dev)
 	struct sock *sk;
 	struct net *net;
 
+	mutex_lock(&rxe_net_del_mutex);
 	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
 	if (!ndev)
-		return;
+		goto out;
 
 	net = dev_net(ndev);
 
@@ -664,6 +667,8 @@ void rxe_net_del(struct ib_device *dev)
 		rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
 
 	dev_put(ndev);
+out:
+	mutex_unlock(&rxe_net_del_mutex);
 }
 
 static void rxe_port_event(struct rxe_dev *rxe,
-- 
2.43.0
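
The serialization idea in the patch above can be illustrated outside the
kernel. The following is a minimal userspace sketch, not the rxe code: the
names fake_sock, pernet_sk, teardown_mutex and teardown() are hypothetical
stand-ins, and a pthread mutex plays the role of the kernel mutex. It assumes
the goal is that the read of the shared socket pointer, the clearing of that
pointer, and the release all happen inside one critical section, so a second
caller finds NULL instead of releasing the same socket again.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-namespace tunnel socket. */
struct fake_sock {
	int release_count;	/* how many times the release routine ran */
};

/* One dedicated lock that covers every access to pernet_sk. */
static pthread_mutex_t teardown_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Shared slot, loosely modelled on a per-net sk4/sk6 pointer. */
static struct fake_sock *pernet_sk;

static void fake_sock_release(struct fake_sock *sk)
{
	sk->release_count++;
}

/*
 * Tear the socket down exactly once.
 * Returns 1 if this caller performed the release, 0 if the slot was
 * already empty because another caller got there first.
 */
static int teardown(void)
{
	struct fake_sock *sk;
	int released = 0;

	pthread_mutex_lock(&teardown_mutex);
	sk = pernet_sk;			/* read ...              */
	if (sk) {
		pernet_sk = NULL;	/* ... clear ...         */
		fake_sock_release(sk);	/* ... and release, all  */
		released = 1;		/* under the same lock   */
	}
	pthread_mutex_unlock(&teardown_mutex);

	return released;
}
```

Calling teardown() twice on the same slot releases the socket once; the
second call sees NULL and returns 0. Without the lock, two callers could both
read a non-NULL pointer before either cleared it, which is the shape of the
double release discussed in this thread.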


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH RDMA v2] RDMA/rxe: add mutual exclusion in rxe_net_del()
  2026-05-16 14:00                   ` [PATCH RDMA v2] RDMA/rxe: add mutual exclusion in rxe_net_del() Edward Adam Davis
@ 2026-05-16 14:31                     ` Zhu Yanjun
  2026-05-16 23:40                     ` Yanjun.Zhu
  1 sibling, 0 replies; 31+ messages in thread
From: Zhu Yanjun @ 2026-05-16 14:31 UTC (permalink / raw)
  To: Edward Adam Davis, yanjun.zhu@linux.dev
  Cc: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
	kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzbot+d8f76778263ab65c2b21, syzkaller-bugs, zyjzyj2000

On 2026/5/16 7:00, Edward Adam Davis wrote:
> We must serialize calls to rxe_net_del() or risk a crash as syzbot
> reported:
> 
> KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
> Call Trace:
>   udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
>   rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
>   rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
>   rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
>   rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> 
> Jason Gunthorpe suggested placing the lock within rxe to protect its racy
> implementation of rxe_net_del(), which looks like it is possibly also
> triggered by NETDEV_UNREGISTER.
> 
> The patch addressing this issue in nldev_dellink() has already been
> applied (0b28000b64f4); however, since the fix has now been relocated
> to rxe, the corresponding remedial code in nldev has been removed.
> 
> Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
> Fixes: 0b28000b64f4 ("RDMA/nldev: Add mutual exclusion in nldev_dellink()")
> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
> ---
> v1 -> v2: serialize calls to rxe net del

I looked through the commit. I am not sure whether it should be sent to
syzbot for verification.

Zhu Yanjun

> 
>   drivers/infiniband/core/nldev.c     | 4 ----
>   drivers/infiniband/sw/rxe/rxe_net.c | 7 ++++++-
>   2 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index 3cb3cb7629fe..96c745d5bac4 100644
> --- a/drivers/infiniband/core/nldev.c
> +++ b/drivers/infiniband/core/nldev.c
> @@ -1816,8 +1816,6 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   	return err;
>   }
>   
> -static DEFINE_MUTEX(nldev_dellink_mutex);
> -
>   static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   			  struct netlink_ext_ack *extack)
>   {
> @@ -1848,9 +1846,7 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   	 * implicitly scoped to the driver supporting dynamic link deletion like RXE.
>   	 */
>   	if (device->link_ops && device->link_ops->dellink) {
> -		mutex_lock(&nldev_dellink_mutex);
>   		err = device->link_ops->dellink(device);
> -		mutex_unlock(&nldev_dellink_mutex);
>   		if (err)
>   			return err;
>   	}
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 50a2cb5405e2..92847e955ca2 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -642,6 +642,8 @@ static void rxe_sock_put(struct sock *sk,
>   	}
>   }
>   
> +static DEFINE_MUTEX(rxe_net_del_mutex);
> +
>   void rxe_net_del(struct ib_device *dev)
>   {
>   	struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
> @@ -649,9 +651,10 @@ void rxe_net_del(struct ib_device *dev)
>   	struct sock *sk;
>   	struct net *net;
>   
> +	mutex_lock(&rxe_net_del_mutex);
>   	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
>   	if (!ndev)
> -		return;
> +		goto out;
>   
>   	net = dev_net(ndev);
>   
> @@ -664,6 +667,8 @@ void rxe_net_del(struct ib_device *dev)
>   		rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
>   
>   	dev_put(ndev);
> +out:
> +	mutex_unlock(&rxe_net_del_mutex);
>   }
>   
>   static void rxe_port_event(struct rxe_dev *rxe,


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4)
  2026-05-16  7:02       ` syzbot
@ 2026-05-16 18:40         ` Zhu Yanjun
  0 siblings, 0 replies; 31+ messages in thread
From: Zhu Yanjun @ 2026-05-16 18:40 UTC (permalink / raw)
  To: syzbot, akpm, arjan, davem, dsahern, edumazet, hdanton, horms,
	jgg, kuba, kuni1840, kuniyu, leon, linux-kernel, linux-rdma,
	netdev, pabeni, syzkaller-bugs, zyjzyj2000, yanjun.zhu@linux.dev

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 3cb3cb7629fe..96c745d5bac4 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -1816,8 +1816,6 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
  	return err;
  }

-static DEFINE_MUTEX(nldev_dellink_mutex);
-
  static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
  			  struct netlink_ext_ack *extack)
  {
@@ -1848,9 +1846,7 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
  	 * implicitly scoped to the driver supporting dynamic link deletion like RXE.
  	 */
  	if (device->link_ops && device->link_ops->dellink) {
-		mutex_lock(&nldev_dellink_mutex);
  		err = device->link_ops->dellink(device);
-		mutex_unlock(&nldev_dellink_mutex);
  		if (err)
  			return err;
  	}
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 50a2cb5405e2..92847e955ca2 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -642,6 +642,8 @@ static void rxe_sock_put(struct sock *sk,
  	}
  }

+static DEFINE_MUTEX(rxe_net_del_mutex);
+
  void rxe_net_del(struct ib_device *dev)
  {
  	struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
@@ -649,9 +651,10 @@ void rxe_net_del(struct ib_device *dev)
  	struct sock *sk;
  	struct net *net;

+	mutex_lock(&rxe_net_del_mutex);
  	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
  	if (!ndev)
-		return;
+		goto out;

  	net = dev_net(ndev);

@@ -664,6 +667,8 @@ void rxe_net_del(struct ib_device *dev)
  		rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);

  	dev_put(ndev);
+out:
+	mutex_unlock(&rxe_net_del_mutex);
  }

  static void rxe_port_event(struct rxe_dev *rxe,

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH RDMA v2] RDMA/rxe: add mutual exclusion in rxe_net_del()
  2026-05-16 14:00                   ` [PATCH RDMA v2] RDMA/rxe: add mutual exclusion in rxe_net_del() Edward Adam Davis
  2026-05-16 14:31                     ` Zhu Yanjun
@ 2026-05-16 23:40                     ` Yanjun.Zhu
  2026-05-17  1:56                       ` Edward Adam Davis
  2026-05-17  2:15                       ` Kuniyuki Iwashima
  1 sibling, 2 replies; 31+ messages in thread
From: Yanjun.Zhu @ 2026-05-16 23:40 UTC (permalink / raw)
  To: Edward Adam Davis, Zhu Yanjun
  Cc: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
	kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzbot+d8f76778263ab65c2b21, syzkaller-bugs, zyjzyj2000


On 5/16/26 7:00 AM, Edward Adam Davis wrote:
> We must serialize calls to rxe_net_del() or risk a crash as syzbot
> reported:
>
> KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
> Call Trace:
>   udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
>   rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
>   rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
>   rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
>   rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
>
> Jason Gunthorpe suggested placing the lock within rxe to protect its racy
> implementation of rxe_net_del(), which looks like it is possibly also
> triggered by NETDEV_UNREGISTER.
>
> The patch addressing this issue in nldev_dellink() has already been
> applied (0b28000b64f4); however, since the fix has now been relocated
> to rxe, the corresponding remedial code in nldev has been removed.
>
> Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
> Fixes: 0b28000b64f4 ("RDMA/nldev: Add mutual exclusion in nldev_dellink()")
> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
> ---
> v1 -> v2: serialize calls to rxe net del
>
>   drivers/infiniband/core/nldev.c     | 4 ----
>   drivers/infiniband/sw/rxe/rxe_net.c | 7 ++++++-
>   2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index 3cb3cb7629fe..96c745d5bac4 100644
> --- a/drivers/infiniband/core/nldev.c
> +++ b/drivers/infiniband/core/nldev.c
> @@ -1816,8 +1816,6 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   	return err;
>   }
>   
> -static DEFINE_MUTEX(nldev_dellink_mutex);
> -
>   static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   			  struct netlink_ext_ack *extack)
>   {
> @@ -1848,9 +1846,7 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   	 * implicitly scoped to the driver supporting dynamic link deletion like RXE.
>   	 */
>   	if (device->link_ops && device->link_ops->dellink) {
> -		mutex_lock(&nldev_dellink_mutex);
>   		err = device->link_ops->dellink(device);
> -		mutex_unlock(&nldev_dellink_mutex);
>   		if (err)
>   			return err;
>   	}
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 50a2cb5405e2..92847e955ca2 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -642,6 +642,8 @@ static void rxe_sock_put(struct sock *sk,
>   	}
>   }
>   
I read this commit carefully. There are two paths that can invoke
rxe_net_del().

One is through the rdma link del xxx command, while the other is through
the netdevice notification chain.

In the netdevice notification chain path, rtnl_lock is already held, and
rxe_net_del() is called under that lock.

However, in the rdma link del xxx path, no rtnl_lock is taken.

Because of this, I would like to use the existing rtnl_lock to serialize
calls to rxe_net_del().

My proposed commit is shown below. I am not sure whether it fully
resolves the problem.

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index b0714f9abe3d..84266dc416c4 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -251,7 +251,9 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)

  static int rxe_dellink(struct ib_device *dev)
  {
+       rtnl_lock();
         rxe_net_del(dev);
+       rtnl_unlock();

         return 0;
  }
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 50a2cb5405e2..ac53ea73996d 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -649,6 +649,8 @@ void rxe_net_del(struct ib_device *dev)
         struct sock *sk;
         struct net *net;

+       ASSERT_RTNL();
+
         ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
         if (!ndev)
                 return;

Zhu Yanjun
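
The calling convention proposed above (one path takes the lock itself, the
other is entered with the lock already held, and the shared function only
checks that it is held) can be sketched in userspace as follows. This is an
illustration under stated assumptions, not kernel code: big_lock,
big_lock_held, net_del_locked(), dellink_path() and notifier_path() are
hypothetical names, and a per-thread flag stands in for the kind of check that
ASSERT_RTNL() performs in the proposed diff.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread record of whether this thread currently holds big_lock. */
static _Thread_local bool big_lock_held;

static void big_lock_acquire(void)
{
	pthread_mutex_lock(&big_lock);
	big_lock_held = true;
}

static void big_lock_release(void)
{
	big_lock_held = false;
	pthread_mutex_unlock(&big_lock);
}

static int teardown_calls;	/* counts how often the core routine ran */

/*
 * Core teardown. It never takes the lock itself; it only verifies that
 * the caller holds it, so both entry paths end up serialized.
 */
static void net_del_locked(void)
{
	assert(big_lock_held);
	teardown_calls++;
}

/* Path A: command handler. Nothing is held on entry, so lock here. */
static void dellink_path(void)
{
	big_lock_acquire();
	net_del_locked();
	big_lock_release();
}

/*
 * Path B: notifier callback. The framework that invokes it is assumed
 * to hold big_lock already, so taking it again here would deadlock.
 */
static void notifier_path(void)
{
	net_del_locked();
}
```

A caller modelling the notifier framework would wrap notifier_path() in
big_lock_acquire()/big_lock_release(), while dellink_path() can be called
directly. The trade-off raised later in the thread is about which lock plays
the role of big_lock, not about this calling pattern itself.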

> +static DEFINE_MUTEX(rxe_net_del_mutex);
> +
>   void rxe_net_del(struct ib_device *dev)
>   {
>   	struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
> @@ -649,9 +651,10 @@ void rxe_net_del(struct ib_device *dev)
>   	struct sock *sk;
>   	struct net *net;
>   
> +	mutex_lock(&rxe_net_del_mutex);
>   	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
>   	if (!ndev)
> -		return;
> +		goto out;
>   
>   	net = dev_net(ndev);
>   
> @@ -664,6 +667,8 @@ void rxe_net_del(struct ib_device *dev)
>   		rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
>   
>   	dev_put(ndev);
> +out:
> +	mutex_unlock(&rxe_net_del_mutex);
>   }
>   
>   static void rxe_port_event(struct rxe_dev *rxe,

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH RDMA v2] RDMA/rxe: add mutual exclusion in rxe_net_del()
  2026-05-16 23:40                     ` Yanjun.Zhu
@ 2026-05-17  1:56                       ` Edward Adam Davis
  2026-05-17  2:15                       ` Kuniyuki Iwashima
  1 sibling, 0 replies; 31+ messages in thread
From: Edward Adam Davis @ 2026-05-17  1:56 UTC (permalink / raw)
  To: yanjun.zhu
  Cc: akpm, arjan, davem, dsahern, eadavis, edumazet, hdanton, horms,
	jgg, kuba, kuni1840, kuniyu, leon, linux-kernel, linux-rdma,
	netdev, pabeni, syzbot+d8f76778263ab65c2b21, syzkaller-bugs,
	zyjzyj2000

On Sat, 16 May 2026 16:40:22 -0700, Zhu Yanjun wrote:
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index b0714f9abe3d..84266dc416c4 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -251,7 +251,9 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
> 
>   static int rxe_dellink(struct ib_device *dev)
>   {
> +       rtnl_lock();
>          rxe_net_del(dev);
> +       rtnl_unlock();
> 
>          return 0;
>   }
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 50a2cb5405e2..ac53ea73996d 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -649,6 +649,8 @@ void rxe_net_del(struct ib_device *dev)
>          struct sock *sk;
>          struct net *net;
> 
> +       ASSERT_RTNL();
> +
>          ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
>          if (!ndev)
>                  return;
Since the idea is the same and your version requires no new lock, I favor
your approach.

BR,
Edward


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH RDMA v2] RDMA/rxe: add mutual exclusion in rxe_net_del()
  2026-05-16 23:40                     ` Yanjun.Zhu
  2026-05-17  1:56                       ` Edward Adam Davis
@ 2026-05-17  2:15                       ` Kuniyuki Iwashima
  2026-05-17  3:27                         ` Zhu Yanjun
  1 sibling, 1 reply; 31+ messages in thread
From: Kuniyuki Iwashima @ 2026-05-17  2:15 UTC (permalink / raw)
  To: Yanjun.Zhu
  Cc: Edward Adam Davis, akpm, arjan, davem, dsahern, edumazet, hdanton,
	horms, jgg, kuba, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzbot+d8f76778263ab65c2b21, syzkaller-bugs, zyjzyj2000

On Sat, May 16, 2026 at 4:40 PM Yanjun.Zhu <yanjun.zhu@linux.dev> wrote:
>
>
> On 5/16/26 7:00 AM, Edward Adam Davis wrote:
> > We must serialize calls to rxe_net_del() or risk a crash as syzbot
> > reported:
> >
> > KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
> > Call Trace:
> >   udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> >   rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> >   rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
> >   rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
> >   rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> >
> > Jason Gunthorpe suggested placing the lock within rxe to protect its racy
> > implementation of rxe_net_del(), which looks like it is possibly also
> > triggered by NETDEV_UNREGISTER.
> >
> > The patch addressing this issue in nldev_dellink() has already been
> > applied (0b28000b64f4); however, since the fix has now been relocated
> > to rxe, the corresponding remedial code in nldev has been removed.
> >
> > Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
> > Fixes: 0b28000b64f4 ("RDMA/nldev: Add mutual exclusion in nldev_dellink()")
> > Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> > Signed-off-by: Edward Adam Davis <eadavis@qq.com>
> > ---
> > v1 -> v2: serialize calls to rxe net del
> >
> >   drivers/infiniband/core/nldev.c     | 4 ----
> >   drivers/infiniband/sw/rxe/rxe_net.c | 7 ++++++-
> >   2 files changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> > index 3cb3cb7629fe..96c745d5bac4 100644
> > --- a/drivers/infiniband/core/nldev.c
> > +++ b/drivers/infiniband/core/nldev.c
> > @@ -1816,8 +1816,6 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
> >       return err;
> >   }
> >
> > -static DEFINE_MUTEX(nldev_dellink_mutex);
> > -
> >   static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
> >                         struct netlink_ext_ack *extack)
> >   {
> > @@ -1848,9 +1846,7 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
> >        * implicitly scoped to the driver supporting dynamic link deletion like RXE.
> >        */
> >       if (device->link_ops && device->link_ops->dellink) {
> > -             mutex_lock(&nldev_dellink_mutex);
> >               err = device->link_ops->dellink(device);
> > -             mutex_unlock(&nldev_dellink_mutex);
> >               if (err)
> >                       return err;
> >       }
> > diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> > index 50a2cb5405e2..92847e955ca2 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_net.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> > @@ -642,6 +642,8 @@ static void rxe_sock_put(struct sock *sk,
> >       }
> >   }
> >
> I read this commit carefully. There are two paths that can invoke
> rxe_net_del().
>
> One is through the rdma link del xxx command, while the other is through
> the netdevice notification chain.
>
> In the netdevice notification chain path, rtnl_lock is already held, and
> rxe_net_del() is called under that lock.
>
> However, in the rdma link del xxx path, no rtnl_lock is taken.
>
> Because of this, I would like to use the existing rtnl_lock to serialize
> calls to rxe_net_del().

-1 for this.

RTNL is a global mutex and is heavily contended because many
components use it without much care.  We have been working for
years to reduce RTNL pressure by converting such users to a
dedicated lock or the per-netns RTNL mutex.

RTNL is not needed here at all, so please use a dedicated lock.
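
To make the "dedicated lock" suggestion concrete, here is a small userspace
sketch under stated assumptions. It is not the rxe implementation: struct
fake_ns, sk_lock, fake_ns_init() and fake_ns_detach_sk() are hypothetical
names, and a pthread mutex embedded in each namespace-like object stands in
for a lock that protects only that object's socket pointer. The point it tries
to show is that holding one object's lock does not block work on another
object, unlike a single global mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-namespace state with its own narrowly scoped lock. */
struct fake_ns {
	pthread_mutex_t sk_lock;	/* protects only the sk field below */
	void *sk;			/* opaque socket handle, NULL once detached */
};

static void fake_ns_init(struct fake_ns *ns, void *sk)
{
	assert(ns != NULL);
	pthread_mutex_init(&ns->sk_lock, NULL);
	ns->sk = sk;
}

/*
 * Detach the socket from the namespace under the dedicated lock and hand
 * it back to the caller, who may then release it outside the lock.
 * Returns NULL if another caller already detached it.
 */
static void *fake_ns_detach_sk(struct fake_ns *ns)
{
	void *sk;

	assert(ns != NULL);
	pthread_mutex_lock(&ns->sk_lock);
	sk = ns->sk;
	ns->sk = NULL;
	pthread_mutex_unlock(&ns->sk_lock);

	return sk;
}
```

Because each fake_ns carries its own mutex, a teardown in one namespace only
contends with other users of that same namespace's socket pointer. Only the
first fake_ns_detach_sk() call on a given object gets the socket; later calls
get NULL and have nothing to release.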


>
> My proposed commit is shown below. I am not sure whether it fully
> resolves the problem.
>
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index b0714f9abe3d..84266dc416c4 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -251,7 +251,9 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
>
>   static int rxe_dellink(struct ib_device *dev)
>   {
> +       rtnl_lock();
>          rxe_net_del(dev);
> +       rtnl_unlock();
>
>          return 0;
>   }
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 50a2cb5405e2..ac53ea73996d 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -649,6 +649,8 @@ void rxe_net_del(struct ib_device *dev)
>          struct sock *sk;
>          struct net *net;
>
> +       ASSERT_RTNL();
> +
>          ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
>          if (!ndev)
>                  return;
>
> Zhu Yanjun
>
> > +static DEFINE_MUTEX(rxe_net_del_mutex);
> > +
> >   void rxe_net_del(struct ib_device *dev)
> >   {
> >       struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
> > @@ -649,9 +651,10 @@ void rxe_net_del(struct ib_device *dev)
> >       struct sock *sk;
> >       struct net *net;
> >
> > +     mutex_lock(&rxe_net_del_mutex);
> >       ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
> >       if (!ndev)
> > -             return;
> > +             goto out;
> >
> >       net = dev_net(ndev);
> >
> > @@ -664,6 +667,8 @@ void rxe_net_del(struct ib_device *dev)
> >               rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
> >
> >       dev_put(ndev);
> > +out:
> > +     mutex_unlock(&rxe_net_del_mutex);
> >   }
> >
> >   static void rxe_port_event(struct rxe_dev *rxe,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH RDMA v2] RDMA/rxe: add mutual exclusion in rxe_net_del()
  2026-05-17  2:15                       ` Kuniyuki Iwashima
@ 2026-05-17  3:27                         ` Zhu Yanjun
  2026-05-17  4:31                           ` Zhu Yanjun
  0 siblings, 1 reply; 31+ messages in thread
From: Zhu Yanjun @ 2026-05-17  3:27 UTC (permalink / raw)
  To: Kuniyuki Iwashima, Yanjun.Zhu
  Cc: Edward Adam Davis, akpm, arjan, davem, dsahern, edumazet, hdanton,
	horms, jgg, kuba, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzbot+d8f76778263ab65c2b21, syzkaller-bugs, zyjzyj2000


On 2026/5/16 19:15, Kuniyuki Iwashima wrote:
> On Sat, May 16, 2026 at 4:40 PM Yanjun.Zhu <yanjun.zhu@linux.dev> wrote:
>>
>> On 5/16/26 7:00 AM, Edward Adam Davis wrote:
>>> We must serialize calls to rxe_net_del() or risk a crash as syzbot
>>> reported:
>>>
>>> KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
>>> Call Trace:
>>>    udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
>>>    rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
>>>    rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
>>>    rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
>>>    rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
>>>
>>> Jason Gunthorpe suggested placing the lock within rxe to protect its racy
>>> implementation of rxe_net_del(), which looks like it is possibly also
>>> triggered by NETDEV_UNREGISTER.
>>>
>>> The patch addressing this issue in nldev_dellink() has already been
>>> applied (0b28000b64f4); however, since the fix has now been relocated
>>> to rxe, the corresponding remedial code in nldev has been removed.
>>>
>>> Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
>>> Fixes: 0b28000b64f4 ("RDMA/nldev: Add mutual exclusion in nldev_dellink()")
>>> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
>>> Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
>>> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
>>> ---
>>> v1 -> v2: serialize calls to rxe net del
>>>
>>>    drivers/infiniband/core/nldev.c     | 4 ----
>>>    drivers/infiniband/sw/rxe/rxe_net.c | 7 ++++++-
>>>    2 files changed, 6 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
>>> index 3cb3cb7629fe..96c745d5bac4 100644
>>> --- a/drivers/infiniband/core/nldev.c
>>> +++ b/drivers/infiniband/core/nldev.c
>>> @@ -1816,8 +1816,6 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>>>        return err;
>>>    }
>>>
>>> -static DEFINE_MUTEX(nldev_dellink_mutex);
>>> -
>>>    static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>>>                          struct netlink_ext_ack *extack)
>>>    {
>>> @@ -1848,9 +1846,7 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>>>         * implicitly scoped to the driver supporting dynamic link deletion like RXE.
>>>         */
>>>        if (device->link_ops && device->link_ops->dellink) {
>>> -             mutex_lock(&nldev_dellink_mutex);
>>>                err = device->link_ops->dellink(device);
>>> -             mutex_unlock(&nldev_dellink_mutex);
>>>                if (err)
>>>                        return err;
>>>        }
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
>>> index 50a2cb5405e2..92847e955ca2 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>>> @@ -642,6 +642,8 @@ static void rxe_sock_put(struct sock *sk,
>>>        }
>>>    }
>>>
>> I read this commit carefully. There are two paths that can invoke
>> rxe_net_del().
>>
>> One is through the rdma link del xxx command, while the other is through
>> the netdevice notification chain.
>>
>> In the netdevice notification chain path, rtnl_lock is already held, and
>> rxe_net_del() is called under that lock.
>>
>> However, in the rdma link del xxx path, no rtnl_lock is taken.
>>
>> Because of this, I would like to use the existing rtnl_lock to serialize
>> calls to rxe_net_del().
> -1 for this.
>
> It's a global mutex and heavily contended because many
> components use it without much care.  We have been working
> for years to reduce RTNL pressure by converting such
> users to a dedicated lock or a per-netns RTNL mutex.
>
> RTNL is not needed here at all, so please use a dedicated lock.

Thanks a lot for your review. I think the following commit can fix this 
problem.

Please review.

From 80525f5b7fb0af18b9759cbde0237aabb76158cc Mon Sep 17 00:00:00 2001

From: Zhu Yanjun <yanjun.zhu@linux.dev>
Date: Sat, 16 May 2026 22:27:35 +0200
Subject: [PATCH 1/1] RDMA/rxe: Fix Use-After-Free problem in rxe_net_del

syzbot reported a general protection fault (KASAN: null-ptr-deref) in
kernel_sock_shutdown() called during the software RoCE (rxe) link
deletion path (rxe_dellink -> rxe_net_del).

The root cause is a TOCTOU (Time-of-Check to Time-of-Use) race condition
in rxe_net_del(). Previously, the function fetched the socket pointer
via rxe_ns_pernet_sk4/6() outside the critical section, and then
acquired the lock to release it via rxe_sock_put().

In a highly concurrent teardown environment, another thread could close
and clear the pernet socket after it was fetched but before the lock
was acquired. This causes rxe_sock_put() to operate on a dangling or
already cleared socket pointer, leading to a NULL pointer dereference
when kernel_sock_shutdown() attempts to access sock->sk.

Fix this by introducing a dedicated, per-device mutex 'release_lock'
and extending its scope. The socket pointers are now fetched, checked,
and released entirely within the same locked critical section. This
ensures the atomicity of the socket lookup and teardown sequence.

Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
  drivers/infiniband/sw/rxe/rxe.c       | 2 ++
  drivers/infiniband/sw/rxe/rxe_net.c   | 4 ++++
  drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
  3 files changed, 7 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index b0714f9abe3d..46967ecdaf7d 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -34,6 +34,7 @@ void rxe_dealloc(struct ib_device *ib_dev)
         WARN_ON(!RB_EMPTY_ROOT(&rxe->mcg_tree));

         mutex_destroy(&rxe->usdev_lock);
+       mutex_destroy(&rxe->release_lock);
  }

  static const struct ib_device_ops rxe_ib_dev_odp_ops = {
@@ -186,6 +187,7 @@ static void rxe_init(struct rxe_dev *rxe, struct net_device *ndev)
         rxe->mcg_tree = RB_ROOT;

         mutex_init(&rxe->usdev_lock);
+       mutex_init(&rxe->release_lock);
  }

  void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 50a2cb5405e2..c3b188538540 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -655,6 +655,8 @@ void rxe_net_del(struct ib_device *dev)

         net = dev_net(ndev);

+       mutex_lock(&rxe->release_lock);
+
         sk = rxe_ns_pernet_sk4(net);
         if (sk)
                 rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
@@ -663,6 +665,8 @@ void rxe_net_del(struct ib_device *dev)
         if (sk)
                 rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);

+       mutex_unlock(&rxe->release_lock);
+
         dev_put(ndev);
  }

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index d92f80d16f78..3f54aa0a4356 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -422,6 +422,7 @@ struct rxe_dev {
         int                     max_ucontext;
         int                     max_inline_data;
         struct mutex            usdev_lock;
+       struct mutex            release_lock;

         char                    raw_gid[ETH_ALEN];

--
2.43.0
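
For readers following the commit message above, here is a minimal userspace sketch of the pattern it describes: the socket pointer is fetched, checked, cleared, and released inside one critical section. This is not the rxe code. The names fake_sock, fake_net, fake_dev, fake_net_del(), and run_concurrent_teardown() are hypothetical; a pthread mutex stands in for release_lock, and a counter stands in for the real socket release.

```c
/* Userspace model only; every name in this block is hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct fake_sock {
	int release_count;		/* how many times the socket was released */
};

struct fake_net {
	struct fake_sock *sk4;		/* models the per-namespace IPv4 socket pointer */
};

struct fake_dev {
	pthread_mutex_t release_lock;	/* stands in for the per-device mutex */
	struct fake_net *net;
};

/* Fetch, check, clear, and release within a single locked section. */
static void fake_net_del(struct fake_dev *dev)
{
	struct fake_sock *sk;

	pthread_mutex_lock(&dev->release_lock);
	sk = dev->net->sk4;
	if (sk) {
		dev->net->sk4 = NULL;	/* later callers see NULL and skip */
		sk->release_count++;	/* models the one and only release */
	}
	pthread_mutex_unlock(&dev->release_lock);
}

static void *teardown_thread(void *arg)
{
	fake_net_del(arg);
	return NULL;
}

/* Runs nthreads concurrent teardowns on one device; returns the release count. */
static int run_concurrent_teardown(int nthreads)
{
	struct fake_sock sock = { 0 };
	struct fake_net net = { .sk4 = &sock };
	struct fake_dev dev = { .net = &net };
	pthread_t tids[16];
	int i, rc;

	assert(nthreads > 0 && nthreads <= 16);
	pthread_mutex_init(&dev.release_lock, NULL);
	for (i = 0; i < nthreads; i++) {
		rc = pthread_create(&tids[i], NULL, teardown_thread, &dev);
		assert(rc == 0);
	}
	for (i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&dev.release_lock);
	return sock.release_count;
}
```

Calling run_concurrent_teardown(8) races eight teardown threads against one device; because the read and the clear happen under the same lock, only one of them finds a non-NULL pointer and releases the socket.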

>
>> My proposed commit is shown below. I am not sure whether it fully
>> resolves the problem.
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
>> index b0714f9abe3d..84266dc416c4 100644
>> --- a/drivers/infiniband/sw/rxe/rxe.c
>> +++ b/drivers/infiniband/sw/rxe/rxe.c
>> @@ -251,7 +251,9 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
>>
>>    static int rxe_dellink(struct ib_device *dev)
>>    {
>> +       rtnl_lock();
>>           rxe_net_del(dev);
>> +       rtnl_unlock();
>>
>>           return 0;
>>    }
>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
>> index 50a2cb5405e2..ac53ea73996d 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>> @@ -649,6 +649,8 @@ void rxe_net_del(struct ib_device *dev)
>>           struct sock *sk;
>>           struct net *net;
>>
>> +       ASSERT_RTNL();
>> +
>>           ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
>>           if (!ndev)
>>                   return;
>>
>> Zhu Yanjun
>>
>>> +static DEFINE_MUTEX(rxe_net_del_mutex);
>>> +
>>>    void rxe_net_del(struct ib_device *dev)
>>>    {
>>>        struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
>>> @@ -649,9 +651,10 @@ void rxe_net_del(struct ib_device *dev)
>>>        struct sock *sk;
>>>        struct net *net;
>>>
>>> +     mutex_lock(&rxe_net_del_mutex);
>>>        ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
>>>        if (!ndev)
>>> -             return;
>>> +             goto out;
>>>
>>>        net = dev_net(ndev);
>>>
>>> @@ -664,6 +667,8 @@ void rxe_net_del(struct ib_device *dev)
>>>                rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
>>>
>>>        dev_put(ndev);
>>> +out:
>>> +     mutex_unlock(&rxe_net_del_mutex);
>>>    }
>>>
>>>    static void rxe_port_event(struct rxe_dev *rxe,

-- 
Best Regards,
Yanjun.Zhu



* Re: [PATCH RDMA v2] RDMA/rxe: add mutual exclusion in rxe_net_del()
  2026-05-17  3:27                         ` Zhu Yanjun
@ 2026-05-17  4:31                           ` Zhu Yanjun
  0 siblings, 0 replies; 31+ messages in thread
From: Zhu Yanjun @ 2026-05-17  4:31 UTC (permalink / raw)
  To: Kuniyuki Iwashima, yanjun.zhu@linux.dev
  Cc: Edward Adam Davis, akpm, arjan, davem, dsahern, edumazet, hdanton,
	horms, jgg, kuba, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzbot+d8f76778263ab65c2b21, syzkaller-bugs, zyjzyj2000

On 2026/5/16 20:27, Zhu Yanjun wrote:
> 
> On 2026/5/16 19:15, Kuniyuki Iwashima wrote:
>> On Sat, May 16, 2026 at 4:40 PM Yanjun.Zhu <yanjun.zhu@linux.dev> wrote:
>>>
>>> On 5/16/26 7:00 AM, Edward Adam Davis wrote:
>>>> We must serialize calls to rxe_net_del() or risk a crash as syzbot
>>>> reported:
>>>>
>>>> KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
>>>> Call Trace:
>>>>    udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
>>>>    rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
>>>>    rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
>>>>    rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
>>>>    rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
>>>>
>>>> Jason Gunthorpe suggested placing the lock within rxe to protect its racy
>>>> implementation of rxe_net_del(), which looks like it is possibly also
>>>> triggered by NETDEV_UNREGISTER.
>>>>
>>>> The patch addressing this issue in nldev_dellink() has already been
>>>> applied (0b28000b64f4); however, since the fix has now been relocated
>>>> to rxe, the corresponding remedial code in nldev has been removed.
>>>>
>>>> Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
>>>> Fixes: 0b28000b64f4 ("RDMA/nldev: Add mutual exclusion in nldev_dellink()")
>>>> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
>>>> Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
>>>> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
>>>> ---
>>>> v1 -> v2: serialize calls to rxe net del
>>>>
>>>>    drivers/infiniband/core/nldev.c     | 4 ----
>>>>    drivers/infiniband/sw/rxe/rxe_net.c | 7 ++++++-
>>>>    2 files changed, 6 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
>>>> index 3cb3cb7629fe..96c745d5bac4 100644
>>>> --- a/drivers/infiniband/core/nldev.c
>>>> +++ b/drivers/infiniband/core/nldev.c
>>>> @@ -1816,8 +1816,6 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>>>>        return err;
>>>>    }
>>>>
>>>> -static DEFINE_MUTEX(nldev_dellink_mutex);
>>>> -
>>>>    static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>>>>                          struct netlink_ext_ack *extack)
>>>>    {
>>>> @@ -1848,9 +1846,7 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>>>>         * implicitly scoped to the driver supporting dynamic link deletion like RXE.
>>>>         */
>>>>        if (device->link_ops && device->link_ops->dellink) {
>>>> -             mutex_lock(&nldev_dellink_mutex);
>>>>                err = device->link_ops->dellink(device);
>>>> -             mutex_unlock(&nldev_dellink_mutex);
>>>>                if (err)
>>>>                        return err;
>>>>        }
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
>>>> index 50a2cb5405e2..92847e955ca2 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>>>> @@ -642,6 +642,8 @@ static void rxe_sock_put(struct sock *sk,
>>>>        }
>>>>    }
>>>>
>>> I read this commit carefully. There are two paths that can invoke
>>> rxe_net_del().
>>>
>>> One is through the rdma link del xxx command, while the other is through
>>> the netdevice notification chain.
>>>
>>> In the netdevice notification chain path, rtnl_lock is already held, and
>>> rxe_net_del() is called under that lock.
>>>
>>> However, in the rdma link del xxx path, no rtnl_lock is taken.
>>>
>>> Because of this, I would like to use the existing rtnl_lock to serialize
>>> calls to rxe_net_del().
>> -1 for this.
>>
>> It's a global mutex and heavily contended because many
>> components use it without much care.  We have been working
>> for years to reduce RTNL pressure by converting such
>> users to a dedicated lock or a per-netns RTNL mutex.
>>
>> RTNL is not needed here at all, so please use a dedicated lock.
> 
> Thanks a lot for your review. I think the following commit can fix this 
> problem.
> 
> Please review.

The root cause is clear. If no one disagrees with this commit, I will 
send out the official patch.

In the next revision, I will move the mutex into the network 
namespace.

I think we have discussed this problem thoroughly, and we all understand 
the root cause now.

Zhu Yanjun
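
The thread does not spell out why the mutex should move into the network namespace. One plausible reading is that the UDP tunnel sockets are per-namespace, so a lock that guards them belongs next to them: a per-device lock cannot serialize two different devices that share one namespace. The userspace sketch below illustrates that idea under this assumption. All names (model_sock, model_net, model_dev, sk_lock, model_net_del(), model_two_devices_one_netns()) are hypothetical and do not come from the rxe driver.

```c
/* Userspace model only; every name in this block is hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct model_sock {
	int release_count;		/* how many times the socket was released */
};

struct model_net {
	pthread_mutex_t sk_lock;	/* assumed per-namespace lock */
	struct model_sock *sk4;		/* socket shared by all devices in the namespace */
};

struct model_dev {
	struct model_net *net;		/* several devices may point at the same namespace */
};

/* The lock lives with the data it protects, so any device takes the same lock. */
static void model_net_del(struct model_dev *dev)
{
	struct model_net *net = dev->net;
	struct model_sock *sk;

	pthread_mutex_lock(&net->sk_lock);
	sk = net->sk4;
	if (sk) {
		net->sk4 = NULL;
		sk->release_count++;
	}
	pthread_mutex_unlock(&net->sk_lock);
}

static void *model_del_thread(void *arg)
{
	model_net_del(arg);
	return NULL;
}

/* Tears down two devices in one namespace concurrently; returns the release count. */
static int model_two_devices_one_netns(void)
{
	struct model_sock sock = { 0 };
	struct model_net net = { .sk4 = &sock };
	struct model_dev dev_a = { .net = &net };
	struct model_dev dev_b = { .net = &net };
	pthread_t ta, tb;
	int rc;

	pthread_mutex_init(&net.sk_lock, NULL);
	rc = pthread_create(&ta, NULL, model_del_thread, &dev_a);
	assert(rc == 0);
	rc = pthread_create(&tb, NULL, model_del_thread, &dev_b);
	assert(rc == 0);
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);
	pthread_mutex_destroy(&net.sk_lock);
	return sock.release_count;
}
```

In this model both devices contend on the same sk_lock, so the shared socket is released once no matter which device's teardown runs first.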

> 
>  From 80525f5b7fb0af18b9759cbde0237aabb76158cc Mon Sep 17 00:00:00 2001
> 
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
> Date: Sat, 16 May 2026 22:27:35 +0200
> Subject: [PATCH 1/1] RDMA/rxe: Fix Use-After-Free problem in rxe_net_del
> 
> syzbot reported a general protection fault (KASAN: null-ptr-deref) in
> kernel_sock_shutdown() called during the software RoCE (rxe) link
> deletion path (rxe_dellink -> rxe_net_del).
> 
> The root cause is a TOCTOU (Time-of-Check to Time-of-Use) race condition
> in rxe_net_del(). Previously, the function fetched the socket pointer
> via rxe_ns_pernet_sk4/6() outside the critical section, and then
> acquired the lock to release it via rxe_sock_put().
> 
> In a highly concurrent teardown environment, another thread could close
> and clear the pernet socket after it was fetched but before the lock
> was acquired. This causes rxe_sock_put() to operate on a dangling or
> already cleared socket pointer, leading to a NULL pointer dereference
> when kernel_sock_shutdown() attempts to access sock->sk.
> 
> Fix this by introducing a dedicated, per-device mutex 'release_lock'
> and extending its scope. The socket pointers are now fetched, checked,
> and released entirely within the same locked critical section. This
> ensures the atomicity of the socket lookup and teardown sequence.
> 
> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> ---
>   drivers/infiniband/sw/rxe/rxe.c       | 2 ++
>   drivers/infiniband/sw/rxe/rxe_net.c   | 4 ++++
>   drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>   3 files changed, 7 insertions(+)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index b0714f9abe3d..46967ecdaf7d 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -34,6 +34,7 @@ void rxe_dealloc(struct ib_device *ib_dev)
>          WARN_ON(!RB_EMPTY_ROOT(&rxe->mcg_tree));
> 
>          mutex_destroy(&rxe->usdev_lock);
> +       mutex_destroy(&rxe->release_lock);
>   }
> 
>   static const struct ib_device_ops rxe_ib_dev_odp_ops = {
> @@ -186,6 +187,7 @@ static void rxe_init(struct rxe_dev *rxe, struct net_device *ndev)
>          rxe->mcg_tree = RB_ROOT;
> 
>          mutex_init(&rxe->usdev_lock);
> +       mutex_init(&rxe->release_lock);
>   }
> 
>   void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 50a2cb5405e2..c3b188538540 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -655,6 +655,8 @@ void rxe_net_del(struct ib_device *dev)
> 
>          net = dev_net(ndev);
> 
> +       mutex_lock(&rxe->release_lock);
> +
>          sk = rxe_ns_pernet_sk4(net);
>          if (sk)
>                  rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
> @@ -663,6 +665,8 @@ void rxe_net_del(struct ib_device *dev)
>          if (sk)
>                  rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
> 
> +       mutex_unlock(&rxe->release_lock);
> +
>          dev_put(ndev);
>   }
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index d92f80d16f78..3f54aa0a4356 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -422,6 +422,7 @@ struct rxe_dev {
>          int                     max_ucontext;
>          int                     max_inline_data;
>          struct mutex            usdev_lock;
> +       struct mutex            release_lock;
> 
>          char                    raw_gid[ETH_ALEN];
> 
> -- 
> 2.43.0
> 
>>
>>> My proposed commit is shown below. I am not sure whether it fully
>>> resolves the problem.
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
>>> index b0714f9abe3d..84266dc416c4 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe.c
>>> @@ -251,7 +251,9 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
>>>
>>>    static int rxe_dellink(struct ib_device *dev)
>>>    {
>>> +       rtnl_lock();
>>>           rxe_net_del(dev);
>>> +       rtnl_unlock();
>>>
>>>           return 0;
>>>    }
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
>>> index 50a2cb5405e2..ac53ea73996d 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>>> @@ -649,6 +649,8 @@ void rxe_net_del(struct ib_device *dev)
>>>           struct sock *sk;
>>>           struct net *net;
>>>
>>> +       ASSERT_RTNL();
>>> +
>>>           ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
>>>           if (!ndev)
>>>                   return;
>>>
>>> Zhu Yanjun
>>>
>>>> +static DEFINE_MUTEX(rxe_net_del_mutex);
>>>> +
>>>>    void rxe_net_del(struct ib_device *dev)
>>>>    {
>>>>        struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
>>>> @@ -649,9 +651,10 @@ void rxe_net_del(struct ib_device *dev)
>>>>        struct sock *sk;
>>>>        struct net *net;
>>>>
>>>> +     mutex_lock(&rxe_net_del_mutex);
>>>>        ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
>>>>        if (!ndev)
>>>> -             return;
>>>> +             goto out;
>>>>
>>>>        net = dev_net(ndev);
>>>>
>>>> @@ -664,6 +667,8 @@ void rxe_net_del(struct ib_device *dev)
>>>>                rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
>>>>
>>>>        dev_put(ndev);
>>>> +out:
>>>> +     mutex_unlock(&rxe_net_del_mutex);
>>>>    }
>>>>
>>>>    static void rxe_port_event(struct rxe_dev *rxe,
> 




Thread overview: 31+ messages
     [not found] <69ea344f.a00a0220.17a17.0040.GAE@google.com>
2026-04-24 18:08 ` [syzbot] [net?] general protection fault in kernel_sock_shutdown (4) Arjan van de Ven
2026-04-25  1:12 ` Arjan van de Ven
2026-04-25  1:14   ` Kuniyuki Iwashima
2026-05-06 13:48 ` [syzbot] [rdma] " syzbot
2026-05-06 14:28   ` Zhu Yanjun
2026-05-06 15:19     ` Kuniyuki Iwashima
2026-05-07  3:52 ` syzbot
2026-05-07 12:50   ` [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink() Edward Adam Davis
2026-05-07 13:25     ` Zhu Yanjun
2026-05-07 13:40       ` Edward Adam Davis
2026-05-07 14:11         ` Zhu Yanjun
2026-05-13 18:17     ` Leon Romanovsky
2026-05-13 23:46       ` Jason Gunthorpe
2026-05-14  7:31         ` Edward Adam Davis
2026-05-14 11:50           ` Jason Gunthorpe
2026-05-14 13:58             ` David Ahern
2026-05-14 14:14               ` Jason Gunthorpe
2026-05-14 14:26                 ` David Ahern
2026-05-14 15:46                   ` Zhu Yanjun
2026-05-16 12:40                 ` Edward Adam Davis
2026-05-16 14:00                   ` [PATCH RDMA v2] RDMA/rxe: add mutual exclusion in rxe_net_del() Edward Adam Davis
2026-05-16 14:31                     ` Zhu Yanjun
2026-05-16 23:40                     ` Yanjun.Zhu
2026-05-17  1:56                       ` Edward Adam Davis
2026-05-17  2:15                       ` Kuniyuki Iwashima
2026-05-17  3:27                         ` Zhu Yanjun
2026-05-17  4:31                           ` Zhu Yanjun
2026-05-14  5:15   ` [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4) Zhu Yanjun
2026-05-16  5:44     ` Zhu Yanjun
2026-05-16  7:02       ` syzbot
2026-05-16 18:40         ` Zhu Yanjun
