public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 1/2] tcp: call sk_data_ready() after listener migration
       [not found] <20260418041633.691435-1-jt26wzz@gmail.com>
@ 2026-04-18  4:16 ` Zhenzhong Wu
  2026-04-18  6:02   ` Eric Dumazet
  0 siblings, 1 reply; 3+ messages in thread
From: Zhenzhong Wu @ 2026-04-18  4:16 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu,
	stable

When inet_csk_listen_stop() migrates an established child socket from
a closing listener to another socket in the same SO_REUSEPORT group,
the target listener gets a new accept-queue entry via
inet_csk_reqsk_queue_add(), but that path never notifies the target
listener's waiters.

As a result, a nonblocking accept() still succeeds because it checks
the accept queue directly, but waiters that sleep for listener
readiness can remain asleep until another connection generates a
wakeup. This affects poll()/epoll_wait()-based waiters, and can also
leave a blocking accept() asleep after migration even though the
child is already in the target listener's accept queue.

This was observed in a local test where listener A completed the
handshake, queued the child, and was closed before userspace called
accept(). The child was migrated to listener B, but listener B never
received a wakeup for the migrated accept-queue entry.

Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
in inet_csk_listen_stop().

The reqsk_timer_handler() path does not need the same change:
half-open requests only become readable to userspace when the final
ACK completes the handshake, and tcp_child_process() already wakes
the listener in that case.

Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
Cc: stable@vger.kernel.org
Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
 net/ipv4/inet_connection_sock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ac3ae1bc..da1ce082f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk)
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQSUCCESS);
 					reqsk_migrate_reset(req);
+					READ_ONCE(nsk->sk_data_ready)(nsk);
 				} else {
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQFAILURE);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH net 1/2] tcp: call sk_data_ready() after listener migration
  2026-04-18  4:16 ` [PATCH net 1/2] tcp: call sk_data_ready() after listener migration Zhenzhong Wu
@ 2026-04-18  6:02   ` Eric Dumazet
  2026-04-18 13:30     ` 上勾拳
  0 siblings, 1 reply; 3+ messages in thread
From: Eric Dumazet @ 2026-04-18  6:02 UTC (permalink / raw)
  To: Zhenzhong Wu
  Cc: netdev, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, stable

On Fri, Apr 17, 2026 at 9:17 PM Zhenzhong Wu <jt26wzz@gmail.com> wrote:
>
> When inet_csk_listen_stop() migrates an established child socket from
> a closing listener to another socket in the same SO_REUSEPORT group,
> the target listener gets a new accept-queue entry via
> inet_csk_reqsk_queue_add(), but that path never notifies the target
> listener's waiters.
>
> As a result, a nonblocking accept() still succeeds because it checks
> the accept queue directly, but waiters that sleep for listener
> readiness can remain asleep until another connection generates a
> wakeup. This affects poll()/epoll_wait()-based waiters, and can also
> leave a blocking accept() asleep after migration even though the
> child is already in the target listener's accept queue.
>
> This was observed in a local test where listener A completed the
> handshake, queued the child, and was closed before userspace called
> accept(). The child was migrated to listener B, but listener B never
> received a wakeup for the migrated accept-queue entry.
>
> Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
> in inet_csk_listen_stop().
>
> The reqsk_timer_handler() path does not need the same change:
> half-open requests only become readable to userspace when the final
> ACK completes the handshake, and tcp_child_process() already wakes
> the listener in that case.
>
> Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
> Cc: stable@vger.kernel.org
> Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
> ---
>  net/ipv4/inet_connection_sock.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 4ac3ae1bc..da1ce082f 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk)
>                                         __NET_INC_STATS(sock_net(nsk),
>                                                         LINUX_MIB_TCPMIGRATEREQSUCCESS);
>                                         reqsk_migrate_reset(req);
> +                                       READ_ONCE(nsk->sk_data_ready)(nsk);

I think this is adding a potential UAF (Use Afte Free).
@nsk might have been freed already by another thread/cpu.
Note the existing code already has similar issues.

Untested patch:

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ac3ae1bc1afc3a39f2790e39b4dda877dc3272b..287b6e01c4f71bfec3dd2a708f316224d9eb4a64
100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1479,6 +1479,7 @@ void inet_csk_listen_stop(struct sock *sk)
                        if (nreq) {
                                refcount_set(&nreq->rsk_refcnt, 1);

+                               rcu_read_lock();
                                if (inet_csk_reqsk_queue_add(nsk,
nreq, child)) {
                                        __NET_INC_STATS(sock_net(nsk),

LINUX_MIB_TCPMIGRATEREQSUCCESS);
@@ -1489,7 +1490,7 @@ void inet_csk_listen_stop(struct sock *sk)
                                        reqsk_migrate_reset(nreq);
                                        __reqsk_free(nreq);
                                }
-
+                               rcu_read_unlock();
                                /* inet_csk_reqsk_queue_add() has already
                                 * called inet_child_forget() on failure case.
                                 */

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH net 1/2] tcp: call sk_data_ready() after listener migration
  2026-04-18  6:02   ` Eric Dumazet
@ 2026-04-18 13:30     ` 上勾拳
  0 siblings, 0 replies; 3+ messages in thread
From: 上勾拳 @ 2026-04-18 13:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, stable

Thanks Eric, you're right.

After inet_csk_reqsk_queue_add() succeeds, the ref acquired in
reuseport_migrate_sock() is effectively transferred to
nreq->rsk_listener. Another CPU can then dequeue nreq (via
accept() or listener shutdown), hit reqsk_put(), and drop that
listener ref.

Since listeners are SOCK_RCU_FREE, the post-queue_add()
dereferences of nsk should be under rcu_read_lock()/
rcu_read_unlock(), which also covers the existing sock_net(nsk)
access in that path.

I also checked reqsk_timer_handler(): reqsk_queue_migrated()
there is only accounting, and once nreq becomes visible via
inet_ehash_insert(), the handler no longer appears to
dereference nsk.

I'll fold this into v2.


Eric Dumazet <edumazet@google.com> 于2026年4月18日周六 14:02写道:
>
> On Fri, Apr 17, 2026 at 9:17 PM Zhenzhong Wu <jt26wzz@gmail.com> wrote:
> >
> > When inet_csk_listen_stop() migrates an established child socket from
> > a closing listener to another socket in the same SO_REUSEPORT group,
> > the target listener gets a new accept-queue entry via
> > inet_csk_reqsk_queue_add(), but that path never notifies the target
> > listener's waiters.
> >
> > As a result, a nonblocking accept() still succeeds because it checks
> > the accept queue directly, but waiters that sleep for listener
> > readiness can remain asleep until another connection generates a
> > wakeup. This affects poll()/epoll_wait()-based waiters, and can also
> > leave a blocking accept() asleep after migration even though the
> > child is already in the target listener's accept queue.
> >
> > This was observed in a local test where listener A completed the
> > handshake, queued the child, and was closed before userspace called
> > accept(). The child was migrated to listener B, but listener B never
> > received a wakeup for the migrated accept-queue entry.
> >
> > Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
> > in inet_csk_listen_stop().
> >
> > The reqsk_timer_handler() path does not need the same change:
> > half-open requests only become readable to userspace when the final
> > ACK completes the handshake, and tcp_child_process() already wakes
> > the listener in that case.
> >
> > Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
> > ---
> >  net/ipv4/inet_connection_sock.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index 4ac3ae1bc..da1ce082f 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk)
> >                                         __NET_INC_STATS(sock_net(nsk),
> >                                                         LINUX_MIB_TCPMIGRATEREQSUCCESS);
> >                                         reqsk_migrate_reset(req);
> > +                                       READ_ONCE(nsk->sk_data_ready)(nsk);
>
> I think this is adding a potential UAF (Use Afte Free).
> @nsk might have been freed already by another thread/cpu.
> Note the existing code already has similar issues.
>
> Untested patch:
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 4ac3ae1bc1afc3a39f2790e39b4dda877dc3272b..287b6e01c4f71bfec3dd2a708f316224d9eb4a64
> 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -1479,6 +1479,7 @@ void inet_csk_listen_stop(struct sock *sk)
>                         if (nreq) {
>                                 refcount_set(&nreq->rsk_refcnt, 1);
>
> +                               rcu_read_lock();
>                                 if (inet_csk_reqsk_queue_add(nsk,
> nreq, child)) {
>                                         __NET_INC_STATS(sock_net(nsk),
>
> LINUX_MIB_TCPMIGRATEREQSUCCESS);
> @@ -1489,7 +1490,7 @@ void inet_csk_listen_stop(struct sock *sk)
>                                         reqsk_migrate_reset(nreq);
>                                         __reqsk_free(nreq);
>                                 }
> -
> +                               rcu_read_unlock();
>                                 /* inet_csk_reqsk_queue_add() has already
>                                  * called inet_child_forget() on failure case.
>                                  */

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-04-18 13:31 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260418041633.691435-1-jt26wzz@gmail.com>
2026-04-18  4:16 ` [PATCH net 1/2] tcp: call sk_data_ready() after listener migration Zhenzhong Wu
2026-04-18  6:02   ` Eric Dumazet
2026-04-18 13:30     ` 上勾拳

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox