* [PATCH net 1/2] tcp: call sk_data_ready() after listener migration
[not found] <20260418041633.691435-1-jt26wzz@gmail.com>
@ 2026-04-18 4:16 ` Zhenzhong Wu
2026-04-18 6:02 ` Eric Dumazet
0 siblings, 1 reply; 3+ messages in thread
From: Zhenzhong Wu @ 2026-04-18 4:16 UTC (permalink / raw)
To: netdev
Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu,
stable
When inet_csk_listen_stop() migrates an established child socket from
a closing listener to another socket in the same SO_REUSEPORT group,
the target listener gets a new accept-queue entry via
inet_csk_reqsk_queue_add(), but that path never notifies the target
listener's waiters.

As a result, a nonblocking accept() still succeeds because it checks
the accept queue directly, but waiters that sleep for listener
readiness can remain asleep until another connection generates a
wakeup. This affects poll()/epoll_wait()-based waiters, and can also
leave a blocking accept() asleep after migration even though the
child is already in the target listener's accept queue.

This was observed in a local test where listener A completed the
handshake, queued the child, and was closed before userspace called
accept(). The child was migrated to listener B, but listener B never
received a wakeup for the migrated accept-queue entry.

Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
in inet_csk_listen_stop().

The reqsk_timer_handler() path does not need the same change:
half-open requests only become readable to userspace when the final
ACK completes the handshake, and tcp_child_process() already wakes
the listener in that case.
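For illustration only, the asymmetry described above between a
nonblocking accept() and a sleeping waiter can be modelled in a few
lines of userspace C. This is a toy sketch, not kernel code: every
name in it (toy_listener, toy_migrate() and so on) is hypothetical,
and the "wakeup" is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not kernel structures. */
struct toy_listener {
	int accept_queue_len;	/* children waiting to be accepted */
	bool waiter_asleep;	/* a poll()/blocking accept() sleeper exists */
};

/* Models queueing a child on the listener without notifying anyone. */
static inline void toy_queue_add(struct toy_listener *l)
{
	l->accept_queue_len++;
}

/* Models the sk_data_ready() call: wake whoever sleeps on the listener. */
static inline void toy_data_ready(struct toy_listener *l)
{
	l->waiter_asleep = false;
}

/* Models a nonblocking accept(): it looks at the queue directly. */
static inline bool toy_accept_nonblock(struct toy_listener *l)
{
	if (l->accept_queue_len == 0)
		return false;
	l->accept_queue_len--;
	return true;
}

/* A sleeping waiter makes progress only once it has been woken. */
static inline bool toy_sleeper_can_run(const struct toy_listener *l)
{
	return !l->waiter_asleep;
}

/* Migration without (notify == false) or with (notify == true) a wakeup. */
static inline void toy_migrate(struct toy_listener *target, bool notify)
{
	toy_queue_add(target);
	if (notify)
		toy_data_ready(target);
}
```

With notify == false the queue length becomes 1 while
toy_sleeper_can_run() still returns false, which mirrors the report:
the child is queued but the sleeper is never told. With notify == true
the sleeper can run, and toy_accept_nonblock() succeeds in both cases.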
Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
Cc: stable@vger.kernel.org
Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
net/ipv4/inet_connection_sock.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ac3ae1bc..da1ce082f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk)
__NET_INC_STATS(sock_net(nsk),
LINUX_MIB_TCPMIGRATEREQSUCCESS);
reqsk_migrate_reset(req);
+ READ_ONCE(nsk->sk_data_ready)(nsk);
} else {
__NET_INC_STATS(sock_net(nsk),
LINUX_MIB_TCPMIGRATEREQFAILURE);
--
2.43.0
* Re: [PATCH net 1/2] tcp: call sk_data_ready() after listener migration
2026-04-18 4:16 ` [PATCH net 1/2] tcp: call sk_data_ready() after listener migration Zhenzhong Wu
@ 2026-04-18 6:02 ` Eric Dumazet
2026-04-18 13:30 ` 上勾拳
0 siblings, 1 reply; 3+ messages in thread
From: Eric Dumazet @ 2026-04-18 6:02 UTC (permalink / raw)
To: Zhenzhong Wu
Cc: netdev, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
shuah, tamird, linux-kernel, linux-kselftest, stable
On Fri, Apr 17, 2026 at 9:17 PM Zhenzhong Wu <jt26wzz@gmail.com> wrote:
>
> When inet_csk_listen_stop() migrates an established child socket from
> a closing listener to another socket in the same SO_REUSEPORT group,
> the target listener gets a new accept-queue entry via
> inet_csk_reqsk_queue_add(), but that path never notifies the target
> listener's waiters.
>
> As a result, a nonblocking accept() still succeeds because it checks
> the accept queue directly, but waiters that sleep for listener
> readiness can remain asleep until another connection generates a
> wakeup. This affects poll()/epoll_wait()-based waiters, and can also
> leave a blocking accept() asleep after migration even though the
> child is already in the target listener's accept queue.
>
> This was observed in a local test where listener A completed the
> handshake, queued the child, and was closed before userspace called
> accept(). The child was migrated to listener B, but listener B never
> received a wakeup for the migrated accept-queue entry.
>
> Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
> in inet_csk_listen_stop().
>
> The reqsk_timer_handler() path does not need the same change:
> half-open requests only become readable to userspace when the final
> ACK completes the handshake, and tcp_child_process() already wakes
> the listener in that case.
>
> Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
> Cc: stable@vger.kernel.org
> Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
> ---
> net/ipv4/inet_connection_sock.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 4ac3ae1bc..da1ce082f 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk)
> __NET_INC_STATS(sock_net(nsk),
> LINUX_MIB_TCPMIGRATEREQSUCCESS);
> reqsk_migrate_reset(req);
> + READ_ONCE(nsk->sk_data_ready)(nsk);
I think this adds a potential UAF (use-after-free).
@nsk might already have been freed by another thread/CPU.
Note that the existing code already has similar issues.
Untested patch:
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ac3ae1bc1afc3a39f2790e39b4dda877dc3272b..287b6e01c4f71bfec3dd2a708f316224d9eb4a64 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1479,6 +1479,7 @@ void inet_csk_listen_stop(struct sock *sk)
if (nreq) {
refcount_set(&nreq->rsk_refcnt, 1);
+ rcu_read_lock();
if (inet_csk_reqsk_queue_add(nsk,
nreq, child)) {
__NET_INC_STATS(sock_net(nsk),
LINUX_MIB_TCPMIGRATEREQSUCCESS);
@@ -1489,7 +1490,7 @@ void inet_csk_listen_stop(struct sock *sk)
reqsk_migrate_reset(nreq);
__reqsk_free(nreq);
}
-
+ rcu_read_unlock();
/* inet_csk_reqsk_queue_add() has already
* called inet_child_forget() on failure case.
*/
* Re: [PATCH net 1/2] tcp: call sk_data_ready() after listener migration
2026-04-18 6:02 ` Eric Dumazet
@ 2026-04-18 13:30 ` 上勾拳
0 siblings, 0 replies; 3+ messages in thread
From: 上勾拳 @ 2026-04-18 13:30 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
shuah, tamird, linux-kernel, linux-kselftest, stable
Thanks Eric, you're right.

After inet_csk_reqsk_queue_add() succeeds, the ref acquired in
reuseport_migrate_sock() is effectively transferred to
nreq->rsk_listener. Another CPU can then dequeue nreq (via
accept() or listener shutdown), hit reqsk_put(), and drop that
listener ref.

Since listeners are SOCK_RCU_FREE, the post-queue_add()
dereferences of nsk should be under rcu_read_lock()/
rcu_read_unlock(), which also covers the existing sock_net(nsk)
access in that path.

I also checked reqsk_timer_handler(): reqsk_queue_migrated()
there is only accounting, and once nreq becomes visible via
inet_ehash_insert(), the handler no longer appears to
dereference nsk.

I'll fold this into v2.
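To make the lifetime argument above concrete, here is a toy userspace
model of "the last ref is dropped elsewhere, but the free is deferred
while a read-side section is open". It is only a sketch under loud
assumptions: all names (toy_sock, toy_rcu, toy_sock_put() and so on)
are hypothetical, the "grace period" is collapsed into a reader
counter, and nothing here is how the kernel implements RCU or
SOCK_RCU_FREE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model only: every name here is hypothetical, none is a kernel API. */
struct toy_sock {
	int refcnt;	/* references held on the listener */
	int wakeups;	/* how many times the wakeup stand-in ran */
};

struct toy_rcu {
	int readers;			/* open read-side sections */
	struct toy_sock *deferred;	/* object whose free is postponed */
};

static inline void toy_rcu_read_lock(struct toy_rcu *rcu)
{
	rcu->readers++;
}

/* Leaving the last read-side section lets the deferred free happen. */
static inline void toy_rcu_read_unlock(struct toy_rcu *rcu)
{
	if (--rcu->readers == 0 && rcu->deferred) {
		free(rcu->deferred);
		rcu->deferred = NULL;
	}
}

/* Drop one reference; at zero, free now or defer while readers exist. */
static inline void toy_sock_put(struct toy_rcu *rcu, struct toy_sock *sk)
{
	if (--sk->refcnt > 0)
		return;
	if (rcu->readers == 0)
		free(sk);
	else
		rcu->deferred = sk;
}

/* Stand-in for the wakeup call on the target listener. */
static inline void toy_data_ready(struct toy_sock *sk)
{
	sk->wakeups++;
}
```

In this model, calling toy_rcu_read_lock() before the put means that a
toy_sock_put() which drops the last reference only parks the object in
rcu.deferred, so a following toy_data_ready() still touches valid
memory; the object is released at toy_rcu_read_unlock(). Without the
read-side section the put frees the object immediately, and any later
dereference would be the use-after-free Eric pointed out.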
On Sat, Apr 18, 2026 at 14:02, Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Apr 17, 2026 at 9:17 PM Zhenzhong Wu <jt26wzz@gmail.com> wrote:
> >
> > When inet_csk_listen_stop() migrates an established child socket from
> > a closing listener to another socket in the same SO_REUSEPORT group,
> > the target listener gets a new accept-queue entry via
> > inet_csk_reqsk_queue_add(), but that path never notifies the target
> > listener's waiters.
> >
> > As a result, a nonblocking accept() still succeeds because it checks
> > the accept queue directly, but waiters that sleep for listener
> > readiness can remain asleep until another connection generates a
> > wakeup. This affects poll()/epoll_wait()-based waiters, and can also
> > leave a blocking accept() asleep after migration even though the
> > child is already in the target listener's accept queue.
> >
> > This was observed in a local test where listener A completed the
> > handshake, queued the child, and was closed before userspace called
> > accept(). The child was migrated to listener B, but listener B never
> > received a wakeup for the migrated accept-queue entry.
> >
> > Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
> > in inet_csk_listen_stop().
> >
> > The reqsk_timer_handler() path does not need the same change:
> > half-open requests only become readable to userspace when the final
> > ACK completes the handshake, and tcp_child_process() already wakes
> > the listener in that case.
> >
> > Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
> > ---
> > net/ipv4/inet_connection_sock.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index 4ac3ae1bc..da1ce082f 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk)
> > __NET_INC_STATS(sock_net(nsk),
> > LINUX_MIB_TCPMIGRATEREQSUCCESS);
> > reqsk_migrate_reset(req);
> > + READ_ONCE(nsk->sk_data_ready)(nsk);
>
> I think this is adding a potential UAF (use-after-free).
> @nsk might have been freed already by another thread/cpu.
> Note the existing code already has similar issues.
>
> Untested patch:
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 4ac3ae1bc1afc3a39f2790e39b4dda877dc3272b..287b6e01c4f71bfec3dd2a708f316224d9eb4a64
> 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -1479,6 +1479,7 @@ void inet_csk_listen_stop(struct sock *sk)
> if (nreq) {
> refcount_set(&nreq->rsk_refcnt, 1);
>
> + rcu_read_lock();
> if (inet_csk_reqsk_queue_add(nsk,
> nreq, child)) {
> __NET_INC_STATS(sock_net(nsk),
>
> LINUX_MIB_TCPMIGRATEREQSUCCESS);
> @@ -1489,7 +1490,7 @@ void inet_csk_listen_stop(struct sock *sk)
> reqsk_migrate_reset(nreq);
> __reqsk_free(nreq);
> }
> -
> + rcu_read_unlock();
> /* inet_csk_reqsk_queue_add() has already
> * called inet_child_forget() on failure case.
> */