* [PATCH net 1/2] tcp: call sk_data_ready() after listener migration [not found] <20260418041633.691435-1-jt26wzz@gmail.com> @ 2026-04-18 4:16 ` Zhenzhong Wu 2026-04-18 6:02 ` Eric Dumazet 0 siblings, 1 reply; 3+ messages in thread From: Zhenzhong Wu @ 2026-04-18 4:16 UTC (permalink / raw) To: netdev Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms, shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu, stable When inet_csk_listen_stop() migrates an established child socket from a closing listener to another socket in the same SO_REUSEPORT group, the target listener gets a new accept-queue entry via inet_csk_reqsk_queue_add(), but that path never notifies the target listener's waiters. As a result, a nonblocking accept() still succeeds because it checks the accept queue directly, but waiters that sleep for listener readiness can remain asleep until another connection generates a wakeup. This affects poll()/epoll_wait()-based waiters, and can also leave a blocking accept() asleep after migration even though the child is already in the target listener's accept queue. This was observed in a local test where listener A completed the handshake, queued the child, and was closed before userspace called accept(). The child was migrated to listener B, but listener B never received a wakeup for the migrated accept-queue entry. Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration in inet_csk_listen_stop(). The reqsk_timer_handler() path does not need the same change: half-open requests only become readable to userspace when the final ACK completes the handshake, and tcp_child_process() already wakes the listener in that case. Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.") Cc: stable@vger.kernel.org Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com> --- net/ipv4/inet_connection_sock.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 4ac3ae1bc..da1ce082f 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk) __NET_INC_STATS(sock_net(nsk), LINUX_MIB_TCPMIGRATEREQSUCCESS); reqsk_migrate_reset(req); + READ_ONCE(nsk->sk_data_ready)(nsk); } else { __NET_INC_STATS(sock_net(nsk), LINUX_MIB_TCPMIGRATEREQFAILURE); -- 2.43.0 ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH net 1/2] tcp: call sk_data_ready() after listener migration 2026-04-18 4:16 ` [PATCH net 1/2] tcp: call sk_data_ready() after listener migration Zhenzhong Wu @ 2026-04-18 6:02 ` Eric Dumazet 2026-04-18 13:30 ` 上勾拳 0 siblings, 1 reply; 3+ messages in thread From: Eric Dumazet @ 2026-04-18 6:02 UTC (permalink / raw) To: Zhenzhong Wu Cc: netdev, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms, shuah, tamird, linux-kernel, linux-kselftest, stable On Fri, Apr 17, 2026 at 9:17 PM Zhenzhong Wu <jt26wzz@gmail.com> wrote: > > When inet_csk_listen_stop() migrates an established child socket from > a closing listener to another socket in the same SO_REUSEPORT group, > the target listener gets a new accept-queue entry via > inet_csk_reqsk_queue_add(), but that path never notifies the target > listener's waiters. > > As a result, a nonblocking accept() still succeeds because it checks > the accept queue directly, but waiters that sleep for listener > readiness can remain asleep until another connection generates a > wakeup. This affects poll()/epoll_wait()-based waiters, and can also > leave a blocking accept() asleep after migration even though the > child is already in the target listener's accept queue. > > This was observed in a local test where listener A completed the > handshake, queued the child, and was closed before userspace called > accept(). The child was migrated to listener B, but listener B never > received a wakeup for the migrated accept-queue entry. > > Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration > in inet_csk_listen_stop(). > > The reqsk_timer_handler() path does not need the same change: > half-open requests only become readable to userspace when the final > ACK completes the handshake, and tcp_child_process() already wakes > the listener in that case. > > Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.") > Cc: stable@vger.kernel.org > Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com> > --- > net/ipv4/inet_connection_sock.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > index 4ac3ae1bc..da1ce082f 100644 > --- a/net/ipv4/inet_connection_sock.c > +++ b/net/ipv4/inet_connection_sock.c > @@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk) > __NET_INC_STATS(sock_net(nsk), > LINUX_MIB_TCPMIGRATEREQSUCCESS); > reqsk_migrate_reset(req); > + READ_ONCE(nsk->sk_data_ready)(nsk); I think this is adding a potential UAF (Use Afte Free). @nsk might have been freed already by another thread/cpu. Note the existing code already has similar issues. Untested patch: diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 4ac3ae1bc1afc3a39f2790e39b4dda877dc3272b..287b6e01c4f71bfec3dd2a708f316224d9eb4a64 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -1479,6 +1479,7 @@ void inet_csk_listen_stop(struct sock *sk) if (nreq) { refcount_set(&nreq->rsk_refcnt, 1); + rcu_read_lock(); if (inet_csk_reqsk_queue_add(nsk, nreq, child)) { __NET_INC_STATS(sock_net(nsk), LINUX_MIB_TCPMIGRATEREQSUCCESS); @@ -1489,7 +1490,7 @@ void inet_csk_listen_stop(struct sock *sk) reqsk_migrate_reset(nreq); __reqsk_free(nreq); } - + rcu_read_unlock(); /* inet_csk_reqsk_queue_add() has already * called inet_child_forget() on failure case. */ ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH net 1/2] tcp: call sk_data_ready() after listener migration 2026-04-18 6:02 ` Eric Dumazet @ 2026-04-18 13:30 ` 上勾拳 0 siblings, 0 replies; 3+ messages in thread From: 上勾拳 @ 2026-04-18 13:30 UTC (permalink / raw) To: Eric Dumazet Cc: netdev, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms, shuah, tamird, linux-kernel, linux-kselftest, stable Thanks Eric, you're right. After inet_csk_reqsk_queue_add() succeeds, the ref acquired in reuseport_migrate_sock() is effectively transferred to nreq->rsk_listener. Another CPU can then dequeue nreq (via accept() or listener shutdown), hit reqsk_put(), and drop that listener ref. Since listeners are SOCK_RCU_FREE, the post-queue_add() dereferences of nsk should be under rcu_read_lock()/ rcu_read_unlock(), which also covers the existing sock_net(nsk) access in that path. I also checked reqsk_timer_handler(): reqsk_queue_migrated() there is only accounting, and once nreq becomes visible via inet_ehash_insert(), the handler no longer appears to dereference nsk. I'll fold this into v2. Eric Dumazet <edumazet@google.com> 于2026年4月18日周六 14:02写道: > > On Fri, Apr 17, 2026 at 9:17 PM Zhenzhong Wu <jt26wzz@gmail.com> wrote: > > > > When inet_csk_listen_stop() migrates an established child socket from > > a closing listener to another socket in the same SO_REUSEPORT group, > > the target listener gets a new accept-queue entry via > > inet_csk_reqsk_queue_add(), but that path never notifies the target > > listener's waiters. > > > > As a result, a nonblocking accept() still succeeds because it checks > > the accept queue directly, but waiters that sleep for listener > > readiness can remain asleep until another connection generates a > > wakeup. This affects poll()/epoll_wait()-based waiters, and can also > > leave a blocking accept() asleep after migration even though the > > child is already in the target listener's accept queue. > > > > This was observed in a local test where listener A completed the > > handshake, queued the child, and was closed before userspace called > > accept(). The child was migrated to listener B, but listener B never > > received a wakeup for the migrated accept-queue entry. > > > > Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration > > in inet_csk_listen_stop(). > > > > The reqsk_timer_handler() path does not need the same change: > > half-open requests only become readable to userspace when the final > > ACK completes the handshake, and tcp_child_process() already wakes > > the listener in that case. > > > > Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.") > > Cc: stable@vger.kernel.org > > Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com> > > --- > > net/ipv4/inet_connection_sock.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > > index 4ac3ae1bc..da1ce082f 100644 > > --- a/net/ipv4/inet_connection_sock.c > > +++ b/net/ipv4/inet_connection_sock.c > > @@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk) > > __NET_INC_STATS(sock_net(nsk), > > LINUX_MIB_TCPMIGRATEREQSUCCESS); > > reqsk_migrate_reset(req); > > + READ_ONCE(nsk->sk_data_ready)(nsk); > > I think this is adding a potential UAF (Use Afte Free). > @nsk might have been freed already by another thread/cpu. > Note the existing code already has similar issues. > > Untested patch: > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > index 4ac3ae1bc1afc3a39f2790e39b4dda877dc3272b..287b6e01c4f71bfec3dd2a708f316224d9eb4a64 > 100644 > --- a/net/ipv4/inet_connection_sock.c > +++ b/net/ipv4/inet_connection_sock.c > @@ -1479,6 +1479,7 @@ void inet_csk_listen_stop(struct sock *sk) > if (nreq) { > refcount_set(&nreq->rsk_refcnt, 1); > > + rcu_read_lock(); > if (inet_csk_reqsk_queue_add(nsk, > nreq, child)) { > __NET_INC_STATS(sock_net(nsk), > > LINUX_MIB_TCPMIGRATEREQSUCCESS); > @@ -1489,7 +1490,7 @@ void inet_csk_listen_stop(struct sock *sk) > reqsk_migrate_reset(nreq); > __reqsk_free(nreq); > } > - > + rcu_read_unlock(); > /* inet_csk_reqsk_queue_add() has already > * called inet_child_forget() on failure case. > */ ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-04-18 13:31 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260418041633.691435-1-jt26wzz@gmail.com>
2026-04-18 4:16 ` [PATCH net 1/2] tcp: call sk_data_ready() after listener migration Zhenzhong Wu
2026-04-18 6:02 ` Eric Dumazet
2026-04-18 13:30 ` 上勾拳
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox