* [PATCH net v2 0/2] tcp: fix listener wakeup after reuseport migration
@ 2026-04-18 18:13 Zhenzhong Wu
  2026-04-18 18:13 ` [PATCH net v2 1/2] tcp: call sk_data_ready() after listener migration Zhenzhong Wu
  2026-04-18 18:13 ` [PATCH net v2 2/2] selftests/bpf: check epoll readiness after reuseport migration Zhenzhong Wu
  0 siblings, 2 replies; 5+ messages in thread
From: Zhenzhong Wu @ 2026-04-18 18:13 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
      shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu

This series fixes a missing wakeup when inet_csk_listen_stop() migrates
an established child socket from a closing listener to another socket
in the same SO_REUSEPORT group after the child has already been queued
for accept.

The target listener receives the migrated accept-queue entry via
inet_csk_reqsk_queue_add(), but its waiters are not notified.
Nonblocking accept() still succeeds because it checks the accept queue
directly, but readiness-based waiters can remain asleep until another
connection generates a wakeup.

Patch 1 notifies the target listener after a successful migration in
inet_csk_listen_stop() and protects the post-queue_add() nsk accesses
with rcu_read_lock()/rcu_read_unlock().

Patch 2 extends the existing migrate_reuseport BPF selftest with an
epoll readiness check for the TCP_ESTABLISHED migration case.

Testing:
- On a patched kernel booted under QEMU, the full migrate_reuseport
  selftest passes with SELFTEST_RC=0.
---
v2:
  - wrap the post-queue_add() nsk dereferences with rcu_read_lock()/
    rcu_read_unlock() to prevent a potential UAF (Eric Dumazet)
  - extend tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c
    instead of adding standalone net selftests (Kuniyuki Iwashima)
  - limit the epoll readiness check to TCP_ESTABLISHED cases

v1: https://lore.kernel.org/netdev/20260418041633.691435-1-jt26wzz@gmail.com/

Zhenzhong Wu (2):
  tcp: call sk_data_ready() after listener migration
  selftests/bpf: check epoll readiness after reuseport migration

 net/ipv4/inet_connection_sock.c               |  3 ++
 .../bpf/prog_tests/migrate_reuseport.c        | 32 ++++++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

base-commit: 52bcb57a4e8a0865a76c587c2451906342ae1b2d
--
2.43.0

^ permalink raw reply	[flat|nested] 5+ messages in thread
* [PATCH net v2 1/2] tcp: call sk_data_ready() after listener migration
  2026-04-18 18:13 [PATCH net v2 0/2] tcp: fix listener wakeup after reuseport migration Zhenzhong Wu
@ 2026-04-18 18:13 ` Zhenzhong Wu
  2026-04-18 18:13 ` [PATCH net v2 2/2] selftests/bpf: check epoll readiness after reuseport migration Zhenzhong Wu
  1 sibling, 0 replies; 5+ messages in thread
From: Zhenzhong Wu @ 2026-04-18 18:13 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
      shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu, stable

When inet_csk_listen_stop() migrates an established child socket from a
closing listener to another socket in the same SO_REUSEPORT group, the
target listener gets a new accept-queue entry via
inet_csk_reqsk_queue_add(), but that path never notifies the target
listener's waiters. A nonblocking accept() still works because it
checks the queue directly, but poll()/epoll_wait() waiters and blocking
accept() callers can remain asleep indefinitely.

Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration in
inet_csk_listen_stop().

However, after inet_csk_reqsk_queue_add() succeeds, the ref acquired in
reuseport_migrate_sock() is effectively transferred to
nreq->rsk_listener. Another CPU can then dequeue nreq via accept() or
listener shutdown, hit reqsk_put(), and drop that listener ref. Since
listeners are SOCK_RCU_FREE, wrap the post-queue_add() dereferences of
nsk in rcu_read_lock()/rcu_read_unlock(), which also covers the
existing sock_net(nsk) access in that path.

The reqsk_timer_handler() path does not need the same changes for two
reasons: half-open requests become readable only after the final ACK,
where tcp_child_process() already wakes the listener; and once nreq is
visible via inet_ehash_insert(), the success path no longer touches nsk
directly.
Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
Cc: stable@vger.kernel.org
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
 net/ipv4/inet_connection_sock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ac3ae1bc..928654c34 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1479,16 +1479,19 @@ void inet_csk_listen_stop(struct sock *sk)

 		if (nreq) {
 			refcount_set(&nreq->rsk_refcnt, 1);
+			rcu_read_lock();
 			if (inet_csk_reqsk_queue_add(nsk, nreq, child)) {
 				__NET_INC_STATS(sock_net(nsk),
 						LINUX_MIB_TCPMIGRATEREQSUCCESS);
 				reqsk_migrate_reset(req);
+				READ_ONCE(nsk->sk_data_ready)(nsk);
 			} else {
 				__NET_INC_STATS(sock_net(nsk),
 						LINUX_MIB_TCPMIGRATEREQFAILURE);
 				reqsk_migrate_reset(nreq);
 				__reqsk_free(nreq);
 			}
+			rcu_read_unlock();

 			/* inet_csk_reqsk_queue_add() has already
 			 * called inet_child_forget() on failure case.
--
2.43.0
* [PATCH net v2 2/2] selftests/bpf: check epoll readiness after reuseport migration
  2026-04-18 18:13 [PATCH net v2 0/2] tcp: fix listener wakeup after reuseport migration Zhenzhong Wu
  2026-04-18 18:13 ` [PATCH net v2 1/2] tcp: call sk_data_ready() after listener migration Zhenzhong Wu
@ 2026-04-18 18:13 ` Zhenzhong Wu
  2026-04-21  7:15 ` Kuniyuki Iwashima
  1 sibling, 1 reply; 5+ messages in thread
From: Zhenzhong Wu @ 2026-04-18 18:13 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
      shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu

After migrate_dance() moves established children to the target
listener, add it to an epoll set and verify that epoll_wait(..., 0)
reports it ready before accept().

This adds epoll coverage for the TCP_ESTABLISHED reuseport migration
case in migrate_reuseport.

Keep the check limited to TCP_ESTABLISHED cases. TCP_SYN_RECV and
TCP_NEW_SYN_RECV still depend on asynchronous handshake completion, so
a zero-timeout epoll_wait() would race there.

Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
 .../bpf/prog_tests/migrate_reuseport.c        | 32 ++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c b/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c
index 653b0a20f..580a53424 100644
--- a/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c
+++ b/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c
@@ -18,13 +18,16 @@
  * 9. call shutdown() for the second server
  *    and migrate the requests in the accept queue
  *    to the last server socket.
- * 10. call accept() for the last server socket.
+ * 10. for TCP_ESTABLISHED cases, call epoll_wait(..., 0)
+ *     for the last server socket.
+ * 11. call accept() for the last server socket.
  *
  * Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
  */

 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
+#include <sys/epoll.h>

 #include "test_progs.h"
 #include "test_migrate_reuseport.skel.h"
@@ -522,6 +525,33 @@ static void run_test(struct migrate_reuseport_test_case *test_case,
 		goto close_clients;
 	}

+	/* Only TCP_ESTABLISHED has already-migrated accept-queue entries
+	 * here. Later states still depend on follow-up handshake work.
+	 */
+	if (test_case->state == BPF_TCP_ESTABLISHED) {
+		struct epoll_event ev = {
+			.events = EPOLLIN,
+		};
+		int epfd;
+		int nfds;
+
+		epfd = epoll_create1(EPOLL_CLOEXEC);
+		if (!ASSERT_NEQ(epfd, -1, "epoll_create1"))
+			goto close_clients;
+
+		ev.data.fd = test_case->servers[MIGRATED_TO];
+		if (!ASSERT_OK(epoll_ctl(epfd, EPOLL_CTL_ADD,
+					 test_case->servers[MIGRATED_TO], &ev),
+			       "epoll_ctl"))
+			goto close_epfd;
+
+		nfds = epoll_wait(epfd, &ev, 1, 0);
+		ASSERT_EQ(nfds, 1, "epoll_wait");
+
+close_epfd:
+		close(epfd);
+	}
+
 	count_requests(test_case, skel);

 close_clients:
--
2.43.0
* Re: [PATCH net v2 2/2] selftests/bpf: check epoll readiness after reuseport migration
  2026-04-18 18:13 ` [PATCH net v2 2/2] selftests/bpf: check epoll readiness after reuseport migration Zhenzhong Wu
@ 2026-04-21  7:15 ` Kuniyuki Iwashima
  2026-04-21 11:16   ` Zhenzhong Wu
  0 siblings, 1 reply; 5+ messages in thread
From: Kuniyuki Iwashima @ 2026-04-21 7:15 UTC (permalink / raw)
  To: jt26wzz
  Cc: davem, dsahern, edumazet, horms, kuba, kuniyu, linux-kernel,
      linux-kselftest, ncardwell, netdev, pabeni, shuah, tamird

From: Zhenzhong Wu <jt26wzz@gmail.com>
Date: Sun, 19 Apr 2026 02:13:33 +0800
> After migrate_dance() moves established children to the target
> listener, add it to an epoll set and verify that epoll_wait(..., 0)
> reports it ready before accept().
>
> This adds epoll coverage for the TCP_ESTABLISHED reuseport migration
> case in migrate_reuseport.
>
> Keep the check limited to TCP_ESTABLISHED cases. TCP_SYN_RECV and
> TCP_NEW_SYN_RECV still depend on asynchronous handshake completion,
> so a zero-timeout epoll_wait() would race there.
>
> Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
> ---
> [...]
> +		nfds = epoll_wait(epfd, &ev, 1, 0);
> +		ASSERT_EQ(nfds, 1, "epoll_wait");

Thanks for the update, but the test passes without patch 1.

I think it would be best to test just after shutdown()
where migration happens.

Also, TCP_SYN_RECV should be covered in the same way.
---8<---
diff --git a/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c b/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c
index 580a534249a7..66fea936649e 100644
--- a/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c
+++ b/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c
@@ -353,8 +353,29 @@ static int update_maps(struct migrate_reuseport_test_case *test_case,

 static int migrate_dance(struct migrate_reuseport_test_case *test_case)
 {
+	struct epoll_event ev = {
+		.events = EPOLLIN,
+	};
+	int epoll, nfds;
 	int i, err;

+	if (test_case->state != BPF_TCP_NEW_SYN_RECV) {
+		epoll = epoll_create1(0);
+		if (!ASSERT_NEQ(epoll, -1, "epoll_create1"))
+			return -1;
+
+		ev.data.fd = test_case->servers[MIGRATED_TO];
+		if (!ASSERT_OK(epoll_ctl(epoll, EPOLL_CTL_ADD,
+					 test_case->servers[MIGRATED_TO], &ev),
+			       "epoll_ctl")) {
+			goto close_epoll;
+		}
+
+		nfds = epoll_wait(epoll, &ev, 1, 0);
+		if (!ASSERT_EQ(nfds, 0, "epoll_wait 1"))
+			goto close_epoll;
+	}
+
 	/* Migrate TCP_ESTABLISHED and TCP_SYN_RECV requests
 	 * to the last listener based on eBPF.
 	 */
@@ -368,6 +389,15 @@ static int migrate_dance(struct migrate_reuseport_test_case *test_case)
 	if (test_case->state == BPF_TCP_NEW_SYN_RECV)
 		return 0;

+	nfds = epoll_wait(epoll, &ev, 1, 0);
+	if (!ASSERT_EQ(nfds, 1, "epoll_wait 2")) {
+close_epoll:
+		close(epoll);
+		return -1;
+	}
+
+	close(epoll);
+
 	/* Note that we use the second listener instead of the
 	 * first one here.
 	 *
@@ -525,33 +555,6 @@ static void run_test(struct migrate_reuseport_test_case *test_case,
 		goto close_clients;
 	}

-	/* Only TCP_ESTABLISHED has already-migrated accept-queue entries
-	 * here. Later states still depend on follow-up handshake work.
-	 */
-	if (test_case->state == BPF_TCP_ESTABLISHED) {
-		struct epoll_event ev = {
-			.events = EPOLLIN,
-		};
-		int epfd;
-		int nfds;
-
-		epfd = epoll_create1(EPOLL_CLOEXEC);
-		if (!ASSERT_NEQ(epfd, -1, "epoll_create1"))
-			goto close_clients;
-
-		ev.data.fd = test_case->servers[MIGRATED_TO];
-		if (!ASSERT_OK(epoll_ctl(epfd, EPOLL_CTL_ADD,
-					 test_case->servers[MIGRATED_TO], &ev),
-			       "epoll_ctl"))
-			goto close_epfd;
-
-		nfds = epoll_wait(epfd, &ev, 1, 0);
-		ASSERT_EQ(nfds, 1, "epoll_wait");
-
-close_epfd:
-		close(epfd);
-	}
-
 	count_requests(test_case, skel);

 close_clients:
---8<---
* Re: [PATCH net v2 2/2] selftests/bpf: check epoll readiness after reuseport migration
  2026-04-21  7:15 ` Kuniyuki Iwashima
@ 2026-04-21 11:16   ` Zhenzhong Wu
  0 siblings, 0 replies; 5+ messages in thread
From: Zhenzhong Wu @ 2026-04-21 11:16 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: davem, dsahern, edumazet, horms, kuba, linux-kernel,
      linux-kselftest, ncardwell, netdev, pabeni, shuah, tamird

Thanks Kuniyuki, will fold this into v3. Your approach of checking
epoll right around shutdown() is much better; I didn't think to put
the check inside migrate_dance(). With my original placement, the test
still passed without patch 1, which I should have called out in the
cover letter instead of settling for what was effectively just a smoke
test.

On Tue, Apr 21, 2026 at 3:20 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> [...]
>
> Thanks for the update, but the test passes without patch 1.
>
> I think it would be best to test just after shutdown()
> where migration happens.
>
> Also, TCP_SYN_RECV should be covered in the same way.
>
> [...]