public inbox for netdev@vger.kernel.org
* [PATCH net 0/2] tcp: fix listener wakeup after reuseport migration
@ 2026-04-18  4:16 Zhenzhong Wu
  2026-04-18  4:16 ` [PATCH net 1/2] tcp: call sk_data_ready() after listener migration Zhenzhong Wu
  2026-04-18  4:16 ` [PATCH net 2/2] selftests: net: add reuseport migration wakeup regression tests Zhenzhong Wu
  0 siblings, 2 replies; 6+ messages in thread
From: Zhenzhong Wu @ 2026-04-18  4:16 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu

Hi,

This small series fixes a missing wakeup after listener migration in
the SO_REUSEPORT close path and adds regression selftests.

The issue shows up when a fully established child has already been
queued on listener A, userspace has not accepted it yet, and
listener A is then closed. The kernel migrates that child to
listener B in the same SO_REUSEPORT group via
inet_csk_reqsk_queue_add(), but the target listener's waiters are
not notified.

As a result, a nonblocking accept() still succeeds because it checks
the accept queue directly, but waiters that sleep for listener
readiness can remain asleep until another connection generates a
wakeup. This affects poll()/epoll_wait()-based waiters, and can also
leave a blocking accept() asleep after migration even though the
child is already in the target listener's accept queue.

The fix is to notify the target listener after a successful
inet_csk_reqsk_queue_add() in inet_csk_listen_stop().

I also checked the half-open migration path in
reqsk_timer_handler(). That path does not need an extra wakeup here
because the listener becomes readable only after the final ACK
completes the handshake, and tcp_child_process() already wakes the
parent listener at that point.

The series adds selftests under tools/testing/selftests/net/ that
reproduce the regression for both IPv4 and IPv6. They cover both
epoll-based waiters and a blocking accept() waiter.

Patch 1 contains only the runtime fix so it can stand on its own and
be considered for stable backporting. Patch 2 adds the selftest
coverage.

Testing:

On an unpatched host kernel:

  unshare -Ur sh -c \
    './tools/testing/selftests/net/reuseport_migrate_epoll'
  unshare -Ur sh -c \
    './tools/testing/selftests/net/reuseport_migrate_accept'

The epoll selftest fails for both IPv4 and IPv6 with:

  accept queue was populated, but epoll_wait() timed out

The blocking accept selftest fails for both IPv4 and IPv6, for example
with:

  blocking accept() completed only in cleanup

On a patched kernel booted under QEMU with a minimal initramfs, both
selftests pass:

  ok 1 ipv4 epoll wake after reuseport migration
  ok 2 ipv6 epoll wake after reuseport migration
  reuseport_migrate_epoll_RC=0

  ok 1 ipv4 blocking accept wake after reuseport migration
  ok 2 ipv6 blocking accept wake after reuseport migration
  reuseport_migrate_accept_RC=0

Zhenzhong Wu (2):
  tcp: call sk_data_ready() after listener migration
  selftests: net: add reuseport migration wakeup regression tests

 net/ipv4/inet_connection_sock.c               |   1 +
 tools/testing/selftests/net/Makefile          |   3 +
 .../selftests/net/reuseport_migrate_accept.c  | 533 ++++++++++++++++++
 .../selftests/net/reuseport_migrate_epoll.c   | 353 ++++++++++++
 4 files changed, 890 insertions(+)
 create mode 100644 tools/testing/selftests/net/reuseport_migrate_accept.c
 create mode 100644 tools/testing/selftests/net/reuseport_migrate_epoll.c


base-commit: 52bcb57a4e8a0865a76c587c2451906342ae1b2d
-- 
2.43.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net 1/2] tcp: call sk_data_ready() after listener migration
  2026-04-18  4:16 [PATCH net 0/2] tcp: fix listener wakeup after reuseport migration Zhenzhong Wu
@ 2026-04-18  4:16 ` Zhenzhong Wu
  2026-04-18  6:02   ` Eric Dumazet
  2026-04-18  4:16 ` [PATCH net 2/2] selftests: net: add reuseport migration wakeup regression tests Zhenzhong Wu
  1 sibling, 1 reply; 6+ messages in thread
From: Zhenzhong Wu @ 2026-04-18  4:16 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu,
	stable

When inet_csk_listen_stop() migrates an established child socket from
a closing listener to another socket in the same SO_REUSEPORT group,
the target listener gets a new accept-queue entry via
inet_csk_reqsk_queue_add(), but that path never notifies the target
listener's waiters.

As a result, a nonblocking accept() still succeeds because it checks
the accept queue directly, but waiters that sleep for listener
readiness can remain asleep until another connection generates a
wakeup. This affects poll()/epoll_wait()-based waiters, and can also
leave a blocking accept() asleep after migration even though the
child is already in the target listener's accept queue.

This was observed in a local test where listener A completed the
handshake, queued the child, and was closed before userspace called
accept(). The child was migrated to listener B, but listener B never
received a wakeup for the migrated accept-queue entry.

Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
in inet_csk_listen_stop().

The reqsk_timer_handler() path does not need the same change:
half-open requests only become readable to userspace when the final
ACK completes the handshake, and tcp_child_process() already wakes
the listener in that case.

Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
Cc: stable@vger.kernel.org
Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
 net/ipv4/inet_connection_sock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ac3ae1bc..da1ce082f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk)
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQSUCCESS);
 					reqsk_migrate_reset(req);
+					READ_ONCE(nsk->sk_data_ready)(nsk);
 				} else {
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQFAILURE);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net 2/2] selftests: net: add reuseport migration wakeup regression tests
  2026-04-18  4:16 [PATCH net 0/2] tcp: fix listener wakeup after reuseport migration Zhenzhong Wu
  2026-04-18  4:16 ` [PATCH net 1/2] tcp: call sk_data_ready() after listener migration Zhenzhong Wu
@ 2026-04-18  4:16 ` Zhenzhong Wu
  2026-04-18  4:40   ` Kuniyuki Iwashima
  1 sibling, 1 reply; 6+ messages in thread
From: Zhenzhong Wu @ 2026-04-18  4:16 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu

Add selftests that reproduce missing wakeups on the target listener
after SO_REUSEPORT migration from inet_csk_listen_stop().

The epoll case connects while only the first listener is active so the
child lands on its accept queue, registers the second listener with
epoll, then closes the first listener to trigger migration. It verifies
that the target listener both accepts the migrated child and becomes
readable via epoll.

The blocking accept case starts a thread blocked in accept() on the
target listener, closes the first listener to trigger migration, and
verifies that the blocked accept() wakes and returns the migrated
child. Wait until the helper thread is actually asleep in accept()
before triggering migration so the test does not race waiter
registration.

Run the tests in a private network namespace and enable
net.ipv4.tcp_migrate_req=1 there so they can exercise the migration
path without relying on a sk_reuseport/migrate BPF program. Treat a
missing or unwritable tcp_migrate_req sysctl as SKIP. Run both
scenarios for IPv4 and IPv6.

These tests cover the bug fixed by the preceding patch.

Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
 tools/testing/selftests/net/Makefile          |   3 +
 .../selftests/net/reuseport_migrate_accept.c  | 533 ++++++++++++++++++
 .../selftests/net/reuseport_migrate_epoll.c   | 353 ++++++++++++
 3 files changed, 889 insertions(+)
 create mode 100644 tools/testing/selftests/net/reuseport_migrate_accept.c
 create mode 100644 tools/testing/selftests/net/reuseport_migrate_epoll.c

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index a275ed584..2f8b6c44d 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -184,6 +184,8 @@ TEST_GEN_PROGS := \
 	reuseport_bpf_cpu \
 	reuseport_bpf_numa \
 	reuseport_dualstack \
+	reuseport_migrate_accept \
+	reuseport_migrate_epoll \
 	sk_bind_sendto_listen \
 	sk_connect_zero_addr \
 	sk_so_peek_off \
@@ -232,6 +234,7 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma
 $(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto
 $(OUTPUT)/tcp_inq: LDLIBS += -lpthread
 $(OUTPUT)/bind_bhash: LDLIBS += -lpthread
+$(OUTPUT)/reuseport_migrate_accept: LDLIBS += -lpthread
 $(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/
 
 include bpf.mk
diff --git a/tools/testing/selftests/net/reuseport_migrate_accept.c b/tools/testing/selftests/net/reuseport_migrate_accept.c
new file mode 100644
index 000000000..a516843a0
--- /dev/null
+++ b/tools/testing/selftests/net/reuseport_migrate_accept.c
@@ -0,0 +1,533 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <netinet/in.h>
+#include <pthread.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdatomic.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <sys/syscall.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "../kselftest.h"
+
+#define ACCEPT_BLOCK_TIMEOUT_MS 1000
+#define ACCEPT_CLEANUP_TIMEOUT_MS 1000
+#define ACCEPT_WAKE_TIMEOUT_MS 2000
+#define TCP_MIGRATE_REQ_PATH "/proc/sys/net/ipv4/tcp_migrate_req"
+
+struct reuseport_migrate_case {
+	const char *name;
+	int family;
+	const char *addr;
+};
+
+struct accept_result {
+	int listener_fd;
+	atomic_int started;
+	atomic_int tid;
+	int accepted_fd;
+	int err;
+};
+
+static const struct reuseport_migrate_case test_cases[] = {
+	{
+		.name = "ipv4 blocking accept wake after reuseport migration",
+		.family = AF_INET,
+		.addr = "127.0.0.1",
+	},
+	{
+		.name = "ipv6 blocking accept wake after reuseport migration",
+		.family = AF_INET6,
+		.addr = "::1",
+	},
+};
+
+static void close_fd(int *fd)
+{
+	if (*fd >= 0) {
+		close(*fd);
+		*fd = -1;
+	}
+}
+
+static bool unsupported_addr_err(int family, int err)
+{
+	return family == AF_INET6 &&
+		(err == EAFNOSUPPORT ||
+		 err == EPROTONOSUPPORT ||
+		 err == EADDRNOTAVAIL);
+}
+
+static int make_sockaddr(const struct reuseport_migrate_case *test_case,
+			 unsigned short port,
+			 struct sockaddr_storage *addr,
+			 socklen_t *addrlen)
+{
+	memset(addr, 0, sizeof(*addr));
+
+	if (test_case->family == AF_INET) {
+		struct sockaddr_in *addr4 = (struct sockaddr_in *)addr;
+
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(port);
+		if (inet_pton(AF_INET, test_case->addr, &addr4->sin_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr4);
+		return 0;
+	}
+
+	if (test_case->family == AF_INET6) {
+		struct sockaddr_in6 *addr6 = (struct sockaddr_in6 *)addr;
+
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(port);
+		if (inet_pton(AF_INET6, test_case->addr, &addr6->sin6_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr6);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int create_reuseport_socket(const struct reuseport_migrate_case *test_case)
+{
+	int one = 1;
+	int fd;
+
+	fd = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (fd < 0)
+		return -1;
+
+	if (test_case->family == AF_INET6 &&
+	    setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int enable_tcp_migrate_req(void)
+{
+	int len;
+	int fd;
+
+	fd = open(TCP_MIGRATE_REQ_PATH, O_RDWR | O_CLOEXEC);
+	if (fd < 0) {
+		if (errno == ENOENT || errno == EACCES ||
+		    errno == EPERM || errno == EROFS)
+			return KSFT_SKIP;
+		return KSFT_FAIL;
+	}
+
+	len = write(fd, "1", 1);
+	if (len != 1) {
+		if (errno == EACCES || errno == EPERM || errno == EROFS) {
+			close(fd);
+			return KSFT_SKIP;
+		}
+
+		close(fd);
+		return KSFT_FAIL;
+	}
+
+	close(fd);
+	return KSFT_PASS;
+}
+
+static void setup_netns(void)
+{
+	int ret;
+
+	if (unshare(CLONE_NEWNET))
+		ksft_exit_skip("unshare(CLONE_NEWNET): %s\n", strerror(errno));
+
+	if (system("ip link set lo up"))
+		ksft_exit_skip("failed to bring up lo interface in netns\n");
+
+	ret = enable_tcp_migrate_req();
+	if (ret == KSFT_SKIP)
+		ksft_exit_skip("failed to enable tcp_migrate_req\n");
+	if (ret == KSFT_FAIL)
+		ksft_exit_fail_msg("failed to enable tcp_migrate_req\n");
+}
+
+static void noop_handler(int sig)
+{
+	(void)sig;
+}
+
+static void *accept_thread(void *arg)
+{
+	struct accept_result *result = arg;
+
+	atomic_store_explicit(&result->tid, (int)syscall(SYS_gettid),
+			      memory_order_release);
+	atomic_store_explicit(&result->started, 1, memory_order_release);
+	result->accepted_fd = accept4(result->listener_fd, NULL, NULL,
+				      SOCK_CLOEXEC);
+	if (result->accepted_fd < 0)
+		result->err = errno;
+
+	return NULL;
+}
+
+static int read_thread_state(int tid, char *state)
+{
+	char *close_paren;
+	char path[64];
+	char buf[256];
+	ssize_t len;
+	int fd;
+
+	snprintf(path, sizeof(path), "/proc/self/task/%d/stat", tid);
+
+	fd = open(path, O_RDONLY | O_CLOEXEC);
+	if (fd < 0)
+		return -errno;
+
+	len = read(fd, buf, sizeof(buf) - 1);
+	close(fd);
+	if (len < 0)
+		return -errno;
+	if (!len)
+		return -EINVAL;
+
+	buf[len] = '\0';
+	close_paren = strrchr(buf, ')');
+	if (!close_paren || close_paren[1] != ' ' || !close_paren[2])
+		return -EINVAL;
+
+	*state = close_paren[2];
+	return 0;
+}
+
+static int wait_for_accept_to_block(const struct reuseport_migrate_case *test_case,
+				    int tid)
+{
+	char state = '\0';
+	int ret;
+	int i;
+
+	/*
+	 * A started thread is not enough here: we need to know the waiter
+	 * has actually gone to sleep in accept() before closing listener_a,
+	 * otherwise migration can race ahead of waiter registration. Poll
+	 * /proc task state because the pthread APIs can tell us whether the
+	 * thread has exited, but not whether it is already blocked in the
+	 * target syscall.
+	 */
+	for (i = 0; i < ACCEPT_BLOCK_TIMEOUT_MS; i++) {
+		ret = read_thread_state(tid, &state);
+		if (!ret) {
+			if (state == 'S' || state == 'D')
+				return KSFT_PASS;
+			if (state == 'Z')
+				break;
+		} else if (ret == -ENOENT) {
+			break;
+		}
+
+		usleep(1000);
+	}
+
+	ksft_print_msg("%s: accept waiter never blocked before migration\n",
+		       test_case->name);
+	return KSFT_FAIL;
+}
+
+static int join_thread_with_timeout(pthread_t thread, int timeout_ms,
+				    bool *timed_out)
+{
+	struct timespec deadline;
+	int err;
+
+	*timed_out = false;
+
+	if (clock_gettime(CLOCK_REALTIME, &deadline))
+		return KSFT_FAIL;
+
+	/* Normalize in 64-bit so a 32-bit tv_nsec cannot overflow. */
+	deadline.tv_sec += (deadline.tv_nsec + timeout_ms * 1000000LL) / 1000000000;
+	deadline.tv_nsec = (deadline.tv_nsec + timeout_ms * 1000000LL) % 1000000000;
+
+	err = pthread_timedjoin_np(thread, NULL, &deadline);
+	if (!err)
+		return KSFT_PASS;
+
+	if (err != ETIMEDOUT)
+		return KSFT_FAIL;
+
+	*timed_out = true;
+	return KSFT_FAIL;
+}
+
+static int interrupt_accept_thread(pthread_t thread)
+{
+	int err;
+
+	err = pthread_kill(thread, SIGUSR1);
+	if (err && err != ESRCH)
+		return KSFT_FAIL;
+
+	return KSFT_PASS;
+}
+
+static int stop_accept_thread(pthread_t thread, bool *timed_out)
+{
+	if (interrupt_accept_thread(thread))
+		return KSFT_FAIL;
+
+	return join_thread_with_timeout(thread, ACCEPT_CLEANUP_TIMEOUT_MS,
+					timed_out);
+}
+
+static int run_test(const struct reuseport_migrate_case *test_case)
+{
+	struct accept_result result = {
+		.listener_fd = -1,
+		.started = 0,
+		.tid = -1,
+		.accepted_fd = -1,
+		.err = 0,
+	};
+	struct sockaddr_storage addr;
+	struct sigaction sa = {
+		.sa_handler = noop_handler,
+	};
+	bool thread_joined = false;
+	bool cleanup_timed_out;
+	int listener_a = -1;
+	int listener_b = -1;
+	int ret = KSFT_FAIL;
+	socklen_t addrlen;
+	pthread_t thread;
+	int client = -1;
+	bool timed_out;
+	int probe = -1;
+	int tid;
+
+	if (make_sockaddr(test_case, 0, &addr, &addrlen)) {
+		ksft_print_msg("%s: failed to build socket address\n",
+			       test_case->name);
+		goto out;
+	}
+
+	if (sigemptyset(&sa.sa_mask)) {
+		ksft_perror("sigemptyset");
+		goto out;
+	}
+
+	if (sigaction(SIGUSR1, &sa, NULL)) {
+		ksft_perror("sigaction(SIGUSR1)");
+		goto out;
+	}
+
+	listener_a = create_reuseport_socket(test_case);
+	if (listener_a < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_a)");
+		goto out;
+	}
+
+	if (bind(listener_a, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("bind(listener_a)");
+		goto out;
+	}
+
+	if (listen(listener_a, 1)) {
+		ksft_perror("listen(listener_a)");
+		goto out;
+	}
+
+	addrlen = sizeof(addr);
+	if (getsockname(listener_a, (struct sockaddr *)&addr, &addrlen)) {
+		ksft_perror("getsockname(listener_a)");
+		goto out;
+	}
+
+	listener_b = create_reuseport_socket(test_case);
+	if (listener_b < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_b)");
+		goto out;
+	}
+
+	if (bind(listener_b, (struct sockaddr *)&addr, addrlen)) {
+		ksft_perror("bind(listener_b)");
+		goto out;
+	}
+
+	client = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (client < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(client)");
+		goto out;
+	}
+
+	/* Connect while only listener_a is listening, ensuring the
+	 * child lands in listener_a's accept queue deterministically.
+	 */
+	if (connect(client, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("connect(client)");
+		goto out;
+	}
+
+	if (listen(listener_b, 1)) {
+		ksft_perror("listen(listener_b)");
+		goto out;
+	}
+
+	result.listener_fd = listener_b;
+	if (pthread_create(&thread, NULL, accept_thread, &result)) {
+		ksft_perror("pthread_create");
+		goto out;
+	}
+
+	while (!atomic_load_explicit(&result.started, memory_order_acquire))
+		sched_yield();
+
+	tid = atomic_load_explicit(&result.tid, memory_order_acquire);
+	if (wait_for_accept_to_block(test_case, tid))
+		goto out_with_thread;
+
+	close_fd(&listener_a);
+
+	ret = join_thread_with_timeout(thread, ACCEPT_WAKE_TIMEOUT_MS, &timed_out);
+	if (ret == KSFT_PASS) {
+		thread_joined = true;
+		if (result.accepted_fd < 0) {
+			ksft_print_msg("%s: blocking accept() returned err=%d (%s)\n",
+				       test_case->name, result.err,
+				       strerror(result.err));
+			ret = KSFT_FAIL;
+		}
+
+		goto out_with_thread;
+	}
+
+	if (!timed_out) {
+		ksft_print_msg("%s: join_thread_with_timeout() failed\n",
+			       test_case->name);
+		goto out_with_thread;
+	}
+
+	if (stop_accept_thread(thread, &cleanup_timed_out) == KSFT_FAIL) {
+		ksft_print_msg("%s: failed to stop blocking accept waiter\n",
+			       test_case->name);
+		goto out_with_thread;
+	}
+	thread_joined = true;
+
+	if (result.accepted_fd >= 0) {
+		ksft_print_msg("%s: blocking accept() completed only in cleanup\n",
+			       test_case->name);
+		goto out_with_thread;
+	}
+
+	if (result.err != EINTR) {
+		ksft_print_msg("%s: blocking accept() returned err=%d (%s)\n",
+			       test_case->name, result.err,
+			       strerror(result.err));
+		goto out_with_thread;
+	}
+
+	probe = accept4(listener_b, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
+	if (probe >= 0) {
+		ksft_print_msg("%s: accept queue was populated, but blocking accept() timed out\n",
+			       test_case->name);
+	} else if (errno == EAGAIN || errno == EWOULDBLOCK) {
+		ksft_print_msg("%s: target listener had no queued child after migration\n",
+			       test_case->name);
+	} else {
+		ksft_perror("accept4(listener_b)");
+	}
+
+out_with_thread:
+	close_fd(&probe);
+	if (!thread_joined) {
+		if (stop_accept_thread(thread, &cleanup_timed_out) == KSFT_FAIL) {
+			ksft_print_msg("%s: failed to stop blocking accept waiter\n",
+				       test_case->name);
+			ret = KSFT_FAIL;
+			goto out;
+		}
+
+		thread_joined = true;
+	}
+	if (thread_joined)
+		close_fd(&result.accepted_fd);
+
+out:
+	close_fd(&client);
+	close_fd(&listener_b);
+	close_fd(&listener_a);
+
+	return ret;
+}
+
+int main(void)
+{
+	int status = KSFT_PASS;
+	int ret;
+	int i;
+
+	setup_netns();
+
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(test_cases));
+
+	for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+		ret = run_test(&test_cases[i]);
+		ksft_test_result_code(ret, test_cases[i].name, NULL);
+
+		if (ret == KSFT_FAIL)
+			status = KSFT_FAIL;
+	}
+
+	if (status == KSFT_FAIL)
+		ksft_exit_fail();
+
+	ksft_finished();
+}
diff --git a/tools/testing/selftests/net/reuseport_migrate_epoll.c b/tools/testing/selftests/net/reuseport_migrate_epoll.c
new file mode 100644
index 000000000..9cbfb58c4
--- /dev/null
+++ b/tools/testing/selftests/net/reuseport_migrate_epoll.c
@@ -0,0 +1,353 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <netinet/in.h>
+#include <sched.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/epoll.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+#include "../kselftest.h"
+
+#define EPOLL_TIMEOUT_MS 500
+#define TCP_MIGRATE_REQ_PATH "/proc/sys/net/ipv4/tcp_migrate_req"
+
+struct reuseport_migrate_case {
+	const char *name;
+	int family;
+	const char *addr;
+};
+
+static const struct reuseport_migrate_case test_cases[] = {
+	{
+		.name = "ipv4 epoll wake after reuseport migration",
+		.family = AF_INET,
+		.addr = "127.0.0.1",
+	},
+	{
+		.name = "ipv6 epoll wake after reuseport migration",
+		.family = AF_INET6,
+		.addr = "::1",
+	},
+};
+
+static void close_fd(int *fd)
+{
+	if (*fd >= 0) {
+		close(*fd);
+		*fd = -1;
+	}
+}
+
+static bool unsupported_addr_err(int family, int err)
+{
+	return family == AF_INET6 &&
+		(err == EAFNOSUPPORT ||
+		 err == EPROTONOSUPPORT ||
+		 err == EADDRNOTAVAIL);
+}
+
+static int make_sockaddr(const struct reuseport_migrate_case *test_case,
+			 unsigned short port,
+			 struct sockaddr_storage *addr,
+			 socklen_t *addrlen)
+{
+	memset(addr, 0, sizeof(*addr));
+
+	if (test_case->family == AF_INET) {
+		struct sockaddr_in *addr4 = (struct sockaddr_in *)addr;
+
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(port);
+		if (inet_pton(AF_INET, test_case->addr, &addr4->sin_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr4);
+		return 0;
+	}
+
+	if (test_case->family == AF_INET6) {
+		struct sockaddr_in6 *addr6 = (struct sockaddr_in6 *)addr;
+
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(port);
+		if (inet_pton(AF_INET6, test_case->addr, &addr6->sin6_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr6);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int create_reuseport_socket(const struct reuseport_migrate_case *test_case)
+{
+	int one = 1;
+	int fd;
+
+	fd = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (fd < 0)
+		return -1;
+
+	if (test_case->family == AF_INET6 &&
+	    setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int set_nonblocking(int fd)
+{
+	int flags;
+
+	flags = fcntl(fd, F_GETFL);
+	if (flags < 0)
+		return -1;
+
+	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
+}
+
+static int enable_tcp_migrate_req(void)
+{
+	int len;
+	int fd;
+
+	fd = open(TCP_MIGRATE_REQ_PATH, O_RDWR | O_CLOEXEC);
+	if (fd < 0) {
+		if (errno == ENOENT || errno == EACCES ||
+		    errno == EPERM || errno == EROFS)
+			return KSFT_SKIP;
+		return KSFT_FAIL;
+	}
+
+	len = write(fd, "1", 1);
+	if (len != 1) {
+		if (errno == EACCES || errno == EPERM || errno == EROFS) {
+			close(fd);
+			return KSFT_SKIP;
+		}
+
+		close(fd);
+		return KSFT_FAIL;
+	}
+
+	close(fd);
+	return KSFT_PASS;
+}
+
+static void setup_netns(void)
+{
+	int ret;
+
+	if (unshare(CLONE_NEWNET))
+		ksft_exit_skip("unshare(CLONE_NEWNET): %s\n", strerror(errno));
+
+	if (system("ip link set lo up"))
+		ksft_exit_skip("failed to bring up lo interface in netns\n");
+
+	ret = enable_tcp_migrate_req();
+	if (ret == KSFT_SKIP)
+		ksft_exit_skip("failed to enable tcp_migrate_req\n");
+	if (ret == KSFT_FAIL)
+		ksft_exit_fail_msg("failed to enable tcp_migrate_req\n");
+}
+
+static int run_test(const struct reuseport_migrate_case *test_case)
+{
+	struct sockaddr_storage addr;
+	struct epoll_event ev = {
+		.events = EPOLLIN,
+	};
+	int listener_a = -1;
+	int listener_b = -1;
+	int ret = KSFT_FAIL;
+	socklen_t addrlen;
+	int accepted = -1;
+	int client = -1;
+	int epfd = -1;
+	int n;
+
+	if (make_sockaddr(test_case, 0, &addr, &addrlen)) {
+		ksft_print_msg("%s: failed to build socket address\n",
+			       test_case->name);
+		goto out;
+	}
+
+	listener_a = create_reuseport_socket(test_case);
+	if (listener_a < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_a)");
+		goto out;
+	}
+
+	if (bind(listener_a, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("bind(listener_a)");
+		goto out;
+	}
+
+	if (listen(listener_a, 1)) {
+		ksft_perror("listen(listener_a)");
+		goto out;
+	}
+
+	addrlen = sizeof(addr);
+	if (getsockname(listener_a, (struct sockaddr *)&addr, &addrlen)) {
+		ksft_perror("getsockname(listener_a)");
+		goto out;
+	}
+
+	listener_b = create_reuseport_socket(test_case);
+	if (listener_b < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_b)");
+		goto out;
+	}
+
+	if (bind(listener_b, (struct sockaddr *)&addr, addrlen)) {
+		ksft_perror("bind(listener_b)");
+		goto out;
+	}
+
+	client = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (client < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(client)");
+		goto out;
+	}
+
+	/* Connect while only listener_a is listening, ensuring the
+	 * child lands in listener_a's accept queue deterministically.
+	 */
+	if (connect(client, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("connect(client)");
+		goto out;
+	}
+
+	if (listen(listener_b, 1)) {
+		ksft_perror("listen(listener_b)");
+		goto out;
+	}
+
+	if (set_nonblocking(listener_b)) {
+		ksft_perror("set_nonblocking(listener_b)");
+		goto out;
+	}
+
+	epfd = epoll_create1(EPOLL_CLOEXEC);
+	if (epfd < 0) {
+		ksft_perror("epoll_create1");
+		goto out;
+	}
+
+	ev.data.fd = listener_b;
+	if (epoll_ctl(epfd, EPOLL_CTL_ADD, listener_b, &ev)) {
+		ksft_perror("epoll_ctl(ADD listener_b)");
+		goto out;
+	}
+
+	close_fd(&listener_a);
+
+	n = epoll_wait(epfd, &ev, 1, EPOLL_TIMEOUT_MS);
+	if (n < 0) {
+		ksft_perror("epoll_wait");
+		goto out;
+	}
+
+	accepted = accept4(listener_b, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
+	if (accepted < 0) {
+		if (errno == EAGAIN || errno == EWOULDBLOCK) {
+			ksft_print_msg("%s: target listener had no queued child after migration\n",
+				       test_case->name);
+			goto out;
+		}
+
+		ksft_perror("accept4(listener_b)");
+		goto out;
+	}
+
+	if (n != 1) {
+		ksft_print_msg("%s: accept queue was populated, but epoll_wait() timed out\n",
+			       test_case->name);
+		goto out;
+	}
+
+	if (ev.data.fd != listener_b || !(ev.events & EPOLLIN)) {
+		ksft_print_msg("%s: unexpected epoll event fd=%d events=%#x\n",
+			       test_case->name, ev.data.fd, ev.events);
+		goto out;
+	}
+
+	ret = KSFT_PASS;
+
+out:
+	close_fd(&accepted);
+	close_fd(&epfd);
+	close_fd(&client);
+	close_fd(&listener_b);
+	close_fd(&listener_a);
+
+	return ret;
+}
+
+int main(void)
+{
+	int status = KSFT_PASS;
+	int ret;
+	int i;
+
+	setup_netns();
+
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(test_cases));
+
+	for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+		ret = run_test(&test_cases[i]);
+		ksft_test_result_code(ret, test_cases[i].name, NULL);
+
+		if (ret == KSFT_FAIL)
+			status = KSFT_FAIL;
+	}
+
+	if (status == KSFT_FAIL)
+		ksft_exit_fail();
+
+	ksft_finished();
+}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net 2/2] selftests: net: add reuseport migration wakeup regression tests
  2026-04-18  4:16 ` [PATCH net 2/2] selftests: net: add reuseport migration wakeup regression tests Zhenzhong Wu
@ 2026-04-18  4:40   ` Kuniyuki Iwashima
  0 siblings, 0 replies; 6+ messages in thread
From: Kuniyuki Iwashima @ 2026-04-18  4:40 UTC (permalink / raw)
  To: Zhenzhong Wu
  Cc: netdev, edumazet, ncardwell, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest

On Fri, Apr 17, 2026 at 9:17 PM Zhenzhong Wu <jt26wzz@gmail.com> wrote:
>
> Add selftests that reproduce missing wakeups on the target listener
> after SO_REUSEPORT migration from inet_csk_listen_stop().
>
> The epoll case connects while only the first listener is active so the
> child lands on its accept queue, registers the second listener with
> epoll, then closes the first listener to trigger migration. It verifies
> that the target listener both accepts the migrated child and becomes
> readable via epoll.
>
> The blocking accept case starts a thread blocked in accept() on the
> target listener, closes the first listener to trigger migration, and
> verifies that the blocked accept() wakes and returns the migrated
> child. Wait until the helper thread is actually asleep in accept()
> before triggering migration so the test does not race waiter
> registration.
>
> Run the tests in a private network namespace and enable
> net.ipv4.tcp_migrate_req=1 there so they can exercise the migration
> path without relying on a sk_reuseport/migrate BPF program. Treat a
> missing or unwritable tcp_migrate_req sysctl as SKIP. Run both
> scenarios for IPv4 and IPv6.
>
> These tests cover the bug fixed by the preceding patch.
>
> Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
> ---
>  tools/testing/selftests/net/Makefile          |   3 +
>  .../selftests/net/reuseport_migrate_accept.c  | 533 ++++++++++++++++++
>  .../selftests/net/reuseport_migrate_epoll.c   | 353 ++++++++++++
>  3 files changed, 889 insertions(+)
>  create mode 100644 tools/testing/selftests/net/reuseport_migrate_accept.c
>  create mode 100644 tools/testing/selftests/net/reuseport_migrate_epoll.c

Thanks for the series.

Instead of adding new tests, can you extend
tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c ?

It covers all migration scenarios and you can just add
the target listener to epoll and call non-blocking epoll_wait(,... 0)
before accept() to check if it returns 1 (the number of fd).


>
> diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> index a275ed584..2f8b6c44d 100644
> --- a/tools/testing/selftests/net/Makefile
> +++ b/tools/testing/selftests/net/Makefile
> @@ -184,6 +184,8 @@ TEST_GEN_PROGS := \
>         reuseport_bpf_cpu \
>         reuseport_bpf_numa \
>         reuseport_dualstack \
> +       reuseport_migrate_accept \
> +       reuseport_migrate_epoll \
>         sk_bind_sendto_listen \
>         sk_connect_zero_addr \
>         sk_so_peek_off \
> @@ -232,6 +234,7 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma
>  $(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto
>  $(OUTPUT)/tcp_inq: LDLIBS += -lpthread
>  $(OUTPUT)/bind_bhash: LDLIBS += -lpthread
> +$(OUTPUT)/reuseport_migrate_accept: LDLIBS += -lpthread
>  $(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/
>
>  include bpf.mk
> diff --git a/tools/testing/selftests/net/reuseport_migrate_accept.c b/tools/testing/selftests/net/reuseport_migrate_accept.c
> new file mode 100644
> index 000000000..a516843a0
> --- /dev/null
> +++ b/tools/testing/selftests/net/reuseport_migrate_accept.c
> @@ -0,0 +1,533 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define _GNU_SOURCE
> +
> +#include <arpa/inet.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <netinet/in.h>
> +#include <pthread.h>
> +#include <sched.h>
> +#include <signal.h>
> +#include <stdbool.h>
> +#include <stdatomic.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/socket.h>
> +#include <sys/syscall.h>
> +#include <time.h>
> +#include <unistd.h>
> +
> +#include "../kselftest.h"
> +
> +#define ACCEPT_BLOCK_TIMEOUT_MS 1000
> +#define ACCEPT_CLEANUP_TIMEOUT_MS 1000
> +#define ACCEPT_WAKE_TIMEOUT_MS 2000
> +#define TCP_MIGRATE_REQ_PATH "/proc/sys/net/ipv4/tcp_migrate_req"
> +
> +struct reuseport_migrate_case {
> +       const char *name;
> +       int family;
> +       const char *addr;
> +};
> +
> +struct accept_result {
> +       int listener_fd;
> +       atomic_int started;
> +       atomic_int tid;
> +       int accepted_fd;
> +       int err;
> +};
> +
> +static const struct reuseport_migrate_case test_cases[] = {
> +       {
> +               .name = "ipv4 blocking accept wake after reuseport migration",
> +               .family = AF_INET,
> +               .addr = "127.0.0.1",
> +       },
> +       {
> +               .name = "ipv6 blocking accept wake after reuseport migration",
> +               .family = AF_INET6,
> +               .addr = "::1",
> +       },
> +};
> +
> +static void close_fd(int *fd)
> +{
> +       if (*fd >= 0) {
> +               close(*fd);
> +               *fd = -1;
> +       }
> +}
> +
> +static bool unsupported_addr_err(int family, int err)
> +{
> +       return family == AF_INET6 &&
> +               (err == EAFNOSUPPORT ||
> +                err == EPROTONOSUPPORT ||
> +                err == EADDRNOTAVAIL);
> +}
> +
> +static int make_sockaddr(const struct reuseport_migrate_case *test_case,
> +                        unsigned short port,
> +                        struct sockaddr_storage *addr,
> +                        socklen_t *addrlen)
> +{
> +       memset(addr, 0, sizeof(*addr));
> +
> +       if (test_case->family == AF_INET) {
> +               struct sockaddr_in *addr4 = (struct sockaddr_in *)addr;
> +
> +               addr4->sin_family = AF_INET;
> +               addr4->sin_port = htons(port);
> +               if (inet_pton(AF_INET, test_case->addr, &addr4->sin_addr) != 1)
> +                       return -1;
> +
> +               *addrlen = sizeof(*addr4);
> +               return 0;
> +       }
> +
> +       if (test_case->family == AF_INET6) {
> +               struct sockaddr_in6 *addr6 = (struct sockaddr_in6 *)addr;
> +
> +               addr6->sin6_family = AF_INET6;
> +               addr6->sin6_port = htons(port);
> +               if (inet_pton(AF_INET6, test_case->addr, &addr6->sin6_addr) != 1)
> +                       return -1;
> +
> +               *addrlen = sizeof(*addr6);
> +               return 0;
> +       }
> +
> +       return -1;
> +}
> +
> +static int create_reuseport_socket(const struct reuseport_migrate_case *test_case)
> +{
> +       int one = 1;
> +       int fd;
> +
> +       fd = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> +       if (fd < 0)
> +               return -1;
> +
> +       if (test_case->family == AF_INET6 &&
> +           setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one))) {
> +               close(fd);
> +               return -1;
> +       }
> +
> +       if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one))) {
> +               close(fd);
> +               return -1;
> +       }
> +
> +       return fd;
> +}
> +
> +static int enable_tcp_migrate_req(void)
> +{
> +       int len;
> +       int fd;
> +
> +       fd = open(TCP_MIGRATE_REQ_PATH, O_RDWR | O_CLOEXEC);
> +       if (fd < 0) {
> +               if (errno == ENOENT || errno == EACCES ||
> +                   errno == EPERM || errno == EROFS)
> +                       return KSFT_SKIP;
> +               return KSFT_FAIL;
> +       }
> +
> +       len = write(fd, "1", 1);
> +       if (len != 1) {
> +               if (errno == EACCES || errno == EPERM || errno == EROFS) {
> +                       close(fd);
> +                       return KSFT_SKIP;
> +               }
> +
> +               close(fd);
> +               return KSFT_FAIL;
> +       }
> +
> +       close(fd);
> +       return KSFT_PASS;
> +}
> +
> +static void setup_netns(void)
> +{
> +       int ret;
> +
> +       if (unshare(CLONE_NEWNET))
> +               ksft_exit_skip("unshare(CLONE_NEWNET): %s\n", strerror(errno));
> +
> +       if (system("ip link set lo up"))
> +               ksft_exit_skip("failed to bring up lo interface in netns\n");
> +
> +       ret = enable_tcp_migrate_req();
> +       if (ret == KSFT_SKIP)
> +               ksft_exit_skip("failed to enable tcp_migrate_req\n");
> +       if (ret == KSFT_FAIL)
> +               ksft_exit_fail_msg("failed to enable tcp_migrate_req\n");
> +}
> +
> +static void noop_handler(int sig)
> +{
> +       (void)sig;
> +}
> +
> +static void *accept_thread(void *arg)
> +{
> +       struct accept_result *result = arg;
> +
> +       atomic_store_explicit(&result->tid, (int)syscall(SYS_gettid),
> +                             memory_order_release);
> +       atomic_store_explicit(&result->started, 1, memory_order_release);
> +       result->accepted_fd = accept4(result->listener_fd, NULL, NULL,
> +                                     SOCK_CLOEXEC);
> +       if (result->accepted_fd < 0)
> +               result->err = errno;
> +
> +       return NULL;
> +}
> +
> +static int read_thread_state(int tid, char *state)
> +{
> +       char *close_paren;
> +       char path[64];
> +       char buf[256];
> +       ssize_t len;
> +       int fd;
> +
> +       snprintf(path, sizeof(path), "/proc/self/task/%d/stat", tid);
> +
> +       fd = open(path, O_RDONLY | O_CLOEXEC);
> +       if (fd < 0)
> +               return -errno;
> +
> +       len = read(fd, buf, sizeof(buf) - 1);
> +       close(fd);
> +       if (len < 0)
> +               return -errno;
> +       if (!len)
> +               return -EINVAL;
> +
> +       buf[len] = '\0';
> +       close_paren = strrchr(buf, ')');
> +       if (!close_paren || close_paren[1] != ' ' || !close_paren[2])
> +               return -EINVAL;
> +
> +       *state = close_paren[2];
> +       return 0;
> +}
> +
> +static int wait_for_accept_to_block(const struct reuseport_migrate_case *test_case,
> +                                   int tid)
> +{
> +       char state = '\0';
> +       int ret;
> +       int i;
> +
> +       /*
> +        * A started thread is not enough here: we need to know the waiter
> +        * has actually gone to sleep in accept() before closing listener_a,
> +        * otherwise migration can race ahead of waiter registration. Poll
> +        * /proc task state because the pthread APIs can tell us whether the
> +        * thread has exited, but not whether it is already blocked in the
> +        * target syscall.
> +        */
> +       for (i = 0; i < ACCEPT_BLOCK_TIMEOUT_MS; i++) {
> +               ret = read_thread_state(tid, &state);
> +               if (!ret) {
> +                       if (state == 'S' || state == 'D')
> +                               return KSFT_PASS;
> +                       if (state == 'Z')
> +                               break;
> +               } else if (ret == -ENOENT) {
> +                       break;
> +               }
> +
> +               usleep(1000);
> +       }
> +
> +       ksft_print_msg("%s: accept waiter never blocked before migration\n",
> +                      test_case->name);
> +       return KSFT_FAIL;
> +}
> +
> +static int join_thread_with_timeout(pthread_t thread, int timeout_ms,
> +                                   bool *timed_out)
> +{
> +       struct timespec deadline;
> +       int err;
> +
> +       *timed_out = false;
> +
> +       if (clock_gettime(CLOCK_REALTIME, &deadline))
> +               return KSFT_FAIL;
> +
> +       deadline.tv_nsec += timeout_ms * 1000000LL;
> +       deadline.tv_sec += deadline.tv_nsec / 1000000000LL;
> +       deadline.tv_nsec %= 1000000000LL;
> +
> +       err = pthread_timedjoin_np(thread, NULL, &deadline);
> +       if (!err)
> +               return KSFT_PASS;
> +
> +       if (err != ETIMEDOUT)
> +               return KSFT_FAIL;
> +
> +       *timed_out = true;
> +       return KSFT_FAIL;
> +}
> +
> +static int interrupt_accept_thread(pthread_t thread)
> +{
> +       int err;
> +
> +       err = pthread_kill(thread, SIGUSR1);
> +       if (err && err != ESRCH)
> +               return KSFT_FAIL;
> +
> +       return KSFT_PASS;
> +}
> +
> +static int stop_accept_thread(pthread_t thread, bool *timed_out)
> +{
> +       if (interrupt_accept_thread(thread))
> +               return KSFT_FAIL;
> +
> +       return join_thread_with_timeout(thread, ACCEPT_CLEANUP_TIMEOUT_MS,
> +                                       timed_out);
> +}
> +
> +static int run_test(const struct reuseport_migrate_case *test_case)
> +{
> +       struct accept_result result = {
> +               .listener_fd = -1,
> +               .started = 0,
> +               .tid = -1,
> +               .accepted_fd = -1,
> +               .err = 0,
> +       };
> +       struct sockaddr_storage addr;
> +       struct sigaction sa = {
> +               .sa_handler = noop_handler,
> +       };
> +       bool thread_joined = false;
> +       bool cleanup_timed_out;
> +       int listener_a = -1;
> +       int listener_b = -1;
> +       int ret = KSFT_FAIL;
> +       socklen_t addrlen;
> +       pthread_t thread;
> +       int client = -1;
> +       bool timed_out;
> +       int probe = -1;
> +       int tid;
> +
> +       if (make_sockaddr(test_case, 0, &addr, &addrlen)) {
> +               ksft_print_msg("%s: failed to build socket address\n",
> +                              test_case->name);
> +               goto out;
> +       }
> +
> +       if (sigemptyset(&sa.sa_mask)) {
> +               ksft_perror("sigemptyset");
> +               goto out;
> +       }
> +
> +       if (sigaction(SIGUSR1, &sa, NULL)) {
> +               ksft_perror("sigaction(SIGUSR1)");
> +               goto out;
> +       }
> +
> +       listener_a = create_reuseport_socket(test_case);
> +       if (listener_a < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(listener_a)");
> +               goto out;
> +       }
> +
> +       if (bind(listener_a, (struct sockaddr *)&addr, addrlen)) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("bind(listener_a)");
> +               goto out;
> +       }
> +
> +       if (listen(listener_a, 1)) {
> +               ksft_perror("listen(listener_a)");
> +               goto out;
> +       }
> +
> +       addrlen = sizeof(addr);
> +       if (getsockname(listener_a, (struct sockaddr *)&addr, &addrlen)) {
> +               ksft_perror("getsockname(listener_a)");
> +               goto out;
> +       }
> +
> +       listener_b = create_reuseport_socket(test_case);
> +       if (listener_b < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(listener_b)");
> +               goto out;
> +       }
> +
> +       if (bind(listener_b, (struct sockaddr *)&addr, addrlen)) {
> +               ksft_perror("bind(listener_b)");
> +               goto out;
> +       }
> +
> +       client = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> +       if (client < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(client)");
> +               goto out;
> +       }
> +
> +       /* Connect while only listener_a is listening, ensuring the
> +        * child lands in listener_a's accept queue deterministically.
> +        */
> +       if (connect(client, (struct sockaddr *)&addr, addrlen)) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("connect(client)");
> +               goto out;
> +       }
> +
> +       if (listen(listener_b, 1)) {
> +               ksft_perror("listen(listener_b)");
> +               goto out;
> +       }
> +
> +       result.listener_fd = listener_b;
> +       if (pthread_create(&thread, NULL, accept_thread, &result)) {
> +               ksft_perror("pthread_create");
> +               goto out;
> +       }
> +
> +       while (!atomic_load_explicit(&result.started, memory_order_acquire))
> +               sched_yield();
> +
> +       tid = atomic_load_explicit(&result.tid, memory_order_acquire);
> +       if (wait_for_accept_to_block(test_case, tid))
> +               goto out_with_thread;
> +
> +       close_fd(&listener_a);
> +
> +       ret = join_thread_with_timeout(thread, ACCEPT_WAKE_TIMEOUT_MS, &timed_out);
> +       if (ret == KSFT_PASS) {
> +               thread_joined = true;
> +               if (result.accepted_fd < 0) {
> +                       ksft_print_msg("%s: blocking accept() returned err=%d (%s)\n",
> +                                      test_case->name, result.err,
> +                                      strerror(result.err));
> +                       ret = KSFT_FAIL;
> +               }
> +
> +               goto out_with_thread;
> +       }
> +
> +       if (!timed_out) {
> +               ksft_print_msg("%s: join_thread_with_timeout() failed\n",
> +                              test_case->name);
> +               goto out_with_thread;
> +       }
> +
> +       if (stop_accept_thread(thread, &cleanup_timed_out) == KSFT_FAIL) {
> +               ksft_print_msg("%s: failed to stop blocking accept waiter\n",
> +                              test_case->name);
> +               goto out_with_thread;
> +       }
> +       thread_joined = true;
> +
> +       if (result.accepted_fd >= 0) {
> +               ksft_print_msg("%s: blocking accept() completed only in cleanup\n",
> +                              test_case->name);
> +               goto out_with_thread;
> +       }
> +
> +       if (result.err != EINTR) {
> +               ksft_print_msg("%s: blocking accept() returned err=%d (%s)\n",
> +                              test_case->name, result.err,
> +                              strerror(result.err));
> +               goto out_with_thread;
> +       }
> +
> +       probe = accept4(listener_b, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
> +       if (probe >= 0) {
> +               ksft_print_msg("%s: accept queue was populated, but blocking accept() timed out\n",
> +                              test_case->name);
> +       } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
> +               ksft_print_msg("%s: target listener had no queued child after migration\n",
> +                              test_case->name);
> +       } else {
> +               ksft_perror("accept4(listener_b)");
> +       }
> +
> +out_with_thread:
> +       close_fd(&probe);
> +       if (!thread_joined) {
> +               if (stop_accept_thread(thread, &cleanup_timed_out) == KSFT_FAIL) {
> +                       ksft_print_msg("%s: failed to stop blocking accept waiter\n",
> +                                      test_case->name);
> +                       ret = KSFT_FAIL;
> +                       goto out;
> +               }
> +
> +               thread_joined = true;
> +       }
> +       if (thread_joined)
> +               close_fd(&result.accepted_fd);
> +
> +out:
> +       close_fd(&client);
> +       close_fd(&listener_b);
> +       close_fd(&listener_a);
> +
> +       return ret;
> +}
> +
> +int main(void)
> +{
> +       int status = KSFT_PASS;
> +       int ret;
> +       int i;
> +
> +       setup_netns();
> +
> +       ksft_print_header();
> +       ksft_set_plan(ARRAY_SIZE(test_cases));
> +
> +       for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
> +               ret = run_test(&test_cases[i]);
> +               ksft_test_result_code(ret, test_cases[i].name, NULL);
> +
> +               if (ret == KSFT_FAIL)
> +                       status = KSFT_FAIL;
> +       }
> +
> +       if (status == KSFT_FAIL)
> +               ksft_exit_fail();
> +
> +       ksft_finished();
> +}
> diff --git a/tools/testing/selftests/net/reuseport_migrate_epoll.c b/tools/testing/selftests/net/reuseport_migrate_epoll.c
> new file mode 100644
> index 000000000..9cbfb58c4
> --- /dev/null
> +++ b/tools/testing/selftests/net/reuseport_migrate_epoll.c
> @@ -0,0 +1,353 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define _GNU_SOURCE
> +
> +#include <arpa/inet.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <netinet/in.h>
> +#include <sched.h>
> +#include <stdbool.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/epoll.h>
> +#include <sys/socket.h>
> +#include <unistd.h>
> +
> +#include "../kselftest.h"
> +
> +#define EPOLL_TIMEOUT_MS 500
> +#define TCP_MIGRATE_REQ_PATH "/proc/sys/net/ipv4/tcp_migrate_req"
> +
> +struct reuseport_migrate_case {
> +       const char *name;
> +       int family;
> +       const char *addr;
> +};
> +
> +static const struct reuseport_migrate_case test_cases[] = {
> +       {
> +               .name = "ipv4 epoll wake after reuseport migration",
> +               .family = AF_INET,
> +               .addr = "127.0.0.1",
> +       },
> +       {
> +               .name = "ipv6 epoll wake after reuseport migration",
> +               .family = AF_INET6,
> +               .addr = "::1",
> +       },
> +};
> +
> +static void close_fd(int *fd)
> +{
> +       if (*fd >= 0) {
> +               close(*fd);
> +               *fd = -1;
> +       }
> +}
> +
> +static bool unsupported_addr_err(int family, int err)
> +{
> +       return family == AF_INET6 &&
> +               (err == EAFNOSUPPORT ||
> +                err == EPROTONOSUPPORT ||
> +                err == EADDRNOTAVAIL);
> +}
> +
> +static int make_sockaddr(const struct reuseport_migrate_case *test_case,
> +                        unsigned short port,
> +                        struct sockaddr_storage *addr,
> +                        socklen_t *addrlen)
> +{
> +       memset(addr, 0, sizeof(*addr));
> +
> +       if (test_case->family == AF_INET) {
> +               struct sockaddr_in *addr4 = (struct sockaddr_in *)addr;
> +
> +               addr4->sin_family = AF_INET;
> +               addr4->sin_port = htons(port);
> +               if (inet_pton(AF_INET, test_case->addr, &addr4->sin_addr) != 1)
> +                       return -1;
> +
> +               *addrlen = sizeof(*addr4);
> +               return 0;
> +       }
> +
> +       if (test_case->family == AF_INET6) {
> +               struct sockaddr_in6 *addr6 = (struct sockaddr_in6 *)addr;
> +
> +               addr6->sin6_family = AF_INET6;
> +               addr6->sin6_port = htons(port);
> +               if (inet_pton(AF_INET6, test_case->addr, &addr6->sin6_addr) != 1)
> +                       return -1;
> +
> +               *addrlen = sizeof(*addr6);
> +               return 0;
> +       }
> +
> +       return -1;
> +}
> +
> +static int create_reuseport_socket(const struct reuseport_migrate_case *test_case)
> +{
> +       int one = 1;
> +       int fd;
> +
> +       fd = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> +       if (fd < 0)
> +               return -1;
> +
> +       if (test_case->family == AF_INET6 &&
> +           setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one))) {
> +               close(fd);
> +               return -1;
> +       }
> +
> +       if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one))) {
> +               close(fd);
> +               return -1;
> +       }
> +
> +       return fd;
> +}
> +
> +static int set_nonblocking(int fd)
> +{
> +       int flags;
> +
> +       flags = fcntl(fd, F_GETFL);
> +       if (flags < 0)
> +               return -1;
> +
> +       return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
> +}
> +
> +static int enable_tcp_migrate_req(void)
> +{
> +       int len;
> +       int fd;
> +
> +       fd = open(TCP_MIGRATE_REQ_PATH, O_RDWR | O_CLOEXEC);
> +       if (fd < 0) {
> +               if (errno == ENOENT || errno == EACCES ||
> +                   errno == EPERM || errno == EROFS)
> +                       return KSFT_SKIP;
> +               return KSFT_FAIL;
> +       }
> +
> +       len = write(fd, "1", 1);
> +       if (len != 1) {
> +               if (errno == EACCES || errno == EPERM || errno == EROFS) {
> +                       close(fd);
> +                       return KSFT_SKIP;
> +               }
> +
> +               close(fd);
> +               return KSFT_FAIL;
> +       }
> +
> +       close(fd);
> +       return KSFT_PASS;
> +}
> +
> +static void setup_netns(void)
> +{
> +       int ret;
> +
> +       if (unshare(CLONE_NEWNET))
> +               ksft_exit_skip("unshare(CLONE_NEWNET): %s\n", strerror(errno));
> +
> +       if (system("ip link set lo up"))
> +               ksft_exit_skip("failed to bring up lo interface in netns\n");
> +
> +       ret = enable_tcp_migrate_req();
> +       if (ret == KSFT_SKIP)
> +               ksft_exit_skip("failed to enable tcp_migrate_req\n");
> +       if (ret == KSFT_FAIL)
> +               ksft_exit_fail_msg("failed to enable tcp_migrate_req\n");
> +}
> +
> +static int run_test(const struct reuseport_migrate_case *test_case)
> +{
> +       struct sockaddr_storage addr;
> +       struct epoll_event ev = {
> +               .events = EPOLLIN,
> +       };
> +       int listener_a = -1;
> +       int listener_b = -1;
> +       int ret = KSFT_FAIL;
> +       socklen_t addrlen;
> +       int accepted = -1;
> +       int client = -1;
> +       int epfd = -1;
> +       int n;
> +
> +       if (make_sockaddr(test_case, 0, &addr, &addrlen)) {
> +               ksft_print_msg("%s: failed to build socket address\n",
> +                              test_case->name);
> +               goto out;
> +       }
> +
> +       listener_a = create_reuseport_socket(test_case);
> +       if (listener_a < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(listener_a)");
> +               goto out;
> +       }
> +
> +       if (bind(listener_a, (struct sockaddr *)&addr, addrlen)) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("bind(listener_a)");
> +               goto out;
> +       }
> +
> +       if (listen(listener_a, 1)) {
> +               ksft_perror("listen(listener_a)");
> +               goto out;
> +       }
> +
> +       addrlen = sizeof(addr);
> +       if (getsockname(listener_a, (struct sockaddr *)&addr, &addrlen)) {
> +               ksft_perror("getsockname(listener_a)");
> +               goto out;
> +       }
> +
> +       listener_b = create_reuseport_socket(test_case);
> +       if (listener_b < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(listener_b)");
> +               goto out;
> +       }
> +
> +       if (bind(listener_b, (struct sockaddr *)&addr, addrlen)) {
> +               ksft_perror("bind(listener_b)");
> +               goto out;
> +       }
> +
> +       client = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> +       if (client < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(client)");
> +               goto out;
> +       }
> +
> +       /* Connect while only listener_a is listening, ensuring the
> +        * child lands in listener_a's accept queue deterministically.
> +        */
> +       if (connect(client, (struct sockaddr *)&addr, addrlen)) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("connect(client)");
> +               goto out;
> +       }
> +
> +       if (listen(listener_b, 1)) {
> +               ksft_perror("listen(listener_b)");
> +               goto out;
> +       }
> +
> +       if (set_nonblocking(listener_b)) {
> +               ksft_perror("set_nonblocking(listener_b)");
> +               goto out;
> +       }
> +
> +       epfd = epoll_create1(EPOLL_CLOEXEC);
> +       if (epfd < 0) {
> +               ksft_perror("epoll_create1");
> +               goto out;
> +       }
> +
> +       ev.data.fd = listener_b;
> +       if (epoll_ctl(epfd, EPOLL_CTL_ADD, listener_b, &ev)) {
> +               ksft_perror("epoll_ctl(ADD listener_b)");
> +               goto out;
> +       }
> +
> +       close_fd(&listener_a);
> +
> +       n = epoll_wait(epfd, &ev, 1, EPOLL_TIMEOUT_MS);
> +       if (n < 0) {
> +               ksft_perror("epoll_wait");
> +               goto out;
> +       }
> +
> +       accepted = accept4(listener_b, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
> +       if (accepted < 0) {
> +               if (errno == EAGAIN || errno == EWOULDBLOCK) {
> +                       ksft_print_msg("%s: target listener had no queued child after migration\n",
> +                                      test_case->name);
> +                       goto out;
> +               }
> +
> +               ksft_perror("accept4(listener_b)");
> +               goto out;
> +       }
> +
> +       if (n != 1) {
> +               ksft_print_msg("%s: accept queue was populated, but epoll_wait() timed out\n",
> +                              test_case->name);
> +               goto out;
> +       }
> +
> +       if (ev.data.fd != listener_b || !(ev.events & EPOLLIN)) {
> +               ksft_print_msg("%s: unexpected epoll event fd=%d events=%#x\n",
> +                              test_case->name, ev.data.fd, ev.events);
> +               goto out;
> +       }
> +
> +       ret = KSFT_PASS;
> +
> +out:
> +       close_fd(&accepted);
> +       close_fd(&epfd);
> +       close_fd(&client);
> +       close_fd(&listener_b);
> +       close_fd(&listener_a);
> +
> +       return ret;
> +}
> +
> +int main(void)
> +{
> +       int status = KSFT_PASS;
> +       int ret;
> +       int i;
> +
> +       setup_netns();
> +
> +       ksft_print_header();
> +       ksft_set_plan(ARRAY_SIZE(test_cases));
> +
> +       for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
> +               ret = run_test(&test_cases[i]);
> +               ksft_test_result_code(ret, test_cases[i].name, NULL);
> +
> +               if (ret == KSFT_FAIL)
> +                       status = KSFT_FAIL;
> +       }
> +
> +       if (status == KSFT_FAIL)
> +               ksft_exit_fail();
> +
> +       ksft_finished();
> +}
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net 1/2] tcp: call sk_data_ready() after listener migration
  2026-04-18  4:16 ` [PATCH net 1/2] tcp: call sk_data_ready() after listener migration Zhenzhong Wu
@ 2026-04-18  6:02   ` Eric Dumazet
  2026-04-18 13:30     ` 上勾拳
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2026-04-18  6:02 UTC (permalink / raw)
  To: Zhenzhong Wu
  Cc: netdev, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, stable

On Fri, Apr 17, 2026 at 9:17 PM Zhenzhong Wu <jt26wzz@gmail.com> wrote:
>
> When inet_csk_listen_stop() migrates an established child socket from
> a closing listener to another socket in the same SO_REUSEPORT group,
> the target listener gets a new accept-queue entry via
> inet_csk_reqsk_queue_add(), but that path never notifies the target
> listener's waiters.
>
> As a result, a nonblocking accept() still succeeds because it checks
> the accept queue directly, but waiters that sleep for listener
> readiness can remain asleep until another connection generates a
> wakeup. This affects poll()/epoll_wait()-based waiters, and can also
> leave a blocking accept() asleep after migration even though the
> child is already in the target listener's accept queue.
>
> This was observed in a local test where listener A completed the
> handshake, queued the child, and was closed before userspace called
> accept(). The child was migrated to listener B, but listener B never
> received a wakeup for the migrated accept-queue entry.
>
> Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
> in inet_csk_listen_stop().
>
> The reqsk_timer_handler() path does not need the same change:
> half-open requests only become readable to userspace when the final
> ACK completes the handshake, and tcp_child_process() already wakes
> the listener in that case.
>
> Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
> Cc: stable@vger.kernel.org
> Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
> ---
>  net/ipv4/inet_connection_sock.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 4ac3ae1bc..da1ce082f 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk)
>                                         __NET_INC_STATS(sock_net(nsk),
>                                                         LINUX_MIB_TCPMIGRATEREQSUCCESS);
>                                         reqsk_migrate_reset(req);
> +                                       READ_ONCE(nsk->sk_data_ready)(nsk);

I think this is adding a potential UAF (Use After Free).
@nsk might have been freed already by another thread/cpu.
Note the existing code already has similar issues.

Untested patch:

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ac3ae1bc1afc3a39f2790e39b4dda877dc3272b..287b6e01c4f71bfec3dd2a708f316224d9eb4a64
100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1479,6 +1479,7 @@ void inet_csk_listen_stop(struct sock *sk)
                        if (nreq) {
                                refcount_set(&nreq->rsk_refcnt, 1);

+                               rcu_read_lock();
                                if (inet_csk_reqsk_queue_add(nsk, nreq, child)) {
                                        __NET_INC_STATS(sock_net(nsk),
                                                        LINUX_MIB_TCPMIGRATEREQSUCCESS);
@@ -1489,7 +1490,7 @@ void inet_csk_listen_stop(struct sock *sk)
                                        reqsk_migrate_reset(nreq);
                                        __reqsk_free(nreq);
                                }
-
+                               rcu_read_unlock();
                                /* inet_csk_reqsk_queue_add() has already
                                 * called inet_child_forget() on failure case.
                                 */

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net 1/2] tcp: call sk_data_ready() after listener migration
  2026-04-18  6:02   ` Eric Dumazet
@ 2026-04-18 13:30     ` 上勾拳
  0 siblings, 0 replies; 6+ messages in thread
From: 上勾拳 @ 2026-04-18 13:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, stable

Thanks Eric, you're right.

After inet_csk_reqsk_queue_add() succeeds, the ref acquired in
reuseport_migrate_sock() is effectively transferred to
nreq->rsk_listener. Another CPU can then dequeue nreq (via
accept() or listener shutdown), hit reqsk_put(), and drop that
listener ref.

Since listeners are SOCK_RCU_FREE, the post-queue_add()
dereferences of nsk should be under rcu_read_lock()/
rcu_read_unlock(), which also covers the existing sock_net(nsk)
access in that path.

I also checked reqsk_timer_handler(): reqsk_queue_migrated()
there is only accounting, and once nreq becomes visible via
inet_ehash_insert(), the handler no longer appears to
dereference nsk.

I'll fold this into v2.
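
For reference, a v2 folding both changes together might look like the sketch below (untested; indentation and context lines are assumed from the hunks quoted in this thread):

```diff
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ void inet_csk_listen_stop(struct sock *sk)
                        if (nreq) {
                                refcount_set(&nreq->rsk_refcnt, 1);

+                               /* Listeners are SOCK_RCU_FREE: hold the RCU
+                                * read lock across every post-queue_add
+                                * dereference of nsk, since the ref moves to
+                                * nreq->rsk_listener and another CPU may drop
+                                * it at any time.
+                                */
+                               rcu_read_lock();
                                if (inet_csk_reqsk_queue_add(nsk, nreq, child)) {
                                        __NET_INC_STATS(sock_net(nsk),
                                                        LINUX_MIB_TCPMIGRATEREQSUCCESS);
                                        reqsk_migrate_reset(req);
+                                       /* Wake poll()/accept() waiters on the
+                                        * target listener for the migrated
+                                        * child.
+                                        */
+                                       READ_ONCE(nsk->sk_data_ready)(nsk);
                                } else {
                                        reqsk_migrate_reset(nreq);
                                        __reqsk_free(nreq);
                                }
+                               rcu_read_unlock();
```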


On Sat, Apr 18, 2026 at 2:02 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Apr 17, 2026 at 9:17 PM Zhenzhong Wu <jt26wzz@gmail.com> wrote:
> >
> > When inet_csk_listen_stop() migrates an established child socket from
> > a closing listener to another socket in the same SO_REUSEPORT group,
> > the target listener gets a new accept-queue entry via
> > inet_csk_reqsk_queue_add(), but that path never notifies the target
> > listener's waiters.
> >
> > As a result, a nonblocking accept() still succeeds because it checks
> > the accept queue directly, but waiters that sleep for listener
> > readiness can remain asleep until another connection generates a
> > wakeup. This affects poll()/epoll_wait()-based waiters, and can also
> > leave a blocking accept() asleep after migration even though the
> > child is already in the target listener's accept queue.
> >
> > This was observed in a local test where listener A completed the
> > handshake, queued the child, and was closed before userspace called
> > accept(). The child was migrated to listener B, but listener B never
> > received a wakeup for the migrated accept-queue entry.
> >
> > Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
> > in inet_csk_listen_stop().
> >
> > The reqsk_timer_handler() path does not need the same change:
> > half-open requests only become readable to userspace when the final
> > ACK completes the handshake, and tcp_child_process() already wakes
> > the listener in that case.
> >
> > Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
> > ---
> >  net/ipv4/inet_connection_sock.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index 4ac3ae1bc..da1ce082f 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk)
> >                                         __NET_INC_STATS(sock_net(nsk),
> >                                                         LINUX_MIB_TCPMIGRATEREQSUCCESS);
> >                                         reqsk_migrate_reset(req);
> > +                                       READ_ONCE(nsk->sk_data_ready)(nsk);
>
> I think this is adding a potential UAF (Use After Free).
> @nsk might have been freed already by another thread/cpu.
> Note the existing code already has similar issues.
>
> Untested patch:
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 4ac3ae1bc1afc3a39f2790e39b4dda877dc3272b..287b6e01c4f71bfec3dd2a708f316224d9eb4a64 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -1479,6 +1479,7 @@ void inet_csk_listen_stop(struct sock *sk)
>                         if (nreq) {
>                                 refcount_set(&nreq->rsk_refcnt, 1);
>
> +                               rcu_read_lock();
>                                 if (inet_csk_reqsk_queue_add(nsk, nreq, child)) {
>                                         __NET_INC_STATS(sock_net(nsk),
>                                                         LINUX_MIB_TCPMIGRATEREQSUCCESS);
> @@ -1489,7 +1490,7 @@ void inet_csk_listen_stop(struct sock *sk)
>                                         reqsk_migrate_reset(nreq);
>                                         __reqsk_free(nreq);
>                                 }
> -
> +                               rcu_read_unlock();
>                                 /* inet_csk_reqsk_queue_add() has already
>                                  * called inet_child_forget() on failure case.
>                                  */

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-04-18 13:31 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-18  4:16 [PATCH net 0/2] tcp: fix listener wakeup after reuseport migration Zhenzhong Wu
2026-04-18  4:16 ` [PATCH net 1/2] tcp: call sk_data_ready() after listener migration Zhenzhong Wu
2026-04-18  6:02   ` Eric Dumazet
2026-04-18 13:30     ` 上勾拳
2026-04-18  4:16 ` [PATCH net 2/2] selftests: net: add reuseport migration wakeup regression tests Zhenzhong Wu
2026-04-18  4:40   ` Kuniyuki Iwashima

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox