Netdev List
* [PATCH v8 net 0/6] netem: bug fixes
From: Stephen Hemminger @ 2026-04-18  3:19 UTC
  To: netdev; +Cc: jiri, jhs, horms, Stephen Hemminger

These bugs were found during an AI-assisted review of sch_netem.c,
carried out while investigating the packet duplication recursion
problem addressed in Jamal's series.

The fixes cover:

 - probability gaps in the 4-state Markov loss model
 - queue limit not accounting for reordered packets
 - PRNG reseeded on every tc change, breaking reproducibility
 - slot configuration not validated (inverted ranges, negative
   delays, negative limits)
 - slot delay arithmetic overflow for ranges above ~2.1 seconds
 - negative latency and jitter wrapping to huge time_to_send
   values via u64 arithmetic

v8 - added check for negative TCA_NETEM_LATENCY64 and TCA_NETEM_JITTER64
   - extended slot validation to cover dist_delay, dist_jitter,
     max_packets and max_bytes

Stephen Hemminger (6):
  net/sched: netem: fix probability gaps in 4-state loss model
  net/sched: netem: fix queue limit check to include reordered packets
  net/sched: netem: only reseed PRNG when seed is explicitly provided
  net/sched: netem: validate slot configuration
  net/sched: netem: fix slot delay calculation overflow
  net/sched: netem: check for negative latency and jitter

 net/sched/sch_netem.c | 76 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 64 insertions(+), 12 deletions(-)

-- 
2.53.0



* [PATCH net v8 1/6] net/sched: netem: fix probability gaps in 4-state loss model
From: Stephen Hemminger @ 2026-04-18  3:19 UTC
  To: netdev
  Cc: jiri, jhs, horms, Stephen Hemminger, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, open list
In-Reply-To: <20260418032027.900913-1-stephen@networkplumber.org>

The 4-state Markov chain in loss_4state() has gaps at the boundaries
between transition probability ranges. The comparisons use:

  if (rnd < a4)
  else if (a4 < rnd && rnd < a1 + a4)

When rnd equals a boundary value exactly, neither branch matches and
no state transition occurs. The redundant lower-bound check (a4 < rnd)
is already implied by being in the else branch.

Remove the unnecessary lower-bound comparisons so the ranges are
contiguous and every random value produces a transition, matching
the GI (General and Intuitive) loss model specification.
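
As an illustration only (not part of the patch), the gap can be
reproduced in a small userspace sketch. The names sketch_state,
NO_MATCH, next_state_old() and next_state_new() are hypothetical; only
the comparison chains mirror the TX_IN_GAP_PERIOD case of
loss_4state(). NO_MATCH stands for "no branch taken", which in the
kernel leaves the state unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for this sketch; NO_MATCH means no branch was taken. */
enum sketch_state { TX_IN_GAP, LOST_IN_GAP, LOST_IN_BURST, NO_MATCH };

/* Old chain: strict bounds on both sides leave holes at rnd == boundary. */
static enum sketch_state next_state_old(uint32_t rnd, uint32_t a1, uint32_t a4)
{
	assert(a1 <= UINT32_MAX - a4);	/* sketch assumes a1 + a4 does not wrap */

	if (rnd < a4)
		return LOST_IN_GAP;
	else if (a4 < rnd && rnd < a1 + a4)
		return LOST_IN_BURST;
	else if (a1 + a4 < rnd)
		return TX_IN_GAP;
	return NO_MATCH;
}

/* New chain: each else branch inherits the lower bound, so ranges touch. */
static enum sketch_state next_state_new(uint32_t rnd, uint32_t a1, uint32_t a4)
{
	assert(a1 <= UINT32_MAX - a4);	/* sketch assumes a1 + a4 does not wrap */

	if (rnd < a4)
		return LOST_IN_GAP;
	else if (rnd < a1 + a4)
		return LOST_IN_BURST;
	return TX_IN_GAP;
}
```

With a1 = 50 and a4 = 100, the old chain takes no branch for rnd = 100
or rnd = 150, while the new chain maps them to LOST_IN_BURST and
TX_IN_GAP respectively.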

This bug goes back to the original implementation of this model.

Fixes: 661b79725fea ("netem: revised correlated loss generator")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
 net/sched/sch_netem.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 20df1c08b1e9..8ee72cac1faf 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -227,10 +227,10 @@ static bool loss_4state(struct netem_sched_data *q)
 		if (rnd < clg->a4) {
 			clg->state = LOST_IN_GAP_PERIOD;
 			return true;
-		} else if (clg->a4 < rnd && rnd < clg->a1 + clg->a4) {
+		} else if (rnd < clg->a1 + clg->a4) {
 			clg->state = LOST_IN_BURST_PERIOD;
 			return true;
-		} else if (clg->a1 + clg->a4 < rnd) {
+		} else {
 			clg->state = TX_IN_GAP_PERIOD;
 		}
 
@@ -247,9 +247,9 @@ static bool loss_4state(struct netem_sched_data *q)
 	case LOST_IN_BURST_PERIOD:
 		if (rnd < clg->a3)
 			clg->state = TX_IN_BURST_PERIOD;
-		else if (clg->a3 < rnd && rnd < clg->a2 + clg->a3) {
+		else if (rnd < clg->a2 + clg->a3) {
 			clg->state = TX_IN_GAP_PERIOD;
-		} else if (clg->a2 + clg->a3 < rnd) {
+		} else {
 			clg->state = LOST_IN_BURST_PERIOD;
 			return true;
 		}
-- 
2.53.0



* [PATCH net v8 2/6] net/sched: netem: fix queue limit check to include reordered packets
From: Stephen Hemminger @ 2026-04-18  3:19 UTC
  To: netdev
  Cc: jiri, jhs, horms, Stephen Hemminger, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Martin Ottens,
	open list
In-Reply-To: <20260418032027.900913-1-stephen@networkplumber.org>

The queue limit check in netem_enqueue() uses q->t_len which only
counts packets in the internal tfifo. Packets placed in sch->q by
the reorder path (__qdisc_enqueue_head) are not counted, allowing
the total queue occupancy to exceed sch->limit under reordering.

Include sch->q.qlen in the limit check.

Fixes: f8d4bc455047 ("net/sched: netem: account for backlog updates from child qdisc")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
 net/sched/sch_netem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 8ee72cac1faf..d400a730eadd 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -524,7 +524,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 				1 << get_random_u32_below(8);
 	}
 
-	if (unlikely(q->t_len >= sch->limit)) {
+	if (unlikely(sch->q.qlen >= sch->limit)) {
 		/* re-link segs, so that qdisc_drop_all() frees them all */
 		skb->next = segs;
 		qdisc_drop_all(skb, sch, to_free);
-- 
2.53.0



* [PATCH net v8 3/6] net/sched: netem: only reseed PRNG when seed is explicitly provided
From: Stephen Hemminger @ 2026-04-18  3:19 UTC
  To: netdev
  Cc: jiri, jhs, horms, Stephen Hemminger, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, François Michel,
	open list
In-Reply-To: <20260418032027.900913-1-stephen@networkplumber.org>

netem_change() unconditionally reseeds the PRNG on every tc change
command. If TCA_NETEM_PRNG_SEED is not specified, a new random seed
is generated, destroying reproducibility for users who set a
deterministic seed on a previous change.

Move the initial random seed generation to netem_init() and only
reseed in netem_change() when TCA_NETEM_PRNG_SEED is explicitly
provided by the user.
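
The intended behaviour can be sketched in userspace as follows. This
is a rough model under stated assumptions, not the kernel code:
struct sketch_qdisc and the sketch_*() helpers are hypothetical, and a
plain xorshift64 generator stands in for the kernel PRNG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for netem's PRNG state; not the kernel structure. */
struct sketch_qdisc {
	uint64_t seed;
	uint64_t state;
};

static void sketch_seed(struct sketch_qdisc *q, uint64_t seed)
{
	q->seed = seed;
	/* xorshift64 must not start from zero, so substitute a constant */
	q->state = seed ? seed : 0x9e3779b97f4a7c15ULL;
}

/* xorshift64 step, used here only as a simple deterministic generator */
static uint64_t sketch_next(struct sketch_qdisc *q)
{
	uint64_t x = q->state;

	assert(x != 0);		/* must have been seeded before first use */
	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	q->state = x;
	return x;
}

/* init: install the one-time random seed (supplied by the caller here) */
static void sketch_init(struct sketch_qdisc *q, uint64_t random_seed)
{
	sketch_seed(q, random_seed);
}

/* change: reseed only when the caller passes a seed attribute */
static void sketch_change(struct sketch_qdisc *q, const uint64_t *seed_attr)
{
	if (seed_attr)
		sketch_seed(q, *seed_attr);
}
```

In this model, a sketch_change() call with a NULL seed_attr leaves the
generator state untouched, so the sequence started by an earlier
explicit seed continues uninterrupted.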

Fixes: 4072d97ddc44 ("netem: add prng attribute to netem_sched_data")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
 net/sched/sch_netem.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index d400a730eadd..556f9747f0e7 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -1112,11 +1112,10 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
 	/* capping jitter to the range acceptable by tabledist() */
 	q->jitter = min_t(s64, abs(q->jitter), INT_MAX);
 
-	if (tb[TCA_NETEM_PRNG_SEED])
+	if (tb[TCA_NETEM_PRNG_SEED]) {
 		q->prng.seed = nla_get_u64(tb[TCA_NETEM_PRNG_SEED]);
-	else
-		q->prng.seed = get_random_u64();
-	prandom_seed_state(&q->prng.prng_state, q->prng.seed);
+		prandom_seed_state(&q->prng.prng_state, q->prng.seed);
+	}
 
 unlock:
 	sch_tree_unlock(sch);
@@ -1139,6 +1138,9 @@ static int netem_init(struct Qdisc *sch, struct nlattr *opt,
 		return -EINVAL;
 
 	q->loss_model = CLG_RANDOM;
+	q->prng.seed = get_random_u64();
+	prandom_seed_state(&q->prng.prng_state, q->prng.seed);
+
 	ret = netem_change(sch, opt, extack);
 	if (ret)
 		pr_info("netem: change failed\n");
-- 
2.53.0



* [PATCH net v8 4/6] net/sched: netem: validate slot configuration
From: Stephen Hemminger @ 2026-04-18  3:19 UTC
  To: netdev
  Cc: jiri, jhs, horms, Stephen Hemminger, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Dave Taht, open list
In-Reply-To: <20260418032027.900913-1-stephen@networkplumber.org>

Reject slot configurations that have no defensible meaning:

  - negative min_delay or max_delay
  - min_delay greater than max_delay
  - negative dist_delay or dist_jitter
  - negative max_packets or max_bytes

Negative or out-of-order delays underflow in get_slot_next(),
producing garbage intervals. Negative limits trip the per-slot
accounting (packets_left/bytes_left <= 0) on the first packet of
every slot, defeating the rate-limiting half of the slot feature.

Note that dist_jitter has been silently coerced to its absolute
value by get_slot() since the feature was introduced; rejecting
negatives here converts that silent coercion into -EINVAL. The
abs() can be removed in a follow-up.

Fixes: 836af83b54e3 ("netem: support delivering packets in delayed time slots")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 net/sched/sch_netem.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 556f9747f0e7..640b51be807a 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -827,6 +827,29 @@ static int get_dist_table(struct disttable **tbl, const struct nlattr *attr)
 	return 0;
 }
 
+static int validate_slot(const struct nlattr *attr, struct netlink_ext_ack *extack)
+{
+	const struct tc_netem_slot *c = nla_data(attr);
+
+	if (c->min_delay < 0 || c->max_delay < 0) {
+		NL_SET_ERR_MSG_ATTR(extack, attr, "negative slot delay");
+		return -EINVAL;
+	}
+	if (c->min_delay > c->max_delay) {
+		NL_SET_ERR_MSG_ATTR(extack, attr, "slot min delay greater than max delay");
+		return -EINVAL;
+	}
+	if (c->dist_delay < 0 || c->dist_jitter < 0) {
+		NL_SET_ERR_MSG_ATTR(extack, attr, "negative dist delay");
+		return -EINVAL;
+	}
+	if (c->max_packets < 0 || c->max_bytes < 0) {
+		NL_SET_ERR_MSG_ATTR(extack, attr, "negative slot limit");
+		return -EINVAL;
+	}
+	return 0;
+}
+
 static void get_slot(struct netem_sched_data *q, const struct nlattr *attr)
 {
 	const struct tc_netem_slot *c = nla_data(attr);
@@ -1040,6 +1063,12 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
 			goto table_free;
 	}
 
+	if (tb[TCA_NETEM_SLOT]) {
+		ret = validate_slot(tb[TCA_NETEM_SLOT], extack);
+		if (ret)
+			goto table_free;
+	}
+
 	sch_tree_lock(sch);
 	/* backup q->clg and q->loss_model */
 	old_clg = q->clg;
-- 
2.53.0



* [PATCH net v8 5/6] net/sched: netem: fix slot delay calculation overflow
From: Stephen Hemminger @ 2026-04-18  3:19 UTC
  To: netdev
  Cc: jiri, jhs, horms, Stephen Hemminger, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Yousuk Seung,
	Neal Cardwell, open list
In-Reply-To: <20260418032027.900913-1-stephen@networkplumber.org>

get_slot_next() computes a random delay between min_delay and
max_delay using:

  get_random_u32() * (max_delay - min_delay) >> 32

This overflows signed 64-bit arithmetic when the delay range exceeds
approximately 2.1 seconds (2^31 nanoseconds), producing a negative
result that effectively disables slot-based pacing. This is a
realistic configuration for WAN emulation (e.g., slot 1s 5s).

Use mul_u64_u32_shr() which handles the widening multiply without
overflow.
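
The arithmetic can be checked in userspace with the sketch below. It
is illustrative only: slot_offset_old() and slot_offset_new() are
hypothetical names, the old expression is modeled with unsigned
wrap-around to avoid undefined behaviour, and the new one uses a
split multiply that is equivalent to a widening multiply for a shift
of exactly 32. It assumes two's complement and an arithmetic right
shift of negative values, as on GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old expression: the product is truncated to 64 bits, reinterpreted as
 * signed, then shifted right by 32.
 */
static int64_t slot_offset_old(uint32_t rnd, int64_t range)
{
	uint64_t product = (uint64_t)rnd * (uint64_t)range;

	return (int64_t)product >> 32;
}

/*
 * Widening multiply for a shift of exactly 32: split the 64-bit range into
 * 32-bit halves so that no intermediate product exceeds 64 bits.
 */
static uint64_t slot_offset_new(uint32_t rnd, int64_t range)
{
	uint64_t r, lo, hi;

	assert(range >= 0);	/* sketch assumes max_delay >= min_delay */
	r = (uint64_t)range;
	lo = (r & 0xffffffffULL) * rnd;
	hi = (r >> 32) * rnd;

	return hi + (lo >> 32);
}
```

For a 4 second range (slot 1s 5s) and the largest random value,
slot_offset_old() goes negative while slot_offset_new() stays just
below the range; for ranges well under 2^31 ns the two agree.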

Fixes: 0a9fe5c375b5 ("netem: slotting with non-uniform distribution")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
 net/sched/sch_netem.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 640b51be807a..475c14b3dbdb 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -659,9 +659,8 @@ static void get_slot_next(struct netem_sched_data *q, u64 now)
 
 	if (!q->slot_dist)
 		next_delay = q->slot_config.min_delay +
-				(get_random_u32() *
-				 (q->slot_config.max_delay -
-				  q->slot_config.min_delay) >> 32);
+			mul_u64_u32_shr(q->slot_config.max_delay - q->slot_config.min_delay,
+					get_random_u32(), 32);
 	else
 		next_delay = tabledist(q->slot_config.dist_delay,
 				       (s32)(q->slot_config.dist_jitter),
-- 
2.53.0



* [PATCH net v8 6/6] net/sched: netem: check for negative latency and jitter
From: Stephen Hemminger @ 2026-04-18  3:19 UTC
  To: netdev
  Cc: jiri, jhs, horms, Stephen Hemminger, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Dave Taht, open list
In-Reply-To: <20260418032027.900913-1-stephen@networkplumber.org>

Reject requests with negative latency or jitter. A negative value
added to the current timestamp (u64) wraps to an enormous
time_to_send, disabling dequeue. The original UAPI used u32 for these
values; the conversion to 64-bit time values via TCA_NETEM_LATENCY64
and TCA_NETEM_JITTER64 allowed signed values to reach the kernel
without validation.

Jitter is already silently clamped by an abs() in netem_change();
that abs() can be removed in a follow-up once this rejection is in
place.
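
A minimal userspace sketch of the wrap and of the check follows. The
sketch_*() names are hypothetical, and the example assumes the
magnitude of the negative delay exceeds the current timestamp, which
is the case where the unsigned sum wraps past zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Unsigned addition of a signed delay, as in now (u64) + delay (s64). */
static uint64_t sketch_time_to_send(uint64_t now, int64_t delay)
{
	return now + (uint64_t)delay;
}

/* Hypothetical analogue of the new check: reject negative time values. */
static int sketch_validate_time(int64_t value)
{
	return value < 0 ? -EINVAL : 0;
}

/* A validated delay is non-negative, so it cannot pull the sum below now. */
static uint64_t sketch_checked_time_to_send(uint64_t now, int64_t delay)
{
	assert(sketch_validate_time(delay) == 0);
	return now + (uint64_t)delay;
}
```

With now = 1000 and delay = -5000, sketch_time_to_send() returns a
value 4000 below 2^64, far in the future, whereas
sketch_validate_time() rejects the same delay up front.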

Fixes: 99803171ef04 ("netem: add uapi to express delay and jitter in nanoseconds")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 net/sched/sch_netem.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 475c14b3dbdb..bc18e1976b6e 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -826,6 +826,16 @@ static int get_dist_table(struct disttable **tbl, const struct nlattr *attr)
 	return 0;
 }
 
+static int validate_time(const struct nlattr *attr, const char *name,
+			 struct netlink_ext_ack *extack)
+{
+	if (nla_get_s64(attr) < 0) {
+		NL_SET_ERR_MSG_ATTR_FMT(extack, attr, "negative %s", name);
+		return -EINVAL;
+	}
+	return 0;
+}
+
 static int validate_slot(const struct nlattr *attr, struct netlink_ext_ack *extack)
 {
 	const struct tc_netem_slot *c = nla_data(attr);
@@ -1068,6 +1078,18 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
 			goto table_free;
 	}
 
+	if (tb[TCA_NETEM_LATENCY64]) {
+		ret = validate_time(tb[TCA_NETEM_LATENCY64], "latency", extack);
+		if (ret)
+			goto table_free;
+	}
+
+	if (tb[TCA_NETEM_JITTER64]) {
+		ret = validate_time(tb[TCA_NETEM_JITTER64], "jitter", extack);
+		if (ret)
+			goto table_free;
+	}
+
 	sch_tree_lock(sch);
 	/* backup q->clg and q->loss_model */
 	old_clg = q->clg;
-- 
2.53.0



* [PATCH net 0/2] tcp: fix listener wakeup after reuseport migration
From: Zhenzhong Wu @ 2026-04-18  4:16 UTC
  To: netdev
  Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu

Hi,

this small series fixes a missing wakeup after listener migration in
the SO_REUSEPORT close path and adds regression selftests.

The issue shows up when a fully established child has already been
queued on listener A, userspace has not accepted it yet, and
listener A is then closed. The kernel migrates that child to
listener B in the same SO_REUSEPORT group via
inet_csk_reqsk_queue_add(), but the target listener's waiters are
not notified.

As a result, a nonblocking accept() still succeeds because it checks
the accept queue directly, but waiters that sleep for listener
readiness can remain asleep until another connection generates a
wakeup. This affects poll()/epoll_wait()-based waiters, and can also
leave a blocking accept() asleep after migration even though the
child is already in the target listener's accept queue.

The fix is to notify the target listener after a successful
inet_csk_reqsk_queue_add() in inet_csk_listen_stop().

I also checked the half-open migration path in
reqsk_timer_handler(). That path does not need an extra wakeup here
because the listener becomes readable only after the final ACK
completes the handshake, and tcp_child_process() already wakes the
parent listener at that point.

The series adds selftests under tools/testing/selftests/net/ that
reproduce the regression for both IPv4 and IPv6. They cover both
epoll-based waiters and a blocking accept() waiter.

Patch 1 contains only the runtime fix so it can stand on its own and
be considered for stable backporting. Patch 2 adds the selftest
coverage.

Testing:

On an unpatched host kernel:

  unshare -Ur sh -c \
    './tools/testing/selftests/net/reuseport_migrate_epoll'
  unshare -Ur sh -c \
    './tools/testing/selftests/net/reuseport_migrate_accept'

The epoll selftest fails for both IPv4 and IPv6 with:

  accept queue was populated, but epoll_wait() timed out

The blocking accept selftest fails for both IPv4 and IPv6, for example
with:

  blocking accept() completed only in cleanup

On a patched kernel booted under QEMU with a minimal initramfs, both
selftests pass:

  ok 1 ipv4 epoll wake after reuseport migration
  ok 2 ipv6 epoll wake after reuseport migration
  reuseport_migrate_epoll_RC=0

  ok 1 ipv4 blocking accept wake after reuseport migration
  ok 2 ipv6 blocking accept wake after reuseport migration
  reuseport_migrate_accept_RC=0

Zhenzhong Wu (2):
  tcp: call sk_data_ready() after listener migration
  selftests: net: add reuseport migration wakeup regression tests

 net/ipv4/inet_connection_sock.c               |   1 +
 tools/testing/selftests/net/Makefile          |   3 +
 .../selftests/net/reuseport_migrate_accept.c  | 533 ++++++++++++++++++
 .../selftests/net/reuseport_migrate_epoll.c   | 353 ++++++++++++
 4 files changed, 890 insertions(+)
 create mode 100644 tools/testing/selftests/net/reuseport_migrate_accept.c
 create mode 100644 tools/testing/selftests/net/reuseport_migrate_epoll.c


base-commit: 52bcb57a4e8a0865a76c587c2451906342ae1b2d
-- 
2.43.0


* [PATCH net 1/2] tcp: call sk_data_ready() after listener migration
From: Zhenzhong Wu @ 2026-04-18  4:16 UTC
  To: netdev
  Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu,
	stable
In-Reply-To: <20260418041633.691435-1-jt26wzz@gmail.com>

When inet_csk_listen_stop() migrates an established child socket from
a closing listener to another socket in the same SO_REUSEPORT group,
the target listener gets a new accept-queue entry via
inet_csk_reqsk_queue_add(), but that path never notifies the target
listener's waiters.

As a result, a nonblocking accept() still succeeds because it checks
the accept queue directly, but waiters that sleep for listener
readiness can remain asleep until another connection generates a
wakeup. This affects poll()/epoll_wait()-based waiters, and can also
leave a blocking accept() asleep after migration even though the
child is already in the target listener's accept queue.

This was observed in a local test where listener A completed the
handshake, queued the child, and was closed before userspace called
accept(). The child was migrated to listener B, but listener B never
received a wakeup for the migrated accept-queue entry.

Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
in inet_csk_listen_stop().

The reqsk_timer_handler() path does not need the same change:
half-open requests only become readable to userspace when the final
ACK completes the handshake, and tcp_child_process() already wakes
the listener in that case.

Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
Cc: stable@vger.kernel.org
Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
 net/ipv4/inet_connection_sock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ac3ae1bc..da1ce082f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk)
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQSUCCESS);
 					reqsk_migrate_reset(req);
+					READ_ONCE(nsk->sk_data_ready)(nsk);
 				} else {
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQFAILURE);
-- 
2.43.0



* [PATCH net 2/2] selftests: net: add reuseport migration wakeup regression tests
From: Zhenzhong Wu @ 2026-04-18  4:16 UTC
  To: netdev
  Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu
In-Reply-To: <20260418041633.691435-1-jt26wzz@gmail.com>

Add selftests that reproduce missing wakeups on the target listener
after SO_REUSEPORT migration from inet_csk_listen_stop().

The epoll case connects while only the first listener is active so the
child lands on its accept queue, registers the second listener with
epoll, then closes the first listener to trigger migration. It verifies
that the target listener both accepts the migrated child and becomes
readable via epoll.

The blocking accept case starts a thread blocked in accept() on the
target listener, closes the first listener to trigger migration, and
verifies that the blocked accept() wakes and returns the migrated
child. Wait until the helper thread is actually asleep in accept()
before triggering migration so the test does not race waiter
registration.

Run the tests in a private network namespace and enable
net.ipv4.tcp_migrate_req=1 there so they can exercise the migration
path without relying on a sk_reuseport/migrate BPF program. Treat a
missing or unwritable tcp_migrate_req sysctl as SKIP. Run both
scenarios for IPv4 and IPv6.

These tests cover the bug fixed by the preceding patch.

Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
 tools/testing/selftests/net/Makefile          |   3 +
 .../selftests/net/reuseport_migrate_accept.c  | 533 ++++++++++++++++++
 .../selftests/net/reuseport_migrate_epoll.c   | 353 ++++++++++++
 3 files changed, 889 insertions(+)
 create mode 100644 tools/testing/selftests/net/reuseport_migrate_accept.c
 create mode 100644 tools/testing/selftests/net/reuseport_migrate_epoll.c

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index a275ed584..2f8b6c44d 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -184,6 +184,8 @@ TEST_GEN_PROGS := \
 	reuseport_bpf_cpu \
 	reuseport_bpf_numa \
 	reuseport_dualstack \
+	reuseport_migrate_accept \
+	reuseport_migrate_epoll \
 	sk_bind_sendto_listen \
 	sk_connect_zero_addr \
 	sk_so_peek_off \
@@ -232,6 +234,7 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma
 $(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto
 $(OUTPUT)/tcp_inq: LDLIBS += -lpthread
 $(OUTPUT)/bind_bhash: LDLIBS += -lpthread
+$(OUTPUT)/reuseport_migrate_accept: LDLIBS += -lpthread
 $(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/
 
 include bpf.mk
diff --git a/tools/testing/selftests/net/reuseport_migrate_accept.c b/tools/testing/selftests/net/reuseport_migrate_accept.c
new file mode 100644
index 000000000..a516843a0
--- /dev/null
+++ b/tools/testing/selftests/net/reuseport_migrate_accept.c
@@ -0,0 +1,533 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <netinet/in.h>
+#include <pthread.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdatomic.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <sys/syscall.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "../kselftest.h"
+
+#define ACCEPT_BLOCK_TIMEOUT_MS 1000
+#define ACCEPT_CLEANUP_TIMEOUT_MS 1000
+#define ACCEPT_WAKE_TIMEOUT_MS 2000
+#define TCP_MIGRATE_REQ_PATH "/proc/sys/net/ipv4/tcp_migrate_req"
+
+struct reuseport_migrate_case {
+	const char *name;
+	int family;
+	const char *addr;
+};
+
+struct accept_result {
+	int listener_fd;
+	atomic_int started;
+	atomic_int tid;
+	int accepted_fd;
+	int err;
+};
+
+static const struct reuseport_migrate_case test_cases[] = {
+	{
+		.name = "ipv4 blocking accept wake after reuseport migration",
+		.family = AF_INET,
+		.addr = "127.0.0.1",
+	},
+	{
+		.name = "ipv6 blocking accept wake after reuseport migration",
+		.family = AF_INET6,
+		.addr = "::1",
+	},
+};
+
+static void close_fd(int *fd)
+{
+	if (*fd >= 0) {
+		close(*fd);
+		*fd = -1;
+	}
+}
+
+static bool unsupported_addr_err(int family, int err)
+{
+	return family == AF_INET6 &&
+		(err == EAFNOSUPPORT ||
+		 err == EPROTONOSUPPORT ||
+		 err == EADDRNOTAVAIL);
+}
+
+static int make_sockaddr(const struct reuseport_migrate_case *test_case,
+			 unsigned short port,
+			 struct sockaddr_storage *addr,
+			 socklen_t *addrlen)
+{
+	memset(addr, 0, sizeof(*addr));
+
+	if (test_case->family == AF_INET) {
+		struct sockaddr_in *addr4 = (struct sockaddr_in *)addr;
+
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(port);
+		if (inet_pton(AF_INET, test_case->addr, &addr4->sin_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr4);
+		return 0;
+	}
+
+	if (test_case->family == AF_INET6) {
+		struct sockaddr_in6 *addr6 = (struct sockaddr_in6 *)addr;
+
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(port);
+		if (inet_pton(AF_INET6, test_case->addr, &addr6->sin6_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr6);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int create_reuseport_socket(const struct reuseport_migrate_case *test_case)
+{
+	int one = 1;
+	int fd;
+
+	fd = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (fd < 0)
+		return -1;
+
+	if (test_case->family == AF_INET6 &&
+	    setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int enable_tcp_migrate_req(void)
+{
+	int len;
+	int fd;
+
+	fd = open(TCP_MIGRATE_REQ_PATH, O_RDWR | O_CLOEXEC);
+	if (fd < 0) {
+		if (errno == ENOENT || errno == EACCES ||
+		    errno == EPERM || errno == EROFS)
+			return KSFT_SKIP;
+		return KSFT_FAIL;
+	}
+
+	len = write(fd, "1", 1);
+	if (len != 1) {
+		if (errno == EACCES || errno == EPERM || errno == EROFS) {
+			close(fd);
+			return KSFT_SKIP;
+		}
+
+		close(fd);
+		return KSFT_FAIL;
+	}
+
+	close(fd);
+	return KSFT_PASS;
+}
+
+static void setup_netns(void)
+{
+	int ret;
+
+	if (unshare(CLONE_NEWNET))
+		ksft_exit_skip("unshare(CLONE_NEWNET): %s\n", strerror(errno));
+
+	if (system("ip link set lo up"))
+		ksft_exit_skip("failed to bring up lo interface in netns\n");
+
+	ret = enable_tcp_migrate_req();
+	if (ret == KSFT_SKIP)
+		ksft_exit_skip("failed to enable tcp_migrate_req\n");
+	if (ret == KSFT_FAIL)
+		ksft_exit_fail_msg("failed to enable tcp_migrate_req\n");
+}
+
+static void noop_handler(int sig)
+{
+	(void)sig;
+}
+
+static void *accept_thread(void *arg)
+{
+	struct accept_result *result = arg;
+
+	atomic_store_explicit(&result->tid, (int)syscall(SYS_gettid),
+			      memory_order_release);
+	atomic_store_explicit(&result->started, 1, memory_order_release);
+	result->accepted_fd = accept4(result->listener_fd, NULL, NULL,
+				      SOCK_CLOEXEC);
+	if (result->accepted_fd < 0)
+		result->err = errno;
+
+	return NULL;
+}
+
+static int read_thread_state(int tid, char *state)
+{
+	char *close_paren;
+	char path[64];
+	char buf[256];
+	ssize_t len;
+	int fd;
+
+	snprintf(path, sizeof(path), "/proc/self/task/%d/stat", tid);
+
+	fd = open(path, O_RDONLY | O_CLOEXEC);
+	if (fd < 0)
+		return -errno;
+
+	len = read(fd, buf, sizeof(buf) - 1);
+	close(fd);
+	if (len < 0)
+		return -errno;
+	if (!len)
+		return -EINVAL;
+
+	buf[len] = '\0';
+	close_paren = strrchr(buf, ')');
+	if (!close_paren || close_paren[1] != ' ' || !close_paren[2])
+		return -EINVAL;
+
+	*state = close_paren[2];
+	return 0;
+}
+
+static int wait_for_accept_to_block(const struct reuseport_migrate_case *test_case,
+				    int tid)
+{
+	char state = '\0';
+	int ret;
+	int i;
+
+	/*
+	 * A started thread is not enough here: we need to know the waiter
+	 * has actually gone to sleep in accept() before closing listener_a,
+	 * otherwise migration can race ahead of waiter registration. Poll
+	 * /proc task state because the pthread APIs can tell us whether the
+	 * thread has exited, but not whether it is already blocked in the
+	 * target syscall.
+	 */
+	for (i = 0; i < ACCEPT_BLOCK_TIMEOUT_MS; i++) {
+		ret = read_thread_state(tid, &state);
+		if (!ret) {
+			if (state == 'S' || state == 'D')
+				return KSFT_PASS;
+			if (state == 'Z')
+				break;
+		} else if (ret == -ENOENT) {
+			break;
+		}
+
+		usleep(1000);
+	}
+
+	ksft_print_msg("%s: accept waiter never blocked before migration\n",
+		       test_case->name);
+	return KSFT_FAIL;
+}
+
+static int join_thread_with_timeout(pthread_t thread, int timeout_ms,
+				    bool *timed_out)
+{
+	struct timespec deadline;
+	int err;
+
+	*timed_out = false;
+
+	if (clock_gettime(CLOCK_REALTIME, &deadline))
+		return KSFT_FAIL;
+
+	deadline.tv_nsec += timeout_ms * 1000000LL;
+	deadline.tv_sec += deadline.tv_nsec / 1000000000LL;
+	deadline.tv_nsec %= 1000000000LL;
+
+	err = pthread_timedjoin_np(thread, NULL, &deadline);
+	if (!err)
+		return KSFT_PASS;
+
+	if (err != ETIMEDOUT)
+		return KSFT_FAIL;
+
+	*timed_out = true;
+	return KSFT_FAIL;
+}
+
+static int interrupt_accept_thread(pthread_t thread)
+{
+	int err;
+
+	err = pthread_kill(thread, SIGUSR1);
+	if (err && err != ESRCH)
+		return KSFT_FAIL;
+
+	return KSFT_PASS;
+}
+
+static int stop_accept_thread(pthread_t thread, bool *timed_out)
+{
+	if (interrupt_accept_thread(thread))
+		return KSFT_FAIL;
+
+	return join_thread_with_timeout(thread, ACCEPT_CLEANUP_TIMEOUT_MS,
+					timed_out);
+}
+
+static int run_test(const struct reuseport_migrate_case *test_case)
+{
+	struct accept_result result = {
+		.listener_fd = -1,
+		.started = 0,
+		.tid = -1,
+		.accepted_fd = -1,
+		.err = 0,
+	};
+	struct sockaddr_storage addr;
+	struct sigaction sa = {
+		.sa_handler = noop_handler,
+	};
+	bool thread_joined = false;
+	bool cleanup_timed_out;
+	int listener_a = -1;
+	int listener_b = -1;
+	int ret = KSFT_FAIL;
+	socklen_t addrlen;
+	pthread_t thread;
+	int client = -1;
+	bool timed_out;
+	int probe = -1;
+	int tid;
+
+	if (make_sockaddr(test_case, 0, &addr, &addrlen)) {
+		ksft_print_msg("%s: failed to build socket address\n",
+			       test_case->name);
+		goto out;
+	}
+
+	if (sigemptyset(&sa.sa_mask)) {
+		ksft_perror("sigemptyset");
+		goto out;
+	}
+
+	if (sigaction(SIGUSR1, &sa, NULL)) {
+		ksft_perror("sigaction(SIGUSR1)");
+		goto out;
+	}
+
+	listener_a = create_reuseport_socket(test_case);
+	if (listener_a < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_a)");
+		goto out;
+	}
+
+	if (bind(listener_a, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("bind(listener_a)");
+		goto out;
+	}
+
+	if (listen(listener_a, 1)) {
+		ksft_perror("listen(listener_a)");
+		goto out;
+	}
+
+	addrlen = sizeof(addr);
+	if (getsockname(listener_a, (struct sockaddr *)&addr, &addrlen)) {
+		ksft_perror("getsockname(listener_a)");
+		goto out;
+	}
+
+	listener_b = create_reuseport_socket(test_case);
+	if (listener_b < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_b)");
+		goto out;
+	}
+
+	if (bind(listener_b, (struct sockaddr *)&addr, addrlen)) {
+		ksft_perror("bind(listener_b)");
+		goto out;
+	}
+
+	client = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (client < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(client)");
+		goto out;
+	}
+
+	/* Connect while only listener_a is listening, ensuring the
+	 * child lands in listener_a's accept queue deterministically.
+	 */
+	if (connect(client, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("connect(client)");
+		goto out;
+	}
+
+	if (listen(listener_b, 1)) {
+		ksft_perror("listen(listener_b)");
+		goto out;
+	}
+
+	result.listener_fd = listener_b;
+	if (pthread_create(&thread, NULL, accept_thread, &result)) {
+		ksft_perror("pthread_create");
+		goto out;
+	}
+
+	while (!atomic_load_explicit(&result.started, memory_order_acquire))
+		sched_yield();
+
+	tid = atomic_load_explicit(&result.tid, memory_order_acquire);
+	if (wait_for_accept_to_block(test_case, tid))
+		goto out_with_thread;
+
+	close_fd(&listener_a);
+
+	ret = join_thread_with_timeout(thread, ACCEPT_WAKE_TIMEOUT_MS, &timed_out);
+	if (ret == KSFT_PASS) {
+		thread_joined = true;
+		if (result.accepted_fd < 0) {
+			ksft_print_msg("%s: blocking accept() returned err=%d (%s)\n",
+				       test_case->name, result.err,
+				       strerror(result.err));
+			ret = KSFT_FAIL;
+		}
+
+		goto out_with_thread;
+	}
+
+	if (!timed_out) {
+		ksft_print_msg("%s: join_thread_with_timeout() failed\n",
+			       test_case->name);
+		goto out_with_thread;
+	}
+
+	if (stop_accept_thread(thread, &cleanup_timed_out) == KSFT_FAIL) {
+		ksft_print_msg("%s: failed to stop blocking accept waiter\n",
+			       test_case->name);
+		goto out_with_thread;
+	}
+	thread_joined = true;
+
+	if (result.accepted_fd >= 0) {
+		ksft_print_msg("%s: blocking accept() completed only in cleanup\n",
+			       test_case->name);
+		goto out_with_thread;
+	}
+
+	if (result.err != EINTR) {
+		ksft_print_msg("%s: blocking accept() returned err=%d (%s)\n",
+			       test_case->name, result.err,
+			       strerror(result.err));
+		goto out_with_thread;
+	}
+
+	probe = accept4(listener_b, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
+	if (probe >= 0) {
+		ksft_print_msg("%s: accept queue was populated, but blocking accept() timed out\n",
+			       test_case->name);
+	} else if (errno == EAGAIN || errno == EWOULDBLOCK) {
+		ksft_print_msg("%s: target listener had no queued child after migration\n",
+			       test_case->name);
+	} else {
+		ksft_perror("accept4(listener_b)");
+	}
+
+out_with_thread:
+	close_fd(&probe);
+	if (!thread_joined) {
+		if (stop_accept_thread(thread, &cleanup_timed_out) == KSFT_FAIL) {
+			ksft_print_msg("%s: failed to stop blocking accept waiter\n",
+				       test_case->name);
+			ret = KSFT_FAIL;
+			goto out;
+		}
+
+		thread_joined = true;
+	}
+	if (thread_joined)
+		close_fd(&result.accepted_fd);
+
+out:
+	close_fd(&client);
+	close_fd(&listener_b);
+	close_fd(&listener_a);
+
+	return ret;
+}
+
+int main(void)
+{
+	int status = KSFT_PASS;
+	int ret;
+	int i;
+
+	setup_netns();
+
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(test_cases));
+
+	for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+		ret = run_test(&test_cases[i]);
+		ksft_test_result_code(ret, test_cases[i].name, NULL);
+
+		if (ret == KSFT_FAIL)
+			status = KSFT_FAIL;
+	}
+
+	if (status == KSFT_FAIL)
+		ksft_exit_fail();
+
+	ksft_finished();
+}
diff --git a/tools/testing/selftests/net/reuseport_migrate_epoll.c b/tools/testing/selftests/net/reuseport_migrate_epoll.c
new file mode 100644
index 000000000..9cbfb58c4
--- /dev/null
+++ b/tools/testing/selftests/net/reuseport_migrate_epoll.c
@@ -0,0 +1,353 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <netinet/in.h>
+#include <sched.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/epoll.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+#include "../kselftest.h"
+
+#define EPOLL_TIMEOUT_MS 500
+#define TCP_MIGRATE_REQ_PATH "/proc/sys/net/ipv4/tcp_migrate_req"
+
+struct reuseport_migrate_case {
+	const char *name;
+	int family;
+	const char *addr;
+};
+
+static const struct reuseport_migrate_case test_cases[] = {
+	{
+		.name = "ipv4 epoll wake after reuseport migration",
+		.family = AF_INET,
+		.addr = "127.0.0.1",
+	},
+	{
+		.name = "ipv6 epoll wake after reuseport migration",
+		.family = AF_INET6,
+		.addr = "::1",
+	},
+};
+
+static void close_fd(int *fd)
+{
+	if (*fd >= 0) {
+		close(*fd);
+		*fd = -1;
+	}
+}
+
+static bool unsupported_addr_err(int family, int err)
+{
+	return family == AF_INET6 &&
+		(err == EAFNOSUPPORT ||
+		 err == EPROTONOSUPPORT ||
+		 err == EADDRNOTAVAIL);
+}
+
+static int make_sockaddr(const struct reuseport_migrate_case *test_case,
+			 unsigned short port,
+			 struct sockaddr_storage *addr,
+			 socklen_t *addrlen)
+{
+	memset(addr, 0, sizeof(*addr));
+
+	if (test_case->family == AF_INET) {
+		struct sockaddr_in *addr4 = (struct sockaddr_in *)addr;
+
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(port);
+		if (inet_pton(AF_INET, test_case->addr, &addr4->sin_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr4);
+		return 0;
+	}
+
+	if (test_case->family == AF_INET6) {
+		struct sockaddr_in6 *addr6 = (struct sockaddr_in6 *)addr;
+
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(port);
+		if (inet_pton(AF_INET6, test_case->addr, &addr6->sin6_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr6);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int create_reuseport_socket(const struct reuseport_migrate_case *test_case)
+{
+	int one = 1;
+	int fd;
+
+	fd = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (fd < 0)
+		return -1;
+
+	if (test_case->family == AF_INET6 &&
+	    setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int set_nonblocking(int fd)
+{
+	int flags;
+
+	flags = fcntl(fd, F_GETFL);
+	if (flags < 0)
+		return -1;
+
+	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
+}
+
+static int enable_tcp_migrate_req(void)
+{
+	int len;
+	int fd;
+
+	fd = open(TCP_MIGRATE_REQ_PATH, O_RDWR | O_CLOEXEC);
+	if (fd < 0) {
+		if (errno == ENOENT || errno == EACCES ||
+		    errno == EPERM || errno == EROFS)
+			return KSFT_SKIP;
+		return KSFT_FAIL;
+	}
+
+	len = write(fd, "1", 1);
+	if (len != 1) {
+		if (errno == EACCES || errno == EPERM || errno == EROFS) {
+			close(fd);
+			return KSFT_SKIP;
+		}
+
+		close(fd);
+		return KSFT_FAIL;
+	}
+
+	close(fd);
+	return KSFT_PASS;
+}
+
+static void setup_netns(void)
+{
+	int ret;
+
+	if (unshare(CLONE_NEWNET))
+		ksft_exit_skip("unshare(CLONE_NEWNET): %s\n", strerror(errno));
+
+	if (system("ip link set lo up"))
+		ksft_exit_skip("failed to bring up lo interface in netns\n");
+
+	ret = enable_tcp_migrate_req();
+	if (ret == KSFT_SKIP)
+		ksft_exit_skip("failed to enable tcp_migrate_req\n");
+	if (ret == KSFT_FAIL)
+		ksft_exit_fail_msg("failed to enable tcp_migrate_req\n");
+}
+
+static int run_test(const struct reuseport_migrate_case *test_case)
+{
+	struct sockaddr_storage addr;
+	struct epoll_event ev = {
+		.events = EPOLLIN,
+	};
+	int listener_a = -1;
+	int listener_b = -1;
+	int ret = KSFT_FAIL;
+	socklen_t addrlen;
+	int accepted = -1;
+	int client = -1;
+	int epfd = -1;
+	int n;
+
+	if (make_sockaddr(test_case, 0, &addr, &addrlen)) {
+		ksft_print_msg("%s: failed to build socket address\n",
+			       test_case->name);
+		goto out;
+	}
+
+	listener_a = create_reuseport_socket(test_case);
+	if (listener_a < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_a)");
+		goto out;
+	}
+
+	if (bind(listener_a, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("bind(listener_a)");
+		goto out;
+	}
+
+	if (listen(listener_a, 1)) {
+		ksft_perror("listen(listener_a)");
+		goto out;
+	}
+
+	addrlen = sizeof(addr);
+	if (getsockname(listener_a, (struct sockaddr *)&addr, &addrlen)) {
+		ksft_perror("getsockname(listener_a)");
+		goto out;
+	}
+
+	listener_b = create_reuseport_socket(test_case);
+	if (listener_b < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_b)");
+		goto out;
+	}
+
+	if (bind(listener_b, (struct sockaddr *)&addr, addrlen)) {
+		ksft_perror("bind(listener_b)");
+		goto out;
+	}
+
+	client = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (client < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(client)");
+		goto out;
+	}
+
+	/* Connect while only listener_a is listening, ensuring the
+	 * child lands in listener_a's accept queue deterministically.
+	 */
+	if (connect(client, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("connect(client)");
+		goto out;
+	}
+
+	if (listen(listener_b, 1)) {
+		ksft_perror("listen(listener_b)");
+		goto out;
+	}
+
+	if (set_nonblocking(listener_b)) {
+		ksft_perror("set_nonblocking(listener_b)");
+		goto out;
+	}
+
+	epfd = epoll_create1(EPOLL_CLOEXEC);
+	if (epfd < 0) {
+		ksft_perror("epoll_create1");
+		goto out;
+	}
+
+	ev.data.fd = listener_b;
+	if (epoll_ctl(epfd, EPOLL_CTL_ADD, listener_b, &ev)) {
+		ksft_perror("epoll_ctl(ADD listener_b)");
+		goto out;
+	}
+
+	close_fd(&listener_a);
+
+	n = epoll_wait(epfd, &ev, 1, EPOLL_TIMEOUT_MS);
+	if (n < 0) {
+		ksft_perror("epoll_wait");
+		goto out;
+	}
+
+	accepted = accept4(listener_b, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
+	if (accepted < 0) {
+		if (errno == EAGAIN || errno == EWOULDBLOCK) {
+			ksft_print_msg("%s: target listener had no queued child after migration\n",
+				       test_case->name);
+			goto out;
+		}
+
+		ksft_perror("accept4(listener_b)");
+		goto out;
+	}
+
+	if (n != 1) {
+		ksft_print_msg("%s: accept queue was populated, but epoll_wait() timed out\n",
+			       test_case->name);
+		goto out;
+	}
+
+	if (ev.data.fd != listener_b || !(ev.events & EPOLLIN)) {
+		ksft_print_msg("%s: unexpected epoll event fd=%d events=%#x\n",
+			       test_case->name, ev.data.fd, ev.events);
+		goto out;
+	}
+
+	ret = KSFT_PASS;
+
+out:
+	close_fd(&accepted);
+	close_fd(&epfd);
+	close_fd(&client);
+	close_fd(&listener_b);
+	close_fd(&listener_a);
+
+	return ret;
+}
+
+int main(void)
+{
+	int status = KSFT_PASS;
+	int ret;
+	int i;
+
+	setup_netns();
+
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(test_cases));
+
+	for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+		ret = run_test(&test_cases[i]);
+		ksft_test_result_code(ret, test_cases[i].name, NULL);
+
+		if (ret == KSFT_FAIL)
+			status = KSFT_FAIL;
+	}
+
+	if (status == KSFT_FAIL)
+		ksft_exit_fail();
+
+	ksft_finished();
+}
-- 
2.43.0



* Re: [PATCH 4/4] drbd: switch from genl_magic macros to YNL-generated code
From: kernel test robot @ 2026-04-18  4:36 UTC (permalink / raw)
  To: Christoph Böhmwalder, Jens Axboe
  Cc: oe-kbuild-all, drbd-dev, linux-kernel, Lars Ellenberg,
	Philipp Reisner, linux-block, Donald Hunter, Eric Dumazet,
	Jakub Kicinski, netdev, Christoph Böhmwalder
In-Reply-To: <20260407173356.873887-5-christoph.boehmwalder@linbit.com>

Hi Christoph,

kernel test robot noticed the following build errors:

[auto build test ERROR on a9c4b1d37622ed01b75f94a4f68cf55f33153a31]

url:    https://github.com/intel-lab-lkp/linux/commits/Christoph-B-hmwalder/drbd-move-UAPI-headers-to-include-uapi-linux/20260417-214347
base:   a9c4b1d37622ed01b75f94a4f68cf55f33153a31
patch link:    https://lore.kernel.org/r/20260407173356.873887-5-christoph.boehmwalder%40linbit.com
patch subject: [PATCH 4/4] drbd: switch from genl_magic macros to YNL-generated code
config: x86_64-rhel-9.4 (https://download.01.org/0day-ci/archive/20260418/202604180607.iqIlyAER-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260418/202604180607.iqIlyAER-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604180607.iqIlyAER-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from <command-line>:
>> ./usr/include/linux/drbd.h:18:10: fatal error: sys/types.h: No such file or directory
      18 | #include <sys/types.h>
         |          ^~~~~~~~~~~~~
   compilation terminated.

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH net 2/2] selftests: net: add reuseport migration wakeup regression tests
From: Kuniyuki Iwashima @ 2026-04-18  4:40 UTC (permalink / raw)
  To: Zhenzhong Wu
  Cc: netdev, edumazet, ncardwell, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest
In-Reply-To: <20260418041633.691435-3-jt26wzz@gmail.com>

On Fri, Apr 17, 2026 at 9:17 PM Zhenzhong Wu <jt26wzz@gmail.com> wrote:
>
> Add selftests that reproduce missing wakeups on the target listener
> after SO_REUSEPORT migration from inet_csk_listen_stop().
>
> The epoll case connects while only the first listener is active so the
> child lands on its accept queue, registers the second listener with
> epoll, then closes the first listener to trigger migration. It verifies
> that the target listener both accepts the migrated child and becomes
> readable via epoll.
>
> The blocking accept case starts a thread blocked in accept() on the
> target listener, closes the first listener to trigger migration, and
> verifies that the blocked accept() wakes and returns the migrated
> child. Wait until the helper thread is actually asleep in accept()
> before triggering migration so the test does not race waiter
> registration.
>
> Run the tests in a private network namespace and enable
> net.ipv4.tcp_migrate_req=1 there so they can exercise the migration
> path without relying on a sk_reuseport/migrate BPF program. Treat a
> missing or unwritable tcp_migrate_req sysctl as SKIP. Run both
> scenarios for IPv4 and IPv6.
>
> These tests cover the bug fixed by the preceding patch.
>
> Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
> ---
>  tools/testing/selftests/net/Makefile          |   3 +
>  .../selftests/net/reuseport_migrate_accept.c  | 533 ++++++++++++++++++
>  .../selftests/net/reuseport_migrate_epoll.c   | 353 ++++++++++++
>  3 files changed, 889 insertions(+)
>  create mode 100644 tools/testing/selftests/net/reuseport_migrate_accept.c
>  create mode 100644 tools/testing/selftests/net/reuseport_migrate_epoll.c

Thanks for the series.

Instead of adding new tests, can you extend
tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c?

It already covers all migration scenarios, so you can just add
the target listener to epoll and call epoll_wait() with a zero timeout
before accept() to check that it returns 1 (the number of ready fds).
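
A minimal sketch of the check suggested above, for illustration only. It
is not taken from migrate_reuseport.c; the helper names epoll_ready_now()
and demo_zero_timeout_check() are hypothetical, and it uses a single
plain loopback listener rather than a migrated child. It assumes that on
loopback the connection is already in the accept queue by the time
connect() returns, which is what the tests in this series also rely on.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: number of fds ready right now. A timeout of 0
 * makes epoll_wait() return immediately instead of blocking.
 */
int epoll_ready_now(int epfd)
{
	struct epoll_event ev;

	return epoll_wait(epfd, &ev, 1, 0);
}

/* Hypothetical demo: returns the zero-timeout epoll_wait() result for a
 * loopback listener with one queued connection, or -1 if setup fails.
 */
int demo_zero_timeout_check(void)
{
	struct epoll_event ev = { .events = EPOLLIN };
	struct sockaddr_in addr;
	socklen_t addrlen = sizeof(addr);
	int listener = -1, client = -1, epfd = -1, accepted = -1;
	int ready = -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	/* sin_port stays 0 so the kernel picks a free port. */

	listener = socket(AF_INET, SOCK_STREAM, 0);
	if (listener < 0)
		goto out;
	if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) ||
	    listen(listener, 1) ||
	    getsockname(listener, (struct sockaddr *)&addr, &addrlen))
		goto out;

	/* Register the listener with epoll before the connection arrives. */
	epfd = epoll_create1(0);
	if (epfd < 0)
		goto out;
	ev.data.fd = listener;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, listener, &ev))
		goto out;

	client = socket(AF_INET, SOCK_STREAM, 0);
	if (client < 0 ||
	    connect(client, (struct sockaddr *)&addr, sizeof(addr)))
		goto out;

	/* Check readiness before accept(); one ready fd is expected. */
	ready = epoll_ready_now(epfd);

	accepted = accept(listener, NULL, NULL);
	if (accepted < 0)
		ready = -1;
out:
	if (accepted >= 0)
		close(accepted);
	if (client >= 0)
		close(client);
	if (epfd >= 0)
		close(epfd);
	if (listener >= 0)
		close(listener);
	return ready;
}
```

In the migration case, closing the first listener would sit between the
epoll_ctl() registration and the epoll_ready_now() call, with the target
listener being the one registered. A result of 0 followed by a successful
accept() would then indicate the missing wakeup.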


>
> diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> index a275ed584..2f8b6c44d 100644
> --- a/tools/testing/selftests/net/Makefile
> +++ b/tools/testing/selftests/net/Makefile
> @@ -184,6 +184,8 @@ TEST_GEN_PROGS := \
>         reuseport_bpf_cpu \
>         reuseport_bpf_numa \
>         reuseport_dualstack \
> +       reuseport_migrate_accept \
> +       reuseport_migrate_epoll \
>         sk_bind_sendto_listen \
>         sk_connect_zero_addr \
>         sk_so_peek_off \
> @@ -232,6 +234,7 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma
>  $(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto
>  $(OUTPUT)/tcp_inq: LDLIBS += -lpthread
>  $(OUTPUT)/bind_bhash: LDLIBS += -lpthread
> +$(OUTPUT)/reuseport_migrate_accept: LDLIBS += -lpthread
>  $(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/
>
>  include bpf.mk
> diff --git a/tools/testing/selftests/net/reuseport_migrate_accept.c b/tools/testing/selftests/net/reuseport_migrate_accept.c
> new file mode 100644
> index 000000000..a516843a0
> --- /dev/null
> +++ b/tools/testing/selftests/net/reuseport_migrate_accept.c
> @@ -0,0 +1,533 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define _GNU_SOURCE
> +
> +#include <arpa/inet.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <netinet/in.h>
> +#include <pthread.h>
> +#include <sched.h>
> +#include <signal.h>
> +#include <stdbool.h>
> +#include <stdatomic.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/socket.h>
> +#include <sys/syscall.h>
> +#include <time.h>
> +#include <unistd.h>
> +
> +#include "../kselftest.h"
> +
> +#define ACCEPT_BLOCK_TIMEOUT_MS 1000
> +#define ACCEPT_CLEANUP_TIMEOUT_MS 1000
> +#define ACCEPT_WAKE_TIMEOUT_MS 2000
> +#define TCP_MIGRATE_REQ_PATH "/proc/sys/net/ipv4/tcp_migrate_req"
> +
> +struct reuseport_migrate_case {
> +       const char *name;
> +       int family;
> +       const char *addr;
> +};
> +
> +struct accept_result {
> +       int listener_fd;
> +       atomic_int started;
> +       atomic_int tid;
> +       int accepted_fd;
> +       int err;
> +};
> +
> +static const struct reuseport_migrate_case test_cases[] = {
> +       {
> +               .name = "ipv4 blocking accept wake after reuseport migration",
> +               .family = AF_INET,
> +               .addr = "127.0.0.1",
> +       },
> +       {
> +               .name = "ipv6 blocking accept wake after reuseport migration",
> +               .family = AF_INET6,
> +               .addr = "::1",
> +       },
> +};
> +
> +static void close_fd(int *fd)
> +{
> +       if (*fd >= 0) {
> +               close(*fd);
> +               *fd = -1;
> +       }
> +}
> +
> +static bool unsupported_addr_err(int family, int err)
> +{
> +       return family == AF_INET6 &&
> +               (err == EAFNOSUPPORT ||
> +                err == EPROTONOSUPPORT ||
> +                err == EADDRNOTAVAIL);
> +}
> +
> +static int make_sockaddr(const struct reuseport_migrate_case *test_case,
> +                        unsigned short port,
> +                        struct sockaddr_storage *addr,
> +                        socklen_t *addrlen)
> +{
> +       memset(addr, 0, sizeof(*addr));
> +
> +       if (test_case->family == AF_INET) {
> +               struct sockaddr_in *addr4 = (struct sockaddr_in *)addr;
> +
> +               addr4->sin_family = AF_INET;
> +               addr4->sin_port = htons(port);
> +               if (inet_pton(AF_INET, test_case->addr, &addr4->sin_addr) != 1)
> +                       return -1;
> +
> +               *addrlen = sizeof(*addr4);
> +               return 0;
> +       }
> +
> +       if (test_case->family == AF_INET6) {
> +               struct sockaddr_in6 *addr6 = (struct sockaddr_in6 *)addr;
> +
> +               addr6->sin6_family = AF_INET6;
> +               addr6->sin6_port = htons(port);
> +               if (inet_pton(AF_INET6, test_case->addr, &addr6->sin6_addr) != 1)
> +                       return -1;
> +
> +               *addrlen = sizeof(*addr6);
> +               return 0;
> +       }
> +
> +       return -1;
> +}
> +
> +static int create_reuseport_socket(const struct reuseport_migrate_case *test_case)
> +{
> +       int one = 1;
> +       int fd;
> +
> +       fd = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> +       if (fd < 0)
> +               return -1;
> +
> +       if (test_case->family == AF_INET6 &&
> +           setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one))) {
> +               close(fd);
> +               return -1;
> +       }
> +
> +       if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one))) {
> +               close(fd);
> +               return -1;
> +       }
> +
> +       return fd;
> +}
> +
> +static int enable_tcp_migrate_req(void)
> +{
> +       int len;
> +       int fd;
> +
> +       fd = open(TCP_MIGRATE_REQ_PATH, O_RDWR | O_CLOEXEC);
> +       if (fd < 0) {
> +               if (errno == ENOENT || errno == EACCES ||
> +                   errno == EPERM || errno == EROFS)
> +                       return KSFT_SKIP;
> +               return KSFT_FAIL;
> +       }
> +
> +       len = write(fd, "1", 1);
> +       if (len != 1) {
> +               if (errno == EACCES || errno == EPERM || errno == EROFS) {
> +                       close(fd);
> +                       return KSFT_SKIP;
> +               }
> +
> +               close(fd);
> +               return KSFT_FAIL;
> +       }
> +
> +       close(fd);
> +       return KSFT_PASS;
> +}
> +
> +static void setup_netns(void)
> +{
> +       int ret;
> +
> +       if (unshare(CLONE_NEWNET))
> +               ksft_exit_skip("unshare(CLONE_NEWNET): %s\n", strerror(errno));
> +
> +       if (system("ip link set lo up"))
> +               ksft_exit_skip("failed to bring up lo interface in netns\n");
> +
> +       ret = enable_tcp_migrate_req();
> +       if (ret == KSFT_SKIP)
> +               ksft_exit_skip("failed to enable tcp_migrate_req\n");
> +       if (ret == KSFT_FAIL)
> +               ksft_exit_fail_msg("failed to enable tcp_migrate_req\n");
> +}
> +
> +static void noop_handler(int sig)
> +{
> +       (void)sig;
> +}
> +
> +static void *accept_thread(void *arg)
> +{
> +       struct accept_result *result = arg;
> +
> +       atomic_store_explicit(&result->tid, (int)syscall(SYS_gettid),
> +                             memory_order_release);
> +       atomic_store_explicit(&result->started, 1, memory_order_release);
> +       result->accepted_fd = accept4(result->listener_fd, NULL, NULL,
> +                                     SOCK_CLOEXEC);
> +       if (result->accepted_fd < 0)
> +               result->err = errno;
> +
> +       return NULL;
> +}
> +
> +static int read_thread_state(int tid, char *state)
> +{
> +       char *close_paren;
> +       char path[64];
> +       char buf[256];
> +       ssize_t len;
> +       int fd;
> +
> +       snprintf(path, sizeof(path), "/proc/self/task/%d/stat", tid);
> +
> +       fd = open(path, O_RDONLY | O_CLOEXEC);
> +       if (fd < 0)
> +               return -errno;
> +
> +       len = read(fd, buf, sizeof(buf) - 1);
> +       close(fd);
> +       if (len < 0)
> +               return -errno;
> +       if (!len)
> +               return -EINVAL;
> +
> +       buf[len] = '\0';
> +       close_paren = strrchr(buf, ')');
> +       if (!close_paren || close_paren[1] != ' ' || !close_paren[2])
> +               return -EINVAL;
> +
> +       *state = close_paren[2];
> +       return 0;
> +}
> +
> +static int wait_for_accept_to_block(const struct reuseport_migrate_case *test_case,
> +                                   int tid)
> +{
> +       char state = '\0';
> +       int ret;
> +       int i;
> +
> +       /*
> +        * A started thread is not enough here: we need to know the waiter
> +        * has actually gone to sleep in accept() before closing listener_a,
> +        * otherwise migration can race ahead of waiter registration. Poll
> +        * /proc task state because the pthread APIs can tell us whether the
> +        * thread has exited, but not whether it is already blocked in the
> +        * target syscall.
> +        */
> +       for (i = 0; i < ACCEPT_BLOCK_TIMEOUT_MS; i++) {
> +               ret = read_thread_state(tid, &state);
> +               if (!ret) {
> +                       if (state == 'S' || state == 'D')
> +                               return KSFT_PASS;
> +                       if (state == 'Z')
> +                               break;
> +               } else if (ret == -ENOENT) {
> +                       break;
> +               }
> +
> +               usleep(1000);
> +       }
> +
> +       ksft_print_msg("%s: accept waiter never blocked before migration\n",
> +                      test_case->name);
> +       return KSFT_FAIL;
> +}
> +
> +static int join_thread_with_timeout(pthread_t thread, int timeout_ms,
> +                                   bool *timed_out)
> +{
> +       struct timespec deadline;
> +       int err;
> +
> +       *timed_out = false;
> +
> +       if (clock_gettime(CLOCK_REALTIME, &deadline))
> +               return KSFT_FAIL;
> +
> +       deadline.tv_nsec += timeout_ms * 1000000LL;
> +       deadline.tv_sec += deadline.tv_nsec / 1000000000LL;
> +       deadline.tv_nsec %= 1000000000LL;
> +
> +       err = pthread_timedjoin_np(thread, NULL, &deadline);
> +       if (!err)
> +               return KSFT_PASS;
> +
> +       if (err != ETIMEDOUT)
> +               return KSFT_FAIL;
> +
> +       *timed_out = true;
> +       return KSFT_FAIL;
> +}
> +
> +static int interrupt_accept_thread(pthread_t thread)
> +{
> +       int err;
> +
> +       err = pthread_kill(thread, SIGUSR1);
> +       if (err && err != ESRCH)
> +               return KSFT_FAIL;
> +
> +       return KSFT_PASS;
> +}
> +
> +static int stop_accept_thread(pthread_t thread, bool *timed_out)
> +{
> +       if (interrupt_accept_thread(thread))
> +               return KSFT_FAIL;
> +
> +       return join_thread_with_timeout(thread, ACCEPT_CLEANUP_TIMEOUT_MS,
> +                                       timed_out);
> +}
> +
> +static int run_test(const struct reuseport_migrate_case *test_case)
> +{
> +       struct accept_result result = {
> +               .listener_fd = -1,
> +               .started = 0,
> +               .tid = -1,
> +               .accepted_fd = -1,
> +               .err = 0,
> +       };
> +       struct sockaddr_storage addr;
> +       struct sigaction sa = {
> +               .sa_handler = noop_handler,
> +       };
> +       bool thread_joined = false;
> +       bool cleanup_timed_out;
> +       int listener_a = -1;
> +       int listener_b = -1;
> +       int ret = KSFT_FAIL;
> +       socklen_t addrlen;
> +       pthread_t thread;
> +       int client = -1;
> +       bool timed_out;
> +       int probe = -1;
> +       int tid;
> +
> +       if (make_sockaddr(test_case, 0, &addr, &addrlen)) {
> +               ksft_print_msg("%s: failed to build socket address\n",
> +                              test_case->name);
> +               goto out;
> +       }
> +
> +       if (sigemptyset(&sa.sa_mask)) {
> +               ksft_perror("sigemptyset");
> +               goto out;
> +       }
> +
> +       if (sigaction(SIGUSR1, &sa, NULL)) {
> +               ksft_perror("sigaction(SIGUSR1)");
> +               goto out;
> +       }
> +
> +       listener_a = create_reuseport_socket(test_case);
> +       if (listener_a < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(listener_a)");
> +               goto out;
> +       }
> +
> +       if (bind(listener_a, (struct sockaddr *)&addr, addrlen)) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("bind(listener_a)");
> +               goto out;
> +       }
> +
> +       if (listen(listener_a, 1)) {
> +               ksft_perror("listen(listener_a)");
> +               goto out;
> +       }
> +
> +       addrlen = sizeof(addr);
> +       if (getsockname(listener_a, (struct sockaddr *)&addr, &addrlen)) {
> +               ksft_perror("getsockname(listener_a)");
> +               goto out;
> +       }
> +
> +       listener_b = create_reuseport_socket(test_case);
> +       if (listener_b < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(listener_b)");
> +               goto out;
> +       }
> +
> +       if (bind(listener_b, (struct sockaddr *)&addr, addrlen)) {
> +               ksft_perror("bind(listener_b)");
> +               goto out;
> +       }
> +
> +       client = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> +       if (client < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(client)");
> +               goto out;
> +       }
> +
> +       /* Connect while only listener_a is listening, ensuring the
> +        * child lands in listener_a's accept queue deterministically.
> +        */
> +       if (connect(client, (struct sockaddr *)&addr, addrlen)) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("connect(client)");
> +               goto out;
> +       }
> +
> +       if (listen(listener_b, 1)) {
> +               ksft_perror("listen(listener_b)");
> +               goto out;
> +       }
> +
> +       result.listener_fd = listener_b;
> +       if (pthread_create(&thread, NULL, accept_thread, &result)) {
> +               ksft_perror("pthread_create");
> +               goto out;
> +       }
> +
> +       while (!atomic_load_explicit(&result.started, memory_order_acquire))
> +               sched_yield();
> +
> +       tid = atomic_load_explicit(&result.tid, memory_order_acquire);
> +       if (wait_for_accept_to_block(test_case, tid))
> +               goto out_with_thread;
> +
> +       close_fd(&listener_a);
> +
> +       ret = join_thread_with_timeout(thread, ACCEPT_WAKE_TIMEOUT_MS, &timed_out);
> +       if (ret == KSFT_PASS) {
> +               thread_joined = true;
> +               if (result.accepted_fd < 0) {
> +                       ksft_print_msg("%s: blocking accept() returned err=%d (%s)\n",
> +                                      test_case->name, result.err,
> +                                      strerror(result.err));
> +                       ret = KSFT_FAIL;
> +               }
> +
> +               goto out_with_thread;
> +       }
> +
> +       if (!timed_out) {
> +               ksft_print_msg("%s: join_thread_with_timeout() failed\n",
> +                              test_case->name);
> +               goto out_with_thread;
> +       }
> +
> +       if (stop_accept_thread(thread, &cleanup_timed_out) == KSFT_FAIL) {
> +               ksft_print_msg("%s: failed to stop blocking accept waiter\n",
> +                              test_case->name);
> +               goto out_with_thread;
> +       }
> +       thread_joined = true;
> +
> +       if (result.accepted_fd >= 0) {
> +               ksft_print_msg("%s: blocking accept() completed only in cleanup\n",
> +                              test_case->name);
> +               goto out_with_thread;
> +       }
> +
> +       if (result.err != EINTR) {
> +               ksft_print_msg("%s: blocking accept() returned err=%d (%s)\n",
> +                              test_case->name, result.err,
> +                              strerror(result.err));
> +               goto out_with_thread;
> +       }
> +
> +       probe = accept4(listener_b, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
> +       if (probe >= 0) {
> +               ksft_print_msg("%s: accept queue was populated, but blocking accept() timed out\n",
> +                              test_case->name);
> +       } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
> +               ksft_print_msg("%s: target listener had no queued child after migration\n",
> +                              test_case->name);
> +       } else {
> +               ksft_perror("accept4(listener_b)");
> +       }
> +
> +out_with_thread:
> +       close_fd(&probe);
> +       if (!thread_joined) {
> +               if (stop_accept_thread(thread, &cleanup_timed_out) == KSFT_FAIL) {
> +                       ksft_print_msg("%s: failed to stop blocking accept waiter\n",
> +                                      test_case->name);
> +                       ret = KSFT_FAIL;
> +                       goto out;
> +               }
> +
> +               thread_joined = true;
> +       }
> +       if (thread_joined)
> +               close_fd(&result.accepted_fd);
> +
> +out:
> +       close_fd(&client);
> +       close_fd(&listener_b);
> +       close_fd(&listener_a);
> +
> +       return ret;
> +}
> +
> +int main(void)
> +{
> +       int status = KSFT_PASS;
> +       int ret;
> +       int i;
> +
> +       setup_netns();
> +
> +       ksft_print_header();
> +       ksft_set_plan(ARRAY_SIZE(test_cases));
> +
> +       for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
> +               ret = run_test(&test_cases[i]);
> +               ksft_test_result_code(ret, test_cases[i].name, NULL);
> +
> +               if (ret == KSFT_FAIL)
> +                       status = KSFT_FAIL;
> +       }
> +
> +       if (status == KSFT_FAIL)
> +               ksft_exit_fail();
> +
> +       ksft_finished();
> +}
> diff --git a/tools/testing/selftests/net/reuseport_migrate_epoll.c b/tools/testing/selftests/net/reuseport_migrate_epoll.c
> new file mode 100644
> index 000000000..9cbfb58c4
> --- /dev/null
> +++ b/tools/testing/selftests/net/reuseport_migrate_epoll.c
> @@ -0,0 +1,353 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define _GNU_SOURCE
> +
> +#include <arpa/inet.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <netinet/in.h>
> +#include <sched.h>
> +#include <stdbool.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/epoll.h>
> +#include <sys/socket.h>
> +#include <unistd.h>
> +
> +#include "../kselftest.h"
> +
> +#define EPOLL_TIMEOUT_MS 500
> +#define TCP_MIGRATE_REQ_PATH "/proc/sys/net/ipv4/tcp_migrate_req"
> +
> +struct reuseport_migrate_case {
> +       const char *name;
> +       int family;
> +       const char *addr;
> +};
> +
> +static const struct reuseport_migrate_case test_cases[] = {
> +       {
> +               .name = "ipv4 epoll wake after reuseport migration",
> +               .family = AF_INET,
> +               .addr = "127.0.0.1",
> +       },
> +       {
> +               .name = "ipv6 epoll wake after reuseport migration",
> +               .family = AF_INET6,
> +               .addr = "::1",
> +       },
> +};
> +
> +static void close_fd(int *fd)
> +{
> +       if (*fd >= 0) {
> +               close(*fd);
> +               *fd = -1;
> +       }
> +}
> +
> +static bool unsupported_addr_err(int family, int err)
> +{
> +       return family == AF_INET6 &&
> +               (err == EAFNOSUPPORT ||
> +                err == EPROTONOSUPPORT ||
> +                err == EADDRNOTAVAIL);
> +}
> +
> +static int make_sockaddr(const struct reuseport_migrate_case *test_case,
> +                        unsigned short port,
> +                        struct sockaddr_storage *addr,
> +                        socklen_t *addrlen)
> +{
> +       memset(addr, 0, sizeof(*addr));
> +
> +       if (test_case->family == AF_INET) {
> +               struct sockaddr_in *addr4 = (struct sockaddr_in *)addr;
> +
> +               addr4->sin_family = AF_INET;
> +               addr4->sin_port = htons(port);
> +               if (inet_pton(AF_INET, test_case->addr, &addr4->sin_addr) != 1)
> +                       return -1;
> +
> +               *addrlen = sizeof(*addr4);
> +               return 0;
> +       }
> +
> +       if (test_case->family == AF_INET6) {
> +               struct sockaddr_in6 *addr6 = (struct sockaddr_in6 *)addr;
> +
> +               addr6->sin6_family = AF_INET6;
> +               addr6->sin6_port = htons(port);
> +               if (inet_pton(AF_INET6, test_case->addr, &addr6->sin6_addr) != 1)
> +                       return -1;
> +
> +               *addrlen = sizeof(*addr6);
> +               return 0;
> +       }
> +
> +       return -1;
> +}
> +
> +static int create_reuseport_socket(const struct reuseport_migrate_case *test_case)
> +{
> +       int one = 1;
> +       int fd;
> +
> +       fd = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> +       if (fd < 0)
> +               return -1;
> +
> +       if (test_case->family == AF_INET6 &&
> +           setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one))) {
> +               close(fd);
> +               return -1;
> +       }
> +
> +       if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one))) {
> +               close(fd);
> +               return -1;
> +       }
> +
> +       return fd;
> +}
> +
> +static int set_nonblocking(int fd)
> +{
> +       int flags;
> +
> +       flags = fcntl(fd, F_GETFL);
> +       if (flags < 0)
> +               return -1;
> +
> +       return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
> +}
> +
> +static int enable_tcp_migrate_req(void)
> +{
> +       int len;
> +       int fd;
> +
> +       fd = open(TCP_MIGRATE_REQ_PATH, O_RDWR | O_CLOEXEC);
> +       if (fd < 0) {
> +               if (errno == ENOENT || errno == EACCES ||
> +                   errno == EPERM || errno == EROFS)
> +                       return KSFT_SKIP;
> +               return KSFT_FAIL;
> +       }
> +
> +       len = write(fd, "1", 1);
> +       if (len != 1) {
> +               if (errno == EACCES || errno == EPERM || errno == EROFS) {
> +                       close(fd);
> +                       return KSFT_SKIP;
> +               }
> +
> +               close(fd);
> +               return KSFT_FAIL;
> +       }
> +
> +       close(fd);
> +       return KSFT_PASS;
> +}
> +
> +static void setup_netns(void)
> +{
> +       int ret;
> +
> +       if (unshare(CLONE_NEWNET))
> +               ksft_exit_skip("unshare(CLONE_NEWNET): %s\n", strerror(errno));
> +
> +       if (system("ip link set lo up"))
> +               ksft_exit_skip("failed to bring up lo interface in netns\n");
> +
> +       ret = enable_tcp_migrate_req();
> +       if (ret == KSFT_SKIP)
> +               ksft_exit_skip("failed to enable tcp_migrate_req\n");
> +       if (ret == KSFT_FAIL)
> +               ksft_exit_fail_msg("failed to enable tcp_migrate_req\n");
> +}
> +
> +static int run_test(const struct reuseport_migrate_case *test_case)
> +{
> +       struct sockaddr_storage addr;
> +       struct epoll_event ev = {
> +               .events = EPOLLIN,
> +       };
> +       int listener_a = -1;
> +       int listener_b = -1;
> +       int ret = KSFT_FAIL;
> +       socklen_t addrlen;
> +       int accepted = -1;
> +       int client = -1;
> +       int epfd = -1;
> +       int n;
> +
> +       if (make_sockaddr(test_case, 0, &addr, &addrlen)) {
> +               ksft_print_msg("%s: failed to build socket address\n",
> +                              test_case->name);
> +               goto out;
> +       }
> +
> +       listener_a = create_reuseport_socket(test_case);
> +       if (listener_a < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(listener_a)");
> +               goto out;
> +       }
> +
> +       if (bind(listener_a, (struct sockaddr *)&addr, addrlen)) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("bind(listener_a)");
> +               goto out;
> +       }
> +
> +       if (listen(listener_a, 1)) {
> +               ksft_perror("listen(listener_a)");
> +               goto out;
> +       }
> +
> +       addrlen = sizeof(addr);
> +       if (getsockname(listener_a, (struct sockaddr *)&addr, &addrlen)) {
> +               ksft_perror("getsockname(listener_a)");
> +               goto out;
> +       }
> +
> +       listener_b = create_reuseport_socket(test_case);
> +       if (listener_b < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(listener_b)");
> +               goto out;
> +       }
> +
> +       if (bind(listener_b, (struct sockaddr *)&addr, addrlen)) {
> +               ksft_perror("bind(listener_b)");
> +               goto out;
> +       }
> +
> +       client = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> +       if (client < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(client)");
> +               goto out;
> +       }
> +
> +       /* Connect while only listener_a is listening, ensuring the
> +        * child lands in listener_a's accept queue deterministically.
> +        */
> +       if (connect(client, (struct sockaddr *)&addr, addrlen)) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("connect(client)");
> +               goto out;
> +       }
> +
> +       if (listen(listener_b, 1)) {
> +               ksft_perror("listen(listener_b)");
> +               goto out;
> +       }
> +
> +       if (set_nonblocking(listener_b)) {
> +               ksft_perror("set_nonblocking(listener_b)");
> +               goto out;
> +       }
> +
> +       epfd = epoll_create1(EPOLL_CLOEXEC);
> +       if (epfd < 0) {
> +               ksft_perror("epoll_create1");
> +               goto out;
> +       }
> +
> +       ev.data.fd = listener_b;
> +       if (epoll_ctl(epfd, EPOLL_CTL_ADD, listener_b, &ev)) {
> +               ksft_perror("epoll_ctl(ADD listener_b)");
> +               goto out;
> +       }
> +
> +       close_fd(&listener_a);
> +
> +       n = epoll_wait(epfd, &ev, 1, EPOLL_TIMEOUT_MS);
> +       if (n < 0) {
> +               ksft_perror("epoll_wait");
> +               goto out;
> +       }
> +
> +       accepted = accept4(listener_b, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
> +       if (accepted < 0) {
> +               if (errno == EAGAIN || errno == EWOULDBLOCK) {
> +                       ksft_print_msg("%s: target listener had no queued child after migration\n",
> +                                      test_case->name);
> +                       goto out;
> +               }
> +
> +               ksft_perror("accept4(listener_b)");
> +               goto out;
> +       }
> +
> +       if (n != 1) {
> +               ksft_print_msg("%s: accept queue was populated, but epoll_wait() timed out\n",
> +                              test_case->name);
> +               goto out;
> +       }
> +
> +       if (ev.data.fd != listener_b || !(ev.events & EPOLLIN)) {
> +               ksft_print_msg("%s: unexpected epoll event fd=%d events=%#x\n",
> +                              test_case->name, ev.data.fd, ev.events);
> +               goto out;
> +       }
> +
> +       ret = KSFT_PASS;
> +
> +out:
> +       close_fd(&accepted);
> +       close_fd(&epfd);
> +       close_fd(&client);
> +       close_fd(&listener_b);
> +       close_fd(&listener_a);
> +
> +       return ret;
> +}
> +
> +int main(void)
> +{
> +       int status = KSFT_PASS;
> +       int ret;
> +       int i;
> +
> +       setup_netns();
> +
> +       ksft_print_header();
> +       ksft_set_plan(ARRAY_SIZE(test_cases));
> +
> +       for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
> +               ret = run_test(&test_cases[i]);
> +               ksft_test_result_code(ret, test_cases[i].name, NULL);
> +
> +               if (ret == KSFT_FAIL)
> +                       status = KSFT_FAIL;
> +       }
> +
> +       if (status == KSFT_FAIL)
> +               ksft_exit_fail();
> +
> +       ksft_finished();
> +}
> --
> 2.43.0
>

^ permalink raw reply

* [PATCH net 0/4] xsk: fix bugs around xsk skb allocation
From: Jason Xing @ 2026-04-18  4:56 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

There are four extremely rare issues around xsk_build_skb(). Two of them
were found by Sashiko[1].

[1]: https://lore.kernel.org/all/20260415082654.21026-1-kerneljasonxing@gmail.com/

Jason Xing (4):
  xsk: avoid skb leak in XDP_TX_METADATA case
  xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
  xsk: handle NULL dereference of the skb without frags issue
  xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path

 net/xdp/xsk.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

-- 
2.41.3


^ permalink raw reply

* [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case
From: Jason Xing @ 2026-04-18  4:56 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260418045644.28612-1-kerneljasonxing@gmail.com>

From: Jason Xing <kernelxing@tencent.com>

Fix the skb leak by explicitly calling kfree_skb() before returning to
the caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means no frag and xs->skb is
   NULL) and the user enables the metadata feature.
2. xsk_skb_metadata() returns an error code.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
4. there is no chance to free this skb anymore.

Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 6149f6a79897..8fcde34aec7b 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -743,8 +743,10 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 		xsk_skb_init_misc(skb, xs, desc->addr);
 		if (desc->options & XDP_TX_METADATA) {
 			err = xsk_skb_metadata(skb, buffer, desc, pool, hr);
-			if (unlikely(err))
+			if (unlikely(err)) {
+				kfree_skb(skb);
 				return ERR_PTR(err);
+			}
 		}
 	} else {
 		struct xsk_addrs *xsk_addr;
-- 
2.41.3


^ permalink raw reply related

* [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
From: Jason Xing @ 2026-04-18  4:56 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260418045644.28612-1-kerneljasonxing@gmail.com>

From: Jason Xing <kernelxing@tencent.com>

Fix the skb leak by explicitly calling kfree_skb() before returning to
the caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means xs->skb is NULL) and
   hits the MAX_SKB_FRAGS limit.
2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This
   is why the bug can be triggered.
4. there is no chance to free this skb anymore.

Note that if in this case the xs->skb is not NULL, xsk_build_skb() will
call xsk_drop_skb(xs->skb) to do the right thing.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 8fcde34aec7b..5d3dbb118730 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -778,8 +778,11 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 	addr = buffer - pool->addrs;
 
 	for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) {
-		if (unlikely(i >= MAX_SKB_FRAGS))
+		if (unlikely(i >= MAX_SKB_FRAGS)) {
+			if (!xs->skb)
+				kfree_skb(skb);
 			return ERR_PTR(-EOVERFLOW);
+		}
 
 		page = pool->umem->pgs[addr >> PAGE_SHIFT];
 		get_page(page);
-- 
2.41.3


^ permalink raw reply related

* [PATCH net 3/4] xsk: handle NULL dereference of the skb without frags issue
From: Jason Xing @ 2026-04-18  4:56 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260418045644.28612-1-kerneljasonxing@gmail.com>

From: Jason Xing <kernelxing@tencent.com>

When a first descriptor (xs->skb == NULL) triggers -EOVERFLOW in
xsk_build_skb_zerocopy (e.g., MAX_SKB_FRAGS exceeded), the free_err
EOVERFLOW handler unconditionally dereferences xs->skb via
xsk_inc_num_desc(xs->skb) and xsk_drop_skb(xs->skb), causing a NULL
pointer dereference.

In patch 2/4, the skb is already freed by kfree_skb() inside
xsk_build_skb_zerocopy for the first-descriptor case, so we only need
to do the bookkeeping: cancel the one reserved CQ slot and account for
the single invalid descriptor.

Guard the existing xsk_inc_num_desc/xsk_drop_skb calls with an
xs->skb check (for the continuation case), and add an else branch
for the first-descriptor case that manually cancels the CQ slot and
increments invalid_descs by one.
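
The branch described above can be modelled in user space as a rough
sketch. The struct and field names below (model_state, cq_reserved,
dropped) are hypothetical stand-ins, not the kernel's layout, and the
continuation path is only counted rather than reproducing what
xsk_inc_num_desc() and xsk_drop_skb() really do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct xdp_sock used here. */
struct model_tx {
	unsigned long invalid_descs;
};

struct model_state {
	void *skb;                /* stands in for xs->skb */
	unsigned int cq_reserved; /* reserved completion-queue slots */
	unsigned int dropped;     /* times the continuation drop path ran */
	struct model_tx tx;
};

/* Models only the -EOVERFLOW branch after the guard is added. */
static void handle_overflow(struct model_state *xs)
{
	if (xs->skb) {
		/* Continuation case: the kernel drops the in-progress skb. */
		xs->dropped++;
	} else {
		/* First-descriptor case: the skb is already gone, so only
		 * cancel one CQ slot and count one invalid descriptor.
		 */
		xs->cq_reserved -= 1;
		xs->tx.invalid_descs++;
	}
}

/* Returns 1 when the first-descriptor case does pure bookkeeping. */
static int first_descriptor_bookkeeping_ok(void)
{
	struct model_state xs = { .skb = NULL, .cq_reserved = 1 };

	handle_overflow(&xs);
	return xs.cq_reserved == 0 && xs.tx.invalid_descs == 1 &&
	       xs.dropped == 0;
}
```

The point of the sketch is that the else branch never touches the skb
pointer, which is why it is safe when xs->skb is NULL.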

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 5d3dbb118730..2918b773aa84 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -898,9 +898,14 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 		kfree_skb(skb);
 
 	if (err == -EOVERFLOW) {
-		/* Drop the packet */
-		xsk_inc_num_desc(xs->skb);
-		xsk_drop_skb(xs->skb);
+		if (xs->skb) {
+			/* Drop the packet */
+			xsk_inc_num_desc(xs->skb);
+			xsk_drop_skb(xs->skb);
+		} else {
+			xsk_cq_cancel_locked(xs->pool, 1);
+			xs->tx->invalid_descs++;
+		}
 		xskq_cons_release(xs->tx);
 	} else {
 		/* Let application retry */
-- 
2.41.3


^ permalink raw reply related

* [PATCH net 4/4] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
From: Jason Xing @ 2026-04-18  4:56 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260418045644.28612-1-kerneljasonxing@gmail.com>

From: Jason Xing <kernelxing@tencent.com>

When xsk_build_skb() processes multi-buffer packets in copy mode, the
first descriptor stores data into the skb linear area without adding
any frags, so nr_frags stays at 0. The caller then sets xs->skb = skb
to accumulate subsequent descriptors.

If a continuation descriptor fails (e.g. alloc_page returns NULL with
-EAGAIN), we jump to free_err where the condition:

  if (skb && !skb_shinfo(skb)->nr_frags)
      kfree_skb(skb);

evaluates to true because nr_frags is still 0 (the first descriptor
used the linear area, not frags). This frees the skb while xs->skb
still points to it, creating a dangling pointer. On the next transmit
attempt or socket close, xs->skb is dereferenced, causing a
use-after-free or double-free.

Fix by adding a !xs->skb check to the condition, ensuring we only free
skbs that were freshly allocated in this call (xs->skb is NULL) and
never free an in-progress multi-buffer skb that the caller still
references.
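
As an illustration only, the old and new free_err conditions can be
compared in a small user-space model. The types model_skb and
model_sock are hypothetical stand-ins for struct sk_buff and struct
xdp_sock, and nr_frags is held directly in the model skb instead of
going through skb_shinfo().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real kernel structures. */
struct model_skb {
	unsigned int nr_frags;
};

struct model_sock {
	struct model_skb *skb; /* stands in for xs->skb */
};

/* Pre-patch condition: free any skb that has no frags. */
static bool old_should_free(const struct model_sock *xs,
			    const struct model_skb *skb)
{
	(void)xs;
	return skb && !skb->nr_frags;
}

/* Post-patch condition: also require that xs->skb is NULL. */
static bool new_should_free(const struct model_sock *xs,
			    const struct model_skb *skb)
{
	return skb && !xs->skb && !skb->nr_frags;
}

/* Returns 1 when only the old condition frees an in-progress skb. */
static int continuation_case_differs(void)
{
	struct model_skb skb = { .nr_frags = 0 };
	struct model_sock xs = { .skb = &skb };

	return old_should_free(&xs, &skb) && !new_should_free(&xs, &skb);
}
```

In the model, a continuation descriptor (xs->skb set, nr_frags still 0)
is freed by the old condition but not by the new one, while a freshly
allocated first skb (xs->skb NULL) is freed by both.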

Closes: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/
Fixes: 6b9c129c2f93 ("xsk: remove @first_frag from xsk_build_skb()")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 2918b773aa84..22c7a92e0734 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -894,7 +894,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 	return skb;
 
 free_err:
-	if (skb && !skb_shinfo(skb)->nr_frags)
+	if (skb && !xs->skb && !skb_shinfo(skb)->nr_frags)
 		kfree_skb(skb);
 
 	if (err == -EOVERFLOW) {
-- 
2.41.3


^ permalink raw reply related

* Re: [PATCH v1 net] tcp: Disable usec TS for SYN Cookie.
From: Eric Dumazet @ 2026-04-18  4:59 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: Neal Cardwell, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, netdev
In-Reply-To: <20260418024957.2669737-1-kuniyu@google.com>

On Fri, Apr 17, 2026 at 7:50 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> cookie_tcp_reqsk_alloc() sets tcp_rsk(req)->req_usec_ts to false
> unconditionally.
>
> If want_cookie is true in tcp_conn_request(), we should not set
> tcp_rsk(req)->req_usec_ts.
>
> Let's not call dst_tcp_usec_ts() for SYN Cookie.

May I ask why?

TCP usec TS is based on routing; this feature is not part of SYN
and/or SYNACK options.

Both sides must have:

ip route ... feature tcp_usec_ts

syncookies are orthogonal to this constraint, so TCP flows can use
usec TS just fine.

pw-bot: cr

^ permalink raw reply

* [syzbot] [net?] possible deadlock in br_forward_delay_timer_expired (5)
From: syzbot @ 2026-04-18  5:30 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, jv, kuba, linux-kernel, netdev,
	pabeni, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    43cfbdda5af6 Merge tag 'for-linus-iommufd' of git://git.ke..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=100a4702580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8195c5b22e79c2cf
dashboard link: https://syzkaller.appspot.com/bug?extid=a7f25fd06ad99e9379e4
compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/848e46852283/disk-43cfbdda.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/24283dbdc318/vmlinux-43cfbdda.xz
kernel image: https://storage.googleapis.com/syzbot-assets/f91b3fadd31d/bzImage-43cfbdda.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a7f25fd06ad99e9379e4@syzkaller.appspotmail.com

netlink: 16 bytes leftover after parsing attributes in process `syz.3.6945'.
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
syzkaller #0 Tainted: G             L     
-----------------------------------------------------
syz.3.6945/21491 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
ffff888035200e98 (&bond->stats_lock/2){+.+.}-{3:3}, at: bond_get_stats+0x458/0x740 drivers/net/bonding/bond_main.c:4514

and this task is already holding:
ffff888036758e18 (&br->lock){+.-.}-{3:3}, at: spin_lock_bh include/linux/spinlock.h:348 [inline]
ffff888036758e18 (&br->lock){+.-.}-{3:3}, at: br_port_slave_changelink+0x3d/0x150 net/bridge/br_netlink.c:1212
which would create a new lock dependency:
 (&br->lock){+.-.}-{3:3} -> (&bond->stats_lock/2){+.+.}-{3:3}

but this new dependency connects a SOFTIRQ-irq-safe lock:
 (&br->lock){+.-.}-{3:3}

... which became SOFTIRQ-irq-safe at:
  lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
  __raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
  _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:158
  spin_lock include/linux/spinlock.h:342 [inline]
  br_forward_delay_timer_expired+0x4f/0x460 net/bridge/br_stp_timer.c:88
  call_timer_fn+0x192/0x5e0 kernel/time/timer.c:1748
  expire_timers kernel/time/timer.c:1799 [inline]
  __run_timers kernel/time/timer.c:2374 [inline]
  __run_timer_base+0x652/0x8b0 kernel/time/timer.c:2386
  run_timer_base kernel/time/timer.c:2395 [inline]
  run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2405
  handle_softirqs+0x22a/0x840 kernel/softirq.c:622
  __do_softirq kernel/softirq.c:656 [inline]
  invoke_softirq kernel/softirq.c:496 [inline]
  __irq_exit_rcu+0xca/0x220 kernel/softirq.c:735
  irq_exit_rcu+0x9/0x30 kernel/softirq.c:752
  common_interrupt+0xbb/0xe0 arch/x86/kernel/irq.c:326
  asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:688
  finish_task_switch+0x427/0xbe0 kernel/sched/core.c:5244
  context_switch kernel/sched/core.c:5390 [inline]
  __schedule+0x17bc/0x5680 kernel/sched/core.c:7188
  __schedule_loop kernel/sched/core.c:7267 [inline]
  schedule+0x164/0x360 kernel/sched/core.c:7282
  smpboot_thread_fn+0x5bc/0xa50 kernel/smpboot.c:156
  kthread+0x388/0x470 kernel/kthread.c:436
  ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

to a SOFTIRQ-irq-unsafe lock:
 (&bond->stats_lock/2){+.+.}-{3:3}

... which became SOFTIRQ-irq-unsafe at:
...
  lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
  _raw_spin_lock_nested+0x32/0x50 kernel/locking/spinlock.c:382
  bond_get_stats+0x458/0x740 drivers/net/bonding/bond_main.c:4514
  dev_get_stats+0xb4/0xa50 net/core/dev.c:11916
  rtnl_fill_stats+0x47/0x8c0 net/core/rtnetlink.c:1506
  rtnl_fill_ifinfo+0x1840/0x20f0 net/core/rtnetlink.c:2155
  rtmsg_ifinfo_build_skb+0x17d/0x260 net/core/rtnetlink.c:4452
  rtmsg_ifinfo_event net/core/rtnetlink.c:4485 [inline]
  rtnetlink_event+0x1b7/0x270 net/core/rtnetlink.c:7054
  notifier_call_chain+0x1ad/0x3d0 kernel/notifier.c:85
  call_netdevice_notifiers_extack net/core/dev.c:2287 [inline]
  call_netdevice_notifiers net/core/dev.c:2301 [inline]
  netdev_features_change net/core/dev.c:1590 [inline]
  netdev_change_features net/core/dev.c:11155 [inline]
  netdev_compute_master_upper_features+0x91e/0xac0 net/core/dev.c:12913
  bond_enslave+0x21cc/0x3c10 drivers/net/bonding/bond_main.c:2276
  do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2985
  do_setlink+0x1018/0x4590 net/core/rtnetlink.c:3187
  rtnl_changelink net/core/rtnetlink.c:3798 [inline]
  __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
  rtnl_newlink+0x15ad/0x1bb0 net/core/rtnetlink.c:4108
  rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
  netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
  netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
  netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
  netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
  sock_sendmsg_nosec net/socket.c:787 [inline]
  __sock_sendmsg net/socket.c:802 [inline]
  ____sys_sendmsg+0x972/0x9f0 net/socket.c:2698
  ___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
  __sys_sendmsg net/socket.c:2784 [inline]
  __do_sys_sendmsg net/socket.c:2789 [inline]
  __se_sys_sendmsg net/socket.c:2787 [inline]
  __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2787
  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
  do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
  entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&bond->stats_lock/2);
                               local_irq_disable();
                               lock(&br->lock);
                               lock(&bond->stats_lock/2);
  <Interrupt>
    lock(&br->lock);

 *** DEADLOCK ***

3 locks held by syz.3.6945/21491:
 #0: ffffffff8fdddc80 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
 #0: ffffffff8fdddc80 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock net/core/rtnetlink.c:341 [inline]
 #0: ffffffff8fdddc80 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x883/0x1bb0 net/core/rtnetlink.c:4107
 #1: ffff888036758e18 (&br->lock){+.-.}-{3:3}, at: spin_lock_bh include/linux/spinlock.h:348 [inline]
 #1: ffff888036758e18 (&br->lock){+.-.}-{3:3}, at: br_port_slave_changelink+0x3d/0x150 net/bridge/br_netlink.c:1212
 #2: ffffffff8e95cb20 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
 #2: ffffffff8e95cb20 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
 #2: ffffffff8e95cb20 (rcu_read_lock){....}-{1:3}, at: bond_get_stats+0x11a/0x740 drivers/net/bonding/bond_main.c:4509

the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
-> (&br->lock){+.-.}-{3:3} {
   HARDIRQ-ON-W at:
                    lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
                    __raw_spin_lock_bh include/linux/spinlock_api_smp.h:150 [inline]
                    _raw_spin_lock_bh+0x36/0x50 kernel/locking/spinlock.c:182
                    spin_lock_bh include/linux/spinlock.h:348 [inline]
                    br_add_if+0xa99/0xeb0 net/bridge/br_if.c:668
                    do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2985
                    do_setlink+0x1018/0x4590 net/core/rtnetlink.c:3187
                    rtnl_changelink net/core/rtnetlink.c:3798 [inline]
                    __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
                    rtnl_newlink+0x15ad/0x1bb0 net/core/rtnetlink.c:4108
                    rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
                    netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
                    netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
                    netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
                    netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
                    sock_sendmsg_nosec net/socket.c:787 [inline]
                    __sock_sendmsg net/socket.c:802 [inline]
                    __sys_sendto+0x672/0x710 net/socket.c:2265
                    __do_sys_sendto net/socket.c:2272 [inline]
                    __se_sys_sendto net/socket.c:2268 [inline]
                    __x64_sys_sendto+0xde/0x100 net/socket.c:2268
                    do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
                    do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
                    entry_SYSCALL_64_after_hwframe+0x77/0x7f
   IN-SOFTIRQ-W at:
                    lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
                    __raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
                    _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:158
                    spin_lock include/linux/spinlock.h:342 [inline]
                    br_forward_delay_timer_expired+0x4f/0x460 net/bridge/br_stp_timer.c:88
                    call_timer_fn+0x192/0x5e0 kernel/time/timer.c:1748
                    expire_timers kernel/time/timer.c:1799 [inline]
                    __run_timers kernel/time/timer.c:2374 [inline]
                    __run_timer_base+0x652/0x8b0 kernel/time/timer.c:2386
                    run_timer_base kernel/time/timer.c:2395 [inline]
                    run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2405
                    handle_softirqs+0x22a/0x840 kernel/softirq.c:622
                    __do_softirq kernel/softirq.c:656 [inline]
                    invoke_softirq kernel/softirq.c:496 [inline]
                    __irq_exit_rcu+0xca/0x220 kernel/softirq.c:735
                    irq_exit_rcu+0x9/0x30 kernel/softirq.c:752
                    common_interrupt+0xbb/0xe0 arch/x86/kernel/irq.c:326
                    asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:688
                    finish_task_switch+0x427/0xbe0 kernel/sched/core.c:5244
                    context_switch kernel/sched/core.c:5390 [inline]
                    __schedule+0x17bc/0x5680 kernel/sched/core.c:7188
                    __schedule_loop kernel/sched/core.c:7267 [inline]
                    schedule+0x164/0x360 kernel/sched/core.c:7282
                    smpboot_thread_fn+0x5bc/0xa50 kernel/smpboot.c:156
                    kthread+0x388/0x470 kernel/kthread.c:436
                    ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
                    ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
   INITIAL USE at:
                   lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
                   __raw_spin_lock_bh include/linux/spinlock_api_smp.h:150 [inline]
                   _raw_spin_lock_bh+0x36/0x50 kernel/locking/spinlock.c:182
                   spin_lock_bh include/linux/spinlock.h:348 [inline]
                   br_add_if+0xa99/0xeb0 net/bridge/br_if.c:668
                   do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2985
                   do_setlink+0x1018/0x4590 net/core/rtnetlink.c:3187
                   rtnl_changelink net/core/rtnetlink.c:3798 [inline]
                   __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
                   rtnl_newlink+0x15ad/0x1bb0 net/core/rtnetlink.c:4108
                   rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
                   netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
                   netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
                   netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
                   netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
                   sock_sendmsg_nosec net/socket.c:787 [inline]
                   __sock_sendmsg net/socket.c:802 [inline]
                   __sys_sendto+0x672/0x710 net/socket.c:2265
                   __do_sys_sendto net/socket.c:2272 [inline]
                   __se_sys_sendto net/socket.c:2268 [inline]
                   __x64_sys_sendto+0xde/0x100 net/socket.c:2268
                   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
                   do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
                   entry_SYSCALL_64_after_hwframe+0x77/0x7f
 }
 ... key      at: [<ffffffff9aa0b240>] br_dev_setup.__key+0x0/0x20

the dependencies between the lock to be acquired
 and SOFTIRQ-irq-unsafe lock:
-> (&bond->stats_lock/2){+.+.}-{3:3} {
   HARDIRQ-ON-W at:
                    lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
                    _raw_spin_lock_nested+0x32/0x50 kernel/locking/spinlock.c:382
                    bond_get_stats+0x458/0x740 drivers/net/bonding/bond_main.c:4514
                    dev_get_stats+0xb4/0xa50 net/core/dev.c:11916
                    rtnl_fill_stats+0x47/0x8c0 net/core/rtnetlink.c:1506
                    rtnl_fill_ifinfo+0x1840/0x20f0 net/core/rtnetlink.c:2155
                    rtmsg_ifinfo_build_skb+0x17d/0x260 net/core/rtnetlink.c:4452
                    rtmsg_ifinfo_event net/core/rtnetlink.c:4485 [inline]
                    rtnetlink_event+0x1b7/0x270 net/core/rtnetlink.c:7054
                    notifier_call_chain+0x1ad/0x3d0 kernel/notifier.c:85
                    call_netdevice_notifiers_extack net/core/dev.c:2287 [inline]
                    call_netdevice_notifiers net/core/dev.c:2301 [inline]
                    netdev_features_change net/core/dev.c:1590 [inline]
                    netdev_change_features net/core/dev.c:11155 [inline]
                    netdev_compute_master_upper_features+0x91e/0xac0 net/core/dev.c:12913
                    bond_enslave+0x21cc/0x3c10 drivers/net/bonding/bond_main.c:2276
                    do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2985
                    do_setlink+0x1018/0x4590 net/core/rtnetlink.c:3187
                    rtnl_changelink net/core/rtnetlink.c:3798 [inline]
                    __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
                    rtnl_newlink+0x15ad/0x1bb0 net/core/rtnetlink.c:4108
                    rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
                    netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
                    netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
                    netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
                    netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
                    sock_sendmsg_nosec net/socket.c:787 [inline]
                    __sock_sendmsg net/socket.c:802 [inline]
                    ____sys_sendmsg+0x972/0x9f0 net/socket.c:2698
                    ___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
                    __sys_sendmsg net/socket.c:2784 [inline]
                    __do_sys_sendmsg net/socket.c:2789 [inline]
                    __se_sys_sendmsg net/socket.c:2787 [inline]
                    __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2787
                    do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
                    do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
                    entry_SYSCALL_64_after_hwframe+0x77/0x7f
   SOFTIRQ-ON-W at:
                    lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
                    _raw_spin_lock_nested+0x32/0x50 kernel/locking/spinlock.c:382
                    bond_get_stats+0x458/0x740 drivers/net/bonding/bond_main.c:4514
                    dev_get_stats+0xb4/0xa50 net/core/dev.c:11916
                    rtnl_fill_stats+0x47/0x8c0 net/core/rtnetlink.c:1506
                    rtnl_fill_ifinfo+0x1840/0x20f0 net/core/rtnetlink.c:2155
                    rtmsg_ifinfo_build_skb+0x17d/0x260 net/core/rtnetlink.c:4452
                    rtmsg_ifinfo_event net/core/rtnetlink.c:4485 [inline]
                    rtnetlink_event+0x1b7/0x270 net/core/rtnetlink.c:7054
                    notifier_call_chain+0x1ad/0x3d0 kernel/notifier.c:85
                    call_netdevice_notifiers_extack net/core/dev.c:2287 [inline]
                    call_netdevice_notifiers net/core/dev.c:2301 [inline]
                    netdev_features_change net/core/dev.c:1590 [inline]
                    netdev_change_features net/core/dev.c:11155 [inline]
                    netdev_compute_master_upper_features+0x91e/0xac0 net/core/dev.c:12913
                    bond_enslave+0x21cc/0x3c10 drivers/net/bonding/bond_main.c:2276
                    do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2985
                    do_setlink+0x1018/0x4590 net/core/rtnetlink.c:3187
                    rtnl_changelink net/core/rtnetlink.c:3798 [inline]
                    __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
                    rtnl_newlink+0x15ad/0x1bb0 net/core/rtnetlink.c:4108
                    rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
                    netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
                    netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
                    netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
                    netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
                    sock_sendmsg_nosec net/socket.c:787 [inline]
                    __sock_sendmsg net/socket.c:802 [inline]
                    ____sys_sendmsg+0x972/0x9f0 net/socket.c:2698
                    ___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
                    __sys_sendmsg net/socket.c:2784 [inline]
                    __do_sys_sendmsg net/socket.c:2789 [inline]
                    __se_sys_sendmsg net/socket.c:2787 [inline]
                    __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2787
                    do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
                    do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
                    entry_SYSCALL_64_after_hwframe+0x77/0x7f
   INITIAL USE at:
                   lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
                   _raw_spin_lock_nested+0x32/0x50 kernel/locking/spinlock.c:382
                   bond_get_stats+0x458/0x740 drivers/net/bonding/bond_main.c:4514
                   dev_get_stats+0xb4/0xa50 net/core/dev.c:11916
                   rtnl_fill_stats+0x47/0x8c0 net/core/rtnetlink.c:1506
                   rtnl_fill_ifinfo+0x1840/0x20f0 net/core/rtnetlink.c:2155
                   rtmsg_ifinfo_build_skb+0x17d/0x260 net/core/rtnetlink.c:4452
                   rtmsg_ifinfo_event net/core/rtnetlink.c:4485 [inline]
                   rtnetlink_event+0x1b7/0x270 net/core/rtnetlink.c:7054
                   notifier_call_chain+0x1ad/0x3d0 kernel/notifier.c:85
                   call_netdevice_notifiers_extack net/core/dev.c:2287 [inline]
                   call_netdevice_notifiers net/core/dev.c:2301 [inline]
                   netdev_features_change net/core/dev.c:1590 [inline]
                   netdev_change_features net/core/dev.c:11155 [inline]
                   netdev_compute_master_upper_features+0x91e/0xac0 net/core/dev.c:12913
                   bond_enslave+0x21cc/0x3c10 drivers/net/bonding/bond_main.c:2276
                   do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2985
                   do_setlink+0x1018/0x4590 net/core/rtnetlink.c:3187
                   rtnl_changelink net/core/rtnetlink.c:3798 [inline]
                   __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
                   rtnl_newlink+0x15ad/0x1bb0 net/core/rtnetlink.c:4108
                   rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
                   netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
                   netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
                   netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
                   netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
                   sock_sendmsg_nosec net/socket.c:787 [inline]
                   __sock_sendmsg net/socket.c:802 [inline]
                   ____sys_sendmsg+0x972/0x9f0 net/socket.c:2698
                   ___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
                   __sys_sendmsg net/socket.c:2784 [inline]
                   __do_sys_sendmsg net/socket.c:2789 [inline]
                   __se_sys_sendmsg net/socket.c:2787 [inline]
                   __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2787
                   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
                   do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
                   entry_SYSCALL_64_after_hwframe+0x77/0x7f
 }
 ... key      at: [<ffffffff9a825582>] bond_init.__key+0x2/0x20
 ... acquired at:
   _raw_spin_lock_nested+0x32/0x50 kernel/locking/spinlock.c:382
   bond_get_stats+0x458/0x740 drivers/net/bonding/bond_main.c:4514
   dev_get_stats+0xb4/0xa50 net/core/dev.c:11916
   rtnl_fill_stats+0x47/0x8c0 net/core/rtnetlink.c:1506
   rtnl_fill_ifinfo+0x1840/0x20f0 net/core/rtnetlink.c:2155
   rtmsg_ifinfo_build_skb+0x17d/0x260 net/core/rtnetlink.c:4452
   rtmsg_ifinfo_event net/core/rtnetlink.c:4485 [inline]
   rtmsg_ifinfo+0x8c/0x1a0 net/core/rtnetlink.c:4494
   __dev_notify_flags+0xf2/0x310 net/core/dev.c:9845
   __dev_set_promiscuity+0x27f/0x710 net/core/dev.c:9647
   netif_set_promiscuity+0x50/0xe0 net/core/dev.c:9657
   dev_set_promiscuity+0x126/0x260 net/core/dev_api.c:287
   br_port_clear_promisc net/bridge/br_if.c:135 [inline]
   br_manage_promisc+0x4db/0x560 net/bridge/br_if.c:172
   nbp_update_port_count net/bridge/br_if.c:242 [inline]
   br_port_flags_change+0x160/0x1f0 net/bridge/br_if.c:747
   br_setport+0xc0a/0x1680 net/bridge/br_netlink.c:1000
   br_port_slave_changelink+0x12f/0x150 net/bridge/br_netlink.c:1213
   rtnl_changelink net/core/rtnetlink.c:3791 [inline]
   __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
   rtnl_newlink+0x191b/0x1bb0 net/core/rtnetlink.c:4108
   rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
   netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
   netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
   netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
   netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
   sock_sendmsg_nosec net/socket.c:787 [inline]
   __sock_sendmsg net/socket.c:802 [inline]
   ____sys_sendmsg+0x972/0x9f0 net/socket.c:2698
   ___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
   __sys_sendmsg net/socket.c:2784 [inline]
   __do_sys_sendmsg net/socket.c:2789 [inline]
   __se_sys_sendmsg net/socket.c:2787 [inline]
   __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2787
   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x77/0x7f


stack backtrace:
CPU: 0 UID: 0 PID: 21491 Comm: syz.3.6945 Tainted: G             L      syzkaller #0 PREEMPT(full) 
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_bad_irq_dependency kernel/locking/lockdep.c:2616 [inline]
 check_irq_usage kernel/locking/lockdep.c:2857 [inline]
 check_prev_add kernel/locking/lockdep.c:3169 [inline]
 check_prevs_add kernel/locking/lockdep.c:3284 [inline]
 validate_chain kernel/locking/lockdep.c:3908 [inline]
 __lock_acquire+0x2a94/0x2cf0 kernel/locking/lockdep.c:5237
 lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
 _raw_spin_lock_nested+0x32/0x50 kernel/locking/spinlock.c:382
 bond_get_stats+0x458/0x740 drivers/net/bonding/bond_main.c:4514
 dev_get_stats+0xb4/0xa50 net/core/dev.c:11916
 rtnl_fill_stats+0x47/0x8c0 net/core/rtnetlink.c:1506
 rtnl_fill_ifinfo+0x1840/0x20f0 net/core/rtnetlink.c:2155
 rtmsg_ifinfo_build_skb+0x17d/0x260 net/core/rtnetlink.c:4452
 rtmsg_ifinfo_event net/core/rtnetlink.c:4485 [inline]
 rtmsg_ifinfo+0x8c/0x1a0 net/core/rtnetlink.c:4494
 __dev_notify_flags+0xf2/0x310 net/core/dev.c:9845
 __dev_set_promiscuity+0x27f/0x710 net/core/dev.c:9647
 netif_set_promiscuity+0x50/0xe0 net/core/dev.c:9657
 dev_set_promiscuity+0x126/0x260 net/core/dev_api.c:287
 br_port_clear_promisc net/bridge/br_if.c:135 [inline]
 br_manage_promisc+0x4db/0x560 net/bridge/br_if.c:172
 nbp_update_port_count net/bridge/br_if.c:242 [inline]
 br_port_flags_change+0x160/0x1f0 net/bridge/br_if.c:747
 br_setport+0xc0a/0x1680 net/bridge/br_netlink.c:1000
 br_port_slave_changelink+0x12f/0x150 net/bridge/br_netlink.c:1213
 rtnl_changelink net/core/rtnetlink.c:3791 [inline]
 __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
 rtnl_newlink+0x191b/0x1bb0 net/core/rtnetlink.c:4108
 rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
 netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
 netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
 sock_sendmsg_nosec net/socket.c:787 [inline]
 __sock_sendmsg net/socket.c:802 [inline]
 ____sys_sendmsg+0x972/0x9f0 net/socket.c:2698
 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
 __sys_sendmsg net/socket.c:2784 [inline]
 __do_sys_sendmsg net/socket.c:2789 [inline]
 __se_sys_sendmsg net/socket.c:2787 [inline]
 __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2787
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f779019c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f7791124028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f7790415fa0 RCX: 00007f779019c819
RDX: 0000000000008002 RSI: 0000200000000340 RDI: 0000000000000003
RBP: 00007f7790232c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f7790416038 R14: 00007f7790415fa0 R15: 00007f779053fa48
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply

* Re: [PATCH v1 net] tcp: Disable usec TS for SYN Cookie.
From: Kuniyuki Iwashima @ 2026-04-18  5:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Neal Cardwell, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, netdev
In-Reply-To: <CANn89iLL0iWoU-jh=Nk+0fsGWNOeuwQ8DP=dVUT=LjBGHR_2FA@mail.gmail.com>

On Fri, Apr 17, 2026 at 9:59 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Apr 17, 2026 at 7:50 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
> >
> > cookie_tcp_reqsk_alloc() sets tcp_rsk(req)->req_usec_ts to false
> > unconditionally.
> >
> > If want_cookie is true in tcp_conn_request(), we should not set
> > tcp_rsk(req)->req_usec_ts.
> >
> > Let's not call dst_tcp_usec_ts() for SYN Cookie.
>
> May I ask why ?

Sorry, I missed that tcp_skb_timestamp_ts() properly restores the
cookie TS generated by cookie_init_timestamp() to ms units.

Still, we don't need to call dst_tcp_usec_ts() for SYN cookies,
but this was more of a cleanup patch.


>
> TCP usec TS are based on routing. This feature is not part of SYN
> and/or SYNACK options.
>
> Both sides must have:
>
> ip route ... feature tcp_usec_ts
>
> syncookies are orthogonal to this constraint, so TCP flows can use
> usec TS just fine.
>
> pw-bot: cr

^ permalink raw reply

* [PATCH 0/2] Bluetooth: ISO: Fix KCSAN data-races on iso_pi(sk)
From: SeungJu Cheon @ 2026-04-18  5:33 UTC (permalink / raw)
  To: luiz.dentz, marcel
  Cc: linux-bluetooth, netdev, linux-kernel, me, skhan,
	linux-kernel-mentees, SeungJu Cheon

Found while auditing iso_pi(sk) field accesses after a KCSAN report.
Patch 1/2 is the reported race on iso_pi(sk)->dst in iso_sock_connect();
patch 2/2 covers related races on other iso_pi(sk) fields accessed in
iso_connect_{bis,cis}() and iso_connect_ind() that were found by
inspection during the same audit.

SeungJu Cheon (2):
  Bluetooth: ISO: Fix data-race on dst in iso_sock_connect()
  Bluetooth: ISO: Fix data-race on iso_pi(sk) in socket and HCI event
    paths

 net/bluetooth/iso.c | 59 ++++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 25 deletions(-)

-- 
2.52.0


^ permalink raw reply

* [PATCH 1/2] Bluetooth: ISO: Fix data-race on dst in iso_sock_connect()
From: SeungJu Cheon @ 2026-04-18  5:34 UTC (permalink / raw)
  To: luiz.dentz, marcel
  Cc: linux-bluetooth, netdev, linux-kernel, me, skhan,
	linux-kernel-mentees, SeungJu Cheon
In-Reply-To: <20260418053401.128483-1-suunj1331@gmail.com>

iso_sock_connect() copies the destination address into
iso_pi(sk)->dst under lock_sock, then releases the lock and reads
it back with bacmp() to decide between the CIS and BIS connect
paths:

    lock_sock(sk);
    bacpy(&iso_pi(sk)->dst, &sa->iso_bdaddr);
    iso_pi(sk)->dst_type = sa->iso_bdaddr_type;
    release_sock(sk);

    if (bacmp(&iso_pi(sk)->dst, BDADDR_ANY))  // <- no lock held

This read after release_sock() races with any concurrent write to
iso_pi(sk)->dst on the same socket.

Fix by performing the bacmp() inside the lock_sock critical section
and caching the result in a local variable.

This patch addresses only the bacmp() race in iso_sock_connect();
other unprotected iso_pi(sk) accesses are fixed separately in the
next patch.

KCSAN report:

BUG: KCSAN: data-race in memcmp+0x39/0xb0

race at unknown origin, with read to 0xffff8f96ea66dde3 of 1 bytes by task 549 on cpu 1:
 memcmp+0x39/0xb0
 iso_sock_connect+0x275/0xb40
 __sys_connect_file+0xbd/0xe0
 __sys_connect+0xe0/0x110
 __x64_sys_connect+0x40/0x50
 x64_sys_call+0xcad/0x1c60
 do_syscall_64+0x133/0x590
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x00 -> 0xee

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 549 Comm: iso_race_combin Not tainted 7.0.0-08391-g1d51b370a0f8 #40 PREEMPT(lazy)

Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: SeungJu Cheon <suunj1331@gmail.com>
---
 net/bluetooth/iso.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c
index be145e2736b7..14963ba68597 100644
--- a/net/bluetooth/iso.c
+++ b/net/bluetooth/iso.c
@@ -1169,6 +1169,7 @@ static int iso_sock_connect(struct socket *sock, struct sockaddr_unsized *addr,
 	struct sockaddr_iso *sa = (struct sockaddr_iso *)addr;
 	struct sock *sk = sock->sk;
 	int err;
+	bool bcast;
 
 	BT_DBG("sk %p", sk);
 
@@ -1191,9 +1192,11 @@ static int iso_sock_connect(struct socket *sock, struct sockaddr_unsized *addr,
 	bacpy(&iso_pi(sk)->dst, &sa->iso_bdaddr);
 	iso_pi(sk)->dst_type = sa->iso_bdaddr_type;
 
+	bcast = !bacmp(&iso_pi(sk)->dst, BDADDR_ANY);
+
 	release_sock(sk);
 
-	if (bacmp(&iso_pi(sk)->dst, BDADDR_ANY))
+	if (!bcast)
 		err = iso_connect_cis(sk);
 	else
 		err = iso_connect_bis(sk);
-- 
2.52.0
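
The "decide under the lock, branch on a local" pattern from the patch above can be illustrated in userspace. This is only a minimal sketch under stated assumptions: a pthread mutex stands in for lock_sock()/release_sock(), and struct demo_sock, struct addr, demo_addr_any and demo_connect() are hypothetical names, not the kernel types or the code in net/bluetooth/iso.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for bdaddr_t and iso_pi(sk); not the kernel types. */
struct addr { unsigned char b[6]; };

struct demo_sock {
	pthread_mutex_t lock;	/* plays the role of lock_sock()/release_sock() */
	struct addr dst;	/* shared field that concurrent callers may write */
};

enum demo_path { DEMO_PATH_CIS, DEMO_PATH_BIS };

static const struct addr demo_addr_any = { { 0, 0, 0, 0, 0, 0 } };

static void demo_sock_init(struct demo_sock *s)
{
	assert(s != NULL);
	pthread_mutex_init(&s->lock, NULL);
	s->dst = demo_addr_any;
}

static enum demo_path demo_connect(struct demo_sock *s, const struct addr *to)
{
	bool bcast;

	pthread_mutex_lock(&s->lock);
	s->dst = *to;
	/* Make the decision while the lock is still held ... */
	bcast = memcmp(&s->dst, &demo_addr_any, sizeof(s->dst)) == 0;
	pthread_mutex_unlock(&s->lock);

	/* ... and branch only on the local copy after the lock is dropped. */
	return bcast ? DEMO_PATH_BIS : DEMO_PATH_CIS;
}
```

After demo_sock_init(), a call with the all-zero address selects the broadcast path and any other address selects the unicast path, regardless of what another thread writes to dst once the mutex has been released.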


^ permalink raw reply related

* [PATCH 2/2] Bluetooth: ISO: Fix data-race on iso_pi(sk) in socket and HCI event paths
From: SeungJu Cheon @ 2026-04-18  5:34 UTC (permalink / raw)
  To: luiz.dentz, marcel
  Cc: linux-bluetooth, netdev, linux-kernel, me, skhan,
	linux-kernel-mentees, SeungJu Cheon
In-Reply-To: <20260418053401.128483-1-suunj1331@gmail.com>

Several iso_pi(sk) fields (qos, qos_user_set, bc_sid, base, base_len,
sync_handle, bc_num_bis) are written under lock_sock in
iso_sock_setsockopt() and iso_sock_bind(), but read and written under
hci_dev_lock only in two other paths:

  - iso_connect_bis() / iso_connect_cis(), invoked from connect(2),
    read qos/base/bc_sid and reset qos to default_qos on the
    qos_user_set validation failure -- all without lock_sock.

  - iso_connect_ind(), invoked from hci_rx_work, writes sync_handle,
    bc_sid, qos.bcast.encryption, bc_num_bis, base and base_len on
    PA_SYNC_ESTABLISHED / PAST_RECEIVED / BIG_INFO_ADV_REPORT /
    PER_ADV_REPORT events. The BIG_INFO handler additionally passes
    &iso_pi(sk)->qos together with sync_handle / bc_num_bis / bc_bis
    to hci_conn_big_create_sync() while setsockopt may be mutating
    them.

Acquire lock_sock around the affected accesses in both paths.

The locking order hci_dev_lock -> lock_sock matches the existing
iso_conn_big_sync() precedent, whose comment documents the same
requirement for hci_conn_big_create_sync(). The HCI connect/bind
helpers do not wait for command completion -- they enqueue work via
hci_cmd_sync_queue{,_once}() / hci_le_create_cis_pending() and
return -- so the added hold time is comparable to iso_conn_big_sync().

KCSAN report:

BUG: KCSAN: data-race in iso_connect_cis / iso_sock_setsockopt

read to 0xffffa3ae8ce3cdc8 of 1 bytes by task 335 on cpu 0:
 iso_connect_cis+0x49f/0xa20
 iso_sock_connect+0x60e/0xb40
 __sys_connect_file+0xbd/0xe0
 __sys_connect+0xe0/0x110
 __x64_sys_connect+0x40/0x50
 x64_sys_call+0xcad/0x1c60
 do_syscall_64+0x133/0x590
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

write to 0xffffa3ae8ce3cdc8 of 60 bytes by task 334 on cpu 1:
 iso_sock_setsockopt+0x69a/0x930
 do_sock_setsockopt+0xc3/0x170
 __sys_setsockopt+0xd1/0x130
 __x64_sys_setsockopt+0x64/0x80
 x64_sys_call+0x1547/0x1c60
 do_syscall_64+0x133/0x590
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 334 Comm: iso_setup_race Not tainted 7.0.0-10949-g8541d8f725c6 #44 PREEMPT(lazy)

The iso_connect_ind() races were found by inspection.

Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: SeungJu Cheon <suunj1331@gmail.com>
---
 net/bluetooth/iso.c | 54 +++++++++++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c
index 14963ba68597..3ba13769be3a 100644
--- a/net/bluetooth/iso.c
+++ b/net/bluetooth/iso.c
@@ -347,6 +347,7 @@ static int iso_connect_bis(struct sock *sk)
 		return -EHOSTUNREACH;
 
 	hci_dev_lock(hdev);
+	lock_sock(sk);
 
 	if (!bis_capable(hdev)) {
 		err = -EOPNOTSUPP;
@@ -399,13 +400,9 @@ static int iso_connect_bis(struct sock *sk)
 		goto unlock;
 	}
 
-	lock_sock(sk);
-
 	err = iso_chan_add(conn, sk, NULL);
-	if (err) {
-		release_sock(sk);
+	if (err)
 		goto unlock;
-	}
 
 	/* Update source addr of the socket */
 	bacpy(&iso_pi(sk)->src, &hcon->src);
@@ -421,9 +418,8 @@ static int iso_connect_bis(struct sock *sk)
 		iso_sock_set_timer(sk, READ_ONCE(sk->sk_sndtimeo));
 	}
 
-	release_sock(sk);
-
 unlock:
+	release_sock(sk);
 	hci_dev_unlock(hdev);
 	hci_dev_put(hdev);
 	return err;
@@ -444,6 +440,7 @@ static int iso_connect_cis(struct sock *sk)
 		return -EHOSTUNREACH;
 
 	hci_dev_lock(hdev);
+	lock_sock(sk);
 
 	if (!cis_central_capable(hdev)) {
 		err = -EOPNOTSUPP;
@@ -498,13 +495,9 @@ static int iso_connect_cis(struct sock *sk)
 		goto unlock;
 	}
 
-	lock_sock(sk);
-
 	err = iso_chan_add(conn, sk, NULL);
-	if (err) {
-		release_sock(sk);
+	if (err)
 		goto unlock;
-	}
 
 	/* Update source addr of the socket */
 	bacpy(&iso_pi(sk)->src, &hcon->src);
@@ -520,9 +513,8 @@ static int iso_connect_cis(struct sock *sk)
 		iso_sock_set_timer(sk, READ_ONCE(sk->sk_sndtimeo));
 	}
 
-	release_sock(sk);
-
 unlock:
+	release_sock(sk);
 	hci_dev_unlock(hdev);
 	hci_dev_put(hdev);
 	return err;
@@ -2259,8 +2251,10 @@ int iso_connect_ind(struct hci_dev *hdev, bdaddr_t *bdaddr, __u8 *flags)
 		sk = iso_get_sock(hdev, &hdev->bdaddr, bdaddr, BT_LISTEN,
 				  iso_match_sid, ev1);
 		if (sk && !ev1->status) {
+			lock_sock(sk);
 			iso_pi(sk)->sync_handle = le16_to_cpu(ev1->handle);
 			iso_pi(sk)->bc_sid = ev1->sid;
+			release_sock(sk);
 		}
 
 		goto done;
@@ -2271,8 +2265,10 @@ int iso_connect_ind(struct hci_dev *hdev, bdaddr_t *bdaddr, __u8 *flags)
 		sk = iso_get_sock(hdev, &hdev->bdaddr, bdaddr, BT_LISTEN,
 				  iso_match_sid_past, ev1a);
 		if (sk && !ev1a->status) {
+			lock_sock(sk);
 			iso_pi(sk)->sync_handle = le16_to_cpu(ev1a->sync_handle);
 			iso_pi(sk)->bc_sid = ev1a->sid;
+			release_sock(sk);
 		}
 
 		goto done;
@@ -2299,27 +2295,35 @@ int iso_connect_ind(struct hci_dev *hdev, bdaddr_t *bdaddr, __u8 *flags)
 					  ev2);
 
 		if (sk) {
-			int err;
-			struct hci_conn	*hcon = iso_pi(sk)->conn->hcon;
+			int err = 0;
+			bool big_sync;
+			struct hci_conn *hcon;
 
+			lock_sock(sk);
+
+			hcon = iso_pi(sk)->conn->hcon;
 			iso_pi(sk)->qos.bcast.encryption = ev2->encryption;
 
 			if (ev2->num_bis < iso_pi(sk)->bc_num_bis)
 				iso_pi(sk)->bc_num_bis = ev2->num_bis;
 
-			if (!test_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags) &&
-			    !test_and_set_bit(BT_SK_BIG_SYNC, &iso_pi(sk)->flags)) {
+			big_sync = !test_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags) &&
+				   !test_and_set_bit(BT_SK_BIG_SYNC, &iso_pi(sk)->flags);
+
+			if (big_sync)
 				err = hci_conn_big_create_sync(hdev, hcon,
 							       &iso_pi(sk)->qos,
 							       iso_pi(sk)->sync_handle,
 							       iso_pi(sk)->bc_num_bis,
 							       iso_pi(sk)->bc_bis);
-				if (err) {
-					bt_dev_err(hdev, "hci_le_big_create_sync: %d",
-						   err);
-					sock_put(sk);
-					sk = NULL;
-				}
+
+			release_sock(sk);
+
+			if (big_sync && err) {
+				bt_dev_err(hdev, "hci_le_big_create_sync: %d",
+					   err);
+				sock_put(sk);
+				sk = NULL;
 			}
 		}
 
@@ -2373,8 +2377,10 @@ int iso_connect_ind(struct hci_dev *hdev, bdaddr_t *bdaddr, __u8 *flags)
 			if (!base || base_len > BASE_MAX_LENGTH)
 				goto done;
 
+			lock_sock(sk);
 			memcpy(iso_pi(sk)->base, base, base_len);
 			iso_pi(sk)->base_len = base_len;
+			release_sock(sk);
 		} else {
 			/* This is a PA data fragment. Keep pa_data_len set to 0
 			 * until all data has been reassembled.
-- 
2.52.0
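
The control-flow shape the patch above gives iso_connect_bis() and iso_connect_cis() (take the outer lock, then the inner lock, and release both at a single exit label) can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel code: pthread mutexes stand in for hci_dev_lock() and lock_sock(), and struct demo_hdev, struct demo_sk and demo_connect_locked() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for struct hci_dev and struct sock. */
struct demo_hdev { pthread_mutex_t lock; int capable; };
struct demo_sk   { pthread_mutex_t lock; int qos_valid; int connected; };

static int demo_connect_locked(struct demo_hdev *hdev, struct demo_sk *sk)
{
	int err = 0;

	assert(hdev != NULL && sk != NULL);

	pthread_mutex_lock(&hdev->lock);	/* outer lock, like hci_dev_lock() */
	pthread_mutex_lock(&sk->lock);		/* inner lock, like lock_sock() */

	if (!hdev->capable) {
		err = -EOPNOTSUPP;
		goto unlock;
	}

	if (!sk->qos_valid) {
		err = -EINVAL;
		goto unlock;
	}

	sk->connected = 1;

unlock:
	/* Every path leaves through here, releasing in reverse order. */
	pthread_mutex_unlock(&sk->lock);
	pthread_mutex_unlock(&hdev->lock);
	return err;
}
```

Because both locks are taken before the first check, no error path needs its own release call, which is why the patch can drop the per-branch release_sock() calls and move one to the unlock label.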


^ permalink raw reply related

* Re: [PATCH v1 net] tcp: Disable usec TS for SYN Cookie.
From: Eric Dumazet @ 2026-04-18  5:49 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: Neal Cardwell, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, netdev
In-Reply-To: <CAAVpQUA8+eibr_0CcdKEWUhzyn6SNE8MA5uzYCJhNmWstq2OAQ@mail.gmail.com>

On Fri, Apr 17, 2026 at 10:33 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> On Fri, Apr 17, 2026 at 9:59 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Fri, Apr 17, 2026 at 7:50 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
> > >
> > > cookie_tcp_reqsk_alloc() sets tcp_rsk(req)->req_usec_ts to false
> > > unconditionally.
> > >
> > > If want_cookie is true in tcp_conn_request(), we should not set
> > > tcp_rsk(req)->req_usec_ts.
> > >
> > > Let's not call dst_tcp_usec_ts() for SYN Cookie.
> >
> > May I ask why ?
>
> Sorry, I missed that tcp_skb_timestamp_ts() properly restores the
> cookie TS generated by cookie_init_timestamp() to ms units.
>
> Still, we don't need to call dst_tcp_usec_ts() for SYN cookies,
> but this was more of a cleanup patch.

Okay, but consider that the standard path (or fast path) has to call it.
dst_tcp_usec_ts() is a mere dst_feature(dst,
RTAX_FEATURE_TCP_USEC_TS), which is pretty fast.
Adding a conditional branch won't help.

^ permalink raw reply

* Re: [PATCH net 1/2] tcp: call sk_data_ready() after listener migration
From: Eric Dumazet @ 2026-04-18  6:02 UTC (permalink / raw)
  To: Zhenzhong Wu
  Cc: netdev, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, stable
In-Reply-To: <20260418041633.691435-2-jt26wzz@gmail.com>

On Fri, Apr 17, 2026 at 9:17 PM Zhenzhong Wu <jt26wzz@gmail.com> wrote:
>
> When inet_csk_listen_stop() migrates an established child socket from
> a closing listener to another socket in the same SO_REUSEPORT group,
> the target listener gets a new accept-queue entry via
> inet_csk_reqsk_queue_add(), but that path never notifies the target
> listener's waiters.
>
> As a result, a nonblocking accept() still succeeds because it checks
> the accept queue directly, but waiters that sleep for listener
> readiness can remain asleep until another connection generates a
> wakeup. This affects poll()/epoll_wait()-based waiters, and can also
> leave a blocking accept() asleep after migration even though the
> child is already in the target listener's accept queue.
>
> This was observed in a local test where listener A completed the
> handshake, queued the child, and was closed before userspace called
> accept(). The child was migrated to listener B, but listener B never
> received a wakeup for the migrated accept-queue entry.
>
> Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
> in inet_csk_listen_stop().
>
> The reqsk_timer_handler() path does not need the same change:
> half-open requests only become readable to userspace when the final
> ACK completes the handshake, and tcp_child_process() already wakes
> the listener in that case.
>
> Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
> Cc: stable@vger.kernel.org
> Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
> ---
>  net/ipv4/inet_connection_sock.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 4ac3ae1bc..da1ce082f 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk)
>                                         __NET_INC_STATS(sock_net(nsk),
>                                                         LINUX_MIB_TCPMIGRATEREQSUCCESS);
>                                         reqsk_migrate_reset(req);
> +                                       READ_ONCE(nsk->sk_data_ready)(nsk);

I think this is adding a potential UAF (Use After Free).
@nsk might have been freed already by another thread/cpu.
Note the existing code already has similar issues.

Untested patch:

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ac3ae1bc1afc3a39f2790e39b4dda877dc3272b..287b6e01c4f71bfec3dd2a708f316224d9eb4a64 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1479,6 +1479,7 @@ void inet_csk_listen_stop(struct sock *sk)
                        if (nreq) {
                                refcount_set(&nreq->rsk_refcnt, 1);

+                               rcu_read_lock();
                                if (inet_csk_reqsk_queue_add(nsk, nreq, child)) {
                                        __NET_INC_STATS(sock_net(nsk),
                                                        LINUX_MIB_TCPMIGRATEREQSUCCESS);
@@ -1489,7 +1490,7 @@ void inet_csk_listen_stop(struct sock *sk)
                                        reqsk_migrate_reset(nreq);
                                        __reqsk_free(nreq);
                                }
-
+                               rcu_read_unlock();
                                /* inet_csk_reqsk_queue_add() has already
                                 * called inet_child_forget() on failure case.
                                 */

^ permalink raw reply

