Netdev List

Netdev List
 help / color / mirror / Atom feed

* [RFC PATCH net] mptcp: pm: fix ADD_ADDR timer infinite retry on option space insufficient
From: Li Xiasong @ 2026-04-18 10:00 UTC (permalink / raw)
  To: Matthieu Baerts, Mat Martineau, Geliang Tang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
  Cc: netdev, mptcp, linux-kernel, yuehaibing, zhangchangzhong,
	weiyongjun1

When TCP option space is insufficient (e.g., IPv6 with tcp_timestamps
enabled), the original code jumped to out_unlock without clearing the
addr_signal flag. This caused mptcp_pm_add_timer to keep rescheduling
indefinitely without sending ADD_ADDR, preventing the endpoint list from
being traversed.

In a pure ACK scenario (indicated by drop_other_suboptions=true), if
the option space is insufficient to carry the ADD_ADDR suboption, it
is appropriate to drop this address signal to allow the timer handler
to move on to other addresses.

Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Signed-off-by: Li Xiasong <lixiasong1@huawei.com>
---

Seeking feedback on:

When announcing addresses to the peer, MPTCP sends a pure ACK packet
to carry MPTCP options (ADD_ADDR). In this scenario, if the option space
is insufficient for ADD_ADDR, clearing addr_signal would:

  - Prevent the timer from retrying infinitely
  - Allow the timer to continue traversing and processing other addresses
  - Not block other subflow creation or address announcement operations

Is there any scenario where we should retry later instead of clearing
the address signal/echo flag? However, if a pure ACK doesn't have
enough space for the flag, subsequent packets won't either.

---
 net/mptcp/pm.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 57a456690406..1d49779c6a1f 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -881,19 +881,18 @@ bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, const struct sk_buff *skb,
 	}

 	*echo = mptcp_pm_should_add_signal_echo(msk);
+	add_addr = msk->pm.addr_signal &
+		~(*echo ? BIT(MPTCP_ADD_ADDR_ECHO) : BIT(MPTCP_ADD_ADDR_SIGNAL));
 	port = !!(*echo ? msk->pm.remote.port : msk->pm.local.port);
-
 	family = *echo ? msk->pm.remote.family : msk->pm.local.family;
-	if (remaining < mptcp_add_addr_len(family, *echo, port))
-		goto out_unlock;

-	if (*echo) {
-		*addr = msk->pm.remote;
-		add_addr = msk->pm.addr_signal & ~BIT(MPTCP_ADD_ADDR_ECHO);
-	} else {
-		*addr = msk->pm.local;
-		add_addr = msk->pm.addr_signal & ~BIT(MPTCP_ADD_ADDR_SIGNAL);
+	if (remaining < mptcp_add_addr_len(family, *echo, port)) {
+		if (*drop_other_suboptions)
+			WRITE_ONCE(msk->pm.addr_signal, add_addr);
+		goto out_unlock;
 	}
+
+	*addr = *echo ? msk->pm.remote : msk->pm.local;
 	WRITE_ONCE(msk->pm.addr_signal, add_addr);
 	ret = true;

-- 
2.34.1

^ permalink raw reply related

* pre-boot plugged SFP autoneg advertisement
From: markus.stockhausen @ 2026-04-18  9:27 UTC (permalink / raw)
  To: linux, andrew, hkallweit1, netdev; +Cc: 'Jonas Jelonek', jan

Hi,

I'm currently analyzing an issue where a pre-boot-plugged SFP module 
comes up with autoneg=no advertisement during boot. After an
unplug/replug autoneg=yes advertisement is chosen. 

The following addition in phylink_start() just before the call to
phylink_mac_initial_config() mitigiates this.

+  /* If an SFP module was already present before phylink_start() was
+   * called, phylink_sfp_set_config() was unable to call
+   * phylink_mac_initial_config() as phylink was not yet started.
+   * Ensure the SFP capabilities are reflected in advertising.
+   */
+  if (pl->sfp_bus && !linkmode_empty(pl->sfp_support))
+    linkmode_copy(pl->link_config.advertising, pl->sfp_support);

Remark! This is about the OpenWrt Realtek Switch ecosystem with 
kernel 6.18 where we are working hard to get hardware up and 
running. We still rely heavily on pcs/dsa downstream drivers. So 
I'm unsure if my observation/idea regarding upstream phylink is 
right.

Thanks for your feedback in advance.

Markus

^ permalink raw reply

* [PATCH iwl-net v1] ice: fix UAF/NULL deref when VSI rebuild and XDP attach race
From: Kohei Enju @ 2026-04-18  9:01 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Wojciech Drewek,
	Jacob Keller, Larysa Zaremba, Maciej Fijalkowski, Kohei Enju

ice_xdp_setup_prog() unconditionally hot-swaps xdp_prog when
ICE_VSI_REBUILD_PENDING is set. In the attach path, this can publish a
new rx_ring->xdp_prog before rx_ring->xdp_ring becomes valid while the
rebuild is pending. As a result, ice_clean_rx_irq() may dereference
rx_ring->xdp_ring too early.

With high-volume RX packets, running these commands in parallel
triggered a KASAN splat [1].
 # ethtool --reset $DEV irq dma filter offload
 # ip link set dev $DEV xdp {obj $OBJ sec xdp,off}

Fix this by rejecting XDP attach while rebuild is pending.
Keep XDP detach allowed in this window. Detach clears rx_ring->xdp_prog,
so the RX path will not attempt to access rx_ring->xdp_ring.

[1]
BUG: KASAN: slab-use-after-free in ice_napi_poll+0x3921/0x41a0
Read of size 2 at addr ffff88812475b880 by task ksoftirqd/1/23
[...]
Call Trace:
 <TASK>
 ice_napi_poll+0x3921/0x41a0
 __napi_poll+0x98/0x520
 net_rx_action+0x8f2/0xfa0
 handle_softirqs+0x1cb/0x7f0
[...]
 </TASK>

Allocated by task 7246:
 ice_prepare_xdp_rings+0x3de/0x12d0
 ice_xdp+0x61c/0xef0
 dev_xdp_install+0x3c4/0x840
 dev_xdp_attach+0x50a/0x10a0
 dev_change_xdp_fd+0x175/0x210
[...]

Freed by task 7251:
 __rcu_free_sheaf_prepare+0x5f/0x230
 rcu_free_sheaf+0x1a/0xf0
 rcu_core+0x567/0x1d80
 handle_softirqs+0x1cb/0x7f0

Fixes: 2504b8405768 ("ice: protect XDP configuration with a mutex")
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index d1f628f1c8ac..4681cbe193f6 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2912,12 +2912,21 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
 	}
 
 	/* hot swap progs and avoid toggling link */
-	if (ice_is_xdp_ena_vsi(vsi) == !!prog ||
-	    test_bit(ICE_VSI_REBUILD_PENDING, vsi->state)) {
+	if (ice_is_xdp_ena_vsi(vsi) == !!prog) {
 		ice_vsi_assign_bpf_prog(vsi, prog);
 		return 0;
 	}
 
+	if (test_bit(ICE_VSI_REBUILD_PENDING, vsi->state)) {
+		if (prog) {
+			NL_SET_ERR_MSG_MOD(extack, "VSI rebuild is pending");
+			return -EAGAIN;
+		}
+
+		ice_vsi_assign_bpf_prog(vsi, NULL);
+		return 0;
+	}
+
 	if_running = netif_running(vsi->netdev) &&
 		     !test_and_set_bit(ICE_VSI_DOWN, vsi->state);
 
-- 
2.51.0


^ permalink raw reply related

* Re: [PATCH net-deletions] caif: remove CAIF NETWORK LAYER
From: Greg KH @ 2026-04-18  8:48 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	skhan, alexs, si.yanteng, dzm91, linux, mst, jasowang, xuanzhuo,
	eperezma, xu.xin16, wang.yaxin, jiang.kun2, linusw,
	jihed.chaibi.dev, arnd, tytso, jiayuan.chen
In-Reply-To: <20260416182829.1440262-1-kuba@kernel.org>

On Thu, Apr 16, 2026 at 11:28:28AM -0700, Jakub Kicinski wrote:
> Remove CAIF (Communication CPU to Application CPU Interface), the
> ST-Ericsson modem protocol. The subsystem has been orphaned since 2013.
> The last meaningful changes from the maintainers were in March 2013:
>   a8c7687bf216 ("caif_virtio: Check that vringh_config is not null")
>   b2273be8d2df ("caif_virtio: Use vringh_notify_enable correctly")
>   0d2e1a2926b1 ("caif_virtio: Introduce caif over virtio")
> 
> Not-so-coincidentally, according to "the Internet" ST-Ericsson officially
> shut down its modem joint venture in Aug 2013.
> 
> If anyone is using this code please yell!
> 
> In the 13 years since, the code has accumulated 200 non-merge commits,
> of which 71 were cross-tree API changes, 21 carried Fixes: tags, and
> the remaining ~110 were cleanups, doc conversions, treewide refactors,
> and one partial removal (caif_hsi, ca75bcf0a83b).
> 
> We are still getting fixes to this code, in the last 10 days there were
> 3 reports on security@ about CAIF that I have been CCed on.
> 
> UAPI constants (AF_CAIF, ARPHRD_CAIF, N_CAIF, VIRTIO_ID_CAIF) and the
> SELinux classmap entry are intentionally kept for ABI stability.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> I think we should accumulate such patches over the coming days on a separate
> branch. CAIF is a no-brainer IMO but other removals may be more controversial.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply

* Re: [PATCH 1/2 nf] netfilter: nfnetlink_osf: fix out-of-bounds read on option matching
From: Pablo Neira Ayuso @ 2026-04-18  7:57 UTC (permalink / raw)
  To: Fernando Fernandez Mancera; +Cc: netfilter-devel, netdev, coreteam, fw, phil
In-Reply-To: <20260417162057.3732-1-fmancera@suse.de>

On Fri, Apr 17, 2026 at 06:20:56PM +0200, Fernando Fernandez Mancera wrote:
> In nf_osf_match(), the nf_osf_hdr_ctx structure is initialized once
> and passed by reference to nf_osf_match_one() for each fingerprint
> checked. During TCP option parsing, nf_osf_match_one() advances the
> shared ctx->optp pointer.
> 
> If a fingerprint perfectly matches, the function returns early without
> restoring ctx->optp to its initial state. If the user has configured
> NF_OSF_LOGLEVEL_ALL, the loop continues to the next fingerprint.
> However, because ctx->optp was not restored, the next call to
> nf_osf_match_one() starts parsing from the end of the options buffer.
> This causes subsequent matches to read garbage data and fail
> immediately, making it impossible to log more than one match or logging
> incorrect matches.
> 
> Instead of using a shared ctx->optp pointer, pass the context as a
> constant pointer and use a local pointer (optp) for TCP option
> traversal. This makes nf_osf_match_one() strictly stateless from the
> caller's perspective, ensuring every fingerprint check starts at the
> correct option offset.
> 
> Fixes: 1a6a0951fc00 ("netfilter: nfnetlink_osf: add missing fmatch check")
> Suggested-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>

Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

^ permalink raw reply

* Re: [PATCH 2/2 nf] netfilter: nfnetlink_osf: fix potential NULL dereference in ttl check
From: Pablo Neira Ayuso @ 2026-04-18  7:53 UTC (permalink / raw)
  To: Fernando Fernandez Mancera
  Cc: netfilter-devel, netdev, coreteam, fw, phil, Kito Xu (veritas501)
In-Reply-To: <20260417162057.3732-2-fmancera@suse.de>

On Fri, Apr 17, 2026 at 06:20:57PM +0200, Fernando Fernandez Mancera wrote:
> The nf_osf_ttl() function accessed skb->dev to perform a local interface
> address lookup without verifying that the device pointer was valid.
> 
> Additionally, the implementation utilized an in_dev_for_each_ifa_rcu
> loop to match the packet source address against local interface
> addresses. It assumed that packets from the same subnet should not see a
> decrement on the initial TTL. A packet might appear it is from the same
> subnet but it actually isn't especially in modern environments with
> containers and virtual switching.
> 
> Remove the device dereference and interface loop. Replace the logic with
> a switch statement that evaluates the TTL according to the ttl_check.
> 
> Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match")
> Reported-by: Kito Xu (veritas501) <hxzene@gmail.com>
> Closes: https://lore.kernel.org/netfilter-devel/20260414074556.2512750-1-hxzene@gmail.com/
> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>

Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

> ---
> Note: if some help is needed during the backport I can assist.
> ---
>  net/netfilter/nfnetlink_osf.c | 22 +++++++---------------
>  1 file changed, 7 insertions(+), 15 deletions(-)
> 
> diff --git a/net/netfilter/nfnetlink_osf.c b/net/netfilter/nfnetlink_osf.c
> index f58267986453..f0d1e596e146 100644
> --- a/net/netfilter/nfnetlink_osf.c
> +++ b/net/netfilter/nfnetlink_osf.c
> @@ -31,26 +31,18 @@ EXPORT_SYMBOL_GPL(nf_osf_fingers);
>  static inline int nf_osf_ttl(const struct sk_buff *skb,
>  			     int ttl_check, unsigned char f_ttl)
>  {
> -	struct in_device *in_dev = __in_dev_get_rcu(skb->dev);
>  	const struct iphdr *ip = ip_hdr(skb);
> -	const struct in_ifaddr *ifa;
> -	int ret = 0;
>  
> -	if (ttl_check == NF_OSF_TTL_TRUE)
> +	switch (ttl_check) {
> +	case NF_OSF_TTL_TRUE:
>  		return ip->ttl == f_ttl;
> -	if (ttl_check == NF_OSF_TTL_NOCHECK)
> -		return 1;
> -	else if (ip->ttl <= f_ttl)
> +		break;
> +	case NF_OSF_TTL_NOCHECK:
>  		return 1;
> -
> -	in_dev_for_each_ifa_rcu(ifa, in_dev) {
> -		if (inet_ifa_match(ip->saddr, ifa)) {
> -			ret = (ip->ttl == f_ttl);
> -			break;
> -		}
> +	case NF_OSF_TTL_LESS:
> +	default:
> +		return ip->ttl <= f_ttl;
>  	}
> -
> -	return ret;
>  }
>  
>  struct nf_osf_hdr_ctx {
> -- 
> 2.53.0
> 

^ permalink raw reply

* Re: [PATCH 4/4 nf] netfilter: xtables: fix L4 header parsing for non-first fragments
From: Pablo Neira Ayuso @ 2026-04-18  7:51 UTC (permalink / raw)
  To: Fernando Fernandez Mancera; +Cc: netfilter-devel, netdev, coreteam, fw, phil
In-Reply-To: <20260417183433.4739-6-fmancera@suse.de>

On Fri, Apr 17, 2026 at 08:34:35PM +0200, Fernando Fernandez Mancera wrote:
> The TPROXY target and osf match relies on L4 header to operate. For
> fragmented packets, every fragment carries the transport protocol
> identifier, but only the first fragment contains the L4 header.
> 
> As the 'raw' table can be configured to run at priority -450 (before
> defragmentation at -400), the target/match can be reached before
> reassembly. In this case, non-first fragments have their payload
> incorrectly parsed as a TCP/UDP header.

I see, this refers to a misconfiguration scenario.

> Add a fragment check to ensure TPROXY/osf only evaluates unfragmented
> packets or the first fragment in the stream.

LGTM this combo patch for osf and TPROXY in xtables.

Thanks.

> Fixes: 902d6a4c2a4f ("netfilter: nf_defrag: Skip defrag if NOTRACK is set")
> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
> ---
>  net/netfilter/xt_TPROXY.c | 8 ++++++--
>  net/netfilter/xt_osf.c    | 3 +++
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/xt_TPROXY.c b/net/netfilter/xt_TPROXY.c
> index e4bea1d346cf..ac4b011ce48c 100644
> --- a/net/netfilter/xt_TPROXY.c
> +++ b/net/netfilter/xt_TPROXY.c
> @@ -40,6 +40,9 @@ tproxy_tg4(struct net *net, struct sk_buff *skb, __be32 laddr, __be16 lport,
>  	struct udphdr _hdr, *hp;
>  	struct sock *sk;
>  
> +	if (ip_is_fragment(iph))
> +		return NF_DROP;
> +
>  	hp = skb_header_pointer(skb, ip_hdrlen(skb), sizeof(_hdr), &_hdr);
>  	if (hp == NULL)
>  		return NF_DROP;
> @@ -106,6 +109,7 @@ tproxy_tg6_v1(struct sk_buff *skb, const struct xt_action_param *par)
>  {
>  	const struct ipv6hdr *iph = ipv6_hdr(skb);
>  	const struct xt_tproxy_target_info_v1 *tgi = par->targinfo;
> +	unsigned short fragoff = 0;
>  	struct udphdr _hdr, *hp;
>  	struct sock *sk;
>  	const struct in6_addr *laddr;
> @@ -113,8 +117,8 @@ tproxy_tg6_v1(struct sk_buff *skb, const struct xt_action_param *par)
>  	int thoff = 0;
>  	int tproto;
>  
> -	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
> -	if (tproto < 0)
> +	tproto = ipv6_find_hdr(skb, &thoff, -1, &fragoff, NULL);
> +	if (tproto < 0 || fragoff)
>  		return NF_DROP;
>  
>  	hp = skb_header_pointer(skb, thoff, sizeof(_hdr), &_hdr);
> diff --git a/net/netfilter/xt_osf.c b/net/netfilter/xt_osf.c
> index dc9485854002..889dff4daff0 100644
> --- a/net/netfilter/xt_osf.c
> +++ b/net/netfilter/xt_osf.c
> @@ -27,6 +27,9 @@
>  static bool
>  xt_osf_match_packet(const struct sk_buff *skb, struct xt_action_param *p)
>  {
> +	if (ip_is_fragment(ip_hdr(skb)))
> +		return false;
> +
>  	return nf_osf_match(skb, xt_family(p), xt_hooknum(p), xt_in(p),
>  			    xt_out(p), p->matchinfo, xt_net(p), nf_osf_fingers);
>  }
> -- 
> 2.53.0
> 

^ permalink raw reply

* Re: [PATCH 1/4 nf] netfilter: nft_exthdr: skip SCTP chunk evaluation for non-first fragments
From: Pablo Neira Ayuso @ 2026-04-18  7:49 UTC (permalink / raw)
  To: Fernando Fernandez Mancera; +Cc: netfilter-devel, netdev, coreteam, fw, phil
In-Reply-To: <20260417183433.4739-1-fmancera@suse.de>

Hi Fernando,

On Fri, Apr 17, 2026 at 08:34:30PM +0200, Fernando Fernandez Mancera wrote:
> The SCTP chunk matching logic in nft_exthdr relies on SCTP common header
> being present at the transport header offset. For fragmented packets at
> IP level, only the first fragment would match this condition.
> 
> The nft_exthdr could be used in a PREROUTING chain with a priority lower
> than -400. This would bypass defragmentation. In addition, it can be use
> in stateless environments so it should work on a environment where
> defragmentation is not being performed at all.

Yes, and stateless filtering is still a valid configuration, ie.
nf_conntrack is not loaded.

> Add a check for pkt->fragoff to ensure exthdr SCTP only evaluates
> unfragmented packets or the first fragment in the stream.

I would suggest to squash the three small patches to check for
pkt->fragoff in one patch. The three expressions have been already
around for a while (backporting the combo patch that makes the same
logical change should be easy) and it is basically the same logical
change.

Thanks!

> Fixes: 133dc203d77d ("netfilter: nft_exthdr: Support SCTP chunks")
> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
> ---
>  net/netfilter/nft_exthdr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c
> index 7eedf4e3ae9c..8eb708bb8cff 100644
> --- a/net/netfilter/nft_exthdr.c
> +++ b/net/netfilter/nft_exthdr.c
> @@ -376,7 +376,7 @@ static void nft_exthdr_sctp_eval(const struct nft_expr *expr,
>  	const struct sctp_chunkhdr *sch;
>  	struct sctp_chunkhdr _sch;
>  
> -	if (pkt->tprot != IPPROTO_SCTP)
> +	if (pkt->tprot != IPPROTO_SCTP || pkt->fragoff)
>  		goto err;
>  
>  	do {
> -- 
> 2.53.0
> 

^ permalink raw reply

* Re: [PATCH net-next] r8169: report per-queue statistics through netdev qstats
From: Eric Dumazet @ 2026-04-18  6:27 UTC (permalink / raw)
  To: Gustavo Arantes
  Cc: Heiner Kallweit, nic_swsd, Andrew Lunn, David S . Miller,
	Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260418021232.5425-1-dev.gustavoa@gmail.com>

On Fri, Apr 17, 2026 at 7:12 PM Gustavo Arantes <dev.gustavoa@gmail.com> wrote:
>
> r8169 maintains synchronized per-CPU software counters for packet and byte
> accounting, but does not expose them through the netdev qstats interface.
>
> Add netdev_stat_ops callbacks and report the existing software counters
> through queue 0 for both Rx and Tx. Provide zero base stats so device-scope
> qstats report the packet and byte counters as supported and match the
> existing RTNL statistics.
>
> Signed-off-by: Gustavo Arantes <dev.gustavoa@gmail.com>

## Form letter - net-next-closed

Please repost when net-next reopens after Apr 27th.

RFC patches sent for review only are obviously welcome at any time.

See:
https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#development-cycle

pw-bot: cr

^ permalink raw reply

* Re: [PATCH net 1/2] tcp: call sk_data_ready() after listener migration
From: Eric Dumazet @ 2026-04-18  6:02 UTC (permalink / raw)
  To: Zhenzhong Wu
  Cc: netdev, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, stable
In-Reply-To: <20260418041633.691435-2-jt26wzz@gmail.com>

On Fri, Apr 17, 2026 at 9:17 PM Zhenzhong Wu <jt26wzz@gmail.com> wrote:
>
> When inet_csk_listen_stop() migrates an established child socket from
> a closing listener to another socket in the same SO_REUSEPORT group,
> the target listener gets a new accept-queue entry via
> inet_csk_reqsk_queue_add(), but that path never notifies the target
> listener's waiters.
>
> As a result, a nonblocking accept() still succeeds because it checks
> the accept queue directly, but waiters that sleep for listener
> readiness can remain asleep until another connection generates a
> wakeup. This affects poll()/epoll_wait()-based waiters, and can also
> leave a blocking accept() asleep after migration even though the
> child is already in the target listener's accept queue.
>
> This was observed in a local test where listener A completed the
> handshake, queued the child, and was closed before userspace called
> accept(). The child was migrated to listener B, but listener B never
> received a wakeup for the migrated accept-queue entry.
>
> Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
> in inet_csk_listen_stop().
>
> The reqsk_timer_handler() path does not need the same change:
> half-open requests only become readable to userspace when the final
> ACK completes the handshake, and tcp_child_process() already wakes
> the listener in that case.
>
> Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
> Cc: stable@vger.kernel.org
> Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
> ---
>  net/ipv4/inet_connection_sock.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 4ac3ae1bc..da1ce082f 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk)
>                                         __NET_INC_STATS(sock_net(nsk),
>                                                         LINUX_MIB_TCPMIGRATEREQSUCCESS);
>                                         reqsk_migrate_reset(req);
> +                                       READ_ONCE(nsk->sk_data_ready)(nsk);

I think this is adding a potential UAF (Use Afte Free).
@nsk might have been freed already by another thread/cpu.
Note the existing code already has similar issues.

Untested patch:

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ac3ae1bc1afc3a39f2790e39b4dda877dc3272b..287b6e01c4f71bfec3dd2a708f316224d9eb4a64
100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1479,6 +1479,7 @@ void inet_csk_listen_stop(struct sock *sk)
                        if (nreq) {
                                refcount_set(&nreq->rsk_refcnt, 1);

+                               rcu_read_lock();
                                if (inet_csk_reqsk_queue_add(nsk,
nreq, child)) {
                                        __NET_INC_STATS(sock_net(nsk),

LINUX_MIB_TCPMIGRATEREQSUCCESS);
@@ -1489,7 +1490,7 @@ void inet_csk_listen_stop(struct sock *sk)
                                        reqsk_migrate_reset(nreq);
                                        __reqsk_free(nreq);
                                }
-
+                               rcu_read_unlock();
                                /* inet_csk_reqsk_queue_add() has already
                                 * called inet_child_forget() on failure case.
                                 */

^ permalink raw reply

* Re: [PATCH v1 net] tcp: Disable usec TS for SYN Cookie.
From: Eric Dumazet @ 2026-04-18  5:49 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: Neal Cardwell, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, netdev
In-Reply-To: <CAAVpQUA8+eibr_0CcdKEWUhzyn6SNE8MA5uzYCJhNmWstq2OAQ@mail.gmail.com>

On Fri, Apr 17, 2026 at 10:33 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> On Fri, Apr 17, 2026 at 9:59 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Fri, Apr 17, 2026 at 7:50 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
> > >
> > > cookie_tcp_reqsk_alloc() sets tcp_rsk(req)->req_usec_ts to false
> > > unconditionally.
> > >
> > > If want_cookie is true in tcp_conn_request(), we should not set
> > > tcp_rsk(req)->req_usec_ts.
> > >
> > > Let's not call dst_tcp_usec_ts() for SYN Cookie.
> >
> > May I ask why ?
>
> Sorry, I missed tcp_skb_timestamp_ts() properly restores the
> cookie TS generated by cookie_init_timestamp() to ms unit.
>
> Still we don't need to call dst_tcp_usec_ts() for SYN cookie,
> but this was more like a cleanup patch.

Okay, but consider the standard path (or fast path) has to call it.
dst_tcp_usec_ts() is a mere dst_feature(dst,
RTAX_FEATURE_TCP_USEC_TS), which is pretty fast.
Adding a conditional branch won't help.

^ permalink raw reply

* [PATCH 2/2] Bluetooth: ISO: Fix data-race on iso_pi(sk) in socket and HCI event paths
From: SeungJu Cheon @ 2026-04-18  5:34 UTC (permalink / raw)
  To: luiz.dentz, marcel
  Cc: linux-bluetooth, netdev, linux-kernel, me, skhan,
	linux-kernel-mentees, SeungJu Cheon
In-Reply-To: <20260418053401.128483-1-suunj1331@gmail.com>

Several iso_pi(sk) fields (qos, qos_user_set, bc_sid, base, base_len,
sync_handle, bc_num_bis) are written under lock_sock in
iso_sock_setsockopt() and iso_sock_bind(), but read and written under
hci_dev_lock only in two other paths:

  - iso_connect_bis() / iso_connect_cis(), invoked from connect(2),
    read qos/base/bc_sid and reset qos to default_qos on the
    qos_user_set validation failure -- all without lock_sock.

  - iso_connect_ind(), invoked from hci_rx_work, writes sync_handle,
    bc_sid, qos.bcast.encryption, bc_num_bis, base and base_len on
    PA_SYNC_ESTABLISHED / PAST_RECEIVED / BIG_INFO_ADV_REPORT /
    PER_ADV_REPORT events. The BIG_INFO handler additionally passes
    &iso_pi(sk)->qos together with sync_handle / bc_num_bis / bc_bis
    to hci_conn_big_create_sync() while setsockopt may be mutating
    them.

Acquire lock_sock around the affected accesses in both paths.

The locking order hci_dev_lock -> lock_sock matches the existing
iso_conn_big_sync() precedent, whose comment documents the same
requirement for hci_conn_big_create_sync(). The HCI connect/bind
helpers do not wait for command completion -- they enqueue work via
hci_cmd_sync_queue{,_once}() / hci_le_create_cis_pending() and
return -- so the added hold time is comparable to iso_conn_big_sync().

KCSAN report:

BUG: KCSAN: data-race in iso_connect_cis / iso_sock_setsockopt

read to 0xffffa3ae8ce3cdc8 of 1 bytes by task 335 on cpu 0:
 iso_connect_cis+0x49f/0xa20
 iso_sock_connect+0x60e/0xb40
 __sys_connect_file+0xbd/0xe0
 __sys_connect+0xe0/0x110
 __x64_sys_connect+0x40/0x50
 x64_sys_call+0xcad/0x1c60
 do_syscall_64+0x133/0x590
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

write to 0xffffa3ae8ce3cdc8 of 60 bytes by task 334 on cpu 1:
 iso_sock_setsockopt+0x69a/0x930
 do_sock_setsockopt+0xc3/0x170
 __sys_setsockopt+0xd1/0x130
 __x64_sys_setsockopt+0x64/0x80
 x64_sys_call+0x1547/0x1c60
 do_syscall_64+0x133/0x590
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 334 Comm: iso_setup_race Not tainted 7.0.0-10949-g8541d8f725c6 #44 PREEMPT(lazy)

The iso_connect_ind() races were found by inspection.

Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: SeungJu Cheon <suunj1331@gmail.com>
---
 net/bluetooth/iso.c | 54 +++++++++++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c
index 14963ba68597..3ba13769be3a 100644
--- a/net/bluetooth/iso.c
+++ b/net/bluetooth/iso.c
@@ -347,6 +347,7 @@ static int iso_connect_bis(struct sock *sk)
 		return -EHOSTUNREACH;
 
 	hci_dev_lock(hdev);
+	lock_sock(sk);
 
 	if (!bis_capable(hdev)) {
 		err = -EOPNOTSUPP;
@@ -399,13 +400,9 @@ static int iso_connect_bis(struct sock *sk)
 		goto unlock;
 	}
 
-	lock_sock(sk);
-
 	err = iso_chan_add(conn, sk, NULL);
-	if (err) {
-		release_sock(sk);
+	if (err)
 		goto unlock;
-	}
 
 	/* Update source addr of the socket */
 	bacpy(&iso_pi(sk)->src, &hcon->src);
@@ -421,9 +418,8 @@ static int iso_connect_bis(struct sock *sk)
 		iso_sock_set_timer(sk, READ_ONCE(sk->sk_sndtimeo));
 	}
 
-	release_sock(sk);
-
 unlock:
+	release_sock(sk);
 	hci_dev_unlock(hdev);
 	hci_dev_put(hdev);
 	return err;
@@ -444,6 +440,7 @@ static int iso_connect_cis(struct sock *sk)
 		return -EHOSTUNREACH;
 
 	hci_dev_lock(hdev);
+	lock_sock(sk);
 
 	if (!cis_central_capable(hdev)) {
 		err = -EOPNOTSUPP;
@@ -498,13 +495,9 @@ static int iso_connect_cis(struct sock *sk)
 		goto unlock;
 	}
 
-	lock_sock(sk);
-
 	err = iso_chan_add(conn, sk, NULL);
-	if (err) {
-		release_sock(sk);
+	if (err)
 		goto unlock;
-	}
 
 	/* Update source addr of the socket */
 	bacpy(&iso_pi(sk)->src, &hcon->src);
@@ -520,9 +513,8 @@ static int iso_connect_cis(struct sock *sk)
 		iso_sock_set_timer(sk, READ_ONCE(sk->sk_sndtimeo));
 	}
 
-	release_sock(sk);
-
 unlock:
+	release_sock(sk);
 	hci_dev_unlock(hdev);
 	hci_dev_put(hdev);
 	return err;
@@ -2259,8 +2251,10 @@ int iso_connect_ind(struct hci_dev *hdev, bdaddr_t *bdaddr, __u8 *flags)
 		sk = iso_get_sock(hdev, &hdev->bdaddr, bdaddr, BT_LISTEN,
 				  iso_match_sid, ev1);
 		if (sk && !ev1->status) {
+			lock_sock(sk);
 			iso_pi(sk)->sync_handle = le16_to_cpu(ev1->handle);
 			iso_pi(sk)->bc_sid = ev1->sid;
+			release_sock(sk);
 		}
 
 		goto done;
@@ -2271,8 +2265,10 @@ int iso_connect_ind(struct hci_dev *hdev, bdaddr_t *bdaddr, __u8 *flags)
 		sk = iso_get_sock(hdev, &hdev->bdaddr, bdaddr, BT_LISTEN,
 				  iso_match_sid_past, ev1a);
 		if (sk && !ev1a->status) {
+			lock_sock(sk);
 			iso_pi(sk)->sync_handle = le16_to_cpu(ev1a->sync_handle);
 			iso_pi(sk)->bc_sid = ev1a->sid;
+			release_sock(sk);
 		}
 
 		goto done;
@@ -2299,27 +2295,35 @@ int iso_connect_ind(struct hci_dev *hdev, bdaddr_t *bdaddr, __u8 *flags)
 					  ev2);
 
 		if (sk) {
-			int err;
-			struct hci_conn	*hcon = iso_pi(sk)->conn->hcon;
+			int err = 0;
+			bool big_sync;
+			struct hci_conn *hcon;
 
+			lock_sock(sk);
+
+			hcon = iso_pi(sk)->conn->hcon;
 			iso_pi(sk)->qos.bcast.encryption = ev2->encryption;
 
 			if (ev2->num_bis < iso_pi(sk)->bc_num_bis)
 				iso_pi(sk)->bc_num_bis = ev2->num_bis;
 
-			if (!test_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags) &&
-			    !test_and_set_bit(BT_SK_BIG_SYNC, &iso_pi(sk)->flags)) {
+			big_sync = !test_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags) &&
+				   !test_and_set_bit(BT_SK_BIG_SYNC, &iso_pi(sk)->flags);
+
+			if (big_sync)
 				err = hci_conn_big_create_sync(hdev, hcon,
 							       &iso_pi(sk)->qos,
 							       iso_pi(sk)->sync_handle,
 							       iso_pi(sk)->bc_num_bis,
 							       iso_pi(sk)->bc_bis);
-				if (err) {
-					bt_dev_err(hdev, "hci_le_big_create_sync: %d",
-						   err);
-					sock_put(sk);
-					sk = NULL;
-				}
+
+			release_sock(sk);
+
+			if (big_sync && err) {
+				bt_dev_err(hdev, "hci_le_big_create_sync: %d",
+					   err);
+				sock_put(sk);
+				sk = NULL;
 			}
 		}
 
@@ -2373,8 +2377,10 @@ int iso_connect_ind(struct hci_dev *hdev, bdaddr_t *bdaddr, __u8 *flags)
 			if (!base || base_len > BASE_MAX_LENGTH)
 				goto done;
 
+			lock_sock(sk);
 			memcpy(iso_pi(sk)->base, base, base_len);
 			iso_pi(sk)->base_len = base_len;
+			release_sock(sk);
 		} else {
 			/* This is a PA data fragment. Keep pa_data_len set to 0
 			 * until all data has been reassembled.
-- 
2.52.0


^ permalink raw reply related

* [PATCH 1/2] Bluetooth: ISO: Fix data-race on dst in iso_sock_connect()
From: SeungJu Cheon @ 2026-04-18  5:34 UTC (permalink / raw)
  To: luiz.dentz, marcel
  Cc: linux-bluetooth, netdev, linux-kernel, me, skhan,
	linux-kernel-mentees, SeungJu Cheon
In-Reply-To: <20260418053401.128483-1-suunj1331@gmail.com>

iso_sock_connect() copies the destination address into
iso_pi(sk)->dst under lock_sock, then releases the lock and reads
it back with bacmp() to decide between the CIS and BIS connect
paths:

    lock_sock(sk);
    bacpy(&iso_pi(sk)->dst, &sa->iso_bdaddr);
    iso_pi(sk)->dst_type = sa->iso_bdaddr_type;
    release_sock(sk);

    if (bacmp(&iso_pi(sk)->dst, BDADDR_ANY))  // <- no lock held

This read after release_sock() races with any concurrent write to
iso_pi(sk)->dst on the same socket.

Fix by performing the bacmp() inside the lock_sock critical section
and caching the result in a local variable.

This patch addresses only the bacmp() race in iso_sock_connect();
other unprotected iso_pi(sk) accesses are fixed separately in the
next patch.

KCSAN report:

BUG: KCSAN: data-race in memcmp+0x39/0xb0

race at unknown origin, with read to 0xffff8f96ea66dde3 of 1 bytes by task 549 on cpu 1:
 memcmp+0x39/0xb0
 iso_sock_connect+0x275/0xb40
 __sys_connect_file+0xbd/0xe0
 __sys_connect+0xe0/0x110
 __x64_sys_connect+0x40/0x50
 x64_sys_call+0xcad/0x1c60
 do_syscall_64+0x133/0x590
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x00 -> 0xee

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 549 Comm: iso_race_combin Not tainted 7.0.0-08391-g1d51b370a0f8 #40 PREEMPT(lazy)

Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: SeungJu Cheon <suunj1331@gmail.com>
---
 net/bluetooth/iso.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c
index be145e2736b7..14963ba68597 100644
--- a/net/bluetooth/iso.c
+++ b/net/bluetooth/iso.c
@@ -1169,6 +1169,7 @@ static int iso_sock_connect(struct socket *sock, struct sockaddr_unsized *addr,
 	struct sockaddr_iso *sa = (struct sockaddr_iso *)addr;
 	struct sock *sk = sock->sk;
 	int err;
+	bool bcast;
 
 	BT_DBG("sk %p", sk);
 
@@ -1191,9 +1192,11 @@ static int iso_sock_connect(struct socket *sock, struct sockaddr_unsized *addr,
 	bacpy(&iso_pi(sk)->dst, &sa->iso_bdaddr);
 	iso_pi(sk)->dst_type = sa->iso_bdaddr_type;
 
+	bcast = !bacmp(&iso_pi(sk)->dst, BDADDR_ANY);
+
 	release_sock(sk);
 
-	if (bacmp(&iso_pi(sk)->dst, BDADDR_ANY))
+	if (!bcast)
 		err = iso_connect_cis(sk);
 	else
 		err = iso_connect_bis(sk);
-- 
2.52.0


^ permalink raw reply related

* [PATCH 0/2] Bluetooth: ISO: Fix KCSAN data-races on iso_pi(sk)
From: SeungJu Cheon @ 2026-04-18  5:33 UTC (permalink / raw)
  To: luiz.dentz, marcel
  Cc: linux-bluetooth, netdev, linux-kernel, me, skhan,
	linux-kernel-mentees, SeungJu Cheon

Found while auditing iso_pi(sk) field accesses after a KCSAN report.
Patch 1/2 is the reported race on iso_pi(sk)->dst in iso_sock_connect();
patch 2/2 covers related races on other iso_pi(sk) fields accessed in
iso_connect_{bis,cis}() and iso_connect_ind() that were found by
inspection during the same audit.

SeungJu Cheon (2):
  Bluetooth: ISO: Fix data-race on dst in iso_sock_connect()
  Bluetooth: ISO: Fix data-race on iso_pi(sk) in socket and HCI event
    paths

 net/bluetooth/iso.c | 59 ++++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 25 deletions(-)

-- 
2.52.0

^ permalink raw reply

* Re: [PATCH v1 net] tcp: Disable usec TS for SYN Cookie.
From: Kuniyuki Iwashima @ 2026-04-18  5:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Neal Cardwell, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, netdev
In-Reply-To: <CANn89iLL0iWoU-jh=Nk+0fsGWNOeuwQ8DP=dVUT=LjBGHR_2FA@mail.gmail.com>

On Fri, Apr 17, 2026 at 9:59 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Apr 17, 2026 at 7:50 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
> >
> > cookie_tcp_reqsk_alloc() sets tcp_rsk(req)->req_usec_ts to false
> > unconditionally.
> >
> > If want_cookie is true in tcp_conn_request(), we should not set
> > tcp_rsk(req)->req_usec_ts.
> >
> > Let's not call dst_tcp_usec_ts() for SYN Cookie.
>
> May I ask why ?

Sorry, I missed tcp_skb_timestamp_ts() properly restores the
cookie TS generated by cookie_init_timestamp() to ms unit.

Still we don't need to call dst_tcp_usec_ts() for SYN cookie,
but this was more like a cleanup patch.


>
> TCP usec TS are based on routing. this feature is not part of SYN
> and/or SYNACK options.
>
> Both side must have:
>
> ip route ... feature tcp_usec_ts
>
> syncookies are orthogonal to this constraint, so TCP flows can use
> usec TS just fine.
>
> pw-bot: cr

^ permalink raw reply

* [syzbot] [net?] possible deadlock in br_forward_delay_timer_expired (5)
From: syzbot @ 2026-04-18  5:30 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, jv, kuba, linux-kernel, netdev,
	pabeni, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    43cfbdda5af6 Merge tag 'for-linus-iommufd' of git://git.ke..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=100a4702580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8195c5b22e79c2cf
dashboard link: https://syzkaller.appspot.com/bug?extid=a7f25fd06ad99e9379e4
compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/848e46852283/disk-43cfbdda.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/24283dbdc318/vmlinux-43cfbdda.xz
kernel image: https://storage.googleapis.com/syzbot-assets/f91b3fadd31d/bzImage-43cfbdda.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a7f25fd06ad99e9379e4@syzkaller.appspotmail.com

netlink: 16 bytes leftover after parsing attributes in process `syz.3.6945'.
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
syzkaller #0 Tainted: G             L     
-----------------------------------------------------
syz.3.6945/21491 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
ffff888035200e98 (&bond->stats_lock/2){+.+.}-{3:3}, at: bond_get_stats+0x458/0x740 drivers/net/bonding/bond_main.c:4514

and this task is already holding:
ffff888036758e18 (&br->lock){+.-.}-{3:3}, at: spin_lock_bh include/linux/spinlock.h:348 [inline]
ffff888036758e18 (&br->lock){+.-.}-{3:3}, at: br_port_slave_changelink+0x3d/0x150 net/bridge/br_netlink.c:1212
which would create a new lock dependency:
 (&br->lock){+.-.}-{3:3} -> (&bond->stats_lock/2){+.+.}-{3:3}

but this new dependency connects a SOFTIRQ-irq-safe lock:
 (&br->lock){+.-.}-{3:3}

... which became SOFTIRQ-irq-safe at:
  lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
  __raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
  _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:158
  spin_lock include/linux/spinlock.h:342 [inline]
  br_forward_delay_timer_expired+0x4f/0x460 net/bridge/br_stp_timer.c:88
  call_timer_fn+0x192/0x5e0 kernel/time/timer.c:1748
  expire_timers kernel/time/timer.c:1799 [inline]
  __run_timers kernel/time/timer.c:2374 [inline]
  __run_timer_base+0x652/0x8b0 kernel/time/timer.c:2386
  run_timer_base kernel/time/timer.c:2395 [inline]
  run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2405
  handle_softirqs+0x22a/0x840 kernel/softirq.c:622
  __do_softirq kernel/softirq.c:656 [inline]
  invoke_softirq kernel/softirq.c:496 [inline]
  __irq_exit_rcu+0xca/0x220 kernel/softirq.c:735
  irq_exit_rcu+0x9/0x30 kernel/softirq.c:752
  common_interrupt+0xbb/0xe0 arch/x86/kernel/irq.c:326
  asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:688
  finish_task_switch+0x427/0xbe0 kernel/sched/core.c:5244
  context_switch kernel/sched/core.c:5390 [inline]
  __schedule+0x17bc/0x5680 kernel/sched/core.c:7188
  __schedule_loop kernel/sched/core.c:7267 [inline]
  schedule+0x164/0x360 kernel/sched/core.c:7282
  smpboot_thread_fn+0x5bc/0xa50 kernel/smpboot.c:156
  kthread+0x388/0x470 kernel/kthread.c:436
  ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

to a SOFTIRQ-irq-unsafe lock:
 (&bond->stats_lock/2){+.+.}-{3:3}

... which became SOFTIRQ-irq-unsafe at:
...
  lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
  _raw_spin_lock_nested+0x32/0x50 kernel/locking/spinlock.c:382
  bond_get_stats+0x458/0x740 drivers/net/bonding/bond_main.c:4514
  dev_get_stats+0xb4/0xa50 net/core/dev.c:11916
  rtnl_fill_stats+0x47/0x8c0 net/core/rtnetlink.c:1506
  rtnl_fill_ifinfo+0x1840/0x20f0 net/core/rtnetlink.c:2155
  rtmsg_ifinfo_build_skb+0x17d/0x260 net/core/rtnetlink.c:4452
  rtmsg_ifinfo_event net/core/rtnetlink.c:4485 [inline]
  rtnetlink_event+0x1b7/0x270 net/core/rtnetlink.c:7054
  notifier_call_chain+0x1ad/0x3d0 kernel/notifier.c:85
  call_netdevice_notifiers_extack net/core/dev.c:2287 [inline]
  call_netdevice_notifiers net/core/dev.c:2301 [inline]
  netdev_features_change net/core/dev.c:1590 [inline]
  netdev_change_features net/core/dev.c:11155 [inline]
  netdev_compute_master_upper_features+0x91e/0xac0 net/core/dev.c:12913
  bond_enslave+0x21cc/0x3c10 drivers/net/bonding/bond_main.c:2276
  do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2985
  do_setlink+0x1018/0x4590 net/core/rtnetlink.c:3187
  rtnl_changelink net/core/rtnetlink.c:3798 [inline]
  __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
  rtnl_newlink+0x15ad/0x1bb0 net/core/rtnetlink.c:4108
  rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
  netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
  netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
  netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
  netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
  sock_sendmsg_nosec net/socket.c:787 [inline]
  __sock_sendmsg net/socket.c:802 [inline]
  ____sys_sendmsg+0x972/0x9f0 net/socket.c:2698
  ___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
  __sys_sendmsg net/socket.c:2784 [inline]
  __do_sys_sendmsg net/socket.c:2789 [inline]
  __se_sys_sendmsg net/socket.c:2787 [inline]
  __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2787
  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
  do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
  entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&bond->stats_lock/2);
                               local_irq_disable();
                               lock(&br->lock);
                               lock(&bond->stats_lock/2);
  <Interrupt>
    lock(&br->lock);

 *** DEADLOCK ***

3 locks held by syz.3.6945/21491:
 #0: ffffffff8fdddc80 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
 #0: ffffffff8fdddc80 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock net/core/rtnetlink.c:341 [inline]
 #0: ffffffff8fdddc80 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x883/0x1bb0 net/core/rtnetlink.c:4107
 #1: ffff888036758e18 (&br->lock){+.-.}-{3:3}, at: spin_lock_bh include/linux/spinlock.h:348 [inline]
 #1: ffff888036758e18 (&br->lock){+.-.}-{3:3}, at: br_port_slave_changelink+0x3d/0x150 net/bridge/br_netlink.c:1212
 #2: ffffffff8e95cb20 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
 #2: ffffffff8e95cb20 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
 #2: ffffffff8e95cb20 (rcu_read_lock){....}-{1:3}, at: bond_get_stats+0x11a/0x740 drivers/net/bonding/bond_main.c:4509

the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
-> (&br->lock){+.-.}-{3:3} {
   HARDIRQ-ON-W at:
                    lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
                    __raw_spin_lock_bh include/linux/spinlock_api_smp.h:150 [inline]
                    _raw_spin_lock_bh+0x36/0x50 kernel/locking/spinlock.c:182
                    spin_lock_bh include/linux/spinlock.h:348 [inline]
                    br_add_if+0xa99/0xeb0 net/bridge/br_if.c:668
                    do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2985
                    do_setlink+0x1018/0x4590 net/core/rtnetlink.c:3187
                    rtnl_changelink net/core/rtnetlink.c:3798 [inline]
                    __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
                    rtnl_newlink+0x15ad/0x1bb0 net/core/rtnetlink.c:4108
                    rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
                    netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
                    netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
                    netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
                    netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
                    sock_sendmsg_nosec net/socket.c:787 [inline]
                    __sock_sendmsg net/socket.c:802 [inline]
                    __sys_sendto+0x672/0x710 net/socket.c:2265
                    __do_sys_sendto net/socket.c:2272 [inline]
                    __se_sys_sendto net/socket.c:2268 [inline]
                    __x64_sys_sendto+0xde/0x100 net/socket.c:2268
                    do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
                    do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
                    entry_SYSCALL_64_after_hwframe+0x77/0x7f
   IN-SOFTIRQ-W at:
                    lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
                    __raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
                    _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:158
                    spin_lock include/linux/spinlock.h:342 [inline]
                    br_forward_delay_timer_expired+0x4f/0x460 net/bridge/br_stp_timer.c:88
                    call_timer_fn+0x192/0x5e0 kernel/time/timer.c:1748
                    expire_timers kernel/time/timer.c:1799 [inline]
                    __run_timers kernel/time/timer.c:2374 [inline]
                    __run_timer_base+0x652/0x8b0 kernel/time/timer.c:2386
                    run_timer_base kernel/time/timer.c:2395 [inline]
                    run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2405
                    handle_softirqs+0x22a/0x840 kernel/softirq.c:622
                    __do_softirq kernel/softirq.c:656 [inline]
                    invoke_softirq kernel/softirq.c:496 [inline]
                    __irq_exit_rcu+0xca/0x220 kernel/softirq.c:735
                    irq_exit_rcu+0x9/0x30 kernel/softirq.c:752
                    common_interrupt+0xbb/0xe0 arch/x86/kernel/irq.c:326
                    asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:688
                    finish_task_switch+0x427/0xbe0 kernel/sched/core.c:5244
                    context_switch kernel/sched/core.c:5390 [inline]
                    __schedule+0x17bc/0x5680 kernel/sched/core.c:7188
                    __schedule_loop kernel/sched/core.c:7267 [inline]
                    schedule+0x164/0x360 kernel/sched/core.c:7282
                    smpboot_thread_fn+0x5bc/0xa50 kernel/smpboot.c:156
                    kthread+0x388/0x470 kernel/kthread.c:436
                    ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
                    ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
   INITIAL USE at:
                   lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
                   __raw_spin_lock_bh include/linux/spinlock_api_smp.h:150 [inline]
                   _raw_spin_lock_bh+0x36/0x50 kernel/locking/spinlock.c:182
                   spin_lock_bh include/linux/spinlock.h:348 [inline]
                   br_add_if+0xa99/0xeb0 net/bridge/br_if.c:668
                   do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2985
                   do_setlink+0x1018/0x4590 net/core/rtnetlink.c:3187
                   rtnl_changelink net/core/rtnetlink.c:3798 [inline]
                   __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
                   rtnl_newlink+0x15ad/0x1bb0 net/core/rtnetlink.c:4108
                   rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
                   netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
                   netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
                   netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
                   netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
                   sock_sendmsg_nosec net/socket.c:787 [inline]
                   __sock_sendmsg net/socket.c:802 [inline]
                   __sys_sendto+0x672/0x710 net/socket.c:2265
                   __do_sys_sendto net/socket.c:2272 [inline]
                   __se_sys_sendto net/socket.c:2268 [inline]
                   __x64_sys_sendto+0xde/0x100 net/socket.c:2268
                   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
                   do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
                   entry_SYSCALL_64_after_hwframe+0x77/0x7f
 }
 ... key      at: [<ffffffff9aa0b240>] br_dev_setup.__key+0x0/0x20

the dependencies between the lock to be acquired
 and SOFTIRQ-irq-unsafe lock:
-> (&bond->stats_lock/2){+.+.}-{3:3} {
   HARDIRQ-ON-W at:
                    lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
                    _raw_spin_lock_nested+0x32/0x50 kernel/locking/spinlock.c:382
                    bond_get_stats+0x458/0x740 drivers/net/bonding/bond_main.c:4514
                    dev_get_stats+0xb4/0xa50 net/core/dev.c:11916
                    rtnl_fill_stats+0x47/0x8c0 net/core/rtnetlink.c:1506
                    rtnl_fill_ifinfo+0x1840/0x20f0 net/core/rtnetlink.c:2155
                    rtmsg_ifinfo_build_skb+0x17d/0x260 net/core/rtnetlink.c:4452
                    rtmsg_ifinfo_event net/core/rtnetlink.c:4485 [inline]
                    rtnetlink_event+0x1b7/0x270 net/core/rtnetlink.c:7054
                    notifier_call_chain+0x1ad/0x3d0 kernel/notifier.c:85
                    call_netdevice_notifiers_extack net/core/dev.c:2287 [inline]
                    call_netdevice_notifiers net/core/dev.c:2301 [inline]
                    netdev_features_change net/core/dev.c:1590 [inline]
                    netdev_change_features net/core/dev.c:11155 [inline]
                    netdev_compute_master_upper_features+0x91e/0xac0 net/core/dev.c:12913
                    bond_enslave+0x21cc/0x3c10 drivers/net/bonding/bond_main.c:2276
                    do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2985
                    do_setlink+0x1018/0x4590 net/core/rtnetlink.c:3187
                    rtnl_changelink net/core/rtnetlink.c:3798 [inline]
                    __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
                    rtnl_newlink+0x15ad/0x1bb0 net/core/rtnetlink.c:4108
                    rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
                    netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
                    netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
                    netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
                    netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
                    sock_sendmsg_nosec net/socket.c:787 [inline]
                    __sock_sendmsg net/socket.c:802 [inline]
                    ____sys_sendmsg+0x972/0x9f0 net/socket.c:2698
                    ___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
                    __sys_sendmsg net/socket.c:2784 [inline]
                    __do_sys_sendmsg net/socket.c:2789 [inline]
                    __se_sys_sendmsg net/socket.c:2787 [inline]
                    __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2787
                    do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
                    do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
                    entry_SYSCALL_64_after_hwframe+0x77/0x7f
   SOFTIRQ-ON-W at:
                    lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
                    _raw_spin_lock_nested+0x32/0x50 kernel/locking/spinlock.c:382
                    bond_get_stats+0x458/0x740 drivers/net/bonding/bond_main.c:4514
                    dev_get_stats+0xb4/0xa50 net/core/dev.c:11916
                    rtnl_fill_stats+0x47/0x8c0 net/core/rtnetlink.c:1506
                    rtnl_fill_ifinfo+0x1840/0x20f0 net/core/rtnetlink.c:2155
                    rtmsg_ifinfo_build_skb+0x17d/0x260 net/core/rtnetlink.c:4452
                    rtmsg_ifinfo_event net/core/rtnetlink.c:4485 [inline]
                    rtnetlink_event+0x1b7/0x270 net/core/rtnetlink.c:7054
                    notifier_call_chain+0x1ad/0x3d0 kernel/notifier.c:85
                    call_netdevice_notifiers_extack net/core/dev.c:2287 [inline]
                    call_netdevice_notifiers net/core/dev.c:2301 [inline]
                    netdev_features_change net/core/dev.c:1590 [inline]
                    netdev_change_features net/core/dev.c:11155 [inline]
                    netdev_compute_master_upper_features+0x91e/0xac0 net/core/dev.c:12913
                    bond_enslave+0x21cc/0x3c10 drivers/net/bonding/bond_main.c:2276
                    do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2985
                    do_setlink+0x1018/0x4590 net/core/rtnetlink.c:3187
                    rtnl_changelink net/core/rtnetlink.c:3798 [inline]
                    __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
                    rtnl_newlink+0x15ad/0x1bb0 net/core/rtnetlink.c:4108
                    rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
                    netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
                    netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
                    netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
                    netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
                    sock_sendmsg_nosec net/socket.c:787 [inline]
                    __sock_sendmsg net/socket.c:802 [inline]
                    ____sys_sendmsg+0x972/0x9f0 net/socket.c:2698
                    ___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
                    __sys_sendmsg net/socket.c:2784 [inline]
                    __do_sys_sendmsg net/socket.c:2789 [inline]
                    __se_sys_sendmsg net/socket.c:2787 [inline]
                    __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2787
                    do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
                    do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
                    entry_SYSCALL_64_after_hwframe+0x77/0x7f
   INITIAL USE at:
                   lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
                   _raw_spin_lock_nested+0x32/0x50 kernel/locking/spinlock.c:382
                   bond_get_stats+0x458/0x740 drivers/net/bonding/bond_main.c:4514
                   dev_get_stats+0xb4/0xa50 net/core/dev.c:11916
                   rtnl_fill_stats+0x47/0x8c0 net/core/rtnetlink.c:1506
                   rtnl_fill_ifinfo+0x1840/0x20f0 net/core/rtnetlink.c:2155
                   rtmsg_ifinfo_build_skb+0x17d/0x260 net/core/rtnetlink.c:4452
                   rtmsg_ifinfo_event net/core/rtnetlink.c:4485 [inline]
                   rtnetlink_event+0x1b7/0x270 net/core/rtnetlink.c:7054
                   notifier_call_chain+0x1ad/0x3d0 kernel/notifier.c:85
                   call_netdevice_notifiers_extack net/core/dev.c:2287 [inline]
                   call_netdevice_notifiers net/core/dev.c:2301 [inline]
                   netdev_features_change net/core/dev.c:1590 [inline]
                   netdev_change_features net/core/dev.c:11155 [inline]
                   netdev_compute_master_upper_features+0x91e/0xac0 net/core/dev.c:12913
                   bond_enslave+0x21cc/0x3c10 drivers/net/bonding/bond_main.c:2276
                   do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2985
                   do_setlink+0x1018/0x4590 net/core/rtnetlink.c:3187
                   rtnl_changelink net/core/rtnetlink.c:3798 [inline]
                   __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
                   rtnl_newlink+0x15ad/0x1bb0 net/core/rtnetlink.c:4108
                   rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
                   netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
                   netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
                   netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
                   netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
                   sock_sendmsg_nosec net/socket.c:787 [inline]
                   __sock_sendmsg net/socket.c:802 [inline]
                   ____sys_sendmsg+0x972/0x9f0 net/socket.c:2698
                   ___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
                   __sys_sendmsg net/socket.c:2784 [inline]
                   __do_sys_sendmsg net/socket.c:2789 [inline]
                   __se_sys_sendmsg net/socket.c:2787 [inline]
                   __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2787
                   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
                   do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
                   entry_SYSCALL_64_after_hwframe+0x77/0x7f
 }
 ... key      at: [<ffffffff9a825582>] bond_init.__key+0x2/0x20
 ... acquired at:
   _raw_spin_lock_nested+0x32/0x50 kernel/locking/spinlock.c:382
   bond_get_stats+0x458/0x740 drivers/net/bonding/bond_main.c:4514
   dev_get_stats+0xb4/0xa50 net/core/dev.c:11916
   rtnl_fill_stats+0x47/0x8c0 net/core/rtnetlink.c:1506
   rtnl_fill_ifinfo+0x1840/0x20f0 net/core/rtnetlink.c:2155
   rtmsg_ifinfo_build_skb+0x17d/0x260 net/core/rtnetlink.c:4452
   rtmsg_ifinfo_event net/core/rtnetlink.c:4485 [inline]
   rtmsg_ifinfo+0x8c/0x1a0 net/core/rtnetlink.c:4494
   __dev_notify_flags+0xf2/0x310 net/core/dev.c:9845
   __dev_set_promiscuity+0x27f/0x710 net/core/dev.c:9647
   netif_set_promiscuity+0x50/0xe0 net/core/dev.c:9657
   dev_set_promiscuity+0x126/0x260 net/core/dev_api.c:287
   br_port_clear_promisc net/bridge/br_if.c:135 [inline]
   br_manage_promisc+0x4db/0x560 net/bridge/br_if.c:172
   nbp_update_port_count net/bridge/br_if.c:242 [inline]
   br_port_flags_change+0x160/0x1f0 net/bridge/br_if.c:747
   br_setport+0xc0a/0x1680 net/bridge/br_netlink.c:1000
   br_port_slave_changelink+0x12f/0x150 net/bridge/br_netlink.c:1213
   rtnl_changelink net/core/rtnetlink.c:3791 [inline]
   __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
   rtnl_newlink+0x191b/0x1bb0 net/core/rtnetlink.c:4108
   rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
   netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
   netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
   netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
   netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
   sock_sendmsg_nosec net/socket.c:787 [inline]
   __sock_sendmsg net/socket.c:802 [inline]
   ____sys_sendmsg+0x972/0x9f0 net/socket.c:2698
   ___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
   __sys_sendmsg net/socket.c:2784 [inline]
   __do_sys_sendmsg net/socket.c:2789 [inline]
   __se_sys_sendmsg net/socket.c:2787 [inline]
   __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2787
   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x77/0x7f


stack backtrace:
CPU: 0 UID: 0 PID: 21491 Comm: syz.3.6945 Tainted: G             L      syzkaller #0 PREEMPT(full) 
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_bad_irq_dependency kernel/locking/lockdep.c:2616 [inline]
 check_irq_usage kernel/locking/lockdep.c:2857 [inline]
 check_prev_add kernel/locking/lockdep.c:3169 [inline]
 check_prevs_add kernel/locking/lockdep.c:3284 [inline]
 validate_chain kernel/locking/lockdep.c:3908 [inline]
 __lock_acquire+0x2a94/0x2cf0 kernel/locking/lockdep.c:5237
 lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
 _raw_spin_lock_nested+0x32/0x50 kernel/locking/spinlock.c:382
 bond_get_stats+0x458/0x740 drivers/net/bonding/bond_main.c:4514
 dev_get_stats+0xb4/0xa50 net/core/dev.c:11916
 rtnl_fill_stats+0x47/0x8c0 net/core/rtnetlink.c:1506
 rtnl_fill_ifinfo+0x1840/0x20f0 net/core/rtnetlink.c:2155
 rtmsg_ifinfo_build_skb+0x17d/0x260 net/core/rtnetlink.c:4452
 rtmsg_ifinfo_event net/core/rtnetlink.c:4485 [inline]
 rtmsg_ifinfo+0x8c/0x1a0 net/core/rtnetlink.c:4494
 __dev_notify_flags+0xf2/0x310 net/core/dev.c:9845
 __dev_set_promiscuity+0x27f/0x710 net/core/dev.c:9647
 netif_set_promiscuity+0x50/0xe0 net/core/dev.c:9657
 dev_set_promiscuity+0x126/0x260 net/core/dev_api.c:287
 br_port_clear_promisc net/bridge/br_if.c:135 [inline]
 br_manage_promisc+0x4db/0x560 net/bridge/br_if.c:172
 nbp_update_port_count net/bridge/br_if.c:242 [inline]
 br_port_flags_change+0x160/0x1f0 net/bridge/br_if.c:747
 br_setport+0xc0a/0x1680 net/bridge/br_netlink.c:1000
 br_port_slave_changelink+0x12f/0x150 net/bridge/br_netlink.c:1213
 rtnl_changelink net/core/rtnetlink.c:3791 [inline]
 __rtnl_newlink net/core/rtnetlink.c:3971 [inline]
 rtnl_newlink+0x191b/0x1bb0 net/core/rtnetlink.c:4108
 rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6994
 netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
 netlink_unicast+0x75c/0x8e0 net/netlink/af_netlink.c:1344
 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
 sock_sendmsg_nosec net/socket.c:787 [inline]
 __sock_sendmsg net/socket.c:802 [inline]
 ____sys_sendmsg+0x972/0x9f0 net/socket.c:2698
 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
 __sys_sendmsg net/socket.c:2784 [inline]
 __do_sys_sendmsg net/socket.c:2789 [inline]
 __se_sys_sendmsg net/socket.c:2787 [inline]
 __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2787
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f779019c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f7791124028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f7790415fa0 RCX: 00007f779019c819
RDX: 0000000000008002 RSI: 0000200000000340 RDI: 0000000000000003
RBP: 00007f7790232c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f7790416038 R14: 00007f7790415fa0 R15: 00007f779053fa48
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply

* Re: [PATCH v1 net] tcp: Disable usec TS for SYN Cookie.
From: Eric Dumazet @ 2026-04-18  4:59 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: Neal Cardwell, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, netdev
In-Reply-To: <20260418024957.2669737-1-kuniyu@google.com>

On Fri, Apr 17, 2026 at 7:50 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> cookie_tcp_reqsk_alloc() sets tcp_rsk(req)->req_usec_ts to false
> unconditionally.
>
> If want_cookie is true in tcp_conn_request(), we should not set
> tcp_rsk(req)->req_usec_ts.
>
> Let's not call dst_tcp_usec_ts() for SYN Cookie.

May I ask why ?

TCP usec TS are based on routing. this feature is not part of SYN
and/or SYNACK options.

Both side must have:

ip route ... feature tcp_usec_ts

syncookies are orthogonal to this constraint, so TCP flows can use
usec TS just fine.

pw-bot: cr

^ permalink raw reply

* [PATCH net 4/4] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
From: Jason Xing @ 2026-04-18  4:56 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260418045644.28612-1-kerneljasonxing@gmail.com>

From: Jason Xing <kernelxing@tencent.com>

When xsk_build_skb() processes multi-buffer packets in copy mode, the
first descriptor stores data into the skb linear area without adding
any frags, so nr_frags stays at 0. The caller then sets xs->skb = skb
to accumulate subsequent descriptors.

If a continuation descriptor fails (e.g. alloc_page returns NULL with
-EAGAIN), we jump to free_err where the condition:

  if (skb && !skb_shinfo(skb)->nr_frags)
      kfree_skb(skb);

evaluates to true because nr_frags is still 0 (the first descriptor
used the linear area, not frags). This frees the skb while xs->skb
still points to it, creating a dangling pointer. On the next transmit
attempt or socket close, xs->skb is dereferenced, causing a
use-after-free or double-free.

Fix by adding a !xs->skb check to the condition, ensuring we only free
skbs that were freshly allocated in this call (xs->skb is NULL) and
never free an in-progress multi-buffer skb that the caller still
references.

Closes: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/
Fixes: 6b9c129c2f93 ("xsk: remove @first_frag from xsk_build_skb()")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 2918b773aa84..22c7a92e0734 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -894,7 +894,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 	return skb;

 free_err:
-	if (skb && !skb_shinfo(skb)->nr_frags)
+	if (skb && !xs->skb && !skb_shinfo(skb)->nr_frags)
 		kfree_skb(skb);

 	if (err == -EOVERFLOW) {
-- 
2.41.3

^ permalink raw reply related

* [PATCH net 3/4] xsk: handle NULL dereference of the skb without frags issue
From: Jason Xing @ 2026-04-18  4:56 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260418045644.28612-1-kerneljasonxing@gmail.com>

From: Jason Xing <kernelxing@tencent.com>

When a first descriptor (xs->skb == NULL) triggers -EOVERFLOW in
xsk_build_skb_zerocopy (e.g., MAX_SKB_FRAGS exceeded), the free_err
EOVERFLOW handler unconditionally dereferences xs->skb via
xsk_inc_num_desc(xs->skb) and xsk_drop_skb(xs->skb), causing a NULL
pointer dereference.

In the patch 2/4, the skb is already freed by kfree_skb() inside
xsk_build_skb_zerocopy for the first-descriptor case, so we only need
to do the bookkeeping: cancel the one reserved CQ slot and account for
the single invalid descriptor.

Guard the existing xsk_inc_num_desc/xsk_drop_skb calls with an
xs->skb check (for the continuation case), and add an else branch
for the first-descriptor case that manually cancels the CQ slot and
increments invalid_descs by one.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 5d3dbb118730..2918b773aa84 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -898,9 +898,14 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 		kfree_skb(skb);

 	if (err == -EOVERFLOW) {
-		/* Drop the packet */
-		xsk_inc_num_desc(xs->skb);
-		xsk_drop_skb(xs->skb);
+		if (xs->skb) {
+			/* Drop the packet */
+			xsk_inc_num_desc(xs->skb);
+			xsk_drop_skb(xs->skb);
+		} else {
+			xsk_cq_cancel_locked(xs->pool, 1);
+			xs->tx->invalid_descs++;
+		}
 		xskq_cons_release(xs->tx);
 	} else {
 		/* Let application retry */
-- 
2.41.3

^ permalink raw reply related

* [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
From: Jason Xing @ 2026-04-18  4:56 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260418045644.28612-1-kerneljasonxing@gmail.com>

From: Jason Xing <kernelxing@tencent.com>

Fix it by explicitly adding kfree_skb() before returning back to its
caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means xs->skb is NULL) and
   hit the limit MAX_SKB_FRAGS.
2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This
   is why bug can be triggered.
4. there is no chance to free this skb anymore.

Note that if in this case the xs->skb is not NULL, xsk_build_skb() will
call xsk_drop_skb(xs->skb) to do the right thing.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 8fcde34aec7b..5d3dbb118730 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -778,8 +778,11 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 	addr = buffer - pool->addrs;

 	for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) {
-		if (unlikely(i >= MAX_SKB_FRAGS))
+		if (unlikely(i >= MAX_SKB_FRAGS)) {
+			if (!xs->skb)
+				kfree_skb(skb);
 			return ERR_PTR(-EOVERFLOW);
+		}

 		page = pool->umem->pgs[addr >> PAGE_SHIFT];
 		get_page(page);
-- 
2.41.3

^ permalink raw reply related

* [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case
From: Jason Xing @ 2026-04-18  4:56 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260418045644.28612-1-kerneljasonxing@gmail.com>

From: Jason Xing <kernelxing@tencent.com>

Fix it by explicitly adding kfree_skb() before returning back to its
caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means no frag and xs->skb is
   NULL) and users enable metadata feature.
2. xsk_skb_metadata() returns a error code.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
4. there is no chance to free this skb anymore.

Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 6149f6a79897..8fcde34aec7b 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -743,8 +743,10 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 		xsk_skb_init_misc(skb, xs, desc->addr);
 		if (desc->options & XDP_TX_METADATA) {
 			err = xsk_skb_metadata(skb, buffer, desc, pool, hr);
-			if (unlikely(err))
+			if (unlikely(err)) {
+				kfree_skb(skb);
 				return ERR_PTR(err);
+			}
 		}
 	} else {
 		struct xsk_addrs *xsk_addr;
-- 
2.41.3


^ permalink raw reply related

* [PATCH net 0/4] xsk: fix bugs around xsk skb allocation
From: Jason Xing @ 2026-04-18  4:56 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

There are four extremely rare issues around xsk_build_skb(). Two of them
were founded by Sashiko[1].

[1]: https://lore.kernel.org/all/20260415082654.21026-1-kerneljasonxing@gmail.com/

Jason Xing (4):
  xsk: avoid skb leak in XDP_TX_METADATA case
  xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
  xsk: handle NULL dereference of the skb without frags issue
  xsk: fix use-after-free of xs->skb in xsk_build_skb()  free_err path

 net/xdp/xsk.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

-- 
2.41.3


^ permalink raw reply

* Re: [PATCH net 2/2] selftests: net: add reuseport migration wakeup regression tests
From: Kuniyuki Iwashima @ 2026-04-18  4:40 UTC (permalink / raw)
  To: Zhenzhong Wu
  Cc: netdev, edumazet, ncardwell, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest
In-Reply-To: <20260418041633.691435-3-jt26wzz@gmail.com>

On Fri, Apr 17, 2026 at 9:17 PM Zhenzhong Wu <jt26wzz@gmail.com> wrote:
>
> Add selftests that reproduce missing wakeups on the target listener
> after SO_REUSEPORT migration from inet_csk_listen_stop().
>
> The epoll case connects while only the first listener is active so the
> child lands on its accept queue, registers the second listener with
> epoll, then closes the first listener to trigger migration. It verifies
> that the target listener both accepts the migrated child and becomes
> readable via epoll.
>
> The blocking accept case starts a thread blocked in accept() on the
> target listener, closes the first listener to trigger migration, and
> verifies that the blocked accept() wakes and returns the migrated
> child. Wait until the helper thread is actually asleep in accept()
> before triggering migration so the test does not race waiter
> registration.
>
> Run the tests in a private network namespace and enable
> net.ipv4.tcp_migrate_req=1 there so they can exercise the migration
> path without relying on a sk_reuseport/migrate BPF program. Treat a
> missing or unwritable tcp_migrate_req sysctl as SKIP. Run both
> scenarios for IPv4 and IPv6.
>
> These tests cover the bug fixed by the preceding patch.
>
> Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
> ---
>  tools/testing/selftests/net/Makefile          |   3 +
>  .../selftests/net/reuseport_migrate_accept.c  | 533 ++++++++++++++++++
>  .../selftests/net/reuseport_migrate_epoll.c   | 353 ++++++++++++
>  3 files changed, 889 insertions(+)
>  create mode 100644 tools/testing/selftests/net/reuseport_migrate_accept.c
>  create mode 100644 tools/testing/selftests/net/reuseport_migrate_epoll.c

Thanks for the series.

Instead of adding new tests, can you extend
tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c ?

It covers all migration scenarios and you can just add
the target listener to epoll and call non-blocking epoll_wait(,... 0)
before accept() to check if it returns 1 (the number of fd).


>
> diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> index a275ed584..2f8b6c44d 100644
> --- a/tools/testing/selftests/net/Makefile
> +++ b/tools/testing/selftests/net/Makefile
> @@ -184,6 +184,8 @@ TEST_GEN_PROGS := \
>         reuseport_bpf_cpu \
>         reuseport_bpf_numa \
>         reuseport_dualstack \
> +       reuseport_migrate_accept \
> +       reuseport_migrate_epoll \
>         sk_bind_sendto_listen \
>         sk_connect_zero_addr \
>         sk_so_peek_off \
> @@ -232,6 +234,7 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma
>  $(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto
>  $(OUTPUT)/tcp_inq: LDLIBS += -lpthread
>  $(OUTPUT)/bind_bhash: LDLIBS += -lpthread
> +$(OUTPUT)/reuseport_migrate_accept: LDLIBS += -lpthread
>  $(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/
>
>  include bpf.mk
> diff --git a/tools/testing/selftests/net/reuseport_migrate_accept.c b/tools/testing/selftests/net/reuseport_migrate_accept.c
> new file mode 100644
> index 000000000..a516843a0
> --- /dev/null
> +++ b/tools/testing/selftests/net/reuseport_migrate_accept.c
> @@ -0,0 +1,533 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define _GNU_SOURCE
> +
> +#include <arpa/inet.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <netinet/in.h>
> +#include <pthread.h>
> +#include <sched.h>
> +#include <signal.h>
> +#include <stdbool.h>
> +#include <stdatomic.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/socket.h>
> +#include <sys/syscall.h>
> +#include <time.h>
> +#include <unistd.h>
> +
> +#include "../kselftest.h"
> +
> +#define ACCEPT_BLOCK_TIMEOUT_MS 1000
> +#define ACCEPT_CLEANUP_TIMEOUT_MS 1000
> +#define ACCEPT_WAKE_TIMEOUT_MS 2000
> +#define TCP_MIGRATE_REQ_PATH "/proc/sys/net/ipv4/tcp_migrate_req"
> +
> +struct reuseport_migrate_case {
> +       const char *name;
> +       int family;
> +       const char *addr;
> +};
> +
> +struct accept_result {
> +       int listener_fd;
> +       atomic_int started;
> +       atomic_int tid;
> +       int accepted_fd;
> +       int err;
> +};
> +
> +static const struct reuseport_migrate_case test_cases[] = {
> +       {
> +               .name = "ipv4 blocking accept wake after reuseport migration",
> +               .family = AF_INET,
> +               .addr = "127.0.0.1",
> +       },
> +       {
> +               .name = "ipv6 blocking accept wake after reuseport migration",
> +               .family = AF_INET6,
> +               .addr = "::1",
> +       },
> +};
> +
> +static void close_fd(int *fd)
> +{
> +       if (*fd >= 0) {
> +               close(*fd);
> +               *fd = -1;
> +       }
> +}
> +
> +static bool unsupported_addr_err(int family, int err)
> +{
> +       return family == AF_INET6 &&
> +               (err == EAFNOSUPPORT ||
> +                err == EPROTONOSUPPORT ||
> +                err == EADDRNOTAVAIL);
> +}
> +
> +static int make_sockaddr(const struct reuseport_migrate_case *test_case,
> +                        unsigned short port,
> +                        struct sockaddr_storage *addr,
> +                        socklen_t *addrlen)
> +{
> +       memset(addr, 0, sizeof(*addr));
> +
> +       if (test_case->family == AF_INET) {
> +               struct sockaddr_in *addr4 = (struct sockaddr_in *)addr;
> +
> +               addr4->sin_family = AF_INET;
> +               addr4->sin_port = htons(port);
> +               if (inet_pton(AF_INET, test_case->addr, &addr4->sin_addr) != 1)
> +                       return -1;
> +
> +               *addrlen = sizeof(*addr4);
> +               return 0;
> +       }
> +
> +       if (test_case->family == AF_INET6) {
> +               struct sockaddr_in6 *addr6 = (struct sockaddr_in6 *)addr;
> +
> +               addr6->sin6_family = AF_INET6;
> +               addr6->sin6_port = htons(port);
> +               if (inet_pton(AF_INET6, test_case->addr, &addr6->sin6_addr) != 1)
> +                       return -1;
> +
> +               *addrlen = sizeof(*addr6);
> +               return 0;
> +       }
> +
> +       return -1;
> +}
> +
> +static int create_reuseport_socket(const struct reuseport_migrate_case *test_case)
> +{
> +       int one = 1;
> +       int fd;
> +
> +       fd = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> +       if (fd < 0)
> +               return -1;
> +
> +       if (test_case->family == AF_INET6 &&
> +           setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one))) {
> +               close(fd);
> +               return -1;
> +       }
> +
> +       if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one))) {
> +               close(fd);
> +               return -1;
> +       }
> +
> +       return fd;
> +}
> +
> +static int enable_tcp_migrate_req(void)
> +{
> +       int len;
> +       int fd;
> +
> +       fd = open(TCP_MIGRATE_REQ_PATH, O_RDWR | O_CLOEXEC);
> +       if (fd < 0) {
> +               if (errno == ENOENT || errno == EACCES ||
> +                   errno == EPERM || errno == EROFS)
> +                       return KSFT_SKIP;
> +               return KSFT_FAIL;
> +       }
> +
> +       len = write(fd, "1", 1);
> +       if (len != 1) {
> +               if (errno == EACCES || errno == EPERM || errno == EROFS) {
> +                       close(fd);
> +                       return KSFT_SKIP;
> +               }
> +
> +               close(fd);
> +               return KSFT_FAIL;
> +       }
> +
> +       close(fd);
> +       return KSFT_PASS;
> +}
> +
> +static void setup_netns(void)
> +{
> +       int ret;
> +
> +       if (unshare(CLONE_NEWNET))
> +               ksft_exit_skip("unshare(CLONE_NEWNET): %s\n", strerror(errno));
> +
> +       if (system("ip link set lo up"))
> +               ksft_exit_skip("failed to bring up lo interface in netns\n");
> +
> +       ret = enable_tcp_migrate_req();
> +       if (ret == KSFT_SKIP)
> +               ksft_exit_skip("failed to enable tcp_migrate_req\n");
> +       if (ret == KSFT_FAIL)
> +               ksft_exit_fail_msg("failed to enable tcp_migrate_req\n");
> +}
> +
> +static void noop_handler(int sig)
> +{
> +       (void)sig;
> +}
> +
> +static void *accept_thread(void *arg)
> +{
> +       struct accept_result *result = arg;
> +
> +       atomic_store_explicit(&result->tid, (int)syscall(SYS_gettid),
> +                             memory_order_release);
> +       atomic_store_explicit(&result->started, 1, memory_order_release);
> +       result->accepted_fd = accept4(result->listener_fd, NULL, NULL,
> +                                     SOCK_CLOEXEC);
> +       if (result->accepted_fd < 0)
> +               result->err = errno;
> +
> +       return NULL;
> +}
> +
> +static int read_thread_state(int tid, char *state)
> +{
> +       char *close_paren;
> +       char path[64];
> +       char buf[256];
> +       ssize_t len;
> +       int fd;
> +
> +       snprintf(path, sizeof(path), "/proc/self/task/%d/stat", tid);
> +
> +       fd = open(path, O_RDONLY | O_CLOEXEC);
> +       if (fd < 0)
> +               return -errno;
> +
> +       len = read(fd, buf, sizeof(buf) - 1);
> +       close(fd);
> +       if (len < 0)
> +               return -errno;
> +       if (!len)
> +               return -EINVAL;
> +
> +       buf[len] = '\0';
> +       close_paren = strrchr(buf, ')');
> +       if (!close_paren || close_paren[1] != ' ' || !close_paren[2])
> +               return -EINVAL;
> +
> +       *state = close_paren[2];
> +       return 0;
> +}
> +
> +static int wait_for_accept_to_block(const struct reuseport_migrate_case *test_case,
> +                                   int tid)
> +{
> +       char state = '\0';
> +       int ret;
> +       int i;
> +
> +       /*
> +        * A started thread is not enough here: we need to know the waiter
> +        * has actually gone to sleep in accept() before closing listener_a,
> +        * otherwise migration can race ahead of waiter registration. Poll
> +        * /proc task state because the pthread APIs can tell us whether the
> +        * thread has exited, but not whether it is already blocked in the
> +        * target syscall.
> +        */
> +       for (i = 0; i < ACCEPT_BLOCK_TIMEOUT_MS; i++) {
> +               ret = read_thread_state(tid, &state);
> +               if (!ret) {
> +                       if (state == 'S' || state == 'D')
> +                               return KSFT_PASS;
> +                       if (state == 'Z')
> +                               break;
> +               } else if (ret == -ENOENT) {
> +                       break;
> +               }
> +
> +               usleep(1000);
> +       }
> +
> +       ksft_print_msg("%s: accept waiter never blocked before migration\n",
> +                      test_case->name);
> +       return KSFT_FAIL;
> +}
> +
> +static int join_thread_with_timeout(pthread_t thread, int timeout_ms,
> +                                   bool *timed_out)
> +{
> +       struct timespec deadline;
> +       int err;
> +
> +       *timed_out = false;
> +
> +       if (clock_gettime(CLOCK_REALTIME, &deadline))
> +               return KSFT_FAIL;
> +
> +       deadline.tv_nsec += timeout_ms * 1000000LL;
> +       deadline.tv_sec += deadline.tv_nsec / 1000000000LL;
> +       deadline.tv_nsec %= 1000000000LL;
> +
> +       err = pthread_timedjoin_np(thread, NULL, &deadline);
> +       if (!err)
> +               return KSFT_PASS;
> +
> +       if (err != ETIMEDOUT)
> +               return KSFT_FAIL;
> +
> +       *timed_out = true;
> +       return KSFT_FAIL;
> +}
> +
> +static int interrupt_accept_thread(pthread_t thread)
> +{
> +       int err;
> +
> +       err = pthread_kill(thread, SIGUSR1);
> +       if (err && err != ESRCH)
> +               return KSFT_FAIL;
> +
> +       return KSFT_PASS;
> +}
> +
> +static int stop_accept_thread(pthread_t thread, bool *timed_out)
> +{
> +       if (interrupt_accept_thread(thread))
> +               return KSFT_FAIL;
> +
> +       return join_thread_with_timeout(thread, ACCEPT_CLEANUP_TIMEOUT_MS,
> +                                       timed_out);
> +}
> +
> +static int run_test(const struct reuseport_migrate_case *test_case)
> +{
> +       struct accept_result result = {
> +               .listener_fd = -1,
> +               .started = 0,
> +               .tid = -1,
> +               .accepted_fd = -1,
> +               .err = 0,
> +       };
> +       struct sockaddr_storage addr;
> +       struct sigaction sa = {
> +               .sa_handler = noop_handler,
> +       };
> +       bool thread_joined = false;
> +       bool cleanup_timed_out;
> +       int listener_a = -1;
> +       int listener_b = -1;
> +       int ret = KSFT_FAIL;
> +       socklen_t addrlen;
> +       pthread_t thread;
> +       int client = -1;
> +       bool timed_out;
> +       int probe = -1;
> +       int tid;
> +
> +       if (make_sockaddr(test_case, 0, &addr, &addrlen)) {
> +               ksft_print_msg("%s: failed to build socket address\n",
> +                              test_case->name);
> +               goto out;
> +       }
> +
> +       if (sigemptyset(&sa.sa_mask)) {
> +               ksft_perror("sigemptyset");
> +               goto out;
> +       }
> +
> +       if (sigaction(SIGUSR1, &sa, NULL)) {
> +               ksft_perror("sigaction(SIGUSR1)");
> +               goto out;
> +       }
> +
> +       listener_a = create_reuseport_socket(test_case);
> +       if (listener_a < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(listener_a)");
> +               goto out;
> +       }
> +
> +       if (bind(listener_a, (struct sockaddr *)&addr, addrlen)) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("bind(listener_a)");
> +               goto out;
> +       }
> +
> +       if (listen(listener_a, 1)) {
> +               ksft_perror("listen(listener_a)");
> +               goto out;
> +       }
> +
> +       addrlen = sizeof(addr);
> +       if (getsockname(listener_a, (struct sockaddr *)&addr, &addrlen)) {
> +               ksft_perror("getsockname(listener_a)");
> +               goto out;
> +       }
> +
> +       listener_b = create_reuseport_socket(test_case);
> +       if (listener_b < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(listener_b)");
> +               goto out;
> +       }
> +
> +       if (bind(listener_b, (struct sockaddr *)&addr, addrlen)) {
> +               ksft_perror("bind(listener_b)");
> +               goto out;
> +       }
> +
> +       client = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> +       if (client < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(client)");
> +               goto out;
> +       }
> +
> +       /* Connect while only listener_a is listening, ensuring the
> +        * child lands in listener_a's accept queue deterministically.
> +        */
> +       if (connect(client, (struct sockaddr *)&addr, addrlen)) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("connect(client)");
> +               goto out;
> +       }
> +
> +       if (listen(listener_b, 1)) {
> +               ksft_perror("listen(listener_b)");
> +               goto out;
> +       }
> +
> +       result.listener_fd = listener_b;
> +       if (pthread_create(&thread, NULL, accept_thread, &result)) {
> +               ksft_perror("pthread_create");
> +               goto out;
> +       }
> +
> +       while (!atomic_load_explicit(&result.started, memory_order_acquire))
> +               sched_yield();
> +
> +       tid = atomic_load_explicit(&result.tid, memory_order_acquire);
> +       if (wait_for_accept_to_block(test_case, tid))
> +               goto out_with_thread;
> +
> +       close_fd(&listener_a);
> +
> +       ret = join_thread_with_timeout(thread, ACCEPT_WAKE_TIMEOUT_MS, &timed_out);
> +       if (ret == KSFT_PASS) {
> +               thread_joined = true;
> +               if (result.accepted_fd < 0) {
> +                       ksft_print_msg("%s: blocking accept() returned err=%d (%s)\n",
> +                                      test_case->name, result.err,
> +                                      strerror(result.err));
> +                       ret = KSFT_FAIL;
> +               }
> +
> +               goto out_with_thread;
> +       }
> +
> +       if (!timed_out) {
> +               ksft_print_msg("%s: join_thread_with_timeout() failed\n",
> +                              test_case->name);
> +               goto out_with_thread;
> +       }
> +
> +       if (stop_accept_thread(thread, &cleanup_timed_out) == KSFT_FAIL) {
> +               ksft_print_msg("%s: failed to stop blocking accept waiter\n",
> +                              test_case->name);
> +               goto out_with_thread;
> +       }
> +       thread_joined = true;
> +
> +       if (result.accepted_fd >= 0) {
> +               ksft_print_msg("%s: blocking accept() completed only in cleanup\n",
> +                              test_case->name);
> +               goto out_with_thread;
> +       }
> +
> +       if (result.err != EINTR) {
> +               ksft_print_msg("%s: blocking accept() returned err=%d (%s)\n",
> +                              test_case->name, result.err,
> +                              strerror(result.err));
> +               goto out_with_thread;
> +       }
> +
> +       probe = accept4(listener_b, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
> +       if (probe >= 0) {
> +               ksft_print_msg("%s: accept queue was populated, but blocking accept() timed out\n",
> +                              test_case->name);
> +       } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
> +               ksft_print_msg("%s: target listener had no queued child after migration\n",
> +                              test_case->name);
> +       } else {
> +               ksft_perror("accept4(listener_b)");
> +       }
> +
> +out_with_thread:
> +       close_fd(&probe);
> +       if (!thread_joined) {
> +               if (stop_accept_thread(thread, &cleanup_timed_out) == KSFT_FAIL) {
> +                       ksft_print_msg("%s: failed to stop blocking accept waiter\n",
> +                                      test_case->name);
> +                       ret = KSFT_FAIL;
> +                       goto out;
> +               }
> +
> +               thread_joined = true;
> +       }
> +       if (thread_joined)
> +               close_fd(&result.accepted_fd);
> +
> +out:
> +       close_fd(&client);
> +       close_fd(&listener_b);
> +       close_fd(&listener_a);
> +
> +       return ret;
> +}
> +
> +int main(void)
> +{
> +       int status = KSFT_PASS;
> +       int ret;
> +       int i;
> +
> +       setup_netns();
> +
> +       ksft_print_header();
> +       ksft_set_plan(ARRAY_SIZE(test_cases));
> +
> +       for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
> +               ret = run_test(&test_cases[i]);
> +               ksft_test_result_code(ret, test_cases[i].name, NULL);
> +
> +               if (ret == KSFT_FAIL)
> +                       status = KSFT_FAIL;
> +       }
> +
> +       if (status == KSFT_FAIL)
> +               ksft_exit_fail();
> +
> +       ksft_finished();
> +}
> diff --git a/tools/testing/selftests/net/reuseport_migrate_epoll.c b/tools/testing/selftests/net/reuseport_migrate_epoll.c
> new file mode 100644
> index 000000000..9cbfb58c4
> --- /dev/null
> +++ b/tools/testing/selftests/net/reuseport_migrate_epoll.c
> @@ -0,0 +1,353 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define _GNU_SOURCE
> +
> +#include <arpa/inet.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <netinet/in.h>
> +#include <sched.h>
> +#include <stdbool.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/epoll.h>
> +#include <sys/socket.h>
> +#include <unistd.h>
> +
> +#include "../kselftest.h"
> +
> +#define EPOLL_TIMEOUT_MS 500
> +#define TCP_MIGRATE_REQ_PATH "/proc/sys/net/ipv4/tcp_migrate_req"
> +
> +struct reuseport_migrate_case {
> +       const char *name;
> +       int family;
> +       const char *addr;
> +};
> +
> +static const struct reuseport_migrate_case test_cases[] = {
> +       {
> +               .name = "ipv4 epoll wake after reuseport migration",
> +               .family = AF_INET,
> +               .addr = "127.0.0.1",
> +       },
> +       {
> +               .name = "ipv6 epoll wake after reuseport migration",
> +               .family = AF_INET6,
> +               .addr = "::1",
> +       },
> +};
> +
> +static void close_fd(int *fd)
> +{
> +       if (*fd >= 0) {
> +               close(*fd);
> +               *fd = -1;
> +       }
> +}
> +
> +static bool unsupported_addr_err(int family, int err)
> +{
> +       return family == AF_INET6 &&
> +               (err == EAFNOSUPPORT ||
> +                err == EPROTONOSUPPORT ||
> +                err == EADDRNOTAVAIL);
> +}
> +
> +static int make_sockaddr(const struct reuseport_migrate_case *test_case,
> +                        unsigned short port,
> +                        struct sockaddr_storage *addr,
> +                        socklen_t *addrlen)
> +{
> +       memset(addr, 0, sizeof(*addr));
> +
> +       if (test_case->family == AF_INET) {
> +               struct sockaddr_in *addr4 = (struct sockaddr_in *)addr;
> +
> +               addr4->sin_family = AF_INET;
> +               addr4->sin_port = htons(port);
> +               if (inet_pton(AF_INET, test_case->addr, &addr4->sin_addr) != 1)
> +                       return -1;
> +
> +               *addrlen = sizeof(*addr4);
> +               return 0;
> +       }
> +
> +       if (test_case->family == AF_INET6) {
> +               struct sockaddr_in6 *addr6 = (struct sockaddr_in6 *)addr;
> +
> +               addr6->sin6_family = AF_INET6;
> +               addr6->sin6_port = htons(port);
> +               if (inet_pton(AF_INET6, test_case->addr, &addr6->sin6_addr) != 1)
> +                       return -1;
> +
> +               *addrlen = sizeof(*addr6);
> +               return 0;
> +       }
> +
> +       return -1;
> +}
> +
> +static int create_reuseport_socket(const struct reuseport_migrate_case *test_case)
> +{
> +       int one = 1;
> +       int fd;
> +
> +       fd = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> +       if (fd < 0)
> +               return -1;
> +
> +       if (test_case->family == AF_INET6 &&
> +           setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one))) {
> +               close(fd);
> +               return -1;
> +       }
> +
> +       if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one))) {
> +               close(fd);
> +               return -1;
> +       }
> +
> +       return fd;
> +}
> +
> +static int set_nonblocking(int fd)
> +{
> +       int flags;
> +
> +       flags = fcntl(fd, F_GETFL);
> +       if (flags < 0)
> +               return -1;
> +
> +       return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
> +}
> +
> +static int enable_tcp_migrate_req(void)
> +{
> +       int len;
> +       int fd;
> +
> +       fd = open(TCP_MIGRATE_REQ_PATH, O_RDWR | O_CLOEXEC);
> +       if (fd < 0) {
> +               if (errno == ENOENT || errno == EACCES ||
> +                   errno == EPERM || errno == EROFS)
> +                       return KSFT_SKIP;
> +               return KSFT_FAIL;
> +       }
> +
> +       len = write(fd, "1", 1);
> +       if (len != 1) {
> +               if (errno == EACCES || errno == EPERM || errno == EROFS) {
> +                       close(fd);
> +                       return KSFT_SKIP;
> +               }
> +
> +               close(fd);
> +               return KSFT_FAIL;
> +       }
> +
> +       close(fd);
> +       return KSFT_PASS;
> +}
> +
> +static void setup_netns(void)
> +{
> +       int ret;
> +
> +       if (unshare(CLONE_NEWNET))
> +               ksft_exit_skip("unshare(CLONE_NEWNET): %s\n", strerror(errno));
> +
> +       if (system("ip link set lo up"))
> +               ksft_exit_skip("failed to bring up lo interface in netns\n");
> +
> +       ret = enable_tcp_migrate_req();
> +       if (ret == KSFT_SKIP)
> +               ksft_exit_skip("failed to enable tcp_migrate_req\n");
> +       if (ret == KSFT_FAIL)
> +               ksft_exit_fail_msg("failed to enable tcp_migrate_req\n");
> +}
> +
> +static int run_test(const struct reuseport_migrate_case *test_case)
> +{
> +       struct sockaddr_storage addr;
> +       struct epoll_event ev = {
> +               .events = EPOLLIN,
> +       };
> +       int listener_a = -1;
> +       int listener_b = -1;
> +       int ret = KSFT_FAIL;
> +       socklen_t addrlen;
> +       int accepted = -1;
> +       int client = -1;
> +       int epfd = -1;
> +       int n;
> +
> +       if (make_sockaddr(test_case, 0, &addr, &addrlen)) {
> +               ksft_print_msg("%s: failed to build socket address\n",
> +                              test_case->name);
> +               goto out;
> +       }
> +
> +       listener_a = create_reuseport_socket(test_case);
> +       if (listener_a < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(listener_a)");
> +               goto out;
> +       }
> +
> +       if (bind(listener_a, (struct sockaddr *)&addr, addrlen)) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("bind(listener_a)");
> +               goto out;
> +       }
> +
> +       if (listen(listener_a, 1)) {
> +               ksft_perror("listen(listener_a)");
> +               goto out;
> +       }
> +
> +       addrlen = sizeof(addr);
> +       if (getsockname(listener_a, (struct sockaddr *)&addr, &addrlen)) {
> +               ksft_perror("getsockname(listener_a)");
> +               goto out;
> +       }
> +
> +       listener_b = create_reuseport_socket(test_case);
> +       if (listener_b < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(listener_b)");
> +               goto out;
> +       }
> +
> +       if (bind(listener_b, (struct sockaddr *)&addr, addrlen)) {
> +               ksft_perror("bind(listener_b)");
> +               goto out;
> +       }
> +
> +       client = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> +       if (client < 0) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("socket(client)");
> +               goto out;
> +       }
> +
> +       /* Connect while only listener_a is listening, ensuring the
> +        * child lands in listener_a's accept queue deterministically.
> +        */
> +       if (connect(client, (struct sockaddr *)&addr, addrlen)) {
> +               if (unsupported_addr_err(test_case->family, errno)) {
> +                       ret = KSFT_SKIP;
> +                       goto out;
> +               }
> +
> +               ksft_perror("connect(client)");
> +               goto out;
> +       }
> +
> +       if (listen(listener_b, 1)) {
> +               ksft_perror("listen(listener_b)");
> +               goto out;
> +       }
> +
> +       if (set_nonblocking(listener_b)) {
> +               ksft_perror("set_nonblocking(listener_b)");
> +               goto out;
> +       }
> +
> +       epfd = epoll_create1(EPOLL_CLOEXEC);
> +       if (epfd < 0) {
> +               ksft_perror("epoll_create1");
> +               goto out;
> +       }
> +
> +       ev.data.fd = listener_b;
> +       if (epoll_ctl(epfd, EPOLL_CTL_ADD, listener_b, &ev)) {
> +               ksft_perror("epoll_ctl(ADD listener_b)");
> +               goto out;
> +       }
> +
> +       close_fd(&listener_a);
> +
> +       n = epoll_wait(epfd, &ev, 1, EPOLL_TIMEOUT_MS);
> +       if (n < 0) {
> +               ksft_perror("epoll_wait");
> +               goto out;
> +       }
> +
> +       accepted = accept4(listener_b, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
> +       if (accepted < 0) {
> +               if (errno == EAGAIN || errno == EWOULDBLOCK) {
> +                       ksft_print_msg("%s: target listener had no queued child after migration\n",
> +                                      test_case->name);
> +                       goto out;
> +               }
> +
> +               ksft_perror("accept4(listener_b)");
> +               goto out;
> +       }
> +
> +       if (n != 1) {
> +               ksft_print_msg("%s: accept queue was populated, but epoll_wait() timed out\n",
> +                              test_case->name);
> +               goto out;
> +       }
> +
> +       if (ev.data.fd != listener_b || !(ev.events & EPOLLIN)) {
> +               ksft_print_msg("%s: unexpected epoll event fd=%d events=%#x\n",
> +                              test_case->name, ev.data.fd, ev.events);
> +               goto out;
> +       }
> +
> +       ret = KSFT_PASS;
> +
> +out:
> +       close_fd(&accepted);
> +       close_fd(&epfd);
> +       close_fd(&client);
> +       close_fd(&listener_b);
> +       close_fd(&listener_a);
> +
> +       return ret;
> +}
> +
> +int main(void)
> +{
> +       int status = KSFT_PASS;
> +       int ret;
> +       int i;
> +
> +       setup_netns();
> +
> +       ksft_print_header();
> +       ksft_set_plan(ARRAY_SIZE(test_cases));
> +
> +       for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
> +               ret = run_test(&test_cases[i]);
> +               ksft_test_result_code(ret, test_cases[i].name, NULL);
> +
> +               if (ret == KSFT_FAIL)
> +                       status = KSFT_FAIL;
> +       }
> +
> +       if (status == KSFT_FAIL)
> +               ksft_exit_fail();
> +
> +       ksft_finished();
> +}
> --
> 2.43.0
>

^ permalink raw reply

* Re: [PATCH 4/4] drbd: switch from genl_magic macros to YNL-generated code
From: kernel test robot @ 2026-04-18  4:36 UTC (permalink / raw)
  To: Christoph Böhmwalder, Jens Axboe
  Cc: oe-kbuild-all, drbd-dev, linux-kernel, Lars Ellenberg,
	Philipp Reisner, linux-block, Donald Hunter, Eric Dumazet,
	Jakub Kicinski, netdev, Christoph Böhmwalder
In-Reply-To: <20260407173356.873887-5-christoph.boehmwalder@linbit.com>

Hi Christoph,

kernel test robot noticed the following build errors:

[auto build test ERROR on a9c4b1d37622ed01b75f94a4f68cf55f33153a31]

url:    https://github.com/intel-lab-lkp/linux/commits/Christoph-B-hmwalder/drbd-move-UAPI-headers-to-include-uapi-linux/20260417-214347
base:   a9c4b1d37622ed01b75f94a4f68cf55f33153a31
patch link:    https://lore.kernel.org/r/20260407173356.873887-5-christoph.boehmwalder%40linbit.com
patch subject: [PATCH 4/4] drbd: switch from genl_magic macros to YNL-generated code
config: x86_64-rhel-9.4 (https://download.01.org/0day-ci/archive/20260418/202604180607.iqIlyAER-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260418/202604180607.iqIlyAER-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604180607.iqIlyAER-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from <command-line>:
>> ./usr/include/linux/drbd.h:18:10: fatal error: sys/types.h: No such file or directory
      18 | #include <sys/types.h>
         |          ^~~~~~~~~~~~~
   compilation terminated.

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* [PATCH net 2/2] selftests: net: add reuseport migration wakeup regression tests
From: Zhenzhong Wu @ 2026-04-18  4:16 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu
In-Reply-To: <20260418041633.691435-1-jt26wzz@gmail.com>

Add selftests that reproduce missing wakeups on the target listener
after SO_REUSEPORT migration from inet_csk_listen_stop().

The epoll case connects while only the first listener is active so the
child lands on its accept queue, registers the second listener with
epoll, then closes the first listener to trigger migration. It verifies
that the target listener both accepts the migrated child and becomes
readable via epoll.

The blocking accept case starts a thread blocked in accept() on the
target listener, closes the first listener to trigger migration, and
verifies that the blocked accept() wakes and returns the migrated
child. Wait until the helper thread is actually asleep in accept()
before triggering migration so the test does not race waiter
registration.

Run the tests in a private network namespace and enable
net.ipv4.tcp_migrate_req=1 there so they can exercise the migration
path without relying on a sk_reuseport/migrate BPF program. Treat a
missing or unwritable tcp_migrate_req sysctl as SKIP. Run both
scenarios for IPv4 and IPv6.

These tests cover the bug fixed by the preceding patch.

Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
 tools/testing/selftests/net/Makefile          |   3 +
 .../selftests/net/reuseport_migrate_accept.c  | 533 ++++++++++++++++++
 .../selftests/net/reuseport_migrate_epoll.c   | 353 ++++++++++++
 3 files changed, 889 insertions(+)
 create mode 100644 tools/testing/selftests/net/reuseport_migrate_accept.c
 create mode 100644 tools/testing/selftests/net/reuseport_migrate_epoll.c

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index a275ed584..2f8b6c44d 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -184,6 +184,8 @@ TEST_GEN_PROGS := \
 	reuseport_bpf_cpu \
 	reuseport_bpf_numa \
 	reuseport_dualstack \
+	reuseport_migrate_accept \
+	reuseport_migrate_epoll \
 	sk_bind_sendto_listen \
 	sk_connect_zero_addr \
 	sk_so_peek_off \
@@ -232,6 +234,7 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma
 $(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto
 $(OUTPUT)/tcp_inq: LDLIBS += -lpthread
 $(OUTPUT)/bind_bhash: LDLIBS += -lpthread
+$(OUTPUT)/reuseport_migrate_accept: LDLIBS += -lpthread
 $(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/
 
 include bpf.mk
diff --git a/tools/testing/selftests/net/reuseport_migrate_accept.c b/tools/testing/selftests/net/reuseport_migrate_accept.c
new file mode 100644
index 000000000..a516843a0
--- /dev/null
+++ b/tools/testing/selftests/net/reuseport_migrate_accept.c
@@ -0,0 +1,533 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <netinet/in.h>
+#include <pthread.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdatomic.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <sys/syscall.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "../kselftest.h"
+
+#define ACCEPT_BLOCK_TIMEOUT_MS 1000
+#define ACCEPT_CLEANUP_TIMEOUT_MS 1000
+#define ACCEPT_WAKE_TIMEOUT_MS 2000
+#define TCP_MIGRATE_REQ_PATH "/proc/sys/net/ipv4/tcp_migrate_req"
+
+struct reuseport_migrate_case {
+	const char *name;
+	int family;
+	const char *addr;
+};
+
+struct accept_result {
+	int listener_fd;
+	atomic_int started;
+	atomic_int tid;
+	int accepted_fd;
+	int err;
+};
+
+static const struct reuseport_migrate_case test_cases[] = {
+	{
+		.name = "ipv4 blocking accept wake after reuseport migration",
+		.family = AF_INET,
+		.addr = "127.0.0.1",
+	},
+	{
+		.name = "ipv6 blocking accept wake after reuseport migration",
+		.family = AF_INET6,
+		.addr = "::1",
+	},
+};
+
+static void close_fd(int *fd)
+{
+	if (*fd >= 0) {
+		close(*fd);
+		*fd = -1;
+	}
+}
+
+static bool unsupported_addr_err(int family, int err)
+{
+	return family == AF_INET6 &&
+		(err == EAFNOSUPPORT ||
+		 err == EPROTONOSUPPORT ||
+		 err == EADDRNOTAVAIL);
+}
+
+static int make_sockaddr(const struct reuseport_migrate_case *test_case,
+			 unsigned short port,
+			 struct sockaddr_storage *addr,
+			 socklen_t *addrlen)
+{
+	memset(addr, 0, sizeof(*addr));
+
+	if (test_case->family == AF_INET) {
+		struct sockaddr_in *addr4 = (struct sockaddr_in *)addr;
+
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(port);
+		if (inet_pton(AF_INET, test_case->addr, &addr4->sin_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr4);
+		return 0;
+	}
+
+	if (test_case->family == AF_INET6) {
+		struct sockaddr_in6 *addr6 = (struct sockaddr_in6 *)addr;
+
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(port);
+		if (inet_pton(AF_INET6, test_case->addr, &addr6->sin6_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr6);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int create_reuseport_socket(const struct reuseport_migrate_case *test_case)
+{
+	int one = 1;
+	int fd;
+
+	fd = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (fd < 0)
+		return -1;
+
+	if (test_case->family == AF_INET6 &&
+	    setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int enable_tcp_migrate_req(void)
+{
+	int len;
+	int fd;
+
+	fd = open(TCP_MIGRATE_REQ_PATH, O_RDWR | O_CLOEXEC);
+	if (fd < 0) {
+		if (errno == ENOENT || errno == EACCES ||
+		    errno == EPERM || errno == EROFS)
+			return KSFT_SKIP;
+		return KSFT_FAIL;
+	}
+
+	len = write(fd, "1", 1);
+	if (len != 1) {
+		if (errno == EACCES || errno == EPERM || errno == EROFS) {
+			close(fd);
+			return KSFT_SKIP;
+		}
+
+		close(fd);
+		return KSFT_FAIL;
+	}
+
+	close(fd);
+	return KSFT_PASS;
+}
+
+static void setup_netns(void)
+{
+	int ret;
+
+	if (unshare(CLONE_NEWNET))
+		ksft_exit_skip("unshare(CLONE_NEWNET): %s\n", strerror(errno));
+
+	if (system("ip link set lo up"))
+		ksft_exit_skip("failed to bring up lo interface in netns\n");
+
+	ret = enable_tcp_migrate_req();
+	if (ret == KSFT_SKIP)
+		ksft_exit_skip("failed to enable tcp_migrate_req\n");
+	if (ret == KSFT_FAIL)
+		ksft_exit_fail_msg("failed to enable tcp_migrate_req\n");
+}
+
+static void noop_handler(int sig)
+{
+	(void)sig;
+}
+
+static void *accept_thread(void *arg)
+{
+	struct accept_result *result = arg;
+
+	atomic_store_explicit(&result->tid, (int)syscall(SYS_gettid),
+			      memory_order_release);
+	atomic_store_explicit(&result->started, 1, memory_order_release);
+	result->accepted_fd = accept4(result->listener_fd, NULL, NULL,
+				      SOCK_CLOEXEC);
+	if (result->accepted_fd < 0)
+		result->err = errno;
+
+	return NULL;
+}
+
+static int read_thread_state(int tid, char *state)
+{
+	char *close_paren;
+	char path[64];
+	char buf[256];
+	ssize_t len;
+	int fd;
+
+	snprintf(path, sizeof(path), "/proc/self/task/%d/stat", tid);
+
+	fd = open(path, O_RDONLY | O_CLOEXEC);
+	if (fd < 0)
+		return -errno;
+
+	len = read(fd, buf, sizeof(buf) - 1);
+	close(fd);
+	if (len < 0)
+		return -errno;
+	if (!len)
+		return -EINVAL;
+
+	buf[len] = '\0';
+	close_paren = strrchr(buf, ')');
+	if (!close_paren || close_paren[1] != ' ' || !close_paren[2])
+		return -EINVAL;
+
+	*state = close_paren[2];
+	return 0;
+}
+
+static int wait_for_accept_to_block(const struct reuseport_migrate_case *test_case,
+				    int tid)
+{
+	char state = '\0';
+	int ret;
+	int i;
+
+	/*
+	 * A started thread is not enough here: we need to know the waiter
+	 * has actually gone to sleep in accept() before closing listener_a,
+	 * otherwise migration can race ahead of waiter registration. Poll
+	 * /proc task state because the pthread APIs can tell us whether the
+	 * thread has exited, but not whether it is already blocked in the
+	 * target syscall.
+	 */
+	for (i = 0; i < ACCEPT_BLOCK_TIMEOUT_MS; i++) {
+		ret = read_thread_state(tid, &state);
+		if (!ret) {
+			if (state == 'S' || state == 'D')
+				return KSFT_PASS;
+			if (state == 'Z')
+				break;
+		} else if (ret == -ENOENT) {
+			break;
+		}
+
+		usleep(1000);
+	}
+
+	ksft_print_msg("%s: accept waiter never blocked before migration\n",
+		       test_case->name);
+	return KSFT_FAIL;
+}
+
+static int join_thread_with_timeout(pthread_t thread, int timeout_ms,
+				    bool *timed_out)
+{
+	struct timespec deadline;
+	int err;
+
+	*timed_out = false;
+
+	if (clock_gettime(CLOCK_REALTIME, &deadline))
+		return KSFT_FAIL;
+
+	deadline.tv_nsec += timeout_ms * 1000000LL;
+	deadline.tv_sec += deadline.tv_nsec / 1000000000LL;
+	deadline.tv_nsec %= 1000000000LL;
+
+	err = pthread_timedjoin_np(thread, NULL, &deadline);
+	if (!err)
+		return KSFT_PASS;
+
+	if (err != ETIMEDOUT)
+		return KSFT_FAIL;
+
+	*timed_out = true;
+	return KSFT_FAIL;
+}
+
+static int interrupt_accept_thread(pthread_t thread)
+{
+	int err;
+
+	err = pthread_kill(thread, SIGUSR1);
+	if (err && err != ESRCH)
+		return KSFT_FAIL;
+
+	return KSFT_PASS;
+}
+
+static int stop_accept_thread(pthread_t thread, bool *timed_out)
+{
+	if (interrupt_accept_thread(thread))
+		return KSFT_FAIL;
+
+	return join_thread_with_timeout(thread, ACCEPT_CLEANUP_TIMEOUT_MS,
+					timed_out);
+}
+
+static int run_test(const struct reuseport_migrate_case *test_case)
+{
+	struct accept_result result = {
+		.listener_fd = -1,
+		.started = 0,
+		.tid = -1,
+		.accepted_fd = -1,
+		.err = 0,
+	};
+	struct sockaddr_storage addr;
+	struct sigaction sa = {
+		.sa_handler = noop_handler,
+	};
+	bool thread_joined = false;
+	bool cleanup_timed_out;
+	int listener_a = -1;
+	int listener_b = -1;
+	int ret = KSFT_FAIL;
+	socklen_t addrlen;
+	pthread_t thread;
+	int client = -1;
+	bool timed_out;
+	int probe = -1;
+	int tid;
+
+	if (make_sockaddr(test_case, 0, &addr, &addrlen)) {
+		ksft_print_msg("%s: failed to build socket address\n",
+			       test_case->name);
+		goto out;
+	}
+
+	if (sigemptyset(&sa.sa_mask)) {
+		ksft_perror("sigemptyset");
+		goto out;
+	}
+
+	if (sigaction(SIGUSR1, &sa, NULL)) {
+		ksft_perror("sigaction(SIGUSR1)");
+		goto out;
+	}
+
+	listener_a = create_reuseport_socket(test_case);
+	if (listener_a < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_a)");
+		goto out;
+	}
+
+	if (bind(listener_a, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("bind(listener_a)");
+		goto out;
+	}
+
+	if (listen(listener_a, 1)) {
+		ksft_perror("listen(listener_a)");
+		goto out;
+	}
+
+	addrlen = sizeof(addr);
+	if (getsockname(listener_a, (struct sockaddr *)&addr, &addrlen)) {
+		ksft_perror("getsockname(listener_a)");
+		goto out;
+	}
+
+	listener_b = create_reuseport_socket(test_case);
+	if (listener_b < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_b)");
+		goto out;
+	}
+
+	if (bind(listener_b, (struct sockaddr *)&addr, addrlen)) {
+		ksft_perror("bind(listener_b)");
+		goto out;
+	}
+
+	client = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (client < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(client)");
+		goto out;
+	}
+
+	/* Connect while only listener_a is listening, ensuring the
+	 * child lands in listener_a's accept queue deterministically.
+	 */
+	if (connect(client, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("connect(client)");
+		goto out;
+	}
+
+	if (listen(listener_b, 1)) {
+		ksft_perror("listen(listener_b)");
+		goto out;
+	}
+
+	result.listener_fd = listener_b;
+	if (pthread_create(&thread, NULL, accept_thread, &result)) {
+		ksft_perror("pthread_create");
+		goto out;
+	}
+
+	while (!atomic_load_explicit(&result.started, memory_order_acquire))
+		sched_yield();
+
+	tid = atomic_load_explicit(&result.tid, memory_order_acquire);
+	if (wait_for_accept_to_block(test_case, tid))
+		goto out_with_thread;
+
+	close_fd(&listener_a);
+
+	ret = join_thread_with_timeout(thread, ACCEPT_WAKE_TIMEOUT_MS, &timed_out);
+	if (ret == KSFT_PASS) {
+		thread_joined = true;
+		if (result.accepted_fd < 0) {
+			ksft_print_msg("%s: blocking accept() returned err=%d (%s)\n",
+				       test_case->name, result.err,
+				       strerror(result.err));
+			ret = KSFT_FAIL;
+		}
+
+		goto out_with_thread;
+	}
+
+	if (!timed_out) {
+		ksft_print_msg("%s: join_thread_with_timeout() failed\n",
+			       test_case->name);
+		goto out_with_thread;
+	}
+
+	if (stop_accept_thread(thread, &cleanup_timed_out) == KSFT_FAIL) {
+		ksft_print_msg("%s: failed to stop blocking accept waiter\n",
+			       test_case->name);
+		goto out_with_thread;
+	}
+	thread_joined = true;
+
+	if (result.accepted_fd >= 0) {
+		ksft_print_msg("%s: blocking accept() completed only in cleanup\n",
+			       test_case->name);
+		goto out_with_thread;
+	}
+
+	if (result.err != EINTR) {
+		ksft_print_msg("%s: blocking accept() returned err=%d (%s)\n",
+			       test_case->name, result.err,
+			       strerror(result.err));
+		goto out_with_thread;
+	}
+
+	probe = accept4(listener_b, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
+	if (probe >= 0) {
+		ksft_print_msg("%s: accept queue was populated, but blocking accept() timed out\n",
+			       test_case->name);
+	} else if (errno == EAGAIN || errno == EWOULDBLOCK) {
+		ksft_print_msg("%s: target listener had no queued child after migration\n",
+			       test_case->name);
+	} else {
+		ksft_perror("accept4(listener_b)");
+	}
+
+out_with_thread:
+	close_fd(&probe);
+	if (!thread_joined) {
+		if (stop_accept_thread(thread, &cleanup_timed_out) == KSFT_FAIL) {
+			ksft_print_msg("%s: failed to stop blocking accept waiter\n",
+				       test_case->name);
+			ret = KSFT_FAIL;
+			goto out;
+		}
+
+		thread_joined = true;
+	}
+	if (thread_joined)
+		close_fd(&result.accepted_fd);
+
+out:
+	close_fd(&client);
+	close_fd(&listener_b);
+	close_fd(&listener_a);
+
+	return ret;
+}
+
+int main(void)
+{
+	int status = KSFT_PASS;
+	int ret;
+	int i;
+
+	setup_netns();
+
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(test_cases));
+
+	for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+		ret = run_test(&test_cases[i]);
+		ksft_test_result_code(ret, test_cases[i].name, NULL);
+
+		if (ret == KSFT_FAIL)
+			status = KSFT_FAIL;
+	}
+
+	if (status == KSFT_FAIL)
+		ksft_exit_fail();
+
+	ksft_finished();
+}
diff --git a/tools/testing/selftests/net/reuseport_migrate_epoll.c b/tools/testing/selftests/net/reuseport_migrate_epoll.c
new file mode 100644
index 000000000..9cbfb58c4
--- /dev/null
+++ b/tools/testing/selftests/net/reuseport_migrate_epoll.c
@@ -0,0 +1,353 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <netinet/in.h>
+#include <sched.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/epoll.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+#include "../kselftest.h"
+
+#define EPOLL_TIMEOUT_MS 500
+#define TCP_MIGRATE_REQ_PATH "/proc/sys/net/ipv4/tcp_migrate_req"
+
+struct reuseport_migrate_case {
+	const char *name;
+	int family;
+	const char *addr;
+};
+
+static const struct reuseport_migrate_case test_cases[] = {
+	{
+		.name = "ipv4 epoll wake after reuseport migration",
+		.family = AF_INET,
+		.addr = "127.0.0.1",
+	},
+	{
+		.name = "ipv6 epoll wake after reuseport migration",
+		.family = AF_INET6,
+		.addr = "::1",
+	},
+};
+
+static void close_fd(int *fd)
+{
+	if (*fd >= 0) {
+		close(*fd);
+		*fd = -1;
+	}
+}
+
+static bool unsupported_addr_err(int family, int err)
+{
+	return family == AF_INET6 &&
+		(err == EAFNOSUPPORT ||
+		 err == EPROTONOSUPPORT ||
+		 err == EADDRNOTAVAIL);
+}
+
+static int make_sockaddr(const struct reuseport_migrate_case *test_case,
+			 unsigned short port,
+			 struct sockaddr_storage *addr,
+			 socklen_t *addrlen)
+{
+	memset(addr, 0, sizeof(*addr));
+
+	if (test_case->family == AF_INET) {
+		struct sockaddr_in *addr4 = (struct sockaddr_in *)addr;
+
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(port);
+		if (inet_pton(AF_INET, test_case->addr, &addr4->sin_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr4);
+		return 0;
+	}
+
+	if (test_case->family == AF_INET6) {
+		struct sockaddr_in6 *addr6 = (struct sockaddr_in6 *)addr;
+
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(port);
+		if (inet_pton(AF_INET6, test_case->addr, &addr6->sin6_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr6);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int create_reuseport_socket(const struct reuseport_migrate_case *test_case)
+{
+	int one = 1;
+	int fd;
+
+	fd = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (fd < 0)
+		return -1;
+
+	if (test_case->family == AF_INET6 &&
+	    setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int set_nonblocking(int fd)
+{
+	int flags;
+
+	flags = fcntl(fd, F_GETFL);
+	if (flags < 0)
+		return -1;
+
+	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
+}
+
+static int enable_tcp_migrate_req(void)
+{
+	int len;
+	int fd;
+
+	fd = open(TCP_MIGRATE_REQ_PATH, O_RDWR | O_CLOEXEC);
+	if (fd < 0) {
+		if (errno == ENOENT || errno == EACCES ||
+		    errno == EPERM || errno == EROFS)
+			return KSFT_SKIP;
+		return KSFT_FAIL;
+	}
+
+	len = write(fd, "1", 1);
+	if (len != 1) {
+		if (errno == EACCES || errno == EPERM || errno == EROFS) {
+			close(fd);
+			return KSFT_SKIP;
+		}
+
+		close(fd);
+		return KSFT_FAIL;
+	}
+
+	close(fd);
+	return KSFT_PASS;
+}
+
+static void setup_netns(void)
+{
+	int ret;
+
+	if (unshare(CLONE_NEWNET))
+		ksft_exit_skip("unshare(CLONE_NEWNET): %s\n", strerror(errno));
+
+	if (system("ip link set lo up"))
+		ksft_exit_skip("failed to bring up lo interface in netns\n");
+
+	ret = enable_tcp_migrate_req();
+	if (ret == KSFT_SKIP)
+		ksft_exit_skip("failed to enable tcp_migrate_req\n");
+	if (ret == KSFT_FAIL)
+		ksft_exit_fail_msg("failed to enable tcp_migrate_req\n");
+}
+
+static int run_test(const struct reuseport_migrate_case *test_case)
+{
+	struct sockaddr_storage addr;
+	struct epoll_event ev = {
+		.events = EPOLLIN,
+	};
+	int listener_a = -1;
+	int listener_b = -1;
+	int ret = KSFT_FAIL;
+	socklen_t addrlen;
+	int accepted = -1;
+	int client = -1;
+	int epfd = -1;
+	int n;
+
+	if (make_sockaddr(test_case, 0, &addr, &addrlen)) {
+		ksft_print_msg("%s: failed to build socket address\n",
+			       test_case->name);
+		goto out;
+	}
+
+	listener_a = create_reuseport_socket(test_case);
+	if (listener_a < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_a)");
+		goto out;
+	}
+
+	if (bind(listener_a, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("bind(listener_a)");
+		goto out;
+	}
+
+	if (listen(listener_a, 1)) {
+		ksft_perror("listen(listener_a)");
+		goto out;
+	}
+
+	addrlen = sizeof(addr);
+	if (getsockname(listener_a, (struct sockaddr *)&addr, &addrlen)) {
+		ksft_perror("getsockname(listener_a)");
+		goto out;
+	}
+
+	listener_b = create_reuseport_socket(test_case);
+	if (listener_b < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_b)");
+		goto out;
+	}
+
+	if (bind(listener_b, (struct sockaddr *)&addr, addrlen)) {
+		ksft_perror("bind(listener_b)");
+		goto out;
+	}
+
+	client = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (client < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(client)");
+		goto out;
+	}
+
+	/* Connect while only listener_a is listening, ensuring the
+	 * child lands in listener_a's accept queue deterministically.
+	 */
+	if (connect(client, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("connect(client)");
+		goto out;
+	}
+
+	if (listen(listener_b, 1)) {
+		ksft_perror("listen(listener_b)");
+		goto out;
+	}
+
+	if (set_nonblocking(listener_b)) {
+		ksft_perror("set_nonblocking(listener_b)");
+		goto out;
+	}
+
+	epfd = epoll_create1(EPOLL_CLOEXEC);
+	if (epfd < 0) {
+		ksft_perror("epoll_create1");
+		goto out;
+	}
+
+	ev.data.fd = listener_b;
+	if (epoll_ctl(epfd, EPOLL_CTL_ADD, listener_b, &ev)) {
+		ksft_perror("epoll_ctl(ADD listener_b)");
+		goto out;
+	}
+
+	close_fd(&listener_a);
+
+	n = epoll_wait(epfd, &ev, 1, EPOLL_TIMEOUT_MS);
+	if (n < 0) {
+		ksft_perror("epoll_wait");
+		goto out;
+	}
+
+	accepted = accept4(listener_b, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
+	if (accepted < 0) {
+		if (errno == EAGAIN || errno == EWOULDBLOCK) {
+			ksft_print_msg("%s: target listener had no queued child after migration\n",
+				       test_case->name);
+			goto out;
+		}
+
+		ksft_perror("accept4(listener_b)");
+		goto out;
+	}
+
+	if (n != 1) {
+		ksft_print_msg("%s: accept queue was populated, but epoll_wait() timed out\n",
+			       test_case->name);
+		goto out;
+	}
+
+	if (ev.data.fd != listener_b || !(ev.events & EPOLLIN)) {
+		ksft_print_msg("%s: unexpected epoll event fd=%d events=%#x\n",
+			       test_case->name, ev.data.fd, ev.events);
+		goto out;
+	}
+
+	ret = KSFT_PASS;
+
+out:
+	close_fd(&accepted);
+	close_fd(&epfd);
+	close_fd(&client);
+	close_fd(&listener_b);
+	close_fd(&listener_a);
+
+	return ret;
+}
+
+int main(void)
+{
+	int status = KSFT_PASS;
+	int ret;
+	int i;
+
+	setup_netns();
+
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(test_cases));
+
+	for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+		ret = run_test(&test_cases[i]);
+		ksft_test_result_code(ret, test_cases[i].name, NULL);
+
+		if (ret == KSFT_FAIL)
+			status = KSFT_FAIL;
+	}
+
+	if (status == KSFT_FAIL)
+		ksft_exit_fail();
+
+	ksft_finished();
+}
-- 
2.43.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox