Netdev List


* Re: [PATCH net-next v2] Documentation: net/smc: correct old value of smcr_max_recv_wr
From: Breno Leitao @ 2026-06-23 15:12 UTC
  To: Mahanta Jambigi
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, alibuda, dust.li,
	sidraya, wenjia, wintera, pasic, horms, tonylu, guwen, netdev,
	linux-s390
In-Reply-To: <20260424052336.3262350-1-mjambigi@linux.ibm.com>

On Fri, Apr 24, 2026 at 07:23:36AM +0200, Mahanta Jambigi wrote:
> The smc-sysctl.rst documentation incorrectly stated that the previous
> hardcoded maximum number of WR buffers on the receive path (smcr_max_recv_wr)
> was 16. The correct historical value used before the introduction of the sysctl
> control was 48. Update the documentation to reflect the accurate historical
> value. Also fix a couple of minor typos.
> 
> Fixes: aef3cdb47bbb net/smc: make wr buffer count configurable

This Fixes tag is broken. You probably want:

	Fixes: aef3cdb47bbb ("net/smc: make wr buffer count configurable")

Other than that, it looks good; the corrected value checks out.

* Re: [PATCH net-next 0/3] selftests/xsk: stabilize timeout test behavior
From: Maciej Fijalkowski @ 2026-06-23 14:56 UTC
  To: Jason Xing
  Cc: Tushar Vyavahare, netdev, magnus.karlsson, stfomichev, kernelxing,
	davem, kuba, pabeni, ast, daniel, tirthendu.sarkar, bpf
In-Reply-To: <ajJsMj0QMOF5I8qq@boxer>

On Wed, Jun 17, 2026 at 11:43:14AM +0200, Maciej Fijalkowski wrote:
> On Wed, Jun 17, 2026 at 07:39:06AM +0800, Jason Xing wrote:
> > Hi Tushar,
> > 
> > On Tue, Jun 16, 2026 at 11:50 PM Tushar Vyavahare
> > <tushar.vyavahare@intel.com> wrote:
> > >
> > > This series improves AF_XDP selftests by making timeout handling
> > > explicit and fixing sources of non-determinism in xsk timeout tests.
> > >
> > > Patch 1 introduces test_spec::poll_tmout and removes implicit
> > > dependence on RX UMEM setup state for timeout behavior.
> > >
> > > Patch 2 fixes thread harness sequencing by attaching XDP programs
> > > before worker startup, removing signal-based termination, and using
> > > barrier synchronization only for dual-thread runs.
> > >
> > > Patch 3 restores shared_umem after POLL_TXQ_FULL so test-local
> > > configuration does not leak into subsequent cases on shared-netdev
> > > runs.
> > >
> > > Together these changes make timeout handling easier to follow and
> > > improve selftest stability, especially on real NIC runs.
> > 
> > net-next is closed, but in the meantime I'll review the series ASAP.
> > 
> > BTW, another thing about selftests I had in my mind is that are you
> > planning to work on this [1]?
> 
> This one is on me. I took your changes Jason and aligned ZC batching side
> to this behavior, followed by xskxceiver adjustment. I am planning to send
> this today EOD, however let's see how badly internal Sashiko will kick my
> ass.

Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

> 
> > 
> > [1]: https://lore.kernel.org/all/20260520004244.55663-1-kerneljasonxing@gmail.com/
> > 
> > Thanks,
> > Jason
> > 
> > >
> > > Tushar Vyavahare (3):
> > >   selftests/xsk: make poll timeout mode explicit
> > >   selftests/xsk: fix timeout thread harness sequencing
> > >   selftests/xsk: restore shared_umem after POLL_TXQ_FULL
> > >
> > >  .../selftests/bpf/prog_tests/test_xsk.c       | 96 +++++++++++--------
> > >  .../selftests/bpf/prog_tests/test_xsk.h       |  2 +
> > >  2 files changed, 56 insertions(+), 42 deletions(-)
> > >
> > > --
> > > 2.43.0
> > >
> > >
> > 

* Re: [PATCH net-next 0/3] selftests/xsk: stabilize timeout test behavior
From: Maciej Fijalkowski @ 2026-06-23 14:58 UTC
  To: Jakub Kicinski
  Cc: Jason Xing, Tushar Vyavahare, netdev, magnus.karlsson, stfomichev,
	kernelxing, davem, pabeni, ast, daniel, tirthendu.sarkar, bpf
In-Reply-To: <ajpLuDNCu2PHS78l@boxer>

On Tue, Jun 23, 2026 at 11:02:48AM +0200, Maciej Fijalkowski wrote:
> On Mon, Jun 22, 2026 at 04:07:06PM -0700, Jakub Kicinski wrote:
> > On Wed, 17 Jun 2026 11:43:14 +0200 Maciej Fijalkowski wrote:
> > > > On Tue, Jun 16, 2026 at 11:50 PM Tushar Vyavahare
> > > > <tushar.vyavahare@intel.com> wrote:  
> > > > >
> > > > > This series improves AF_XDP selftests by making timeout handling
> > > > > explicit and fixing sources of non-determinism in xsk timeout tests.
> > > > >
> > > > > Patch 1 introduces test_spec::poll_tmout and removes implicit
> > > > > dependence on RX UMEM setup state for timeout behavior.
> > > > >
> > > > > Patch 2 fixes thread harness sequencing by attaching XDP programs
> > > > > before worker startup, removing signal-based termination, and using
> > > > > barrier synchronization only for dual-thread runs.
> > > > >
> > > > > Patch 3 restores shared_umem after POLL_TXQ_FULL so test-local
> > > > > configuration does not leak into subsequent cases on shared-netdev
> > > > > runs.
> > > > >
> > > > > Together these changes make timeout handling easier to follow and
> > > > > improve selftest stability, especially on real NIC runs.  
> > > > 
> > > > net-next is closed, but in the meantime I'll review the series ASAP.
> > > > 
> > > > BTW, another thing about selftests I had in my mind is that are you
> > > > planning to work on this [1]?  
> > > 
> > > This one is on me. I took your changes Jason and aligned ZC batching side
> > > to this behavior, followed by xskxceiver adjustment. I am planning to send
> > > this today EOD, however let's see how badly internal Sashiko will kick my
> > > ass.
> > 
> > Hi Maciej, do you want these applied? If they help make the tests less
> > flaky I think that it's fine to take them during the merge window.
> 
> Hi Jakub,
> 
> last refactor from Tushar broke BIDIRECTIONAL test case when HW is test
> target, but not on veth, so let me test these changes locally and then get
> back to you.
> 
> BPF CI runs xskxceiver on veth so this has not been caught. Seems my/our
> focus should be to enable xskxceiver HW tests on any kind of
> environment/infrastructure.
> 
> Gonna get back to you by the EOD.
> Maciej

Ah, I replied on the other thread I guess, so let me repeat:

Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

* Re: [PATCH 1/2] bug: Provide WARN_ON.*DEFERRED() macros for console deferred output
From: K Prateek Nayak @ 2026-06-23 14:54 UTC
  To: Sebastian Andrzej Siewior, linux-arch, linux-kernel, sched-ext,
	netdev
  Cc: David S . Miller, Andrea Righi, Andrew Morton, Arnd Bergmann,
	Ben Segall, Breno Leitao, Changwoo Min, David Vernet,
	Dietmar Eggemann, Eric Dumazet, Ingo Molnar, Jakub Kicinski,
	John Ogness, Juri Lelli, Paolo Abeni, Peter Zijlstra, Petr Mladek,
	Sergey Senozhatsky, Simon Horman, Steven Rostedt, Tejun Heo,
	Vincent Guittot, Vlad Poenaru
In-Reply-To: <20260623142650.265721-2-bigeasy@linutronix.de>

Hello Sebastian,

On 6/23/2026 7:56 PM, Sebastian Andrzej Siewior wrote:
> --- a/lib/bug.c
> +++ b/lib/bug.c
> @@ -196,7 +196,7 @@ void __warn_printf(const char *fmt, struct pt_regs *regs)
>  
>  static enum bug_trap_type __report_bug(struct bug_entry *bug, unsigned long bugaddr, struct pt_regs *regs)
>  {
> -	bool warning, once, done, no_cut, has_args;
> +	bool warning, once, done, no_cut, has_args, deferred;
>  	const char *file, *fmt;
>  	unsigned line;
>  
> @@ -219,6 +219,7 @@ static enum bug_trap_type __report_bug(struct bug_entry *bug, unsigned long buga
>  	done     = bug->flags & BUGFLAG_DONE;
>  	no_cut   = bug->flags & BUGFLAG_NO_CUT_HERE;
>  	has_args = bug->flags & BUGFLAG_ARGS;
> +	deferred = bug->flags & BUGFLAG_DEFERRED;
>  
>  	if (warning && once) {
>  		if (done)
> @@ -229,7 +230,10 @@ static enum bug_trap_type __report_bug(struct bug_entry *bug, unsigned long buga
>  		 */
>  		bug->flags |= BUGFLAG_DONE;
>  	}
> -
> +	if (deferred) {
> +		preempt_disable_notrace();
> +		printk_deferred_enter();
> +	}
>  	/*
>  	 * BUG() and WARN_ON() families don't print a custom debug message
>  	 * before triggering the exception handler, so we must add the
> @@ -245,6 +249,10 @@ static enum bug_trap_type __report_bug(struct bug_entry *bug, unsigned long buga
>  		/* this is a WARN_ON rather than BUG/BUG_ON */
>  		__warn(file, line, (void *)bugaddr, BUG_GET_TAINT(bug), regs,
>  		       NULL);
> +		if (deferred) {
> +			printk_deferred_exit();
> +			preempt_enable_notrace();
> +		}
>  		return BUG_TRAP_TYPE_WARN;

nit.

Instead of replicating these bits, can we replace that return with a
"goto out" ...

>  	}
>  
> @@ -254,6 +262,10 @@ static enum bug_trap_type __report_bug(struct bug_entry *bug, unsigned long buga
>  		pr_crit("kernel BUG at %pB [verbose debug info unavailable]\n",
>  			(void *)bugaddr);
>  

out:

> +	if (deferred) {
> +		printk_deferred_exit();
> +		preempt_enable_notrace();
> +	}
>  	return BUG_TRAP_TYPE_BUG;

... and replace this return with a:

    return (warning) ? BUG_TRAP_TYPE_WARN : BUG_TRAP_TYPE_BUG;

Looks a tad bit cleaner to my eyes. Thoughts?

>  }
>  
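
To illustrate the shape suggested above, here is a minimal userspace
sketch of a single exit point that pairs the enter/exit calls once. It is
not the lib/bug.c code: report(), deferred_enter(), deferred_exit() and
deferred_depth are hypothetical stand-ins for __report_bug() and the
printk_deferred/preempt pair.

```c
#include <assert.h>
#include <stdbool.h>

enum trap_type { TRAP_WARN, TRAP_BUG };

/* Hypothetical stand-in for the deferred-printk/preempt nesting state. */
static int deferred_depth;

static void deferred_enter(void)
{
	deferred_depth++;
}

static void deferred_exit(void)
{
	/* Every exit must be balanced by an earlier enter. */
	assert(deferred_depth > 0);
	deferred_depth--;
}

static enum trap_type report(bool warning, bool deferred)
{
	if (deferred)
		deferred_enter();

	if (warning) {
		/* The WARN path would print here, then share the exit. */
		goto out;
	}

	/* The BUG path would print here and fall through to the exit. */
out:
	if (deferred)
		deferred_exit();
	return warning ? TRAP_WARN : TRAP_BUG;
}
```

With one exit block, adding another early-return path later cannot forget
the exit half of the pair.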

-- 
Thanks and Regards,
Prateek


* [PATCH] qede: fix out-of-bounds check for cqe->len_list[]
From: Matvey Kovalev @ 2026-06-23 14:45 UTC
  To: Andrew Lunn
  Cc: Matvey Kovalev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Pavel Zhigulin, netdev, linux-kernel, lvc-project

Move the index check before the element access so that
cqe->len_list[i] is never read with i == ARRAY_SIZE(cqe->len_list).

Fixes: 896f1a2493b5 ("net: qlogic/qede: fix potential out-of-bounds read in qede_tpa_cont() and qede_tpa_end()")
Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Matvey Kovalev <matvey.kovalev@ispras.ru>
---
 drivers/net/ethernet/qlogic/qede/qede_fp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c
index e338bfc8b7b2..33e18bb69774 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c
@@ -961,7 +961,7 @@ static inline void qede_tpa_cont(struct qede_dev *edev,
 {
 	int i;
 
-	for (i = 0; cqe->len_list[i] && i < ARRAY_SIZE(cqe->len_list); i++)
+	for (i = 0; i < ARRAY_SIZE(cqe->len_list) && cqe->len_list[i]; i++)
 		qede_fill_frag_skb(edev, rxq, cqe->tpa_agg_index,
 				   le16_to_cpu(cqe->len_list[i]));
 
@@ -986,7 +986,7 @@ static int qede_tpa_end(struct qede_dev *edev,
 		dma_unmap_page(rxq->dev, tpa_info->buffer.mapping,
 			       PAGE_SIZE, rxq->data_direction);
 
-	for (i = 0; cqe->len_list[i] && i < ARRAY_SIZE(cqe->len_list); i++)
+	for (i = 0; i < ARRAY_SIZE(cqe->len_list) && cqe->len_list[i]; i++)
 		qede_fill_frag_skb(edev, rxq, cqe->tpa_agg_index,
 				   le16_to_cpu(cqe->len_list[i]));
 	if (unlikely(i > 1))
-- 
2.54.0
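
The reason the operand order matters can be shown with a small standalone
sketch. It relies only on the left-to-right short-circuit evaluation of
&& in C. count_frags() and LEN_LIST_SIZE are hypothetical names for
illustration, and the size used here is not taken from the qede headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size; stands in for ARRAY_SIZE(cqe->len_list). */
#define LEN_LIST_SIZE 6

/* Count the leading non-zero entries without reading past the array. */
static size_t count_frags(const uint16_t *len_list, size_t size)
{
	size_t i;

	assert(len_list != NULL);

	/*
	 * The bound is tested first, so len_list[i] is only evaluated
	 * when i < size. With the operands swapped, a list with no zero
	 * terminator would be read at index 'size' before the loop stops.
	 */
	for (i = 0; i < size && len_list[i]; i++)
		;

	return i;
}
```

A fully populated list (no zero entry) is the case where the original
operand order reads one element past the end.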


* Re: [PATCH net v2 2/2] sctp: add INIT verification after cookie unpacking
From: Xin Long @ 2026-06-23 14:45 UTC
  To: Simon Horman
  Cc: netdev, linux-sctp, davem, kuba, edumazet, pabeni,
	marcelo.leitner
In-Reply-To: <20260622165200.1050676-1-horms@kernel.org>

On Mon, Jun 22, 2026 at 12:52 PM Simon Horman <horms@kernel.org> wrote:
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://netdev-ai.bots.linux.dev/sashiko/
> ---
> sctp: add INIT verification after cookie unpacking
>
> This commit adds sctp_verify_init() checks in the COOKIE-ECHO processing
> paths sctp_sf_do_5_1D_ce() and sctp_sf_do_5_2_4_dupcook() when
> cookie_auth_enable is disabled, and updates sctp_verify_init() to bound
> the parameter walk against peer_init length rather than chunk->chunk_end.
> On verification failure the new association is freed and an ABORT is
> generated via sctp_abort_on_init_err().
>
> > diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> > index 8c636f045e45..6967e889d1bd 100644
> > --- a/net/sctp/sm_statefuns.c
> > +++ b/net/sctp/sm_statefuns.c
> > @@ -650,11 +650,12 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
> >                                        struct sctp_cmd_seq *commands)
> >  {
> >       struct sctp_ulpevent *ev, *ai_ev = NULL, *auth_ev = NULL;
> > +     struct sctp_chunk *err_chk_p = NULL;
> >       struct sctp_association *new_asoc;
> >       struct sctp_init_chunk *peer_init;
> >       struct sctp_chunk *chunk = arg;
> > -     struct sctp_chunk *err_chk_p;
> >       struct sctp_chunk *repl;
> > +     enum sctp_cid cid;
> >       struct sock *sk;
> >       int error = 0;
> >
> > @@ -728,6 +729,18 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
> >               }
> >       }
> >
> > +     peer_init = (struct sctp_init_chunk *)(chunk->subh.cookie_hdr + 1);
> > +     cid = peer_init->chunk_hdr.type;
> > +     if (!sctp_sk(sk)->cookie_auth_enable &&
> > +         !sctp_verify_init(net, ep, asoc, cid, peer_init, chunk,
> > +                           &err_chk_p)) {
> > +             sctp_association_free(new_asoc);
> > +             return sctp_abort_on_init_err(net, ep, asoc, cid, arg, commands,
> > +                                           err_chk_p);
> > +     }
>
> [Critical]
> When cookie_auth_enable is disabled the cookie HMAC is not validated by
> sctp_unpack_cookie(), so every byte of the embedded INIT (including
> peer_init->chunk_hdr.type) is attacker-controlled.  Is it safe to feed
> that byte directly into both sctp_verify_init() and
> sctp_abort_on_init_err() as cid?
>
> Inside the COOKIE-ECHO body the embedded chunk is by RFC always an INIT.
> Could the cid argument be hard-coded to SCTP_CID_INIT (or
> peer_init->chunk_hdr.type be validated against SCTP_CID_INIT before use)
> instead of trusting the wire byte?
>
> Two attacker-reachable consequences look possible if cid is forged to
> SCTP_CID_INIT_ACK (2) and the STATE_COOKIE parameter is omitted:
>
> sctp_verify_init() then returns 0 via this branch with *errp populated:
>
>         if ((SCTP_CID_INIT_ACK == cid) && !has_cookie)
>                 return sctp_process_missing_param(asoc, SCTP_PARAM_STATE_COOKIE,
>                                                   chunk, errp);
>
> control transfers to sctp_abort_on_init_err(net, ep, asoc, cid, ...) with
> cid == SCTP_CID_INIT_ACK and a non-NULL err_chunk.  After sending the
> ABORT packet, that helper falls through to its out: label because
> cid == SCTP_CID_INIT_ACK:
>
>         if (cid != SCTP_CID_INIT_ACK) {
>                 if (!packet)
>                         return SCTP_DISPOSITION_NOMEM;
>                 return SCTP_DISPOSITION_CONSUME;
>         }
>
> out:
>         ...
>         return sctp_stop_t1_and_abort(net, commands, error, ECONNREFUSED,
>                                       asoc, chunk->transport);
>
> In the sctp_sf_do_5_1D_ce() context, asoc is typically NULL (the function
> itself opens with if (asoc && !sctp_vtag_verify(chunk, asoc))).  Can the
> SCTP_CMD_NEW_STATE / SCTP_CMD_TIMER_STOP / SCTP_CMD_INIT_FAILED commands
> queued by sctp_stop_t1_and_abort() against a NULL asoc be dispatched
> without the sideeffect interpreter dereferencing asoc?
>
> > @@ -2205,6 +2219,19 @@ enum sctp_disposition sctp_sf_do_5_2_4_dupcook(
> >       switch (action) {
> >       case 'A': /* Association restart. */
> >       case 'B': /* Collision case B. */
> > +             peer_init = (struct sctp_init_chunk *)
> > +                             (chunk->subh.cookie_hdr + 1);
> > +             cid = peer_init->chunk_hdr.type;
> > +             if (!sctp_sk(ep->base.sk)->cookie_auth_enable &&
> > +                 !sctp_verify_init(net, ep, asoc, cid, peer_init, chunk,
> > +                                   &err_chk_p)) {
> > +                     sctp_association_free(new_asoc);
> > +                     return sctp_abort_on_init_err(net, ep, asoc, cid, arg,
> > +                                                   commands, err_chk_p);
> > +             }
> > +             if (err_chk_p)
> > +                     sctp_chunk_free(err_chk_p);
> > +             fallthrough;
>
> [Critical]
> The same attacker-controlled cid is propagated here as well.  In
> sctp_sf_do_5_2_4_dupcook() asoc is the existing established association.
> With a forged cid of SCTP_CID_INIT_ACK and missing STATE_COOKIE, can a
> single unauthenticated COOKIE-ECHO drive sctp_abort_on_init_err() into
> its out: path and have sctp_stop_t1_and_abort() tear down the established
> asoc and report ECONNREFUSED to the application?
>
> Would constraining the verification to the well-defined INIT case (for
> example, hard-coding SCTP_CID_INIT or rejecting any cid !=
> SCTP_CID_INIT before calling sctp_verify_init()/sctp_abort_on_init_err())
> avoid both of these paths?

I will place a check in sctp_unpack_cookie() for such malformed packets:

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index d5ee81934d93..8adac9e0cd66 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -1761,6 +1761,8 @@ struct sctp_association *sctp_unpack_cookie(
        bear_cookie = &cookie->c;

        ch = (struct sctp_chunkhdr *)(bear_cookie + 1);
+       if (ch->type != SCTP_CID_INIT)
+               goto malformed;
        chlen = ntohs(ch->length);
        if (chlen < sizeof(struct sctp_init_chunk))
                goto malformed;

Thanks.
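
For illustration only, the idea of rejecting a non-INIT chunk before
trusting its length can be sketched in userspace as below. This is not
the sctp_unpack_cookie() code: check_embedded_init() and the constants
are hypothetical names, and the parsing works on a raw byte buffer rather
than on kernel structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CID_INIT      1   /* INIT chunk type in the SCTP chunk header */
#define CHUNK_HDR_LEN 4   /* type, flags, 16-bit length */
#define MIN_INIT_LEN  20  /* chunk header plus the fixed INIT fields */

/* Return 0 if the embedded chunk looks like a sane INIT, -1 if malformed. */
static int check_embedded_init(const uint8_t *buf, size_t buflen)
{
	size_t chlen;

	assert(buf != NULL);

	if (buflen < CHUNK_HDR_LEN)
		return -1;

	/* Reject anything that is not an INIT before using other fields. */
	if (buf[0] != CID_INIT)
		return -1;

	/* The length field is big-endian on the wire. */
	chlen = ((size_t)buf[2] << 8) | buf[3];
	if (chlen < MIN_INIT_LEN || chlen > buflen)
		return -1;

	return 0;
}
```

Checking the type once at unpack time means the later callers never see a
forged chunk type.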

* [PATCH v2] netdevsim: fix use-after-free in nsim_create and __nsim_dev_port_del
From: Hrushiraj Gandhi @ 2026-06-23 14:44 UTC
  To: Jakub Kicinski
  Cc: Simon Horman, Andrew Lunn, David S . Miller, Eric Dumazet,
	Paolo Abeni, Jiri Pirko, netdev, linux-kernel, bpf,
	syzbot+6c25f4750230faf70be9, Hrushiraj Gandhi

debugfs files created under a port's ddir (ethtool/get_err,
ethtool/set_err, ring params, bpf_offloaded_id, udp_ports/inject_error,
etc.) store raw pointers directly into the netdevsim struct, which lives
in the net_device private data kmalloc slab.

If these files outlive the netdevsim struct, a concurrent reader can
trigger a slab-use-after-free by passing debugfs_file_get() (which only
checks dentry lifetime) and then dereferencing the freed data pointer
in debugfs_u32_get().

In __nsim_dev_port_del(), nsim_destroy() is called before
nsim_dev_port_debugfs_exit(). However, nsim_destroy() calls free_netdev()
at its end, while nsim_dev_port_debugfs_exit() removes the port's
debugfs directory. This means the slab is freed before the debugfs
files are removed.

The same window exists on nsim_create()'s error path:
nsim_ethtool_init() creates debugfs files under ddir with pointers into
ns before nsim_init_netdevsim()/nsim_init_netdevsim_vf() which can fail,
and the err_free_netdev label calls free_netdev() while those debugfs
entries are still live.

Fix both paths by calling debugfs_remove_recursive() on the port's
ddir before every free_netdev() call. The subsequent
nsim_dev_port_debugfs_exit() calls become harmless no-ops since ddir is
set to NULL.

Reported-by: syzbot+6c25f4750230faf70be9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6c25f4750230faf70be9
Fixes: e05b2d141fef ("netdevsim: move netdev creation/destruction to dev probe")
Signed-off-by: Hrushiraj Gandhi <hrushirajg23@gmail.com>
---
v2:
- Also fix the same use-after-free window on the error path of nsim_create() as suggested by Simon Horman.
- Shorten the code comment in nsim_destroy() to be more concise.

 drivers/net/netdevsim/netdev.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index 27e5f109f933..f2824e75cddd 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -1165,6 +1165,8 @@ struct netdevsim *nsim_create(struct nsim_dev *nsim_dev,
 	return ns;

 err_free_netdev:
+	debugfs_remove_recursive(nsim_dev_port->ddir);
+	nsim_dev_port->ddir = NULL;
 	free_netdev(dev);
 	return ERR_PTR(err);
 }
@@ -1214,6 +1216,13 @@ void nsim_destroy(struct netdevsim *ns)
 		ns->page = NULL;
 	}

+	/*
+	 * Remove per-port debugfs files before free_netdev() releases the
+	 * netdevsim struct to prevent use-after-free in concurrent readers.
+	 */
+	debugfs_remove_recursive(ns->nsim_dev_port->ddir);
+	ns->nsim_dev_port->ddir = NULL;
+
 	free_netdev(dev);
 }

-- 
2.47.3
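
The ordering rule behind the fix (drop every outside pointer into an
object before freeing it) can be sketched in plain C as follows. This is
a simplified model, not the netdevsim code: struct port, struct sim and
the helper names are hypothetical, and a single pointer stands in for the
debugfs files that reference the netdevsim struct.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a debugfs entry that stores a raw pointer into 'sim'. */
struct port {
	const unsigned int *debug_val;
};

struct sim {
	unsigned int get_err;
	struct port *port;
};

static void port_debug_remove(struct port *p)
{
	p->debug_val = NULL;
}

static struct sim *sim_create(struct port *p)
{
	struct sim *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->port = p;
	/* The port now holds a pointer into memory owned by 's'. */
	p->debug_val = &s->get_err;
	return s;
}

static void sim_destroy(struct sim *s)
{
	assert(s != NULL);
	/* Remove the outside reference first, then release the memory. */
	port_debug_remove(s->port);
	free(s);
}
```

If the two calls in sim_destroy() were swapped, a reader going through
the port would dereference freed memory in the window between them.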

* Re: [PATCH v1 0/3] thunderbold: A few cleanups
From: Uwe Kleine-König (The Capable Hub) @ 2026-06-23 14:35 UTC
  To: Mika Westerberg
  Cc: Mika Westerberg, Yehezkel Bernat, Andreas Noever, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, linux-kernel, linux-usb
In-Reply-To: <20260623121746.GD3066@black.igk.intel.com>

Hello Mika,

On Tue, Jun 23, 2026 at 02:17:46PM +0200, Mika Westerberg wrote:
> On Thu, Jun 18, 2026 at 12:14:49PM +0200, Uwe Kleine-König (The Capable Hub) wrote:
> > Uwe Kleine-König (The Capable Hub) (3):
> >   thunderbold: Stop passing matched device ID to .probe()
> >   thunderbold: Assert that a service driver has a probe callback
> >   thunderbold: Drop comma after device id array terminator
> 
> Fixed the typo "thunderbold" -> "thunderbolt" and applied all to

Oh.

> thunderbolt.git/next. I also took the networking patch, let me know if
> that's not okay (I'm the maintainer of that driver too and it looked fine).

Sounds fine to me. So assuming you're not offending the network guys,
that should be ok.

Thanks!
Uwe

* [PATCH 2/2] sched: Use WARN_ON.*_DEFERRED()
From: Sebastian Andrzej Siewior @ 2026-06-23 14:26 UTC
  To: linux-arch, linux-kernel, sched-ext, netdev
  Cc: David S . Miller, Andrea Righi, Andrew Morton, Arnd Bergmann,
	Ben Segall, Breno Leitao, Changwoo Min, David Vernet,
	Dietmar Eggemann, Eric Dumazet, Ingo Molnar, Jakub Kicinski,
	John Ogness, Juri Lelli, K Prateek Nayak, Paolo Abeni,
	Peter Zijlstra, Petr Mladek, Sergey Senozhatsky, Simon Horman,
	Steven Rostedt, Tejun Heo, Vincent Guittot, Vlad Poenaru,
	Sebastian Andrzej Siewior
In-Reply-To: <20260623142650.265721-1-bigeasy@linutronix.de>

Vlad managed to trigger a warning in __enqueue_entity() while the rq
lock was held. He was using netconsole on an older kernel, where it was
a legacy console (not nbcon). This resulted in an immediate flush, which
led to sending packets, which in turn led to waking ksoftirqd. This
wakeup ended in a deadlock because the scheduler tried to acquire the
already acquired rq lock.

This problem is not limited to netconsole but applies to all legacy
consoles: should the console wake any task while holding its internal
lock, lockdep will observe and report a possible AB-BA deadlock. Also,
since the warning does not happen regularly, lockdep may observe a
lockchain while acquiring the locks, leading to a recursion report while
holding locks.
More importantly, once the console printing is finished, the console
semaphore is released, which will lead to a wakeup if there is a waiter
pending.

Replace WARNs within the scheduler with the DEFERRED variant. This will
queue an irq_work and the print will occur once the locks are dropped.

Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
Closes: https://lore.kernel.org/all/20260610183621.3915271-1-vlad.wing@gmail.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/sched/core.c        |  78 +++++++++++++-------------
 kernel/sched/core_sched.c  |   6 +-
 kernel/sched/cpudeadline.c |   6 +-
 kernel/sched/deadline.c    |  62 ++++++++++-----------
 kernel/sched/ext.c         | 110 ++++++++++++++++++-------------------
 kernel/sched/fair.c        |  88 ++++++++++++++---------------
 kernel/sched/rt.c          |  36 ++++++------
 kernel/sched/sched.h       |  18 +++---
 8 files changed, 202 insertions(+), 202 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b8871449d3c69..0e282457abb91 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -853,7 +853,7 @@ void update_rq_clock(struct rq *rq)
 		return;
 
 	if (sched_feat(WARN_DOUBLE_CLOCK))
-		WARN_ON_ONCE(rq->clock_update_flags & RQCF_UPDATED);
+		WARN_ON_ONCE_DEFERRED(rq->clock_update_flags & RQCF_UPDATED);
 	rq->clock_update_flags |= RQCF_UPDATED;
 
 	clock = sched_clock_cpu(cpu_of(rq));
@@ -1807,7 +1807,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
 
 	bucket = &uc_rq->bucket[uc_se->bucket_id];
 
-	WARN_ON_ONCE(!bucket->tasks);
+	WARN_ON_ONCE_DEFERRED(!bucket->tasks);
 	if (likely(bucket->tasks))
 		bucket->tasks--;
 
@@ -1827,7 +1827,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
 	 * Defensive programming: this should never happen. If it happens,
 	 * e.g. due to future modification, warn and fix up the expected value.
 	 */
-	WARN_ON_ONCE(bucket->value > rq_clamp);
+	WARN_ON_ONCE_DEFERRED(bucket->value > rq_clamp);
 	if (bucket->value >= rq_clamp) {
 		bkt_clamp = uclamp_rq_max_value(rq, clamp_id, uc_se->value);
 		uclamp_rq_set(rq, clamp_id, bkt_clamp);
@@ -2210,7 +2210,7 @@ void activate_task(struct rq *rq, struct task_struct *p, int flags)
 
 void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
 {
-	WARN_ON_ONCE(flags & DEQUEUE_SLEEP);
+	WARN_ON_ONCE_DEFERRED(flags & DEQUEUE_SLEEP);
 
 	WRITE_ONCE(p->on_rq, TASK_ON_RQ_MIGRATING);
 	ASSERT_EXCLUSIVE_WRITER(p->on_rq);
@@ -2516,7 +2516,7 @@ static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
 	rq = cpu_rq(new_cpu);
 
 	rq_lock(rq, rf);
-	WARN_ON_ONCE(task_cpu(p) != new_cpu);
+	WARN_ON_ONCE_DEFERRED(task_cpu(p) != new_cpu);
 	activate_task(rq, p, 0);
 	wakeup_preempt(rq, p, 0);
 
@@ -2602,7 +2602,7 @@ static int migration_cpu_stop(void *data)
 	 * If we were passed a pending, then ->stop_pending was set, thus
 	 * p->migration_pending must have remained stable.
 	 */
-	WARN_ON_ONCE(pending && pending != p->migration_pending);
+	WARN_ON_ONCE_DEFERRED(pending && pending != p->migration_pending);
 
 	/*
 	 * If task_rq(p) != rq, it cannot be migrated here, because we're
@@ -2661,7 +2661,7 @@ static int migration_cpu_stop(void *data)
 		 * determine is_migration_disabled() and so have to chase after
 		 * it.
 		 */
-		WARN_ON_ONCE(!pending->stop_pending);
+		WARN_ON_ONCE_DEFERRED(!pending->stop_pending);
 		preempt_disable();
 		rq_unlock(rq, &rf);
 		raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
@@ -3004,7 +3004,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 	 *
 	 * Either way, we really should have a @pending here.
 	 */
-	if (WARN_ON_ONCE(!pending)) {
+	if (WARN_ON_ONCE_DEFERRED(!pending)) {
 		task_rq_unlock(rq, p, rf);
 		return -EINVAL;
 	}
@@ -3116,9 +3116,9 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
 			goto out;
 		}
 
-		if (WARN_ON_ONCE(p == current &&
-				 is_migration_disabled(p) &&
-				 !cpumask_test_cpu(task_cpu(p), ctx->new_mask))) {
+		if (WARN_ON_ONCE_DEFERRED(p == current &&
+					  is_migration_disabled(p) &&
+					  !cpumask_test_cpu(task_cpu(p), ctx->new_mask))) {
 			ret = -EBUSY;
 			goto out;
 		}
@@ -3267,7 +3267,7 @@ void force_compatible_cpus_allowed_ptr(struct task_struct *p)
 				cpumask_pr_args(override_mask));
 	}
 
-	WARN_ON(set_cpus_allowed_ptr(p, override_mask));
+	WARN_ON_DEFERRED(set_cpus_allowed_ptr(p, override_mask));
 out_free_mask:
 	cpus_read_unlock();
 	free_cpumask_var(new_mask);
@@ -3293,7 +3293,7 @@ void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
 	 * Cpuset masking will be done there too.
 	 */
 	ret = __sched_setaffinity(p, &ac);
-	WARN_ON_ONCE(ret);
+	WARN_ON_ONCE_DEFERRED(ret);
 }
 
 #ifdef CONFIG_SMP
@@ -3306,16 +3306,16 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 	 * We should never call set_task_cpu() on a blocked task,
 	 * ttwu() will sort out the placement.
 	 */
-	WARN_ON_ONCE(state != TASK_RUNNING && state != TASK_WAKING && !p->on_rq);
+	WARN_ON_ONCE_DEFERRED(state != TASK_RUNNING && state != TASK_WAKING && !p->on_rq);
 
 	/*
 	 * Migrating fair class task must have p->on_rq = TASK_ON_RQ_MIGRATING,
 	 * because schedstat_wait_{start,end} rebase migrating task's wait_start
 	 * time relying on p->on_rq.
 	 */
-	WARN_ON_ONCE(state == TASK_RUNNING &&
-		     p->sched_class == &fair_sched_class &&
-		     (p->on_rq && !task_on_rq_migrating(p)));
+	WARN_ON_ONCE_DEFERRED(state == TASK_RUNNING &&
+			      p->sched_class == &fair_sched_class &&
+			      (p->on_rq && !task_on_rq_migrating(p)));
 
 #ifdef CONFIG_LOCKDEP
 	/*
@@ -3328,15 +3328,15 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 	 * Furthermore, all task_rq users should acquire both locks, see
 	 * task_rq_lock().
 	 */
-	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
-				      lockdep_is_held(__rq_lockp(task_rq(p)))));
+	WARN_ON_ONCE_DEFERRED(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
+					       lockdep_is_held(__rq_lockp(task_rq(p)))));
 #endif
 	/*
 	 * Clearly, migrating tasks to offline CPUs is a fairly daft thing.
 	 */
-	WARN_ON_ONCE(!cpu_online(new_cpu));
+	WARN_ON_ONCE_DEFERRED(!cpu_online(new_cpu));
 
-	WARN_ON_ONCE(is_migration_disabled(p));
+	WARN_ON_ONCE_DEFERRED(is_migration_disabled(p));
 
 	trace_sched_migrate_task(p, new_cpu);
 
@@ -3803,10 +3803,10 @@ void sched_ttwu_pending(void *arg)
 	update_rq_clock(rq);
 
 	llist_for_each_entry_safe(p, t, llist, wake_entry.llist) {
-		if (WARN_ON_ONCE(p->on_cpu))
+		if (WARN_ON_ONCE_DEFERRED(p->on_cpu))
 			smp_cond_load_acquire(&p->on_cpu, !VAL);
 
-		if (WARN_ON_ONCE(task_cpu(p) != cpu_of(rq)))
+		if (WARN_ON_ONCE_DEFERRED(task_cpu(p) != cpu_of(rq)))
 			set_task_cpu(p, cpu_of(rq));
 
 		ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
@@ -4003,8 +4003,8 @@ bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success)
 	int match;
 
 	if (IS_ENABLED(CONFIG_DEBUG_PREEMPT)) {
-		WARN_ON_ONCE((state & TASK_RTLOCK_WAIT) &&
-			     state != TASK_RTLOCK_WAIT);
+		WARN_ON_ONCE_DEFERRED((state & TASK_RTLOCK_WAIT) &&
+				      state != TASK_RTLOCK_WAIT);
 	}
 
 	*success = !!(match = __task_state_match(p, state));
@@ -5745,7 +5745,7 @@ static void sched_tick_remote(struct work_struct *work)
 			 * we are always sure that there is no proxy (only a
 			 * single task is running).
 			 */
-			WARN_ON_ONCE(rq->curr != rq->donor);
+			WARN_ON_ONCE_DEFERRED(rq->curr != rq->donor);
 			update_rq_clock(rq);
 
 			if (!is_idle_task(curr)) {
@@ -5754,7 +5754,7 @@ static void sched_tick_remote(struct work_struct *work)
 				 * reasonable amount of time.
 				 */
 				u64 delta = rq_clock_task(rq) - curr->se.exec_start;
-				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 30);
+				WARN_ON_ONCE_DEFERRED(delta > (u64)NSEC_PER_SEC * 30);
 			}
 			curr->sched_class->task_tick(rq, curr, 0);
 
@@ -5769,7 +5769,7 @@ static void sched_tick_remote(struct work_struct *work)
 	 * first update state to reflect hotplug activity if required.
 	 */
 	os = atomic_fetch_add_unless(&twork->state, -1, TICK_SCHED_REMOTE_RUNNING);
-	WARN_ON_ONCE(os == TICK_SCHED_REMOTE_OFFLINE);
+	WARN_ON_ONCE_DEFERRED(os == TICK_SCHED_REMOTE_OFFLINE);
 	if (os == TICK_SCHED_REMOTE_RUNNING)
 		queue_delayed_work(system_dfl_wq, dwork, HZ);
 }
@@ -6196,7 +6196,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 			 * For robustness, update the min_vruntime_fi for
 			 * unconstrained picks as well.
 			 */
-			WARN_ON_ONCE(fi_before);
+			WARN_ON_ONCE_DEFERRED(fi_before);
 			task_vruntime_update(rq, next, false);
 			goto out_set_next;
 		}
@@ -6274,7 +6274,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	rq->core_sched_seq = rq->core->core_pick_seq;
 
 	/* Something should have been selected for current CPU */
-	WARN_ON_ONCE(!next);
+	WARN_ON_ONCE_DEFERRED(!next);
 
 	/*
 	 * Reschedule siblings
@@ -6317,7 +6317,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 		}
 
 		/* Did we break L1TF mitigation requirements? */
-		WARN_ON_ONCE(!cookie_match(next, rq_i->core_pick));
+		WARN_ON_ONCE_DEFERRED(!cookie_match(next, rq_i->core_pick));
 
 		if (rq_i->curr == rq_i->core_pick) {
 			rq_i->core_pick = NULL;
@@ -6717,7 +6717,7 @@ static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
 	struct rq *target_rq = cpu_rq(target_cpu);
 
 	lockdep_assert_rq_held(rq);
-	WARN_ON(p == rq->curr);
+	WARN_ON_DEFERRED(p == rq->curr);
 	/*
 	 * Since we are migrating a blocked donor, it could be rq->donor,
 	 * and we want to make sure there aren't any references from this
@@ -6749,7 +6749,7 @@ static void proxy_force_return(struct rq *rq, struct rq_flags *rf,
 	int cpu, wake_flag = WF_TTWU;
 
 	lockdep_assert_rq_held(rq);
-	WARN_ON(p == rq->curr);
+	WARN_ON_DEFERRED(p == rq->curr);
 
 	if (p == rq->donor)
 		proxy_resched_idle(rq);
@@ -6951,7 +6951,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 		 * guarantee its existence, as per ttwu_remote().
 		 */
 	}
-	WARN_ON_ONCE(owner && !owner->on_rq);
+	WARN_ON_ONCE_DEFERRED(owner && !owner->on_rq);
 	return owner;
 
 deactivate:
@@ -7631,8 +7631,8 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
 	 * real need to boost.
 	 */
 	if (unlikely(p == rq->idle)) {
-		WARN_ON(p != rq->curr);
-		WARN_ON(p->pi_blocked_on);
+		WARN_ON_DEFERRED(p != rq->curr);
+		WARN_ON_DEFERRED(p->pi_blocked_on);
 		goto out_unlock;
 	}
 
@@ -8463,7 +8463,7 @@ static void balance_push_set(int cpu, bool on)
 
 	rq_lock_irqsave(rq, &rf);
 	if (on) {
-		WARN_ON_ONCE(rq->balance_callback);
+		WARN_ON_ONCE_DEFERRED(rq->balance_callback);
 		rq->balance_callback = &balance_push_callback;
 	} else if (rq->balance_callback == &balance_push_callback) {
 		rq->balance_callback = NULL;
@@ -11150,7 +11150,7 @@ struct sched_change_ctx *sched_change_begin(struct task_struct *p, unsigned int
 	 * Must exclusively use matched flags since this is both dequeue and
 	 * enqueue.
 	 */
-	WARN_ON_ONCE(flags & 0xFFFF0000);
+	WARN_ON_ONCE_DEFERRED(flags & 0xFFFF0000);
 
 	lockdep_assert_rq_held(rq);
 
@@ -11198,7 +11198,7 @@ void sched_change_end(struct sched_change_ctx *ctx)
 	/*
 	 * Changing class without *QUEUE_CLASS is bad.
 	 */
-	WARN_ON_ONCE(p->sched_class != ctx->class && !(ctx->flags & ENQUEUE_CLASS));
+	WARN_ON_ONCE_DEFERRED(p->sched_class != ctx->class && !(ctx->flags & ENQUEUE_CLASS));
 
 	if ((ctx->flags & ENQUEUE_CLASS) && p->sched_class->switching_to)
 		p->sched_class->switching_to(rq, p);
diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
index 73b6b24269119..ec88ed7d8ee87 100644
--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -67,7 +67,7 @@ static unsigned long sched_core_update_cookie(struct task_struct *p,
 	 * a cookie until after we've removed it, we must have core scheduling
 	 * enabled here.
 	 */
-	WARN_ON_ONCE((p->core_cookie || cookie) && !sched_core_enabled(rq));
+	WARN_ON_ONCE_DEFERRED((p->core_cookie || cookie) && !sched_core_enabled(rq));
 
 	if (sched_core_enqueued(p))
 		sched_core_dequeue(rq, p, DEQUEUE_SAVE);
@@ -249,7 +249,7 @@ void __sched_core_account_forceidle(struct rq *rq)
 
 	lockdep_assert_rq_held(rq);
 
-	WARN_ON_ONCE(!rq->core->core_forceidle_count);
+	WARN_ON_ONCE_DEFERRED(!rq->core->core_forceidle_count);
 
 	if (rq->core->core_forceidle_start == 0)
 		return;
@@ -260,7 +260,7 @@ void __sched_core_account_forceidle(struct rq *rq)
 
 	rq->core->core_forceidle_start = now;
 
-	if (WARN_ON_ONCE(!rq->core->core_forceidle_occupation)) {
+	if (WARN_ON_ONCE_DEFERRED(!rq->core->core_forceidle_occupation)) {
 		/* can't be forced idle without a running task */
 	} else if (rq->core->core_forceidle_count > 1 ||
 		   rq->core->core_forceidle_occupation > 1) {
diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 0a2b7e30fd10c..e305a8e993e27 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -149,7 +149,7 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
 	} else {
 		int best_cpu = cpudl_maximum(cp);
 
-		WARN_ON(best_cpu != -1 && !cpu_present(best_cpu));
+		WARN_ON_DEFERRED(best_cpu != -1 && !cpu_present(best_cpu));
 
 		if (cpumask_test_cpu(best_cpu, &p->cpus_mask) &&
 		    dl_time_before(dl_se->deadline, cp->elements[0].dl)) {
@@ -177,7 +177,7 @@ void cpudl_clear(struct cpudl *cp, int cpu, bool online)
 	int old_idx, new_cpu;
 	unsigned long flags;
 
-	WARN_ON(!cpu_present(cpu));
+	WARN_ON_DEFERRED(!cpu_present(cpu));
 
 	raw_spin_lock_irqsave(&cp->lock, flags);
 
@@ -220,7 +220,7 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl)
 	int old_idx;
 	unsigned long flags;
 
-	WARN_ON(!cpu_present(cpu));
+	WARN_ON_DEFERRED(!cpu_present(cpu));
 
 	raw_spin_lock_irqsave(&cp->lock, flags);
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 7db4c87df83b0..863ac7509192f 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -217,8 +217,8 @@ void __add_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
 
 	lockdep_assert_rq_held(rq_of_dl_rq(dl_rq));
 	dl_rq->running_bw += dl_bw;
-	WARN_ON_ONCE(dl_rq->running_bw < old); /* overflow */
-	WARN_ON_ONCE(dl_rq->running_bw > dl_rq->this_bw);
+	WARN_ON_ONCE_DEFERRED(dl_rq->running_bw < old); /* overflow */
+	WARN_ON_ONCE_DEFERRED(dl_rq->running_bw > dl_rq->this_bw);
 	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
 	cpufreq_update_util(rq_of_dl_rq(dl_rq), 0);
 }
@@ -230,7 +230,7 @@ void __sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
 
 	lockdep_assert_rq_held(rq_of_dl_rq(dl_rq));
 	dl_rq->running_bw -= dl_bw;
-	WARN_ON_ONCE(dl_rq->running_bw > old); /* underflow */
+	WARN_ON_ONCE_DEFERRED(dl_rq->running_bw > old); /* underflow */
 	if (dl_rq->running_bw > old)
 		dl_rq->running_bw = 0;
 	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
@@ -244,7 +244,7 @@ void __add_rq_bw(u64 dl_bw, struct dl_rq *dl_rq)
 
 	lockdep_assert_rq_held(rq_of_dl_rq(dl_rq));
 	dl_rq->this_bw += dl_bw;
-	WARN_ON_ONCE(dl_rq->this_bw < old); /* overflow */
+	WARN_ON_ONCE_DEFERRED(dl_rq->this_bw < old); /* overflow */
 }
 
 static inline
@@ -254,10 +254,10 @@ void __sub_rq_bw(u64 dl_bw, struct dl_rq *dl_rq)
 
 	lockdep_assert_rq_held(rq_of_dl_rq(dl_rq));
 	dl_rq->this_bw -= dl_bw;
-	WARN_ON_ONCE(dl_rq->this_bw > old); /* underflow */
+	WARN_ON_ONCE_DEFERRED(dl_rq->this_bw > old); /* underflow */
 	if (dl_rq->this_bw > old)
 		dl_rq->this_bw = 0;
-	WARN_ON_ONCE(dl_rq->running_bw > dl_rq->this_bw);
+	WARN_ON_ONCE_DEFERRED(dl_rq->running_bw > dl_rq->this_bw);
 }
 
 static inline
@@ -335,7 +335,7 @@ void cancel_inactive_timer(struct sched_dl_entity *dl_se)
 
 static void dl_change_utilization(struct task_struct *p, u64 new_bw)
 {
-	WARN_ON_ONCE(p->dl.flags & SCHED_FLAG_SUGOV);
+	WARN_ON_ONCE_DEFERRED(p->dl.flags & SCHED_FLAG_SUGOV);
 
 	if (task_on_rq_queued(p))
 		return;
@@ -416,7 +416,7 @@ static void task_non_contending(struct sched_dl_entity *dl_se, bool dl_task)
 	if (dl_entity_is_special(dl_se))
 		return;
 
-	WARN_ON(dl_se->dl_non_contending);
+	WARN_ON_DEFERRED(dl_se->dl_non_contending);
 
 	zerolag_time = dl_se->deadline -
 		 div64_long((dl_se->runtime * dl_se->dl_period),
@@ -582,7 +582,7 @@ static void enqueue_pushable_dl_task(struct rq *rq, struct task_struct *p)
 {
 	struct rb_node *leftmost;
 
-	WARN_ON_ONCE(!RB_EMPTY_NODE(&p->pushable_dl_tasks));
+	WARN_ON_ONCE_DEFERRED(!RB_EMPTY_NODE(&p->pushable_dl_tasks));
 
 	leftmost = rb_add_cached(&p->pushable_dl_tasks,
 				 &rq->dl.pushable_dl_tasks_root,
@@ -664,7 +664,7 @@ static struct rq *dl_task_offline_migration(struct rq *rq, struct task_struct *p
 			 * Failed to find any suitable CPU.
 			 * The task will never come back!
 			 */
-			WARN_ON_ONCE(dl_bandwidth_enabled());
+			WARN_ON_ONCE_DEFERRED(dl_bandwidth_enabled());
 
 			/*
 			 * If admission control is disabled we
@@ -756,8 +756,8 @@ static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
 	struct rq *rq = rq_of_dl_rq(dl_rq);
 
-	WARN_ON(is_dl_boosted(dl_se));
-	WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
+	WARN_ON_DEFERRED(is_dl_boosted(dl_se));
+	WARN_ON_DEFERRED(dl_time_before(rq_clock(rq), dl_se->deadline));
 
 	/*
 	 * We are racing with the deadline timer. So, do nothing because
@@ -801,7 +801,7 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se)
 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
 	struct rq *rq = rq_of_dl_rq(dl_rq);
 
-	WARN_ON_ONCE(pi_of(dl_se)->dl_runtime <= 0);
+	WARN_ON_ONCE_DEFERRED(pi_of(dl_se)->dl_runtime <= 0);
 
 	/*
 	 * This could be the case for a !-dl task that is boosted.
@@ -975,7 +975,7 @@ update_dl_revised_wakeup(struct sched_dl_entity *dl_se, struct rq *rq)
 	 *
 	 * See update_dl_entity() comments for further details.
 	 */
-	WARN_ON(dl_time_before(dl_se->deadline, rq_clock(rq)));
+	WARN_ON_DEFERRED(dl_time_before(dl_se->deadline, rq_clock(rq)));
 
 	dl_se->runtime = (dl_se->dl_density * laxity) >> BW_SHIFT;
 }
@@ -1080,7 +1080,7 @@ static int start_dl_timer(struct sched_dl_entity *dl_se)
 	 * (current u > U).
 	 */
 	if (dl_se->dl_defer_armed) {
-		WARN_ON_ONCE(!dl_se->dl_throttled);
+		WARN_ON_ONCE_DEFERRED(!dl_se->dl_throttled);
 		act = ns_to_ktime(dl_se->deadline - dl_se->runtime);
 	} else {
 		/* act = deadline - rel-deadline + period */
@@ -1451,7 +1451,7 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
 		/*
 		 * Non-servers would never get time accounted while throttled.
 		 */
-		WARN_ON_ONCE(!dl_server(dl_se));
+		WARN_ON_ONCE_DEFERRED(!dl_server(dl_se));
 
 		/*
 		 * While the server is marked idle, do not push out the
@@ -1492,7 +1492,7 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
 		 * and queue right away. Otherwise nothing might queue it. That's similar
 		 * to what enqueue_dl_entity() does on start_dl_timer==0. For now, just warn.
 		 */
-		WARN_ON_ONCE(!start_dl_timer(dl_se));
+		WARN_ON_ONCE_DEFERRED(!start_dl_timer(dl_se));
 
 		return;
 	}
@@ -1801,7 +1801,7 @@ void dl_server_start(struct sched_dl_entity *dl_se)
 	 */
 	rq->donor->sched_class->update_curr(rq);
 
-	if (WARN_ON_ONCE(!cpu_online(cpu_of(rq))))
+	if (WARN_ON_ONCE_DEFERRED(!cpu_online(cpu_of(rq))))
 		return;
 
 	trace_sched_dl_server_start_tp(dl_se, cpu_of(rq), dl_get_type(dl_se, rq));
@@ -2073,7 +2073,7 @@ void inc_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
 static inline
 void dec_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
 {
-	WARN_ON(!dl_rq->dl_nr_running);
+	WARN_ON_DEFERRED(!dl_rq->dl_nr_running);
 	dl_rq->dl_nr_running--;
 
 	if (!dl_server(dl_se))
@@ -2165,7 +2165,7 @@ static void __enqueue_dl_entity(struct sched_dl_entity *dl_se)
 {
 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
 
-	WARN_ON_ONCE(!RB_EMPTY_NODE(&dl_se->rb_node));
+	WARN_ON_ONCE_DEFERRED(!RB_EMPTY_NODE(&dl_se->rb_node));
 
 	rb_add_cached(&dl_se->rb_node, &dl_rq->root, __dl_less);
 
@@ -2189,7 +2189,7 @@ static void __dequeue_dl_entity(struct sched_dl_entity *dl_se)
 static void
 enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
 {
-	WARN_ON_ONCE(on_dl_rq(dl_se));
+	WARN_ON_ONCE_DEFERRED(on_dl_rq(dl_se));
 
 	update_stats_enqueue_dl(dl_rq_of_se(dl_se), dl_se, flags);
 
@@ -2611,7 +2611,7 @@ static struct task_struct *__pick_task_dl(struct rq *rq, struct rq_flags *rf)
 		return NULL;
 
 	dl_se = pick_next_dl_entity(dl_rq);
-	WARN_ON_ONCE(!dl_se);
+	WARN_ON_ONCE_DEFERRED(!dl_se);
 
 	if (dl_server(dl_se)) {
 		p = dl_se->server_pick_task(dl_se, rf);
@@ -2823,12 +2823,12 @@ static struct task_struct *pick_next_pushable_dl_task(struct rq *rq)
 	if (!p)
 		return NULL;
 
-	WARN_ON_ONCE(rq->cpu != task_cpu(p));
-	WARN_ON_ONCE(task_current(rq, p));
-	WARN_ON_ONCE(p->nr_cpus_allowed <= 1);
+	WARN_ON_ONCE_DEFERRED(rq->cpu != task_cpu(p));
+	WARN_ON_ONCE_DEFERRED(task_current(rq, p));
+	WARN_ON_ONCE_DEFERRED(p->nr_cpus_allowed <= 1);
 
-	WARN_ON_ONCE(!task_on_rq_queued(p));
-	WARN_ON_ONCE(!dl_task(p));
+	WARN_ON_ONCE_DEFERRED(!task_on_rq_queued(p));
+	WARN_ON_ONCE_DEFERRED(!dl_task(p));
 
 	return p;
 }
@@ -2944,7 +2944,7 @@ static int push_dl_task(struct rq *rq)
 	if (is_migration_disabled(next_task))
 		return 0;
 
-	if (WARN_ON(next_task == rq->curr))
+	if (WARN_ON_DEFERRED(next_task == rq->curr))
 		return 0;
 
 	/* We might release rq lock */
@@ -3050,8 +3050,8 @@ static void pull_dl_task(struct rq *this_rq)
 		 */
 		if (p && dl_time_before(p->dl.deadline, dmin) &&
 		    dl_task_is_earliest_deadline(p, this_rq)) {
-			WARN_ON(p == src_rq->curr);
-			WARN_ON(!task_on_rq_queued(p));
+			WARN_ON_DEFERRED(p == src_rq->curr);
+			WARN_ON_DEFERRED(!task_on_rq_queued(p));
 
 			/*
 			 * Then we pull iff p has actually an earlier
@@ -3109,7 +3109,7 @@ static void set_cpus_allowed_dl(struct task_struct *p,
 {
 	struct rq *rq;
 
-	WARN_ON_ONCE(!dl_task(p));
+	WARN_ON_ONCE_DEFERRED(!dl_task(p));
 
 	rq = task_rq(p);
 	/*
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 5d2d19473a82e..47d3a4c16455a 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -512,12 +512,12 @@ do {										\
  * So if kf_tasks[] is set, @p's scheduler-protected fields are stable.
  *
  * kf_tasks[] can not stack, so task-based SCX ops must not nest. The
- * WARN_ON_ONCE() in each macro catches a re-entry of any of the three variants
- * while a previous one is still in progress.
+ * WARN_ON_ONCE_DEFERRED() in each macro catches a re-entry of any of the three
+ * variants while a previous one is still in progress.
  */
 #define SCX_CALL_OP_TASK(sch, op, locked_rq, task, args...)			\
 do {										\
-	WARN_ON_ONCE(current->scx.kf_tasks[0]);					\
+	WARN_ON_ONCE_DEFERRED(current->scx.kf_tasks[0]);			\
 	current->scx.kf_tasks[0] = task;					\
 	SCX_CALL_OP((sch), op, locked_rq, task, ##args);			\
 	current->scx.kf_tasks[0] = NULL;					\
@@ -526,7 +526,7 @@ do {										\
 #define SCX_CALL_OP_TASK_RET(sch, op, locked_rq, task, args...)			\
 ({										\
 	__typeof__((sch)->ops.op(task, ##args)) __ret;				\
-	WARN_ON_ONCE(current->scx.kf_tasks[0]);					\
+	WARN_ON_ONCE_DEFERRED(current->scx.kf_tasks[0]);			\
 	current->scx.kf_tasks[0] = task;					\
 	__ret = SCX_CALL_OP_RET((sch), op, locked_rq, task, ##args);		\
 	current->scx.kf_tasks[0] = NULL;					\
@@ -536,7 +536,7 @@ do {										\
 #define SCX_CALL_OP_2TASKS_RET(sch, op, locked_rq, task0, task1, args...)	\
 ({										\
 	__typeof__((sch)->ops.op(task0, task1, ##args)) __ret;			\
-	WARN_ON_ONCE(current->scx.kf_tasks[0]);					\
+	WARN_ON_ONCE_DEFERRED(current->scx.kf_tasks[0]);			\
 	current->scx.kf_tasks[0] = task0;					\
 	current->scx.kf_tasks[1] = task1;					\
 	__ret = SCX_CALL_OP_RET((sch), op, locked_rq, task0, task1, ##args);	\
@@ -687,7 +687,7 @@ static bool nldsq_cursor_lost_task(struct scx_dsq_list_node *cursor,
 		return true;
 
 	/* if @p has stayed on @dsq, its rq couldn't have changed */
-	if (WARN_ON_ONCE(rq != task_rq(p)))
+	if (WARN_ON_ONCE_DEFERRED(rq != task_rq(p)))
 		return true;
 
 	return false;
@@ -1282,7 +1282,7 @@ static void schedule_reenq_local(struct rq *rq, u64 reenq_flags)
 {
 	struct scx_sched *root = rcu_dereference_sched(scx_root);
 
-	if (WARN_ON_ONCE(!root))
+	if (WARN_ON_ONCE_DEFERRED(!root))
 		return;
 
 	schedule_dsq_reenq(root, &rq->scx.local_dsq, reenq_flags, rq);
@@ -1379,7 +1379,7 @@ static void dsq_inc_nr(struct scx_dispatch_q *dsq, struct task_struct *p, u64 en
 	 */
 	if (enq_flags & SCX_ENQ_IMMED) {
 		if (unlikely(dsq->id != SCX_DSQ_LOCAL)) {
-			WARN_ON_ONCE(!(enq_flags & SCX_ENQ_GDSQ_FALLBACK));
+			WARN_ON_ONCE_DEFERRED(!(enq_flags & SCX_ENQ_GDSQ_FALLBACK));
 			return;
 		}
 		p->scx.flags |= SCX_TASK_IMMED;
@@ -1388,7 +1388,7 @@ static void dsq_inc_nr(struct scx_dispatch_q *dsq, struct task_struct *p, u64 en
 	if (p->scx.flags & SCX_TASK_IMMED) {
 		struct rq *rq = container_of(dsq, struct rq, scx.local_dsq);
 
-		if (WARN_ON_ONCE(dsq->id != SCX_DSQ_LOCAL))
+		if (WARN_ON_ONCE_DEFERRED(dsq->id != SCX_DSQ_LOCAL))
 			return;
 
 		rq->scx.nr_immed++;
@@ -1410,8 +1410,8 @@ static void dsq_dec_nr(struct scx_dispatch_q *dsq, struct task_struct *p)
 	if (p->scx.flags & SCX_TASK_IMMED) {
 		struct rq *rq = container_of(dsq, struct rq, scx.local_dsq);
 
-		if (WARN_ON_ONCE(dsq->id != SCX_DSQ_LOCAL) ||
-		    WARN_ON_ONCE(rq->scx.nr_immed <= 0))
+		if (WARN_ON_ONCE_DEFERRED(dsq->id != SCX_DSQ_LOCAL) ||
+		    WARN_ON_ONCE_DEFERRED(rq->scx.nr_immed <= 0))
 			return;
 
 		rq->scx.nr_immed--;
@@ -1521,9 +1521,9 @@ static void dispatch_enqueue(struct scx_sched *sch, struct rq *rq,
 {
 	bool is_local = dsq->id == SCX_DSQ_LOCAL;
 
-	WARN_ON_ONCE(p->scx.dsq || !list_empty(&p->scx.dsq_list.node));
-	WARN_ON_ONCE((p->scx.dsq_flags & SCX_TASK_DSQ_ON_PRIQ) ||
-		     !RB_EMPTY_NODE(&p->scx.dsq_priq));
+	WARN_ON_ONCE_DEFERRED(p->scx.dsq || !list_empty(&p->scx.dsq_list.node));
+	WARN_ON_ONCE_DEFERRED((p->scx.dsq_flags & SCX_TASK_DSQ_ON_PRIQ) ||
+			      !RB_EMPTY_NODE(&p->scx.dsq_priq));
 
 	if (!is_local) {
 		raw_spin_lock_nested(&dsq->lock,
@@ -1646,7 +1646,7 @@ static void dispatch_enqueue(struct scx_sched *sch, struct rq *rq,
 static void task_unlink_from_dsq(struct task_struct *p,
 				 struct scx_dispatch_q *dsq)
 {
-	WARN_ON_ONCE(list_empty(&p->scx.dsq_list.node));
+	WARN_ON_ONCE_DEFERRED(list_empty(&p->scx.dsq_list.node));
 
 	if (p->scx.dsq_flags & SCX_TASK_DSQ_ON_PRIQ) {
 		rb_erase(&p->scx.dsq_priq, &dsq->priq);
@@ -1709,7 +1709,7 @@ static void dispatch_dequeue(struct rq *rq, struct task_struct *p)
 		 * holding_cpu which tells dispatch_to_local_dsq() that it lost
 		 * the race.
 		 */
-		WARN_ON_ONCE(!list_empty(&p->scx.dsq_list.node));
+		WARN_ON_ONCE_DEFERRED(!list_empty(&p->scx.dsq_list.node));
 		p->scx.holding_cpu = -1;
 	}
 	p->scx.dsq = NULL;
@@ -1787,8 +1787,8 @@ static void mark_direct_dispatch(struct scx_sched *sch,
 		return;
 	}
 
-	WARN_ON_ONCE(p->scx.ddsp_dsq_id != SCX_DSQ_INVALID);
-	WARN_ON_ONCE(p->scx.ddsp_enq_flags);
+	WARN_ON_ONCE_DEFERRED(p->scx.ddsp_dsq_id != SCX_DSQ_INVALID);
+	WARN_ON_ONCE_DEFERRED(p->scx.ddsp_enq_flags);
 
 	p->scx.ddsp_dsq_id = dsq_id;
 	p->scx.ddsp_enq_flags = enq_flags;
@@ -1855,7 +1855,7 @@ static void direct_dispatch(struct scx_sched *sch, struct task_struct *p,
 			break;
 		}
 
-		WARN_ON_ONCE(p->scx.dsq || !list_empty(&p->scx.dsq_list.node));
+		WARN_ON_ONCE_DEFERRED(p->scx.dsq || !list_empty(&p->scx.dsq_list.node));
 		list_add_tail(&p->scx.dsq_list.node,
 			      &rq->scx.ddsp_deferred_locals);
 		schedule_deferred_locked(rq);
@@ -1888,7 +1888,7 @@ static void do_enqueue_task(struct rq *rq, struct task_struct *p, u64 enq_flags,
 	struct scx_dispatch_q *dsq;
 	unsigned long qseq;
 
-	WARN_ON_ONCE(!(p->scx.flags & SCX_TASK_QUEUED));
+	WARN_ON_ONCE_DEFERRED(!(p->scx.flags & SCX_TASK_QUEUED));
 
 	/* internal movements - rq migration / RESTORE */
 	if (sticky_cpu == cpu_of(rq))
@@ -1938,11 +1938,11 @@ static void do_enqueue_task(struct rq *rq, struct task_struct *p, u64 enq_flags,
 	/* DSQ bypass didn't trigger, enqueue on the BPF scheduler */
 	qseq = rq->scx.ops_qseq++ << SCX_OPSS_QSEQ_SHIFT;
 
-	WARN_ON_ONCE(atomic_long_read(&p->scx.ops_state) != SCX_OPSS_NONE);
+	WARN_ON_ONCE_DEFERRED(atomic_long_read(&p->scx.ops_state) != SCX_OPSS_NONE);
 	atomic_long_set(&p->scx.ops_state, SCX_OPSS_QUEUEING | qseq);
 
 	ddsp_taskp = this_cpu_ptr(&direct_dispatch_task);
-	WARN_ON_ONCE(*ddsp_taskp);
+	WARN_ON_ONCE_DEFERRED(*ddsp_taskp);
 	*ddsp_taskp = p;
 
 	SCX_CALL_OP_TASK(sch, enqueue, rq, p, enq_flags);
@@ -2039,7 +2039,7 @@ static void enqueue_task_scx(struct rq *rq, struct task_struct *p, int core_enq_
 		sticky_cpu = cpu_of(rq);
 
 	if (p->scx.flags & SCX_TASK_QUEUED) {
-		WARN_ON_ONCE(!task_runnable(p));
+		WARN_ON_ONCE_DEFERRED(!task_runnable(p));
 		goto out;
 	}
 
@@ -2159,7 +2159,7 @@ static bool dequeue_task_scx(struct rq *rq, struct task_struct *p, int core_deq_
 		deq_flags |= SCX_DEQ_SCHED_CHANGE;
 
 	if (!(p->scx.flags & SCX_TASK_QUEUED)) {
-		WARN_ON_ONCE(task_runnable(p));
+		WARN_ON_ONCE_DEFERRED(task_runnable(p));
 		return true;
 	}
 
@@ -2256,7 +2256,7 @@ static void move_local_task_to_local_dsq(struct scx_sched *sch,
 	lockdep_assert_held(&src_dsq->lock);
 	lockdep_assert_rq_held(dst_rq);
 
-	WARN_ON_ONCE(p->scx.holding_cpu >= 0);
+	WARN_ON_ONCE_DEFERRED(p->scx.holding_cpu >= 0);
 
 	if (enq_flags & (SCX_ENQ_HEAD | SCX_ENQ_PREEMPT))
 		list_add(&p->scx.dsq_list.node, &dst_dsq->list);
@@ -2299,8 +2299,8 @@ static void move_remote_task_to_local_dsq(struct task_struct *p, u64 enq_flags,
 	 * truncate the upper 32 bit. As we own @rq, we can pass them through
 	 * @rq->scx.extra_enq_flags instead.
 	 */
-	WARN_ON_ONCE(!cpumask_test_cpu(cpu_of(dst_rq), p->cpus_ptr));
-	WARN_ON_ONCE(dst_rq->scx.extra_enq_flags);
+	WARN_ON_ONCE_DEFERRED(!cpumask_test_cpu(cpu_of(dst_rq), p->cpus_ptr));
+	WARN_ON_ONCE_DEFERRED(dst_rq->scx.extra_enq_flags);
 	dst_rq->scx.extra_enq_flags = enq_flags;
 	activate_task(dst_rq, p, 0);
 	dst_rq->scx.extra_enq_flags = 0;
@@ -2331,7 +2331,7 @@ static bool task_can_run_on_remote_rq(struct scx_sched *sch,
 {
 	s32 cpu = cpu_of(rq);
 
-	WARN_ON_ONCE(task_cpu(p) == cpu);
+	WARN_ON_ONCE_DEFERRED(task_cpu(p) == cpu);
 
 	/*
 	 * If @p has migration disabled, @p->cpus_ptr is updated to contain only
@@ -2411,7 +2411,7 @@ static bool unlink_dsq_and_lock_src_rq(struct task_struct *p,
 
 	lockdep_assert_held(&dsq->lock);
 
-	WARN_ON_ONCE(p->scx.holding_cpu >= 0);
+	WARN_ON_ONCE_DEFERRED(p->scx.holding_cpu >= 0);
 	task_unlink_from_dsq(p, dsq);
 	p->scx.holding_cpu = cpu;
 
@@ -2420,7 +2420,7 @@ static bool unlink_dsq_and_lock_src_rq(struct task_struct *p,
 
 	/* task_rq couldn't have changed if we're still the holding cpu */
 	return likely(p->scx.holding_cpu == cpu) &&
-		!WARN_ON_ONCE(src_rq != task_rq(p));
+		!WARN_ON_ONCE_DEFERRED(src_rq != task_rq(p));
 }
 
 static bool consume_remote_task(struct rq *this_rq,
@@ -2630,7 +2630,7 @@ static void dispatch_to_local_dsq(struct scx_sched *sch, struct rq *rq,
 
 	/* task_rq couldn't have changed if we're still the holding cpu */
 	if (likely(p->scx.holding_cpu == raw_smp_processor_id()) &&
-	    !WARN_ON_ONCE(src_rq != task_rq(p))) {
+	    !WARN_ON_ONCE_DEFERRED(src_rq != task_rq(p))) {
 		/*
 		 * If @p is staying on the same rq, there's no need to go
 		 * through the full deactivate/activate cycle. Optimize by
@@ -3099,7 +3099,7 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
 		 * which should trigger an explicit follow-up scheduling event.
 		 */
 		if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
-			WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
+			WARN_ON_ONCE_DEFERRED(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
 			do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
 		} else {
 			do_enqueue_task(rq, p, 0, -1);
@@ -3201,7 +3201,7 @@ do_pick_task_scx(struct rq *rq, struct rq_flags *rf, bool force_scx)
 	keep_prev = rq->scx.flags & SCX_RQ_BAL_KEEP;
 	if (unlikely(keep_prev &&
 		     prev->sched_class != &ext_sched_class)) {
-		WARN_ON_ONCE(scx_enable_state() == SCX_ENABLED);
+		WARN_ON_ONCE_DEFERRED(scx_enable_state() == SCX_ENABLED);
 		keep_prev = false;
 	}
 
@@ -3332,7 +3332,7 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
 		struct task_struct **ddsp_taskp;
 
 		ddsp_taskp = this_cpu_ptr(&direct_dispatch_task);
-		WARN_ON_ONCE(*ddsp_taskp);
+		WARN_ON_ONCE_DEFERRED(*ddsp_taskp);
 		*ddsp_taskp = p;
 
 		this_rq()->scx.in_select_cpu = true;
@@ -3620,7 +3620,7 @@ static void __scx_enable_task(struct scx_sched *sch, struct task_struct *p)
 	 * transitions are consistent, the flag should always be clear
 	 * here.
 	 */
-	WARN_ON_ONCE(p->scx.flags & SCX_TASK_IN_CUSTODY);
+	WARN_ON_ONCE_DEFERRED(p->scx.flags & SCX_TASK_IN_CUSTODY);
 
 	/*
 	 * Set the weight before calling ops.enable() so that the scheduler
@@ -3651,7 +3651,7 @@ static void scx_disable_task(struct scx_sched *sch, struct task_struct *p)
 	struct rq *rq = task_rq(p);
 
 	lockdep_assert_rq_held(rq);
-	WARN_ON_ONCE(scx_get_task_state(p) != SCX_TASK_ENABLED);
+	WARN_ON_ONCE_DEFERRED(scx_get_task_state(p) != SCX_TASK_ENABLED);
 
 	clear_direct_dispatch(p);
 
@@ -3664,7 +3664,7 @@ static void scx_disable_task(struct scx_sched *sch, struct task_struct *p)
 	 * transitions are consistent, the flag should always be clear
 	 * here.
 	 */
-	WARN_ON_ONCE(p->scx.flags & SCX_TASK_IN_CUSTODY);
+	WARN_ON_ONCE_DEFERRED(p->scx.flags & SCX_TASK_IN_CUSTODY);
 }
 
 static void __scx_disable_and_exit_task(struct scx_sched *sch,
@@ -3689,7 +3689,7 @@ static void __scx_disable_and_exit_task(struct scx_sched *sch,
 		scx_disable_task(sch, p);
 		break;
 	default:
-		WARN_ON_ONCE(true);
+		WARN_ON_ONCE_DEFERRED(true);
 		return;
 	}
 
@@ -3726,7 +3726,7 @@ static void scx_disable_and_exit_task(struct scx_sched *sch,
 	 * path, so it's always clear when @p arrives here in %SCX_TASK_NONE.
 	 */
 	if (p->scx.flags & SCX_TASK_SUB_INIT) {
-		if (!WARN_ON_ONCE(!scx_enabling_sub_sched))
+		if (!WARN_ON_ONCE_DEFERRED(!scx_enabling_sub_sched))
 			scx_sub_init_cancel_task(scx_enabling_sub_sched, p);
 		p->scx.flags &= ~SCX_TASK_SUB_INIT;
 	}
@@ -3818,7 +3818,7 @@ void scx_cancel_fork(struct task_struct *p)
 		struct rq_flags rf;
 
 		rq = task_rq_lock(p, &rf);
-		WARN_ON_ONCE(scx_get_task_state(p) >= SCX_TASK_READY);
+		WARN_ON_ONCE_DEFERRED(scx_get_task_state(p) >= SCX_TASK_READY);
 		scx_disable_and_exit_task(scx_task_sched(p), p);
 		task_rq_unlock(rq, p, &rf);
 	}
@@ -3986,7 +3986,7 @@ static void process_ddsp_deferred_locals(struct rq *rq)
 		clear_direct_dispatch(p);
 
 		dsq = find_dsq_for_dispatch(sch, rq, dsq_id, task_cpu(p));
-		if (!WARN_ON_ONCE(dsq->id != SCX_DSQ_LOCAL))
+		if (!WARN_ON_ONCE_DEFERRED(dsq->id != SCX_DSQ_LOCAL))
 			dispatch_to_local_dsq(sch, rq, dsq, p, enq_flags);
 	}
 }
@@ -4041,7 +4041,7 @@ static u32 reenq_local(struct scx_sched *sch, struct rq *rq, u64 reenq_flags)
 
 	lockdep_assert_rq_held(rq);
 
-	if (WARN_ON_ONCE(reenq_flags & __SCX_REENQ_TSR_MASK))
+	if (WARN_ON_ONCE_DEFERRED(reenq_flags & __SCX_REENQ_TSR_MASK))
 		reenq_flags &= ~__SCX_REENQ_TSR_MASK;
 	if (rq_is_open(rq, 0))
 		reenq_flags |= SCX_REENQ_TSR_RQ_OPEN;
@@ -4078,7 +4078,7 @@ static u32 reenq_local(struct scx_sched *sch, struct rq *rq, u64 reenq_flags)
 
 		dispatch_dequeue(rq, p);
 
-		if (WARN_ON_ONCE(p->scx.flags & SCX_TASK_REENQ_REASON_MASK))
+		if (WARN_ON_ONCE_DEFERRED(p->scx.flags & SCX_TASK_REENQ_REASON_MASK))
 			p->scx.flags &= ~SCX_TASK_REENQ_REASON_MASK;
 		p->scx.flags |= reason;
 
@@ -4199,7 +4199,7 @@ static void reenq_user(struct rq *rq, struct scx_dispatch_q *dsq, u64 reenq_flag
 		dispatch_dequeue_locked(p, dsq);
 		raw_spin_unlock(&dsq->lock);
 
-		if (WARN_ON_ONCE(p->scx.flags & SCX_TASK_REENQ_REASON_MASK))
+		if (WARN_ON_ONCE_DEFERRED(p->scx.flags & SCX_TASK_REENQ_REASON_MASK))
 			p->scx.flags &= ~SCX_TASK_REENQ_REASON_MASK;
 		p->scx.flags |= reason;
 
@@ -4360,7 +4360,7 @@ int scx_cgroup_can_attach(struct cgroup_taskset *tset)
 		struct cgroup *from = tg_cgrp(task_group(p));
 		struct cgroup *to = tg_cgrp(css_tg(css));
 
-		WARN_ON_ONCE(p->scx.cgrp_moving_from);
+		WARN_ON_ONCE_DEFERRED(p->scx.cgrp_moving_from);
 
 		/*
 		 * sched_move_task() omits identity migrations. Let's match the
@@ -4617,7 +4617,7 @@ static void exit_dsq(struct scx_dispatch_q *dsq)
 		 * There must have been a RCU grace period since the last
 		 * insertion and @dsq should be off the deferred list by now.
 		 */
-		if (WARN_ON_ONCE(!list_empty(&dru->node))) {
+		if (WARN_ON_ONCE_DEFERRED(!list_empty(&dru->node))) {
 			guard(raw_spinlock_irqsave)(&rq->scx.deferred_reenq_lock);
 			list_del_init(&dru->node);
 		}
@@ -4745,7 +4745,7 @@ static int scx_cgroup_init(struct scx_sched *sch)
 		tg->scx.flags |= SCX_TG_INITED;
 	}
 
-	WARN_ON_ONCE(scx_cgroup_enabled);
+	WARN_ON_ONCE_DEFERRED(scx_cgroup_enabled);
 	scx_cgroup_enabled = true;
 
 	return 0;
@@ -4848,7 +4848,7 @@ static void scx_sched_free_rcu_work(struct work_struct *work)
 		 * period. As that blocks new deferrals, all
 		 * deferred_reenq_local_node's must be off-list by now.
 		 */
-		WARN_ON_ONCE(!list_empty(&pcpu->deferred_reenq_local.node));
+		WARN_ON_ONCE_DEFERRED(!list_empty(&pcpu->deferred_reenq_local.node));
 
 		exit_dsq(bypass_dsq(sch, cpu));
 	}
@@ -5324,7 +5324,7 @@ static bool inc_bypass_depth(struct scx_sched *sch)
 {
 	lockdep_assert_held(&scx_bypass_lock);
 
-	WARN_ON_ONCE(sch->bypass_depth < 0);
+	WARN_ON_ONCE_DEFERRED(sch->bypass_depth < 0);
 	WRITE_ONCE(sch->bypass_depth, sch->bypass_depth + 1);
 	if (sch->bypass_depth != 1)
 		return false;
@@ -5339,7 +5339,7 @@ static bool dec_bypass_depth(struct scx_sched *sch)
 {
 	lockdep_assert_held(&scx_bypass_lock);
 
-	WARN_ON_ONCE(sch->bypass_depth < 1);
+	WARN_ON_ONCE_DEFERRED(sch->bypass_depth < 1);
 	WRITE_ONCE(sch->bypass_depth, sch->bypass_depth - 1);
 	if (sch->bypass_depth != 0)
 		return false;
@@ -5360,7 +5360,7 @@ static void enable_bypass_dsp(struct scx_sched *sch)
 	 * @sch->bypass_depth transitioning from 0 to 1 triggers enabling.
 	 * Shouldn't stagger.
 	 */
-	if (WARN_ON_ONCE(test_and_set_bit(0, &sch->bypass_dsp_claim)))
+	if (WARN_ON_ONCE_DEFERRED(test_and_set_bit(0, &sch->bypass_dsp_claim)))
 		return;
 
 	/*
@@ -5380,11 +5380,11 @@ static void enable_bypass_dsp(struct scx_sched *sch)
 	 * Bump enable depth on both @sch and bypass dispatch host.
 	 */
 	ret = atomic_inc_return(&sch->bypass_dsp_enable_depth);
-	WARN_ON_ONCE(ret <= 0);
+	WARN_ON_ONCE_DEFERRED(ret <= 0);
 
 	if (host != sch) {
 		ret = atomic_inc_return(&host->bypass_dsp_enable_depth);
-		WARN_ON_ONCE(ret <= 0);
+		WARN_ON_ONCE_DEFERRED(ret <= 0);
 	}
 
 	/*
@@ -5405,11 +5405,11 @@ static void disable_bypass_dsp(struct scx_sched *sch)
 		return;
 
 	ret = atomic_dec_return(&sch->bypass_dsp_enable_depth);
-	WARN_ON_ONCE(ret < 0);
+	WARN_ON_ONCE_DEFERRED(ret < 0);
 
 	if (scx_parent(sch)) {
 		ret = atomic_dec_return(&scx_parent(sch)->bypass_dsp_enable_depth);
-		WARN_ON_ONCE(ret < 0);
+		WARN_ON_ONCE_DEFERRED(ret < 0);
 	}
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ebec186f9823..1213e77665fe9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -404,7 +404,7 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 
 static inline void assert_list_leaf_cfs_rq(struct rq *rq)
 {
-	WARN_ON_ONCE(rq->tmp_alone_branch != &rq->leaf_cfs_rq_list);
+	WARN_ON_ONCE_DEFERRED(rq->tmp_alone_branch != &rq->leaf_cfs_rq_list);
 }
 
 /* Iterate through all leaf cfs_rq's on a runqueue */
@@ -689,7 +689,7 @@ __sum_w_vruntime_add(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	s64 w_vruntime, key = entity_key(cfs_rq, se);
 
 	w_vruntime = key * weight;
-	WARN_ON_ONCE((w_vruntime >> 63) != (w_vruntime >> 62));
+	WARN_ON_ONCE_DEFERRED((w_vruntime >> 63) != (w_vruntime >> 62));
 
 	cfs_rq->sum_w_vruntime += w_vruntime;
 	cfs_rq->sum_weight += weight;
@@ -861,7 +861,7 @@ bool update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	u64 avruntime = avg_vruntime(cfs_rq);
 	s64 vlag = entity_lag(cfs_rq, se, avruntime);
 
-	WARN_ON_ONCE(!se->on_rq);
+	WARN_ON_ONCE_DEFERRED(!se->on_rq);
 
 	if (se->sched_delayed) {
 		/* previous vlag < 0 otherwise se would not be delayed */
@@ -1153,7 +1153,7 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq, bool protect)
 	if (sched_feat(PICK_BUDDY) && protect &&
 	    cfs_rq->next && entity_eligible(cfs_rq, cfs_rq->next)) {
 		/* ->next will never be delayed */
-		WARN_ON_ONCE(cfs_rq->next->sched_delayed);
+		WARN_ON_ONCE_DEFERRED(cfs_rq->next->sched_delayed);
 		return cfs_rq->next;
 	}
 
@@ -4302,9 +4302,9 @@ static inline bool load_avg_is_decayed(struct sched_avg *sa)
 	 * Make sure that rounding and/or propagation of PELT values never
 	 * break this.
 	 */
-	WARN_ON_ONCE(sa->load_avg ||
-		      sa->util_avg ||
-		      sa->runnable_avg);
+	WARN_ON_ONCE_DEFERRED(sa->load_avg ||
+			      sa->util_avg ||
+			      sa->runnable_avg);
 
 	return true;
 }
@@ -5460,7 +5460,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
 		weight = avg_vruntime_weight(cfs_rq, se->load.weight);
 		lag *= load + weight;
-		if (WARN_ON_ONCE(!load))
+		if (WARN_ON_ONCE_DEFERRED(!load))
 			load = 1;
 		lag = div64_long(lag, load);
 
@@ -5653,7 +5653,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	clear_buddies(cfs_rq, se);
 
 	if (flags & DEQUEUE_DELAYED) {
-		WARN_ON_ONCE(!se->sched_delayed);
+		WARN_ON_ONCE_DEFERRED(!se->sched_delayed);
 	} else {
 		bool delay = sleep;
 		/*
@@ -5663,7 +5663,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 		if (flags & (DEQUEUE_SPECIAL | DEQUEUE_THROTTLE))
 			delay = false;
 
-		WARN_ON_ONCE(delay && se->sched_delayed);
+		WARN_ON_ONCE_DEFERRED(delay && se->sched_delayed);
 
 		if (sched_feat(DELAY_DEQUEUE) && delay &&
 		    !entity_eligible(cfs_rq, se)) {
@@ -5747,7 +5747,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, bool first)
 	}
 
 	update_stats_curr_start(cfs_rq, se);
-	WARN_ON_ONCE(cfs_rq->curr);
+	WARN_ON_ONCE_DEFERRED(cfs_rq->curr);
 	cfs_rq->curr = se;
 
 	/*
@@ -5814,7 +5814,7 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 		/* in !on_rq case, update occurred at dequeue */
 		update_load_avg(cfs_rq, prev, 0);
 	}
-	WARN_ON_ONCE(cfs_rq->curr != prev);
+	WARN_ON_ONCE_DEFERRED(cfs_rq->curr != prev);
 	cfs_rq->curr = NULL;
 }
 
@@ -6015,7 +6015,7 @@ static void throttle_cfs_rq_work(struct callback_head *work)
 	struct cfs_rq *cfs_rq;
 	struct rq *rq;
 
-	WARN_ON_ONCE(p != current);
+	WARN_ON_ONCE_DEFERRED(p != current);
 	p->sched_throttle_work.next = &p->sched_throttle_work;
 
 	/*
@@ -6041,7 +6041,7 @@ static void throttle_cfs_rq_work(struct callback_head *work)
 			return;
 		rq = scope.rq;
 		update_rq_clock(rq);
-		WARN_ON_ONCE(p->throttled || !list_empty(&p->throttle_node));
+		WARN_ON_ONCE_DEFERRED(p->throttled || !list_empty(&p->throttle_node));
 		dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_THROTTLE);
 		list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
 		/*
@@ -6072,7 +6072,7 @@ void init_cfs_throttle_work(struct task_struct *p)
 static void detach_task_cfs_rq(struct task_struct *p);
 static void dequeue_throttled_task(struct task_struct *p, int flags)
 {
-	WARN_ON_ONCE(p->se.on_rq);
+	WARN_ON_ONCE_DEFERRED(p->se.on_rq);
 	list_del_init(&p->throttle_node);
 
 	/* task blocked after throttled */
@@ -6094,7 +6094,7 @@ static bool enqueue_throttled_task(struct task_struct *p)
 	struct cfs_rq *cfs_rq = cfs_rq_of(&p->se);
 
 	/* @p should have gone through dequeue_throttled_task() first */
-	WARN_ON_ONCE(!list_empty(&p->throttle_node));
+	WARN_ON_ONCE_DEFERRED(!list_empty(&p->throttle_node));
 
 	/*
 	 * If the throttled task @p is enqueued to a throttled cfs_rq,
@@ -6162,7 +6162,7 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
 
 		cfs_rq->throttled_clock_self = 0;
 
-		if (WARN_ON_ONCE((s64)delta < 0))
+		if (WARN_ON_ONCE_DEFERRED((s64)delta < 0))
 			delta = 0;
 
 		cfs_rq->throttled_clock_self_time += delta;
@@ -6231,8 +6231,8 @@ static int tg_throttle_down(struct task_group *tg, void *data)
 		cfs_rq->pelt_clock_throttled = 1;
 	}
 
-	WARN_ON_ONCE(cfs_rq->throttled_clock_self);
-	WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_limbo_list));
+	WARN_ON_ONCE_DEFERRED(cfs_rq->throttled_clock_self);
+	WARN_ON_ONCE_DEFERRED(!list_empty(&cfs_rq->throttled_limbo_list));
 	return 0;
 }
 
@@ -6273,7 +6273,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	 * throttled-list.  rq->lock protects completion.
 	 */
 	cfs_rq->throttled = 1;
-	WARN_ON_ONCE(cfs_rq->throttled_clock);
+	WARN_ON_ONCE_DEFERRED(cfs_rq->throttled_clock);
 	return true;
 }
 
@@ -6380,7 +6380,7 @@ static inline void __unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
 	}
 
 	/* Already enqueued */
-	if (WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_csd_list)))
+	if (WARN_ON_ONCE_DEFERRED(!list_empty(&cfs_rq->throttled_csd_list)))
 		return;
 
 	first = list_empty(&rq->cfsb_csd_list);
@@ -6393,7 +6393,7 @@ static void unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
 {
 	lockdep_assert_rq_held(rq_of(cfs_rq));
 
-	if (WARN_ON_ONCE(!cfs_rq_throttled(cfs_rq) ||
+	if (WARN_ON_ONCE_DEFERRED(!cfs_rq_throttled(cfs_rq) ||
 	    cfs_rq->runtime_remaining <= 0))
 		return;
 
@@ -6429,7 +6429,7 @@ static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 			goto next;
 
 		/* By the above checks, this should never be true */
-		WARN_ON_ONCE(cfs_rq->runtime_remaining > 0);
+		WARN_ON_ONCE_DEFERRED(cfs_rq->runtime_remaining > 0);
 
 		raw_spin_lock(&cfs_b->lock);
 		runtime = -cfs_rq->runtime_remaining + 1;
@@ -6450,7 +6450,7 @@ static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 				 * We currently only expect to be unthrottling
 				 * a single cfs_rq locally.
 				 */
-				WARN_ON_ONCE(!list_empty(&local_unthrottle));
+				WARN_ON_ONCE_DEFERRED(!list_empty(&local_unthrottle));
 				list_add_tail(&cfs_rq->throttled_csd_list,
 					      &local_unthrottle);
 			}
@@ -6475,7 +6475,7 @@ static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 
 		rq_unlock_irqrestore(rq, &rf);
 	}
-	WARN_ON_ONCE(!list_empty(&local_unthrottle));
+	WARN_ON_ONCE_DEFERRED(!list_empty(&local_unthrottle));
 
 	rcu_read_unlock();
 
@@ -7048,7 +7048,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
 	u64 vdelta;
 	u64 delta;
 
-	WARN_ON_ONCE(task_rq(p) != rq);
+	WARN_ON_ONCE_DEFERRED(task_rq(p) != rq);
 
 	if (rq->cfs.h_nr_queued <= 1)
 		return;
@@ -7171,8 +7171,8 @@ requeue_delayed_entity(struct sched_entity *se)
 	 * Because a delayed entity is one that is still on
 	 * the runqueue competing until elegibility.
 	 */
-	WARN_ON_ONCE(!se->sched_delayed);
-	WARN_ON_ONCE(!se->on_rq);
+	WARN_ON_ONCE_DEFERRED(!se->sched_delayed);
+	WARN_ON_ONCE_DEFERRED(!se->on_rq);
 
 	if (update_entity_lag(cfs_rq, se)) {
 		cfs_rq->nr_queued--;
@@ -7409,8 +7409,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		rq->next_balance = jiffies;
 
 	if (p && task_delayed) {
-		WARN_ON_ONCE(!task_sleep);
-		WARN_ON_ONCE(p->on_rq != 1);
+		WARN_ON_ONCE_DEFERRED(!task_sleep);
+		WARN_ON_ONCE_DEFERRED(p->on_rq != 1);
 
 		/*
 		 * Fix-up what block_task() skipped.
@@ -8976,7 +8976,7 @@ static void set_cpus_allowed_fair(struct task_struct *p, struct affinity_context
 static void set_next_buddy(struct sched_entity *se)
 {
 	for_each_sched_entity(se) {
-		if (WARN_ON_ONCE(!se->on_rq))
+		if (WARN_ON_ONCE_DEFERRED(!se->on_rq))
 			return;
 		if (se_is_idle(se))
 			return;
@@ -9023,7 +9023,7 @@ preempt_sync(struct rq *rq, int wake_flags,
 	 * WF_SYNC without WF_TTWU is not expected so warn if it happens even
 	 * though it is likely harmless.
 	 */
-	WARN_ON_ONCE(!(wake_flags & WF_TTWU));
+	WARN_ON_ONCE_DEFERRED(!(wake_flags & WF_TTWU));
 
 	threshold = sysctl_sched_migration_cost;
 	delta = rq_clock_task(rq) - se->exec_start;
@@ -9095,7 +9095,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
 		return;
 
 	find_matching_se(&se, &pse);
-	WARN_ON_ONCE(!pse);
+	WARN_ON_ONCE_DEFERRED(!pse);
 
 	cse_is_idle = se_is_idle(se);
 	pse_is_idle = se_is_idle(pse);
@@ -9857,8 +9857,8 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
 		schedstat_inc(p->stats.nr_forced_migrations);
 	}
 
-	WARN_ON(task_current(env->src_rq, p));
-	WARN_ON(task_current_donor(env->src_rq, p));
+	WARN_ON_DEFERRED(task_current(env->src_rq, p));
+	WARN_ON_DEFERRED(task_current_donor(env->src_rq, p));
 
 	deactivate_task(env->src_rq, p, DEQUEUE_NOCLOCK);
 	set_task_cpu(p, env->dst_cpu);
@@ -12151,7 +12151,7 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
 		goto out_balanced;
 	}
 
-	WARN_ON_ONCE(busiest == env.dst_rq);
+	WARN_ON_ONCE_DEFERRED(busiest == env.dst_rq);
 
 	update_lb_imbalance_stat(&env, sd, idle);
 
@@ -12461,7 +12461,7 @@ static int active_load_balance_cpu_stop(void *data)
 	 * we need to fix it. Originally reported by
 	 * Bjorn Helgaas on a 128-CPU setup.
 	 */
-	WARN_ON_ONCE(busiest_rq == target_rq);
+	WARN_ON_ONCE_DEFERRED(busiest_rq == target_rq);
 
 	/* Search for an sd spanning us and the target CPU. */
 	rcu_read_lock();
@@ -12883,7 +12883,7 @@ static void set_cpu_sd_state_busy(int cpu)
 
 void nohz_balance_exit_idle(struct rq *rq)
 {
-	WARN_ON_ONCE(rq != this_rq());
+	WARN_ON_ONCE_DEFERRED(rq != this_rq());
 
 	if (likely(!rq->nohz_tick_stopped))
 		return;
@@ -12918,7 +12918,7 @@ void nohz_balance_enter_idle(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 
-	WARN_ON_ONCE(cpu != smp_processor_id());
+	WARN_ON_ONCE_DEFERRED(cpu != smp_processor_id());
 
 	/* If this CPU is going down, then nothing needs to be done: */
 	if (!cpu_active(cpu))
@@ -13000,7 +13000,7 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags)
 	int balance_cpu;
 	struct rq *rq;
 
-	WARN_ON_ONCE((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
+	WARN_ON_ONCE_DEFERRED((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
 
 	/*
 	 * We assume there will be no idle load after this update and clear
@@ -13623,7 +13623,7 @@ bool cfs_prio_less(const struct task_struct *a, const struct task_struct *b,
 	struct cfs_rq *cfs_rqb;
 	s64 delta;
 
-	WARN_ON_ONCE(task_rq(b)->core != rq->core);
+	WARN_ON_ONCE_DEFERRED(task_rq(b)->core != rq->core);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	/*
@@ -13839,7 +13839,7 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
 
 static void switched_to_fair(struct rq *rq, struct task_struct *p)
 {
-	WARN_ON_ONCE(p->se.sched_delayed);
+	WARN_ON_ONCE_DEFERRED(p->se.sched_delayed);
 
 	attach_task_cfs_rq(p);
 
@@ -13872,7 +13872,7 @@ static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool firs
 	if (!first)
 		return;
 
-	WARN_ON_ONCE(se->sched_delayed);
+	WARN_ON_ONCE_DEFERRED(se->sched_delayed);
 
 	if (hrtick_enabled_fair(rq))
 		hrtick_start_fair(rq, p);
@@ -14148,7 +14148,7 @@ int sched_group_set_idle(struct task_group *tg, long idle)
 		rq_lock_irqsave(rq, &rf);
 
 		grp_cfs_rq->idle = idle;
-		if (WARN_ON_ONCE(was_idle == cfs_rq_is_idle(grp_cfs_rq)))
+		if (WARN_ON_ONCE_DEFERRED(was_idle == cfs_rq_is_idle(grp_cfs_rq)))
 			goto next_cpu;
 
 		idle_task_delta = grp_cfs_rq->h_nr_queued -
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4ee8faf01441a..506d0f1afa58f 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -170,7 +170,7 @@ static void destroy_rt_bandwidth(struct rt_bandwidth *rt_b)
 
 static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
 {
-	WARN_ON_ONCE(!rt_entity_is_task(rt_se));
+	WARN_ON_ONCE_DEFERRED(!rt_entity_is_task(rt_se));
 
 	return container_of(rt_se, struct task_struct, rt);
 }
@@ -178,13 +178,13 @@ static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
 static inline struct rq *rq_of_rt_rq(struct rt_rq *rt_rq)
 {
 	/* Cannot fold with non-CONFIG_RT_GROUP_SCHED version, layout */
-	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
+	WARN_ON_DEFERRED(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
 	return rt_rq->rq;
 }
 
 static inline struct rt_rq *rt_rq_of_se(struct sched_rt_entity *rt_se)
 {
-	WARN_ON(!rt_group_sched_enabled() && rt_se->rt_rq->tg != &root_task_group);
+	WARN_ON_DEFERRED(!rt_group_sched_enabled() && rt_se->rt_rq->tg != &root_task_group);
 	return rt_se->rt_rq;
 }
 
@@ -192,7 +192,7 @@ static inline struct rq *rq_of_rt_se(struct sched_rt_entity *rt_se)
 {
 	struct rt_rq *rt_rq = rt_se->rt_rq;
 
-	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
+	WARN_ON_DEFERRED(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
 	return rt_rq->rq;
 }
 
@@ -493,7 +493,7 @@ typedef struct task_group *rt_rq_iter_t;
 static inline struct task_group *next_task_group(struct task_group *tg)
 {
 	if (!rt_group_sched_enabled()) {
-		WARN_ON(tg != &root_task_group);
+		WARN_ON_DEFERRED(tg != &root_task_group);
 		return NULL;
 	}
 
@@ -723,7 +723,7 @@ static void __disable_runtime(struct rq *rq)
 		 * We cannot be left wanting - that would mean some runtime
 		 * leaked out of the system.
 		 */
-		WARN_ON_ONCE(want);
+		WARN_ON_ONCE_DEFERRED(want);
 balanced:
 		/*
 		 * Disable all the borrow logic by pretending we have inf
@@ -1094,7 +1094,7 @@ dec_rt_prio(struct rt_rq *rt_rq, int prio)
 
 	if (rt_rq->rt_nr_running) {
 
-		WARN_ON(prio < prev_prio);
+		WARN_ON_DEFERRED(prio < prev_prio);
 
 		/*
 		 * This may have been our highest task, and therefore
@@ -1131,7 +1131,7 @@ dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 	if (rt_se_boosted(rt_se))
 		rt_rq->rt_nr_boosted--;
 
-	WARN_ON(!rt_rq->rt_nr_running && rt_rq->rt_nr_boosted);
+	WARN_ON_DEFERRED(!rt_rq->rt_nr_running && rt_rq->rt_nr_boosted);
 }
 
 #else /* !CONFIG_RT_GROUP_SCHED: */
@@ -1176,7 +1176,7 @@ void inc_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 {
 	int prio = rt_se_prio(rt_se);
 
-	WARN_ON(!rt_prio(prio));
+	WARN_ON_DEFERRED(!rt_prio(prio));
 	rt_rq->rt_nr_running += rt_se_nr_running(rt_se);
 	rt_rq->rr_nr_running += rt_se_rr_nr_running(rt_se);
 
@@ -1187,8 +1187,8 @@ void inc_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 static inline
 void dec_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 {
-	WARN_ON(!rt_prio(rt_se_prio(rt_se)));
-	WARN_ON(!rt_rq->rt_nr_running);
+	WARN_ON_DEFERRED(!rt_prio(rt_se_prio(rt_se)));
+	WARN_ON_DEFERRED(!rt_rq->rt_nr_running);
 	rt_rq->rt_nr_running -= rt_se_nr_running(rt_se);
 	rt_rq->rr_nr_running -= rt_se_rr_nr_running(rt_se);
 
@@ -1348,7 +1348,7 @@ static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flag
 	}
 
 	if (move_entity(flags)) {
-		WARN_ON_ONCE(rt_se->on_list);
+		WARN_ON_ONCE_DEFERRED(rt_se->on_list);
 		if (flags & ENQUEUE_HEAD)
 			list_add(&rt_se->run_list, queue);
 		else
@@ -1368,7 +1368,7 @@ static void __dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flag
 	struct rt_prio_array *array = &rt_rq->active;
 
 	if (move_entity(flags)) {
-		WARN_ON_ONCE(!rt_se->on_list);
+		WARN_ON_ONCE_DEFERRED(!rt_se->on_list);
 		__delist_rt_entity(rt_se, array);
 	}
 	rt_se->on_rq = 0;
@@ -1684,7 +1684,7 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rt_rq *rt_rq)
 	BUG_ON(idx >= MAX_RT_PRIO);
 
 	queue = array->queue + idx;
-	if (WARN_ON_ONCE(list_empty(queue)))
+	if (WARN_ON_ONCE_DEFERRED(list_empty(queue)))
 		return NULL;
 	next = list_entry(queue->next, struct sched_rt_entity, run_list);
 
@@ -2016,7 +2016,7 @@ static int push_rt_task(struct rq *rq, bool pull)
 		return 0;
 	}
 
-	if (WARN_ON(next_task == rq->curr))
+	if (WARN_ON_DEFERRED(next_task == rq->curr))
 		return 0;
 
 	/* We might release rq lock */
@@ -2316,8 +2316,8 @@ static void pull_rt_task(struct rq *this_rq)
 		 * the to-be-scheduled task?
 		 */
 		if (p && (p->prio < this_rq->rt.highest_prio.curr)) {
-			WARN_ON(p == src_rq->curr);
-			WARN_ON(!task_on_rq_queued(p));
+			WARN_ON_DEFERRED(p == src_rq->curr);
+			WARN_ON_DEFERRED(!task_on_rq_queued(p));
 
 			/*
 			 * There's a chance that p is higher in priority
@@ -2583,7 +2583,7 @@ static int task_is_throttled_rt(struct task_struct *p, int cpu)
 
 #ifdef CONFIG_RT_GROUP_SCHED // XXX maybe add task_rt_rq(), see also sched_rt_period_rt_rq
 	rt_rq = task_group(p)->rt_rq[cpu];
-	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
+	WARN_ON_DEFERRED(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
 #else
 	rt_rq = &cpu_rq(cpu)->rt;
 #endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9f63b15d309d1..f74f9cd44e098 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1684,7 +1684,7 @@ static inline void update_idle_core(struct rq *rq) { }
 
 static inline struct task_struct *task_of(struct sched_entity *se)
 {
-	WARN_ON_ONCE(!entity_is_task(se));
+	WARN_ON_ONCE_DEFERRED(!entity_is_task(se));
 	return container_of(se, struct task_struct, se);
 }
 
@@ -1766,7 +1766,7 @@ static inline void assert_clock_updated(struct rq *rq)
 	 * The only reason for not seeing a clock update since the
 	 * last rq_pin_lock() is if we're currently skipping updates.
 	 */
-	WARN_ON_ONCE(rq->clock_update_flags < RQCF_ACT_SKIP);
+	WARN_ON_ONCE_DEFERRED(rq->clock_update_flags < RQCF_ACT_SKIP);
 }
 
 static inline u64 rq_clock(struct rq *rq)
@@ -1813,7 +1813,7 @@ static inline void rq_clock_cancel_skipupdate(struct rq *rq)
 static inline void rq_clock_start_loop_update(struct rq *rq)
 {
 	lockdep_assert_rq_held(rq);
-	WARN_ON_ONCE(rq->clock_update_flags & RQCF_ACT_SKIP);
+	WARN_ON_ONCE_DEFERRED(rq->clock_update_flags & RQCF_ACT_SKIP);
 	rq->clock_update_flags |= RQCF_ACT_SKIP;
 }
 
@@ -1870,9 +1870,9 @@ static inline void scx_rq_clock_invalidate(struct rq *rq) {}
 
 static inline void assert_balance_callbacks_empty(struct rq *rq)
 {
-	WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_LOCKING) &&
-		     rq->balance_callback &&
-		     rq->balance_callback != &balance_push_callback);
+	WARN_ON_ONCE_DEFERRED(IS_ENABLED(CONFIG_PROVE_LOCKING) &&
+			      rq->balance_callback &&
+			      rq->balance_callback != &balance_push_callback);
 }
 
 /*
@@ -2681,7 +2681,7 @@ struct sched_class {
 
 static inline void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
-	WARN_ON_ONCE(rq->donor != prev);
+	WARN_ON_ONCE_DEFERRED(rq->donor != prev);
 	prev->sched_class->put_prev_task(rq, prev, NULL);
 }
 
@@ -2704,7 +2704,7 @@ static inline void put_prev_set_next_task(struct rq *rq,
 					  struct task_struct *prev,
 					  struct task_struct *next)
 {
-	WARN_ON_ONCE(rq->donor != prev);
+	WARN_ON_ONCE_DEFERRED(rq->donor != prev);
 
 	__put_prev_set_next_dl_server(rq, prev, next);
 
@@ -3030,7 +3030,7 @@ static inline void attach_task(struct rq *rq, struct task_struct *p)
 {
 	lockdep_assert_rq_held(rq);
 
-	WARN_ON_ONCE(task_rq(p) != rq);
+	WARN_ON_ONCE_DEFERRED(task_rq(p) != rq);
 	activate_task(rq, p, ENQUEUE_NOCLOCK);
 	wakeup_preempt(rq, p, 0);
 }
-- 
2.53.0


^ permalink raw reply related

* [PATCH 0/2] sched: Introduce and use deferred WARNs in sched
From: Sebastian Andrzej Siewior @ 2026-06-23 14:26 UTC (permalink / raw)
  To: linux-arch, linux-kernel, sched-ext, netdev
  Cc: David S . Miller, Andrea Righi, Andrew Morton, Arnd Bergmann,
	Ben Segall, Breno Leitao, Changwoo Min, David Vernet,
	Dietmar Eggemann, Eric Dumazet, Ingo Molnar, Jakub Kicinski,
	John Ogness, Juri Lelli, K Prateek Nayak, Paolo Abeni,
	Peter Zijlstra, Petr Mladek, Sergey Senozhatsky, Simon Horman,
	Steven Rostedt, Tejun Heo, Vincent Guittot, Vlad Poenaru,
	Sebastian Andrzej Siewior

This is a follow-up to the netconsole lockup reported at
	https://lore.kernel.org/all/20260610183621.3915271-1-vlad.wing@gmail.com/

The idea is to use deferred printing for WARNs and use them in sched. I
tried to use them only where it looks like the rq lock is held, instead of
a plain s/WARN_ON/WARN_ON_DEFERRED/, which would be simpler.

This unholy deferred mess can be removed once we don't have legacy
consoles anymore _or_ force force_legacy_kthread=true.

The initial report is against v6.16 and netconsole. The reported problem
does not occur upstream since commit 7eab73b18630e ("netconsole: convert
to NBCON console infrastructure") which is v7.0-rc1.

Should this be rejected outright because the preferred solution is to
| - stick msg in buffer (lockless)
| - print to atomic consoles (lockless)
| - use irq_work to wake console kthreads (lockless)
| - each kthread then tries to flush buffer to its own non-atomic console
|   in non-atomic context."

then this means forcing force_legacy_kthread=true.
The threaded legacy printer has been available since v6.12-rc1. In terms of
a stable fix, this could go back as far as v6.12 stable and not earlier (in
case we care).

I tested this on an x86 box with an 8250 UART and a warning in
put_prev_entity(). After it printed the initial warning, it deadlocked
shortly after: systemd was writing to the kernel buffer, acquired the
uart_port_lock and then attempted to write a lockdep report, which required
the same lock…

Sebastian Andrzej Siewior (2):
  bug: Provide WARN_ON.*DEFERRED() macros for console deferred output
  sched: Use WARN_ON.*_DEFERRED()

 include/asm-generic/bug.h  |  41 ++++++++++++++
 kernel/sched/core.c        |  78 +++++++++++++-------------
 kernel/sched/core_sched.c  |   6 +-
 kernel/sched/cpudeadline.c |   6 +-
 kernel/sched/deadline.c    |  62 ++++++++++-----------
 kernel/sched/ext.c         | 110 ++++++++++++++++++-------------------
 kernel/sched/fair.c        |  88 ++++++++++++++---------------
 kernel/sched/rt.c          |  36 ++++++------
 kernel/sched/sched.h       |  18 +++---
 lib/bug.c                  |  16 +++++-
 10 files changed, 257 insertions(+), 204 deletions(-)

-- 
2.53.0

^ permalink raw reply

* [PATCH 1/2] bug: Provide WARN_ON.*DEFERRED() macros for console deferred output
From: Sebastian Andrzej Siewior @ 2026-06-23 14:26 UTC (permalink / raw)
  To: linux-arch, linux-kernel, sched-ext, netdev
  Cc: David S . Miller, Andrea Righi, Andrew Morton, Arnd Bergmann,
	Ben Segall, Breno Leitao, Changwoo Min, David Vernet,
	Dietmar Eggemann, Eric Dumazet, Ingo Molnar, Jakub Kicinski,
	John Ogness, Juri Lelli, K Prateek Nayak, Paolo Abeni,
	Peter Zijlstra, Petr Mladek, Sergey Senozhatsky, Simon Horman,
	Steven Rostedt, Tejun Heo, Vincent Guittot, Vlad Poenaru,
	Sebastian Andrzej Siewior
In-Reply-To: <20260623142650.265721-1-bigeasy@linutronix.de>

Provide a deferred version of the WARN_ON() macro. It will delay
flushing the console until a later context. It is needed in a context
where the caller holds locks which can lead to a deadlock if content is
flushed to the console driver.
An example would be a warning from within the scheduler resulting in a
wake-up of a task.

Deferring the output works by using printk_deferred_enter()/
printk_deferred_exit() around the printing output. This must be used in a
context where the task can't migrate to another CPU. This should usually be
the case, since the scheduler would acquire the rq lock with interrupts
disabled, but to be safe preemption is disabled to guarantee this.
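
As a rough illustration of the idea only, here is a small userspace model
in C. It is not the kernel implementation: model_printk(),
model_deferred_enter()/model_deferred_exit(), model_flush_console() and the
fixed-size buffer are made-up names for this sketch. It shows that a
message emitted inside the deferred section is stored but not pushed to the
console until a later flush.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace model only: every name below is hypothetical, not a kernel API. */
#define LOG_SLOTS 8
#define MSG_LEN   64

static char log_buf[LOG_SLOTS][MSG_LEN]; /* stands in for the printk buffer */
static int log_head;       /* next slot to write */
static int log_tail;       /* next slot to flush */
static int defer_depth;    /* models the per-CPU "deferred" nesting count */
static int flushed_count;  /* messages that reached the "console" */

static void model_deferred_enter(void)
{
	defer_depth++;
}

static void model_deferred_exit(void)
{
	assert(defer_depth > 0);
	defer_depth--;
}

/* Push everything stored so far to the console (stdout in this model). */
static void model_flush_console(void)
{
	while (log_tail < log_head) {
		fputs(log_buf[log_tail % LOG_SLOTS], stdout);
		log_tail++;
		flushed_count++;
	}
}

/* Store the message; flush right away only outside a deferred section. */
static void model_printk(const char *msg)
{
	if (log_head - log_tail >= LOG_SLOTS)
		return; /* buffer full: the model simply drops the message */
	snprintf(log_buf[log_head % LOG_SLOTS], MSG_LEN, "%s", msg);
	log_head++;
	if (defer_depth == 0)
		model_flush_console();
}

/* Emit a warning with deferred output; returns the number still pending. */
static int model_warn_deferred(const char *msg)
{
	model_deferred_enter();
	model_printk(msg);
	model_deferred_exit();
	return log_head - log_tail;
}
```

In this model, model_warn_deferred() leaves its message pending, and it only
reaches stdout once model_flush_console() runs later, i.e. from a context
which no longer holds the problematic locks.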

In order not to bloat the code on architectures which provide an
optimized __WARN_FLAGS(), define BUGFLAG_DEFERRED, which is handled by
__report_bug() and does not increase the code size.

Provide the DEFERRED macros based on the __WARN_FLAGS and __WARN
macros. Extend __report_bug() to handle the deferred case.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/asm-generic/bug.h | 41 +++++++++++++++++++++++++++++++++++++++
 lib/bug.c                 | 16 +++++++++++++--
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 09e8eccee8ed9..1e3ff00f709b8 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -14,6 +14,7 @@
 #define BUGFLAG_DONE		(1 << 2)
 #define BUGFLAG_NO_CUT_HERE	(1 << 3)	/* CUT_HERE already sent */
 #define BUGFLAG_ARGS		(1 << 4)
+#define BUGFLAG_DEFERRED	(1 << 5)
 #define BUGFLAG_TAINT(taint)	((taint) << 8)
 #define BUG_GET_TAINT(bug)	((bug)->flags >> 8)
 #endif
@@ -115,6 +116,16 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 })
 #endif
 
+#define WARN_ON_DEFERRED(condition) ({					\
+	int __ret_warn_on = !!(condition);				\
+	if (unlikely(__ret_warn_on)) {					\
+		__WARN_FLAGS(#condition,				\
+			     BUGFLAG_DEFERRED |				\
+			     BUGFLAG_TAINT(TAINT_WARN));		\
+	}								\
+	unlikely(__ret_warn_on);					\
+})
+
 #ifndef WARN_ON_ONCE
 #define WARN_ON_ONCE(condition) ({					\
 	int __ret_warn_on = !!(condition);				\
@@ -125,6 +136,16 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 	unlikely(__ret_warn_on);					\
 })
 #endif
+
+#define WARN_ON_ONCE_DEFERRED(condition) ({				\
+	int __ret_warn_on = !!(condition);				\
+	if (unlikely(__ret_warn_on)) {					\
+		__WARN_FLAGS(#condition,				\
+			     BUGFLAG_ONCE | BUGFLAG_DEFERRED |		\
+			     BUGFLAG_TAINT(TAINT_WARN));		\
+	}								\
+	unlikely(__ret_warn_on);					\
+})
 #endif /* __WARN_FLAGS */
 
 #if defined(__WARN_FLAGS) && !defined(__WARN_printf)
@@ -159,6 +180,19 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 })
 #endif
 
+#ifndef WARN_ON_DEFERRED
+#define WARN_ON_DEFERRED(condition) ({					\
+	int __ret_warn_on = !!(condition);				\
+	if (unlikely(__ret_warn_on)) {					\
+		guard(preempt)();					\
+		printk_deferred_enter();				\
+		__WARN();						\
+		printk_deferred_exit();					\
+	}								\
+	unlikely(__ret_warn_on);					\
+})
+#endif
+
 #ifndef WARN
 #define WARN(condition, format...) ({					\
 	int __ret_warn_on = !!(condition);				\
@@ -180,6 +214,11 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 	DO_ONCE_LITE_IF(condition, WARN_ON, 1)
 #endif
 
+#ifndef WARN_ON_ONCE_DEFERRED
+#define WARN_ON_ONCE_DEFERRED(condition)				\
+	DO_ONCE_LITE_IF(condition, WARN_ON_DEFERRED, 1)
+#endif
+
 #ifndef WARN_ONCE
 #define WARN_ONCE(condition, format...)				\
 	DO_ONCE_LITE_IF(condition, WARN, 1, format)
@@ -215,7 +254,9 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 })
 #endif
 
+#define WARN_ON_DEFERRED(condition) WARN_ON(condition)
 #define WARN_ON_ONCE(condition) WARN_ON(condition)
+#define WARN_ON_ONCE_DEFERRED(condition) WARN_ON(condition)
 #define WARN_ONCE(condition, format...) WARN(condition, format)
 #define WARN_TAINT(condition, taint, format...) WARN(condition, format)
 #define WARN_TAINT_ONCE(condition, taint, format...) WARN(condition, format)
diff --git a/lib/bug.c b/lib/bug.c
index 224f4cfa4aa31..f5768f5d17b47 100644
--- a/lib/bug.c
+++ b/lib/bug.c
@@ -196,7 +196,7 @@ void __warn_printf(const char *fmt, struct pt_regs *regs)
 
 static enum bug_trap_type __report_bug(struct bug_entry *bug, unsigned long bugaddr, struct pt_regs *regs)
 {
-	bool warning, once, done, no_cut, has_args;
+	bool warning, once, done, no_cut, has_args, deferred;
 	const char *file, *fmt;
 	unsigned line;
 
@@ -219,6 +219,7 @@ static enum bug_trap_type __report_bug(struct bug_entry *bug, unsigned long buga
 	done     = bug->flags & BUGFLAG_DONE;
 	no_cut   = bug->flags & BUGFLAG_NO_CUT_HERE;
 	has_args = bug->flags & BUGFLAG_ARGS;
+	deferred = bug->flags & BUGFLAG_DEFERRED;
 
 	if (warning && once) {
 		if (done)
@@ -229,7 +230,10 @@ static enum bug_trap_type __report_bug(struct bug_entry *bug, unsigned long buga
 		 */
 		bug->flags |= BUGFLAG_DONE;
 	}
-
+	if (deferred) {
+		preempt_disable_notrace();
+		printk_deferred_enter();
+	}
 	/*
 	 * BUG() and WARN_ON() families don't print a custom debug message
 	 * before triggering the exception handler, so we must add the
@@ -245,6 +249,10 @@ static enum bug_trap_type __report_bug(struct bug_entry *bug, unsigned long buga
 		/* this is a WARN_ON rather than BUG/BUG_ON */
 		__warn(file, line, (void *)bugaddr, BUG_GET_TAINT(bug), regs,
 		       NULL);
+		if (deferred) {
+			printk_deferred_exit();
+			preempt_enable_notrace();
+		}
 		return BUG_TRAP_TYPE_WARN;
 	}
 
@@ -254,6 +262,10 @@ static enum bug_trap_type __report_bug(struct bug_entry *bug, unsigned long buga
 		pr_crit("kernel BUG at %pB [verbose debug info unavailable]\n",
 			(void *)bugaddr);
 
+	if (deferred) {
+		printk_deferred_exit();
+		preempt_enable_notrace();
+	}
 	return BUG_TRAP_TYPE_BUG;
 }
 
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH net] tipc: fix UAF in cleanup_bearer() due to premature dst_cache_destroy()
From: Xin Long @ 2026-06-23 14:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Kuniyuki Iwashima, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, eric.dumazet, syzbot+e14bc5d4942756023b77,
	Jon Maloy
In-Reply-To: <CANn89iLOUWwECfTyiPbk--nqNo=tbshZbYAS9Zy_9OFNdpADoA@mail.gmail.com>

On Tue, Jun 23, 2026 at 9:58 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Jun 23, 2026 at 6:56 AM Xin Long <lucien.xin@gmail.com> wrote:
> >
> > On Tue, Jun 23, 2026 at 2:35 AM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Mon, Jun 22, 2026 at 10:37 PM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > On Mon, Jun 22, 2026 at 6:48 PM Xin Long <lucien.xin@gmail.com> wrote:
> > > > >
> > > >
> > > > > Could this corrupt the list for concurrent RCU readers?
> > > > > When list_del_rcu() is called, it intentionally leaves the next pointer
> > > > > intact so concurrent readers can continue their traversal. However, the
> > > > > immediate call to list_add() overwrites both the next and prev pointers
> > > > > to link the entry into private_list.
> > > > > If a concurrent reader is currently positioned at rcast, won't it follow
> > > > > the newly clobbered next pointer and jump from the original RCU list
> > > > > directly into private_list?
> > > > > Because private_list is allocated on the local stack, the reader might
> > > > > interpret stack memory as a struct udp_replicast. Furthermore, the reader
> > > > > would miss its loop termination condition because it expects to reach the
> > > > > original list head, potentially resulting in an infinite loop or a crash.
> > > > > [ ... ]
> > > >
> > > > I think you are right.
> > > >
> > > > Considering there is already one rcu_head in udp_replicast I will use it in V2.
> > >
> > > While looking at many syzbot reports with RTNL pressure. I found this
> > > gem in  tipc_exit_net()
> > >
> > > while (atomic_read(&tn->wq_count))
> > >       cond_resched();
> > >
> > > On some kernel builds cond_resched() can be a NOP, so we might loop
> > > here for a while :/
> > >
> > True, thanks for the report,
> >
> > I think a cleanup_wq should be added for 'ub->work' instead of using system_wq,
> > and then do flush_workqueue(cleanup_wq) in tipc_init_net().
> >
>
>  I will send a series of 2 patches.
>
> Second one looks like:
>
Cool, wait_var_event() looks better.

Thanks.

> More complex stuff can be added later in net-next.
>
> commit 2f6b56e70f7048a9a2577715b8cfdb0ec94c2469
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Tue Jun 23 06:48:33 2026 +0000
>
>     tipc: avoid busy looping in tipc_exit_net()
>
>     Blamed commit introduced a busy-wait loop in tipc_exit_net()
>     to wait for pending UDP bearer cleanup works to complete:
>
>            while (atomic_read(&tn->wq_count))
>                    cond_resched();
>
>     This loop can busy-wait for a long time if cond_resched() is a NOP. This
>     typically happens if the netns exit is executed by a high priority task,
>     or under kernels configured without preemption (CONFIG_PREEMPT_NONE). In
>     such cases, it wastes CPU cycles and can lead to soft lockups.
>
>     Fix this by replacing the busy loop with wait_var_event(), allowing the
>     thread to sleep properly until the work queue count reaches zero.
>
>     Accordingly, update cleanup_bearer() to use atomic_dec_and_test() and
>     wake_up_var() to wake up the waiter when the count drops to zero.
>
>     This uses the global wait queue hash table, avoiding the need to bloat
>     struct tipc_net with a wait_queue_head_t. The atomic_dec_and_test()
>     provides the necessary memory barrier to ensure the wakeup is not missed.
>
>     Fixes: 04c26faa51d1 ("tipc: wait and exit until all work queues are done")
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Cc: Xin Long <lucien.xin@gmail.com>
>     Cc: Jon Maloy <jmaloy@redhat.com>
>     Cc: tipc-discussion@lists.sourceforge.net
>
> diff --git a/net/tipc/core.c b/net/tipc/core.c
> index 1ddecea1df6e9100334c47a28ff6c065292fb9ad..315975c3be8186784e9c44c9ff69d62c17ffd4b9
> 100644
> --- a/net/tipc/core.c
> +++ b/net/tipc/core.c
> @@ -45,6 +45,7 @@
>  #include "crypto.h"
>
>  #include <linux/module.h>
> +#include <linux/wait_bit.h>
>
>  /* configurable TIPC parameters */
>  unsigned int tipc_net_id __read_mostly;
> @@ -118,8 +119,7 @@ static void __net_exit tipc_exit_net(struct net *net)
>  #ifdef CONFIG_TIPC_CRYPTO
>         tipc_crypto_stop(&tipc_net(net)->crypto_tx);
>  #endif
> -       while (atomic_read(&tn->wq_count))
> -               cond_resched();
> +       wait_var_event(&tn->wq_count, atomic_read(&tn->wq_count) == 0);
>  }
>
>  static void __net_exit tipc_pernet_pre_exit(struct net *net)
> diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
> index 66f3cb87a0aaaac8f40e8f237ab9a44d539b1cd8..62ae7f5b58409c89798c915dee752ac42487581f 100644
> --- a/net/tipc/udp_media.c
> +++ b/net/tipc/udp_media.c
> @@ -40,6 +40,7 @@
>  #include <linux/igmp.h>
>  #include <linux/kernel.h>
>  #include <linux/workqueue.h>
> +#include <linux/wait_bit.h>
>  #include <linux/list.h>
>  #include <net/sock.h>
>  #include <net/ip.h>
> @@ -830,7 +831,8 @@ static void cleanup_bearer(struct work_struct *work)
>         synchronize_net();
>
>         dst_cache_destroy(&ub->rcast.dst_cache);
> -       atomic_dec(&tn->wq_count);
> +       if (atomic_dec_and_test(&tn->wq_count))
> +               wake_up_var(&tn->wq_count);
>         kfree(ub);
>  }

^ permalink raw reply

* Re: [PATCH net] tipc: fix UAF in cleanup_bearer() due to premature dst_cache_destroy()
From: Eric Dumazet @ 2026-06-23 13:58 UTC (permalink / raw)
  To: Xin Long
  Cc: Kuniyuki Iwashima, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, eric.dumazet, syzbot+e14bc5d4942756023b77,
	Jon Maloy
In-Reply-To: <CADvbK_cMHtBFGb87P6CoqJN+DCasn_5=RwtNhymdZ6p1eFnjuQ@mail.gmail.com>

On Tue, Jun 23, 2026 at 6:56 AM Xin Long <lucien.xin@gmail.com> wrote:
>
> On Tue, Jun 23, 2026 at 2:35 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Mon, Jun 22, 2026 at 10:37 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Mon, Jun 22, 2026 at 6:48 PM Xin Long <lucien.xin@gmail.com> wrote:
> > > >
> > >
> > > > Could this corrupt the list for concurrent RCU readers?
> > > > When list_del_rcu() is called, it intentionally leaves the next pointer
> > > > intact so concurrent readers can continue their traversal. However, the
> > > > immediate call to list_add() overwrites both the next and prev pointers
> > > > to link the entry into private_list.
> > > > If a concurrent reader is currently positioned at rcast, won't it follow
> > > > the newly clobbered next pointer and jump from the original RCU list
> > > > directly into private_list?
> > > > Because private_list is allocated on the local stack, the reader might
> > > > interpret stack memory as a struct udp_replicast. Furthermore, the reader
> > > > would miss its loop termination condition because it expects to reach the
> > > > original list head, potentially resulting in an infinite loop or a crash.
> > > > [ ... ]
> > >
> > > I think you are right.
> > >
> > > Considering there is already one rcu_head in udp_replicast I will use it in V2.
> >
> > While looking at many syzbot reports with RTNL pressure, I found this
> > gem in tipc_exit_net():
> >
> > while (atomic_read(&tn->wq_count))
> >       cond_resched();
> >
> > On some kernel builds cond_resched() can be a NOP, so we might loop
> > here for a while :/
> >
> True, thanks for the report,
>
> I think a cleanup_wq should be added for 'ub->work' instead of using system_wq,
> and then do flush_workqueue(cleanup_wq) in tipc_init_net().
>

I will send a series of 2 patches. More complex stuff can be added later
in net-next.

The second one looks like:

commit 2f6b56e70f7048a9a2577715b8cfdb0ec94c2469
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Jun 23 06:48:33 2026 +0000

    tipc: avoid busy looping in tipc_exit_net()

    Blamed commit introduced a busy-wait loop in tipc_exit_net()
    to wait for pending UDP bearer cleanup works to complete:

           while (atomic_read(&tn->wq_count))
                   cond_resched();

    This loop can busy-wait for a long time if cond_resched() is a NOP. This
    typically happens if the netns exit is executed by a high priority task,
    or under kernels configured without preemption (CONFIG_PREEMPT_NONE). In
    such cases, it wastes CPU cycles and can lead to soft lockups.

    Fix this by replacing the busy loop with wait_var_event(), allowing the
    thread to sleep properly until the work queue count reaches zero.

    Accordingly, update cleanup_bearer() to use atomic_dec_and_test() and
    wake_up_var() to wake up the waiter when the count drops to zero.

    This uses the global wait queue hash table, avoiding the need to bloat
    struct tipc_net with a wait_queue_head_t. The atomic_dec_and_test()
    provides the necessary memory barrier to ensure the wakeup is not missed.

    Fixes: 04c26faa51d1 ("tipc: wait and exit until all work queues are done")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Xin Long <lucien.xin@gmail.com>
    Cc: Jon Maloy <jmaloy@redhat.com>
    Cc: tipc-discussion@lists.sourceforge.net

diff --git a/net/tipc/core.c b/net/tipc/core.c
index 1ddecea1df6e9100334c47a28ff6c065292fb9ad..315975c3be8186784e9c44c9ff69d62c17ffd4b9 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -45,6 +45,7 @@
 #include "crypto.h"

 #include <linux/module.h>
+#include <linux/wait_bit.h>

 /* configurable TIPC parameters */
 unsigned int tipc_net_id __read_mostly;
@@ -118,8 +119,7 @@ static void __net_exit tipc_exit_net(struct net *net)
 #ifdef CONFIG_TIPC_CRYPTO
        tipc_crypto_stop(&tipc_net(net)->crypto_tx);
 #endif
-       while (atomic_read(&tn->wq_count))
-               cond_resched();
+       wait_var_event(&tn->wq_count, atomic_read(&tn->wq_count) == 0);
 }

 static void __net_exit tipc_pernet_pre_exit(struct net *net)
diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index 66f3cb87a0aaaac8f40e8f237ab9a44d539b1cd8..62ae7f5b58409c89798c915dee752ac42487581f 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -40,6 +40,7 @@
 #include <linux/igmp.h>
 #include <linux/kernel.h>
 #include <linux/workqueue.h>
+#include <linux/wait_bit.h>
 #include <linux/list.h>
 #include <net/sock.h>
 #include <net/ip.h>
@@ -830,7 +831,8 @@ static void cleanup_bearer(struct work_struct *work)
        synchronize_net();

        dst_cache_destroy(&ub->rcast.dst_cache);
-       atomic_dec(&tn->wq_count);
+       if (atomic_dec_and_test(&tn->wq_count))
+               wake_up_var(&tn->wq_count);
        kfree(ub);
 }
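
For readers who want to experiment outside the kernel, here is a rough
userspace analogue of the pattern described in the commit message. It is
only a sketch: struct pending_works and its helpers are made-up names, and
a pthread mutex/condvar pair stands in for wait_var_event()/wake_up_var().
The point it illustrates is that only the decrement that reaches zero
issues a wakeup, and the waiter sleeps instead of spinning.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for tn->wq_count plus its waiter. */
struct pending_works {
	atomic_int count;
	pthread_mutex_t lock;
	pthread_cond_t drained;
};

static void pending_init(struct pending_works *p, int initial)
{
	atomic_init(&p->count, initial);
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->drained, NULL);
}

/* Worker side: returns true only for the call that drops count to zero. */
static bool pending_put(struct pending_works *p)
{
	int old = atomic_fetch_sub(&p->count, 1);

	assert(old > 0);
	if (old != 1)
		return false;

	/* Taking the lock orders the wakeup after the waiter's check,
	 * so the waiter cannot miss it.
	 */
	pthread_mutex_lock(&p->lock);
	pthread_cond_broadcast(&p->drained);
	pthread_mutex_unlock(&p->lock);
	return true;
}

/* Waiter side: sleeps until the count is zero, no busy loop. */
static void pending_wait(struct pending_works *p)
{
	pthread_mutex_lock(&p->lock);
	while (atomic_load(&p->count) != 0)
		pthread_cond_wait(&p->drained, &p->lock);
	pthread_mutex_unlock(&p->lock);
}
```

In this model each cleanup work would call pending_put() once and the
netns exit path would call pending_wait(); if the count is already zero
the waiter returns immediately.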

^ permalink raw reply

* Re: [PATCH net] ice: fix stats array overflow when VF requests more queues
From: Przemek Kitszel @ 2026-06-23 13:59 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: Tony Nguyen, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Jacob Keller, Petr Oros,
	intel-wired-lan, netdev, linux-kernel
In-Reply-To: <CADEbmW0BsQsu1pPX=kk58tTz_5EArjCKgmp_MKxRFcuvb3TDGg@mail.gmail.com>

On 4/29/26 23:59, Michal Schmidt wrote:
> On Tue, Apr 28, 2026 at 4:00 PM Przemek Kitszel
> <przemyslaw.kitszel@intel.com> wrote:
>> On 4/27/26 17:18, Michal Schmidt wrote:
>>> When a VF increases its queue count via VIRTCHNL_OP_REQUEST_QUEUES,
>>> ice_vc_request_qs_msg() sets vf->num_req_qs and triggers a VF reset.
>>> The reset calls ice_vf_reconfig_vsi(), which does ice_vsi_decfg()
>>> followed by ice_vsi_cfg(). ice_vsi_decfg() does not free the per-ring
>>> stats arrays. Inside ice_vsi_cfg_def(), ice_vsi_set_num_qs() updates
>>> alloc_txq/alloc_rxq to the new larger value, but
>>> ice_vsi_alloc_stat_arrays() returns early because the stats already
>>> exist. ice_vsi_alloc_ring_stats() then iterates using the new larger
>>> alloc_txq and writes beyond the bounds of the old, smaller
>>> tx_ring_stats/rx_ring_stats pointer arrays, corrupting adjacent SLUB
>>> metadata.
>>>
>>
>> thank you for reproducing the bug, it is exactly the situation that
>> I was facing
>> have you tried with my proposed (unfortunately not public yet) fix
>> to just combine ice_vsi_alloc_stat_arrays() and
>> ice_vsi_realloc_stat_arrays() into one function?
> 
> I tried that now and the result is: yes, your patch fixes the bug too.
> Michal
> 

Hi,
are you going to make your patch more robust for CHNL VSIs?
https://lore.kernel.org/netdev/20260523001618.1757240-1-kuba@kernel.org

Alternatively, I could send my "alternative fix", which covers that case.
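
To make the failure mode from the quoted description concrete, below is a
small userspace model of "allocate or grow in one function". It is not the
unpublished patch and does not use the ice data structures: struct
vsi_model, stats_len and vsi_ensure_stat_array() are hypothetical names.
The idea shown is that the helper compares the current queue count with
the number of slots actually allocated, instead of returning early just
because the array pointer is non-NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model of a VSI's per-ring stats pointer array. */
struct vsi_model {
	void **tx_ring_stats;   /* array of per-ring stats pointers */
	unsigned int stats_len; /* slots currently allocated */
	unsigned int alloc_txq; /* queue count requested for the VSI */
};

/* Allocate the array on first use and grow it when alloc_txq increased.
 * Returns 0 on success, -1 on allocation failure.
 */
static int vsi_ensure_stat_array(struct vsi_model *vsi)
{
	void **grown;

	assert(vsi != NULL);

	/* Existing array already covers every queue: nothing to do. */
	if (vsi->alloc_txq <= vsi->stats_len)
		return 0;

	grown = realloc(vsi->tx_ring_stats,
			vsi->alloc_txq * sizeof(*grown));
	if (!grown)
		return -1;

	/* New slots start out empty so later per-ring allocation sees NULL. */
	memset(grown + vsi->stats_len, 0,
	       (vsi->alloc_txq - vsi->stats_len) * sizeof(*grown));

	vsi->tx_ring_stats = grown;
	vsi->stats_len = vsi->alloc_txq;
	return 0;
}
```

With this shape, a later loop over alloc_txq entries stays inside the
array even after the VF asked for more queues.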

^ permalink raw reply

* Re: [PATCH net] tipc: fix UAF in cleanup_bearer() due to premature dst_cache_destroy()
From: Xin Long @ 2026-06-23 13:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Kuniyuki Iwashima, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, eric.dumazet, syzbot+e14bc5d4942756023b77,
	Jon Maloy
In-Reply-To: <CANn89i+dkbrSAwvaWXW7yWMfcwUebuTBLG5T7AGZaZcpVYGyfQ@mail.gmail.com>

On Tue, Jun 23, 2026 at 2:35 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Jun 22, 2026 at 10:37 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Mon, Jun 22, 2026 at 6:48 PM Xin Long <lucien.xin@gmail.com> wrote:
> > >
> >
> > > Could this corrupt the list for concurrent RCU readers?
> > > When list_del_rcu() is called, it intentionally leaves the next pointer
> > > intact so concurrent readers can continue their traversal. However, the
> > > immediate call to list_add() overwrites both the next and prev pointers
> > > to link the entry into private_list.
> > > If a concurrent reader is currently positioned at rcast, won't it follow
> > > the newly clobbered next pointer and jump from the original RCU list
> > > directly into private_list?
> > > Because private_list is allocated on the local stack, the reader might
> > > interpret stack memory as a struct udp_replicast. Furthermore, the reader
> > > would miss its loop termination condition because it expects to reach the
> > > original list head, potentially resulting in an infinite loop or a crash.
> > > [ ... ]
> >
> > I think you are right.
> >
> > Considering there is already one rcu_head in udp_replicast I will use it in V2.
>
> While looking at many syzbot reports with RTNL pressure, I found this
> gem in tipc_exit_net():
>
> while (atomic_read(&tn->wq_count))
>       cond_resched();
>
> On some kernel builds cond_resched() can be a NOP, so we might loop
> here for a while :/
>
True, thanks for the report,

I think a cleanup_wq should be added for 'ub->work' instead of using system_wq,
and then do flush_workqueue(cleanup_wq) in tipc_init_net().

> Added in
>
> commit 04c26faa51d1e2fe71cf13c45791f5174c37f986    tipc: wait and exit
> until all work queues are done
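
As a side note on the list corruption concern quoted at the top of this
mail, the effect can be reproduced with a tiny single-threaded model. This
is only a sketch under simplifying assumptions: struct node and the helpers
are made up, there is no real RCU, and the "reader" is simulated by
resuming a traversal from the removed entry. list_del_keep_next() mimics
the property of list_del_rcu() that matters here (the next pointer is left
intact); it sets prev to NULL where the kernel would store a poison value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *next, *prev;
};

static void list_init(struct node *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert n right after h, like list_add(). */
static void list_add_head(struct node *n, struct node *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

/* Unlink n but keep n->next so a reader standing on n can move on. */
static void list_del_keep_next(struct node *n)
{
	n->next->prev = n->prev;
	n->prev->next = n->next;
	n->prev = NULL;
}

/* Simulated reader: resumes at pos and must reach head to terminate. */
static bool reader_reaches_head(const struct node *pos,
				const struct node *head, int max_steps)
{
	assert(max_steps > 0);
	for (int i = 0; i < max_steps; i++) {
		pos = pos->next;
		if (pos == head)
			return true;
	}
	return false; /* would have looped forever */
}

static bool reader_survives(bool relink_immediately)
{
	struct node head, a, b, private_list;

	list_init(&head);
	list_init(&private_list);
	list_add_head(&b, &head);
	list_add_head(&a, &head);	/* head -> a -> b -> head */

	list_del_keep_next(&a);		/* reader may still stand on a */
	if (relink_immediately)
		list_add_head(&a, &private_list); /* clobbers a.next */

	return reader_reaches_head(&a, &head, 16);
}
```

reader_survives(false) terminates at the original head, while
reader_survives(true) bounces between the entry and private_list and never
sees the head again, which is the behaviour the review comment warns about.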

^ permalink raw reply

* [PATCH net] tipc: fix out-of-bounds read in broadcast Gap ACK blocks
From: Samuel Page @ 2026-06-23 13:54 UTC (permalink / raw)
  To: Jon Maloy
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Tung Quang Nguyen, netdev, tipc-discussion,
	linux-kernel, Samuel Page

A broadcast PROTOCOL/STATE_MSG can carry a Gap ACK blocks record in its
data area. tipc_get_gap_ack_blks() only verifies that the record's len
field is self-consistent with its ugack_cnt/bgack_cnt counts
(sz == struct_size(p, gacks, ugack_cnt + bgack_cnt)); it does not check
that the record actually fits in the message data area, msg_data_sz().

The unicast caller tipc_link_proto_rcv() bounds it ("if (glen > dlen)
break;"), but the broadcast caller tipc_bcast_sync_rcv() discards the
returned size, so tipc_link_advance_transmq() copies the record off the
receive skb with an attacker-controlled count:

	this_ga = kmemdup(ga, struct_size(ga, gacks, ga->bgack_cnt),
			  GFP_ATOMIC);

A TIPC neighbour that negotiated TIPC_GAP_ACK_BLOCK triggers it with one
ordinary broadcast STATE_MSG (msg_bc_ack_invalid() clear), sized so its
data area is short, carrying a Gap ACK record with len = 0x400,
bgack_cnt = 0xff and ugack_cnt = 0. len then equals
struct_size(p, gacks, 255), so the consistency check passes and ga is
non-NULL; kmemdup() reads struct_size(ga, gacks, 255) = 1024 bytes out
of the much smaller skb:

  BUG: KASAN: slab-out-of-bounds in kmemdup_noprof+0x48/0x60
  Read of size 1024 at addr ffff0000c7030d38 by task poc864/69
  Call trace:
   kmemdup_noprof+0x48/0x60
   tipc_link_advance_transmq+0x86c/0xb80
   tipc_link_bc_ack_rcv+0x19c/0x1e0
   tipc_bcast_sync_rcv+0x1c4/0x2c4
   tipc_rcv+0x85c/0x1340
   tipc_l2_rcv_msg+0xac/0x104
  The buggy address belongs to the object at ffff0000c7030d00
   which belongs to the cache skbuff_small_head of size 704
  The buggy address is located 56 bytes inside of
   allocated 704-byte region [ffff0000c7030d00, ffff0000c7030fc0)

The copied-out bytes are subsequently consumed as gap/ack values, but
the read is already out of bounds at the kmemdup() regardless of how
they are used.

Apply the same bound the unicast path uses to the broadcast caller: drop
the Gap ACK blocks when the reported size exceeds the message data size.
A NULL ga is already the defined "no Gap ACK blocks" case, so well-formed
state messages are unaffected.

Fixes: d7626b5acff9 ("tipc: introduce Gap ACK blocks for broadcast link")
Cc: stable@vger.kernel.org
Assisted-by: Bynario AI
Signed-off-by: Samuel Page <sam@bynar.io>
---
Before posting I found an earlier thread for what looks like the same (or a
very closely related) issue:

  https://lore.kernel.org/netdev/1316452e465e9a96fce44ec15130a14f3872149f.1775809727.git.caoruide123@gmail.com/
  [PATCH net 1/1] tipc: validate Gap ACK blocks in STATE message

That one added the validation inside tipc_get_gap_ack_blks() and the thread
stalled on whether the extra checks were redundant. This patch instead adds,
on the broadcast caller, only the same bound the unicast path already applies,
and includes the KASAN reproducer that was asked for there. 
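
For illustration only (not part of the patch), the difference between the
self-consistency check and the bound against the data area can be modelled
in plain C. The struct layout below is an assumption chosen to match the
numbers in the description (4-byte header, 4-byte entries); the helper
names are made up, and byte order conversion is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, simplified from the description above. */
struct gap_ack {
	uint16_t ack;
	uint16_t gap;
};

struct gap_ack_blks {
	uint16_t len;
	uint8_t ugack_cnt;
	uint8_t bgack_cnt;
	struct gap_ack gacks[];
};

static_assert(sizeof(struct gap_ack) == 4, "entry size assumed to be 4");

/* Userspace equivalent of struct_size(p, gacks, n). */
static size_t blks_size(unsigned int n)
{
	return sizeof(struct gap_ack_blks) + n * sizeof(struct gap_ack);
}

/* What the record says about itself: len matches the two counts. */
static bool gap_ack_self_consistent(uint16_t len, uint8_t ugack_cnt,
				    uint8_t bgack_cnt)
{
	return len == blks_size((unsigned int)ugack_cnt + bgack_cnt);
}

/* Additional bound: the record must also fit in the message data area. */
static bool gap_ack_usable(uint16_t len, uint8_t ugack_cnt,
			   uint8_t bgack_cnt, size_t data_sz)
{
	return gap_ack_self_consistent(len, ugack_cnt, bgack_cnt) &&
	       len <= data_sz;
}
```

With len = 0x400, ugack_cnt = 0 and bgack_cnt = 0xff the record is
self-consistent, but it is only usable when the data area is at least
1024 bytes long.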

 net/tipc/bcast.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 76a1585d3f6b..61c83bd95755 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -502,6 +502,7 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
 	struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq;
 	struct tipc_gap_ack_blks *ga;
 	struct sk_buff_head xmitq;
+	u16 glen;
 	int rc = 0;

 	__skb_queue_head_init(&xmitq);
@@ -510,7 +511,10 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
 	if (msg_type(hdr) != STATE_MSG) {
 		tipc_link_bc_init_rcv(l, hdr);
 	} else if (!msg_bc_ack_invalid(hdr)) {
-		tipc_get_gap_ack_blks(&ga, l, hdr, false);
+		/* Validate Gap ACK blocks, drop if invalid */
+		glen = tipc_get_gap_ack_blks(&ga, l, hdr, false);
+		if (glen > msg_data_sz(hdr))
+			ga = NULL;
 		if (!sysctl_tipc_bc_retruni)
 			retrq = &xmitq;
 		rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr),

base-commit: a986fde914d88af47eb78fd29c5d1af7952c3500
-- 
2.54.0

^ permalink raw reply related

* Re: [PATCH v4 9/9] rust: macros: remove `THIS_MODULE` static from `module!`
From: Gary Guo @ 2026-06-23 13:53 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci
In-Reply-To: <20260623-fix-fops-owner-v4-9-0daf5f077d5c@linux.dev>

On Tue Jun 23, 2026 at 7:29 AM BST, Alvin Sun wrote:
> All users have been migrated to `ModuleMetadata::THIS_MODULE` const or
> `this_module::<LocalModule>()` helper. The `static THIS_MODULE`
> generated by the `module!` macro is no longer referenced anywhere,
> so remove it to avoid having two sources of the same `ThisModule`
> pointer.
> 
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  rust/macros/module.rs | 16 ----------------
>  1 file changed, 16 deletions(-)


^ permalink raw reply

* Re: [PATCH v4 8/9] rust: binder: use `LocalModule` for `THIS_MODULE`
From: Gary Guo @ 2026-06-23 13:53 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci
In-Reply-To: <20260623-fix-fops-owner-v4-8-0daf5f077d5c@linux.dev>

On Tue Jun 23, 2026 at 7:29 AM BST, Alvin Sun wrote:
> Replace the `THIS_MODULE` static reference in the binder fops with
> `this_module::<LocalModule>()`, consistent with the move of
> `THIS_MODULE` into the `ModuleMetadata` trait.
> 
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  drivers/android/binder/rust_binder_main.rs | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)


^ permalink raw reply

* Re: [PATCH v4 7/9] rust: configfs: use `LocalModule` for `THIS_MODULE`
From: Gary Guo @ 2026-06-23 13:53 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci
In-Reply-To: <20260623-fix-fops-owner-v4-7-0daf5f077d5c@linux.dev>

On Tue Jun 23, 2026 at 7:29 AM BST, Alvin Sun wrote:
> Replace the `THIS_MODULE` static reference in the `configfs_attrs!`
> macro with `this_module::<LocalModule>()`, and update
> rnull to import `LocalModule` instead of `THIS_MODULE`, consistent
> with the move of `THIS_MODULE` into the `ModuleMetadata` trait.
>
> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
> ---
>  drivers/block/rnull/configfs.rs | 6 ++----
>  rust/kernel/configfs.rs         | 8 +++++---
>  2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
> index c10a55fc58948..b2547ad1e5ddd 100644
> --- a/drivers/block/rnull/configfs.rs
> +++ b/drivers/block/rnull/configfs.rs
> @@ -1,9 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  
> -use super::{
> -    NullBlkDevice,
> -    THIS_MODULE, //
> -};
> +use super::NullBlkDevice;
> +use crate::LocalModule;
>  use kernel::{
>      block::mq::gen_disk::{
>          GenDisk,
> diff --git a/rust/kernel/configfs.rs b/rust/kernel/configfs.rs
> index 2339c6467325d..b542422115461 100644
> --- a/rust/kernel/configfs.rs
> +++ b/rust/kernel/configfs.rs
> @@ -875,7 +875,7 @@ fn as_ptr(&self) -> *const bindings::config_item_type {
>  ///                 configfs::Subsystem<Configuration>,
>  ///                 Configuration
>  ///                 >::new_with_child_ctor::<N,Child>(
> -///             &THIS_MODULE,
> +///             ::kernel::module::this_module::<LocalModule>(),

This should be `crate::LocalModule`.

Best,
Gary

>  ///             &CONFIGURATION_ATTRS
>  ///         );
>  ///
> @@ -1021,7 +1021,8 @@ macro_rules! configfs_attrs {
>  
>                      static [< $data:upper _TPE >] : $crate::configfs::ItemType<$container, $data>  =
>                          $crate::configfs::ItemType::<$container, $data>::new::<N>(
> -                            &THIS_MODULE, &[<$ data:upper _ATTRS >]
> +                            $crate::module::this_module::<LocalModule>(),
> +                            &[<$ data:upper _ATTRS >]
>                          );
>                  )?
>  
> @@ -1030,7 +1031,8 @@ macro_rules! configfs_attrs {
>                          $crate::configfs::ItemType<$container, $data>  =
>                              $crate::configfs::ItemType::<$container, $data>::
>                              new_with_child_ctor::<N, $child>(
> -                                &THIS_MODULE, &[<$ data:upper _ATTRS >]
> +                                $crate::module::this_module::<LocalModule>(),
> +                                &[<$ data:upper _ATTRS >]
>                              );
>                  )?
>  



^ permalink raw reply

* Re: [PATCH v4 6/9] rust: miscdevice: set fops.owner from driver module pointer
From: Gary Guo @ 2026-06-23 13:51 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci
In-Reply-To: <20260623-fix-fops-owner-v4-6-0daf5f077d5c@linux.dev>

On Tue Jun 23, 2026 at 7:29 AM BST, Alvin Sun wrote:
> Set the miscdevice fops owner field from the driver module pointer
> via the `this_module::<T::OwnerModule>()` helper, instead of
> defaulting to null.
> 
> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  rust/kernel/miscdevice.rs | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)


^ permalink raw reply

* Re: [PATCH v4 4/9] rust: macros: auto-insert OwnerModule in #[vtable]
From: Gary Guo @ 2026-06-23 13:50 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci
In-Reply-To: <20260623-fix-fops-owner-v4-4-0daf5f077d5c@linux.dev>

On Tue Jun 23, 2026 at 7:29 AM BST, Alvin Sun wrote:
> Auto-add `type OwnerModule: ::kernel::ModuleMetadata;` as a required
> associated type on the trait side if not already defined, and
> auto-insert `type OwnerModule = crate::LocalModule;` on the impl side
> if not explicitly provided, eliminating the need to manually declare
> and implement `OwnerModule` in every vtable trait and impl.
> 
> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
> Suggested-by: Gary Guo <gary@garyguo.net>
> Link: https://lore.kernel.org/all/DIMMWHUOLPSH.13JFRHDKDQJGO@garyguo.net
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  rust/macros/lib.rs    |  6 ++++++
>  rust/macros/vtable.rs | 41 ++++++++++++++++++++++++++++++++++++-----
>  2 files changed, 42 insertions(+), 5 deletions(-)


^ permalink raw reply

* Re: [PATCH v4 3/9] rust: doctest: add LocalModule fallback for #[vtable] ThisModule
From: Gary Guo @ 2026-06-23 13:49 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci
In-Reply-To: <20260623-fix-fops-owner-v4-3-0daf5f077d5c@linux.dev>

On Tue Jun 23, 2026 at 7:29 AM BST, Alvin Sun wrote:
> Add a `LocalModule` struct with a null-pointer `ModuleMetadata` impl
> in the doctest harness, so that `crate::LocalModule` (auto-inserted
> by `#[vtable]`) resolves correctly when there is no `module!` macro.
>
> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
> ---
>  scripts/rustdoc_test_gen.rs | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/scripts/rustdoc_test_gen.rs b/scripts/rustdoc_test_gen.rs
> index ee76e96b41eea..198af4e446c8c 100644
> --- a/scripts/rustdoc_test_gen.rs
> +++ b/scripts/rustdoc_test_gen.rs
> @@ -239,6 +239,22 @@ macro_rules! assert_eq {{
>  
>  const __LOG_PREFIX: &[u8] = b"rust_doctests_kernel\0";
>  
> +/// Dummy module type for doctest context.
> +struct LocalModule;
> +
> +use kernel::{{
> +    str::CStr,
> +    ModuleMetadata,
> +    ThisModule, //
> +}};
> +use core::ptr::null_mut;
> +
> +impl ModuleMetadata for LocalModule {{
> +    const NAME: &'static CStr = c"rust_doctests_kernel";
> +    // SAFETY: `try_module_get`/`module_put` handle null module pointers gracefully.
> +    const THIS_MODULE: ThisModule = unsafe {{ ThisModule::from_ptr(null_mut()) }};
> +}}

We probably want a macro for crates that are built-in or are not the main
crate of a multi-crate module, and this could then use that mechanism.

But this looks okay for now.

Reviewed-by: Gary Guo <gary@garyguo.net>

> +
>  {rust_tests}
>  "#
>      )



^ permalink raw reply

* Re: [PATCH v4 2/9] rust: module: add `THIS_MODULE` const to `ModuleMetadata` trait
From: Gary Guo @ 2026-06-23 13:46 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci
In-Reply-To: <20260623-fix-fops-owner-v4-2-0daf5f077d5c@linux.dev>

On Tue Jun 23, 2026 at 7:29 AM BST, Alvin Sun wrote:
> Since `const_refs_to_static` has been stable as of the MSRV bump, a
> `ThisModule` pointer can now be used in const contexts.
>
> Add a `THIS_MODULE` const to the `ModuleMetadata` trait so that modules
> can provide their `ThisModule` pointer in const contexts such as static
> `file_operations`.
>
> Add a `this_module()` helper to retrieve the `THIS_MODULE` pointer of a
> given module type, and update `__init` to use it instead of the
> `THIS_MODULE` static generated by the `module!` macro.
>
> The `static THIS_MODULE` generated by the `module!` macro is retained
> for backwards compatibility with existing users and removed in a later
> patch once all references have been migrated.
>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
> ---
>  rust/kernel/module.rs |  8 ++++++++
>  rust/macros/module.rs | 18 +++++++++++++++++-
>  2 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/rust/kernel/module.rs b/rust/kernel/module.rs
> index be242a82e86d2..5aca42f7a33fc 100644
> --- a/rust/kernel/module.rs
> +++ b/rust/kernel/module.rs
> @@ -42,6 +42,14 @@ fn init(module: &'static ThisModule) -> impl pin_init::PinInit<Self, crate::erro
>  pub trait ModuleMetadata {
>      /// The name of the module as specified in the `module!` macro.
>      const NAME: &'static crate::str::CStr;
> +
> +    /// The module's `THIS_MODULE` pointer.
> +    const THIS_MODULE: ThisModule;
> +}
> +
> +/// Returns a reference to the `THIS_MODULE` of the given module type.

This should be marked `#[inline]`.

> +pub const fn this_module<M: ModuleMetadata>() -> &'static ThisModule {
> +    &M::THIS_MODULE
>  }


With the change,

Reviewed-by: Gary Guo <gary@garyguo.net>

>  
>  /// Equivalent to `THIS_MODULE` in the C API.
> diff --git a/rust/macros/module.rs b/rust/macros/module.rs
> index 06c18e2075083..aa9a618d5d19e 100644
> --- a/rust/macros/module.rs
> +++ b/rust/macros/module.rs
> @@ -519,6 +519,22 @@ pub(crate) fn module(info: ModuleInfo) -> Result<TokenStream> {
>  
>          impl ::kernel::ModuleMetadata for #type_ {
>              const NAME: &'static ::kernel::str::CStr = #name_cstr;
> +
> +            #[cfg(MODULE)]
> +            const THIS_MODULE: ::kernel::ThisModule = {
> +                extern "C" {
> +                    static __this_module: ::kernel::types::Opaque<::kernel::bindings::module>;
> +                }
> +
> +                // SAFETY: `__this_module` is constructed by the kernel at load time
> +                // and lives until the module is unloaded.
> +                unsafe { ::kernel::ThisModule::from_ptr(__this_module.get()) }
> +            };
> +
> +            #[cfg(not(MODULE))]
> +            const THIS_MODULE: ::kernel::ThisModule = unsafe {
> +                ::kernel::ThisModule::from_ptr(::core::ptr::null_mut())
> +            };
>          }
>  
>          // Double nested modules, since then nobody can access the public items inside.
> @@ -616,7 +632,7 @@ pub extern "C" fn #ident_exit() {
>                  /// This function must only be called once.
>                  unsafe fn __init() -> ::kernel::ffi::c_int {
>                      let initer = <super::super::LocalModule as ::kernel::InPlaceModule>::init(
> -                        &super::super::THIS_MODULE
> +                        ::kernel::module::this_module::<super::super::LocalModule>()
>                      );
>                      // SAFETY: No data race, since `__MOD` can only be accessed by this module
>                      // and there only `__init` and `__exit` access it. These functions are only



^ permalink raw reply

* [PATCH net 7/7] selftests/xsk: account invalid multi-buffer Tx descriptors
From: Maciej Fijalkowski @ 2026-06-23 13:32 UTC (permalink / raw)
  To: netdev
  Cc: bpf, magnus.karlsson, stfomichev, kuba, pabeni, horms,
	kerneljasonxing, bjorn, Maciej Fijalkowski
In-Reply-To: <20260623133240.1048434-1-maciej.fijalkowski@intel.com>

Invalid descriptors in the middle of a multi-buffer packet still belong
to the packet being consumed from the Tx ring. The tests should therefore
count the whole invalid packet as outstanding in verbatim mode, even
though the packet must not be expected on the Rx side.

Make fragment counting follow the packet boundary instead of stopping at
the first invalid fragment. Update custom stream generation so invalid
middle fragments terminate the generated Rx packet while Tx accounting
still covers all descriptors consumed from the invalid multi-buffer
packet.

Also add explicit end fragments after invalid middle descriptors. This
exercises the kernel drain logic and verifies that subsequent valid
packets are not interpreted as continuations of the invalid packet.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
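Note (illustration only, not part of the patch): the counting rule
described above can be sketched outside of the selftest as below. All
names here (demo_pkt, DEMO_PKT_CONTD, demo_nb_frags) are hypothetical
stand-ins, not the identifiers used in test_xsk.c. The stop_at_invalid
flag is an assumption added only to contrast the previous rule, which
stopped at the first invalid fragment, with the rule from this patch,
which follows the packet boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the "packet continues" descriptor option. */
#define DEMO_PKT_CONTD 1u

/* Hypothetical, reduced view of one Tx descriptor in a verbatim stream. */
struct demo_pkt {
	unsigned int options;
	bool valid;
};

/*
 * Count the descriptors that belong to the packet starting at @start.
 *
 * With stop_at_invalid == false the walk ends only at the first
 * descriptor that does not carry the continuation flag, i.e. at the
 * packet boundary. With stop_at_invalid == true the walk also ends at
 * the first invalid descriptor, which undercounts an invalid
 * multi-buffer packet.
 */
static size_t demo_nb_frags(const struct demo_pkt *pkts, size_t nb,
			    size_t start, bool stop_at_invalid)
{
	size_t i, nb_frags = 0;

	assert(start < nb);

	for (i = start; i < nb; i++) {
		nb_frags++;
		if (!(pkts[i].options & DEMO_PKT_CONTD))
			break;
		if (stop_at_invalid && !pkts[i].valid)
			break;
	}

	return nb_frags;
}
```

For a three-descriptor invalid packet followed by a valid one, the
boundary-following walk reports three descriptors for the first packet,
while the stop-at-invalid walk reports only one.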
 .../selftests/bpf/prog_tests/test_xsk.c       | 24 ++++++++++++-------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
index de1e63c3fdf6..d8a1c0d40e5a 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
@@ -433,14 +433,14 @@ static u32 pkt_nb_frags(u32 frame_size, struct pkt_stream *pkt_stream, struct pk
 	}
 
 	/* Search for the end of the packet in verbatim mode */
-	if (!pkt_continues(pkt->options) || !pkt->valid)
+	if (!pkt_continues(pkt->options))
 		return nb_frags;
 
 	next_frag = pkt_stream->current_pkt_nb;
 	pkt++;
 	while (next_frag++ < pkt_stream->nb_pkts) {
 		nb_frags++;
-		if (!pkt_continues(pkt->options) || !pkt->valid)
+		if (!pkt_continues(pkt->options))
 			break;
 		pkt++;
 	}
@@ -671,11 +671,11 @@ static struct pkt_stream *__pkt_stream_generate_custom(struct ifobject *ifobj, s
 			if (!frame->valid || !pkt_continues(frame->options))
 				payload++;
 		} else {
-			if (frame->valid)
+			if (frame->valid) {
 				len += frame->len;
-			if (frame->valid && pkt_continues(frame->options))
-				continue;
-
+				if (pkt_continues(frame->options))
+					continue;
+			}
 			pkt->pkt_nb = pkt_nb;
 			pkt->len = len;
 			pkt->valid = frame->valid;
@@ -1214,6 +1214,7 @@ static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, b
 	for (i = 0; i < xsk->batch_size; i++) {
 		struct pkt *pkt = pkt_stream_get_next_tx_pkt(pkt_stream);
 		u32 nb_frags_left, nb_frags, bytes_written = 0;
+		struct pkt *first_pkt = pkt;
 
 		if (!pkt)
 			break;
@@ -1258,6 +1259,8 @@ static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, b
 		if (pkt && pkt->valid) {
 			valid_pkts++;
 			valid_frags += nb_frags;
+		} else if (pkt_stream->verbatim && pkt_continues(first_pkt->options)) {
+			valid_frags += nb_frags;
 		}
 	}
 
@@ -2104,13 +2107,16 @@ int testapp_invalid_desc_mb(struct test_spec *test)
 		{0, 0, 0, false, 0},
 		/* Invalid address in the second frame */
 		{0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
-		{umem_sz, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
+		{umem_sz * 2, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
+		{0, MIN_PKT_SIZE, 0, false, 0},
 		/* Invalid len in the middle */
 		{0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
 		{0, XSK_UMEM__INVALID_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
+		{0, MIN_PKT_SIZE, 0, false, 0},
 		/* Invalid options in the middle */
 		{0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
 		{0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XSK_DESC__INVALID_OPTION},
+		{0, MIN_PKT_SIZE, 0, false, 0},
 		/* Transmit 2 frags, receive 3 */
 		{0, XSK_UMEM__MAX_FRAME_SIZE, 0, true, XDP_PKT_CONTD},
 		{0, XSK_UMEM__MAX_FRAME_SIZE, 0, true, 0},
@@ -2122,8 +2128,8 @@ int testapp_invalid_desc_mb(struct test_spec *test)
 
 	if (umem->unaligned_mode) {
 		/* Crossing a chunk boundary allowed */
-		pkts[12].valid = true;
-		pkts[13].valid = true;
+		pkts[15].valid = true;
+		pkts[16].valid = true;
 	}
 
 	test->mtu = MAX_ETH_JUMBO_SIZE;
-- 
2.43.0


^ permalink raw reply related

* [PATCH net 6/7] selftests/xsk: fix too-many-frags multi-buffer Tx test
From: Maciej Fijalkowski @ 2026-06-23 13:32 UTC (permalink / raw)
  To: netdev
  Cc: bpf, magnus.karlsson, stfomichev, kuba, pabeni, horms,
	kerneljasonxing, bjorn, Maciej Fijalkowski
In-Reply-To: <20260623133240.1048434-1-maciej.fijalkowski@intel.com>

The too-many-frags test describes a packet that is valid from the Tx
ring ownership point of view, but invalid for transmission because it
exceeds the supported number of fragments.

Keep the generated Tx descriptors valid so that __send_pkts() accounts
them as outstanding descriptors that must be reclaimed through the CQ.
Then mark the corresponding Rx packet invalid so the test still does
not expect the oversized packet to appear on the receive side.

Add a valid synchronization packet after the oversized packet so the
test can verify that the Tx path drains the bad packet and resumes at
the next packet boundary.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
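Note (illustration only, not part of the patch): a rough sketch of the
descriptor layout that the index arithmetic in the diff below implies.
The names (sketch_desc, SKETCH_PKT_CONTD, sketch_build,
sketch_pkt_frags) are hypothetical. The first two packets (one
single-descriptor packet at index 0 and one packet with max_frags
descriptors) are an assumption, since the hunks do not show that part
of testapp_too_many_frags(); only the oversized packet and the trailing
synchronization packet are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the "packet continues" descriptor option. */
#define SKETCH_PKT_CONTD 1u

struct sketch_desc {
	unsigned int options;
	bool valid;
};

/* 1 + max_frags + (max_frags + 1) + 1 descriptors in total. */
static size_t sketch_nb_descs(size_t max_frags)
{
	return 2 * max_frags + 3;
}

/* Build the layout; the caller frees the returned array. */
static struct sketch_desc *sketch_build(size_t max_frags)
{
	size_t i, nb = sketch_nb_descs(max_frags);
	struct sketch_desc *d;

	assert(max_frags > 0);

	d = calloc(nb, sizeof(*d));
	if (!d)
		return NULL;

	/* Assumed: one single-descriptor packet at the start. */
	d[0].valid = true;

	/* Assumed: one packet with exactly max_frags descriptors. */
	for (i = 1; i < max_frags + 1; i++) {
		d[i].options = SKETCH_PKT_CONTD;
		d[i].valid = true;
	}
	d[max_frags].options = 0;

	/* Oversized packet: max_frags + 1 descriptors, terminated. */
	for (i = max_frags + 1; i < 2 * max_frags + 2; i++) {
		d[i].options = SKETCH_PKT_CONTD;
		d[i].valid = true;
	}
	d[2 * max_frags + 1].options = 0;

	/* Synchronization packet after the oversized one. */
	d[2 * max_frags + 2].valid = true;

	return d;
}

/*
 * Number of descriptors in the pkt_idx-th packet of the layout.
 * Returns 0 if that packet does not exist or is never terminated.
 */
static size_t sketch_pkt_frags(const struct sketch_desc *d, size_t nb,
			       size_t pkt_idx)
{
	size_t i, cur = 0, frags = 0;

	for (i = 0; i < nb; i++) {
		if (cur == pkt_idx)
			frags++;
		if (!(d[i].options & SKETCH_PKT_CONTD)) {
			if (cur == pkt_idx)
				return frags;
			cur++;
		}
	}

	return 0;
}
```

Under these assumptions the oversized packet is the third packet of the
stream (index 2), which would match the Rx-side pkts[2] entry that the
patch marks invalid.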
 .../selftests/bpf/prog_tests/test_xsk.c       | 20 +++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
index 72875071d4f1..de1e63c3fdf6 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
@@ -2258,7 +2258,7 @@ int testapp_too_many_frags(struct test_spec *test)
 		max_frags += 1;
 	}
 
-	pkts = calloc(2 * max_frags + 2, sizeof(struct pkt));
+	pkts = calloc(2 * max_frags + 3, sizeof(struct pkt));
 	if (!pkts)
 		return TEST_FAILURE;
 
@@ -2279,21 +2279,29 @@ int testapp_too_many_frags(struct test_spec *test)
 	/* An invalid packet with the max amount of frags but signals packet
 	 * continues on the last frag
 	 */
-	for (i = max_frags + 1; i < 2 * max_frags + 1; i++) {
+	for (i = max_frags + 1; i < 2 * max_frags + 2; i++) {
 		pkts[i].len = MIN_PKT_SIZE;
 		pkts[i].options = XDP_PKT_CONTD;
-		pkts[i].valid = false;
+		pkts[i].valid = true;
 	}
+	pkts[2 * max_frags + 1].options = 0;
 
 	/* Valid packet for synch */
-	pkts[2 * max_frags + 1].len = MIN_PKT_SIZE;
-	pkts[2 * max_frags + 1].valid = true;
+	pkts[2 * max_frags + 2].len = MIN_PKT_SIZE;
+	pkts[2 * max_frags + 2].valid = true;
 
-	if (pkt_stream_generate_custom(test, pkts, 2 * max_frags + 2)) {
+	if (pkt_stream_generate_custom(test, pkts, 2 * max_frags + 3)) {
 		free(pkts);
 		return TEST_FAILURE;
 	}
 
+	/* The generated Tx stream must keep the too-big packet valid so that
+	 * __send_pkts() accounts its descriptors in outstanding_tx. The Rx
+	 * stream, however, must not expect this packet on the wire.
+	 */
+	test->ifobj_rx->xsk->pkt_stream->pkts[2].valid = false;
+	test->ifobj_rx->xsk->pkt_stream->nb_valid_entries--;
+
 	ret = testapp_validate_traffic(test);
 	free(pkts);
 	return ret;
-- 
2.43.0


^ permalink raw reply related
