Re: [PATCH] sched_ext: sync disable_irq_work in bpf_scx_unreg()

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Cheng-Yang Chou <yphbchou0911@gmail.com>
To: Richard Cheng <icheng@nvidia.com>
Cc: arighi@nvidia.com, mingo@redhat.com, peterz@infradead.org,
	 tj@kernel.org, void@manifault.com, changwoo@igalia.com,
	juri.lelli@redhat.com,  vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
	sched-ext@lists.linux.dev,  linux-kernel@vger.kernel.org,
	newtonl@nvidia.com, kristinc@nvidia.com, kaihengf@nvidia.com,
	 kobak@nvidia.com, Ching-Chun Huang <jserv@ccns.ncku.edu.tw>,
	 Chia-Ping Tsai <chia7712@gmail.com>
Subject: Re: [PATCH] sched_ext: sync disable_irq_work in bpf_scx_unreg()
Date: Wed, 22 Apr 2026 18:51:13 +0800	[thread overview]
Message-ID: <20260422183307.Gf0f8@cchengyang.duckdns.org> (raw)
In-Reply-To: <20260422100938.35781-1-icheng@nvidia.com>

Hi Richard,

On Wed, Apr 22, 2026 at 06:09:38PM +0800, Richard Cheng wrote:
> When unregistered my self-written scx scheduler, the following panic
> occurs [1].

Nit: you've placed the panic log [1] below the --- separator. Content
below this line will not be preserved in the comment msg.
Could you please move it into the commit msg body and send a v2 patch?

> 
> The root cause is that the JIT page backing ops->quiescent() is freed
> before all callers of that function have stopped.
> 
> The expected ordering during teardown is:
>     bitmap_zero(sch->has_op) + synchronize_rcu()
>         -> guarantees no CPU will ever call sch->ops.* again
>     -> only THEN free the BPF struct_ops JIT page
> 
> bpf_scx_unreg() is supposed to enforce the order, but after
> commit f4a6c506d118 ("sched_ext: Always bounce scx_disable() through
> irq_work"), disable_work is no longer queued directly, causing
> kthread_flush_work() to be a noop. Thus, the caller drops the struct_ops
> map too early and poisoned with AARCH64_BREAK_FAULT before
> disable_workfn ever execute.
> 
> So the subsequent dequeue_task() still sees SCX_HAS_OP(sch, quiescent)
> as true and calls ops.quiescent, which hit on the poisoned page and BRK
> panic.
> 
> Fix it by syncing disable_irq_work first, so disable_work is guaranteed
> to be queued before waiting for it.
> 
> Fixes: f4a6c506d118 ("sched_ext: Always bounce scx_disable() through irq_work")
> Signed-off-by: Richard Cheng <icheng@nvidia.com>

Thanks for the fix, and the logic looks correct to me.

Reviewed-by: Cheng-Yang Chou <yphbchou0911@gmail.com>

Also, scx_root_enable_workfn() has the same pattern in its error path:

	scx_error(sch, "scx_root_enable() failed (%d)", ret);
	kthread_flush_work(&sch->disable_work);

The comment above indicates that this flush is meant to "ensure that
error is reported before init completion". However, because scx_error()
goes through scx_vexit() -> irq_work_queue(), the flush can be a no-op
here as well. The same applies to the sub-scheduler enable error path.

Should those be fixed in the same patch? Tejun, Andrea, wdyt? Thanks

[...]
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 012ca8bd70fb..065660382a0c 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -7349,6 +7349,12 @@ static void bpf_scx_unreg(void *kdata, struct bpf_link *link)
>  	struct scx_sched *sch = rcu_dereference_protected(ops->priv, true);
>  
>  	scx_disable(sch, SCX_EXIT_UNREG);
> +	/*
> +	 * sch->disable_work might still not queued, causing kthread_flush_work()
> +	 * as a noop. Syncing the irq_work first is required to guarantee the

Perhaps s/noop/no-op/? Though it's just a matter of taste. ^_^

> +	 * kthread work has been queued before waiting for it.
> +	 */
> +	irq_work_sync(&sch->disable_irq_work);
>  	kthread_flush_work(&sch->disable_work);
>  	RCU_INIT_POINTER(ops->priv, NULL);
>  	kobject_put(&sch->kobj);
> -- 
> 2.43.0
> 
> 

-- 
Cheers,
Cheng-Yang

next prev parent reply	other threads:[~2026-04-22 10:51 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-22 10:09 [PATCH] sched_ext: sync disable_irq_work in bpf_scx_unreg() Richard Cheng
2026-04-22 10:46 ` Andrea Righi
2026-04-22 10:51 ` Cheng-Yang Chou [this message]
2026-04-22 17:24   ` Tejun Heo
2026-04-24  3:31     ` Richard Cheng
2026-04-24  3:29   ` Richard Cheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260422183307.Gf0f8@cchengyang.duckdns.org \
    --to=yphbchou0911@gmail.com \
    --cc=arighi@nvidia.com \
    --cc=bsegall@google.com \
    --cc=changwoo@igalia.com \
    --cc=chia7712@gmail.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=icheng@nvidia.com \
    --cc=jserv@ccns.ncku.edu.tw \
    --cc=juri.lelli@redhat.com \
    --cc=kaihengf@nvidia.com \
    --cc=kobak@nvidia.com \
    --cc=kristinc@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=newtonl@nvidia.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sched-ext@lists.linux.dev \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=void@manifault.com \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox