From: Tejun Heo <tj@kernel.org>
To: David Vernet <void@manifault.com>
Cc: kernel-team@meta.com, linux-kernel@vger.kernel.org, sched-ext@meta.com
Subject: Re: [PATCH 4/6] sched_ext: bypass mode shouldn't depend on ops.select_cpu()
Date: Thu, 10 Oct 2024 08:26:27 -1000
Message-ID: <ZwgcU6PKOYMP83MC@slm.duckdns.org>
In-Reply-To: <20241010181517.GC28209@maniforge>

Hello,

On Thu, Oct 10, 2024 at 01:15:17PM -0500, David Vernet wrote:
> On Wed, Oct 09, 2024 at 11:41:00AM -1000, Tejun Heo wrote:
> > Bypass mode was depending on ops.select_cpu() which can't be trusted as with
> > the rest of the BPF scheduler. Always enable and use scx_select_cpu_dfl() in
> > bypass mode.
>
> Could you please clarify why we can't trust ops.select_cpu()? Even if it
> returns a bogus, offline, etc, CPU, shouldn't core.c take care of
> finding a valid CPU for us in select_fallback_rq()?

For example, if select_cpu() returns the same CPU for all threads on a
loaded system, that CPU can get heavily overloaded, which can lead to RCU
and workqueue stalls, which can then cascade into other failures.

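To put a rough number on the pile-up, here is a toy userspace model, not
kernel code. Every name in it (pick_same_cpu(), pick_least_loaded(),
max_queue_depth()) is made up for illustration. The least-loaded picker is
only a stand-in for "some sane default"; the real scx_select_cpu_dfl()
works from the idle masks rather than from queue depths.

```c
#include <assert.h>

#define NR_CPUS 8

typedef int (*pick_fn)(const unsigned int *load, int prev_cpu);

/* Hypothetical misbehaving select_cpu(): every task goes to CPU 0. */
static int pick_same_cpu(const unsigned int *load, int prev_cpu)
{
	(void)load;
	(void)prev_cpu;
	return 0;
}

/* Hypothetical sane default: lowest-numbered CPU with the shortest queue. */
static int pick_least_loaded(const unsigned int *load, int prev_cpu)
{
	int best = 0;

	(void)prev_cpu;
	for (int cpu = 1; cpu < NR_CPUS; cpu++)
		if (load[cpu] < load[best])
			best = cpu;
	return best;
}

/* Place nr_tasks tasks using @pick and return the deepest per-CPU queue. */
static unsigned int max_queue_depth(pick_fn pick, unsigned int nr_tasks)
{
	unsigned int load[NR_CPUS] = { 0 };
	unsigned int max = 0;

	for (unsigned int i = 0; i < nr_tasks; i++) {
		int cpu = pick(load, (int)(i % NR_CPUS));

		assert(cpu >= 0 && cpu < NR_CPUS);
		load[cpu]++;
	}
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (load[cpu] > max)
			max = load[cpu];
	return max;
}
```

With 64 tasks on 8 CPUs, max_queue_depth(pick_same_cpu, 64) puts all 64 on
one CPU while max_queue_depth(pick_least_loaded, 64) leaves 8 per CPU. The
model says nothing about stalls themselves; it only shows how a single bad
placement policy concentrates the whole load.
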
> Assuming we really do require a valid CPU here in bypass mode, do we
> need to reset the state of the idle masks for the case of
> !scx_builtin_idle_enabled? The masks won't necessarily reflect the set
> of online CPUs if we haven't been updating it, right?

I think calling resched_cpu() on each CPU after switching it into bypass
mode is enough. That forces the CPU out of the idle state, which clears its
idle bit if set. If the CPU has nothing to run, it goes back into idle and
sets the bit again, so at the end the masks are synchronized.

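The argument can be checked with a toy userspace model, again not kernel
code. toy_resched_cpu(), resync_from(), struct toy_cpu and the single-word
idle_mask are all hypothetical; the only thing taken from the above is the
sequence "leave idle, clear the bit; re-enter idle if there is nothing to
run, set the bit".

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

struct toy_cpu {
	bool has_work;	/* something is runnable on this CPU */
};

static struct toy_cpu cpus[NR_CPUS];
static unsigned int idle_mask;	/* bit set == CPU is believed to be idle */

/* Model of one forced reschedule on @cpu. */
static void toy_resched_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	/* The CPU leaves the idle state: its bit is cleared if it was set. */
	idle_mask &= ~(1u << cpu);

	/* Nothing to run: the CPU goes back into idle and sets the bit. */
	if (!cpus[cpu].has_work)
		idle_mask |= 1u << cpu;
}

/*
 * Start from an arbitrary (possibly stale) idle mask, mark the CPUs in
 * @busy_mask as having work, kick every CPU once and return the mask.
 */
static unsigned int resync_from(unsigned int stale_mask, unsigned int busy_mask)
{
	idle_mask = stale_mask & ((1u << NR_CPUS) - 1);

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpus[cpu].has_work = ((busy_mask >> cpu) & 1u) != 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		toy_resched_cpu(cpu);

	return idle_mask;
}
```

Whatever stale value the mask starts with, after one kick per CPU each bit
depends only on whether that CPU has work, e.g. resync_from(0xff, 0x0f) and
resync_from(0x00, 0x0f) both return 0xf0. The model ignores concurrency and
hotplug, so it only illustrates the ordering argument.
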
Thanks.

--
tejun