From: David Vernet <void@manifault.com>
To: Tejun Heo <tj@kernel.org>
Cc: kernel-team@meta.com, linux-kernel@vger.kernel.org, sched-ext@meta.com
Subject: Re: [PATCH 4/6] sched_ext: bypass mode shouldn't depend on ops.select_cpu()
Date: Thu, 10 Oct 2024 13:15:17 -0500 [thread overview]
Message-ID: <20241010181517.GC28209@maniforge> (raw)
In-Reply-To: <20241009214411.681233-5-tj@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 3608 bytes --]
On Wed, Oct 09, 2024 at 11:41:00AM -1000, Tejun Heo wrote:
Hi Tejun,
> Bypass mode was depending on ops.select_cpu() which can't be trusted as with
> the rest of the BPF scheduler. Always enable and use scx_select_cpu_dfl() in
> bypass mode.
Could you please clarify why we can't trust ops.select_cpu()? Even if it
returns a bogus, offline, etc, CPU, shouldn't core.c take care of
finding a valid CPU for us in select_fallback_rq()?
Assuming we really do require a valid CPU here in bypass mode, do we
need to reset the state of the idle masks for the case of
!scx_builtin_idle_enabled? The masks won't necessarily reflect the set
of online CPUs if we haven't been updating it, right?
Thanks,
David
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> kernel/sched/ext.c | 28 +++++++++++++++-------------
> 1 file changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 1cae18000de1..c5cda7368de5 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -3126,7 +3126,7 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
> if (unlikely(wake_flags & WF_EXEC))
> return prev_cpu;
>
> - if (SCX_HAS_OP(select_cpu)) {
> + if (SCX_HAS_OP(select_cpu) && !scx_rq_bypassing(task_rq(p))) {
> s32 cpu;
> struct task_struct **ddsp_taskp;
>
> @@ -3191,7 +3191,7 @@ void __scx_update_idle(struct rq *rq, bool idle)
> {
> int cpu = cpu_of(rq);
>
> - if (SCX_HAS_OP(update_idle)) {
> + if (SCX_HAS_OP(update_idle) && !scx_rq_bypassing(rq)) {
> SCX_CALL_OP(SCX_KF_REST, update_idle, cpu_of(rq), idle);
> if (!static_branch_unlikely(&scx_builtin_idle_enabled))
> return;
> @@ -4254,21 +4254,23 @@ bool task_should_scx(struct task_struct *p)
> * the DISABLING state and then cycling the queued tasks through dequeue/enqueue
> * to force global FIFO scheduling.
> *
> - * a. ops.enqueue() is ignored and tasks are queued in simple global FIFO order.
> - * %SCX_OPS_ENQ_LAST is also ignored.
> + * - ops.select_cpu() is ignored and the default select_cpu() is used.
> *
> - * b. ops.dispatch() is ignored.
> + * - ops.enqueue() is ignored and tasks are queued in simple global FIFO order.
> + * %SCX_OPS_ENQ_LAST is also ignored.
> *
> - * c. balance_scx() does not set %SCX_RQ_BAL_KEEP on non-zero slice as slice
> - * can't be trusted. Whenever a tick triggers, the running task is rotated to
> - * the tail of the queue with core_sched_at touched.
> + * - ops.dispatch() is ignored.
> *
> - * d. pick_next_task() suppresses zero slice warning.
> + * - balance_scx() does not set %SCX_RQ_BAL_KEEP on non-zero slice as slice
> + * can't be trusted. Whenever a tick triggers, the running task is rotated to
> + * the tail of the queue with core_sched_at touched.
> *
> - * e. scx_bpf_kick_cpu() is disabled to avoid irq_work malfunction during PM
> - * operations.
> + * - pick_next_task() suppresses zero slice warning.
> *
> - * f. scx_prio_less() reverts to the default core_sched_at order.
> + * - scx_bpf_kick_cpu() is disabled to avoid irq_work malfunction during PM
> + * operations.
> + *
> + * - scx_prio_less() reverts to the default core_sched_at order.
> */
> static void scx_ops_bypass(bool bypass)
> {
> @@ -4338,7 +4340,7 @@ static void scx_ops_bypass(bool bypass)
>
> rq_unlock_irqrestore(rq, &rf);
>
> - /* kick to restore ticks */
> + /* resched to restore ticks and idle state */
> resched_cpu(cpu);
> }
> }
> --
> 2.46.2
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
next prev parent reply other threads:[~2024-10-10 18:15 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-09 21:40 [PATCHSET sched_ext/for-6.12-fixes] sched_ext: Fix RCU and other stalls while iterating tasks during enable/disable Tejun Heo
2024-10-09 21:40 ` [PATCH 1/6] Revert "sched_ext: Use shorter slice while bypassing" Tejun Heo
2024-10-10 17:59 ` David Vernet
2024-10-09 21:40 ` [PATCH 2/6] sched_ext: Start schedulers with consistent p->scx.slice values Tejun Heo
2024-10-10 18:00 ` David Vernet
2024-10-09 21:40 ` [PATCH 3/6] sched_ext: Move scx_buildin_idle_enabled check to scx_bpf_select_cpu_dfl() Tejun Heo
2024-10-09 21:41 ` [PATCH 4/6] sched_ext: bypass mode shouldn't depend on ops.select_cpu() Tejun Heo
2024-10-10 18:15 ` David Vernet [this message]
2024-10-10 18:26 ` Tejun Heo
2024-10-10 18:31 ` David Vernet
2024-10-09 21:41 ` [PATCH 5/6] sched_ext: Move scx_tasks_lock handling into scx_task_iter helpers Tejun Heo
2024-10-10 18:36 ` David Vernet
2024-10-09 21:41 ` [PATCH 6/6] sched_ext: Don't hold scx_tasks_lock for too long Tejun Heo
2024-10-10 19:12 ` David Vernet
2024-10-10 21:38 ` Tejun Heo
2024-10-10 23:38 ` Waiman Long
2024-10-10 21:43 ` [PATCHSET sched_ext/for-6.12-fixes] sched_ext: Fix RCU and other stalls while iterating tasks during enable/disable Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241010181517.GC28209@maniforge \
--to=void@manifault.com \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=sched-ext@meta.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.