Re: [PATCHSET v5 sched_ext/for-6.15] sched_ext: Enhance built-in idle selection with allowed CPUs

From: Andrea Righi <arighi@nvidia.com>
To: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCHSET v5 sched_ext/for-6.15] sched_ext: Enhance built-in idle selection with allowed CPUs
Date: Thu, 20 Mar 2025 16:33:51 +0100
Message-ID: <Z9w1X_GDIYV1CmIs@gpd3>
In-Reply-To: <778d935f-ef77-4bac-aeff-1bafa91b825e@nvidia.com>

On Thu, Mar 20, 2025 at 03:05:37PM +0100, Joel Fernandes wrote:
> On 3/20/2025 8:36 AM, Andrea Righi wrote:
...
> > Example usage
> > =============
> > 
> > s32 BPF_STRUCT_OPS(foo_select_cpu, struct task_struct *p,
> > 		   s32 prev_cpu, u64 wake_flags)
> > {
> > 	const struct cpumask *cpus = task_allowed_cpus(p) ?: p->cpus_ptr;
> > 	s32 cpu;
> 
> Andrea, I'm curious why this expression can't simply be moved into the default
> select implementation. Then, for those that need a more custom mask, we can
> do the scx_bpf_select_cpu_and() as a second step.

Yeah, the example could probably be improved a bit. I'm using
task_allowed_cpus(p) ?: p->cpus_ptr mainly to highlight that you can't pass
NULL as the extra "and" cpumask (the verifier would reject it).

Also, if you call the old scx_bpf_select_cpu_dfl(), it already goes through
the same backend as scx_bpf_select_cpu_and(), passing p->cpus_ptr as
@cpus_allowed.
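
To make the two points above concrete, here is a minimal userspace sketch.
It is a toy model, not kernel or BPF code: CPUs are bits in a 64-bit mask,
and struct toy_task, toy_select_cpu_and(), toy_select_cpu_dfl() and
toy_pick() are hypothetical stand-ins invented for this illustration. The
real idle-selection policy is reduced to "prev_cpu first, then the lowest
idle bit", and the toy helpers simply return -1 when nothing is idle, which
simplifies the real return conventions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy task: only the affinity mask matters (stand-in for p->cpus_ptr). */
struct toy_task {
	uint64_t cpus_ptr;
};

/*
 * Toy stand-in for scx_bpf_select_cpu_and(): pick an idle CPU that is in
 * both the task's affinity mask and *allowed. Returns -1 if none is idle.
 * A NULL @allowed is refused at run time here; in the real case the
 * rejection would come from the verifier at load time.
 */
static int toy_select_cpu_and(const struct toy_task *p, int prev_cpu,
			      uint64_t idle, const uint64_t *allowed)
{
	uint64_t candidates;

	assert(allowed != NULL);
	candidates = idle & p->cpus_ptr & *allowed;

	/* Prefer the previous CPU when it is still a valid candidate. */
	if (prev_cpu >= 0 && prev_cpu < 64 && (candidates & (1ULL << prev_cpu)))
		return prev_cpu;

	/* Otherwise take the lowest-numbered candidate. */
	for (int cpu = 0; cpu < 64; cpu++)
		if (candidates & (1ULL << cpu))
			return cpu;

	return -1;
}

/* Toy stand-in for scx_bpf_select_cpu_dfl(): same backend, task's own mask. */
static int toy_select_cpu_dfl(const struct toy_task *p, int prev_cpu,
			      uint64_t idle)
{
	return toy_select_cpu_and(p, prev_cpu, idle, &p->cpus_ptr);
}

/* Caller-side fallback from the example: never hand NULL to the helper. */
static int toy_pick(const struct toy_task *p, int prev_cpu, uint64_t idle,
		    const uint64_t *extra)
{
	const uint64_t *cpus = extra ? extra : &p->cpus_ptr;

	return toy_select_cpu_and(p, prev_cpu, idle, cpus);
}
```

In this model, calling toy_pick() with a NULL extra mask gives the same
result as toy_select_cpu_dfl(), which is the equivalence described above.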

> 
> Also, I think I'm missing something: what is the motivation in the existing
> code for not doing LLC/NUMA-only scans if the task is restricted? Thanks for
> clarifying.

You can use the "flags" argument to restrict the selection to the current
node by setting SCX_PICK_IDLE_IN_NODE.

We don't have a SCX_PICK_IDLE_IN_LLC flag yet (it'd be nice to introduce
one), so for now the only way to restrict the selection to the current LLC
is to pass the LLC span as the additional "and" cpumask (@cpus_allowed).
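
As a rough illustration of the difference between the two approaches, here
is another userspace toy model (again not kernel or BPF code). The topology
is an assumption made up for the sketch: 16 CPUs, 4 CPUs per LLC, 8 CPUs per
node. TOY_PICK_IDLE_IN_NODE is a hypothetical flag standing in for
SCX_PICK_IDLE_IN_NODE, and all toy_*() helpers are invented names.

```c
#include <stdint.h>

#define TOY_NR_CPUS		16
#define TOY_CPUS_PER_LLC	4
#define TOY_CPUS_PER_NODE	8

/* Hypothetical flag, standing in for SCX_PICK_IDLE_IN_NODE. */
#define TOY_PICK_IDLE_IN_NODE	(1ULL << 0)

/* Mask of the CPUs sharing an LLC with @cpu in the toy topology. */
static uint64_t toy_llc_span(int cpu)
{
	int first = cpu - cpu % TOY_CPUS_PER_LLC;

	return ((1ULL << TOY_CPUS_PER_LLC) - 1) << first;
}

/* Mask of the CPUs in the same node as @cpu in the toy topology. */
static uint64_t toy_node_span(int cpu)
{
	int first = cpu - cpu % TOY_CPUS_PER_NODE;

	return ((1ULL << TOY_CPUS_PER_NODE) - 1) << first;
}

/*
 * Pick an idle CPU within @allowed. With TOY_PICK_IDLE_IN_NODE the search
 * is also confined to prev_cpu's node. @prev_cpu must be a valid CPU in
 * [0, TOY_NR_CPUS). Returns -1 if no candidate is idle.
 */
static int toy_select_cpu_and(int prev_cpu, uint64_t idle, uint64_t allowed,
			      uint64_t flags)
{
	uint64_t candidates = idle & allowed;

	if (flags & TOY_PICK_IDLE_IN_NODE)
		candidates &= toy_node_span(prev_cpu);

	/* Prefer the previous CPU when it is still a valid candidate. */
	if (candidates & (1ULL << prev_cpu))
		return prev_cpu;

	/* Otherwise take the lowest-numbered candidate. */
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (candidates & (1ULL << cpu))
			return cpu;

	return -1;
}
```

With this model, node-local selection is a matter of setting the flag, while
LLC-local selection is done by passing toy_llc_span(prev_cpu) as the allowed
mask, e.g. toy_select_cpu_and(prev_cpu, idle, toy_llc_span(prev_cpu), 0).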

Thanks,
-Andrea

> 
> thanks,
> 
>  - Joel
> 
> 
> 
> > 
> > 	cpu = scx_bpf_select_cpu_and(p, prev_cpu, wake_flags, cpus, 0);
> > 	if (cpu >= 0) {
> > 		scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
> > 		return cpu;
> > 	}
> > 
> > 	return prev_cpu;
> > }
> > 
> > Results
> > =======
> > 
> > Load distribution on a 4 sockets / 4 cores per socket system, simulated
> > using virtme-ng, running a modified version of scx_bpfland that uses the
> > new helper scx_bpf_select_cpu_and() and 0xff00 as allowed domain:
> > 
> >      $ vng --cpu 16,sockets=4,cores=4,threads=1
> >      ...
> >      $ stress-ng -c 16
> >      ...
> >      $ htop
> >      ...
> >        0[                         0.0%]   8[||||||||||||||||||||||||100.0%]
> >        1[                         0.0%]   9[||||||||||||||||||||||||100.0%]
> >        2[                         0.0%]  10[||||||||||||||||||||||||100.0%]
> >        3[                         0.0%]  11[||||||||||||||||||||||||100.0%]
> >        4[                         0.0%]  12[||||||||||||||||||||||||100.0%]
> >        5[                         0.0%]  13[||||||||||||||||||||||||100.0%]
> >        6[                         0.0%]  14[||||||||||||||||||||||||100.0%]
> >        7[                         0.0%]  15[||||||||||||||||||||||||100.0%]
> > 
> > With scx_bpf_select_cpu_dfl() tasks would be distributed evenly across all
> > the available CPUs.
> > 
> > ChangeLog v4 -> v5:
> >  - simplify the code to compute (and) task's temporary cpumasks
> > 
> > ChangeLog v3 -> v4:
> >  - keep p->nr_cpus_allowed optimizations (skip cpumask operations when the
> >    task can run on all CPUs)
> >  - allow calling scx_bpf_select_cpu_and() from ops.enqueue() as well and
> >    modify the kselftest to cover this case
> >  - rebase to the latest sched_ext/for-6.15
> > 
> > ChangeLog v2 -> v3:
> >  - incrementally refactor scx_select_cpu_dfl() to accept idle flags and an
> >    arbitrary allowed cpumask
> >  - build scx_bpf_select_cpu_and() on top of the existing logic
> >  - re-arrange scx_select_cpu_dfl() prototype, aligning the first three
> >    arguments with select_task_rq()
> >  - do not use "domain" for the allowed cpumask to avoid potential ambiguity
> >    with sched_domain
> > 
> > ChangeLog v1 -> v2:
> >   - rename scx_bpf_select_cpu_pref() to scx_bpf_select_cpu_and() and always
> >     select idle CPUs strictly within the allowed domain
> >   - rename preferred CPUs -> allowed CPUs
> >   - drop %SCX_PICK_IDLE_IN_PREF (not required anymore)
> >   - deprecate scx_bpf_select_cpu_dfl() in favor of scx_bpf_select_cpu_and()
> >     and provide all the required backward compatibility boilerplate
> > 
> > Andrea Righi (6):
> >       sched_ext: idle: Extend topology optimizations to all tasks
> >       sched_ext: idle: Explicitly pass allowed cpumask to scx_select_cpu_dfl()
> >       sched_ext: idle: Accept an arbitrary cpumask in scx_select_cpu_dfl()
> >       sched_ext: idle: Introduce scx_bpf_select_cpu_and()
> >       selftests/sched_ext: Add test for scx_bpf_select_cpu_and()
> >       sched_ext: idle: Deprecate scx_bpf_select_cpu_dfl()
> > 
> >  Documentation/scheduler/sched-ext.rst              |  11 +-
> >  kernel/sched/ext.c                                 |   6 +-
> >  kernel/sched/ext_idle.c                            | 196 ++++++++++++++++-----
> >  kernel/sched/ext_idle.h                            |   3 +-
> >  tools/sched_ext/include/scx/common.bpf.h           |   5 +-
> >  tools/sched_ext/include/scx/compat.bpf.h           |  37 ++++
> >  tools/sched_ext/scx_flatcg.bpf.c                   |  12 +-
> >  tools/sched_ext/scx_simple.bpf.c                   |   9 +-
> >  tools/testing/selftests/sched_ext/Makefile         |   1 +
> >  .../testing/selftests/sched_ext/allowed_cpus.bpf.c | 121 +++++++++++++
> >  tools/testing/selftests/sched_ext/allowed_cpus.c   |  57 ++++++
> >  .../selftests/sched_ext/enq_select_cpu_fails.bpf.c |  12 +-
> >  .../selftests/sched_ext/enq_select_cpu_fails.c     |   2 +-
> >  tools/testing/selftests/sched_ext/exit.bpf.c       |   6 +-
> >  .../sched_ext/select_cpu_dfl_nodispatch.bpf.c      |  13 +-
> >  .../sched_ext/select_cpu_dfl_nodispatch.c          |   2 +-
> >  16 files changed, 404 insertions(+), 89 deletions(-)
> >  create mode 100644 tools/testing/selftests/sched_ext/allowed_cpus.bpf.c
> >  create mode 100644 tools/testing/selftests/sched_ext/allowed_cpus.c
>

Thread overview: 13+ messages
2025-03-20  7:36 [PATCHSET v5 sched_ext/for-6.15] sched_ext: Enhance built-in idle selection with allowed CPUs Andrea Righi
2025-03-20  7:36 ` [PATCH 1/6] sched_ext: idle: Extend topology optimizations to all tasks Andrea Righi
2025-03-20 16:49   ` Tejun Heo
2025-03-20 22:08     ` Andrea Righi
2025-03-20  7:36 ` [PATCH 2/6] sched_ext: idle: Explicitly pass allowed cpumask to scx_select_cpu_dfl() Andrea Righi
2025-03-21 10:15   ` changwoo
2025-03-21 22:02     ` Andrea Righi
2025-03-20  7:36 ` [PATCH 3/6] sched_ext: idle: Accept an arbitrary cpumask in scx_select_cpu_dfl() Andrea Righi
2025-03-20  7:36 ` [PATCH 4/6] sched_ext: idle: Introduce scx_bpf_select_cpu_and() Andrea Righi
2025-03-20  7:36 ` [PATCH 5/6] selftests/sched_ext: Add test for scx_bpf_select_cpu_and() Andrea Righi
2025-03-20  7:36 ` [PATCH 6/6] sched_ext: idle: Deprecate scx_bpf_select_cpu_dfl() Andrea Righi
2025-03-20 14:05 ` [PATCHSET v5 sched_ext/for-6.15] sched_ext: Enhance built-in idle selection with allowed CPUs Joel Fernandes
2025-03-20 15:33   ` Andrea Righi [this message]
