public inbox for linux-kernel@vger.kernel.org
From: Andrea Righi <arighi@nvidia.com>
To: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCHSET v5 sched_ext/for-6.15] sched_ext: Enhance built-in idle selection with allowed CPUs
Date: Thu, 20 Mar 2025 16:33:51 +0100
Message-ID: <Z9w1X_GDIYV1CmIs@gpd3>
In-Reply-To: <778d935f-ef77-4bac-aeff-1bafa91b825e@nvidia.com>

On Thu, Mar 20, 2025 at 03:05:37PM +0100, Joel Fernandes wrote:
> On 3/20/2025 8:36 AM, Andrea Righi wrote:
...
> > Example usage
> > =============
> > 
> > s32 BPF_STRUCT_OPS(foo_select_cpu, struct task_struct *p,
> > 		   s32 prev_cpu, u64 wake_flags)
> > {
> > 	const struct cpumask *cpus = task_allowed_cpus(p) ?: p->cpus_ptr;
> > 	s32 cpu;
> 
> Andrea, I'm curious why this expression can't simply be moved into the default
> select implementation? Then, for those that need a more custom mask, we can
> do the scx_bpf_select_cpu_and() as a second step.

Yeah, maybe the example could be improved a bit. Basically I'm doing
task_allowed_cpus(p) ?: p->cpus_ptr to highlight that you can't pass NULL
as the extra "and" cpumask (otherwise the verifier won't be happy).

Also, if you call the old scx_bpf_select_cpu_dfl(), the internal logic
already uses the same backend as scx_bpf_select_cpu_and(), passing
p->cpus_ptr as @cpus_allowed.
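
To make the two points above a bit more concrete, here is a small
userspace C model. It is only a sketch, not the kernel code: struct task,
allowed_or_affinity(), select_cpu_and() and select_cpu_dfl() are made-up
names, cpumasks are modeled as 16-bit integers, and prev_cpu, wake flags
and topology are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for task_struct: only the affinity mask matters. */
struct task {
	uint16_t cpus;	/* models *p->cpus_ptr, one bit per CPU */
};

/* Models "task_allowed_cpus(p) ?: p->cpus_ptr": never hand NULL to the helper. */
static const uint16_t *allowed_or_affinity(const struct task *p,
					   const uint16_t *extra)
{
	return extra ? extra : &p->cpus;
}

/*
 * Toy version of the "and" selection: lowest idle CPU that is in both the
 * task's affinity and *allowed, or -1 if there is none. @allowed must not
 * be NULL.
 */
static int select_cpu_and(const struct task *p, uint16_t idle,
			  const uint16_t *allowed)
{
	uint16_t candidates = idle & p->cpus & *allowed;
	int cpu;

	for (cpu = 0; cpu < 16; cpu++)
		if (candidates & (1u << cpu))
			return cpu;
	return -1;
}

/* The "dfl" variant is the same backend with the task's own affinity. */
static int select_cpu_dfl(const struct task *p, uint16_t idle)
{
	return select_cpu_and(p, idle, &p->cpus);
}
```

In this model, a task allowed on all 16 CPUs with every CPU idle and 0xff00
as the extra mask gets CPU 8; without an extra mask the fallback makes the
"and" variant behave exactly like the "dfl" one.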

> 
> Also, I think I'm missing something: what is the motivation in the existing
> code for not doing LLC/NUMA-only scans if the task is restrained? Thanks for
> clarifying.

You can use the "flags" argument to restrict the selection to the current
node, by setting SCX_PICK_IDLE_IN_NODE.

We don't have a SCX_PICK_IDLE_IN_LLC flag yet (it'd be nice to introduce
it), so for now the only way to restrict the selection to the current LLC
is to use the additional "and" cpumask (@cpus_allowed), passing the LLC
span.
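
For illustration, restricting the pick to the LLC of prev_cpu through the
"and" mask could look like the userspace model below. It's only a sketch:
llc_span() and pick_idle_in_llc() are made-up names, cpumasks are 16-bit
integers, and the layout (4 consecutive CPUs per LLC) is an assumption for
the example, not something the kernel guarantees.

```c
#include <stdint.h>

/* Assumed topology for this model: 16 CPUs, 4 consecutive CPUs per LLC. */
static uint16_t llc_span(int cpu)
{
	return (uint16_t)(0xfu << ((cpu / 4) * 4));
}

/*
 * Pick the lowest idle CPU usable by the task within the LLC of prev_cpu.
 * The LLC span plays the role of the extra "and" mask; returns -1 when the
 * intersection is empty, so the caller can fall back to something else.
 */
static int pick_idle_in_llc(uint16_t idle, uint16_t task_cpus, int prev_cpu)
{
	uint16_t candidates = idle & task_cpus & llc_span(prev_cpu);
	int cpu;

	for (cpu = 0; cpu < 16; cpu++)
		if (candidates & (1u << cpu))
			return cpu;
	return -1;
}
```

A negative result here just means "no idle CPU in that LLC", in the same way
the example in the cover letter falls back to prev_cpu when the helper
doesn't find anything.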

Thanks,
-Andrea

> 
> thanks,
> 
>  - Joel
> 
> 
> 
> > 
> > 	cpu = scx_bpf_select_cpu_and(p, prev_cpu, wake_flags, cpus, 0);
> > 	if (cpu >= 0) {
> > 		scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
> > 		return cpu;
> > 	}
> > 
> > 	return prev_cpu;
> > }
> > 
> > Results
> > =======
> > 
> > Load distribution on a 4 sockets / 4 cores per socket system, simulated
> > using virtme-ng, running a modified version of scx_bpfland that uses the
> > new helper scx_bpf_select_cpu_and() and 0xff00 as allowed domain:
> > 
> >      $ vng --cpu 16,sockets=4,cores=4,threads=1
> >      ...
> >      $ stress-ng -c 16
> >      ...
> >      $ htop
> >      ...
> >        0[                         0.0%]   8[||||||||||||||||||||||||100.0%]
> >        1[                         0.0%]   9[||||||||||||||||||||||||100.0%]
> >        2[                         0.0%]  10[||||||||||||||||||||||||100.0%]
> >        3[                         0.0%]  11[||||||||||||||||||||||||100.0%]
> >        4[                         0.0%]  12[||||||||||||||||||||||||100.0%]
> >        5[                         0.0%]  13[||||||||||||||||||||||||100.0%]
> >        6[                         0.0%]  14[||||||||||||||||||||||||100.0%]
> >        7[                         0.0%]  15[||||||||||||||||||||||||100.0%]
> > 
> > With scx_bpf_select_cpu_dfl() tasks would be distributed evenly across all
> > the available CPUs.
> > 
> > ChangeLog v4 -> v5:
> >  - simplify the code to compute (and) task's temporary cpumasks
> > 
> > ChangeLog v3 -> v4:
> >  - keep p->nr_cpus_allowed optimizations (skip cpumask operations when the
> >    task can run on all CPUs)
> >  - allow to call scx_bpf_select_cpu_and() also from ops.enqueue() and
> >    modify the kselftest to cover this case as well
> >  - rebase to the latest sched_ext/for-6.15
> > 
> > ChangeLog v2 -> v3:
> >  - incrementally refactor scx_select_cpu_dfl() to accept idle flags and an
> >    arbitrary allowed cpumask
> >  - build scx_bpf_select_cpu_and() on top of the existing logic
> >  - re-arrange scx_select_cpu_dfl() prototype, aligning the first three
> >    arguments with select_task_rq()
> >  - do not use "domain" for the allowed cpumask to avoid potential ambiguity
> >    with sched_domain
> > 
> > ChangeLog v1 -> v2:
> >   - rename scx_bpf_select_cpu_pref() to scx_bpf_select_cpu_and() and always
> >     select idle CPUs strictly within the allowed domain
> >   - rename preferred CPUs -> allowed CPU
> >   - drop %SCX_PICK_IDLE_IN_PREF (not required anymore)
> >   - deprecate scx_bpf_select_cpu_dfl() in favor of scx_bpf_select_cpu_and()
> >     and provide all the required backward compatibility boilerplate
> > 
> > Andrea Righi (6):
> >       sched_ext: idle: Extend topology optimizations to all tasks
> >       sched_ext: idle: Explicitly pass allowed cpumask to scx_select_cpu_dfl()
> >       sched_ext: idle: Accept an arbitrary cpumask in scx_select_cpu_dfl()
> >       sched_ext: idle: Introduce scx_bpf_select_cpu_and()
> >       selftests/sched_ext: Add test for scx_bpf_select_cpu_and()
> >       sched_ext: idle: Deprecate scx_bpf_select_cpu_dfl()
> > 
> >  Documentation/scheduler/sched-ext.rst              |  11 +-
> >  kernel/sched/ext.c                                 |   6 +-
> >  kernel/sched/ext_idle.c                            | 196 ++++++++++++++++-----
> >  kernel/sched/ext_idle.h                            |   3 +-
> >  tools/sched_ext/include/scx/common.bpf.h           |   5 +-
> >  tools/sched_ext/include/scx/compat.bpf.h           |  37 ++++
> >  tools/sched_ext/scx_flatcg.bpf.c                   |  12 +-
> >  tools/sched_ext/scx_simple.bpf.c                   |   9 +-
> >  tools/testing/selftests/sched_ext/Makefile         |   1 +
> >  .../testing/selftests/sched_ext/allowed_cpus.bpf.c | 121 +++++++++++++
> >  tools/testing/selftests/sched_ext/allowed_cpus.c   |  57 ++++++
> >  .../selftests/sched_ext/enq_select_cpu_fails.bpf.c |  12 +-
> >  .../selftests/sched_ext/enq_select_cpu_fails.c     |   2 +-
> >  tools/testing/selftests/sched_ext/exit.bpf.c       |   6 +-
> >  .../sched_ext/select_cpu_dfl_nodispatch.bpf.c      |  13 +-
> >  .../sched_ext/select_cpu_dfl_nodispatch.c          |   2 +-
> >  16 files changed, 404 insertions(+), 89 deletions(-)
> >  create mode 100644 tools/testing/selftests/sched_ext/allowed_cpus.bpf.c
> >  create mode 100644 tools/testing/selftests/sched_ext/allowed_cpus.c
> 

Thread overview: 13+ messages
2025-03-20  7:36 [PATCHSET v5 sched_ext/for-6.15] sched_ext: Enhance built-in idle selection with allowed CPUs Andrea Righi
2025-03-20  7:36 ` [PATCH 1/6] sched_ext: idle: Extend topology optimizations to all tasks Andrea Righi
2025-03-20 16:49   ` Tejun Heo
2025-03-20 22:08     ` Andrea Righi
2025-03-20  7:36 ` [PATCH 2/6] sched_ext: idle: Explicitly pass allowed cpumask to scx_select_cpu_dfl() Andrea Righi
2025-03-21 10:15   ` changwoo
2025-03-21 22:02     ` Andrea Righi
2025-03-20  7:36 ` [PATCH 3/6] sched_ext: idle: Accept an arbitrary cpumask in scx_select_cpu_dfl() Andrea Righi
2025-03-20  7:36 ` [PATCH 4/6] sched_ext: idle: Introduce scx_bpf_select_cpu_and() Andrea Righi
2025-03-20  7:36 ` [PATCH 5/6] selftests/sched_ext: Add test for scx_bpf_select_cpu_and() Andrea Righi
2025-03-20  7:36 ` [PATCH 6/6] sched_ext: idle: Deprecate scx_bpf_select_cpu_dfl() Andrea Righi
2025-03-20 14:05 ` [PATCHSET v5 sched_ext/for-6.15] sched_ext: Enhance built-in idle selection with allowed CPUs Joel Fernandes
2025-03-20 15:33   ` Andrea Righi [this message]
