Re: [PATCHSET v6 sched_ext/for-6.15] sched_ext: Enhance built-in idle selection with allowed CPUs

From: Changwoo Min <changwoo@igalia.com>
To: Andrea Righi <arighi@nvidia.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCHSET v6 sched_ext/for-6.15] sched_ext: Enhance built-in idle selection with allowed CPUs
Date: Sat, 22 Mar 2025 12:56:11 +0900
Message-ID: <36366fd3e9bc730300cc2262687c890f@igalia.com>
In-Reply-To: <20250321221454.298202-1-arighi@nvidia.com>

Hi Andrea,

Looks great to me.

Thanks!
Changwoo Min

On 2025-03-22 07:10, Andrea Righi wrote:
> Many scx schedulers implement their own hard or soft-affinity rules to
> support topology characteristics, such as heterogeneous architectures
> (e.g., big.LITTLE, P-cores/E-cores), or to categorize tasks based on
> specific properties (e.g., running certain tasks only in a subset of CPUs).
> 
> Currently, there is no mechanism to apply the built-in idle CPU
> selection policy to an arbitrary subset of CPUs. As a result, schedulers
> often implement their own idle CPU selection policies, which are typically
> similar to one another, leading to a lot of code duplication.
> 
> To address this, extend the built-in idle CPU selection policy by
> introducing the concept of allowed CPUs.
> 
> With this concept, BPF schedulers can apply the built-in idle CPU selection
> policy to a subset of allowed CPUs, allowing them to implement their own
> hard/soft-affinity rules while still using the topology optimizations of
> the built-in policy, preventing code duplication across different
> schedulers.
> 
> To implement this, introduce a new helper kfunc, scx_bpf_select_cpu_and(),
> that accepts a cpumask of allowed CPUs:
> 
> s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu,
> 			   u64 wake_flags,
> 			   const struct cpumask *cpus_allowed, u64 flags);
> 
> Example usage
> =============
> 
> s32 BPF_STRUCT_OPS(foo_select_cpu, struct task_struct *p,
> 		   s32 prev_cpu, u64 wake_flags)
> {
> 	const struct cpumask *cpus = task_allowed_cpus(p) ?: p->cpus_ptr;
> 	s32 cpu;
> 
> 	cpu = scx_bpf_select_cpu_and(p, prev_cpu, wake_flags, cpus, 0);
> 	if (cpu >= 0) {
> 		scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
> 		return cpu;
> 	}
> 
> 	return prev_cpu;
> }
> 
> Results
> =======
> 
> Load distribution on a 4 sockets / 4 cores per socket system, simulated
> using virtme-ng, running a modified version of scx_bpfland that uses the
> new helper scx_bpf_select_cpu_and() and 0xff00 as allowed domain:
> 
>      $ vng --cpu 16,sockets=4,cores=4,threads=1
>      ...
>      $ stress-ng -c 16
>      ...
>      $ htop
>      ...
>        0[                         0.0%]   8[||||||||||||||||||||||||100.0%]
>        1[                         0.0%]   9[||||||||||||||||||||||||100.0%]
>        2[                         0.0%]  10[||||||||||||||||||||||||100.0%]
>        3[                         0.0%]  11[||||||||||||||||||||||||100.0%]
>        4[                         0.0%]  12[||||||||||||||||||||||||100.0%]
>        5[                         0.0%]  13[||||||||||||||||||||||||100.0%]
>        6[                         0.0%]  14[||||||||||||||||||||||||100.0%]
>        7[                         0.0%]  15[||||||||||||||||||||||||100.0%]
> 
> With scx_bpf_select_cpu_dfl() tasks would be distributed evenly across all
> the available CPUs.
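
The effect shown above (only CPUs 8-15 busy with 0xff00 as the allowed
mask) can be illustrated with a small userspace model. This is only a
sketch under stated assumptions, not the kernel implementation: CPUs are
modeled as bits in a 64-bit word, pick_idle_cpu_and() is a hypothetical
name, and the topology-aware preferences of the built-in policy are left
out entirely. It only demonstrates that a selection restricted to
"idle AND allowed" can never return a CPU outside the allowed set.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model only (hypothetical helper, not a kernel API).
 * Bit N of 'idle' means CPU N is idle; bit N of 'allowed' means the
 * caller permits CPU N. Returns a CPU number, or -1 when no idle CPU
 * exists inside the allowed set.
 */
static int pick_idle_cpu_and(uint64_t idle, uint64_t allowed, int prev_cpu)
{
	uint64_t candidates = idle & allowed;

	assert(prev_cpu >= 0 && prev_cpu < 64);

	/* Prefer the previous CPU when it is both idle and allowed. */
	if (candidates & (UINT64_C(1) << prev_cpu))
		return prev_cpu;

	/* Otherwise take the lowest-numbered idle CPU in the allowed set. */
	for (int cpu = 0; cpu < 64; cpu++)
		if (candidates & (UINT64_C(1) << cpu))
			return cpu;

	/* No idle CPU inside the allowed set. */
	return -1;
}
```

With all 16 CPUs idle and an allowed mask of 0xff00, a task whose previous
CPU was 3 lands on CPU 8 in this model, and a task whose previous CPU was
10 stays on CPU 10. If only CPUs 0-7 are idle, the model returns -1, which
corresponds to the "cpu < 0" fallback path in the example usage.
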
> 
> ChangeLog v5 -> v6:
>  - prevent redundant cpumask_subset() + cpumask_equal() checks in all
>    patches
>  - remove cpumask_subset() + cpumask_and() combo with local cpumasks, as
>    cpumask_and() alone is generally more efficient
>  - cleanup patches to prevent unnecessary function renames
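
One possible reading of the cpumask_subset() + cpumask_and() item above,
sketched in userspace with 64-bit words standing in for cpumasks: when one
mask is a subset of the other, the intersection already equals the smaller
mask, so testing for the subset first yields the same result as the AND
alone while scanning the masks twice. The helper names below are
hypothetical and this is not the code from the patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when every bit set in 'a' is also set in 'b'. */
static bool mask_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/* Combo: check for a subset first, intersect only when needed. */
static uint64_t intersect_with_subset_check(uint64_t a, uint64_t b)
{
	if (mask_subset(a, b))
		return a;	/* a & b == a in this case */
	return a & b;
}

/* AND alone: same result in every case, with a single pass. */
static uint64_t intersect(uint64_t a, uint64_t b)
{
	return a & b;
}
```

Both helpers return identical masks for any input; the second simply skips
the extra subset pass.
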
> 
> ChangeLog v4 -> v5:
>  - simplify code to compute the temporary task's cpumasks (and)
> 
> ChangeLog v3 -> v4:
>  - keep p->nr_cpus_allowed optimizations (skip cpumask operations when the
>    task can run on all CPUs)
>  - allow scx_bpf_select_cpu_and() to also be called from ops.enqueue()
>    and modify the kselftest to cover this case as well
>  - rebase to the latest sched_ext/for-6.15
> 
> ChangeLog v2 -> v3:
>  - incrementally refactor scx_select_cpu_dfl() to accept idle flags and an
>    arbitrary allowed cpumask
>  - build scx_bpf_select_cpu_and() on top of the existing logic
>  - re-arrange scx_select_cpu_dfl() prototype, aligning the first three
>    arguments with select_task_rq()
>  - do not use "domain" for the allowed cpumask to avoid potential ambiguity
>    with sched_domain
> 
> ChangeLog v1 -> v2:
>   - rename scx_bpf_select_cpu_pref() to scx_bpf_select_cpu_and() and always
>     select idle CPUs strictly within the allowed domain
>   - rename preferred CPUs -> allowed CPUs
>   - drop %SCX_PICK_IDLE_IN_PREF (not required anymore)
>   - deprecate scx_bpf_select_cpu_dfl() in favor of scx_bpf_select_cpu_and()
>     and provide all the required backward compatibility boilerplate
> 
> Andrea Righi (6):
>       sched_ext: idle: Extend topology optimizations to all tasks
>       sched_ext: idle: Explicitly pass allowed cpumask to scx_select_cpu_dfl()
>       sched_ext: idle: Accept an arbitrary cpumask in scx_select_cpu_dfl()
>       sched_ext: idle: Introduce scx_bpf_select_cpu_and()
>       selftests/sched_ext: Add test for scx_bpf_select_cpu_and()
>       sched_ext: idle: Deprecate scx_bpf_select_cpu_dfl()
> 
>  Documentation/scheduler/sched-ext.rst              |  11 +-
>  kernel/sched/ext.c                                 |   6 +-
>  kernel/sched/ext_idle.c                            | 196 ++++++++++++++++-----
>  kernel/sched/ext_idle.h                            |   3 +-
>  tools/sched_ext/include/scx/common.bpf.h           |   5 +-
>  tools/sched_ext/include/scx/compat.bpf.h           |  37 ++++
>  tools/sched_ext/scx_flatcg.bpf.c                   |  12 +-
>  tools/sched_ext/scx_simple.bpf.c                   |   9 +-
>  tools/testing/selftests/sched_ext/Makefile         |   1 +
>  .../testing/selftests/sched_ext/allowed_cpus.bpf.c | 121 +++++++++++++
>  tools/testing/selftests/sched_ext/allowed_cpus.c   |  57 ++++++
>  .../selftests/sched_ext/enq_select_cpu_fails.bpf.c |  12 +-
>  .../selftests/sched_ext/enq_select_cpu_fails.c     |   2 +-
>  tools/testing/selftests/sched_ext/exit.bpf.c       |   6 +-
>  .../sched_ext/select_cpu_dfl_nodispatch.bpf.c      |  13 +-
>  .../sched_ext/select_cpu_dfl_nodispatch.c          |   2 +-
>  16 files changed, 404 insertions(+), 89 deletions(-)
>  create mode 100644 tools/testing/selftests/sched_ext/allowed_cpus.bpf.c
>  create mode 100644 tools/testing/selftests/sched_ext/allowed_cpus.c

Thread overview: 16+ messages
2025-03-21 22:10 [PATCHSET v6 sched_ext/for-6.15] sched_ext: Enhance built-in idle selection with allowed CPUs Andrea Righi
2025-03-21 22:10 ` [PATCH 1/6] sched_ext: idle: Extend topology optimizations to all tasks Andrea Righi
2025-03-21 22:10 ` [PATCH 2/6] sched_ext: idle: Explicitly pass allowed cpumask to scx_select_cpu_dfl() Andrea Righi
2025-03-31 21:50   ` Tejun Heo
2025-04-01  6:21     ` Andrea Righi
2025-03-21 22:10 ` [PATCH 3/6] sched_ext: idle: Accept an arbitrary cpumask in scx_select_cpu_dfl() Andrea Righi
2025-03-31 21:56   ` Tejun Heo
2025-04-01  6:33     ` Andrea Righi
2025-03-21 22:10 ` [PATCH 4/6] sched_ext: idle: Introduce scx_bpf_select_cpu_and() Andrea Righi
2025-03-31 21:59   ` Tejun Heo
2025-04-01  6:35     ` Andrea Righi
2025-03-21 22:10 ` [PATCH 5/6] selftests/sched_ext: Add test for scx_bpf_select_cpu_and() Andrea Righi
2025-03-21 22:10 ` [PATCH 6/6] sched_ext: idle: Deprecate scx_bpf_select_cpu_dfl() Andrea Righi
2025-03-31 22:01   ` Tejun Heo
2025-04-01  6:38     ` Andrea Righi
2025-03-22  3:56 ` Changwoo Min [this message]
