Re: [PATCH v3 3/7] sched/fair: Add cgroup_mode: max

From: Waiman Long <longman@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>, mingo@kernel.org
Cc: chenridong@huaweicloud.com, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, tj@kernel.org, hannes@cmpxchg.org,
	mkoutny@suse.com, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, jstultz@google.com,
	kprateek.nayak@amd.com, qyousef@layalina.io
Subject: Re: [PATCH v3 3/7] sched/fair: Add cgroup_mode: max
Date: Wed, 10 Jun 2026 11:09:59 -0400
Message-ID: <d4ca5fe7-fd76-47c8-949a-a69916bfcbd4@redhat.com>
In-Reply-To: <20260605124051.589618504@infradead.org>

On 6/5/26 8:40 AM, Peter Zijlstra wrote:
> In order to avoid the average CPU fraction avg(F_g_n) becoming tiny '1/N',
> assume each cgroup is maximally concurrent and distribute 'N*weight', such
> that:
>
> 	F_g_n' = N * F_g_n
>
> Giving:
>
> 	avg(F_g_n') = N*avg(F_g_n) ~ N * 1/N = 1
>
> And while this sounds like it solves things, remember what that ~ meant. There
> is the corner case when a cgroup is minimally loaded, eg a single runnable
> task, therefore limit the CPU fraction to that of a nice -20 task to avoid
> getting too much load.
>
> This last bit is what makes it different from a previous proposal to allow
> raising cpu.weight to '100 * N', that would not limit the minimal concurrency
> case and results in a very large F_g_n. And just like F_g_n << 1 is
> problematic, so is F_g_n >> 1 for the exact same reasons (it would drown the
> kthreads, but it also risks overflowing the load values).
>
> So while this might appear to be a better scheme than the current default
> scheme, it doesn't really handle less than maximal concurrency nicely -- it
> clips and introduces artificially large weights. So where the traditional SMP
> mode works well when nr_tasks << nr_cpus, MAX doesn't work well in that regime
> and vice-versa.
>
> The meaning of "cpu.weight" would be: weight per allowed CPU.
>
> Included for completeness (and infrastructure).
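
To make the arithmetic above concrete, here is a minimal user-space sketch of the scaling and the clamp. It is a model under stated assumptions, not the code from the patch: the helper names smp_share() and max_share() are hypothetical, weights are in unscaled nice-0 units (1024 for cpu.weight=100), the cap is the sched_prio_to_weight[] value for nice -20 (88761), and the per-CPU fraction is approximated by the group's load on that CPU divided by its total load.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_WEIGHT    1024u   /* sched_prio_to_weight[] entry for nice 0 */
#define NICE_M20_WEIGHT  88761u  /* sched_prio_to_weight[] entry for nice -20 */

/*
 * Hypothetical model of the traditional SMP split: the group weight is
 * divided between CPUs in proportion to the group's load on each CPU.
 */
static inline uint64_t smp_share(uint64_t weight, uint64_t cpu_load,
				 uint64_t total_load)
{
	assert(total_load > 0);
	return weight * cpu_load / total_load;
}

/*
 * Hypothetical model of the "max" mode: scale the per-CPU share by the
 * number of allowed CPUs (F' = N * F), then clamp it to the weight of a
 * nice -20 task so a minimally loaded group cannot become arbitrarily heavy.
 */
static inline uint64_t max_share(uint64_t weight, unsigned int nr_cpus,
				 uint64_t cpu_load, uint64_t total_load)
{
	uint64_t w;

	assert(total_load > 0);
	w = (uint64_t)nr_cpus * weight * cpu_load / total_load;

	return w > NICE_M20_WEIGHT ? NICE_M20_WEIGHT : w;
}
```

With 128 allowed CPUs and the load spread evenly, smp_share(NICE_0_WEIGHT, 1, 128) gives 8 while max_share(NICE_0_WEIGHT, 128, 1, 128) gives 1024, i.e. a fraction of about 1. With a single runnable task, max_share(NICE_0_WEIGHT, 128, 1, 1) would be 131072 unclamped and is limited to 88761, which is the clipping described above.
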
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>   include/linux/cpuset.h |    6 +++++
>   kernel/cgroup/cpuset.c |   15 ++++++++++++++
>   kernel/sched/debug.c   |    1
>   kernel/sched/fair.c    |   52 ++++++++++++++++++++++++++++++++++++++++++++-----
>   4 files changed, 69 insertions(+), 5 deletions(-)
>
> --- a/include/linux/cpuset.h
> +++ b/include/linux/cpuset.h
> @@ -80,6 +80,7 @@ extern void lockdep_assert_cpuset_lock_h
>   extern void cpuset_cpus_allowed_locked(struct task_struct *p, struct cpumask *mask);
>   extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
>   extern bool cpuset_cpus_allowed_fallback(struct task_struct *p);
> +extern int cpuset_num_cpus(struct cgroup *cgroup);
>   extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
>   #define cpuset_current_mems_allowed (current->mems_allowed)
>   void cpuset_init_current_mems_allowed(void);
> @@ -216,6 +217,11 @@ static inline bool cpuset_cpus_allowed_f
>   	return false;
>   }
>   
> +static inline int cpuset_num_cpus(struct cgroup *cgroup)
> +{
> +	return num_online_cpus();
> +}
> +
>   static inline nodemask_t cpuset_mems_allowed(struct task_struct *p)
>   {
>   	return node_possible_map;
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -4116,6 +4116,21 @@ bool cpuset_cpus_allowed_fallback(struct
>   	return changed;
>   }
>   
> +int cpuset_num_cpus(struct cgroup *cgrp)
> +{
> +	int nr = num_online_cpus();
> +	struct cpuset *cs;
> +
> +	if (is_in_v2_mode()) {
> +		guard(rcu)();
> +		cs = css_cs(cgroup_e_css(cgrp, &cpuset_cgrp_subsys));
> +		if (cs)
> +			nr = cpumask_weight(cs->effective_cpus);
> +	}
> +
> +	return nr;
> +}

I just have a question about cgroup v1 support. I am assuming that
cgroup v1 without the cpuset_v2_mode mount option is not supported. To
fully support cgroup v1, you may have to use guarantee_active_cpus() to
get the actual set of CPUs that the task can run on. There is also a
caveat about the arm64-specific task_cpu_possible_mask() for certain
arm64 CPUs: 32-bit binaries running on 64-bit cores are allowed only on
a selected subset of cores within the CPU.

This is probably not what you want to focus on right now, but it would
be good to have a comment listing the items that are not fully supported
here.
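
As a rough illustration of the two caveats, the following user-space sketch counts the CPUs a task can actually use. It is only a model under stated assumptions: struct model_cpuset, model_popcount() and model_task_num_cpus() are hypothetical names, CPU masks are plain 64-bit bitmaps, and the walk up the parent chain is a simplified rendering of what guarantee_active_cpus() does, not a copy of the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpuset: one bit per CPU, at most 64 CPUs. */
struct model_cpuset {
	uint64_t effective_cpus;
	const struct model_cpuset *parent;	/* NULL for the root */
};

/* Portable population count over a 64-bit mask. */
static inline int model_popcount(uint64_t mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit */
		n++;
	return n;
}

/*
 * Count the CPUs a task can really run on:
 *  - start from the CPUs that are both active and possible for the task
 *    (the task-possible mask models task_cpu_possible_mask(), which is
 *    narrower for 32-bit tasks on asymmetric arm64 systems);
 *  - walk up the cpuset hierarchy until an ancestor still overlaps
 *    that set, then count the overlap.
 */
static inline int model_task_num_cpus(const struct model_cpuset *cs,
				      uint64_t active_mask,
				      uint64_t task_possible_mask)
{
	uint64_t usable = active_mask & task_possible_mask;

	assert(cs != NULL);
	while (cs->parent && !(cs->effective_cpus & usable))
		cs = cs->parent;

	return model_popcount(cs->effective_cpus & usable);
}
```

For a child cpuset with CPUs 0-3 under a root with CPUs 0-7, a task limited to CPUs 0-1 by its possible mask gets a count of 2 rather than 4, and if CPUs 0-3 are all inactive the count falls back to the root's active CPUs.
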

Cheers,
Longman

