From: Peter Zijlstra <peterz@infradead.org>
To: Tejun Heo <tj@kernel.org>
Cc: mingo@kernel.org, longman@redhat.com, chenridong@huaweicloud.com,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
hannes@cmpxchg.org, mkoutny@suse.com, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, jstultz@google.com,
kprateek.nayak@amd.com, qyousef@layalina.io
Subject: Re: [PATCH v2 00/10] sched: Flatten the pick
Date: Wed, 27 May 2026 11:41:59 +0200
Message-ID: <20260527094159.GS3126523@noisy.programming.kicks-ass.net>
In-Reply-To: <agtkR_kTkMW4Gc5d@slm.duckdns.org>

On Mon, May 18, 2026 at 09:11:03AM -1000, Tejun Heo wrote:
> Hello, Peter.
>
> On Mon, May 18, 2026 at 09:14:56AM +0200, Peter Zijlstra wrote:
> ...
> > So the current scheme will inflate the part of A to be double the weight
> > (of B), giving them 2 out of 3 parts on the contended CPUs, but then B
> > will still get complete / uncontested access to those extra 128 CPUs,
> > resulting in a 2:4 weight distribution.
> >
> > Which also isn't as straightforward as one might think.
>
> Right, the current behavior isn't quite what people would expect intuitively
> either.
>
> ...
> > So for the one contended CPU A gets 256 out of 257 parts, while B gets
> > the full CPU for the remaining 255 CPUs, for a:
> >
> >   256    1         257
> >   --- : --- + 255*--- = 256:65535 ~ 1:256
> >   257   257        257
> >
> > distribution. While with the new scheme it would be:
> >
> >   1   1       2
> >   - : - + 255*- = 1:511
> >   2   2       2
> >
> > Which, realistically isn't all that different, except the old scheme has
> > this really large weight to deal with.
> >
> > So from where I'm sitting, yes different, but it behaves better.

FWIW, if the workload were a single thread per CPU, the above is also
exactly the behaviour we'd have without cgroups.

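To make the quoted arithmetic concrete, here is a minimal Python sketch.
It assumes the scenario as quoted: A is confined to the one contended
CPU, B runs on all 256 CPUs, and the weights on the contended CPU are
256:1 under the old scheme and 1:1 under the new one. The function name
and parameters are made up for illustration. Done with exact fractions,
the old-scheme ratio comes out as 256:65536, that is exactly 1:256.

```python
from fractions import Fraction


def distribution(weight_a, weight_b, ncpus):
    """Return (share_a, share_b) in units of whole CPUs.

    Assumption: A only competes on a single contended CPU, while B
    additionally has the remaining ncpus - 1 CPUs to itself.
    """
    total = weight_a + weight_b
    share_a = Fraction(weight_a, total)
    # B's slice of the contended CPU plus its uncontended CPUs.
    share_b = Fraction(weight_b, total) + (ncpus - 1)
    return share_a, share_b


# Old scheme: A's weight is inflated to 256 against B's 1.
old_a, old_b = distribution(256, 1, 256)
# New scheme: A and B compete 1:1 on the contended CPU.
new_a, new_b = distribution(1, 1, 256)

print(old_b / old_a)  # 256, i.e. A:B = 1:256
print(new_b / new_a)  # 511, i.e. A:B = 1:511
```

Either way B ends up with two orders of magnitude more CPU time than A;
the schemes differ by roughly a factor of two in that ratio.
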
> I see. Thread cardinality and affinity problems make weight based
> distribution such a pain. I wonder whether this can be better solved by
> turning it into a two-layer allocation problem - groups to CPUs and then
> timeshare on CPUs as necessary. That comes with a lot of its own problems
> but it can, aspirationally at least, approximate global weight distribution
> and would have better locality properties.

If people want this, they can already do it today: combine the cpuset
and cpu controllers in a cgroup v2 hierarchy and you get exactly that.
I don't see a reason to mandate it.

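A rough sketch of the cpuset plus cpu combination mentioned above,
assuming a cgroup v2 mount and an already existing parent cgroup. The
group names, CPU ranges and weights are hypothetical; the files written
(cgroup.subtree_control, cpuset.cpus, cpu.weight) are the standard v2
interface files. The helper takes the parent path as an argument so it
can be tried against a scratch directory before touching a real
hierarchy.

```python
import os


def apply_layout(parent, groups):
    """Create child groups under `parent` with a CPU set and a weight.

    parent: path of an existing cgroup v2 directory (assumption).
    groups: mapping of child name -> (cpu list string, cpu.weight value).
    """
    # Make the cpuset and cpu controllers available to the children.
    with open(os.path.join(parent, "cgroup.subtree_control"), "w") as f:
        f.write("+cpuset +cpu")

    for name, (cpus, weight) in groups.items():
        path = os.path.join(parent, name)
        # On cgroupfs, creating the directory creates the cgroup.
        os.makedirs(path, exist_ok=True)
        # First layer: which CPUs the group may run on.
        with open(os.path.join(path, "cpuset.cpus"), "w") as f:
            f.write(cpus)
        # Second layer: how groups timeshare CPUs they have in common.
        with open(os.path.join(path, "cpu.weight"), "w") as f:
            f.write(str(weight))
```

Something like apply_layout("/sys/fs/cgroup/jobs", {"A": ("0-127", 100),
"B": ("0-255", 100)}) would then confine A to half the machine while B
spans all of it. On a real system this needs sufficient privileges, and
the kernel may reject the writes, for example when the parent cgroup
still contains processes.
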
The main problem with doing something like that is, of course, that it
isn't always clear how many CPUs a particular 'job' will need, so
assigning groups to CPUs isn't straightforward.

If I remember correctly, Meta was actually doing some of this:
dynamically resizing cpusets based on load predictions and the like in
order to separate various workloads on the same large machine, right?

Anyway, while it is somewhat tedious to change behaviour, I do think it
is worth doing in this case.