From: Peter Zijlstra <peterz@infradead.org>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: mingo@kernel.org, longman@redhat.com, chenridong@huaweicloud.com,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
jstultz@google.com
Subject: Re: [RFC][PATCH 8/8] sched/eevdf: Move to a single runqueue
Date: Wed, 18 Mar 2026 10:02:55 +0100 [thread overview]
Message-ID: <20260318090255.GG3738010@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <dc1a390f-16de-49b2-af85-a9df3f62eb8e@amd.com>
On Tue, Mar 17, 2026 at 11:16:52PM +0530, K Prateek Nayak wrote:
> > + /*
> > + * XXX comment on the curr thing
> > + */
> > + curr = (cfs_rq->curr == se);
> > + if (curr)
> > + place_entity(cfs_rq, se, flags);
> >
> > + if (se->on_rq && se->sched_delayed)
> > + requeue_delayed_entity(cfs_rq, se);
> >
> > + weight = enqueue_hierarchy(p, flags);
>
> Here is a question I had when I first saw this on sched/flat (I've
> only looked at the series briefly):
>
> enqueue_hierarchy() would end up updating the averages, and reweighing
> the hierarchical load of the entities in the new task's hierarchy ...
>
> >
> > + if (!curr) {
> > + reweight_eevdf(cfs_rq, se, weight, false);
> > + place_entity(cfs_rq, se, flags | ENQUEUE_QUEUED);
>
> ... and the hierarchical weight of the newly enqueued task would be
> based on this updated hierarchical proportion.
>
> However, the tasks that are already queued have their deadlines
> calculated based on the old hierarchical proportions at the time they
> were enqueued / during the last task_tick_fair() for an entity that
> was put back.
>
> Consider two tasks of equal weight on cgroups with equal weights:
>
>              root (weight: 1024)
>             /                   \
>    CG0 (weight: 512)     CG1 (weight: 512)
>         |                      |
>    T0 (h_weight: 256)    T1 (h_weight: 256)
>
>
> and a third task of equal weight arrives (for the sake of simplicity,
> assume both cgroups have saturated their respective global shares on
> this CPU, similar to UP mode):
>
>
>                  root (weight: 1024)
>                 /                    \
>    CG0 (weight: 512)           CG1 (weight: 512)
>         |                      /              \
>    T0 (h_weight: 256)   T1 (h_weight: 256)   T2 (h_weight: 128)
>
>
> Logically, once T2 arrives, T1 should also be reweighed, its
> hierarchical proportions adjusted, and its vruntime and deadline
> adjusted accordingly based on the lag, but that doesn't happen.
You are absolutely right.
> Instead, we continue with an approximation of h_load as seen at some
> point in the past. Is that alright with EEVDF, or am I missing
> something?
Strictly speaking it is dodgy as heck ;-) I was hoping that on average
it would all work out. Esp. since PELT is a fairly slow and smooth
function, the reweights will mostly be minor adjustments.
> Can it happen that on SMP, future enqueues and balancing conditions
> always lead to a larger h_load for the newly enqueued tasks, so that
> the older tasks become less favourable for the pick, leading to
> starvation? (Am I being paranoid?)
Typically the most recent enqueue will have the smaller fraction of the
group weight, which slightly favours the older enqueue. So I think this
would lead to a FIFO-like bias.
But there is definitely some fun to be had here.
One definite fix is setting cgroup_mode to 'up' :-)
> > + __enqueue_entity(cfs_rq, se);
> > }
> >
> > if (!rq_h_nr_queued && rq->cfs.h_nr_queued)
>
> Anyhow, me goes and sees if any of this makes a difference to the
> benchmarks - I'll throw the biggest one at it first and see how
> that goes.
Thanks, fingers crossed. :-)
Thread overview:
2026-03-17 9:51 [RFC][PATCH 0/8] sched: Flatten the pick Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 1/8] sched/debug: Collapse subsequent CONFIG_SCHED_CLASS_EXT sections Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 2/8] sched/fair: Add cgroup_mode switch Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 3/8] sched/fair: Add cgroup_mode: UP Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 4/8] sched/fair: Add cgroup_mode: MAX Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 5/8] sched/fair: Add cgroup_mode: CONCUR Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 6/8] sched/fair: Add newidle balance to pick_task_fair() Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 7/8] sched: Remove sched_class::pick_next_task() Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 8/8] sched/eevdf: Move to a single runqueue Peter Zijlstra
2026-03-17 17:46 ` K Prateek Nayak
2026-03-18 9:02 ` Peter Zijlstra [this message]
2026-03-18 9:32 ` K Prateek Nayak