From: Peter Zijlstra <peterz@infradead.org>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: mingo@kernel.org, longman@redhat.com, chenridong@huaweicloud.com,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
jstultz@google.com
Subject: Re: [RFC][PATCH 8/8] sched/eevdf: Move to a single runqueue
Date: Wed, 18 Mar 2026 10:02:55 +0100 [thread overview]
Message-ID: <20260318090255.GG3738010@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <dc1a390f-16de-49b2-af85-a9df3f62eb8e@amd.com>
On Tue, Mar 17, 2026 at 11:16:52PM +0530, K Prateek Nayak wrote:
> > + /*
> > + * XXX comment on the curr thing
> > + */
> > + curr = (cfs_rq->curr == se);
> > + if (curr)
> > + place_entity(cfs_rq, se, flags);
> >
> > + if (se->on_rq && se->sched_delayed)
> > + requeue_delayed_entity(cfs_rq, se);
> >
> > + weight = enqueue_hierarchy(p, flags);
>
> Here is a question I had when I first saw this on sched/flat (I've
> only looked at the series briefly):
>
> enqueue_hierarchy() would end up updating the averages, and reweighing
> the hierarchical load of the entities in the new task's hierarchy ...
>
> >
> > + if (!curr) {
> > + reweight_eevdf(cfs_rq, se, weight, false);
> > + place_entity(cfs_rq, se, flags | ENQUEUE_QUEUED);
>
> ... and the hierarchical weight of the newly enqueued task would be
> based on this updated hierarchical proportion.
>
> However, the tasks that are already queued have their deadlines
> calculated based on the old hierarchical proportions at the time they
> were enqueued / during the last task_tick_fair() for an entity that
> was put back.
>
> Consider two tasks of equal weight on cgroups with equal weights:
>
>              root (weight: 1024)
>             /                   \
>    CG0 (weight: 512)     CG1 (weight: 512)
>         |                      |
>    T0 (h_weight: 256)    T1 (h_weight: 256)
>
>
> and a third task of equal weight arrives (for the sake of simplicity,
> assume both cgroups have saturated their respective global shares on
> this CPU, similar to UP mode):
>
>
>                  root (weight: 1024)
>                 /                    \
>    CG0 (weight: 512)           CG1 (weight: 512)
>         |                      /              \
>    T0 (h_weight: 256)   T1 (h_weight: 256)   T2 (h_weight: 128)
>
>
> Logically, once T2 arrives, T1 should also be reweighed, its
> hierarchical proportions adjusted, and its vruntime and deadline
> adjusted accordingly based on the lag, but that doesn't happen.
You are absolutely right.
> Instead, we continue with an approximation of h_load as seen at some
> point in the past. Is that alright with EEVDF, or am I missing
> something?
Strictly speaking it is dodgy as heck ;-) I was hoping that on average
it would all work out. Esp. since PELT is a fairly slow and smooth
function, the reweights will mostly be minor adjustments.
> Can it happen that on SMP, future enqueues and balancing conditions
> always lead to a larger h_load for the newly enqueued tasks, so that
> the older tasks become less favourable for the pick, leading to
> starvation? (Am I being paranoid?)
Typically the most recent enqueue will have the smaller fraction of the
group weight, which slightly favours the older enqueue. So I think this
would lead to a FIFO-like bias.
But there is definitely some fun to be had here.
One definite fix is setting cgroup_mode to 'up' :-)
> > + __enqueue_entity(cfs_rq, se);
> > }
> >
> > if (!rq_h_nr_queued && rq->cfs.h_nr_queued)
>
> Anyhow, me goes and sees if any of this makes a difference to the
> benchmarks - I'll throw the biggest one at it first and see how
> that goes.
Thanks, fingers crossed. :-)
Thread overview:
2026-03-17 9:51 [RFC][PATCH 0/8] sched: Flatten the pick Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 1/8] sched/debug: Collapse subsequent CONFIG_SCHED_CLASS_EXT sections Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 2/8] sched/fair: Add cgroup_mode switch Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 3/8] sched/fair: Add cgroup_mode: UP Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 4/8] sched/fair: Add cgroup_mode: MAX Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 5/8] sched/fair: Add cgroup_mode: CONCUR Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 6/8] sched/fair: Add newidle balance to pick_task_fair() Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 7/8] sched: Remove sched_class::pick_next_task() Peter Zijlstra
2026-03-17 9:51 ` [RFC][PATCH 8/8] sched/eevdf: Move to a single runqueue Peter Zijlstra
2026-03-17 17:46 ` K Prateek Nayak
2026-03-18 9:02 ` Peter Zijlstra [this message]
2026-03-18 9:32 ` K Prateek Nayak