From: Peter Zijlstra <peterz@infradead.org>
To: Doug Smythies <dsmythies@telus.net>
Cc: linux-kernel@vger.kernel.org, vincent.guittot@linaro.org,
'Ingo Molnar' <mingo@kernel.org>,
wuyun.abel@bytedance.com
Subject: Re: [REGRESSION] Re: [PATCH 00/24] Complete EEVDF
Date: Fri, 10 Jan 2025 12:57:20 +0100 [thread overview]
Message-ID: <20250110115720.GA17405@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <002f01db631d$d265a600$7730f200$@telus.net>
On Thu, Jan 09, 2025 at 09:09:26PM -0800, Doug Smythies wrote:
> Hi Peter,
>
> Thanks for all your hard work on this.
>
> On 2025.01.09 03:00 Peter Zijlstra wrote:
>
> ...
>
> > This made me have a very hard look at reweight_entity(), and
> > specifically the ->on_rq case, which is more prominent with
> > DELAY_DEQUEUE.
> >
> > And indeed, it is all sorts of broken. While the computation of the new
> > lag is correct, the computation for the new vruntime, using the new lag
> > is broken for it does not consider the logic set out in place_entity().
> >
> > With the below patch, I now see things like:
> >
> > migration/12-55 [012] d..3. 309.006650: reweight_entity: (ffff8881e0e6f600-ffff88885f235f40-12)
> > { weight: 977582 avg_vruntime: 4860513347366 vruntime: 4860513347908 (-542) deadline: 4860516552475
> } ->
> > { weight: 2 avg_vruntime: 4860528915984 vruntime: 4860793840706 (-264924722) deadline: 6427157349203
> }
> > migration/14-62 [014] d..3. 309.006698: reweight_entity: (ffff8881e0e6cc00-ffff88885f3b5f40-15)
> > { weight: 2 avg_vruntime: 4874472992283 vruntime: 4939833828823 (-65360836540) deadline:
> 6316614641111 } ->
> > { weight: 967149 avg_vruntime: 4874217684324 vruntime: 4874217688559 (-4235) deadline: 4874220535650
> }
> >
> > Which isn't perfect yet, but much closer.
>
> Agreed.
> I tested the patch. Attached is a repeat of a graph I had sent before, with different y axis scale and old data deleted.
> It still compares to the "b12" kernel (the last good one in the kernel bisection).
> It was a 2 hour and 31 minute duration test, and the maximum CPU migration time was 24 milliseconds,
> verses 6 seconds without the patch.
Progress!
> I left things running for many hours and will let it continue overnight.
> There seems to have been an issue at one spot in time:
Right, so by happy accident I also left mine running overnight and I
think I caught one of those weird spikes. It took a bit of staring to
figure out what went wrong this time, but what I now think is the thing
that sets off the chain of fail is a combination of DELAY_DEQUEUE and
the cgroup reweight stuff.
Specifically, when a cgroup's CPU queue becomes empty,
calc_group_shares() will drop its weight to the floor.
Now, normally, when a queue goes empty, it gets dequeued from its
parent and its weight is immaterial. However, with DELAY_DEQUEUE the
thing will stick around for a while -- at a very low weight.
What makes this esp. troublesome is that even for an active cgroup (like
the 'yes' group) the per-cpu weight will be relatively small (~1/nr_cpus
like). Worse, the avg_vruntime thing uses load_scale_down() because u64
just isn't all that big :/
(if only all 64bit machines could do 128bit divisions as cheaply as x86_64)
This means that on a 16 CPU machine, the weight of a 'normal' all busy
queue will be 64, and the weight of an empty queue will be 2, which means
the effect of the ginormous lag on the avg_vruntime is fairly
significant, pushing it wildly off balance and affecting placement of
new tasks.
Also, this violates the spirit of DELAY_DEQUEUE, that wants to continue
competition as the entity was.
As such, we should not adjust the weight of an empty queue.
I've started a new run, and some 15 minutes of runtime show nothing
interesting, I'll have to let it run for a while again.
---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3e9ca38512de..93644b3983d4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4065,7 +4019,11 @@ static void update_cfs_group(struct sched_entity *se)
struct cfs_rq *gcfs_rq = group_cfs_rq(se);
long shares;
- if (!gcfs_rq)
+ /*
+ * When a group becomes empty, preserve its weight. This matters for
+ * DELAY_DEQUEUE.
+ */
+ if (!gcfs_rq || !gcfs_rq->load.weight)
return;
if (throttled_hierarchy(gcfs_rq))
next prev parent reply other threads:[~2025-01-10 11:57 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-29 22:51 [REGRESSION] Re: [PATCH 00/24] Complete EEVDF Doug Smythies
2025-01-06 11:57 ` Peter Zijlstra
2025-01-06 15:01 ` Doug Smythies
2025-01-06 16:59 ` Peter Zijlstra
2025-01-06 17:04 ` Peter Zijlstra
2025-01-06 17:14 ` Peter Zijlstra
2025-01-07 1:24 ` Doug Smythies
2025-01-07 10:49 ` Peter Zijlstra
2025-01-06 22:28 ` Doug Smythies
2025-01-07 11:26 ` Peter Zijlstra
2025-01-07 15:04 ` Doug Smythies
2025-01-07 16:25 ` Doug Smythies
2025-01-07 19:23 ` Peter Zijlstra
2025-01-08 5:15 ` Doug Smythies
2025-01-08 13:12 ` Peter Zijlstra
2025-01-08 15:48 ` Doug Smythies
2025-01-09 10:59 ` Peter Zijlstra
2025-01-09 12:18 ` [tip: sched/urgent] sched/fair: Fix EEVDF entity placement bug causing scheduling lag tip-bot2 for Peter Zijlstra
2025-04-17 9:56 ` Alexander Egorenkov
2025-04-22 5:40 ` ll"RE: " Doug Smythies
2025-04-24 7:56 ` Alexander Egorenkov
2025-04-26 15:09 ` Doug Smythies
2025-01-10 5:09 ` [REGRESSION] Re: [PATCH 00/24] Complete EEVDF Doug Smythies
2025-01-10 11:57 ` Peter Zijlstra [this message]
2025-01-12 23:14 ` Doug Smythies
2025-01-13 11:03 ` Peter Zijlstra
2025-01-14 10:58 ` Peter Zijlstra
2025-01-14 15:15 ` Doug Smythies
2025-01-15 2:08 ` Len Brown
2025-01-15 16:47 ` Doug Smythies
2025-01-19 0:09 ` Doug Smythies
2025-01-20 3:55 ` Doug Smythies
2025-01-21 11:06 ` Peter Zijlstra
2025-01-21 8:49 ` Peter Zijlstra
2025-01-21 11:21 ` Peter Zijlstra
2025-01-21 15:58 ` Doug Smythies
2025-01-24 4:34 ` Doug Smythies
2025-01-24 11:04 ` Peter Zijlstra
2025-01-13 11:05 ` Peter Zijlstra
2025-01-13 16:01 ` Doug Smythies
2025-01-13 12:58 ` [tip: sched/urgent] sched/fair: Fix update_cfs_group() vs DELAY_DEQUEUE tip-bot2 for Peter Zijlstra
2025-01-12 19:59 ` [REGRESSION] Re: [PATCH 00/24] Complete EEVDF Doug Smythies
-- strict thread matches above, loose matches on Subject: below --
2024-07-27 10:27 Peter Zijlstra
2024-11-28 10:32 ` [REGRESSION] " Marcel Ziswiler
2024-11-28 10:58 ` Peter Zijlstra
2024-11-28 11:37 ` Marcel Ziswiler
2024-11-29 9:08 ` Peter Zijlstra
2024-12-02 18:46 ` Marcel Ziswiler
2024-12-09 9:49 ` Peter Zijlstra
2024-12-10 16:05 ` Marcel Ziswiler
2024-12-10 16:13 ` Steven Rostedt
2024-12-10 8:45 ` Luis Machado
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250110115720.GA17405@noisy.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=dsmythies@telus.net \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=vincent.guittot@linaro.org \
--cc=wuyun.abel@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.