From: Chen Yu <yu.c.chen@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>,
Abel Wu <wuyun.abel@bytedance.com>,
Ingo Molnar <mingo@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
Tim Chen <tim.c.chen@intel.com>,
"Tiwei Bie" <tiwei.btw@antgroup.com>,
Honglei Wang <wanghonglei@didichuxing.com>,
Aaron Lu <aaron.lu@intel.com>, Chen Yu <yu.chen.surf@gmail.com>,
Yujie Liu <yujie.liu@intel.com>, <linux-kernel@vger.kernel.org>,
kernel test robot <oliver.sang@intel.com>,
Tianchen Ding <dtcccc@linux.alibaba.com>
Subject: Re: [RFC PATCH] sched/eevdf: Return leftmost entity in pick_eevdf() if no eligible entity is found
Date: Fri, 19 Apr 2024 18:04:25 +0800 [thread overview]
Message-ID: <ZiJBqRafMdBi+wCV@chenyu5-mobl2> (raw)
In-Reply-To: <20240419082440.GB6345@noisy.programming.kicks-ass.net>
On 2024-04-19 at 10:24:40 +0200, Peter Zijlstra wrote:
> On Thu, Apr 18, 2024 at 09:03:36PM +0800, Chen Yu wrote:
>
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 31bca05c3612..9f203012e8f5 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -696,15 +696,23 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
> > *
> > * XXX could add max_slice to the augmented data to track this.
> > */
> > +
> > +static s64 limit_entity_lag(struct sched_entity *se, s64 lag)
> > +{
> > + s64 limit;
> > +
> > + limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se);
> > + return clamp(lag, -limit, limit);
> > +}
>
> Right, helper makes sense.
>
> > @@ -3721,6 +3729,7 @@ static void reweight_eevdf(struct cfs_rq *cfs_rq, struct sched_entity *se,
> > if (avruntime != se->vruntime) {
> > vlag = (s64)(avruntime - se->vruntime);
> > vlag = div_s64(vlag * old_weight, weight);
> > + vlag = limit_entity_lag(se, vlag);
> > se->vruntime = avruntime - vlag;
>
> So the !on_rq case has clamping in update_entity_lag() which is before
> scaling. And that makes more sense to me, because putting a limit on
> vlag before the multiplication *should* ensure the multiplication itself
> doesn't overflow.
>
> But now you allow it to compute garbage and then clip the garbage.
>
Yes, there is possibility to get multiplication overflow. Clamp first should
be better.
(BTW for !on_rq case in this patch, it is also scaled before the clamp).
> > }
> >
> > @@ -3768,6 +3777,9 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
> >
> > update_load_set(&se->load, weight);
> >
> > + if (!se->on_rq)
> > + se->vlag = limit_entity_lag(se, se->vlag);
> > +
>
> Except you now add clamping after scaling too, but in a really weird
> place. Should this not go right after the div_s64() that scales?
>
The reason to put this after update_load_set(&se->load, weight) is because
we want to clamp the vlag based on the latest load, although for reweight_eevdf(),
it uses the old load to clamp it, unless we add new parameter to the calc_delta_fair() to
use the new load rather than the current se->load.
> Unlike the reweight_eevdf() case, there might be an argument for doing
> it after scaling in this case. Namely, you can have multiple reweights
> stacking their scale ops.
>
>
Yes, I saw that your patch which clamp the vlag before scaling, I'll have a try
on that patch. Xuewen should post that v2 if everything works well.
> Also, could you put a few words in on how often these clips are hit? I
> suspect it's fairly rare (but crucial when it does).
Everytime it happens a subsequent NULL pointer exception usually happens.
The trace data was posted here:
https://lore.kernel.org/lkml/ZiAWTU5xb%2FJMn%2FHs@chenyu5-mobl2/
"
Here is the debug log printed by place_entity():
[ 397.597268]cfs_rq:0xe75f7100
cfs_rq.avg_vruntime:-1111846207333767
cfs_rq.min_vruntime:810640668779
avg_vruntime():686982466017
curr(0xc59f4f20 rb_producer weight:15 vruntime:1447773196654 sum_exec_ns:187707021870 ctx(0 73)
leftmost(0xeacb6e00 vruntime:332464705486 sum_exec_ns:78776125437 load:677)
...
[ 397.877251]cfs_rq:0xe75f7100
cfs_rq.avg_vruntime:-759390883821798
cfs_rq.min_vruntime:810640668779
avg_vruntime(): 689577229374
curr(0xc59f4f20 rb_producer weight:15 vruntime:1453640907998 sum_ns:187792974673 ctx(0 73)
leftmost(0xeacb6e00 vruntime:-59752941080010 sum_ns:78776125437 load:4)
The leftmost se is a task group, its vruntime reduces from 332464705486 to
-59752941080010, because its load reduced from 677 to 4 due to update_cfs_group()
on the tree entities.
Back to reweight_entity():
vlag = avruntime - se->vruntime = 689577229374 - 332464705486 = 357112523888;
vlag = vlag * old_weight / weight = 357112523888 * 677 / 4 = 60441294668044; <-------- ouch!
se->vruntime = avruntime - vlag = -59751717438670;
the new se vruntime -59751717438670 is close to what we printed -59752941080010,
consider that the avg_vruntime() vary.
Then later this leftmost se has changed its load back and forth, and when the load is 2,
the vuntime has reached a dangerous threshold to trigger the s64 overflow in
eligible check:
[ 398.011991]cfs_rq:0xe75f7100
cfs_rq.avg_vruntime:-11875977385353427
cfs_rq.min_vruntime:810640668779
cfs_rq.avg_load:96985
leftmost(0xeacb6e00 vruntime:18446623907344963655 load:2)
vruntime_eligible()
{
key = se.vruntime - cfs_rq.min_vruntime = -120977005256740;
key * avg_load <--------------------- OVERFLOW s64
}
"
next prev parent reply other threads:[~2024-04-19 10:04 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-26 8:23 [RFC PATCH] sched/eevdf: Return leftmost entity in pick_eevdf() if no eligible entity is found Chen Yu
2024-02-28 9:04 ` Xuewen Yan
2024-02-28 15:24 ` Chen Yu
2024-02-29 12:10 ` Xuewen Yan
2024-03-01 6:46 ` Chen Yu
2024-02-29 9:00 ` Abel Wu
2024-03-01 7:07 ` Chen Yu
2024-03-01 8:42 ` Abel Wu
2024-04-08 12:00 ` Peter Zijlstra
2024-04-08 11:58 ` Peter Zijlstra
2024-04-08 13:11 ` Chen Yu
2024-04-09 9:21 ` Peter Zijlstra
2024-04-15 7:22 ` Peter Zijlstra
2024-04-15 8:03 ` Chen Yu
2024-04-17 18:34 ` Chen Yu
2024-04-18 2:57 ` Xuewen Yan
2024-04-18 3:08 ` Chen Yu
2024-04-18 3:37 ` Tianchen Ding
2024-04-18 5:52 ` Chen Yu
2024-04-18 6:16 ` Tianchen Ding
2024-04-18 13:03 ` Chen Yu
2024-04-18 23:45 ` Tim Chen
2024-04-19 8:24 ` Peter Zijlstra
2024-04-19 8:45 ` Peter Zijlstra
2024-04-19 9:20 ` Xuewen Yan
2024-04-19 9:17 ` Xuewen Yan
2024-04-19 10:04 ` Chen Yu [this message]
2024-04-19 16:24 ` Peter Zijlstra
2024-04-19 17:22 ` Chen Yu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZiJBqRafMdBi+wCV@chenyu5-mobl2 \
--to=yu.c.chen@intel.com \
--cc=aaron.lu@intel.com \
--cc=dtcccc@linux.alibaba.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=oliver.sang@intel.com \
--cc=peterz@infradead.org \
--cc=tim.c.chen@intel.com \
--cc=tiwei.btw@antgroup.com \
--cc=vincent.guittot@linaro.org \
--cc=wanghonglei@didichuxing.com \
--cc=wuyun.abel@bytedance.com \
--cc=xuewen.yan94@gmail.com \
--cc=yu.chen.surf@gmail.com \
--cc=yujie.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.