Re: [RFC PATCH] sched/eevdf: Return leftmost entity in pick_eevdf() if no eligible entity is found

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Chen Yu <yu.c.chen@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>,
	Abel Wu <wuyun.abel@bytedance.com>,
	Ingo Molnar <mingo@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Tim Chen <tim.c.chen@intel.com>,
	"Tiwei Bie" <tiwei.btw@antgroup.com>,
	Honglei Wang <wanghonglei@didichuxing.com>,
	Aaron Lu <aaron.lu@intel.com>, Chen Yu <yu.chen.surf@gmail.com>,
	Yujie Liu <yujie.liu@intel.com>, <linux-kernel@vger.kernel.org>,
	kernel test robot <oliver.sang@intel.com>,
	Tianchen Ding <dtcccc@linux.alibaba.com>
Subject: Re: [RFC PATCH] sched/eevdf: Return leftmost entity in pick_eevdf() if no eligible entity is found
Date: Fri, 19 Apr 2024 18:04:25 +0800	[thread overview]
Message-ID: <ZiJBqRafMdBi+wCV@chenyu5-mobl2> (raw)
In-Reply-To: <20240419082440.GB6345@noisy.programming.kicks-ass.net>

On 2024-04-19 at 10:24:40 +0200, Peter Zijlstra wrote:
> On Thu, Apr 18, 2024 at 09:03:36PM +0800, Chen Yu wrote:
> 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 31bca05c3612..9f203012e8f5 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -696,15 +696,23 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
> >   *
> >   * XXX could add max_slice to the augmented data to track this.
> >   */
> > +
> > +static s64 limit_entity_lag(struct sched_entity *se, s64 lag)
> > +{
> > +	s64 limit;
> > +
> > +	limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se);
> > +	return clamp(lag, -limit, limit);
> > +}
> 
> Right, helper makes sense.
> 
> > @@ -3721,6 +3729,7 @@ static void reweight_eevdf(struct cfs_rq *cfs_rq, struct sched_entity *se,
> >  	if (avruntime != se->vruntime) {
> >  		vlag = (s64)(avruntime - se->vruntime);
> >  		vlag = div_s64(vlag * old_weight, weight);
> > +		vlag = limit_entity_lag(se, vlag);
> >  		se->vruntime = avruntime - vlag;
> 
> So the !on_rq case has clamping in update_entity_lag() which is before
> scaling. And that makes more sense to me, because putting a limit on
> vlag before the multiplication *should* ensure the multiplication itself
> doesn't overflow.
>
> But now you allow it to compute garbage and then clip the garbage.
> 

Yes, there is possibility to get multiplication overflow. Clamp first should
be better.
(BTW for !on_rq case in this patch, it is also scaled before the clamp).

> >  	}
> >  
> > @@ -3768,6 +3777,9 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
> >  
> >  	update_load_set(&se->load, weight);
> >  
> > +	if (!se->on_rq)
> > +		se->vlag = limit_entity_lag(se, se->vlag);
> > +
> 
> Except you now add clamping after scaling too, but in a really weird
> place. Should this not go right after the div_s64() that scales?
>

The reason to put this after update_load_set(&se->load, weight) is because
we want to clamp the vlag based on the latest load, although for reweight_eevdf(),
it uses the old load to clamp it, unless we add new parameter to the calc_delta_fair() to
use the new load rather than the current se->load.
 
> Unlike the reweight_eevdf() case, there might be an argument for doing
> it after scaling in this case. Namely, you can have multiple reweights
> stacking their scale ops.
> 
>

Yes, I saw that your patch which clamp the vlag before scaling, I'll have a try
on that patch. Xuewen should post that v2 if everything works well.
 
> Also, could you put a few words in on how often these clips are hit? I
> suspect it's fairly rare (but crucial when it does).

Everytime it happens a subsequent NULL pointer exception usually happens.
The trace data was posted here: 

https://lore.kernel.org/lkml/ZiAWTU5xb%2FJMn%2FHs@chenyu5-mobl2/

"
Here is the debug log printed by place_entity():


[  397.597268]cfs_rq:0xe75f7100
              cfs_rq.avg_vruntime:-1111846207333767
              cfs_rq.min_vruntime:810640668779
              avg_vruntime():686982466017
              curr(0xc59f4f20 rb_producer weight:15 vruntime:1447773196654 sum_exec_ns:187707021870 ctx(0 73)
              leftmost(0xeacb6e00 vruntime:332464705486 sum_exec_ns:78776125437 load:677)
...

[  397.877251]cfs_rq:0xe75f7100
              cfs_rq.avg_vruntime:-759390883821798
              cfs_rq.min_vruntime:810640668779
              avg_vruntime(): 689577229374
              curr(0xc59f4f20 rb_producer weight:15 vruntime:1453640907998 sum_ns:187792974673 ctx(0 73)
              leftmost(0xeacb6e00 vruntime:-59752941080010 sum_ns:78776125437 load:4)


The leftmost se is a task group, its vruntime reduces from 332464705486 to
-59752941080010, because its load reduced from 677 to 4 due to update_cfs_group()
on the tree entities.

Back to reweight_entity():
vlag = avruntime - se->vruntime = 689577229374 - 332464705486 = 357112523888;
vlag = vlag * old_weight / weight = 357112523888 * 677 / 4 = 60441294668044;        <-------- ouch!
se->vruntime = avruntime - vlag = -59751717438670;

the new se vruntime -59751717438670 is close to what we printed -59752941080010,
consider that the avg_vruntime() vary.

Then later this leftmost se has changed its load back and forth, and when the load is 2,
the vuntime has reached a dangerous threshold to trigger the s64 overflow in
eligible check:

[  398.011991]cfs_rq:0xe75f7100
              cfs_rq.avg_vruntime:-11875977385353427
              cfs_rq.min_vruntime:810640668779
              cfs_rq.avg_load:96985
              leftmost(0xeacb6e00 vruntime:18446623907344963655 load:2)

vruntime_eligible()
{
   key = se.vruntime - cfs_rq.min_vruntime = -120977005256740;
   key * avg_load                        <--------------------- OVERFLOW  s64
}
"

next prev parent reply	other threads:[~2024-04-19 10:04 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-26  8:23 [RFC PATCH] sched/eevdf: Return leftmost entity in pick_eevdf() if no eligible entity is found Chen Yu
2024-02-28  9:04 ` Xuewen Yan
2024-02-28 15:24   ` Chen Yu
2024-02-29 12:10     ` Xuewen Yan
2024-03-01  6:46       ` Chen Yu
2024-02-29  9:00 ` Abel Wu
2024-03-01  7:07   ` Chen Yu
2024-03-01  8:42     ` Abel Wu
2024-04-08 12:00       ` Peter Zijlstra
2024-04-08 11:58   ` Peter Zijlstra
2024-04-08 13:11     ` Chen Yu
2024-04-09  9:21       ` Peter Zijlstra
2024-04-15  7:22         ` Peter Zijlstra
2024-04-15  8:03           ` Chen Yu
2024-04-17 18:34         ` Chen Yu
2024-04-18  2:57           ` Xuewen Yan
2024-04-18  3:08             ` Chen Yu
2024-04-18  3:37               ` Tianchen Ding
2024-04-18  5:52                 ` Chen Yu
2024-04-18  6:16                   ` Tianchen Ding
2024-04-18 13:03             ` Chen Yu
2024-04-18 23:45               ` Tim Chen
2024-04-19  8:24               ` Peter Zijlstra
2024-04-19  8:45                 ` Peter Zijlstra
2024-04-19  9:20                   ` Xuewen Yan
2024-04-19  9:17                 ` Xuewen Yan
2024-04-19 10:04                 ` Chen Yu [this message]
2024-04-19 16:24                   ` Peter Zijlstra
2024-04-19 17:22                     ` Chen Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZiJBqRafMdBi+wCV@chenyu5-mobl2 \
    --to=yu.c.chen@intel.com \
    --cc=aaron.lu@intel.com \
    --cc=dtcccc@linux.alibaba.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=oliver.sang@intel.com \
    --cc=peterz@infradead.org \
    --cc=tim.c.chen@intel.com \
    --cc=tiwei.btw@antgroup.com \
    --cc=vincent.guittot@linaro.org \
    --cc=wanghonglei@didichuxing.com \
    --cc=wuyun.abel@bytedance.com \
    --cc=xuewen.yan94@gmail.com \
    --cc=yu.chen.surf@gmail.com \
    --cc=yujie.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox