From: Peter Zijlstra <peterz@infradead.org>
To: Chen Yu <yu.c.chen@intel.com>
Cc: Abel Wu <wuyun.abel@bytedance.com>,
Ingo Molnar <mingo@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
Tim Chen <tim.c.chen@intel.com>,
Tiwei Bie <tiwei.btw@antgroup.com>,
Honglei Wang <wanghonglei@didichuxing.com>,
Aaron Lu <aaron.lu@intel.com>, Chen Yu <yu.chen.surf@gmail.com>,
linux-kernel@vger.kernel.org,
kernel test robot <oliver.sang@intel.com>
Subject: Re: [RFC PATCH] sched/eevdf: Return leftmost entity in pick_eevdf() if no eligible entity is found
Date: Tue, 9 Apr 2024 11:21:04 +0200
Message-ID: <20240409092104.GA2665@noisy.programming.kicks-ass.net>
In-Reply-To: <ZhPtCyRmPxa0DpMe@chenyu5-mobl2>

On Mon, Apr 08, 2024 at 09:11:39PM +0800, Chen Yu wrote:
> On 2024-04-08 at 13:58:33 +0200, Peter Zijlstra wrote:
> > On Thu, Feb 29, 2024 at 05:00:18PM +0800, Abel Wu wrote:
> >
> > > > According to the log, vruntime is 18435852013561943404, the
> > > > cfs_rq->min_vruntime is 763383370431, the load is 629 + 2048 = 2677,
> > > > thus:
> > > > s64 delta = (s64)(18435852013561943404 - 763383370431) = -10892823530978643
> > > > delta * 2677 = 7733399554989275921
> > > > that is to say, the multiply result overflow the s64, which turns the
> > > > negative value into a positive value, thus eligible check fails.
> > >
> > > Indeed.
> >
> > From the data presented it looks like min_vruntime is wrong and needs
> > update. If you can readily reproduce this, dump the vruntime of all
> > tasks on the runqueue and see if min_vruntime is indeed correct.
> >
>
> This was the dump of all the entities on the tree, from left to right,

Oh, my bad, I thought it was the pick path.

> and also from top down in middle order traverse, when this issue happens:
>
> [ 514.461242][ T8390] cfs_rq avg_vruntime:386638640128 avg_load:2048 cfs_rq->min_vruntime:763383370431
> [ 514.535935][ T8390] current on_rq se 0xc5851400, deadline:18435852013562231446
> min_vruntime:18437121115753667698 vruntime:18435852013561943404, load:629
>
>
> [ 514.536772][ T8390] Traverse rb-tree from left to right
> [ 514.537138][ T8390] se 0xec1234e0 deadline:763384870431 min_vruntime:763383370431 vruntime:763383370431 non-eligible <-- leftmost se
> [ 514.537835][ T8390] se 0xec4fcf20 deadline:763762447228 min_vruntime:763760947228 vruntime:763760947228 non-eligible
>
> [ 514.538539][ T8390] Traverse rb-tree from topdown
> [ 514.538877][ T8390] middle se 0xec1234e0 deadline:763384870431 min_vruntime:763383370431 vruntime:763383370431 non-eligible <-- root se
> [ 514.539605][ T8390] middle se 0xec4fcf20 deadline:763762447228 min_vruntime:763760947228 vruntime:763760947228 non-eligible
>
> The tree looks like:
>
> se (0xec1234e0)
> |
> |
> ----> se (0xec4fcf20)
>
>
> The root se 0xec1234e0 is also the leftmost se, its min_vruntime and
> vruntime are both 763383370431, which is aligned with
> cfs_rq->min_vruntime. It seems that the cfs_rq's min_vruntime gets
> updated correctly, because it is monotonic increasing.

Right.

> My guess is that, for some reason, one newly forked se in a newly
> created task group, in the rb-tree has not been picked for a long
> time(maybe not eligible). Its vruntime stopped at the negative
> value(near (unsigned long)(-(1LL << 20)) for a long time, its vruntime
> is long behind the cfs_rq->vruntime, thus the overflow happens.

I'll have to do the math again, but that's something on the order of not
picking a task in about a day, that would be 'bad' :-)

Is there any sane way to reproduce this, and how often does it happen?