From: Omar Sandoval <osandov@osandov.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>, Chris Mason <clm@meta.com>,
Pat Cody <pat@patcody.io>,
mingo@redhat.com, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
vschneid@redhat.com, linux-kernel@vger.kernel.org,
patcody@meta.com, kernel-team@meta.com,
Breno Leitao <leitao@debian.org>
Subject: Re: [PATCH] sched/fair: Add null pointer check to pick_next_entity()
Date: Fri, 25 Apr 2025 01:53:19 -0700 [thread overview]
Message-ID: <aAtNf6QMG7Dj6snR@telecaster> (raw)
In-Reply-To: <20250422151421.GB33555@noisy.programming.kicks-ass.net>
On Tue, Apr 22, 2025 at 05:14:21PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 22, 2025 at 04:13:52PM +0200, Peter Zijlstra wrote:
> > On Mon, Apr 21, 2025 at 05:06:45PM -0700, Omar Sandoval wrote:
> >
> > > Hey, Peter,
> > >
> > > We haven't been able to test your latest patch, but I dug through some
> > > core dumps from crashes with your initial zero_vruntime patch. It looks
> > > like on just about all of them, the entity vruntimes are way too spread
> > > out, so we would get overflows regardless of what we picked as
> > > zero_vruntime.
> > >
> > > As a representative example, we have a cfs_rq with 3 entities with the
> > > follow vruntimes and (scaled down) weights:
> > >
> > > vruntime weight
> > > 39052385155836636 2 (curr)
> > > 43658311782076206 2
> > > 42824722322062111 4886
> > >
> > > The difference between the minimum and maximum is 4605926626239570,
> >
> > Right, that is quite beyond usable. The key question at this point
> > is how did we get here...
> >
> > > which is 53 bits. The total load is 4890. Even if you picked
> > > zero_vruntime to be equidistant from the minimum and maximum, the
> > > (vruntime - zero_vruntime) * load calculation in entity_eligible() is
> > > doomed to overflow.
> > >
> > > That range in vruntime seems too absurd to be due to only to running too
> > > long without preemption. We're only seeing these crashes on internal
> > > node cgroups (i.e., cgroups whose children are cgroups, not tasks). This
> > > all leads me to suspect reweight_entity().
> > >
> > > Specifically, this line in reweight_entity():
> > >
> > > se->vlag = div_s64(se->vlag * se->load.weight, weight);
> > >
> > > seems like it could create a very large vlag, which could cause
> > > place_entity() to adjust vruntime by a large value.
> >
> > Right, I fixed that not too long ago. At the time I convinced myself
> > clipping there wasn't needed (in fact, it would lead to some other
> > artifacts iirc). Let me go review that decision :-)
>
> In particular, the two most recent commits in this area are:
>
> https://lore.kernel.org/r/20250109105959.GA2981@noisy.programming.kicks-ass.net
> https://lkml.kernel.org/r/20250110115720.GA17405@noisy.programming.kicks-ass.net
>
> (from the same thread).
>
> Note that it does call update_entity_lag() which does clip. So after
> that it's just scaling for the new weight.
>
> Notably, virtual time = time / weight, and the clip limit is adjusted
> for weight.
>
> So if it is inside limits pre-scaling, it should still be in limits
> after scaling.
>
> l = max / w;
>
> w->w' --> l' = l*w/w' = (max / w) * (w/w') = max / w'
>
> I've stuck some trace_printk()s on again, and the numbers I get here
> seem sane.
For anyone following along, I found the source of the bad vruntimes and
sent a patch:
https://lore.kernel.org/all/f0c2d1072be229e1bdddc73c0703919a8b00c652.1745570998.git.osandov@fb.com/
next prev parent reply other threads:[~2025-04-25 8:53 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-20 20:53 [PATCH] sched/fair: Add null pointer check to pick_next_entity() Pat Cody
2025-03-20 22:42 ` Christian Loehle
2025-03-21 17:52 ` Pat Cody
2025-03-24 11:56 ` Peter Zijlstra
2025-03-25 15:12 ` Pat Cody
2025-03-25 18:59 ` Peter Zijlstra
2025-03-26 19:26 ` Pat Cody
2025-04-02 14:59 ` Rik van Riel
2025-04-02 18:07 ` Peter Zijlstra
2025-04-09 14:29 ` Rik van Riel
2025-04-09 15:27 ` Peter Zijlstra
2025-04-11 14:51 ` Rik van Riel
2025-04-14 9:08 ` Peter Zijlstra
2025-04-14 15:38 ` Chris Mason
2025-04-15 10:07 ` Peter Zijlstra
2025-04-16 7:59 ` Peter Zijlstra
[not found] ` <7B2CFC16-1ADE-4565-B555-7525A50494C2@surriel.com>
[not found] ` <20250402082221.GT5880@noisy.programming.kicks-ass.net>
2025-04-14 19:57 ` Rik van Riel
2025-04-15 8:02 ` Peter Zijlstra
2025-04-16 12:44 ` Peter Zijlstra
2025-04-16 14:19 ` Rik van Riel
2025-04-16 15:27 ` Chris Mason
2025-04-18 15:44 ` Peter Zijlstra
2025-04-18 23:49 ` Omar Sandoval
2025-04-22 0:06 ` Omar Sandoval
2025-04-22 14:13 ` Peter Zijlstra
2025-04-22 15:14 ` Peter Zijlstra
2025-04-25 8:53 ` Omar Sandoval [this message]
2025-04-22 13:36 ` Peter Zijlstra
2025-04-19 2:53 ` Mike Galbraith
[not found] <20250320173438.3562449-2-patcody@meta.com>
2025-03-24 15:56 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aAtNf6QMG7Dj6snR@telecaster \
--to=osandov@osandov.com \
--cc=bsegall@google.com \
--cc=clm@meta.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@redhat.com \
--cc=kernel-team@meta.com \
--cc=leitao@debian.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=pat@patcody.io \
--cc=patcody@meta.com \
--cc=peterz@infradead.org \
--cc=riel@surriel.com \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.