All of lore.kernel.org
 help / color / mirror / Atom feed
From: Aaron Lu <ziqianlu@bytedance.com>
To: xupengbo <xupengbo@oppo.com>
Cc: vincent.guittot@linaro.org, bsegall@google.com,
	cgroups@vger.kernel.org, dietmar.eggemann@arm.com,
	juri.lelli@redhat.com, linux-kernel@vger.kernel.org,
	mgorman@suse.de, mingo@redhat.com, peterz@infradead.org,
	rostedt@goodmis.org, vschneid@redhat.com
Subject: Re: [PATCH v2] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out.
Date: Wed, 6 Aug 2025 15:33:27 +0800	[thread overview]
Message-ID: <20250806071848.GA629@bytedance> (raw)
In-Reply-To: <20250806063158.25050-1-xupengbo@oppo.com>

On Wed, Aug 06, 2025 at 02:31:58PM +0800, xupengbo wrote:
> > >On Tue, 5 Aug 2025 at 16:42, xupengbo <xupengbo@oppo.com> wrote:
> > >
> > > When a task is migrated out, there is a probability that the tg->load_avg
> > > value will become abnormal. The reason is as follows.
> > >
> > > 1. Due to the 1ms update period limitation in update_tg_load_avg(), there
> > > is a possibility that the reduced load_avg is not updated to tg->load_avg
> > > when a task migrates out.
> > > 2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
> > > calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
> > > function cfs_rq_is_decayed() does not check whether
> > > cfs->tg_load_avg_contrib is null. Consequently, in some cases,
> > > __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
> > > updated to tg->load_avg.
> > >
> > > I added a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
> > > which blocks the case (2.) mentioned above. I follow the condition in
> > > update_tg_load_avg() instead of directly checking if
> > > cfs_rq->tg_load_avg_contrib is null. I think it's necessary to keep the
> > > condition consistent in both places, otherwise unexpected problems may
> > > occur.
> > >
> > > Thanks for your comments,
> > > Xu Pengbo
> > >
> > > Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
> > > Signed-off-by: xupengbo <xupengbo@oppo.com>
> > > ---
> > > Changes:
> > > v1 -> v2:
> > > - Another option to fix the bug. Check cfs_rq->tg_load_avg_contrib in
> > > cfs_rq_is_decayed() to avoid early removal from the leaf_cfs_rq_list.
> > > - Link to v1 : https://lore.kernel.org/cgroups/20250804130326.57523-1-xupengbo@oppo.com/T/#u
> > >
> > >  kernel/sched/fair.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index b173a059315c..a35083a2d006 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -4062,6 +4062,11 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
> > >         if (child_cfs_rq_on_list(cfs_rq))
> > >                 return false;
> > >
> > > +       long delta = cfs_rq->avg.load_avg - cfs_rq->tg_load_avg_contrib;
> > > +
> > > +       if (abs(delta) > cfs_rq->tg_load_avg_contrib / 64)
> > 
> >I don't understand why you use the above condition instead of if
> >(!cfs_rq->tg_load_avg_contrib). Can you elaborate ?
> > 
> >strictly speaking we want to keep the cfs_rq in the list if
> >(cfs_rq->tg_load_avg_contrib != cfs_rq->avg.load_avg) and
> >cfs_rq->avg.load_avg == 0 when we test this condition
> 
> 
> I use this condition primarily based on the function update_tg_load_avg().
> I want to absolutely avoid a situation where cfs_rq_is_decay() returns 
> false but update_tg_load_avg() cannot update its value due to the delta 
> check, which may cause the cfs_rq to remain on the list permanently. 
> Honestly, I am not sure if this will happen, so I took this conservative 
> approach.

Hmm...it doesn't seem we need worry about this situation.

Because when cfs_rq->load_avg is 0, abs(delta) will be
cfs_rq->tg_load_avg_contrib and the following condition:

	if (abs(delta) > cfs_rq->tg_load_avg_contrib / 64)
becomes:
	if (cfs_rq->tg_load_avg_contrib > cfs_rq->tg_load_avg_contrib / 64)

which should always be true, right?

Thanks,
Aaron

> 
> In fact, in the second if-condition of cfs_rq_is_decay(), the comment in 
> the load_avg_is_decayed() function states:"_avg must be null when _sum is 
> null because _avg = _sum / divider". Therefore, when we check this newly 
> added condition, cfs_rq->avg.load_avg should already be 0, right?
> 
> After reading your comments, I carefully considered the differences 
> between these two approaches. Here, my condition is similar
> to cfs_rq->tg_load_avg_contrib != cfs_rq->avg.load_avg but weaker. In 
> fact, when cfs_rq->avg.load_avg is already 0, 
> abs(delta) > cfs_rq->tg_load_avg_contrib / 64 is equivalent to 
> cfs_rq->tg_load_avg_contrib > cfs_rq->tg_load_avg_contrib / 64,
> Further reasoning leads to the condition cfs_rq->tg_load_avg_contrib > 0.
> However if cfs_rq->avg.load_avg is not necessarily 0 at this point, then
> the condition you propose is obviously more accurate, simpler than the
> delta check, and requires fewer calculations.
> 
> I think our perspectives differ. From the perspective of 
> update_tg_load_avg(), the semantics of this condition are as follows: if
> there is no 1ms update limit, and update_tg_load_avg() can continue 
> updating after checking the delta, then in cfs_rq_is_decayed() we should
> return false to keep the cfs_rq in the list for subsequent updates. As 
> mentioned in the first paragraph, this avoids that tricky situation. From
> the perspective of cfs_rq_is_decayed(), the semantics of the condition you
> proposed are that if cfs_rq->avg.load_avg is already 0, then it cannot be
> removed from the list before all load_avg are updated to tg. That makes 
> sense to me, but I still feel like there's a little bit of a risk. Am I 
> being paranoid?
> 
> How do you view these two lines of thinking?
> 
> It's a pleasure to discuss this with you, 
> xupengbo.
> 
> > > +               return false;
> > > +
> > >         return true;
> > >  }
> > >
> > > --
> > > 2.43.0
> > >

  reply	other threads:[~2025-08-06  7:33 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-04 13:03 [PATCH] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out xupengbo
2025-08-05  9:08 ` Aaron Lu
2025-08-05  9:17   ` Vincent Guittot
2025-08-06  6:31     ` [PATCH v2] " xupengbo
2025-08-06  7:33       ` Aaron Lu [this message]
2025-08-06  7:58         ` xupengbo
2025-08-06  8:38         ` xupengbo
2025-08-06  9:22           ` Vincent Guittot
2025-08-25  9:50           ` Aaron Lu
  -- strict thread matches above, loose matches on Subject: below --
2025-08-05 14:41 xupengbo
2025-08-05 16:10 ` Vincent Guittot
2025-08-26  7:57 [PATCH v3] " xupengbo
2025-08-26  8:17 ` [PATCH v2] " xupengbo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250806071848.GA629@bytedance \
    --to=ziqianlu@bytedance.com \
    --cc=bsegall@google.com \
    --cc=cgroups@vger.kernel.org \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=xupengbo@oppo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.