From: Tim Chen <tim.c.chen@linux.intel.com>
To: Qais Yousef <qyousef@layalina.io>, Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	John Stultz <jstultz@google.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	"Chen, Yu C" <yu.c.chen@intel.com>,
	Thomas Gleixner <tglx@kernel.org>,
	linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Subject: Re: [PATCH] sched/fair: Call update_util_est() after dequeue_entities()
Date: Fri, 15 May 2026 11:35:09 -0700
Message-ID: <15a0cfb8457b7d9d767e7f2d2bcfd56b155f6c0b.camel@linux.intel.com>
In-Reply-To: <20260512124653.305275-1-qyousef@layalina.io>

On Tue, 2026-05-12 at 13:46 +0100, Qais Yousef wrote:
> util_est_update() reads task_util() at dequeue, but util_avg, which
> task_util() returns, is updated in dequeue_entities(). To read an accurate
> util_avg at dequeue, do the read after the load averages have been updated
> in dequeue_entities().
> 
> util_est for a periodic task before
> 
>                                 periodic-3114 util_est.enqueued running
>    ┌───────────────────────────────────────────────────────────────────────────────────────────────┐
> 183┤                ▖▗  ▐▖         ▖ ▗▙   ▗   ▗▙▖▖       ▖▖   ▖       ▖▖        ▗  ▟  ▗▄▖          │
> 139┤               ▐▛█▜▙▞▀▄▄▞▚▄▟█▞▙█▄▟▀▚▄▄▞▚▄▄▟▀▀▛▄▝▄▄▄▙█▛▛█▛▜▛▄▄▀▄█▙▛▛▛▙▄▀▄▄▖▜▄▟█▟▀▜▟▄▜▀▄▄▟▙▖     │
>  95┤              ▐▀    ▘   ▝   ▝        ▝▘        ▘   ▘▘       ▝▘       ▝▘  ▝    ▝        ▀       │
>    │              ▛                                                                                │
>  51┤             ▐▘                                                                                │
>   7┤      ▖▗▗  ▗▄▐                                                                                 │
>    └┬─────────┬──────────┬─────────┬──────────┬─────────┬──────────┬─────────┬──────────┬─────────┬┘
>   0.00      0.65       1.30      1.96       2.61      3.26       3.91      4.57       5.22     5.87
> 
> and after
> 
>                                  periodic-2977 util_est.enqueued running
>      ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
> 157.0┤               ▙▄ ▗▄  ▗▄▄▄ ▗▄  ▗▄▄▄▗▄▄  ▗▄▄▖ ▄   ▄▄▄   ▄  ▄▖▖  ▄▄▄▄▄▖▖▝▙▄▄▄▄▄▄▖ ▗▄           │
> 119.5┤             ▗▄▌▘▀▀ ▀▀▀ ▝▀▀▘▝▀▀▀ ▝▀▘ ▝▀▀▘ ▀▝▀▘▀▀▀▘▝▀▀▀▀▀▀▀▘▝▝▀▀ ▀   ▝▝▀  ▀   ▀▀▀▀            │
>  82.0┤             ▟                                                                               │
>      │             ▌                                                                               │
>  44.5┤             ▌                                                                               │
>   7.0┤      ▗   ▗▖ ▌                                                                               │
>      └┬─────────┬─────────┬──────────┬─────────┬─────────┬─────────┬──────────┬─────────┬─────────┬┘
>     0.00      0.65      1.30       1.95      2.60      3.25      3.90       4.56      5.21     5.86
> 
> Note how the signal before the fix is noisier and can peak at 183, vs 157 now.
> 
> Fixes: b55945c500c5 ("sched: Fix pick_next_task_fair() vs try_to_wake_up() race")
> Signed-off-by: Qais Yousef <qyousef@layalina.io>
> ---
> 
> This is split out of the series in [1], where I stumbled upon this problem.
> AFAICS it needs backporting all the way to 6.12 LTS.
> 
> [1] https://lore.kernel.org/lkml/20260504020003.71306-1-qyousef@layalina.io/
> 
>  kernel/sched/fair.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 728965851842..96ba97e5f4ae 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7401,6 +7401,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
>   */
>  static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  {
> +	int ret;
> +
>  	if (task_is_throttled(p)) {
>  		dequeue_throttled_task(p, flags);
>  		return true;
> @@ -7409,8 +7411,9 @@ static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  	if (!p->se.sched_delayed)
>  		util_est_dequeue(&rq->cfs, p);
>  
> +	ret = dequeue_entities(rq, &p->se, flags);
>  	util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP);

I thought that util_est_update() was intentionally called before
dequeue_entities() to bring the utilization estimate of task p up to
date right before the dequeue, so that dequeue_entities() then runs
with an up-to-date utilization estimate for p.

Perhaps util_est_update() should instead be moved before
util_est_dequeue(), so that p's updated utilization estimate is what
gets subtracted from the rq utilization.

@@ -8002,10 +8002,10 @@ static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
                return true;
        }
 
+       util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP);
        if (!p->se.sched_delayed)
                util_est_dequeue(&rq->cfs, p);
 
-       util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP);
        if (dequeue_entities(rq, &p->se, flags) < 0)
                return false;


Tim

> -	if (dequeue_entities(rq, &p->se, flags) < 0)
> +	if (ret < 0)
>  		return false;
>  
>  	/*

Thread overview: 6+ messages
2026-05-12 12:46 [PATCH] sched/fair: Call update_util_est() after dequeue_entities() Qais Yousef
2026-05-12 20:52 ` John Stultz
2026-05-13 12:46   ` Vincent Guittot
2026-05-15  1:51     ` Qais Yousef
2026-05-15 15:23       ` Vincent Guittot
2026-05-15 18:35 ` Tim Chen [this message]
