From: William Montaz <willymontaz@gmail.com>
To: vincent.guittot@linaro.org
Cc: bsegall@google.com, dietmar.eggemann@arm.com,
	dsmythies@telus.net, juri.lelli@redhat.com,
	kprateek.nayak@amd.com, linux-kernel@vger.kernel.org,
	mgorman@suse.de, mingo@kernel.org, peterz@infradead.org,
	quzicheng@huawei.com, rostedt@goodmis.org,
	shubhang@os.amperecomputing.com, vschneid@redhat.com,
	wangtao554@huawei.com
Subject: Re: [PATCH v2 6/7] sched/fair: Revert 6d71a9c61604 ("sched/fair: Fix EEVDF entity placement bug causing scheduling lag")
Date: Tue, 24 Mar 2026 10:01:26 +0000
Message-ID: <20260324100126.3502-1-willymontaz@gmail.com>
In-Reply-To: <CAKfTPtAcAC1Nac6z=59U-OkpgT6bvmG9a9kE+pGeD=Z80+xt=g@mail.gmail.com>

Hi,

> Zicheng Qu reported that, because avg_vruntime() always includes
> cfs_rq->curr, when ->on_rq, place_entity() doesn't work right.

> Specifically, the lag scaling in place_entity() relies on
> avg_vruntime() being the state *before* placement of the new entity.
> However in this case avg_vruntime() will actually already include the
> entity, which breaks things.

This has proven harmful on our production cluster, which runs kernel version 6.18.19.

We see the load_avg of a parent cgroup entity (/kubepods.slice in our case) change very frequently,
which leads to entity_tick->update_cfs_group->reweight_entity being called very often (at pretty much every entity_tick call).

If a CPU-hogging task is a member of this cgroup and bound to a CPU,
we observe starvation of processes that are bound to the same CPU but are not members of this cgroup
(kworkers for ceph rbd in our production case).

Looking at /sys/kernel/debug/sched/debug, we can indeed see that .avg_vruntime and .zero_vruntime of cfs_rq[0]:/
continuously move backwards while .left_deadline and .left_vruntime are stuck.
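
For anyone who wants to watch these four fields while reproducing, here is a
minimal sketch. The function name root_cfs_fields is hypothetical, and it
assumes the debug file prints a "cfs_rq[N]:/" header followed by
"  .name : value" lines; adjust it if your kernel lays the file out
differently. Reading the real file needs debugfs mounted and root.

```shell
# Print .avg_vruntime, .zero_vruntime, .left_deadline and .left_vruntime
# for the root cfs_rq of one CPU.
#   $1: CPU number
#   $2: file to parse (defaults to the live debugfs file)
# Assumption: sections start with a "cfs_rq[N]:<cgroup path>" header and the
# root cgroup's section is exactly "cfs_rq[N]:/".
root_cfs_fields() {
    cpu="$1"
    file="${2:-/sys/kernel/debug/sched/debug}"
    awk -v hdr="cfs_rq[${cpu}]:/" '
        # Entering the root cfs_rq section of the requested CPU.
        $0 == hdr { in_sec = 1; next }
        # Any other runqueue header ends the section.
        /^(cfs_rq|rt_rq|dl_rq)\[/ { in_sec = 0 }
        in_sec && ($1 == ".avg_vruntime" || $1 == ".zero_vruntime" ||
                   $1 == ".left_deadline" || $1 == ".left_vruntime") {
            print $1, $NF
        }
    ' "$file"
}
```

Calling root_cfs_fields 0 in a loop with a short sleep between calls, while
the reproducer runs, should be enough to see whether the first two values
keep moving while the last two stay put.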

This is likely due to a wrong lag calculation for the cgroup entity within the root cgroup.

We can reproduce this in a sandbox as follows:
* create a cgroup 'CG'
* run a CPU-intensive task 'offender', bound to a CPU
* move the task to cgroup 'CG'
* run a CPU-intensive task 'victim', bound to the same CPU
* to trigger the frequent calls to reweight_entity, rapidly cycle CG/cpu.weight through 99, 100, 101 in a loop
* 'victim' will stop running

I use the following script to reproduce:

---
#!/bin/bash
TARGET_CPU=0
CG_PATH="/sys/fs/cgroup/test_reweight"

cat << 'EOF' > heartbeat.c
#include <inttypes.h>
#include <stdio.h>
#include <time.h>
#include <stdint.h>
int main() {
    struct timespec last, now;
    uint64_t count = 0;
    clock_gettime(CLOCK_MONOTONIC, &last);
    /* Busy loop: report the iteration count roughly every 500 ms. */
    while (1) {
        count++;
        clock_gettime(CLOCK_MONOTONIC, &now);
        long delta_ms = (now.tv_sec - last.tv_sec) * 1000 + (now.tv_nsec - last.tv_nsec) / 1000000;
        if (delta_ms >= 500) {
            printf("Tick: %" PRIu64 " iterations (delta %ld ms)\n", count, delta_ms);
            fflush(stdout);
            count = 0;
            last = now;
        }
    }
    return 0;
}
EOF

gcc -O2 heartbeat.c -o heartbeat

mkdir -p "$CG_PATH"
echo "+cpu" > /sys/fs/cgroup/cgroup.subtree_control

taskset -c $TARGET_CPU yes > /dev/null &
PID_YES=$!
echo $PID_YES > "$CG_PATH/cgroup.procs"

taskset -c $TARGET_CPU ./heartbeat &
PID_HEARTBEAT=$!

echo "5 seconds observation..."
sleep 5

echo "Jittering on $CG_PATH/cpu.weight..."
trap "kill $PID_YES $PID_HEARTBEAT; rmdir $CG_PATH; rm heartbeat.c; rm heartbeat; exit" SIGINT SIGTERM
while true; do
    echo 99 > "$CG_PATH/cpu.weight"
    echo 100 > "$CG_PATH/cpu.weight"
    echo 101 > "$CG_PATH/cpu.weight"
done
---

I tested the following versions:
* LTS 5.10.252, 5.15.202, 6.1.166, 6.6.129, 6.12.77 --> no issue
* LTS 6.18.19 has the issue
* Stable 6.19.9 has the issue
* Mainline 7.0-rc5 has the issue
* Tip 7.0.0-rc5+ --> no issue

Finally, I applied the patch to 6.18.19 LTS, which solves the issue. However, that kernel does not benefit from the
earlier patches in the series, such as [PATCH v2 5/7] sched/fair: Increase weight bits for avg_vruntime.

Thus I would prefer to let you decide how you want to address the backport to 6.18.

If you want, I can share my patch file; just let me know.

Best regards





