All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matteo Martelli <matteo.martelli@codethink.co.uk>
To: Aaron Lu <ziqianlu@bytedance.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>
Cc: "Valentin Schneider" <vschneid@redhat.com>,
	"Ben Segall" <bsegall@google.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Chengming Zhou" <chengming.zhou@linux.dev>,
	"Josh Don" <joshdon@google.com>, "Ingo Molnar" <mingo@redhat.com>,
	"Vincent Guittot" <vincent.guittot@linaro.org>,
	"Xi Wang" <xii@google.com>,
	linux-kernel@vger.kernel.org,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Mel Gorman" <mgorman@suse.de>,
	"Chuyi Zhou" <zhouchuyi@bytedance.com>,
	"Jan Kiszka" <jan.kiszka@siemens.com>,
	"Florian Bezdeka" <florian.bezdeka@siemens.com>,
	"Songtang Liu" <liusongtang@bytedance.com>,
	"Chen Yu" <yu.c.chen@intel.com>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
	"Matteo Martelli" <matteo.martelli@codethink.co.uk>
Subject: Re: [PATCH 1/4] sched/fair: Propagate load for throttled cfs_rq
Date: Thu, 25 Sep 2025 15:33:54 +0200	[thread overview]
Message-ID: <e2e558b863c929c5019264b2ddefd4c0@codethink.co.uk> (raw)
In-Reply-To: <20250925120504.GC120@bytedance>

Hi Aaron,

On Thu, 25 Sep 2025 20:05:04 +0800, Aaron Lu <ziqianlu@bytedance.com> wrote:
> On Thu, Sep 25, 2025 at 04:52:25PM +0530, K Prateek Nayak wrote:
> > 
> > On 9/25/2025 2:59 PM, Aaron Lu wrote:
> > > Hi Prateek,
> > > 
> > > On Thu, Sep 25, 2025 at 01:47:35PM +0530, K Prateek Nayak wrote:
> > >> Hello Aaron, Matteo,
> > >>
> > >> On 9/24/2025 5:03 PM, Aaron Lu wrote:
> > >>>> ...
> > >>>> The test setup is the same used in my previous testing for v3 [2], where
> > >>>> the CFS throttling events are mostly triggered by the first ssh logins
> > >>>> into the system as the systemd user slice is configured with CPUQuota of
> > >>>> 25%. Also note that the same systemd user slice is configured with CPU
> > >>>
> > >>> I tried to replicate this setup, below is my setup using a 4 cpu VM
> > >>> and rt kernel at commit fe8d238e646e("sched/fair: Propagate load for
> > >>> throttled cfs_rq"):
> > >>> # pwd
> > >>> /sys/fs/cgroup/user.slice
> > >>> # cat cpu.max
> > >>> 25000 100000
> > >>> # cat cpuset.cpus
> > >>> 0
> > >>>
> > >>> I then login using ssh as a normal user and I can see throttle happened
> > >>> but couldn't hit this warning. Do you have to do something special to
> > >>> trigger it?

It wasn't very reproducible in my setup either, but I found out that the
warning was being triggered more often when I tried to ssh into the
system just after boot, probably due to some additional load from
processes spawned during the boot phase. Therefore I prepared a
reproducer script that resemble my initial setup, plus a stress-ng
worker in the background while connecting with ssh to the system. I also
reduced the CPUQuota to 10% which seemed to increase the probability to
trigger the warning. With this script I can reproduce the warning about
once or twice every 10 ssh executions. See the script at the end of this
email.

> > >>>> [   18.421350] WARNING: CPU: 0 PID: 1 at kernel/sched/fair.c:400 enqueue_task_fair+0x925/0x980
> > >>>
> > >>> I stared at the code and haven't been able to figure out when
> > >>> enqueue_task_fair() would end up with a broken leaf cfs_rq list.
> > >>>

> > >>
> > >> Yeah neither could I. I tried running with PREEMPT_RT too and still
> > >> couldn't trigger it :(
> > >>
> > >> But I'm wondering if all we are missing is:
> > >>
> > >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > >> index f993de30e146..5f9e7b4df391 100644
> > >> --- a/kernel/sched/fair.c
> > >> +++ b/kernel/sched/fair.c
> > >> @@ -6435,6 +6435,7 @@ static void sync_throttle(struct task_group *tg, int cpu)
> > >>  
> > >>  	cfs_rq->throttle_count = pcfs_rq->throttle_count;
> > >>  	cfs_rq->throttled_clock_pelt = rq_clock_pelt(cpu_rq(cpu));
> > >> +	cfs_rq->pelt_clock_throttled = pcfs_rq->pelt_clock_throttled;
> > >>  }
> > >>  
> > >>  /* conditionally throttle active cfs_rq's from put_prev_entity() */
> > >> ---
> > >>
> > >> This is the only way we can currently have a break in
> > >> cfs_rq_pelt_clock_throttled() hierarchy.
> > >>
> ...
> 
> Hi Matteo,
> 
> Can you test the above diff Prateek sent in his last email? Thanks.
> 

I have just tested with the same script below the diff sent by Prateek
in [1] (also quoted above) that changes sync_throttle(), and I couldn't
reproduce the warning.

Here's the script (I hope it doesn't add too much noise to the email
thread).

---
# Copyright (C) 2024 Codethink Limited
# SPDX-License-Identifier: GPL-2.0-only
#!/usr/bin/bash

script_path=$(realpath "$0")
wrk_dir=$(pwd)
kernel_version=fe8d238e646e
kernel_dir=${wrk_dir}/../linux
kernel_image=${kernel_dir}/arch/x86_64/boot/bzImage
qemu_image_url=https://cdimage.debian.org/images/cloud/sid/daily/20250919-2240/debian-sid-nocloud-amd64-daily-20250919-2240.qcow2
qemu_image_src=${wrk_dir}/$(basename "${qemu_image_url}")
qemu_image=${wrk_dir}/image.qcow2
guest_pkgs="stress-ng" # comma separated list of additional packages to install

run_qemu() {
    qemu-system-x86_64 \
        -m 2G -smp 4 \
        -nographic \
        -nic user,hostfwd=tcp::2222-:22 \
        -M q35,accel=kvm \
        -drive format=qcow2,file="${qemu_image}" \
        -virtfs local,path=.,mount_tag=shared,security_model=mapped-xattr \
        -serial file:console.log \
        -monitor none \
        -append "root=/dev/sda1 console=ttyS0,115200 sysctl.kernel.panic_on_oops=1" \
        -kernel "${kernel_image}"
}

run_ssh() {
    local cmd=$1
    ssh root@localhost -p 2222 \
        -i ${wrk_dir}/id_ed25519 -F none \
        -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
        $cmd
}

setup_kernel() {
    echo "Building kernel ..."
    pushd $kernel_dir
    git reset --hard $kernel_version && git clean -f -d
    make mrproper
    make defconfig
    scripts/config -k -e EXPERT
    scripts/config -k -e PREEMPT_RT
    scripts/config -k -e CFS_BANDWIDTH
    scripts/config -k -e FUNCTION_TRACER
    make olddefconfig
    make -j8
    popd
}

setup_image() {
    echo "Preparing qemu image ..."
    killall -9 qemu-system-x86_64

    if [ ! -f "${qemu_image_src}" ] ; then
        wget ${qemu_image_url} -O ${qemu_image_src}
    fi

    cp ${qemu_image_src} ${qemu_image}

    echo "[Unit]
Description=User and Session Slice
Before=slices.target

[Slice]
CPUQuota=10%
AllowedCPUs=0
    " > user.slice

    yes | ssh-keygen -t ed25519 -f id_ed25519 -N ''

    # https://wiki.debian.org/ThomasChung/CloudImage
    virt-customize -a ${qemu_image} \
        --install ssh,${guest_pkgs} \
        --append-line "/root/.ssh/authorized_keys:$(cat id_ed25519.pub)" \
        --upload "user.slice:/etc/systemd/system/user.slice" \
        --chown "0:0:/etc/systemd/system/user.slice"
}

setup_kernel
setup_image

echo "Run test..."
ts=$(date --utc "+%FT%TZ")
out_dir=${wrk_dir}/out/${ts}
out_log_dir=${out_dir}/logs
mkdir -p ${out_dir} ${out_log_dir}
cp ${script_path} ${out_dir}
cp ${kernel_dir}/.config ${out_dir}/kernel.config
trap "exit" INT TERM
trap "kill 0" EXIT
export -f run_ssh
export wrk_dir

while true; do
        run_qemu &
        qemu_pid=$!
        sleep 10
        run_ssh 'stress-ng --cpu 1 --timeout 60' &
        for i in $(seq 1 10) ; do
            echo "running ssh $i"
            timeout 3 bash -c run_ssh #launch interactive ssh
        done
        run_ssh "systemctl poweroff"
        wait $qemu_pid

        serial_out_file=${out_log_dir}/serial-$(date "+%F-%T").log
	mv ${wrk_dir}/console.log ${serial_out_file}
	grep -e "Kernel panic" -e "Call Trace" -e "Kernel BUG" -e "cut here" ${serial_out_file} && \
            echo "${serial_out_file}" >> ${out_dir}/panics-warns
        grep -e "rcu.*detected.*stall" ${serial_out_file} && \
            echo "${serial_out_file}" >> ${out_dir}/rcu-stalls
done
---

[1]: https://lore.kernel.org/all/db7fc090-5c12-450b-87a4-bcf06e10ef68@amd.com/

Best regards,
Matteo Martelli



  reply	other threads:[~2025-09-25 13:35 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-10  9:50 [PATCH 0/4] Task based throttle follow ups Aaron Lu
2025-09-10  9:50 ` [PATCH 1/4] sched/fair: Propagate load for throttled cfs_rq Aaron Lu
2025-09-10 12:36   ` Chengming Zhou
2025-09-16 11:43   ` [tip: sched/core] " tip-bot2 for Aaron Lu
2025-09-23 13:05   ` [PATCH 1/4] " Matteo Martelli
2025-09-24 11:33     ` Aaron Lu
2025-09-25  8:17       ` K Prateek Nayak
2025-09-25  9:29         ` Aaron Lu
2025-09-25 11:22           ` K Prateek Nayak
2025-09-25 12:05             ` Aaron Lu
2025-09-25 13:33               ` Matteo Martelli [this message]
2025-09-26  4:32                 ` K Prateek Nayak
2025-09-26  5:53                   ` Aaron Lu
2025-09-26  8:19                 ` [PATCH] sched/fair: Start a cfs_rq on throttled hierarchy with PELT clock throttled K Prateek Nayak
2025-09-26  9:38                   ` Aaron Lu
2025-09-26 10:11                     ` K Prateek Nayak
2025-10-01 20:37                     ` Benjamin Segall
2025-09-26 14:48                   ` Matteo Martelli
2025-10-21  5:35                 ` [PATCH v2] " K Prateek Nayak
2025-10-21 10:10                   ` Peter Zijlstra
2025-10-22 13:28                   ` [tip: sched/urgent] " tip-bot2 for K Prateek Nayak
2025-09-29  7:51     ` [PATCH 1/4] sched/fair: Propagate load for throttled cfs_rq Aaron Lu
2025-09-10  9:50 ` [PATCH 2/4] sched/fair: update_cfs_group() for throttled cfs_rqs Aaron Lu
2025-09-16 11:43   ` [tip: sched/core] " tip-bot2 for Aaron Lu
2025-09-10  9:50 ` [PATCH 3/4] sched/fair: Do not special case tasks in throttled hierarchy Aaron Lu
2025-09-16 11:43   ` [tip: sched/core] " tip-bot2 for Aaron Lu
2025-09-10  9:50 ` [PATCH 4/4] sched/fair: Do not balance task to a throttled cfs_rq Aaron Lu
2025-09-11  2:03   ` kernel test robot
2025-09-12  3:44     ` [PATCH update " Aaron Lu
2025-09-12  3:56       ` K Prateek Nayak
2025-09-16 11:43       ` [tip: sched/core] " tip-bot2 for Aaron Lu
2025-09-11 10:42 ` [PATCH 0/4] Task based throttle follow ups Peter Zijlstra
2025-09-11 12:16   ` Aaron Lu
2025-09-15 21:54 ` Benjamin Segall
2025-09-19 14:37 ` Valentin Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e2e558b863c929c5019264b2ddefd4c0@codethink.co.uk \
    --to=matteo.martelli@codethink.co.uk \
    --cc=bigeasy@linutronix.de \
    --cc=bsegall@google.com \
    --cc=chengming.zhou@linux.dev \
    --cc=dietmar.eggemann@arm.com \
    --cc=florian.bezdeka@siemens.com \
    --cc=jan.kiszka@siemens.com \
    --cc=joshdon@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liusongtang@bytedance.com \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=mkoutny@suse.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=xii@google.com \
    --cc=yu.c.chen@intel.com \
    --cc=zhouchuyi@bytedance.com \
    --cc=ziqianlu@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.