From: Matteo Martelli <matteo.martelli@codethink.co.uk>
To: Aaron Lu <ziqianlu@bytedance.com>
Cc: Valentin Schneider <vschneid@redhat.com>,
Ben Segall <bsegall@google.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Peter Zijlstra <peterz@infradead.org>,
Chengming Zhou <chengming.zhou@linux.dev>,
Josh Don <joshdon@google.com>, Ingo Molnar <mingo@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Xi Wang <xii@google.com>,
linux-kernel@vger.kernel.org, Juri Lelli <juri.lelli@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mel Gorman <mgorman@suse.de>,
Chuyi Zhou <zhouchuyi@bytedance.com>,
Jan Kiszka <jan.kiszka@siemens.com>,
Florian Bezdeka <florian.bezdeka@siemens.com>,
Songtang Liu <liusongtang@bytedance.com>,
Matteo Martelli <matteo.martelli@codethink.co.uk>
Subject: Re: [PATCH v3 0/5] Defer throttle when task exits to user
Date: Fri, 08 Aug 2025 18:37:35 +0200
Message-ID: <d37fcac575ee94c3fe605e08e6297986@codethink.co.uk>
In-Reply-To: <20250804075204.GA496@bytedance>

Hi Aaron,

On Mon, 4 Aug 2025 15:52:04 +0800, Aaron Lu <ziqianlu@bytedance.com> wrote:
> Hi Matteo,
>
> On Fri, Aug 01, 2025 at 04:31:25PM +0200, Matteo Martelli wrote:
> ... ...
> > I encountered this issue on a test image with both PREEMPT_RT and
> > CFS_BANDWIDTH kernel options enabled. The test image is based on
> > freedesktop-sdk (v24.08.10) [1] with custom system configurations on
> > top, and it was being run on qemu x86_64 with 4 virtual CPU cores. One
> > notable system configuration is having most of system services running
> > on a systemd slice, restricted on a single CPU core (with AllowedCPUs
> > systemd option) and using CFS throttling (with CPUQuota systemd option).
> > With this configuration I encountered RCU stalls during boots, I think
> > because of the increased probability given by multiple processes being
> > spawned simultaneously on the same core. After the first RCU stall, the
> > system becomes unresponsive and successive RCU stalls are detected
> > periodically. This seems to match with the deadlock situation described
> > in your cover letter. I could only reproduce RCU stalls with the
> > combination of both PREEMPT_RT and CFS_BANDWIDTH enabled.
> >
> > I previously already tested this patch set at v2 (RFC) [2] on top of
> > kernel v6.14 and v6.15. I've now retested it at v3 on top of kernel
> > v6.16-rc7. I could no longer reproduce RCU stalls in all cases with the
> > patch set applied. More specifically, in the last test I ran, without
> > patch set applied, I could reproduce 32 RCU stalls in 24 hours, about 1
> > or 2 every hour. In this test the system was rebooting just after the
> > first RCU stall occurrence (through panic_on_rcu_stall=1 and panic=5
> > kernel cmdline arguments) or after 100 seconds if no RCU stall occurred.
> > This means the system rebooted 854 times in 24 hours (about 3.7%
> > reproducibility). You can see below two RCU stall instances. I could not
> > reproduce any RCU stall with the same test after applying the patch set.
> > I obtained similar results while testing the patch set at v2 (RFC)[1].
> > Another possibly interesting note is that the original custom
> > configuration was with the slice CPUQuota=150%, then I retested it with
> > CPUQuota=80%. The issue was reproducible in both configurations, notably
> > even with CPUQuota=150% that to my understanding should correspond to no
> > CFS throttling due to the CPU affinity set to 1 core only.
>
> Agree. With cpu affinity set to 1 core, 150% quota should never hit. But
> from the test results, it seems quota is hit somehow because if quota is
> not hit, this series should make no difference.
>
> Maybe fire a bpftrace script and see if quota is actually hit? A
> reference script is here:
> https://lore.kernel.org/lkml/20250521115115.GB24746@bytedance/
>

I looked into this more closely and found that there was another slice
(user.slice) configured with CPUQuota=25%. Even with the CPUQuota limit
disabled on the first mentioned slice (system.slice), I could still
reproduce the RCU stalls. It looks like the throttling was happening
during the first login after boot, as also shown by the following ftrace
logs:

[ 12.019263] podman-user-gen-992 [000] dN.2. 12.023684: throttle_cfs_rq <-pick_task_fair
[ 12.051074] systemd-981 [000] dN.2. 12.055502: throttle_cfs_rq <-pick_task_fair
[ 12.150067] systemd-981 [000] dN.2. 12.154500: throttle_cfs_rq <-put_prev_entity
[ 12.251448] systemd-981 [000] dN.2. 12.255839: throttle_cfs_rq <-put_prev_entity
[ 12.369867] sshd-session-976 [000] dN.2. 12.374293: throttle_cfs_rq <-pick_task_fair
[ 12.453080] bash-1002 [000] dN.2. 12.457502: throttle_cfs_rq <-pick_task_fair
[ 12.551279] bash-1012 [000] dN.2. 12.555701: throttle_cfs_rq <-pick_task_fair
[ 12.651085] podman-998 [000] dN.2. 12.655505: throttle_cfs_rq <-pick_task_fair
[ 12.750509] podman-1001 [000] dN.2. 12.754931: throttle_cfs_rq <-put_prev_entity
[ 12.868351] podman-1030 [000] dN.2. 12.872780: throttle_cfs_rq <-put_prev_entity
[ 12.961076] podman-1033 [000] dN.2. 12.965504: throttle_cfs_rq <-put_prev_entity
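
As an illustration only (this script was not part of the test setup), the
spacing between the throttle events in a log like the one above could be
checked with a short Python sketch. The regular expression and the helper
names throttle_events() and intervals_ms() are assumptions made for this
example; the sample lines are copied from the log.

```python
import re

# First four lines of the ftrace output, copied verbatim from the log.
LOG = """\
[ 12.019263] podman-user-gen-992 [000] dN.2. 12.023684: throttle_cfs_rq <-pick_task_fair
[ 12.051074] systemd-981 [000] dN.2. 12.055502: throttle_cfs_rq <-pick_task_fair
[ 12.150067] systemd-981 [000] dN.2. 12.154500: throttle_cfs_rq <-put_prev_entity
[ 12.251448] systemd-981 [000] dN.2. 12.255839: throttle_cfs_rq <-put_prev_entity
"""

# Capture the ftrace timestamp (the one followed by ':') and the caller.
# The leading dmesg timestamp in brackets is followed by ']' and is skipped.
PATTERN = re.compile(r"\s(\d+\.\d+): throttle_cfs_rq <-(\w+)")


def throttle_events(text):
    """Return a list of (timestamp_seconds, caller) for each throttle event."""
    return [(float(m.group(1)), m.group(2)) for m in PATTERN.finditer(text)]


def intervals_ms(events):
    """Return the gaps between consecutive events, in milliseconds."""
    return [round((b[0] - a[0]) * 1000, 1) for a, b in zip(events, events[1:])]


events = throttle_events(LOG)
for (ts, caller), gap in zip(events[1:], intervals_ms(events)):
    print(f"{ts:.6f} {caller}: {gap} ms after the previous throttle")
```

Running the same helpers over a full trace would show whether the throttle
events follow a regular period.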

After increasing the CPUQuota limit of the user.slice to 50%, the same
test mentioned in my previous email produced fewer RCU stalls and fewer
throttling events in the ftrace logs. Then, with the user.slice CPUQuota
set to 100%, I could no longer reproduce either RCU stalls or traced
throttling events.

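For reference, the three CPUQuota settings tried above can be translated
into runtime per bandwidth period with a small sketch. It assumes the
systemd default period of 100ms (CPUQuotaPeriodSec was not mentioned for
these slices, so this is an assumption) and the cgroup v2 cpu.max format of
"quota period" in microseconds. The helper name cpu_max() is hypothetical.

```python
def cpu_max(quota_percent, period_us=100_000):
    """Return (quota_us, period_us) for a CPUQuota percentage.

    period_us defaults to 100ms, which is assumed here and was not
    confirmed for the slices discussed in this thread.
    """
    quota_us = quota_percent * period_us // 100
    return quota_us, period_us


# The three user.slice settings tried in the tests above.
for pct in (25, 50, 100):
    quota_us, period_us = cpu_max(pct)
    print(f"CPUQuota={pct}% -> cpu.max '{quota_us} {period_us}'")
```

Under these assumptions, CPUQuota=25% leaves 25ms of runtime in each 100ms
period.
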
> > I also ran some quick tests with stress-ng and systemd CPUQuota parameter to
> > verify that CFS throttling was behaving as expected. See details below after
> > RCU stall logs.
>
> Thanks for all these tests. If I read them correctly, in all these
> tests, CFS throttling worked as expected. Right?
>

Yes, correct.

> Best regards,
> Aaron
>
Best regards,
Matteo Martelli