From: Aaron Lu <ziqianlu@bytedance.com>
To: Hao Jia <jiahao.kernel@gmail.com>
Cc: "Valentin Schneider" <vschneid@redhat.com>,
"Ben Segall" <bsegall@google.com>,
"K Prateek Nayak" <kprateek.nayak@amd.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Chengming Zhou" <chengming.zhou@linux.dev>,
"Josh Don" <joshdon@google.com>, "Ingo Molnar" <mingo@redhat.com>,
"Vincent Guittot" <vincent.guittot@linaro.org>,
"Xi Wang" <xii@google.com>,
linux-kernel@vger.kernel.org,
"Juri Lelli" <juri.lelli@redhat.com>,
"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Mel Gorman" <mgorman@suse.de>,
"Chuyi Zhou" <zhouchuyi@bytedance.com>,
"Jan Kiszka" <jan.kiszka@siemens.com>,
"Florian Bezdeka" <florian.bezdeka@siemens.com>,
"Songtang Liu" <liusongtang@bytedance.com>,
"Chen Yu" <yu.c.chen@intel.com>,
"Matteo Martelli" <matteo.martelli@codethink.co.uk>,
"Michal Koutný" <mkoutny@suse.com>,
"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>
Subject: Re: [PATCH] sched/fair: Prevent cfs_rq from being unthrottled with zero runtime_remaining
Date: Tue, 14 Oct 2025 19:50:18 +0800
Message-ID: <20251014115018.GC41@bytedance>
In-Reply-To: <84382429-02c1-12d5-bdf4-23e880246cf3@gmail.com>

On Tue, Oct 14, 2025 at 07:01:15PM +0800, Hao Jia wrote:
>
> Hello Aaron,
>
> Thank you for your reply.
>
> On 2025/10/14 17:11, Aaron Lu wrote:
> > Hi Hao,
> >
> > On Tue, Oct 14, 2025 at 03:43:10PM +0800, Hao Jia wrote:
> > >
> > > Hello Aaron,
> > >
> > > On 2025/9/29 15:46, Aaron Lu wrote:
> > > > When a cfs_rq is to be throttled, its limbo list should be empty and
> > > > that's why there is a warn in tg_throttle_down() for non empty
> > > > cfs_rq->throttled_limbo_list.
> > > >
> > > > When running a test with the following hierarchy:
> > > >
> > > > >           root
> > > > >         /      \
> > > > >         A*     ...
> > > > >      /  |  \   ...
> > > > >         B
> > > > >        /  \
> > > > >       C*
> > > >
> > > > where both A and C have quota settings, that warn on non empty limbo list
> > > > > is triggered for a cfs_rq of C, let's call it cfs_rq_c (and ignore the CPU
> > > > part of the cfs_rq for the sake of simpler representation).
> > > >
> > >
> > > I encountered a similar warning a while ago and fixed it. I have a question
> > > I'd like to ask. tg_unthrottle_up(cfs_rq_C) calls enqueue_task_fair(p) to
> > > enqueue a task, which requires that the runtime_remaining of task p's entire
> > > task_group hierarchy be greater than 0.
> > >
> > > > In addition to the case you fixed above, when bandwidth control is
> > > > running normally, is it possible that there's a corner case where
> > > > cfs_A->runtime_remaining > 0 but cfs_B->runtime_remaining < 0 could
> > > > trigger a similar warning?
> >
> > Do you mean B also has quota set and cfs_B's runtime_remaining < 0?
> > In this case, B should be throttled and C is a descendent of B so should
> > also be throttled, i.e. C can't be unthrottled when B is in throttled
> > state. Do I understand you correctly?
> >
> Yes, both A and B have quota set.
>
> Is there a possible corner case?
> Asynchronous unthrottling causes other running entities to completely
> consume cfs_B->runtime_remaining (cfs_B->runtime_remaining < 0) but not
> completely consume cfs_A->runtime_remaining (cfs_A->runtime_remaining > 0)
> when we call unthrottle_cfs_rq(cfs_rq_A).

Let me try to understand the situation here: in your described setup,
do all three task groups (A, B, C) have quota set?

>
> When we unthrottle_cfs_rq(cfs_rq_A), cfs_A->runtime_remaining > 0, but if
> cfs_B->runtime_remaining < 0 at this time,

Hmm... if cfs_B->runtime_remaining < 0, why isn't it throttled?

> therefore, when enqueue_task_fair(p)->check_enqueue_throttle(cfs_rq_B)->throttle_cfs_rq(cfs_rq_B),

I assume p is a task of group B?

So when A is unthrottled, since p is a throttled task of group B and B
is still throttled, enqueue_task_fair(p) should not happen.
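
To spell out that reasoning, here is a toy userspace sketch (not kernel
code). The struct and the toy_* helpers are made up for illustration, and
walking parent pointers is an assumption of the sketch, not a description
of how fair.c tracks hierarchy throttling:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one level of the hierarchy; NOT struct cfs_rq. */
struct toy_cfs_rq {
	struct toy_cfs_rq *parent;	/* NULL for the top level */
	bool throttled;			/* this level is out of quota */
};

/* True if this level or any ancestor is throttled. */
static bool toy_throttled_hierarchy(const struct toy_cfs_rq *cfs_rq)
{
	for (; cfs_rq; cfs_rq = cfs_rq->parent)
		if (cfs_rq->throttled)
			return true;
	return false;
}

/*
 * A throttled task may be put back on the runqueue only when
 * nothing at or above its own level is still throttled.
 */
static bool toy_can_enqueue(const struct toy_cfs_rq *task_cfs_rq)
{
	assert(task_cfs_rq != NULL);
	return !toy_throttled_hierarchy(task_cfs_rq);
}
```

With B as a child of A and only B throttled, toy_can_enqueue() on B's
level returns false. Unthrottling A does not change that, it only
becomes true once B itself is unthrottled.
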
> a warning may be triggered.
>
> My core question is:
> When we call unthrottle_cfs_rq(cfs_rq_A), we only check
> cfs_rq_A->runtime_remaining. However,
> enqueue_task_fair(p)->enqueue_entity(C->B->A)->check_enqueue_throttle() does

According to this info, I assume p is a task of group C here. If
unthrottling A causes p to be enqueued, that means either groups C and B
do not have quota set, or groups C and B are in the unthrottled state.

> require that the runtime_remaining of each task_group level of task p is
> greater than 0.

If groups C and B are in the unthrottled state, their runtime_remaining
should be > 0.
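
To make that expectation concrete, here is a toy userspace sketch (not
kernel code) that walks from a task's level up to the top and reports the
first level that has quota enabled, is not throttled, and yet has no
runtime left. struct toy_level and toy_first_violation() are hypothetical
names used only for this illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one group level on one CPU; NOT struct cfs_rq. */
struct toy_level {
	const char *name;
	struct toy_level *parent;	/* NULL for the top level */
	bool runtime_enabled;		/* quota is set for this group */
	bool throttled;
	long long runtime_remaining;
};

/*
 * Return the first level, starting at the task's own level, that
 * breaks the expectation "quota enabled and unthrottled implies
 * runtime_remaining > 0", or NULL if every level satisfies it.
 */
static const struct toy_level *
toy_first_violation(const struct toy_level *lvl)
{
	assert(lvl != NULL);
	for (; lvl; lvl = lvl->parent)
		if (lvl->runtime_enabled && !lvl->throttled &&
		    lvl->runtime_remaining <= 0)
			return lvl;
	return NULL;
}
```

For a chain C -> B -> A where B has quota, is not throttled and has
runtime_remaining < 0, the function returns B. That is the state your
question is about, and the one I would not expect to see.
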
>
> Can we guarantee this?

To check this, a warn like the one below could be used. Can you try it in
your setup and see whether you can hit it? Thanks.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ef11783369d7..c347aa28c411a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5908,6 +5908,8 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
 		cfs_rq->throttled_clock_self_time += delta;
 	}
 
+	WARN_ON_ONCE(cfs_rq->runtime_enabled && cfs_rq->runtime_remaining <= 0);
+
 	/* Re-enqueue the tasks that have been throttled at this level. */
 	list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_node) {
 		list_del_init(&p->throttle_node);