From: Aaron Lu <ziqianlu@bytedance.com>
To: Hao Jia <jiahao.kernel@gmail.com>
Cc: "Valentin Schneider" <vschneid@redhat.com>,
	"Ben Segall" <bsegall@google.com>,
	"K Prateek Nayak" <kprateek.nayak@amd.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Chengming Zhou" <chengming.zhou@linux.dev>,
	"Josh Don" <joshdon@google.com>, "Ingo Molnar" <mingo@redhat.com>,
	"Vincent Guittot" <vincent.guittot@linaro.org>,
	"Xi Wang" <xii@google.com>,
	linux-kernel@vger.kernel.org,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Mel Gorman" <mgorman@suse.de>,
	"Chuyi Zhou" <zhouchuyi@bytedance.com>,
	"Jan Kiszka" <jan.kiszka@siemens.com>,
	"Florian Bezdeka" <florian.bezdeka@siemens.com>,
	"Songtang Liu" <liusongtang@bytedance.com>,
	"Chen Yu" <yu.c.chen@intel.com>,
	"Matteo Martelli" <matteo.martelli@codethink.co.uk>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>
Subject: Re: [PATCH] sched/fair: Prevent cfs_rq from being unthrottled with zero runtime_remaining
Date: Wed, 15 Oct 2025 16:40:45 +0800
Message-ID: <20251015084045.GB35@bytedance>
In-Reply-To: <4902f7d4-c6ee-bc29-dd7f-282d19d0b3b2@gmail.com>

On Wed, Oct 15, 2025 at 02:31:27PM +0800, Hao Jia wrote:
> On 2025/10/15 10:51, Aaron Lu wrote:
> > On Wed, Oct 15, 2025 at 09:43:20AM +0800, Hao Jia wrote:
> > ... ...
> > > Yes, I've already hit the cfs_rq->runtime_remaining < 0 condition in
> > > tg_unthrottle_up().
> > > 
> > > This morning, after applying your patch, I still get the same issue.
> > > However, as before, because cfs_rq->curr isn't NULL,
> > > check_enqueue_throttle() returns prematurely, preventing the triggering of
> > > throttle_cfs_rq().
> > > 
> > > 
> > > Some information to share with you.
> > 
> > Can you also share your cgroup setup and related quota setting etc. and
> > how to trigger it? Thanks.
> 
> I ran some internal workloads on my test machine with different quota
> settings, and added 10 sched messaging benchmark cgroups, setting their
> cpu.max to 1000 100000.
> 
> perf bench sched messaging -g 10 -t -l 50000 &
> 
> I'm not sure if the issue can be reproduced without these internal
> workloads.
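
For a sense of how tight that limit is: cgroup v2's cpu.max takes
"$MAX $PERIOD" in microseconds. So "1000 100000" grants 1 ms of CPU time
per 100 ms period, i.e. 1% of one CPU. The C sketch below only does that
arithmetic. cpu_max_share() is a hypothetical helper, not a kernel or
libcgroup API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the share of one CPU allowed by a cgroup v2
 * cpu.max value "$MAX $PERIOD" (both in microseconds). Returns -1.0 when
 * $MAX is "max" (no limit) and -2.0 if the value cannot be parsed.
 */
static double cpu_max_share(const char *val)
{
	unsigned long quota_us, period_us;

	/* "max" (optionally followed by a period) means no bandwidth limit. */
	if (strncmp(val, "max", 3) == 0)
		return -1.0;

	/* Both fields are required here, and the period must be non-zero. */
	if (sscanf(val, "%lu %lu", &quota_us, &period_us) != 2 || period_us == 0)
		return -2.0;

	return (double)quota_us / (double)period_us;
}
```

With a quota this small, each group presumably exhausts its runtime within
the first millisecond of every period. Throttle/unthrottle transitions
should therefore be frequent, which may help explain why the condition
shows up under this load.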

Thanks for the report, I think I understand your concern now.

I managed to trigger a condition in tg_unthrottle_up() where a cfs_rq
has runtime_enabled set but a negative runtime_remaining. The setup is
the same as before:

          root
        /      \
        A*     ...
     /  |  \   ...
        B
       /  \
      C*

where both A and C have quota settings.
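
The hierarchy above can be modeled as cfs_rqs linked by parent pointers.
A cfs_rq is in a throttled hierarchy when it or any ancestor is
throttled, even if it is not throttled itself. What follows is a minimal
sketch under that assumption. struct model_cfs_rq and
model_throttled_hierarchy() are hypothetical. As far as I know, the
kernel tracks this with a per-cfs_rq throttle_count rather than by
walking parents.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a group's per-CPU cfs_rq. */
struct model_cfs_rq {
	const char *name;
	struct model_cfs_rq *parent;  /* NULL for the root cfs_rq */
	bool runtime_enabled;         /* the group has a quota */
	bool throttled;               /* this cfs_rq itself is throttled */
	long runtime_remaining;       /* ns left in the current period */
};

/* True if this cfs_rq or any of its ancestors is throttled. */
static bool model_throttled_hierarchy(const struct model_cfs_rq *cfs_rq)
{
	for (; cfs_rq != NULL; cfs_rq = cfs_rq->parent)
		if (cfs_rq->throttled)
			return true;
	return false;
}
```

With root -> A -> B -> C built from this struct, setting A's throttled
flag makes model_throttled_hierarchy() true for C. C's own throttled flag
stays false and its runtime_remaining stays positive.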

1 Initially, both cfs_rq_a and cfs_rq_c are in unthrottled state with a
  positive runtime_remaining.
2 At some time, cfs_rq_a is throttled. cfs_rq_c is now in a throttled
  hierarchy, but it's not throttled and has a positive runtime_remaining.
3 Some time later, task @p gets enqueued to cfs_rq_c and starts executing
  in kernel mode, consuming all of cfs_rq_c's runtime_remaining.
  account_cfs_rq_runtime() accounts this correctly, but resched_curr()
  doesn't cause schedule() -> check_cfs_rq_runtime() -> throttle_cfs_rq()
  to happen immediately, because task @p is still executing in kernel
  mode (CONFIG_PREEMPT_VOLUNTARY).
4 Some time later, cfs_rq_a is unthrottled, and tg_unthrottle_up()
  notices that cfs_rq_c has a negative runtime_remaining.
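
The four steps can be replayed in a toy model. This is a hypothetical
sketch, not kernel code. struct toy_cfs_rq, toy_account_runtime(),
toy_unthrottle_warns() and toy_replay_steps() are made up, runtime is in
arbitrary ns, and the deferred throttle is reduced to a need_resched flag
standing in for resched_curr().

```c
#include <stdbool.h>

/* Hypothetical toy model; field names mirror the kernel's, behavior is simplified. */
struct toy_cfs_rq {
	bool runtime_enabled;
	bool throttled;
	bool need_resched;       /* reschedule requested but not yet acted on */
	long runtime_remaining;  /* ns */
};

/* Charge runtime; like resched_curr(), this only *requests* a reschedule. */
static void toy_account_runtime(struct toy_cfs_rq *cfs_rq, long delta_exec)
{
	if (!cfs_rq->runtime_enabled)
		return;
	cfs_rq->runtime_remaining -= delta_exec;
	if (cfs_rq->runtime_remaining <= 0)
		cfs_rq->need_resched = true;  /* throttling is deferred to schedule() */
}

/* The condition tested by the debug WARN_ON_ONCE() in tg_unthrottle_up(). */
static bool toy_unthrottle_warns(const struct toy_cfs_rq *cfs_rq)
{
	return cfs_rq->runtime_enabled && cfs_rq->runtime_remaining <= 0;
}

/* Replays steps 1-4; returns true if C reaches the WARN condition unthrottled. */
static bool toy_replay_steps(void)
{
	/* Step 1: both cfs_rqs unthrottled with positive runtime. */
	struct toy_cfs_rq a = { .runtime_enabled = true, .runtime_remaining = 5000 };
	struct toy_cfs_rq c = { .runtime_enabled = true, .runtime_remaining = 5000 };

	/* Step 2: A is throttled; C is in a throttled hierarchy but not throttled. */
	a.throttled = true;

	/* Step 3: @p runs in kernel mode past C's quota; resched is only requested. */
	toy_account_runtime(&c, 8000);

	/* Step 4: A is unthrottled before @p reaches schedule(). */
	a.throttled = false;

	return !a.throttled && !c.throttled && c.need_resched && toy_unthrottle_warns(&c);
}
```

In this model toy_replay_steps() returns true. C still has a pending
reschedule and a negative runtime_remaining when A is unthrottled, but it
has not been throttled itself.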

In this situation, though, check_enqueue_throttle() does nothing because
cfs_rq_c->curr is set, so the throttle does not happen immediately and is
therefore not triggered on the unthrottle path.
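
The early return in check_enqueue_throttle() can be sketched as a
predicate. This is a simplified model based on my reading of the
function, not the function itself. The real code also checks
cfs_bandwidth_used() and calls account_cfs_rq_runtime() before testing
runtime_remaining, and this sketch omits both. struct toy_cfs_rq and
toy_enqueue_would_throttle() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cfs_rq for this sketch only. */
struct toy_cfs_rq {
	bool runtime_enabled;
	bool throttled;
	const void *curr;        /* non-NULL while an entity is running on this cfs_rq */
	long runtime_remaining;  /* ns */
};

/*
 * Returns true if the enqueue path would throttle this cfs_rq. An active
 * group (curr set) is left to the update_curr()/put path, so nothing is
 * throttled here even when runtime_remaining <= 0.
 */
static bool toy_enqueue_would_throttle(const struct toy_cfs_rq *cfs_rq)
{
	if (!cfs_rq->runtime_enabled || cfs_rq->curr != NULL)
		return false;
	if (cfs_rq->throttled)  /* already throttled, nothing to do */
		return false;
	return cfs_rq->runtime_remaining <= 0;
}
```

With curr pointing at the running task @p, the sketch returns false even
though runtime_remaining is negative. Only once curr is cleared would
this path throttle.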

Hao Jia,

Do I understand correctly that you can only hit the newly added debug
warning in tg_unthrottle_up():
WARN_ON_ONCE(cfs_rq->runtime_enabled && cfs_rq->runtime_remaining <= 0);
but not a throttle triggered on the unthrottle path?

BTW, I think your change has the advantage of being straightforward and
easy to reason about. My concern is that it's inefficient to enqueue
tasks to a cfs_rq that has no runtime left, though I'm not sure how big
a deal that is.

Thanks.
