All of lore.kernel.org
 help / color / mirror / Atom feed
From: Aaron Lu <ziqianlu@bytedance.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Valentin Schneider <vschneid@redhat.com>,
	Ben Segall <bsegall@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Josh Don <joshdon@google.com>, Ingo Molnar <mingo@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-kernel@vger.kernel.org, Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mel Gorman <mgorman@suse.de>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	Chuyi Zhou <zhouchuyi@bytedance.com>
Subject: Re: [RFC PATCH 3/7] sched/fair: Handle unthrottle path for task based throttle
Date: Mon, 17 Mar 2025 13:48:47 +0800	[thread overview]
Message-ID: <20250317054847.GA3107573@bytedance> (raw)
In-Reply-To: <dd749cb4-fcf7-4007-a68e-3ca405925e9d@amd.com>

On Fri, Mar 14, 2025 at 11:22:20PM +0530, K Prateek Nayak wrote:
> Hello Aaron,
> 
> On 3/14/2025 4:13 PM, Aaron Lu wrote:
> > On Fri, Mar 14, 2025 at 09:23:47AM +0530, K Prateek Nayak wrote:
> > > Hello Aaron,
> > > 
> > > On 3/13/2025 12:51 PM, Aaron Lu wrote:
> > > 
> > > [..snip..]
> > > 
> > > > ---
> > > >    kernel/sched/fair.c | 132 +++++++++++++++-----------------------------
> > > >    1 file changed, 45 insertions(+), 87 deletions(-)
> > > > 
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index ab403ff7d53c8..4a95fe3785e43 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -5366,18 +5366,17 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct
> > > > sched_entity *se, int flags)
> > > > 
> > > >    	if (cfs_rq->nr_queued == 1) {
> > > >    		check_enqueue_throttle(cfs_rq);
> > > > -		if (!throttled_hierarchy(cfs_rq)) {
> > > > -			list_add_leaf_cfs_rq(cfs_rq);
> > > > -		} else {
> > > > +		list_add_leaf_cfs_rq(cfs_rq);
> > > >    #ifdef CONFIG_CFS_BANDWIDTH
> > > > +		if (throttled_hierarchy(cfs_rq)) {
> > > >    			struct rq *rq = rq_of(cfs_rq);
> > > > 
> > > >    			if (cfs_rq_throttled(cfs_rq) && !cfs_rq->throttled_clock)
> > > >    				cfs_rq->throttled_clock = rq_clock(rq);
> > > >    			if (!cfs_rq->throttled_clock_self)
> > > >    				cfs_rq->throttled_clock_self = rq_clock(rq);
> > > 
> > > These bits probabaly need revisiting. From what I understand, these
> > > stats were maintained to know when a task was woken up on a
> > > throttled hierarchy which was not connected to the parent essentially
> > > tracking the amount of time runnable tasks were waiting on the
> > > cfs_rq before an unthrottle event allowed them to be picked.
> > 
> > Do you mean these throttled_clock stats?
> > 
> > I think they are here because we do not record the throttled_clock for
> > empty cfs_rqs and once the cfs_rq has task enqueued, it needs to record
> > its throttled_clock. This is done in commit 79462e8c879a("sched: don't
> > account throttle time for empty groups") by Josh. I don't think per-task
> > throttle change this.
> > 
> > With this said, I think there is a gap in per-task throttle, i.e. when
> > all tasks are dequeued from this throttled cfs_rq, we should record its
> > throttled_time and clear its throttled_clock.
> 
> Yes but then what it'll track is the amount of time task were running
> when the cfs_rq was on a throttled hierarchy. Is that what we want to
> track or something else.

Right, my last comment was not correct.

Basically, my current approach tried to mimic the existing accounting,
i.e. when there is task enqueued in a throttled cfs_rq, start recording
this cfs_rq's throttled_clock. It kind of over-accounts the throttled
time for cfs_rq with this per-task throttle model because some task can
still be running in kernel mode while cfs_rq is throttled.

> The commit log for 677ea015f231 ("sched: add throttled time stat for
> throttled children") says the following for "throttled_clock_self_time":
> 
>     We currently export the total throttled time for cgroups that are given
>     a bandwidth limit. This patch extends this accounting to also account
>     the total time that each children cgroup has been throttled.
> 
>     This is useful to understand the degree to which children have been
>     affected by the throttling control. Children which are not runnable
>     during the entire throttled period, for example, will not show any
>     self-throttling time during this period.
> 
> but with per-task model, it is probably the amount of time that
> "throttled_limbo_list" has a task on it since they are runnable
> but are in-fact waiting for an unthrottle.
>
> If a task enqueues itself on a throttled hierarchy and then blocks
> again before exiting to the userspace, it should not count in
> "throttled_clock_self_time" since the task was runnable the whole
> time despite the hierarchy being frozen.

I think there is a mismatch between per-task throttle and per-cfs_rq
stats, it's hard to make the accounting perfect. Assume a throttled
cfs_rq has 4 tasks, with 2 tasks blocked on limbo_list and 2 tasks still
running in kernel mode. Should we treat this time as throttled or not
for this cfs_rq?

This is similar to the pelt clock freeze problem. For the above example,
should we freeze the cfs_rq's pelt clock or let it continue when this
cfs_rq is throttled with some task blocked on limbo_list and some task
still running in kernel mode?

My understanding is, neither approach is perfect, so I just chose the
simpler one for now. Please correct me if my understaning is wrong.

Thanks,
Aaron

  reply	other threads:[~2025-03-17  5:49 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-17 10:56 [RFC PATCH 0/7] Defer throttle when task exits to user Aaron Lu
2025-03-13  7:21 ` [RFC PATCH 1/7] sched/fair: Add related data structure for task based throttle Aaron Lu
2025-03-17 10:28   ` Valentin Schneider
2025-03-17 11:02     ` Aaron Lu
2025-03-13  7:21 ` [RFC PATCH 2/7] sched/fair: Handle throttle path " Aaron Lu
2025-03-13 18:14   ` K Prateek Nayak
2025-03-14  8:48     ` Aaron Lu
2025-03-14  9:00       ` K Prateek Nayak
2025-03-14  3:28   ` K Prateek Nayak
2025-03-14  8:57     ` Aaron Lu
2025-03-14  9:12       ` K Prateek Nayak
2025-03-14 15:10         ` Aaron Lu
2025-03-14  8:39   ` Chengming Zhou
2025-03-14  8:49     ` K Prateek Nayak
2025-03-14  9:42     ` Aaron Lu
2025-03-14 10:26       ` K Prateek Nayak
2025-03-14 11:47         ` Aaron Lu
2025-03-14 15:58           ` Chengming Zhou
2025-03-14 18:04           ` K Prateek Nayak
2025-03-14 11:07       ` Chengming Zhou
2025-03-31  6:42         ` Aaron Lu
2025-03-31  9:14           ` Chengming Zhou
2025-03-16  3:25   ` Josh Don
2025-03-17  2:54     ` Chengming Zhou
2025-03-20  6:59       ` K Prateek Nayak
2025-03-20  8:39         ` Chengming Zhou
2025-03-20 18:40           ` Xi Wang
2025-03-24  8:58             ` Aaron Lu
2025-03-25 10:02               ` Aaron Lu
2025-03-28  0:11                 ` Xi Wang
2025-03-28  3:11                   ` Aaron Lu
2025-03-28 22:47         ` Benjamin Segall
2025-03-19 13:43     ` Aaron Lu
2025-03-20  1:06       ` Josh Don
2025-03-20  6:53     ` K Prateek Nayak
2025-03-13  7:21 ` [RFC PATCH 3/7] sched/fair: Handle unthrottle " Aaron Lu
2025-03-14  3:53   ` K Prateek Nayak
2025-03-14  4:06     ` K Prateek Nayak
2025-03-14 10:43     ` Aaron Lu
2025-03-14 17:52       ` K Prateek Nayak
2025-03-17  5:48         ` Aaron Lu [this message]
2025-04-02  9:25         ` Aaron Lu
2025-04-02 17:24           ` K Prateek Nayak
2025-03-13  7:21 ` [RFC PATCH 4/7] sched/fair: Take care of migrated task " Aaron Lu
2025-03-14  4:03   ` K Prateek Nayak
2025-03-14  9:49     ` [External] " Aaron Lu
2025-03-13  7:21 ` [RFC PATCH 5/7] sched/fair: Take care of group/affinity/sched_class change for throttled task Aaron Lu
2025-03-14  4:51   ` K Prateek Nayak
2025-03-14 11:40     ` [External] " Aaron Lu
2025-03-13  7:22 ` [RFC PATCH 6/7] sched/fair: fix tasks_rcu with task based throttle Aaron Lu
2025-03-14  4:14   ` K Prateek Nayak
2025-03-14 11:37     ` [External] " Aaron Lu
2025-03-31  6:19     ` Aaron Lu
2025-04-01  3:17       ` K Prateek Nayak
2025-04-01  8:48         ` Aaron Lu
2025-03-13  7:22 ` [RFC PATCH 7/7] sched/fair: Make sure cfs_rq has enough runtime_remaining on unthrottle path Aaron Lu
2025-03-14  4:18   ` K Prateek Nayak
2025-03-14 11:39     ` [External] " Aaron Lu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250317054847.GA3107573@bytedance \
    --to=ziqianlu@bytedance.com \
    --cc=bsegall@google.com \
    --cc=chengming.zhou@linux.dev \
    --cc=dietmar.eggemann@arm.com \
    --cc=joshdon@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=zhouchuyi@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.