From: K Prateek Nayak <kprateek.nayak@amd.com>
To: Benjamin Segall <bsegall@google.com>, Aaron Lu <ziqianlu@bytedance.com>
Cc: Valentin Schneider <vschneid@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Chengming Zhou <chengming.zhou@linux.dev>,
Josh Don <joshdon@google.com>, Ingo Molnar <mingo@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Xi Wang <xii@google.com>,
linux-kernel@vger.kernel.org, Juri Lelli <juri.lelli@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mel Gorman <mgorman@suse.de>,
Chuyi Zhou <zhouchuyi@bytedance.com>,
Jan Kiszka <jan.kiszka@siemens.com>,
Florian Bezdeka <florian.bezdeka@siemens.com>
Subject: Re: [PATCH v2 0/5] Defer throttle when task exits to user
Date: Fri, 4 Jul 2025 10:04:13 +0530 [thread overview]
Message-ID: <dc54a6ab-2529-4def-ae7d-6a217e3bc1bc@amd.com> (raw)
In-Reply-To: <xm26qzyyb6cn.fsf@google.com>
Hello Ben,
On 7/3/2025 3:30 AM, Benjamin Segall wrote:
> Aaron Lu <ziqianlu@bytedance.com> writes:
>
>> For pelt clock, I chose to keep the current behavior to freeze it on
>> cfs_rq's throttle time. The assumption is that tasks running in kernel
>> mode should not last too long, freezing the cfs_rq's pelt clock can keep
>> its load and its corresponding sched_entity's weight. Hopefully, this can
>> result in a stable situation for the remaining running tasks to quickly
>> finish their jobs in kernel mode.
>
> I suppose the way that this would go wrong would be CPU 1 using up all
> of the quota, and then a task waking up on CPU 2 and trying to run in
> the kernel for a while. I suspect pelt time needs to also keep running
> until all the tasks are asleep (and that's what we have been running at
> google with the version based on separate accounting, so we haven't
> accidentally done a large scale test of letting it pause).
Thinking out loud ...
One thing this can possibly do is create a lot of:
throttled -> partially unthrottled -> throttled
transitions when tasks wakeup on throttled hierarchy, run for a while,
and then go back to sleep. If the PELT clocks aren't frozen, this
either means:
1. Do a full walk_tg_tree_from() placing all the leaf cfs_rq that have
any load associated back onto the list and allow PELT to progress only
to then remove them again once tasks go back to sleep. A great many of
these transitions are possible theoretically which is not ideal.
2. Propagate the delta time where PELT was not frozen during unthrottle
and if it isn't 0, do an update_load_avg() to sync PELT. This will
increase the overhead of the tg_tree callback which isn't ideal
either. It can also complicate the enqueue path since the PELT of
the the cfs_rq hierarchy being enqueued may need correction before
the task can be enqueued.
I know Josh hates both approaches since tg_tree_walks are already very
expensive in your use cases and it has to be done in a non-preemptible
context holding the rq_lock but which do you think is the lesser of two
evils? Or is there a better solution that I have completely missed?
>
> Otherwise it does look ok, so long as we're ok with increasing distribute
> time again.
--
Thanks and Regards,
Prateek
next prev parent reply other threads:[~2025-07-04 4:34 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-18 8:19 [PATCH v2 0/5] Defer throttle when task exits to user Aaron Lu
2025-06-18 8:19 ` [PATCH v2 1/5] sched/fair: Add related data structure for task based throttle Aaron Lu
2025-06-18 8:19 ` [PATCH v2 2/5] sched/fair: Implement throttle task work and related helpers Aaron Lu
2025-06-18 9:03 ` Chengming Zhou
2025-06-18 8:19 ` [PATCH v2 3/5] sched/fair: Switch to task based throttle model Aaron Lu
2025-06-18 9:55 ` Chengming Zhou
2025-06-18 11:19 ` Aaron Lu
2025-06-19 12:02 ` Chengming Zhou
2025-06-18 8:19 ` [PATCH v2 4/5] sched/fair: Task based throttle time accounting Aaron Lu
2025-06-18 8:19 ` [PATCH v2 5/5] sched/fair: Get rid of throttled_lb_pair() Aaron Lu
2025-07-01 8:31 ` [PATCH v2 0/5] Defer throttle when task exits to user Aaron Lu
2025-07-03 7:37 ` Peter Zijlstra
2025-07-03 11:51 ` Aaron Lu
2025-07-02 4:25 ` K Prateek Nayak
2025-07-02 8:51 ` Aaron Lu
2025-07-02 22:00 ` Benjamin Segall
2025-07-03 6:34 ` Aaron Lu
2025-07-04 4:34 ` K Prateek Nayak [this message]
2025-07-04 7:54 ` Aaron Lu
2025-07-04 8:48 ` K Prateek Nayak
2025-07-04 9:47 ` Aaron Lu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=dc54a6ab-2529-4def-ae7d-6a217e3bc1bc@amd.com \
--to=kprateek.nayak@amd.com \
--cc=bsegall@google.com \
--cc=chengming.zhou@linux.dev \
--cc=dietmar.eggemann@arm.com \
--cc=florian.bezdeka@siemens.com \
--cc=jan.kiszka@siemens.com \
--cc=joshdon@google.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=xii@google.com \
--cc=zhouchuyi@bytedance.com \
--cc=ziqianlu@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox