From: Aaron Lu <ziqianlu@bytedance.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Valentin Schneider <vschneid@redhat.com>,
Ben Segall <bsegall@google.com>,
Peter Zijlstra <peterz@infradead.org>,
Josh Don <joshdon@google.com>, Ingo Molnar <mingo@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
linux-kernel@vger.kernel.org, Juri Lelli <juri.lelli@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mel Gorman <mgorman@suse.de>,
Chengming Zhou <chengming.zhou@linux.dev>,
Chuyi Zhou <zhouchuyi@bytedance.com>
Subject: Re: [RFC PATCH 2/7] sched/fair: Handle throttle path for task based throttle
Date: Fri, 14 Mar 2025 16:48:54 +0800 [thread overview]
Message-ID: <20250314084854.GA1633113@bytedance> (raw)
In-Reply-To: <c9b1d117-2824-4238-bb8c-6390ec3e931b@amd.com>
On Thu, Mar 13, 2025 at 11:44:49PM +0530, K Prateek Nayak wrote:
> Hello Aaron,
>
> P.S. I've fixed the wrapped lines and have been testing the series. So
> far I haven't run into any issues yet on my machine. Will report back if
> anything surface.
Thanks a lot for taking the time to review and test.
>
> I've few comments inlined below.
>
> On 3/13/2025 12:51 PM, Aaron Lu wrote:
>
> [..snip..]
>
> > +static inline void task_throttle_setup_work(struct task_struct *p)
> > +{
> > + /*
> > + * Kthreads and exiting tasks don't return to userspace, so adding the
> > + * work is pointless
> > + */
> > + if ((p->flags & (PF_EXITING | PF_KTHREAD)))
> > + return;
> > +
> > + if (task_has_throttle_work(p))
> > + return;
> > +
> > + task_work_add(p, &p->sched_throttle_work, TWA_RESUME);
>
> Does it make sense to add a throttle work to a delayed task? It may be
I missed the case that a delayed task can still be on cfs_rq and I agree
there is no need to add throttle work to a delayed task.
> dequeued soon and when it is queued back, the throttle situation might
> have changed but the work is unnecessarily run. Could the throttle work
> be instead added at the point of enqueue for delayed tasks?
Yes. If a delayed task gets re-queued and its cfs_rq is in throttled
hierarchy, it should be added the throttle work.
>
> > +}
> > +
> > static int tg_throttle_down(struct task_group *tg, void *data)
> > {
> > struct rq *rq = data;
> > struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
> > + struct task_struct *p;
> > + struct rb_node *node;
> > +
> > + cfs_rq->throttle_count++;
> > + if (cfs_rq->throttle_count > 1)
> > + return 0;
> >
> > /* group is entering throttled state, stop time */
> > - if (!cfs_rq->throttle_count) {
> > - cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
> > - list_del_leaf_cfs_rq(cfs_rq);
> > + cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
>
> Once cencern here is that the PELT is seemingly frozen despite the
> hierarchy being runnable. I've still not tracked down whether it'll
> cause any problems once unthrottled and all throttled time is negated
> from the pelt clock but is there any concerns here?
I chose to do it this way because:
1 I expect most of the time, if a task has to continue to run after its
cfs_rq gets throttled, the time is relatively small so should not cause
much impact. But I agree there can be times a task runs relatively long;
2 I think the original intent to freeze cfs_rq's pelt clock on throttle
is so that on unthrottle, it can retore its loada(without its load being
decayed etc.). If I chose to not freeze its pelt clock on throttle
because some task is still running in kernel mode, since some of this
cfs_rq's tasks are throttled, its load can become smaller and this can
impact its load on unthrottle.
I think both approach is not perfect, so I chose the simple one for now
:) Not sure if my thinking is correct though.
> Maybe this can be done at dequeue when cfs_rq->nr_queued on a
> throttled_hierarchy() reached 0.
Yes, this looks more consistent, maybe I should change to this approach.
> > + list_del_leaf_cfs_rq(cfs_rq);
> >
> > - SCHED_WARN_ON(cfs_rq->throttled_clock_self);
> > - if (cfs_rq->nr_queued)
> > - cfs_rq->throttled_clock_self = rq_clock(rq);
> > + SCHED_WARN_ON(cfs_rq->throttled_clock_self);
> > + if (cfs_rq->nr_queued)
> > + cfs_rq->throttled_clock_self = rq_clock(rq);
> > +
> > + WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_limbo_list));
> > + /*
> > + * rq_lock is held, current is (obviously) executing this in kernelspace.
> > + *
> > + * All other tasks enqueued on this rq have their saved PC at the
> > + * context switch, so they will go through the kernel before returning
> > + * to userspace. Thus, there are no tasks-in-userspace to handle, just
> > + * install the task_work on all of them.
> > + */
> > + node = rb_first(&cfs_rq->tasks_timeline.rb_root);
> > + while (node) {
> > + struct sched_entity *se = __node_2_se(node);
> > +
> > + if (!entity_is_task(se))
> > + goto next;
> > +
> > + p = task_of(se);
> > + task_throttle_setup_work(p);
> > +next:
> > + node = rb_next(node);
> > + }
> > +
> > + /* curr is not in the timeline tree */
> > + if (cfs_rq->curr && entity_is_task(cfs_rq->curr)) {
>
> I believe we can reach here from pick_next_task_fair() ->
> check_cfs_rq_runtime() -> throttle_cfs_rq() in which case cfs_rq->curr
> will still be set despite the task being blocked since put_prev_entity()
> has not been called yet.
>
> I believe there should be a check for task_on_rq_queued() here for the
> current task.
Ah right, I'll see how to fix this.
Thanks,
Aaron
> > + p = task_of(cfs_rq->curr);
> > + task_throttle_setup_work(p);
> > }
> > - cfs_rq->throttle_count++;
> >
> > return 0;
> > }
> >
>
> [..snip..]
>
> --
> Thanks and Regards,
> Prateek
>
next prev parent reply other threads:[~2025-03-14 8:49 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-17 10:56 [RFC PATCH 0/7] Defer throttle when task exits to user Aaron Lu
2025-03-13 7:21 ` [RFC PATCH 1/7] sched/fair: Add related data structure for task based throttle Aaron Lu
2025-03-17 10:28 ` Valentin Schneider
2025-03-17 11:02 ` Aaron Lu
2025-03-13 7:21 ` [RFC PATCH 2/7] sched/fair: Handle throttle path " Aaron Lu
2025-03-13 18:14 ` K Prateek Nayak
2025-03-14 8:48 ` Aaron Lu [this message]
2025-03-14 9:00 ` K Prateek Nayak
2025-03-14 3:28 ` K Prateek Nayak
2025-03-14 8:57 ` Aaron Lu
2025-03-14 9:12 ` K Prateek Nayak
2025-03-14 15:10 ` Aaron Lu
2025-03-14 8:39 ` Chengming Zhou
2025-03-14 8:49 ` K Prateek Nayak
2025-03-14 9:42 ` Aaron Lu
2025-03-14 10:26 ` K Prateek Nayak
2025-03-14 11:47 ` Aaron Lu
2025-03-14 15:58 ` Chengming Zhou
2025-03-14 18:04 ` K Prateek Nayak
2025-03-14 11:07 ` Chengming Zhou
2025-03-31 6:42 ` Aaron Lu
2025-03-31 9:14 ` Chengming Zhou
2025-03-16 3:25 ` Josh Don
2025-03-17 2:54 ` Chengming Zhou
2025-03-20 6:59 ` K Prateek Nayak
2025-03-20 8:39 ` Chengming Zhou
2025-03-20 18:40 ` Xi Wang
2025-03-24 8:58 ` Aaron Lu
2025-03-25 10:02 ` Aaron Lu
2025-03-28 0:11 ` Xi Wang
2025-03-28 3:11 ` Aaron Lu
2025-03-28 22:47 ` Benjamin Segall
2025-03-19 13:43 ` Aaron Lu
2025-03-20 1:06 ` Josh Don
2025-03-20 6:53 ` K Prateek Nayak
2025-03-13 7:21 ` [RFC PATCH 3/7] sched/fair: Handle unthrottle " Aaron Lu
2025-03-14 3:53 ` K Prateek Nayak
2025-03-14 4:06 ` K Prateek Nayak
2025-03-14 10:43 ` Aaron Lu
2025-03-14 17:52 ` K Prateek Nayak
2025-03-17 5:48 ` Aaron Lu
2025-04-02 9:25 ` Aaron Lu
2025-04-02 17:24 ` K Prateek Nayak
2025-03-13 7:21 ` [RFC PATCH 4/7] sched/fair: Take care of migrated task " Aaron Lu
2025-03-14 4:03 ` K Prateek Nayak
2025-03-14 9:49 ` [External] " Aaron Lu
2025-03-13 7:21 ` [RFC PATCH 5/7] sched/fair: Take care of group/affinity/sched_class change for throttled task Aaron Lu
2025-03-14 4:51 ` K Prateek Nayak
2025-03-14 11:40 ` [External] " Aaron Lu
2025-03-13 7:22 ` [RFC PATCH 6/7] sched/fair: fix tasks_rcu with task based throttle Aaron Lu
2025-03-14 4:14 ` K Prateek Nayak
2025-03-14 11:37 ` [External] " Aaron Lu
2025-03-31 6:19 ` Aaron Lu
2025-04-01 3:17 ` K Prateek Nayak
2025-04-01 8:48 ` Aaron Lu
2025-03-13 7:22 ` [RFC PATCH 7/7] sched/fair: Make sure cfs_rq has enough runtime_remaining on unthrottle path Aaron Lu
2025-03-14 4:18 ` K Prateek Nayak
2025-03-14 11:39 ` [External] " Aaron Lu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250314084854.GA1633113@bytedance \
--to=ziqianlu@bytedance.com \
--cc=bsegall@google.com \
--cc=chengming.zhou@linux.dev \
--cc=dietmar.eggemann@arm.com \
--cc=joshdon@google.com \
--cc=juri.lelli@redhat.com \
--cc=kprateek.nayak@amd.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=zhouchuyi@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.