From: Aaron Lu <ziqianlu@bytedance.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <vschneid@redhat.com>,
Ben Segall <bsegall@google.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Josh Don <joshdon@google.com>, Ingo Molnar <mingo@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Xi Wang <xii@google.com>,
linux-kernel@vger.kernel.org, Juri Lelli <juri.lelli@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mel Gorman <mgorman@suse.de>,
Chengming Zhou <chengming.zhou@linux.dev>,
Chuyi Zhou <zhouchuyi@bytedance.com>,
Jan Kiszka <jan.kiszka@siemens.com>,
Florian Bezdeka <florian.bezdeka@siemens.com>
Subject: Re: [PATCH 2/7] sched/fair: prepare throttle path for task based throttle
Date: Fri, 23 May 2025 17:53:50 +0800 [thread overview]
Message-ID: <20250523095350.GA1215853@bytedance> (raw)
In-Reply-To: <20250522123750.GB672414@bytedance>
On Thu, May 22, 2025 at 08:40:02PM +0800, Aaron Lu wrote:
> On Thu, May 22, 2025 at 01:54:18PM +0200, Peter Zijlstra wrote:
> > On Thu, May 22, 2025 at 07:44:55PM +0800, Aaron Lu wrote:
> > > On Thu, May 22, 2025 at 12:48:43PM +0200, Peter Zijlstra wrote:
> > > > On Tue, May 20, 2025 at 06:41:05PM +0800, Aaron Lu wrote:
> > > >
> > > > > static void throttle_cfs_rq_work(struct callback_head *work)
> > > > > {
> > > > > + struct task_struct *p = container_of(work, struct task_struct, sched_throttle_work);
> > > > > + struct sched_entity *se;
> > > > > + struct cfs_rq *cfs_rq;
> > > > > + struct rq *rq;
> > > > > +
> > > > > + WARN_ON_ONCE(p != current);
> > > > > + p->sched_throttle_work.next = &p->sched_throttle_work;
> > > > > +
> > > > > + /*
> > > > > + * If task is exiting, then there won't be a return to userspace, so we
> > > > > + * don't have to bother with any of this.
> > > > > + */
> > > > > + if ((p->flags & PF_EXITING))
> > > > > + return;
> > > > > +
> > > > > + scoped_guard(task_rq_lock, p) {
> > > > > + se = &p->se;
> > > > > + cfs_rq = cfs_rq_of(se);
> > > > > +
> > > > > + /* Raced, forget */
> > > > > + if (p->sched_class != &fair_sched_class)
> > > > > + return;
> > > > > +
> > > > > + /*
> > > > > + * If not in limbo, then either replenish has happened or this
> > > > > + * task got migrated out of the throttled cfs_rq, move along.
> > > > > + */
> > > > > + if (!cfs_rq->throttle_count)
> > > > > + return;
> > > > > + rq = scope.rq;
> > > > > + update_rq_clock(rq);
> > > > > + WARN_ON_ONCE(!list_empty(&p->throttle_node));
> > > > > + dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
> > > > > + list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
> > > > > + resched_curr(rq);
> > > > > + }
> > > > > +
> > > > > + cond_resched_tasks_rcu_qs();
> > > > > }
> > > >
> > > > What's that cond_resched thing about? The general plan is to make
> > > > cond_resched go away.
> > >
> > > Got it.
> > >
> > > The purpose is to let throttled task schedule and also mark a task rcu
> > > quiescent state. Without this cond_resched_tasks_rcu_qs(), this task
> > > will be scheduled by cond_resched() in task_work_run() and since that is
> > > a preempt schedule, it didn't mark a task rcu quiescent state.
> > >
> > > Any suggestion here? Perhaps a plain schedule()? Thanks.
> >
> > I am confused, this is task_work_run(), that is ran from
> > exit_to_user_mode_loop(), which contains a schedule().
>
I should probably have added that the schedule() call contained in
exit_to_user_mode_loop() is early in that loop, where the to-be-throttled
task doesn't have need_resched bit set yet.
> There is a cond_resched() in task_work_run() loop:
>
> do {
> next = work->next;
> work->func(work);
> work = next;
> cond_resched();
> } while (work);
>
> And when this throttle work returns with need_resched bit set,
> cond_resched() will cause a schedule but that didn't mark a task
> quiescent state...
Another approach I can think of is to add a test of task_is_throttled()
in rcu_tasks_is_holdout(). I remembered when I tried this before, I can
hit the following path:
exit_to_user_mode_loop() -> get_signal() -> throttle_task_work() ->
do_exit() -> exit_signals() -> percpu_rwsem_wait() -> schedule() ->
try_to_block_task() -> dequeue_task_fair().
I would like to avoid this path, because it doesn't feel right for a
throttled task to go through another dequeue again(except for the cases
like task group change, affinity change etc. are special cases that have
to be dealed with though).
It looks to me, a schedule() call(or any other form) that makes sure
this throttled task gets scheduled in its task work is the safest thing
to do.
Thoughts?
Thanks,
Aaron
next prev parent reply other threads:[~2025-05-23 9:54 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-20 10:41 [PATCH 0/7] Defer throttle when task exits to user Aaron Lu
2025-05-20 10:41 ` [PATCH 1/7] sched/fair: Add related data structure for task based throttle Aaron Lu
2025-05-21 8:48 ` Chengming Zhou
2025-05-20 10:41 ` [PATCH 2/7] sched/fair: prepare throttle path " Aaron Lu
2025-05-20 12:02 ` Florian Bezdeka
2025-05-21 6:37 ` Aaron Lu
2025-05-21 11:51 ` Aaron Lu
2025-05-21 9:01 ` Chengming Zhou
2025-05-21 9:21 ` [External] " Aaron Lu
2025-05-22 11:43 ` Chengming Zhou
2025-05-23 8:03 ` Aaron Lu
2025-05-22 10:48 ` Peter Zijlstra
2025-05-22 11:44 ` Aaron Lu
2025-05-22 11:54 ` Peter Zijlstra
2025-05-22 12:40 ` Aaron Lu
2025-05-23 9:53 ` Aaron Lu [this message]
2025-05-23 10:52 ` Peter Zijlstra
2025-05-23 11:17 ` Aaron Lu
2025-05-22 11:07 ` Peter Zijlstra
2025-05-23 7:40 ` Aaron Lu
2025-05-29 11:51 ` Aaron Lu
2025-05-30 5:36 ` K Prateek Nayak
2025-05-30 11:02 ` Aaron Lu
2025-05-23 12:35 ` Peter Zijlstra
2025-05-20 10:41 ` [PATCH 3/7] sched/fair: prepare unthrottle " Aaron Lu
2025-05-20 10:41 ` [PATCH 4/7] sched/fair: Take care of group/affinity/sched_class change for throttled task Aaron Lu
2025-05-22 12:03 ` Peter Zijlstra
2025-05-22 12:49 ` Aaron Lu
2025-05-23 14:59 ` Peter Zijlstra
2025-05-26 11:36 ` Aaron Lu
2025-05-27 6:58 ` Aaron Lu
2025-05-27 11:19 ` K Prateek Nayak
2025-05-27 11:54 ` Aaron Lu
2025-05-27 14:16 ` K Prateek Nayak
2025-05-23 2:43 ` Chengming Zhou
2025-05-23 7:56 ` Aaron Lu
2025-05-23 9:13 ` Chengming Zhou
2025-05-23 9:42 ` Aaron Lu
2025-05-23 9:53 ` Chengming Zhou
2025-05-23 11:59 ` Aaron Lu
2025-05-26 13:14 ` Chengming Zhou
2025-05-20 10:41 ` [PATCH 5/7] sched/fair: switch to task based throttle model Aaron Lu
2025-05-20 10:41 ` [PATCH 6/7] sched/fair: task based throttle time accounting Aaron Lu
2025-05-20 10:41 ` [PATCH 7/7] sched/fair: get rid of throttled_lb_pair() Aaron Lu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250523095350.GA1215853@bytedance \
--to=ziqianlu@bytedance.com \
--cc=bsegall@google.com \
--cc=chengming.zhou@linux.dev \
--cc=dietmar.eggemann@arm.com \
--cc=florian.bezdeka@siemens.com \
--cc=jan.kiszka@siemens.com \
--cc=joshdon@google.com \
--cc=juri.lelli@redhat.com \
--cc=kprateek.nayak@amd.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=xii@google.com \
--cc=zhouchuyi@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).