From: Aaron Lu <ziqianlu@bytedance.com>
To: Benjamin Segall <bsegall@google.com>
Cc: "Valentin Schneider" <vschneid@redhat.com>,
"K Prateek Nayak" <kprateek.nayak@amd.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Chengming Zhou" <chengming.zhou@linux.dev>,
"Josh Don" <joshdon@google.com>, "Ingo Molnar" <mingo@redhat.com>,
"Vincent Guittot" <vincent.guittot@linaro.org>,
"Xi Wang" <xii@google.com>,
linux-kernel@vger.kernel.org,
"Juri Lelli" <juri.lelli@redhat.com>,
"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Mel Gorman" <mgorman@suse.de>,
"Chuyi Zhou" <zhouchuyi@bytedance.com>,
"Jan Kiszka" <jan.kiszka@siemens.com>,
"Florian Bezdeka" <florian.bezdeka@siemens.com>,
"Songtang Liu" <liusongtang@bytedance.com>,
"Chen Yu" <yu.c.chen@intel.com>,
"Matteo Martelli" <matteo.martelli@codethink.co.uk>,
"Michal Koutný" <mkoutny@suse.com>,
"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>
Subject: Re: [PATCH v4 3/5] sched/fair: Switch to task based throttle model
Date: Thu, 4 Sep 2025 19:30:45 +0800
Message-ID: <20250904113045.GI42@bytedance>
In-Reply-To: <20250904112610.GH42@bytedance>
On Thu, Sep 04, 2025 at 07:26:10PM +0800, Aaron Lu wrote:
> On Wed, Sep 03, 2025 at 01:55:36PM -0700, Benjamin Segall wrote:
> > Aaron Lu <ziqianlu@bytedance.com> writes:
> >
> > > +static bool enqueue_throttled_task(struct task_struct *p)
> > > +{
> > > +	struct cfs_rq *cfs_rq = cfs_rq_of(&p->se);
> > > +
> > > +	/* @p should have gone through dequeue_throttled_task() first */
> > > +	WARN_ON_ONCE(!list_empty(&p->throttle_node));
> > > +
> > > +	/*
> > > +	 * If the throttled task @p is enqueued to a throttled cfs_rq,
> > > +	 * take the fast path by directly putting the task on the
> > > +	 * target cfs_rq's limbo list.
> > > +	 *
> > > +	 * Do not do that when @p is current because the following race can
> > > +	 * cause @p's group_node to be incorectly re-insterted in its rq's
> > > +	 * cfs_tasks list, despite being throttled:
> > > +	 *
> > > +	 *        cpuX                       cpuY
> > > +	 *   p ret2user
> > > +	 *  throttle_cfs_rq_work()  sched_move_task(p)
> > > +	 *  LOCK task_rq_lock
> > > +	 *  dequeue_task_fair(p)
> > > +	 *  UNLOCK task_rq_lock
> > > +	 *                          LOCK task_rq_lock
> > > +	 *                          task_current_donor(p) == true
> > > +	 *                          task_on_rq_queued(p) == true
> > > +	 *                          dequeue_task(p)
> > > +	 *                          put_prev_task(p)
> > > +	 *                          sched_change_group()
> > > +	 *                          enqueue_task(p) -> p's new cfs_rq
> > > +	 *                                             is throttled, go
> > > +	 *                                             fast path and skip
> > > +	 *                                             actual enqueue
> > > +	 *                          set_next_task(p)
> > > +	 *                    list_move(&se->group_node, &rq->cfs_tasks); // bug
> > > +	 *  schedule()
> > > +	 *
> > > +	 * In the above race case, @p current cfs_rq is in the same rq as
> > > +	 * its previous cfs_rq because sched_move_task() only moves a task
> > > +	 * to a different group from the same rq, so we can use its current
> > > +	 * cfs_rq to derive rq and test if the task is current.
> > > +	 */
> > > +	if (throttled_hierarchy(cfs_rq) &&
> > > +	    !task_current_donor(rq_of(cfs_rq), p)) {
> > > +		list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
> > > +		return true;
> > > +	}
> > > +
> > > +	/* we can't take the fast path, do an actual enqueue*/
> > > +	p->throttled = false;
> > > +	return false;
> > > +}
> > > +
> >
> > Is there a reason that __set_next_task_fair cannot check p->se.on_rq as
> > well as (or instead of) task_on_rq_queued()? All of the _entity parts of
> > set_next/put_prev check se.on_rq for this sort of thing, so that seems
> > fairly standard. And se.on_rq should exactly match if the task is on
> > cfs_tasks since that add/remove is done in account_entity_{en,de}queue.
>
> Makes sense to me.
>
> Only thing that feels a little strange is, a throttled/dequeued task is
> set as next now. Maybe not a big deal. I booted a VM and run some tests,
> didn't notice anything wrong but I could very well miss some cases.
Sorry, I should have added: the above test was done with the following diff:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cb93e74a850e8..7a6782617c0e8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5836,38 +5836,8 @@ static bool enqueue_throttled_task(struct task_struct *p)
 	 * If the throttled task @p is enqueued to a throttled cfs_rq,
 	 * take the fast path by directly putting the task on the
 	 * target cfs_rq's limbo list.
-	 *
-	 * Do not do that when @p is current because the following race can
-	 * cause @p's group_node to be incorectly re-insterted in its rq's
-	 * cfs_tasks list, despite being throttled:
-	 *
-	 *        cpuX                       cpuY
-	 *   p ret2user
-	 *  throttle_cfs_rq_work()  sched_move_task(p)
-	 *  LOCK task_rq_lock
-	 *  dequeue_task_fair(p)
-	 *  UNLOCK task_rq_lock
-	 *                          LOCK task_rq_lock
-	 *                          task_current_donor(p) == true
-	 *                          task_on_rq_queued(p) == true
-	 *                          dequeue_task(p)
-	 *                          put_prev_task(p)
-	 *                          sched_change_group()
-	 *                          enqueue_task(p) -> p's new cfs_rq
-	 *                                             is throttled, go
-	 *                                             fast path and skip
-	 *                                             actual enqueue
-	 *                          set_next_task(p)
-	 *                    list_move(&se->group_node, &rq->cfs_tasks); // bug
-	 *  schedule()
-	 *
-	 * In the above race case, @p current cfs_rq is in the same rq as
-	 * its previous cfs_rq because sched_move_task() only moves a task
-	 * to a different group from the same rq, so we can use its current
-	 * cfs_rq to derive rq and test if the task is current.
 	 */
-	if (throttled_hierarchy(cfs_rq) &&
-	    !task_current_donor(rq_of(cfs_rq), p)) {
+	if (throttled_hierarchy(cfs_rq)) {
 		list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
 		return true;
 	}
@@ -13256,7 +13226,7 @@ static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool firs
 {
 	struct sched_entity *se = &p->se;
 
-	if (task_on_rq_queued(p)) {
+	if (se->on_rq) {
 		/*
 		 * Move the next running task to the front of the list, so our
 		 * cfs_tasks list becomes MRU one.
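
For anyone following along, a toy user-space model of the race may help show why checking se->on_rq in __set_next_task_fair() lets enqueue_throttled_task() take the fast path unconditionally. This is only a sketch under simplifying assumptions: struct toy_task, the toy_* helpers and enum toy_check are made-up names, and the booleans merely stand in for task_on_rq_queued(p), p->se.on_rq and the presence of se->group_node on rq->cfs_tasks. It is not kernel code and ignores locking entirely.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the pieces of task state involved in the race. */
struct toy_task {
	bool task_queued;	/* models task_on_rq_queued(p) */
	bool se_on_rq;		/* models p->se.on_rq */
	bool on_cfs_tasks;	/* models se->group_node being on rq->cfs_tasks */
};

/* Which condition the modelled __set_next_task_fair() tests. */
enum toy_check { CHECK_TASK_QUEUED, CHECK_SE_ON_RQ };

/*
 * cpuX: the throttle work dequeues the entity. The task still looks
 * queued to the core scheduler, but the entity is off the cfs_rq and
 * off cfs_tasks.
 */
static void toy_throttle_dequeue(struct toy_task *p)
{
	p->se_on_rq = false;
	p->on_cfs_tasks = false;
}

/*
 * cpuY: enqueue onto a throttled cfs_rq takes the fast path, so no
 * actual enqueue happens and se_on_rq stays false.
 */
static void toy_enqueue_fast_path(struct toy_task *p)
{
	(void)p;	/* task only goes on the limbo list, not modelled here */
}

/* cpuY: models the list_move() in __set_next_task_fair(). */
static void toy_set_next_task(struct toy_task *p, enum toy_check check)
{
	bool queued = (check == CHECK_TASK_QUEUED) ? p->task_queued
						   : p->se_on_rq;

	if (queued)
		p->on_cfs_tasks = true;
}

/* Replays the race and reports whether the throttled task ends up back on cfs_tasks. */
static bool toy_race_readds_to_cfs_tasks(enum toy_check check)
{
	struct toy_task p = {
		.task_queued = true,
		.se_on_rq = true,
		.on_cfs_tasks = true,
	};

	toy_throttle_dequeue(&p);
	toy_enqueue_fast_path(&p);
	toy_set_next_task(&p, check);

	return p.on_cfs_tasks;
}
```

In this model, toy_race_readds_to_cfs_tasks(CHECK_TASK_QUEUED) returns true, which corresponds to the "// bug" line in the race diagram, while toy_race_readds_to_cfs_tasks(CHECK_SE_ON_RQ) returns false.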