public inbox for linux-kernel@vger.kernel.org
From: "Aaron Lu" <ziqianlu@bytedance.com>
To: "Zicheng Qu" <quzicheng@huawei.com>
Cc: <kprateek.nayak@amd.com>, <bsegall@google.com>,
	 <dhaval@linux.vnet.ibm.com>, <dietmar.eggemann@arm.com>,
	 <juri.lelli@redhat.com>, <linux-kernel@vger.kernel.org>,
	 <mgorman@suse.de>, <mingo@redhat.com>, <peterz@infradead.org>,
	 <rostedt@goodmis.org>, <tanghui20@huawei.com>,
	 <vatsa@linux.vnet.ibm.com>, <vincent.guittot@linaro.org>,
	 <vschneid@redhat.com>, <zhangqiao22@huawei.com>
Subject: Re: [PATCH] sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups
Date: Wed, 21 Jan 2026 11:49:18 +0800
Message-ID: <20260121034918.GA1303836@bytedance.com>
In-Reply-To: <20260120032549.186733-1-quzicheng@huawei.com>

On Tue, Jan 20, 2026 at 03:25:49AM +0000, Zicheng Qu wrote:
> Consider the following sequence on a CPU configured with nohz_full:
> 
> 1) A task P runs in cgroup A, and cgroup A becomes throttled due to CFS
>    bandwidth control. The gse (cgroup A) where the task P attached is
>    dequeued and the CPU switches to idle.
> 
> 2) Before cgroup A is unthrottled, task P is migrated from cgroup A to
>    another cgroup B (not throttled).
> 
>    During sched_move_task(), the task P is observed as queued but not
>    running, and therefore no resched_curr() is triggered.
> 
> 3) Since the CPU is nohz_full, it remains in do_idle() waiting for an
>    explicit scheduling event, i.e., resched_curr().
> 
> 4) Later, cgroup A is unthrottled. However, the task P has already been
>    migrated out of cgroup A, so unthrottle_cfs_rq() may observe
>    load_weight == 0 and return early without resched_curr() called.

I suppose this is only possible when the unthrottled cfs_rq has been
fully decayed, i.e. !cfs_rq->on_list is true? Only in that case will it
skip the resched_curr() at the bottom of unthrottle_cfs_rq() in the
scenario you have described.

Looking at this logic, I find the early return on
(!cfs_rq->load.weight) && (!cfs_rq->on_list) strange, because the
resched at the bottom:

	/* Determine whether we need to wake up potentially idle CPU: */
	if (rq->curr == rq->idle && rq->cfs.nr_queued)
		resched_curr(rq);

should not depend on whether cfs_rq is fully decayed or not...

I think it should be something like this:
- complete the branch if no task is enqueued but the cfs_rq is still
  on_list;
- only call resched_curr() if a task actually gets enqueued.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e71302282671c..e09da54a5d117 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6009,9 +6009,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	/* update hierarchical throttle state */
 	walk_tg_tree_from(cfs_rq->tg, tg_nop, tg_unthrottle_up, (void *)rq);
 
-	if (!cfs_rq->load.weight) {
-		if (!cfs_rq->on_list)
-			return;
+	if (!cfs_rq->load.weight && cfs_rq->on_list) {
 		/*
 		 * Nothing to run but something to decay (on_list)?
 		 * Complete the branch.
@@ -6025,7 +6023,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	assert_list_leaf_cfs_rq(rq);
 
 	/* Determine whether we need to wake up potentially idle CPU: */
-	if (rq->curr == rq->idle && rq->cfs.nr_queued)
+	if (rq->curr == rq->idle && cfs_rq->nr_queued)
 		resched_curr(rq);
 }
 

Thoughts?

> At this point, the task P is runnable in cgroup B (not throttled), but
> the CPU remains in do_idle() with no pending reschedule point. The
> system stays in this state until an unrelated event (e.g. a new task
> wakeup or any cases) that can trigger a resched_curr() breaks the
> nohz_full idle state, and then the task P finally gets scheduled.
> 
> The root cause is that sched_move_task() may classify the task as only
> queued, not running, and therefore fails to trigger a resched_curr(),
> while the later unthrottling path no longer has visibility of the
> migrated task.
> 
> Preserve the existing behavior for running tasks by issuing
> resched_curr(), and explicitly invoke check_preempt_curr() for tasks
> that were queued at the time of migration. This ensures that runnable
> tasks are reconsidered for scheduling even when nohz_full suppresses
> periodic ticks.
> 
> Fixes: 29f59db3a74b ("sched: group-scheduler core")
> Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
> Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>

I haven't been able to reproduce this, but the change looks reasonable
to me, so:

Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
