public inbox for linux-kernel@vger.kernel.org
* [PATCH] sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups
From: Zicheng Qu @ 2026-01-19 13:30 UTC
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, vatsa, dhaval, linux-kernel
  Cc: tanghui20, zhangqiao22, quzicheng

Consider the following sequence on a CPU configured with nohz_full:

1) A task P runs in cgroup A, and cgroup A becomes throttled due to CFS
   bandwidth control. The group scheduling entity (gse) of cgroup A, to
   which task P is attached, is dequeued and the CPU switches to idle.

2) Before cgroup A is unthrottled, task P is migrated from cgroup A to
   another cgroup B (not throttled).

   During sched_move_task(), the task P is observed as queued but not
   running, and therefore no resched_curr() is triggered.

3) Since the CPU is nohz_full, it remains in do_idle() waiting for an
   explicit scheduling event, i.e., resched_curr().

4) Later, cgroup A is unthrottled. However, the task P has already been
   migrated out of cgroup A, so unthrottle_cfs_rq() may observe
   load_weight == 0 and return early without calling resched_curr().

At this point, the task P is runnable in cgroup B (not throttled), but
the CPU remains in do_idle() with no pending reschedule point. The
system stays in this state until an unrelated event (e.g. a new task
wakeup) triggers a resched_curr() and breaks the nohz_full idle state;
only then does the task P finally get scheduled.

The root cause is that sched_move_task() may classify the task as only
queued, not running, and therefore fails to trigger a resched_curr(),
while the later unthrottling path no longer has visibility of the
migrated task.

Preserve the existing behavior for running tasks by issuing
resched_curr(), and explicitly invoke wakeup_preempt() for tasks
that were queued at the time of migration. This ensures that runnable
tasks are reconsidered for scheduling even when nohz_full suppresses
periodic ticks.
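
For illustration only, the decision described above can be modelled in a
small userspace sketch. Everything prefixed with toy_ is hypothetical and
is not kernel code; in particular, toy_wakeup_preempt() only assumes that
a queued task on an idle CPU leads to a reschedule request, which is a
simplification of what wakeup_preempt() does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these types are illustrative, not the kernel's. */
struct toy_cpu {
	bool idle;		/* CPU is sitting in its idle loop */
	bool need_resched;	/* a reschedule has been requested */
};

struct toy_task {
	bool queued;		/* task is on the runqueue */
	bool running;		/* task is currently executing */
};

/* Stand-in for resched_curr(): ask the CPU to leave idle and reschedule. */
static void toy_resched_curr(struct toy_cpu *cpu)
{
	cpu->need_resched = true;
}

/* Assumed simplification: a queued task always preempts an idle CPU. */
static void toy_wakeup_preempt(struct toy_cpu *cpu, const struct toy_task *p)
{
	if (cpu->idle && p->queued)
		toy_resched_curr(cpu);
}

/* Decision taken after the group change; 'patched' selects the new branch. */
static void toy_move_task(struct toy_cpu *cpu, const struct toy_task *p,
			  bool patched)
{
	if (p->running)
		toy_resched_curr(cpu);
	else if (patched && p->queued)
		toy_wakeup_preempt(cpu, p);
}

/* Returns true if the idle CPU is left without any reschedule request. */
static bool toy_cpu_stalls(bool patched)
{
	struct toy_cpu cpu = { .idle = true, .need_resched = false };
	struct toy_task p = { .queued = true, .running = false };

	toy_move_task(&cpu, &p, patched);
	return !cpu.need_resched;
}

/* Exercises both variants of the model. */
static void toy_selftest(void)
{
	assert(toy_cpu_stalls(false));	/* old logic: nothing wakes the CPU */
	assert(!toy_cpu_stalls(true));	/* new logic: resched is requested */
}
```

Calling toy_selftest() from a main() checks that, in this model, the
queued-but-not-running case only produces a reschedule request when the
new branch is taken.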

Fixes: 29f59db3a74b ("sched: group-scheduler core")
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
---
 kernel/sched/core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 045f83ad261e..667070362bd3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9110,6 +9110,7 @@ static void sched_change_group(struct task_struct *tsk)
 void sched_move_task(struct task_struct *tsk, bool for_autogroup)
 {
 	unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
+	bool queued = false;
 	bool resched = false;
 	struct rq *rq;
 
@@ -9120,12 +9121,16 @@ void sched_move_task(struct task_struct *tsk, bool for_autogroup)
 		sched_change_group(tsk);
 		if (!for_autogroup)
 			scx_cgroup_move_task(tsk);
+		if (scope->queued)
+			queued = true;
 		if (scope->running)
 			resched = true;
 	}
 
 	if (resched)
 		resched_curr(rq);
+	else if (queued)
+		wakeup_preempt(rq, tsk, 0);
 
 	__balance_callbacks(rq, &rq_guard.rf);
 }
-- 
2.34.1



Thread overview: 11+ messages
2026-01-19 13:30 [PATCH] sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups Zicheng Qu
2026-01-20  3:33 ` K Prateek Nayak
2026-01-20  3:25   ` Zicheng Qu
2026-01-21  3:49     ` Aaron Lu
2026-01-21  5:24       ` K Prateek Nayak
2026-01-21  6:34         ` Aaron Lu
2026-01-30  8:34     ` Zicheng Qu
2026-01-30  9:03       ` Zicheng Qu
2026-02-02  7:09         ` Aaron Lu
2026-02-02 12:49       ` Peter Zijlstra
2026-02-03 11:18       ` [tip: sched/core] " tip-bot2 for Zicheng Qu
