* [PATCH] sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups
@ 2026-01-19 13:30 Zicheng Qu
  2026-01-20  3:33 ` K Prateek Nayak
  0 siblings, 1 reply; 11+ messages in thread
From: Zicheng Qu @ 2026-01-19 13:30 UTC
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, vatsa, dhaval, linux-kernel
  Cc: tanghui20, zhangqiao22, quzicheng

Consider the following sequence on a CPU configured with nohz_full:

1) A task P runs in cgroup A, and cgroup A becomes throttled due to CFS
   bandwidth control. The gse (cgroup A) to which task P is attached is
   dequeued and the CPU switches to idle.

2) Before cgroup A is unthrottled, task P is migrated from cgroup A to
   another cgroup B (not throttled).

   During sched_move_task(), task P is observed as queued but not
   running, and therefore no resched_curr() is triggered.

3) Since the CPU is nohz_full, it remains in do_idle() waiting for an
   explicit scheduling event, i.e., resched_curr().

4) Later, cgroup A is unthrottled. However, task P has already been
   migrated out of cgroup A, so unthrottle_cfs_rq() may observe
   load_weight == 0 and return early without calling resched_curr().

At this point, task P is runnable in cgroup B (not throttled), but the
CPU remains in do_idle() with no pending reschedule point. The system
stays in this state until an unrelated event (e.g. a new task wakeup)
triggers a resched_curr() and breaks the nohz_full idle state, at which
point task P finally gets scheduled.

The root cause is that sched_move_task() may classify the task as only
queued, not running, and therefore fails to trigger a resched_curr(),
while the later unthrottling path no longer has visibility of the
migrated task.
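
The sequence above can be sketched as a small userspace model. This is
only an illustration under simplifying assumptions, not kernel code:
struct toy_group, struct toy_cpu and every toy_*() helper are
hypothetical names, need_resched stands in for the effect of
resched_curr(), and the nr_queued == 0 check stands in for the
load_weight == 0 early return in unthrottle_cfs_rq().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these types or helpers exist in the kernel. */
struct toy_group {
	bool throttled;
	int nr_queued;		/* tasks currently queued in this group */
};

struct toy_cpu {
	bool need_resched;	/* stands in for the effect of resched_curr() */
	bool idle;		/* nohz_full CPU parked in the idle loop */
};

/* Pre-patch behaviour as described above: only a running task reschedules. */
void toy_move_task_old(struct toy_cpu *cpu, struct toy_group *from,
		       struct toy_group *to, bool task_running)
{
	assert(cpu && from && to && from->nr_queued > 0);
	from->nr_queued--;
	to->nr_queued++;
	if (task_running)
		cpu->need_resched = true;
}

/* Unthrottle path: returns early when the group has nothing left to run. */
void toy_unthrottle(struct toy_cpu *cpu, struct toy_group *grp)
{
	assert(cpu && grp);
	grp->throttled = false;
	if (grp->nr_queued == 0)
		return;		/* models the load_weight == 0 early return */
	cpu->need_resched = true;
}

/* An idle nohz_full CPU leaves idle only if a resched was requested. */
bool toy_cpu_leaves_idle(const struct toy_cpu *cpu)
{
	assert(cpu);
	return cpu->idle && cpu->need_resched;
}

/* Replays steps 1-4 and reports whether task P ends up stranded. */
bool toy_task_stranded(void)
{
	struct toy_group a = { .throttled = true, .nr_queued = 1 };
	struct toy_group b = { .throttled = false, .nr_queued = 0 };
	struct toy_cpu cpu = { .need_resched = false, .idle = true };

	toy_move_task_old(&cpu, &a, &b, false);	/* step 2: queued, not running */
	toy_unthrottle(&cpu, &a);		/* step 4: A is already empty */
	return b.nr_queued == 1 && !toy_cpu_leaves_idle(&cpu);
}
```

In this model toy_task_stranded() returns true: the task sits in the
unthrottled group while nothing asks the idle CPU to reschedule.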

Preserve the existing behavior for running tasks by issuing
resched_curr(), and explicitly invoke wakeup_preempt() for tasks that
were queued but not running at the time of migration. This ensures that
runnable tasks are reconsidered for scheduling even when nohz_full
suppresses periodic ticks.
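
The decision made after the move can be summarised as a small pure
function. This is a minimal sketch for illustration, not the kernel
implementation: enum toy_action and toy_post_move_action() are
hypothetical names, and the model assumes that a running task is always
queued as well.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not kernel API. */
enum toy_action {
	TOY_NONE,		/* task was not on the runqueue: nothing to do */
	TOY_RESCHED_CURR,	/* models resched_curr(rq) */
	TOY_WAKEUP_PREEMPT,	/* models wakeup_preempt(rq, tsk, 0) */
};

enum toy_action toy_post_move_action(bool queued, bool running)
{
	/* Model assumption: a running task is also queued. */
	assert(queued || !running);

	if (running)
		return TOY_RESCHED_CURR;	/* existing behaviour, unchanged */
	if (queued)
		return TOY_WAKEUP_PREEMPT;	/* new: re-evaluate the queued task */
	return TOY_NONE;
}
```

The running case is checked first so that the existing resched_curr()
path takes precedence, mirroring the if / else if ordering in the diff
below.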

Fixes: 29f59db3a74b ("sched: group-scheduler core")
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
---
 kernel/sched/core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 045f83ad261e..667070362bd3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9110,6 +9110,7 @@ static void sched_change_group(struct task_struct *tsk)
 void sched_move_task(struct task_struct *tsk, bool for_autogroup)
 {
 	unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
+	bool queued = false;
 	bool resched = false;
 	struct rq *rq;
 
@@ -9120,12 +9121,16 @@ void sched_move_task(struct task_struct *tsk, bool for_autogroup)
 		sched_change_group(tsk);
 		if (!for_autogroup)
 			scx_cgroup_move_task(tsk);
+		if (scope->queued)
+			queued = true;
 		if (scope->running)
 			resched = true;
 	}
 
 	if (resched)
 		resched_curr(rq);
+	else if (queued)
+		wakeup_preempt(rq, tsk, 0);
 
 	__balance_callbacks(rq, &rq_guard.rf);
 }
-- 
2.34.1


Thread overview: 11+ messages
2026-01-19 13:30 [PATCH] sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups Zicheng Qu
2026-01-20  3:33 ` K Prateek Nayak
2026-01-20  3:25   ` Zicheng Qu
2026-01-21  3:49     ` Aaron Lu
2026-01-21  5:24       ` K Prateek Nayak
2026-01-21  6:34         ` Aaron Lu
2026-01-30  8:34     ` Zicheng Qu
2026-01-30  9:03       ` Zicheng Qu
2026-02-02  7:09         ` Aaron Lu
2026-02-02 12:49       ` Peter Zijlstra
2026-02-03 11:18       ` [tip: sched/core] " tip-bot2 for Zicheng Qu
