Linux cgroups development
From: Peter Zijlstra <peterz@infradead.org>
To: Aaron Lu <ziqianlu@bytedance.com>
Cc: mingo@kernel.org, longman@redhat.com, chenridong@huaweicloud.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
	tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	jstultz@google.com, kprateek.nayak@amd.com, qyousef@layalina.io,
	svens@linux.ibm.com
Subject: Re: [PATCH v2 08/10] sched/fair: Add newidle balance to pick_task_fair()
Date: Thu, 11 Jun 2026 13:32:19 +0200
Message-ID: <20260611113219.GG187714@noisy.programming.kicks-ass.net>
In-Reply-To: <20260603095108.GA1684319@bytedance.com>


Aaron,

Sorry I failed to notice this email earlier.

On Wed, Jun 03, 2026 at 05:51:08PM +0800, Aaron Lu wrote:

> I applied below diff and the problem is gone:
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5f48af700fd44..942a543af3e54 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9897,6 +9897,9 @@ static struct task_struct *pick_task_fair(struct rq *rq, struct rq_flags *rf)
>  	return p;
>  
>  idle:
> +	if (sched_core_enabled(rq))
> +		return NULL;
> +
>  	new_tasks = sched_balance_newidle(rq, rf);
>  	if (new_tasks < 0)
>  		return RETRY_TASK;
> 

Right, this is the safe patch; it restores pick_task_fair() to its
previous behaviour (for core-sched).

Since people are hitting this problem, I'm going to merge it as below.
I've presumed your SoB, please let me know if that's a problem.

I think I'm going to try and move newidle into sched_class::balance /
balance_fair(), but I'll do that next cycle.

Thanks!

---
Subject: sched/fair: Fix newidle vs core-sched
From: "Aaron Lu" <ziqianlu@bytedance.com>
Date: Wed, 3 Jun 2026 17:51:08 +0800

While testing Prateek's throttle series, I noticed a panic with core
scheduling enabled and bisected it to the patch "sched/fair: Add newidle
balance to pick_task_fair()".

I fed the panic log and that patch to an agent and its analysis looks
correct to me (cpu56 and cpu57 are siblings in a VM):

       cpu57 (holds core-wide lock)

     pick_next_task() [core scheduling]
     for_each_cpu_wrap(i, smt_mask, 57):
       i=57: pick_task(rq_57)
             pick_task_fair(rq_57)
             -> picks task A
       rq_57->core_pick = task A
       // task_rq(A) == rq_57

       i=56: pick_task(rq_56)
             pick_task_fair(rq_56)
             cfs_rq->nr_queued == 0
             goto idle
             sched_balance_newidle(rq_56)
             raw_spin_rq_unlock(rq_56)
             // core-wide lock released
             newidle_balance() pulls
               task A: rq_57 -> rq_56
             // task_rq(A) == rq_56 now
             raw_spin_rq_lock(rq_56)
             // core-wide lock re-acquired
             return > 0
             goto again
             pick_task_fair(rq_56)
             -> picks task A
       rq_56->core_pick = task A

     // first loop done
     // rq_57->core_pick is still task A (set before lock release)
     // but task_rq(A) == rq_56 now
     next = rq_57->core_pick  // = task A

     put_prev_set_next_task(rq_57, prev, task A)
     __set_next_task_fair(rq_57, task A)
     hrtick_start_fair(rq_57, task A)
     WARN_ON_ONCE(task_rq(task A) != rq_57)
     // task_rq(A) == rq_56

IOW: because pick_task_fair() can now do newidle balance and retry the
pick itself instead of returning RETRY_TASK, the core-wide pick can end up
selecting the same task on two CPUs. Restore the previous behaviour by
never doing newidle balance when core scheduling is enabled.

Tested-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: "Aaron Lu" <ziqianlu@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260603095108.GA1684319@bytedance.com
---
 kernel/sched/fair.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9942,6 +9942,9 @@ struct task_struct *pick_task_fair(struc
 	return p;
 
 idle:
+	if (sched_core_enabled(rq))
+		return NULL;
+
 	new_tasks = sched_balance_newidle(rq, rf);
 	if (new_tasks < 0)
 		return RETRY_TASK;
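
For readers who want to poke at the interleaving from the changelog
outside the kernel, here is a minimal user-space sketch of it. It is a
toy model under loud assumptions, not kernel code: every toy_* name is
hypothetical, each runqueue holds at most one task, locking is not
modelled at all, and the 'guard' flag merely stands in for the
sched_core_enabled() check added above.

```c
/* Toy user-space model of the stale core_pick; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct toy_rq;

struct toy_task {
	struct toy_rq *rq;		/* stand-in for task_rq(p) */
};

struct toy_rq {
	struct toy_task *queued;	/* at most one runnable task */
	struct toy_task *core_pick;	/* pick recorded for this CPU */
};

/* Stand-in for a newidle pull: migrate the sibling's only task to @dst. */
static bool toy_newidle_pull(struct toy_rq *dst, struct toy_rq *src)
{
	if (!src->queued)
		return false;

	dst->queued = src->queued;
	dst->queued->rq = dst;		/* task_rq() changes here */
	src->queued = NULL;
	return true;
}

/* Stand-in for pick_task_fair(); @guard models the added early return. */
static struct toy_task *toy_pick_task(struct toy_rq *rq,
				      struct toy_rq *sibling, bool guard)
{
	if (rq->queued)
		return rq->queued;

	if (guard)
		return NULL;		/* no newidle pull during the pick */

	if (toy_newidle_pull(rq, sibling))
		return rq->queued;	/* models the retry after a pull */

	return NULL;
}

/*
 * Model one core-wide pick over two siblings, cpu57 first. Returns true
 * when cpu57's recorded pick no longer lives on cpu57's runqueue.
 */
static bool toy_core_pick_goes_stale(bool guard)
{
	struct toy_rq rq57 = { 0 }, rq56 = { 0 };
	struct toy_task a = { .rq = &rq57 };

	rq57.queued = &a;

	rq57.core_pick = toy_pick_task(&rq57, &rq56, guard);
	rq56.core_pick = toy_pick_task(&rq56, &rq57, guard);

	return rq57.core_pick && rq57.core_pick->rq != &rq57;
}

#ifdef TOY_DEMO_MAIN
int main(void)
{
	printf("no guard:   stale pick = %d\n",
	       toy_core_pick_goes_stale(false));
	printf("with guard: stale pick = %d\n",
	       toy_core_pick_goes_stale(true));
	return 0;
}
#endif
```

Built with -DTOY_DEMO_MAIN, the model reports a stale pick only in the
unguarded case, which corresponds to the task_rq() mismatch that the
WARN_ON_ONCE() in the trace catches. With the guard the idle sibling
simply picks nothing.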
