From: Peter Zijlstra <peterz@infradead.org>
To: Aaron Lu <ziqianlu@bytedance.com>
Cc: mingo@kernel.org, longman@redhat.com, chenridong@huaweicloud.com,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
jstultz@google.com, kprateek.nayak@amd.com, qyousef@layalina.io,
svens@linux.ibm.com
Subject: Re: [PATCH v2 08/10] sched/fair: Add newidle balance to pick_task_fair()
Date: Thu, 11 Jun 2026 13:32:19 +0200
Message-ID: <20260611113219.GG187714@noisy.programming.kicks-ass.net>
In-Reply-To: <20260603095108.GA1684319@bytedance.com>
Aaron,
Sorry I failed to notice this email earlier.
On Wed, Jun 03, 2026 at 05:51:08PM +0800, Aaron Lu wrote:
> I applied below diff and the problem is gone:
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5f48af700fd44..942a543af3e54 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9897,6 +9897,9 @@ static struct task_struct *pick_task_fair(struct rq *rq, struct rq_flags *rf)
>  	return p;
>
>  idle:
> +	if (sched_core_enabled(rq))
> +		return NULL;
> +
>  	new_tasks = sched_balance_newidle(rq, rf);
>  	if (new_tasks < 0)
>  		return RETRY_TASK;
>
Right, this is the safe patch and restores pick_task_fair() to its
previous status (for core-sched).
Since people are hitting this problem, I'm going to merge it as below.
I've presumed your SoB, please let me know if that's a problem.
I think I'm going to try and move newidle into sched_class::balance /
balance_fair(), but I'll do that next cycle.
Thanks!
---
Subject: sched/fair: Fix newidle vs core-sched
From: "Aaron Lu" <ziqianlu@bytedance.com>
Date: Wed, 3 Jun 2026 17:51:08 +0800
While testing Prateek's throttle series, I noticed a panic when
coresched is enabled and bisected it to this patch.
I fed the panic log and this patch to an agent and its analysis looks
correct to me (cpu56 and cpu57 are siblings in a VM):
  cpu57 (holds core-wide lock)
    pick_next_task() [core scheduling]
      for_each_cpu_wrap(i, smt_mask, 57):
        i=57: pick_task(rq_57)
                pick_task_fair(rq_57)
                  -> picks task A
              rq_57->core_pick = task A
              // task_rq(A) == rq_57
        i=56: pick_task(rq_56)
                pick_task_fair(rq_56)
                  cfs_rq->nr_queued == 0
                  goto idle
                  sched_balance_newidle(rq_56)
                    raw_spin_rq_unlock(rq_56)
                    // core-wide lock released
                    newidle_balance() pulls
                      task A: rq_57 -> rq_56
                    // task_rq(A) == rq_56 now
                    raw_spin_rq_lock(rq_56)
                    // core-wide lock re-acquired
                  return > 0
                  goto again
                pick_task_fair(rq_56)
                  -> picks task A
              rq_56->core_pick = task A
      // first loop done
      // rq_57->core_pick is still task A (set before lock release)
      // but task_rq(A) == rq_56 now
      next = rq_57->core_pick // = task A
      put_prev_set_next_task(rq_57, prev, task A)
        __set_next_task_fair(rq_57, task A)
          hrtick_start_fair(rq_57, task A)
            WARN_ON_ONCE(task_rq(task A) != rq_57)
            // task_rq(A) == rq_56
IOW: by allowing pick_task_fair() to do newidle_balance and not returning
RETRY_TASK, it can end up selecting the same task on two CPUs. Restore the
previous state by never doing newidle when core scheduling is enabled.
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: "Aaron Lu" <ziqianlu@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260603095108.GA1684319@bytedance.com
---
kernel/sched/fair.c | 3 +++
1 file changed, 3 insertions(+)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9942,6 +9942,9 @@ struct task_struct *pick_task_fair(struc
 	return p;
 
 idle:
+	if (sched_core_enabled(rq))
+		return NULL;
+
 	new_tasks = sched_balance_newidle(rq, rf);
 	if (new_tasks < 0)
 		return RETRY_TASK;
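
For readers who want to poke at the failure mode outside the kernel, the
sequence in the changelog can be approximated with a small userspace toy
model. This is only a sketch under loud assumptions: every name below
(toy_task, toy_rq, toy_pick, toy_core_pick, skip_newidle) is hypothetical
and none of it is kernel code. The lock drop and re-acquire is not modelled;
the pull simply happens in the middle of the second pick, which is the
effect the lock drop allows.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical, none exist in the kernel. */
struct toy_task {
	int cpu;			/* stands in for task_rq(p) */
};

struct toy_rq {
	int cpu;
	struct toy_task *queued;	/* at most one runnable task */
	struct toy_task *core_pick;	/* result of the core-wide pick */
};

/* Stand-in for newidle balance: migrate the sibling's task to @dst. */
static bool toy_newidle_pull(struct toy_rq *dst, struct toy_rq *src)
{
	if (!src->queued)
		return false;

	dst->queued = src->queued;
	src->queued = NULL;
	dst->queued->cpu = dst->cpu;	/* task now belongs to @dst */
	return true;
}

/* Stand-in for the per-rq pick; @skip_newidle models the added guard. */
static struct toy_task *toy_pick(struct toy_rq *rq, struct toy_rq *sibling,
				 bool skip_newidle)
{
	if (rq->queued)
		return rq->queued;

	if (skip_newidle)
		return NULL;		/* go idle, never pull mid-pick */

	if (toy_newidle_pull(rq, sibling))
		return rq->queued;

	return NULL;
}

/*
 * Run one core-wide pick over two siblings, cpu57 first, then cpu56.
 * Returns how many core_pick entries point at a task that no longer
 * lives on that runqueue.
 */
static int toy_core_pick(bool skip_newidle)
{
	struct toy_task a = { .cpu = 57 };
	struct toy_rq rq57 = { .cpu = 57, .queued = &a };
	struct toy_rq rq56 = { .cpu = 56 };
	int stale = 0;

	rq57.core_pick = toy_pick(&rq57, &rq56, skip_newidle);
	rq56.core_pick = toy_pick(&rq56, &rq57, skip_newidle);

	if (rq57.core_pick && rq57.core_pick->cpu != rq57.cpu)
		stale++;
	if (rq56.core_pick && rq56.core_pick->cpu != rq56.cpu)
		stale++;

	return stale;
}
```

In this model toy_core_pick(false) leaves rq57's pick pointing at a task
that has moved to rq56, which mirrors the WARN in hrtick_start_fair(),
while toy_core_pick(true) leaves cpu56 idle and reports no stale pick.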