Linux s390 Architecture development
* [PATCH v2 0/2] sched/core: Fix proxy-exec/core-sched interactions
@ 2026-05-07 10:41 Vasily Gorbik
  2026-05-07 10:41 ` [PATCH v2 1/2] sched/core: Don't steal a proxy-exec donor Vasily Gorbik
  2026-05-07 10:41 ` [PATCH v2 2/2] sched/core: Don't proxy-exec unmatched cookie lock owners Vasily Gorbik
  0 siblings, 2 replies; 3+ messages in thread
From: Vasily Gorbik @ 2026-05-07 10:41 UTC
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	K Prateek Nayak
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, John Stultz, Vineeth Pillai, Joel Fernandes,
	Heiko Carstens, linux-s390, linux-kernel

v1 [1] consisted of a fix for a scheduler corruption where
try_steal_cookie() could migrate a proxy-exec donor away from the source
rq while that rq still used it as the active scheduling context.

Prateek pointed out [2] a separate proxy-exec/core-sched issue: after
pick_next_task() selects a donor that is compatible with the core cookie,
find_proxy_task() can replace the execution context with a mutex owner
that has a different cookie.

This v2 keeps the donor steal fix as patch 1 and adds patch 2 to reject
mismatched final proxy owners.

v1 reported the issue as reproduced on an s390 LPAR, but it appears to be
easily reproducible with the strace test suite ("make -j$(nproc) check")
on any system with SMT and both CONFIG_SCHED_CORE=y and
CONFIG_SCHED_PROXY_EXEC=y enabled, e.g. on x86 KVM with
-smp cpus=16,sockets=1,cores=8,threads=2:

[  283.181298] WARNING: kernel/sched/fair.c:5788 at put_prev_entity+0x4f/0x90, CPU#2: unshare-report-/27895
[  283.185230] Modules linked in:
[  283.186480] CPU: 2 UID: 0 PID: 27895 Comm: unshare-report- Not tainted 7.1.0-rc2-00076-g74fe02ce122a #26 PREEMPT(full)
[  283.190699] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-9.fc43 06/10/2025
[  283.194482] RIP: 0010:put_prev_entity+0x4f/0x90
[  283.196591] Code: fd ff ff 80 7b 58 00 74 e0 66 90 48 89 de 48 89 ef e8 85 a9 ff ff 31 d2 48 89 de 48 89 ef e8 d8 d6 ff ff 48 39 5d 58 74 c6 90 <0f> 0b 90 48 c7 45 58 00 00 00 00 5b 5d e9 7f cb 31 01 48 83 bb b8
[  283.205157] RSP: 0018:ffffc90009177af0 EFLAGS: 00010006
[  283.207443] RAX: 0000000000000000 RBX: ffff888102de8080 RCX: 000000000004f800
[  283.210442] RDX: 0000000000000000 RSI: 0000000000027c00 RDI: 00000041dd7d5860
[  283.213528] RBP: ffff888116cb2200 R08: ffff888116fe8080 R09: 0000000000000002
[  283.216766] R10: 0000000005bf08d6 R11: 00000000000002b7 R12: ffff8881192da4a0
[  283.219872] R13: ffff88813a3ec801 R14: 0000000000000001 R15: ffff88813a3ec800
[  283.222777] FS:  00007f6b5ca21780(0000) GS:ffff8881b628c000(0000) knlGS:0000000000000000
[  283.226171] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  283.228493] CR2: 000000001319e358 CR3: 000000001f322000 CR4: 00000000000006f0
[  283.231951] Call Trace:
[  283.233137]  <TASK>
[  283.234066]  put_prev_task_fair+0x1d/0x40
[  283.235943]  __schedule+0x1165/0x28d0
[  283.237599]  ? __resched_curr+0x372/0x3a0
[  283.239413]  ? detach_task+0xc1/0xd0
[  283.241015]  ? lockdep_hardirqs_on_prepare+0xd7/0x190
[  283.243170]  ? trace_hardirqs_on+0x18/0x100
[  283.244852]  preempt_schedule+0x2e/0x50
[  283.246707]  preempt_schedule_thunk+0x16/0x30
[  283.248680]  ? _raw_spin_unlock_irqrestore+0x3f/0x50
[  283.251012]  __mutex_unlock_slowpath+0x2d9/0x3d0
[  283.253196]  pcpu_alloc_noprof+0x3e6/0xbd0
[  283.255187]  alloc_vfsmnt+0xd7/0x1e0
[  283.256651]  clone_mnt+0x1e/0x280
[  283.258061]  copy_tree+0x127/0x420
[  283.259449]  copy_mnt_ns+0x13f/0x520
[  283.260926]  create_new_namespaces+0x54/0x2e0
[  283.262974]  unshare_nsproxy_namespaces+0x7e/0xb0
[  283.265317]  ksys_unshare+0x196/0x550
[  283.267097]  __x64_sys_unshare+0xd/0x20
[  283.268876]  do_syscall_64+0xf3/0x6a0
[  283.270611]  ? exc_page_fault+0xfa/0x240
[  283.272329]  ? __irq_exit_rcu+0x3c/0x100
[  283.274006]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  283.276085] RIP: 0033:0x7f6b5cb1730d
[  283.277509] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 5a 0f 00 f7 d8 64 89 01 48
[  283.285484] RSP: 002b:00007ffef4e305d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
[  283.288741] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f6b5cb1730d
[  283.291711] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000020000
[  283.294664] RBP: 0000000000000000 R08: 0000000000000000 R09: 3230345b3a737475
[  283.298149] R10: 000000000000eefe R11: 0000000000000246 R12: 00000000000000f0
[  283.301268] R13: 0000000000000001 R14: 00007f6b5cd53000 R15: 0000000000404df0
[  283.304570]  </TASK>
[  283.305583] irq event stamp: 2018
[  283.307085] hardirqs last  enabled at (2017): [<ffffffff8269777f>] _raw_spin_unlock_irqrestore+0x3f/0x50
[  283.311026] hardirqs last disabled at (2018): [<ffffffff82689aff>] __schedule+0x13df/0x28d0
[  283.314726] softirqs last  enabled at (2008): [<ffffffff81324f40>] __irq_exit_rcu+0xc0/0x100
[  283.318427] softirqs last disabled at (2001): [<ffffffff81324f40>] __irq_exit_rcu+0xc0/0x100
[  283.321920] ---[ end trace 0000000000000000 ]---
[  283.323878] BUG: kernel NULL pointer dereference, address: 0000000000000059
[  283.326033] #PF: supervisor read access in kernel mode
[  283.327357] #PF: error_code(0x0000) - not-present page
[  283.328698] PGD 800000000a8c5067 P4D 800000000a8c5067 PUD 12879067 PMD 0
[  283.329491] Oops: Oops: 0000 [#1] SMP PTI
[  283.329796] CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Tainted: G        W           7.1.0-rc2-00076-g74fe02ce122a #26 PREEMPT(full)
[  283.331183] Tainted: [W]=WARN
[  283.331468] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-9.fc43 06/10/2025
[  283.332346] RIP: 0010:pick_task_fair+0x2d/0xb0
[  283.332735] Code: fa 8b 97 10 01 00 00 85 d2 0f 84 92 00 00 00 53 48 89 fb 48 83 ec 08 48 8d bb 00 01 00 00 eb 21 be 01 00 00 00 e8 13 8b ff ff <80> 78 59 00 75 3d 48 85 c0 74 48 48 8b b8 b8 00 00 00 48 85 ff 74
[  283.334364] RSP: 0018:ffffc900000b7e20 EFLAGS: 00010086
[  283.334992] RAX: 0000000000000000 RBX: ffff88813a5ec800 RCX: 041dd83271100000
[  283.335731] RDX: 0000000000000000 RSI: 0000000000200000 RDI: ffff888116cb2200
[  283.336404] RBP: ffffc900000b7f20 R08: 041dd83271100000 R09: 0000000000200000
[  283.337852] R10: 00000005252d41b2 R11: 0000000000000001 R12: 0000000000000002
[  283.338802] R13: ffff888025c18000 R14: 0000000000000003 R15: ffffffff84160800
[  283.339827] FS:  0000000000000000(0000) GS:ffff8881b628c000(0000) knlGS:0000000000000000
[  283.341158] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  283.342502] CR2: 0000000000000059 CR3: 000000001f322000 CR4: 00000000000006f0
[  283.345455] Call Trace:
[  283.347033]  <TASK>
[  283.348350]  __schedule+0xc65/0x28d0
[  283.349703]  ? tick_nohz_idle_exit+0x66/0x160
[  283.350882]  ? do_idle+0x17c/0x2b0
[  283.351454]  schedule_idle+0x1d/0x40
[  283.352017]  cpu_startup_entry+0x24/0x30
[  283.352594]  start_secondary+0xf8/0x100
[  283.353272]  common_startup_64+0x13e/0x148
[  283.353840]  </TASK>

Tested with the strace test suite, hackbench, and stress-ng on s390 and x86.

v1 -> v2:
- Added a fix to prevent proxy-exec of lock owners with an unmatched cookie

[1] https://lore.kernel.org/all/c00-01.ttedd70@ub.hpns/
[2] https://lore.kernel.org/all/10282ce9-f4ae-498f-9b57-f4e1e61fffbc@amd.com/

Vasily Gorbik (2):
  sched/core: Don't steal a proxy-exec donor
  sched/core: Don't proxy-exec unmatched cookie lock owners

 kernel/sched/core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-- 
2.53.0


* [PATCH v2 1/2] sched/core: Don't steal a proxy-exec donor
  2026-05-07 10:41 [PATCH v2 0/2] sched/core: Fix proxy-exec/core-sched interactions Vasily Gorbik
@ 2026-05-07 10:41 ` Vasily Gorbik
  2026-05-07 10:41 ` [PATCH v2 2/2] sched/core: Don't proxy-exec unmatched cookie lock owners Vasily Gorbik
  1 sibling, 0 replies; 3+ messages in thread
From: Vasily Gorbik @ 2026-05-07 10:41 UTC
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	K Prateek Nayak
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, John Stultz, Vineeth Pillai, Joel Fernandes,
	Heiko Carstens, linux-s390, linux-kernel

try_steal_cookie() avoids stealing src->core_pick and src->curr before
moving a task with the same cookie via move_queued_task_locked().

With proxy-exec, src->donor is the current scheduling context and may
differ from src->curr. Stealing it migrates a task that the source rq
still treats as current, leaving the source rq's scheduler state for
that task stale. For CFS this means cfs_rq->curr still points at the
stolen entity, and the next pick on the source rq hits the
WARN_ON_ONCE() in put_prev_entity().

Commit 7de9d4f94638 ("sched: Start blocked_on chain processing in
find_proxy_task()") tweaked the fair class logic so that the donor task
isn't migrated away while we're running the proxy. Do the same in
try_steal_cookie() and skip src->donor as well.
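
For illustration only (not part of the patch), the effect of the extra
check can be sketched in user space. The toy_task and toy_rq types and the
toy_may_steal_old()/toy_may_steal_new() helpers are hypothetical stand-ins
invented for this sketch, not kernel code; they only mirror the skip
condition in try_steal_cookie():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for task_struct and struct rq. */
struct toy_task {
	unsigned long cookie;
};

struct toy_rq {
	struct toy_task *curr;      /* execution context: task actually running */
	struct toy_task *donor;     /* scheduling context: may differ under proxy-exec */
	struct toy_task *core_pick; /* task selected by the core-wide pick, if any */
};

/* Skip condition before the patch: only core_pick and curr are protected. */
static bool toy_may_steal_old(const struct toy_rq *src, const struct toy_task *p)
{
	assert(src != NULL && p != NULL);
	return !(p == src->core_pick || p == src->curr);
}

/* Skip condition after the patch: the donor is protected as well. */
static bool toy_may_steal_new(const struct toy_rq *src, const struct toy_task *p)
{
	assert(src != NULL && p != NULL);
	return !(p == src->core_pick || p == src->curr || p == src->donor);
}
```

With curr pointing at the lock owner and donor at the blocked task, the
old predicate lets the donor be stolen while the new one rejects it; an
unrelated queued task remains stealable in both cases.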

Fixes: 7de9d4f94638 ("sched: Start blocked_on chain processing in find_proxy_task()")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b905805bbcbe..8aed55592ca9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6366,7 +6366,7 @@ static bool try_steal_cookie(int this, int that)
 		return false;
 
 	do {
-		if (p == src->core_pick || p == src->curr)
+		if (p == src->core_pick || p == src->curr || p == src->donor)
 			goto next;
 
 		if (!is_cpu_allowed(p, this))
-- 
2.53.0



* [PATCH v2 2/2] sched/core: Don't proxy-exec unmatched cookie lock owners
  2026-05-07 10:41 [PATCH v2 0/2] sched/core: Fix proxy-exec/core-sched interactions Vasily Gorbik
  2026-05-07 10:41 ` [PATCH v2 1/2] sched/core: Don't steal a proxy-exec donor Vasily Gorbik
@ 2026-05-07 10:41 ` Vasily Gorbik
  1 sibling, 0 replies; 3+ messages in thread
From: Vasily Gorbik @ 2026-05-07 10:41 UTC
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	K Prateek Nayak
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, John Stultz, Vineeth Pillai, Joel Fernandes,
	Heiko Carstens, linux-s390, linux-kernel

Core scheduling chooses a core-wide cookie before __schedule()
installs the next task. With proxy-exec enabled, that task becomes the
donor/scheduling context, and find_proxy_task() may then replace the
execution context with the runnable mutex owner. If its cookie differs
from the selected core cookie, running it would bypass core scheduling's
cookie selection.

When the final mutex owner found by find_proxy_task() does not match the
selected core cookie, stop proxying the donor. If the current execution
context is already in the blocked chain, fall back to idle like the
existing proxy-exec retry paths do. Otherwise deactivate the donor and
let __schedule() pick again. The mutex owner can be picked later under
its own cookie.
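
For illustration only (not part of the patch), the three outcomes
described above can be modelled in user space. The toy_task type, the
toy_action enum and toy_proxy_decision() are hypothetical names invented
for this sketch. The core_sched_enabled parameter reflects the assumption
that sched_cpu_cookie_match() treats every task as matching when core
scheduling is not enabled on the CPU:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the check added after the blocked_on chain walk. */
enum toy_action {
	TOY_RUN_OWNER,        /* cookies match: run the owner as the proxy */
	TOY_RESCHED_IDLE,     /* mismatch, curr is in the chain: fall back to idle */
	TOY_DEACTIVATE_DONOR, /* mismatch otherwise: deactivate donor, pick again */
};

/* Hypothetical, simplified stand-in for task_struct. */
struct toy_task {
	unsigned long core_cookie;
};

static enum toy_action toy_proxy_decision(const struct toy_task *owner,
					  unsigned long selected_cookie,
					  bool curr_in_chain,
					  bool core_sched_enabled)
{
	assert(owner != NULL);

	/* Assumed behaviour: without core scheduling, any cookie matches. */
	if (!core_sched_enabled || owner->core_cookie == selected_cookie)
		return TOY_RUN_OWNER;

	/* Mismatched owner: stop proxying the donor. */
	if (curr_in_chain)
		return TOY_RESCHED_IDLE;

	return TOY_DEACTIVATE_DONOR;
}
```

In this model a mismatched owner is never returned as the execution
context; it can only run once a later pick selects its own cookie.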

Fixes: 7de9d4f94638 ("sched: Start blocked_on chain processing in find_proxy_task()")
Reported-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
---
 kernel/sched/core.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8aed55592ca9..d338fb714ce8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6960,6 +6960,12 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 		 */
 	}
 	WARN_ON_ONCE(owner && !owner->on_rq);
+
+	if (owner && !sched_cpu_cookie_match(rq, owner)) {
+		if (curr_in_chain)
+			return proxy_resched_idle(rq);
+		goto deactivate;
+	}
 	return owner;
 
 deactivate:
-- 
2.53.0

