Linux s390 Architecture development
* [PATCH v2 0/2] sched/core: Fix proxy-exec/core-sched interactions
@ 2026-05-07 10:41 Vasily Gorbik
  2026-05-07 10:41 ` [PATCH v2 1/2] sched/core: Don't steal a proxy-exec donor Vasily Gorbik
  2026-05-07 10:41 ` [PATCH v2 2/2] sched/core: Don't proxy-exec unmatched cookie lock owners Vasily Gorbik
  0 siblings, 2 replies; 3+ messages in thread
From: Vasily Gorbik @ 2026-05-07 10:41 UTC
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	K Prateek Nayak
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, John Stultz, Vineeth Pillai, Joel Fernandes,
	Heiko Carstens, linux-s390, linux-kernel

v1 [1] consisted of a fix for scheduler state corruption where
try_steal_cookie() could migrate a proxy-exec donor away from the source
rq while that rq was still using it as its active scheduling context.

Prateek pointed out [2] a separate proxy-exec/core-sched issue: after
pick_next_task() selects a core-cookie-compatible donor,
find_proxy_task() can replace the execution context with a mutex owner
that has a different cookie.

This v2 keeps the donor-steal fix as patch 1 and adds patch 2, which
rejects a final proxy owner whose cookie does not match.

v1 reported the issue as reproduced on an s390 LPAR, but it appears to
be easily reproducible with the strace test suite ("make -j$(nproc)
check") on any system with SMT enabled and with CONFIG_SCHED_CORE=y and
CONFIG_SCHED_PROXY_EXEC=y, e.g. on an x86 KVM guest started with
-smp cpus=16,sockets=1,cores=8,threads=2:

[  283.181298] WARNING: kernel/sched/fair.c:5788 at put_prev_entity+0x4f/0x90, CPU#2: unshare-report-/27895
[  283.185230] Modules linked in:
[  283.186480] CPU: 2 UID: 0 PID: 27895 Comm: unshare-report- Not tainted 7.1.0-rc2-00076-g74fe02ce122a #26 PREEMPT(full)
[  283.190699] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-9.fc43 06/10/2025
[  283.194482] RIP: 0010:put_prev_entity+0x4f/0x90
[  283.196591] Code: fd ff ff 80 7b 58 00 74 e0 66 90 48 89 de 48 89 ef e8 85 a9 ff ff 31 d2 48 89 de 48 89 ef e8 d8 d6 ff ff 48 39 5d 58 74 c6 90 <0f> 0b 90 48 c7 45 58 00 00 00 00 5b 5d e9 7f cb 31 01 48 83 bb b8
[  283.205157] RSP: 0018:ffffc90009177af0 EFLAGS: 00010006
[  283.207443] RAX: 0000000000000000 RBX: ffff888102de8080 RCX: 000000000004f800
[  283.210442] RDX: 0000000000000000 RSI: 0000000000027c00 RDI: 00000041dd7d5860
[  283.213528] RBP: ffff888116cb2200 R08: ffff888116fe8080 R09: 0000000000000002
[  283.216766] R10: 0000000005bf08d6 R11: 00000000000002b7 R12: ffff8881192da4a0
[  283.219872] R13: ffff88813a3ec801 R14: 0000000000000001 R15: ffff88813a3ec800
[  283.222777] FS:  00007f6b5ca21780(0000) GS:ffff8881b628c000(0000) knlGS:0000000000000000
[  283.226171] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  283.228493] CR2: 000000001319e358 CR3: 000000001f322000 CR4: 00000000000006f0
[  283.231951] Call Trace:
[  283.233137]  <TASK>
[  283.234066]  put_prev_task_fair+0x1d/0x40
[  283.235943]  __schedule+0x1165/0x28d0
[  283.237599]  ? __resched_curr+0x372/0x3a0
[  283.239413]  ? detach_task+0xc1/0xd0
[  283.241015]  ? lockdep_hardirqs_on_prepare+0xd7/0x190
[  283.243170]  ? trace_hardirqs_on+0x18/0x100
[  283.244852]  preempt_schedule+0x2e/0x50
[  283.246707]  preempt_schedule_thunk+0x16/0x30
[  283.248680]  ? _raw_spin_unlock_irqrestore+0x3f/0x50
[  283.251012]  __mutex_unlock_slowpath+0x2d9/0x3d0
[  283.253196]  pcpu_alloc_noprof+0x3e6/0xbd0
[  283.255187]  alloc_vfsmnt+0xd7/0x1e0
[  283.256651]  clone_mnt+0x1e/0x280
[  283.258061]  copy_tree+0x127/0x420
[  283.259449]  copy_mnt_ns+0x13f/0x520
[  283.260926]  create_new_namespaces+0x54/0x2e0
[  283.262974]  unshare_nsproxy_namespaces+0x7e/0xb0
[  283.265317]  ksys_unshare+0x196/0x550
[  283.267097]  __x64_sys_unshare+0xd/0x20
[  283.268876]  do_syscall_64+0xf3/0x6a0
[  283.270611]  ? exc_page_fault+0xfa/0x240
[  283.272329]  ? __irq_exit_rcu+0x3c/0x100
[  283.274006]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  283.276085] RIP: 0033:0x7f6b5cb1730d
[  283.277509] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 5a 0f 00 f7 d8 64 89 01 48
[  283.285484] RSP: 002b:00007ffef4e305d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
[  283.288741] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f6b5cb1730d
[  283.291711] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000020000
[  283.294664] RBP: 0000000000000000 R08: 0000000000000000 R09: 3230345b3a737475
[  283.298149] R10: 000000000000eefe R11: 0000000000000246 R12: 00000000000000f0
[  283.301268] R13: 0000000000000001 R14: 00007f6b5cd53000 R15: 0000000000404df0
[  283.304570]  </TASK>
[  283.305583] irq event stamp: 2018
[  283.307085] hardirqs last  enabled at (2017): [<ffffffff8269777f>] _raw_spin_unlock_irqrestore+0x3f/0x50
[  283.311026] hardirqs last disabled at (2018): [<ffffffff82689aff>] __schedule+0x13df/0x28d0
[  283.314726] softirqs last  enabled at (2008): [<ffffffff81324f40>] __irq_exit_rcu+0xc0/0x100
[  283.318427] softirqs last disabled at (2001): [<ffffffff81324f40>] __irq_exit_rcu+0xc0/0x100
[  283.321920] ---[ end trace 0000000000000000 ]---
[  283.323878] BUG: kernel NULL pointer dereference, address: 0000000000000059
[  283.326033] #PF: supervisor read access in kernel mode
[  283.327357] #PF: error_code(0x0000) - not-present page
[  283.328698] PGD 800000000a8c5067 P4D 800000000a8c5067 PUD 12879067 PMD 0
[  283.329491] Oops: Oops: 0000 [#1] SMP PTI
[  283.329796] CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Tainted: G        W           7.1.0-rc2-00076-g74fe02ce122a #26 PREEMPT(full)
[  283.331183] Tainted: [W]=WARN
[  283.331468] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-9.fc43 06/10/2025
[  283.332346] RIP: 0010:pick_task_fair+0x2d/0xb0
[  283.332735] Code: fa 8b 97 10 01 00 00 85 d2 0f 84 92 00 00 00 53 48 89 fb 48 83 ec 08 48 8d bb 00 01 00 00 eb 21 be 01 00 00 00 e8 13 8b ff ff <80> 78 59 00 75 3d 48 85 c0 74 48 48 8b b8 b8 00 00 00 48 85 ff 74
[  283.334364] RSP: 0018:ffffc900000b7e20 EFLAGS: 00010086
[  283.334992] RAX: 0000000000000000 RBX: ffff88813a5ec800 RCX: 041dd83271100000
[  283.335731] RDX: 0000000000000000 RSI: 0000000000200000 RDI: ffff888116cb2200
[  283.336404] RBP: ffffc900000b7f20 R08: 041dd83271100000 R09: 0000000000200000
[  283.337852] R10: 00000005252d41b2 R11: 0000000000000001 R12: 0000000000000002
[  283.338802] R13: ffff888025c18000 R14: 0000000000000003 R15: ffffffff84160800
[  283.339827] FS:  0000000000000000(0000) GS:ffff8881b628c000(0000) knlGS:0000000000000000
[  283.341158] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  283.342502] CR2: 0000000000000059 CR3: 000000001f322000 CR4: 00000000000006f0
[  283.345455] Call Trace:
[  283.347033]  <TASK>
[  283.348350]  __schedule+0xc65/0x28d0
[  283.349703]  ? tick_nohz_idle_exit+0x66/0x160
[  283.350882]  ? do_idle+0x17c/0x2b0
[  283.351454]  schedule_idle+0x1d/0x40
[  283.352017]  cpu_startup_entry+0x24/0x30
[  283.352594]  start_secondary+0xf8/0x100
[  283.353272]  common_startup_64+0x13e/0x148
[  283.353840]  </TASK>

Tested with the strace test suite, hackbench and stress-ng on s390 and
x86.

v1-v2:
- added a fix to prevent proxy-exec of unmatched cookie lock owners

[1] https://lore.kernel.org/all/c00-01.ttedd70@ub.hpns/
[2] https://lore.kernel.org/all/10282ce9-f4ae-498f-9b57-f4e1e61fffbc@amd.com/

Vasily Gorbik (2):
  sched/core: Don't steal a proxy-exec donor
  sched/core: Don't proxy-exec unmatched cookie lock owners

 kernel/sched/core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-- 
2.53.0

