From: Vasily Gorbik <gor@linux.ibm.com>
To: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
John Stultz <jstultz@google.com>,
Vineeth Pillai <vineethrp@google.com>,
Joel Fernandes <joelagnelf@nvidia.com>,
Heiko Carstens <hca@linux.ibm.com>,
linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 0/2] sched/core: Fix proxy-exec/core-sched interactions
Date: Thu, 7 May 2026 12:41:53 +0200
Message-ID: <c00-02.v2.ttenwd4@ub.hpns>

v1 [1] contained a fix for scheduler state corruption where
try_steal_cookie() could migrate a proxy-exec donor away from its source
rq while that rq was still using it as the active scheduling context.
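
To illustrate the first problem, below is a minimal userspace model of
the candidate filter in try_steal_cookie(). It is a simplified sketch,
not the kernel code and not the actual patch: struct model_task, struct
model_rq and can_steal_candidate() are hypothetical names. The model
assumes that under proxy execution rq->curr is the execution context and
rq->donor the scheduling context, so a donor blocked on a mutex stays
queued and can be found by a cookie search. It also assumes the existing
filter skips the source rq's core_pick and curr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for task_struct. */
struct model_task {
	unsigned long core_cookie;
};

/* Hypothetical, heavily simplified stand-in for struct rq. */
struct model_rq {
	struct model_task *curr;      /* execution context (what runs)       */
	struct model_task *donor;     /* scheduling context (what was picked) */
	struct model_task *core_pick; /* task chosen for this rq by core pick */
};

/*
 * Return true if @p, queued on @src, may be migrated to another CPU
 * running @cookie. With proxy execution, curr and donor can differ:
 * the donor is blocked on a mutex, but @src still uses its scheduling
 * state, so it must be skipped just like curr.
 */
static bool can_steal_candidate(const struct model_rq *src,
				const struct model_task *p,
				unsigned long cookie)
{
	assert(src != NULL && p != NULL);

	if (p->core_cookie != cookie)
		return false;
	if (p == src->core_pick || p == src->curr)
		return false;
	/* The extra check this sketch models: never steal the active donor. */
	if (p == src->donor)
		return false;
	return true;
}
```

With curr set to the lock owner and donor set to a different task,
can_steal_candidate() rejects the donor while an unrelated queued task
with the same cookie remains eligible for migration.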

Prateek pointed out [2] a separate proxy-exec/core-sched issue: after
pick_next_task() selects a core-cookie-compatible donor, find_proxy_task()
can replace the execution context with a mutex owner that has a different
cookie.

This v2 keeps the donor-steal fix as patch 1 and adds patch 2, which
rejects final proxy owners whose cookie does not match.
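
The second problem can be modelled the same way: the pick is made on the
donor's cookie, but the task that finally runs is at the end of the
blocked-on chain. This is a minimal sketch, not the kernel code or the
actual patch: the hypothetical blocked_owner field stands in for
following a task's blocked-on mutex to its owner, final_proxy_owner()
and pick_proxy_owner() are illustrative names, and what the scheduler
does after rejecting an owner is deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>

struct proxy_task {
	unsigned long core_cookie;
	/* Hypothetical: owner of the mutex this task is blocked on, or NULL. */
	struct proxy_task *blocked_owner;
};

/*
 * Follow the blocked-on chain from @donor to the task that would run.
 * Assumes the chain is acyclic (no deadlock).
 */
static struct proxy_task *final_proxy_owner(struct proxy_task *donor)
{
	struct proxy_task *p = donor;

	assert(donor != NULL);
	while (p->blocked_owner)
		p = p->blocked_owner;
	return p;
}

/*
 * The donor was picked because its cookie matched @core_cookie. Only
 * proxy-execute the final owner if it matches too; otherwise return
 * NULL, leaving the fallback to the (unmodelled) caller.
 */
static struct proxy_task *pick_proxy_owner(struct proxy_task *donor,
					   unsigned long core_cookie)
{
	struct proxy_task *owner = final_proxy_owner(donor);

	if (owner->core_cookie != core_cookie)
		return NULL;
	return owner;
}
```

For a chain donor -> mid -> owner with cookies 3, 3 and 7,
pick_proxy_owner(&donor, 3) returns NULL: the donor matched the core
cookie, but the task that would actually execute does not.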

v1 reported the issue as reproduced on an s390 LPAR, but it appears to be
easily reproducible with the strace test suite ("make -j$(nproc) check")
on any system with SMT and with CONFIG_SCHED_CORE=y and
CONFIG_SCHED_PROXY_EXEC=y enabled, e.g. on x86 KVM with
-smp cpus=16,sockets=1,cores=8,threads=2:

[ 283.181298] WARNING: kernel/sched/fair.c:5788 at put_prev_entity+0x4f/0x90, CPU#2: unshare-report-/27895
[ 283.185230] Modules linked in:
[ 283.186480] CPU: 2 UID: 0 PID: 27895 Comm: unshare-report- Not tainted 7.1.0-rc2-00076-g74fe02ce122a #26 PREEMPT(full)
[ 283.190699] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-9.fc43 06/10/2025
[ 283.194482] RIP: 0010:put_prev_entity+0x4f/0x90
[ 283.196591] Code: fd ff ff 80 7b 58 00 74 e0 66 90 48 89 de 48 89 ef e8 85 a9 ff ff 31 d2 48 89 de 48 89 ef e8 d8 d6 ff ff 48 39 5d 58 74 c6 90 <0f> 0b 90 48 c7 45 58 00 00 00 00 5b 5d e9 7f cb 31 01 48 83 bb b8
[ 283.205157] RSP: 0018:ffffc90009177af0 EFLAGS: 00010006
[ 283.207443] RAX: 0000000000000000 RBX: ffff888102de8080 RCX: 000000000004f800
[ 283.210442] RDX: 0000000000000000 RSI: 0000000000027c00 RDI: 00000041dd7d5860
[ 283.213528] RBP: ffff888116cb2200 R08: ffff888116fe8080 R09: 0000000000000002
[ 283.216766] R10: 0000000005bf08d6 R11: 00000000000002b7 R12: ffff8881192da4a0
[ 283.219872] R13: ffff88813a3ec801 R14: 0000000000000001 R15: ffff88813a3ec800
[ 283.222777] FS: 00007f6b5ca21780(0000) GS:ffff8881b628c000(0000) knlGS:0000000000000000
[ 283.226171] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 283.228493] CR2: 000000001319e358 CR3: 000000001f322000 CR4: 00000000000006f0
[ 283.231951] Call Trace:
[ 283.233137] <TASK>
[ 283.234066] put_prev_task_fair+0x1d/0x40
[ 283.235943] __schedule+0x1165/0x28d0
[ 283.237599] ? __resched_curr+0x372/0x3a0
[ 283.239413] ? detach_task+0xc1/0xd0
[ 283.241015] ? lockdep_hardirqs_on_prepare+0xd7/0x190
[ 283.243170] ? trace_hardirqs_on+0x18/0x100
[ 283.244852] preempt_schedule+0x2e/0x50
[ 283.246707] preempt_schedule_thunk+0x16/0x30
[ 283.248680] ? _raw_spin_unlock_irqrestore+0x3f/0x50
[ 283.251012] __mutex_unlock_slowpath+0x2d9/0x3d0
[ 283.253196] pcpu_alloc_noprof+0x3e6/0xbd0
[ 283.255187] alloc_vfsmnt+0xd7/0x1e0
[ 283.256651] clone_mnt+0x1e/0x280
[ 283.258061] copy_tree+0x127/0x420
[ 283.259449] copy_mnt_ns+0x13f/0x520
[ 283.260926] create_new_namespaces+0x54/0x2e0
[ 283.262974] unshare_nsproxy_namespaces+0x7e/0xb0
[ 283.265317] ksys_unshare+0x196/0x550
[ 283.267097] __x64_sys_unshare+0xd/0x20
[ 283.268876] do_syscall_64+0xf3/0x6a0
[ 283.270611] ? exc_page_fault+0xfa/0x240
[ 283.272329] ? __irq_exit_rcu+0x3c/0x100
[ 283.274006] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 283.276085] RIP: 0033:0x7f6b5cb1730d
[ 283.277509] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 5a 0f 00 f7 d8 64 89 01 48
[ 283.285484] RSP: 002b:00007ffef4e305d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
[ 283.288741] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f6b5cb1730d
[ 283.291711] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000020000
[ 283.294664] RBP: 0000000000000000 R08: 0000000000000000 R09: 3230345b3a737475
[ 283.298149] R10: 000000000000eefe R11: 0000000000000246 R12: 00000000000000f0
[ 283.301268] R13: 0000000000000001 R14: 00007f6b5cd53000 R15: 0000000000404df0
[ 283.304570] </TASK>
[ 283.305583] irq event stamp: 2018
[ 283.307085] hardirqs last enabled at (2017): [<ffffffff8269777f>] _raw_spin_unlock_irqrestore+0x3f/0x50
[ 283.311026] hardirqs last disabled at (2018): [<ffffffff82689aff>] __schedule+0x13df/0x28d0
[ 283.314726] softirqs last enabled at (2008): [<ffffffff81324f40>] __irq_exit_rcu+0xc0/0x100
[ 283.318427] softirqs last disabled at (2001): [<ffffffff81324f40>] __irq_exit_rcu+0xc0/0x100
[ 283.321920] ---[ end trace 0000000000000000 ]---
[ 283.323878] BUG: kernel NULL pointer dereference, address: 0000000000000059
[ 283.326033] #PF: supervisor read access in kernel mode
[ 283.327357] #PF: error_code(0x0000) - not-present page
[ 283.328698] PGD 800000000a8c5067 P4D 800000000a8c5067 PUD 12879067 PMD 0
[ 283.329491] Oops: Oops: 0000 [#1] SMP PTI
[ 283.329796] CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Tainted: G W 7.1.0-rc2-00076-g74fe02ce122a #26 PREEMPT(full)
[ 283.331183] Tainted: [W]=WARN
[ 283.331468] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-9.fc43 06/10/2025
[ 283.332346] RIP: 0010:pick_task_fair+0x2d/0xb0
[ 283.332735] Code: fa 8b 97 10 01 00 00 85 d2 0f 84 92 00 00 00 53 48 89 fb 48 83 ec 08 48 8d bb 00 01 00 00 eb 21 be 01 00 00 00 e8 13 8b ff ff <80> 78 59 00 75 3d 48 85 c0 74 48 48 8b b8 b8 00 00 00 48 85 ff 74
[ 283.334364] RSP: 0018:ffffc900000b7e20 EFLAGS: 00010086
[ 283.334992] RAX: 0000000000000000 RBX: ffff88813a5ec800 RCX: 041dd83271100000
[ 283.335731] RDX: 0000000000000000 RSI: 0000000000200000 RDI: ffff888116cb2200
[ 283.336404] RBP: ffffc900000b7f20 R08: 041dd83271100000 R09: 0000000000200000
[ 283.337852] R10: 00000005252d41b2 R11: 0000000000000001 R12: 0000000000000002
[ 283.338802] R13: ffff888025c18000 R14: 0000000000000003 R15: ffffffff84160800
[ 283.339827] FS: 0000000000000000(0000) GS:ffff8881b628c000(0000) knlGS:0000000000000000
[ 283.341158] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 283.342502] CR2: 0000000000000059 CR3: 000000001f322000 CR4: 00000000000006f0
[ 283.345455] Call Trace:
[ 283.347033] <TASK>
[ 283.348350] __schedule+0xc65/0x28d0
[ 283.349703] ? tick_nohz_idle_exit+0x66/0x160
[ 283.350882] ? do_idle+0x17c/0x2b0
[ 283.351454] schedule_idle+0x1d/0x40
[ 283.352017] cpu_startup_entry+0x24/0x30
[ 283.352594] start_secondary+0xf8/0x100
[ 283.353272] common_startup_64+0x13e/0x148
[ 283.353840] </TASK>

Tested with the strace test suite, hackbench and stress-ng on s390 and x86.

v1 -> v2:
- Added a fix to prevent proxy execution of lock owners with a
  mismatched core cookie

[1] https://lore.kernel.org/all/c00-01.ttedd70@ub.hpns/
[2] https://lore.kernel.org/all/10282ce9-f4ae-498f-9b57-f4e1e61fffbc@amd.com/

Vasily Gorbik (2):
  sched/core: Don't steal a proxy-exec donor
  sched/core: Don't proxy-exec unmatched cookie lock owners

 kernel/sched/core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--
2.53.0