From: Vasily Gorbik <gor@linux.ibm.com>
To: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
John Stultz <jstultz@google.com>,
Vineeth Pillai <vineethrp@google.com>,
Joel Fernandes <joelagnelf@nvidia.com>,
Heiko Carstens <hca@linux.ibm.com>,
linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 0/2] sched/core: Fix proxy-exec/core-sched interactions
Date: Thu, 7 May 2026 12:41:53 +0200
Message-ID: <c00-02.v2.ttenwd4@ub.hpns>
v1 [1] consisted of a fix for a scheduler corruption where
try_steal_cookie() could migrate a proxy-exec donor away from the source
rq while that rq still used it as the active scheduling context.
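A minimal user-space sketch of the invariant that patch 1 is meant to protect may help here. It is not the patch itself. It assumes a hypothetical, simplified struct toy_rq whose curr, donor and core_pick members are named after the run-queue fields, and a hypothetical helper toy_can_steal() that refuses to pull any task the source rq still uses in one of those roles. The other checks that the real try_steal_cookie() performs are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified task: only the core-scheduling cookie. */
struct toy_task {
	unsigned long cookie;
};

/*
 * Hypothetical, heavily simplified run-queue. With proxy execution the
 * task providing the scheduling context (donor) may differ from the task
 * actually running (curr), e.g. a mutex owner running on behalf of a
 * blocked donor.
 */
struct toy_rq {
	struct toy_task *curr;      /* execution context */
	struct toy_task *donor;     /* scheduling context */
	struct toy_task *core_pick; /* task chosen by core-wide selection */
};

/*
 * Sketch of the candidate filter: a cookie-matching task may be pulled
 * from @src only if @src is not using it in any role. Checking curr and
 * core_pick alone is not enough once donor != curr.
 */
static bool toy_can_steal(const struct toy_rq *src, const struct toy_task *p,
			  unsigned long cookie)
{
	assert(src != NULL && p != NULL);

	if (p->cookie != cookie)
		return false;
	if (p == src->curr || p == src->core_pick)
		return false;
	/* The extra condition in this sketch: never migrate the active donor. */
	if (p == src->donor)
		return false;
	return true;
}
```

With src.curr set to a mutex owner running on behalf of src.donor, only a third task with the matching cookie remains a steal candidate; without the donor check, the donor itself would have qualified.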
Prateek pointed out [2] a separate proxy-exec/core-sched issue: after
pick_next_task() selects a donor compatible with the core cookie,
find_proxy_task() can replace the execution context with a mutex owner
that has a different cookie.
This v2 keeps the donor-steal fix as patch 1 and adds patch 2, which
rejects a final proxy owner with a mismatched cookie.
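The patch 2 condition can be pictured with a similar toy model, again a sketch under stated assumptions rather than the patch. Each hypothetical struct toy_task records the owner of the mutex it is blocked on (a simplification of how proxy execution tracks blocking), the hypothetical toy_final_owner() follows that chain to the task that would actually execute, and the hypothetical toy_proxy_owner_ok() accepts that task only if its cookie equals the core-wide cookie the donor was picked under. What the scheduler does after a rejection is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical task: its core-scheduling cookie and, if it is blocked on
 * a mutex, a direct pointer to that mutex's owner (NULL if runnable).
 */
struct toy_task {
	unsigned long cookie;
	struct toy_task *blocked_on_owner;
};

#define TOY_MAX_CHAIN 16 /* arbitrary hop limit for the sketch */

/*
 * Follow the blocked-on chain from @donor to the task that would run,
 * giving up after TOY_MAX_CHAIN hops.
 */
static struct toy_task *toy_final_owner(struct toy_task *donor)
{
	struct toy_task *p = donor;
	int depth = 0;

	assert(donor != NULL);
	while (p->blocked_on_owner && depth++ < TOY_MAX_CHAIN)
		p = p->blocked_on_owner;
	return p;
}

/*
 * Sketch of the patch 2 idea: the donor passed the core-cookie check at
 * pick time, but the task that would actually execute must pass it too.
 */
static bool toy_proxy_owner_ok(struct toy_task *donor, unsigned long core_cookie)
{
	struct toy_task *owner = toy_final_owner(donor);

	return owner->cookie == core_cookie;
}
```

For example, if a donor with cookie 1 is blocked on an owner with cookie 1, which is itself blocked on an owner with cookie 2, the sketch rejects the chain even though both the donor and the intermediate owner match.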
v1 reported the issue as reproduced on an s390 LPAR, but it appears to be
easily reproducible with the strace test suite ("make -j$(nproc) check") on
any SMT system with CONFIG_SCHED_CORE=y and CONFIG_SCHED_PROXY_EXEC=y
enabled, e.g. an x86 KVM guest with -smp cpus=16,sockets=1,cores=8,threads=2:
[ 283.181298] WARNING: kernel/sched/fair.c:5788 at put_prev_entity+0x4f/0x90, CPU#2: unshare-report-/27895
[ 283.185230] Modules linked in:
[ 283.186480] CPU: 2 UID: 0 PID: 27895 Comm: unshare-report- Not tainted 7.1.0-rc2-00076-g74fe02ce122a #26 PREEMPT(full)
[ 283.190699] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-9.fc43 06/10/2025
[ 283.194482] RIP: 0010:put_prev_entity+0x4f/0x90
[ 283.196591] Code: fd ff ff 80 7b 58 00 74 e0 66 90 48 89 de 48 89 ef e8 85 a9 ff ff 31 d2 48 89 de 48 89 ef e8 d8 d6 ff ff 48 39 5d 58 74 c6 90 <0f> 0b 90 48 c7 45 58 00 00 00 00 5b 5d e9 7f cb 31 01 48 83 bb b8
[ 283.205157] RSP: 0018:ffffc90009177af0 EFLAGS: 00010006
[ 283.207443] RAX: 0000000000000000 RBX: ffff888102de8080 RCX: 000000000004f800
[ 283.210442] RDX: 0000000000000000 RSI: 0000000000027c00 RDI: 00000041dd7d5860
[ 283.213528] RBP: ffff888116cb2200 R08: ffff888116fe8080 R09: 0000000000000002
[ 283.216766] R10: 0000000005bf08d6 R11: 00000000000002b7 R12: ffff8881192da4a0
[ 283.219872] R13: ffff88813a3ec801 R14: 0000000000000001 R15: ffff88813a3ec800
[ 283.222777] FS: 00007f6b5ca21780(0000) GS:ffff8881b628c000(0000) knlGS:0000000000000000
[ 283.226171] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 283.228493] CR2: 000000001319e358 CR3: 000000001f322000 CR4: 00000000000006f0
[ 283.231951] Call Trace:
[ 283.233137] <TASK>
[ 283.234066] put_prev_task_fair+0x1d/0x40
[ 283.235943] __schedule+0x1165/0x28d0
[ 283.237599] ? __resched_curr+0x372/0x3a0
[ 283.239413] ? detach_task+0xc1/0xd0
[ 283.241015] ? lockdep_hardirqs_on_prepare+0xd7/0x190
[ 283.243170] ? trace_hardirqs_on+0x18/0x100
[ 283.244852] preempt_schedule+0x2e/0x50
[ 283.246707] preempt_schedule_thunk+0x16/0x30
[ 283.248680] ? _raw_spin_unlock_irqrestore+0x3f/0x50
[ 283.251012] __mutex_unlock_slowpath+0x2d9/0x3d0
[ 283.253196] pcpu_alloc_noprof+0x3e6/0xbd0
[ 283.255187] alloc_vfsmnt+0xd7/0x1e0
[ 283.256651] clone_mnt+0x1e/0x280
[ 283.258061] copy_tree+0x127/0x420
[ 283.259449] copy_mnt_ns+0x13f/0x520
[ 283.260926] create_new_namespaces+0x54/0x2e0
[ 283.262974] unshare_nsproxy_namespaces+0x7e/0xb0
[ 283.265317] ksys_unshare+0x196/0x550
[ 283.267097] __x64_sys_unshare+0xd/0x20
[ 283.268876] do_syscall_64+0xf3/0x6a0
[ 283.270611] ? exc_page_fault+0xfa/0x240
[ 283.272329] ? __irq_exit_rcu+0x3c/0x100
[ 283.274006] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 283.276085] RIP: 0033:0x7f6b5cb1730d
[ 283.277509] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 5a 0f 00 f7 d8 64 89 01 48
[ 283.285484] RSP: 002b:00007ffef4e305d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
[ 283.288741] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f6b5cb1730d
[ 283.291711] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000020000
[ 283.294664] RBP: 0000000000000000 R08: 0000000000000000 R09: 3230345b3a737475
[ 283.298149] R10: 000000000000eefe R11: 0000000000000246 R12: 00000000000000f0
[ 283.301268] R13: 0000000000000001 R14: 00007f6b5cd53000 R15: 0000000000404df0
[ 283.304570] </TASK>
[ 283.305583] irq event stamp: 2018
[ 283.307085] hardirqs last enabled at (2017): [<ffffffff8269777f>] _raw_spin_unlock_irqrestore+0x3f/0x50
[ 283.311026] hardirqs last disabled at (2018): [<ffffffff82689aff>] __schedule+0x13df/0x28d0
[ 283.314726] softirqs last enabled at (2008): [<ffffffff81324f40>] __irq_exit_rcu+0xc0/0x100
[ 283.318427] softirqs last disabled at (2001): [<ffffffff81324f40>] __irq_exit_rcu+0xc0/0x100
[ 283.321920] ---[ end trace 0000000000000000 ]---
[ 283.323878] BUG: kernel NULL pointer dereference, address: 0000000000000059
[ 283.326033] #PF: supervisor read access in kernel mode
[ 283.327357] #PF: error_code(0x0000) - not-present page
[ 283.328698] PGD 800000000a8c5067 P4D 800000000a8c5067 PUD 12879067 PMD 0
[ 283.329491] Oops: Oops: 0000 [#1] SMP PTI
[ 283.329796] CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Tainted: G W 7.1.0-rc2-00076-g74fe02ce122a #26 PREEMPT(full)
[ 283.331183] Tainted: [W]=WARN
[ 283.331468] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-9.fc43 06/10/2025
[ 283.332346] RIP: 0010:pick_task_fair+0x2d/0xb0
[ 283.332735] Code: fa 8b 97 10 01 00 00 85 d2 0f 84 92 00 00 00 53 48 89 fb 48 83 ec 08 48 8d bb 00 01 00 00 eb 21 be 01 00 00 00 e8 13 8b ff ff <80> 78 59 00 75 3d 48 85 c0 74 48 48 8b b8 b8 00 00 00 48 85 ff 74
[ 283.334364] RSP: 0018:ffffc900000b7e20 EFLAGS: 00010086
[ 283.334992] RAX: 0000000000000000 RBX: ffff88813a5ec800 RCX: 041dd83271100000
[ 283.335731] RDX: 0000000000000000 RSI: 0000000000200000 RDI: ffff888116cb2200
[ 283.336404] RBP: ffffc900000b7f20 R08: 041dd83271100000 R09: 0000000000200000
[ 283.337852] R10: 00000005252d41b2 R11: 0000000000000001 R12: 0000000000000002
[ 283.338802] R13: ffff888025c18000 R14: 0000000000000003 R15: ffffffff84160800
[ 283.339827] FS: 0000000000000000(0000) GS:ffff8881b628c000(0000) knlGS:0000000000000000
[ 283.341158] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 283.342502] CR2: 0000000000000059 CR3: 000000001f322000 CR4: 00000000000006f0
[ 283.345455] Call Trace:
[ 283.347033] <TASK>
[ 283.348350] __schedule+0xc65/0x28d0
[ 283.349703] ? tick_nohz_idle_exit+0x66/0x160
[ 283.350882] ? do_idle+0x17c/0x2b0
[ 283.351454] schedule_idle+0x1d/0x40
[ 283.352017] cpu_startup_entry+0x24/0x30
[ 283.352594] start_secondary+0xf8/0x100
[ 283.353272] common_startup_64+0x13e/0x148
[ 283.353840] </TASK>
Tested with the strace test suite, hackbench and stress-ng on s390 and x86.
v1->v2:
- added patch 2 to prevent proxy execution of lock owners with a mismatched cookie
[1] https://lore.kernel.org/all/c00-01.ttedd70@ub.hpns/
[2] https://lore.kernel.org/all/10282ce9-f4ae-498f-9b57-f4e1e61fffbc@amd.com/
Vasily Gorbik (2):
sched/core: Don't steal a proxy-exec donor
sched/core: Don't proxy-exec unmatched cookie lock owners
kernel/sched/core.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--
2.53.0