* [PATCH 0/1] sched/core: Don't steal a proxy-exec donor
@ 2026-05-04 12:31 Vasily Gorbik
2026-05-04 12:31 ` [PATCH 1/1] " Vasily Gorbik
0 siblings, 1 reply; 3+ messages in thread
From: Vasily Gorbik @ 2026-05-04 12:31 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, K Prateek Nayak, John Stultz,
Connor O'Brien, Vineeth Pillai, Joel Fernandes, linux-s390,
linux-kernel
Since the sched-core-2026-04-13 pull, s390 hits the following splats with
defconfig. Running the strace test suite ("make -j$(nproc) check") a couple
of times on an LPAR with 64 SMT-2 cores is usually enough to trigger this.
The first WARN is in put_prev_entity() on a strace task. The next pick on
the same CPU typically WARNs again, and 60s later the system is in an
rcu_sched stall:
[ 535.525203] WARNING: kernel/sched/fair.c:5788 at put_prev_entity+0xfe/0x170, CPU#26: grep/242219
[ 535.525212] Modules linked in: mptcp_diag xfrm_user xfrm_algo tcp_diag crypto_user inet_diag netlink_diag algif_hash af_alg dm_service_time nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables zfcp scsi_transport_fc s390_trng eadm_sch vfio_ccw mdev vfio_iommu_type1 vfio sch_fq_codel drm i2c_core loop dm_multipath drm_panel_orientation_quirks nfnetlink uvdevice diag288_wdt prng aes_s390 scsi_dh_rdac scsi_dh_emc scsi_dh_alua phmac_s390 paes_s390 crypto_engine pkey_cca pkey_ep11 zcrypt rng_core pkey_pckmo pkey autofs4 ecdsa_generic ecc sha512 [last unloaded: trace_printk]
[ 535.525268] CPU: 26 UID: 1001 PID: 242219 Comm: grep Not tainted 7.1.0-20260426.rc0.git0.897d54018cc9.300.fc43.s390x+git #1 PREEMPTLAZY
[ 535.525272] Hardware name: IBM 8561 T01 703 (LPAR)
[ 535.525273] Krnl PSW : 0404e00180000000 0000033840f08482 (put_prev_entity+0x102/0x170)
[ 535.525279] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 535.525282] Krnl GPRS: 000000000000b8ee 0000000000000000 0000000000000000 0000000000000800
[ 535.525284] 0000000000000000 0000007cafc9c400 0000033840f084f0 00000003e5512b68
[ 535.525287] 00000003e5512b70 0000007cafc9c516 00000000dcd88400 00000001d3b8c900
[ 535.525289] 0000000000000000 00000001d3b8c800 0000033840f08474 000002b84839fbe0
[ 535.525297] Krnl Code: 0000033840f08474: e3b0a0580020 cg %r11,88(%r10)
0000033840f0847a: a784ffa5 brc 8,0000033840f083c4
*0000033840f0847e: af000000 mc 0,0
>0000033840f08482: e548a0580000 mvghi 88(%r10),0
0000033840f08488: eb9ff0a00004 lmg %r9,%r15,160(%r15)
0000033840f0848e: 07fe bcr 15,%r14
0000033840f08490: 47000700 bc 0,1792
0000033840f08494: c0e5ffffeb86 brasl %r14,0000033840f05ba0
[ 535.525367] Call Trace:
[ 535.525372] [<0000033840f08482>] put_prev_entity+0x102/0x170
[ 535.525377] ([<0000033840f08474>] put_prev_entity+0xf4/0x170)
[ 535.525381] [<0000033840f0852a>] put_prev_task_fair+0x3a/0x60
[ 535.525385] [<0000033840ef6e58>] pick_next_task+0x138/0xbd0
[ 535.525388] [<0000033841d364d0>] __schedule+0x180/0x850
[ 535.525391] [<0000033841d36bdc>] schedule+0x3c/0xc0
[ 535.525394] [<0000033841d32d10>] irqentry_exit+0x1c0/0x610
[ 535.525397] [<0000033841d32846>] do_ext_irq+0xe6/0x290
[ 535.525399] [<0000033841d41576>] ext_int_handler+0xc6/0xf0
[ 535.525402] Last Breaking-Event-Address:
[ 535.525403] [<0000033840eff36a>] propagate_entity_load_avg+0x3a/0x490
[ 535.525407] ---[ end trace 0000000000000000 ]---
[ 535.525422] ------------[ cut here ]------------
[ 535.525424] WARNING: kernel/sched/fair.c:7022 at hrtick_start_fair+0x6c/0x80, CPU#26: swapper/26/0
[ 535.525428] Modules linked in: mptcp_diag xfrm_user xfrm_algo tcp_diag crypto_user inet_diag netlink_diag algif_hash af_alg dm_service_time nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables zfcp scsi_transport_fc s390_trng eadm_sch vfio_ccw mdev vfio_iommu_type1 vfio sch_fq_codel drm i2c_core loop dm_multipath drm_panel_orientation_quirks nfnetlink uvdevice diag288_wdt prng aes_s390 scsi_dh_rdac scsi_dh_emc scsi_dh_alua phmac_s390 paes_s390 crypto_engine pkey_cca pkey_ep11 zcrypt rng_core pkey_pckmo pkey autofs4 ecdsa_generic ecc sha512 [last unloaded: trace_printk]
[ 535.525465] CPU: 26 UID: 0 PID: 0 Comm: swapper/26 Tainted: G W 7.1.0-20260426.rc0.git0.897d54018cc9.300.fc43.s390x+git #1 PREEMPTLAZY
[ 535.525469] Tainted: [W]=WARN
[ 535.525471] Hardware name: IBM 8561 T01 703 (LPAR)
[ 535.525472] Krnl PSW : 0404e00180000000 0000033840ef9ac0 (hrtick_start_fair+0x70/0x80)
[ 535.525476] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 535.525479] Krnl GPRS: 0000007c00000001 00000003e5533800 00000003e5511800 00000001d3b8c800
[ 535.525482] 00000000000000d8 00000338429cd038 0000033840f19860 00000003e5511800
[ 535.525484] 00000001d3b8c800 000000000000003f 00000003e55127b0 00000001d3b8c948
[ 535.525486] 0000000000000000 0000000081992400 0000033840f0876a 000002b840e5bc48
[ 535.525492] Krnl Code: 0000033840ef9ab2: 47000700 bc 0,1792
0000033840ef9ab6: c0f4ffffd4dd brcl 15,0000033840ef4470
*0000033840ef9abc: af000000 mc 0,0
>0000033840ef9ac0: a7f4ffdc brc 15,0000033840ef9a78
0000033840ef9ac4: 0707 bcr 0,%r7
0000033840ef9ac6: 0707 bcr 0,%r7
0000033840ef9ac8: 0707 bcr 0,%r7
0000033840ef9aca: 0707 bcr 0,%r7
[ 535.525505] Call Trace:
[ 535.525507] [<0000033840ef9ac0>] hrtick_start_fair+0x70/0x80
[ 535.525509] ([<0000033840f086fe>] set_next_task_fair+0x4e/0x230)
[ 535.525513] [<0000033840ef6e76>] pick_next_task+0x156/0xbd0
[ 535.525515] [<0000033841d364d0>] __schedule+0x180/0x850
[ 535.525518] [<0000033841d36d16>] schedule_idle+0x36/0x60
[ 535.525520] [<0000033840f141ce>] do_idle+0x11e/0x160
[ 535.525523] [<0000033840f143e0>] cpu_startup_entry+0x40/0x50
[ 535.525526] [<0000033840e6f7c8>] smp_start_secondary+0x138/0x150
[ 535.525529] [<0000033841d40f72>] restart_int_handler+0x72/0x88
[ 535.525532] Last Breaking-Event-Address:
[ 535.525533] [<0000033840ef9a72>] hrtick_start_fair+0x22/0x80
[ 535.525536] ---[ end trace 0000000000000000 ]---
[ 535.525549] ------------[ cut here ]------------
[ 535.525550] WARNING: kernel/sched/sched.h:1769 at set_next_entity+0x35a/0x370, CPU#26: swapper/26/0
[ 535.525555] Modules linked in: mptcp_diag xfrm_user xfrm_algo tcp_diag crypto_user inet_diag netlink_diag algif_hash af_alg dm_service_time nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables zfcp scsi_transport_fc s390_trng eadm_sch vfio_ccw mdev vfio_iommu_type1 vfio sch_fq_codel drm i2c_core loop dm_multipath drm_panel_orientation_quirks nfnetlink uvdevice diag288_wdt prng aes_s390 scsi_dh_rdac scsi_dh_emc scsi_dh_alua phmac_s390 paes_s390 crypto_engine pkey_cca pkey_ep11 zcrypt rng_core pkey_pckmo pkey autofs4 ecdsa_generic ecc sha512 [last unloaded: trace_printk]
[ 535.525588] CPU: 26 UID: 0 PID: 0 Comm: swapper/26 Tainted: G W 7.1.0-20260426.rc0.git0.897d54018cc9.300.fc43.s390x+git #1 PREEMPTLAZY
[ 535.525591] Tainted: [W]=WARN
[ 535.525593] Hardware name: IBM 8561 T01 703 (LPAR)
[ 535.525594] Krnl PSW : 0404d00180000000 0000033840f0395e (set_next_entity+0x35e/0x370)
[ 535.525598] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
[ 535.525601] Krnl GPRS: 0000007caeee0479 07cafcf56a100000 0000000000000000 0000000000000000
[ 535.525604] 00000003e5445800 0000000000e15d28 0000033840f19860 00000003e5511800
[ 535.525606] 0000000000000001 0000000000000001 000000026ad0f400 00000001d3b8c900
[ 535.525608] 0000000000000000 0000000081992400 0000033840f036f6 000002b840e5bc00
[ 535.525613] Krnl Code: 0000033840f03952: b9e93090 sgrk %r9,%r0,%r3
0000033840f03956: a7f4fee7 brc 15,0000033840f03724
*0000033840f0395a: af000000 mc 0,0
>0000033840f0395e: a7f4fedb brc 15,0000033840f03714
0000033840f03962: 0707 bcr 0,%r7
0000033840f03964: 0707 bcr 0,%r7
0000033840f03966: 0707 bcr 0,%r7
0000033840f03968: 0707 bcr 0,%r7
[ 535.525626] Call Trace:
[ 535.525627] [<0000033840f0395e>] set_next_entity+0x35e/0x370
[ 535.525630] ([<0000033840f036f6>] set_next_entity+0xf6/0x370)
[ 535.525633] [<0000033840f086fe>] set_next_task_fair+0x4e/0x230
[ 535.525636] [<0000033840ef6e76>] pick_next_task+0x156/0xbd0
[ 535.525639] [<0000033841d364d0>] __schedule+0x180/0x850
[ 535.525642] [<0000033841d36d16>] schedule_idle+0x36/0x60
[ 535.525645] [<0000033840f141ce>] do_idle+0x11e/0x160
[ 535.525647] [<0000033840f143e0>] cpu_startup_entry+0x40/0x50
[ 535.525650] [<0000033840e6f7c8>] smp_start_secondary+0x138/0x150
[ 535.525652] [<0000033841d40f72>] restart_int_handler+0x72/0x88
[ 535.525655] Last Breaking-Event-Address:
[ 535.525656] [<0000033840f03710>] set_next_entity+0x110/0x370
[ 535.525659] ---[ end trace 0000000000000000 ]---
[ 535.528235] ------------[ cut here ]------------
[ 535.528241] WARNING: kernel/sched/fair.c:5721 at set_next_entity+0x2c8/0x370, CPU#20: swapper/20/0
[ 535.528248] Modules linked in: mptcp_diag xfrm_user xfrm_algo tcp_diag crypto_user inet_diag netlink_diag algif_hash af_alg dm_service_time nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables zfcp scsi_transport_fc s390_trng eadm_sch vfio_ccw mdev vfio_iommu_type1 vfio sch_fq_codel drm i2c_core loop dm_multipath drm_panel_orientation_quirks nfnetlink uvdevice diag288_wdt prng aes_s390 scsi_dh_rdac scsi_dh_emc scsi_dh_alua phmac_s390 paes_s390 crypto_engine pkey_cca pkey_ep11 zcrypt rng_core pkey_pckmo pkey autofs4 ecdsa_generic ecc sha512 [last unloaded: trace_printk]
[ 535.528291] CPU: 20 UID: 0 PID: 0 Comm: swapper/20 Tainted: G W 7.1.0-20260426.rc0.git0.897d54018cc9.300.fc43.s390x+git #1 PREEMPTLAZY
[ 535.528295] Tainted: [W]=WARN
[ 535.528296] Hardware name: IBM 8561 T01 703 (LPAR)
[ 535.528297] Krnl PSW : 0404e00180000000 0000033840f038cc (set_next_entity+0x2cc/0x370)
[ 535.528302] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 535.528305] Krnl GPRS: 0000007caff854b2 00000001d3b8c900 0000000082cec900 00000003e5445800
[ 535.528308] 000000000000002e 00000000000f423f 0000033840f19860 00000003e5445800
[ 535.528310] 00000001d3b8c900 000000804f18b648 000000026ad0f400 00000001d3b8c900
[ 535.528312] 0000000000000000 000000008198c800 0000033840f03820 000002b840e2bc00
[ 535.528319] Krnl Code: 0000033840f038be: e310a0580002 ltg %r1,88(%r10)
0000033840f038c4: a784fece brc 8,0000033840f03660
*0000033840f038c8: af000000 mc 0,0
>0000033840f038cc: a7f4feca brc 15,0000033840f03660
0000033840f038d0: e310b4000002 ltg %r1,1024(%r11)
0000033840f038d6: a784ff09 brc 8,0000033840f036e8
0000033840f038da: 4140b400 la %r4,1024(%r11)
0000033840f038de: e330bf00ff71 lay %r3,-256(%r11)
[ 535.528334] Call Trace:
[ 535.528336] [<0000033840f038cc>] set_next_entity+0x2cc/0x370
[ 535.528339] ([<0000033840f037ac>] set_next_entity+0x1ac/0x370)
[ 535.528342] [<0000033840f086fe>] set_next_task_fair+0x4e/0x230
[ 535.528345] [<0000033840ef6e76>] pick_next_task+0x156/0xbd0
[ 535.528348] [<0000033841d364d0>] __schedule+0x180/0x850
[ 535.528351] [<0000033841d36d16>] schedule_idle+0x36/0x60
[ 535.528354] [<0000033840f141ce>] do_idle+0x11e/0x160
[ 535.528356] [<0000033840f143e0>] cpu_startup_entry+0x40/0x50
[ 535.528359] [<0000033840e6f7c8>] smp_start_secondary+0x138/0x150
[ 535.528362] [<0000033841d40f72>] restart_int_handler+0x72/0x88
[ 535.528365] Last Breaking-Event-Address:
[ 535.528366] [<0000033840f0365c>] set_next_entity+0x5c/0x370
[ 535.528369] ---[ end trace 0000000000000000 ]---
[ 595.527130] rcu: INFO: rcu_sched self-detected stall on CPU
[ 595.527142] rcu: 7-...!: (47526 ticks this GP) idle=8f04/1/0x4000000000000000 softirq=56080/56080 fqs=0
[ 595.527175] rcu: (t=60000 jiffies g=232693 q=6273 ncpus=32)
[ 595.527177] rcu: rcu_sched kthread timer wakeup didn't happen for 59996 jiffies! g232693 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[ 595.527180] rcu: Possible timer handling issue on cpu=28 timer-softirq=1957
[ 595.527182] rcu: rcu_sched kthread starved for 60000 jiffies! g232693 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=28
[ 595.527184] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 595.527186] rcu: RCU grace-period kthread stack dump:
Bisection pointed at commit e0ca8991b2de6 ("sched: Make class_schedulers
avoid pushing current, and get rid of proxy_tag_curr()"), part of the
sched-core-2026-04-13 pull. But that commit only seems to make the
corruption easier to trigger; reverting it doesn't help. Applying "Proxy
Execution fixes for v7.1-rc" [1] on top didn't help either, nor did
"Optimized Donor Migration for Proxy Execution" [2].
The problem goes away when sched_proxy_exec=0 or nosmt is used.
Adding some debug code in deactivate_task() showed try_steal_cookie()
calling it on src->donor right before the warning. try_steal_cookie()
skips src->core_pick and src->curr but not src->donor.
[1] https://lore.kernel.org/all/20260427183848.698551-1-jstultz@google.com/
[2] https://lore.kernel.org/all/20260422230659.903191-1-jstultz@google.com/
The following patch resolves the issue in my tests. Please consider it
if it makes sense.
Vasily Gorbik (1):
sched/core: Don't steal a proxy-exec donor
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
2.53.0
* [PATCH 1/1] sched/core: Don't steal a proxy-exec donor
2026-05-04 12:31 [PATCH 0/1] sched/core: Don't steal a proxy-exec donor Vasily Gorbik
@ 2026-05-04 12:31 ` Vasily Gorbik
2026-05-04 13:19 ` K Prateek Nayak
0 siblings, 1 reply; 3+ messages in thread
From: Vasily Gorbik @ 2026-05-04 12:31 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, K Prateek Nayak, John Stultz,
Connor O'Brien, Vineeth Pillai, Joel Fernandes, linux-s390,
linux-kernel
try_steal_cookie() avoids stealing src->core_pick and src->curr before
moving a task with the same cookie via move_queued_task_locked().
With proxy-exec, src->donor is the current scheduling context and may
differ from src->curr. Stealing it migrates a task that the source rq
still treats as current. For CFS, src cfs_rq->curr is left pointing
at the stolen entity and the next pick on src hits the WARN_ON_ONCE
in put_prev_entity().
Commit 7de9d4f94638 ("sched: Start blocked_on chain processing in
find_proxy_task()") tweaked the fair class logic so that the donor task
isn't migrated away while running the proxy. Do it similarly for
try_steal_cookie() and skip src->donor as well.
Fixes: 7de9d4f94638 ("sched: Start blocked_on chain processing in find_proxy_task()")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
---
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b8871449d3c6..3cf5fb70814c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6358,7 +6358,7 @@ static bool try_steal_cookie(int this, int that)
return false;
do {
- if (p == src->core_pick || p == src->curr)
+ if (p == src->core_pick || p == src->curr || p == src->donor)
goto next;
if (!is_cpu_allowed(p, this))
--
2.53.0
* Re: [PATCH 1/1] sched/core: Don't steal a proxy-exec donor
2026-05-04 12:31 ` [PATCH 1/1] " Vasily Gorbik
@ 2026-05-04 13:19 ` K Prateek Nayak
0 siblings, 0 replies; 3+ messages in thread
From: K Prateek Nayak @ 2026-05-04 13:19 UTC (permalink / raw)
To: Vasily Gorbik, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, John Stultz, Connor O'Brien,
Vineeth Pillai, Joel Fernandes, linux-s390, linux-kernel
Hello Vasily,
On 5/4/2026 6:01 PM, Vasily Gorbik wrote:
> try_steal_cookie() avoids stealing src->core_pick and src->curr before
> moving a task with the same cookie via move_queued_task_locked().
>
> With proxy-exec, src->donor is the current scheduling context and may
> differ from src->curr. Stealing it migrates a task that the source rq
> still treats as current. For CFS, src cfs_rq->curr is left pointing
> at the stolen entity and the next pick on src hits the WARN_ON_ONCE
> in put_prev_entity().
>
> Commit 7de9d4f94638 ("sched: Start blocked_on chain processing in
> find_proxy_task()") tweaked the fair class logic so that the donor task
> isn't migrated away while running the proxy. Do it similarly for
> try_steal_cookie() and skip src->donor as well.
>
> Fixes: 7de9d4f94638 ("sched: Start blocked_on chain processing in find_proxy_task()")
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> ---
> kernel/sched/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b8871449d3c6..3cf5fb70814c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6358,7 +6358,7 @@ static bool try_steal_cookie(int this, int that)
> return false;
>
> do {
> - if (p == src->core_pick || p == src->curr)
> + if (p == src->core_pick || p == src->curr || p == src->donor)
Although this solves the issue of stealing the donor, I'm a bit
skeptical whether proxy exec works with core scheduling at all, since
__schedule() can override the core_pick decision and the CPU may end
up running a task with a different core cookie if it finds the
core_pick to be blocked on a mutex :-(
> goto next;
>
> if (!is_cpu_allowed(p, this))
--
Thanks and Regards,
Prateek