From: Christian Loehle <christian.loehle@arm.com>
To: Peter Zijlstra <peterz@infradead.org>, tj@kernel.org
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com,
void@manifault.com, arighi@nvidia.com, changwoo@igalia.com,
cgroups@vger.kernel.org, sched-ext@lists.linux.dev,
liuwenfang@honor.com, tglx@linutronix.de
Subject: Re: [PATCH 00/14] sched: Support shared runqueue locking
Date: Thu, 18 Sep 2025 16:15:45 +0100 [thread overview]
Message-ID: <02811bd7-b401-4e16-bb7d-4edeb0b89ffd@arm.com> (raw)
In-Reply-To: <20250910154409.446470175@infradead.org>
On 9/10/25 16:44, Peter Zijlstra wrote:
> Hi,
>
> As mentioned [1], a fair amount of sched ext weirdness (current and proposed)
> is down to the core code not quite working right for shared runqueue stuff.
>
> Instead of endlessly hacking around that, bite the bullet and fix it all up.
>
> With these patches, it should be possible to clean up pick_task_scx() to not
> rely on balance_scx(). Additionally it should be possible to fix that RT issue,
> and the dl_server issue without further propagating lock breaks.
>
> As is, these patches boot and run/pass selftests/sched_ext with lockdep on.
>
> I meant to do more sched_ext cleanups, but since this has all already taken
> longer than I would've liked (real life interrupted :/), I figured I should
> post this as is and let TJ/Andrea poke at it.
>
> Patches are also available at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/cleanup
>
>
> [1] https://lkml.kernel.org/r/20250904202858.GN4068168@noisy.programming.kicks-ass.net
>
>
> ---
> include/linux/cleanup.h | 5 +
> include/linux/sched.h | 6 +-
> kernel/cgroup/cpuset.c | 2 +-
> kernel/kthread.c | 15 +-
> kernel/sched/core.c | 370 +++++++++++++++++++++--------------------------
> kernel/sched/deadline.c | 26 ++--
> kernel/sched/ext.c | 104 +++++++------
> kernel/sched/fair.c | 23 ++-
> kernel/sched/idle.c | 14 +-
> kernel/sched/rt.c | 13 +-
> kernel/sched/sched.h | 225 ++++++++++++++++++++--------
> kernel/sched/stats.h | 2 +-
> kernel/sched/stop_task.c | 14 +-
> kernel/sched/syscalls.c | 80 ++++------
> 14 files changed, 495 insertions(+), 404 deletions(-)
>
>
Hi Peter, A couple of issues popped up when testing this [1] (that don't trigger on [2]):
When booting (arm64 orion o6) I get:
[ 1.298020] sched: DL replenish lagged too much
[ 1.298364] ------------[ cut here ]------------
[ 1.298377] WARNING: CPU: 4 PID: 0 at kernel/sched/deadline.c:239 inactive_task_timer+0x3d0/0x474
[ 1.298413] Modules linked in:
[ 1.298436] CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Tainted: G S 6.17.0-rc4-cix-build+ #56 PREEMPT
[ 1.298455] Tainted: [S]=CPU_OUT_OF_SPEC
[ 1.298463] Hardware name: Radxa Computer (Shenzhen) Co., Ltd. Radxa Orion O6/Radxa Orion O6, BIOS 0.3.0-1 2025-04-28T03:35:34+00:00
[ 1.298473] pstate: 034000c9 (nzcv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 1.298486] pc : inactive_task_timer+0x3d0/0x474
[ 1.298505] lr : inactive_task_timer+0x394/0x474
[ 1.298522] sp : ffff800083d4be00
[ 1.298530] x29: ffff800083d4be20 x28: ffff00008362d888 x27: ffff800082ab1f88
[ 1.298561] x26: ffff800082ab4a98 x25: ffff0001fef50c18 x24: 0000000000019999
[ 1.298589] x23: 000000000000cccc x22: ffff0001fef51708 x21: ffff00008362d640
[ 1.298616] x20: ffff0001fef50c00 x19: ffff00008362d7f0 x18: fffffffffff0b580
[ 1.298642] x17: ffff80017c966000 x16: ffff800083d48000 x15: 0000000000000028
[ 1.298669] x14: 0000000000000000 x13: 00000000000c4000 x12: 00000000000000c5
[ 1.298695] x11: 0000000000004bb8 x10: 0000000000004bb8 x9 : 0000000000000000
[ 1.298722] x8 : 0000000000000000 x7 : 0000000000000011 x6 : ffff0001fef51bc0
[ 1.298747] x5 : ffff0001fef50c00 x4 : 00000000000000cc x3 : 0000000000000000
[ 1.298773] x2 : ffff80017c966000 x1 : 0000000000000000 x0 : ffffffffffff3333
[ 1.298800] Call trace:
[ 1.298808] inactive_task_timer+0x3d0/0x474 (P)
[ 1.298830] __hrtimer_run_queues+0x3c4/0x440
[ 1.298852] hrtimer_interrupt+0xe4/0x244
[ 1.298871] arch_timer_handler_phys+0x2c/0x44
[ 1.298893] handle_percpu_devid_irq+0xa8/0x1f0
[ 1.298916] handle_irq_desc+0x40/0x58
[ 1.298933] generic_handle_domain_irq+0x1c/0x28
[ 1.298950] gic_handle_irq+0x4c/0x11c
[ 1.298965] call_on_irq_stack+0x30/0x48
[ 1.298982] do_interrupt_handler+0x80/0x84
[ 1.299001] el1_interrupt+0x34/0x64
[ 1.299022] el1h_64_irq_handler+0x18/0x24
[ 1.299037] el1h_64_irq+0x6c/0x70
[ 1.299052] finish_task_switch.isra.0+0xac/0x2bc (P)
[ 1.299070] __schedule+0x45c/0xffc
[ 1.299088] schedule_idle+0x28/0x48
[ 1.299104] do_idle+0x184/0x27c
[ 1.299121] cpu_startup_entry+0x34/0x3c
[ 1.299137] secondary_start_kernel+0x134/0x154
[ 1.299158] __secondary_switched+0xc0/0xc4
[ 1.299179] irq event stamp: 1634
[ 1.299189] hardirqs last enabled at (1633): [<ffff800081486354>] el1_interrupt+0x54/0x64
[ 1.299210] hardirqs last disabled at (1634): [<ffff800081486324>] el1_interrupt+0x24/0x64
[ 1.299229] softirqs last enabled at (1614): [<ffff8000800bf7b0>] handle_softirqs+0x4a0/0x4b8
[ 1.299248] softirqs last disabled at (1609): [<ffff800080010600>] __do_softirq+0x14/0x20
[ 1.299262] ---[ end trace 0000000000000000 ]---
and when running actual tests (e.g. iterating through all scx schedulers under load):
[ 146.532691] ================================
[ 146.536947] WARNING: inconsistent lock state
[ 146.541204] 6.17.0-rc4-cix-build+ #58 Tainted: G S W
[ 146.547457] --------------------------------
[ 146.551713] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[ 146.557705] rcu_tasks_trace/79 [HC0[0]:SC0[0]:HE0:SE1] takes:
[ 146.563438] ffff000089c90e58 (&dsq->lock){?.-.}-{2:2}, at: __task_rq_lock+0x88/0x194
[ 146.571179] {IN-HARDIRQ-W} state was registered at:
[ 146.576042] lock_acquire+0x1c8/0x338
[ 146.579788] _raw_spin_lock+0x48/0x60
[ 146.583536] dispatch_enqueue+0x130/0x3e8
[ 146.587632] do_enqueue_task+0x2f0/0x464
[ 146.591629] enqueue_task_scx+0x1b0/0x290
[ 146.595712] enqueue_task+0x84/0x18c
[ 146.599360] ttwu_do_activate+0x84/0x25c
[ 146.603361] try_to_wake_up+0x310/0x5f8
[ 146.607271] wake_up_process+0x18/0x24
[ 146.611094] kick_pool+0x9c/0x17c
[ 146.614483] __queue_work+0x544/0x7a8
[ 146.618223] __queue_delayed_work+0x118/0x15c
[ 146.622653] mod_delayed_work_on+0xcc/0xe0
[ 146.626823] kblockd_mod_delayed_work_on+0x20/0x30
[ 146.631696] blk_mq_kick_requeue_list+0x1c/0x28
[ 146.636307] blk_flush_complete_seq+0xd4/0x2a4
[ 146.640824] flush_end_io+0x1e0/0x3f4
[ 146.644559] blk_mq_end_request+0x60/0x154
[ 146.648733] nvme_end_req+0x30/0x78
[ 146.652306] nvme_complete_rq+0x7c/0x218
[ 146.656302] nvme_pci_complete_rq+0x98/0x110
[ 146.660650] nvme_poll_cq+0x1cc/0x3b4
[ 146.664385] nvme_irq+0x34/0x88
[ 146.667600] __handle_irq_event_percpu+0x88/0x304
[ 146.672384] handle_irq_event+0x4c/0xa8
[ 146.676293] handle_fasteoi_irq+0x108/0x20c
[ 146.680555] handle_irq_desc+0x40/0x58
[ 146.684378] generic_handle_domain_irq+0x1c/0x28
[ 146.689068] gic_handle_irq+0x4c/0x11c
[ 146.692891] call_on_irq_stack+0x30/0x48
[ 146.696891] do_interrupt_handler+0x80/0x84
[ 146.701151] el1_interrupt+0x34/0x64
[ 146.704810] el1h_64_irq_handler+0x18/0x24
[ 146.708979] el1h_64_irq+0x6c/0x70
[ 146.712453] cpuidle_enter_state+0x12c/0x53c
[ 146.716796] cpuidle_enter+0x38/0x50
[ 146.720458] do_idle+0x204/0x27c
[ 146.723759] cpu_startup_entry+0x38/0x3c
[ 146.727755] secondary_start_kernel+0x134/0x154
[ 146.732370] __secondary_switched+0xc0/0xc4
[ 146.736638] irq event stamp: 1754
[ 146.739938] hardirqs last enabled at (1753): [<ffff800081497184>] _raw_spin_unlock_irqrestore+0x6c/0x70
[ 146.749405] hardirqs last disabled at (1754): [<ffff8000814965e4>] _raw_spin_lock_irqsave+0x84/0x88
[ 146.758437] softirqs last enabled at (1664): [<ffff800080195598>] rcu_tasks_invoke_cbs+0x100/0x394
[ 146.767476] softirqs last disabled at (1660): [<ffff800080195598>] rcu_tasks_invoke_cbs+0x100/0x394
[ 146.776506]
[ 146.776506] other info that might help us debug this:
[ 146.783019] Possible unsafe locking scenario:
[ 146.783019]
[ 146.788923] CPU0
[ 146.791356] ----
[ 146.793788] lock(&dsq->lock);
[ 146.796915] <Interrupt>
[ 146.799521] lock(&dsq->lock);
[ 146.802821]
[ 146.802821] *** DEADLOCK ***
[ 146.802821]
[ 146.808725] 3 locks held by rcu_tasks_trace/79:
[ 146.813242] #0: ffff800082e6e650 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{4:4}, at: rcu_tasks_one_gp+0x328/0x570
[ 146.823403] #1: ffff800082adc1f0 (cpu_hotplug_lock){++++}-{0:0}, at: cpus_read_lock+0x10/0x1c
[ 146.832014] #2: ffff000089c90e58 (&dsq->lock){?.-.}-{2:2}, at: __task_rq_lock+0x88/0x194
[ 146.840178]
[ 146.813242] #0: ffff800082e6e650 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{4:4}, at: rcu_tasks_one_gp+0x328/0x570
[ 146.823403] #1: ffff800082adc1f0 (cpu_hotplug_lock){++++}-{0:0}, at: cpus_read_lock+0x10/0x1c
[ 146.832014] #2: ffff000089c90e58 (&dsq->lock){?.-.}-{2:2}, at: __task_rq_lock+0x88/0x194
[ 146.840178]
[ 146.840178] stack backtrace:
[ 146.844521] CPU: 10 UID: 0 PID: 79 Comm: rcu_tasks_trace Tainted: G S W 6.17.0-rc4-cix-build+ #58 PREEMPT
[ 146.855463] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
[ 146.860240] Hardware name: Radxa Computer (Shenzhen) Co., Ltd. Radxa Orion O6/Radxa Orion O6, BIOS 0.3.0-1 2025-04-28T03:35:34+00:00
[ 146.872136] Sched_ext: simple (enabled+all), task: runnable_at=-4ms
[ 146.872138] Call trace:
[ 146.880822] show_stack+0x18/0x24 (C)
[ 146.884471] dump_stack_lvl+0x90/0xd0
[ 146.888131] dump_stack+0x18/0x24
[ 146.891432] print_usage_bug.part.0+0x29c/0x364
[ 146.895950] mark_lock+0x778/0x978
[ 146.899338] mark_held_locks+0x58/0x90
[ 146.903074] lockdep_hardirqs_on_prepare+0x100/0x210
[ 146.908025] trace_hardirqs_on+0x5c/0x1cc
[ 146.912025] _raw_spin_unlock_irqrestore+0x6c/0x70
[ 146.916803] task_call_func+0x110/0x164
[ 146.920625] trc_wait_for_one_reader.part.0+0x5c/0x3b8
[ 146.925750] check_all_holdout_tasks_trace+0x124/0x480
[ 146.930874] rcu_tasks_wait_gp+0x1f0/0x3b4
[ 146.934957] rcu_tasks_one_gp+0x4a4/0x570
[ 146.938953] rcu_tasks_kthread+0xd4/0xe0
[ 146.942862] kthread+0x148/0x208
[ 146.946079] ret_from_fork+0x10/0x20
(This actually locks up the system without any further print FWIW).
I'll keep testing and start debugging now, but if I can help you with anything immediately, please
do shout.
[1]
This referring to sched/cleanup at time of writing:
e127838bf8f9 sched: Cleanup NOCLOCK
ce024feefe1c sched/ext: Implement p->srq_lock support
6ef342071dd7 sched: Add {DE,EN}QUEUE_LOCKED
ed738ce6f9fb sched: Add shared runqueue locking to __task_rq_lock()
94f197f28834 sched: Add flags to {put_prev,set_next}_task() methods
254d43c94105 sched: Add locking comments to sched_class methods
f8864b505a17 sched: Make __do_set_cpus_allowed() use the sched_change pattern
d0e9cfb835d3 sched: Rename do_set_cpus_allowed()
cfcabf45249d sched: Fix do_set_cpus_allowed() locking
f7b9b39041fb sched: Fix migrate_disable_switch() locking
91128b33456a sched: Move sched_class::prio_changed() into the change pattern
c59dc6ce071b sched: Cleanup sched_delayed handling for class switches
13ea43940095 sched: Fold sched_class::switch{ing,ed}_{to,from}() into the change pattern
f0b336327a1b sched: Re-arrange the {EN,DE}QUEUE flags
b55442cb4ec1 sched: Employ sched_change guards
[2]
5b726e9bf954 sched/fair: Get rid of throttled_lb_pair()
next prev parent reply other threads:[~2025-09-18 15:15 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-10 15:44 [PATCH 00/14] sched: Support shared runqueue locking Peter Zijlstra
2025-09-10 15:44 ` [PATCH 01/14] sched: Employ sched_change guards Peter Zijlstra
2025-09-11 9:06 ` K Prateek Nayak
2025-09-11 9:55 ` Peter Zijlstra
2025-09-11 10:10 ` Peter Zijlstra
2025-09-11 10:37 ` K Prateek Nayak
2025-10-06 15:21 ` Shrikanth Hegde
2025-10-06 18:14 ` Peter Zijlstra
2025-10-07 5:12 ` Shrikanth Hegde
2025-10-07 9:34 ` Peter Zijlstra
2025-10-16 9:33 ` [tip: sched/core] sched: Mandate shared flags for sched_change tip-bot2 for Peter Zijlstra
2025-09-10 15:44 ` [PATCH 02/14] sched: Re-arrange the {EN,DE}QUEUE flags Peter Zijlstra
2025-09-10 15:44 ` [PATCH 03/14] sched: Fold sched_class::switch{ing,ed}_{to,from}() into the change pattern Peter Zijlstra
2025-09-10 15:44 ` [PATCH 04/14] sched: Cleanup sched_delayed handling for class switches Peter Zijlstra
2025-09-10 15:44 ` [PATCH 05/14] sched: Move sched_class::prio_changed() into the change pattern Peter Zijlstra
2025-09-11 1:44 ` Tejun Heo
2025-09-10 15:44 ` [PATCH 06/14] sched: Fix migrate_disable_switch() locking Peter Zijlstra
2025-09-10 15:44 ` [PATCH 07/14] sched: Fix do_set_cpus_allowed() locking Peter Zijlstra
2025-10-30 0:12 ` Mark Brown
2025-10-30 9:07 ` Peter Zijlstra
2025-10-30 12:47 ` Mark Brown
2025-09-10 15:44 ` [PATCH 08/14] sched: Rename do_set_cpus_allowed() Peter Zijlstra
2025-09-10 15:44 ` [PATCH 09/14] sched: Make __do_set_cpus_allowed() use the sched_change pattern Peter Zijlstra
2025-09-10 15:44 ` [PATCH 10/14] sched: Add locking comments to sched_class methods Peter Zijlstra
2025-09-10 15:44 ` [PATCH 11/14] sched: Add flags to {put_prev,set_next}_task() methods Peter Zijlstra
2025-09-10 15:44 ` [PATCH 12/14] sched: Add shared runqueue locking to __task_rq_lock() Peter Zijlstra
2025-09-12 0:19 ` Tejun Heo
2025-09-12 11:54 ` Peter Zijlstra
2025-09-12 14:11 ` Peter Zijlstra
2025-09-12 17:56 ` Tejun Heo
2025-09-15 8:38 ` Peter Zijlstra
2025-09-16 22:29 ` Tejun Heo
2025-09-16 22:41 ` Tejun Heo
2025-09-25 8:35 ` Peter Zijlstra
2025-09-25 21:43 ` Tejun Heo
2025-09-26 9:59 ` Peter Zijlstra
2025-09-26 16:48 ` Tejun Heo
2025-09-26 10:36 ` Peter Zijlstra
2025-09-26 21:39 ` Tejun Heo
2025-09-29 10:06 ` Peter Zijlstra
2025-09-30 23:49 ` Tejun Heo
2025-10-01 11:54 ` Peter Zijlstra
2025-10-02 23:32 ` Tejun Heo
2025-09-10 15:44 ` [PATCH 13/14] sched: Add {DE,EN}QUEUE_LOCKED Peter Zijlstra
2025-09-11 2:01 ` Tejun Heo
2025-09-11 9:42 ` Peter Zijlstra
2025-09-11 20:40 ` Tejun Heo
2025-09-12 14:19 ` Peter Zijlstra
2025-09-12 16:32 ` Tejun Heo
2025-09-13 22:32 ` Tejun Heo
2025-09-15 8:48 ` Peter Zijlstra
2025-09-25 13:10 ` Peter Zijlstra
2025-09-25 15:40 ` Tejun Heo
2025-09-25 15:53 ` Peter Zijlstra
2025-09-25 18:44 ` Tejun Heo
2025-09-10 15:44 ` [PATCH 14/14] sched/ext: Implement p->srq_lock support Peter Zijlstra
2025-09-10 16:07 ` Peter Zijlstra
2025-09-10 17:32 ` [PATCH 00/14] sched: Support shared runqueue locking Andrea Righi
2025-09-10 18:19 ` Peter Zijlstra
2025-09-10 18:35 ` Peter Zijlstra
2025-09-10 19:00 ` Andrea Righi
2025-09-11 9:58 ` Peter Zijlstra
2025-09-11 14:51 ` Andrea Righi
2025-09-11 14:00 ` Peter Zijlstra
2025-09-11 14:30 ` Peter Zijlstra
2025-09-11 14:48 ` Andrea Righi
2025-09-18 15:15 ` Christian Loehle [this message]
2025-09-25 9:00 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=02811bd7-b401-4e16-bb7d-4edeb0b89ffd@arm.com \
--to=christian.loehle@arm.com \
--cc=arighi@nvidia.com \
--cc=bsegall@google.com \
--cc=cgroups@vger.kernel.org \
--cc=changwoo@igalia.com \
--cc=dietmar.eggemann@arm.com \
--cc=hannes@cmpxchg.org \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=liuwenfang@honor.com \
--cc=longman@redhat.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=mkoutny@suse.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=sched-ext@lists.linux.dev \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=vincent.guittot@linaro.org \
--cc=void@manifault.com \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.