public inbox for linux-kernel@vger.kernel.org
* [PATCHSET sched_ext/for-6.12-fixes] sched_ext: Fix locking enable/disable path bugs including locking order one
@ 2024-09-23 18:59 Tejun Heo
From: Tejun Heo @ 2024-09-23 18:59 UTC (permalink / raw)
  To: void; +Cc: kernel-team, linux-kernel, sched-ext, aboorvad

Aboorva Devarajan reported an issue where sched_ext init code can
occasionally deadlock when scheduler loading races CPU hotplug. The deadlock
scenario is as follows:

       scx_ops_enable()                               hotplug

                                          percpu_down_write(&cpu_hotplug_lock)
   percpu_down_write(&scx_fork_rwsem)
   block on cpu_hotplug_lock
                                          kthread_create() waits for kthreadd
                                          kthreadd blocks on scx_fork_rwsem

Note that this doesn't trigger lockdep because the hotplug side dependency
bounces through kthreadd.
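
The scenario above can be sketched as a small wait-for graph. This is only a simplified user-space model, not kernel code: the enum names, the `waits_for` table and `has_cycle()` are hypothetical, and "lock-only" is an assumed stand-in for the kind of dependency a lock-order checker records. It shows that the cycle only closes when the kthread_create() wait is counted as an edge.

```c
#include <assert.h>
#include <stdbool.h>

/* The three parties in the scenario (names are illustrative). */
enum party { ENABLE, HOTPLUG, KTHREADD, NR_PARTIES };

/* WAIT_LOCK: blocked on a lock held by the other party.
 * WAIT_OTHER: waiting for the other party to do some work. */
enum wait_kind { WAIT_NONE, WAIT_LOCK, WAIT_OTHER };

/* waits_for[a][b] != WAIT_NONE means "a is waiting on b". */
static const enum wait_kind waits_for[NR_PARTIES][NR_PARTIES] = {
	[ENABLE][HOTPLUG]   = WAIT_LOCK,	/* blocks on cpu_hotplug_lock */
	[HOTPLUG][KTHREADD] = WAIT_OTHER,	/* kthread_create() waits for kthreadd */
	[KTHREADD][ENABLE]  = WAIT_LOCK,	/* blocks on scx_fork_rwsem */
};

/*
 * Follow the chain of waits starting at @start and report whether it
 * leads back to @start. Each party has at most one outgoing edge in
 * this model, so a simple walk is enough. With @lock_only set,
 * WAIT_OTHER edges are ignored.
 */
static bool has_cycle(enum party start, bool lock_only)
{
	enum party cur = start;

	assert(start < NR_PARTIES);

	for (int step = 0; step < NR_PARTIES; step++) {
		int next = -1;

		for (int to = 0; to < NR_PARTIES; to++) {
			enum wait_kind kind = waits_for[cur][to];

			if (kind == WAIT_NONE)
				continue;
			if (lock_only && kind != WAIT_LOCK)
				continue;
			next = to;
			break;
		}

		if (next < 0)
			return false;	/* chain ends, nobody is stuck */
		if (next == (int)start)
			return true;	/* came back around: deadlock */
		cur = (enum party)next;
	}
	return false;
}
```

In this model, `has_cycle(ENABLE, false)` returns true while `has_cycle(ENABLE, true)` returns false, which mirrors why a checker that only follows lock dependencies would miss the problem.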

This is primarily caused by SCX enable/disable paths grabbing big locks
together. This patchset updates the enable/disable paths to decouple the
locks. In the process, it also fixes several subtle bugs in the enable path.
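
As a rough illustration of what decoupling means here, the sketch below contrasts holding two locks together with taking them one after another. It is a toy user-space model under stated assumptions: `struct toy_lock`, `fork_lock`, `hotplug_lock` and both `enable_*()` functions are hypothetical and do not correspond to the actual changes in kernel/sched/ext.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a big kernel lock; not a real synchronization primitive. */
struct toy_lock {
	const char *name;
	bool held;
};

static int nr_held;	/* locks currently held */
static int max_held;	/* most locks ever held at once */

static void toy_lock_acquire(struct toy_lock *lock)
{
	assert(!lock->held);
	lock->held = true;
	if (++nr_held > max_held)
		max_held = nr_held;
}

static void toy_lock_release(struct toy_lock *lock)
{
	assert(lock->held);
	lock->held = false;
	nr_held--;
}

static struct toy_lock fork_lock = { "fork_lock", false };
static struct toy_lock hotplug_lock = { "hotplug_lock", false };

/* Coupled: the second lock is taken while the first is still held. */
static int enable_coupled(void)
{
	max_held = 0;
	toy_lock_acquire(&fork_lock);
	toy_lock_acquire(&hotplug_lock);
	/* ... both steps run under both locks ... */
	toy_lock_release(&hotplug_lock);
	toy_lock_release(&fork_lock);
	return max_held;
}

/* Decoupled: each step takes only the lock it needs, one at a time. */
static int enable_decoupled(void)
{
	max_held = 0;
	toy_lock_acquire(&fork_lock);
	/* ... step that needs fork_lock ... */
	toy_lock_release(&fork_lock);

	toy_lock_acquire(&hotplug_lock);
	/* ... step that needs hotplug_lock ... */
	toy_lock_release(&hotplug_lock);
	return max_held;
}
```

In the decoupled variant no lock is ever waited on while another is held, so the caller cannot contribute a lock-to-lock edge to a wait cycle. The price is that state may change between the two critical sections, which the real code has to account for.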

This patchset contains the following patches:

 0001-sched_ext-Relocate-check_hotplug_seq-call-in-scx_ops.patch
 0002-sched_ext-Remove-SCX_OPS_PREPPING.patch
 0003-sched_ext-Initialize-in-bypass-mode.patch
 0004-sched_ext-Fix-SCX_TASK_INIT-SCX_TASK_READY-transitio.patch
 0005-sched_ext-Enable-scx_ops_init_task-separately.patch
 0006-sched_ext-Add-scx_cgroup_enabled-to-gate-cgroup-oper.patch
 0007-sched_ext-Decouple-locks-in-scx_ops_disable_workfn.patch
 0008-sched_ext-Decouple-locks-in-scx_ops_enable.patch

 0001-0002 are prep patches.

 0003 removes a race window in the enable path that can cause stalls and
 prepares for further locking updates.

 0004-0005 remove race windows in the enable path that can cause invalid task
 state transitions.

 0006 fixes a bug in the cgroup enable path which can skip invocation of
 ops.cgroup_exit() and prepares for further locking updates.

 0007-0008 decouple the big locks and fix the deadlock.

This patchset can also be found in the following git branch:

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-enable-locking-fix

diffstat follows. Thanks.

 kernel/sched/ext.c |  199 ++++++++++++++++++++++++++++++++------------------------------------
 1 file changed, 94 insertions(+), 105 deletions(-)

--
tejun

Thread overview: 10+ messages
2024-09-23 18:59 [PATCHSET sched_ext/for-6.12-fixes] sched_ext: Fix locking enable/disable path bugs including locking order one Tejun Heo
2024-09-23 18:59 ` [PATCH 1/8] sched_ext: Relocate check_hotplug_seq() call in scx_ops_enable() Tejun Heo
2024-09-23 18:59 ` [PATCH 2/8] sched_ext: Remove SCX_OPS_PREPPING Tejun Heo
2024-09-23 18:59 ` [PATCH 3/8] sched_ext: Initialize in bypass mode Tejun Heo
2024-09-23 18:59 ` [PATCH 4/8] sched_ext: Fix SCX_TASK_INIT -> SCX_TASK_READY transitions in scx_ops_enable() Tejun Heo
2024-09-23 18:59 ` [PATCH 5/8] sched_ext: Enable scx_ops_init_task() separately Tejun Heo
2024-09-23 18:59 ` [PATCH 6/8] sched_ext: Add scx_cgroup_enabled to gate cgroup operations and fix scx_tg_online() Tejun Heo
2024-09-23 18:59 ` [PATCH 7/8] sched_ext: Decouple locks in scx_ops_disable_workfn() Tejun Heo
2024-09-23 18:59 ` [PATCH 8/8] sched_ext: Decouple locks in scx_ops_enable() Tejun Heo
2024-09-27 20:03 ` [PATCHSET sched_ext/for-6.12-fixes] sched_ext: Fix locking enable/disable path bugs including locking order one Tejun Heo
