All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: void@manifault.com
Cc: kernel-team@meta.com, linux-kernel@vger.kernel.org,
	sched-ext@meta.com, aboorvad@linux.ibm.com,
	Tejun Heo <tj@kernel.org>
Subject: [PATCH 7/8] sched_ext: Decouple locks in scx_ops_disable_workfn()
Date: Mon, 23 Sep 2024 08:59:34 -1000	[thread overview]
Message-ID: <20240923190020.1446325-8-tj@kernel.org> (raw)
In-Reply-To: <20240923190020.1446325-1-tj@kernel.org>

The disable path uses three big locks - scx_fork_rwsem, scx_cgroup_rwsem and
cpus_read_lock. Currently, the locks are grabbed together which is prone to
locking order problems. With the preceding scx_cgroup_enabled change, we can
decouple them:

- As cgroup disabling no longer requires modifying a static_key which
  requires cpus_read_lock(), no need to grab cpus_read_lock() before
  grabbing scx_cgroup_rwsem.

- cgroup can now be independently disabled before tasks are moved back to
  the fair class.

Relocate scx_cgroup_exit() invocation before scx_fork_rwsem is grabbed, drop
now unnecessary cpus_read_lock() and move static_key operations out of
scx_fork_rwsem. This decouples all three locks in the disable path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Link: http://lkml.kernel.org/r/8cd0ec0c4c7c1bc0119e61fbef0bee9d5e24022d.camel@linux.ibm.com
---
 kernel/sched/ext.c | 37 +++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 06dc1011312d..18f95072866f 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4439,21 +4439,23 @@ static void scx_ops_disable_workfn(struct kthread_work *work)
 	WRITE_ONCE(scx_switching_all, false);
 
 	/*
-	 * Avoid racing against fork and cgroup changes. See scx_ops_enable()
-	 * for explanation on the locking order.
+	 * Shut down cgroup support before tasks so that the cgroup attach path
+	 * doesn't race against scx_ops_exit_task().
 	 */
-	percpu_down_write(&scx_fork_rwsem);
-	cpus_read_lock();
 	scx_cgroup_lock();
+	scx_cgroup_exit();
+	scx_cgroup_unlock();
 
-	scx_ops_init_task_enabled = false;
-
-	spin_lock_irq(&scx_tasks_lock);
-	scx_task_iter_init(&sti);
 	/*
 	 * The BPF scheduler is going away. All tasks including %TASK_DEAD ones
 	 * must be switched out and exited synchronously.
 	 */
+	percpu_down_write(&scx_fork_rwsem);
+
+	scx_ops_init_task_enabled = false;
+
+	spin_lock_irq(&scx_tasks_lock);
+	scx_task_iter_init(&sti);
 	while ((p = scx_task_iter_next_locked(&sti))) {
 		const struct sched_class *old_class = p->sched_class;
 		struct sched_enq_and_set_ctx ctx;
@@ -4471,23 +4473,18 @@ static void scx_ops_disable_workfn(struct kthread_work *work)
 	}
 	scx_task_iter_exit(&sti);
 	spin_unlock_irq(&scx_tasks_lock);
+	percpu_up_write(&scx_fork_rwsem);
 
 	/* no task is on scx, turn off all the switches and flush in-progress calls */
-	static_branch_disable_cpuslocked(&__scx_ops_enabled);
+	static_branch_disable(&__scx_ops_enabled);
 	for (i = SCX_OPI_BEGIN; i < SCX_OPI_END; i++)
-		static_branch_disable_cpuslocked(&scx_has_op[i]);
-	static_branch_disable_cpuslocked(&scx_ops_enq_last);
-	static_branch_disable_cpuslocked(&scx_ops_enq_exiting);
-	static_branch_disable_cpuslocked(&scx_ops_cpu_preempt);
-	static_branch_disable_cpuslocked(&scx_builtin_idle_enabled);
+		static_branch_disable(&scx_has_op[i]);
+	static_branch_disable(&scx_ops_enq_last);
+	static_branch_disable(&scx_ops_enq_exiting);
+	static_branch_disable(&scx_ops_cpu_preempt);
+	static_branch_disable(&scx_builtin_idle_enabled);
 	synchronize_rcu();
 
-	scx_cgroup_exit();
-
-	scx_cgroup_unlock();
-	cpus_read_unlock();
-	percpu_up_write(&scx_fork_rwsem);
-
 	if (ei->kind >= SCX_EXIT_ERROR) {
 		pr_err("sched_ext: BPF scheduler \"%s\" disabled (%s)\n",
 		       scx_ops.name, ei->reason);
-- 
2.46.0


  parent reply	other threads:[~2024-09-23 19:00 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-23 18:59 [PATCHSET sched_ext/for-6.12-fixes] sched_ext: Fix locking enable/disable path bugs includling locking order one Tejun Heo
2024-09-23 18:59 ` [PATCH 1/8] sched_ext: Relocate check_hotplug_seq() call in scx_ops_enable() Tejun Heo
2024-09-23 18:59 ` [PATCH 2/8] sched_ext: Remove SCX_OPS_PREPPING Tejun Heo
2024-09-23 18:59 ` [PATCH 3/8] sched_ext: Initialize in bypass mode Tejun Heo
2024-09-23 18:59 ` [PATCH 4/8] sched_ext: Fix SCX_TASK_INIT -> SCX_TASK_READY transitions in scx_ops_enable() Tejun Heo
2024-09-23 18:59 ` [PATCH 5/8] sched_ext: Enable scx_ops_init_task() separately Tejun Heo
2024-09-23 18:59 ` [PATCH 6/8] sched_ext: Add scx_cgroup_enabled to gate cgroup operations and fix scx_tg_online() Tejun Heo
2024-09-23 18:59 ` Tejun Heo [this message]
2024-09-23 18:59 ` [PATCH 8/8] sched_ext: Decouple locks in scx_ops_enable() Tejun Heo
2024-09-27 20:03 ` [PATCHSET sched_ext/for-6.12-fixes] sched_ext: Fix locking enable/disable path bugs includling locking order one Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240923190020.1446325-8-tj@kernel.org \
    --to=tj@kernel.org \
    --cc=aboorvad@linux.ibm.com \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sched-ext@meta.com \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.