From: Waiman Long <longman@redhat.com>
To: "Chen Ridong" <chenridong@huaweicloud.com>,
"Tejun Heo" <tj@kernel.org>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Ingo Molnar" <mingo@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Juri Lelli" <juri.lelli@redhat.com>,
"Vincent Guittot" <vincent.guittot@linaro.org>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Ben Segall" <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
"Valentin Schneider" <vschneid@redhat.com>,
"Frederic Weisbecker" <frederic@kernel.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Shuah Khan" <shuah@kernel.org>
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, Waiman Long <longman@redhat.com>
Subject: [PATCH v6 3/8] cgroup/cpuset: Clarify exclusion rules for cpuset internal variables
Date: Sat, 21 Feb 2026 13:54:13 -0500 [thread overview]
Message-ID: <20260221185418.29319-4-longman@redhat.com> (raw)
In-Reply-To: <20260221185418.29319-1-longman@redhat.com>
Clarify the locking rules associated with file level internal variables
inside the cpuset code. There is no functional change.
Reviewed-by: Chen Ridong <chenridong@huaweicloud.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 105 ++++++++++++++++++++++++-----------------
1 file changed, 61 insertions(+), 44 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 58660e06d322..e8c0b3cfd1f9 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -61,6 +61,58 @@ static const char * const perr_strings[] = {
[PERR_REMOTE] = "Have remote partition underneath",
};
+/*
+ * CPUSET Locking Convention
+ * -------------------------
+ *
+ * Below are the three global locks guarding cpuset structures in lock
+ * acquisition order:
+ * - cpu_hotplug_lock (cpus_read_lock/cpus_write_lock)
+ * - cpuset_mutex
+ * - callback_lock (raw spinlock)
+ *
+ * A task must hold all the three locks to modify externally visible or
+ * used fields of cpusets, though some of the internally used cpuset fields
+ * and internal variables can be modified without holding callback_lock. If only
+ * reliable read access of the externally used fields are needed, a task can
+ * hold either cpuset_mutex or callback_lock which are exposed to other
+ * external subsystems.
+ *
+ * If a task holds cpu_hotplug_lock and cpuset_mutex, it blocks others,
+ * ensuring that it is the only task able to also acquire callback_lock and
+ * be able to modify cpusets. It can perform various checks on the cpuset
+ * structure first, knowing nothing will change. It can also allocate memory
+ * without holding callback_lock. While it is performing these checks, various
+ * callback routines can briefly acquire callback_lock to query cpusets. Once
+ * it is ready to make the changes, it takes callback_lock, blocking everyone
+ * else.
+ *
+ * Calls to the kernel memory allocator cannot be made while holding
+ * callback_lock which is a spinlock, as the memory allocator may sleep or
+ * call back into cpuset code and acquire callback_lock.
+ *
+ * Now, the task_struct fields mems_allowed and mempolicy may be changed
+ * by other task, we use alloc_lock in the task_struct fields to protect
+ * them.
+ *
+ * The cpuset_common_seq_show() handlers only hold callback_lock across
+ * small pieces of code, such as when reading out possibly multi-word
+ * cpumasks and nodemasks.
+ */
+
+static DEFINE_MUTEX(cpuset_mutex);
+
+/*
+ * File level internal variables below follow one of the following exclusion
+ * rules.
+ *
+ * RWCS: Read/write-able by holding either cpus_write_lock (and optionally
+ * cpuset_mutex) or both cpus_read_lock and cpuset_mutex.
+ *
+ * CSCB: Readable by holding either cpuset_mutex or callback_lock. Writable
+ * by holding both cpuset_mutex and callback_lock.
+ */
+
/*
* For local partitions, update to subpartitions_cpus & isolated_cpus is done
* in update_parent_effective_cpumask(). For remote partitions, it is done in
@@ -70,19 +122,18 @@ static const char * const perr_strings[] = {
* Exclusive CPUs distributed out to local or remote sub-partitions of
* top_cpuset
*/
-static cpumask_var_t subpartitions_cpus;
+static cpumask_var_t subpartitions_cpus; /* RWCS */
/*
- * Exclusive CPUs in isolated partitions
+ * Exclusive CPUs in isolated partitions (shown in cpuset.cpus.isolated)
*/
-static cpumask_var_t isolated_cpus;
+static cpumask_var_t isolated_cpus; /* CSCB */
/*
- * isolated_cpus updating flag (protected by cpuset_mutex)
- * Set if isolated_cpus is going to be updated in the current
- * cpuset_mutex crtical section.
+ * Set if isolated_cpus is being updated in the current cpuset_mutex
+ * critical section.
*/
-static bool isolated_cpus_updating;
+static bool isolated_cpus_updating; /* RWCS */
/*
* A flag to force sched domain rebuild at the end of an operation.
@@ -98,7 +149,7 @@ static bool isolated_cpus_updating;
* Note that update_relax_domain_level() in cpuset-v1.c can still call
* rebuild_sched_domains_locked() directly without using this flag.
*/
-static bool force_sd_rebuild;
+static bool force_sd_rebuild; /* RWCS */
/*
* Partition root states:
@@ -218,42 +269,6 @@ struct cpuset top_cpuset = {
.partition_root_state = PRS_ROOT,
};
-/*
- * There are two global locks guarding cpuset structures - cpuset_mutex and
- * callback_lock. The cpuset code uses only cpuset_mutex. Other kernel
- * subsystems can use cpuset_lock()/cpuset_unlock() to prevent change to cpuset
- * structures. Note that cpuset_mutex needs to be a mutex as it is used in
- * paths that rely on priority inheritance (e.g. scheduler - on RT) for
- * correctness.
- *
- * A task must hold both locks to modify cpusets. If a task holds
- * cpuset_mutex, it blocks others, ensuring that it is the only task able to
- * also acquire callback_lock and be able to modify cpusets. It can perform
- * various checks on the cpuset structure first, knowing nothing will change.
- * It can also allocate memory while just holding cpuset_mutex. While it is
- * performing these checks, various callback routines can briefly acquire
- * callback_lock to query cpusets. Once it is ready to make the changes, it
- * takes callback_lock, blocking everyone else.
- *
- * Calls to the kernel memory allocator can not be made while holding
- * callback_lock, as that would risk double tripping on callback_lock
- * from one of the callbacks into the cpuset code from within
- * __alloc_pages().
- *
- * If a task is only holding callback_lock, then it has read-only
- * access to cpusets.
- *
- * Now, the task_struct fields mems_allowed and mempolicy may be changed
- * by other task, we use alloc_lock in the task_struct fields to protect
- * them.
- *
- * The cpuset_common_seq_show() handlers only hold callback_lock across
- * small pieces of code, such as when reading out possibly multi-word
- * cpumasks and nodemasks.
- */
-
-static DEFINE_MUTEX(cpuset_mutex);
-
/**
* cpuset_lock - Acquire the global cpuset mutex
*
@@ -1162,6 +1177,8 @@ static void reset_partition_data(struct cpuset *cs)
static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus)
{
WARN_ON_ONCE(old_prs == new_prs);
+ lockdep_assert_held(&callback_lock);
+ lockdep_assert_held(&cpuset_mutex);
if (new_prs == PRS_ISOLATED)
cpumask_or(isolated_cpus, isolated_cpus, xcpus);
else
--
2.53.0
next prev parent reply other threads:[~2026-02-21 18:55 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-21 18:54 [PATCH v6 0/8] cgroup/cpuset: Fix partition related locking issues Waiman Long
2026-02-21 18:54 ` [PATCH v6 1/8] cgroup/cpuset: Fix incorrect change to effective_xcpus in partition_xcpus_del() Waiman Long
2026-02-21 18:54 ` [PATCH v6 2/8] cgroup/cpuset: Fix incorrect use of cpuset_update_tasks_cpumask() in update_cpumasks_hier() Waiman Long
2026-02-21 18:54 ` Waiman Long [this message]
2026-02-26 15:00 ` [PATCH v6 3/8] cgroup/cpuset: Clarify exclusion rules for cpuset internal variables Frederic Weisbecker
2026-02-21 18:54 ` [PATCH v6 4/8] cgroup/cpuset: Set isolated_cpus_updating only if isolated_cpus is changed Waiman Long
2026-02-26 15:07 ` Frederic Weisbecker
2026-02-21 18:54 ` [PATCH v6 5/8] kselftest/cgroup: Simplify test_cpuset_prs.sh by removing "S+" command Waiman Long
2026-02-21 18:54 ` [PATCH v6 6/8] cgroup/cpuset: Move housekeeping_update()/rebuild_sched_domains() together Waiman Long
2026-02-26 15:51 ` Frederic Weisbecker
2026-02-21 18:54 ` [PATCH v6 7/8] cgroup/cpuset: Defer housekeeping_update() calls from CPU hotplug to workqueue Waiman Long
2026-02-26 16:06 ` Frederic Weisbecker
2026-03-03 16:00 ` Waiman Long
2026-03-03 22:48 ` Frederic Weisbecker
2026-03-04 4:05 ` Waiman Long
2026-03-02 11:49 ` Frederic Weisbecker
2026-03-03 15:18 ` Jon Hunter
2026-03-03 16:09 ` Waiman Long
2026-03-04 3:58 ` Waiman Long
2026-03-04 11:07 ` Jon Hunter
2026-03-04 18:11 ` Waiman Long
2026-02-21 18:54 ` [PATCH v6 8/8] cgroup/cpuset: Call housekeeping_update() without holding cpus_read_lock Waiman Long
2026-03-02 12:14 ` Frederic Weisbecker
2026-03-02 14:15 ` Waiman Long
2026-03-02 15:40 ` Waiman Long
2026-02-23 20:57 ` [PATCH v6 0/8] cgroup/cpuset: Fix partition related locking issues Tejun Heo
2026-02-23 21:11 ` Waiman Long
2026-02-24 7:51 ` Chen Ridong
2026-03-02 12:21 ` Frederic Weisbecker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260221185418.29319-4-longman@redhat.com \
--to=longman@redhat.com \
--cc=bsegall@google.com \
--cc=cgroups@vger.kernel.org \
--cc=chenridong@huaweicloud.com \
--cc=frederic@kernel.org \
--cc=hannes@cmpxchg.org \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=mkoutny@suse.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=shuah@kernel.org \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox