linux-doc.vger.kernel.org archive mirror
From: Waiman Long <longman@redhat.com>
To: "Tejun Heo" <tj@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Frederic Weisbecker" <frederic@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	"Neeraj Upadhyay" <neeraj.upadhyay@kernel.org>,
	"Joel Fernandes" <joelagnelf@nvidia.com>,
	"Josh Triplett" <josh@joshtriplett.org>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Uladzislau Rezki" <urezki@gmail.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Lai Jiangshan" <jiangshanlai@gmail.com>,
	Zqiang <qiang.zhang@linux.dev>,
	"Anna-Maria Behnsen" <anna-maria@linutronix.de>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Vincent Guittot" <vincent.guittot@linaro.org>,
	"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
	"Ben Segall" <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
	"Valentin Schneider" <vschneid@redhat.com>,
	"Shuah Khan" <shuah@kernel.org>
Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Phil Auld <pauld@redhat.com>,
	Costa Shulyupin <costa.shul@redhat.com>,
	Gabriele Monaco <gmonaco@redhat.com>,
	Cestmir Kalina <ckalina@redhat.com>,
	Waiman Long <longman@redhat.com>
Subject: [RFC PATCH 03/18] sched/isolation: Use RCU to delay successive housekeeping cpumask updates
Date: Fri,  8 Aug 2025 11:10:47 -0400
Message-ID: <20250808151053.19777-4-longman@redhat.com>
In-Reply-To: <20250808151053.19777-1-longman@redhat.com>

Even though there are two separate sets of housekeeping cpumasks, one
for access and one for update, the set being updated may still be in use
by callers of the housekeeping functions. Such a caller can end up using
an intermediate cpumask that is neither the old one nor the new one.

To reduce the chance of this, introduce a delay between successive
housekeeping cpumask updates. One simple way is to make use of the RCU
grace period delay. Callers of the housekeeping APIs can optionally
hold rcu_read_lock() to eliminate the chance of using intermediate
housekeeping cpumasks.

Signed-off-by: Waiman Long <longman@redhat.com>
---
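Note for readers: the intermediate-cpumask problem described above can
be reproduced in a small single-threaded userspace model. This is only
a sketch under stated assumptions, not code from this series: the names
model_mask, model_publish() and model_torn_read() are hypothetical, the
mask is just two words wide, and a "reader" is simulated by reading one
word before and one word after the updates. Each published mask has all
words equal, so a mismatch between the two words means the reader saw
an intermediate value.

```c
#include <assert.h>

#define MODEL_MASK_WORDS 2	/* hypothetical width, stands in for a multi-word cpumask */

struct model_mask {
	unsigned long bits[MODEL_MASK_WORDS];
};

static struct model_mask model_bufs[2];		/* two copies, as in the patch */
static unsigned int model_seq;			/* low bit selects the copy to write */
static const struct model_mask *model_live = &model_bufs[0];
static unsigned long model_next_fill = 1;	/* distinct value for every update */

/* Write the non-live copy, then publish it, like the seq_nrs/cpumask_ptrs flip. */
static void model_publish(void)
{
	int idx = ++model_seq & 1;
	unsigned long fill = model_next_fill++;

	for (int i = 0; i < MODEL_MASK_WORDS; i++)
		model_bufs[idx].bits[i] = fill;
	model_live = &model_bufs[idx];
}

/*
 * A reader fetches the live pointer and reads word 0, then nr_updates
 * updates run, then the reader reads word 1 through the same pointer.
 * Returns 1 if the two words disagree (an intermediate mask was seen).
 */
static int model_torn_read(int nr_updates)
{
	const struct model_mask *snap = model_live;
	unsigned long w0, w1;

	assert(nr_updates >= 0);

	w0 = snap->bits[0];
	for (int i = 0; i < nr_updates; i++)
		model_publish();
	w1 = snap->bits[1];

	return w0 != w1;
}
```

In this model a single update during the read is harmless because it
writes the other copy, while two back-to-back updates wrap around and
overwrite the copy the reader is still using. That second case is the
one the grace-period delay is meant to make unlikely.
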
 kernel/sched/isolation.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index ee396ae13719..f26708667754 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -23,6 +23,9 @@ EXPORT_SYMBOL_GPL(housekeeping_overridden);
  * The housekeeping cpumasks can now be dynamically updated at run time.
  * Two set of cpumasks are kept. One set can be used while the other set are
  * being updated concurrently.
+ *
+ * rcu_read_lock() can optionally be held by housekeeping API callers to
+ * ensure stability of the cpumasks.
  */
 static DEFINE_RAW_SPINLOCK(cpumask_lock);
 struct housekeeping {
@@ -34,6 +37,8 @@ struct housekeeping {
 
 static struct housekeeping housekeeping;
 static bool sched_tick_offload_inited;
+static struct rcu_head rcu_gp[HK_TYPE_MAX];
+static unsigned long update_flags;
 
 bool housekeeping_enabled(enum hk_type type)
 {
@@ -267,6 +272,18 @@ static int __init housekeeping_isolcpus_setup(char *str)
 }
 __setup("isolcpus=", housekeeping_isolcpus_setup);
 
+/*
+ * Bits in update_flags can only be turned on with cpumask_lock held and
+ * cleared by this RCU callback function.
+ */
+static void rcu_gp_end(struct rcu_head *rcu)
+{
+	int type = rcu - rcu_gp;
+
+	/* Atomically clear the corresponding flag bit */
+	clear_bit(type, &update_flags);
+}
+
 /**
  * housekeeping_exclude_cpumask - Update housekeeping cpumasks to exclude only the given cpumask
  * @cpumask:  new cpumask to be excluded from housekeeping cpumasks
@@ -306,8 +323,21 @@ int housekeeping_exclude_cpumask(struct cpumask *cpumask, unsigned long hk_flags
 	}
 #endif
 
+retry:
+	/*
+	 * If the RCU grace period for the previous update with conflicting
+	 * flag bits hasn't been completed yet, we have to wait for it.
+	 */
+	while (READ_ONCE(update_flags) & hk_flags)
+		synchronize_rcu();
+
 	raw_spin_lock(&cpumask_lock);
 
+	if (READ_ONCE(update_flags) & hk_flags) {
+		raw_spin_unlock(&cpumask_lock);
+		goto retry;
+	}
+
 	for_each_set_bit(type, &hk_flags, HK_TYPE_MAX) {
 		int idx = ++housekeeping.seq_nrs[type] & 1;
 		struct cpumask *dst_cpumask = housekeeping.cpumasks[type][idx];
@@ -320,8 +350,11 @@ int housekeeping_exclude_cpumask(struct cpumask *cpumask, unsigned long hk_flags
 			housekeeping.flags |= BIT(type);
 		}
 		WRITE_ONCE(housekeeping.cpumask_ptrs[type], dst_cpumask);
+		set_bit(type, &update_flags);
 	}
 	raw_spin_unlock(&cpumask_lock);
+	for_each_set_bit(type, &hk_flags, HK_TYPE_MAX)
+		call_rcu(&rcu_gp[type], rcu_gp_end);
 
 	if (!housekeeping.flags && static_key_enabled(&housekeeping_overridden))
 		static_key_disable(&housekeeping_overridden.key);
-- 
2.50.0
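
For readers following the update_flags logic in the patch above, here
is a userspace sketch of the gating protocol. It is a simplified model
under stated assumptions, not the kernel implementation: all model_*
names and MODEL_HK_TYPE_MAX are hypothetical, model_synchronize()
stands in for synchronize_rcu() and simply runs every pending callback,
model_gp_end() stands in for the call_rcu() callback, and there is no
locking because the model is single-threaded. The real function also
derives the new mask by excluding CPUs, whereas the model just stores a
value.

```c
#include <assert.h>

#define MODEL_HK_TYPE_MAX 4	/* hypothetical; the kernel's enum hk_type sets the real value */

static unsigned long model_masks[MODEL_HK_TYPE_MAX][2];
static unsigned int model_seq_nrs[MODEL_HK_TYPE_MAX];
static const unsigned long *model_mask_ptrs[MODEL_HK_TYPE_MAX];
static unsigned long model_update_flags;	/* bit set: grace period still pending */

/* Stand-in for the RCU callback: clear the pending bit for one type. */
static void model_gp_end(int type)
{
	model_update_flags &= ~(1UL << type);
}

/* Stand-in for synchronize_rcu(): assume every pending callback has run. */
static void model_synchronize(void)
{
	for (int type = 0; type < MODEL_HK_TYPE_MAX; type++)
		if (model_update_flags & (1UL << type))
			model_gp_end(type);
}

/*
 * Update the masks selected by hk_flags. Returns how many times the
 * updater had to wait because an earlier update of a conflicting type
 * had not yet passed its grace period.
 */
static int model_update(unsigned long hk_flags, unsigned long new_mask)
{
	int waits = 0;

	assert(!(hk_flags >> MODEL_HK_TYPE_MAX));

	while (model_update_flags & hk_flags) {
		model_synchronize();
		waits++;
	}

	for (int type = 0; type < MODEL_HK_TYPE_MAX; type++) {
		int idx;

		if (!(hk_flags & (1UL << type)))
			continue;
		idx = ++model_seq_nrs[type] & 1;	/* flip to the other copy */
		model_masks[type][idx] = new_mask;
		model_mask_ptrs[type] = &model_masks[type][idx];
		model_update_flags |= 1UL << type;	/* cleared by model_gp_end() */
	}
	return waits;
}
```

In this model, updates to different types do not delay each other, but
a second update to a type whose bit is still set has to wait once
before it may reuse the older copy.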


Thread overview: 21+ messages
2025-08-08 15:10 [RFC PATCH 00/18] cgroup/cpuset: Enable runtime modification of Waiman Long
2025-08-08 15:10 ` [RFC PATCH 01/18] sched/isolation: Enable runtime update of housekeeping cpumasks Waiman Long
2025-08-08 15:10 ` [RFC PATCH 02/18] sched/isolation: Call sched_tick_offload_init() when HK_FLAG_KERNEL_NOISE is first set Waiman Long
2025-08-08 15:10 ` Waiman Long [this message]
2025-08-08 15:10 ` [RFC PATCH 04/18] sched/isolation: Add a debugfs file to dump housekeeping cpumasks Waiman Long
2025-08-08 15:10 ` [RFC PATCH 05/18] cpu/hotplug: Add a new cpuhp_offline_cb() API Waiman Long
2025-08-08 15:10 ` [RFC PATCH 06/18] cgroup/cpuset: Introduce a new top level isolcpus_update_mutex Waiman Long
2025-08-08 15:10 ` [RFC PATCH 07/18] cgroup/cpuset: Allow overwriting HK_TYPE_DOMAIN housekeeping cpumask Waiman Long
2025-08-08 15:10 ` [RFC PATCH 08/18] cgroup/cpuset: Use CPU hotplug to enable runtime nohz_full modification Waiman Long
2025-08-08 15:10 ` [RFC PATCH 09/18] cgroup/cpuset: Revert "Include isolated cpuset CPUs in cpu_is_isolated() check" Waiman Long
2025-08-08 15:19 ` [RFC PATCH 10/18] sched/core: Ignore DL BW deactivation error if in cpuhp_offline_cb_mode Waiman Long
2025-08-08 15:19 ` [RFC PATCH 11/18] tick/nohz: Make nohz_full parameter optional Waiman Long
2025-08-08 15:19 ` [RFC PATCH 12/18] tick/nohz: Introduce tick_nohz_full_update_cpus() to update tick_nohz_full_mask Waiman Long
2025-08-08 15:19 ` [RFC PATCH 13/18] tick/nohz: Allow runtime changes in full dynticks CPUs Waiman Long
2025-08-08 15:19 ` [RFC PATCH 14/18] tick: Pass timer tick job to an online HK CPU in tick_cpu_dying() Waiman Long
2025-08-08 15:19 ` [RFC PATCH 15/18] cgroup/cpuset: Enable RCU NO-CB CPU offloading of newly isolated CPUs Waiman Long
2025-08-08 15:19 ` [RFC PATCH 16/18] cgroup/cpuset: Don't set have_boot_nohz_full without any boot time nohz_full CPU Waiman Long
2025-08-08 15:20 ` [RFC PATCH 17/18] cgroup/cpuset: Documentation updates & don't use CPU 0 for isolated partition Waiman Long
2025-08-08 15:20 ` [RFC PATCH 18/18] cgroup/cpuset: Add pr_debug() statements for cpuhp_offline_cb() call Waiman Long
2025-08-08 15:50 ` [RFC PATCH 00/18] cgroup/cpuset: Enable runtime modification of Frederic Weisbecker
2025-08-08 16:27   ` Waiman Long
