public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: "Tejun Heo" <tj@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Shuah Khan" <skhan@linuxfoundation.org>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Will Deacon" <will@kernel.org>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	"Wei Liu" <wei.liu@kernel.org>,
	"Dexuan Cui" <decui@microsoft.com>,
	"Long Li" <longli@microsoft.com>,
	"Guenter Roeck" <linux@roeck-us.net>,
	"Frederic Weisbecker" <frederic@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	"Neeraj Upadhyay" <neeraj.upadhyay@kernel.org>,
	"Joel Fernandes" <joelagnelf@nvidia.com>,
	"Josh Triplett" <josh@joshtriplett.org>,
	"Boqun Feng" <boqun@kernel.org>,
	"Uladzislau Rezki" <urezki@gmail.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Lai Jiangshan" <jiangshanlai@gmail.com>,
	Zqiang <qiang.zhang@linux.dev>,
	"Anna-Maria Behnsen" <anna-maria@linutronix.de>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Thomas Gleixner" <tglx@kernel.org>,
	"Chen Ridong" <chenridong@huaweicloud.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Vincent Guittot" <vincent.guittot@linaro.org>,
	"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
	"Ben Segall" <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
	"Valentin Schneider" <vschneid@redhat.com>,
	"K Prateek Nayak" <kprateek.nayak@amd.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Simon Horman" <horms@kernel.org>
Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-hyperv@vger.kernel.org, linux-hwmon@vger.kernel.org,
	rcu@vger.kernel.org, netdev@vger.kernel.org,
	linux-kselftest@vger.kernel.org,
	Costa Shulyupin <cshulyup@redhat.com>,
	Qiliang Yuan <realwujing@gmail.com>,
	Waiman Long <longman@redhat.com>
Subject: [PATCH 22/23] cgroup/cpuset: Prevent offline_disabled CPUs from being used in isolated partition
Date: Mon, 20 Apr 2026 23:03:50 -0400	[thread overview]
Message-ID: <20260421030351.281436-23-longman@redhat.com> (raw)
In-Reply-To: <20260421030351.281436-1-longman@redhat.com>

If tick_nohz_full_enabled() is true, we are going to use CPU
hotplug to enable runtime changes to the HK_TYPE_KERNEL_NOISE and
HK_TYPE_MANAGED_IRQ cpumasks. However, for some architectures, one
or maybe more CPUs will have the offline_disabled flag set in their
cpu devices. For instance, x64-64 will set this flag for its boot CPU
(typically CPU 0) to disable it from going offline. Those CPUs can't
be used in cpuset isolated partition, or we are going to have problem
in the CPU offline process.

Find out the set of CPUs with offline_disabled set in a
new cpuset_init_late() helper, set the corresponding bits in
offline_disabled_cpus cpumask and check it when isolated partitions are
being created or modified to ensure that we will not use any of those
offline disabled CPUs in an isolated partition.

An error message mentioning those offline disabled CPUs will be
constructed in cpuset_init_late() and shown in "cpuset.cpus.partition"
when isolated creation or modification fails.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset-internal.h |  1 +
 kernel/cgroup/cpuset.c          | 89 ++++++++++++++++++++++++++++++++-
 2 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index fd7d19842ded..87b7411540ff 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -35,6 +35,7 @@ enum prs_errcode {
 	PERR_HKEEPING,
 	PERR_ACCESS,
 	PERR_REMOTE,
+	PERR_OL_DISABLED,
 };
 
 /* bits in struct cpuset flags field */
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 5f6b4e67748f..f3af8ef6c64e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -20,6 +20,8 @@
  */
 #include "cpuset-internal.h"
 
+#include <linux/cpu.h>
+#include <linux/device.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
@@ -48,7 +50,7 @@ DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key);
  */
 DEFINE_STATIC_KEY_FALSE(cpusets_insane_config_key);
 
-static const char * const perr_strings[] = {
+static const char *perr_strings[] __ro_after_init = {
 	[PERR_INVCPUS]   = "Invalid cpu list in cpuset.cpus.exclusive",
 	[PERR_INVPARENT] = "Parent is an invalid partition root",
 	[PERR_NOTPART]   = "Parent is not a partition root",
@@ -59,6 +61,7 @@ static const char * const perr_strings[] = {
 	[PERR_HKEEPING]  = "partition config conflicts with housekeeping setup",
 	[PERR_ACCESS]    = "Enable partition not permitted",
 	[PERR_REMOTE]    = "Have remote partition underneath",
+	[PERR_OL_DISABLED] = "", /* To be set up later */
 };
 
 /*
@@ -164,6 +167,12 @@ static cpumask_var_t	isolated_mirq_cpus;	/* T */
 static bool		boot_nohz_le_domain __ro_after_init;
 static bool		boot_mirq_le_domain __ro_after_init;
 
+/*
+ * Cpumask of CPUs with offline_disabled set
+ * The cpumask is effectively __ro_after_init.
+ */
+static cpumask_var_t	offline_disabled_cpus;
+
 /*
  * A flag to force sched domain rebuild at the end of an operation.
  * It can be set in
@@ -1649,6 +1658,8 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
 	 * The effective_xcpus mask can contain offline CPUs, but there must
 	 * be at least one or more online CPUs present before it can be enabled.
 	 *
+	 * An isolated partition cannot contain CPUs with offline disabled.
+	 *
 	 * Note that creating a remote partition with any local partition root
 	 * above it or remote partition root underneath it is not allowed.
 	 */
@@ -1661,6 +1672,9 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
 	     !isolated_cpus_can_update(tmp->new_cpus, NULL)) ||
 	    prstate_housekeeping_conflict(new_prs, tmp->new_cpus))
 		return PERR_HKEEPING;
+	if (tick_nohz_full_enabled() && (new_prs == PRS_ISOLATED) &&
+	    cpumask_intersects(tmp->new_cpus, offline_disabled_cpus))
+		return PERR_OL_DISABLED;
 
 	spin_lock_irq(&callback_lock);
 	partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
@@ -1746,6 +1760,16 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
 		goto invalidate;
 	}
 
+	/*
+	 * Isolated partition cannot contains CPUs with offline_disabled
+	 * bit set.
+	 */
+	if (tick_nohz_full_enabled() && (prs == PRS_ISOLATED) &&
+	    cpumask_intersects(excpus, offline_disabled_cpus)) {
+		cs->prs_err = PERR_OL_DISABLED;
+		goto invalidate;
+	}
+
 	adding   = cpumask_andnot(tmp->addmask, excpus, cs->effective_xcpus);
 	deleting = cpumask_andnot(tmp->delmask, cs->effective_xcpus, excpus);
 
@@ -1913,6 +1937,11 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 		if (tasks_nocpu_error(parent, cs, xcpus))
 			return PERR_NOCPUS;
 
+		if (tick_nohz_full_enabled() &&
+		    (new_prs == PRS_ISOLATED) &&
+		    cpumask_intersects(xcpus, offline_disabled_cpus))
+			return PERR_OL_DISABLED;
+
 		/*
 		 * This function will only be called when all the preliminary
 		 * checks have passed. At this point, the following condition
@@ -1979,12 +2008,21 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 					       parent->effective_xcpus);
 		}
 
+		/*
+		 * Isolated partition cannot contain CPUs with offline_disabled
+		 * bit set.
+		 */
+		if (tick_nohz_full_enabled() &&
+		   ((old_prs == PRS_ISOLATED) ||
+		    (old_prs == PRS_INVALID_ISOLATED)) &&
+		    cpumask_intersects(newmask, offline_disabled_cpus)) {
+			part_error = PERR_OL_DISABLED;
 		/*
 		 * TBD: Invalidate a currently valid child root partition may
 		 * still break isolated_cpus_can_update() rule if parent is an
 		 * isolated partition.
 		 */
-		if (is_partition_valid(cs) && (old_prs != parent_prs)) {
+		} else if (is_partition_valid(cs) && (old_prs != parent_prs)) {
 			if ((parent_prs == PRS_ROOT) &&
 			    /* Adding to parent means removing isolated CPUs */
 			    !isolated_cpus_can_update(tmp->delmask, tmp->addmask))
@@ -1995,6 +2033,19 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 				part_error = PERR_HKEEPING;
 		}
 
+		if (part_error) {
+			deleting = false;
+			/*
+			 * For a previously valid partition, we need to move
+			 * the exclusive CPUs back to its parent.
+			 */
+			if (is_partition_valid(cs) &&
+			    !cpumask_empty(cs->effective_xcpus)) {
+				cpumask_copy(tmp->addmask, cs->effective_xcpus);
+				adding = true;
+			}
+		}
+
 		/*
 		 * The new CPUs to be removed from parent's effective CPUs
 		 * must be present.
@@ -3829,6 +3880,7 @@ int __init cpuset_init(void)
 	BUG_ON(!zalloc_cpumask_var(&subpartitions_cpus, GFP_KERNEL));
 	BUG_ON(!zalloc_cpumask_var(&isolated_cpus, GFP_KERNEL));
 	BUG_ON(!zalloc_cpumask_var(&isolated_hk_cpus, GFP_KERNEL));
+	BUG_ON(!zalloc_cpumask_var(&offline_disabled_cpus, GFP_KERNEL));
 
 	cpumask_setall(top_cpuset.cpus_allowed);
 	nodes_setall(top_cpuset.mems_allowed);
@@ -4188,6 +4240,39 @@ void __init cpuset_init_smp(void)
 	BUG_ON(!cpuset_migrate_mm_wq);
 }
 
+/**
+ * cpuset_init_late - initialize the list of CPUs with offline_disabled set
+ *
+ * Description: Initialize a cpumask with CPUs that have the offline_disabled
+ *		bit set. It is done in a separate initcall as cpuset_init_smp()
+ *		is called before driver_init() where the CPU devices will be
+ *		set up.
+ */
+static int __init cpuset_init_late(void)
+{
+	int cpu;
+
+	if (!tick_nohz_full_enabled())
+		return 0;
+	/*
+	 * Iterate all the possible CPUs to see which one has offline disabled.
+	 */
+	for_each_possible_cpu(cpu) {
+		if (get_cpu_device(cpu)->offline_disabled)
+			__cpumask_set_cpu(cpu, offline_disabled_cpus);
+	}
+	if (!cpumask_empty(offline_disabled_cpus)) {
+		char buf[128];
+
+		snprintf(buf, sizeof(buf),
+			 "CPU %*pbl with offline disabled not allowed in isolated partition",
+			 cpumask_pr_args(offline_disabled_cpus));
+		perr_strings[PERR_OL_DISABLED] = kstrdup(buf, GFP_KERNEL);
+	}
+	return 0;
+}
+pure_initcall(cpuset_init_late);
+
 /*
  * Return cpus_allowed mask from a task's cpuset.
  */
-- 
2.53.0


  parent reply	other threads:[~2026-04-21  3:07 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-21  3:03 [PATCH-next 00/23] cgroup/cpuset: Enable runtime update of nohz_full and managed_irq CPUs Waiman Long
2026-04-21  3:03 ` [PATCH 01/23] sched/isolation: Add HK_TYPE_KERNEL_NOISE_BOOT & HK_TYPE_MANAGED_IRQ_BOOT Waiman Long
2026-04-21  3:03 ` [PATCH 02/23] sched/isolation: Enhance housekeeping_update() to support updating more than one HK cpumask Waiman Long
2026-04-21  3:03 ` [PATCH 03/23] tick/nohz: Make nohz_full parameter optional Waiman Long
2026-04-21  8:32   ` Thomas Gleixner
2026-04-21 14:14     ` Waiman Long
2026-04-21  3:03 ` [PATCH 04/23] tick/nohz: Allow runtime changes in full dynticks CPUs Waiman Long
2026-04-21  8:50   ` Thomas Gleixner
2026-04-21 14:24     ` Waiman Long
2026-04-21  3:03 ` [PATCH 05/23] tick: Pass timer tick job to an online HK CPU in tick_cpu_dying() Waiman Long
2026-04-21  8:55   ` Thomas Gleixner
2026-04-21 14:22     ` Waiman Long
2026-04-21  3:03 ` [PATCH 06/23] rcu/nocbs: Allow runtime changes in RCU NOCBS cpumask Waiman Long
2026-04-21  3:03 ` [PATCH 07/23] watchdog: Sync up with runtime change of isolated CPUs Waiman Long
2026-04-21  3:03 ` [PATCH 08/23] arm64: topology: Use RCU to protect access to HK_TYPE_TICK cpumask Waiman Long
2026-04-21  3:03 ` [PATCH 09/23] workqueue: Use RCU to protect access of HK_TYPE_TIMER cpumask Waiman Long
2026-04-21  3:03 ` [PATCH 10/23] cpu: " Waiman Long
2026-04-21  8:57   ` Thomas Gleixner
2026-04-21 14:25     ` Waiman Long
2026-04-21  3:03 ` [PATCH 11/23] hrtimer: " Waiman Long
2026-04-21  8:59   ` Thomas Gleixner
2026-04-21  3:03 ` [PATCH 12/23] net: Use boot time housekeeping cpumask settings for now Waiman Long
2026-04-21  3:03 ` [PATCH 13/23] sched/core: Use RCU to protect access of HK_TYPE_KERNEL_NOISE cpumask Waiman Long
2026-04-21  3:03 ` [PATCH 14/23] hwmon/coretemp: Use RCU to protect access of HK_TYPE_MISC cpumask Waiman Long
2026-04-21  3:03 ` [PATCH 15/23] Drivers: hv: Use RCU to protect access of HK_TYPE_MANAGED_IRQ cpumask Waiman Long
2026-04-21  3:03 ` [PATCH 16/23] genirq/cpuhotplug: " Waiman Long
2026-04-21  9:02   ` Thomas Gleixner
2026-04-21 14:29     ` Waiman Long
2026-04-21  3:03 ` [PATCH 17/23] sched/isolation: Extend housekeeping_dereference_check() to cover changes in nohz_full or manged_irqs cpumasks Waiman Long
2026-04-21  3:03 ` [PATCH 18/23] cpu/hotplug: Add a new cpuhp_offline_cb() API Waiman Long
2026-04-21 16:17   ` Thomas Gleixner
2026-04-21 17:29     ` Waiman Long
2026-04-21 18:43       ` Thomas Gleixner
2026-04-21  3:03 ` [PATCH 19/23] cgroup/cpuset: Improve check for calling housekeeping_update() Waiman Long
2026-04-21  3:03 ` [PATCH 20/23] cgroup/cpuset: Enable runtime update of HK_TYPE_{KERNEL_NOISE,MANAGED_IRQ} cpumasks Waiman Long
2026-04-21  3:03 ` [PATCH 21/23] cgroup/cpuset: Limit the side effect of using CPU hotplug on isolated partition Waiman Long
2026-04-21  3:03 ` Waiman Long [this message]
2026-04-21  3:03 ` [PATCH 23/23] cgroup/cpuset: Documentation and kselftest updates Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260421030351.281436-23-longman@redhat.com \
    --to=longman@redhat.com \
    --cc=anna-maria@linutronix.de \
    --cc=boqun@kernel.org \
    --cc=bsegall@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=cgroups@vger.kernel.org \
    --cc=chenridong@huaweicloud.com \
    --cc=corbet@lwn.net \
    --cc=cshulyup@redhat.com \
    --cc=davem@davemloft.net \
    --cc=decui@microsoft.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=edumazet@google.com \
    --cc=frederic@kernel.org \
    --cc=haiyangz@microsoft.com \
    --cc=hannes@cmpxchg.org \
    --cc=horms@kernel.org \
    --cc=jiangshanlai@gmail.com \
    --cc=joelagnelf@nvidia.com \
    --cc=josh@joshtriplett.org \
    --cc=juri.lelli@redhat.com \
    --cc=kprateek.nayak@amd.com \
    --cc=kuba@kernel.org \
    --cc=kys@microsoft.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-hwmon@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=longli@microsoft.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mgorman@suse.de \
    --cc=mingo@kernel.org \
    --cc=mkoutny@suse.com \
    --cc=neeraj.upadhyay@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=qiang.zhang@linux.dev \
    --cc=rcu@vger.kernel.org \
    --cc=realwujing@gmail.com \
    --cc=rostedt@goodmis.org \
    --cc=skhan@linuxfoundation.org \
    --cc=tglx@kernel.org \
    --cc=tj@kernel.org \
    --cc=urezki@gmail.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=wei.liu@kernel.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox