From: Frederic Weisbecker <frederic@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: "Frederic Weisbecker" <frederic@kernel.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Catalin Marinas" <catalin.marinas@arm.com>,
"Chen Ridong" <chenridong@huawei.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David S . Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Gabriele Monaco" <gmonaco@redhat.com>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"Ingo Molnar" <mingo@redhat.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Jens Axboe" <axboe@kernel.dk>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Lai Jiangshan" <jiangshanlai@gmail.com>,
"Marco Crivellari" <marco.crivellari@suse.com>,
"Michal Hocko" <mhocko@suse.com>,
"Muchun Song" <muchun.song@linux.dev>,
"Paolo Abeni" <pabeni@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Phil Auld" <pauld@redhat.com>,
"Rafael J . Wysocki" <rafael@kernel.org>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Shakeel Butt" <shakeel.butt@linux.dev>,
"Simon Horman" <horms@kernel.org>, "Tejun Heo" <tj@kernel.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Waiman Long" <longman@redhat.com>,
"Will Deacon" <will@kernel.org>,
cgroups@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
linux-block@vger.kernel.org, linux-mm@kvack.org,
linux-pci@vger.kernel.org, netdev@vger.kernel.org
Subject: [PATCH 14/33] cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset
Date: Sun, 25 Jan 2026 23:45:21 +0100
Message-ID: <20260125224541.50226-15-frederic@kernel.org>
In-Reply-To: <20260125224541.50226-1-frederic@kernel.org>
Until now, HK_TYPE_DOMAIN has only included the boot-defined isolated
CPUs passed through the isolcpus= boot option. Users who also want to
know about the runtime-defined isolated CPUs set through cpuset must use
different APIs: cpuset_cpu_is_isolated(), cpu_is_isolated(), etc.

There are several drawbacks to that approach:

1) Most interested subsystems want to know about all isolated CPUs, not
   just those defined at boot time.

2) cpuset_cpu_is_isolated() / cpu_is_isolated() are not synchronized with
   concurrent cpuset changes.

3) Further cpuset modifications are not propagated to subsystems.

Solve 1) and 2) by centralizing all isolated CPUs within the
HK_TYPE_DOMAIN housekeeping cpumask.

Subsystems can rely on RCU to synchronize against concurrent changes.

The propagation mentioned in 3) will be handled in further patches.

[Chen Ridong: Fix cpu_hotplug_lock deadlock and use correct static
branch API]
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
include/linux/sched/isolation.h | 7 +++
kernel/cgroup/cpuset.c | 5 ++-
kernel/sched/isolation.c | 75 ++++++++++++++++++++++++++++++---
kernel/sched/sched.h | 1 +
4 files changed, 80 insertions(+), 8 deletions(-)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index c7cf6934489c..d8d9baf44516 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -9,6 +9,11 @@
 enum hk_type {
 	/* Inverse of boot-time isolcpus= argument */
 	HK_TYPE_DOMAIN_BOOT,
+	/*
+	 * Same as HK_TYPE_DOMAIN_BOOT but also includes the
+	 * inverse of cpuset isolated partitions. As such it
+	 * is always a subset of HK_TYPE_DOMAIN_BOOT.
+	 */
 	HK_TYPE_DOMAIN,
 	/* Inverse of boot-time isolcpus=managed_irq argument */
 	HK_TYPE_MANAGED_IRQ,
@@ -35,6 +40,7 @@ extern const struct cpumask *housekeeping_cpumask(enum hk_type type);
 extern bool housekeeping_enabled(enum hk_type type);
 extern void housekeeping_affine(struct task_struct *t, enum hk_type type);
 extern bool housekeeping_test_cpu(int cpu, enum hk_type type);
+extern int housekeeping_update(struct cpumask *isol_mask);
 extern void __init housekeeping_init(void);

 #else
@@ -62,6 +68,7 @@ static inline bool housekeeping_test_cpu(int cpu, enum hk_type type)
 	return true;
 }

+static inline int housekeeping_update(struct cpumask *isol_mask) { return 0; }
 static inline void housekeeping_init(void) { }
 #endif /* CONFIG_CPU_ISOLATION */
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 5e2e3514c22e..e146e1f34bf9 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1482,14 +1482,15 @@ static void update_isolation_cpumasks(void)
 	if (!isolated_cpus_updating)
 		return;

-	lockdep_assert_cpus_held();
-
 	ret = workqueue_unbound_exclude_cpumask(isolated_cpus);
 	WARN_ON_ONCE(ret < 0);

 	ret = tmigr_isolated_exclude_cpumask(isolated_cpus);
 	WARN_ON_ONCE(ret < 0);

+	ret = housekeeping_update(isolated_cpus);
+	WARN_ON_ONCE(ret < 0);
+
 	isolated_cpus_updating = false;
 }

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 6f77289c14c3..674a02891b38 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -29,18 +29,48 @@ static struct housekeeping housekeeping;

 bool housekeeping_enabled(enum hk_type type)
 {
-	return !!(housekeeping.flags & BIT(type));
+	return !!(READ_ONCE(housekeeping.flags) & BIT(type));
 }
 EXPORT_SYMBOL_GPL(housekeeping_enabled);

+static bool housekeeping_dereference_check(enum hk_type type)
+{
+	if (IS_ENABLED(CONFIG_LOCKDEP) && type == HK_TYPE_DOMAIN) {
+		/* Cpuset isn't even writable yet? */
+		if (system_state <= SYSTEM_SCHEDULING)
+			return true;
+
+		/* CPU hotplug write locked, so cpuset partition can't be overwritten */
+		if (IS_ENABLED(CONFIG_HOTPLUG_CPU) && lockdep_is_cpus_write_held())
+			return true;
+
+		/* Cpuset lock held, partitions not writable */
+		if (IS_ENABLED(CONFIG_CPUSETS) && lockdep_is_cpuset_held())
+			return true;
+
+		return false;
+	}
+
+	return true;
+}
+
+static inline struct cpumask *housekeeping_cpumask_dereference(enum hk_type type)
+{
+	return rcu_dereference_all_check(housekeeping.cpumasks[type],
+					 housekeeping_dereference_check(type));
+}
+
 const struct cpumask *housekeeping_cpumask(enum hk_type type)
 {
+	const struct cpumask *mask = NULL;
+
 	if (static_branch_unlikely(&housekeeping_overridden)) {
-		if (housekeeping.flags & BIT(type)) {
-			return rcu_dereference_check(housekeeping.cpumasks[type], 1);
-		}
+		if (READ_ONCE(housekeeping.flags) & BIT(type))
+			mask = housekeeping_cpumask_dereference(type);
 	}
-	return cpu_possible_mask;
+	if (!mask)
+		mask = cpu_possible_mask;
+	return mask;
 }
 EXPORT_SYMBOL_GPL(housekeeping_cpumask);

@@ -80,12 +110,45 @@ EXPORT_SYMBOL_GPL(housekeeping_affine);

 bool housekeeping_test_cpu(int cpu, enum hk_type type)
 {
-	if (static_branch_unlikely(&housekeeping_overridden) && housekeeping.flags & BIT(type))
+	if (static_branch_unlikely(&housekeeping_overridden) &&
+	    READ_ONCE(housekeeping.flags) & BIT(type))
 		return cpumask_test_cpu(cpu, housekeeping_cpumask(type));
 	return true;
 }
 EXPORT_SYMBOL_GPL(housekeeping_test_cpu);

+int housekeeping_update(struct cpumask *isol_mask)
+{
+	struct cpumask *trial, *old = NULL;
+
+	lockdep_assert_cpus_held();
+
+	trial = kmalloc(cpumask_size(), GFP_KERNEL);
+	if (!trial)
+		return -ENOMEM;
+
+	cpumask_andnot(trial, housekeeping_cpumask(HK_TYPE_DOMAIN_BOOT), isol_mask);
+	if (!cpumask_intersects(trial, cpu_online_mask)) {
+		kfree(trial);
+		return -EINVAL;
+	}
+
+	if (!housekeeping.flags)
+		static_branch_enable_cpuslocked(&housekeeping_overridden);
+
+	if (housekeeping.flags & HK_FLAG_DOMAIN)
+		old = housekeeping_cpumask_dereference(HK_TYPE_DOMAIN);
+	else
+		WRITE_ONCE(housekeeping.flags, housekeeping.flags | HK_FLAG_DOMAIN);
+	rcu_assign_pointer(housekeeping.cpumasks[HK_TYPE_DOMAIN], trial);
+
+	synchronize_rcu();
+
+	kfree(old);
+
+	return 0;
+}
+
 void __init housekeeping_init(void)
 {
 	enum hk_type type;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 475bdab3b8db..653e898a996a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -30,6 +30,7 @@
#include <linux/context_tracking.h>
#include <linux/cpufreq.h>
#include <linux/cpumask_api.h>
+#include <linux/cpuset.h>
#include <linux/ctype.h>
#include <linux/file.h>
#include <linux/fs_api.h>
--
2.51.1
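
The publish-then-retire ordering used by housekeeping_update() above
(install the new mask, wait for readers, free the old one) can be
sketched as a single-threaded user-space model. This is an illustration
under stated assumptions, not real RCU: struct toy_hk, toy_hk_read(),
toy_hk_publish(), toy_no_readers() and the wait_for_readers callback are
all hypothetical, and the callback merely marks where synchronize_rcu()
sits in the kernel version.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for housekeeping.cpumasks[HK_TYPE_DOMAIN]. */
struct toy_hk {
	_Atomic(uint64_t *) domain;
};

/* Placeholder grace period: nothing to wait for in a single-threaded model. */
static void toy_no_readers(void)
{
}

/* Reader side: fall back to the "possible" mask when nothing is published. */
static uint64_t toy_hk_read(struct toy_hk *hk, uint64_t possible)
{
	uint64_t *mask = atomic_load_explicit(&hk->domain, memory_order_acquire);

	return mask ? *mask : possible;
}

/* Updater side: publish the new mask first, retire the old one afterwards. */
static int toy_hk_publish(struct toy_hk *hk, uint64_t new_mask,
			  void (*wait_for_readers)(void))
{
	uint64_t *trial = malloc(sizeof(*trial));
	uint64_t *old;

	if (!trial)
		return -ENOMEM;
	*trial = new_mask;

	/* Analogous to rcu_assign_pointer(): new readers see the new mask. */
	old = atomic_exchange_explicit(&hk->domain, trial, memory_order_acq_rel);

	/* Analogous to synchronize_rcu(): old readers must finish before the free. */
	wait_for_readers();
	free(old);	/* free(NULL) is a no-op on the first publish */
	return 0;
}
```

With real concurrent readers, the callback would have to block until every
reader that may still hold the old pointer has finished, which is the
guarantee synchronize_rcu() provides in the patch. The model only shows
why the free must come after that wait and why readers fall back to a
default mask when nothing has been published.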