public inbox for rcu@vger.kernel.org
* [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI)
@ 2026-03-25  9:09 Qiliang Yuan
  2026-03-25  9:09 ` [PATCH 01/15] sched/isolation: Support dynamic allocation for housekeeping masks Qiliang Yuan
                   ` (15 more replies)
  0 siblings, 16 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
	Shuah Khan
  Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan

The Linux kernel provides mechanisms like 'isolcpus' and 'nohz_full' to
reduce interference for latency-sensitive workloads. However, these are
locked behind the "Reboot Wall" - they can only be configured via boot
parameters and require a system restart for changes to take effect.

In modern cloud-native environments, CPU resources often need to be
dynamically re-partitioned to accommodate container scaling without
the performance penalty and downtime of a full system reboot. Similarly,
high-frequency trading (HFT) platforms require the ability to fine-tune
CPU isolation at runtime to minimize jitter for critical execution threads
based on shifting market demands.

This patch series introduces Dynamic Housekeeping & Enhanced Isolation
(DHEI). DHEI allows administrators to reconfigure the kernel's
housekeeping boundaries at runtime via a new sysfs interface at
/sys/kernel/housekeeping/.

Key Features:
- Fine-grained control: Separate sysfs nodes for timer, rcu, tick,
  workqueue, kthread, managed_irq, domain, and misc.
- Dynamic NOHZ_FULL: Supports enabling/disabling full dynticks mode
  on-the-fly.
- SMT Awareness: Optional 'smt_aware_mode' for core-granular isolation.
- Safety Guards: Prevents isolating all CPUs, requires at least one
  online housekeeping CPU, and enforces CAP_SYS_ADMIN capability.
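
As an illustration only, the safety guards and the SMT check listed
above could be modelled in userspace roughly as follows. This is a
sketch under stated assumptions, not the code from this series: CPUs
are represented as bits in a 64-bit word, and the names cpu_bits_t,
hk_mask_is_valid() and hk_mask_is_core_granular() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpu_bits_t;

/*
 * Accept a proposed housekeeping mask only if it is non-empty
 * (not every CPU is isolated) and still contains at least one
 * online CPU that can do housekeeping work.
 */
static bool hk_mask_is_valid(cpu_bits_t new_hk, cpu_bits_t online)
{
	if (new_hk == 0)
		return false;	/* would isolate all CPUs */
	if ((new_hk & online) == 0)
		return false;	/* no online housekeeper left */
	return true;
}

/*
 * Core-granular check for an SMT-aware mode: if a CPU is in the
 * mask, all of its SMT siblings must be in the mask as well.
 * siblings[cpu] holds the sibling bits of cpu, including cpu itself.
 */
static bool hk_mask_is_core_granular(cpu_bits_t mask,
				     const cpu_bits_t *siblings, int ncpus)
{
	for (int cpu = 0; cpu < ncpus; cpu++) {
		bool in_mask = mask & ((cpu_bits_t)1 << cpu);

		if (in_mask && (siblings[cpu] & ~mask))
			return false;	/* core split across the boundary */
	}
	return true;
}
```

With four CPUs paired as (0,1) and (2,3), a mask of 0x3 passes both
checks, while 0x5 is rejected by the core-granular check because it
splits both cores.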

Core Architecture:
1. Notifier-Driven Synchronization: HK_UPDATE_MASK blocking notifier chain.
2. Decoupled Memory Management: Runtime-safe cpumask allocation.
3. Subsystem Handlers: Dynamic migration for IRQ, RCU, Sched, etc.

The series is organized as follows:
- Patches 01-03: Core infrastructure (dynamic allocation, notifier,
  enum separation)
- Patches 04-09: Subsystem notifier handlers (genirq, RCU, scheduler,
  watchdog, workqueue, mm/compaction)
- Patch 10: tick/nohz dynamic full dynticks
- Patches 11-13: SMT-aware isolation, boot-time bridging, sysfs interface
- Patch 14: ABI documentation
- Patch 15: kselftest suite

Tested on x86_64 (8 vCPUs, SMT enabled) with all selftests passing.

As suggested by Joel Fernandes and Thomas Gleixner, this v1 provides a
stronger rationale for dynamic isolation and addresses all RFC feedback
regarding naming and notifier robustness.

To: Ingo Molnar <mingo@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
To: Juri Lelli <juri.lelli@redhat.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Steven Rostedt <rostedt@goodmis.org>
To: Ben Segall <bsegall@google.com>
To: Mel Gorman <mgorman@suse.de>
To: Valentin Schneider <vschneid@redhat.com>
To: Thomas Gleixner <tglx@kernel.org>
To: Paul E. McKenney <paulmck@kernel.org>
To: Frederic Weisbecker <frederic@kernel.org>
To: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
To: Joel Fernandes <joelagnelf@nvidia.com>
To: Josh Triplett <josh@joshtriplett.org>
To: Boqun Feng <boqun.feng@gmail.com>
To: Uladzislau Rezki <urezki@gmail.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Lai Jiangshan <jiangshanlai@gmail.com>
To: Zqiang <qiang.zhang@linux.dev>
To: Tejun Heo <tj@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
To: Vlastimil Babka <vbabka@suse.cz>
To: Suren Baghdasaryan <surenb@google.com>
To: Michal Hocko <mhocko@suse.com>
To: Brendan Jackman <jackmanb@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
To: Zi Yan <ziy@nvidia.com>
To: Anna-Maria Behnsen <anna-maria@linutronix.de>
To: Ingo Molnar <mingo@kernel.org>
To: Shuah Khan <shuah@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: rcu@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>

Changes since RFC:
- Dynamic RCU NOCB rewrite: Perform full runtime offload/deoffload via remove_cpu()/add_cpu() for online CPUs, with lazy initialization.
- Robust Timer Migration: Added logic to dynamically migrate tick_do_timer_cpu when a housekeeper is isolated.
- Enhanced Isolation Safety: Hardened sysfs interface with CAP_SYS_ADMIN checks, 0600 permissions, and strict cpumask validations including SMT subset checks.
- Lifecycle Cleanups: Replaced system_state boot checks with slab_is_available() and added hotplug shutdown guards for clean power-off.
- Testing & Docs: Added comprehensive kselftest suite for isolation scenarios and detailed ABI documentation.
- Link to RFC: https://lore.kernel.org/all/20260206-feature-dynamic_isolcpus_dhei-v1-0-00a711eb0c74@gmail.com/

---
Qiliang Yuan (15):
      sched/isolation: Support dynamic allocation for housekeeping masks
      sched/isolation: Introduce housekeeping notifier infrastructure
      sched/isolation: Separate housekeeping types in enum hk_type
      genirq: Support dynamic migration for managed interrupts
      rcu: Support runtime NOCB initialization and dynamic offloading
      sched/core: Dynamically update scheduler domain housekeeping mask
      watchdog: Allow runtime toggle of lockup detector affinity
      workqueue: Support dynamic housekeeping mask updates
      mm/compaction: Support dynamic housekeeping mask updates for kcompactd
      tick/nohz: Transition to dynamic full dynticks state management
      sched/isolation: Implement SMT-aware isolation and safety guards
      sched/isolation: Bridge boot-time parameters with dynamic isolation
      sched/isolation: Implement sysfs interface for dynamic housekeeping
      Documentation: isolation: Document DHEI sysfs interfaces
      selftests: dhei: Add functional tests for dynamic housekeeping

 .../ABI/testing/sysfs-kernel-housekeeping          |  22 ++
 include/linux/sched/isolation.h                    |  40 +++-
 kernel/irq/manage.c                                |  49 +++++
 kernel/rcu/rcu.h                                   |   4 +
 kernel/rcu/tree.c                                  |  76 +++++++
 kernel/rcu/tree.h                                  |   2 +-
 kernel/rcu/tree_nocb.h                             |  27 ++-
 kernel/sched/core.c                                |  28 +++
 kernel/sched/isolation.c                           | 236 ++++++++++++++++++++-
 kernel/time/tick-sched.c                           | 130 +++++++++---
 kernel/watchdog.c                                  |  25 +++
 kernel/workqueue.c                                 |  42 ++++
 mm/compaction.c                                    |  27 +++
 tools/testing/selftests/Makefile                   |   1 +
 tools/testing/selftests/dhei/Makefile              |   4 +
 tools/testing/selftests/dhei/dhei_test.sh          | 160 ++++++++++++++
 16 files changed, 818 insertions(+), 55 deletions(-)
---
base-commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
change-id: 20260324-dhei-v12-final-891d1ba62bd3

Best regards,
-- 
Qiliang Yuan <realwujing@gmail.com>



* [PATCH 01/15] sched/isolation: Support dynamic allocation for housekeeping masks
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
@ 2026-03-25  9:09 ` Qiliang Yuan
  2026-03-25 13:57   ` Peter Zijlstra
  2026-03-25  9:09 ` [PATCH 02/15] sched/isolation: Introduce housekeeping notifier infrastructure Qiliang Yuan
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)

The existing housekeeping infrastructure uses a single static cpumask
for all isolation types. This prevents independent runtime
reconfiguration of different services (like RCU vs. timers).

Introduce dynamic allocation for housekeeping masks to support DHEI.

This allows subsequent patches to manage service-specific masks
independently at runtime.

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
 kernel/sched/isolation.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 3ad0d6df6a0a2..67a5ff273ea08 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -8,6 +8,7 @@
  *
  */
 #include <linux/sched/isolation.h>
+#include <linux/mutex.h>
 #include "sched.h"
 
 enum hk_flags {
@@ -16,6 +17,7 @@ enum hk_flags {
 	HK_FLAG_KERNEL_NOISE	= BIT(HK_TYPE_KERNEL_NOISE),
 };
 
+static DEFINE_MUTEX(housekeeping_mutex);
 DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
 EXPORT_SYMBOL_GPL(housekeeping_overridden);
 
@@ -105,8 +107,14 @@ void __init housekeeping_init(void)
 static void __init housekeeping_setup_type(enum hk_type type,
 					   cpumask_var_t housekeeping_staging)
 {
+	gfp_t gfp = GFP_KERNEL;
+
+	if (!slab_is_available())
+		gfp = GFP_NOWAIT;
+
+	if (!housekeeping.cpumasks[type])
+		alloc_cpumask_var(&housekeeping.cpumasks[type], gfp);
 
-	alloc_bootmem_cpumask_var(&housekeeping.cpumasks[type]);
 	cpumask_copy(housekeeping.cpumasks[type],
 		     housekeeping_staging);
 }
@@ -116,6 +124,10 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
 	cpumask_var_t non_housekeeping_mask, housekeeping_staging;
 	unsigned int first_cpu;
 	int err = 0;
+	gfp_t gfp = GFP_KERNEL;
+
+	if (!slab_is_available())
+		gfp = GFP_NOWAIT;
 
 	if ((flags & HK_FLAG_KERNEL_NOISE) && !(housekeeping.flags & HK_FLAG_KERNEL_NOISE)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_FULL)) {
@@ -125,13 +137,17 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
 		}
 	}
 
-	alloc_bootmem_cpumask_var(&non_housekeeping_mask);
+	if (!alloc_cpumask_var(&non_housekeeping_mask, gfp))
+		return 0;
+
 	if (cpulist_parse(str, non_housekeeping_mask) < 0) {
 		pr_warn("Housekeeping: nohz_full= or isolcpus= incorrect CPU range\n");
 		goto free_non_housekeeping_mask;
 	}
 
-	alloc_bootmem_cpumask_var(&housekeeping_staging);
+	if (!alloc_cpumask_var(&housekeeping_staging, gfp))
+		goto free_non_housekeeping_mask;
+
 	cpumask_andnot(housekeeping_staging,
 		       cpu_possible_mask, non_housekeeping_mask);
 
@@ -203,9 +219,9 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
 	err = 1;
 
 free_housekeeping_staging:
-	free_bootmem_cpumask_var(housekeeping_staging);
+	free_cpumask_var(housekeeping_staging);
 free_non_housekeeping_mask:
-	free_bootmem_cpumask_var(non_housekeeping_mask);
+	free_cpumask_var(non_housekeeping_mask);
 
 	return err;
 }

-- 
2.43.0



* [PATCH 02/15] sched/isolation: Introduce housekeeping notifier infrastructure
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
  2026-03-25  9:09 ` [PATCH 01/15] sched/isolation: Support dynamic allocation for housekeeping masks Qiliang Yuan
@ 2026-03-25  9:09 ` Qiliang Yuan
  2026-03-25 13:58   ` Peter Zijlstra
  2026-03-25  9:09 ` [PATCH 03/15] sched/isolation: Separate housekeeping types in enum hk_type Qiliang Yuan
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)

Subsystems currently rely on static housekeeping masks determined at
boot. Supporting runtime reconfiguration (DHEI) requires a mechanism
to broadcast mask changes to affected kernel components.

Implement a blocking notifier chain for housekeeping mask updates.

This infrastructure enables subsystems like genirq, workqueues, and RCU
to react dynamically to isolation changes.
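
For readers unfamiliar with the pattern, the following userspace sketch
models how a subscriber might filter updates delivered through such a
chain. It is an illustration under simplifying assumptions, not the
kernel implementation: the chain is a plain linked list without locking
or priorities, new_mask is a 64-bit word standing in for a struct
cpumask pointer, and hk_register(), hk_notify(), rcu_cb() and
rcu_seen_mask are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

enum hk_type { HK_TYPE_DOMAIN, HK_TYPE_MANAGED_IRQ, HK_TYPE_RCU, HK_TYPE_MAX };

#define HK_UPDATE_MASK	0x01
#define NOTIFY_OK	0x0001

struct housekeeping_update {
	enum hk_type type;
	uint64_t new_mask;	/* stand-in for const struct cpumask * */
};

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
	struct notifier_block *next;
};

static struct notifier_block *hk_chain;

/* Add a subscriber at the head of the (unlocked, unordered) chain. */
static void hk_register(struct notifier_block *nb)
{
	nb->next = hk_chain;
	hk_chain = nb;
}

/* Broadcast one mask update; returns the number of callbacks invoked. */
static int hk_notify(enum hk_type type, uint64_t new_mask)
{
	struct housekeeping_update upd = { .type = type, .new_mask = new_mask };
	int calls = 0;

	for (struct notifier_block *nb = hk_chain; nb; nb = nb->next) {
		nb->notifier_call(nb, HK_UPDATE_MASK, &upd);
		calls++;
	}
	return calls;
}

/* Example subscriber: records the mask only for HK_TYPE_RCU updates. */
static uint64_t rcu_seen_mask;

static int rcu_cb(struct notifier_block *nb, unsigned long action, void *data)
{
	struct housekeeping_update *upd = data;

	(void)nb;
	if (action != HK_UPDATE_MASK || upd->type != HK_TYPE_RCU)
		return NOTIFY_OK;

	rcu_seen_mask = upd->new_mask;
	return NOTIFY_OK;
}

static struct notifier_block rcu_nb = { .notifier_call = rcu_cb };
```

Every subscriber is called for every update, so each callback is
expected to check both the action and the housekeeping type before
doing any work.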

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
 include/linux/sched/isolation.h | 21 +++++++++++++++++++++
 kernel/sched/isolation.c        | 24 ++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index d8501f4709b58..9df55237d3901 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -5,6 +5,7 @@
 #include <linux/cpuset.h>
 #include <linux/init.h>
 #include <linux/tick.h>
+#include <linux/notifier.h>
 
 enum hk_type {
 	HK_TYPE_DOMAIN,
@@ -24,6 +25,13 @@ enum hk_type {
 	HK_TYPE_KTHREAD = HK_TYPE_KERNEL_NOISE
 };
 
+struct housekeeping_update {
+	enum hk_type type;
+	const struct cpumask *new_mask;
+};
+
+#define HK_UPDATE_MASK	0x01
+
 #ifdef CONFIG_CPU_ISOLATION
 DECLARE_STATIC_KEY_FALSE(housekeeping_overridden);
 extern int housekeeping_any_cpu(enum hk_type type);
@@ -33,6 +41,9 @@ extern void housekeeping_affine(struct task_struct *t, enum hk_type type);
 extern bool housekeeping_test_cpu(int cpu, enum hk_type type);
 extern void __init housekeeping_init(void);
 
+extern int housekeeping_register_notifier(struct notifier_block *nb);
+extern int housekeeping_unregister_notifier(struct notifier_block *nb);
+
 #else
 
 static inline int housekeeping_any_cpu(enum hk_type type)
@@ -59,6 +70,16 @@ static inline bool housekeeping_test_cpu(int cpu, enum hk_type type)
 }
 
 static inline void housekeeping_init(void) { }
+
+static inline int housekeeping_register_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+
+static inline int housekeeping_unregister_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
 #endif /* CONFIG_CPU_ISOLATION */
 
 static inline bool housekeeping_cpu(int cpu, enum hk_type type)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 67a5ff273ea08..e7a21023726df 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -9,6 +9,7 @@
  */
 #include <linux/sched/isolation.h>
 #include <linux/mutex.h>
+#include <linux/notifier.h>
 #include "sched.h"
 
 enum hk_flags {
@@ -18,6 +19,7 @@ enum hk_flags {
 };
 
 static DEFINE_MUTEX(housekeeping_mutex);
+static BLOCKING_NOTIFIER_HEAD(housekeeping_notifier_list);
 DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
 EXPORT_SYMBOL_GPL(housekeeping_overridden);
 
@@ -86,6 +88,28 @@ bool housekeeping_test_cpu(int cpu, enum hk_type type)
 }
 EXPORT_SYMBOL_GPL(housekeeping_test_cpu);
 
+int housekeeping_register_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&housekeeping_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(housekeeping_register_notifier);
+
+int housekeeping_unregister_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&housekeeping_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(housekeeping_unregister_notifier);
+
+static int housekeeping_update_notify(enum hk_type type, const struct cpumask *new_mask)
+{
+	struct housekeeping_update update = {
+		.type = type,
+		.new_mask = new_mask,
+	};
+
+	return blocking_notifier_call_chain(&housekeeping_notifier_list, HK_UPDATE_MASK, &update);
+}
+
 void __init housekeeping_init(void)
 {
 	enum hk_type type;

-- 
2.43.0



* [PATCH 03/15] sched/isolation: Separate housekeeping types in enum hk_type
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
  2026-03-25  9:09 ` [PATCH 01/15] sched/isolation: Support dynamic allocation for housekeeping masks Qiliang Yuan
  2026-03-25  9:09 ` [PATCH 02/15] sched/isolation: Introduce housekeeping notifier infrastructure Qiliang Yuan
@ 2026-03-25  9:09 ` Qiliang Yuan
  2026-03-25 13:59   ` Peter Zijlstra
  2026-03-25  9:09 ` [PATCH 04/15] genirq: Support dynamic migration for managed interrupts Qiliang Yuan
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)

Most kernel noise types (TICK, TIMER, RCU, etc.) are currently
aliased to a single HK_TYPE_KERNEL_NOISE enum value. This prevents
fine-grained runtime isolation control as all masks are forced to be
identical.

Un-alias service-specific housekeeping types in enum hk_type.

This separation provides the necessary granularity for DHEI subsystems
to subscribe to and maintain independent affinity masks.
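
The effect of the aliasing can be seen in a small userspace model. The
sketch below is illustrative only: the OLD_/NEW_ prefixes and the two
mask arrays are hypothetical, and masks are reduced to 64-bit words.
It shows that aliased enumerators index the same slot, while distinct
enumerators give each service its own slot.

```c
#include <stdint.h>

/* Before: noise-related services alias a single enumerator. */
enum hk_type_old {
	OLD_HK_TYPE_DOMAIN,
	OLD_HK_TYPE_MANAGED_IRQ,
	OLD_HK_TYPE_KERNEL_NOISE,
	OLD_HK_TYPE_MAX,

	OLD_HK_TYPE_TICK = OLD_HK_TYPE_KERNEL_NOISE,
	OLD_HK_TYPE_RCU  = OLD_HK_TYPE_KERNEL_NOISE,
};

/* After: every service has its own enumerator and array slot. */
enum hk_type_new {
	NEW_HK_TYPE_DOMAIN,
	NEW_HK_TYPE_MANAGED_IRQ,
	NEW_HK_TYPE_TICK,
	NEW_HK_TYPE_TIMER,
	NEW_HK_TYPE_RCU,
	NEW_HK_TYPE_MISC,
	NEW_HK_TYPE_WQ,
	NEW_HK_TYPE_KTHREAD,
	NEW_HK_TYPE_MAX,
};

/* One mask per enumerator value, as in a per-type cpumask array. */
static uint64_t old_masks[OLD_HK_TYPE_MAX];
static uint64_t new_masks[NEW_HK_TYPE_MAX];
```

Writing old_masks[OLD_HK_TYPE_RCU] also changes what
old_masks[OLD_HK_TYPE_TICK] reads, whereas new_masks[NEW_HK_TYPE_RCU]
and new_masks[NEW_HK_TYPE_TICK] are independent.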

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
 include/linux/sched/isolation.h | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 9df55237d3901..6ec64eb3f8bcb 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -10,21 +10,18 @@
 enum hk_type {
 	HK_TYPE_DOMAIN,
 	HK_TYPE_MANAGED_IRQ,
-	HK_TYPE_KERNEL_NOISE,
+	HK_TYPE_TICK,
+	HK_TYPE_TIMER,
+	HK_TYPE_RCU,
+	HK_TYPE_MISC,
+	HK_TYPE_WQ,
+	HK_TYPE_KTHREAD,
 	HK_TYPE_MAX,
 
-	/*
-	 * The following housekeeping types are only set by the nohz_full
-	 * boot commandline option. So they can share the same value.
-	 */
-	HK_TYPE_TICK    = HK_TYPE_KERNEL_NOISE,
-	HK_TYPE_TIMER   = HK_TYPE_KERNEL_NOISE,
-	HK_TYPE_RCU     = HK_TYPE_KERNEL_NOISE,
-	HK_TYPE_MISC    = HK_TYPE_KERNEL_NOISE,
-	HK_TYPE_WQ      = HK_TYPE_KERNEL_NOISE,
-	HK_TYPE_KTHREAD = HK_TYPE_KERNEL_NOISE
 };
 
+#define HK_TYPE_KERNEL_NOISE HK_TYPE_TICK
+
 struct housekeeping_update {
 	enum hk_type type;
 	const struct cpumask *new_mask;

-- 
2.43.0



* [PATCH 04/15] genirq: Support dynamic migration for managed interrupts
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
                   ` (2 preceding siblings ...)
  2026-03-25  9:09 ` [PATCH 03/15] sched/isolation: Separate housekeeping types in enum hk_type Qiliang Yuan
@ 2026-03-25  9:09 ` Qiliang Yuan
  2026-03-25  9:09 ` [PATCH 05/15] rcu: Support runtime NOCB initialization and dynamic offloading Qiliang Yuan
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)

Managed interrupts currently have their affinity determined once,
honoring boot-time isolation settings. There is no mechanism to migrate
them when housekeeping boundaries change at runtime.

Enable managed interrupts to respond dynamically to housekeeping updates.

This ensures that managed interrupts are migrated away from newly
isolated CPUs or redistributed when housekeeping CPUs are added.
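
To illustrate why re-applying the existing affinity is sufficient, the
sketch below models the mask selection that the affinity-setting path
is assumed to perform for managed interrupts: prefer the housekeeping
subset of the requested mask, and fall back to the full requested mask
when no online housekeeping CPU is in it. This is a userspace
approximation with 64-bit words instead of cpumasks, and
managed_irq_effective_mask() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Pick the CPUs a managed interrupt would actually be programmed to.
 * requested: affinity mask assigned to the interrupt
 * hk_mask:   current HK_TYPE_MANAGED_IRQ housekeeping mask
 * online:    currently online CPUs
 */
static uint64_t managed_irq_effective_mask(uint64_t requested,
					   uint64_t hk_mask, uint64_t online)
{
	uint64_t preferred = requested & hk_mask;

	/* No online housekeeping CPU in range: keep the original mask. */
	if ((preferred & online) == 0)
		return requested;

	return preferred;
}
```

Calling the function again with the same requested mask but an updated
hk_mask yields the new target CPUs, which is the effect the notifier
relies on.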

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
 kernel/irq/manage.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 349ae7979da0e..f2cba3d7ef624 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -2811,3 +2811,52 @@ bool irq_check_status_bit(unsigned int irq, unsigned int bitmask)
 	return res;
 }
 EXPORT_SYMBOL_GPL(irq_check_status_bit);
+
+#ifdef CONFIG_SMP
+static int irq_housekeeping_reconfigure(struct notifier_block *nb,
+				       unsigned long action, void *data)
+{
+	struct housekeeping_update *upd = data;
+	unsigned int irq;
+
+	if (action != HK_UPDATE_MASK || upd->type != HK_TYPE_MANAGED_IRQ)
+		return NOTIFY_OK;
+
+	irq_lock_sparse();
+	for_each_active_irq(irq) {
+		struct irq_data *irqd;
+		struct irq_desc *desc;
+
+		desc = irq_to_desc(irq);
+		if (!desc)
+			continue;
+
+		scoped_guard(raw_spinlock_irqsave, &desc->lock) {
+			irqd = irq_desc_get_irq_data(desc);
+			if (!irqd_affinity_is_managed(irqd) || !desc->action ||
+			    !irq_data_get_irq_chip(irqd))
+				continue;
+
+			/*
+			 * Re-apply existing affinity to honor the new
+			 * housekeeping mask via __irq_set_affinity() logic.
+			 */
+			irq_set_affinity_locked(irqd, irq_data_get_affinity_mask(irqd), false);
+		}
+	}
+	irq_unlock_sparse();
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block irq_housekeeping_nb = {
+	.notifier_call = irq_housekeeping_reconfigure,
+};
+
+static int __init irq_init_housekeeping_notifier(void)
+{
+	housekeeping_register_notifier(&irq_housekeeping_nb);
+	return 0;
+}
+core_initcall(irq_init_housekeeping_notifier);
+#endif

-- 
2.43.0



* [PATCH 05/15] rcu: Support runtime NOCB initialization and dynamic offloading
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
                   ` (3 preceding siblings ...)
  2026-03-25  9:09 ` [PATCH 04/15] genirq: Support dynamic migration for managed interrupts Qiliang Yuan
@ 2026-03-25  9:09 ` Qiliang Yuan
  2026-03-25  9:09 ` [PATCH 06/15] sched/core: Dynamically update scheduler domain housekeeping mask Qiliang Yuan
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)

Context:
The RCU no-callbacks (NOCB) infrastructure traditionally requires
boot-time parameters (e.g., rcu_nocbs) to allocate masks and spawn
management kthreads (rcuog/rcuo). This prevents systems from activating
offloading on-demand without a reboot.

Problem:
Dynamic Housekeeping & Enhanced Isolation (DHEI) requires CPUs to
transition to NOCB mode at runtime. Without boot-time setup, the
NOCB masks are unallocated, and critical kthreads are missing,
preventing effective tick suppression and isolation.

Solution:
Refactor RCU initialization to support dynamic on-demand setup.
- Introduce rcu_init_nocb_dynamic() to allocate masks and organize
  kthreads if the system wasn't initially configured for NOCB.
- Update rcu_housekeeping_reconfigure() to iterate over CPUs and
  perform safe offload/deoffload transitions via hotplug sequences
  (remove_cpu() -> offload/deoffload -> add_cpu()).
- Remove __init from rcu_organize_nocb_kthreads to allow runtime
  reconfiguration of the callback management hierarchy.

This enables a true "Zero-Conf" isolation experience where any CPU
can be fully isolated at runtime regardless of boot parameters.
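
The per-CPU decision taken by the notifier can be summarized by the
following userspace sketch. It only models the choice between
offloading, deoffloading and leaving a CPU alone; the hotplug sequence
and kthread setup are omitted. The names nocb_action and nocb_decide()
are hypothetical, and the housekeeping mask is a 64-bit word, so cpu
must be below 64.

```c
#include <stdbool.h>
#include <stdint.h>

enum nocb_action { NOCB_KEEP, NOCB_OFFLOAD, NOCB_DEOFFLOAD };

/*
 * Decide what to do with one CPU after an HK_TYPE_RCU mask update.
 * A CPU missing from the housekeeping mask is isolated and should be
 * offloaded; a CPU present in the mask should handle its own callbacks.
 */
static enum nocb_action nocb_decide(int cpu, uint64_t hk_mask, bool offloaded)
{
	bool isolated = !(hk_mask & ((uint64_t)1 << cpu));

	if (isolated && !offloaded)
		return NOCB_OFFLOAD;	/* transition to NOCB mode */
	if (!isolated && offloaded)
		return NOCB_DEOFFLOAD;	/* transition back to CB mode */
	return NOCB_KEEP;		/* already in the desired state */
}
```

With a housekeeping mask of 0x3, CPU 2 is offloaded if it is not
already, CPU 0 is deoffloaded if it was, and CPUs already in the
desired state are left untouched.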
---
 kernel/rcu/rcu.h       |  4 +++
 kernel/rcu/tree.c      | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/rcu/tree.h      |  2 +-
 kernel/rcu/tree_nocb.h | 27 ++++++++++++------
 4 files changed, 99 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 9cf01832a6c3d..fa9de9a3918b1 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -658,8 +658,12 @@ unsigned long srcu_batches_completed(struct srcu_struct *sp);
 #endif // #else // #ifdef CONFIG_TINY_SRCU
 
 #ifdef CONFIG_RCU_NOCB_CPU
+void rcu_init_nocb_dynamic(void);
+void rcu_spawn_cpu_nocb_kthread(int cpu);
 void rcu_bind_current_to_nocb(void);
 #else
+static inline void rcu_init_nocb_dynamic(void) { }
+static inline void rcu_spawn_cpu_nocb_kthread(int cpu) { }
 static inline void rcu_bind_current_to_nocb(void) { }
 #endif
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 293bbd9ac3f4e..3fd12ac20957f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -48,6 +48,7 @@
 #include <linux/delay.h>
 #include <linux/random.h>
 #include <linux/trace_events.h>
+#include <linux/sched/isolation.h>
 #include <linux/suspend.h>
 #include <linux/ftrace.h>
 #include <linux/tick.h>
@@ -4916,4 +4917,79 @@ void __init rcu_init(void)
 #include "tree_stall.h"
 #include "tree_exp.h"
 #include "tree_nocb.h"
+
+#ifdef CONFIG_SMP
+static int rcu_housekeeping_reconfigure(struct notifier_block *nb,
+					unsigned long action, void *data)
+{
+	struct housekeeping_update *upd = data;
+	struct task_struct *t;
+	int cpu;
+
+	if (action != HK_UPDATE_MASK || upd->type != HK_TYPE_RCU)
+		return NOTIFY_OK;
+
+	rcu_init_nocb_dynamic();
+
+	for_each_possible_cpu(cpu) {
+		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+		bool isolated = !cpumask_test_cpu(cpu, upd->new_mask);
+		bool offloaded = rcu_rdp_is_offloaded(rdp);
+
+		if (isolated && !offloaded) {
+			/* Transition to NOCB */
+			pr_info("rcu: CPU %d transitioning to NOCB mode\n", cpu);
+			if (cpu_online(cpu)) {
+				remove_cpu(cpu);
+				rcu_spawn_cpu_nocb_kthread(cpu);
+				rcu_nocb_cpu_offload(cpu);
+				add_cpu(cpu);
+			} else {
+				rcu_spawn_cpu_nocb_kthread(cpu);
+				rcu_nocb_cpu_offload(cpu);
+			}
+		} else if (!isolated && offloaded) {
+			/* Transition to CB */
+			pr_info("rcu: CPU %d transitioning to CB mode\n", cpu);
+			if (cpu_online(cpu)) {
+				remove_cpu(cpu);
+				rcu_nocb_cpu_deoffload(cpu);
+				add_cpu(cpu);
+			} else {
+				rcu_nocb_cpu_deoffload(cpu);
+			}
+		}
+	}
+
+	t = READ_ONCE(rcu_state.gp_kthread);
+	if (t)
+		housekeeping_affine(t, HK_TYPE_RCU);
+
+#ifdef CONFIG_TASKS_RCU
+	t = get_rcu_tasks_gp_kthread();
+	if (t)
+		housekeeping_affine(t, HK_TYPE_RCU);
+#endif
+
+#ifdef CONFIG_TASKS_RUDE_RCU
+	t = get_rcu_tasks_rude_gp_kthread();
+	if (t)
+		housekeeping_affine(t, HK_TYPE_RCU);
+#endif
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block rcu_housekeeping_nb = {
+	.notifier_call = rcu_housekeeping_reconfigure,
+};
+
+static int __init rcu_init_housekeeping_notifier(void)
+{
+	housekeeping_register_notifier(&rcu_housekeeping_nb);
+	return 0;
+}
+late_initcall(rcu_init_housekeeping_notifier);
+#endif
+
 #include "tree_plugin.h"
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index b8bbe7960cda7..5322656a5a359 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -518,7 +518,7 @@ static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
 				       unsigned long flags);
 static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp);
 #ifdef CONFIG_RCU_NOCB_CPU
-static void __init rcu_organize_nocb_kthreads(void);
+static void rcu_organize_nocb_kthreads(void);
 
 /*
  * Disable IRQs before checking offloaded state so that local
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index e6cd56603cad4..9f5f446e70b3f 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1285,6 +1285,22 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 }
 #endif // #ifdef CONFIG_RCU_LAZY
 
+void rcu_init_nocb_dynamic(void)
+{
+	if (rcu_state.nocb_is_setup)
+		return;
+
+	if (!cpumask_available(rcu_nocb_mask)) {
+		if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
+			pr_info("rcu_nocb_mask allocation failed, dynamic offloading disabled.\n");
+			return;
+		}
+	}
+
+	rcu_state.nocb_is_setup = true;
+	rcu_organize_nocb_kthreads();
+}
+
 void __init rcu_init_nohz(void)
 {
 	int cpu;
@@ -1302,15 +1318,8 @@ void __init rcu_init_nohz(void)
 		cpumask = cpu_possible_mask;
 
 	if (cpumask) {
-		if (!cpumask_available(rcu_nocb_mask)) {
-			if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
-				pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n");
-				return;
-			}
-		}
-
+		rcu_init_nocb_dynamic();
 		cpumask_or(rcu_nocb_mask, rcu_nocb_mask, cpumask);
-		rcu_state.nocb_is_setup = true;
 	}
 
 	if (!rcu_state.nocb_is_setup)
@@ -1442,7 +1451,7 @@ module_param(rcu_nocb_gp_stride, int, 0444);
 /*
  * Initialize GP-CB relationships for all no-CBs CPU.
  */
-static void __init rcu_organize_nocb_kthreads(void)
+static void rcu_organize_nocb_kthreads(void)
 {
 	int cpu;
 	bool firsttime = true;

-- 
2.43.0



* [PATCH 06/15] sched/core: Dynamically update scheduler domain housekeeping mask
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
                   ` (4 preceding siblings ...)
  2026-03-25  9:09 ` [PATCH 05/15] rcu: Support runtime NOCB initialization and dynamic offloading Qiliang Yuan
@ 2026-03-25  9:09 ` Qiliang Yuan
  2026-03-25 14:00   ` Peter Zijlstra
  2026-03-25  9:09 ` [PATCH 07/15] watchdog: Allow runtime toggle of lockup detector affinity Qiliang Yuan
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)

Scheduler domains rely on HK_TYPE_DOMAIN to identify which CPUs are
isolated from general load balancing. Currently, these boundaries are
static and determined only during boot-time domain initialization.

Trigger a scheduler domain rebuild when the HK_TYPE_DOMAIN mask changes.

This ensures that scheduler isolation boundaries can be reconfigured
at runtime via the DHEI sysfs interface.

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
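Note for readers: the subscriber pattern this patch relies on can be
sketched in plain userspace C. This is only an illustration under
stated assumptions, not the kernel notifier API: hk_kind, hk_event,
hk_listener, hk_listen() and hk_publish() are hypothetical names, and
the counter stands in for the call to rebuild_sched_domains().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical update kinds; the series uses enum hk_type instead. */
enum hk_kind { HK_KIND_DOMAIN, HK_KIND_TICK };

/* Hypothetical payload, loosely modelled on struct housekeeping_update. */
struct hk_event {
	enum hk_kind kind;
	unsigned long new_mask;
};

struct hk_listener {
	int (*call)(const struct hk_event *ev);
	struct hk_listener *next;
};

static struct hk_listener *hk_listeners;

/* Add a listener at the head of the singly linked list. */
void hk_listen(struct hk_listener *l)
{
	assert(l != NULL && l->call != NULL);
	l->next = hk_listeners;
	hk_listeners = l;
}

/* Deliver one event to every listener; returns how many acted on it. */
int hk_publish(const struct hk_event *ev)
{
	struct hk_listener *l;
	int handled = 0;

	for (l = hk_listeners; l != NULL; l = l->next)
		handled += l->call(ev);
	return handled;
}

/* Stand-in for the scheduler side: count the rebuilds it would trigger. */
int domain_rebuilds;

static int sched_listener_call(const struct hk_event *ev)
{
	if (ev->kind != HK_KIND_DOMAIN)
		return 0;	/* not a domain update, nothing to do */
	domain_rebuilds++;
	return 1;
}

struct hk_listener sched_listener = { .call = sched_listener_call };
```

Every subscriber sees every update and filters on the type itself,
which is the same shape as the callback in the diff below.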
 kernel/sched/core.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 045f83ad261e2..ddf9951f1438c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -39,6 +39,7 @@
 #include <linux/sched/nohz.h>
 #include <linux/sched/rseq_api.h>
 #include <linux/sched/rt.h>
+#include <linux/sched/topology.h>
 
 #include <linux/blkdev.h>
 #include <linux/context_tracking.h>
@@ -10832,3 +10833,25 @@ void sched_change_end(struct sched_change_ctx *ctx)
 		p->sched_class->prio_changed(rq, p, ctx->prio);
 	}
 }
+
+static int sched_housekeeping_update(struct notifier_block *nb,
+				     unsigned long action, void *data)
+{
+	struct housekeeping_update *update = data;
+
+	if (action == HK_UPDATE_MASK && update->type == HK_TYPE_DOMAIN)
+		rebuild_sched_domains();
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block sched_housekeeping_nb = {
+	.notifier_call = sched_housekeeping_update,
+};
+
+static int __init sched_housekeeping_init(void)
+{
+	housekeeping_register_notifier(&sched_housekeeping_nb);
+	return 0;
+}
+late_initcall(sched_housekeeping_init);

-- 
2.43.0



* [PATCH 07/15] watchdog: Allow runtime toggle of lockup detector affinity
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
                   ` (5 preceding siblings ...)
  2026-03-25  9:09 ` [PATCH 06/15] sched/core: Dynamically update scheduler domain housekeeping mask Qiliang Yuan
@ 2026-03-25  9:09 ` Qiliang Yuan
  2026-03-25 14:03   ` Peter Zijlstra
  2026-03-25  9:09 ` [PATCH 08/15] workqueue: Support dynamic housekeeping mask updates Qiliang Yuan
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
	Shuah Khan
  Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan

The hardlockup detector threads are affined to CPUs based on the
HK_TYPE_TIMER housekeeping mask at boot. If this mask is updated at
runtime, these threads remain on their original CPUs, potentially
running on isolated cores.

Synchronize watchdog thread affinity with HK_TYPE_TIMER updates.

This ensures that hardlockup detector threads correctly follow the
dynamic housekeeping boundaries for timers.

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
 kernel/watchdog.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 366122f4a0f87..ef93795729697 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -26,6 +26,7 @@
 #include <linux/sysctl.h>
 #include <linux/tick.h>
 #include <linux/sys_info.h>
+#include <linux/sched/isolation.h>
 
 #include <linux/sched/clock.h>
 #include <linux/sched/debug.h>
@@ -1359,6 +1360,29 @@ static int __init lockup_detector_check(void)
 }
 late_initcall_sync(lockup_detector_check);
 
+static int watchdog_housekeeping_reconfigure(struct notifier_block *nb,
+					    unsigned long action, void *data)
+{
+	if (action == HK_UPDATE_MASK) {
+		struct housekeeping_update *upd = data;
+		unsigned int type = upd->type;
+
+		if (type == HK_TYPE_TIMER) {
+			mutex_lock(&watchdog_mutex);
+			cpumask_copy(&watchdog_cpumask,
+				     housekeeping_cpumask(HK_TYPE_TIMER));
+			proc_watchdog_update(false);
+			mutex_unlock(&watchdog_mutex);
+		}
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block watchdog_housekeeping_nb = {
+	.notifier_call = watchdog_housekeeping_reconfigure,
+};
+
 void __init lockup_detector_init(void)
 {
 	if (tick_nohz_full_enabled())
@@ -1373,4 +1397,5 @@ void __init lockup_detector_init(void)
 		allow_lockup_detector_init_retry = true;
 
 	lockup_detector_setup();
+	housekeeping_register_notifier(&watchdog_housekeeping_nb);
 }

-- 
2.43.0



* [PATCH 08/15] workqueue: Support dynamic housekeeping mask updates
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
                   ` (6 preceding siblings ...)
  2026-03-25  9:09 ` [PATCH 07/15] watchdog: Allow runtime toggle of lockup detector affinity Qiliang Yuan
@ 2026-03-25  9:09 ` Qiliang Yuan
  2026-03-25  9:09 ` [PATCH 09/15] mm/compaction: Support dynamic housekeeping mask updates for kcompactd Qiliang Yuan
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
	Shuah Khan
  Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan

Unbound workqueues use HK_TYPE_WQ and HK_TYPE_DOMAIN to determine
their default CPU affinity. These boundaries are currently static and
only enforced during early boot.

Implement a housekeeping notifier to update unbound workqueue affinity.

This enables unbound workqueue tasks to respect dynamic isolation
boundaries at runtime.

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
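Note for readers: the mask arithmetic in the notifier below can be
modelled in userspace with plain bit operations. A minimal sketch,
assuming at most 64 CPUs and using a hypothetical cpu_bits type in
place of struct cpumask; wq_effective_unbound() and wq_isolated() are
illustrative names, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit N set means CPU N is included. */
typedef uint64_t cpu_bits;

/*
 * Start from all possible CPUs and narrow by each housekeeping mask
 * that is non-empty. An empty mask is treated as "no restriction",
 * as in the cpumask_empty() checks of the diff.
 */
cpu_bits wq_effective_unbound(cpu_bits possible, cpu_bits hk_wq,
			      cpu_bits hk_domain)
{
	cpu_bits mask = possible;

	assert(possible != 0);
	if (hk_wq)
		mask &= hk_wq;
	if (hk_domain)
		mask &= hk_domain;
	return mask;
}

/* Isolated CPUs: the possible CPUs that are outside the domain mask. */
cpu_bits wq_isolated(cpu_bits possible, cpu_bits hk_domain)
{
	return possible & ~hk_domain;
}
```

With 8 possible CPUs, a workqueue mask of 0-3 and a domain mask of 2-5,
the sketch yields CPUs 2-3 for unbound work and 0-1,6-7 as isolated.
The sketch does not guard against the two masks being disjoint, in
which case the result is empty.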
 kernel/workqueue.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 253311af47c6d..ef3ef7e3fe81f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7904,6 +7904,47 @@ static void __init wq_cpu_intensive_thresh_init(void)
 	wq_cpu_intensive_thresh_us = thresh;
 }
 
+static int wq_housekeeping_reconfigure(struct notifier_block *nb,
+				     unsigned long action, void *data)
+{
+	if (action == HK_UPDATE_MASK) {
+		struct housekeeping_update *upd = data;
+		unsigned int type = upd->type;
+
+		if (type == HK_TYPE_WQ || type == HK_TYPE_DOMAIN) {
+			cpumask_var_t cpumask;
+
+			if (!alloc_cpumask_var(&cpumask, GFP_KERNEL)) {
+				pr_warn("workqueue: failed to allocate cpumask for housekeeping update\n");
+				return NOTIFY_BAD;
+			}
+
+			cpumask_copy(cpumask, cpu_possible_mask);
+			if (!cpumask_empty(housekeeping_cpumask(HK_TYPE_WQ)))
+				cpumask_and(cpumask, cpumask, housekeeping_cpumask(HK_TYPE_WQ));
+			if (!cpumask_empty(housekeeping_cpumask(HK_TYPE_DOMAIN)))
+				cpumask_and(cpumask, cpumask, housekeeping_cpumask(HK_TYPE_DOMAIN));
+
+			workqueue_set_unbound_cpumask(cpumask);
+
+			if (type == HK_TYPE_DOMAIN) {
+				apply_wqattrs_lock();
+				cpumask_andnot(wq_isolated_cpumask, cpu_possible_mask,
+						housekeeping_cpumask(HK_TYPE_DOMAIN));
+				apply_wqattrs_unlock();
+			}
+
+			free_cpumask_var(cpumask);
+		}
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block wq_housekeeping_nb = {
+	.notifier_call = wq_housekeeping_reconfigure,
+};
+
 /**
  * workqueue_init - bring workqueue subsystem fully online
  *
@@ -7964,6 +8005,7 @@ void __init workqueue_init(void)
 
 	wq_online = true;
 	wq_watchdog_init();
+	housekeeping_register_notifier(&wq_housekeeping_nb);
 }
 
 /*

-- 
2.43.0



* [PATCH 09/15] mm/compaction: Support dynamic housekeeping mask updates for kcompactd
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
                   ` (7 preceding siblings ...)
  2026-03-25  9:09 ` [PATCH 08/15] workqueue: Support dynamic housekeeping mask updates Qiliang Yuan
@ 2026-03-25  9:09 ` Qiliang Yuan
  2026-03-25  9:09 ` [PATCH 10/15] tick/nohz: Transition to dynamic full dynticks state management Qiliang Yuan
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
	Shuah Khan
  Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan

The kcompactd threads are affined to housekeeping CPUs (HK_TYPE_KTHREAD)
at boot to avoid interference with isolated workloads. Currently,
these threads do not migrate when the housekeeping boundaries are
reconfigured at runtime.

Implement a housekeeping notifier to synchronize kcompactd affinity.

This ensures that background compaction threads honor the dynamic
isolation boundaries configured via the DHEI sysfs interface.

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
 mm/compaction.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 1e8f8eca318c6..574ee3c6dc942 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -24,6 +24,7 @@
 #include <linux/page_owner.h>
 #include <linux/psi.h>
 #include <linux/cpuset.h>
+#include <linux/sched/isolation.h>
 #include "internal.h"
 
 #ifdef CONFIG_COMPACTION
@@ -3246,6 +3247,7 @@ void __meminit kcompactd_run(int nid)
 		pr_err("Failed to start kcompactd on node %d\n", nid);
 		pgdat->kcompactd = NULL;
 	} else {
+		housekeeping_affine(pgdat->kcompactd, HK_TYPE_KTHREAD);
 		wake_up_process(pgdat->kcompactd);
 	}
 }
@@ -3320,6 +3322,30 @@ static const struct ctl_table vm_compaction[] = {
 	},
 };
 
+static int kcompactd_housekeeping_reconfigure(struct notifier_block *nb,
+					      unsigned long action, void *data)
+{
+	struct housekeeping_update *upd = data;
+	unsigned int type = upd->type;
+
+	if (action == HK_UPDATE_MASK && type == HK_TYPE_KTHREAD) {
+		int nid;
+
+		for_each_node_state(nid, N_MEMORY) {
+			pg_data_t *pgdat = NODE_DATA(nid);
+
+			if (pgdat->kcompactd)
+				housekeeping_affine(pgdat->kcompactd, HK_TYPE_KTHREAD);
+		}
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block kcompactd_housekeeping_nb = {
+	.notifier_call = kcompactd_housekeeping_reconfigure,
+};
+
 static int __init kcompactd_init(void)
 {
 	int nid;
@@ -3327,6 +3353,7 @@ static int __init kcompactd_init(void)
 	for_each_node_state(nid, N_MEMORY)
 		kcompactd_run(nid);
 	register_sysctl_init("vm", vm_compaction);
+	housekeeping_register_notifier(&kcompactd_housekeeping_nb);
 	return 0;
 }
 subsys_initcall(kcompactd_init)

-- 
2.43.0



* [PATCH 10/15] tick/nohz: Transition to dynamic full dynticks state management
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
                   ` (8 preceding siblings ...)
  2026-03-25  9:09 ` [PATCH 09/15] mm/compaction: Support dynamic housekeeping mask updates for kcompactd Qiliang Yuan
@ 2026-03-25  9:09 ` Qiliang Yuan
  2026-03-25  9:09 ` [PATCH 11/15] sched/isolation: Implement SMT-aware isolation and safety guards Qiliang Yuan
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
	Shuah Khan
  Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan

Context:
Full dynticks (NOHZ_FULL) is normally a static configuration fixed at
boot time. DHEI extends it to support runtime activation.

Problem:
Switching to NOHZ_FULL at runtime requires careful synchronization
of context tracking and housekeeping state. Re-invoking the setup logic
could lead to inconsistencies or warnings, and RCU dependency checks
often prevented tick suppression in "Zero-Conf" setups.

Solution:
- Replace the static tick_nohz_full_enabled() check with the dynamic
  tick_nohz_full_running state variable.
- Refactor tick_nohz_full_setup() to be safe for runtime invocation,
  adding guards against re-initialization and ensuring IRQ work
  interrupt support.
- Pre-activate context tracking at boot (shadow init) for all
  possible CPUs to avoid instruction flow issues during dynamic
  transitions.
- Register a housekeeping notifier that recomputes tick_nohz_full_mask
  on HK_TYPE_TICK updates and moves the timer duty to a housekeeping
  CPU when the current one is no longer a housekeeper.
- Restore the standard rcu_needs_cpu() checks now that RCU supports
  native dynamic NOCB mode switching.

This provides the core state machine for reliable, on-demand tick
suppression and high-performance isolation.
---
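Note for readers: the timer-duty handover performed by the notifier
below can be sketched in userspace C. This is only a model under
stated assumptions: cpu_bits is a hypothetical 64-bit stand-in for a
cpumask, TIMER_NONE stands in for TICK_DO_TIMER_NONE, and
pick_timer_cpu() is an illustrative name that does not appear in the
patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpu_bits;	/* bit N set means CPU N is included */

#define TIMER_NONE (-1)		/* stand-in for TICK_DO_TIMER_NONE */

/* Lowest CPU number in the mask, or -1 if the mask is empty. */
static int first_cpu(cpu_bits mask)
{
	int cpu;

	for (cpu = 0; cpu < 64; cpu++)
		if (mask & ((cpu_bits)1 << cpu))
			return cpu;
	return -1;
}

/*
 * Decide which CPU owns the timekeeping duty after the tick
 * housekeeping mask changes. The duty only moves when some CPU is
 * nohz_full and the current owner is unset or not a housekeeper.
 */
int pick_timer_cpu(int current_cpu, cpu_bits housekeeping, cpu_bits possible)
{
	cpu_bits nohz_full = possible & ~housekeeping;
	int next;

	assert(current_cpu == TIMER_NONE ||
	       (current_cpu >= 0 && current_cpu < 64));

	if (!nohz_full)
		return current_cpu;	/* nohz_full is not running */
	if (current_cpu != TIMER_NONE &&
	    (housekeeping & ((cpu_bits)1 << current_cpu)))
		return current_cpu;	/* owner is already a housekeeper */

	next = first_cpu(housekeeping);
	return next >= 0 ? next : current_cpu;
}
```

For example, with CPUs 0-3 possible and 0-1 as housekeepers, a duty
currently held by CPU 3 moves to CPU 0, while a duty held by CPU 1
stays where it is.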
 kernel/time/tick-sched.c | 130 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 105 insertions(+), 25 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 2f8a7923fa279..dee42cea259a9 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -27,6 +27,7 @@
 #include <linux/posix-timers.h>
 #include <linux/context_tracking.h>
 #include <linux/mm.h>
+#include <linux/sched/isolation.h>
 
 #include <asm/irq_regs.h>
 
@@ -621,13 +622,25 @@ void __tick_nohz_task_switch(void)
 /* Get the boot-time nohz CPU list from the kernel parameters. */
 void __init tick_nohz_full_setup(cpumask_var_t cpumask)
 {
-	alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+	if (!tick_nohz_full_mask) {
+		if (!slab_is_available())
+			alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+		else
+			zalloc_cpumask_var(&tick_nohz_full_mask, GFP_KERNEL);
+	}
 	cpumask_copy(tick_nohz_full_mask, cpumask);
 	tick_nohz_full_running = true;
 }
 
 bool tick_nohz_cpu_hotpluggable(unsigned int cpu)
 {
+	/*
+	 * Allow all CPUs to go down during shutdown/reboot to avoid
+	 * interfering with the final power-off sequence.
+	 */
+	if (system_state > SYSTEM_RUNNING)
+		return true;
+
 	/*
 	 * The 'tick_do_timer_cpu' CPU handles housekeeping duty (unbound
 	 * timers, workqueues, timekeeping, ...) on behalf of full dynticks
@@ -643,45 +656,112 @@ static int tick_nohz_cpu_down(unsigned int cpu)
 	return tick_nohz_cpu_hotpluggable(cpu) ? 0 : -EBUSY;
 }
 
+static int tick_nohz_housekeeping_reconfigure(struct notifier_block *nb,
+					     unsigned long action, void *data)
+{
+	struct housekeeping_update *upd = data;
+	int cpu;
+
+	if (action == HK_UPDATE_MASK && upd->type == HK_TYPE_TICK) {
+		cpumask_var_t non_housekeeping_mask;
+
+		if (!alloc_cpumask_var(&non_housekeeping_mask, GFP_KERNEL))
+			return NOTIFY_BAD;
+
+		cpumask_andnot(non_housekeeping_mask, cpu_possible_mask, upd->new_mask);
+
+		if (!tick_nohz_full_mask) {
+			if (!zalloc_cpumask_var(&tick_nohz_full_mask, GFP_KERNEL)) {
+				free_cpumask_var(non_housekeeping_mask);
+				return NOTIFY_BAD;
+			}
+		}
+
+		/* Kick all CPUs to re-evaluate tick dependency before change */
+		for_each_online_cpu(cpu)
+			tick_nohz_full_kick_cpu(cpu);
+
+		cpumask_copy(tick_nohz_full_mask, non_housekeeping_mask);
+		tick_nohz_full_running = !cpumask_empty(tick_nohz_full_mask);
+
+		/*
+		 * If nohz_full is running, the timer duty must be on a housekeeper.
+		 * If the current timer CPU is not a housekeeper, or no duty is assigned,
+		 * pick the first housekeeper and assign it.
+		 */
+		if (tick_nohz_full_running) {
+			int timer_cpu = READ_ONCE(tick_do_timer_cpu);
+			if (timer_cpu == TICK_DO_TIMER_NONE ||
+			    !cpumask_test_cpu(timer_cpu, upd->new_mask)) {
+				int next_timer = cpumask_first(upd->new_mask);
+				if (next_timer < nr_cpu_ids)
+					WRITE_ONCE(tick_do_timer_cpu, next_timer);
+			}
+		}
+
+		/* Kick all CPUs again to apply new nohz full state */
+		for_each_online_cpu(cpu)
+			tick_nohz_full_kick_cpu(cpu);
+
+		free_cpumask_var(non_housekeeping_mask);
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block tick_nohz_housekeeping_nb = {
+	.notifier_call = tick_nohz_housekeeping_reconfigure,
+};
+
 void __init tick_nohz_init(void)
 {
 	int cpu, ret;
 
-	if (!tick_nohz_full_running)
-		return;
-
-	/*
-	 * Full dynticks uses IRQ work to drive the tick rescheduling on safe
-	 * locking contexts. But then we need IRQ work to raise its own
-	 * interrupts to avoid circular dependency on the tick.
-	 */
-	if (!arch_irq_work_has_interrupt()) {
-		pr_warn("NO_HZ: Can't run full dynticks because arch doesn't support IRQ work self-IPIs\n");
-		cpumask_clear(tick_nohz_full_mask);
-		tick_nohz_full_running = false;
-		return;
+	if (!tick_nohz_full_mask) {
+		if (!slab_is_available())
+			alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+		else
+			zalloc_cpumask_var(&tick_nohz_full_mask, GFP_KERNEL);
 	}
 
-	if (IS_ENABLED(CONFIG_PM_SLEEP_SMP) &&
-			!IS_ENABLED(CONFIG_PM_SLEEP_SMP_NONZERO_CPU)) {
-		cpu = smp_processor_id();
+	housekeeping_register_notifier(&tick_nohz_housekeeping_nb);
 
-		if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
-			pr_warn("NO_HZ: Clearing %d from nohz_full range "
-				"for timekeeping\n", cpu);
-			cpumask_clear_cpu(cpu, tick_nohz_full_mask);
+	if (tick_nohz_full_running) {
+		/*
+		 * Full dynticks uses IRQ work to drive the tick rescheduling on safe
+		 * locking contexts. But then we need IRQ work to raise its own
+		 * interrupts to avoid circular dependency on the tick.
+		 */
+		if (!arch_irq_work_has_interrupt()) {
+			pr_warn("NO_HZ: Can't run full dynticks because arch doesn't support IRQ work self-IPIs\n");
+			cpumask_clear(tick_nohz_full_mask);
+			tick_nohz_full_running = false;
+			goto out;
 		}
+
+		if (IS_ENABLED(CONFIG_PM_SLEEP_SMP) &&
+				!IS_ENABLED(CONFIG_PM_SLEEP_SMP_NONZERO_CPU)) {
+			cpu = smp_processor_id();
+
+			if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
+				pr_warn("NO_HZ: Clearing %d from nohz_full range "
+					"for timekeeping\n", cpu);
+				cpumask_clear_cpu(cpu, tick_nohz_full_mask);
+			}
+		}
+
+		pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
+			cpumask_pr_args(tick_nohz_full_mask));
 	}
 
-	for_each_cpu(cpu, tick_nohz_full_mask)
+out:
+	for_each_possible_cpu(cpu)
 		ct_cpu_track_user(cpu);
 
 	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
 					"kernel/nohz:predown", NULL,
 					tick_nohz_cpu_down);
 	WARN_ON(ret < 0);
-	pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
-		cpumask_pr_args(tick_nohz_full_mask));
 }
 #endif /* #ifdef CONFIG_NO_HZ_FULL */
 
@@ -1200,7 +1280,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	if (unlikely(report_idle_softirq()))
 		return false;
 
-	if (tick_nohz_full_enabled()) {
+	if (tick_nohz_full_running) {
 		int tick_cpu = READ_ONCE(tick_do_timer_cpu);
 
 		/*

-- 
2.43.0



* [PATCH 11/15] sched/isolation: Implement SMT-aware isolation and safety guards
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
                   ` (9 preceding siblings ...)
  2026-03-25  9:09 ` [PATCH 10/15] tick/nohz: Transition to dynamic full dynticks state management Qiliang Yuan
@ 2026-03-25  9:09 ` Qiliang Yuan
  2026-03-25  9:09 ` [PATCH 12/15] sched/isolation: Bridge boot-time parameters with dynamic isolation Qiliang Yuan
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
	Shuah Khan
  Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan

Manual isolation of single SMT siblings can lead to resource
contention and inconsistent performance. Furthermore, userspace might
accidentally isolate all available CPUs, leading to a system lockup.

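Enhance DHEI with SMT-aware grouping and safety checks: add an
smt_aware_mode sysfs toggle that, when enabled, drops from a requested
housekeeping mask any CPU whose SMT siblings are not all included, and
reject any mask that contains no online CPU.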

These enhancements ensure that hardware resource boundaries are
respected and prevent catastrophic misconfiguration of the system.

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
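Note for readers: the SMT filtering and the safety check in
housekeeping_store() below can be modelled in userspace C. A minimal
sketch, assuming at most 64 CPUs; cpu_bits, smt_filter() and
hk_mask_is_safe() are hypothetical names, and the siblings[] array
stands in for topology_sibling_cpumask().

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpu_bits;	/* bit N set means CPU N is included */

/*
 * siblings[cpu] holds the SMT siblings of cpu, including cpu itself.
 * Returns the requested mask minus every CPU that has at least one
 * sibling missing from the request, so only whole cores remain.
 */
cpu_bits smt_filter(cpu_bits requested, const cpu_bits *siblings,
		    unsigned int ncpus)
{
	cpu_bits result = requested;
	unsigned int cpu;

	assert(ncpus <= 64);
	for (cpu = 0; cpu < ncpus; cpu++) {
		cpu_bits bit = (cpu_bits)1 << cpu;

		if (!(requested & bit))
			continue;
		/* A sibling outside the request disqualifies this CPU. */
		if ((siblings[cpu] & requested) != siblings[cpu])
			result &= ~bit;
	}
	return result;
}

/* A housekeeping mask is usable only if it contains an online CPU. */
int hk_mask_is_safe(cpu_bits mask, cpu_bits online)
{
	return (mask & online) != 0;
}
```

With two cores of two threads each (CPUs 0-1 and 2-3), a request for
CPUs 0-2 is reduced to 0-1, and a request for CPUs 0 and 2 is reduced
to the empty mask, which the safety check then rejects.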
 kernel/sched/isolation.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 180 insertions(+)

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index e7a21023726df..4a5967837e8de 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -10,6 +10,7 @@
 #include <linux/sched/isolation.h>
 #include <linux/mutex.h>
 #include <linux/notifier.h>
+#include <linux/topology.h>
 #include "sched.h"
 
 enum hk_flags {
@@ -29,6 +30,30 @@ struct housekeeping {
 };
 
 static struct housekeeping housekeeping;
+static bool housekeeping_smt_aware;
+
+static ssize_t smt_aware_show(struct kobject *kobj,
+			     struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%d\n", housekeeping_smt_aware);
+}
+
+static ssize_t smt_aware_store(struct kobject *kobj,
+			      struct kobj_attribute *attr,
+			      const char *buf, size_t count)
+{
+	bool val;
+
+	if (kstrtobool(buf, &val))
+		return -EINVAL;
+
+	housekeeping_smt_aware = val;
+
+	return count;
+}
+
+static struct kobj_attribute smt_aware_attr =
+	__ATTR(smt_aware_mode, 0644, smt_aware_show, smt_aware_store);
 
 bool housekeeping_enabled(enum hk_type type)
 {
@@ -110,6 +135,161 @@ static int housekeeping_update_notify(enum hk_type type, const struct cpumask *n
 	return blocking_notifier_call_chain(&housekeeping_notifier_list, HK_UPDATE_MASK, &update);
 }
 
+static const char * const hk_type_names[] = {
+	[HK_TYPE_TIMER]		= "timer",
+	[HK_TYPE_RCU]		= "rcu",
+	[HK_TYPE_MISC]		= "misc",
+	[HK_TYPE_TICK]		= "tick",
+	[HK_TYPE_DOMAIN]	= "domain",
+	[HK_TYPE_WQ]		= "workqueue",
+	[HK_TYPE_MANAGED_IRQ]	= "managed_irq",
+	[HK_TYPE_KTHREAD]	= "kthread",
+};
+
+struct hk_attribute {
+	struct kobj_attribute kattr;
+	enum hk_type type;
+};
+
+#define to_hk_attr(_kattr) container_of(_kattr, struct hk_attribute, kattr)
+
+static ssize_t housekeeping_show(struct kobject *kobj,
+				struct kobj_attribute *attr, char *buf)
+{
+	struct hk_attribute *hk_attr = to_hk_attr(attr);
+	const struct cpumask *mask = housekeeping_cpumask(hk_attr->type);
+
+	return cpumap_print_to_pagebuf(true, buf, mask);
+}
+
+static ssize_t housekeeping_store(struct kobject *kobject,
+				 struct kobj_attribute *attr,
+				 const char *buf, size_t count)
+{
+	struct hk_attribute *hk_attr = to_hk_attr(attr);
+	enum hk_type type = hk_attr->type;
+	cpumask_var_t new_mask;
+	int err;
+
+	if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	err = cpulist_parse(buf, new_mask);
+	if (err)
+		goto out_free;
+
+	/* Safety check: must have at least one online CPU for housekeeping */
+	if (!cpumask_intersects(new_mask, cpu_online_mask)) {
+		err = -EINVAL;
+		goto out_free;
+	}
+
+	if (housekeeping_smt_aware) {
+		int cpu, sibling;
+		cpumask_var_t tmp_mask;
+
+		if (!alloc_cpumask_var(&tmp_mask, GFP_KERNEL)) {
+			err = -ENOMEM;
+			goto out_free;
+		}
+
+		cpumask_copy(tmp_mask, new_mask);
+		for_each_cpu(cpu, tmp_mask) {
+			for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
+				if (!cpumask_test_cpu(sibling, tmp_mask)) {
+					/* SMT sibling should stay grouped */
+					cpumask_clear_cpu(cpu, new_mask);
+					break;
+				}
+			}
+		}
+		free_cpumask_var(tmp_mask);
+
+		/* Re-check after SMT sync */
+		if (!cpumask_intersects(new_mask, cpu_online_mask)) {
+			err = -EINVAL;
+			goto out_free;
+		}
+	}
+
+	mutex_lock(&housekeeping_mutex);
+
+	if (!housekeeping.cpumasks[type]) {
+		if (!alloc_cpumask_var(&housekeeping.cpumasks[type], GFP_KERNEL)) {
+			err = -ENOMEM;
+			goto out_unlock;
+		}
+	}
+
+	if (cpumask_equal(housekeeping.cpumasks[type], new_mask)) {
+		err = 0;
+		goto out_unlock;
+	}
+
+	cpumask_copy(housekeeping.cpumasks[type], new_mask);
+	housekeeping.flags |= BIT(type);
+	static_branch_enable(&housekeeping_overridden);
+
+	housekeeping_update_notify(type, new_mask);
+
+	err = count;
+
+out_unlock:
+	mutex_unlock(&housekeeping_mutex);
+out_free:
+	free_cpumask_var(new_mask);
+	return err < 0 ? err : count;
+}
+
+static struct hk_attribute housekeeping_attrs[HK_TYPE_MAX];
+static struct attribute *housekeeping_attr_ptr[HK_TYPE_MAX + 1];
+
+static const struct attribute_group housekeeping_attr_group = {
+	.attrs = housekeeping_attr_ptr,
+};
+
+static int __init housekeeping_sysfs_init(void)
+{
+	struct kobject *housekeeping_kobj;
+	int i, j = 0;
+	int ret;
+
+	housekeeping_kobj = kobject_create_and_add("housekeeping", kernel_kobj);
+	if (!housekeeping_kobj)
+		return -ENOMEM;
+
+	for (i = 0; i < HK_TYPE_MAX; i++) {
+		if (!hk_type_names[i])
+			continue;
+
+		housekeeping_attrs[i].type = i;
+		sysfs_attr_init(&housekeeping_attrs[i].kattr.attr);
+		housekeeping_attrs[i].kattr.attr.name = hk_type_names[i];
+		housekeeping_attrs[i].kattr.attr.mode = 0644;
+		housekeeping_attrs[i].kattr.show = housekeeping_show;
+		housekeeping_attrs[i].kattr.store = housekeeping_store;
+		housekeeping_attr_ptr[j++] = &housekeeping_attrs[i].kattr.attr;
+	}
+	housekeeping_attr_ptr[j] = NULL;
+
+	ret = sysfs_create_group(housekeeping_kobj, &housekeeping_attr_group);
+	if (ret)
+		goto err_group;
+
+	ret = sysfs_create_file(housekeeping_kobj, &smt_aware_attr.attr);
+	if (ret)
+		goto err_file;
+
+	return 0;
+
+err_file:
+	sysfs_remove_group(housekeeping_kobj, &housekeeping_attr_group);
+err_group:
+	kobject_put(housekeeping_kobj);
+	return ret;
+}
+late_initcall(housekeeping_sysfs_init);
+
 void __init housekeeping_init(void)
 {
 	enum hk_type type;

-- 
2.43.0



* [PATCH 12/15] sched/isolation: Bridge boot-time parameters with dynamic isolation
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
                   ` (10 preceding siblings ...)
  2026-03-25  9:09 ` [PATCH 11/15] sched/isolation: Implement SMT-aware isolation and safety guards Qiliang Yuan
@ 2026-03-25  9:09 ` Qiliang Yuan
  2026-03-25  9:09 ` [PATCH 13/15] sched/isolation: Implement sysfs interface for dynamic housekeeping Qiliang Yuan
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
	Shuah Khan
  Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan

The boot-time parameters 'isolcpus' and 'nohz_full' currently initialize
housekeeping masks that cannot be easily updated at runtime. To support
DHEI, the scheduler's tick offload infrastructure must be ready for
dynamic enablement even if no isolation was requested at boot.

Enable unconditional boot-time initialization for tick offload.

This ensures that the infrastructure for remote ticks is always present,
allowing DHEI to safely toggle full dynticks mode at runtime.

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
 kernel/sched/core.c      | 5 +++++
 kernel/sched/isolation.c | 3 ---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ddf9951f1438c..d987ce03e7cc6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5688,6 +5688,9 @@ static void sched_tick_stop(int cpu)
 
 int __init sched_tick_offload_init(void)
 {
+	if (tick_work_cpu)
+		return 0;
+
 	tick_work_cpu = alloc_percpu(struct tick_work);
 	BUG_ON(!tick_work_cpu);
 	return 0;
@@ -8509,6 +8512,8 @@ void __init sched_init_smp(void)
 	current->flags &= ~PF_NO_SETAFFINITY;
 	sched_init_granularity();
 
+	sched_tick_offload_init();
+
 	init_sched_rt_class();
 	init_sched_dl_class();
 
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 4a5967837e8de..685cc0df1bd9f 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -299,9 +299,6 @@ void __init housekeeping_init(void)
 
 	static_branch_enable(&housekeeping_overridden);
 
-	if (housekeeping.flags & HK_FLAG_KERNEL_NOISE)
-		sched_tick_offload_init();
-
 	for_each_set_bit(type, &housekeeping.flags, HK_TYPE_MAX) {
 		/* We need at least one CPU to handle housekeeping work */
 		WARN_ON_ONCE(cpumask_empty(housekeeping.cpumasks[type]));

-- 
2.43.0



* [PATCH 13/15] sched/isolation: Implement sysfs interface for dynamic housekeeping
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
                   ` (11 preceding siblings ...)
  2026-03-25  9:09 ` [PATCH 12/15] sched/isolation: Bridge boot-time parameters with dynamic isolation Qiliang Yuan
@ 2026-03-25  9:09 ` Qiliang Yuan
  2026-03-25 14:04   ` Peter Zijlstra
  2026-03-25  9:09 ` [PATCH 14/15] Documentation: isolation: Document DHEI sysfs interfaces Qiliang Yuan
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
	Shuah Khan
  Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan

Subsystem housekeeping masks are currently static and can only be set
via boot-time parameters (isolcpus, nohz_full, etc.). There is no
userspace interface to reconfigure these boundaries at runtime.

Implement the DHEI sysfs interface under /sys/kernel/housekeeping.

This enables userspace to independently reconfigure different kernel
services' affinities without a reboot.

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
 kernel/sched/isolation.c | 89 ++++++++++++++++++++++++------------------------
 1 file changed, 45 insertions(+), 44 deletions(-)

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 685cc0df1bd9f..1c867784d155b 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -8,7 +8,12 @@
  *
  */
 #include <linux/sched/isolation.h>
+#include <linux/capability.h>
 #include <linux/mutex.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <linux/slab.h>
+#include <linux/ctype.h>
 #include <linux/notifier.h>
 #include <linux/topology.h>
 #include "sched.h"
@@ -16,9 +21,17 @@
 enum hk_flags {
 	HK_FLAG_DOMAIN		= BIT(HK_TYPE_DOMAIN),
 	HK_FLAG_MANAGED_IRQ	= BIT(HK_TYPE_MANAGED_IRQ),
-	HK_FLAG_KERNEL_NOISE	= BIT(HK_TYPE_KERNEL_NOISE),
+	HK_FLAG_TICK		= BIT(HK_TYPE_TICK),
+	HK_FLAG_TIMER		= BIT(HK_TYPE_TIMER),
+	HK_FLAG_RCU		= BIT(HK_TYPE_RCU),
+	HK_FLAG_MISC		= BIT(HK_TYPE_MISC),
+	HK_FLAG_WQ		= BIT(HK_TYPE_WQ),
+	HK_FLAG_KTHREAD		= BIT(HK_TYPE_KTHREAD),
 };
 
+#define HK_FLAG_KERNEL_NOISE (HK_FLAG_TICK | HK_FLAG_TIMER | HK_FLAG_RCU | \
+			      HK_FLAG_MISC | HK_FLAG_WQ | HK_FLAG_KTHREAD)
+
 static DEFINE_MUTEX(housekeeping_mutex);
 static BLOCKING_NOTIFIER_HEAD(housekeeping_notifier_list);
 DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
@@ -44,6 +57,9 @@ static ssize_t smt_aware_store(struct kobject *kobj,
 {
 	bool val;
 
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	if (kstrtobool(buf, &val))
 		return -EINVAL;
 
@@ -53,7 +69,7 @@ static ssize_t smt_aware_store(struct kobject *kobj,
 }
 
 static struct kobj_attribute smt_aware_attr =
-	__ATTR(smt_aware_mode, 0644, smt_aware_show, smt_aware_store);
+	__ATTR(smt_aware_mode, 0600, smt_aware_show, smt_aware_store);
 
 bool housekeeping_enabled(enum hk_type type)
 {
@@ -171,6 +187,9 @@ static ssize_t housekeeping_store(struct kobject *kobject,
 	cpumask_var_t new_mask;
 	int err;
 
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
 		return -ENOMEM;
 
@@ -178,42 +197,26 @@ static ssize_t housekeeping_store(struct kobject *kobject,
 	if (err)
 		goto out_free;
 
-	/* Safety check: must have at least one online CPU for housekeeping */
-	if (!cpumask_intersects(new_mask, cpu_online_mask)) {
+	if (cpumask_empty(new_mask) ||
+	    !cpumask_intersects(new_mask, cpu_online_mask)) {
 		err = -EINVAL;
 		goto out_free;
 	}
 
-	if (housekeeping_smt_aware) {
-		int cpu, sibling;
-		cpumask_var_t tmp_mask;
+	mutex_lock(&housekeeping_mutex);
 
-		if (!alloc_cpumask_var(&tmp_mask, GFP_KERNEL)) {
-			err = -ENOMEM;
-			goto out_free;
-		}
+	if (housekeeping_smt_aware) {
+		int cpu;
 
-		cpumask_copy(tmp_mask, new_mask);
-		for_each_cpu(cpu, tmp_mask) {
-			for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
-				if (!cpumask_test_cpu(sibling, tmp_mask)) {
-					/* SMT sibling should stay grouped */
-					cpumask_clear_cpu(cpu, new_mask);
-					break;
-				}
+		for_each_cpu(cpu, new_mask) {
+			if (!cpumask_subset(topology_sibling_cpumask(cpu),
+					    new_mask)) {
+				err = -EINVAL;
+				goto out_unlock;
 			}
 		}
-		free_cpumask_var(tmp_mask);
-
-		/* Re-check after SMT sync */
-		if (!cpumask_intersects(new_mask, cpu_online_mask)) {
-			err = -EINVAL;
-			goto out_free;
-		}
 	}
 
-	mutex_lock(&housekeeping_mutex);
-
 	if (!housekeeping.cpumasks[type]) {
 		if (!alloc_cpumask_var(&housekeeping.cpumasks[type], GFP_KERNEL)) {
 			err = -ENOMEM;
@@ -242,7 +245,7 @@ static ssize_t housekeeping_store(struct kobject *kobject,
 }
 
 static struct hk_attribute housekeeping_attrs[HK_TYPE_MAX];
-static struct attribute *housekeeping_attr_ptr[HK_TYPE_MAX + 1];
+static struct attribute *housekeeping_attr_ptr[HK_TYPE_MAX + 2];
 
 static const struct attribute_group housekeeping_attr_group = {
 	.attrs = housekeeping_attr_ptr,
@@ -265,28 +268,22 @@ static int __init housekeeping_sysfs_init(void)
 		housekeeping_attrs[i].type = i;
 		sysfs_attr_init(&housekeeping_attrs[i].kattr.attr);
 		housekeeping_attrs[i].kattr.attr.name = hk_type_names[i];
-		housekeeping_attrs[i].kattr.attr.mode = 0644;
+		housekeeping_attrs[i].kattr.attr.mode = 0600;
 		housekeeping_attrs[i].kattr.show = housekeeping_show;
 		housekeeping_attrs[i].kattr.store = housekeeping_store;
 		housekeeping_attr_ptr[j++] = &housekeeping_attrs[i].kattr.attr;
 	}
+
+	housekeeping_attr_ptr[j++] = &smt_aware_attr.attr;
 	housekeeping_attr_ptr[j] = NULL;
 
 	ret = sysfs_create_group(housekeeping_kobj, &housekeeping_attr_group);
-	if (ret)
-		goto err_group;
-
-	ret = sysfs_create_file(housekeeping_kobj, &smt_aware_attr.attr);
-	if (ret)
-		goto err_file;
+	if (ret) {
+		kobject_put(housekeeping_kobj);
+		return ret;
+	}
 
 	return 0;
-
-err_file:
-	sysfs_remove_group(housekeeping_kobj, &housekeeping_attr_group);
-err_group:
-	kobject_put(housekeeping_kobj);
-	return ret;
 }
 late_initcall(housekeeping_sysfs_init);
 
@@ -313,8 +310,12 @@ static void __init housekeeping_setup_type(enum hk_type type,
 	if (!slab_is_available())
 		gfp = GFP_NOWAIT;
 
-	if (!housekeeping.cpumasks[type])
-		alloc_cpumask_var(&housekeeping.cpumasks[type], gfp);
+	if (!housekeeping.cpumasks[type]) {
+		if (!alloc_cpumask_var(&housekeeping.cpumasks[type], gfp)) {
+			pr_err("housekeeping: failed to allocate cpumask for type %d\n", type);
+			return;
+		}
+	}
 
 	cpumask_copy(housekeeping.cpumasks[type],
 		     housekeeping_staging);

-- 
2.43.0

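A minimal sketch of how userspace might drive the interface this patch
adds, assuming the nodes live under /sys/kernel/housekeeping and accept a
cpulist as the commit message describes. The helper name hk_set and the
HK_BASE override are hypothetical and exist only for illustration; the
error codes named in the comments are the ones visible in
housekeeping_store() in the diff above.

```shell
#!/bin/sh
# Hypothetical helper, not part of the series: write a cpulist to one
# housekeeping node and report what the node reads back afterwards.
# HK_BASE can be overridden so the logic can be tried on a scratch directory.
HK_BASE="${HK_BASE:-/sys/kernel/housekeeping}"

hk_set() {
    node="$1"
    cpulist="$2"
    path="$HK_BASE/$node"

    [ -f "$path" ] || { echo "hk_set: $path missing" >&2; return 2; }

    # Per the diff, the store handler returns -EPERM without CAP_SYS_ADMIN
    # and -EINVAL for a mask with no online CPU (or, in SMT-aware mode, a
    # partial core). Any of these makes the write fail.
    if ! printf '%s\n' "$cpulist" 2>/dev/null > "$path"; then
        echo "hk_set: write of '$cpulist' to $node rejected" >&2
        return 1
    fi

    # Read back; the kernel may normalise the list, so a differing value is
    # reported but not treated as an error.
    got=$(cat "$path")
    [ "$got" = "$cpulist" ] || echo "hk_set: $node now reads '$got'" >&2
    return 0
}
```

With the series applied, something like "hk_set timer 0-3" run as root
would be the expected usage. Because the nodes are created with mode 0600
in this patch, an unprivileged caller should fail at the write step.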


* [PATCH 14/15] Documentation: isolation: Document DHEI sysfs interfaces
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
                   ` (12 preceding siblings ...)
  2026-03-25  9:09 ` [PATCH 13/15] sched/isolation: Implement sysfs interface for dynamic housekeeping Qiliang Yuan
@ 2026-03-25  9:09 ` Qiliang Yuan
  2026-03-25  9:09 ` [PATCH 15/15] selftests: dhei: Add functional tests for dynamic housekeeping Qiliang Yuan
  2026-03-25 16:02 ` [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Tejun Heo
  15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
	Shuah Khan
  Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan

---
 .../ABI/testing/sysfs-kernel-housekeeping          | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-housekeeping b/Documentation/ABI/testing/sysfs-kernel-housekeeping
new file mode 100644
index 0000000000000..3648578200111
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-housekeeping
@@ -0,0 +1,22 @@
+What:		/sys/kernel/housekeeping/
+Date:		March 2026
+Contact:	Qiliang Yuan <realwujing@gmail.com>
+Description:
+		Directory containing the dynamic housekeeping configuration
+		for various kernel subsystems.
+
+		Each file represents a specific housekeeping type:
+		- timer: Timer and hrtimer interrupts.
+		- rcu: RCU callback offloading and GP kthreads.
+		- misc: Miscellaneous kernel services (e.g. kcompactd).
+		- tick: Dynamic full dynticks (NOHZ_FULL) state.
+		- domain: Scheduler domain isolation.
+		- workqueue: Workqueue affinity.
+		- managed_irq: Managed interrupts migration.
+		- kthread: General kernel thread affinity.
+		- smt_aware_mode: SMT-aware isolation toggle (0/1).
+		  When enabled, writing a mask that does not include all
+		  sibling threads of a core will be rejected with -EINVAL.
+
+		Writing a CPULIST to the type files dynamically updates the
+		housekeeping mask for the corresponding type.

-- 
2.43.0


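The smt_aware_mode rule documented above (a mask is rejected unless every
CPU in it brings all of its sibling threads) can be pre-checked from
userspace. The following is only a sketch under stated assumptions: the
function names expand_cpulist and smt_complete and the CPU_ROOT override
are hypothetical, sibling information is taken from the standard
topology/thread_siblings_list files, and stride forms of cpulists are not
handled.

```shell
#!/bin/sh
# Hypothetical userspace pre-check, not part of the series.
# CPU_ROOT can be overridden to point at a scratch tree for testing.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# Expand a cpulist such as "0-2,8" into "0 1 2 8 " (space separated).
# Assumes plain "a" and "a-b" items only.
expand_cpulist() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        [ -n "$lo" ] || continue
        hi="${hi:-$lo}"
        i="$lo"
        while [ "$i" -le "$hi" ]; do
            printf '%s ' "$i"
            i=$((i + 1))
        done
    done
}

# Return 0 if every CPU in the mask has all of its SMT siblings in the mask,
# 1 otherwise. This mirrors the cpumask_subset() test in the kernel patch.
smt_complete() {
    mask_cpus=" $(expand_cpulist "$1")"
    for cpu in $mask_cpus; do
        sib_file="$CPU_ROOT/cpu$cpu/topology/thread_siblings_list"
        [ -r "$sib_file" ] || continue   # no topology info: nothing to check
        for sib in $(expand_cpulist "$(cat "$sib_file")"); do
            case "$mask_cpus" in
                *" $sib "*) ;;           # sibling present in the mask
                *) return 1 ;;           # partial core
            esac
        done
    done
    return 0
}
```

A caller could run smt_complete on a candidate mask before writing it, so
that a partial-core mask is caught with a readable message rather than a
bare write failure.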

* [PATCH 15/15] selftests: dhei: Add functional tests for dynamic housekeeping
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
                   ` (13 preceding siblings ...)
  2026-03-25  9:09 ` [PATCH 14/15] Documentation: isolation: Document DHEI sysfs interfaces Qiliang Yuan
@ 2026-03-25  9:09 ` Qiliang Yuan
  2026-03-25 16:02 ` [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Tejun Heo
  15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25  9:09 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
	Shuah Khan
  Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan

Dynamic Housekeeping & Enhanced Isolation (DHEI) introduces complex
runtime interactions across sysfs, the scheduler, and various kernel
subsystems. There are currently no automated tests to verify the
integrity of sysfs boundaries, safety guards, or SMT-aware isolation
logic.

Implement a kselftest suite for DHEI to ensure functional correctness.
This includes a dedicated test script (dhei_test.sh) covering sysfs
interface accessibility, safety guard enforcement, and SMT-aware grouping.

The suite also incorporates stress-ng based pressure testing to verify
load-shedding efficiency on isolated CPUs, tick suppression under active
task load, and workqueue restriction under competitive system pressure.

Usage:
  make -C tools/testing/selftests/dhei run_tests

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
 tools/testing/selftests/Makefile          |   1 +
 tools/testing/selftests/dhei/Makefile     |   4 +
 tools/testing/selftests/dhei/dhei_test.sh | 160 ++++++++++++++++++++++++++++++
 3 files changed, 165 insertions(+)

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 56e44a98d6a59..9d16b00623839 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -16,6 +16,7 @@ TARGETS += cpu-hotplug
 TARGETS += damon
 TARGETS += devices/error_logs
 TARGETS += devices/probe
+TARGETS += dhei
 TARGETS += dmabuf-heaps
 TARGETS += drivers/dma-buf
 TARGETS += drivers/ntsync
diff --git a/tools/testing/selftests/dhei/Makefile b/tools/testing/selftests/dhei/Makefile
new file mode 100644
index 0000000000000..a578691cc677c
--- /dev/null
+++ b/tools/testing/selftests/dhei/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+TEST_PROGS := dhei_test.sh
+
+include ../lib.mk
diff --git a/tools/testing/selftests/dhei/dhei_test.sh b/tools/testing/selftests/dhei/dhei_test.sh
new file mode 100755
index 0000000000000..a6137c52e7132
--- /dev/null
+++ b/tools/testing/selftests/dhei/dhei_test.sh
@@ -0,0 +1,160 @@
+#!/bin/sh
+# DHEI (Dynamic Housekeeping & Enhanced Isolation) Full-Coverage Verification Script
+# Strict POSIX compliant version for reliability on all shells.
+
+SYSFS_BASE="/sys/kernel/housekeeping"
+ONLINE_CPUS=$(cat /sys/devices/system/cpu/online)
+LAST_CPU=$(echo "$ONLINE_CPUS" | awk -F'[,-]' '{print $NF}')
+
+# Colors for output
+GREEN='\033[0;32m'
+RED='\033[0;31m'
+NC='\033[0m'
+
+log_pass() { printf '%b[OK]%b %s\n' "$GREEN" "$NC" "$1"; }
+log_fail() { printf '%b[FAIL]%b %s\n' "$RED" "$NC" "$1"; exit 1; }
+log_info() { echo "[INFO] $1"; }
+
+check_root() {
+    [ "$(id -u)" -eq 0 ] || log_fail "Please run as root"
+}
+
+test_sysfs_structure() {
+    log_info "TEST 1: Sysfs structure..."
+    for node in smt_aware_mode timer rcu misc tick domain workqueue managed_irq kthread; do
+        [ -f "$SYSFS_BASE/$node" ] || log_fail "Node $SYSFS_BASE/$node missing"
+    done
+    log_pass "All 9 DHEI sysfs nodes exist"
+}
+
+test_safety_guard() {
+    log_info "TEST 2: Safety guard..."
+    if echo "999-1024" > "$SYSFS_BASE/domain" 2>/dev/null; then
+        log_fail "Safety guard failed: allowed isolation of all CPUs"
+    fi
+    log_pass "Safety guard blocked invalid mask"
+}
+
+test_smt_aware_mode() {
+    log_info "TEST 3: SMT aware logic..."
+    [ -f /sys/devices/system/cpu/cpu0/topology/thread_siblings_list ] || { log_info "SMT not supported"; return; }
+    SIBLINGS=$(cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list)
+    FIRST=$(echo "$SIBLINGS" | cut -d',' -f1 | cut -d'-' -f1)
+    echo 1 > "$SYSFS_BASE/smt_aware_mode"
+    if echo "$FIRST" > "$SYSFS_BASE/timer" 2>/dev/null; then
+         echo 0 > "$SYSFS_BASE/smt_aware_mode"
+         log_fail "SMT mode failed: accepted partial core"
+    else
+         log_pass "SMT mode correctly rejected partial core"
+    fi
+    echo 0 > "$SYSFS_BASE/smt_aware_mode"
+}
+
+get_tick_count() {
+    grep "LOC:" /proc/interrupts | awk -v cpu="$LAST_CPU" '{print $(cpu+2)}'
+}
+
+test_tick_dynamic() {
+    log_info "TEST 4: Dynamic Tick toggle..."
+    [ "$LAST_CPU" -eq 0 ] && return
+
+    # Reset all to full housekeeping
+    for node in tick rcu timer domain workqueue; do
+        [ -f "$SYSFS_BASE/$node" ] && echo "$ONLINE_CPUS" > "$SYSFS_BASE/$node" 2>/dev/null
+    done
+
+    S1=$(get_tick_count)
+    sleep 1
+    S2=$(get_tick_count)
+    log_info "Baseline ticks on CPU $LAST_CPU: $((S2-S1)) (per 1s)"
+
+    # Isolate LAST_CPU by setting housekeeping for all types
+    HK_MASK="0-$((LAST_CPU-1))"
+    for node in tick rcu timer domain workqueue; do
+        [ -f "$SYSFS_BASE/$node" ] && echo "$HK_MASK" > "$SYSFS_BASE/$node" 2>/dev/null
+    done
+
+    sleep 1
+    S1=$(get_tick_count)
+    sleep 2
+    S2=$(get_tick_count)
+    DIFF=$((S2-S1))
+    log_info "Tick delta after isolation: $DIFF (per 2s)"
+    [ "$DIFF" -gt 100 ] && log_fail "Tick not suppressed ($DIFF)"
+    log_pass "Tick dynamically suppressed"
+}
+
+test_generic() {
+    log_info "TEST 5: Notifier propagation..."
+    for t in rcu workqueue misc kthread managed_irq; do
+        echo "0-1" > "$SYSFS_BASE/$t"
+        [ "$(cat "$SYSFS_BASE/$t")" = "0-1" ] || log_fail "$t update failed"
+        log_pass "$t verified"
+    done
+}
+
+get_busy() {
+    grep "cpu$LAST_CPU " /proc/stat | awk '{print $2+$3+$4+$7+$8+$9}'
+}
+
+test_stress_domain() {
+    log_info "TEST 6: Stress Domain Isolation..."
+    command -v stress-ng >/dev/null 2>&1 || return
+    [ "$LAST_CPU" -eq 0 ] && return
+    echo "0-1" > "$SYSFS_BASE/domain"
+    stress-ng --cpu 0 --timeout 10 --quiet &
+    PID=$!
+    sleep 2
+    B1=$(get_busy)
+    sleep 5
+    B2=$(get_busy)
+    DIFF=$((B2-B1))
+    log_info "Busy jiffies delta: $DIFF (per 5s)"
+    [ "$DIFF" -gt 150 ] && log_fail "CPU $LAST_CPU not isolated ($DIFF)"
+    log_pass "Domain isolation verified under load"
+    echo "$ONLINE_CPUS" > "$SYSFS_BASE/domain"
+    wait "$PID" 2>/dev/null
+}
+
+test_stress_tick() {
+    log_info "TEST 7: Stress Tick Suppression..."
+    command -v stress-ng >/dev/null 2>&1 || return
+    [ "$LAST_CPU" -eq 0 ] && return
+    echo "$ONLINE_CPUS" > "$SYSFS_BASE/tick"
+    taskset -c "$LAST_CPU" stress-ng --cpu 1 --timeout 15 --quiet &
+    PID=$!
+    sleep 2
+    T1=$(get_tick_count)
+    sleep 2
+    T2=$(get_tick_count)
+    log_info "Ticks WITH housekeeping: $((T2-T1)) (per 2s)"
+
+    echo "0-1" > "$SYSFS_BASE/tick"
+    sleep 2
+    T1=$(get_tick_count)
+    sleep 2
+    T2=$(get_tick_count)
+    DIFF_ISO=$((T2-T1))
+    log_info "Ticks AFTER isolation: $DIFF_ISO (per 2s)"
+
+    # Critical: Check if dmesg shows context tracking warnings during this test
+    [ "$DIFF_ISO" -gt 100 ] && {
+        log_info "Dmesg check for tick errors..."
+        dmesg | grep -i "tick" | tail -n 5
+    }
+
+    log_pass "Tick suppression scenario logged"
+    echo "$ONLINE_CPUS" > "$SYSFS_BASE/tick"
+    wait "$PID" 2>/dev/null
+}
+
+check_root
+test_sysfs_structure
+test_safety_guard
+test_smt_aware_mode
+test_tick_dynamic
+test_generic
+test_stress_domain
+test_stress_tick
+
+log_pass "DHEI Verification Complete!"

-- 
2.43.0


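get_tick_count() in the script above reads field cpu+2 of the LOC: row,
which matches the CPU number only when the columns of /proc/interrupts
are contiguous from CPU0. To my understanding the file prints columns for
online CPUs only, so a sparse online set would shift the index. A hedged
alternative is to resolve the column from the header row. The function
name loc_count and its optional file argument are hypothetical, added so
the parsing can be tried on a saved copy of the table.

```shell
#!/bin/sh
# Hypothetical variant of get_tick_count, not part of the series.
# Prints the LOC (local timer interrupt) count for CPU $1, looking the
# column up by its "CPU<n>" header instead of assuming field n+2.
loc_count() {
    cpu="$1"
    file="${2:-/proc/interrupts}"
    awk -v want="CPU$cpu" '
        # Header row: remember which data column belongs to the wanted CPU.
        # Data rows carry a label in field 1, so the column is shifted by one.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == want) col = i + 1; next }
        # Print the count only if the CPU was found in the header.
        $1 == "LOC:" { if (col) print $col; exit }
    ' "$file"
}
```

An empty result means the CPU has no column, for example because it is
offline, which a test would probably want to treat as a skip rather than
as a zero tick count.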

* Re: [PATCH 01/15] sched/isolation: Support dynamic allocation for housekeeping masks
  2026-03-25  9:09 ` [PATCH 01/15] sched/isolation: Support dynamic allocation for housekeeping masks Qiliang Yuan
@ 2026-03-25 13:57   ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-03-25 13:57 UTC (permalink / raw)
  To: Qiliang Yuan
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Thomas Gleixner, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Tejun Heo, Andrew Morton, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan,
	Anna-Maria Behnsen, Ingo Molnar, Shuah Khan, linux-kernel, rcu,
	linux-mm, linux-kselftest

On Wed, Mar 25, 2026 at 05:09:32PM +0800, Qiliang Yuan wrote:
> The existing housekeeping infrastructure uses a single static cpumask
> for all isolation types. This prevents independent runtime
> reconfiguration of different services (like RCU vs. timers).

I think I asked this a while ago; why do we have more than one mask?
What is the actual purpose of being able to separate RCU from Timers?


* Re: [PATCH 02/15] sched/isolation: Introduce housekeeping notifier infrastructure
  2026-03-25  9:09 ` [PATCH 02/15] sched/isolation: Introduce housekeeping notifier infrastructure Qiliang Yuan
@ 2026-03-25 13:58   ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-03-25 13:58 UTC (permalink / raw)
  To: Qiliang Yuan
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Thomas Gleixner, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Tejun Heo, Andrew Morton, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan,
	Anna-Maria Behnsen, Ingo Molnar, Shuah Khan, linux-kernel, rcu,
	linux-mm, linux-kselftest

On Wed, Mar 25, 2026 at 05:09:33PM +0800, Qiliang Yuan wrote:
> Subsystems currently rely on static housekeeping masks determined at
> boot. Supporting runtime reconfiguration (DHEI) requires a mechanism
> to broadcast mask changes to affected kernel components.

Can we eradicate the whole DHEI naming please? It makes no sense.


* Re: [PATCH 03/15] sched/isolation: Separate housekeeping types in enum hk_type
  2026-03-25  9:09 ` [PATCH 03/15] sched/isolation: Separate housekeeping types in enum hk_type Qiliang Yuan
@ 2026-03-25 13:59   ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-03-25 13:59 UTC (permalink / raw)
  To: Qiliang Yuan
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Thomas Gleixner, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Tejun Heo, Andrew Morton, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan,
	Anna-Maria Behnsen, Ingo Molnar, Shuah Khan, linux-kernel, rcu,
	linux-mm, linux-kselftest

On Wed, Mar 25, 2026 at 05:09:34PM +0800, Qiliang Yuan wrote:
> Most kernel noise types (TICK, TIMER, RCU, etc.) are currently
> aliased to a single HK_TYPE_KERNEL_NOISE enum value. This prevents
> fine-grained runtime isolation control as all masks are forced to be
> identical.
> 
> Un-alias service-specific housekeeping types in enum hk_type.
> 
> This separation provides the necessary granularity for DHEI subsystems
> to subscribe to and maintain independent affinity masks.

What the hell for?


* Re: [PATCH 06/15] sched/core: Dynamically update scheduler domain housekeeping mask
  2026-03-25  9:09 ` [PATCH 06/15] sched/core: Dynamically update scheduler domain housekeeping mask Qiliang Yuan
@ 2026-03-25 14:00   ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-03-25 14:00 UTC (permalink / raw)
  To: Qiliang Yuan
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Thomas Gleixner, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Tejun Heo, Andrew Morton, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan,
	Anna-Maria Behnsen, Ingo Molnar, Shuah Khan, linux-kernel, rcu,
	linux-mm, linux-kselftest

On Wed, Mar 25, 2026 at 05:09:37PM +0800, Qiliang Yuan wrote:
> Scheduler domains rely on HK_TYPE_DOMAIN to identify which CPUs are
> isolated from general load balancing. Currently, these boundaries are
> static and determined only during boot-time domain initialization.

This statement is factually incorrect. You can dynamically create
partitions with both cpuset-v1 and cpuset-v2.

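For readers unfamiliar with the cpuset route mentioned in this reply, a
minimal sketch of creating an isolated cpuset-v2 partition at runtime
follows. It assumes a cgroup v2 hierarchy mounted at /sys/fs/cgroup and a
kernel whose cpuset.cpus.partition file accepts "isolated"; the function
name make_isolated_partition and the CGROUP_ROOT override are
hypothetical, and this is an illustration rather than anything taken from
the series.

```shell
#!/bin/sh
# Hypothetical helper, not part of the series: carve CPUs out of scheduler
# load balancing using the existing cpuset-v2 partition interface.
# CGROUP_ROOT can be overridden to try the sequence on a scratch directory.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

make_isolated_partition() {
    name="$1"
    cpus="$2"
    cg="$CGROUP_ROOT/$name"

    # Make the cpuset controller available to children of the root cgroup.
    echo "+cpuset" > "$CGROUP_ROOT/cgroup.subtree_control" || return 1
    mkdir -p "$cg" || return 1

    # Assign the CPUs, then request an isolated partition root.
    echo "$cpus" > "$cg/cpuset.cpus" || return 1
    echo "isolated" > "$cg/cpuset.cpus.partition" || return 1

    # Print the resulting state; on a real kernel a refused request is
    # reported here as an invalid partition together with a reason.
    cat "$cg/cpuset.cpus.partition"
}
```

For example, "make_isolated_partition rt 2-3" run as root would request
that CPUs 2-3 be removed from load balancing without a reboot.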

* Re: [PATCH 07/15] watchdog: Allow runtime toggle of lockup detector affinity
  2026-03-25  9:09 ` [PATCH 07/15] watchdog: Allow runtime toggle of lockup detector affinity Qiliang Yuan
@ 2026-03-25 14:03   ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-03-25 14:03 UTC (permalink / raw)
  To: Qiliang Yuan
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Thomas Gleixner, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Tejun Heo, Andrew Morton, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan,
	Anna-Maria Behnsen, Ingo Molnar, Shuah Khan, linux-kernel, rcu,
	linux-mm, linux-kselftest

On Wed, Mar 25, 2026 at 05:09:38PM +0800, Qiliang Yuan wrote:
> The hardlockup detector threads are affined to CPUs based on the
> HK_TYPE_TIMER housekeeping mask at boot. If this mask is updated at
> runtime, these threads remain on their original CPUs, potentially
> running on isolated cores.
> 
> Synchronize watchdog thread affinity with HK_TYPE_TIMER updates.

Doesn't the normal watchdog run off of perf, using NMIs? How is that
TIMER?

And again, why do you think you need more than _ONE_ mask?

In the end, NOHZ_FULL needs all the masks to be the same anyway. There
is absolutely no sane reason to have this much configuration space.


* Re: [PATCH 13/15] sched/isolation: Implement sysfs interface for dynamic housekeeping
  2026-03-25  9:09 ` [PATCH 13/15] sched/isolation: Implement sysfs interface for dynamic housekeeping Qiliang Yuan
@ 2026-03-25 14:04   ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-03-25 14:04 UTC (permalink / raw)
  To: Qiliang Yuan
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Thomas Gleixner, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Tejun Heo, Andrew Morton, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan,
	Anna-Maria Behnsen, Ingo Molnar, Shuah Khan, linux-kernel, rcu,
	linux-mm, linux-kselftest

On Wed, Mar 25, 2026 at 05:09:44PM +0800, Qiliang Yuan wrote:
> Subsystem housekeeping masks are currently static and can only be set
> via boot-time parameters (isolcpus, nohz_full, etc.). There is no
> userspace interface to reconfigure these boundaries at runtime.
> 
> Implement the DHEI sysfs interface under /sys/kernel/housekeeping.
> 

Why? What was wrong with cpusets?


* Re: [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI)
  2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
                   ` (14 preceding siblings ...)
  2026-03-25  9:09 ` [PATCH 15/15] selftests: dhei: Add functional tests for dynamic housekeeping Qiliang Yuan
@ 2026-03-25 16:02 ` Tejun Heo
  15 siblings, 0 replies; 23+ messages in thread
From: Tejun Heo @ 2026-03-25 16:02 UTC (permalink / raw)
  To: Qiliang Yuan
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Andrew Morton, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
	Shuah Khan, linux-kernel, rcu, linux-mm, linux-kselftest

On Wed, Mar 25, 2026 at 05:09:31PM +0800, Qiliang Yuan wrote:
> The Linux kernel provides mechanisms like 'isolcpus' and 'nohz_full' to
> reduce interference for latency-sensitive workloads. However, these are
> locked behind the "Reboot Wall" - they can only be configured via boot
> parameters and require a system restart for changes to take effect.
> 
> In modern cloud-native environments, CPU resources often need to be
> dynamically re-partitioned to accommodate container scaling without
> the performance penalty and downtime of a full system reboot. Similarly,
> high-frequency trading (HFT) platforms require the ability to fine-tune
> CPU isolation at runtime to minimize jitter for critical execution threads
> based on shifting market demands.
> 
> This patch series introduces Dynamic Housekeeping & Enhanced Isolation
> (DHEI). DHEI allows administrators to reconfigure the kernel's
> housekeeping boundaries at runtime via a new sysfs interface at
> /sys/kernel/housekeeping/.

I think I asked for this in the previous thread but please coordinate with
existing cpuset and isolation mechanisms. You aren't even cc'ing Waiman for
cpuset.

Thanks.

-- 
tejun


end of thread, other threads:[~2026-03-25 16:02 UTC | newest]

Thread overview: 23+ messages
2026-03-25  9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
2026-03-25  9:09 ` [PATCH 01/15] sched/isolation: Support dynamic allocation for housekeeping masks Qiliang Yuan
2026-03-25 13:57   ` Peter Zijlstra
2026-03-25  9:09 ` [PATCH 02/15] sched/isolation: Introduce housekeeping notifier infrastructure Qiliang Yuan
2026-03-25 13:58   ` Peter Zijlstra
2026-03-25  9:09 ` [PATCH 03/15] sched/isolation: Separate housekeeping types in enum hk_type Qiliang Yuan
2026-03-25 13:59   ` Peter Zijlstra
2026-03-25  9:09 ` [PATCH 04/15] genirq: Support dynamic migration for managed interrupts Qiliang Yuan
2026-03-25  9:09 ` [PATCH 05/15] rcu: Support runtime NOCB initialization and dynamic offloading Qiliang Yuan
2026-03-25  9:09 ` [PATCH 06/15] sched/core: Dynamically update scheduler domain housekeeping mask Qiliang Yuan
2026-03-25 14:00   ` Peter Zijlstra
2026-03-25  9:09 ` [PATCH 07/15] watchdog: Allow runtime toggle of lockup detector affinity Qiliang Yuan
2026-03-25 14:03   ` Peter Zijlstra
2026-03-25  9:09 ` [PATCH 08/15] workqueue: Support dynamic housekeeping mask updates Qiliang Yuan
2026-03-25  9:09 ` [PATCH 09/15] mm/compaction: Support dynamic housekeeping mask updates for kcompactd Qiliang Yuan
2026-03-25  9:09 ` [PATCH 10/15] tick/nohz: Transition to dynamic full dynticks state management Qiliang Yuan
2026-03-25  9:09 ` [PATCH 11/15] sched/isolation: Implement SMT-aware isolation and safety guards Qiliang Yuan
2026-03-25  9:09 ` [PATCH 12/15] sched/isolation: Bridge boot-time parameters with dynamic isolation Qiliang Yuan
2026-03-25  9:09 ` [PATCH 13/15] sched/isolation: Implement sysfs interface for dynamic housekeeping Qiliang Yuan
2026-03-25 14:04   ` Peter Zijlstra
2026-03-25  9:09 ` [PATCH 14/15] Documentation: isolation: Document DHEI sysfs interfaces Qiliang Yuan
2026-03-25  9:09 ` [PATCH 15/15] selftests: dhei: Add functional tests for dynamic housekeeping Qiliang Yuan
2026-03-25 16:02 ` [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Tejun Heo
