* [PATCH 01/15] sched/isolation: Support dynamic allocation for housekeeping masks
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
@ 2026-03-25 9:09 ` Qiliang Yuan
2026-03-25 13:57 ` Peter Zijlstra
2026-03-25 9:09 ` [PATCH 02/15] sched/isolation: Introduce housekeeping notifier infrastructure Qiliang Yuan
` (14 subsequent siblings)
15 siblings, 1 reply; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25 9:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan
Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan
The existing housekeeping infrastructure uses a single static cpumask
for all isolation types. This prevents independent runtime
reconfiguration of different services (like RCU vs. timers).
Introduce dynamic allocation for housekeeping masks to support DHEI.
This allows subsequent patches to manage service-specific masks
independently at runtime.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
kernel/sched/isolation.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 3ad0d6df6a0a2..67a5ff273ea08 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -8,6 +8,7 @@
*
*/
#include <linux/sched/isolation.h>
+#include <linux/mutex.h>
#include "sched.h"
enum hk_flags {
@@ -16,6 +17,7 @@ enum hk_flags {
HK_FLAG_KERNEL_NOISE = BIT(HK_TYPE_KERNEL_NOISE),
};
+static DEFINE_MUTEX(housekeeping_mutex);
DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
EXPORT_SYMBOL_GPL(housekeeping_overridden);
@@ -105,8 +107,14 @@ void __init housekeeping_init(void)
static void __init housekeeping_setup_type(enum hk_type type,
cpumask_var_t housekeeping_staging)
{
+ gfp_t gfp = GFP_KERNEL;
+
+ if (!slab_is_available())
+ gfp = GFP_NOWAIT;
+
+ if (!housekeeping.cpumasks[type])
+ alloc_cpumask_var(&housekeeping.cpumasks[type], gfp);
- alloc_bootmem_cpumask_var(&housekeeping.cpumasks[type]);
cpumask_copy(housekeeping.cpumasks[type],
housekeeping_staging);
}
@@ -116,6 +124,10 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
cpumask_var_t non_housekeeping_mask, housekeeping_staging;
unsigned int first_cpu;
int err = 0;
+ gfp_t gfp = GFP_KERNEL;
+
+ if (!slab_is_available())
+ gfp = GFP_NOWAIT;
if ((flags & HK_FLAG_KERNEL_NOISE) && !(housekeeping.flags & HK_FLAG_KERNEL_NOISE)) {
if (!IS_ENABLED(CONFIG_NO_HZ_FULL)) {
@@ -125,13 +137,17 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
}
}
- alloc_bootmem_cpumask_var(&non_housekeeping_mask);
+ if (!alloc_cpumask_var(&non_housekeeping_mask, gfp))
+ return 0;
+
if (cpulist_parse(str, non_housekeeping_mask) < 0) {
pr_warn("Housekeeping: nohz_full= or isolcpus= incorrect CPU range\n");
goto free_non_housekeeping_mask;
}
- alloc_bootmem_cpumask_var(&housekeeping_staging);
+ if (!alloc_cpumask_var(&housekeeping_staging, gfp))
+ goto free_non_housekeeping_mask;
+
cpumask_andnot(housekeeping_staging,
cpu_possible_mask, non_housekeeping_mask);
@@ -203,9 +219,9 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
err = 1;
free_housekeeping_staging:
- free_bootmem_cpumask_var(housekeeping_staging);
+ free_cpumask_var(housekeeping_staging);
free_non_housekeeping_mask:
- free_bootmem_cpumask_var(non_housekeeping_mask);
+ free_cpumask_var(non_housekeeping_mask);
return err;
}
--
2.43.0
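Editorial note: the housekeeping_setup_type() hunk above allocates a mask slot only when it is still empty, then copies the staged mask into it, but it does not check the result of alloc_cpumask_var(). The following is a minimal userspace sketch of the same "allocate once, then copy" pattern with the failure path handled. Every name in it (setup_type, masks, NR_TYPES, MASK_BYTES) is hypothetical and only stands in for the kernel objects; it is not the patch's code.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NR_TYPES   3	/* hypothetical: number of mask slots */
#define MASK_BYTES 16	/* hypothetical: room for 128 CPU bits */

/* One lazily allocated mask per type; NULL means "not allocated yet". */
static unsigned char *masks[NR_TYPES];

/*
 * Allocate the slot on first use, then copy the staged mask into it.
 * Returns false, leaving the slot untouched, if the type is out of
 * range or the allocation fails.
 */
static bool setup_type(int type, const unsigned char *staging)
{
	if (type < 0 || type >= NR_TYPES)
		return false;

	if (!masks[type]) {
		masks[type] = calloc(1, MASK_BYTES);
		if (!masks[type])
			return false;
	}

	memcpy(masks[type], staging, MASK_BYTES);
	return true;
}
```

Calling setup_type() twice for the same type reuses the first allocation and only overwrites the contents, which is the behaviour the `if (!housekeeping.cpumasks[type])` guard in the hunk appears to be aiming for.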
* Re: [PATCH 01/15] sched/isolation: Support dynamic allocation for housekeeping masks
2026-03-25 9:09 ` [PATCH 01/15] sched/isolation: Support dynamic allocation for housekeeping masks Qiliang Yuan
@ 2026-03-25 13:57 ` Peter Zijlstra
0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-03-25 13:57 UTC (permalink / raw)
To: Qiliang Yuan
Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Thomas Gleixner, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Tejun Heo, Andrew Morton, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan,
Anna-Maria Behnsen, Ingo Molnar, Shuah Khan, linux-kernel, rcu,
linux-mm, linux-kselftest
On Wed, Mar 25, 2026 at 05:09:32PM +0800, Qiliang Yuan wrote:
> The existing housekeeping infrastructure uses a single static cpumask
> for all isolation types. This prevents independent runtime
> reconfiguration of different services (like RCU vs. timers).
I think I asked this a while ago; why do we have more than one mask?
What is the actual purpose of being able to separate RCU from Timers?
* [PATCH 02/15] sched/isolation: Introduce housekeeping notifier infrastructure
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
2026-03-25 9:09 ` [PATCH 01/15] sched/isolation: Support dynamic allocation for housekeeping masks Qiliang Yuan
@ 2026-03-25 9:09 ` Qiliang Yuan
2026-03-25 13:58 ` Peter Zijlstra
2026-03-25 9:09 ` [PATCH 03/15] sched/isolation: Separate housekeeping types in enum hk_type Qiliang Yuan
` (13 subsequent siblings)
15 siblings, 1 reply; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25 9:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan
Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan
Subsystems currently rely on static housekeeping masks determined at
boot. Supporting runtime reconfiguration (DHEI) requires a mechanism
to broadcast mask changes to affected kernel components.
Implement a blocking notifier chain for housekeeping mask updates.
This infrastructure enables subsystems like genirq, workqueues, and RCU
to react dynamically to isolation changes.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
include/linux/sched/isolation.h | 21 +++++++++++++++++++++
kernel/sched/isolation.c | 24 ++++++++++++++++++++++++
2 files changed, 45 insertions(+)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index d8501f4709b58..9df55237d3901 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -5,6 +5,7 @@
#include <linux/cpuset.h>
#include <linux/init.h>
#include <linux/tick.h>
+#include <linux/notifier.h>
enum hk_type {
HK_TYPE_DOMAIN,
@@ -24,6 +25,13 @@ enum hk_type {
HK_TYPE_KTHREAD = HK_TYPE_KERNEL_NOISE
};
+struct housekeeping_update {
+ enum hk_type type;
+ const struct cpumask *new_mask;
+};
+
+#define HK_UPDATE_MASK 0x01
+
#ifdef CONFIG_CPU_ISOLATION
DECLARE_STATIC_KEY_FALSE(housekeeping_overridden);
extern int housekeeping_any_cpu(enum hk_type type);
@@ -33,6 +41,9 @@ extern void housekeeping_affine(struct task_struct *t, enum hk_type type);
extern bool housekeeping_test_cpu(int cpu, enum hk_type type);
extern void __init housekeeping_init(void);
+extern int housekeeping_register_notifier(struct notifier_block *nb);
+extern int housekeeping_unregister_notifier(struct notifier_block *nb);
+
#else
static inline int housekeeping_any_cpu(enum hk_type type)
@@ -59,6 +70,16 @@ static inline bool housekeeping_test_cpu(int cpu, enum hk_type type)
}
static inline void housekeeping_init(void) { }
+
+static inline int housekeeping_register_notifier(struct notifier_block *nb)
+{
+ return 0;
+}
+
+static inline int housekeeping_unregister_notifier(struct notifier_block *nb)
+{
+ return 0;
+}
#endif /* CONFIG_CPU_ISOLATION */
static inline bool housekeeping_cpu(int cpu, enum hk_type type)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 67a5ff273ea08..e7a21023726df 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -9,6 +9,7 @@
*/
#include <linux/sched/isolation.h>
#include <linux/mutex.h>
+#include <linux/notifier.h>
#include "sched.h"
enum hk_flags {
@@ -18,6 +19,7 @@ enum hk_flags {
};
static DEFINE_MUTEX(housekeeping_mutex);
+static BLOCKING_NOTIFIER_HEAD(housekeeping_notifier_list);
DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
EXPORT_SYMBOL_GPL(housekeeping_overridden);
@@ -86,6 +88,28 @@ bool housekeeping_test_cpu(int cpu, enum hk_type type)
}
EXPORT_SYMBOL_GPL(housekeeping_test_cpu);
+int housekeeping_register_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&housekeeping_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(housekeeping_register_notifier);
+
+int housekeeping_unregister_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_unregister(&housekeeping_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(housekeeping_unregister_notifier);
+
+static int housekeeping_update_notify(enum hk_type type, const struct cpumask *new_mask)
+{
+ struct housekeeping_update update = {
+ .type = type,
+ .new_mask = new_mask,
+ };
+
+ return blocking_notifier_call_chain(&housekeeping_notifier_list, HK_UPDATE_MASK, &update);
+}
+
void __init housekeeping_init(void)
{
enum hk_type type;
--
2.43.0
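Editorial note: the patch above wires struct housekeeping_update into a blocking notifier chain, and later patches in the series register callbacks that filter on the action and the housekeeping type. The following userspace sketch shows that register, broadcast, and filter flow in miniature. All demo_* names are hypothetical, the mask is reduced to a single unsigned long, and there is no locking, whereas the kernel's blocking notifier chains serialize with a rw-semaphore.

```c
#include <stddef.h>

#define DEMO_UPDATE_MASK 0x01UL	/* stands in for HK_UPDATE_MASK */

enum demo_hk_type { DEMO_HK_DOMAIN, DEMO_HK_MANAGED_IRQ, DEMO_HK_RCU };

/* Payload handed to every callback, modelled on struct housekeeping_update. */
struct demo_hk_update {
	enum demo_hk_type type;
	unsigned long new_mask;	/* one bit per CPU */
};

struct demo_notifier {
	int (*call)(struct demo_notifier *nb, unsigned long action, void *data);
	struct demo_notifier *next;
};

static struct demo_notifier *demo_chain;

/* Append a callback to the end of the chain. */
static void demo_register(struct demo_notifier *nb)
{
	struct demo_notifier **p = &demo_chain;

	while (*p)
		p = &(*p)->next;
	nb->next = NULL;
	*p = nb;
}

/* Broadcast one mask update; returns how many callbacks were invoked. */
static int demo_notify(enum demo_hk_type type, unsigned long new_mask)
{
	struct demo_hk_update update = { .type = type, .new_mask = new_mask };
	struct demo_notifier *nb;
	int called = 0;

	for (nb = demo_chain; nb; nb = nb->next) {
		nb->call(nb, DEMO_UPDATE_MASK, &update);
		called++;
	}
	return called;
}

static unsigned long demo_rcu_mask;

/* A subscriber that only reacts to updates of the RCU type. */
static int demo_rcu_cb(struct demo_notifier *nb, unsigned long action, void *data)
{
	struct demo_hk_update *upd = data;

	(void)nb;
	if (action != DEMO_UPDATE_MASK || upd->type != DEMO_HK_RCU)
		return 0;

	demo_rcu_mask = upd->new_mask;
	return 0;
}
```

Every subscriber is invoked for every update and has to discard the types it does not care about, which is the shape the irq, RCU, scheduler, and watchdog callbacks later in the series follow.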
* Re: [PATCH 02/15] sched/isolation: Introduce housekeeping notifier infrastructure
2026-03-25 9:09 ` [PATCH 02/15] sched/isolation: Introduce housekeeping notifier infrastructure Qiliang Yuan
@ 2026-03-25 13:58 ` Peter Zijlstra
0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-03-25 13:58 UTC (permalink / raw)
To: Qiliang Yuan
Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Thomas Gleixner, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Tejun Heo, Andrew Morton, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan,
Anna-Maria Behnsen, Ingo Molnar, Shuah Khan, linux-kernel, rcu,
linux-mm, linux-kselftest
On Wed, Mar 25, 2026 at 05:09:33PM +0800, Qiliang Yuan wrote:
> Subsystems currently rely on static housekeeping masks determined at
> boot. Supporting runtime reconfiguration (DHEI) requires a mechanism
> to broadcast mask changes to affected kernel components.
Can we eradicate the whole DHEI naming please? It makes no sense.
* [PATCH 03/15] sched/isolation: Separate housekeeping types in enum hk_type
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
2026-03-25 9:09 ` [PATCH 01/15] sched/isolation: Support dynamic allocation for housekeeping masks Qiliang Yuan
2026-03-25 9:09 ` [PATCH 02/15] sched/isolation: Introduce housekeeping notifier infrastructure Qiliang Yuan
@ 2026-03-25 9:09 ` Qiliang Yuan
2026-03-25 13:59 ` Peter Zijlstra
2026-03-25 9:09 ` [PATCH 04/15] genirq: Support dynamic migration for managed interrupts Qiliang Yuan
` (12 subsequent siblings)
15 siblings, 1 reply; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25 9:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan
Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan
Most kernel noise types (TICK, TIMER, RCU, etc.) are currently
aliased to a single HK_TYPE_KERNEL_NOISE enum value. This prevents
fine-grained runtime isolation control as all masks are forced to be
identical.
Un-alias service-specific housekeeping types in enum hk_type.
This separation provides the necessary granularity for DHEI subsystems
to subscribe to and maintain independent affinity masks.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
include/linux/sched/isolation.h | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 9df55237d3901..6ec64eb3f8bcb 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -10,21 +10,18 @@
enum hk_type {
HK_TYPE_DOMAIN,
HK_TYPE_MANAGED_IRQ,
- HK_TYPE_KERNEL_NOISE,
+ HK_TYPE_TICK,
+ HK_TYPE_TIMER,
+ HK_TYPE_RCU,
+ HK_TYPE_MISC,
+ HK_TYPE_WQ,
+ HK_TYPE_KTHREAD,
HK_TYPE_MAX,
- /*
- * The following housekeeping types are only set by the nohz_full
- * boot commandline option. So they can share the same value.
- */
- HK_TYPE_TICK = HK_TYPE_KERNEL_NOISE,
- HK_TYPE_TIMER = HK_TYPE_KERNEL_NOISE,
- HK_TYPE_RCU = HK_TYPE_KERNEL_NOISE,
- HK_TYPE_MISC = HK_TYPE_KERNEL_NOISE,
- HK_TYPE_WQ = HK_TYPE_KERNEL_NOISE,
- HK_TYPE_KTHREAD = HK_TYPE_KERNEL_NOISE
};
+#define HK_TYPE_KERNEL_NOISE HK_TYPE_TICK
+
struct housekeeping_update {
enum hk_type type;
const struct cpumask *new_mask;
--
2.43.0
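Editorial note: the effect of un-aliasing the enum can be seen with a small userspace sketch. With aliased enumerators, every "noise" type indexes the same array slot, so the masks cannot differ; with distinct enumerators, each type gets its own slot and the array grows. The A_* and S_* names below are hypothetical stand-ins for the old and new enum hk_type layouts, and each mask is reduced to a single unsigned long.

```c
/* Old layout: the noise types alias one enumerator (only two shown). */
enum hk_type_aliased {
	A_DOMAIN,
	A_MANAGED_IRQ,
	A_KERNEL_NOISE,
	A_MAX,
	A_TICK = A_KERNEL_NOISE,
	A_RCU = A_KERNEL_NOISE,
};

/* New layout from the patch: every type has its own value. */
enum hk_type_split {
	S_DOMAIN,
	S_MANAGED_IRQ,
	S_TICK,
	S_TIMER,
	S_RCU,
	S_MISC,
	S_WQ,
	S_KTHREAD,
	S_MAX,
};

/* One mask per enumerator value, as in housekeeping.cpumasks[]. */
static unsigned long aliased_masks[A_MAX];
static unsigned long split_masks[S_MAX];

/* Writing the RCU mask in the aliased layout also changes the TICK mask. */
static void set_aliased_rcu(unsigned long mask)
{
	aliased_masks[A_RCU] = mask;
}

/* Writing the RCU mask in the split layout leaves the TICK mask alone. */
static void set_split_rcu(unsigned long mask)
{
	split_masks[S_RCU] = mask;
}
```

One consequence worth checking against the rest of the series: because the patch defines HK_TYPE_KERNEL_NOISE as HK_TYPE_TICK, the existing HK_FLAG_KERNEL_NOISE = BIT(HK_TYPE_KERNEL_NOISE) now names only the TICK slot, so the other five slots would need to be populated elsewhere. Whether later patches do that is not visible in this part of the thread.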
* Re: [PATCH 03/15] sched/isolation: Separate housekeeping types in enum hk_type
2026-03-25 9:09 ` [PATCH 03/15] sched/isolation: Separate housekeeping types in enum hk_type Qiliang Yuan
@ 2026-03-25 13:59 ` Peter Zijlstra
0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-03-25 13:59 UTC (permalink / raw)
To: Qiliang Yuan
Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Thomas Gleixner, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Tejun Heo, Andrew Morton, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan,
Anna-Maria Behnsen, Ingo Molnar, Shuah Khan, linux-kernel, rcu,
linux-mm, linux-kselftest
On Wed, Mar 25, 2026 at 05:09:34PM +0800, Qiliang Yuan wrote:
> Most kernel noise types (TICK, TIMER, RCU, etc.) are currently
> aliased to a single HK_TYPE_KERNEL_NOISE enum value. This prevents
> fine-grained runtime isolation control as all masks are forced to be
> identical.
>
> Un-alias service-specific housekeeping types in enum hk_type.
>
> This separation provides the necessary granularity for DHEI subsystems
> to subscribe to and maintain independent affinity masks.
What the hell for?
* [PATCH 04/15] genirq: Support dynamic migration for managed interrupts
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
` (2 preceding siblings ...)
2026-03-25 9:09 ` [PATCH 03/15] sched/isolation: Separate housekeeping types in enum hk_type Qiliang Yuan
@ 2026-03-25 9:09 ` Qiliang Yuan
2026-03-25 9:09 ` [PATCH 05/15] rcu: Support runtime NOCB initialization and dynamic offloading Qiliang Yuan
` (11 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25 9:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan
Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan
Managed interrupts currently have their affinity determined once,
honoring boot-time isolation settings. There is no mechanism to migrate
them when housekeeping boundaries change at runtime.
Enable managed interrupts to respond dynamically to housekeeping updates.
This ensures that managed interrupts are migrated away from newly
isolated CPUs or redistributed when housekeeping CPUs are added.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
kernel/irq/manage.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 349ae7979da0e..f2cba3d7ef624 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -2811,3 +2811,52 @@ bool irq_check_status_bit(unsigned int irq, unsigned int bitmask)
return res;
}
EXPORT_SYMBOL_GPL(irq_check_status_bit);
+
+#ifdef CONFIG_SMP
+static int irq_housekeeping_reconfigure(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct housekeeping_update *upd = data;
+ unsigned int irq;
+
+ if (action != HK_UPDATE_MASK || upd->type != HK_TYPE_MANAGED_IRQ)
+ return NOTIFY_OK;
+
+ irq_lock_sparse();
+ for_each_active_irq(irq) {
+ struct irq_data *irqd;
+ struct irq_desc *desc;
+
+ desc = irq_to_desc(irq);
+ if (!desc)
+ continue;
+
+ scoped_guard(raw_spinlock_irqsave, &desc->lock) {
+ irqd = irq_desc_get_irq_data(desc);
+ if (!irqd_affinity_is_managed(irqd) || !desc->action ||
+ !irq_data_get_irq_chip(irqd))
+ continue;
+
+ /*
+ * Re-apply existing affinity to honor the new
+ * housekeeping mask via __irq_set_affinity() logic.
+ */
+ irq_set_affinity_locked(irqd, irq_data_get_affinity_mask(irqd), false);
+ }
+ }
+ irq_unlock_sparse();
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block irq_housekeeping_nb = {
+ .notifier_call = irq_housekeeping_reconfigure,
+};
+
+static int __init irq_init_housekeeping_notifier(void)
+{
+ housekeeping_register_notifier(&irq_housekeeping_nb);
+ return 0;
+}
+core_initcall(irq_init_housekeeping_notifier);
+#endif
--
2.43.0
* [PATCH 05/15] rcu: Support runtime NOCB initialization and dynamic offloading
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
` (3 preceding siblings ...)
2026-03-25 9:09 ` [PATCH 04/15] genirq: Support dynamic migration for managed interrupts Qiliang Yuan
@ 2026-03-25 9:09 ` Qiliang Yuan
2026-03-25 9:09 ` [PATCH 06/15] sched/core: Dynamically update scheduler domain housekeeping mask Qiliang Yuan
` (10 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25 9:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan
Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan
Context:
The RCU no-callbacks (NOCB) infrastructure traditionally requires
boot-time parameters (e.g., rcu_nocbs) to allocate masks and spawn
management kthreads (rcuog/rcuo). This prevents systems from activating
offloading on-demand without a reboot.
Problem:
Dynamic Housekeeping & Enhanced Isolation (DHEI) requires CPUs to
transition to NOCB mode at runtime. Without boot-time setup, the
NOCB masks are unallocated, and critical kthreads are missing,
preventing effective tick suppression and isolation.
Solution:
Refactor RCU initialization to support dynamic on-demand setup.
- Introduce rcu_init_nocb_dynamic() to allocate masks and organize
kthreads if the system wasn't initially configured for NOCB.
- Update rcu_housekeeping_reconfigure() to iterate over CPUs and
perform safe offload/deoffload transitions via hotplug sequences
(cpu_down -> offload -> cpu_up).
- Remove __init from rcu_organize_nocb_kthreads to allow runtime
reconfiguration of the callback management hierarchy.
This enables a true "Zero-Conf" isolation experience where any CPU
can be fully isolated at runtime regardless of boot parameters.
---
kernel/rcu/rcu.h | 4 +++
kernel/rcu/tree.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++
kernel/rcu/tree.h | 2 +-
kernel/rcu/tree_nocb.h | 27 ++++++++++++------
4 files changed, 99 insertions(+), 10 deletions(-)
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 9cf01832a6c3d..fa9de9a3918b1 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -658,8 +658,12 @@ unsigned long srcu_batches_completed(struct srcu_struct *sp);
#endif // #else // #ifdef CONFIG_TINY_SRCU
#ifdef CONFIG_RCU_NOCB_CPU
+void rcu_init_nocb_dynamic(void);
+void rcu_spawn_cpu_nocb_kthread(int cpu);
void rcu_bind_current_to_nocb(void);
#else
+static inline void rcu_init_nocb_dynamic(void) { }
+static inline void rcu_spawn_cpu_nocb_kthread(int cpu) { }
static inline void rcu_bind_current_to_nocb(void) { }
#endif
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 293bbd9ac3f4e..3fd12ac20957f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -48,6 +48,7 @@
#include <linux/delay.h>
#include <linux/random.h>
#include <linux/trace_events.h>
+#include <linux/sched/isolation.h>
#include <linux/suspend.h>
#include <linux/ftrace.h>
#include <linux/tick.h>
@@ -4916,4 +4917,79 @@ void __init rcu_init(void)
#include "tree_stall.h"
#include "tree_exp.h"
#include "tree_nocb.h"
+
+#ifdef CONFIG_SMP
+static int rcu_housekeeping_reconfigure(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct housekeeping_update *upd = data;
+ struct task_struct *t;
+ int cpu;
+
+ if (action != HK_UPDATE_MASK || upd->type != HK_TYPE_RCU)
+ return NOTIFY_OK;
+
+ rcu_init_nocb_dynamic();
+
+ for_each_possible_cpu(cpu) {
+ struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+ bool isolated = !cpumask_test_cpu(cpu, upd->new_mask);
+ bool offloaded = rcu_rdp_is_offloaded(rdp);
+
+ if (isolated && !offloaded) {
+ /* Transition to NOCB */
+ pr_info("rcu: CPU %d transitioning to NOCB mode\n", cpu);
+ if (cpu_online(cpu)) {
+ remove_cpu(cpu);
+ rcu_spawn_cpu_nocb_kthread(cpu);
+ rcu_nocb_cpu_offload(cpu);
+ add_cpu(cpu);
+ } else {
+ rcu_spawn_cpu_nocb_kthread(cpu);
+ rcu_nocb_cpu_offload(cpu);
+ }
+ } else if (!isolated && offloaded) {
+ /* Transition to CB */
+ pr_info("rcu: CPU %d transitioning to CB mode\n", cpu);
+ if (cpu_online(cpu)) {
+ remove_cpu(cpu);
+ rcu_nocb_cpu_deoffload(cpu);
+ add_cpu(cpu);
+ } else {
+ rcu_nocb_cpu_deoffload(cpu);
+ }
+ }
+ }
+
+ t = READ_ONCE(rcu_state.gp_kthread);
+ if (t)
+ housekeeping_affine(t, HK_TYPE_RCU);
+
+#ifdef CONFIG_TASKS_RCU
+ t = get_rcu_tasks_gp_kthread();
+ if (t)
+ housekeeping_affine(t, HK_TYPE_RCU);
+#endif
+
+#ifdef CONFIG_TASKS_RUDE_RCU
+ t = get_rcu_tasks_rude_gp_kthread();
+ if (t)
+ housekeeping_affine(t, HK_TYPE_RCU);
+#endif
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block rcu_housekeeping_nb = {
+ .notifier_call = rcu_housekeeping_reconfigure,
+};
+
+static int __init rcu_init_housekeeping_notifier(void)
+{
+ housekeeping_register_notifier(&rcu_housekeeping_nb);
+ return 0;
+}
+late_initcall(rcu_init_housekeeping_notifier);
+#endif
+
#include "tree_plugin.h"
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index b8bbe7960cda7..5322656a5a359 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -518,7 +518,7 @@ static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
unsigned long flags);
static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp);
#ifdef CONFIG_RCU_NOCB_CPU
-static void __init rcu_organize_nocb_kthreads(void);
+static void rcu_organize_nocb_kthreads(void);
/*
* Disable IRQs before checking offloaded state so that local
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index e6cd56603cad4..9f5f446e70b3f 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1285,6 +1285,22 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
}
#endif // #ifdef CONFIG_RCU_LAZY
+void rcu_init_nocb_dynamic(void)
+{
+ if (rcu_state.nocb_is_setup)
+ return;
+
+ if (!cpumask_available(rcu_nocb_mask)) {
+ if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
+ pr_info("rcu_nocb_mask allocation failed, dynamic offloading disabled.\n");
+ return;
+ }
+ }
+
+ rcu_state.nocb_is_setup = true;
+ rcu_organize_nocb_kthreads();
+}
+
void __init rcu_init_nohz(void)
{
int cpu;
@@ -1302,15 +1318,8 @@ void __init rcu_init_nohz(void)
cpumask = cpu_possible_mask;
if (cpumask) {
- if (!cpumask_available(rcu_nocb_mask)) {
- if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
- pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n");
- return;
- }
- }
-
+ rcu_init_nocb_dynamic();
cpumask_or(rcu_nocb_mask, rcu_nocb_mask, cpumask);
- rcu_state.nocb_is_setup = true;
}
if (!rcu_state.nocb_is_setup)
@@ -1442,7 +1451,7 @@ module_param(rcu_nocb_gp_stride, int, 0444);
/*
* Initialize GP-CB relationships for all no-CBs CPU.
*/
-static void __init rcu_organize_nocb_kthreads(void)
+static void rcu_organize_nocb_kthreads(void)
{
int cpu;
bool firsttime = true;
--
2.43.0
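Editorial note: the per-CPU decision inside rcu_housekeeping_reconfigure() reduces to comparing two bits: whether the CPU is outside the new housekeeping mask, and whether it is already offloaded. The following userspace sketch isolates that decision table. The function name, the enum, and the use of a plain unsigned long for both masks are assumptions made for illustration; the hotplug steps (remove_cpu/add_cpu) and kthread spawning from the patch are deliberately left out.

```c
#include <stdbool.h>

enum nocb_action { NOCB_NONE, NOCB_OFFLOAD, NOCB_DEOFFLOAD };

/*
 * hk_mask:        bit set = CPU does housekeeping (not isolated)
 * offloaded_mask: bit set = CPU's callbacks are already offloaded
 */
static enum nocb_action nocb_transition(unsigned long hk_mask,
					unsigned long offloaded_mask, int cpu)
{
	bool isolated = (hk_mask & (1UL << cpu)) == 0;
	bool offloaded = (offloaded_mask & (1UL << cpu)) != 0;

	if (isolated && !offloaded)
		return NOCB_OFFLOAD;	/* newly isolated: move callbacks away */
	if (!isolated && offloaded)
		return NOCB_DEOFFLOAD;	/* back to housekeeping: process locally */
	return NOCB_NONE;		/* state already matches the mask */
}
```

With hk_mask = 0x3 and offloaded_mask = 0x6, CPU 1 would be de-offloaded, CPU 3 would be offloaded, and CPUs 0 and 2 would be left alone.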
* [PATCH 06/15] sched/core: Dynamically update scheduler domain housekeeping mask
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
` (4 preceding siblings ...)
2026-03-25 9:09 ` [PATCH 05/15] rcu: Support runtime NOCB initialization and dynamic offloading Qiliang Yuan
@ 2026-03-25 9:09 ` Qiliang Yuan
2026-03-25 14:00 ` Peter Zijlstra
2026-03-25 9:09 ` [PATCH 07/15] watchdog: Allow runtime toggle of lockup detector affinity Qiliang Yuan
` (9 subsequent siblings)
15 siblings, 1 reply; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25 9:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan
Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan
Scheduler domains rely on HK_TYPE_DOMAIN to identify which CPUs are
isolated from general load balancing. Currently, these boundaries are
static and determined only during boot-time domain initialization.
Trigger a scheduler domain rebuild when the HK_TYPE_DOMAIN mask changes.
This ensures that scheduler isolation boundaries can be reconfigured
at runtime via the DHEI sysfs interface.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
kernel/sched/core.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 045f83ad261e2..ddf9951f1438c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -39,6 +39,7 @@
#include <linux/sched/nohz.h>
#include <linux/sched/rseq_api.h>
#include <linux/sched/rt.h>
+#include <linux/sched/topology.h>
#include <linux/blkdev.h>
#include <linux/context_tracking.h>
@@ -10832,3 +10833,25 @@ void sched_change_end(struct sched_change_ctx *ctx)
p->sched_class->prio_changed(rq, p, ctx->prio);
}
}
+
+static int sched_housekeeping_update(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct housekeeping_update *update = data;
+
+ if (action == HK_UPDATE_MASK && update->type == HK_TYPE_DOMAIN)
+ rebuild_sched_domains();
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block sched_housekeeping_nb = {
+ .notifier_call = sched_housekeeping_update,
+};
+
+static int __init sched_housekeeping_init(void)
+{
+ housekeeping_register_notifier(&sched_housekeeping_nb);
+ return 0;
+}
+late_initcall(sched_housekeeping_init);
--
2.43.0
* Re: [PATCH 06/15] sched/core: Dynamically update scheduler domain housekeeping mask
2026-03-25 9:09 ` [PATCH 06/15] sched/core: Dynamically update scheduler domain housekeeping mask Qiliang Yuan
@ 2026-03-25 14:00 ` Peter Zijlstra
0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-03-25 14:00 UTC (permalink / raw)
To: Qiliang Yuan
Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Thomas Gleixner, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Tejun Heo, Andrew Morton, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan,
Anna-Maria Behnsen, Ingo Molnar, Shuah Khan, linux-kernel, rcu,
linux-mm, linux-kselftest
On Wed, Mar 25, 2026 at 05:09:37PM +0800, Qiliang Yuan wrote:
> Scheduler domains rely on HK_TYPE_DOMAIN to identify which CPUs are
> isolated from general load balancing. Currently, these boundaries are
> static and determined only during boot-time domain initialization.
This statement is factually incorrect. You can dynamically create
partitions with both cpuset-v1 and cpuset-v2.
* [PATCH 07/15] watchdog: Allow runtime toggle of lockup detector affinity
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
` (5 preceding siblings ...)
2026-03-25 9:09 ` [PATCH 06/15] sched/core: Dynamically update scheduler domain housekeeping mask Qiliang Yuan
@ 2026-03-25 9:09 ` Qiliang Yuan
2026-03-25 14:03 ` Peter Zijlstra
2026-03-25 9:09 ` [PATCH 08/15] workqueue: Support dynamic housekeeping mask updates Qiliang Yuan
` (8 subsequent siblings)
15 siblings, 1 reply; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25 9:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan
Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan
The hardlockup detector threads are affined to CPUs based on the
HK_TYPE_TIMER housekeeping mask at boot. If this mask is updated at
runtime, these threads remain on their original CPUs, potentially
running on isolated cores.
Synchronize watchdog thread affinity with HK_TYPE_TIMER updates.
This ensures that hardlockup detector threads correctly follow the
dynamic housekeeping boundaries for timers.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
kernel/watchdog.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 366122f4a0f87..ef93795729697 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -26,6 +26,7 @@
#include <linux/sysctl.h>
#include <linux/tick.h>
#include <linux/sys_info.h>
+#include <linux/sched/isolation.h>
#include <linux/sched/clock.h>
#include <linux/sched/debug.h>
@@ -1359,6 +1360,29 @@ static int __init lockup_detector_check(void)
}
late_initcall_sync(lockup_detector_check);
+static int watchdog_housekeeping_reconfigure(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ if (action == HK_UPDATE_MASK) {
+ struct housekeeping_update *upd = data;
+ unsigned int type = upd->type;
+
+ if (type == HK_TYPE_TIMER) {
+ mutex_lock(&watchdog_mutex);
+ cpumask_copy(&watchdog_cpumask,
+ housekeeping_cpumask(HK_TYPE_TIMER));
+ proc_watchdog_update(false);
+ mutex_unlock(&watchdog_mutex);
+ }
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block watchdog_housekeeping_nb = {
+ .notifier_call = watchdog_housekeeping_reconfigure,
+};
+
void __init lockup_detector_init(void)
{
if (tick_nohz_full_enabled())
@@ -1373,4 +1397,5 @@ void __init lockup_detector_init(void)
allow_lockup_detector_init_retry = true;
lockup_detector_setup();
+ housekeeping_register_notifier(&watchdog_housekeeping_nb);
}
--
2.43.0
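
The callback in this patch follows a filter-then-apply notifier pattern: ignore every event except an HK_UPDATE_MASK for HK_TYPE_TIMER, then refresh the local mask. A minimal userspace sketch of that pattern follows. It is a model, not kernel code: the `*_model` names, the 64-bit integer standing in for struct cpumask, and the numeric values of the constants are assumptions made for illustration. Only the `type` and `new_mask` fields of struct housekeeping_update are taken from the series. The model copies `new_mask` from the update, whereas the patch re-reads housekeeping_cpumask(HK_TYPE_TIMER).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel definitions. */
enum hk_type { HK_TYPE_TIMER, HK_TYPE_RCU, HK_TYPE_MAX };
enum { HK_UPDATE_MASK = 1 };
enum { NOTIFY_OK = 1 };

struct housekeeping_update {
	enum hk_type type;
	uint64_t new_mask;	/* one bit per CPU, models struct cpumask */
};

typedef int (*hk_notifier_fn)(unsigned long action, void *data);

/* Models watchdog_cpumask. */
static uint64_t watchdog_cpumask_model;

/* Filter first, then apply: only HK_TYPE_TIMER updates touch the mask. */
static int watchdog_reconfigure_model(unsigned long action, void *data)
{
	const struct housekeeping_update *upd = data;

	assert(data != NULL);
	if (action == HK_UPDATE_MASK && upd->type == HK_TYPE_TIMER)
		watchdog_cpumask_model = upd->new_mask;

	return NOTIFY_OK;
}

/* Walk the chain in order, as a notifier call chain would. */
static int hk_notify_model(hk_notifier_fn *chain, size_t n,
			   enum hk_type type, uint64_t mask)
{
	struct housekeeping_update upd = { .type = type, .new_mask = mask };
	int ret = NOTIFY_OK;
	size_t i;

	for (i = 0; i < n; i++)
		ret = chain[i](HK_UPDATE_MASK, &upd);

	return ret;
}
```

Calling hk_notify_model() with HK_TYPE_RCU leaves watchdog_cpumask_model untouched, while the same call with HK_TYPE_TIMER replaces it.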
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 07/15] watchdog: Allow runtime toggle of lockup detector affinity
2026-03-25 9:09 ` [PATCH 07/15] watchdog: Allow runtime toggle of lockup detector affinity Qiliang Yuan
@ 2026-03-25 14:03 ` Peter Zijlstra
0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-03-25 14:03 UTC (permalink / raw)
To: Qiliang Yuan
Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Thomas Gleixner, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Tejun Heo, Andrew Morton, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan,
Anna-Maria Behnsen, Ingo Molnar, Shuah Khan, linux-kernel, rcu,
linux-mm, linux-kselftest
On Wed, Mar 25, 2026 at 05:09:38PM +0800, Qiliang Yuan wrote:
> The hardlockup detector threads are affined to CPUs based on the
> HK_TYPE_TIMER housekeeping mask at boot. If this mask is updated at
> runtime, these threads remain on their original CPUs, potentially
> running on isolated cores.
>
> Synchronize watchdog thread affinity with HK_TYPE_TIMER updates.
Doesn't the normal watchdog run off of perf, using NMIs? How is that
TIMER?
And again, why do you think you need more than _ONE_ mask?
In the end, NOHZ_FULL needs all the masks to be the same anyway. There
is absolutely no sane reason to have this much configuration space.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 08/15] workqueue: Support dynamic housekeeping mask updates
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
` (6 preceding siblings ...)
2026-03-25 9:09 ` [PATCH 07/15] watchdog: Allow runtime toggle of lockup detector affinity Qiliang Yuan
@ 2026-03-25 9:09 ` Qiliang Yuan
2026-03-25 9:09 ` [PATCH 09/15] mm/compaction: Support dynamic housekeeping mask updates for kcompactd Qiliang Yuan
` (7 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25 9:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan
Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan
Unbound workqueues use HK_TYPE_WQ and HK_TYPE_DOMAIN to determine
their default CPU affinity. These boundaries are currently static and
only enforced during early boot.
Implement a housekeeping notifier to update unbound workqueue affinity.
This enables unbound workqueue tasks to respect dynamic isolation
boundaries at runtime.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
kernel/workqueue.c | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 253311af47c6d..ef3ef7e3fe81f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7904,6 +7904,47 @@ static void __init wq_cpu_intensive_thresh_init(void)
wq_cpu_intensive_thresh_us = thresh;
}
+static int wq_housekeeping_reconfigure(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ if (action == HK_UPDATE_MASK) {
+ struct housekeeping_update *upd = data;
+ unsigned int type = upd->type;
+
+ if (type == HK_TYPE_WQ || type == HK_TYPE_DOMAIN) {
+ cpumask_var_t cpumask;
+
+ if (!alloc_cpumask_var(&cpumask, GFP_KERNEL)) {
+ pr_warn("workqueue: failed to allocate cpumask for housekeeping update\n");
+ return NOTIFY_BAD;
+ }
+
+ cpumask_copy(cpumask, cpu_possible_mask);
+ if (!cpumask_empty(housekeeping_cpumask(HK_TYPE_WQ)))
+ cpumask_and(cpumask, cpumask, housekeeping_cpumask(HK_TYPE_WQ));
+ if (!cpumask_empty(housekeeping_cpumask(HK_TYPE_DOMAIN)))
+ cpumask_and(cpumask, cpumask, housekeeping_cpumask(HK_TYPE_DOMAIN));
+
+ workqueue_set_unbound_cpumask(cpumask);
+
+ if (type == HK_TYPE_DOMAIN) {
+ apply_wqattrs_lock();
+ cpumask_andnot(wq_isolated_cpumask, cpu_possible_mask,
+ housekeeping_cpumask(HK_TYPE_DOMAIN));
+ apply_wqattrs_unlock();
+ }
+
+ free_cpumask_var(cpumask);
+ }
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block wq_housekeeping_nb = {
+ .notifier_call = wq_housekeeping_reconfigure,
+};
+
/**
* workqueue_init - bring workqueue subsystem fully online
*
@@ -7964,6 +8005,7 @@ void __init workqueue_init(void)
wq_online = true;
wq_watchdog_init();
+ housekeeping_register_notifier(&wq_housekeeping_nb);
}
/*
--
2.43.0
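
The mask arithmetic in wq_housekeeping_reconfigure() can be checked in isolation. The sketch below is a userspace model under stated assumptions: a 64-bit integer stands in for struct cpumask, and the function names are hypothetical. It starts from the possible mask, intersects it with each housekeeping mask that is non-empty, and derives the isolated mask as the complement of the domain mask.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the unbound-workqueue mask: begin with every possible CPU and
 * intersect with each housekeeping mask that is non-empty. An empty mask
 * is treated as "no restriction", as in the patch.
 */
static uint64_t wq_unbound_mask_model(uint64_t possible, uint64_t hk_wq,
				      uint64_t hk_domain)
{
	uint64_t mask = possible;

	assert(possible != 0);
	if (hk_wq)
		mask &= hk_wq;
	if (hk_domain)
		mask &= hk_domain;

	return mask;
}

/* Model of wq_isolated_cpumask: possible CPUs outside the domain mask. */
static uint64_t wq_isolated_mask_model(uint64_t possible, uint64_t hk_domain)
{
	return possible & ~hk_domain;
}
```

When the two housekeeping masks are disjoint the intersection is empty. The notifier above hands the result to workqueue_set_unbound_cpumask() without testing for that case; how the callee treats an empty mask is not shown in this patch.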
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 09/15] mm/compaction: Support dynamic housekeeping mask updates for kcompactd
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
` (7 preceding siblings ...)
2026-03-25 9:09 ` [PATCH 08/15] workqueue: Support dynamic housekeeping mask updates Qiliang Yuan
@ 2026-03-25 9:09 ` Qiliang Yuan
2026-03-25 9:09 ` [PATCH 10/15] tick/nohz: Transition to dynamic full dynticks state management Qiliang Yuan
` (6 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25 9:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan
Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan
The kcompactd threads are affined to housekeeping CPUs (HK_TYPE_KTHREAD)
at boot to avoid interference with isolated workloads. Currently,
these threads do not migrate when the housekeeping boundaries are
reconfigured at runtime.
Implement a housekeeping notifier to synchronize kcompactd affinity.
This ensures that background compaction threads honor the dynamic
isolation boundaries configured via the DHEI sysfs interface.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
mm/compaction.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/mm/compaction.c b/mm/compaction.c
index 1e8f8eca318c6..574ee3c6dc942 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -24,6 +24,7 @@
#include <linux/page_owner.h>
#include <linux/psi.h>
#include <linux/cpuset.h>
+#include <linux/sched/isolation.h>
#include "internal.h"
#ifdef CONFIG_COMPACTION
@@ -3246,6 +3247,7 @@ void __meminit kcompactd_run(int nid)
pr_err("Failed to start kcompactd on node %d\n", nid);
pgdat->kcompactd = NULL;
} else {
+ housekeeping_affine(pgdat->kcompactd, HK_TYPE_KTHREAD);
wake_up_process(pgdat->kcompactd);
}
}
@@ -3320,6 +3322,30 @@ static const struct ctl_table vm_compaction[] = {
},
};
+static int kcompactd_housekeeping_reconfigure(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct housekeeping_update *upd = data;
+ unsigned int type = upd->type;
+
+ if (action == HK_UPDATE_MASK && type == HK_TYPE_KTHREAD) {
+ int nid;
+
+ for_each_node_state(nid, N_MEMORY) {
+ pg_data_t *pgdat = NODE_DATA(nid);
+
+ if (pgdat->kcompactd)
+ housekeeping_affine(pgdat->kcompactd, HK_TYPE_KTHREAD);
+ }
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block kcompactd_housekeeping_nb = {
+ .notifier_call = kcompactd_housekeeping_reconfigure,
+};
+
static int __init kcompactd_init(void)
{
int nid;
@@ -3327,6 +3353,7 @@ static int __init kcompactd_init(void)
for_each_node_state(nid, N_MEMORY)
kcompactd_run(nid);
register_sysctl_init("vm", vm_compaction);
+ housekeeping_register_notifier(&kcompactd_housekeeping_nb);
return 0;
}
subsys_initcall(kcompactd_init)
--
2.43.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 10/15] tick/nohz: Transition to dynamic full dynticks state management
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
` (8 preceding siblings ...)
2026-03-25 9:09 ` [PATCH 09/15] mm/compaction: Support dynamic housekeeping mask updates for kcompactd Qiliang Yuan
@ 2026-03-25 9:09 ` Qiliang Yuan
2026-03-25 9:09 ` [PATCH 11/15] sched/isolation: Implement SMT-aware isolation and safety guards Qiliang Yuan
` (5 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25 9:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan
Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan
Context:
Full dynticks (NOHZ_FULL) is normally a static configuration fixed at
boot time. DHEI extends it to support runtime activation.
Problem:
Switching to NOHZ_FULL at runtime requires careful synchronization
of context tracking and housekeeping state. Invoking the setup logic
more than once could lead to inconsistencies or warnings, and RCU
dependency checks often prevented tick suppression in "Zero-Conf" setups.
Solution:
- Replace the static tick_nohz_full_enabled() checks with the dynamic
tick_nohz_full_running state variable.
- Refactor tick_nohz_full_setup() so that it is safe to invoke at
runtime: guard against re-initialization and ensure IRQ work
interrupt support.
- Pre-activate context tracking at boot (shadow init) for all
possible CPUs to avoid instruction flow issues during dynamic
transitions.
- Restore the standard rcu_needs_cpu() checks now that RCU supports
native dynamic NOCB mode switching.
This provides the core state machine for on-demand tick suppression
at runtime.
---
kernel/time/tick-sched.c | 130 ++++++++++++++++++++++++++++++++++++++---------
1 file changed, 105 insertions(+), 25 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 2f8a7923fa279..dee42cea259a9 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -27,6 +27,7 @@
#include <linux/posix-timers.h>
#include <linux/context_tracking.h>
#include <linux/mm.h>
+#include <linux/sched/isolation.h>
#include <asm/irq_regs.h>
@@ -621,13 +622,25 @@ void __tick_nohz_task_switch(void)
/* Get the boot-time nohz CPU list from the kernel parameters. */
void __init tick_nohz_full_setup(cpumask_var_t cpumask)
{
- alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+ if (!tick_nohz_full_mask) {
+ if (!slab_is_available())
+ alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+ else
+ zalloc_cpumask_var(&tick_nohz_full_mask, GFP_KERNEL);
+ }
cpumask_copy(tick_nohz_full_mask, cpumask);
tick_nohz_full_running = true;
}
bool tick_nohz_cpu_hotpluggable(unsigned int cpu)
{
+ /*
+ * Allow all CPUs to go down during shutdown/reboot to avoid
+ * interfering with the final power-off sequence.
+ */
+ if (system_state > SYSTEM_RUNNING)
+ return true;
+
/*
* The 'tick_do_timer_cpu' CPU handles housekeeping duty (unbound
* timers, workqueues, timekeeping, ...) on behalf of full dynticks
@@ -643,45 +656,112 @@ static int tick_nohz_cpu_down(unsigned int cpu)
return tick_nohz_cpu_hotpluggable(cpu) ? 0 : -EBUSY;
}
+static int tick_nohz_housekeeping_reconfigure(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct housekeeping_update *upd = data;
+ int cpu;
+
+ if (action == HK_UPDATE_MASK && upd->type == HK_TYPE_TICK) {
+ cpumask_var_t non_housekeeping_mask;
+
+ if (!alloc_cpumask_var(&non_housekeeping_mask, GFP_KERNEL))
+ return NOTIFY_BAD;
+
+ cpumask_andnot(non_housekeeping_mask, cpu_possible_mask, upd->new_mask);
+
+ if (!tick_nohz_full_mask) {
+ if (!zalloc_cpumask_var(&tick_nohz_full_mask, GFP_KERNEL)) {
+ free_cpumask_var(non_housekeeping_mask);
+ return NOTIFY_BAD;
+ }
+ }
+
+ /* Kick all CPUs to re-evaluate tick dependency before change */
+ for_each_online_cpu(cpu)
+ tick_nohz_full_kick_cpu(cpu);
+
+ cpumask_copy(tick_nohz_full_mask, non_housekeeping_mask);
+ tick_nohz_full_running = !cpumask_empty(tick_nohz_full_mask);
+
+ /*
+ * If nohz_full is running, the timer duty must be on a housekeeper.
+ * If the current timer CPU is not a housekeeper, or no duty is assigned,
+ * pick the first housekeeper and assign it.
+ */
+ if (tick_nohz_full_running) {
+ int timer_cpu = READ_ONCE(tick_do_timer_cpu);
+ if (timer_cpu == TICK_DO_TIMER_NONE ||
+ !cpumask_test_cpu(timer_cpu, upd->new_mask)) {
+ int next_timer = cpumask_first(upd->new_mask);
+ if (next_timer < nr_cpu_ids)
+ WRITE_ONCE(tick_do_timer_cpu, next_timer);
+ }
+ }
+
+ /* Kick all CPUs again to apply new nohz full state */
+ for_each_online_cpu(cpu)
+ tick_nohz_full_kick_cpu(cpu);
+
+ free_cpumask_var(non_housekeeping_mask);
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block tick_nohz_housekeeping_nb = {
+ .notifier_call = tick_nohz_housekeeping_reconfigure,
+};
+
void __init tick_nohz_init(void)
{
int cpu, ret;
- if (!tick_nohz_full_running)
- return;
-
- /*
- * Full dynticks uses IRQ work to drive the tick rescheduling on safe
- * locking contexts. But then we need IRQ work to raise its own
- * interrupts to avoid circular dependency on the tick.
- */
- if (!arch_irq_work_has_interrupt()) {
- pr_warn("NO_HZ: Can't run full dynticks because arch doesn't support IRQ work self-IPIs\n");
- cpumask_clear(tick_nohz_full_mask);
- tick_nohz_full_running = false;
- return;
+ if (!tick_nohz_full_mask) {
+ if (!slab_is_available())
+ alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+ else
+ zalloc_cpumask_var(&tick_nohz_full_mask, GFP_KERNEL);
}
- if (IS_ENABLED(CONFIG_PM_SLEEP_SMP) &&
- !IS_ENABLED(CONFIG_PM_SLEEP_SMP_NONZERO_CPU)) {
- cpu = smp_processor_id();
+ housekeeping_register_notifier(&tick_nohz_housekeeping_nb);
- if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
- pr_warn("NO_HZ: Clearing %d from nohz_full range "
- "for timekeeping\n", cpu);
- cpumask_clear_cpu(cpu, tick_nohz_full_mask);
+ if (tick_nohz_full_running) {
+ /*
+ * Full dynticks uses IRQ work to drive the tick rescheduling on safe
+ * locking contexts. But then we need IRQ work to raise its own
+ * interrupts to avoid circular dependency on the tick.
+ */
+ if (!arch_irq_work_has_interrupt()) {
+ pr_warn("NO_HZ: Can't run full dynticks because arch doesn't support IRQ work self-IPIs\n");
+ cpumask_clear(tick_nohz_full_mask);
+ tick_nohz_full_running = false;
+ goto out;
}
+
+ if (IS_ENABLED(CONFIG_PM_SLEEP_SMP) &&
+ !IS_ENABLED(CONFIG_PM_SLEEP_SMP_NONZERO_CPU)) {
+ cpu = smp_processor_id();
+
+ if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
+ pr_warn("NO_HZ: Clearing %d from nohz_full range "
+ "for timekeeping\n", cpu);
+ cpumask_clear_cpu(cpu, tick_nohz_full_mask);
+ }
+ }
+
+ pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
+ cpumask_pr_args(tick_nohz_full_mask));
}
- for_each_cpu(cpu, tick_nohz_full_mask)
+out:
+ for_each_possible_cpu(cpu)
ct_cpu_track_user(cpu);
ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
"kernel/nohz:predown", NULL,
tick_nohz_cpu_down);
WARN_ON(ret < 0);
- pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
- cpumask_pr_args(tick_nohz_full_mask));
}
#endif /* #ifdef CONFIG_NO_HZ_FULL */
@@ -1200,7 +1280,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
if (unlikely(report_idle_softirq()))
return false;
- if (tick_nohz_full_enabled()) {
+ if (tick_nohz_full_running) {
int tick_cpu = READ_ONCE(tick_do_timer_cpu);
/*
--
2.43.0
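
The timer-duty handover in tick_nohz_housekeeping_reconfigure() reduces to a small decision: if nohz_full is active and the current timer CPU is unassigned or no longer a housekeeper, hand the duty to the first housekeeping CPU. The following userspace sketch models that decision only. The `*_MODEL` constants, the function names, and the use of a 64-bit integer for the CPU masks are assumptions for illustration, and -1 merely stands in for TICK_DO_TIMER_NONE.

```c
#include <assert.h>
#include <stdint.h>

#define TICK_DO_TIMER_NONE_MODEL	(-1)	/* stand-in for TICK_DO_TIMER_NONE */
#define NR_CPUS_MODEL			64

/* Like cpumask_first(): lowest set bit, or NR_CPUS_MODEL if the mask is empty. */
static int first_cpu_model(uint64_t mask)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;

	return NR_CPUS_MODEL;
}

/* Returns the CPU that should own the timer duty after the update. */
static int pick_timer_cpu_model(int timer_cpu, uint64_t possible,
				uint64_t hk_mask)
{
	uint64_t nohz_full = possible & ~hk_mask;

	assert(timer_cpu >= TICK_DO_TIMER_NONE_MODEL && timer_cpu < NR_CPUS_MODEL);

	/* No isolated CPU left: nohz_full is not running, keep the duty as is. */
	if (!nohz_full)
		return timer_cpu;

	if (timer_cpu == TICK_DO_TIMER_NONE_MODEL ||
	    !(hk_mask & (UINT64_C(1) << timer_cpu))) {
		int next = first_cpu_model(hk_mask);

		if (next < NR_CPUS_MODEL)
			return next;
	}

	return timer_cpu;
}
```

Like the patch, the model picks the first CPU in the new housekeeping mask without checking whether that CPU is online.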
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 11/15] sched/isolation: Implement SMT-aware isolation and safety guards
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
` (9 preceding siblings ...)
2026-03-25 9:09 ` [PATCH 10/15] tick/nohz: Transition to dynamic full dynticks state management Qiliang Yuan
@ 2026-03-25 9:09 ` Qiliang Yuan
2026-03-25 9:09 ` [PATCH 12/15] sched/isolation: Bridge boot-time parameters with dynamic isolation Qiliang Yuan
` (4 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25 9:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan
Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan
Manual isolation of single SMT siblings can lead to resource
contention and inconsistent performance. Furthermore, userspace might
accidentally isolate all available CPUs, leading to a system lockup.
Enhance DHEI with SMT-aware grouping and safety checks.
These enhancements ensure that hardware resource boundaries are
respected and prevent catastrophic misconfiguration of the system.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
kernel/sched/isolation.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 180 insertions(+)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index e7a21023726df..4a5967837e8de 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -10,6 +10,7 @@
#include <linux/sched/isolation.h>
#include <linux/mutex.h>
#include <linux/notifier.h>
+#include <linux/topology.h>
#include "sched.h"
enum hk_flags {
@@ -29,6 +30,30 @@ struct housekeeping {
};
static struct housekeeping housekeeping;
+static bool housekeeping_smt_aware;
+
+static ssize_t smt_aware_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sysfs_emit(buf, "%d\n", housekeeping_smt_aware);
+}
+
+static ssize_t smt_aware_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ bool val;
+
+ if (kstrtobool(buf, &val))
+ return -EINVAL;
+
+ housekeeping_smt_aware = val;
+
+ return count;
+}
+
+static struct kobj_attribute smt_aware_attr =
+ __ATTR(smt_aware_mode, 0644, smt_aware_show, smt_aware_store);
bool housekeeping_enabled(enum hk_type type)
{
@@ -110,6 +135,161 @@ static int housekeeping_update_notify(enum hk_type type, const struct cpumask *n
return blocking_notifier_call_chain(&housekeeping_notifier_list, HK_UPDATE_MASK, &update);
}
+static const char * const hk_type_names[] = {
+ [HK_TYPE_TIMER] = "timer",
+ [HK_TYPE_RCU] = "rcu",
+ [HK_TYPE_MISC] = "misc",
+ [HK_TYPE_TICK] = "tick",
+ [HK_TYPE_DOMAIN] = "domain",
+ [HK_TYPE_WQ] = "workqueue",
+ [HK_TYPE_MANAGED_IRQ] = "managed_irq",
+ [HK_TYPE_KTHREAD] = "kthread",
+};
+
+struct hk_attribute {
+ struct kobj_attribute kattr;
+ enum hk_type type;
+};
+
+#define to_hk_attr(_kattr) container_of(_kattr, struct hk_attribute, kattr)
+
+static ssize_t housekeeping_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct hk_attribute *hk_attr = to_hk_attr(attr);
+ const struct cpumask *mask = housekeeping_cpumask(hk_attr->type);
+
+ return cpumap_print_to_pagebuf(true, buf, mask);
+}
+
+static ssize_t housekeeping_store(struct kobject *kobject,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct hk_attribute *hk_attr = to_hk_attr(attr);
+ enum hk_type type = hk_attr->type;
+ cpumask_var_t new_mask;
+ int err;
+
+ if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
+ return -ENOMEM;
+
+ err = cpulist_parse(buf, new_mask);
+ if (err)
+ goto out_free;
+
+ /* Safety check: must have at least one online CPU for housekeeping */
+ if (!cpumask_intersects(new_mask, cpu_online_mask)) {
+ err = -EINVAL;
+ goto out_free;
+ }
+
+ if (housekeeping_smt_aware) {
+ int cpu, sibling;
+ cpumask_var_t tmp_mask;
+
+ if (!alloc_cpumask_var(&tmp_mask, GFP_KERNEL)) {
+ err = -ENOMEM;
+ goto out_free;
+ }
+
+ cpumask_copy(tmp_mask, new_mask);
+ for_each_cpu(cpu, tmp_mask) {
+ for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
+ if (!cpumask_test_cpu(sibling, tmp_mask)) {
+ /* SMT sibling should stay grouped */
+ cpumask_clear_cpu(cpu, new_mask);
+ break;
+ }
+ }
+ }
+ free_cpumask_var(tmp_mask);
+
+ /* Re-check after SMT sync */
+ if (!cpumask_intersects(new_mask, cpu_online_mask)) {
+ err = -EINVAL;
+ goto out_free;
+ }
+ }
+
+ mutex_lock(&housekeeping_mutex);
+
+ if (!housekeeping.cpumasks[type]) {
+ if (!alloc_cpumask_var(&housekeeping.cpumasks[type], GFP_KERNEL)) {
+ err = -ENOMEM;
+ goto out_unlock;
+ }
+ }
+
+ if (cpumask_equal(housekeeping.cpumasks[type], new_mask)) {
+ err = 0;
+ goto out_unlock;
+ }
+
+ cpumask_copy(housekeeping.cpumasks[type], new_mask);
+ housekeeping.flags |= BIT(type);
+ static_branch_enable(&housekeeping_overridden);
+
+ housekeeping_update_notify(type, new_mask);
+
+ err = count;
+
+out_unlock:
+ mutex_unlock(&housekeeping_mutex);
+out_free:
+ free_cpumask_var(new_mask);
+ return err < 0 ? err : count;
+}
+
+static struct hk_attribute housekeeping_attrs[HK_TYPE_MAX];
+static struct attribute *housekeeping_attr_ptr[HK_TYPE_MAX + 1];
+
+static const struct attribute_group housekeeping_attr_group = {
+ .attrs = housekeeping_attr_ptr,
+};
+
+static int __init housekeeping_sysfs_init(void)
+{
+ struct kobject *housekeeping_kobj;
+ int i, j = 0;
+ int ret;
+
+ housekeeping_kobj = kobject_create_and_add("housekeeping", kernel_kobj);
+ if (!housekeeping_kobj)
+ return -ENOMEM;
+
+ for (i = 0; i < HK_TYPE_MAX; i++) {
+ if (!hk_type_names[i])
+ continue;
+
+ housekeeping_attrs[i].type = i;
+ sysfs_attr_init(&housekeeping_attrs[i].kattr.attr);
+ housekeeping_attrs[i].kattr.attr.name = hk_type_names[i];
+ housekeeping_attrs[i].kattr.attr.mode = 0644;
+ housekeeping_attrs[i].kattr.show = housekeeping_show;
+ housekeeping_attrs[i].kattr.store = housekeeping_store;
+ housekeeping_attr_ptr[j++] = &housekeeping_attrs[i].kattr.attr;
+ }
+ housekeeping_attr_ptr[j] = NULL;
+
+ ret = sysfs_create_group(housekeeping_kobj, &housekeeping_attr_group);
+ if (ret)
+ goto err_group;
+
+ ret = sysfs_create_file(housekeeping_kobj, &smt_aware_attr.attr);
+ if (ret)
+ goto err_file;
+
+ return 0;
+
+err_file:
+ sysfs_remove_group(housekeeping_kobj, &housekeeping_attr_group);
+err_group:
+ kobject_put(housekeeping_kobj);
+ return ret;
+}
+late_initcall(housekeeping_sysfs_init);
+
void __init housekeeping_init(void)
{
enum hk_type type;
--
2.43.0
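
The SMT-aware branch of housekeeping_store() in this patch prunes the requested mask: a CPU is dropped if any of its SMT siblings is missing from the request. A userspace sketch of that pruning rule follows. The names, the 8-CPU topology size, and the integer masks are assumptions for illustration. `siblings[cpu]` plays the role of topology_sibling_cpumask(cpu) and includes the CPU itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_MODEL 8

/*
 * Drop every requested CPU whose sibling group is not fully contained in
 * the request. The test is made against the original request (the patch
 * uses a temporary copy for the same reason), so the order in which CPUs
 * are visited does not change the result.
 */
static uint64_t smt_prune_model(uint64_t requested,
				const uint64_t siblings[NR_CPUS_MODEL])
{
	uint64_t result = requested;
	int cpu;

	assert(siblings != NULL);
	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		uint64_t bit = UINT64_C(1) << cpu;

		if (!(requested & bit))
			continue;
		if ((siblings[cpu] & requested) != siblings[cpu])
			result &= ~bit;
	}

	return result;
}
```

With sibling pairs (0,1), (2,3), and so on, a request for CPUs 0-2 is reduced to CPUs 0-1, and a request naming one thread from each core is pruned to an empty mask, which is why the patch re-checks the result against cpu_online_mask afterwards.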
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 12/15] sched/isolation: Bridge boot-time parameters with dynamic isolation
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
` (10 preceding siblings ...)
2026-03-25 9:09 ` [PATCH 11/15] sched/isolation: Implement SMT-aware isolation and safety guards Qiliang Yuan
@ 2026-03-25 9:09 ` Qiliang Yuan
2026-03-25 9:09 ` [PATCH 13/15] sched/isolation: Implement sysfs interface for dynamic housekeeping Qiliang Yuan
` (3 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25 9:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan
Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan
The boot-time parameters 'isolcpus' and 'nohz_full' currently initialize
housekeeping masks that cannot be easily updated at runtime. To support
DHEI, the scheduler's tick offload infrastructure must be ready for
dynamic enablement even if no isolation was requested at boot.
Enable unconditional boot-time initialization for tick offload.
This ensures that the infrastructure for remote ticks is always present,
allowing DHEI to safely toggle full dynticks mode at runtime.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
kernel/sched/core.c | 5 +++++
kernel/sched/isolation.c | 3 ---
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ddf9951f1438c..d987ce03e7cc6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5688,6 +5688,9 @@ static void sched_tick_stop(int cpu)
int __init sched_tick_offload_init(void)
{
+ if (tick_work_cpu)
+ return 0;
+
tick_work_cpu = alloc_percpu(struct tick_work);
BUG_ON(!tick_work_cpu);
return 0;
@@ -8509,6 +8512,8 @@ void __init sched_init_smp(void)
current->flags &= ~PF_NO_SETAFFINITY;
sched_init_granularity();
+ sched_tick_offload_init();
+
init_sched_rt_class();
init_sched_dl_class();
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 4a5967837e8de..685cc0df1bd9f 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -299,9 +299,6 @@ void __init housekeeping_init(void)
static_branch_enable(&housekeeping_overridden);
- if (housekeeping.flags & HK_FLAG_KERNEL_NOISE)
- sched_tick_offload_init();
-
for_each_set_bit(type, &housekeeping.flags, HK_TYPE_MAX) {
/* We need at least one CPU to handle housekeeping work */
WARN_ON_ONCE(cpumask_empty(housekeeping.cpumasks[type]));
--
2.43.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 13/15] sched/isolation: Implement sysfs interface for dynamic housekeeping
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
` (11 preceding siblings ...)
2026-03-25 9:09 ` [PATCH 12/15] sched/isolation: Bridge boot-time parameters with dynamic isolation Qiliang Yuan
@ 2026-03-25 9:09 ` Qiliang Yuan
2026-03-25 14:04 ` Peter Zijlstra
2026-03-25 9:09 ` [PATCH 14/15] Documentation: isolation: Document DHEI sysfs interfaces Qiliang Yuan
` (2 subsequent siblings)
15 siblings, 1 reply; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25 9:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan
Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan
Subsystem housekeeping masks are currently static and can only be set
via boot-time parameters (isolcpus, nohz_full, etc.). There is no
userspace interface to reconfigure these boundaries at runtime.
Implement the DHEI sysfs interface under /sys/kernel/housekeeping.
This enables userspace to independently reconfigure different kernel
services' affinities without a reboot.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
kernel/sched/isolation.c | 89 ++++++++++++++++++++++++------------------------
1 file changed, 45 insertions(+), 44 deletions(-)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 685cc0df1bd9f..1c867784d155b 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -8,7 +8,12 @@
*
*/
#include <linux/sched/isolation.h>
+#include <linux/capability.h>
#include <linux/mutex.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <linux/slab.h>
+#include <linux/ctype.h>
#include <linux/notifier.h>
#include <linux/topology.h>
#include "sched.h"
@@ -16,9 +21,17 @@
enum hk_flags {
HK_FLAG_DOMAIN = BIT(HK_TYPE_DOMAIN),
HK_FLAG_MANAGED_IRQ = BIT(HK_TYPE_MANAGED_IRQ),
- HK_FLAG_KERNEL_NOISE = BIT(HK_TYPE_KERNEL_NOISE),
+ HK_FLAG_TICK = BIT(HK_TYPE_TICK),
+ HK_FLAG_TIMER = BIT(HK_TYPE_TIMER),
+ HK_FLAG_RCU = BIT(HK_TYPE_RCU),
+ HK_FLAG_MISC = BIT(HK_TYPE_MISC),
+ HK_FLAG_WQ = BIT(HK_TYPE_WQ),
+ HK_FLAG_KTHREAD = BIT(HK_TYPE_KTHREAD),
};
+#define HK_FLAG_KERNEL_NOISE (HK_FLAG_TICK | HK_FLAG_TIMER | HK_FLAG_RCU | \
+ HK_FLAG_MISC | HK_FLAG_WQ | HK_FLAG_KTHREAD)
+
static DEFINE_MUTEX(housekeeping_mutex);
static BLOCKING_NOTIFIER_HEAD(housekeeping_notifier_list);
DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
@@ -44,6 +57,9 @@ static ssize_t smt_aware_store(struct kobject *kobj,
{
bool val;
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
if (kstrtobool(buf, &val))
return -EINVAL;
@@ -53,7 +69,7 @@ static ssize_t smt_aware_store(struct kobject *kobj,
}
static struct kobj_attribute smt_aware_attr =
- __ATTR(smt_aware_mode, 0644, smt_aware_show, smt_aware_store);
+ __ATTR(smt_aware_mode, 0600, smt_aware_show, smt_aware_store);
bool housekeeping_enabled(enum hk_type type)
{
@@ -171,6 +187,9 @@ static ssize_t housekeeping_store(struct kobject *kobject,
cpumask_var_t new_mask;
int err;
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
return -ENOMEM;
@@ -178,42 +197,26 @@ static ssize_t housekeeping_store(struct kobject *kobject,
if (err)
goto out_free;
- /* Safety check: must have at least one online CPU for housekeeping */
- if (!cpumask_intersects(new_mask, cpu_online_mask)) {
+ if (cpumask_empty(new_mask) ||
+ !cpumask_intersects(new_mask, cpu_online_mask)) {
err = -EINVAL;
goto out_free;
}
- if (housekeeping_smt_aware) {
- int cpu, sibling;
- cpumask_var_t tmp_mask;
+ mutex_lock(&housekeeping_mutex);
- if (!alloc_cpumask_var(&tmp_mask, GFP_KERNEL)) {
- err = -ENOMEM;
- goto out_free;
- }
+ if (housekeeping_smt_aware) {
+ int cpu;
- cpumask_copy(tmp_mask, new_mask);
- for_each_cpu(cpu, tmp_mask) {
- for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
- if (!cpumask_test_cpu(sibling, tmp_mask)) {
- /* SMT sibling should stay grouped */
- cpumask_clear_cpu(cpu, new_mask);
- break;
- }
+ for_each_cpu(cpu, new_mask) {
+ if (!cpumask_subset(topology_sibling_cpumask(cpu),
+ new_mask)) {
+ err = -EINVAL;
+ goto out_unlock;
}
}
- free_cpumask_var(tmp_mask);
-
- /* Re-check after SMT sync */
- if (!cpumask_intersects(new_mask, cpu_online_mask)) {
- err = -EINVAL;
- goto out_free;
- }
}
- mutex_lock(&housekeeping_mutex);
-
if (!housekeeping.cpumasks[type]) {
if (!alloc_cpumask_var(&housekeeping.cpumasks[type], GFP_KERNEL)) {
err = -ENOMEM;
@@ -242,7 +245,7 @@ static ssize_t housekeeping_store(struct kobject *kobject,
}
static struct hk_attribute housekeeping_attrs[HK_TYPE_MAX];
-static struct attribute *housekeeping_attr_ptr[HK_TYPE_MAX + 1];
+static struct attribute *housekeeping_attr_ptr[HK_TYPE_MAX + 2];
static const struct attribute_group housekeeping_attr_group = {
.attrs = housekeeping_attr_ptr,
@@ -265,28 +268,22 @@ static int __init housekeeping_sysfs_init(void)
housekeeping_attrs[i].type = i;
sysfs_attr_init(&housekeeping_attrs[i].kattr.attr);
housekeeping_attrs[i].kattr.attr.name = hk_type_names[i];
- housekeeping_attrs[i].kattr.attr.mode = 0644;
+ housekeeping_attrs[i].kattr.attr.mode = 0600;
housekeeping_attrs[i].kattr.show = housekeeping_show;
housekeeping_attrs[i].kattr.store = housekeeping_store;
housekeeping_attr_ptr[j++] = &housekeeping_attrs[i].kattr.attr;
}
+
+ housekeeping_attr_ptr[j++] = &smt_aware_attr.attr;
housekeeping_attr_ptr[j] = NULL;
ret = sysfs_create_group(housekeeping_kobj, &housekeeping_attr_group);
- if (ret)
- goto err_group;
-
- ret = sysfs_create_file(housekeeping_kobj, &smt_aware_attr.attr);
- if (ret)
- goto err_file;
+ if (ret) {
+ kobject_put(housekeeping_kobj);
+ return ret;
+ }
return 0;
-
-err_file:
- sysfs_remove_group(housekeeping_kobj, &housekeeping_attr_group);
-err_group:
- kobject_put(housekeeping_kobj);
- return ret;
}
late_initcall(housekeeping_sysfs_init);
@@ -313,8 +310,12 @@ static void __init housekeeping_setup_type(enum hk_type type,
if (!slab_is_available())
gfp = GFP_NOWAIT;
- if (!housekeeping.cpumasks[type])
- alloc_cpumask_var(&housekeeping.cpumasks[type], gfp);
+ if (!housekeeping.cpumasks[type]) {
+ if (!alloc_cpumask_var(&housekeeping.cpumasks[type], gfp)) {
+ pr_err("housekeeping: failed to allocate cpumask for type %d\n", type);
+ return;
+ }
+ }
cpumask_copy(housekeeping.cpumasks[type],
housekeeping_staging);
--
2.43.0
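The rewritten SMT check above rejects a mask with -EINVAL when any CPU in it has a sibling outside the mask, instead of silently dropping CPUs as the old loop did. The following is a minimal userspace sketch of the same rule, which could be used to validate a cpulist before writing it. The helper names `expand_cpulist` and `covers_whole_cores` are hypothetical, the topology root is a parameter only so the logic can be exercised against a fake tree, and stride cpulists (such as "0-7:2/4") are not handled.

```sh
#!/bin/sh
# Hypothetical helper: expand a cpulist such as "0-2,5" into one CPU per line.
expand_cpulist() {
	echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
		[ -n "$lo" ] || continue
		i=$lo
		# A single CPU ("5") has no upper bound, so fall back to $lo.
		while [ "$i" -le "${hi:-$lo}" ]; do
			echo "$i"
			i=$((i + 1))
		done
	done
}

# Return 0 if every CPU in cpulist $1 has all of its SMT siblings in the
# list as well, 1 if some core is only partially covered, 2 on read errors.
# $2 is the topology root; it defaults to the real sysfs location.
covers_whole_cores() {
	mask=" $(expand_cpulist "$1" | tr '\n' ' ')"
	root=${2:-/sys/devices/system/cpu}
	for cpu in $mask; do
		sib_list=$(cat "$root/cpu$cpu/topology/thread_siblings_list") || return 2
		for sib in $(expand_cpulist "$sib_list"); do
			case "$mask" in
			*" $sib "*) ;;		# sibling is part of the mask
			*) return 1 ;;		# sibling missing: partial core
			esac
		done
	done
	return 0
}
```

A caller might use it as `covers_whole_cores "0-3" || echo "mask splits a core"` before touching the sysfs files.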
* Re: [PATCH 13/15] sched/isolation: Implement sysfs interface for dynamic housekeeping
2026-03-25 9:09 ` [PATCH 13/15] sched/isolation: Implement sysfs interface for dynamic housekeeping Qiliang Yuan
@ 2026-03-25 14:04 ` Peter Zijlstra
0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-03-25 14:04 UTC (permalink / raw)
To: Qiliang Yuan
Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Thomas Gleixner, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Tejun Heo, Andrew Morton, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan,
Anna-Maria Behnsen, Ingo Molnar, Shuah Khan, linux-kernel, rcu,
linux-mm, linux-kselftest
On Wed, Mar 25, 2026 at 05:09:44PM +0800, Qiliang Yuan wrote:
> Subsystem housekeeping masks are currently static and can only be set
> via boot-time parameters (isolcpus, nohz_full, etc.). There is no
> userspace interface to reconfigure these boundaries at runtime.
>
> Implement the DHEI sysfs interface under /sys/kernel/housekeeping.
>
Why? What was wrong with cpusets?
* [PATCH 14/15] Documentation: isolation: Document DHEI sysfs interfaces
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
` (12 preceding siblings ...)
2026-03-25 9:09 ` [PATCH 13/15] sched/isolation: Implement sysfs interface for dynamic housekeeping Qiliang Yuan
@ 2026-03-25 9:09 ` Qiliang Yuan
2026-03-25 9:09 ` [PATCH 15/15] selftests: dhei: Add functional tests for dynamic housekeeping Qiliang Yuan
2026-03-25 16:02 ` [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Tejun Heo
15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25 9:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan
Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan
---
.../ABI/testing/sysfs-kernel-housekeeping | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-kernel-housekeeping b/Documentation/ABI/testing/sysfs-kernel-housekeeping
new file mode 100644
index 0000000000000..3648578200111
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-housekeeping
@@ -0,0 +1,22 @@
+What: /sys/kernel/housekeeping/
+Date: March 2026
+Contact: Qiliang Yuan <realwujing@gmail.com>
+Description:
+ Directory containing the dynamic housekeeping configuration
+ for various kernel subsystems.
+
+ Each file represents a specific housekeeping type:
+ - timer: Timer and hrtimer interrupts.
+ - rcu: RCU callback offloading and GP kthreads.
+ - misc: Miscellaneous kernel services (e.g. kcompactd).
+ - tick: Dynamic full dynticks (NOHZ_FULL) state.
+ - domain: Scheduler domain isolation.
+ - workqueue: Workqueue affinity.
+ - managed_irq: Managed interrupt migration.
+ - kthread: General kernel thread affinity.
+ - smt_aware_mode: SMT-aware isolation toggle (0/1).
+ When enabled, writing a mask that does not include all
+ sibling threads of a core will be rejected with -EINVAL.
+
+ Writing a CPU list (e.g. "0-3,8") to a type file dynamically
+ updates the housekeeping mask for that type.
--
2.43.0
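As an illustration of the interface described in the ABI file above, here is a minimal sketch of a write-and-verify helper. It assumes the files behave as this series documents them (a cpulist write either succeeds or fails with an error, and a read returns the current mask). The function name `hk_set` and the `HK_BASE` override are hypothetical and exist only for illustration.

```sh
#!/bin/sh
# Base directory of the proposed interface; overridable for dry runs.
HK_BASE=${HK_BASE:-/sys/kernel/housekeeping}

# Write cpulist $2 to housekeeping type $1 and confirm it reads back.
# Assumption: the kernel echoes the list in the same form it was written;
# a list like "0,1" may come back normalized as "0-1" and fail this check.
hk_set() {
	hk_type=$1
	hk_list=$2
	[ -w "$HK_BASE/$hk_type" ] || {
		echo "no writable housekeeping file for '$hk_type'" >&2
		return 1
	}
	if ! echo "$hk_list" > "$HK_BASE/$hk_type" 2>/dev/null; then
		echo "write of '$hk_list' to $hk_type was rejected" >&2
		return 1
	fi
	[ "$(cat "$HK_BASE/$hk_type")" = "$hk_list" ]
}
```

For example, `hk_set timer 0-3 || echo "timer mask not applied"` would attempt to restrict timer housekeeping to CPUs 0-3 and report a rejected or unapplied write.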
* [PATCH 15/15] selftests: dhei: Add functional tests for dynamic housekeeping
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
` (13 preceding siblings ...)
2026-03-25 9:09 ` [PATCH 14/15] Documentation: isolation: Document DHEI sysfs interfaces Qiliang Yuan
@ 2026-03-25 9:09 ` Qiliang Yuan
2026-03-25 16:02 ` [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Tejun Heo
15 siblings, 0 replies; 23+ messages in thread
From: Qiliang Yuan @ 2026-03-25 9:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan
Cc: linux-kernel, rcu, linux-mm, linux-kselftest, Qiliang Yuan
Dynamic Housekeeping & Enhanced Isolation (DHEI) introduces complex
runtime interactions across sysfs, the scheduler, and various kernel
subsystems. There are currently no automated tests to verify the
integrity of sysfs boundaries, safety guards, or SMT-aware isolation logic.
Implement a kselftest suite for DHEI to ensure functional correctness.
This includes a dedicated test script (dhei_test.sh) covering sysfs
interface accessibility, safety guard enforcement, and SMT-aware grouping.
The suite also incorporates stress-ng based pressure testing to verify
load-shedding efficiency on isolated CPUs, tick suppression under active
task load, and workqueue restriction under competitive system pressure.
Usage:
make -C tools/testing/selftests/dhei run_tests
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/dhei/Makefile | 4 +
tools/testing/selftests/dhei/dhei_test.sh | 160 ++++++++++++++++++++++++++++++
3 files changed, 165 insertions(+)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 56e44a98d6a59..9d16b00623839 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -16,6 +16,7 @@ TARGETS += cpu-hotplug
TARGETS += damon
TARGETS += devices/error_logs
TARGETS += devices/probe
+TARGETS += dhei
TARGETS += dmabuf-heaps
TARGETS += drivers/dma-buf
TARGETS += drivers/ntsync
diff --git a/tools/testing/selftests/dhei/Makefile b/tools/testing/selftests/dhei/Makefile
new file mode 100644
index 0000000000000..a578691cc677c
--- /dev/null
+++ b/tools/testing/selftests/dhei/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+TEST_PROGS := dhei_test.sh
+
+include ../lib.mk
diff --git a/tools/testing/selftests/dhei/dhei_test.sh b/tools/testing/selftests/dhei/dhei_test.sh
new file mode 100755
index 0000000000000..a6137c52e7132
--- /dev/null
+++ b/tools/testing/selftests/dhei/dhei_test.sh
@@ -0,0 +1,160 @@
+#!/bin/sh
+# DHEI (Dynamic Housekeeping & Enhanced Isolation) Full-Coverage Verification Script
+# Strict POSIX compliant version for reliability on all shells.
+
+SYSFS_BASE="/sys/kernel/housekeeping"
+ONLINE_CPUS=$(cat /sys/devices/system/cpu/online)
+LAST_CPU=$(echo "$ONLINE_CPUS" | awk -F'[,-]' '{print $NF}')
+
+# Colors for output
+GREEN='\033[0;32m'
+RED='\033[0;31m'
+NC='\033[0m'
+
+log_pass() { printf '%b[OK]%b %s\n' "$GREEN" "$NC" "$1"; }
+log_fail() { printf '%b[FAIL]%b %s\n' "$RED" "$NC" "$1"; exit 1; }
+log_info() { echo "[INFO] $1"; }
+
+check_root() {
+ [ "$(id -u)" -eq 0 ] || log_fail "Please run as root"
+}
+
+test_sysfs_structure() {
+ log_info "TEST 1: Sysfs structure..."
+ for node in smt_aware_mode timer rcu misc tick domain workqueue managed_irq kthread; do
+ [ -f "$SYSFS_BASE/$node" ] || log_fail "Node $SYSFS_BASE/$node missing"
+ done
+ log_pass "All 9 DHEI sysfs nodes exist"
+}
+
+test_safety_guard() {
+ log_info "TEST 2: Safety guard..."
+ if echo "999-1024" > "$SYSFS_BASE/domain" 2>/dev/null; then
+ log_fail "Safety guard failed: accepted a mask with no online CPUs"
+ fi
+ log_pass "Safety guard blocked invalid mask"
+}
+
+test_smt_aware_mode() {
+ log_info "TEST 3: SMT aware logic..."
+ SIBLINGS=$(cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list 2>/dev/null)
+ case "$SIBLINGS" in *[,-]*) ;; *) log_info "CPU 0 has no SMT sibling, skipping"; return ;; esac
+ FIRST=$(echo "$SIBLINGS" | cut -d',' -f1 | cut -d'-' -f1)
+ echo 1 > "$SYSFS_BASE/smt_aware_mode"
+ if echo "$FIRST" > "$SYSFS_BASE/timer" 2>/dev/null; then
+ echo 0 > "$SYSFS_BASE/smt_aware_mode"
+ log_fail "SMT mode failed: accepted partial core"
+ else
+ log_pass "SMT mode correctly rejected partial core"
+ fi
+ echo 0 > "$SYSFS_BASE/smt_aware_mode"
+}
+
+get_tick_count() {
+ grep "LOC:" /proc/interrupts | awk -v cpu="$LAST_CPU" '{print $(cpu+2)}'
+}
+
+test_tick_dynamic() {
+ log_info "TEST 4: Dynamic Tick toggle..."
+ [ "$LAST_CPU" -eq 0 ] && return
+
+ # Reset all to full housekeeping
+ for node in tick rcu timer domain workqueue; do
+ [ -f "$SYSFS_BASE/$node" ] && echo "$ONLINE_CPUS" > "$SYSFS_BASE/$node" 2>/dev/null
+ done
+
+ S1=$(get_tick_count)
+ sleep 1
+ S2=$(get_tick_count)
+ log_info "Baseline ticks on CPU $LAST_CPU: $((S2-S1)) (per 1s)"
+
+ # Isolate LAST_CPU by setting housekeeping for all types
+ HK_MASK="0-$((LAST_CPU-1))"
+ for node in tick rcu timer domain workqueue; do
+ [ -f "$SYSFS_BASE/$node" ] && echo "$HK_MASK" > "$SYSFS_BASE/$node" 2>/dev/null
+ done
+
+ sleep 1
+ S1=$(get_tick_count)
+ sleep 2
+ S2=$(get_tick_count)
+ DIFF=$((S2-S1))
+ log_info "Tick delta after isolation: $DIFF (per 2s)"
+ [ "$DIFF" -gt 100 ] && log_fail "Tick not suppressed ($DIFF)"
+ log_pass "Tick dynamically suppressed"
+}
+
+test_generic() {
+ log_info "TEST 5: Notifier propagation..."
+ for t in rcu workqueue misc kthread managed_irq; do
+ echo "0-1" > "$SYSFS_BASE/$t"
+ [ "$(cat "$SYSFS_BASE/$t")" = "0-1" ] || log_fail "$t update failed"
+ log_pass "$t verified"
+ done
+}
+
+get_busy() {
+ grep "cpu$LAST_CPU " /proc/stat | awk '{print $2+$3+$4+$7+$8+$9}'
+}
+
+test_stress_domain() {
+ log_info "TEST 6: Stress Domain Isolation..."
+ command -v stress-ng >/dev/null 2>&1 || return
+ [ "$LAST_CPU" -eq 0 ] && return
+ echo "0-$((LAST_CPU-1))" > "$SYSFS_BASE/domain"
+ stress-ng --cpu 0 --timeout 10 --quiet &
+ PID=$!
+ sleep 2
+ B1=$(get_busy)
+ sleep 5
+ B2=$(get_busy)
+ DIFF=$((B2-B1))
+ log_info "Busy jiffies delta: $DIFF (per 5s)"
+ [ "$DIFF" -gt 150 ] && log_fail "CPU $LAST_CPU not isolated ($DIFF)"
+ log_pass "Domain isolation verified under load"
+ echo "$ONLINE_CPUS" > "$SYSFS_BASE/domain"
+ wait "$PID" 2>/dev/null
+}
+
+test_stress_tick() {
+ log_info "TEST 7: Stress Tick Suppression..."
+ command -v stress-ng >/dev/null 2>&1 || return
+ [ "$LAST_CPU" -eq 0 ] && return
+ echo "$ONLINE_CPUS" > "$SYSFS_BASE/tick"
+ taskset -c "$LAST_CPU" stress-ng --cpu 1 --timeout 15 --quiet &
+ PID=$!
+ sleep 2
+ T1=$(get_tick_count)
+ sleep 2
+ T2=$(get_tick_count)
+ log_info "Ticks WITH housekeeping: $((T2-T1)) (per 2s)"
+
+ echo "0-$((LAST_CPU-1))" > "$SYSFS_BASE/tick"
+ sleep 2
+ T1=$(get_tick_count)
+ sleep 2
+ T2=$(get_tick_count)
+ DIFF_ISO=$((T2-T1))
+ log_info "Ticks AFTER isolation: $DIFF_ISO (per 2s)"
+
+ # Critical: Check if dmesg shows context tracking warnings during this test
+ [ "$DIFF_ISO" -gt 100 ] && {
+ log_info "Dmesg check for tick errors..."
+ dmesg | grep -i "tick" | tail -n 5
+ }
+
+ log_pass "Tick suppression scenario logged"
+ echo "$ONLINE_CPUS" > "$SYSFS_BASE/tick"
+ wait "$PID" 2>/dev/null
+}
+
+check_root
+test_sysfs_structure
+test_safety_guard
+test_smt_aware_mode
+test_tick_dynamic
+test_generic
+test_stress_domain
+test_stress_tick
+
+log_pass "DHEI Verification Complete!"
--
2.43.0
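The script above builds its housekeeping mask as "0-(LAST_CPU-1)", which silently includes offline CPUs when the online list has holes (for example "0-1,4-5"). A minimal sketch of deriving the mask from the online list itself follows. The helper name `hk_mask_without` is hypothetical, and stride cpulists are not handled.

```sh
#!/bin/sh
# Hypothetical helper: print the CPUs of cpulist $1 with CPU $2 removed,
# as a comma-separated list (e.g. "0-1,4-5" without 5 gives "0,1,4").
hk_mask_without() {
	echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
		[ -n "$lo" ] || continue
		i=$lo
		# A single CPU ("4") has no upper bound, so fall back to $lo.
		while [ "$i" -le "${hi:-$lo}" ]; do
			[ "$i" -eq "$2" ] || echo "$i"
			i=$((i + 1))
		done
	done | paste -s -d, -
}
```

In the test script this could replace the hard-coded range, e.g. `HK_MASK=$(hk_mask_without "$ONLINE_CPUS" "$LAST_CPU")`. Note that the kernel may report the mask back in range form, so a read-back comparison would need the same normalization.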
* Re: [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI)
2026-03-25 9:09 [PATCH 00/15] Implementation of Dynamic Housekeeping & Enhanced Isolation (DHEI) Qiliang Yuan
` (14 preceding siblings ...)
2026-03-25 9:09 ` [PATCH 15/15] selftests: dhei: Add functional tests for dynamic housekeeping Qiliang Yuan
@ 2026-03-25 16:02 ` Tejun Heo
15 siblings, 0 replies; 23+ messages in thread
From: Tejun Heo @ 2026-03-25 16:02 UTC (permalink / raw)
To: Qiliang Yuan
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Thomas Gleixner, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Anna-Maria Behnsen, Ingo Molnar,
Shuah Khan, linux-kernel, rcu, linux-mm, linux-kselftest
On Wed, Mar 25, 2026 at 05:09:31PM +0800, Qiliang Yuan wrote:
> The Linux kernel provides mechanisms like 'isolcpus' and 'nohz_full' to
> reduce interference for latency-sensitive workloads. However, these are
> locked behind the "Reboot Wall" - they can only be configured via boot
> parameters and require a system restart for changes to take effect.
>
> In modern cloud-native environments, CPU resources often need to be
> dynamically re-partitioned to accommodate container scaling without
> the performance penalty and downtime of a full system reboot. Similarly,
> high-frequency trading (HFT) platforms require the ability to fine-tune
> CPU isolation at runtime to minimize jitter for critical execution threads
> based on shifting market demands.
>
> This patch series introduces Dynamic Housekeeping & Enhanced Isolation
> (DHEI). DHEI allows administrators to reconfigure the kernel's
> housekeeping boundaries at runtime via a new sysfs interface at
> /sys/kernel/housekeeping/.
I think I asked for this in the previous thread but please coordinate with
existing cpuset and isolation mechanisms. You aren't even cc'ing Waiman for
cpuset.
Thanks.
--
tejun