From: Jing Wu <realwujing@gmail.com>
To: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun@kernel.org>,
Uladzislau Rezki <urezki@gmail.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang@linux.dev>,
Anna-Maria Behnsen <anna-maria@linutronix.de>,
Tejun Heo <tj@kernel.org>, Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Shuah Khan <shuah@kernel.org>, Thomas Gleixner <tglx@kernel.org>
Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kselftest@vger.kernel.org, Jing Wu <realwujing@gmail.com>,
Qiliang Yuan <yuanql9@chinatelecom.cn>
Subject: [PATCH v3 08/13] genirq: Add explicit housekeeping callback for managed IRQ migration
Date: Thu, 18 Jun 2026 11:11:19 +0800
Message-ID: <20260618-wujing-dhm-v3-8-28f1a4d83b68@gmail.com>
In-Reply-To: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>

Register a housekeeping callback for HK_TYPE_MANAGED_IRQ. When the
mask changes, iterate over all active managed interrupts, intersect
their current affinity mask with the new housekeeping mask, and re-apply
the result with irq_do_set_affinity(). Managed interrupts on CPUs
removed from the housekeeping set are migrated to the remaining
housekeeping CPUs.

Only managed interrupts (IRQD_AFFINITY_MANAGED, tested with
irqd_affinity_is_managed()) are selected because the kernel owns their
affinity; user-controlled IRQ affinities must not be overridden by the
housekeeping layer.

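The selection rule above can be sketched as a small userspace model.
This is not kernel code: struct model_irq, model_restrict_managed() and
the 64-bit masks are hypothetical stand-ins for struct irq_desc, the
callback and struct cpumask, and the online-CPU check is left out for
brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for an IRQ descriptor; not a kernel type. */
struct model_irq {
	bool managed;		/* models irqd_affinity_is_managed() */
	uint64_t affinity;	/* bit n set => CPU n may service the IRQ */
};

/*
 * Walk the table and restrict only managed entries to hk_mask.
 * Returns the number of entries whose restricted mask was applied.
 */
static size_t model_restrict_managed(struct model_irq *irqs, size_t n,
				     uint64_t hk_mask)
{
	size_t applied = 0;

	assert(irqs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t restricted;

		if (!irqs[i].managed)
			continue;	/* user-controlled: never overridden */

		restricted = irqs[i].affinity & hk_mask;
		if (restricted == 0)
			continue;	/* no housekeeping CPU left: keep as is */

		irqs[i].affinity = restricted;
		applied++;
	}
	return applied;
}
```

In this model an unmanaged entry keeps its mask no matter what hk_mask
contains, which is the property the paragraph above asks for.
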
The new HK_TYPE_MANAGED_IRQ cpumask is snapshotted once under an RCU
read lock before the IRQ loop, satisfying the lockdep annotation in
housekeeping_cpumask() for runtime-mutable types.
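
The snapshot-then-iterate pattern can be illustrated with a loose
userspace analogy. RCU itself is not modelled here: a C11 atomic load
stands in for the rcu_read_lock() / copy / rcu_read_unlock() sequence,
and model_hk_mask, model_hk_publish() and model_count_outside() are
hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical published mask; bit n set => CPU n is a housekeeping CPU. */
static _Atomic uint64_t model_hk_mask = 0xF;

/* Writer side: publish a new mask (stands in for the runtime update). */
static void model_hk_publish(uint64_t new_mask)
{
	atomic_store_explicit(&model_hk_mask, new_mask, memory_order_release);
}

/*
 * Reader side: take ONE snapshot, then use only the private copy, so
 * every entry is judged against the same mask even if a writer
 * publishes a new one while the loop runs.
 * Returns how many entries have CPUs outside the snapshot.
 */
static size_t model_count_outside(const uint64_t *affinity, size_t n)
{
	uint64_t snap;
	size_t outside = 0;

	assert(affinity != NULL || n == 0);
	snap = atomic_load_explicit(&model_hk_mask, memory_order_acquire);

	for (size_t i = 0; i < n; i++) {
		if (affinity[i] & ~snap)
			outside++;
	}
	return outside;
}
```

Copying once also keeps the read-side section short: the long walk over
all entries runs on the private copy, outside that section.
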

When the intersection of the IRQ's current affinity and the new
housekeeping mask contains at least one online CPU,
irq_do_set_affinity() moves the IRQ to the restricted set. Otherwise
(all CPUs that were serving this IRQ are now isolated or offline), the
affinity update is skipped and the IRQ temporarily continues to run on
the isolated CPU. Full support for the IRQ shutdown / re-startup path
(when all serving CPUs become isolated) is left for follow-up work.

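The per-IRQ decision described above, including the online-CPU test
used in the code, can be written as a small pure function. This is a
userspace sketch with hypothetical names (enum model_hk_action,
model_hk_decide()) and 64-bit masks in place of struct cpumask; it only
mirrors the decision and does not program any hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of the per-IRQ decision. */
enum model_hk_action {
	MODEL_HK_MIGRATE,	/* apply the restricted mask */
	MODEL_HK_SKIP,		/* leave the IRQ where it is for now */
};

/*
 * Restrict 'affinity' to 'hk_mask'. If the result still contains an
 * online CPU, report MIGRATE and return the restricted mask in *out;
 * otherwise report SKIP and return the unchanged affinity in *out.
 */
static enum model_hk_action model_hk_decide(uint64_t affinity,
					    uint64_t hk_mask,
					    uint64_t online_mask,
					    uint64_t *out)
{
	uint64_t restricted = affinity & hk_mask;

	assert(out != NULL);

	if (restricted & online_mask) {
		*out = restricted;
		return MODEL_HK_MIGRATE;
	}

	*out = affinity;
	return MODEL_HK_SKIP;
}
```

With CPUs 2-3 as the current affinity and CPUs 0-2 as the new
housekeeping mask, the model picks CPU 2. If the housekeeping mask
shrinks to CPUs 0-1, the intersection is empty and the affinity is left
untouched.
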
The walk holds irq_lock_sparse(), and each update is done under the
descriptor's raw spinlock (desc->lock) to prevent races with concurrent
affinity changes.

Signed-off-by: Jing Wu <realwujing@gmail.com>
Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
---
kernel/irq/manage.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 86 insertions(+)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 2e80724378267..ea97f455eab2a 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -2801,3 +2801,89 @@ bool irq_check_status_bit(unsigned int irq, unsigned int bitmask)
 	return res;
 }
 EXPORT_SYMBOL_GPL(irq_check_status_bit);
+
+/*
+ * Managed IRQ housekeeping callback: iterate all managed IRQs and ask
+ * the chip to move them off CPUs newly removed from HK_TYPE_MANAGED_IRQ.
+ */
+static void irq_hk_apply(enum hk_type type)
+{
+	cpumask_var_t hk_mask;
+	struct irq_desc *desc;
+	unsigned int irq;
+
+	if (!alloc_cpumask_var(&hk_mask, GFP_KERNEL))
+		return;
+
+	/*
+	 * Snapshot the new HK_TYPE_MANAGED_IRQ mask under an RCU read lock
+	 * before iterating IRQ descriptors. The lockdep annotation in
+	 * housekeeping_cpumask() requires an RCU read-side critical section
+	 * for runtime-mutable types.
+	 */
+	rcu_read_lock();
+	cpumask_copy(hk_mask, housekeeping_cpumask_rcu(HK_TYPE_MANAGED_IRQ));
+	rcu_read_unlock();
+
+	irq_lock_sparse();
+
+	for_each_active_irq(irq) {
+		desc = irq_to_desc(irq);
+		if (!desc || !desc->action)
+			continue;
+
+		/*
+		 * Only managed interrupts (IRQD_AFFINITY_MANAGED) are
+		 * selected: the kernel owns their affinity. User-controlled
+		 * IRQs are intentionally skipped.
+		 *
+		 * If the current affinity restricted to the new housekeeping
+		 * mask still contains an online CPU, re-apply the restricted
+		 * affinity to migrate the IRQ away from newly isolated CPUs.
+		 * Otherwise (all serving CPUs are now isolated) the IRQ stays
+		 * on its current CPU for now; the IRQ shutdown / re-startup
+		 * path is left for a follow-up.
+		 */
+		if (irqd_affinity_is_managed(&desc->irq_data)) {
+			const struct cpumask *mask;
+			struct cpumask *tmp;
+
+			raw_spin_lock_irq(&desc->lock);
+			tmp = this_cpu_ptr(&__tmp_mask);
+			mask = irq_data_get_affinity_mask(&desc->irq_data);
+			cpumask_and(tmp, mask, hk_mask);
+			if (cpumask_intersects(tmp, cpu_online_mask))
+				irq_do_set_affinity(&desc->irq_data, tmp, false);
+			raw_spin_unlock_irq(&desc->lock);
+		}
+	}
+
+	irq_unlock_sparse();
+	free_cpumask_var(hk_mask);
+}
+
+static int irq_hk_validate(enum hk_type type,
+			   const struct cpumask *cur_mask,
+			   const struct cpumask *new_mask)
+{
+	if (!IS_ENABLED(CONFIG_SMP))
+		return -EOPNOTSUPP;
+	return 0;
+}
+
+static struct housekeeping_cbs irq_hk_cbs = {
+	.name		= "genirq/managed",
+	.pre_validate	= irq_hk_validate,
+	.apply		= irq_hk_apply,
+};
+
+static int __init irq_hk_init(void)
+{
+	int ret;
+
+	ret = housekeeping_register_cbs(HK_TYPE_MANAGED_IRQ, &irq_hk_cbs);
+	if (ret)
+		pr_info("managed IRQ runtime migration disabled (%d)\n", ret);
+	return 0;
+}
+late_initcall(irq_hk_init);
--
2.43.0
Thread overview: 22+ messages
2026-06-18 3:11 [PATCH v3 00/13] Dynamic Housekeeping Management (DHM) via CPUSets Jing Wu
2026-06-18 3:11 ` [PATCH v3 01/13] sched/isolation: Replace notifier chain with explicit callback interface Jing Wu
2026-06-18 3:11 ` [PATCH v3 02/13] sched/isolation: Add housekeeping_update_types() for kernel-noise masks Jing Wu
2026-06-18 3:11 ` [PATCH v3 03/13] sched/isolation: RCU-protect all housekeeping cpumask readers Jing Wu
2026-06-18 3:11 ` [PATCH v3 04/13] sched/isolation: Fix RCU protection for runtime-mutable cpumask callers Jing Wu
2026-06-18 3:11 ` [PATCH v3 05/13] cpu/hotplug: Reserve CPUHP states for nohz_full and managed IRQ down-paths Jing Wu
2026-06-18 16:06 ` Thomas Gleixner
2026-06-18 21:01 ` Thomas Gleixner
2026-06-18 3:11 ` [PATCH v3 06/13] tick/nohz, context_tracking: Prepare for runtime nohz_full updates Jing Wu
2026-06-18 17:27 ` Thomas Gleixner
2026-06-18 19:49 ` Thomas Gleixner
2026-06-18 3:11 ` [PATCH v3 07/13] rcu/nocb: Add explicit housekeeping callback for runtime NOCB toggling Jing Wu
2026-06-18 3:11 ` Jing Wu [this message]
2026-06-18 20:27 ` [PATCH v3 08/13] genirq: Add explicit housekeeping callback for managed IRQ migration Thomas Gleixner
2026-06-18 21:11 ` Thomas Gleixner
2026-06-18 3:11 ` [PATCH v3 09/13] watchdog/lockup_detector: Register housekeeping callback for kernel-noise Jing Wu
2026-06-18 3:11 ` [PATCH v3 10/13] sched: Guard sched_tick_start/stop against uninitialized tick_work_cpu Jing Wu
2026-06-18 20:50 ` Thomas Gleixner
2026-06-18 3:11 ` [PATCH v3 11/13] cgroup/cpuset: Extend isolated partition to trigger kernel-noise isolation Jing Wu
2026-06-18 20:55 ` Thomas Gleixner
2026-06-18 3:11 ` [PATCH v3 12/13] docs: cgroup-v2: Document kernel-noise isolation via isolated partitions Jing Wu
2026-06-18 3:11 ` [PATCH v3 13/13] selftests/cgroup: Add kernel-noise isolation test to cpuset selftest Jing Wu