From: Thomas Gleixner <tglx@kernel.org>
To: Jing Wu <realwujing@gmail.com>, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun@kernel.org>,
Uladzislau Rezki <urezki@gmail.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang@linux.dev>,
Anna-Maria Behnsen <anna-maria@linutronix.de>,
Tejun Heo <tj@kernel.org>, Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Shuah Khan <shuah@kernel.org>, Waiman Long <longman@redhat.com>
Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kselftest@vger.kernel.org, Jing Wu <realwujing@gmail.com>,
Qiliang Yuan <yuanql9@chinatelecom.cn>
Subject: Re: [PATCH v3 08/13] genirq: Add explicit housekeeping callback for managed IRQ migration
Date: Thu, 18 Jun 2026 22:27:57 +0200
Message-ID: <87cxxnegqa.ffs@fw13>
In-Reply-To: <20260618-wujing-dhm-v3-8-28f1a4d83b68@gmail.com>
On Thu, Jun 18 2026 at 11:11, Jing Wu wrote:
> +
> +/*
> + * Managed IRQ housekeeping callback: iterate all managed IRQs and ask
S/IRQ/interrupt/
> + * the chip to move them off CPUs newly removed from HK_TYPE_MANAGED_IRQ.
Also this doesn't ask the chip to move it.
> + */
> +static void irq_hk_apply(enum hk_type type)
> +{
> +	cpumask_var_t hk_mask;
> +	struct irq_desc *desc;
> +	unsigned int irq;
> +
> +	if (!alloc_cpumask_var(&hk_mask, GFP_KERNEL))
> +		return;
> +
> +	/*
> +	 * Snapshot the new HK_TYPE_MANAGED_IRQ mask under an RCU read lock
> +	 * before iterating IRQ descriptors. The lockdep annotation in
> +	 * housekeeping_cpumask() requires an RCU read-side critical section
> +	 * for runtime-mutable types.
> +	 */
> +	rcu_read_lock();
> +	cpumask_copy(hk_mask, housekeeping_cpumask_rcu(HK_TYPE_MANAGED_IRQ));
> +	rcu_read_unlock();
Same comments as in the nohz patch.
> +
> +	irq_lock_sparse();
> +
> +	for_each_active_irq(irq) {
> +		desc = irq_to_desc(irq);
> +		if (!desc || !desc->action)
> +			continue;
> +
	for (unsigned int irq = 0; irq < total_nr_irqs; irq++) {
		struct irq_desc *desc;

		scoped_guard(rcu)
			desc = irq_find_desc_at_or_after(irq);
		....
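
For readers unfamiliar with the "find at or after" style of walking a sparse
interrupt number space, here is a minimal user-space sketch of the pattern.
It is only a toy model: toy_desc, toy_descs, toy_find_desc_at_or_after() and
toy_count_requested() are hypothetical names, the table contents are made up,
and there is no RCU or locking. The mail elides the loop body, so skipping
the gap by assigning desc->irq and checking for an installed action are
assumptions, not what the elided code necessarily does.

```c
#include <stddef.h>

/* Hypothetical stand-in for an interrupt descriptor. */
struct toy_desc {
	unsigned int irq;	/* interrupt number */
	int has_action;		/* non-zero if a handler is installed */
};

/* Sparse table, sorted by interrupt number (made-up contents). */
static struct toy_desc toy_descs[] = {
	{ 1, 1 }, { 4, 0 }, { 9, 1 },
};

#define TOY_NR_DESCS		(sizeof(toy_descs) / sizeof(toy_descs[0]))
#define TOY_TOTAL_NR_IRQS	16u

/* Return the first descriptor whose number is >= irq, or NULL. */
static struct toy_desc *toy_find_desc_at_or_after(unsigned int irq)
{
	for (size_t i = 0; i < TOY_NR_DESCS; i++) {
		if (toy_descs[i].irq >= irq)
			return &toy_descs[i];
	}
	return NULL;
}

/* Walk the sparse space and count descriptors with a handler. */
static unsigned int toy_count_requested(void)
{
	unsigned int count = 0;

	for (unsigned int irq = 0; irq < TOY_TOTAL_NR_IRQS; irq++) {
		struct toy_desc *desc = toy_find_desc_at_or_after(irq);

		if (!desc)
			break;
		/* Assumption: jump over the gap to the descriptor found. */
		irq = desc->irq;
		if (desc->has_action)
			count++;
	}
	return count;
}
```

In this model each lookup is self-contained, so nothing has to be held across
loop iterations; in the snippet above the lookup sits inside scoped_guard(rcu).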
> +		/*
> +		 * Only managed interrupts are selected: they have
> +		 * IRQF_AFFINITY_MANAGED set, meaning the kernel owns their
> +		 * affinity. User-controlled IRQs are intentionally skipped.
> +		 *
> +		 * When the intersection of the current affinity mask and the
> +		 * new housekeeping mask is non-empty, re-apply the restricted
> +		 * affinity to migrate the IRQ away from newly isolated CPUs.
> +		 * If the intersection is empty (all serving CPUs are now
> +		 * isolated), the IRQ is left on its current CPU temporarily;
> +		 * handling that case (IRQ shutdown / re-startup) is left for
> +		 * a follow-up.
Oh well...
> +		 */
> +		if (irqd_affinity_is_managed(&desc->irq_data)) {
So you set the affinity even on an interrupt which is shut down?
> +			const struct cpumask *mask;
> +			struct cpumask *tmp = this_cpu_ptr(&__tmp_mask);
> +
> +			raw_spin_lock_irq(&desc->lock);
guard()
> +			mask = irq_data_get_affinity_mask(&desc->irq_data);
> +			cpumask_and(tmp, mask, hk_mask);
> +			if (cpumask_intersects(tmp, cpu_online_mask))
> +				irq_do_set_affinity(&desc->irq_data, tmp, false);
That's completely broken. You _cannot_ change the affinity mask of a
managed interrupt. The mask itself is immutable.

The effective affinity can be changed by invoking the affinity setter
with the original unmodified mask. irq_do_set_affinity() already deals
with the housekeeping mask.

Also invoking irq_do_set_affinity() directly here is just wrong. It
breaks interrupts which cannot be moved in process context.

But even if that is fixed, there is zero coordination with the
affected drivers/subsystems. Managed interrupts are related to device
and block queues and you cannot change one without the other. Neither
can you stop managed interrupts without quiescing the related device
queue. Starting them up also requires reenabling the device queue.

This problem needs to be fixed no matter what. See below.
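
To make the distinction between the immutable affinity mask and the
effective affinity concrete, here is a small user-space toy model with one
bit per CPU. It is not kernel code: toy_irq and toy_set_effective() are
hypothetical names. The fallback to the full mask when no online
housekeeping CPU is left, and the choice of the lowest eligible CPU, are
assumptions made for illustration. The only point is that the setter gets
the unmodified mask and narrows just the programmed target.

```c
#include <stdint.h>

/* Toy model, one bit per CPU. All names are hypothetical. */
struct toy_irq {
	uint64_t affinity;	/* managed affinity mask, never modified */
	uint64_t effective;	/* CPU the interrupt is actually routed to */
};

/*
 * Model of a setter which is handed the original affinity mask and
 * restricts only the programmed target to housekeeping CPUs.
 *
 * Returns 0 on success, -1 if no CPU in the affinity mask is online.
 */
static int toy_set_effective(struct toy_irq *irq, uint64_t hk_mask,
			     uint64_t online_mask)
{
	uint64_t prog = irq->affinity & hk_mask;

	/*
	 * Assumption: if no online housekeeping CPU is in the mask,
	 * fall back to the full, unmodified affinity mask.
	 */
	if (!(prog & online_mask))
		prog = irq->affinity;

	prog &= online_mask;
	if (!prog)
		return -1;

	/* Assumption: route to the lowest numbered eligible CPU. */
	irq->effective = prog & (~prog + 1);
	return 0;
}
```

With affinity 0x0f and housekeeping mask 0x03 the target becomes CPU 0 while
the affinity field stays 0x0f. That is the property the quoted hunk violates
by passing the pre-restricted tmp mask as the new affinity.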
> +static int irq_hk_validate(enum hk_type type,
> +			   const struct cpumask *cur_mask,
> +			   const struct cpumask *new_mask)
> +{
> +	if (!IS_ENABLED(CONFIG_SMP))
> +		return -EOPNOTSUPP;
> +	return 0;
Seriously? Why is this stuff even built when CONFIG_SMP=n?

So these validate callbacks seem to be just another voodoo container for
no value.

While this series might work for you by some definition of "works", it's
broken beyond repair and it's really annoying that I explained all of it
to the other people who try to solve that very same problem. Of course
you did not read any of that, otherwise you would have CC'ed them.

  https://lore.kernel.org/lkml/87o6jcb84w.ffs@tglx

Trying to do that without taking the CPUs mostly offline and bringing
them online again is not going to work and there is zero benefit trying
to avoid that. First of all, changing the isolation is not a hotpath
operation. Doing it one by one without bringing the CPU completely down,
as I outlined in the above linked mail, is not much more disruptive than
trying to do all of this on the fly. If you isolate a CPU then the tasks
on that CPU which do not belong to the isolation set need to get off the
CPU anyway. If you unisolate a CPU then it's really not a problem
whether the non-isolated tasks can move onto it 10 milliseconds earlier
or later.

If you want to solve all the problems related to NOHZ, managed
interrupts, RCU etc. without the hotplug machinery then you end up
replicating half of it. Don't even try to think about it, that's a
complete waste of time and won't go anywhere.

Fix the few issues which are related to hotplug that I described in the
above linked mail and use the fully correct and tested common code for
your isolation muck. Please coordinate with Waiman or whoever is working
on it at RH right now.

Thanks,

        tglx