From: Frederic Weisbecker <frederic@kernel.org>
To: Waiman Long <llong@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
LKML <linux-kernel@vger.kernel.org>,
Marco Crivellari <marco.crivellari@suse.com>,
cgroups@vger.kernel.org
Subject: Re: [PATCH] genirq: Fix IRQ threads affinity VS cpuset isolated partitions
Date: Wed, 12 Nov 2025 13:56:24 +0100
Message-ID: <aRSD-Fyy87qhCR6C@localhost.localdomain>
In-Reply-To: <5d3d80dd-00ca-464d-bebf-c0fd4836b947@redhat.com>
On Mon, Nov 10, 2025 at 04:28:49PM -0500, Waiman Long wrote:
> On 11/5/25 8:17 AM, Frederic Weisbecker wrote:
> > When a cpuset isolated partition is created / updated or destroyed,
> > the IRQ threads are affine blindly to all the non-isolated CPUs. And
> > this happens without taking into account the IRQ thread initial
> > affinity that becomes ignored.
> >
> > For example in a system with 8 CPUs, if an IRQ and its kthread are
> > initially affine to CPU 5, creating an isolated partition with only
> > CPU 2 inside will eventually end up affining the IRQ kthread to all
> > CPUs but CPU 2 (that is CPUs 0,1,3-7), losing the kthread preference for
> > CPU 5.
> >
> > Besides the blind re-affinity, this doesn't take care of the actual
> > low level interrupt which isn't migrated. As of today the only way to
> > isolate non managed interrupts, along with their kthreads, is to
> > overwrite their affinity separately, for example through /proc/irq/
> >
> > To avoid doing that manually, future development should focus on
> > updating the IRQs affinity whenever cpuset isolated partitions are
> > updated.
> >
> > In the meantime, cpuset shouldn't fiddle with IRQ threads directly.
> > To prevent from that, set the PF_NO_SETAFFINITY flag to them.
> >
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> > kernel/irq/manage.c | 33 ++++++++++++++++++++-------------
> > 1 file changed, 20 insertions(+), 13 deletions(-)
> >
> > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> > index 400856abf672..5ca000c9f4a7 100644
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -176,7 +176,7 @@ bool irq_can_set_affinity_usr(unsigned int irq)
> > }
> > /**
> > - * irq_set_thread_affinity - Notify irq threads to adjust affinity
> > + * irq_thread_update_affinity - Notify irq threads to adjust affinity
> > * @desc: irq descriptor which has affinity changed
> > *
> > * Just set IRQTF_AFFINITY and delegate the affinity setting to the
> > @@ -184,7 +184,7 @@ bool irq_can_set_affinity_usr(unsigned int irq)
> > * we hold desc->lock and this code can be called from hard interrupt
> > * context.
> > */
> > -static void irq_set_thread_affinity(struct irq_desc *desc)
> > +static void irq_thread_update_affinity(struct irq_desc *desc)
> > {
> > struct irqaction *action;
> > @@ -283,7 +283,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
> > fallthrough;
> > case IRQ_SET_MASK_OK_NOCOPY:
> > irq_validate_effective_affinity(data);
> > - irq_set_thread_affinity(desc);
> > + irq_thread_update_affinity(desc);
> > ret = 0;
> > }
> > @@ -1035,8 +1035,23 @@ static void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *a
> > set_cpus_allowed_ptr(current, mask);
> > free_cpumask_var(mask);
> > }
> > +
> > +static inline void irq_thread_set_affinity(struct task_struct *t,
> > + struct irq_desc *desc)
> > +{
> > + const struct cpumask *mask;
> > +
> > + if (cpumask_available(desc->irq_common_data.affinity))
> > + mask = irq_data_get_effective_affinity_mask(&desc->irq_data);
> > + else
> > + mask = cpu_possible_mask;
> > +
> > + kthread_bind_mask(t, mask);
> > +}
>
> This function seems to mirror what is done in irq_thread_check_affinity()
> when the affinity cpumask is available. But if affinity isn't defined, it
> will make this irq kthread immune from changes in the set of isolated CPUs.
> Should we use IRQD_AFFINITY_SET flag to check if affinity has been set and
> then set PF_NO_SETAFFINITY only in this case?

So IIUC, a cpumask_available() failure can't really happen here, because a
failed mask allocation would make irq_alloc_descs() fail in the first place:

    __irq_alloc_descs() -> alloc_descs() -> alloc_desc() -> init_desc() -> alloc_mask()

The error doesn't seem to be handled as well in early_irq_init(), but the desc
is freed anyway if that happens.

So this is just a sanity check at best.
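
To make the argument concrete, here is a minimal user-space sketch of that
call chain. It is not the kernel code: every model_* name is hypothetical, the
descriptor is reduced to its affinity mask, and the failure is injected through
a flag. It only illustrates that when the mask allocation fails, the caller
never gets a descriptor back, so a later availability check on a live
descriptor cannot fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct irq_desc: only the affinity mask is kept. */
struct model_desc {
	unsigned long *affinity;	/* NULL models a mask that was never allocated */
};

/* Assumption: a test knob to inject a mask allocation failure. */
static bool model_fail_mask_alloc;

/* Models the role of cpumask_available(): is the mask storage there? */
static bool model_mask_available(const struct model_desc *desc)
{
	return desc->affinity != NULL;
}

/* Innermost step of the chain: allocate the mask or report -ENOMEM. */
static int model_alloc_mask(struct model_desc *desc)
{
	if (model_fail_mask_alloc)
		return -ENOMEM;

	desc->affinity = calloc(1, sizeof(*desc->affinity));
	return desc->affinity ? 0 : -ENOMEM;
}

/* Models init_desc(): simply propagates the mask allocation error. */
static int model_init_desc(struct model_desc *desc)
{
	return model_alloc_mask(desc);
}

/*
 * Models alloc_desc(): on any init failure the half-built descriptor is
 * freed and NULL is returned, so callers never see a desc without a mask.
 */
static struct model_desc *model_alloc_desc(void)
{
	struct model_desc *desc = calloc(1, sizeof(*desc));

	if (!desc)
		return NULL;

	if (model_init_desc(desc)) {
		free(desc);
		return NULL;
	}

	/* Invariant the reply relies on: a live desc always has its mask. */
	assert(model_mask_available(desc));
	return desc;
}

static void model_free_desc(struct model_desc *desc)
{
	if (!desc)
		return;
	free(desc->affinity);
	free(desc);
}
```

Calling model_alloc_desc() with model_fail_mask_alloc set returns NULL, while
any descriptor it does return passes model_mask_available(). That is the sense
in which the else branch in irq_thread_set_affinity() looks like a sanity check
rather than a reachable path.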
Thanks.
--
Frederic Weisbecker
SUSE Labs