From: Marc Zyngier <maz@kernel.org>
To: Thierry Reding <thierry.reding@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
linux-tegra@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: IRQ thread timeouts and affinity
Date: Thu, 09 Oct 2025 19:11:20 +0100 [thread overview]
Message-ID: <86ms60x7w7.wl-maz@kernel.org> (raw)
In-Reply-To: <86o6qgxayt.wl-maz@kernel.org>
On Thu, 09 Oct 2025 18:04:58 +0100,
Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 09 Oct 2025 17:05:15 +0100,
> Thierry Reding <thierry.reding@gmail.com> wrote:
> >
> > [1 <text/plain; us-ascii (quoted-printable)>]
> > On Thu, Oct 09, 2025 at 03:30:56PM +0100, Marc Zyngier wrote:
> > > Hi Thierry,
> > >
> > > On Thu, 09 Oct 2025 12:38:55 +0100,
> > > Thierry Reding <thierry.reding@gmail.com> wrote:
> > > >
> > > > Which brings me to the actual question: what is the right way to solve
> > > > this? I had, maybe naively, assumed that the default CPU affinity, which
> > > > includes all available CPUs, would be sufficient to have interrupts
> > > > balanced across all of those CPUs, but that doesn't appear to be the
> > > > case. At least not with the GIC (v3) driver which selects one CPU (CPU 0
> > > > in this particular case) from the affinity mask to set the "effective
> > > > affinity", which then dictates where IRQs are handled and where the
> > > > corresponding IRQ thread function is run.
> > >
> > > There's a (GIC-specific) answer to that, and that's the "1 of N"
> > > distribution model. The problem is that it is a massive headache (it
> > > completely breaks with per-CPU context).
> >
> > Heh, that started out as a very promising first paragraph but turned
> > ugly very quickly... =)
> >
> > > We could try and hack this in somehow, but defining a reasonable API
> > > is complicated. The set of CPUs receiving 1:N interrupts is a *global*
> > > set, which means you cannot have one interrupt targeting CPUs 0-1, and
> > > another targeting CPUs 2-3. You can only have a single set for all 1:N
> > > interrupts. How would you define such a set in a platform agnostic
> > > manner so that a random driver could use this? I definitely don't want
> > > to have a GIC-specific API.
> >
> > I see. I've been thinking that maybe the only way to solve this is using
> > some sort of policy. A very simple policy might be: use CPU 0 as the
> > "default" interrupt (much like it is now) because like you said there
> > might be assumptions built-in that break when the interrupt is scheduled
> > elsewhere. But then let individual drivers opt into the 1:N set, which
> > would perhaps span all available CPUs but the first one. From an API PoV
> > this would just be a flag that's passed to request_irq() (or one of its
> > derivatives).
>
> The $10k question is how do you pick the victim CPUs? I can't see how
> to do it in a reasonable way unless we decide that interrupts that
> have an affinity matching cpu_possible_mask are 1:N. And then we're
> left with wondering what to do about CPU hotplug.
For fun and giggles, here's the result of a 5 minute hack. It enables
1:N distribution on SPIs that have an "all cpus" affinity. It works on
one machine, doesn't on another -- no idea why yet. YMMV.
This is of course conditioned on your favourite HW supporting the 1:N
feature, and it is likely that things will catch fire quickly. It will
probably make your overall interrupt latency *worse*, but maybe less
variable. Let me know.
M.
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index dbeb85677b08c..ab32339b32719 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -67,6 +67,7 @@ struct gic_chip_data {
u32 nr_redist_regions;
u64 flags;
bool has_rss;
+ bool has_oon;
unsigned int ppi_nr;
struct partition_desc **ppi_descs;
};
@@ -1173,9 +1174,10 @@ static void gic_update_rdist_properties(void)
gic_iterate_rdists(__gic_update_rdist_properties);
if (WARN_ON(gic_data.ppi_nr == UINT_MAX))
gic_data.ppi_nr = 0;
- pr_info("GICv3 features: %d PPIs%s%s\n",
+ pr_info("GICv3 features: %d PPIs%s%s%s\n",
gic_data.ppi_nr,
gic_data.has_rss ? ", RSS" : "",
+ gic_data.has_oon ? ", 1:N" : "",
gic_data.rdists.has_direct_lpi ? ", DirectLPI" : "");
if (gic_data.rdists.has_vlpis)
@@ -1481,6 +1483,7 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
u32 offset, index;
void __iomem *reg;
int enabled;
+ bool oon;
u64 val;
if (force)
@@ -1488,6 +1491,8 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
else
cpu = cpumask_any_and(mask_val, cpu_online_mask);
+ oon = gic_data.has_oon && cpumask_equal(mask_val, cpu_possible_mask);
+
if (cpu >= nr_cpu_ids)
return -EINVAL;
@@ -1501,7 +1506,7 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
offset = convert_offset_index(d, GICD_IROUTER, &index);
reg = gic_dist_base(d) + offset + (index * 8);
- val = gic_cpu_to_affinity(cpu);
+ val = oon ? GICD_IROUTER_SPI_MODE_ANY : gic_cpu_to_affinity(cpu);
gic_write_irouter(val, reg);
@@ -1512,7 +1517,7 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
if (enabled)
gic_unmask_irq(d);
- irq_data_update_effective_affinity(d, cpumask_of(cpu));
+ irq_data_update_effective_affinity(d, oon ? cpu_possible_mask : cpumask_of(cpu));
return IRQ_SET_MASK_OK_DONE;
}
@@ -2114,6 +2119,7 @@ static int __init gic_init_bases(phys_addr_t dist_phys_base,
irq_domain_update_bus_token(gic_data.domain, DOMAIN_BUS_WIRED);
gic_data.has_rss = !!(typer & GICD_TYPER_RSS);
+ gic_data.has_oon = !(typer & GICD_TYPER_No1N);
if (typer & GICD_TYPER_MBIS) {
err = mbi_init(handle, gic_data.domain);
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 70c0948f978eb..ffbfc1c8d1934 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -80,6 +80,7 @@
#define GICD_CTLR_ENABLE_SS_G0 (1U << 0)
#define GICD_TYPER_RSS (1U << 26)
+#define GICD_TYPER_No1N (1U << 25)
#define GICD_TYPER_LPIS (1U << 17)
#define GICD_TYPER_MBIS (1U << 16)
#define GICD_TYPER_ESPI (1U << 8)
--
Without deviation from the norm, progress is not possible.
next prev parent reply other threads:[~2025-10-09 18:11 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-09 11:38 IRQ thread timeouts and affinity Thierry Reding
2025-10-09 14:30 ` Marc Zyngier
2025-10-09 16:05 ` Thierry Reding
2025-10-09 17:04 ` Marc Zyngier
2025-10-09 18:11 ` Marc Zyngier [this message]
2025-10-10 13:50 ` Thierry Reding
2025-10-10 14:18 ` Marc Zyngier
2025-10-10 14:38 ` Jon Hunter
2025-10-10 14:54 ` Thierry Reding
2025-10-10 15:52 ` Jon Hunter
2025-10-10 15:03 ` Thierry Reding
2025-10-11 10:00 ` Marc Zyngier
2025-10-14 10:50 ` Thierry Reding
2025-10-14 11:08 ` Thierry Reding
2025-10-14 17:46 ` Marc Zyngier
2025-10-16 18:53 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86ms60x7w7.wl-maz@kernel.org \
--to=maz@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-tegra@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=thierry.reding@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.