* [PATCH] irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890
@ 2024-06-24 16:55 Roman Kagan
2024-06-24 21:44 ` Thomas Gleixner
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Roman Kagan @ 2024-06-24 16:55 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Catalin Marinas, Will Deacon, nh-open-source, linux-doc,
linux-kernel, Thomas Gleixner, Jonathan Corbet, Marc Zyngier
According to Arm CoreLink GIC-700 erratum 2195890, on GIC revisions
r0p0, r0p1, r1p0 under certain conditions LPIs may remain in the Pending
Table until one of a number of external events occurs.
No LPIs are lost but they may not be delivered in a finite time.
The workaround is to issue an INV using GICR_INVLPIR to an unused, in
range LPI ID to retrigger the search.
Add this workaround to the quirk table. When the quirk is applicable,
carve out one LPI ID from the available range and run periodic work to
do INV to it, in order to prevent GIC from stalling.
TT: https://t.corp.amazon.com/D82032616
Signed-off-by: Elad Rosner <eladros@amazon.com>
Signed-off-by: Mohamed Mediouni <mediou@amazon.com>
Signed-off-by: Roman Kagan <rkagan@amazon.de>
---
drivers/irqchip/irq-gic-v3-its.c | 70 ++++++++++++++++++++-
Documentation/arch/arm64/silicon-errata.rst | 2 +
arch/arm64/Kconfig | 18 ++++++
3 files changed, 89 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 3c755d5dad6e..53cf50dd8e13 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -29,6 +29,7 @@
#include <linux/percpu.h>
#include <linux/slab.h>
#include <linux/syscore_ops.h>
+#include <linux/workqueue.h>
#include <linux/irqchip.h>
#include <linux/irqchip/arm-gic-v3.h>
@@ -49,6 +50,7 @@
#define RD_LOCAL_MEMRESERVE_DONE BIT(2)
static u32 lpi_id_bits;
+static u32 lpi_id_base __initdata = 8192;
/*
* We allocate memory for PROPBASE to cover 2 ^ lpi_id_bits LPIs to
@@ -2136,7 +2138,7 @@ static int __init its_lpi_init(u32 id_bits)
* Initializing the allocator is just the same as freeing the
* full range of LPIs.
*/
- err = free_lpi_range(8192, lpis);
+ err = free_lpi_range(lpi_id_base, lpis - lpi_id_base + 8192);
pr_debug("ITS: Allocator initialized for %u LPIs\n", lpis);
return err;
}
@@ -4763,6 +4765,61 @@ static bool its_set_non_coherent(void *data)
return true;
}
+#define ITS_QUIRK_GIC700_2195890_PERIOD_MSEC 1000
+static struct {
+ u32 lpi;
+ struct delayed_work work;
+} its_quirk_gic700_2195890_data __maybe_unused;
+
+static void __maybe_unused its_quirk_gic700_2195890_work_handler(struct work_struct *work)
+{
+ int cpu;
+ void __iomem *rdbase;
+ u64 gicr_invlpir_val;
+
+ for_each_online_cpu(cpu) {
+ rdbase = gic_data_rdist_cpu(cpu)->rd_base;
+ if (!rdbase) {
+ continue;
+ }
+
+ /*
+ * Prod the respective GIC with an INV for an otherwise unused
+ * LPI. This is only to resume the stalled processing, so
+ * there's no need to wait for invalidation to complete.
+ */
+ gicr_invlpir_val =
+ FIELD_PREP(GICR_INVLPIR_INTID,
+ its_quirk_gic700_2195890_data.lpi);
+ raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
+ gic_write_lpir(gicr_invlpir_val, rdbase + GICR_INVLPIR);
+ raw_spin_unlock(&gic_data_rdist_cpu(cpu)->rd_lock);
+ }
+
+ schedule_delayed_work(&its_quirk_gic700_2195890_data.work,
+ msecs_to_jiffies(ITS_QUIRK_GIC700_2195890_PERIOD_MSEC));
+}
+
+static bool __maybe_unused its_enable_quirk_gic700_2195890(void *data)
+{
+ struct its_node *its = data;
+
+ if (its_quirk_gic700_2195890_data.lpi)
+ return true;
+
+ /*
+ * Use one LPI INTID from the start of the LPI range for GIC prodding,
+ * and make it unavailable for regular LPI use later.
+ */
+ its_quirk_gic700_2195890_data.lpi = lpi_id_base++;
+
+ INIT_DELAYED_WORK(&its_quirk_gic700_2195890_data.work,
+ its_quirk_gic700_2195890_work_handler);
+ schedule_delayed_work(&its_quirk_gic700_2195890_data.work, 0);
+
+ return true;
+}
+
static const struct gic_quirk its_quirks[] = {
#ifdef CONFIG_CAVIUM_ERRATUM_22375
{
@@ -4822,6 +4879,17 @@ static const struct gic_quirk its_quirks[] = {
.property = "dma-noncoherent",
.init = its_set_non_coherent,
},
+#ifdef CONFIG_ARM64_ERRATUM_2195890
+ {
+ .desc = "ITS: GIC-700 erratum 2195890",
+ /*
+ * Applies to r0p0, r0p1, r1p0: iidr_var(bits 16..19) == 0 or 1
+ */
+ .iidr = 0x0400043b,
+ .mask = 0xfffeffff,
+ .init = its_enable_quirk_gic700_2195890,
+ },
+#endif
{
}
};
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index eb8af8032c31..67445075ae88 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -169,6 +169,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | GIC-700 | #2941627 | ARM64_ERRATUM_2941627 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | GIC-700 | #2195890 | ARM64_ERRATUM_2195890 |
++----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 |
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5d91259ee7b5..9c330029131c 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1279,6 +1279,24 @@ config ROCKCHIP_ERRATUM_3588001
If unsure, say Y.
+config ARM64_ERRATUM_2195890
+ bool "GIC-700: 2195890: LPIs may be held in Pending Table until specific external events happen"
+ default y
+ help
+ This option adds a workaround for Arm GIC-700 erratum 2195890.
+
+ In affected GIC-700 versions (r0p0, r0p1, r1p0) under certain
+ conditions LPIs may remain in the Pending Table until one of a number
+ of external events occurs.
+
+ No LPIs are lost and this can happen on physical or virtual PEs but
+ this erratum means they may not be delivered in a finite time.
+
+ Work around the issue by inserting an INV command for an unused but
+ valid LPI INTID every so often to retrigger the search.
+
+ If unsure, say Y.
+
config SOCIONEXT_SYNQUACER_PREITS
bool "Socionext Synquacer: Workaround for GICv3 pre-ITS"
default y
--
2.34.1
Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890
2024-06-24 16:55 [PATCH] irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890 Roman Kagan
@ 2024-06-24 21:44 ` Thomas Gleixner
2024-06-25 11:59 ` Roman Kagan
2024-06-25 8:45 ` Marc Zyngier
2024-06-28 16:07 ` kernel test robot
2 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2024-06-24 21:44 UTC (permalink / raw)
To: Roman Kagan, linux-arm-kernel
Cc: Catalin Marinas, Will Deacon, nh-open-source, linux-doc,
linux-kernel, Jonathan Corbet, Marc Zyngier
On Mon, Jun 24 2024 at 18:55, Roman Kagan wrote:
> According to Arm CoreLink GIC-700 erratum 2195890, on GIC revisions
> r0p0, r0p1, r1p0 under certain conditions LPIs may remain in the Pending
> Table until one of a number of external events occurs.
>
> No LPIs are lost but they may not be delivered in a finite time.
>
> The workaround is to issue an INV using GICR_INVLPIR to an unused, in
> range LPI ID to retrigger the search.
>
> Add this workaround to the quirk table. When the quirk is applicable,
> carve out one LPI ID from the available range and run periodic work to
> do INV to it, in order to prevent GIC from stalling.
>
> TT: https://t.corp.amazon.com/D82032616
Can you please refrain from providing internal links?
> Signed-off-by: Elad Rosner <eladros@amazon.com>
> Signed-off-by: Mohamed Mediouni <mediou@amazon.com>
> Signed-off-by: Roman Kagan <rkagan@amazon.de>
That Signed-off-by chain is invalid. See Documentation/process/
Thanks,
tglx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890
2024-06-24 16:55 [PATCH] irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890 Roman Kagan
2024-06-24 21:44 ` Thomas Gleixner
@ 2024-06-25 8:45 ` Marc Zyngier
2024-06-25 13:54 ` Roman Kagan
2024-06-28 16:07 ` kernel test robot
2 siblings, 1 reply; 7+ messages in thread
From: Marc Zyngier @ 2024-06-25 8:45 UTC (permalink / raw)
To: Roman Kagan
Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, nh-open-source,
linux-doc, linux-kernel, Thomas Gleixner, Jonathan Corbet
On Mon, 24 Jun 2024 17:55:41 +0100,
Roman Kagan <rkagan@amazon.de> wrote:
>
> According to Arm CoreLink GIC-700 erratum 2195890, on GIC revisions
> r0p0, r0p1, r1p0 under certain conditions LPIs may remain in the Pending
> Table until one of a number of external events occurs.
Please add a link to the errata document.
>
> No LPIs are lost but they may not be delivered in a finite time.
>
> The workaround is to issue an INV using GICR_INVLPIR to an unused, in
> range LPI ID to retrigger the search.
>
> Add this workaround to the quirk table. When the quirk is applicable,
> carve out one LPI ID from the available range and run periodic work to
> do INV to it, in order to prevent GIC from stalling.
The errata document says a lot more:
<quote>
For physical LPIs the workaround is to issue an INV using GICR_INVLPIR
to an unused, in range LPI ID to retrigger the search. This could be
done periodically, for example, in line with a residency change, or as
part of servicing LPIs. If using LPIs as the event, then the
GICR_INVLPIR write could be issued after servicing every LPI.
However, it only needs to be issued if:
* At least 4 interrupts in the block of 32 are enabled and mapped to
the current PE or, if easier,
* At least 4 interrupts in the block of 32 are enabled and mapped to
any PE
</quote>
>
> TT: https://t.corp.amazon.com/D82032616
Gniii????
> Signed-off-by: Elad Rosner <eladros@amazon.com>
> Signed-off-by: Mohamed Mediouni <mediou@amazon.com>
> Signed-off-by: Roman Kagan <rkagan@amazon.de>
Who is the author?
> ---
> drivers/irqchip/irq-gic-v3-its.c | 70 ++++++++++++++++++++-
> Documentation/arch/arm64/silicon-errata.rst | 2 +
> arch/arm64/Kconfig | 18 ++++++
> 3 files changed, 89 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 3c755d5dad6e..53cf50dd8e13 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -29,6 +29,7 @@
> #include <linux/percpu.h>
> #include <linux/slab.h>
> #include <linux/syscore_ops.h>
> +#include <linux/workqueue.h>
>
> #include <linux/irqchip.h>
> #include <linux/irqchip/arm-gic-v3.h>
> @@ -49,6 +50,7 @@
> #define RD_LOCAL_MEMRESERVE_DONE BIT(2)
>
> static u32 lpi_id_bits;
> +static u32 lpi_id_base __initdata = 8192;
>
> /*
> * We allocate memory for PROPBASE to cover 2 ^ lpi_id_bits LPIs to
> @@ -2136,7 +2138,7 @@ static int __init its_lpi_init(u32 id_bits)
> * Initializing the allocator is just the same as freeing the
> * full range of LPIs.
> */
> - err = free_lpi_range(8192, lpis);
> + err = free_lpi_range(lpi_id_base, lpis - lpi_id_base + 8192);
> pr_debug("ITS: Allocator initialized for %u LPIs\n", lpis);
> return err;
> }
> @@ -4763,6 +4765,61 @@ static bool its_set_non_coherent(void *data)
> return true;
> }
>
> +#define ITS_QUIRK_GIC700_2195890_PERIOD_MSEC 1000
Use MSEC_PER_SEC.
> +static struct {
> + u32 lpi;
> + struct delayed_work work;
> +} its_quirk_gic700_2195890_data __maybe_unused;
> +
> +static void __maybe_unused its_quirk_gic700_2195890_work_handler(struct work_struct *work)
> +{
> + int cpu;
> + void __iomem *rdbase;
> + u64 gicr_invlpir_val;
> +
> + for_each_online_cpu(cpu) {
The errata document doesn't say that this need to happen for *every*
RD. Can you please clarify this?
> + rdbase = gic_data_rdist_cpu(cpu)->rd_base;
> + if (!rdbase) {
> + continue;
> + }
> +
> + /*
> + * Prod the respective GIC with an INV for an otherwise unused
> + * LPI. This is only to resume the stalled processing, so
> + * there's no need to wait for invalidation to complete.
> + */
> + gicr_invlpir_val =
> + FIELD_PREP(GICR_INVLPIR_INTID,
> + its_quirk_gic700_2195890_data.lpi);
Don't split assignments.
> + raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
> + gic_write_lpir(gicr_invlpir_val, rdbase + GICR_INVLPIR);
> + raw_spin_unlock(&gic_data_rdist_cpu(cpu)->rd_lock);
No synchronisation? How is that supposed to work?
Also, if you need to dig into the internals of the driver, extract a
helper from __direct_lpi_inv().
> + }
> +
> + schedule_delayed_work(&its_quirk_gic700_2195890_data.work,
> + msecs_to_jiffies(ITS_QUIRK_GIC700_2195890_PERIOD_MSEC));
It would be pretty easy to detect whether an LPI was ack'ed since the
last pass, and not issue the invalidate.
> +}
> +
> +static bool __maybe_unused its_enable_quirk_gic700_2195890(void *data)
> +{
> + struct its_node *its = data;
> +
> + if (its_quirk_gic700_2195890_data.lpi)
> + return true;
> +
> + /*
> + * Use one LPI INTID from the start of the LPI range for GIC prodding,
> + * and make it unavailable for regular LPI use later.
> + */
> + its_quirk_gic700_2195890_data.lpi = lpi_id_base++;
> +
> + INIT_DELAYED_WORK(&its_quirk_gic700_2195890_data.work,
> + its_quirk_gic700_2195890_work_handler);
> + schedule_delayed_work(&its_quirk_gic700_2195890_data.work, 0);
> +
> + return true;
> +}
It is a bit odd to hook this on an ITS being probed when the ITS isn't
really involved. Not a big deal, but a bit clumsy.
> +
> static const struct gic_quirk its_quirks[] = {
> #ifdef CONFIG_CAVIUM_ERRATUM_22375
> {
> @@ -4822,6 +4879,17 @@ static const struct gic_quirk its_quirks[] = {
> .property = "dma-noncoherent",
> .init = its_set_non_coherent,
> },
> +#ifdef CONFIG_ARM64_ERRATUM_2195890
> + {
> + .desc = "ITS: GIC-700 erratum 2195890",
> + /*
> + * Applies to r0p0, r0p1, r1p0: iidr_var(bits 16..19) == 0 or 1
> + */
> + .iidr = 0x0400043b,
> + .mask = 0xfffeffff,
> + .init = its_enable_quirk_gic700_2195890,
This catches r0p0 and r1p0, but not r0p1 (you require that bits 15:12
are 0).
Overall, this requires a bit of rework. Notably, this could be
significantly relaxed to match the requirements of the published
workaround.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890
2024-06-24 21:44 ` Thomas Gleixner
@ 2024-06-25 11:59 ` Roman Kagan
0 siblings, 0 replies; 7+ messages in thread
From: Roman Kagan @ 2024-06-25 11:59 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, nh-open-source,
linux-doc, linux-kernel, Jonathan Corbet, Marc Zyngier
On Mon, Jun 24, 2024 at 11:44:11PM +0200, Thomas Gleixner wrote:
> On Mon, Jun 24 2024 at 18:55, Roman Kagan wrote:
> > According to Arm CoreLink GIC-700 erratum 2195890, on GIC revisions
> > r0p0, r0p1, r1p0 under certain conditions LPIs may remain in the Pending
> > Table until one of a number of external events occurs.
> >
> > No LPIs are lost but they may not be delivered in a finite time.
> >
> > The workaround is to issue an INV using GICR_INVLPIR to an unused, in
> > range LPI ID to retrigger the search.
> >
> > Add this workaround to the quirk table. When the quirk is applicable,
> > carve out one LPI ID from the available range and run periodic work to
> > do INV to it, in order to prevent GIC from stalling.
> >
> > TT: https://t.corp.amazon.com/D82032616
>
> Can you please refrain from providing internal links?
>
> > Signed-off-by: Elad Rosner <eladros@amazon.com>
> > Signed-off-by: Mohamed Mediouni <mediou@amazon.com>
> > Signed-off-by: Roman Kagan <rkagan@amazon.de>
>
> That Signed-off-by chain is invalid. See Documentation/process/
ACK to all, will fix (sorry haven't posted patches for a while, muscle
memory got rusty).
Thanks,
Roman.
Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890
2024-06-25 8:45 ` Marc Zyngier
@ 2024-06-25 13:54 ` Roman Kagan
2024-06-25 14:30 ` Marc Zyngier
0 siblings, 1 reply; 7+ messages in thread
From: Roman Kagan @ 2024-06-25 13:54 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, nh-open-source,
linux-doc, linux-kernel, Thomas Gleixner, Jonathan Corbet
On Tue, Jun 25, 2024 at 09:45:22AM +0100, Marc Zyngier wrote:
> On Mon, 24 Jun 2024 17:55:41 +0100,
> Roman Kagan <rkagan@amazon.de> wrote:
> >
> > According to Arm CoreLink GIC-700 erratum 2195890, on GIC revisions
> > r0p0, r0p1, r1p0 under certain conditions LPIs may remain in the Pending
> > Table until one of a number of external events occurs.
>
> Please add a link to the errata document.
https://developer.arm.com/documentation/SDEN-1769194/
Will include when respinning.
> >
> > No LPIs are lost but they may not be delivered in a finite time.
> >
> > The workaround is to issue an INV using GICR_INVLPIR to an unused, in
> > range LPI ID to retrigger the search.
> >
> > Add this workaround to the quirk table. When the quirk is applicable,
> > carve out one LPI ID from the available range and run periodic work to
> > do INV to it, in order to prevent GIC from stalling.
>
> The errata document says a lot more:
>
> <quote>
> For physical LPIs the workaround is to issue an INV using GICR_INVLPIR
> to an unused, in range LPI ID to retrigger the search. This could be
> done periodically, for example, in line with a residency change, or as
> part of servicing LPIs. If using LPIs as the event, then the
> GICR_INVLPIR write could be issued after servicing every LPI.
>
> However, it only needs to be issued if:
>
> * At least 4 interrupts in the block of 32 are enabled and mapped to
> the current PE or, if easier,
>
> * At least 4 interrupts in the block of 32 are enabled and mapped to
> any PE
> </quote>
It didn't feel like worth optimizing for. I'll reconsider.
> > TT: https://t.corp.amazon.com/D82032616
>
> Gniii????
Indeed Q-/
> > Signed-off-by: Elad Rosner <eladros@amazon.com>
> > Signed-off-by: Mohamed Mediouni <mediou@amazon.com>
> > Signed-off-by: Roman Kagan <rkagan@amazon.de>
>
> Who is the author?
Joint effort aka inherited ownership. Will fix according to the
process doc.
> > +static void __maybe_unused its_quirk_gic700_2195890_work_handler(struct work_struct *work)
> > +{
> > + int cpu;
> > + void __iomem *rdbase;
> > + u64 gicr_invlpir_val;
> > +
> > + for_each_online_cpu(cpu) {
>
> The errata document doesn't say that this need to happen for *every*
> RD. Can you please clarify this?
(Digging out a year-old comms with ARM)
> > In multi-chip GIC system, does this write have to happen in each
> > chip or would a write to a single GICR trigger the search in all
> > GICDs?
> The write needs to occur for each physical PE - in other words, to
> each individual GICR that the search needs to be re-triggered for.
> > + raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
> > + gic_write_lpir(gicr_invlpir_val, rdbase + GICR_INVLPIR);
> > + raw_spin_unlock(&gic_data_rdist_cpu(cpu)->rd_lock);
>
> No synchronisation? How is that supposed to work?
>
> Also, if you need to dig into the internals of the driver, extract a
> helper from __direct_lpi_inv().
ACK
> > + }
> > +
> > + schedule_delayed_work(&its_quirk_gic700_2195890_data.work,
> > + msecs_to_jiffies(ITS_QUIRK_GIC700_2195890_PERIOD_MSEC));
>
> It would be pretty easy to detect whether an LPI was ack'ed since the
> last pass, and not issue the invalidate.
Makes sense, will look into it.
Overall, do you think this approach with a global work looping over cpus
is the right one, or we should better try and implement something
per-cpu?
> > +}
> > +
> > +static bool __maybe_unused its_enable_quirk_gic700_2195890(void *data)
> > +{
> > + struct its_node *its = data;
> > +
> > + if (its_quirk_gic700_2195890_data.lpi)
> > + return true;
> > +
> > + /*
> > + * Use one LPI INTID from the start of the LPI range for GIC prodding,
> > + * and make it unavailable for regular LPI use later.
> > + */
> > + its_quirk_gic700_2195890_data.lpi = lpi_id_base++;
> > +
> > + INIT_DELAYED_WORK(&its_quirk_gic700_2195890_data.work,
> > + its_quirk_gic700_2195890_work_handler);
> > + schedule_delayed_work(&its_quirk_gic700_2195890_data.work, 0);
> > +
> > + return true;
> > +}
>
> It is a bit odd to hook this on an ITS being probed when the ITS isn't
> really involved. Not a big deal, but a bit clumsy.
True, but the LPI allocation lives in this file so it looked easier to
wire it all up here. Where do you think it's more appropriate?
> > static const struct gic_quirk its_quirks[] = {
> > #ifdef CONFIG_CAVIUM_ERRATUM_22375
> > {
> > @@ -4822,6 +4879,17 @@ static const struct gic_quirk its_quirks[] = {
> > .property = "dma-noncoherent",
> > .init = its_set_non_coherent,
> > },
> > +#ifdef CONFIG_ARM64_ERRATUM_2195890
> > + {
> > + .desc = "ITS: GIC-700 erratum 2195890",
> > + /*
> > + * Applies to r0p0, r0p1, r1p0: iidr_var(bits 16..19) == 0 or 1
> > + */
> > + .iidr = 0x0400043b,
> > + .mask = 0xfffeffff,
> > + .init = its_enable_quirk_gic700_2195890,
>
> This catches r0p0 and r1p0, but not r0p1 (you require that bits 15:12
> are 0).
Ouch, right. Given the erratum exact wording
> Fault Status: Present in: r0p0, r0p1, r1p0 Fixed in: r2p0
I guess I should match everything below r2p0 and allow arbitrary bits
15:12 (i.e. set the third nibble in the mask to 0).
> Overall, this requires a bit of rework. Notably, this could be
> significantly relaxed to match the requirements of the published
> workaround.
Thanks for the propmpt review! Will rework and respin.
Roman.
Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890
2024-06-25 13:54 ` Roman Kagan
@ 2024-06-25 14:30 ` Marc Zyngier
0 siblings, 0 replies; 7+ messages in thread
From: Marc Zyngier @ 2024-06-25 14:30 UTC (permalink / raw)
To: Roman Kagan, Marc Zyngier, linux-arm-kernel, Catalin Marinas,
Will Deacon, nh-open-source, linux-doc, linux-kernel,
Thomas Gleixner, Jonathan Corbet
On Tue, 25 Jun 2024 14:54:28 +0100,
Roman Kagan <rkagan@amazon.de> wrote:
>
> On Tue, Jun 25, 2024 at 09:45:22AM +0100, Marc Zyngier wrote:
> > On Mon, 24 Jun 2024 17:55:41 +0100,
> > Roman Kagan <rkagan@amazon.de> wrote:
> > >
> > > According to Arm CoreLink GIC-700 erratum 2195890, on GIC revisions
> > > r0p0, r0p1, r1p0 under certain conditions LPIs may remain in the Pending
> > > Table until one of a number of external events occurs.
> >
> > Please add a link to the errata document.
>
> https://developer.arm.com/documentation/SDEN-1769194/
> Will include when respinning.
>
> > >
> > > No LPIs are lost but they may not be delivered in a finite time.
> > >
> > > The workaround is to issue an INV using GICR_INVLPIR to an unused, in
> > > range LPI ID to retrigger the search.
> > >
> > > Add this workaround to the quirk table. When the quirk is applicable,
> > > carve out one LPI ID from the available range and run periodic work to
> > > do INV to it, in order to prevent GIC from stalling.
> >
> > The errata document says a lot more:
> >
> > <quote>
> > For physical LPIs the workaround is to issue an INV using GICR_INVLPIR
> > to an unused, in range LPI ID to retrigger the search. This could be
> > done periodically, for example, in line with a residency change, or as
> > part of servicing LPIs. If using LPIs as the event, then the
> > GICR_INVLPIR write could be issued after servicing every LPI.
> >
> > However, it only needs to be issued if:
> >
> > * At least 4 interrupts in the block of 32 are enabled and mapped to
> > the current PE or, if easier,
> >
> > * At least 4 interrupts in the block of 32 are enabled and mapped to
> > any PE
> > </quote>
>
> It didn't feel like worth optimizing for. I'll reconsider.
I'm not sure we want to optimise for it, but I'd certainly want to
hear the *rationale* behind not considering the optimisation.
>
> > > TT: https://t.corp.amazon.com/D82032616
> >
> > Gniii????
>
> Indeed Q-/
>
> > > Signed-off-by: Elad Rosner <eladros@amazon.com>
> > > Signed-off-by: Mohamed Mediouni <mediou@amazon.com>
> > > Signed-off-by: Roman Kagan <rkagan@amazon.de>
> >
> > Who is the author?
>
> Joint effort aka inherited ownership. Will fix according to the
> process doc.
>
> > > +static void __maybe_unused its_quirk_gic700_2195890_work_handler(struct work_struct *work)
> > > +{
> > > + int cpu;
> > > + void __iomem *rdbase;
> > > + u64 gicr_invlpir_val;
> > > +
> > > + for_each_online_cpu(cpu) {
> >
> > The errata document doesn't say that this need to happen for *every*
> > RD. Can you please clarify this?
>
> (Digging out a year-old comms with ARM)
> > > In multi-chip GIC system, does this write have to happen in each
> > > chip or would a write to a single GICR trigger the search in all
> > > GICDs?
> > The write needs to occur for each physical PE - in other words, to
> > each individual GICR that the search needs to be re-triggered for.
OK, that pretty much rules out doing anything clever (note to self,
check the GIC revision before buying the HW...).
>
> > > + raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
> > > + gic_write_lpir(gicr_invlpir_val, rdbase + GICR_INVLPIR);
> > > + raw_spin_unlock(&gic_data_rdist_cpu(cpu)->rd_lock);
> >
> > No synchronisation? How is that supposed to work?
> >
> > Also, if you need to dig into the internals of the driver, extract a
> > helper from __direct_lpi_inv().
>
> ACK
>
> > > + }
> > > +
> > > + schedule_delayed_work(&its_quirk_gic700_2195890_data.work,
> > > + msecs_to_jiffies(ITS_QUIRK_GIC700_2195890_PERIOD_MSEC));
> >
> > It would be pretty easy to detect whether an LPI was ack'ed since the
> > last pass, and not issue the invalidate.
>
> Makes sense, will look into it.
>
> Overall, do you think this approach with a global work looping over cpus
> is the right one, or we should better try and implement something
> per-cpu?
One of my worries is that you're crossing all node boundaries by doing
this, which is going to suck on really large systems. If anything,
you'd be better off with a per-node worker.
It doesn't need to be implemented right now, but I have the feeling
that someone is going to ask.
>
> > > +}
> > > +
> > > +static bool __maybe_unused its_enable_quirk_gic700_2195890(void *data)
> > > +{
> > > + struct its_node *its = data;
> > > +
> > > + if (its_quirk_gic700_2195890_data.lpi)
> > > + return true;
> > > +
> > > + /*
> > > + * Use one LPI INTID from the start of the LPI range for GIC prodding,
> > > + * and make it unavailable for regular LPI use later.
> > > + */
> > > + its_quirk_gic700_2195890_data.lpi = lpi_id_base++;
> > > +
> > > + INIT_DELAYED_WORK(&its_quirk_gic700_2195890_data.work,
> > > + its_quirk_gic700_2195890_work_handler);
> > > + schedule_delayed_work(&its_quirk_gic700_2195890_data.work, 0);
> > > +
> > > + return true;
> > > +}
> >
> > It is a bit odd to hook this on an ITS being probed when the ITS isn't
> > really involved. Not a big deal, but a bit clumsy.
>
> True, but the LPI allocation lives in this file so it looked easier to
> wire it all up here. Where do you think it's more appropriate?
But the allocation doesn't really take place, does it? You just nick
one LPI. Which by the way I'd rather you pick the last one instead of
the first, as this messes with devices that require ye oldie MultiMSI
and its stupid alignment requirements.
Also, you have its_cpu_init() which takes care of RDs. You could
absolutely add the quirk to the main GIC driver (based on the
distributor IIDR), add a flag to rdists.flags, and let it roll.
Either way, I don't really care. Maybe keeping it centralised is good
enough.
>
> > > static const struct gic_quirk its_quirks[] = {
> > > #ifdef CONFIG_CAVIUM_ERRATUM_22375
> > > {
> > > @@ -4822,6 +4879,17 @@ static const struct gic_quirk its_quirks[] = {
> > > .property = "dma-noncoherent",
> > > .init = its_set_non_coherent,
> > > },
> > > +#ifdef CONFIG_ARM64_ERRATUM_2195890
> > > + {
> > > + .desc = "ITS: GIC-700 erratum 2195890",
> > > + /*
> > > + * Applies to r0p0, r0p1, r1p0: iidr_var(bits 16..19) == 0 or 1
> > > + */
> > > + .iidr = 0x0400043b,
> > > + .mask = 0xfffeffff,
> > > + .init = its_enable_quirk_gic700_2195890,
> >
> > This catches r0p0 and r1p0, but not r0p1 (you require that bits 15:12
> > are 0).
>
> Ouch, right. Given the erratum exact wording
>
> > Fault Status: Present in: r0p0, r0p1, r1p0 Fixed in: r2p0
>
> I guess I should match everything below r2p0 and allow arbitrary bits
> 15:12 (i.e. set the third nibble in the mask to 0).
Either that or you could have two entries. Or use the fact that there
is no released r1p1 to your advantage...
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890
2024-06-24 16:55 [PATCH] irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890 Roman Kagan
2024-06-24 21:44 ` Thomas Gleixner
2024-06-25 8:45 ` Marc Zyngier
@ 2024-06-28 16:07 ` kernel test robot
2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2024-06-28 16:07 UTC (permalink / raw)
To: Roman Kagan, linux-arm-kernel
Cc: oe-kbuild-all, Catalin Marinas, Will Deacon, nh-open-source,
linux-doc, linux-kernel, Thomas Gleixner, Jonathan Corbet,
Marc Zyngier
Hi Roman,
kernel test robot noticed the following build warnings:
[auto build test WARNING on arm64/for-next/core]
[also build test WARNING on soc/for-next tip/irq/core linus/master v6.10-rc5 next-20240627]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Roman-Kagan/irqchip-gicv3-its-Workaround-for-GIC-700-erratum-2195890/20240625-212945
base: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
patch link: https://lore.kernel.org/r/20240624165541.1286227-1-rkagan%40amazon.de
patch subject: [PATCH] irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890
config: arm64-randconfig-004-20240628 (https://download.01.org/0day-ci/archive/20240628/202406282359.98P8jJ4c-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240628/202406282359.98P8jJ4c-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202406282359.98P8jJ4c-lkp@intel.com/
All warnings (new ones prefixed by >>, old ones prefixed by <<):
>> WARNING: modpost: vmlinux: section mismatch in reference: its_enable_quirk_gic700_2195890+0x24 (section: .text) -> lpi_id_base (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: its_enable_quirk_gic700_2195890+0x34 (section: .text) -> lpi_id_base (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: its_enable_quirk_gic700_2195890+0x44 (section: .text) -> lpi_id_base (section: .init.data)
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/arm64/crypto/crct10dif-ce.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/arm64/crypto/aes-neon-bs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/locking/test-ww_mutex.o
WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/time/test_udelay.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_cp850.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_cp860.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_cp862.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_cp863.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_cp949.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_cp950.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_cp1251.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_iso8859-7.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/mac-centeuro.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/mac-gaelic.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/binfmt_misc.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/cramfs/cramfs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/fat/fat.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/ufs/ufs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/efs/efs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/adfs/adfs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/btrfs/btrfs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in security/keys/trusted-keys/trusted.o
WARNING: modpost: missing MODULE_DESCRIPTION() in crypto/algif_hash.o
WARNING: modpost: missing MODULE_DESCRIPTION() in crypto/algif_skcipher.o
WARNING: modpost: missing MODULE_DESCRIPTION() in crypto/xor.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/crypto/libdes.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_hexdump.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_bpf.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_firmware.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test-kstrtox.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_module.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_user_copy.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_blackhole_dev.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/atomic64_test.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/asn1_encoder.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/pinctrl/pinctrl-mcp23s08_i2c.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/pinctrl/pinctrl-mcp23s08.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/video/backlight/platform_lcd.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/dma/qcom/hdma.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/tty/serial/8250/8250_base.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/tty/serial/serial_mctrl_gpio.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/tty/n_hdlc.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/tty/n_gsm.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/tty/ttynull.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/char/hw_random/omap-rng.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/char/hw_random/omap3-rom-rng.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/char/hw_random/arm_smccc_trng.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/char/ppdev.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/misc/fastrpc.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/auxdisplay/line-display.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/tuners/tda9887.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/dvb-frontends/au8522_decoder.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/dvb-frontends/mb86a16.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mmc/core/pwrseq_emmc.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mmc/core/sdio_uart.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/perf/arm-ccn.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/uio/uio.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/greybus/greybus.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/rpmsg/rpmsg_char.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/iio/buffer/kfifo_buf.o
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-06-28 16:08 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-24 16:55 [PATCH] irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890 Roman Kagan
2024-06-24 21:44 ` Thomas Gleixner
2024-06-25 11:59 ` Roman Kagan
2024-06-25 8:45 ` Marc Zyngier
2024-06-25 13:54 ` Roman Kagan
2024-06-25 14:30 ` Marc Zyngier
2024-06-28 16:07 ` kernel test robot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).