* [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
@ 2023-10-02 16:45 Douglas Anderson
2023-10-02 16:45 ` [PATCH 2/2] arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup Douglas Anderson
2023-10-02 17:24 ` [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Mark Rutland
0 siblings, 2 replies; 19+ messages in thread
From: Douglas Anderson @ 2023-10-02 16:45 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Marc Zyngier
Cc: Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, Douglas Anderson,
AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf,
Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg,
Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek
Some mediatek devices have the property
"mediatek,broken-save-restore-fw" in their GIC. This means that,
although the hardware supports pseudo-NMI, the firmware has a bug
that blocks enabling it. When we're in this state,
system_uses_irq_prio_masking() will return true but we'll fail to
actually enable the IRQ in the GIC.
Let's make the code handle this. We'll detect that we failed to
request an IPI as NMI and fallback to requesting it normally. Though
we expect that either all of our requests will fail or all will
succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us
robust.
Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
Reported-by: Chen-Yu Tsai <wenst@chromium.org>
Closes: https://issuetracker.google.com/issues/197061987#comment68
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
arch/arm64/kernel/smp.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 814d9aa93b21..0a6002243a8c 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -87,6 +87,7 @@ enum ipi_msg_type {
static int ipi_irq_base __ro_after_init;
static int nr_ipi __ro_after_init = NR_IPI;
static struct irq_desc *ipi_desc[MAX_IPI] __ro_after_init;
+DECLARE_BITMAP(ipi_is_nmi, MAX_IPI);
static void ipi_setup(int cpu);
@@ -986,7 +987,7 @@ static void ipi_setup(int cpu)
return;
for (i = 0; i < nr_ipi; i++) {
- if (ipi_should_be_nmi(i)) {
+ if (test_bit(i, ipi_is_nmi)) {
prepare_percpu_nmi(ipi_irq_base + i);
enable_percpu_nmi(ipi_irq_base + i, 0);
} else {
@@ -1004,7 +1005,7 @@ static void ipi_teardown(int cpu)
return;
for (i = 0; i < nr_ipi; i++) {
- if (ipi_should_be_nmi(i)) {
+ if (test_bit(i, ipi_is_nmi)) {
disable_percpu_nmi(ipi_irq_base + i);
teardown_percpu_nmi(ipi_irq_base + i);
} else {
@@ -1022,17 +1023,21 @@ void __init set_smp_ipi_range(int ipi_base, int n)
nr_ipi = min(n, MAX_IPI);
for (i = 0; i < nr_ipi; i++) {
- int err;
+ int err = -EINVAL;
if (ipi_should_be_nmi(i)) {
err = request_percpu_nmi(ipi_base + i, ipi_handler,
"IPI", &cpu_number);
- WARN(err, "Could not request IPI %d as NMI, err=%d\n",
- i, err);
- } else {
+ if (err)
+ pr_info_once("NMI unavailable; fallback to IRQ\n");
+ else
+ set_bit(i, ipi_is_nmi);
+ }
+
+ if (err) {
err = request_percpu_irq(ipi_base + i, ipi_handler,
"IPI", &cpu_number);
- WARN(err, "Could not request IPI %d as IRQ, err=%d\n",
+ WARN(err, "Could not request IPI %d, err=%d\n",
i, err);
}
--
2.42.0.582.g8ccd20d70d-goog
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH 2/2] arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup 2023-10-02 16:45 [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Douglas Anderson @ 2023-10-02 16:45 ` Douglas Anderson 2023-10-02 17:25 ` Mark Rutland 2023-10-02 17:24 ` [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Mark Rutland 1 sibling, 1 reply; 19+ messages in thread From: Douglas Anderson @ 2023-10-02 16:45 UTC (permalink / raw) To: Catalin Marinas, Will Deacon, Mark Rutland, Marc Zyngier Cc: Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, Douglas Anderson, D Scott Phillips, Josh Poimboeuf, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel In commit 2b2d0a7a96ab ("arm64: smp: Remove dedicated wakeup IPI") we started using a scheduler IPI to avoid a dedicated reschedule. When we did this, we used arch_smp_send_reschedule() directly rather than calling smp_send_reschedule(). The only difference is that calling arch_smp_send_reschedule() directly avoids tracing. Presumably we _don't_ want to avoid tracing here, so switch to smp_send_reschedule(). Fixes: 2b2d0a7a96ab ("arm64: smp: Remove dedicated wakeup IPI") Signed-off-by: Douglas Anderson <dianders@chromium.org> --- I don't 100% know if this is correct and I don't have any hardware that uses the "ACPI parking protocol", but I think it's right. My main incentive for this is that it makes it easier to backport pseudo-NMI to kernels that don't have arch_smp_send_reschedule(), but I think it's also more correct. If for some reason we truly did want to avoid tracing here, please shout and we can drop this patch. arch/arm64/kernel/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 0a6002243a8c..b530d8ef9c1d 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -1063,7 +1063,7 @@ void arch_send_wakeup_ipi(unsigned int cpu) * We use a scheduler IPI to wake the CPU as this avoids the need for a * dedicated IPI and we can safely handle spurious scheduler IPIs. */ - arch_smp_send_reschedule(cpu); + smp_send_reschedule(cpu); } #endif -- 2.42.0.582.g8ccd20d70d-goog _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup 2023-10-02 16:45 ` [PATCH 2/2] arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup Douglas Anderson @ 2023-10-02 17:25 ` Mark Rutland 0 siblings, 0 replies; 19+ messages in thread From: Mark Rutland @ 2023-10-02 17:25 UTC (permalink / raw) To: Douglas Anderson Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, D Scott Phillips, Josh Poimboeuf, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel On Mon, Oct 02, 2023 at 09:45:30AM -0700, Douglas Anderson wrote: > In commit 2b2d0a7a96ab ("arm64: smp: Remove dedicated wakeup IPI") we > started using a scheduler IPI to avoid a dedicated reschedule. When we > did this, we used arch_smp_send_reschedule() directly rather than > calling smp_send_reschedule(). The only difference is that calling > arch_smp_send_reschedule() directly avoids tracing. Presumably we > _don't_ want to avoid tracing here, so switch to > smp_send_reschedule(). > > Fixes: 2b2d0a7a96ab ("arm64: smp: Remove dedicated wakeup IPI") > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > I don't 100% know if this is correct and I don't have any hardware > that uses the "ACPI parking protocol", but I think it's right. My main > incentive for this is that it makes it easier to backport pseudo-NMI > to kernels that don't have arch_smp_send_reschedule(), but I think > it's also more correct. > > If for some reason we truly did want to avoid tracing here, please > shout and we can drop this patch. This should be sound, and I don't have strong feelings either way on this, so FWIW: Acked-by: Mark Rutland <mark.rutland@arm.com> Mark. > > arch/arm64/kernel/smp.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 0a6002243a8c..b530d8ef9c1d 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -1063,7 +1063,7 @@ void arch_send_wakeup_ipi(unsigned int cpu) > * We use a scheduler IPI to wake the CPU as this avoids the need for a > * dedicated IPI and we can safely handle spurious scheduler IPIs. > */ > - arch_smp_send_reschedule(cpu); > + smp_send_reschedule(cpu); > } > #endif > > -- > 2.42.0.582.g8ccd20d70d-goog > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-02 16:45 [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Douglas Anderson 2023-10-02 16:45 ` [PATCH 2/2] arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup Douglas Anderson @ 2023-10-02 17:24 ` Mark Rutland 2023-10-02 19:16 ` Doug Anderson ` (2 more replies) 1 sibling, 3 replies; 19+ messages in thread From: Mark Rutland @ 2023-10-02 17:24 UTC (permalink / raw) To: Douglas Anderson Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote: > Some mediatek devices have the property > "mediatek,broken-save-restore-fw" in their GIC. This means that, > although the hardware supports pseudo-NMI, the firmware has a bug > that blocks enabling it. When we're in this state, > system_uses_irq_prio_masking() will return true but we'll fail to > actually enable the IRQ in the GIC. > > Let's make the code handle this. We'll detect that we failed to > request an IPI as NMI and fallback to requesting it normally. Though > we expect that either all of our requests will fail or all will > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us > robust. > > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > Closes: https://issuetracker.google.com/issues/197061987#comment68 > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > > arch/arm64/kernel/smp.c | 19 ++++++++++++------- > 1 file changed, 12 insertions(+), 7 deletions(-) I'm not too keen on falling back here when we have no idea why the request failed. I'd prefer if we could check the `supports_pseudo_nmis` static key directly to account for the case of broken FW, e.g. as below. Mark. ---->8---- From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001 From: Mark Rutland <mark.rutland@arm.com> Date: Mon, 2 Oct 2023 18:00:36 +0100 Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW Some MediaTek devices have broken firmware which corrupts some GICR registers behind the back of the OS, and pseudo-NMIs cannot be used on these devices. For more details see commit: 44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues") We did not take this problem into account in commit: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") Since that commit arm64's SMP code will try to setup some IPIs as pseudo-NMIs, even on systems with broken FW. The GICv3 code will (rightly) reject attempts to request interrupts as pseudo-NMIs, resulting in boot-time failures. Avoid the problem by taking the broken FW into account when deciding to request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key named "supports_pseudo_nmis" which is false on systems with broken FW, and we can consult this within ipi_should_be_nmi(). Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") Reported-by: Chen-Yu Tsai <wenst@chromium.org> Closes: https://issuetracker.google.com/issues/197061987#comment68 Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Douglas Anderson <dianders@chromium.org> Cc: Marc Zyngier <maz@kernel.org> --- arch/arm64/kernel/smp.c | 5 ++++- drivers/irqchip/irq-gic-v3.c | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 814d9aa93b21b..061c69160f90f 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr) static bool ipi_should_be_nmi(enum ipi_msg_type ipi) { - if (!system_uses_irq_prio_masking()) + DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis); + + if (!system_uses_irq_prio_masking() || + !static_branch_likely(&supports_pseudo_nmis)) return false; switch (ipi) { diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 787ccc880b22d..737da1b9aabf2 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -106,7 +106,7 @@ static DEFINE_STATIC_KEY_TRUE(supports_deactivate_key); * - Figure 4-7 Secure read of the priority field for a Non-secure Group 1 * interrupt. */ -static DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis); +DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis); DEFINE_STATIC_KEY_FALSE(gic_nonsecure_priorities); EXPORT_SYMBOL(gic_nonsecure_priorities); -- 2.30.2 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-02 17:24 ` [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Mark Rutland @ 2023-10-02 19:16 ` Doug Anderson 2023-10-03 12:29 ` Mark Rutland 2023-10-06 9:55 ` Marc Zyngier 2023-10-06 10:20 ` Chen-Yu Tsai 2 siblings, 1 reply; 19+ messages in thread From: Doug Anderson @ 2023-10-02 19:16 UTC (permalink / raw) To: Mark Rutland Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek Hi, On Mon, Oct 2, 2023 at 10:24 AM Mark Rutland <mark.rutland@arm.com> wrote: > > On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote: > > Some mediatek devices have the property > > "mediatek,broken-save-restore-fw" in their GIC. This means that, > > although the hardware supports pseudo-NMI, the firmware has a bug > > that blocks enabling it. When we're in this state, > > system_uses_irq_prio_masking() will return true but we'll fail to > > actually enable the IRQ in the GIC. > > > > Let's make the code handle this. We'll detect that we failed to > > request an IPI as NMI and fallback to requesting it normally. Though > > we expect that either all of our requests will fail or all will > > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us > > robust. > > > > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > > Closes: https://issuetracker.google.com/issues/197061987#comment68 > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > --- > > > > arch/arm64/kernel/smp.c | 19 ++++++++++++------- > > 1 file changed, 12 insertions(+), 7 deletions(-) > > I'm not too keen on falling back here when we have no idea why the request failed. > > I'd prefer if we could check the `supports_pseudo_nmis` static key directly to > account for the case of broken FW, e.g. as below. > > Mark. > > ---->8---- > From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001 > From: Mark Rutland <mark.rutland@arm.com> > Date: Mon, 2 Oct 2023 18:00:36 +0100 > Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW > > Some MediaTek devices have broken firmware which corrupts some GICR > registers behind the back of the OS, and pseudo-NMIs cannot be used on > these devices. For more details see commit: > > 44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues") > > We did not take this problem into account in commit: > > 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > Since that commit arm64's SMP code will try to setup some IPIs as > pseudo-NMIs, even on systems with broken FW. The GICv3 code will > (rightly) reject attempts to request interrupts as pseudo-NMIs, > resulting in boot-time failures. > > Avoid the problem by taking the broken FW into account when deciding to > request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key > named "supports_pseudo_nmis" which is false on systems with broken FW, > and we can consult this within ipi_should_be_nmi(). > > Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > Closes: https://issuetracker.google.com/issues/197061987#comment68 > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > Cc: Douglas Anderson <dianders@chromium.org> > Cc: Marc Zyngier <maz@kernel.org> > --- > arch/arm64/kernel/smp.c | 5 ++++- > drivers/irqchip/irq-gic-v3.c | 2 +- > 2 files changed, 5 insertions(+), 2 deletions(-) Sure, this is OK w/ me as long as folks don't mind accessing the global here, it's OK w/ me: Reviewed-by: Douglas Anderson <dianders@chromium.org> It seems to work for me, thus: Tested-by: Douglas Anderson <dianders@chromium.org> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 814d9aa93b21b..061c69160f90f 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr) > > static bool ipi_should_be_nmi(enum ipi_msg_type ipi) > { > - if (!system_uses_irq_prio_masking()) > + DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis); > + > + if (!system_uses_irq_prio_masking() || > + !static_branch_likely(&supports_pseudo_nmis)) One thought, actually, is whether we should actually change system_uses_irq_prio_masking() to return the correct value. What do you think? -Doug _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-02 19:16 ` Doug Anderson @ 2023-10-03 12:29 ` Mark Rutland 2023-10-03 13:43 ` Doug Anderson 0 siblings, 1 reply; 19+ messages in thread From: Mark Rutland @ 2023-10-03 12:29 UTC (permalink / raw) To: Doug Anderson Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek On Mon, Oct 02, 2023 at 12:16:17PM -0700, Doug Anderson wrote: > Hi, > > On Mon, Oct 2, 2023 at 10:24 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > > On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote: > > > Some mediatek devices have the property > > > "mediatek,broken-save-restore-fw" in their GIC. This means that, > > > although the hardware supports pseudo-NMI, the firmware has a bug > > > that blocks enabling it. When we're in this state, > > > system_uses_irq_prio_masking() will return true but we'll fail to > > > actually enable the IRQ in the GIC. > > > > > > Let's make the code handle this. We'll detect that we failed to > > > request an IPI as NMI and fallback to requesting it normally. Though > > > we expect that either all of our requests will fail or all will > > > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us > > > robust. > > > > > > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > > > Closes: https://issuetracker.google.com/issues/197061987#comment68 > > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > > --- > > > > > > arch/arm64/kernel/smp.c | 19 ++++++++++++------- > > > 1 file changed, 12 insertions(+), 7 deletions(-) > > > > I'm not too keen on falling back here when we have no idea why the request failed. > > > > I'd prefer if we could check the `supports_pseudo_nmis` static key directly to > > account for the case of broken FW, e.g. as below. > > > > Mark. > > > > ---->8---- > > From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001 > > From: Mark Rutland <mark.rutland@arm.com> > > Date: Mon, 2 Oct 2023 18:00:36 +0100 > > Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW > > > > Some MediaTek devices have broken firmware which corrupts some GICR > > registers behind the back of the OS, and pseudo-NMIs cannot be used on > > these devices. For more details see commit: > > > > 44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues") > > > > We did not take this problem into account in commit: > > > > 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > > > Since that commit arm64's SMP code will try to setup some IPIs as > > pseudo-NMIs, even on systems with broken FW. The GICv3 code will > > (rightly) reject attempts to request interrupts as pseudo-NMIs, > > resulting in boot-time failures. > > > > Avoid the problem by taking the broken FW into account when deciding to > > request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key > > named "supports_pseudo_nmis" which is false on systems with broken FW, > > and we can consult this within ipi_should_be_nmi(). > > > > Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > > Closes: https://issuetracker.google.com/issues/197061987#comment68 > > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > > Cc: Douglas Anderson <dianders@chromium.org> > > Cc: Marc Zyngier <maz@kernel.org> > > --- > > arch/arm64/kernel/smp.c | 5 ++++- > > drivers/irqchip/irq-gic-v3.c | 2 +- > > 2 files changed, 5 insertions(+), 2 deletions(-) > > Sure, this is OK w/ me as long as folks don't mind accessing the > global here, it's OK w/ me: > > Reviewed-by: Douglas Anderson <dianders@chromium.org> > > It seems to work for me, thus: > > Tested-by: Douglas Anderson <dianders@chromium.org> > > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > > index 814d9aa93b21b..061c69160f90f 100644 > > --- a/arch/arm64/kernel/smp.c > > +++ b/arch/arm64/kernel/smp.c > > @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr) > > > > static bool ipi_should_be_nmi(enum ipi_msg_type ipi) > > { > > - if (!system_uses_irq_prio_masking()) > > + DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis); > > + > > + if (!system_uses_irq_prio_masking() || > > + !static_branch_likely(&supports_pseudo_nmis)) > > One thought, actually, is whether we should actually change > system_uses_irq_prio_masking() to return the correct value. What do > you think? I don't think we should add this to system_uses_irq_prio_masking(); that's used by the low-level flags manipulation code that gets inlined all over the place, and that code will work regarldess of whether we actually use NMI priorities. If we want to avoid using PMR masking *at all* on these platforms, we'd need to detect that within can_use_gic_priorities() or early_enable_pseudo_nmi(). Thanks, Mark. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-03 12:29 ` Mark Rutland @ 2023-10-03 13:43 ` Doug Anderson 2023-10-03 16:32 ` Mark Rutland 0 siblings, 1 reply; 19+ messages in thread From: Doug Anderson @ 2023-10-03 13:43 UTC (permalink / raw) To: Mark Rutland Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek Hi, On Tue, Oct 3, 2023 at 5:29 AM Mark Rutland <mark.rutland@arm.com> wrote: > > On Mon, Oct 02, 2023 at 12:16:17PM -0700, Doug Anderson wrote: > > Hi, > > > > On Mon, Oct 2, 2023 at 10:24 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > > > > On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote: > > > > Some mediatek devices have the property > > > > "mediatek,broken-save-restore-fw" in their GIC. This means that, > > > > although the hardware supports pseudo-NMI, the firmware has a bug > > > > that blocks enabling it. When we're in this state, > > > > system_uses_irq_prio_masking() will return true but we'll fail to > > > > actually enable the IRQ in the GIC. > > > > > > > > Let's make the code handle this. We'll detect that we failed to > > > > request an IPI as NMI and fallback to requesting it normally. Though > > > > we expect that either all of our requests will fail or all will > > > > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us > > > > robust. > > > > > > > > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > > > > Closes: https://issuetracker.google.com/issues/197061987#comment68 > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > > > --- > > > > > > > > arch/arm64/kernel/smp.c | 19 ++++++++++++------- > > > > 1 file changed, 12 insertions(+), 7 deletions(-) > > > > > > I'm not too keen on falling back here when we have no idea why the request failed. > > > > > > I'd prefer if we could check the `supports_pseudo_nmis` static key directly to > > > account for the case of broken FW, e.g. as below. > > > > > > Mark. > > > > > > ---->8---- > > > From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001 > > > From: Mark Rutland <mark.rutland@arm.com> > > > Date: Mon, 2 Oct 2023 18:00:36 +0100 > > > Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW > > > > > > Some MediaTek devices have broken firmware which corrupts some GICR > > > registers behind the back of the OS, and pseudo-NMIs cannot be used on > > > these devices. For more details see commit: > > > > > > 44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues") > > > > > > We did not take this problem into account in commit: > > > > > > 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > > > > > Since that commit arm64's SMP code will try to setup some IPIs as > > > pseudo-NMIs, even on systems with broken FW. The GICv3 code will > > > (rightly) reject attempts to request interrupts as pseudo-NMIs, > > > resulting in boot-time failures. > > > > > > Avoid the problem by taking the broken FW into account when deciding to > > > request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key > > > named "supports_pseudo_nmis" which is false on systems with broken FW, > > > and we can consult this within ipi_should_be_nmi(). > > > > > > Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > > > Closes: https://issuetracker.google.com/issues/197061987#comment68 > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > > > Cc: Douglas Anderson <dianders@chromium.org> > > > Cc: Marc Zyngier <maz@kernel.org> > > > --- > > > arch/arm64/kernel/smp.c | 5 ++++- > > > drivers/irqchip/irq-gic-v3.c | 2 +- > > > 2 files changed, 5 insertions(+), 2 deletions(-) > > > > Sure, this is OK w/ me as long as folks don't mind accessing the > > global here, it's OK w/ me: > > > > Reviewed-by: Douglas Anderson <dianders@chromium.org> > > > > It seems to work for me, thus: > > > > Tested-by: Douglas Anderson <dianders@chromium.org> > > > > > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > > > index 814d9aa93b21b..061c69160f90f 100644 > > > --- a/arch/arm64/kernel/smp.c > > > +++ b/arch/arm64/kernel/smp.c > > > @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr) > > > > > > static bool ipi_should_be_nmi(enum ipi_msg_type ipi) > > > { > > > - if (!system_uses_irq_prio_masking()) > > > + DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis); > > > + > > > + if (!system_uses_irq_prio_masking() || > > > + !static_branch_likely(&supports_pseudo_nmis)) > > > > One thought, actually, is whether we should actually change > > system_uses_irq_prio_masking() to return the correct value. What do > > you think? > > I don't think we should add this to system_uses_irq_prio_masking(); that's used > by the low-level flags manipulation code that gets inlined all over the place, > and that code will work regarldess of whether we actually use NMI priorities. > > If we want to avoid using PMR masking *at all* on these platforms, we'd need to > detect that within can_use_gic_priorities() or early_enable_pseudo_nmi(). I suspect that anyone trying to use PMR masking on these systems for any purpose will be unhappy. The issue is talked about in: https://issuetracker.google.com/281831288 ...where you can see that the firmware on these systems isn't properly saving/restoring some registers, including GICR_IPRIORITYR. -Doug _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-03 13:43 ` Doug Anderson @ 2023-10-03 16:32 ` Mark Rutland 2023-10-03 19:32 ` Doug Anderson 0 siblings, 1 reply; 19+ messages in thread From: Mark Rutland @ 2023-10-03 16:32 UTC (permalink / raw) To: Doug Anderson Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek On Tue, Oct 03, 2023 at 06:43:07AM -0700, Doug Anderson wrote: > Hi, > > On Tue, Oct 3, 2023 at 5:29 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > > On Mon, Oct 02, 2023 at 12:16:17PM -0700, Doug Anderson wrote: > > > Hi, > > > > > > On Mon, Oct 2, 2023 at 10:24 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > > > > > > On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote: > > > > > Some mediatek devices have the property > > > > > "mediatek,broken-save-restore-fw" in their GIC. This means that, > > > > > although the hardware supports pseudo-NMI, the firmware has a bug > > > > > that blocks enabling it. When we're in this state, > > > > > system_uses_irq_prio_masking() will return true but we'll fail to > > > > > actually enable the IRQ in the GIC. > > > > > > > > > > Let's make the code handle this. We'll detect that we failed to > > > > > request an IPI as NMI and fallback to requesting it normally. Though > > > > > we expect that either all of our requests will fail or all will > > > > > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us > > > > > robust. > > > > > > > > > > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > > > > > Closes: https://issuetracker.google.com/issues/197061987#comment68 > > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > > > > --- > > > > > > > > > > arch/arm64/kernel/smp.c | 19 ++++++++++++------- > > > > > 1 file changed, 12 insertions(+), 7 deletions(-) > > > > > > > > I'm not too keen on falling back here when we have no idea why the request failed. > > > > > > > > I'd prefer if we could check the `supports_pseudo_nmis` static key directly to > > > > account for the case of broken FW, e.g. as below. > > > > > > > > Mark. > > > > > > > > ---->8---- > > > > From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001 > > > > From: Mark Rutland <mark.rutland@arm.com> > > > > Date: Mon, 2 Oct 2023 18:00:36 +0100 > > > > Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW > > > > > > > > Some MediaTek devices have broken firmware which corrupts some GICR > > > > registers behind the back of the OS, and pseudo-NMIs cannot be used on > > > > these devices. For more details see commit: > > > > > > > > 44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues") > > > > > > > > We did not take this problem into account in commit: > > > > > > > > 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > > > > > > > Since that commit arm64's SMP code will try to setup some IPIs as > > > > pseudo-NMIs, even on systems with broken FW. The GICv3 code will > > > > (rightly) reject attempts to request interrupts as pseudo-NMIs, > > > > resulting in boot-time failures. > > > > > > > > Avoid the problem by taking the broken FW into account when deciding to > > > > request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key > > > > named "supports_pseudo_nmis" which is false on systems with broken FW, > > > > and we can consult this within ipi_should_be_nmi(). > > > > > > > > Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > > > > Closes: https://issuetracker.google.com/issues/197061987#comment68 > > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > > > > Cc: Douglas Anderson <dianders@chromium.org> > > > > Cc: Marc Zyngier <maz@kernel.org> > > > > --- > > > > arch/arm64/kernel/smp.c | 5 ++++- > > > > drivers/irqchip/irq-gic-v3.c | 2 +- > > > > 2 files changed, 5 insertions(+), 2 deletions(-) > > > > > > Sure, this is OK w/ me as long as folks don't mind accessing the > > > global here, it's OK w/ me: > > > > > > Reviewed-by: Douglas Anderson <dianders@chromium.org> > > > > > > It seems to work for me, thus: > > > > > > Tested-by: Douglas Anderson <dianders@chromium.org> > > > > > > > > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > > > > index 814d9aa93b21b..061c69160f90f 100644 > > > > --- a/arch/arm64/kernel/smp.c > > > > +++ b/arch/arm64/kernel/smp.c > > > > @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr) > > > > > > > > static bool ipi_should_be_nmi(enum ipi_msg_type ipi) > > > > { > > > > - if (!system_uses_irq_prio_masking()) > > > > + DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis); > > > > + > > > > + if (!system_uses_irq_prio_masking() || > > > > + !static_branch_likely(&supports_pseudo_nmis)) > > > > > > One thought, actually, is whether we should actually change > > > system_uses_irq_prio_masking() to return the correct value. What do > > > you think? > > > > I don't think we should add this to system_uses_irq_prio_masking(); that's used > > by the low-level flags manipulation code that gets inlined all over the place, > > and that code will work regarldess of whether we actually use NMI priorities. > > > > If we want to avoid using PMR masking *at all* on these platforms, we'd need to > > detect that within can_use_gic_priorities() or early_enable_pseudo_nmi(). > > I suspect that anyone trying to use PMR masking on these systems for > any purpose will be unhappy. The issue is talked about in: > > https://issuetracker.google.com/281831288 > > ...where you can see that the firmware on these systems isn't properly > saving/restoring some registers, including GICR_IPRIORITYR. Ok, then that's a latent bug even before the IPI changes, going back to the original workaround in commit: 44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues") For the sake of those reading the archive, can we have a better description of what exactly happens on these boards? IIUC on these boards the firmware fails to save+restore (some?) GICR registers across (some?) PSCI CPU_SUSPEND idle states. Which registers does it save+restore? Does it reset other registers into a specific state? Thanks, Mark. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-03 16:32 ` Mark Rutland @ 2023-10-03 19:32 ` Doug Anderson 2023-10-04 9:59 ` Mark Rutland 0 siblings, 1 reply; 19+ messages in thread From: Doug Anderson @ 2023-10-03 19:32 UTC (permalink / raw) To: Mark Rutland Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek Hi, On Tue, Oct 3, 2023 at 9:32 AM Mark Rutland <mark.rutland@arm.com> wrote: > > On Tue, Oct 03, 2023 at 06:43:07AM -0700, Doug Anderson wrote: > > Hi, > > > > On Tue, Oct 3, 2023 at 5:29 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > > > > On Mon, Oct 02, 2023 at 12:16:17PM -0700, Doug Anderson wrote: > > > > Hi, > > > > > > > > On Mon, Oct 2, 2023 at 10:24 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > > > > > > > > On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote: > > > > > > Some mediatek devices have the property > > > > > > "mediatek,broken-save-restore-fw" in their GIC. This means that, > > > > > > although the hardware supports pseudo-NMI, the firmware has a bug > > > > > > that blocks enabling it. When we're in this state, > > > > > > system_uses_irq_prio_masking() will return true but we'll fail to > > > > > > actually enable the IRQ in the GIC. > > > > > > > > > > > > Let's make the code handle this. We'll detect that we failed to > > > > > > request an IPI as NMI and fallback to requesting it normally. Though > > > > > > we expect that either all of our requests will fail or all will > > > > > > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us > > > > > > robust. > > > > > > > > > > > > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > > > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > > > > > > Closes: https://issuetracker.google.com/issues/197061987#comment68 > > > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > > > > > --- > > > > > > > > > > > > arch/arm64/kernel/smp.c | 19 ++++++++++++------- > > > > > > 1 file changed, 12 insertions(+), 7 deletions(-) > > > > > > > > > > I'm not too keen on falling back here when we have no idea why the request failed. > > > > > > > > > > I'd prefer if we could check the `supports_pseudo_nmis` static key directly to > > > > > account for the case of broken FW, e.g. as below. > > > > > > > > > > Mark. > > > > > > > > > > ---->8---- > > > > > From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001 > > > > > From: Mark Rutland <mark.rutland@arm.com> > > > > > Date: Mon, 2 Oct 2023 18:00:36 +0100 > > > > > Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW > > > > > > > > > > Some MediaTek devices have broken firmware which corrupts some GICR > > > > > registers behind the back of the OS, and pseudo-NMIs cannot be used on > > > > > these devices. For more details see commit: > > > > > > > > > > 44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues") > > > > > > > > > > We did not take this problem into account in commit: > > > > > > > > > > 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > > > > > > > > > Since that commit arm64's SMP code will try to setup some IPIs as > > > > > pseudo-NMIs, even on systems with broken FW. The GICv3 code will > > > > > (rightly) reject attempts to request interrupts as pseudo-NMIs, > > > > > resulting in boot-time failures. > > > > > > > > > > Avoid the problem by taking the broken FW into account when deciding to > > > > > request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key > > > > > named "supports_pseudo_nmis" which is false on systems with broken FW, > > > > > and we can consult this within ipi_should_be_nmi(). > > > > > > > > > > Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > > > > > Closes: https://issuetracker.google.com/issues/197061987#comment68 > > > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > > > > > Cc: Douglas Anderson <dianders@chromium.org> > > > > > Cc: Marc Zyngier <maz@kernel.org> > > > > > --- > > > > > arch/arm64/kernel/smp.c | 5 ++++- > > > > > drivers/irqchip/irq-gic-v3.c | 2 +- > > > > > 2 files changed, 5 insertions(+), 2 deletions(-) > > > > > > > > Sure, this is OK w/ me as long as folks don't mind accessing the > > > > global here, it's OK w/ me: > > > > > > > > Reviewed-by: Douglas Anderson <dianders@chromium.org> > > > > > > > > It seems to work for me, thus: > > > > > > > > Tested-by: Douglas Anderson <dianders@chromium.org> > > > > > > > > > > > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > > > > > index 814d9aa93b21b..061c69160f90f 100644 > > > > > --- a/arch/arm64/kernel/smp.c > > > > > +++ b/arch/arm64/kernel/smp.c > > > > > @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr) > > > > > > > > > > static bool ipi_should_be_nmi(enum ipi_msg_type ipi) > > > > > { > > > > > - if (!system_uses_irq_prio_masking()) > > > > > + DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis); > > > > > + > > > > > + if (!system_uses_irq_prio_masking() || > > > > > + !static_branch_likely(&supports_pseudo_nmis)) > > > > > > > > One thought, actually, is whether we should actually change > > > > system_uses_irq_prio_masking() to return the correct value. What do > > > > you think? > > > > > > I don't think we should add this to system_uses_irq_prio_masking(); that's used > > > by the low-level flags manipulation code that gets inlined all over the place, > > > and that code will work regarldess of whether we actually use NMI priorities. > > > > > > If we want to avoid using PMR masking *at all* on these platforms, we'd need to > > > detect that within can_use_gic_priorities() or early_enable_pseudo_nmi(). > > > > I suspect that anyone trying to use PMR masking on these systems for > > any purpose will be unhappy. The issue is talked about in: > > > > https://issuetracker.google.com/281831288 > > > > ...where you can see that the firmware on these systems isn't properly > > saving/restoring some registers, including GICR_IPRIORITYR. > > Ok, then that's a latent bug even before the IPI changes, going back to the > original workaround in commit: > > 44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues") > > For the sake of those reading the archive, can we have a better description of > what exactly happens on these boards? > > IIUC on these boards the firmware fails to save+restore (some?) GICR registers > across (some?) PSCI CPU_SUSPEND idle states. > > Which registers does it save+restore? > > Does it reset other registers into a specific state? Though I'm not an expert in this area, my understanding is that in some of the deeper idle states then GICR registers are lost. That matches a thread [0] I found. In early investigation I found that I could comment out `cpu-idle-states` in the device tree to avoid the problems [1]. I believe this is fully expected which is why firmware is supposed to save/restore these registers whenever a low power is entered/exited. I'd presume that any register not properly saved/restored comes up in whatever its default state is. As far as pseudo-NMI was concerned, all I really needed to save/restore was "GICR_NUM_IPRIORITYR" [2], but Marc Zyngier looked at the code and identified [3] at least these in addition: * GICR_CTLR * GICR_ISPENDR0 * GICR_ISACTIVER0 * GICR_NSACR That list seems to match the Arm Trusted Firmware patch that fixed the issue [4]. ...but it will be impossible to ever get the fix rolled out to all devices. Even if we could get firmware spins Qualified for every device there will still be cases where we'll boot with the old firmware. Since we _don't_ bundle the device tree with the firmware, we believe that the quirk mechanism that we came up with (add a quirk in never device trees and firmware removes the quirk when we have a fix) is at least a robust/reliable way to detect the issue. The whole issue seems rather concerning, but (apparently) it never caused issues in the kernel until we tried to use pseudo-NMI. [0] https://github.com/ARM-software/tf-issues/issues/464 [1] https://issuetracker.google.com/issues/197061987#comment27 [2] https://crrev.com/c/4519877 [3] https://issuetracker.google.com/issues/281831288#comment5 [4] https://github.com/ARM-software/arm-trusted-firmware/commit/1c62cc7fbdf2ec2a7e69b3c027d507fcafdcaa12 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-03 19:32 ` Doug Anderson @ 2023-10-04 9:59 ` Mark Rutland 2023-10-04 10:15 ` Marc Zyngier 0 siblings, 1 reply; 19+ messages in thread From: Mark Rutland @ 2023-10-04 9:59 UTC (permalink / raw) To: Doug Anderson Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek On Tue, Oct 03, 2023 at 12:32:39PM -0700, Doug Anderson wrote: > Hi, > > On Tue, Oct 3, 2023 at 9:32 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > > On Tue, Oct 03, 2023 at 06:43:07AM -0700, Doug Anderson wrote: > > > Hi, > > > > > > On Tue, Oct 3, 2023 at 5:29 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > > > > > > On Mon, Oct 02, 2023 at 12:16:17PM -0700, Doug Anderson wrote: > > > > > Hi, > > > > > > > > > > On Mon, Oct 2, 2023 at 10:24 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > > > > > > > > > > On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote: > > > > > > > Some mediatek devices have the property > > > > > > > "mediatek,broken-save-restore-fw" in their GIC. This means that, > > > > > > > although the hardware supports pseudo-NMI, the firmware has a bug > > > > > > > that blocks enabling it. When we're in this state, > > > > > > > system_uses_irq_prio_masking() will return true but we'll fail to > > > > > > > actually enable the IRQ in the GIC. > > > > > > > > > > > > > > Let's make the code handle this. We'll detect that we failed to > > > > > > > request an IPI as NMI and fallback to requesting it normally. Though > > > > > > > we expect that either all of our requests will fail or all will > > > > > > > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us > > > > > > > robust. > > > > > > > > > > > > > > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > > > > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > > > > > > > Closes: https://issuetracker.google.com/issues/197061987#comment68 > > > > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > > > > > > --- > > > > > > > > > > > > > > arch/arm64/kernel/smp.c | 19 ++++++++++++------- > > > > > > > 1 file changed, 12 insertions(+), 7 deletions(-) > > > > > > > > > > > > I'm not too keen on falling back here when we have no idea why the request failed. > > > > > > > > > > > > I'd prefer if we could check the `supports_pseudo_nmis` static key directly to > > > > > > account for the case of broken FW, e.g. as below. > > > > > > > > > > > > Mark. > > > > > > > > > > > > ---->8---- > > > > > > From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001 > > > > > > From: Mark Rutland <mark.rutland@arm.com> > > > > > > Date: Mon, 2 Oct 2023 18:00:36 +0100 > > > > > > Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW > > > > > > > > > > > > Some MediaTek devices have broken firmware which corrupts some GICR > > > > > > registers behind the back of the OS, and pseudo-NMIs cannot be used on > > > > > > these devices. For more details see commit: > > > > > > > > > > > > 44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues") > > > > > > > > > > > > We did not take this problem into account in commit: > > > > > > > > > > > > 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > > > > > > > > > > > Since that commit arm64's SMP code will try to setup some IPIs as > > > > > > pseudo-NMIs, even on systems with broken FW. The GICv3 code will > > > > > > (rightly) reject attempts to request interrupts as pseudo-NMIs, > > > > > > resulting in boot-time failures. > > > > > > > > > > > > Avoid the problem by taking the broken FW into account when deciding to > > > > > > request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key > > > > > > named "supports_pseudo_nmis" which is false on systems with broken FW, > > > > > > and we can consult this within ipi_should_be_nmi(). > > > > > > > > > > > > Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > > > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > > > > > > Closes: https://issuetracker.google.com/issues/197061987#comment68 > > > > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > > > > > > Cc: Douglas Anderson <dianders@chromium.org> > > > > > > Cc: Marc Zyngier <maz@kernel.org> > > > > > > --- > > > > > > arch/arm64/kernel/smp.c | 5 ++++- > > > > > > drivers/irqchip/irq-gic-v3.c | 2 +- > > > > > > 2 files changed, 5 insertions(+), 2 deletions(-) > > > > > > > > > > Sure, this is OK w/ me as long as folks don't mind accessing the > > > > > global here, it's OK w/ me: > > > > > > > > > > Reviewed-by: Douglas Anderson <dianders@chromium.org> > > > > > > > > > > It seems to work for me, thus: > > > > > > > > > > Tested-by: Douglas Anderson <dianders@chromium.org> > > > > > > > > > > > > > > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > > > > > > index 814d9aa93b21b..061c69160f90f 100644 > > > > > > --- a/arch/arm64/kernel/smp.c > > > > > > +++ b/arch/arm64/kernel/smp.c > > > > > > @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr) > > > > > > > > > > > > static bool ipi_should_be_nmi(enum ipi_msg_type ipi) > > > > > > { > > > > > > - if (!system_uses_irq_prio_masking()) > > > > > > + DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis); > > > > > > + > > > > > > + if (!system_uses_irq_prio_masking() || > > > > > > + !static_branch_likely(&supports_pseudo_nmis)) > > > > > > > > > > One thought, actually, is whether we should actually change > > > > > system_uses_irq_prio_masking() to return the correct value. What do > > > > > you think? > > > > > > > > I don't think we should add this to system_uses_irq_prio_masking(); that's used > > > > by the low-level flags manipulation code that gets inlined all over the place, > > > > and that code will work regarldess of whether we actually use NMI priorities. > > > > > > > > If we want to avoid using PMR masking *at all* on these platforms, we'd need to > > > > detect that within can_use_gic_priorities() or early_enable_pseudo_nmi(). > > > > > > I suspect that anyone trying to use PMR masking on these systems for > > > any purpose will be unhappy. The issue is talked about in: > > > > > > https://issuetracker.google.com/281831288 > > > > > > ...where you can see that the firmware on these systems isn't properly > > > saving/restoring some registers, including GICR_IPRIORITYR. > > > > Ok, then that's a latent bug even before the IPI changes, going back to the > > original workaround in commit: > > > > 44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues") > > > > For the sake of those reading the archive, can we have a better description of > > what exactly happens on these boards? > > > > IIUC on these boards the firmware fails to save+restore (some?) GICR registers > > across (some?) PSCI CPU_SUSPEND idle states. > > > > Which registers does it save+restore? > > > > Does it reset other registers into a specific state? > > Though I'm not an expert in this area, my understanding is that in > some of the deeper idle states then GICR registers are lost. That > matches a thread [0] I found. In early investigation I found that I > could comment out `cpu-idle-states` in the device tree to avoid the > problems [1]. I believe this is fully expected which is why firmware > is supposed to save/restore these registers whenever a low power is > entered/exited. I'd presume that any register not properly > saved/restored comes up in whatever its default state is. > > As far as pseudo-NMI was concerned, all I really needed to > save/restore was "GICR_NUM_IPRIORITYR" [2], but Marc Zyngier looked at > the code and identified [3] at least these in addition: > * GICR_CTLR > * GICR_ISPENDR0 > * GICR_ISACTIVER0 > * GICR_NSACR Looking at the GIC spec (Arm IHI 0069H), page 12-673, I see for all of the GICR_IPRIORITYR<n>.Priority_offset_*B fields: | The reset behavior of this field is: | • On a GIC reset, this field resets to an architecturally UNKNOWN value. ... so at least per the architecture these could be reset to arbitrary values, and that priority might permit SGI/PPIs to be taken IRQs are priority-masked, or to prevent SGI/PPIs to be taken when priority-unmasked. I also see for GICR_CTLR that EnableLPIs would be reset to 0, and IIUC that means LPIs won't work on these parts, which seems like a problem. > That list seems to match the Arm Trusted Firmware patch that fixed the > issue [4]. ...but it will be impossible to ever get the fix rolled out > to all devices. Even if we could get firmware spins Qualified for > every device there will still be cases where we'll boot with the old > firmware. Since we _don't_ bundle the device tree with the firmware, > we believe that the quirk mechanism that we came up with (add a quirk > in never device trees and firmware removes the quirk when we have a > fix) is at least a robust/reliable way to detect the issue. > > The whole issue seems rather concerning, but (apparently) it never > caused issues in the kernel until we tried to use pseudo-NMI. Given you haven't seen any issues, I suspect those are getting reset to fixed values that happens to work out for us, but it is a bit worrisome more generally (e.g. the LPI case above). Mark. > > [0] https://github.com/ARM-software/tf-issues/issues/464 > [1] https://issuetracker.google.com/issues/197061987#comment27 > [2] https://crrev.com/c/4519877 > [3] https://issuetracker.google.com/issues/281831288#comment5 > [4] https://github.com/ARM-software/arm-trusted-firmware/commit/1c62cc7fbdf2ec2a7e69b3c027d507fcafdcaa12 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-04 9:59 ` Mark Rutland @ 2023-10-04 10:15 ` Marc Zyngier 2023-10-04 14:04 ` Doug Anderson 0 siblings, 1 reply; 19+ messages in thread From: Marc Zyngier @ 2023-10-04 10:15 UTC (permalink / raw) To: Mark Rutland Cc: Doug Anderson, Catalin Marinas, Will Deacon, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek On Wed, 04 Oct 2023 10:59:50 +0100, Mark Rutland <mark.rutland@arm.com> wrote: > > Given you haven't seen any issues, I suspect those are getting reset to fixed > values that happens to work out for us, but it is a bit worrisome more > generally (e.g. the LPI case above). It is likely that these SoCs don't even have an ITS. M. -- Without deviation from the norm, progress is not possible. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-04 10:15 ` Marc Zyngier @ 2023-10-04 14:04 ` Doug Anderson 2023-10-05 10:27 ` Mark Rutland 0 siblings, 1 reply; 19+ messages in thread From: Doug Anderson @ 2023-10-04 14:04 UTC (permalink / raw) To: Marc Zyngier Cc: Mark Rutland, Catalin Marinas, Will Deacon, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek Hi, On Wed, Oct 4, 2023 at 3:15 AM Marc Zyngier <maz@kernel.org> wrote: > > On Wed, 04 Oct 2023 10:59:50 +0100, > Mark Rutland <mark.rutland@arm.com> wrote: > > > > Given you haven't seen any issues, I suspect those are getting reset to fixed > > values that happens to work out for us, but it is a bit worrisome more > > generally (e.g. the LPI case above). > > It is likely that these SoCs don't even have an ITS. Right. That was what we decided [1] when Marc pointed this out earlier. Overall: we know that this firmware behavior is not good but we're stuck with it. :( At the very least, any new devices coming out will have this fixed. Presumably if old devices are working OK enough today (as long as you don't enable pseudo-NMI) then they can be made to keep working? So circling back: what patch should we actually land? As of right now only pseudo-NMI is broken, but it would be good to make sure that if the kernel later adds other features that would be broken on this hardware that it gets handled properly... [1] https://issuetracker.google.com/issues/281831288#comment4 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-04 14:04 ` Doug Anderson @ 2023-10-05 10:27 ` Mark Rutland 2023-10-05 15:34 ` Doug Anderson 2023-10-06 10:20 ` Marc Zyngier 0 siblings, 2 replies; 19+ messages in thread From: Mark Rutland @ 2023-10-05 10:27 UTC (permalink / raw) To: Doug Anderson Cc: Marc Zyngier, Catalin Marinas, Will Deacon, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek On Wed, Oct 04, 2023 at 07:04:12AM -0700, Doug Anderson wrote: > On Wed, Oct 4, 2023 at 3:15 AM Marc Zyngier <maz@kernel.org> wrote: > > > > On Wed, 04 Oct 2023 10:59:50 +0100, > > Mark Rutland <mark.rutland@arm.com> wrote: > > > > > > Given you haven't seen any issues, I suspect those are getting reset to fixed > > > values that happens to work out for us, but it is a bit worrisome more > > > generally (e.g. the LPI case above). > > > > It is likely that these SoCs don't even have an ITS. > > Right. That was what we decided [1] when Marc pointed this out earlier. > > Overall: we know that this firmware behavior is not good but we're > stuck with it. :( At the very least, any new devices coming out will > have this fixed. Presumably if old devices are working OK enough today > (as long as you don't enable pseudo-NMI) then they can be made to keep > working? > > So circling back: what patch should we actually land? For now I'd prefer we took the patch I sent in: https://lore.kernel.org/linux-arm-kernel/ZRr8r7XMoyDKaitd@FVFF77S0Q05N.cambridge.arm.com/ ... as that leaves us no worse than before this series, and it's pretty simple. > As of right now only pseudo-NMI is broken, but it would be good to make sure > that if the kernel later adds other features that would be broken on this > hardware that it gets handled properly... Going further than the above, I think there are three options here: 1) Complete fix: depend on a working firmware, and throw this workaround away. IIUC from the above, that's not something you can commit to. 2) Partial fix: have the kernel save/restore everything. IIUC this is unpalatable. 3) Partial fix: make the ARM64_HAS_GIC_PRIO_MASKING cpucap depend on the absence of a "mediatek,broken-save-restore-fw" property in the DT. I believe we can check that in early_enable_pseudo_nmi() or can_use_gic_priorities(). That'll avoid potential issues if/when we change the priorities used for pNMI (which is something I've been looking at). I'm happy with (3) if Marc is. Mark. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-05 10:27 ` Mark Rutland @ 2023-10-05 15:34 ` Doug Anderson 2023-10-06 12:19 ` Catalin Marinas 2023-10-06 10:20 ` Marc Zyngier 1 sibling, 1 reply; 19+ messages in thread From: Doug Anderson @ 2023-10-05 15:34 UTC (permalink / raw) To: Mark Rutland, Catalin Marinas, Will Deacon Cc: Marc Zyngier, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek Hi, On Thu, Oct 5, 2023 at 3:27 AM Mark Rutland <mark.rutland@arm.com> wrote: > > On Wed, Oct 04, 2023 at 07:04:12AM -0700, Doug Anderson wrote: > > On Wed, Oct 4, 2023 at 3:15 AM Marc Zyngier <maz@kernel.org> wrote: > > > > > > On Wed, 04 Oct 2023 10:59:50 +0100, > > > Mark Rutland <mark.rutland@arm.com> wrote: > > > > > > > > Given you haven't seen any issues, I suspect those are getting reset to fixed > > > > values that happens to work out for us, but it is a bit worrisome more > > > > generally (e.g. the LPI case above). > > > > > > It is likely that these SoCs don't even have an ITS. > > > > Right. That was what we decided [1] when Marc pointed this out earlier. > > > > Overall: we know that this firmware behavior is not good but we're > > stuck with it. :( At the very least, any new devices coming out will > > have this fixed. Presumably if old devices are working OK enough today > > (as long as you don't enable pseudo-NMI) then they can be made to keep > > working? > > > > So circling back: what patch should we actually land? > > For now I'd prefer we took the patch I sent in: > > https://lore.kernel.org/linux-arm-kernel/ZRr8r7XMoyDKaitd@FVFF77S0Q05N.cambridge.arm.com/ > > ... as that leaves us no worse than before this series, and it's pretty simple. Sounds good to me! Catalin / Will: Please yell if there's anything you need me to do. Otherwise I'll assume you'll pick up Mark's patch instead of my patch #1 and then you'll pick up my patch #2. > > As of right now only pseudo-NMI is broken, but it would be good to make sure > > that if the kernel later adds other features that would be broken on this > > hardware that it gets handled properly... > > Going further than the above, I think there are three options here: > > 1) Complete fix: depend on a working firmware, and throw this workaround away. > > IIUC from the above, that's not something you can commit to. Right. We've landed the fix in the firmware branch for many of the devices that had the issue but there's a whole process (and cost involved) in getting this actually rolled out. Each unique board needs to kick off a FW qual. Given that nothing is actually broken today it's hard to justify a FW qual just for this, so we're left piggybacking on the next reason for a FW qual (if there is one). ...even if we could kick them off all instantly, though, it's always best not to rely on a FW fix, especially if (as in this case) it's to keep us from crashing. There'll always be some case where someone tries to boot a new OS with an old firmware. One such case can happen when booting recovery images on ChromeOS where the device always boots from the Read-only (not updatable) firmware. > 2) Partial fix: have the kernel save/restore everything. > > IIUC this is unpalatable. Yeah, Marc already NAKed it. > 3) Partial fix: make the ARM64_HAS_GIC_PRIO_MASKING cpucap depend on the > absence of a "mediatek,broken-save-restore-fw" property in the DT. I believe > we can check that in early_enable_pseudo_nmi() or can_use_gic_priorities(). > > That'll avoid potential issues if/when we change the priorities used for > pNMI (which is something I've been looking at). > > I'm happy with (3) if Marc is. Yeah, that seems best to me as a long term solution, too. -Doug _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-05 15:34 ` Doug Anderson @ 2023-10-06 12:19 ` Catalin Marinas 0 siblings, 0 replies; 19+ messages in thread From: Catalin Marinas @ 2023-10-06 12:19 UTC (permalink / raw) To: Doug Anderson Cc: Mark Rutland, Will Deacon, Marc Zyngier, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek On Thu, Oct 05, 2023 at 08:34:56AM -0700, Doug Anderson wrote: > On Thu, Oct 5, 2023 at 3:27 AM Mark Rutland <mark.rutland@arm.com> wrote: > > On Wed, Oct 04, 2023 at 07:04:12AM -0700, Doug Anderson wrote: > > > On Wed, Oct 4, 2023 at 3:15 AM Marc Zyngier <maz@kernel.org> wrote: > > > > On Wed, 04 Oct 2023 10:59:50 +0100, > > > > Mark Rutland <mark.rutland@arm.com> wrote: > > > > > Given you haven't seen any issues, I suspect those are getting reset to fixed > > > > > values that happens to work out for us, but it is a bit worrisome more > > > > > generally (e.g. the LPI case above). > > > > > > > > It is likely that these SoCs don't even have an ITS. > > > > > > Right. That was what we decided [1] when Marc pointed this out earlier. > > > > > > Overall: we know that this firmware behavior is not good but we're > > > stuck with it. :( At the very least, any new devices coming out will > > > have this fixed. Presumably if old devices are working OK enough today > > > (as long as you don't enable pseudo-NMI) then they can be made to keep > > > working? > > > > > > So circling back: what patch should we actually land? > > > > For now I'd prefer we took the patch I sent in: > > > > https://lore.kernel.org/linux-arm-kernel/ZRr8r7XMoyDKaitd@FVFF77S0Q05N.cambridge.arm.com/ > > > > ... as that leaves us no worse than before this series, and it's pretty simple. > > Sounds good to me! > > Catalin / Will: Please yell if there's anything you need me to do. > Otherwise I'll assume you'll pick up Mark's patch instead of my patch > #1 and then you'll pick up my patch #2. I applied both to the arm64 for-next/backtrace-ipi branch. Thanks. -- Catalin _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-05 10:27 ` Mark Rutland 2023-10-05 15:34 ` Doug Anderson @ 2023-10-06 10:20 ` Marc Zyngier 2023-10-06 22:17 ` Doug Anderson 1 sibling, 1 reply; 19+ messages in thread From: Marc Zyngier @ 2023-10-06 10:20 UTC (permalink / raw) To: Mark Rutland Cc: Doug Anderson, Catalin Marinas, Will Deacon, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek On Thu, 05 Oct 2023 11:27:26 +0100, Mark Rutland <mark.rutland@arm.com> wrote: > 3) Partial fix: make the ARM64_HAS_GIC_PRIO_MASKING cpucap depend on the > absence of a "mediatek,broken-save-restore-fw" property in the DT. I believe > we can check that in early_enable_pseudo_nmi() or can_use_gic_priorities(). > > That'll avoid potential issues if/when we change the priorities used for > pNMI (which is something I've been looking at). > > I'm happy with (3) if Marc is. Definitely worth investigating if you have the bandwidth. M. -- Without deviation from the norm, progress is not possible. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-06 10:20 ` Marc Zyngier @ 2023-10-06 22:17 ` Doug Anderson 0 siblings, 0 replies; 19+ messages in thread From: Doug Anderson @ 2023-10-06 22:17 UTC (permalink / raw) To: Marc Zyngier Cc: Mark Rutland, Catalin Marinas, Will Deacon, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek Hi, On Fri, Oct 6, 2023 at 3:20 AM Marc Zyngier <maz@kernel.org> wrote: > > On Thu, 05 Oct 2023 11:27:26 +0100, > Mark Rutland <mark.rutland@arm.com> wrote: > > > 3) Partial fix: make the ARM64_HAS_GIC_PRIO_MASKING cpucap depend on the > > absence of a "mediatek,broken-save-restore-fw" property in the DT. I believe > > we can check that in early_enable_pseudo_nmi() or can_use_gic_priorities(). > > > > That'll avoid potential issues if/when we change the priorities used for > > pNMI (which is something I've been looking at). > > > > I'm happy with (3) if Marc is. > > Definitely worth investigating if you have the bandwidth. I made an attempt at it. See: https://lore.kernel.org/20231006151547.1.Ide945748593cffd8ff0feb9ae22b795935b944d6@changeid _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-02 17:24 ` [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Mark Rutland 2023-10-02 19:16 ` Doug Anderson @ 2023-10-06 9:55 ` Marc Zyngier 2023-10-06 10:20 ` Chen-Yu Tsai 2 siblings, 0 replies; 19+ messages in thread From: Marc Zyngier @ 2023-10-06 9:55 UTC (permalink / raw) To: Mark Rutland Cc: Douglas Anderson, Catalin Marinas, Will Deacon, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek On Mon, 02 Oct 2023 18:24:11 +0100, Mark Rutland <mark.rutland@arm.com> wrote: > > On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote: > > Some mediatek devices have the property > > "mediatek,broken-save-restore-fw" in their GIC. This means that, > > although the hardware supports pseudo-NMI, the firmware has a bug > > that blocks enabling it. When we're in this state, > > system_uses_irq_prio_masking() will return true but we'll fail to > > actually enable the IRQ in the GIC. > > > > Let's make the code handle this. We'll detect that we failed to > > request an IPI as NMI and fallback to requesting it normally. Though > > we expect that either all of our requests will fail or all will > > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us > > robust. > > > > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > > Closes: https://issuetracker.google.com/issues/197061987#comment68 > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > --- > > > > arch/arm64/kernel/smp.c | 19 ++++++++++++------- > > 1 file changed, 12 insertions(+), 7 deletions(-) > > I'm not too keen on falling back here when we have no idea why the request failed. > > I'd prefer if we could check the `supports_pseudo_nmis` static key directly to > account for the case of broken FW, e.g. as below. > > Mark. > > ---->8---- > From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001 > From: Mark Rutland <mark.rutland@arm.com> > Date: Mon, 2 Oct 2023 18:00:36 +0100 > Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW > > Some MediaTek devices have broken firmware which corrupts some GICR > registers behind the back of the OS, and pseudo-NMIs cannot be used on > these devices. For more details see commit: > > 44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues") > > We did not take this problem into account in commit: > > 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > Since that commit arm64's SMP code will try to setup some IPIs as > pseudo-NMIs, even on systems with broken FW. The GICv3 code will > (rightly) reject attempts to request interrupts as pseudo-NMIs, > resulting in boot-time failures. > > Avoid the problem by taking the broken FW into account when deciding to > request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key > named "supports_pseudo_nmis" which is false on systems with broken FW, > and we can consult this within ipi_should_be_nmi(). > > Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > Closes: https://issuetracker.google.com/issues/197061987#comment68 > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > Cc: Douglas Anderson <dianders@chromium.org> > Cc: Marc Zyngier <maz@kernel.org> > --- > arch/arm64/kernel/smp.c | 5 ++++- > drivers/irqchip/irq-gic-v3.c | 2 +- > 2 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 814d9aa93b21b..061c69160f90f 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr) > > static bool ipi_should_be_nmi(enum ipi_msg_type ipi) > { > - if (!system_uses_irq_prio_masking()) > + DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis); > + > + if (!system_uses_irq_prio_masking() || > + !static_branch_likely(&supports_pseudo_nmis)) > return false; > > switch (ipi) { > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c > index 787ccc880b22d..737da1b9aabf2 100644 > --- a/drivers/irqchip/irq-gic-v3.c > +++ b/drivers/irqchip/irq-gic-v3.c > @@ -106,7 +106,7 @@ static DEFINE_STATIC_KEY_TRUE(supports_deactivate_key); > * - Figure 4-7 Secure read of the priority field for a Non-secure Group 1 > * interrupt. > */ > -static DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis); > +DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis); > > DEFINE_STATIC_KEY_FALSE(gic_nonsecure_priorities); > EXPORT_SYMBOL(gic_nonsecure_priorities); This last hunk is going to result in more spam from the robots about global objects without a previous declaration. Not that I care the least, but worth mentioning. Other than that: Reviewed-by: Marc Zyngier <maz@kernel.org> Please take it via the arm64 tree with patch #1 Thanks, M. -- Without deviation from the norm, progress is not possible. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW 2023-10-02 17:24 ` [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Mark Rutland 2023-10-02 19:16 ` Doug Anderson 2023-10-06 9:55 ` Marc Zyngier @ 2023-10-06 10:20 ` Chen-Yu Tsai 2 siblings, 0 replies; 19+ messages in thread From: Chen-Yu Tsai @ 2023-10-06 10:20 UTC (permalink / raw) To: Mark Rutland Cc: Douglas Anderson, Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd, Valentin Schneider, AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf, Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek On Tue, Oct 3, 2023 at 1:24 AM Mark Rutland <mark.rutland@arm.com> wrote: > > On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote: > > Some mediatek devices have the property > > "mediatek,broken-save-restore-fw" in their GIC. This means that, > > although the hardware supports pseudo-NMI, the firmware has a bug > > that blocks enabling it. When we're in this state, > > system_uses_irq_prio_masking() will return true but we'll fail to > > actually enable the IRQ in the GIC. > > > > Let's make the code handle this. We'll detect that we failed to > > request an IPI as NMI and fallback to requesting it normally. Though > > we expect that either all of our requests will fail or all will > > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us > > robust. > > > > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > > Closes: https://issuetracker.google.com/issues/197061987#comment68 > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > --- > > > > arch/arm64/kernel/smp.c | 19 ++++++++++++------- > > 1 file changed, 12 insertions(+), 7 deletions(-) > > I'm not too keen on falling back here when we have no idea why the request failed. > > I'd prefer if we could check the `supports_pseudo_nmis` static key directly to > account for the case of broken FW, e.g. as below. > > Mark. > > ---->8---- > From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001 > From: Mark Rutland <mark.rutland@arm.com> > Date: Mon, 2 Oct 2023 18:00:36 +0100 > Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW > > Some MediaTek devices have broken firmware which corrupts some GICR > registers behind the back of the OS, and pseudo-NMIs cannot be used on > these devices. For more details see commit: > > 44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues") > > We did not take this problem into account in commit: > > 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > > Since that commit arm64's SMP code will try to setup some IPIs as > pseudo-NMIs, even on systems with broken FW. The GICv3 code will > (rightly) reject attempts to request interrupts as pseudo-NMIs, > resulting in boot-time failures. > > Avoid the problem by taking the broken FW into account when deciding to > request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key > named "supports_pseudo_nmis" which is false on systems with broken FW, > and we can consult this within ipi_should_be_nmi(). > > Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") > Reported-by: Chen-Yu Tsai <wenst@chromium.org> > Closes: https://issuetracker.google.com/issues/197061987#comment68 > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > Cc: Douglas Anderson <dianders@chromium.org> > Cc: Marc Zyngier <maz@kernel.org> Tested-by: Chen-Yu Tsai <wenst@chromium.org> > --- > arch/arm64/kernel/smp.c | 5 ++++- > drivers/irqchip/irq-gic-v3.c | 2 +- > 2 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 814d9aa93b21b..061c69160f90f 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr) > > static bool ipi_should_be_nmi(enum ipi_msg_type ipi) > { > - if (!system_uses_irq_prio_masking()) > + DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis); > + > + if (!system_uses_irq_prio_masking() || > + !static_branch_likely(&supports_pseudo_nmis)) > return false; > > switch (ipi) { > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c > index 787ccc880b22d..737da1b9aabf2 100644 > --- a/drivers/irqchip/irq-gic-v3.c > +++ b/drivers/irqchip/irq-gic-v3.c > @@ -106,7 +106,7 @@ static DEFINE_STATIC_KEY_TRUE(supports_deactivate_key); > * - Figure 4-7 Secure read of the priority field for a Non-secure Group 1 > * interrupt. > */ > -static DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis); > +DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis); > > DEFINE_STATIC_KEY_FALSE(gic_nonsecure_priorities); > EXPORT_SYMBOL(gic_nonsecure_priorities); > -- > 2.30.2 > > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2023-10-06 22:18 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-10-02 16:45 [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Douglas Anderson 2023-10-02 16:45 ` [PATCH 2/2] arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup Douglas Anderson 2023-10-02 17:25 ` Mark Rutland 2023-10-02 17:24 ` [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Mark Rutland 2023-10-02 19:16 ` Doug Anderson 2023-10-03 12:29 ` Mark Rutland 2023-10-03 13:43 ` Doug Anderson 2023-10-03 16:32 ` Mark Rutland 2023-10-03 19:32 ` Doug Anderson 2023-10-04 9:59 ` Mark Rutland 2023-10-04 10:15 ` Marc Zyngier 2023-10-04 14:04 ` Doug Anderson 2023-10-05 10:27 ` Mark Rutland 2023-10-05 15:34 ` Doug Anderson 2023-10-06 12:19 ` Catalin Marinas 2023-10-06 10:20 ` Marc Zyngier 2023-10-06 22:17 ` Doug Anderson 2023-10-06 9:55 ` Marc Zyngier 2023-10-06 10:20 ` Chen-Yu Tsai
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).