linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
@ 2023-10-02 16:45 Douglas Anderson
  2023-10-02 16:45 ` [PATCH 2/2] arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup Douglas Anderson
  2023-10-02 17:24 ` [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Mark Rutland
  0 siblings, 2 replies; 19+ messages in thread
From: Douglas Anderson @ 2023-10-02 16:45 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Marc Zyngier
  Cc: Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, Douglas Anderson,
	AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf,
	Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg,
	Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek

Some mediatek devices have the property
"mediatek,broken-save-restore-fw" in their GIC. This means that,
although the hardware supports pseudo-NMI, the firmware has a bug
that blocks enabling it. When we're in this state,
system_uses_irq_prio_masking() will return true but we'll fail to
actually enable the IRQ in the GIC.

Let's make the code handle this. We'll detect that we failed to
request an IPI as NMI and fallback to requesting it normally. Though
we expect that either all of our requests will fail or all will
succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us
robust.

Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
Reported-by: Chen-Yu Tsai <wenst@chromium.org>
Closes: https://issuetracker.google.com/issues/197061987#comment68
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 arch/arm64/kernel/smp.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 814d9aa93b21..0a6002243a8c 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -87,6 +87,7 @@ enum ipi_msg_type {
 static int ipi_irq_base __ro_after_init;
 static int nr_ipi __ro_after_init = NR_IPI;
 static struct irq_desc *ipi_desc[MAX_IPI] __ro_after_init;
+DECLARE_BITMAP(ipi_is_nmi, MAX_IPI);
 
 static void ipi_setup(int cpu);
 
@@ -986,7 +987,7 @@ static void ipi_setup(int cpu)
 		return;
 
 	for (i = 0; i < nr_ipi; i++) {
-		if (ipi_should_be_nmi(i)) {
+		if (test_bit(i, ipi_is_nmi)) {
 			prepare_percpu_nmi(ipi_irq_base + i);
 			enable_percpu_nmi(ipi_irq_base + i, 0);
 		} else {
@@ -1004,7 +1005,7 @@ static void ipi_teardown(int cpu)
 		return;
 
 	for (i = 0; i < nr_ipi; i++) {
-		if (ipi_should_be_nmi(i)) {
+		if (test_bit(i, ipi_is_nmi)) {
 			disable_percpu_nmi(ipi_irq_base + i);
 			teardown_percpu_nmi(ipi_irq_base + i);
 		} else {
@@ -1022,17 +1023,21 @@ void __init set_smp_ipi_range(int ipi_base, int n)
 	nr_ipi = min(n, MAX_IPI);
 
 	for (i = 0; i < nr_ipi; i++) {
-		int err;
+		int err = -EINVAL;
 
 		if (ipi_should_be_nmi(i)) {
 			err = request_percpu_nmi(ipi_base + i, ipi_handler,
 						 "IPI", &cpu_number);
-			WARN(err, "Could not request IPI %d as NMI, err=%d\n",
-			     i, err);
-		} else {
+			if (err)
+				pr_info_once("NMI unavailable; fallback to IRQ\n");
+			else
+				set_bit(i, ipi_is_nmi);
+		}
+
+		if (err) {
 			err = request_percpu_irq(ipi_base + i, ipi_handler,
 						 "IPI", &cpu_number);
-			WARN(err, "Could not request IPI %d as IRQ, err=%d\n",
+			WARN(err, "Could not request IPI %d, err=%d\n",
 			     i, err);
 		}
 
-- 
2.42.0.582.g8ccd20d70d-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/2] arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup
  2023-10-02 16:45 [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Douglas Anderson
@ 2023-10-02 16:45 ` Douglas Anderson
  2023-10-02 17:25   ` Mark Rutland
  2023-10-02 17:24 ` [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Mark Rutland
  1 sibling, 1 reply; 19+ messages in thread
From: Douglas Anderson @ 2023-10-02 16:45 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Marc Zyngier
  Cc: Stephen Boyd, Valentin Schneider, Chen-Yu Tsai, Douglas Anderson,
	D Scott Phillips, Josh Poimboeuf, Peter Zijlstra, Sumit Garg,
	Thomas Gleixner, linux-arm-kernel, linux-kernel

In commit 2b2d0a7a96ab ("arm64: smp: Remove dedicated wakeup IPI") we
started using a scheduler IPI to avoid a dedicated reschedule. When we
did this, we used arch_smp_send_reschedule() directly rather than
calling smp_send_reschedule(). The only difference is that calling
arch_smp_send_reschedule() directly avoids tracing. Presumably we
_don't_ want to avoid tracing here, so switch to
smp_send_reschedule().

Fixes: 2b2d0a7a96ab ("arm64: smp: Remove dedicated wakeup IPI")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
I don't 100% know if this is correct and I don't have any hardware
that uses the "ACPI parking protocol", but I think it's right. My main
incentive for this is that it makes it easier to backport pseudo-NMI
to kernels that don't have arch_smp_send_reschedule(), but I think
it's also more correct.

If for some reason we truly did want to avoid tracing here, please
shout and we can drop this patch.

 arch/arm64/kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 0a6002243a8c..b530d8ef9c1d 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -1063,7 +1063,7 @@ void arch_send_wakeup_ipi(unsigned int cpu)
 	 * We use a scheduler IPI to wake the CPU as this avoids the need for a
 	 * dedicated IPI and we can safely handle spurious scheduler IPIs.
 	 */
-	arch_smp_send_reschedule(cpu);
+	smp_send_reschedule(cpu);
 }
 #endif
 
-- 
2.42.0.582.g8ccd20d70d-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-02 16:45 [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Douglas Anderson
  2023-10-02 16:45 ` [PATCH 2/2] arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup Douglas Anderson
@ 2023-10-02 17:24 ` Mark Rutland
  2023-10-02 19:16   ` Doug Anderson
                     ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: Mark Rutland @ 2023-10-02 17:24 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd,
	Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno,
	D Scott Phillips, Josh Poimboeuf, Matthias Brugger,
	Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, linux-mediatek

On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote:
> Some mediatek devices have the property
> "mediatek,broken-save-restore-fw" in their GIC. This means that,
> although the hardware supports pseudo-NMI, the firmware has a bug
> that blocks enabling it. When we're in this state,
> system_uses_irq_prio_masking() will return true but we'll fail to
> actually enable the IRQ in the GIC.
> 
> Let's make the code handle this. We'll detect that we failed to
> request an IPI as NMI and fallback to requesting it normally. Though
> we expect that either all of our requests will fail or all will
> succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us
> robust.
> 
> Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> Closes: https://issuetracker.google.com/issues/197061987#comment68
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
>  arch/arm64/kernel/smp.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)

I'm not too keen on falling back here when we have no idea why the request failed.

I'd prefer if we could check the `supports_pseudo_nmis` static key directly to
account for the case of broken FW, e.g. as below.

Mark.

---->8----
From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.rutland@arm.com>
Date: Mon, 2 Oct 2023 18:00:36 +0100
Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW

Some MediaTek devices have broken firmware which corrupts some GICR
registers behind the back of the OS, and pseudo-NMIs cannot be used on
these devices. For more details see commit:

  44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues")

We did not take this problem into account in commit:

  331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")

Since that commit arm64's SMP code will try to setup some IPIs as
pseudo-NMIs, even on systems with broken FW. The GICv3 code will
(rightly) reject attempts to request interrupts as pseudo-NMIs,
resulting in boot-time failures.

Avoid the problem by taking the broken FW into account when deciding to
request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key
named "supports_pseudo_nmis" which is false on systems with broken FW,
and we can consult this within ipi_should_be_nmi().

Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
Reported-by: Chen-Yu Tsai <wenst@chromium.org>
Closes: https://issuetracker.google.com/issues/197061987#comment68
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kernel/smp.c      | 5 ++++-
 drivers/irqchip/irq-gic-v3.c | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 814d9aa93b21b..061c69160f90f 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
 
 static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
 {
-	if (!system_uses_irq_prio_masking())
+	DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis);
+
+	if (!system_uses_irq_prio_masking() ||
+	    !static_branch_likely(&supports_pseudo_nmis))
 		return false;
 
 	switch (ipi) {
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 787ccc880b22d..737da1b9aabf2 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -106,7 +106,7 @@ static DEFINE_STATIC_KEY_TRUE(supports_deactivate_key);
  * - Figure 4-7 Secure read of the priority field for a Non-secure Group 1
  *   interrupt.
  */
-static DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis);
+DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis);
 
 DEFINE_STATIC_KEY_FALSE(gic_nonsecure_priorities);
 EXPORT_SYMBOL(gic_nonsecure_priorities);
-- 
2.30.2

 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup
  2023-10-02 16:45 ` [PATCH 2/2] arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup Douglas Anderson
@ 2023-10-02 17:25   ` Mark Rutland
  0 siblings, 0 replies; 19+ messages in thread
From: Mark Rutland @ 2023-10-02 17:25 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd,
	Valentin Schneider, Chen-Yu Tsai, D Scott Phillips,
	Josh Poimboeuf, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel

On Mon, Oct 02, 2023 at 09:45:30AM -0700, Douglas Anderson wrote:
> In commit 2b2d0a7a96ab ("arm64: smp: Remove dedicated wakeup IPI") we
> started using a scheduler IPI to avoid a dedicated reschedule. When we
> did this, we used arch_smp_send_reschedule() directly rather than
> calling smp_send_reschedule(). The only difference is that calling
> arch_smp_send_reschedule() directly avoids tracing. Presumably we
> _don't_ want to avoid tracing here, so switch to
> smp_send_reschedule().
> 
> Fixes: 2b2d0a7a96ab ("arm64: smp: Remove dedicated wakeup IPI")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> I don't 100% know if this is correct and I don't have any hardware
> that uses the "ACPI parking protocol", but I think it's right. My main
> incentive for this is that it makes it easier to backport pseudo-NMI
> to kernels that don't have arch_smp_send_reschedule(), but I think
> it's also more correct.
> 
> If for some reason we truly did want to avoid tracing here, please
> shout and we can drop this patch.

This should be sound, and I don't have strong feelings either way on this, so
FWIW:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> 
>  arch/arm64/kernel/smp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 0a6002243a8c..b530d8ef9c1d 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -1063,7 +1063,7 @@ void arch_send_wakeup_ipi(unsigned int cpu)
>  	 * We use a scheduler IPI to wake the CPU as this avoids the need for a
>  	 * dedicated IPI and we can safely handle spurious scheduler IPIs.
>  	 */
> -	arch_smp_send_reschedule(cpu);
> +	smp_send_reschedule(cpu);
>  }
>  #endif
>  
> -- 
> 2.42.0.582.g8ccd20d70d-goog
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-02 17:24 ` [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Mark Rutland
@ 2023-10-02 19:16   ` Doug Anderson
  2023-10-03 12:29     ` Mark Rutland
  2023-10-06  9:55   ` Marc Zyngier
  2023-10-06 10:20   ` Chen-Yu Tsai
  2 siblings, 1 reply; 19+ messages in thread
From: Doug Anderson @ 2023-10-02 19:16 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd,
	Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno,
	D Scott Phillips, Josh Poimboeuf, Matthias Brugger,
	Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, linux-mediatek

Hi,

On Mon, Oct 2, 2023 at 10:24 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote:
> > Some mediatek devices have the property
> > "mediatek,broken-save-restore-fw" in their GIC. This means that,
> > although the hardware supports pseudo-NMI, the firmware has a bug
> > that blocks enabling it. When we're in this state,
> > system_uses_irq_prio_masking() will return true but we'll fail to
> > actually enable the IRQ in the GIC.
> >
> > Let's make the code handle this. We'll detect that we failed to
> > request an IPI as NMI and fallback to requesting it normally. Though
> > we expect that either all of our requests will fail or all will
> > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us
> > robust.
> >
> > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> > Closes: https://issuetracker.google.com/issues/197061987#comment68
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> >  arch/arm64/kernel/smp.c | 19 ++++++++++++-------
> >  1 file changed, 12 insertions(+), 7 deletions(-)
>
> I'm not too keen on falling back here when we have no idea why the request failed.
>
> I'd prefer if we could check the `supports_pseudo_nmis` static key directly to
> account for the case of broken FW, e.g. as below.
>
> Mark.
>
> ---->8----
> From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001
> From: Mark Rutland <mark.rutland@arm.com>
> Date: Mon, 2 Oct 2023 18:00:36 +0100
> Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW
>
> Some MediaTek devices have broken firmware which corrupts some GICR
> registers behind the back of the OS, and pseudo-NMIs cannot be used on
> these devices. For more details see commit:
>
>   44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues")
>
> We did not take this problem into account in commit:
>
>   331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
>
> Since that commit arm64's SMP code will try to setup some IPIs as
> pseudo-NMIs, even on systems with broken FW. The GICv3 code will
> (rightly) reject attempts to request interrupts as pseudo-NMIs,
> resulting in boot-time failures.
>
> Avoid the problem by taking the broken FW into account when deciding to
> request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key
> named "supports_pseudo_nmis" which is false on systems with broken FW,
> and we can consult this within ipi_should_be_nmi().
>
> Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> Closes: https://issuetracker.google.com/issues/197061987#comment68
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Douglas Anderson <dianders@chromium.org>
> Cc: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kernel/smp.c      | 5 ++++-
>  drivers/irqchip/irq-gic-v3.c | 2 +-
>  2 files changed, 5 insertions(+), 2 deletions(-)

Sure, this is OK w/ me as long as folks don't mind accessing the
global here, it's OK w/ me:

Reviewed-by: Douglas Anderson <dianders@chromium.org>

It seems to work for me, thus:

Tested-by: Douglas Anderson <dianders@chromium.org>


> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 814d9aa93b21b..061c69160f90f 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
>
>  static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
>  {
> -       if (!system_uses_irq_prio_masking())
> +       DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis);
> +
> +       if (!system_uses_irq_prio_masking() ||
> +           !static_branch_likely(&supports_pseudo_nmis))

One thought, actually, is whether we should actually change
system_uses_irq_prio_masking() to return the correct value. What do
you think?

-Doug

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-02 19:16   ` Doug Anderson
@ 2023-10-03 12:29     ` Mark Rutland
  2023-10-03 13:43       ` Doug Anderson
  0 siblings, 1 reply; 19+ messages in thread
From: Mark Rutland @ 2023-10-03 12:29 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd,
	Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno,
	D Scott Phillips, Josh Poimboeuf, Matthias Brugger,
	Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, linux-mediatek

On Mon, Oct 02, 2023 at 12:16:17PM -0700, Doug Anderson wrote:
> Hi,
> 
> On Mon, Oct 2, 2023 at 10:24 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote:
> > > Some mediatek devices have the property
> > > "mediatek,broken-save-restore-fw" in their GIC. This means that,
> > > although the hardware supports pseudo-NMI, the firmware has a bug
> > > that blocks enabling it. When we're in this state,
> > > system_uses_irq_prio_masking() will return true but we'll fail to
> > > actually enable the IRQ in the GIC.
> > >
> > > Let's make the code handle this. We'll detect that we failed to
> > > request an IPI as NMI and fallback to requesting it normally. Though
> > > we expect that either all of our requests will fail or all will
> > > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us
> > > robust.
> > >
> > > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > > Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> > > Closes: https://issuetracker.google.com/issues/197061987#comment68
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > ---
> > >
> > >  arch/arm64/kernel/smp.c | 19 ++++++++++++-------
> > >  1 file changed, 12 insertions(+), 7 deletions(-)
> >
> > I'm not too keen on falling back here when we have no idea why the request failed.
> >
> > I'd prefer if we could check the `supports_pseudo_nmis` static key directly to
> > account for the case of broken FW, e.g. as below.
> >
> > Mark.
> >
> > ---->8----
> > From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001
> > From: Mark Rutland <mark.rutland@arm.com>
> > Date: Mon, 2 Oct 2023 18:00:36 +0100
> > Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW
> >
> > Some MediaTek devices have broken firmware which corrupts some GICR
> > registers behind the back of the OS, and pseudo-NMIs cannot be used on
> > these devices. For more details see commit:
> >
> >   44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues")
> >
> > We did not take this problem into account in commit:
> >
> >   331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> >
> > Since that commit arm64's SMP code will try to setup some IPIs as
> > pseudo-NMIs, even on systems with broken FW. The GICv3 code will
> > (rightly) reject attempts to request interrupts as pseudo-NMIs,
> > resulting in boot-time failures.
> >
> > Avoid the problem by taking the broken FW into account when deciding to
> > request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key
> > named "supports_pseudo_nmis" which is false on systems with broken FW,
> > and we can consult this within ipi_should_be_nmi().
> >
> > Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> > Closes: https://issuetracker.google.com/issues/197061987#comment68
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Douglas Anderson <dianders@chromium.org>
> > Cc: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/kernel/smp.c      | 5 ++++-
> >  drivers/irqchip/irq-gic-v3.c | 2 +-
> >  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> Sure, this is OK w/ me as long as folks don't mind accessing the
> global here, it's OK w/ me:
> 
> Reviewed-by: Douglas Anderson <dianders@chromium.org>
> 
> It seems to work for me, thus:
> 
> Tested-by: Douglas Anderson <dianders@chromium.org>
> 
> 
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index 814d9aa93b21b..061c69160f90f 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
> >
> >  static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
> >  {
> > -       if (!system_uses_irq_prio_masking())
> > +       DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis);
> > +
> > +       if (!system_uses_irq_prio_masking() ||
> > +           !static_branch_likely(&supports_pseudo_nmis))
> 
> One thought, actually, is whether we should actually change
> system_uses_irq_prio_masking() to return the correct value. What do
> you think?

I don't think we should add this to system_uses_irq_prio_masking(); that's used
by the low-level flags manipulation code that gets inlined all over the place,
and that code will work regarldess of whether we actually use NMI priorities.

If we want to avoid using PMR masking *at all* on these platforms, we'd need to
detect that within can_use_gic_priorities() or early_enable_pseudo_nmi().

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-03 12:29     ` Mark Rutland
@ 2023-10-03 13:43       ` Doug Anderson
  2023-10-03 16:32         ` Mark Rutland
  0 siblings, 1 reply; 19+ messages in thread
From: Doug Anderson @ 2023-10-03 13:43 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd,
	Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno,
	D Scott Phillips, Josh Poimboeuf, Matthias Brugger,
	Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, linux-mediatek

Hi,

On Tue, Oct 3, 2023 at 5:29 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Mon, Oct 02, 2023 at 12:16:17PM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Mon, Oct 2, 2023 at 10:24 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote:
> > > > Some mediatek devices have the property
> > > > "mediatek,broken-save-restore-fw" in their GIC. This means that,
> > > > although the hardware supports pseudo-NMI, the firmware has a bug
> > > > that blocks enabling it. When we're in this state,
> > > > system_uses_irq_prio_masking() will return true but we'll fail to
> > > > actually enable the IRQ in the GIC.
> > > >
> > > > Let's make the code handle this. We'll detect that we failed to
> > > > request an IPI as NMI and fallback to requesting it normally. Though
> > > > we expect that either all of our requests will fail or all will
> > > > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us
> > > > robust.
> > > >
> > > > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> > > > Closes: https://issuetracker.google.com/issues/197061987#comment68
> > > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > > ---
> > > >
> > > >  arch/arm64/kernel/smp.c | 19 ++++++++++++-------
> > > >  1 file changed, 12 insertions(+), 7 deletions(-)
> > >
> > > I'm not too keen on falling back here when we have no idea why the request failed.
> > >
> > > I'd prefer if we could check the `supports_pseudo_nmis` static key directly to
> > > account for the case of broken FW, e.g. as below.
> > >
> > > Mark.
> > >
> > > ---->8----
> > > From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001
> > > From: Mark Rutland <mark.rutland@arm.com>
> > > Date: Mon, 2 Oct 2023 18:00:36 +0100
> > > Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW
> > >
> > > Some MediaTek devices have broken firmware which corrupts some GICR
> > > registers behind the back of the OS, and pseudo-NMIs cannot be used on
> > > these devices. For more details see commit:
> > >
> > >   44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues")
> > >
> > > We did not take this problem into account in commit:
> > >
> > >   331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > >
> > > Since that commit arm64's SMP code will try to setup some IPIs as
> > > pseudo-NMIs, even on systems with broken FW. The GICv3 code will
> > > (rightly) reject attempts to request interrupts as pseudo-NMIs,
> > > resulting in boot-time failures.
> > >
> > > Avoid the problem by taking the broken FW into account when deciding to
> > > request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key
> > > named "supports_pseudo_nmis" which is false on systems with broken FW,
> > > and we can consult this within ipi_should_be_nmi().
> > >
> > > Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > > Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> > > Closes: https://issuetracker.google.com/issues/197061987#comment68
> > > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > > Cc: Douglas Anderson <dianders@chromium.org>
> > > Cc: Marc Zyngier <maz@kernel.org>
> > > ---
> > >  arch/arm64/kernel/smp.c      | 5 ++++-
> > >  drivers/irqchip/irq-gic-v3.c | 2 +-
> > >  2 files changed, 5 insertions(+), 2 deletions(-)
> >
> > Sure, this is OK w/ me as long as folks don't mind accessing the
> > global here, it's OK w/ me:
> >
> > Reviewed-by: Douglas Anderson <dianders@chromium.org>
> >
> > It seems to work for me, thus:
> >
> > Tested-by: Douglas Anderson <dianders@chromium.org>
> >
> >
> > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > index 814d9aa93b21b..061c69160f90f 100644
> > > --- a/arch/arm64/kernel/smp.c
> > > +++ b/arch/arm64/kernel/smp.c
> > > @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
> > >
> > >  static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
> > >  {
> > > -       if (!system_uses_irq_prio_masking())
> > > +       DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis);
> > > +
> > > +       if (!system_uses_irq_prio_masking() ||
> > > +           !static_branch_likely(&supports_pseudo_nmis))
> >
> > One thought, actually, is whether we should actually change
> > system_uses_irq_prio_masking() to return the correct value. What do
> > you think?
>
> I don't think we should add this to system_uses_irq_prio_masking(); that's used
> by the low-level flags manipulation code that gets inlined all over the place,
> and that code will work regarldess of whether we actually use NMI priorities.
>
> If we want to avoid using PMR masking *at all* on these platforms, we'd need to
> detect that within can_use_gic_priorities() or early_enable_pseudo_nmi().

I suspect that anyone trying to use PMR masking on these systems for
any purpose will be unhappy. The issue is talked about in:

https://issuetracker.google.com/281831288

...where you can see that the firmware on these systems isn't properly
saving/restoring some registers, including GICR_IPRIORITYR.

-Doug

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-03 13:43       ` Doug Anderson
@ 2023-10-03 16:32         ` Mark Rutland
  2023-10-03 19:32           ` Doug Anderson
  0 siblings, 1 reply; 19+ messages in thread
From: Mark Rutland @ 2023-10-03 16:32 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd,
	Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno,
	D Scott Phillips, Josh Poimboeuf, Matthias Brugger,
	Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, linux-mediatek

On Tue, Oct 03, 2023 at 06:43:07AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Tue, Oct 3, 2023 at 5:29 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Mon, Oct 02, 2023 at 12:16:17PM -0700, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Mon, Oct 2, 2023 at 10:24 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > >
> > > > On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote:
> > > > > Some mediatek devices have the property
> > > > > "mediatek,broken-save-restore-fw" in their GIC. This means that,
> > > > > although the hardware supports pseudo-NMI, the firmware has a bug
> > > > > that blocks enabling it. When we're in this state,
> > > > > system_uses_irq_prio_masking() will return true but we'll fail to
> > > > > actually enable the IRQ in the GIC.
> > > > >
> > > > > Let's make the code handle this. We'll detect that we failed to
> > > > > request an IPI as NMI and fallback to requesting it normally. Though
> > > > > we expect that either all of our requests will fail or all will
> > > > > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us
> > > > > robust.
> > > > >
> > > > > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> > > > > Closes: https://issuetracker.google.com/issues/197061987#comment68
> > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > > > ---
> > > > >
> > > > >  arch/arm64/kernel/smp.c | 19 ++++++++++++-------
> > > > >  1 file changed, 12 insertions(+), 7 deletions(-)
> > > >
> > > > I'm not too keen on falling back here when we have no idea why the request failed.
> > > >
> > > > I'd prefer if we could check the `supports_pseudo_nmis` static key directly to
> > > > account for the case of broken FW, e.g. as below.
> > > >
> > > > Mark.
> > > >
> > > > ---->8----
> > > > From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001
> > > > From: Mark Rutland <mark.rutland@arm.com>
> > > > Date: Mon, 2 Oct 2023 18:00:36 +0100
> > > > Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW
> > > >
> > > > Some MediaTek devices have broken firmware which corrupts some GICR
> > > > registers behind the back of the OS, and pseudo-NMIs cannot be used on
> > > > these devices. For more details see commit:
> > > >
> > > >   44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues")
> > > >
> > > > We did not take this problem into account in commit:
> > > >
> > > >   331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > > >
> > > > Since that commit arm64's SMP code will try to setup some IPIs as
> > > > pseudo-NMIs, even on systems with broken FW. The GICv3 code will
> > > > (rightly) reject attempts to request interrupts as pseudo-NMIs,
> > > > resulting in boot-time failures.
> > > >
> > > > Avoid the problem by taking the broken FW into account when deciding to
> > > > request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key
> > > > named "supports_pseudo_nmis" which is false on systems with broken FW,
> > > > and we can consult this within ipi_should_be_nmi().
> > > >
> > > > Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> > > > Closes: https://issuetracker.google.com/issues/197061987#comment68
> > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > > > Cc: Douglas Anderson <dianders@chromium.org>
> > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > ---
> > > >  arch/arm64/kernel/smp.c      | 5 ++++-
> > > >  drivers/irqchip/irq-gic-v3.c | 2 +-
> > > >  2 files changed, 5 insertions(+), 2 deletions(-)
> > >
> > > Sure, this is OK w/ me as long as folks don't mind accessing the
> > > global here, it's OK w/ me:
> > >
> > > Reviewed-by: Douglas Anderson <dianders@chromium.org>
> > >
> > > It seems to work for me, thus:
> > >
> > > Tested-by: Douglas Anderson <dianders@chromium.org>
> > >
> > >
> > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > > index 814d9aa93b21b..061c69160f90f 100644
> > > > --- a/arch/arm64/kernel/smp.c
> > > > +++ b/arch/arm64/kernel/smp.c
> > > > @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
> > > >
> > > >  static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
> > > >  {
> > > > -       if (!system_uses_irq_prio_masking())
> > > > +       DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis);
> > > > +
> > > > +       if (!system_uses_irq_prio_masking() ||
> > > > +           !static_branch_likely(&supports_pseudo_nmis))
> > >
> > > One thought, actually, is whether we should actually change
> > > system_uses_irq_prio_masking() to return the correct value. What do
> > > you think?
> >
> > I don't think we should add this to system_uses_irq_prio_masking(); that's used
> > by the low-level flags manipulation code that gets inlined all over the place,
> > and that code will work regarldess of whether we actually use NMI priorities.
> >
> > If we want to avoid using PMR masking *at all* on these platforms, we'd need to
> > detect that within can_use_gic_priorities() or early_enable_pseudo_nmi().
> 
> I suspect that anyone trying to use PMR masking on these systems for
> any purpose will be unhappy. The issue is talked about in:
> 
> https://issuetracker.google.com/281831288
> 
> ...where you can see that the firmware on these systems isn't properly
> saving/restoring some registers, including GICR_IPRIORITYR.

Ok, then that's a latent bug even before the IPI changes, going back to the
original workaround in commit:

  44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues")

For the sake of those reading the archive, can we have a better description of
what exactly happens on these boards?

IIUC on these boards the firmware fails to save+restore (some?) GICR registers
across (some?) PSCI CPU_SUSPEND idle states.

Which registers does it save+restore?

Does it reset other registers into a specific state?

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-03 16:32         ` Mark Rutland
@ 2023-10-03 19:32           ` Doug Anderson
  2023-10-04  9:59             ` Mark Rutland
  0 siblings, 1 reply; 19+ messages in thread
From: Doug Anderson @ 2023-10-03 19:32 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd,
	Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno,
	D Scott Phillips, Josh Poimboeuf, Matthias Brugger,
	Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, linux-mediatek

Hi,

On Tue, Oct 3, 2023 at 9:32 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Oct 03, 2023 at 06:43:07AM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Oct 3, 2023 at 5:29 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > On Mon, Oct 02, 2023 at 12:16:17PM -0700, Doug Anderson wrote:
> > > > Hi,
> > > >
> > > > On Mon, Oct 2, 2023 at 10:24 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > >
> > > > > On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote:
> > > > > > Some mediatek devices have the property
> > > > > > "mediatek,broken-save-restore-fw" in their GIC. This means that,
> > > > > > although the hardware supports pseudo-NMI, the firmware has a bug
> > > > > > that blocks enabling it. When we're in this state,
> > > > > > system_uses_irq_prio_masking() will return true but we'll fail to
> > > > > > actually enable the IRQ in the GIC.
> > > > > >
> > > > > > Let's make the code handle this. We'll detect that we failed to
> > > > > > request an IPI as NMI and fallback to requesting it normally. Though
> > > > > > we expect that either all of our requests will fail or all will
> > > > > > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us
> > > > > > robust.
> > > > > >
> > > > > > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > > > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> > > > > > Closes: https://issuetracker.google.com/issues/197061987#comment68
> > > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > > > > ---
> > > > > >
> > > > > >  arch/arm64/kernel/smp.c | 19 ++++++++++++-------
> > > > > >  1 file changed, 12 insertions(+), 7 deletions(-)
> > > > >
> > > > > I'm not too keen on falling back here when we have no idea why the request failed.
> > > > >
> > > > > I'd prefer if we could check the `supports_pseudo_nmis` static key directly to
> > > > > account for the case of broken FW, e.g. as below.
> > > > >
> > > > > Mark.
> > > > >
> > > > > ---->8----
> > > > > From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001
> > > > > From: Mark Rutland <mark.rutland@arm.com>
> > > > > Date: Mon, 2 Oct 2023 18:00:36 +0100
> > > > > Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW
> > > > >
> > > > > Some MediaTek devices have broken firmware which corrupts some GICR
> > > > > registers behind the back of the OS, and pseudo-NMIs cannot be used on
> > > > > these devices. For more details see commit:
> > > > >
> > > > >   44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues")
> > > > >
> > > > > We did not take this problem into account in commit:
> > > > >
> > > > >   331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > > > >
> > > > > Since that commit arm64's SMP code will try to setup some IPIs as
> > > > > pseudo-NMIs, even on systems with broken FW. The GICv3 code will
> > > > > (rightly) reject attempts to request interrupts as pseudo-NMIs,
> > > > > resulting in boot-time failures.
> > > > >
> > > > > Avoid the problem by taking the broken FW into account when deciding to
> > > > > request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key
> > > > > named "supports_pseudo_nmis" which is false on systems with broken FW,
> > > > > and we can consult this within ipi_should_be_nmi().
> > > > >
> > > > > Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> > > > > Closes: https://issuetracker.google.com/issues/197061987#comment68
> > > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > > > > Cc: Douglas Anderson <dianders@chromium.org>
> > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > ---
> > > > >  arch/arm64/kernel/smp.c      | 5 ++++-
> > > > >  drivers/irqchip/irq-gic-v3.c | 2 +-
> > > > >  2 files changed, 5 insertions(+), 2 deletions(-)
> > > >
> > > > Sure, this is OK w/ me as long as folks don't mind accessing the
> > > > global here, it's OK w/ me:
> > > >
> > > > Reviewed-by: Douglas Anderson <dianders@chromium.org>
> > > >
> > > > It seems to work for me, thus:
> > > >
> > > > Tested-by: Douglas Anderson <dianders@chromium.org>
> > > >
> > > >
> > > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > > > index 814d9aa93b21b..061c69160f90f 100644
> > > > > --- a/arch/arm64/kernel/smp.c
> > > > > +++ b/arch/arm64/kernel/smp.c
> > > > > @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
> > > > >
> > > > >  static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
> > > > >  {
> > > > > -       if (!system_uses_irq_prio_masking())
> > > > > +       DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis);
> > > > > +
> > > > > +       if (!system_uses_irq_prio_masking() ||
> > > > > +           !static_branch_likely(&supports_pseudo_nmis))
> > > >
> > > > One thought, actually, is whether we should actually change
> > > > system_uses_irq_prio_masking() to return the correct value. What do
> > > > you think?
> > >
> > > I don't think we should add this to system_uses_irq_prio_masking(); that's used
> > > by the low-level flags manipulation code that gets inlined all over the place,
> > > and that code will work regarldess of whether we actually use NMI priorities.
> > >
> > > If we want to avoid using PMR masking *at all* on these platforms, we'd need to
> > > detect that within can_use_gic_priorities() or early_enable_pseudo_nmi().
> >
> > I suspect that anyone trying to use PMR masking on these systems for
> > any purpose will be unhappy. The issue is talked about in:
> >
> > https://issuetracker.google.com/281831288
> >
> > ...where you can see that the firmware on these systems isn't properly
> > saving/restoring some registers, including GICR_IPRIORITYR.
>
> Ok, then that's a latent bug even before the IPI changes, going back to the
> original workaround in commit:
>
>   44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues")
>
> For the sake of those reading the archive, can we have a better description of
> what exactly happens on these boards?
>
> IIUC on these boards the firmware fails to save+restore (some?) GICR registers
> across (some?) PSCI CPU_SUSPEND idle states.
>
> Which registers does it save+restore?
>
> Does it reset other registers into a specific state?

Though I'm not an expert in this area, my understanding is that in
some of the deeper idle states then GICR registers are lost. That
matches a thread [0] I found. In early investigation I found that I
could comment out `cpu-idle-states` in the device tree to avoid the
problems [1]. I believe this is fully expected which is why firmware
is supposed to save/restore these registers whenever a low power is
entered/exited. I'd presume that any register not properly
saved/restored comes up in whatever its default state is.

As far as pseudo-NMI was concerned, all I really needed to
save/restore was "GICR_NUM_IPRIORITYR" [2], but Marc Zyngier looked at
the code and identified [3] at least these in addition:
* GICR_CTLR
* GICR_ISPENDR0
* GICR_ISACTIVER0
* GICR_NSACR

That list seems to match the Arm Trusted Firmware patch that fixed the
issue [4]. ...but it will be impossible to ever get the fix rolled out
to all devices. Even if we could get firmware spins Qualified for
every device there will still be cases where we'll boot with the old
firmware. Since we _don't_ bundle the device tree with the firmware,
we believe that the quirk mechanism that we came up with (add a quirk
in never device trees and firmware removes the quirk when we have a
fix) is at least a robust/reliable way to detect the issue.

The whole issue seems rather concerning, but (apparently) it never
caused issues in the kernel until we tried to use pseudo-NMI.

[0] https://github.com/ARM-software/tf-issues/issues/464
[1] https://issuetracker.google.com/issues/197061987#comment27
[2] https://crrev.com/c/4519877
[3] https://issuetracker.google.com/issues/281831288#comment5
[4] https://github.com/ARM-software/arm-trusted-firmware/commit/1c62cc7fbdf2ec2a7e69b3c027d507fcafdcaa12

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-03 19:32           ` Doug Anderson
@ 2023-10-04  9:59             ` Mark Rutland
  2023-10-04 10:15               ` Marc Zyngier
  0 siblings, 1 reply; 19+ messages in thread
From: Mark Rutland @ 2023-10-04  9:59 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Stephen Boyd,
	Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno,
	D Scott Phillips, Josh Poimboeuf, Matthias Brugger,
	Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, linux-mediatek

On Tue, Oct 03, 2023 at 12:32:39PM -0700, Doug Anderson wrote:
> Hi,
> 
> On Tue, Oct 3, 2023 at 9:32 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Tue, Oct 03, 2023 at 06:43:07AM -0700, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Tue, Oct 3, 2023 at 5:29 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > >
> > > > On Mon, Oct 02, 2023 at 12:16:17PM -0700, Doug Anderson wrote:
> > > > > Hi,
> > > > >
> > > > > On Mon, Oct 2, 2023 at 10:24 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > >
> > > > > > On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote:
> > > > > > > Some mediatek devices have the property
> > > > > > > "mediatek,broken-save-restore-fw" in their GIC. This means that,
> > > > > > > although the hardware supports pseudo-NMI, the firmware has a bug
> > > > > > > that blocks enabling it. When we're in this state,
> > > > > > > system_uses_irq_prio_masking() will return true but we'll fail to
> > > > > > > actually enable the IRQ in the GIC.
> > > > > > >
> > > > > > > Let's make the code handle this. We'll detect that we failed to
> > > > > > > request an IPI as NMI and fallback to requesting it normally. Though
> > > > > > > we expect that either all of our requests will fail or all will
> > > > > > > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us
> > > > > > > robust.
> > > > > > >
> > > > > > > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > > > > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> > > > > > > Closes: https://issuetracker.google.com/issues/197061987#comment68
> > > > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > > > > > ---
> > > > > > >
> > > > > > >  arch/arm64/kernel/smp.c | 19 ++++++++++++-------
> > > > > > >  1 file changed, 12 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > I'm not too keen on falling back here when we have no idea why the request failed.
> > > > > >
> > > > > > I'd prefer if we could check the `supports_pseudo_nmis` static key directly to
> > > > > > account for the case of broken FW, e.g. as below.
> > > > > >
> > > > > > Mark.
> > > > > >
> > > > > > ---->8----
> > > > > > From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001
> > > > > > From: Mark Rutland <mark.rutland@arm.com>
> > > > > > Date: Mon, 2 Oct 2023 18:00:36 +0100
> > > > > > Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW
> > > > > >
> > > > > > Some MediaTek devices have broken firmware which corrupts some GICR
> > > > > > registers behind the back of the OS, and pseudo-NMIs cannot be used on
> > > > > > these devices. For more details see commit:
> > > > > >
> > > > > >   44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues")
> > > > > >
> > > > > > We did not take this problem into account in commit:
> > > > > >
> > > > > >   331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > > > > >
> > > > > > Since that commit arm64's SMP code will try to setup some IPIs as
> > > > > > pseudo-NMIs, even on systems with broken FW. The GICv3 code will
> > > > > > (rightly) reject attempts to request interrupts as pseudo-NMIs,
> > > > > > resulting in boot-time failures.
> > > > > >
> > > > > > Avoid the problem by taking the broken FW into account when deciding to
> > > > > > request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key
> > > > > > named "supports_pseudo_nmis" which is false on systems with broken FW,
> > > > > > and we can consult this within ipi_should_be_nmi().
> > > > > >
> > > > > > Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > > > > > Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> > > > > > Closes: https://issuetracker.google.com/issues/197061987#comment68
> > > > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > > > > > Cc: Douglas Anderson <dianders@chromium.org>
> > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > ---
> > > > > >  arch/arm64/kernel/smp.c      | 5 ++++-
> > > > > >  drivers/irqchip/irq-gic-v3.c | 2 +-
> > > > > >  2 files changed, 5 insertions(+), 2 deletions(-)
> > > > >
> > > > > Sure, this is OK w/ me as long as folks don't mind accessing the
> > > > > global here, it's OK w/ me:
> > > > >
> > > > > Reviewed-by: Douglas Anderson <dianders@chromium.org>
> > > > >
> > > > > It seems to work for me, thus:
> > > > >
> > > > > Tested-by: Douglas Anderson <dianders@chromium.org>
> > > > >
> > > > >
> > > > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > > > > index 814d9aa93b21b..061c69160f90f 100644
> > > > > > --- a/arch/arm64/kernel/smp.c
> > > > > > +++ b/arch/arm64/kernel/smp.c
> > > > > > @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
> > > > > >
> > > > > >  static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
> > > > > >  {
> > > > > > -       if (!system_uses_irq_prio_masking())
> > > > > > +       DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis);
> > > > > > +
> > > > > > +       if (!system_uses_irq_prio_masking() ||
> > > > > > +           !static_branch_likely(&supports_pseudo_nmis))
> > > > >
> > > > > One thought, actually, is whether we should actually change
> > > > > system_uses_irq_prio_masking() to return the correct value. What do
> > > > > you think?
> > > >
> > > > I don't think we should add this to system_uses_irq_prio_masking(); that's used
> > > > by the low-level flags manipulation code that gets inlined all over the place,
> > > > and that code will work regarldess of whether we actually use NMI priorities.
> > > >
> > > > If we want to avoid using PMR masking *at all* on these platforms, we'd need to
> > > > detect that within can_use_gic_priorities() or early_enable_pseudo_nmi().
> > >
> > > I suspect that anyone trying to use PMR masking on these systems for
> > > any purpose will be unhappy. The issue is talked about in:
> > >
> > > https://issuetracker.google.com/281831288
> > >
> > > ...where you can see that the firmware on these systems isn't properly
> > > saving/restoring some registers, including GICR_IPRIORITYR.
> >
> > Ok, then that's a latent bug even before the IPI changes, going back to the
> > original workaround in commit:
> >
> >   44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues")
> >
> > For the sake of those reading the archive, can we have a better description of
> > what exactly happens on these boards?
> >
> > IIUC on these boards the firmware fails to save+restore (some?) GICR registers
> > across (some?) PSCI CPU_SUSPEND idle states.
> >
> > Which registers does it save+restore?
> >
> > Does it reset other registers into a specific state?
> 
> Though I'm not an expert in this area, my understanding is that in
> some of the deeper idle states then GICR registers are lost. That
> matches a thread [0] I found. In early investigation I found that I
> could comment out `cpu-idle-states` in the device tree to avoid the
> problems [1]. I believe this is fully expected which is why firmware
> is supposed to save/restore these registers whenever a low power is
> entered/exited. I'd presume that any register not properly
> saved/restored comes up in whatever its default state is.
> 
> As far as pseudo-NMI was concerned, all I really needed to
> save/restore was "GICR_NUM_IPRIORITYR" [2], but Marc Zyngier looked at
> the code and identified [3] at least these in addition:
> * GICR_CTLR
> * GICR_ISPENDR0
> * GICR_ISACTIVER0
> * GICR_NSACR

Looking at the GIC spec (Arm IHI 0069H), page 12-673, I see for all of the
GICR_IPRIORITYR<n>.Priority_offset_*B fields:

| The reset behavior of this field is:
| • On a GIC reset, this field resets to an architecturally UNKNOWN value.

... so at least per the architecture these could be reset to arbitrary values,

and that priority might permit SGI/PPIs to be taken IRQs are priority-masked,
or to prevent SGI/PPIs to be taken when priority-unmasked.

I also see for GICR_CTLR that EnableLPIs would be reset to 0, and IIUC that
means LPIs won't work on these parts, which seems like a problem.

> That list seems to match the Arm Trusted Firmware patch that fixed the
> issue [4]. ...but it will be impossible to ever get the fix rolled out
> to all devices. Even if we could get firmware spins Qualified for
> every device there will still be cases where we'll boot with the old
> firmware. Since we _don't_ bundle the device tree with the firmware,
> we believe that the quirk mechanism that we came up with (add a quirk
> in never device trees and firmware removes the quirk when we have a
> fix) is at least a robust/reliable way to detect the issue.
> 
> The whole issue seems rather concerning, but (apparently) it never
> caused issues in the kernel until we tried to use pseudo-NMI.

Given you haven't seen any issues, I suspect those are getting reset to fixed
values that happens to work out for us, but it is a bit worrisome more
generally (e.g. the LPI case above).

Mark.

> 
> [0] https://github.com/ARM-software/tf-issues/issues/464
> [1] https://issuetracker.google.com/issues/197061987#comment27
> [2] https://crrev.com/c/4519877
> [3] https://issuetracker.google.com/issues/281831288#comment5
> [4] https://github.com/ARM-software/arm-trusted-firmware/commit/1c62cc7fbdf2ec2a7e69b3c027d507fcafdcaa12

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-04  9:59             ` Mark Rutland
@ 2023-10-04 10:15               ` Marc Zyngier
  2023-10-04 14:04                 ` Doug Anderson
  0 siblings, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2023-10-04 10:15 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Doug Anderson, Catalin Marinas, Will Deacon, Stephen Boyd,
	Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno,
	D Scott Phillips, Josh Poimboeuf, Matthias Brugger,
	Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, linux-mediatek

On Wed, 04 Oct 2023 10:59:50 +0100,
Mark Rutland <mark.rutland@arm.com> wrote:
> 
> Given you haven't seen any issues, I suspect those are getting reset to fixed
> values that happens to work out for us, but it is a bit worrisome more
> generally (e.g. the LPI case above).

It is likely that these SoCs don't even have an ITS.

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-04 10:15               ` Marc Zyngier
@ 2023-10-04 14:04                 ` Doug Anderson
  2023-10-05 10:27                   ` Mark Rutland
  0 siblings, 1 reply; 19+ messages in thread
From: Doug Anderson @ 2023-10-04 14:04 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, Stephen Boyd,
	Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno,
	D Scott Phillips, Josh Poimboeuf, Matthias Brugger,
	Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, linux-mediatek

Hi,

On Wed, Oct 4, 2023 at 3:15 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Wed, 04 Oct 2023 10:59:50 +0100,
> Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > Given you haven't seen any issues, I suspect those are getting reset to fixed
> > values that happens to work out for us, but it is a bit worrisome more
> > generally (e.g. the LPI case above).
>
> It is likely that these SoCs don't even have an ITS.

Right. That was what we decided [1] when Marc pointed this out earlier.

Overall: we know that this firmware behavior is not good but we're
stuck with it. :( At the very least, any new devices coming out will
have this fixed. Presumably if old devices are working OK enough today
(as long as you don't enable pseudo-NMI) then they can be made to keep
working?

So circling back: what patch should we actually land? As of right now
only pseudo-NMI is broken, but it would be good to make sure that if
the kernel later adds other features that would be broken on this
hardware that it gets handled properly...

[1] https://issuetracker.google.com/issues/281831288#comment4

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-04 14:04                 ` Doug Anderson
@ 2023-10-05 10:27                   ` Mark Rutland
  2023-10-05 15:34                     ` Doug Anderson
  2023-10-06 10:20                     ` Marc Zyngier
  0 siblings, 2 replies; 19+ messages in thread
From: Mark Rutland @ 2023-10-05 10:27 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, Stephen Boyd,
	Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno,
	D Scott Phillips, Josh Poimboeuf, Matthias Brugger,
	Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, linux-mediatek

On Wed, Oct 04, 2023 at 07:04:12AM -0700, Doug Anderson wrote:
> On Wed, Oct 4, 2023 at 3:15 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Wed, 04 Oct 2023 10:59:50 +0100,
> > Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > Given you haven't seen any issues, I suspect those are getting reset to fixed
> > > values that happens to work out for us, but it is a bit worrisome more
> > > generally (e.g. the LPI case above).
> >
> > It is likely that these SoCs don't even have an ITS.
> 
> Right. That was what we decided [1] when Marc pointed this out earlier.
> 
> Overall: we know that this firmware behavior is not good but we're
> stuck with it. :( At the very least, any new devices coming out will
> have this fixed. Presumably if old devices are working OK enough today
> (as long as you don't enable pseudo-NMI) then they can be made to keep
> working?
> 
> So circling back: what patch should we actually land?

For now I'd prefer we took the patch I sent in:

  https://lore.kernel.org/linux-arm-kernel/ZRr8r7XMoyDKaitd@FVFF77S0Q05N.cambridge.arm.com/

... as that leaves us no worse than before this series, and it's pretty simple.

> As of right now only pseudo-NMI is broken, but it would be good to make sure
> that if the kernel later adds other features that would be broken on this
> hardware that it gets handled properly...

Going further than the above, I think there are three options here:

1) Complete fix: depend on a working firmware, and throw this workaround away.

   IIUC from the above, that's not something you can commit to.

2) Partial fix: have the kernel save/restore everything.

   IIUC this is unpalatable.

3) Partial fix: make the ARM64_HAS_GIC_PRIO_MASKING cpucap depend on the
   absence of a "mediatek,broken-save-restore-fw" property in the DT. I believe
   we can check that in early_enable_pseudo_nmi() or can_use_gic_priorities().

   That'll avoid potential issues if/when we change the priorities used for
   pNMI (which is something I've been looking at).

I'm happy with (3) if Marc is.

Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-05 10:27                   ` Mark Rutland
@ 2023-10-05 15:34                     ` Doug Anderson
  2023-10-06 12:19                       ` Catalin Marinas
  2023-10-06 10:20                     ` Marc Zyngier
  1 sibling, 1 reply; 19+ messages in thread
From: Doug Anderson @ 2023-10-05 15:34 UTC (permalink / raw)
  To: Mark Rutland, Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Stephen Boyd, Valentin Schneider, Chen-Yu Tsai,
	AngeloGioacchino Del Regno, D Scott Phillips, Josh Poimboeuf,
	Matthias Brugger, Misono Tomohiro, Peter Zijlstra, Sumit Garg,
	Thomas Gleixner, linux-arm-kernel, linux-kernel, linux-mediatek

Hi,

On Thu, Oct 5, 2023 at 3:27 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Wed, Oct 04, 2023 at 07:04:12AM -0700, Doug Anderson wrote:
> > On Wed, Oct 4, 2023 at 3:15 AM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Wed, 04 Oct 2023 10:59:50 +0100,
> > > Mark Rutland <mark.rutland@arm.com> wrote:
> > > >
> > > > Given you haven't seen any issues, I suspect those are getting reset to fixed
> > > > values that happens to work out for us, but it is a bit worrisome more
> > > > generally (e.g. the LPI case above).
> > >
> > > It is likely that these SoCs don't even have an ITS.
> >
> > Right. That was what we decided [1] when Marc pointed this out earlier.
> >
> > Overall: we know that this firmware behavior is not good but we're
> > stuck with it. :( At the very least, any new devices coming out will
> > have this fixed. Presumably if old devices are working OK enough today
> > (as long as you don't enable pseudo-NMI) then they can be made to keep
> > working?
> >
> > So circling back: what patch should we actually land?
>
> For now I'd prefer we took the patch I sent in:
>
>   https://lore.kernel.org/linux-arm-kernel/ZRr8r7XMoyDKaitd@FVFF77S0Q05N.cambridge.arm.com/
>
> ... as that leaves us no worse than before this series, and it's pretty simple.

Sounds good to me!

Catalin / Will: Please yell if there's anything you need me to do.
Otherwise I'll assume you'll pick up Mark's patch instead of my patch
#1 and then you'll pick up my patch #2.


> > As of right now only pseudo-NMI is broken, but it would be good to make sure
> > that if the kernel later adds other features that would be broken on this
> > hardware that it gets handled properly...
>
> Going further than the above, I think there are three options here:
>
> 1) Complete fix: depend on a working firmware, and throw this workaround away.
>
>    IIUC from the above, that's not something you can commit to.

Right. We've landed the fix in the firmware branch for many of the
devices that had the issue but there's a whole process (and cost
involved) in getting this actually rolled out. Each unique board needs
to kick off a FW qual. Given that nothing is actually broken today
it's hard to justify a FW qual just for this, so we're left
piggybacking on the next reason for a FW qual (if there is one).
...even if we could kick them off all instantly, though, it's always
best not to rely on a FW fix, especially if (as in this case) it's to
keep us from crashing. There'll always be some case where someone
tries to boot a new OS with an old firmware. One such case can happen
when booting recovery images on ChromeOS where the device always boots
from the Read-only (not updatable) firmware.


> 2) Partial fix: have the kernel save/restore everything.
>
>    IIUC this is unpalatable.

Yeah, Marc already NAKed it.


> 3) Partial fix: make the ARM64_HAS_GIC_PRIO_MASKING cpucap depend on the
>    absence of a "mediatek,broken-save-restore-fw" property in the DT. I believe
>    we can check that in early_enable_pseudo_nmi() or can_use_gic_priorities().
>
>    That'll avoid potential issues if/when we change the priorities used for
>    pNMI (which is something I've been looking at).
>
> I'm happy with (3) if Marc is.

Yeah, that seems best to me as a long term solution, too.

-Doug

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-02 17:24 ` [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Mark Rutland
  2023-10-02 19:16   ` Doug Anderson
@ 2023-10-06  9:55   ` Marc Zyngier
  2023-10-06 10:20   ` Chen-Yu Tsai
  2 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2023-10-06  9:55 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Douglas Anderson, Catalin Marinas, Will Deacon, Stephen Boyd,
	Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno,
	D Scott Phillips, Josh Poimboeuf, Matthias Brugger,
	Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, linux-mediatek

On Mon, 02 Oct 2023 18:24:11 +0100,
Mark Rutland <mark.rutland@arm.com> wrote:
> 
> On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote:
> > Some mediatek devices have the property
> > "mediatek,broken-save-restore-fw" in their GIC. This means that,
> > although the hardware supports pseudo-NMI, the firmware has a bug
> > that blocks enabling it. When we're in this state,
> > system_uses_irq_prio_masking() will return true but we'll fail to
> > actually enable the IRQ in the GIC.
> > 
> > Let's make the code handle this. We'll detect that we failed to
> > request an IPI as NMI and fallback to requesting it normally. Though
> > we expect that either all of our requests will fail or all will
> > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us
> > robust.
> > 
> > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> > Closes: https://issuetracker.google.com/issues/197061987#comment68
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> > 
> >  arch/arm64/kernel/smp.c | 19 ++++++++++++-------
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> I'm not too keen on falling back here when we have no idea why the request failed.
> 
> I'd prefer if we could check the `supports_pseudo_nmis` static key directly to
> account for the case of broken FW, e.g. as below.
> 
> Mark.
> 
> ---->8----
> From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001
> From: Mark Rutland <mark.rutland@arm.com>
> Date: Mon, 2 Oct 2023 18:00:36 +0100
> Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW
> 
> Some MediaTek devices have broken firmware which corrupts some GICR
> registers behind the back of the OS, and pseudo-NMIs cannot be used on
> these devices. For more details see commit:
> 
>   44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues")
> 
> We did not take this problem into account in commit:
> 
>   331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> 
> Since that commit arm64's SMP code will try to setup some IPIs as
> pseudo-NMIs, even on systems with broken FW. The GICv3 code will
> (rightly) reject attempts to request interrupts as pseudo-NMIs,
> resulting in boot-time failures.
> 
> Avoid the problem by taking the broken FW into account when deciding to
> request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key
> named "supports_pseudo_nmis" which is false on systems with broken FW,
> and we can consult this within ipi_should_be_nmi().
> 
> Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> Closes: https://issuetracker.google.com/issues/197061987#comment68
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Douglas Anderson <dianders@chromium.org>
> Cc: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kernel/smp.c      | 5 ++++-
>  drivers/irqchip/irq-gic-v3.c | 2 +-
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 814d9aa93b21b..061c69160f90f 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
>  
>  static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
>  {
> -	if (!system_uses_irq_prio_masking())
> +	DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis);
> +
> +	if (!system_uses_irq_prio_masking() ||
> +	    !static_branch_likely(&supports_pseudo_nmis))
>  		return false;
>  
>  	switch (ipi) {
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 787ccc880b22d..737da1b9aabf2 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -106,7 +106,7 @@ static DEFINE_STATIC_KEY_TRUE(supports_deactivate_key);
>   * - Figure 4-7 Secure read of the priority field for a Non-secure Group 1
>   *   interrupt.
>   */
> -static DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis);
> +DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis);
>  
>  DEFINE_STATIC_KEY_FALSE(gic_nonsecure_priorities);
>  EXPORT_SYMBOL(gic_nonsecure_priorities);

This last hunk is going to result in more spam from the robots about
global objects without a previous declaration. Not that I care the
least, but worth mentioning.

Other than that:

Reviewed-by: Marc Zyngier <maz@kernel.org>

Please take it via the arm64 tree with patch #1

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-02 17:24 ` [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Mark Rutland
  2023-10-02 19:16   ` Doug Anderson
  2023-10-06  9:55   ` Marc Zyngier
@ 2023-10-06 10:20   ` Chen-Yu Tsai
  2 siblings, 0 replies; 19+ messages in thread
From: Chen-Yu Tsai @ 2023-10-06 10:20 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Douglas Anderson, Catalin Marinas, Will Deacon, Marc Zyngier,
	Stephen Boyd, Valentin Schneider, AngeloGioacchino Del Regno,
	D Scott Phillips, Josh Poimboeuf, Matthias Brugger,
	Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, linux-mediatek

On Tue, Oct 3, 2023 at 1:24 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Mon, Oct 02, 2023 at 09:45:29AM -0700, Douglas Anderson wrote:
> > Some mediatek devices have the property
> > "mediatek,broken-save-restore-fw" in their GIC. This means that,
> > although the hardware supports pseudo-NMI, the firmware has a bug
> > that blocks enabling it. When we're in this state,
> > system_uses_irq_prio_masking() will return true but we'll fail to
> > actually enable the IRQ in the GIC.
> >
> > Let's make the code handle this. We'll detect that we failed to
> > request an IPI as NMI and fallback to requesting it normally. Though
> > we expect that either all of our requests will fail or all will
> > succeed, it's just as cheap to keep a per-IPI bitmap and that keeps us
> > robust.
> >
> > Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> > Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> > Closes: https://issuetracker.google.com/issues/197061987#comment68
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> >  arch/arm64/kernel/smp.c | 19 ++++++++++++-------
> >  1 file changed, 12 insertions(+), 7 deletions(-)
>
> I'm not too keen on falling back here when we have no idea why the request failed.
>
> I'd prefer if we could check the `supports_pseudo_nmis` static key directly to
> account for the case of broken FW, e.g. as below.
>
> Mark.
>
> ---->8----
> From 72fdec05c64a74f21871b44c7c760bbe07cac044 Mon Sep 17 00:00:00 2001
> From: Mark Rutland <mark.rutland@arm.com>
> Date: Mon, 2 Oct 2023 18:00:36 +0100
> Subject: [PATCH] arm64: smp: avoid NMI IPIs with broken MediaTek FW
>
> Some MediaTek devices have broken firmware which corrupts some GICR
> registers behind the back of the OS, and pseudo-NMIs cannot be used on
> these devices. For more details see commit:
>
>   44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues")
>
> We did not take this problem into account in commit:
>
>   331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
>
> Since that commit arm64's SMP code will try to setup some IPIs as
> pseudo-NMIs, even on systems with broken FW. The GICv3 code will
> (rightly) reject attempts to request interrupts as pseudo-NMIs,
> resulting in boot-time failures.
>
> Avoid the problem by taking the broken FW into account when deciding to
> request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key
> named "supports_pseudo_nmis" which is false on systems with broken FW,
> and we can consult this within ipi_should_be_nmi().
>
> Fixes: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
> Reported-by: Chen-Yu Tsai <wenst@chromium.org>
> Closes: https://issuetracker.google.com/issues/197061987#comment68
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Douglas Anderson <dianders@chromium.org>
> Cc: Marc Zyngier <maz@kernel.org>

Tested-by: Chen-Yu Tsai <wenst@chromium.org>

> ---
>  arch/arm64/kernel/smp.c      | 5 ++++-
>  drivers/irqchip/irq-gic-v3.c | 2 +-
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 814d9aa93b21b..061c69160f90f 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -964,7 +964,10 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
>
>  static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
>  {
> -       if (!system_uses_irq_prio_masking())
> +       DECLARE_STATIC_KEY_FALSE(supports_pseudo_nmis);
> +
> +       if (!system_uses_irq_prio_masking() ||
> +           !static_branch_likely(&supports_pseudo_nmis))
>                 return false;
>
>         switch (ipi) {
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 787ccc880b22d..737da1b9aabf2 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -106,7 +106,7 @@ static DEFINE_STATIC_KEY_TRUE(supports_deactivate_key);
>   * - Figure 4-7 Secure read of the priority field for a Non-secure Group 1
>   *   interrupt.
>   */
> -static DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis);
> +DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis);
>
>  DEFINE_STATIC_KEY_FALSE(gic_nonsecure_priorities);
>  EXPORT_SYMBOL(gic_nonsecure_priorities);
> --
> 2.30.2
>
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-05 10:27                   ` Mark Rutland
  2023-10-05 15:34                     ` Doug Anderson
@ 2023-10-06 10:20                     ` Marc Zyngier
  2023-10-06 22:17                       ` Doug Anderson
  1 sibling, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2023-10-06 10:20 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Doug Anderson, Catalin Marinas, Will Deacon, Stephen Boyd,
	Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno,
	D Scott Phillips, Josh Poimboeuf, Matthias Brugger,
	Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, linux-mediatek

On Thu, 05 Oct 2023 11:27:26 +0100,
Mark Rutland <mark.rutland@arm.com> wrote:

> 3) Partial fix: make the ARM64_HAS_GIC_PRIO_MASKING cpucap depend on the
>    absence of a "mediatek,broken-save-restore-fw" property in the DT. I believe
>    we can check that in early_enable_pseudo_nmi() or can_use_gic_priorities().
> 
>    That'll avoid potential issues if/when we change the priorities used for
>    pNMI (which is something I've been looking at).
> 
> I'm happy with (3) if Marc is.

Definitely worth investigating if you have the bandwidth.

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-05 15:34                     ` Doug Anderson
@ 2023-10-06 12:19                       ` Catalin Marinas
  0 siblings, 0 replies; 19+ messages in thread
From: Catalin Marinas @ 2023-10-06 12:19 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Mark Rutland, Will Deacon, Marc Zyngier, Stephen Boyd,
	Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno,
	D Scott Phillips, Josh Poimboeuf, Matthias Brugger,
	Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, linux-mediatek

On Thu, Oct 05, 2023 at 08:34:56AM -0700, Doug Anderson wrote:
> On Thu, Oct 5, 2023 at 3:27 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > On Wed, Oct 04, 2023 at 07:04:12AM -0700, Doug Anderson wrote:
> > > On Wed, Oct 4, 2023 at 3:15 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > On Wed, 04 Oct 2023 10:59:50 +0100,
> > > > Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > Given you haven't seen any issues, I suspect those are getting reset to fixed
> > > > > values that happens to work out for us, but it is a bit worrisome more
> > > > > generally (e.g. the LPI case above).
> > > >
> > > > It is likely that these SoCs don't even have an ITS.
> > >
> > > Right. That was what we decided [1] when Marc pointed this out earlier.
> > >
> > > Overall: we know that this firmware behavior is not good but we're
> > > stuck with it. :( At the very least, any new devices coming out will
> > > have this fixed. Presumably if old devices are working OK enough today
> > > (as long as you don't enable pseudo-NMI) then they can be made to keep
> > > working?
> > >
> > > So circling back: what patch should we actually land?
> >
> > For now I'd prefer we took the patch I sent in:
> >
> >   https://lore.kernel.org/linux-arm-kernel/ZRr8r7XMoyDKaitd@FVFF77S0Q05N.cambridge.arm.com/
> >
> > ... as that leaves us no worse than before this series, and it's pretty simple.
> 
> Sounds good to me!
> 
> Catalin / Will: Please yell if there's anything you need me to do.
> Otherwise I'll assume you'll pick up Mark's patch instead of my patch
> #1 and then you'll pick up my patch #2.

I applied both to the arm64 for-next/backtrace-ipi branch. Thanks.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW
  2023-10-06 10:20                     ` Marc Zyngier
@ 2023-10-06 22:17                       ` Doug Anderson
  0 siblings, 0 replies; 19+ messages in thread
From: Doug Anderson @ 2023-10-06 22:17 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, Stephen Boyd,
	Valentin Schneider, Chen-Yu Tsai, AngeloGioacchino Del Regno,
	D Scott Phillips, Josh Poimboeuf, Matthias Brugger,
	Misono Tomohiro, Peter Zijlstra, Sumit Garg, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, linux-mediatek

Hi,

On Fri, Oct 6, 2023 at 3:20 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 05 Oct 2023 11:27:26 +0100,
> Mark Rutland <mark.rutland@arm.com> wrote:
>
> > 3) Partial fix: make the ARM64_HAS_GIC_PRIO_MASKING cpucap depend on the
> >    absence of a "mediatek,broken-save-restore-fw" property in the DT. I believe
> >    we can check that in early_enable_pseudo_nmi() or can_use_gic_priorities().
> >
> >    That'll avoid potential issues if/when we change the priorities used for
> >    pNMI (which is something I've been looking at).
> >
> > I'm happy with (3) if Marc is.
>
> Definitely worth investigating if you have the bandwidth.

I made an attempt at it. See:

https://lore.kernel.org/20231006151547.1.Ide945748593cffd8ff0feb9ae22b795935b944d6@changeid

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-10-06 22:18 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-02 16:45 [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Douglas Anderson
2023-10-02 16:45 ` [PATCH 2/2] arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup Douglas Anderson
2023-10-02 17:25   ` Mark Rutland
2023-10-02 17:24 ` [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW Mark Rutland
2023-10-02 19:16   ` Doug Anderson
2023-10-03 12:29     ` Mark Rutland
2023-10-03 13:43       ` Doug Anderson
2023-10-03 16:32         ` Mark Rutland
2023-10-03 19:32           ` Doug Anderson
2023-10-04  9:59             ` Mark Rutland
2023-10-04 10:15               ` Marc Zyngier
2023-10-04 14:04                 ` Doug Anderson
2023-10-05 10:27                   ` Mark Rutland
2023-10-05 15:34                     ` Doug Anderson
2023-10-06 12:19                       ` Catalin Marinas
2023-10-06 10:20                     ` Marc Zyngier
2023-10-06 22:17                       ` Doug Anderson
2023-10-06  9:55   ` Marc Zyngier
2023-10-06 10:20   ` Chen-Yu Tsai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).