public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed
* [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV
@ 2026-04-13 16:54 Breno Leitao
  2026-04-14  9:43 ` lihuisong (C)
  0 siblings, 1 reply; 12+ messages in thread
From: Breno Leitao @ 2026-04-13 16:54 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Huisong Li, lpieralisi,
	catalin.marinas, will
  Cc: Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski,
	guohanjun, sudeep.holla, linux-arm-kernel, rmikey, kernel-team,
	Breno Leitao

Commit cac173bea57d ("ACPI: processor: idle: Rework the handling of
acpi_processor_ffh_lpi_probe()") moved the acpi_processor_ffh_lpi_probe()
call from acpi_processor_setup_cpuidle_dev(), where its return value was
ignored, to acpi_processor_get_power_info(), where it is treated as a
hard failure. This causes cpuidle setup to fail entirely for all CPUs on
platforms where the FFH LPI probe returns an error.

On NVIDIA Grace (aarch64) systems with PSCIv1.1, the probe fails for all
72 CPUs with -ENODEV because psci_acpi_cpu_init_idle() finds
power.count - 1 <= 0 (power.count=1). This results in no cpuidle states
registered for any CPU, forcing them to busy-poll when idle instead of
entering low-power states.

The -ENODEV error simply means no deep PSCI idle states are available
beyond WFI, which is a normal condition. Do not propagate -ENODEV and
downgrade its message to pr_debug, while still propagating other errors
that may indicate real problems.

Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()")
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/acpi/processor_idle.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index ee5facccbe10c..7b6f7730ec63d 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -1258,8 +1258,13 @@ static int acpi_processor_get_power_info(struct acpi_processor *pr)
 
 	if (pr->flags.has_lpi) {
 		ret = acpi_processor_ffh_lpi_probe(pr->id);
-		if (ret)
+		if (ret == -ENODEV) {
+			pr_debug("CPU%u: FFH LPI probe failed, err = %d, power.count = %d\n",
+				 pr->id, ret, pr->power.count);
+			ret = 0;
+		} else if (ret) {
 			pr_err("CPU%u: Invalid FFH LPI data\n", pr->id);
+		}
 	}
 
 	return ret;

---
base-commit: 66672af7a095d89f082c5327f3b15bc2f93d558e
change-id: 20260413-ffh-93f68b2f46a3

Best regards,
--  
Breno Leitao <leitao@debian.org>



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV
  2026-04-13 16:54 [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV Breno Leitao
@ 2026-04-14  9:43 ` lihuisong (C)
  2026-04-14 10:21   ` Breno Leitao
  0 siblings, 1 reply; 12+ messages in thread
From: lihuisong (C) @ 2026-04-14  9:43 UTC (permalink / raw)
  To: Breno Leitao, Rafael J. Wysocki, Len Brown, lpieralisi,
	catalin.marinas, will
  Cc: Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski,
	guohanjun, sudeep.holla, linux-arm-kernel, rmikey, kernel-team,
	lihuisong


On 4/14/2026 12:54 AM, Breno Leitao wrote:
> Commit cac173bea57d ("ACPI: processor: idle: Rework the handling of
> acpi_processor_ffh_lpi_probe()") moved the acpi_processor_ffh_lpi_probe()
> call from acpi_processor_setup_cpuidle_dev(), where its return value was
> ignored, to acpi_processor_get_power_info(), where it is treated as a
> hard failure. This causes cpuidle setup to fail entirely for all CPUs on
> platforms where the FFH LPI probe returns an error.
>
> On NVIDIA Grace (aarch64) systems with PSCIv1.1, the probe fails for all
> 72 CPUs with -ENODEV because psci_acpi_cpu_init_idle() finds
> power.count - 1 <= 0 (power.count=1). This results in no cpuidle states
> registered for any CPU, forcing them to busy-poll when idle instead of
> entering low-power states.
>
>
Sorry for bring you cpuidle issues on your platform.
IIUC, your platfom just has one LPI states on failure, it should be WFI, 
right?
This failure will only cause the cpuidle directory for each CPU not to 
be created, but will not affect the CPU entering the WFI state which 
done by cpuidle core.

But it is a real issue. Thanks for your report.
I think the best way to fix your issue is that remove this verification 
in psci_acpi_cpu_init_idle().
Because it is legal for platform to report one LPI state.
This function just needs to verify the LPI states which are FFH.

/Huisong
>
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV
  2026-04-14  9:43 ` lihuisong (C)
@ 2026-04-14 10:21   ` Breno Leitao
  2026-04-14 11:31     ` lihuisong (C)
  0 siblings, 1 reply; 12+ messages in thread
From: Breno Leitao @ 2026-04-14 10:21 UTC (permalink / raw)
  To: lihuisong (C)
  Cc: Rafael J. Wysocki, Len Brown, lpieralisi, catalin.marinas, will,
	Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski,
	guohanjun, sudeep.holla, linux-arm-kernel, rmikey, kernel-team

Hello Huisong,

On Tue, Apr 14, 2026 at 05:43:51PM +0800, lihuisong (C) wrote:
> But it is a real issue. Thanks for your report.
> I think the best way to fix your issue is that remove this verification in
> psci_acpi_cpu_init_idle().
> Because it is legal for platform to report one LPI state.
> This function just needs to verify the LPI states which are FFH.

Thank you for the prompt feedback.

Would this approach work?

commit 6c9d52840a4f778cc989838ba76ee51416e85de3
Author: Breno Leitao <leitao@debian.org>
Date:   Tue Apr 14 03:16:08 2026 -0700

    ACPI: processor: idle: Allow platforms with only one LPI state
    
    psci_acpi_cpu_init_idle() rejects platforms where power.count - 1 <= 0
    by returning -ENODEV. However, having a single LPI state (WFI) is a
    valid configuration. The function's purpose is to verify FFH idle states,
    and when count is zero, there are simply no FFH states to validate —
    this is not an error.
    
    On NVIDIA Grace (aarch64) systems with PSCIv1.1, power.count is 1 for
    all 72 CPUs, so the probe fails with -ENODEV. After commit cac173bea57d
    ("ACPI: processor: idle: Rework the handling of
    acpi_processor_ffh_lpi_probe()"), this failure propagates up and prevents
    cpuidle registration entirely.
    
    Change the check from (count <= 0) to (count < 0) so that platforms
    with only WFI are accepted. The for loop naturally handles count == 0
    by not iterating.
    
    Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()")
    Signed-off-by: Breno Leitao <leitao@debian.org>

diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c
index 801f9c4501425..7791b751042ce 100644
--- a/drivers/acpi/arm64/cpuidle.c
+++ b/drivers/acpi/arm64/cpuidle.c
@@ -31,7 +31,7 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
 		return -EOPNOTSUPP;
 
 	count = pr->power.count - 1;
-	if (count <= 0)
+	if (count < 0)
 		return -ENODEV;
 
 	for (i = 0; i < count; i++) {


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV
  2026-04-14 10:21   ` Breno Leitao
@ 2026-04-14 11:31     ` lihuisong (C)
  2026-04-14 12:05       ` Breno Leitao
  2026-04-14 12:25       ` Sudeep Holla
  0 siblings, 2 replies; 12+ messages in thread
From: lihuisong (C) @ 2026-04-14 11:31 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Rafael J. Wysocki, Len Brown, lpieralisi, catalin.marinas, will,
	Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski,
	guohanjun, sudeep.holla, linux-arm-kernel, rmikey, kernel-team


On 4/14/2026 6:21 PM, Breno Leitao wrote:
> Hello Huisong,
>
> On Tue, Apr 14, 2026 at 05:43:51PM +0800, lihuisong (C) wrote:
>> But it is a real issue. Thanks for your report.
>> I think the best way to fix your issue is that remove this verification in
>> psci_acpi_cpu_init_idle().
>> Because it is legal for platform to report one LPI state.
>> This function just needs to verify the LPI states which are FFH.
> Thank you for the prompt feedback.
>
> Would this approach work?
>
> commit 6c9d52840a4f778cc989838ba76ee51416e85de3
> Author: Breno Leitao <leitao@debian.org>
> Date:   Tue Apr 14 03:16:08 2026 -0700
>
>      ACPI: processor: idle: Allow platforms with only one LPI state
>      
>      psci_acpi_cpu_init_idle() rejects platforms where power.count - 1 <= 0
>      by returning -ENODEV. However, having a single LPI state (WFI) is a
>      valid configuration. The function's purpose is to verify FFH idle states,
>      and when count is zero, there are simply no FFH states to validate —
>      this is not an error.
>      
>      On NVIDIA Grace (aarch64) systems with PSCIv1.1, power.count is 1 for
>      all 72 CPUs, so the probe fails with -ENODEV. After commit cac173bea57d
>      ("ACPI: processor: idle: Rework the handling of
>      acpi_processor_ffh_lpi_probe()"), this failure propagates up and prevents
>      cpuidle registration entirely.
>      
>      Change the check from (count <= 0) to (count < 0) so that platforms
>      with only WFI are accepted. The for loop naturally handles count == 0
>      by not iterating.
>      
>      Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()")
>      Signed-off-by: Breno Leitao <leitao@debian.org>
>
> diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c
> index 801f9c4501425..7791b751042ce 100644
> --- a/drivers/acpi/arm64/cpuidle.c
> +++ b/drivers/acpi/arm64/cpuidle.c
> @@ -31,7 +31,7 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
>   		return -EOPNOTSUPP;
>   
>   	count = pr->power.count - 1;
> -	if (count <= 0)
> +	if (count < 0)
>   		return -ENODEV;
>   
>   	for (i = 0; i < count; i++) {

This count already verified in acpi_processor_get_lpi_info.

I suggest modifing it as below:

-->

git diff
diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c
index 801f9c450142..c68a5db8ebba 100644
--- a/drivers/acpi/arm64/cpuidle.c
+++ b/drivers/acpi/arm64/cpuidle.c
@@ -16,7 +16,7 @@

  static int psci_acpi_cpu_init_idle(unsigned int cpu)
  {
-       int i, count;
+       int i;
         struct acpi_lpi_state *lpi;
         struct acpi_processor *pr = per_cpu(processors, cpu);

@@ -30,14 +30,10 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
         if (!psci_ops.cpu_suspend)
                 return -EOPNOTSUPP;

-       count = pr->power.count - 1;
-       if (count <= 0)
-               return -ENODEV;
-
-       for (i = 0; i < count; i++) {
+       for (i = 1; i < pr->power.count; i++) {
                 u32 state;

-               lpi = &pr->power.lpi_states[i + 1];
+               lpi = &pr->power.lpi_states[i];
                 /*
                  * Only bits[31:0] represent a PSCI power_state while
                  * bits[63:32] must be 0x0 as per ARM ACPI FFH 
Specification



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV
  2026-04-14 11:31     ` lihuisong (C)
@ 2026-04-14 12:05       ` Breno Leitao
  2026-04-14 12:25       ` Sudeep Holla
  1 sibling, 0 replies; 12+ messages in thread
From: Breno Leitao @ 2026-04-14 12:05 UTC (permalink / raw)
  To: lihuisong (C)
  Cc: Rafael J. Wysocki, Len Brown, lpieralisi, catalin.marinas, will,
	Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski,
	guohanjun, sudeep.holla, linux-arm-kernel, rmikey, kernel-team

On Tue, Apr 14, 2026 at 07:31:29PM +0800, lihuisong (C) wrote:
> 
> On 4/14/2026 6:21 PM, Breno Leitao wrote:
> > Hello Huisong,
> > 
> > On Tue, Apr 14, 2026 at 05:43:51PM +0800, lihuisong (C) wrote:
> > > But it is a real issue. Thanks for your report.
> > > I think the best way to fix your issue is that remove this verification in
> > > psci_acpi_cpu_init_idle().
> > > Because it is legal for platform to report one LPI state.
> > > This function just needs to verify the LPI states which are FFH.
> > Thank you for the prompt feedback.
> > 
> > Would this approach work?
> > 
> > commit 6c9d52840a4f778cc989838ba76ee51416e85de3
> > Author: Breno Leitao <leitao@debian.org>
> > Date:   Tue Apr 14 03:16:08 2026 -0700
> > 
> >      ACPI: processor: idle: Allow platforms with only one LPI state
> >      psci_acpi_cpu_init_idle() rejects platforms where power.count - 1 <= 0
> >      by returning -ENODEV. However, having a single LPI state (WFI) is a
> >      valid configuration. The function's purpose is to verify FFH idle states,
> >      and when count is zero, there are simply no FFH states to validate —
> >      this is not an error.
> >      On NVIDIA Grace (aarch64) systems with PSCIv1.1, power.count is 1 for
> >      all 72 CPUs, so the probe fails with -ENODEV. After commit cac173bea57d
> >      ("ACPI: processor: idle: Rework the handling of
> >      acpi_processor_ffh_lpi_probe()"), this failure propagates up and prevents
> >      cpuidle registration entirely.
> >      Change the check from (count <= 0) to (count < 0) so that platforms
> >      with only WFI are accepted. The for loop naturally handles count == 0
> >      by not iterating.
> >      Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()")
> >      Signed-off-by: Breno Leitao <leitao@debian.org>
> > 
> > diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c
> > index 801f9c4501425..7791b751042ce 100644
> > --- a/drivers/acpi/arm64/cpuidle.c
> > +++ b/drivers/acpi/arm64/cpuidle.c
> > @@ -31,7 +31,7 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
> >   		return -EOPNOTSUPP;
> >   	count = pr->power.count - 1;
> > -	if (count <= 0)
> > +	if (count < 0)
> >   		return -ENODEV;
> >   	for (i = 0; i < count; i++) {
> 
> This count already verified in acpi_processor_get_lpi_info.
> 
> I suggest modifing it as below:
> 
> -->
> 
> git diff
> diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c
> index 801f9c450142..c68a5db8ebba 100644
> --- a/drivers/acpi/arm64/cpuidle.c
> +++ b/drivers/acpi/arm64/cpuidle.c
> @@ -16,7 +16,7 @@
> 
>  static int psci_acpi_cpu_init_idle(unsigned int cpu)
>  {
> -       int i, count;
> +       int i;
>         struct acpi_lpi_state *lpi;
>         struct acpi_processor *pr = per_cpu(processors, cpu);
> 
> @@ -30,14 +30,10 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
>         if (!psci_ops.cpu_suspend)
>                 return -EOPNOTSUPP;
> 
> -       count = pr->power.count - 1;
> -       if (count <= 0)
> -               return -ENODEV;
> -
> -       for (i = 0; i < count; i++) {
> +       for (i = 1; i < pr->power.count; i++) {
>                 u32 state;
> 
> -               lpi = &pr->power.lpi_states[i + 1];
> +               lpi = &pr->power.lpi_states[i];
>                 /*
>                  * Only bits[31:0] represent a PSCI power_state while
>                  * bits[63:32] must be 0x0 as per ARM ACPI FFH Specification

Ack, I will respin.

Thanks for your help,
--breno


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV
  2026-04-14 11:31     ` lihuisong (C)
  2026-04-14 12:05       ` Breno Leitao
@ 2026-04-14 12:25       ` Sudeep Holla
  2026-04-14 13:14         ` Breno Leitao
  2026-04-15  1:32         ` lihuisong (C)
  1 sibling, 2 replies; 12+ messages in thread
From: Sudeep Holla @ 2026-04-14 12:25 UTC (permalink / raw)
  To: lihuisong (C)
  Cc: Breno Leitao, Sudeep Holla, Rafael J. Wysocki, Len Brown,
	lpieralisi, catalin.marinas, will, Rafael J. Wysocki, linux-acpi,
	linux-kernel, pjaroszynski, guohanjun, linux-arm-kernel, rmikey,
	kernel-team

On Tue, Apr 14, 2026 at 07:31:29PM +0800, lihuisong (C) wrote:
> 
> On 4/14/2026 6:21 PM, Breno Leitao wrote:
> > Hello Huisong,
> > 
> > On Tue, Apr 14, 2026 at 05:43:51PM +0800, lihuisong (C) wrote:
> > > But it is a real issue. Thanks for your report.
> > > I think the best way to fix your issue is that remove this verification in
> > > psci_acpi_cpu_init_idle().
> > > Because it is legal for platform to report one LPI state.
> > > This function just needs to verify the LPI states which are FFH.
> > Thank you for the prompt feedback.
> > 
> > Would this approach work?
> > 
> > commit 6c9d52840a4f778cc989838ba76ee51416e85de3
> > Author: Breno Leitao <leitao@debian.org>
> > Date:   Tue Apr 14 03:16:08 2026 -0700
> > 
> >      ACPI: processor: idle: Allow platforms with only one LPI state
> >      psci_acpi_cpu_init_idle() rejects platforms where power.count - 1 <= 0
> >      by returning -ENODEV. However, having a single LPI state (WFI) is a
> >      valid configuration. The function's purpose is to verify FFH idle states,
> >      and when count is zero, there are simply no FFH states to validate —
> >      this is not an error.
> >      On NVIDIA Grace (aarch64) systems with PSCIv1.1, power.count is 1 for
> >      all 72 CPUs, so the probe fails with -ENODEV. After commit cac173bea57d
> >      ("ACPI: processor: idle: Rework the handling of
> >      acpi_processor_ffh_lpi_probe()"), this failure propagates up and prevents
> >      cpuidle registration entirely.
> >      Change the check from (count <= 0) to (count < 0) so that platforms
> >      with only WFI are accepted. The for loop naturally handles count == 0
> >      by not iterating.
> >      Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()")
> >      Signed-off-by: Breno Leitao <leitao@debian.org>
> > 
> > diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c
> > index 801f9c4501425..7791b751042ce 100644
> > --- a/drivers/acpi/arm64/cpuidle.c
> > +++ b/drivers/acpi/arm64/cpuidle.c
> > @@ -31,7 +31,7 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
> >   		return -EOPNOTSUPP;
> >   	count = pr->power.count - 1;
> > -	if (count <= 0)
> > +	if (count < 0)
> >   		return -ENODEV;
> >   	for (i = 0; i < count; i++) {
> 
> This count already verified in acpi_processor_get_lpi_info.
> 
> I suggest modifing it as below:
> 
> -->
> 
> git diff
> diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c
> index 801f9c450142..c68a5db8ebba 100644
> --- a/drivers/acpi/arm64/cpuidle.c
> +++ b/drivers/acpi/arm64/cpuidle.c
> @@ -16,7 +16,7 @@
> 
>  static int psci_acpi_cpu_init_idle(unsigned int cpu)
>  {
> -       int i, count;
> +       int i;
>         struct acpi_lpi_state *lpi;
>         struct acpi_processor *pr = per_cpu(processors, cpu);
> 
> @@ -30,14 +30,10 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
>         if (!psci_ops.cpu_suspend)
>                 return -EOPNOTSUPP;
> 
> -       count = pr->power.count - 1;
> -       if (count <= 0)
> -               return -ENODEV;
> -

It was intentionally designed this way, as there is little value in defining
only WFI in the _LPI tables. In the absence of a cpuidle driver/LPI entry,
arch_cpu_idle() is invoked, which is sufficient and avoids unnecessary
complexity, only to ultimately execute wfi() anyway.

So while I understand that the kernel did not report an error previously, that
does not mean the _LPI table is merely moot on this platform when it contains
only a WFI state.


-- 
Regards,
Sudeep


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV
  2026-04-14 12:25       ` Sudeep Holla
@ 2026-04-14 13:14         ` Breno Leitao
  2026-04-14 14:10           ` Sudeep Holla
  2026-04-15  1:32         ` lihuisong (C)
  1 sibling, 1 reply; 12+ messages in thread
From: Breno Leitao @ 2026-04-14 13:14 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: lihuisong (C), Rafael J. Wysocki, Len Brown, lpieralisi,
	catalin.marinas, will, Rafael J. Wysocki, linux-acpi,
	linux-kernel, pjaroszynski, guohanjun, linux-arm-kernel, rmikey,
	kernel-team

Hello Sudeep,

On Tue, Apr 14, 2026 at 01:25:53PM +0100, Sudeep Holla wrote:
> So while I understand that the kernel did not report an error previously, that
> does not mean the _LPI table is merely moot on this platform when it contains
> only a WFI state.

Can you clarify whether datacenter ARM systems are expected to expose
deeper idle states beyond WFI in their _LPI tables?

Backing up, I'm observing 72 pr_err() messages during boot on these
hosts and trying to determine whether this indicates a firmware issue or
if the kernel needs adjustment.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV
  2026-04-14 13:14         ` Breno Leitao
@ 2026-04-14 14:10           ` Sudeep Holla
  2026-04-14 16:31             ` Breno Leitao
  0 siblings, 1 reply; 12+ messages in thread
From: Sudeep Holla @ 2026-04-14 14:10 UTC (permalink / raw)
  To: Breno Leitao
  Cc: lihuisong (C), Rafael J. Wysocki, Sudeep Holla, Len Brown,
	lpieralisi, catalin.marinas, will, Rafael J. Wysocki, linux-acpi,
	linux-kernel, pjaroszynski, guohanjun, linux-arm-kernel, rmikey,
	kernel-team

On Tue, Apr 14, 2026 at 06:14:19AM -0700, Breno Leitao wrote:
> Hello Sudeep,
> 
> On Tue, Apr 14, 2026 at 01:25:53PM +0100, Sudeep Holla wrote:
> > So while I understand that the kernel did not report an error previously, that
> > does not mean the _LPI table is merely moot on this platform when it contains
> > only a WFI state.
> 
> Can you clarify whether datacenter ARM systems are expected to expose
> deeper idle states beyond WFI in their _LPI tables?
> 

Of course any system that prefers to save power when its idling must have
these _LPI deeper idle states.

> Backing up, I'm observing 72 pr_err() messages during boot on these
> hosts and trying to determine whether this indicates a firmware issue or
> if the kernel needs adjustment.

I consider this a firmware issue, but not a fatal one. What matters more is
the behavior after those errors are reported.

If you force success, either through your change or through the PSCI approach
suggested by lihuisong, then in practice you are only enabling a cpuidle
driver with a single usable state: WFI. That is not inherently wrong, but it
also does not provide much benefit.

When the idle task needs to enter an idle state, it will ask the cpuidle
framework to choose the appropriate state. In this case, however, the
framework has no real decision to make, because the only available state is
WFI. If that is all the platform can support, then the same result can be
achieved more efficiently without registering the cpuidle driver at all, by
falling back to `arch_cpu_idle()`, as I mentioned earlier.

-- 
Regards,
Sudeep


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV
  2026-04-14 14:10           ` Sudeep Holla
@ 2026-04-14 16:31             ` Breno Leitao
  2026-04-15 10:45               ` Sudeep Holla
  0 siblings, 1 reply; 12+ messages in thread
From: Breno Leitao @ 2026-04-14 16:31 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: lihuisong (C), Rafael J. Wysocki, Len Brown, lpieralisi,
	catalin.marinas, will, Rafael J. Wysocki, linux-acpi,
	linux-kernel, pjaroszynski, guohanjun, linux-arm-kernel, rmikey,
	kernel-team

Hello Sudeep,

On Tue, Apr 14, 2026 at 03:10:03PM +0100, Sudeep Holla wrote:
> On Tue, Apr 14, 2026 at 06:14:19AM -0700, Breno Leitao wrote:
> > Hello Sudeep,
> >
> > On Tue, Apr 14, 2026 at 01:25:53PM +0100, Sudeep Holla wrote:
> > > So while I understand that the kernel did not report an error previously, that
> > > does not mean the _LPI table is merely moot on this platform when it contains
> > > only a WFI state.
> >
> > Can you clarify whether datacenter ARM systems are expected to expose
> > deeper idle states beyond WFI in their _LPI tables?
> >
>
> Of course any system that prefers to save power when its idling must have
> these _LPI deeper idle states.
>
> > Backing up, I'm observing 72 pr_err() messages during boot on these
> > hosts and trying to determine whether this indicates a firmware issue or
> > if the kernel needs adjustment.
>
> I consider this a firmware issue, but not a fatal one. What matters more is
> the behavior after those errors are reported.

I understand. While I'm not a hardware or firmware vendor myself, I can
see how they might consider power management features _optional_ for certain
server configurations.

> If you force success, either through your change or through the PSCI approach
> suggested by lihuisong, then in practice you are only enabling a cpuidle
> driver with a single usable state: WFI. That is not inherently wrong, but it
> also does not provide much benefit.

Given that this isn't a critical error, would it make sense to downgrade
the pr_err() to pr_debug()? is it a reasonable compromise. I just want
to avoid these pr_err() all accross the board, affecting kernel health
metrics in large fleets.

My proposal:

commit c98007f9e10fe229672d29c3844c96705cecaed5
Author: Breno Leitao <leitao@debian.org>
Date:   Tue Apr 14 05:28:28 2026 -0700

    ACPI: processor: idle: Downgrade FFH LPI probe failure message to pr_debug()
    
    The "Invalid FFH LPI data" message is printed at pr_err() level for every
    CPU when acpi_processor_ffh_lpi_probe() fails. On platforms where the FFH
    probe legitimately returns an error (e.g., no deep idle states beyond
    WFI), this floods the kernel log with per-CPU error messages that are not
    actionable.
    
    Downgrade to pr_debug() since this is a diagnostic message, not a
    critical error.
    
    Signed-off-by: Breno Leitao <leitao@debian.org>

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index ee5facccbe10c..ab93a2c10a9ad 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -1259,7 +1259,7 @@ static int acpi_processor_get_power_info(struct acpi_processor *pr)
 	if (pr->flags.has_lpi) {
 		ret = acpi_processor_ffh_lpi_probe(pr->id);
 		if (ret)
-			pr_err("CPU%u: Invalid FFH LPI data\n", pr->id);
+			pr_debug("CPU%u: Invalid FFH LPI data\n", pr->id);
 	}
 
 	return ret;




^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV
  2026-04-14 12:25       ` Sudeep Holla
  2026-04-14 13:14         ` Breno Leitao
@ 2026-04-15  1:32         ` lihuisong (C)
  2026-04-15 14:03           ` Rafael J. Wysocki
  1 sibling, 1 reply; 12+ messages in thread
From: lihuisong (C) @ 2026-04-15  1:32 UTC (permalink / raw)
  To: Sudeep Holla, Rafael J. Wysocki
  Cc: Breno Leitao, Len Brown, lpieralisi, catalin.marinas, will,
	Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski,
	guohanjun, linux-arm-kernel, rmikey, kernel-team, lihuisong


On 4/14/2026 8:25 PM, Sudeep Holla wrote:
> On Tue, Apr 14, 2026 at 07:31:29PM +0800, lihuisong (C) wrote:
>> On 4/14/2026 6:21 PM, Breno Leitao wrote:
>>> Hello Huisong,
>>>
>>> On Tue, Apr 14, 2026 at 05:43:51PM +0800, lihuisong (C) wrote:
>>>> But it is a real issue. Thanks for your report.
>>>> I think the best way to fix your issue is that remove this verification in
>>>> psci_acpi_cpu_init_idle().
>>>> Because it is legal for platform to report one LPI state.
>>>> This function just needs to verify the LPI states which are FFH.
>>> Thank you for the prompt feedback.
>>>
>>> Would this approach work?
>>>
>>> commit 6c9d52840a4f778cc989838ba76ee51416e85de3
>>> Author: Breno Leitao <leitao@debian.org>
>>> Date:   Tue Apr 14 03:16:08 2026 -0700
>>>
>>>       ACPI: processor: idle: Allow platforms with only one LPI state
>>>       psci_acpi_cpu_init_idle() rejects platforms where power.count - 1 <= 0
>>>       by returning -ENODEV. However, having a single LPI state (WFI) is a
>>>       valid configuration. The function's purpose is to verify FFH idle states,
>>>       and when count is zero, there are simply no FFH states to validate —
>>>       this is not an error.
>>>       On NVIDIA Grace (aarch64) systems with PSCIv1.1, power.count is 1 for
>>>       all 72 CPUs, so the probe fails with -ENODEV. After commit cac173bea57d
>>>       ("ACPI: processor: idle: Rework the handling of
>>>       acpi_processor_ffh_lpi_probe()"), this failure propagates up and prevents
>>>       cpuidle registration entirely.
>>>       Change the check from (count <= 0) to (count < 0) so that platforms
>>>       with only WFI are accepted. The for loop naturally handles count == 0
>>>       by not iterating.
>>>       Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()")
>>>       Signed-off-by: Breno Leitao <leitao@debian.org>
>>>
>>> diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c
>>> index 801f9c4501425..7791b751042ce 100644
>>> --- a/drivers/acpi/arm64/cpuidle.c
>>> +++ b/drivers/acpi/arm64/cpuidle.c
>>> @@ -31,7 +31,7 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
>>>    		return -EOPNOTSUPP;
>>>    	count = pr->power.count - 1;
>>> -	if (count <= 0)
>>> +	if (count < 0)
>>>    		return -ENODEV;
>>>    	for (i = 0; i < count; i++) {
>> This count already verified in acpi_processor_get_lpi_info.
>>
>> I suggest modifing it as below:
>>
>> -->
>>
>> git diff
>> diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c
>> index 801f9c450142..c68a5db8ebba 100644
>> --- a/drivers/acpi/arm64/cpuidle.c
>> +++ b/drivers/acpi/arm64/cpuidle.c
>> @@ -16,7 +16,7 @@
>>
>>   static int psci_acpi_cpu_init_idle(unsigned int cpu)
>>   {
>> -       int i, count;
>> +       int i;
>>          struct acpi_lpi_state *lpi;
>>          struct acpi_processor *pr = per_cpu(processors, cpu);
>>
>> @@ -30,14 +30,10 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
>>          if (!psci_ops.cpu_suspend)
>>                  return -EOPNOTSUPP;
>>
>> -       count = pr->power.count - 1;
>> -       if (count <= 0)
>> -               return -ENODEV;
>> -
> It was intentionally designed this way, as there is little value in defining
> only WFI in the _LPI tables. In the absence of a cpuidle driver/LPI entry,
> arch_cpu_idle() is invoked, which is sufficient and avoids unnecessary
> complexity, only to ultimately execute wfi() anyway.
Yeah, it's correct. The code flow will be more simple and high-efficiency.
This looks good to me.


But cpuidle sysfs under per CPU is created when firmware just reports 
WFI state before
my commit cac173bea57d ("ACPI: processor: idle: Rework the handling of 
acpi_processor_ffh_lpi_probe()").
However, these platforms will no longer be created now and some 
statistics for state0 are also missing.
This change in behavor is visiable to user space.I'm not sure if it is 
acceptable.
What do you think, Rafael?
> So while I understand that the kernel did not report an error previously, that
> does not mean the _LPI table is merely moot on this platform when it contains
> only a WFI state.
>
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV
  2026-04-14 16:31             ` Breno Leitao
@ 2026-04-15 10:45               ` Sudeep Holla
  0 siblings, 0 replies; 12+ messages in thread
From: Sudeep Holla @ 2026-04-15 10:45 UTC (permalink / raw)
  To: Breno Leitao
  Cc: lihuisong (C), Rafael J. Wysocki, Sudeep Holla, Len Brown,
	lpieralisi, catalin.marinas, will, Rafael J. Wysocki, linux-acpi,
	linux-kernel, pjaroszynski, guohanjun, linux-arm-kernel, rmikey,
	kernel-team

On Tue, Apr 14, 2026 at 09:31:33AM -0700, Breno Leitao wrote:
> Hello Sudeep,
> 
> On Tue, Apr 14, 2026 at 03:10:03PM +0100, Sudeep Holla wrote:
> > On Tue, Apr 14, 2026 at 06:14:19AM -0700, Breno Leitao wrote:
> > > Hello Sudeep,
> > >
> > > On Tue, Apr 14, 2026 at 01:25:53PM +0100, Sudeep Holla wrote:
> > > > So while I understand that the kernel did not report an error previously, that
> > > > does not mean the _LPI table is merely moot on this platform when it contains
> > > > only a WFI state.
> > >
> > > Can you clarify whether datacenter ARM systems are expected to expose
> > > deeper idle states beyond WFI in their _LPI tables?
> > >
> >
> > Of course any system that prefers to save power when its idling must have
> > these _LPI deeper idle states.
> >
> > > Backing up, I'm observing 72 pr_err() messages during boot on these
> > > hosts and trying to determine whether this indicates a firmware issue or
> > > if the kernel needs adjustment.
> >
> > I consider this a firmware issue, but not a fatal one. What matters more is
> > the behavior after those errors are reported.
> 
> I understand. While I'm not a hardware or firmware vendor myself, I can
> see how they might consider power management features _optional_ for certain
> server configurations.
> 
> > If you force success, either through your change or through the PSCI approach
> > suggested by lihuisong, then in practice you are only enabling a cpuidle
> > driver with a single usable state: WFI. That is not inherently wrong, but it
> > also does not provide much benefit.
> 
> Given that this isn't a critical error, would it make sense to downgrade
> the pr_err() to pr_debug()? is it a reasonable compromise. I just want
> to avoid these pr_err() all accross the board, affecting kernel health
> metrics in large fleets.
> 
> My proposal:
> 
> commit c98007f9e10fe229672d29c3844c96705cecaed5
> Author: Breno Leitao <leitao@debian.org>
> Date:   Tue Apr 14 05:28:28 2026 -0700
> 
>     ACPI: processor: idle: Downgrade FFH LPI probe failure message to pr_debug()
>     
>     The "Invalid FFH LPI data" message is printed at pr_err() level for every
>     CPU when acpi_processor_ffh_lpi_probe() fails. On platforms where the FFH
>     probe legitimately returns an error (e.g., no deep idle states beyond
>     WFI), this floods the kernel log with per-CPU error messages that are not
>     actionable.
>     
>     Downgrade to pr_debug() since this is a diagnostic message, not a
>     critical error.
>     
>     Signed-off-by: Breno Leitao <leitao@debian.org>
> 
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index ee5facccbe10c..ab93a2c10a9ad 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -1259,7 +1259,7 @@ static int acpi_processor_get_power_info(struct acpi_processor *pr)
>  	if (pr->flags.has_lpi) {
>  		ret = acpi_processor_ffh_lpi_probe(pr->id);
>  		if (ret)
> -			pr_err("CPU%u: Invalid FFH LPI data\n", pr->id);
> +			pr_debug("CPU%u: Invalid FFH LPI data\n", pr->id);

I am afraid if this might mask the real _LPI table entry errors.
Need to think, I am still of the opinion that we need warn on that system
as _LPI with just WFI is simply useless.

-- 
Regards,
Sudeep


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV
  2026-04-15  1:32         ` lihuisong (C)
@ 2026-04-15 14:03           ` Rafael J. Wysocki
  0 siblings, 0 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2026-04-15 14:03 UTC (permalink / raw)
  To: lihuisong (C)
  Cc: Sudeep Holla, Rafael J. Wysocki, Breno Leitao, Len Brown,
	lpieralisi, catalin.marinas, will, Rafael J. Wysocki, linux-acpi,
	linux-kernel, pjaroszynski, guohanjun, linux-arm-kernel, rmikey,
	kernel-team

On Wed, Apr 15, 2026 at 3:32 AM lihuisong (C) <lihuisong@huawei.com> wrote:
>
>
> On 4/14/2026 8:25 PM, Sudeep Holla wrote:
> > On Tue, Apr 14, 2026 at 07:31:29PM +0800, lihuisong (C) wrote:
> >> On 4/14/2026 6:21 PM, Breno Leitao wrote:
> >>> Hello Huisong,
> >>>
> >>> On Tue, Apr 14, 2026 at 05:43:51PM +0800, lihuisong (C) wrote:
> >>>> But it is a real issue. Thanks for your report.
> >>>> I think the best way to fix your issue is that remove this verification in
> >>>> psci_acpi_cpu_init_idle().
> >>>> Because it is legal for platform to report one LPI state.
> >>>> This function just needs to verify the LPI states which are FFH.
> >>> Thank you for the prompt feedback.
> >>>
> >>> Would this approach work?
> >>>
> >>> commit 6c9d52840a4f778cc989838ba76ee51416e85de3
> >>> Author: Breno Leitao <leitao@debian.org>
> >>> Date:   Tue Apr 14 03:16:08 2026 -0700
> >>>
> >>>       ACPI: processor: idle: Allow platforms with only one LPI state
> >>>       psci_acpi_cpu_init_idle() rejects platforms where power.count - 1 <= 0
> >>>       by returning -ENODEV. However, having a single LPI state (WFI) is a
> >>>       valid configuration. The function's purpose is to verify FFH idle states,
> >>>       and when count is zero, there are simply no FFH states to validate —
> >>>       this is not an error.
> >>>       On NVIDIA Grace (aarch64) systems with PSCIv1.1, power.count is 1 for
> >>>       all 72 CPUs, so the probe fails with -ENODEV. After commit cac173bea57d
> >>>       ("ACPI: processor: idle: Rework the handling of
> >>>       acpi_processor_ffh_lpi_probe()"), this failure propagates up and prevents
> >>>       cpuidle registration entirely.
> >>>       Change the check from (count <= 0) to (count < 0) so that platforms
> >>>       with only WFI are accepted. The for loop naturally handles count == 0
> >>>       by not iterating.
> >>>       Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()")
> >>>       Signed-off-by: Breno Leitao <leitao@debian.org>
> >>>
> >>> diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c
> >>> index 801f9c4501425..7791b751042ce 100644
> >>> --- a/drivers/acpi/arm64/cpuidle.c
> >>> +++ b/drivers/acpi/arm64/cpuidle.c
> >>> @@ -31,7 +31,7 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
> >>>             return -EOPNOTSUPP;
> >>>     count = pr->power.count - 1;
> >>> -   if (count <= 0)
> >>> +   if (count < 0)
> >>>             return -ENODEV;
> >>>     for (i = 0; i < count; i++) {
> >> This count already verified in acpi_processor_get_lpi_info.
> >>
> >> I suggest modifing it as below:
> >>
> >> -->
> >>
> >> git diff
> >> diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c
> >> index 801f9c450142..c68a5db8ebba 100644
> >> --- a/drivers/acpi/arm64/cpuidle.c
> >> +++ b/drivers/acpi/arm64/cpuidle.c
> >> @@ -16,7 +16,7 @@
> >>
> >>   static int psci_acpi_cpu_init_idle(unsigned int cpu)
> >>   {
> >> -       int i, count;
> >> +       int i;
> >>          struct acpi_lpi_state *lpi;
> >>          struct acpi_processor *pr = per_cpu(processors, cpu);
> >>
> >> @@ -30,14 +30,10 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
> >>          if (!psci_ops.cpu_suspend)
> >>                  return -EOPNOTSUPP;
> >>
> >> -       count = pr->power.count - 1;
> >> -       if (count <= 0)
> >> -               return -ENODEV;
> >> -
> > It was intentionally designed this way, as there is little value in defining
> > only WFI in the _LPI tables. In the absence of a cpuidle driver/LPI entry,
> > arch_cpu_idle() is invoked, which is sufficient and avoids unnecessary
> > complexity, only to ultimately execute wfi() anyway.
> Yeah, it's correct. The code flow will be more simple and high-efficiency.
> This looks good to me.
>
>
> But cpuidle sysfs under per CPU is created when firmware just reports
> WFI state before
> my commit cac173bea57d ("ACPI: processor: idle: Rework the handling of
> acpi_processor_ffh_lpi_probe()").
> However, these platforms will no longer be created now and some
> statistics for state0 are also missing.
> This change in behavor is visiable to user space.I'm not sure if it is
> acceptable.
> What do you think, Rafael?

I think that it would be good to restore the previous behavior,
especially if it has been changed inadvertently.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-04-15 14:03 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-13 16:54 [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV Breno Leitao
2026-04-14  9:43 ` lihuisong (C)
2026-04-14 10:21   ` Breno Leitao
2026-04-14 11:31     ` lihuisong (C)
2026-04-14 12:05       ` Breno Leitao
2026-04-14 12:25       ` Sudeep Holla
2026-04-14 13:14         ` Breno Leitao
2026-04-14 14:10           ` Sudeep Holla
2026-04-14 16:31             ` Breno Leitao
2026-04-15 10:45               ` Sudeep Holla
2026-04-15  1:32         ` lihuisong (C)
2026-04-15 14:03           ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox