* [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV
@ 2026-04-13 16:54 Breno Leitao
2026-04-14 9:43 ` lihuisong (C)
0 siblings, 1 reply; 12+ messages in thread
From: Breno Leitao @ 2026-04-13 16:54 UTC (permalink / raw)
To: Rafael J. Wysocki, Len Brown, Huisong Li, lpieralisi,
catalin.marinas, will
Cc: Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski,
guohanjun, sudeep.holla, linux-arm-kernel, rmikey, kernel-team,
Breno Leitao
Commit cac173bea57d ("ACPI: processor: idle: Rework the handling of
acpi_processor_ffh_lpi_probe()") moved the acpi_processor_ffh_lpi_probe()
call from acpi_processor_setup_cpuidle_dev(), where its return value was
ignored, to acpi_processor_get_power_info(), where it is treated as a
hard failure. This causes cpuidle setup to fail entirely for all CPUs on
platforms where the FFH LPI probe returns an error.
On NVIDIA Grace (aarch64) systems with PSCIv1.1, the probe fails for all
72 CPUs with -ENODEV because psci_acpi_cpu_init_idle() finds
power.count - 1 <= 0 (power.count=1). This results in no cpuidle states
registered for any CPU, forcing them to busy-poll when idle instead of
entering low-power states.
The -ENODEV error simply means no deep PSCI idle states are available
beyond WFI, which is a normal condition. Do not propagate -ENODEV and
downgrade its message to pr_debug, while still propagating other errors
that may indicate real problems.
Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()")
Signed-off-by: Breno Leitao <leitao@debian.org>
---
drivers/acpi/processor_idle.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index ee5facccbe10c..7b6f7730ec63d 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -1258,8 +1258,13 @@ static int acpi_processor_get_power_info(struct acpi_processor *pr)
if (pr->flags.has_lpi) {
ret = acpi_processor_ffh_lpi_probe(pr->id);
- if (ret)
+ if (ret == -ENODEV) {
+ pr_debug("CPU%u: FFH LPI probe failed, err = %d, power.count = %d\n",
+ pr->id, ret, pr->power.count);
+ ret = 0;
+ } else if (ret) {
pr_err("CPU%u: Invalid FFH LPI data\n", pr->id);
+ }
}
return ret;
---
base-commit: 66672af7a095d89f082c5327f3b15bc2f93d558e
change-id: 20260413-ffh-93f68b2f46a3
Best regards,
--
Breno Leitao <leitao@debian.org>
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV 2026-04-13 16:54 [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV Breno Leitao @ 2026-04-14 9:43 ` lihuisong (C) 2026-04-14 10:21 ` Breno Leitao 0 siblings, 1 reply; 12+ messages in thread From: lihuisong (C) @ 2026-04-14 9:43 UTC (permalink / raw) To: Breno Leitao, Rafael J. Wysocki, Len Brown, lpieralisi, catalin.marinas, will Cc: Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski, guohanjun, sudeep.holla, linux-arm-kernel, rmikey, kernel-team, lihuisong On 4/14/2026 12:54 AM, Breno Leitao wrote: > Commit cac173bea57d ("ACPI: processor: idle: Rework the handling of > acpi_processor_ffh_lpi_probe()") moved the acpi_processor_ffh_lpi_probe() > call from acpi_processor_setup_cpuidle_dev(), where its return value was > ignored, to acpi_processor_get_power_info(), where it is treated as a > hard failure. This causes cpuidle setup to fail entirely for all CPUs on > platforms where the FFH LPI probe returns an error. > > On NVIDIA Grace (aarch64) systems with PSCIv1.1, the probe fails for all > 72 CPUs with -ENODEV because psci_acpi_cpu_init_idle() finds > power.count - 1 <= 0 (power.count=1). This results in no cpuidle states > registered for any CPU, forcing them to busy-poll when idle instead of > entering low-power states. > > Sorry for bring you cpuidle issues on your platform. IIUC, your platfom just has one LPI states on failure, it should be WFI, right? This failure will only cause the cpuidle directory for each CPU not to be created, but will not affect the CPU entering the WFI state which done by cpuidle core. But it is a real issue. Thanks for your report. I think the best way to fix your issue is that remove this verification in psci_acpi_cpu_init_idle(). Because it is legal for platform to report one LPI state. This function just needs to verify the LPI states which are FFH. /Huisong > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV 2026-04-14 9:43 ` lihuisong (C) @ 2026-04-14 10:21 ` Breno Leitao 2026-04-14 11:31 ` lihuisong (C) 0 siblings, 1 reply; 12+ messages in thread From: Breno Leitao @ 2026-04-14 10:21 UTC (permalink / raw) To: lihuisong (C) Cc: Rafael J. Wysocki, Len Brown, lpieralisi, catalin.marinas, will, Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski, guohanjun, sudeep.holla, linux-arm-kernel, rmikey, kernel-team Hello Huisong, On Tue, Apr 14, 2026 at 05:43:51PM +0800, lihuisong (C) wrote: > But it is a real issue. Thanks for your report. > I think the best way to fix your issue is that remove this verification in > psci_acpi_cpu_init_idle(). > Because it is legal for platform to report one LPI state. > This function just needs to verify the LPI states which are FFH. Thank you for the prompt feedback. Would this approach work? commit 6c9d52840a4f778cc989838ba76ee51416e85de3 Author: Breno Leitao <leitao@debian.org> Date: Tue Apr 14 03:16:08 2026 -0700 ACPI: processor: idle: Allow platforms with only one LPI state psci_acpi_cpu_init_idle() rejects platforms where power.count - 1 <= 0 by returning -ENODEV. However, having a single LPI state (WFI) is a valid configuration. The function's purpose is to verify FFH idle states, and when count is zero, there are simply no FFH states to validate — this is not an error. On NVIDIA Grace (aarch64) systems with PSCIv1.1, power.count is 1 for all 72 CPUs, so the probe fails with -ENODEV. After commit cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()"), this failure propagates up and prevents cpuidle registration entirely. Change the check from (count <= 0) to (count < 0) so that platforms with only WFI are accepted. The for loop naturally handles count == 0 by not iterating. Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()") Signed-off-by: Breno Leitao <leitao@debian.org> diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c index 801f9c4501425..7791b751042ce 100644 --- a/drivers/acpi/arm64/cpuidle.c +++ b/drivers/acpi/arm64/cpuidle.c @@ -31,7 +31,7 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu) return -EOPNOTSUPP; count = pr->power.count - 1; - if (count <= 0) + if (count < 0) return -ENODEV; for (i = 0; i < count; i++) { ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV 2026-04-14 10:21 ` Breno Leitao @ 2026-04-14 11:31 ` lihuisong (C) 2026-04-14 12:05 ` Breno Leitao 2026-04-14 12:25 ` Sudeep Holla 0 siblings, 2 replies; 12+ messages in thread From: lihuisong (C) @ 2026-04-14 11:31 UTC (permalink / raw) To: Breno Leitao Cc: Rafael J. Wysocki, Len Brown, lpieralisi, catalin.marinas, will, Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski, guohanjun, sudeep.holla, linux-arm-kernel, rmikey, kernel-team On 4/14/2026 6:21 PM, Breno Leitao wrote: > Hello Huisong, > > On Tue, Apr 14, 2026 at 05:43:51PM +0800, lihuisong (C) wrote: >> But it is a real issue. Thanks for your report. >> I think the best way to fix your issue is that remove this verification in >> psci_acpi_cpu_init_idle(). >> Because it is legal for platform to report one LPI state. >> This function just needs to verify the LPI states which are FFH. > Thank you for the prompt feedback. > > Would this approach work? > > commit 6c9d52840a4f778cc989838ba76ee51416e85de3 > Author: Breno Leitao <leitao@debian.org> > Date: Tue Apr 14 03:16:08 2026 -0700 > > ACPI: processor: idle: Allow platforms with only one LPI state > > psci_acpi_cpu_init_idle() rejects platforms where power.count - 1 <= 0 > by returning -ENODEV. However, having a single LPI state (WFI) is a > valid configuration. The function's purpose is to verify FFH idle states, > and when count is zero, there are simply no FFH states to validate — > this is not an error. > > On NVIDIA Grace (aarch64) systems with PSCIv1.1, power.count is 1 for > all 72 CPUs, so the probe fails with -ENODEV. After commit cac173bea57d > ("ACPI: processor: idle: Rework the handling of > acpi_processor_ffh_lpi_probe()"), this failure propagates up and prevents > cpuidle registration entirely. > > Change the check from (count <= 0) to (count < 0) so that platforms > with only WFI are accepted. The for loop naturally handles count == 0 > by not iterating. > > Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()") > Signed-off-by: Breno Leitao <leitao@debian.org> > > diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c > index 801f9c4501425..7791b751042ce 100644 > --- a/drivers/acpi/arm64/cpuidle.c > +++ b/drivers/acpi/arm64/cpuidle.c > @@ -31,7 +31,7 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu) > return -EOPNOTSUPP; > > count = pr->power.count - 1; > - if (count <= 0) > + if (count < 0) > return -ENODEV; > > for (i = 0; i < count; i++) { This count already verified in acpi_processor_get_lpi_info. I suggest modifing it as below: --> git diff diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c index 801f9c450142..c68a5db8ebba 100644 --- a/drivers/acpi/arm64/cpuidle.c +++ b/drivers/acpi/arm64/cpuidle.c @@ -16,7 +16,7 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu) { - int i, count; + int i; struct acpi_lpi_state *lpi; struct acpi_processor *pr = per_cpu(processors, cpu); @@ -30,14 +30,10 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu) if (!psci_ops.cpu_suspend) return -EOPNOTSUPP; - count = pr->power.count - 1; - if (count <= 0) - return -ENODEV; - - for (i = 0; i < count; i++) { + for (i = 1; i < pr->power.count; i++) { u32 state; - lpi = &pr->power.lpi_states[i + 1]; + lpi = &pr->power.lpi_states[i]; /* * Only bits[31:0] represent a PSCI power_state while * bits[63:32] must be 0x0 as per ARM ACPI FFH Specification ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV 2026-04-14 11:31 ` lihuisong (C) @ 2026-04-14 12:05 ` Breno Leitao 2026-04-14 12:25 ` Sudeep Holla 1 sibling, 0 replies; 12+ messages in thread From: Breno Leitao @ 2026-04-14 12:05 UTC (permalink / raw) To: lihuisong (C) Cc: Rafael J. Wysocki, Len Brown, lpieralisi, catalin.marinas, will, Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski, guohanjun, sudeep.holla, linux-arm-kernel, rmikey, kernel-team On Tue, Apr 14, 2026 at 07:31:29PM +0800, lihuisong (C) wrote: > > On 4/14/2026 6:21 PM, Breno Leitao wrote: > > Hello Huisong, > > > > On Tue, Apr 14, 2026 at 05:43:51PM +0800, lihuisong (C) wrote: > > > But it is a real issue. Thanks for your report. > > > I think the best way to fix your issue is that remove this verification in > > > psci_acpi_cpu_init_idle(). > > > Because it is legal for platform to report one LPI state. > > > This function just needs to verify the LPI states which are FFH. > > Thank you for the prompt feedback. > > > > Would this approach work? > > > > commit 6c9d52840a4f778cc989838ba76ee51416e85de3 > > Author: Breno Leitao <leitao@debian.org> > > Date: Tue Apr 14 03:16:08 2026 -0700 > > > > ACPI: processor: idle: Allow platforms with only one LPI state > > psci_acpi_cpu_init_idle() rejects platforms where power.count - 1 <= 0 > > by returning -ENODEV. However, having a single LPI state (WFI) is a > > valid configuration. The function's purpose is to verify FFH idle states, > > and when count is zero, there are simply no FFH states to validate — > > this is not an error. > > On NVIDIA Grace (aarch64) systems with PSCIv1.1, power.count is 1 for > > all 72 CPUs, so the probe fails with -ENODEV. After commit cac173bea57d > > ("ACPI: processor: idle: Rework the handling of > > acpi_processor_ffh_lpi_probe()"), this failure propagates up and prevents > > cpuidle registration entirely. > > Change the check from (count <= 0) to (count < 0) so that platforms > > with only WFI are accepted. The for loop naturally handles count == 0 > > by not iterating. > > Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()") > > Signed-off-by: Breno Leitao <leitao@debian.org> > > > > diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c > > index 801f9c4501425..7791b751042ce 100644 > > --- a/drivers/acpi/arm64/cpuidle.c > > +++ b/drivers/acpi/arm64/cpuidle.c > > @@ -31,7 +31,7 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu) > > return -EOPNOTSUPP; > > count = pr->power.count - 1; > > - if (count <= 0) > > + if (count < 0) > > return -ENODEV; > > for (i = 0; i < count; i++) { > > This count already verified in acpi_processor_get_lpi_info. > > I suggest modifing it as below: > > --> > > git diff > diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c > index 801f9c450142..c68a5db8ebba 100644 > --- a/drivers/acpi/arm64/cpuidle.c > +++ b/drivers/acpi/arm64/cpuidle.c > @@ -16,7 +16,7 @@ > > static int psci_acpi_cpu_init_idle(unsigned int cpu) > { > - int i, count; > + int i; > struct acpi_lpi_state *lpi; > struct acpi_processor *pr = per_cpu(processors, cpu); > > @@ -30,14 +30,10 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu) > if (!psci_ops.cpu_suspend) > return -EOPNOTSUPP; > > - count = pr->power.count - 1; > - if (count <= 0) > - return -ENODEV; > - > - for (i = 0; i < count; i++) { > + for (i = 1; i < pr->power.count; i++) { > u32 state; > > - lpi = &pr->power.lpi_states[i + 1]; > + lpi = &pr->power.lpi_states[i]; > /* > * Only bits[31:0] represent a PSCI power_state while > * bits[63:32] must be 0x0 as per ARM ACPI FFH Specification Ack, I will respin. Thanks for your help, --breno ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV 2026-04-14 11:31 ` lihuisong (C) 2026-04-14 12:05 ` Breno Leitao @ 2026-04-14 12:25 ` Sudeep Holla 2026-04-14 13:14 ` Breno Leitao 2026-04-15 1:32 ` lihuisong (C) 1 sibling, 2 replies; 12+ messages in thread From: Sudeep Holla @ 2026-04-14 12:25 UTC (permalink / raw) To: lihuisong (C) Cc: Breno Leitao, Sudeep Holla, Rafael J. Wysocki, Len Brown, lpieralisi, catalin.marinas, will, Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski, guohanjun, linux-arm-kernel, rmikey, kernel-team On Tue, Apr 14, 2026 at 07:31:29PM +0800, lihuisong (C) wrote: > > On 4/14/2026 6:21 PM, Breno Leitao wrote: > > Hello Huisong, > > > > On Tue, Apr 14, 2026 at 05:43:51PM +0800, lihuisong (C) wrote: > > > But it is a real issue. Thanks for your report. > > > I think the best way to fix your issue is that remove this verification in > > > psci_acpi_cpu_init_idle(). > > > Because it is legal for platform to report one LPI state. > > > This function just needs to verify the LPI states which are FFH. > > Thank you for the prompt feedback. > > > > Would this approach work? > > > > commit 6c9d52840a4f778cc989838ba76ee51416e85de3 > > Author: Breno Leitao <leitao@debian.org> > > Date: Tue Apr 14 03:16:08 2026 -0700 > > > > ACPI: processor: idle: Allow platforms with only one LPI state > > psci_acpi_cpu_init_idle() rejects platforms where power.count - 1 <= 0 > > by returning -ENODEV. However, having a single LPI state (WFI) is a > > valid configuration. The function's purpose is to verify FFH idle states, > > and when count is zero, there are simply no FFH states to validate — > > this is not an error. > > On NVIDIA Grace (aarch64) systems with PSCIv1.1, power.count is 1 for > > all 72 CPUs, so the probe fails with -ENODEV. After commit cac173bea57d > > ("ACPI: processor: idle: Rework the handling of > > acpi_processor_ffh_lpi_probe()"), this failure propagates up and prevents > > cpuidle registration entirely. > > Change the check from (count <= 0) to (count < 0) so that platforms > > with only WFI are accepted. The for loop naturally handles count == 0 > > by not iterating. > > Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()") > > Signed-off-by: Breno Leitao <leitao@debian.org> > > > > diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c > > index 801f9c4501425..7791b751042ce 100644 > > --- a/drivers/acpi/arm64/cpuidle.c > > +++ b/drivers/acpi/arm64/cpuidle.c > > @@ -31,7 +31,7 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu) > > return -EOPNOTSUPP; > > count = pr->power.count - 1; > > - if (count <= 0) > > + if (count < 0) > > return -ENODEV; > > for (i = 0; i < count; i++) { > > This count already verified in acpi_processor_get_lpi_info. > > I suggest modifing it as below: > > --> > > git diff > diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c > index 801f9c450142..c68a5db8ebba 100644 > --- a/drivers/acpi/arm64/cpuidle.c > +++ b/drivers/acpi/arm64/cpuidle.c > @@ -16,7 +16,7 @@ > > static int psci_acpi_cpu_init_idle(unsigned int cpu) > { > - int i, count; > + int i; > struct acpi_lpi_state *lpi; > struct acpi_processor *pr = per_cpu(processors, cpu); > > @@ -30,14 +30,10 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu) > if (!psci_ops.cpu_suspend) > return -EOPNOTSUPP; > > - count = pr->power.count - 1; > - if (count <= 0) > - return -ENODEV; > - It was intentionally designed this way, as there is little value in defining only WFI in the _LPI tables. In the absence of a cpuidle driver/LPI entry, arch_cpu_idle() is invoked, which is sufficient and avoids unnecessary complexity, only to ultimately execute wfi() anyway. So while I understand that the kernel did not report an error previously, that does not mean the _LPI table is merely moot on this platform when it contains only a WFI state. -- Regards, Sudeep ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV 2026-04-14 12:25 ` Sudeep Holla @ 2026-04-14 13:14 ` Breno Leitao 2026-04-14 14:10 ` Sudeep Holla 2026-04-15 1:32 ` lihuisong (C) 1 sibling, 1 reply; 12+ messages in thread From: Breno Leitao @ 2026-04-14 13:14 UTC (permalink / raw) To: Sudeep Holla Cc: lihuisong (C), Rafael J. Wysocki, Len Brown, lpieralisi, catalin.marinas, will, Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski, guohanjun, linux-arm-kernel, rmikey, kernel-team Hello Sudeep, On Tue, Apr 14, 2026 at 01:25:53PM +0100, Sudeep Holla wrote: > So while I understand that the kernel did not report an error previously, that > does not mean the _LPI table is merely moot on this platform when it contains > only a WFI state. Can you clarify whether datacenter ARM systems are expected to expose deeper idle states beyond WFI in their _LPI tables? Backing up, I'm observing 72 pr_err() messages during boot on these hosts and trying to determine whether this indicates a firmware issue or if the kernel needs adjustment. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV 2026-04-14 13:14 ` Breno Leitao @ 2026-04-14 14:10 ` Sudeep Holla 2026-04-14 16:31 ` Breno Leitao 0 siblings, 1 reply; 12+ messages in thread From: Sudeep Holla @ 2026-04-14 14:10 UTC (permalink / raw) To: Breno Leitao Cc: lihuisong (C), Rafael J. Wysocki, Sudeep Holla, Len Brown, lpieralisi, catalin.marinas, will, Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski, guohanjun, linux-arm-kernel, rmikey, kernel-team On Tue, Apr 14, 2026 at 06:14:19AM -0700, Breno Leitao wrote: > Hello Sudeep, > > On Tue, Apr 14, 2026 at 01:25:53PM +0100, Sudeep Holla wrote: > > So while I understand that the kernel did not report an error previously, that > > does not mean the _LPI table is merely moot on this platform when it contains > > only a WFI state. > > Can you clarify whether datacenter ARM systems are expected to expose > deeper idle states beyond WFI in their _LPI tables? > Of course any system that prefers to save power when its idling must have these _LPI deeper idle states. > Backing up, I'm observing 72 pr_err() messages during boot on these > hosts and trying to determine whether this indicates a firmware issue or > if the kernel needs adjustment. I consider this a firmware issue, but not a fatal one. What matters more is the behavior after those errors are reported. If you force success, either through your change or through the PSCI approach suggested by lihuisong, then in practice you are only enabling a cpuidle driver with a single usable state: WFI. That is not inherently wrong, but it also does not provide much benefit. When the idle task needs to enter an idle state, it will ask the cpuidle framework to choose the appropriate state. In this case, however, the framework has no real decision to make, because the only available state is WFI. If that is all the platform can support, then the same result can be achieved more efficiently without registering the cpuidle driver at all, by falling back to `arch_cpu_idle()`, as I mentioned earlier. -- Regards, Sudeep ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV 2026-04-14 14:10 ` Sudeep Holla @ 2026-04-14 16:31 ` Breno Leitao 2026-04-15 10:45 ` Sudeep Holla 0 siblings, 1 reply; 12+ messages in thread From: Breno Leitao @ 2026-04-14 16:31 UTC (permalink / raw) To: Sudeep Holla Cc: lihuisong (C), Rafael J. Wysocki, Len Brown, lpieralisi, catalin.marinas, will, Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski, guohanjun, linux-arm-kernel, rmikey, kernel-team Hello Sudeep, On Tue, Apr 14, 2026 at 03:10:03PM +0100, Sudeep Holla wrote: > On Tue, Apr 14, 2026 at 06:14:19AM -0700, Breno Leitao wrote: > > Hello Sudeep, > > > > On Tue, Apr 14, 2026 at 01:25:53PM +0100, Sudeep Holla wrote: > > > So while I understand that the kernel did not report an error previously, that > > > does not mean the _LPI table is merely moot on this platform when it contains > > > only a WFI state. > > > > Can you clarify whether datacenter ARM systems are expected to expose > > deeper idle states beyond WFI in their _LPI tables? > > > > Of course any system that prefers to save power when its idling must have > these _LPI deeper idle states. > > > Backing up, I'm observing 72 pr_err() messages during boot on these > > hosts and trying to determine whether this indicates a firmware issue or > > if the kernel needs adjustment. > > I consider this a firmware issue, but not a fatal one. What matters more is > the behavior after those errors are reported. I understand. While I'm not a hardware or firmware vendor myself, I can see how they might consider power management features _optional_ for certain server configurations. > If you force success, either through your change or through the PSCI approach > suggested by lihuisong, then in practice you are only enabling a cpuidle > driver with a single usable state: WFI. That is not inherently wrong, but it > also does not provide much benefit. Given that this isn't a critical error, would it make sense to downgrade the pr_err() to pr_debug()? is it a reasonable compromise. I just want to avoid these pr_err() all accross the board, affecting kernel health metrics in large fleets. My proposal: commit c98007f9e10fe229672d29c3844c96705cecaed5 Author: Breno Leitao <leitao@debian.org> Date: Tue Apr 14 05:28:28 2026 -0700 ACPI: processor: idle: Downgrade FFH LPI probe failure message to pr_debug() The "Invalid FFH LPI data" message is printed at pr_err() level for every CPU when acpi_processor_ffh_lpi_probe() fails. On platforms where the FFH probe legitimately returns an error (e.g., no deep idle states beyond WFI), this floods the kernel log with per-CPU error messages that are not actionable. Downgrade to pr_debug() since this is a diagnostic message, not a critical error. Signed-off-by: Breno Leitao <leitao@debian.org> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c index ee5facccbe10c..ab93a2c10a9ad 100644 --- a/drivers/acpi/processor_idle.c +++ b/drivers/acpi/processor_idle.c @@ -1259,7 +1259,7 @@ static int acpi_processor_get_power_info(struct acpi_processor *pr) if (pr->flags.has_lpi) { ret = acpi_processor_ffh_lpi_probe(pr->id); if (ret) - pr_err("CPU%u: Invalid FFH LPI data\n", pr->id); + pr_debug("CPU%u: Invalid FFH LPI data\n", pr->id); } return ret; ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV 2026-04-14 16:31 ` Breno Leitao @ 2026-04-15 10:45 ` Sudeep Holla 0 siblings, 0 replies; 12+ messages in thread From: Sudeep Holla @ 2026-04-15 10:45 UTC (permalink / raw) To: Breno Leitao Cc: lihuisong (C), Rafael J. Wysocki, Sudeep Holla, Len Brown, lpieralisi, catalin.marinas, will, Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski, guohanjun, linux-arm-kernel, rmikey, kernel-team On Tue, Apr 14, 2026 at 09:31:33AM -0700, Breno Leitao wrote: > Hello Sudeep, > > On Tue, Apr 14, 2026 at 03:10:03PM +0100, Sudeep Holla wrote: > > On Tue, Apr 14, 2026 at 06:14:19AM -0700, Breno Leitao wrote: > > > Hello Sudeep, > > > > > > On Tue, Apr 14, 2026 at 01:25:53PM +0100, Sudeep Holla wrote: > > > > So while I understand that the kernel did not report an error previously, that > > > > does not mean the _LPI table is merely moot on this platform when it contains > > > > only a WFI state. > > > > > > Can you clarify whether datacenter ARM systems are expected to expose > > > deeper idle states beyond WFI in their _LPI tables? > > > > > > > Of course any system that prefers to save power when its idling must have > > these _LPI deeper idle states. > > > > > Backing up, I'm observing 72 pr_err() messages during boot on these > > > hosts and trying to determine whether this indicates a firmware issue or > > > if the kernel needs adjustment. > > > > I consider this a firmware issue, but not a fatal one. What matters more is > > the behavior after those errors are reported. > > I understand. While I'm not a hardware or firmware vendor myself, I can > see how they might consider power management features _optional_ for certain > server configurations. > > > If you force success, either through your change or through the PSCI approach > > suggested by lihuisong, then in practice you are only enabling a cpuidle > > driver with a single usable state: WFI. That is not inherently wrong, but it > > also does not provide much benefit. > > Given that this isn't a critical error, would it make sense to downgrade > the pr_err() to pr_debug()? is it a reasonable compromise. I just want > to avoid these pr_err() all accross the board, affecting kernel health > metrics in large fleets. > > My proposal: > > commit c98007f9e10fe229672d29c3844c96705cecaed5 > Author: Breno Leitao <leitao@debian.org> > Date: Tue Apr 14 05:28:28 2026 -0700 > > ACPI: processor: idle: Downgrade FFH LPI probe failure message to pr_debug() > > The "Invalid FFH LPI data" message is printed at pr_err() level for every > CPU when acpi_processor_ffh_lpi_probe() fails. On platforms where the FFH > probe legitimately returns an error (e.g., no deep idle states beyond > WFI), this floods the kernel log with per-CPU error messages that are not > actionable. > > Downgrade to pr_debug() since this is a diagnostic message, not a > critical error. > > Signed-off-by: Breno Leitao <leitao@debian.org> > > diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c > index ee5facccbe10c..ab93a2c10a9ad 100644 > --- a/drivers/acpi/processor_idle.c > +++ b/drivers/acpi/processor_idle.c > @@ -1259,7 +1259,7 @@ static int acpi_processor_get_power_info(struct acpi_processor *pr) > if (pr->flags.has_lpi) { > ret = acpi_processor_ffh_lpi_probe(pr->id); > if (ret) > - pr_err("CPU%u: Invalid FFH LPI data\n", pr->id); > + pr_debug("CPU%u: Invalid FFH LPI data\n", pr->id); I am afraid if this might mask the real _LPI table entry errors. Need to think, I am still of the opinion that we need warn on that system as _LPI with just WFI is simply useless. -- Regards, Sudeep ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV 2026-04-14 12:25 ` Sudeep Holla 2026-04-14 13:14 ` Breno Leitao @ 2026-04-15 1:32 ` lihuisong (C) 2026-04-15 14:03 ` Rafael J. Wysocki 1 sibling, 1 reply; 12+ messages in thread From: lihuisong (C) @ 2026-04-15 1:32 UTC (permalink / raw) To: Sudeep Holla, Rafael J. Wysocki Cc: Breno Leitao, Len Brown, lpieralisi, catalin.marinas, will, Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski, guohanjun, linux-arm-kernel, rmikey, kernel-team, lihuisong On 4/14/2026 8:25 PM, Sudeep Holla wrote: > On Tue, Apr 14, 2026 at 07:31:29PM +0800, lihuisong (C) wrote: >> On 4/14/2026 6:21 PM, Breno Leitao wrote: >>> Hello Huisong, >>> >>> On Tue, Apr 14, 2026 at 05:43:51PM +0800, lihuisong (C) wrote: >>>> But it is a real issue. Thanks for your report. >>>> I think the best way to fix your issue is that remove this verification in >>>> psci_acpi_cpu_init_idle(). >>>> Because it is legal for platform to report one LPI state. >>>> This function just needs to verify the LPI states which are FFH. >>> Thank you for the prompt feedback. >>> >>> Would this approach work? >>> >>> commit 6c9d52840a4f778cc989838ba76ee51416e85de3 >>> Author: Breno Leitao <leitao@debian.org> >>> Date: Tue Apr 14 03:16:08 2026 -0700 >>> >>> ACPI: processor: idle: Allow platforms with only one LPI state >>> psci_acpi_cpu_init_idle() rejects platforms where power.count - 1 <= 0 >>> by returning -ENODEV. However, having a single LPI state (WFI) is a >>> valid configuration. The function's purpose is to verify FFH idle states, >>> and when count is zero, there are simply no FFH states to validate — >>> this is not an error. >>> On NVIDIA Grace (aarch64) systems with PSCIv1.1, power.count is 1 for >>> all 72 CPUs, so the probe fails with -ENODEV. After commit cac173bea57d >>> ("ACPI: processor: idle: Rework the handling of >>> acpi_processor_ffh_lpi_probe()"), this failure propagates up and prevents >>> cpuidle registration entirely. >>> Change the check from (count <= 0) to (count < 0) so that platforms >>> with only WFI are accepted. The for loop naturally handles count == 0 >>> by not iterating. >>> Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()") >>> Signed-off-by: Breno Leitao <leitao@debian.org> >>> >>> diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c >>> index 801f9c4501425..7791b751042ce 100644 >>> --- a/drivers/acpi/arm64/cpuidle.c >>> +++ b/drivers/acpi/arm64/cpuidle.c >>> @@ -31,7 +31,7 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu) >>> return -EOPNOTSUPP; >>> count = pr->power.count - 1; >>> - if (count <= 0) >>> + if (count < 0) >>> return -ENODEV; >>> for (i = 0; i < count; i++) { >> This count already verified in acpi_processor_get_lpi_info. >> >> I suggest modifing it as below: >> >> --> >> >> git diff >> diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c >> index 801f9c450142..c68a5db8ebba 100644 >> --- a/drivers/acpi/arm64/cpuidle.c >> +++ b/drivers/acpi/arm64/cpuidle.c >> @@ -16,7 +16,7 @@ >> >> static int psci_acpi_cpu_init_idle(unsigned int cpu) >> { >> - int i, count; >> + int i; >> struct acpi_lpi_state *lpi; >> struct acpi_processor *pr = per_cpu(processors, cpu); >> >> @@ -30,14 +30,10 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu) >> if (!psci_ops.cpu_suspend) >> return -EOPNOTSUPP; >> >> - count = pr->power.count - 1; >> - if (count <= 0) >> - return -ENODEV; >> - > It was intentionally designed this way, as there is little value in defining > only WFI in the _LPI tables. In the absence of a cpuidle driver/LPI entry, > arch_cpu_idle() is invoked, which is sufficient and avoids unnecessary > complexity, only to ultimately execute wfi() anyway. Yeah, it's correct. The code flow will be more simple and high-efficiency. This looks good to me. But cpuidle sysfs under per CPU is created when firmware just reports WFI state before my commit cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()"). However, these platforms will no longer be created now and some statistics for state0 are also missing. This change in behavor is visiable to user space.I'm not sure if it is acceptable. What do you think, Rafael? > So while I understand that the kernel did not report an error previously, that > does not mean the _LPI table is merely moot on this platform when it contains > only a WFI state. > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV 2026-04-15 1:32 ` lihuisong (C) @ 2026-04-15 14:03 ` Rafael J. Wysocki 0 siblings, 0 replies; 12+ messages in thread From: Rafael J. Wysocki @ 2026-04-15 14:03 UTC (permalink / raw) To: lihuisong (C) Cc: Sudeep Holla, Rafael J. Wysocki, Breno Leitao, Len Brown, lpieralisi, catalin.marinas, will, Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski, guohanjun, linux-arm-kernel, rmikey, kernel-team On Wed, Apr 15, 2026 at 3:32 AM lihuisong (C) <lihuisong@huawei.com> wrote: > > > On 4/14/2026 8:25 PM, Sudeep Holla wrote: > > On Tue, Apr 14, 2026 at 07:31:29PM +0800, lihuisong (C) wrote: > >> On 4/14/2026 6:21 PM, Breno Leitao wrote: > >>> Hello Huisong, > >>> > >>> On Tue, Apr 14, 2026 at 05:43:51PM +0800, lihuisong (C) wrote: > >>>> But it is a real issue. Thanks for your report. > >>>> I think the best way to fix your issue is that remove this verification in > >>>> psci_acpi_cpu_init_idle(). > >>>> Because it is legal for platform to report one LPI state. > >>>> This function just needs to verify the LPI states which are FFH. > >>> Thank you for the prompt feedback. > >>> > >>> Would this approach work? > >>> > >>> commit 6c9d52840a4f778cc989838ba76ee51416e85de3 > >>> Author: Breno Leitao <leitao@debian.org> > >>> Date: Tue Apr 14 03:16:08 2026 -0700 > >>> > >>> ACPI: processor: idle: Allow platforms with only one LPI state > >>> psci_acpi_cpu_init_idle() rejects platforms where power.count - 1 <= 0 > >>> by returning -ENODEV. However, having a single LPI state (WFI) is a > >>> valid configuration. The function's purpose is to verify FFH idle states, > >>> and when count is zero, there are simply no FFH states to validate — > >>> this is not an error. > >>> On NVIDIA Grace (aarch64) systems with PSCIv1.1, power.count is 1 for > >>> all 72 CPUs, so the probe fails with -ENODEV. After commit cac173bea57d > >>> ("ACPI: processor: idle: Rework the handling of > >>> acpi_processor_ffh_lpi_probe()"), this failure propagates up and prevents > >>> cpuidle registration entirely. > >>> Change the check from (count <= 0) to (count < 0) so that platforms > >>> with only WFI are accepted. The for loop naturally handles count == 0 > >>> by not iterating. > >>> Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()") > >>> Signed-off-by: Breno Leitao <leitao@debian.org> > >>> > >>> diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c > >>> index 801f9c4501425..7791b751042ce 100644 > >>> --- a/drivers/acpi/arm64/cpuidle.c > >>> +++ b/drivers/acpi/arm64/cpuidle.c > >>> @@ -31,7 +31,7 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu) > >>> return -EOPNOTSUPP; > >>> count = pr->power.count - 1; > >>> - if (count <= 0) > >>> + if (count < 0) > >>> return -ENODEV; > >>> for (i = 0; i < count; i++) { > >> This count already verified in acpi_processor_get_lpi_info. > >> > >> I suggest modifing it as below: > >> > >> --> > >> > >> git diff > >> diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c > >> index 801f9c450142..c68a5db8ebba 100644 > >> --- a/drivers/acpi/arm64/cpuidle.c > >> +++ b/drivers/acpi/arm64/cpuidle.c > >> @@ -16,7 +16,7 @@ > >> > >> static int psci_acpi_cpu_init_idle(unsigned int cpu) > >> { > >> - int i, count; > >> + int i; > >> struct acpi_lpi_state *lpi; > >> struct acpi_processor *pr = per_cpu(processors, cpu); > >> > >> @@ -30,14 +30,10 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu) > >> if (!psci_ops.cpu_suspend) > >> return -EOPNOTSUPP; > >> > >> - count = pr->power.count - 1; > >> - if (count <= 0) > >> - return -ENODEV; > >> - > > It was intentionally designed this way, as there is little value in defining > > only WFI in the _LPI tables. In the absence of a cpuidle driver/LPI entry, > > arch_cpu_idle() is invoked, which is sufficient and avoids unnecessary > > complexity, only to ultimately execute wfi() anyway. > Yeah, it's correct. The code flow will be more simple and high-efficiency. > This looks good to me. > > > But cpuidle sysfs under per CPU is created when firmware just reports > WFI state before > my commit cac173bea57d ("ACPI: processor: idle: Rework the handling of > acpi_processor_ffh_lpi_probe()"). > However, these platforms will no longer be created now and some > statistics for state0 are also missing. > This change in behavor is visiable to user space.I'm not sure if it is > acceptable. > What do you think, Rafael? I think that it would be good to restore the previous behavior, especially if it has been changed inadvertently. ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-04-15 14:03 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-13 16:54 [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV Breno Leitao 2026-04-14 9:43 ` lihuisong (C) 2026-04-14 10:21 ` Breno Leitao 2026-04-14 11:31 ` lihuisong (C) 2026-04-14 12:05 ` Breno Leitao 2026-04-14 12:25 ` Sudeep Holla 2026-04-14 13:14 ` Breno Leitao 2026-04-14 14:10 ` Sudeep Holla 2026-04-14 16:31 ` Breno Leitao 2026-04-15 10:45 ` Sudeep Holla 2026-04-15 1:32 ` lihuisong (C) 2026-04-15 14:03 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox