* [PATCH RFC] arm64: dts: qcom: hamoa: Drop cluster_cl5 idle state from CPU clusters
@ 2026-06-04 17:40 Jens Glathe via B4 Relay
  2026-06-05  8:09 ` Marc Zyngier
From: Jens Glathe via B4 Relay @ 2026-06-04 17:40 UTC
  To: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-arm-msm, devicetree, linux-kernel, Steev Klimaszewski,
	Icecream95, Marc Zyngier, Jens Glathe

From: Jens Glathe <jens.glathe@oldschoolsolutions.biz>

Entering the cluster_cl5 idle state makes DC ZVA misbehave in a way
that resets Snapdragon X1 SoCs. Remove it from the domain-idle-states
of cluster_pd0/1/2 for now.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Jens Glathe <jens.glathe@oldschoolsolutions.biz>
---
This is an RFC for a mitigation of a stability issue observed on
Snapdragon X1-based SoCs (Hamoa and Purwa).

Affected systems experience spontaneous resets under the following
conditions:
 - During intensive `git fetch` / `git pull` activity
 - During mostly idle periods (Bitburner and similar workloads were
   frequently mentioned)

Steev Klimaszewski first connected the crashes to git operations.
Subsequent discussion in #aarch64-laptops led icecream95 to isolate
DC ZVA as the triggering instruction and to create a reliable
reproducer [1].
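For context: DC ZVA zeroes one whole, naturally aligned block of memory (its size is reported by DCZID_EL0), and arm64 memset implementations such as glibc's use it for large zero fills. That is one plausible path by which ordinary workloads like git end up executing it, though this is an assumption and not something established above. Below is a minimal, portable C model of that pattern. It is NOT the x1e-crash reproducer: `dc_zva_model()` and `zero_fill()` are hypothetical helpers, and plain memset() stands in for the real instruction so the sketch runs on any host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of DC ZVA (hypothetical helper): zero the naturally aligned block
 * that contains addr. Low address bits are ignored, as with the real
 * instruction. block_size must be a power of two (4 << DCZID_EL0.BS). */
static void dc_zva_model(void *addr, size_t block_size)
{
    uintptr_t base = (uintptr_t)addr & ~(uintptr_t)(block_size - 1);
    memset((void *)base, 0, block_size);
}

/* Zero-fill in the style of an arm64 memset (hypothetical helper): byte
 * stores for the unaligned head and tail, one "DC ZVA" per whole block. */
static void zero_fill(unsigned char *dst, size_t len, size_t block_size)
{
    uintptr_t mask = (uintptr_t)(block_size - 1);
    uintptr_t start = (uintptr_t)dst;
    uintptr_t end = start + len;
    uintptr_t first = (start + mask) & ~mask; /* first block boundary */
    uintptr_t last = end & ~mask;             /* last block boundary  */

    if (first >= last) {                      /* no whole block fits */
        memset(dst, 0, len);
        return;
    }
    memset(dst, 0, first - start);            /* unaligned head */
    for (uintptr_t p = first; p < last; p += block_size)
        dc_zva_model((void *)p, block_size);  /* whole blocks   */
    memset((void *)last, 0, end - last);      /* unaligned tail */
}
```

The head/tail split matters because DC ZVA cannot be told to zero part of a block; only the fully covered blocks can use it.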

Further debugging showed that the issue is strongly related to deep
cluster idle states. Marc Zyngier suggested removing the deepest
cluster state (`cluster_cl5`), which resolved the problem on all tested
consumer hardware.

This patch implements that change by removing `&cluster_cl5` from the
`domain-idle-states` of `cluster_pd0`, `cluster_pd1`, and `cluster_pd2`.
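To illustrate what trimming the list changes, here is a simplified C model of how a PM domain governor might pick a domain idle state: walk the list from deepest to shallowest and take the first state whose minimum residency fits the predicted idle time and whose exit latency fits the latency limit. It is only loosely modelled on the generic PM domain governor, not its actual code. The struct, `pick_domain_state()` and all timing values in the usage are hypothetical, and the list is assumed to be ordered shallow to deep, as in this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one entry in domain-idle-states. */
struct domain_idle_state {
    const char *name;
    uint64_t residency_ns;    /* minimum stay for the state to pay off */
    uint64_t exit_latency_ns; /* cost of entering and leaving it       */
};

/* Return the index of the deepest usable state, or -1 to keep the
 * domain powered. states[] is ordered shallow (0) to deep (count-1). */
static int pick_domain_state(const struct domain_idle_state *states,
                             size_t count, uint64_t idle_ns,
                             uint64_t latency_limit_ns)
{
    for (size_t i = count; i-- > 0;) {
        if (states[i].residency_ns <= idle_ns &&
            states[i].exit_latency_ns <= latency_limit_ns)
            return (int)i;
    }
    return -1;
}
```

With a two-entry list {cluster_cl4, cluster_cl5}, a long enough idle prediction selects index 1 (cluster_cl5); passing only the first entry, as the patched domain-idle-states does, makes cluster_cl4 the deepest state the domain can ever reach.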

Testing:
 - Lenovo ThinkPad T14s G6 (X1E-78-100, Hamoa)
 - Lenovo ThinkBook 16 G7 QOY (X1P-42-100, Purwa)
 - Lenovo IdeaPad 5 2-in-1 14Q8X9 (X1P-42-100, Purwa)
 - Lenovo IdeaPad Slim 3x 15Q8X10 (X1-26-100, Purwa)

All consumer devices became stable with this change.

On the Snapdragon Dev Kit (X1E-001-DE, Hamoa) the situation is
different: its firmware does not advertise PSCI OS-initiated (OSI) mode,
and even with this patch the device still crashes under the x1e-crash
reproducer. Only booting with `cpuidle.off=1` makes it stable, which of
course increases power consumption but makes the devkit a bit faster,
so there's that.

The different behaviour correlates with the PSCI suspend mode:
 - Consumer firmware enables OSI mode
 - Devkit firmware stays in platform-coordinated (PC) mode
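The OSI/PC split comes from PSCI itself: PSCI_FEATURES queried for CPU_SUSPEND returns a flags word in which bit 0 advertises OS-initiated mode and bit 1 the extended StateID format, while PSCI_SET_SUSPEND_MODE takes 0 for platform-coordinated and 1 for OS-initiated. A minimal decoding sketch, assuming the raw PSCI_FEATURES return value is already available; the helper and macro names are hypothetical, and treating "OSI advertised" as "OSI used" is a simplification that ignores the devicetree side.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits returned by PSCI_FEATURES(CPU_SUSPEND), per the PSCI spec. */
#define CPU_SUSPEND_OSI_SUPPORTED (1u << 0)
#define CPU_SUSPEND_EXT_STATE_ID  (1u << 1)

/* Argument values for PSCI_SET_SUSPEND_MODE. */
enum psci_suspend_mode { PSCI_MODE_PC = 0, PSCI_MODE_OSI = 1 };

/* features_ret is the signed PSCI return value; negative means an error
 * such as NOT_SUPPORTED, in which case PC mode (the default) remains. */
static enum psci_suspend_mode pick_suspend_mode(int32_t features_ret)
{
    if (features_ret < 0)
        return PSCI_MODE_PC;
    if ((uint32_t)features_ret & CPU_SUSPEND_OSI_SUPPORTED)
        return PSCI_MODE_OSI;
    return PSCI_MODE_PC;
}

/* True if firmware uses the extended power_state/StateID format. */
static bool psci_ext_state_id(int32_t features_ret)
{
    return features_ret >= 0 &&
           ((uint32_t)features_ret & CPU_SUSPEND_EXT_STATE_ID) != 0;
}
```

Under this model the devkit firmware, which clears bit 0, keeps the system in PC mode regardless of what domain-idle-states lists.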

This patch is therefore only a band-aid. All evidence points to a
firmware/microcode issue where DC ZVA can hit caches that have been
powered down by PSCI idle states. A proper fix would be either a
Qualcomm firmware update or a kernel erratum workaround that clears
SCTLR_EL1.DZE (disallowing DC ZVA at EL0) on these SoCs.
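On the DZE option: DCZID_EL0 reports the DC ZVA block size in bits [3:0] (log2 of the size in 4-byte words) and a DZP ("prohibited") flag in bit 4. When SCTLR_EL1.DZE is clear, EL0 reads DZP as 1, so userspace that honours DZP falls back to ordinary stores. A minimal decoder sketch; `dc_zva_block_bytes()` and `read_dczid()` are hypothetical helpers, and the inline-assembly read is only compiled on arm64.

```c
#include <stdint.h>

#define DCZID_BS_MASK 0xfu      /* bits [3:0]: log2(block size in words) */
#define DCZID_DZP     (1u << 4) /* bit 4: DC ZVA prohibited              */

/* Block size in bytes that DC ZVA zeroes, or 0 if it must not be used. */
static uint32_t dc_zva_block_bytes(uint64_t dczid)
{
    if (dczid & DCZID_DZP)
        return 0;
    return 4u << (dczid & DCZID_BS_MASK); /* words are 4 bytes */
}

#if defined(__aarch64__)
/* DCZID_EL0 is readable from EL0, so this works in userspace. */
static inline uint64_t read_dczid(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(v));
    return v;
}
#endif
```

A typical value of 0x4 decodes to 64-byte blocks; with DZE cleared, the same register would read 0x14 at EL0 and the decoder returns 0.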

[1] https://github.com/icecream95/x1e-crash
---
 arch/arm64/boot/dts/qcom/hamoa.dtsi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/hamoa.dtsi b/arch/arm64/boot/dts/qcom/hamoa.dtsi
index 4ba751a65142b..8ec39ba621946 100644
--- a/arch/arm64/boot/dts/qcom/hamoa.dtsi
+++ b/arch/arm64/boot/dts/qcom/hamoa.dtsi
@@ -442,19 +442,19 @@ cpu_pd11: power-domain-cpu11 {
 
 		cluster_pd0: power-domain-cpu-cluster0 {
 			#power-domain-cells = <0>;
-			domain-idle-states = <&cluster_cl4>, <&cluster_cl5>;
+			domain-idle-states = <&cluster_cl4>;
 			power-domains = <&system_pd>;
 		};
 
 		cluster_pd1: power-domain-cpu-cluster1 {
 			#power-domain-cells = <0>;
-			domain-idle-states = <&cluster_cl4>, <&cluster_cl5>;
+			domain-idle-states = <&cluster_cl4>;
 			power-domains = <&system_pd>;
 		};
 
 		cluster_pd2: power-domain-cpu-cluster2 {
 			#power-domain-cells = <0>;
-			domain-idle-states = <&cluster_cl4>, <&cluster_cl5>;
+			domain-idle-states = <&cluster_cl4>;
 			power-domains = <&system_pd>;
 		};
 

---
base-commit: a225caacc36546a09586e3ece36c0313146e7da9
change-id: 20260604-dc_zva_mitigation-245ecd5d797f

Best regards,
-- 
Jens Glathe <jens.glathe@oldschoolsolutions.biz>
