Date: Fri, 05 Jun 2026 09:09:23 +0100
Message-ID: <87bjdp9znw.wl-maz@kernel.org>
From: Marc Zyngier
To: jens.glathe@oldschoolsolutions.biz
Cc: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, linux-arm-msm@vger.kernel.org,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	Steev Klimaszewski, Icecream95
Subject: Re: [PATCH RFC] arm64: dts: qcom: hamoa: Drop cluster_cl5 idle state from CPU clusters
In-Reply-To: <20260604-dc_zva_mitigation-v1-1-d1148c1c0259@oldschoolsolutions.biz>
References: <20260604-dc_zva_mitigation-v1-1-d1148c1c0259@oldschoolsolutions.biz>

Hi Jens,

Thanks for sending this.

On Thu, 04 Jun 2026 18:40:14 +0100,
Jens Glathe via B4 Relay wrote:
>
> From: Jens Glathe
>
> The cluster_cl5 idle state triggers DC ZVA misbehavior that resets
> X1 SoCs. Remove it from cluster_pd0/1/2 domain-idle-states for now.
>
> Suggested-by: Marc Zyngier
> Signed-off-by: Jens Glathe
> ---
> This is an RFC for a mitigation of a stability issue observed on
> Snapdragon X1-based SoCs (Hamoa and Purwa).
>
> Affected systems experience spontaneous resets under the following
> conditions:
> - During intensive `git fetch` / `git pull` activity
> - During mostly idle periods (Bitburner and similar workloads were
>   frequently mentioned)
>
> Steev Klimaszewski first connected the crashes to git operations.
> Subsequent discussion in #aarch64-laptops led icecream95 to isolate
> DC ZVA as the triggering instruction and to create a reliable
> reproducer [1].
>
> Further debugging showed that the issue is strongly related to deep
> cluster idle states. Marc Zyngier suggested removing the deepest
> cluster state (`cluster_cl5`), which resolved the problem on all tested
> consumer hardware.
>
> This patch implements that change by removing `&cluster_cl5` from the
> `domain-idle-states` of `cluster_pd0`, `cluster_pd1`, and `cluster_pd2`.
>
> Testing:
> - Lenovo ThinkPad T14s G6 (X1E-78-100, Hamoa)
> - Lenovo ThinkBook 16 G7 QOY (X1P-42-100, Purwa)
> - Lenovo IdeaPad 5 2-in-1 14Q8X9 (X1P-42-100, Purwa)
> - Lenovo IdeaPad Slim 3x 15Q8X10 (X1-26-100, Purwa)
>
> All consumer devices became stable with this change.
>
> On the Snapdragon Dev Kit (X1E-001-DE, Hamoa) the situation is
> different: the firmware does not advertise OSI mode. Even with this
> patch the device still crashes with the x1e-crash reproducer. Stability
> is only achieved by passing `cpuidle.off=1`, which of course increases
> power consumption but makes the devkit a bit faster, so there's that.
>
> The different behaviour correlates with PSCI mode:
> - Consumer firmwares enable OSI mode
> - Devkit firmware stays in platform-coordinated mode
>
> This patch is therefore only a band-aid. All evidence points to a
> firmware/microcode issue where DC ZVA can hit caches that have been
> powered down by PSCI idle states. A proper fix would be either a
> Qualcomm firmware update or a kernel erratum that disables DZE on
> these SoCs.
>
> [1] https://github.com/icecream95/x1e-crash
> ---
>  arch/arm64/boot/dts/qcom/hamoa.dtsi | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/qcom/hamoa.dtsi b/arch/arm64/boot/dts/qcom/hamoa.dtsi
> index 4ba751a65142b..8ec39ba621946 100644
> --- a/arch/arm64/boot/dts/qcom/hamoa.dtsi
> +++ b/arch/arm64/boot/dts/qcom/hamoa.dtsi
> @@ -442,19 +442,19 @@ cpu_pd11: power-domain-cpu11 {
>
>  		cluster_pd0: power-domain-cpu-cluster0 {
>  			#power-domain-cells = <0>;
> -			domain-idle-states = <&cluster_cl4>, <&cluster_cl5>;
> +			domain-idle-states = <&cluster_cl4>;
>  			power-domains = <&system_pd>;
>  		};
>
>  		cluster_pd1: power-domain-cpu-cluster1 {
>  			#power-domain-cells = <0>;
> -			domain-idle-states = <&cluster_cl4>, <&cluster_cl5>;
> +			domain-idle-states = <&cluster_cl4>;
>  			power-domains = <&system_pd>;
>  		};
>
>  		cluster_pd2: power-domain-cpu-cluster2 {
>  			#power-domain-cells = <0>;
> -			domain-idle-states = <&cluster_cl4>, <&cluster_cl5>;
> +			domain-idle-states = <&cluster_cl4>;
>  			power-domains = <&system_pd>;
>  		};
>
>

It may be worth adding a comment somewhere in the DTS file, as
cluster_cl5 is not referenced anymore. Ideally we'd simply mark
cluster-sleep-1 with 'status = "disabled"', but I'm not sure Linux
(and other OSs that consume this) actively parse this property.

Overall, I'd like clarity from the vendor on what can be done to
better mitigate issues like this.
So far, we have been randomly disabling features and CPU
capabilities each and every time we find something broken on these
machines, and the list is getting long. I don't think such a course
of action is sustainable, and maybe we should simply consider marking
the full X1 platform as BROKEN so that people know what to expect.

Thanks,

	M.

-- 
Jazz isn't dead. It just smells funny.