* [PATCH 0/2] arm64: dts: qcom: sm8650: rework CPU & GPU thermal zones
@ 2025-01-03 14:38 Neil Armstrong
2025-01-03 14:38 ` [PATCH 1/2] arm64: dts: qcom: sm8650: setup cpu thermal with idle on high temperatures Neil Armstrong
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Neil Armstrong @ 2025-01-03 14:38 UTC (permalink / raw)
To: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
Conor Dooley
Cc: linux-arm-msm, devicetree, linux-kernel, Neil Armstrong
On the SM8650 platform, the dynamic clock and voltage scaling (DCVS) for
the CPUs and GPU is handled by hardware & firmware using factory and
form-factor determined parameters in order to maximize frequency while
keeping the temperature well below the junction temperature, at which the
SoC would experience a thermal shutdown, if not permanent damage.
On the other side, the High Level Operating System (HLOS), such as Linux,
is able to adjust the CPU and GPU frequency using the internal SoC
temperature sensors (here tsens) and their UP/LOW interrupts, but it
effectively does the same work twice in a less effective manner.
Let's take the hardware & firmware action into account and design the
thermal zones' trip points and cooling device mappings to use the HLOS
as a safety net in case the platform experiences a temperature surge,
to hopefully avoid a thermal shutdown and handle the scenario gracefully.
On the CPU side, the LMh hardware runs the DCVS control loop, so
let's set higher trip point temperatures, closer to the junction
and thermal shutdown temperatures, and add an idle injection cooling
device with a 100% duty cycle for each CPU that acts as an emergency
action to avoid the thermal shutdown.
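As a minimal sketch of the CPU side described above, the fragment below shows one CPU with a thermal-idle child node and one thermal zone whose highest passive trip is mapped to that idle cooling device. The temperatures, durations and label names are the ones used in patch 1/2 of this series. The `tsens_stub` sensor node, its `vendor,tsens-stub` compatible and the sensor index are placeholders made up so the fragment stands on its own, and the other properties of the real nodes are omitted.

```dts
/dts-v1/;

/ {
	cpus {
		#address-cells = <2>;
		#size-cells = <0>;

		cpu0: cpu@0 {
			device_type = "cpu";
			reg = <0 0>;
			#cooling-cells = <2>;

			/* Idle-injection cooling device, values as in patch 1/2 */
			cpu0_idle: thermal-idle {
				#cooling-cells = <2>;
				duration-us = <800000>;
				exit-latency-us = <10000>;
			};
		};
	};

	/* Placeholder sensor (assumption): stands in for the SoC tsens block */
	tsens_stub: thermal-sensor {
		compatible = "vendor,tsens-stub";
		#thermal-sensor-cells = <1>;
	};

	thermal-zones {
		cpu0-thermal {
			/* Sensor index 0 is an assumption, not taken from sm8650.dtsi */
			thermal-sensors = <&tsens_stub 0>;

			trips {
				trip-point0 {
					temperature = <108000>; /* millidegrees Celsius */
					hysteresis = <2000>;
					type = "passive";
				};

				cpu0_alert1: trip-point1 {
					temperature = <110000>;
					hysteresis = <2000>;
					type = "passive";
				};

				cpu0-critical {
					temperature = <115000>;
					hysteresis = <1000>;
					type = "critical";
				};
			};

			cooling-maps {
				map0 {
					trip = <&cpu0_alert1>;
					/* minimum and maximum cooling state both set to 100 */
					cooling-device = <&cpu0_idle 100 100>;
				};
			};
		};
	};
};
```

The two cells after the phandle are the minimum and maximum cooling state. Setting both to 100 leaves the thermal framework a single state to apply once `trip-point1` is crossed, which fits an emergency action rather than a proportional control loop.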
On the GPU side, the GPU Management Unit (GMU) acts as the DCVS
control loop, but since we can't perform idle injection, let's
also set higher trip point temperatures, closer to the junction
and thermal shutdown temperatures, to reduce the GPU frequency only
as an emergency action before the thermal shutdown.
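The GPU side could be sketched along the same lines. This is only an illustration under stated assumptions: patch 2/2 is not reproduced here, so the zone name, the trip temperatures (borrowed from the CPU patch), the `gpu_stub` and `tsens_stub` nodes and their compatibles are all placeholders, not the values the series actually uses. The one point it is meant to show is that the cooling device is the GPU itself (frequency reduction) instead of an idle injector.

```dts
/dts-v1/;

/* Provides THERMAL_NO_LIMIT; needs the C preprocessor, as in the kernel build */
#include <dt-bindings/thermal/thermal.h>

/ {
	/* Placeholder GPU node (assumption): only the cooling cells matter here */
	gpu_stub: gpu {
		compatible = "vendor,gpu-stub";
		#cooling-cells = <2>;
	};

	/* Placeholder sensor (assumption): stands in for the SoC tsens block */
	tsens_stub: thermal-sensor {
		compatible = "vendor,tsens-stub";
		#thermal-sensor-cells = <1>;
	};

	thermal-zones {
		gpu-thermal {
			thermal-sensors = <&tsens_stub 0>;

			trips {
				/* Placeholder temperature, not taken from patch 2/2 */
				gpu_alert0: trip-point0 {
					temperature = <110000>;
					hysteresis = <2000>;
					type = "passive";
				};

				/* Placeholder temperature, not taken from patch 2/2 */
				gpu-critical {
					temperature = <115000>;
					hysteresis = <1000>;
					type = "critical";
				};
			};

			cooling-maps {
				map0 {
					trip = <&gpu_alert0>;
					/* Let the framework pick any GPU cooling state */
					cooling-device = <&gpu_stub THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
				};
			};
		};
	};
};
```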
These 2 changes optimize the thermal management design by avoiding
concurrent thermal management, calculations & avoidable interrupts
by moving the HLOS management to a last-resort emergency measure if the
hardware & firmware fail to avoid a thermal shutdown.
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
Neil Armstrong (2):
arm64: dts: qcom: sm8650: setup cpu thermal with idle on high temperatures
arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures
arch/arm64/boot/dts/qcom/sm8650.dtsi | 322 ++++++++++++++++++++++++++---------
1 file changed, 238 insertions(+), 84 deletions(-)
---
base-commit: 8155b4ef3466f0e289e8fcc9e6e62f3f4dceeac2
change-id: 20250103-topic-sm8650-thermal-cpu-idle-1e19181a94ed
Best regards,
--
Neil Armstrong <neil.armstrong@linaro.org>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/2] arm64: dts: qcom: sm8650: setup cpu thermal with idle on high temperatures
2025-01-03 14:38 [PATCH 0/2] arm64: dts: qcom: sm8650: rework CPU & GPU thermal zones Neil Armstrong
@ 2025-01-03 14:38 ` Neil Armstrong
2025-01-06 23:39 ` Bjorn Andersson
2025-01-03 14:38 ` [PATCH 2/2] arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures Neil Armstrong
2025-01-03 14:43 ` [PATCH 0/2] arm64: dts: qcom: sm8650: rework CPU & GPU thermal zones Konrad Dybcio
2 siblings, 1 reply; 16+ messages in thread
From: Neil Armstrong @ 2025-01-03 14:38 UTC (permalink / raw)
To: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
Conor Dooley
Cc: linux-arm-msm, devicetree, linux-kernel, Neil Armstrong
On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in a
hardware-controlled loop using the LMH and EPSS blocks with constraints and
OPPs programmed in the board firmware.

Since the hardware does a better job at maintaining the CPU temperatures
in an acceptable range by taking into account more parameters like the die
characteristics or other factory-fused values, it makes no sense to try
and reproduce a similar set of constraints with the Linux cpufreq thermal
core.

In addition, the tsens IP is responsible for monitoring the temperature
across the SoC and the current settings will heavily trigger the tsens
UP/LOW interrupts if the CPU temperatures reach the hardware thermal
constraints which are currently defined in the DT. And since the CPUs
are not hooked in the thermal trip points, the potential interrupts and
calculations are a waste of system resources.

Instead, set higher temperatures in the CPU trip points, and hook some CPU
idle injector with a 100% duty cycle at the highest trip point in the case
the hardware DCVS cannot handle the temperature surge, and try our best to
avoid reaching the critical temperature trip point which should trigger an
inevitable thermal shutdown.

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
arch/arm64/boot/dts/qcom/sm8650.dtsi | 274 +++++++++++++++++++++++++++--------
1 file changed, 214 insertions(+), 60 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/sm8650.dtsi b/arch/arm64/boot/dts/qcom/sm8650.dtsi
index 25e47505adcb790d09f1d2726386438487255824..448374a32e07151e35727d92fab77356769aea8a 100644
--- a/arch/arm64/boot/dts/qcom/sm8650.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8650.dtsi
@@ -99,6 +99,13 @@ l3_0: l3-cache {
 					cache-unified;
 				};
 			};
+
+			cpu0_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
+
 		};

 		cpu1: cpu@100 {
@@ -119,6 +126,12 @@ cpu1: cpu@100 {
 			qcom,freq-domain = <&cpufreq_hw 0>;

 			#cooling-cells = <2>;
+
+			cpu1_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};

 		cpu2: cpu@200 {
@@ -146,6 +159,12 @@ l2_200: l2-cache {
 				cache-unified;
 				next-level-cache = <&l3_0>;
 			};
+
+			cpu2_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};

 		cpu3: cpu@300 {
@@ -166,6 +185,12 @@ cpu3: cpu@300 {
 			qcom,freq-domain = <&cpufreq_hw 3>;

 			#cooling-cells = <2>;
+
+			cpu3_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};

 		cpu4: cpu@400 {
@@ -193,6 +218,12 @@ l2_400: l2-cache {
 				cache-unified;
 				next-level-cache = <&l3_0>;
 			};
+
+			cpu4_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};

 		cpu5: cpu@500 {
@@ -220,6 +251,12 @@ l2_500: l2-cache {
 				cache-unified;
 				next-level-cache = <&l3_0>;
 			};
+
+			cpu5_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};

 		cpu6: cpu@600 {
@@ -247,6 +284,12 @@ l2_600: l2-cache {
 				cache-unified;
 				next-level-cache = <&l3_0>;
 			};
+
+			cpu6_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};

 		cpu7: cpu@700 {
@@ -274,6 +317,12 @@ l2_700: l2-cache {
 				cache-unified;
 				next-level-cache = <&l3_0>;
 			};
+
+			cpu7_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};

 		cpu-map {
@@ -5752,23 +5801,30 @@ cpu2-top-thermal {

 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

-				trip-point1 {
-					temperature = <95000>;
+				cpu2_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

 				cpu2-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu2_top_alert1>;
+					cooling-device = <&cpu2_idle 100 100>;
+				};
+			};
 		};

 		cpu2-bottom-thermal {
@@ -5776,23 +5832,30 @@ cpu2-bottom-thermal {

 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

-				trip-point1 {
-					temperature = <95000>;
+				cpu2_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

 				cpu2-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu2_bottom_alert1>;
+					cooling-device = <&cpu2_idle 100 100>;
+				};
+			};
 		};

 		cpu3-top-thermal {
@@ -5800,23 +5863,30 @@ cpu3-top-thermal {

 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

-				trip-point1 {
-					temperature = <95000>;
+				cpu3_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

 				cpu3-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu3_top_alert1>;
+					cooling-device = <&cpu3_idle 100 100>;
+				};
+			};
 		};

 		cpu3-bottom-thermal {
@@ -5824,23 +5894,30 @@ cpu3-bottom-thermal {

 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

-				trip-point1 {
-					temperature = <95000>;
+				cpu3_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

 				cpu3-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu3_bottom_alert1>;
+					cooling-device = <&cpu3_idle 100 100>;
+				};
+			};
 		};

 		cpu4-top-thermal {
@@ -5848,23 +5925,30 @@ cpu4-top-thermal {

 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

-				trip-point1 {
-					temperature = <95000>;
+				cpu4_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

 				cpu4-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu4_top_alert1>;
+					cooling-device = <&cpu4_idle 100 100>;
+				};
+			};
 		};

 		cpu4-bottom-thermal {
@@ -5872,23 +5956,30 @@ cpu4-bottom-thermal {

 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

-				trip-point1 {
-					temperature = <95000>;
+				cpu4_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

 				cpu4-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu4_bottom_alert1>;
+					cooling-device = <&cpu4_idle 100 100>;
+				};
+			};
 		};

 		cpu5-top-thermal {
@@ -5896,23 +5987,30 @@ cpu5-top-thermal {

 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

-				trip-point1 {
-					temperature = <95000>;
+				cpu5_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

 				cpu5-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu5_top_alert1>;
+					cooling-device = <&cpu5_idle 100 100>;
+				};
+			};
 		};

 		cpu5-bottom-thermal {
@@ -5920,23 +6018,30 @@ cpu5-bottom-thermal {

 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

-				trip-point1 {
-					temperature = <95000>;
+				cpu5_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

 				cpu5-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu5_bottom_alert1>;
+					cooling-device = <&cpu5_idle 100 100>;
+				};
+			};
 		};

 		cpu6-top-thermal {
@@ -5944,23 +6049,30 @@ cpu6-top-thermal {

 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

-				trip-point1 {
-					temperature = <95000>;
+				cpu6_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

 				cpu6-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu6_top_alert1>;
+					cooling-device = <&cpu6_idle 100 100>;
+				};
+			};
 		};

 		cpu6-bottom-thermal {
@@ -5968,23 +6080,30 @@ cpu6-bottom-thermal {

 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

-				trip-point1 {
-					temperature = <95000>;
+				cpu6_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

 				cpu6-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu6_bottom_alert1>;
+					cooling-device = <&cpu6_idle 100 100>;
+				};
+			};
 		};

 		aoss1-thermal {
@@ -6010,23 +6129,30 @@ cpu7-top-thermal {

 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

-				trip-point1 {
-					temperature = <95000>;
+				cpu7_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

 				cpu7-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu7_top_alert1>;
+					cooling-device = <&cpu7_idle 100 100>;
+				};
+			};
 		};

 		cpu7-middle-thermal {
@@ -6034,23 +6160,30 @@ cpu7-middle-thermal {

 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

-				trip-point1 {
-					temperature = <95000>;
+				cpu7_middle_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

 				cpu7-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu7_middle_alert1>;
+					cooling-device = <&cpu7_idle 100 100>;
+				};
+			};
 		};

 		cpu7-bottom-thermal {
@@ -6058,23 +6191,30 @@ cpu7-bottom-thermal {

 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

-				trip-point1 {
-					temperature = <95000>;
+				cpu7_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

 				cpu7-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu7_bottom_alert1>;
+					cooling-device = <&cpu7_idle 100 100>;
+				};
+			};
 		};

 		cpu0-thermal {
@@ -6082,23 +6222,30 @@ cpu0-thermal {

 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

-				trip-point1 {
-					temperature = <95000>;
+				cpu0_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

 				cpu0-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu0_alert1>;
+					cooling-device = <&cpu0_idle 100 100>;
+				};
+			};
 		};

 		cpu1-thermal {
@@ -6106,23 +6253,30 @@ cpu1-thermal {

 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

-				trip-point1 {
-					temperature = <95000>;
+				cpu1_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};

 				cpu1-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu1_alert1>;
+					cooling-device = <&cpu1_idle 100 100>;
+				};
+			};
 		};

 		nsphvx0-thermal {

--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] arm64: dts: qcom: sm8650: setup cpu thermal with idle on high temperatures
2025-01-03 14:38 ` [PATCH 1/2] arm64: dts: qcom: sm8650: setup cpu thermal with idle on high temperatures Neil Armstrong
@ 2025-01-06 23:39 ` Bjorn Andersson
2025-01-07 8:13 ` Neil Armstrong
0 siblings, 1 reply; 16+ messages in thread
From: Bjorn Andersson @ 2025-01-06 23:39 UTC (permalink / raw)
To: Neil Armstrong
Cc: Konrad Dybcio, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
linux-arm-msm, devicetree, linux-kernel
On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in a
> hardware-controlled loop using the LMH and EPSS blocks with constraints and
> OPPs programmed in the board firmware.
>
> Since the hardware does a better job at maintaining the CPU temperatures
> in an acceptable range by taking into account more parameters like the die
> characteristics or other factory-fused values, it makes no sense to try
> and reproduce a similar set of constraints with the Linux cpufreq thermal
> core.
>
> In addition, the tsens IP is responsible for monitoring the temperature
> across the SoC and the current settings will heavily trigger the tsens
> UP/LOW interrupts if the CPU temperatures reach the hardware thermal
> constraints which are currently defined in the DT. And since the CPUs
> are not hooked in the thermal trip points, the potential interrupts and
> calculations are a waste of system resources.
>
> Instead, set higher temperatures in the CPU trip points, and hook some CPU
> idle injector with a 100% duty cycle at the highest trip point in the case
> the hardware DCVS cannot handle the temperature surge, and try our best to
> avoid reaching the critical temperature trip point which should trigger an
> inevitable thermal shutdown.
>

Are you able to hit these higher temperatures? Do you have some test
case where the idle-injection proves to be successful in blocking us from
reaching the critical temp?

E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and defining only
the critical trip for when the hardware fails us.

I have no concerns at all about "removing" the 90C trip point, that
makes total sense to me - let the hardware keep the cores as close to
max as possible, and then use some slower sensor for keeping the system
temperature in check (such as the x13s skin sensor).

PS. The described behavior should apply to anything SDM845 and newer, so
I'd like to see this set/document a precedent for other platforms.

Regards,
Bjorn

^ permalink raw reply [flat|nested] 16+ messages in thread
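For comparison, the approach Bjorn describes for the X13s (rely on LMH/EPSS and keep only a critical trip for when the hardware fails) might look like the sketch below. It is an illustration under assumptions, not the SC8280XP devicetree: the 115000 value is simply borrowed from this patch, and the `tsens_stub` node, its compatible and the sensor index are placeholders so the fragment stands on its own.

```dts
/dts-v1/;

/ {
	/* Placeholder sensor (assumption): stands in for the SoC tsens block */
	tsens_stub: thermal-sensor {
		compatible = "vendor,tsens-stub";
		#thermal-sensor-cells = <1>;
	};

	thermal-zones {
		cpu0-thermal {
			thermal-sensors = <&tsens_stub 0>;

			trips {
				/*
				 * No passive trips and no cooling-maps: throttling is
				 * left entirely to the hardware loop, and the OS only
				 * reacts at the critical temperature.
				 */
				cpu0-critical {
					temperature = <115000>; /* borrowed from this patch */
					hysteresis = <1000>;
					type = "critical";
				};
			};
		};
	};
};
```

Compared with the series under review, this drops both passive trips and the idle-injection mapping, so no tsens UP/LOW interrupts fire below the critical temperature.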
* Re: [PATCH 1/2] arm64: dts: qcom: sm8650: setup cpu thermal with idle on high temperatures
2025-01-06 23:39 ` Bjorn Andersson
@ 2025-01-07 8:13 ` Neil Armstrong
2025-01-08 3:11 ` Bjorn Andersson
0 siblings, 1 reply; 16+ messages in thread
From: Neil Armstrong @ 2025-01-07 8:13 UTC (permalink / raw)
To: Bjorn Andersson
Cc: Konrad Dybcio, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
linux-arm-msm, devicetree, linux-kernel
Hi,

On 07/01/2025 00:39, Bjorn Andersson wrote:
> On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
>> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in a
>> hardware-controlled loop using the LMH and EPSS blocks with constraints and
>> OPPs programmed in the board firmware.
>>
>> Since the hardware does a better job at maintaining the CPU temperatures
>> in an acceptable range by taking into account more parameters like the die
>> characteristics or other factory-fused values, it makes no sense to try
>> and reproduce a similar set of constraints with the Linux cpufreq thermal
>> core.
>>
>> In addition, the tsens IP is responsible for monitoring the temperature
>> across the SoC and the current settings will heavily trigger the tsens
>> UP/LOW interrupts if the CPU temperatures reach the hardware thermal
>> constraints which are currently defined in the DT. And since the CPUs
>> are not hooked in the thermal trip points, the potential interrupts and
>> calculations are a waste of system resources.
>>
>> Instead, set higher temperatures in the CPU trip points, and hook some CPU
>> idle injector with a 100% duty cycle at the highest trip point in the case
>> the hardware DCVS cannot handle the temperature surge, and try our best to
>> avoid reaching the critical temperature trip point which should trigger an
>> inevitable thermal shutdown.
>>
>
> Are you able to hit these higher temperatures? Do you have some test
> case where the idle-injection proves to be successful in blocking us from
> reaching the critical temp?

No, I've been able to test idle-injection and observed a noticeable effect,
but I had to set lower trip points. Do you know how I can easily "block"
LMH/EPSS from scaling down and let the temperature go higher?

>
> E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and defining only
> the critical trip for when the hardware fails us.

It's the goal here as well.

>
>
> I have no concerns at all about "removing" the 90C trip point, that
> makes total sense to me - let the hardware keep the cores as close to
> max as possible, and then use some slower sensor for keeping the system
> temperature in check (such as the x13s skin sensor).
>
>
> PS. The described behavior should apply to anything SDM845 and newer, so
> I'd like to see this set/document a precedent for other platforms.
>
> Regards,
> Bjorn
>
>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>> ---
>> arch/arm64/boot/dts/qcom/sm8650.dtsi | 274 +++++++++++++++++++++++++++--------
>> 1 file changed, 214 insertions(+), 60 deletions(-)
>>
>> diff --git a/arch/arm64/boot/dts/qcom/sm8650.dtsi b/arch/arm64/boot/dts/qcom/sm8650.dtsi
>> index 25e47505adcb790d09f1d2726386438487255824..448374a32e07151e35727d92fab77356769aea8a 100644
>> --- a/arch/arm64/boot/dts/qcom/sm8650.dtsi
>> +++ b/arch/arm64/boot/dts/qcom/sm8650.dtsi
>> @@ -99,6 +99,13 @@ l3_0: l3-cache {
>>  					cache-unified;
>>  				};
>>  			};
>> +
>> +			cpu0_idle: thermal-idle {
>> +				#cooling-cells = <2>;
>> +				duration-us = <800000>;
>> +				exit-latency-us = <10000>;
>> +			};
>> +
>>  		};
>>
>>  		cpu1: cpu@100 {
>> @@ -119,6 +126,12 @@ cpu1: cpu@100 {
>>  			qcom,freq-domain = <&cpufreq_hw 0>;
>>
>>  			#cooling-cells = <2>;
>> +
>> +			cpu1_idle: thermal-idle {
>> +				#cooling-cells = <2>;
>> +				duration-us = <800000>;
>> +				exit-latency-us = <10000>;
>> +			};
>>  		};
>>
>>  		cpu2: cpu@200 {
>> @@ -146,6 +159,12 @@ l2_200: l2-cache {
>>  				cache-unified;
>>  				next-level-cache =
<&l3_0>; >> }; >> + >> + cpu2_idle: thermal-idle { >> + #cooling-cells = <2>; >> + duration-us = <800000>; >> + exit-latency-us = <10000>; >> + }; >> }; >> >> cpu3: cpu@300 { >> @@ -166,6 +185,12 @@ cpu3: cpu@300 { >> qcom,freq-domain = <&cpufreq_hw 3>; >> >> #cooling-cells = <2>; >> + >> + cpu3_idle: thermal-idle { >> + #cooling-cells = <2>; >> + duration-us = <800000>; >> + exit-latency-us = <10000>; >> + }; >> }; >> >> cpu4: cpu@400 { >> @@ -193,6 +218,12 @@ l2_400: l2-cache { >> cache-unified; >> next-level-cache = <&l3_0>; >> }; >> + >> + cpu4_idle: thermal-idle { >> + #cooling-cells = <2>; >> + duration-us = <800000>; >> + exit-latency-us = <10000>; >> + }; >> }; >> >> cpu5: cpu@500 { >> @@ -220,6 +251,12 @@ l2_500: l2-cache { >> cache-unified; >> next-level-cache = <&l3_0>; >> }; >> + >> + cpu5_idle: thermal-idle { >> + #cooling-cells = <2>; >> + duration-us = <800000>; >> + exit-latency-us = <10000>; >> + }; >> }; >> >> cpu6: cpu@600 { >> @@ -247,6 +284,12 @@ l2_600: l2-cache { >> cache-unified; >> next-level-cache = <&l3_0>; >> }; >> + >> + cpu6_idle: thermal-idle { >> + #cooling-cells = <2>; >> + duration-us = <800000>; >> + exit-latency-us = <10000>; >> + }; >> }; >> >> cpu7: cpu@700 { >> @@ -274,6 +317,12 @@ l2_700: l2-cache { >> cache-unified; >> next-level-cache = <&l3_0>; >> }; >> + >> + cpu7_idle: thermal-idle { >> + #cooling-cells = <2>; >> + duration-us = <800000>; >> + exit-latency-us = <10000>; >> + }; >> }; >> >> cpu-map { >> @@ -5752,23 +5801,30 @@ cpu2-top-thermal { >> >> trips { >> trip-point0 { >> - temperature = <90000>; >> + temperature = <108000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> - trip-point1 { >> - temperature = <95000>; >> + cpu2_top_alert1: trip-point1 { >> + temperature = <110000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> cpu2-critical { >> - temperature = <110000>; >> + temperature = <115000>; >> hysteresis = <1000>; >> type = "critical"; >> }; >> }; >> + >> + cooling-maps { >> + map0 { >> + 
trip = <&cpu2_top_alert1>; >> + cooling-device = <&cpu2_idle 100 100>; >> + }; >> + }; >> }; >> >> cpu2-bottom-thermal { >> @@ -5776,23 +5832,30 @@ cpu2-bottom-thermal { >> >> trips { >> trip-point0 { >> - temperature = <90000>; >> + temperature = <108000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> - trip-point1 { >> - temperature = <95000>; >> + cpu2_bottom_alert1: trip-point1 { >> + temperature = <110000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> cpu2-critical { >> - temperature = <110000>; >> + temperature = <115000>; >> hysteresis = <1000>; >> type = "critical"; >> }; >> }; >> + >> + cooling-maps { >> + map0 { >> + trip = <&cpu2_bottom_alert1>; >> + cooling-device = <&cpu2_idle 100 100>; >> + }; >> + }; >> }; >> >> cpu3-top-thermal { >> @@ -5800,23 +5863,30 @@ cpu3-top-thermal { >> >> trips { >> trip-point0 { >> - temperature = <90000>; >> + temperature = <108000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> - trip-point1 { >> - temperature = <95000>; >> + cpu3_top_alert1: trip-point1 { >> + temperature = <110000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> cpu3-critical { >> - temperature = <110000>; >> + temperature = <115000>; >> hysteresis = <1000>; >> type = "critical"; >> }; >> }; >> + >> + cooling-maps { >> + map0 { >> + trip = <&cpu3_top_alert1>; >> + cooling-device = <&cpu3_idle 100 100>; >> + }; >> + }; >> }; >> >> cpu3-bottom-thermal { >> @@ -5824,23 +5894,30 @@ cpu3-bottom-thermal { >> >> trips { >> trip-point0 { >> - temperature = <90000>; >> + temperature = <108000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> - trip-point1 { >> - temperature = <95000>; >> + cpu3_bottom_alert1: trip-point1 { >> + temperature = <110000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> cpu3-critical { >> - temperature = <110000>; >> + temperature = <115000>; >> hysteresis = <1000>; >> type = "critical"; >> }; >> }; >> + >> + cooling-maps { >> + map0 { >> + trip = <&cpu3_bottom_alert1>; 
>> + cooling-device = <&cpu3_idle 100 100>; >> + }; >> + }; >> }; >> >> cpu4-top-thermal { >> @@ -5848,23 +5925,30 @@ cpu4-top-thermal { >> >> trips { >> trip-point0 { >> - temperature = <90000>; >> + temperature = <108000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> - trip-point1 { >> - temperature = <95000>; >> + cpu4_top_alert1: trip-point1 { >> + temperature = <110000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> cpu4-critical { >> - temperature = <110000>; >> + temperature = <115000>; >> hysteresis = <1000>; >> type = "critical"; >> }; >> }; >> + >> + cooling-maps { >> + map0 { >> + trip = <&cpu4_top_alert1>; >> + cooling-device = <&cpu4_idle 100 100>; >> + }; >> + }; >> }; >> >> cpu4-bottom-thermal { >> @@ -5872,23 +5956,30 @@ cpu4-bottom-thermal { >> >> trips { >> trip-point0 { >> - temperature = <90000>; >> + temperature = <108000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> - trip-point1 { >> - temperature = <95000>; >> + cpu4_bottom_alert1: trip-point1 { >> + temperature = <110000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> cpu4-critical { >> - temperature = <110000>; >> + temperature = <115000>; >> hysteresis = <1000>; >> type = "critical"; >> }; >> }; >> + >> + cooling-maps { >> + map0 { >> + trip = <&cpu4_bottom_alert1>; >> + cooling-device = <&cpu4_idle 100 100>; >> + }; >> + }; >> }; >> >> cpu5-top-thermal { >> @@ -5896,23 +5987,30 @@ cpu5-top-thermal { >> >> trips { >> trip-point0 { >> - temperature = <90000>; >> + temperature = <108000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> - trip-point1 { >> - temperature = <95000>; >> + cpu5_top_alert1: trip-point1 { >> + temperature = <110000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> cpu5-critical { >> - temperature = <110000>; >> + temperature = <115000>; >> hysteresis = <1000>; >> type = "critical"; >> }; >> }; >> + >> + cooling-maps { >> + map0 { >> + trip = <&cpu5_top_alert1>; >> + cooling-device = <&cpu5_idle 100 
100>; >> + }; >> + }; >> }; >> >> cpu5-bottom-thermal { >> @@ -5920,23 +6018,30 @@ cpu5-bottom-thermal { >> >> trips { >> trip-point0 { >> - temperature = <90000>; >> + temperature = <108000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> - trip-point1 { >> - temperature = <95000>; >> + cpu5_bottom_alert1: trip-point1 { >> + temperature = <110000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> cpu5-critical { >> - temperature = <110000>; >> + temperature = <115000>; >> hysteresis = <1000>; >> type = "critical"; >> }; >> }; >> + >> + cooling-maps { >> + map0 { >> + trip = <&cpu5_bottom_alert1>; >> + cooling-device = <&cpu5_idle 100 100>; >> + }; >> + }; >> }; >> >> cpu6-top-thermal { >> @@ -5944,23 +6049,30 @@ cpu6-top-thermal { >> >> trips { >> trip-point0 { >> - temperature = <90000>; >> + temperature = <108000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> - trip-point1 { >> - temperature = <95000>; >> + cpu6_top_alert1: trip-point1 { >> + temperature = <110000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> cpu6-critical { >> - temperature = <110000>; >> + temperature = <115000>; >> hysteresis = <1000>; >> type = "critical"; >> }; >> }; >> + >> + cooling-maps { >> + map0 { >> + trip = <&cpu6_top_alert1>; >> + cooling-device = <&cpu6_idle 100 100>; >> + }; >> + }; >> }; >> >> cpu6-bottom-thermal { >> @@ -5968,23 +6080,30 @@ cpu6-bottom-thermal { >> >> trips { >> trip-point0 { >> - temperature = <90000>; >> + temperature = <108000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> - trip-point1 { >> - temperature = <95000>; >> + cpu6_bottom_alert1: trip-point1 { >> + temperature = <110000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> cpu6-critical { >> - temperature = <110000>; >> + temperature = <115000>; >> hysteresis = <1000>; >> type = "critical"; >> }; >> }; >> + >> + cooling-maps { >> + map0 { >> + trip = <&cpu6_bottom_alert1>; >> + cooling-device = <&cpu6_idle 100 100>; >> + }; >> + }; >> }; 
>> >> aoss1-thermal { >> @@ -6010,23 +6129,30 @@ cpu7-top-thermal { >> >> trips { >> trip-point0 { >> - temperature = <90000>; >> + temperature = <108000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> - trip-point1 { >> - temperature = <95000>; >> + cpu7_top_alert1: trip-point1 { >> + temperature = <110000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> cpu7-critical { >> - temperature = <110000>; >> + temperature = <115000>; >> hysteresis = <1000>; >> type = "critical"; >> }; >> }; >> + >> + cooling-maps { >> + map0 { >> + trip = <&cpu7_top_alert1>; >> + cooling-device = <&cpu7_idle 100 100>; >> + }; >> + }; >> }; >> >> cpu7-middle-thermal { >> @@ -6034,23 +6160,30 @@ cpu7-middle-thermal { >> >> trips { >> trip-point0 { >> - temperature = <90000>; >> + temperature = <108000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> - trip-point1 { >> - temperature = <95000>; >> + cpu7_middle_alert1: trip-point1 { >> + temperature = <110000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> cpu7-critical { >> - temperature = <110000>; >> + temperature = <115000>; >> hysteresis = <1000>; >> type = "critical"; >> }; >> }; >> + >> + cooling-maps { >> + map0 { >> + trip = <&cpu7_middle_alert1>; >> + cooling-device = <&cpu7_idle 100 100>; >> + }; >> + }; >> }; >> >> cpu7-bottom-thermal { >> @@ -6058,23 +6191,30 @@ cpu7-bottom-thermal { >> >> trips { >> trip-point0 { >> - temperature = <90000>; >> + temperature = <108000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> - trip-point1 { >> - temperature = <95000>; >> + cpu7_bottom_alert1: trip-point1 { >> + temperature = <110000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> cpu7-critical { >> - temperature = <110000>; >> + temperature = <115000>; >> hysteresis = <1000>; >> type = "critical"; >> }; >> }; >> + >> + cooling-maps { >> + map0 { >> + trip = <&cpu7_bottom_alert1>; >> + cooling-device = <&cpu7_idle 100 100>; >> + }; >> + }; >> }; >> >> cpu0-thermal { >> @@ 
-6082,23 +6222,30 @@ cpu0-thermal { >> >> trips { >> trip-point0 { >> - temperature = <90000>; >> + temperature = <108000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> - trip-point1 { >> - temperature = <95000>; >> + cpu0_alert1: trip-point1 { >> + temperature = <110000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> cpu0-critical { >> - temperature = <110000>; >> + temperature = <115000>; >> hysteresis = <1000>; >> type = "critical"; >> }; >> }; >> + >> + cooling-maps { >> + map0 { >> + trip = <&cpu0_alert1>; >> + cooling-device = <&cpu0_idle 100 100>; >> + }; >> + }; >> }; >> >> cpu1-thermal { >> @@ -6106,23 +6253,30 @@ cpu1-thermal { >> >> trips { >> trip-point0 { >> - temperature = <90000>; >> + temperature = <108000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> - trip-point1 { >> - temperature = <95000>; >> + cpu1_alert1: trip-point1 { >> + temperature = <110000>; >> hysteresis = <2000>; >> type = "passive"; >> }; >> >> cpu1-critical { >> - temperature = <110000>; >> + temperature = <115000>; >> hysteresis = <1000>; >> type = "critical"; >> }; >> }; >> + >> + cooling-maps { >> + map0 { >> + trip = <&cpu1_alert1>; >> + cooling-device = <&cpu1_idle 100 100>; >> + }; >> + }; >> }; >> >> nsphvx0-thermal { >> >> -- >> 2.34.1 >> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] arm64: dts: qcom: sm8650: setup cpu thermal with idle on high temperatures
2025-01-07 8:13 ` Neil Armstrong
@ 2025-01-08 3:11 ` Bjorn Andersson
2025-01-08 9:15 ` Neil Armstrong
0 siblings, 1 reply; 16+ messages in thread
From: Bjorn Andersson @ 2025-01-08 3:11 UTC (permalink / raw)
To: Neil Armstrong
Cc: Konrad Dybcio, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
linux-arm-msm, devicetree, linux-kernel

On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote:
> Hi,
>
> On 07/01/2025 00:39, Bjorn Andersson wrote:
> > On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
> > > On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an
> > > hardware controlled loop using the LMH and EPSS blocks with constraints and
> > > OPPs programmed in the board firmware.
> > >
> > > Since the Hardware does a better job at maintaining the CPUs temperature
> > > in an acceptable range by taking in account more parameters like the die
> > > characteristics or other factory fused values, it makes no sense to try
> > > and reproduce a similar set of constraints with the Linux cpufreq thermal
> > > core.
> > >
> > > In addition, the tsens IP is responsible for monitoring the temperature
> > > across the SoC and the current settings will heavily trigger the tsens
> > > UP/LOW interrupts if the CPU temperatures reaches the hardware thermal
> > > constraints which are currently defined in the DT. And since the CPUs
> > > are not hooked in the thermal trip points, the potential interrupts and
> > > calculations are a waste of system resources.
> > >
> > > Instead, set higher temperatures in the CPU trip points, and hook some CPU
> > > idle injector with a 100% duty cycle at the highest trip point in the case
> > > the hardware DCVS cannot handle the temperature surge, and try our best to
> > > avoid reaching the critical temperature trip point which should trigger an
> > > inevitable thermal shutdown.
> > >
> >
> > Are you able to hit these higher temperatures? Do you have some test
> > case where the idle-injection shows to be successful in blocking us from
> > reaching the critical temp?
>
> No, I've been able to test idle-injection and observed a noticeable effect
> but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from
> scaling down and let the temp go higher ?
>

I don't know how to override that configuration.

> >
> > E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
> > the critical trip for when the hardware fails us.
>
> It's the goal here aswell
>

How about simplifying the patch by removing the idle-injection step and
just relying on LMH/EPSS and the "critical" trip (at least until someone
can prove that there's value in the extra mitigation)?

Regards,
Bjorn

> >
> >
> > I have no concerns at all about "removing" the 90C trip point, that
> > makes total sense to me - let the hardware keep the cores as close to
> > max as possible, and then use some slower sensor for keeping the system
> > temperature in check (such as the x13s skin sensor).
> >
> >
> > PS. The described behavior should apply to anything SDM845 and newer, so
> > I'd like to see this set/document precedence for other platforms.
* Re: [PATCH 1/2] arm64: dts: qcom: sm8650: setup cpu thermal with idle on high temperatures
2025-01-08 3:11 ` Bjorn Andersson
@ 2025-01-08 9:15 ` Neil Armstrong
2025-01-09 15:18 ` Konrad Dybcio
2025-01-09 21:01 ` Bjorn Andersson
0 siblings, 2 replies; 16+ messages in thread
From: Neil Armstrong @ 2025-01-08 9:15 UTC (permalink / raw)
To: Bjorn Andersson
Cc: Konrad Dybcio, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
linux-arm-msm, devicetree, linux-kernel

On 08/01/2025 04:11, Bjorn Andersson wrote:
> On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote:
>> Hi,
>>
>> On 07/01/2025 00:39, Bjorn Andersson wrote:
>>> On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
>>>> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an
>>>> hardware controlled loop using the LMH and EPSS blocks with constraints and
>>>> OPPs programmed in the board firmware.
>>>>
>>>> Since the Hardware does a better job at maintaining the CPUs temperature
>>>> in an acceptable range by taking in account more parameters like the die
>>>> characteristics or other factory fused values, it makes no sense to try
>>>> and reproduce a similar set of constraints with the Linux cpufreq thermal
>>>> core.
>>>>
>>>> In addition, the tsens IP is responsible for monitoring the temperature
>>>> across the SoC and the current settings will heavily trigger the tsens
>>>> UP/LOW interrupts if the CPU temperatures reaches the hardware thermal
>>>> constraints which are currently defined in the DT. And since the CPUs
>>>> are not hooked in the thermal trip points, the potential interrupts and
>>>> calculations are a waste of system resources.
>>>>
>>>> Instead, set higher temperatures in the CPU trip points, and hook some CPU
>>>> idle injector with a 100% duty cycle at the highest trip point in the case
>>>> the hardware DCVS cannot handle the temperature surge, and try our best to
>>>> avoid reaching the critical temperature trip point which should trigger an
>>>> inevitable thermal shutdown.
>>>>
>>>
>>> Are you able to hit these higher temperatures? Do you have some test
>>> case where the idle-injection shows to be successful in blocking us from
>>> reaching the critical temp?
>>
>> No, I've been able to test idle-injection and observed a noticeable effect
>> but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from
>> scaling down and let the temp go higher ?
>>
>
> I don't know how to override that configuration.
>
>>>
>>> E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
>>> the critical trip for when the hardware fails us.
>>
>> It's the goal here aswell
>>
>
> How about simplifying the patch by removing the idle-injection step and
> just rely on LMH/EPSS and the "critical" trip (at least until someone
> can prove that there's value in the extra mitigation)?

OK, but I see value in this idle-injection mitigation: in the case where
LMH/EPSS fails, the only lever left under HLOS control is to stop scheduling
tasks, since the frequency won't be able to scale anymore.

Anyway, I agree it can be added later on, so should I drop the 2 trip points
and only leave the critical one?

>
> Regards,
> Bjorn
>
>>>
>>>
>>> I have no concerns at all about "removing" the 90C trip point, that
>>> makes total sense to me - let the hardware keep the cores as close to
>>> max as possible, and then use some slower sensor for keeping the system
>>> temperature in check (such as the x13s skin sensor).
>>>
>>>
>>> PS. The described behavior should apply to anything SDM845 and newer, so
>>> I'd like to see this set/document precedence for other platforms.
>>> >>> Regards, >>> Bjorn >>> >>>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> >>>> --- >>>> arch/arm64/boot/dts/qcom/sm8650.dtsi | 274 +++++++++++++++++++++++++++-------- >>>> 1 file changed, 214 insertions(+), 60 deletions(-) >>>> >>>> diff --git a/arch/arm64/boot/dts/qcom/sm8650.dtsi b/arch/arm64/boot/dts/qcom/sm8650.dtsi >>>> index 25e47505adcb790d09f1d2726386438487255824..448374a32e07151e35727d92fab77356769aea8a 100644 >>>> --- a/arch/arm64/boot/dts/qcom/sm8650.dtsi >>>> +++ b/arch/arm64/boot/dts/qcom/sm8650.dtsi >>>> @@ -99,6 +99,13 @@ l3_0: l3-cache { >>>> cache-unified; >>>> }; >>>> }; >>>> + >>>> + cpu0_idle: thermal-idle { >>>> + #cooling-cells = <2>; >>>> + duration-us = <800000>; >>>> + exit-latency-us = <10000>; >>>> + }; >>>> + >>>> }; >>>> cpu1: cpu@100 { >>>> @@ -119,6 +126,12 @@ cpu1: cpu@100 { >>>> qcom,freq-domain = <&cpufreq_hw 0>; >>>> #cooling-cells = <2>; >>>> + >>>> + cpu1_idle: thermal-idle { >>>> + #cooling-cells = <2>; >>>> + duration-us = <800000>; >>>> + exit-latency-us = <10000>; >>>> + }; >>>> }; >>>> cpu2: cpu@200 { >>>> @@ -146,6 +159,12 @@ l2_200: l2-cache { >>>> cache-unified; >>>> next-level-cache = <&l3_0>; >>>> }; >>>> + >>>> + cpu2_idle: thermal-idle { >>>> + #cooling-cells = <2>; >>>> + duration-us = <800000>; >>>> + exit-latency-us = <10000>; >>>> + }; >>>> }; >>>> cpu3: cpu@300 { >>>> @@ -166,6 +185,12 @@ cpu3: cpu@300 { >>>> qcom,freq-domain = <&cpufreq_hw 3>; >>>> #cooling-cells = <2>; >>>> + >>>> + cpu3_idle: thermal-idle { >>>> + #cooling-cells = <2>; >>>> + duration-us = <800000>; >>>> + exit-latency-us = <10000>; >>>> + }; >>>> }; >>>> cpu4: cpu@400 { >>>> @@ -193,6 +218,12 @@ l2_400: l2-cache { >>>> cache-unified; >>>> next-level-cache = <&l3_0>; >>>> }; >>>> + >>>> + cpu4_idle: thermal-idle { >>>> + #cooling-cells = <2>; >>>> + duration-us = <800000>; >>>> + exit-latency-us = <10000>; >>>> + }; >>>> }; >>>> cpu5: cpu@500 { >>>> @@ -220,6 +251,12 @@ l2_500: l2-cache { >>>> cache-unified; >>>> 
next-level-cache = <&l3_0>; >>>> }; >>>> + >>>> + cpu5_idle: thermal-idle { >>>> + #cooling-cells = <2>; >>>> + duration-us = <800000>; >>>> + exit-latency-us = <10000>; >>>> + }; >>>> }; >>>> cpu6: cpu@600 { >>>> @@ -247,6 +284,12 @@ l2_600: l2-cache { >>>> cache-unified; >>>> next-level-cache = <&l3_0>; >>>> }; >>>> + >>>> + cpu6_idle: thermal-idle { >>>> + #cooling-cells = <2>; >>>> + duration-us = <800000>; >>>> + exit-latency-us = <10000>; >>>> + }; >>>> }; >>>> cpu7: cpu@700 { >>>> @@ -274,6 +317,12 @@ l2_700: l2-cache { >>>> cache-unified; >>>> next-level-cache = <&l3_0>; >>>> }; >>>> + >>>> + cpu7_idle: thermal-idle { >>>> + #cooling-cells = <2>; >>>> + duration-us = <800000>; >>>> + exit-latency-us = <10000>; >>>> + }; >>>> }; >>>> cpu-map { >>>> @@ -5752,23 +5801,30 @@ cpu2-top-thermal { >>>> trips { >>>> trip-point0 { >>>> - temperature = <90000>; >>>> + temperature = <108000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> - trip-point1 { >>>> - temperature = <95000>; >>>> + cpu2_top_alert1: trip-point1 { >>>> + temperature = <110000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> cpu2-critical { >>>> - temperature = <110000>; >>>> + temperature = <115000>; >>>> hysteresis = <1000>; >>>> type = "critical"; >>>> }; >>>> }; >>>> + >>>> + cooling-maps { >>>> + map0 { >>>> + trip = <&cpu2_top_alert1>; >>>> + cooling-device = <&cpu2_idle 100 100>; >>>> + }; >>>> + }; >>>> }; >>>> cpu2-bottom-thermal { >>>> @@ -5776,23 +5832,30 @@ cpu2-bottom-thermal { >>>> trips { >>>> trip-point0 { >>>> - temperature = <90000>; >>>> + temperature = <108000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> - trip-point1 { >>>> - temperature = <95000>; >>>> + cpu2_bottom_alert1: trip-point1 { >>>> + temperature = <110000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> cpu2-critical { >>>> - temperature = <110000>; >>>> + temperature = <115000>; >>>> hysteresis = <1000>; >>>> type = "critical"; >>>> }; >>>> }; >>>> + 
>>>> + cooling-maps { >>>> + map0 { >>>> + trip = <&cpu2_bottom_alert1>; >>>> + cooling-device = <&cpu2_idle 100 100>; >>>> + }; >>>> + }; >>>> }; >>>> cpu3-top-thermal { >>>> @@ -5800,23 +5863,30 @@ cpu3-top-thermal { >>>> trips { >>>> trip-point0 { >>>> - temperature = <90000>; >>>> + temperature = <108000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> - trip-point1 { >>>> - temperature = <95000>; >>>> + cpu3_top_alert1: trip-point1 { >>>> + temperature = <110000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> cpu3-critical { >>>> - temperature = <110000>; >>>> + temperature = <115000>; >>>> hysteresis = <1000>; >>>> type = "critical"; >>>> }; >>>> }; >>>> + >>>> + cooling-maps { >>>> + map0 { >>>> + trip = <&cpu3_top_alert1>; >>>> + cooling-device = <&cpu3_idle 100 100>; >>>> + }; >>>> + }; >>>> }; >>>> cpu3-bottom-thermal { >>>> @@ -5824,23 +5894,30 @@ cpu3-bottom-thermal { >>>> trips { >>>> trip-point0 { >>>> - temperature = <90000>; >>>> + temperature = <108000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> - trip-point1 { >>>> - temperature = <95000>; >>>> + cpu3_bottom_alert1: trip-point1 { >>>> + temperature = <110000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> cpu3-critical { >>>> - temperature = <110000>; >>>> + temperature = <115000>; >>>> hysteresis = <1000>; >>>> type = "critical"; >>>> }; >>>> }; >>>> + >>>> + cooling-maps { >>>> + map0 { >>>> + trip = <&cpu3_bottom_alert1>; >>>> + cooling-device = <&cpu3_idle 100 100>; >>>> + }; >>>> + }; >>>> }; >>>> cpu4-top-thermal { >>>> @@ -5848,23 +5925,30 @@ cpu4-top-thermal { >>>> trips { >>>> trip-point0 { >>>> - temperature = <90000>; >>>> + temperature = <108000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> - trip-point1 { >>>> - temperature = <95000>; >>>> + cpu4_top_alert1: trip-point1 { >>>> + temperature = <110000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> cpu4-critical { >>>> - temperature = 
<110000>; >>>> + temperature = <115000>; >>>> hysteresis = <1000>; >>>> type = "critical"; >>>> }; >>>> }; >>>> + >>>> + cooling-maps { >>>> + map0 { >>>> + trip = <&cpu4_top_alert1>; >>>> + cooling-device = <&cpu4_idle 100 100>; >>>> + }; >>>> + }; >>>> }; >>>> cpu4-bottom-thermal { >>>> @@ -5872,23 +5956,30 @@ cpu4-bottom-thermal { >>>> trips { >>>> trip-point0 { >>>> - temperature = <90000>; >>>> + temperature = <108000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> - trip-point1 { >>>> - temperature = <95000>; >>>> + cpu4_bottom_alert1: trip-point1 { >>>> + temperature = <110000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> cpu4-critical { >>>> - temperature = <110000>; >>>> + temperature = <115000>; >>>> hysteresis = <1000>; >>>> type = "critical"; >>>> }; >>>> }; >>>> + >>>> + cooling-maps { >>>> + map0 { >>>> + trip = <&cpu4_bottom_alert1>; >>>> + cooling-device = <&cpu4_idle 100 100>; >>>> + }; >>>> + }; >>>> }; >>>> cpu5-top-thermal { >>>> @@ -5896,23 +5987,30 @@ cpu5-top-thermal { >>>> trips { >>>> trip-point0 { >>>> - temperature = <90000>; >>>> + temperature = <108000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> - trip-point1 { >>>> - temperature = <95000>; >>>> + cpu5_top_alert1: trip-point1 { >>>> + temperature = <110000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> cpu5-critical { >>>> - temperature = <110000>; >>>> + temperature = <115000>; >>>> hysteresis = <1000>; >>>> type = "critical"; >>>> }; >>>> }; >>>> + >>>> + cooling-maps { >>>> + map0 { >>>> + trip = <&cpu5_top_alert1>; >>>> + cooling-device = <&cpu5_idle 100 100>; >>>> + }; >>>> + }; >>>> }; >>>> cpu5-bottom-thermal { >>>> @@ -5920,23 +6018,30 @@ cpu5-bottom-thermal { >>>> trips { >>>> trip-point0 { >>>> - temperature = <90000>; >>>> + temperature = <108000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> - trip-point1 { >>>> - temperature = <95000>; >>>> + cpu5_bottom_alert1: trip-point1 { >>>> + 
temperature = <110000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> cpu5-critical { >>>> - temperature = <110000>; >>>> + temperature = <115000>; >>>> hysteresis = <1000>; >>>> type = "critical"; >>>> }; >>>> }; >>>> + >>>> + cooling-maps { >>>> + map0 { >>>> + trip = <&cpu5_bottom_alert1>; >>>> + cooling-device = <&cpu5_idle 100 100>; >>>> + }; >>>> + }; >>>> }; >>>> cpu6-top-thermal { >>>> @@ -5944,23 +6049,30 @@ cpu6-top-thermal { >>>> trips { >>>> trip-point0 { >>>> - temperature = <90000>; >>>> + temperature = <108000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> - trip-point1 { >>>> - temperature = <95000>; >>>> + cpu6_top_alert1: trip-point1 { >>>> + temperature = <110000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> cpu6-critical { >>>> - temperature = <110000>; >>>> + temperature = <115000>; >>>> hysteresis = <1000>; >>>> type = "critical"; >>>> }; >>>> }; >>>> + >>>> + cooling-maps { >>>> + map0 { >>>> + trip = <&cpu6_top_alert1>; >>>> + cooling-device = <&cpu6_idle 100 100>; >>>> + }; >>>> + }; >>>> }; >>>> cpu6-bottom-thermal { >>>> @@ -5968,23 +6080,30 @@ cpu6-bottom-thermal { >>>> trips { >>>> trip-point0 { >>>> - temperature = <90000>; >>>> + temperature = <108000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> - trip-point1 { >>>> - temperature = <95000>; >>>> + cpu6_bottom_alert1: trip-point1 { >>>> + temperature = <110000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> cpu6-critical { >>>> - temperature = <110000>; >>>> + temperature = <115000>; >>>> hysteresis = <1000>; >>>> type = "critical"; >>>> }; >>>> }; >>>> + >>>> + cooling-maps { >>>> + map0 { >>>> + trip = <&cpu6_bottom_alert1>; >>>> + cooling-device = <&cpu6_idle 100 100>; >>>> + }; >>>> + }; >>>> }; >>>> aoss1-thermal { >>>> @@ -6010,23 +6129,30 @@ cpu7-top-thermal { >>>> trips { >>>> trip-point0 { >>>> - temperature = <90000>; >>>> + temperature = <108000>; >>>> hysteresis = <2000>; >>>> type = 
"passive"; >>>> }; >>>> - trip-point1 { >>>> - temperature = <95000>; >>>> + cpu7_top_alert1: trip-point1 { >>>> + temperature = <110000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> cpu7-critical { >>>> - temperature = <110000>; >>>> + temperature = <115000>; >>>> hysteresis = <1000>; >>>> type = "critical"; >>>> }; >>>> }; >>>> + >>>> + cooling-maps { >>>> + map0 { >>>> + trip = <&cpu7_top_alert1>; >>>> + cooling-device = <&cpu7_idle 100 100>; >>>> + }; >>>> + }; >>>> }; >>>> cpu7-middle-thermal { >>>> @@ -6034,23 +6160,30 @@ cpu7-middle-thermal { >>>> trips { >>>> trip-point0 { >>>> - temperature = <90000>; >>>> + temperature = <108000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> - trip-point1 { >>>> - temperature = <95000>; >>>> + cpu7_middle_alert1: trip-point1 { >>>> + temperature = <110000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> cpu7-critical { >>>> - temperature = <110000>; >>>> + temperature = <115000>; >>>> hysteresis = <1000>; >>>> type = "critical"; >>>> }; >>>> }; >>>> + >>>> + cooling-maps { >>>> + map0 { >>>> + trip = <&cpu7_middle_alert1>; >>>> + cooling-device = <&cpu7_idle 100 100>; >>>> + }; >>>> + }; >>>> }; >>>> cpu7-bottom-thermal { >>>> @@ -6058,23 +6191,30 @@ cpu7-bottom-thermal { >>>> trips { >>>> trip-point0 { >>>> - temperature = <90000>; >>>> + temperature = <108000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> - trip-point1 { >>>> - temperature = <95000>; >>>> + cpu7_bottom_alert1: trip-point1 { >>>> + temperature = <110000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> cpu7-critical { >>>> - temperature = <110000>; >>>> + temperature = <115000>; >>>> hysteresis = <1000>; >>>> type = "critical"; >>>> }; >>>> }; >>>> + >>>> + cooling-maps { >>>> + map0 { >>>> + trip = <&cpu7_bottom_alert1>; >>>> + cooling-device = <&cpu7_idle 100 100>; >>>> + }; >>>> + }; >>>> }; >>>> cpu0-thermal { >>>> @@ -6082,23 +6222,30 @@ cpu0-thermal { >>>> trips { >>>> 
trip-point0 { >>>> - temperature = <90000>; >>>> + temperature = <108000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> - trip-point1 { >>>> - temperature = <95000>; >>>> + cpu0_alert1: trip-point1 { >>>> + temperature = <110000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> cpu0-critical { >>>> - temperature = <110000>; >>>> + temperature = <115000>; >>>> hysteresis = <1000>; >>>> type = "critical"; >>>> }; >>>> }; >>>> + >>>> + cooling-maps { >>>> + map0 { >>>> + trip = <&cpu0_alert1>; >>>> + cooling-device = <&cpu0_idle 100 100>; >>>> + }; >>>> + }; >>>> }; >>>> cpu1-thermal { >>>> @@ -6106,23 +6253,30 @@ cpu1-thermal { >>>> trips { >>>> trip-point0 { >>>> - temperature = <90000>; >>>> + temperature = <108000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> - trip-point1 { >>>> - temperature = <95000>; >>>> + cpu1_alert1: trip-point1 { >>>> + temperature = <110000>; >>>> hysteresis = <2000>; >>>> type = "passive"; >>>> }; >>>> cpu1-critical { >>>> - temperature = <110000>; >>>> + temperature = <115000>; >>>> hysteresis = <1000>; >>>> type = "critical"; >>>> }; >>>> }; >>>> + >>>> + cooling-maps { >>>> + map0 { >>>> + trip = <&cpu1_alert1>; >>>> + cooling-device = <&cpu1_idle 100 100>; >>>> + }; >>>> + }; >>>> }; >>>> nsphvx0-thermal { >>>> >>>> -- >>>> 2.34.1 >>>> >> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] arm64: dts: qcom: sm8650: setup cpu thermal with idle on high temperatures 2025-01-08 9:15 ` Neil Armstrong @ 2025-01-09 15:18 ` Konrad Dybcio 2025-01-10 9:41 ` neil.armstrong 2025-01-09 21:01 ` Bjorn Andersson 1 sibling, 1 reply; 16+ messages in thread From: Konrad Dybcio @ 2025-01-09 15:18 UTC (permalink / raw) To: neil.armstrong, Bjorn Andersson Cc: Konrad Dybcio, Rob Herring, Krzysztof Kozlowski, Conor Dooley, linux-arm-msm, devicetree, linux-kernel On 8.01.2025 10:15 AM, Neil Armstrong wrote: > On 08/01/2025 04:11, Bjorn Andersson wrote: >> On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote: >>> Hi, >>> >>> On 07/01/2025 00:39, Bjorn Andersson wrote: >>>> On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote: >>>>> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an >>>>> hardware controlled loop using the LMH and EPSS blocks with constraints and >>>>> OPPs programmed in the board firmware. >>>>> >>>>> Since the Hardware does a better job at maintaining the CPUs temperature >>>>> in an acceptable range by taking in account more parameters like the die >>>>> characteristics or other factory fused values, it makes no sense to try >>>>> and reproduce a similar set of constraints with the Linux cpufreq thermal >>>>> core. >>>>> >>>>> In addition, the tsens IP is responsible for monitoring the temperature >>>>> across the SoC and the current settings will heavily trigger the tsens >>>>> UP/LOW interrupts if the CPU temperatures reaches the hardware thermal >>>>> constraints which are currently defined in the DT. And since the CPUs >>>>> are not hooked in the thermal trip points, the potential interrupts and >>>>> calculations are a waste of system resources. 
>>>>> >>>>> Instead, set higher temperatures in the CPU trip points, and hook some CPU >>>>> idle injector with a 100% duty cycle at the highest trip point in the case >>>>> the hardware DCVS cannot handle the temperature surge, and try our best to >>>>> avoid reaching the critical temperature trip point which should trigger an >>>>> inevitable thermal shutdown. >>>>> >>>> >>>> Are you able to hit these higher temperatures? Do you have some test >>>> case where the idle-injection shows to be successful in blocking us from >>>> reaching the critical temp? >>> >>> No, I've been able to test idle-injection and observed a noticeable effect >>> but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from >>> scaling down and let the temp go higher ? >>> >> >> I don't know how to override that configuration. I'll try to get some answers. SDM845 seems to expose a couple SCM calls for this purpose and it's already wired up in drivers/thermal/qcom/lmh.c >>>> E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only >>>> the critical trip for when the hardware fails us. >>> >>> It's the goal here aswell >>> >> >> How about simplifying the patch by removing the idle-injection step and >> just rely on LMH/EPSS and the "critical" trip (at least until someone >> can prove that there's value in the extra mitigation)? > > OK, but I see value in this idle injection mitigation in that case LMH/EPSS > fails, the only factor in control of HLOS is by stopping scheduling tasks > since frequency won't be able to scale anymore. If LMH fails, your SoC is probably cooked already, anyway :( I'm not sure why idle injection isn't enabled by default if no other cooling methods are found. Perhaps that could be discussed with some thermal folks.. > Anyway, I agree it can be added later on, so should I drop the 2 trip points > and only leave the critical one ? I think sticking with critical=Tjmax + critical-action = "reboot" may be the way to go here. 
We may want to give some folks a heads up, so they can wire up skin sensors on their devices ahead of these changes landing tree-wide. Konrad ^ permalink raw reply [flat|nested] 16+ messages in thread
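For illustration only, the "critical=Tjmax + critical-action" idea above could look roughly like the fragment below. This is a minimal sketch, not the respun patch: the thermal-sensors phandle and index are assumptions, and the 115000 value is simply the critical temperature used in the v1 patch, since the actual Tjmax is not stated in this thread. The critical-action property is the one from the generic thermal-zones binding.

```dts
/* Hypothetical sketch: a CPU zone reduced to a single critical trip. */
thermal-zones {
	cpu0-thermal {
		/* Assumed sensor reference, not taken from this thread */
		thermal-sensors = <&tsens0 1>;

		/* Reboot instead of the default shutdown on the critical trip */
		critical-action = "reboot";

		trips {
			cpu0-critical {
				/* Placeholder from the v1 patch; Tjmax is not given here */
				temperature = <115000>;
				hysteresis = <1000>;
				type = "critical";
			};
		};
	};
};
```

With no passive trips and no cooling-maps, LMH/EPSS stays the only throttling loop and the OS acts only when the critical temperature is reached.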
* Re: [PATCH 1/2] arm64: dts: qcom: sm8650: setup cpu thermal with idle on high temperatures 2025-01-09 15:18 ` Konrad Dybcio @ 2025-01-10 9:41 ` neil.armstrong 0 siblings, 0 replies; 16+ messages in thread From: neil.armstrong @ 2025-01-10 9:41 UTC (permalink / raw) To: Konrad Dybcio, Bjorn Andersson Cc: Konrad Dybcio, Rob Herring, Krzysztof Kozlowski, Conor Dooley, linux-arm-msm, devicetree, linux-kernel On 09/01/2025 16:18, Konrad Dybcio wrote: > On 8.01.2025 10:15 AM, Neil Armstrong wrote: >> On 08/01/2025 04:11, Bjorn Andersson wrote: >>> On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote: >>>> Hi, >>>> >>>> On 07/01/2025 00:39, Bjorn Andersson wrote: >>>>> On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote: >>>>>> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an >>>>>> hardware controlled loop using the LMH and EPSS blocks with constraints and >>>>>> OPPs programmed in the board firmware. >>>>>> >>>>>> Since the Hardware does a better job at maintaining the CPUs temperature >>>>>> in an acceptable range by taking in account more parameters like the die >>>>>> characteristics or other factory fused values, it makes no sense to try >>>>>> and reproduce a similar set of constraints with the Linux cpufreq thermal >>>>>> core. >>>>>> >>>>>> In addition, the tsens IP is responsible for monitoring the temperature >>>>>> across the SoC and the current settings will heavily trigger the tsens >>>>>> UP/LOW interrupts if the CPU temperatures reaches the hardware thermal >>>>>> constraints which are currently defined in the DT. And since the CPUs >>>>>> are not hooked in the thermal trip points, the potential interrupts and >>>>>> calculations are a waste of system resources. 
>>>>>> >>>>>> Instead, set higher temperatures in the CPU trip points, and hook some CPU >>>>>> idle injector with a 100% duty cycle at the highest trip point in the case >>>>>> the hardware DCVS cannot handle the temperature surge, and try our best to >>>>>> avoid reaching the critical temperature trip point which should trigger an >>>>>> inevitable thermal shutdown. >>>>>> >>>>> >>>>> Are you able to hit these higher temperatures? Do you have some test >>>>> case where the idle-injection shows to be successful in blocking us from >>>>> reaching the critical temp? >>>> >>>> No, I've been able to test idle-injection and observed a noticeable effect >>>> but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from >>>> scaling down and let the temp go higher ? >>>> >>> >>> I don't know how to override that configuration. > > I'll try to get some answers. SDM845 seems to expose a couple SCM calls for > this purpose and it's already wired up in drivers/thermal/qcom/lmh.c Would be great, thx > >>>>> E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only >>>>> the critical trip for when the hardware fails us. >>>> >>>> It's the goal here aswell >>>> >>> >>> How about simplifying the patch by removing the idle-injection step and >>> just rely on LMH/EPSS and the "critical" trip (at least until someone >>> can prove that there's value in the extra mitigation)? >> >> OK, but I see value in this idle injection mitigation in that case LMH/EPSS >> fails, the only factor in control of HLOS is by stopping scheduling tasks >> since frequency won't be able to scale anymore. > > If LMH fails, your SoC is probably cooked already, anyway :( > > I'm not sure why idle injection isn't enabled by default if no other cooling > methods are found. Perhaps that could be discussed with some thermal folks.. 
Yeah, this is a good question; this should probably be the default "hot" behaviour. > >> Anyway, I agree it can be added later on, so should I drop the 2 trip points >> and only leave the critical one ? > > I think sticking with critical=Tjmax + critical-action = "reboot" may be the > way to go here. > > We may want to give some folks a heads up, so they can wire up skin sensors > on their devices ahead of these changes landing tree-wide. Yeah, that's also my goal; I will respin with only the critical trip. Thanks, Neil > > Konrad ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] arm64: dts: qcom: sm8650: setup cpu thermal with idle on high temperatures 2025-01-08 9:15 ` Neil Armstrong 2025-01-09 15:18 ` Konrad Dybcio @ 2025-01-09 21:01 ` Bjorn Andersson 2025-01-10 9:40 ` Neil Armstrong 1 sibling, 1 reply; 16+ messages in thread From: Bjorn Andersson @ 2025-01-09 21:01 UTC (permalink / raw) To: Neil Armstrong Cc: Konrad Dybcio, Rob Herring, Krzysztof Kozlowski, Conor Dooley, linux-arm-msm, devicetree, linux-kernel On Wed, Jan 08, 2025 at 10:15:34AM +0100, Neil Armstrong wrote: > On 08/01/2025 04:11, Bjorn Andersson wrote: > > On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote: > > > Hi, > > > > > > On 07/01/2025 00:39, Bjorn Andersson wrote: > > > > On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote: > > > > > On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an > > > > > hardware controlled loop using the LMH and EPSS blocks with constraints and > > > > > OPPs programmed in the board firmware. > > > > > > > > > > Since the Hardware does a better job at maintaining the CPUs temperature > > > > > in an acceptable range by taking in account more parameters like the die > > > > > characteristics or other factory fused values, it makes no sense to try > > > > > and reproduce a similar set of constraints with the Linux cpufreq thermal > > > > > core. > > > > > > > > > > In addition, the tsens IP is responsible for monitoring the temperature > > > > > across the SoC and the current settings will heavily trigger the tsens > > > > > UP/LOW interrupts if the CPU temperatures reaches the hardware thermal > > > > > constraints which are currently defined in the DT. And since the CPUs > > > > > are not hooked in the thermal trip points, the potential interrupts and > > > > > calculations are a waste of system resources. 
> > > > > > > > > > Instead, set higher temperatures in the CPU trip points, and hook some CPU > > > > > idle injector with a 100% duty cycle at the highest trip point in the case > > > > > the hardware DCVS cannot handle the temperature surge, and try our best to > > > > > avoid reaching the critical temperature trip point which should trigger an > > > > > inevitable thermal shutdown. > > > > > > > > > > > > > Are you able to hit these higher temperatures? Do you have some test > > > > case where the idle-injection shows to be successful in blocking us from > > > > reaching the critical temp? > > > > > > No, I've been able to test idle-injection and observed a noticeable effect > > > but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from > > > scaling down and let the temp go higher ? > > > > > > > I don't know how to override that configuration. > > > > > > > > > > E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only > > > > the critical trip for when the hardware fails us. > > > > > > It's the goal here aswell > > > > > > > How about simplifying the patch by removing the idle-injection step and > > just rely on LMH/EPSS and the "critical" trip (at least until someone > > can prove that there's value in the extra mitigation)? > > OK, but I see value in this idle injection mitigation in that case LMH/EPSS > fails, the only factor in control of HLOS is by stopping scheduling tasks > since frequency won't be able to scale anymore. > I think that sounds good, but afaict we don't have any indication of this being a problem and we don't have any way to test that it actually solves that problem. > Anyway, I agree it can be added later on, so should I drop the 2 trip points > and only leave the critical one ? > I think that's a simple and functional starting point - and it solves your IRQ issue. Regards, Bjorn ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] arm64: dts: qcom: sm8650: setup cpu thermal with idle on high temperatures 2025-01-09 21:01 ` Bjorn Andersson @ 2025-01-10 9:40 ` Neil Armstrong 0 siblings, 0 replies; 16+ messages in thread From: Neil Armstrong @ 2025-01-10 9:40 UTC (permalink / raw) To: Bjorn Andersson Cc: Konrad Dybcio, Rob Herring, Krzysztof Kozlowski, Conor Dooley, linux-arm-msm, devicetree, linux-kernel On 09/01/2025 22:01, Bjorn Andersson wrote: > On Wed, Jan 08, 2025 at 10:15:34AM +0100, Neil Armstrong wrote: >> On 08/01/2025 04:11, Bjorn Andersson wrote: >>> On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote: >>>> Hi, >>>> >>>> On 07/01/2025 00:39, Bjorn Andersson wrote: >>>>> On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote: >>>>>> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an >>>>>> hardware controlled loop using the LMH and EPSS blocks with constraints and >>>>>> OPPs programmed in the board firmware. >>>>>> >>>>>> Since the Hardware does a better job at maintaining the CPUs temperature >>>>>> in an acceptable range by taking in account more parameters like the die >>>>>> characteristics or other factory fused values, it makes no sense to try >>>>>> and reproduce a similar set of constraints with the Linux cpufreq thermal >>>>>> core. >>>>>> >>>>>> In addition, the tsens IP is responsible for monitoring the temperature >>>>>> across the SoC and the current settings will heavily trigger the tsens >>>>>> UP/LOW interrupts if the CPU temperatures reaches the hardware thermal >>>>>> constraints which are currently defined in the DT. And since the CPUs >>>>>> are not hooked in the thermal trip points, the potential interrupts and >>>>>> calculations are a waste of system resources. 
>>>>>> >>>>>> Instead, set higher temperatures in the CPU trip points, and hook some CPU >>>>>> idle injector with a 100% duty cycle at the highest trip point in the case >>>>>> the hardware DCVS cannot handle the temperature surge, and try our best to >>>>>> avoid reaching the critical temperature trip point which should trigger an >>>>>> inevitable thermal shutdown. >>>>>> >>>>> >>>>> Are you able to hit these higher temperatures? Do you have some test >>>>> case where the idle-injection shows to be successful in blocking us from >>>>> reaching the critical temp? >>>> >>>> No, I've been able to test idle-injection and observed a noticeable effect >>>> but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from >>>> scaling down and let the temp go higher ? >>>> >>> >>> I don't know how to override that configuration. >>> >>>>> >>>>> E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only >>>>> the critical trip for when the hardware fails us. >>>> >>>> It's the goal here aswell >>>> >>> >>> How about simplifying the patch by removing the idle-injection step and >>> just rely on LMH/EPSS and the "critical" trip (at least until someone >>> can prove that there's value in the extra mitigation)? >> >> OK, but I see value in this idle injection mitigation in that case LMH/EPSS >> fails, the only factor in control of HLOS is by stopping scheduling tasks >> since frequency won't be able to scale anymore. >> > > I think that sounds good, but afaict we don't have any indication of > this being a problem and we don't have any way to test that it actually > solves that problem. Sure, let's postpone the idle injection until we can actually test it. > >> Anyway, I agree it can be added later on, so should I drop the 2 trip points >> and only leave the critical one ? >> > > I think that's a simple and functional starting point - and it solves > your IRQ issue. 
Ack Thanks, Neil > > Regards, > Bjorn ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 2/2] arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures 2025-01-03 14:38 [PATCH 0/2] arm64: dts: qcom: sm8650: rework CPU & GPU thermal zones Neil Armstrong 2025-01-03 14:38 ` [PATCH 1/2] arm64: dts: qcom: sm8650: setup cpu thermal with idle on high temperatures Neil Armstrong @ 2025-01-03 14:38 ` Neil Armstrong 2025-01-03 20:00 ` Rob Clark 2025-01-03 14:43 ` [PATCH 0/2] arm64: dts: qcom: sm8650: rework CPU & GPU thermal zones Konrad Dybcio 2 siblings, 1 reply; 16+ messages in thread From: Neil Armstrong @ 2025-01-03 14:38 UTC (permalink / raw) To: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski, Conor Dooley Cc: linux-arm-msm, devicetree, linux-kernel, Neil Armstrong On the SM8650, the dynamic clock and voltage scaling (DCVS) for the GPU is done in a hardware-controlled loop by the GPU Management Unit (GMU). Since the GMU does a better job at maintaining the GPU's temperature in an acceptable range by taking into account more parameters like the die characteristics or other internal sensors, it makes no sense to try and reproduce a similar set of constraints with the Linux devfreq thermal core. Instead, set higher temperatures in the GPU trip points corresponding to the temperatures provided by Qualcomm in the downstream source, which will trigger the devfreq thermal core if the GMU cannot handle the temperature surge, and try our best to avoid reaching the critical temperature trip point which should trigger an inevitable thermal shutdown. 
Fixes: 497624ed5506 ("arm64: dts: qcom: sm8650: Throttle the GPU when overheating")
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 arch/arm64/boot/dts/qcom/sm8650.dtsi | 48 ++++++++++++++++++------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/sm8650.dtsi b/arch/arm64/boot/dts/qcom/sm8650.dtsi
index 448374a32e07151e35727d92fab77356769aea8a..ddcb57886eb5eac2a70d28e6ad68fc6820b5dcf1 100644
--- a/arch/arm64/boot/dts/qcom/sm8650.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8650.dtsi
@@ -6507,19 +6507,19 @@ map0 {
 
 			trips {
 				gpu0_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
@@ -6540,19 +6540,19 @@ map0 {
 
 			trips {
 				gpu1_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
@@ -6573,19 +6573,19 @@ map0 {
 
 			trips {
 				gpu2_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
@@ -6606,19 +6606,19 @@ map0 {
 
 			trips {
 				gpu3_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
@@ -6639,19 +6639,19 @@ map0 {
 
 			trips {
 				gpu4_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
@@ -6672,19 +6672,19 @@ map0 {
 
 			trips {
 				gpu5_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
@@ -6705,19 +6705,19 @@ map0 {
 
 			trips {
 				gpu6_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
@@ -6738,19 +6738,19 @@ map0 {
 
 			trips {
 				gpu7_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
--
2.34.1
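For reference, a minimal sketch of what one GPU thermal zone could look like once the patch above is applied, assuming the generic thermal-zones binding where temperature and hysteresis are expressed in millidegrees Celsius. The trip values and the gpu0_alert0 label come from the diff. The cooling-maps node is only hinted at by the "map0 {" hunk context, so its contents (the &gpu phandle and the THERMAL_NO_LIMIT bounds) are assumptions for illustration and are not part of this patch.

```dts
/* Sketch only: one GPU zone after this patch; the other seven are identical. */

/* ASSUMPTION: contents of map0 are illustrative, the diff does not show them. */
cooling-maps {
	map0 {
		/* Bind the passive trip to the GPU devfreq cooling device */
		trip = <&gpu0_alert0>;
		cooling-device = <&gpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
	};
};

trips {
	/* 95 C: Linux starts lowering the GPU frequency (was 85 C) */
	gpu0_alert0: trip-point0 {
		temperature = <95000>;	/* millidegrees Celsius */
		hysteresis = <1000>;
		type = "passive";
	};

	/* 115 C: "hot" notification, no cooling device bound here (was 90 C) */
	trip-point1 {
		temperature = <115000>;
		hysteresis = <1000>;
		type = "hot";
	};

	/* 125 C: critical trip, leads to a thermal shutdown (was 110 C) */
	trip-point2 {
		temperature = <125000>;
		hysteresis = <1000>;
		type = "critical";
	};
};
```

With these values the passive trip sits 30 degrees C below the critical trip, so the GMU loop alone is expected to handle everything below 95 degrees C and Linux only steps in above that.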
* Re: [PATCH 2/2] arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures
2025-01-03 14:38 ` [PATCH 2/2] arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures Neil Armstrong
@ 2025-01-03 20:00 ` Rob Clark
2025-01-06 9:02 ` Neil Armstrong
0 siblings, 1 reply; 16+ messages in thread
From: Rob Clark @ 2025-01-03 20:00 UTC (permalink / raw)
To: Neil Armstrong
Cc: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, linux-arm-msm, devicetree, linux-kernel
On Fri, Jan 3, 2025 at 6:38 AM Neil Armstrong <neil.armstrong@linaro.org> wrote:
>
> On the SM8650, the dynamic clock and voltage scaling (DCVS) for the GPU
> is done in a hardware-controlled loop by the GPU Management Unit (GMU).
>
> Since the GMU does a better job at maintaining the GPU's temperature in an
> acceptable range by taking into account more parameters like the die
> characteristics or other internal sensors, it makes no sense to try
> and reproduce a similar set of constraints with the Linux devfreq thermal
> core.
>
> Instead, set higher temperatures in the GPU trip points corresponding to
> the temperatures provided by Qualcomm in the downstream source, which will
> trigger the devfreq thermal core if the GMU cannot handle the temperature
> surge, and try our best to avoid reaching the critical temperature trip
> point which should trigger an inevitable thermal shutdown.
Do we need something like this on other recent SoCs, like x1e?
BR,
-R
* Re: [PATCH 2/2] arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures
2025-01-03 20:00 ` Rob Clark
@ 2025-01-06 9:02 ` Neil Armstrong
0 siblings, 0 replies; 16+ messages in thread
From: Neil Armstrong @ 2025-01-06 9:02 UTC (permalink / raw)
To: Rob Clark
Cc: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, linux-arm-msm, devicetree, linux-kernel
On 03/01/2025 21:00, Rob Clark wrote:
> On Fri, Jan 3, 2025 at 6:38 AM Neil Armstrong <neil.armstrong@linaro.org> wrote:
>>
>> On the SM8650, the dynamic clock and voltage scaling (DCVS) for the GPU
>> is done in a hardware-controlled loop by the GPU Management Unit (GMU).
>>
>> Since the GMU does a better job at maintaining the GPU's temperature in an
>> acceptable range by taking into account more parameters like the die
>> characteristics or other internal sensors, it makes no sense to try
>> and reproduce a similar set of constraints with the Linux devfreq thermal
>> core.
>>
>> Instead, set higher temperatures in the GPU trip points corresponding to
>> the temperatures provided by Qualcomm in the downstream source, which will
>> trigger the devfreq thermal core if the GMU cannot handle the temperature
>> surge, and try our best to avoid reaching the critical temperature trip
>> point which should trigger an inevitable thermal shutdown.
>
> Do we need something like this on other recent SoCs, like x1e?
Probably yes, but I don't have physical access to the platform.
Neil
* Re: [PATCH 0/2] arm64: dts: qcom: sm8650: rework CPU & GPU thermal zones
2025-01-03 14:38 [PATCH 0/2] arm64: dts: qcom: sm8650: rework CPU & GPU thermal zones Neil Armstrong
2025-01-03 14:38 ` [PATCH 1/2] arm64: dts: qcom: sm8650: setup cpu thermal with idle on high temperatures Neil Armstrong
2025-01-03 14:38 ` [PATCH 2/2] arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures Neil Armstrong
@ 2025-01-03 14:43 ` Konrad Dybcio
2025-01-03 14:49 ` Neil Armstrong
2 siblings, 1 reply; 16+ messages in thread
From: Konrad Dybcio @ 2025-01-03 14:43 UTC (permalink / raw)
To: Neil Armstrong, Bjorn Andersson, Konrad Dybcio, Rob Herring,
Krzysztof Kozlowski, Conor Dooley
Cc: linux-arm-msm, devicetree, linux-kernel
On 3.01.2025 3:38 PM, Neil Armstrong wrote:
> On the SM8650 platform, the dynamic clock and voltage scaling (DCVS) for
> the CPUs and GPU is handled by hardware & firmware using factory and
> form-factor determined parameters in order to maximize frequency while
> keeping the temperature way below the junction temperature where the SoC
> would experience a thermal shutdown if not permanent damage.
>
> On the other side, the High Level Operating System (HLOS), like Linux,
> is able to adjust the CPU and GPU frequency using the internal SoC
> temperature sensors (here tsens) and its UP/LOW interrupts, but it
> effectively does the same work twice in a less effective manner.
>
> Let's take the Hardware & Firmware action into account and design the
> thermal zones trip points and cooling devices mapping to use the HLOS
> as a safety warrant in case the platform experiences a temperature surge
> to hopefully avoid a thermal shutdown and handle the scenario gracefully.
>
> On the CPU side, the LMh hardware does the DCVS control loop, so
> let's set higher trip point temperatures closer to the junction
> and thermal shutdown temperatures and add some idle injection cooling
> device with 100% duty cycle for each CPU that would act as emergency
> action to avoid the thermal shutdown.
>
> On the GPU side, the GPU Management Unit (GMU) acts as the DCVS
> control loop, but since we can't perform idle injection, let's
> also set higher trip point temperatures closer to the junction
> and thermal shutdown temperatures to reduce the GPU frequency only
> as an emergency action before the thermal shutdown.
>
> Those 2 changes optimize the thermal management design by avoiding
> concurrent thermal management, calculations & avoidable interrupts
> by moving the HLOS management to a last resort emergency if the
> Hardware & Firmware fail to avoid a thermal shutdown.
>
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
Got any numbers to back this?
Konrad
* Re: [PATCH 0/2] arm64: dts: qcom: sm8650: rework CPU & GPU thermal zones
2025-01-03 14:43 ` [PATCH 0/2] arm64: dts: qcom: sm8650: rework CPU & GPU thermal zones Konrad Dybcio
@ 2025-01-03 14:49 ` Neil Armstrong
2025-01-09 15:20 ` Konrad Dybcio
0 siblings, 1 reply; 16+ messages in thread
From: Neil Armstrong @ 2025-01-03 14:49 UTC (permalink / raw)
To: Konrad Dybcio, Bjorn Andersson, Konrad Dybcio, Rob Herring,
Krzysztof Kozlowski, Conor Dooley
Cc: linux-arm-msm, devicetree, linux-kernel
On 03/01/2025 15:43, Konrad Dybcio wrote:
> On 3.01.2025 3:38 PM, Neil Armstrong wrote:
>> On the SM8650 platform, the dynamic clock and voltage scaling (DCVS) for
>> the CPUs and GPU is handled by hardware & firmware using factory and
>> form-factor determined parameters in order to maximize frequency while
>> keeping the temperature way below the junction temperature where the SoC
>> would experience a thermal shutdown if not permanent damage.
>>
>> On the other side, the High Level Operating System (HLOS), like Linux,
>> is able to adjust the CPU and GPU frequency using the internal SoC
>> temperature sensors (here tsens) and its UP/LOW interrupts, but it
>> effectively does the same work twice in a less effective manner.
>>
>> Let's take the Hardware & Firmware action into account and design the
>> thermal zones trip points and cooling devices mapping to use the HLOS
>> as a safety warrant in case the platform experiences a temperature surge
>> to hopefully avoid a thermal shutdown and handle the scenario gracefully.
>>
>> On the CPU side, the LMh hardware does the DCVS control loop, so
>> let's set higher trip point temperatures closer to the junction
>> and thermal shutdown temperatures and add some idle injection cooling
>> device with 100% duty cycle for each CPU that would act as emergency
>> action to avoid the thermal shutdown.
>>
>> On the GPU side, the GPU Management Unit (GMU) acts as the DCVS
>> control loop, but since we can't perform idle injection, let's
>> also set higher trip point temperatures closer to the junction
>> and thermal shutdown temperatures to reduce the GPU frequency only
>> as an emergency action before the thermal shutdown.
>>
>> Those 2 changes optimize the thermal management design by avoiding
>> concurrent thermal management, calculations & avoidable interrupts
>> by moving the HLOS management to a last resort emergency if the
>> Hardware & Firmware fail to avoid a thermal shutdown.
>>
>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>> ---
>
> Got any numbers to back this?
To back which part? Yes, I've been running loads with different
scenarios, and effectively the hardware does a much better job, with
a more linear correction and slightly better performance, because
it sets slightly higher OPPs while maintaining the core closer to
the target temperature range. Which is kind of expected.
I don't have easy numbers to share, sorry...
So yes, I consider that avoiding the concurrent effort is better, and since
we also take the firmware design into account in the whole platform
representation in DT (DSPs, SCM, GMU, ...), we should also extend this to thermal.
Neil
>
> Konrad
* Re: [PATCH 0/2] arm64: dts: qcom: sm8650: rework CPU & GPU thermal zones
2025-01-03 14:49 ` Neil Armstrong
@ 2025-01-09 15:20 ` Konrad Dybcio
0 siblings, 0 replies; 16+ messages in thread
From: Konrad Dybcio @ 2025-01-09 15:20 UTC (permalink / raw)
To: neil.armstrong, Konrad Dybcio, Bjorn Andersson, Konrad Dybcio,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, robclark
Cc: linux-arm-msm, devicetree, linux-kernel
On 3.01.2025 3:49 PM, Neil Armstrong wrote:
> On 03/01/2025 15:43, Konrad Dybcio wrote:
>> On 3.01.2025 3:38 PM, Neil Armstrong wrote:
>>> On the SM8650 platform, the dynamic clock and voltage scaling (DCVS) for
>>> the CPUs and GPU is handled by hardware & firmware using factory and
>>> form-factor determined parameters in order to maximize frequency while
>>> keeping the temperature way below the junction temperature where the SoC
>>> would experience a thermal shutdown if not permanent damage.
>>>
>>> On the other side, the High Level Operating System (HLOS), like Linux,
>>> is able to adjust the CPU and GPU frequency using the internal SoC
>>> temperature sensors (here tsens) and its UP/LOW interrupts, but it
>>> effectively does the same work twice in a less effective manner.
>>>
>>> Let's take the Hardware & Firmware action into account and design the
>>> thermal zones trip points and cooling devices mapping to use the HLOS
>>> as a safety warrant in case the platform experiences a temperature surge
>>> to hopefully avoid a thermal shutdown and handle the scenario gracefully.
>>>
>>> On the CPU side, the LMh hardware does the DCVS control loop, so
>>> let's set higher trip point temperatures closer to the junction
>>> and thermal shutdown temperatures and add some idle injection cooling
>>> device with 100% duty cycle for each CPU that would act as emergency
>>> action to avoid the thermal shutdown.
>>>
>>> On the GPU side, the GPU Management Unit (GMU) acts as the DCVS
>>> control loop, but since we can't perform idle injection, let's
>>> also set higher trip point temperatures closer to the junction
>>> and thermal shutdown temperatures to reduce the GPU frequency only
>>> as an emergency action before the thermal shutdown.
We could probably work out some mechanism for drm to say "gpu is too hot /
too busy" and stall the userspace's requests, if that doesn't exist
already (+RobC)
>>>
>>> Those 2 changes optimize the thermal management design by avoiding
>>> concurrent thermal management, calculations & avoidable interrupts
>>> by moving the HLOS management to a last resort emergency if the
>>> Hardware & Firmware fail to avoid a thermal shutdown.
>>>
>>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>>> ---
>>
>> Got any numbers to back this?
>
> To back which part? Yes, I've been running loads with different
> scenarios, and effectively the hardware does a much better job, with
> a more linear correction and slightly better performance, because
> it sets slightly higher OPPs while maintaining the core closer to
> the target temperature range. Which is kind of expected.
>
> I don't have easy numbers to share, sorry...
Ok, what you said above sounds good already.
Konrad