* [PATCH v2 0/2] arm64: dts: qcom: sm8650: rework CPU & GPU thermal zones
@ 2025-01-10 10:36 Neil Armstrong
2025-01-10 10:36 ` [PATCH v2 1/2] arm64: dts: qcom: sm8650: drop cpu thermal passive trip points Neil Armstrong
2025-01-10 10:36 ` [PATCH v2 2/2] arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures Neil Armstrong
0 siblings, 2 replies; 9+ messages in thread
From: Neil Armstrong @ 2025-01-10 10:36 UTC
To: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
Conor Dooley
Cc: linux-arm-msm, devicetree, linux-kernel, Neil Armstrong
On the SM8650 platform, dynamic clock and voltage scaling (DCVS) for the
CPUs and GPU is handled by hardware and firmware, using factory- and
form-factor-determined parameters, in order to maximize frequency while
keeping the temperature well below the junction temperature at which the
SoC would undergo a thermal shutdown, if not permanent damage.
On the other hand, the High Level Operating System (HLOS), such as Linux,
can also adjust the CPU and GPU frequencies using the internal SoC
temperature sensors (here tsens) and their UP/LOW interrupts, but in
practice it duplicates the same work in a less effective manner.
Let's take the hardware and firmware action into account and design the
thermal zone trip points and cooling device mappings so that the HLOS
acts as a safety net in case the platform experiences a temperature
surge, hopefully avoiding a thermal shutdown and handling the scenario
gracefully.
On the CPU side, the LMh hardware runs the DCVS control loop, so only
keep the critical trip point, which triggers a software system reboot as
an emergency action to avoid a thermal shutdown.
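For illustration only, this is roughly what one CPU thermal zone looks
like once patch 1 is applied. It is a sketch assembled from the first
hunk of patch 1 (zone name, sensor and trip values), not a verbatim
excerpt of sm8650.dtsi: properties outside the diff context, such as the
polling delays, are omitted, and the type = "critical" line is assumed
since it lies just past the visible hunk.

```dts
cpu2-top-thermal {
	/* tsens0 channel 5, as in the first hunk of patch 1 */
	thermal-sensors = <&tsens0 5>;

	trips {
		/*
		 * The passive trip-point0/trip-point1 nodes are gone: the
		 * LMh hardware loop handles DCVS, so only the emergency
		 * trip remains.
		 */
		cpu2-critical {
			temperature = <110000>;	/* millidegrees Celsius */
			hysteresis = <1000>;
			type = "critical";	/* assumed, outside the visible hunk */
		};
	};
};
```

No cooling-maps node is involved here, which matches the statement that
the CPUs are not hooked to these trip points.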
On the GPU side, the GPU Management Unit (GMU) acts as the DCVS control
loop, but since we can't perform idle injection, let's also set higher
trip point temperatures, closer to the junction and thermal shutdown
temperatures, so that the GPU frequency is reduced only as an emergency
action before a thermal shutdown.
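Likewise, a sketch of the resulting trips of one GPU thermal zone, using
the gpu0 values from patch 2 with the previous values kept in comments.
The enclosing zone node and the cooling-maps node that binds gpu0_alert0
to the GPU devfreq cooling device are left out, as their contents are not
visible in the diff.

```dts
trips {
	/* Passive trip: GPU devfreq throttling starts only from here */
	gpu0_alert0: trip-point0 {
		temperature = <95000>;	/* previously 85000 */
		hysteresis = <1000>;
		type = "passive";
	};

	/* Hot trip: intermediate level, closer to the junction temperature */
	trip-point1 {
		temperature = <115000>;	/* previously 90000 */
		hysteresis = <1000>;
		type = "hot";
	};

	/* Critical trip: last resort before the thermal shutdown */
	trip-point2 {
		temperature = <125000>;	/* previously 110000 */
		hysteresis = <1000>;
		type = "critical";
	};
};
```

All temperatures are in millidegrees Celsius, so the passive window moves
from 85 C to 95 C and the critical trip from 110 C to 125 C.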
These two changes optimize the thermal management design by avoiding
concurrent thermal management, calculations and avoidable interrupts,
and by making HLOS management a last-resort emergency measure in case
the hardware and firmware fail to avoid a thermal shutdown.
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
Changes in v2:
- Drop idle injection
- Only keep critical trip points
- Reword commit msg and cover letter
- Link to v1: https://lore.kernel.org/r/20250103-topic-sm8650-thermal-cpu-idle-v1-0-faa1f011ecd9@linaro.org
---
Neil Armstrong (2):
arm64: dts: qcom: sm8650: drop cpu thermal passive trip points
arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures
arch/arm64/boot/dts/qcom/sm8650.dtsi | 228 ++++-------------------------------
1 file changed, 24 insertions(+), 204 deletions(-)
---
base-commit: 8155b4ef3466f0e289e8fcc9e6e62f3f4dceeac2
change-id: 20250103-topic-sm8650-thermal-cpu-idle-1e19181a94ed
Best regards,
--
Neil Armstrong <neil.armstrong@linaro.org>
* [PATCH v2 1/2] arm64: dts: qcom: sm8650: drop cpu thermal passive trip points
From: Neil Armstrong @ 2025-01-10 10:36 UTC
To: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
Conor Dooley
Cc: linux-arm-msm, devicetree, linux-kernel, Neil Armstrong
On the SM8650, dynamic clock and voltage scaling (DCVS) is done in a
hardware-controlled loop using the LMH and EPSS blocks, with constraints
and OPPs programmed in the board firmware.
Since the hardware does a better job of keeping the CPU temperature in an
acceptable range, taking into account more parameters such as the die
characteristics or other factory-fused values, it makes no sense to try
to reproduce a similar set of constraints with the Linux cpufreq thermal
core.
In addition, the tsens IP is responsible for monitoring the temperature
across the SoC, and the current settings will heavily trigger the tsens
UP/LOW interrupts when the CPU temperatures reach the hardware thermal
constraints currently defined in the DT. Since the CPUs are not hooked
to these thermal trip points, the resulting interrupts and calculations
are a waste of system resources.
Drop the current passive trip points and only leave the critical trip
point, which will trigger a software system reboot before a hardware
thermal shutdown in the almost impossible case that the hardware DCVS
cannot handle a temperature surge.
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 arch/arm64/boot/dts/qcom/sm8650.dtsi | 180 -----------------------------------
 1 file changed, 180 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/sm8650.dtsi b/arch/arm64/boot/dts/qcom/sm8650.dtsi
index 25e47505adcb790d09f1d2726386438487255824..95509ce2713d4fcc3dbe0c5cd5827312d5681af4 100644
--- a/arch/arm64/boot/dts/qcom/sm8650.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8650.dtsi
@@ -5751,18 +5751,6 @@ cpu2-top-thermal {
 			thermal-sensors = <&tsens0 5>;
 
 			trips {
-				trip-point0 {
-					temperature = <90000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
-				trip-point1 {
-					temperature = <95000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
 				cpu2-critical {
 					temperature = <110000>;
 					hysteresis = <1000>;
@@ -5775,18 +5763,6 @@ cpu2-bottom-thermal {
 			thermal-sensors = <&tsens0 6>;
 
 			trips {
-				trip-point0 {
-					temperature = <90000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
-				trip-point1 {
-					temperature = <95000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
 				cpu2-critical {
 					temperature = <110000>;
 					hysteresis = <1000>;
@@ -5799,18 +5775,6 @@ cpu3-top-thermal {
 			thermal-sensors = <&tsens0 7>;
 
 			trips {
-				trip-point0 {
-					temperature = <90000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
-				trip-point1 {
-					temperature = <95000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
 				cpu3-critical {
 					temperature = <110000>;
 					hysteresis = <1000>;
@@ -5823,18 +5787,6 @@ cpu3-bottom-thermal {
 			thermal-sensors = <&tsens0 8>;
 
 			trips {
-				trip-point0 {
-					temperature = <90000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
-				trip-point1 {
-					temperature = <95000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
 				cpu3-critical {
 					temperature = <110000>;
 					hysteresis = <1000>;
@@ -5847,18 +5799,6 @@ cpu4-top-thermal {
 			thermal-sensors = <&tsens0 9>;
 
 			trips {
-				trip-point0 {
-					temperature = <90000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
-				trip-point1 {
-					temperature = <95000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
 				cpu4-critical {
 					temperature = <110000>;
 					hysteresis = <1000>;
@@ -5871,18 +5811,6 @@ cpu4-bottom-thermal {
 			thermal-sensors = <&tsens0 10>;
 
 			trips {
-				trip-point0 {
-					temperature = <90000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
-				trip-point1 {
-					temperature = <95000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
 				cpu4-critical {
 					temperature = <110000>;
 					hysteresis = <1000>;
@@ -5895,18 +5823,6 @@ cpu5-top-thermal {
 			thermal-sensors = <&tsens0 11>;
 
 			trips {
-				trip-point0 {
-					temperature = <90000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
-				trip-point1 {
-					temperature = <95000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
 				cpu5-critical {
 					temperature = <110000>;
 					hysteresis = <1000>;
@@ -5919,18 +5835,6 @@ cpu5-bottom-thermal {
 			thermal-sensors = <&tsens0 12>;
 
 			trips {
-				trip-point0 {
-					temperature = <90000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
-				trip-point1 {
-					temperature = <95000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
 				cpu5-critical {
 					temperature = <110000>;
 					hysteresis = <1000>;
@@ -5943,18 +5847,6 @@ cpu6-top-thermal {
 			thermal-sensors = <&tsens0 13>;
 
 			trips {
-				trip-point0 {
-					temperature = <90000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
-				trip-point1 {
-					temperature = <95000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
 				cpu6-critical {
 					temperature = <110000>;
 					hysteresis = <1000>;
@@ -5967,18 +5859,6 @@ cpu6-bottom-thermal {
 			thermal-sensors = <&tsens0 14>;
 
 			trips {
-				trip-point0 {
-					temperature = <90000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
-				trip-point1 {
-					temperature = <95000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
 				cpu6-critical {
 					temperature = <110000>;
 					hysteresis = <1000>;
@@ -6009,18 +5889,6 @@ cpu7-top-thermal {
 			thermal-sensors = <&tsens1 1>;
 
 			trips {
-				trip-point0 {
-					temperature = <90000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
-				trip-point1 {
-					temperature = <95000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
 				cpu7-critical {
 					temperature = <110000>;
 					hysteresis = <1000>;
@@ -6033,18 +5901,6 @@ cpu7-middle-thermal {
 			thermal-sensors = <&tsens1 2>;
 
 			trips {
-				trip-point0 {
-					temperature = <90000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
-				trip-point1 {
-					temperature = <95000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
 				cpu7-critical {
 					temperature = <110000>;
 					hysteresis = <1000>;
@@ -6057,18 +5913,6 @@ cpu7-bottom-thermal {
 			thermal-sensors = <&tsens1 3>;
 
 			trips {
-				trip-point0 {
-					temperature = <90000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
-				trip-point1 {
-					temperature = <95000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
 				cpu7-critical {
 					temperature = <110000>;
 					hysteresis = <1000>;
@@ -6081,18 +5925,6 @@ cpu0-thermal {
 			thermal-sensors = <&tsens1 4>;
 
 			trips {
-				trip-point0 {
-					temperature = <90000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
-				trip-point1 {
-					temperature = <95000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
 				cpu0-critical {
 					temperature = <110000>;
 					hysteresis = <1000>;
@@ -6105,18 +5937,6 @@ cpu1-thermal {
 			thermal-sensors = <&tsens1 5>;
 
 			trips {
-				trip-point0 {
-					temperature = <90000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
-				trip-point1 {
-					temperature = <95000>;
-					hysteresis = <2000>;
-					type = "passive";
-				};
-
 				cpu1-critical {
 					temperature = <110000>;
 					hysteresis = <1000>;

-- 
2.34.1
* [PATCH v2 2/2] arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures
From: Neil Armstrong @ 2025-01-10 10:36 UTC
To: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
Conor Dooley
Cc: linux-arm-msm, devicetree, linux-kernel, Neil Armstrong
On the SM8650, dynamic clock and voltage scaling (DCVS) for the GPU is
done in a hardware-controlled loop by the GPU Management Unit (GMU).
Since the GMU does a better job of keeping the GPU temperature in an
acceptable range, taking into account more parameters such as the die
characteristics or other internal sensors, it makes no sense to try to
reproduce a similar set of constraints with the Linux devfreq thermal
core.
Instead, set higher temperatures in the GPU trip points, matching the
temperatures provided by Qualcomm in the downstream source. The devfreq
thermal core will then only be triggered if the GMU cannot handle a
temperature surge, as a best effort to avoid reaching the critical
temperature trip point, which would trigger an inevitable thermal
shutdown.
Fixes: 497624ed5506 ("arm64: dts: qcom: sm8650: Throttle the GPU when overheating")
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 arch/arm64/boot/dts/qcom/sm8650.dtsi | 48 ++++++++++++++++++------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/sm8650.dtsi b/arch/arm64/boot/dts/qcom/sm8650.dtsi
index 95509ce2713d4fcc3dbe0c5cd5827312d5681af4..e9fcf05cb084b7979ecf0f4712fed332e9f4b07a 100644
--- a/arch/arm64/boot/dts/qcom/sm8650.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8650.dtsi
@@ -6173,19 +6173,19 @@ map0 {
 
 			trips {
 				gpu0_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
@@ -6206,19 +6206,19 @@ map0 {
 
 			trips {
 				gpu1_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
@@ -6239,19 +6239,19 @@ map0 {
 
 			trips {
 				gpu2_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
@@ -6272,19 +6272,19 @@ map0 {
 
 			trips {
 				gpu3_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
@@ -6305,19 +6305,19 @@ map0 {
 
 			trips {
 				gpu4_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
@@ -6338,19 +6338,19 @@ map0 {
 
 			trips {
 				gpu5_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
@@ -6371,19 +6371,19 @@ map0 {
 
 			trips {
 				gpu6_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
@@ -6404,19 +6404,19 @@ map0 {
 
 			trips {
 				gpu7_alert0: trip-point0 {
-					temperature = <85000>;
+					temperature = <95000>;
 					hysteresis = <1000>;
 					type = "passive";
 				};
 
 				trip-point1 {
-					temperature = <90000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "hot";
 				};
 
 				trip-point2 {
-					temperature = <110000>;
+					temperature = <125000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};

-- 
2.34.1
* Re: [PATCH v2 2/2] arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures
From: Akhil P Oommen @ 2025-01-13 10:28 UTC
To: Neil Armstrong, Bjorn Andersson, Konrad Dybcio, Rob Herring,
Krzysztof Kozlowski, Conor Dooley
Cc: linux-arm-msm, devicetree, linux-kernel
On 1/10/2025 4:06 PM, Neil Armstrong wrote:
> On the SM8650, the dynamic clock and voltage scaling (DCVS) for the GPU
> is done in an hardware controlled loop by the GPU Management Unit (GMU).
>
> Since the GMU does a better job at maintaining the GPUs temperature in an
> acceptable range by taking in account more parameters like the die
> characteristics or other internal sensors, it makes no sense to try
> and reproduce a similar set of constraints with the Linux devfreq thermal
> core.

Just FYI, this description is incorrect. SM8650's GMU doesn't do any
sort of thermal management.

-Akhil.

>
> Instead, set higher temperatures in the GPU trip points corresponding to
> the temperatures provided by Qualcomm in the dowstream source, which will
> trigger the devfreq thermal core if the GMU cannot handle the temperature
> surge, and try our best to avoid reaching the critical temperature trip
> point which should trigger an inevitable thermal shutdown.
* Re: [PATCH v2 2/2] arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures
From: Neil Armstrong @ 2025-01-13 10:45 UTC
To: Akhil P Oommen, Bjorn Andersson, Konrad Dybcio, Rob Herring,
Krzysztof Kozlowski, Conor Dooley
Cc: linux-arm-msm, devicetree, linux-kernel
Hi,

On 13/01/2025 11:28, Akhil P Oommen wrote:
> On 1/10/2025 4:06 PM, Neil Armstrong wrote:
>> On the SM8650, the dynamic clock and voltage scaling (DCVS) for the GPU
>> is done in an hardware controlled loop by the GPU Management Unit (GMU).
>>
>> Since the GMU does a better job at maintaining the GPUs temperature in an
>> acceptable range by taking in account more parameters like the die
>> characteristics or other internal sensors, it makes no sense to try
>> and reproduce a similar set of constraints with the Linux devfreq thermal
>> core.
>
> Just FYI, this description is incorrect. SM8650's GMU doesn't do any
> sort of thermal management.

Ok, thx for confirming this. In our tests the temperature steadily stayed
at a max trip point when setting them higher, but perhaps it's a side
effect of other mitigations.

Are the new trip points still ok? They are derived from the downstream DT.

Thanks,
Neil

>
> -Akhil.
>
>>
>> Instead, set higher temperatures in the GPU trip points corresponding to
>> the temperatures provided by Qualcomm in the dowstream source, which will
>> trigger the devfreq thermal core if the GMU cannot handle the temperature
>> surge, and try our best to avoid reaching the critical temperature trip
>> point which should trigger an inevitable thermal shutdown.
* Re: [PATCH v2 2/2] arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures
From: Akhil P Oommen @ 2025-01-15 19:09 UTC
To: neil.armstrong, Bjorn Andersson, Konrad Dybcio, Rob Herring,
Krzysztof Kozlowski, Conor Dooley
Cc: linux-arm-msm, devicetree, linux-kernel
On 1/13/2025 4:15 PM, Neil Armstrong wrote:
> Hi,
>
> On 13/01/2025 11:28, Akhil P Oommen wrote:
>> On 1/10/2025 4:06 PM, Neil Armstrong wrote:
>>> On the SM8650, the dynamic clock and voltage scaling (DCVS) for the GPU
>>> is done in an hardware controlled loop by the GPU Management Unit (GMU).
>>>
>>> Since the GMU does a better job at maintaining the GPUs temperature
>>> in an
>>> acceptable range by taking in account more parameters like the die
>>> characteristics or other internal sensors, it makes no sense to try
>>> and reproduce a similar set of constraints with the Linux devfreq
>>> thermal
>>> core.
>>
>> Just FYI, this description is incorrect. SM8650's GMU doesn't do any
>> sort of thermal management.
>
> Ok, thx for confirming this, in our tests the temperature steadily stayed
> at a max trip point when setting them higher. But perhaps it's a side
> effect
> of other mitigations.
>
> Are the new trip points still ok ? they are derived from the downstream DT.

I don't have expertise on the thermal side. But in my non-expert opinion,
it is fine to use a similar configuration from downstream.

-Akhil.

>
> Thanks,
> Neil
>
>>
>> -Akhil.
>>
>>>
>>> Instead, set higher temperatures in the GPU trip points corresponding to
>>> the temperatures provided by Qualcomm in the dowstream source, which
>>> will
>>> trigger the devfreq thermal core if the GMU cannot handle the
>>> temperature
>>> surge, and try our best to avoid reaching the critical temperature trip
>>> point which should trigger an inevitable thermal shutdown.
* Re: [PATCH v2 2/2] arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures
2025-01-13 10:28 ` Akhil P Oommen
2025-01-13 10:45 ` Neil Armstrong
@ 2025-01-16 21:20 ` Konrad Dybcio
2025-01-22 14:09 ` Akhil P Oommen
1 sibling, 1 reply; 9+ messages in thread
From: Konrad Dybcio @ 2025-01-16 21:20 UTC
To: Akhil P Oommen, Neil Armstrong, Bjorn Andersson, Konrad Dybcio,
Rob Herring, Krzysztof Kozlowski, Conor Dooley
Cc: linux-arm-msm, devicetree, linux-kernel

On 13.01.2025 11:28 AM, Akhil P Oommen wrote:
> On 1/10/2025 4:06 PM, Neil Armstrong wrote:
>> On the SM8650, the dynamic clock and voltage scaling (DCVS) for the GPU
>> is done in an hardware controlled loop by the GPU Management Unit (GMU).
>>
>> Since the GMU does a better job at maintaining the GPUs temperature in an
>> acceptable range by taking in account more parameters like the die
>> characteristics or other internal sensors, it makes no sense to try
>> and reproduce a similar set of constraints with the Linux devfreq thermal
>> core.
>
> Just FYI, this description is incorrect. SM8650's GMU doesn't do any
> sort of thermal management.

What's this for then? Just reacting to thermal pressure?

https://git.codelinaro.org/clo/le/platform/vendor/qcom/opensource/graphics-kernel/-/commit/e4387d101d14965c8f2c67e10a6a9499c1a88af4

Konrad
* Re: [PATCH v2 2/2] arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures
2025-01-16 21:20 ` Konrad Dybcio
@ 2025-01-22 14:09 ` Akhil P Oommen
2025-01-23 11:59 ` Konrad Dybcio
0 siblings, 1 reply; 9+ messages in thread
From: Akhil P Oommen @ 2025-01-22 14:09 UTC
To: Konrad Dybcio, Neil Armstrong, Bjorn Andersson, Konrad Dybcio,
Rob Herring, Krzysztof Kozlowski, Conor Dooley
Cc: linux-arm-msm, devicetree, linux-kernel

On 1/17/2025 2:50 AM, Konrad Dybcio wrote:
> On 13.01.2025 11:28 AM, Akhil P Oommen wrote:
>> On 1/10/2025 4:06 PM, Neil Armstrong wrote:
>>> On the SM8650, the dynamic clock and voltage scaling (DCVS) for the GPU
>>> is done in an hardware controlled loop by the GPU Management Unit (GMU).
>>>
>>> Since the GMU does a better job at maintaining the GPUs temperature in an
>>> acceptable range by taking in account more parameters like the die
>>> characteristics or other internal sensors, it makes no sense to try
>>> and reproduce a similar set of constraints with the Linux devfreq thermal
>>> core.
>>
>> Just FYI, this description is incorrect. SM8650's GMU doesn't do any
>> sort of thermal management.
>
> What's this for then? Just reacting to thermal pressure?
>
> https://git.codelinaro.org/clo/le/platform/vendor/qcom/opensource/graphics-kernel/-/commit/e4387d101d14965c8f2c67e10a6a9499c1a88af4
>

I don't think those TSENSE configs matters on SM8650 in production.

-Akhil.

> Konrad
* Re: [PATCH v2 2/2] arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures
2025-01-22 14:09 ` Akhil P Oommen
@ 2025-01-23 11:59 ` Konrad Dybcio
0 siblings, 0 replies; 9+ messages in thread
From: Konrad Dybcio @ 2025-01-23 11:59 UTC
To: Akhil P Oommen, Konrad Dybcio, Neil Armstrong, Bjorn Andersson,
Konrad Dybcio, Rob Herring, Krzysztof Kozlowski, Conor Dooley
Cc: linux-arm-msm, devicetree, linux-kernel

On 22.01.2025 3:09 PM, Akhil P Oommen wrote:
> On 1/17/2025 2:50 AM, Konrad Dybcio wrote:
>> On 13.01.2025 11:28 AM, Akhil P Oommen wrote:
>>> On 1/10/2025 4:06 PM, Neil Armstrong wrote:
>>>> On the SM8650, the dynamic clock and voltage scaling (DCVS) for the GPU
>>>> is done in an hardware controlled loop by the GPU Management Unit (GMU).
>>>>
>>>> Since the GMU does a better job at maintaining the GPUs temperature in an
>>>> acceptable range by taking in account more parameters like the die
>>>> characteristics or other internal sensors, it makes no sense to try
>>>> and reproduce a similar set of constraints with the Linux devfreq thermal
>>>> core.
>>>
>>> Just FYI, this description is incorrect. SM8650's GMU doesn't do any
>>> sort of thermal management.
>>
>> What's this for then? Just reacting to thermal pressure?
>>
>> https://git.codelinaro.org/clo/le/platform/vendor/qcom/opensource/graphics-kernel/-/commit/e4387d101d14965c8f2c67e10a6a9499c1a88af4
>>
>
> I don't think those TSENSE configs matters on SM8650 in production.

OK, thanks

Konrad
Thread overview: 9+ messages
2025-01-10 10:36 [PATCH v2 0/2] arm64: dts: qcom: sm8650: rework CPU & GPU thermal zones Neil Armstrong
2025-01-10 10:36 ` [PATCH v2 1/2] arm64: dts: qcom: sm8650: drop cpu thermal passive trip points Neil Armstrong
2025-01-10 10:36 ` [PATCH v2 2/2] arm64: dts: qcom: sm8650: setup gpu thermal with higher temperatures Neil Armstrong
2025-01-13 10:28 ` Akhil P Oommen
2025-01-13 10:45 ` Neil Armstrong
2025-01-15 19:09 ` Akhil P Oommen
2025-01-16 21:20 ` Konrad Dybcio
2025-01-22 14:09 ` Akhil P Oommen
2025-01-23 11:59 ` Konrad Dybcio