From: Dragan Simic <dsimic@manjaro.org>
To: Alexey Charkov <alchark@gmail.com>
Cc: "Rob Herring" <robh+dt@kernel.org>,
"Krzysztof Kozlowski" <krzysztof.kozlowski+dt@linaro.org>,
"Conor Dooley" <conor+dt@kernel.org>,
"Heiko Stuebner" <heiko@sntech.de>,
"Sebastian Reichel" <sebastian.reichel@collabora.com>,
"Cristian Ciocaltea" <cristian.ciocaltea@collabora.com>,
"Christopher Obbard" <chris.obbard@collabora.com>,
"Tamás Szűcs" <szucst@iit.uni-miskolc.hu>,
"Shreeya Patel" <shreeya.patel@collabora.com>,
"Kever Yang" <kever.yang@rock-chips.com>,
"Chris Morgan" <macromorgan@hotmail.com>,
devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
linux-rockchip@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588
Date: Mon, 22 Jan 2024 05:55:35 +0100
Message-ID: <f5c05015e042b11a51a9af26c35f18ed@manjaro.org>
In-Reply-To: <CABjd4Yz11D8ThcT-oCWsQf9jL2idChFYSRYVVu3KNnzwoOwkKQ@mail.gmail.com>
Hello Alexey,
On 2024-01-21 19:56, Alexey Charkov wrote:
> On Thu, Jan 18, 2024 at 10:48 PM Dragan Simic <dsimic@manjaro.org>
> wrote:
>> On 2024-01-08 14:41, Alexey Charkov wrote:
>> I apologize for my delayed response. It took me almost a month to
>> nearly fully recover from some really nasty flu that eventually went
>> into my lungs. It was awful and I'm still not back to my 100%. :(
>
> Ouch, I hope you get well soon!
Thank you, let's hope so. It's been really exhausting. :(
>> > On Sun, Jan 7, 2024 at 2:54 AM Dragan Simic <dsimic@manjaro.org> wrote:
>> >> On 2024-01-06 23:23, Alexey Charkov wrote:
>> >> > Include thermal zones information in device tree for rk3588 variants
>> >> > and enable the built-in thermal sensing ADC on RADXA Rock 5B
>> >> >
>> >> > Signed-off-by: Alexey Charkov <alchark@gmail.com>
>> >> > ---
>> >> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> > b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> > index 8aa0499f9b03..8235991e3112 100644
>> >> > --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> > +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> > @@ -10,6 +10,7 @@
>> >> > #include <dt-bindings/reset/rockchip,rk3588-cru.h>
>> >> > #include <dt-bindings/phy/phy.h>
>> >> > #include <dt-bindings/ata/ahci.h>
>> >> > +#include <dt-bindings/thermal/thermal.h>
>> >> >
>> >> > / {
>> >> > compatible = "rockchip,rk3588";
>> >> > @@ -2112,6 +2113,148 @@ tsadc: tsadc@fec00000 {
>> >> > status = "disabled";
>> >> > };
>> >> >
>> >> > + thermal_zones: thermal-zones {
>> >> > + soc_thermal: soc-thermal {
>> >>
>> >> It would be better to name it cpu_thermal instead. In the end,
>> >> that's what it is.
>> >
>> > The TRM document says the first TSADC channel (to which this section
>> > applies) measures the temperature near the center of the SoC die,
>> > which implies not only the CPU but also the GPU at least. RADXA's
>> > kernel for Rock 5B also has GPU passive cooling as one of the cooling
>> > maps under this node (not included here, as we don't have the GPU node
>> > in .dtsi just yet). So perhaps naming this one cpu_thermal could be
>> > misleading?
>>
>> Ah, I see now, thanks for reminding; it's all described on page 1,372
>> of the first part of the RK3588 TRM v1.0.
>>
>> Having that in mind, I'd suggest that we end up naming it
>> package_thermal. The temperature near the center of the chip is
>> usually considered to be the overall package temperature; for
>> example, that's how the user-facing CPU temperatures are measured
>> in the x86_64 world.
>
> Sounds good, will rename in v3!
Thanks, I'm glad you agree.
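For illustration only, here is a minimal sketch of how the renamed
package-level zone could look if it carries only the critical trip, as
agreed further below. The tsadc channel (0, the sensor near the center
of the die), the 115 C threshold and the polling delays are assumptions
and placeholders, not values taken from the patch or the TRM.

```dts
/* Sketch only: package-level zone with just a critical trip.
 * Channel 0, the threshold and the polling delays are assumptions. */
thermal_zones: thermal-zones {
	package_thermal: package-thermal {
		polling-delay-passive = <0>;	/* no passive trips here */
		polling-delay = <1000>;		/* placeholder, in ms */
		thermal-sensors = <&tsadc 0>;

		trips {
			/* reaching this trip makes the thermal core
			 * initiate an orderly shutdown */
			package_crit: package-crit {
				temperature = <115000>;	/* millicelsius */
				hysteresis = <0>;
				type = "critical";
			};
		};
	};
};
```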
>> >> > + trips {
>> >> > + threshold: trip-point-0 {
>> >>
>> >> It would be better to name it cpu_alert0 instead, because that's what
>> >> other newer dtsi files already use.
>> >
>> > Reflecting on your comments here and below, I'm thinking that maybe it
>> > would be better to define only the critical trip point for the SoC
>> > overall, and then have alerts along with the respective cooling maps
>> > separately for A76-0,1, A76-2,3, A55-0,1,2,3? After all, given that we
>> > have more granular temperature measurement here than in previous RK
>> > chipsets it might be better to only throttle the "offending" cores,
>> > not the full package.
>> >
>> > What do you think?
>> >
>> > Downstream DT doesn't follow this approach though, so maybe there's
>> > something I'm missing here.
>>
>> I agree, it's better to fully utilize the higher measurement
>> granularity made possible by having multiple temperature sensors
>> available.
>>
>> I also agree that we should have only the critical trip defined for
>> the package-level temperature sensor. Let's have the separate
>> temperature measurements for the CPU (sub)clusters do the thermal
>> throttling, and let's keep the package-level measurement for the
>> critical shutdowns only. IIRC, some MediaTek SoC dtsi already does
>> exactly that.
>>
>> Of course, there's no reason not to have the critical trips defined
>> for the CPU (sub)clusters as well.
>
> I think I'll also add a board-specific active cooling mechanism on the
> package level in the next iteration, given that Rock 5B has a PWM fan
> defined as a cooling device. That will go in the separate patch that
> updates rk3588-rock-5b.dts (your feedback to v2 of this patch is also
> duly noted, thank you!)
Great, thanks. Sure, making use of the Rock 5B's support for attaching
a PWM-controlled cooling fan is the way to go.
Just to reiterate a bit, any "active" trip points belong to the board
dts file(s), because having a cooling fan is a board-specific feature.
As a note, you may also want to have a look at the RockPro64 dts(i)
files, for example; the RockPro64 also comes with a cooling fan
connector and the associated PWM fan control logic.
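A rough sketch of such board-level additions for rk3588-rock-5b.dts
follows. It assumes that the package-level zone ends up labelled
package_thermal in the SoC dtsi without any cooling maps of its own,
and that the board dts defines a PWM fan node labelled "fan" with
"#cooling-cells = <2>" and a "cooling-levels" table of at least three
entries. The trip temperatures and fan-state ranges are placeholders,
not tuned values.

```dts
/* Hypothetical board-level active cooling for the Rock 5B; the
 * package_thermal and fan labels and all values are assumptions. */
&package_thermal {
	trips {
		package_fan0: package-fan0 {
			temperature = <55000>;	/* millicelsius */
			hysteresis = <2000>;
			type = "active";
		};

		package_fan1: package-fan1 {
			temperature = <65000>;
			hysteresis = <2000>;
			type = "active";
		};
	};

	cooling-maps {
		/* fan states 0..1 while above the first active trip */
		map0 {
			trip = <&package_fan0>;
			cooling-device = <&fan THERMAL_NO_LIMIT 1>;
		};

		/* fan state 2 up to the maximum above the second one */
		map1 {
			trip = <&package_fan1>;
			cooling-device = <&fan 2 THERMAL_NO_LIMIT>;
		};
	};
};
```

Because dtc merges nodes with the same path, the trips and cooling-maps
subnodes above extend (or create) the ones from the dtsi rather than
replacing them, and THERMAL_NO_LIMIT is available through the
dt-bindings/thermal/thermal.h include that the patch adds to
rk3588s.dtsi.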
>> >> > + temperature = <75000>;
>> >> > + hysteresis = <2000>;
>> >> > + type = "passive";
>> >> > + };
>> >> > + target: trip-point-1 {
>> >>
>> >> It would be better to name it cpu_alert1 instead, because that's what
>> >> other newer dtsi files already use.
>> >>
>> >> > + temperature = <85000>;
>> >> > + hysteresis = <2000>;
>> >> > + type = "passive";
>> >> > + };
>> >> > + soc_crit: soc-crit {
>> >>
>> >> It would be better to name it cpu_crit instead, because that's what
>> >> other newer dtsi files already use.
>> >
>> > Seems to me that if I define separate trips for the three groups of
>> > CPU cores as mentioned above this would better stay as soc_crit, as it
>> > applies to the whole die rather than the CPU cluster alone. Then
>> > 'threshold' and 'target' will go altogether, and I'll have separate
>> > *_alert0 and *_alert1 per CPU group.
>>
>> It would perhaps be best to have "passive", "hot" and "critical"
>> trips defined for all three CPU groups/(sub)clusters, separately of
>> course, to have even higher granularity when it comes to the resulting
>> thermal throttling.
>
> I looked through drivers/thermal/rockchip_thermal.c, and it doesn't
> seem to provide any callback for the "hot" trip as part of its struct
> thermal_zone_device_ops, so I guess it would be redundant in our case
> here? I couldn't find any generic mechanism to react to "hot" trips,
> and they seem to be purely driver-specific, thus a no-op in the case
> of Rockchip, or am I missing something?
That's a good question. Please let me go through the code in detail,
and I'll get back to you with an update soon. Also, please hold off on
sending v3 for a bit, until all open questions are addressed.
>> >> > + hysteresis = <2000>;
>> >> > + type = "critical";
>> >> > + };
>> >> > + };
>> >> > + cooling-maps {
>> >> > + map0 {
>> >> > + trip = <&target>;
>> >>
>> >> Shouldn't &threshold (i.e. &cpu_alert0) be referenced here instead?
>> >>
>> >> > + cooling-device = <&cpu_l0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> >>
>> >> Shouldn't all big CPU cores be listed here instead?
>> >
>> > I guess if a separate trip point is defined for cpu_l0,1,2,3 then it
>> > would need to throttle at 75C, and then cpu_b0,1 and cpu_b2,3 at 85C
>> > each. Logic being that if a sensor stacked in the middle of a group of
>> > four cores shows 75C then one of the four might well be in abnormal
>> > temperature range already (85+), while sensors next to only two big
>> > cores each will likely show temperatures similar to the actual core
>> > temperature.
>>
>> I think we shouldn't make any assumptions about how the CPU cores
>> heat up and affect each other, because we don't really know the
>> required details. Instead, we should simply define reasonable values
>> for the "passive", "hot" and "critical" trips, and leave the rest to
>> the standard thermal throttling logic. I hope you agree.
>>
>> In the end, that's why we have separate thermal sensors available.
>
> Indeed! I'll add extra "passive" alerts though (at 75C) to enable the
> power allocation governor to initialize its PID parameters calculation
> before the control temperature setpoint gets hit (per Daniel's
> feedback under separate cover).
I'm glad you agree. Adding one more "passive" trip point makes sense,
but please let me go through the code in detail first.
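To make the per-cluster approach discussed above a bit more concrete,
here is a minimal sketch of one zone for the first pair of big cores.
The tsadc channel number, the labels and all temperatures are
assumptions for illustration. The lower "passive" trip at 75 C only
marks the point at which the power allocator governor starts its
calculations, while the cooling map is bound to the upper "passive"
trip, which acts as the control temperature. A "hot" trip is left out
until the open question about its handling is settled.

```dts
/* Sketch only: per-cluster zone for cpu_b0/cpu_b1; the channel,
 * labels and temperatures are assumptions, not the final patch. */
bigcore0_thermal: bigcore0-thermal {
	polling-delay-passive = <100>;	/* placeholder, in ms */
	polling-delay = <0>;
	thermal-sensors = <&tsadc 1>;

	trips {
		/* switch-on point for the power allocator's PID loop */
		bigcore0_alert0: bigcore0-alert0 {
			temperature = <75000>;	/* millicelsius */
			hysteresis = <2000>;
			type = "passive";
		};

		/* control temperature the governor aims to hold */
		bigcore0_alert1: bigcore0-alert1 {
			temperature = <85000>;
			hysteresis = <2000>;
			type = "passive";
		};

		bigcore0_crit: bigcore0-crit {
			temperature = <115000>;
			hysteresis = <0>;
			type = "critical";
		};
	};

	cooling-maps {
		map0 {
			trip = <&bigcore0_alert1>;
			cooling-device =
				<&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
				<&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
		};
	};
};
```

The other two CPU groups (cpu_b2/cpu_b3 and cpu_l0 to cpu_l3) would
presumably get analogous zones on their own tsadc channels.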
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Thread overview: 21+ messages
2024-01-06 22:23 [PATCH] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588 Alexey Charkov
2024-01-06 22:54 ` Dragan Simic
2024-01-08 13:41 ` Alexey Charkov
2024-01-18 18:48 ` Dragan Simic
2024-01-21 18:56 ` Alexey Charkov
2024-01-22 4:55 ` Dragan Simic [this message]
2024-01-22 6:03 ` Alexey Charkov
2024-01-22 6:22 ` Dragan Simic
2024-01-22 7:36 ` Alexey Charkov
2024-01-22 7:57 ` Dragan Simic
2024-01-22 14:20 ` Alexey Charkov
2024-01-22 17:33 ` Dragan Simic
2024-01-09 19:19 ` [PATCH v2] " Alexey Charkov
2024-01-18 19:20 ` Dragan Simic
2024-01-19 13:15 ` Heiko Stübner
2024-01-19 16:21 ` Daniel Lezcano
2024-01-21 19:57 ` Alexey Charkov
2024-01-22 0:04 ` Daniel Lezcano
2024-01-22 5:57 ` Alexey Charkov
2024-01-23 19:47 ` Alexey Charkov
2024-01-24 0:14 ` Daniel Lezcano