linux-arm-kernel.lists.infradead.org archive mirror
From: Dragan Simic <dsimic@manjaro.org>
To: Alexey Charkov <alchark@gmail.com>
Cc: "Rob Herring" <robh+dt@kernel.org>,
	"Krzysztof Kozlowski" <krzysztof.kozlowski+dt@linaro.org>,
	"Conor Dooley" <conor+dt@kernel.org>,
	"Heiko Stuebner" <heiko@sntech.de>,
	"Sebastian Reichel" <sebastian.reichel@collabora.com>,
	"Cristian Ciocaltea" <cristian.ciocaltea@collabora.com>,
	"Christopher Obbard" <chris.obbard@collabora.com>,
	"Tamás Szűcs" <szucst@iit.uni-miskolc.hu>,
	"Shreeya Patel" <shreeya.patel@collabora.com>,
	"Kever Yang" <kever.yang@rock-chips.com>,
	"Chris Morgan" <macromorgan@hotmail.com>,
	devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-rockchip@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588
Date: Mon, 22 Jan 2024 05:55:35 +0100
Message-ID: <f5c05015e042b11a51a9af26c35f18ed@manjaro.org>
In-Reply-To: <CABjd4Yz11D8ThcT-oCWsQf9jL2idChFYSRYVVu3KNnzwoOwkKQ@mail.gmail.com>

Hello Alexey,

On 2024-01-21 19:56, Alexey Charkov wrote:
> On Thu, Jan 18, 2024 at 10:48 PM Dragan Simic <dsimic@manjaro.org> wrote:
>> On 2024-01-08 14:41, Alexey Charkov wrote:
>> I apologize for my delayed response.  It took me almost a month to
>> nearly fully recover from a really nasty flu that eventually went
>> into my lungs.  It was awful, and I'm still not back to 100%. :(
> 
> Ouch, I hope you get well soon!

Thank you, let's hope so.  It's been really exhausting. :(

>> > On Sun, Jan 7, 2024 at 2:54 AM Dragan Simic <dsimic@manjaro.org> wrote:
>> >> On 2024-01-06 23:23, Alexey Charkov wrote:
>> >> > Include thermal zones information in device tree for rk3588 variants
>> >> > and enable the built-in thermal sensing ADC on RADXA Rock 5B
>> >> >
>> >> > Signed-off-by: Alexey Charkov <alchark@gmail.com>
>> >> > ---
>> >> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> > b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> > index 8aa0499f9b03..8235991e3112 100644
>> >> > --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> > +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> > @@ -10,6 +10,7 @@
>> >> >  #include <dt-bindings/reset/rockchip,rk3588-cru.h>
>> >> >  #include <dt-bindings/phy/phy.h>
>> >> >  #include <dt-bindings/ata/ahci.h>
>> >> > +#include <dt-bindings/thermal/thermal.h>
>> >> >
>> >> >  / {
>> >> >       compatible = "rockchip,rk3588";
>> >> > @@ -2112,6 +2113,148 @@ tsadc: tsadc@fec00000 {
>> >> >               status = "disabled";
>> >> >       };
>> >> >
>> >> > +     thermal_zones: thermal-zones {
>> >> > +             soc_thermal: soc-thermal {
>> >>
>> >> It would be better to name it cpu_thermal instead.  In the end,
>> >> that's what it is.
>> >
>> > The TRM document says the first TSADC channel (to which this section
>> > applies) measures the temperature near the center of the SoC die,
>> > which implies not only the CPU but also the GPU at least. RADXA's
>> > kernel for Rock 5B also has GPU passive cooling as one of the cooling
>> > maps under this node (not included here, as we don't have the GPU node
>> > in .dtsi just yet). So perhaps naming this one cpu_thermal could be
>> > misleading?
>> 
>> Ah, I see now, thanks for the reminder;  it's all described on page
>> 1,372 of the first part of the RK3588 TRM v1.0.
>> 
>> With that in mind, I'd suggest that we end up naming it package_thermal.
>> The temperature near the center of the chip is usually considered to be
>> the overall package temperature;  for example, that's how the user-facing
>> CPU temperatures are measured in the x86_64 world.
> 
> Sounds good, will rename in v3!

Thanks, I'm glad you agree.
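For illustration only, here's a rough sketch of how the renamed zone might
look in rk3588s.dtsi, assuming it ends up carrying just a critical trip, as
discussed further below.  The <&tsadc 0> channel follows the TRM description
of the sensor near the center of the die;  the package_crit label, the 115 °C
value and the polling delays are placeholders, not values from the patch.

```dts
thermal_zones: thermal-zones {
	/* Sensor near the center of the die, i.e. the overall package temperature */
	package_thermal: package-thermal {
		polling-delay-passive = <0>;
		polling-delay = <0>;
		thermal-sensors = <&tsadc 0>;

		trips {
			/* Package-level zone only triggers a critical shutdown */
			package_crit: package-crit {
				temperature = <115000>;	/* millicelsius; placeholder */
				hysteresis = <0>;
				type = "critical";
			};
		};
	};
};
```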

>> >> > +                     trips {
>> >> > +                             threshold: trip-point-0 {
>> >>
>> >> It would be better to name it cpu_alert0 instead, because that's what
>> >> other newer dtsi files already use.
>> >
>> > Reflecting on your comments here and below, I'm thinking that maybe it
>> > would be better to define only the critical trip point for the SoC
>> > overall, and then have alerts along with the respective cooling maps
>> > separately for A76-0,1, A76-2,3, A55-0,1,2,3? After all, given that we
>> > have more granular temperature measurement here than in previous RK
>> > chipsets it might be better to only throttle the "offending" cores,
>> > not the full package.
>> >
>> > What do you think?
>> >
>> > Downstream DT doesn't follow this approach though, so maybe there's
>> > something I'm missing here.
>> 
>> I agree, it's better to fully utilize the higher measurement granularity
>> made possible by having multiple temperature sensors available.
>> 
>> I also agree that we should have only the critical trip defined for the
>> package-level temperature sensor.  Let's have the separate temperature
>> measurements for the CPU (sub)clusters do the thermal throttling, and
>> let's keep the package-level measurement for the critical shutdowns
>> only.  IIRC, some MediaTek SoC dtsi already does exactly that.
>> 
>> Of course, there are no reasons not to have the critical trips defined
>> for the CPU (sub)clusters as well.
> 
> I think I'll also add a board-specific active cooling mechanism at the
> package level in the next iteration, given that Rock 5B has a PWM fan
> defined as a cooling device. That will go in a separate patch that
> updates rk3588-rock-5b.dts (your feedback to v2 of this patch is also
> duly noted, thank you!)

Great, thanks.  Sure, making use of the Rock 5B's support for attaching
a PWM-controlled cooling fan is the way to go.

Just to reiterate a bit, any "active" trip points belong to the board dts
file(s), because having a cooling fan is a board-specific feature.  As a
note, you may also want to have a look at the RockPro64 dts(i) files, for
example;  the RockPro64 also comes with a cooling fan connector and the
associated PWM fan control logic.
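A rough sketch of what the board-level part could look like in
rk3588-rock-5b.dts, assuming the dtsi provides a package_thermal label and
the board dts already has a pwm-fan node labeled fan, with #cooling-cells =
<2> and a cooling-levels table.  The package_fan0 label, the 55 °C threshold
and the one-second polling are made up for illustration.

```dts
/* Board dts: the fan exists only on this board, so its trip lives here */
&package_thermal {
	/* Illustrative: poll periodically so the active trip is re-evaluated */
	polling-delay = <1000>;

	trips {
		package_fan0: package-fan0 {
			temperature = <55000>;	/* millicelsius; illustrative */
			hysteresis = <2000>;
			type = "active";
		};
	};

	cooling-maps {
		map0 {
			trip = <&package_fan0>;
			/* Cooling states from the lowest (THERMAL_NO_LIMIT) up to state 1 */
			cooling-device = <&fan THERMAL_NO_LIMIT 1>;
		};
	};
};
```

dtc merges these nodes with the ones from the dtsi, so the board file adds
only the fan-specific trip and cooling map on top of the package zone.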

>> >> > +                                     temperature = <75000>;
>> >> > +                                     hysteresis = <2000>;
>> >> > +                                     type = "passive";
>> >> > +                             };
>> >> > +                             target: trip-point-1 {
>> >>
>> >> It would be better to name it cpu_alert1 instead, because that's what
>> >> other newer dtsi files already use.
>> >>
>> >> > +                                     temperature = <85000>;
>> >> > +                                     hysteresis = <2000>;
>> >> > +                                     type = "passive";
>> >> > +                             };
>> >> > +                             soc_crit: soc-crit {
>> >>
>> >> It would be better to name it cpu_crit instead, because that's what
>> >> other newer dtsi files already use.
>> >
>> > Seems to me that if I define separate trips for the three groups of
>> > CPU cores as mentioned above, this one should stay as soc_crit, as it
>> > applies to the whole die rather than the CPU cluster alone. Then
>> > 'threshold' and 'target' will go away altogether, and I'll have separate
>> > *_alert0 and *_alert1 per CPU group.
>> 
>> It would perhaps be best to have "passive", "hot" and "critical"
>> trips defined for all three CPU groups/(sub)clusters, separately of
>> course, to get even higher granularity for the resulting thermal
>> throttling.
> 
> I looked through drivers/thermal/rockchip_thermal.c, and it doesn't
> seem to provide any callback for the "hot" trip as part of its struct
> thermal_zone_device_ops, so I guess it would be redundant in our case
> here? I couldn't find any generic mechanism to react to "hot" trips,
> and they seem to be purely driver-specific, thus a no-op in the case of
> Rockchip, or am I missing something?

That's a good question.  Please let me go through the code in detail,
and I'll get back with an update soon.  Also, please hold off on
sending v3 for a bit, until all open questions are addressed.

>> >> > +                                     hysteresis = <2000>;
>> >> > +                                     type = "critical";
>> >> > +                             };
>> >> > +                     };
>> >> > +                     cooling-maps {
>> >> > +                             map0 {
>> >> > +                                     trip = <&target>;
>> >>
>> >> Shouldn't &threshold (i.e. &cpu_alert0) be referenced here instead?
>> >>
>> >> > +                                     cooling-device = <&cpu_l0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> >>
>> >> Shouldn't all big CPU cores be listed here instead?
>> >
>> > I guess if a separate trip point is defined for cpu_l0,1,2,3 then it
>> > would need to throttle at 75C, and then cpu_b0,1 and cpu_b2,3 at 85C
>> > each. The logic being that if a sensor placed in the middle of a group
>> > of four cores shows 75C, then one of the four might well be in the
>> > abnormal temperature range already (85C+), while sensors next to only
>> > two big cores each will likely show temperatures similar to the actual
>> > core temperature.
>> 
>> I think we shouldn't make any assumptions about how the CPU cores heat up
>> and affect each other, because we don't really know the required details.
>> Instead, we should simply define reasonable values for the "passive",
>> "hot" and "critical" trips, and leave the rest to the standard thermal
>> throttling logic.  I hope you agree.
>> 
>> In the end, that's why we have separate thermal sensors available.
> 
> Indeed! I'll add extra "passive" alerts though (at 75C) to enable the
> power allocator governor to initialize its PID calculations before the
> control temperature setpoint is reached (per Daniel's feedback under
> separate cover).

I'm glad you agree.  Adding one more "passive" trip point makes sense,
but please let me go through the code in detail first.
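A sketch of one per-cluster zone with the two "passive" trips plus a
"critical" one, as it might sit inside the thermal-zones node.  It assumes
that TSADC channel 1 is the sensor next to the first pair of Cortex-A76
cores (worth double-checking against the TRM) and that cpu_b0 and cpu_b1
carry #cooling-cells = <2>;  the bigcore0_* labels and the 115 °C critical
value are illustrative, while 75 °C and 85 °C are the values from the patch.

```dts
/* One zone per CPU (sub)cluster; here the first pair of Cortex-A76 cores */
bigcore0_thermal: bigcore0-thermal {
	/* Poll every 100 ms while passive cooling is in effect */
	polling-delay-passive = <100>;
	polling-delay = <0>;
	thermal-sensors = <&tsadc 1>;

	trips {
		/* Lower passive trip, meant to let the power allocator start early */
		bigcore0_alert0: bigcore0-alert0 {
			temperature = <75000>;
			hysteresis = <2000>;
			type = "passive";
		};
		/* Upper passive trip: the control temperature to hold */
		bigcore0_alert1: bigcore0-alert1 {
			temperature = <85000>;
			hysteresis = <2000>;
			type = "passive";
		};
		bigcore0_crit: bigcore0-crit {
			temperature = <115000>;	/* placeholder */
			hysteresis = <0>;
			type = "critical";
		};
	};

	cooling-maps {
		map0 {
			/* Throttle only the two cores next to this sensor */
			trip = <&bigcore0_alert1>;
			cooling-device =
				<&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
				<&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
		};
	};
};
```

The other A76 pair and the A55 cluster would get analogous zones, each
mapping its own sensor to its own cores.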

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Thread overview: 21+ messages
2024-01-06 22:23 [PATCH] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588 Alexey Charkov
2024-01-06 22:54 ` Dragan Simic
2024-01-08 13:41   ` Alexey Charkov
2024-01-18 18:48     ` Dragan Simic
2024-01-21 18:56       ` Alexey Charkov
2024-01-22  4:55         ` Dragan Simic [this message]
2024-01-22  6:03           ` Alexey Charkov
2024-01-22  6:22             ` Dragan Simic
2024-01-22  7:36               ` Alexey Charkov
2024-01-22  7:57                 ` Dragan Simic
2024-01-22 14:20                   ` Alexey Charkov
2024-01-22 17:33                     ` Dragan Simic
2024-01-09 19:19 ` [PATCH v2] " Alexey Charkov
2024-01-18 19:20   ` Dragan Simic
2024-01-19 13:15   ` Heiko Stübner
2024-01-19 16:21   ` Daniel Lezcano
2024-01-21 19:57     ` Alexey Charkov
2024-01-22  0:04       ` Daniel Lezcano
2024-01-22  5:57         ` Alexey Charkov
2024-01-23 19:47         ` Alexey Charkov
2024-01-24  0:14           ` Daniel Lezcano