public inbox for linux-pm@vger.kernel.org
From: Daniel Lezcano <daniel.lezcano@linaro.org>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Linux PM mailing list <linux-pm@vger.kernel.org>,
	Lukasz Luba <Lukasz.Luba@arm.com>
Subject: Re: Trip points crossed not detected when no cooling device bound
Date: Fri, 28 Jun 2024 10:04:38 +0200
Message-ID: <ff495355-3a9f-422b-b9c8-707f7e35ba43@linaro.org>
In-Reply-To: <CAJZ5v0g0=k4HZhKhs=2iwO8zc=jkng898wF-nn_bUT-xA_iu6w@mail.gmail.com>

On 27/06/2024 20:23, Rafael J. Wysocki wrote:
> On Thu, Jun 27, 2024 at 6:30 PM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>> On 27/06/2024 11:54, Rafael J. Wysocki wrote:
>>> On Thu, Jun 27, 2024 at 12:24 AM Daniel Lezcano
>>> <daniel.lezcano@linaro.org> wrote:
>>>>
>>>> On 26/06/2024 23:21, Daniel Lezcano wrote:
>>>>
>>>> [ ... ]
>>>>
>>>>>> Oh, I see where the problem can be.  If the zone is polling only, it
>>>>>> will not rearm the timer when the current zone temperature is invalid
>>>>>> after the above commit, so does the attached patch help?
>>>>>
>>>>> At this point, I went far when bisecting another problem and I ended up
>>>>> screwing my config file. So I had to generate a new one from the default
>>>>> config. Since then the issue is no longer happening which sounds very
>>>>> strange to me.
>>>>>
>>>>> I'm still investigating, but if you have a suggestion in mind, it
>>>>> would be welcome because I'm failing to find out what is going on ... :/
>>>>
>>>> I finally reproduced the issue. That happens when there is *no* cooling
>>>> device bound on *any* thermal zones.
>>>
>>> Interesting.
>>>
>>>> Your patch seems to fix the problem but I'm not sure I understand the
>>>> conditions of the bug.
>>>
>>> It's probably the same as for commit 202aa0d4bb53:
>>> thermal_zone_device_init() sets tz->temperature to
>>> THERMAL_TEMP_INVALID and if the first invocation of
>>> __thermal_zone_get_temp() returns an error (because the .get_temp()
>>> callback returns an error), monitor_thermal_zone() is not called.  If
>>> polling is the only way in which the zone temperature can be updated,
>>> things go south because the timer is not set and there is no other way
>>> to set it.  No updates will be coming.
>>
>> If there is no polling delay (aka interrupt driven), the routine will
>> skip the _set_trips function and the monitor_thermal_zone() will do
>> nothing in this case, right ?
> 
> _set_trips() looks at tz->temperature, however, and it doesn't make
> sense to call it if the latter is invalid.
> 
> Same for handle_thermal_trip() and governor callbacks.
> 
>> Even setting a label jump to "monitor:" the routine is broken AFAICT
> 
> I beg to differ.
> 
> Yes, monitor_thermal_zone() does nothing if there is no polling, but
> it needs to be called anyway because it checks whether or not polling
> is there in the first place.
> 
> And if there is no polling, it is assumed that
> __thermal_zone_device_update() will be called by other means.

AFAICT, the interrupt can fire and that will result in a
thermal_zone_device_update() call, but the interrupt must first be set up
by __set_trips(), which is skipped because of the invalid temperature.

I've confirmed that on my EVB with the following setup:

  - trip points
  - no polling delay (interrupt based)
  - no cooling device

That does not work: there is no trip crossed notification.
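
To make the sequence concrete, here is a toy userspace model of it. This is
only a sketch, not the kernel code: every toy_* name is made up for
illustration, TOY_TEMP_INVALID merely stands in for THERMAL_TEMP_INVALID, and
it assumes the update path jumps to the monitor step when the temperature is
invalid, as discussed above.

```c
#include <stdbool.h>
#include <limits.h>

/* Stand-in for THERMAL_TEMP_INVALID; the actual value does not matter here. */
#define TOY_TEMP_INVALID INT_MIN

struct toy_zone {
	int temperature;	/* last valid reading, or TOY_TEMP_INVALID */
	int polling_delay_ms;	/* 0: interrupt driven, no polling timer */
	bool sensor_ready;	/* false makes toy_get_temp() fail */
	bool irq_armed;		/* set once the trips have been programmed */
	bool timer_armed;	/* set once the polling timer has been queued */
};

/* Model of a .get_temp() callback that fails until the sensor is ready. */
int toy_get_temp(const struct toy_zone *tz, int *temp)
{
	if (!tz->sensor_ready)
		return -1;
	*temp = 45000;	/* arbitrary value, in millidegrees Celsius */
	return 0;
}

/* Model of the monitor step: it only arms a timer if polling is configured. */
void toy_monitor(struct toy_zone *tz)
{
	if (tz->polling_delay_ms > 0)
		tz->timer_armed = true;
}

/* Model of the update path with a jump to the monitor step on failure. */
void toy_update(struct toy_zone *tz)
{
	int temp;

	if (toy_get_temp(tz, &temp) == 0)
		tz->temperature = temp;

	if (tz->temperature == TOY_TEMP_INVALID)
		goto monitor;	/* trips are not programmed, interrupt stays off */

	tz->irq_armed = true;	/* stands for the __set_trips() step */
monitor:
	toy_monitor(tz);
}
```

In this model, a zone with polling_delay_ms > 0 gets its timer armed even when
the first read fails, so a later poll can recover. A zone with
polling_delay_ms == 0 ends up with neither the timer nor the interrupt armed,
and nothing calls toy_update() again.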

>>> The reason why the presence of cooling devices can "fix" this is
>>> because thermal_bind_cdev_to_trip() sets tz->need_update to 1 which
>>> then causes the thermal_zone_device_update() in
>>> __thermal_cooling_device_register() to trigger and that will update
>>> the temperature.
>>
>> IIUC, the first time get_temp() fails and then when the tz is bound, the
>> update triggers a new call with get_temp() which returns a valid
>> temperature ?

Ok, I can see another glitch here. The thermal_of code calls
thermal_of_zone_register(), which in turn calls
thermal_zone_device_enable(). However, the driver may not be fully set up
yet (e.g. rockchip_thermal.c), so get_temp returns an error and the
temperature stays invalid.

With the aforementioned setup, that leads to a broken thermal platform
because no trip notification will ever happen.

All this results in fragile code :/

It only works today because a cooling device gets bound to the thermal
zone, which happens after the sensor is fully set up.
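
The ordering dependency can be sketched with another toy userspace model.
Again, this is an illustration under assumptions and not the kernel
implementation: the toy_* names are hypothetical, and the need_update handling
is reduced to the bare minimum described earlier in this thread.

```c
#include <stdbool.h>

struct toy_zone {
	bool sensor_ready;	/* driver finished its own setup */
	bool temp_valid;	/* a temperature was read successfully */
	bool need_update;	/* set when a cooling device gets bound */
	bool trips_armed;	/* trips programmed, notifications possible */
};

/* Update: a failed read leaves the temperature invalid and the trips unarmed. */
void toy_zone_update(struct toy_zone *tz)
{
	if (!tz->sensor_ready)
		return;
	tz->temp_valid = true;
	tz->trips_armed = true;
}

/* Registration enables the zone right away, which triggers a first update. */
void toy_zone_register(struct toy_zone *tz)
{
	*tz = (struct toy_zone){ 0 };
	toy_zone_update(tz);
}

/* The driver completes its sensor setup after the zone was registered. */
void toy_sensor_setup_done(struct toy_zone *tz)
{
	tz->sensor_ready = true;
}

/* Binding a cooling device flags the zone, which triggers a second update. */
void toy_cdev_register(struct toy_zone *tz)
{
	tz->need_update = true;
	if (tz->need_update) {
		tz->need_update = false;
		toy_zone_update(tz);
	}
}
```

Calling toy_zone_register() and then toy_sensor_setup_done() leaves
trips_armed false. Only a later toy_cdev_register() call arms the trips in
this model, so a zone without any cooling device never recovers.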

IMO, we should be much less resilient to .get_temp failing ...

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs


