From: Daniel Lezcano <daniel.lezcano@linaro.org>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Linux PM mailing list <linux-pm@vger.kernel.org>,
Lukasz Luba <Lukasz.Luba@arm.com>
Subject: Re: Trip points crossed not detected when no cooling device bound
Date: Fri, 28 Jun 2024 10:04:38 +0200
Message-ID: <ff495355-3a9f-422b-b9c8-707f7e35ba43@linaro.org>
In-Reply-To: <CAJZ5v0g0=k4HZhKhs=2iwO8zc=jkng898wF-nn_bUT-xA_iu6w@mail.gmail.com>
On 27/06/2024 20:23, Rafael J. Wysocki wrote:
> On Thu, Jun 27, 2024 at 6:30 PM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>> On 27/06/2024 11:54, Rafael J. Wysocki wrote:
>>> On Thu, Jun 27, 2024 at 12:24 AM Daniel Lezcano
>>> <daniel.lezcano@linaro.org> wrote:
>>>>
>>>> On 26/06/2024 23:21, Daniel Lezcano wrote:
>>>>
>>>> [ ... ]
>>>>
>>>>>> Oh, I see where the problem can be. If the zone is polling only, it
>>>>>> will not rearm the timer when the current zone temperature is invalid
>>>>>> after the above commit, so does the attached patch help?
>>>>>
>>>>> At this point, I went far when bisecting another problem and I ended up
>>>>> screwing my config file. So I had to generate a new one from the default
>>>>> config. Since then the issue is no longer happening which sounds very
>>>>> strange to me.
>>>>>
>>>>> I'm still investigating but if you have a suggestion coming in mind, it
>>>>> would be welcome because I'm failing to find out what is going on ... :/
>>>>
>>>> I finally reproduced the issue. That happens when there is *no* cooling
>>>> device bound on *any* thermal zones.
>>>
>>> Interesting.
>>>
>>>> Your patch seems to fix the problem but I'm not sure to understand the
>>>> conditions of the bug
>>>
>>> It's probably the same as for commit 202aa0d4bb53:
>>> thermal_zone_device_init() sets tz->temperature to
>>> THERMAL_TEMP_INVALID and if the first invocation of
>>> __thermal_zone_get_temp() returns an error (because the .get_temp()
>>> callback returns an error), monitor_thermal_zone() is not called. If
>>> polling is the only way in which the zone temperature can be updated,
>>> things go south because the timer is not set and there is no other way
>>> to set it. No updates will be coming.
>>
>> If there is no polling delay (aka interrupt driven), the routine will
>> skip the _set_trips function and the monitor_thermal_zone() will do
>> nothing in this case, right ?
>
> _set_trips() looks at tz->temperature, however, and it doesn't make
> sense to call it if the latter is invalid.
>
> Same for handle_thermal_trip() and governor callbacks.
>
>> Even setting a label jump to "monitor:" the routine is broken AFAICT
>
> I beg to differ.
>
> Yes, monitor_thermal_zone() does nothing if there is no polling, but
> it needs to be called anyway because it checks whether or not polling
> is there in the first place.
>
> And if there is no polling, it is assumed that
> __thermal_zone_device_update() will be called by other means.

AFAICT, the interrupt can fire and it will result in a
thermal_zone_device_update(), but the interrupt must be set up by
__set_trips(), which is skipped because of the invalid temperature.

I've confirmed that on my EVB with the following setup:

- trip points
- no polling delay (interrupt based)
- no cooling device

That does not work: there is no trip crossed notification.

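To make the sequence above easier to follow, here is a rough userspace
model of it. This is only a sketch, not the kernel code: struct toy_zone,
TEMP_INVALID and all the toy_*() helpers are made-up names, and the flow
assumes the variant where an invalid temperature jumps straight to the
monitor step.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for THERMAL_TEMP_INVALID; the value is illustrative. */
#define TEMP_INVALID INT_MIN

/* Toy model of a thermal zone, not the kernel's struct thermal_zone_device. */
struct toy_zone {
	int temperature;       /* last known temperature, or TEMP_INVALID */
	int polling_delay_ms;  /* 0 means the zone is interrupt driven */
	bool sensor_ready;     /* false while the sensor driver is not set up */
	bool irq_armed;        /* true once the trip thresholds are programmed */
	bool timer_armed;      /* true once the polling timer is scheduled */
};

static void toy_zone_init(struct toy_zone *tz, int polling_delay_ms)
{
	assert(tz);
	tz->temperature = TEMP_INVALID;
	tz->polling_delay_ms = polling_delay_ms;
	tz->sensor_ready = false;
	tz->irq_armed = false;
	tz->timer_armed = false;
}

/* Models .get_temp(): fails until the sensor is ready. */
static int toy_get_temp(const struct toy_zone *tz, int *temp)
{
	if (!tz->sensor_ready)
		return -1;
	*temp = 45000; /* arbitrary valid reading in millidegrees Celsius */
	return 0;
}

/* Models the trip programming step, which is what arms the interrupt. */
static void toy_set_trips(struct toy_zone *tz)
{
	tz->irq_armed = true;
}

/* Models the monitor step: it only rearms a timer when polling is used. */
static void toy_monitor(struct toy_zone *tz)
{
	if (tz->polling_delay_ms > 0)
		tz->timer_armed = true;
}

/* Models the update path discussed in this thread. */
static void toy_update(struct toy_zone *tz)
{
	int temp;

	if (toy_get_temp(tz, &temp) == 0)
		tz->temperature = temp;

	if (tz->temperature == TEMP_INVALID)
		goto monitor; /* trips are not programmed, so no irq is armed */

	toy_set_trips(tz);
monitor:
	toy_monitor(tz);
}
```

In this model, a zone initialised with toy_zone_init(&tz, 0) whose first
toy_update() hits a failing read ends up with neither irq_armed nor
timer_armed set, so nothing is left to trigger a second update. With a
non-zero polling delay the timer is armed and the zone recovers on the
next poll.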
>>> The reason why the presence of cooling devices can "fix" this is
>>> because thermal_bind_cdev_to_trip() sets tz->need_update to 1 which
>>> then causes the thermal_zone_device_update() in
>>> __thermal_cooling_device_register() to trigger and that will update
>>> the temperature.
>>
>> IIUC, the first time get_temp() fails and then when the tz is bound, the
>> update triggers a new call with get_temp() which returns a valid
>> temperature ?

Ok, I can see another glitch here. The thermal_of code calls
thermal_of_zone_register(), which in turn calls
thermal_zone_device_enable(). However, the driver may not be fully set up
yet (e.g. rockchip_thermal.c), so get_temp returns an error and the
temperature stays invalid.

With the aforementioned setup, that leads to a broken thermal platform
because no trip notification will happen.

All this results in fragile code :/

It only works because a cooling device gets bound to the thermal zone,
which happens after the sensor is fully set up.

IMO, we should be much less resilient to .get_temp failing ...
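
The ordering problem can be sketched with a small userspace model. Again
this is an illustration under assumptions, not the kernel implementation:
struct model_zone and the model_*() functions are hypothetical names, and
the cooling device binding is reduced to "flag the zone, then run one more
update".

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for THERMAL_TEMP_INVALID. */
#define MODEL_TEMP_INVALID INT_MIN

/* Toy zone state, reduced to what matters for the ordering issue. */
struct model_zone {
	int temperature;        /* MODEL_TEMP_INVALID until a read succeeds */
	bool sensor_ready;      /* set once the sensor driver is fully set up */
	bool need_update;       /* set when a cooling device gets bound */
	bool trips_programmed;  /* set once the trip interrupt is armed */
};

/* Models one zone update: trips are programmed only with a valid reading. */
static void model_update(struct model_zone *tz)
{
	assert(tz);
	if (tz->sensor_ready)
		tz->temperature = 45000; /* arbitrary valid reading */
	if (tz->temperature == MODEL_TEMP_INVALID)
		return;
	tz->trips_programmed = true;
}

/* Models the zone being registered and enabled before the sensor works. */
static void model_register_and_enable(struct model_zone *tz)
{
	tz->temperature = MODEL_TEMP_INVALID;
	tz->sensor_ready = false;
	tz->need_update = false;
	tz->trips_programmed = false;
	model_update(tz); /* first read fails, temperature stays invalid */
}

/* Models the sensor driver finishing its setup later on. */
static void model_finish_setup(struct model_zone *tz)
{
	tz->sensor_ready = true;
}

/* Models a cooling device binding: flag the zone, then update it again. */
static void model_bind_cooling_device(struct model_zone *tz)
{
	tz->need_update = true;
	if (tz->need_update) {
		tz->need_update = false;
		model_update(tz);
	}
}

/* Runs the whole sequence and reports whether trips ended up programmed. */
static bool model_boot(bool has_cooling_device)
{
	struct model_zone tz;

	model_register_and_enable(&tz);
	model_finish_setup(&tz);
	if (has_cooling_device)
		model_bind_cooling_device(&tz);
	return tz.trips_programmed;
}
```

In this model, model_boot(true) ends with the trips programmed because
the binding forces a second update after the sensor is ready, while
model_boot(false) leaves the zone with an invalid temperature and no
armed interrupt.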
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs