From: Mauricio Faria de Oliveira <mfo@igalia.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@kernel.org>,
Zhang Rui <rui.zhang@intel.com>,
Lukasz Luba <lukasz.luba@arm.com>,
linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
kernel-dev@igalia.com,
syzbot+3b3852c6031d0f30dfaf@syzkaller.appspotmail.com
Subject: Re: [PATCH] thermal: core: fix use-after-free due to init/cancel delayed_work race
Date: Wed, 25 Mar 2026 16:22:10 -0300 [thread overview]
Message-ID: <cce6cb6e12b09bc97cf684e378c874f4@igalia.com> (raw)
In-Reply-To: <CAJZ5v0jcK0eg1fjSmMDDhTqNqFxUmhdNs=exs13nduRivCiLQQ@mail.gmail.com>
On 2026-03-25 13:24, Rafael J. Wysocki wrote:
> On Wed, Mar 25, 2026 at 4:13 PM Mauricio Faria de Oliveira
> <mfo@igalia.com> wrote:
>>
>> On 2026-03-25 11:28, Mauricio Faria de Oliveira wrote:
>> > On 2026-03-25 11:17, Mauricio Faria de Oliveira wrote:
>> >> Thanks for looking into this.
>> >>
>> >> On 2026-03-25 09:47, Rafael J. Wysocki wrote:
>> >>> I can see the one between thermal_zone_device_unregister() and
>> >>> thermal_zone_device_resume(), but that can be addressed by adding a
>> >>> TZ_STATE_FLAG_EXIT check to the latter AFAICS.
>> >>
>> >
>> > Please disregard this paragraph; I incorrectly read/wrote _resume()
>> > as thermal_zone_pm_complete() discussed above. The rest should be
>> > right. I'll review this and get back shortly.
>> >
>> >> In the example describe above and detailed below, apparently that
>> >> is not sufficient, if I'm not missing anything. See, if _resume()
>> >> is reached with thermal_list_lock held, thermal_zone_device_exit()
>> >> is waiting for thermal_list_lock before setting TZ_STATE_FLAG_EXIT,
>> >> thus a check for it in _resume() would find it clear yet.
>>
>> Ok, similarly:
>>
>> Say, thermal_pm_notify() -> thermal_pm_notify_complete() ->
>> thermal_zone_pm_complete()
>> run before thermal_zone_device_unregister() is called;
>> thermal_zone_device_resume()
>> starts, and by now thermal_zone_device_unregister() is called.
>>
>> If thermal_zone_device_resume() wins the race over thermal_zone_exit()
>> for guard(thermal_zone(tz) (tz->lock), it sees TZ_STATE_FLAG_EXIT clear;
>> note its callees (eg, thermal_zone_device_init()) run with tz->lock
>> held,
>> so they see it clear as well.
>>
>> So, thermal_zone_device_init() calls INIT_DELAYED_WORK(), everything
>> returns, tz->lock is released and the thermal_zone_device_unregister()
>> -> thermal_zone_exit() path can continue to run.
>>
>> Only now thermal_zone_exit() sets TZ_STATE_FLAG_EXIT (too late),
>> returns.
>> cancel_delayed_work_sync() does not wait for
>> thermal_zone_device_resume()
>> due to INIT_DELAYED_WORK() in thermal_zone_device_init(); and kfree(tz).
>>
>> Then, thermal_zone_device_resume() accesses tz and hits use-after-free.
>>
>> Hope this clarifies. Please let me know your thoughts. Thanks!
>
> Thanks for the analysis, it sounds accurate.
>
> I'd say that thermal_zone_device_unregister() needs to flush the
> workqueue before calling cancel_delayed_work_sync() to get rid of the
> stuff that may be running out of it that hasn't seen the changes made
> by thermal_zone_exit().
IIUIC, cancel_delayed_work_sync() has that effect: it waits for
(specific)
work that might be running and hasn't seen changes by
thermal_zone_exit()).
> This should take care of all of the existing races because if anything
> is running out of the workqueue when thermal_zone_device_unregister()
> runs, it will be waited for after calling thermal_zone_exit() and any
> leftover stuff will be caught by cancel_delayed_work_sync().
Likewise, the wait-for part is an effect of cancel_delayed_work_sync(),
and AFAIK, there is no leftover after cancel_delayed_work_sync(), as
it waits for the running work function to finish.
And no further work is queued in the 2 code paths that can queue work:
1) thermal_zone_device_check(): even if it misses the tz->state check,
mod_delayed_work() does not requeue the current work item if it is
canceled/waited for by cancel_delayed_work_sync() (tested locally).
2) thermal_zone_pm_complete(): this function will no longer be reached
because tz is no longer in thermal_tz_list.
> Of course, it's better to switch over to using a dedicated workqueue
> in the thermal core for that.
Considering the points above, AFAICT, it should be sufficient to call
cancel_delayed_work_sync() for the 2 code paths in unregister()
(which thus require the distint work items for each code path).
Thanks,
--
Mauricio
next prev parent reply other threads:[~2026-03-25 19:22 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-24 23:50 [PATCH] thermal: core: fix use-after-free due to init/cancel delayed_work race Mauricio Faria de Oliveira
2026-03-25 12:10 ` Rafael J. Wysocki
2026-03-25 12:47 ` Rafael J. Wysocki
2026-03-25 14:17 ` Mauricio Faria de Oliveira
2026-03-25 14:28 ` Mauricio Faria de Oliveira
2026-03-25 15:13 ` Mauricio Faria de Oliveira
2026-03-25 16:24 ` Rafael J. Wysocki
2026-03-25 19:22 ` Mauricio Faria de Oliveira [this message]
2026-03-25 19:29 ` Rafael J. Wysocki
2026-03-26 17:41 ` Mauricio Faria de Oliveira
2026-03-25 20:20 ` Rafael J. Wysocki
2026-03-26 17:45 ` Mauricio Faria de Oliveira
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cce6cb6e12b09bc97cf684e378c874f4@igalia.com \
--to=mfo@igalia.com \
--cc=daniel.lezcano@kernel.org \
--cc=kernel-dev@igalia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=lukasz.luba@arm.com \
--cc=rafael@kernel.org \
--cc=rui.zhang@intel.com \
--cc=syzbot+3b3852c6031d0f30dfaf@syzkaller.appspotmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox