public inbox for linux-pm@vger.kernel.org
From: Mauricio Faria de Oliveira <mfo@igalia.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@kernel.org>,
	Zhang Rui <rui.zhang@intel.com>,
	Lukasz Luba <lukasz.luba@arm.com>,
	linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-dev@igalia.com,
	syzbot+3b3852c6031d0f30dfaf@syzkaller.appspotmail.com
Subject: Re: [PATCH] thermal: core: fix use-after-free due to init/cancel delayed_work race
Date: Wed, 25 Mar 2026 16:22:10 -0300
Message-ID: <cce6cb6e12b09bc97cf684e378c874f4@igalia.com>
In-Reply-To: <CAJZ5v0jcK0eg1fjSmMDDhTqNqFxUmhdNs=exs13nduRivCiLQQ@mail.gmail.com>

On 2026-03-25 13:24, Rafael J. Wysocki wrote:
> On Wed, Mar 25, 2026 at 4:13 PM Mauricio Faria de Oliveira
> <mfo@igalia.com> wrote:
>>
>> On 2026-03-25 11:28, Mauricio Faria de Oliveira wrote:
>> > On 2026-03-25 11:17, Mauricio Faria de Oliveira wrote:
>> >> Thanks for looking into this.
>> >>
>> >> On 2026-03-25 09:47, Rafael J. Wysocki wrote:
>> >>> I can see the one between thermal_zone_device_unregister() and
>> >>> thermal_zone_device_resume(), but that can be addressed by adding a
>> >>> TZ_STATE_FLAG_EXIT check to the latter AFAICS.
>> >>
>> >
>> > Please disregard this paragraph; I incorrectly read/wrote _resume()
>> > as thermal_zone_pm_complete() discussed above. The rest should be
>> > right. I'll review this and get back shortly.
>> >
>> >> In the example described above and detailed below, apparently that
>> >> is not sufficient, if I'm not missing anything. See, if _resume()
>> >> is reached with thermal_list_lock held, thermal_zone_exit()
>> >> is waiting for thermal_list_lock before setting TZ_STATE_FLAG_EXIT,
>> >> thus a check for it in _resume() would still find it clear.
>>
>> Ok, similarly:
>>
>> Say, thermal_pm_notify() -> thermal_pm_notify_complete() ->
>> thermal_zone_pm_complete() run before thermal_zone_device_unregister()
>> is called; thermal_zone_device_resume() starts, and by now
>> thermal_zone_device_unregister() is called.
>>
>> If thermal_zone_device_resume() wins the race over thermal_zone_exit()
>> for guard(thermal_zone)(tz) (tz->lock), it sees TZ_STATE_FLAG_EXIT clear;
>> note its callees (eg, thermal_zone_device_init()) run with tz->lock
>> held, so they see it clear as well.
>>
>> So, thermal_zone_device_init() calls INIT_DELAYED_WORK(), everything
>> returns, tz->lock is released and the thermal_zone_device_unregister()
>> -> thermal_zone_exit() path can continue to run.
>>
>> Only now thermal_zone_exit() sets TZ_STATE_FLAG_EXIT (too late) and
>> returns. cancel_delayed_work_sync() does not wait for
>> thermal_zone_device_resume() due to INIT_DELAYED_WORK() in
>> thermal_zone_device_init(); then tz is freed with kfree(tz).
>>
>> Then, thermal_zone_device_resume() accesses tz and hits use-after-free.
>>
>> Hope this clarifies. Please let me know your thoughts. Thanks!
> 
> Thanks for the analysis, it sounds accurate.
> 
> I'd say that thermal_zone_device_unregister() needs to flush the
> workqueue before calling cancel_delayed_work_sync() to get rid of the
> stuff that may be running out of it that hasn't seen the changes made
> by thermal_zone_exit().

IIUC, cancel_delayed_work_sync() has that effect: it waits for the
(specific) work that might be running and hasn't seen the changes made
by thermal_zone_exit().

> This should take care of all of the existing races because if anything
> is running out of the workqueue when thermal_zone_device_unregister()
> runs, it will be waited for after calling thermal_zone_exit() and any
> leftover stuff will be caught by cancel_delayed_work_sync().

Likewise, the wait-for part is an effect of cancel_delayed_work_sync(),
and AFAIK, there is no leftover after cancel_delayed_work_sync(), as
it waits for the running work function to finish.

And no further work is queued in the 2 code paths that can queue work:

1) thermal_zone_device_check(): even if it misses the tz->state check,
mod_delayed_work() does not requeue the current work item if it is
canceled/waited for by cancel_delayed_work_sync() (tested locally).

2) thermal_zone_pm_complete(): this function will no longer be reached
because tz is no longer in thermal_tz_list.
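
To illustrate the two points above, here is a small userspace toy model
(not kernel code). Every toy_* name is made up for this sketch. The
`canceling` flag is an assumption that stands in for the behavior I
observed in local testing, i.e. that a queue attempt does not take
effect while the item is being canceled/waited for; the array stands in
for thermal_tz_list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ZONES 4

struct toy_zone {
	bool canceling;	/* assumption: a sync cancel is in progress */
	int queued;	/* number of queue attempts that took effect */
};

/* Stand-in for thermal_tz_list: a fixed array of registered zones. */
static struct toy_zone *toy_list[TOY_MAX_ZONES];

static void toy_list_add(struct toy_zone *z)
{
	for (size_t i = 0; i < TOY_MAX_ZONES; i++) {
		if (!toy_list[i]) {
			toy_list[i] = z;
			return;
		}
	}
	assert(!"toy_list is full");
}

static void toy_list_del(struct toy_zone *z)
{
	for (size_t i = 0; i < TOY_MAX_ZONES; i++) {
		if (toy_list[i] == z)
			toy_list[i] = NULL;
	}
}

/* Path 1: the polling work function trying to re-arm itself. */
static bool toy_requeue_from_check(struct toy_zone *z)
{
	if (z->canceling)
		return false;	/* modeled: no requeue while being canceled */
	z->queued++;
	return true;
}

/* Path 2: the PM-complete walk only reaches zones still on the list. */
static int toy_queue_from_pm_complete(void)
{
	int reached = 0;

	for (size_t i = 0; i < TOY_MAX_ZONES; i++) {
		if (toy_list[i]) {
			toy_list[i]->queued++;
			reached++;
		}
	}
	return reached;
}

/* Returns how many queue attempts took effect after "unregister". */
static int toy_queued_after_unregister(void)
{
	struct toy_zone z = { .canceling = false, .queued = 0 };
	int before;

	toy_list_add(&z);
	toy_queue_from_pm_complete();	/* registered: path 2 reaches it */
	toy_requeue_from_check(&z);	/* registered: path 1 re-arms */
	before = z.queued;

	toy_list_del(&z);		/* unregister removes it from the list */
	z.canceling = true;		/* and starts the sync cancel */
	toy_queue_from_pm_complete();
	toy_requeue_from_check(&z);

	return z.queued - before;
}
```

In this model, toy_queued_after_unregister() reports no new work queued
once the zone is off the list and being canceled, which is the property
both points rely on.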

> Of course, it's better to switch over to using a dedicated workqueue
> in the thermal core for that.

Considering the points above, AFAICT, it should be sufficient to call
cancel_delayed_work_sync() for the 2 code paths in unregister()
(which thus requires distinct work items for each code path).
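
As a rough illustration of why distinct work items matter, below is a
deterministic userspace toy model of the interleaving discussed in this
thread (not kernel code). All toy_* names are hypothetical, and the
single `running` flag is a simplification standing in for whatever
bookkeeping the workqueue keeps in the work item and that a
re-initialization wipes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a delayed work item; not struct delayed_work. */
struct toy_work {
	bool running;	/* what a sync cancel relies on to know it must wait */
};

struct toy_tz {
	struct toy_work poll_work;	/* polling item, like tz->poll_queue */
	struct toy_work resume_work;	/* hypothetical separate resume item */
	bool exiting;			/* stands in for TZ_STATE_FLAG_EXIT */
	bool freed;			/* stands in for kfree(tz) */
};

/* Stand-in for INIT_DELAYED_WORK(): wipes the item's bookkeeping. */
static void toy_init_work(struct toy_work *w)
{
	w->running = false;
}

/*
 * Stand-in for cancel_delayed_work_sync(): returns true if it saw a
 * running instance and therefore waited for it to finish.
 */
static bool toy_cancel_sync(struct toy_work *w)
{
	bool waited = w->running;

	w->running = false;
	return waited;
}

/*
 * Replays the interleaving. Returns true if the resume work function
 * would still be running, and touching tz, after tz was freed.
 */
static bool toy_resume_hits_uaf(bool distinct_items)
{
	struct toy_tz tz = { 0 };
	struct toy_work *resume = distinct_items ? &tz.resume_work
						 : &tz.poll_work;
	bool resume_done;

	/* The resume work function starts and wins tz->lock. */
	resume->running = true;
	assert(!tz.exiting);	/* so it sees the exit flag clear */

	/* Its callee re-initializes the polling work item. */
	toy_init_work(&tz.poll_work);

	/* Unregister: the flag is set only now, then cancel and free. */
	tz.exiting = true;
	resume_done = toy_cancel_sync(&tz.poll_work);
	if (distinct_items)
		resume_done = toy_cancel_sync(&tz.resume_work) || resume_done;
	tz.freed = true;

	/* If nothing waited for resume, it continues on a freed tz. */
	return tz.freed && !resume_done;
}
```

With a shared item (distinct_items == false) the re-initialization
hides the running resume function from the cancel, so the model reports
the use-after-free; with distinct items the second cancel waits for it.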

Thanks,

-- 
Mauricio

