public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* FAILED: patch "[PATCH] thermal: core: Address thermal zone removal races with resume" failed to apply to 6.12-stable tree
@ 2026-04-08  7:01 gregkh
  2026-04-13 23:14 ` [PATCH 6.12.y 0/2] backport: thermal: core: Address thermal zone removal races with resume Mauricio Faria de Oliveira
  0 siblings, 1 reply; 4+ messages in thread
From: gregkh @ 2026-04-08  7:01 UTC (permalink / raw)
  To: rafael.j.wysocki, lukasz.luba, mfo, stable; +Cc: stable


The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 45b859b0728267a6199ee5002d62e6c6f3e8c89d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2026040820-overpass-barrette-bf09@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..

Possible dependencies:



thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 45b859b0728267a6199ee5002d62e6c6f3e8c89d Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Date: Fri, 27 Mar 2026 10:49:52 +0100
Subject: [PATCH] thermal: core: Address thermal zone removal races with resume

Since thermal_zone_pm_complete() and thermal_zone_device_resume()
re-initialize the poll_queue delayed work for the given thermal zone,
the cancel_delayed_work_sync() in thermal_zone_device_unregister()
may miss some already running work items and the thermal zone may
be freed prematurely [1].

There are two failing scenarios that both start with
running thermal_pm_notify_complete() right before invoking
thermal_zone_device_unregister() for one of the thermal zones.

In the first scenario, there is a work item already running for
the given thermal zone when thermal_pm_notify_complete() calls
thermal_zone_pm_complete() for that thermal zone and it continues to
run when thermal_zone_device_unregister() starts.  Since the poll_queue
delayed work has been re-initialized by thermal_pm_notify_complete(), the
running work item will be missed by the cancel_delayed_work_sync() in
thermal_zone_device_unregister() and if it continues to run past the
freeing of the thermal zone object, a use-after-free will occur.

In the second scenario, thermal_zone_device_resume() queued up by
thermal_pm_notify_complete() runs right after the thermal_zone_exit()
called by thermal_zone_device_unregister() has returned.  The poll_queue
delayed work is re-initialized by it before cancel_delayed_work_sync() is
called by thermal_zone_device_unregister(), so it may continue to run
after the freeing of the thermal zone object, which also leads to a
use-after-free.

Address the first failing scenario by ensuring that no thermal work
items will be running when thermal_pm_notify_complete() is called.
For this purpose, first move the cancel_delayed_work() call from
thermal_zone_pm_complete() to thermal_zone_pm_prepare() to prevent
new work from entering the workqueue going forward.  Next, switch
over to using a dedicated workqueue for thermal events and update
the code in thermal_pm_notify() to flush that workqueue after
thermal_pm_notify_prepare() has returned which will take care of
all leftover thermal work already on the workqueue (that leftover
work would do nothing useful anyway because all of the thermal zones
have been flagged as suspended).

The second failing scenario is addressed by adding a tz->state check
to thermal_zone_device_resume() to prevent it from re-initializing
the poll_queue delayed work if the thermal zone is going away.

Note that the above changes will also facilitate relocating the suspend
and resume of thermal zones closer to the suspend and resume of devices,
respectively.

Fixes: 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously")
Reported-by: syzbot+3b3852c6031d0f30dfaf@syzkaller.appspotmail.com
Closes: https://syzbot.org/bug?extid=3b3852c6031d0f30dfaf
Reported-by: Mauricio Faria de Oliveira <mfo@igalia.com>
Closes: https://lore.kernel.org/linux-pm/20260324-thermal-core-uaf-init_delayed_work-v1-1-6611ae76a8a1@igalia.com/ [1]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mauricio Faria de Oliveira <mfo@igalia.com>
Tested-by: Mauricio Faria de Oliveira <mfo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Cc: All applicable <stable@vger.kernel.org>
Link: https://patch.msgid.link/6267615.lOV4Wx5bFT@rafael.j.wysocki

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index b7d706ed7ed9..5337612e484d 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -41,6 +41,8 @@ static struct thermal_governor *def_governor;
 
 static bool thermal_pm_suspended;
 
+static struct workqueue_struct *thermal_wq __ro_after_init;
+
 /*
  * Governor section: set of functions to handle thermal governors
  *
@@ -313,7 +315,7 @@ static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
 	if (delay > HZ)
 		delay = round_jiffies_relative(delay);
 
-	mod_delayed_work(system_freezable_power_efficient_wq, &tz->poll_queue, delay);
+	mod_delayed_work(thermal_wq, &tz->poll_queue, delay);
 }
 
 static void thermal_zone_recheck(struct thermal_zone_device *tz, int error)
@@ -1785,6 +1787,10 @@ static void thermal_zone_device_resume(struct work_struct *work)
 
 	guard(thermal_zone)(tz);
 
+	/* If the thermal zone is going away, there's nothing to do. */
+	if (tz->state & TZ_STATE_FLAG_EXIT)
+		return;
+
 	tz->state &= ~(TZ_STATE_FLAG_SUSPENDED | TZ_STATE_FLAG_RESUMING);
 
 	thermal_debug_tz_resume(tz);
@@ -1811,6 +1817,9 @@ static void thermal_zone_pm_prepare(struct thermal_zone_device *tz)
 	}
 
 	tz->state |= TZ_STATE_FLAG_SUSPENDED;
+
+	/* Prevent new work from getting to the workqueue subsequently. */
+	cancel_delayed_work(&tz->poll_queue);
 }
 
 static void thermal_pm_notify_prepare(void)
@@ -1829,8 +1838,6 @@ static void thermal_zone_pm_complete(struct thermal_zone_device *tz)
 {
 	guard(thermal_zone)(tz);
 
-	cancel_delayed_work(&tz->poll_queue);
-
 	reinit_completion(&tz->resume);
 	tz->state |= TZ_STATE_FLAG_RESUMING;
 
@@ -1840,7 +1847,7 @@ static void thermal_zone_pm_complete(struct thermal_zone_device *tz)
 	 */
 	INIT_DELAYED_WORK(&tz->poll_queue, thermal_zone_device_resume);
 	/* Queue up the work without a delay. */
-	mod_delayed_work(system_freezable_power_efficient_wq, &tz->poll_queue, 0);
+	mod_delayed_work(thermal_wq, &tz->poll_queue, 0);
 }
 
 static void thermal_pm_notify_complete(void)
@@ -1863,6 +1870,11 @@ static int thermal_pm_notify(struct notifier_block *nb,
 	case PM_RESTORE_PREPARE:
 	case PM_SUSPEND_PREPARE:
 		thermal_pm_notify_prepare();
+		/*
+		 * Allow any leftover thermal work items already on the
+		 * worqueue to complete so they don't get in the way later.
+		 */
+		flush_workqueue(thermal_wq);
 		break;
 	case PM_POST_HIBERNATION:
 	case PM_POST_RESTORE:
@@ -1895,9 +1907,16 @@ static int __init thermal_init(void)
 	if (result)
 		goto error;
 
+	thermal_wq = alloc_workqueue("thermal_events",
+				      WQ_FREEZABLE | WQ_POWER_EFFICIENT | WQ_PERCPU, 0);
+	if (!thermal_wq) {
+		result = -ENOMEM;
+		goto unregister_netlink;
+	}
+
 	result = thermal_register_governors();
 	if (result)
-		goto unregister_netlink;
+		goto destroy_workqueue;
 
 	thermal_class = kzalloc_obj(*thermal_class);
 	if (!thermal_class) {
@@ -1924,6 +1943,8 @@ static int __init thermal_init(void)
 
 unregister_governors:
 	thermal_unregister_governors();
+destroy_workqueue:
+	destroy_workqueue(thermal_wq);
 unregister_netlink:
 	thermal_netlink_exit();
 error:


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 6.12.y 0/2] backport: thermal: core: Address thermal zone removal races with resume
  2026-04-08  7:01 FAILED: patch "[PATCH] thermal: core: Address thermal zone removal races with resume" failed to apply to 6.12-stable tree gregkh
@ 2026-04-13 23:14 ` Mauricio Faria de Oliveira
  2026-04-13 23:14   ` [PATCH 6.12.y 1/2] thermal: core: Mark thermal zones as exiting before unregistration Mauricio Faria de Oliveira
  2026-04-13 23:14   ` [PATCH 6.12.y 2/2] thermal: core: Address thermal zone removal races with resume Mauricio Faria de Oliveira
  0 siblings, 2 replies; 4+ messages in thread
From: Mauricio Faria de Oliveira @ 2026-04-13 23:14 UTC (permalink / raw)
  To: stable, Rafael J. Wysocki; +Cc: Daniel Lezcano, Zhang Rui, Lukasz Luba

This is a backport of upstream commit 45b859b07282 ("thermal: core: Address thermal zone removal races with resume")
which failed to apply to 6.12-stable tree, with one dependency commit
(not strictly necessary, but simple and makes the backport simpler).

Only minor changes were necessary in PATCH 2/2 (see backport comment).

Tested on v6.12.81 with the synthetic reproducers wrote for mainline.
Both scenarios fail (i.e., hit use-after-free) without these patches,
and pass (do not hit use-after-free) with these patches.

I could not perform further testing than the synthetic reproducers
(which exercise very specific code paths), so reviews and/or tests
are welcome.

Thanks,
Mauricio

Rafael J. Wysocki (2):
  thermal: core: Mark thermal zones as exiting before unregistration
  thermal: core: Address thermal zone removal races with resume

 drivers/thermal/thermal_core.c | 36 +++++++++++++++++++++++++++++-----
 drivers/thermal/thermal_core.h |  1 +
 2 files changed, 32 insertions(+), 5 deletions(-)

-- 
2.51.0


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 6.12.y 1/2] thermal: core: Mark thermal zones as exiting before unregistration
  2026-04-13 23:14 ` [PATCH 6.12.y 0/2] backport: thermal: core: Address thermal zone removal races with resume Mauricio Faria de Oliveira
@ 2026-04-13 23:14   ` Mauricio Faria de Oliveira
  2026-04-13 23:14   ` [PATCH 6.12.y 2/2] thermal: core: Address thermal zone removal races with resume Mauricio Faria de Oliveira
  1 sibling, 0 replies; 4+ messages in thread
From: Mauricio Faria de Oliveira @ 2026-04-13 23:14 UTC (permalink / raw)
  To: stable, Rafael J. Wysocki; +Cc: Daniel Lezcano, Zhang Rui, Lukasz Luba

From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

[ Upstream commit 1dae3e70b473adc32f81ca1be926440f9b1de9dc ]

In analogy with a previous change in the thermal zone registration code
path, to ensure that __thermal_zone_device_update() will return early
for thermal zones that are going away, introduce a thermal zone state
flag representing the "exit" state and set it while deleting the thermal
zone from thermal_tz_list.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/4394176.ejJDZkT8p0@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
[ mfo: this commit is a dependency/helper for backporting next commit. ]
Signed-off-by: Mauricio Faria de Oliveira <mfo@igalia.com>
---
 drivers/thermal/thermal_core.c | 3 +++
 drivers/thermal/thermal_core.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index aa302ac62b2e..4663ca7a587c 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1614,7 +1614,10 @@ void thermal_zone_device_unregister(struct thermal_zone_device *tz)
 	}
 
 	mutex_lock(&tz->lock);
+
+	tz->state |= TZ_STATE_FLAG_EXIT;
 	list_del(&tz->node);
+
 	mutex_unlock(&tz->lock);
 
 	/* Unbind all cdevs associated with 'this' thermal zone */
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index 163871699a60..007990ce139d 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -65,6 +65,7 @@ struct thermal_governor {
 #define	TZ_STATE_FLAG_SUSPENDED	BIT(0)
 #define	TZ_STATE_FLAG_RESUMING	BIT(1)
 #define	TZ_STATE_FLAG_INIT	BIT(2)
+#define	TZ_STATE_FLAG_EXIT	BIT(3)
 
 #define TZ_STATE_READY		0
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 6.12.y 2/2] thermal: core: Address thermal zone removal races with resume
  2026-04-13 23:14 ` [PATCH 6.12.y 0/2] backport: thermal: core: Address thermal zone removal races with resume Mauricio Faria de Oliveira
  2026-04-13 23:14   ` [PATCH 6.12.y 1/2] thermal: core: Mark thermal zones as exiting before unregistration Mauricio Faria de Oliveira
@ 2026-04-13 23:14   ` Mauricio Faria de Oliveira
  1 sibling, 0 replies; 4+ messages in thread
From: Mauricio Faria de Oliveira @ 2026-04-13 23:14 UTC (permalink / raw)
  To: stable, Rafael J. Wysocki; +Cc: Daniel Lezcano, Zhang Rui, Lukasz Luba

From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

[ Upstream commit 45b859b0728267a6199ee5002d62e6c6f3e8c89d ]

Since thermal_zone_pm_complete() and thermal_zone_device_resume()
re-initialize the poll_queue delayed work for the given thermal zone,
the cancel_delayed_work_sync() in thermal_zone_device_unregister()
may miss some already running work items and the thermal zone may
be freed prematurely [1].

There are two failing scenarios that both start with
running thermal_pm_notify_complete() right before invoking
thermal_zone_device_unregister() for one of the thermal zones.

In the first scenario, there is a work item already running for
the given thermal zone when thermal_pm_notify_complete() calls
thermal_zone_pm_complete() for that thermal zone and it continues to
run when thermal_zone_device_unregister() starts.  Since the poll_queue
delayed work has been re-initialized by thermal_pm_notify_complete(), the
running work item will be missed by the cancel_delayed_work_sync() in
thermal_zone_device_unregister() and if it continues to run past the
freeing of the thermal zone object, a use-after-free will occur.

In the second scenario, thermal_zone_device_resume() queued up by
thermal_pm_notify_complete() runs right after the thermal_zone_exit()
called by thermal_zone_device_unregister() has returned.  The poll_queue
delayed work is re-initialized by it before cancel_delayed_work_sync() is
called by thermal_zone_device_unregister(), so it may continue to run
after the freeing of the thermal zone object, which also leads to a
use-after-free.

Address the first failing scenario by ensuring that no thermal work
items will be running when thermal_pm_notify_complete() is called.
For this purpose, first move the cancel_delayed_work() call from
thermal_zone_pm_complete() to thermal_zone_pm_prepare() to prevent
new work from entering the workqueue going forward.  Next, switch
over to using a dedicated workqueue for thermal events and update
the code in thermal_pm_notify() to flush that workqueue after
thermal_pm_notify_prepare() has returned which will take care of
all leftover thermal work already on the workqueue (that leftover
work would do nothing useful anyway because all of the thermal zones
have been flagged as suspended).

The second failing scenario is addressed by adding a tz->state check
to thermal_zone_device_resume() to prevent it from re-initializing
the poll_queue delayed work if the thermal zone is going away.

Note that the above changes will also facilitate relocating the suspend
and resume of thermal zones closer to the suspend and resume of devices,
respectively.

Fixes: 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously")
Reported-by: syzbot+3b3852c6031d0f30dfaf@syzkaller.appspotmail.com
Closes: https://syzbot.org/bug?extid=3b3852c6031d0f30dfaf
Reported-by: Mauricio Faria de Oliveira <mfo@igalia.com>
Closes: https://lore.kernel.org/linux-pm/20260324-thermal-core-uaf-init_delayed_work-v1-1-6611ae76a8a1@igalia.com/ [1]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mauricio Faria de Oliveira <mfo@igalia.com>
Tested-by: Mauricio Faria de Oliveira <mfo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Cc: All applicable <stable@vger.kernel.org>
Link: https://patch.msgid.link/6267615.lOV4Wx5bFT@rafael.j.wysocki
[ mfo: backport for 6.12.y:
  - No guard() or thermal_pm_notify_{prepare,complete}() for the lack of
    commit d1c8aa2a5c5c ("thermal: core: Manage thermal_list_lock using a mutex guard")
    - thermal_zone_device_resume() calls mutex_unlock() to return;
    - thermal_pm_notify() has thermal_pm_notify_prepare() in *_PREPARE;
  - No WQ_PERCPU flag in alloc_workqueue(), introduced in v6.17. ]
Signed-off-by: Mauricio Faria de Oliveira <mfo@igalia.com>
---
 drivers/thermal/thermal_core.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 4663ca7a587c..8ce1134e15e5 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -42,6 +42,8 @@ static struct thermal_governor *def_governor;
 
 static bool thermal_pm_suspended;
 
+static struct workqueue_struct *thermal_wq __ro_after_init;
+
 /*
  * Governor section: set of functions to handle thermal governors
  *
@@ -328,7 +330,7 @@ static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
 	if (delay > HZ)
 		delay = round_jiffies_relative(delay);
 
-	mod_delayed_work(system_freezable_power_efficient_wq, &tz->poll_queue, delay);
+	mod_delayed_work(thermal_wq, &tz->poll_queue, delay);
 }
 
 static void thermal_zone_recheck(struct thermal_zone_device *tz, int error)
@@ -1691,6 +1693,12 @@ static void thermal_zone_device_resume(struct work_struct *work)
 
 	mutex_lock(&tz->lock);
 
+	/* If the thermal zone is going away, there's nothing to do. */
+	if (tz->state & TZ_STATE_FLAG_EXIT) {
+		mutex_unlock(&tz->lock);
+		return;
+	}
+
 	tz->state &= ~(TZ_STATE_FLAG_SUSPENDED | TZ_STATE_FLAG_RESUMING);
 
 	thermal_debug_tz_resume(tz);
@@ -1722,6 +1730,9 @@ static void thermal_zone_pm_prepare(struct thermal_zone_device *tz)
 
 	tz->state |= TZ_STATE_FLAG_SUSPENDED;
 
+	/* Prevent new work from getting to the workqueue subsequently. */
+	cancel_delayed_work(&tz->poll_queue);
+
 	mutex_unlock(&tz->lock);
 }
 
@@ -1729,8 +1740,6 @@ static void thermal_zone_pm_complete(struct thermal_zone_device *tz)
 {
 	mutex_lock(&tz->lock);
 
-	cancel_delayed_work(&tz->poll_queue);
-
 	reinit_completion(&tz->resume);
 	tz->state |= TZ_STATE_FLAG_RESUMING;
 
@@ -1740,7 +1749,7 @@ static void thermal_zone_pm_complete(struct thermal_zone_device *tz)
 	 */
 	INIT_DELAYED_WORK(&tz->poll_queue, thermal_zone_device_resume);
 	/* Queue up the work without a delay. */
-	mod_delayed_work(system_freezable_power_efficient_wq, &tz->poll_queue, 0);
+	mod_delayed_work(thermal_wq, &tz->poll_queue, 0);
 
 	mutex_unlock(&tz->lock);
 }
@@ -1762,6 +1771,11 @@ static int thermal_pm_notify(struct notifier_block *nb,
 			thermal_zone_pm_prepare(tz);
 
 		mutex_unlock(&thermal_list_lock);
+		/*
+		 * Allow any leftover thermal work items already on the
+		 * worqueue to complete so they don't get in the way later.
+		 */
+		flush_workqueue(thermal_wq);
 		break;
 	case PM_POST_HIBERNATION:
 	case PM_POST_RESTORE:
@@ -1801,9 +1815,16 @@ static int __init thermal_init(void)
 	if (result)
 		goto error;
 
+	thermal_wq = alloc_workqueue("thermal_events",
+				      WQ_FREEZABLE | WQ_POWER_EFFICIENT, 0);
+	if (!thermal_wq) {
+		result = -ENOMEM;
+		goto unregister_netlink;
+	}
+
 	result = thermal_register_governors();
 	if (result)
-		goto unregister_netlink;
+		goto destroy_workqueue;
 
 	thermal_class = kzalloc(sizeof(*thermal_class), GFP_KERNEL);
 	if (!thermal_class) {
@@ -1830,6 +1851,8 @@ static int __init thermal_init(void)
 
 unregister_governors:
 	thermal_unregister_governors();
+destroy_workqueue:
+	destroy_workqueue(thermal_wq);
 unregister_netlink:
 	thermal_netlink_exit();
 error:
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-04-13 23:15 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-08  7:01 FAILED: patch "[PATCH] thermal: core: Address thermal zone removal races with resume" failed to apply to 6.12-stable tree gregkh
2026-04-13 23:14 ` [PATCH 6.12.y 0/2] backport: thermal: core: Address thermal zone removal races with resume Mauricio Faria de Oliveira
2026-04-13 23:14   ` [PATCH 6.12.y 1/2] thermal: core: Mark thermal zones as exiting before unregistration Mauricio Faria de Oliveira
2026-04-13 23:14   ` [PATCH 6.12.y 2/2] thermal: core: Address thermal zone removal races with resume Mauricio Faria de Oliveira

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox