public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/3] thermal: core: Fix issues related to thermal zone resume
@ 2023-12-18 19:23 Rafael J. Wysocki
  2023-12-18 19:25 ` [PATCH v1 1/3] thermal: core: Fix thermal zone suspend-resume synchronization Rafael J. Wysocki
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2023-12-18 19:23 UTC (permalink / raw)
  To: Linux PM
  Cc: Srinivas Pandruvada, Daniel Lezcano, Zhang Rui, LKML, Lukasz Luba,
	Bo Ye, Radu Solea

Hi Everyone,

This patch series fixes some issues related to the suspend and resume of
thermal zones during system-wide transitions.

Patch [1/3] addresses some existing synchronization issues.

Patch [2/3] is a preliminary change for the last patch.

Patch [3/3] rearranges the thermal zone resume code to resume thermal
zones asynchronously using the existing thermal zone polling support.

Please refer to the individual patch changelogs for details.

Thanks!




^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v1 1/3] thermal: core: Fix thermal zone suspend-resume synchronization
  2023-12-18 19:23 [PATCH v1 0/3] thermal: core: Fix issues related to thermal zone resume Rafael J. Wysocki
@ 2023-12-18 19:25 ` Rafael J. Wysocki
  2023-12-18 19:26 ` [PATCH v1 2/3] thermal: core: Initialize poll_queue in thermal_zone_device_init() Rafael J. Wysocki
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2023-12-18 19:25 UTC (permalink / raw)
  To: Linux PM
  Cc: Srinivas Pandruvada, Daniel Lezcano, Zhang Rui, LKML, Lukasz Luba,
	Bo Ye, Radu Solea

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

There are 3 synchronization issues with thermal zone suspend-resume
during system-wide transitions:

 1. The resume code runs in a PM notifier which is invoked after user
    space has been thawed, so it can run concurrently with user space
    which can trigger a thermal zone device removal.  If that happens,
    the thermal zone resume code may use a stale pointer to the next
    list element and crash, because it does not hold thermal_list_lock
    while walking thermal_tz_list.

 2. The thermal zone resume code calls thermal_zone_device_init()
    outside the zone lock, so user space or an update triggered by
    the platform firmware may see an inconsistent state of a
    thermal zone leading to unexpected behavior.

 3. Clearing the in_suspend global variable in thermal_pm_notify()
    allows __thermal_zone_device_update() to continue for all thermal
    zones and it may as well run before the thermal_tz_list walk (or
    at any point during the list walk for that matter) and attempt to
    operate on a thermal zone that has not been resumed yet.  It may
    also race destructively with thermal_zone_device_init().

To address these issues, add thermal_list_lock locking to
thermal_pm_notify(), especially arount the thermal_tz_list,
make it call thermal_zone_device_init() back-to-back with
__thermal_zone_device_update() under the zone lock and replace
in_suspend with per-zone bool "suspend" indicators set and unset
under the given zone's lock.

Link: https://lore.kernel.org/linux-pm/20231218162348.69101-1-bo.ye@mediatek.com/
Reported-by: Bo Ye <bo.ye@mediatek.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/thermal/thermal_core.c |   30 +++++++++++++++++++++++-------
 include/linux/thermal.h        |    2 ++
 2 files changed, 25 insertions(+), 7 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -37,8 +37,6 @@ static LIST_HEAD(thermal_governor_list);
 static DEFINE_MUTEX(thermal_list_lock);
 static DEFINE_MUTEX(thermal_governor_lock);
 
-static atomic_t in_suspend;
-
 static struct thermal_governor *def_governor;
 
 /*
@@ -427,7 +425,7 @@ void __thermal_zone_device_update(struct
 {
 	struct thermal_trip *trip;
 
-	if (atomic_read(&in_suspend))
+	if (tz->suspended)
 		return;
 
 	if (!thermal_zone_device_is_enabled(tz))
@@ -1538,17 +1536,35 @@ static int thermal_pm_notify(struct noti
 	case PM_HIBERNATION_PREPARE:
 	case PM_RESTORE_PREPARE:
 	case PM_SUSPEND_PREPARE:
-		atomic_set(&in_suspend, 1);
+		mutex_lock(&thermal_list_lock);
+
+		list_for_each_entry(tz, &thermal_tz_list, node) {
+			mutex_lock(&tz->lock);
+
+			tz->suspended = true;
+
+			mutex_unlock(&tz->lock);
+		}
+
+		mutex_unlock(&thermal_list_lock);
 		break;
 	case PM_POST_HIBERNATION:
 	case PM_POST_RESTORE:
 	case PM_POST_SUSPEND:
-		atomic_set(&in_suspend, 0);
+		mutex_lock(&thermal_list_lock);
+
 		list_for_each_entry(tz, &thermal_tz_list, node) {
+			mutex_lock(&tz->lock);
+
+			tz->suspended = false;
+
 			thermal_zone_device_init(tz);
-			thermal_zone_device_update(tz,
-						   THERMAL_EVENT_UNSPECIFIED);
+			__thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
+
+			mutex_unlock(&tz->lock);
 		}
+
+		mutex_unlock(&thermal_list_lock);
 		break;
 	default:
 		break;
Index: linux-pm/include/linux/thermal.h
===================================================================
--- linux-pm.orig/include/linux/thermal.h
+++ linux-pm/include/linux/thermal.h
@@ -152,6 +152,7 @@ struct thermal_cooling_device {
  * @node:	node in thermal_tz_list (in thermal_core.c)
  * @poll_queue:	delayed work for polling
  * @notify_event: Last notification event
+ * @suspended: thermal zone suspend indicator
  */
 struct thermal_zone_device {
 	int id;
@@ -185,6 +186,7 @@ struct thermal_zone_device {
 	struct list_head node;
 	struct delayed_work poll_queue;
 	enum thermal_notify_event notify_event;
+	bool suspended;
 };
 
 /**




^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v1 2/3] thermal: core: Initialize poll_queue in thermal_zone_device_init()
  2023-12-18 19:23 [PATCH v1 0/3] thermal: core: Fix issues related to thermal zone resume Rafael J. Wysocki
  2023-12-18 19:25 ` [PATCH v1 1/3] thermal: core: Fix thermal zone suspend-resume synchronization Rafael J. Wysocki
@ 2023-12-18 19:26 ` Rafael J. Wysocki
  2023-12-18 19:28 ` [PATCH v1 3/3] thermal: core: Resume thermal zones asynchronously Rafael J. Wysocki
  2023-12-28 13:23 ` [PATCH v1 0/3] thermal: core: Fix issues related to thermal zone resume Rafael J. Wysocki
  3 siblings, 0 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2023-12-18 19:26 UTC (permalink / raw)
  To: Linux PM
  Cc: Srinivas Pandruvada, Daniel Lezcano, Zhang Rui, LKML, Lukasz Luba,
	Bo Ye, Radu Solea

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

In preparation for a subsequent change, move the initialization of the
poll_queue delayed work from thermal_zone_device_register_with_trips()
to thermal_zone_device_init() which is called by the former.

However, because thermal_zone_device_init() is also called by
thermal_pm_notify(), make the latter call cancel_delayed_work() on
poll_queue before invoking the former, so as to allow the work
item to be re-initialized safely.

Also move thermal_zone_device_check() which needs to be defined
before thermal_zone_device_init(), so the latter can pass it to the
INIT_DELAYED_WORK() macro.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/thermal/thermal_core.c |   23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -410,9 +410,20 @@ static void update_temperature(struct th
 	thermal_genl_sampling_temp(tz->id, temp);
 }
 
+static void thermal_zone_device_check(struct work_struct *work)
+{
+	struct thermal_zone_device *tz = container_of(work, struct
+						      thermal_zone_device,
+						      poll_queue.work);
+	thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
+}
+
 static void thermal_zone_device_init(struct thermal_zone_device *tz)
 {
 	struct thermal_instance *pos;
+
+	INIT_DELAYED_WORK(&tz->poll_queue, thermal_zone_device_check);
+
 	tz->temperature = THERMAL_TEMP_INVALID;
 	tz->prev_low_trip = -INT_MAX;
 	tz->prev_high_trip = INT_MAX;
@@ -509,14 +520,6 @@ void thermal_zone_device_update(struct t
 }
 EXPORT_SYMBOL_GPL(thermal_zone_device_update);
 
-static void thermal_zone_device_check(struct work_struct *work)
-{
-	struct thermal_zone_device *tz = container_of(work, struct
-						      thermal_zone_device,
-						      poll_queue.work);
-	thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
-}
-
 int for_each_thermal_governor(int (*cb)(struct thermal_governor *, void *),
 			      void *data)
 {
@@ -1372,8 +1375,6 @@ thermal_zone_device_register_with_trips(
 	/* Bind cooling devices for this zone */
 	bind_tz(tz);
 
-	INIT_DELAYED_WORK(&tz->poll_queue, thermal_zone_device_check);
-
 	thermal_zone_device_init(tz);
 	/* Update the new thermal zone and mark it as already updated. */
 	if (atomic_cmpxchg(&tz->need_update, 1, 0))
@@ -1556,6 +1557,8 @@ static int thermal_pm_notify(struct noti
 		list_for_each_entry(tz, &thermal_tz_list, node) {
 			mutex_lock(&tz->lock);
 
+			cancel_delayed_work(&tz->poll_queue);
+
 			tz->suspended = false;
 
 			thermal_zone_device_init(tz);




^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v1 3/3] thermal: core: Resume thermal zones asynchronously
  2023-12-18 19:23 [PATCH v1 0/3] thermal: core: Fix issues related to thermal zone resume Rafael J. Wysocki
  2023-12-18 19:25 ` [PATCH v1 1/3] thermal: core: Fix thermal zone suspend-resume synchronization Rafael J. Wysocki
  2023-12-18 19:26 ` [PATCH v1 2/3] thermal: core: Initialize poll_queue in thermal_zone_device_init() Rafael J. Wysocki
@ 2023-12-18 19:28 ` Rafael J. Wysocki
  2023-12-28 13:23 ` [PATCH v1 0/3] thermal: core: Fix issues related to thermal zone resume Rafael J. Wysocki
  3 siblings, 0 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2023-12-18 19:28 UTC (permalink / raw)
  To: Linux PM
  Cc: Srinivas Pandruvada, Daniel Lezcano, Zhang Rui, LKML, Lukasz Luba,
	Bo Ye, Radu Solea

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The resume of thermal zones in thermal_pm_notify() is carried out
sequentially, which may be a problem if __thermal_zone_device_update()
takes a significant time to run for some thermal zones, because some
other thermal zones may need to wait for them to resume then and if
any other PM notifiers are going to be invoked after the thermal one,
they will need to wait for it either.

To address this, make thermal_pm_notify() switch the poll_queue delayed
work over to a one-shot thermal_zone_device_resume() work function that
will restore the original one during the thermal zone resume and queue
up poll_queue without a delay for each thermal zone.

Link: https://lore.kernel.org/linux-pm/20231120234015.3273143-1-radusolea@google.com/
Reported-by: Radu Solea <radusolea@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/thermal/thermal_core.c |   30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -1528,6 +1528,22 @@ exit:
 }
 EXPORT_SYMBOL_GPL(thermal_zone_get_zone_by_name);
 
+static void thermal_zone_device_resume(struct work_struct *work)
+{
+	struct thermal_zone_device *tz;
+
+	tz = container_of(work, struct thermal_zone_device, poll_queue.work);
+
+	mutex_lock(&tz->lock);
+
+	tz->suspended = false;
+
+	thermal_zone_device_init(tz);
+	__thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
+
+	mutex_unlock(&tz->lock);
+}
+
 static int thermal_pm_notify(struct notifier_block *nb,
 			     unsigned long mode, void *_unused)
 {
@@ -1559,10 +1575,16 @@ static int thermal_pm_notify(struct noti
 
 			cancel_delayed_work(&tz->poll_queue);
 
-			tz->suspended = false;
-
-			thermal_zone_device_init(tz);
-			__thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
+			/*
+			 * Replace the work function with the resume one, which
+			 * will restore the original work function and schedule
+			 * the polling work if needed.
+			 */
+			INIT_DELAYED_WORK(&tz->poll_queue,
+					  thermal_zone_device_resume);
+			/* Queue up the work without a delay. */
+			mod_delayed_work(system_freezable_power_efficient_wq,
+					 &tz->poll_queue, 0);
 
 			mutex_unlock(&tz->lock);
 		}




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v1 0/3] thermal: core: Fix issues related to thermal zone resume
  2023-12-18 19:23 [PATCH v1 0/3] thermal: core: Fix issues related to thermal zone resume Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2023-12-18 19:28 ` [PATCH v1 3/3] thermal: core: Resume thermal zones asynchronously Rafael J. Wysocki
@ 2023-12-28 13:23 ` Rafael J. Wysocki
  3 siblings, 0 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2023-12-28 13:23 UTC (permalink / raw)
  To: Linux PM
  Cc: Srinivas Pandruvada, Daniel Lezcano, Rafael J. Wysocki, Zhang Rui,
	LKML, Lukasz Luba, Bo Ye, Radu Solea

On Mon, Dec 18, 2023 at 8:28 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> Hi Everyone,
>
> This patch series fixes some issues related to the suspend and resume of
> thermal zones during system-wide transitions.
>
> Patch [1/3] addresses some existing synchronization issues.
>
> Patch [2/3] is a preliminary change for the last patch.
>
> Patch [3/3] rearranges the thermal zone resume code to resume thermal
> zones asynchronously using the existing thermal zone polling support.
>
> Please refer to the individual patch changelogs for details.

These are fixes, so it would be good to get them into 6.8.

Since I don't see any objections to them, I'm adding them to the
bleeding-edge branch and will move them to linux-next next week.

Thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-12-28 13:23 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-18 19:23 [PATCH v1 0/3] thermal: core: Fix issues related to thermal zone resume Rafael J. Wysocki
2023-12-18 19:25 ` [PATCH v1 1/3] thermal: core: Fix thermal zone suspend-resume synchronization Rafael J. Wysocki
2023-12-18 19:26 ` [PATCH v1 2/3] thermal: core: Initialize poll_queue in thermal_zone_device_init() Rafael J. Wysocki
2023-12-18 19:28 ` [PATCH v1 3/3] thermal: core: Resume thermal zones asynchronously Rafael J. Wysocki
2023-12-28 13:23 ` [PATCH v1 0/3] thermal: core: Fix issues related to thermal zone resume Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox