linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/2] thermal: core: Fix two issues related to thermal zone resume
@ 2024-06-14 15:17 Rafael J. Wysocki
  2024-06-14 15:22 ` [PATCH v1 1/2] thermal: core: Synchronize suspend-prepare and post-suspend actions Rafael J. Wysocki
  2024-06-14 15:26 ` [PATCH v1 2/2] thermal: core: Change PM notifier priority to the minimum Rafael J. Wysocki
  0 siblings, 2 replies; 3+ messages in thread
From: Rafael J. Wysocki @ 2024-06-14 15:17 UTC (permalink / raw)
  To: Linux PM
  Cc: Daniel Lezcano, LKML, Lukasz Luba, Srinivas Pandruvada, Zhang Rui

Hi Everyone,

There are two issues resulting from asynchronous suspend of thermal zones.

One of them is platform-specific and related to some firmware issue (I think)
causing battery readings to become invalid after a system resume due to
interference between ACPI battery resume and a thermal zone update.  This
can be addressed by running the thermal PM notifier after all of the other
PM notifiers (including the ACPI battery one) which is done by patch [2/2].

The other one is mostly theoretical, but I couldn't convince myself that it
cannot happen.  Namely, a leftover thermal zone resume running during the
next system suspend (if it is carried out back-to-back with the previous
suspend-resume cycle) can accidentally reset tz->suspended set for a thermal
zone by the thermal pre-suspend PM notifier.  This is addressed by patch
[1/2] (which goes before the second one because the latter increases the
likelihood of the issue slightly).

Thanks!




^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH v1 1/2] thermal: core: Synchronize suspend-prepare and post-suspend actions
  2024-06-14 15:17 [PATCH v1 0/2] thermal: core: Fix two issues related to thermal zone resume Rafael J. Wysocki
@ 2024-06-14 15:22 ` Rafael J. Wysocki
  2024-06-14 15:26 ` [PATCH v1 2/2] thermal: core: Change PM notifier priority to the minimum Rafael J. Wysocki
  1 sibling, 0 replies; 3+ messages in thread
From: Rafael J. Wysocki @ 2024-06-14 15:22 UTC (permalink / raw)
  To: Linux PM
  Cc: Daniel Lezcano, LKML, Lukasz Luba, Srinivas Pandruvada, Zhang Rui

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

After commit 5a5efdaffda5 ("thermal: core: Resume thermal zones
asynchronously") it is theoretically possible that, if a system suspend
starts immediately after a system resume, thermal_zone_device_resume()
spawned by the thermal PM notifier for one of the thermal zones at the
end of the system resume will run after the PM thermal notifier for the
suspend-prepare action.  If that happens, tz->suspended set by the latter
will be reset by the former which may lead to unexpected consequences.

To avoid that race, synchronize thermal_zone_device_resume() with the
suspend-prepare thermal PM notifier with the help of additional bool
field and completion in struct thermal_zone_device.

Note that this also ensures running __thermal_zone_device_update() at
least once for each thermal zone between system resume and the following
system suspend in case it is needed to start thermal mitigation.

Fixes: 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/thermal/thermal_core.c |   21 +++++++++++++++++++++
 drivers/thermal/thermal_core.h |    4 ++++
 2 files changed, 25 insertions(+)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -1406,6 +1406,7 @@ thermal_zone_device_register_with_trips(
 	ida_init(&tz->ida);
 	mutex_init(&tz->lock);
 	init_completion(&tz->removal);
+	init_completion(&tz->resume);
 	id = ida_alloc(&thermal_tz_ida, GFP_KERNEL);
 	if (id < 0) {
 		result = id;
@@ -1652,6 +1653,9 @@ static void thermal_zone_device_resume(s
 	thermal_zone_device_init(tz);
 	__thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
 
+	complete(&tz->resume);
+	tz->resuming = false;
+
 	mutex_unlock(&tz->lock);
 }
 
@@ -1669,6 +1673,20 @@ static int thermal_pm_notify(struct noti
 		list_for_each_entry(tz, &thermal_tz_list, node) {
 			mutex_lock(&tz->lock);
 
+			if (tz->resuming) {
+				/*
+				 * thermal_zone_device_resume() queued up for
+				 * this zone has not acquired the lock yet, so
+				 * release it to let the function run and wait
+				 * util it has done the work.
+				 */
+				mutex_unlock(&tz->lock);
+
+				wait_for_completion(&tz->resume);
+
+				mutex_lock(&tz->lock);
+			}
+
 			tz->suspended = true;
 
 			mutex_unlock(&tz->lock);
@@ -1686,6 +1704,9 @@ static int thermal_pm_notify(struct noti
 
 			cancel_delayed_work(&tz->poll_queue);
 
+			reinit_completion(&tz->resume);
+			tz->resuming = true;
+
 			/*
 			 * Replace the work function with the resume one, which
 			 * will restore the original work function and schedule
Index: linux-pm/drivers/thermal/thermal_core.h
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.h
+++ linux-pm/drivers/thermal/thermal_core.h
@@ -55,6 +55,7 @@ struct thermal_governor {
  * @type:	the thermal zone device type
  * @device:	&struct device for this thermal zone
  * @removal:	removal completion
+ * @resume:	resume completion
  * @trip_temp_attrs:	attributes for trip points for sysfs: trip temperature
  * @trip_type_attrs:	attributes for trip points for sysfs: trip type
  * @trip_hyst_attrs:	attributes for trip points for sysfs: trip hysteresis
@@ -89,6 +90,7 @@ struct thermal_governor {
  * @poll_queue:	delayed work for polling
  * @notify_event: Last notification event
  * @suspended: thermal zone suspend indicator
+ * @resuming:	indicates whether or not thermal zone resume is in progress
  * @trips:	array of struct thermal_trip objects
  */
 struct thermal_zone_device {
@@ -96,6 +98,7 @@ struct thermal_zone_device {
 	char type[THERMAL_NAME_LENGTH];
 	struct device device;
 	struct completion removal;
+	struct completion resume;
 	struct attribute_group trips_attribute_group;
 	struct thermal_attr *trip_temp_attrs;
 	struct thermal_attr *trip_type_attrs;
@@ -123,6 +126,7 @@ struct thermal_zone_device {
 	struct delayed_work poll_queue;
 	enum thermal_notify_event notify_event;
 	bool suspended;
+	bool resuming;
 #ifdef CONFIG_THERMAL_DEBUGFS
 	struct thermal_debugfs *debugfs;
 #endif




^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH v1 2/2] thermal: core: Change PM notifier priority to the minimum
  2024-06-14 15:17 [PATCH v1 0/2] thermal: core: Fix two issues related to thermal zone resume Rafael J. Wysocki
  2024-06-14 15:22 ` [PATCH v1 1/2] thermal: core: Synchronize suspend-prepare and post-suspend actions Rafael J. Wysocki
@ 2024-06-14 15:26 ` Rafael J. Wysocki
  1 sibling, 0 replies; 3+ messages in thread
From: Rafael J. Wysocki @ 2024-06-14 15:26 UTC (permalink / raw)
  To: Linux PM
  Cc: Daniel Lezcano, LKML, Lukasz Luba, Srinivas Pandruvada, Zhang Rui,
	fhortner

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

It is reported that commit 5a5efdaffda5 ("thermal: core: Resume thermal
zones asynchronously") causes battery data in sysfs on Thinkpad P1 Gen2
to become invalid after a resume from S3 (and it is necessary to reboot
the machine to restore correct battery data).  Some investigation into
the problem indicated that it happened because, after the commit in
question, the ACPI battery PM notifier ran in parallel with
thermal_zone_device_resume() for one of the thermal zones which
apparently confused the platform firmware on the affected system.

While the exact reason for the firmware confusion remains unclear, it
is arguably not particularly relevant, and the expected behavior of the
affected system can be restored by making the thermal PM notifier run
at the lowest priority which avoids interference between work items
spawned by it and the other PM notifiers (that will run before those
work items now).

Fixes: 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218881
Reported-by: fhortner@yahoo.de
Tested-by: fhortner@yahoo.de
Cc: 6.8+ <stable@vger.kernel.org> # 6.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/thermal/thermal_core.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -1710,6 +1710,12 @@ static int thermal_pm_notify(struct noti
 
 static struct notifier_block thermal_pm_nb = {
 	.notifier_call = thermal_pm_notify,
+	/*
+	 * Run at the lowest priority to avoid interference between the thermal
+	 * zone resume work items spawned by thermal_pm_notify() and the other
+	 * PM notifiers.
+	 */
+	.priority = INT_MIN,
 };
 
 static int __init thermal_init(void)




^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-06-14 15:26 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-14 15:17 [PATCH v1 0/2] thermal: core: Fix two issues related to thermal zone resume Rafael J. Wysocki
2024-06-14 15:22 ` [PATCH v1 1/2] thermal: core: Synchronize suspend-prepare and post-suspend actions Rafael J. Wysocki
2024-06-14 15:26 ` [PATCH v1 2/2] thermal: core: Change PM notifier priority to the minimum Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).