public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Bo Ye <bo.ye@mediatek.com>, Sasha Levin <sashal@kernel.org>,
	rafael@kernel.org, daniel.lezcano@linaro.org,
	matthias.bgg@gmail.com, angelogioacchino.delregno@collabora.com,
	linux-pm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-mediatek@lists.infradead.org
Subject: [PATCH AUTOSEL 6.7 18/18] thermal: core: Fix thermal zone suspend-resume synchronization
Date: Mon, 15 Jan 2024 19:13:00 -0500	[thread overview]
Message-ID: <20240116001308.212917-18-sashal@kernel.org> (raw)
In-Reply-To: <20240116001308.212917-1-sashal@kernel.org>

From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

[ Upstream commit 4e814173a8c4f432fd068b1c796f0416328c9d99 ]

There are 3 synchronization issues with thermal zone suspend-resume
during system-wide transitions:

 1. The resume code runs in a PM notifier which is invoked after user
    space has been thawed, so it can run concurrently with user space
    which can trigger a thermal zone device removal.  If that happens,
    the thermal zone resume code may use a stale pointer to the next
    list element and crash, because it does not hold thermal_list_lock
    while walking thermal_tz_list.

 2. The thermal zone resume code calls thermal_zone_device_init()
    outside the zone lock, so user space or an update triggered by
    the platform firmware may see an inconsistent state of a
    thermal zone leading to unexpected behavior.

 3. Clearing the in_suspend global variable in thermal_pm_notify()
    allows __thermal_zone_device_update() to continue for all thermal
    zones and it may as well run before the thermal_tz_list walk (or
    at any point during the list walk for that matter) and attempt to
    operate on a thermal zone that has not been resumed yet.  It may
    also race destructively with thermal_zone_device_init().

To address these issues, add thermal_list_lock locking to
thermal_pm_notify(), especially arount the thermal_tz_list,
make it call thermal_zone_device_init() back-to-back with
__thermal_zone_device_update() under the zone lock and replace
in_suspend with per-zone bool "suspend" indicators set and unset
under the given zone's lock.

Link: https://lore.kernel.org/linux-pm/20231218162348.69101-1-bo.ye@mediatek.com/
Reported-by: Bo Ye <bo.ye@mediatek.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/thermal/thermal_core.c | 30 +++++++++++++++++++++++-------
 include/linux/thermal.h        |  2 ++
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 9c17d35ccbbd..fe3bcfba4d67 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -37,8 +37,6 @@ static LIST_HEAD(thermal_governor_list);
 static DEFINE_MUTEX(thermal_list_lock);
 static DEFINE_MUTEX(thermal_governor_lock);
 
-static atomic_t in_suspend;
-
 static struct thermal_governor *def_governor;
 
 /*
@@ -405,7 +403,7 @@ void __thermal_zone_device_update(struct thermal_zone_device *tz,
 {
 	const struct thermal_trip *trip;
 
-	if (atomic_read(&in_suspend))
+	if (tz->suspended)
 		return;
 
 	if (WARN_ONCE(!tz->ops->get_temp,
@@ -1515,17 +1513,35 @@ static int thermal_pm_notify(struct notifier_block *nb,
 	case PM_HIBERNATION_PREPARE:
 	case PM_RESTORE_PREPARE:
 	case PM_SUSPEND_PREPARE:
-		atomic_set(&in_suspend, 1);
+		mutex_lock(&thermal_list_lock);
+
+		list_for_each_entry(tz, &thermal_tz_list, node) {
+			mutex_lock(&tz->lock);
+
+			tz->suspended = true;
+
+			mutex_unlock(&tz->lock);
+		}
+
+		mutex_unlock(&thermal_list_lock);
 		break;
 	case PM_POST_HIBERNATION:
 	case PM_POST_RESTORE:
 	case PM_POST_SUSPEND:
-		atomic_set(&in_suspend, 0);
+		mutex_lock(&thermal_list_lock);
+
 		list_for_each_entry(tz, &thermal_tz_list, node) {
+			mutex_lock(&tz->lock);
+
+			tz->suspended = false;
+
 			thermal_zone_device_init(tz);
-			thermal_zone_device_update(tz,
-						   THERMAL_EVENT_UNSPECIFIED);
+			__thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
+
+			mutex_unlock(&tz->lock);
 		}
+
+		mutex_unlock(&thermal_list_lock);
 		break;
 	default:
 		break;
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index cee814d5d1ac..1da1739d75d9 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -149,6 +149,7 @@ struct thermal_cooling_device {
  * @node:	node in thermal_tz_list (in thermal_core.c)
  * @poll_queue:	delayed work for polling
  * @notify_event: Last notification event
+ * @suspended: thermal zone suspend indicator
  */
 struct thermal_zone_device {
 	int id;
@@ -181,6 +182,7 @@ struct thermal_zone_device {
 	struct list_head node;
 	struct delayed_work poll_queue;
 	enum thermal_notify_event notify_event;
+	bool suspended;
 };
 
 /**
-- 
2.43.0


      parent reply	other threads:[~2024-01-16  0:13 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-16  0:12 [PATCH AUTOSEL 6.7 01/18] regulator: core: Only increment use_count when enable_count changes Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 02/18] audit: Send netlink ACK before setting connection in auditd_set Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 03/18] platform/chrome: cros_ec_debugfs: Fix permissions for panicinfo Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 04/18] ACPI: tables: Correct and clean up the logic of acpi_parse_entries_array() Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 05/18] ACPI: processor: reduce CPUFREQ thermal reduction pctg for Tegra241 Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 06/18] ACPI: video: Add quirk for the Colorful X15 AT 23 Laptop Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 07/18] PNP: ACPI: fix fortify warning Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 08/18] ACPI: extlog: fix NULL pointer dereference check Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 09/18] selftests/nolibc: use EFI -bios for LoongArch qemu Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 10/18] selftests/nolibc: fix testcase status alignment Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 11/18] ACPI: NUMA: Fix the logic of getting the fake_pxm value Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 12/18] kunit: tool: fix parsing of test attributes Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 13/18] kunit: Reset test->priv after each param iteration Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 14/18] PM / devfreq: Synchronize devfreq_monitor_[start/stop] Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 15/18] platform/x86: wmi: Remove ACPI handlers after WMI devices Sasha Levin
2024-01-16 11:53   ` Armin Wolf
2024-01-30 21:04     ` Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 16/18] ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events Sasha Levin
2024-01-16  0:12 ` [PATCH AUTOSEL 6.7 17/18] OPP: The level field is always of unsigned int type Sasha Levin
2024-01-16  0:13 ` Sasha Levin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240116001308.212917-18-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=angelogioacchino.delregno@collabora.com \
    --cc=bo.ye@mediatek.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mediatek@lists.infradead.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=matthias.bgg@gmail.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rafael@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox