From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4C3621CA85; Tue, 16 Jan 2024 00:13:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Lf7ZY1IJ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C72D1C433C7; Tue, 16 Jan 2024 00:13:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1705364020; bh=qu8PZWL5sa4HADRmhp8O/xnDAnb+X3Fz4jVED4/Lwsc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Lf7ZY1IJiiVtM6xVuHvS5TN9wz1+kxMtqs9WUhosdOD2UQIcDuXSAvotLTMZAregC PzxwQfHsHdkHhaQ8+iYvusOI97wj26Qteu/d1e8pMDJJ6ucnEB227TAdlEN5B0orZD pOttQNna5gH4oO+xHSeauFfQnyYD5H1J34ipt6IavLefIBR9QQs7jq6JGo2S9uMq5j LCX/VnBLpcFIo5r67K8vqqSgtfv/dvRPQZChRGGkxiNBg5/rcBUzhB3fT9n/WXR4kQ WeiM3Pnqx4CR+r2nTHnER6RwZF59JR+SaTYcptT6KP2KlGuS00WIVk2MmgRfC+Hdbn FmMGTUWOGG3Kw== From: Sasha Levin To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: "Rafael J. Wysocki" , Bo Ye , Sasha Levin , rafael@kernel.org, daniel.lezcano@linaro.org, matthias.bgg@gmail.com, angelogioacchino.delregno@collabora.com, linux-pm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org Subject: [PATCH AUTOSEL 6.7 18/18] thermal: core: Fix thermal zone suspend-resume synchronization Date: Mon, 15 Jan 2024 19:13:00 -0500 Message-ID: <20240116001308.212917-18-sashal@kernel.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240116001308.212917-1-sashal@kernel.org> References: <20240116001308.212917-1-sashal@kernel.org> Precedence: bulk X-Mailing-List: stable@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore X-stable-base: Linux 6.7 Content-Transfer-Encoding: 8bit From: "Rafael J. Wysocki" [ Upstream commit 4e814173a8c4f432fd068b1c796f0416328c9d99 ] There are 3 synchronization issues with thermal zone suspend-resume during system-wide transitions: 1. The resume code runs in a PM notifier which is invoked after user space has been thawed, so it can run concurrently with user space which can trigger a thermal zone device removal. If that happens, the thermal zone resume code may use a stale pointer to the next list element and crash, because it does not hold thermal_list_lock while walking thermal_tz_list. 2. The thermal zone resume code calls thermal_zone_device_init() outside the zone lock, so user space or an update triggered by the platform firmware may see an inconsistent state of a thermal zone leading to unexpected behavior. 3. Clearing the in_suspend global variable in thermal_pm_notify() allows __thermal_zone_device_update() to continue for all thermal zones and it may as well run before the thermal_tz_list walk (or at any point during the list walk for that matter) and attempt to operate on a thermal zone that has not been resumed yet. It may also race destructively with thermal_zone_device_init(). To address these issues, add thermal_list_lock locking to thermal_pm_notify(), especially arount the thermal_tz_list, make it call thermal_zone_device_init() back-to-back with __thermal_zone_device_update() under the zone lock and replace in_suspend with per-zone bool "suspend" indicators set and unset under the given zone's lock. Link: https://lore.kernel.org/linux-pm/20231218162348.69101-1-bo.ye@mediatek.com/ Reported-by: Bo Ye Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin --- drivers/thermal/thermal_core.c | 30 +++++++++++++++++++++++------- include/linux/thermal.h | 2 ++ 2 files changed, 25 insertions(+), 7 deletions(-) diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index 9c17d35ccbbd..fe3bcfba4d67 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -37,8 +37,6 @@ static LIST_HEAD(thermal_governor_list); static DEFINE_MUTEX(thermal_list_lock); static DEFINE_MUTEX(thermal_governor_lock); -static atomic_t in_suspend; - static struct thermal_governor *def_governor; /* @@ -405,7 +403,7 @@ void __thermal_zone_device_update(struct thermal_zone_device *tz, { const struct thermal_trip *trip; - if (atomic_read(&in_suspend)) + if (tz->suspended) return; if (WARN_ONCE(!tz->ops->get_temp, @@ -1515,17 +1513,35 @@ static int thermal_pm_notify(struct notifier_block *nb, case PM_HIBERNATION_PREPARE: case PM_RESTORE_PREPARE: case PM_SUSPEND_PREPARE: - atomic_set(&in_suspend, 1); + mutex_lock(&thermal_list_lock); + + list_for_each_entry(tz, &thermal_tz_list, node) { + mutex_lock(&tz->lock); + + tz->suspended = true; + + mutex_unlock(&tz->lock); + } + + mutex_unlock(&thermal_list_lock); break; case PM_POST_HIBERNATION: case PM_POST_RESTORE: case PM_POST_SUSPEND: - atomic_set(&in_suspend, 0); + mutex_lock(&thermal_list_lock); + list_for_each_entry(tz, &thermal_tz_list, node) { + mutex_lock(&tz->lock); + + tz->suspended = false; + thermal_zone_device_init(tz); - thermal_zone_device_update(tz, - THERMAL_EVENT_UNSPECIFIED); + __thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED); + + mutex_unlock(&tz->lock); } + + mutex_unlock(&thermal_list_lock); break; default: break; diff --git a/include/linux/thermal.h b/include/linux/thermal.h index cee814d5d1ac..1da1739d75d9 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -149,6 +149,7 @@ struct thermal_cooling_device { * @node: node in thermal_tz_list (in thermal_core.c) * @poll_queue: delayed work for polling * @notify_event: Last notification event + * @suspended: thermal zone suspend indicator */ struct thermal_zone_device { int id; @@ -181,6 +182,7 @@ struct thermal_zone_device { struct list_head node; struct delayed_work poll_queue; enum thermal_notify_event notify_event; + bool suspended; }; /** -- 2.43.0