From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5B8E3146D7E; Thu, 11 Apr 2024 10:35:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712831733; cv=none; b=I4sQC9reZx9UuB/9OOcwDbn1w4BqittR7g8D/NYC4c+QgDEuJtQQzsmrd2Xl3nnYgFAU9lDBBIG/OsHSFBYTrFINxv+/5saIb8RhxKRoRBwMfJOgqrtrrGFFVqXRzwxNrJ0sFEngNisDT7E5wHBVqnXH+axzuOvwW1+fJF/1GEg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712831733; c=relaxed/simple; bh=juoEirCxsMSUU8kM2CfrlgKbhsKq60LgFGN10PKjqLo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=rFOqTnEUAcK5S7vlpyujeSV3CPLjzUjn/B04vakSKxtN/KAgT8L1eOe4KPU1HA6t2KlZpaEhRb1u8+epak0+7Qc0tK3zBqQM9uuo/OlsoommiCgt3CTCaBAGC5S8j8SmoNsCYGotusOIctl9dEO1yu1hjNK3ldm2lQeGdaxxCiY= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b=FxayID9U; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b="FxayID9U" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D5343C433F1; Thu, 11 Apr 2024 10:35:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1712831733; bh=juoEirCxsMSUU8kM2CfrlgKbhsKq60LgFGN10PKjqLo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=FxayID9URxytpzJlDkQ04u5dfmKbvREcv6vyBFzCowqjd7wpVefFWqwcdWfnLMPTw lB1y/r6XteDF2N/uO4zai0QgFMfvLpK6zjRFo4dlYyniQkyo1VgN0QvUZ8akV31Nic mrHc+09LuOV+dt/tLVUbuN/7AgQxQ75PkZiiafx4= From: Greg Kroah-Hartman To: stable@vger.kernel.org Cc: Greg Kroah-Hartman , patches@lists.linux.dev, Kai-Heng Feng , "Rafael J. Wysocki" , Bjorn Helgaas , Ricky Wu , Sasha Levin Subject: [PATCH 5.10 060/294] PCI/PM: Drain runtime-idle callbacks before driver removal Date: Thu, 11 Apr 2024 11:53:43 +0200 Message-ID: <20240411095437.447297136@linuxfoundation.org> X-Mailer: git-send-email 2.44.0 In-Reply-To: <20240411095435.633465671@linuxfoundation.org> References: <20240411095435.633465671@linuxfoundation.org> User-Agent: quilt/0.67 X-stable: review X-Patchwork-Hint: ignore Precedence: bulk X-Mailing-List: stable@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit 5.10-stable review patch. If anyone has any objections, please let me know. ------------------ From: Rafael J. Wysocki [ Upstream commit 9d5286d4e7f68beab450deddbb6a32edd5ecf4bf ] A race condition between the .runtime_idle() callback and the .remove() callback in the rtsx_pcr PCI driver leads to a kernel crash due to an unhandled page fault [1]. The problem is that rtsx_pci_runtime_idle() is not expected to be running after pm_runtime_get_sync() has been called, but the latter doesn't really guarantee that. It only guarantees that the suspend and resume callbacks will not be running when it returns. However, if a .runtime_idle() callback is already running when pm_runtime_get_sync() is called, the latter will notice that the runtime PM status of the device is RPM_ACTIVE and it will return right away without waiting for the former to complete. In fact, it cannot wait for .runtime_idle() to complete because it may be called from that callback (it arguably does not make much sense to do that, but it is not strictly prohibited). Thus in general, whoever is providing a .runtime_idle() callback needs to protect it from running in parallel with whatever code runs after pm_runtime_get_sync(). [Note that .runtime_idle() will not start after pm_runtime_get_sync() has returned, but it may continue running then if it has started earlier.] One way to address that race condition is to call pm_runtime_barrier() after pm_runtime_get_sync() (not before it, because a nonzero value of the runtime PM usage counter is necessary to prevent runtime PM callbacks from being invoked) to wait for the .runtime_idle() callback to complete should it be running at that point. A suitable place for doing that is in pci_device_remove() which calls pm_runtime_get_sync() before removing the driver, so it may as well call pm_runtime_barrier() subsequently, which will prevent the race in question from occurring, not just in the rtsx_pcr driver, but in any PCI drivers providing .runtime_idle() callbacks. Link: https://lore.kernel.org/lkml/20240229062201.49500-1-kai.heng.feng@canonical.com/ # [1] Link: https://lore.kernel.org/r/5761426.DvuYhMxLoT@kreacher Reported-by: Kai-Heng Feng Signed-off-by: Rafael J. Wysocki Signed-off-by: Bjorn Helgaas Tested-by: Ricky Wu Acked-by: Kai-Heng Feng Cc: Signed-off-by: Sasha Levin --- drivers/pci/pci-driver.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index dbfeb5c148755..ae675c5a47151 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -446,6 +446,13 @@ static int pci_device_remove(struct device *dev) if (drv->remove) { pm_runtime_get_sync(dev); + /* + * If the driver provides a .runtime_idle() callback and it has + * started to run already, it may continue to run in parallel + * with the code below, so wait until all of the runtime PM + * activity has completed. + */ + pm_runtime_barrier(dev); drv->remove(pci_dev); pm_runtime_put_noidle(dev); } -- 2.43.0