* [PATCH v1] PM: runtime: PCI: Drain runtime-idle callbacks before driver removal
@ 2024-03-05 10:45 Rafael J. Wysocki
2024-03-05 13:36 ` Kai-Heng Feng
2024-03-05 18:14 ` Bjorn Helgaas
0 siblings, 2 replies; 3+ messages in thread
From: Rafael J. Wysocki @ 2024-03-05 10:45 UTC (permalink / raw)
To: Linux PCI, Bjorn Helgaas
Cc: LKML, Linux PM, Rafael J. Wysocki, Greg Kroah-Hartman, Ricky Wu,
Kai-Heng Feng
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
A race condition between the .runtime_idle() callback and the .remove()
callback in the rtsx_pcr PCI driver leads to a kernel crash due to an
unhandled page fault [1].
The problem is that rtsx_pci_runtime_idle() is not expected to be
running after pm_runtime_get_sync() has been called, but the latter
doesn't really guarantee that. It only guarantees that the suspend
and resume callbacks will not be running when it returns.
However, if a .runtime_idle() callback is already running when
pm_runtime_get_sync() is called, the latter will notice that the
runtime PM status of the device is RPM_ACTIVE and it will return right
away without waiting for the former to complete. In fact, it cannot
wait for .runtime_idle() to complete because it may be called from that
callback (it arguably does not make much sense to do that, but it is not
strictly prohibited).
Thus in general, whoever is providing a .runtime_idle() callback, they
need to protect it from running in parallel with whatever code runs
after pm_runtime_get_sync(). [Note that .runtime_idle() will not start
after pm_runtime_get_sync() has returned, but it may continue running
then if it has started earlier already.]
One way to address that race condition is to call pm_runtime_barrier()
after pm_runtime_get_sync() (not before it, because a nonzero value of
the runtime PM usage counter is necessary to prevent runtime PM
callbacks from being invoked) to wait for the runtime-idle callback to
complete should it be running at that point. A suitable place for
doing that is in pci_device_remove() which calls pm_runtime_get_sync()
before removing the driver, so it may as well call pm_runtime_barrier()
subsequently, which will prevent the race in question from occurring,
not just in the rtsx_pcr driver, but in any PCI drivers providing
runtime-idle callbacks.
Link: https://lore.kernel.org/lkml/20240229062201.49500-1-kai.heng.feng@canonical.com/ # [1]
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Ricky Wu <ricky_wu@realtek.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/pci/pci-driver.c | 7 +++++++
1 file changed, 7 insertions(+)
Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -473,6 +473,13 @@ static void pci_device_remove(struct dev
if (drv->remove) {
pm_runtime_get_sync(dev);
+ /*
+ * If the driver provides a .runtime_idle() callback and it has
+ * started to run already, it may continue to run in parallel
+ * with the code below, so wait until all of the runtime PM
+ * activity has completed.
+ */
+ pm_runtime_barrier(dev);
drv->remove(pci_dev);
pm_runtime_put_noidle(dev);
}
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v1] PM: runtime: PCI: Drain runtime-idle callbacks before driver removal
2024-03-05 10:45 [PATCH v1] PM: runtime: PCI: Drain runtime-idle callbacks before driver removal Rafael J. Wysocki
@ 2024-03-05 13:36 ` Kai-Heng Feng
2024-03-05 18:14 ` Bjorn Helgaas
1 sibling, 0 replies; 3+ messages in thread
From: Kai-Heng Feng @ 2024-03-05 13:36 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PCI, Bjorn Helgaas, LKML, Linux PM, Rafael J. Wysocki,
Greg Kroah-Hartman, Ricky Wu
On Tue, Mar 5, 2024 at 6:45 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> A race condition between the .runtime_idle() callback and the .remove()
> callback in the rtsx_pcr PCI driver leads to a kernel crash due to an
> unhandled page fault [1].
>
> The problem is that rtsx_pci_runtime_idle() is not expected to be
> running after pm_runtime_get_sync() has been called, but the latter
> doesn't really guarantee that. It only guarantees that the suspend
> and resume callbacks will not be running when it returns.
>
> However, if a .runtime_idle() callback is already running when
> pm_runtime_get_sync() is called, the latter will notice that the
> runtime PM status of the device is RPM_ACTIVE and it will return right
> away without waiting for the former to complete. In fact, it cannot
> wait for .runtime_idle() to complete because it may be called from that
> callback (it arguably does not make much sense to do that, but it is not
> strictly prohibited).
>
> Thus in general, whoever is providing a .runtime_idle() callback, they
> need to protect it from running in parallel with whatever code runs
> after pm_runtime_get_sync(). [Note that .runtime_idle() will not start
> after pm_runtime_get_sync() has returned, but it may continue running
> then if it has started earlier already.]
>
> One way to address that race condition is to call pm_runtime_barrier()
> after pm_runtime_get_sync() (not before it, because a nonzero value of
> the runtime PM usage counter is necessary to prevent runtime PM
> callbacks from being invoked) to wait for the runtime-idle callback to
> complete should it be running at that point. A suitable place for
> doing that is in pci_device_remove() which calls pm_runtime_get_sync()
> before removing the driver, so it may as well call pm_runtime_barrier()
> subsequently, which will prevent the race in question from occurring,
> not just in the rtsx_pcr driver, but in any PCI drivers providing
> runtime-idle callbacks.
>
> Link: https://lore.kernel.org/lkml/20240229062201.49500-1-kai.heng.feng@canonical.com/ # [1]
> Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Tested-by: Ricky Wu <ricky_wu@realtek.com>
> Cc: All applicable <stable@vger.kernel.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Thanks for the debugging and patch.
Acked-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
> drivers/pci/pci-driver.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -473,6 +473,13 @@ static void pci_device_remove(struct dev
>
> if (drv->remove) {
> pm_runtime_get_sync(dev);
> + /*
> + * If the driver provides a .runtime_idle() callback and it has
> + * started to run already, it may continue to run in parallel
> + * with the code below, so wait until all of the runtime PM
> + * activity has completed.
> + */
> + pm_runtime_barrier(dev);
> drv->remove(pci_dev);
> pm_runtime_put_noidle(dev);
> }
>
>
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v1] PM: runtime: PCI: Drain runtime-idle callbacks before driver removal
2024-03-05 10:45 [PATCH v1] PM: runtime: PCI: Drain runtime-idle callbacks before driver removal Rafael J. Wysocki
2024-03-05 13:36 ` Kai-Heng Feng
@ 2024-03-05 18:14 ` Bjorn Helgaas
1 sibling, 0 replies; 3+ messages in thread
From: Bjorn Helgaas @ 2024-03-05 18:14 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PCI, LKML, Linux PM, Rafael J. Wysocki, Greg Kroah-Hartman,
Ricky Wu, Kai-Heng Feng
On Tue, Mar 05, 2024 at 11:45:38AM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> A race condition between the .runtime_idle() callback and the .remove()
> callback in the rtsx_pcr PCI driver leads to a kernel crash due to an
> unhandled page fault [1].
>
> The problem is that rtsx_pci_runtime_idle() is not expected to be
> running after pm_runtime_get_sync() has been called, but the latter
> doesn't really guarantee that. It only guarantees that the suspend
> and resume callbacks will not be running when it returns.
>
> However, if a .runtime_idle() callback is already running when
> pm_runtime_get_sync() is called, the latter will notice that the
> runtime PM status of the device is RPM_ACTIVE and it will return right
> away without waiting for the former to complete. In fact, it cannot
> wait for .runtime_idle() to complete because it may be called from that
> callback (it arguably does not make much sense to do that, but it is not
> strictly prohibited).
>
> Thus in general, whoever is providing a .runtime_idle() callback, they
> need to protect it from running in parallel with whatever code runs
> after pm_runtime_get_sync(). [Note that .runtime_idle() will not start
> after pm_runtime_get_sync() has returned, but it may continue running
> then if it has started earlier already.]
>
> One way to address that race condition is to call pm_runtime_barrier()
> after pm_runtime_get_sync() (not before it, because a nonzero value of
> the runtime PM usage counter is necessary to prevent runtime PM
> callbacks from being invoked) to wait for the runtime-idle callback to
> complete should it be running at that point. A suitable place for
> doing that is in pci_device_remove() which calls pm_runtime_get_sync()
> before removing the driver, so it may as well call pm_runtime_barrier()
> subsequently, which will prevent the race in question from occurring,
> not just in the rtsx_pcr driver, but in any PCI drivers providing
> runtime-idle callbacks.
>
> Link: https://lore.kernel.org/lkml/20240229062201.49500-1-kai.heng.feng@canonical.com/ # [1]
> Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Tested-by: Ricky Wu <ricky_wu@realtek.com>
> Cc: All applicable <stable@vger.kernel.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Applied with Kai-Feng's ack to pci/pm for v6.9, thank you very much!
> ---
> drivers/pci/pci-driver.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -473,6 +473,13 @@ static void pci_device_remove(struct dev
>
> if (drv->remove) {
> pm_runtime_get_sync(dev);
> + /*
> + * If the driver provides a .runtime_idle() callback and it has
> + * started to run already, it may continue to run in parallel
> + * with the code below, so wait until all of the runtime PM
> + * activity has completed.
> + */
> + pm_runtime_barrier(dev);
> drv->remove(pci_dev);
> pm_runtime_put_noidle(dev);
> }
>
>
>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-03-05 18:14 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-05 10:45 [PATCH v1] PM: runtime: PCI: Drain runtime-idle callbacks before driver removal Rafael J. Wysocki
2024-03-05 13:36 ` Kai-Heng Feng
2024-03-05 18:14 ` Bjorn Helgaas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).