linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] PCI PM refinements
@ 2016-08-31  6:15 Lukas Wunner
  2016-08-31  6:15 ` [PATCH 2/4] PCI: Query platform firmware for device power state Lukas Wunner
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Lukas Wunner @ 2016-08-31  6:15 UTC (permalink / raw)
  To: linux-pci
  Cc: linux-pm, linux-acpi, Rafael J. Wysocki, Peter Wu, Andreas Noever

The following patches are part of my Thunderbolt Runtime PM series.
I'm posting them separately to split the series into digestible chunks.

They benefit everyone, not just Thunderbolt, and Thunderbolt works fine
without them (just not with the same degree of perfection).

Patch [1/4] affords direct-complete to devices which are not platform-
power-manageable, yet can be suspended to D3cold with a nonstandard
mechanism. It does so using a heuristic: If a device's current_state
is D3cold and it is not power-manageable, it is assumed that a
nonstandard PM mechanism is at work and the device is left asleep.
The patch is the outcome of a discussion with Rafael and Alan on
linux-pci@ and I would like to thank both for tirelessly responding
to my questions.

Patch [3/4] replaces the unconditional resume after direct-complete with
a conditional resume by determining if the platform firmware has put the
device in reset-power-on or not.

Because the D3cold state cannot be determined from the PMCSR, patch [2/4]
adds a hook to query the platform firmware for its opinion on the device's
power state.

Finally, patch [4/4] changes the shutdown handling of PCI devices such
that they are no longer woken without reason.

The result of these patches should be a speedup and decreased power
consumption for system sleep and system shutdown (both for devices that
are platform-power-manageable and those that aren't).

If you prefer point & click interfaces, the patches are browsable here:
https://github.com/l1k/linux/commits/thunderbolt_runpm_v3

Thanks,

Lukas


Lukas Wunner (4):
  PCI: Afford direct-complete to devices with nonstandard pm
  PCI: Query platform firmware for device power state
  PCI: Avoid unnecessary resume after direct-complete
  PCI: Avoid unnecessary resume on shutdown

 drivers/pci/pci-acpi.c   | 23 +++++++++++++++++++++++
 drivers/pci/pci-driver.c | 20 ++++++++++++++++++--
 drivers/pci/pci.c        | 38 +++++++++++++++++++++++++++++++-------
 drivers/pci/pci.h        |  3 +++
 4 files changed, 75 insertions(+), 9 deletions(-)

-- 
2.8.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 2/4] PCI: Query platform firmware for device power state
  2016-08-31  6:15 [PATCH 0/4] PCI PM refinements Lukas Wunner
@ 2016-08-31  6:15 ` Lukas Wunner
  2016-09-14  0:21   ` Rafael J. Wysocki
  2016-08-31  6:15 ` [PATCH 1/4] PCI: Afford direct-complete to devices with nonstandard PM Lukas Wunner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Lukas Wunner @ 2016-08-31  6:15 UTC (permalink / raw)
  To: linux-pci
  Cc: linux-pm, linux-acpi, Rafael J. Wysocki, Peter Wu, Andreas Noever

Usually the most accurate way to determine a PCI device's power state is
to read its PM Control & Status Register.  There are two cases however
when this is not an option:  If the device doesn't have the PM
capability at all, or if it is in D3cold.

In D3cold, reading from config space typically results in a fabricated
"all ones" response.  But in D3hot, the two bits representing the power
state in the PMCSR are *also* set to 1.  Thus D3hot and D3cold are not
discernible by just reading the PMCSR.

A (supposedly) reliable way to detect D3cold is to query the platform
firmware for its opinion on the device's power state.  To this end,
add a ->get_power callback to struct pci_platform_pm_ops, and an
implementation to acpi_pci_platform_pm.  (The only pci_platform_pm_ops
existing so far).

Amend pci_update_current_state() to query the platform firmware before
reading the PMCSR.

Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pci-acpi.c | 23 +++++++++++++++++++++++
 drivers/pci/pci.c      | 21 ++++++++++++++++-----
 drivers/pci/pci.h      |  3 +++
 3 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 9a033e8..89f2707 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -452,6 +452,28 @@ static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
 	return error;
 }
 
+static pci_power_t acpi_pci_get_power_state(struct pci_dev *dev)
+{
+	struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
+	static const pci_power_t state_conv[] = {
+		[ACPI_STATE_D0]      = PCI_D0,
+		[ACPI_STATE_D1]      = PCI_D1,
+		[ACPI_STATE_D2]      = PCI_D2,
+		[ACPI_STATE_D3_HOT]  = PCI_D3hot,
+		[ACPI_STATE_D3_COLD] = PCI_D3cold,
+	};
+	int state;
+
+	if (!adev || !acpi_device_power_manageable(adev))
+		return PCI_UNKNOWN;
+
+	if (acpi_device_get_power(adev, &state) || state < ACPI_STATE_D0
+						|| state > ACPI_STATE_D3_COLD)
+		return PCI_UNKNOWN;
+
+	return state_conv[state];
+}
+
 static bool acpi_pci_can_wakeup(struct pci_dev *dev)
 {
 	struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
@@ -534,6 +556,7 @@ static bool acpi_pci_need_resume(struct pci_dev *dev)
 static const struct pci_platform_pm_ops acpi_pci_platform_pm = {
 	.is_manageable = acpi_pci_power_manageable,
 	.set_state = acpi_pci_set_power_state,
+	.get_state = acpi_pci_get_power_state,
 	.choose_state = acpi_pci_choose_state,
 	.sleep_wake = acpi_pci_sleep_wake,
 	.run_wake = acpi_pci_run_wake,
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 72a9d3a..e52e3d4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -552,8 +552,9 @@ static const struct pci_platform_pm_ops *pci_platform_pm;
 
 int pci_set_platform_pm(const struct pci_platform_pm_ops *ops)
 {
-	if (!ops->is_manageable || !ops->set_state || !ops->choose_state ||
-	    !ops->sleep_wake || !ops->run_wake || !ops->need_resume)
+	if (!ops->is_manageable || !ops->set_state  || !ops->get_state ||
+	    !ops->choose_state  || !ops->sleep_wake || !ops->run_wake  ||
+	    !ops->need_resume)
 		return -EINVAL;
 	pci_platform_pm = ops;
 	return 0;
@@ -570,6 +571,11 @@ static inline int platform_pci_set_power_state(struct pci_dev *dev,
 	return pci_platform_pm ? pci_platform_pm->set_state(dev, t) : -ENOSYS;
 }
 
+static inline pci_power_t platform_pci_get_power_state(struct pci_dev *dev)
+{
+	return pci_platform_pm ? pci_platform_pm->get_state(dev) : PCI_UNKNOWN;
+}
+
 static inline pci_power_t platform_pci_choose_state(struct pci_dev *dev)
 {
 	return pci_platform_pm ?
@@ -701,14 +707,19 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
 }
 
 /**
- * pci_update_current_state - Read PCI power state of given device from its
- *                            PCI PM registers and cache it
+ * pci_update_current_state - Read power state of given device and cache it
  * @dev: PCI device to handle.
  * @state: State to cache in case the device doesn't have the PM capability
+ *
+ * The power state is read from the PMCSR register, which however is
+ * inaccessible in D3cold. The platform firmware is therefore queried first
+ * to detect accessibility of the register.
  */
 void pci_update_current_state(struct pci_dev *dev, pci_power_t state)
 {
-	if (dev->pm_cap) {
+	if (platform_pci_get_power_state(dev) == PCI_D3cold) {
+		dev->current_state = PCI_D3cold;
+	} else if (dev->pm_cap) {
 		u16 pmcsr;
 
 		/*
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9730c47..01d5206 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -42,6 +42,8 @@ int pci_probe_reset_function(struct pci_dev *dev);
  *
  * @set_state: invokes the platform firmware to set the device's power state
  *
+ * @get_state: queries the platform firmware for a device's current power state
+ *
  * @choose_state: returns PCI power state of given device preferred by the
  *                platform; to be used during system-wide transitions from a
  *                sleeping state to the working state and vice versa
@@ -62,6 +64,7 @@ int pci_probe_reset_function(struct pci_dev *dev);
 struct pci_platform_pm_ops {
 	bool (*is_manageable)(struct pci_dev *dev);
 	int (*set_state)(struct pci_dev *dev, pci_power_t state);
+	pci_power_t (*get_state)(struct pci_dev *dev);
 	pci_power_t (*choose_state)(struct pci_dev *dev);
 	int (*sleep_wake)(struct pci_dev *dev, bool enable);
 	int (*run_wake)(struct pci_dev *dev, bool enable);
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/4] PCI: Avoid unnecessary resume after direct-complete
  2016-08-31  6:15 [PATCH 0/4] PCI PM refinements Lukas Wunner
                   ` (2 preceding siblings ...)
  2016-08-31  6:15 ` [PATCH 4/4] PCI: Avoid unnecessary resume on shutdown Lukas Wunner
@ 2016-08-31  6:15 ` Lukas Wunner
  2016-09-14  0:29   ` Rafael J. Wysocki
  2016-09-14  0:50   ` Rafael J. Wysocki
  2016-09-12 22:07 ` [PATCH 0/4] PCI PM refinements Bjorn Helgaas
  4 siblings, 2 replies; 27+ messages in thread
From: Lukas Wunner @ 2016-08-31  6:15 UTC (permalink / raw)
  To: linux-pci
  Cc: linux-pm, linux-acpi, Rafael J. Wysocki, Peter Wu, Andreas Noever

Commit 58a1fbbb2ee8 ("PM / PCI / ACPI: Kick devices that might have been
reset by firmware") added a runtime resume for devices that were runtime
suspended when the system entered sleep.

The motivation was that devices might be in a reset-power-on state after
waking from system sleep, so their power state as perceived by Linux
(stored in pci_dev->current_state) would no longer reflect reality.
By resuming such devices, we allow them to return to a low-power state
via autosuspend and also bring their current_state in sync with reality.

However for devices that are *not* in a reset-power-on state, doing an
unconditional resume wastes energy.  A more refined approach is called
for which issues a runtime resume only if the power state after
direct-complete is shallower than it was before. To achieve this, update
the device's current_state by querying the platform firmware and reading
its PMCSR.

Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pci-driver.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index e39a67c..fd4b9c4 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -684,8 +684,19 @@ static int pci_pm_prepare(struct device *dev)
 
 static void pci_pm_complete(struct device *dev)
 {
-	pci_dev_complete_resume(to_pci_dev(dev));
-	pm_complete_with_resume_check(dev);
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+
+	pci_dev_complete_resume(pci_dev);
+	pm_generic_complete(dev);
+
+	/* Resume device if platform firmware has put it in reset-power-on */
+	if (dev->power.direct_complete && pm_resume_via_firmware()) {
+		pci_power_t pre_sleep_state = pci_dev->current_state;
+
+		pci_update_current_state(pci_dev, pci_dev->current_state);
+		if (pci_dev->current_state < pre_sleep_state)
+			pm_request_resume(dev);
+	}
 }
 
 #else /* !CONFIG_PM_SLEEP */
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/4] PCI: Avoid unnecessary resume on shutdown
  2016-08-31  6:15 [PATCH 0/4] PCI PM refinements Lukas Wunner
  2016-08-31  6:15 ` [PATCH 2/4] PCI: Query platform firmware for device power state Lukas Wunner
  2016-08-31  6:15 ` [PATCH 1/4] PCI: Afford direct-complete to devices with nonstandard PM Lukas Wunner
@ 2016-08-31  6:15 ` Lukas Wunner
  2016-09-14  0:29   ` Rafael J. Wysocki
  2016-08-31  6:15 ` [PATCH 3/4] PCI: Avoid unnecessary resume after direct-complete Lukas Wunner
  2016-09-12 22:07 ` [PATCH 0/4] PCI PM refinements Bjorn Helgaas
  4 siblings, 1 reply; 27+ messages in thread
From: Lukas Wunner @ 2016-08-31  6:15 UTC (permalink / raw)
  To: linux-pci
  Cc: linux-pm, linux-acpi, Rafael J. Wysocki, Peter Wu, Andreas Noever

We currently perform a mandatory runtime resume of all PCI devices on
->shutdown.  However it is pointless to wake devices only to immediately
power them down afterwards.  (Or have the firmware reset them, in case
of a reboot.)

It seems there are only two cases when a runtime resume is actually
necessary:  If the driver has declared a ->shutdown callback or if kexec
is in progress.

Constrain resume of a device to these cases and let it slumber
otherwise, thereby conserving energy and speeding up shutdown.

Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pci-driver.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index fd4b9c4..09a4e56 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -459,6 +459,11 @@ static void pci_device_shutdown(struct device *dev)
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct pci_driver *drv = pci_dev->driver;
 
+	/* Fast path for suspended devices */
+	if (pm_runtime_suspended(dev) && (!drv || !drv->shutdown) &&
+	    !kexec_in_progress)
+		return;
+
 	pm_runtime_resume(dev);
 
 	if (drv && drv->shutdown)
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 1/4] PCI: Afford direct-complete to devices with nonstandard PM
  2016-08-31  6:15 [PATCH 0/4] PCI PM refinements Lukas Wunner
  2016-08-31  6:15 ` [PATCH 2/4] PCI: Query platform firmware for device power state Lukas Wunner
@ 2016-08-31  6:15 ` Lukas Wunner
  2016-09-14  0:05   ` Rafael J. Wysocki
  2016-08-31  6:15 ` [PATCH 4/4] PCI: Avoid unnecessary resume on shutdown Lukas Wunner
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Lukas Wunner @ 2016-08-31  6:15 UTC (permalink / raw)
  To: linux-pci
  Cc: linux-pm, linux-acpi, Rafael J. Wysocki, Peter Wu, Andreas Noever

There are devices not power-manageable by the platform, but still able
to runtime suspend to D3cold with a nonstandard mechanism.  One example
is laptop hybrid graphics where the discrete GPU and its built-in HDA
controller are power-managed either with a _DSM (AMD PowerXpress, Nvidia
Optimus) or a separate gmux controller (MacBook Pro).  Another example
is Thunderbolt on Macs which is power-managed with custom ACPI methods.

When putting the system to sleep, we currently handle such devices
improperly by transitioning them from D3cold to D3hot (the default power
state defined at the top of pci_target_state()).  This wastes energy and
prolongs the suspend sequence (powering up the Thunderbolt controller
takes 2 seconds).

Avoid that by assuming that a nonstandard PM mechanism is at work if the
device is not platform-power-manageable but currently in D3cold.

If the device is wakeup enabled, we might still have to wake it up from
D3cold if PME cannot be signaled from that power state.

The check for devices without PM capability comes before the check for
D3cold since such devices could in theory also be powered down by
nonstandard means and should then be afforded direct-complete as well.

Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pci.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index aab9d51..72a9d3a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1959,9 +1959,22 @@ static pci_power_t pci_target_state(struct pci_dev *dev)
 		default:
 			target_state = state;
 		}
-	} else if (!dev->pm_cap) {
+
+		return target_state;
+	}
+
+	if (!dev->pm_cap)
 		target_state = PCI_D0;
-	} else if (device_may_wakeup(&dev->dev)) {
+
+	/*
+	 * If the device is in D3cold even though it's not power-manageable by
+	 * the platform, it may have been powered down by nonstandard means.
+	 * Best to let it slumber.
+	 */
+	if (dev->current_state == PCI_D3cold)
+		target_state = PCI_D3cold;
+
+	if (device_may_wakeup(&dev->dev)) {
 		/*
 		 * Find the deepest state from which the device can generate
 		 * wake-up events, make it the target state and enable device
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/4] PCI PM refinements
  2016-08-31  6:15 [PATCH 0/4] PCI PM refinements Lukas Wunner
                   ` (3 preceding siblings ...)
  2016-08-31  6:15 ` [PATCH 3/4] PCI: Avoid unnecessary resume after direct-complete Lukas Wunner
@ 2016-09-12 22:07 ` Bjorn Helgaas
  2016-09-12 22:15   ` Rafael J. Wysocki
  4 siblings, 1 reply; 27+ messages in thread
From: Bjorn Helgaas @ 2016-09-12 22:07 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-pm, linux-acpi, Rafael J. Wysocki, Peter Wu,
	Andreas Noever

On Wed, Aug 31, 2016 at 08:15:18AM +0200, Lukas Wunner wrote:
> The following patches are part of my Thunderbolt Runtime PM series.
> I'm posting them separately to split the series into digestible chunks.
> 
> They benefit everyone, not just Thunderbolt, and Thunderbolt works fine
> without them (just not with the same degree of perfection).
> 
> Patch [1/4] affords direct-complete to devices which are not platform-
> power-manageable, yet can be suspended to D3cold with a nonstandard
> mechanism. It does so using a heuristic: If a device's current_state
> is D3cold and it is not power-manageable, it is assumed that a
> nonstandard PM mechanism is at work and the device is left asleep.
> The patch is the outcome of a discussion with Rafael and Alan on
> linux-pci@ and I would like to thank both for tirelessly responding
> to my questions.
> 
> Patch [3/4] replaces the unconditional resume after direct-complete with
> a conditional resume by determining if the platform firmware has put the
> device in reset-power-on or not.
> 
> Because the D3cold state cannot be determined from the PMCSR, patch [2/4]
> adds a hook to query the platform firmware for its opinion on the device's
> power state.
> 
> Finally, patch [4/4] changes the shutdown handling of PCI devices such
> that they are no longer woken without reason.
> 
> The result of these patches should be a speedup and decreased power
> consumption for system sleep and system shutdown (both for devices that
> are platform-power-manageable and those that aren't).
> 
> If you prefer point & click interfaces, the patches are browsable here:
> https://github.com/l1k/linux/commits/thunderbolt_runpm_v3

Rafael, do you have any comment on these?  I don't have any objections
myself, but I'd like your opinion as the PM expert :)

> Lukas Wunner (4):
>   PCI: Afford direct-complete to devices with nonstandard pm
>   PCI: Query platform firmware for device power state
>   PCI: Avoid unnecessary resume after direct-complete
>   PCI: Avoid unnecessary resume on shutdown
> 
>  drivers/pci/pci-acpi.c   | 23 +++++++++++++++++++++++
>  drivers/pci/pci-driver.c | 20 ++++++++++++++++++--
>  drivers/pci/pci.c        | 38 +++++++++++++++++++++++++++++++-------
>  drivers/pci/pci.h        |  3 +++
>  4 files changed, 75 insertions(+), 9 deletions(-)
> 
> -- 
> 2.8.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/4] PCI PM refinements
  2016-09-12 22:07 ` [PATCH 0/4] PCI PM refinements Bjorn Helgaas
@ 2016-09-12 22:15   ` Rafael J. Wysocki
  0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2016-09-12 22:15 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Lukas Wunner, linux-pci, linux-pm, linux-acpi, Peter Wu,
	Andreas Noever

On Monday, September 12, 2016 05:07:10 PM Bjorn Helgaas wrote:
> On Wed, Aug 31, 2016 at 08:15:18AM +0200, Lukas Wunner wrote:
> > The following patches are part of my Thunderbolt Runtime PM series.
> > I'm posting them separately to split the series into digestible chunks.
> > 
> > They benefit everyone, not just Thunderbolt, and Thunderbolt works fine
> > without them (just not with the same degree of perfection).
> > 
> > Patch [1/4] affords direct-complete to devices which are not platform-
> > power-manageable, yet can be suspended to D3cold with a nonstandard
> > mechanism. It does so using a heuristic: If a device's current_state
> > is D3cold and it is not power-manageable, it is assumed that a
> > nonstandard PM mechanism is at work and the device is left asleep.
> > The patch is the outcome of a discussion with Rafael and Alan on
> > linux-pci@ and I would like to thank both for tirelessly responding
> > to my questions.
> > 
> > Patch [3/4] replaces the unconditional resume after direct-complete with
> > a conditional resume by determining if the platform firmware has put the
> > device in reset-power-on or not.
> > 
> > Because the D3cold state cannot be determined from the PMCSR, patch [2/4]
> > adds a hook to query the platform firmware for its opinion on the device's
> > power state.
> > 
> > Finally, patch [4/4] changes the shutdown handling of PCI devices such
> > that they are no longer woken without reason.
> > 
> > The result of these patches should be a speedup and decreased power
> > consumption for system sleep and system shutdown (both for devices that
> > are platform-power-manageable and those that aren't).
> > 
> > If you prefer point & click interfaces, the patches are browsable here:
> > https://github.com/l1k/linux/commits/thunderbolt_runpm_v3
> 
> Rafael, do you have any comment on these?  I don't have any objections
> myself, but I'd like your opinion as the PM expert :)

Please give me some more time (a day or two should be sufficient).

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] PCI: Afford direct-complete to devices with nonstandard PM
  2016-08-31  6:15 ` [PATCH 1/4] PCI: Afford direct-complete to devices with nonstandard PM Lukas Wunner
@ 2016-09-14  0:05   ` Rafael J. Wysocki
  0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2016-09-14  0:05 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> There are devices not power-manageable by the platform, but still able
> to runtime suspend to D3cold with a nonstandard mechanism.  One example
> is laptop hybrid graphics where the discrete GPU and its built-in HDA
> controller are power-managed either with a _DSM (AMD PowerXpress, Nvidia
> Optimus) or a separate gmux controller (MacBook Pro).  Another example
> is Thunderbolt on Macs which is power-managed with custom ACPI methods.
> 
> When putting the system to sleep, we currently handle such devices
> improperly by transitioning them from D3cold to D3hot (the default power
> state defined at the top of pci_target_state()).  This wastes energy and
> prolongs the suspend sequence (powering up the Thunderbolt controller
> takes 2 seconds).
> 
> Avoid that by assuming that a nonstandard PM mechanism is at work if the
> device is not platform-power-manageable but currently in D3cold.
> 
> If the device is wakeup enabled, we might still have to wake it up from
> D3cold if PME cannot be signaled from that power state.
> 
> The check for devices without PM capability comes before the check for
> D3cold since such devices could in theory also be powered down by
> nonstandard means and should then be afforded direct-complete as well.
> 
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/pci/pci.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index aab9d51..72a9d3a 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1959,9 +1959,22 @@ static pci_power_t pci_target_state(struct pci_dev *dev)
>  		default:
>  			target_state = state;
>  		}
> -	} else if (!dev->pm_cap) {
> +
> +		return target_state;
> +	}
> +
> +	if (!dev->pm_cap)
>  		target_state = PCI_D0;
> -	} else if (device_may_wakeup(&dev->dev)) {
> +
> +	/*
> +	 * If the device is in D3cold even though it's not power-manageable by
> +	 * the platform, it may have been powered down by nonstandard means.
> +	 * Best to let it slumber.
> +	 */
> +	if (dev->current_state == PCI_D3cold)
> +		target_state = PCI_D3cold;
> +
> +	if (device_may_wakeup(&dev->dev)) {
>  		/*
>  		 * Find the deepest state from which the device can generate
>  		 * wake-up events, make it the target state and enable device
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] PCI: Query platform firmware for device power state
  2016-08-31  6:15 ` [PATCH 2/4] PCI: Query platform firmware for device power state Lukas Wunner
@ 2016-09-14  0:21   ` Rafael J. Wysocki
  2016-09-14 10:50     ` Lukas Wunner
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2016-09-14  0:21 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> Usually the most accurate way to determine a PCI device's power state is
> to read its PM Control & Status Register.  There are two cases however
> when this is not an option:  If the device doesn't have the PM
> capability at all, or if it is in D3cold.
> 
> In D3cold, reading from config space typically results in a fabricated
> "all ones" response.  But in D3hot, the two bits representing the power
> state in the PMCSR are *also* set to 1.  Thus D3hot and D3cold are not
> discernible by just reading the PMCSR.
> 
> A (supposedly) reliable way to detect D3cold is to query the platform
> firmware for its opinion on the device's power state.  To this end,
> add a ->get_power callback to struct pci_platform_pm_ops, and an
> implementation to acpi_pci_platform_pm.  (The only pci_platform_pm_ops
> existing so far).
> 
> Amend pci_update_current_state() to query the platform firmware before
> reading the PMCSR.
> 
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/pci-acpi.c | 23 +++++++++++++++++++++++
>  drivers/pci/pci.c      | 21 ++++++++++++++++-----
>  drivers/pci/pci.h      |  3 +++
>  3 files changed, 42 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index 9a033e8..89f2707 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -452,6 +452,28 @@ static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
>  	return error;
>  }
>  
> +static pci_power_t acpi_pci_get_power_state(struct pci_dev *dev)
> +{
> +	struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
> +	static const pci_power_t state_conv[] = {
> +		[ACPI_STATE_D0]      = PCI_D0,
> +		[ACPI_STATE_D1]      = PCI_D1,
> +		[ACPI_STATE_D2]      = PCI_D2,
> +		[ACPI_STATE_D3_HOT]  = PCI_D3hot,
> +		[ACPI_STATE_D3_COLD] = PCI_D3cold,
> +	};
> +	int state;

ACPI_STATE_D3_HOT and ACPI_STATE_D3_COLD were introduced in ACPI 4.0.  For
systems predating that, ACPI_STATE_D3_HOT is the deepest state returned by
acpi_device_get_power().

> +
> +	if (!adev || !acpi_device_power_manageable(adev))
> +		return PCI_UNKNOWN;
> +
> +	if (acpi_device_get_power(adev, &state) || state < ACPI_STATE_D0
> +						|| state > ACPI_STATE_D3_COLD)

If the device is power-manageable by ACPI (you've just checked that) and
acpi_device_get_power() returns success (0), the returned state is guaranteed
to be within the boundaries (if it isn't, there is a bug that needs to be fixed).

> +		return PCI_UNKNOWN;
> +
> +	return state_conv[state];
> +}
> +
>  static bool acpi_pci_can_wakeup(struct pci_dev *dev)
>  {
>  	struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
> @@ -534,6 +556,7 @@ static bool acpi_pci_need_resume(struct pci_dev *dev)
>  static const struct pci_platform_pm_ops acpi_pci_platform_pm = {
>  	.is_manageable = acpi_pci_power_manageable,
>  	.set_state = acpi_pci_set_power_state,
> +	.get_state = acpi_pci_get_power_state,
>  	.choose_state = acpi_pci_choose_state,
>  	.sleep_wake = acpi_pci_sleep_wake,
>  	.run_wake = acpi_pci_run_wake,
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 72a9d3a..e52e3d4 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -552,8 +552,9 @@ static const struct pci_platform_pm_ops *pci_platform_pm;
>  
>  int pci_set_platform_pm(const struct pci_platform_pm_ops *ops)
>  {
> -	if (!ops->is_manageable || !ops->set_state || !ops->choose_state ||
> -	    !ops->sleep_wake || !ops->run_wake || !ops->need_resume)
> +	if (!ops->is_manageable || !ops->set_state  || !ops->get_state ||
> +	    !ops->choose_state  || !ops->sleep_wake || !ops->run_wake  ||
> +	    !ops->need_resume)
>  		return -EINVAL;
>  	pci_platform_pm = ops;
>  	return 0;
> @@ -570,6 +571,11 @@ static inline int platform_pci_set_power_state(struct pci_dev *dev,
>  	return pci_platform_pm ? pci_platform_pm->set_state(dev, t) : -ENOSYS;
>  }
>  
> +static inline pci_power_t platform_pci_get_power_state(struct pci_dev *dev)
> +{
> +	return pci_platform_pm ? pci_platform_pm->get_state(dev) : PCI_UNKNOWN;
> +}
> +
>  static inline pci_power_t platform_pci_choose_state(struct pci_dev *dev)
>  {
>  	return pci_platform_pm ?
> @@ -701,14 +707,19 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
>  }
>  
>  /**
> - * pci_update_current_state - Read PCI power state of given device from its
> - *                            PCI PM registers and cache it
> + * pci_update_current_state - Read power state of given device and cache it
>   * @dev: PCI device to handle.
>   * @state: State to cache in case the device doesn't have the PM capability
> + *
> + * The power state is read from the PMCSR register, which however is
> + * inaccessible in D3cold. The platform firmware is therefore queried first
> + * to detect accessibility of the register.
>   */
>  void pci_update_current_state(struct pci_dev *dev, pci_power_t state)
>  {
> -	if (dev->pm_cap) {
> +	if (platform_pci_get_power_state(dev) == PCI_D3cold) {
> +		dev->current_state = PCI_D3cold;
> +	} else if (dev->pm_cap) {

Why exactly do you need to change this function?

>  		u16 pmcsr;
>  
>  		/*
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 9730c47..01d5206 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -42,6 +42,8 @@ int pci_probe_reset_function(struct pci_dev *dev);
>   *
>   * @set_state: invokes the platform firmware to set the device's power state
>   *
> + * @get_state: queries the platform firmware for a device's current power state
> + *
>   * @choose_state: returns PCI power state of given device preferred by the
>   *                platform; to be used during system-wide transitions from a
>   *                sleeping state to the working state and vice versa
> @@ -62,6 +64,7 @@ int pci_probe_reset_function(struct pci_dev *dev);
>  struct pci_platform_pm_ops {
>  	bool (*is_manageable)(struct pci_dev *dev);
>  	int (*set_state)(struct pci_dev *dev, pci_power_t state);
> +	pci_power_t (*get_state)(struct pci_dev *dev);
>  	pci_power_t (*choose_state)(struct pci_dev *dev);
>  	int (*sleep_wake)(struct pci_dev *dev, bool enable);
>  	int (*run_wake)(struct pci_dev *dev, bool enable);
> 

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] PCI: Avoid unnecessary resume after direct-complete
  2016-08-31  6:15 ` [PATCH 3/4] PCI: Avoid unnecessary resume after direct-complete Lukas Wunner
@ 2016-09-14  0:29   ` Rafael J. Wysocki
  2016-09-14  0:43     ` Rafael J. Wysocki
  2016-09-14  0:50   ` Rafael J. Wysocki
  1 sibling, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2016-09-14  0:29 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> Commit 58a1fbbb2ee8 ("PM / PCI / ACPI: Kick devices that might have been
> reset by firmware") added a runtime resume for devices that were runtime
> suspended when the system entered sleep.
> 
> The motivation was that devices might be in a reset-power-on state after
> waking from system sleep, so their power state as perceived by Linux
> (stored in pci_dev->current_state) would no longer reflect reality.
> By resuming such devices, we allow them to return to a low-power state
> via autosuspend and also bring their current_state in sync with reality.

Not only that.

It allows those devices to respond more timely to user space requests
occuring right after resume.

Those user space requests often occur sequentially (read from device A,
write to device B, read from device C, write to device D, for example)
and without that pre-emptive resume each of them needs to wait for another
device to (runtime-)resume which results in quite annoying latencies
observable by users.

> However for devices that are *not* in a reset-power-on state, doing an
> unconditional resume wastes energy.

It may or may not waste it and this really is a tradeoff.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/4] PCI: Avoid unnecessary resume on shutdown
  2016-08-31  6:15 ` [PATCH 4/4] PCI: Avoid unnecessary resume on shutdown Lukas Wunner
@ 2016-09-14  0:29   ` Rafael J. Wysocki
  2016-09-15 13:11     ` Lukas Wunner
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2016-09-14  0:29 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> We currently perform a mandatory runtime resume of all PCI devices on
> ->shutdown.  However it is pointless to wake devices only to immediately
> power them down afterwards.  (Or have the firmware reset them, in case
> of a reboot.)
> 
> It seems there are only two cases when a runtime resume is actually
> necessary:  If the driver has declared a ->shutdown callback or if kexec
> is in progress.
> 
> Constrain resume of a device to these cases and let it slumber
> otherwise, thereby conserving energy and speeding up shutdown.
> 
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/pci-driver.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index fd4b9c4..09a4e56 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -459,6 +459,11 @@ static void pci_device_shutdown(struct device *dev)
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	struct pci_driver *drv = pci_dev->driver;
>  
> +	/* Fast path for suspended devices */
> +	if (pm_runtime_suspended(dev) && (!drv || !drv->shutdown) &&
> +	    !kexec_in_progress)

What happens if runtime suspend or resume of the device happens here?

> +		return;
> +
>  	pm_runtime_resume(dev);
>  
>  	if (drv && drv->shutdown)
> 

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] PCI: Avoid unnecessary resume after direct-complete
  2016-09-14  0:29   ` Rafael J. Wysocki
@ 2016-09-14  0:43     ` Rafael J. Wysocki
  0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2016-09-14  0:43 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wednesday, September 14, 2016 02:29:12 AM Rafael J. Wysocki wrote:
> On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> > Commit 58a1fbbb2ee8 ("PM / PCI / ACPI: Kick devices that might have been
> > reset by firmware") added a runtime resume for devices that were runtime
> > suspended when the system entered sleep.
> > 
> > The motivation was that devices might be in a reset-power-on state after
> > waking from system sleep, so their power state as perceived by Linux
> > (stored in pci_dev->current_state) would no longer reflect reality.
> > By resuming such devices, we allow them to return to a low-power state
> > via autosuspend and also bring their current_state in sync with reality.
> 
> Not only that.
> 
> It allows those devices to respond more timely to user space requests
> occuring right after resume.
> 
> Those user space requests often occur sequentially (read from device A,
> write to device B, read from device C, write to device D, for example)
> and without that pre-emptive resume each of them needs to wait for another
> device to (runtime-)resume which results in quite annoying latencies
> observable by users.
> 
> > However for devices that are *not* in a reset-power-on state, doing an
> > unconditional resume wastes energy.
> 
> It may or may not waste it and this really is a tradeoff.

That said it is a bit inconsistent with the suspend-to-idle case in which
we don't resume those devices, so never mind. :-)

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] PCI: Avoid unnecessary resume after direct-complete
  2016-08-31  6:15 ` [PATCH 3/4] PCI: Avoid unnecessary resume after direct-complete Lukas Wunner
  2016-09-14  0:29   ` Rafael J. Wysocki
@ 2016-09-14  0:50   ` Rafael J. Wysocki
  2016-09-14  9:56     ` Lukas Wunner
  1 sibling, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2016-09-14  0:50 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> Commit 58a1fbbb2ee8 ("PM / PCI / ACPI: Kick devices that might have been
> reset by firmware") added a runtime resume for devices that were runtime
> suspended when the system entered sleep.
> 
> The motivation was that devices might be in a reset-power-on state after
> waking from system sleep, so their power state as perceived by Linux
> (stored in pci_dev->current_state) would no longer reflect reality.
> By resuming such devices, we allow them to return to a low-power state
> via autosuspend and also bring their current_state in sync with reality.
> 
> However for devices that are *not* in a reset-power-on state, doing an
> unconditional resume wastes energy.  A more refined approach is called
> for which issues a runtime resume only if the power state after
> direct-complete is shallower than it was before. To achieve this, update
> the device's current_state by querying the platform firmware and reading
> its PMCSR.
> 
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/pci-driver.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index e39a67c..fd4b9c4 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -684,8 +684,19 @@ static int pci_pm_prepare(struct device *dev)
>  
>  static void pci_pm_complete(struct device *dev)
>  {
> -	pci_dev_complete_resume(to_pci_dev(dev));
> -	pm_complete_with_resume_check(dev);
> +	struct pci_dev *pci_dev = to_pci_dev(dev);
> +
> +	pci_dev_complete_resume(pci_dev);
> +	pm_generic_complete(dev);
> +
> +	/* Resume device if platform firmware has put it in reset-power-on */
> +	if (dev->power.direct_complete && pm_resume_via_firmware()) {
> +		pci_power_t pre_sleep_state = pci_dev->current_state;
> +
> +		pci_update_current_state(pci_dev, pci_dev->current_state);

I'm not sure if this can be made reliable enough for all of the systems out
there (including the pre-ACPI-4.0 ones and so on).  I'm a bit concerned that
we'll uncover firmware issues and such by doing this.

How much of a problem the existing code is in practice?

> +		if (pci_dev->current_state < pre_sleep_state)
> +			pm_request_resume(dev);
> +	}
>  }
>  
>  #else /* !CONFIG_PM_SLEEP */
> 

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] PCI: Avoid unnecessary resume after direct-complete
  2016-09-14  0:50   ` Rafael J. Wysocki
@ 2016-09-14  9:56     ` Lukas Wunner
  2016-09-14 13:14       ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Lukas Wunner @ 2016-09-14  9:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wed, Sep 14, 2016 at 02:50:13AM +0200, Rafael J. Wysocki wrote:
> On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> > Commit 58a1fbbb2ee8 ("PM / PCI / ACPI: Kick devices that might have been
> > reset by firmware") added a runtime resume for devices that were runtime
> > suspended when the system entered sleep.
> > 
> > The motivation was that devices might be in a reset-power-on state after
> > waking from system sleep, so their power state as perceived by Linux
> > (stored in pci_dev->current_state) would no longer reflect reality.
> > By resuming such devices, we allow them to return to a low-power state
> > via autosuspend and also bring their current_state in sync with reality.
> > 
> > However for devices that are *not* in a reset-power-on state, doing an
> > unconditional resume wastes energy.  A more refined approach is called
> > for which issues a runtime resume only if the power state after
> > direct-complete is shallower than it was before. To achieve this, update
> > the device's current_state by querying the platform firmware and reading
> > its PMCSR.
> > 
> > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > ---
> >  drivers/pci/pci-driver.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index e39a67c..fd4b9c4 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -684,8 +684,19 @@ static int pci_pm_prepare(struct device *dev)
> >  
> >  static void pci_pm_complete(struct device *dev)
> >  {
> > -	pci_dev_complete_resume(to_pci_dev(dev));
> > -	pm_complete_with_resume_check(dev);
> > +	struct pci_dev *pci_dev = to_pci_dev(dev);
> > +
> > +	pci_dev_complete_resume(pci_dev);
> > +	pm_generic_complete(dev);
> > +
> > +	/* Resume device if platform firmware has put it in reset-power-on */
> > +	if (dev->power.direct_complete && pm_resume_via_firmware()) {
> > +		pci_power_t pre_sleep_state = pci_dev->current_state;
> > +
> > +		pci_update_current_state(pci_dev, pci_dev->current_state);
> 
> I'm not sure if this can be made reliable enough for all of the systems out
> there (including the pre-ACPI-4.0 ones and so on).  I'm a bit concerned that
> we'll uncover firmware issues and such by doing this.
> 
> How much of a problem the existing code is in practice?

Where's your ambition?  Do you want your power management to be as good
as Apple's or are you going to waste energy for fear of uncovering
firmware bugs?

I first tried to solve this only for Thunderbolt by opting out of the
mandatory resume after direct_complete.  It worked, but it was ugly.

(I have to clear the direct_complete flag in the ->complete hook, but
the device tree is walked bottom-up during ->complete, and since the
Thunderbolt controller exposes multiple devices, I have to clear the
flag for all devices from the bottom-most device, which is the NHI.
And all the rest of the s/r code lives on the top-most device, which
is the upstream bridge.)

You suggested in your e-mail of July 18: "maybe it's better to change
the PCI bus type to do something different from calling the generic
function?"

So there, I did what you suggested and tried to fix it not just for
Thunderbolt but for everyone.

And you're still not happy.

*sigh*

If on pre ACPI 4.0 systems, the device state is incorrectly reported
as D3hot although the device is still in D3cold, the device will be
runtime resumed to D0, just as it is now.  In other words, the benefit
of avoiding an unnecessary runtime resume after direct_complete is
only granted if the platform firmware reports the device state
correctly.

Best regards,

Lukas

> 
> > +		if (pci_dev->current_state < pre_sleep_state)
> > +			pm_request_resume(dev);
> > +	}
> >  }
> >  
> >  #else /* !CONFIG_PM_SLEEP */
> > 
> 
> Thanks,
> Rafael
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] PCI: Query platform firmware for device power state
  2016-09-14  0:21   ` Rafael J. Wysocki
@ 2016-09-14 10:50     ` Lukas Wunner
  2016-09-14 13:01       ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Lukas Wunner @ 2016-09-14 10:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wed, Sep 14, 2016 at 02:21:30AM +0200, Rafael J. Wysocki wrote:
> On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> > Usually the most accurate way to determine a PCI device's power state is
> > to read its PM Control & Status Register.  There are two cases however
> > when this is not an option:  If the device doesn't have the PM
> > capability at all, or if it is in D3cold.
> > 
> > In D3cold, reading from config space typically results in a fabricated
> > "all ones" response.  But in D3hot, the two bits representing the power
> > state in the PMCSR are *also* set to 1.  Thus D3hot and D3cold are not
> > discernible by just reading the PMCSR.
> > 
> > A (supposedly) reliable way to detect D3cold is to query the platform
> > firmware for its opinion on the device's power state.  To this end,
> > add a ->get_power callback to struct pci_platform_pm_ops, and an
> > implementation to acpi_pci_platform_pm.  (The only pci_platform_pm_ops
> > existing so far).
> > 
> > Amend pci_update_current_state() to query the platform firmware before
> > reading the PMCSR.
> > 
> > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > ---
> >  drivers/pci/pci-acpi.c | 23 +++++++++++++++++++++++
> >  drivers/pci/pci.c      | 21 ++++++++++++++++-----
> >  drivers/pci/pci.h      |  3 +++
> >  3 files changed, 42 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > index 9a033e8..89f2707 100644
> > --- a/drivers/pci/pci-acpi.c
> > +++ b/drivers/pci/pci-acpi.c
> > @@ -452,6 +452,28 @@ static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
> >  	return error;
> >  }
> >  
> > +static pci_power_t acpi_pci_get_power_state(struct pci_dev *dev)
> > +{
> > +	struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
> > +	static const pci_power_t state_conv[] = {
> > +		[ACPI_STATE_D0]      = PCI_D0,
> > +		[ACPI_STATE_D1]      = PCI_D1,
> > +		[ACPI_STATE_D2]      = PCI_D2,
> > +		[ACPI_STATE_D3_HOT]  = PCI_D3hot,
> > +		[ACPI_STATE_D3_COLD] = PCI_D3cold,
> > +	};
> > +	int state;
> 
> ACPI_STATE_D3_HOT and ACPI_STATE_D3_COLD were introduced in ACPI 4.0.  For
> systems predating that, ACPI_STATE_D3_HOT is the deepest state returned by
> acpi_device_get_power().

Would it be possible to detect the ACPI spec version the platform
firmware conforms to, and amend acpi_device_get_power() to return
ACPI_STATE_D3_COLD if the device is in D3?

Then we could avoid the unnecessary runtime resume after direct_complete
also for these older machines.

Can the revision in the FADT (offset 8) be used as a proxy?
=> E.g. the old Clevo B7130 has revision 3 in the FADT and uses
   a _DSM and _PS3 to put the discrete GPU in D3cold:
   https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_B7130
=> Whereas the newer Clevo P651RA has revision 5 in the FADT and uses
   _PR3 to put the discrete GPU in D3cold:
   https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_P651RA

However the FADT revision was already 4 in the ACPI 3.0 spec,
so we can only use it to discern ACPI 2.0 vs 3.0, not 3.0 vs 4.0,
which is what we'd actually want.  And there's a comment in
acpica/tbfadt.c that "The FADT revision value is unreliable."

Do you know of a better way to discern ACPI 3.0 vs 4.0?

> 
> > +
> > +	if (!adev || !acpi_device_power_manageable(adev))
> > +		return PCI_UNKNOWN;
> > +
> > +	if (acpi_device_get_power(adev, &state) || state < ACPI_STATE_D0
> > +						|| state > ACPI_STATE_D3_COLD)
> 
> If the device is power-manageable by ACPI (you've just checked that) and
> acpi_device_get_power() returns success (0), the returned state is guaranteed
> to be within the boundaries (if it isn't, there is a bug that needs to be
> fixed).

No, acpi_device_get_power() can also return ACPI_STATE_UNKNOWN, which has
the value 0xff.  I could add that to state_conv[] above but then I'd have
an array with 256 integers on the stack, most of them 0, which I don't want.
I could check for != ACPI_STATE_UNKNOWN but checking the boundaries seemed
safer. So I maintain that the code is correct.

> 
> > +		return PCI_UNKNOWN;
> > +
> > +	return state_conv[state];
> > +}
> > +
> >  static bool acpi_pci_can_wakeup(struct pci_dev *dev)
> >  {
> >  	struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
> > @@ -534,6 +556,7 @@ static bool acpi_pci_need_resume(struct pci_dev *dev)
> >  static const struct pci_platform_pm_ops acpi_pci_platform_pm = {
> >  	.is_manageable = acpi_pci_power_manageable,
> >  	.set_state = acpi_pci_set_power_state,
> > +	.get_state = acpi_pci_get_power_state,
> >  	.choose_state = acpi_pci_choose_state,
> >  	.sleep_wake = acpi_pci_sleep_wake,
> >  	.run_wake = acpi_pci_run_wake,
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 72a9d3a..e52e3d4 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -552,8 +552,9 @@ static const struct pci_platform_pm_ops *pci_platform_pm;
> >  
> >  int pci_set_platform_pm(const struct pci_platform_pm_ops *ops)
> >  {
> > -	if (!ops->is_manageable || !ops->set_state || !ops->choose_state ||
> > -	    !ops->sleep_wake || !ops->run_wake || !ops->need_resume)
> > +	if (!ops->is_manageable || !ops->set_state  || !ops->get_state ||
> > +	    !ops->choose_state  || !ops->sleep_wake || !ops->run_wake  ||
> > +	    !ops->need_resume)
> >  		return -EINVAL;
> >  	pci_platform_pm = ops;
> >  	return 0;
> > @@ -570,6 +571,11 @@ static inline int platform_pci_set_power_state(struct pci_dev *dev,
> >  	return pci_platform_pm ? pci_platform_pm->set_state(dev, t) : -ENOSYS;
> >  }
> >  
> > +static inline pci_power_t platform_pci_get_power_state(struct pci_dev *dev)
> > +{
> > +	return pci_platform_pm ? pci_platform_pm->get_state(dev) : PCI_UNKNOWN;
> > +}
> > +
> >  static inline pci_power_t platform_pci_choose_state(struct pci_dev *dev)
> >  {
> >  	return pci_platform_pm ?
> > @@ -701,14 +707,19 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
> >  }
> >  
> >  /**
> > - * pci_update_current_state - Read PCI power state of given device from its
> > - *                            PCI PM registers and cache it
> > + * pci_update_current_state - Read power state of given device and cache it
> >   * @dev: PCI device to handle.
> >   * @state: State to cache in case the device doesn't have the PM capability
> > + *
> > + * The power state is read from the PMCSR register, which however is
> > + * inaccessible in D3cold. The platform firmware is therefore queried first
> > + * to detect accessibility of the register.
> >   */
> >  void pci_update_current_state(struct pci_dev *dev, pci_power_t state)
> >  {
> > -	if (dev->pm_cap) {
> > +	if (platform_pci_get_power_state(dev) == PCI_D3cold) {
> > +		dev->current_state = PCI_D3cold;
> > +	} else if (dev->pm_cap) {
> 
> Why exactly do you need to change this function?

It would be pointless to add the ->platform_pci_get_power_state
hook without using it anywhere, wouldn't it?

I am adding this here so that I can call pci_update_current_state()
in patch [3/4] to compare the device's state after system sleep with
the one before, and be able to discern D3hot and D3cold properly
(as explained in the commit message above).

That said, I need to amend the patch to remove this portion in
pci_update_current_state():

                if (dev->current_state == PCI_D3cold)
                        return;

because otherwise we'd never try to read the PMCSR if the firmware
says the device is in <= D3hot.

Thanks,

Lukas

> 
> >  		u16 pmcsr;
> >  
> >  		/*
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 9730c47..01d5206 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -42,6 +42,8 @@ int pci_probe_reset_function(struct pci_dev *dev);
> >   *
> >   * @set_state: invokes the platform firmware to set the device's power state
> >   *
> > + * @get_state: queries the platform firmware for a device's current power state
> > + *
> >   * @choose_state: returns PCI power state of given device preferred by the
> >   *                platform; to be used during system-wide transitions from a
> >   *                sleeping state to the working state and vice versa
> > @@ -62,6 +64,7 @@ int pci_probe_reset_function(struct pci_dev *dev);
> >  struct pci_platform_pm_ops {
> >  	bool (*is_manageable)(struct pci_dev *dev);
> >  	int (*set_state)(struct pci_dev *dev, pci_power_t state);
> > +	pci_power_t (*get_state)(struct pci_dev *dev);
> >  	pci_power_t (*choose_state)(struct pci_dev *dev);
> >  	int (*sleep_wake)(struct pci_dev *dev, bool enable);
> >  	int (*run_wake)(struct pci_dev *dev, bool enable);
> > 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] PCI: Query platform firmware for device power state
  2016-09-14 10:50     ` Lukas Wunner
@ 2016-09-14 13:01       ` Rafael J. Wysocki
  2016-09-14 16:47         ` Rafael J. Wysocki
  2016-09-17 13:49         ` Lukas Wunner
  0 siblings, 2 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2016-09-14 13:01 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wednesday, September 14, 2016 12:50:24 PM Lukas Wunner wrote:
> On Wed, Sep 14, 2016 at 02:21:30AM +0200, Rafael J. Wysocki wrote:
> > On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> > > Usually the most accurate way to determine a PCI device's power state is
> > > to read its PM Control & Status Register.  There are two cases however
> > > when this is not an option:  If the device doesn't have the PM
> > > capability at all, or if it is in D3cold.
> > > 
> > > In D3cold, reading from config space typically results in a fabricated
> > > "all ones" response.  But in D3hot, the two bits representing the power
> > > state in the PMCSR are *also* set to 1.  Thus D3hot and D3cold are not
> > > discernible by just reading the PMCSR.
> > > 
> > > A (supposedly) reliable way to detect D3cold is to query the platform
> > > firmware for its opinion on the device's power state.  To this end,
> > > add a ->get_power callback to struct pci_platform_pm_ops, and an
> > > implementation to acpi_pci_platform_pm.  (The only pci_platform_pm_ops
> > > existing so far).
> > > 
> > > Amend pci_update_current_state() to query the platform firmware before
> > > reading the PMCSR.
> > > 
> > > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > ---
> > >  drivers/pci/pci-acpi.c | 23 +++++++++++++++++++++++
> > >  drivers/pci/pci.c      | 21 ++++++++++++++++-----
> > >  drivers/pci/pci.h      |  3 +++
> > >  3 files changed, 42 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > > index 9a033e8..89f2707 100644
> > > --- a/drivers/pci/pci-acpi.c
> > > +++ b/drivers/pci/pci-acpi.c
> > > @@ -452,6 +452,28 @@ static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
> > >  	return error;
> > >  }
> > >  
> > > +static pci_power_t acpi_pci_get_power_state(struct pci_dev *dev)
> > > +{
> > > +	struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
> > > +	static const pci_power_t state_conv[] = {
> > > +		[ACPI_STATE_D0]      = PCI_D0,
> > > +		[ACPI_STATE_D1]      = PCI_D1,
> > > +		[ACPI_STATE_D2]      = PCI_D2,
> > > +		[ACPI_STATE_D3_HOT]  = PCI_D3hot,
> > > +		[ACPI_STATE_D3_COLD] = PCI_D3cold,
> > > +	};
> > > +	int state;
> > 
> > ACPI_STATE_D3_HOT and ACPI_STATE_D3_COLD were introduced in ACPI 4.0.  For
> > systems predating that, ACPI_STATE_D3_HOT is the deepest state returned by
> > acpi_device_get_power().
> 
> Would it be possible to detect the ACPI spec version the platform
> firmware conforms to, and amend acpi_device_get_power() to return
> ACPI_STATE_D3_COLD if the device is in D3?
> 
> Then we could avoid the unnecessary runtime resume after direct_complete
> also for these older machines.

Well, please see below.

> Can the revision in the FADT (offset 8) be used as a proxy?
> => E.g. the old Clevo B7130 has revision 3 in the FADT and uses
>    a _DSM and _PS3 to put the discrete GPU in D3cold:
>    https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_B7130
> => Whereas the newer Clevo P651RA has revision 5 in the FADT and uses
>    _PR3 to put the discrete GPU in D3cold:
>    https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_P651RA
> 
> However the FADT revision was already 4 in the ACPI 3.0 spec,
> so we can only use it to discern ACPI 2.0 vs 3.0, not 3.0 vs 4.0,
> which is what we'd actually want.  And there's a comment in
> acpica/tbfadt.c that "The FADT revision value is unreliable."
> 
> Do you know of a better way to discern ACPI 3.0 vs 4.0?

I'm not sure if there is a reliable way to be honest.

Also even if the FADT etc tells you something, there's no guarantee that AML
actually follows that.

In fact, whether or not acpi_device_get_power() will ever return D3cold
for a device depends on whether or not _PR3 is there.  If it's not there,
the deepest state returned will be D3hot in any case.

So I'm not sure how useful that is in the context of D3cold detection?

> > 
> > > +
> > > +	if (!adev || !acpi_device_power_manageable(adev))
> > > +		return PCI_UNKNOWN;
> > > +
> > > +	if (acpi_device_get_power(adev, &state) || state < ACPI_STATE_D0
> > > +						|| state > ACPI_STATE_D3_COLD)
> > 
> > If the device is power-manageable by ACPI (you've just checked that) and
> > acpi_device_get_power() returns success (0), the returned state is guaranteed
> > to be within the boundaries (if it isn't, there is a bug that needs to be
> > fixed).
> 
> No, acpi_device_get_power() can also return ACPI_STATE_UNKNOWN, which has
> the value 0xff.

That would be the case without power resources or _PSC, right?

> I could add that to state_conv[] above but then I'd have
> an array with 256 integers on the stack, most of them 0, which I don't want.
> I could check for != ACPI_STATE_UNKNOWN but checking the boundaries seemed
> safer.

Well, I'm not sure in what way it is safer and you get one check instead of
two. :-)

> So I maintain that the code is correct.

It simply contains an redundant check.

> > 
> > > +		return PCI_UNKNOWN;
> > > +
> > > +	return state_conv[state];
> > > +}
> > > +
> > >  static bool acpi_pci_can_wakeup(struct pci_dev *dev)
> > >  {
> > >  	struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
> > > @@ -534,6 +556,7 @@ static bool acpi_pci_need_resume(struct pci_dev *dev)
> > >  static const struct pci_platform_pm_ops acpi_pci_platform_pm = {
> > >  	.is_manageable = acpi_pci_power_manageable,
> > >  	.set_state = acpi_pci_set_power_state,
> > > +	.get_state = acpi_pci_get_power_state,
> > >  	.choose_state = acpi_pci_choose_state,
> > >  	.sleep_wake = acpi_pci_sleep_wake,
> > >  	.run_wake = acpi_pci_run_wake,
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 72a9d3a..e52e3d4 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -552,8 +552,9 @@ static const struct pci_platform_pm_ops *pci_platform_pm;
> > >  
> > >  int pci_set_platform_pm(const struct pci_platform_pm_ops *ops)
> > >  {
> > > -	if (!ops->is_manageable || !ops->set_state || !ops->choose_state ||
> > > -	    !ops->sleep_wake || !ops->run_wake || !ops->need_resume)
> > > +	if (!ops->is_manageable || !ops->set_state  || !ops->get_state ||
> > > +	    !ops->choose_state  || !ops->sleep_wake || !ops->run_wake  ||
> > > +	    !ops->need_resume)
> > >  		return -EINVAL;
> > >  	pci_platform_pm = ops;
> > >  	return 0;
> > > @@ -570,6 +571,11 @@ static inline int platform_pci_set_power_state(struct pci_dev *dev,
> > >  	return pci_platform_pm ? pci_platform_pm->set_state(dev, t) : -ENOSYS;
> > >  }
> > >  
> > > +static inline pci_power_t platform_pci_get_power_state(struct pci_dev *dev)
> > > +{
> > > +	return pci_platform_pm ? pci_platform_pm->get_state(dev) : PCI_UNKNOWN;
> > > +}
> > > +
> > >  static inline pci_power_t platform_pci_choose_state(struct pci_dev *dev)
> > >  {
> > >  	return pci_platform_pm ?
> > > @@ -701,14 +707,19 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
> > >  }
> > >  
> > >  /**
> > > - * pci_update_current_state - Read PCI power state of given device from its
> > > - *                            PCI PM registers and cache it
> > > + * pci_update_current_state - Read power state of given device and cache it
> > >   * @dev: PCI device to handle.
> > >   * @state: State to cache in case the device doesn't have the PM capability
> > > + *
> > > + * The power state is read from the PMCSR register, which however is
> > > + * inaccessible in D3cold. The platform firmware is therefore queried first
> > > + * to detect accessibility of the register.
> > >   */
> > >  void pci_update_current_state(struct pci_dev *dev, pci_power_t state)
> > >  {
> > > -	if (dev->pm_cap) {
> > > +	if (platform_pci_get_power_state(dev) == PCI_D3cold) {
> > > +		dev->current_state = PCI_D3cold;
> > > +	} else if (dev->pm_cap) {
> > 
> > Why exactly do you need to change this function?
> 
> It would be pointless to add the ->platform_pci_get_power_state
> hook without using it anywhere, wouldn't it?

That depends on what you want to do with it next.  Callers may be added in
separate patches, no problem with that.

> I am adding this here so that I can call pci_update_current_state()
> in patch [3/4] to compare the device's state after system sleep with
> the one before, and be able to discern D3hot and D3cold properly
> (as explained in the commit message above).
> 
> That said, I need to amend the patch to remove this portion in
> pci_update_current_state():
> 
>                 if (dev->current_state == PCI_D3cold)
>                         return;
> 
> because otherwise we'd never try to read the PMCSR if the firmware
> says the device is in <= D3hot.

pci_update_current_state() is called in a few places with assumptions that need
to be re-evaluated to see if they still hold after the changes.  They probably
do, but then again, these call sites need to be double checked.  If you did
that, then fine.

But I would say that (a) the next patch will add detection of the real
post-resume power state of "direct_complete" devices and it will use
pci_update_current_state() for that (to be consistent with the rest of
the code), so (b) pci_update_current_state() needs to ask ACPI about its
view on the power state of the device for this purpose.

And if you want to change that logic for PCI, it would be good to change it
in the same way for acpi_general_pm_domain which has exactly the same problem.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] PCI: Avoid unnecessary resume after direct-complete
  2016-09-14  9:56     ` Lukas Wunner
@ 2016-09-14 13:14       ` Rafael J. Wysocki
  2016-09-18 12:43         ` Lukas Wunner
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2016-09-14 13:14 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wednesday, September 14, 2016 11:56:40 AM Lukas Wunner wrote:
> On Wed, Sep 14, 2016 at 02:50:13AM +0200, Rafael J. Wysocki wrote:
> > On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> > > Commit 58a1fbbb2ee8 ("PM / PCI / ACPI: Kick devices that might have been
> > > reset by firmware") added a runtime resume for devices that were runtime
> > > suspended when the system entered sleep.
> > > 
> > > The motivation was that devices might be in a reset-power-on state after
> > > waking from system sleep, so their power state as perceived by Linux
> > > (stored in pci_dev->current_state) would no longer reflect reality.
> > > By resuming such devices, we allow them to return to a low-power state
> > > via autosuspend and also bring their current_state in sync with reality.
> > > 
> > > However for devices that are *not* in a reset-power-on state, doing an
> > > unconditional resume wastes energy.  A more refined approach is called
> > > for which issues a runtime resume only if the power state after
> > > direct-complete is shallower than it was before. To achieve this, update
> > > the device's current_state by querying the platform firmware and reading
> > > its PMCSR.
> > > 
> > > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > ---
> > >  drivers/pci/pci-driver.c | 15 +++++++++++++--
> > >  1 file changed, 13 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > > index e39a67c..fd4b9c4 100644
> > > --- a/drivers/pci/pci-driver.c
> > > +++ b/drivers/pci/pci-driver.c
> > > @@ -684,8 +684,19 @@ static int pci_pm_prepare(struct device *dev)
> > >  
> > >  static void pci_pm_complete(struct device *dev)
> > >  {
> > > -	pci_dev_complete_resume(to_pci_dev(dev));
> > > -	pm_complete_with_resume_check(dev);
> > > +	struct pci_dev *pci_dev = to_pci_dev(dev);
> > > +
> > > +	pci_dev_complete_resume(pci_dev);
> > > +	pm_generic_complete(dev);
> > > +
> > > +	/* Resume device if platform firmware has put it in reset-power-on */
> > > +	if (dev->power.direct_complete && pm_resume_via_firmware()) {
> > > +		pci_power_t pre_sleep_state = pci_dev->current_state;
> > > +
> > > +		pci_update_current_state(pci_dev, pci_dev->current_state);
> > 
> > I'm not sure if this can be made reliable enough for all of the systems out
> > there (including the pre-ACPI-4.0 ones and so on).  I'm a bit concerned that
> > we'll uncover firmware issues and such by doing this.
> > 
> > How much of a problem the existing code is in practice?
> 
> Where's your ambition?  Do you want your power management to be as good
> as Apple's or are you going to waste energy for fear of uncovering
> firmware bugs?

That's not about my ambition or lack thereof, but about avoiding introducing
regressions on systems that are not Apple because of the ambition to be as
good as Apple.

And I'm not even talking about this particular piece of code, but about the
other code that calls pci_update_current_state().

> I first tried to solve this only for Thunderbolt by opting out of the
> mandatory resume after direct_complete.  It worked, but it was ugly.
> 
> (I have to clear the direct_complete flag in the ->complete hook, but
> the device tree is walked bottom-up during ->complete, and since the
> Thunderbolt controller exposes multiple devices, I have to clear the
> flag for all devices from the bottom-most device, which is the NHI.
> And all the rest of the s/r code lives on the top-most device, which
> is the upstream bridge.)
> 
> You suggested in your e-mail of July 18: "maybe it's better to change
> the PCI bus type to do something different from calling the generic
> function?"
> 
> So there, I did what you suggested and tried to fix it not just for
> Thunderbolt but for everyone.
> 
> And you're still not happy.
> 
> *sigh*

Sorry for disappointing you.

I have concerns, so I talk about them.  Is that wrong?  If so, what's wrong
about it?

My time goes into that as well, mind you.

> If on pre ACPI 4.0 systems, the device state is incorrectly reported
> as D3hot although the device is still in D3cold, the device will be
> runtime resumed to D0, just as it is now.  In other words, the benefit
> of avoiding an unnecessary runtime resume after direct_complete is
> only granted if the platform firmware reports the device state
> correctly.

OK

Which as I said in the other message I've just sent, if you use
acpi_device_get_power() for that, it will only report D3cold if (a) the device
has a _PR3 and (b) the power resources configuration in there indicates D3cold.

Does that cover the Thunderbolt case even?

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] PCI: Query platform firmware for device power state
  2016-09-14 13:01       ` Rafael J. Wysocki
@ 2016-09-14 16:47         ` Rafael J. Wysocki
  2016-09-14 21:36           ` Rafael J. Wysocki
  2016-09-14 22:58           ` Lukas Wunner
  2016-09-17 13:49         ` Lukas Wunner
  1 sibling, 2 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2016-09-14 16:47 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wednesday, September 14, 2016 03:01:33 PM Rafael J. Wysocki wrote:
> On Wednesday, September 14, 2016 12:50:24 PM Lukas Wunner wrote:
> > On Wed, Sep 14, 2016 at 02:21:30AM +0200, Rafael J. Wysocki wrote:
> > > On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> > > > Usually the most accurate way to determine a PCI device's power state is
> > > > to read its PM Control & Status Register.  There are two cases however
> > > > when this is not an option:  If the device doesn't have the PM
> > > > capability at all, or if it is in D3cold.
> > > > 
> > > > In D3cold, reading from config space typically results in a fabricated
> > > > "all ones" response.  But in D3hot, the two bits representing the power
> > > > state in the PMCSR are *also* set to 1.  Thus D3hot and D3cold are not
> > > > discernible by just reading the PMCSR.
> > > > 
> > > > A (supposedly) reliable way to detect D3cold is to query the platform
> > > > firmware for its opinion on the device's power state.  To this end,
> > > > add a ->get_power callback to struct pci_platform_pm_ops, and an
> > > > implementation to acpi_pci_platform_pm.  (The only pci_platform_pm_ops
> > > > existing so far).
> > > > 
> > > > Amend pci_update_current_state() to query the platform firmware before
> > > > reading the PMCSR.
> > > > 
> > > > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > > ---
> > > >  drivers/pci/pci-acpi.c | 23 +++++++++++++++++++++++
> > > >  drivers/pci/pci.c      | 21 ++++++++++++++++-----
> > > >  drivers/pci/pci.h      |  3 +++
> > > >  3 files changed, 42 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > > > index 9a033e8..89f2707 100644
> > > > --- a/drivers/pci/pci-acpi.c
> > > > +++ b/drivers/pci/pci-acpi.c
> > > > @@ -452,6 +452,28 @@ static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
> > > >  	return error;
> > > >  }
> > > >  
> > > > +static pci_power_t acpi_pci_get_power_state(struct pci_dev *dev)
> > > > +{
> > > > +	struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
> > > > +	static const pci_power_t state_conv[] = {
> > > > +		[ACPI_STATE_D0]      = PCI_D0,
> > > > +		[ACPI_STATE_D1]      = PCI_D1,
> > > > +		[ACPI_STATE_D2]      = PCI_D2,
> > > > +		[ACPI_STATE_D3_HOT]  = PCI_D3hot,
> > > > +		[ACPI_STATE_D3_COLD] = PCI_D3cold,
> > > > +	};
> > > > +	int state;
> > > 
> > > ACPI_STATE_D3_HOT and ACPI_STATE_D3_COLD were introduced in ACPI 4.0.  For
> > > systems predating that, ACPI_STATE_D3_HOT is the deepest state returned by
> > > acpi_device_get_power().
> > 
> > Would it be possible to detect the ACPI spec version the platform
> > firmware conforms to, and amend acpi_device_get_power() to return
> > ACPI_STATE_D3_COLD if the device is in D3?
> > 
> > Then we could avoid the unnecessary runtime resume after direct_complete
> > also for these older machines.
> 
> Well, please see below.
> 
> > Can the revision in the FADT (offset 8) be used as a proxy?
> > => E.g. the old Clevo B7130 has revision 3 in the FADT and uses
> >    a _DSM and _PS3 to put the discrete GPU in D3cold:
> >    https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_B7130
> > => Whereas the newer Clevo P651RA has revision 5 in the FADT and uses
> >    _PR3 to put the discrete GPU in D3cold:
> >    https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_P651RA
> > 
> > However the FADT revision was already 4 in the ACPI 3.0 spec,
> > so we can only use it to discern ACPI 2.0 vs 3.0, not 3.0 vs 4.0,
> > which is what we'd actually want.  And there's a comment in
> > acpica/tbfadt.c that "The FADT revision value is unreliable."
> > 
> > Do you know of a better way to discern ACPI 3.0 vs 4.0?
> 
> I'm not sure if there is a reliable way to be honest.
> 
> Also even if the FADT etc tells you something, there's no guarantee that AML
> actually follows that.
> 
> In fact, whether or not acpi_device_get_power() will ever return D3cold
> for a device depends on whether or not _PR3 is there.  If it's not there,
> the deepest state returned will be D3hot in any case.
> 
> So I'm not sure how useful that is in the context of D3cold detection?

There is more to this.

In fact, we can't really trust _PSC too, because in some ACPI tables it is
implemented to always return 0 (seriously).

For this reason, I think the way to go would be to use something like
acpi_device_update_power() to ensure that the device really is in the
state we think it is in.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] PCI: Query platform firmware for device power state
  2016-09-14 16:47         ` Rafael J. Wysocki
@ 2016-09-14 21:36           ` Rafael J. Wysocki
  2016-09-14 21:47             ` Rafael J. Wysocki
  2016-09-14 22:58           ` Lukas Wunner
  1 sibling, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2016-09-14 21:36 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wednesday, September 14, 2016 06:47:40 PM Rafael J. Wysocki wrote:
> On Wednesday, September 14, 2016 03:01:33 PM Rafael J. Wysocki wrote:
> > On Wednesday, September 14, 2016 12:50:24 PM Lukas Wunner wrote:
> > > On Wed, Sep 14, 2016 at 02:21:30AM +0200, Rafael J. Wysocki wrote:
> > > > On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> > > > > Usually the most accurate way to determine a PCI device's power state is
> > > > > to read its PM Control & Status Register.  There are two cases however
> > > > > when this is not an option:  If the device doesn't have the PM
> > > > > capability at all, or if it is in D3cold.
> > > > > 
> > > > > In D3cold, reading from config space typically results in a fabricated
> > > > > "all ones" response.  But in D3hot, the two bits representing the power
> > > > > state in the PMCSR are *also* set to 1.  Thus D3hot and D3cold are not
> > > > > discernible by just reading the PMCSR.
> > > > > 
> > > > > A (supposedly) reliable way to detect D3cold is to query the platform
> > > > > firmware for its opinion on the device's power state.  To this end,
> > > > > add a ->get_power callback to struct pci_platform_pm_ops, and an
> > > > > implementation to acpi_pci_platform_pm.  (The only pci_platform_pm_ops
> > > > > existing so far).
> > > > > 
> > > > > Amend pci_update_current_state() to query the platform firmware before
> > > > > reading the PMCSR.
> > > > > 
> > > > > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > > > ---
> > > > >  drivers/pci/pci-acpi.c | 23 +++++++++++++++++++++++
> > > > >  drivers/pci/pci.c      | 21 ++++++++++++++++-----
> > > > >  drivers/pci/pci.h      |  3 +++
> > > > >  3 files changed, 42 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > > > > index 9a033e8..89f2707 100644
> > > > > --- a/drivers/pci/pci-acpi.c
> > > > > +++ b/drivers/pci/pci-acpi.c
> > > > > @@ -452,6 +452,28 @@ static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
> > > > >  	return error;
> > > > >  }
> > > > >  
> > > > > +static pci_power_t acpi_pci_get_power_state(struct pci_dev *dev)
> > > > > +{
> > > > > +	struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
> > > > > +	static const pci_power_t state_conv[] = {
> > > > > +		[ACPI_STATE_D0]      = PCI_D0,
> > > > > +		[ACPI_STATE_D1]      = PCI_D1,
> > > > > +		[ACPI_STATE_D2]      = PCI_D2,
> > > > > +		[ACPI_STATE_D3_HOT]  = PCI_D3hot,
> > > > > +		[ACPI_STATE_D3_COLD] = PCI_D3cold,
> > > > > +	};
> > > > > +	int state;
> > > > 
> > > > ACPI_STATE_D3_HOT and ACPI_STATE_D3_COLD were introduced in ACPI 4.0.  For
> > > > systems predating that, ACPI_STATE_D3_HOT is the deepest state returned by
> > > > acpi_device_get_power().
> > > 
> > > Would it be possible to detect the ACPI spec version the platform
> > > firmware conforms to, and amend acpi_device_get_power() to return
> > > ACPI_STATE_D3_COLD if the device is in D3?
> > > 
> > > Then we could avoid the unnecessary runtime resume after direct_complete
> > > also for these older machines.
> > 
> > Well, please see below.
> > 
> > > Can the revision in the FADT (offset 8) be used as a proxy?
> > > => E.g. the old Clevo B7130 has revision 3 in the FADT and uses
> > >    a _DSM and _PS3 to put the discrete GPU in D3cold:
> > >    https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_B7130
> > > => Whereas the newer Clevo P651RA has revision 5 in the FADT and uses
> > >    _PR3 to put the discrete GPU in D3cold:
> > >    https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_P651RA
> > > 
> > > However the FADT revision was already 4 in the ACPI 3.0 spec,
> > > so we can only use it to discern ACPI 2.0 vs 3.0, not 3.0 vs 4.0,
> > > which is what we'd actually want.  And there's a comment in
> > > acpica/tbfadt.c that "The FADT revision value is unreliable."
> > > 
> > > Do you know of a better way to discern ACPI 3.0 vs 4.0?
> > 
> > I'm not sure if there is a reliable way to be honest.
> > 
> > Also even if the FADT etc tells you something, there's no guarantee that AML
> > actually follows that.
> > 
> > In fact, whether or not acpi_device_get_power() will ever return D3cold
> > for a device depends on whether or not _PR3 is there.  If it's not there,
> > the deepest state returned will be D3hot in any case.
> > 
> > So I'm not sure how useful that is in the context of D3cold detection?
> 
> There is more to this.
> 
> In fact, we can't really trust _PSC too, because in some ACPI tables it is
> implemented to always return 0 (seriously).
> 
> For this reason, I think the way to go would be to use something like
> acpi_device_update_power() to ensure that the device really is in the
> state we think it is in.

Actually, it may be sufficient to simply check if the device is in any
state different from D0 and leave it alone if that's the case.

For that, acpi_device_update_power() should be good enough and the PCI
native part would be simpler too I think.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] PCI: Query platform firmware for device power state
  2016-09-14 21:36           ` Rafael J. Wysocki
@ 2016-09-14 21:47             ` Rafael J. Wysocki
  0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2016-09-14 21:47 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wednesday, September 14, 2016 11:36:40 PM Rafael J. Wysocki wrote:
> On Wednesday, September 14, 2016 06:47:40 PM Rafael J. Wysocki wrote:
> > On Wednesday, September 14, 2016 03:01:33 PM Rafael J. Wysocki wrote:
> > > On Wednesday, September 14, 2016 12:50:24 PM Lukas Wunner wrote:
> > > > On Wed, Sep 14, 2016 at 02:21:30AM +0200, Rafael J. Wysocki wrote:
> > > > > On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> > > > > > Usually the most accurate way to determine a PCI device's power state is
> > > > > > to read its PM Control & Status Register.  There are two cases however
> > > > > > when this is not an option:  If the device doesn't have the PM
> > > > > > capability at all, or if it is in D3cold.
> > > > > > 
> > > > > > In D3cold, reading from config space typically results in a fabricated
> > > > > > "all ones" response.  But in D3hot, the two bits representing the power
> > > > > > state in the PMCSR are *also* set to 1.  Thus D3hot and D3cold are not
> > > > > > discernible by just reading the PMCSR.
> > > > > > 
> > > > > > A (supposedly) reliable way to detect D3cold is to query the platform
> > > > > > firmware for its opinion on the device's power state.  To this end,
> > > > > > add a ->get_power callback to struct pci_platform_pm_ops, and an
> > > > > > implementation to acpi_pci_platform_pm.  (The only pci_platform_pm_ops
> > > > > > existing so far).
> > > > > > 
> > > > > > Amend pci_update_current_state() to query the platform firmware before
> > > > > > reading the PMCSR.
> > > > > > 
> > > > > > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > > > > ---
> > > > > >  drivers/pci/pci-acpi.c | 23 +++++++++++++++++++++++
> > > > > >  drivers/pci/pci.c      | 21 ++++++++++++++++-----
> > > > > >  drivers/pci/pci.h      |  3 +++
> > > > > >  3 files changed, 42 insertions(+), 5 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > > > > > index 9a033e8..89f2707 100644
> > > > > > --- a/drivers/pci/pci-acpi.c
> > > > > > +++ b/drivers/pci/pci-acpi.c
> > > > > > @@ -452,6 +452,28 @@ static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
> > > > > >  	return error;
> > > > > >  }
> > > > > >  
> > > > > > +static pci_power_t acpi_pci_get_power_state(struct pci_dev *dev)
> > > > > > +{
> > > > > > +	struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
> > > > > > +	static const pci_power_t state_conv[] = {
> > > > > > +		[ACPI_STATE_D0]      = PCI_D0,
> > > > > > +		[ACPI_STATE_D1]      = PCI_D1,
> > > > > > +		[ACPI_STATE_D2]      = PCI_D2,
> > > > > > +		[ACPI_STATE_D3_HOT]  = PCI_D3hot,
> > > > > > +		[ACPI_STATE_D3_COLD] = PCI_D3cold,
> > > > > > +	};
> > > > > > +	int state;
> > > > > 
> > > > > ACPI_STATE_D3_HOT and ACPI_STATE_D3_COLD were introduced in ACPI 4.0.  For
> > > > > systems predating that, ACPI_STATE_D3_HOT is the deepest state returned by
> > > > > acpi_device_get_power().
> > > > 
> > > > Would it be possible to detect the ACPI spec version the platform
> > > > firmware conforms to, and amend acpi_device_get_power() to return
> > > > ACPI_STATE_D3_COLD if the device is in D3?
> > > > 
> > > > Then we could avoid the unnecessary runtime resume after direct_complete
> > > > also for these older machines.
> > > 
> > > Well, please see below.
> > > 
> > > > Can the revision in the FADT (offset 8) be used as a proxy?
> > > > => E.g. the old Clevo B7130 has revision 3 in the FADT and uses
> > > >    a _DSM and _PS3 to put the discrete GPU in D3cold:
> > > >    https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_B7130
> > > > => Whereas the newer Clevo P651RA has revision 5 in the FADT and uses
> > > >    _PR3 to put the discrete GPU in D3cold:
> > > >    https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_P651RA
> > > > 
> > > > However the FADT revision was already 4 in the ACPI 3.0 spec,
> > > > so we can only use it to discern ACPI 2.0 vs 3.0, not 3.0 vs 4.0,
> > > > which is what we'd actually want.  And there's a comment in
> > > > acpica/tbfadt.c that "The FADT revision value is unreliable."
> > > > 
> > > > Do you know of a better way to discern ACPI 3.0 vs 4.0?
> > > 
> > > I'm not sure if there is a reliable way to be honest.
> > > 
> > > Also even if the FADT etc tells you something, there's no guarantee that AML
> > > actually follows that.
> > > 
> > > In fact, whether or not acpi_device_get_power() will ever return D3cold
> > > for a device depends on whether or not _PR3 is there.  If it's not there,
> > > the deepest state returned will be D3hot in any case.
> > > 
> > > So I'm not sure how useful that is in the context of D3cold detection?
> > 
> > There is more to this.
> > 
> > In fact, we can't really trust _PSC too, because in some ACPI tables it is
> > implemented to always return 0 (seriously).
> > 
> > For this reason, I think the way to go would be to use something like
> > acpi_device_update_power() to ensure that the device really is in the
> > state we think it is in.
> 
> Actually, it may be sufficient to simply check if the device is in any
> state different from D0 and leave it alone if that's the case.
> 
> For that, acpi_device_update_power() should be good enough and the PCI
> native part would be simpler too I think.

I meant acpi_device_get_power(), sorry.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] PCI: Query platform firmware for device power state
  2016-09-14 16:47         ` Rafael J. Wysocki
  2016-09-14 21:36           ` Rafael J. Wysocki
@ 2016-09-14 22:58           ` Lukas Wunner
  2016-09-15  0:49             ` Rafael J. Wysocki
  1 sibling, 1 reply; 27+ messages in thread
From: Lukas Wunner @ 2016-09-14 22:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wed, Sep 14, 2016 at 06:47:40PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, September 14, 2016 03:01:33 PM Rafael J. Wysocki wrote:
> > On Wednesday, September 14, 2016 12:50:24 PM Lukas Wunner wrote:
> > > On Wed, Sep 14, 2016 at 02:21:30AM +0200, Rafael J. Wysocki wrote:
> > > > On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> > > > > Usually the most accurate way to determine a PCI device's power state
> > > > > is to read its PM Control & Status Register.  There are two cases
> > > > > however when this is not an option:  If the device doesn't have the
> > > > > PM capability at all, or if it is in D3cold.
> > > > > 
> > > > > In D3cold, reading from config space typically results in a
> > > > > fabricated "all ones" response.  But in D3hot, the two bits
> > > > > representing the power state in the PMCSR are *also* set to 1.
> > > > > Thus D3hot and D3cold are not discernible by just reading the PMCSR.
> > > > > 
> > > > > A (supposedly) reliable way to detect D3cold is to query the platform
> > > > > firmware for its opinion on the device's power state.  To this end,
> > > > > add a ->get_power callback to struct pci_platform_pm_ops, and an
> > > > > implementation to acpi_pci_platform_pm.  (The only
> > > > > pci_platform_pm_ops existing so far).
> > > > > 
> > > > > Amend pci_update_current_state() to query the platform firmware before
> > > > > reading the PMCSR.
> > > > > 
> > > > > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > > > ---
> > > > >  drivers/pci/pci-acpi.c | 23 +++++++++++++++++++++++
> > > > >  drivers/pci/pci.c      | 21 ++++++++++++++++-----
> > > > >  drivers/pci/pci.h      |  3 +++
> > > > >  3 files changed, 42 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > > > > index 9a033e8..89f2707 100644
> > > > > --- a/drivers/pci/pci-acpi.c
> > > > > +++ b/drivers/pci/pci-acpi.c
> > > > > @@ -452,6 +452,28 @@ static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
> > > > >  	return error;
> > > > >  }
> > > > >  
> > > > > +static pci_power_t acpi_pci_get_power_state(struct pci_dev *dev)
> > > > > +{
> > > > > +	struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
> > > > > +	static const pci_power_t state_conv[] = {
> > > > > +		[ACPI_STATE_D0]      = PCI_D0,
> > > > > +		[ACPI_STATE_D1]      = PCI_D1,
> > > > > +		[ACPI_STATE_D2]      = PCI_D2,
> > > > > +		[ACPI_STATE_D3_HOT]  = PCI_D3hot,
> > > > > +		[ACPI_STATE_D3_COLD] = PCI_D3cold,
> > > > > +	};
> > > > > +	int state;
> > > > 
> > > > ACPI_STATE_D3_HOT and ACPI_STATE_D3_COLD were introduced in ACPI 4.0.
> > > > For systems predating that, ACPI_STATE_D3_HOT is the deepest state
> > > > returned by acpi_device_get_power().
> > > 
> > > Would it be possible to detect the ACPI spec version the platform
> > > firmware conforms to, and amend acpi_device_get_power() to return
> > > ACPI_STATE_D3_COLD if the device is in D3?
> > > 
> > > Then we could avoid the unnecessary runtime resume after direct_complete
> > > also for these older machines.
> > 
> > Well, please see below.
> > 
> > > Can the revision in the FADT (offset 8) be used as a proxy?
> > > => E.g. the old Clevo B7130 has revision 3 in the FADT and uses
> > >    a _DSM and _PS3 to put the discrete GPU in D3cold:
> > >    https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_B7130
> > > => Whereas the newer Clevo P651RA has revision 5 in the FADT and uses
> > >    _PR3 to put the discrete GPU in D3cold:
> > >    https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_P651RA
> > > 
> > > However the FADT revision was already 4 in the ACPI 3.0 spec,
> > > so we can only use it to discern ACPI 2.0 vs 3.0, not 3.0 vs 4.0,
> > > which is what we'd actually want.  And there's a comment in
> > > acpica/tbfadt.c that "The FADT revision value is unreliable."
> > > 
> > > Do you know of a better way to discern ACPI 3.0 vs 4.0?
> > 
> > I'm not sure if there is a reliable way to be honest.
> > 
> > Also even if the FADT etc tells you something, there's no guarantee that
> > AML actually follows that.
> > 
> > In fact, whether or not acpi_device_get_power() will ever return D3cold
> > for a device depends on whether or not _PR3 is there.  If it's not there,
> > the deepest state returned will be D3hot in any case.
> > 
> > So I'm not sure how useful that is in the context of D3cold detection?
> 
> There is more to this.
> 
> In fact, we can't really trust _PSC too, because in some ACPI tables it is
> implemented to always return 0 (seriously).
> 
> For this reason, I think the way to go would be to use something like
> acpi_device_update_power() to ensure that the device really is in the
> state we think it is in.

The patch only trusts the platform firmware to report D3cold correctly.
If it reports anything else the patch reads the PMCSR.

Thus if _PSC returns a bogus D0 status, worst that can happen is that we
incorrectly assume D3hot while the device is actually in D3cold.  Which
is still kind of a bummer.

We've got this handy pci_device_is_present() function which reads the
vendor ID and returns false if it's "all ones" (which is what we'd get
in D3cold).

What if we ask the platform firmware first.  If it says D3cold, assume
that's true.  In all other cases, call pci_device_is_present().  If
that returns false, assume D3cold.  Otherwise read the PMCSR.

That would also work for devices that are suspended to D3cold in a
nonstandard way, such as Thunderbolt.  Having a function that updates
the current_state in a robust way and correctly discerns D3cold and
D3hot would be really useful I think.

Thanks a lot for sharing these ACPI intricacies, it wasn't clear to me
in how far we can trust the platform firmware.

Best regards,

Lukas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] PCI: Query platform firmware for device power state
  2016-09-14 22:58           ` Lukas Wunner
@ 2016-09-15  0:49             ` Rafael J. Wysocki
  0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2016-09-15  0:49 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Thursday, September 15, 2016 12:58:57 AM Lukas Wunner wrote:
> On Wed, Sep 14, 2016 at 06:47:40PM +0200, Rafael J. Wysocki wrote:
> > On Wednesday, September 14, 2016 03:01:33 PM Rafael J. Wysocki wrote:
> > > On Wednesday, September 14, 2016 12:50:24 PM Lukas Wunner wrote:
> > > > On Wed, Sep 14, 2016 at 02:21:30AM +0200, Rafael J. Wysocki wrote:
> > > > > On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> > > > > > Usually the most accurate way to determine a PCI device's power state
> > > > > > is to read its PM Control & Status Register.  There are two cases
> > > > > > however when this is not an option:  If the device doesn't have the
> > > > > > PM capability at all, or if it is in D3cold.
> > > > > > 
> > > > > > In D3cold, reading from config space typically results in a
> > > > > > fabricated "all ones" response.  But in D3hot, the two bits
> > > > > > representing the power state in the PMCSR are *also* set to 1.
> > > > > > Thus D3hot and D3cold are not discernible by just reading the PMCSR.
> > > > > > 
> > > > > > A (supposedly) reliable way to detect D3cold is to query the platform
> > > > > > firmware for its opinion on the device's power state.  To this end,
> > > > > > add a ->get_power callback to struct pci_platform_pm_ops, and an
> > > > > > implementation to acpi_pci_platform_pm.  (The only
> > > > > > pci_platform_pm_ops existing so far).
> > > > > > 
> > > > > > Amend pci_update_current_state() to query the platform firmware before
> > > > > > reading the PMCSR.
> > > > > > 
> > > > > > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > > > > ---
> > > > > >  drivers/pci/pci-acpi.c | 23 +++++++++++++++++++++++
> > > > > >  drivers/pci/pci.c      | 21 ++++++++++++++++-----
> > > > > >  drivers/pci/pci.h      |  3 +++
> > > > > >  3 files changed, 42 insertions(+), 5 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > > > > > index 9a033e8..89f2707 100644
> > > > > > --- a/drivers/pci/pci-acpi.c
> > > > > > +++ b/drivers/pci/pci-acpi.c
> > > > > > @@ -452,6 +452,28 @@ static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
> > > > > >  	return error;
> > > > > >  }
> > > > > >  
> > > > > > +static pci_power_t acpi_pci_get_power_state(struct pci_dev *dev)
> > > > > > +{
> > > > > > +	struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
> > > > > > +	static const pci_power_t state_conv[] = {
> > > > > > +		[ACPI_STATE_D0]      = PCI_D0,
> > > > > > +		[ACPI_STATE_D1]      = PCI_D1,
> > > > > > +		[ACPI_STATE_D2]      = PCI_D2,
> > > > > > +		[ACPI_STATE_D3_HOT]  = PCI_D3hot,
> > > > > > +		[ACPI_STATE_D3_COLD] = PCI_D3cold,
> > > > > > +	};
> > > > > > +	int state;
> > > > > 
> > > > > ACPI_STATE_D3_HOT and ACPI_STATE_D3_COLD were introduced in ACPI 4.0.
> > > > > For systems predating that, ACPI_STATE_D3_HOT is the deepest state
> > > > > returned by acpi_device_get_power().
> > > > 
> > > > Would it be possible to detect the ACPI spec version the platform
> > > > firmware conforms to, and amend acpi_device_get_power() to return
> > > > ACPI_STATE_D3_COLD if the device is in D3?
> > > > 
> > > > Then we could avoid the unnecessary runtime resume after direct_complete
> > > > also for these older machines.
> > > 
> > > Well, please see below.
> > > 
> > > > Can the revision in the FADT (offset 8) be used as a proxy?
> > > > => E.g. the old Clevo B7130 has revision 3 in the FADT and uses
> > > >    a _DSM and _PS3 to put the discrete GPU in D3cold:
> > > >    https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_B7130
> > > > => Whereas the newer Clevo P651RA has revision 5 in the FADT and uses
> > > >    _PR3 to put the discrete GPU in D3cold:
> > > >    https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_P651RA
> > > > 
> > > > However the FADT revision was already 4 in the ACPI 3.0 spec,
> > > > so we can only use it to discern ACPI 2.0 vs 3.0, not 3.0 vs 4.0,
> > > > which is what we'd actually want.  And there's a comment in
> > > > acpica/tbfadt.c that "The FADT revision value is unreliable."
> > > > 
> > > > Do you know of a better way to discern ACPI 3.0 vs 4.0?
> > > 
> > > I'm not sure if there is a reliable way to be honest.
> > > 
> > > Also even if the FADT etc tells you something, there's no guarantee that
> > > AML actually follows that.
> > > 
> > > In fact, whether or not acpi_device_get_power() will ever return D3cold
> > > for a device depends on whether or not _PR3 is there.  If it's not there,
> > > the deepest state returned will be D3hot in any case.
> > > 
> > > So I'm not sure how useful that is in the context of D3cold detection?
> > 
> > There is more to this.
> > 
> > In fact, we can't really trust _PSC too, because in some ACPI tables it is
> > implemented to always return 0 (seriously).
> > 
> > For this reason, I think the way to go would be to use something like
> > acpi_device_update_power() to ensure that the device really is in the
> > state we think it is in.
> 
> The patch only trusts the platform firmware to report D3cold correctly.
> If it reports anything else the patch reads the PMCSR.
> 
> Thus if _PSC returns a bogus D0 status, worst that can happen is that we
> incorrectly assume D3hot while the device is actually in D3cold.  Which
> is still kind of a bummer.
> 
> We've got this handy pci_device_is_present() function which reads the
> vendor ID and returns false if it's "all ones" (which is what we'd get
> in D3cold).
> 
> What if we ask the platform firmware first.  If it says D3cold, assume
> that's true.  In all other cases, call pci_device_is_present().  If
> that returns false, assume D3cold.  Otherwise read the PMCSR.

Sounds reasonable.  We already use pci_device_is_present() for a similar
purpose in at least one place IIRC.

> That would also work for devices that are suspended to D3cold in a
> nonstandard way, such as Thunderbolt.  Having a function that updates
> the current_state in a robust way and correctly discerns D3cold and
> D3hot would be really useful I think.
> 
> Thanks a lot for sharing these ACPI intricacies, it wasn't clear to me
> in how far we can trust the platform firmware.

No problem.  Unfortunately, it often takes me some time to recall those
things.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/4] PCI: Avoid unnecessary resume on shutdown
  2016-09-14  0:29   ` Rafael J. Wysocki
@ 2016-09-15 13:11     ` Lukas Wunner
  2016-09-15 13:57       ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Lukas Wunner @ 2016-09-15 13:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wed, Sep 14, 2016 at 02:29:52AM +0200, Rafael J. Wysocki wrote:
> On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> > We currently perform a mandatory runtime resume of all PCI devices on
> > ->shutdown.  However it is pointless to wake devices only to immediately
> > power them down afterwards.  (Or have the firmware reset them, in case
> > of a reboot.)
> > 
> > It seems there are only two cases when a runtime resume is actually
> > necessary:  If the driver has declared a ->shutdown callback or if kexec
> > is in progress.
> > 
> > Constrain resume of a device to these cases and let it slumber
> > otherwise, thereby conserving energy and speeding up shutdown.
> > 
> > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > ---
> >  drivers/pci/pci-driver.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index fd4b9c4..09a4e56 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -459,6 +459,11 @@ static void pci_device_shutdown(struct device *dev)
> >  	struct pci_dev *pci_dev = to_pci_dev(dev);
> >  	struct pci_driver *drv = pci_dev->driver;
> >  
> > +	/* Fast path for suspended devices */
> > +	if (pm_runtime_suspended(dev) && (!drv || !drv->shutdown) &&
> > +	    !kexec_in_progress)
> 
> What happens if runtime suspend or resume of the device happens here?

You're right, good point.  How about disabling runtime PM then, like this:

	/* Fast path for suspended devices */
	if (pm_runtime_status_suspended(dev) && (!drv || !drv->shutdown) &&
	    !kexec_in_progress) {
		pm_runtime_disable(dev);
		if (pm_runtime_status_suspended(dev))
			return;
		pm_runtime_enable(dev);
	}

All dependents (children, and in the future, consumers) should already
have been treated at that point, due to the ordering of devices_kset->list.

Thanks!

Lukas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/4] PCI: Avoid unnecessary resume on shutdown
  2016-09-15 13:11     ` Lukas Wunner
@ 2016-09-15 13:57       ` Rafael J. Wysocki
  0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2016-09-15 13:57 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Rafael J. Wysocki, Linux PCI, Linux PM, ACPI Devel Maling List,
	Peter Wu, Andreas Noever

On Thu, Sep 15, 2016 at 3:11 PM, Lukas Wunner <lukas@wunner.de> wrote:
> On Wed, Sep 14, 2016 at 02:29:52AM +0200, Rafael J. Wysocki wrote:
>> On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
>> > We currently perform a mandatory runtime resume of all PCI devices on
>> > ->shutdown.  However it is pointless to wake devices only to immediately
>> > power them down afterwards.  (Or have the firmware reset them, in case
>> > of a reboot.)
>> >
>> > It seems there are only two cases when a runtime resume is actually
>> > necessary:  If the driver has declared a ->shutdown callback or if kexec
>> > is in progress.
>> >
>> > Constrain resume of a device to these cases and let it slumber
>> > otherwise, thereby conserving energy and speeding up shutdown.
>> >
>> > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
>> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
>> > ---
>> >  drivers/pci/pci-driver.c | 5 +++++
>> >  1 file changed, 5 insertions(+)
>> >
>> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>> > index fd4b9c4..09a4e56 100644
>> > --- a/drivers/pci/pci-driver.c
>> > +++ b/drivers/pci/pci-driver.c
>> > @@ -459,6 +459,11 @@ static void pci_device_shutdown(struct device *dev)
>> >     struct pci_dev *pci_dev = to_pci_dev(dev);
>> >     struct pci_driver *drv = pci_dev->driver;
>> >
>> > +   /* Fast path for suspended devices */
>> > +   if (pm_runtime_suspended(dev) && (!drv || !drv->shutdown) &&
>> > +       !kexec_in_progress)
>>
>> What happens if runtime suspend or resume of the device happens here?
>
> You're right, good point.  How about disabling runtime PM then, like this:
>
>         /* Fast path for suspended devices */
>         if (pm_runtime_status_suspended(dev) && (!drv || !drv->shutdown) &&
>             !kexec_in_progress) {
>                 pm_runtime_disable(dev);
>                 if (pm_runtime_status_suspended(dev))
>                         return;
>                 pm_runtime_enable(dev);
>         }
>
> All dependents (children, and in the future, consumers) should already
> have been treated at that point, due to the ordering of devices_kset->list.

That should work.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] PCI: Query platform firmware for device power state
  2016-09-14 13:01       ` Rafael J. Wysocki
  2016-09-14 16:47         ` Rafael J. Wysocki
@ 2016-09-17 13:49         ` Lukas Wunner
  2016-09-18  1:09           ` Rafael J. Wysocki
  1 sibling, 1 reply; 27+ messages in thread
From: Lukas Wunner @ 2016-09-17 13:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wed, Sep 14, 2016 at 03:01:33PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, September 14, 2016 12:50:24 PM Lukas Wunner wrote:
> > On Wed, Sep 14, 2016 at 02:21:30AM +0200, Rafael J. Wysocki wrote:
> > > On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> > > > +
> > > > +	if (!adev || !acpi_device_power_manageable(adev))
> > > > +		return PCI_UNKNOWN;
> > > > +
> > > > +	if (acpi_device_get_power(adev, &state) || state < ACPI_STATE_D0
> > > > +						|| state > ACPI_STATE_D3_COLD)
> > > 
> > > If the device is power-manageable by ACPI (you've just checked that) and
> > > acpi_device_get_power() returns success (0), the returned state is
> > > guaranteed to be within the boundaries (if it isn't, there is a bug that
> > > needs to be fixed).
> > 
> > No, acpi_device_get_power() can also return ACPI_STATE_UNKNOWN, which has
> > the value 0xff.
> 
> That would be the case without power resources or _PSC, right?

There's this check in acpi_device_get_power():

                else if (result == ACPI_STATE_UNKNOWN)
                        result = psc > ACPI_STATE_D2 ? ACPI_STATE_D3_HOT : psc;

where "result" has been set further up by acpi_power_get_inferred_state().
If acpi_power_get_inferred_state() does return this value and _PSC is
not present (i.e. device->power.flags.explicit_get is not set),
then acpi_device_get_power() would return ACPI_STATE_UNKNOWN.

Also, if the device is not power manageable, has neither power resources
nor _PSC, and its parent has power state ACPI_STATE_UNKNOWN, then this
will be returned.


> > I could add that to state_conv[] above but then I'd have an array with
> > 256 integers on the stack, most of them 0, which I don't want.
> > I could check for != ACPI_STATE_UNKNOWN but checking the boundaries seemed
> > safer.
> 
> Well, I'm not sure in what way it is safer and you get one check instead of
> two. :-)
> 
> > So I maintain that the code is correct.
> 
> It simply contains an redundant check.

Fair enough, I'm only checking for ACPI_STATE_UNKNOWN now.


> > > >  void pci_update_current_state(struct pci_dev *dev, pci_power_t state)
> > > >  {
> > > > -	if (dev->pm_cap) {
> > > > +	if (platform_pci_get_power_state(dev) == PCI_D3cold) {
> > > > +		dev->current_state = PCI_D3cold;
> > > > +	} else if (dev->pm_cap) {
> > > 
> > > Why exactly do you need to change this function?
> > 
> > It would be pointless to add the ->platform_pci_get_power_state
> > hook without using it anywhere, wouldn't it?
> 
> That depends on what you want to do with it next.  Callers may be added in
> separate patches, no problem with that.

I've moved the change of pci_update_current_state() to a separate patch now,
that will also make it easier to revert it, should anything blow up.


> pci_update_current_state() is called in a few places with assumptions that
> need to be re-evaluated to see if they still hold after the changes.  They
> probably do, but then again, these call sites need to be double checked.
> If you did that, then fine.

pci_update_current_state() is called whenever a device is resumed
(both runtime and after system sleep) and after changing its power
state using the platform in pci_platform_power_transition().

With the new patch, pci_update_current_state() will be more accurate
than it is now:  Laptop hybrid graphics which are not platform-power-
manageable (older Optimus/ATPX or current MacBook Pro) will power
down the GPU but its current_state will be D3hot.  That's because
nouveau/radeon call pci_set_power_state(pdev, PCI_D3cold) and the
PCI core will only put it in D3hot due to the lack of platform-power-
manageability.  When the device is resumed, pci_update_current_state()
will now change the current_state from D3hot to D3cold.  I'm actually
seeing this on my MacBook Pro.  When the system is subsequently put
to sleep, direct_complete will still be afforded despite the changed
current_state because of the first patch in this series.  Works like
a charm, I'm curious if this causes issues for others, but I doubt it.


> And if you want to change that logic for PCI, it would be good to change it
> in the same way for acpi_general_pm_domain which has exactly the same
> problem.

Hm, with PCI I can read the PMCSR and detect D3cold by reading the
vendor ID.  For generic ACPI devices, I don't have that so I have
to rely on the accuracy of acpi_device_get_power().  However as
you've pointed out, it might misrepresent D3cold as D3hot, and it
might incorrectly report D0 even though the device is in a different
state.  Is it safe to rely on acpi_device_get_power() then?

Another idea would be to use acpi_device_get_power() for PCI devices
without PM capability.  Then we could do away with the "state"
argument to pci_update_current_state().  This too hinges on the
reliability of acpi_device_get_power() of course.  At least D3cold
can be detected by reading the vendor ID, so we're not reliant on
ACPI for that.  I've even got a commit on my development branch
to make that change, but I can't test it, my machine doesn't have
PCI devices without PM cap.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] PCI: Query platform firmware for device power state
  2016-09-17 13:49         ` Lukas Wunner
@ 2016-09-18  1:09           ` Rafael J. Wysocki
  0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2016-09-18  1:09 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Saturday, September 17, 2016 03:49:43 PM Lukas Wunner wrote:
> On Wed, Sep 14, 2016 at 03:01:33PM +0200, Rafael J. Wysocki wrote:
> > On Wednesday, September 14, 2016 12:50:24 PM Lukas Wunner wrote:
> > > On Wed, Sep 14, 2016 at 02:21:30AM +0200, Rafael J. Wysocki wrote:
> > > > On Wednesday, August 31, 2016 08:15:18 AM Lukas Wunner wrote:
> > > > > +
> > > > > +	if (!adev || !acpi_device_power_manageable(adev))
> > > > > +		return PCI_UNKNOWN;
> > > > > +
> > > > > +	if (acpi_device_get_power(adev, &state) || state < ACPI_STATE_D0
> > > > > +						|| state > ACPI_STATE_D3_COLD)
> > > > 
> > > > If the device is power-manageable by ACPI (you've just checked that) and
> > > > acpi_device_get_power() returns success (0), the returned state is
> > > > guaranteed to be within the boundaries (if it isn't, there is a bug that
> > > > needs to be fixed).
> > > 
> > > No, acpi_device_get_power() can also return ACPI_STATE_UNKNOWN, which has
> > > the value 0xff.
> > 
> > That would be the case without power resources or _PSC, right?
> 
> There's this check in acpi_device_get_power():
> 
>                 else if (result == ACPI_STATE_UNKNOWN)
>                         result = psc > ACPI_STATE_D2 ? ACPI_STATE_D3_HOT : psc;
> 
> where "result" has been set further up by acpi_power_get_inferred_state().
> If acpi_power_get_inferred_state() does return this value

It doesn't.

> and _PSC is not present (i.e. device->power.flags.explicit_get is not set),
> then acpi_device_get_power() would return ACPI_STATE_UNKNOWN.
> 
> Also, if the device is not power manageable,

You had a check for that.

> has neither power resources nor _PSC, and its parent has power state
> ACPI_STATE_UNKNOWN, then this will be returned.

With the power-manageable check upfront the only case I can see is when the
device has _PS0, it doesn't have _PR0 (ie. no power resources) and _PSC is not
present.

> > > I could add that to state_conv[] above but then I'd have an array with
> > > 256 integers on the stack, most of them 0, which I don't want.
> > > I could check for != ACPI_STATE_UNKNOWN but checking the boundaries seemed
> > > safer.
> > 
> > Well, I'm not sure in what way it is safer and you get one check instead of
> > two. :-)
> > 
> > > So I maintain that the code is correct.
> > 
> > It simply contains an redundant check.
> 
> Fair enough, I'm only checking for ACPI_STATE_UNKNOWN now.
> 
> 
> > > > >  void pci_update_current_state(struct pci_dev *dev, pci_power_t state)
> > > > >  {
> > > > > -	if (dev->pm_cap) {
> > > > > +	if (platform_pci_get_power_state(dev) == PCI_D3cold) {
> > > > > +		dev->current_state = PCI_D3cold;
> > > > > +	} else if (dev->pm_cap) {
> > > > 
> > > > Why exactly do you need to change this function?
> > > 
> > > It would be pointless to add the ->platform_pci_get_power_state
> > > hook without using it anywhere, wouldn't it?
> > 
> > That depends on what you want to do with it next.  Callers may be added in
> > separate patches, no problem with that.
> 
> I've moved the change of pci_update_current_state() to a separate patch now,
> that will also make it easier to revert it, should anything blow up.
> 
> 
> > pci_update_current_state() is called in a few places with assumptions that
> > need to be re-evaluated to see if they still hold after the changes.  They
> > probably do, but then again, these call sites need to be double checked.
> > If you did that, then fine.
> 
> pci_update_current_state() is called whenever a device is resumed
> (both runtime and after system sleep) and after changing its power
> state using the platform in pci_platform_power_transition().
> 
> With the new patch, pci_update_current_state() will be more accurate
> than it is now:  Laptop hybrid graphics which are not platform-power-
> manageable (older Optimus/ATPX or current MacBook Pro) will power
> down the GPU but its current_state will be D3hot.  That's because
> nouveau/radeon call pci_set_power_state(pdev, PCI_D3cold) and the
> PCI core will only put it in D3hot due to the lack of platform-power-
> manageability.  When the device is resumed, pci_update_current_state()
> will now change the current_state from D3hot to D3cold.  I'm actually
> seeing this on my MacBook Pro.  When the system is subsequently put
> to sleep, direct_complete will still be afforded despite the changed
> current_state because of the first patch in this series.  Works like
> a charm, I'm curious if this causes issues for others, but I doubt it.

Well, we'll see.

> > And if you want to change that logic for PCI, it would be good to change it
> > in the same way for acpi_general_pm_domain which has exactly the same
> > problem.
> 
> Hm, with PCI I can read the PMCSR and detect D3cold by reading the
> vendor ID.  For generic ACPI devices, I don't have that so I have
> to rely on the accuracy of acpi_device_get_power().  However as
> you've pointed out, it might misrepresent D3cold as D3hot, and it
> might incorrectly report D0 even though the device is in a different
> state.  Is it safe to rely on acpi_device_get_power() then?

This is all about optimizing the current suboptimal behavior and that can be
done for the cases when the answer is known to be most likely correct.

Like for example when acpi_device_get_power() reports D3cold (or D1, D2 for
that matter).

IOW, D0 and D3hot may be problematic for different reasons.

> Another idea would be to use acpi_device_get_power() for PCI devices
> without PM capability.  Then we could do away with the "state"
> argument to pci_update_current_state().

That I'm not sure about.

> This too hinges on the
> reliability of acpi_device_get_power() of course.  At least D3cold
> can be detected by reading the vendor ID, so we're not reliant on
> ACPI for that.  I've even got a commit on my development branch
> to make that change, but I can't test it, my machine doesn't have
> PCI devices without PM cap.

Those are rare now. All PCIe will have it and PCI 2.0 and later as well IIRC.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] PCI: Avoid unnecessary resume after direct-complete
  2016-09-14 13:14       ` Rafael J. Wysocki
@ 2016-09-18 12:43         ` Lukas Wunner
  0 siblings, 0 replies; 27+ messages in thread
From: Lukas Wunner @ 2016-09-18 12:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pci, linux-pm, linux-acpi, Peter Wu, Andreas Noever

On Wed, Sep 14, 2016 at 03:14:37PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, September 14, 2016 11:56:40 AM Lukas Wunner wrote:
> > I first tried to solve this only for Thunderbolt by opting out of the
> > mandatory resume after direct_complete.  It worked, but it was ugly.
> > 
> > (I have to clear the direct_complete flag in the ->complete hook, but
> > the device tree is walked bottom-up during ->complete, and since the
> > Thunderbolt controller exposes multiple devices, I have to clear the
> > flag for all devices from the bottom-most device, which is the NHI.
> > And all the rest of the s/r code lives on the top-most device, which
> > is the upstream bridge.)
> > 
> > You suggested in your e-mail of July 18: "maybe it's better to change
> > the PCI bus type to do something different from calling the generic
> > function?"
> > 
> > So there, I did what you suggested and tried to fix it not just for
> > Thunderbolt but for everyone.
> > 
> > And you're still not happy.
> > 
> > *sigh*
> 
> Sorry for disappointing you.
> 
> I have concerns, so I talk about them.  Is that wrong?  If so, what's wrong
> about it?
> 
> My time goes into that as well, mind you.

Sorry, never mind, just a bout of desperation that I failed to control.

Kind regards,

Lukas

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-09-18 12:43 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-31  6:15 [PATCH 0/4] PCI PM refinements Lukas Wunner
2016-08-31  6:15 ` [PATCH 2/4] PCI: Query platform firmware for device power state Lukas Wunner
2016-09-14  0:21   ` Rafael J. Wysocki
2016-09-14 10:50     ` Lukas Wunner
2016-09-14 13:01       ` Rafael J. Wysocki
2016-09-14 16:47         ` Rafael J. Wysocki
2016-09-14 21:36           ` Rafael J. Wysocki
2016-09-14 21:47             ` Rafael J. Wysocki
2016-09-14 22:58           ` Lukas Wunner
2016-09-15  0:49             ` Rafael J. Wysocki
2016-09-17 13:49         ` Lukas Wunner
2016-09-18  1:09           ` Rafael J. Wysocki
2016-08-31  6:15 ` [PATCH 1/4] PCI: Afford direct-complete to devices with nonstandard PM Lukas Wunner
2016-09-14  0:05   ` Rafael J. Wysocki
2016-08-31  6:15 ` [PATCH 4/4] PCI: Avoid unnecessary resume on shutdown Lukas Wunner
2016-09-14  0:29   ` Rafael J. Wysocki
2016-09-15 13:11     ` Lukas Wunner
2016-09-15 13:57       ` Rafael J. Wysocki
2016-08-31  6:15 ` [PATCH 3/4] PCI: Avoid unnecessary resume after direct-complete Lukas Wunner
2016-09-14  0:29   ` Rafael J. Wysocki
2016-09-14  0:43     ` Rafael J. Wysocki
2016-09-14  0:50   ` Rafael J. Wysocki
2016-09-14  9:56     ` Lukas Wunner
2016-09-14 13:14       ` Rafael J. Wysocki
2016-09-18 12:43         ` Lukas Wunner
2016-09-12 22:07 ` [PATCH 0/4] PCI PM refinements Bjorn Helgaas
2016-09-12 22:15   ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).