linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/12] VRAM Self Refresh
@ 2025-04-01 15:32 Anshuman Gupta
  2025-04-01 15:32 ` [PATCH 01/12] PCI/ACPI: Add D3cold Aux Power Limit_DSM method Anshuman Gupta
                   ` (11 more replies)
  0 siblings, 12 replies; 42+ messages in thread
From: Anshuman Gupta @ 2025-04-01 15:32 UTC (permalink / raw)
  To: intel-xe, linux-acpi, linux-pci
  Cc: rafael, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, anshuman.gupta, varun.gupta,
	ville.syrjala, uma.shankar

This revision has fixed the below review comment on RFC[1] series.

PCIe ACPI Patches:
- Break the _DSM {10, 11} Patch for different _DSM Method "D3cold Aux Power Limit", "PERST# Assertion Delay"
  (Bjorn Helgaas)
- Remove the Battlemage-specific language from commit log and Document that this implementation
  does not track and aggregate requests from all child devices `below the Downstream Port.
  (Bjorn Helgaas)
- Add Rev 3.3 along with Section in commit log.
- Address cosmetic review comment.
- Added a notifier block.

Xe Pacthes:
- %s/VRAM_/VRAM_SR/
  (Rodrigo)
- %s/d3cold.allowed/d3cold.target_state
  (Rodrigo)
- Add has_vrsr flag and refactor pci_acpi_aux_power_setup.
  (Rodrigo)
- Reorder the xe_d3_state enum.
  (Karthik)
- Use default VGA gpu to enable VRSR.


VRAM Self Refresh

This revision has fixed the below review cpmment on RFC[1] series.

PCIe ACPI Patches:
- Break the _DSM {10, 11} Patch for different _DSM Method "D3cold Aux Power Limit", "PERST# Assertion Delay"
  (Bjorn Helgaas)
- Remove the Battlemage-specific language from commit log and Document that this implementation
  does not track and aggregate requests from all child devices `below the Downstream Port.
  (Bjorn Helgaas)
- Add Rev 3.3 along with Section in commit log.
- Address cosmetic review comment.
- Added a notifier block.

Xe Patches:
- %s/VRAM_/VRAM_SR/
  (Rodrigo)
- %s/d3cold.allowed/d3cold.target_state
  (Rodrigo)
- Add has_vrsr flag and refactor pci_acpi_aux_power_setup.
  (Rodrigo)
- Reorder the xe_d3_state enum.
  (Karthik)
- Use default VGA gpu to enable VRSR.

[1] https://patchwork.freedesktop.org/series/145342/

Anshuman Gupta (8):
  PCI/ACPI: Add D3cold Aux Power Limit_DSM method
  PCI/ACPI: Add PERST# Assertion Delay _DSM method
  PCI/ACPI: Add aux power grant notifier
  drm/xe/vrsr: Detect VRSR Capability
  drm/xe: Add PCIe ACPI Aux Power notifier
  drm/xe/vrsr: Refactor d3cold.allowed to a enum
  drm/xe/pm: D3Cold target state
  drm/xe/vrsr: Enable VRSR

Badal Nilawar (4):
  drm/xe/vrsr: Introduce flag has_vrsr
  drm/xe/vrsr: Initialize VRSR feature
  drm/xe/vrsr: Enable VRSR on default VGA boot device
  drm/xe/vrsr: Introduce a debugfs node named vrsr_capable

 drivers/gpu/drm/xe/display/xe_display.c |  28 ++-
 drivers/gpu/drm/xe/display/xe_display.h |   2 +
 drivers/gpu/drm/xe/regs/xe_regs.h       |   3 +
 drivers/gpu/drm/xe/xe_debugfs.c         |  20 ++
 drivers/gpu/drm/xe/xe_device_types.h    |  15 +-
 drivers/gpu/drm/xe/xe_pci.c             |  13 +-
 drivers/gpu/drm/xe/xe_pcode_api.h       |   8 +
 drivers/gpu/drm/xe/xe_pm.c              | 236 ++++++++++++++++++++++--
 drivers/gpu/drm/xe/xe_pm.h              |   9 +-
 drivers/pci/pci-acpi.c                  | 159 ++++++++++++++++
 include/linux/pci-acpi.h                |  26 +++
 11 files changed, 490 insertions(+), 29 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 01/12] PCI/ACPI: Add D3cold Aux Power Limit_DSM method
  2025-04-01 15:32 [PATCH 00/12] VRAM Self Refresh Anshuman Gupta
@ 2025-04-01 15:32 ` Anshuman Gupta
  2025-04-01 18:25   ` Bjorn Helgaas
  2025-04-01 15:32 ` [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method Anshuman Gupta
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Anshuman Gupta @ 2025-04-01 15:32 UTC (permalink / raw)
  To: intel-xe, linux-acpi, linux-pci
  Cc: rafael, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, anshuman.gupta, varun.gupta,
	ville.syrjala, uma.shankar

Implement _DSM method 10 and _DSM Method 11 as per PCI firmware specs
section 4.6.10 Rev 3.3.

Note that this implementation assumes only a single device below the
Downstream Port will request for Aux Power Limit under a given
Root Port because it does not track and aggregate requests
from all child devices below the Downstream Port as required
by Section 4.6.10 Rev 3.3.

One possible mitigation would be only allowing only first PCIe
Non-Bridge Endpoint Function 0 driver to call_DSM method 10.

Signed-off-by: Varun Gupta <varun.gupta@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
---
 drivers/pci/pci-acpi.c   | 78 ++++++++++++++++++++++++++++++++++++++++
 include/linux/pci-acpi.h |  6 ++++
 2 files changed, 84 insertions(+)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index af370628e583..ebd49e43457e 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -1421,6 +1421,84 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev,
 	ACPI_FREE(obj);
 }
 
+/**
+ * pci_acpi_request_d3cold_aux_power - Request D3cold aux power via ACPI DSM
+ * @dev: PCI device instance
+ * @requested_power: Requested auxiliary power in milliwatts
+ *
+ * This function sends a request to the host BIOS via ACPI _DSM Function 10
+ * to grant the required D3Cold Auxiliary power for the specified PCI device.
+ * It evaluates the _DSM (Device Specific Method) to request the Auxiliary
+ * power and handles the response accordingly.
+ *
+ * This function shall be only called by 1st non-bridge Endpoint driver
+ * on Function 0. For a Multi-Function Device, the driver for Function 0 is
+ * required to report an aggregate power requirement covering all
+ * functions contained within the device.
+ *
+ * Return: Returns 0 on success and errno on failure.
+ */
+int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
+{
+	union acpi_object in_obj = {
+		.integer.type = ACPI_TYPE_INTEGER,
+		.integer.value = requested_power,
+	};
+
+	union acpi_object *out_obj;
+	acpi_handle handle;
+	int result, ret = -EINVAL;
+
+	if (!dev || !ACPI_HANDLE(&dev->dev))
+		return -EINVAL;
+
+	handle = ACPI_HANDLE(&dev->dev);
+
+	out_obj = acpi_evaluate_dsm_typed(handle,
+					  &pci_acpi_dsm_guid,
+					  4,
+					  DSM_PCI_D3COLD_AUX_POWER_LIMIT,
+					  &in_obj,
+					  ACPI_TYPE_INTEGER);
+	if (!out_obj)
+		return -EINVAL;
+
+	result = out_obj->integer.value;
+
+	switch (result) {
+	case 0x0:
+		dev_dbg(&dev->dev, "D3cold Aux Power %umW request denied\n",
+			requested_power);
+		break;
+	case 0x1:
+		dev_info(&dev->dev, "D3cold Aux Power request granted: %umW\n",
+			 requested_power);
+		ret = 0;
+		break;
+	case 0x2:
+		dev_info(&dev->dev, "D3cold Aux Power: Main power won't be removed\n");
+		ret = -EBUSY;
+		break;
+	default:
+		if (result >= 0x11 && result <= 0x1F) {
+			int retry_interval = result & 0xF;
+
+			dev_warn(&dev->dev, "D3cold Aux Power request needs retry."
+				 "Interval: %d seconds.\n", retry_interval);
+			msleep(retry_interval * 1000);
+			ret = -EAGAIN;
+		} else {
+			dev_err(&dev->dev, "D3cold Aux Power: Reserved or "
+				"unsupported response: 0x%x.\n", result);
+		}
+		break;
+	}
+
+	ACPI_FREE(out_obj);
+	return ret;
+}
+EXPORT_SYMBOL(pci_acpi_request_d3cold_aux_power);
+
 static void pci_acpi_set_external_facing(struct pci_dev *dev)
 {
 	u8 val;
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 078225b514d4..dbc4b0ed433c 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -121,6 +121,7 @@ extern const guid_t pci_acpi_dsm_guid;
 #define DSM_PCI_DEVICE_NAME			0x07
 #define DSM_PCI_POWER_ON_RESET_DELAY		0x08
 #define DSM_PCI_DEVICE_READINESS_DURATIONS	0x09
+#define DSM_PCI_D3COLD_AUX_POWER_LIMIT		0x0A
 
 #ifdef CONFIG_PCIE_EDR
 void pci_acpi_add_edr_notifier(struct pci_dev *pdev);
@@ -132,10 +133,15 @@ static inline void pci_acpi_remove_edr_notifier(struct pci_dev *pdev) { }
 
 int pci_acpi_set_companion_lookup_hook(struct acpi_device *(*func)(struct pci_dev *));
 void pci_acpi_clear_companion_lookup_hook(void);
+int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power);
 
 #else	/* CONFIG_ACPI */
 static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
 static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
+int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
+{
+	return -EOPNOTSUPP;
+}
 #endif	/* CONFIG_ACPI */
 
 #endif	/* _PCI_ACPI_H_ */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method
  2025-04-01 15:32 [PATCH 00/12] VRAM Self Refresh Anshuman Gupta
  2025-04-01 15:32 ` [PATCH 01/12] PCI/ACPI: Add D3cold Aux Power Limit_DSM method Anshuman Gupta
@ 2025-04-01 15:32 ` Anshuman Gupta
  2025-04-01 18:30   ` Bjorn Helgaas
  2025-04-02 11:06   ` Rafael J. Wysocki
  2025-04-01 15:32 ` [PATCH 03/12] PCI/ACPI: Add aux power grant notifier Anshuman Gupta
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Anshuman Gupta @ 2025-04-01 15:32 UTC (permalink / raw)
  To: intel-xe, linux-acpi, linux-pci
  Cc: rafael, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, anshuman.gupta, varun.gupta,
	ville.syrjala, uma.shankar

Implement _DSM Method 11 as per PCI firmware specs
section 4.6.11 Rev 3.3.

Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
---
 drivers/pci/pci-acpi.c   | 53 ++++++++++++++++++++++++++++++++++++++++
 include/linux/pci-acpi.h |  7 ++++++
 2 files changed, 60 insertions(+)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index ebd49e43457e..04149f037664 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -1499,6 +1499,59 @@ int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
 }
 EXPORT_SYMBOL(pci_acpi_request_d3cold_aux_power);
 
+/**
+ * pci_acpi_add_perst_assertion_delay - Request PERST# delay via ACPI DSM
+ * @dev: PCI device instance
+ * @delay_us: Requested delay_us
+ *
+ * This function sends a request to the host BIOS via ACPI _DSM to grant the
+ * required PERST# delay for the specified PCI device. It evaluates the _DSM
+ * to request the PERST# delay and handles the response accordingly.
+ *
+ * Return: returns 0 on success and errno on failure.
+ */
+int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us)
+{
+	union acpi_object in_obj = {
+		.integer.type = ACPI_TYPE_INTEGER,
+		.integer.value = delay_us,
+	};
+
+	union acpi_object *out_obj;
+	acpi_handle handle;
+	int result, ret = -EINVAL;
+
+	if (!dev || !ACPI_HANDLE(&dev->dev))
+		return -EINVAL;
+
+	handle = ACPI_HANDLE(&dev->dev);
+
+	out_obj = acpi_evaluate_dsm_typed(handle, &pci_acpi_dsm_guid, 4,
+					  DSM_PCI_PERST_ASSERTION_DELAY,
+					  &in_obj, ACPI_TYPE_INTEGER);
+	if (!out_obj)
+		return -EINVAL;
+
+	result = out_obj->integer.value;
+
+	if (result == delay_us) {
+		dev_info(&dev->dev, "PERST# Assertion Delay set to"
+			 "%u microseconds\n", delay_us);
+		ret = 0;
+	} else if (result == 0) {
+		dev_warn(&dev->dev, "PERST# Assertion Delay request failed,"
+			 "no previous valid request\n");
+	} else {
+		dev_warn(&dev->dev,
+			 "PERST# Assertion Delay request failed"
+			 "Previous valid delay: %u microseconds\n", result);
+	}
+
+	ACPI_FREE(out_obj);
+	return ret;
+}
+EXPORT_SYMBOL(pci_acpi_add_perst_assertion_delay);
+
 static void pci_acpi_set_external_facing(struct pci_dev *dev)
 {
 	u8 val;
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index dbc4b0ed433c..4b7373f91a9a 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -122,6 +122,7 @@ extern const guid_t pci_acpi_dsm_guid;
 #define DSM_PCI_POWER_ON_RESET_DELAY		0x08
 #define DSM_PCI_DEVICE_READINESS_DURATIONS	0x09
 #define DSM_PCI_D3COLD_AUX_POWER_LIMIT		0x0A
+#define DSM_PCI_PERST_ASSERTION_DELAY		0x0B
 
 #ifdef CONFIG_PCIE_EDR
 void pci_acpi_add_edr_notifier(struct pci_dev *pdev);
@@ -134,6 +135,7 @@ static inline void pci_acpi_remove_edr_notifier(struct pci_dev *pdev) { }
 int pci_acpi_set_companion_lookup_hook(struct acpi_device *(*func)(struct pci_dev *));
 void pci_acpi_clear_companion_lookup_hook(void);
 int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power);
+int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us);
 
 #else	/* CONFIG_ACPI */
 static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
@@ -142,6 +144,11 @@ int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
 {
 	return -EOPNOTSUPP;
 }
+
+int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us)
+{
+	return -EOPNOTSUPP;
+}
 #endif	/* CONFIG_ACPI */
 
 #endif	/* _PCI_ACPI_H_ */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
  2025-04-01 15:32 [PATCH 00/12] VRAM Self Refresh Anshuman Gupta
  2025-04-01 15:32 ` [PATCH 01/12] PCI/ACPI: Add D3cold Aux Power Limit_DSM method Anshuman Gupta
  2025-04-01 15:32 ` [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method Anshuman Gupta
@ 2025-04-01 15:32 ` Anshuman Gupta
  2025-04-01 20:13   ` Bjorn Helgaas
  2025-04-01 15:32 ` [PATCH 04/12] drm/xe/vrsr: Introduce flag has_vrsr Anshuman Gupta
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Anshuman Gupta @ 2025-04-01 15:32 UTC (permalink / raw)
  To: intel-xe, linux-acpi, linux-pci
  Cc: rafael, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, anshuman.gupta, varun.gupta,
	ville.syrjala, uma.shankar

Adding a notifier to notify all PCIe child devices about the
block main power removal. It is needed because theoretically
multiple PCIe Endpoint devices on same Root Port
can request for AUX power and _DSM method can return with
80000000h signifies that the hierarchy connected via
the slot cannot support main power removal when in D3Cold.
This may disrupt functionality of other child device.

Let's notify all PCIe devices requested Aux power resource
and Let PCIe End Point driver handle it in its callback.

Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
---
 drivers/pci/pci-acpi.c   | 34 +++++++++++++++++++++++++++++++---
 include/linux/pci-acpi.h | 13 +++++++++++++
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 04149f037664..d1ca1649e6e8 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -1421,6 +1421,32 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev,
 	ACPI_FREE(obj);
 }
 
+static BLOCKING_NOTIFIER_HEAD(pci_acpi_aux_power_notify_list);
+
+/**
+ * pci_acpi_register_aux_power_notifier - Register driver notifier
+ * @nb: notifier block
+ *
+ * This function shall be called by PCIe End Point device requested the Aux
+ * power resource in order to handle the any scenario gracefully when other
+ * child PCIe devices Aux power request returns with No main power removal.
+ * PCIe devices which register this notifier shall handle No main power
+ * removal scenario accordingly.
+ *
+ * Return: Returns 0 on success and errno on failure.
+ */
+int pci_acpi_register_aux_power_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&pci_acpi_aux_power_notify_list, nb);
+}
+EXPORT_SYMBOL_GPL(pci_acpi_register_aux_power_notifier);
+
+void pci_acpi_unregister_aux_power_notifier(struct notifier_block *nb)
+{
+	blocking_notifier_chain_unregister(&pci_acpi_aux_power_notify_list, nb);
+}
+EXPORT_SYMBOL_GPL(pci_acpi_unregister_aux_power_notifier);
+
 /**
  * pci_acpi_request_d3cold_aux_power - Request D3cold aux power via ACPI DSM
  * @dev: PCI device instance
@@ -1466,17 +1492,19 @@ int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
 	result = out_obj->integer.value;
 
 	switch (result) {
-	case 0x0:
+	case ACPI_AUX_PW_DENIED:
 		dev_dbg(&dev->dev, "D3cold Aux Power %umW request denied\n",
 			requested_power);
 		break;
-	case 0x1:
+	case ACPI_AUX_PW_GRANTED:
 		dev_info(&dev->dev, "D3cold Aux Power request granted: %umW\n",
 			 requested_power);
 		ret = 0;
 		break;
-	case 0x2:
+	case ACPI_NO_MAIN_PW_REMOVAL:
 		dev_info(&dev->dev, "D3cold Aux Power: Main power won't be removed\n");
+		blocking_notifier_call_chain(&pci_acpi_aux_power_notify_list,
+					     ACPI_NO_MAIN_PW_REMOVAL, dev);
 		ret = -EBUSY;
 		break;
 	default:
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 4b7373f91a9a..793b979af98b 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -124,6 +124,10 @@ extern const guid_t pci_acpi_dsm_guid;
 #define DSM_PCI_D3COLD_AUX_POWER_LIMIT		0x0A
 #define DSM_PCI_PERST_ASSERTION_DELAY		0x0B
 
+#define ACPI_AUX_PW_DENIED			0x0
+#define ACPI_AUX_PW_GRANTED			0x1
+#define ACPI_NO_MAIN_PW_REMOVAL			0x2
+
 #ifdef CONFIG_PCIE_EDR
 void pci_acpi_add_edr_notifier(struct pci_dev *pdev);
 void pci_acpi_remove_edr_notifier(struct pci_dev *pdev);
@@ -134,12 +138,21 @@ static inline void pci_acpi_remove_edr_notifier(struct pci_dev *pdev) { }
 
 int pci_acpi_set_companion_lookup_hook(struct acpi_device *(*func)(struct pci_dev *));
 void pci_acpi_clear_companion_lookup_hook(void);
+int pci_acpi_register_aux_power_notifier(struct notifier_block *nb);
+void pci_acpi_unregister_aux_power_notifier(struct notifier_block *nb);
 int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power);
 int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us);
 
 #else	/* CONFIG_ACPI */
 static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
 static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
+int pci_acpi_register_aux_power_notifier(struct notifier_block *nb)
+{
+	return -EOPNOTSUPP;
+}
+
+void pci_acpi_unregister_aux_power_notifier(struct notifier_block *nb) { }
+
 int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
 {
 	return -EOPNOTSUPP;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 04/12] drm/xe/vrsr: Introduce flag has_vrsr
  2025-04-01 15:32 [PATCH 00/12] VRAM Self Refresh Anshuman Gupta
                   ` (2 preceding siblings ...)
  2025-04-01 15:32 ` [PATCH 03/12] PCI/ACPI: Add aux power grant notifier Anshuman Gupta
@ 2025-04-01 15:32 ` Anshuman Gupta
  2025-04-01 15:32 ` [PATCH 05/12] drm/xe/vrsr: Detect VRSR Capability Anshuman Gupta
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 42+ messages in thread
From: Anshuman Gupta @ 2025-04-01 15:32 UTC (permalink / raw)
  To: intel-xe, linux-acpi, linux-pci
  Cc: rafael, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, anshuman.gupta, varun.gupta,
	ville.syrjala, uma.shankar

From: Badal Nilawar <badal.nilawar@intel.com>

Introduce flag has_vrsr to determine if platform supports VRSR feature

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h | 2 ++
 drivers/gpu/drm/xe/xe_pci.c          | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index c54adebfe518..b8ccf729f7c0 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -334,6 +334,8 @@ struct xe_device {
 		u8 has_sriov:1;
 		/** @info.has_usm: Device has unified shared memory support */
 		u8 has_usm:1;
+		/** @info.has_vrsr: Has capability to enter into VRAM self refresh */
+		u8 has_vrsr:1;
 		/** @info.is_dgfx: is discrete device */
 		u8 is_dgfx:1;
 		/**
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 772b6c81672c..70d4827f5821 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -68,6 +68,7 @@ struct xe_device_desc {
 	u8 has_llc:1;
 	u8 has_pxp:1;
 	u8 has_sriov:1;
+	u8 has_vrsr:1;
 	u8 skip_guc_pc:1;
 	u8 skip_mtcfg:1;
 	u8 skip_pcode:1;
@@ -343,6 +344,7 @@ static const struct xe_device_desc bmg_desc = {
 	.dma_mask_size = 46,
 	.has_display = true,
 	.has_fan_control = true,
+	.has_vrsr = true,
 	.has_heci_cscfi = 1,
 };
 
@@ -589,6 +591,7 @@ static int xe_info_init_early(struct xe_device *xe,
 	xe->info.has_llc = desc->has_llc;
 	xe->info.has_pxp = desc->has_pxp;
 	xe->info.has_sriov = desc->has_sriov;
+	xe->info.has_vrsr = desc->has_vrsr;
 	xe->info.skip_guc_pc = desc->skip_guc_pc;
 	xe->info.skip_mtcfg = desc->skip_mtcfg;
 	xe->info.skip_pcode = desc->skip_pcode;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 05/12] drm/xe/vrsr: Detect VRSR Capability
  2025-04-01 15:32 [PATCH 00/12] VRAM Self Refresh Anshuman Gupta
                   ` (3 preceding siblings ...)
  2025-04-01 15:32 ` [PATCH 04/12] drm/xe/vrsr: Introduce flag has_vrsr Anshuman Gupta
@ 2025-04-01 15:32 ` Anshuman Gupta
  2025-04-01 15:32 ` [PATCH 06/12] drm/xe/vrsr: Initialize VRSR feature Anshuman Gupta
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 42+ messages in thread
From: Anshuman Gupta @ 2025-04-01 15:32 UTC (permalink / raw)
  To: intel-xe, linux-acpi, linux-pci
  Cc: rafael, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, anshuman.gupta, varun.gupta,
	ville.syrjala, uma.shankar

Detect VRAM Self Refresh(vrsr) Capability.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
---
 drivers/gpu/drm/xe/regs/xe_regs.h    |  3 +++
 drivers/gpu/drm/xe/xe_device_types.h |  4 ++++
 drivers/gpu/drm/xe/xe_pm.c           | 27 +++++++++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/xe/regs/xe_regs.h b/drivers/gpu/drm/xe/regs/xe_regs.h
index 3abb17d2ca33..4db486fb310a 100644
--- a/drivers/gpu/drm/xe/regs/xe_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_regs.h
@@ -53,6 +53,9 @@
 #define MTL_MPE_FREQUENCY			XE_REG(0x13802c)
 #define   MTL_RPE_MASK				REG_GENMASK(8, 0)
 
+#define VRAM_SR_CAPABILITY                         XE_REG(0x138144)
+#define   VRAM_SR_SUPPORTED                        REG_BIT(0)
+
 #define VF_CAP_REG				XE_REG(0x1901f8, XE_REG_OPTION_VF)
 #define   VF_CAP				REG_BIT(0)
 
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index b8ccf729f7c0..219800092b8d 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -498,6 +498,9 @@ struct xe_device {
 		/** @d3cold.allowed: Indicates if d3cold is a valid device state */
 		bool allowed;
 
+		/** @d3cold.vrsr_capable: Indicates if d3cold VRAM Self Refresh is supported */
+		bool vrsr_capable;
+
 		/**
 		 * @d3cold.vram_threshold:
 		 *
@@ -508,6 +511,7 @@ struct xe_device {
 		 * Default threshold value is 300mb.
 		 */
 		u32 vram_threshold;
+
 		/** @d3cold.lock: protect vram_threshold */
 		struct mutex lock;
 	} d3cold;
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index a7ddf45db886..c96be409de49 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -17,12 +17,15 @@
 #include "xe_bo_evict.h"
 #include "xe_device.h"
 #include "xe_device_sysfs.h"
+#include "xe_force_wake.h"
 #include "xe_ggtt.h"
 #include "xe_gt.h"
 #include "xe_guc.h"
 #include "xe_irq.h"
+#include "xe_mmio.h"
 #include "xe_pcode.h"
 #include "xe_pxp.h"
+#include "regs/xe_regs.h"
 #include "xe_trace.h"
 #include "xe_wa.h"
 
@@ -236,6 +239,28 @@ static bool xe_pm_pci_d3cold_capable(struct xe_device *xe)
 	return true;
 }
 
+static bool xe_pm_vrsr_capable(struct xe_device *xe)
+{
+	struct xe_mmio *mmio = xe_root_tile_mmio(xe);
+	unsigned int fw_ref;
+	struct xe_gt *gt;
+	u32 val;
+
+	gt = xe_root_mmio_gt(xe);
+
+	if (!xe->info.probe_display)
+		return false;
+
+	fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
+	if (!fw_ref)
+		return false;
+
+	val = xe_mmio_read32(mmio, VRAM_SR_CAPABILITY);
+	xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL);
+
+	return val & VRAM_SR_SUPPORTED;
+}
+
 static void xe_pm_runtime_init(struct xe_device *xe)
 {
 	struct device *dev = xe->drm.dev;
@@ -310,6 +335,8 @@ int xe_pm_init(struct xe_device *xe)
 		err = xe_pm_set_vram_threshold(xe, vram_threshold);
 		if (err)
 			return err;
+
+		xe->d3cold.vrsr_capable = xe_pm_vrsr_capable(xe);
 	}
 
 	xe_pm_runtime_init(xe);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 06/12] drm/xe/vrsr: Initialize VRSR feature
  2025-04-01 15:32 [PATCH 00/12] VRAM Self Refresh Anshuman Gupta
                   ` (4 preceding siblings ...)
  2025-04-01 15:32 ` [PATCH 05/12] drm/xe/vrsr: Detect VRSR Capability Anshuman Gupta
@ 2025-04-01 15:32 ` Anshuman Gupta
  2025-04-01 19:56   ` Bjorn Helgaas
  2025-04-01 15:32 ` [PATCH 07/12] drm/xe/vrsr: Enable VRSR on default VGA boot device Anshuman Gupta
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Anshuman Gupta @ 2025-04-01 15:32 UTC (permalink / raw)
  To: intel-xe, linux-acpi, linux-pci
  Cc: rafael, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, anshuman.gupta, varun.gupta,
	ville.syrjala, uma.shankar

From: Badal Nilawar <badal.nilawar@intel.com>

Initialize VRSR feature by requesting Auxilary Power and PERST# assertion
delay. Include an API to enable VRSR feature.

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h |  1 +
 drivers/gpu/drm/xe/xe_pcode_api.h    |  8 +++
 drivers/gpu/drm/xe/xe_pm.c           | 92 +++++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_pm.h           |  1 +
 4 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 219800092b8d..fd9dea207580 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -7,6 +7,7 @@
 #define _XE_DEVICE_TYPES_H_
 
 #include <linux/pci.h>
+#include <linux/pci-acpi.h>
 
 #include <drm/drm_device.h>
 #include <drm/drm_file.h>
diff --git a/drivers/gpu/drm/xe/xe_pcode_api.h b/drivers/gpu/drm/xe/xe_pcode_api.h
index e622ae17f08d..cffdf52495f9 100644
--- a/drivers/gpu/drm/xe/xe_pcode_api.h
+++ b/drivers/gpu/drm/xe/xe_pcode_api.h
@@ -42,6 +42,14 @@
 #define	    POWER_SETUP_I1_SHIFT		6	/* 10.6 fixed point format */
 #define	    POWER_SETUP_I1_DATA_MASK		REG_GENMASK(15, 0)
 
+#define	  PCODE_D3_VRAM_SELF_REFRESH	0x71
+#define	    PCODE_D3_VRSR_SC_DISABLE	0x0
+#define	    PCODE_D3_VRSR_SC_ENABLE	0x1
+#define     PCODE_D3_VRSR_SC_AUX_PL_AND_PERST_DELAY	0x2
+#define     PCODE_D3_VRSR_PERST_SHIFT	16
+#define	    POWER_D3_VRSR_PSERST_MASK	REG_GENMASK(31, 16)
+#define	    POWER_D3_VRSR_AUX_PL_MASK	REG_GENMASK(15, 0)
+
 #define   PCODE_FREQUENCY_CONFIG		0x6e
 /* Frequency Config Sub Commands (param1) */
 #define     PCODE_MBOX_FC_SC_READ_FUSED_P0	0x0
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index c96be409de49..abb5099475cb 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -23,6 +23,7 @@
 #include "xe_guc.h"
 #include "xe_irq.h"
 #include "xe_mmio.h"
+#include "xe_pcode_api.h"
 #include "xe_pcode.h"
 #include "xe_pxp.h"
 #include "regs/xe_regs.h"
@@ -261,6 +262,95 @@ static bool xe_pm_vrsr_capable(struct xe_device *xe)
 	return val & VRAM_SR_SUPPORTED;
 }
 
+static int pci_acpi_aux_power_setup(struct xe_device *xe)
+{
+	struct xe_tile *root_tile = xe_device_get_root_tile(xe);
+	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
+	struct pci_dev *root_pdev;
+	int ret;
+	u32 uval;
+	u32 aux_pwr_limit;
+	u32 perst_delay;
+
+	root_pdev = pcie_find_root_port(pdev);
+	if (!root_pdev)
+		return -EINVAL;
+
+	ret = xe_pcode_read(root_tile, PCODE_MBOX(PCODE_D3_VRAM_SELF_REFRESH,
+			    PCODE_D3_VRSR_SC_AUX_PL_AND_PERST_DELAY, 0),
+			    &uval, NULL);
+
+	if (ret)
+		return ret;
+
+	aux_pwr_limit = REG_FIELD_GET(POWER_D3_VRSR_AUX_PL_MASK, uval);
+	perst_delay = REG_FIELD_GET(POWER_D3_VRSR_PSERST_MASK, uval);
+
+	drm_dbg(&xe->drm, "AUX power limit =%d\n", aux_pwr_limit);
+	drm_dbg(&xe->drm, "PERST Assertion delay =%d\n", perst_delay);
+
+	ret = pci_acpi_request_d3cold_aux_power(root_pdev, aux_pwr_limit);
+	if (ret)
+		return ret;
+
+	ret = pci_acpi_add_perst_assertion_delay(root_pdev, perst_delay);
+
+	return ret;
+}
+
+static void xe_pm_vrsr_init(struct xe_device *xe)
+{
+	int ret;
+
+	/* Check if platform support d3cold vrsr */
+	if (!xe->info.has_vrsr)
+		return;
+
+	if (!xe_pm_vrsr_capable(xe))
+		return;
+
+	/*
+	 * If the VRSR initialization fails, the device will proceed with the regular
+	 * D3 Cold flow
+	 */
+	ret = pci_acpi_aux_power_setup(xe);
+	if (ret) {
+		drm_info(&xe->drm, "VRSR capable %s\n", "No");
+		return;
+	}
+
+	xe->d3cold.vrsr_capable = true;
+	drm_info(&xe->drm, "VRSR capable %s\n", "Yes");
+}
+
+/**
+ * xe_pm_vrsr_enable - Enable VRAM self refresh
+ * @xe: The xe device.
+ * @enable: true: Enable, false: Disable
+ *
+ * This function enables the VRSR feature in D3Cold path.
+ *
+ * Return: It returns 0 on success and errno on failure.
+ */
+int xe_pm_vrsr_enable(struct xe_device *xe, bool enable)
+{
+	struct xe_tile *root_tile = xe_device_get_root_tile(xe);
+	int ret;
+	u32 uval = 0;
+
+	if (!xe->d3cold.vrsr_capable)
+		return -ENXIO;
+
+	if (enable)
+		ret = xe_pcode_write(root_tile, PCODE_MBOX(PCODE_D3_VRAM_SELF_REFRESH,
+				     PCODE_D3_VRSR_SC_ENABLE, 0), uval);
+	else
+		ret = xe_pcode_write(root_tile, PCODE_MBOX(PCODE_D3_VRAM_SELF_REFRESH,
+				     PCODE_D3_VRSR_SC_DISABLE, 0), uval);
+
+	return ret;
+}
+
 static void xe_pm_runtime_init(struct xe_device *xe)
 {
 	struct device *dev = xe->drm.dev;
@@ -336,7 +426,7 @@ int xe_pm_init(struct xe_device *xe)
 		if (err)
 			return err;
 
-		xe->d3cold.vrsr_capable = xe_pm_vrsr_capable(xe);
+		xe_pm_vrsr_init(xe);
 	}
 
 	xe_pm_runtime_init(xe);
diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
index 998d1ed64556..2b5df31db4c4 100644
--- a/drivers/gpu/drm/xe/xe_pm.h
+++ b/drivers/gpu/drm/xe/xe_pm.h
@@ -35,4 +35,5 @@ bool xe_rpm_reclaim_safe(const struct xe_device *xe);
 struct task_struct *xe_pm_read_callback_task(struct xe_device *xe);
 int xe_pm_module_init(void);
 
+int xe_pm_vrsr_enable(struct xe_device *xe, bool enable);
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 07/12] drm/xe/vrsr: Enable VRSR on default VGA boot device
  2025-04-01 15:32 [PATCH 00/12] VRAM Self Refresh Anshuman Gupta
                   ` (5 preceding siblings ...)
  2025-04-01 15:32 ` [PATCH 06/12] drm/xe/vrsr: Initialize VRSR feature Anshuman Gupta
@ 2025-04-01 15:32 ` Anshuman Gupta
  2025-04-01 15:32 ` [PATCH 08/12] drm/xe: Add PCIe ACPI Aux Power notifier Anshuman Gupta
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 42+ messages in thread
From: Anshuman Gupta @ 2025-04-01 15:32 UTC (permalink / raw)
  To: intel-xe, linux-acpi, linux-pci
  Cc: rafael, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, anshuman.gupta, varun.gupta,
	ville.syrjala, uma.shankar

From: Badal Nilawar <badal.nilawar@intel.com>

The VRSR feature is to enhance the display screen refresh experience
when the device exits from the D3Cold state. Therefore, apply the VRSR
feature to the default VGA boot device and when a display is connected.

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
---
 drivers/gpu/drm/xe/display/xe_display.c | 22 ++++++++++++++++++++++
 drivers/gpu/drm/xe/display/xe_display.h |  2 ++
 drivers/gpu/drm/xe/xe_pm.c              |  5 +++++
 3 files changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
index 23937ed3b33d..a7a50f48f1c5 100644
--- a/drivers/gpu/drm/xe/display/xe_display.c
+++ b/drivers/gpu/drm/xe/display/xe_display.c
@@ -88,6 +88,28 @@ static void display_destroy(struct drm_device *dev, void *dummy)
 	destroy_workqueue(xe->display.hotplug.dp_wq);
 }
 
+bool xe_display_connected(struct xe_device *xe)
+{
+	struct drm_connector *list_connector;
+	struct drm_connector_list_iter iter;
+	bool ret = false;
+
+	mutex_lock(&xe->drm.mode_config.mutex);
+	drm_connector_list_iter_begin(&xe->drm, &iter);
+
+	drm_for_each_connector_iter(list_connector, &iter) {
+		if (list_connector->status == connector_status_connected) {
+			ret = true;
+			break;
+		}
+	}
+
+	drm_connector_list_iter_end(&iter);
+	mutex_unlock(&xe->drm.mode_config.mutex);
+
+	return ret;
+}
+
 /**
  * xe_display_create - create display struct
  * @xe: XE device instance
diff --git a/drivers/gpu/drm/xe/display/xe_display.h b/drivers/gpu/drm/xe/display/xe_display.h
index 46e14f8dee28..a432790d6d34 100644
--- a/drivers/gpu/drm/xe/display/xe_display.h
+++ b/drivers/gpu/drm/xe/display/xe_display.h
@@ -39,6 +39,7 @@ void xe_display_pm_resume(struct xe_device *xe);
 void xe_display_pm_runtime_suspend(struct xe_device *xe);
 void xe_display_pm_runtime_suspend_late(struct xe_device *xe);
 void xe_display_pm_runtime_resume(struct xe_device *xe);
+bool xe_display_connected(struct xe_device *xe);
 
 #else
 
@@ -71,5 +72,6 @@ static inline void xe_display_pm_runtime_suspend(struct xe_device *xe) {}
 static inline void xe_display_pm_runtime_suspend_late(struct xe_device *xe) {}
 static inline void xe_display_pm_runtime_resume(struct xe_device *xe) {}
 
+static inline bool xe_display_connected(struct xe_device *xe) {}
 #endif /* CONFIG_DRM_XE_DISPLAY */
 #endif /* _XE_DISPLAY_H_ */
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index abb5099475cb..364b937e0ded 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -8,6 +8,7 @@
 #include <linux/fault-inject.h>
 #include <linux/pm_runtime.h>
 #include <linux/suspend.h>
+#include <linux/vgaarb.h>
 
 #include <drm/drm_managed.h>
 #include <drm/ttm/ttm_placement.h>
@@ -300,6 +301,7 @@ static int pci_acpi_aux_power_setup(struct xe_device *xe)
 
 static void xe_pm_vrsr_init(struct xe_device *xe)
 {
+	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
 	int ret;
 
 	/* Check if platform support d3cold vrsr */
@@ -309,6 +311,9 @@ static void xe_pm_vrsr_init(struct xe_device *xe)
 	if (!xe_pm_vrsr_capable(xe))
 		return;
 
+	if (pdev != vga_default_device() || !xe_display_connected(xe))
+		return;
+
 	/*
 	 * If the VRSR initialization fails, the device will proceed with the regular
 	 * D3 Cold flow
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 08/12] drm/xe: Add PCIe ACPI Aux Power notifier
  2025-04-01 15:32 [PATCH 00/12] VRAM Self Refresh Anshuman Gupta
                   ` (6 preceding siblings ...)
  2025-04-01 15:32 ` [PATCH 07/12] drm/xe/vrsr: Enable VRSR on default VGA boot device Anshuman Gupta
@ 2025-04-01 15:32 ` Anshuman Gupta
  2025-04-01 15:32 ` [PATCH 09/12] drm/xe/vrsr: Refactor d3cold.allowed to a enum Anshuman Gupta
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 42+ messages in thread
From: Anshuman Gupta @ 2025-04-01 15:32 UTC (permalink / raw)
  To: intel-xe, linux-acpi, linux-pci
  Cc: rafael, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, anshuman.gupta, varun.gupta,
	ville.syrjala, uma.shankar

Register PCIe ACPI notifier block and mark vrsr_capable
false in the notifier callback. This will ensure that
BMG GPU does not explode if any other PCIe child device
(under same Root Port) aux power request returns
with No main power removal.

Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h |  3 ++
 drivers/gpu/drm/xe/xe_pm.c           | 41 ++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index fd9dea207580..9aacd5af7d0f 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -517,6 +517,9 @@ struct xe_device {
 		struct mutex lock;
 	} d3cold;
 
+	/** @nb: PCI ACPI Aux power notifier */
+	struct notifier_block nb;
+
 	/** @pmt: Support the PMT driver callback interface */
 	struct {
 		/** @pmt.lock: protect access for telemetry data */
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index 364b937e0ded..ea93923d6671 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -87,6 +87,41 @@ static struct lockdep_map xe_pm_runtime_nod3cold_map = {
 };
 #endif
 
+static int aux_pwr_notifier(struct notifier_block *nb,
+			    unsigned long val, void *data)
+{
+	struct pci_dev *root_pdev = data;
+	struct pci_dev *pdev;
+	struct xe_device *xe;
+
+	xe = container_of(nb, struct xe_device, nb);
+
+	pdev = pcie_find_root_port(to_pci_dev(xe->drm.dev));
+	if (!pdev)
+		return -EINVAL;
+
+	if (root_pdev != pdev)
+		return 0;
+
+	xe_pm_runtime_get(xe);
+
+	if (val == ACPI_NO_MAIN_PW_REMOVAL) {
+		drm_err(&xe->drm, "PCIe core blocked the removal of Main Supply\n");
+		xe->d3cold.vrsr_capable = false;
+	}
+
+	xe_pm_runtime_put(xe);
+
+	return 0;
+}
+
+static void xe_pm_vrsr_fini(void *arg)
+{
+	struct xe_device *xe = arg;
+
+	pci_acpi_unregister_aux_power_notifier(&xe->nb);
+}
+
 /**
  * xe_rpm_reclaim_safe() - Whether runtime resume can be done from reclaim context
  * @xe: The xe device.
@@ -295,6 +330,12 @@ static int pci_acpi_aux_power_setup(struct xe_device *xe)
 		return ret;
 
 	ret = pci_acpi_add_perst_assertion_delay(root_pdev, perst_delay);
+	if (ret)
+		return ret;
+
+	xe->nb.notifier_call = aux_pwr_notifier;
+	pci_acpi_register_aux_power_notifier(&xe->nb);
+	devm_add_action_or_reset(xe->drm.dev, xe_pm_vrsr_fini, xe);
 
 	return ret;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 09/12] drm/xe/vrsr: Refactor d3cold.allowed to a enum
  2025-04-01 15:32 [PATCH 00/12] VRAM Self Refresh Anshuman Gupta
                   ` (7 preceding siblings ...)
  2025-04-01 15:32 ` [PATCH 08/12] drm/xe: Add PCIe ACPI Aux Power notifier Anshuman Gupta
@ 2025-04-01 15:32 ` Anshuman Gupta
  2025-04-01 15:32 ` [PATCH 10/12] drm/xe/pm: D3Cold target state Anshuman Gupta
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 42+ messages in thread
From: Anshuman Gupta @ 2025-04-01 15:32 UTC (permalink / raw)
  To: intel-xe, linux-acpi, linux-pci
  Cc: rafael, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, anshuman.gupta, varun.gupta,
	ville.syrjala, uma.shankar

Add xe_d3_state enum to add support for VRAM Self Refresh
d3cold state.

Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
---
 drivers/gpu/drm/xe/display/xe_display.c |  6 +++---
 drivers/gpu/drm/xe/xe_device_types.h    |  5 +++--
 drivers/gpu/drm/xe/xe_pci.c             |  6 +++---
 drivers/gpu/drm/xe/xe_pm.c              | 16 ++++++++--------
 drivers/gpu/drm/xe/xe_pm.h              |  8 +++++++-
 5 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
index a7a50f48f1c5..7877c2d61618 100644
--- a/drivers/gpu/drm/xe/display/xe_display.c
+++ b/drivers/gpu/drm/xe/display/xe_display.c
@@ -433,7 +433,7 @@ void xe_display_pm_runtime_suspend(struct xe_device *xe)
 	if (!xe->info.probe_display)
 		return;
 
-	if (xe->d3cold.allowed) {
+	if (xe->d3cold.target_state) {
 		xe_display_enable_d3cold(xe);
 		return;
 	}
@@ -459,7 +459,7 @@ void xe_display_pm_runtime_suspend_late(struct xe_device *xe)
 	if (!xe->info.probe_display)
 		return;
 
-	if (xe->d3cold.allowed)
+	if (xe->d3cold.target_state)
 		xe_display_pm_suspend_late(xe);
 
 	/*
@@ -537,7 +537,7 @@ void xe_display_pm_runtime_resume(struct xe_device *xe)
 	if (!xe->info.probe_display)
 		return;
 
-	if (xe->d3cold.allowed) {
+	if (xe->d3cold.target_state) {
 		xe_display_disable_d3cold(xe);
 		return;
 	}
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 9aacd5af7d0f..9d97f2c84c33 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -20,6 +20,7 @@
 #include "xe_memirq_types.h"
 #include "xe_oa_types.h"
 #include "xe_platform_types.h"
+#include "xe_pm.h"
 #include "xe_pmu_types.h"
 #include "xe_pt_types.h"
 #include "xe_sriov_types.h"
@@ -496,8 +497,8 @@ struct xe_device {
 		/** @d3cold.capable: Indicates if root port is d3cold capable */
 		bool capable;
 
-		/** @d3cold.allowed: Indicates if d3cold is a valid device state */
-		bool allowed;
+		/** @d3cold.target_state: Indicates d3cold target state */
+		enum xe_d3_state target_state;
 
 		/** @d3cold.vrsr_capable: Indicates if d3cold VRAM Self Refresh is supported */
 		bool vrsr_capable;
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 70d4827f5821..fa2d43395129 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -976,7 +976,7 @@ static int xe_pci_runtime_suspend(struct device *dev)
 
 	pci_save_state(pdev);
 
-	if (xe->d3cold.allowed) {
+	if (xe->d3cold.target_state) {
 		d3cold_toggle(pdev, D3COLD_ENABLE);
 		pci_disable_device(pdev);
 		pci_ignore_hotplug(pdev);
@@ -1001,7 +1001,7 @@ static int xe_pci_runtime_resume(struct device *dev)
 
 	pci_restore_state(pdev);
 
-	if (xe->d3cold.allowed) {
+	if (xe->d3cold.target_state) {
 		err = pci_enable_device(pdev);
 		if (err)
 			return err;
@@ -1017,7 +1017,7 @@ static int xe_pci_runtime_idle(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct xe_device *xe = pdev_to_xe_device(pdev);
 
-	xe_pm_d3cold_allowed_toggle(xe);
+	xe_pm_d3cold_target_state_toggle(xe);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index ea93923d6671..d4149a2eace7 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -585,7 +585,7 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
 
 	xe_display_pm_runtime_suspend(xe);
 
-	if (xe->d3cold.allowed) {
+	if (xe->d3cold.target_state) {
 		err = xe_bo_evict_all(xe);
 		if (err)
 			goto out_resume;
@@ -632,7 +632,7 @@ int xe_pm_runtime_resume(struct xe_device *xe)
 
 	xe_rpm_lockmap_acquire(xe);
 
-	if (xe->d3cold.allowed) {
+	if (xe->d3cold.target_state) {
 		err = xe_pcode_ready(xe, true);
 		if (err)
 			goto out;
@@ -655,7 +655,7 @@ int xe_pm_runtime_resume(struct xe_device *xe)
 
 	xe_display_pm_runtime_resume(xe);
 
-	if (xe->d3cold.allowed) {
+	if (xe->d3cold.target_state) {
 		err = xe_bo_restore_user(xe);
 		if (err)
 			goto out;
@@ -897,13 +897,13 @@ int xe_pm_set_vram_threshold(struct xe_device *xe, u32 threshold)
 }
 
 /**
- * xe_pm_d3cold_allowed_toggle - Check conditions to toggle d3cold.allowed
+ * xe_pm_d3cold_target_state_toggle - Check conditions to toggle target_state
  * @xe: xe device instance
  *
  * To be called during runtime_pm idle callback.
  * Check for all the D3Cold conditions ahead of runtime suspend.
  */
-void xe_pm_d3cold_allowed_toggle(struct xe_device *xe)
+void xe_pm_d3cold_target_state_toggle(struct xe_device *xe)
 {
 	struct ttm_resource_manager *man;
 	u32 total_vram_used_mb = 0;
@@ -911,7 +911,7 @@ void xe_pm_d3cold_allowed_toggle(struct xe_device *xe)
 	int i;
 
 	if (!xe->d3cold.capable) {
-		xe->d3cold.allowed = false;
+		xe->d3cold.target_state = XE_D3HOT;
 		return;
 	}
 
@@ -926,9 +926,9 @@ void xe_pm_d3cold_allowed_toggle(struct xe_device *xe)
 	mutex_lock(&xe->d3cold.lock);
 
 	if (total_vram_used_mb < xe->d3cold.vram_threshold)
-		xe->d3cold.allowed = true;
+		xe->d3cold.target_state = XE_D3COLD_OFF;
 	else
-		xe->d3cold.allowed = false;
+		xe->d3cold.target_state = XE_D3HOT;
 
 	mutex_unlock(&xe->d3cold.lock);
 }
diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
index 2b5df31db4c4..803c82f50d77 100644
--- a/drivers/gpu/drm/xe/xe_pm.h
+++ b/drivers/gpu/drm/xe/xe_pm.h
@@ -12,6 +12,12 @@
 
 struct xe_device;
 
+enum xe_d3_state {
+	XE_D3HOT = 0,
+	XE_D3COLD_VRSR,
+	XE_D3COLD_OFF,
+};
+
 int xe_pm_suspend(struct xe_device *xe);
 int xe_pm_resume(struct xe_device *xe);
 
@@ -30,7 +36,7 @@ void xe_pm_runtime_get_noresume(struct xe_device *xe);
 bool xe_pm_runtime_resume_and_get(struct xe_device *xe);
 void xe_pm_assert_unbounded_bridge(struct xe_device *xe);
 int xe_pm_set_vram_threshold(struct xe_device *xe, u32 threshold);
-void xe_pm_d3cold_allowed_toggle(struct xe_device *xe);
+void xe_pm_d3cold_target_state_toggle(struct xe_device *xe);
 bool xe_rpm_reclaim_safe(const struct xe_device *xe);
 struct task_struct *xe_pm_read_callback_task(struct xe_device *xe);
 int xe_pm_module_init(void);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 10/12] drm/xe/pm: D3Cold target state
  2025-04-01 15:32 [PATCH 00/12] VRAM Self Refresh Anshuman Gupta
                   ` (8 preceding siblings ...)
  2025-04-01 15:32 ` [PATCH 09/12] drm/xe/vrsr: Refactor d3cold.allowed to a enum Anshuman Gupta
@ 2025-04-01 15:32 ` Anshuman Gupta
  2025-04-02 10:28   ` [10/12] " Poosa, Karthik
  2025-04-01 15:32 ` [PATCH 11/12] drm/xe/vrsr: Enable VRSR Anshuman Gupta
  2025-04-01 15:32 ` [PATCH 12/12] drm/xe/vrsr: Introduce a debugfs node named vrsr_capable Anshuman Gupta
  11 siblings, 1 reply; 42+ messages in thread
From: Anshuman Gupta @ 2025-04-01 15:32 UTC (permalink / raw)
  To: intel-xe, linux-acpi, linux-pci
  Cc: rafael, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, anshuman.gupta, varun.gupta,
	ville.syrjala, uma.shankar

Trade-off D3Cold target state based upon current vram usage.
If vram usage is greater than vram_d3cold_threshold and GPU
is VRSR capable target D3Cold state is D3Cold-VRSR
otherwise target state is D3Cold-Off.

Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
---
 drivers/gpu/drm/xe/xe_pm.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index d4149a2eace7..5db9313ae269 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -148,6 +148,14 @@ static void xe_rpm_lockmap_release(const struct xe_device *xe)
 			 &xe_pm_runtime_d3cold_map);
 }
 
+static void xe_pm_suspend_prepare(struct xe_device *xe)
+{
+	if (pm_suspend_target_state == PM_SUSPEND_TO_IDLE)
+		xe_pm_d3cold_allowed_toggle(xe);
+	else
+		xe->d3cold.target_state = XE_D3COLD_OFF;
+}
+
 /**
  * xe_pm_suspend - Helper for System suspend, i.e. S0->S3 / S0->S2idle
  * @xe: xe device instance
@@ -163,6 +171,8 @@ int xe_pm_suspend(struct xe_device *xe)
 	drm_dbg(&xe->drm, "Suspending device\n");
 	trace_xe_pm_suspend(xe, __builtin_return_address(0));
 
+	xe_pm_suspend_prepare(xe);
+
 	err = xe_pxp_pm_suspend(xe->pxp);
 	if (err)
 		goto err;
@@ -927,10 +937,14 @@ void xe_pm_d3cold_target_state_toggle(struct xe_device *xe)
 
 	if (total_vram_used_mb < xe->d3cold.vram_threshold)
 		xe->d3cold.target_state = XE_D3COLD_OFF;
+	else if (xe->d3cold.vrsr_capable)
+		xe->d3cold.target_state = XE_D3COLD_VRSR;
 	else
 		xe->d3cold.target_state = XE_D3HOT;
 
 	mutex_unlock(&xe->d3cold.lock);
+
+	drm_dbg(&xe->drm, "Xe d3cold target state %d\n", xe->d3cold.allowed);
 }
 
 /**
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 11/12] drm/xe/vrsr: Enable VRSR
  2025-04-01 15:32 [PATCH 00/12] VRAM Self Refresh Anshuman Gupta
                   ` (9 preceding siblings ...)
  2025-04-01 15:32 ` [PATCH 10/12] drm/xe/pm: D3Cold target state Anshuman Gupta
@ 2025-04-01 15:32 ` Anshuman Gupta
  2025-04-01 15:32 ` [PATCH 12/12] drm/xe/vrsr: Introduce a debugfs node named vrsr_capable Anshuman Gupta
  11 siblings, 0 replies; 42+ messages in thread
From: Anshuman Gupta @ 2025-04-01 15:32 UTC (permalink / raw)
  To: intel-xe, linux-acpi, linux-pci
  Cc: rafael, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, anshuman.gupta, varun.gupta,
	ville.syrjala, uma.shankar

Enabling VRSR in runtime suspend and also in System wide suspend.
Also fix couple of typo in xe_pm.c.

Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
---
 drivers/gpu/drm/xe/xe_pci.c |  4 +--
 drivers/gpu/drm/xe/xe_pm.c  | 51 +++++++++++++++++++++++++++----------
 2 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index fa2d43395129..3317d475be79 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -925,7 +925,7 @@ static int xe_pci_suspend(struct device *dev)
 
 	/*
 	 * Enabling D3Cold is needed for S2Idle/S0ix.
-	 * It is save to allow here since xe_pm_suspend has evicted
+	 * It is safe to allow here since xe_pm_suspend has evicted
 	 * the local memory and the direct complete optimization is disabled.
 	 */
 	d3cold_toggle(pdev, D3COLD_ENABLE);
@@ -942,7 +942,7 @@ static int xe_pci_resume(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	int err;
 
-	/* Give back the D3Cold decision to the runtime P M*/
+	/* Give back the D3Cold decision to the runtime PM */
 	d3cold_toggle(pdev, D3COLD_DISABLE);
 
 	err = pci_set_power_state(pdev, PCI_D0);
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index 5db9313ae269..41b59c8b31b3 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -151,7 +151,7 @@ static void xe_rpm_lockmap_release(const struct xe_device *xe)
 static void xe_pm_suspend_prepare(struct xe_device *xe)
 {
 	if (pm_suspend_target_state == PM_SUSPEND_TO_IDLE)
-		xe_pm_d3cold_allowed_toggle(xe);
+		xe_pm_d3cold_target_state_toggle(xe);
 	else
 		xe->d3cold.target_state = XE_D3COLD_OFF;
 }
@@ -182,10 +182,12 @@ int xe_pm_suspend(struct xe_device *xe)
 
 	xe_display_pm_suspend(xe);
 
-	/* FIXME: Super racey... */
-	err = xe_bo_evict_all(xe);
-	if (err)
-		goto err_pxp;
+	if (xe->d3cold.target_state == XE_D3COLD_OFF) {
+		/* FIXME: Super racey... */
+		err = xe_bo_evict_all(xe);
+		if (err)
+			goto err_pxp;
+	}
 
 	for_each_gt(gt, xe, id) {
 		err = xe_gt_suspend(gt);
@@ -197,6 +199,12 @@ int xe_pm_suspend(struct xe_device *xe)
 
 	xe_display_pm_suspend_late(xe);
 
+	if (xe->d3cold.target_state == XE_D3COLD_VRSR) {
+		err = xe_pm_vrsr_enable(xe, true);
+			if (err)
+				goto err_display;
+	}
+
 	drm_dbg(&xe->drm, "Device suspended\n");
 	return 0;
 
@@ -238,9 +246,11 @@ int xe_pm_resume(struct xe_device *xe)
 	 * This only restores pinned memory which is the memory required for the
 	 * GT(s) to resume.
 	 */
-	err = xe_bo_restore_kernel(xe);
-	if (err)
-		goto err;
+	if (xe->d3cold.target_state == XE_D3COLD_OFF) {
+		err = xe_bo_restore_kernel(xe);
+		if (err)
+			goto err;
+	}
 
 	xe_irq_resume(xe);
 
@@ -249,9 +259,11 @@ int xe_pm_resume(struct xe_device *xe)
 
 	xe_display_pm_resume(xe);
 
-	err = xe_bo_restore_user(xe);
-	if (err)
-		goto err;
+	if (xe->d3cold.target_state == XE_D3COLD_OFF) {
+		err = xe_bo_restore_user(xe);
+		if (err)
+			goto err;
+	}
 
 	xe_pxp_pm_resume(xe->pxp);
 
@@ -595,7 +607,7 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
 
 	xe_display_pm_runtime_suspend(xe);
 
-	if (xe->d3cold.target_state) {
+	if (xe->d3cold.target_state == XE_D3COLD_OFF) {
 		err = xe_bo_evict_all(xe);
 		if (err)
 			goto out_resume;
@@ -611,6 +623,14 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
 
 	xe_display_pm_runtime_suspend_late(xe);
 
+	if (xe->d3cold.target_state == XE_D3COLD_VRSR) {
+		err = xe_pm_vrsr_enable(xe, true);
+			if (err) {
+				drm_err(&xe->drm, "Failed to enable VRSR: %d\n", err);
+				goto out_resume;
+			}
+	}
+
 	xe_rpm_lockmap_release(xe);
 	xe_pm_write_callback_task(xe, NULL);
 	return 0;
@@ -642,7 +662,7 @@ int xe_pm_runtime_resume(struct xe_device *xe)
 
 	xe_rpm_lockmap_acquire(xe);
 
-	if (xe->d3cold.target_state) {
+	if (xe->d3cold.target_state == XE_D3COLD_OFF) {
 		err = xe_pcode_ready(xe, true);
 		if (err)
 			goto out;
@@ -658,6 +678,9 @@ int xe_pm_runtime_resume(struct xe_device *xe)
 			goto out;
 	}
 
+	if (xe->d3cold.target_state == XE_D3COLD_VRSR)
+		xe_display_pm_resume_early(xe);
+
 	xe_irq_resume(xe);
 
 	for_each_gt(gt, xe, id)
@@ -665,7 +688,7 @@ int xe_pm_runtime_resume(struct xe_device *xe)
 
 	xe_display_pm_runtime_resume(xe);
 
-	if (xe->d3cold.target_state) {
+	if (xe->d3cold.target_state == XE_D3COLD_OFF) {
 		err = xe_bo_restore_user(xe);
 		if (err)
 			goto out;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 12/12] drm/xe/vrsr: Introduce a debugfs node named vrsr_capable
  2025-04-01 15:32 [PATCH 00/12] VRAM Self Refresh Anshuman Gupta
                   ` (10 preceding siblings ...)
  2025-04-01 15:32 ` [PATCH 11/12] drm/xe/vrsr: Enable VRSR Anshuman Gupta
@ 2025-04-01 15:32 ` Anshuman Gupta
  11 siblings, 0 replies; 42+ messages in thread
From: Anshuman Gupta @ 2025-04-01 15:32 UTC (permalink / raw)
  To: intel-xe, linux-acpi, linux-pci
  Cc: rafael, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, anshuman.gupta, varun.gupta,
	ville.syrjala, uma.shankar

From: Badal Nilawar <badal.nilawar@intel.com>

Add a debugfs node named vrsr_capable to check if the device supports
VRSR

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
---
 drivers/gpu/drm/xe/xe_debugfs.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
index d0503959a8ed..9f8d635c750c 100644
--- a/drivers/gpu/drm/xe/xe_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_debugfs.c
@@ -191,6 +191,23 @@ static const struct file_operations wedged_mode_fops = {
 	.write = wedged_mode_set,
 };
 
+static ssize_t vrsr_capable_show(struct file *f, char __user *ubuf,
+				size_t size, loff_t *pos)
+{
+	struct xe_device *xe = file_inode(f)->i_private;
+	char buf[32];
+	int len = 0;
+
+	len = scnprintf(buf, sizeof(buf), "%s\n", xe->d3cold.vrsr_capable ? "true" : "false");
+
+	return simple_read_from_buffer(ubuf, size, pos, buf, len);
+}
+
+static const struct file_operations vrsr_capable_fops = {
+	.owner = THIS_MODULE,
+	.read = vrsr_capable_show,
+};
+
 void xe_debugfs_register(struct xe_device *xe)
 {
 	struct ttm_device *bdev = &xe->ttm;
@@ -211,6 +228,9 @@ void xe_debugfs_register(struct xe_device *xe)
 	debugfs_create_file("wedged_mode", 0600, root, xe,
 			    &wedged_mode_fops);
 
+	debugfs_create_file("vrsr_capable", 0400, root, xe,
+			    &vrsr_capable_fops);
+
 	for (mem_type = XE_PL_VRAM0; mem_type <= XE_PL_VRAM1; ++mem_type) {
 		man = ttm_manager_type(bdev, mem_type);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 01/12] PCI/ACPI: Add D3cold Aux Power Limit_DSM method
  2025-04-01 15:32 ` [PATCH 01/12] PCI/ACPI: Add D3cold Aux Power Limit_DSM method Anshuman Gupta
@ 2025-04-01 18:25   ` Bjorn Helgaas
  2025-04-02 10:59     ` Rafael J. Wysocki
  2025-04-03  5:25     ` Gupta, Anshuman
  0 siblings, 2 replies; 42+ messages in thread
From: Bjorn Helgaas @ 2025-04-01 18:25 UTC (permalink / raw)
  To: Anshuman Gupta
  Cc: intel-xe, linux-acpi, linux-pci, rafael, lenb, bhelgaas,
	ilpo.jarvinen, lucas.demarchi, rodrigo.vivi, badal.nilawar,
	varun.gupta, ville.syrjala, uma.shankar

On Tue, Apr 01, 2025 at 09:02:14PM +0530, Anshuman Gupta wrote:
> Implement _DSM method 10 and _DSM Method 11 as per PCI firmware specs
> section 4.6.10 Rev 3.3.

Thanks for splitting this into two patches.  But I think now this only
implements function 10 (0x0a), so this sentence needs to be updated.

I would write this consistently as "0x0a" or "0Ah" to match the spec
description.  I don't think the spec ever uses "10".

> Note that this implementation assumes only a single device below the
> Downstream Port will request for Aux Power Limit under a given
> Root Port because it does not track and aggregate requests
> from all child devices below the Downstream Port as required
> by Section 4.6.10 Rev 3.3.
> 
> One possible mitigation would be only allowing only first PCIe
> Non-Bridge Endpoint Function 0 driver to call_DSM method 10.
> 
> Signed-off-by: Varun Gupta <varun.gupta@intel.com>
> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> ---
>  drivers/pci/pci-acpi.c   | 78 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/pci-acpi.h |  6 ++++
>  2 files changed, 84 insertions(+)
> 
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index af370628e583..ebd49e43457e 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -1421,6 +1421,84 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev,
>  	ACPI_FREE(obj);
>  }
>  
> +/**
> + * pci_acpi_request_d3cold_aux_power - Request D3cold aux power via ACPI DSM
> + * @dev: PCI device instance
> + * @requested_power: Requested auxiliary power in milliwatts
> + *
> + * This function sends a request to the host BIOS via ACPI _DSM Function 10
> + * to grant the required D3Cold Auxiliary power for the specified PCI device.
> + * It evaluates the _DSM (Device Specific Method) to request the Auxiliary
> + * power and handles the response accordingly.
> + *
> + * This function shall be only called by 1st non-bridge Endpoint driver
> + * on Function 0. For a Multi-Function Device, the driver for Function 0 is
> + * required to report an aggregate power requirement covering all
> + * functions contained within the device.
> + *
> + * Return: Returns 0 on success and errno on failure.

Write all this in imperative mood, e.g.,

  Request auxiliary power while device is in D3cold ...

  Return 0 on success ...

> + */
> +int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
> +{
> +	union acpi_object in_obj = {
> +		.integer.type = ACPI_TYPE_INTEGER,
> +		.integer.value = requested_power,
> +	};
> +
> +	union acpi_object *out_obj;
> +	acpi_handle handle;
> +	int result, ret = -EINVAL;
> +
> +	if (!dev || !ACPI_HANDLE(&dev->dev))
> +		return -EINVAL;
> +
> +	handle = ACPI_HANDLE(&dev->dev);

This needs an acpi_check_dsm() call to find out whether the platform
supports DSM_PCI_D3COLD_AUX_POWER_LIMIT.

We have several _DSM calls that *should* do this, but unfortunately
they don't do it yet, so they're not good examples to copy.

> +	out_obj = acpi_evaluate_dsm_typed(handle,
> +					  &pci_acpi_dsm_guid,
> +					  4,
> +					  DSM_PCI_D3COLD_AUX_POWER_LIMIT,
> +					  &in_obj,
> +					  ACPI_TYPE_INTEGER);
> +	if (!out_obj)
> +		return -EINVAL;
> +
> +	result = out_obj->integer.value;
> +
> +	switch (result) {
> +	case 0x0:
> +		dev_dbg(&dev->dev, "D3cold Aux Power %umW request denied\n",
> +			requested_power);

Use pci_dbg(dev), pci_info(dev), etc.

> +		break;
> +	case 0x1:
> +		dev_info(&dev->dev, "D3cold Aux Power request granted: %umW\n",
> +			 requested_power);
> +		ret = 0;
> +		break;
> +	case 0x2:
> +		dev_info(&dev->dev, "D3cold Aux Power: Main power won't be removed\n");
> +		ret = -EBUSY;
> +		break;
> +	default:
> +		if (result >= 0x11 && result <= 0x1F) {
> +			int retry_interval = result & 0xF;
> +
> +			dev_warn(&dev->dev, "D3cold Aux Power request needs retry."
> +				 "Interval: %d seconds.\n", retry_interval);
> +			msleep(retry_interval * 1000);

It doesn't seem right to me to do this sleep internally because it
means this interface might take up to 15 seconds to return, and the
caller has no idea what is happening during that time.

This seems like it should be done in the driver, which can decide
*whether* it wants to sleep, and if it does sleep, it may give a nice
indication to the user.

Of course, that would mean returning some kind of retry interval
information to the caller.

> +			ret = -EAGAIN;
> +		} else {
> +			dev_err(&dev->dev, "D3cold Aux Power: Reserved or "
> +				"unsupported response: 0x%x.\n", result);

Drop periods at end of messages.

> +		}
> +		break;
> +	}
> +
> +	ACPI_FREE(out_obj);
> +	return ret;
> +}
> +EXPORT_SYMBOL(pci_acpi_request_d3cold_aux_power);

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method
  2025-04-01 15:32 ` [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method Anshuman Gupta
@ 2025-04-01 18:30   ` Bjorn Helgaas
  2025-04-03  5:59     ` Gupta, Anshuman
  2025-04-02 11:06   ` Rafael J. Wysocki
  1 sibling, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2025-04-01 18:30 UTC (permalink / raw)
  To: Anshuman Gupta
  Cc: intel-xe, linux-acpi, linux-pci, rafael, lenb, bhelgaas,
	ilpo.jarvinen, lucas.demarchi, rodrigo.vivi, badal.nilawar,
	varun.gupta, ville.syrjala, uma.shankar

On Tue, Apr 01, 2025 at 09:02:15PM +0530, Anshuman Gupta wrote:
> Implement _DSM Method 11 as per PCI firmware specs
> section 4.6.11 Rev 3.3.

"PCI Firmware r3.3, sec 4.6.11" so the citation is major to minor.

"0xb" or "0Bh" to match spec usage.

> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> ---
>  drivers/pci/pci-acpi.c   | 53 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/pci-acpi.h |  7 ++++++
>  2 files changed, 60 insertions(+)
> 
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index ebd49e43457e..04149f037664 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -1499,6 +1499,59 @@ int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
>  }
>  EXPORT_SYMBOL(pci_acpi_request_d3cold_aux_power);
>  
> +/**
> + * pci_acpi_add_perst_assertion_delay - Request PERST# delay via ACPI DSM
> + * @dev: PCI device instance
> + * @delay_us: Requested delay_us
> + *
> + * This function sends a request to the host BIOS via ACPI _DSM to grant the
> + * required PERST# delay for the specified PCI device. It evaluates the _DSM
> + * to request the PERST# delay and handles the response accordingly.
> + *
> + * Return: returns 0 on success and errno on failure.
> + */
> +int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us)
> +{
> +	union acpi_object in_obj = {
> +		.integer.type = ACPI_TYPE_INTEGER,
> +		.integer.value = delay_us,
> +	};
> +
> +	union acpi_object *out_obj;
> +	acpi_handle handle;
> +	int result, ret = -EINVAL;
> +
> +	if (!dev || !ACPI_HANDLE(&dev->dev))
> +		return -EINVAL;
> +
> +	handle = ACPI_HANDLE(&dev->dev);

acpi_check_dsm().

> +	out_obj = acpi_evaluate_dsm_typed(handle, &pci_acpi_dsm_guid, 4,
> +					  DSM_PCI_PERST_ASSERTION_DELAY,
> +					  &in_obj, ACPI_TYPE_INTEGER);
> +	if (!out_obj)
> +		return -EINVAL;
> +
> +	result = out_obj->integer.value;
> +
> +	if (result == delay_us) {
> +		dev_info(&dev->dev, "PERST# Assertion Delay set to"
> +			 "%u microseconds\n", delay_us);

pci_info().

Join these into a single string, even though they won't fit in a line
without wrapping.  This is to make them easier to grep for when a user
reports seeing the message.  (Do this on the previous patch too, where
I forgot to mention it.)

> +		ret = 0;
> +	} else if (result == 0) {
> +		dev_warn(&dev->dev, "PERST# Assertion Delay request failed,"
> +			 "no previous valid request\n");
> +	} else {
> +		dev_warn(&dev->dev,
> +			 "PERST# Assertion Delay request failed"
> +			 "Previous valid delay: %u microseconds\n", result);
> +	}
> +
> +	ACPI_FREE(out_obj);
> +	return ret;
> +}
> +EXPORT_SYMBOL(pci_acpi_add_perst_assertion_delay);

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 06/12] drm/xe/vrsr: Initialize VRSR feature
  2025-04-01 15:32 ` [PATCH 06/12] drm/xe/vrsr: Initialize VRSR feature Anshuman Gupta
@ 2025-04-01 19:56   ` Bjorn Helgaas
  2025-04-03  6:09     ` Gupta, Anshuman
  0 siblings, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2025-04-01 19:56 UTC (permalink / raw)
  To: Anshuman Gupta
  Cc: intel-xe, linux-acpi, linux-pci, rafael, lenb, bhelgaas,
	ilpo.jarvinen, lucas.demarchi, rodrigo.vivi, badal.nilawar,
	varun.gupta, ville.syrjala, uma.shankar

On Tue, Apr 01, 2025 at 09:02:19PM +0530, Anshuman Gupta wrote:
> From: Badal Nilawar <badal.nilawar@intel.com>
> 
> Initialize VRSR feature by requesting Auxilary Power and PERST# assertion
> delay. Include an API to enable VRSR feature.

s/Auxilary/Auxiliary/

I would include the name of the API directly.

> +#define     PCODE_D3_VRSR_PERST_SHIFT	16

PCODE_D3_VRSR_PERST_SHIFT is not used by this series; maybe omit it?

> +#define	    POWER_D3_VRSR_PSERST_MASK	REG_GENMASK(31, 16)

s/PSERST/PERST/ (looks like a typo?)

> +static void xe_pm_vrsr_init(struct xe_device *xe)
> +{
> +	int ret;
> +
> +	/* Check if platform support d3cold vrsr */

Nicer to consistently capitalize as "VRSR" in comments, commit
logs, and messages.

Similar with "D3cold" ("D3cold" is used ~100 times in the tree,
"D3Cold" ~20, mostly in xe).

> +	if (!xe->info.has_vrsr)
> +		return;
> +
> +	if (!xe_pm_vrsr_capable(xe))
> +		return;
> +
> +	/*
> +	 * If the VRSR initialization fails, the device will proceed with the regular
> +	 * D3 Cold flow
> +	 */
> +	ret = pci_acpi_aux_power_setup(xe);
> +	if (ret) {
> +		drm_info(&xe->drm, "VRSR capable %s\n", "No");

Kinda weird to use %s when the text is a known constant.

> +		return;
> +	}
> +
> +	xe->d3cold.vrsr_capable = true;
> +	drm_info(&xe->drm, "VRSR capable %s\n", "Yes");

Same.

> +}

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
  2025-04-01 15:32 ` [PATCH 03/12] PCI/ACPI: Add aux power grant notifier Anshuman Gupta
@ 2025-04-01 20:13   ` Bjorn Helgaas
  2025-04-02 11:23     ` Rafael J. Wysocki
  2025-04-03  7:56     ` Gupta, Anshuman
  0 siblings, 2 replies; 42+ messages in thread
From: Bjorn Helgaas @ 2025-04-01 20:13 UTC (permalink / raw)
  To: Anshuman Gupta
  Cc: intel-xe, linux-acpi, linux-pci, rafael, lenb, bhelgaas,
	ilpo.jarvinen, lucas.demarchi, rodrigo.vivi, badal.nilawar,
	varun.gupta, ville.syrjala, uma.shankar

On Tue, Apr 01, 2025 at 09:02:16PM +0530, Anshuman Gupta wrote:
> Adding a notifier to notify all PCIe child devices about the
> block main power removal. It is needed because theoretically
> multiple PCIe Endpoint devices on same Root Port
> can request for AUX power and _DSM method can return with
> 80000000h signifies that the hierarchy connected via
> the slot cannot support main power removal when in D3Cold.

I wish the spec used different language here.  "D3cold" *means* "main
power is removed" (PCIe r6.0, sec 5.3.1.4.2), so it doesn't make sense
to say that "the slot cannot support main power removal when in
D3cold".  If a device is in D3cold, its main power has been removed by
definition.

I suppose the spec is trying to say if the driver has called this _DSM
with 80000000h, it means the platform must not remove main power from
the device while the system is in S0?  Is that what you think it
means?

The 2h return value description says it "indicates that the platform
will not remove main power from the slot while the system is in S0,"
so I guess that must be it.

In this series, pci_acpi_aux_power_setup() only supplies a 16-bit
aux_pwr_limit value, so the driver cannot *request* that the platform
not remove main power.

So I guess the only scenario where the _DSM returns 80000000h is when
the platform itself or other devices prevent the removal of main
power.  And the point of the notifier is to tell devices that their
main power will never be removed while the system is in S0.  Is that
right?

> This may disrupt functionality of other child device.

What sort of disruption could happen?  I would think that if the _DSM
returns 80000000h, it just means the device will not have main power
removed, so it won't see that power state transition.  The only
"disruption" would be that we're using more power.

> Let's notify all PCIe devices requested Aux power resource
> and Let PCIe End Point driver handle it in its callback.

Wrap to fill 75 columns.

s/Adding/Add/
s/Let's notify/Notify/
s/and Let/and let/
s/End Point/Endpoint/ (several places here and below)

> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> ---
>  drivers/pci/pci-acpi.c   | 34 +++++++++++++++++++++++++++++++---
>  include/linux/pci-acpi.h | 13 +++++++++++++
>  2 files changed, 44 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index 04149f037664..d1ca1649e6e8 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -1421,6 +1421,32 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev,
>  	ACPI_FREE(obj);
>  }
>  
> +static BLOCKING_NOTIFIER_HEAD(pci_acpi_aux_power_notify_list);
> +
> +/**
> + * pci_acpi_register_aux_power_notifier - Register driver notifier
> + * @nb: notifier block
> + *
> + * This function shall be called by PCIe End Point device requested the Aux
> + * power resource in order to handle the any scenario gracefully when other
> + * child PCIe devices Aux power request returns with No main power removal.
> + * PCIe devices which register this notifier shall handle No main power
> + * removal scenario accordingly.

This would actually be called by the *driver* (not by the device).

Reword in imperative mood if possible.

> + *
> + * Return: Returns 0 on success and errno on failure.
> + */
> +int pci_acpi_register_aux_power_notifier(struct notifier_block *nb)
> +{
> +	return blocking_notifier_chain_register(&pci_acpi_aux_power_notify_list, nb);
> +}
> +EXPORT_SYMBOL_GPL(pci_acpi_register_aux_power_notifier);
> +
> +void pci_acpi_unregister_aux_power_notifier(struct notifier_block *nb)
> +{
> +	blocking_notifier_chain_unregister(&pci_acpi_aux_power_notify_list, nb);
> +}
> +EXPORT_SYMBOL_GPL(pci_acpi_unregister_aux_power_notifier);
> +
>  /**
>   * pci_acpi_request_d3cold_aux_power - Request D3cold aux power via ACPI DSM
>   * @dev: PCI device instance
> @@ -1466,17 +1492,19 @@ int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
>  	result = out_obj->integer.value;
>  
>  	switch (result) {
> -	case 0x0:
> +	case ACPI_AUX_PW_DENIED:

Add these constants in the patch that adds the _DSM.  Then this patch
will be just notifier-related code.

>  		dev_dbg(&dev->dev, "D3cold Aux Power %umW request denied\n",
>  			requested_power);
>  		break;
> -	case 0x1:
> +	case ACPI_AUX_PW_GRANTED:
>  		dev_info(&dev->dev, "D3cold Aux Power request granted: %umW\n",
>  			 requested_power);
>  		ret = 0;
>  		break;
> -	case 0x2:
> +	case ACPI_NO_MAIN_PW_REMOVAL:
>  		dev_info(&dev->dev, "D3cold Aux Power: Main power won't be removed\n");
> +		blocking_notifier_call_chain(&pci_acpi_aux_power_notify_list,
> +					     ACPI_NO_MAIN_PW_REMOVAL, dev);
>  		ret = -EBUSY;
>  		break;
>  	default:
> diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
> index 4b7373f91a9a..793b979af98b 100644
> --- a/include/linux/pci-acpi.h
> +++ b/include/linux/pci-acpi.h
> @@ -124,6 +124,10 @@ extern const guid_t pci_acpi_dsm_guid;
>  #define DSM_PCI_D3COLD_AUX_POWER_LIMIT		0x0A
>  #define DSM_PCI_PERST_ASSERTION_DELAY		0x0B
>  
> +#define ACPI_AUX_PW_DENIED			0x0
> +#define ACPI_AUX_PW_GRANTED			0x1
> +#define ACPI_NO_MAIN_PW_REMOVAL			0x2
> +
>  #ifdef CONFIG_PCIE_EDR
>  void pci_acpi_add_edr_notifier(struct pci_dev *pdev);
>  void pci_acpi_remove_edr_notifier(struct pci_dev *pdev);
> @@ -134,12 +138,21 @@ static inline void pci_acpi_remove_edr_notifier(struct pci_dev *pdev) { }
>  
>  int pci_acpi_set_companion_lookup_hook(struct acpi_device *(*func)(struct pci_dev *));
>  void pci_acpi_clear_companion_lookup_hook(void);
> +int pci_acpi_register_aux_power_notifier(struct notifier_block *nb);
> +void pci_acpi_unregister_aux_power_notifier(struct notifier_block *nb);
>  int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power);
>  int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us);
>  
>  #else	/* CONFIG_ACPI */
>  static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
>  static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
> +int pci_acpi_register_aux_power_notifier(struct notifier_block *nb)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +void pci_acpi_unregister_aux_power_notifier(struct notifier_block *nb) { }
> +
>  int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
>  {
>  	return -EOPNOTSUPP;
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [10/12] drm/xe/pm: D3Cold target state
  2025-04-01 15:32 ` [PATCH 10/12] drm/xe/pm: D3Cold target state Anshuman Gupta
@ 2025-04-02 10:28   ` Poosa, Karthik
  0 siblings, 0 replies; 42+ messages in thread
From: Poosa, Karthik @ 2025-04-02 10:28 UTC (permalink / raw)
  To: Anshuman Gupta, intel-xe, linux-acpi, linux-pci
  Cc: rafael, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, varun.gupta, ville.syrjala,
	uma.shankar


On 01-04-2025 21:02, Anshuman Gupta wrote:
> Trade-off D3Cold target state based upon current vram usage.
> If vram usage is greater than vram_d3cold_threshold and GPU
> is VRSR capable target D3Cold state is D3Cold-VRSR
> otherwise target state is D3Cold-Off.
>
> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_pm.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index d4149a2eace7..5db9313ae269 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -148,6 +148,14 @@ static void xe_rpm_lockmap_release(const struct xe_device *xe)
>   			 &xe_pm_runtime_d3cold_map);
>   }
>   
> +static void xe_pm_suspend_prepare(struct xe_device *xe)
> +{
> +	if (pm_suspend_target_state == PM_SUSPEND_TO_IDLE)
> +		xe_pm_d3cold_allowed_toggle(xe);
> +	else
> +		xe->d3cold.target_state = XE_D3COLD_OFF;
> +}
> +
>   /**
>    * xe_pm_suspend - Helper for System suspend, i.e. S0->S3 / S0->S2idle
>    * @xe: xe device instance
> @@ -163,6 +171,8 @@ int xe_pm_suspend(struct xe_device *xe)
>   	drm_dbg(&xe->drm, "Suspending device\n");
>   	trace_xe_pm_suspend(xe, __builtin_return_address(0));
>   
> +	xe_pm_suspend_prepare(xe);
> +
>   	err = xe_pxp_pm_suspend(xe->pxp);
>   	if (err)
>   		goto err;
> @@ -927,10 +937,14 @@ void xe_pm_d3cold_target_state_toggle(struct xe_device *xe)
>   
>   	if (total_vram_used_mb < xe->d3cold.vram_threshold)
>   		xe->d3cold.target_state = XE_D3COLD_OFF;
> +	else if (xe->d3cold.vrsr_capable)
> +		xe->d3cold.target_state = XE_D3COLD_VRSR;
>   	else
>   		xe->d3cold.target_state = XE_D3HOT;
>   
>   	mutex_unlock(&xe->d3cold.lock);
> +
> +	drm_dbg(&xe->drm, "Xe d3cold target state %d\n", xe->d3cold.allowed);
the variable should be xe->d3cold.target_state
>   }
>   
>   /**

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 01/12] PCI/ACPI: Add D3cold Aux Power Limit_DSM method
  2025-04-01 18:25   ` Bjorn Helgaas
@ 2025-04-02 10:59     ` Rafael J. Wysocki
  2025-04-03  5:25     ` Gupta, Anshuman
  1 sibling, 0 replies; 42+ messages in thread
From: Rafael J. Wysocki @ 2025-04-02 10:59 UTC (permalink / raw)
  To: Bjorn Helgaas, Anshuman Gupta
  Cc: intel-xe, linux-acpi, linux-pci, rafael, lenb, bhelgaas,
	ilpo.jarvinen, lucas.demarchi, rodrigo.vivi, badal.nilawar,
	varun.gupta, ville.syrjala, uma.shankar

On Tue, Apr 1, 2025 at 8:25 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, Apr 01, 2025 at 09:02:14PM +0530, Anshuman Gupta wrote:
> > Implement _DSM method 10 and _DSM Method 11 as per PCI firmware specs
> > section 4.6.10 Rev 3.3.
>
> Thanks for splitting this into two patches.  But I think now this only
> implements function 10 (0x0a), so this sentence needs to be updated.
>
> I would write this consistently as "0x0a" or "0Ah" to match the spec
> description.  I don't think the spec ever uses "10".
>
> > Note that this implementation assumes only a single device below the
> > Downstream Port will request for Aux Power Limit under a given
> > Root Port because it does not track and aggregate requests
> > from all child devices below the Downstream Port as required
> > by Section 4.6.10 Rev 3.3.

Why is it regarded as compliant, then?

Request aggregation is a known problem that has been addressed for a
couple of times (at least) in the kernel.  Why is it too hard to
address it here?

> > One possible mitigation would be only allowing only first PCIe
> > Non-Bridge Endpoint Function 0 driver to call_DSM method 10.

What about topologies where there is a switch with multiple downstream
ports below the Root Port?

> >
> > Signed-off-by: Varun Gupta <varun.gupta@intel.com>
> > Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
> > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > ---
> >  drivers/pci/pci-acpi.c   | 78 ++++++++++++++++++++++++++++++++++++++++
> >  include/linux/pci-acpi.h |  6 ++++
> >  2 files changed, 84 insertions(+)
> >
> > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > index af370628e583..ebd49e43457e 100644
> > --- a/drivers/pci/pci-acpi.c
> > +++ b/drivers/pci/pci-acpi.c
> > @@ -1421,6 +1421,84 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev,
> >       ACPI_FREE(obj);
> >  }
> >
> > +/**
> > + * pci_acpi_request_d3cold_aux_power - Request D3cold aux power via ACPI DSM
> > + * @dev: PCI device instance
> > + * @requested_power: Requested auxiliary power in milliwatts

How's the caller supposed to find out what value to use here?

> > + *
> > + * This function sends a request to the host BIOS via ACPI _DSM Function 10
> > + * to grant the required D3Cold Auxiliary power for the specified PCI device.
> > + * It evaluates the _DSM (Device Specific Method) to request the Auxiliary
> > + * power and handles the response accordingly.
> > + *
> > + * This function shall be only called by 1st non-bridge Endpoint driver
> > + * on Function 0. For a Multi-Function Device, the driver for Function 0 is
> > + * required to report an aggregate power requirement covering all
> > + * functions contained within the device.
> > + *
> > + * Return: Returns 0 on success and errno on failure.
>
> Write all this in imperative mood, e.g.,
>
>   Request auxiliary power while device is in D3cold ...
>
>   Return 0 on success ...
>
> > + */
> > +int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
> > +{
> > +     union acpi_object in_obj = {
> > +             .integer.type = ACPI_TYPE_INTEGER,
> > +             .integer.value = requested_power,
> > +     };
> > +
> > +     union acpi_object *out_obj;
> > +     acpi_handle handle;
> > +     int result, ret = -EINVAL;
> > +
> > +     if (!dev || !ACPI_HANDLE(&dev->dev))
> > +             return -EINVAL;
> > +
> > +     handle = ACPI_HANDLE(&dev->dev);

Well, ACPI_HANDLE() is not a simple macro, so I'd rather avoid using
it twice in a row for the same device.

What about

if (!dev)
        return -EINVAL;

handle = ACPI_HANDLE(&dev->dev);
if (!handle)
        return -EINVAL;

>
> This needs an acpi_check_dsm() call to find out whether the platform
> supports DSM_PCI_D3COLD_AUX_POWER_LIMIT.

Absolutely.

> We have several _DSM calls that *should* do this, but unfortunately
> they don't do it yet, so they're not good examples to copy.

Right.

> > +     out_obj = acpi_evaluate_dsm_typed(handle,
> > +                                       &pci_acpi_dsm_guid,
> > +                                       4,
> > +                                       DSM_PCI_D3COLD_AUX_POWER_LIMIT,
> > +                                       &in_obj,
> > +                                       ACPI_TYPE_INTEGER);
> > +     if (!out_obj)
> > +             return -EINVAL;
> > +
> > +     result = out_obj->integer.value;
> > +
> > +     switch (result) {
> > +     case 0x0:
> > +             dev_dbg(&dev->dev, "D3cold Aux Power %umW request denied\n",
> > +                     requested_power);
>
> Use pci_dbg(dev), pci_info(dev), etc.
>
> > +             break;
> > +     case 0x1:
> > +             dev_info(&dev->dev, "D3cold Aux Power request granted: %umW\n",
> > +                      requested_power);
> > +             ret = 0;
> > +             break;
> > +     case 0x2:
> > +             dev_info(&dev->dev, "D3cold Aux Power: Main power won't be removed\n");
> > +             ret = -EBUSY;
> > +             break;
> > +     default:
> > +             if (result >= 0x11 && result <= 0x1F) {
> > +                     int retry_interval = result & 0xF;
> > +
> > +                     dev_warn(&dev->dev, "D3cold Aux Power request needs retry."
> > +                              "Interval: %d seconds.\n", retry_interval);
> > +                     msleep(retry_interval * 1000);
>
> It doesn't seem right to me to do this sleep internally because it
> means this interface might take up to 15 seconds to return, and the
> caller has no idea what is happening during that time.

I entirely agree here.

If this is going to happen during system suspend, sleeping for seconds
is a total no-go.

> This seems like it should be done in the driver, which can decide
> *whether* it wants to sleep, and if it does sleep, it may give a nice
> indication to the user.

So during a system suspend in progress, this is rather hard to achieve.

I'd say that if the request is not granted right away, it is a failure
and we don't use D3cold.

> Of course, that would mean returning some kind of retry interval
> information to the caller.
>
> > +                     ret = -EAGAIN;
> > +             } else {
> > +                     dev_err(&dev->dev, "D3cold Aux Power: Reserved or "
> > +                             "unsupported response: 0x%x.\n", result);
>
> Drop periods at end of messages.
>
> > +             }
> > +             break;
> > +     }
> > +
> > +     ACPI_FREE(out_obj);
> > +     return ret;
> > +}
> > +EXPORT_SYMBOL(pci_acpi_request_d3cold_aux_power);

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method
  2025-04-01 15:32 ` [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method Anshuman Gupta
  2025-04-01 18:30   ` Bjorn Helgaas
@ 2025-04-02 11:06   ` Rafael J. Wysocki
  2025-04-02 14:21     ` Bjorn Helgaas
  1 sibling, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2025-04-02 11:06 UTC (permalink / raw)
  To: Anshuman Gupta
  Cc: intel-xe, linux-acpi, linux-pci, rafael, lenb, bhelgaas,
	ilpo.jarvinen, lucas.demarchi, rodrigo.vivi, badal.nilawar,
	varun.gupta, ville.syrjala, uma.shankar

On Tue, Apr 1, 2025 at 5:36 PM Anshuman Gupta <anshuman.gupta@intel.com> wrote:
>
> Implement _DSM Method 11 as per PCI firmware specs
> section 4.6.11 Rev 3.3.
>
> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>

I have basically the same comments as for the previous patch (in
addition to Bjorn's comments).

> ---
>  drivers/pci/pci-acpi.c   | 53 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/pci-acpi.h |  7 ++++++
>  2 files changed, 60 insertions(+)
>
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index ebd49e43457e..04149f037664 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -1499,6 +1499,59 @@ int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
>  }
>  EXPORT_SYMBOL(pci_acpi_request_d3cold_aux_power);
>
> +/**
> + * pci_acpi_add_perst_assertion_delay - Request PERST# delay via ACPI DSM
> + * @dev: PCI device instance
> + * @delay_us: Requested delay_us

How's the caller supposed to find out what value to use here?

> + *
> + * This function sends a request to the host BIOS via ACPI _DSM to grant the
> + * required PERST# delay for the specified PCI device. It evaluates the _DSM
> + * to request the PERST# delay and handles the response accordingly.
> + *
> + * Return: returns 0 on success and errno on failure.
> + */
> +int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us)
> +{
> +       union acpi_object in_obj = {
> +               .integer.type = ACPI_TYPE_INTEGER,
> +               .integer.value = delay_us,
> +       };
> +
> +       union acpi_object *out_obj;
> +       acpi_handle handle;
> +       int result, ret = -EINVAL;
> +
> +       if (!dev || !ACPI_HANDLE(&dev->dev))
> +               return -EINVAL;
> +
> +       handle = ACPI_HANDLE(&dev->dev);

Please use ACPI_HANDLE() once.

> +
> +       out_obj = acpi_evaluate_dsm_typed(handle, &pci_acpi_dsm_guid, 4,

This is something I haven't noticed in the previous patch, but also
applies to it.

Why is rev 4 of the interface hard-coded here?

> +                                         DSM_PCI_PERST_ASSERTION_DELAY,
> +                                         &in_obj, ACPI_TYPE_INTEGER);
> +       if (!out_obj)
> +               return -EINVAL;
> +
> +       result = out_obj->integer.value;
> +
> +       if (result == delay_us) {
> +               dev_info(&dev->dev, "PERST# Assertion Delay set to"
> +                        "%u microseconds\n", delay_us);
> +               ret = 0;
> +       } else if (result == 0) {
> +               dev_warn(&dev->dev, "PERST# Assertion Delay request failed,"
> +                        "no previous valid request\n");
> +       } else {
> +               dev_warn(&dev->dev,
> +                        "PERST# Assertion Delay request failed"
> +                        "Previous valid delay: %u microseconds\n", result);
> +       }
> +
> +       ACPI_FREE(out_obj);
> +       return ret;
> +}
> +EXPORT_SYMBOL(pci_acpi_add_perst_assertion_delay);

EXPORT_SYMBOL_GPL()?

> +
>  static void pci_acpi_set_external_facing(struct pci_dev *dev)
>  {
>         u8 val;
> diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
> index dbc4b0ed433c..4b7373f91a9a 100644
> --- a/include/linux/pci-acpi.h
> +++ b/include/linux/pci-acpi.h
> @@ -122,6 +122,7 @@ extern const guid_t pci_acpi_dsm_guid;
>  #define DSM_PCI_POWER_ON_RESET_DELAY           0x08
>  #define DSM_PCI_DEVICE_READINESS_DURATIONS     0x09
>  #define DSM_PCI_D3COLD_AUX_POWER_LIMIT         0x0A
> +#define DSM_PCI_PERST_ASSERTION_DELAY          0x0B
>
>  #ifdef CONFIG_PCIE_EDR
>  void pci_acpi_add_edr_notifier(struct pci_dev *pdev);
> @@ -134,6 +135,7 @@ static inline void pci_acpi_remove_edr_notifier(struct pci_dev *pdev) { }
>  int pci_acpi_set_companion_lookup_hook(struct acpi_device *(*func)(struct pci_dev *));
>  void pci_acpi_clear_companion_lookup_hook(void);
>  int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power);
> +int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us);
>
>  #else  /* CONFIG_ACPI */
>  static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
> @@ -142,6 +144,11 @@ int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
>  {
>         return -EOPNOTSUPP;
>  }
> +
> +int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us)
> +{
> +       return -EOPNOTSUPP;
> +}
>  #endif /* CONFIG_ACPI */
>
>  #endif /* _PCI_ACPI_H_ */
> --

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
  2025-04-01 20:13   ` Bjorn Helgaas
@ 2025-04-02 11:23     ` Rafael J. Wysocki
  2025-04-03 11:30       ` Gupta, Anshuman
  2025-04-03  7:56     ` Gupta, Anshuman
  1 sibling, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2025-04-02 11:23 UTC (permalink / raw)
  To: Bjorn Helgaas, Anshuman Gupta
  Cc: intel-xe, linux-acpi, linux-pci, rafael, lenb, bhelgaas,
	ilpo.jarvinen, lucas.demarchi, rodrigo.vivi, badal.nilawar,
	varun.gupta, ville.syrjala, uma.shankar

On Tue, Apr 1, 2025 at 10:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, Apr 01, 2025 at 09:02:16PM +0530, Anshuman Gupta wrote:
> > Adding a notifier to notify all PCIe child devices about the
> > block main power removal. It is needed because theoretically
> > multiple PCIe Endpoint devices on same Root Port
> > can request for AUX power and _DSM method can return with
> > 80000000h signifies that the hierarchy connected via
> > the slot cannot support main power removal when in D3Cold.
>
> I wish the spec used different language here.  "D3cold" *means* "main
> power is removed" (PCIe r6.0, sec 5.3.1.4.2), so it doesn't make sense
> to say that "the slot cannot support main power removal when in
> D3cold".  If a device is in D3cold, its main power has been removed by
> definition.
>
> I suppose the spec is trying to say if the driver has called this _DSM
> with 80000000h, it means the platform must not remove main power from
> the device while the system is in S0?  Is that what you think it
> means?
>
> The 2h return value description says it "indicates that the platform
> will not remove main power from the slot while the system is in S0,"
> so I guess that must be it.
>
> In this series, pci_acpi_aux_power_setup() only supplies a 16-bit
> aux_pwr_limit value, so the driver cannot *request* that the platform
> not remove main power.
>
> So I guess the only scenario where the _DSM returns 80000000h is when
> the platform itself or other devices prevent the removal of main
> power.  And the point of the notifier is to tell devices that their
> main power will never be removed while the system is in S0.  Is that
> right?
>
> > This may disrupt functionality of other child device.
>
> What sort of disruption could happen?  I would think that if the _DSM
> returns 80000000h, it just means the device will not have main power
> removed, so it won't see that power state transition.  The only
> "disruption" would be that we're using more power.
>
> > Let's notify all PCIe devices requested Aux power resource
> > and Let PCIe End Point driver handle it in its callback.
>
> Wrap to fill 75 columns.
>
> s/Adding/Add/
> s/Let's notify/Notify/
> s/and Let/and let/
> s/End Point/Endpoint/ (several places here and below)
>
> > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > ---
> >  drivers/pci/pci-acpi.c   | 34 +++++++++++++++++++++++++++++++---
> >  include/linux/pci-acpi.h | 13 +++++++++++++
> >  2 files changed, 44 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > index 04149f037664..d1ca1649e6e8 100644
> > --- a/drivers/pci/pci-acpi.c
> > +++ b/drivers/pci/pci-acpi.c
> > @@ -1421,6 +1421,32 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev,
> >       ACPI_FREE(obj);
> >  }
> >
> > +static BLOCKING_NOTIFIER_HEAD(pci_acpi_aux_power_notify_list);
> > +
> > +/**
> > + * pci_acpi_register_aux_power_notifier - Register driver notifier
> > + * @nb: notifier block
> > + *
> > + * This function shall be called by PCIe End Point device requested the Aux
> > + * power resource in order to handle the any scenario gracefully when other
> > + * child PCIe devices Aux power request returns with No main power removal.
> > + * PCIe devices which register this notifier shall handle No main power
> > + * removal scenario accordingly.
>
> This would actually be called by the *driver* (not by the device).

Apart from this, there seems to be a design issue here because it
won't notify every driver that has requested the Aux power, just the
ones that have also registered notifiers.

So this appears to be an opt-in from getting notifications on Aux
power request rejection responses to requests from other drivers
requesting Aux power for the same Root Port, but the changelog of the
first patch already claimed that the aggregation of requests was not
supported.  So if only one driver will be allowed to request the Aux
power for the given Root Port, why would the notifiers be necessary
after all?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method
  2025-04-02 11:06   ` Rafael J. Wysocki
@ 2025-04-02 14:21     ` Bjorn Helgaas
  2025-04-02 14:52       ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2025-04-02 14:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Anshuman Gupta, intel-xe, linux-acpi, linux-pci, lenb, bhelgaas,
	ilpo.jarvinen, lucas.demarchi, rodrigo.vivi, badal.nilawar,
	varun.gupta, ville.syrjala, uma.shankar

On Wed, Apr 02, 2025 at 01:06:42PM +0200, Rafael J. Wysocki wrote:
> On Tue, Apr 1, 2025 at 5:36 PM Anshuman Gupta <anshuman.gupta@intel.com> wrote:
> >
> > Implement _DSM Method 11 as per PCI firmware specs
> > section 4.6.11 Rev 3.3.

> > +int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us)
> > +{
> > +       union acpi_object in_obj = {
> > +               .integer.type = ACPI_TYPE_INTEGER,
> > +               .integer.value = delay_us,
> > +       };
> > +
> > +       union acpi_object *out_obj;
> > +       acpi_handle handle;
> > +       int result, ret = -EINVAL;
> > +
> > +       if (!dev || !ACPI_HANDLE(&dev->dev))
> > +               return -EINVAL;
> > +
> > +       handle = ACPI_HANDLE(&dev->dev);
> > +
> > +       out_obj = acpi_evaluate_dsm_typed(handle, &pci_acpi_dsm_guid, 4,
> 
> This is something I haven't noticed in the previous patch, but also
> applies to it.
> 
> Why is rev 4 of the interface hard-coded here?

Thanks for asking this because it's related to the whole _DSM revision
question that I don't understand.

If we didn't use rev 4 here, what should we use?  The PCI Firmware
spec, r3.3, sec 4.6.11, documents this interface and says "lowest
valid Revision ID value is 4", so that's the source of the 4.

My argument is that the spec documents rev 4, the kernel code was
tested with rev 4, so what would be the benefit of using a different
revision here?

> > +                                         DSM_PCI_PERST_ASSERTION_DELAY,
> > +                                         &in_obj, ACPI_TYPE_INTEGER);

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method
  2025-04-02 14:21     ` Bjorn Helgaas
@ 2025-04-02 14:52       ` Rafael J. Wysocki
  2025-04-02 15:50         ` Bjorn Helgaas
  0 siblings, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2025-04-02 14:52 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Anshuman Gupta, intel-xe, linux-acpi,
	linux-pci, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, varun.gupta, ville.syrjala,
	uma.shankar

On Wed, Apr 2, 2025 at 4:21 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Apr 02, 2025 at 01:06:42PM +0200, Rafael J. Wysocki wrote:
> > On Tue, Apr 1, 2025 at 5:36 PM Anshuman Gupta <anshuman.gupta@intel.com> wrote:
> > >
> > > Implement _DSM Method 11 as per PCI firmware specs
> > > section 4.6.11 Rev 3.3.
>
> > > +int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us)
> > > +{
> > > +       union acpi_object in_obj = {
> > > +               .integer.type = ACPI_TYPE_INTEGER,
> > > +               .integer.value = delay_us,
> > > +       };
> > > +
> > > +       union acpi_object *out_obj;
> > > +       acpi_handle handle;
> > > +       int result, ret = -EINVAL;
> > > +
> > > +       if (!dev || !ACPI_HANDLE(&dev->dev))
> > > +               return -EINVAL;
> > > +
> > > +       handle = ACPI_HANDLE(&dev->dev);
> > > +
> > > +       out_obj = acpi_evaluate_dsm_typed(handle, &pci_acpi_dsm_guid, 4,
> >
> > This is something I haven't noticed in the previous patch, but also
> > applies to it.
> >
> > Why is rev 4 of the interface hard-coded here?
>
> Thanks for asking this because it's related to the whole _DSM revision
> question that I don't understand.
>
> If we didn't use rev 4 here, what should we use?  The PCI Firmware
> spec, r3.3, sec 4.6.11, documents this interface and says "lowest
> valid Revision ID value is 4", so that's the source of the 4.

Well, the "lowest vaild Revision ID" does not generally mean the "only
valid Revision ID".

> My argument is that the spec documents rev 4, the kernel code was
> tested with rev 4, so what would be the benefit of using a different
> revision here?

I'm talking about using a symbol to represent the number 4, not about
possibly using a different number, along the lines of using, say,
ACPI_FADT_LOW_POWER_S0 instead of putting BIT(21) directly into the
code.

The value is not likely to change, but using a symbol for representing
it has merit (it can be meaningfully used in searches, it can be
documented etc.).

Now, I'm not sure how likely it is for the PCI Firmware spec to bump
up the revision of this interface (I suppose that it will do so if a
new function is defined), but even if it does so, the kernel will have
to check both the new revision and rev 4 anyway, in case the firmware
doesn't know about the new revision.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method
  2025-04-02 14:52       ` Rafael J. Wysocki
@ 2025-04-02 15:50         ` Bjorn Helgaas
  2025-04-02 17:51           ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2025-04-02 15:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Anshuman Gupta, intel-xe, linux-acpi, linux-pci, lenb, bhelgaas,
	ilpo.jarvinen, lucas.demarchi, rodrigo.vivi, badal.nilawar,
	varun.gupta, ville.syrjala, uma.shankar

On Wed, Apr 02, 2025 at 04:52:55PM +0200, Rafael J. Wysocki wrote:
> On Wed, Apr 2, 2025 at 4:21 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Wed, Apr 02, 2025 at 01:06:42PM +0200, Rafael J. Wysocki wrote:
> > > On Tue, Apr 1, 2025 at 5:36 PM Anshuman Gupta <anshuman.gupta@intel.com> wrote:
> > > >
> > > > Implement _DSM Method 11 as per PCI firmware specs
> > > > section 4.6.11 Rev 3.3.
> >
> > > > +int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us)
> > > > +{
> > > > +       union acpi_object in_obj = {
> > > > +               .integer.type = ACPI_TYPE_INTEGER,
> > > > +               .integer.value = delay_us,
> > > > +       };
> > > > +
> > > > +       union acpi_object *out_obj;
> > > > +       acpi_handle handle;
> > > > +       int result, ret = -EINVAL;
> > > > +
> > > > +       if (!dev || !ACPI_HANDLE(&dev->dev))
> > > > +               return -EINVAL;
> > > > +
> > > > +       handle = ACPI_HANDLE(&dev->dev);
> > > > +
> > > > +       out_obj = acpi_evaluate_dsm_typed(handle, &pci_acpi_dsm_guid, 4,
> > >
> > > This is something I haven't noticed in the previous patch, but also
> > > applies to it.
> > >
> > > Why is rev 4 of the interface hard-coded here?
> >
> > Thanks for asking this because it's related to the whole _DSM revision
> > question that I don't understand.
> >
> > If we didn't use rev 4 here, what should we use?  The PCI Firmware
> > spec, r3.3, sec 4.6.11, documents this interface and says "lowest
> > valid Revision ID value is 4", so that's the source of the 4.
> 
> Well, the "lowest valid Revision ID" does not generally mean the "only
> valid Revision ID".

True, but why should the kernel change from using the tested revision
ID to some other revision ID?  What would be the benefit of using rev
5?

PCI Firmware r3.3 does contain a statement that "OSPM must invoke all
Functions other than Function 0 with the same Revision ID value," but
IMO that was a mistake, and we just approved an ECR to remove it.

> > My argument is that the spec documents rev 4, the kernel code was
> > tested with rev 4, so what would be the benefit of using a different
> > revision here?
> 
> I'm talking about using a symbol to represent the number 4, not about
> possibly using a different number, along the lines of using, say,
> ACPI_FADT_LOW_POWER_S0 instead of putting BIT(21) directly into the
> code.
>
> The value is not likely to change, but using a symbol for representing
> it has merit (it can be meaningfully used in searches, it can be
> documented etc.).

DSM_PCI_PERST_ASSERTION_DELAY (the function number) is a great thing
to search for.  I doubt a symbol for "4" would really add anything.
At least in PCI, changes to a particular _DSM function are relatively
rare.

> Now, I'm not sure how likely it is for the PCI Firmware spec to bump
> up the revision of this interface (I suppose that it will do so if a
> new function is defined), but even if it does so, the kernel will have
> to check both the new revision and rev 4 anyway, in case the firmware
> doesn't know about the new revision.

I guess the reason the OS would check both rev 5 and rev 4 would be
that a new platform might implement DSM_PCI_PERST_ASSERTION_DELAY only
at rev 5?  I think this would be a real mistake in terms of
implementing something to the spec.  

The spec documents DSM_PCI_PERST_ASSERTION_DELAY rev 4.  Some
platforms implemented and tested DSM_PCI_PERST_ASSERTION_DELAY rev 4.
Linux added and tested support for DSM_PCI_PERST_ASSERTION_DELAY rev
4.  I propose new platforms should also implement and test
DSM_PCI_PERST_ASSERTION_DELAY rev 4.

If a new platform implements only DSM_PCI_PERST_ASSERTION_DELAY rev 5,
there is no actual documentation for that rev other than the spec
assertion that "rev 5 must support all functions defined by previous
revs of the UUID."  IMO this is nonsense.  The platform might have no
need to implement all the functions defined for the UUID.

Even if the platform makes all its functions valid for "the lowest
valid Revision ID" through the "current Revision ID", there's nothing
other than the spec assertion to guarantee that they all behave the
same.  And dealing with multiple revisions that are "supposed" to be
the same just makes work and risk for the OS.

Bjorn

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method
  2025-04-02 15:50         ` Bjorn Helgaas
@ 2025-04-02 17:51           ` Rafael J. Wysocki
  2025-04-02 18:48             ` Bjorn Helgaas
  0 siblings, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2025-04-02 17:51 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Anshuman Gupta, intel-xe, linux-acpi,
	linux-pci, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, varun.gupta, ville.syrjala,
	uma.shankar

On Wed, Apr 2, 2025 at 5:50 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Apr 02, 2025 at 04:52:55PM +0200, Rafael J. Wysocki wrote:
> > On Wed, Apr 2, 2025 at 4:21 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, Apr 02, 2025 at 01:06:42PM +0200, Rafael J. Wysocki wrote:
> > > > On Tue, Apr 1, 2025 at 5:36 PM Anshuman Gupta <anshuman.gupta@intel.com> wrote:
> > > > >
> > > > > Implement _DSM Method 11 as per PCI firmware specs
> > > > > section 4.6.11 Rev 3.3.
> > >
> > > > > +int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us)
> > > > > +{
> > > > > +       union acpi_object in_obj = {
> > > > > +               .integer.type = ACPI_TYPE_INTEGER,
> > > > > +               .integer.value = delay_us,
> > > > > +       };
> > > > > +
> > > > > +       union acpi_object *out_obj;
> > > > > +       acpi_handle handle;
> > > > > +       int result, ret = -EINVAL;
> > > > > +
> > > > > +       if (!dev || !ACPI_HANDLE(&dev->dev))
> > > > > +               return -EINVAL;
> > > > > +
> > > > > +       handle = ACPI_HANDLE(&dev->dev);
> > > > > +
> > > > > +       out_obj = acpi_evaluate_dsm_typed(handle, &pci_acpi_dsm_guid, 4,
> > > >
> > > > This is something I haven't noticed in the previous patch, but also
> > > > applies to it.
> > > >
> > > > Why is rev 4 of the interface hard-coded here?
> > >
> > > Thanks for asking this because it's related to the whole _DSM revision
> > > question that I don't understand.
> > >
> > > If we didn't use rev 4 here, what should we use?  The PCI Firmware
> > > spec, r3.3, sec 4.6.11, documents this interface and says "lowest
> > > valid Revision ID value is 4", so that's the source of the 4.
> >
> > Well, the "lowest valid Revision ID" does not generally mean the "only
> > valid Revision ID".
>
> True, but why should the kernel change from using the tested revision
> ID to some other revision ID?  What would be the benefit of using rev
> 5?

TBH I'm not exactly buying the "tested revision ID" concept because
what does it really mean?

If a given _DSM interface has been tested on one platform, does it
necessarily mean that it will work on all of the other platforms
implementing it?

> PCI Firmware r3.3 does contain a statement that "OSPM must invoke all
> Functions other than Function 0 with the same Revision ID value," but
> IMO that was a mistake, and we just approved an ECR to remove it.
>
> > > My argument is that the spec documents rev 4, the kernel code was
> > > tested with rev 4, so what would be the benefit of using a different
> > > revision here?
> >
> > I'm talking about using a symbol to represent the number 4, not about
> > possibly using a different number, along the lines of using, say,
> > ACPI_FADT_LOW_POWER_S0 instead of putting BIT(21) directly into the
> > code.
> >
> > The value is not likely to change, but using a symbol for representing
> > it has merit (it can be meaningfully used in searches, it can be
> > documented etc.).
>
> DSM_PCI_PERST_ASSERTION_DELAY (the function number) is a great thing
> to search for.  I doubt a symbol for "4" would really add anything.
> At least in PCI, changes to a particular _DSM function are relatively
> rare.
>
> > Now, I'm not sure how likely it is for the PCI Firmware spec to bump
> > up the revision of this interface (I suppose that it will do so if a
> > new function is defined), but even if it does so, the kernel will have
> > to check both the new revision and rev 4 anyway, in case the firmware
> > doesn't know about the new revision.
>
> I guess the reason the OS would check both rev 5 and rev 4 would be
> that a new platform might implement DSM_PCI_PERST_ASSERTION_DELAY only
> at rev 5?  I think this would be a real mistake in terms of
> implementing something to the spec.

This is a real possibility, however, because there's nothing to
prevent this kind of spec interpretation.

I didn't mean this, though.

Say the kernel wants to use a function that is only defined at rev 5.
It will invoke function 0 at rev 5 to see if this function is
available, but then it may as well see if the other functions it wants
to use are available at rev 5 because it will get this information
from the same _DSM call.  If the platform firmware responds that they
all are implemented, why won't the kernel use them all at rev 5?

> The spec documents DSM_PCI_PERST_ASSERTION_DELAY rev 4.  Some
> platforms implemented and tested DSM_PCI_PERST_ASSERTION_DELAY rev 4.
> Linux added and tested support for DSM_PCI_PERST_ASSERTION_DELAY rev
> 4.  I propose new platforms should also implement and test
> DSM_PCI_PERST_ASSERTION_DELAY rev 4.
>
> If a new platform implements only DSM_PCI_PERST_ASSERTION_DELAY rev 5,
> there is no actual documentation for that rev other than the spec
> assertion that "rev 5 must support all functions defined by previous
> revs of the UUID."  IMO this is nonsense.  The platform might have no
> need to implement all the functions defined for the UUID.

Sure.  That's why function 0 should always be used.

> Even if the platform makes all its functions valid for "the lowest
> valid Revision ID" through the "current Revision ID", there's nothing
> other than the spec assertion to guarantee that they all behave the
> same.  And dealing with multiple revisions that are "supposed" to be
> the same just makes work and risk for the OS.

You seem to be questioning the interface versioning at large by saying
that there is no real guarantee of backwards compatibility between
consecutive interface revisions on the same platform.

Presumably, though, the interface is adhering to some specification
which defines the behavior of the functions (and the set of available
functions in the first place) for all of the valid revisions of it.
The OS and the platform firmware both follow the same spec and so they
should be mutually compatible.  If this particular spec defines rev 5
to be an exact superset of rev 4, there is no reason to expect
anything else.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method
  2025-04-02 17:51           ` Rafael J. Wysocki
@ 2025-04-02 18:48             ` Bjorn Helgaas
  2025-04-02 19:36               ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2025-04-02 18:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Anshuman Gupta, intel-xe, linux-acpi, linux-pci, lenb, bhelgaas,
	ilpo.jarvinen, lucas.demarchi, rodrigo.vivi, badal.nilawar,
	varun.gupta, ville.syrjala, uma.shankar

On Wed, Apr 02, 2025 at 07:51:29PM +0200, Rafael J. Wysocki wrote:
> On Wed, Apr 2, 2025 at 5:50 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Wed, Apr 02, 2025 at 04:52:55PM +0200, Rafael J. Wysocki wrote:
> > > On Wed, Apr 2, 2025 at 4:21 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Wed, Apr 02, 2025 at 01:06:42PM +0200, Rafael J. Wysocki wrote:
> > > > > On Tue, Apr 1, 2025 at 5:36 PM Anshuman Gupta <anshuman.gupta@intel.com> wrote:
> > > > > >
> > > > > > Implement _DSM Method 11 as per PCI firmware specs
> > > > > > section 4.6.11 Rev 3.3.
> > > >
> > > > > > +int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us)
> > > > > > +{
> > > > > > +       union acpi_object in_obj = {
> > > > > > +               .integer.type = ACPI_TYPE_INTEGER,
> > > > > > +               .integer.value = delay_us,
> > > > > > +       };
> > > > > > +
> > > > > > +       union acpi_object *out_obj;
> > > > > > +       acpi_handle handle;
> > > > > > +       int result, ret = -EINVAL;
> > > > > > +
> > > > > > +       if (!dev || !ACPI_HANDLE(&dev->dev))
> > > > > > +               return -EINVAL;
> > > > > > +
> > > > > > +       handle = ACPI_HANDLE(&dev->dev);
> > > > > > +
> > > > > > +       out_obj = acpi_evaluate_dsm_typed(handle, &pci_acpi_dsm_guid, 4,
> > > > >
> > > > > This is something I haven't noticed in the previous patch, but also
> > > > > applies to it.
> > > > >
> > > > > Why is rev 4 of the interface hard-coded here?
> > > >
> > > > Thanks for asking this because it's related to the whole _DSM revision
> > > > question that I don't understand.
> > > >
> > > > If we didn't use rev 4 here, what should we use?  The PCI Firmware
> > > > spec, r3.3, sec 4.6.11, documents this interface and says "lowest
> > > > valid Revision ID value is 4", so that's the source of the 4.
> > >
> > > Well, the "lowest valid Revision ID" does not generally mean the "only
> > > valid Revision ID".
> >
> > True, but why should the kernel change from using the tested revision
> > ID to some other revision ID?  What would be the benefit of using rev
> > 5?
> 
> TBH I'm not exactly buying the "tested revision ID" concept because
> what does it really mean?
> 
> If a given _DSM interface has been tested on one platform, does it
> necessarily mean that it will work on all of the other platforms
> implementing it?

No, all we can say is that this OS rev 4 code works on the platforms
we've tested.  Other platforms might have their own defects.

My point is that this OS code was written to the rev 4 spec and has
been tested, and we shouldn't change the code or the revision it uses
unless we're fixing something.

> > > Now, I'm not sure how likely it is for the PCI Firmware spec to
> > > bump up the revision of this interface (I suppose that it will
> > > do so if a new function is defined), but even if it does so, the
> > > kernel will have to check both the new revision and rev 4
> > > anyway, in case the firmware doesn't know about the new
> > > revision.
> >
> > I guess the reason the OS would check both rev 5 and rev 4 would
> > be that a new platform might implement
> > DSM_PCI_PERST_ASSERTION_DELAY only at rev 5?  I think this would
> > be a real mistake in terms of implementing something to the spec.
> 
> This is a real possibility, however, because there's nothing to
> prevent this kind of spec interpretation.
> 
> I didn't mean this, though.
> 
> Say the kernel wants to use a function that is only defined at rev
> 5.  It will invoke function 0 at rev 5 to see if this function is
> available, but then it may as well see if the other functions it
> wants to use are available at rev 5 because it will get this
> information from the same _DSM call.  If the platform firmware
> responds that they all are implemented, why won't the kernel use
> them all at rev 5?

This is an unnecessary change, since the OS tested its rev 4 code and
now we're saying the OS should use rev 5 instead, which the OS did not
test.  The reason rev 5 exists at all is probably that some completely
unrelated new function was added, and somebody decided it should be
rev 5.

I guess function 0 *could* have been defined to answer "yes/no" about
whether a single (function, revision) is implemented.  Then we
probably wouldn't be having this conversation.

But we do get an entire bitmask of implemented functions back from
function 0, which allows the possibility of using the same revision
for all the functions.  I suppose we could have a boot-time function
that calls function 0 with rev 0, 1, 2, ..., until it gets an empty
bitmask, and saves the highest rev with a non-empty mask.

In PCI we don't do that; instead, we call acpi_check_dsm() before
every acpi_evaluate_dsm().  The main reason is that I don't think it's
safe to change the function X rev just because function Y was added
later with a higher revision.

> > The spec documents DSM_PCI_PERST_ASSERTION_DELAY rev 4.  Some
> > platforms implemented and tested DSM_PCI_PERST_ASSERTION_DELAY rev
> > 4.  Linux added and tested support for
> > DSM_PCI_PERST_ASSERTION_DELAY rev 4.  I propose new platforms
> > should also implement and test DSM_PCI_PERST_ASSERTION_DELAY rev
> > 4.
> >
> > If a new platform implements only DSM_PCI_PERST_ASSERTION_DELAY
> > rev 5, there is no actual documentation for that rev other than
> > the spec assertion that "rev 5 must support all functions defined
> > by previous revs of the UUID."  IMO this is nonsense.  The
> > platform might have no need to implement all the functions defined
> > for the UUID.
> 
> Sure.  That's why function 0 should always be used.

Yes.  But the requirement that rev 5 must support all functions ever
previously documented for the UUID still makes no sense to me.  I
don't think there's any requirement that a platform implement ALL of
the documented PCI functions.

Maybe the intent is that this only applies to a given platform, i.e.,
that new firmware for that platform must continue to support all
functions and revisions that were ever supported on that platform?
That seems obvious to me and hardly worth mentioning.

> > Even if the platform makes all its functions valid for "the lowest
> > valid Revision ID" through the "current Revision ID", there's
> > nothing other than the spec assertion to guarantee that they all
> > behave the same.  And dealing with multiple revisions that are
> > "supposed" to be the same just makes work and risk for the OS.
> 
> You seem to be questioning the interface versioning at large by
> saying that there is no real guarantee of backwards compatibility
> between consecutive interface revisions on the same platform.

That's exactly what I'm saying, although I think problems are more
likely across different platforms.  There's the requirement in the
spec, but that's just words on a page.  There's no way to enforce or
validate it.

> Presumably, though, the interface is adhering to some specification
> which defines the behavior of the functions (and the set of
> available functions in the first place) for all of the valid
> revisions of it.  The OS and the platform firmware both follow the
> same spec and so they should be mutually compatible.  If this
> particular spec defines rev 5 to be an exact superset of rev 4,
> there is no reason to expect anything else.

I don't *expect* rev 5 to be different.  But as a user of it, why
would I change working, tested code that is not broken?

The PCI DPC function 0xc is an example where rev 5 (per ECN) and rev 6
(per r3.3) are not compatible.

If the OS implemented rev 5, then learns via function 0 that function
0xc is also supported at rev 6, and starts using the same OS code with
rev 6, the OS is broken.

Bjorn

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method
  2025-04-02 18:48             ` Bjorn Helgaas
@ 2025-04-02 19:36               ` Rafael J. Wysocki
  2025-04-08 20:48                 ` Bjorn Helgaas
  0 siblings, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2025-04-02 19:36 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Anshuman Gupta, intel-xe, linux-acpi,
	linux-pci, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, varun.gupta, ville.syrjala,
	uma.shankar

On Wed, Apr 2, 2025 at 8:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Apr 02, 2025 at 07:51:29PM +0200, Rafael J. Wysocki wrote:
> > On Wed, Apr 2, 2025 at 5:50 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, Apr 02, 2025 at 04:52:55PM +0200, Rafael J. Wysocki wrote:
> > > > On Wed, Apr 2, 2025 at 4:21 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Wed, Apr 02, 2025 at 01:06:42PM +0200, Rafael J. Wysocki wrote:
> > > > > > On Tue, Apr 1, 2025 at 5:36 PM Anshuman Gupta <anshuman.gupta@intel.com> wrote:
> > > > > > >
> > > > > > > Implement _DSM Method 11 as per PCI firmware specs
> > > > > > > section 4.6.11 Rev 3.3.
> > > > >
> > > > > > > +int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us)
> > > > > > > +{
> > > > > > > +       union acpi_object in_obj = {
> > > > > > > +               .integer.type = ACPI_TYPE_INTEGER,
> > > > > > > +               .integer.value = delay_us,
> > > > > > > +       };
> > > > > > > +
> > > > > > > +       union acpi_object *out_obj;
> > > > > > > +       acpi_handle handle;
> > > > > > > +       int result, ret = -EINVAL;
> > > > > > > +
> > > > > > > +       if (!dev || !ACPI_HANDLE(&dev->dev))
> > > > > > > +               return -EINVAL;
> > > > > > > +
> > > > > > > +       handle = ACPI_HANDLE(&dev->dev);
> > > > > > > +
> > > > > > > +       out_obj = acpi_evaluate_dsm_typed(handle, &pci_acpi_dsm_guid, 4,
> > > > > >
> > > > > > This is something I haven't noticed in the previous patch, but also
> > > > > > applies to it.
> > > > > >
> > > > > > Why is rev 4 of the interface hard-coded here?
> > > > >
> > > > > Thanks for asking this because it's related to the whole _DSM revision
> > > > > question that I don't understand.
> > > > >
> > > > > If we didn't use rev 4 here, what should we use?  The PCI Firmware
> > > > > spec, r3.3, sec 4.6.11, documents this interface and says "lowest
> > > > > valid Revision ID value is 4", so that's the source of the 4.
> > > >
> > > > Well, the "lowest valid Revision ID" does not generally mean the "only
> > > > valid Revision ID".
> > >
> > > True, but why should the kernel change from using the tested revision
> > > ID to some other revision ID?  What would be the benefit of using rev
> > > 5?
> >
> > TBH I'm not exactly buying the "tested revision ID" concept because
> > what does it really mean?
> >
> > If a given _DSM interface has been tested on one platform, does it
> > necessarily mean that it will work on all of the other platforms
> > implementing it?
>
> No, all we can say is that this OS rev 4 code works on the platforms
> we've tested.  Other platforms might have their own defects.
>
> My point is that this OS code was written to the rev 4 spec and has
> been tested, and we shouldn't change the code or the revision it uses
> unless we're fixing something.

Well, if the spec says "rev 5 is the same as rev 4 except for function
Y which is new in rev 5", then arguably the same OS code is also
compliant with rev 5 (it will not use function Y, but that's fine).

If the platform firmware follows the same spec, then the rev 5
implementation in it can be expected to work exactly as its rev 4
implementation either.

I see no problem with using rev 5 instead of rev 4 in that case and nn
new platforms, rev 5 is just as risky as rev 4.

> > > > Now, I'm not sure how likely it is for the PCI Firmware spec to
> > > > bump up the revision of this interface (I suppose that it will
> > > > do so if a new function is defined), but even if it does so, the
> > > > kernel will have to check both the new revision and rev 4
> > > > anyway, in case the firmware doesn't know about the new
> > > > revision.
> > >
> > > I guess the reason the OS would check both rev 5 and rev 4 would
> > > be that a new platform might implement
> > > DSM_PCI_PERST_ASSERTION_DELAY only at rev 5?  I think this would
> > > be a real mistake in terms of implementing something to the spec.
> >
> > This is a real possibility, however, because there's nothing to
> > prevent this kind of spec interpretation.
> >
> > I didn't mean this, though.
> >
> > Say the kernel wants to use a function that is only defined at rev
> > 5.  It will invoke function 0 at rev 5 to see if this function is
> > available, but then it may as well see if the other functions it
> > wants to use are available at rev 5 because it will get this
> > information from the same _DSM call.  If the platform firmware
> > responds that they all are implemented, why won't the kernel use
> > them all at rev 5?
>
> This is an unnecessary change, since the OS tested its rev 4 code and
> now we're saying the OS should use rev 5 instead, which the OS did not
> test.

Yes, it did.  It is the same code, as per the above.

> The reason rev 5 exists at all is probably that some completely
> unrelated new function was added, and somebody decided it should be
> rev 5.
>
> I guess function 0 *could* have been defined to answer "yes/no" about
> whether a single (function, revision) is implemented.  Then we
> probably wouldn't be having this conversation.
>
> But we do get an entire bitmask of implemented functions back from
> function 0, which allows the possibility of using the same revision
> for all the functions.  I suppose we could have a boot-time function
> that calls function 0 with rev 0, 1, 2, ..., until it gets an empty
> bitmask, and saves the highest rev with a non-empty mask.
>
> In PCI we don't do that; instead, we call acpi_check_dsm() before
> every acpi_evaluate_dsm().  The main reason is that I don't think it's
> safe to change the function X rev just because function Y was added
> later with a higher revision.

So what if there is a platform that only implements revisions greater than X?

Presumably, there is another OS targeted by this platform that only
uses revisions X+1 and forward.

> > > The spec documents DSM_PCI_PERST_ASSERTION_DELAY rev 4.  Some
> > > platforms implemented and tested DSM_PCI_PERST_ASSERTION_DELAY rev
> > > 4.  Linux added and tested support for
> > > DSM_PCI_PERST_ASSERTION_DELAY rev 4.  I propose new platforms
> > > should also implement and test DSM_PCI_PERST_ASSERTION_DELAY rev
> > > 4.
> > >
> > > If a new platform implements only DSM_PCI_PERST_ASSERTION_DELAY
> > > rev 5, there is no actual documentation for that rev other than
> > > the spec assertion that "rev 5 must support all functions defined
> > > by previous revs of the UUID."  IMO this is nonsense.  The
> > > platform might have no need to implement all the functions defined
> > > for the UUID.
> >
> > Sure.  That's why function 0 should always be used.
>
> Yes.  But the requirement that rev 5 must support all functions ever
> previously documented for the UUID still makes no sense to me.  I
> don't think there's any requirement that a platform implement ALL of
> the documented PCI functions.

I think that there is some confusion in the spec language that seems
to be conflating platforms with interface definitions.

It ostensibly wants the definitions of subsequent revisions of the
same interface to be consistent with each other, which is not a bad
thing in principle IMV, but at the same time it ostensibly allows
platforms to choose which functions to implement, and it doesn't say
anything about the expected behavior when the OS attempts to use a
function that is not implemented for a given rev.

You are right that the backward compatibility language in the _DSM
definition is not enforceable, but it can be regarded as a guidance:
Do not change the interface that you have once defined arbitrarily
between revisions.  After all, if you don't like this guidance, you
can just allocate a new UUID and define a new interface based on it.

Anyway, I'm not the author of this piece of the specification, so I
don't really know the original intent, and it is not sufficiently
clear.

> Maybe the intent is that this only applies to a given platform, i.e.,
> that new firmware for that platform must continue to support all
> functions and revisions that were ever supported on that platform?
> That seems obvious to me and hardly worth mentioning.
>
> > > Even if the platform makes all its functions valid for "the lowest
> > > valid Revision ID" through the "current Revision ID", there's
> > > nothing other than the spec assertion to guarantee that they all
> > > behave the same.  And dealing with multiple revisions that are
> > > "supposed" to be the same just makes work and risk for the OS.
> >
> > You seem to be questioning the interface versioning at large by
> > saying that there is no real guarantee of backwards compatibility
> > between consecutive interface revisions on the same platform.
>
> That's exactly what I'm saying, although I think problems are more
> likely across different platforms.  There's the requirement in the
> spec, but that's just words on a page.  There's no way to enforce or
> validate it.
>
> > Presumably, though, the interface is adhering to some specification
> > which defines the behavior of the functions (and the set of
> > available functions in the first place) for all of the valid
> > revisions of it.  The OS and the platform firmware both follow the
> > same spec and so they should be mutually compatible.  If this
> > particular spec defines rev 5 to be an exact superset of rev 4,
> > there is no reason to expect anything else.
>
> I don't *expect* rev 5 to be different.  But as a user of it, why
> would I change working, tested code that is not broken?
>
> The PCI DPC function 0xc is an example where rev 5 (per ECN) and rev 6
> (per r3.3) are not compatible.
>
> If the OS implemented rev 5, then learns via function 0 that function
> 0xc is also supported at rev 6, and starts using the same OS code with
> rev 6, the OS is broken.

Yes, in this case the backward compatibility language in the _DSM
definition is obviously not followed.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 01/12] PCI/ACPI: Add D3cold Aux Power Limit_DSM method
  2025-04-01 18:25   ` Bjorn Helgaas
  2025-04-02 10:59     ` Rafael J. Wysocki
@ 2025-04-03  5:25     ` Gupta, Anshuman
  1 sibling, 0 replies; 42+ messages in thread
From: Gupta, Anshuman @ 2025-04-03  5:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: intel-xe@lists.freedesktop.org, linux-acpi@vger.kernel.org,
	linux-pci@vger.kernel.org, rafael@kernel.org, lenb@kernel.org,
	bhelgaas@google.com, ilpo.jarvinen@linux.intel.com,
	De Marchi, Lucas, Vivi, Rodrigo, Nilawar, Badal, Gupta, Varun,
	ville.syrjala@linux.intel.com, Shankar, Uma



> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Tuesday, April 1, 2025 11:56 PM
> To: Gupta, Anshuman <anshuman.gupta@intel.com>
> Cc: intel-xe@lists.freedesktop.org; linux-acpi@vger.kernel.org; linux-
> pci@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Nilawar,
> Badal <badal.nilawar@intel.com>; Gupta, Varun <varun.gupta@intel.com>;
> ville.syrjala@linux.intel.com; Shankar, Uma <uma.shankar@intel.com>
> Subject: Re: [PATCH 01/12] PCI/ACPI: Add D3cold Aux Power Limit_DSM
> method
> 
> On Tue, Apr 01, 2025 at 09:02:14PM +0530, Anshuman Gupta wrote:
> > Implement _DSM method 10 and _DSM Method 11 as per PCI firmware
> specs
> > section 4.6.10 Rev 3.3.
> 
> Thanks for splitting this into two patches.  But I think now this only
> implements function 10 (0x0a), so this sentence needs to be updated.
> 
> I would write this consistently as "0x0a" or "0Ah" to match the spec
> description.  I don't think the spec ever uses "10".
Thanks for Review, will fix the commit log.
> 
> > Note that this implementation assumes only a single device below the
> > Downstream Port will request for Aux Power Limit under a given Root
> > Port because it does not track and aggregate requests from all child
> > devices below the Downstream Port as required by Section 4.6.10 Rev
> > 3.3.
> >
> > One possible mitigation would be only allowing only first PCIe
> > Non-Bridge Endpoint Function 0 driver to call_DSM method 10.
> >
> > Signed-off-by: Varun Gupta <varun.gupta@intel.com>
> > Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
> > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > ---
> >  drivers/pci/pci-acpi.c   | 78
> ++++++++++++++++++++++++++++++++++++++++
> >  include/linux/pci-acpi.h |  6 ++++
> >  2 files changed, 84 insertions(+)
> >
> > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index
> > af370628e583..ebd49e43457e 100644
> > --- a/drivers/pci/pci-acpi.c
> > +++ b/drivers/pci/pci-acpi.c
> > @@ -1421,6 +1421,84 @@ static void pci_acpi_optimize_delay(struct
> pci_dev *pdev,
> >  	ACPI_FREE(obj);
> >  }
> >
> > +/**
> > + * pci_acpi_request_d3cold_aux_power - Request D3cold aux power via
> > +ACPI DSM
> > + * @dev: PCI device instance
> > + * @requested_power: Requested auxiliary power in milliwatts
> > + *
> > + * This function sends a request to the host BIOS via ACPI _DSM
> > +Function 10
> > + * to grant the required D3Cold Auxiliary power for the specified PCI device.
> > + * It evaluates the _DSM (Device Specific Method) to request the
> > +Auxiliary
> > + * power and handles the response accordingly.
> > + *
> > + * This function shall be only called by 1st non-bridge Endpoint
> > +driver
> > + * on Function 0. For a Multi-Function Device, the driver for
> > +Function 0 is
> > + * required to report an aggregate power requirement covering all
> > + * functions contained within the device.
> > + *
> > + * Return: Returns 0 on success and errno on failure.
> 
> Write all this in imperative mood, e.g.,
> 
>   Request auxiliary power while device is in D3cold ...
Will address this in V2.
> 
>   Return 0 on success ...
> 
> > + */
> > +int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32
> > +requested_power) {
> > +	union acpi_object in_obj = {
> > +		.integer.type = ACPI_TYPE_INTEGER,
> > +		.integer.value = requested_power,
> > +	};
> > +
> > +	union acpi_object *out_obj;
> > +	acpi_handle handle;
> > +	int result, ret = -EINVAL;
> > +
> > +	if (!dev || !ACPI_HANDLE(&dev->dev))
> > +		return -EINVAL;
> > +
> > +	handle = ACPI_HANDLE(&dev->dev);
> 
> This needs an acpi_check_dsm() call to find out whether the platform supports
> DSM_PCI_D3COLD_AUX_POWER_LIMIT.
> 
> We have several _DSM calls that *should* do this, but unfortunately they
> don't do it yet, so they're not good examples to copy.
Thanks for comment, I would like to understand what shall be the Rev used to invoke this ?
 acpi_check_dsm(handle, guid, 4 , DSM_PCI_D3COLD_AUX_POWER_LIMIT).

TBH I am not able to understand the discussion about Rev 4 in the next patch.
I hope it is ok to use 4 as constant here for  Revision ID.
> 
> > +	out_obj = acpi_evaluate_dsm_typed(handle,
> > +					  &pci_acpi_dsm_guid,
> > +					  4,
> > +
> DSM_PCI_D3COLD_AUX_POWER_LIMIT,
> > +					  &in_obj,
> > +					  ACPI_TYPE_INTEGER);
> > +	if (!out_obj)
> > +		return -EINVAL;
> > +
> > +	result = out_obj->integer.value;
> > +
> > +	switch (result) {
> > +	case 0x0:
> > +		dev_dbg(&dev->dev, "D3cold Aux Power %umW request
> denied\n",
> > +			requested_power);
> 
> Use pci_dbg(dev), pci_info(dev), etc.
Will change all dev_{dbg, err, info} with pci_{dbg, err, info}.
> 
> > +		break;
> > +	case 0x1:
> > +		dev_info(&dev->dev, "D3cold Aux Power request granted:
> %umW\n",
> > +			 requested_power);
> > +		ret = 0;
> > +		break;
> > +	case 0x2:
> > +		dev_info(&dev->dev, "D3cold Aux Power: Main power won't
> be removed\n");
> > +		ret = -EBUSY;
> > +		break;
> > +	default:
> > +		if (result >= 0x11 && result <= 0x1F) {
> > +			int retry_interval = result & 0xF;
> > +
> > +			dev_warn(&dev->dev, "D3cold Aux Power request
> needs retry."
> > +				 "Interval: %d seconds.\n", retry_interval);
> > +			msleep(retry_interval * 1000);
> 
> It doesn't seem right to me to do this sleep internally because it means this
> interface might take up to 15 seconds to return, and the caller has no idea
> what is happening during that time.
> 
> This seems like it should be done in the driver, which can decide
> *whether* it wants to sleep, and if it does sleep, it may give a nice indication to
> the user.
> 
> Of course, that would mean returning some kind of retry interval information
> to the caller.
Sure will modify the function to return a output which will have the retry delay.
> 
> > +			ret = -EAGAIN;
> > +		} else {
> > +			dev_err(&dev->dev, "D3cold Aux Power: Reserved or "
> > +				"unsupported response: 0x%x.\n", result);
> 
> Drop periods at end of messages.
Will fix it.

Thanks,
Anshuman
> 
> > +		}
> > +		break;
> > +	}
> > +
> > +	ACPI_FREE(out_obj);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL(pci_acpi_request_d3cold_aux_power);

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method
  2025-04-01 18:30   ` Bjorn Helgaas
@ 2025-04-03  5:59     ` Gupta, Anshuman
  0 siblings, 0 replies; 42+ messages in thread
From: Gupta, Anshuman @ 2025-04-03  5:59 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: intel-xe@lists.freedesktop.org, linux-acpi@vger.kernel.org,
	linux-pci@vger.kernel.org, rafael@kernel.org, lenb@kernel.org,
	bhelgaas@google.com, ilpo.jarvinen@linux.intel.com,
	De Marchi, Lucas, Vivi, Rodrigo, Nilawar, Badal, Gupta, Varun,
	ville.syrjala@linux.intel.com, Shankar, Uma



> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Wednesday, April 2, 2025 12:01 AM
> To: Gupta, Anshuman <anshuman.gupta@intel.com>
> Cc: intel-xe@lists.freedesktop.org; linux-acpi@vger.kernel.org; linux-
> pci@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Nilawar,
> Badal <badal.nilawar@intel.com>; Gupta, Varun <varun.gupta@intel.com>;
> ville.syrjala@linux.intel.com; Shankar, Uma <uma.shankar@intel.com>
> Subject: Re: [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM
> method
> 
> On Tue, Apr 01, 2025 at 09:02:15PM +0530, Anshuman Gupta wrote:
> > Implement _DSM Method 11 as per PCI firmware specs section 4.6.11 Rev
> > 3.3.
> 
> "PCI Firmware r3.3, sec 4.6.11" so the citation is major to minor.
> 
> "0xb" or "0Bh" to match spec usage.
Thanks for review,
will fix it.
> 
> > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > ---
> >  drivers/pci/pci-acpi.c   | 53
> ++++++++++++++++++++++++++++++++++++++++
> >  include/linux/pci-acpi.h |  7 ++++++
> >  2 files changed, 60 insertions(+)
> >
> > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index
> > ebd49e43457e..04149f037664 100644
> > --- a/drivers/pci/pci-acpi.c
> > +++ b/drivers/pci/pci-acpi.c
> > @@ -1499,6 +1499,59 @@ int pci_acpi_request_d3cold_aux_power(struct
> > pci_dev *dev, u32 requested_power)  }
> > EXPORT_SYMBOL(pci_acpi_request_d3cold_aux_power);
> >
> > +/**
> > + * pci_acpi_add_perst_assertion_delay - Request PERST# delay via ACPI
> > +DSM
> > + * @dev: PCI device instance
> > + * @delay_us: Requested delay_us
> > + *
> > + * This function sends a request to the host BIOS via ACPI _DSM to
> > +grant the
> > + * required PERST# delay for the specified PCI device. It evaluates
> > +the _DSM
> > + * to request the PERST# delay and handles the response accordingly.
> > + *
> > + * Return: returns 0 on success and errno on failure.
> > + */
> > +int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32
> > +delay_us) {
> > +	union acpi_object in_obj = {
> > +		.integer.type = ACPI_TYPE_INTEGER,
> > +		.integer.value = delay_us,
> > +	};
> > +
> > +	union acpi_object *out_obj;
> > +	acpi_handle handle;
> > +	int result, ret = -EINVAL;
> > +
> > +	if (!dev || !ACPI_HANDLE(&dev->dev))
> > +		return -EINVAL;
> > +
> > +	handle = ACPI_HANDLE(&dev->dev);
> 
> acpi_check_dsm().
Will fix it same as patch1
> 
> > +	out_obj = acpi_evaluate_dsm_typed(handle, &pci_acpi_dsm_guid, 4,
> > +					  DSM_PCI_PERST_ASSERTION_DELAY,
> > +					  &in_obj, ACPI_TYPE_INTEGER);
> > +	if (!out_obj)
> > +		return -EINVAL;
> > +
> > +	result = out_obj->integer.value;
> > +
> > +	if (result == delay_us) {
> > +		dev_info(&dev->dev, "PERST# Assertion Delay set to"
> > +			 "%u microseconds\n", delay_us);
> 
> pci_info().
> 
> Join these into a single string, even though they won't fit in a line without
> wrapping.  This is to make them easier to grep for when a user reports seeing
> the message.  (Do this on the previous patch too, where I forgot to mention
> it.)
there was comment in RFC to warp the line with in 80.
Will join the string.
Thanks,
Anshuman
> 
> > +		ret = 0;
> > +	} else if (result == 0) {
> > +		dev_warn(&dev->dev, "PERST# Assertion Delay request
> failed,"
> > +			 "no previous valid request\n");
> > +	} else {
> > +		dev_warn(&dev->dev,
> > +			 "PERST# Assertion Delay request failed"
> > +			 "Previous valid delay: %u microseconds\n", result);
> > +	}
> > +
> > +	ACPI_FREE(out_obj);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL(pci_acpi_add_perst_assertion_delay);

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 06/12] drm/xe/vrsr: Initialize VRSR feature
  2025-04-01 19:56   ` Bjorn Helgaas
@ 2025-04-03  6:09     ` Gupta, Anshuman
  0 siblings, 0 replies; 42+ messages in thread
From: Gupta, Anshuman @ 2025-04-03  6:09 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: intel-xe@lists.freedesktop.org, linux-acpi@vger.kernel.org,
	linux-pci@vger.kernel.org, rafael@kernel.org, lenb@kernel.org,
	bhelgaas@google.com, ilpo.jarvinen@linux.intel.com,
	De Marchi, Lucas, Vivi, Rodrigo, Nilawar, Badal, Gupta, Varun,
	ville.syrjala@linux.intel.com, Shankar, Uma



> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Wednesday, April 2, 2025 1:27 AM
> To: Gupta, Anshuman <anshuman.gupta@intel.com>
> Cc: intel-xe@lists.freedesktop.org; linux-acpi@vger.kernel.org; linux-
> pci@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Nilawar,
> Badal <badal.nilawar@intel.com>; Gupta, Varun <varun.gupta@intel.com>;
> ville.syrjala@linux.intel.com; Shankar, Uma <uma.shankar@intel.com>
> Subject: Re: [PATCH 06/12] drm/xe/vrsr: Initialize VRSR feature
> 
> On Tue, Apr 01, 2025 at 09:02:19PM +0530, Anshuman Gupta wrote:
> > From: Badal Nilawar <badal.nilawar@intel.com>
> >
> > Initialize VRSR feature by requesting Auxilary Power and PERST# assertion
> > delay. Include an API to enable VRSR feature.
> 
> s/Auxilary/Auxiliary/
> 
> I would include the name of the API directly.
> 
> > +#define     PCODE_D3_VRSR_PERST_SHIFT	16
> 
> PCODE_D3_VRSR_PERST_SHIFT is not used by this series; maybe omit it?
> 
> > +#define	    POWER_D3_VRSR_PSERST_MASK	REG_GENMASK(31,
> 16)
> 
> s/PSERST/PERST/ (looks like a typo?)
> 
> > +static void xe_pm_vrsr_init(struct xe_device *xe)
> > +{
> > +	int ret;
> > +
> > +	/* Check if platform support d3cold vrsr */
> 
> Nicer to consistently capitalize as "VRSR" in comments, commit
> logs, and messages.
> 
> Similar with "D3cold" ("D3cold" is used ~100 times in the tree,
> "D3Cold" ~20, mostly in xe).
> 
> > +	if (!xe->info.has_vrsr)
> > +		return;
> > +
> > +	if (!xe_pm_vrsr_capable(xe))
> > +		return;
> > +
> > +	/*
> > +	 * If the VRSR initialization fails, the device will proceed with the regular
> > +	 * D3 Cold flow
> > +	 */
> > +	ret = pci_acpi_aux_power_setup(xe);
> > +	if (ret) {
> > +		drm_info(&xe->drm, "VRSR capable %s\n", "No");
> 
> Kinda weird to use %s when the text is a known constant.
> 
> > +		return;
> > +	}
> > +
> > +	xe->d3cold.vrsr_capable = true;
> > +	drm_info(&xe->drm, "VRSR capable %s\n", "Yes");
> 
> Same.
Thanks for review comment, will address these in v2.
Thanks,
Anshuman.
> 
> > +}

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
  2025-04-01 20:13   ` Bjorn Helgaas
  2025-04-02 11:23     ` Rafael J. Wysocki
@ 2025-04-03  7:56     ` Gupta, Anshuman
  2025-04-03 13:48       ` Rafael J. Wysocki
  1 sibling, 1 reply; 42+ messages in thread
From: Gupta, Anshuman @ 2025-04-03  7:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: intel-xe@lists.freedesktop.org, linux-acpi@vger.kernel.org,
	linux-pci@vger.kernel.org, rafael@kernel.org, lenb@kernel.org,
	bhelgaas@google.com, ilpo.jarvinen@linux.intel.com,
	De Marchi, Lucas, Vivi, Rodrigo, Nilawar, Badal, Gupta, Varun,
	ville.syrjala@linux.intel.com, Shankar, Uma, Nasim, Kam



> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Wednesday, April 2, 2025 1:44 AM
> To: Gupta, Anshuman <anshuman.gupta@intel.com>
> Cc: intel-xe@lists.freedesktop.org; linux-acpi@vger.kernel.org; linux-
> pci@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Nilawar,
> Badal <badal.nilawar@intel.com>; Gupta, Varun <varun.gupta@intel.com>;
> ville.syrjala@linux.intel.com; Shankar, Uma <uma.shankar@intel.com>
> Subject: Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
> 
> On Tue, Apr 01, 2025 at 09:02:16PM +0530, Anshuman Gupta wrote:
> > Adding a notifier to notify all PCIe child devices about the block
> > main power removal. It is needed because theoretically multiple PCIe
> > Endpoint devices on same Root Port can request for AUX power and _DSM
> > method can return with 80000000h signifies that the hierarchy
> > connected via the slot cannot support main power removal when in
> > D3Cold.
> 
> I wish the spec used different language here.  "D3cold" *means* "main power
> is removed" (PCIe r6.0, sec 5.3.1.4.2), so it doesn't make sense to say that
> "the slot cannot support main power removal when in D3cold".  If a device is
> in D3cold, its main power has been removed by definition.
> 
> I suppose the spec is trying to say if the driver has called this _DSM with
> 80000000h, it means the platform must not remove main power from the
> device while the system is in S0?  Is that what you think it means?
Thanks for review.
what I understand here, S0 term essentially means here PCIe Runtime PM or s2idle 
(both refers system is S0 state). AFAIU during both Runtime PM and s2ilde path 
Root Port can transition  to D3Cold if it support _PR3. 
I observed Root Port either have D0 or D3Cold state.
/sys/bus/pci0/devices/$root_port_bdf/firmware_node/real_power_state

Driver can disable the D3cold by pci_d3cold_disable(), so I wonder there is no use
case driver calling _DSM 0xa with 80000000h.  
PCIe specs shall be simplified by removal of value 80000000h from _DSM 0xa Arg.
> 
> The 2h return value description says it "indicates that the platform will not
> remove main power from the slot while the system is in S0,"
> so I guess that must be it.
> 
> In this series, pci_acpi_aux_power_setup() only supplies a 16-bit
> aux_pwr_limit value, so the driver cannot *request* that the platform not
> remove main power.
Yeah but that value Xe driver received from GPU firmware.
> 
> So I guess the only scenario where the _DSM returns 80000000h is when the
> platform itself or other devices prevent the removal of main power.  And the
> point of the notifier is to tell devices that their main power will never be
> removed while the system is in S0.  Is that right?
Yes the notifier will safeguard against the use case if some other PCIe device
calls _DSM 0xa with Arg3 Value of 80000000h or its request return with 0x2h.
> 
> > This may disrupt functionality of other child device.
> 
> What sort of disruption could happen?  I would think that if the _DSM returns
> 80000000h, it just means the device will not have main power removed, so it
> won't see that power state transition.  The only "disruption" would be that
> we're using more power.
Let's say during Xe Driver initialization BIOS firmware grants the Xe driver 
Aux power request successfully.
Xe driver will prepare to transition D3Cold state with VRAM Self Refresh.
VRAM Self Refresh relies on vram shall derive power from Aux power.
But later at any point if some other PCIe device under same root port 
Calls _DSM either with 080000000h or _DSM returns with 02h, will
Block the main power removal. But Xe driver is unaware of it can still
program the VRAM Self Refresh and that make VRAM to derive power 
from Aux and that will disrupt the GPU VRAM as there is no switch to Aux
Power.
Thanks,
Anshuman
> 
> > Let's notify all PCIe devices requested Aux power resource and Let
> > PCIe End Point driver handle it in its callback.
> 
> Wrap to fill 75 columns.
> 
> s/Adding/Add/
> s/Let's notify/Notify/
> s/and Let/and let/
> s/End Point/Endpoint/ (several places here and below)
> 
> > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > ---
> >  drivers/pci/pci-acpi.c   | 34 +++++++++++++++++++++++++++++++---
> >  include/linux/pci-acpi.h | 13 +++++++++++++
> >  2 files changed, 44 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index
> > 04149f037664..d1ca1649e6e8 100644
> > --- a/drivers/pci/pci-acpi.c
> > +++ b/drivers/pci/pci-acpi.c
> > @@ -1421,6 +1421,32 @@ static void pci_acpi_optimize_delay(struct
> pci_dev *pdev,
> >  	ACPI_FREE(obj);
> >  }
> >
> > +static BLOCKING_NOTIFIER_HEAD(pci_acpi_aux_power_notify_list);
> > +
> > +/**
> > + * pci_acpi_register_aux_power_notifier - Register driver notifier
> > + * @nb: notifier block
> > + *
> > + * This function shall be called by PCIe End Point device requested
> > +the Aux
> > + * power resource in order to handle the any scenario gracefully when
> > +other
> > + * child PCIe devices Aux power request returns with No main power
> removal.
> > + * PCIe devices which register this notifier shall handle No main
> > +power
> > + * removal scenario accordingly.
> 
> This would actually be called by the *driver* (not by the device).
> 
> Reword in imperative mood if possible.
> 
> > + *
> > + * Return: Returns 0 on success and errno on failure.
> > + */
> > +int pci_acpi_register_aux_power_notifier(struct notifier_block *nb) {
> > +	return
> > +blocking_notifier_chain_register(&pci_acpi_aux_power_notify_list,
> > +nb); } EXPORT_SYMBOL_GPL(pci_acpi_register_aux_power_notifier);
> > +
> > +void pci_acpi_unregister_aux_power_notifier(struct notifier_block
> > +*nb) {
> > +	blocking_notifier_chain_unregister(&pci_acpi_aux_power_notify_list,
> > +nb); } EXPORT_SYMBOL_GPL(pci_acpi_unregister_aux_power_notifier);
> > +
> >  /**
> >   * pci_acpi_request_d3cold_aux_power - Request D3cold aux power via
> ACPI DSM
> >   * @dev: PCI device instance
> > @@ -1466,17 +1492,19 @@ int
> pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32
> requested_power)
> >  	result = out_obj->integer.value;
> >
> >  	switch (result) {
> > -	case 0x0:
> > +	case ACPI_AUX_PW_DENIED:
> 
> Add these constants in the patch that adds the _DSM.  Then this patch will be
> just notifier-related code.
> 
> >  		dev_dbg(&dev->dev, "D3cold Aux Power %umW request
> denied\n",
> >  			requested_power);
> >  		break;
> > -	case 0x1:
> > +	case ACPI_AUX_PW_GRANTED:
> >  		dev_info(&dev->dev, "D3cold Aux Power request granted:
> %umW\n",
> >  			 requested_power);
> >  		ret = 0;
> >  		break;
> > -	case 0x2:
> > +	case ACPI_NO_MAIN_PW_REMOVAL:
> >  		dev_info(&dev->dev, "D3cold Aux Power: Main power won't
> be
> > removed\n");
> > +		blocking_notifier_call_chain(&pci_acpi_aux_power_notify_list,
> > +					     ACPI_NO_MAIN_PW_REMOVAL,
> dev);
> >  		ret = -EBUSY;
> >  		break;
> >  	default:
> > diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h index
> > 4b7373f91a9a..793b979af98b 100644
> > --- a/include/linux/pci-acpi.h
> > +++ b/include/linux/pci-acpi.h
> > @@ -124,6 +124,10 @@ extern const guid_t pci_acpi_dsm_guid;
> >  #define DSM_PCI_D3COLD_AUX_POWER_LIMIT		0x0A
> >  #define DSM_PCI_PERST_ASSERTION_DELAY		0x0B
> >
> > +#define ACPI_AUX_PW_DENIED			0x0
> > +#define ACPI_AUX_PW_GRANTED			0x1
> > +#define ACPI_NO_MAIN_PW_REMOVAL			0x2
> > +
> >  #ifdef CONFIG_PCIE_EDR
> >  void pci_acpi_add_edr_notifier(struct pci_dev *pdev);  void
> > pci_acpi_remove_edr_notifier(struct pci_dev *pdev); @@ -134,12 +138,21
> > @@ static inline void pci_acpi_remove_edr_notifier(struct pci_dev
> > *pdev) { }
> >
> >  int pci_acpi_set_companion_lookup_hook(struct acpi_device
> > *(*func)(struct pci_dev *));  void
> > pci_acpi_clear_companion_lookup_hook(void);
> > +int pci_acpi_register_aux_power_notifier(struct notifier_block *nb);
> > +void pci_acpi_unregister_aux_power_notifier(struct notifier_block
> > +*nb);
> >  int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32
> > requested_power);  int pci_acpi_add_perst_assertion_delay(struct
> > pci_dev *dev, u32 delay_us);
> >
> >  #else	/* CONFIG_ACPI */
> >  static inline void acpi_pci_add_bus(struct pci_bus *bus) { }  static
> > inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
> > +int pci_acpi_register_aux_power_notifier(struct notifier_block *nb) {
> > +	return -EOPNOTSUPP;
> > +}
> > +
> > +void pci_acpi_unregister_aux_power_notifier(struct notifier_block
> > +*nb) { }
> > +
> >  int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32
> > requested_power)  {
> >  	return -EOPNOTSUPP;
> > --
> > 2.43.0
> >

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
  2025-04-02 11:23     ` Rafael J. Wysocki
@ 2025-04-03 11:30       ` Gupta, Anshuman
  2025-04-03 13:34         ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Gupta, Anshuman @ 2025-04-03 11:30 UTC (permalink / raw)
  To: Rafael J. Wysocki, Bjorn Helgaas
  Cc: intel-xe@lists.freedesktop.org, linux-acpi@vger.kernel.org,
	linux-pci@vger.kernel.org, lenb@kernel.org, bhelgaas@google.com,
	ilpo.jarvinen@linux.intel.com, De Marchi, Lucas, Vivi, Rodrigo,
	Nilawar, Badal, Gupta, Varun, ville.syrjala@linux.intel.com,
	Shankar, Uma



> -----Original Message-----
> From: Rafael J. Wysocki <rafael@kernel.org>
> Sent: Wednesday, April 2, 2025 4:54 PM
> To: Bjorn Helgaas <helgaas@kernel.org>; Gupta, Anshuman
> <anshuman.gupta@intel.com>
> Cc: intel-xe@lists.freedesktop.org; linux-acpi@vger.kernel.org; linux-
> pci@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Nilawar,
> Badal <badal.nilawar@intel.com>; Gupta, Varun <varun.gupta@intel.com>;
> ville.syrjala@linux.intel.com; Shankar, Uma <uma.shankar@intel.com>
> Subject: Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
> 
> On Tue, Apr 1, 2025 at 10:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Tue, Apr 01, 2025 at 09:02:16PM +0530, Anshuman Gupta wrote:
> > > Adding a notifier to notify all PCIe child devices about the block
> > > main power removal. It is needed because theoretically multiple PCIe
> > > Endpoint devices on same Root Port can request for AUX power and
> > > _DSM method can return with 80000000h signifies that the hierarchy
> > > connected via the slot cannot support main power removal when in
> > > D3Cold.
> >
> > I wish the spec used different language here.  "D3cold" *means* "main
> > power is removed" (PCIe r6.0, sec 5.3.1.4.2), so it doesn't make sense
> > to say that "the slot cannot support main power removal when in
> > D3cold".  If a device is in D3cold, its main power has been removed by
> > definition.
> >
> > I suppose the spec is trying to say if the driver has called this _DSM
> > with 80000000h, it means the platform must not remove main power from
> > the device while the system is in S0?  Is that what you think it
> > means?
> >
> > The 2h return value description says it "indicates that the platform
> > will not remove main power from the slot while the system is in S0,"
> > so I guess that must be it.
> >
> > In this series, pci_acpi_aux_power_setup() only supplies a 16-bit
> > aux_pwr_limit value, so the driver cannot *request* that the platform
> > not remove main power.
> >
> > So I guess the only scenario where the _DSM returns 80000000h is when
> > the platform itself or other devices prevent the removal of main
> > power.  And the point of the notifier is to tell devices that their
> > main power will never be removed while the system is in S0.  Is that
> > right?
> >
> > > This may disrupt functionality of other child device.
> >
> > What sort of disruption could happen?  I would think that if the _DSM
> > returns 80000000h, it just means the device will not have main power
> > removed, so it won't see that power state transition.  The only
> > "disruption" would be that we're using more power.
> >
> > > Let's notify all PCIe devices requested Aux power resource and Let
> > > PCIe End Point driver handle it in its callback.
> >
> > Wrap to fill 75 columns.
> >
> > s/Adding/Add/
> > s/Let's notify/Notify/
> > s/and Let/and let/
> > s/End Point/Endpoint/ (several places here and below)
> >
> > > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > > ---
> > >  drivers/pci/pci-acpi.c   | 34 +++++++++++++++++++++++++++++++---
> > >  include/linux/pci-acpi.h | 13 +++++++++++++
> > >  2 files changed, 44 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index
> > > 04149f037664..d1ca1649e6e8 100644
> > > --- a/drivers/pci/pci-acpi.c
> > > +++ b/drivers/pci/pci-acpi.c
> > > @@ -1421,6 +1421,32 @@ static void pci_acpi_optimize_delay(struct
> pci_dev *pdev,
> > >       ACPI_FREE(obj);
> > >  }
> > >
> > > +static BLOCKING_NOTIFIER_HEAD(pci_acpi_aux_power_notify_list);
> > > +
> > > +/**
> > > + * pci_acpi_register_aux_power_notifier - Register driver notifier
> > > + * @nb: notifier block
> > > + *
> > > + * This function shall be called by PCIe End Point device requested
> > > +the Aux
> > > + * power resource in order to handle the any scenario gracefully
> > > +when other
> > > + * child PCIe devices Aux power request returns with No main power
> removal.
> > > + * PCIe devices which register this notifier shall handle No main
> > > +power
> > > + * removal scenario accordingly.
> >
> > This would actually be called by the *driver* (not by the device).
> 
Hi Rafael,
Thanks for review.
> Apart from this, there seems to be a design issue here because it won't notify
> every driver that has requested the Aux power, just the ones that have also
> registered notifiers.
IMHO that is what intended, if any device has functional impact in our case it is
INTEL GPU has functional impact if other PCIe device's (on same root port) Aux
Power request returns with 0x2. We need to handle such case to handle it gracefully.
> 
> So this appears to be an opt-in from getting notifications on Aux power
> request rejection responses to requests from other drivers requesting Aux
> power for the same Root Port, but the changelog of the first patch already
> claimed that the aggregation of requests was not supported.  So if only one
> driver will be allowed to request the Aux power for the given Root Port, why
> would the notifiers be necessary after all?
Please guide us, if we remove the above limitation from the commit log.
Then do we have any design issue with notifier approach ?
Thanks,
Anshuman.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
  2025-04-03 11:30       ` Gupta, Anshuman
@ 2025-04-03 13:34         ` Rafael J. Wysocki
  2025-04-03 16:08           ` Gupta, Anshuman
  0 siblings, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2025-04-03 13:34 UTC (permalink / raw)
  To: Gupta, Anshuman
  Cc: Rafael J. Wysocki, Bjorn Helgaas, intel-xe@lists.freedesktop.org,
	linux-acpi@vger.kernel.org, linux-pci@vger.kernel.org,
	lenb@kernel.org, bhelgaas@google.com,
	ilpo.jarvinen@linux.intel.com, De Marchi, Lucas, Vivi, Rodrigo,
	Nilawar, Badal, Gupta, Varun, ville.syrjala@linux.intel.com,
	Shankar, Uma

On Thu, Apr 3, 2025 at 1:30 PM Gupta, Anshuman <anshuman.gupta@intel.com> wrote:
>
> > -----Original Message-----
> > From: Rafael J. Wysocki <rafael@kernel.org>
> > Sent: Wednesday, April 2, 2025 4:54 PM
> > To: Bjorn Helgaas <helgaas@kernel.org>; Gupta, Anshuman
> > <anshuman.gupta@intel.com>
> > Cc: intel-xe@lists.freedesktop.org; linux-acpi@vger.kernel.org; linux-
> > pci@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> > bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> > <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Nilawar,
> > Badal <badal.nilawar@intel.com>; Gupta, Varun <varun.gupta@intel.com>;
> > ville.syrjala@linux.intel.com; Shankar, Uma <uma.shankar@intel.com>
> > Subject: Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
> >
> > On Tue, Apr 1, 2025 at 10:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > On Tue, Apr 01, 2025 at 09:02:16PM +0530, Anshuman Gupta wrote:
> > > > Adding a notifier to notify all PCIe child devices about the block
> > > > main power removal. It is needed because theoretically multiple PCIe
> > > > Endpoint devices on same Root Port can request for AUX power and
> > > > _DSM method can return with 80000000h signifies that the hierarchy
> > > > connected via the slot cannot support main power removal when in
> > > > D3Cold.
> > >
> > > I wish the spec used different language here.  "D3cold" *means* "main
> > > power is removed" (PCIe r6.0, sec 5.3.1.4.2), so it doesn't make sense
> > > to say that "the slot cannot support main power removal when in
> > > D3cold".  If a device is in D3cold, its main power has been removed by
> > > definition.
> > >
> > > I suppose the spec is trying to say if the driver has called this _DSM
> > > with 80000000h, it means the platform must not remove main power from
> > > the device while the system is in S0?  Is that what you think it
> > > means?
> > >
> > > The 2h return value description says it "indicates that the platform
> > > will not remove main power from the slot while the system is in S0,"
> > > so I guess that must be it.
> > >
> > > In this series, pci_acpi_aux_power_setup() only supplies a 16-bit
> > > aux_pwr_limit value, so the driver cannot *request* that the platform
> > > not remove main power.
> > >
> > > So I guess the only scenario where the _DSM returns 80000000h is when
> > > the platform itself or other devices prevent the removal of main
> > > power.  And the point of the notifier is to tell devices that their
> > > main power will never be removed while the system is in S0.  Is that
> > > right?
> > >
> > > > This may disrupt functionality of other child device.
> > >
> > > What sort of disruption could happen?  I would think that if the _DSM
> > > returns 80000000h, it just means the device will not have main power
> > > removed, so it won't see that power state transition.  The only
> > > "disruption" would be that we're using more power.
> > >
> > > > Let's notify all PCIe devices requested Aux power resource and Let
> > > > PCIe End Point driver handle it in its callback.
> > >
> > > Wrap to fill 75 columns.
> > >
> > > s/Adding/Add/
> > > s/Let's notify/Notify/
> > > s/and Let/and let/
> > > s/End Point/Endpoint/ (several places here and below)
> > >
> > > > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > > > ---
> > > >  drivers/pci/pci-acpi.c   | 34 +++++++++++++++++++++++++++++++---
> > > >  include/linux/pci-acpi.h | 13 +++++++++++++
> > > >  2 files changed, 44 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index
> > > > 04149f037664..d1ca1649e6e8 100644
> > > > --- a/drivers/pci/pci-acpi.c
> > > > +++ b/drivers/pci/pci-acpi.c
> > > > @@ -1421,6 +1421,32 @@ static void pci_acpi_optimize_delay(struct
> > pci_dev *pdev,
> > > >       ACPI_FREE(obj);
> > > >  }
> > > >
> > > > +static BLOCKING_NOTIFIER_HEAD(pci_acpi_aux_power_notify_list);
> > > > +
> > > > +/**
> > > > + * pci_acpi_register_aux_power_notifier - Register driver notifier
> > > > + * @nb: notifier block
> > > > + *
> > > > + * This function shall be called by PCIe End Point device requested
> > > > +the Aux
> > > > + * power resource in order to handle the any scenario gracefully
> > > > +when other
> > > > + * child PCIe devices Aux power request returns with No main power
> > removal.
> > > > + * PCIe devices which register this notifier shall handle No main
> > > > +power
> > > > + * removal scenario accordingly.
> > >
> > > This would actually be called by the *driver* (not by the device).
> >
> Hi Rafael,
> Thanks for review.
> > Apart from this, there seems to be a design issue here because it won't notify
> > every driver that has requested the Aux power, just the ones that have also
> > registered notifiers.
> IMHO that is what intended, if any device has functional impact in our case it is
> INTEL GPU has functional impact if other PCIe device's (on same root port) Aux
> Power request returns with 0x2. We need to handle such case to handle it gracefully.
> >
> > So this appears to be an opt-in from getting notifications on Aux power
> > request rejection responses to requests from other drivers requesting Aux
> > power for the same Root Port, but the changelog of the first patch already
> > claimed that the aggregation of requests was not supported.  So if only one
> > driver will be allowed to request the Aux power for the given Root Port, why
> > would the notifiers be necessary after all?
> >
> Please guide us, if we remove the above limitation from the commit log.
> Then do we have any design issue with notifier approach ?

Exactly like I said: If you only allow one driver to use the _DSM to
request the Aux power from a given Root Port, it will have all of the
information and does not need to be notified about any changes.  Since
no one else is allowed to use this interface for that Root Port, no
one else will need a notifier either.  For this to work, you need some
mechanism allowing drivers to claim the interface so no one else can
use it (on a per Root Port basis) which is currently missing AFAICS.
I think that this is the option you want to pursue.

OTOH, if you want to allow multiple drivers to use this interface for
the same Root Port, you need to aggregate the requests (as required by
the spec IIUC) in the first place, or else the users can override each
other's request.  In that case you'll likely need a notification
mechanism of some sort to let the requesters know whether or not the
Aux power will be provided, but maybe that can be done through a
callback associated with a request.

The second option is genuinely more complicated because all of the
devices requesting the Aux power from the same Root Port cannot be
suspended independently in that case, so until there is a clear
real-world use case for it, my recommendation would be to go for the
first option.

But the first option means exclusive access and so no need for
notifiers and such.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
  2025-04-03  7:56     ` Gupta, Anshuman
@ 2025-04-03 13:48       ` Rafael J. Wysocki
  0 siblings, 0 replies; 42+ messages in thread
From: Rafael J. Wysocki @ 2025-04-03 13:48 UTC (permalink / raw)
  To: Gupta, Anshuman
  Cc: Bjorn Helgaas, intel-xe@lists.freedesktop.org,
	linux-acpi@vger.kernel.org, linux-pci@vger.kernel.org,
	rafael@kernel.org, lenb@kernel.org, bhelgaas@google.com,
	ilpo.jarvinen@linux.intel.com, De Marchi, Lucas, Vivi, Rodrigo,
	Nilawar, Badal, Gupta, Varun, ville.syrjala@linux.intel.com,
	Shankar, Uma, Nasim, Kam

On Thu, Apr 3, 2025 at 9:57 AM Gupta, Anshuman <anshuman.gupta@intel.com> wrote:
>
> > -----Original Message-----
> > From: Bjorn Helgaas <helgaas@kernel.org>
> > Sent: Wednesday, April 2, 2025 1:44 AM
> > To: Gupta, Anshuman <anshuman.gupta@intel.com>
> > Cc: intel-xe@lists.freedesktop.org; linux-acpi@vger.kernel.org; linux-
> > pci@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> > bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> > <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Nilawar,
> > Badal <badal.nilawar@intel.com>; Gupta, Varun <varun.gupta@intel.com>;
> > ville.syrjala@linux.intel.com; Shankar, Uma <uma.shankar@intel.com>
> > Subject: Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
> >
> > On Tue, Apr 01, 2025 at 09:02:16PM +0530, Anshuman Gupta wrote:
> > > Adding a notifier to notify all PCIe child devices about the block
> > > main power removal. It is needed because theoretically multiple PCIe
> > > Endpoint devices on same Root Port can request for AUX power and _DSM
> > > method can return with 80000000h signifies that the hierarchy
> > > connected via the slot cannot support main power removal when in
> > > D3Cold.
> >
> > I wish the spec used different language here.  "D3cold" *means* "main power
> > is removed" (PCIe r6.0, sec 5.3.1.4.2), so it doesn't make sense to say that
> > "the slot cannot support main power removal when in D3cold".  If a device is
> > in D3cold, its main power has been removed by definition.
> >
> > I suppose the spec is trying to say if the driver has called this _DSM with
> > 80000000h, it means the platform must not remove main power from the
> > device while the system is in S0?  Is that what you think it means?
> Thanks for review.
> what I understand here, S0 term essentially means here PCIe Runtime PM or s2idle
> (both refers system is S0 state). AFAIU during both Runtime PM and s2ilde path
> Root Port can transition  to D3Cold if it support _PR3.
> I observed Root Port either have D0 or D3Cold state.
> /sys/bus/pci0/devices/$root_port_bdf/firmware_node/real_power_state
>
> Driver can disable the D3cold by pci_d3cold_disable(), so I wonder there is no use
> case driver calling _DSM 0xa with 80000000h.
> PCIe specs shall be simplified by removal of value 80000000h from _DSM 0xa Arg.
> >
> > The 2h return value description says it "indicates that the platform will not
> > remove main power from the slot while the system is in S0,"
> > so I guess that must be it.
> >
> > In this series, pci_acpi_aux_power_setup() only supplies a 16-bit
> > aux_pwr_limit value, so the driver cannot *request* that the platform not
> > remove main power.
> Yeah but that value Xe driver received from GPU firmware.
> >
> > So I guess the only scenario where the _DSM returns 80000000h is when the
> > platform itself or other devices prevent the removal of main power.  And the
> > point of the notifier is to tell devices that their main power will never be
> > removed while the system is in S0.  Is that right?
> Yes the notifier will safeguard against the use case if some other PCIe device
> calls _DSM 0xa with Arg3 Value of 80000000h or its request return with 0x2h.
> >
> > > This may disrupt functionality of other child device.
> >
> > What sort of disruption could happen?  I would think that if the _DSM returns
> > 80000000h, it just means the device will not have main power removed, so it
> > won't see that power state transition.  The only "disruption" would be that
> > we're using more power.
>
> Let's say during Xe Driver initialization BIOS firmware grants the Xe driver
> Aux power request successfully.

Slightly OT, but if you do it at init time, you also need to do it
after hibernation because the original ini-time request may not
survive it.

> Xe driver will prepare to transition D3Cold state with VRAM Self Refresh.
> VRAM Self Refresh relies on vram shall derive power from Aux power.
> But later at any point if some other PCIe device under same root port
> Calls _DSM either with 080000000h or _DSM returns with 02h, will
> Block the main power removal.

As I said in the previous message, this cannot happen if only one
driver at a time is allowed to request the Aux power on a given Root
Port.

If multiple drivers are allowed to do it, aggregation of requests is necessary.

> But Xe driver is unaware of it can still
> program the VRAM Self Refresh and that make VRAM to derive power
> from Aux and that will disrupt the GPU VRAM as there is no switch to Aux

This is a rather unfortunate and counter-intuitive design decision
IMV.  If the main power is still there, I'd expect the VRAM Self
Refresh to use it instead of the Aux power.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
  2025-04-03 13:34         ` Rafael J. Wysocki
@ 2025-04-03 16:08           ` Gupta, Anshuman
  2025-04-03 18:15             ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Gupta, Anshuman @ 2025-04-03 16:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Bjorn Helgaas, intel-xe@lists.freedesktop.org,
	linux-acpi@vger.kernel.org, linux-pci@vger.kernel.org,
	lenb@kernel.org, bhelgaas@google.com,
	ilpo.jarvinen@linux.intel.com, De Marchi, Lucas, Vivi, Rodrigo,
	Nilawar, Badal, Gupta, Varun, ville.syrjala@linux.intel.com,
	Shankar, Uma



> -----Original Message-----
> From: Rafael J. Wysocki <rafael@kernel.org>
> Sent: Thursday, April 3, 2025 7:04 PM
> To: Gupta, Anshuman <anshuman.gupta@intel.com>
> Cc: Rafael J. Wysocki <rafael@kernel.org>; Bjorn Helgaas
> <helgaas@kernel.org>; intel-xe@lists.freedesktop.org; linux-
> acpi@vger.kernel.org; linux-pci@vger.kernel.org; lenb@kernel.org;
> bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Nilawar,
> Badal <badal.nilawar@intel.com>; Gupta, Varun <varun.gupta@intel.com>;
> ville.syrjala@linux.intel.com; Shankar, Uma <uma.shankar@intel.com>
> Subject: Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
> 
> On Thu, Apr 3, 2025 at 1:30 PM Gupta, Anshuman
> <anshuman.gupta@intel.com> wrote:
> >
> > > -----Original Message-----
> > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > Sent: Wednesday, April 2, 2025 4:54 PM
> > > To: Bjorn Helgaas <helgaas@kernel.org>; Gupta, Anshuman
> > > <anshuman.gupta@intel.com>
> > > Cc: intel-xe@lists.freedesktop.org; linux-acpi@vger.kernel.org;
> > > linux- pci@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> > > bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> > > <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>;
> > > Nilawar, Badal <badal.nilawar@intel.com>; Gupta, Varun
> > > <varun.gupta@intel.com>; ville.syrjala@linux.intel.com; Shankar, Uma
> > > <uma.shankar@intel.com>
> > > Subject: Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
> > >
> > > On Tue, Apr 1, 2025 at 10:13 PM Bjorn Helgaas <helgaas@kernel.org>
> wrote:
> > > >
> > > > On Tue, Apr 01, 2025 at 09:02:16PM +0530, Anshuman Gupta wrote:
> > > > > Adding a notifier to notify all PCIe child devices about the
> > > > > block main power removal. It is needed because theoretically
> > > > > multiple PCIe Endpoint devices on same Root Port can request for
> > > > > AUX power and _DSM method can return with 80000000h signifies
> > > > > that the hierarchy connected via the slot cannot support main
> > > > > power removal when in D3Cold.
> > > >
> > > > I wish the spec used different language here.  "D3cold" *means*
> > > > "main power is removed" (PCIe r6.0, sec 5.3.1.4.2), so it doesn't
> > > > make sense to say that "the slot cannot support main power removal
> > > > when in D3cold".  If a device is in D3cold, its main power has
> > > > been removed by definition.
> > > >
> > > > I suppose the spec is trying to say if the driver has called this
> > > > _DSM with 80000000h, it means the platform must not remove main
> > > > power from the device while the system is in S0?  Is that what you
> > > > think it means?
> > > >
> > > > The 2h return value description says it "indicates that the
> > > > platform will not remove main power from the slot while the system is in
> S0,"
> > > > so I guess that must be it.
> > > >
> > > > In this series, pci_acpi_aux_power_setup() only supplies a 16-bit
> > > > aux_pwr_limit value, so the driver cannot *request* that the
> > > > platform not remove main power.
> > > >
> > > > So I guess the only scenario where the _DSM returns 80000000h is
> > > > when the platform itself or other devices prevent the removal of
> > > > main power.  And the point of the notifier is to tell devices that
> > > > their main power will never be removed while the system is in S0.
> > > > Is that right?
> > > >
> > > > > This may disrupt functionality of other child device.
> > > >
> > > > What sort of disruption could happen?  I would think that if the
> > > > _DSM returns 80000000h, it just means the device will not have
> > > > main power removed, so it won't see that power state transition.
> > > > The only "disruption" would be that we're using more power.
> > > >
> > > > > Let's notify all PCIe devices requested Aux power resource and
> > > > > Let PCIe End Point driver handle it in its callback.
> > > >
> > > > Wrap to fill 75 columns.
> > > >
> > > > s/Adding/Add/
> > > > s/Let's notify/Notify/
> > > > s/and Let/and let/
> > > > s/End Point/Endpoint/ (several places here and below)
> > > >
> > > > > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > > > > ---
> > > > >  drivers/pci/pci-acpi.c   | 34 +++++++++++++++++++++++++++++++---
> > > > >  include/linux/pci-acpi.h | 13 +++++++++++++
> > > > >  2 files changed, 44 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > > > > index
> > > > > 04149f037664..d1ca1649e6e8 100644
> > > > > --- a/drivers/pci/pci-acpi.c
> > > > > +++ b/drivers/pci/pci-acpi.c
> > > > > @@ -1421,6 +1421,32 @@ static void
> > > > > pci_acpi_optimize_delay(struct
> > > pci_dev *pdev,
> > > > >       ACPI_FREE(obj);
> > > > >  }
> > > > >
> > > > > +static BLOCKING_NOTIFIER_HEAD(pci_acpi_aux_power_notify_list);
> > > > > +
> > > > > +/**
> > > > > + * pci_acpi_register_aux_power_notifier - Register driver
> > > > > +notifier
> > > > > + * @nb: notifier block
> > > > > + *
> > > > > + * This function shall be called by PCIe End Point device
> > > > > +requested the Aux
> > > > > + * power resource in order to handle the any scenario
> > > > > +gracefully when other
> > > > > + * child PCIe devices Aux power request returns with No main
> > > > > +power
> > > removal.
> > > > > + * PCIe devices which register this notifier shall handle No
> > > > > +main power
> > > > > + * removal scenario accordingly.
> > > >
> > > > This would actually be called by the *driver* (not by the device).
> > >
> > Hi Rafael,
> > Thanks for review.
> > > Apart from this, there seems to be a design issue here because it
> > > won't notify every driver that has requested the Aux power, just the
> > > ones that have also registered notifiers.
> > IMHO that is what intended, if any device has functional impact in our
> > case it is INTEL GPU has functional impact if other PCIe device's (on
> > same root port) Aux Power request returns with 0x2. We need to handle
> such case to handle it gracefully.
> > >
> > > So this appears to be an opt-in from getting notifications on Aux
> > > power request rejection responses to requests from other drivers
> > > requesting Aux power for the same Root Port, but the changelog of
> > > the first patch already claimed that the aggregation of requests was
> > > not supported.  So if only one driver will be allowed to request the
> > > Aux power for the given Root Port, why would the notifiers be necessary
> after all?
> > >
> > Please guide us, if we remove the above limitation from the commit log.
> > Then do we have any design issue with notifier approach ?
> 
> Exactly like I said: If you only allow one driver to use the _DSM to request the
> Aux power from a given Root Port, it will have all of the information and does
> not need to be notified about any changes.  Since no one else is allowed to use
> this interface for that Root Port, no one else will need a notifier either.  For this
> to work, you need some mechanism allowing drivers to claim the interface so
> no one else can use it (on a per Root Port basis) which is currently missing
> AFAICS.
IMHO such kind of mechanism will require to add Root Port specific data structure to claim 
the interface. But real problem is the criteria  to claim the interface.
Is first PCIe Non-Bridge Endpoint Function 0 driver can be criteria  to claim the
Interface. Or first come and first serve approach ?
> I think that this is the option you want to pursue.
> 
We will prefer to stick with this option, if it can guaranty that XeKMD will the only driver
allowed to request Aux power to enable VRSR.
> OTOH, if you want to allow multiple drivers to use this interface for the same
> Root Port, you need to aggregate the requests (as required by the spec IIUC) in
> the first place, or else the users can override each other's request.  In that case
INHO how Linux Kernel will aggregate the Aux Power request, without knowing the
total Aux power budget. AFAIU aggregated Aux power request without knowledge of total 
power budget can also be denied by BIOS[1].  
Therefore how aggregation can be useful ?
  
[1]As per r3.3 Section 4.6.10 
Platform Firmware Budgeting of Aux Power Availability:
Platform firmware must not grant more power than what is available within the system. For 
example, a board may be designed with four CEM slots (one x16 slot, one x4 slot, and two x1 
slots). The board may implement a power delivery circuit capable of supplying 2 W of power for 
the 3.3 Vaux rail supplying all 4 slots. The 3.3 Vaux pins on each CEM slot can supply 1 W each. 
Platform firmware may use the retry mechanism to prioritize requests from devices in preferred 
slots in the following manner:

-> Requests from a device in the highest priority slot (e.g., x16) are granted immediately, if 
available. 

-> Requests from devices in lower priority slots (e.g., x2, x1) are initially rejected, with a retry 
interval inversely proportional to the slot priority. For instance, if the x2 slot is higher priority 
than the x1 slots, so the retry interval for the x2 slot may be 4 seconds, and the x1 slots may be 
8 and 10 seconds.

->As requests are granted, the granted values are subtracted from the available budget.
 Retried requests are granted based on the remaining power budget or denied if insufficient 
power budget is available. Another retry is not permitted.

-> When there is insufficient power budget for a request, firmware may choose to keep main 
power on and return no power removal (2h).

Thanks,
Anshuman.
> you'll likely need a notification mechanism of some sort to let the requesters
> know whether or not the Aux power will be provided, but maybe that can be
> done through a callback associated with a request.
> 
> The second option is genuinely more complicated because all of the devices
> requesting the Aux power from the same Root Port cannot be suspended
> independently in that case, so until there is a clear real-world use case for it,
> my recommendation would be to go for the first option.
Multiple devices under same root ports requesting Aux Power is a theoretical 
use case.   
> 
> But the first option means exclusive access and so no need for notifiers and
> such.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
  2025-04-03 16:08           ` Gupta, Anshuman
@ 2025-04-03 18:15             ` Rafael J. Wysocki
  2025-04-04 12:53               ` Gupta, Anshuman
  0 siblings, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2025-04-03 18:15 UTC (permalink / raw)
  To: Gupta, Anshuman
  Cc: Rafael J. Wysocki, Bjorn Helgaas, intel-xe@lists.freedesktop.org,
	linux-acpi@vger.kernel.org, linux-pci@vger.kernel.org,
	lenb@kernel.org, bhelgaas@google.com,
	ilpo.jarvinen@linux.intel.com, De Marchi, Lucas, Vivi, Rodrigo,
	Nilawar, Badal, Gupta, Varun, ville.syrjala@linux.intel.com,
	Shankar, Uma

On Thu, Apr 3, 2025 at 6:09 PM Gupta, Anshuman <anshuman.gupta@intel.com> wrote:
>
> > -----Original Message-----
> > From: Rafael J. Wysocki <rafael@kernel.org>
> > Sent: Thursday, April 3, 2025 7:04 PM
> > To: Gupta, Anshuman <anshuman.gupta@intel.com>
> > Cc: Rafael J. Wysocki <rafael@kernel.org>; Bjorn Helgaas
> > <helgaas@kernel.org>; intel-xe@lists.freedesktop.org; linux-
> > acpi@vger.kernel.org; linux-pci@vger.kernel.org; lenb@kernel.org;
> > bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> > <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Nilawar,
> > Badal <badal.nilawar@intel.com>; Gupta, Varun <varun.gupta@intel.com>;
> > ville.syrjala@linux.intel.com; Shankar, Uma <uma.shankar@intel.com>
> > Subject: Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
> >
> > On Thu, Apr 3, 2025 at 1:30 PM Gupta, Anshuman
> > <anshuman.gupta@intel.com> wrote:
> > >
> > > > -----Original Message-----
> > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > Sent: Wednesday, April 2, 2025 4:54 PM
> > > > To: Bjorn Helgaas <helgaas@kernel.org>; Gupta, Anshuman
> > > > <anshuman.gupta@intel.com>
> > > > Cc: intel-xe@lists.freedesktop.org; linux-acpi@vger.kernel.org;
> > > > linux- pci@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> > > > bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> > > > <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>;
> > > > Nilawar, Badal <badal.nilawar@intel.com>; Gupta, Varun
> > > > <varun.gupta@intel.com>; ville.syrjala@linux.intel.com; Shankar, Uma
> > > > <uma.shankar@intel.com>
> > > > Subject: Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
> > > >
> > > > On Tue, Apr 1, 2025 at 10:13 PM Bjorn Helgaas <helgaas@kernel.org>
> > wrote:
> > > > >
> > > > > On Tue, Apr 01, 2025 at 09:02:16PM +0530, Anshuman Gupta wrote:
> > > > > > Adding a notifier to notify all PCIe child devices about the
> > > > > > block main power removal. It is needed because theoretically
> > > > > > multiple PCIe Endpoint devices on same Root Port can request for
> > > > > > AUX power and _DSM method can return with 80000000h signifies
> > > > > > that the hierarchy connected via the slot cannot support main
> > > > > > power removal when in D3Cold.
> > > > >
> > > > > I wish the spec used different language here.  "D3cold" *means*
> > > > > "main power is removed" (PCIe r6.0, sec 5.3.1.4.2), so it doesn't
> > > > > make sense to say that "the slot cannot support main power removal
> > > > > when in D3cold".  If a device is in D3cold, its main power has
> > > > > been removed by definition.
> > > > >
> > > > > I suppose the spec is trying to say if the driver has called this
> > > > > _DSM with 80000000h, it means the platform must not remove main
> > > > > power from the device while the system is in S0?  Is that what you
> > > > > think it means?
> > > > >
> > > > > The 2h return value description says it "indicates that the
> > > > > platform will not remove main power from the slot while the system is in
> > S0,"
> > > > > so I guess that must be it.
> > > > >
> > > > > In this series, pci_acpi_aux_power_setup() only supplies a 16-bit
> > > > > aux_pwr_limit value, so the driver cannot *request* that the
> > > > > platform not remove main power.
> > > > >
> > > > > So I guess the only scenario where the _DSM returns 80000000h is
> > > > > when the platform itself or other devices prevent the removal of
> > > > > main power.  And the point of the notifier is to tell devices that
> > > > > their main power will never be removed while the system is in S0.
> > > > > Is that right?
> > > > >
> > > > > > This may disrupt functionality of other child device.
> > > > >
> > > > > What sort of disruption could happen?  I would think that if the
> > > > > _DSM returns 80000000h, it just means the device will not have
> > > > > main power removed, so it won't see that power state transition.
> > > > > The only "disruption" would be that we're using more power.
> > > > >
> > > > > > Let's notify all PCIe devices requested Aux power resource and
> > > > > > Let PCIe End Point driver handle it in its callback.
> > > > >
> > > > > Wrap to fill 75 columns.
> > > > >
> > > > > s/Adding/Add/
> > > > > s/Let's notify/Notify/
> > > > > s/and Let/and let/
> > > > > s/End Point/Endpoint/ (several places here and below)
> > > > >
> > > > > > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > > > > > ---
> > > > > >  drivers/pci/pci-acpi.c   | 34 +++++++++++++++++++++++++++++++---
> > > > > >  include/linux/pci-acpi.h | 13 +++++++++++++
> > > > > >  2 files changed, 44 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > > > > > index
> > > > > > 04149f037664..d1ca1649e6e8 100644
> > > > > > --- a/drivers/pci/pci-acpi.c
> > > > > > +++ b/drivers/pci/pci-acpi.c
> > > > > > @@ -1421,6 +1421,32 @@ static void
> > > > > > pci_acpi_optimize_delay(struct
> > > > pci_dev *pdev,
> > > > > >       ACPI_FREE(obj);
> > > > > >  }
> > > > > >
> > > > > > +static BLOCKING_NOTIFIER_HEAD(pci_acpi_aux_power_notify_list);
> > > > > > +
> > > > > > +/**
> > > > > > + * pci_acpi_register_aux_power_notifier - Register driver
> > > > > > +notifier
> > > > > > + * @nb: notifier block
> > > > > > + *
> > > > > > + * This function shall be called by PCIe End Point device
> > > > > > +requested the Aux
> > > > > > + * power resource in order to handle the any scenario
> > > > > > +gracefully when other
> > > > > > + * child PCIe devices Aux power request returns with No main
> > > > > > +power
> > > > removal.
> > > > > > + * PCIe devices which register this notifier shall handle No
> > > > > > +main power
> > > > > > + * removal scenario accordingly.
> > > > >
> > > > > This would actually be called by the *driver* (not by the device).
> > > >
> > > Hi Rafael,
> > > Thanks for review.
> > > > Apart from this, there seems to be a design issue here because it
> > > > won't notify every driver that has requested the Aux power, just the
> > > > ones that have also registered notifiers.
> > > IMHO that is what intended, if any device has functional impact in our
> > > case it is INTEL GPU has functional impact if other PCIe device's (on
> > > same root port) Aux Power request returns with 0x2. We need to handle
> > such case to handle it gracefully.
> > > >
> > > > So this appears to be an opt-in from getting notifications on Aux
> > > > power request rejection responses to requests from other drivers
> > > > requesting Aux power for the same Root Port, but the changelog of
> > > > the first patch already claimed that the aggregation of requests was
> > > > not supported.  So if only one driver will be allowed to request the
> > > > Aux power for the given Root Port, why would the notifiers be necessary
> > after all?
> > > >
> > > Please guide us, if we remove the above limitation from the commit log.
> > > Then do we have any design issue with notifier approach ?
> >
> > Exactly like I said: If you only allow one driver to use the _DSM to request the
> > Aux power from a given Root Port, it will have all of the information and does
> > not need to be notified about any changes.  Since no one else is allowed to use
> > this interface for that Root Port, no one else will need a notifier either.  For this
> > to work, you need some mechanism allowing drivers to claim the interface so
> > no one else can use it (on a per Root Port basis) which is currently missing
> > AFAICS.
>
> IMHO such kind of mechanism will require to add Root Port specific data structure to claim
> the interface. But real problem is the criteria  to claim the interface.
> Is first PCIe Non-Bridge Endpoint Function 0 driver can be criteria  to claim the
> Interface. Or first come and first serve approach ?

IMV, the first PCIe Non-Bridge Endpoint Function 0 driver approach
would be sort of fragile and cumbersome to enforce.

First come, first serve is much simpler and should be sufficient for now AFAICS.

> > I think that this is the option you want to pursue.
> >
> We will prefer to stick with this option, if it can guaranty that XeKMD will the only driver
> allowed to request Aux power to enable VRSR.
>
> > OTOH, if you want to allow multiple drivers to use this interface for the same
> > Root Port, you need to aggregate the requests (as required by the spec IIUC) in
> > the first place, or else the users can override each other's request.  In that case
>
> INHO how Linux Kernel will aggregate the Aux Power request, without knowing the
> total Aux power budget. AFAIU aggregated Aux power request without knowledge of total
> power budget can also be denied by BIOS[1].
> Therefore how aggregation can be useful ?

Say two drivers request the Aux power from the same Root Port and each
of them passes a limit sufficient for its device.  Without
aggregation, what guarantees that the last limit passed will be
sufficient for both devices?

Also see the spec provision regarding retries below.

> [1]As per r3.3 Section 4.6.10
> Platform Firmware Budgeting of Aux Power Availability:
> Platform firmware must not grant more power than what is available within the system. For
> example, a board may be designed with four CEM slots (one x16 slot, one x4 slot, and two x1
> slots). The board may implement a power delivery circuit capable of supplying 2 W of power for
> the 3.3 Vaux rail supplying all 4 slots. The 3.3 Vaux pins on each CEM slot can supply 1 W each.
> Platform firmware may use the retry mechanism to prioritize requests from devices in preferred
> slots in the following manner:
>
> -> Requests from a device in the highest priority slot (e.g., x16) are granted immediately, if
> available.
>
> -> Requests from devices in lower priority slots (e.g., x2, x1) are initially rejected, with a retry
> interval inversely proportional to the slot priority. For instance, if the x2 slot is higher priority
> than the x1 slots, so the retry interval for the x2 slot may be 4 seconds, and the x1 slots may be
> 8 and 10 seconds.
>
> ->As requests are granted, the granted values are subtracted from the available budget.
>  Retried requests are granted based on the remaining power budget or denied if insufficient
> power budget is available. Another retry is not permitted.
>
> -> When there is insufficient power budget for a request, firmware may choose to keep main
> power on and return no power removal (2h).

This means that the platform firmware is supposed to do its own
aggregation, but at the system (board) level, not at the Root Port
level.

The algorithm is described above and my understanding of it is that
once a request has been granted, it cannot be retracted later.  Also
it appears to assume a 1:1 correspondence between PCIe slots and Root
Ports.

The guys who haven't been granted their requests at once may be asked
to try again later and there may not be enough Aux power for the last
guys at all, so they will not get it and they will stay on the main
power.  And again, this appears to be at the Root Port granularity.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
  2025-04-03 18:15             ` Rafael J. Wysocki
@ 2025-04-04 12:53               ` Gupta, Anshuman
  2025-04-08 13:01                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Gupta, Anshuman @ 2025-04-04 12:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Bjorn Helgaas, intel-xe@lists.freedesktop.org,
	linux-acpi@vger.kernel.org, linux-pci@vger.kernel.org,
	lenb@kernel.org, bhelgaas@google.com,
	ilpo.jarvinen@linux.intel.com, De Marchi, Lucas, Vivi, Rodrigo,
	Nilawar, Badal, Gupta, Varun, ville.syrjala@linux.intel.com,
	Shankar, Uma

/snip
> > >
> > > Exactly like I said: If you only allow one driver to use the _DSM to
> > > request the Aux power from a given Root Port, it will have all of
> > > the information and does not need to be notified about any changes.
> > > Since no one else is allowed to use this interface for that Root
> > > Port, no one else will need a notifier either.  For this to work,
> > > you need some mechanism allowing drivers to claim the interface so
> > > no one else can use it (on a per Root Port basis) which is currently missing
> AFAICS.
> >
> > IMHO such kind of mechanism will require to add Root Port specific
> > data structure to claim the interface. But real problem is the criteria  to claim
> the interface.
> > Is first PCIe Non-Bridge Endpoint Function 0 driver can be criteria
> > to claim the Interface. Or first come and first serve approach ?
> 
> IMV, the first PCIe Non-Bridge Endpoint Function 0 driver approach would be
> sort of fragile and cumbersome to enforce.
> 
> First come, first serve is much simpler and should be sufficient for now AFAICS.
We are enabling VRSR only for default vga boot device.
As it needed only GPU driving the display for better user experience.
Can we use same logic vga_default_device() to claim the interface under root port.
That will simplify the criteria to claim the interface.
> 
> > > I think that this is the option you want to pursue.
> > >
> > We will prefer to stick with this option, if it can guaranty that
> > XeKMD will the only driver allowed to request Aux power to enable VRSR.
> >
> > > OTOH, if you want to allow multiple drivers to use this interface
> > > for the same Root Port, you need to aggregate the requests (as
> > > required by the spec IIUC) in the first place, or else the users can
> > > override each other's request.  In that case
> >
> > INHO how Linux Kernel will aggregate the Aux Power request, without
> > knowing the total Aux power budget. AFAIU aggregated Aux power request
> > without knowledge of total power budget can also be denied by BIOS[1].
> > Therefore how aggregation can be useful ?
> 
> Say two drivers request the Aux power from the same Root Port and each of
> them passes a limit sufficient for its device.  Without aggregation, what
> guarantees that the last limit passed will be sufficient for both devices?
Thanks for explaining this.
> 
> Also see the spec provision regarding retries below.
> 
> > [1]As per r3.3 Section 4.6.10
> > Platform Firmware Budgeting of Aux Power Availability:
> > Platform firmware must not grant more power than what is available
> > within the system. For example, a board may be designed with four CEM
> > slots (one x16 slot, one x4 slot, and two x1 slots). The board may
> > implement a power delivery circuit capable of supplying 2 W of power for the
> 3.3 Vaux rail supplying all 4 slots. The 3.3 Vaux pins on each CEM slot can
> supply 1 W each.
> > Platform firmware may use the retry mechanism to prioritize requests
> > from devices in preferred slots in the following manner:
> >
> > -> Requests from a device in the highest priority slot (e.g., x16) are
> > -> granted immediately, if
> > available.
> >
> > -> Requests from devices in lower priority slots (e.g., x2, x1) are
> > -> initially rejected, with a retry
> > interval inversely proportional to the slot priority. For instance, if
> > the x2 slot is higher priority than the x1 slots, so the retry
> > interval for the x2 slot may be 4 seconds, and the x1 slots may be
> > 8 and 10 seconds.
> >
> > ->As requests are granted, the granted values are subtracted from the
> available budget.
> >  Retried requests are granted based on the remaining power budget or
> > denied if insufficient power budget is available. Another retry is not
> permitted.
> >
> > -> When there is insufficient power budget for a request, firmware may
> > -> choose to keep main
> > power on and return no power removal (2h).
> 
> This means that the platform firmware is supposed to do its own aggregation,
> but at the system (board) level, not at the Root Port level.
> 
> The algorithm is described above and my understanding of it is that once a
> request has been granted, it cannot be retracted later.  Also it appears to
> assume a 1:1 correspondence between PCIe slots and Root Ports.
> 
> The guys who haven't been granted their requests at once may be asked to try
> again later and there may not be enough Aux power for the last guys at all, so
> they will not get it and they will stay on the main power.  And again, this
> appears to be at the Root Port granularity.
Thanks for explanation.
Thanks,
Anshuman

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
  2025-04-04 12:53               ` Gupta, Anshuman
@ 2025-04-08 13:01                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 42+ messages in thread
From: Rafael J. Wysocki @ 2025-04-08 13:01 UTC (permalink / raw)
  To: Gupta, Anshuman
  Cc: Rafael J. Wysocki, Bjorn Helgaas, intel-xe@lists.freedesktop.org,
	linux-acpi@vger.kernel.org, linux-pci@vger.kernel.org,
	lenb@kernel.org, bhelgaas@google.com,
	ilpo.jarvinen@linux.intel.com, De Marchi, Lucas, Vivi, Rodrigo,
	Nilawar, Badal, Gupta, Varun, ville.syrjala@linux.intel.com,
	Shankar, Uma

On Fri, Apr 4, 2025 at 2:53 PM Gupta, Anshuman <anshuman.gupta@intel.com> wrote:
>
> /snip
> > > >
> > > > Exactly like I said: If you only allow one driver to use the _DSM to
> > > > request the Aux power from a given Root Port, it will have all of
> > > > the information and does not need to be notified about any changes.
> > > > Since no one else is allowed to use this interface for that Root
> > > > Port, no one else will need a notifier either.  For this to work,
> > > > you need some mechanism allowing drivers to claim the interface so
> > > > no one else can use it (on a per Root Port basis) which is currently missing
> > AFAICS.
> > >
> > > IMHO such kind of mechanism will require to add Root Port specific
> > > data structure to claim the interface. But real problem is the criteria  to claim
> > the interface.
> > > Is first PCIe Non-Bridge Endpoint Function 0 driver can be criteria
> > > to claim the Interface. Or first come and first serve approach ?
> >
> > IMV, the first PCIe Non-Bridge Endpoint Function 0 driver approach would be
> > sort of fragile and cumbersome to enforce.
> >
> > First come, first serve is much simpler and should be sufficient for now AFAICS.
>
> We are enabling VRSR only for default vga boot device.
> As it needed only GPU driving the display for better user experience.
> Can we use same logic vga_default_device() to claim the interface under root port.
> That will simplify the criteria to claim the interface.

Basically, you need to prevent somebody else from running
DSM_PCI_D3COLD_AUX_POWER_LIMIT concurrently for the given Root Port
and store the information that it has been run already.

Personally, I'd add aux_power_limit to struct acpi_device_power and
I'd use a static mutex in pci_acpi_request_d3cold_aux_power() along
the lines of:

1. Acquire the mutex.
2. If power.aux_power_limit is set for the ACPI companion of pci_dev,
release the mutex and bail out.
3. Evaluate DSM_PCI_D3COLD_AUX_POWER_LIMIT and if it fails, release
the mutex and bail out.
4. Set power.aux_power_limit for the ACPI companion of pci_dev to the
requested value.
5. Release the mutex.

Of course, this would only allow it to be set once per kernel boot, so
in order to handle hibernation properly, the same Aux power limit
would need to be requested again automatically on Root Port restore.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method
  2025-04-02 19:36               ` Rafael J. Wysocki
@ 2025-04-08 20:48                 ` Bjorn Helgaas
  2025-04-09 12:30                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2025-04-08 20:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Anshuman Gupta, intel-xe, linux-acpi, linux-pci, lenb, bhelgaas,
	ilpo.jarvinen, lucas.demarchi, rodrigo.vivi, badal.nilawar,
	varun.gupta, ville.syrjala, uma.shankar

On Wed, Apr 02, 2025 at 09:36:01PM +0200, Rafael J. Wysocki wrote:
> On Wed, Apr 2, 2025 at 8:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:

> > I don't *expect* rev 5 to be different.  But as a user of it, why
> > would I change working, tested code that is not broken?
> >
> > The PCI DPC function 0xc is an example where rev 5 (per ECN) and rev 6
> > (per r3.3) are not compatible.
> >
> > If the OS implemented rev 5, then learns via function 0 that function
> > 0xc is also supported at rev 6, and starts using the same OS code with
> > rev 6, the OS is broken.
> 
> Yes, in this case the backward compatibility language in the _DSM
> definition is obviously not followed.

Rev 5 in the ECN isn't compatible with rev 6 in the PCI FW r3.3 spec,
so it doesn't follow the ACPI compatibility requirement.  And this is
documented in PCI FW, which says "Fn 0xC was added with rev 5 (see ECN
for rev 5 details); here is how rev 6 works."

An OS implemented to the ECN doesn't know that rev 6 is different from
rev 5; it assumes they're the same because ACPI says we can assume
that, and PCI FW r3.3 even says the OS should use the same rev for all
functions.  If OS adds support for rev 6 of a some other function, it
is supposed to use rev 6 of Fn 0xC, which doesn't work as the OS
expects.

I guess one could argue that "OS didn't add rev 6 support for anything
until PCI FW r3.3 added a function at rev 6, r3.3 did mention the
difference between Fn 0xC rev 5 and 6, and OS should have looked at
all the already-implemented unrelated functions for possible changes
between rev 5 and rev 6."

I think that causes unnecessary changes in unrelated code.  The OS can
avoid the problem by *always* using Fn 0xC rev 5, regardless of what
rev it uses for other functions.

Of course, the OS can include support for multiple revs, e.g., it
could add support for Fn 0xC rev 6 and use that when available.  And
it could retain support for Fn 0xC rev 5 and use that if rev 6 isn't
available.

Bjorn

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method
  2025-04-08 20:48                 ` Bjorn Helgaas
@ 2025-04-09 12:30                   ` Rafael J. Wysocki
  2025-04-09 14:47                     ` Bjorn Helgaas
  0 siblings, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2025-04-09 12:30 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Anshuman Gupta, intel-xe, linux-acpi,
	linux-pci, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, varun.gupta, ville.syrjala,
	uma.shankar

On Tue, Apr 8, 2025 at 10:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Apr 02, 2025 at 09:36:01PM +0200, Rafael J. Wysocki wrote:
> > On Wed, Apr 2, 2025 at 8:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> > > I don't *expect* rev 5 to be different.  But as a user of it, why
> > > would I change working, tested code that is not broken?
> > >
> > > The PCI DPC function 0xc is an example where rev 5 (per ECN) and rev 6
> > > (per r3.3) are not compatible.
> > >
> > > If the OS implemented rev 5, then learns via function 0 that function
> > > 0xc is also supported at rev 6, and starts using the same OS code with
> > > rev 6, the OS is broken.
> >
> > Yes, in this case the backward compatibility language in the _DSM
> > definition is obviously not followed.
>
> Rev 5 in the ECN isn't compatible with rev 6 in the PCI FW r3.3 spec,
> so it doesn't follow the ACPI compatibility requirement.  And this is
> documented in PCI FW, which says "Fn 0xC was added with rev 5 (see ECN
> for rev 5 details); here is how rev 6 works."
>
> An OS implemented to the ECN doesn't know that rev 6 is different from
> rev 5; it assumes they're the same because ACPI says we can assume
> that, and PCI FW r3.3 even says the OS should use the same rev for all
> functions.

OK (and this is important because PCI FW r3.3 is the spec defining the
interface)

> If OS adds support for rev 6 of a some other function, it is supposed to use
> rev 6 of Fn 0xC, which doesn't work as the OS expects.

IMV with respect to _DSM, the spec that has defined the interface (PCI
FW r3.3) takes precedence over the ACPI spec regardless of what the
latter is saying.  In this case ACPI provides a framework the
interface can be based on, but the actual interface is not defined by
it.

Also, I think that the OS should use rev 6 if it is supported by the
firmware (for all functions) and it should literally follow the
definition of rev 6.

If rev 6 is not supported (or some functions needed by the OS are not
implemented by the firmware), it should fall back to rev 5 and in this
case it should literally follow the definition of rev 5.

And so on.

> I guess one could argue that "OS didn't add rev 6 support for anything
> until PCI FW r3.3 added a function at rev 6, r3.3 did mention the
> difference between Fn 0xC rev 5 and 6, and OS should have looked at
> all the already-implemented unrelated functions for possible changes
> between rev 5 and rev 6."

Yes, it should.

> I think that causes unnecessary changes in unrelated code.  The OS can
> avoid the problem by *always* using Fn 0xC rev 5, regardless of what
> rev it uses for other functions.

What if the functions on the firmware side depend on each other
interfally and the firmware gets confused if revisions are mixed up on
the OS side?

Since PCI FW r3.3 says that the OS should use the same rev for all
functions, that's what it should do because the firmware may
reasonably assume that it will.

> Of course, the OS can include support for multiple revs, e.g., it
> could add support for Fn 0xC rev 6 and use that when available.  And
> it could retain support for Fn 0xC rev 5 and use that if rev 6 isn't
> available.

That's what I mean basically.

In practice, the OS needs to assume that the firmware may not
implement a certain rev and so it needs to support other revisions
anyway.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method
  2025-04-09 12:30                   ` Rafael J. Wysocki
@ 2025-04-09 14:47                     ` Bjorn Helgaas
  2025-04-09 16:28                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2025-04-09 14:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Anshuman Gupta, intel-xe, linux-acpi, linux-pci, lenb, bhelgaas,
	ilpo.jarvinen, lucas.demarchi, rodrigo.vivi, badal.nilawar,
	varun.gupta, ville.syrjala, uma.shankar

On Wed, Apr 09, 2025 at 02:30:31PM +0200, Rafael J. Wysocki wrote:
> On Tue, Apr 8, 2025 at 10:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Wed, Apr 02, 2025 at 09:36:01PM +0200, Rafael J. Wysocki wrote:
> > > On Wed, Apr 2, 2025 at 8:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > > > I don't *expect* rev 5 to be different.  But as a user of it,
> > > > why would I change working, tested code that is not broken?
> > > >
> > > > The PCI DPC function 0xc is an example where rev 5 (per ECN)
> > > > and rev 6 (per r3.3) are not compatible.
> > > >
> > > > If the OS implemented rev 5, then learns via function 0 that
> > > > function 0xc is also supported at rev 6, and starts using the
> > > > same OS code with rev 6, the OS is broken.
> > >
> > > Yes, in this case the backward compatibility language in the
> > > _DSM definition is obviously not followed.
> >
> > Rev 5 in the ECN isn't compatible with rev 6 in the PCI FW r3.3
> > spec, so it doesn't follow the ACPI compatibility requirement.
> > And this is documented in PCI FW, which says "Fn 0xC was added
> > with rev 5 (see ECN for rev 5 details); here is how rev 6 works."
> >
> > An OS implemented to the ECN doesn't know that rev 6 is different
> > from rev 5; it assumes they're the same because ACPI says we can
> > assume that, and PCI FW r3.3 even says the OS should use the same
> > rev for all functions.
> 
> OK (and this is important because PCI FW r3.3 is the spec defining
> the interface)
> 
> > If OS adds support for rev 6 of a some other function, it is
> > supposed to use rev 6 of Fn 0xC, which doesn't work as the OS
> > expects.
> 
> IMV with respect to _DSM, the spec that has defined the interface
> (PCI FW r3.3) takes precedence over the ACPI spec regardless of what
> the latter is saying.  In this case ACPI provides a framework the
> interface can be based on, but the actual interface is not defined
> by it.
> 
> Also, I think that the OS should use rev 6 if it is supported by the
> firmware (for all functions) and it should literally follow the
> definition of rev 6.

I think you interpret rev 6 as a global revision of the platform,
i.e., the platform supports rev 6 for every function it implements in
this UUID (which is clearly the intent of the ACPI ASL example).

I suggest that it would be better to interpret the revision
individually for each function because it removes the backwards
compatibility assumption and reduces the testing burden.

Most functions would be specified and implemented with rev 0, and
would never have a rev 1.  There would only be a rev 1 of a function
if we made a non backwards compatible change to it.

Any other functions would be untouched, and they would still only
support rev 0, not rev 1.

> > I guess one could argue that "OS didn't add rev 6 support for
> > anything until PCI FW r3.3 added a function at rev 6, r3.3 did
> > mention the difference between Fn 0xC rev 5 and 6, and OS should
> > have looked at all the already-implemented unrelated functions for
> > possible changes between rev 5 and rev 6."
> 
> Yes, it should.

I don't think it's reasonable to require the person adding support for
Fn 0xE rev 6 (TLP Processing Hints) to go back and add Fn 0xC rev 6
(Downstream Port Containment) at the same time.

Assuming that "rev X works the same as rev X-1 and therefore rev X
doesn't need to be tested" seems unwise to me.  But even if we
normally rely on that assumption, in this case Fn 0xC rev 5 and rev 6
are different, so we'd be adding new code that would require testing
on every platform that supports rev 6 of any function.

> What if the functions on the firmware side depend on each other
> interfally and the firmware gets confused if revisions are mixed up
> on the OS side?

In such a case, the backwards compatibility assumption doesn't apply
to those functions, so the spec would have to document multiple
revisions of them, and IMO that's the natural place to document
a requirement to use the same revision for the set.

Bjorn

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method
  2025-04-09 14:47                     ` Bjorn Helgaas
@ 2025-04-09 16:28                       ` Rafael J. Wysocki
  0 siblings, 0 replies; 42+ messages in thread
From: Rafael J. Wysocki @ 2025-04-09 16:28 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Anshuman Gupta, intel-xe, linux-acpi,
	linux-pci, lenb, bhelgaas, ilpo.jarvinen, lucas.demarchi,
	rodrigo.vivi, badal.nilawar, varun.gupta, ville.syrjala,
	uma.shankar

On Wed, Apr 9, 2025 at 4:47 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Apr 09, 2025 at 02:30:31PM +0200, Rafael J. Wysocki wrote:
> > On Tue, Apr 8, 2025 at 10:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, Apr 02, 2025 at 09:36:01PM +0200, Rafael J. Wysocki wrote:
> > > > On Wed, Apr 2, 2025 at 8:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > > > I don't *expect* rev 5 to be different.  But as a user of it,
> > > > > why would I change working, tested code that is not broken?
> > > > >
> > > > > The PCI DPC function 0xc is an example where rev 5 (per ECN)
> > > > > and rev 6 (per r3.3) are not compatible.
> > > > >
> > > > > If the OS implemented rev 5, then learns via function 0 that
> > > > > function 0xc is also supported at rev 6, and starts using the
> > > > > same OS code with rev 6, the OS is broken.
> > > >
> > > > Yes, in this case the backward compatibility language in the
> > > > _DSM definition is obviously not followed.
> > >
> > > Rev 5 in the ECN isn't compatible with rev 6 in the PCI FW r3.3
> > > spec, so it doesn't follow the ACPI compatibility requirement.
> > > And this is documented in PCI FW, which says "Fn 0xC was added
> > > with rev 5 (see ECN for rev 5 details); here is how rev 6 works."
> > >
> > > An OS implemented to the ECN doesn't know that rev 6 is different
> > > from rev 5; it assumes they're the same because ACPI says we can
> > > assume that, and PCI FW r3.3 even says the OS should use the same
> > > rev for all functions.
> >
> > OK (and this is important because PCI FW r3.3 is the spec defining
> > the interface)
> >
> > > If OS adds support for rev 6 of a some other function, it is
> > > supposed to use rev 6 of Fn 0xC, which doesn't work as the OS
> > > expects.
> >
> > IMV with respect to _DSM, the spec that has defined the interface
> > (PCI FW r3.3) takes precedence over the ACPI spec regardless of what
> > the latter is saying.  In this case ACPI provides a framework the
> > interface can be based on, but the actual interface is not defined
> > by it.
> >
> > Also, I think that the OS should use rev 6 if it is supported by the
> > firmware (for all functions) and it should literally follow the
> > definition of rev 6.
>
> I think you interpret rev 6 as a global revision of the platform,
> i.e., the platform supports rev 6 for every function it implements in
> this UUID (which is clearly the intent of the ACPI ASL example).

Yes, I do.

> I suggest that it would be better to interpret the revision
> individually for each function because it removes the backwards
> compatibility assumption and reduces the testing burden.
>
> Most functions would be specified and implemented with rev 0, and
> would never have a rev 1.  There would only be a rev 1 of a function
> if we made a non backwards compatible change to it.
>
> Any other functions would be untouched, and they would still only
> support rev 0, not rev 1.

I think that it would just be confusing because both the OS and the
firmware would now have to remember which function is defined at which
revision.

Also the industry practice in this respect has been different so far, AFAICS.

And PCI FW r3.3 wants the OS to use the same rev for all functions
which doesn't leave too much room for interpretation.

> > > I guess one could argue that "OS didn't add rev 6 support for
> > > anything until PCI FW r3.3 added a function at rev 6, r3.3 did
> > > mention the difference between Fn 0xC rev 5 and 6, and OS should
> > > have looked at all the already-implemented unrelated functions for
> > > possible changes between rev 5 and rev 6."
> >
> > Yes, it should.
>
> I don't think it's reasonable to require the person adding support for
> Fn 0xE rev 6 (TLP Processing Hints) to go back and add Fn 0xC rev 6
> (Downstream Port Containment) at the same time.

Well, if you know that the function is supposed to work the same in
the new rev, it only is a matter of changing the rev passed to
acpi_evaluate_dsm*().

> Assuming that "rev X works the same as rev X-1 and therefore rev X
> doesn't need to be tested" seems unwise to me.

So say you've only changed the rev passed to acpi_evaluate_dsm*() as
per the above.  The only reason why it may not work is when there is a
bug in the firmware.

Now, suppose that you don't change anything and stick to rev X-1, but
there is a new version of the firmware that contains a bug in the
given _DSM implementation.  Same thing.

I get the "only the tested code is trustable" argument, but its
applicability here is naturally limited.

> But even if we normally rely on that assumption, in this case Fn 0xC rev 5 and rev 6
> are different, so we'd be adding new code that would require testing
> on every platform that supports rev 6 of any function.

Unfortunately, as far as OS-firmware interfaces are concerned, you
need to trust the other end to do the right thing because you cannot
test all of the possible combinations.

> > What if the functions on the firmware side depend on each other
> > interfally and the firmware gets confused if revisions are mixed up
> > on the OS side?
>
> In such a case, the backwards compatibility assumption doesn't apply
> to those functions, so the spec would have to document multiple
> revisions of them, and IMO that's the natural place to document
> a requirement to use the same revision for the set.

I'm not talking about the definition of the interface, but about
dependencies at the implementation level.

Given the PCI FW r3.3 provisions, firmware may assume that the OS will
use the same rev for all functions and it may depend on that.

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2025-04-09 16:28 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-01 15:32 [PATCH 00/12] VRAM Self Refresh Anshuman Gupta
2025-04-01 15:32 ` [PATCH 01/12] PCI/ACPI: Add D3cold Aux Power Limit_DSM method Anshuman Gupta
2025-04-01 18:25   ` Bjorn Helgaas
2025-04-02 10:59     ` Rafael J. Wysocki
2025-04-03  5:25     ` Gupta, Anshuman
2025-04-01 15:32 ` [PATCH 02/12] PCI/ACPI: Add PERST# Assertion Delay _DSM method Anshuman Gupta
2025-04-01 18:30   ` Bjorn Helgaas
2025-04-03  5:59     ` Gupta, Anshuman
2025-04-02 11:06   ` Rafael J. Wysocki
2025-04-02 14:21     ` Bjorn Helgaas
2025-04-02 14:52       ` Rafael J. Wysocki
2025-04-02 15:50         ` Bjorn Helgaas
2025-04-02 17:51           ` Rafael J. Wysocki
2025-04-02 18:48             ` Bjorn Helgaas
2025-04-02 19:36               ` Rafael J. Wysocki
2025-04-08 20:48                 ` Bjorn Helgaas
2025-04-09 12:30                   ` Rafael J. Wysocki
2025-04-09 14:47                     ` Bjorn Helgaas
2025-04-09 16:28                       ` Rafael J. Wysocki
2025-04-01 15:32 ` [PATCH 03/12] PCI/ACPI: Add aux power grant notifier Anshuman Gupta
2025-04-01 20:13   ` Bjorn Helgaas
2025-04-02 11:23     ` Rafael J. Wysocki
2025-04-03 11:30       ` Gupta, Anshuman
2025-04-03 13:34         ` Rafael J. Wysocki
2025-04-03 16:08           ` Gupta, Anshuman
2025-04-03 18:15             ` Rafael J. Wysocki
2025-04-04 12:53               ` Gupta, Anshuman
2025-04-08 13:01                 ` Rafael J. Wysocki
2025-04-03  7:56     ` Gupta, Anshuman
2025-04-03 13:48       ` Rafael J. Wysocki
2025-04-01 15:32 ` [PATCH 04/12] drm/xe/vrsr: Introduce flag has_vrsr Anshuman Gupta
2025-04-01 15:32 ` [PATCH 05/12] drm/xe/vrsr: Detect VRSR Capability Anshuman Gupta
2025-04-01 15:32 ` [PATCH 06/12] drm/xe/vrsr: Initialize VRSR feature Anshuman Gupta
2025-04-01 19:56   ` Bjorn Helgaas
2025-04-03  6:09     ` Gupta, Anshuman
2025-04-01 15:32 ` [PATCH 07/12] drm/xe/vrsr: Enable VRSR on default VGA boot device Anshuman Gupta
2025-04-01 15:32 ` [PATCH 08/12] drm/xe: Add PCIe ACPI Aux Power notifier Anshuman Gupta
2025-04-01 15:32 ` [PATCH 09/12] drm/xe/vrsr: Refactor d3cold.allowed to a enum Anshuman Gupta
2025-04-01 15:32 ` [PATCH 10/12] drm/xe/pm: D3Cold target state Anshuman Gupta
2025-04-02 10:28   ` [10/12] " Poosa, Karthik
2025-04-01 15:32 ` [PATCH 11/12] drm/xe/vrsr: Enable VRSR Anshuman Gupta
2025-04-01 15:32 ` [PATCH 12/12] drm/xe/vrsr: Introduce a debugfs node named vrsr_capable Anshuman Gupta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).