Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 00/20] First attempt to kill mem_access
@ 2023-12-28  2:12 Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 01/20] drm/xe: Document Xe PM component Rodrigo Vivi
                   ` (23 more replies)
  0 siblings, 24 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

At first, mem_access seemed like a good idea, since it would let us
map every memory access, apply some workarounds, and use that to
ensure that the device is awake.

However, it has become a nightmare of lock-ordering conflicts with the
memory management locks. The only sane way forward is to move the
runtime_pm protection to the outer bounds and ensure that the device is
resumed well before any memory locking.

So, this RFC is a first attempt to kill mem_access and
get clean runtime PM handling at the outer bounds.

Well, at this point we already know that we need to solve some TLB
invalidation issues and that the last patch in this series needs to
be split into smaller pieces. But I'd like to at least get
the discussion started.
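
As a teaser, here is a minimal sketch of the outer-bound pattern the
series converges on (lifted from the IOCTL patch below, with error
handling trimmed):

	ret = xe_pm_runtime_get_sync(xe);	/* wake before any memory locks */
	if (ret >= 0)
		ret = drm_ioctl(file, cmd, arg);	/* inner code only asserts awake */
	xe_pm_runtime_put(xe);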

Happy New Year,
Rodrigo.

Rodrigo Vivi (20):
  drm/xe: Document Xe PM component
  drm/xe: Fix display runtime_pm handling
  drm/xe: Create a xe_pm_runtime_resume_and_get variant for display
  drm/xe: Convert xe_pm_runtime_{get,put} to void and protect from
    recursion
  drm/xe: Prepare display for D3Cold
  drm/xe: Convert mem_access assertion towards the runtime_pm state
  drm/xe: Runtime PM wake on every IOCTL
  drm/xe: Runtime PM wake on every exec
  drm/xe: Runtime PM wake on every sysfs call
  drm/xe: Sort some xe_pm_runtime related functions
  drm/xe: Ensure device is awake before removing it
  drm/xe: Remove mem_access from guc_pc calls
  drm/xe: Runtime PM wake on every debugfs call
  drm/xe: Replace dma_buf mem_access per direct xe_pm_runtime calls
  drm/xe: Allow GuC CT fast path and worker regardless of runtime_pm
  drm/xe: Remove mem_access calls from migration
  drm/xe: Removing extra mem_access protection from runtime pm
  drm/xe: Convert hwmon from mem_access to xe_pm_runtime calls
  drm/xe: Remove unused runtime pm helper
  drm/xe: Mega Kill of mem_access

 .../gpu/drm/xe/compat-i915-headers/i915_drv.h |   8 +-
 drivers/gpu/drm/xe/display/xe_fb_pin.c        |   7 +-
 drivers/gpu/drm/xe/tests/xe_bo.c              |   8 -
 drivers/gpu/drm/xe/tests/xe_migrate.c         |   2 -
 drivers/gpu/drm/xe/tests/xe_mocs.c            |   4 -
 drivers/gpu/drm/xe/xe_bo.c                    |   5 -
 drivers/gpu/drm/xe/xe_debugfs.c               |  10 +-
 drivers/gpu/drm/xe/xe_device.c                | 129 ++++-------
 drivers/gpu/drm/xe/xe_device.h                |   9 -
 drivers/gpu/drm/xe/xe_device_sysfs.c          |   4 +
 drivers/gpu/drm/xe/xe_device_types.h          |   9 -
 drivers/gpu/drm/xe/xe_dma_buf.c               |   5 +-
 drivers/gpu/drm/xe/xe_exec_queue.c            |  18 --
 drivers/gpu/drm/xe/xe_ggtt.c                  |   6 -
 drivers/gpu/drm/xe/xe_gsc.c                   |   3 -
 drivers/gpu/drm/xe/xe_gt.c                    |  17 --
 drivers/gpu/drm/xe/xe_gt_debugfs.c            |  53 ++++-
 drivers/gpu/drm/xe/xe_gt_freq.c               |  38 +++-
 drivers/gpu/drm/xe/xe_gt_idle.c               |  23 +-
 drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c     |   3 +
 drivers/gpu/drm/xe/xe_guc_ct.c                |  40 ----
 drivers/gpu/drm/xe/xe_guc_debugfs.c           |   9 +-
 drivers/gpu/drm/xe/xe_guc_pc.c                |  62 +----
 drivers/gpu/drm/xe/xe_huc_debugfs.c           |   5 +-
 drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c |  46 ++++-
 drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.h |   7 +
 drivers/gpu/drm/xe/xe_hwmon.c                 |  25 ++-
 drivers/gpu/drm/xe/xe_pat.c                   |  10 -
 drivers/gpu/drm/xe/xe_pci.c                   |   2 +-
 drivers/gpu/drm/xe/xe_pm.c                    | 211 ++++++++++++++----
 drivers/gpu/drm/xe/xe_pm.h                    |   9 +-
 drivers/gpu/drm/xe/xe_query.c                 |   4 -
 drivers/gpu/drm/xe/xe_sched_job.c             |  10 +-
 drivers/gpu/drm/xe/xe_tile.c                  |  10 +-
 drivers/gpu/drm/xe/xe_tile_sysfs.c            |   1 +
 drivers/gpu/drm/xe/xe_ttm_sys_mgr.c           |   5 +-
 drivers/gpu/drm/xe/xe_vm.c                    |   7 -
 37 files changed, 433 insertions(+), 391 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC 01/20] drm/xe: Document Xe PM component
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 02/20] drm/xe: Fix display runtime_pm handling Rodrigo Vivi
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

Replace the outdated information with proper PM documentation.
Establish the rules for the runtime PM get and put that
Xe needs to follow.

Also add missing function documentation to all the "exported" functions.

Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_pm.c | 107 +++++++++++++++++++++++++++++++++----
 1 file changed, 96 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index b429c2876a764..a7fe69d2f442e 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -25,21 +25,45 @@
 /**
  * DOC: Xe Power Management
  *
- * Xe PM shall be guided by the simplicity.
- * Use the simplest hook options whenever possible.
- * Let's not reinvent the runtime_pm references and hooks.
- * Shall have a clear separation of display and gt underneath this component.
+ * Xe PM implements the main routines for both system level suspend states and
+ * for the opportunistic runtime suspend states.
  *
- * What's next:
+ * System Level Suspend (S-States) - In general this is OS-initiated suspend
+ * driven by ACPI for achieving S0ix (a.k.a. S2idle, freeze), S3 (suspend to ram),
+ * S4 (disk). The main functions here are `xe_pm_suspend` and `xe_pm_resume`,
+ * which are the entry points for suspend to and resume from these states.
  *
- * For now s2idle and s3 are only working in integrated devices. The next step
- * is to iterate through all VRAM's BO backing them up into the system memory
- * before allowing the system suspend.
+ * Runtime Suspend (D-States) - This is the opportunistic PCIe device low power
+ * state D3. The Xe PM component provides `xe_pm_runtime_suspend` and
+ * `xe_pm_runtime_resume` functions that the PCI subsystem will call before a
+ * transition to D3. Also, Xe PM provides get and put functions that the Xe
+ * driver will use to indicate activity. In order to avoid locking complications
+ * with the memory management, whenever possible, these get and put functions
+ * need to be called from the higher/outer levels.
  *
- * Also runtime_pm needs to be here from the beginning.
+ * The main cases that need to be protected from the outer levels are: IOCTL,
+ * sysfs, debugfs, dma-buf sharing, GPU execution.
  *
- * RC6/RPS are also critical PM features. Let's start with GuCRC and GuC SLPC
- * and no wait boost. Frequency optimizations should come on a next stage.
+ * PCI D3 is special and can mean D3hot, where Vcc power is on to keep memory
+ * alive and allow a quicker, low-latency resume, or D3Cold, where Vcc power is
+ * off for better power savings.
+ * The Vcc of the PCI hierarchy can only be controlled at the PCI root port
+ * level, while the device driver can be behind multiple bridges/switches and
+ * paired with other devices. For this reason, the PCI subsystem cannot perform
+ * the transition towards D3Cold. The lowest runtime PM possible from the PCI
+ * subsystem is D3hot. Then, if all the paired devices in the same root port
+ * are in D3hot, ACPI will assist and run its _PR3 and _OFF methods to
+ * perform the transition from D3hot to D3cold. Xe may disallow this transition
+ * based on runtime conditions such as VRAM usage, to guarantee a quick and
+ * low-latency resume, for instance.
+ *
+ * Intel systems are capable of taking the system to S0ix when devices are in
+ * D3hot through runtime PM. This is also known as 'opportunistic S0ix'.
+ * But in that case, `xe_pm_suspend` and `xe_pm_resume` won't be called for
+ * S0ix.
+ *
+ * This component is not responsible for GT idleness (RC6) nor GT frequency
+ * management (RPS).
  */
 
 /**
@@ -163,6 +187,12 @@ static void xe_pm_runtime_init(struct xe_device *xe)
 	pm_runtime_put(dev);
 }
 
+/**
+ * xe_pm_init - Initialize Xe Power Management
+ * @xe: xe device instance
+ *
+ * This component is responsible for System and Device sleep states.
+ */
 void xe_pm_init(struct xe_device *xe)
 {
 	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
@@ -183,6 +213,10 @@ void xe_pm_init(struct xe_device *xe)
 	xe_pm_runtime_init(xe);
 }
 
+/**
+ * xe_pm_runtime_fini - Finalize Runtime PM
+ * @xe: xe device instance
+ */
 void xe_pm_runtime_fini(struct xe_device *xe)
 {
 	struct device *dev = xe->drm.dev;
@@ -212,6 +246,12 @@ struct task_struct *xe_pm_read_callback_task(struct xe_device *xe)
 	return READ_ONCE(xe->pm_callback_task);
 }
 
+/**
+ * xe_pm_runtime_suspend - Prepare our device for D3hot/D3Cold
+ * @xe: xe device instance
+ *
+ * Returns 0 for success, negative error code otherwise.
+ */
 int xe_pm_runtime_suspend(struct xe_device *xe)
 {
 	struct xe_gt *gt;
@@ -266,6 +306,12 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
 	return err;
 }
 
+/**
+ * xe_pm_runtime_resume - Waking up from D3hot/D3Cold
+ * @xe: xe device instance
+ *
+ * Returns 0 for success, negative error code otherwise.
+ */
 int xe_pm_runtime_resume(struct xe_device *xe)
 {
 	struct xe_gt *gt;
@@ -317,22 +363,47 @@ int xe_pm_runtime_resume(struct xe_device *xe)
 	return err;
 }
 
+/**
+ * xe_pm_runtime_get - Get a runtime_pm reference and resume synchronously
+ * @xe: xe device instance
+ *
+ * Returns: Any number greater than or equal to 0 for success, negative error
+ * code otherwise.
+ */
 int xe_pm_runtime_get(struct xe_device *xe)
 {
 	return pm_runtime_get_sync(xe->drm.dev);
 }
 
+/**
+ * xe_pm_runtime_put - Put the runtime_pm reference back and mark as idle
+ * @xe: xe device instance
+ *
+ * Returns: Any number greater than or equal to 0 for success, negative error
+ * code otherwise.
+ */
 int xe_pm_runtime_put(struct xe_device *xe)
 {
 	pm_runtime_mark_last_busy(xe->drm.dev);
 	return pm_runtime_put(xe->drm.dev);
 }
 
+/**
+ * xe_pm_runtime_get_if_active - Get a runtime_pm reference if device active
+ * @xe: xe device instance
+ *
+ * Returns: Any number greater than or equal to 0 for success, negative error
+ * code otherwise.
+ */
 int xe_pm_runtime_get_if_active(struct xe_device *xe)
 {
 	return pm_runtime_get_if_active(xe->drm.dev, true);
 }
 
+/**
+ * xe_pm_assert_unbounded_bridge - Disable PM on unbounded pcie parent bridge
+ * @xe: xe device instance
+ */
 void xe_pm_assert_unbounded_bridge(struct xe_device *xe)
 {
 	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
@@ -347,6 +418,13 @@ void xe_pm_assert_unbounded_bridge(struct xe_device *xe)
 	}
 }
 
+/**
+ * xe_pm_set_vram_threshold - Set a vram threshold for allowing/blocking D3Cold
+ * @xe: xe device instance
+ * @threshold: VRAM size in MiB for the D3cold threshold
+ *
+ * Returns 0 for success, negative error code otherwise.
+ */
 int xe_pm_set_vram_threshold(struct xe_device *xe, u32 threshold)
 {
 	struct ttm_resource_manager *man;
@@ -371,6 +449,13 @@ int xe_pm_set_vram_threshold(struct xe_device *xe, u32 threshold)
 	return 0;
 }
 
+/**
+ * xe_pm_d3cold_allowed_toggle - Check conditions to toggle d3cold.allowed
+ * @xe: xe device instance
+ *
+ * To be called during runtime_pm idle callback.
+ * Check for all the D3Cold conditions ahead of runtime suspend.
+ */
 void xe_pm_d3cold_allowed_toggle(struct xe_device *xe)
 {
 	struct ttm_resource_manager *man;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 02/20] drm/xe: Fix display runtime_pm handling
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 01/20] drm/xe: Document Xe PM component Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 03/20] drm/xe: Create a xe_pm_runtime_resume_and_get variant for display Rodrigo Vivi
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

i915's intel_runtime_pm_get_if_in_use actually calls
pm_runtime_get_if_active() with ign_usage_count = false, but Xe
was erroneously calling it with true because of the mem_access cases.
This can lead to unbalanced references.

Let's directly use the 'if_in_use' function provided by linux/pm_runtime.

Also, let's already start this new function protected against runtime
recursion, since runtime_pm will need to call into display functions
for a proper D3Cold flow.
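
For reference, the semantic difference, per linux/pm_runtime.h of this
era (illustrative only, not part of the patch):

	pm_runtime_get_if_in_use(dev);		/* ref iff active && usage count > 0 */
	pm_runtime_get_if_active(dev, false);	/* equivalent to the call above */
	pm_runtime_get_if_active(dev, true);	/* ref iff active, usage count ignored */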

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 .../gpu/drm/xe/compat-i915-headers/i915_drv.h   |  2 +-
 drivers/gpu/drm/xe/xe_pm.c                      | 17 +++++++++++++++++
 drivers/gpu/drm/xe/xe_pm.h                      |  1 +
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h b/drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h
index 5d2a77b52db41..f0464dfc45cf5 100644
--- a/drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h
+++ b/drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h
@@ -177,7 +177,7 @@ static inline bool intel_runtime_pm_get_if_in_use(struct xe_runtime_pm *pm)
 {
 	struct xe_device *xe = container_of(pm, struct xe_device, runtime_pm);
 
-	return xe_pm_runtime_get_if_active(xe);
+	return xe_pm_runtime_get_if_in_use(xe);
 }
 
 static inline void intel_runtime_pm_put_unchecked(struct xe_runtime_pm *pm)
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index a7fe69d2f442e..5a4d601ab0976 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -400,6 +400,23 @@ int xe_pm_runtime_get_if_active(struct xe_device *xe)
 	return pm_runtime_get_if_active(xe->drm.dev, true);
 }
 
+/**
+ * xe_pm_runtime_get_if_in_use - Get a runtime_pm reference if the device is in use
+ * @xe: xe device instance
+ *
+ * Returns: True if the device is awake and the reference was taken, false otherwise.
+ */
+bool xe_pm_runtime_get_if_in_use(struct xe_device *xe)
+{
+	if (xe_pm_read_callback_task(xe) == current) {
+		/* The device is awake, grab the ref and move on */
+		pm_runtime_get_noresume(xe->drm.dev);
+		return true;
+	}
+
+	return pm_runtime_get_if_in_use(xe->drm.dev) > 0;
+}
+
 /**
  * xe_pm_assert_unbounded_bridge - Disable PM on unbounded pcie parent bridge
  * @xe: xe device instance
diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
index 6b9031f7af243..13eebd604dd96 100644
--- a/drivers/gpu/drm/xe/xe_pm.h
+++ b/drivers/gpu/drm/xe/xe_pm.h
@@ -27,6 +27,7 @@ int xe_pm_runtime_resume(struct xe_device *xe);
 int xe_pm_runtime_get(struct xe_device *xe);
 int xe_pm_runtime_put(struct xe_device *xe);
 int xe_pm_runtime_get_if_active(struct xe_device *xe);
+bool xe_pm_runtime_get_if_in_use(struct xe_device *xe);
 void xe_pm_assert_unbounded_bridge(struct xe_device *xe);
 int xe_pm_set_vram_threshold(struct xe_device *xe, u32 threshold);
 void xe_pm_d3cold_allowed_toggle(struct xe_device *xe);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 03/20] drm/xe: Create a xe_pm_runtime_resume_and_get variant for display
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 01/20] drm/xe: Document Xe PM component Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 02/20] drm/xe: Fix display runtime_pm handling Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 04/20] drm/xe: Convert xe_pm_runtime_{get, put} to void and protect from recursion Rodrigo Vivi
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

Introduce the resume_and_get variant to fulfill the display's need to
check whether the device was actually resumed (or is awake) and the
reference was taken.

Then we can convert the remaining cases to void functions and have
individual functions for individual cases.

Also, let's already start this new function protected against runtime
recursion, since runtime_pm will need to call into display functions
for a proper D3Cold flow.
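
A hypothetical caller-side sketch of the intended usage (not part of
this patch):

	if (!xe_pm_runtime_resume_and_get(xe))
		return;		/* could not resume: skip the HW access */
	/* ... access registers / memory ... */
	xe_pm_runtime_put(xe);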

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 .../gpu/drm/xe/compat-i915-headers/i915_drv.h   |  6 +-----
 drivers/gpu/drm/xe/xe_pm.c                      | 17 +++++++++++++++++
 drivers/gpu/drm/xe/xe_pm.h                      |  1 +
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h b/drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h
index f0464dfc45cf5..c0ec888b67271 100644
--- a/drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h
+++ b/drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h
@@ -166,11 +166,7 @@ static inline bool intel_runtime_pm_get(struct xe_runtime_pm *pm)
 {
 	struct xe_device *xe = container_of(pm, struct xe_device, runtime_pm);
 
-	if (xe_pm_runtime_get(xe) < 0) {
-		xe_pm_runtime_put(xe);
-		return false;
-	}
-	return true;
+	return xe_pm_runtime_resume_and_get(xe);
 }
 
 static inline bool intel_runtime_pm_get_if_in_use(struct xe_runtime_pm *pm)
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index 5a4d601ab0976..32db9068ac169 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -417,6 +417,23 @@ bool xe_pm_runtime_get_if_in_use(struct xe_device *xe)
         return pm_runtime_get_if_in_use(xe->drm.dev) >= 0;
 }
 
+/**
+ * xe_pm_runtime_resume_and_get - Resume, then get a runtime_pm ref if awake.
+ * @xe: xe device instance
+ *
+ * Returns: True if the device is awake and the reference was taken, false otherwise.
+ */
+bool xe_pm_runtime_resume_and_get(struct xe_device *xe)
+{
+	if (xe_pm_read_callback_task(xe) == current) {
+		/* The device is awake, grab the ref and move on */
+		pm_runtime_get_noresume(xe->drm.dev);
+		return true;
+	}
+
+	return pm_runtime_resume_and_get(xe->drm.dev) >= 0;
+}
+
 /**
  * xe_pm_assert_unbounded_bridge - Disable PM on unbounded pcie parent bridge
  * @xe: xe device instance
diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
index 13eebd604dd96..b812281635673 100644
--- a/drivers/gpu/drm/xe/xe_pm.h
+++ b/drivers/gpu/drm/xe/xe_pm.h
@@ -28,6 +28,7 @@ int xe_pm_runtime_get(struct xe_device *xe);
 int xe_pm_runtime_put(struct xe_device *xe);
 int xe_pm_runtime_get_if_active(struct xe_device *xe);
 bool xe_pm_runtime_get_if_in_use(struct xe_device *xe);
+bool xe_pm_runtime_resume_and_get(struct xe_device *xe);
 void xe_pm_assert_unbounded_bridge(struct xe_device *xe);
 int xe_pm_set_vram_threshold(struct xe_device *xe, u32 threshold);
 void xe_pm_d3cold_allowed_toggle(struct xe_device *xe);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 04/20] drm/xe: Convert xe_pm_runtime_{get, put} to void and protect from recursion
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (2 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 03/20] drm/xe: Create a xe_pm_runtime_resume_and_get variant for display Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 05/20] drm/xe: Prepare display for D3Cold Rodrigo Vivi
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

mem_access will go away and pm_runtime calls will be used instead.
So, we need to protect these against recursion.

For D3cold, the TTM migration helpers will trigger job execution.
Job execution will be protected by direct runtime_pm calls, but they
cannot be taken again if we are already inside a runtime suspend/resume
transition, when evicting/restoring memory for D3Cold. So, we check
xe_pm_read_callback_task.

The put is asynchronous, so there's no need to block it. However, for a
proper balance, we need to ensure that the references are taken and
released regardless of the flow. So, let's convert them all to void and
use the direct linux/pm_runtime functions.

Cases that need to check the references or the runtime status will
be handled separately from these main get and put functions.
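
The guard pattern, as a sketch (helper names from xe_pm.c):

	/* the runtime callbacks record their task: */
	xe_pm_write_callback_task(xe, current);
	/* ... suspend/resume work, which may re-enter the get/put paths ... */
	xe_pm_write_callback_task(xe, NULL);

	/* and xe_pm_runtime_get() detects re-entry from that same task: */
	if (xe_pm_read_callback_task(xe) == current)
		return;		/* inside our own callback: skip pm_runtime_resume() */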

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_pm.c | 19 +++++++++----------
 drivers/gpu/drm/xe/xe_pm.h |  4 ++--
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index 32db9068ac169..bd5ca41e19c5e 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -366,26 +366,25 @@ int xe_pm_runtime_resume(struct xe_device *xe)
 /**
  * xe_pm_runtime_get - Get a runtime_pm reference and resume synchronously
  * @xe: xe device instance
- *
- * Returns: Any number greater than or equal to 0 for success, negative error
- * code otherwise.
  */
-int xe_pm_runtime_get(struct xe_device *xe)
+void xe_pm_runtime_get(struct xe_device *xe)
 {
-	return pm_runtime_get_sync(xe->drm.dev);
+	pm_runtime_get_noresume(xe->drm.dev);
+
+	if (xe_pm_read_callback_task(xe) == current)
+		return;
+
+	pm_runtime_resume(xe->drm.dev);
 }
 
 /**
  * xe_pm_runtime_put - Put the runtime_pm reference back and mark as idle
  * @xe: xe device instance
- *
- * Returns: Any number greater than or equal to 0 for success, negative error
- * code otherwise.
  */
-int xe_pm_runtime_put(struct xe_device *xe)
+void xe_pm_runtime_put(struct xe_device *xe)
 {
 	pm_runtime_mark_last_busy(xe->drm.dev);
-	return pm_runtime_put(xe->drm.dev);
+	pm_runtime_put(xe->drm.dev);
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
index b812281635673..069f41c61505b 100644
--- a/drivers/gpu/drm/xe/xe_pm.h
+++ b/drivers/gpu/drm/xe/xe_pm.h
@@ -24,8 +24,8 @@ void xe_pm_init(struct xe_device *xe);
 void xe_pm_runtime_fini(struct xe_device *xe);
 int xe_pm_runtime_suspend(struct xe_device *xe);
 int xe_pm_runtime_resume(struct xe_device *xe);
-int xe_pm_runtime_get(struct xe_device *xe);
-int xe_pm_runtime_put(struct xe_device *xe);
+void xe_pm_runtime_get(struct xe_device *xe);
+void xe_pm_runtime_put(struct xe_device *xe);
 int xe_pm_runtime_get_if_active(struct xe_device *xe);
 bool xe_pm_runtime_get_if_in_use(struct xe_device *xe);
 bool xe_pm_runtime_resume_and_get(struct xe_device *xe);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 05/20] drm/xe: Prepare display for D3Cold
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (3 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 04/20] drm/xe: Convert xe_pm_runtime_{get, put} to void and protect from recursion Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 06/20] drm/xe: Convert mem_access assertion towards the runtime_pm state Rodrigo Vivi
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

A full display suspend and resume is needed when power is lost
during D3Cold, so that the proper flow for DC9 and DMC restoration
is in place.
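
In the patch below, xe_display_pm_suspend() runs right after the VRAM
eviction and xe_display_pm_suspend_late() after the IRQs are suspended;
on the way back, xe_display_pm_resume_early() runs before the pinned
memory restore and xe_display_pm_resume() before the user memory
restore on the D3Cold power-loss path.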

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_pm.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index bd5ca41e19c5e..cabed94a21873 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -291,6 +291,8 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
 		err = xe_bo_evict_all(xe);
 		if (err)
 			goto out;
+
+		xe_display_pm_suspend(xe);
 	}
 
 	for_each_gt(gt, xe, id) {
@@ -300,6 +302,9 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
 	}
 
 	xe_irq_suspend(xe);
+
+	if (xe->d3cold.allowed)
+		xe_display_pm_suspend_late(xe);
 out:
 	lock_map_release(&xe_device_mem_access_lockdep_map);
 	xe_pm_write_callback_task(xe, NULL);
@@ -338,6 +343,8 @@ int xe_pm_runtime_resume(struct xe_device *xe)
 				goto out;
 		}
 
+		xe_display_pm_resume_early(xe);
+
 		/*
 		 * This only restores pinned memory which is the memory
 		 * required for the GT(s) to resume.
@@ -353,6 +360,7 @@ int xe_pm_runtime_resume(struct xe_device *xe)
 		xe_gt_resume(gt);
 
 	if (xe->d3cold.allowed && xe->d3cold.power_lost) {
+		xe_display_pm_resume(xe);
 		err = xe_bo_restore_user(xe);
 		if (err)
 			goto out;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 06/20] drm/xe: Convert mem_access assertion towards the runtime_pm state
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (4 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 05/20] drm/xe: Prepare display for D3Cold Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2024-01-09 11:06   ` Matthew Auld
  2023-12-28  2:12 ` [RFC 07/20] drm/xe: Runtime PM wake on every IOCTL Rodrigo Vivi
                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

The mem_access helpers are going away, getting replaced by
direct calls to the xe_pm_runtime_{get,put} functions. However, an
assertion with a warning splat is desired when we hit the worst
case of a memory access with the device really in the 'suspended'
state.

Also, this needs to be the first step. Otherwise, the upcoming
conversion would be really noisy, with warn splats about missing
mem_access gets.
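
The end state, as an illustrative sketch:

	xe_pm_runtime_get(xe);			/* outer bound takes the rpm ref */
	/* ... */
	xe_device_assert_mem_access(xe);	/* inner code: WARNs if suspended */
	/* ... */
	xe_pm_runtime_put(xe);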

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c | 13 ++++++++++++-
 drivers/gpu/drm/xe/xe_pm.c     | 16 ++++++++++++++++
 drivers/gpu/drm/xe/xe_pm.h     |  1 +
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 86867d42d5329..dc3721bb37b1e 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -631,9 +631,20 @@ bool xe_device_mem_access_ongoing(struct xe_device *xe)
 	return atomic_read(&xe->mem_access.ref);
 }
 
+/**
+ * xe_device_assert_mem_access - Inspect the current runtime_pm state.
+ * @xe: xe device instance
+ *
+ * To be used before any kind of memory access. It will splat a debug warning
+ * if the device is currently sleeping. But it doesn't guarantee in any way
+ * that the device is going to remain awake. Xe PM runtime get and put
+ * functions might be added to the outer bound of the memory access, while
+ * this check is intended for inner usage to splat some warning if the worst
+ * case has just happened.
+ */
 void xe_device_assert_mem_access(struct xe_device *xe)
 {
-	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
+	XE_WARN_ON(xe_pm_runtime_suspended(xe));
 }
 
 bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe)
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index cabed94a21873..45114e4e76a5a 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -246,6 +246,22 @@ struct task_struct *xe_pm_read_callback_task(struct xe_device *xe)
 	return READ_ONCE(xe->pm_callback_task);
 }
 
+/**
+ * xe_pm_runtime_suspended - Inspect the current runtime_pm state.
+ * @xe: xe device instance
+ *
+ * This does not provide any guarantee that the device is going to remain
+ * suspended, as it might be racing with the runtime state transitions.
+ * It can be used only as a non-reliable assertion, to ensure that we are not in
+ * the sleep state while trying to access some memory for instance.
+ *
+ * Returns true if PCI device is suspended, false otherwise.
+ */
+bool xe_pm_runtime_suspended(struct xe_device *xe)
+{
+	return pm_runtime_suspended(xe->drm.dev);
+}
+
 /**
  * xe_pm_runtime_suspend - Prepare our device for D3hot/D3Cold
  * @xe: xe device instance
diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
index 069f41c61505b..67a9bf3dd379b 100644
--- a/drivers/gpu/drm/xe/xe_pm.h
+++ b/drivers/gpu/drm/xe/xe_pm.h
@@ -22,6 +22,7 @@ int xe_pm_resume(struct xe_device *xe);
 
 void xe_pm_init(struct xe_device *xe);
 void xe_pm_runtime_fini(struct xe_device *xe);
+bool xe_pm_runtime_suspended(struct xe_device *xe);
 int xe_pm_runtime_suspend(struct xe_device *xe);
 int xe_pm_runtime_resume(struct xe_device *xe);
 void xe_pm_runtime_get(struct xe_device *xe);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 07/20] drm/xe: Runtime PM wake on every IOCTL
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (5 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 06/20] drm/xe: Convert mem_access assertion towards the runtime_pm state Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2024-01-02 11:30   ` Gupta, Anshuman
  2023-12-28  2:12 ` [RFC 08/20] drm/xe: Runtime PM wake on every exec Rodrigo Vivi
                   ` (16 subsequent siblings)
  23 siblings, 1 reply; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

Let's ensure our PCI device is awake on every IOCTL entry.
Let's increase the runtime_pm protection and start moving
it to the outer bounds.
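
Worth noting: pm_runtime_get_sync() increments the usage counter even
when it fails, which is why the wrappers below call xe_pm_runtime_put()
unconditionally, keeping the reference balanced on the error path.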

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c | 32 ++++++++++++++++++++++++++++++--
 drivers/gpu/drm/xe/xe_pm.c     | 15 +++++++++++++++
 drivers/gpu/drm/xe/xe_pm.h     |  1 +
 3 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index dc3721bb37b1e..ee9b6612eec43 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -140,15 +140,43 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
 			  DRM_RENDER_ALLOW),
 };
 
+static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	struct drm_file *file_priv = file->private_data;
+	struct xe_device *xe = to_xe_device(file_priv->minor->dev);
+	long ret;
+
+	ret = xe_pm_runtime_get_sync(xe);
+	if (ret >= 0)
+		ret = drm_ioctl(file, cmd, arg);
+	xe_pm_runtime_put(xe);
+
+	return ret;
+}
+
+static long xe_drm_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	struct drm_file *file_priv = file->private_data;
+	struct xe_device *xe = to_xe_device(file_priv->minor->dev);
+	long ret;
+
+	ret = xe_pm_runtime_get_sync(xe);
+	if (ret >= 0)
+		ret = drm_compat_ioctl(file, cmd, arg);
+	xe_pm_runtime_put(xe);
+
+	return ret;
+}
+
 static const struct file_operations xe_driver_fops = {
 	.owner = THIS_MODULE,
 	.open = drm_open,
 	.release = drm_release_noglobal,
-	.unlocked_ioctl = drm_ioctl,
+	.unlocked_ioctl = xe_drm_ioctl,
 	.mmap = drm_gem_mmap,
 	.poll = drm_poll,
 	.read = drm_read,
-	.compat_ioctl = drm_compat_ioctl,
+	.compat_ioctl = xe_drm_compat_ioctl,
 	.llseek = noop_llseek,
 #ifdef CONFIG_PROC_FS
 	.show_fdinfo = drm_show_fdinfo,
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index 45114e4e76a5a..f599707413f18 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -411,6 +411,21 @@ void xe_pm_runtime_put(struct xe_device *xe)
 	pm_runtime_put(xe->drm.dev);
 }
 
+/**
+ * xe_pm_runtime_get_sync - Get a runtime_pm reference and resume synchronously
+ * @xe: xe device instance
+ *
+ * Returns: Any number greater than or equal to 0 for success, negative error
+ * code otherwise.
+ */
+int xe_pm_runtime_get_sync(struct xe_device *xe)
+{
+	if (WARN_ON(xe_pm_read_callback_task(xe) == current))
+		return -ELOOP;
+
+	return pm_runtime_get_sync(xe->drm.dev);
+}
+
 /**
  * xe_pm_runtime_get_if_active - Get a runtime_pm reference if device active
  * @xe: xe device instance
diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
index 67a9bf3dd379b..d0e6011a80688 100644
--- a/drivers/gpu/drm/xe/xe_pm.h
+++ b/drivers/gpu/drm/xe/xe_pm.h
@@ -26,6 +26,7 @@ bool xe_pm_runtime_suspended(struct xe_device *xe);
 int xe_pm_runtime_suspend(struct xe_device *xe);
 int xe_pm_runtime_resume(struct xe_device *xe);
 void xe_pm_runtime_get(struct xe_device *xe);
+int xe_pm_runtime_get_sync(struct xe_device *xe);
 void xe_pm_runtime_put(struct xe_device *xe);
 int xe_pm_runtime_get_if_active(struct xe_device *xe);
 bool xe_pm_runtime_get_if_in_use(struct xe_device *xe);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 08/20] drm/xe: Runtime PM wake on every exec
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (6 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 07/20] drm/xe: Runtime PM wake on every IOCTL Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2024-01-09 11:24   ` Matthew Auld
  2023-12-28  2:12 ` [RFC 09/20] drm/xe: Runtime PM wake on every sysfs call Rodrigo Vivi
                   ` (15 subsequent siblings)
  23 siblings, 1 reply; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

Let's ensure our PCI device stays awake from the beginning of every
GT execution to the end of it.
Let's increase the runtime_pm protection and start moving
it to the outer bounds.

Let's also remove the unnecessary mem_access get/put.
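
The reference lifetime, as a sketch (function names from the patch
below):

	job = xe_sched_job_create(q, batch_addr);	/* takes xe_pm_runtime_get() */
	/* ... submission and execution: the device is guaranteed awake ... */
	/* final ref drop -> job_free() -> xe_pm_runtime_put() */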

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_sched_job.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index 01106a1156ad8..0b30ec77fc5ad 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -15,6 +15,7 @@
 #include "xe_hw_fence.h"
 #include "xe_lrc.h"
 #include "xe_macros.h"
+#include "xe_pm.h"
 #include "xe_trace.h"
 #include "xe_vm.h"
 
@@ -67,6 +68,8 @@ static void job_free(struct xe_sched_job *job)
 	struct xe_exec_queue *q = job->q;
 	bool is_migration = xe_sched_job_is_migration(q);
 
+	xe_pm_runtime_put(gt_to_xe(q->gt));
+
 	kmem_cache_free(xe_exec_queue_is_parallel(job->q) || is_migration ?
 			xe_sched_job_parallel_slab : xe_sched_job_slab, job);
 }
@@ -86,6 +89,8 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
 	int i, j;
 	u32 width;
 
+	xe_pm_runtime_get(gt_to_xe(q->gt));
+
 	/* only a kernel context can submit a vm-less job */
 	XE_WARN_ON(!q->vm && !(q->flags & EXEC_QUEUE_FLAG_KERNEL));
 
@@ -155,9 +160,6 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
 	for (i = 0; i < width; ++i)
 		job->batch_addr[i] = batch_addr[i];
 
-	/* All other jobs require a VM to be open which has a ref */
-	if (unlikely(q->flags & EXEC_QUEUE_FLAG_KERNEL))
-		xe_device_mem_access_get(job_to_xe(job));
 	xe_device_assert_mem_access(job_to_xe(job));
 
 	trace_xe_sched_job_create(job);
@@ -189,8 +191,6 @@ void xe_sched_job_destroy(struct kref *ref)
 	struct xe_sched_job *job =
 		container_of(ref, struct xe_sched_job, refcount);
 
-	if (unlikely(job->q->flags & EXEC_QUEUE_FLAG_KERNEL))
-		xe_device_mem_access_put(job_to_xe(job));
 	xe_exec_queue_put(job->q);
 	dma_fence_put(job->fence);
 	drm_sched_job_cleanup(&job->drm);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 09/20] drm/xe: Runtime PM wake on every sysfs call
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (7 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 08/20] drm/xe: Runtime PM wake on every exec Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 10/20] drm/xe: Sort some xe_pm_runtime related functions Rodrigo Vivi
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

Let's ensure our PCI device is awake on every sysfs call.
Let's increase the runtime_pm protection and start moving
it to the outer bounds.

For now, for the files with a small number of attr functions,
let's just call the runtime pm functions directly.
For the hw_engines entries, with their many files, let's add
a sysfs_ops wrapper.
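
Hooking in at the sysfs_ops level means a single show/store wrapper
covers every attribute registered under that kobj_type, so the
hw_engines class, with its many files, doesn't need each attr function
touched individually.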

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_device_sysfs.c          |  4 ++
 drivers/gpu/drm/xe/xe_gt_freq.c               | 38 +++++++++++-
 drivers/gpu/drm/xe/xe_gt_idle.c               | 23 +++++++-
 drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c     |  3 +
 drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c | 46 ++++++++++++++-
 drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.h |  7 +++
 drivers/gpu/drm/xe/xe_tile_sysfs.c            |  1 +
 7 files changed, 117 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device_sysfs.c b/drivers/gpu/drm/xe/xe_device_sysfs.c
index 99113a5a2b849..e47c8ad1bb17a 100644
--- a/drivers/gpu/drm/xe/xe_device_sysfs.c
+++ b/drivers/gpu/drm/xe/xe_device_sysfs.c
@@ -35,7 +35,9 @@ vram_d3cold_threshold_show(struct device *dev,
 	if (!xe)
 		return -EINVAL;
 
+	xe_pm_runtime_get(xe);
 	ret = sysfs_emit(buf, "%d\n", xe->d3cold.vram_threshold);
+	xe_pm_runtime_put(xe);
 
 	return ret;
 }
@@ -58,7 +60,9 @@ vram_d3cold_threshold_store(struct device *dev, struct device_attribute *attr,
 
 	drm_dbg(&xe->drm, "vram_d3cold_threshold: %u\n", vram_d3cold_threshold);
 
+	xe_pm_runtime_get(xe);
 	ret = xe_pm_set_vram_threshold(xe, vram_d3cold_threshold);
+	xe_pm_runtime_put(xe);
 
 	return ret ?: count;
 }
diff --git a/drivers/gpu/drm/xe/xe_gt_freq.c b/drivers/gpu/drm/xe/xe_gt_freq.c
index 3adfa6686e7cf..2f1f7afb55345 100644
--- a/drivers/gpu/drm/xe/xe_gt_freq.c
+++ b/drivers/gpu/drm/xe/xe_gt_freq.c
@@ -15,6 +15,7 @@
 #include "xe_gt_sysfs.h"
 #include "xe_gt_throttle_sysfs.h"
 #include "xe_guc_pc.h"
+#include "xe_pm.h"
 
 /**
  * DOC: Xe GT Frequency Management
@@ -49,12 +50,23 @@ dev_to_pc(struct device *dev)
 	return &kobj_to_gt(dev->kobj.parent)->uc.guc.pc;
 }
 
+static struct xe_device *
+dev_to_xe(struct device *dev)
+{
+	return gt_to_xe(kobj_to_gt(dev->kobj.parent));
+}
+
 static ssize_t act_freq_show(struct device *dev,
 			     struct device_attribute *attr, char *buf)
 {
 	struct xe_guc_pc *pc = dev_to_pc(dev);
+	u32 freq;
+
+	xe_pm_runtime_get(dev_to_xe(dev));
+	freq = xe_guc_pc_get_act_freq(pc);
+	xe_pm_runtime_put(dev_to_xe(dev));
 
-	return sysfs_emit(buf, "%d\n", xe_guc_pc_get_act_freq(pc));
+	return sysfs_emit(buf, "%d\n", freq);
 }
 static DEVICE_ATTR_RO(act_freq);
 
@@ -65,7 +77,9 @@ static ssize_t cur_freq_show(struct device *dev,
 	u32 freq;
 	ssize_t ret;
 
+	xe_pm_runtime_get(dev_to_xe(dev));
 	ret = xe_guc_pc_get_cur_freq(pc, &freq);
+	xe_pm_runtime_put(dev_to_xe(dev));
 	if (ret)
 		return ret;
 
@@ -77,8 +91,13 @@ static ssize_t rp0_freq_show(struct device *dev,
 			     struct device_attribute *attr, char *buf)
 {
 	struct xe_guc_pc *pc = dev_to_pc(dev);
+	u32 freq;
+
+	xe_pm_runtime_get(dev_to_xe(dev));
+	freq = xe_guc_pc_get_rp0_freq(pc);
+	xe_pm_runtime_put(dev_to_xe(dev));
 
-	return sysfs_emit(buf, "%d\n", xe_guc_pc_get_rp0_freq(pc));
+	return sysfs_emit(buf, "%d\n", freq);
 }
 static DEVICE_ATTR_RO(rp0_freq);
 
@@ -86,8 +105,13 @@ static ssize_t rpe_freq_show(struct device *dev,
 			     struct device_attribute *attr, char *buf)
 {
 	struct xe_guc_pc *pc = dev_to_pc(dev);
+	u32 freq;
+
+	xe_pm_runtime_get(dev_to_xe(dev));
+	freq = xe_guc_pc_get_rpe_freq(pc);
+	xe_pm_runtime_put(dev_to_xe(dev));
 
-	return sysfs_emit(buf, "%d\n", xe_guc_pc_get_rpe_freq(pc));
+	return sysfs_emit(buf, "%d\n", freq);
 }
 static DEVICE_ATTR_RO(rpe_freq);
 
@@ -107,7 +131,9 @@ static ssize_t min_freq_show(struct device *dev,
 	u32 freq;
 	ssize_t ret;
 
+	xe_pm_runtime_get(dev_to_xe(dev));
 	ret = xe_guc_pc_get_min_freq(pc, &freq);
+	xe_pm_runtime_put(dev_to_xe(dev));
 	if (ret)
 		return ret;
 
@@ -125,7 +151,9 @@ static ssize_t min_freq_store(struct device *dev, struct device_attribute *attr,
 	if (ret)
 		return ret;
 
+	xe_pm_runtime_get(dev_to_xe(dev));
 	ret = xe_guc_pc_set_min_freq(pc, freq);
+	xe_pm_runtime_put(dev_to_xe(dev));
 	if (ret)
 		return ret;
 
@@ -140,7 +168,9 @@ static ssize_t max_freq_show(struct device *dev,
 	u32 freq;
 	ssize_t ret;
 
+	xe_pm_runtime_get(dev_to_xe(dev));
 	ret = xe_guc_pc_get_max_freq(pc, &freq);
+	xe_pm_runtime_put(dev_to_xe(dev));
 	if (ret)
 		return ret;
 
@@ -158,7 +188,9 @@ static ssize_t max_freq_store(struct device *dev, struct device_attribute *attr,
 	if (ret)
 		return ret;
 
+	xe_pm_runtime_get(dev_to_xe(dev));
 	ret = xe_guc_pc_set_max_freq(pc, freq);
+	xe_pm_runtime_put(dev_to_xe(dev));
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/xe/xe_gt_idle.c b/drivers/gpu/drm/xe/xe_gt_idle.c
index 9358f73368896..824ff458011fd 100644
--- a/drivers/gpu/drm/xe/xe_gt_idle.c
+++ b/drivers/gpu/drm/xe/xe_gt_idle.c
@@ -12,6 +12,7 @@
 #include "xe_guc_pc.h"
 #include "regs/xe_gt_regs.h"
 #include "xe_mmio.h"
+#include "xe_pm.h"
 
 /**
  * DOC: Xe GT Idle
@@ -40,6 +41,15 @@ static struct xe_guc_pc *gtidle_to_pc(struct xe_gt_idle *gtidle)
 	return &gtidle_to_gt(gtidle)->uc.guc.pc;
 }
 
+static struct xe_device *
+pc_to_xe(struct xe_guc_pc *pc)
+{
+	struct xe_guc *guc = container_of(pc, struct xe_guc, pc);
+	struct xe_gt *gt = container_of(guc, struct xe_gt, uc.guc);
+
+	return gt_to_xe(gt);
+}
+
 static const char *gt_idle_state_to_string(enum xe_gt_idle_state state)
 {
 	switch (state) {
@@ -86,8 +96,14 @@ static ssize_t name_show(struct device *dev,
 			 struct device_attribute *attr, char *buff)
 {
 	struct xe_gt_idle *gtidle = dev_to_gtidle(dev);
+	struct xe_guc_pc *pc = gtidle_to_pc(gtidle);
+	ssize_t ret;
+
+	xe_pm_runtime_get(pc_to_xe(pc));
+	ret = sysfs_emit(buff, "%s\n", gtidle->name);
+	xe_pm_runtime_put(pc_to_xe(pc));
 
-	return sysfs_emit(buff, "%s\n", gtidle->name);
+	return ret;
 }
 static DEVICE_ATTR_RO(name);
 
@@ -98,7 +114,9 @@ static ssize_t idle_status_show(struct device *dev,
 	struct xe_guc_pc *pc = gtidle_to_pc(gtidle);
 	enum xe_gt_idle_state state;
 
+	xe_pm_runtime_get(pc_to_xe(pc));
 	state = gtidle->idle_status(pc);
+	xe_pm_runtime_put(pc_to_xe(pc));
 
 	return sysfs_emit(buff, "%s\n", gt_idle_state_to_string(state));
 }
@@ -111,7 +129,10 @@ static ssize_t idle_residency_ms_show(struct device *dev,
 	struct xe_guc_pc *pc = gtidle_to_pc(gtidle);
 	u64 residency;
 
+	xe_pm_runtime_get(pc_to_xe(pc));
 	residency = gtidle->idle_residency(pc);
+	xe_pm_runtime_put(pc_to_xe(pc));
+
 	return sysfs_emit(buff, "%llu\n", get_residency_ms(gtidle, residency));
 }
 static DEVICE_ATTR_RO(idle_residency_ms);
diff --git a/drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c b/drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c
index 63d640591a527..9c33045ff1ef4 100644
--- a/drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c
+++ b/drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c
@@ -11,6 +11,7 @@
 #include "xe_gt_sysfs.h"
 #include "xe_gt_throttle_sysfs.h"
 #include "xe_mmio.h"
+#include "xe_pm.h"
 
 /**
  * DOC: Xe GT Throttle
@@ -38,10 +39,12 @@ static u32 read_perf_limit_reasons(struct xe_gt *gt)
 {
 	u32 reg;
 
+	xe_pm_runtime_get(gt_to_xe(gt));
 	if (xe_gt_is_media_type(gt))
 		reg = xe_mmio_read32(gt, MTL_MEDIA_PERF_LIMIT_REASONS);
 	else
 		reg = xe_mmio_read32(gt, GT0_PERF_LIMIT_REASONS);
+	xe_pm_runtime_put(gt_to_xe(gt));
 
 	return reg;
 }
diff --git a/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c b/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c
index e49bc14f0ecf0..756ec3320b697 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c
+++ b/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c
@@ -9,6 +9,7 @@
 
 #include "xe_gt.h"
 #include "xe_hw_engine_class_sysfs.h"
+#include "xe_pm.h"
 
 #define MAX_ENGINE_CLASS_NAME_LEN    16
 static int xe_add_hw_engine_class_defaults(struct xe_device *xe,
@@ -513,6 +514,7 @@ kobj_xe_hw_engine_class(struct xe_device *xe, struct kobject *parent, char *name
 		kobject_put(&keclass->base);
 		return NULL;
 	}
+	keclass->xe = xe;
 
 	err = drmm_add_action_or_reset(&xe->drm, kobj_xe_hw_engine_class_fini,
 				       &keclass->base);
@@ -567,9 +569,51 @@ static void xe_hw_engine_sysfs_kobj_release(struct kobject *kobj)
 	kfree(kobj);
 }
 
+static ssize_t xe_hw_engine_class_sysfs_attr_show(struct kobject *kobj,
+						  struct attribute *attr,
+						  char *buf)
+{
+	struct xe_device *xe = kobj_to_xe(kobj);
+	struct kobj_attribute *kattr;
+	ssize_t ret = -EIO;
+
+	kattr = container_of(attr, struct kobj_attribute, attr);
+	if (kattr->show) {
+		xe_pm_runtime_get(xe);
+		ret = kattr->show(kobj, kattr, buf);
+		xe_pm_runtime_put(xe);
+	}
+
+	return ret;
+}
+
+static ssize_t xe_hw_engine_class_sysfs_attr_store(struct kobject *kobj,
+						   struct attribute *attr,
+						   const char *buf,
+						   size_t count)
+{
+	struct xe_device *xe = kobj_to_xe(kobj);
+	struct kobj_attribute *kattr;
+	ssize_t ret = -EIO;
+
+	kattr = container_of(attr, struct kobj_attribute, attr);
+	if (kattr->store) {
+		xe_pm_runtime_get(xe);
+		ret = kattr->store(kobj, kattr, buf, count);
+		xe_pm_runtime_put(xe);
+	}
+
+	return ret;
+}
+
+static const struct sysfs_ops xe_hw_engine_class_sysfs_ops = {
+	.show = xe_hw_engine_class_sysfs_attr_show,
+	.store = xe_hw_engine_class_sysfs_attr_store,
+};
+
 static const struct kobj_type xe_hw_engine_sysfs_kobj_type = {
 	.release = xe_hw_engine_sysfs_kobj_release,
-	.sysfs_ops = &kobj_sysfs_ops,
+	.sysfs_ops = &xe_hw_engine_class_sysfs_ops,
 };
 
 static void hw_engine_class_sysfs_fini(struct drm_device *drm, void *arg)
diff --git a/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.h b/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.h
index ec5ba673b314b..28a0d7c909c01 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.h
+++ b/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.h
@@ -26,6 +26,8 @@ struct kobj_eclass {
 	struct kobject base;
 	/** @eclass: A pointer to the hw engine class interface */
 	struct xe_hw_engine_class_intf *eclass;
+	/** @xe: A pointer to the xe device */
+	struct xe_device *xe;
 };
 
 static inline struct xe_hw_engine_class_intf *kobj_to_eclass(struct kobject *kobj)
@@ -33,4 +35,9 @@ static inline struct xe_hw_engine_class_intf *kobj_to_eclass(struct kobject *kob
 	return container_of(kobj, struct kobj_eclass, base)->eclass;
 }
 
+static inline struct xe_device *kobj_to_xe(struct kobject *kobj)
+{
+	return container_of(kobj, struct kobj_eclass, base)->xe;
+}
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_tile_sysfs.c b/drivers/gpu/drm/xe/xe_tile_sysfs.c
index 0f8d3e7fce46a..5b55d1801c1e0 100644
--- a/drivers/gpu/drm/xe/xe_tile_sysfs.c
+++ b/drivers/gpu/drm/xe/xe_tile_sysfs.c
@@ -9,6 +9,7 @@
 
+#include "xe_pm.h"
 #include "xe_tile.h"
 #include "xe_tile_sysfs.h"
 
 static void xe_tile_sysfs_kobj_release(struct kobject *kobj)
 {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 10/20] drm/xe: Sort some xe_pm_runtime related functions
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (8 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 09/20] drm/xe: Runtime PM wake on every sysfs call Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2024-01-09 11:26   ` Matthew Auld
  2023-12-28  2:12 ` [RFC 11/20] drm/xe: Ensure device is awake before removing it Rodrigo Vivi
                   ` (13 subsequent siblings)
  23 siblings, 1 reply; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

No functional change. Just organizing the file a bit better.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_pm.c | 42 +++++++++++++++++++-------------------
 drivers/gpu/drm/xe/xe_pm.h |  4 ++--
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index f599707413f18..3594e707606ce 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -387,6 +387,23 @@ int xe_pm_runtime_resume(struct xe_device *xe)
 	return err;
 }
 
+/**
+ * xe_pm_runtime_resume_and_get - Resume, then get a runtime_pm ref if awake.
+ * @xe: xe device instance
+ *
+ * Returns: True if the device is awake and the reference was taken, false otherwise.
+ */
+bool xe_pm_runtime_resume_and_get(struct xe_device *xe)
+{
+	if (xe_pm_read_callback_task(xe) == current) {
+		/* The device is awake, grab the ref and move on */
+		pm_runtime_get_noresume(xe->drm.dev);
+		return true;
+	}
+
+	return pm_runtime_resume_and_get(xe->drm.dev) >= 0;
+}
+
 /**
  * xe_pm_runtime_get - Get a runtime_pm reference and resume synchronously
  * @xe: xe device instance
@@ -401,16 +418,6 @@ void xe_pm_runtime_get(struct xe_device *xe)
 	pm_runtime_resume(xe->drm.dev);
 }
 
-/**
- * xe_pm_runtime_put - Put the runtime_pm reference back and mark as idle
- * @xe: xe device instance
- */
-void xe_pm_runtime_put(struct xe_device *xe)
-{
-	pm_runtime_mark_last_busy(xe->drm.dev);
-	pm_runtime_put(xe->drm.dev);
-}
-
 /**
  * xe_pm_runtime_get_sync - Get a runtime_pm reference and resume synchronously
  * @xe: xe device instance
@@ -456,20 +463,13 @@ bool xe_pm_runtime_get_if_in_use(struct xe_device *xe)
 }
 
 /**
- * xe_pm_runtime_resume_and_get - Resume, then get a runtime_pm ref if awake.
+ * xe_pm_runtime_put - Put the runtime_pm reference back and mark as idle
  * @xe: xe device instance
- *
- * Returns: True if the device is awake and the reference was taken, false otherwise.
  */
-bool xe_pm_runtime_resume_and_get(struct xe_device *xe)
+void xe_pm_runtime_put(struct xe_device *xe)
 {
-	if (xe_pm_read_callback_task(xe) == current) {
-		/* The device is awake, grab the ref and move on */
-		pm_runtime_get_noresume(xe->drm.dev);
-		return true;
-	}
-
-	return pm_runtime_resume_and_get(xe->drm.dev) >= 0;
+	pm_runtime_mark_last_busy(xe->drm.dev);
+	pm_runtime_put(xe->drm.dev);
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
index d0e6011a80688..fc82a1466453b 100644
--- a/drivers/gpu/drm/xe/xe_pm.h
+++ b/drivers/gpu/drm/xe/xe_pm.h
@@ -25,12 +25,12 @@ void xe_pm_runtime_fini(struct xe_device *xe);
 bool xe_pm_runtime_suspended(struct xe_device *xe);
 int xe_pm_runtime_suspend(struct xe_device *xe);
 int xe_pm_runtime_resume(struct xe_device *xe);
+bool xe_pm_runtime_resume_and_get(struct xe_device *xe);
 void xe_pm_runtime_get(struct xe_device *xe);
 int xe_pm_runtime_get_sync(struct xe_device *xe);
-void xe_pm_runtime_put(struct xe_device *xe);
 int xe_pm_runtime_get_if_active(struct xe_device *xe);
 bool xe_pm_runtime_get_if_in_use(struct xe_device *xe);
-bool xe_pm_runtime_resume_and_get(struct xe_device *xe);
+void xe_pm_runtime_put(struct xe_device *xe);
 void xe_pm_assert_unbounded_bridge(struct xe_device *xe);
 int xe_pm_set_vram_threshold(struct xe_device *xe, u32 threshold);
 void xe_pm_d3cold_allowed_toggle(struct xe_device *xe);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 11/20] drm/xe: Ensure device is awake before removing it
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (9 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 10/20] drm/xe: Sort some xe_pm_runtime related functions Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 12/20] drm/xe: Remove mem_access from guc_pc calls Rodrigo Vivi
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

If the device is suspended, module unload might face challenges.
So, let's ensure that the very first thing the "unprobe" (remove)
path does is to wake the device.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 1f997353a78f1..d4f36fa6572a8 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -686,8 +686,8 @@ static void xe_pci_remove(struct pci_dev *pdev)
 	if (!xe) /* driver load aborted, nothing to cleanup */
 		return;
 
-	xe_device_remove(xe);
 	xe_pm_runtime_fini(xe);
+	xe_device_remove(xe);
 	pci_set_drvdata(pdev, NULL);
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 12/20] drm/xe: Remove mem_access from guc_pc calls
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (10 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 11/20] drm/xe: Ensure device is awake before removing it Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 13/20] drm/xe: Runtime PM wake on every debugfs call Rodrigo Vivi
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

We are now protected at init, sysfs, and removal time, and we don't
need these mem_access protections around the GuC PC calls anymore.
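
In other words, every remaining path into these GuC PC helpers now runs
with an outer runtime PM reference already held (the sysfs wrappers,
driver init, or the PM callbacks themselves), so the inner get/put
pairs became redundant.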

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_pc.c | 62 ++++++----------------------------
 1 file changed, 10 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
index f71085228cb33..ce39eac7c8f54 100644
--- a/drivers/gpu/drm/xe/xe_guc_pc.c
+++ b/drivers/gpu/drm/xe/xe_guc_pc.c
@@ -381,8 +381,6 @@ u32 xe_guc_pc_get_act_freq(struct xe_guc_pc *pc)
 	struct xe_device *xe = gt_to_xe(gt);
 	u32 freq;
 
-	xe_device_mem_access_get(gt_to_xe(gt));
-
 	/* When in RC6, actual frequency reported will be 0. */
 	if (GRAPHICS_VERx100(xe) >= 1270) {
 		freq = xe_mmio_read32(gt, MTL_MIRROR_TARGET_WP1);
@@ -394,8 +392,6 @@ u32 xe_guc_pc_get_act_freq(struct xe_guc_pc *pc)
 
 	freq = decode_freq(freq);
 
-	xe_device_mem_access_put(gt_to_xe(gt));
-
 	return freq;
 }
 
@@ -412,14 +408,13 @@ int xe_guc_pc_get_cur_freq(struct xe_guc_pc *pc, u32 *freq)
 	struct xe_gt *gt = pc_to_gt(pc);
 	int ret;
 
-	xe_device_mem_access_get(gt_to_xe(gt));
 	/*
 	 * GuC SLPC plays with cur freq request when GuCRC is enabled
 	 * Block RC6 for a more reliable read.
 	 */
 	ret = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
 	if (ret)
-		goto out;
+		return ret;
 
 	*freq = xe_mmio_read32(gt, RPNSWREQ);
 
@@ -427,9 +422,7 @@ int xe_guc_pc_get_cur_freq(struct xe_guc_pc *pc, u32 *freq)
 	*freq = decode_freq(*freq);
 
 	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
-out:
-	xe_device_mem_access_put(gt_to_xe(gt));
-	return ret;
+	return 0;
 }
 
 /**
@@ -451,12 +444,7 @@ u32 xe_guc_pc_get_rp0_freq(struct xe_guc_pc *pc)
  */
 u32 xe_guc_pc_get_rpe_freq(struct xe_guc_pc *pc)
 {
-	struct xe_gt *gt = pc_to_gt(pc);
-	struct xe_device *xe = gt_to_xe(gt);
-
-	xe_device_mem_access_get(xe);
 	pc_update_rp_values(pc);
-	xe_device_mem_access_put(xe);
 
 	return pc->rpe_freq;
 }
@@ -485,7 +473,6 @@ int xe_guc_pc_get_min_freq(struct xe_guc_pc *pc, u32 *freq)
 	struct xe_gt *gt = pc_to_gt(pc);
 	int ret;
 
-	xe_device_mem_access_get(pc_to_xe(pc));
 	mutex_lock(&pc->freq_lock);
 	if (!pc->freq_ready) {
 		/* Might be in the middle of a gt reset */
@@ -511,7 +498,6 @@ int xe_guc_pc_get_min_freq(struct xe_guc_pc *pc, u32 *freq)
 	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
 out:
 	mutex_unlock(&pc->freq_lock);
-	xe_device_mem_access_put(pc_to_xe(pc));
 	return ret;
 }
 
@@ -528,7 +514,6 @@ int xe_guc_pc_set_min_freq(struct xe_guc_pc *pc, u32 freq)
 {
 	int ret;
 
-	xe_device_mem_access_get(pc_to_xe(pc));
 	mutex_lock(&pc->freq_lock);
 	if (!pc->freq_ready) {
 		/* Might be in the middle of a gt reset */
@@ -544,8 +529,6 @@ int xe_guc_pc_set_min_freq(struct xe_guc_pc *pc, u32 freq)
 
 out:
 	mutex_unlock(&pc->freq_lock);
-	xe_device_mem_access_put(pc_to_xe(pc));
-
 	return ret;
 }
 
@@ -561,7 +544,6 @@ int xe_guc_pc_get_max_freq(struct xe_guc_pc *pc, u32 *freq)
 {
 	int ret;
 
-	xe_device_mem_access_get(pc_to_xe(pc));
 	mutex_lock(&pc->freq_lock);
 	if (!pc->freq_ready) {
 		/* Might be in the middle of a gt reset */
@@ -577,7 +559,6 @@ int xe_guc_pc_get_max_freq(struct xe_guc_pc *pc, u32 *freq)
 
 out:
 	mutex_unlock(&pc->freq_lock);
-	xe_device_mem_access_put(pc_to_xe(pc));
 	return ret;
 }
 
@@ -594,7 +575,6 @@ int xe_guc_pc_set_max_freq(struct xe_guc_pc *pc, u32 freq)
 {
 	int ret;
 
-	xe_device_mem_access_get(pc_to_xe(pc));
 	mutex_lock(&pc->freq_lock);
 	if (!pc->freq_ready) {
 		/* Might be in the middle of a gt reset */
@@ -610,7 +590,6 @@ int xe_guc_pc_set_max_freq(struct xe_guc_pc *pc, u32 freq)
 
 out:
 	mutex_unlock(&pc->freq_lock);
-	xe_device_mem_access_put(pc_to_xe(pc));
 	return ret;
 }
 
@@ -623,8 +602,6 @@ enum xe_gt_idle_state xe_guc_pc_c_status(struct xe_guc_pc *pc)
 	struct xe_gt *gt = pc_to_gt(pc);
 	u32 reg, gt_c_state;
 
-	xe_device_mem_access_get(gt_to_xe(gt));
-
 	if (GRAPHICS_VERx100(gt_to_xe(gt)) >= 1270) {
 		reg = xe_mmio_read32(gt, MTL_MIRROR_TARGET_WP1);
 		gt_c_state = REG_FIELD_GET(MTL_CC_MASK, reg);
@@ -633,8 +610,6 @@ enum xe_gt_idle_state xe_guc_pc_c_status(struct xe_guc_pc *pc)
 		gt_c_state = REG_FIELD_GET(RCN_MASK, reg);
 	}
 
-	xe_device_mem_access_put(gt_to_xe(gt));
-
 	switch (gt_c_state) {
 	case GT_C6:
 		return GT_IDLE_C6;
@@ -654,9 +629,7 @@ u64 xe_guc_pc_rc6_residency(struct xe_guc_pc *pc)
 	struct xe_gt *gt = pc_to_gt(pc);
 	u32 reg;
 
-	xe_device_mem_access_get(gt_to_xe(gt));
 	reg = xe_mmio_read32(gt, GT_GFX_RC6);
-	xe_device_mem_access_put(gt_to_xe(gt));
 
 	return reg;
 }
@@ -670,9 +643,7 @@ u64 xe_guc_pc_mc6_residency(struct xe_guc_pc *pc)
 	struct xe_gt *gt = pc_to_gt(pc);
 	u64 reg;
 
-	xe_device_mem_access_get(gt_to_xe(gt));
 	reg = xe_mmio_read32(gt, MTL_MEDIA_MC6);
-	xe_device_mem_access_put(gt_to_xe(gt));
 
 	return reg;
 }
@@ -801,23 +772,19 @@ int xe_guc_pc_gucrc_disable(struct xe_guc_pc *pc)
 	if (xe->info.skip_guc_pc)
 		return 0;
 
-	xe_device_mem_access_get(pc_to_xe(pc));
-
 	ret = pc_action_setup_gucrc(pc, XE_GUCRC_HOST_CONTROL);
 	if (ret)
-		goto out;
+		return ret;
 
 	ret = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
 	if (ret)
-		goto out;
+		return ret;
 
 	xe_gt_idle_disable_c6(gt);
 
 	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
 
-out:
-	xe_device_mem_access_put(pc_to_xe(pc));
-	return ret;
+	return 0;
 }
 
 static void pc_init_pcode_freq(struct xe_guc_pc *pc)
@@ -870,11 +837,9 @@ int xe_guc_pc_start(struct xe_guc_pc *pc)
 
 	xe_gt_assert(gt, xe_device_uc_enabled(xe));
 
-	xe_device_mem_access_get(pc_to_xe(pc));
-
 	ret = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
 	if (ret)
-		goto out_fail_force_wake;
+		return ret;
 
 	if (xe->info.skip_guc_pc) {
 		if (xe->info.platform != XE_PVC)
@@ -914,8 +879,6 @@ int xe_guc_pc_start(struct xe_guc_pc *pc)
 
 out:
 	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
-out_fail_force_wake:
-	xe_device_mem_access_put(pc_to_xe(pc));
 	return ret;
 }
 
@@ -928,12 +891,9 @@ int xe_guc_pc_stop(struct xe_guc_pc *pc)
 	struct xe_device *xe = pc_to_xe(pc);
 	int ret;
 
-	xe_device_mem_access_get(pc_to_xe(pc));
-
 	if (xe->info.skip_guc_pc) {
 		xe_gt_idle_disable_c6(pc_to_gt(pc));
-		ret = 0;
-		goto out;
+		return 0;
 	}
 
 	mutex_lock(&pc->freq_lock);
@@ -942,16 +902,14 @@ int xe_guc_pc_stop(struct xe_guc_pc *pc)
 
 	ret = pc_action_shutdown(pc);
 	if (ret)
-		goto out;
+		return ret;
 
 	if (wait_for_pc_state(pc, SLPC_GLOBAL_STATE_NOT_RUNNING)) {
 		drm_err(&pc_to_xe(pc)->drm, "GuC PC Shutdown failed\n");
-		ret = -EIO;
+		return -EIO;
 	}
 
-out:
-	xe_device_mem_access_put(pc_to_xe(pc));
-	return ret;
+	return 0;
 }
 
 /**
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 13/20] drm/xe: Runtime PM wake on every debugfs call
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (11 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 12/20] drm/xe: Remove mem_access from guc_pc calls Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 14/20] drm/xe: Replace dma_buf mem_access per direct xe_pm_runtime calls Rodrigo Vivi
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

Let's ensure our PCI device is awake on every debugfs call.
Let's increase the runtime_pm protection and start moving
it to the outer bounds.

Also remove the mem_access get_put helpers, now that they are not
needed anymore.
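
The conversion pattern for a typical seq_file show callback is simple;
roughly (a sketch only, mirroring the hunks below; node_to_xe() is an
assumed helper, not the driver's real accessor):

static int some_info_sketch(struct seq_file *m, void *data)
{
        struct xe_device *xe = node_to_xe(m->private);  /* assumed helper */
        struct drm_printer p = drm_seq_file_printer(m);

        xe_pm_runtime_get(xe);          /* wake the device for the dump */
        drm_printf(&p, "device is awake; dump state here\n");
        xe_pm_runtime_put(xe);

        return 0;
}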

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_debugfs.c     | 10 +++---
 drivers/gpu/drm/xe/xe_gt_debugfs.c  | 53 ++++++++++++++++++++++++++---
 drivers/gpu/drm/xe/xe_guc_debugfs.c |  9 ++---
 drivers/gpu/drm/xe/xe_huc_debugfs.c |  3 ++
 drivers/gpu/drm/xe/xe_ttm_sys_mgr.c |  5 ++-
 5 files changed, 66 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
index c56fd7d59f057..1ead9cc9300a6 100644
--- a/drivers/gpu/drm/xe/xe_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_debugfs.c
@@ -12,6 +12,7 @@
 #include "xe_bo.h"
 #include "xe_device.h"
 #include "xe_gt_debugfs.h"
+#include "xe_pm.h"
 #include "xe_step.h"
 
 #ifdef CONFIG_DRM_XE_DEBUG
@@ -37,6 +38,8 @@ static int info(struct seq_file *m, void *data)
 	struct xe_gt *gt;
 	u8 id;
 
+	xe_pm_runtime_get(xe);
+
 	drm_printf(&p, "graphics_verx100 %d\n", xe->info.graphics_verx100);
 	drm_printf(&p, "media_verx100 %d\n", xe->info.media_verx100);
 	drm_printf(&p, "stepping G:%s M:%s D:%s B:%s\n",
@@ -62,6 +65,7 @@ static int info(struct seq_file *m, void *data)
 			   gt->info.engine_mask);
 	}
 
+	xe_pm_runtime_put(xe);
 	return 0;
 }
 
@@ -75,8 +79,7 @@ static int forcewake_open(struct inode *inode, struct file *file)
 	struct xe_gt *gt;
 	u8 id;
 
-	xe_device_mem_access_get(xe);
-
+	xe_pm_runtime_get(xe);
 	for_each_gt(gt, xe, id)
 		XE_WARN_ON(xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL));
 
@@ -91,8 +94,7 @@ static int forcewake_release(struct inode *inode, struct file *file)
 
 	for_each_gt(gt, xe, id)
 		XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
-
-	xe_device_mem_access_put(xe);
+	xe_pm_runtime_put(xe);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/xe/xe_gt_debugfs.c b/drivers/gpu/drm/xe/xe_gt_debugfs.c
index c4b67cf09f8f2..6b4dc29277277 100644
--- a/drivers/gpu/drm/xe/xe_gt_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_gt_debugfs.c
@@ -18,6 +18,7 @@
 #include "xe_lrc.h"
 #include "xe_macros.h"
 #include "xe_pat.h"
+#include "xe_pm.h"
 #include "xe_reg_sr.h"
 #include "xe_reg_whitelist.h"
 #include "xe_uc_debugfs.h"
@@ -37,10 +38,10 @@ static int hw_engines(struct seq_file *m, void *data)
 	enum xe_hw_engine_id id;
 	int err;
 
-	xe_device_mem_access_get(xe);
+	xe_pm_runtime_get(xe);
 	err = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
 	if (err) {
-		xe_device_mem_access_put(xe);
+		xe_pm_runtime_put(xe);
 		return err;
 	}
 
@@ -48,7 +49,7 @@ static int hw_engines(struct seq_file *m, void *data)
 		xe_hw_engine_print(hwe, &p);
 
 	err = xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL);
-	xe_device_mem_access_put(xe);
+	xe_pm_runtime_put(xe);
 	if (err)
 		return err;
 
@@ -59,18 +60,23 @@ static int force_reset(struct seq_file *m, void *data)
 {
 	struct xe_gt *gt = node_to_gt(m->private);
 
+	xe_pm_runtime_get(gt_to_xe(gt));
 	xe_gt_reset_async(gt);
+	xe_pm_runtime_put(gt_to_xe(gt));
 
 	return 0;
 }
 
 static int sa_info(struct seq_file *m, void *data)
 {
-	struct xe_tile *tile = gt_to_tile(node_to_gt(m->private));
+	struct xe_gt *gt = node_to_gt(m->private);
+	struct xe_tile *tile = gt_to_tile(gt);
 	struct drm_printer p = drm_seq_file_printer(m);
 
+	xe_pm_runtime_get(gt_to_xe(gt));
 	drm_suballoc_dump_debug_info(&tile->mem.kernel_bb_pool->base, &p,
 				     tile->mem.kernel_bb_pool->gpu_addr);
+	xe_pm_runtime_put(gt_to_xe(gt));
 
 	return 0;
 }
@@ -80,7 +86,9 @@ static int topology(struct seq_file *m, void *data)
 	struct xe_gt *gt = node_to_gt(m->private);
 	struct drm_printer p = drm_seq_file_printer(m);
 
+	xe_pm_runtime_get(gt_to_xe(gt));
 	xe_gt_topology_dump(gt, &p);
+	xe_pm_runtime_put(gt_to_xe(gt));
 
 	return 0;
 }
@@ -90,7 +98,9 @@ static int steering(struct seq_file *m, void *data)
 	struct xe_gt *gt = node_to_gt(m->private);
 	struct drm_printer p = drm_seq_file_printer(m);
 
+	xe_pm_runtime_get(gt_to_xe(gt));
 	xe_gt_mcr_steering_dump(gt, &p);
+	xe_pm_runtime_put(gt_to_xe(gt));
 
 	return 0;
 }
@@ -99,8 +109,13 @@ static int ggtt(struct seq_file *m, void *data)
 {
 	struct xe_gt *gt = node_to_gt(m->private);
 	struct drm_printer p = drm_seq_file_printer(m);
+	int ret;
+
+	xe_pm_runtime_get(gt_to_xe(gt));
+	ret = xe_ggtt_dump(gt_to_tile(gt)->mem.ggtt, &p);
+	xe_pm_runtime_put(gt_to_xe(gt));
 
-	return xe_ggtt_dump(gt_to_tile(gt)->mem.ggtt, &p);
+	return ret;
 }
 
 static int register_save_restore(struct seq_file *m, void *data)
@@ -110,6 +125,8 @@ static int register_save_restore(struct seq_file *m, void *data)
 	struct xe_hw_engine *hwe;
 	enum xe_hw_engine_id id;
 
+	xe_pm_runtime_get(gt_to_xe(gt));
+
 	xe_reg_sr_dump(&gt->reg_sr, &p);
 	drm_printf(&p, "\n");
 
@@ -127,6 +144,8 @@ static int register_save_restore(struct seq_file *m, void *data)
 	for_each_hw_engine(hwe, gt, id)
 		xe_reg_whitelist_dump(&hwe->reg_whitelist, &p);
 
+	xe_pm_runtime_put(gt_to_xe(gt));
+
 	return 0;
 }
 
@@ -135,7 +154,9 @@ static int workarounds(struct seq_file *m, void *data)
 	struct xe_gt *gt = node_to_gt(m->private);
 	struct drm_printer p = drm_seq_file_printer(m);
 
+	xe_pm_runtime_get(gt_to_xe(gt));
 	xe_wa_dump(gt, &p);
+	xe_pm_runtime_put(gt_to_xe(gt));
 
 	return 0;
 }
@@ -145,48 +166,70 @@ static int pat(struct seq_file *m, void *data)
 	struct xe_gt *gt = node_to_gt(m->private);
 	struct drm_printer p = drm_seq_file_printer(m);
 
+	xe_pm_runtime_get(gt_to_xe(gt));
 	xe_pat_dump(gt, &p);
+	xe_pm_runtime_put(gt_to_xe(gt));
 
 	return 0;
 }
 
 static int rcs_default_lrc(struct seq_file *m, void *data)
 {
+	struct xe_gt *gt = node_to_gt(m->private);
 	struct drm_printer p = drm_seq_file_printer(m);
 
+	xe_pm_runtime_get(gt_to_xe(gt));
 	xe_lrc_dump_default(&p, node_to_gt(m->private), XE_ENGINE_CLASS_RENDER);
+	xe_pm_runtime_put(gt_to_xe(gt));
+
 	return 0;
 }
 
 static int ccs_default_lrc(struct seq_file *m, void *data)
 {
+	struct xe_gt *gt = node_to_gt(m->private);
 	struct drm_printer p = drm_seq_file_printer(m);
 
+	xe_pm_runtime_get(gt_to_xe(gt));
 	xe_lrc_dump_default(&p, node_to_gt(m->private), XE_ENGINE_CLASS_COMPUTE);
+	xe_pm_runtime_put(gt_to_xe(gt));
+
 	return 0;
 }
 
 static int bcs_default_lrc(struct seq_file *m, void *data)
 {
+	struct xe_gt *gt = node_to_gt(m->private);
 	struct drm_printer p = drm_seq_file_printer(m);
 
+	xe_pm_runtime_get(gt_to_xe(gt));
 	xe_lrc_dump_default(&p, node_to_gt(m->private), XE_ENGINE_CLASS_COPY);
+	xe_pm_runtime_put(gt_to_xe(gt));
+
 	return 0;
 }
 
 static int vcs_default_lrc(struct seq_file *m, void *data)
 {
+	struct xe_gt *gt = node_to_gt(m->private);
 	struct drm_printer p = drm_seq_file_printer(m);
 
+	xe_pm_runtime_get(gt_to_xe(gt));
 	xe_lrc_dump_default(&p, node_to_gt(m->private), XE_ENGINE_CLASS_VIDEO_DECODE);
+	xe_pm_runtime_put(gt_to_xe(gt));
+
 	return 0;
 }
 
 static int vecs_default_lrc(struct seq_file *m, void *data)
 {
+	struct xe_gt *gt = node_to_gt(m->private);
 	struct drm_printer p = drm_seq_file_printer(m);
 
+	xe_pm_runtime_get(gt_to_xe(gt));
 	xe_lrc_dump_default(&p, node_to_gt(m->private), XE_ENGINE_CLASS_VIDEO_ENHANCE);
+	xe_pm_runtime_put(gt_to_xe(gt));
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/xe/xe_guc_debugfs.c b/drivers/gpu/drm/xe/xe_guc_debugfs.c
index ffd7d53bcc42b..d3822cbea273a 100644
--- a/drivers/gpu/drm/xe/xe_guc_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_guc_debugfs.c
@@ -14,6 +14,7 @@
 #include "xe_guc_ct.h"
 #include "xe_guc_log.h"
 #include "xe_macros.h"
+#include "xe_pm.h"
 
 static struct xe_guc *node_to_guc(struct drm_info_node *node)
 {
@@ -26,9 +27,9 @@ static int guc_info(struct seq_file *m, void *data)
 	struct xe_device *xe = guc_to_xe(guc);
 	struct drm_printer p = drm_seq_file_printer(m);
 
-	xe_device_mem_access_get(xe);
+	xe_pm_runtime_get(xe);
 	xe_guc_print_info(guc, &p);
-	xe_device_mem_access_put(xe);
+	xe_pm_runtime_put(xe);
 
 	return 0;
 }
@@ -39,9 +40,9 @@ static int guc_log(struct seq_file *m, void *data)
 	struct xe_device *xe = guc_to_xe(guc);
 	struct drm_printer p = drm_seq_file_printer(m);
 
-	xe_device_mem_access_get(xe);
+	xe_pm_runtime_get(xe);
 	xe_guc_log_print(&guc->log, &p);
-	xe_device_mem_access_put(xe);
+	xe_pm_runtime_put(xe);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/xe/xe_huc_debugfs.c b/drivers/gpu/drm/xe/xe_huc_debugfs.c
index 18585a7eeb9d4..4d9f2bb0eee2e 100644
--- a/drivers/gpu/drm/xe/xe_huc_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_huc_debugfs.c
@@ -12,6 +12,7 @@
 #include "xe_gt.h"
 #include "xe_huc.h"
 #include "xe_macros.h"
+#include "xe_pm.h"
 
 static struct xe_gt *
 huc_to_gt(struct xe_huc *huc)
@@ -36,9 +37,11 @@ static int huc_info(struct seq_file *m, void *data)
 	struct xe_device *xe = huc_to_xe(huc);
 	struct drm_printer p = drm_seq_file_printer(m);
 
+	xe_pm_runtime_get(xe);
 	xe_device_mem_access_get(xe);
 	xe_huc_print_info(huc, &p);
 	xe_device_mem_access_put(xe);
+	xe_pm_runtime_put(xe);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
index 3e1fa0c832cab..9844a8edbfe19 100644
--- a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
@@ -73,7 +73,10 @@ static void xe_ttm_sys_mgr_del(struct ttm_resource_manager *man,
 static void xe_ttm_sys_mgr_debug(struct ttm_resource_manager *man,
 				 struct drm_printer *printer)
 {
-
+	/*
+	 * This function is called by debugfs entry and would require
+	 * pm_runtime_{get,put} wrappers around any operation.
+	 */
 }
 
 static const struct ttm_resource_manager_func xe_ttm_sys_mgr_func = {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 14/20] drm/xe: Replace dma_buf mem_access per direct xe_pm_runtime calls
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (12 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 13/20] drm/xe: Runtime PM wake on every debugfs call Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 15/20] drm/xe: Allow GuC CT fast path and worker regardless of runtime_pm Rodrigo Vivi
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

Continue on the path to entirely remove the mem_access helpers in
favor of direct xe_pm_runtime calls. The dma-buf attach/detach pair
is one of the direct outer bounds of the protection.
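
The pairing is balanced across the attachment's whole lifetime; as a
sketch (the ops-table shape here is illustrative; the real table in
xe_dma_buf.c has more entries):

static const struct dma_buf_ops xe_dmabuf_ops_sketch = {
        .attach = xe_dma_buf_attach,    /* takes xe_pm_runtime_get() */
        .detach = xe_dma_buf_detach,    /* balances with xe_pm_runtime_put() */
        /* map/unmap paths may then rely on the device being awake */
};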

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_dma_buf.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index 64ed303728fda..eeee103c5c14a 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -16,6 +16,7 @@
 #include "tests/xe_test.h"
 #include "xe_bo.h"
 #include "xe_device.h"
+#include "xe_pm.h"
 #include "xe_ttm_vram_mgr.h"
 #include "xe_vm.h"
 
@@ -33,7 +34,7 @@ static int xe_dma_buf_attach(struct dma_buf *dmabuf,
 	if (!attach->peer2peer && !xe_bo_can_migrate(gem_to_xe_bo(obj), XE_PL_TT))
 		return -EOPNOTSUPP;
 
-	xe_device_mem_access_get(to_xe_device(obj->dev));
+	xe_pm_runtime_get(to_xe_device(obj->dev));
 	return 0;
 }
 
@@ -42,7 +43,7 @@ static void xe_dma_buf_detach(struct dma_buf *dmabuf,
 {
 	struct drm_gem_object *obj = attach->dmabuf->priv;
 
-	xe_device_mem_access_put(to_xe_device(obj->dev));
+	xe_pm_runtime_put(to_xe_device(obj->dev));
 }
 
 static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 15/20] drm/xe: Allow GuC CT fast path and worker regardless of runtime_pm
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (13 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 14/20] drm/xe: Replace dma_buf mem_access per direct xe_pm_runtime calls Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2024-01-09 12:09   ` Matthew Auld
  2023-12-28  2:12 ` [RFC 16/20] drm/xe: Remove mem_access calls from migration Rodrigo Vivi
                   ` (8 subsequent siblings)
  23 siblings, 1 reply; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

First of all, the !ongoing && !from_runtime_functions case seems like
one that should not happen and would be bad anyway. So, let's at least
stop papering over it with this workaround; if we hit the case again we
need to find which outer bound should be protecting this access, or
identify the real condition behind it.

On top of that, we are now protecting the outer bounds instead of
taking more granular memory-access references, so we might be fine.
Or maybe we need to ensure that GuC is really shut off under these
conditions.

Anyway, let's proceed with killing the mem_access callers for now.
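
The assumption this leans on can be stated as an assert; a sketch
(xe_device_assert_mem_access() is real in this series, the wrapper
function here is not):

static void ct_fast_path_sketch(struct xe_guc_ct *ct)
{
        /* an outer-bound caller must already hold a runtime_pm ref */
        xe_device_assert_mem_access(ct_to_xe(ct));

        /* ... normal G2H processing, device known to be awake ... */
}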

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_ct.c | 40 ----------------------------------
 1 file changed, 40 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 4cde93c18a2d4..7e68ef69ca8d5 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1096,14 +1096,8 @@ static void g2h_fast_path(struct xe_guc_ct *ct, u32 *msg, u32 len)
  */
 void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
 {
-	struct xe_device *xe = ct_to_xe(ct);
-	bool ongoing;
 	int len;
 
-	ongoing = xe_device_mem_access_get_if_ongoing(ct_to_xe(ct));
-	if (!ongoing && xe_pm_read_callback_task(ct_to_xe(ct)) == NULL)
-		return;
-
 	spin_lock(&ct->fast_lock);
 	do {
 		len = g2h_read(ct, ct->fast_msg, true);
@@ -1111,9 +1105,6 @@ void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
 			g2h_fast_path(ct, ct->fast_msg, len);
 	} while (len > 0);
 	spin_unlock(&ct->fast_lock);
-
-	if (ongoing)
-		xe_device_mem_access_put(xe);
 }
 
 /* Returns less than zero on error, 0 on done, 1 on more available */
@@ -1144,36 +1135,8 @@ static int dequeue_one_g2h(struct xe_guc_ct *ct)
 static void g2h_worker_func(struct work_struct *w)
 {
 	struct xe_guc_ct *ct = container_of(w, struct xe_guc_ct, g2h_worker);
-	bool ongoing;
 	int ret;
 
-	/*
-	 * Normal users must always hold mem_access.ref around CT calls. However
-	 * during the runtime pm callbacks we rely on CT to talk to the GuC, but
-	 * at this stage we can't rely on mem_access.ref and even the
-	 * callback_task will be different than current.  For such cases we just
-	 * need to ensure we always process the responses from any blocking
-	 * ct_send requests or where we otherwise expect some response when
-	 * initiated from those callbacks (which will need to wait for the below
-	 * dequeue_one_g2h()).  The dequeue_one_g2h() will gracefully fail if
-	 * the device has suspended to the point that the CT communication has
-	 * been disabled.
-	 *
-	 * If we are inside the runtime pm callback, we can be the only task
-	 * still issuing CT requests (since that requires having the
-	 * mem_access.ref).  It seems like it might in theory be possible to
-	 * receive unsolicited events from the GuC just as we are
-	 * suspending-resuming, but those will currently anyway be lost when
-	 * eventually exiting from suspend, hence no need to wake up the device
-	 * here. If we ever need something stronger than get_if_ongoing() then
-	 * we need to be careful with blocking the pm callbacks from getting CT
-	 * responses, if the worker here is blocked on those callbacks
-	 * completing, creating a deadlock.
-	 */
-	ongoing = xe_device_mem_access_get_if_ongoing(ct_to_xe(ct));
-	if (!ongoing && xe_pm_read_callback_task(ct_to_xe(ct)) == NULL)
-		return;
-
 	do {
 		mutex_lock(&ct->lock);
 		ret = dequeue_one_g2h(ct);
@@ -1187,9 +1150,6 @@ static void g2h_worker_func(struct work_struct *w)
 			kick_reset(ct);
 		}
 	} while (ret == 1);
-
-	if (ongoing)
-		xe_device_mem_access_put(ct_to_xe(ct));
 }
 
 static void guc_ctb_snapshot_capture(struct xe_device *xe, struct guc_ctb *ctb,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 16/20] drm/xe: Remove mem_access calls from migration
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (14 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 15/20] drm/xe: Allow GuC CT fast path and worker regardless of runtime_pm Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2024-01-09 12:33   ` Matthew Auld
  2023-12-28  2:12 ` [RFC 17/20] drm/xe: Removing extra mem_access protection from runtime pm Rodrigo Vivi
                   ` (7 subsequent siblings)
  23 siblings, 1 reply; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

The sched jobs' runtime_pm calls already protect every execution,
including these migration ones.
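
A rough sketch of that job-lifetime protection (the helper names here
are assumptions; the real hooks live in the xe scheduler/job code):

static void job_arm_sketch(struct xe_sched_job *job)
{
        xe_pm_runtime_get(job_to_xe(job));      /* held while the job is alive */
}

static void job_free_sketch(struct xe_sched_job *job)
{
        xe_pm_runtime_put(job_to_xe(job));      /* dropped once the job is freed */
}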

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/tests/xe_migrate.c |  2 --
 drivers/gpu/drm/xe/xe_device.c        | 17 -----------------
 drivers/gpu/drm/xe/xe_device.h        |  1 -
 drivers/gpu/drm/xe/xe_exec_queue.c    | 18 ------------------
 4 files changed, 38 deletions(-)

diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index 7a32faa2f6888..2257f0a28435b 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -428,9 +428,7 @@ static int migrate_test_run_device(struct xe_device *xe)
 
 		kunit_info(test, "Testing tile id %d.\n", id);
 		xe_vm_lock(m->q->vm, true);
-		xe_device_mem_access_get(xe);
 		xe_migrate_sanity_test(m, test);
-		xe_device_mem_access_put(xe);
 		xe_vm_unlock(m->q->vm);
 	}
 
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index ee9b6612eec43..a7bec49da49fa 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -675,23 +675,6 @@ void xe_device_assert_mem_access(struct xe_device *xe)
 	XE_WARN_ON(xe_pm_runtime_suspended(xe));
 }
 
-bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe)
-{
-	bool active;
-
-	if (xe_pm_read_callback_task(xe) == current)
-		return true;
-
-	active = xe_pm_runtime_get_if_active(xe);
-	if (active) {
-		int ref = atomic_inc_return(&xe->mem_access.ref);
-
-		xe_assert(xe, ref != S32_MAX);
-	}
-
-	return active;
-}
-
 void xe_device_mem_access_get(struct xe_device *xe)
 {
 	int ref;
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index af8ac2e9e2709..4acf4c2973390 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -142,7 +142,6 @@ static inline struct xe_force_wake *gt_to_fw(struct xe_gt *gt)
 }
 
 void xe_device_mem_access_get(struct xe_device *xe);
-bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe);
 void xe_device_mem_access_put(struct xe_device *xe);
 
 void xe_device_assert_mem_access(struct xe_device *xe);
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 44fe8097b7cda..d3a8d2d8caaaf 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -87,17 +87,6 @@ static struct xe_exec_queue *__xe_exec_queue_create(struct xe_device *xe,
 	if (err)
 		goto err_lrc;
 
-	/*
-	 * Normally the user vm holds an rpm ref to keep the device
-	 * awake, and the context holds a ref for the vm, however for
-	 * some engines we use the kernels migrate vm underneath which offers no
-	 * such rpm ref, or we lack a vm. Make sure we keep a ref here, so we
-	 * can perform GuC CT actions when needed. Caller is expected to have
-	 * already grabbed the rpm ref outside any sensitive locks.
-	 */
-	if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !vm))
-		drm_WARN_ON(&xe->drm, !xe_device_mem_access_get_if_ongoing(xe));
-
 	return q;
 
 err_lrc:
@@ -172,8 +161,6 @@ void xe_exec_queue_fini(struct xe_exec_queue *q)
 
 	for (i = 0; i < q->width; ++i)
 		xe_lrc_finish(q->lrc + i);
-	if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !q->vm))
-		xe_device_mem_access_put(gt_to_xe(q->gt));
 	if (q->vm)
 		xe_vm_put(q->vm);
 
@@ -643,9 +630,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
 			if (XE_IOCTL_DBG(xe, !hwe))
 				return -EINVAL;
 
-			/* The migration vm doesn't hold rpm ref */
-			xe_device_mem_access_get(xe);
-
 			migrate_vm = xe_migrate_get_vm(gt_to_tile(gt)->migrate);
 			new = xe_exec_queue_create(xe, migrate_vm, logical_mask,
 						   args->width, hwe,
@@ -655,8 +639,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
 						    EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD :
 						    0));
 
-			xe_device_mem_access_put(xe); /* now held by engine */
-
 			xe_vm_put(migrate_vm);
 			if (IS_ERR(new)) {
 				err = PTR_ERR(new);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 17/20] drm/xe: Removing extra mem_access protection from runtime pm
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (15 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 16/20] drm/xe: Remove mem_access calls from migration Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 18/20] drm/xe: Convert hwmon from mem_access to xe_pm_runtime calls Rodrigo Vivi
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

This is not needed any longer, now that we have all the protection
in place with the runtime pm itself: the rpm core only invokes the
suspend callback once every reference has been dropped, so the manual
mem_access busy check is redundant.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c | 8 --------
 drivers/gpu/drm/xe/xe_device.h | 1 -
 drivers/gpu/drm/xe/xe_pm.c     | 3 ---
 3 files changed, 12 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index a7bec49da49fa..c1c19264a58b4 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -651,14 +651,6 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
 		DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE(xe)) : 0;
 }
 
-bool xe_device_mem_access_ongoing(struct xe_device *xe)
-{
-	if (xe_pm_read_callback_task(xe) != NULL)
-		return true;
-
-	return atomic_read(&xe->mem_access.ref);
-}
-
 /**
  * xe_device_assert_mem_access - Inspect the current runtime_pm state.
  * @xe: xe device instance
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 4acf4c2973390..d5ab7c62d5f62 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -145,7 +145,6 @@ void xe_device_mem_access_get(struct xe_device *xe);
 void xe_device_mem_access_put(struct xe_device *xe);
 
 void xe_device_assert_mem_access(struct xe_device *xe);
-bool xe_device_mem_access_ongoing(struct xe_device *xe);
 
 static inline bool xe_device_in_fault_mode(struct xe_device *xe)
 {
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index 3594e707606ce..e53ac5a2f4ad3 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -274,9 +274,6 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
 	u8 id;
 	int err = 0;
 
-	if (xe->d3cold.allowed && xe_device_mem_access_ongoing(xe))
-		return -EBUSY;
-
 	/* Disable access_ongoing asserts and prevent recursive pm calls */
 	xe_pm_write_callback_task(xe, current);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 18/20] drm/xe: Convert hwmon from mem_access to xe_pm_runtime calls
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (16 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 17/20] drm/xe: Removing extra mem_access protection from runtime pm Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 19/20] drm/xe: Remove unused runtime pm helper Rodrigo Vivi
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

Continue the work to kill the mem_access in favor of a pure runtime pm.
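
One detail worth keeping in mind (visible in the hunks below): the
runtime_pm reference is taken before hwmon->hwmon_lock and dropped
after it, so a synchronous resume never runs under the lock. In
outline (a sketch, with a stand-in for the register access):

static void hwmon_access_sketch(struct xe_hwmon *hwmon, u64 *out)
{
        struct xe_device *xe = gt_to_xe(hwmon->gt);

        xe_pm_runtime_get(xe);                  /* may resume synchronously */
        mutex_lock(&hwmon->hwmon_lock);         /* taken only once awake */
        *out = 0;                               /* stand-in for register access */
        mutex_unlock(&hwmon->hwmon_lock);
        xe_pm_runtime_put(xe);
}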

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_hwmon.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c
index 6ef2aa1eae8b0..e7e001198097a 100644
--- a/drivers/gpu/drm/xe/xe_hwmon.c
+++ b/drivers/gpu/drm/xe/xe_hwmon.c
@@ -16,6 +16,7 @@
 #include "xe_mmio.h"
 #include "xe_pcode.h"
 #include "xe_pcode_api.h"
+#include "xe_pm.h"
 
 enum xe_hwmon_reg {
 	REG_PKG_RAPL_LIMIT,
@@ -264,7 +265,7 @@ xe_hwmon_power1_max_interval_show(struct device *dev, struct device_attribute *a
 	u32 x, y, x_w = 2; /* 2 bits */
 	u64 r, tau4, out;
 
-	xe_device_mem_access_get(gt_to_xe(hwmon->gt));
+	xe_pm_runtime_get(gt_to_xe(hwmon->gt));
 
 	mutex_lock(&hwmon->hwmon_lock);
 
@@ -273,7 +274,7 @@ xe_hwmon_power1_max_interval_show(struct device *dev, struct device_attribute *a
 
 	mutex_unlock(&hwmon->hwmon_lock);
 
-	xe_device_mem_access_put(gt_to_xe(hwmon->gt));
+	xe_pm_runtime_put(gt_to_xe(hwmon->gt));
 
 	x = REG_FIELD_GET(PKG_PWR_LIM_1_TIME_X, r);
 	y = REG_FIELD_GET(PKG_PWR_LIM_1_TIME_Y, r);
@@ -352,7 +353,7 @@ xe_hwmon_power1_max_interval_store(struct device *dev, struct device_attribute *
 
 	rxy = REG_FIELD_PREP(PKG_PWR_LIM_1_TIME_X, x) | REG_FIELD_PREP(PKG_PWR_LIM_1_TIME_Y, y);
 
-	xe_device_mem_access_get(gt_to_xe(hwmon->gt));
+	xe_pm_runtime_get(gt_to_xe(hwmon->gt));
 
 	mutex_lock(&hwmon->hwmon_lock);
 
@@ -361,7 +362,7 @@ xe_hwmon_power1_max_interval_store(struct device *dev, struct device_attribute *
 
 	mutex_unlock(&hwmon->hwmon_lock);
 
-	xe_device_mem_access_put(gt_to_xe(hwmon->gt));
+	xe_pm_runtime_put(gt_to_xe(hwmon->gt));
 
 	return count;
 }
@@ -382,12 +383,12 @@ static umode_t xe_hwmon_attributes_visible(struct kobject *kobj,
 	struct xe_hwmon *hwmon = dev_get_drvdata(dev);
 	int ret = 0;
 
-	xe_device_mem_access_get(gt_to_xe(hwmon->gt));
+	xe_pm_runtime_get(gt_to_xe(hwmon->gt));
 
 	if (attr == &sensor_dev_attr_power1_max_interval.dev_attr.attr)
 		ret = xe_hwmon_get_reg(hwmon, REG_PKG_RAPL_LIMIT) ? attr->mode : 0;
 
-	xe_device_mem_access_put(gt_to_xe(hwmon->gt));
+	xe_pm_runtime_put(gt_to_xe(hwmon->gt));
 
 	return ret;
 }
@@ -608,7 +609,7 @@ xe_hwmon_is_visible(const void *drvdata, enum hwmon_sensor_types type,
 	struct xe_hwmon *hwmon = (struct xe_hwmon *)drvdata;
 	int ret;
 
-	xe_device_mem_access_get(gt_to_xe(hwmon->gt));
+	xe_pm_runtime_get(gt_to_xe(hwmon->gt));
 
 	switch (type) {
 	case hwmon_power:
@@ -628,7 +629,7 @@ xe_hwmon_is_visible(const void *drvdata, enum hwmon_sensor_types type,
 		break;
 	}
 
-	xe_device_mem_access_put(gt_to_xe(hwmon->gt));
+	xe_pm_runtime_put(gt_to_xe(hwmon->gt));
 
 	return ret;
 }
@@ -640,7 +641,7 @@ xe_hwmon_read(struct device *dev, enum hwmon_sensor_types type, u32 attr,
 	struct xe_hwmon *hwmon = dev_get_drvdata(dev);
 	int ret;
 
-	xe_device_mem_access_get(gt_to_xe(hwmon->gt));
+	xe_pm_runtime_get(gt_to_xe(hwmon->gt));
 
 	switch (type) {
 	case hwmon_power:
@@ -660,7 +661,7 @@ xe_hwmon_read(struct device *dev, enum hwmon_sensor_types type, u32 attr,
 		break;
 	}
 
-	xe_device_mem_access_put(gt_to_xe(hwmon->gt));
+	xe_pm_runtime_put(gt_to_xe(hwmon->gt));
 
 	return ret;
 }
@@ -672,7 +673,7 @@ xe_hwmon_write(struct device *dev, enum hwmon_sensor_types type, u32 attr,
 	struct xe_hwmon *hwmon = dev_get_drvdata(dev);
 	int ret;
 
-	xe_device_mem_access_get(gt_to_xe(hwmon->gt));
+	xe_pm_runtime_get(gt_to_xe(hwmon->gt));
 
 	switch (type) {
 	case hwmon_power:
@@ -686,7 +687,7 @@ xe_hwmon_write(struct device *dev, enum hwmon_sensor_types type, u32 attr,
 		break;
 	}
 
-	xe_device_mem_access_put(gt_to_xe(hwmon->gt));
+	xe_pm_runtime_put(gt_to_xe(hwmon->gt));
 
 	return ret;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 19/20] drm/xe: Remove unused runtime pm helper
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (17 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 18/20] drm/xe: Convert hwmon from mem_access to xe_pm_runtime calls Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2023-12-28  2:12 ` [RFC 20/20] drm/xe: Mega Kill of mem_access Rodrigo Vivi
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

Now that the ongoing variants of mem_access are gone, this is not
needed anymore.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_pm.c | 12 ------------
 drivers/gpu/drm/xe/xe_pm.h |  1 -
 2 files changed, 13 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index e53ac5a2f4ad3..16ead90f3cab5 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -430,18 +430,6 @@ int xe_pm_runtime_get_sync(struct xe_device *xe)
         return pm_runtime_get_sync(xe->drm.dev);
 }
 
-/**
- * xe_pm_runtime_get_if_active - Get a runtime_pm reference if device active
- * @xe: xe device instance
- *
- * Returns: Any number grater than or equal to 0 for success, negative error
- * code otherwise.
- */
-int xe_pm_runtime_get_if_active(struct xe_device *xe)
-{
-	return pm_runtime_get_if_active(xe->drm.dev, true);
-}
-
 /**
  * xe_pm_runtime_get_if_in_use - Get a runtime_pm reference and resume if needed
  * @xe: xe device instance
diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
index fc82a1466453b..fa42a5ec89712 100644
--- a/drivers/gpu/drm/xe/xe_pm.h
+++ b/drivers/gpu/drm/xe/xe_pm.h
@@ -28,7 +28,6 @@ int xe_pm_runtime_resume(struct xe_device *xe);
 bool xe_pm_runtime_resume_and_get(struct xe_device *xe);
 void xe_pm_runtime_get(struct xe_device *xe);
 int xe_pm_runtime_get_sync(struct xe_device *xe);
-int xe_pm_runtime_get_if_active(struct xe_device *xe);
 bool xe_pm_runtime_get_if_in_use(struct xe_device *xe);
 void xe_pm_runtime_put(struct xe_device *xe);
 void xe_pm_assert_unbounded_bridge(struct xe_device *xe);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC 20/20] drm/xe: Mega Kill of mem_access
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (18 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 19/20] drm/xe: Remove unused runtime pm helper Rodrigo Vivi
@ 2023-12-28  2:12 ` Rodrigo Vivi
  2024-01-09 11:41   ` Matthew Auld
  2024-01-04  5:40 ` ✓ CI.Patch_applied: success for First attempt to kill mem_access Patchwork
                   ` (3 subsequent siblings)
  23 siblings, 1 reply; 46+ messages in thread
From: Rodrigo Vivi @ 2023-12-28  2:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

All of these remaining cases should already be protected by the
outer-bound runtime_pm calls.
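
After this patch the only mem_access-era entry point left is the
assert, which an earlier patch already converted to check the
runtime_pm state directly; per the xe_device.c hunk below it reads:

void xe_device_assert_mem_access(struct xe_device *xe)
{
        XE_WARN_ON(xe_pm_runtime_suspended(xe));
}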

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/display/xe_fb_pin.c |  7 +--
 drivers/gpu/drm/xe/tests/xe_bo.c       |  8 ----
 drivers/gpu/drm/xe/tests/xe_mocs.c     |  4 --
 drivers/gpu/drm/xe/xe_bo.c             |  5 ---
 drivers/gpu/drm/xe/xe_device.c         | 59 --------------------------
 drivers/gpu/drm/xe/xe_device.h         |  7 ---
 drivers/gpu/drm/xe/xe_device_types.h   |  9 ----
 drivers/gpu/drm/xe/xe_ggtt.c           |  6 ---
 drivers/gpu/drm/xe/xe_gsc.c            |  3 --
 drivers/gpu/drm/xe/xe_gt.c             | 17 --------
 drivers/gpu/drm/xe/xe_huc_debugfs.c    |  2 -
 drivers/gpu/drm/xe/xe_pat.c            | 10 -----
 drivers/gpu/drm/xe/xe_pm.c             | 27 ------------
 drivers/gpu/drm/xe/xe_query.c          |  4 --
 drivers/gpu/drm/xe/xe_tile.c           | 10 ++---
 drivers/gpu/drm/xe/xe_vm.c             |  7 ---
 16 files changed, 5 insertions(+), 180 deletions(-)

diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index 722c84a566073..077294ec50ece 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -190,10 +190,9 @@ static int __xe_pin_fb_vma_ggtt(struct intel_framebuffer *fb,
 	/* TODO: Consider sharing framebuffer mapping?
 	 * embed i915_vma inside intel_framebuffer
 	 */
-	xe_device_mem_access_get(tile_to_xe(ggtt->tile));
 	ret = mutex_lock_interruptible(&ggtt->lock);
 	if (ret)
-		goto out;
+		return ret;
 
 	align = XE_PAGE_SIZE;
 	if (xe_bo_is_vram(bo) && ggtt->flags & XE_GGTT_FLAGS_64K)
@@ -241,8 +240,6 @@ static int __xe_pin_fb_vma_ggtt(struct intel_framebuffer *fb,
 	xe_ggtt_invalidate(ggtt);
 out_unlock:
 	mutex_unlock(&ggtt->lock);
-out:
-	xe_device_mem_access_put(tile_to_xe(ggtt->tile));
 	return ret;
 }
 
@@ -381,4 +378,4 @@ struct i915_address_space *intel_dpt_create(struct intel_framebuffer *fb)
 void intel_dpt_destroy(struct i915_address_space *vm)
 {
 	return;
-}
\ No newline at end of file
+}
diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
index 412b2e7ce40cb..97b10e597f0ad 100644
--- a/drivers/gpu/drm/xe/tests/xe_bo.c
+++ b/drivers/gpu/drm/xe/tests/xe_bo.c
@@ -164,8 +164,6 @@ static int ccs_test_run_device(struct xe_device *xe)
 		return 0;
 	}
 
-	xe_device_mem_access_get(xe);
-
 	for_each_tile(tile, xe, id) {
 		/* For igfx run only for primary tile */
 		if (!IS_DGFX(xe) && id > 0)
@@ -173,8 +171,6 @@ static int ccs_test_run_device(struct xe_device *xe)
 		ccs_test_run_tile(xe, tile, test);
 	}
 
-	xe_device_mem_access_put(xe);
-
 	return 0;
 }
 
@@ -336,13 +332,9 @@ static int evict_test_run_device(struct xe_device *xe)
 		return 0;
 	}
 
-	xe_device_mem_access_get(xe);
-
 	for_each_tile(tile, xe, id)
 		evict_test_run_tile(xe, tile, test);
 
-	xe_device_mem_access_put(xe);
-
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/xe/tests/xe_mocs.c b/drivers/gpu/drm/xe/tests/xe_mocs.c
index 7dd34f94e8094..a12e7e2bb5861 100644
--- a/drivers/gpu/drm/xe/tests/xe_mocs.c
+++ b/drivers/gpu/drm/xe/tests/xe_mocs.c
@@ -45,7 +45,6 @@ static void read_l3cc_table(struct xe_gt *gt,
 
 	struct kunit *test = xe_cur_kunit();
 
-	xe_device_mem_access_get(gt_to_xe(gt));
 	ret = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
 	KUNIT_ASSERT_EQ_MSG(test, ret, 0, "Forcewake Failed.\n");
 	mocs_dbg(&gt_to_xe(gt)->drm, "L3CC entries:%d\n", info->n_entries);
@@ -65,7 +64,6 @@ static void read_l3cc_table(struct xe_gt *gt,
 				   XELP_LNCFCMOCS(i).addr);
 	}
 	xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
-	xe_device_mem_access_put(gt_to_xe(gt));
 }
 
 static void read_mocs_table(struct xe_gt *gt,
@@ -80,7 +78,6 @@ static void read_mocs_table(struct xe_gt *gt,
 
 	struct kunit *test = xe_cur_kunit();
 
-	xe_device_mem_access_get(gt_to_xe(gt));
 	ret = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
 	KUNIT_ASSERT_EQ_MSG(test, ret, 0, "Forcewake Failed.\n");
 	mocs_dbg(&gt_to_xe(gt)->drm, "Global MOCS entries:%d\n", info->n_entries);
@@ -100,7 +97,6 @@ static void read_mocs_table(struct xe_gt *gt,
 				   XELP_GLOBAL_MOCS(i).addr);
 	}
 	xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
-	xe_device_mem_access_put(gt_to_xe(gt));
 }
 
 static int mocs_kernel_test_run_device(struct xe_device *xe)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 8e4a3b1f6b938..056c65c2675d8 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -715,7 +715,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 	xe_assert(xe, migrate);
 
 	trace_xe_bo_move(bo);
-	xe_device_mem_access_get(xe);
 
 	if (xe_bo_is_pinned(bo) && !xe_bo_is_user(bo)) {
 		/*
@@ -739,7 +738,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 
 				if (XE_WARN_ON(new_mem->start == XE_BO_INVALID_OFFSET)) {
 					ret = -EINVAL;
-					xe_device_mem_access_put(xe);
 					goto out;
 				}
 
@@ -757,7 +755,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 						new_mem, handle_system_ccs);
 		if (IS_ERR(fence)) {
 			ret = PTR_ERR(fence);
-			xe_device_mem_access_put(xe);
 			goto out;
 		}
 		if (!move_lacks_source) {
@@ -782,8 +779,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 		dma_fence_put(fence);
 	}
 
-	xe_device_mem_access_put(xe);
-
 out:
 	return ret;
 
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index c1c19264a58b4..cb08a4369bb9e 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -44,12 +44,6 @@
 #include "xe_wait_user_fence.h"
 #include "xe_hwmon.h"
 
-#ifdef CONFIG_LOCKDEP
-struct lockdep_map xe_device_mem_access_lockdep_map = {
-	.name = "xe_device_mem_access_lockdep_map"
-};
-#endif
-
 static int xe_file_open(struct drm_device *dev, struct drm_file *file)
 {
 	struct xe_device *xe = to_xe_device(dev);
@@ -666,56 +660,3 @@ void xe_device_assert_mem_access(struct xe_device *xe)
 {
 	XE_WARN_ON(xe_pm_runtime_suspended(xe));
 }
-
-void xe_device_mem_access_get(struct xe_device *xe)
-{
-	int ref;
-
-	/*
-	 * This looks racy, but should be fine since the pm_callback_task only
-	 * transitions from NULL -> current (and back to NULL again), during the
-	 * runtime_resume() or runtime_suspend() callbacks, for which there can
-	 * only be a single one running for our device. We only need to prevent
-	 * recursively calling the runtime_get or runtime_put from those
-	 * callbacks, as well as preventing triggering any access_ongoing
-	 * asserts.
-	 */
-	if (xe_pm_read_callback_task(xe) == current)
-		return;
-
-	/*
-	 * Since the resume here is synchronous it can be quite easy to deadlock
-	 * if we are not careful. Also in practice it might be quite timing
-	 * sensitive to ever see the 0 -> 1 transition with the callers locks
-	 * held, so deadlocks might exist but are hard for lockdep to ever see.
-	 * With this in mind, help lockdep learn about the potentially scary
-	 * stuff that can happen inside the runtime_resume callback by acquiring
-	 * a dummy lock (it doesn't protect anything and gets compiled out on
-	 * non-debug builds).  Lockdep then only needs to see the
-	 * mem_access_lockdep_map -> runtime_resume callback once, and then can
-	 * hopefully validate all the (callers_locks) -> mem_access_lockdep_map.
-	 * For example if the (callers_locks) are ever grabbed in the
-	 * runtime_resume callback, lockdep should give us a nice splat.
-	 */
-	lock_map_acquire(&xe_device_mem_access_lockdep_map);
-	lock_map_release(&xe_device_mem_access_lockdep_map);
-
-	xe_pm_runtime_get(xe);
-	ref = atomic_inc_return(&xe->mem_access.ref);
-
-	xe_assert(xe, ref != S32_MAX);
-
-}
-
-void xe_device_mem_access_put(struct xe_device *xe)
-{
-	int ref;
-
-	if (xe_pm_read_callback_task(xe) == current)
-		return;
-
-	ref = atomic_dec_return(&xe->mem_access.ref);
-	xe_pm_runtime_put(xe);
-
-	xe_assert(xe, ref >= 0);
-}
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index d5ab7c62d5f62..fb4d6ed4a3d2a 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -16,10 +16,6 @@ struct xe_file;
 #include "xe_force_wake.h"
 #include "xe_macros.h"
 
-#ifdef CONFIG_LOCKDEP
-extern struct lockdep_map xe_device_mem_access_lockdep_map;
-#endif
-
 static inline struct xe_device *to_xe_device(const struct drm_device *dev)
 {
 	return container_of(dev, struct xe_device, drm);
@@ -141,9 +137,6 @@ static inline struct xe_force_wake *gt_to_fw(struct xe_gt *gt)
 	return &gt->mmio.fw;
 }
 
-void xe_device_mem_access_get(struct xe_device *xe);
-void xe_device_mem_access_put(struct xe_device *xe);
-
 void xe_device_assert_mem_access(struct xe_device *xe);
 
 static inline bool xe_device_in_fault_mode(struct xe_device *xe)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 71f23ac365e66..8f0185c773d0b 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -378,15 +378,6 @@ struct xe_device {
 	/** @tiles: device tiles */
 	struct xe_tile tiles[XE_MAX_TILES_PER_DEVICE];
 
-	/**
-	 * @mem_access: keep track of memory access in the device, possibly
-	 * triggering additional actions when they occur.
-	 */
-	struct {
-		/** @ref: ref count of memory accesses */
-		atomic_t ref;
-	} mem_access;
-
 	/**
 	 * @pat: Encapsulate PAT related stuff
 	 */
diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index c639dbf3bdd27..96df4b553c18b 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -198,14 +198,12 @@ static void xe_ggtt_initial_clear(struct xe_ggtt *ggtt)
 	u64 start, end;
 
 	/* Display may have allocated inside ggtt, so be careful with clearing here */
-	xe_device_mem_access_get(tile_to_xe(ggtt->tile));
 	mutex_lock(&ggtt->lock);
 	drm_mm_for_each_hole(hole, &ggtt->mm, start, end)
 		xe_ggtt_clear(ggtt, start, end - start);
 
 	xe_ggtt_invalidate(ggtt);
 	mutex_unlock(&ggtt->lock);
-	xe_device_mem_access_put(tile_to_xe(ggtt->tile));
 }
 
 int xe_ggtt_init(struct xe_ggtt *ggtt)
@@ -366,14 +364,12 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
 	if (err)
 		return err;
 
-	xe_device_mem_access_get(tile_to_xe(ggtt->tile));
 	mutex_lock(&ggtt->lock);
 	err = drm_mm_insert_node_in_range(&ggtt->mm, &bo->ggtt_node, bo->size,
 					  alignment, 0, start, end, 0);
 	if (!err)
 		xe_ggtt_map_bo(ggtt, bo);
 	mutex_unlock(&ggtt->lock);
-	xe_device_mem_access_put(tile_to_xe(ggtt->tile));
 
 	return err;
 }
@@ -391,7 +387,6 @@ int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
 
 void xe_ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node)
 {
-	xe_device_mem_access_get(tile_to_xe(ggtt->tile));
 	mutex_lock(&ggtt->lock);
 
 	xe_ggtt_clear(ggtt, node->start, node->size);
@@ -401,7 +396,6 @@ void xe_ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node)
 	xe_ggtt_invalidate(ggtt);
 
 	mutex_unlock(&ggtt->lock);
-	xe_device_mem_access_put(tile_to_xe(ggtt->tile));
 }
 
 void xe_ggtt_remove_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
index a8a895cf4b448..cbd3f5b2f6a78 100644
--- a/drivers/gpu/drm/xe/xe_gsc.c
+++ b/drivers/gpu/drm/xe/xe_gsc.c
@@ -251,10 +251,8 @@ static void gsc_work(struct work_struct *work)
 {
 	struct xe_gsc *gsc = container_of(work, typeof(*gsc), work);
 	struct xe_gt *gt = gsc_to_gt(gsc);
-	struct xe_device *xe = gt_to_xe(gt);
 	int ret;
 
-	xe_device_mem_access_get(xe);
 	xe_force_wake_get(gt_to_fw(gt), XE_FW_GSC);
 
 	ret = gsc_upload(gsc);
@@ -271,7 +269,6 @@ static void gsc_work(struct work_struct *work)
 
 out:
 	xe_force_wake_put(gt_to_fw(gt), XE_FW_GSC);
-	xe_device_mem_access_put(xe);
 }
 
 int xe_gsc_init(struct xe_gsc *gsc)
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 3af2adec12956..00aa8a52e9076 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -336,7 +336,6 @@ static int gt_fw_domain_init(struct xe_gt *gt)
 {
 	int err, i;
 
-	xe_device_mem_access_get(gt_to_xe(gt));
 	err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
 	if (err)
 		goto err_hw_fence_irq;
@@ -388,7 +387,6 @@ static int gt_fw_domain_init(struct xe_gt *gt)
 
 	err = xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
 	XE_WARN_ON(err);
-	xe_device_mem_access_put(gt_to_xe(gt));
 
 	return 0;
 
@@ -398,7 +396,6 @@ static int gt_fw_domain_init(struct xe_gt *gt)
 err_hw_fence_irq:
 	for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
 		xe_hw_fence_irq_finish(&gt->fence_irq[i]);
-	xe_device_mem_access_put(gt_to_xe(gt));
 
 	return err;
 }
@@ -407,7 +404,6 @@ static int all_fw_domain_init(struct xe_gt *gt)
 {
 	int err, i;
 
-	xe_device_mem_access_get(gt_to_xe(gt));
 	err = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
 	if (err)
 		goto err_hw_fence_irq;
@@ -470,7 +466,6 @@ static int all_fw_domain_init(struct xe_gt *gt)
 
 	err = xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL);
 	XE_WARN_ON(err);
-	xe_device_mem_access_put(gt_to_xe(gt));
 
 	return 0;
 
@@ -479,7 +474,6 @@ static int all_fw_domain_init(struct xe_gt *gt)
 err_hw_fence_irq:
 	for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
 		xe_hw_fence_irq_finish(&gt->fence_irq[i]);
-	xe_device_mem_access_put(gt_to_xe(gt));
 
 	return err;
 }
@@ -606,7 +600,6 @@ static int gt_reset(struct xe_gt *gt)
 
 	xe_gt_sanitize(gt);
 
-	xe_device_mem_access_get(gt_to_xe(gt));
 	err = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
 	if (err)
 		goto err_msg;
@@ -630,7 +623,6 @@ static int gt_reset(struct xe_gt *gt)
 		goto err_out;
 
 	err = xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL);
-	xe_device_mem_access_put(gt_to_xe(gt));
 	XE_WARN_ON(err);
 
 	xe_gt_info(gt, "reset done\n");
@@ -641,7 +633,6 @@ static int gt_reset(struct xe_gt *gt)
 	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
 err_msg:
 	XE_WARN_ON(xe_uc_start(&gt->uc));
-	xe_device_mem_access_put(gt_to_xe(gt));
 err_fail:
 	xe_gt_err(gt, "reset failed (%pe)\n", ERR_PTR(err));
 
@@ -671,13 +662,11 @@ void xe_gt_reset_async(struct xe_gt *gt)
 
 void xe_gt_suspend_prepare(struct xe_gt *gt)
 {
-	xe_device_mem_access_get(gt_to_xe(gt));
 	XE_WARN_ON(xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL));
 
 	xe_uc_stop_prepare(&gt->uc);
 
 	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
-	xe_device_mem_access_put(gt_to_xe(gt));
 }
 
 int xe_gt_suspend(struct xe_gt *gt)
@@ -686,7 +675,6 @@ int xe_gt_suspend(struct xe_gt *gt)
 
 	xe_gt_sanitize(gt);
 
-	xe_device_mem_access_get(gt_to_xe(gt));
 	err = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
 	if (err)
 		goto err_msg;
@@ -696,7 +684,6 @@ int xe_gt_suspend(struct xe_gt *gt)
 		goto err_force_wake;
 
 	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
-	xe_device_mem_access_put(gt_to_xe(gt));
 	xe_gt_info(gt, "suspended\n");
 
 	return 0;
@@ -704,7 +691,6 @@ int xe_gt_suspend(struct xe_gt *gt)
 err_force_wake:
 	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
 err_msg:
-	xe_device_mem_access_put(gt_to_xe(gt));
 	xe_gt_err(gt, "suspend failed (%pe)\n", ERR_PTR(err));
 
 	return err;
@@ -714,7 +700,6 @@ int xe_gt_resume(struct xe_gt *gt)
 {
 	int err;
 
-	xe_device_mem_access_get(gt_to_xe(gt));
 	err = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
 	if (err)
 		goto err_msg;
@@ -724,7 +709,6 @@ int xe_gt_resume(struct xe_gt *gt)
 		goto err_force_wake;
 
 	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
-	xe_device_mem_access_put(gt_to_xe(gt));
 	xe_gt_info(gt, "resumed\n");
 
 	return 0;
@@ -732,7 +716,6 @@ int xe_gt_resume(struct xe_gt *gt)
 err_force_wake:
 	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
 err_msg:
-	xe_device_mem_access_put(gt_to_xe(gt));
 	xe_gt_err(gt, "resume failed (%pe)\n", ERR_PTR(err));
 
 	return err;
diff --git a/drivers/gpu/drm/xe/xe_huc_debugfs.c b/drivers/gpu/drm/xe/xe_huc_debugfs.c
index 4d9f2bb0eee2e..3a888a40188b9 100644
--- a/drivers/gpu/drm/xe/xe_huc_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_huc_debugfs.c
@@ -38,9 +38,7 @@ static int huc_info(struct seq_file *m, void *data)
 	struct drm_printer p = drm_seq_file_printer(m);
 
 	xe_pm_runtime_get(xe);
-	xe_device_mem_access_get(xe);
 	xe_huc_print_info(huc, &p);
-	xe_device_mem_access_put(xe);
 	xe_pm_runtime_put(xe);
 
 	return 0;
diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
index 1ff6bc79e7d44..b253ef863c274 100644
--- a/drivers/gpu/drm/xe/xe_pat.c
+++ b/drivers/gpu/drm/xe/xe_pat.c
@@ -173,7 +173,6 @@ static void xelp_dump(struct xe_gt *gt, struct drm_printer *p)
 	struct xe_device *xe = gt_to_xe(gt);
 	int i, err;
 
-	xe_device_mem_access_get(xe);
 	err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
 	if (err)
 		goto err_fw;
@@ -191,7 +190,6 @@ static void xelp_dump(struct xe_gt *gt, struct drm_printer *p)
 	err = xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
 err_fw:
 	xe_assert(xe, !err);
-	xe_device_mem_access_put(xe);
 }
 
 static const struct xe_pat_ops xelp_pat_ops = {
@@ -204,7 +202,6 @@ static void xehp_dump(struct xe_gt *gt, struct drm_printer *p)
 	struct xe_device *xe = gt_to_xe(gt);
 	int i, err;
 
-	xe_device_mem_access_get(xe);
 	err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
 	if (err)
 		goto err_fw;
@@ -224,7 +221,6 @@ static void xehp_dump(struct xe_gt *gt, struct drm_printer *p)
 	err = xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
 err_fw:
 	xe_assert(xe, !err);
-	xe_device_mem_access_put(xe);
 }
 
 static const struct xe_pat_ops xehp_pat_ops = {
@@ -237,7 +233,6 @@ static void xehpc_dump(struct xe_gt *gt, struct drm_printer *p)
 	struct xe_device *xe = gt_to_xe(gt);
 	int i, err;
 
-	xe_device_mem_access_get(xe);
 	err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
 	if (err)
 		goto err_fw;
@@ -255,7 +250,6 @@ static void xehpc_dump(struct xe_gt *gt, struct drm_printer *p)
 	err = xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
 err_fw:
 	xe_assert(xe, !err);
-	xe_device_mem_access_put(xe);
 }
 
 static const struct xe_pat_ops xehpc_pat_ops = {
@@ -268,7 +262,6 @@ static void xelpg_dump(struct xe_gt *gt, struct drm_printer *p)
 	struct xe_device *xe = gt_to_xe(gt);
 	int i, err;
 
-	xe_device_mem_access_get(xe);
 	err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
 	if (err)
 		goto err_fw;
@@ -291,7 +284,6 @@ static void xelpg_dump(struct xe_gt *gt, struct drm_printer *p)
 	err = xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
 err_fw:
 	xe_assert(xe, !err);
-	xe_device_mem_access_put(xe);
 }
 
 /*
@@ -324,7 +316,6 @@ static void xe2_dump(struct xe_gt *gt, struct drm_printer *p)
 	int i, err;
 	u32 pat;
 
-	xe_device_mem_access_get(xe);
 	err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
 	if (err)
 		goto err_fw;
@@ -369,7 +360,6 @@ static void xe2_dump(struct xe_gt *gt, struct drm_printer *p)
 	err = xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
 err_fw:
 	xe_assert(xe, !err);
-	xe_device_mem_access_put(xe);
 }
 
 static const struct xe_pat_ops xe2_pat_ops = {
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index 16ead90f3cab5..4cfe1f5d2085b 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -277,29 +277,6 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
 	/* Disable access_ongoing asserts and prevent recursive pm calls */
 	xe_pm_write_callback_task(xe, current);
 
-	/*
-	 * The actual xe_device_mem_access_put() is always async underneath, so
-	 * exactly where that is called should makes no difference to us. However
-	 * we still need to be very careful with the locks that this callback
-	 * acquires and the locks that are acquired and held by any callers of
-	 * xe_device_mem_access_get(). We already have the matching annotation
-	 * on that side, but we also need it here. For example lockdep should be
-	 * able to tell us if the following scenario is in theory possible:
-	 *
-	 * CPU0                          | CPU1 (kworker)
-	 * lock(A)                       |
-	 *                               | xe_pm_runtime_suspend()
-	 *                               |      lock(A)
-	 * xe_device_mem_access_get()    |
-	 *
-	 * This will clearly deadlock since rpm core needs to wait for
-	 * xe_pm_runtime_suspend() to complete, but here we are holding lock(A)
-	 * on CPU0 which prevents CPU1 making forward progress.  With the
-	 * annotation here and in xe_device_mem_access_get() lockdep will see
-	 * the potential lock inversion and give us a nice splat.
-	 */
-	lock_map_acquire(&xe_device_mem_access_lockdep_map);
-
 	if (xe->d3cold.allowed) {
 		err = xe_bo_evict_all(xe);
 		if (err)
@@ -319,7 +296,6 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
 	if (xe->d3cold.allowed)
 		xe_display_pm_suspend_late(xe);
 out:
-	lock_map_release(&xe_device_mem_access_lockdep_map);
 	xe_pm_write_callback_task(xe, NULL);
 	return err;
 }
@@ -339,8 +315,6 @@ int xe_pm_runtime_resume(struct xe_device *xe)
 	/* Disable access_ongoing asserts and prevent recursive pm calls */
 	xe_pm_write_callback_task(xe, current);
 
-	lock_map_acquire(&xe_device_mem_access_lockdep_map);
-
 	/*
 	 * It can be possible that xe has allowed d3cold but other pcie devices
 	 * in gfx card soc would have blocked d3cold, therefore card has not
@@ -379,7 +353,6 @@ int xe_pm_runtime_resume(struct xe_device *xe)
 			goto out;
 	}
 out:
-	lock_map_release(&xe_device_mem_access_lockdep_map);
 	xe_pm_write_callback_task(xe, NULL);
 	return err;
 }
diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c
index 9b35673b286c8..86222c80a874b 100644
--- a/drivers/gpu/drm/xe/xe_query.c
+++ b/drivers/gpu/drm/xe/xe_query.c
@@ -147,7 +147,6 @@ query_engine_cycles(struct xe_device *xe,
 	if (!hwe)
 		return -EINVAL;
 
-	xe_device_mem_access_get(xe);
 	xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
 
 	__read_timestamps(gt,
@@ -159,7 +158,6 @@ query_engine_cycles(struct xe_device *xe,
 			  cpu_clock);
 
 	xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL);
-	xe_device_mem_access_put(xe);
 	resp.width = 36;
 
 	/* Only write to the output fields of user query */
@@ -437,9 +435,7 @@ static int query_hwconfig(struct xe_device *xe,
 	if (!hwconfig)
 		return -ENOMEM;
 
-	xe_device_mem_access_get(xe);
 	xe_guc_hwconfig_copy(&gt->uc.guc, hwconfig);
-	xe_device_mem_access_put(xe);
 
 	if (copy_to_user(query_ptr, hwconfig, size)) {
 		kfree(hwconfig);
diff --git a/drivers/gpu/drm/xe/xe_tile.c b/drivers/gpu/drm/xe/xe_tile.c
index 044c20881de7e..74ecb5f39438f 100644
--- a/drivers/gpu/drm/xe/xe_tile.c
+++ b/drivers/gpu/drm/xe/xe_tile.c
@@ -160,23 +160,19 @@ int xe_tile_init_noalloc(struct xe_tile *tile)
 {
 	int err;
 
-	xe_device_mem_access_get(tile_to_xe(tile));
-
 	err = tile_ttm_mgr_init(tile);
 	if (err)
-		goto err_mem_access;
+		return err;
 
 	tile->mem.kernel_bb_pool = xe_sa_bo_manager_init(tile, SZ_1M, 16);
 	if (IS_ERR(tile->mem.kernel_bb_pool))
-		err = PTR_ERR(tile->mem.kernel_bb_pool);
+		return PTR_ERR(tile->mem.kernel_bb_pool);
 
 	xe_wa_apply_tile_workarounds(tile);
 
 	xe_tile_sysfs_init(tile);
 
-err_mem_access:
-	xe_device_mem_access_put(tile_to_xe(tile));
-	return err;
+	return 0;
 }
 
 void xe_tile_migrate_wait(struct xe_tile *tile)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 1ca917b8315c2..92223e6ca687c 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1279,9 +1279,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 
 	vm->pt_ops = &xelp_pt_ops;
 
-	if (!(flags & XE_VM_FLAG_MIGRATION))
-		xe_device_mem_access_get(xe);
-
 	vm_resv_obj = drm_gpuvm_resv_object_alloc(&xe->drm);
 	if (!vm_resv_obj) {
 		err = -ENOMEM;
@@ -1389,8 +1386,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	for_each_tile(tile, xe, id)
 		xe_range_fence_tree_fini(&vm->rftree[id]);
 	kfree(vm);
-	if (!(flags & XE_VM_FLAG_MIGRATION))
-		xe_device_mem_access_put(xe);
 	return ERR_PTR(err);
 }
 
@@ -1511,8 +1506,6 @@ static void vm_destroy_work_func(struct work_struct *w)
 	xe_assert(xe, !vm->size);
 
 	if (!(vm->flags & XE_VM_FLAG_MIGRATION)) {
-		xe_device_mem_access_put(xe);
-
 		if (xe->info.has_asid && vm->usm.asid) {
 			mutex_lock(&xe->usm.lock);
 			lookup = xa_erase(&xe->usm.asid_to_vm, vm->usm.asid);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* RE: [RFC 07/20] drm/xe: Runtime PM wake on every IOCTL
  2023-12-28  2:12 ` [RFC 07/20] drm/xe: Runtime PM wake on every IOCTL Rodrigo Vivi
@ 2024-01-02 11:30   ` Gupta, Anshuman
  2024-01-09 17:57     ` Rodrigo Vivi
  0 siblings, 1 reply; 46+ messages in thread
From: Gupta, Anshuman @ 2024-01-02 11:30 UTC (permalink / raw)
  To: Vivi, Rodrigo, intel-xe@lists.freedesktop.org; +Cc: Deak, Imre, Vivi, Rodrigo



> -----Original Message-----
> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Rodrigo
> Vivi
> Sent: Thursday, December 28, 2023 7:42 AM
> To: intel-xe@lists.freedesktop.org
> Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
> Subject: [RFC 07/20] drm/xe: Runtime PM wake on every IOCTL
> 
> Let's ensure our PCI device is awakened on every IOCTL entry.
> Let's increase the runtime_pm protection and start moving that to the outer
> bounds.
IMO we need to decouple DC9 from runtime suspend, since the previous patch "[RFC,05/20] drm/xe: Prepare display for D3Cold"
coupled them. Let DC9 be entered whenever all displays are off. Otherwise blocking runtime PM on every ioctl will also
block DC9 unnecessarily.
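Roughly something like this sketch, where the DC9 decision keys off display state instead of the runtime PM refcount (the helper and field names here are made up for illustration):

	/* hypothetical: let display state, not the rpm refcount, gate DC9 */
	static bool xe_display_can_enter_dc9(struct xe_device *xe)
	{
		/* all pipes and ports off -> nothing blocks DC9 */
		return !xe->display.any_output_active;	/* illustrative field */
	}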
Thanks,
Anshuman Gupta.
> 
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device.c | 32 ++++++++++++++++++++++++++++++--
>  drivers/gpu/drm/xe/xe_pm.c     | 15 +++++++++++++++
>  drivers/gpu/drm/xe/xe_pm.h     |  1 +
>  3 files changed, 46 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index dc3721bb37b1e..ee9b6612eec43 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -140,15 +140,43 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
>  			  DRM_RENDER_ALLOW),
>  };
> 
> +static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> +{
> +	struct drm_file *file_priv = file->private_data;
> +	struct xe_device *xe = to_xe_device(file_priv->minor->dev);
> +	long ret;
> +
> +	ret = xe_pm_runtime_get_sync(xe);
> +	if (ret >= 0)
> +		ret = drm_ioctl(file, cmd, arg);
> +	xe_pm_runtime_put(xe);
> +
> +	return ret;
> +}
> +
> +static long xe_drm_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> +{
> +	struct drm_file *file_priv = file->private_data;
> +	struct xe_device *xe = to_xe_device(file_priv->minor->dev);
> +	long ret;
> +
> +	ret = xe_pm_runtime_get_sync(xe);
> +	if (ret >= 0)
> +		ret = drm_compat_ioctl(file, cmd, arg);
> +	xe_pm_runtime_put(xe);
> +
> +	return ret;
> +}
> +
>  static const struct file_operations xe_driver_fops = {
>  	.owner = THIS_MODULE,
>  	.open = drm_open,
>  	.release = drm_release_noglobal,
> -	.unlocked_ioctl = drm_ioctl,
> +	.unlocked_ioctl = xe_drm_ioctl,
>  	.mmap = drm_gem_mmap,
>  	.poll = drm_poll,
>  	.read = drm_read,
> -	.compat_ioctl = drm_compat_ioctl,
> +	.compat_ioctl = xe_drm_compat_ioctl,
>  	.llseek = noop_llseek,
>  #ifdef CONFIG_PROC_FS
>  	.show_fdinfo = drm_show_fdinfo,
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index 45114e4e76a5a..f599707413f18 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -411,6 +411,21 @@ void xe_pm_runtime_put(struct xe_device *xe)
>  	pm_runtime_put(xe->drm.dev);
>  }
> 
> +/**
> + * xe_pm_runtime_get_sync - Get a runtime_pm reference and resume synchronously
> + * @xe: xe device instance
> + *
> + * Returns: Any number greater than or equal to 0 for success, negative error
> + * code otherwise.
> + */
> +int xe_pm_runtime_get_sync(struct xe_device *xe)
> +{
> +        if (WARN_ON(xe_pm_read_callback_task(xe) == current))
> +                return -ELOOP;
> +
> +        return pm_runtime_get_sync(xe->drm.dev);
> +}
> +
>  /**
>   * xe_pm_runtime_get_if_active - Get a runtime_pm reference if device active
>   * @xe: xe device instance
> diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
> index 67a9bf3dd379b..d0e6011a80688 100644
> --- a/drivers/gpu/drm/xe/xe_pm.h
> +++ b/drivers/gpu/drm/xe/xe_pm.h
> @@ -26,6 +26,7 @@ bool xe_pm_runtime_suspended(struct xe_device *xe);
>  int xe_pm_runtime_suspend(struct xe_device *xe);
>  int xe_pm_runtime_resume(struct xe_device *xe);
>  void xe_pm_runtime_get(struct xe_device *xe);
> +int xe_pm_runtime_get_sync(struct xe_device *xe);
>  void xe_pm_runtime_put(struct xe_device *xe);
>  int xe_pm_runtime_get_if_active(struct xe_device *xe);
>  bool xe_pm_runtime_get_if_in_use(struct xe_device *xe);
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* ✓ CI.Patch_applied: success for First attempt to kill mem_access
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (19 preceding siblings ...)
  2023-12-28  2:12 ` [RFC 20/20] drm/xe: Mega Kill of mem_access Rodrigo Vivi
@ 2024-01-04  5:40 ` Patchwork
  2024-01-04  5:40 ` ✗ CI.checkpatch: warning " Patchwork
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 46+ messages in thread
From: Patchwork @ 2024-01-04  5:40 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

== Series Details ==

Series: First attempt to kill mem_access
URL   : https://patchwork.freedesktop.org/series/128044/
State : success

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: 7b3b98d03 drm/xe/xe2: Add workaround 16020183090
=== git am output follows ===
Applying: drm/xe: Document Xe PM component
Applying: drm/xe: Fix display runtime_pm handling
Applying: drm/xe: Create a xe_pm_runtime_resume_and_get variant for display
Applying: drm/xe: Convert xe_pm_runtime_{get, put} to void and protect from recursion
Applying: drm/xe: Prepare display for D3Cold
Applying: drm/xe: Convert mem_access assertion towards the runtime_pm state
Applying: drm/xe: Runtime PM wake on every IOCTL
Applying: drm/xe: Runtime PM wake on every exec
Applying: drm/xe: Runtime PM wake on every sysfs call
Applying: drm/xe: Sort some xe_pm_runtime related functions
Applying: drm/xe: Ensure device is awake before removing it
Applying: drm/xe: Remove mem_access from guc_pc calls
Applying: drm/xe: Runtime PM wake on every debugfs call
Applying: drm/xe: Replace dma_buf mem_access per direct xe_pm_runtime calls
Applying: drm/xe: Allow GuC CT fast path and worker regardless of runtime_pm
Applying: drm/xe: Remove mem_access calls from migration
Applying: drm/xe: Removing extra mem_access protection from runtime pm
Applying: drm/xe: Convert hwmon from mem_access to xe_pm_runtime calls
Applying: drm/xe: Remove unused runtime pm helper
Applying: drm/xe: Mega Kill of mem_access



^ permalink raw reply	[flat|nested] 46+ messages in thread

* ✗ CI.checkpatch: warning for First attempt to kill mem_access
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (20 preceding siblings ...)
  2024-01-04  5:40 ` ✓ CI.Patch_applied: success for First attempt to kill mem_access Patchwork
@ 2024-01-04  5:40 ` Patchwork
  2024-01-04  5:41 ` ✗ CI.KUnit: failure " Patchwork
  2024-01-10  5:21 ` [RFC 00/20] " Matthew Brost
  23 siblings, 0 replies; 46+ messages in thread
From: Patchwork @ 2024-01-04  5:40 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

== Series Details ==

Series: First attempt to kill mem_access
URL   : https://patchwork.freedesktop.org/series/128044/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
6030b24c1386b00de8187b5fb987e283a57b372a
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit fd84f8497cabd367f0c544cd3cfece8a40b7ebbd
Author: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date:   Wed Dec 27 21:12:32 2023 -0500

    drm/xe: Mega Kill of mem_access
    
    All of these remaining cases should already be protected
    by the outer bound calls of runtime_pm
    
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
+ /mt/dim checkpatch 7b3b98d034784b125bad7aca46f9e7a3cfcde45a drm-intel
1d470b54a drm/xe: Document Xe PM component
75358e29c drm/xe: Fix display runtime_pm handling
-:44: WARNING:REPEATED_WORD: Possible repeated word: 'the'
#44: FILE: drivers/gpu/drm/xe/xe_pm.c:407:
+ * Returns: True if device is awake and the the reference was taken, false otherwise.

-:50: ERROR:CODE_INDENT: code indent should use tabs where possible
#50: FILE: drivers/gpu/drm/xe/xe_pm.c:413:
+                pm_runtime_get_noresume(xe->drm.dev);$

-:50: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#50: FILE: drivers/gpu/drm/xe/xe_pm.c:413:
+                pm_runtime_get_noresume(xe->drm.dev);$

-:54: ERROR:CODE_INDENT: code indent should use tabs where possible
#54: FILE: drivers/gpu/drm/xe/xe_pm.c:417:
+        return pm_runtime_get_if_in_use(xe->drm.dev) >= 0;$

-:54: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#54: FILE: drivers/gpu/drm/xe/xe_pm.c:417:
+        return pm_runtime_get_if_in_use(xe->drm.dev) >= 0;$

total: 2 errors, 3 warnings, 0 checks, 38 lines checked
769f946b3 drm/xe: Create a xe_pm_runtime_resume_and_get variant for display
-:49: WARNING:REPEATED_WORD: Possible repeated word: 'the'
#49: FILE: drivers/gpu/drm/xe/xe_pm.c:424:
+ * Returns: True if device is awake and the the reference was taken, false otherwise.

-:55: ERROR:CODE_INDENT: code indent should use tabs where possible
#55: FILE: drivers/gpu/drm/xe/xe_pm.c:430:
+                pm_runtime_get_noresume(xe->drm.dev);$

-:55: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#55: FILE: drivers/gpu/drm/xe/xe_pm.c:430:
+                pm_runtime_get_noresume(xe->drm.dev);$

-:59: ERROR:CODE_INDENT: code indent should use tabs where possible
#59: FILE: drivers/gpu/drm/xe/xe_pm.c:434:
+        return pm_runtime_resume_and_get(xe->drm.dev) >= 0;$

-:59: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#59: FILE: drivers/gpu/drm/xe/xe_pm.c:434:
+        return pm_runtime_resume_and_get(xe->drm.dev) >= 0;$

total: 2 errors, 3 warnings, 0 checks, 42 lines checked
4947ac546 drm/xe: Convert xe_pm_runtime_{get, put} to void and protect from recursion
-:45: ERROR:CODE_INDENT: code indent should use tabs where possible
#45: FILE: drivers/gpu/drm/xe/xe_pm.c:375:
+                return;$

-:45: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#45: FILE: drivers/gpu/drm/xe/xe_pm.c:375:
+                return;$

total: 1 errors, 1 warnings, 0 checks, 45 lines checked
27921a1e2 drm/xe: Prepare display for D3Cold
a407c85be drm/xe: Convert mem_access assertion towards the runtime_pm state
58a9c6ab5 drm/xe: Runtime PM wake on every IOCTL
-:79: ERROR:CODE_INDENT: code indent should use tabs where possible
#79: FILE: drivers/gpu/drm/xe/xe_pm.c:423:
+        if (WARN_ON(xe_pm_read_callback_task(xe) == current))$

-:79: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#79: FILE: drivers/gpu/drm/xe/xe_pm.c:423:
+        if (WARN_ON(xe_pm_read_callback_task(xe) == current))$

-:80: ERROR:CODE_INDENT: code indent should use tabs where possible
#80: FILE: drivers/gpu/drm/xe/xe_pm.c:424:
+                return -ELOOP;$

-:80: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#80: FILE: drivers/gpu/drm/xe/xe_pm.c:424:
+                return -ELOOP;$

-:82: ERROR:CODE_INDENT: code indent should use tabs where possible
#82: FILE: drivers/gpu/drm/xe/xe_pm.c:426:
+        return pm_runtime_get_sync(xe->drm.dev);$

-:82: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#82: FILE: drivers/gpu/drm/xe/xe_pm.c:426:
+        return pm_runtime_get_sync(xe->drm.dev);$

total: 3 errors, 3 warnings, 0 checks, 73 lines checked
73a58e666 drm/xe: Runtime PM wake on every exec
4fabc06de drm/xe: Runtime PM wake on every sysfs call
-:278: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#278: FILE: drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c:578:
+}
+static inline struct xe_device *to_xe_device(const struct drm_device *dev)

-:283: CHECK:LINE_SPACING: Please don't use multiple blank lines
#283: FILE: drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c:583:
+
+

total: 0 errors, 0 warnings, 2 checks, 299 lines checked
6ac390e79 drm/xe: Sort some xe_pm_runtime related functions
-:22: WARNING:REPEATED_WORD: Possible repeated word: 'the'
#22: FILE: drivers/gpu/drm/xe/xe_pm.c:394:
+ * Returns: True if device is awake and the the reference was taken, false otherwise.

-:28: ERROR:CODE_INDENT: code indent should use tabs where possible
#28: FILE: drivers/gpu/drm/xe/xe_pm.c:400:
+                pm_runtime_get_noresume(xe->drm.dev);$

-:28: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#28: FILE: drivers/gpu/drm/xe/xe_pm.c:400:
+                pm_runtime_get_noresume(xe->drm.dev);$

-:32: ERROR:CODE_INDENT: code indent should use tabs where possible
#32: FILE: drivers/gpu/drm/xe/xe_pm.c:404:
+        return pm_runtime_resume_and_get(xe->drm.dev) >= 0;$

-:32: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#32: FILE: drivers/gpu/drm/xe/xe_pm.c:404:
+        return pm_runtime_resume_and_get(xe->drm.dev) >= 0;$

total: 2 errors, 3 warnings, 0 checks, 77 lines checked
28388d6b9 drm/xe: Ensure device is awake before removing it
af3f933c1 drm/xe: Remove mem_access from guc_pc calls
d4f2bc787 drm/xe: Runtime PM wake on every debugfs call
4eb02fa60 drm/xe: Replace dma_buf mem_access per direct xe_pm_runtime calls
e9d147f41 drm/xe: Allow GuC CT fast path and worker regardless of runtime_pm
9272531e3 drm/xe: Remove mem_access calls from migration
7a45910d6 drm/xe: Removing extra mem_access protection from runtime pm
cec35989f drm/xe: Convert hwmon from mem_access to xe_pm_runtime calls
ddcf5db86 drm/xe: Remove unused runtime pm helper
fd84f8497 drm/xe: Mega Kill of mem_access



^ permalink raw reply	[flat|nested] 46+ messages in thread

* ✗ CI.KUnit: failure for First attempt to kill mem_access
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (21 preceding siblings ...)
  2024-01-04  5:40 ` ✗ CI.checkpatch: warning " Patchwork
@ 2024-01-04  5:41 ` Patchwork
  2024-01-10  5:21 ` [RFC 00/20] " Matthew Brost
  23 siblings, 0 replies; 46+ messages in thread
From: Patchwork @ 2024-01-04  5:41 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

== Series Details ==

Series: First attempt to kill mem_access
URL   : https://patchwork.freedesktop.org/series/128044/
State : failure

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:In file included from ../include/uapi/linux/posix_types.h:5,
                 from ../include/uapi/linux/types.h:14,
                 from ../include/linux/types.h:6,
                 from ../include/linux/kasan-checks.h:5,
                 from ../include/asm-generic/rwonce.h:26,
                 from ./arch/x86/include/generated/asm/rwonce.h:1,
                 from ../include/linux/compiler.h:251,
                 from ../include/linux/array_size.h:5,
                 from ../include/linux/kernel.h:16,
                 from ../include/linux/interrupt.h:6,
                 from ../include/drm/drm_util.h:35,
                 from ../drivers/gpu/drm/xe/xe_device.h:12,
                 from ../drivers/gpu/drm/xe/xe_device.c:6:
../drivers/gpu/drm/xe/xe_device.c: In function ‘xe_drm_compat_ioctl’:
../include/linux/stddef.h:8:14: error: called object is not a function or function pointer
    8 | #define NULL ((void *)0)
      |              ^
../include/drm/drm_ioctl.h:165:26: note: in expansion of macro ‘NULL’
  165 | #define drm_compat_ioctl NULL
      |                          ^~~~
../drivers/gpu/drm/xe/xe_device.c:159:9: note: in expansion of macro ‘drm_compat_ioctl’
  159 |   ret = drm_compat_ioctl(file, cmd, arg);
      |         ^~~~~~~~~~~~~~~~
make[7]: *** [../scripts/Makefile.build:243: drivers/gpu/drm/xe/xe_device.o] Error 1
make[7]: *** Waiting for unfinished jobs....
make[6]: *** [../scripts/Makefile.build:480: drivers/gpu/drm/xe] Error 2
make[6]: *** Waiting for unfinished jobs....
make[5]: *** [../scripts/Makefile.build:480: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:480: drivers/gpu] Error 2
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [../scripts/Makefile.build:480: drivers] Error 2
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [/kernel/Makefile:1911: .] Error 2
make[1]: *** [/kernel/Makefile:234: __sub-make] Error 2
make: *** [Makefile:234: __sub-make] Error 2

[05:40:37] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[05:40:41] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
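
The build break above comes from drm_ioctl.h defining drm_compat_ioctl as NULL when CONFIG_COMPAT is disabled, so the new wrapper cannot call it unconditionally. A minimal sketch of one possible fix, keeping the wrapper from the patch otherwise unchanged:

	#ifdef CONFIG_COMPAT
	static long xe_drm_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
	{
		struct drm_file *file_priv = file->private_data;
		struct xe_device *xe = to_xe_device(file_priv->minor->dev);
		long ret;

		ret = xe_pm_runtime_get_sync(xe);
		if (ret >= 0)
			ret = drm_compat_ioctl(file, cmd, arg);
		xe_pm_runtime_put(xe);

		return ret;
	}
	#else
	/* CONFIG_COMPAT=n: fall back to the core's NULL definition */
	#define xe_drm_compat_ioctl NULL
	#endif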



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 06/20] drm/xe: Convert mem_access assertion towards the runtime_pm state
  2023-12-28  2:12 ` [RFC 06/20] drm/xe: Convert mem_access assertion towards the runtime_pm state Rodrigo Vivi
@ 2024-01-09 11:06   ` Matthew Auld
  2024-01-09 17:50     ` Rodrigo Vivi
  0 siblings, 1 reply; 46+ messages in thread
From: Matthew Auld @ 2024-01-09 11:06 UTC (permalink / raw)
  To: Rodrigo Vivi, intel-xe

On 28/12/2023 02:12, Rodrigo Vivi wrote:
> The mem_access helpers are going away and getting replaced by
> direct calls of the xe_pm_runtime_{get,put} functions. However, an
> assertion with a warning splat is desired when we hit the worst
> case of a memory access with the device really in the 'suspended'
> state.
> 
> Also, this needs to be the first step. Otherwise, the upcoming
> conversion would be really noisy with warn splats of missing mem_access
> gets.
> 
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_device.c | 13 ++++++++++++-
>   drivers/gpu/drm/xe/xe_pm.c     | 16 ++++++++++++++++
>   drivers/gpu/drm/xe/xe_pm.h     |  1 +
>   3 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 86867d42d5329..dc3721bb37b1e 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -631,9 +631,20 @@ bool xe_device_mem_access_ongoing(struct xe_device *xe)
>   	return atomic_read(&xe->mem_access.ref);
>   }
>   
> +/**
> + * xe_device_assert_mem_access - Inspect the current runtime_pm state.
> + * @xe: xe device instance
> + *
> + * To be used before any kind of memory access. It will splat a debug warning
> + * if the device is currently sleeping. But it doesn't guarantee in any way
> + * that the device is going to continue awake. Xe PM runtime get and put
> + * functions might be added to the outer bound of the memory access, while
> + * this check is intended for inner usage to splat some warning if the worst
> + * case has just happened.
> + */
>   void xe_device_assert_mem_access(struct xe_device *xe)
>   {
> -	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
> +	XE_WARN_ON(xe_pm_runtime_suspended(xe));
>   }
>   
>   bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe)
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index cabed94a21873..45114e4e76a5a 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -246,6 +246,22 @@ struct task_struct *xe_pm_read_callback_task(struct xe_device *xe)
>   	return READ_ONCE(xe->pm_callback_task);
>   }
>   
> +/**
> + * xe_pm_runtime_suspended - Inspect the current runtime_pm state.
> + * @xe: xe device instance
> + *
> + * This does not provide any guarantee that the device is going to continue
> + * suspended as it might be racing with the runtime state transitions.
> + * It can be used only as a non-reliable assertion, to ensure that we are not in
> + * the sleep state while trying to access some memory for instance.
> + *
> + * Returns true if PCI device is suspended, false otherwise.
> + */
> +bool xe_pm_runtime_suspended(struct xe_device *xe)
> +{
> +	return pm_runtime_suspended(xe->drm.dev);

Would it not be better to check for active instead? That way we can
check for !active above and cast a bigger net, with the SUSPENDING and
RESUMING states also being invalid, i.e. another task is about to
suspend or hasn't fully resumed yet. We might also need to check the
callback task though.
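
A rough sketch of that, assuming pm_runtime_active() is the right primitive and reusing the callback-task helper from earlier in the series:

	static bool xe_pm_runtime_resumed(struct xe_device *xe)
	{
		/* RPM_SUSPENDING and RPM_RESUMING also count as unsafe here */
		return pm_runtime_active(xe->drm.dev) ||
		       xe_pm_read_callback_task(xe) == current;
	}

so the assert above becomes XE_WARN_ON(!xe_pm_runtime_resumed(xe)).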

> +}
> +
>   /**
>    * xe_pm_runtime_suspend - Prepare our device for D3hot/D3Cold
>    * @xe: xe device instance
> diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
> index 069f41c61505b..67a9bf3dd379b 100644
> --- a/drivers/gpu/drm/xe/xe_pm.h
> +++ b/drivers/gpu/drm/xe/xe_pm.h
> @@ -22,6 +22,7 @@ int xe_pm_resume(struct xe_device *xe);
>   
>   void xe_pm_init(struct xe_device *xe);
>   void xe_pm_runtime_fini(struct xe_device *xe);
> +bool xe_pm_runtime_suspended(struct xe_device *xe);
>   int xe_pm_runtime_suspend(struct xe_device *xe);
>   int xe_pm_runtime_resume(struct xe_device *xe);
>   void xe_pm_runtime_get(struct xe_device *xe);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 08/20] drm/xe: Runtime PM wake on every exec
  2023-12-28  2:12 ` [RFC 08/20] drm/xe: Runtime PM wake on every exec Rodrigo Vivi
@ 2024-01-09 11:24   ` Matthew Auld
  2024-01-09 17:41     ` Rodrigo Vivi
  0 siblings, 1 reply; 46+ messages in thread
From: Matthew Auld @ 2024-01-09 11:24 UTC (permalink / raw)
  To: Rodrigo Vivi, intel-xe

On 28/12/2023 02:12, Rodrigo Vivi wrote:
> Let's ensure our PCI device stays awake for every GT execution,
> through to the end of the execution.
> Let's increase the runtime_pm protection and start moving
> that to the outer bounds.
> 
> Let's also remove the unnecessary mem_access get/put.
> 
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_sched_job.c | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
> index 01106a1156ad8..0b30ec77fc5ad 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job.c
> +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> @@ -15,6 +15,7 @@
>   #include "xe_hw_fence.h"
>   #include "xe_lrc.h"
>   #include "xe_macros.h"
> +#include "xe_pm.h"
>   #include "xe_trace.h"
>   #include "xe_vm.h"
>   
> @@ -67,6 +68,8 @@ static void job_free(struct xe_sched_job *job)
>   	struct xe_exec_queue *q = job->q;
>   	bool is_migration = xe_sched_job_is_migration(q);
>   
> +	xe_pm_runtime_put(gt_to_xe(q->gt));
> +
>   	kmem_cache_free(xe_exec_queue_is_parallel(job->q) || is_migration ?
>   			xe_sched_job_parallel_slab : xe_sched_job_slab, job);
>   }
> @@ -86,6 +89,8 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
>   	int i, j;
>   	u32 width;
>   
> +	xe_pm_runtime_get(gt_to_xe(q->gt));
> +

This seems way too deep in the call chain. If this actually wakes up the 
device we will end up with all of the same d3cold deadlock issues. Like 
here we are for sure holding stuff like dma-resv, but the rpm callbacks 
also want to grab it. IMO this needs to be something like 
runtime_get_if_active(), with the upper layers already ensuring device 
is awake (like ioctl), so here we are just keeping it awake until the 
job is done. Or maybe this is how it is by the end of the series?
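
i.e. something along these lines, as a sketch only (the error handling is illustrative, not a proposal):

	/* upper layers (e.g. the exec ioctl) must have already resumed the device */
	if (xe_pm_runtime_get_if_active(gt_to_xe(q->gt)) <= 0)
		return ERR_PTR(-ECANCELED);	/* illustrative error path */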

>   	/* only a kernel context can submit a vm-less job */
>   	XE_WARN_ON(!q->vm && !(q->flags & EXEC_QUEUE_FLAG_KERNEL));
>   
> @@ -155,9 +160,6 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
>   	for (i = 0; i < width; ++i)
>   		job->batch_addr[i] = batch_addr[i];
>   
> -	/* All other jobs require a VM to be open which has a ref */
> -	if (unlikely(q->flags & EXEC_QUEUE_FLAG_KERNEL))
> -		xe_device_mem_access_get(job_to_xe(job));
>   	xe_device_assert_mem_access(job_to_xe(job));
>   
>   	trace_xe_sched_job_create(job);
> @@ -189,8 +191,6 @@ void xe_sched_job_destroy(struct kref *ref)
>   	struct xe_sched_job *job =
>   		container_of(ref, struct xe_sched_job, refcount);
>   
> -	if (unlikely(job->q->flags & EXEC_QUEUE_FLAG_KERNEL))
> -		xe_device_mem_access_put(job_to_xe(job));
>   	xe_exec_queue_put(job->q);
>   	dma_fence_put(job->fence);
>   	drm_sched_job_cleanup(&job->drm);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 10/20] drm/xe: Sort some xe_pm_runtime related functions
  2023-12-28  2:12 ` [RFC 10/20] drm/xe: Sort some xe_pm_runtime related functions Rodrigo Vivi
@ 2024-01-09 11:26   ` Matthew Auld
  0 siblings, 0 replies; 46+ messages in thread
From: Matthew Auld @ 2024-01-09 11:26 UTC (permalink / raw)
  To: Rodrigo Vivi, intel-xe

On 28/12/2023 02:12, Rodrigo Vivi wrote:
> No functional change. Just organizing the file a bit better
> 
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_pm.c | 42 +++++++++++++++++++-------------------
>   drivers/gpu/drm/xe/xe_pm.h |  4 ++--
>   2 files changed, 23 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index f599707413f18..3594e707606ce 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -387,6 +387,23 @@ int xe_pm_runtime_resume(struct xe_device *xe)
>   	return err;
>   }
>   
> +/**
> + * xe_pm_runtime_resume_and_get - Resume, then get a runtime_pm ref if awake.
> + * @xe: xe device instance
> + *
> + * Returns: True if device is awake and the the reference was taken, false otherwise.
> + */
> +bool xe_pm_runtime_resume_and_get(struct xe_device *xe)
> +{
> +	if (xe_pm_read_callback_task(xe) == current) {
> +		/* The device is awake, grab the ref and move on */
> +                pm_runtime_get_noresume(xe->drm.dev);
> +		return true;
> +	}
> +
> +        return pm_runtime_resume_and_get(xe->drm.dev) >= 0;

Nit: Formatting looks off here.

> +}
> +
>   /**
>    * xe_pm_runtime_get - Get a runtime_pm reference and resume synchronously
>    * @xe: xe device instance
> @@ -401,16 +418,6 @@ void xe_pm_runtime_get(struct xe_device *xe)
>   	pm_runtime_resume(xe->drm.dev);
>   }
>   
> -/**
> - * xe_pm_runtime_put - Put the runtime_pm reference back and mark as idle
> - * @xe: xe device instance
> - */
> -void xe_pm_runtime_put(struct xe_device *xe)
> -{
> -	pm_runtime_mark_last_busy(xe->drm.dev);
> -	pm_runtime_put(xe->drm.dev);
> -}
> -
>   /**
>    * xe_pm_runtime_get_sync - Get a runtime_pm reference and resume synchronously
>    * @xe: xe device instance
> @@ -456,20 +463,13 @@ bool xe_pm_runtime_get_if_in_use(struct xe_device *xe)
>   }
>   
>   /**
> - * xe_pm_runtime_resume_and_get - Resume, then get a runtime_pm ref if awake.
> + * xe_pm_runtime_put - Put the runtime_pm reference back and mark as idle
>    * @xe: xe device instance
> - *
> - * Returns: True if device is awake and the the reference was taken, false otherwise.
>    */
> -bool xe_pm_runtime_resume_and_get(struct xe_device *xe)
> +void xe_pm_runtime_put(struct xe_device *xe)
>   {
> -	if (xe_pm_read_callback_task(xe) == current) {
> -		/* The device is awake, grab the ref and move on */
> -                pm_runtime_get_noresume(xe->drm.dev);
> -		return true;
> -	}
> -
> -        return pm_runtime_resume_and_get(xe->drm.dev) >= 0;
> +	pm_runtime_mark_last_busy(xe->drm.dev);
> +	pm_runtime_put(xe->drm.dev);
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
> index d0e6011a80688..fc82a1466453b 100644
> --- a/drivers/gpu/drm/xe/xe_pm.h
> +++ b/drivers/gpu/drm/xe/xe_pm.h
> @@ -25,12 +25,12 @@ void xe_pm_runtime_fini(struct xe_device *xe);
>   bool xe_pm_runtime_suspended(struct xe_device *xe);
>   int xe_pm_runtime_suspend(struct xe_device *xe);
>   int xe_pm_runtime_resume(struct xe_device *xe);
> +bool xe_pm_runtime_resume_and_get(struct xe_device *xe);
>   void xe_pm_runtime_get(struct xe_device *xe);
>   int xe_pm_runtime_get_sync(struct xe_device *xe);
> -void xe_pm_runtime_put(struct xe_device *xe);
>   int xe_pm_runtime_get_if_active(struct xe_device *xe);
>   bool xe_pm_runtime_get_if_in_use(struct xe_device *xe);
> -bool xe_pm_runtime_resume_and_get(struct xe_device *xe);
> +void xe_pm_runtime_put(struct xe_device *xe);
>   void xe_pm_assert_unbounded_bridge(struct xe_device *xe);
>   int xe_pm_set_vram_threshold(struct xe_device *xe, u32 threshold);
>   void xe_pm_d3cold_allowed_toggle(struct xe_device *xe);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 20/20] drm/xe: Mega Kill of mem_access
  2023-12-28  2:12 ` [RFC 20/20] drm/xe: Mega Kill of mem_access Rodrigo Vivi
@ 2024-01-09 11:41   ` Matthew Auld
  2024-01-09 17:39     ` Rodrigo Vivi
  0 siblings, 1 reply; 46+ messages in thread
From: Matthew Auld @ 2024-01-09 11:41 UTC (permalink / raw)
  To: Rodrigo Vivi, intel-xe

On 28/12/2023 02:12, Rodrigo Vivi wrote:
> All of these remaining cases should already be protected
> by the outer bound calls of runtime_pm
> 
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   drivers/gpu/drm/xe/display/xe_fb_pin.c |  7 +--
>   drivers/gpu/drm/xe/tests/xe_bo.c       |  8 ----
>   drivers/gpu/drm/xe/tests/xe_mocs.c     |  4 --
>   drivers/gpu/drm/xe/xe_bo.c             |  5 ---
>   drivers/gpu/drm/xe/xe_device.c         | 59 --------------------------
>   drivers/gpu/drm/xe/xe_device.h         |  7 ---
>   drivers/gpu/drm/xe/xe_device_types.h   |  9 ----
>   drivers/gpu/drm/xe/xe_ggtt.c           |  6 ---
>   drivers/gpu/drm/xe/xe_gsc.c            |  3 --
>   drivers/gpu/drm/xe/xe_gt.c             | 17 --------
>   drivers/gpu/drm/xe/xe_huc_debugfs.c    |  2 -
>   drivers/gpu/drm/xe/xe_pat.c            | 10 -----
>   drivers/gpu/drm/xe/xe_pm.c             | 27 ------------
>   drivers/gpu/drm/xe/xe_query.c          |  4 --
>   drivers/gpu/drm/xe/xe_tile.c           | 10 ++---
>   drivers/gpu/drm/xe/xe_vm.c             |  7 ---
>   16 files changed, 5 insertions(+), 180 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> index 722c84a566073..077294ec50ece 100644
> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> @@ -190,10 +190,9 @@ static int __xe_pin_fb_vma_ggtt(struct intel_framebuffer *fb,
>   	/* TODO: Consider sharing framebuffer mapping?
>   	 * embed i915_vma inside intel_framebuffer
>   	 */
> -	xe_device_mem_access_get(tile_to_xe(ggtt->tile));
>   	ret = mutex_lock_interruptible(&ggtt->lock);
>   	if (ret)
> -		goto out;
> +		return ret;
>   
>   	align = XE_PAGE_SIZE;
>   	if (xe_bo_is_vram(bo) && ggtt->flags & XE_GGTT_FLAGS_64K)
> @@ -241,8 +240,6 @@ static int __xe_pin_fb_vma_ggtt(struct intel_framebuffer *fb,
>   	xe_ggtt_invalidate(ggtt);
>   out_unlock:
>   	mutex_unlock(&ggtt->lock);
> -out:
> -	xe_device_mem_access_put(tile_to_xe(ggtt->tile));
>   	return ret;
>   }
>   
> @@ -381,4 +378,4 @@ struct i915_address_space *intel_dpt_create(struct intel_framebuffer *fb)
>   void intel_dpt_destroy(struct i915_address_space *vm)
>   {
>   	return;
> -}
> \ No newline at end of file
> +}
> diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
> index 412b2e7ce40cb..97b10e597f0ad 100644
> --- a/drivers/gpu/drm/xe/tests/xe_bo.c
> +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
> @@ -164,8 +164,6 @@ static int ccs_test_run_device(struct xe_device *xe)
>   		return 0;
>   	}
>   
> -	xe_device_mem_access_get(xe);
> -
>   	for_each_tile(tile, xe, id) {
>   		/* For igfx run only for primary tile */
>   		if (!IS_DGFX(xe) && id > 0)
> @@ -173,8 +171,6 @@ static int ccs_test_run_device(struct xe_device *xe)
>   		ccs_test_run_tile(xe, tile, test);
>   	}
>   
> -	xe_device_mem_access_put(xe);
> -
>   	return 0;
>   }
>   
> @@ -336,13 +332,9 @@ static int evict_test_run_device(struct xe_device *xe)
>   		return 0;
>   	}
>   
> -	xe_device_mem_access_get(xe);
> -
>   	for_each_tile(tile, xe, id)
>   		evict_test_run_tile(xe, tile, test);
>   
> -	xe_device_mem_access_put(xe);
> -
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/xe/tests/xe_mocs.c b/drivers/gpu/drm/xe/tests/xe_mocs.c
> index 7dd34f94e8094..a12e7e2bb5861 100644
> --- a/drivers/gpu/drm/xe/tests/xe_mocs.c
> +++ b/drivers/gpu/drm/xe/tests/xe_mocs.c
> @@ -45,7 +45,6 @@ static void read_l3cc_table(struct xe_gt *gt,
>   
>   	struct kunit *test = xe_cur_kunit();
>   
> -	xe_device_mem_access_get(gt_to_xe(gt));
>   	ret = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
>   	KUNIT_ASSERT_EQ_MSG(test, ret, 0, "Forcewake Failed.\n");
>   	mocs_dbg(&gt_to_xe(gt)->drm, "L3CC entries:%d\n", info->n_entries);
> @@ -65,7 +64,6 @@ static void read_l3cc_table(struct xe_gt *gt,
>   				   XELP_LNCFCMOCS(i).addr);
>   	}
>   	xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
> -	xe_device_mem_access_put(gt_to_xe(gt));
>   }
>   
>   static void read_mocs_table(struct xe_gt *gt,
> @@ -80,7 +78,6 @@ static void read_mocs_table(struct xe_gt *gt,
>   
>   	struct kunit *test = xe_cur_kunit();
>   
> -	xe_device_mem_access_get(gt_to_xe(gt));
>   	ret = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
>   	KUNIT_ASSERT_EQ_MSG(test, ret, 0, "Forcewake Failed.\n");
>   	mocs_dbg(&gt_to_xe(gt)->drm, "Global MOCS entries:%d\n", info->n_entries);
> @@ -100,7 +97,6 @@ static void read_mocs_table(struct xe_gt *gt,
>   				   XELP_GLOBAL_MOCS(i).addr);
>   	}
>   	xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
> -	xe_device_mem_access_put(gt_to_xe(gt));
>   }
>   
>   static int mocs_kernel_test_run_device(struct xe_device *xe)
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 8e4a3b1f6b938..056c65c2675d8 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -715,7 +715,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>   	xe_assert(xe, migrate);
>   
>   	trace_xe_bo_move(bo);
> -	xe_device_mem_access_get(xe);
>   
>   	if (xe_bo_is_pinned(bo) && !xe_bo_is_user(bo)) {
>   		/*
> @@ -739,7 +738,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>   
>   				if (XE_WARN_ON(new_mem->start == XE_BO_INVALID_OFFSET)) {
>   					ret = -EINVAL;
> -					xe_device_mem_access_put(xe);
>   					goto out;
>   				}
>   
> @@ -757,7 +755,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>   						new_mem, handle_system_ccs);
>   		if (IS_ERR(fence)) {
>   			ret = PTR_ERR(fence);
> -			xe_device_mem_access_put(xe);
>   			goto out;
>   		}
>   		if (!move_lacks_source) {
> @@ -782,8 +779,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>   		dma_fence_put(fence);
>   	}
>   
> -	xe_device_mem_access_put(xe);
> -
>   out:
>   	return ret;
>   
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index c1c19264a58b4..cb08a4369bb9e 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -44,12 +44,6 @@
>   #include "xe_wait_user_fence.h"
>   #include "xe_hwmon.h"
>   
> -#ifdef CONFIG_LOCKDEP
> -struct lockdep_map xe_device_mem_access_lockdep_map = {
> -	.name = "xe_device_mem_access_lockdep_map"
> -};
> -#endif

Did you mean to drop this? IMO we should for sure keep the lockdep
annotations. Otherwise it is going to be really hard to validate the
locking design and have reasonable confidence that we don't have
deadlocks lurking, or to catch new users who come along sprinkling rpm
gets in the wrong place.
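
As a sketch, the same lockdep net could be kept on the new outer-bound helpers, mirroring the map this patch removes (the map name here is a hypothetical rename):

	#ifdef CONFIG_LOCKDEP
	static struct lockdep_map xe_pm_runtime_lockdep_map = {
		.name = "xe_pm_runtime_lockdep_map"
	};
	#endif

	void xe_pm_runtime_get(struct xe_device *xe)
	{
		pm_runtime_get_noresume(xe->drm.dev);

		if (xe_pm_read_callback_task(xe) == current)
			return;

		/* tell lockdep a resume callback may run from this context */
		lock_map_acquire(&xe_pm_runtime_lockdep_map);
		lock_map_release(&xe_pm_runtime_lockdep_map);
		pm_runtime_resume(xe->drm.dev);
	}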

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 15/20] drm/xe: Allow GuC CT fast path and worker regardless of runtime_pm
  2023-12-28  2:12 ` [RFC 15/20] drm/xe: Allow GuC CT fast path and worker regardless of runtime_pm Rodrigo Vivi
@ 2024-01-09 12:09   ` Matthew Auld
  0 siblings, 0 replies; 46+ messages in thread
From: Matthew Auld @ 2024-01-09 12:09 UTC (permalink / raw)
  To: Rodrigo Vivi, intel-xe

On 28/12/2023 02:12, Rodrigo Vivi wrote:
> First of all, this !ongoing && !from_runtime_functions seems a case that
> should not happen and be bad anyway. So, let's at least stop doing
> the workaround and if we find the case again we need to find in which
> outer bound we need to protect this access, or another real condition.
> 
> On top of that we are now protecting more outer bounds instead of
> a more granular memory access, so we might be fine. Or maybe ensure
> that we really shut off GuC on these conditions.
> 
> Anyway, let's proceed with our killing of the memory_access callers
> for now.

IIRC the main concern was that you could somehow get unsolicited CT
messages from GuC, so we figured it was best to ensure the device is
awake (and stays awake) before proceeding with accessing it. Say the
device enters SUSPENDING just as we get/process an unsolicited CT
message; or a CT request times out waiting for its response, the caller
bails out dropping rpm and maybe triggering an async gt reset, but it
turns out GuC was just being slow and we do eventually get that
response and process it? Is that type of stuff not possible?
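
If it is, a sketch of the kind of guard I had in mind for the fast path, using the rpm helpers from this series directly instead of mem_access:

	void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
	{
		struct xe_device *xe = ct_to_xe(ct);
		bool ongoing = xe_pm_runtime_get_if_active(xe) > 0;

		/* let the pm callbacks themselves keep using CT */
		if (!ongoing && xe_pm_read_callback_task(xe) == NULL)
			return;

		/* ... read and process G2H as before ... */

		if (ongoing)
			xe_pm_runtime_put(xe);
	}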

> 
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_guc_ct.c | 40 ----------------------------------
>   1 file changed, 40 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 4cde93c18a2d4..7e68ef69ca8d5 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -1096,14 +1096,8 @@ static void g2h_fast_path(struct xe_guc_ct *ct, u32 *msg, u32 len)
>    */
>   void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
>   {
> -	struct xe_device *xe = ct_to_xe(ct);
> -	bool ongoing;
>   	int len;
>   
> -	ongoing = xe_device_mem_access_get_if_ongoing(ct_to_xe(ct));
> -	if (!ongoing && xe_pm_read_callback_task(ct_to_xe(ct)) == NULL)
> -		return;
> -
>   	spin_lock(&ct->fast_lock);
>   	do {
>   		len = g2h_read(ct, ct->fast_msg, true);
> @@ -1111,9 +1105,6 @@ void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
>   			g2h_fast_path(ct, ct->fast_msg, len);
>   	} while (len > 0);
>   	spin_unlock(&ct->fast_lock);
> -
> -	if (ongoing)
> -		xe_device_mem_access_put(xe);
>   }
>   
>   /* Returns less than zero on error, 0 on done, 1 on more available */
> @@ -1144,36 +1135,8 @@ static int dequeue_one_g2h(struct xe_guc_ct *ct)
>   static void g2h_worker_func(struct work_struct *w)
>   {
>   	struct xe_guc_ct *ct = container_of(w, struct xe_guc_ct, g2h_worker);
> -	bool ongoing;
>   	int ret;
>   
> -	/*
> -	 * Normal users must always hold mem_access.ref around CT calls. However
> -	 * during the runtime pm callbacks we rely on CT to talk to the GuC, but
> -	 * at this stage we can't rely on mem_access.ref and even the
> -	 * callback_task will be different than current.  For such cases we just
> -	 * need to ensure we always process the responses from any blocking
> -	 * ct_send requests or where we otherwise expect some response when
> -	 * initiated from those callbacks (which will need to wait for the below
> -	 * dequeue_one_g2h()).  The dequeue_one_g2h() will gracefully fail if
> -	 * the device has suspended to the point that the CT communication has
> -	 * been disabled.
> -	 *
> -	 * If we are inside the runtime pm callback, we can be the only task
> -	 * still issuing CT requests (since that requires having the
> -	 * mem_access.ref).  It seems like it might in theory be possible to
> -	 * receive unsolicited events from the GuC just as we are
> -	 * suspending-resuming, but those will currently anyway be lost when
> -	 * eventually exiting from suspend, hence no need to wake up the device
> -	 * here. If we ever need something stronger than get_if_ongoing() then
> -	 * we need to be careful with blocking the pm callbacks from getting CT
> -	 * responses, if the worker here is blocked on those callbacks
> -	 * completing, creating a deadlock.
> -	 */
> -	ongoing = xe_device_mem_access_get_if_ongoing(ct_to_xe(ct));
> -	if (!ongoing && xe_pm_read_callback_task(ct_to_xe(ct)) == NULL)
> -		return;
> -
>   	do {
>   		mutex_lock(&ct->lock);
>   		ret = dequeue_one_g2h(ct);
> @@ -1187,9 +1150,6 @@ static void g2h_worker_func(struct work_struct *w)
>   			kick_reset(ct);
>   		}
>   	} while (ret == 1);
> -
> -	if (ongoing)
> -		xe_device_mem_access_put(ct_to_xe(ct));
>   }
>   
>   static void guc_ctb_snapshot_capture(struct xe_device *xe, struct guc_ctb *ctb,

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 16/20] drm/xe: Remove mem_access calls from migration
  2023-12-28  2:12 ` [RFC 16/20] drm/xe: Remove mem_access calls from migration Rodrigo Vivi
@ 2024-01-09 12:33   ` Matthew Auld
  2024-01-09 17:58     ` Rodrigo Vivi
  0 siblings, 1 reply; 46+ messages in thread
From: Matthew Auld @ 2024-01-09 12:33 UTC (permalink / raw)
  To: Rodrigo Vivi, intel-xe

On 28/12/2023 02:12, Rodrigo Vivi wrote:
> The sched jobs runtime pm calls already protects every execution,
> including these migration ones.

Is a job really enough here? I assume the queue is only destroyed once
it has no more jobs and the final queue ref is dropped. And destroying
the queue might involve stuff like de-registering the context with GuC,
which needs to use CT, which in turn needs an rpm ref. What is holding
the rpm if not the vm or queue?
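
For instance, a sketch where the queue itself pins rpm for its whole lifetime, assuming whoever creates it has already woken the device (the noresume variant is hypothetical here):

	/* in __xe_exec_queue_create(), after the queue is set up */
	xe_pm_runtime_get_noresume(gt_to_xe(q->gt));	/* hypothetical helper */

	/* in xe_exec_queue_fini(), as the last queue ref goes away */
	xe_pm_runtime_put(gt_to_xe(q->gt));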

> 
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   drivers/gpu/drm/xe/tests/xe_migrate.c |  2 --
>   drivers/gpu/drm/xe/xe_device.c        | 17 -----------------
>   drivers/gpu/drm/xe/xe_device.h        |  1 -
>   drivers/gpu/drm/xe/xe_exec_queue.c    | 18 ------------------
>   4 files changed, 38 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> index 7a32faa2f6888..2257f0a28435b 100644
> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> @@ -428,9 +428,7 @@ static int migrate_test_run_device(struct xe_device *xe)
>   
>   		kunit_info(test, "Testing tile id %d.\n", id);
>   		xe_vm_lock(m->q->vm, true);
> -		xe_device_mem_access_get(xe);
>   		xe_migrate_sanity_test(m, test);
> -		xe_device_mem_access_put(xe);
>   		xe_vm_unlock(m->q->vm);
>   	}
>   
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index ee9b6612eec43..a7bec49da49fa 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -675,23 +675,6 @@ void xe_device_assert_mem_access(struct xe_device *xe)
>   	XE_WARN_ON(xe_pm_runtime_suspended(xe));
>   }
>   
> -bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe)
> -{
> -	bool active;
> -
> -	if (xe_pm_read_callback_task(xe) == current)
> -		return true;
> -
> -	active = xe_pm_runtime_get_if_active(xe);
> -	if (active) {
> -		int ref = atomic_inc_return(&xe->mem_access.ref);
> -
> -		xe_assert(xe, ref != S32_MAX);
> -	}
> -
> -	return active;
> -}
> -
>   void xe_device_mem_access_get(struct xe_device *xe)
>   {
>   	int ref;
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index af8ac2e9e2709..4acf4c2973390 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -142,7 +142,6 @@ static inline struct xe_force_wake *gt_to_fw(struct xe_gt *gt)
>   }
>   
>   void xe_device_mem_access_get(struct xe_device *xe);
> -bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe);
>   void xe_device_mem_access_put(struct xe_device *xe);
>   
>   void xe_device_assert_mem_access(struct xe_device *xe);
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 44fe8097b7cda..d3a8d2d8caaaf 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -87,17 +87,6 @@ static struct xe_exec_queue *__xe_exec_queue_create(struct xe_device *xe,
>   	if (err)
>   		goto err_lrc;
>   
> -	/*
> -	 * Normally the user vm holds an rpm ref to keep the device
> -	 * awake, and the context holds a ref for the vm, however for
> -	 * some engines we use the kernels migrate vm underneath which offers no
> -	 * such rpm ref, or we lack a vm. Make sure we keep a ref here, so we
> -	 * can perform GuC CT actions when needed. Caller is expected to have
> -	 * already grabbed the rpm ref outside any sensitive locks.
> -	 */
> -	if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !vm))
> -		drm_WARN_ON(&xe->drm, !xe_device_mem_access_get_if_ongoing(xe));
> -
>   	return q;
>   
>   err_lrc:
> @@ -172,8 +161,6 @@ void xe_exec_queue_fini(struct xe_exec_queue *q)
>   
>   	for (i = 0; i < q->width; ++i)
>   		xe_lrc_finish(q->lrc + i);
> -	if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !q->vm))
> -		xe_device_mem_access_put(gt_to_xe(q->gt));
>   	if (q->vm)
>   		xe_vm_put(q->vm);
>   
> @@ -643,9 +630,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>   			if (XE_IOCTL_DBG(xe, !hwe))
>   				return -EINVAL;
>   
> -			/* The migration vm doesn't hold rpm ref */
> -			xe_device_mem_access_get(xe);
> -
>   			migrate_vm = xe_migrate_get_vm(gt_to_tile(gt)->migrate);
>   			new = xe_exec_queue_create(xe, migrate_vm, logical_mask,
>   						   args->width, hwe,
> @@ -655,8 +639,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>   						    EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD :
>   						    0));
>   
> -			xe_device_mem_access_put(xe); /* now held by engine */
> -
>   			xe_vm_put(migrate_vm);
>   			if (IS_ERR(new)) {
>   				err = PTR_ERR(new);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 20/20] drm/xe: Mega Kill of mem_access
  2024-01-09 11:41   ` Matthew Auld
@ 2024-01-09 17:39     ` Rodrigo Vivi
  2024-01-09 18:27       ` Matthew Auld
  0 siblings, 1 reply; 46+ messages in thread
From: Rodrigo Vivi @ 2024-01-09 17:39 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-xe

On Tue, Jan 09, 2024 at 11:41:35AM +0000, Matthew Auld wrote:
> On 28/12/2023 02:12, Rodrigo Vivi wrote:
> > All of these remaining cases should already be protected
> > by the outer bound calls of runtime_pm
> > 
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> >   drivers/gpu/drm/xe/display/xe_fb_pin.c |  7 +--
> >   drivers/gpu/drm/xe/tests/xe_bo.c       |  8 ----
> >   drivers/gpu/drm/xe/tests/xe_mocs.c     |  4 --
> >   drivers/gpu/drm/xe/xe_bo.c             |  5 ---
> >   drivers/gpu/drm/xe/xe_device.c         | 59 --------------------------
> >   drivers/gpu/drm/xe/xe_device.h         |  7 ---
> >   drivers/gpu/drm/xe/xe_device_types.h   |  9 ----
> >   drivers/gpu/drm/xe/xe_ggtt.c           |  6 ---
> >   drivers/gpu/drm/xe/xe_gsc.c            |  3 --
> >   drivers/gpu/drm/xe/xe_gt.c             | 17 --------
> >   drivers/gpu/drm/xe/xe_huc_debugfs.c    |  2 -
> >   drivers/gpu/drm/xe/xe_pat.c            | 10 -----
> >   drivers/gpu/drm/xe/xe_pm.c             | 27 ------------
> >   drivers/gpu/drm/xe/xe_query.c          |  4 --
> >   drivers/gpu/drm/xe/xe_tile.c           | 10 ++---
> >   drivers/gpu/drm/xe/xe_vm.c             |  7 ---
> >   16 files changed, 5 insertions(+), 180 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > index 722c84a566073..077294ec50ece 100644
> > --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > @@ -190,10 +190,9 @@ static int __xe_pin_fb_vma_ggtt(struct intel_framebuffer *fb,
> >   	/* TODO: Consider sharing framebuffer mapping?
> >   	 * embed i915_vma inside intel_framebuffer
> >   	 */
> > -	xe_device_mem_access_get(tile_to_xe(ggtt->tile));
> >   	ret = mutex_lock_interruptible(&ggtt->lock);
> >   	if (ret)
> > -		goto out;
> > +		return ret;
> >   	align = XE_PAGE_SIZE;
> >   	if (xe_bo_is_vram(bo) && ggtt->flags & XE_GGTT_FLAGS_64K)
> > @@ -241,8 +240,6 @@ static int __xe_pin_fb_vma_ggtt(struct intel_framebuffer *fb,
> >   	xe_ggtt_invalidate(ggtt);
> >   out_unlock:
> >   	mutex_unlock(&ggtt->lock);
> > -out:
> > -	xe_device_mem_access_put(tile_to_xe(ggtt->tile));
> >   	return ret;
> >   }
> > @@ -381,4 +378,4 @@ struct i915_address_space *intel_dpt_create(struct intel_framebuffer *fb)
> >   void intel_dpt_destroy(struct i915_address_space *vm)
> >   {
> >   	return;
> > -}
> > \ No newline at end of file
> > +}
> > diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
> > index 412b2e7ce40cb..97b10e597f0ad 100644
> > --- a/drivers/gpu/drm/xe/tests/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
> > @@ -164,8 +164,6 @@ static int ccs_test_run_device(struct xe_device *xe)
> >   		return 0;
> >   	}
> > -	xe_device_mem_access_get(xe);
> > -
> >   	for_each_tile(tile, xe, id) {
> >   		/* For igfx run only for primary tile */
> >   		if (!IS_DGFX(xe) && id > 0)
> > @@ -173,8 +171,6 @@ static int ccs_test_run_device(struct xe_device *xe)
> >   		ccs_test_run_tile(xe, tile, test);
> >   	}
> > -	xe_device_mem_access_put(xe);
> > -
> >   	return 0;
> >   }
> > @@ -336,13 +332,9 @@ static int evict_test_run_device(struct xe_device *xe)
> >   		return 0;
> >   	}
> > -	xe_device_mem_access_get(xe);
> > -
> >   	for_each_tile(tile, xe, id)
> >   		evict_test_run_tile(xe, tile, test);
> > -	xe_device_mem_access_put(xe);
> > -
> >   	return 0;
> >   }
> > diff --git a/drivers/gpu/drm/xe/tests/xe_mocs.c b/drivers/gpu/drm/xe/tests/xe_mocs.c
> > index 7dd34f94e8094..a12e7e2bb5861 100644
> > --- a/drivers/gpu/drm/xe/tests/xe_mocs.c
> > +++ b/drivers/gpu/drm/xe/tests/xe_mocs.c
> > @@ -45,7 +45,6 @@ static void read_l3cc_table(struct xe_gt *gt,
> >   	struct kunit *test = xe_cur_kunit();
> > -	xe_device_mem_access_get(gt_to_xe(gt));
> >   	ret = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
> >   	KUNIT_ASSERT_EQ_MSG(test, ret, 0, "Forcewake Failed.\n");
> >   	mocs_dbg(&gt_to_xe(gt)->drm, "L3CC entries:%d\n", info->n_entries);
> > @@ -65,7 +64,6 @@ static void read_l3cc_table(struct xe_gt *gt,
> >   				   XELP_LNCFCMOCS(i).addr);
> >   	}
> >   	xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
> > -	xe_device_mem_access_put(gt_to_xe(gt));
> >   }
> >   static void read_mocs_table(struct xe_gt *gt,
> > @@ -80,7 +78,6 @@ static void read_mocs_table(struct xe_gt *gt,
> >   	struct kunit *test = xe_cur_kunit();
> > -	xe_device_mem_access_get(gt_to_xe(gt));
> >   	ret = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
> >   	KUNIT_ASSERT_EQ_MSG(test, ret, 0, "Forcewake Failed.\n");
> >   	mocs_dbg(&gt_to_xe(gt)->drm, "Global MOCS entries:%d\n", info->n_entries);
> > @@ -100,7 +97,6 @@ static void read_mocs_table(struct xe_gt *gt,
> >   				   XELP_GLOBAL_MOCS(i).addr);
> >   	}
> >   	xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
> > -	xe_device_mem_access_put(gt_to_xe(gt));
> >   }
> >   static int mocs_kernel_test_run_device(struct xe_device *xe)
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > index 8e4a3b1f6b938..056c65c2675d8 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -715,7 +715,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
> >   	xe_assert(xe, migrate);
> >   	trace_xe_bo_move(bo);
> > -	xe_device_mem_access_get(xe);
> >   	if (xe_bo_is_pinned(bo) && !xe_bo_is_user(bo)) {
> >   		/*
> > @@ -739,7 +738,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
> >   				if (XE_WARN_ON(new_mem->start == XE_BO_INVALID_OFFSET)) {
> >   					ret = -EINVAL;
> > -					xe_device_mem_access_put(xe);
> >   					goto out;
> >   				}
> > @@ -757,7 +755,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
> >   						new_mem, handle_system_ccs);
> >   		if (IS_ERR(fence)) {
> >   			ret = PTR_ERR(fence);
> > -			xe_device_mem_access_put(xe);
> >   			goto out;
> >   		}
> >   		if (!move_lacks_source) {
> > @@ -782,8 +779,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
> >   		dma_fence_put(fence);
> >   	}
> > -	xe_device_mem_access_put(xe);
> > -
> >   out:
> >   	return ret;
> > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > index c1c19264a58b4..cb08a4369bb9e 100644
> > --- a/drivers/gpu/drm/xe/xe_device.c
> > +++ b/drivers/gpu/drm/xe/xe_device.c
> > @@ -44,12 +44,6 @@
> >   #include "xe_wait_user_fence.h"
> >   #include "xe_hwmon.h"
> > -#ifdef CONFIG_LOCKDEP
> > -struct lockdep_map xe_device_mem_access_lockdep_map = {
> > -	.name = "xe_device_mem_access_lockdep_map"
> > -};
> > -#endif
> 
> Did you mean to drop this? IMO we should for sure keep the lockdep
> annotations. Otherwise it is going to be really hard to validate the locking
> design and have reasonable confidence that we don't have deadlocks lurking,
> or as new users come along sprinkling rpm get in the wrong place.

Well, the whole goal of this series is to actually avoid sprinkling RPM calls at all.
We should only protect the outer bounds. I'm afraid that if we move this annotation
to the outer bounds we would start getting false positives from it, no?!

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 08/20] drm/xe: Runtime PM wake on every exec
  2024-01-09 11:24   ` Matthew Auld
@ 2024-01-09 17:41     ` Rodrigo Vivi
  2024-01-09 18:40       ` Matthew Auld
  0 siblings, 1 reply; 46+ messages in thread
From: Rodrigo Vivi @ 2024-01-09 17:41 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-xe

On Tue, Jan 09, 2024 at 11:24:34AM +0000, Matthew Auld wrote:
> On 28/12/2023 02:12, Rodrigo Vivi wrote:
> > Let's ensure our PCI device stays awake for every GT execution,
> > through to the end of the execution.
> > Let's increase the runtime_pm protection and start moving
> > that to the outer bounds.
> > 
> > Let's also remove the unnecessary mem_access get/put.
> > 
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_sched_job.c | 10 +++++-----
> >   1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
> > index 01106a1156ad8..0b30ec77fc5ad 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job.c
> > +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> > @@ -15,6 +15,7 @@
> >   #include "xe_hw_fence.h"
> >   #include "xe_lrc.h"
> >   #include "xe_macros.h"
> > +#include "xe_pm.h"
> >   #include "xe_trace.h"
> >   #include "xe_vm.h"
> > @@ -67,6 +68,8 @@ static void job_free(struct xe_sched_job *job)
> >   	struct xe_exec_queue *q = job->q;
> >   	bool is_migration = xe_sched_job_is_migration(q);
> > +	xe_pm_runtime_put(gt_to_xe(q->gt));
> > +
> >   	kmem_cache_free(xe_exec_queue_is_parallel(job->q) || is_migration ?
> >   			xe_sched_job_parallel_slab : xe_sched_job_slab, job);
> >   }
> > @@ -86,6 +89,8 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
> >   	int i, j;
> >   	u32 width;
> > +	xe_pm_runtime_get(gt_to_xe(q->gt));
> > +
> 
> This seems way too deep in the call chain. If this actually wakes up the
> device we will end up with all of the same d3cold deadlock issues. Like here
> we are for sure holding stuff like dma-resv, but the rpm callbacks also want
> to grab it. IMO this needs to be something like runtime_get_if_active(),
> with the upper layers already ensuring device is awake (like ioctl), so here
> we are just keeping it awake until the job is done. Or maybe this is how it
> is by the end of the series?

we have 2 cases here: one where the device is already awake because of
the ioctl, and the other on the eviction preparation, which exits early
because of the 'current' task check. So we should be good anyway, but you
are right, maybe using the get_if_active is better here for clarity.
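
Something along these lines, perhaps (untested sketch just to
illustrate; the -EAGAIN handling is made up):

	/* in xe_sched_job_create(), instead of the unconditional get:
	 * never wake the device from here, only pin it awake, relying
	 * on the outer bound (ioctl/exec) having resumed it already.
	 * The paired put stays in job_free(). */
	if (!xe_pm_runtime_get_if_active(gt_to_xe(q->gt)))
		return ERR_PTR(-EAGAIN);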

> 
> >   	/* only a kernel context can submit a vm-less job */
> >   	XE_WARN_ON(!q->vm && !(q->flags & EXEC_QUEUE_FLAG_KERNEL));
> > @@ -155,9 +160,6 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
> >   	for (i = 0; i < width; ++i)
> >   		job->batch_addr[i] = batch_addr[i];
> > -	/* All other jobs require a VM to be open which has a ref */
> > -	if (unlikely(q->flags & EXEC_QUEUE_FLAG_KERNEL))
> > -		xe_device_mem_access_get(job_to_xe(job));
> >   	xe_device_assert_mem_access(job_to_xe(job));
> >   	trace_xe_sched_job_create(job);
> > @@ -189,8 +191,6 @@ void xe_sched_job_destroy(struct kref *ref)
> >   	struct xe_sched_job *job =
> >   		container_of(ref, struct xe_sched_job, refcount);
> > -	if (unlikely(job->q->flags & EXEC_QUEUE_FLAG_KERNEL))
> > -		xe_device_mem_access_put(job_to_xe(job));
> >   	xe_exec_queue_put(job->q);
> >   	dma_fence_put(job->fence);
> >   	drm_sched_job_cleanup(&job->drm);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 06/20] drm/xe: Convert mem_access assertion towards the runtime_pm state
  2024-01-09 11:06   ` Matthew Auld
@ 2024-01-09 17:50     ` Rodrigo Vivi
  0 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2024-01-09 17:50 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-xe

On Tue, Jan 09, 2024 at 11:06:19AM +0000, Matthew Auld wrote:
> On 28/12/2023 02:12, Rodrigo Vivi wrote:
> > The mem_access helpers are going away and getting replaced by
> > direct calls of the xe_pm_runtime_{get,put} functions. However, an
> > assertion with a warning splat is desired when we hit the worst
> > case of a memory access with the device really in the 'suspended'
> > state.
> > 
> > Also, this needs to be the first step. Otherwise, the upcoming
> > conversion would be really noisy with warn splats of missing mem_access
> > gets.
> > 
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_device.c | 13 ++++++++++++-
> >   drivers/gpu/drm/xe/xe_pm.c     | 16 ++++++++++++++++
> >   drivers/gpu/drm/xe/xe_pm.h     |  1 +
> >   3 files changed, 29 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > index 86867d42d5329..dc3721bb37b1e 100644
> > --- a/drivers/gpu/drm/xe/xe_device.c
> > +++ b/drivers/gpu/drm/xe/xe_device.c
> > @@ -631,9 +631,20 @@ bool xe_device_mem_access_ongoing(struct xe_device *xe)
> >   	return atomic_read(&xe->mem_access.ref);
> >   }
> > +/**
> > + * xe_device_assert_mem_access - Inspect the current runtime_pm state.
> > + * @xe: xe device instance
> > + *
> > + * To be used before any kind of memory access. It will splat a debug warning
> > + * if the device is currently sleeping. But it doesn't guarantee in any way
> > + * that the device is going to continue awake. Xe PM runtime get and put
> > + * functions might be added to the outer bound of the memory access, while
> > + * this check is intended for inner usage to splat some warning if the worst
> > + * case has just happened.
> > + */
> >   void xe_device_assert_mem_access(struct xe_device *xe)
> >   {
> > -	XE_WARN_ON(!xe_device_mem_access_ongoing(xe));
> > +	XE_WARN_ON(xe_pm_runtime_suspended(xe));
> >   }
> >   bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe)
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > index cabed94a21873..45114e4e76a5a 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -246,6 +246,22 @@ struct task_struct *xe_pm_read_callback_task(struct xe_device *xe)
> >   	return READ_ONCE(xe->pm_callback_task);
> >   }
> > +/**
> > + * xe_pm_runtime_suspended - Inspect the current runtime_pm state.
> > + * @xe: xe device instance
> > + *
> > + * This does not provide any guarantee that the device is going to continue
> > + * suspended as it might be racing with the runtime state transitions.
> > + * It can be used only as a non-reliable assertion, to ensure that we are not in
> > + * the sleep state while trying to access some memory for instance.
> > + *
> > + * Returns true if PCI device is suspended, false otherwise.
> > + */
> > +bool xe_pm_runtime_suspended(struct xe_device *xe)
> > +{
> > +	return pm_runtime_suspended(xe->drm.dev);
> 
> Would it not be better to check for active instead? That way we can check
> for !active above and create a bigger net with SUSPENDING and RESUMING
> states also being invalid i.e another task is about to suspend or hasn't
> fully resumed yet. We might also need to also check the callback task
> though.

In both transition cases the device is awake anyway. And this check will
be called from places where we know we are in the transition, like
eviction/restore. So we could convert to active, but then we would also
need to check that the task is not current. And we would still have the
risk of calls from workqueues that don't come from the same task.

Since this pm_runtime state cannot be trusted anyway, because we cannot
hold the dev power lock, this will be unreliable no matter how we put it.
So I decided on the quick single-shot check, only to ensure that we get
some clue if we hit the worst case: the device for sure in the sleep
state while accessing some memory.
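
For completeness, the wider-net variant you suggest would be something
like this (untested sketch; the 'current' exemption is exactly the part
that breaks down for workqueues):

	void xe_device_assert_mem_access(struct xe_device *xe)
	{
		/* also catches SUSPENDING/RESUMING, but has to exempt
		 * the pm callback task itself, and still misses work
		 * items kicked off by that task */
		XE_WARN_ON(!pm_runtime_active(xe->drm.dev) &&
			   xe_pm_read_callback_task(xe) != current);
	}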

> 
> > +}
> > +
> >   /**
> >    * xe_pm_runtime_suspend - Prepare our device for D3hot/D3Cold
> >    * @xe: xe device instance
> > diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
> > index 069f41c61505b..67a9bf3dd379b 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.h
> > +++ b/drivers/gpu/drm/xe/xe_pm.h
> > @@ -22,6 +22,7 @@ int xe_pm_resume(struct xe_device *xe);
> >   void xe_pm_init(struct xe_device *xe);
> >   void xe_pm_runtime_fini(struct xe_device *xe);
> > +bool xe_pm_runtime_suspended(struct xe_device *xe);
> >   int xe_pm_runtime_suspend(struct xe_device *xe);
> >   int xe_pm_runtime_resume(struct xe_device *xe);
> >   void xe_pm_runtime_get(struct xe_device *xe);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 07/20] drm/xe: Runtime PM wake on every IOCTL
  2024-01-02 11:30   ` Gupta, Anshuman
@ 2024-01-09 17:57     ` Rodrigo Vivi
  0 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2024-01-09 17:57 UTC (permalink / raw)
  To: Gupta, Anshuman; +Cc: Deak, Imre, intel-xe@lists.freedesktop.org

On Tue, Jan 02, 2024 at 06:30:31AM -0500, Gupta, Anshuman wrote:
> 
> 
> > -----Original Message-----
> > From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Rodrigo
> > Vivi
> > Sent: Thursday, December 28, 2023 7:42 AM
> > To: intel-xe@lists.freedesktop.org
> > Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
> > Subject: [RFC 07/20] drm/xe: Runtime PM wake on every IOCTL
> > 
> > Let's ensure our PCI device is awakened on every IOCTL entry.
> > Let's increase the runtime_pm protection and start moving that to the outer
> > bounds.
> IMO we need to decouple DC9 from runtime suspend, as the previous patch "[RFC,05/20] drm/xe: Prepare display for D3Cold"
> coupled them. Let DC9 be enabled when all displays are off. Otherwise blocking runtime PM on every ioctl will also block
> DC9 unnecessarily.

Good catch. We need to decouple that somehow. I will take a look into
that later.

> Thanks,
> Anshuman Gupta.
> > 
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_device.c | 32 ++++++++++++++++++++++++++++++--
> >  drivers/gpu/drm/xe/xe_pm.c     | 15 +++++++++++++++
> >  drivers/gpu/drm/xe/xe_pm.h     |  1 +
> >  3 files changed, 46 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > index dc3721bb37b1e..ee9b6612eec43 100644
> > --- a/drivers/gpu/drm/xe/xe_device.c
> > +++ b/drivers/gpu/drm/xe/xe_device.c
> > @@ -140,15 +140,43 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
> >  			  DRM_RENDER_ALLOW),
> >  };
> > 
> > +static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > +{
> > +	struct drm_file *file_priv = file->private_data;
> > +	struct xe_device *xe = to_xe_device(file_priv->minor->dev);
> > +	long ret;
> > +
> > +	ret = xe_pm_runtime_get_sync(xe);
> > +	if (ret >= 0)
> > +		ret = drm_ioctl(file, cmd, arg);
> > +	xe_pm_runtime_put(xe);
> > +
> > +	return ret;
> > +}
> > +
> > +static long xe_drm_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > +{
> > +	struct drm_file *file_priv = file->private_data;
> > +	struct xe_device *xe = to_xe_device(file_priv->minor->dev);
> > +	long ret;
> > +
> > +	ret = xe_pm_runtime_get_sync(xe);
> > +	if (ret >= 0)
> > +		ret = drm_compat_ioctl(file, cmd, arg);
> > +	xe_pm_runtime_put(xe);
> > +
> > +	return ret;
> > +}
> > +
> >  static const struct file_operations xe_driver_fops = {
> >  	.owner = THIS_MODULE,
> >  	.open = drm_open,
> >  	.release = drm_release_noglobal,
> > -	.unlocked_ioctl = drm_ioctl,
> > +	.unlocked_ioctl = xe_drm_ioctl,
> >  	.mmap = drm_gem_mmap,
> >  	.poll = drm_poll,
> >  	.read = drm_read,
> > -	.compat_ioctl = drm_compat_ioctl,
> > +	.compat_ioctl = xe_drm_compat_ioctl,
> >  	.llseek = noop_llseek,
> >  #ifdef CONFIG_PROC_FS
> >  	.show_fdinfo = drm_show_fdinfo,
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > index 45114e4e76a5a..f599707413f18 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -411,6 +411,21 @@ void xe_pm_runtime_put(struct xe_device *xe)
> >  	pm_runtime_put(xe->drm.dev);
> >  }
> > 
> > +/**
> > + * xe_pm_runtime_get_sync - Get a runtime_pm reference and resume synchronously
> > + * @xe: xe device instance
> > + *
> > + * Returns: Any number greater than or equal to 0 for success, negative error
> > + * code otherwise.
> > + */
> > +int xe_pm_runtime_get_sync(struct xe_device *xe)
> > +{
> > +	if (WARN_ON(xe_pm_read_callback_task(xe) == current))
> > +		return -ELOOP;
> > +
> > +	return pm_runtime_get_sync(xe->drm.dev);
> > +}
> > +
> >  /**
> >   * xe_pm_runtime_get_if_active - Get a runtime_pm reference if device active
> >   * @xe: xe device instance
> > diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
> > index 67a9bf3dd379b..d0e6011a80688 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.h
> > +++ b/drivers/gpu/drm/xe/xe_pm.h
> > @@ -26,6 +26,7 @@ bool xe_pm_runtime_suspended(struct xe_device *xe);
> >  int xe_pm_runtime_suspend(struct xe_device *xe);
> >  int xe_pm_runtime_resume(struct xe_device *xe);
> >  void xe_pm_runtime_get(struct xe_device *xe);
> > +int xe_pm_runtime_get_sync(struct xe_device *xe);
> >  void xe_pm_runtime_put(struct xe_device *xe);
> >  int xe_pm_runtime_get_if_active(struct xe_device *xe);
> >  bool xe_pm_runtime_get_if_in_use(struct xe_device *xe);
> > --
> > 2.43.0
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 16/20] drm/xe: Remove mem_access calls from migration
  2024-01-09 12:33   ` Matthew Auld
@ 2024-01-09 17:58     ` Rodrigo Vivi
  2024-01-09 18:49       ` Matthew Auld
  0 siblings, 1 reply; 46+ messages in thread
From: Rodrigo Vivi @ 2024-01-09 17:58 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-xe

On Tue, Jan 09, 2024 at 12:33:25PM +0000, Matthew Auld wrote:
> On 28/12/2023 02:12, Rodrigo Vivi wrote:
> > The sched jobs runtime pm calls already protect every execution,
> > including these migration ones.
> 
> Is job really enough here? I assume queue is only destroyed once it has no
> more jobs and the final queue ref is dropped. And destroying the queue might
> involve stuff like de-register the context with GuC etc. which needs to use
> CT which will need rpm ref. What is holding the rpm if not the vm or queue?

The exec queue is holding it to the end.

> 
> > 
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> >   drivers/gpu/drm/xe/tests/xe_migrate.c |  2 --
> >   drivers/gpu/drm/xe/xe_device.c        | 17 -----------------
> >   drivers/gpu/drm/xe/xe_device.h        |  1 -
> >   drivers/gpu/drm/xe/xe_exec_queue.c    | 18 ------------------
> >   4 files changed, 38 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > index 7a32faa2f6888..2257f0a28435b 100644
> > --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > @@ -428,9 +428,7 @@ static int migrate_test_run_device(struct xe_device *xe)
> >   		kunit_info(test, "Testing tile id %d.\n", id);
> >   		xe_vm_lock(m->q->vm, true);
> > -		xe_device_mem_access_get(xe);
> >   		xe_migrate_sanity_test(m, test);
> > -		xe_device_mem_access_put(xe);
> >   		xe_vm_unlock(m->q->vm);
> >   	}
> > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > index ee9b6612eec43..a7bec49da49fa 100644
> > --- a/drivers/gpu/drm/xe/xe_device.c
> > +++ b/drivers/gpu/drm/xe/xe_device.c
> > @@ -675,23 +675,6 @@ void xe_device_assert_mem_access(struct xe_device *xe)
> >   	XE_WARN_ON(xe_pm_runtime_suspended(xe));
> >   }
> > -bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe)
> > -{
> > -	bool active;
> > -
> > -	if (xe_pm_read_callback_task(xe) == current)
> > -		return true;
> > -
> > -	active = xe_pm_runtime_get_if_active(xe);
> > -	if (active) {
> > -		int ref = atomic_inc_return(&xe->mem_access.ref);
> > -
> > -		xe_assert(xe, ref != S32_MAX);
> > -	}
> > -
> > -	return active;
> > -}
> > -
> >   void xe_device_mem_access_get(struct xe_device *xe)
> >   {
> >   	int ref;
> > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> > index af8ac2e9e2709..4acf4c2973390 100644
> > --- a/drivers/gpu/drm/xe/xe_device.h
> > +++ b/drivers/gpu/drm/xe/xe_device.h
> > @@ -142,7 +142,6 @@ static inline struct xe_force_wake *gt_to_fw(struct xe_gt *gt)
> >   }
> >   void xe_device_mem_access_get(struct xe_device *xe);
> > -bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe);
> >   void xe_device_mem_access_put(struct xe_device *xe);
> >   void xe_device_assert_mem_access(struct xe_device *xe);
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> > index 44fe8097b7cda..d3a8d2d8caaaf 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > @@ -87,17 +87,6 @@ static struct xe_exec_queue *__xe_exec_queue_create(struct xe_device *xe,
> >   	if (err)
> >   		goto err_lrc;
> > -	/*
> > -	 * Normally the user vm holds an rpm ref to keep the device
> > -	 * awake, and the context holds a ref for the vm, however for
> > -	 * some engines we use the kernels migrate vm underneath which offers no
> > -	 * such rpm ref, or we lack a vm. Make sure we keep a ref here, so we
> > -	 * can perform GuC CT actions when needed. Caller is expected to have
> > -	 * already grabbed the rpm ref outside any sensitive locks.
> > -	 */
> > -	if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !vm))
> > -		drm_WARN_ON(&xe->drm, !xe_device_mem_access_get_if_ongoing(xe));
> > -
> >   	return q;
> >   err_lrc:
> > @@ -172,8 +161,6 @@ void xe_exec_queue_fini(struct xe_exec_queue *q)
> >   	for (i = 0; i < q->width; ++i)
> >   		xe_lrc_finish(q->lrc + i);
> > -	if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !q->vm))
> > -		xe_device_mem_access_put(gt_to_xe(q->gt));
> >   	if (q->vm)
> >   		xe_vm_put(q->vm);
> > @@ -643,9 +630,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
> >   			if (XE_IOCTL_DBG(xe, !hwe))
> >   				return -EINVAL;
> > -			/* The migration vm doesn't hold rpm ref */
> > -			xe_device_mem_access_get(xe);
> > -
> >   			migrate_vm = xe_migrate_get_vm(gt_to_tile(gt)->migrate);
> >   			new = xe_exec_queue_create(xe, migrate_vm, logical_mask,
> >   						   args->width, hwe,
> > @@ -655,8 +639,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
> >   						    EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD :
> >   						    0));
> > -			xe_device_mem_access_put(xe); /* now held by engine */
> > -
> >   			xe_vm_put(migrate_vm);
> >   			if (IS_ERR(new)) {
> >   				err = PTR_ERR(new);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 20/20] drm/xe: Mega Kill of mem_access
  2024-01-09 17:39     ` Rodrigo Vivi
@ 2024-01-09 18:27       ` Matthew Auld
  2024-01-09 22:34         ` Rodrigo Vivi
  0 siblings, 1 reply; 46+ messages in thread
From: Matthew Auld @ 2024-01-09 18:27 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On 09/01/2024 17:39, Rodrigo Vivi wrote:
> On Tue, Jan 09, 2024 at 11:41:35AM +0000, Matthew Auld wrote:
>> On 28/12/2023 02:12, Rodrigo Vivi wrote:
>>> All of these remaining cases should already be protected
>>> by the outer bound calls of runtime_pm
>>>
>>> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/display/xe_fb_pin.c |  7 +--
>>>    drivers/gpu/drm/xe/tests/xe_bo.c       |  8 ----
>>>    drivers/gpu/drm/xe/tests/xe_mocs.c     |  4 --
>>>    drivers/gpu/drm/xe/xe_bo.c             |  5 ---
>>>    drivers/gpu/drm/xe/xe_device.c         | 59 --------------------------
>>>    drivers/gpu/drm/xe/xe_device.h         |  7 ---
>>>    drivers/gpu/drm/xe/xe_device_types.h   |  9 ----
>>>    drivers/gpu/drm/xe/xe_ggtt.c           |  6 ---
>>>    drivers/gpu/drm/xe/xe_gsc.c            |  3 --
>>>    drivers/gpu/drm/xe/xe_gt.c             | 17 --------
>>>    drivers/gpu/drm/xe/xe_huc_debugfs.c    |  2 -
>>>    drivers/gpu/drm/xe/xe_pat.c            | 10 -----
>>>    drivers/gpu/drm/xe/xe_pm.c             | 27 ------------
>>>    drivers/gpu/drm/xe/xe_query.c          |  4 --
>>>    drivers/gpu/drm/xe/xe_tile.c           | 10 ++---
>>>    drivers/gpu/drm/xe/xe_vm.c             |  7 ---
>>>    16 files changed, 5 insertions(+), 180 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>> index 722c84a566073..077294ec50ece 100644
>>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>> @@ -190,10 +190,9 @@ static int __xe_pin_fb_vma_ggtt(struct intel_framebuffer *fb,
>>>    	/* TODO: Consider sharing framebuffer mapping?
>>>    	 * embed i915_vma inside intel_framebuffer
>>>    	 */
>>> -	xe_device_mem_access_get(tile_to_xe(ggtt->tile));
>>>    	ret = mutex_lock_interruptible(&ggtt->lock);
>>>    	if (ret)
>>> -		goto out;
>>> +		return ret;
>>>    	align = XE_PAGE_SIZE;
>>>    	if (xe_bo_is_vram(bo) && ggtt->flags & XE_GGTT_FLAGS_64K)
>>> @@ -241,8 +240,6 @@ static int __xe_pin_fb_vma_ggtt(struct intel_framebuffer *fb,
>>>    	xe_ggtt_invalidate(ggtt);
>>>    out_unlock:
>>>    	mutex_unlock(&ggtt->lock);
>>> -out:
>>> -	xe_device_mem_access_put(tile_to_xe(ggtt->tile));
>>>    	return ret;
>>>    }
>>> @@ -381,4 +378,4 @@ struct i915_address_space *intel_dpt_create(struct intel_framebuffer *fb)
>>>    void intel_dpt_destroy(struct i915_address_space *vm)
>>>    {
>>>    	return;
>>> -}
>>> \ No newline at end of file
>>> +}
>>> diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
>>> index 412b2e7ce40cb..97b10e597f0ad 100644
>>> --- a/drivers/gpu/drm/xe/tests/xe_bo.c
>>> +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
>>> @@ -164,8 +164,6 @@ static int ccs_test_run_device(struct xe_device *xe)
>>>    		return 0;
>>>    	}
>>> -	xe_device_mem_access_get(xe);
>>> -
>>>    	for_each_tile(tile, xe, id) {
>>>    		/* For igfx run only for primary tile */
>>>    		if (!IS_DGFX(xe) && id > 0)
>>> @@ -173,8 +171,6 @@ static int ccs_test_run_device(struct xe_device *xe)
>>>    		ccs_test_run_tile(xe, tile, test);
>>>    	}
>>> -	xe_device_mem_access_put(xe);
>>> -
>>>    	return 0;
>>>    }
>>> @@ -336,13 +332,9 @@ static int evict_test_run_device(struct xe_device *xe)
>>>    		return 0;
>>>    	}
>>> -	xe_device_mem_access_get(xe);
>>> -
>>>    	for_each_tile(tile, xe, id)
>>>    		evict_test_run_tile(xe, tile, test);
>>> -	xe_device_mem_access_put(xe);
>>> -
>>>    	return 0;
>>>    }
>>> diff --git a/drivers/gpu/drm/xe/tests/xe_mocs.c b/drivers/gpu/drm/xe/tests/xe_mocs.c
>>> index 7dd34f94e8094..a12e7e2bb5861 100644
>>> --- a/drivers/gpu/drm/xe/tests/xe_mocs.c
>>> +++ b/drivers/gpu/drm/xe/tests/xe_mocs.c
>>> @@ -45,7 +45,6 @@ static void read_l3cc_table(struct xe_gt *gt,
>>>    	struct kunit *test = xe_cur_kunit();
>>> -	xe_device_mem_access_get(gt_to_xe(gt));
>>>    	ret = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
>>>    	KUNIT_ASSERT_EQ_MSG(test, ret, 0, "Forcewake Failed.\n");
>>>    	mocs_dbg(&gt_to_xe(gt)->drm, "L3CC entries:%d\n", info->n_entries);
>>> @@ -65,7 +64,6 @@ static void read_l3cc_table(struct xe_gt *gt,
>>>    				   XELP_LNCFCMOCS(i).addr);
>>>    	}
>>>    	xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
>>> -	xe_device_mem_access_put(gt_to_xe(gt));
>>>    }
>>>    static void read_mocs_table(struct xe_gt *gt,
>>> @@ -80,7 +78,6 @@ static void read_mocs_table(struct xe_gt *gt,
>>>    	struct kunit *test = xe_cur_kunit();
>>> -	xe_device_mem_access_get(gt_to_xe(gt));
>>>    	ret = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
>>>    	KUNIT_ASSERT_EQ_MSG(test, ret, 0, "Forcewake Failed.\n");
>>>    	mocs_dbg(&gt_to_xe(gt)->drm, "Global MOCS entries:%d\n", info->n_entries);
>>> @@ -100,7 +97,6 @@ static void read_mocs_table(struct xe_gt *gt,
>>>    				   XELP_GLOBAL_MOCS(i).addr);
>>>    	}
>>>    	xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
>>> -	xe_device_mem_access_put(gt_to_xe(gt));
>>>    }
>>>    static int mocs_kernel_test_run_device(struct xe_device *xe)
>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>>> index 8e4a3b1f6b938..056c65c2675d8 100644
>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>> @@ -715,7 +715,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>>>    	xe_assert(xe, migrate);
>>>    	trace_xe_bo_move(bo);
>>> -	xe_device_mem_access_get(xe);
>>>    	if (xe_bo_is_pinned(bo) && !xe_bo_is_user(bo)) {
>>>    		/*
>>> @@ -739,7 +738,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>>>    				if (XE_WARN_ON(new_mem->start == XE_BO_INVALID_OFFSET)) {
>>>    					ret = -EINVAL;
>>> -					xe_device_mem_access_put(xe);
>>>    					goto out;
>>>    				}
>>> @@ -757,7 +755,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>>>    						new_mem, handle_system_ccs);
>>>    		if (IS_ERR(fence)) {
>>>    			ret = PTR_ERR(fence);
>>> -			xe_device_mem_access_put(xe);
>>>    			goto out;
>>>    		}
>>>    		if (!move_lacks_source) {
>>> @@ -782,8 +779,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>>>    		dma_fence_put(fence);
>>>    	}
>>> -	xe_device_mem_access_put(xe);
>>> -
>>>    out:
>>>    	return ret;
>>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>>> index c1c19264a58b4..cb08a4369bb9e 100644
>>> --- a/drivers/gpu/drm/xe/xe_device.c
>>> +++ b/drivers/gpu/drm/xe/xe_device.c
>>> @@ -44,12 +44,6 @@
>>>    #include "xe_wait_user_fence.h"
>>>    #include "xe_hwmon.h"
>>> -#ifdef CONFIG_LOCKDEP
>>> -struct lockdep_map xe_device_mem_access_lockdep_map = {
>>> -	.name = "xe_device_mem_access_lockdep_map"
>>> -};
>>> -#endif
>>
>> Did you mean to drop this? IMO we should for sure keep the lockdep
>> annotations. Otherwise it is going to be really hard to validate the locking
>> design and have reasonable confidence that we don't have deadlocks lurking,
>> or as new users come along sprinkling rpm get in the wrong place.
> 
> Well, the whole goal of this series is to actually avoid sprinkling RPM calls at all.

I mean new users are bound to appear, and they might add such calls in 
the wrong place. Lockdep would hopefully catch such things for us.

> We should only protect the outer bounds. I'm afraid that if we put this to the outer
> bounds we would start getting false positives on this, no?!

What kind of false positives? With this series the sync rpm get should 
be the outermost thing for the most part, and so the locking dependencies 
should be minimal. If we drop the annotations we get no help from 
lockdep to tell us if the rpm resume and suspend callbacks are grabbing 
locks that are already held when calling the sync rpm get.
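
e.g. the usual priming pattern, roughly (sketch only, reusing the
existing map name; the same map would also be held around the bodies of
the resume/suspend callbacks):

	static void xe_rpm_lockmap_prime(void)
	{
		/* record "a resume could run here", so lockdep compares
		 * the locks held now against the ones taken inside the
		 * rpm callbacks */
		lock_map_acquire(&xe_device_mem_access_lockdep_map);
		lock_map_release(&xe_device_mem_access_lockdep_map);
	}

called at the start of every sync rpm get that is not on the callback
task itself.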


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 08/20] drm/xe: Runtime PM wake on every exec
  2024-01-09 17:41     ` Rodrigo Vivi
@ 2024-01-09 18:40       ` Matthew Auld
  0 siblings, 0 replies; 46+ messages in thread
From: Matthew Auld @ 2024-01-09 18:40 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On 09/01/2024 17:41, Rodrigo Vivi wrote:
> On Tue, Jan 09, 2024 at 11:24:34AM +0000, Matthew Auld wrote:
>> On 28/12/2023 02:12, Rodrigo Vivi wrote:
>>> Let's ensure our PCI device is awakened on every GT execution,
>>> through to the end of the execution.
>>> Let's increase the runtime_pm protection and start moving
>>> that to the outer bounds.
>>>
>>> Let's also remove the unnecessary mem_access get/put.
>>>
>>> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/xe_sched_job.c | 10 +++++-----
>>>    1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
>>> index 01106a1156ad8..0b30ec77fc5ad 100644
>>> --- a/drivers/gpu/drm/xe/xe_sched_job.c
>>> +++ b/drivers/gpu/drm/xe/xe_sched_job.c
>>> @@ -15,6 +15,7 @@
>>>    #include "xe_hw_fence.h"
>>>    #include "xe_lrc.h"
>>>    #include "xe_macros.h"
>>> +#include "xe_pm.h"
>>>    #include "xe_trace.h"
>>>    #include "xe_vm.h"
>>> @@ -67,6 +68,8 @@ static void job_free(struct xe_sched_job *job)
>>>    	struct xe_exec_queue *q = job->q;
>>>    	bool is_migration = xe_sched_job_is_migration(q);
>>> +	xe_pm_runtime_put(gt_to_xe(q->gt));
>>> +
>>>    	kmem_cache_free(xe_exec_queue_is_parallel(job->q) || is_migration ?
>>>    			xe_sched_job_parallel_slab : xe_sched_job_slab, job);
>>>    }
>>> @@ -86,6 +89,8 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
>>>    	int i, j;
>>>    	u32 width;
>>> +	xe_pm_runtime_get(gt_to_xe(q->gt));
>>> +
>>
>> This seems way too deep in the call chain. If this actually wakes up the
>> device we will end up with all of the same d3cold deadlock issues. Like here
>> we are for sure holding stuff like dma-resv, but the rpm callbacks also want
>> to grab it. IMO this needs to be something like runtime_get_if_active(),
>> with the upper layers already ensuring device is awake (like ioctl), so here
>> we are just keeping it awake until the job is done. Or maybe this is how it
>> is by the end of the series?
> 
> we have 2 cases here: one where the device is already awake because of
> the ioctl, and the other on the eviction preparation, which exits early
> because of the 'current' task check. So we should be good anyway, but you
> are right, maybe using the get_if_active is better here for clarity.

Yeah, looking at the code it is hard to know if the rpm get can actually 
wake up the device or not, or if waking up only happens in some tricky 
edge case or 1/1000 runs. That is also what is nice about the lockdep 
annotations, since we just assume that all rpm get calls can potentially 
wake the device up and lockdep knows exactly what locks are being held, 
and then so long as we have annotations in the rpm resume/suspend 
callback we can be reasonably sure we have no deadlocks, since it is 
just a question of code coverage, rather than actually needing to hit 
the 0 -> 1 transition for every caller on a real system in CI, which is 
kind of hard.

> 
>>
>>>    	/* only a kernel context can submit a vm-less job */
>>>    	XE_WARN_ON(!q->vm && !(q->flags & EXEC_QUEUE_FLAG_KERNEL));
>>> @@ -155,9 +160,6 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
>>>    	for (i = 0; i < width; ++i)
>>>    		job->batch_addr[i] = batch_addr[i];
>>> -	/* All other jobs require a VM to be open which has a ref */
>>> -	if (unlikely(q->flags & EXEC_QUEUE_FLAG_KERNEL))
>>> -		xe_device_mem_access_get(job_to_xe(job));
>>>    	xe_device_assert_mem_access(job_to_xe(job));
>>>    	trace_xe_sched_job_create(job);
>>> @@ -189,8 +191,6 @@ void xe_sched_job_destroy(struct kref *ref)
>>>    	struct xe_sched_job *job =
>>>    		container_of(ref, struct xe_sched_job, refcount);
>>> -	if (unlikely(job->q->flags & EXEC_QUEUE_FLAG_KERNEL))
>>> -		xe_device_mem_access_put(job_to_xe(job));
>>>    	xe_exec_queue_put(job->q);
>>>    	dma_fence_put(job->fence);
>>>    	drm_sched_job_cleanup(&job->drm);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 16/20] drm/xe: Remove mem_access calls from migration
  2024-01-09 17:58     ` Rodrigo Vivi
@ 2024-01-09 18:49       ` Matthew Auld
  2024-01-09 22:40         ` Rodrigo Vivi
  0 siblings, 1 reply; 46+ messages in thread
From: Matthew Auld @ 2024-01-09 18:49 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On 09/01/2024 17:58, Rodrigo Vivi wrote:
> On Tue, Jan 09, 2024 at 12:33:25PM +0000, Matthew Auld wrote:
>> On 28/12/2023 02:12, Rodrigo Vivi wrote:
> > > The sched jobs runtime pm calls already protect every execution,
>>> including these migration ones.
>>
>> Is job really enough here? I assume queue is only destroyed once it has no
>> more jobs and the final queue ref is dropped. And destroying the queue might
>> involve stuff like de-register the context with GuC etc. which needs to use
>> CT which will need rpm ref. What is holding the rpm if not the vm or queue?
> 
> > The exec queue is holding it to the end.

Can you share some more details? AFAIK the queue destruction is async, 
and previously the vm underneath is holding the rpm, or in the case of 
the migration vm, it was the queue itself. But for the migration vm case 
that is removed below. I guess I'm missing something here.

> 
>>
>>>
>>> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/tests/xe_migrate.c |  2 --
>>>    drivers/gpu/drm/xe/xe_device.c        | 17 -----------------
>>>    drivers/gpu/drm/xe/xe_device.h        |  1 -
>>>    drivers/gpu/drm/xe/xe_exec_queue.c    | 18 ------------------
>>>    4 files changed, 38 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
>>> index 7a32faa2f6888..2257f0a28435b 100644
>>> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
>>> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
>>> @@ -428,9 +428,7 @@ static int migrate_test_run_device(struct xe_device *xe)
>>>    		kunit_info(test, "Testing tile id %d.\n", id);
>>>    		xe_vm_lock(m->q->vm, true);
>>> -		xe_device_mem_access_get(xe);
>>>    		xe_migrate_sanity_test(m, test);
>>> -		xe_device_mem_access_put(xe);
>>>    		xe_vm_unlock(m->q->vm);
>>>    	}
>>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>>> index ee9b6612eec43..a7bec49da49fa 100644
>>> --- a/drivers/gpu/drm/xe/xe_device.c
>>> +++ b/drivers/gpu/drm/xe/xe_device.c
>>> @@ -675,23 +675,6 @@ void xe_device_assert_mem_access(struct xe_device *xe)
>>>    	XE_WARN_ON(xe_pm_runtime_suspended(xe));
>>>    }
>>> -bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe)
>>> -{
>>> -	bool active;
>>> -
>>> -	if (xe_pm_read_callback_task(xe) == current)
>>> -		return true;
>>> -
>>> -	active = xe_pm_runtime_get_if_active(xe);
>>> -	if (active) {
>>> -		int ref = atomic_inc_return(&xe->mem_access.ref);
>>> -
>>> -		xe_assert(xe, ref != S32_MAX);
>>> -	}
>>> -
>>> -	return active;
>>> -}
>>> -
>>>    void xe_device_mem_access_get(struct xe_device *xe)
>>>    {
>>>    	int ref;
>>> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>>> index af8ac2e9e2709..4acf4c2973390 100644
>>> --- a/drivers/gpu/drm/xe/xe_device.h
>>> +++ b/drivers/gpu/drm/xe/xe_device.h
>>> @@ -142,7 +142,6 @@ static inline struct xe_force_wake *gt_to_fw(struct xe_gt *gt)
>>>    }
>>>    void xe_device_mem_access_get(struct xe_device *xe);
>>> -bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe);
>>>    void xe_device_mem_access_put(struct xe_device *xe);
>>>    void xe_device_assert_mem_access(struct xe_device *xe);
>>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>>> index 44fe8097b7cda..d3a8d2d8caaaf 100644
>>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>>> @@ -87,17 +87,6 @@ static struct xe_exec_queue *__xe_exec_queue_create(struct xe_device *xe,
>>>    	if (err)
>>>    		goto err_lrc;
>>> -	/*
>>> -	 * Normally the user vm holds an rpm ref to keep the device
>>> -	 * awake, and the context holds a ref for the vm, however for
>>> -	 * some engines we use the kernels migrate vm underneath which offers no
>>> -	 * such rpm ref, or we lack a vm. Make sure we keep a ref here, so we
>>> -	 * can perform GuC CT actions when needed. Caller is expected to have
>>> -	 * already grabbed the rpm ref outside any sensitive locks.
>>> -	 */
>>> -	if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !vm))
>>> -		drm_WARN_ON(&xe->drm, !xe_device_mem_access_get_if_ongoing(xe));
>>> -
>>>    	return q;
>>>    err_lrc:
>>> @@ -172,8 +161,6 @@ void xe_exec_queue_fini(struct xe_exec_queue *q)
>>>    	for (i = 0; i < q->width; ++i)
>>>    		xe_lrc_finish(q->lrc + i);
>>> -	if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !q->vm))
>>> -		xe_device_mem_access_put(gt_to_xe(q->gt));
>>>    	if (q->vm)
>>>    		xe_vm_put(q->vm);
>>> @@ -643,9 +630,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>>>    			if (XE_IOCTL_DBG(xe, !hwe))
>>>    				return -EINVAL;
>>> -			/* The migration vm doesn't hold rpm ref */
>>> -			xe_device_mem_access_get(xe);
>>> -
>>>    			migrate_vm = xe_migrate_get_vm(gt_to_tile(gt)->migrate);
>>>    			new = xe_exec_queue_create(xe, migrate_vm, logical_mask,
>>>    						   args->width, hwe,
>>> @@ -655,8 +639,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>>>    						    EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD :
>>>    						    0));
>>> -			xe_device_mem_access_put(xe); /* now held by engine */
>>> -
>>>    			xe_vm_put(migrate_vm);
>>>    			if (IS_ERR(new)) {
>>>    				err = PTR_ERR(new);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 20/20] drm/xe: Mega Kill of mem_access
  2024-01-09 18:27       ` Matthew Auld
@ 2024-01-09 22:34         ` Rodrigo Vivi
  0 siblings, 0 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2024-01-09 22:34 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-xe

On Tue, Jan 09, 2024 at 06:27:13PM +0000, Matthew Auld wrote:
> On 09/01/2024 17:39, Rodrigo Vivi wrote:
> > On Tue, Jan 09, 2024 at 11:41:35AM +0000, Matthew Auld wrote:
> > > On 28/12/2023 02:12, Rodrigo Vivi wrote:
> > > > All of these remaining cases should already be protected
> > > > by the outer bound calls of runtime_pm
> > > > 
> > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > ---
> > > >    drivers/gpu/drm/xe/display/xe_fb_pin.c |  7 +--
> > > >    drivers/gpu/drm/xe/tests/xe_bo.c       |  8 ----
> > > >    drivers/gpu/drm/xe/tests/xe_mocs.c     |  4 --
> > > >    drivers/gpu/drm/xe/xe_bo.c             |  5 ---
> > > >    drivers/gpu/drm/xe/xe_device.c         | 59 --------------------------
> > > >    drivers/gpu/drm/xe/xe_device.h         |  7 ---
> > > >    drivers/gpu/drm/xe/xe_device_types.h   |  9 ----
> > > >    drivers/gpu/drm/xe/xe_ggtt.c           |  6 ---
> > > >    drivers/gpu/drm/xe/xe_gsc.c            |  3 --
> > > >    drivers/gpu/drm/xe/xe_gt.c             | 17 --------
> > > >    drivers/gpu/drm/xe/xe_huc_debugfs.c    |  2 -
> > > >    drivers/gpu/drm/xe/xe_pat.c            | 10 -----
> > > >    drivers/gpu/drm/xe/xe_pm.c             | 27 ------------
> > > >    drivers/gpu/drm/xe/xe_query.c          |  4 --
> > > >    drivers/gpu/drm/xe/xe_tile.c           | 10 ++---
> > > >    drivers/gpu/drm/xe/xe_vm.c             |  7 ---
> > > >    16 files changed, 5 insertions(+), 180 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > > > index 722c84a566073..077294ec50ece 100644
> > > > --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > > > +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > > > @@ -190,10 +190,9 @@ static int __xe_pin_fb_vma_ggtt(struct intel_framebuffer *fb,
> > > >    	/* TODO: Consider sharing framebuffer mapping?
> > > >    	 * embed i915_vma inside intel_framebuffer
> > > >    	 */
> > > > -	xe_device_mem_access_get(tile_to_xe(ggtt->tile));
> > > >    	ret = mutex_lock_interruptible(&ggtt->lock);
> > > >    	if (ret)
> > > > -		goto out;
> > > > +		return ret;
> > > >    	align = XE_PAGE_SIZE;
> > > >    	if (xe_bo_is_vram(bo) && ggtt->flags & XE_GGTT_FLAGS_64K)
> > > > @@ -241,8 +240,6 @@ static int __xe_pin_fb_vma_ggtt(struct intel_framebuffer *fb,
> > > >    	xe_ggtt_invalidate(ggtt);
> > > >    out_unlock:
> > > >    	mutex_unlock(&ggtt->lock);
> > > > -out:
> > > > -	xe_device_mem_access_put(tile_to_xe(ggtt->tile));
> > > >    	return ret;
> > > >    }
> > > > @@ -381,4 +378,4 @@ struct i915_address_space *intel_dpt_create(struct intel_framebuffer *fb)
> > > >    void intel_dpt_destroy(struct i915_address_space *vm)
> > > >    {
> > > >    	return;
> > > > -}
> > > > \ No newline at end of file
> > > > +}
> > > > diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
> > > > index 412b2e7ce40cb..97b10e597f0ad 100644
> > > > --- a/drivers/gpu/drm/xe/tests/xe_bo.c
> > > > +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
> > > > @@ -164,8 +164,6 @@ static int ccs_test_run_device(struct xe_device *xe)
> > > >    		return 0;
> > > >    	}
> > > > -	xe_device_mem_access_get(xe);
> > > > -
> > > >    	for_each_tile(tile, xe, id) {
> > > >    		/* For igfx run only for primary tile */
> > > >    		if (!IS_DGFX(xe) && id > 0)
> > > > @@ -173,8 +171,6 @@ static int ccs_test_run_device(struct xe_device *xe)
> > > >    		ccs_test_run_tile(xe, tile, test);
> > > >    	}
> > > > -	xe_device_mem_access_put(xe);
> > > > -
> > > >    	return 0;
> > > >    }
> > > > @@ -336,13 +332,9 @@ static int evict_test_run_device(struct xe_device *xe)
> > > >    		return 0;
> > > >    	}
> > > > -	xe_device_mem_access_get(xe);
> > > > -
> > > >    	for_each_tile(tile, xe, id)
> > > >    		evict_test_run_tile(xe, tile, test);
> > > > -	xe_device_mem_access_put(xe);
> > > > -
> > > >    	return 0;
> > > >    }
> > > > diff --git a/drivers/gpu/drm/xe/tests/xe_mocs.c b/drivers/gpu/drm/xe/tests/xe_mocs.c
> > > > index 7dd34f94e8094..a12e7e2bb5861 100644
> > > > --- a/drivers/gpu/drm/xe/tests/xe_mocs.c
> > > > +++ b/drivers/gpu/drm/xe/tests/xe_mocs.c
> > > > @@ -45,7 +45,6 @@ static void read_l3cc_table(struct xe_gt *gt,
> > > >    	struct kunit *test = xe_cur_kunit();
> > > > -	xe_device_mem_access_get(gt_to_xe(gt));
> > > >    	ret = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
> > > >    	KUNIT_ASSERT_EQ_MSG(test, ret, 0, "Forcewake Failed.\n");
> > > >    	mocs_dbg(&gt_to_xe(gt)->drm, "L3CC entries:%d\n", info->n_entries);
> > > > @@ -65,7 +64,6 @@ static void read_l3cc_table(struct xe_gt *gt,
> > > >    				   XELP_LNCFCMOCS(i).addr);
> > > >    	}
> > > >    	xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
> > > > -	xe_device_mem_access_put(gt_to_xe(gt));
> > > >    }
> > > >    static void read_mocs_table(struct xe_gt *gt,
> > > > @@ -80,7 +78,6 @@ static void read_mocs_table(struct xe_gt *gt,
> > > >    	struct kunit *test = xe_cur_kunit();
> > > > -	xe_device_mem_access_get(gt_to_xe(gt));
> > > >    	ret = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
> > > >    	KUNIT_ASSERT_EQ_MSG(test, ret, 0, "Forcewake Failed.\n");
> > > >    	mocs_dbg(&gt_to_xe(gt)->drm, "Global MOCS entries:%d\n", info->n_entries);
> > > > @@ -100,7 +97,6 @@ static void read_mocs_table(struct xe_gt *gt,
> > > >    				   XELP_GLOBAL_MOCS(i).addr);
> > > >    	}
> > > >    	xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
> > > > -	xe_device_mem_access_put(gt_to_xe(gt));
> > > >    }
> > > >    static int mocs_kernel_test_run_device(struct xe_device *xe)
> > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > > index 8e4a3b1f6b938..056c65c2675d8 100644
> > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > @@ -715,7 +715,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
> > > >    	xe_assert(xe, migrate);
> > > >    	trace_xe_bo_move(bo);
> > > > -	xe_device_mem_access_get(xe);
> > > >    	if (xe_bo_is_pinned(bo) && !xe_bo_is_user(bo)) {
> > > >    		/*
> > > > @@ -739,7 +738,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
> > > >    				if (XE_WARN_ON(new_mem->start == XE_BO_INVALID_OFFSET)) {
> > > >    					ret = -EINVAL;
> > > > -					xe_device_mem_access_put(xe);
> > > >    					goto out;
> > > >    				}
> > > > @@ -757,7 +755,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
> > > >    						new_mem, handle_system_ccs);
> > > >    		if (IS_ERR(fence)) {
> > > >    			ret = PTR_ERR(fence);
> > > > -			xe_device_mem_access_put(xe);
> > > >    			goto out;
> > > >    		}
> > > >    		if (!move_lacks_source) {
> > > > @@ -782,8 +779,6 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
> > > >    		dma_fence_put(fence);
> > > >    	}
> > > > -	xe_device_mem_access_put(xe);
> > > > -
> > > >    out:
> > > >    	return ret;
> > > > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > > > index c1c19264a58b4..cb08a4369bb9e 100644
> > > > --- a/drivers/gpu/drm/xe/xe_device.c
> > > > +++ b/drivers/gpu/drm/xe/xe_device.c
> > > > @@ -44,12 +44,6 @@
> > > >    #include "xe_wait_user_fence.h"
> > > >    #include "xe_hwmon.h"
> > > > -#ifdef CONFIG_LOCKDEP
> > > > -struct lockdep_map xe_device_mem_access_lockdep_map = {
> > > > -	.name = "xe_device_mem_access_lockdep_map"
> > > > -};
> > > > -#endif
> > > 
> > > Did you mean to drop this? IMO we should for sure keep the lockdep
> > > annotations. Otherwise it is going to be really hard to validate the locking
> > > design and have reasonable confidence that we don't have deadlocks lurking,
> > > or as new users come along sprinkling rpm get in the wrong place.
> > 
> > Well, the whole goal of this series is to actually avoid sprinkling RPM calls at all.
> 
> I mean new users are bound to appear, and they might add such calls in the
> wrong place. Lockdep would hopefully catch such things for us.

we should actually catch that during reviews and push back ;)
But I do see your point.

> 
> > We should only protect the outer bounds. I'm afraid that if we put this to the outer
> > bounds we would start getting false positives on this, no?!
> 
> What kind of false positives? With this series the sync rpm get should be
> the outermost thing for the most part, and so the locking dependencies should
> be minimal. If we drop the annotations we get no help from lockdep to tell
> us if the rpm resume and suspend callbacks are grabbing locks that are
> already held when calling the sync rpm get.

yeap, you are right. I honestly didn't think so deeply there and was just
afraid of some inversions.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 16/20] drm/xe: Remove mem_access calls from migration
  2024-01-09 18:49       ` Matthew Auld
@ 2024-01-09 22:40         ` Rodrigo Vivi
  2024-01-11 14:17           ` Matthew Brost
  0 siblings, 1 reply; 46+ messages in thread
From: Rodrigo Vivi @ 2024-01-09 22:40 UTC (permalink / raw)
  To: Matthew Auld, matthew.brost; +Cc: intel-xe

On Tue, Jan 09, 2024 at 06:49:47PM +0000, Matthew Auld wrote:
> On 09/01/2024 17:58, Rodrigo Vivi wrote:
> > On Tue, Jan 09, 2024 at 12:33:25PM +0000, Matthew Auld wrote:
> > > On 28/12/2023 02:12, Rodrigo Vivi wrote:
> > > > The sched jobs runtime pm calls already protect every execution,
> > > > including these migration ones.
> > > 
> > > Is job really enough here? I assume queue is only destroyed once it has no
> > > more jobs and the final queue ref is dropped. And destroying the queue might
> > > involve stuff like de-register the context with GuC etc. which needs to use
> > > CT which will need rpm ref. What is holding the rpm if not the vm or queue?
> > 
> > The exec queue is holding it to the end.
> 
> Can you share some more details? AFAIK the queue destruction is async, and
> previously the vm underneath is holding the rpm, or in the case of the
> migration vm, it was the queue itself. But for the migration vm case that is
> removed below. I guess I'm missing something here.

Cc: Matthew Brost

I had understood that every vm case would use the exec_queue and the destroy
would cover them all. But if there are more VM operations happening beyond
the exec_queue destroy, then we need something else?

Brost, any suggestion for an outer-bound place here where we would be safe,
but without entirely killing the RPM?
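
One option I could imagine (rough sketch, not what this series does):
let the queue itself own a ref for its whole async lifetime, taken
where the caller is known to be awake and dropped only in fini:

	/* in __xe_exec_queue_create(): caller holds an outer rpm ref,
	 * so this never triggers a resume */
	xe_pm_runtime_get(xe);

	/* in xe_exec_queue_fini(): after the GuC context teardown
	 * ping-pong has completed */
	xe_pm_runtime_put(gt_to_xe(q->gt));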

> 
> > 
> > > 
> > > > 
> > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > ---
> > > >    drivers/gpu/drm/xe/tests/xe_migrate.c |  2 --
> > > >    drivers/gpu/drm/xe/xe_device.c        | 17 -----------------
> > > >    drivers/gpu/drm/xe/xe_device.h        |  1 -
> > > >    drivers/gpu/drm/xe/xe_exec_queue.c    | 18 ------------------
> > > >    4 files changed, 38 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > > > index 7a32faa2f6888..2257f0a28435b 100644
> > > > --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> > > > +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > > > @@ -428,9 +428,7 @@ static int migrate_test_run_device(struct xe_device *xe)
> > > >    		kunit_info(test, "Testing tile id %d.\n", id);
> > > >    		xe_vm_lock(m->q->vm, true);
> > > > -		xe_device_mem_access_get(xe);
> > > >    		xe_migrate_sanity_test(m, test);
> > > > -		xe_device_mem_access_put(xe);
> > > >    		xe_vm_unlock(m->q->vm);
> > > >    	}
> > > > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > > > index ee9b6612eec43..a7bec49da49fa 100644
> > > > --- a/drivers/gpu/drm/xe/xe_device.c
> > > > +++ b/drivers/gpu/drm/xe/xe_device.c
> > > > @@ -675,23 +675,6 @@ void xe_device_assert_mem_access(struct xe_device *xe)
> > > >    	XE_WARN_ON(xe_pm_runtime_suspended(xe));
> > > >    }
> > > > -bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe)
> > > > -{
> > > > -	bool active;
> > > > -
> > > > -	if (xe_pm_read_callback_task(xe) == current)
> > > > -		return true;
> > > > -
> > > > -	active = xe_pm_runtime_get_if_active(xe);
> > > > -	if (active) {
> > > > -		int ref = atomic_inc_return(&xe->mem_access.ref);
> > > > -
> > > > -		xe_assert(xe, ref != S32_MAX);
> > > > -	}
> > > > -
> > > > -	return active;
> > > > -}
> > > > -
> > > >    void xe_device_mem_access_get(struct xe_device *xe)
> > > >    {
> > > >    	int ref;
> > > > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> > > > index af8ac2e9e2709..4acf4c2973390 100644
> > > > --- a/drivers/gpu/drm/xe/xe_device.h
> > > > +++ b/drivers/gpu/drm/xe/xe_device.h
> > > > @@ -142,7 +142,6 @@ static inline struct xe_force_wake *gt_to_fw(struct xe_gt *gt)
> > > >    }
> > > >    void xe_device_mem_access_get(struct xe_device *xe);
> > > > -bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe);
> > > >    void xe_device_mem_access_put(struct xe_device *xe);
> > > >    void xe_device_assert_mem_access(struct xe_device *xe);
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > index 44fe8097b7cda..d3a8d2d8caaaf 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > @@ -87,17 +87,6 @@ static struct xe_exec_queue *__xe_exec_queue_create(struct xe_device *xe,
> > > >    	if (err)
> > > >    		goto err_lrc;
> > > > -	/*
> > > > -	 * Normally the user vm holds an rpm ref to keep the device
> > > > -	 * awake, and the context holds a ref for the vm, however for
> > > > -	 * some engines we use the kernels migrate vm underneath which offers no
> > > > -	 * such rpm ref, or we lack a vm. Make sure we keep a ref here, so we
> > > > -	 * can perform GuC CT actions when needed. Caller is expected to have
> > > > -	 * already grabbed the rpm ref outside any sensitive locks.
> > > > -	 */
> > > > -	if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !vm))
> > > > -		drm_WARN_ON(&xe->drm, !xe_device_mem_access_get_if_ongoing(xe));
> > > > -
> > > >    	return q;
> > > >    err_lrc:
> > > > @@ -172,8 +161,6 @@ void xe_exec_queue_fini(struct xe_exec_queue *q)
> > > >    	for (i = 0; i < q->width; ++i)
> > > >    		xe_lrc_finish(q->lrc + i);
> > > > -	if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !q->vm))
> > > > -		xe_device_mem_access_put(gt_to_xe(q->gt));
> > > >    	if (q->vm)
> > > >    		xe_vm_put(q->vm);
> > > > @@ -643,9 +630,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
> > > >    			if (XE_IOCTL_DBG(xe, !hwe))
> > > >    				return -EINVAL;
> > > > -			/* The migration vm doesn't hold rpm ref */
> > > > -			xe_device_mem_access_get(xe);
> > > > -
> > > >    			migrate_vm = xe_migrate_get_vm(gt_to_tile(gt)->migrate);
> > > >    			new = xe_exec_queue_create(xe, migrate_vm, logical_mask,
> > > >    						   args->width, hwe,
> > > > @@ -655,8 +639,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
> > > >    						    EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD :
> > > >    						    0));
> > > > -			xe_device_mem_access_put(xe); /* now held by engine */
> > > > -
> > > >    			xe_vm_put(migrate_vm);
> > > >    			if (IS_ERR(new)) {
> > > >    				err = PTR_ERR(new);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 00/20] First attempt to kill mem_access
  2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
                   ` (22 preceding siblings ...)
  2024-01-04  5:41 ` ✗ CI.KUnit: failure " Patchwork
@ 2024-01-10  5:21 ` Matthew Brost
  2024-01-10 14:06   ` Rodrigo Vivi
  23 siblings, 1 reply; 46+ messages in thread
From: Matthew Brost @ 2024-01-10  5:21 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On Wed, Dec 27, 2023 at 09:12:12PM -0500, Rodrigo Vivi wrote:
> At first the mem_access seemed a good idea since it would ensure
> we could map every memory access and apply some workarounds and
> then use that to ensure that the device is awake.
> 
> However it has become a nightmare in locking conflicts with memory
> locking. The only sane way to go is to move the runtime_pm protection
> to the outer bounds and ensure that the device is resumed way
> before memory locking.
> 
> So, this RFC here is the first attempt to kill the mem access and
> have a clean rpm handling on the outer bounds.
> 
> Well, at this time we already know that we need to solve some TLB
> invalidation issues and the last patch in this series needs to
> be split in smaller pieces. But I'd like to at least get
> the discussion started.
> 
> Happy New Year,
> Rodrigo.
> 

Hi Rodrigo - I haven't fully reviewed everything but noticed a few
issues to discuss.

1. LR mode VMs
	- I don't think the PM refs taken for LR jobs work. An LR job's hw
	  fence is signalled immediately after scheduling the job to the
	  hardware. Once the hw fence is signalled, the job can
	  typically be freed.
	- How about we just take a PM reference when an LR VM is opened?

2. Tearing down exec queues
	- Tearing down exec queues requires a ping-pong with the GuC
	  which likely needs a PM ref

3. Schedule enable G2H
	- First job on an exec queue will issue schedule enable H2G
	  which results in a G2H. This G2H could be received after the
	  job is freed

4. TLB Invalidations
	- Send H2G, receive G2H when done
	- Four cases
		a) From a (un)bind job
			- Job can be freed before the invalidation is
			  issued / completed
		b) GGTT invalidations
			- BO creation, should be covered by IOCTL PM ref
		c) Userptr invalidation / BO move on LR VM
			- should be covered by #1 if LR VM takes a PM ref
		d) Page fault handler
			- should be covered by #1 if LR VM takes a PM ref

5. SRIOV Relay?
	- Haven't looked into this at all; might have issues here too?

2, 3, 4a all are H2G waiting on G2H. Perhaps it is simplest to build the
PM references into the CT layer? A lower layer, but off the top of my head
I'm not seeing a better option really.

e.g. a CT send that expects a G2H takes a PM ref, with the caveat that we
expect the device to already hold a PM ref. The receive side can then drop
the PM ref, letting the count transition to zero.
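
Roughly (sketch; where exactly this lands in the CT code and the
noresume flavor are just my first guess):

	/* CT send path, when a G2H is expected; the device must
	 * already be awake here, so never trigger a resume */
	pm_runtime_get_noresume(ct_to_xe(ct)->drm.dev);

	/* matching G2H receive path; the last put can let the
	 * device transition to zero / runtime suspend */
	pm_runtime_put(ct_to_xe(ct)->drm.dev);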

Thoughts?

Matt

> [snipped: quoted patch list and diffstat]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 00/20] First attempt to kill mem_access
  2024-01-10  5:21 ` [RFC 00/20] " Matthew Brost
@ 2024-01-10 14:06   ` Rodrigo Vivi
  2024-01-10 14:08     ` Vivi, Rodrigo
  2024-01-10 14:33     ` Matthew Brost
  0 siblings, 2 replies; 46+ messages in thread
From: Rodrigo Vivi @ 2024-01-10 14:06 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

On Wed, Jan 10, 2024 at 05:21:34AM +0000, Matthew Brost wrote:
> On Wed, Dec 27, 2023 at 09:12:12PM -0500, Rodrigo Vivi wrote:
> > [snipped: quoted cover letter]
> 
> Hi Rodrigo - I haven't fully reviewed everything but noticed a few
> issues to discuss.

+Auld, who was also raising very similar concerns.

> 
> 1. LR mode VMs
> 	- I don't think the PM refs taken for LR jobs work. An LR job's hw
> 	  fence is signalled immediately after scheduling the job to the
> 	  hardware. Once the hw fence is signalled, the job can
> 	  typically be freed.
> 	- How about we just take a PM reference when an LR VM is opened?

I like this idea!
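
something like this, I guess (sketch only, not compile-tested; the
helper and flag names are invented, the real LR-mode check may differ):

/* Pin the device for the whole life of a long-running VM, so userptr
 * invalidations and page faults always find it awake.
 * XE_VM_FLAG_LR_MODE is a placeholder name. */
static void xe_vm_lr_pm_get(struct xe_vm *vm)
{
	if (vm->flags & XE_VM_FLAG_LR_MODE)
		xe_pm_runtime_get(vm->xe);
}

static void xe_vm_lr_pm_put(struct xe_vm *vm)
{
	if (vm->flags & XE_VM_FLAG_LR_MODE)
		xe_pm_runtime_put(vm->xe);
}

with the get called right before xe_vm_create() returns and the put at
the very end of the VM teardown.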

> 
> 2. Tearing down exec queues
> 	- Tearing down exec queues requires a ping-pong with the GuC
> 	  which likely needs a PM ref

would the idea of taking a ref with the CT sends that expect a G2H help here as well?
(calling it the CT-expecting-G2H-ref from now on)

> 
> 3. Schedule enable G2H
> 	- First job on an exec queue will issue a schedule enable H2G
> 	  which results in a G2H. This G2H could be received after the
> 	  job is freed

for this, the CT-expecting-G2H-ref would be enough, right?

> 
> 4. TLB Invalidations
> 	- Send H2G, receive G2H when done

for this, the CT-expecting-G2H-ref would be enough, right?

> 	- Four cases
> 		a) From an (un)bind job
> 			- Job can be freed before invalidation is issued /
> 			  completed

hmm... I believe I have faced this at some point.
would the CT-expecting-G2H-ref help here?
or any other idea to cover this case?

> 		b) GGTT invalidations
> 			- BO creation, should be covered by IOCTL PM ref

this should be okay then.

> 		c) Userptr invalidation / BO move on LR VM
> 			- should be covered by #1 if LR VM takes PM ref
> 		d) Page fault handler
> 			- should be covered by #1 if LR VM takes PM ref
> 

these (c and d) would be okay with the LR-VM ref, right?

> 5. SRIOV Relay?
> 	- Haven't looked into this at all, might have issues here too?

would this be covered as well by the CT-expecting-G2H-ref?
or is some big hammer needed, like blocking rpm anytime that we have a VF?

> 
> 2, 3 and 4a are all H2Gs waiting on a G2H. Perhaps it is simplest to build
> the PM references into the CT layer? A lower layer, but off the top of my
> head I'm not seeing a better option really.
> 
> e.g. A CT send that expects a G2H takes a PM ref, with the caveat that we
> expect the device to already hold a PM ref. The receive side can drop the
> PM ref, and that drop is allowed to take the count to zero.

one extra reason to keep the lockdep checks, but that should be okay
I believe. I will try it here.
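
for reference, the kind of lockdep annotation I'm thinking of bringing
back (rough sketch, not compile-tested; do_the_actual_resume() is just
a placeholder for the real resume body):

static struct lockdep_map xe_pm_runtime_lockdep_map = {
	.name = "xe_pm_runtime_lockdep_map",
};

int xe_pm_runtime_resume(struct xe_device *xe)
{
	int err;

	/* everything the resume path locks gets recorded under this map */
	lock_map_acquire(&xe_pm_runtime_lockdep_map);
	err = do_the_actual_resume(xe);	/* placeholder */
	lock_map_release(&xe_pm_runtime_lockdep_map);

	return err;
}

void xe_pm_runtime_get(struct xe_device *xe)
{
	/* a get that may resume "takes" the same map first, so lockdep
	 * complains if the caller holds any lock the resume path needs */
	if (xe_pm_read_callback_task(xe) != current) {
		lock_map_acquire(&xe_pm_runtime_lockdep_map);
		lock_map_release(&xe_pm_runtime_lockdep_map);
	}

	pm_runtime_get_sync(xe->drm.dev);
}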

> 
> Thoughts?

Basically it looks like we need to:
1. Get back the lockdep checks.
2. Add a big hammer around LR VMs (outer bound refs at VM creation and destruction if LR).
3. Add an inner rpm get around CT messages that expect a G2H back, with a put on the G2H response.
For this, do you have any good idea of the right places and conditions for the proper balance?
and how to ensure that we don't keep holding the ref forever in case we never get the response...

Anything else that I might be missing?

Thank you all for all the great comments and suggestions!

> 
> Matt
> 
> > [snipped: quoted patch list and diffstat]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 00/20] First attempt to kill mem_access
  2024-01-10 14:06   ` Rodrigo Vivi
@ 2024-01-10 14:08     ` Vivi, Rodrigo
  2024-01-10 14:33     ` Matthew Brost
  1 sibling, 0 replies; 46+ messages in thread
From: Vivi, Rodrigo @ 2024-01-10 14:08 UTC (permalink / raw)
  To: Brost, Matthew, Auld, Matthew; +Cc: intel-xe@lists.freedesktop.org

On Wed, 2024-01-10 at 09:06 -0500, Rodrigo Vivi wrote:
> On Wed, Jan 10, 2024 at 05:21:34AM +0000, Matthew Brost wrote:
> > On Wed, Dec 27, 2023 at 09:12:12PM -0500, Rodrigo Vivi wrote:
> > > [snipped: quoted cover letter]
> > 
> > Hi Rodrigo - I haven't fully reviewed everything but noticed a few
> > issues to discuss.
> 
> +Auld, who was also raising very similar concerns.
(actually doing it)

> [snipped: remainder of quoted message]


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 00/20] First attempt to kill mem_access
  2024-01-10 14:06   ` Rodrigo Vivi
  2024-01-10 14:08     ` Vivi, Rodrigo
@ 2024-01-10 14:33     ` Matthew Brost
  1 sibling, 0 replies; 46+ messages in thread
From: Matthew Brost @ 2024-01-10 14:33 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On Wed, Jan 10, 2024 at 09:06:00AM -0500, Rodrigo Vivi wrote:
> On Wed, Jan 10, 2024 at 05:21:34AM +0000, Matthew Brost wrote:
> > On Wed, Dec 27, 2023 at 09:12:12PM -0500, Rodrigo Vivi wrote:
> > > At first the mem_access seemed a good idea since it would ensure
> > > we could map every memory access and apply some workarounds and
> > > then use that to ensure that the device is awake.
> > > 
> > > However it has become a nightmare in locking conflicts with memory
> > > locking. The only sane way to go is to move the runtime_pm protection
> > > to the outer bounds and ensure that the device is resumed way
> > > before memory locking.
> > > 
> > > So, this RFC here is the first attempt to kill the mem access and
> > > have a clean rpm handling on the outer bounds.
> > > 
> > > Well, at this time we already know that we need to solve some TLB
> > > invalidation issues and the last patch in this series needs to
> > > be split in smaller pieces. But I'd like to at lest get
> > > the discussion started.
> > > 
> > > Happy New Year,
> > > Rodrigo.
> > > 
> > 
> > Hi Rodrigo - I haven't fully reviewed everything but noticed a few
> > issues to discuss.
> 
> +Auld, who was also raising very similar concerns.
> 
> > 
> > 1. LR mode VMs
> > 	- I don't think the PM refs taken for LR jobs work. An LR job's hw
> > 	  fence is signalled immediately after scheduling the job to the
> > 	  hardware. Once the hw fence is signalled, the job can
> > 	  typically be freed.
> > 	- How about we just take a PM reference when an LR VM is opened?
> 
> I like this idea!
> 
> > 
> > 2. Tearing down exec queues
> > 	- Tearing down exec queues requires a ping-pong with the GuC
> > 	  which likely needs a PM ref
> 
> would the idea of taking a ref with the CT sends that expect a G2H help here as well?
> (calling it the CT-expecting-G2H-ref from now on)
> 

Yes.

> > 
> > 3. Schedule enable G2H
> > 	- First job on an exec queue will issue a schedule enable H2G
> > 	  which results in a G2H. This G2H could be received after the
> > 	  job is freed
> 
> for this, the CT-expecting-G2H-ref would be enough, right?
> 

Yes.

> > 
> > 4. TLB Invalidations
> > 	- Send H2G, receive G2H when done
> 
> for this, the CT-expecting-G2H-ref would be enough, right?
> 

Yes, more details below.

> > 	- Four cases
> > 		a) From an (un)bind job
> > 			- Job can be freed before invalidation is issued /
> > 			  completed
> 
> hmm... I believe I have faced this at some point.
> would the CT-expecting-G2H-ref help here?
> or any other idea to cover this case?
>

Yes, and also maybe an extra ref taken somewhere safe that ensures the
device doesn't try to go to sleep between the bind job completing and
the TLB invalidation being issued. This also ensures the send in the CT
layer doesn't wake the device.

> > 		b) GGTT invalidations
> > 			- BO creation, should be covered by IOCTL PM ref
> 
> this should be okay then.
>

Yep.
 
> > 		c) Userptr invalidation / BO move on LR VM
> > 			- should be covered by #1 if LR VM takes PM ref
> > 		d) Page fault handler
> > 			- should be covered by #1 if LR VM takes PM ref
> > 
> 
> these (c and d) would be okay with the LR-VM ref, right?
> 

Yep.

> > 5. SRIOV Relay?
> > 	- Haven't looked into this at all, might have issues here too?
> 
> would this be covered as well by the CT-expecting-G2H-ref?
> or is some big hammer needed, like blocking rpm anytime that we have a VF?
>

Discussed in chat; I need to wrap my head around this usage, but one of
these is likely true.
 
> > 
> > 2, 3 and 4a are all H2Gs waiting on a G2H. Perhaps it is simplest to build
> > the PM references into the CT layer? A lower layer, but off the top of my
> > head I'm not seeing a better option really.
> > 
> > e.g. A CT send that expects a G2H takes a PM ref, with the caveat that we
> > expect the device to already hold a PM ref. The receive side can drop the
> > PM ref, and that drop is allowed to take the count to zero.
> 
> one extra reason to keep the lockdep checks, but that should be okay
> I believe. I will try it here.
> 
> > 
> > Thoughts?
> 
> Basically it looks like we need to:
> 1. Get back the lockdep checks.
> 2. Add a big hammer around LR VMs (outer bound refs at VM creation and destruction if LR).
> 3. Add an inner rpm get around CT messages that expect a G2H back, with a put on the G2H response.

This sounds right.

> For this, do you have any good idea of the right places and conditions for the proper balance?

I can maybe send a snippet of code to the list for this and let you take
it from there. It should be enough to get us aligned.

> and how to ensure that we don't keep holding the ref forever in case we never get the response...
> 

Yep, this is where it gets tricky - resets / lost G2H. G2H should only
be lost on devices not behaving correctly, which should eventually result
in a GT reset. We will have to clean up the PM refs during resets. I'll
include this in a snippet too.
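
Something along these lines on the reset side (sketch only; the
outstanding-G2H counter is an invented name for illustration):

/* On GT reset, any H2G still waiting on a G2H will never get its
 * reply, so drop the PM refs the CT layer took for them. */
static void ct_reset_drop_pm_refs(struct xe_guc_ct *ct)
{
	struct xe_device *xe = ct_to_xe(ct);
	int pending = atomic_xchg(&ct->outstanding_g2h, 0);

	while (pending--)
		pm_runtime_put(xe->drm.dev);
}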

Note that this change makes it very important that the KMD doesn't leak
G2H via software bugs - I remember the initial GuC submission code in
the i915 had quite a few bugs related to this. I think Xe is coded
bug-free, but it will be very important to ensure that it stays that way.

> Anything else that I might be missing?
> 

I don't think so.

Matt

> Thank you all for all the great comments and suggestions!
> 
> > 
> > Matt
> > 
> > > [snipped: quoted patch list and diffstat]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 16/20] drm/xe: Remove mem_access calls from migration
  2024-01-09 22:40         ` Rodrigo Vivi
@ 2024-01-11 14:17           ` Matthew Brost
  0 siblings, 0 replies; 46+ messages in thread
From: Matthew Brost @ 2024-01-11 14:17 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On Tue, Jan 09, 2024 at 05:40:19PM -0500, Rodrigo Vivi wrote:
> On Tue, Jan 09, 2024 at 06:49:47PM +0000, Matthew Auld wrote:
> > On 09/01/2024 17:58, Rodrigo Vivi wrote:
> > > On Tue, Jan 09, 2024 at 12:33:25PM +0000, Matthew Auld wrote:
> > > > On 28/12/2023 02:12, Rodrigo Vivi wrote:
> > > > > The sched job's runtime pm calls already protect every execution,
> > > > > including these migration ones.
> > > > 
> > > > Is job really enough here? I assume the queue is only destroyed once it has no
> > > > more jobs and the final queue ref is dropped. And destroying the queue might
> > > > involve stuff like de-registering the context with the GuC etc., which needs to
> > > > use the CT, which will need an rpm ref. What is holding the rpm if not the vm or queue?
> > > 
> > > The exec queue is holding it to the end.
> > 
> > Can you share some more details? AFAIK the queue destruction is async, and
> > previously the vm underneath was holding the rpm, or in the case of the
> > migration vm, it was the queue itself. But for the migration vm case that is
> > removed below. I guess I'm missing something here.
> 
> Cc: Matthew Brost
> 
I had understood that every vm case would use the exec_queue and the destroy
would cover them all. But if there are more VM operations happening beyond the
exec_queue destroy, then we need something else?
> 
Brost, any suggestion on an outer bound here where we would be safe,
but without entirely killing the RPM?
> 

Matthew Auld is correct, and this is issue #2 in my response to the cover
letter.

The code snippet I sent to the list [1] covers this with an outer bound
grabbing the PM ref in guc_exec_queue_process_msg, with the CT layer
(__g2h_reserve_space) safely increasing the PM ref count further down in
the call stack.
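
For reference, the rough shape of it (paraphrased from memory, not the
exact snippet in [1]; opcode handling abbreviated):

/* The submission worker pins the device around the whole message, so
 * any CT send underneath only ever bumps an already-nonzero count. */
static void guc_exec_queue_process_msg(struct xe_sched_msg *msg)
{
	struct xe_exec_queue *q = msg->private_data;
	struct xe_device *xe = guc_to_xe(exec_queue_to_guc(q));

	xe_pm_runtime_get(xe);	/* outer ref for the whole operation */

	switch (msg->opcode) {
	case CLEANUP:
		__guc_exec_queue_process_msg_cleanup(msg);
		break;
	/* ... other opcodes ... */
	}

	xe_pm_runtime_put(xe);
}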

Matt

[1] https://patchwork.freedesktop.org/patch/573939/?series=128434&rev=1

> [snipped: remainder of quoted thread and patch]
^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2024-01-11 14:19 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-28  2:12 [RFC 00/20] First attempt to kill mem_access Rodrigo Vivi
2023-12-28  2:12 ` [RFC 01/20] drm/xe: Document Xe PM component Rodrigo Vivi
2023-12-28  2:12 ` [RFC 02/20] drm/xe: Fix display runtime_pm handling Rodrigo Vivi
2023-12-28  2:12 ` [RFC 03/20] drm/xe: Create a xe_pm_runtime_resume_and_get variant for display Rodrigo Vivi
2023-12-28  2:12 ` [RFC 04/20] drm/xe: Convert xe_pm_runtime_{get, put} to void and protect from recursion Rodrigo Vivi
2023-12-28  2:12 ` [RFC 05/20] drm/xe: Prepare display for D3Cold Rodrigo Vivi
2023-12-28  2:12 ` [RFC 06/20] drm/xe: Convert mem_access assertion towards the runtime_pm state Rodrigo Vivi
2024-01-09 11:06   ` Matthew Auld
2024-01-09 17:50     ` Rodrigo Vivi
2023-12-28  2:12 ` [RFC 07/20] drm/xe: Runtime PM wake on every IOCTL Rodrigo Vivi
2024-01-02 11:30   ` Gupta, Anshuman
2024-01-09 17:57     ` Rodrigo Vivi
2023-12-28  2:12 ` [RFC 08/20] drm/xe: Runtime PM wake on every exec Rodrigo Vivi
2024-01-09 11:24   ` Matthew Auld
2024-01-09 17:41     ` Rodrigo Vivi
2024-01-09 18:40       ` Matthew Auld
2023-12-28  2:12 ` [RFC 09/20] drm/xe: Runtime PM wake on every sysfs call Rodrigo Vivi
2023-12-28  2:12 ` [RFC 10/20] drm/xe: Sort some xe_pm_runtime related functions Rodrigo Vivi
2024-01-09 11:26   ` Matthew Auld
2023-12-28  2:12 ` [RFC 11/20] drm/xe: Ensure device is awake before removing it Rodrigo Vivi
2023-12-28  2:12 ` [RFC 12/20] drm/xe: Remove mem_access from guc_pc calls Rodrigo Vivi
2023-12-28  2:12 ` [RFC 13/20] drm/xe: Runtime PM wake on every debugfs call Rodrigo Vivi
2023-12-28  2:12 ` [RFC 14/20] drm/xe: Replace dma_buf mem_access per direct xe_pm_runtime calls Rodrigo Vivi
2023-12-28  2:12 ` [RFC 15/20] drm/xe: Allow GuC CT fast path and worker regardless of runtime_pm Rodrigo Vivi
2024-01-09 12:09   ` Matthew Auld
2023-12-28  2:12 ` [RFC 16/20] drm/xe: Remove mem_access calls from migration Rodrigo Vivi
2024-01-09 12:33   ` Matthew Auld
2024-01-09 17:58     ` Rodrigo Vivi
2024-01-09 18:49       ` Matthew Auld
2024-01-09 22:40         ` Rodrigo Vivi
2024-01-11 14:17           ` Matthew Brost
2023-12-28  2:12 ` [RFC 17/20] drm/xe: Removing extra mem_access protection from runtime pm Rodrigo Vivi
2023-12-28  2:12 ` [RFC 18/20] drm/xe: Convert hwmon from mem_access to xe_pm_runtime calls Rodrigo Vivi
2023-12-28  2:12 ` [RFC 19/20] drm/xe: Remove unused runtime pm helper Rodrigo Vivi
2023-12-28  2:12 ` [RFC 20/20] drm/xe: Mega Kill of mem_access Rodrigo Vivi
2024-01-09 11:41   ` Matthew Auld
2024-01-09 17:39     ` Rodrigo Vivi
2024-01-09 18:27       ` Matthew Auld
2024-01-09 22:34         ` Rodrigo Vivi
2024-01-04  5:40 ` ✓ CI.Patch_applied: success for First attempt to kill mem_access Patchwork
2024-01-04  5:40 ` ✗ CI.checkpatch: warning " Patchwork
2024-01-04  5:41 ` ✗ CI.KUnit: failure " Patchwork
2024-01-10  5:21 ` [RFC 00/20] " Matthew Brost
2024-01-10 14:06   ` Rodrigo Vivi
2024-01-10 14:08     ` Vivi, Rodrigo
2024-01-10 14:33     ` Matthew Brost

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox