[Intel-gfx] [RFC 0/2] DGFX mmap with rpm

public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed

* [Intel-gfx] [RFC 0/2] DGFX mmap with rpm
@ 2022-08-17 15:09 Anshuman Gupta
  2022-08-17 15:09 ` [Intel-gfx] [RFC 1/2] drm/i915/dgfx: Release mmap on rpm suspend Anshuman Gupta
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Anshuman Gupta @ 2022-08-17 15:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: chris, matthew.auld, rodrigo.vivi

As per PCIe Spec Section 5.3.1.4.1
When pcie function is in d3hot state, 
Configuration and Message requests are the only TLPs accepted by a 
Function in the D3hot state. All other received Requests must be 
handled as Unsupported Requests, and all received Completions
may optionally be handled as Unexpected Completions.

Therefore when gfx endpoint function is in d3 state, all pcie iomem
transaction requires to transition the pcie function in D0 state.

RFC proposal to get community feedback to handle the lmem
mmap memory mappings with runtime suspend.

Anshuman Gupta (2):
  drm/i915/dgfx: Release mmap on rpm suspend
  drm/i915/dgfx: Runtime resume the dgpu on user fault

 drivers/gpu/drm/i915/gem/i915_gem_ttm.c  | 29 +++++++++++++++++++-----
 drivers/gpu/drm/i915/gt/intel_gt.c       |  1 +
 drivers/gpu/drm/i915/gt/intel_gt_types.h |  2 ++
 drivers/gpu/drm/i915/i915_gem.c          |  6 +++++
 4 files changed, 32 insertions(+), 6 deletions(-)

-- 
2.26.2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Intel-gfx] [RFC 1/2] drm/i915/dgfx: Release mmap on rpm suspend
  2022-08-17 15:09 [Intel-gfx] [RFC 0/2] DGFX mmap with rpm Anshuman Gupta
@ 2022-08-17 15:09 ` Anshuman Gupta
  2022-08-17 17:34   ` Matthew Auld
  2022-08-17 15:09 ` [Intel-gfx] [RFC 2/2] drm/i915/dgfx: Runtime resume the dgpu on user fault Anshuman Gupta
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Anshuman Gupta @ 2022-08-17 15:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: chris, matthew.auld, rodrigo.vivi

Release all mmap mapping for all lmem objects which are associated
with userfault such that, while pcie function in D3hot, any access
to memory mappings will raise a userfault.

Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c  | 4 ++++
 drivers/gpu/drm/i915/gt/intel_gt.c       | 1 +
 drivers/gpu/drm/i915/gt/intel_gt_types.h | 2 ++
 drivers/gpu/drm/i915/i915_gem.c          | 6 ++++++
 4 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 5a5cf332d8a5..b49823d599e7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1073,6 +1073,10 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
 	} else {
 		ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
 	}
+
+	if (i915_gem_object_is_lmem(obj))
+		list_add(&obj->userfault_link, &to_gt(to_i915(obj->base.dev))->lmem_userfault_list);
+
 	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index e4bac2431e41..f0d641c3153c 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -39,6 +39,7 @@ static void __intel_gt_init_early(struct intel_gt *gt)
 {
 	spin_lock_init(&gt->irq_lock);
 
+	INIT_LIST_HEAD(&gt->lmem_userfault_list);
 	INIT_LIST_HEAD(&gt->closed_vma);
 	spin_lock_init(&gt->closed_lock);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 4d56f7d5a3be..3e915df255f3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -132,6 +132,8 @@ struct intel_gt {
 	struct intel_wakeref wakeref;
 	atomic_t user_wakeref;
 
+	struct list_head lmem_userfault_list;
+
 	struct list_head closed_vma;
 	spinlock_t closed_lock; /* guards the list of closed_vma */
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 702e5b89be22..1e6ce6d06c11 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -842,6 +842,12 @@ void i915_gem_runtime_suspend(struct drm_i915_private *i915)
 				 &to_gt(i915)->ggtt->userfault_list, userfault_link)
 		__i915_gem_object_release_mmap_gtt(obj);
 
+	list_for_each_entry_safe(obj, on,
+				 &to_gt(i915)->lmem_userfault_list, userfault_link) {
+		i915_gem_object_release_mmap_offset(obj);
+		list_del(&obj->userfault_link);
+	}
+
 	/*
 	 * The fence will be lost when the device powers down. If any were
 	 * in use by hardware (i.e. they are pinned), we should not be powering
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Intel-gfx] [RFC 2/2] drm/i915/dgfx: Runtime resume the dgpu on user fault
  2022-08-17 15:09 [Intel-gfx] [RFC 0/2] DGFX mmap with rpm Anshuman Gupta
  2022-08-17 15:09 ` [Intel-gfx] [RFC 1/2] drm/i915/dgfx: Release mmap on rpm suspend Anshuman Gupta
@ 2022-08-17 15:09 ` Anshuman Gupta
  2022-08-17 18:11   ` Matthew Auld
  2022-08-17 15:32 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for DGFX mmap with rpm Patchwork
  2022-08-17 15:42 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  3 siblings, 1 reply; 8+ messages in thread
From: Anshuman Gupta @ 2022-08-17 15:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: chris, matthew.auld, rodrigo.vivi

Runtime resume the dgpu(when gem object lies in lmem).
This will transition the dgpu graphics function to D0
state if it was in D3 in order to access the mmap memory
mappings.

Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index b49823d599e7..1e9b07473a8f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1020,6 +1020,7 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
 	struct ttm_buffer_object *bo = area->vm_private_data;
 	struct drm_device *dev = bo->base.dev;
 	struct drm_i915_gem_object *obj;
+	intel_wakeref_t wakeref = 0;
 	vm_fault_t ret;
 	int idx;
 
@@ -1027,18 +1028,24 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
 	if (!obj)
 		return VM_FAULT_SIGBUS;
 
+	if (i915_gem_object_is_lmem(obj))
+		wakeref = intel_runtime_pm_get(&to_i915(obj->base.dev)->runtime_pm);
+
 	/* Sanity check that we allow writing into this object */
 	if (unlikely(i915_gem_object_is_readonly(obj) &&
-		     area->vm_flags & VM_WRITE))
-		return VM_FAULT_SIGBUS;
+		     area->vm_flags & VM_WRITE)) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_rpm;
+	}
 
 	ret = ttm_bo_vm_reserve(bo, vmf);
 	if (ret)
-		return ret;
+		goto out_rpm;
 
 	if (obj->mm.madv != I915_MADV_WILLNEED) {
 		dma_resv_unlock(bo->base.resv);
-		return VM_FAULT_SIGBUS;
+		ret = VM_FAULT_SIGBUS;
+		goto out_rpm;
 	}
 
 	if (!i915_ttm_resource_mappable(bo->resource)) {
@@ -1062,7 +1069,8 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
 		if (err) {
 			drm_dbg(dev, "Unable to make resource CPU accessible\n");
 			dma_resv_unlock(bo->base.resv);
-			return VM_FAULT_SIGBUS;
+			ret = VM_FAULT_SIGBUS;
+			goto out_rpm;
 		}
 	}
 
@@ -1078,11 +1086,16 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
 		list_add(&obj->userfault_link, &to_gt(to_i915(obj->base.dev))->lmem_userfault_list);
 
 	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
-		return ret;
+		goto out_rpm;
 
 	i915_ttm_adjust_lru(obj);
 
 	dma_resv_unlock(bo->base.resv);
+
+out_rpm:
+	if (wakeref)
+		intel_runtime_pm_put(&to_i915(obj->base.dev)->runtime_pm, wakeref);
+
 	return ret;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for DGFX mmap with rpm
  2022-08-17 15:09 [Intel-gfx] [RFC 0/2] DGFX mmap with rpm Anshuman Gupta
  2022-08-17 15:09 ` [Intel-gfx] [RFC 1/2] drm/i915/dgfx: Release mmap on rpm suspend Anshuman Gupta
  2022-08-17 15:09 ` [Intel-gfx] [RFC 2/2] drm/i915/dgfx: Runtime resume the dgpu on user fault Anshuman Gupta
@ 2022-08-17 15:32 ` Patchwork
  2022-08-17 15:42 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  3 siblings, 0 replies; 8+ messages in thread
From: Patchwork @ 2022-08-17 15:32 UTC (permalink / raw)
  To: Anshuman Gupta; +Cc: intel-gfx

== Series Details ==

Series: DGFX mmap with rpm
URL   : https://patchwork.freedesktop.org/series/107400/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for DGFX mmap with rpm
  2022-08-17 15:09 [Intel-gfx] [RFC 0/2] DGFX mmap with rpm Anshuman Gupta
                   ` (2 preceding siblings ...)
  2022-08-17 15:32 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for DGFX mmap with rpm Patchwork
@ 2022-08-17 15:42 ` Patchwork
  3 siblings, 0 replies; 8+ messages in thread
From: Patchwork @ 2022-08-17 15:42 UTC (permalink / raw)
  To: Anshuman Gupta; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 5340 bytes --]

== Series Details ==

Series: DGFX mmap with rpm
URL   : https://patchwork.freedesktop.org/series/107400/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11993 -> Patchwork_107400v1
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_107400v1 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_107400v1, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107400v1/index.html

Participating hosts (31 -> 34)
------------------------------

  Additional (3): fi-icl-u2 fi-tgl-dsi bat-adlp-4 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_107400v1:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_busy@busy@all:
    - bat-dg1-6:          [PASS][1] -> [DMESG-WARN][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11993/bat-dg1-6/igt@gem_busy@busy@all.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107400v1/bat-dg1-6/igt@gem_busy@busy@all.html
    - bat-dg1-5:          [PASS][3] -> [DMESG-WARN][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11993/bat-dg1-5/igt@gem_busy@busy@all.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107400v1/bat-dg1-5/igt@gem_busy@busy@all.html

  * igt@i915_module_load@load:
    - fi-icl-u2:          NOTRUN -> [DMESG-WARN][5]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107400v1/fi-icl-u2/igt@i915_module_load@load.html
    - bat-adlp-4:         NOTRUN -> [DMESG-WARN][6]
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107400v1/bat-adlp-4/igt@i915_module_load@load.html

  * igt@i915_suspend@basic-s3-without-i915:
    - fi-pnv-d510:        NOTRUN -> [INCOMPLETE][7]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107400v1/fi-pnv-d510/igt@i915_suspend@basic-s3-without-i915.html

  
Known issues
------------

  Here are the changes found in Patchwork_107400v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_flip@basic-flip-vs-wf_vblank@b-vga1:
    - fi-pnv-d510:        [PASS][8] -> [FAIL][9] ([i915#2122])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11993/fi-pnv-d510/igt@kms_flip@basic-flip-vs-wf_vblank@b-vga1.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107400v1/fi-pnv-d510/igt@kms_flip@basic-flip-vs-wf_vblank@b-vga1.html

  * igt@runner@aborted:
    - bat-adlp-4:         NOTRUN -> [FAIL][10] ([i915#4312])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107400v1/bat-adlp-4/igt@runner@aborted.html
    - fi-icl-u2:          NOTRUN -> [FAIL][11] ([i915#4312])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107400v1/fi-icl-u2/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@requests:
    - fi-pnv-d510:        [DMESG-FAIL][12] ([i915#4528]) -> [PASS][13]
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11993/fi-pnv-d510/igt@i915_selftest@live@requests.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107400v1/fi-pnv-d510/igt@i915_selftest@live@requests.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#110189]: https://bugs.freedesktop.org/show_bug.cgi?id=110189
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1245]: https://gitlab.freedesktop.org/drm/intel/issues/1245
  [i915#2122]: https://gitlab.freedesktop.org/drm/intel/issues/2122
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2411]: https://gitlab.freedesktop.org/drm/intel/issues/2411
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#6559]: https://gitlab.freedesktop.org/drm/intel/issues/6559
  [i915#6596]: https://gitlab.freedesktop.org/drm/intel/issues/6596


Build changes
-------------

  * Linux: CI_DRM_11993 -> Patchwork_107400v1

  CI-20190529: 20190529
  CI_DRM_11993: 2dfc67084761ba9b6259873bd574548dc4827296 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6629: d24e986fb3b2ab6d755498d27828bc85931d12ff @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_107400v1: 2dfc67084761ba9b6259873bd574548dc4827296 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

08e0d07fba36 drm/i915/dgfx: Runtime resume the dgpu on user fault
319f12d0972f drm/i915/dgfx: Release mmap on rpm suspend

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107400v1/index.html

[-- Attachment #2: Type: text/html, Size: 5307 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Intel-gfx] [RFC 1/2] drm/i915/dgfx: Release mmap on rpm suspend
  2022-08-17 15:09 ` [Intel-gfx] [RFC 1/2] drm/i915/dgfx: Release mmap on rpm suspend Anshuman Gupta
@ 2022-08-17 17:34   ` Matthew Auld
  0 siblings, 0 replies; 8+ messages in thread
From: Matthew Auld @ 2022-08-17 17:34 UTC (permalink / raw)
  To: Anshuman Gupta, intel-gfx; +Cc: chris, rodrigo.vivi

On 17/08/2022 16:09, Anshuman Gupta wrote:
> Release all mmap mapping for all lmem objects which are associated
> with userfault such that, while pcie function in D3hot, any access
> to memory mappings will raise a userfault.
> 
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c  | 4 ++++
>   drivers/gpu/drm/i915/gt/intel_gt.c       | 1 +
>   drivers/gpu/drm/i915/gt/intel_gt_types.h | 2 ++
>   drivers/gpu/drm/i915/i915_gem.c          | 6 ++++++
>   4 files changed, 13 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 5a5cf332d8a5..b49823d599e7 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -1073,6 +1073,10 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
>   	} else {
>   		ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
>   	}
> +
> +	if (i915_gem_object_is_lmem(obj))
> +		list_add(&obj->userfault_link, &to_gt(to_i915(obj->base.dev))->lmem_userfault_list);

Both patches need to be squashed together I think. Also AFAIK we need 
some kind of lock here to protect against concurrent list_add(). We also 
need something to prevent against double list_add(). And we should 
probably first check the ret value before touching the list.

> +
>   	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
>   		return ret;
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index e4bac2431e41..f0d641c3153c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -39,6 +39,7 @@ static void __intel_gt_init_early(struct intel_gt *gt)
>   {
>   	spin_lock_init(&gt->irq_lock);
>   
> +	INIT_LIST_HEAD(&gt->lmem_userfault_list);
>   	INIT_LIST_HEAD(&gt->closed_vma);
>   	spin_lock_init(&gt->closed_lock);
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index 4d56f7d5a3be..3e915df255f3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -132,6 +132,8 @@ struct intel_gt {
>   	struct intel_wakeref wakeref;
>   	atomic_t user_wakeref;
>   
> +	struct list_head lmem_userfault_list;
> +
>   	struct list_head closed_vma;
>   	spinlock_t closed_lock; /* guards the list of closed_vma */
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 702e5b89be22..1e6ce6d06c11 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -842,6 +842,12 @@ void i915_gem_runtime_suspend(struct drm_i915_private *i915)
>   				 &to_gt(i915)->ggtt->userfault_list, userfault_link)
>   		__i915_gem_object_release_mmap_gtt(obj);
>   
> +	list_for_each_entry_safe(obj, on,
> +				 &to_gt(i915)->lmem_userfault_list, userfault_link) {
> +		i915_gem_object_release_mmap_offset(obj);
> +		list_del(&obj->userfault_link);
> +	}
> +
>   	/*
>   	 * The fence will be lost when the device powers down. If any were
>   	 * in use by hardware (i.e. they are pinned), we should not be powering

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Intel-gfx] [RFC 2/2] drm/i915/dgfx: Runtime resume the dgpu on user fault
  2022-08-17 15:09 ` [Intel-gfx] [RFC 2/2] drm/i915/dgfx: Runtime resume the dgpu on user fault Anshuman Gupta
@ 2022-08-17 18:11   ` Matthew Auld
  2022-08-18  9:46     ` Gupta, Anshuman
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Auld @ 2022-08-17 18:11 UTC (permalink / raw)
  To: Anshuman Gupta, intel-gfx; +Cc: chris, rodrigo.vivi

On 17/08/2022 16:09, Anshuman Gupta wrote:
> Runtime resume the dgpu(when gem object lies in lmem).
> This will transition the dgpu graphics function to D0
> state if it was in D3 in order to access the mmap memory
> mappings.
> 
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 25 +++++++++++++++++++------
>   1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index b49823d599e7..1e9b07473a8f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -1020,6 +1020,7 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
>   	struct ttm_buffer_object *bo = area->vm_private_data;
>   	struct drm_device *dev = bo->base.dev;
>   	struct drm_i915_gem_object *obj;
> +	intel_wakeref_t wakeref = 0;
>   	vm_fault_t ret;
>   	int idx;
>   
> @@ -1027,18 +1028,24 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
>   	if (!obj)
>   		return VM_FAULT_SIGBUS;
>   
> +	if (i915_gem_object_is_lmem(obj))

We shouldn't call this without first locking the object (see 
bo_vm_reserve below), since it could be in the process of being moved to 
system memory or vice versa. For example, below we check is_lmem() again 
(this time with the lock held), which might return true, even though 
here it returned false, which means we can now race against the 
i915_gem_runtime_suspend() modifying the list as we add something.

We ofc still need to audit all the kernel internal users that are 
touching lmem though a CPU mapping, and making sure we have the right 
pm_get/put wrapping those accesses.

> +		wakeref = intel_runtime_pm_get(&to_i915(obj->base.dev)->runtime_pm);
> +
>   	/* Sanity check that we allow writing into this object */
>   	if (unlikely(i915_gem_object_is_readonly(obj) &&
> -		     area->vm_flags & VM_WRITE))
> -		return VM_FAULT_SIGBUS;
> +		     area->vm_flags & VM_WRITE)) {
> +		ret = VM_FAULT_SIGBUS;
> +		goto out_rpm;
> +	}
>   
>   	ret = ttm_bo_vm_reserve(bo, vmf);
>   	if (ret)
> -		return ret;
> +		goto out_rpm;
>   
>   	if (obj->mm.madv != I915_MADV_WILLNEED) {
>   		dma_resv_unlock(bo->base.resv);
> -		return VM_FAULT_SIGBUS;
> +		ret = VM_FAULT_SIGBUS;
> +		goto out_rpm;
>   	}
>   
>   	if (!i915_ttm_resource_mappable(bo->resource)) {
> @@ -1062,7 +1069,8 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
>   		if (err) {
>   			drm_dbg(dev, "Unable to make resource CPU accessible\n");
>   			dma_resv_unlock(bo->base.resv);
> -			return VM_FAULT_SIGBUS;
> +			ret = VM_FAULT_SIGBUS;
> +			goto out_rpm;
>   		}
>   	}
>   
> @@ -1078,11 +1086,16 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
>   		list_add(&obj->userfault_link, &to_gt(to_i915(obj->base.dev))->lmem_userfault_list);
>   
>   	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
> -		return ret;
> +		goto out_rpm;
>   
>   	i915_ttm_adjust_lru(obj);
>   
>   	dma_resv_unlock(bo->base.resv);
> +
> +out_rpm:
> +	if (wakeref)
> +		intel_runtime_pm_put(&to_i915(obj->base.dev)->runtime_pm, wakeref);

Do we need something like DRM_I915_USERFAULT_AUTOSUSPEND here?

> +
>   	return ret;
>   }
>   

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Intel-gfx] [RFC 2/2] drm/i915/dgfx: Runtime resume the dgpu on user fault
  2022-08-17 18:11   ` Matthew Auld
@ 2022-08-18  9:46     ` Gupta, Anshuman
  0 siblings, 0 replies; 8+ messages in thread
From: Gupta, Anshuman @ 2022-08-18  9:46 UTC (permalink / raw)
  To: Auld, Matthew, intel-gfx@lists.freedesktop.org
  Cc: chris@chris-wilson.co.uk, Vivi, Rodrigo



> -----Original Message-----
> From: Auld, Matthew <matthew.auld@intel.com>
> Sent: Wednesday, August 17, 2022 11:41 PM
> To: Gupta, Anshuman <anshuman.gupta@intel.com>; intel-
> gfx@lists.freedesktop.org
> Cc: joonas.lahtinen@linux.intel.com; Vivi, Rodrigo <rodrigo.vivi@intel.com>;
> Nilawar, Badal <badal.nilawar@intel.com>; chris@chris-wilson.co.uk
> Subject: Re: [RFC 2/2] drm/i915/dgfx: Runtime resume the dgpu on user fault
> 
> On 17/08/2022 16:09, Anshuman Gupta wrote:
> > Runtime resume the dgpu(when gem object lies in lmem).
> > This will transition the dgpu graphics function to D0 state if it was
> > in D3 in order to access the mmap memory mappings.
> >
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 25 +++++++++++++++++++-----
> -
> >   1 file changed, 19 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > index b49823d599e7..1e9b07473a8f 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > @@ -1020,6 +1020,7 @@ static vm_fault_t vm_fault_ttm(struct vm_fault
> *vmf)
> >   	struct ttm_buffer_object *bo = area->vm_private_data;
> >   	struct drm_device *dev = bo->base.dev;
> >   	struct drm_i915_gem_object *obj;
> > +	intel_wakeref_t wakeref = 0;
> >   	vm_fault_t ret;
> >   	int idx;
> >
> > @@ -1027,18 +1028,24 @@ static vm_fault_t vm_fault_ttm(struct vm_fault
> *vmf)
> >   	if (!obj)
> >   		return VM_FAULT_SIGBUS;
> >
> > +	if (i915_gem_object_is_lmem(obj))
> 
> We shouldn't call this without first locking the object (see bo_vm_reserve
> below), since it could be in the process of being moved to system memory or
> vice versa. For example, below we check is_lmem() again (this time with the lock
> held), which might return true, even though here it returned false, which means
> we can now race against the
> i915_gem_runtime_suspend() modifying the list as we add something.
Thanks for review, i will fix this.
> 
> We ofc still need to audit all the kernel internal users that are touching lmem
> though a CPU mapping, and making sure we have the right pm_get/put
> wrapping those accesses.
I was thinking to use assert_rpm_wakelock_held in i915_gem_object_pin_map()
So every caller should take the proper wakeref before mapping the pages. 
It will be difficult to track the wakeref with multiple i915_gem_object_{pin,unpin}_map.
 
And i915_ttm_move_memcpy() also need to wrapped with rpm get/put.

Other than that is there are any iomem related pcie transaction transaction
involved to prepare object sgl page list ?

Thanks,
Anshuman Gupta.
> 
> > +		wakeref =
> > +intel_runtime_pm_get(&to_i915(obj->base.dev)->runtime_pm);
> > +
> >   	/* Sanity check that we allow writing into this object */
> >   	if (unlikely(i915_gem_object_is_readonly(obj) &&
> > -		     area->vm_flags & VM_WRITE))
> > -		return VM_FAULT_SIGBUS;
> > +		     area->vm_flags & VM_WRITE)) {
> > +		ret = VM_FAULT_SIGBUS;
> > +		goto out_rpm;
> > +	}
> >
> >   	ret = ttm_bo_vm_reserve(bo, vmf);
> >   	if (ret)
> > -		return ret;
> > +		goto out_rpm;
> >
> >   	if (obj->mm.madv != I915_MADV_WILLNEED) {
> >   		dma_resv_unlock(bo->base.resv);
> > -		return VM_FAULT_SIGBUS;
> > +		ret = VM_FAULT_SIGBUS;
> > +		goto out_rpm;
> >   	}
> >
> >   	if (!i915_ttm_resource_mappable(bo->resource)) { @@ -1062,7 +1069,8
> > @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
> >   		if (err) {
> >   			drm_dbg(dev, "Unable to make resource CPU
> accessible\n");
> >   			dma_resv_unlock(bo->base.resv);
> > -			return VM_FAULT_SIGBUS;
> > +			ret = VM_FAULT_SIGBUS;
> > +			goto out_rpm;
> >   		}
> >   	}
> >
> > @@ -1078,11 +1086,16 @@ static vm_fault_t vm_fault_ttm(struct vm_fault
> *vmf)
> >   		list_add(&obj->userfault_link,
> > &to_gt(to_i915(obj->base.dev))->lmem_userfault_list);
> >
> >   	if (ret == VM_FAULT_RETRY && !(vmf->flags &
> FAULT_FLAG_RETRY_NOWAIT))
> > -		return ret;
> > +		goto out_rpm;
> >
> >   	i915_ttm_adjust_lru(obj);
> >
> >   	dma_resv_unlock(bo->base.resv);
> > +
> > +out_rpm:
> > +	if (wakeref)
> > +		intel_runtime_pm_put(&to_i915(obj->base.dev)->runtime_pm,
> wakeref);
> 
> Do we need something like DRM_I915_USERFAULT_AUTOSUSPEND here?
> 
> > +
> >   	return ret;
> >   }
> >

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-08-18  9:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-17 15:09 [Intel-gfx] [RFC 0/2] DGFX mmap with rpm Anshuman Gupta
2022-08-17 15:09 ` [Intel-gfx] [RFC 1/2] drm/i915/dgfx: Release mmap on rpm suspend Anshuman Gupta
2022-08-17 17:34   ` Matthew Auld
2022-08-17 15:09 ` [Intel-gfx] [RFC 2/2] drm/i915/dgfx: Runtime resume the dgpu on user fault Anshuman Gupta
2022-08-17 18:11   ` Matthew Auld
2022-08-18  9:46     ` Gupta, Anshuman
2022-08-17 15:32 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for DGFX mmap with rpm Patchwork
2022-08-17 15:42 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox