public inbox for intel-gfx@lists.freedesktop.org
* [Intel-gfx] [RFC PATCH] drm/ttm: Allow the driver to resolve a WW transaction rollback
From: Thomas Hellström @ 2023-05-05 14:17 UTC
  To: dri-devel; +Cc: Thomas Hellström, intel-gfx, Christian Koenig, intel-xe

Allow drivers to resolve a WW transaction rollback. This allows for
1) Putting a lower-priority transaction to sleep, allowing another to
succeed instead of both fighting using trylocks.
2) Letting the driver know whether a received -ENOMEM is the result of
competition with another WW transaction, which can be resolved using
rollback and retry, or a real -ENOMEM, which should be propagated back
to user-space as a failure.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 17 +++++++++++++++--
 include/drm/ttm/ttm_bo.h     |  2 ++
 2 files changed, 17 insertions(+), 2 deletions(-)
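
As a rough illustration of how a driver might consume the two outcomes described above, here is a userspace model of a WW rollback-and-retry loop. Everything in it is hypothetical: mock_bo, mock_ctx, mock_validate() and run_transaction() are invented stand-ins, not TTM or dma-resv API. The real-kernel steps they represent (dropping all held reservations, sleeping on the contended object with dma_resv_lock_slow(), and releasing the reference the patch takes with ttm_bo_get() via ttm_bo_put()) are an assumption about intended driver usage, not something this RFC specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real TTM or dma-resv types. */
struct mock_bo {
	int refcount;	/* models the reference TTM takes with ttm_bo_get() */
	bool locked;	/* models holding the BO's reservation lock */
};

struct mock_ctx {
	bool propagate_edeadlk;
	struct mock_bo *contended_bo;
};

/*
 * Hypothetical validation step standing in for e.g. ttm_bo_validate().
 * While 'victim' is not locked by us, it reports contention the way the
 * RFC does: record victim with a reference held and return -EDEADLK.
 * Once we hold the victim, it either succeeds or reports a genuine OOM.
 */
static int mock_validate(struct mock_ctx *ctx, struct mock_bo *victim,
			 bool really_oom)
{
	if (victim->locked)
		return really_oom ? -ENOMEM : 0;

	victim->refcount++;
	ctx->contended_bo = victim;
	return -EDEADLK;
}

/*
 * WW-style rollback-and-retry loop. -EDEADLK means "back off and retry";
 * any other error, including a real -ENOMEM, goes back to the caller.
 */
static int run_transaction(struct mock_ctx *ctx, struct mock_bo *victim,
			   bool really_oom, int *attempts)
{
	assert(ctx->propagate_edeadlk);
	*attempts = 0;

	for (;;) {
		int ret;
		struct mock_bo *bo;

		++*attempts;
		ret = mock_validate(ctx, victim, really_oom);
		if (ret != -EDEADLK)
			return ret;

		/* Rollback: a real driver would drop all held locks here. */
		bo = ctx->contended_bo;
		ctx->contended_bo = NULL;
		bo->locked = true;	/* sleep on it: dma_resv_lock_slow() */
		bo->refcount--;		/* drop TTM's reference: ttm_bo_put() */
	}
}
```

The -ENOMEM path corresponds to the second point of the commit message: because contention is reported separately as -EDEADLK, a -ENOMEM that survives the retry loop can be treated as a real failure and returned to user-space.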

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bd5dae4d1624..c3ccbea2be3e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -561,6 +561,10 @@ static int ttm_mem_evict_wait_busy(struct ttm_buffer_object *busy_bo,
 	if (!busy_bo || !ticket)
 		return -EBUSY;
 
+	/* We want to resolve contention before trying to lock again. */
+	if (ctx->propagate_edeadlk && ctx->contended_bo)
+		return -EDEADLK;
+
 	if (ctx->interruptible)
 		r = dma_resv_lock_interruptible(busy_bo->base.resv,
 							  ticket);
@@ -575,7 +579,15 @@ static int ttm_mem_evict_wait_busy(struct ttm_buffer_object *busy_bo,
 	if (!r)
 		dma_resv_unlock(busy_bo->base.resv);
 
-	return r == -EDEADLK ? -EBUSY : r;
+	if (r == -EDEADLK) {
+		if (ctx->propagate_edeadlk) {
+			ttm_bo_get(busy_bo);
+			ctx->contended_bo = busy_bo;
+		}
+		r = -EBUSY;
+	}
+
+	return r;
 }
 
 int ttm_mem_evict_first(struct ttm_device *bdev,
@@ -816,7 +828,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
 			goto error;
 	}
 
-	ret = -ENOMEM;
+	ret = (ctx->propagate_edeadlk && ctx->contended_bo) ? -EDEADLK : -ENOMEM;
 	if (!type_found) {
 		pr_err(TTM_PFX "No compatible memory type found\n");
 		ret = -EINVAL;
@@ -913,6 +925,7 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
 		if (ret)
 			return ret;
 	}
+
 	return 0;
 }
 EXPORT_SYMBOL(ttm_bo_validate);
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 8b113c384236..d8e35a794ce5 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -181,8 +181,10 @@ struct ttm_operation_ctx {
 	bool gfp_retry_mayfail;
 	bool allow_res_evict;
 	bool force_alloc;
+	bool propagate_edeadlk;
 	struct dma_resv *resv;
 	uint64_t bytes_moved;
+	struct ttm_buffer_object *contended_bo;
 };
 
 /**
-- 
2.39.2
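
The return-code changes in the diff can be summarised with a small userspace model. It is a simplified sketch, not TTM code: mock_bo and mock_ctx are hypothetical stand-ins for struct ttm_buffer_object and struct ttm_operation_ctx, a reference count models ttm_bo_get(), and lock_ret stands in for the result of dma_resv_lock() or dma_resv_lock_interruptible() on the busy BO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: only the fields the patch touches are modelled. */
struct mock_bo {
	int refcount;		/* models the reference taken by ttm_bo_get() */
};

struct mock_ctx {
	bool propagate_edeadlk;
	struct mock_bo *contended_bo;
};

/*
 * Model of ttm_mem_evict_wait_busy() after the patch. lock_ret is the
 * simulated result of locking busy_bo's reservation with the ticket.
 */
static int model_evict_wait_busy(struct mock_ctx *ctx, struct mock_bo *busy_bo,
				 bool have_ticket, int lock_ret)
{
	assert(ctx != NULL);
	if (!busy_bo || !have_ticket)
		return -EBUSY;

	/* Contention already recorded: let the caller resolve it first. */
	if (ctx->propagate_edeadlk && ctx->contended_bo)
		return -EDEADLK;

	if (lock_ret == -EDEADLK) {
		if (ctx->propagate_edeadlk) {
			busy_bo->refcount++;	/* ttm_bo_get(busy_bo) */
			ctx->contended_bo = busy_bo;
		}
		return -EBUSY;	/* the caller still sees -EBUSY here */
	}
	/* 0 (the real code unlocks again) or another error passes through. */
	return lock_ret;
}

/* Model of the error ttm_bo_mem_space() reports when no space was found. */
static int model_mem_space_error(const struct mock_ctx *ctx, bool type_found)
{
	int ret;

	assert(ctx != NULL);
	ret = (ctx->propagate_edeadlk && ctx->contended_bo) ?
	      -EDEADLK : -ENOMEM;
	if (!type_found)
		ret = -EINVAL;
	return ret;
}
```

In the model, the first contention is still reported as -EBUSY, so the caller moves on exactly as before the patch; once a contended BO has been recorded, any further wait returns -EDEADLK immediately, and a failed placement search reports -EDEADLK instead of -ENOMEM. With propagate_edeadlk left false, behaviour is unchanged.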



* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/ttm: Allow the driver to resolve a WW transaction rollback
From: Patchwork @ 2023-05-05 20:19 UTC
  To: Thomas Hellström; +Cc: intel-gfx

== Series Details ==

Series: drm/ttm: Allow the driver to resolve a WW transaction rollback
URL   : https://patchwork.freedesktop.org/series/117389/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.




* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/ttm: Allow the driver to resolve a WW transaction rollback
From: Patchwork @ 2023-05-05 20:33 UTC
  To: Thomas Hellström; +Cc: intel-gfx


== Series Details ==

Series: drm/ttm: Allow the driver to resolve a WW transaction rollback
URL   : https://patchwork.freedesktop.org/series/117389/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_13114 -> Patchwork_117389v1
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_117389v1 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_117389v1, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/index.html

Participating hosts (40 -> 39)
------------------------------

  Missing    (1): fi-snb-2520m 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_117389v1:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_exec_fence@basic-busy@bcs0:
    - bat-adlp-9:         [PASS][1] -> [DMESG-WARN][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13114/bat-adlp-9/igt@gem_exec_fence@basic-busy@bcs0.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-adlp-9/igt@gem_exec_fence@basic-busy@bcs0.html

  * igt@gem_exec_fence@basic-busy@rcs0:
    - bat-adlp-9:         [PASS][3] -> [ABORT][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13114/bat-adlp-9/igt@gem_exec_fence@basic-busy@rcs0.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-adlp-9/igt@gem_exec_fence@basic-busy@rcs0.html

  
Known issues
------------

  Here are the changes found in Patchwork_117389v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live@gt_mocs:
    - bat-rpls-2:         [PASS][5] -> [INCOMPLETE][6] ([i915#4983])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13114/bat-rpls-2/igt@i915_selftest@live@gt_mocs.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-rpls-2/igt@i915_selftest@live@gt_mocs.html

  * igt@i915_selftest@live@slpc:
    - bat-rpls-1:         NOTRUN -> [DMESG-WARN][7] ([i915#6367])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-rpls-1/igt@i915_selftest@live@slpc.html

  * igt@kms_chamelium_hpd@common-hpd-after-suspend:
    - bat-adlp-6:         NOTRUN -> [SKIP][8] ([i915#7828])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-adlp-6/igt@kms_chamelium_hpd@common-hpd-after-suspend.html
    - bat-jsl-3:          NOTRUN -> [SKIP][9] ([i915#7828])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-jsl-3/igt@kms_chamelium_hpd@common-hpd-after-suspend.html
    - bat-rpls-1:         NOTRUN -> [SKIP][10] ([i915#7828])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-rpls-1/igt@kms_chamelium_hpd@common-hpd-after-suspend.html

  * igt@kms_pipe_crc_basic@suspend-read-crc:
    - bat-rpls-1:         NOTRUN -> [SKIP][11] ([i915#1845])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-rpls-1/igt@kms_pipe_crc_basic@suspend-read-crc.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@hangcheck:
    - bat-adlp-6:         [ABORT][12] ([i915#7677] / [i915#7913]) -> [PASS][13]
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13114/bat-adlp-6/igt@i915_selftest@live@hangcheck.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-adlp-6/igt@i915_selftest@live@hangcheck.html

  * igt@i915_selftest@live@requests:
    - bat-rpls-1:         [ABORT][14] ([i915#4983] / [i915#7911] / [i915#7920]) -> [PASS][15]
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13114/bat-rpls-1/igt@i915_selftest@live@requests.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-rpls-1/igt@i915_selftest@live@requests.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367
  [i915#7677]: https://gitlab.freedesktop.org/drm/intel/issues/7677
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#7911]: https://gitlab.freedesktop.org/drm/intel/issues/7911
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
  [i915#7920]: https://gitlab.freedesktop.org/drm/intel/issues/7920
  [i915#8379]: https://gitlab.freedesktop.org/drm/intel/issues/8379


Build changes
-------------

  * Linux: CI_DRM_13114 -> Patchwork_117389v1

  CI-20190529: 20190529
  CI_DRM_13114: b4d6f70062cd04a8fdb9872828bcbe4767a4f833 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7281: 9e9cd7e69a393b7cce8fc12fce409eb59817dd7e @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_117389v1: b4d6f70062cd04a8fdb9872828bcbe4767a4f833 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

91cde99d7228 drm/ttm: Allow the driver to resolve a WW transaction rollback

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/index.html



* Re: [Intel-gfx] [RFC PATCH] drm/ttm: Allow the driver to resolve a WW transaction rollback
From: Thomas Hellström @ 2023-05-25 12:59 UTC
  To: dri-devel, Christian Koenig; +Cc: intel-gfx, intel-xe

On Fri, 2023-05-05 at 16:17 +0200, Thomas Hellström wrote:
> Allow drivers to resolve a WW transaction rollback. This allows for
> 1) Putting a lower-priority transaction to sleep, allowing another to
> succeed instead of both fighting using trylocks.
> 2) Letting the driver know whether a received -ENOMEM is the result of
> competition with another WW transaction, which can be resolved using
> rollback and retry, or a real -ENOMEM, which should be propagated back
> to user-space as a failure.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Christian, any objections?

/Thomas





* Re: [Intel-gfx] [RFC PATCH] drm/ttm: Allow the driver to resolve a WW transaction rollback
From: Christian König @ 2023-05-25 13:59 UTC
  To: Thomas Hellström, dri-devel; +Cc: intel-gfx, intel-xe

Am 25.05.23 um 14:59 schrieb Thomas Hellström:
> On Fri, 2023-05-05 at 16:17 +0200, Thomas Hellström wrote:
>> Allow drivers to resolve a WW transaction rollback. This allows for
>> 1) Putting a lower-priority transaction to sleep, allowing another to
>> succeed instead of both fighting using trylocks.
>> 2) Letting the driver know whether a received -ENOMEM is the result of
>> competition with another WW transaction, which can be resolved using
>> rollback and retry, or a real -ENOMEM, which should be propagated back
>> to user-space as a failure.
>>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Christian, any objections?

The general idea sounds like what I had in mind as well, but I've moved
both my office and household in the last two weeks and am now digging
through >800 unread mails/patches.

Give me some days to catch up and I can take a detailed look.

Christian.



