* [Intel-gfx] [RFC PATCH] drm/ttm: Allow the driver to resolve a WW transaction rollback
@ 2023-05-05 14:17 Thomas Hellström
2023-05-05 20:19 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for " Patchwork
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Thomas Hellström @ 2023-05-05 14:17 UTC (permalink / raw)
To: dri-devel; +Cc: Thomas Hellström, intel-gfx, Christian Koenig, intel-xe
Allow drivers to resolve a WW transaction rollback. This allows for
1) Putting a lower-priority transaction to sleep, allowing another to
succeed, instead of both fighting using trylocks.
2) Letting the driver know whether a received -ENOMEM is the result of
competition with another WW transaction, which can be resolved using
rollback and retry, or a real -ENOMEM, which should be propagated back
to user-space as a failure.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/ttm/ttm_bo.c | 17 +++++++++++++++--
include/drm/ttm/ttm_bo.h | 2 ++
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bd5dae4d1624..c3ccbea2be3e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -561,6 +561,10 @@ static int ttm_mem_evict_wait_busy(struct ttm_buffer_object *busy_bo,
 	if (!busy_bo || !ticket)
 		return -EBUSY;
 
+	/* We want to resolve contention before trying to lock again. */
+	if (ctx->propagate_edeadlk && ctx->contended_bo)
+		return -EDEADLK;
+
 	if (ctx->interruptible)
 		r = dma_resv_lock_interruptible(busy_bo->base.resv,
 							  ticket);
@@ -575,7 +579,15 @@ static int ttm_mem_evict_wait_busy(struct ttm_buffer_object *busy_bo,
 	if (!r)
 		dma_resv_unlock(busy_bo->base.resv);
 
-	return r == -EDEADLK ? -EBUSY : r;
+	if (r == -EDEADLK) {
+		if (ctx->propagate_edeadlk) {
+			ttm_bo_get(busy_bo);
+			ctx->contended_bo = busy_bo;
+		}
+		r = -EBUSY;
+	}
+
+	return r;
 }
 
 int ttm_mem_evict_first(struct ttm_device *bdev,
@@ -816,7 +828,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
 			goto error;
 	}
 
-	ret = -ENOMEM;
+	ret = (ctx->propagate_edeadlk && ctx->contended_bo) ? -EDEADLK : -ENOMEM;
 	if (!type_found) {
 		pr_err(TTM_PFX "No compatible memory type found\n");
 		ret = -EINVAL;
@@ -913,6 +925,7 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
 		if (ret)
 			return ret;
 	}
+
 	return 0;
 }
 EXPORT_SYMBOL(ttm_bo_validate);
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 8b113c384236..d8e35a794ce5 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -181,8 +181,10 @@ struct ttm_operation_ctx {
 	bool gfp_retry_mayfail;
 	bool allow_res_evict;
 	bool force_alloc;
+	bool propagate_edeadlk;
 	struct dma_resv *resv;
 	uint64_t bytes_moved;
+	struct ttm_buffer_object *contended_bo;
 };
 
 /**
--
2.39.2
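
For illustration only, here is a rough userspace sketch of how a driver
might consume the two new fields. It is not part of the patch and not
kernel code. Every mock_* name and driver_validate() are hypothetical
stand-ins, and it assumes the driver is the one that drops the reference
taken on contended_bo and clears the pointer before retrying.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct mock_bo {
	int refcount;
	bool locked_by_other;	/* another WW transaction holds the lock */
};

struct mock_ctx {
	bool propagate_edeadlk;		/* mirrors the flag added by the patch */
	struct mock_bo *contended_bo;	/* mirrors the pointer added by the patch */
};

/*
 * Mock of a validate call. While another transaction holds the victim,
 * it either reports -EDEADLK and records the contended buffer (flag set)
 * or reports -ENOMEM as before (flag clear). Otherwise it succeeds.
 */
static int mock_validate(struct mock_ctx *ctx, struct mock_bo *victim)
{
	if (!victim->locked_by_other)
		return 0;
	if (!ctx->propagate_edeadlk)
		return -ENOMEM;

	victim->refcount++;		/* stands in for ttm_bo_get() */
	ctx->contended_bo = victim;
	return -EDEADLK;
}

/*
 * Stands in for the driver rolling back its transaction and sleeping on
 * the contended buffer. Assumption: the driver drops the reference and
 * clears the pointer before retrying.
 */
static void mock_rollback_and_wait(struct mock_ctx *ctx)
{
	struct mock_bo *bo = ctx->contended_bo;

	assert(bo != NULL);
	bo->locked_by_other = false;	/* pretend the other side finished */
	bo->refcount--;			/* stands in for ttm_bo_put() */
	ctx->contended_bo = NULL;
}

/* Retry on contention; pass every other result, including -ENOMEM, up. */
static int driver_validate(struct mock_ctx *ctx, struct mock_bo *victim,
			   int *retries)
{
	int ret;

	*retries = 0;
	while ((ret = mock_validate(ctx, victim)) == -EDEADLK) {
		mock_rollback_and_wait(ctx);
		(*retries)++;
	}
	return ret;
}
```

With propagate_edeadlk set, the mock resolves the contention after one
rollback and returns 0. With the flag clear, the caller sees -ENOMEM and
cannot tell contention from a real out-of-memory condition, which is the
ambiguity point 2) of the commit message describes.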
* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/ttm: Allow the driver to resolve a WW transaction rollback
2023-05-05 14:17 [Intel-gfx] [RFC PATCH] drm/ttm: Allow the driver to resolve a WW transaction rollback Thomas Hellström
@ 2023-05-05 20:19 ` Patchwork
2023-05-05 20:33 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2023-05-25 12:59 ` [Intel-gfx] [RFC PATCH] " Thomas Hellström
2 siblings, 0 replies; 5+ messages in thread
From: Patchwork @ 2023-05-05 20:19 UTC (permalink / raw)
To: Thomas Hellström; +Cc: intel-gfx
== Series Details ==
Series: drm/ttm: Allow the driver to resolve a WW transaction rollback
URL : https://patchwork.freedesktop.org/series/117389/
State : warning
== Summary ==
Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.
* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/ttm: Allow the driver to resolve a WW transaction rollback
2023-05-05 14:17 [Intel-gfx] [RFC PATCH] drm/ttm: Allow the driver to resolve a WW transaction rollback Thomas Hellström
2023-05-05 20:19 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for " Patchwork
@ 2023-05-05 20:33 ` Patchwork
2023-05-25 12:59 ` [Intel-gfx] [RFC PATCH] " Thomas Hellström
2 siblings, 0 replies; 5+ messages in thread
From: Patchwork @ 2023-05-05 20:33 UTC (permalink / raw)
To: Thomas Hellström; +Cc: intel-gfx
== Series Details ==
Series: drm/ttm: Allow the driver to resolve a WW transaction rollback
URL : https://patchwork.freedesktop.org/series/117389/
State : failure
== Summary ==
CI Bug Log - changes from CI_DRM_13114 -> Patchwork_117389v1
====================================================
Summary
-------
**FAILURE**
Serious unknown changes coming with Patchwork_117389v1 absolutely need to be
verified manually.
If you think the reported changes have nothing to do with the changes
introduced in Patchwork_117389v1, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.
External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/index.html
Participating hosts (40 -> 39)
------------------------------
Missing (1): fi-snb-2520m
Possible new issues
-------------------
Here are the unknown changes that may have been introduced in Patchwork_117389v1:
### IGT changes ###
#### Possible regressions ####
* igt@gem_exec_fence@basic-busy@bcs0:
- bat-adlp-9: [PASS][1] -> [DMESG-WARN][2]
[1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13114/bat-adlp-9/igt@gem_exec_fence@basic-busy@bcs0.html
[2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-adlp-9/igt@gem_exec_fence@basic-busy@bcs0.html
* igt@gem_exec_fence@basic-busy@rcs0:
- bat-adlp-9: [PASS][3] -> [ABORT][4]
[3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13114/bat-adlp-9/igt@gem_exec_fence@basic-busy@rcs0.html
[4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-adlp-9/igt@gem_exec_fence@basic-busy@rcs0.html
Known issues
------------
Here are the changes found in Patchwork_117389v1 that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@i915_selftest@live@gt_mocs:
- bat-rpls-2: [PASS][5] -> [INCOMPLETE][6] ([i915#4983])
[5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13114/bat-rpls-2/igt@i915_selftest@live@gt_mocs.html
[6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-rpls-2/igt@i915_selftest@live@gt_mocs.html
* igt@i915_selftest@live@slpc:
- bat-rpls-1: NOTRUN -> [DMESG-WARN][7] ([i915#6367])
[7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-rpls-1/igt@i915_selftest@live@slpc.html
* igt@kms_chamelium_hpd@common-hpd-after-suspend:
- bat-adlp-6: NOTRUN -> [SKIP][8] ([i915#7828])
[8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-adlp-6/igt@kms_chamelium_hpd@common-hpd-after-suspend.html
- bat-jsl-3: NOTRUN -> [SKIP][9] ([i915#7828])
[9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-jsl-3/igt@kms_chamelium_hpd@common-hpd-after-suspend.html
- bat-rpls-1: NOTRUN -> [SKIP][10] ([i915#7828])
[10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-rpls-1/igt@kms_chamelium_hpd@common-hpd-after-suspend.html
* igt@kms_pipe_crc_basic@suspend-read-crc:
- bat-rpls-1: NOTRUN -> [SKIP][11] ([i915#1845])
[11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-rpls-1/igt@kms_pipe_crc_basic@suspend-read-crc.html
#### Possible fixes ####
* igt@i915_selftest@live@hangcheck:
- bat-adlp-6: [ABORT][12] ([i915#7677] / [i915#7913]) -> [PASS][13]
[12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13114/bat-adlp-6/igt@i915_selftest@live@hangcheck.html
[13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-adlp-6/igt@i915_selftest@live@hangcheck.html
* igt@i915_selftest@live@requests:
- bat-rpls-1: [ABORT][14] ([i915#4983] / [i915#7911] / [i915#7920]) -> [PASS][15]
[14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13114/bat-rpls-1/igt@i915_selftest@live@requests.html
[15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/bat-rpls-1/igt@i915_selftest@live@requests.html
{name}: This element is suppressed. This means it is ignored when computing
the status of the difference (SUCCESS, WARNING, or FAILURE).
[i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
[i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
[i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
[i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367
[i915#7677]: https://gitlab.freedesktop.org/drm/intel/issues/7677
[i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
[i915#7911]: https://gitlab.freedesktop.org/drm/intel/issues/7911
[i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
[i915#7920]: https://gitlab.freedesktop.org/drm/intel/issues/7920
[i915#8379]: https://gitlab.freedesktop.org/drm/intel/issues/8379
Build changes
-------------
* Linux: CI_DRM_13114 -> Patchwork_117389v1
CI-20190529: 20190529
CI_DRM_13114: b4d6f70062cd04a8fdb9872828bcbe4767a4f833 @ git://anongit.freedesktop.org/gfx-ci/linux
IGT_7281: 9e9cd7e69a393b7cce8fc12fce409eb59817dd7e @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
Patchwork_117389v1: b4d6f70062cd04a8fdb9872828bcbe4767a4f833 @ git://anongit.freedesktop.org/gfx-ci/linux
### Linux commits
91cde99d7228 drm/ttm: Allow the driver to resolve a WW transaction rollback
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_117389v1/index.html
* Re: [Intel-gfx] [RFC PATCH] drm/ttm: Allow the driver to resolve a WW transaction rollback
2023-05-05 14:17 [Intel-gfx] [RFC PATCH] drm/ttm: Allow the driver to resolve a WW transaction rollback Thomas Hellström
2023-05-05 20:19 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for " Patchwork
2023-05-05 20:33 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
@ 2023-05-25 12:59 ` Thomas Hellström
2023-05-25 13:59 ` Christian König
2 siblings, 1 reply; 5+ messages in thread
From: Thomas Hellström @ 2023-05-25 12:59 UTC (permalink / raw)
To: dri-devel, Christian Koenig; +Cc: intel-gfx, intel-xe
On Fri, 2023-05-05 at 16:17 +0200, Thomas Hellström wrote:
> Allow drivers to resolve a WW transaction rollback. This allows for
> 1) Putting a lower-priority transaction to sleep, allowing another to
> succeed, instead of both fighting using trylocks.
> 2) Letting the driver know whether a received -ENOMEM is the result of
> competition with another WW transaction, which can be resolved using
> rollback and retry, or a real -ENOMEM, which should be propagated back
> to user-space as a failure.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Christian, any objections?
/Thomas
* Re: [Intel-gfx] [RFC PATCH] drm/ttm: Allow the driver to resolve a WW transaction rollback
2023-05-25 12:59 ` [Intel-gfx] [RFC PATCH] " Thomas Hellström
@ 2023-05-25 13:59 ` Christian König
0 siblings, 0 replies; 5+ messages in thread
From: Christian König @ 2023-05-25 13:59 UTC (permalink / raw)
To: Thomas Hellström, dri-devel; +Cc: intel-gfx, intel-xe
On 25.05.23 at 14:59, Thomas Hellström wrote:
> On Fri, 2023-05-05 at 16:17 +0200, Thomas Hellström wrote:
>> Allow drivers to resolve a WW transaction rollback. This allows for
>> 1) Putting a lower-priority transaction to sleep, allowing another to
>> succeed, instead of both fighting using trylocks.
>> 2) Letting the driver know whether a received -ENOMEM is the result of
>> competition with another WW transaction, which can be resolved using
>> rollback and retry, or a real -ENOMEM, which should be propagated back
>> to user-space as a failure.
>>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Christian, any objections?
The general idea sounds like what I had in mind as well, but I've moved
both my office and household in the last two weeks and am now digging
through >800 unread mails/patches.
Give me a few days to catch up and I can take a detailed look.
Christian.
>
> /Thomas