stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail()
       [not found] <20230622101412.78426-1-thomas.hellstrom@linux.intel.com>
@ 2023-06-22 10:14 ` Thomas Hellström
  2023-06-22 12:26   ` [Intel-gfx] " Nirmoy Das
  2023-06-22 12:26   ` Christian König
  2023-06-22 10:14 ` [PATCH 2/4] drm/ttm: Don't shadow the operation context Thomas Hellström
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Thomas Hellström @ 2023-06-22 10:14 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Christian König, Christian König,
	Daniel Vetter, dri-devel, stable, intel-gfx, linux-kernel

The value of pos->first was not updated when the first resource of the
range was moved. This could lead to errors like the one below.
Fix this by updating pos->first when needed.

<3> [218.963342] BUG: KASAN: null-ptr-deref in ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
<3> [218.963456] Read of size 8 at addr 0000000000000038 by task xe_evict/1529
<3> [218.963546]
<3> [218.963566] CPU: 0 PID: 1529 Comm: xe_evict Not tainted 6.3.0-xe #1
<3> [218.963664] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake H DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.4064.A00.2102041619 02/04/2021
<3> [218.963841] Call Trace:
<3> [218.963881]  <TASK>
<3> [218.963915]  dump_stack_lvl+0x64/0xb0
<3> [218.963976]  print_report+0x3e5/0x600
<3> [218.964036]  ? ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
<3> [218.964127]  kasan_report+0x96/0xc0
<3> [218.964183]  ? ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
<3> [218.964276]  ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
<3> [218.964365]  ttm_bo_set_bulk_move+0x92/0x140 [ttm]
<3> [218.964454]  xe_gem_object_close+0xc8/0x120 [xe]
<3> [218.964675]  ? __pfx_xe_gem_object_close+0x10/0x10 [xe]
<3> [218.964908]  ? drm_gem_object_handle_put_unlocked+0xc7/0x170 [drm]
<3> [218.965071]  drm_gem_object_release_handle+0x45/0x80 [drm]
<3> [218.965220]  ? __pfx_drm_gem_object_release_handle+0x10/0x10 [drm]
<3> [218.965381]  idr_for_each+0xc9/0x180
<3> [218.965437]  ? __pfx_idr_for_each+0x10/0x10
<3> [218.965504]  drm_gem_release+0x20/0x30 [drm]
<3> [218.965637]  drm_file_free.part.0+0x4cb/0x4f0 [drm]
<3> [218.965778]  ? drm_close_helper.isra.0+0xb7/0xe0 [drm]
<3> [218.965921]  drm_release_noglobal+0x49/0x90 [drm]
<3> [218.966061]  __fput+0x122/0x450
<3> [218.966115]  task_work_run+0xfe/0x190
<3> [218.966175]  ? __pfx_task_work_run+0x10/0x10
<3> [218.966239]  ? do_raw_spin_unlock+0xa7/0x140
<3> [218.966308]  do_exit+0x55f/0x1430
<3> [218.966364]  ? __pfx_lock_release+0x10/0x10
<3> [218.966431]  ? do_raw_spin_lock+0x11d/0x1e0
<3> [218.966498]  ? __pfx_do_exit+0x10/0x10
<3> [218.966554]  ? __pfx_do_raw_spin_lock+0x10/0x10
<3> [218.966625]  ? mark_held_locks+0x24/0x90
<3> [218.966688]  ? lockdep_hardirqs_on_prepare+0x136/0x210
<3> [218.966768]  do_group_exit+0x68/0x110
<3> [218.966828]  __x64_sys_exit_group+0x2c/0x30
<3> [218.966896]  do_syscall_64+0x3c/0x90
<3> [218.966955]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
<3> [218.967035] RIP: 0033:0x7f77b194f146
<3> [218.967094] Code: Unable to access opcode bytes at 0x7f77b194f11c.
<3> [218.967174] RSP: 002b:00007ffc64791188 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
<3> [218.967271] RAX: ffffffffffffffda RBX: 00007f77b1a548a0 RCX: 00007f77b194f146
<3> [218.967364] RDX: 0000000000000062 RSI: 000000000000003c RDI: 0000000000000062
<3> [218.967458] RBP: 0000000000000062 R08: 00000000000000e7 R09: ffffffffffffff78
<3> [218.967553] R10: 0000000000000058 R11: 0000000000000246 R12: 00007f77b1a548a0
<3> [218.967648] R13: 0000000000000003 R14: 00007f77b1a5d2e8 R15: 0000000000000000
<3> [218.967745]  </TASK>

Fixes: fee2ede15542 ("drm/ttm: rework bulk move handling v5")
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.19+
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/411
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_resource.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index 7333f7a87a2f..cb05e0a36576 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -86,6 +86,8 @@ static void ttm_lru_bulk_move_pos_tail(struct ttm_lru_bulk_move_pos *pos,
 				       struct ttm_resource *res)
 {
 	if (pos->last != res) {
+		if (pos->first == res)
+			pos->first = list_next_entry(res, lru);
 		list_move(&res->lru, &pos->last->lru);
 		pos->last = res;
 	}
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] drm/ttm: Don't shadow the operation context
       [not found] <20230622101412.78426-1-thomas.hellstrom@linux.intel.com>
  2023-06-22 10:14 ` [PATCH 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail() Thomas Hellström
@ 2023-06-22 10:14 ` Thomas Hellström
  2023-06-22 10:14 ` [PATCH 3/4] drm/ttm: Don't leak a resource on eviction error Thomas Hellström
  2023-06-22 10:14 ` [PATCH 4/4] drm/ttm: Don't leak a resource on swapout move error Thomas Hellström
  3 siblings, 0 replies; 14+ messages in thread
From: Thomas Hellström @ 2023-06-22 10:14 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Christian König, Roger He, dri-devel,
	intel-gfx, stable, Matthew Brost, linux-kernel,
	Christian König

ttm_bo_swapout() shadows the ttm operation context which may cause
major confusion in driver callbacks when swapping out !TTM_PL_SYSTEM
memory. Fix this by reusing the operation context argument to
ttm_bo_swapout().

Cc: "Christian König" <christian.koenig@amd.com>
Cc: Roger He <Hongbo.He@amd.com>
Cc: <dri-devel@lists.freedesktop.org>
Cc: <intel-gfx@lists.freedesktop.org>
Cc: <stable@vger.kernel.org> # v4.16+
Fixes: dc947770cf34 ("drm/ttm: enable swapout for reserved BOs during allocation")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bd5dae4d1624..615d30c4262d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1154,7 +1154,6 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
 	 * Move to system cached
 	 */
 	if (bo->resource->mem_type != TTM_PL_SYSTEM) {
-		struct ttm_operation_ctx ctx = { false, false };
 		struct ttm_resource *evict_mem;
 		struct ttm_place hop;
 
@@ -1164,7 +1163,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
 		if (unlikely(ret))
 			goto out;
 
-		ret = ttm_bo_handle_move_mem(bo, evict_mem, true, &ctx, &hop);
+		ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
 		if (unlikely(ret != 0)) {
 			WARN(ret == -EMULTIHOP, "Unexpected multihop in swaput - likely driver bug.\n");
 			goto out;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] drm/ttm: Don't leak a resource on eviction error
       [not found] <20230622101412.78426-1-thomas.hellstrom@linux.intel.com>
  2023-06-22 10:14 ` [PATCH 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail() Thomas Hellström
  2023-06-22 10:14 ` [PATCH 2/4] drm/ttm: Don't shadow the operation context Thomas Hellström
@ 2023-06-22 10:14 ` Thomas Hellström
  2023-06-22 12:25   ` Nirmoy Das
  2023-06-22 13:55   ` [Intel-gfx] " Andi Shyti
  2023-06-22 10:14 ` [PATCH 4/4] drm/ttm: Don't leak a resource on swapout move error Thomas Hellström
  3 siblings, 2 replies; 14+ messages in thread
From: Thomas Hellström @ 2023-06-22 10:14 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Andrey Grodzovsky, Christian König,
	Huang Rui, dri-devel, stable, intel-gfx, linux-kernel,
	Christian König

On eviction errors other than -EMULTIHOP we were leaking a resource.
Fix.

Fixes: 403797925768 ("drm/ttm: Fix multihop assert on eviction.")
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.15+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 615d30c4262d..89530f2a027f 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -462,14 +462,14 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
 	ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
 	if (ret == -EMULTIHOP) {
 		ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);
-		if (ret) {
-			if (ret != -ERESTARTSYS && ret != -EINTR)
-				pr_err("Buffer eviction failed\n");
-			ttm_resource_free(bo, &evict_mem);
-			goto out;
-		}
-		/* try and move to final place now. */
-		goto bounce;
+		if (!ret)
+			/* try and move to final place now. */
+			goto bounce;
+	}
+	if (ret) {
+		ttm_resource_free(bo, &evict_mem);
+		if (ret != -ERESTARTSYS && ret != -EINTR)
+			pr_err("Buffer eviction failed\n");
 	}
 out:
 	return ret;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] drm/ttm: Don't leak a resource on swapout move error
       [not found] <20230622101412.78426-1-thomas.hellstrom@linux.intel.com>
                   ` (2 preceding siblings ...)
  2023-06-22 10:14 ` [PATCH 3/4] drm/ttm: Don't leak a resource on eviction error Thomas Hellström
@ 2023-06-22 10:14 ` Thomas Hellström
  2023-06-22 12:25   ` Nirmoy Das
  2023-06-22 13:56   ` [Intel-gfx] " Andi Shyti
  3 siblings, 2 replies; 14+ messages in thread
From: Thomas Hellström @ 2023-06-22 10:14 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Christian König, Christian König,
	dri-devel, stable, intel-gfx, linux-kernel

If moving the bo to system for swapout failed, we were leaking
a resource. Fix.

Fixes: bfa3357ef9ab ("drm/ttm: allocate resource object instead of embedding it v2")
Cc: Christian König <christian.koenig@amd.com>
Cc: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.14+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 89530f2a027f..d737ddd7f4c0 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1166,6 +1166,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
 		ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
 		if (unlikely(ret != 0)) {
 			WARN(ret == -EMULTIHOP, "Unexpected multihop in swaput - likely driver bug.\n");
+			ttm_resource_free(bo, &evict_mem);
 			goto out;
 		}
 	}
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] drm/ttm: Don't leak a resource on eviction error
  2023-06-22 10:14 ` [PATCH 3/4] drm/ttm: Don't leak a resource on eviction error Thomas Hellström
@ 2023-06-22 12:25   ` Nirmoy Das
  2023-06-22 13:55   ` [Intel-gfx] " Andi Shyti
  1 sibling, 0 replies; 14+ messages in thread
From: Nirmoy Das @ 2023-06-22 12:25 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: Andrey Grodzovsky, Christian König, intel-gfx, linux-kernel,
	stable, Huang Rui, dri-devel, Christian König


On 6/22/2023 12:14 PM, Thomas Hellström wrote:
> On eviction errors other than -EMULTIHOP we were leaking a resource.
> Fix.
>
> Fixes: 403797925768 ("drm/ttm: Fix multihop assert on eviction.")
> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v5.15+
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>

> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 16 ++++++++--------
>   1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 615d30c4262d..89530f2a027f 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -462,14 +462,14 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
>   	ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
>   	if (ret == -EMULTIHOP) {
>   		ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);
> -		if (ret) {
> -			if (ret != -ERESTARTSYS && ret != -EINTR)
> -				pr_err("Buffer eviction failed\n");
> -			ttm_resource_free(bo, &evict_mem);
> -			goto out;
> -		}
> -		/* try and move to final place now. */
> -		goto bounce;
> +		if (!ret)
> +			/* try and move to final place now. */
> +			goto bounce;
> +	}
> +	if (ret) {
> +		ttm_resource_free(bo, &evict_mem);
> +		if (ret != -ERESTARTSYS && ret != -EINTR)
> +			pr_err("Buffer eviction failed\n");
>   	}
>   out:
>   	return ret;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] drm/ttm: Don't leak a resource on swapout move error
  2023-06-22 10:14 ` [PATCH 4/4] drm/ttm: Don't leak a resource on swapout move error Thomas Hellström
@ 2023-06-22 12:25   ` Nirmoy Das
  2023-06-22 13:56   ` [Intel-gfx] " Andi Shyti
  1 sibling, 0 replies; 14+ messages in thread
From: Nirmoy Das @ 2023-06-22 12:25 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: Christian König, intel-gfx, linux-kernel, stable, dri-devel,
	Christian König


On 6/22/2023 12:14 PM, Thomas Hellström wrote:
> If moving the bo to system for swapout failed, we were leaking
> a resource. Fix.
>
> Fixes: bfa3357ef9ab ("drm/ttm: allocate resource object instead of embedding it v2")
> Cc: Christian König <christian.koenig@amd.com>
> Cc: "Christian König" <ckoenig.leichtzumerken@gmail.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v5.14+
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 89530f2a027f..d737ddd7f4c0 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1166,6 +1166,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>   		ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
>   		if (unlikely(ret != 0)) {
>   			WARN(ret == -EMULTIHOP, "Unexpected multihop in swaput - likely driver bug.\n");
> +			ttm_resource_free(bo, &evict_mem);
>   			goto out;
>   		}
>   	}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail()
  2023-06-22 10:14 ` [PATCH 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail() Thomas Hellström
@ 2023-06-22 12:26   ` Nirmoy Das
  2023-06-22 12:26   ` Christian König
  1 sibling, 0 replies; 14+ messages in thread
From: Nirmoy Das @ 2023-06-22 12:26 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: Daniel Vetter, intel-gfx, linux-kernel, stable, dri-devel,
	Christian König, Christian König, Yunxiang.Li

Hi Thomas,

Saw a fix from Yunxiang for this: 
https://patchwork.freedesktop.org/patch/543652/?series=119697&rev=1


Regards,

Nirmoy

On 6/22/2023 12:14 PM, Thomas Hellström wrote:
> The value of pos->first was not updated when the first resource of the
> range was moved. This could lead to errors like the one below.
> Fix this by updating pos->first when needed.
>
> <3> [218.963342] BUG: KASAN: null-ptr-deref in ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
> <3> [218.963456] Read of size 8 at addr 0000000000000038 by task xe_evict/1529
> <3> [218.963546]
> <3> [218.963566] CPU: 0 PID: 1529 Comm: xe_evict Not tainted 6.3.0-xe #1
> <3> [218.963664] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake H DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.4064.A00.2102041619 02/04/2021
> <3> [218.963841] Call Trace:
> <3> [218.963881]  <TASK>
> <3> [218.963915]  dump_stack_lvl+0x64/0xb0
> <3> [218.963976]  print_report+0x3e5/0x600
> <3> [218.964036]  ? ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
> <3> [218.964127]  kasan_report+0x96/0xc0
> <3> [218.964183]  ? ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
> <3> [218.964276]  ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
> <3> [218.964365]  ttm_bo_set_bulk_move+0x92/0x140 [ttm]
> <3> [218.964454]  xe_gem_object_close+0xc8/0x120 [xe]
> <3> [218.964675]  ? __pfx_xe_gem_object_close+0x10/0x10 [xe]
> <3> [218.964908]  ? drm_gem_object_handle_put_unlocked+0xc7/0x170 [drm]
> <3> [218.965071]  drm_gem_object_release_handle+0x45/0x80 [drm]
> <3> [218.965220]  ? __pfx_drm_gem_object_release_handle+0x10/0x10 [drm]
> <3> [218.965381]  idr_for_each+0xc9/0x180
> <3> [218.965437]  ? __pfx_idr_for_each+0x10/0x10
> <3> [218.965504]  drm_gem_release+0x20/0x30 [drm]
> <3> [218.965637]  drm_file_free.part.0+0x4cb/0x4f0 [drm]
> <3> [218.965778]  ? drm_close_helper.isra.0+0xb7/0xe0 [drm]
> <3> [218.965921]  drm_release_noglobal+0x49/0x90 [drm]
> <3> [218.966061]  __fput+0x122/0x450
> <3> [218.966115]  task_work_run+0xfe/0x190
> <3> [218.966175]  ? __pfx_task_work_run+0x10/0x10
> <3> [218.966239]  ? do_raw_spin_unlock+0xa7/0x140
> <3> [218.966308]  do_exit+0x55f/0x1430
> <3> [218.966364]  ? __pfx_lock_release+0x10/0x10
> <3> [218.966431]  ? do_raw_spin_lock+0x11d/0x1e0
> <3> [218.966498]  ? __pfx_do_exit+0x10/0x10
> <3> [218.966554]  ? __pfx_do_raw_spin_lock+0x10/0x10
> <3> [218.966625]  ? mark_held_locks+0x24/0x90
> <3> [218.966688]  ? lockdep_hardirqs_on_prepare+0x136/0x210
> <3> [218.966768]  do_group_exit+0x68/0x110
> <3> [218.966828]  __x64_sys_exit_group+0x2c/0x30
> <3> [218.966896]  do_syscall_64+0x3c/0x90
> <3> [218.966955]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> <3> [218.967035] RIP: 0033:0x7f77b194f146
> <3> [218.967094] Code: Unable to access opcode bytes at 0x7f77b194f11c.
> <3> [218.967174] RSP: 002b:00007ffc64791188 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> <3> [218.967271] RAX: ffffffffffffffda RBX: 00007f77b1a548a0 RCX: 00007f77b194f146
> <3> [218.967364] RDX: 0000000000000062 RSI: 000000000000003c RDI: 0000000000000062
> <3> [218.967458] RBP: 0000000000000062 R08: 00000000000000e7 R09: ffffffffffffff78
> <3> [218.967553] R10: 0000000000000058 R11: 0000000000000246 R12: 00007f77b1a548a0
> <3> [218.967648] R13: 0000000000000003 R14: 00007f77b1a5d2e8 R15: 0000000000000000
> <3> [218.967745]  </TASK>
>
> Fixes: fee2ede15542 ("drm/ttm: rework bulk move handling v5")
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: "Christian König" <ckoenig.leichtzumerken@gmail.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v5.19+
> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/411
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/ttm/ttm_resource.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
> index 7333f7a87a2f..cb05e0a36576 100644
> --- a/drivers/gpu/drm/ttm/ttm_resource.c
> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
> @@ -86,6 +86,8 @@ static void ttm_lru_bulk_move_pos_tail(struct ttm_lru_bulk_move_pos *pos,
>   				       struct ttm_resource *res)
>   {
>   	if (pos->last != res) {
> +		if (pos->first == res)
> +			pos->first = list_next_entry(res, lru);
>   		list_move(&res->lru, &pos->last->lru);
>   		pos->last = res;
>   	}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail()
  2023-06-22 10:14 ` [PATCH 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail() Thomas Hellström
  2023-06-22 12:26   ` [Intel-gfx] " Nirmoy Das
@ 2023-06-22 12:26   ` Christian König
  1 sibling, 0 replies; 14+ messages in thread
From: Christian König @ 2023-06-22 12:26 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: Christian König, Daniel Vetter, dri-devel, stable, intel-gfx,
	linux-kernel

Teddy already stumbled over this as well, but this patch here looks better.

Going to review and push it to drm-misc-fixes asap.

Thanks,
Christian.

Am 22.06.23 um 12:14 schrieb Thomas Hellström:
> The value of pos->first was not updated when the first resource of the
> range was moved. This could lead to errors like the one below.
> Fix this by updating pos->first when needed.
>
> <3> [218.963342] BUG: KASAN: null-ptr-deref in ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
> <3> [218.963456] Read of size 8 at addr 0000000000000038 by task xe_evict/1529
> <3> [218.963546]
> <3> [218.963566] CPU: 0 PID: 1529 Comm: xe_evict Not tainted 6.3.0-xe #1
> <3> [218.963664] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake H DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.4064.A00.2102041619 02/04/2021
> <3> [218.963841] Call Trace:
> <3> [218.963881]  <TASK>
> <3> [218.963915]  dump_stack_lvl+0x64/0xb0
> <3> [218.963976]  print_report+0x3e5/0x600
> <3> [218.964036]  ? ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
> <3> [218.964127]  kasan_report+0x96/0xc0
> <3> [218.964183]  ? ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
> <3> [218.964276]  ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
> <3> [218.964365]  ttm_bo_set_bulk_move+0x92/0x140 [ttm]
> <3> [218.964454]  xe_gem_object_close+0xc8/0x120 [xe]
> <3> [218.964675]  ? __pfx_xe_gem_object_close+0x10/0x10 [xe]
> <3> [218.964908]  ? drm_gem_object_handle_put_unlocked+0xc7/0x170 [drm]
> <3> [218.965071]  drm_gem_object_release_handle+0x45/0x80 [drm]
> <3> [218.965220]  ? __pfx_drm_gem_object_release_handle+0x10/0x10 [drm]
> <3> [218.965381]  idr_for_each+0xc9/0x180
> <3> [218.965437]  ? __pfx_idr_for_each+0x10/0x10
> <3> [218.965504]  drm_gem_release+0x20/0x30 [drm]
> <3> [218.965637]  drm_file_free.part.0+0x4cb/0x4f0 [drm]
> <3> [218.965778]  ? drm_close_helper.isra.0+0xb7/0xe0 [drm]
> <3> [218.965921]  drm_release_noglobal+0x49/0x90 [drm]
> <3> [218.966061]  __fput+0x122/0x450
> <3> [218.966115]  task_work_run+0xfe/0x190
> <3> [218.966175]  ? __pfx_task_work_run+0x10/0x10
> <3> [218.966239]  ? do_raw_spin_unlock+0xa7/0x140
> <3> [218.966308]  do_exit+0x55f/0x1430
> <3> [218.966364]  ? __pfx_lock_release+0x10/0x10
> <3> [218.966431]  ? do_raw_spin_lock+0x11d/0x1e0
> <3> [218.966498]  ? __pfx_do_exit+0x10/0x10
> <3> [218.966554]  ? __pfx_do_raw_spin_lock+0x10/0x10
> <3> [218.966625]  ? mark_held_locks+0x24/0x90
> <3> [218.966688]  ? lockdep_hardirqs_on_prepare+0x136/0x210
> <3> [218.966768]  do_group_exit+0x68/0x110
> <3> [218.966828]  __x64_sys_exit_group+0x2c/0x30
> <3> [218.966896]  do_syscall_64+0x3c/0x90
> <3> [218.966955]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> <3> [218.967035] RIP: 0033:0x7f77b194f146
> <3> [218.967094] Code: Unable to access opcode bytes at 0x7f77b194f11c.
> <3> [218.967174] RSP: 002b:00007ffc64791188 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> <3> [218.967271] RAX: ffffffffffffffda RBX: 00007f77b1a548a0 RCX: 00007f77b194f146
> <3> [218.967364] RDX: 0000000000000062 RSI: 000000000000003c RDI: 0000000000000062
> <3> [218.967458] RBP: 0000000000000062 R08: 00000000000000e7 R09: ffffffffffffff78
> <3> [218.967553] R10: 0000000000000058 R11: 0000000000000246 R12: 00007f77b1a548a0
> <3> [218.967648] R13: 0000000000000003 R14: 00007f77b1a5d2e8 R15: 0000000000000000
> <3> [218.967745]  </TASK>
>
> Fixes: fee2ede15542 ("drm/ttm: rework bulk move handling v5")
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: "Christian König" <ckoenig.leichtzumerken@gmail.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v5.19+
> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/411
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/ttm/ttm_resource.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
> index 7333f7a87a2f..cb05e0a36576 100644
> --- a/drivers/gpu/drm/ttm/ttm_resource.c
> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
> @@ -86,6 +86,8 @@ static void ttm_lru_bulk_move_pos_tail(struct ttm_lru_bulk_move_pos *pos,
>   				       struct ttm_resource *res)
>   {
>   	if (pos->last != res) {
> +		if (pos->first == res)
> +			pos->first = list_next_entry(res, lru);
>   		list_move(&res->lru, &pos->last->lru);
>   		pos->last = res;
>   	}


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] drm/ttm: Don't leak a resource on eviction error
  2023-06-22 10:14 ` [PATCH 3/4] drm/ttm: Don't leak a resource on eviction error Thomas Hellström
  2023-06-22 12:25   ` Nirmoy Das
@ 2023-06-22 13:55   ` Andi Shyti
  2023-06-22 14:08     ` Thomas Hellström
  1 sibling, 1 reply; 14+ messages in thread
From: Andi Shyti @ 2023-06-22 13:55 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Andrey Grodzovsky, Christian König, intel-gfx,
	linux-kernel, stable, Huang Rui, dri-devel, Christian König

Hi Thomas,

On Thu, Jun 22, 2023 at 12:14:11PM +0200, Thomas Hellström wrote:
> On eviction errors other than -EMULTIHOP we were leaking a resource.
> Fix.
> 
> Fixes: 403797925768 ("drm/ttm: Fix multihop assert on eviction.")
> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v5.15+
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 615d30c4262d..89530f2a027f 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -462,14 +462,14 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
>  	ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
>  	if (ret == -EMULTIHOP) {
>  		ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);
> -		if (ret) {
> -			if (ret != -ERESTARTSYS && ret != -EINTR)
> -				pr_err("Buffer eviction failed\n");
> -			ttm_resource_free(bo, &evict_mem);
> -			goto out;
> -		}
> -		/* try and move to final place now. */
> -		goto bounce;
> +		if (!ret)
> +			/* try and move to final place now. */
> +			goto bounce;

As we are at this, can't we replace this with a while()? Goto's
used instead of a while loop are a fist in the eye...

It looks even better:

	while (1) {
		ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
		if (!ret)
			break;

		if (ret == -EMULTIHOP)
			ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem,
							ctx, &hop);

		/* try again */
		if (!ret)
			continue;

		ttm_resource_free(bo, &evict_mem);
		if (ret != -ERESTARTSYS && ret != -EINTR)
			pr_err("Buffer eviction failed\n");

		break;
	}

Andi

> +	}
> +	if (ret) {
> +		ttm_resource_free(bo, &evict_mem);
> +		if (ret != -ERESTARTSYS && ret != -EINTR)
> +			pr_err("Buffer eviction failed\n");
>  	}
>  out:
>  	return ret;
> -- 
> 2.40.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-gfx] [PATCH 4/4] drm/ttm: Don't leak a resource on swapout move error
  2023-06-22 10:14 ` [PATCH 4/4] drm/ttm: Don't leak a resource on swapout move error Thomas Hellström
  2023-06-22 12:25   ` Nirmoy Das
@ 2023-06-22 13:56   ` Andi Shyti
  1 sibling, 0 replies; 14+ messages in thread
From: Andi Shyti @ 2023-06-22 13:56 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Christian König, intel-gfx, linux-kernel, stable,
	dri-devel, Christian König

Hi Thomas,

On Thu, Jun 22, 2023 at 12:14:12PM +0200, Thomas Hellström wrote:
> If moving the bo to system for swapout failed, we were leaking
> a resource. Fix.
> 
> Fixes: bfa3357ef9ab ("drm/ttm: allocate resource object instead of embedding it v2")
> Cc: Christian König <christian.koenig@amd.com>
> Cc: "Christian König" <ckoenig.leichtzumerken@gmail.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v5.14+
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> 

Andi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] drm/ttm: Don't leak a resource on eviction error
  2023-06-22 13:55   ` [Intel-gfx] " Andi Shyti
@ 2023-06-22 14:08     ` Thomas Hellström
  2023-06-22 14:48       ` Christian König
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2023-06-22 14:08 UTC (permalink / raw)
  To: Andi Shyti
  Cc: intel-xe, Andrey Grodzovsky, Christian König, intel-gfx,
	linux-kernel, stable, Huang Rui, dri-devel, Christian König


On 6/22/23 15:55, Andi Shyti wrote:
> Hi Thomas,
>
> On Thu, Jun 22, 2023 at 12:14:11PM +0200, Thomas Hellström wrote:
>> On eviction errors other than -EMULTIHOP we were leaking a resource.
>> Fix.
>>
>> Fixes: 403797925768 ("drm/ttm: Fix multihop assert on eviction.")
>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Cc: Huang Rui <ray.huang@amd.com>
>> Cc: dri-devel@lists.freedesktop.org
>> Cc: <stable@vger.kernel.org> # v5.15+
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo.c | 16 ++++++++--------
>>   1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index 615d30c4262d..89530f2a027f 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -462,14 +462,14 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
>>   	ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
>>   	if (ret == -EMULTIHOP) {
>>   		ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);
>> -		if (ret) {
>> -			if (ret != -ERESTARTSYS && ret != -EINTR)
>> -				pr_err("Buffer eviction failed\n");
>> -			ttm_resource_free(bo, &evict_mem);
>> -			goto out;
>> -		}
>> -		/* try and move to final place now. */
>> -		goto bounce;
>> +		if (!ret)
>> +			/* try and move to final place now. */
>> +			goto bounce;
> As we are at this, can't we replace this with a while()? Goto's
> used instead of a while loop are a fist in the eye...

I'm completely OK with that. this patch already did away with one of 
them. Let's hear Christian's opinion first, though.

Thanks,

Thomas





>
> It looks even better:
>
> 	while (1) {
> 		ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
> 		if (!ret)
> 			break;
>
> 		if (ret == -EMULTIHOP)
> 			ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem,
> 							ctx, &hop);
>
> 		/* try again */
> 		if (!ret)
> 			continue;
>
> 		ttm_resource_free(bo, &evict_mem);
> 		if (ret != -ERESTARTSYS && ret != -EINTR)
> 			pr_err("Buffer eviction failed\n");
>
> 		break;
> 	}
>
> Andi
>
>> +	}
>> +	if (ret) {
>> +		ttm_resource_free(bo, &evict_mem);
>> +		if (ret != -ERESTARTSYS && ret != -EINTR)
>> +			pr_err("Buffer eviction failed\n");
>>   	}
>>   out:
>>   	return ret;
>> -- 
>> 2.40.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] drm/ttm: Don't leak a resource on eviction error
  2023-06-22 14:08     ` Thomas Hellström
@ 2023-06-22 14:48       ` Christian König
  2023-06-22 17:03         ` Thomas Hellström
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2023-06-22 14:48 UTC (permalink / raw)
  To: Thomas Hellström, Andi Shyti
  Cc: intel-xe, Andrey Grodzovsky, Christian König, intel-gfx,
	linux-kernel, stable, Huang Rui, dri-devel



Am 22.06.23 um 16:08 schrieb Thomas Hellström:
>
> On 6/22/23 15:55, Andi Shyti wrote:
>> Hi Thomas,
>>
>> On Thu, Jun 22, 2023 at 12:14:11PM +0200, Thomas Hellström wrote:
>>> On eviction errors other than -EMULTIHOP we were leaking a resource.
>>> Fix.
>>>
>>> Fixes: 403797925768 ("drm/ttm: Fix multihop assert on eviction.")
>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>> Cc: Huang Rui <ray.huang@amd.com>
>>> Cc: dri-devel@lists.freedesktop.org
>>> Cc: <stable@vger.kernel.org> # v5.15+
>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/ttm/ttm_bo.c | 16 ++++++++--------
>>>   1 file changed, 8 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 615d30c4262d..89530f2a027f 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -462,14 +462,14 @@ static int ttm_bo_evict(struct 
>>> ttm_buffer_object *bo,
>>>       ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
>>>       if (ret == -EMULTIHOP) {
>>>           ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);
>>> -        if (ret) {
>>> -            if (ret != -ERESTARTSYS && ret != -EINTR)
>>> -                pr_err("Buffer eviction failed\n");
>>> -            ttm_resource_free(bo, &evict_mem);
>>> -            goto out;
>>> -        }
>>> -        /* try and move to final place now. */
>>> -        goto bounce;
>>> +        if (!ret)
>>> +            /* try and move to final place now. */
>>> +            goto bounce;
>> As we are at this, can't we replace this with a while()? Goto's
>> used instead of a while loop are a fist in the eye...
>
> I'm completely OK with that. this patch already did away with one of 
> them. Let's hear Christian's opinion first, though.

I'm not a fan of that goto either, but could we somehow avoid the 
while(1) ? E.g. something like do { } while (!ret) after handling the 
multihop?

Christian.

>
> Thanks,
>
> Thomas
>
>
>
>
>
>>
>> It looks even better:
>>
>>     while (1) {
>>         ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
>>         if (!ret)
>>             break;
>>
>>         if (ret == -EMULTIHOP)
>>             ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem,
>>                             ctx, &hop);
>>
>>         /* try again */
>>         if (!ret)
>>             continue;
>>
>>         ttm_resource_free(bo, &evict_mem);
>>         if (ret != -ERESTARTSYS && ret != -EINTR)
>>             pr_err("Buffer eviction failed\n");
>>
>>         break;
>>     }
>>
>> Andi
>>
>>> +    }
>>> +    if (ret) {
>>> +        ttm_resource_free(bo, &evict_mem);
>>> +        if (ret != -ERESTARTSYS && ret != -EINTR)
>>> +            pr_err("Buffer eviction failed\n");
>>>       }
>>>   out:
>>>       return ret;
>>> -- 
>>> 2.40.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] drm/ttm: Don't leak a resource on eviction error
  2023-06-22 14:48       ` Christian König
@ 2023-06-22 17:03         ` Thomas Hellström
  2023-06-23  7:48           ` Andi Shyti
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2023-06-22 17:03 UTC (permalink / raw)
  To: Christian König, Andi Shyti
  Cc: intel-xe, Andrey Grodzovsky, Christian König, intel-gfx,
	linux-kernel, stable, Huang Rui, dri-devel


On 6/22/23 16:48, Christian König wrote:
>
>
> Am 22.06.23 um 16:08 schrieb Thomas Hellström:
>>
>> On 6/22/23 15:55, Andi Shyti wrote:
>>> Hi Thomas,
>>>
>>> On Thu, Jun 22, 2023 at 12:14:11PM +0200, Thomas Hellström wrote:
>>>> On eviction errors other than -EMULTIHOP we were leaking a resource.
>>>> Fix.
>>>>
>>>> Fixes: 403797925768 ("drm/ttm: Fix multihop assert on eviction.")
>>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>> Cc: Huang Rui <ray.huang@amd.com>
>>>> Cc: dri-devel@lists.freedesktop.org
>>>> Cc: <stable@vger.kernel.org> # v5.15+
>>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>> ---
>>>>   drivers/gpu/drm/ttm/ttm_bo.c | 16 ++++++++--------
>>>>   1 file changed, 8 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
>>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>>> index 615d30c4262d..89530f2a027f 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>> @@ -462,14 +462,14 @@ static int ttm_bo_evict(struct 
>>>> ttm_buffer_object *bo,
>>>>       ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
>>>>       if (ret == -EMULTIHOP) {
>>>>           ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);
>>>> -        if (ret) {
>>>> -            if (ret != -ERESTARTSYS && ret != -EINTR)
>>>> -                pr_err("Buffer eviction failed\n");
>>>> -            ttm_resource_free(bo, &evict_mem);
>>>> -            goto out;
>>>> -        }
>>>> -        /* try and move to final place now. */
>>>> -        goto bounce;
>>>> +        if (!ret)
>>>> +            /* try and move to final place now. */
>>>> +            goto bounce;
>>> As we are at this, can't we replace this with a while()? Goto's
>>> used instead of a while loop are a fist in the eye...
>>
>> I'm completely OK with that. this patch already did away with one of 
>> them. Let's hear Christian's opinion first, though.
>
> I'm not a fan of that goto either, but could we somehow avoid the 
> while(1) ? E.g. something like do { } while (!ret) after handling the 
> multihop?

I think the construct that makes it most obvious what's happening, 
although it needs two tests for -EMULTIHOP is something like

do {
....
    if (ret != -EMULTIHOP)
       break;
    ....
} while (ret ==-EMULTIHOP);

Will be out tomorrow, though, so I don't have time to respin before Monday.

/Thomas


>
> Christian.
>
>>
>> Thanks,
>>
>> Thomas
>>
>>
>>
>>
>>
>>>
>>> It looks even better:
>>>
>>>     while (1) {
>>>         ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
>>>         if (!ret)
>>>             break;
>>>
>>>         if (ret == -EMULTIHOP)
>>>             ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem,
>>>                             ctx, &hop);
>>>
>>>         /* try again */
>>>         if (!ret)
>>>             continue;
>>>
>>>         ttm_resource_free(bo, &evict_mem);
>>>         if (ret != -ERESTARTSYS && ret != -EINTR)
>>>             pr_err("Buffer eviction failed\n");
>>>
>>>         break;
>>>     }
>>>
>>> Andi
>>>
>>>> +    }
>>>> +    if (ret) {
>>>> +        ttm_resource_free(bo, &evict_mem);
>>>> +        if (ret != -ERESTARTSYS && ret != -EINTR)
>>>> +            pr_err("Buffer eviction failed\n");
>>>>       }
>>>>   out:
>>>>       return ret;
>>>> -- 
>>>> 2.40.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] drm/ttm: Don't leak a resource on eviction error
  2023-06-22 17:03         ` Thomas Hellström
@ 2023-06-23  7:48           ` Andi Shyti
  0 siblings, 0 replies; 14+ messages in thread
From: Andi Shyti @ 2023-06-23  7:48 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Christian König, Andi Shyti, intel-xe, Andrey Grodzovsky,
	Christian König, intel-gfx, linux-kernel, stable, Huang Rui,
	dri-devel

Hi Christian and Thomas,

> > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
> > > > > b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > > index 615d30c4262d..89530f2a027f 100644
> > > > > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > > > > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > > @@ -462,14 +462,14 @@ static int ttm_bo_evict(struct
> > > > > ttm_buffer_object *bo,
> > > > >       ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
> > > > >       if (ret == -EMULTIHOP) {
> > > > >           ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);
> > > > > -        if (ret) {
> > > > > -            if (ret != -ERESTARTSYS && ret != -EINTR)
> > > > > -                pr_err("Buffer eviction failed\n");
> > > > > -            ttm_resource_free(bo, &evict_mem);
> > > > > -            goto out;
> > > > > -        }
> > > > > -        /* try and move to final place now. */
> > > > > -        goto bounce;
> > > > > +        if (!ret)
> > > > > +            /* try and move to final place now. */
> > > > > +            goto bounce;
> > > > As we are at this, can't we replace this with a while()? Goto's
> > > > used instead of a while loop are a fist in the eye...
> > > 
> > > I'm completely OK with that. this patch already did away with one of
> > > them. Let's hear Christian's opinion first, though.
> > 
> > I'm not a fan of that goto either, but could we somehow avoid the
> > while(1) ? E.g. something like do { } while (!ret) after handling the
> > multihop?
> 
> I think the construct that makes it most obvious what's happening, although
> it needs two tests for -EMULTIHOP is something like
> 
> do {
> ....
>    if (ret != -EMULTIHOP)
>       break;
>    ....
> } while (ret ==-EMULTIHOP);

even better :)

Thank you!
Andi

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-06-23  7:48 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20230622101412.78426-1-thomas.hellstrom@linux.intel.com>
2023-06-22 10:14 ` [PATCH 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail() Thomas Hellström
2023-06-22 12:26   ` [Intel-gfx] " Nirmoy Das
2023-06-22 12:26   ` Christian König
2023-06-22 10:14 ` [PATCH 2/4] drm/ttm: Don't shadow the operation context Thomas Hellström
2023-06-22 10:14 ` [PATCH 3/4] drm/ttm: Don't leak a resource on eviction error Thomas Hellström
2023-06-22 12:25   ` Nirmoy Das
2023-06-22 13:55   ` [Intel-gfx] " Andi Shyti
2023-06-22 14:08     ` Thomas Hellström
2023-06-22 14:48       ` Christian König
2023-06-22 17:03         ` Thomas Hellström
2023-06-23  7:48           ` Andi Shyti
2023-06-22 10:14 ` [PATCH 4/4] drm/ttm: Don't leak a resource on swapout move error Thomas Hellström
2023-06-22 12:25   ` Nirmoy Das
2023-06-22 13:56   ` [Intel-gfx] " Andi Shyti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).