* [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage
@ 2022-08-24 14:23 Luben Tuikov
2022-08-24 14:23 ` [Intel-gfx] [PATCH 2/3] drm/ttm: stop allocating dummy resources during BO creation Luben Tuikov
` (5 more replies)
0 siblings, 6 replies; 24+ messages in thread
From: Luben Tuikov @ 2022-08-24 14:23 UTC (permalink / raw)
To: dri-devel, Intel Graphics; +Cc: Luben Tuikov, Christian König
From: Christian König <christian.koenig@amd.com>
Make sure we can at least move and alloc TT objects without backing store.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 6 ++----
drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 2 +-
2 files changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index bc9c432edffe03..45ce2d1f754cc4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -271,8 +271,6 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
{
struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
bdev);
- struct ttm_resource_manager *man =
- ttm_manager_type(bo->bdev, bo->resource->mem_type);
struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
unsigned long ccs_pages = 0;
enum ttm_caching caching;
@@ -286,8 +284,8 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
if (!i915_tt)
return NULL;
- if (obj->flags & I915_BO_ALLOC_CPU_CLEAR &&
- man->use_tt)
+ if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && bo->resource &&
+ ttm_manager_type(bo->bdev, bo->resource->mem_type)->use_tt)
page_flags |= TTM_TT_FLAG_ZERO_ALLOC;
caching = i915_ttm_select_tt_caching(obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 9a7e50534b84bb..c420d1ab605b6f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -560,7 +560,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
bool clear;
int ret;
- if (GEM_WARN_ON(!obj)) {
+ if (GEM_WARN_ON(!obj) || !bo->resource) {
ttm_bo_move_null(bo, dst_mem);
return 0;
}
--
2.36.1.74.g277cf0bc36
^ permalink raw reply related [flat|nested] 24+ messages in thread* [Intel-gfx] [PATCH 2/3] drm/ttm: stop allocating dummy resources during BO creation 2022-08-24 14:23 [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage Luben Tuikov @ 2022-08-24 14:23 ` Luben Tuikov 2022-08-24 14:23 ` [Intel-gfx] [PATCH 3/3] drm/ttm: stop allocating a dummy resource for pipelined gutting Luben Tuikov ` (4 subsequent siblings) 5 siblings, 0 replies; 24+ messages in thread From: Luben Tuikov @ 2022-08-24 14:23 UTC (permalink / raw) To: dri-devel, Intel Graphics; +Cc: Luben Tuikov, Christian König From: Christian König <christian.koenig@amd.com> That should not be necessary any more when drivers should at least be able to handle the move without a resource. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> --- drivers/gpu/drm/ttm/ttm_bo.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index f066e8124c5058..d7ed891d388769 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -959,7 +959,6 @@ int ttm_bo_init_reserved(struct ttm_device *bdev, struct ttm_buffer_object *bo, struct sg_table *sg, struct dma_resv *resv, void (*destroy) (struct ttm_buffer_object *)) { - static const struct ttm_place sys_mem = { .mem_type = TTM_PL_SYSTEM }; int ret; kref_init(&bo->kref); @@ -977,12 +976,6 @@ int ttm_bo_init_reserved(struct ttm_device *bdev, struct ttm_buffer_object *bo, bo->base.resv = &bo->base._resv; atomic_inc(&ttm_glob.bo_count); - ret = ttm_resource_alloc(bo, &sys_mem, &bo->resource); - if (unlikely(ret)) { - ttm_bo_put(bo); - return ret; - } - /* * For ttm_bo_type_device buffers, allocate * address space from the device. -- 2.36.1.74.g277cf0bc36 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* [Intel-gfx] [PATCH 3/3] drm/ttm: stop allocating a dummy resource for pipelined gutting 2022-08-24 14:23 [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage Luben Tuikov 2022-08-24 14:23 ` [Intel-gfx] [PATCH 2/3] drm/ttm: stop allocating dummy resources during BO creation Luben Tuikov @ 2022-08-24 14:23 ` Luben Tuikov 2022-08-24 16:21 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [1/3] drm/i915: audit bo->resource usage Patchwork ` (3 subsequent siblings) 5 siblings, 0 replies; 24+ messages in thread From: Luben Tuikov @ 2022-08-24 14:23 UTC (permalink / raw) To: dri-devel, Intel Graphics; +Cc: Luben Tuikov, Christian König From: Christian König <christian.koenig@amd.com> That should not be necessary any more when drivers should at least be able to handle a move without a resource. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> --- drivers/gpu/drm/ttm/ttm_bo_util.c | 15 ++------------- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index 497ee1fdbad7bf..a1c4dc031cae63 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -607,16 +607,10 @@ EXPORT_SYMBOL(ttm_bo_move_sync_cleanup); */ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo) { - static const struct ttm_place sys_mem = { .mem_type = TTM_PL_SYSTEM }; struct ttm_buffer_object *ghost; - struct ttm_resource *sys_res; struct ttm_tt *ttm; int ret; - ret = ttm_resource_alloc(bo, &sys_mem, &sys_res); - if (ret) - return ret; - /* If already idle, no need for ghost object dance. */ ret = ttm_bo_wait(bo, false, true); if (ret != -EBUSY) { @@ -624,14 +618,13 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo) /* See comment below about clearing. */ ret = ttm_tt_create(bo, true); if (ret) - goto error_free_sys_mem; + return ret; } else { ttm_tt_unpopulate(bo->bdev, bo->ttm); if (bo->type == ttm_bo_type_device) ttm_tt_mark_for_clear(bo->ttm); } ttm_resource_free(bo, &bo->resource); - ttm_bo_assign_mem(bo, sys_res); return 0; } @@ -648,7 +641,7 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo) ret = ttm_tt_create(bo, true); swap(bo->ttm, ttm); if (ret) - goto error_free_sys_mem; + return ret; ret = ttm_buffer_object_transfer(bo, &ghost); if (ret) @@ -662,13 +655,9 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo) dma_resv_unlock(&ghost->base._resv); ttm_bo_put(ghost); bo->ttm = ttm; - ttm_bo_assign_mem(bo, sys_res); return 0; error_destroy_tt: ttm_tt_destroy(bo->bdev, ttm); - -error_free_sys_mem: - ttm_resource_free(bo, &sys_res); return ret; } -- 2.36.1.74.g277cf0bc36 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [1/3] drm/i915: audit bo->resource usage 2022-08-24 14:23 [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage Luben Tuikov 2022-08-24 14:23 ` [Intel-gfx] [PATCH 2/3] drm/ttm: stop allocating dummy resources during BO creation Luben Tuikov 2022-08-24 14:23 ` [Intel-gfx] [PATCH 3/3] drm/ttm: stop allocating a dummy resource for pipelined gutting Luben Tuikov @ 2022-08-24 16:21 ` Patchwork 2022-08-24 16:44 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork ` (2 subsequent siblings) 5 siblings, 0 replies; 24+ messages in thread From: Patchwork @ 2022-08-24 16:21 UTC (permalink / raw) To: Luben Tuikov; +Cc: intel-gfx == Series Details == Series: series starting with [1/3] drm/i915: audit bo->resource usage URL : https://patchwork.freedesktop.org/series/107680/ State : warning == Summary == Error: dim sparse failed Sparse version: v0.6.2 Fast mode used, each commit won't be checked separately. +./arch/x86/include/asm/bitops.h:117:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:117:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:117:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:148:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:148:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:148:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:150:9: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:150:9: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:150:9: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:154:26: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:154:26: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:154:26: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:156:16: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:156:16: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:156:16: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:156:9: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:156:9: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:156:9: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:174:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:174:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:174:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:176:9: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:176:9: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:176:9: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:180:35: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:180:35: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:180:35: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:182:16: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:182:16: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:182:16: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:182:9: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:182:9: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:182:9: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:186:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:186:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:186:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:188:9: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:188:9: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:188:9: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:192:35: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:192:35: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:192:35: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:195:16: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:195:16: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:195:16: warning: unreplaced symbol 'oldbit' +./arch/x86/include/asm/bitops.h:195:9: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:195:9: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:195:9: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:223:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:223:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:223:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:225:9: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:225:9: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:225:9: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:66:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:66:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:66:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:92:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:92:1: warning: unreplaced symbol 'return' +./arch/x86/include/asm/bitops.h:92:1: warning: unreplaced symbol 'return' +drivers/gpu/drm/ttm/ttm_bo.c:1142:20: warning: context imbalance in 'ttm_bo_swapout' - unexpected unlock +drivers/gpu/drm/ttm/ttm_bo.c:259:28: warning: context imbalance in 'ttm_bo_cleanup_refs' - unexpected unlock +drivers/gpu/drm/ttm/ttm_bo.c:317:27: warning: context imbalance in 'ttm_bo_delayed_delete' - different lock contexts for basic block +drivers/gpu/drm/ttm/ttm_bo.c:613:5: warning: context imbalance in 'ttm_mem_evict_first' - wrong count at exit +./include/asm-generic/bitops/generic-non-atomic.h:104:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:104:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:104:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:106:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:106:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:106:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:107:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:107:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:107:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:108:9: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:108:9: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:108:9: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:110:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:110:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:110:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:110:14: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:110:14: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:110:14: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:110:20: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:110:20: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:110:20: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:111:17: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:111:17: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:111:17: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:111:23: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:111:23: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:111:23: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:111:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:111:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:111:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:120:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:120:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:120:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:127:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:127:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:127:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:152:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:152:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:152:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:154:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:154:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:154:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:155:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:155:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:155:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:156:9: warning: unreplaced symbol 'val' +./include/asm-generic/bitops/generic-non-atomic.h:156:9: warning: unreplaced symbol 'val' +./include/asm-generic/bitops/generic-non-atomic.h:156:9: warning: unreplaced symbol 'val' +./include/asm-generic/bitops/generic-non-atomic.h:158:19: warning: unreplaced symbol 'val' +./include/asm-generic/bitops/generic-non-atomic.h:158:19: warning: unreplaced symbol 'val' +./include/asm-generic/bitops/generic-non-atomic.h:158:19: warning: unreplaced symbol 'val' +./include/asm-generic/bitops/generic-non-atomic.h:158:25: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:158:25: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:158:25: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:158:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:158:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:158:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:27:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:27:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:27:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:29:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:29:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:29:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:30:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:30:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:30:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:32:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:32:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:32:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:32:16: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:32:16: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:32:16: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:36:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:36:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:36:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:38:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:38:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:38:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:39:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:39:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:39:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:41:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:41:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:41:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:41:16: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:41:16: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:41:16: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:54:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:54:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:54:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:56:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:56:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:56:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:57:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:57:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:57:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:59:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:59:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:59:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:59:15: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:59:15: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:59:15: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:72:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:72:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:72:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:74:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:74:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:74:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:75:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:75:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:75:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:76:9: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:76:9: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:76:9: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:78:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:78:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:78:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:78:14: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:78:14: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:78:14: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:78:20: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:78:20: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:78:20: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:79:17: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:79:17: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:79:17: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:79:23: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:79:23: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:79:23: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:79:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:79:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:79:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:92:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:92:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:92:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:94:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:94:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:94:9: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:95:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:95:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:95:9: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:96:9: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:96:9: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:96:9: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:98:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:98:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:98:10: warning: unreplaced symbol 'p' +./include/asm-generic/bitops/generic-non-atomic.h:98:14: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:98:14: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:98:14: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:98:21: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:98:21: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:98:21: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:99:17: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:99:17: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:99:17: warning: unreplaced symbol 'old' +./include/asm-generic/bitops/generic-non-atomic.h:99:23: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:99:23: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:99:23: warning: unreplaced symbol 'mask' +./include/asm-generic/bitops/generic-non-atomic.h:99:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:99:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/generic-non-atomic.h:99:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:100:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:100:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:100:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:112:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:112:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:112:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:115:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:115:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:115:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:127:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:127:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:127:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:130:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:130:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:130:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:139:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:139:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:139:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:142:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:142:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:142:9: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:26:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:26:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:26:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:42:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:42:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:42:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:58:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:58:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:58:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:97:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:97:1: warning: unreplaced symbol 'return' +./include/asm-generic/bitops/instrumented-non-atomic.h:97:1: warning: unreplaced symbol 'return' ^ permalink raw reply [flat|nested] 24+ messages in thread
* [Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/3] drm/i915: audit bo->resource usage 2022-08-24 14:23 [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage Luben Tuikov ` (2 preceding siblings ...) 2022-08-24 16:21 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [1/3] drm/i915: audit bo->resource usage Patchwork @ 2022-08-24 16:44 ` Patchwork 2022-08-30 7:33 ` [Intel-gfx] [PATCH 1/3] " Christian König 2022-09-01 8:43 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/3] drm/i915: audit bo->resource usage (rev2) Patchwork 5 siblings, 0 replies; 24+ messages in thread From: Patchwork @ 2022-08-24 16:44 UTC (permalink / raw) To: Luben Tuikov; +Cc: intel-gfx [-- Attachment #1: Type: text/plain, Size: 6279 bytes --] == Series Details == Series: series starting with [1/3] drm/i915: audit bo->resource usage URL : https://patchwork.freedesktop.org/series/107680/ State : failure == Summary == CI Bug Log - changes from CI_DRM_12021 -> Patchwork_107680v1 ==================================================== Summary ------- **FAILURE** Serious unknown changes coming with Patchwork_107680v1 absolutely need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_107680v1, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107680v1/index.html Participating hosts (37 -> 37) ------------------------------ Additional (1): fi-kbl-soraka Missing (1): fi-hsw-4770 Possible new issues ------------------- Here are the unknown changes that may have been introduced in Patchwork_107680v1: ### IGT changes ### #### Possible regressions #### * igt@i915_module_load@load: - bat-dg1-6: [PASS][1] -> [FAIL][2] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12021/bat-dg1-6/igt@i915_module_load@load.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107680v1/bat-dg1-6/igt@i915_module_load@load.html * igt@i915_selftest@live@memory_region: - bat-dg1-5: [PASS][3] -> [DMESG-FAIL][4] [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12021/bat-dg1-5/igt@i915_selftest@live@memory_region.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107680v1/bat-dg1-5/igt@i915_selftest@live@memory_region.html #### Suppressed #### The following results come from untrusted machines, tests, or statuses. They do not affect the overall result. * igt@core_hotunplug@unbind-rebind: - {bat-dg2-9}: [PASS][5] -> [{ABORT}][6] [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12021/bat-dg2-9/igt@core_hotunplug@unbind-rebind.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107680v1/bat-dg2-9/igt@core_hotunplug@unbind-rebind.html - {bat-dg2-11}: [PASS][7] -> [{ABORT}][8] [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12021/bat-dg2-11/igt@core_hotunplug@unbind-rebind.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107680v1/bat-dg2-11/igt@core_hotunplug@unbind-rebind.html * igt@gem_lmem_swapping@basic: - {bat-dg2-8}: NOTRUN -> [SKIP][9] [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107680v1/bat-dg2-8/igt@gem_lmem_swapping@basic.html * igt@i915_module_load@load: - {bat-dg2-10}: [PASS][10] -> [INCOMPLETE][11] [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12021/bat-dg2-10/igt@i915_module_load@load.html [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107680v1/bat-dg2-10/igt@i915_module_load@load.html Known issues ------------ Here are the changes found in Patchwork_107680v1 that come from known issues: ### IGT changes ### #### Issues hit #### * igt@i915_suspend@basic-s3-without-i915: - fi-pnv-d510: NOTRUN -> [INCOMPLETE][12] ([i915#6598] / [i915#6601]) [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107680v1/fi-pnv-d510/igt@i915_suspend@basic-s3-without-i915.html * igt@runner@aborted: - fi-kbl-soraka: NOTRUN -> [FAIL][13] ([i915#6219]) [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107680v1/fi-kbl-soraka/igt@runner@aborted.html #### Possible fixes #### * igt@gem_exec_suspend@basic-s3@smem: - {bat-rplp-1}: [DMESG-WARN][14] ([i915#2867]) -> [PASS][15] [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12021/bat-rplp-1/igt@gem_exec_suspend@basic-s3@smem.html [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107680v1/bat-rplp-1/igt@gem_exec_suspend@basic-s3@smem.html * igt@i915_selftest@live@requests: - fi-pnv-d510: [DMESG-FAIL][16] ([i915#4528]) -> [PASS][17] [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12021/fi-pnv-d510/igt@i915_selftest@live@requests.html [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107680v1/fi-pnv-d510/igt@i915_selftest@live@requests.html #### Warnings #### * igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions: - fi-bsw-kefka: [FAIL][18] -> [FAIL][19] ([i915#6298]) [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12021/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107680v1/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html {name}: This element is suppressed. This means it is ignored when computing the status of the difference (SUCCESS, WARNING, or FAILURE). [i915#2867]: https://gitlab.freedesktop.org/drm/intel/issues/2867 [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312 [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528 [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983 [i915#6219]: https://gitlab.freedesktop.org/drm/intel/issues/6219 [i915#6257]: https://gitlab.freedesktop.org/drm/intel/issues/6257 [i915#6298]: https://gitlab.freedesktop.org/drm/intel/issues/6298 [i915#6380]: https://gitlab.freedesktop.org/drm/intel/issues/6380 [i915#6598]: https://gitlab.freedesktop.org/drm/intel/issues/6598 [i915#6601]: https://gitlab.freedesktop.org/drm/intel/issues/6601 Build changes ------------- * Linux: CI_DRM_12021 -> Patchwork_107680v1 CI-20190529: 20190529 CI_DRM_12021: 078959b4819e4e0ab8cf2965e7bfd98278c0b35d @ git://anongit.freedesktop.org/gfx-ci/linux IGT_6636: 1298b5f0e1b3e010657ffba41d2e775fab028e08 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git Patchwork_107680v1: 078959b4819e4e0ab8cf2965e7bfd98278c0b35d @ git://anongit.freedesktop.org/gfx-ci/linux ### Linux commits d89370bd4ce1 drm/ttm: stop allocating a dummy resource for pipelined gutting 5700995ed3d0 drm/ttm: stop allocating dummy resources during BO creation 2f18a077f77a drm/i915: audit bo->resource usage == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107680v1/index.html [-- Attachment #2: Type: text/html, Size: 6958 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-08-24 14:23 [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage Luben Tuikov ` (3 preceding siblings ...) 2022-08-24 16:44 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork @ 2022-08-30 7:33 ` Christian König 2022-08-30 10:45 ` Matthew Auld 2022-09-01 8:43 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/3] drm/i915: audit bo->resource usage (rev2) Patchwork 5 siblings, 1 reply; 24+ messages in thread From: Christian König @ 2022-08-30 7:33 UTC (permalink / raw) To: dri-devel, Intel Graphics, Matthew Auld, Thomas Hellström Cc: Luben Tuikov Hi guys, can we get an rb/acked-by for this i915 change? Basically we are just making sure that the driver doesn't crash when bo->resource is NULL and a bo doesn't have any backing store assigned to it. The Intel CI seems to be happy with this change, so I'm pretty sure it is correct. Thanks, Christian. Am 24.08.22 um 16:23 schrieb Luben Tuikov: > From: Christian König <christian.koenig@amd.com> > > Make sure we can at least move and alloc TT objects without backing store. > > Signed-off-by: Christian König <christian.koenig@amd.com> > --- > drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 6 ++---- > drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 2 +- > 2 files changed, 3 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > index bc9c432edffe03..45ce2d1f754cc4 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > @@ -271,8 +271,6 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, > { > struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915), > bdev); > - struct ttm_resource_manager *man = > - ttm_manager_type(bo->bdev, bo->resource->mem_type); > struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); > unsigned long ccs_pages = 0; > enum ttm_caching caching; > @@ -286,8 +284,8 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, > if (!i915_tt) > return NULL; > > - if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && > - man->use_tt) > + if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && bo->resource && > + ttm_manager_type(bo->bdev, bo->resource->mem_type)->use_tt) > page_flags |= TTM_TT_FLAG_ZERO_ALLOC; > > caching = i915_ttm_select_tt_caching(obj); > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c > index 9a7e50534b84bb..c420d1ab605b6f 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c > @@ -560,7 +560,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict, > bool clear; > int ret; > > - if (GEM_WARN_ON(!obj)) { > + if (GEM_WARN_ON(!obj) || !bo->resource) { > ttm_bo_move_null(bo, dst_mem); > return 0; > } ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-08-30 7:33 ` [Intel-gfx] [PATCH 1/3] " Christian König @ 2022-08-30 10:45 ` Matthew Auld 2022-08-31 8:16 ` Christian König 0 siblings, 1 reply; 24+ messages in thread From: Matthew Auld @ 2022-08-30 10:45 UTC (permalink / raw) To: Christian König, dri-devel, Intel Graphics, Thomas Hellström Cc: Luben Tuikov Hi, On 30/08/2022 08:33, Christian König wrote: > Hi guys, > > can we get an rb/acked-by for this i915 change? > > Basically we are just making sure that the driver doesn't crash when > bo->resource is NULL and a bo doesn't have any backing store assigned to > it. > > The Intel CI seems to be happy with this change, so I'm pretty sure it > is correct. It looks like DG2/DG1 (which happen to use TTM here) are no longer loading the module: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107680v1/index.html? According to the logs the firmware is failing to load, so perhaps related to I915_BO_ALLOC_CPU_CLEAR, since that is one of the rare users. See below. > > Thanks, > Christian. > > Am 24.08.22 um 16:23 schrieb Luben Tuikov: >> From: Christian König <christian.koenig@amd.com> >> >> Make sure we can at least move and alloc TT objects without backing >> store. >> >> Signed-off-by: Christian König <christian.koenig@amd.com> >> --- >> drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 6 ++---- >> drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 2 +- >> 2 files changed, 3 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >> index bc9c432edffe03..45ce2d1f754cc4 100644 >> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >> @@ -271,8 +271,6 @@ static struct ttm_tt *i915_ttm_tt_create(struct >> ttm_buffer_object *bo, >> { >> struct drm_i915_private *i915 = container_of(bo->bdev, >> typeof(*i915), >> bdev); >> - struct ttm_resource_manager *man = >> - ttm_manager_type(bo->bdev, bo->resource->mem_type); >> struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); >> unsigned long ccs_pages = 0; >> enum ttm_caching caching; >> @@ -286,8 +284,8 @@ static struct ttm_tt *i915_ttm_tt_create(struct >> ttm_buffer_object *bo, >> if (!i915_tt) >> return NULL; >> - if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && >> - man->use_tt) >> + if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && bo->resource && >> + ttm_manager_type(bo->bdev, bo->resource->mem_type)->use_tt) >> page_flags |= TTM_TT_FLAG_ZERO_ALLOC; AFAICT i915 was massively relying on everything starting out in a "dummy" system memory resource (ttm_tt), where it then later "transitions" to the real resource. And if we need to clear the memory we rely on ZERO_ALLOC being set before calling into the i915_ttm_move() callback (even when allocating local-memory). For ttm_bo_type_device objects (userspace stuff) it looks like this was previously handled by ttm_bo_validate() always doing: ret = ttm_tt_create(bo, true); /* clear = true */ Which we would always hit since the resource was always "compatible" for the dummy case. But it looks like this is no longer even called, since we can now call into ttm_move with bo->resource == NULL, which still calls tt_create eventually, which not always with clear = true. All other objects are then ttm_bo_type_kernel so we don't care about clearing, except in the rare case of ALLOC_CPU_CLEAR, which was handled as per above in i915_ttm_tt_create(). But I think here bo->resource is NULL at the start when first creating the object, which will skip setting ZERO_ALLOC, which might explain the CI failure. The other possible concern (not sure since CI didn't get that far) is around ttm_bo_pipeline_gutting(), which now leaves bo->resource = NULL. It looks like i915_ttm_shrink() was relying on that to unpopulate the ttm_tt. When later calling ttm_bo_validate(), i915_ttm_move() would see the SWAPPED flag set on the ttm_tt , re-populate it and then potentially move it back to local-memory. What are your thoughts here? Also sorry if i915 is making a bit of mess here. >> caching = i915_ttm_select_tt_caching(obj); >> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >> index 9a7e50534b84bb..c420d1ab605b6f 100644 >> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >> @@ -560,7 +560,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, >> bool evict, >> bool clear; >> int ret; >> - if (GEM_WARN_ON(!obj)) { >> + if (GEM_WARN_ON(!obj) || !bo->resource) { >> ttm_bo_move_null(bo, dst_mem); >> return 0; >> } > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-08-30 10:45 ` Matthew Auld @ 2022-08-31 8:16 ` Christian König 2022-08-31 9:26 ` Matthew Auld 0 siblings, 1 reply; 24+ messages in thread From: Christian König @ 2022-08-31 8:16 UTC (permalink / raw) To: Matthew Auld, dri-devel, Intel Graphics, Thomas Hellström Cc: Luben Tuikov Hi Matthew, Am 30.08.22 um 12:45 schrieb Matthew Auld: > Hi, > > On 30/08/2022 08:33, Christian König wrote: >> Hi guys, >> >> can we get an rb/acked-by for this i915 change? >> >> Basically we are just making sure that the driver doesn't crash when >> bo->resource is NULL and a bo doesn't have any backing store assigned >> to it. >> >> The Intel CI seems to be happy with this change, so I'm pretty sure >> it is correct. > > It looks like DG2/DG1 (which happen to use TTM here) are no longer > loading the module: > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fintel-gfx-ci.01.org%2Ftree%2Fdrm-tip%2FPatchwork_107680v1%2Findex.html&data=05%7C01%7Cchristian.koenig%40amd.com%7Caa9bdb0e31064859568708da8a74b899%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637974531164663116%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=UW8BEnIFXHawhAfLUcknGmE88g2wwAiTLAQ3Y5v1pFA%3D&reserved=0? > > > According to the logs the firmware is failing to load, so perhaps > related to I915_BO_ALLOC_CPU_CLEAR, since that is one of the rare > users. See below. > >> >> Thanks, >> Christian. >> >> Am 24.08.22 um 16:23 schrieb Luben Tuikov: >>> From: Christian König <christian.koenig@amd.com> >>> >>> Make sure we can at least move and alloc TT objects without backing >>> store. >>> >>> Signed-off-by: Christian König <christian.koenig@amd.com> >>> --- >>> drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 6 ++---- >>> drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 2 +- >>> 2 files changed, 3 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>> index bc9c432edffe03..45ce2d1f754cc4 100644 >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>> @@ -271,8 +271,6 @@ static struct ttm_tt *i915_ttm_tt_create(struct >>> ttm_buffer_object *bo, >>> { >>> struct drm_i915_private *i915 = container_of(bo->bdev, >>> typeof(*i915), >>> bdev); >>> - struct ttm_resource_manager *man = >>> - ttm_manager_type(bo->bdev, bo->resource->mem_type); >>> struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); >>> unsigned long ccs_pages = 0; >>> enum ttm_caching caching; >>> @@ -286,8 +284,8 @@ static struct ttm_tt *i915_ttm_tt_create(struct >>> ttm_buffer_object *bo, >>> if (!i915_tt) >>> return NULL; >>> - if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && >>> - man->use_tt) >>> + if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && bo->resource && >>> + ttm_manager_type(bo->bdev, bo->resource->mem_type)->use_tt) >>> page_flags |= TTM_TT_FLAG_ZERO_ALLOC; > > AFAICT i915 was massively relying on everything starting out in a > "dummy" system memory resource (ttm_tt), where it then later > "transitions" to the real resource. And if we need to clear the memory > we rely on ZERO_ALLOC being set before calling into the > i915_ttm_move() callback (even when allocating local-memory). > > For ttm_bo_type_device objects (userspace stuff) it looks like this > was previously handled by ttm_bo_validate() always doing: > > ret = ttm_tt_create(bo, true); /* clear = true */ > > Which we would always hit since the resource was always "compatible" > for the dummy case. But it looks like this is no longer even called, > since we can now call into ttm_move with bo->resource == NULL, which > still calls tt_create eventually, which not always with clear = true. > > All other objects are then ttm_bo_type_kernel so we don't care about > clearing, except in the rare case of ALLOC_CPU_CLEAR, which was > handled as per above in i915_ttm_tt_create(). But I think here > bo->resource is NULL at the start when first creating the object, > which will skip setting ZERO_ALLOC, which might explain the CI failure. > > The other possible concern (not sure since CI didn't get that far) is > around ttm_bo_pipeline_gutting(), which now leaves bo->resource = > NULL. It looks like i915_ttm_shrink() was relying on that to > unpopulate the ttm_tt. When later calling ttm_bo_validate(), > i915_ttm_move() would see the SWAPPED flag set on the ttm_tt , > re-populate it and then potentially move it back to local-memory. > > What are your thoughts here? Also sorry if i915 is making a bit of > mess here. First of all thanks a lot for taking a look. We previously got reports about strange crashes with this patch, but couldn't really reproduce them (even not by sending them out again). This explains that a bit. The simplest solution would be to just change the && into a ||, e.g. when previously either no resource is allocated or the resource requires to use a tt we clear it. That should give you the same behavior as before, but I agree that this is a bit messy. I've been considering to replacing the ttm_bo_type with a bunch of behavior flags for a bo. I'm hoping that this will clean things up a bit. Regards, Christian. > >>> caching = i915_ttm_select_tt_caching(obj); >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>> index 9a7e50534b84bb..c420d1ab605b6f 100644 >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>> @@ -560,7 +560,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, >>> bool evict, >>> bool clear; >>> int ret; >>> - if (GEM_WARN_ON(!obj)) { >>> + if (GEM_WARN_ON(!obj) || !bo->resource) { >>> ttm_bo_move_null(bo, dst_mem); >>> return 0; >>> } >> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-08-31 8:16 ` Christian König @ 2022-08-31 9:26 ` Matthew Auld 2022-08-31 9:38 ` Christian König 0 siblings, 1 reply; 24+ messages in thread From: Matthew Auld @ 2022-08-31 9:26 UTC (permalink / raw) To: Christian König, dri-devel, Intel Graphics, Thomas Hellström Cc: Luben Tuikov On 31/08/2022 09:16, Christian König wrote: > Hi Matthew, > > Am 30.08.22 um 12:45 schrieb Matthew Auld: >> Hi, >> >> On 30/08/2022 08:33, Christian König wrote: >>> Hi guys, >>> >>> can we get an rb/acked-by for this i915 change? >>> >>> Basically we are just making sure that the driver doesn't crash when >>> bo->resource is NULL and a bo doesn't have any backing store assigned >>> to it. >>> >>> The Intel CI seems to be happy with this change, so I'm pretty sure >>> it is correct. >> >> It looks like DG2/DG1 (which happen to use TTM here) are no longer >> loading the module: >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fintel-gfx-ci.01.org%2Ftree%2Fdrm-tip%2FPatchwork_107680v1%2Findex.html&data=05%7C01%7Cchristian.koenig%40amd.com%7Caa9bdb0e31064859568708da8a74b899%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637974531164663116%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=UW8BEnIFXHawhAfLUcknGmE88g2wwAiTLAQ3Y5v1pFA%3D&reserved=0? >> >> According to the logs the firmware is failing to load, so perhaps >> related to I915_BO_ALLOC_CPU_CLEAR, since that is one of the rare >> users. See below. >> >>> >>> Thanks, >>> Christian. >>> >>> Am 24.08.22 um 16:23 schrieb Luben Tuikov: >>>> From: Christian König <christian.koenig@amd.com> >>>> >>>> Make sure we can at least move and alloc TT objects without backing >>>> store. >>>> >>>> Signed-off-by: Christian König <christian.koenig@amd.com> >>>> --- >>>> drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 6 ++---- >>>> drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 2 +- >>>> 2 files changed, 3 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>> index bc9c432edffe03..45ce2d1f754cc4 100644 >>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>> @@ -271,8 +271,6 @@ static struct ttm_tt *i915_ttm_tt_create(struct >>>> ttm_buffer_object *bo, >>>> { >>>> struct drm_i915_private *i915 = container_of(bo->bdev, >>>> typeof(*i915), >>>> bdev); >>>> - struct ttm_resource_manager *man = >>>> - ttm_manager_type(bo->bdev, bo->resource->mem_type); >>>> struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); >>>> unsigned long ccs_pages = 0; >>>> enum ttm_caching caching; >>>> @@ -286,8 +284,8 @@ static struct ttm_tt *i915_ttm_tt_create(struct >>>> ttm_buffer_object *bo, >>>> if (!i915_tt) >>>> return NULL; >>>> - if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && >>>> - man->use_tt) >>>> + if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && bo->resource && >>>> + ttm_manager_type(bo->bdev, bo->resource->mem_type)->use_tt) >>>> page_flags |= TTM_TT_FLAG_ZERO_ALLOC; >> >> AFAICT i915 was massively relying on everything starting out in a >> "dummy" system memory resource (ttm_tt), where it then later >> "transitions" to the real resource. And if we need to clear the memory >> we rely on ZERO_ALLOC being set before calling into the >> i915_ttm_move() callback (even when allocating local-memory). >> >> For ttm_bo_type_device objects (userspace stuff) it looks like this >> was previously handled by ttm_bo_validate() always doing: >> >> ret = ttm_tt_create(bo, true); /* clear = true */ >> >> Which we would always hit since the resource was always "compatible" >> for the dummy case. But it looks like this is no longer even called, >> since we can now call into ttm_move with bo->resource == NULL, which >> still calls tt_create eventually, which not always with clear = true. >> >> All other objects are then ttm_bo_type_kernel so we don't care about >> clearing, except in the rare case of ALLOC_CPU_CLEAR, which was >> handled as per above in i915_ttm_tt_create(). But I think here >> bo->resource is NULL at the start when first creating the object, >> which will skip setting ZERO_ALLOC, which might explain the CI failure. >> >> The other possible concern (not sure since CI didn't get that far) is >> around ttm_bo_pipeline_gutting(), which now leaves bo->resource = >> NULL. It looks like i915_ttm_shrink() was relying on that to >> unpopulate the ttm_tt. When later calling ttm_bo_validate(), >> i915_ttm_move() would see the SWAPPED flag set on the ttm_tt , >> re-populate it and then potentially move it back to local-memory. >> >> What are your thoughts here? Also sorry if i915 is making a bit of >> mess here. > > First of all thanks a lot for taking a look. We previously got reports > about strange crashes with this patch, but couldn't really reproduce > them (even not by sending them out again). This explains that a bit. > > The simplest solution would be to just change the && into a ||, e.g. > when previously either no resource is allocated or the resource requires > to use a tt we clear it. > > That should give you the same behavior as before, but I agree that this > is a bit messy. Yeah, that should do the trick. That hopefully just leaves i915_ttm_shrink(), which is swapping out shmem ttm_tt and is calling ttm_bo_validate() with empty placements to force the pipeline-gutting path, which importantly unpopulates the ttm_tt for us (since ttm_tt_unpopulate is not exported it seems). But AFAICT it looks like that will now also nuke the bo->resource, instead of just leaving it in system memory. My assumption is that when later calling ttm_bo_validate(), it will just do the bo_move_null() in i915_ttm_move(), instead of re-populating the ttm_tt and then potentially copying it back to local-memory? > > I've been considering to replacing the ttm_bo_type with a bunch of > behavior flags for a bo. I'm hoping that this will clean things up a bit. > > Regards, > Christian. > >> >>>> caching = i915_ttm_select_tt_caching(obj); >>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>> index 9a7e50534b84bb..c420d1ab605b6f 100644 >>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>> @@ -560,7 +560,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, >>>> bool evict, >>>> bool clear; >>>> int ret; >>>> - if (GEM_WARN_ON(!obj)) { >>>> + if (GEM_WARN_ON(!obj) || !bo->resource) { >>>> ttm_bo_move_null(bo, dst_mem); >>>> return 0; >>>> } >>> > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-08-31 9:26 ` Matthew Auld @ 2022-08-31 9:38 ` Christian König 2022-08-31 10:37 ` Matthew Auld 0 siblings, 1 reply; 24+ messages in thread From: Christian König @ 2022-08-31 9:38 UTC (permalink / raw) To: Matthew Auld, dri-devel, Intel Graphics, Thomas Hellström Cc: Luben Tuikov Am 31.08.22 um 11:26 schrieb Matthew Auld: > On 31/08/2022 09:16, Christian König wrote: >> Hi Matthew, >> >> Am 30.08.22 um 12:45 schrieb Matthew Auld: >>> Hi, >>> >>> On 30/08/2022 08:33, Christian König wrote: >>>> Hi guys, >>>> >>>> can we get an rb/acked-by for this i915 change? >>>> >>>> Basically we are just making sure that the driver doesn't crash >>>> when bo->resource is NULL and a bo doesn't have any backing store >>>> assigned to it. >>>> >>>> The Intel CI seems to be happy with this change, so I'm pretty sure >>>> it is correct. >>> >>> It looks like DG2/DG1 (which happen to use TTM here) are no longer >>> loading the module: >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fintel-gfx-ci.01.org%2Ftree%2Fdrm-tip%2FPatchwork_107680v1%2Findex.html&data=05%7C01%7Cchristian.koenig%40amd.com%7Ccaca567b3279492450fd08da8b32e598%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637975347967354305%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=apanfOjzSWD2vduINzr2j6F2DiIBC93hLRnnGJcGQ5o%3D&reserved=0? >>> >>> >>> According to the logs the firmware is failing to load, so perhaps >>> related to I915_BO_ALLOC_CPU_CLEAR, since that is one of the rare >>> users. See below. >>> >>>> >>>> Thanks, >>>> Christian. >>>> >>>> Am 24.08.22 um 16:23 schrieb Luben Tuikov: >>>>> From: Christian König <christian.koenig@amd.com> >>>>> >>>>> Make sure we can at least move and alloc TT objects without >>>>> backing store. >>>>> >>>>> Signed-off-by: Christian König <christian.koenig@amd.com> >>>>> --- >>>>> drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 6 ++---- >>>>> drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 2 +- >>>>> 2 files changed, 3 insertions(+), 5 deletions(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>>> index bc9c432edffe03..45ce2d1f754cc4 100644 >>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>>> @@ -271,8 +271,6 @@ static struct ttm_tt >>>>> *i915_ttm_tt_create(struct ttm_buffer_object *bo, >>>>> { >>>>> struct drm_i915_private *i915 = container_of(bo->bdev, >>>>> typeof(*i915), >>>>> bdev); >>>>> - struct ttm_resource_manager *man = >>>>> - ttm_manager_type(bo->bdev, bo->resource->mem_type); >>>>> struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); >>>>> unsigned long ccs_pages = 0; >>>>> enum ttm_caching caching; >>>>> @@ -286,8 +284,8 @@ static struct ttm_tt >>>>> *i915_ttm_tt_create(struct ttm_buffer_object *bo, >>>>> if (!i915_tt) >>>>> return NULL; >>>>> - if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && >>>>> - man->use_tt) >>>>> + if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && bo->resource && >>>>> + ttm_manager_type(bo->bdev, bo->resource->mem_type)->use_tt) >>>>> page_flags |= TTM_TT_FLAG_ZERO_ALLOC; >>> >>> AFAICT i915 was massively relying on everything starting out in a >>> "dummy" system memory resource (ttm_tt), where it then later >>> "transitions" to the real resource. And if we need to clear the >>> memory we rely on ZERO_ALLOC being set before calling into the >>> i915_ttm_move() callback (even when allocating local-memory). >>> >>> For ttm_bo_type_device objects (userspace stuff) it looks like this >>> was previously handled by ttm_bo_validate() always doing: >>> >>> ret = ttm_tt_create(bo, true); /* clear = true */ >>> >>> Which we would always hit since the resource was always "compatible" >>> for the dummy case. But it looks like this is no longer even called, >>> since we can now call into ttm_move with bo->resource == NULL, which >>> still calls tt_create eventually, which not always with clear = true. >>> >>> All other objects are then ttm_bo_type_kernel so we don't care about >>> clearing, except in the rare case of ALLOC_CPU_CLEAR, which was >>> handled as per above in i915_ttm_tt_create(). But I think here >>> bo->resource is NULL at the start when first creating the object, >>> which will skip setting ZERO_ALLOC, which might explain the CI failure. >>> >>> The other possible concern (not sure since CI didn't get that far) >>> is around ttm_bo_pipeline_gutting(), which now leaves bo->resource = >>> NULL. It looks like i915_ttm_shrink() was relying on that to >>> unpopulate the ttm_tt. When later calling ttm_bo_validate(), >>> i915_ttm_move() would see the SWAPPED flag set on the ttm_tt , >>> re-populate it and then potentially move it back to local-memory. >>> >>> What are your thoughts here? Also sorry if i915 is making a bit of >>> mess here. >> >> First of all thanks a lot for taking a look. We previously got >> reports about strange crashes with this patch, but couldn't really >> reproduce them (even not by sending them out again). This explains >> that a bit. >> >> The simplest solution would be to just change the && into a ||, e.g. >> when previously either no resource is allocated or the resource >> requires to use a tt we clear it. >> >> That should give you the same behavior as before, but I agree that >> this is a bit messy. > > Yeah, that should do the trick. > > That hopefully just leaves i915_ttm_shrink(), which is swapping out > shmem ttm_tt and is calling ttm_bo_validate() with empty placements to > force the pipeline-gutting path, which importantly unpopulates the > ttm_tt for us (since ttm_tt_unpopulate is not exported it seems). But > AFAICT it looks like that will now also nuke the bo->resource, instead > of just leaving it in system memory. My assumption is that when later > calling ttm_bo_validate(), it will just do the bo_move_null() in > i915_ttm_move(), instead of re-populating the ttm_tt and then > potentially copying it back to local-memory? Well you do ttm_bo_validate() with something like GTT domain, don't you? This should result in re-populating the tt object, but I'm not 100% sure if that really works as expected. Thanks, Christian. > >> >> I've been considering to replacing the ttm_bo_type with a bunch of >> behavior flags for a bo. I'm hoping that this will clean things up a >> bit. >> >> Regards, >> Christian. >> >>> >>>>> caching = i915_ttm_select_tt_caching(obj); >>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>> index 9a7e50534b84bb..c420d1ab605b6f 100644 >>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>> @@ -560,7 +560,7 @@ int i915_ttm_move(struct ttm_buffer_object >>>>> *bo, bool evict, >>>>> bool clear; >>>>> int ret; >>>>> - if (GEM_WARN_ON(!obj)) { >>>>> + if (GEM_WARN_ON(!obj) || !bo->resource) { >>>>> ttm_bo_move_null(bo, dst_mem); >>>>> return 0; >>>>> } >>>> >> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-08-31 9:38 ` Christian König @ 2022-08-31 10:37 ` Matthew Auld 2022-08-31 11:03 ` Christian König 0 siblings, 1 reply; 24+ messages in thread From: Matthew Auld @ 2022-08-31 10:37 UTC (permalink / raw) To: Christian König, dri-devel, Intel Graphics, Thomas Hellström Cc: Luben Tuikov On 31/08/2022 10:38, Christian König wrote: > Am 31.08.22 um 11:26 schrieb Matthew Auld: >> On 31/08/2022 09:16, Christian König wrote: >>> Hi Matthew, >>> >>> Am 30.08.22 um 12:45 schrieb Matthew Auld: >>>> Hi, >>>> >>>> On 30/08/2022 08:33, Christian König wrote: >>>>> Hi guys, >>>>> >>>>> can we get an rb/acked-by for this i915 change? >>>>> >>>>> Basically we are just making sure that the driver doesn't crash >>>>> when bo->resource is NULL and a bo doesn't have any backing store >>>>> assigned to it. >>>>> >>>>> The Intel CI seems to be happy with this change, so I'm pretty sure >>>>> it is correct. >>>> >>>> It looks like DG2/DG1 (which happen to use TTM here) are no longer >>>> loading the module: >>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fintel-gfx-ci.01.org%2Ftree%2Fdrm-tip%2FPatchwork_107680v1%2Findex.html&data=05%7C01%7Cchristian.koenig%40amd.com%7Ccaca567b3279492450fd08da8b32e598%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637975347967354305%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=apanfOjzSWD2vduINzr2j6F2DiIBC93hLRnnGJcGQ5o%3D&reserved=0? >>>> >>>> According to the logs the firmware is failing to load, so perhaps >>>> related to I915_BO_ALLOC_CPU_CLEAR, since that is one of the rare >>>> users. See below. >>>> >>>>> >>>>> Thanks, >>>>> Christian. >>>>> >>>>> Am 24.08.22 um 16:23 schrieb Luben Tuikov: >>>>>> From: Christian König <christian.koenig@amd.com> >>>>>> >>>>>> Make sure we can at least move and alloc TT objects without >>>>>> backing store. >>>>>> >>>>>> Signed-off-by: Christian König <christian.koenig@amd.com> >>>>>> --- >>>>>> drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 6 ++---- >>>>>> drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 2 +- >>>>>> 2 files changed, 3 insertions(+), 5 deletions(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>>>> index bc9c432edffe03..45ce2d1f754cc4 100644 >>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>>>> @@ -271,8 +271,6 @@ static struct ttm_tt >>>>>> *i915_ttm_tt_create(struct ttm_buffer_object *bo, >>>>>> { >>>>>> struct drm_i915_private *i915 = container_of(bo->bdev, >>>>>> typeof(*i915), >>>>>> bdev); >>>>>> - struct ttm_resource_manager *man = >>>>>> - ttm_manager_type(bo->bdev, bo->resource->mem_type); >>>>>> struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); >>>>>> unsigned long ccs_pages = 0; >>>>>> enum ttm_caching caching; >>>>>> @@ -286,8 +284,8 @@ static struct ttm_tt >>>>>> *i915_ttm_tt_create(struct ttm_buffer_object *bo, >>>>>> if (!i915_tt) >>>>>> return NULL; >>>>>> - if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && >>>>>> - man->use_tt) >>>>>> + if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && bo->resource && >>>>>> + ttm_manager_type(bo->bdev, bo->resource->mem_type)->use_tt) >>>>>> page_flags |= TTM_TT_FLAG_ZERO_ALLOC; >>>> >>>> AFAICT i915 was massively relying on everything starting out in a >>>> "dummy" system memory resource (ttm_tt), where it then later >>>> "transitions" to the real resource. And if we need to clear the >>>> memory we rely on ZERO_ALLOC being set before calling into the >>>> i915_ttm_move() callback (even when allocating local-memory). >>>> >>>> For ttm_bo_type_device objects (userspace stuff) it looks like this >>>> was previously handled by ttm_bo_validate() always doing: >>>> >>>> ret = ttm_tt_create(bo, true); /* clear = true */ >>>> >>>> Which we would always hit since the resource was always "compatible" >>>> for the dummy case. But it looks like this is no longer even called, >>>> since we can now call into ttm_move with bo->resource == NULL, which >>>> still calls tt_create eventually, which not always with clear = true. >>>> >>>> All other objects are then ttm_bo_type_kernel so we don't care about >>>> clearing, except in the rare case of ALLOC_CPU_CLEAR, which was >>>> handled as per above in i915_ttm_tt_create(). But I think here >>>> bo->resource is NULL at the start when first creating the object, >>>> which will skip setting ZERO_ALLOC, which might explain the CI failure. >>>> >>>> The other possible concern (not sure since CI didn't get that far) >>>> is around ttm_bo_pipeline_gutting(), which now leaves bo->resource = >>>> NULL. It looks like i915_ttm_shrink() was relying on that to >>>> unpopulate the ttm_tt. When later calling ttm_bo_validate(), >>>> i915_ttm_move() would see the SWAPPED flag set on the ttm_tt , >>>> re-populate it and then potentially move it back to local-memory. >>>> >>>> What are your thoughts here? Also sorry if i915 is making a bit of >>>> mess here. >>> >>> First of all thanks a lot for taking a look. We previously got >>> reports about strange crashes with this patch, but couldn't really >>> reproduce them (even not by sending them out again). This explains >>> that a bit. >>> >>> The simplest solution would be to just change the && into a ||, e.g. >>> when previously either no resource is allocated or the resource >>> requires to use a tt we clear it. >>> >>> That should give you the same behavior as before, but I agree that >>> this is a bit messy. >> >> Yeah, that should do the trick. >> >> That hopefully just leaves i915_ttm_shrink(), which is swapping out >> shmem ttm_tt and is calling ttm_bo_validate() with empty placements to >> force the pipeline-gutting path, which importantly unpopulates the >> ttm_tt for us (since ttm_tt_unpopulate is not exported it seems). But >> AFAICT it looks like that will now also nuke the bo->resource, instead >> of just leaving it in system memory. My assumption is that when later >> calling ttm_bo_validate(), it will just do the bo_move_null() in >> i915_ttm_move(), instead of re-populating the ttm_tt and then >> potentially copying it back to local-memory? > > Well you do ttm_bo_validate() with something like GTT domain, don't you? > This should result in re-populating the tt object, but I'm not 100% sure > if that really works as expected. AFAIK for domains we either have system memory (which uses ttm_tt and might be shmem underneath) or local-memory. But perhaps i915 is doing something wrong here, or abusing TTM in some way. I'm not sure tbh. Anyway, I think we have two cases here: - We have some system memory only object. After doing i915_ttm_shrink(), bo->resource is now NULL. We then call ttm_bo_validate() at some later point, but here we don't need to copy anything, but it also looks like ttm_bo_handle_move_mem() won't populate the ttm_tt or us either, since mem_type == TTM_PL_SYSTEM. It looks like i915_ttm_move() was taking care of this, but now we just call ttm_bo_move_null(). - We have a local-memory only object, which was evicted to shmem, and then swapped out by the shrinker like above. The bo->resource is NULL. However this time when calling ttm_bo_validate() we need to actually do a copy in i915_ttm_move(), as well as re-populate the ttm_tt. i915_ttm_move() was taking care of this, but now we just call ttm_bo_move_null(). Perhaps i915 is doing something wrong in the above two cases? > > Thanks, > Christian. > >> >>> >>> I've been considering to replacing the ttm_bo_type with a bunch of >>> behavior flags for a bo. I'm hoping that this will clean things up a >>> bit. >>> >>> Regards, >>> Christian. >>> >>>> >>>>>> caching = i915_ttm_select_tt_caching(obj); >>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>> index 9a7e50534b84bb..c420d1ab605b6f 100644 >>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>> @@ -560,7 +560,7 @@ int i915_ttm_move(struct ttm_buffer_object >>>>>> *bo, bool evict, >>>>>> bool clear; >>>>>> int ret; >>>>>> - if (GEM_WARN_ON(!obj)) { >>>>>> + if (GEM_WARN_ON(!obj) || !bo->resource) { >>>>>> ttm_bo_move_null(bo, dst_mem); >>>>>> return 0; >>>>>> } >>>>> >>> > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-08-31 10:37 ` Matthew Auld @ 2022-08-31 11:03 ` Christian König 2022-08-31 12:06 ` Matthew Auld 0 siblings, 1 reply; 24+ messages in thread From: Christian König @ 2022-08-31 11:03 UTC (permalink / raw) To: Matthew Auld, dri-devel, Intel Graphics, Thomas Hellström Cc: Luben Tuikov Am 31.08.22 um 12:37 schrieb Matthew Auld: > [SNIP] >>> >>> That hopefully just leaves i915_ttm_shrink(), which is swapping out >>> shmem ttm_tt and is calling ttm_bo_validate() with empty placements >>> to force the pipeline-gutting path, which importantly unpopulates >>> the ttm_tt for us (since ttm_tt_unpopulate is not exported it >>> seems). But AFAICT it looks like that will now also nuke the >>> bo->resource, instead of just leaving it in system memory. My >>> assumption is that when later calling ttm_bo_validate(), it will >>> just do the bo_move_null() in i915_ttm_move(), instead of >>> re-populating the ttm_tt and then potentially copying it back to >>> local-memory? >> >> Well you do ttm_bo_validate() with something like GTT domain, don't >> you? This should result in re-populating the tt object, but I'm not >> 100% sure if that really works as expected. > > AFAIK for domains we either have system memory (which uses ttm_tt and > might be shmem underneath) or local-memory. But perhaps i915 is doing > something wrong here, or abusing TTM in some way. I'm not sure tbh. > > Anyway, I think we have two cases here: > > - We have some system memory only object. After doing > i915_ttm_shrink(), bo->resource is now NULL. We then call > ttm_bo_validate() at some later point, but here we don't need to copy > anything, but it also looks like ttm_bo_handle_move_mem() won't > populate the ttm_tt or us either, since mem_type == TTM_PL_SYSTEM. It > looks like i915_ttm_move() was taking care of this, but now we just > call ttm_bo_move_null(). > > - We have a local-memory only object, which was evicted to shmem, and > then swapped out by the shrinker like above. The bo->resource is NULL. > However this time when calling ttm_bo_validate() we need to actually > do a copy in i915_ttm_move(), as well as re-populate the ttm_tt. > i915_ttm_move() was taking care of this, but now we just call > ttm_bo_move_null(). > > Perhaps i915 is doing something wrong in the above two cases? Mhm, as far as I can see that should still work. See previously you should got a transition from SYSTEM->GTT in i915_ttm_move() to re-create your backing store. Not you get NULL->SYSTEM which is handled by ttm_bo_move_null() and then SYSTEM->GTT. If you just validated to SYSTEM memory before I think the tt object wouldn't have been populated either. Regards, Christian. > >> >> Thanks, >> Christian. >> >>> >>>> >>>> I've been considering to replacing the ttm_bo_type with a bunch of >>>> behavior flags for a bo. I'm hoping that this will clean things up >>>> a bit. >>>> >>>> Regards, >>>> Christian. >>>> >>>>> >>>>>>> caching = i915_ttm_select_tt_caching(obj); >>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>> index 9a7e50534b84bb..c420d1ab605b6f 100644 >>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>> @@ -560,7 +560,7 @@ int i915_ttm_move(struct ttm_buffer_object >>>>>>> *bo, bool evict, >>>>>>> bool clear; >>>>>>> int ret; >>>>>>> - if (GEM_WARN_ON(!obj)) { >>>>>>> + if (GEM_WARN_ON(!obj) || !bo->resource) { >>>>>>> ttm_bo_move_null(bo, dst_mem); >>>>>>> return 0; >>>>>>> } >>>>>> >>>> >> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-08-31 11:03 ` Christian König @ 2022-08-31 12:06 ` Matthew Auld 2022-08-31 12:35 ` Christian König 0 siblings, 1 reply; 24+ messages in thread From: Matthew Auld @ 2022-08-31 12:06 UTC (permalink / raw) To: Christian König, dri-devel, Intel Graphics, Thomas Hellström Cc: Luben Tuikov On 31/08/2022 12:03, Christian König wrote: > Am 31.08.22 um 12:37 schrieb Matthew Auld: >> [SNIP] >>>> >>>> That hopefully just leaves i915_ttm_shrink(), which is swapping out >>>> shmem ttm_tt and is calling ttm_bo_validate() with empty placements >>>> to force the pipeline-gutting path, which importantly unpopulates >>>> the ttm_tt for us (since ttm_tt_unpopulate is not exported it >>>> seems). But AFAICT it looks like that will now also nuke the >>>> bo->resource, instead of just leaving it in system memory. My >>>> assumption is that when later calling ttm_bo_validate(), it will >>>> just do the bo_move_null() in i915_ttm_move(), instead of >>>> re-populating the ttm_tt and then potentially copying it back to >>>> local-memory? >>> >>> Well you do ttm_bo_validate() with something like GTT domain, don't >>> you? This should result in re-populating the tt object, but I'm not >>> 100% sure if that really works as expected. >> >> AFAIK for domains we either have system memory (which uses ttm_tt and >> might be shmem underneath) or local-memory. But perhaps i915 is doing >> something wrong here, or abusing TTM in some way. I'm not sure tbh. >> >> Anyway, I think we have two cases here: >> >> - We have some system memory only object. After doing >> i915_ttm_shrink(), bo->resource is now NULL. We then call >> ttm_bo_validate() at some later point, but here we don't need to copy >> anything, but it also looks like ttm_bo_handle_move_mem() won't >> populate the ttm_tt or us either, since mem_type == TTM_PL_SYSTEM. It >> looks like i915_ttm_move() was taking care of this, but now we just >> call ttm_bo_move_null(). >> >> - We have a local-memory only object, which was evicted to shmem, and >> then swapped out by the shrinker like above. The bo->resource is NULL. >> However this time when calling ttm_bo_validate() we need to actually >> do a copy in i915_ttm_move(), as well as re-populate the ttm_tt. >> i915_ttm_move() was taking care of this, but now we just call >> ttm_bo_move_null(). >> >> Perhaps i915 is doing something wrong in the above two cases? > > Mhm, as far as I can see that should still work. > > See previously you should got a transition from SYSTEM->GTT in > i915_ttm_move() to re-create your backing store. Not you get > NULL->SYSTEM which is handled by ttm_bo_move_null() and then SYSTEM->GTT. What is GTT here in TTM world? Also I'm not seeing where there is this SYSTEM->GTT transition? Maybe I'm blind. Just to be clear, i915 is only calling ttm_bo_validate() once when acquiring the pages, and we don't call it again, unless it was evicted (and potentially swapped out). > > If you just validated to SYSTEM memory before I think the tt object > wouldn't have been populated either. > > Regards, > Christian. > >> >>> >>> Thanks, >>> Christian. >>> >>>> >>>>> >>>>> I've been considering to replacing the ttm_bo_type with a bunch of >>>>> behavior flags for a bo. I'm hoping that this will clean things up >>>>> a bit. >>>>> >>>>> Regards, >>>>> Christian. >>>>> >>>>>> >>>>>>>> caching = i915_ttm_select_tt_caching(obj); >>>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>> index 9a7e50534b84bb..c420d1ab605b6f 100644 >>>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>> @@ -560,7 +560,7 @@ int i915_ttm_move(struct ttm_buffer_object >>>>>>>> *bo, bool evict, >>>>>>>> bool clear; >>>>>>>> int ret; >>>>>>>> - if (GEM_WARN_ON(!obj)) { >>>>>>>> + if (GEM_WARN_ON(!obj) || !bo->resource) { >>>>>>>> ttm_bo_move_null(bo, dst_mem); >>>>>>>> return 0; >>>>>>>> } >>>>>>> >>>>> >>> > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-08-31 12:06 ` Matthew Auld @ 2022-08-31 12:35 ` Christian König 2022-08-31 12:50 ` Matthew Auld 0 siblings, 1 reply; 24+ messages in thread From: Christian König @ 2022-08-31 12:35 UTC (permalink / raw) To: Matthew Auld, dri-devel, Intel Graphics, Thomas Hellström Cc: Luben Tuikov Am 31.08.22 um 14:06 schrieb Matthew Auld: > On 31/08/2022 12:03, Christian König wrote: >> Am 31.08.22 um 12:37 schrieb Matthew Auld: >>> [SNIP] >>>>> >>>>> That hopefully just leaves i915_ttm_shrink(), which is swapping >>>>> out shmem ttm_tt and is calling ttm_bo_validate() with empty >>>>> placements to force the pipeline-gutting path, which importantly >>>>> unpopulates the ttm_tt for us (since ttm_tt_unpopulate is not >>>>> exported it seems). But AFAICT it looks like that will now also >>>>> nuke the bo->resource, instead of just leaving it in system >>>>> memory. My assumption is that when later calling >>>>> ttm_bo_validate(), it will just do the bo_move_null() in >>>>> i915_ttm_move(), instead of re-populating the ttm_tt and then >>>>> potentially copying it back to local-memory? >>>> >>>> Well you do ttm_bo_validate() with something like GTT domain, don't >>>> you? This should result in re-populating the tt object, but I'm not >>>> 100% sure if that really works as expected. >>> >>> AFAIK for domains we either have system memory (which uses ttm_tt >>> and might be shmem underneath) or local-memory. But perhaps i915 is >>> doing something wrong here, or abusing TTM in some way. I'm not sure >>> tbh. >>> >>> Anyway, I think we have two cases here: >>> >>> - We have some system memory only object. After doing >>> i915_ttm_shrink(), bo->resource is now NULL. We then call >>> ttm_bo_validate() at some later point, but here we don't need to >>> copy anything, but it also looks like ttm_bo_handle_move_mem() won't >>> populate the ttm_tt or us either, since mem_type == TTM_PL_SYSTEM. >>> It looks like i915_ttm_move() was taking care of this, but now we >>> just call ttm_bo_move_null(). >>> >>> - We have a local-memory only object, which was evicted to shmem, >>> and then swapped out by the shrinker like above. The bo->resource is >>> NULL. However this time when calling ttm_bo_validate() we need to >>> actually do a copy in i915_ttm_move(), as well as re-populate the >>> ttm_tt. i915_ttm_move() was taking care of this, but now we just >>> call ttm_bo_move_null(). >>> >>> Perhaps i915 is doing something wrong in the above two cases? >> >> Mhm, as far as I can see that should still work. >> >> See previously you should got a transition from SYSTEM->GTT in >> i915_ttm_move() to re-create your backing store. Not you get >> NULL->SYSTEM which is handled by ttm_bo_move_null() and then >> SYSTEM->GTT. > > What is GTT here in TTM world? Also I'm not seeing where there is this > SYSTEM->GTT transition? Maybe I'm blind. Just to be clear, i915 is > only calling ttm_bo_validate() once when acquiring the pages, and we > don't call it again, unless it was evicted (and potentially swapped out). Well GTT means TTM_PL_TT. And calling it only once is perfectly fine, TTM will internally see that we need two hops to reach TTM_PL_TT and so does the NULL->SYSTEM transition and then SYSTEM->TT. As far as I can see that should work like it did before. Christian. > >> >> If you just validated to SYSTEM memory before I think the tt object >> wouldn't have been populated either. >> >> Regards, >> Christian. >> >>> >>>> >>>> Thanks, >>>> Christian. >>>> >>>>> >>>>>> >>>>>> I've been considering to replacing the ttm_bo_type with a bunch >>>>>> of behavior flags for a bo. I'm hoping that this will clean >>>>>> things up a bit. >>>>>> >>>>>> Regards, >>>>>> Christian. >>>>>> >>>>>>> >>>>>>>>> caching = i915_ttm_select_tt_caching(obj); >>>>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>> index 9a7e50534b84bb..c420d1ab605b6f 100644 >>>>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>> @@ -560,7 +560,7 @@ int i915_ttm_move(struct ttm_buffer_object >>>>>>>>> *bo, bool evict, >>>>>>>>> bool clear; >>>>>>>>> int ret; >>>>>>>>> - if (GEM_WARN_ON(!obj)) { >>>>>>>>> + if (GEM_WARN_ON(!obj) || !bo->resource) { >>>>>>>>> ttm_bo_move_null(bo, dst_mem); >>>>>>>>> return 0; >>>>>>>>> } >>>>>>>> >>>>>> >>>> >> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-08-31 12:35 ` Christian König @ 2022-08-31 12:50 ` Matthew Auld 2022-08-31 13:34 ` Christian König 0 siblings, 1 reply; 24+ messages in thread From: Matthew Auld @ 2022-08-31 12:50 UTC (permalink / raw) To: Christian König, dri-devel, Intel Graphics, Thomas Hellström Cc: Luben Tuikov On 31/08/2022 13:35, Christian König wrote: > Am 31.08.22 um 14:06 schrieb Matthew Auld: >> On 31/08/2022 12:03, Christian König wrote: >>> Am 31.08.22 um 12:37 schrieb Matthew Auld: >>>> [SNIP] >>>>>> >>>>>> That hopefully just leaves i915_ttm_shrink(), which is swapping >>>>>> out shmem ttm_tt and is calling ttm_bo_validate() with empty >>>>>> placements to force the pipeline-gutting path, which importantly >>>>>> unpopulates the ttm_tt for us (since ttm_tt_unpopulate is not >>>>>> exported it seems). But AFAICT it looks like that will now also >>>>>> nuke the bo->resource, instead of just leaving it in system >>>>>> memory. My assumption is that when later calling >>>>>> ttm_bo_validate(), it will just do the bo_move_null() in >>>>>> i915_ttm_move(), instead of re-populating the ttm_tt and then >>>>>> potentially copying it back to local-memory? >>>>> >>>>> Well you do ttm_bo_validate() with something like GTT domain, don't >>>>> you? This should result in re-populating the tt object, but I'm not >>>>> 100% sure if that really works as expected. >>>> >>>> AFAIK for domains we either have system memory (which uses ttm_tt >>>> and might be shmem underneath) or local-memory. But perhaps i915 is >>>> doing something wrong here, or abusing TTM in some way. I'm not sure >>>> tbh. >>>> >>>> Anyway, I think we have two cases here: >>>> >>>> - We have some system memory only object. After doing >>>> i915_ttm_shrink(), bo->resource is now NULL. We then call >>>> ttm_bo_validate() at some later point, but here we don't need to >>>> copy anything, but it also looks like ttm_bo_handle_move_mem() won't >>>> populate the ttm_tt or us either, since mem_type == TTM_PL_SYSTEM. >>>> It looks like i915_ttm_move() was taking care of this, but now we >>>> just call ttm_bo_move_null(). >>>> >>>> - We have a local-memory only object, which was evicted to shmem, >>>> and then swapped out by the shrinker like above. The bo->resource is >>>> NULL. However this time when calling ttm_bo_validate() we need to >>>> actually do a copy in i915_ttm_move(), as well as re-populate the >>>> ttm_tt. i915_ttm_move() was taking care of this, but now we just >>>> call ttm_bo_move_null(). >>>> >>>> Perhaps i915 is doing something wrong in the above two cases? >>> >>> Mhm, as far as I can see that should still work. >>> >>> See previously you should got a transition from SYSTEM->GTT in >>> i915_ttm_move() to re-create your backing store. Not you get >>> NULL->SYSTEM which is handled by ttm_bo_move_null() and then >>> SYSTEM->GTT. >> >> What is GTT here in TTM world? Also I'm not seeing where there is this >> SYSTEM->GTT transition? Maybe I'm blind. Just to be clear, i915 is >> only calling ttm_bo_validate() once when acquiring the pages, and we >> don't call it again, unless it was evicted (and potentially swapped out). > > Well GTT means TTM_PL_TT. > > And calling it only once is perfectly fine, TTM will internally see that > we need two hops to reach TTM_PL_TT and so does the NULL->SYSTEM > transition and then SYSTEM->TT. Ah interesting, so that's what the multi-hop thing does. But AFAICT i915 is not using either TTM_PL_TT or -EMULTIHOP. Also what is the difference between TTM_PL_TT and TM_PL_SYSTEM? When should you use one over the other? > > As far as I can see that should work like it did before. > > Christian. > >> >>> >>> If you just validated to SYSTEM memory before I think the tt object >>> wouldn't have been populated either. >>> >>> Regards, >>> Christian. >>> >>>> >>>>> >>>>> Thanks, >>>>> Christian. >>>>> >>>>>> >>>>>>> >>>>>>> I've been considering to replacing the ttm_bo_type with a bunch >>>>>>> of behavior flags for a bo. I'm hoping that this will clean >>>>>>> things up a bit. >>>>>>> >>>>>>> Regards, >>>>>>> Christian. >>>>>>> >>>>>>>> >>>>>>>>>> caching = i915_ttm_select_tt_caching(obj); >>>>>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>> index 9a7e50534b84bb..c420d1ab605b6f 100644 >>>>>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>> @@ -560,7 +560,7 @@ int i915_ttm_move(struct ttm_buffer_object >>>>>>>>>> *bo, bool evict, >>>>>>>>>> bool clear; >>>>>>>>>> int ret; >>>>>>>>>> - if (GEM_WARN_ON(!obj)) { >>>>>>>>>> + if (GEM_WARN_ON(!obj) || !bo->resource) { >>>>>>>>>> ttm_bo_move_null(bo, dst_mem); >>>>>>>>>> return 0; >>>>>>>>>> } >>>>>>>>> >>>>>>> >>>>> >>> > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-08-31 12:50 ` Matthew Auld @ 2022-08-31 13:34 ` Christian König 2022-08-31 14:53 ` Matthew Auld 2022-09-01 17:48 ` Thomas Hellström 0 siblings, 2 replies; 24+ messages in thread From: Christian König @ 2022-08-31 13:34 UTC (permalink / raw) To: Matthew Auld, dri-devel, Intel Graphics, Thomas Hellström Cc: Luben Tuikov Am 31.08.22 um 14:50 schrieb Matthew Auld: > On 31/08/2022 13:35, Christian König wrote: >> Am 31.08.22 um 14:06 schrieb Matthew Auld: >>> On 31/08/2022 12:03, Christian König wrote: >>>> Am 31.08.22 um 12:37 schrieb Matthew Auld: >>>>> [SNIP] >>>>>>> >>>>>>> That hopefully just leaves i915_ttm_shrink(), which is swapping >>>>>>> out shmem ttm_tt and is calling ttm_bo_validate() with empty >>>>>>> placements to force the pipeline-gutting path, which importantly >>>>>>> unpopulates the ttm_tt for us (since ttm_tt_unpopulate is not >>>>>>> exported it seems). But AFAICT it looks like that will now also >>>>>>> nuke the bo->resource, instead of just leaving it in system >>>>>>> memory. My assumption is that when later calling >>>>>>> ttm_bo_validate(), it will just do the bo_move_null() in >>>>>>> i915_ttm_move(), instead of re-populating the ttm_tt and then >>>>>>> potentially copying it back to local-memory? >>>>>> >>>>>> Well you do ttm_bo_validate() with something like GTT domain, >>>>>> don't you? This should result in re-populating the tt object, but >>>>>> I'm not 100% sure if that really works as expected. >>>>> >>>>> AFAIK for domains we either have system memory (which uses ttm_tt >>>>> and might be shmem underneath) or local-memory. But perhaps i915 >>>>> is doing something wrong here, or abusing TTM in some way. I'm not >>>>> sure tbh. >>>>> >>>>> Anyway, I think we have two cases here: >>>>> >>>>> - We have some system memory only object. After doing >>>>> i915_ttm_shrink(), bo->resource is now NULL. We then call >>>>> ttm_bo_validate() at some later point, but here we don't need to >>>>> copy anything, but it also looks like ttm_bo_handle_move_mem() >>>>> won't populate the ttm_tt or us either, since mem_type == >>>>> TTM_PL_SYSTEM. It looks like i915_ttm_move() was taking care of >>>>> this, but now we just call ttm_bo_move_null(). >>>>> >>>>> - We have a local-memory only object, which was evicted to shmem, >>>>> and then swapped out by the shrinker like above. The bo->resource >>>>> is NULL. However this time when calling ttm_bo_validate() we need >>>>> to actually do a copy in i915_ttm_move(), as well as re-populate >>>>> the ttm_tt. i915_ttm_move() was taking care of this, but now we >>>>> just call ttm_bo_move_null(). >>>>> >>>>> Perhaps i915 is doing something wrong in the above two cases? >>>> >>>> Mhm, as far as I can see that should still work. >>>> >>>> See previously you should got a transition from SYSTEM->GTT in >>>> i915_ttm_move() to re-create your backing store. Not you get >>>> NULL->SYSTEM which is handled by ttm_bo_move_null() and then >>>> SYSTEM->GTT. >>> >>> What is GTT here in TTM world? Also I'm not seeing where there is >>> this SYSTEM->GTT transition? Maybe I'm blind. Just to be clear, i915 >>> is only calling ttm_bo_validate() once when acquiring the pages, and >>> we don't call it again, unless it was evicted (and potentially >>> swapped out). >> >> Well GTT means TTM_PL_TT. >> >> And calling it only once is perfectly fine, TTM will internally see >> that we need two hops to reach TTM_PL_TT and so does the NULL->SYSTEM >> transition and then SYSTEM->TT. > > Ah interesting, so that's what the multi-hop thing does. But AFAICT > i915 is not using either TTM_PL_TT or -EMULTIHOP. Mhm, it could be that we then have a problem and the i915 driver only sees NULL->TT directly. But I really don't know the i915 driver code good enough to judge that. Can you take a look at this and test it maybe? > > Also what is the difference between TTM_PL_TT and TM_PL_SYSTEM? When > should you use one over the other? TTM_PL_SYSTEM means the device is not accessing the buffer and TTM has the control over the backing store and can swapout/swapin as it wants it. TTM_PL_TT means that the device is accessing the data (TT stands for translation table) and so TTM can't swap the backing store in/out. TTM_PL_VRAM well that one is obvious. Thanks, Christian. > >> >> As far as I can see that should work like it did before. >> >> Christian. >> >>> >>>> >>>> If you just validated to SYSTEM memory before I think the tt object >>>> wouldn't have been populated either. >>>> >>>> Regards, >>>> Christian. >>>> >>>>> >>>>>> >>>>>> Thanks, >>>>>> Christian. >>>>>> >>>>>>> >>>>>>>> >>>>>>>> I've been considering to replacing the ttm_bo_type with a bunch >>>>>>>> of behavior flags for a bo. I'm hoping that this will clean >>>>>>>> things up a bit. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Christian. >>>>>>>> >>>>>>>>> >>>>>>>>>>> caching = i915_ttm_select_tt_caching(obj); >>>>>>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>>> index 9a7e50534b84bb..c420d1ab605b6f 100644 >>>>>>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>>> @@ -560,7 +560,7 @@ int i915_ttm_move(struct >>>>>>>>>>> ttm_buffer_object *bo, bool evict, >>>>>>>>>>> bool clear; >>>>>>>>>>> int ret; >>>>>>>>>>> - if (GEM_WARN_ON(!obj)) { >>>>>>>>>>> + if (GEM_WARN_ON(!obj) || !bo->resource) { >>>>>>>>>>> ttm_bo_move_null(bo, dst_mem); >>>>>>>>>>> return 0; >>>>>>>>>>> } >>>>>>>>>> >>>>>>>> >>>>>> >>>> >> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-08-31 13:34 ` Christian König @ 2022-08-31 14:53 ` Matthew Auld 2022-08-31 16:32 ` Matthew Auld 2022-09-01 17:48 ` Thomas Hellström 1 sibling, 1 reply; 24+ messages in thread From: Matthew Auld @ 2022-08-31 14:53 UTC (permalink / raw) To: Christian König, dri-devel, Intel Graphics, Thomas Hellström Cc: Luben Tuikov On 31/08/2022 14:34, Christian König wrote: > Am 31.08.22 um 14:50 schrieb Matthew Auld: >> On 31/08/2022 13:35, Christian König wrote: >>> Am 31.08.22 um 14:06 schrieb Matthew Auld: >>>> On 31/08/2022 12:03, Christian König wrote: >>>>> Am 31.08.22 um 12:37 schrieb Matthew Auld: >>>>>> [SNIP] >>>>>>>> >>>>>>>> That hopefully just leaves i915_ttm_shrink(), which is swapping >>>>>>>> out shmem ttm_tt and is calling ttm_bo_validate() with empty >>>>>>>> placements to force the pipeline-gutting path, which importantly >>>>>>>> unpopulates the ttm_tt for us (since ttm_tt_unpopulate is not >>>>>>>> exported it seems). But AFAICT it looks like that will now also >>>>>>>> nuke the bo->resource, instead of just leaving it in system >>>>>>>> memory. My assumption is that when later calling >>>>>>>> ttm_bo_validate(), it will just do the bo_move_null() in >>>>>>>> i915_ttm_move(), instead of re-populating the ttm_tt and then >>>>>>>> potentially copying it back to local-memory? >>>>>>> >>>>>>> Well you do ttm_bo_validate() with something like GTT domain, >>>>>>> don't you? This should result in re-populating the tt object, but >>>>>>> I'm not 100% sure if that really works as expected. >>>>>> >>>>>> AFAIK for domains we either have system memory (which uses ttm_tt >>>>>> and might be shmem underneath) or local-memory. But perhaps i915 >>>>>> is doing something wrong here, or abusing TTM in some way. I'm not >>>>>> sure tbh. >>>>>> >>>>>> Anyway, I think we have two cases here: >>>>>> >>>>>> - We have some system memory only object. After doing >>>>>> i915_ttm_shrink(), bo->resource is now NULL. We then call >>>>>> ttm_bo_validate() at some later point, but here we don't need to >>>>>> copy anything, but it also looks like ttm_bo_handle_move_mem() >>>>>> won't populate the ttm_tt or us either, since mem_type == >>>>>> TTM_PL_SYSTEM. It looks like i915_ttm_move() was taking care of >>>>>> this, but now we just call ttm_bo_move_null(). >>>>>> >>>>>> - We have a local-memory only object, which was evicted to shmem, >>>>>> and then swapped out by the shrinker like above. The bo->resource >>>>>> is NULL. However this time when calling ttm_bo_validate() we need >>>>>> to actually do a copy in i915_ttm_move(), as well as re-populate >>>>>> the ttm_tt. i915_ttm_move() was taking care of this, but now we >>>>>> just call ttm_bo_move_null(). >>>>>> >>>>>> Perhaps i915 is doing something wrong in the above two cases? >>>>> >>>>> Mhm, as far as I can see that should still work. >>>>> >>>>> See previously you should got a transition from SYSTEM->GTT in >>>>> i915_ttm_move() to re-create your backing store. Not you get >>>>> NULL->SYSTEM which is handled by ttm_bo_move_null() and then >>>>> SYSTEM->GTT. >>>> >>>> What is GTT here in TTM world? Also I'm not seeing where there is >>>> this SYSTEM->GTT transition? Maybe I'm blind. Just to be clear, i915 >>>> is only calling ttm_bo_validate() once when acquiring the pages, and >>>> we don't call it again, unless it was evicted (and potentially >>>> swapped out). >>> >>> Well GTT means TTM_PL_TT. >>> >>> And calling it only once is perfectly fine, TTM will internally see >>> that we need two hops to reach TTM_PL_TT and so does the NULL->SYSTEM >>> transition and then SYSTEM->TT. >> >> Ah interesting, so that's what the multi-hop thing does. But AFAICT >> i915 is not using either TTM_PL_TT or -EMULTIHOP. > > Mhm, it could be that we then have a problem and the i915 driver only > sees NULL->TT directly. But I really don't know the i915 driver code > good enough to judge that. > > Can you take a look at this and test it maybe? I'll grab a machine and try to see what is going on here. > >> >> Also what is the difference between TTM_PL_TT and TM_PL_SYSTEM? When >> should you use one over the other? > > TTM_PL_SYSTEM means the device is not accessing the buffer and TTM has > the control over the backing store and can swapout/swapin as it wants it. > > TTM_PL_TT means that the device is accessing the data (TT stands for > translation table) and so TTM can't swap the backing store in/out. > > TTM_PL_VRAM well that one is obvious. Thanks for the explanation. So it looks like i915 is using TTM_PL_SYSTEM even for device access it seems. > > Thanks, > Christian. > >> >>> >>> As far as I can see that should work like it did before. >>> >>> Christian. >>> >>>> >>>>> >>>>> If you just validated to SYSTEM memory before I think the tt object >>>>> wouldn't have been populated either. >>>>> >>>>> Regards, >>>>> Christian. >>>>> >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Christian. >>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> I've been considering to replacing the ttm_bo_type with a bunch >>>>>>>>> of behavior flags for a bo. I'm hoping that this will clean >>>>>>>>> things up a bit. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Christian. >>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> caching = i915_ttm_select_tt_caching(obj); >>>>>>>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>>>> index 9a7e50534b84bb..c420d1ab605b6f 100644 >>>>>>>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>>>> @@ -560,7 +560,7 @@ int i915_ttm_move(struct >>>>>>>>>>>> ttm_buffer_object *bo, bool evict, >>>>>>>>>>>> bool clear; >>>>>>>>>>>> int ret; >>>>>>>>>>>> - if (GEM_WARN_ON(!obj)) { >>>>>>>>>>>> + if (GEM_WARN_ON(!obj) || !bo->resource) { >>>>>>>>>>>> ttm_bo_move_null(bo, dst_mem); >>>>>>>>>>>> return 0; >>>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>> > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-08-31 14:53 ` Matthew Auld @ 2022-08-31 16:32 ` Matthew Auld 2022-09-01 8:00 ` Christian König 0 siblings, 1 reply; 24+ messages in thread From: Matthew Auld @ 2022-08-31 16:32 UTC (permalink / raw) To: Christian König, dri-devel, Intel Graphics, Thomas Hellström Cc: Luben Tuikov On 31/08/2022 15:53, Matthew Auld wrote: > On 31/08/2022 14:34, Christian König wrote: >> Am 31.08.22 um 14:50 schrieb Matthew Auld: >>> On 31/08/2022 13:35, Christian König wrote: >>>> Am 31.08.22 um 14:06 schrieb Matthew Auld: >>>>> On 31/08/2022 12:03, Christian König wrote: >>>>>> Am 31.08.22 um 12:37 schrieb Matthew Auld: >>>>>>> [SNIP] >>>>>>>>> >>>>>>>>> That hopefully just leaves i915_ttm_shrink(), which is swapping >>>>>>>>> out shmem ttm_tt and is calling ttm_bo_validate() with empty >>>>>>>>> placements to force the pipeline-gutting path, which >>>>>>>>> importantly unpopulates the ttm_tt for us (since >>>>>>>>> ttm_tt_unpopulate is not exported it seems). But AFAICT it >>>>>>>>> looks like that will now also nuke the bo->resource, instead of >>>>>>>>> just leaving it in system memory. My assumption is that when >>>>>>>>> later calling ttm_bo_validate(), it will just do the >>>>>>>>> bo_move_null() in i915_ttm_move(), instead of re-populating the >>>>>>>>> ttm_tt and then potentially copying it back to local-memory? >>>>>>>> >>>>>>>> Well you do ttm_bo_validate() with something like GTT domain, >>>>>>>> don't you? This should result in re-populating the tt object, >>>>>>>> but I'm not 100% sure if that really works as expected. >>>>>>> >>>>>>> AFAIK for domains we either have system memory (which uses ttm_tt >>>>>>> and might be shmem underneath) or local-memory. But perhaps i915 >>>>>>> is doing something wrong here, or abusing TTM in some way. I'm >>>>>>> not sure tbh. >>>>>>> >>>>>>> Anyway, I think we have two cases here: >>>>>>> >>>>>>> - We have some system memory only object. After doing >>>>>>> i915_ttm_shrink(), bo->resource is now NULL. We then call >>>>>>> ttm_bo_validate() at some later point, but here we don't need to >>>>>>> copy anything, but it also looks like ttm_bo_handle_move_mem() >>>>>>> won't populate the ttm_tt or us either, since mem_type == >>>>>>> TTM_PL_SYSTEM. It looks like i915_ttm_move() was taking care of >>>>>>> this, but now we just call ttm_bo_move_null(). >>>>>>> >>>>>>> - We have a local-memory only object, which was evicted to shmem, >>>>>>> and then swapped out by the shrinker like above. The bo->resource >>>>>>> is NULL. However this time when calling ttm_bo_validate() we need >>>>>>> to actually do a copy in i915_ttm_move(), as well as re-populate >>>>>>> the ttm_tt. i915_ttm_move() was taking care of this, but now we >>>>>>> just call ttm_bo_move_null(). >>>>>>> >>>>>>> Perhaps i915 is doing something wrong in the above two cases? >>>>>> >>>>>> Mhm, as far as I can see that should still work. >>>>>> >>>>>> See previously you should got a transition from SYSTEM->GTT in >>>>>> i915_ttm_move() to re-create your backing store. Not you get >>>>>> NULL->SYSTEM which is handled by ttm_bo_move_null() and then >>>>>> SYSTEM->GTT. >>>>> >>>>> What is GTT here in TTM world? Also I'm not seeing where there is >>>>> this SYSTEM->GTT transition? Maybe I'm blind. Just to be clear, >>>>> i915 is only calling ttm_bo_validate() once when acquiring the >>>>> pages, and we don't call it again, unless it was evicted (and >>>>> potentially swapped out). >>>> >>>> Well GTT means TTM_PL_TT. >>>> >>>> And calling it only once is perfectly fine, TTM will internally see >>>> that we need two hops to reach TTM_PL_TT and so does the >>>> NULL->SYSTEM transition and then SYSTEM->TT. >>> >>> Ah interesting, so that's what the multi-hop thing does. But AFAICT >>> i915 is not using either TTM_PL_TT or -EMULTIHOP. >> >> Mhm, it could be that we then have a problem and the i915 driver only >> sees NULL->TT directly. But I really don't know the i915 driver code >> good enough to judge that. >> >> Can you take a look at this and test it maybe? > > I'll grab a machine and try to see what is going on here. Well at least the issue with the firmware not loading looks to be fixed now. So running some eviction + oom tests it looks it now does: /* eviction kicks in */ i915_ttm_move(bo): LMEM -> PL_SYSTEM /* shrinker/oom kicks in at some point */ i915_ttm_shrink(bo): bo->resource = NULL, /* pipeline_gutting */ shmem ttm_tt is unpopulated and pages are correctly swapped out /* user touches the same object later */ i915_ttm_move(bo): NULL -> LMEM, bo_move_null() So seems to incorrectly skip swapping it back in and then copy over to lmem. It just allocates directly in lmem. And previously the last two steps would have been: i915_ttm_shrink(bo): bo->resource = PL_SYSTEM, /* pipeline_gutting */ shmem ttm_tt is unpopulated and pages are correctly swapped out i915_ttm_move(bo): PL_SYSTEM -> LMEM, ttm_tt is repopulated and pages are copied over to lmem > >> >>> >>> Also what is the difference between TTM_PL_TT and TM_PL_SYSTEM? When >>> should you use one over the other? >> >> TTM_PL_SYSTEM means the device is not accessing the buffer and TTM has >> the control over the backing store and can swapout/swapin as it wants it. >> >> TTM_PL_TT means that the device is accessing the data (TT stands for >> translation table) and so TTM can't swap the backing store in/out. >> >> TTM_PL_VRAM well that one is obvious. > > Thanks for the explanation. So it looks like i915 is using TTM_PL_SYSTEM > even for device access it seems. > >> >> Thanks, >> Christian. >> >>> >>>> >>>> As far as I can see that should work like it did before. >>>> >>>> Christian. >>>> >>>>> >>>>>> >>>>>> If you just validated to SYSTEM memory before I think the tt >>>>>> object wouldn't have been populated either. >>>>>> >>>>>> Regards, >>>>>> Christian. >>>>>> >>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Christian. >>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> I've been considering to replacing the ttm_bo_type with a >>>>>>>>>> bunch of behavior flags for a bo. I'm hoping that this will >>>>>>>>>> clean things up a bit. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Christian. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>> caching = i915_ttm_select_tt_caching(obj); >>>>>>>>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>>>>> index 9a7e50534b84bb..c420d1ab605b6f 100644 >>>>>>>>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c >>>>>>>>>>>>> @@ -560,7 +560,7 @@ int i915_ttm_move(struct >>>>>>>>>>>>> ttm_buffer_object *bo, bool evict, >>>>>>>>>>>>> bool clear; >>>>>>>>>>>>> int ret; >>>>>>>>>>>>> - if (GEM_WARN_ON(!obj)) { >>>>>>>>>>>>> + if (GEM_WARN_ON(!obj) || !bo->resource) { >>>>>>>>>>>>> ttm_bo_move_null(bo, dst_mem); >>>>>>>>>>>>> return 0; >>>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>> >> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-08-31 16:32 ` Matthew Auld @ 2022-09-01 8:00 ` Christian König 2022-09-01 12:52 ` Matthew Auld 0 siblings, 1 reply; 24+ messages in thread From: Christian König @ 2022-09-01 8:00 UTC (permalink / raw) To: Matthew Auld, dri-devel, Intel Graphics, Thomas Hellström Cc: Luben Tuikov Am 31.08.22 um 18:32 schrieb Matthew Auld: > On 31/08/2022 15:53, Matthew Auld wrote: >> On 31/08/2022 14:34, Christian König wrote: >>> Am 31.08.22 um 14:50 schrieb Matthew Auld: >>>> On 31/08/2022 13:35, Christian König wrote: >>>>> Am 31.08.22 um 14:06 schrieb Matthew Auld: >>>>>> On 31/08/2022 12:03, Christian König wrote: >>>>>>> Am 31.08.22 um 12:37 schrieb Matthew Auld: >>>>>>>> [SNIP] >>>>>>>>>> >>>>>>>>>> That hopefully just leaves i915_ttm_shrink(), which is >>>>>>>>>> swapping out shmem ttm_tt and is calling ttm_bo_validate() >>>>>>>>>> with empty placements to force the pipeline-gutting path, >>>>>>>>>> which importantly unpopulates the ttm_tt for us (since >>>>>>>>>> ttm_tt_unpopulate is not exported it seems). But AFAICT it >>>>>>>>>> looks like that will now also nuke the bo->resource, instead >>>>>>>>>> of just leaving it in system memory. My assumption is that >>>>>>>>>> when later calling ttm_bo_validate(), it will just do the >>>>>>>>>> bo_move_null() in i915_ttm_move(), instead of re-populating >>>>>>>>>> the ttm_tt and then potentially copying it back to local-memory? >>>>>>>>> >>>>>>>>> Well you do ttm_bo_validate() with something like GTT domain, >>>>>>>>> don't you? This should result in re-populating the tt object, >>>>>>>>> but I'm not 100% sure if that really works as expected. >>>>>>>> >>>>>>>> AFAIK for domains we either have system memory (which uses >>>>>>>> ttm_tt and might be shmem underneath) or local-memory. But >>>>>>>> perhaps i915 is doing something wrong here, or abusing TTM in >>>>>>>> some way. I'm not sure tbh. >>>>>>>> >>>>>>>> Anyway, I think we have two cases here: >>>>>>>> >>>>>>>> - We have some system memory only object. After doing >>>>>>>> i915_ttm_shrink(), bo->resource is now NULL. We then call >>>>>>>> ttm_bo_validate() at some later point, but here we don't need >>>>>>>> to copy anything, but it also looks like >>>>>>>> ttm_bo_handle_move_mem() won't populate the ttm_tt or us >>>>>>>> either, since mem_type == TTM_PL_SYSTEM. It looks like >>>>>>>> i915_ttm_move() was taking care of this, but now we just call >>>>>>>> ttm_bo_move_null(). >>>>>>>> >>>>>>>> - We have a local-memory only object, which was evicted to >>>>>>>> shmem, and then swapped out by the shrinker like above. The >>>>>>>> bo->resource is NULL. However this time when calling >>>>>>>> ttm_bo_validate() we need to actually do a copy in >>>>>>>> i915_ttm_move(), as well as re-populate the ttm_tt. >>>>>>>> i915_ttm_move() was taking care of this, but now we just call >>>>>>>> ttm_bo_move_null(). >>>>>>>> >>>>>>>> Perhaps i915 is doing something wrong in the above two cases? >>>>>>> >>>>>>> Mhm, as far as I can see that should still work. >>>>>>> >>>>>>> See previously you should got a transition from SYSTEM->GTT in >>>>>>> i915_ttm_move() to re-create your backing store. Not you get >>>>>>> NULL->SYSTEM which is handled by ttm_bo_move_null() and then >>>>>>> SYSTEM->GTT. >>>>>> >>>>>> What is GTT here in TTM world? Also I'm not seeing where there is >>>>>> this SYSTEM->GTT transition? Maybe I'm blind. Just to be clear, >>>>>> i915 is only calling ttm_bo_validate() once when acquiring the >>>>>> pages, and we don't call it again, unless it was evicted (and >>>>>> potentially swapped out). >>>>> >>>>> Well GTT means TTM_PL_TT. >>>>> >>>>> And calling it only once is perfectly fine, TTM will internally >>>>> see that we need two hops to reach TTM_PL_TT and so does the >>>>> NULL->SYSTEM transition and then SYSTEM->TT. >>>> >>>> Ah interesting, so that's what the multi-hop thing does. But AFAICT >>>> i915 is not using either TTM_PL_TT or -EMULTIHOP. >>> >>> Mhm, it could be that we then have a problem and the i915 driver >>> only sees NULL->TT directly. But I really don't know the i915 driver >>> code good enough to judge that. >>> >>> Can you take a look at this and test it maybe? >> >> I'll grab a machine and try to see what is going on here. > > Well at least the issue with the firmware not loading looks to be > fixed now. > > So running some eviction + oom tests it looks it now does: > > /* eviction kicks in */ > i915_ttm_move(bo): LMEM -> PL_SYSTEM > > /* shrinker/oom kicks in at some point */ > i915_ttm_shrink(bo): > bo->resource = NULL, /* pipeline_gutting */ > shmem ttm_tt is unpopulated and pages are correctly swapped out > > /* user touches the same object later */ > i915_ttm_move(bo): NULL -> LMEM, bo_move_null() > > So seems to incorrectly skip swapping it back in and then copy over to > lmem. It just allocates directly in lmem. > > And previously the last two steps would have been: > > i915_ttm_shrink(bo): > bo->resource = PL_SYSTEM, /* pipeline_gutting */ > shmem ttm_tt is unpopulated and pages are correctly swapped out > > i915_ttm_move(bo): > PL_SYSTEM -> LMEM, > ttm_tt is repopulated and pages are copied over to lmem Mhm, crap. That indeed looks like it won't work. How about changing the code like this: diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c index c420d1ab605b..1ee7d81ddcbc 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c @@ -560,7 +560,17 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict, bool clear; int ret; - if (GEM_WARN_ON(!obj) || !bo->resource) { + if (GEM_WARN_ON(!obj)) { + ttm_bo_move_null(bo, dst_mem); + return 0; + } + + if (!bo->resource) { + if (dst_mem->mem_type != TTM_PL_SYSTEM) { + hop->mem_type = TTM_PL_SYSTEM; + hop->flags = TTM_PL_FLAG_TEMPORARY; + return -EMULTIHOP; + } ttm_bo_move_null(bo, dst_mem); return 0; } That should result in allocating a TTM_PL_SYSTEM resource first and then moving from that to the final destination. Thanks, Christian. ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-09-01 8:00 ` Christian König @ 2022-09-01 12:52 ` Matthew Auld 0 siblings, 0 replies; 24+ messages in thread From: Matthew Auld @ 2022-09-01 12:52 UTC (permalink / raw) To: Christian König, dri-devel, Intel Graphics, Thomas Hellström Cc: Luben Tuikov On 01/09/2022 09:00, Christian König wrote: > Am 31.08.22 um 18:32 schrieb Matthew Auld: >> On 31/08/2022 15:53, Matthew Auld wrote: >>> On 31/08/2022 14:34, Christian König wrote: >>>> Am 31.08.22 um 14:50 schrieb Matthew Auld: >>>>> On 31/08/2022 13:35, Christian König wrote: >>>>>> Am 31.08.22 um 14:06 schrieb Matthew Auld: >>>>>>> On 31/08/2022 12:03, Christian König wrote: >>>>>>>> Am 31.08.22 um 12:37 schrieb Matthew Auld: >>>>>>>>> [SNIP] >>>>>>>>>>> >>>>>>>>>>> That hopefully just leaves i915_ttm_shrink(), which is >>>>>>>>>>> swapping out shmem ttm_tt and is calling ttm_bo_validate() >>>>>>>>>>> with empty placements to force the pipeline-gutting path, >>>>>>>>>>> which importantly unpopulates the ttm_tt for us (since >>>>>>>>>>> ttm_tt_unpopulate is not exported it seems). But AFAICT it >>>>>>>>>>> looks like that will now also nuke the bo->resource, instead >>>>>>>>>>> of just leaving it in system memory. My assumption is that >>>>>>>>>>> when later calling ttm_bo_validate(), it will just do the >>>>>>>>>>> bo_move_null() in i915_ttm_move(), instead of re-populating >>>>>>>>>>> the ttm_tt and then potentially copying it back to local-memory? >>>>>>>>>> >>>>>>>>>> Well you do ttm_bo_validate() with something like GTT domain, >>>>>>>>>> don't you? This should result in re-populating the tt object, >>>>>>>>>> but I'm not 100% sure if that really works as expected. >>>>>>>>> >>>>>>>>> AFAIK for domains we either have system memory (which uses >>>>>>>>> ttm_tt and might be shmem underneath) or local-memory. But >>>>>>>>> perhaps i915 is doing something wrong here, or abusing TTM in >>>>>>>>> some way. I'm not sure tbh. >>>>>>>>> >>>>>>>>> Anyway, I think we have two cases here: >>>>>>>>> >>>>>>>>> - We have some system memory only object. After doing >>>>>>>>> i915_ttm_shrink(), bo->resource is now NULL. We then call >>>>>>>>> ttm_bo_validate() at some later point, but here we don't need >>>>>>>>> to copy anything, but it also looks like >>>>>>>>> ttm_bo_handle_move_mem() won't populate the ttm_tt or us >>>>>>>>> either, since mem_type == TTM_PL_SYSTEM. It looks like >>>>>>>>> i915_ttm_move() was taking care of this, but now we just call >>>>>>>>> ttm_bo_move_null(). >>>>>>>>> >>>>>>>>> - We have a local-memory only object, which was evicted to >>>>>>>>> shmem, and then swapped out by the shrinker like above. The >>>>>>>>> bo->resource is NULL. However this time when calling >>>>>>>>> ttm_bo_validate() we need to actually do a copy in >>>>>>>>> i915_ttm_move(), as well as re-populate the ttm_tt. >>>>>>>>> i915_ttm_move() was taking care of this, but now we just call >>>>>>>>> ttm_bo_move_null(). >>>>>>>>> >>>>>>>>> Perhaps i915 is doing something wrong in the above two cases? >>>>>>>> >>>>>>>> Mhm, as far as I can see that should still work. >>>>>>>> >>>>>>>> See previously you should got a transition from SYSTEM->GTT in >>>>>>>> i915_ttm_move() to re-create your backing store. Not you get >>>>>>>> NULL->SYSTEM which is handled by ttm_bo_move_null() and then >>>>>>>> SYSTEM->GTT. >>>>>>> >>>>>>> What is GTT here in TTM world? Also I'm not seeing where there is >>>>>>> this SYSTEM->GTT transition? Maybe I'm blind. Just to be clear, >>>>>>> i915 is only calling ttm_bo_validate() once when acquiring the >>>>>>> pages, and we don't call it again, unless it was evicted (and >>>>>>> potentially swapped out). >>>>>> >>>>>> Well GTT means TTM_PL_TT. >>>>>> >>>>>> And calling it only once is perfectly fine, TTM will internally >>>>>> see that we need two hops to reach TTM_PL_TT and so does the >>>>>> NULL->SYSTEM transition and then SYSTEM->TT. >>>>> >>>>> Ah interesting, so that's what the multi-hop thing does. But AFAICT >>>>> i915 is not using either TTM_PL_TT or -EMULTIHOP. >>>> >>>> Mhm, it could be that we then have a problem and the i915 driver >>>> only sees NULL->TT directly. But I really don't know the i915 driver >>>> code good enough to judge that. >>>> >>>> Can you take a look at this and test it maybe? >>> >>> I'll grab a machine and try to see what is going on here. >> >> Well at least the issue with the firmware not loading looks to be >> fixed now. >> >> So running some eviction + oom tests it looks it now does: >> >> /* eviction kicks in */ >> i915_ttm_move(bo): LMEM -> PL_SYSTEM >> >> /* shrinker/oom kicks in at some point */ >> i915_ttm_shrink(bo): >> bo->resource = NULL, /* pipeline_gutting */ >> shmem ttm_tt is unpopulated and pages are correctly swapped out >> >> /* user touches the same object later */ >> i915_ttm_move(bo): NULL -> LMEM, bo_move_null() >> >> So seems to incorrectly skip swapping it back in and then copy over to >> lmem. It just allocates directly in lmem. >> >> And previously the last two steps would have been: >> >> i915_ttm_shrink(bo): >> bo->resource = PL_SYSTEM, /* pipeline_gutting */ >> shmem ttm_tt is unpopulated and pages are correctly swapped out >> >> i915_ttm_move(bo): >> PL_SYSTEM -> LMEM, >> ttm_tt is repopulated and pages are copied over to lmem > > Mhm, crap. That indeed looks like it won't work. > > How about changing the code like this: > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c > b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c > index c420d1ab605b..1ee7d81ddcbc 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c > @@ -560,7 +560,17 @@ int i915_ttm_move(struct ttm_buffer_object *bo, > bool evict, > bool clear; > int ret; > > - if (GEM_WARN_ON(!obj) || !bo->resource) { > + if (GEM_WARN_ON(!obj)) { > + ttm_bo_move_null(bo, dst_mem); > + return 0; > + } > + > + if (!bo->resource) { > + if (dst_mem->mem_type != TTM_PL_SYSTEM) { > + hop->mem_type = TTM_PL_SYSTEM; > + hop->flags = TTM_PL_FLAG_TEMPORARY; > + return -EMULTIHOP; > + } > ttm_bo_move_null(bo, dst_mem); > return 0; > } > > That should result in allocating a TTM_PL_SYSTEM resource first and then > moving from that to the final destination. Ok, I played around with this some more. The final diff looks like: https://patchwork.freedesktop.org/patch/500786/?series=108027&rev=1 It looks like we had some more places where bo->resource == NULL didn't end well. It at least now seems to survive my local testing. > > Thanks, > Christian. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-08-31 13:34 ` Christian König 2022-08-31 14:53 ` Matthew Auld @ 2022-09-01 17:48 ` Thomas Hellström 1 sibling, 0 replies; 24+ messages in thread From: Thomas Hellström @ 2022-09-01 17:48 UTC (permalink / raw) To: Christian König, Matthew Auld, dri-devel, Intel Graphics Cc: Luben Tuikov On Wed, 2022-08-31 at 15:34 +0200, Christian König wrote: > Am 31.08.22 um 14:50 schrieb Matthew Auld: > > On 31/08/2022 13:35, Christian König wrote: > > > Am 31.08.22 um 14:06 schrieb Matthew Auld: > > > > On 31/08/2022 12:03, Christian König wrote: > > > > > Am 31.08.22 um 12:37 schrieb Matthew Auld: > > > > > > [SNIP] > > > > > > > > > > > > > > > > That hopefully just leaves i915_ttm_shrink(), which is > > > > > > > > swapping > > > > > > > > out shmem ttm_tt and is calling ttm_bo_validate() with > > > > > > > > empty > > > > > > > > placements to force the pipeline-gutting path, which > > > > > > > > importantly > > > > > > > > unpopulates the ttm_tt for us (since ttm_tt_unpopulate > > > > > > > > is not > > > > > > > > exported it seems). But AFAICT it looks like that will > > > > > > > > now also > > > > > > > > nuke the bo->resource, instead of just leaving it in > > > > > > > > system > > > > > > > > memory. My assumption is that when later calling > > > > > > > > ttm_bo_validate(), it will just do the bo_move_null() > > > > > > > > in > > > > > > > > i915_ttm_move(), instead of re-populating the ttm_tt > > > > > > > > and then > > > > > > > > potentially copying it back to local-memory? > > > > > > > > > > > > > > Well you do ttm_bo_validate() with something like GTT > > > > > > > domain, > > > > > > > don't you? This should result in re-populating the tt > > > > > > > object, but > > > > > > > I'm not 100% sure if that really works as expected. > > > > > > > > > > > > AFAIK for domains we either have system memory (which uses > > > > > > ttm_tt > > > > > > and might be shmem underneath) or local-memory. But perhaps > > > > > > i915 > > > > > > is doing something wrong here, or abusing TTM in some way. > > > > > > I'm not > > > > > > sure tbh. > > > > > > > > > > > > Anyway, I think we have two cases here: > > > > > > > > > > > > - We have some system memory only object. After doing > > > > > > i915_ttm_shrink(), bo->resource is now NULL. We then call > > > > > > ttm_bo_validate() at some later point, but here we don't > > > > > > need to > > > > > > copy anything, but it also looks like > > > > > > ttm_bo_handle_move_mem() > > > > > > won't populate the ttm_tt or us either, since mem_type == > > > > > > TTM_PL_SYSTEM. It looks like i915_ttm_move() was taking > > > > > > care of > > > > > > this, but now we just call ttm_bo_move_null(). > > > > > > > > > > > > - We have a local-memory only object, which was evicted to > > > > > > shmem, > > > > > > and then swapped out by the shrinker like above. The bo- > > > > > > >resource > > > > > > is NULL. However this time when calling ttm_bo_validate() > > > > > > we need > > > > > > to actually do a copy in i915_ttm_move(), as well as re- > > > > > > populate > > > > > > the ttm_tt. i915_ttm_move() was taking care of this, but > > > > > > now we > > > > > > just call ttm_bo_move_null(). > > > > > > > > > > > > Perhaps i915 is doing something wrong in the above two > > > > > > cases? > > > > > > > > > > Mhm, as far as I can see that should still work. > > > > > > > > > > See previously you should got a transition from SYSTEM->GTT > > > > > in > > > > > i915_ttm_move() to re-create your backing store. Not you get > > > > > NULL->SYSTEM which is handled by ttm_bo_move_null() and then > > > > > SYSTEM->GTT. > > > > > > > > What is GTT here in TTM world? Also I'm not seeing where there > > > > is > > > > this SYSTEM->GTT transition? Maybe I'm blind. Just to be clear, > > > > i915 > > > > is only calling ttm_bo_validate() once when acquiring the > > > > pages, and > > > > we don't call it again, unless it was evicted (and potentially > > > > swapped out). > > > > > > Well GTT means TTM_PL_TT. > > > > > > And calling it only once is perfectly fine, TTM will internally > > > see > > > that we need two hops to reach TTM_PL_TT and so does the NULL- > > > >SYSTEM > > > transition and then SYSTEM->TT. > > > > Ah interesting, so that's what the multi-hop thing does. But AFAICT > > i915 is not using either TTM_PL_TT or -EMULTIHOP. > > Mhm, it could be that we then have a problem and the i915 driver only > sees NULL->TT directly. But I really don't know the i915 driver code > good enough to judge that. > > Can you take a look at this and test it maybe? > > > > > Also what is the difference between TTM_PL_TT and TM_PL_SYSTEM? > > When > > should you use one over the other? > > TTM_PL_SYSTEM means the device is not accessing the buffer and TTM > has > the control over the backing store and can swapout/swapin as it wants > it. > > TTM_PL_TT means that the device is accessing the data (TT stands for > translation table) and so TTM can't swap the backing store in/out. > > TTM_PL_VRAM well that one is obvious. > > Thanks, > Christian. We've had a previous long discussions on this listing pros and cons, and IIRC concluded that either binding to the device from system needed some TTM fixes and documentation to be straightforward, or a driver should use the above scheme bouncing in TT. IIRC we concluded that the best thing for i915 would be to transition and introduce a dummy TT region and obey the scheme outlined above by Christian. We unfortunately never gotten around to that, though, due to other work got prioritized. Also need to solve the limbo (not populated) -> vram transition without populating when moving to TT.... Originaly TT was intended for GGTT-like and AGP apertures that needed cpu-mapping to the PCI address. Using it like Christian outlines helps avoid special casing for swapout. Devices that bind to System memory needs the swap notifier to unbind. /Thomas > > > > > > > > > As far as I can see that should work like it did before. > > > > > > Christian. > > > > > > > > > > > > > > > > > If you just validated to SYSTEM memory before I think the tt > > > > > object > > > > > wouldn't have been populated either. > > > > > > > > > > Regards, > > > > > Christian. > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > Christian. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I've been considering to replacing the ttm_bo_type > > > > > > > > > with a bunch > > > > > > > > > of behavior flags for a bo. I'm hoping that this will > > > > > > > > > clean > > > > > > > > > things up a bit. > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > Christian. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > caching = > > > > > > > > > > > > i915_ttm_select_tt_caching(obj); > > > > > > > > > > > > diff --git > > > > > > > > > > > > a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c > > > > > > > > > > > > b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c > > > > > > > > > > > > index 9a7e50534b84bb..c420d1ab605b6f 100644 > > > > > > > > > > > > --- > > > > > > > > > > > > a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c > > > > > > > > > > > > +++ > > > > > > > > > > > > b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c > > > > > > > > > > > > @@ -560,7 +560,7 @@ int i915_ttm_move(struct > > > > > > > > > > > > ttm_buffer_object *bo, bool evict, > > > > > > > > > > > > bool clear; > > > > > > > > > > > > int ret; > > > > > > > > > > > > - if (GEM_WARN_ON(!obj)) { > > > > > > > > > > > > + if (GEM_WARN_ON(!obj) || !bo->resource) { > > > > > > > > > > > > ttm_bo_move_null(bo, dst_mem); > > > > > > > > > > > > return 0; > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 24+ messages in thread
* [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/3] drm/i915: audit bo->resource usage (rev2) 2022-08-24 14:23 [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage Luben Tuikov ` (4 preceding siblings ...) 2022-08-30 7:33 ` [Intel-gfx] [PATCH 1/3] " Christian König @ 2022-09-01 8:43 ` Patchwork 5 siblings, 0 replies; 24+ messages in thread From: Patchwork @ 2022-09-01 8:43 UTC (permalink / raw) To: Christian König; +Cc: intel-gfx == Series Details == Series: series starting with [1/3] drm/i915: audit bo->resource usage (rev2) URL : https://patchwork.freedesktop.org/series/107680/ State : failure == Summary == Error: patch https://patchwork.freedesktop.org/api/1.0/series/107680/revisions/2/mbox/ not applied Applying: drm/i915: audit bo->resource usage error: git diff header lacks filename information when removing 1 leading pathname component (line 2) error: could not build fake ancestor hint: Use 'git am --show-current-patch=diff' to see the failed patch Patch failed at 0001 drm/i915: audit bo->resource usage When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort". ^ permalink raw reply [flat|nested] 24+ messages in thread
* [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage @ 2022-07-12 11:46 Christian König 2022-07-13 10:08 ` Matthew Auld 0 siblings, 1 reply; 24+ messages in thread From: Christian König @ 2022-07-12 11:46 UTC (permalink / raw) To: intel-gfx, dri-devel; +Cc: Christian König Make sure we can at least move and alloc TT objects without backing store. Signed-off-by: Christian König <christian.koenig@amd.com> --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 6 ++---- drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 2 +- 2 files changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 70e2ed4e99df..5449738c262f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -289,8 +289,6 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, { struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915), bdev); - struct ttm_resource_manager *man = - ttm_manager_type(bo->bdev, bo->resource->mem_type); struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); unsigned long ccs_pages = 0; enum ttm_caching caching; @@ -304,8 +302,8 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, if (!i915_tt) return NULL; - if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && - man->use_tt) + if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && bo->resource && + ttm_manager_type(bo->bdev, bo->resource->mem_type)->use_tt) page_flags |= TTM_TT_FLAG_ZERO_ALLOC; caching = i915_ttm_select_tt_caching(obj); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c index a10716f4e717..dcb838dffd7b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c @@ -490,7 +490,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict, bool clear; int ret; - if (GEM_WARN_ON(!obj)) { + if (!bo->resource || GEM_WARN_ON(!obj)) { ttm_bo_move_null(bo, dst_mem); return 0; } -- 2.25.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage 2022-07-12 11:46 [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage Christian König @ 2022-07-13 10:08 ` Matthew Auld 0 siblings, 0 replies; 24+ messages in thread From: Matthew Auld @ 2022-07-13 10:08 UTC (permalink / raw) To: Christian König Cc: Intel Graphics Development, Christian König, ML dri-devel On Tue, 12 Jul 2022 at 12:46, Christian König <ckoenig.leichtzumerken@gmail.com> wrote: > > Make sure we can at least move and alloc TT objects without backing store. > > Signed-off-by: Christian König <christian.koenig@amd.com> > --- > drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 6 ++---- > drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 2 +- > 2 files changed, 3 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > index 70e2ed4e99df..5449738c262f 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > @@ -289,8 +289,6 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, > { > struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915), > bdev); > - struct ttm_resource_manager *man = > - ttm_manager_type(bo->bdev, bo->resource->mem_type); > struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); > unsigned long ccs_pages = 0; > enum ttm_caching caching; > @@ -304,8 +302,8 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, > if (!i915_tt) > return NULL; > > - if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && > - man->use_tt) > + if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && bo->resource && > + ttm_manager_type(bo->bdev, bo->resource->mem_type)->use_tt) > page_flags |= TTM_TT_FLAG_ZERO_ALLOC; AFAICT it should be safe to make this: if (obj->flags & I915_BO_ALLOC_CPU_CLEAR) page_flags |= TTM_TT_FLAG_ZERO_ALLOC; Hopefully that fixes the igt_lmem_create_cleared_cpu subtest? > > caching = i915_ttm_select_tt_caching(obj); > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c > index a10716f4e717..dcb838dffd7b 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c > @@ -490,7 +490,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict, > bool clear; > int ret; > > - if (GEM_WARN_ON(!obj)) { > + if (!bo->resource || GEM_WARN_ON(!obj)) { IIUC in patch 3 we now nuke the bo->resource when doing the "pipeline gutting" thing, but I think i915 is (ab)using that when swapping out shmem objects (see i915_ttm_shrink), so I think here we need to somehow inspect the tt to see if something needs to be swapped in? We might also need to move it back to lmem after. Or maybe this is already handled somehow? CI should hopefully be able to confirm (gem_lmem_swapping). > ttm_bo_move_null(bo, dst_mem); > return 0; > } > -- > 2.25.1 > ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2022-09-01 17:48 UTC | newest] Thread overview: 24+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-08-24 14:23 [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage Luben Tuikov 2022-08-24 14:23 ` [Intel-gfx] [PATCH 2/3] drm/ttm: stop allocating dummy resources during BO creation Luben Tuikov 2022-08-24 14:23 ` [Intel-gfx] [PATCH 3/3] drm/ttm: stop allocating a dummy resource for pipelined gutting Luben Tuikov 2022-08-24 16:21 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [1/3] drm/i915: audit bo->resource usage Patchwork 2022-08-24 16:44 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork 2022-08-30 7:33 ` [Intel-gfx] [PATCH 1/3] " Christian König 2022-08-30 10:45 ` Matthew Auld 2022-08-31 8:16 ` Christian König 2022-08-31 9:26 ` Matthew Auld 2022-08-31 9:38 ` Christian König 2022-08-31 10:37 ` Matthew Auld 2022-08-31 11:03 ` Christian König 2022-08-31 12:06 ` Matthew Auld 2022-08-31 12:35 ` Christian König 2022-08-31 12:50 ` Matthew Auld 2022-08-31 13:34 ` Christian König 2022-08-31 14:53 ` Matthew Auld 2022-08-31 16:32 ` Matthew Auld 2022-09-01 8:00 ` Christian König 2022-09-01 12:52 ` Matthew Auld 2022-09-01 17:48 ` Thomas Hellström 2022-09-01 8:43 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/3] drm/i915: audit bo->resource usage (rev2) Patchwork -- strict thread matches above, loose matches on Subject: below -- 2022-07-12 11:46 [Intel-gfx] [PATCH 1/3] drm/i915: audit bo->resource usage Christian König 2022-07-13 10:08 ` Matthew Auld
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox