* [PATCH 0/5] drm/amdgpu/uvd: Fix UVD BO memory placement issues
@ 2026-05-19  8:21 Timur Kristóf
  2026-05-19  8:22 ` [PATCH 1/5] drm/amdgpu: Respect placement requirements in amdgpu_gtt_mgr functions Timur Kristóf
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Timur Kristóf @ 2026-05-19  8:21 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, christian.koenig, Natalie Vock,
	John Olender, Liu Leo
  Cc: Timur Kristóf

UVD 4.x and older have two requirements for CS BOs:
1. No BO may cross a 256M segment boundary
2. MSG and FB BOs must be located in the same segment as the VCPU BO
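
The two requirements above can be sketched as plain address arithmetic.
This is only an illustration, not code from the series: the helper names
are hypothetical, and addresses are assumed to be byte offsets within a
single address space where a segment is 256M (1 << 28) bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEG_SHIFT 28 /* 256M == 1 << 28 */

/* Index of the 256M segment that contains a given byte address. */
static uint64_t seg_index(uint64_t addr)
{
	return addr >> SEG_SHIFT;
}

/* Requirement 1: the byte range [start, start + size) stays in one segment. */
static bool fits_in_one_segment(uint64_t start, uint64_t size)
{
	assert(size > 0);
	/* Compare the first and the last byte, so the end is inclusive. */
	return seg_index(start) == seg_index(start + size - 1);
}

/* Requirement 2: a MSG/FB buffer shares its segment with the VCPU buffer. */
static bool shares_segment(uint64_t bo_start, uint64_t bo_size,
			   uint64_t vcpu_start)
{
	return fits_in_one_segment(bo_start, bo_size) &&
	       seg_index(bo_start) == seg_index(vcpu_start);
}
```

A buffer that ends exactly on a segment boundary still satisfies the first
check, because only the last byte it occupies is compared.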

The amdgpu_uvd code attempts to satisfy these requirements,
but it runs into several limitations:

* VCPU BO may be placed in a different segment
* VRAM allocations may cross 256M in low memory scenarios
* GTT manager doesn't respect placement requirements
* GTT allocations may cross 256M
* GTT->GTT moves are not implemented

Let's solve these issues by fixing the GTT manager,
making sure that GTT allocations are placed in 256M segments
and that VRAM allocations are moved to GTT when they cross 256M.
This series also fixes forcing MSG and FB BOs into the UVD segment
when the UVD segment isn't the first segment, which can be
the case when resizable BAR is enabled.

This series should be backported to 7.0 and 7.1 because
switching to amdgpu by default may have turned these issues
into a regression for some users.

Timur Kristóf (5):
  drm/amdgpu: Respect placement requirements in amdgpu_gtt_mgr functions
  drm/amdgpu: Use placements of 256M GART segments for SI/CIK
  drm/amdgpu/uvd: Place VCPU BO only in VRAM for UVD 4.x and older
  drm/amdgpu/uvd: Fix forcing BOs into UVD segment when it isn't at 0
  drm/amdgpu/uvd: Move BOs to GTT when we can't place them in VRAM
    correctly

 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 30 ++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c     | 57 ++++++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h     |  3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c     | 74 +++++++++++++++------
 4 files changed, 136 insertions(+), 28 deletions(-)

-- 
2.54.0




* [PATCH 1/5] drm/amdgpu: Respect placement requirements in amdgpu_gtt_mgr functions
  2026-05-19  8:21 [PATCH 0/5] drm/amdgpu/uvd: Fix UVD BO memory placement issues Timur Kristóf
@ 2026-05-19  8:22 ` Timur Kristóf
  2026-05-19  8:52   ` Christian König
  2026-05-19  8:22 ` [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK Timur Kristóf
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Timur Kristóf @ 2026-05-19  8:22 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, christian.koenig, Natalie Vock,
	John Olender, Liu Leo
  Cc: Timur Kristóf

When testing intersection and compatibility, respect
the actual placement requirements. This is a pre-requisite
for ensuring that UVD CS BOs do not cross 256M segments.
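
As an illustration of the difference between the two tests, here is a
minimal sketch on plain page ranges. The struct and function names are
hypothetical stand-ins, not the TTM types, and the GART address check
from the patch is left out: "intersects" asks whether the resource
overlaps the allowed window at all, while "compatible" asks whether it
lies entirely inside it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a placement window; lpfn == 0 means "no limit". */
struct window {
	uint64_t fpfn; /* first allowed page */
	uint64_t lpfn; /* one past the last allowed page, or 0 */
};

/* True if the resource [start, start + num_pages) overlaps the window. */
static bool range_intersects(uint64_t start, uint64_t num_pages,
			     struct window w)
{
	assert(num_pages > 0);
	if (!w.lpfn)
		return true;
	return w.fpfn < start + num_pages && w.lpfn > start;
}

/* True if the resource lies entirely inside the window. */
static bool range_compatible(uint64_t start, uint64_t num_pages,
			     struct window w)
{
	assert(num_pages > 0);
	if (!w.lpfn)
		return true;
	return start >= w.fpfn && start + num_pages <= w.lpfn;
}
```

A resource that straddles the upper edge of the window intersects it but
is not compatible with it, which is the case the old one-line checks
could not tell apart.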

Fixes: ded910f368a5 ("drm/amdgpu: Implement intersect/compatible functions")
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 30 +++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 02f85802f579..19b6770a877d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -272,7 +272,20 @@ static bool amdgpu_gtt_mgr_intersects(struct ttm_resource_manager *man,
 				      const struct ttm_place *place,
 				      size_t size)
 {
-	return !place->lpfn || amdgpu_gtt_mgr_has_gart_addr(res);
+	const struct drm_mm_node *const node = &to_ttm_range_mgr_node(res)->mm_nodes[0];
+	const u32 num_pages = PFN_UP(size);
+
+	if (!place->lpfn)
+		return true;
+
+	if (!amdgpu_gtt_mgr_has_gart_addr(res))
+		return false;
+
+	if (place->fpfn >= (node->start + num_pages) ||
+	    (place->lpfn && place->lpfn <= node->start))
+		return false;
+
+	return true;
 }
 
 /**
@@ -290,7 +303,20 @@ static bool amdgpu_gtt_mgr_compatible(struct ttm_resource_manager *man,
 				      const struct ttm_place *place,
 				      size_t size)
 {
-	return !place->lpfn || amdgpu_gtt_mgr_has_gart_addr(res);
+	const struct drm_mm_node *const node = &to_ttm_range_mgr_node(res)->mm_nodes[0];
+	const u32 num_pages = PFN_UP(size);
+
+	if (!place->lpfn)
+		return true;
+
+	if (!amdgpu_gtt_mgr_has_gart_addr(res))
+		return false;
+
+	if (node->start < place->fpfn ||
+	    (place->lpfn && (node->start + num_pages) > place->lpfn))
+		return false;
+
+	return true;
 }
 
 /**
-- 
2.54.0



* [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK
  2026-05-19  8:21 [PATCH 0/5] drm/amdgpu/uvd: Fix UVD BO memory placement issues Timur Kristóf
  2026-05-19  8:22 ` [PATCH 1/5] drm/amdgpu: Respect placement requirements in amdgpu_gtt_mgr functions Timur Kristóf
@ 2026-05-19  8:22 ` Timur Kristóf
  2026-05-19  8:54   ` Christian König
  2026-05-19  8:22 ` [PATCH 3/5] drm/amdgpu/uvd: Place VCPU BO only in VRAM for UVD 4.x and older Timur Kristóf
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Timur Kristóf @ 2026-05-19  8:22 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, christian.koenig, Natalie Vock,
	John Olender, Liu Leo
  Cc: Timur Kristóf

UVD 4.x and older require that BOs don't cross 256M segments.
We need to respect that in amdgpu_ttm_alloc_gart().
We can't move the BOs later because GTT->GTT moves are
not implemented. We also can't force all BOs to VRAM
because that becomes very problematic in low VRAM scenarios.

This fixes UVD CS BOs crossing 256M segments
when they are placed in the GART.
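
The idea of offering the allocator one placement per 256M segment,
highest segment first, can be sketched in user space as follows. This is
not the function from the patch: the names are hypothetical, 4 KiB pages
are assumed, and the loop walks down from the end of the GART so that a
GART size that is not a multiple of 256M simply yields a shorter top
segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define SEG_PAGES ((UINT64_C(1) << 28) >> SKETCH_PAGE_SHIFT)

/* Hypothetical page-frame window: [fpfn, lpfn). */
struct pfn_window {
	uint64_t fpfn;
	uint64_t lpfn;
};

/* Fill out[] with 256M windows, highest first; returns how many were filled. */
static size_t fill_segments_from_top(uint64_t gart_size,
				     struct pfn_window *out, size_t max)
{
	uint64_t end = gart_size >> SKETCH_PAGE_SHIFT; /* exclusive page index */
	size_t n = 0;

	assert(out != NULL);
	while (n < max && end > 0) {
		/* Round the last page of this window down to its segment start. */
		uint64_t start = ((end - 1) / SEG_PAGES) * SEG_PAGES;

		out[n].fpfn = start;
		out[n].lpfn = end;
		end = start;
		n++;
	}
	return n;
}
```

Because every window starts on a segment boundary and ends at or before
the next one, any allocation that fits inside a single window cannot
cross a 256M boundary.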

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4799
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 56 ++++++++++++++++++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  3 ++
 2 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 6c6ab4dd6ea9..a106c7e77e26 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -959,6 +959,40 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev,
 	return 0;
 }
 
+/**
+ * amdgpu_ttm_fill_gart_256M_placements() - Fill placements array with 256M GART segments
+ *
+ * @bo: TTM buffer objects whose placements should be filled
+ * @placements: Pointer to an array of placements
+ * @max_placements: Size of the placements array
+ *
+ * Fill the specified placements array with 256M GART segments,
+ * starting from the highest address in order to reduce the
+ * contention of the lowest segment.
+ *
+ * Returns the number of placements filled.
+ */
+u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
+					 struct ttm_place *placements,
+					 u32 max_placements)
+{
+	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
+	u32 i;
+
+	/* Fill the placements array with 256M segments, starting from highest. */
+	for (i = 0; i < max_placements; ++i) {
+		if (i * SZ_256M >= adev->gmc.gart_size)
+			break;
+
+		placements[i].lpfn = (adev->gmc.gart_size - i * SZ_256M) >> PAGE_SHIFT;
+		placements[i].fpfn = ALIGN_DOWN(placements[i].lpfn - 1, SZ_256M >> PAGE_SHIFT);
+		placements[i].mem_type = TTM_PL_TT;
+		placements[i].flags = bo->resource->placement;
+	}
+
+	return i;
+}
+
 /*
  * amdgpu_ttm_alloc_gart - Make sure buffer object is accessible either
  * through AGP or GART aperture.
@@ -973,7 +1007,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
 	struct ttm_operation_ctx ctx = { false, false };
 	struct amdgpu_ttm_tt *gtt = ttm_to_amdgpu_ttm_tt(bo->ttm);
 	struct ttm_placement placement;
-	struct ttm_place placements;
+	struct ttm_place placements[AMDGPU_BO_MAX_PLACEMENTS];
 	struct ttm_resource *tmp;
 	uint64_t addr, flags;
 	int r;
@@ -987,11 +1021,21 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
 
 	/* allocate GART space */
 	placement.num_placement = 1;
-	placement.placement = &placements;
-	placements.fpfn = 0;
-	placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
-	placements.mem_type = TTM_PL_TT;
-	placements.flags = bo->resource->placement;
+	placement.placement = &placements[0];
+	placements[0].fpfn = 0;
+	placements[0].lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
+	placements[0].mem_type = TTM_PL_TT;
+	placements[0].flags = bo->resource->placement;
+
+	/*
+	 * UVD 4.x and older require that BOs don't cross 256M segments.
+	 * We need to respect that here. We can't move the BO later
+	 * because GTT->GTT moves are not implemented.
+	 */
+	if (bo->base.size < SZ_256M && adev->family <= AMDGPU_FAMILY_KV)
+		placement.num_placement =
+			amdgpu_ttm_fill_gart_256M_placements(bo, placements,
+							     ARRAY_SIZE(placements));
 
 	r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
 	if (unlikely(r))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 2d72fa217274..e9de628c8d2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -202,6 +202,9 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_ttm_buffer_entity *entity,
 			    u64 k_job_id);
 struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct amdgpu_device *adev);
 
+u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
+					 struct ttm_place *placements,
+					 u32 max_placements);
 int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
 void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
 uint64_t amdgpu_ttm_domain_start(struct amdgpu_device *adev, uint32_t type);
-- 
2.54.0



* [PATCH 3/5] drm/amdgpu/uvd: Place VCPU BO only in VRAM for UVD 4.x and older
  2026-05-19  8:21 [PATCH 0/5] drm/amdgpu/uvd: Fix UVD BO memory placement issues Timur Kristóf
  2026-05-19  8:22 ` [PATCH 1/5] drm/amdgpu: Respect placement requirements in amdgpu_gtt_mgr functions Timur Kristóf
  2026-05-19  8:22 ` [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK Timur Kristóf
@ 2026-05-19  8:22 ` Timur Kristóf
  2026-05-19  8:56   ` Christian König
  2026-05-19  8:22 ` [PATCH 4/5] drm/amdgpu/uvd: Fix forcing BOs into UVD segment when it isn't at 0 Timur Kristóf
  2026-05-19  8:22 ` [PATCH 5/5] drm/amdgpu/uvd: Move BOs to GTT when we can't place them in VRAM correctly Timur Kristóf
  4 siblings, 1 reply; 14+ messages in thread
From: Timur Kristóf @ 2026-05-19  8:22 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, christian.koenig, Natalie Vock,
	John Olender, Liu Leo
  Cc: Timur Kristóf

These UVD versions don't fully support GPUVM and are only
validated to work when their VCPU BO is placed in VRAM.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 3a3bc0d370fa..1e59ca924abe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -188,6 +188,7 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
 	const struct common_firmware_header *hdr;
 	unsigned int family_id;
 	int i, j, r;
+	u32 vcpu_bo_domain;
 
 	INIT_DELAYED_WORK(&adev->uvd.idle_work, amdgpu_uvd_idle_work_handler);
 
@@ -319,12 +320,20 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
 	if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP)
 		bo_size += AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8);
 
+	/* UVD 5.0 and newer HW can use 64 bit addressing. */
+	adev->uvd.address_64_bit =
+		!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 0);
+
+	vcpu_bo_domain = AMDGPU_GEM_DOMAIN_VRAM;
+	if (adev->uvd.address_64_bit)
+		vcpu_bo_domain |= AMDGPU_GEM_DOMAIN_GTT;
+
 	for (j = 0; j < adev->uvd.num_uvd_inst; j++) {
 		if (adev->uvd.harvest_config & (1 << j))
 			continue;
+
 		r = amdgpu_bo_create_kernel(adev, bo_size, PAGE_SIZE,
-					    AMDGPU_GEM_DOMAIN_VRAM |
-					    AMDGPU_GEM_DOMAIN_GTT,
+					    vcpu_bo_domain,
 					    &adev->uvd.inst[j].vcpu_bo,
 					    &adev->uvd.inst[j].gpu_addr,
 					    &adev->uvd.inst[j].cpu_addr);
@@ -339,10 +348,6 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
 		adev->uvd.filp[i] = NULL;
 	}
 
-	/* from uvd v5.0 HW addressing capacity increased to 64 bits */
-	if (!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 0))
-		adev->uvd.address_64_bit = true;
-
 	r = amdgpu_uvd_create_msg_bo_helper(adev, 128 << 10, &adev->uvd.ib_bo);
 	if (r)
 		return r;
-- 
2.54.0



* [PATCH 4/5] drm/amdgpu/uvd: Fix forcing BOs into UVD segment when it isn't at 0
  2026-05-19  8:21 [PATCH 0/5] drm/amdgpu/uvd: Fix UVD BO memory placement issues Timur Kristóf
                   ` (2 preceding siblings ...)
  2026-05-19  8:22 ` [PATCH 3/5] drm/amdgpu/uvd: Place VCPU BO only in VRAM for UVD 4.x and older Timur Kristóf
@ 2026-05-19  8:22 ` Timur Kristóf
  2026-05-19  9:06   ` Christian König
  2026-05-19  8:22 ` [PATCH 5/5] drm/amdgpu/uvd: Move BOs to GTT when we can't place them in VRAM correctly Timur Kristóf
  4 siblings, 1 reply; 14+ messages in thread
From: Timur Kristóf @ 2026-05-19  8:22 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, christian.koenig, Natalie Vock,
	John Olender, Liu Leo
  Cc: Timur Kristóf

UVD 4.x and older can only access FB and MSG buffers from a
specific 256M VRAM segment that the VCPU BO is also located in.
We already modify all placements of the given BO to ensure
the BO is placed within this segment.

Previously, amdgpu_uvd_force_into_uvd_segment() always assumed
that the UVD segment is the first 256M of VRAM. Under some
conditions, however, the VCPU BO could be allocated outside
that segment. UVD was then non-functional because the FB and MSG
BOs were not in the same segment as the UVD VCPU BO.

Solve that by using the segment where the VCPU BO actually is.

This fixes an issue with UVD failing to initialize on SI/CIK
when resizable BAR is enabled and the VCPU BO is allocated
in a different segment.
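
The window computation can be sketched as follows. This is only an
illustration with hypothetical names: it assumes 4 KiB pages and takes
the VCPU BO location as a byte offset within VRAM, then derives the
page-frame window of the 256M segment that contains it.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define SEG_BYTES (UINT64_C(1) << 28)
#define SEG_PAGES (SEG_BYTES >> SKETCH_PAGE_SHIFT)

/* Hypothetical page-frame window: [fpfn, lpfn). */
struct pfn_window {
	uint64_t fpfn;
	uint64_t lpfn;
};

/* Window of the 256M segment that contains the VCPU BO. */
static struct pfn_window uvd_segment_window(uint64_t vcpu_offset)
{
	struct pfn_window w;

	/* Align the byte offset down to 256M, then convert to pages. */
	w.fpfn = (vcpu_offset & ~(SEG_BYTES - 1)) >> SKETCH_PAGE_SHIFT;
	w.lpfn = w.fpfn + SEG_PAGES;
	assert(w.fpfn % SEG_PAGES == 0);
	return w;
}
```

With the VCPU BO in the first segment this gives the same window as the
old hard-coded 0 to 256M range, so only the cases where the BO sits
elsewhere change.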

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/3851
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 36 +++++++++++++++----------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 1e59ca924abe..993957927782 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -550,16 +550,29 @@ void amdgpu_uvd_free_handles(struct amdgpu_device *adev, struct drm_file *filp)
 	}
 }
 
+/**
+ * amdgpu_uvd_force_into_uvd_segment() - Forces placement of a BO into the UVD segment
+ *
+ * @abo: buffer object whose placement is forced
+ *
+ * UVD 4.x and older can only access FB and MSG buffers from a specific 256M VRAM segment
+ * that the VCPU BO is also located in. Force the BO into that segment.
+ */
 static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo)
 {
-	int i;
+	struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
+	struct amdgpu_bo *vcpu_bo = adev->uvd.inst[0].vcpu_bo;
+	struct amdgpu_res_cursor vcpu_cur;
 
-	for (i = 0; i < abo->placement.num_placement; ++i) {
-		abo->placements[i].fpfn = 0 >> PAGE_SHIFT;
-		abo->placements[i].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;
-		if (abo->placements[i].mem_type == TTM_PL_VRAM)
-			abo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS;
-	}
+	amdgpu_res_first(vcpu_bo->tbo.resource, 0, amdgpu_bo_size(vcpu_bo), &vcpu_cur);
+
+	abo->placement.num_placement = 1;
+	abo->placements[0].fpfn = ALIGN_DOWN(vcpu_cur.start, SZ_256M) >> PAGE_SHIFT;
+	abo->placements[0].lpfn = abo->placements[0].fpfn + (SZ_256M >> PAGE_SHIFT);
+	abo->placements[0].mem_type = adev->uvd.inst[0].vcpu_bo->tbo.resource->mem_type;
+
+	if (abo->placements[0].mem_type == TTM_PL_VRAM)
+		abo->placements[0].flags |= TTM_PL_FLAG_CONTIGUOUS;
 }
 
 static u64 amdgpu_uvd_get_addr_from_ctx(struct amdgpu_uvd_cs_ctx *ctx)
@@ -600,13 +613,8 @@ static int amdgpu_uvd_cs_pass1(struct amdgpu_uvd_cs_ctx *ctx)
 	if (!ctx->parser->adev->uvd.address_64_bit) {
 		/* check if it's a message or feedback command */
 		cmd = amdgpu_ib_get_value(ctx->ib, ctx->idx) >> 1;
-		if (cmd == 0x0 || cmd == 0x3) {
-			/* yes, force it into VRAM */
-			uint32_t domain = AMDGPU_GEM_DOMAIN_VRAM;
-
-			amdgpu_bo_placement_from_domain(bo, domain);
-		}
-		amdgpu_uvd_force_into_uvd_segment(bo);
+		if (cmd == 0x0 || cmd == 0x3)
+			amdgpu_uvd_force_into_uvd_segment(bo);
 
 		r = ttm_bo_validate(&bo->tbo, &bo->placement, &tctx);
 	}
-- 
2.54.0



* [PATCH 5/5] drm/amdgpu/uvd: Move BOs to GTT when we can't place them in VRAM correctly
  2026-05-19  8:21 [PATCH 0/5] drm/amdgpu/uvd: Fix UVD BO memory placement issues Timur Kristóf
                   ` (3 preceding siblings ...)
  2026-05-19  8:22 ` [PATCH 4/5] drm/amdgpu/uvd: Fix forcing BOs into UVD segment when it isn't at 0 Timur Kristóf
@ 2026-05-19  8:22 ` Timur Kristóf
  4 siblings, 0 replies; 14+ messages in thread
From: Timur Kristóf @ 2026-05-19  8:22 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, christian.koenig, Natalie Vock,
	John Olender, Liu Leo
  Cc: Timur Kristóf

When VRAM is nearly full, the buddy allocator makes tradeoffs
and may place a BO so that it crosses a 256M segment boundary.

Move the BO to GTT when this is detected.
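
The decision taken after validation can be sketched as a small pure
function. The enum and function names are hypothetical and this is not
the driver code. Note one deliberate difference from the check in the
diff below: the sketch compares the segment of the last byte
(offset + size - 1), so a BO that ends exactly on a 256M boundary is not
treated as crossing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of the post-validation check. */
enum uvd_action {
	UVD_KEEP,        /* placement is fine as it is */
	UVD_MOVE_TO_GTT, /* retry with 256M GART segment placements */
	UVD_FAIL_NOMEM,  /* no acceptable placement is available */
};

static enum uvd_action uvd_after_validate(uint64_t offset, uint64_t size,
					  bool is_msg_or_fb, bool in_vram)
{
	assert(size > 0);

	/* First and last byte in the same 256M segment: nothing to do. */
	if ((offset >> 28) == ((offset + size - 1) >> 28))
		return UVD_KEEP;

	/* MSG/FB BOs must stay in the UVD segment, so they cannot fall back. */
	if (is_msg_or_fb)
		return UVD_FAIL_NOMEM;

	/* A BO already in GTT cannot be moved within GTT. */
	if (!in_vram)
		return UVD_FAIL_NOMEM;

	return UVD_MOVE_TO_GTT;
}
```

Only BOs that are currently in VRAM and are neither MSG nor FB buffers
take the fallback path; every other crossing case is reported as an
out-of-memory condition.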

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4800
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index a106c7e77e26..fb49bd53bd00 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -977,6 +977,7 @@ u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
 					 u32 max_placements)
 {
 	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
+	const u64 sz = adev->gmc.gart_size;
 	u32 i;
 
 	/* Fill the placements array with 256M segments, starting from highest. */
@@ -984,7 +985,7 @@ u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
 		if (i * SZ_256M >= adev->gmc.gart_size)
 			break;
 
-		placements[i].lpfn = (adev->gmc.gart_size - i * SZ_256M) >> PAGE_SHIFT;
+		placements[i].lpfn = MIN(ALIGN(sz, SZ_256M) - i * SZ_256M, sz) >> PAGE_SHIFT;
 		placements[i].fpfn = ALIGN_DOWN(placements[i].lpfn - 1, SZ_256M >> PAGE_SHIFT);
 		placements[i].mem_type = TTM_PL_TT;
 		placements[i].flags = bo->resource->placement;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 993957927782..53f810c2a5fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -617,6 +617,27 @@ static int amdgpu_uvd_cs_pass1(struct amdgpu_uvd_cs_ctx *ctx)
 			amdgpu_uvd_force_into_uvd_segment(bo);
 
 		r = ttm_bo_validate(&bo->tbo, &bo->placement, &tctx);
+		if (r)
+			return r;
+
+		/* Check if the BO placement crosses a 256M segment. */
+		if ((amdgpu_bo_gpu_offset(bo) >> 28) !=
+		    ((amdgpu_bo_gpu_offset(bo) + amdgpu_bo_size(bo)) >> 28)) {
+			/* There is not enough memory for correct placement of FB/MSG BOs. */
+			if (cmd == 0x0 || cmd == 0x3)
+				return -ENOMEM;
+
+			/* GTT->GTT moves are not implemented yet. */
+			if (bo->tbo.resource->mem_type != TTM_PL_VRAM)
+				return -ENOMEM;
+
+			/* Try to move the BO from VRAM to GART into a 256M segment. */
+			amdgpu_ttm_fill_gart_256M_placements(&bo->tbo,
+							     bo->placements,
+							     ARRAY_SIZE(bo->placements));
+
+			r = ttm_bo_validate(&bo->tbo, &bo->placement, &tctx);
+		}
 	}
 
 	return r;
-- 
2.54.0



* Re: [PATCH 1/5] drm/amdgpu: Respect placement requirements in amdgpu_gtt_mgr functions
  2026-05-19  8:22 ` [PATCH 1/5] drm/amdgpu: Respect placement requirements in amdgpu_gtt_mgr functions Timur Kristóf
@ 2026-05-19  8:52   ` Christian König
  0 siblings, 0 replies; 14+ messages in thread
From: Christian König @ 2026-05-19  8:52 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Natalie Vock,
	John Olender, Liu Leo

On 5/19/26 10:22, Timur Kristóf wrote:
> When testing intersection and compatibility, respect
> the actual placement requirements. This is a pre-requisite
> for ensuring that UVD CS BOs do not cross 256M segments.
> 
> Fixes: ded910f368a5 ("drm/amdgpu: Implement intersect/compatible functions")
> Suggested-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 30 +++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> index 02f85802f579..19b6770a877d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> @@ -272,7 +272,20 @@ static bool amdgpu_gtt_mgr_intersects(struct ttm_resource_manager *man,
>  				      const struct ttm_place *place,
>  				      size_t size)
>  {
> -	return !place->lpfn || amdgpu_gtt_mgr_has_gart_addr(res);
> +	const struct drm_mm_node *const node = &to_ttm_range_mgr_node(res)->mm_nodes[0];
> +	const u32 num_pages = PFN_UP(size);
> +
> +	if (!place->lpfn)
> +		return true;
> +
> +	if (!amdgpu_gtt_mgr_has_gart_addr(res))
> +		return false;
> +
> +	if (place->fpfn >= (node->start + num_pages) ||
> +	    (place->lpfn && place->lpfn <= node->start))
> +		return false;
> +
> +	return true;
>  }
>  
>  /**
> @@ -290,7 +303,20 @@ static bool amdgpu_gtt_mgr_compatible(struct ttm_resource_manager *man,
>  				      const struct ttm_place *place,
>  				      size_t size)
>  {
> -	return !place->lpfn || amdgpu_gtt_mgr_has_gart_addr(res);
> +	const struct drm_mm_node *const node = &to_ttm_range_mgr_node(res)->mm_nodes[0];
> +	const u32 num_pages = PFN_UP(size);
> +
> +	if (!place->lpfn)
> +		return true;
> +
> +	if (!amdgpu_gtt_mgr_has_gart_addr(res))
> +		return false;
> +
> +	if (node->start < place->fpfn ||
> +	    (place->lpfn && (node->start + num_pages) > place->lpfn))
> +		return false;
> +
> +	return true;
>  }
>  
>  /**



* Re: [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK
  2026-05-19  8:22 ` [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK Timur Kristóf
@ 2026-05-19  8:54   ` Christian König
  2026-05-19  8:59     ` Timur Kristóf
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2026-05-19  8:54 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Natalie Vock,
	John Olender, Liu Leo



On 5/19/26 10:22, Timur Kristóf wrote:
> UVD 4.x and older require that BOs don't cross 256M segments.
> We need to respect that in amdgpu_ttm_alloc_gart().
> We can't move the BOs later because GTT->GTT moves are
> not implemented. We also can't force all BOs to VRAM
> because that becomes very problematic in low VRAM scenarios.
> 
> This fixes UVD CS BOs crossing 256M segments
> when they are placed in the GART.

Clear NAK for that approach.

This is the general TTM interface function and shouldn't have any HW generation dependent code in it.

Regards,
Christian.

> 
> Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4799
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 56 ++++++++++++++++++++++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  3 ++
>  2 files changed, 53 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 6c6ab4dd6ea9..a106c7e77e26 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -959,6 +959,40 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev,
>  	return 0;
>  }
>  
> +/**
> + * amdgpu_ttm_fill_gart_256M_placements() - Fill placements array with 256M GART segments
> + *
> + * @bo: TTM buffer objects whose placements should be filled
> + * @placements: Pointer to an array of placements
> + * @max_placements: Size of the placements array
> + *
> + * Fill the specified placements array with 256M GART segments,
> + * starting from the highest address in order to reduce the
> + * contention of the lowest segment.
> + *
> + * Returns the number of placements filled.
> + */
> +u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
> +					 struct ttm_place *placements,
> +					 u32 max_placements)
> +{
> +	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
> +	u32 i;
> +
> +	/* Fill the placements array with 256M segments, starting from highest. */
> +	for (i = 0; i < max_placements; ++i) {
> +		if (i * SZ_256M >= adev->gmc.gart_size)
> +			break;
> +
> +		placements[i].lpfn = (adev->gmc.gart_size - i * SZ_256M) >> PAGE_SHIFT;
> +		placements[i].fpfn = ALIGN_DOWN(placements[i].lpfn - 1, SZ_256M >> PAGE_SHIFT);
> +		placements[i].mem_type = TTM_PL_TT;
> +		placements[i].flags = bo->resource->placement;
> +	}
> +
> +	return i;
> +}
> +
>  /*
>   * amdgpu_ttm_alloc_gart - Make sure buffer object is accessible either
>   * through AGP or GART aperture.
> @@ -973,7 +1007,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
>  	struct ttm_operation_ctx ctx = { false, false };
>  	struct amdgpu_ttm_tt *gtt = ttm_to_amdgpu_ttm_tt(bo->ttm);
>  	struct ttm_placement placement;
> -	struct ttm_place placements;
> +	struct ttm_place placements[AMDGPU_BO_MAX_PLACEMENTS];
>  	struct ttm_resource *tmp;
>  	uint64_t addr, flags;
>  	int r;
> @@ -987,11 +1021,21 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
>  
>  	/* allocate GART space */
>  	placement.num_placement = 1;
> -	placement.placement = &placements;
> -	placements.fpfn = 0;
> -	placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
> -	placements.mem_type = TTM_PL_TT;
> -	placements.flags = bo->resource->placement;
> +	placement.placement = &placements[0];
> +	placements[0].fpfn = 0;
> +	placements[0].lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
> +	placements[0].mem_type = TTM_PL_TT;
> +	placements[0].flags = bo->resource->placement;
> +
> +	/*
> +	 * UVD 4.x and older require that BOs don't cross 256M segments.
> +	 * We need to respect that here. We can't move the BO later
> +	 * because GTT->GTT moves are not implemented.
> +	 */
> +	if (bo->base.size < SZ_256M && adev->family <= AMDGPU_FAMILY_KV)
> +		placement.num_placement =
> +			amdgpu_ttm_fill_gart_256M_placements(bo, placements,
> +							     ARRAY_SIZE(placements));
>  
>  	r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
>  	if (unlikely(r))
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index 2d72fa217274..e9de628c8d2d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -202,6 +202,9 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_ttm_buffer_entity *entity,
>  			    u64 k_job_id);
>  struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct amdgpu_device *adev);
>  
> +u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
> +					 struct ttm_place *placements,
> +					 u32 max_placements);
>  int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
>  void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
>  uint64_t amdgpu_ttm_domain_start(struct amdgpu_device *adev, uint32_t type);



* Re: [PATCH 3/5] drm/amdgpu/uvd: Place VCPU BO only in VRAM for UVD 4.x and older
  2026-05-19  8:22 ` [PATCH 3/5] drm/amdgpu/uvd: Place VCPU BO only in VRAM for UVD 4.x and older Timur Kristóf
@ 2026-05-19  8:56   ` Christian König
  0 siblings, 0 replies; 14+ messages in thread
From: Christian König @ 2026-05-19  8:56 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Natalie Vock,
	John Olender, Liu Leo

On 5/19/26 10:22, Timur Kristóf wrote:
> These UVD versions don't fully support GPUVM and are only
> validated to work when their VCPU BO is placed in VRAM.
> 
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> index 3a3bc0d370fa..1e59ca924abe 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> @@ -188,6 +188,7 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
>  	const struct common_firmware_header *hdr;
>  	unsigned int family_id;
>  	int i, j, r;
> +	u32 vcpu_bo_domain;
>  
>  	INIT_DELAYED_WORK(&adev->uvd.idle_work, amdgpu_uvd_idle_work_handler);
>  
> @@ -319,12 +320,20 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
>  	if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP)
>  		bo_size += AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8);
>  
> +	/* UVD 5.0 and newer HW can use 64 bit addressing. */
> +	adev->uvd.address_64_bit =
> +		!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 0);
> +
> +	vcpu_bo_domain = AMDGPU_GEM_DOMAIN_VRAM;
> +	if (adev->uvd.address_64_bit)
> +		vcpu_bo_domain |= AMDGPU_GEM_DOMAIN_GTT;
> +
>  	for (j = 0; j < adev->uvd.num_uvd_inst; j++) {
>  		if (adev->uvd.harvest_config & (1 << j))
>  			continue;
> +
>  		r = amdgpu_bo_create_kernel(adev, bo_size, PAGE_SIZE,
> -					    AMDGPU_GEM_DOMAIN_VRAM |
> -					    AMDGPU_GEM_DOMAIN_GTT,
> +					    vcpu_bo_domain,
>  					    &adev->uvd.inst[j].vcpu_bo,
>  					    &adev->uvd.inst[j].gpu_addr,
>  					    &adev->uvd.inst[j].cpu_addr);
> @@ -339,10 +348,6 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
>  		adev->uvd.filp[i] = NULL;
>  	}
>  
> -	/* from uvd v5.0 HW addressing capacity increased to 64 bits */
> -	if (!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 0))
> -		adev->uvd.address_64_bit = true;
> -
>  	r = amdgpu_uvd_create_msg_bo_helper(adev, 128 << 10, &adev->uvd.ib_bo);
>  	if (r)
>  		return r;



* Re: [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK
  2026-05-19  8:54   ` Christian König
@ 2026-05-19  8:59     ` Timur Kristóf
  2026-05-19  9:01       ` Christian König
  0 siblings, 1 reply; 14+ messages in thread
From: Timur Kristóf @ 2026-05-19  8:59 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Natalie Vock, John Olender, Liu Leo,
	Christian König

On Tuesday, May 19, 2026 10:54:10 AM Central European Summer Time Christian König wrote:
> On 5/19/26 10:22, Timur Kristóf wrote:
> > UVD 4.x and older require that BOs don't cross 256M segments.
> > We need to respect that in amdgpu_ttm_alloc_gart().
> > We can't move the BOs later because GTT->GTT moves are
> > not implemented. We also can't force all BOs to VRAM
> > because that becomes very problematic in low VRAM scenarios.
> > 
> > This fixes UVD CS BOs crossing 256M segments
> > when they are placed in the GART.
> 
> Clear NAK for that approach.
> 
> This is the general TTM interface function and shouldn't have any HW
> generation dependent code in it.

I don't see how else to solve this, since GTT->GTT moves are not implemented,
so we can't move the BO to a suitable address later. We also can't move it to 
VRAM.

Please suggest a better approach if you don't like this one.
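
For reference, the constraint under discussion (a BO must not cross a 256M
segment) can be stated as a comparison of the segment index of the first and
last byte of the BO. The following is a minimal standalone sketch, not kernel
code; crosses_256m_segment() and SEGMENT_SHIFT are hypothetical names chosen
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_SHIFT 28u /* 1 << 28 == 256M */

/*
 * Hypothetical helper: returns true if the byte range [start, start + size)
 * touches more than one 256M segment.
 */
static bool crosses_256m_segment(uint64_t start, uint64_t size)
{
	assert(size > 0);

	/* Compare the segment index of the first and of the last byte. */
	return (start >> SEGMENT_SHIFT) != ((start + size - 1) >> SEGMENT_SHIFT);
}
```

With this check, an 8K range that starts 4K below a 256M boundary is reported
as crossing, while a range that starts exactly on the boundary is not.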


> 
> Regards,
> Christian.
> 
> > Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4799
> > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> > ---
> > 
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 56 ++++++++++++++++++++++---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  3 ++
> >  2 files changed, 53 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > index 6c6ab4dd6ea9..a106c7e77e26 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > @@ -959,6 +959,40 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev,
> >  	return 0;
> >  
> >  }
> > 
> > +/**
> > + * amdgpu_ttm_fill_gart_256M_placements() - Fill placements array with 256M GART segments
> > + *
> > + * @bo: TTM buffer objects whose placements should be filled
> > + * @placements: Pointer to an array of placements
> > + * @max_placements: Size of the placements array
> > + *
> > + * Fill the specified placements array with 256M GART segments,
> > + * starting from the highest address in order to reduce the
> > + * contention of the lowest segment.
> > + *
> > + * Returns the number of placements filled.
> > + */
> > +u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
> > +					 struct ttm_place *placements,
> > +					 u32 max_placements)
> > +{
> > +	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
> > +	u32 i;
> > +
> > +	/* Fill the placements array with 256M segments, starting from highest. */
> > +	for (i = 0; i < max_placements; ++i) {
> > +		if (i * SZ_256M >= adev->gmc.gart_size)
> > +			break;
> > +
> > +		placements[i].lpfn = (adev->gmc.gart_size - i * SZ_256M) >> PAGE_SHIFT;
> > +		placements[i].fpfn = ALIGN_DOWN(placements[i].lpfn - 1, SZ_256M >> PAGE_SHIFT);
> > +		placements[i].mem_type = TTM_PL_TT;
> > +		placements[i].flags = bo->resource->placement;
> > +	}
> > +
> > +	return i;
> > +}
> > +
> > 
> >  /*
> >  
> >   * amdgpu_ttm_alloc_gart - Make sure buffer object is accessible either
> >   * through AGP or GART aperture.
> > 
> > @@ -973,7 +1007,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
> >  	struct ttm_operation_ctx ctx = { false, false };
> >  	struct amdgpu_ttm_tt *gtt = ttm_to_amdgpu_ttm_tt(bo->ttm);
> >  	struct ttm_placement placement;
> > 
> > -	struct ttm_place placements;
> > +	struct ttm_place placements[AMDGPU_BO_MAX_PLACEMENTS];
> > 
> >  	struct ttm_resource *tmp;
> >  	uint64_t addr, flags;
> >  	int r;
> > 
> > @@ -987,11 +1021,21 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
> >  	/* allocate GART space */
> >  	placement.num_placement = 1;
> > 
> > -	placement.placement = &placements;
> > -	placements.fpfn = 0;
> > -	placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
> > -	placements.mem_type = TTM_PL_TT;
> > -	placements.flags = bo->resource->placement;
> > +	placement.placement = &placements[0];
> > +	placements[0].fpfn = 0;
> > +	placements[0].lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
> > +	placements[0].mem_type = TTM_PL_TT;
> > +	placements[0].flags = bo->resource->placement;
> > +
> > +	/*
> > +	 * UVD 4.x and older require that BOs don't cross 256M segments.
> > +	 * We need to respect that here. We can't move the BO later
> > +	 * because GTT->GTT moves are not implemented.
> > +	 */
> > +	if (bo->base.size < SZ_256M && adev->family <= AMDGPU_FAMILY_KV)
> > +		placement.num_placement =
> > +			amdgpu_ttm_fill_gart_256M_placements(bo, placements,
> > +							     ARRAY_SIZE(placements));
> > 
> >  	r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
> >  	if (unlikely(r))
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> > index 2d72fa217274..e9de628c8d2d 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> > @@ -202,6 +202,9 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_ttm_buffer_entity *entity,
> >  			    u64 k_job_id);
> >  
> >  struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct amdgpu_device *adev);
> > +u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
> > +					 struct ttm_place *placements,
> > +					 u32 max_placements);
> > 
> >  int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
> >  void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
> >  uint64_t amdgpu_ttm_domain_start(struct amdgpu_device *adev, uint32_t type);






* Re: [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK
  2026-05-19  8:59     ` Timur Kristóf
@ 2026-05-19  9:01       ` Christian König
  2026-05-19  9:16         ` Timur Kristóf
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2026-05-19  9:01 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Natalie Vock,
	John Olender, Liu Leo

On 5/19/26 10:59, Timur Kristóf wrote:
> On Tuesday, May 19, 2026 10:54:10 AM Central European Summer Time Christian 
> König wrote:
>> On 5/19/26 10:22, Timur Kristóf wrote:
>>> UVD 4.x and older require that BOs don't cross 256M segments.
>>> We need to respect that in amdgpu_ttm_alloc_gart().
>>> We can't move the BOs later because GTT->GTT moves are
>>> not implemented. We also can't force all BOs to VRAM
>>> because that becomes very problematic in low VRAM scenarios.
>>>
>>> This fixes UVD CS BOs crossing 256M segments
>>> when they are placed in the GART.
>>
>> Clear NAK for that approach.
>>
>> This is the general TTM interface function and shouldn't have any HW
>> generation dependent code in it.
> 
> I don't see how else to solve this, since GTT->GTT moves are not implemented,
> so we can't move the BO to a suitable address later. We also can't move it to 
> VRAM.

GTT to GTT moves should be relatively easy to implement.

We just need to wait for the BO to be idle, unbind, move and bind again.

Regards,
Christian.

> 
> Please suggest a better approach if you don't like this one.
> 
> 
>>
>> Regards,
>> Christian.
>>
>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4799
>>> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
>>> ---
>>>
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 56 ++++++++++++++++++++++---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  3 ++
>>>  2 files changed, 53 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> index 6c6ab4dd6ea9..a106c7e77e26 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> @@ -959,6 +959,40 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev,
>>>  	return 0;
>>>  
>>>  }
>>>
>>> +/**
>>> + * amdgpu_ttm_fill_gart_256M_placements() - Fill placements array with 256M GART segments
>>> + *
>>> + * @bo: TTM buffer objects whose placements should be filled
>>> + * @placements: Pointer to an array of placements
>>> + * @max_placements: Size of the placements array
>>> + *
>>> + * Fill the specified placements array with 256M GART segments,
>>> + * starting from the highest address in order to reduce the
>>> + * contention of the lowest segment.
>>> + *
>>> + * Returns the number of placements filled.
>>> + */
>>> +u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
>>> +					 struct ttm_place *placements,
>>> +					 u32 max_placements)
>>> +{
>>> +	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
>>> +	u32 i;
>>> +
>>> +	/* Fill the placements array with 256M segments, starting from highest. */
>>> +	for (i = 0; i < max_placements; ++i) {
>>> +		if (i * SZ_256M >= adev->gmc.gart_size)
>>> +			break;
>>> +
>>> +		placements[i].lpfn = (adev->gmc.gart_size - i * SZ_256M) >> PAGE_SHIFT;
>>> +		placements[i].fpfn = ALIGN_DOWN(placements[i].lpfn - 1, SZ_256M >> PAGE_SHIFT);
>>> +		placements[i].mem_type = TTM_PL_TT;
>>> +		placements[i].flags = bo->resource->placement;
>>> +	}
>>> +
>>> +	return i;
>>> +}
>>> +
>>>
>>>  /*
>>>  
>>>   * amdgpu_ttm_alloc_gart - Make sure buffer object is accessible either
>>>   * through AGP or GART aperture.
>>>
>>> @@ -973,7 +1007,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
>>>  	struct ttm_operation_ctx ctx = { false, false };
>>>  	struct amdgpu_ttm_tt *gtt = ttm_to_amdgpu_ttm_tt(bo->ttm);
>>>  	struct ttm_placement placement;
>>>
>>> -	struct ttm_place placements;
>>> +	struct ttm_place placements[AMDGPU_BO_MAX_PLACEMENTS];
>>>
>>>  	struct ttm_resource *tmp;
>>>  	uint64_t addr, flags;
>>>  	int r;
>>>
>>> @@ -987,11 +1021,21 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
>>>  	/* allocate GART space */
>>>  	placement.num_placement = 1;
>>>
>>> -	placement.placement = &placements;
>>> -	placements.fpfn = 0;
>>> -	placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
>>> -	placements.mem_type = TTM_PL_TT;
>>> -	placements.flags = bo->resource->placement;
>>> +	placement.placement = &placements[0];
>>> +	placements[0].fpfn = 0;
>>> +	placements[0].lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
>>> +	placements[0].mem_type = TTM_PL_TT;
>>> +	placements[0].flags = bo->resource->placement;
>>> +
>>> +	/*
>>> +	 * UVD 4.x and older require that BOs don't cross 256M segments.
>>> +	 * We need to respect that here. We can't move the BO later
>>> +	 * because GTT->GTT moves are not implemented.
>>> +	 */
>>> +	if (bo->base.size < SZ_256M && adev->family <= AMDGPU_FAMILY_KV)
>>> +		placement.num_placement =
>>> +			amdgpu_ttm_fill_gart_256M_placements(bo, placements,
>>> +							     ARRAY_SIZE(placements));
>>>
>>>  	r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
>>>  	if (unlikely(r))
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> index 2d72fa217274..e9de628c8d2d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> @@ -202,6 +202,9 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_ttm_buffer_entity *entity,
>>>  			    u64 k_job_id);
>>>  
>>>  struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct amdgpu_device *adev);
>>> +u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
>>> +					 struct ttm_place *placements,
>>> +					 u32 max_placements);
>>>
>>>  int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
>>>  void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
>>>  uint64_t amdgpu_ttm_domain_start(struct amdgpu_device *adev, uint32_t type);
> 
> 
> 
> 



* Re: [PATCH 4/5] drm/amdgpu/uvd: Fix forcing BOs into UVD segment when it isn't at 0
  2026-05-19  8:22 ` [PATCH 4/5] drm/amdgpu/uvd: Fix forcing BOs into UVD segment when it isn't at 0 Timur Kristóf
@ 2026-05-19  9:06   ` Christian König
  2026-05-19  9:32     ` Timur Kristóf
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2026-05-19  9:06 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Natalie Vock,
	John Olender, Liu Leo

On 5/19/26 10:22, Timur Kristóf wrote:
> UVD 4.x and older can only access FB and MSG buffers from a
> specific 256M VRAM segment that the VCPU BO is also located in.
> We already modify all placements of the given BO to ensure
> the BO is placed within this segment.
> 
> Previously, amdgpu_uvd_force_into_uvd_segment() always assumed
> that the UVD segment is the first 256M of VRAM, even though
> under some conditions the VCPU BO could be allocated outside
> this segment, which made UVD non-functional as the BOs were
> not inside the same segment as the UVD VCPU BO.
> 
> Solve that by using the segment where the VCPU BO actually is.
> 
> This fixes an issue with UVD failing to initialize on SI/CIK
> when resizable BAR is enabled and the VCPU BO is allocated
> in a different segment.
> 
> Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/3851
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 36 +++++++++++++++----------
>  1 file changed, 22 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> index 1e59ca924abe..993957927782 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> @@ -550,16 +550,29 @@ void amdgpu_uvd_free_handles(struct amdgpu_device *adev, struct drm_file *filp)
>  	}
>  }
>  
> +/**
> + * amdgpu_uvd_force_into_uvd_segment() - Forces placement of a BO into the UVD segment
> + *
> + * @abo: buffer object whose placement is forced
> + *
> + * UVD 4.x and older can only access FB and MSG buffers from a specific 256M VRAM segment
> + * that the VCPU BO is also located in. Force the BO into that segment.
> + */
>  static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo)
>  {
> -	int i;
> +	struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
> +	struct amdgpu_bo *vcpu_bo = adev->uvd.inst[0].vcpu_bo;
> +	struct amdgpu_res_cursor vcpu_cur;
>  
> -	for (i = 0; i < abo->placement.num_placement; ++i) {
> -		abo->placements[i].fpfn = 0 >> PAGE_SHIFT;
> -		abo->placements[i].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;
> -		if (abo->placements[i].mem_type == TTM_PL_VRAM)
> -			abo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS;
> -	}
> +	amdgpu_res_first(vcpu_bo->tbo.resource, 0, amdgpu_bo_size(vcpu_bo), &vcpu_cur);
> +
> +	abo->placement.num_placement = 1;
> +	abo->placements[0].fpfn = ALIGN_DOWN(vcpu_cur.start, SZ_256M) >> PAGE_SHIFT;
> +	abo->placements[0].lpfn = abo->placements[0].fpfn + (SZ_256M >> PAGE_SHIFT);
> +	abo->placements[0].mem_type = adev->uvd.inst[0].vcpu_bo->tbo.resource->mem_type;

This should clearly be applied to all placements, it's just that VRAM should use the vcpu segment and GTT the first one.
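
As a rough illustration of this suggestion, the loop below narrows every
placement to one 256M window: the VCPU BO's segment for VRAM placements and the
first segment for GTT placements. It is a standalone sketch rather than a
patch; struct place, enum mem_type, restrict_all_placements() and the 4 KiB
page size are assumptions standing in for the TTM types, and the
TTM_PL_FLAG_CONTIGUOUS handling is left out.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                      /* assumption: 4 KiB pages */
#define SZ_256M    (256ull << 20)
#define SEG_PAGES  (SZ_256M >> PAGE_SHIFT)  /* pages per 256M segment */

/* Hypothetical stand-ins for TTM_PL_TT and TTM_PL_VRAM. */
enum mem_type { MEM_GTT, MEM_VRAM };

/* Hypothetical, reduced stand-in for struct ttm_place. */
struct place {
	uint64_t fpfn;          /* first allowed page frame */
	uint64_t lpfn;          /* end of the allowed page frame range */
	enum mem_type mem_type;
};

/*
 * Restrict all n placements to a single 256M window each.
 * vcpu_vram_offset is the byte offset of the VCPU BO inside VRAM.
 */
static void restrict_all_placements(struct place *p, unsigned int n,
				    uint64_t vcpu_vram_offset)
{
	/* First page frame of the segment that contains the VCPU BO. */
	uint64_t vcpu_fpfn = (vcpu_vram_offset / SZ_256M) * SEG_PAGES;
	unsigned int i;

	assert(p || !n);

	for (i = 0; i < n; ++i) {
		/* VRAM follows the VCPU BO, GTT uses the first segment. */
		p[i].fpfn = (p[i].mem_type == MEM_VRAM) ? vcpu_fpfn : 0;
		p[i].lpfn = p[i].fpfn + SEG_PAGES;
	}
}
```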

> +
> +	if (abo->placements[0].mem_type == TTM_PL_VRAM)
> +		abo->placements[0].flags |= TTM_PL_FLAG_CONTIGUOUS;
>  }
>  
>  static u64 amdgpu_uvd_get_addr_from_ctx(struct amdgpu_uvd_cs_ctx *ctx)
> @@ -600,13 +613,8 @@ static int amdgpu_uvd_cs_pass1(struct amdgpu_uvd_cs_ctx *ctx)
>  	if (!ctx->parser->adev->uvd.address_64_bit) {
>  		/* check if it's a message or feedback command */
>  		cmd = amdgpu_ib_get_value(ctx->ib, ctx->idx) >> 1;
> -		if (cmd == 0x0 || cmd == 0x3) {
> -			/* yes, force it into VRAM */
> -			uint32_t domain = AMDGPU_GEM_DOMAIN_VRAM;
> -
> -			amdgpu_bo_placement_from_domain(bo, domain);
> -		}
> -		amdgpu_uvd_force_into_uvd_segment(bo);
> +		if (cmd == 0x0 || cmd == 0x3)
> +			amdgpu_uvd_force_into_uvd_segment(bo);

The existing code was already correct. We just messed up the GTT handling by not having the correct check in the manager and not supporting GTT->GTT moves.

Regards,
Christian.

>  
>  		r = ttm_bo_validate(&bo->tbo, &bo->placement, &tctx);
>  	}



* Re: [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK
  2026-05-19  9:01       ` Christian König
@ 2026-05-19  9:16         ` Timur Kristóf
  0 siblings, 0 replies; 14+ messages in thread
From: Timur Kristóf @ 2026-05-19  9:16 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Natalie Vock, John Olender, Liu Leo,
	Christian König

On Tuesday, May 19, 2026 11:01:48 AM Central European Summer Time Christian 
König wrote:
> On 5/19/26 10:59, Timur Kristóf wrote:
> > On Tuesday, May 19, 2026 10:54:10 AM Central European Summer Time Christian
> > König wrote:
> >> On 5/19/26 10:22, Timur Kristóf wrote:
> >>> UVD 4.x and older require that BOs don't cross 256M segments.
> >>> We need to respect that in amdgpu_ttm_alloc_gart().
> >>> We can't move the BOs later because GTT->GTT moves are
> >>> not implemented. We also can't force all BOs to VRAM
> >>> because that becomes very problematic in low VRAM scenarios.
> >>> 
> >>> This fixes UVD CS BOs crossing 256M segments
> >>> when they are placed in the GART.
> >> 
> >> Clear NAK for that approach.
> >> 
> >> This is the general TTM interface function and shouldn't have any HW
> >> generation dependent code in it.
> > 
> > I don't see how else to solve this, since GTT->GTT moves are not
> > implemented, so we can't move the BO to a suitable address later. We also
> > can't move it to VRAM.
> 
> GTT to GTT moves should be relatively easy to implement.
> 
> We just need to wait for the BO to be idle, unbind, move and bind again.

Implementing GTT->GTT moves sounds like a bigger task, and it cannot be backported
because it would be a new feature, not a bug fix. So I strongly prefer to solve
this problem with the tools we already have available.

Also, I would prefer to not have to move the BO at all and give it a suitable 
address from the beginning, to avoid the overhead of the move.

As far as I understand you take issue with checking adev->family in 
amdgpu_ttm_alloc_gart(), right? So, how about one of these alternatives:

- add a bool argument so the caller can request 256M segments, then the caller 
can check the GPU generation
- add an optional argument so the caller can just pass in a placements array

Or, if you have a different suggestion, let me know.
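
In either variant the caller would still need the list of 256M windows that
patch 2 computes. A standalone model of that computation, assuming 4 KiB pages
and using the hypothetical names struct pfn_window and fill_256m_windows() in
place of the TTM types, could look like this:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                      /* assumption: 4 KiB pages */
#define SZ_256M    (256ull << 20)
#define SEG_PAGES  (SZ_256M >> PAGE_SHIFT)  /* pages per 256M segment */

/* Hypothetical stand-in for the fpfn/lpfn pair of struct ttm_place. */
struct pfn_window {
	uint64_t fpfn; /* first allowed page frame */
	uint64_t lpfn; /* end of the allowed page frame range */
};

/*
 * Fill out[] with up to max 256M windows of a GART of gart_size bytes,
 * starting from the highest one. Returns the number of windows filled.
 */
static uint32_t fill_256m_windows(uint64_t gart_size,
				  struct pfn_window *out, uint32_t max)
{
	uint32_t i;

	assert(out || !max);

	for (i = 0; i < max; ++i) {
		uint64_t offset = (uint64_t)i * SZ_256M;

		if (offset >= gart_size)
			break;

		out[i].lpfn = (gart_size - offset) >> PAGE_SHIFT;
		/* Round the last usable page down to a segment boundary. */
		out[i].fpfn = ((out[i].lpfn - 1) / SEG_PAGES) * SEG_PAGES;
	}

	return i;
}
```

For a 1 GiB GART this yields four windows, the first one covering page frames
196608 to 262144. With a three-entry array, only the upper three segments are
offered.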

Thanks,
Timur

> >> 
> >>> Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4799
> >>> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> >>> ---
> >>> 
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 56 ++++++++++++++++++++++---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  3 ++
> >>>  2 files changed, 53 insertions(+), 6 deletions(-)
> >>> 
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> index 6c6ab4dd6ea9..a106c7e77e26 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> @@ -959,6 +959,40 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev,
> >>> 
> >>>  	return 0;
> >>>  
> >>>  }
> >>> 
> >>> +/**
> >>> + * amdgpu_ttm_fill_gart_256M_placements() - Fill placements array with 256M GART segments
> >>> + *
> >>> + * @bo: TTM buffer objects whose placements should be filled
> >>> + * @placements: Pointer to an array of placements
> >>> + * @max_placements: Size of the placements array
> >>> + *
> >>> + * Fill the specified placements array with 256M GART segments,
> >>> + * starting from the highest address in order to reduce the
> >>> + * contention of the lowest segment.
> >>> + *
> >>> + * Returns the number of placements filled.
> >>> + */
> >>> +u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
> >>> +					 struct ttm_place *placements,
> >>> +					 u32 max_placements)
> >>> +{
> >>> +	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
> >>> +	u32 i;
> >>> +
> >>> +	/* Fill the placements array with 256M segments, starting from highest. */
> >>> +	for (i = 0; i < max_placements; ++i) {
> >>> +		if (i * SZ_256M >= adev->gmc.gart_size)
> >>> +			break;
> >>> +
> >>> +		placements[i].lpfn = (adev->gmc.gart_size - i * SZ_256M) >> PAGE_SHIFT;
> >>> +		placements[i].fpfn = ALIGN_DOWN(placements[i].lpfn - 1, SZ_256M >> PAGE_SHIFT);
> >>> +		placements[i].mem_type = TTM_PL_TT;
> >>> +		placements[i].flags = bo->resource->placement;
> >>> +	}
> >>> +
> >>> +	return i;
> >>> +}
> >>> +
> >>> 
> >>>  /*
> >>>  
> >>>   * amdgpu_ttm_alloc_gart - Make sure buffer object is accessible either
> >>>   * through AGP or GART aperture.
> >>> 
> >>> @@ -973,7 +1007,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
> >>> 
> >>>  	struct ttm_operation_ctx ctx = { false, false };
> >>>  	struct amdgpu_ttm_tt *gtt = ttm_to_amdgpu_ttm_tt(bo->ttm);
> >>>  	struct ttm_placement placement;
> >>> 
> >>> -	struct ttm_place placements;
> >>> +	struct ttm_place placements[AMDGPU_BO_MAX_PLACEMENTS];
> >>> 
> >>>  	struct ttm_resource *tmp;
> >>>  	uint64_t addr, flags;
> >>>  	int r;
> >>> 
> >>> @@ -987,11 +1021,21 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
> >>> 
> >>>  	/* allocate GART space */
> >>>  	placement.num_placement = 1;
> >>> 
> >>> -	placement.placement = &placements;
> >>> -	placements.fpfn = 0;
> >>> -	placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
> >>> -	placements.mem_type = TTM_PL_TT;
> >>> -	placements.flags = bo->resource->placement;
> >>> +	placement.placement = &placements[0];
> >>> +	placements[0].fpfn = 0;
> >>> +	placements[0].lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
> >>> +	placements[0].mem_type = TTM_PL_TT;
> >>> +	placements[0].flags = bo->resource->placement;
> >>> +
> >>> +	/*
> >>> +	 * UVD 4.x and older require that BOs don't cross 256M segments.
> >>> +	 * We need to respect that here. We can't move the BO later
> >>> +	 * because GTT->GTT moves are not implemented.
> >>> +	 */
> >>> +	if (bo->base.size < SZ_256M && adev->family <= AMDGPU_FAMILY_KV)
> >>> +		placement.num_placement =
> >>> +			amdgpu_ttm_fill_gart_256M_placements(bo, placements,
> >>> +							     ARRAY_SIZE(placements));
> >>> 
> >>>  	r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
> >>>  	if (unlikely(r))
> >>> 
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> >>> index 2d72fa217274..e9de628c8d2d 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> >>> @@ -202,6 +202,9 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_ttm_buffer_entity *entity,
> >>> 
> >>>  			    u64 k_job_id);
> >>>  
> >>>  struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct amdgpu_device *adev);
> >>> 
> >>> +u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
> >>> +					 struct ttm_place *placements,
> >>> +					 u32 max_placements);
> >>> 
> >>>  int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
> >>>  void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
> >>>  uint64_t amdgpu_ttm_domain_start(struct amdgpu_device *adev, uint32_t type);






* Re: [PATCH 4/5] drm/amdgpu/uvd: Fix forcing BOs into UVD segment when it isn't at 0
  2026-05-19  9:06   ` Christian König
@ 2026-05-19  9:32     ` Timur Kristóf
  0 siblings, 0 replies; 14+ messages in thread
From: Timur Kristóf @ 2026-05-19  9:32 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Natalie Vock, John Olender, Liu Leo,
	Christian König

On Tuesday, May 19, 2026 11:06:51 AM Central European Summer Time Christian 
König wrote:
> On 5/19/26 10:22, Timur Kristóf wrote:
> > UVD 4.x and older can only access FB and MSG buffers from a
> > specific 256M VRAM segment that the VCPU BO is also located in.
> > We already modify all placements of the given BO to ensure
> > the BO is placed within this segment.
> > 
> > Previously, amdgpu_uvd_force_into_uvd_segment() always assumed
> > that the UVD segment is the first 256M of VRAM, even though
> > under some conditions the VCPU BO could be allocated outside
> > this segment, which made UVD non-functional as the BOs were
> > not inside the same segment as the UVD VCPU BO.
> > 
> > Solve that by using the segment where the VCPU BO actually is.
> > 
> > This fixes an issue with UVD failing to initialize on SI/CIK
> > when resizable BAR is enabled and the VCPU BO is allocated
> > in a different segment.
> > 
> > Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/3851
> > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> > ---
> > 
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 36 +++++++++++++++----------
> >  1 file changed, 22 insertions(+), 14 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> > index 1e59ca924abe..993957927782 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> > @@ -550,16 +550,29 @@ void amdgpu_uvd_free_handles(struct amdgpu_device *adev, struct drm_file *filp)
> >  	}
> >  
> >  }
> > 
> > +/**
> > + * amdgpu_uvd_force_into_uvd_segment() - Forces placement of a BO into the UVD segment
> > + *
> > + * @abo: buffer object whose placement is forced
> > + *
> > + * UVD 4.x and older can only access FB and MSG buffers from a specific 256M VRAM segment
> > + * that the VCPU BO is also located in. Force the BO into that segment.
> > + */
> > 
> >  static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo)
> >  {
> > 
> > -	int i;
> > +	struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
> > +	struct amdgpu_bo *vcpu_bo = adev->uvd.inst[0].vcpu_bo;
> > +	struct amdgpu_res_cursor vcpu_cur;
> > 
> > -	for (i = 0; i < abo->placement.num_placement; ++i) {
> > -		abo->placements[i].fpfn = 0 >> PAGE_SHIFT;
> > -		abo->placements[i].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;
> > -		if (abo->placements[i].mem_type == TTM_PL_VRAM)
> > -			abo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS;
> > -	}
> > +	amdgpu_res_first(vcpu_bo->tbo.resource, 0, amdgpu_bo_size(vcpu_bo), &vcpu_cur);
> > +
> > +	abo->placement.num_placement = 1;
> > +	abo->placements[0].fpfn = ALIGN_DOWN(vcpu_cur.start, SZ_256M) >> PAGE_SHIFT;
> > +	abo->placements[0].lpfn = abo->placements[0].fpfn + (SZ_256M >> PAGE_SHIFT);
> > +	abo->placements[0].mem_type = adev->uvd.inst[0].vcpu_bo->tbo.resource->mem_type;
> 
> This should clearly be applied to all placements, it's just that VRAM should
> use the vcpu segment and GTT the first one.

Previously, the function was trying to do two things at once:

- Forcing some BOs into the UVD segment, as the name of the function suggests, 
but it was buggy because it assumed the UVD segment is always segment 0
- Forcing GTT placements for other BOs, which is not what its name suggests, 
and was actually wrong and not working.

With this patch, the function is now only responsible for forcing MSG/FB BOs 
into the UVD segment, and is no longer responsible for doing anything with the 
other BOs, which was incorrect anyway.

This is also beneficial because it reduces contention for the UVD segment for 
the BOs that don't actually need to be located in that segment.

For non-MSG/FB BOs, this function should no longer be called.
When those BOs are located in GTT, patch 2 should already make sure they are 
placed correctly. When they are in VRAM, patch 5 will make sure to correct 
their placement if it was wrong.
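
The resulting split can be sketched in a few lines of standalone C. This is
only a model under stated assumptions: 4 KiB pages, and the hypothetical
helpers cmd_needs_uvd_segment() and uvd_segment_window() standing in for the
check in amdgpu_uvd_cs_pass1() and for the placement set up by
amdgpu_uvd_force_into_uvd_segment().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u             /* assumption: 4 KiB pages */
#define SZ_256M    (256ull << 20)

/* Hypothetical stand-in for the fpfn/lpfn pair of struct ttm_place. */
struct pfn_window {
	uint64_t fpfn; /* first allowed page frame */
	uint64_t lpfn; /* end of the allowed page frame range */
};

/* Only message and feedback commands (0x0 and 0x3) need the UVD segment. */
static bool cmd_needs_uvd_segment(uint32_t cmd)
{
	return cmd == 0x0 || cmd == 0x3;
}

/* 256M window that contains the byte offset where the VCPU BO starts. */
static struct pfn_window uvd_segment_window(uint64_t vcpu_start)
{
	struct pfn_window w;

	/* Align the start offset down to a 256M boundary. */
	w.fpfn = (vcpu_start & ~(SZ_256M - 1)) >> PAGE_SHIFT;
	w.lpfn = w.fpfn + (SZ_256M >> PAGE_SHIFT);

	return w;
}
```

BOs referenced by any other command are not restricted by this path at all.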

> > +
> > +	if (abo->placements[0].mem_type == TTM_PL_VRAM)
> > +		abo->placements[0].flags |= TTM_PL_FLAG_CONTIGUOUS;
> > 
> >  }
> >  
> >  static u64 amdgpu_uvd_get_addr_from_ctx(struct amdgpu_uvd_cs_ctx *ctx)
> > 
> > @@ -600,13 +613,8 @@ static int amdgpu_uvd_cs_pass1(struct amdgpu_uvd_cs_ctx *ctx)
> >  	if (!ctx->parser->adev->uvd.address_64_bit) {
> >  	
> >  		/* check if it's a message or feedback command */
> >  		cmd = amdgpu_ib_get_value(ctx->ib, ctx->idx) >> 1;
> > 
> > -		if (cmd == 0x0 || cmd == 0x3) {
> > -			/* yes, force it into VRAM */
> > -			uint32_t domain = AMDGPU_GEM_DOMAIN_VRAM;
> > -
> > -			amdgpu_bo_placement_from_domain(bo, domain);
> > -		}
> > -		amdgpu_uvd_force_into_uvd_segment(bo);
> > +		if (cmd == 0x0 || cmd == 0x3)
> > +			amdgpu_uvd_force_into_uvd_segment(bo);
> 
> The existing code was already correct. We just messed up the GTT handling by
> not having the correct check in the manager and not supporting GTT->GTT
> moves.

See above for an explanation of why I changed this part.

> 
> Regards,
> Christian.
> 
> >  		r = ttm_bo_validate(&bo->tbo, &bo->placement, &tctx);
> >  	
> >  	}







Thread overview: 14+ messages
2026-05-19  8:21 [PATCH 0/5] drm/amdgpu/uvd: Fix UVD BO memory placement issues Timur Kristóf
2026-05-19  8:22 ` [PATCH 1/5] drm/amdgpu: Respect placement requirements in amdgpu_gtt_mgr functions Timur Kristóf
2026-05-19  8:52   ` Christian König
2026-05-19  8:22 ` [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK Timur Kristóf
2026-05-19  8:54   ` Christian König
2026-05-19  8:59     ` Timur Kristóf
2026-05-19  9:01       ` Christian König
2026-05-19  9:16         ` Timur Kristóf
2026-05-19  8:22 ` [PATCH 3/5] drm/amdgpu/uvd: Place VCPU BO only in VRAM for UVD 4.x and older Timur Kristóf
2026-05-19  8:56   ` Christian König
2026-05-19  8:22 ` [PATCH 4/5] drm/amdgpu/uvd: Fix forcing BOs into UVD segment when it isn't at 0 Timur Kristóf
2026-05-19  9:06   ` Christian König
2026-05-19  9:32     ` Timur Kristóf
2026-05-19  8:22 ` [PATCH 5/5] drm/amdgpu/uvd: Move BOs to GTT when we can't place them in VRAM correctly Timur Kristóf
