AMD-GFX Archive on lore.kernel.org
* [RFC PATCH 0/2] drm/amdgpu: CIK UVD initialization fixes
@ 2025-04-29 11:24 John Olender
  2025-04-29 11:24 ` [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram John Olender
  2025-04-29 11:24 ` [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment John Olender
  0 siblings, 2 replies; 28+ messages in thread
From: John Olender @ 2025-04-29 11:24 UTC (permalink / raw)
  To: amd-gfx; +Cc: arunpravin.paneerselvam

This patchset addresses two issues affecting UVD initialization on CIK
hardware:
 
* Patch #1 fixes init with <256M vram:
(https://gitlab.freedesktop.org/drm/amd/-/issues/3448)
 
* Patch #2 fixes init with >256M vram:
(https://gitlab.freedesktop.org/drm/amd/-/issues/3851)
This version includes a whitespace fix relative to the one posted on the
issue tracker.
 
[Request for Testing]

Patch #2 tries to preserve the existing vcpu bo allocation behavior for
UVD 5.0+.  However, I don't have any UVD 5.0+ hardware available for
regression testing.
 
If someone with a GCN3-5 card can confirm this patchset does not break
UVD 5.0+ init, it'd be greatly appreciated.

John Olender (2):
  drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram
  drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment

 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c      | 36 ++++++++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c |  2 +-
 2 files changed, 22 insertions(+), 16 deletions(-)


base-commit: 092f0316dd74beba6685b5c65ff7d25c4870e842
-- 
2.47.2



* [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram
  2025-04-29 11:24 [RFC PATCH 0/2] drm/amdgpu: CIK UVD initialization fixes John Olender
@ 2025-04-29 11:24 ` John Olender
  2025-04-30 21:20   ` Alex Deucher
  2025-04-29 11:24 ` [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment John Olender
  1 sibling, 1 reply; 28+ messages in thread
From: John Olender @ 2025-04-29 11:24 UTC (permalink / raw)
  To: amd-gfx; +Cc: arunpravin.paneerselvam

The drm_mm allocator tolerated being passed end > mm->size, but the
drm_buddy allocator does not.

Restore the pre-buddy-allocator behavior of allowing such placements.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448
Signed-off-by: John Olender <john.olender@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 2d7f82e98df9..abdc52b0895a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -463,7 +463,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
 	int r;
 
 	lpfn = (u64)place->lpfn << PAGE_SHIFT;
-	if (!lpfn)
+	if (!lpfn || lpfn > man->size)
 		lpfn = man->size;
 
 	fpfn = (u64)place->fpfn << PAGE_SHIFT;
-- 
2.47.2


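For context on where a placement with lpfn beyond the end of VRAM comes
from in the first place: the UVD code unconditionally caps its placements
at the 256MB segment boundary.  A sketch of that helper as it exists in
amdgpu_uvd.c (reproduced from memory; treat it as illustrative rather
than an exact copy):

static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo)
{
	int i;

	/* Pre-5.0 UVD can only address the first 256MB of VRAM, so cap
	 * every placement at that boundary regardless of how much VRAM
	 * the board actually has. */
	for (i = 0; i < abo->placement.num_placement; ++i) {
		abo->placements[i].fpfn = 0;
		abo->placements[i].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;
	}
}

On a part with less than 256MB of VRAM this yields lpfn > man->size,
which drm_mm tolerated and drm_buddy rejects -- hence the clamp above.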

* [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
  2025-04-29 11:24 [RFC PATCH 0/2] drm/amdgpu: CIK UVD initialization fixes John Olender
  2025-04-29 11:24 ` [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram John Olender
@ 2025-04-29 11:24 ` John Olender
  2025-04-30 21:39   ` Alex Deucher
  1 sibling, 1 reply; 28+ messages in thread
From: John Olender @ 2025-04-29 11:24 UTC (permalink / raw)
  To: amd-gfx; +Cc: arunpravin.paneerselvam

If the vcpu bos are allocated outside the uvd segment,
amdgpu_uvd_ring_test_ib() times out waiting on the ring's fence.

See amdgpu_fence_driver_start_ring() for more context.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3851
Signed-off-by: John Olender <john.olender@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 36 ++++++++++++++-----------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 74758b5ffc6c..a6b3e75ffa2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -139,15 +139,20 @@ static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo);
 
 static int amdgpu_uvd_create_msg_bo_helper(struct amdgpu_device *adev,
 					   uint32_t size,
-					   struct amdgpu_bo **bo_ptr)
+					   struct amdgpu_bo **bo_ptr,
+					   bool interruptible)
 {
-	struct ttm_operation_ctx ctx = { true, false };
+	struct ttm_operation_ctx ctx = { interruptible, false };
 	struct amdgpu_bo *bo = NULL;
+	u32 initial_domain = AMDGPU_GEM_DOMAIN_GTT;
 	void *addr;
 	int r;
 
+	if (!interruptible && adev->uvd.address_64_bit)
+		initial_domain |= AMDGPU_GEM_DOMAIN_VRAM;
+
 	r = amdgpu_bo_create_reserved(adev, size, PAGE_SIZE,
-				      AMDGPU_GEM_DOMAIN_GTT,
+				      initial_domain,
 				      &bo, NULL, &addr);
 	if (r)
 		return r;
@@ -319,19 +324,23 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
 	if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP)
 		bo_size += AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8);
 
+	/* from uvd v5.0 HW addressing capacity increased to 64 bits */
+	if (!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 0))
+		adev->uvd.address_64_bit = true;
+
 	for (j = 0; j < adev->uvd.num_uvd_inst; j++) {
 		if (adev->uvd.harvest_config & (1 << j))
 			continue;
-		r = amdgpu_bo_create_kernel(adev, bo_size, PAGE_SIZE,
-					    AMDGPU_GEM_DOMAIN_VRAM |
-					    AMDGPU_GEM_DOMAIN_GTT,
-					    &adev->uvd.inst[j].vcpu_bo,
-					    &adev->uvd.inst[j].gpu_addr,
-					    &adev->uvd.inst[j].cpu_addr);
+		r = amdgpu_uvd_create_msg_bo_helper(adev, bo_size,
+				&adev->uvd.inst[j].vcpu_bo, false);
 		if (r) {
 			dev_err(adev->dev, "(%d) failed to allocate UVD bo\n", r);
 			return r;
 		}
+		adev->uvd.inst[j].gpu_addr =
+				amdgpu_bo_gpu_offset(adev->uvd.inst[j].vcpu_bo);
+		adev->uvd.inst[j].cpu_addr =
+				amdgpu_bo_kptr(adev->uvd.inst[j].vcpu_bo);
 	}
 
 	for (i = 0; i < adev->uvd.max_handles; ++i) {
@@ -339,11 +348,8 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
 		adev->uvd.filp[i] = NULL;
 	}
 
-	/* from uvd v5.0 HW addressing capacity increased to 64 bits */
-	if (!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 0))
-		adev->uvd.address_64_bit = true;
-
-	r = amdgpu_uvd_create_msg_bo_helper(adev, 128 << 10, &adev->uvd.ib_bo);
+	r = amdgpu_uvd_create_msg_bo_helper(adev, 128 << 10, &adev->uvd.ib_bo,
+			true);
 	if (r)
 		return r;
 
@@ -1236,7 +1242,7 @@ int amdgpu_uvd_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
 	if (direct) {
 		bo = adev->uvd.ib_bo;
 	} else {
-		r = amdgpu_uvd_create_msg_bo_helper(adev, 4096, &bo);
+		r = amdgpu_uvd_create_msg_bo_helper(adev, 4096, &bo, true);
 		if (r)
 			return r;
 	}
-- 
2.47.2


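Not visible in the hunks above: the tail of
amdgpu_uvd_create_msg_bo_helper() is what actually pulls a bo into the
UVD segment on pre-5.0 parts, which is why the patch routes the vcpu bo
through this helper.  Roughly, it consumes the ctx declared in the first
hunk like this (a sketch of the existing upstream tail, not an exact
copy):

	/* Pre-UVD-5.0 parts can only address the first 256MB of VRAM, so
	 * re-validate the freshly created bo into the UVD segment; parts
	 * with 64-bit addressing keep the initial placement. */
	if (!adev->uvd.address_64_bit) {
		amdgpu_uvd_force_into_uvd_segment(bo);
		r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
		/* error unwinding omitted in this sketch */
	}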

* Re: [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram
  2025-04-29 11:24 ` [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram John Olender
@ 2025-04-30 21:20   ` Alex Deucher
  2025-04-30 21:44     ` Paneer Selvam, Arunpravin
  2025-05-02  8:28     ` Christian König
  0 siblings, 2 replies; 28+ messages in thread
From: Alex Deucher @ 2025-04-30 21:20 UTC (permalink / raw)
  To: John Olender, Christian Koenig; +Cc: amd-gfx, arunpravin.paneerselvam

+ Christian

On Tue, Apr 29, 2025 at 7:24 AM John Olender <john.olender@gmail.com> wrote:
>
> The drm_mm allocator tolerated being passed end > mm->size, but the
> drm_buddy allocator does not.
>
> Restore the pre-buddy-allocator behavior of allowing such placements.
>
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448
> Signed-off-by: John Olender <john.olender@gmail.com>

This looks correct to me.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 2d7f82e98df9..abdc52b0895a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -463,7 +463,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>         int r;
>
>         lpfn = (u64)place->lpfn << PAGE_SHIFT;
> -       if (!lpfn)
> +       if (!lpfn || lpfn > man->size)
>                 lpfn = man->size;
>
>         fpfn = (u64)place->fpfn << PAGE_SHIFT;
> --
> 2.47.2
>


* Re: [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
  2025-04-29 11:24 ` [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment John Olender
@ 2025-04-30 21:39   ` Alex Deucher
  2025-05-02  8:36     ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Deucher @ 2025-04-30 21:39 UTC (permalink / raw)
  To: John Olender, Christian Koenig; +Cc: amd-gfx, arunpravin.paneerselvam

+ Christian

On Tue, Apr 29, 2025 at 7:25 AM John Olender <john.olender@gmail.com> wrote:
>
> If the vcpu bos are allocated outside the uvd segment,
> amdgpu_uvd_ring_test_ib() times out waiting on the ring's fence.
>
> See amdgpu_fence_driver_start_ring() for more context.
>
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3851
> Signed-off-by: John Olender <john.olender@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 36 ++++++++++++++-----------
>  1 file changed, 21 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> index 74758b5ffc6c..a6b3e75ffa2d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> @@ -139,15 +139,20 @@ static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo);
>
>  static int amdgpu_uvd_create_msg_bo_helper(struct amdgpu_device *adev,
>                                            uint32_t size,
> -                                          struct amdgpu_bo **bo_ptr)
> +                                          struct amdgpu_bo **bo_ptr,
> +                                          bool interruptible)
>  {
> -       struct ttm_operation_ctx ctx = { true, false };
> +       struct ttm_operation_ctx ctx = { interruptible, false };
>         struct amdgpu_bo *bo = NULL;
> +       u32 initial_domain = AMDGPU_GEM_DOMAIN_GTT;
>         void *addr;
>         int r;
>
> +       if (!interruptible && adev->uvd.address_64_bit)
> +               initial_domain |= AMDGPU_GEM_DOMAIN_VRAM;
> +
>         r = amdgpu_bo_create_reserved(adev, size, PAGE_SIZE,
> -                                     AMDGPU_GEM_DOMAIN_GTT,
> +                                     initial_domain,
>                                       &bo, NULL, &addr);
>         if (r)
>                 return r;
> @@ -319,19 +324,23 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
>         if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP)
>                 bo_size += AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8);
>
> +       /* from uvd v5.0 HW addressing capacity increased to 64 bits */
> +       if (!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 0))
> +               adev->uvd.address_64_bit = true;
> +
>         for (j = 0; j < adev->uvd.num_uvd_inst; j++) {
>                 if (adev->uvd.harvest_config & (1 << j))
>                         continue;
> -               r = amdgpu_bo_create_kernel(adev, bo_size, PAGE_SIZE,
> -                                           AMDGPU_GEM_DOMAIN_VRAM |
> -                                           AMDGPU_GEM_DOMAIN_GTT,

I think we can just make this VRAM only.  Or something like:
adev->uvd.address_64_bit ? AMDGPU_GEM_DOMAIN_GTT : AMDGPU_GEM_DOMAIN_VRAM

If that fixes it, this should be tagged with:
Fixes: 58ab2c08d708 ("drm/amdgpu: use VRAM|GTT for a bunch of kernel allocations")

Alex

> -                                           &adev->uvd.inst[j].vcpu_bo,
> -                                           &adev->uvd.inst[j].gpu_addr,
> -                                           &adev->uvd.inst[j].cpu_addr);
> +               r = amdgpu_uvd_create_msg_bo_helper(adev, bo_size,
> +                               &adev->uvd.inst[j].vcpu_bo, false);
>                 if (r) {
>                         dev_err(adev->dev, "(%d) failed to allocate UVD bo\n", r);
>                         return r;
>                 }
> +               adev->uvd.inst[j].gpu_addr =
> +                               amdgpu_bo_gpu_offset(adev->uvd.inst[j].vcpu_bo);
> +               adev->uvd.inst[j].cpu_addr =
> +                               amdgpu_bo_kptr(adev->uvd.inst[j].vcpu_bo);
>         }
>
>         for (i = 0; i < adev->uvd.max_handles; ++i) {
> @@ -339,11 +348,8 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
>                 adev->uvd.filp[i] = NULL;
>         }
>
> -       /* from uvd v5.0 HW addressing capacity increased to 64 bits */
> -       if (!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 0))
> -               adev->uvd.address_64_bit = true;
> -
> -       r = amdgpu_uvd_create_msg_bo_helper(adev, 128 << 10, &adev->uvd.ib_bo);
> +       r = amdgpu_uvd_create_msg_bo_helper(adev, 128 << 10, &adev->uvd.ib_bo,
> +                       true);
>         if (r)
>                 return r;
>
> @@ -1236,7 +1242,7 @@ int amdgpu_uvd_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
>         if (direct) {
>                 bo = adev->uvd.ib_bo;
>         } else {
> -               r = amdgpu_uvd_create_msg_bo_helper(adev, 4096, &bo);
> +               r = amdgpu_uvd_create_msg_bo_helper(adev, 4096, &bo, true);
>                 if (r)
>                         return r;
>         }
> --
> 2.47.2
>


* Re: [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram
  2025-04-30 21:20   ` Alex Deucher
@ 2025-04-30 21:44     ` Paneer Selvam, Arunpravin
  2025-05-02 15:32       ` John Olender
  2025-05-02  8:28     ` Christian König
  1 sibling, 1 reply; 28+ messages in thread
From: Paneer Selvam, Arunpravin @ 2025-04-30 21:44 UTC (permalink / raw)
  To: Alex Deucher, John Olender, Christian Koenig; +Cc: amd-gfx



On 5/1/2025 2:50 AM, Alex Deucher wrote:
> + Christian
>
> On Tue, Apr 29, 2025 at 7:24 AM John Olender <john.olender@gmail.com> wrote:
>> The drm_mm allocator tolerated being passed end > mm->size, but the
>> drm_buddy allocator does not.
>>
>> Restore the pre-buddy-allocator behavior of allowing such placements.
>>
>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448
>> Signed-off-by: John Olender <john.olender@gmail.com>
> This looks correct to me.
> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
I was thinking that we should return an error when lpfn > man->size.

Regards,
Arun.
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> index 2d7f82e98df9..abdc52b0895a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> @@ -463,7 +463,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>>          int r;
>>
>>          lpfn = (u64)place->lpfn << PAGE_SHIFT;
>> -       if (!lpfn)
>> +       if (!lpfn || lpfn > man->size)
>>                  lpfn = man->size;
>>
>>          fpfn = (u64)place->fpfn << PAGE_SHIFT;
>> --
>> 2.47.2
>>



* Re: [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram
  2025-04-30 21:20   ` Alex Deucher
  2025-04-30 21:44     ` Paneer Selvam, Arunpravin
@ 2025-05-02  8:28     ` Christian König
  1 sibling, 0 replies; 28+ messages in thread
From: Christian König @ 2025-05-02  8:28 UTC (permalink / raw)
  To: Alex Deucher, John Olender; +Cc: amd-gfx, arunpravin.paneerselvam

On 4/30/25 23:20, Alex Deucher wrote:
> + Christian
> 
> On Tue, Apr 29, 2025 at 7:24 AM John Olender <john.olender@gmail.com> wrote:
>>
>> The drm_mm allocator tolerated being passed end > mm->size, but the
>> drm_buddy allocator does not.
>>
>> Restore the pre-buddy-allocator behavior of allowing such placements.
>>
>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448
>> Signed-off-by: John Olender <john.olender@gmail.com>
> 
> This looks correct to me.
> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>


Good catch; Arun needs to look at this.

My educated guess is that it would be better to make the drm buddy tolerate being passed end > mm->size.

Regards,
Christian.

> 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> index 2d7f82e98df9..abdc52b0895a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> @@ -463,7 +463,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>>         int r;
>>
>>         lpfn = (u64)place->lpfn << PAGE_SHIFT;
>> -       if (!lpfn)
>> +       if (!lpfn || lpfn > man->size)
>>                 lpfn = man->size;
>>
>>         fpfn = (u64)place->fpfn << PAGE_SHIFT;
>> --
>> 2.47.2
>>



* Re: [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
  2025-04-30 21:39   ` Alex Deucher
@ 2025-05-02  8:36     ` Christian König
  2025-05-03 20:31       ` John Olender
  2025-06-05  9:21       ` John Olender
  0 siblings, 2 replies; 28+ messages in thread
From: Christian König @ 2025-05-02  8:36 UTC (permalink / raw)
  To: Alex Deucher, John Olender; +Cc: amd-gfx, arunpravin.paneerselvam

On 4/30/25 23:39, Alex Deucher wrote:
> + Christian
> 
> On Tue, Apr 29, 2025 at 7:25 AM John Olender <john.olender@gmail.com> wrote:
>>
>> If the vcpu bos are allocated outside the uvd segment,
>> amdgpu_uvd_ring_test_ib() times out waiting on the ring's fence.

That's incorrect, but pointing to the correct solution.

>>
>> See amdgpu_fence_driver_start_ring() for more context.
>>
>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3851
>> Signed-off-by: John Olender <john.olender@gmail.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 36 ++++++++++++++-----------
>>  1 file changed, 21 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>> index 74758b5ffc6c..a6b3e75ffa2d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>> @@ -139,15 +139,20 @@ static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo);
>>
>>  static int amdgpu_uvd_create_msg_bo_helper(struct amdgpu_device *adev,
>>                                            uint32_t size,
>> -                                          struct amdgpu_bo **bo_ptr)
>> +                                          struct amdgpu_bo **bo_ptr,
>> +                                          bool interruptible)
>>  {
>> -       struct ttm_operation_ctx ctx = { true, false };
>> +       struct ttm_operation_ctx ctx = { interruptible, false };
>>         struct amdgpu_bo *bo = NULL;
>> +       u32 initial_domain = AMDGPU_GEM_DOMAIN_GTT;
>>         void *addr;
>>         int r;
>>
>> +       if (!interruptible && adev->uvd.address_64_bit)
>> +               initial_domain |= AMDGPU_GEM_DOMAIN_VRAM;
>> +
>>         r = amdgpu_bo_create_reserved(adev, size, PAGE_SIZE,
>> -                                     AMDGPU_GEM_DOMAIN_GTT,
>> +                                     initial_domain,
>>                                       &bo, NULL, &addr);
>>         if (r)
>>                 return r;
>> @@ -319,19 +324,23 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
>>         if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP)
>>                 bo_size += AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8);
>>
>> +       /* from uvd v5.0 HW addressing capacity increased to 64 bits */
>> +       if (!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 0))
>> +               adev->uvd.address_64_bit = true;
>> +
>>         for (j = 0; j < adev->uvd.num_uvd_inst; j++) {
>>                 if (adev->uvd.harvest_config & (1 << j))
>>                         continue;
>> -               r = amdgpu_bo_create_kernel(adev, bo_size, PAGE_SIZE,
>> -                                           AMDGPU_GEM_DOMAIN_VRAM |
>> -                                           AMDGPU_GEM_DOMAIN_GTT,
> 
> I think we can just make this VRAM only.  Or something like:
> adev->uvd.address_64_bit ? AMDGPU_GEM_DOMAIN_GTT : AMDGPU_GEM_DOMAIN_VRAM

Yeah completely agree. It's a good catch, but the solution is incorrect.

On the older UVD MC interface the FW needs to be in VRAM or the validation fails. Whether it's inside the window used for the message and fence is actually irrelevant.

So something like AMDGPU_GEM_DOMAIN_VRAM | (adev->uvd.address_64_bit ? AMDGPU_GEM_DOMAIN_GTT : 0) would be correct I think.

> 
> If that fixes it, this should be tagged with:
> Fixes: 58ab2c08d708 ("drm/amdgpu: use VRAM|GTT for a bunch of kernel
> allocations")

And CC stable I think.

Regards,
Christian.

> 
> Alex
> 
>> -                                           &adev->uvd.inst[j].vcpu_bo,
>> -                                           &adev->uvd.inst[j].gpu_addr,
>> -                                           &adev->uvd.inst[j].cpu_addr);
>> +               r = amdgpu_uvd_create_msg_bo_helper(adev, bo_size,
>> +                               &adev->uvd.inst[j].vcpu_bo, false);
>>                 if (r) {
>>                         dev_err(adev->dev, "(%d) failed to allocate UVD bo\n", r);
>>                         return r;
>>                 }
>> +               adev->uvd.inst[j].gpu_addr =
>> +                               amdgpu_bo_gpu_offset(adev->uvd.inst[j].vcpu_bo);
>> +               adev->uvd.inst[j].cpu_addr =
>> +                               amdgpu_bo_kptr(adev->uvd.inst[j].vcpu_bo);
>>         }
>>
>>         for (i = 0; i < adev->uvd.max_handles; ++i) {
>> @@ -339,11 +348,8 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
>>                 adev->uvd.filp[i] = NULL;
>>         }
>>
>> -       /* from uvd v5.0 HW addressing capacity increased to 64 bits */
>> -       if (!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 0))
>> -               adev->uvd.address_64_bit = true;
>> -
>> -       r = amdgpu_uvd_create_msg_bo_helper(adev, 128 << 10, &adev->uvd.ib_bo);
>> +       r = amdgpu_uvd_create_msg_bo_helper(adev, 128 << 10, &adev->uvd.ib_bo,
>> +                       true);
>>         if (r)
>>                 return r;
>>
>> @@ -1236,7 +1242,7 @@ int amdgpu_uvd_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
>>         if (direct) {
>>                 bo = adev->uvd.ib_bo;
>>         } else {
>> -               r = amdgpu_uvd_create_msg_bo_helper(adev, 4096, &bo);
>> +               r = amdgpu_uvd_create_msg_bo_helper(adev, 4096, &bo, true);
>>                 if (r)
>>                         return r;
>>         }
>> --
>> 2.47.2
>>


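Putting Alex's and Christian's suggestions together, the vcpu bo
allocation in amdgpu_uvd_sw_init() would look roughly like this (a
sketch against the pre-patch code, using the domain expression from the
reply above):

	/* The FW must live in VRAM on the older UVD MC interface; parts
	 * with 64-bit addressing may additionally fall back to GTT. */
	u32 domain = AMDGPU_GEM_DOMAIN_VRAM |
		     (adev->uvd.address_64_bit ? AMDGPU_GEM_DOMAIN_GTT : 0);

	r = amdgpu_bo_create_kernel(adev, bo_size, PAGE_SIZE, domain,
				    &adev->uvd.inst[j].vcpu_bo,
				    &adev->uvd.inst[j].gpu_addr,
				    &adev->uvd.inst[j].cpu_addr);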

* Re: [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram
  2025-04-30 21:44     ` Paneer Selvam, Arunpravin
@ 2025-05-02 15:32       ` John Olender
  2025-05-03 12:23         ` Paneer Selvam, Arunpravin
  0 siblings, 1 reply; 28+ messages in thread
From: John Olender @ 2025-05-02 15:32 UTC (permalink / raw)
  To: Paneer Selvam, Arunpravin, Alex Deucher, Christian Koenig; +Cc: amd-gfx

On 4/30/25 5:44 PM, Paneer Selvam, Arunpravin wrote:
> 
> 
> On 5/1/2025 2:50 AM, Alex Deucher wrote:
>> + Christian
>>
>> On Tue, Apr 29, 2025 at 7:24 AM John Olender <john.olender@gmail.com>
>> wrote:
>>> The drm_mm allocator tolerated being passed end > mm->size, but the
>>> drm_buddy allocator does not.
>>>
>>> Restore the pre-buddy-allocator behavior of allowing such placements.
>>>
>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448
>>> Signed-off-by: John Olender <john.olender@gmail.com>
>> This looks correct to me.
>> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
> I was thinking that we should return an error when lpfn > man->size.
> 
> Regards,
> Arun.

This patch restores the previous behavior in the spirit of "Do not crash
the kernel".  The existing uvd placements are pretty clear in their
intent and were accepted until the switch to drm_buddy.  I think it's
fair to consider that usage as the expected behavior.

With that in mind, I'm not sure amdgpu_vram_mgr is the place this change
really belongs.  That is, I think it's worth asking:

1) Why does drm_mm accept end > mm->size without complaint?
2) Why doesn't drm_buddy do the same?

Thanks,
John

>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/
>>> gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> index 2d7f82e98df9..abdc52b0895a 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> @@ -463,7 +463,7 @@ static int amdgpu_vram_mgr_new(struct
>>> ttm_resource_manager *man,
>>>          int r;
>>>
>>>          lpfn = (u64)place->lpfn << PAGE_SHIFT;
>>> -       if (!lpfn)
>>> +       if (!lpfn || lpfn > man->size)
>>>                  lpfn = man->size;
>>>
>>>          fpfn = (u64)place->fpfn << PAGE_SHIFT;
>>> -- 
>>> 2.47.2
>>>
> 



* Re: [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram
  2025-05-02 15:32       ` John Olender
@ 2025-05-03 12:23         ` Paneer Selvam, Arunpravin
  2025-05-11 20:33           ` Paneer Selvam, Arunpravin
  0 siblings, 1 reply; 28+ messages in thread
From: Paneer Selvam, Arunpravin @ 2025-05-03 12:23 UTC (permalink / raw)
  To: John Olender, Alex Deucher, Christian Koenig; +Cc: amd-gfx



On 5/2/2025 9:02 PM, John Olender wrote:
> On 4/30/25 5:44 PM, Paneer Selvam, Arunpravin wrote:
>>
>> On 5/1/2025 2:50 AM, Alex Deucher wrote:
>>> + Christian
>>>
>>> On Tue, Apr 29, 2025 at 7:24 AM John Olender <john.olender@gmail.com>
>>> wrote:
>>>> The drm_mm allocator tolerated being passed end > mm->size, but the
>>>> drm_buddy allocator does not.
>>>>
>>>> Restore the pre-buddy-allocator behavior of allowing such placements.
>>>>
>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448
>>>> Signed-off-by: John Olender <john.olender@gmail.com>
>>> This looks correct to me.
>>> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
>> I was thinking that we should return an error when lpfn > man->size.
>>
>> Regards,
>> Arun.
> This patch restores the previous behavior in the spirit of "Do not crash
> the kernel".  The existing uvd placements are pretty clear in their
> intent and were accepted until the switch to drm_buddy.  I think it's
> fair to consider their style as expected.
>
> With that in mind, I'm not sure amdgpu_vram_mgr is the place this change
> really belongs.  That is, I think it's worth asking:
>
> 1) Why does drm_mm accept end > mm->size without complaint?
> 2) Why doesn't drm_buddy do the same?

I remember that during the development of DRM buddy, we had a discussion
with Intel folks and decided to return an error in DRM buddy when
end > mm->size.  This was done to ensure that, at the driver level, lpfn
has the correct value.

I will modify drm_buddy to match drm_mm and send the patch.

Regards,
Arun.
>
> Thanks,
> John
>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/
>>>> gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>> index 2d7f82e98df9..abdc52b0895a 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>> @@ -463,7 +463,7 @@ static int amdgpu_vram_mgr_new(struct
>>>> ttm_resource_manager *man,
>>>>           int r;
>>>>
>>>>           lpfn = (u64)place->lpfn << PAGE_SHIFT;
>>>> -       if (!lpfn)
>>>> +       if (!lpfn || lpfn > man->size)
>>>>                   lpfn = man->size;
>>>>
>>>>           fpfn = (u64)place->fpfn << PAGE_SHIFT;
>>>> -- 
>>>> 2.47.2
>>>>


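On John's two questions: drm_mm_insert_node_in_range() only walks
existing holes, and every hole necessarily lies inside the manager's
range, so an end beyond mm->size is silently harmless; drm_buddy
instead validates the requested window up front.  A sketch of the two
behaviors (the buddy check is an assumed shape, not the exact upstream
code; the clamp is from patch 1 above):

	/* drm_buddy-style strict validation: reject an allocation window
	 * that extends past the managed size instead of clamping it. */
	if (start >= end || end > mm->size)
		return -EINVAL;

	/* amdgpu_vram_mgr_new() with patch 1 applied: clamp instead,
	 * restoring the drm_mm-era tolerance at the driver level. */
	lpfn = (u64)place->lpfn << PAGE_SHIFT;
	if (!lpfn || lpfn > man->size)
		lpfn = man->size;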

* Re: [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
  2025-05-02  8:36     ` Christian König
@ 2025-05-03 20:31       ` John Olender
  2025-05-05  9:02         ` Christian König
  2025-06-05  9:21       ` John Olender
  1 sibling, 1 reply; 28+ messages in thread
From: John Olender @ 2025-05-03 20:31 UTC (permalink / raw)
  To: Christian König, Alex Deucher; +Cc: amd-gfx, arunpravin.paneerselvam

On 5/2/25 4:36 AM, Christian König wrote:
> On 4/30/25 23:39, Alex Deucher wrote:
>> + Christian
>>
>> On Tue, Apr 29, 2025 at 7:25 AM John Olender <john.olender@gmail.com> wrote:
>>>
>>> If the vcpu bos are allocated outside the uvd segment,
>>> amdgpu_uvd_ring_test_ib() times out waiting on the ring's fence.
> 
> That's incorrect, but pointing to the correct solution.
> 
>>>
>>> See amdgpu_fence_driver_start_ring() for more context.
>>>
>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3851
>>> Signed-off-by: John Olender <john.olender@gmail.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 36 ++++++++++++++-----------
>>>  1 file changed, 21 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>>> index 74758b5ffc6c..a6b3e75ffa2d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>>> @@ -139,15 +139,20 @@ static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo);
>>>
>>>  static int amdgpu_uvd_create_msg_bo_helper(struct amdgpu_device *adev,
>>>                                            uint32_t size,
>>> -                                          struct amdgpu_bo **bo_ptr)
>>> +                                          struct amdgpu_bo **bo_ptr,
>>> +                                          bool interruptible)
>>>  {
>>> -       struct ttm_operation_ctx ctx = { true, false };
>>> +       struct ttm_operation_ctx ctx = { interruptible, false };
>>>         struct amdgpu_bo *bo = NULL;
>>> +       u32 initial_domain = AMDGPU_GEM_DOMAIN_GTT;
>>>         void *addr;
>>>         int r;
>>>
>>> +       if (!interruptible && adev->uvd.address_64_bit)
>>> +               initial_domain |= AMDGPU_GEM_DOMAIN_VRAM;
>>> +
>>>         r = amdgpu_bo_create_reserved(adev, size, PAGE_SIZE,
>>> -                                     AMDGPU_GEM_DOMAIN_GTT,
>>> +                                     initial_domain,
>>>                                       &bo, NULL, &addr);
>>>         if (r)
>>>                 return r;
>>> @@ -319,19 +324,23 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
>>>         if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP)
>>>                 bo_size += AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8);
>>>
>>> +       /* from uvd v5.0 HW addressing capacity increased to 64 bits */
>>> +       if (!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 0))
>>> +               adev->uvd.address_64_bit = true;
>>> +
>>>         for (j = 0; j < adev->uvd.num_uvd_inst; j++) {
>>>                 if (adev->uvd.harvest_config & (1 << j))
>>>                         continue;
>>> -               r = amdgpu_bo_create_kernel(adev, bo_size, PAGE_SIZE,
>>> -                                           AMDGPU_GEM_DOMAIN_VRAM |
>>> -                                           AMDGPU_GEM_DOMAIN_GTT,
>>
>> I think we can just make this VRAM only.  Or something like:
>> adev->uvd.address_64_bit ? AMDGPU_GEM_DOMAIN_GTT : AMDGPU_GEM_DOMAIN_VRAM
> 
> Yeah completely agree. It's a good catch, but the solution is incorrect.
> 
> On the older UVD MC interface the FW needs to be in VRAM or the validation fails. Whether it's inside the window used for the message and fence is actually irrelevant.
> 
> So something like AMDGPU_GEM_DOMAIN_VRAM | (adev->uvd.address_64_bit ? AMDGPU_GEM_DOMAIN_GTT : 0) would be correct I think.
> 
>>
>> If that fixes it, this should be tagged with:
>> Fixes: 58ab2c08d708 ("drm/amdgpu: use VRAM|GTT for a bunch of kernel
>> allocations")
> 
> And CC stable I think.
> 
> Regards,
> Christian.
> 
>>
>> Alex
>>

Simply changing the uvd vcpu bo (and therefore the firmware) to always
be allocated in vram does *not* solve #3851.

Let me go into a bit of depth about how I arrived at this patch.

First, what sort of system configuration changes result in the uvd init
failure?  It looks like having a display connected and changing the BAR
size have an impact.  Next, which kernel change reliably triggers the
issue?  The change is the switch to the buddy allocator.

Now that the issue can be reliably triggered, where does the error code,
-110 / -ETIMEDOUT, come from?  It turns out it's in
amdgpu_uvd_ring_test_ib(), specifically a timeout while waiting on the
ring's fence.

With that out of the way, what allocator-related change happens when a
display is connected at startup?  The 'stolen_vga_memory' and related
bos are created.  Adding a one-page dummy bo at the same place in the
driver allows a headless configuration to pass the uvd ring ib test.

Why does having these extra objects allocated result in a change in
behavior?  Well, the switch to the buddy allocator drastically changes
*where* in vram various objects end up being placed.  What about the BAR
size change?  That ends up influencing where the objects are placed too.

Which objects related to uvd end up being moved around?  The uvd code
has a function to force its objects into a specific segment after all.
Well, it turns out the vcpu bo doesn't go through this function and is
therefore being moved around.

When the system configuration results in a ring ib timeout, the uvd vcpu
bo is pinned *outside* the uvd segment.  When uvd init succeeds, the uvd
vcpu bo is pinned *inside* the uvd segment.

So, it appears there's a relationship between *where* the vcpu bo ends
up and the fence timeout.  But why does the issue manifest as a ring
fence timeout while testing the ib?  Unfortunately, I'm unable to find
something like a datasheet or developer's guide containing the finer
details of uvd.

Well, what seems related in the code?  Where is the ring fence located?
It's placed inside the vcpu bo by amdgpu_fence_driver_start_ring().

So, does this patch provide the correct solution to the problem?  Maybe
not.  But the solution seems plausible enough to at least send in the
patch for review.

Thanks,
John

>>> -                                           &adev->uvd.inst[j].vcpu_bo,
>>> -                                           &adev->uvd.inst[j].gpu_addr,
>>> -                                           &adev->uvd.inst[j].cpu_addr);
>>> +               r = amdgpu_uvd_create_msg_bo_helper(adev, bo_size,
>>> +                               &adev->uvd.inst[j].vcpu_bo, false);
>>>                 if (r) {
>>>                         dev_err(adev->dev, "(%d) failed to allocate UVD bo\n", r);
>>>                         return r;
>>>                 }
>>> +               adev->uvd.inst[j].gpu_addr =
>>> +                               amdgpu_bo_gpu_offset(adev->uvd.inst[j].vcpu_bo);
>>> +               adev->uvd.inst[j].cpu_addr =
>>> +                               amdgpu_bo_kptr(adev->uvd.inst[j].vcpu_bo);
>>>         }
>>>
>>>         for (i = 0; i < adev->uvd.max_handles; ++i) {
>>> @@ -339,11 +348,8 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
>>>                 adev->uvd.filp[i] = NULL;
>>>         }
>>>
>>> -       /* from uvd v5.0 HW addressing capacity increased to 64 bits */
>>> -       if (!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 0))
>>> -               adev->uvd.address_64_bit = true;
>>> -
>>> -       r = amdgpu_uvd_create_msg_bo_helper(adev, 128 << 10, &adev->uvd.ib_bo);
>>> +       r = amdgpu_uvd_create_msg_bo_helper(adev, 128 << 10, &adev->uvd.ib_bo,
>>> +                       true);
>>>         if (r)
>>>                 return r;
>>>
>>> @@ -1236,7 +1242,7 @@ int amdgpu_uvd_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
>>>         if (direct) {
>>>                 bo = adev->uvd.ib_bo;
>>>         } else {
>>> -               r = amdgpu_uvd_create_msg_bo_helper(adev, 4096, &bo);
>>> +               r = amdgpu_uvd_create_msg_bo_helper(adev, 4096, &bo, true);
>>>                 if (r)
>>>                         return r;
>>>         }
>>> --
>>> 2.47.2
>>>
> 


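The dummy-bo experiment described above could look something like this
(hypothetical; the variable names are invented, placed alongside the
stolen_vga_memory allocation, and using the same
amdgpu_bo_create_kernel() call as the vcpu bo):

	struct amdgpu_bo *dummy_bo;
	u64 dummy_gpu_addr;
	void *dummy_cpu_addr;
	int r;

	/* A single VRAM page allocated where stolen_vga_memory would be:
	 * it shifts where later buddy allocations (including the UVD vcpu
	 * bo) land in VRAM, which is enough to make the ring ib test
	 * pass on a headless configuration. */
	r = amdgpu_bo_create_kernel(adev, PAGE_SIZE, PAGE_SIZE,
				    AMDGPU_GEM_DOMAIN_VRAM,
				    &dummy_bo, &dummy_gpu_addr,
				    &dummy_cpu_addr);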

* Re: [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
  2025-05-03 20:31       ` John Olender
@ 2025-05-05  9:02         ` Christian König
  2025-05-05 16:06           ` John Olender
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2025-05-05  9:02 UTC (permalink / raw)
  To: John Olender, Alex Deucher; +Cc: amd-gfx, arunpravin.paneerselvam

> Simply changing the uvd vcpu bo (and therefore the firmware) to always
> be allocated in vram does *not* solve #3851.
> 
> Let me go into a bit of depth about how I arrived at this patch.
> 
> First, what sort of system configuration changes result in the uvd init
> failure?  It looks like having a display connected and changing the BAR
> size have an impact.  Next, which kernel change reliably triggers the
> issue?  The change is the switch to the buddy allocator.

Well, that is not really a resizable BAR; rather, the "VRAM" is just stolen system memory and we completely bypass the BAR to access it.

But the effect is the same, i.e. more memory is CPU-accessible than otherwise.

> 
> Now that the issue can be reliably triggered, where does the error code,
> -110 / -ETIMEDOUT, come from?  It turns out it's in
> amdgpu_uvd_ring_test_ib(), specifically a timeout while waiting on the
> ring's fence.
> 
> With that out of the way, what allocator-related change happens when a
> display is connected at startup?  The 'stolen_vga_memory' and related
> bos are created.  Adding a one page dummy bo to the same place in the
> driver can allow a headless configuration to now pass the uvd ring ib test.
> 
> Why does having these extra objects allocated result in a change in
> behavior?  Well, the switch to the buddy allocator drastically changes
> *where* in vram various objects end up being placed.  What about the BAR
> size change?  That ends up influencing where the objects are placed too.
> 
> Which objects related to uvd end up being moved around?  The uvd code
> has a function to force its objects into a specific segment after all.
> Well, it turns out the vcpu bo doesn't go through this function and is
> therefore being moved around.

That function is there because independent buffers (the message and the feedback, for example) need to be in the same 256MB segment.

> When the system configuration results in a ring ib timeout, the uvd vcpu
> bo is pinned *outside* the uvd segment.  When uvd init succeeds, the uvd
> vcpu bo is pinned *inside* the uvd segment.
> 
> So, it appears there's a relationship between *where* the vcpu bo ends
> up and the fence timeout.  But why does the issue manifest as a ring
> fence timeout while testing the ib?  Unfortunately, I'm unable to find
> something like a datasheet or developer's guide containing the finer
> details of uvd.


Mhm, there must be something wrong with programming bits 28-31 of the VCPU BO base address.

Forcing the VCPU into the first 256MB segment just makes those bits zero and so makes it work on your system.

The problem is that this is basically just coincidence. On other systems the base address can be completely different.

See uvd_v4_2_mc_resume(), where the mmUVD_LMI_ADDR_EXT and mmUVD_LMI_EXT40_ADDR registers are programmed, and try to hack those two register writes to see if they really end up in the HW.

I will try to find a Kaveri system which is still working to reproduce the issue.

Thanks,
Christian.

> 
> Well, what seems related in the code?  Where is the ring fence located?
> It's placed inside the vcpu bo by amdgpu_fence_driver_start_ring().
> 
> So, does this patch provide the correct solution to the problem?  Maybe
> not.  But the solution seems plausible enough to at least send in the
> patch for review.
> 
> Thanks,
> John

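The register hack Christian suggests amounts to a write/read-back check
in uvd_v4_2_mc_resume().  A sketch (hypothetical instrumentation; the
bit encodings are inferred from the read-back values John posts in the
follow-up below):

	u64 gpu_addr = adev->uvd.inst->gpu_addr;
	u32 ext = (gpu_addr >> 28) & 0xF;	/* addr bits 28-31 */
	u32 ext40 = ((gpu_addr >> 32) & 0xFF) |	/* addr bits 32-39 */
		    (0x9 << 16) | (0x1U << 31);

	WREG32(mmUVD_LMI_ADDR_EXT, (ext << 12) | ext);
	dev_info(adev->dev, "mmUVD_LMI_ADDR_EXT: wrote 0x%08X, read 0x%08X\n",
		 (ext << 12) | ext, RREG32(mmUVD_LMI_ADDR_EXT));

	WREG32(mmUVD_LMI_EXT40_ADDR, ext40);
	dev_info(adev->dev, "mmUVD_LMI_EXT40_ADDR: wrote 0x%08X, read 0x%08X\n",
		 ext40, RREG32(mmUVD_LMI_EXT40_ADDR));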

* Re: [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
  2025-05-05  9:02         ` Christian König
@ 2025-05-05 16:06           ` John Olender
  2025-05-07 11:31             ` John Olender
  0 siblings, 1 reply; 28+ messages in thread
From: John Olender @ 2025-05-05 16:06 UTC (permalink / raw)
  To: Christian König, Alex Deucher; +Cc: amd-gfx, arunpravin.paneerselvam

On 5/5/25 5:02 AM, Christian König wrote:
>> Simply changing the uvd vcpu bo (and therefore the firmware) to always
>> be allocated in vram does *not* solve #3851.
>>
>> Let me go into a bit of depth about how I arrived at this patch.
>>
>> First, what sort of system configuration changes result in the uvd init
>> failure?  It looks like having a display connected and changing the BAR
>> size have an impact.  Next, which kernel change reliably triggers the
>> issue?  The change is the switch to the buddy allocator.
> 
> Well that is not a resizable BAR, but rather the "VRAM" is just stolen system memory and we completely bypass the BAR to access it.
> 
> But the effect is the same. E.g. you have more memory CPU accessible than otherwise.
> 
>>
>> Now that the issue can be reliably triggered, where does the error code,
>> -110 / -ETIMEDOUT, come from?  It turns out it's in
>> amdgpu_uvd_ring_test_ib(), specifically a timeout while waiting on the
>> ring's fence.
>>
>> With that out of the way, what allocator-related change happens when a
>> display is connected at startup?  The 'stolen_vga_memory' and related
>> bos are created.  Adding a one page dummy bo to the same place in the
>> driver can allow a headless configuration to now pass the uvd ring ib test.
>>
>> Why does having these extra objects allocated result in a change in
>> behavior?  Well, the switch to the buddy allocator drastically changes
>> *where* in vram various objects end up being placed.  What about the BAR
>> size change?  That ends up influencing where the objects are placed too.
>>
>> Which objects related to uvd end up being moved around?  The uvd code
>> has a function to force its objects into a specific segment after all.
>> Well, it turns out the vcpu bo doesn't go through this function and is
>> therefore being moved around.
> 
> That function is there because independent buffers (the message and the feedback for example) needs to be in the same 256MB segment.
> 
>> When the system configuration results in a ring ib timeout, the uvd vcpu
>> bo is pinned *outside* the uvd segment.  When uvd init succeeds, the uvd
>> vcpu bo is pinned *inside* the uvd segment.
>>
>> So, it appears there's a relationship between *where* the vcpu bo ends
>> up and the fence timeout.  But why does the issue manifest as a ring
>> fence timeout while testing the ib?  Unfortunately, I'm unable to find
>> something like a datasheet or developer's guide containing the finer
>> details of uvd.
> 
> 
> Mhm, there must be something wrong with programming bits 28-31 of the VCPU BO base address.
> 
> Forcing the VCPU into the first 256 segment just makes those bits zero and so makes it work on your system.
> 
> The problem is that this is basically just coincident. On other systems the base address can be completely different.
> 
> See function uvd_v4_2_mc_resume() where the mmUVD_LMI_ADDR_EXT and mmUVD_LMI_EXT40_ADDR register is programmed and try to hack those two register writes and see if they really end up in the HW.
> 
> I will try to find a Kaveri system which is still working to reproduce the issue.
> 
> Thanks,
> Christian.
> 

I first saw this issue with an s9150.  I had serious reservations about
reporting the issue because, in its default configuration, the s9150 has
no display output.  I needed to confirm that yes, this is a real issue,
and that I hadn't just shot myself in the foot by enabling broken
display hardware.

The issue affects all s9150s in a system, occurs in different slots and
numa nodes, still occurs when other hardware is added or removed, and
follows the s9150 from an x399 system to a significantly newer b650
system.

The Kaveri iGPU, while also impacted, mainly serves to show that yes,
this issue is happening on more than just some dodgy s9150 setup.

Anyway, hopefully these extra configuration details help narrow down the
problem.

Thanks,
John

>>
>> Well, what seems related in the code?  Where is the ring fence located?
>> It's placed inside the vcpu bo by amdgpu_fence_driver_start_ring().
>>
>> So, does this patch provide the correct solution to the problem?  Maybe
>> not.  But the solution seems plausible enough to at least send in the
>> patch for review.
>>
>> Thanks,
>> John



* Re: [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
  2025-05-05 16:06           ` John Olender
@ 2025-05-07 11:31             ` John Olender
  2025-05-29 23:15               ` John Olender
  0 siblings, 1 reply; 28+ messages in thread
From: John Olender @ 2025-05-07 11:31 UTC (permalink / raw)
  To: Christian König, Alex Deucher; +Cc: amd-gfx, arunpravin.paneerselvam

On 5/5/25 12:06 PM, John Olender wrote:
> On 5/5/25 5:02 AM, Christian König wrote:
>>> Simply changing the uvd vcpu bo (and therefore the firmware) to always
>>> be allocated in vram does *not* solve #3851.
>>>
>>> Let me go into a bit of depth about how I arrived at this patch.
>>>
>>> First, what sort of system configuration changes result in the uvd init
>>> failure?  It looks like having a display connected and changing the BAR
>>> size have an impact.  Next, which kernel change reliably triggers the
>>> issue?  The change is the switch to the buddy allocator.
>>
>> Well that is not a resizable BAR, but rather the "VRAM" is just stolen system memory and we completely bypass the BAR to access it.
>>
>> But the effect is the same. E.g. you have more memory CPU accessible than otherwise.
>>
>>>
>>> Now that the issue can be reliably triggered, where does the error code,
>>> -110 / -ETIMEDOUT, come from?  It turns out it's in
>>> amdgpu_uvd_ring_test_ib(), specifically a timeout while waiting on the
>>> ring's fence.
>>>
>>> With that out of the way, what allocator-related change happens when a
>>> display is connected at startup?  The 'stolen_vga_memory' and related
>>> bos are created.  Adding a one page dummy bo to the same place in the
>>> driver can allow a headless configuration to now pass the uvd ring ib test.
>>>
>>> Why does having these extra objects allocated result in a change in
>>> behavior?  Well, the switch to the buddy allocator drastically changes
>>> *where* in vram various objects end up being placed.  What about the BAR
>>> size change?  That ends up influencing where the objects are placed too.
>>>
>>> Which objects related to uvd end up being moved around?  The uvd code
>>> has a function to force its objects into a specific segment after all.
>>> Well, it turns out the vcpu bo doesn't go through this function and is
>>> therefore being moved around.
>>
>> That function is there because independent buffers (the message and the feedback for example) needs to be in the same 256MB segment.
>>
>>> When the system configuration results in a ring ib timeout, the uvd vcpu
>>> bo is pinned *outside* the uvd segment.  When uvd init succeeds, the uvd
>>> vcpu bo is pinned *inside* the uvd segment.
>>>
>>> So, it appears there's a relationship between *where* the vcpu bo ends
>>> up and the fence timeout.  But why does the issue manifest as a ring
>>> fence timeout while testing the ib?  Unfortunately, I'm unable to find
>>> something like a datasheet or developer's guide containing the finer
>>> details of uvd.
>>
>>
>> Mhm, there must be something wrong with programming bits 28-31 of the VCPU BO base address.
>>
>> Forcing the VCPU into the first 256 segment just makes those bits zero and so makes it work on your system.
>>
>> The problem is that this is basically just coincident. On other systems the base address can be completely different.
>>
>> See function uvd_v4_2_mc_resume() where the mmUVD_LMI_ADDR_EXT and mmUVD_LMI_EXT40_ADDR register is programmed and try to hack those two register writes and see if they really end up in the HW.

Okay, I did a read and compare after each write.

Both writes seem to go through on both the Kaveri and s9150:

Kaveri (512MB UMA Buffer):
amdgpu 0000:00:01.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_ADDR_EXT: gpu_addr=0xF41FA00000, addr=0x00000001, wrote 0x00001001, read 0x00001001 [same]
amdgpu 0000:00:01.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_EXT40_ADDR: gpu_addr=0xF41FA00000, addr=0x000000F4, wrote 0x800900F4, read 0x800900F4 [same]

s9150:
amdgpu 0000:41:00.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_ADDR_EXT: gpu_addr=0xF7FFA00000, addr=0x0000000F, wrote 0x0000F00F, read 0x0000F00F [same]
amdgpu 0000:41:00.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_EXT40_ADDR: gpu_addr=0xF7FFA00000, addr=0x000000F7, wrote 0x800900F7, read 0x800900F7 [same]

Thanks,
John

>>
>> I will try to find a Kaveri system which is still working to reproduce the issue.
>>
>> Thanks,
>> Christian.
>>
> 
> I first saw this issue with a s9150.  I had serious reservations about
> reporting the issue because, in its default configuration, the s9150 has
> no display output.  I needed to figure out that yes, this is a real
> issue, I didn't just shoot myself in the foot by enabling broken display
> hardware.
> 
> The issue affects all s9150s in a system, occurs in different slots and
> numa nodes, still occurs when other hardware is added or removed, and
> follows the s9150 from x399 to a significantly newer b650 system.
> 
> The Kaveri iGPU, while also impacted, mainly serves to show that yes,
> this issue is happening on more than just some dodgy s9150 setup.
> 
> Anyway, hopefully these extra configuration details help narrow down the
> problem.
> 
> Thanks,
> John
> 
>>>
>>> Well, what seems related in the code?  Where is the ring fence located?
>>> It's placed inside the vcpu bo by amdgpu_fence_driver_start_ring().
>>>
>>> So, does this patch provide the correct solution to the problem?  Maybe
>>> not.  But the solution seems plausible enough to at least send in the
>>> patch for review.
>>>
>>> Thanks,
>>> John
> 



* Re: [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram
  2025-05-03 12:23         ` Paneer Selvam, Arunpravin
@ 2025-05-11 20:33           ` Paneer Selvam, Arunpravin
  2025-05-11 20:37             ` Paneer Selvam, Arunpravin
  0 siblings, 1 reply; 28+ messages in thread
From: Paneer Selvam, Arunpravin @ 2025-05-11 20:33 UTC (permalink / raw)
  To: John Olender, Alex Deucher, Christian Koenig; +Cc: amd-gfx



On 5/3/2025 5:53 PM, Paneer Selvam, Arunpravin wrote:
>
>
> On 5/2/2025 9:02 PM, John Olender wrote:
>> On 4/30/25 5:44 PM, Paneer Selvam, Arunpravin wrote:
>>>
>>> On 5/1/2025 2:50 AM, Alex Deucher wrote:
>>>> + Christian
>>>>
>>>> On Tue, Apr 29, 2025 at 7:24 AM John Olender <john.olender@gmail.com>
>>>> wrote:
>>>>> The drm_mm allocator tolerated being passed end > mm->size, but the
>>>>> drm_buddy allocator does not.
>>>>>
>>>>> Restore the pre-buddy-allocator behavior of allowing such placements.
>>>>>
>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448
>>>>> Signed-off-by: John Olender <john.olender@gmail.com>
>>>> This looks correct to me.
>>>> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
>>> I was thinking that we should return an error when lpfn > man->size.
>>>
>>> Regards,
>>> Arun.
>> This patch restores the previous behavior in the spirit of "Do not crash
>> the kernel".  The existing uvd placements are pretty clear in their
>> intent and were accepted until the switch to drm_buddy.  I think it's
>> fair to consider their style as expected.
>>
>> With that in mind, I'm not sure amdgpu_vram_mgr is the place this change
>> really belongs.  That is, I think it's worth asking:
>>
>> 1) Why does drm_mm accept end > mm->size without complaint?
>> 2) Why doesn't drm_buddy do the same?
>
> I remember that during the development of DRM buddy , we had a 
> discussion with Intel folks and decided to
> return an error in DRM buddy when end > mm->size. This was done to 
> ensure that, at the driver level,  lpfn
> has the correct value.
>
> I will modify this at drm_buddy to match with drm_mm and send the patch.
After giving it some thought, I think it is more effective to implement
this tolerance at the VRAM manager level and allow the DRM buddy manager
to perform strict validation, as this is necessary for other graphics
drivers (e.g., i915).

Regards,
Arun.
>
> Regards,
> Arun.
>>
>> Thanks,
>> John
>>
>>>>> ---
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/
>>>>> gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>> index 2d7f82e98df9..abdc52b0895a 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>> @@ -463,7 +463,7 @@ static int amdgpu_vram_mgr_new(struct
>>>>> ttm_resource_manager *man,
>>>>>           int r;
>>>>>
>>>>>           lpfn = (u64)place->lpfn << PAGE_SHIFT;
>>>>> -       if (!lpfn)
>>>>> +       if (!lpfn || lpfn > man->size)
>>>>>                   lpfn = man->size;
>>>>>
>>>>>           fpfn = (u64)place->fpfn << PAGE_SHIFT;
>>>>> -- 
>>>>> 2.47.2
>>>>>
>



* Re: [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram
  2025-05-11 20:33           ` Paneer Selvam, Arunpravin
@ 2025-05-11 20:37             ` Paneer Selvam, Arunpravin
  2025-05-12  7:09               ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: Paneer Selvam, Arunpravin @ 2025-05-11 20:37 UTC (permalink / raw)
  To: John Olender, Alex Deucher, Christian Koenig; +Cc: amd-gfx




On 5/12/2025 2:03 AM, Paneer Selvam, Arunpravin wrote:
>
>
> On 5/3/2025 5:53 PM, Paneer Selvam, Arunpravin wrote:
>>
>>
>> On 5/2/2025 9:02 PM, John Olender wrote:
>>> On 4/30/25 5:44 PM, Paneer Selvam, Arunpravin wrote:
>>>>
>>>> On 5/1/2025 2:50 AM, Alex Deucher wrote:
>>>>> + Christian
>>>>>
>>>>> On Tue, Apr 29, 2025 at 7:24 AM John Olender <john.olender@gmail.com>
>>>>> wrote:
>>>>>> The drm_mm allocator tolerated being passed end > mm->size, but the
>>>>>> drm_buddy allocator does not.
>>>>>>
>>>>>> Restore the pre-buddy-allocator behavior of allowing such 
>>>>>> placements.
>>>>>>
>>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448
>>>>>> Signed-off-by: John Olender <john.olender@gmail.com>
>>>>> This looks correct to me.
>>>>> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
>>>> I was thinking that we should return an error when lpfn > man->size.
>>>>
>>>> Regards,
>>>> Arun.
>>> This patch restores the previous behavior in the spirit of "Do not 
>>> crash
>>> the kernel".  The existing uvd placements are pretty clear in their
>>> intent and were accepted until the switch to drm_buddy.  I think it's
>>> fair to consider their style as expected.
>>>
>>> With that in mind, I'm not sure amdgpu_vram_mgr is the place this 
>>> change
>>> really belongs.  That is, I think it's worth asking:
>>>
>>> 1) Why does drm_mm accept end > mm->size without complaint?
>>> 2) Why doesn't drm_buddy do the same?
>>
>> I remember that during the development of DRM buddy , we had a 
>> discussion with Intel folks and decided to
>> return an error in DRM buddy when end > mm->size. This was done to 
>> ensure that, at the driver level,  lpfn
>> has the correct value.
>>
>> I will modify this at drm_buddy to match with drm_mm and send the patch.
> After giving it some thought, I think it is more effective to 
> implement this tolerance at the VRAM manager level
> and allow the DRM buddy manager to perform a strict validation, as 
> this is necessary for other graphics drivers
> (e.g., i915).

Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
>
> Regards,
> Arun.
>>
>> Regards,
>> Arun.
>>>
>>> Thanks,
>>> John
>>>
>>>>>> ---
>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
>>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/
>>>>>> gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>>> index 2d7f82e98df9..abdc52b0895a 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>>> @@ -463,7 +463,7 @@ static int amdgpu_vram_mgr_new(struct
>>>>>> ttm_resource_manager *man,
>>>>>>           int r;
>>>>>>
>>>>>>           lpfn = (u64)place->lpfn << PAGE_SHIFT;
>>>>>> -       if (!lpfn)
>>>>>> +       if (!lpfn || lpfn > man->size)
>>>>>>                   lpfn = man->size;
>>>>>>
>>>>>>           fpfn = (u64)place->fpfn << PAGE_SHIFT;
>>>>>> -- 
>>>>>> 2.47.2
>>>>>>
>>
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram
  2025-05-11 20:37             ` Paneer Selvam, Arunpravin
@ 2025-05-12  7:09               ` Christian König
  2025-05-12  7:11                 ` Paneer Selvam, Arunpravin
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2025-05-12  7:09 UTC (permalink / raw)
  To: Paneer Selvam, Arunpravin, John Olender, Alex Deucher; +Cc: amd-gfx



On 5/11/25 22:37, Paneer Selvam, Arunpravin wrote:
> 
> 
> On 5/12/2025 2:03 AM, Paneer Selvam, Arunpravin wrote:
>>
>>
>> On 5/3/2025 5:53 PM, Paneer Selvam, Arunpravin wrote:
>>>
>>>
>>> On 5/2/2025 9:02 PM, John Olender wrote:
>>>> On 4/30/25 5:44 PM, Paneer Selvam, Arunpravin wrote:
>>>>>
>>>>> On 5/1/2025 2:50 AM, Alex Deucher wrote:
>>>>>> + Christian
>>>>>>
>>>>>> On Tue, Apr 29, 2025 at 7:24 AM John Olender <john.olender@gmail.com>
>>>>>> wrote:
>>>>>>> The drm_mm allocator tolerated being passed end > mm->size, but the
>>>>>>> drm_buddy allocator does not.
>>>>>>>
>>>>>>> Restore the pre-buddy-allocator behavior of allowing such placements.
>>>>>>>
>>>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448
>>>>>>> Signed-off-by: John Olender <john.olender@gmail.com>
>>>>>> This looks correct to me.
>>>>>> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
>>>>> I was thinking that we should return an error when lpfn > man->size.
>>>>>
>>>>> Regards,
>>>>> Arun.
>>>> This patch restores the previous behavior in the spirit of "Do not crash
>>>> the kernel".  The existing uvd placements are pretty clear in their
>>>> intent and were accepted until the switch to drm_buddy.  I think it's
>>>> fair to consider their style as expected.
>>>>
>>>> With that in mind, I'm not sure amdgpu_vram_mgr is the place this change
>>>> really belongs.  That is, I think it's worth asking:
>>>>
>>>> 1) Why does drm_mm accept end > mm->size without complaint?
>>>> 2) Why doesn't drm_buddy do the same?
>>>
>>> I remember that during the development of DRM buddy , we had a discussion with Intel folks and decided to
>>> return an error in DRM buddy when end > mm->size. This was done to ensure that, at the driver level,  lpfn
>>> has the correct value.
>>>
>>> I will modify this at drm_buddy to match with drm_mm and send the patch.
>> After giving it some thought, I think it is more effective to implement this tolerance at the VRAM manager level
>> and allow the DRM buddy manager to perform a strict validation, as this is necessary for other graphics drivers
>> (e.g., i915).
> 
> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>

Ok, in that case please pick this patch up and make sure that it lands in amd-staging-drm-next, Arun.

Alex most likely won't follow the discussion till the end.

Thanks,
Christian.

>>
>> Regards,
>> Arun.
>>>
>>> Regards,
>>> Arun.
>>>>
>>>> Thanks,
>>>> John
>>>>
>>>>>>> ---
>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
>>>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/
>>>>>>> gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>>>> index 2d7f82e98df9..abdc52b0895a 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>>>> @@ -463,7 +463,7 @@ static int amdgpu_vram_mgr_new(struct
>>>>>>> ttm_resource_manager *man,
>>>>>>>           int r;
>>>>>>>
>>>>>>>           lpfn = (u64)place->lpfn << PAGE_SHIFT;
>>>>>>> -       if (!lpfn)
>>>>>>> +       if (!lpfn || lpfn > man->size)
>>>>>>>                   lpfn = man->size;
>>>>>>>
>>>>>>>           fpfn = (u64)place->fpfn << PAGE_SHIFT;
>>>>>>> -- 
>>>>>>> 2.47.2
>>>>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram
  2025-05-12  7:09               ` Christian König
@ 2025-05-12  7:11                 ` Paneer Selvam, Arunpravin
  2025-05-15 15:49                   ` Paneer Selvam, Arunpravin
  0 siblings, 1 reply; 28+ messages in thread
From: Paneer Selvam, Arunpravin @ 2025-05-12  7:11 UTC (permalink / raw)
  To: Christian König, John Olender, Alex Deucher; +Cc: amd-gfx



On 5/12/2025 12:39 PM, Christian König wrote:
>
> On 5/11/25 22:37, Paneer Selvam, Arunpravin wrote:
>>
>> On 5/12/2025 2:03 AM, Paneer Selvam, Arunpravin wrote:
>>>
>>> On 5/3/2025 5:53 PM, Paneer Selvam, Arunpravin wrote:
>>>>
>>>> On 5/2/2025 9:02 PM, John Olender wrote:
>>>>> On 4/30/25 5:44 PM, Paneer Selvam, Arunpravin wrote:
>>>>>> On 5/1/2025 2:50 AM, Alex Deucher wrote:
>>>>>>> + Christian
>>>>>>>
>>>>>>> On Tue, Apr 29, 2025 at 7:24 AM John Olender <john.olender@gmail.com>
>>>>>>> wrote:
>>>>>>>> The drm_mm allocator tolerated being passed end > mm->size, but the
>>>>>>>> drm_buddy allocator does not.
>>>>>>>>
>>>>>>>> Restore the pre-buddy-allocator behavior of allowing such placements.
>>>>>>>>
>>>>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448
>>>>>>>> Signed-off-by: John Olender <john.olender@gmail.com>
>>>>>>> This looks correct to me.
>>>>>>> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
>>>>>> I was thinking that we should return an error when lpfn > man->size.
>>>>>>
>>>>>> Regards,
>>>>>> Arun.
>>>>> This patch restores the previous behavior in the spirit of "Do not crash
>>>>> the kernel".  The existing uvd placements are pretty clear in their
>>>>> intent and were accepted until the switch to drm_buddy.  I think it's
>>>>> fair to consider their style as expected.
>>>>>
>>>>> With that in mind, I'm not sure amdgpu_vram_mgr is the place this change
>>>>> really belongs.  That is, I think it's worth asking:
>>>>>
>>>>> 1) Why does drm_mm accept end > mm->size without complaint?
>>>>> 2) Why doesn't drm_buddy do the same?
>>>> I remember that during the development of DRM buddy , we had a discussion with Intel folks and decided to
>>>> return an error in DRM buddy when end > mm->size. This was done to ensure that, at the driver level,  lpfn
>>>> has the correct value.
>>>>
>>>> I will modify this at drm_buddy to match with drm_mm and send the patch.
>>> After giving it some thought, I think it is more effective to implement this tolerance at the VRAM manager level
>>> and allow the DRM buddy manager to perform a strict validation, as this is necessary for other graphics drivers
>>> (e.g., i915).
>> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
> Ok in that case please pick this patch up and make sure that it land in amd-staging-drm-next Arun.
>
> Alex most likely won't follow the discussion till the end.
Sure Christian, I will merge this patch into amd-staging-drm-next.

Thanks,
Arun.
>
> Thanks,
> Christian.
>
>>> Regards,
>>> Arun.
>>>> Regards,
>>>> Arun.
>>>>> Thanks,
>>>>> John
>>>>>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
>>>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/
>>>>>>>> gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>>>>> index 2d7f82e98df9..abdc52b0895a 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>>>>> @@ -463,7 +463,7 @@ static int amdgpu_vram_mgr_new(struct
>>>>>>>> ttm_resource_manager *man,
>>>>>>>>            int r;
>>>>>>>>
>>>>>>>>            lpfn = (u64)place->lpfn << PAGE_SHIFT;
>>>>>>>> -       if (!lpfn)
>>>>>>>> +       if (!lpfn || lpfn > man->size)
>>>>>>>>                    lpfn = man->size;
>>>>>>>>
>>>>>>>>            fpfn = (u64)place->fpfn << PAGE_SHIFT;
>>>>>>>> -- 
>>>>>>>> 2.47.2
>>>>>>>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram
  2025-05-12  7:11                 ` Paneer Selvam, Arunpravin
@ 2025-05-15 15:49                   ` Paneer Selvam, Arunpravin
  2025-05-23 11:31                     ` Paneer Selvam, Arunpravin
  0 siblings, 1 reply; 28+ messages in thread
From: Paneer Selvam, Arunpravin @ 2025-05-15 15:49 UTC (permalink / raw)
  To: Christian König, John Olender, Alex Deucher; +Cc: amd-gfx



On 5/12/2025 12:41 PM, Paneer Selvam, Arunpravin wrote:
>
>
> On 5/12/2025 12:39 PM, Christian König wrote:
>>
>> On 5/11/25 22:37, Paneer Selvam, Arunpravin wrote:
>>>
>>> On 5/12/2025 2:03 AM, Paneer Selvam, Arunpravin wrote:
>>>>
>>>> On 5/3/2025 5:53 PM, Paneer Selvam, Arunpravin wrote:
>>>>>
>>>>> On 5/2/2025 9:02 PM, John Olender wrote:
>>>>>> On 4/30/25 5:44 PM, Paneer Selvam, Arunpravin wrote:
>>>>>>> On 5/1/2025 2:50 AM, Alex Deucher wrote:
>>>>>>>> + Christian
>>>>>>>>
>>>>>>>> On Tue, Apr 29, 2025 at 7:24 AM John Olender 
>>>>>>>> <john.olender@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> The drm_mm allocator tolerated being passed end > mm->size, 
>>>>>>>>> but the
>>>>>>>>> drm_buddy allocator does not.
>>>>>>>>>
>>>>>>>>> Restore the pre-buddy-allocator behavior of allowing such 
>>>>>>>>> placements.
>>>>>>>>>
>>>>>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448
>>>>>>>>> Signed-off-by: John Olender <john.olender@gmail.com>
>>>>>>>> This looks correct to me.
>>>>>>>> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
>>>>>>> I was thinking that we should return an error when lpfn > 
>>>>>>> man->size.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Arun.
>>>>>> This patch restores the previous behavior in the spirit of "Do 
>>>>>> not crash
>>>>>> the kernel".  The existing uvd placements are pretty clear in their
>>>>>> intent and were accepted until the switch to drm_buddy. I think it's
>>>>>> fair to consider their style as expected.
>>>>>>
>>>>>> With that in mind, I'm not sure amdgpu_vram_mgr is the place this 
>>>>>> change
>>>>>> really belongs.  That is, I think it's worth asking:
>>>>>>
>>>>>> 1) Why does drm_mm accept end > mm->size without complaint?
>>>>>> 2) Why doesn't drm_buddy do the same?
>>>>> I remember that during the development of DRM buddy , we had a 
>>>>> discussion with Intel folks and decided to
>>>>> return an error in DRM buddy when end > mm->size. This was done to 
>>>>> ensure that, at the driver level,  lpfn
>>>>> has the correct value.
>>>>>
>>>>> I will modify this at drm_buddy to match with drm_mm and send the 
>>>>> patch.
>>>> After giving it some thought, I think it is more effective to 
>>>> implement this tolerance at the VRAM manager level
>>>> and allow the DRM buddy manager to perform a strict validation, as 
>>>> this is necessary for other graphics drivers
>>>> (e.g., i915).
>>> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
>> Ok in that case please pick this patch up and make sure that it land 
>> in amd-staging-drm-next Arun.
>>
>> Alex most likely won't follow the discussion till the end.
> Sure Christian, I will merge this patch into amd-staging-drm-next.
I see a black screen problem during amdgpu driver load on the tip of
amd-staging-drm-next; once that issue is fixed, I will push this patch
into amd-staging-drm-next.

Regards,
Arun.
>
> Thanks,
> Arun.
>>
>> Thanks,
>> Christian.
>>
>>>> Regards,
>>>> Arun.
>>>>> Regards,
>>>>> Arun.
>>>>>> Thanks,
>>>>>> John
>>>>>>
>>>>>>>>> ---
>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
>>>>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
>>>>>>>>> b/drivers/
>>>>>>>>> gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>>>>>> index 2d7f82e98df9..abdc52b0895a 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>>>>>> @@ -463,7 +463,7 @@ static int amdgpu_vram_mgr_new(struct
>>>>>>>>> ttm_resource_manager *man,
>>>>>>>>>            int r;
>>>>>>>>>
>>>>>>>>>            lpfn = (u64)place->lpfn << PAGE_SHIFT;
>>>>>>>>> -       if (!lpfn)
>>>>>>>>> +       if (!lpfn || lpfn > man->size)
>>>>>>>>>                    lpfn = man->size;
>>>>>>>>>
>>>>>>>>>            fpfn = (u64)place->fpfn << PAGE_SHIFT;
>>>>>>>>> -- 
>>>>>>>>> 2.47.2
>>>>>>>>>
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram
  2025-05-15 15:49                   ` Paneer Selvam, Arunpravin
@ 2025-05-23 11:31                     ` Paneer Selvam, Arunpravin
  0 siblings, 0 replies; 28+ messages in thread
From: Paneer Selvam, Arunpravin @ 2025-05-23 11:31 UTC (permalink / raw)
  To: Christian König, John Olender, Alex Deucher; +Cc: amd-gfx



On 5/15/2025 9:19 PM, Paneer Selvam, Arunpravin wrote:
>
>
> On 5/12/2025 12:41 PM, Paneer Selvam, Arunpravin wrote:
>>
>>
>> On 5/12/2025 12:39 PM, Christian König wrote:
>>>
>>> On 5/11/25 22:37, Paneer Selvam, Arunpravin wrote:
>>>>
>>>> On 5/12/2025 2:03 AM, Paneer Selvam, Arunpravin wrote:
>>>>>
>>>>> On 5/3/2025 5:53 PM, Paneer Selvam, Arunpravin wrote:
>>>>>>
>>>>>> On 5/2/2025 9:02 PM, John Olender wrote:
>>>>>>> On 4/30/25 5:44 PM, Paneer Selvam, Arunpravin wrote:
>>>>>>>> On 5/1/2025 2:50 AM, Alex Deucher wrote:
>>>>>>>>> + Christian
>>>>>>>>>
>>>>>>>>> On Tue, Apr 29, 2025 at 7:24 AM John Olender 
>>>>>>>>> <john.olender@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> The drm_mm allocator tolerated being passed end > mm->size, 
>>>>>>>>>> but the
>>>>>>>>>> drm_buddy allocator does not.
>>>>>>>>>>
>>>>>>>>>> Restore the pre-buddy-allocator behavior of allowing such 
>>>>>>>>>> placements.
>>>>>>>>>>
>>>>>>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448
>>>>>>>>>> Signed-off-by: John Olender <john.olender@gmail.com>
>>>>>>>>> This looks correct to me.
>>>>>>>>> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>> I was thinking that we should return an error when lpfn > 
>>>>>>>> man->size.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Arun.
>>>>>>> This patch restores the previous behavior in the spirit of "Do 
>>>>>>> not crash
>>>>>>> the kernel".  The existing uvd placements are pretty clear in their
>>>>>>> intent and were accepted until the switch to drm_buddy. I think 
>>>>>>> it's
>>>>>>> fair to consider their style as expected.
>>>>>>>
>>>>>>> With that in mind, I'm not sure amdgpu_vram_mgr is the place 
>>>>>>> this change
>>>>>>> really belongs.  That is, I think it's worth asking:
>>>>>>>
>>>>>>> 1) Why does drm_mm accept end > mm->size without complaint?
>>>>>>> 2) Why doesn't drm_buddy do the same?
>>>>>> I remember that during the development of DRM buddy , we had a 
>>>>>> discussion with Intel folks and decided to
>>>>>> return an error in DRM buddy when end > mm->size. This was done 
>>>>>> to ensure that, at the driver level,  lpfn
>>>>>> has the correct value.
>>>>>>
>>>>>> I will modify this at drm_buddy to match with drm_mm and send the 
>>>>>> patch.
>>>>> After giving it some thought, I think it is more effective to 
>>>>> implement this tolerance at the VRAM manager level
>>>>> and allow the DRM buddy manager to perform a strict validation, as 
>>>>> this is necessary for other graphics drivers
>>>>> (e.g., i915).
>>>> Reviewed-by: Arunpravin Paneer Selvam 
>>>> <Arunpravin.PaneerSelvam@amd.com>
>>> Ok in that case please pick this patch up and make sure that it land 
>>> in amd-staging-drm-next Arun.
>>>
>>> Alex most likely won't follow the discussion till the end.
>> Sure Christian, I will merge this patch into amd-staging-drm-next.
> I see a black screen problem during the amdgpu driver load on the tip 
> of amd-staging-drm-next,
> once that issue is fixed, I will push this patch into 
> amd-staging-drm-next.
I have merged this patch into amd-staging-drm-next.

Thanks,
Arun.
>
> Regards,
> Arun.
>>
>> Thanks,
>> Arun.
>>>
>>> Thanks,
>>> Christian.
>>>
>>>>> Regards,
>>>>> Arun.
>>>>>> Regards,
>>>>>> Arun.
>>>>>>> Thanks,
>>>>>>> John
>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
>>>>>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
>>>>>>>>>> b/drivers/
>>>>>>>>>> gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>>>>>>> index 2d7f82e98df9..abdc52b0895a 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>>>>>>>> @@ -463,7 +463,7 @@ static int amdgpu_vram_mgr_new(struct
>>>>>>>>>> ttm_resource_manager *man,
>>>>>>>>>>            int r;
>>>>>>>>>>
>>>>>>>>>>            lpfn = (u64)place->lpfn << PAGE_SHIFT;
>>>>>>>>>> -       if (!lpfn)
>>>>>>>>>> +       if (!lpfn || lpfn > man->size)
>>>>>>>>>>                    lpfn = man->size;
>>>>>>>>>>
>>>>>>>>>>            fpfn = (u64)place->fpfn << PAGE_SHIFT;
>>>>>>>>>> -- 
>>>>>>>>>> 2.47.2
>>>>>>>>>>
>>
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
  2025-05-07 11:31             ` John Olender
@ 2025-05-29 23:15               ` John Olender
  2025-06-02 10:00                 ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: John Olender @ 2025-05-29 23:15 UTC (permalink / raw)
  To: Christian König, Alex Deucher; +Cc: amd-gfx, arunpravin.paneerselvam

Ping.

On 5/7/25 7:31 AM, John Olender wrote:
> On 5/5/25 12:06 PM, John Olender wrote:
>> On 5/5/25 5:02 AM, Christian König wrote:
>>>> Simply changing the uvd vcpu bo (and therefore the firmware) to always
>>>> be allocated in vram does *not* solve #3851.
>>>>
>>>> Let me go into a bit of depth about how I arrived at this patch.
>>>>
>>>> First, what sort of system configuration changes result in the uvd init
>>>> failure?  It looks like having a display connected and changing the BAR
>>>> size have an impact.  Next, which kernel change reliably triggers the
>>>> issue?  The change is the switch to the buddy allocator.
>>>
>>> Well that is not a resizable BAR, but rather the "VRAM" is just stolen system memory and we completely bypass the BAR to access it.
>>>
>>> But the effect is the same. E.g. you have more memory CPU accessible than otherwise.
>>>
>>>>
>>>> Now that the issue can be reliably triggered, where does the error code,
>>>> -110 / -ETIMEDOUT, come from?  It turns out it's in
>>>> amdgpu_uvd_ring_test_ib(), specifically a timeout while waiting on the
>>>> ring's fence.
>>>>
>>>> With that out of the way, what allocator-related change happens when a
>>>> display is connected at startup?  The 'stolen_vga_memory' and related
>>>> bos are created.  Adding a one page dummy bo to the same place in the
>>>> driver can allow a headless configuration to now pass the uvd ring ib test.
>>>>
>>>> Why does having these extra objects allocated result in a change in
>>>> behavior?  Well, the switch to the buddy allocator drastically changes
>>>> *where* in vram various objects end up being placed.  What about the BAR
>>>> size change?  That ends up influencing where the objects are placed too.
>>>>
>>>> Which objects related to uvd end up being moved around?  The uvd code
>>>> has a function to force its objects into a specific segment after all.
>>>> Well, it turns out the vcpu bo doesn't go through this function and is
>>>> therefore being moved around.
>>>
>>> That function is there because independent buffers (the message and the feedback for example) needs to be in the same 256MB segment.
>>>
>>>> When the system configuration results in a ring ib timeout, the uvd vcpu
>>>> bo is pinned *outside* the uvd segment.  When uvd init succeeds, the uvd
>>>> vcpu bo is pinned *inside* the uvd segment.
>>>>
>>>> So, it appears there's a relationship between *where* the vcpu bo ends
>>>> up and the fence timeout.  But why does the issue manifest as a ring
>>>> fence timeout while testing the ib?  Unfortunately, I'm unable to find
>>>> something like a datasheet or developer's guide containing the finer
>>>> details of uvd.
>>>
>>>
>>> Mhm, there must be something wrong with programming bits 28-31 of the VCPU BO base address.
>>>
>>> Forcing the VCPU into the first 256 segment just makes those bits zero and so makes it work on your system.
>>>
>>> The problem is that this is basically just coincident. On other systems the base address can be completely different.
>>>
>>> See function uvd_v4_2_mc_resume() where the mmUVD_LMI_ADDR_EXT and mmUVD_LMI_EXT40_ADDR register is programmed and try to hack those two register writes and see if they really end up in the HW.
> 
> Okay, I did a read and compare after each write.
> 
> Both writes seem to go through on both the Kaveri and s9150:
> 
> Kaveri (512MB UMA Buffer):
> amdgpu 0000:00:01.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_ADDR_EXT: gpu_addr=0xF41FA00000, addr=0x00000001, wrote 0x00001001, read 0x00001001 [same]
> amdgpu 0000:00:01.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_EXT40_ADDR: gpu_addr=0xF41FA00000, addr=0x000000F4, wrote 0x800900F4, read 0x800900F4 [same]
> 
> s9150:
> amdgpu 0000:41:00.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_ADDR_EXT: gpu_addr=0xF7FFA00000, addr=0x0000000F, wrote 0x0000F00F, read 0x0000F00F [same]
> amdgpu 0000:41:00.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_EXT40_ADDR: gpu_addr=0xF7FFA00000, addr=0x000000F7, wrote 0x800900F7, read 0x800900F7 [same]
> 

I've also confirmed the patch works fine when segments other than
[0, 256M) are used.
 
E.g.: Both init and VA-API playback work fine with a UVD segment of
[1792M, 2048M) on Kaveri with a 2G UMA buffer.

> Thanks,
> John
> 
>>>
>>> I will try to find a Kaveri system which is still working to reproduce the issue.
>>>
>>> Thanks,
>>> Christian.
>>>
>>
>> I first saw this issue with a s9150.  I had serious reservations about
>> reporting the issue because, in its default configuration, the s9150 has
>> no display output.  I needed to figure out that yes, this is a real
>> issue, I didn't just shoot myself in the foot by enabling broken display
>> hardware.
>>
>> The issue affects all s9150s in a system, occurs in different slots and
>> numa nodes, still occurs when other hardware is added or removed, and
>> follows the s9150 from x399 to a significantly newer b650 system.
>>
>> The Kaveri iGPU, while also impacted, mainly serves to show that yes,
>> this issue is happening on more than just some dodgy s9150 setup.
>>
>> Anyway, hopefully these extra configuration details help narrow down the
>> problem.
>>
>> Thanks,
>> John
>>
>>>>
>>>> Well, what seems related in the code?  Where is the ring fence located?
>>>> It's placed inside the vcpu bo by amdgpu_fence_driver_start_ring().
>>>>
>>>> So, does this patch provide the correct solution to the problem?  Maybe
>>>> not.  But the solution seems plausible enough to at least send in the
>>>> patch for review.
>>>>
>>>> Thanks,
>>>> John
>>
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
  2025-05-29 23:15               ` John Olender
@ 2025-06-02 10:00                 ` Christian König
  2025-06-02 13:03                   ` John Olender
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2025-06-02 10:00 UTC (permalink / raw)
  To: John Olender, Alex Deucher; +Cc: amd-gfx, arunpravin.paneerselvam

Hi John,

first of all, thanks a lot for looking into this.

>> Okay, I did a read and compare after each write.
>>
>> Both writes seem to go through on both the Kaveri and s9150:
>>
>> Kaveri (512MB UMA Buffer):
>> amdgpu 0000:00:01.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_ADDR_EXT: gpu_addr=0xF41FA00000, addr=0x00000001, wrote 0x00001001, read 0x00001001 [same]
>> amdgpu 0000:00:01.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_EXT40_ADDR: gpu_addr=0xF41FA00000, addr=0x000000F4, wrote 0x800900F4, read 0x800900F4 [same]
>>
>> s9150:
>> amdgpu 0000:41:00.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_ADDR_EXT: gpu_addr=0xF7FFA00000, addr=0x0000000F, wrote 0x0000F00F, read 0x0000F00F [same]
>> amdgpu 0000:41:00.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_EXT40_ADDR: gpu_addr=0xF7FFA00000, addr=0x000000F7, wrote 0x800900F7, read 0x800900F7 [same]
>>
> 
> I've also confirmed the patch works fine when segments other than
> [0, 256M) are used.
>  
> E.g.: Both init and VA-API playback work fine with a UVD segment of
> [1792M, 2048M) on Kaveri with a 2G UMA buffer.

Oh, that's a very interesting find. Could you try to turn around the way the patch works?

E.g. instead of forcing the UVD FW into the first segment, change amdgpu_uvd_force_into_uvd_segment() so that the BOs are forced into the same segment as the UVD firmware.

That would resolve my concern that this could overload the first segment. The feedback and message BOs are usually rather small (4k or 128k IIRC), but the firmware is a couple of megabytes in size.

When we have other FW and VGA emulation buffers in the first segment as well, that could end up crowding that segment too much.
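
Very roughly what I have in mind (completely untested sketch; the extra
fw_gpu_addr parameter, i.e. the vcpu bo address after pinning, is the
new piece):

static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo,
					      u64 fw_gpu_addr)
{
	/* the 256MiB window that contains the firmware */
	u64 seg_start = fw_gpu_addr & ~(u64)(SZ_256M - 1);
	unsigned int i;

	for (i = 0; i < abo->placement.num_placement; ++i) {
		abo->placements[i].fpfn = seg_start >> PAGE_SHIFT;
		abo->placements[i].lpfn = (seg_start + SZ_256M) >> PAGE_SHIFT;
	}
}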

Thanks,
Christian.

>>>> I will try to find a Kaveri system which is still working to reproduce the issue.

I unfortunately couldn't find a working box on hand. I would need to search our HW stash for a box which still works and get that shipped to me.

And that is overhead I would rather avoid for this issue.

So if you can come up with a simpler patch which works for you I'm happy to take that.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
  2025-06-02 10:00                 ` Christian König
@ 2025-06-02 13:03                   ` John Olender
  2025-06-03 14:34                     ` John Olender
  0 siblings, 1 reply; 28+ messages in thread
From: John Olender @ 2025-06-02 13:03 UTC (permalink / raw)
  To: Christian König, Alex Deucher; +Cc: amd-gfx, arunpravin.paneerselvam

On 6/2/25 6:00 AM, Christian König wrote:
> Hi John,
> 
> first of all thanks a lot for taking a look into this.
> 
>>> Okay, I did a read and compare after each write.
>>>
>>> Both writes seem to go through on both the Kaveri and s9150:
>>>
>>> Kaveri (512MB UMA Buffer):
>>> amdgpu 0000:00:01.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_ADDR_EXT: gpu_addr=0xF41FA00000, addr=0x00000001, wrote 0x00001001, read 0x00001001 [same]
>>> amdgpu 0000:00:01.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_EXT40_ADDR: gpu_addr=0xF41FA00000, addr=0x000000F4, wrote 0x800900F4, read 0x800900F4 [same]
>>>
>>> s9150:
>>> amdgpu 0000:41:00.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_ADDR_EXT: gpu_addr=0xF7FFA00000, addr=0x0000000F, wrote 0x0000F00F, read 0x0000F00F [same]
>>> amdgpu 0000:41:00.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_EXT40_ADDR: gpu_addr=0xF7FFA00000, addr=0x000000F7, wrote 0x800900F7, read 0x800900F7 [same]
>>>
>>
>> I've also confirmed the patch works fine when segments other than
>> [0, 256M) are used.
>>  
>> E.g.: Both init and VA-API playback work fine with a UVD segment of
>> [1792M, 2048M) on Kaveri with a 2G UMA buffer.
> 
> Oh, that's a very interesting find. Could you try to turn around the way the patch works?
> 
> E.g. instead of forcing the UVD FW into the first segment, change amdgpu_uvd_force_into_uvd_segment() so that the BOs are forced into the same segment as the UVD firmware.
> 
> That would resolve my concern that this could overload the first segment. The feedback and message BO are usually rather small (4 or 128k IIRC), but the firmware is a couple of megabytes in size.
> 
> When we have other FW and VGA emulation buffers in the first segment as well then that could result into clashing that segment to much.
> 
> Thanks,
> Christian.
> 

Okay, yeah, that should make for a significantly simpler fix.

>>>>> I will try to find a Kaveri system which is still working to reproduce the issue.
> 
> I unfortunately couldn't find a working box of hand. Would need to search in our HW stash for a box which still works and get that shipped to me.
> 
> And that is overhead for this issue I would rather like to avoid.
> 
> So if you can come up with a simpler patch which works for you I'm happy to take that.
> 
> Regards,
> Christian.

Sounds good.  I'll get started on this variant.

Thanks,
John


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
  2025-06-02 13:03                   ` John Olender
@ 2025-06-03 14:34                     ` John Olender
  2025-06-03 16:26                       ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: John Olender @ 2025-06-03 14:34 UTC (permalink / raw)
  To: Christian König, Alex Deucher; +Cc: amd-gfx, arunpravin.paneerselvam

>> Oh, that's a very interesting find. Could you try to turn around the way the patch works?
>>
>> E.g. instead of forcing the UVD FW into the first segment, change amdgpu_uvd_force_into_uvd_segment() so that the BOs are forced into the same segment as the UVD firmware.
>>

I started implementing this and I realized two main problems with this
approach.

First, there's currently no guarantee the UVD FW does not cross a 256MB
boundary.  Checking for this and providing a fallback is going to make
this patch... not really any less complex than the original.

Second, most of the time this is just going to end up selecting the first
segment anyway.  I'll go into this more below.

>> That would resolve my concern that this could overload the first segment. The feedback and message BO are usually rather small (4 or 128k IIRC), but the firmware is a couple of megabytes in size.
>>
>> When we have other FW and VGA emulation buffers in the first segment as well then that could result into clashing that segment to much.
>>

During my initial investigation, I found out that the UVD FW got placed
in the first segment *because* things were already placed there.  This
is why adding a 'stolen_vga_memory' substitute was an effective workaround.

So, CIK is *already* forced to deal with an overloaded first segment
and, with the inverted approach, will continue to do so for typical use
cases.  Explicitly placing the UVD FW into the first segment just makes
this guaranteed.

I did implement a module parameter for testing that allows designating a
specific 256MB segment as the legacy UVD segment.  I can polish this up
so the user has the option to manually relieve some of the first segment
pressure on SI and CIK devices.
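
For reference, the knob itself is nothing fancy; roughly (the parameter
name is just what I picked locally, untested outside my setup):

static uint amdgpu_uvd_segment; /* index of the 256MiB window, 0 = first */
module_param_named(uvd_segment, amdgpu_uvd_segment, uint, 0444);
MODULE_PARM_DESC(uvd_segment,
	"256MiB VRAM window used as the legacy UVD segment (default 0)");

with the forced placement bounds then derived as:

	u64 seg_start = (u64)amdgpu_uvd_segment * SZ_256M;
	u64 seg_end = seg_start + SZ_256M;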

I haven't run into a situation where I've needed this during normal use,
but I can certainly appreciate it being available.

Thanks,
John

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
  2025-06-03 14:34                     ` John Olender
@ 2025-06-03 16:26                       ` Christian König
  2025-06-03 17:52                         ` John Olender
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2025-06-03 16:26 UTC (permalink / raw)
  To: John Olender, Alex Deucher; +Cc: amd-gfx, arunpravin.paneerselvam



On 6/3/25 16:34, John Olender wrote:
>>> Oh, that's a very interesting find. Could you try to turn around the way the patch works?
>>>
>>> E.g. instead of forcing the UVD FW into the first segment, change amdgpu_uvd_force_into_uvd_segment() so that the BOs are forced into the same segment as the UVD firmware.
>>>
> 
> I started implementing this and I realized two main problems with this
> approach.
> 
> First, there's currently no guarantee the UVD FW does not cross a 256MB
> boundary.

There actually is a guarantee for that.

FW BOs are always allocated contiguously, and the allocation backend makes sure that, for example, a 16MiB (rounded up) FW is always aligned to a 16MiB boundary.

So FW memory never crosses a segment boundary larger than its rounded-up size. Without that, tons of things in our driver would fall apart, not just UVD :)
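
To make that concrete (toy math, not driver code): for any FW smaller
than 256MiB the rounded-up power-of-two size divides SZ_256M, so for an
allocation aligned to that size

	(start >> 28) == ((start + size - 1) >> 28)

always holds, i.e. start and end land in the same 256MiB window.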

> Checking for this and providing a fallback is going to make
> this patch... not really any less complex than the original.
> 
> Second, most of time this is just going to end up selecting the first
> segment anyway.  I'll go more into this below.

That's fine with me. I'm just concerned about the 1% of people for whom it potentially worked before and whom we would now break.

> 
>>> That would resolve my concern that this could overload the first segment. The feedback and message BO are usually rather small (4 or 128k IIRC), but the firmware is a couple of megabytes in size.
>>>
>>> When we have other FW and VGA emulation buffers in the first segment as well then that could result into clashing that segment to much.
>>>
> 
> During my initial investigation, I found out that the UVD FW got placed
> in the first segment *because* things were already placed there.  This
> is why adding a 'stolen_vga_memory' substitute was an effective workaround.
> 
> So, CIK is *already* forced to deal with an overloaded first segment
> and, with the inverted approach, will continue to do so for typical use
> cases.  Explicitly placing the UVD FW into the first segment just makes
> this guaranteed.
> 
> I did implement a module parameter for testing that allows designating a
> specific 256MB segment as the legacy UVD segment.  I can polish this up
> so the user has the option to manually relieve some of the first segment
> pressure on SI and CIK devices.

The module parameter is fine for your testing, but that is not something I would accept upstream.

As soon as things no longer work automatically, people will start to complain. A manual workaround is only feasible for a minority of users.

Regards,
Christian.

> 
> I haven't run into a situation where I've needed this during normal use,
> but I can certainly appreciate it being available.
> 
> Thanks,
> John


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
  2025-06-03 16:26                       ` Christian König
@ 2025-06-03 17:52                         ` John Olender
  0 siblings, 0 replies; 28+ messages in thread
From: John Olender @ 2025-06-03 17:52 UTC (permalink / raw)
  To: Christian König, Alex Deucher; +Cc: amd-gfx, arunpravin.paneerselvam

On 6/3/25 12:26 PM, Christian König wrote:
> 
> 
> On 6/3/25 16:34, John Olender wrote:
>>>> Oh, that's a very interesting find. Could you try to turn around the way the patch works?
>>>>
>>>> E.g. instead of forcing the UVD FW into the first segment, change amdgpu_uvd_force_into_uvd_segment() so that the BOs are forced into the same segment as the UVD firmware.
>>>>
>>
>> I started implementing this and I realized two main problems with this
>> approach.
>>
>> First, there's currently no guarantee the UVD FW does not cross a 256MB
>> boundary.
> 
> There actually is a guarantee for that.
> 
> FW BOs are always allocated contiguously, and the allocation backend makes sure that for example an 16MiB (rounded up) FW is always aligned to a 16MiB boundary.
> 
> So FW memory never crosses a segment boundary larger than it's rounded up size. Without that tons of things in our driver would fall apart, not just UVD :)

Aha, thanks!

> 
>   Checking for this and providing a fallback is going to make
>> this patch... not really any less complex than the original.
>>
>> Second, most of time this is just going to end up selecting the first
>> segment anyway.  I'll go more into this below.
> 
> That's fine with me. I'm just concerned about that 1% of people where it potentially worked before and we are now breaking it.
> 
>>
>>>> That would resolve my concern that this could overload the first segment. The feedback and message BO are usually rather small (4 or 128k IIRC), but the firmware is a couple of megabytes in size.
>>>>
>>>> When we have other FW and VGA emulation buffers in the first segment as well then that could result into clashing that segment to much.
>>>>

Okay, this makes much more sense.  I read this as a general concern and put
the brakes on, especially after missing the above.

I'll resume work on the simplified patch.

Thanks again,
John

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
  2025-05-02  8:36     ` Christian König
  2025-05-03 20:31       ` John Olender
@ 2025-06-05  9:21       ` John Olender
  2025-06-05  9:54         ` Christian König
  1 sibling, 1 reply; 28+ messages in thread
From: John Olender @ 2025-06-05  9:21 UTC (permalink / raw)
  To: Christian König, Alex Deucher; +Cc: amd-gfx, arunpravin.paneerselvam

On 5/2/25 4:36 AM, Christian König wrote:
>> I think we can just make this VRAM only.  Or something like:
>> adev->uvd.address_64_bit ? AMDGPU_GEM_DOMAIN_GTT : AMDGPU_GEM_DOMAIN_VRAM
> Yeah completely agree. It's a good catch, but the solution is incorrect.
> 
> On the older UVD MC interface the FW needs to be in VRAM or the validation fails. If it's inside the window for the message and fence is actually irrelevant.
> 
> So something like AMDGPU_GEM_DOMAIN_VRAM | (adev->uvd.address_64_bit ? AMDGPU_GEM_DOMAIN_GTT : 0) would be correct I think.
> 
>> If that fixes it, this should be tagged with:
>> Fixes: 58ab2c08d708 ("drm/amdgpu: use VRAM|GTT for a bunch of kernel
>> allocations")
> And CC stable I think.
> 
> Regards,
> Christian.
> 
>> Alex

The simplified patch needs this change to the vcpu bo domain.  Would you
prefer this change as a separate patch?
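
For concreteness, what I'm carrying locally is roughly (untested
sketch; assumes address_64_bit is already known when the vcpu bo is
created in amdgpu_uvd_sw_init()):

	/* old UVD MC interface needs the FW in VRAM */
	u32 domain = AMDGPU_GEM_DOMAIN_VRAM |
		     (adev->uvd.address_64_bit ? AMDGPU_GEM_DOMAIN_GTT : 0);

	r = amdgpu_bo_create_kernel(adev, bo_size, PAGE_SIZE, domain,
				    &adev->uvd.inst[j].vcpu_bo,
				    &adev->uvd.inst[j].gpu_addr,
				    &adev->uvd.inst[j].cpu_addr);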

Thanks,
John

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
  2025-06-05  9:21       ` John Olender
@ 2025-06-05  9:54         ` Christian König
  0 siblings, 0 replies; 28+ messages in thread
From: Christian König @ 2025-06-05  9:54 UTC (permalink / raw)
  To: John Olender, Alex Deucher; +Cc: amd-gfx, arunpravin.paneerselvam

On 6/5/25 11:21, John Olender wrote:
> On 5/2/25 4:36 AM, Christian König wrote:
>>> I think we can just make this VRAM only.  Or something like:
>>> adev->uvd.address_64_bit ? AMDGPU_GEM_DOMAIN_GTT : AMDGPU_GEM_DOMAIN_VRAM
>> Yeah completely agree. It's a good catch, but the solution is incorrect.
>>
>> On the older UVD MC interface the FW needs to be in VRAM or the validation fails. If it's inside the window for the message and fence is actually irrelevant.
>>
>> So something like AMDGPU_GEM_DOMAIN_VRAM | (adev->uvd.address_64_bit ? AMDGPU_GEM_DOMAIN_GTT : 0) would be correct I think.
>>
>>> If that fixes it, this should be tagged with:
>>> Fixes: 58ab2c08d708 ("drm/amdgpu: use VRAM|GTT for a bunch of kernel
>>> allocations")
>> And CC stable I think.
>>
>> Regards,
>> Christian.
>>
>>> Alex
> 
> The simplified patch needs this change to the vcpu bo domain.  Would you
> prefer this change as a separate patch?

A separate one, please.

Just send it as the first one in a series.

Regards,
Christian.

> 
> Thanks,
> John


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2025-06-05  9:54 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-29 11:24 [RFC PATCH 0/2] drm/amdgpu: CIK UVD initialization fixes John Olender
2025-04-29 11:24 ` [RFC PATCH 1/2] drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram John Olender
2025-04-30 21:20   ` Alex Deucher
2025-04-30 21:44     ` Paneer Selvam, Arunpravin
2025-05-02 15:32       ` John Olender
2025-05-03 12:23         ` Paneer Selvam, Arunpravin
2025-05-11 20:33           ` Paneer Selvam, Arunpravin
2025-05-11 20:37             ` Paneer Selvam, Arunpravin
2025-05-12  7:09               ` Christian König
2025-05-12  7:11                 ` Paneer Selvam, Arunpravin
2025-05-15 15:49                   ` Paneer Selvam, Arunpravin
2025-05-23 11:31                     ` Paneer Selvam, Arunpravin
2025-05-02  8:28     ` Christian König
2025-04-29 11:24 ` [RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment John Olender
2025-04-30 21:39   ` Alex Deucher
2025-05-02  8:36     ` Christian König
2025-05-03 20:31       ` John Olender
2025-05-05  9:02         ` Christian König
2025-05-05 16:06           ` John Olender
2025-05-07 11:31             ` John Olender
2025-05-29 23:15               ` John Olender
2025-06-02 10:00                 ` Christian König
2025-06-02 13:03                   ` John Olender
2025-06-03 14:34                     ` John Olender
2025-06-03 16:26                       ` Christian König
2025-06-03 17:52                         ` John Olender
2025-06-05  9:21       ` John Olender
2025-06-05  9:54         ` Christian König

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox