Re: [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Timur Kristóf" <timur.kristof@gmail.com>
To: amd-gfx@lists.freedesktop.org,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Natalie Vock" <natalie.vock@gmx.de>,
	"John Olender" <john.olender@gmail.com>,
	"Liu Leo" <Leo.Liu@amd.com>,
	"Christian König" <christian.koenig@amd.com>
Subject: Re: [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK
Date: Tue, 19 May 2026 11:16:29 +0200	[thread overview]
Message-ID: <3693368.dWV9SEqChM@timur-hyperion> (raw)
In-Reply-To: <97a4608b-133b-4c87-ab61-ea45c638693d@amd.com>

On Tuesday, May 19, 2026 11:01:48 AM Central European Summer Time Christian 
König wrote:
> On 5/19/26 10:59, Timur Kristóf wrote:
> > On Tuesday, May 19, 2026 10:54:10 AM Central European Summer Time
> > Christian
> > 
> > König wrote:
> >> On 5/19/26 10:22, Timur Kristóf wrote:
> >>> UVD 4.x and older require that BOs don't cross 256M segments.
> >>> We need to respect that in amdgpu_ttm_alloc_gart().
> >>> We can't move the BOs later because GTT->GTT moves are
> >>> not implemented. We also can't force all BOs to VRAM
> >>> because that becomes very problematic in low VRAM scenarios.
> >>> 
> >>> This fixes UVD CS BOs crossing 256M segments
> >>> when they are placed in the GART.
> >> 
> >> Clear NAK for that approach.
> >> 
> >> This is the general TTM interface function and shouldn't have any HW
> >> generation dependent code in it.
> > 
> > I don't see how else to solve this, since GTT->GTT moves are not
> > implemented, so we can't move the BO to a suitable address later. We also
> > can't move it to VRAM.
> 
> GTT to GTT moves should be relatively easy to implement.
> 
> We just need to wait for the BO to be idle, unbind, move and bind again.

Implementing GTT->GTT moves sounds like a bigger task and cannot be backported 
as that would be a new feature not a bug fix. So I strongly prefer to solve 
this problem with the tools we already have available.

Also, I would prefer to not have to move the BO at all and give it a suitable 
address from the beginning, to avoid the overhead of the move.

As far as I understand you take issue with checking adev->family in 
amdgpu_ttm_alloc_gart(), right? So, how about one of these alternatives:

- add a bool argument so the caller can request 256M segments, then the caller 
can check the GPU generation
- add an optional argument so the caller can just pass in a placements array

Or, if you have a different suggestion, let me know.

Thanks,
Timur

> >> 
> >>> Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4799
> >>> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> >>> ---
> >>> 
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 56 ++++++++++++++++++++++---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  3 ++
> >>>  2 files changed, 53 insertions(+), 6 deletions(-)
> >>> 
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index
> >>> 6c6ab4dd6ea9..a106c7e77e26 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> @@ -959,6 +959,40 @@ static int amdgpu_ttm_backend_bind(struct
> >>> ttm_device
> >>> *bdev,>
> >>> 
> >>>  	return 0;
> >>>  
> >>>  }
> >>> 
> >>> +/**
> >>> + * amdgpu_ttm_fill_gart_256M_placements() - Fill placements array with
> >>> 256M GART segments + *
> >>> + * @bo: TTM buffer objects whose placements should be filled
> >>> + * @placements: Pointer to an array of placements
> >>> + * @max_placements: Size of the placements array
> >>> + *
> >>> + * Fill the specified placements array with 256M GART segments,
> >>> + * starting from the highest address in order to reduce the
> >>> + * contention of the lowest segment.
> >>> + *
> >>> + * Returns the number of placements filled.
> >>> + */
> >>> +u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
> >>> +					 struct ttm_place
> > 
> > *placements,
> > 
> >>> +					 u32 max_placements)
> >>> +{
> >>> +	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
> >>> +	u32 i;
> >>> +
> >>> +	/* Fill the placements array with 256M segments, starting from
> > 
> > highest.
> > 
> >>> */ +	for (i = 0; i < max_placements; ++i) {
> >>> +		if (i * SZ_256M >= adev->gmc.gart_size)
> >>> +			break;
> >>> +
> >>> +		placements[i].lpfn = (adev->gmc.gart_size - i *
> > 
> > SZ_256M) >> PAGE_SHIFT;
> > 
> >>> +		placements[i].fpfn = ALIGN_DOWN(placements[i].lpfn - 1,
> > 
> > SZ_256M >>
> > 
> >>> PAGE_SHIFT); +		placements[i].mem_type = TTM_PL_TT;
> >>> +		placements[i].flags = bo->resource->placement;
> >>> +	}
> >>> +
> >>> +	return i;
> >>> +}
> >>> +
> >>> 
> >>>  /*
> >>>  
> >>>   * amdgpu_ttm_alloc_gart - Make sure buffer object is accessible either
> >>>   * through AGP or GART aperture.
> >>> 
> >>> @@ -973,7 +1007,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object
> >>> *bo)>
> >>> 
> >>>  	struct ttm_operation_ctx ctx = { false, false };
> >>>  	struct amdgpu_ttm_tt *gtt = ttm_to_amdgpu_ttm_tt(bo->ttm);
> >>>  	struct ttm_placement placement;
> >>> 
> >>> -	struct ttm_place placements;
> >>> +	struct ttm_place placements[AMDGPU_BO_MAX_PLACEMENTS];
> >>> 
> >>>  	struct ttm_resource *tmp;
> >>>  	uint64_t addr, flags;
> >>>  	int r;
> >>> 
> >>> @@ -987,11 +1021,21 @@ int amdgpu_ttm_alloc_gart(struct
> >>> ttm_buffer_object
> >>> *bo)>
> >>> 
> >>>  	/* allocate GART space */
> >>>  	placement.num_placement = 1;
> >>> 
> >>> -	placement.placement = &placements;
> >>> -	placements.fpfn = 0;
> >>> -	placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
> >>> -	placements.mem_type = TTM_PL_TT;
> >>> -	placements.flags = bo->resource->placement;
> >>> +	placement.placement = &placements[0];
> >>> +	placements[0].fpfn = 0;
> >>> +	placements[0].lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
> >>> +	placements[0].mem_type = TTM_PL_TT;
> >>> +	placements[0].flags = bo->resource->placement;
> >>> +
> >>> +	/*
> >>> +	 * UVD 4.x and older require that BOs don't cross 256M segments.
> >>> +	 * We need to respect that here. We can't move the BO later
> >>> +	 * because GTT->GTT moves are not implemented.
> >>> +	 */
> >>> +	if (bo->base.size < SZ_256M && adev->family <= AMDGPU_FAMILY_KV)
> >>> +		placement.num_placement =
> >>> +			amdgpu_ttm_fill_gart_256M_placements(bo,
> > 
> > placements,
> > 
> >>> +
> > 
> > ARRAY_SIZE(placements));
> > 
> >>>  	r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
> >>>  	if (unlikely(r))
> >>> 
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index
> >>> 2d72fa217274..e9de628c8d2d 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> >>> @@ -202,6 +202,9 @@ int amdgpu_ttm_clear_buffer(struct
> >>> amdgpu_ttm_buffer_entity *entity,>
> >>> 
> >>>  			    u64 k_job_id);
> >>>  
> >>>  struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct
> >>>  amdgpu_device *adev);>
> >>> 
> >>> +u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
> >>> +					 struct ttm_place
> > 
> > *placements,
> > 
> >>> +					 u32 max_placements);
> >>> 
> >>>  int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
> >>>  void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
> >>>  uint64_t amdgpu_ttm_domain_start(struct amdgpu_device *adev, uint32_t
> >>>  type);

next prev parent reply	other threads:[~2026-05-19  9:16 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19  8:21 [PATCH 0/5] drm/amdgpu/uvd: Fix UVD BO memory placement issues Timur Kristóf
2026-05-19  8:22 ` [PATCH 1/5] drm/amdgpu: Respect placement requirements in amdgpu_gtt_mgr functions Timur Kristóf
2026-05-19  8:52   ` Christian König
2026-05-19  8:22 ` [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK Timur Kristóf
2026-05-19  8:54   ` Christian König
2026-05-19  8:59     ` Timur Kristóf
2026-05-19  9:01       ` Christian König
2026-05-19  9:16         ` Timur Kristóf [this message]
2026-05-19  8:22 ` [PATCH 3/5] drm/amdgpu/uvd: Place VCPU BO only in VRAM for UVD 4.x and older Timur Kristóf
2026-05-19  8:56   ` Christian König
2026-05-19  8:22 ` [PATCH 4/5] drm/amdgpu/uvd: Fix forcing BOs into UVD segment when it isn't at 0 Timur Kristóf
2026-05-19  9:06   ` Christian König
2026-05-19  9:32     ` Timur Kristóf
2026-05-19  8:22 ` [PATCH 5/5] drm/amdgpu/uvd: Move BOs to GTT when we can't place them in VRAM correctly Timur Kristóf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3693368.dWV9SEqChM@timur-hyperion \
    --to=timur.kristof@gmail.com \
    --cc=Leo.Liu@amd.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=john.olender@gmail.com \
    --cc=natalie.vock@gmx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.