All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v2 0/2] Discussion around eviction improvements
@ 2024-05-16 12:18 Tvrtko Ursulin
  2024-05-16 12:18 ` [RFC 1/2] drm/amdgpu: Re-validate evicted buffers Tvrtko Ursulin
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2024-05-16 12:18 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: kernel-dev, Tvrtko Ursulin, Christian König, Friedrich Vock

From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

Reduced re-spin of my previous series after Christian corrected a few
misconceptions that I had. So lets see if what remains makes sense or is still
misguided.

To summarise, the series address the following two issues:

 * Migration rate limiting does not work, at least not for the common case
   where userspace configures VRAM+GTT. It thinks it can stop migration attempts
   by playing with bo->allowed_domains vs bo->preferred domains but, both from
   the code, and from empirical experiments, I see that not working at all. When
   both masks are identical fiddling with them achieves nothing. Even when they
   are not identical allowed has a fallback GTT placement which means that when
   over the migration budget ttm_bo_validate with bo->allowed_domains can cause
   migration from GTT to VRAM.

 * Driver thinks it will be re-validating evicted buffers on the next submission
   but it does not for the very common case of VRAM+GTT because it only checks
   if current placement is *none* of the preferred placements.

These two patches appear to have a positive result for a memory intensive game
like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
the migration budget appears to keep buffer blits at bay and improves the
minimum frame rate, ie. makes things smoother.

From the game's built-in benchmark, average of three runs each:

						FPS
		migrated KiB	min	avg	max	min-1%	min-0.1%
  because	   20784781	10.00  37.00   89.67    22.00    12.33
  patched	    4227688	13.67  37.00   81.33    23.33    15.00

Disclaimers that I have is that more runs would be needed to be more confident
about the results. And more games. And APU versus discrete.

Cc: Christian König <christian.koenig@amd.com>
Cc: Friedrich Vock <friedrich.vock@gmx.de>

Tvrtko Ursulin (2):
  drm/amdgpu: Re-validate evicted buffers
  drm/amdgpu: Actually respect buffer migration budget

 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  21 ++++-
 2 files changed, 103 insertions(+), 30 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC 1/2] drm/amdgpu: Re-validate evicted buffers
  2024-05-16 12:18 [RFC v2 0/2] Discussion around eviction improvements Tvrtko Ursulin
@ 2024-05-16 12:18 ` Tvrtko Ursulin
  2024-05-16 12:18 ` [RFC 2/2] drm/amdgpu: Actually respect buffer migration budget Tvrtko Ursulin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2024-05-16 12:18 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: kernel-dev, Tvrtko Ursulin, Christian König, Friedrich Vock

From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

Currently the driver appears to be thinking that it will be attempting to
re-validate the evicted buffers on the next submission if they are not in
their preferred placement.

That however appears not to be true for the very common case of buffers
with allowed placements of VRAM+GTT. Simply because the check can only
detect if the current placement is *none* of the preferred ones, happily
leaving VRAM+GTT buffers in the GTT placement "forever".

Fix it by extending the VRAM+GTT special case to the re-validation logic.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Friedrich Vock <friedrich.vock@gmx.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 6bddd43604bc..e53ff914b62e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1248,10 +1248,25 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
 	 * next command submission.
 	 */
 	if (amdgpu_vm_is_bo_always_valid(vm, bo)) {
-		uint32_t mem_type = bo->tbo.resource->mem_type;
+		unsigned current_domain =
+			amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
+		bool move_to_evict = false;
 
-		if (!(bo->preferred_domains &
-		      amdgpu_mem_type_to_domain(mem_type)))
+		if (!(bo->preferred_domains & current_domain)) {
+			move_to_evict = true;
+		} else if ((bo->preferred_domains & AMDGPU_GEM_DOMAIN_MASK) ==
+			   (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) &&
+			   current_domain != AMDGPU_GEM_DOMAIN_VRAM) {
+			/*
+			 * If userspace has provided a list of possible
+			 * placements equal to VRAM+GTT, we assume VRAM is *the*
+			 * preferred placement and so try to move it back there
+			 * on the next submission.
+			 */
+			move_to_evict = true;
+		}
+
+		if (move_to_evict)
 			amdgpu_vm_bo_evicted(&bo_va->base);
 		else
 			amdgpu_vm_bo_idle(&bo_va->base);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC 2/2] drm/amdgpu: Actually respect buffer migration budget
  2024-05-16 12:18 [RFC v2 0/2] Discussion around eviction improvements Tvrtko Ursulin
  2024-05-16 12:18 ` [RFC 1/2] drm/amdgpu: Re-validate evicted buffers Tvrtko Ursulin
@ 2024-05-16 12:18 ` Tvrtko Ursulin
  2024-05-16 19:21 ` [RFC v2 0/2] Discussion around eviction improvements Alex Deucher
  2024-05-30 10:17 ` Tvrtko Ursulin
  3 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2024-05-16 12:18 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: kernel-dev, Tvrtko Ursulin, Christian König, Friedrich Vock

From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

Current code appears to live in a misconception that playing with buffer
allowed and preferred placements can always control the decision on
whether backing store migration will be attempted or not.

That is however not the case when userspace sets buffer placements of
VRAM+GTT, which is what radv does since commit 862b6a9a ("radv: Improve
spilling on discrete GPUs."), with the end result of completely ignoring
the migration budget.

Fix this by validating against a local singleton placement set to the
current backing store location. This way, when migration budget has been
depleted, we can prevent ttm_bo_validate from seeing any other than the
current placement.

For the case of implicit GTT allowed domain added in amdgpu_bo_create
when userspace only sets VRAM the behaviour should be the same. On the
first pass the re-validation will attempt to migrate away from the
fallback GTT domain, and if that did not succeed the buffer will remain in
the fallback placement.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Friedrich Vock <friedrich.vock@gmx.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
 1 file changed, 85 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index ec888fc6ead8..08e7631f3a2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -32,6 +32,7 @@
 
 #include <drm/amdgpu_drm.h>
 #include <drm/drm_syncobj.h>
+#include <drm/drm_print.h>
 #include <drm/ttm/ttm_tt.h>
 
 #include "amdgpu_cs.h"
@@ -775,6 +776,56 @@ void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes,
 	spin_unlock(&adev->mm_stats.lock);
 }
 
+static bool
+amdgpu_cs_bo_move_under_budget(struct amdgpu_cs_parser *p,
+			       struct amdgpu_bo *abo)
+{
+	struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
+
+	/*
+	 * Don't move this buffer if we have depleted our allowance
+	 * to move it. Don't move anything if the threshold is zero.
+	 */
+	if (p->bytes_moved >= p->bytes_moved_threshold)
+		return false;
+
+	if ((!abo->tbo.base.dma_buf ||
+	     list_empty(&abo->tbo.base.dma_buf->attachments)) &&
+	    (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
+	     (abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) &&
+	    p->bytes_moved_vis >= p->bytes_moved_vis_threshold) {
+		/*
+		 * And don't move a CPU_ACCESS_REQUIRED BO to limited
+		 * visible VRAM if we've depleted our allowance to do
+		 * that.
+		 */
+		return false;
+	}
+
+	return true;
+}
+
+static bool
+amdgpu_bo_fill_current_placement(struct amdgpu_bo *abo,
+				 struct ttm_placement *placement,
+				 struct ttm_place *place)
+{
+	struct ttm_placement *bo_placement = &abo->placement;
+	int i;
+
+	for (i = 0; i < bo_placement->num_placement; i++) {
+		if (bo_placement->placement[i].mem_type ==
+		    abo->tbo.resource->mem_type) {
+			*place = bo_placement->placement[i];
+			placement->num_placement = 1;
+			placement->placement = place;
+			return true;
+		}
+	}
+
+	return false;
+}
+
 static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 {
 	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
@@ -784,46 +835,53 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 		.no_wait_gpu = false,
 		.resv = bo->tbo.base.resv
 	};
-	uint32_t domain;
+	bool allow_move;
 	int r;
 
 	if (bo->tbo.pin_count)
 		return 0;
 
-	/* Don't move this buffer if we have depleted our allowance
-	 * to move it. Don't move anything if the threshold is zero.
-	 */
-	if (p->bytes_moved < p->bytes_moved_threshold &&
-	    (!bo->tbo.base.dma_buf ||
-	    list_empty(&bo->tbo.base.dma_buf->attachments))) {
-		if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
-		    (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) {
-			/* And don't move a CPU_ACCESS_REQUIRED BO to limited
-			 * visible VRAM if we've depleted our allowance to do
-			 * that.
-			 */
-			if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
-				domain = bo->preferred_domains;
-			else
-				domain = bo->allowed_domains;
-		} else {
-			domain = bo->preferred_domains;
-		}
-	} else {
-		domain = bo->allowed_domains;
-	}
+	if (!bo->tbo.resource || amdgpu_cs_bo_move_under_budget(p, bo))
+		allow_move = true;
+	else
+		allow_move = false;
 
+	amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
 retry:
-	amdgpu_bo_placement_from_domain(bo, domain);
-	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+	if (allow_move) {
+		r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+	} else {
+		struct ttm_placement same_placement = { };
+		struct ttm_place same_place;
+		bool placement_found;
+
+		placement_found =
+			amdgpu_bo_fill_current_placement(bo,
+							 &same_placement,
+							 &same_place);
+
+		if (drm_WARN_ON_ONCE(&adev->ddev, !placement_found)) {
+			/*
+			 * QQQ
+			 * Current placement not in the bo->allowed_domains set.
+			 * Can this happen?
+			 * QQQ
+			 */
+			allow_move = true;
+			goto retry;
+		}
+
+		r = ttm_bo_validate(&bo->tbo, &same_placement, &ctx);
+	}
 
 	p->bytes_moved += ctx.bytes_moved;
 	if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
 	    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
 		p->bytes_moved_vis += ctx.bytes_moved;
 
-	if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
-		domain = bo->allowed_domains;
+	if (unlikely(r == -ENOMEM) && !allow_move) {
+		ctx.bytes_moved = 0;
+		allow_move = true;
 		goto retry;
 	}
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC v2 0/2] Discussion around eviction improvements
  2024-05-16 12:18 [RFC v2 0/2] Discussion around eviction improvements Tvrtko Ursulin
  2024-05-16 12:18 ` [RFC 1/2] drm/amdgpu: Re-validate evicted buffers Tvrtko Ursulin
  2024-05-16 12:18 ` [RFC 2/2] drm/amdgpu: Actually respect buffer migration budget Tvrtko Ursulin
@ 2024-05-16 19:21 ` Alex Deucher
  2024-05-17  7:41   ` Tvrtko Ursulin
  2024-05-31  9:12   ` Christian König
  2024-05-30 10:17 ` Tvrtko Ursulin
  3 siblings, 2 replies; 8+ messages in thread
From: Alex Deucher @ 2024-05-16 19:21 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: amd-gfx, dri-devel, kernel-dev, Tvrtko Ursulin,
	Christian König, Friedrich Vock

On Thu, May 16, 2024 at 8:18 AM Tvrtko Ursulin <tursulin@igalia.com> wrote:
>
> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>
> Reduced re-spin of my previous series after Christian corrected a few
> misconceptions that I had. So lets see if what remains makes sense or is still
> misguided.
>
> To summarise, the series address the following two issues:
>
>  * Migration rate limiting does not work, at least not for the common case
>    where userspace configures VRAM+GTT. It thinks it can stop migration attempts
>    by playing with bo->allowed_domains vs bo->preferred domains but, both from
>    the code, and from empirical experiments, I see that not working at all. When
>    both masks are identical fiddling with them achieves nothing. Even when they
>    are not identical allowed has a fallback GTT placement which means that when
>    over the migration budget ttm_bo_validate with bo->allowed_domains can cause
>    migration from GTT to VRAM.
>
>  * Driver thinks it will be re-validating evicted buffers on the next submission
>    but it does not for the very common case of VRAM+GTT because it only checks
>    if current placement is *none* of the preferred placements.

For APUs at least, we should never migrate because GTT and VRAM are
both system memory so are effectively equal performance-wise.  Maybe
this regressed when Christian reworked ttm to better handle migrating
buffers back to VRAM after suspend on dGPUs?

Alex

>
> These two patches appear to have a positive result for a memory intensive game
> like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
> set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
> the migration budget appears to keep buffer blits at bay and improves the
> minimum frame rate, ie. makes things smoother.
>
> From the game's built-in benchmark, average of three runs each:
>
>                                                 FPS
>                 migrated KiB    min     avg     max     min-1%  min-0.1%
>   because          20784781     10.00  37.00   89.67    22.00    12.33
>   patched           4227688     13.67  37.00   81.33    23.33    15.00
>
> Disclaimers that I have is that more runs would be needed to be more confident
> about the results. And more games. And APU versus discrete.
>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>
> Tvrtko Ursulin (2):
>   drm/amdgpu: Re-validate evicted buffers
>   drm/amdgpu: Actually respect buffer migration budget
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  21 ++++-
>  2 files changed, 103 insertions(+), 30 deletions(-)
>
> --
> 2.44.0
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC v2 0/2] Discussion around eviction improvements
  2024-05-16 19:21 ` [RFC v2 0/2] Discussion around eviction improvements Alex Deucher
@ 2024-05-17  7:41   ` Tvrtko Ursulin
  2024-05-17 13:43     ` Alex Deucher
  2024-05-31  9:12   ` Christian König
  1 sibling, 1 reply; 8+ messages in thread
From: Tvrtko Ursulin @ 2024-05-17  7:41 UTC (permalink / raw)
  To: Alex Deucher, Tvrtko Ursulin
  Cc: amd-gfx, dri-devel, kernel-dev, Christian König,
	Friedrich Vock


On 16/05/2024 20:21, Alex Deucher wrote:
> On Thu, May 16, 2024 at 8:18 AM Tvrtko Ursulin <tursulin@igalia.com> wrote:
>>
>> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>
>> Reduced re-spin of my previous series after Christian corrected a few
>> misconceptions that I had. So lets see if what remains makes sense or is still
>> misguided.
>>
>> To summarise, the series address the following two issues:
>>
>>   * Migration rate limiting does not work, at least not for the common case
>>     where userspace configures VRAM+GTT. It thinks it can stop migration attempts
>>     by playing with bo->allowed_domains vs bo->preferred domains but, both from
>>     the code, and from empirical experiments, I see that not working at all. When
>>     both masks are identical fiddling with them achieves nothing. Even when they
>>     are not identical allowed has a fallback GTT placement which means that when
>>     over the migration budget ttm_bo_validate with bo->allowed_domains can cause
>>     migration from GTT to VRAM.
>>
>>   * Driver thinks it will be re-validating evicted buffers on the next submission
>>     but it does not for the very common case of VRAM+GTT because it only checks
>>     if current placement is *none* of the preferred placements.
> 
> For APUs at least, we should never migrate because GTT and VRAM are
> both system memory so are effectively equal performance-wise.  Maybe

I was curious about this but thought there could be a reason why VRAM 
carve-out is a fix small-ish size. It cannot be made 1:1 with RAM or 
some other solution?

> this regressed when Christian reworked ttm to better handle migrating
> buffers back to VRAM after suspend on dGPUs?

I will leave this to Christian to answer but for what this series is 
concerned I'd say it is orthogonal to that.

Here we have two fixes not limited to APU use cases, just so it happens 
fixing the migration throttling improves things there too. And that even 
despite the first patch which triggering *more* migration attempts. 
Because the second patch then correctly curbs them.

First patch should help with transient overcommit on discrete, allowing 
things get back into VRAM as soon as there is space.

Second patch tries to makes migration throttling work as intended.

Volunteers for testing on discrete? :)

>>
>> These two patches appear to have a positive result for a memory intensive game
>> like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
>> set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
>> the migration budget appears to keep buffer blits at bay and improves the
>> minimum frame rate, ie. makes things smoother.
>>
>>  From the game's built-in benchmark, average of three runs each:
>>
>>                                                  FPS
>>                  migrated KiB    min     avg     max     min-1%  min-0.1%
>>    because          20784781     10.00  37.00   89.67    22.00    12.33
>>    patched           4227688     13.67  37.00   81.33    23.33    15.00

Hmm! s/because/before/ here obviously!

Regards,

Tvrtko

>> Disclaimers that I have is that more runs would be needed to be more confident
>> about the results. And more games. And APU versus discrete.
>>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>>
>> Tvrtko Ursulin (2):
>>    drm/amdgpu: Re-validate evicted buffers
>>    drm/amdgpu: Actually respect buffer migration budget
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  21 ++++-
>>   2 files changed, 103 insertions(+), 30 deletions(-)
>>
>> --
>> 2.44.0
>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC v2 0/2] Discussion around eviction improvements
  2024-05-17  7:41   ` Tvrtko Ursulin
@ 2024-05-17 13:43     ` Alex Deucher
  0 siblings, 0 replies; 8+ messages in thread
From: Alex Deucher @ 2024-05-17 13:43 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Tvrtko Ursulin, amd-gfx, dri-devel, kernel-dev,
	Christian König, Friedrich Vock

On Fri, May 17, 2024 at 3:41 AM Tvrtko Ursulin
<tvrtko.ursulin@igalia.com> wrote:
>
>
> On 16/05/2024 20:21, Alex Deucher wrote:
> > On Thu, May 16, 2024 at 8:18 AM Tvrtko Ursulin <tursulin@igalia.com> wrote:
> >>
> >> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> >>
> >> Reduced re-spin of my previous series after Christian corrected a few
> >> misconceptions that I had. So lets see if what remains makes sense or is still
> >> misguided.
> >>
> >> To summarise, the series address the following two issues:
> >>
> >>   * Migration rate limiting does not work, at least not for the common case
> >>     where userspace configures VRAM+GTT. It thinks it can stop migration attempts
> >>     by playing with bo->allowed_domains vs bo->preferred domains but, both from
> >>     the code, and from empirical experiments, I see that not working at all. When
> >>     both masks are identical fiddling with them achieves nothing. Even when they
> >>     are not identical allowed has a fallback GTT placement which means that when
> >>     over the migration budget ttm_bo_validate with bo->allowed_domains can cause
> >>     migration from GTT to VRAM.
> >>
> >>   * Driver thinks it will be re-validating evicted buffers on the next submission
> >>     but it does not for the very common case of VRAM+GTT because it only checks
> >>     if current placement is *none* of the preferred placements.
> >
> > For APUs at least, we should never migrate because GTT and VRAM are
> > both system memory so are effectively equal performance-wise.  Maybe
>
> I was curious about this but thought there could be a reason why VRAM
> carve-out is a fix small-ish size. It cannot be made 1:1 with RAM or
> some other solution?

It's really only needed for displays at boot up prior to the OS
loading or for displays on older APU that did not support
scatter/gather display.  The problem with increasing the carveout size
is that that prevents the memory from being available to the rest of
the system.

Alex

>
> > this regressed when Christian reworked ttm to better handle migrating
> > buffers back to VRAM after suspend on dGPUs?
>
> I will leave this to Christian to answer but for what this series is
> concerned I'd say it is orthogonal to that.
>
> Here we have two fixes not limited to APU use cases, just so it happens
> fixing the migration throttling improves things there too. And that even
> despite the first patch which triggering *more* migration attempts.
> Because the second patch then correctly curbs them.
>
> First patch should help with transient overcommit on discrete, allowing
> things get back into VRAM as soon as there is space.
>
> Second patch tries to makes migration throttling work as intended.
>
> Volunteers for testing on discrete? :)
>
> >>
> >> These two patches appear to have a positive result for a memory intensive game
> >> like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
> >> set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
> >> the migration budget appears to keep buffer blits at bay and improves the
> >> minimum frame rate, ie. makes things smoother.
> >>
> >>  From the game's built-in benchmark, average of three runs each:
> >>
> >>                                                  FPS
> >>                  migrated KiB    min     avg     max     min-1%  min-0.1%
> >>    because          20784781     10.00  37.00   89.67    22.00    12.33
> >>    patched           4227688     13.67  37.00   81.33    23.33    15.00
>
> Hmm! s/because/before/ here obviously!
>
> Regards,
>
> Tvrtko
>
> >> Disclaimers that I have is that more runs would be needed to be more confident
> >> about the results. And more games. And APU versus discrete.
> >>
> >> Cc: Christian König <christian.koenig@amd.com>
> >> Cc: Friedrich Vock <friedrich.vock@gmx.de>
> >>
> >> Tvrtko Ursulin (2):
> >>    drm/amdgpu: Re-validate evicted buffers
> >>    drm/amdgpu: Actually respect buffer migration budget
> >>
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  21 ++++-
> >>   2 files changed, 103 insertions(+), 30 deletions(-)
> >>
> >> --
> >> 2.44.0
> >>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC v2 0/2] Discussion around eviction improvements
  2024-05-16 12:18 [RFC v2 0/2] Discussion around eviction improvements Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2024-05-16 19:21 ` [RFC v2 0/2] Discussion around eviction improvements Alex Deucher
@ 2024-05-30 10:17 ` Tvrtko Ursulin
  3 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2024-05-30 10:17 UTC (permalink / raw)
  To: Tvrtko Ursulin, amd-gfx, dri-devel
  Cc: kernel-dev, Tvrtko Ursulin, Christian König, Friedrich Vock


Hi,

On 16/05/2024 13:18, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> 
> Reduced re-spin of my previous series after Christian corrected a few
> misconceptions that I had. So lets see if what remains makes sense or is still
> misguided.
> 
> To summarise, the series address the following two issues:
> 
>   * Migration rate limiting does not work, at least not for the common case
>     where userspace configures VRAM+GTT. It thinks it can stop migration attempts
>     by playing with bo->allowed_domains vs bo->preferred domains but, both from
>     the code, and from empirical experiments, I see that not working at all. When
>     both masks are identical fiddling with them achieves nothing. Even when they
>     are not identical allowed has a fallback GTT placement which means that when
>     over the migration budget ttm_bo_validate with bo->allowed_domains can cause
>     migration from GTT to VRAM.
> 
>   * Driver thinks it will be re-validating evicted buffers on the next submission
>     but it does not for the very common case of VRAM+GTT because it only checks
>     if current placement is *none* of the preferred placements.
> 
> These two patches appear to have a positive result for a memory intensive game
> like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
> set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
> the migration budget appears to keep buffer blits at bay and improves the
> minimum frame rate, ie. makes things smoother.
> 
>>From the game's built-in benchmark, average of three runs each:
> 
> 						FPS
> 		migrated KiB	min	avg	max	min-1%	min-0.1%
>    because	   20784781	10.00  37.00   89.67    22.00    12.33
>    patched	    4227688	13.67  37.00   81.33    23.33    15.00

Any feedback on this series?

As described above, neither migration rate limiting or re-validation of 
evicted buffers seems to work as expected currently.

Regards,

Tvrtko

> Disclaimers that I have is that more runs would be needed to be more confident
> about the results. And more games. And APU versus discrete.
> 
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Friedrich Vock <friedrich.vock@gmx.de>
> 
> Tvrtko Ursulin (2):
>    drm/amdgpu: Re-validate evicted buffers
>    drm/amdgpu: Actually respect buffer migration budget
> 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  21 ++++-
>   2 files changed, 103 insertions(+), 30 deletions(-)
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC v2 0/2] Discussion around eviction improvements
  2024-05-16 19:21 ` [RFC v2 0/2] Discussion around eviction improvements Alex Deucher
  2024-05-17  7:41   ` Tvrtko Ursulin
@ 2024-05-31  9:12   ` Christian König
  1 sibling, 0 replies; 8+ messages in thread
From: Christian König @ 2024-05-31  9:12 UTC (permalink / raw)
  To: Alex Deucher, Tvrtko Ursulin
  Cc: amd-gfx, dri-devel, kernel-dev, Tvrtko Ursulin,
	Christian König, Friedrich Vock

Am 16.05.24 um 21:21 schrieb Alex Deucher:
> On Thu, May 16, 2024 at 8:18 AM Tvrtko Ursulin <tursulin@igalia.com> wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>
>> Reduced re-spin of my previous series after Christian corrected a few
>> misconceptions that I had. So lets see if what remains makes sense or is still
>> misguided.
>>
>> To summarise, the series address the following two issues:
>>
>>   * Migration rate limiting does not work, at least not for the common case
>>     where userspace configures VRAM+GTT. It thinks it can stop migration attempts
>>     by playing with bo->allowed_domains vs bo->preferred domains but, both from
>>     the code, and from empirical experiments, I see that not working at all. When
>>     both masks are identical fiddling with them achieves nothing. Even when they
>>     are not identical allowed has a fallback GTT placement which means that when
>>     over the migration budget ttm_bo_validate with bo->allowed_domains can cause
>>     migration from GTT to VRAM.
>>
>>   * Driver thinks it will be re-validating evicted buffers on the next submission
>>     but it does not for the very common case of VRAM+GTT because it only checks
>>     if current placement is *none* of the preferred placements.
> For APUs at least, we should never migrate because GTT and VRAM are
> both system memory so are effectively equal performance-wise.  Maybe
> this regressed when Christian reworked ttm to better handle migrating
> buffers back to VRAM after suspend on dGPUs?

Yeah, that's quite likely. I'm already looking into this.

Regards,
Christian.

>
> Alex
>
>> These two patches appear to have a positive result for a memory intensive game
>> like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
>> set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
>> the migration budget appears to keep buffer blits at bay and improves the
>> minimum frame rate, ie. makes things smoother.
>>
>>  From the game's built-in benchmark, average of three runs each:
>>
>>                                                  FPS
>>                  migrated KiB    min     avg     max     min-1%  min-0.1%
>>    because          20784781     10.00  37.00   89.67    22.00    12.33
>>    patched           4227688     13.67  37.00   81.33    23.33    15.00
>>
>> Disclaimers that I have is that more runs would be needed to be more confident
>> about the results. And more games. And APU versus discrete.
>>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>>
>> Tvrtko Ursulin (2):
>>    drm/amdgpu: Re-validate evicted buffers
>>    drm/amdgpu: Actually respect buffer migration budget
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  21 ++++-
>>   2 files changed, 103 insertions(+), 30 deletions(-)
>>
>> --
>> 2.44.0
>>


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-05-31  9:12 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-16 12:18 [RFC v2 0/2] Discussion around eviction improvements Tvrtko Ursulin
2024-05-16 12:18 ` [RFC 1/2] drm/amdgpu: Re-validate evicted buffers Tvrtko Ursulin
2024-05-16 12:18 ` [RFC 2/2] drm/amdgpu: Actually respect buffer migration budget Tvrtko Ursulin
2024-05-16 19:21 ` [RFC v2 0/2] Discussion around eviction improvements Alex Deucher
2024-05-17  7:41   ` Tvrtko Ursulin
2024-05-17 13:43     ` Alex Deucher
2024-05-31  9:12   ` Christian König
2024-05-30 10:17 ` Tvrtko Ursulin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.