* [RFC 1/2] drm/amdgpu: Re-validate evicted buffers
2024-05-16 12:18 [RFC v2 0/2] Discussion around eviction improvements Tvrtko Ursulin
@ 2024-05-16 12:18 ` Tvrtko Ursulin
2024-05-16 12:18 ` [RFC 2/2] drm/amdgpu: Actually respect buffer migration budget Tvrtko Ursulin
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2024-05-16 12:18 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König, Friedrich Vock
From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Currently the driver appears to be thinking that it will be attempting to
re-validate the evicted buffers on the next submission if they are not in
their preferred placement.
That however appears not to be true for the very common case of buffers
with allowed placements of VRAM+GTT. Simply because the check can only
detect if the current placement is *none* of the preferred ones, happily
leaving VRAM+GTT buffers in the GTT placement "forever".
Fix it by extending the VRAM+GTT special case to the re-validation logic.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Friedrich Vock <friedrich.vock@gmx.de>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 6bddd43604bc..e53ff914b62e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1248,10 +1248,25 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
* next command submission.
*/
if (amdgpu_vm_is_bo_always_valid(vm, bo)) {
- uint32_t mem_type = bo->tbo.resource->mem_type;
+ unsigned current_domain =
+ amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
+ bool move_to_evict = false;
- if (!(bo->preferred_domains &
- amdgpu_mem_type_to_domain(mem_type)))
+ if (!(bo->preferred_domains & current_domain)) {
+ move_to_evict = true;
+ } else if ((bo->preferred_domains & AMDGPU_GEM_DOMAIN_MASK) ==
+ (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) &&
+ current_domain != AMDGPU_GEM_DOMAIN_VRAM) {
+ /*
+ * If userspace has provided a list of possible
+ * placements equal to VRAM+GTT, we assume VRAM is *the*
+ * preferred placement and so try to move it back there
+ * on the next submission.
+ */
+ move_to_evict = true;
+ }
+
+ if (move_to_evict)
amdgpu_vm_bo_evicted(&bo_va->base);
else
amdgpu_vm_bo_idle(&bo_va->base);
--
2.44.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* [RFC 2/2] drm/amdgpu: Actually respect buffer migration budget
2024-05-16 12:18 [RFC v2 0/2] Discussion around eviction improvements Tvrtko Ursulin
2024-05-16 12:18 ` [RFC 1/2] drm/amdgpu: Re-validate evicted buffers Tvrtko Ursulin
@ 2024-05-16 12:18 ` Tvrtko Ursulin
2024-05-16 19:21 ` [RFC v2 0/2] Discussion around eviction improvements Alex Deucher
2024-05-30 10:17 ` Tvrtko Ursulin
3 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2024-05-16 12:18 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König, Friedrich Vock
From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Current code appears to live in a misconception that playing with buffer
allowed and preferred placements can always control the decision on
whether backing store migration will be attempted or not.
That is however not the case when userspace sets buffer placements of
VRAM+GTT, which is what radv does since commit 862b6a9a ("radv: Improve
spilling on discrete GPUs."), with the end result of completely ignoring
the migration budget.
Fix this by validating against a local singleton placement set to the
current backing store location. This way, when migration budget has been
depleted, we can prevent ttm_bo_validate from seeing any other than the
current placement.
For the case of implicit GTT allowed domain added in amdgpu_bo_create
when userspace only sets VRAM the behaviour should be the same. On the
first pass the re-validation will attempt to migrate away from the
fallback GTT domain, and if that did not succeed the buffer will remain in
the fallback placement.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Friedrich Vock <friedrich.vock@gmx.de>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
1 file changed, 85 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index ec888fc6ead8..08e7631f3a2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -32,6 +32,7 @@
#include <drm/amdgpu_drm.h>
#include <drm/drm_syncobj.h>
+#include <drm/drm_print.h>
#include <drm/ttm/ttm_tt.h>
#include "amdgpu_cs.h"
@@ -775,6 +776,56 @@ void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes,
spin_unlock(&adev->mm_stats.lock);
}
+static bool
+amdgpu_cs_bo_move_under_budget(struct amdgpu_cs_parser *p,
+ struct amdgpu_bo *abo)
+{
+ struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
+
+ /*
+ * Don't move this buffer if we have depleted our allowance
+ * to move it. Don't move anything if the threshold is zero.
+ */
+ if (p->bytes_moved >= p->bytes_moved_threshold)
+ return false;
+
+ if ((!abo->tbo.base.dma_buf ||
+ list_empty(&abo->tbo.base.dma_buf->attachments)) &&
+ (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
+ (abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) &&
+ p->bytes_moved_vis >= p->bytes_moved_vis_threshold) {
+ /*
+ * And don't move a CPU_ACCESS_REQUIRED BO to limited
+ * visible VRAM if we've depleted our allowance to do
+ * that.
+ */
+ return false;
+ }
+
+ return true;
+}
+
+static bool
+amdgpu_bo_fill_current_placement(struct amdgpu_bo *abo,
+ struct ttm_placement *placement,
+ struct ttm_place *place)
+{
+ struct ttm_placement *bo_placement = &abo->placement;
+ int i;
+
+ for (i = 0; i < bo_placement->num_placement; i++) {
+ if (bo_placement->placement[i].mem_type ==
+ abo->tbo.resource->mem_type) {
+ *place = bo_placement->placement[i];
+ placement->num_placement = 1;
+ placement->placement = place;
+ return true;
+ }
+ }
+
+ return false;
+}
+
static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
{
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
@@ -784,46 +835,53 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
.no_wait_gpu = false,
.resv = bo->tbo.base.resv
};
- uint32_t domain;
+ bool allow_move;
int r;
if (bo->tbo.pin_count)
return 0;
- /* Don't move this buffer if we have depleted our allowance
- * to move it. Don't move anything if the threshold is zero.
- */
- if (p->bytes_moved < p->bytes_moved_threshold &&
- (!bo->tbo.base.dma_buf ||
- list_empty(&bo->tbo.base.dma_buf->attachments))) {
- if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
- (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) {
- /* And don't move a CPU_ACCESS_REQUIRED BO to limited
- * visible VRAM if we've depleted our allowance to do
- * that.
- */
- if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
- domain = bo->preferred_domains;
- else
- domain = bo->allowed_domains;
- } else {
- domain = bo->preferred_domains;
- }
- } else {
- domain = bo->allowed_domains;
- }
+ if (!bo->tbo.resource || amdgpu_cs_bo_move_under_budget(p, bo))
+ allow_move = true;
+ else
+ allow_move = false;
+ amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
retry:
- amdgpu_bo_placement_from_domain(bo, domain);
- r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+ if (allow_move) {
+ r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+ } else {
+ struct ttm_placement same_placement = { };
+ struct ttm_place same_place;
+ bool placement_found;
+
+ placement_found =
+ amdgpu_bo_fill_current_placement(bo,
+ &same_placement,
+ &same_place);
+
+ if (drm_WARN_ON_ONCE(&adev->ddev, !placement_found)) {
+ /*
+ * QQQ
+ * Current placement not in the bo->allowed_domains set.
+ * Can this happen?
+ * QQQ
+ */
+ allow_move = true;
+ goto retry;
+ }
+
+ r = ttm_bo_validate(&bo->tbo, &same_placement, &ctx);
+ }
p->bytes_moved += ctx.bytes_moved;
if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
amdgpu_res_cpu_visible(adev, bo->tbo.resource))
p->bytes_moved_vis += ctx.bytes_moved;
- if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
- domain = bo->allowed_domains;
+ if (unlikely(r == -ENOMEM) && !allow_move) {
+ ctx.bytes_moved = 0;
+ allow_move = true;
goto retry;
}
--
2.44.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [RFC v2 0/2] Discussion around eviction improvements
2024-05-16 12:18 [RFC v2 0/2] Discussion around eviction improvements Tvrtko Ursulin
2024-05-16 12:18 ` [RFC 1/2] drm/amdgpu: Re-validate evicted buffers Tvrtko Ursulin
2024-05-16 12:18 ` [RFC 2/2] drm/amdgpu: Actually respect buffer migration budget Tvrtko Ursulin
@ 2024-05-16 19:21 ` Alex Deucher
2024-05-17 7:41 ` Tvrtko Ursulin
2024-05-31 9:12 ` Christian König
2024-05-30 10:17 ` Tvrtko Ursulin
3 siblings, 2 replies; 8+ messages in thread
From: Alex Deucher @ 2024-05-16 19:21 UTC (permalink / raw)
To: Tvrtko Ursulin
Cc: amd-gfx, dri-devel, kernel-dev, Tvrtko Ursulin,
Christian König, Friedrich Vock
On Thu, May 16, 2024 at 8:18 AM Tvrtko Ursulin <tursulin@igalia.com> wrote:
>
> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>
> Reduced re-spin of my previous series after Christian corrected a few
> misconceptions that I had. So lets see if what remains makes sense or is still
> misguided.
>
> To summarise, the series address the following two issues:
>
> * Migration rate limiting does not work, at least not for the common case
> where userspace configures VRAM+GTT. It thinks it can stop migration attempts
> by playing with bo->allowed_domains vs bo->preferred domains but, both from
> the code, and from empirical experiments, I see that not working at all. When
> both masks are identical fiddling with them achieves nothing. Even when they
> are not identical allowed has a fallback GTT placement which means that when
> over the migration budget ttm_bo_validate with bo->allowed_domains can cause
> migration from GTT to VRAM.
>
> * Driver thinks it will be re-validating evicted buffers on the next submission
> but it does not for the very common case of VRAM+GTT because it only checks
> if current placement is *none* of the preferred placements.
For APUs at least, we should never migrate because GTT and VRAM are
both system memory so are effectively equal performance-wise. Maybe
this regressed when Christian reworked ttm to better handle migrating
buffers back to VRAM after suspend on dGPUs?
Alex
>
> These two patches appear to have a positive result for a memory intensive game
> like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
> set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
> the migration budget appears to keep buffer blits at bay and improves the
> minimum frame rate, ie. makes things smoother.
>
> From the game's built-in benchmark, average of three runs each:
>
> FPS
> migrated KiB min avg max min-1% min-0.1%
> because 20784781 10.00 37.00 89.67 22.00 12.33
> patched 4227688 13.67 37.00 81.33 23.33 15.00
>
> Disclaimers that I have is that more runs would be needed to be more confident
> about the results. And more games. And APU versus discrete.
>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>
> Tvrtko Ursulin (2):
> drm/amdgpu: Re-validate evicted buffers
> drm/amdgpu: Actually respect buffer migration budget
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 21 ++++-
> 2 files changed, 103 insertions(+), 30 deletions(-)
>
> --
> 2.44.0
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC v2 0/2] Discussion around eviction improvements
2024-05-16 19:21 ` [RFC v2 0/2] Discussion around eviction improvements Alex Deucher
@ 2024-05-17 7:41 ` Tvrtko Ursulin
2024-05-17 13:43 ` Alex Deucher
2024-05-31 9:12 ` Christian König
1 sibling, 1 reply; 8+ messages in thread
From: Tvrtko Ursulin @ 2024-05-17 7:41 UTC (permalink / raw)
To: Alex Deucher, Tvrtko Ursulin
Cc: amd-gfx, dri-devel, kernel-dev, Christian König,
Friedrich Vock
On 16/05/2024 20:21, Alex Deucher wrote:
> On Thu, May 16, 2024 at 8:18 AM Tvrtko Ursulin <tursulin@igalia.com> wrote:
>>
>> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>
>> Reduced re-spin of my previous series after Christian corrected a few
>> misconceptions that I had. So lets see if what remains makes sense or is still
>> misguided.
>>
>> To summarise, the series address the following two issues:
>>
>> * Migration rate limiting does not work, at least not for the common case
>> where userspace configures VRAM+GTT. It thinks it can stop migration attempts
>> by playing with bo->allowed_domains vs bo->preferred domains but, both from
>> the code, and from empirical experiments, I see that not working at all. When
>> both masks are identical fiddling with them achieves nothing. Even when they
>> are not identical allowed has a fallback GTT placement which means that when
>> over the migration budget ttm_bo_validate with bo->allowed_domains can cause
>> migration from GTT to VRAM.
>>
>> * Driver thinks it will be re-validating evicted buffers on the next submission
>> but it does not for the very common case of VRAM+GTT because it only checks
>> if current placement is *none* of the preferred placements.
>
> For APUs at least, we should never migrate because GTT and VRAM are
> both system memory so are effectively equal performance-wise. Maybe
I was curious about this but thought there could be a reason why VRAM
carve-out is a fix small-ish size. It cannot be made 1:1 with RAM or
some other solution?
> this regressed when Christian reworked ttm to better handle migrating
> buffers back to VRAM after suspend on dGPUs?
I will leave this to Christian to answer but for what this series is
concerned I'd say it is orthogonal to that.
Here we have two fixes not limited to APU use cases, just so it happens
fixing the migration throttling improves things there too. And that even
despite the first patch which triggering *more* migration attempts.
Because the second patch then correctly curbs them.
First patch should help with transient overcommit on discrete, allowing
things get back into VRAM as soon as there is space.
Second patch tries to makes migration throttling work as intended.
Volunteers for testing on discrete? :)
>>
>> These two patches appear to have a positive result for a memory intensive game
>> like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
>> set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
>> the migration budget appears to keep buffer blits at bay and improves the
>> minimum frame rate, ie. makes things smoother.
>>
>> From the game's built-in benchmark, average of three runs each:
>>
>> FPS
>> migrated KiB min avg max min-1% min-0.1%
>> because 20784781 10.00 37.00 89.67 22.00 12.33
>> patched 4227688 13.67 37.00 81.33 23.33 15.00
Hmm! s/because/before/ here obviously!
Regards,
Tvrtko
>> Disclaimers that I have is that more runs would be needed to be more confident
>> about the results. And more games. And APU versus discrete.
>>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>>
>> Tvrtko Ursulin (2):
>> drm/amdgpu: Re-validate evicted buffers
>> drm/amdgpu: Actually respect buffer migration budget
>>
>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 21 ++++-
>> 2 files changed, 103 insertions(+), 30 deletions(-)
>>
>> --
>> 2.44.0
>>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC v2 0/2] Discussion around eviction improvements
2024-05-17 7:41 ` Tvrtko Ursulin
@ 2024-05-17 13:43 ` Alex Deucher
0 siblings, 0 replies; 8+ messages in thread
From: Alex Deucher @ 2024-05-17 13:43 UTC (permalink / raw)
To: Tvrtko Ursulin
Cc: Tvrtko Ursulin, amd-gfx, dri-devel, kernel-dev,
Christian König, Friedrich Vock
On Fri, May 17, 2024 at 3:41 AM Tvrtko Ursulin
<tvrtko.ursulin@igalia.com> wrote:
>
>
> On 16/05/2024 20:21, Alex Deucher wrote:
> > On Thu, May 16, 2024 at 8:18 AM Tvrtko Ursulin <tursulin@igalia.com> wrote:
> >>
> >> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> >>
> >> Reduced re-spin of my previous series after Christian corrected a few
> >> misconceptions that I had. So lets see if what remains makes sense or is still
> >> misguided.
> >>
> >> To summarise, the series address the following two issues:
> >>
> >> * Migration rate limiting does not work, at least not for the common case
> >> where userspace configures VRAM+GTT. It thinks it can stop migration attempts
> >> by playing with bo->allowed_domains vs bo->preferred domains but, both from
> >> the code, and from empirical experiments, I see that not working at all. When
> >> both masks are identical fiddling with them achieves nothing. Even when they
> >> are not identical allowed has a fallback GTT placement which means that when
> >> over the migration budget ttm_bo_validate with bo->allowed_domains can cause
> >> migration from GTT to VRAM.
> >>
> >> * Driver thinks it will be re-validating evicted buffers on the next submission
> >> but it does not for the very common case of VRAM+GTT because it only checks
> >> if current placement is *none* of the preferred placements.
> >
> > For APUs at least, we should never migrate because GTT and VRAM are
> > both system memory so are effectively equal performance-wise. Maybe
>
> I was curious about this but thought there could be a reason why VRAM
> carve-out is a fix small-ish size. It cannot be made 1:1 with RAM or
> some other solution?
It's really only needed for displays at boot up prior to the OS
loading or for displays on older APU that did not support
scatter/gather display. The problem with increasing the carveout size
is that that prevents the memory from being available to the rest of
the system.
Alex
>
> > this regressed when Christian reworked ttm to better handle migrating
> > buffers back to VRAM after suspend on dGPUs?
>
> I will leave this to Christian to answer but for what this series is
> concerned I'd say it is orthogonal to that.
>
> Here we have two fixes not limited to APU use cases, just so it happens
> fixing the migration throttling improves things there too. And that even
> despite the first patch which triggering *more* migration attempts.
> Because the second patch then correctly curbs them.
>
> First patch should help with transient overcommit on discrete, allowing
> things get back into VRAM as soon as there is space.
>
> Second patch tries to makes migration throttling work as intended.
>
> Volunteers for testing on discrete? :)
>
> >>
> >> These two patches appear to have a positive result for a memory intensive game
> >> like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
> >> set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
> >> the migration budget appears to keep buffer blits at bay and improves the
> >> minimum frame rate, ie. makes things smoother.
> >>
> >> From the game's built-in benchmark, average of three runs each:
> >>
> >> FPS
> >> migrated KiB min avg max min-1% min-0.1%
> >> because 20784781 10.00 37.00 89.67 22.00 12.33
> >> patched 4227688 13.67 37.00 81.33 23.33 15.00
>
> Hmm! s/because/before/ here obviously!
>
> Regards,
>
> Tvrtko
>
> >> Disclaimers that I have is that more runs would be needed to be more confident
> >> about the results. And more games. And APU versus discrete.
> >>
> >> Cc: Christian König <christian.koenig@amd.com>
> >> Cc: Friedrich Vock <friedrich.vock@gmx.de>
> >>
> >> Tvrtko Ursulin (2):
> >> drm/amdgpu: Re-validate evicted buffers
> >> drm/amdgpu: Actually respect buffer migration budget
> >>
> >> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
> >> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 21 ++++-
> >> 2 files changed, 103 insertions(+), 30 deletions(-)
> >>
> >> --
> >> 2.44.0
> >>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC v2 0/2] Discussion around eviction improvements
2024-05-16 19:21 ` [RFC v2 0/2] Discussion around eviction improvements Alex Deucher
2024-05-17 7:41 ` Tvrtko Ursulin
@ 2024-05-31 9:12 ` Christian König
1 sibling, 0 replies; 8+ messages in thread
From: Christian König @ 2024-05-31 9:12 UTC (permalink / raw)
To: Alex Deucher, Tvrtko Ursulin
Cc: amd-gfx, dri-devel, kernel-dev, Tvrtko Ursulin,
Christian König, Friedrich Vock
Am 16.05.24 um 21:21 schrieb Alex Deucher:
> On Thu, May 16, 2024 at 8:18 AM Tvrtko Ursulin <tursulin@igalia.com> wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>
>> Reduced re-spin of my previous series after Christian corrected a few
>> misconceptions that I had. So lets see if what remains makes sense or is still
>> misguided.
>>
>> To summarise, the series address the following two issues:
>>
>> * Migration rate limiting does not work, at least not for the common case
>> where userspace configures VRAM+GTT. It thinks it can stop migration attempts
>> by playing with bo->allowed_domains vs bo->preferred domains but, both from
>> the code, and from empirical experiments, I see that not working at all. When
>> both masks are identical fiddling with them achieves nothing. Even when they
>> are not identical allowed has a fallback GTT placement which means that when
>> over the migration budget ttm_bo_validate with bo->allowed_domains can cause
>> migration from GTT to VRAM.
>>
>> * Driver thinks it will be re-validating evicted buffers on the next submission
>> but it does not for the very common case of VRAM+GTT because it only checks
>> if current placement is *none* of the preferred placements.
> For APUs at least, we should never migrate because GTT and VRAM are
> both system memory so are effectively equal performance-wise. Maybe
> this regressed when Christian reworked ttm to better handle migrating
> buffers back to VRAM after suspend on dGPUs?
Yeah, that's quite likely. I'm already looking into this.
Regards,
Christian.
>
> Alex
>
>> These two patches appear to have a positive result for a memory intensive game
>> like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
>> set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
>> the migration budget appears to keep buffer blits at bay and improves the
>> minimum frame rate, ie. makes things smoother.
>>
>> From the game's built-in benchmark, average of three runs each:
>>
>> FPS
>> migrated KiB min avg max min-1% min-0.1%
>> because 20784781 10.00 37.00 89.67 22.00 12.33
>> patched 4227688 13.67 37.00 81.33 23.33 15.00
>>
>> Disclaimers that I have is that more runs would be needed to be more confident
>> about the results. And more games. And APU versus discrete.
>>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>>
>> Tvrtko Ursulin (2):
>> drm/amdgpu: Re-validate evicted buffers
>> drm/amdgpu: Actually respect buffer migration budget
>>
>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 21 ++++-
>> 2 files changed, 103 insertions(+), 30 deletions(-)
>>
>> --
>> 2.44.0
>>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC v2 0/2] Discussion around eviction improvements
2024-05-16 12:18 [RFC v2 0/2] Discussion around eviction improvements Tvrtko Ursulin
` (2 preceding siblings ...)
2024-05-16 19:21 ` [RFC v2 0/2] Discussion around eviction improvements Alex Deucher
@ 2024-05-30 10:17 ` Tvrtko Ursulin
3 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2024-05-30 10:17 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König, Friedrich Vock
Hi,
On 16/05/2024 13:18, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>
> Reduced re-spin of my previous series after Christian corrected a few
> misconceptions that I had. So lets see if what remains makes sense or is still
> misguided.
>
> To summarise, the series address the following two issues:
>
> * Migration rate limiting does not work, at least not for the common case
> where userspace configures VRAM+GTT. It thinks it can stop migration attempts
> by playing with bo->allowed_domains vs bo->preferred domains but, both from
> the code, and from empirical experiments, I see that not working at all. When
> both masks are identical fiddling with them achieves nothing. Even when they
> are not identical allowed has a fallback GTT placement which means that when
> over the migration budget ttm_bo_validate with bo->allowed_domains can cause
> migration from GTT to VRAM.
>
> * Driver thinks it will be re-validating evicted buffers on the next submission
> but it does not for the very common case of VRAM+GTT because it only checks
> if current placement is *none* of the preferred placements.
>
> These two patches appear to have a positive result for a memory intensive game
> like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
> set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
> the migration budget appears to keep buffer blits at bay and improves the
> minimum frame rate, ie. makes things smoother.
>
>>From the game's built-in benchmark, average of three runs each:
>
> FPS
> migrated KiB min avg max min-1% min-0.1%
> because 20784781 10.00 37.00 89.67 22.00 12.33
> patched 4227688 13.67 37.00 81.33 23.33 15.00
Any feedback on this series?
As described above, neither migration rate limiting or re-validation of
evicted buffers seems to work as expected currently.
Regards,
Tvrtko
> Disclaimers that I have is that more runs would be needed to be more confident
> about the results. And more games. And APU versus discrete.
>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>
> Tvrtko Ursulin (2):
> drm/amdgpu: Re-validate evicted buffers
> drm/amdgpu: Actually respect buffer migration budget
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 21 ++++-
> 2 files changed, 103 insertions(+), 30 deletions(-)
>
^ permalink raw reply [flat|nested] 8+ messages in thread