Intel-GFX Archive on lore.kernel.org
From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: Felix Kuehling <felix.kuehling@amd.com>,
	Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 1/9] drm/amdgpu: generally allow over-commit during BO allocation
Date: Sat, 10 Dec 2022 15:12:26 +0100
Message-ID: <c9243d99-2a02-2e95-82f6-c70db9a08641@gmail.com>
In-Reply-To: <5ad09c47-1f50-07ce-7b8b-f8e4195f2256@amd.com>

On 10.12.22 at 07:15, Felix Kuehling wrote:
> On 2022-11-25 05:21, Christian König wrote:
>> We already fall back to a dummy BO with no backing store when we
>> allocate GDS, GWS and OA resources and to GTT when we allocate VRAM.
>>
>> Drop all those workarounds and generalize this for GTT as well. This
>> fixes ENOMEM issues with runaway applications which try to allocate/free
>> GTT in a loop and are otherwise only limited by the CPU speed.
>>
>> The CS will wait for the cleanup of freed BOs to satisfy the
>> various domain-specific limits and so effectively throttle those
>> buggy applications down to sane allocation behavior again.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>
> This patch causes some regressions in KFDTest. KFDMemoryTest.MMBench 
> sees a huge VRAM allocation slow-down. And 
> KFDMemoryTest.LargestVramBufferTest can only allocate half the 
> available memory.

Mhm, I wasn't expecting that we use this for the KFD as well.

>
> This seems to be caused by initially validating VRAM BOs in the CPU 
> domain, which allocates a ttm_tt. A subsequent validation in the VRAM 
> domain involves a copy from GTT to VRAM.

The idea was to initially create the BOs without any backing store.

>
> After that, freeing of BOs can get delayed by the ghost object of a 
> previous migration, which delays calling release notifiers and causes 
> problems for KFD's available-memory accounting.
>
> I experimented with a workaround that validates BOs immediately after 
> allocation, but that only moves around the delays and doesn't solve 
> the problem. During those experiments I may also have stumbled over a 
> bug in ttm_buffer_object_transfer: It calls ttm_bo_set_bulk_move 
> before initializing and locking fbo->base.base._resv. This results in 
> a flood of warnings because ttm_bo_set_bulk_move expects the 
> reservation to be locked.
>
> Right now I'd like to remove the bp.domain = initial_domain | 
> AMDGPU_GEM_DOMAIN_CPU change in amdgpu_gem_object_create to fix this.

Yeah, let's revert and investigate this first.
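
For reference, a minimal sketch of what the bp.domain line changes, written
as standalone C rather than driver code. The toy_* names are hypothetical;
the bit values are what I believe include/uapi/drm/amdgpu_drm.h uses, but
treat them as assumptions here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the AMDGPU_GEM_DOMAIN_* bits; redefined under toy
 * names so the sketch stands alone. */
#define TOY_DOMAIN_CPU  0x1u
#define TOY_DOMAIN_GTT  0x2u
#define TOY_DOMAIN_VRAM 0x4u

/* Allowed-placement mask passed to BO creation.
 * overcommit != 0 models the patched line, 0 models the revert. */
uint32_t toy_create_domain(uint32_t initial_domain, int overcommit)
{
	assert(initial_domain != 0);
	return overcommit ? (initial_domain | TOY_DOMAIN_CPU) : initial_domain;
}

/* True if a VRAM request may start out in the CPU domain and therefore
 * need a later move into VRAM. */
int toy_may_need_migration(uint32_t initial_domain, int overcommit)
{
	uint32_t allowed = toy_create_domain(initial_domain, overcommit);

	return (initial_domain & TOY_DOMAIN_VRAM) && (allowed & TOY_DOMAIN_CPU);
}
```

With the patched line a VRAM-only request also permits CPU placement, which
is where the extra migration in the report can come from; with the revert
the mask is just the requested domain.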

Thanks,
Christian.

>
> Regards,
>   Felix
>
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 16 +++-------------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  6 +-----
>>   2 files changed, 4 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> index a0780a4e3e61..62e98f1ad770 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> @@ -113,7 +113,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
>>       bp.resv = resv;
>>       bp.preferred_domain = initial_domain;
>>       bp.flags = flags;
>> -    bp.domain = initial_domain;
>> +    bp.domain = initial_domain | AMDGPU_GEM_DOMAIN_CPU;
>>       bp.bo_ptr_size = sizeof(struct amdgpu_bo);
>>         r = amdgpu_bo_create_user(adev, &bp, &ubo);
>> @@ -332,20 +332,10 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
>>       }
>>         initial_domain = (u32)(0xffffffff & args->in.domains);
>> -retry:
>>       r = amdgpu_gem_object_create(adev, size, args->in.alignment,
>> -                     initial_domain,
>> -                     flags, ttm_bo_type_device, resv, &gobj);
>> +                     initial_domain, flags, ttm_bo_type_device,
>> +                     resv, &gobj);
>>       if (r && r != -ERESTARTSYS) {
>> -        if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
>> -            flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
>> -            goto retry;
>> -        }
>> -
>> -        if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
>> -            initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
>> -            goto retry;
>> -        }
>>           DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, %d)\n",
>>                   size, initial_domain, args->in.alignment, r);
>>       }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> index 974e85d8b6cc..919bbea2e3ac 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> @@ -581,11 +581,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>>           bo->flags |= AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
>>         bo->tbo.bdev = &adev->mman.bdev;
>> -    if (bp->domain & (AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA |
>> -              AMDGPU_GEM_DOMAIN_GDS))
>> -        amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
>> -    else
>> -        amdgpu_bo_placement_from_domain(bo, bp->domain);
>> +    amdgpu_bo_placement_from_domain(bo, bp->domain);
>>       if (bp->type == ttm_bo_type_kernel)
>>           bo->tbo.priority = 1;
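
The retry fallback that the amdgpu_gem_create_ioctl hunk above removes can
be modelled roughly as follows. This is a userspace sketch under stated
assumptions: the toy_* names, the function-pointer allocator and the
"VRAM is full" allocator are all hypothetical, the flag and domain values
are what I believe the uapi header uses, and the real code additionally
skips the fallback for -ERESTARTSYS, which has no userspace equivalent here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the uapi values; redefined under toy names. */
#define TOY_DOMAIN_GTT                 0x2u
#define TOY_DOMAIN_VRAM                0x4u
#define TOY_CREATE_CPU_ACCESS_REQUIRED (1u << 0)

/* Hypothetical allocator hook: returns 0 or a negative errno. */
typedef int (*toy_alloc_fn)(uint32_t domain, uint32_t flags);

/* Old behaviour: on failure, first drop the CPU-access requirement,
 * then let a VRAM-only request spill into GTT, retrying each time. */
int toy_create_with_retry(toy_alloc_fn alloc, uint32_t domain, uint32_t flags)
{
	int r;

	assert(alloc != NULL);
retry:
	r = alloc(domain, flags);
	if (r) {
		if (flags & TOY_CREATE_CPU_ACCESS_REQUIRED) {
			flags &= ~TOY_CREATE_CPU_ACCESS_REQUIRED;
			goto retry;
		}
		if (domain == TOY_DOMAIN_VRAM) {
			domain |= TOY_DOMAIN_GTT;
			goto retry;
		}
	}
	return r;
}

/* Behaviour after the hunk: a single attempt, no fallback in the ioctl. */
int toy_create_single(toy_alloc_fn alloc, uint32_t domain, uint32_t flags)
{
	assert(alloc != NULL);
	return alloc(domain, flags);
}

/* Hypothetical allocator where VRAM is exhausted: only requests that
 * also allow GTT succeed. */
int toy_alloc_vram_full(uint32_t domain, uint32_t flags)
{
	(void)flags;
	return (domain & TOY_DOMAIN_GTT) ? 0 : -ENOMEM;
}
```

Against the exhausted-VRAM allocator the old path succeeds on its third
attempt while the single-attempt path returns -ENOMEM. In the patch itself
the ioctl no longer needs this fallback because BO creation also allows the
CPU domain, which this sketch deliberately does not model.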

