From: Felix Kuehling <felix.kuehling@amd.com>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>,
"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/9] drm/amdgpu: generally allow over-commit during BO allocation
Date: Sat, 10 Dec 2022 20:13:10 -0500
Message-ID: <223a4acc-ee20-ae16-8b0b-63d358d0902f@amd.com>
In-Reply-To: <c9243d99-2a02-2e95-82f6-c70db9a08641@gmail.com>

On 2022-12-10 at 09:12, Christian König wrote:
> On 10.12.22 at 07:15, Felix Kuehling wrote:
>> On 2022-11-25 05:21, Christian König wrote:
>>> We already fall back to a dummy BO with no backing store when we
>>> allocate GDS, GWS and OA resources and to GTT when we allocate VRAM.
>>>
>>> Drop all those workarounds and generalize this for GTT as well. This
>>> fixes ENOMEM issues with runaway applications which try to
>>> allocate/free GTT in a loop and are otherwise only limited by the
>>> CPU speed.
>>>
>>> The CS will wait for the cleanup of freed up BOs to satisfy the
>>> various domain specific limits and so effectively throttle those
>>> buggy applications down to a sane allocation behavior again.
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>
>> This patch causes some regressions in KFDTest. KFDMemoryTest.MMBench
>> sees a huge VRAM allocation slow-down. And
>> KFDMemoryTest.LargestVramBufferTest can only allocate half the
>> available memory.
>
> Mhm, I wasn't expecting that we use this for the KFD as well.

Yeah, we use amdgpu_gem_object_create. I guess we could duplicate its
functionality or add a "no_overcommit" or "greedy" parameter for our needs.
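
A rough user-space sketch of the second option, just to illustrate the
idea. The no_overcommit flag and the toy_pick_domain helper are
hypothetical and not part of amdgpu; only the domain bit values mirror
AMDGPU_GEM_DOMAIN_{CPU,GTT,VRAM} from amdgpu_drm.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit values as AMDGPU_GEM_DOMAIN_{CPU,GTT,VRAM} in amdgpu_drm.h. */
#define TOY_DOMAIN_CPU  0x1u
#define TOY_DOMAIN_GTT  0x2u
#define TOY_DOMAIN_VRAM 0x4u

/*
 * Hypothetical helper: pick the domain mask used for the initial
 * placement. With over-commit allowed, the CPU domain is OR'ed in so
 * the BO can start out without backing store. Callers that pass
 * no_overcommit (e.g. KFD in this sketch) keep the requested mask.
 */
static uint32_t toy_pick_domain(uint32_t initial_domain, bool no_overcommit)
{
	assert(initial_domain != 0);	/* caller must request some domain */

	if (no_overcommit)
		return initial_domain;

	return initial_domain | TOY_DOMAIN_CPU;
}
```

With this, a caller that opts out would see toy_pick_domain(TOY_DOMAIN_VRAM,
true) return plain VRAM, while the default path gets VRAM | CPU.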
>
>>
>> This seems to be caused by initially validating VRAM BOs in the CPU
>> domain, which allocates a ttm_tt. A subsequent validation in the VRAM
>> domain involves a copy from GTT to VRAM.
>
> The idea was to initially create the BOs without any backing store.

I thought about it a bit more. I believe the BO creation without backing
store is working as expected. But amdgpu_bo_move can't move the
uninitialized BO directly from system to VRAM. It returns -EMULTIHOP, so
the BO gets moved to GTT first (allocating system memory) before it can
be migrated to VRAM. That adds considerable overhead: an unnecessary
system memory allocation, and all VRAM ends up being zero-initialized on
the CPU and copied through PCIe. I think your idea would work with
almost no overhead if amdgpu_bo_move could directly move a BO without
backing store to VRAM with ttm_bo_move_null.
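
To make the difference concrete, here is a toy user-space model of the
move decision (untested against the driver, and not actual TTM or amdgpu
code). All toy_* names are made up for illustration, and the
null_move_fix switch stands in for the proposed change:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_mem { TOY_SYSTEM, TOY_GTT, TOY_VRAM };

enum toy_path {
	TOY_NULL_MOVE,	/* only switch the placement, nothing to copy */
	TOY_MULTIHOP,	/* bounce through GTT first (the -EMULTIHOP case) */
	TOY_COPY,	/* regular copy between the two placements */
};

struct toy_bo {
	enum toy_mem mem;	/* current placement */
	bool has_backing;	/* false: no pages allocated yet */
};

/* Decide how a BO would get from its current placement to dst. */
static enum toy_path toy_plan_move(const struct toy_bo *bo, enum toy_mem dst,
				   bool null_move_fix)
{
	assert(bo != NULL);

	/* Proposed shortcut: a BO without backing store has no data to move. */
	if (null_move_fix && bo->mem == TOY_SYSTEM && !bo->has_backing)
		return TOY_NULL_MOVE;

	/* No direct system <-> VRAM path: a hop through GTT is required. */
	if ((bo->mem == TOY_SYSTEM && dst == TOY_VRAM) ||
	    (bo->mem == TOY_VRAM && dst == TOY_SYSTEM))
		return TOY_MULTIHOP;

	/* Everything else is treated as a plain copy in this toy model. */
	return TOY_COPY;
}
```

For a freshly created BO (system placement, no backing store) headed to
VRAM, the model picks TOY_MULTIHOP without the shortcut and TOY_NULL_MOVE
with it. A BO that already has pages still takes the hop in both cases.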

Regards,
  Felix

>
>>
>> After that, freeing of BOs can get delayed by the ghost object of a
>> previous migration, which delays calling release notifiers and causes
>> problems for KFD's available memory accounting.
>>
>> I experimented with a workaround that validates BOs immediately after
>> allocation, but that only moves around the delays and doesn't solve
>> the problem. During those experiments I may also have stumbled over a
>> bug in ttm_buffer_object_transfer: It calls ttm_bo_set_bulk_move
>> before initializing and locking fbo->base.base._resv. This results in
>> a flood of warnings because ttm_bo_set_bulk_move expects the
>> reservation to be locked.
>>
>> Right now I'd like to remove the bp.domain = initial_domain |
>> AMDGPU_GEM_DOMAIN_CPU change in amdgpu_gem_object_create to fix this.
>
> Yeah, let's revert and investigate this first.
>
> Thanks,
> Christian.
>
>>
>> Regards,
>> Felix
>>
>>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 16 +++-------------
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +-----
>>> 2 files changed, 4 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> index a0780a4e3e61..62e98f1ad770 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> @@ -113,7 +113,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
>>> bp.resv = resv;
>>> bp.preferred_domain = initial_domain;
>>> bp.flags = flags;
>>> - bp.domain = initial_domain;
>>> + bp.domain = initial_domain | AMDGPU_GEM_DOMAIN_CPU;
>>> bp.bo_ptr_size = sizeof(struct amdgpu_bo);
>>> r = amdgpu_bo_create_user(adev, &bp, &ubo);
>>> @@ -332,20 +332,10 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
>>> }
>>> initial_domain = (u32)(0xffffffff & args->in.domains);
>>> -retry:
>>> r = amdgpu_gem_object_create(adev, size, args->in.alignment,
>>> - initial_domain,
>>> - flags, ttm_bo_type_device, resv, &gobj);
>>> + initial_domain, flags, ttm_bo_type_device,
>>> + resv, &gobj);
>>> if (r && r != -ERESTARTSYS) {
>>> - if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
>>> - flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
>>> - goto retry;
>>> - }
>>> -
>>> - if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
>>> - initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
>>> - goto retry;
>>> - }
>>> DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, %d)\n",
>>> size, initial_domain, args->in.alignment, r);
>>> }
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index 974e85d8b6cc..919bbea2e3ac 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -581,11 +581,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>>> bo->flags |= AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
>>> bo->tbo.bdev = &adev->mman.bdev;
>>> - if (bp->domain & (AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA |
>>> - AMDGPU_GEM_DOMAIN_GDS))
>>> - amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
>>> - else
>>> - amdgpu_bo_placement_from_domain(bo, bp->domain);
>>> + amdgpu_bo_placement_from_domain(bo, bp->domain);
>>> if (bp->type == ttm_bo_type_kernel)
>>> bo->tbo.priority = 1;
>