AMD-GFX Archive on lore.kernel.org
From: "Koenig, Christian" <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>
To: "Kuehling, Felix" <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>,
	"Olsak, Marek" <Marek.Olsak-5C7GfCeVMHo@public.gmane.org>,
	"Zhou,
	David(ChunMing)" <David1.Zhou-5C7GfCeVMHo@public.gmane.org>,
	"Liang, Prike" <Prike.Liang-5C7GfCeVMHo@public.gmane.org>,
	"dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>,
	"amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
Subject: Re: [PATCH 10/10] drm/amdgpu: stop removing BOs from the LRU v3
Date: Mon, 27 May 2019 10:51:06 +0000
Message-ID: <198d6abf-146d-c8f8-5602-37b95cd6b809@amd.com>
In-Reply-To: <776d29df-428f-ad98-8e38-4b191b602abb-5C7GfCeVMHo@public.gmane.org>

Am 24.05.19 um 23:34 schrieb Kuehling, Felix:
> On 2019-05-23 5:06 a.m., Christian König wrote:
>> Leaving BOs on the LRU is harmless. We have always done this for VM page
>> tables and per-VM BOs.
>>
>> The key point is that BOs which couldn't be reserved can't be evicted.
>> What happened is that an application used basically all of VRAM
>> during command submission (CS), and because of this the X server
>> couldn't pin a BO for scanout.
>>
>> Now we keep the BOs on the LRU and modify TTM to block until the CS
>> completes, which in turn allows the X server to pin its BO for scanout.
>
> OK, let me rephrase that to make sure I understand it correctly. I think
> the point is that eviction candidates come from an LRU list, so leaving
> things on the LRU makes more BOs available for eviction and avoids OOM
> situations. To take advantage of that, patch 6 adds the ability to wait
> for reserved BOs when there is nothing easier to evict.
>
> ROCm applications like to use lots of memory. So it probably makes sense
> for us to stop removing our BOs from the LRU as well while we
> mass-validate our BOs in amdgpu_amdkfd_gpuvm_restore_process_bos.

Well, that would allow concurrent calls of
amdgpu_amdkfd_gpuvm_restore_process_bos() to wait for each other.

If that's what you want, then yes, that certainly makes sense.

Regards,
Christian.

>
> Regards,
>     Felix
>
>
>> Christian.
>>
>> Am 22.05.19 um 21:43 schrieb Kuehling, Felix:
>>> Can you explain how this avoids OOM situations? When is it safe to leave
>>> a reserved BO on the LRU list? Could we do the same thing in
>>> amdgpu_amdkfd_gpuvm.c? And if we did, what would be the expected side
>>> effects or consequences?
>>>
>>> Thanks,
>>>      Felix
>>>
>>> On 2019-05-22 8:59 a.m., Christian König wrote:
>>>> This avoids OOM situations when we have lots of threads
>>>> submitting at the same time.
>>>>
>>>> v3: apply this to the whole driver, not just CS
>>>>
>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>> ---
>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 2 +-
>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c    | 2 +-
>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 4 ++--
>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 +-
>>>>     4 files changed, 5 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> index 20f2955d2a55..3e2da24cd17a 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>>>>            }
>>>>
>>>>            r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
>>>> -                                  &duplicates, true);
>>>> +                                  &duplicates, false);
>>>>            if (unlikely(r != 0)) {
>>>>                    if (r != -ERESTARTSYS)
>>>>                            DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>>>> index 06f83cac0d3a..f660628e6af9 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>>>> @@ -79,7 +79,7 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>>            list_add(&csa_tv.head, &list);
>>>>            amdgpu_vm_get_pd_bo(vm, &list, &pd);
>>>>
>>>> -       r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL, true);
>>>> +       r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL, false);
>>>>            if (r) {
>>>>                    DRM_ERROR("failed to reserve CSA,PD BOs: err=%d\n", r);
>>>>                    return r;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> index d513a5ad03dd..ed25a4e14404 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> @@ -171,7 +171,7 @@ void amdgpu_gem_object_close(struct drm_gem_object *obj,
>>>>
>>>>            amdgpu_vm_get_pd_bo(vm, &list, &vm_pd);
>>>>
>>>> -       r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates, true);
>>>> +       r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates, false);
>>>>            if (r) {
>>>>                    dev_err(adev->dev, "leaking bo va because "
>>>>                            "we fail to reserve bo (%d)\n", r);
>>>> @@ -608,7 +608,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>>>>
>>>>            amdgpu_vm_get_pd_bo(&fpriv->vm, &list, &vm_pd);
>>>>
>>>> -       r = ttm_eu_reserve_buffers(&ticket, &list, true, &duplicates, true);
>>>> +       r = ttm_eu_reserve_buffers(&ticket, &list, true, &duplicates, false);
>>>>            if (r)
>>>>                    goto error_unref;
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>>> index c430e8259038..d60593cc436e 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>>> @@ -155,7 +155,7 @@ static inline int amdgpu_bo_reserve(struct amdgpu_bo *bo, bool no_intr)
>>>>            struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>>>>            int r;
>>>>
>>>> -       r = ttm_bo_reserve(&bo->tbo, !no_intr, false, NULL);
>>>> +       r = __ttm_bo_reserve(&bo->tbo, !no_intr, false, NULL);
>>>>            if (unlikely(r != 0)) {
>>>>                    if (r != -ERESTARTSYS)
>>>>                            dev_err(adev->dev, "%p reserve failed\n", bo);
>>>> -- 
>>>> 2.17.1
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Thread overview: 28+ messages
2019-05-22 12:59 [PATCH 01/10] drm/ttm: Make LRU removal optional Christian König
2019-05-22 12:59 ` [PATCH 02/10] drm/ttm: return immediately in case of a signal Christian König
2019-05-22 12:59 ` [PATCH 03/10] drm/ttm: remove manual placement preference Christian König
     [not found] ` <20190522125947.4592-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2019-05-22 12:59   ` [PATCH 04/10] drm/ttm: cleanup ttm_bo_mem_space Christian König
2019-05-22 12:59   ` [PATCH 05/10] drm/ttm: immediately move BOs to the new LRU v2 Christian König
2019-05-22 12:59   ` [PATCH 06/10] drm/ttm: fix busy memory to fail other user v10 Christian König
2019-05-23 10:24     ` zhoucm1
2019-05-23 11:03       ` Christian König
     [not found]         ` <16918096-1430-d581-7284-a987aacb89da-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-05-23 11:50           ` Chunming Zhou
     [not found]             ` <5d68ba04-250d-918e-3633-ec45e5b18904-5C7GfCeVMHo@public.gmane.org>
2019-05-23 14:15               ` Koenig, Christian
2019-05-24  5:35         ` Liang, Prike
     [not found]           ` <MN2PR12MB35364235378F29899838CD80FB020-rweVpJHSKTovpq7YPKzLfQdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-05-24  8:49             ` Christian König
     [not found]     ` <20190522125947.4592-6-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2019-06-26  6:36       ` Kuehling, Felix
2019-05-22 12:59   ` [PATCH 07/10] drm/amd/display: use ttm_eu_reserve_buffers instead of amdgpu_bo_reserve v2 Christian König
2019-05-22 12:59   ` [PATCH 08/10] drm/amdgpu: drop some validation failure messages Christian König
2019-05-22 12:59   ` [PATCH 09/10] drm/amdgpu: create GDS, GWS and OA in system domain Christian König
2019-05-23  9:15   ` [PATCH 01/10] drm/ttm: Make LRU removal optional zhoucm1
     [not found]     ` <fbb023f9-28e7-2ac8-994f-e262da597098-5C7GfCeVMHo@public.gmane.org>
2019-05-23  9:39       ` Christian König
2019-05-22 12:59 ` [PATCH 10/10] drm/amdgpu: stop removing BOs from the LRU v3 Christian König
     [not found]   ` <20190522125947.4592-10-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2019-05-22 19:43     ` Kuehling, Felix
     [not found]       ` <48ac98a8-de22-3549-5d63-078a0effab72-5C7GfCeVMHo@public.gmane.org>
2019-05-23  9:06         ` Christian König
     [not found]           ` <eea6245e-616d-eb16-8521-2f21ce5d6d25-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-05-24 21:34             ` Kuehling, Felix
     [not found]               ` <776d29df-428f-ad98-8e38-4b191b602abb-5C7GfCeVMHo@public.gmane.org>
2019-05-27 10:51                 ` Koenig, Christian [this message]
2019-05-23  8:27     ` Liang, Prike
  -- strict thread matches above, loose matches on Subject: below --
2019-05-28 16:25 [PATCH 01/10] drm/ttm: Make LRU removal optional v2 Christian König
     [not found] ` <20190528162557.1280-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2019-05-28 16:25   ` [PATCH 10/10] drm/amdgpu: stop removing BOs from the LRU v3 Christian König
2019-05-29 12:26 [PATCH 01/10] drm/ttm: Make LRU removal optional v2 Christian König
2019-05-29 12:27 ` [PATCH 10/10] drm/amdgpu: stop removing BOs from the LRU v3 Christian König
     [not found]   ` <20190529122702.13035-10-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2019-05-29 13:10     ` Zhou, David(ChunMing)
2019-05-29 13:40     ` Pelloux-prayer, Pierre-eric
