AMD-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Chunming Zhou <zhoucm1-5C7GfCeVMHo@public.gmane.org>
To: "Koenig,
	Christian" <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>,
	"Olsak, Marek" <Marek.Olsak-5C7GfCeVMHo@public.gmane.org>,
	"Zhou,
	David(ChunMing)" <David1.Zhou-5C7GfCeVMHo@public.gmane.org>,
	"Liang, Prike" <Prike.Liang-5C7GfCeVMHo@public.gmane.org>,
	"dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>,
	"amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
Subject: Re: [PATCH 06/10] drm/ttm: fix busy memory to fail other user v10
Date: Thu, 23 May 2019 11:50:37 +0000	[thread overview]
Message-ID: <5d68ba04-250d-918e-3633-ec45e5b18904@amd.com> (raw)
In-Reply-To: <16918096-1430-d581-7284-a987aacb89da-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>


在 2019/5/23 19:03, Christian König 写道:
> [CAUTION: External Email]
>
> Am 23.05.19 um 12:24 schrieb zhoucm1:
>>
>>
>> On 2019年05月22日 20:59, Christian König wrote:
>>> [CAUTION: External Email]
>>>
>>> BOs on the LRU might be blocked during command submission
>>> and cause OOM situations.
>>>
>>> Avoid this by blocking for the first busy BO not locked by
>>> the same ticket as the BO we are searching space for.
>>>
>>> v10: completely start over with the patch since we didn't
>>>       handled a whole bunch of corner cases.
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>>   drivers/gpu/drm/ttm/ttm_bo.c | 77 
>>> ++++++++++++++++++++++++++++++------
>>>   1 file changed, 66 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 4c6389d849ed..861facac33d4 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -771,32 +771,72 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
>>>    * b. Otherwise, trylock it.
>>>    */
>>>   static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object
>>> *bo,
>>> -                       struct ttm_operation_ctx *ctx, bool *locked)
>>> +                       struct ttm_operation_ctx *ctx, bool *locked,
>>> bool *busy)
>>>   {
>>>          bool ret = false;
>>>
>>> -       *locked = false;
>>>          if (bo->resv == ctx->resv) {
>>>                  reservation_object_assert_held(bo->resv);
>>>                  if (ctx->flags & TTM_OPT_FLAG_ALLOW_RES_EVICT
>>>                      || !list_empty(&bo->ddestroy))
>>>                          ret = true;
>>> +               *locked = false;
>>> +               if (busy)
>>> +                       *busy = false;
>>>          } else {
>>> -               *locked = reservation_object_trylock(bo->resv);
>>> -               ret = *locked;
>>> +               ret = reservation_object_trylock(bo->resv);
>>> +               *locked = ret;
>>> +               if (busy)
>>> +                       *busy = !ret;
>>>          }
>>>
>>>          return ret;
>>>   }
>>>
>>> +/**
>>> + * ttm_mem_evict_wait_busy - wait for a busy BO to become available
>>> + *
>>> + * @busy_bo: BO which couldn't be locked with trylock
>>> + * @ctx: operation context
>>> + * @ticket: acquire ticket
>>> + *
>>> + * Try to lock a busy buffer object to avoid failing eviction.
>>> + */
>>> +static int ttm_mem_evict_wait_busy(struct ttm_buffer_object *busy_bo,
>>> +                                  struct ttm_operation_ctx *ctx,
>>> +                                  struct ww_acquire_ctx *ticket)
>>> +{
>>> +       int r;
>>> +
>>> +       if (!busy_bo || !ticket)
>>> +               return -EBUSY;
>>> +
>>> +       if (ctx->interruptible)
>>> +               r = 
>>> reservation_object_lock_interruptible(busy_bo->resv,
>>> + ticket);
>>> +       else
>>> +               r = reservation_object_lock(busy_bo->resv, ticket);
>>> +
>>> +       /*
>>> +        * TODO: It would be better to keep the BO locked until
>>> allocation is at
>>> +        * least tried one more time, but that would mean a much
>>> larger rework
>>> +        * of TTM.
>>> +        */
>>> +       if (!r)
>>> +               reservation_object_unlock(busy_bo->resv);
>>> +
>>> +       return r == -EDEADLK ? -EAGAIN : r;
>>> +}
>>> +
>>>   static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
>>>                                 uint32_t mem_type,
>>>                                 const struct ttm_place *place,
>>> -                              struct ttm_operation_ctx *ctx)
>>> +                              struct ttm_operation_ctx *ctx,
>>> +                              struct ww_acquire_ctx *ticket)
>>>   {
>>> +       struct ttm_buffer_object *bo = NULL, *busy_bo = NULL;
>>>          struct ttm_bo_global *glob = bdev->glob;
>>>          struct ttm_mem_type_manager *man = &bdev->man[mem_type];
>>> -       struct ttm_buffer_object *bo = NULL;
>>>          bool locked = false;
>>>          unsigned i;
>>>          int ret;
>>> @@ -804,8 +844,15 @@ static int ttm_mem_evict_first(struct
>>> ttm_bo_device *bdev,
>>>          spin_lock(&glob->lru_lock);
>>>          for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
>>>                  list_for_each_entry(bo, &man->lru[i], lru) {
>>> -                       if (!ttm_bo_evict_swapout_allowable(bo, ctx,
>>> &locked))
>>> +                       bool busy;
>>> +
>>> +                       if (!ttm_bo_evict_swapout_allowable(bo, ctx,
>>> &locked,
>>> + &busy)) {
>>> +                               if (busy && !busy_bo &&
>>> +                                   bo->resv->lock.ctx != ticket)
>>> +                                       busy_bo = bo;
>>>                                  continue;
>>> +                       }
>>>
>>>                          if (place &&
>>> !bdev->driver->eviction_valuable(bo,
>>> place)) {
>>> @@ -824,8 +871,13 @@ static int ttm_mem_evict_first(struct
>>> ttm_bo_device *bdev,
>>>          }
>>>
>>>          if (!bo) {
>>> +               if (busy_bo)
>>> +                       ttm_bo_get(busy_bo);
>>>                  spin_unlock(&glob->lru_lock);
>>> -               return -EBUSY;
>>> +               ret = ttm_mem_evict_wait_busy(busy_bo, ctx, ticket);
>> If you rely on EAGAIN, why do you still try to lock busy_bo? any
>> negative effect if directly return EAGAIN without tring lock?
>
> Yeah, that would burn a lot of CPU cycles because we would essentially
> busy wait for the BO to become unlocked.
>
> When we only return in case of a deadlock the other thread can continue
> with its eviction while we reacquire all looks during EAGAIN handling.
>
> Even directly unlocking the BO as I do here is a bit questionable. But I
> couldn't get the original logic with finding a new BO to evict to work
> correctly, that's why I have the TODO comment in the function itself as
> well.

Yes, it looks very wired.

original logic should already work verified by Prike. Friendly, you  
need go through lookup lru loop again, judge allowable, and whether 
busy_bo->resv and requried_bo->resv are same, and make evict_allowable 
in ctx for same lock of busy_bo before loop again.

Or at least, if busy_bo is evict allowable, you  can direclty evict 
busy_bo after locked successfully.

-David

>
> Christian.
>
>>
>> -David
>>> +               if (busy_bo)
>>> +                       ttm_bo_put(busy_bo);
>>> +               return ret;
>>>          }
>>>
>>>          kref_get(&bo->list_kref);
>>> @@ -911,7 +963,8 @@ static int ttm_bo_mem_force_space(struct
>>> ttm_buffer_object *bo,
>>>                          return ret;
>>>                  if (mem->mm_node)
>>>                          break;
>>> -               ret = ttm_mem_evict_first(bdev, mem->mem_type, place,
>>> ctx);
>>> +               ret = ttm_mem_evict_first(bdev, mem->mem_type, place,
>>> ctx,
>>> + bo->resv->lock.ctx);
>>>                  if (unlikely(ret != 0))
>>>                          return ret;
>>>          } while (1);
>>> @@ -1426,7 +1479,8 @@ static int ttm_bo_force_list_clean(struct
>>> ttm_bo_device *bdev,
>>>          for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
>>>                  while (!list_empty(&man->lru[i])) {
>>>                          spin_unlock(&glob->lru_lock);
>>> -                       ret = ttm_mem_evict_first(bdev, mem_type,
>>> NULL, &ctx);
>>> +                       ret = ttm_mem_evict_first(bdev, mem_type,
>>> NULL, &ctx,
>>> +                                                 NULL);
>>>                          if (ret)
>>>                                  return ret;
>>>                          spin_lock(&glob->lru_lock);
>>> @@ -1797,7 +1851,8 @@ int ttm_bo_swapout(struct ttm_bo_global *glob,
>>> struct ttm_operation_ctx *ctx)
>>>          spin_lock(&glob->lru_lock);
>>>          for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
>>>                  list_for_each_entry(bo, &glob->swap_lru[i], swap) {
>>> -                       if (ttm_bo_evict_swapout_allowable(bo, ctx,
>>> &locked)) {
>>> +                       if (ttm_bo_evict_swapout_allowable(bo, ctx,
>>> &locked,
>>> + NULL)) {
>>>                                  ret = 0;
>>>                                  break;
>>>                          }
>>> -- 
>>> 2.17.1
>>>
>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

  parent reply	other threads:[~2019-05-23 11:50 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-22 12:59 [PATCH 01/10] drm/ttm: Make LRU removal optional Christian König
2019-05-22 12:59 ` [PATCH 02/10] drm/ttm: return immediately in case of a signal Christian König
2019-05-22 12:59 ` [PATCH 03/10] drm/ttm: remove manual placement preference Christian König
2019-05-22 12:59 ` [PATCH 10/10] drm/amdgpu: stop removing BOs from the LRU v3 Christian König
     [not found]   ` <20190522125947.4592-10-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2019-05-22 19:43     ` Kuehling, Felix
     [not found]       ` <48ac98a8-de22-3549-5d63-078a0effab72-5C7GfCeVMHo@public.gmane.org>
2019-05-23  9:06         ` Christian König
     [not found]           ` <eea6245e-616d-eb16-8521-2f21ce5d6d25-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-05-24 21:34             ` Kuehling, Felix
     [not found]               ` <776d29df-428f-ad98-8e38-4b191b602abb-5C7GfCeVMHo@public.gmane.org>
2019-05-27 10:51                 ` Koenig, Christian
2019-05-23  8:27     ` Liang, Prike
     [not found] ` <20190522125947.4592-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2019-05-22 12:59   ` [PATCH 04/10] drm/ttm: cleanup ttm_bo_mem_space Christian König
2019-05-22 12:59   ` [PATCH 05/10] drm/ttm: immediately move BOs to the new LRU v2 Christian König
2019-05-22 12:59   ` [PATCH 06/10] drm/ttm: fix busy memory to fail other user v10 Christian König
2019-05-23 10:24     ` zhoucm1
2019-05-23 11:03       ` Christian König
     [not found]         ` <16918096-1430-d581-7284-a987aacb89da-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-05-23 11:50           ` Chunming Zhou [this message]
     [not found]             ` <5d68ba04-250d-918e-3633-ec45e5b18904-5C7GfCeVMHo@public.gmane.org>
2019-05-23 14:15               ` Koenig, Christian
2019-05-24  5:35         ` Liang, Prike
     [not found]           ` <MN2PR12MB35364235378F29899838CD80FB020-rweVpJHSKTovpq7YPKzLfQdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-05-24  8:49             ` Christian König
     [not found]     ` <20190522125947.4592-6-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2019-06-26  6:36       ` Kuehling, Felix
2019-05-22 12:59   ` [PATCH 07/10] drm/amd/display: use ttm_eu_reserve_buffers instead of amdgpu_bo_reserve v2 Christian König
2019-05-22 12:59   ` [PATCH 08/10] drm/amdgpu: drop some validation failure messages Christian König
2019-05-22 12:59   ` [PATCH 09/10] drm/amdgpu: create GDS, GWS and OA in system domain Christian König
2019-05-23  9:15   ` [PATCH 01/10] drm/ttm: Make LRU removal optional zhoucm1
     [not found]     ` <fbb023f9-28e7-2ac8-994f-e262da597098-5C7GfCeVMHo@public.gmane.org>
2019-05-23  9:39       ` Christian König
  -- strict thread matches above, loose matches on Subject: below --
2019-05-28 16:25 [PATCH 01/10] drm/ttm: Make LRU removal optional v2 Christian König
2019-05-28 16:25 ` [PATCH 06/10] drm/ttm: fix busy memory to fail other user v10 Christian König
2019-05-29 12:26 [PATCH 01/10] drm/ttm: Make LRU removal optional v2 Christian König
2019-05-29 12:26 ` [PATCH 06/10] drm/ttm: fix busy memory to fail other user v10 Christian König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5d68ba04-250d-918e-3633-ec45e5b18904@amd.com \
    --to=zhoucm1-5c7gfcevmho@public.gmane.org \
    --cc=Christian.Koenig-5C7GfCeVMHo@public.gmane.org \
    --cc=David1.Zhou-5C7GfCeVMHo@public.gmane.org \
    --cc=Marek.Olsak-5C7GfCeVMHo@public.gmane.org \
    --cc=Prike.Liang-5C7GfCeVMHo@public.gmane.org \
    --cc=amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org \
    --cc=dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox