AMD-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: "Yu, Lang" <Lang.Yu@amd.com>,
	"Koenig, Christian" <Christian.Koenig@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Cc: "Kuehling, Felix" <Felix.Kuehling@amd.com>,
	"Huang, Ray" <Ray.Huang@amd.com>
Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
Date: Fri, 24 Sep 2021 08:37:26 +0200	[thread overview]
Message-ID: <fc860cdc-e17a-ac94-b863-df8c21d4bacf@gmail.com> (raw)
In-Reply-To: <DM6PR12MB4250D2851824AA4F27BE7FDBFBA49@DM6PR12MB4250.namprd12.prod.outlook.com>

Am 24.09.21 um 08:34 schrieb Yu, Lang:
> [AMD Official Use Only]
>
>
>
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Friday, September 24, 2021 1:54 PM
>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray
>> <Ray.Huang@amd.com>
>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>
>>
>> Am 24.09.21 um 07:50 schrieb Yu, Lang:
>>> [AMD Official Use Only]
>>>> [SNIP]
>>>>>>> Hi Christian,
>>>>>>>
>>>>>>> Thanks for your explanation and advice. I got your point.
>>>>>>> Actually, these BOs are allocated and pinned during a kfd process lifecycle.
>>>>>>> I will try to add a flag into struct kgd_mem to indicate which BO
>>>>>>> is pined and should be unpinned. Which will make
>>>>>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks!
>>>>>> That isn't to much better. The real solution would be to unpin them
>>>>>> when the kfd process is destroyed.
>>>>> Yes, will unpin them when the kfd process is destroyed.
>>>>> But we should indicate which BO is pinned and should be unpinned. Right?
>>>> Well not with a flag or something like that.
>>>>
>>>> The knowledge which BO is pinned and needs to be unpinned should come
>>>> from the control logic and not be papered over by some general handling.
>>>> That's the background why we have removed the general handling for
>>>> this from TTM in the first place.
>>>>
>>>> In other words when you need to pin a BO because it is kmapped it
>>>> should be unpinned when it is kunmapped and if you don't kunmap at
>>>> all then there is something wrong with the code structure from a higher level
>> point of view.
>>> Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a
>>> kmap, but without a kunmap when the kfd process is destroyed. The flag
>> actually indicates kmap/kunmap.
>>
>> Well that's the wrong approach then. I mean you need to have the BO reference
>> and the mapped pointer somewhere, don't you?
>>
>> How do you clean those up?
> They are respectively cleaned by " kfd_process_device_free_bos " and " kfd_process_destroy_pdds".
> Let me put the code here. Thanks!
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index ec028cf963f5..d65b3bf13fd8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -81,6 +81,7 @@ struct kgd_mem {
>
>          bool aql_queue;
>          bool is_imported;
> +       bool is_mapped_to_kernel;

Yeah, that is exactly what you absolutely should NOT do.

>   };
>
>   /* KFD Memory Eviction */
> @@ -278,6 +279,8 @@ int amdgpu_amdkfd_gpuvm_sync_memory(
>                  struct kgd_dev *kgd, struct kgd_mem *mem, bool intr);
>   int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
>                  struct kgd_mem *mem, void **kptr, uint64_t *size);

The real question is who is calling this function here?

> +void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd,
> +               struct kgd_mem *mem);
>   int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info,
>                                              struct dma_fence **ef);
>   int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 2d6b2d77b738..45ccbe9f63ee 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1857,6 +1857,8 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
>
>          amdgpu_bo_unreserve(bo);
>
> +       mem->is_mapped_to_kernel = true;
> +
>          mutex_unlock(&mem->process_info->lock);
>          return 0;
>
> @@ -1870,6 +1872,20 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
>          return ret;
>   }
>
> +void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd, struct kgd_mem *mem)
> +{
> +       struct amdgpu_bo *bo = mem->bo;
> +
> +       if (!mem->is_mapped_to_kernel)
> +               return;
> +
> +       amdgpu_bo_reserve(bo, true);
> +       amdgpu_bo_kunmap(bo);
> +       amdgpu_bo_unpin(bo);
> +       amdgpu_bo_unreserve(bo);
> +       mem->is_mapped_to_kernel = false;
> +}
> +
>   int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd,
>                                                struct kfd_vm_fault_info *mem)
>   {
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index 21ec8a18cad2..f5506b153aed 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -941,6 +941,8 @@ static void kfd_process_device_free_bos(struct kfd_process_device *pdd)
>                                  peer_pdd->dev->kgd, mem, peer_pdd->drm_priv);
>                  }
>
> +               amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(pdd->dev->kgd, mem);
> +

That's a general cleanup function for user space allocations and should 
not be abused for stuff like that.

Regards,
Christian.

>                  amdgpu_amdkfd_gpuvm_free_memory_of_gpu(pdd->dev->kgd, mem,
>                                                         pdd->drm_priv, NULL);
>                  kfd_process_device_remove_obj_handle(pdd, id);
>
> Regards,
> Lang
>
>> Regards,
>> Christian.


  reply	other threads:[~2021-09-24  6:37 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-23  9:44 [PATCH] drm/kfd: fix ttm_bo_release warning Lang Yu
2021-09-23 11:39 ` Christian König
2021-09-23 12:09   ` Yu, Lang
2021-09-23 12:23     ` Christian König
2021-09-23 14:24       ` Yu, Lang
2021-09-23 14:52         ` Christian König
2021-09-24  5:35           ` Yu, Lang
2021-09-24  5:42             ` Christian König
2021-09-24  5:50               ` Yu, Lang
2021-09-24  5:54                 ` Christian König
2021-09-24  6:34                   ` Yu, Lang
2021-09-24  6:37                     ` Christian König [this message]
2021-09-24 10:37                       ` Yu, Lang
2021-09-23 16:21     ` Felix Kuehling
2021-09-24  5:35       ` Yu, Lang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fc860cdc-e17a-ac94-b863-df8c21d4bacf@gmail.com \
    --to=ckoenig.leichtzumerken@gmail.com \
    --cc=Christian.Koenig@amd.com \
    --cc=Felix.Kuehling@amd.com \
    --cc=Lang.Yu@amd.com \
    --cc=Ray.Huang@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox