AMD-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Chen, Xiaogang" <xiaogang.chen@amd.com>
To: Zhu Lingshan <lingshan.zhu@amd.com>, amd-gfx@lists.freedesktop.org
Cc: felix.kuehling@amd.com, philip.yang@amd.com
Subject: Re: [PATCH v2] drm/amdkfd: change kfd process kref count at creation
Date: Mon, 14 Oct 2024 23:49:11 -0500	[thread overview]
Message-ID: <bb26d8c5-0ae8-4a74-9f8a-799e0d6c063d@amd.com> (raw)
In-Reply-To: <5e372d79-ceca-4509-8bf6-f18175744817@amd.com>

[-- Attachment #1: Type: text/plain, Size: 5529 bytes --]


On 10/14/2024 9:51 PM, Zhu Lingshan wrote:
> On 10/14/2024 11:07 PM, Chen, Xiaogang wrote:
>> On 10/13/2024 8:55 PM, Zhu Lingshan wrote:
>>> On 10/13/2024 1:30 AM, Chen, Xiaogang wrote:
>>>> On 10/11/2024 9:56 PM, Zhu Lingshan wrote:
>>>>> On 10/11/2024 10:41 PM, Xiaogang.Chen wrote:
>>>>>> From: Xiaogang Chen<xiaogang.chen@amd.com>
>>>>>>
>>>>>> kfd process kref count(process->ref) is initialized to 1 by kref_init. After
>>>>>> it is created not need to increaes its kref. Instad add kfd process kref at kfd
>>>>>> process mmu notifier allocation since we decrease the ref at free_notifier of
>>>>>> mmu_notifier_ops, so pair them.
>>>>>>
>>>>>> When user process opens kfd node multiple times the kfd process kref is
>>>>>> increased each time to balance kfd node close operation.
>>>>>>
>>>>>> Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 ++++++++++-----
>>>>>>     1 file changed, 10 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> index d07acf1b2f93..78bf918abf92 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> @@ -850,8 +850,10 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>>>>>>             goto out;
>>>>>>         }
>>>>>>     -    /* A prior open of /dev/kfd could have already created the process. */
>>>>>> -    process = find_process(thread, false);
>>>>>> +    /* A prior open of /dev/kfd could have already created the process.
>>>>>> +     * find_process will increase process kref in this case
>>>>>> +     */
>>>>>> +    process = find_process(thread, true);
>>>>>>         if (process) {
>>>>>>             pr_debug("Process already found\n");
>>>>>>         } else {
>>>>>> @@ -899,8 +901,6 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>>>>>>             init_waitqueue_head(&process->wait_irq_drain);
>>>>>>         }
>>>>>>     out:
>>>>>> -    if (!IS_ERR(process))
>>>>>> -        kref_get(&process->ref);
>>>>>>         mutex_unlock(&kfd_processes_mutex);
>>>>>>         mmput(thread->mm);
>>>>>>     @@ -1191,7 +1191,12 @@ static struct mmu_notifier *kfd_process_alloc_notifier(struct mm_struct *mm)
>>>>>>           srcu_read_unlock(&kfd_processes_srcu, idx);
>>>>>>     -    return p ? &p->mmu_notifier : ERR_PTR(-ESRCH);
>>>>>> +    if (p) {
>>>>>> +        kref_get(&p->ref);
>>>>>> +        return &p->mmu_notifier;
>>>>>> +    }
>>>>>> +
>>>>>> +    return ERR_PTR(-ESRCH);
>>>>> this cb should only allocate the notifier (here it returns an existing notifier ),
>>>>> so I am not sure this is a better place to increase the kref, it seems coupling
>>>>> two low correlated routines.
>>>>>
>>>>> kref is decreased in the free notifier, but not mean it has to be increased in alloc notifier.
>>>> Who referring kfd process should also un-referrer it after finish. Any client should not do un-refer if it did not refer. That keeps balance in clean way.
>>> I think we already do so, see any functions call kfd_lookup_process_by_xxx would unref the kref of the kfd_process.
>>>> The current way is using  mmu's free notifier to unref kfref that was added by kfd process creation. Ex: if not use mmu notifier there would be extra kref that prevent release kfd process.
>>> I am not sure this is about paring, current design is to free the last kref when the whole program exits by the mmu free notifier, so it would destroy the kfd_process.
>>> MMU free notifier would be certainly invoked since it has been registered.
>> This patch is about having "get/put" at correct places, or keeping kref balance in a clean way. We have 'put' kferf at mmu free notifier why not have 'get' kfref at mmu registry(alloc) notifier?
> If we place increasing kref in mmu alloc notifier, it is still increased at kfd_process creation time, actually no difference, but inexplicitly done. Others need to dive into mmu ops to understand. Current approach  actually has a better readability.
I think this patch has better readability that it pairs kref of 
kfd_process in "get" and "put". People see the kref got added at 
"alloc_notifier", and decreased at "free_notifier" in same source file, 
not need to dive into mmu ops. When people see it got decreased at 
free_notifier they would wonder why the kref is not increased at 
alloc_notifier.
>
> MMU alloc notifier is invoked through locking, it locks the whole mm, so better not to add extra dispensable code there.
The change is adding kref for kfd_process , not mm or mmu_notifier at 
alloc_notifier. MMU free_notifier is more complicated then alloc 
notifier. free_notifier is triggered by scru callback and we have kfref 
updated at free_notifier, why not at alloc_notifier?
>
> Current solution runs for years and this change actually does not fix an issue

As said this patch is having "get/put" at correct places, or keeping 
kref balance in a clean way. Do you see any regression?

Regards

Xiaogang

>
> Thanks
> Lingshan
>> Regards
>>
>> Xiaogang
>>
>>> Thanks
>>> Lingshan
>>>> The final kref is same. The patch just makes the balance in a logical way.
>>>>
>>>> Regards
>>>>
>>>> Xiaogang
>>>>
>>>>> Thanks
>>>>> Lingshan
>>>>>
>>>>>>       static void kfd_process_free_notifier(struct mmu_notifier *mn)

[-- Attachment #2: Type: text/html, Size: 8507 bytes --]

  reply	other threads:[~2024-10-15  4:49 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-11 14:41 [PATCH v2] drm/amdkfd: change kfd process kref count at creation Xiaogang.Chen
2024-10-12  2:56 ` Zhu Lingshan
2024-10-12 17:30   ` Chen, Xiaogang
2024-10-14  1:55     ` Zhu Lingshan
2024-10-14 15:07       ` Chen, Xiaogang
2024-10-15  2:51         ` Zhu Lingshan
2024-10-15  4:49           ` Chen, Xiaogang [this message]
2024-10-18 19:14 ` Felix Kuehling
2024-10-18 22:02   ` Chen, Xiaogang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bb26d8c5-0ae8-4a74-9f8a-799e0d6c063d@amd.com \
    --to=xiaogang.chen@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=felix.kuehling@amd.com \
    --cc=lingshan.zhu@amd.com \
    --cc=philip.yang@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox