AMD-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] drm/amdkfd: change kfd process kref count at creation
@ 2024-10-11 14:41 Xiaogang.Chen
  2024-10-12  2:56 ` Zhu Lingshan
  2024-10-18 19:14 ` Felix Kuehling
  0 siblings, 2 replies; 9+ messages in thread
From: Xiaogang.Chen @ 2024-10-11 14:41 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, philip.yang, Xiaogang Chen, Xiaogang Chen

From: Xiaogang Chen <xiaogang.chen@amd.com>

kfd process kref count(process->ref) is initialized to 1 by kref_init. After
it is created not need to increaes its kref. Instad add kfd process kref at kfd
process mmu notifier allocation since we decrease the ref at free_notifier of
mmu_notifier_ops, so pair them.

When user process opens kfd node multiple times the kfd process kref is
increased each time to balance kfd node close operation.

Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index d07acf1b2f93..78bf918abf92 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -850,8 +850,10 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
 		goto out;
 	}
 
-	/* A prior open of /dev/kfd could have already created the process. */
-	process = find_process(thread, false);
+	/* A prior open of /dev/kfd could have already created the process.
+	 * find_process will increase process kref in this case
+	 */
+	process = find_process(thread, true);
 	if (process) {
 		pr_debug("Process already found\n");
 	} else {
@@ -899,8 +901,6 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
 		init_waitqueue_head(&process->wait_irq_drain);
 	}
 out:
-	if (!IS_ERR(process))
-		kref_get(&process->ref);
 	mutex_unlock(&kfd_processes_mutex);
 	mmput(thread->mm);
 
@@ -1191,7 +1191,12 @@ static struct mmu_notifier *kfd_process_alloc_notifier(struct mm_struct *mm)
 
 	srcu_read_unlock(&kfd_processes_srcu, idx);
 
-	return p ? &p->mmu_notifier : ERR_PTR(-ESRCH);
+	if (p) {
+		kref_get(&p->ref);
+		return &p->mmu_notifier;
+	}
+
+	return ERR_PTR(-ESRCH);
 }
 
 static void kfd_process_free_notifier(struct mmu_notifier *mn)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] drm/amdkfd: change kfd process kref count at creation
  2024-10-11 14:41 [PATCH v2] drm/amdkfd: change kfd process kref count at creation Xiaogang.Chen
@ 2024-10-12  2:56 ` Zhu Lingshan
  2024-10-12 17:30   ` Chen, Xiaogang
  2024-10-18 19:14 ` Felix Kuehling
  1 sibling, 1 reply; 9+ messages in thread
From: Zhu Lingshan @ 2024-10-12  2:56 UTC (permalink / raw)
  To: Xiaogang.Chen, amd-gfx; +Cc: felix.kuehling, philip.yang

On 10/11/2024 10:41 PM, Xiaogang.Chen wrote:
> From: Xiaogang Chen <xiaogang.chen@amd.com>
>
> kfd process kref count(process->ref) is initialized to 1 by kref_init. After
> it is created not need to increaes its kref. Instad add kfd process kref at kfd
> process mmu notifier allocation since we decrease the ref at free_notifier of
> mmu_notifier_ops, so pair them.
>
> When user process opens kfd node multiple times the kfd process kref is
> increased each time to balance kfd node close operation.
>
> Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index d07acf1b2f93..78bf918abf92 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -850,8 +850,10 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>  		goto out;
>  	}
>  
> -	/* A prior open of /dev/kfd could have already created the process. */
> -	process = find_process(thread, false);
> +	/* A prior open of /dev/kfd could have already created the process.
> +	 * find_process will increase process kref in this case
> +	 */
> +	process = find_process(thread, true);
>  	if (process) {
>  		pr_debug("Process already found\n");
>  	} else {
> @@ -899,8 +901,6 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>  		init_waitqueue_head(&process->wait_irq_drain);
>  	}
>  out:
> -	if (!IS_ERR(process))
> -		kref_get(&process->ref);
>  	mutex_unlock(&kfd_processes_mutex);
>  	mmput(thread->mm);
>  
> @@ -1191,7 +1191,12 @@ static struct mmu_notifier *kfd_process_alloc_notifier(struct mm_struct *mm)
>  
>  	srcu_read_unlock(&kfd_processes_srcu, idx);
>  
> -	return p ? &p->mmu_notifier : ERR_PTR(-ESRCH);
> +	if (p) {
> +		kref_get(&p->ref);
> +		return &p->mmu_notifier;
> +	}
> +
> +	return ERR_PTR(-ESRCH);
this cb should only allocate the notifier (here it returns an existing notifier ),
so I am not sure this is a better place to increase the kref, it seems coupling
two low correlated routines.

kref is decreased in the free notifier, but not mean it has to be increased in alloc notifier.

Thanks
Lingshan

>  
>  static void kfd_process_free_notifier(struct mmu_notifier *mn)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] drm/amdkfd: change kfd process kref count at creation
  2024-10-12  2:56 ` Zhu Lingshan
@ 2024-10-12 17:30   ` Chen, Xiaogang
  2024-10-14  1:55     ` Zhu Lingshan
  0 siblings, 1 reply; 9+ messages in thread
From: Chen, Xiaogang @ 2024-10-12 17:30 UTC (permalink / raw)
  To: Zhu Lingshan, amd-gfx; +Cc: felix.kuehling, philip.yang


On 10/11/2024 9:56 PM, Zhu Lingshan wrote:
> On 10/11/2024 10:41 PM, Xiaogang.Chen wrote:
>> From: Xiaogang Chen <xiaogang.chen@amd.com>
>>
>> kfd process kref count(process->ref) is initialized to 1 by kref_init. After
>> it is created not need to increaes its kref. Instad add kfd process kref at kfd
>> process mmu notifier allocation since we decrease the ref at free_notifier of
>> mmu_notifier_ops, so pair them.
>>
>> When user process opens kfd node multiple times the kfd process kref is
>> increased each time to balance kfd node close operation.
>>
>> Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 ++++++++++-----
>>   1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> index d07acf1b2f93..78bf918abf92 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> @@ -850,8 +850,10 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>>   		goto out;
>>   	}
>>   
>> -	/* A prior open of /dev/kfd could have already created the process. */
>> -	process = find_process(thread, false);
>> +	/* A prior open of /dev/kfd could have already created the process.
>> +	 * find_process will increase process kref in this case
>> +	 */
>> +	process = find_process(thread, true);
>>   	if (process) {
>>   		pr_debug("Process already found\n");
>>   	} else {
>> @@ -899,8 +901,6 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>>   		init_waitqueue_head(&process->wait_irq_drain);
>>   	}
>>   out:
>> -	if (!IS_ERR(process))
>> -		kref_get(&process->ref);
>>   	mutex_unlock(&kfd_processes_mutex);
>>   	mmput(thread->mm);
>>   
>> @@ -1191,7 +1191,12 @@ static struct mmu_notifier *kfd_process_alloc_notifier(struct mm_struct *mm)
>>   
>>   	srcu_read_unlock(&kfd_processes_srcu, idx);
>>   
>> -	return p ? &p->mmu_notifier : ERR_PTR(-ESRCH);
>> +	if (p) {
>> +		kref_get(&p->ref);
>> +		return &p->mmu_notifier;
>> +	}
>> +
>> +	return ERR_PTR(-ESRCH);
> this cb should only allocate the notifier (here it returns an existing notifier ),
> so I am not sure this is a better place to increase the kref, it seems coupling
> two low correlated routines.
>
> kref is decreased in the free notifier, but not mean it has to be increased in alloc notifier.

Who referring kfd process should also un-referrer it after finish. Any 
client should not do un-refer if it did not refer. That keeps balance in 
clean way.

The current way is using  mmu's free notifier to unref kfref that was 
added by kfd process creation. Ex: if not use mmu notifier there would 
be extra kref that prevent release kfd process.

The final kref is same. The patch just makes the balance in a logical way.

Regards

Xiaogang

>
> Thanks
> Lingshan
>
>>   
>>   static void kfd_process_free_notifier(struct mmu_notifier *mn)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] drm/amdkfd: change kfd process kref count at creation
  2024-10-12 17:30   ` Chen, Xiaogang
@ 2024-10-14  1:55     ` Zhu Lingshan
  2024-10-14 15:07       ` Chen, Xiaogang
  0 siblings, 1 reply; 9+ messages in thread
From: Zhu Lingshan @ 2024-10-14  1:55 UTC (permalink / raw)
  To: Chen, Xiaogang, amd-gfx; +Cc: felix.kuehling, philip.yang

On 10/13/2024 1:30 AM, Chen, Xiaogang wrote:
>
> On 10/11/2024 9:56 PM, Zhu Lingshan wrote:
>> On 10/11/2024 10:41 PM, Xiaogang.Chen wrote:
>>> From: Xiaogang Chen <xiaogang.chen@amd.com>
>>>
>>> kfd process kref count(process->ref) is initialized to 1 by kref_init. After
>>> it is created not need to increaes its kref. Instad add kfd process kref at kfd
>>> process mmu notifier allocation since we decrease the ref at free_notifier of
>>> mmu_notifier_ops, so pair them.
>>>
>>> When user process opens kfd node multiple times the kfd process kref is
>>> increased each time to balance kfd node close operation.
>>>
>>> Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 ++++++++++-----
>>>   1 file changed, 10 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> index d07acf1b2f93..78bf918abf92 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> @@ -850,8 +850,10 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>>>           goto out;
>>>       }
>>>   -    /* A prior open of /dev/kfd could have already created the process. */
>>> -    process = find_process(thread, false);
>>> +    /* A prior open of /dev/kfd could have already created the process.
>>> +     * find_process will increase process kref in this case
>>> +     */
>>> +    process = find_process(thread, true);
>>>       if (process) {
>>>           pr_debug("Process already found\n");
>>>       } else {
>>> @@ -899,8 +901,6 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>>>           init_waitqueue_head(&process->wait_irq_drain);
>>>       }
>>>   out:
>>> -    if (!IS_ERR(process))
>>> -        kref_get(&process->ref);
>>>       mutex_unlock(&kfd_processes_mutex);
>>>       mmput(thread->mm);
>>>   @@ -1191,7 +1191,12 @@ static struct mmu_notifier *kfd_process_alloc_notifier(struct mm_struct *mm)
>>>         srcu_read_unlock(&kfd_processes_srcu, idx);
>>>   -    return p ? &p->mmu_notifier : ERR_PTR(-ESRCH);
>>> +    if (p) {
>>> +        kref_get(&p->ref);
>>> +        return &p->mmu_notifier;
>>> +    }
>>> +
>>> +    return ERR_PTR(-ESRCH);
>> this cb should only allocate the notifier (here it returns an existing notifier ),
>> so I am not sure this is a better place to increase the kref, it seems coupling
>> two low correlated routines.
>>
>> kref is decreased in the free notifier, but not mean it has to be increased in alloc notifier.
>
> Who referring kfd process should also un-referrer it after finish. Any client should not do un-refer if it did not refer. That keeps balance in clean way.
I think we already do so, see any functions call kfd_lookup_process_by_xxx would unref the kref of the kfd_process.
>
> The current way is using  mmu's free notifier to unref kfref that was added by kfd process creation. Ex: if not use mmu notifier there would be extra kref that prevent release kfd process.
I am not sure this is about paring, current design is to free the last kref when the whole program exits by the mmu free notifier, so it would destroy the kfd_process.
MMU free notifier would be certainly invoked since it has been registered.

Thanks
Lingshan
>
> The final kref is same. The patch just makes the balance in a logical way.
>
> Regards
>
> Xiaogang
>
>>
>> Thanks
>> Lingshan
>>
>>>     static void kfd_process_free_notifier(struct mmu_notifier *mn)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] drm/amdkfd: change kfd process kref count at creation
  2024-10-14  1:55     ` Zhu Lingshan
@ 2024-10-14 15:07       ` Chen, Xiaogang
  2024-10-15  2:51         ` Zhu Lingshan
  0 siblings, 1 reply; 9+ messages in thread
From: Chen, Xiaogang @ 2024-10-14 15:07 UTC (permalink / raw)
  To: Zhu Lingshan, amd-gfx; +Cc: felix.kuehling, philip.yang


On 10/13/2024 8:55 PM, Zhu Lingshan wrote:
> On 10/13/2024 1:30 AM, Chen, Xiaogang wrote:
>> On 10/11/2024 9:56 PM, Zhu Lingshan wrote:
>>> On 10/11/2024 10:41 PM, Xiaogang.Chen wrote:
>>>> From: Xiaogang Chen <xiaogang.chen@amd.com>
>>>>
>>>> kfd process kref count(process->ref) is initialized to 1 by kref_init. After
>>>> it is created not need to increaes its kref. Instad add kfd process kref at kfd
>>>> process mmu notifier allocation since we decrease the ref at free_notifier of
>>>> mmu_notifier_ops, so pair them.
>>>>
>>>> When user process opens kfd node multiple times the kfd process kref is
>>>> increased each time to balance kfd node close operation.
>>>>
>>>> Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 ++++++++++-----
>>>>    1 file changed, 10 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> index d07acf1b2f93..78bf918abf92 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> @@ -850,8 +850,10 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>>>>            goto out;
>>>>        }
>>>>    -    /* A prior open of /dev/kfd could have already created the process. */
>>>> -    process = find_process(thread, false);
>>>> +    /* A prior open of /dev/kfd could have already created the process.
>>>> +     * find_process will increase process kref in this case
>>>> +     */
>>>> +    process = find_process(thread, true);
>>>>        if (process) {
>>>>            pr_debug("Process already found\n");
>>>>        } else {
>>>> @@ -899,8 +901,6 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>>>>            init_waitqueue_head(&process->wait_irq_drain);
>>>>        }
>>>>    out:
>>>> -    if (!IS_ERR(process))
>>>> -        kref_get(&process->ref);
>>>>        mutex_unlock(&kfd_processes_mutex);
>>>>        mmput(thread->mm);
>>>>    @@ -1191,7 +1191,12 @@ static struct mmu_notifier *kfd_process_alloc_notifier(struct mm_struct *mm)
>>>>          srcu_read_unlock(&kfd_processes_srcu, idx);
>>>>    -    return p ? &p->mmu_notifier : ERR_PTR(-ESRCH);
>>>> +    if (p) {
>>>> +        kref_get(&p->ref);
>>>> +        return &p->mmu_notifier;
>>>> +    }
>>>> +
>>>> +    return ERR_PTR(-ESRCH);
>>> this cb should only allocate the notifier (here it returns an existing notifier ),
>>> so I am not sure this is a better place to increase the kref, it seems coupling
>>> two low correlated routines.
>>>
>>> kref is decreased in the free notifier, but not mean it has to be increased in alloc notifier.
>> Who referring kfd process should also un-referrer it after finish. Any client should not do un-refer if it did not refer. That keeps balance in clean way.
> I think we already do so, see any functions call kfd_lookup_process_by_xxx would unref the kref of the kfd_process.
>> The current way is using  mmu's free notifier to unref kfref that was added by kfd process creation. Ex: if not use mmu notifier there would be extra kref that prevent release kfd process.
> I am not sure this is about paring, current design is to free the last kref when the whole program exits by the mmu free notifier, so it would destroy the kfd_process.
> MMU free notifier would be certainly invoked since it has been registered.

This patch is about having "get/put" at correct places, or keeping kref 
balance in a clean way. We have 'put' kferf at mmu free notifier why not 
have 'get' kfref at mmu registry(alloc) notifier?

Regards

Xiaogang

>
> Thanks
> Lingshan
>> The final kref is same. The patch just makes the balance in a logical way.
>>
>> Regards
>>
>> Xiaogang
>>
>>> Thanks
>>> Lingshan
>>>
>>>>      static void kfd_process_free_notifier(struct mmu_notifier *mn)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] drm/amdkfd: change kfd process kref count at creation
  2024-10-14 15:07       ` Chen, Xiaogang
@ 2024-10-15  2:51         ` Zhu Lingshan
  2024-10-15  4:49           ` Chen, Xiaogang
  0 siblings, 1 reply; 9+ messages in thread
From: Zhu Lingshan @ 2024-10-15  2:51 UTC (permalink / raw)
  To: Chen, Xiaogang, amd-gfx; +Cc: felix.kuehling, philip.yang

On 10/14/2024 11:07 PM, Chen, Xiaogang wrote:
>
> On 10/13/2024 8:55 PM, Zhu Lingshan wrote:
>> On 10/13/2024 1:30 AM, Chen, Xiaogang wrote:
>>> On 10/11/2024 9:56 PM, Zhu Lingshan wrote:
>>>> On 10/11/2024 10:41 PM, Xiaogang.Chen wrote:
>>>>> From: Xiaogang Chen <xiaogang.chen@amd.com>
>>>>>
>>>>> kfd process kref count(process->ref) is initialized to 1 by kref_init. After
>>>>> it is created not need to increaes its kref. Instad add kfd process kref at kfd
>>>>> process mmu notifier allocation since we decrease the ref at free_notifier of
>>>>> mmu_notifier_ops, so pair them.
>>>>>
>>>>> When user process opens kfd node multiple times the kfd process kref is
>>>>> increased each time to balance kfd node close operation.
>>>>>
>>>>> Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 ++++++++++-----
>>>>>    1 file changed, 10 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> index d07acf1b2f93..78bf918abf92 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> @@ -850,8 +850,10 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>>>>>            goto out;
>>>>>        }
>>>>>    -    /* A prior open of /dev/kfd could have already created the process. */
>>>>> -    process = find_process(thread, false);
>>>>> +    /* A prior open of /dev/kfd could have already created the process.
>>>>> +     * find_process will increase process kref in this case
>>>>> +     */
>>>>> +    process = find_process(thread, true);
>>>>>        if (process) {
>>>>>            pr_debug("Process already found\n");
>>>>>        } else {
>>>>> @@ -899,8 +901,6 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>>>>>            init_waitqueue_head(&process->wait_irq_drain);
>>>>>        }
>>>>>    out:
>>>>> -    if (!IS_ERR(process))
>>>>> -        kref_get(&process->ref);
>>>>>        mutex_unlock(&kfd_processes_mutex);
>>>>>        mmput(thread->mm);
>>>>>    @@ -1191,7 +1191,12 @@ static struct mmu_notifier *kfd_process_alloc_notifier(struct mm_struct *mm)
>>>>>          srcu_read_unlock(&kfd_processes_srcu, idx);
>>>>>    -    return p ? &p->mmu_notifier : ERR_PTR(-ESRCH);
>>>>> +    if (p) {
>>>>> +        kref_get(&p->ref);
>>>>> +        return &p->mmu_notifier;
>>>>> +    }
>>>>> +
>>>>> +    return ERR_PTR(-ESRCH);
>>>> this cb should only allocate the notifier (here it returns an existing notifier ),
>>>> so I am not sure this is a better place to increase the kref, it seems coupling
>>>> two low correlated routines.
>>>>
>>>> kref is decreased in the free notifier, but not mean it has to be increased in alloc notifier.
>>> Who referring kfd process should also un-referrer it after finish. Any client should not do un-refer if it did not refer. That keeps balance in clean way.
>> I think we already do so, see any functions call kfd_lookup_process_by_xxx would unref the kref of the kfd_process.
>>> The current way is using  mmu's free notifier to unref kfref that was added by kfd process creation. Ex: if not use mmu notifier there would be extra kref that prevent release kfd process.
>> I am not sure this is about paring, current design is to free the last kref when the whole program exits by the mmu free notifier, so it would destroy the kfd_process.
>> MMU free notifier would be certainly invoked since it has been registered.
>
> This patch is about having "get/put" at correct places, or keeping kref balance in a clean way. We have 'put' kferf at mmu free notifier why not have 'get' kfref at mmu registry(alloc) notifier?
If we place increasing kref in mmu alloc notifier, it is still increased at kfd_process creation time, actually no difference, but inexplicitly done. Others need to dive into mmu ops to understand. Current approach  actually has a better readability.

MMU alloc notifier is invoked through locking, it locks the whole mm, so better not to add extra dispensable code there.

Current solution runs for years and this change actually does not fix an issue

Thanks
Lingshan 
>
> Regards
>
> Xiaogang
>
>>
>> Thanks
>> Lingshan
>>> The final kref is same. The patch just makes the balance in a logical way.
>>>
>>> Regards
>>>
>>> Xiaogang
>>>
>>>> Thanks
>>>> Lingshan
>>>>
>>>>>      static void kfd_process_free_notifier(struct mmu_notifier *mn)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] drm/amdkfd: change kfd process kref count at creation
  2024-10-15  2:51         ` Zhu Lingshan
@ 2024-10-15  4:49           ` Chen, Xiaogang
  0 siblings, 0 replies; 9+ messages in thread
From: Chen, Xiaogang @ 2024-10-15  4:49 UTC (permalink / raw)
  To: Zhu Lingshan, amd-gfx; +Cc: felix.kuehling, philip.yang

[-- Attachment #1: Type: text/plain, Size: 5529 bytes --]


On 10/14/2024 9:51 PM, Zhu Lingshan wrote:
> On 10/14/2024 11:07 PM, Chen, Xiaogang wrote:
>> On 10/13/2024 8:55 PM, Zhu Lingshan wrote:
>>> On 10/13/2024 1:30 AM, Chen, Xiaogang wrote:
>>>> On 10/11/2024 9:56 PM, Zhu Lingshan wrote:
>>>>> On 10/11/2024 10:41 PM, Xiaogang.Chen wrote:
>>>>>> From: Xiaogang Chen<xiaogang.chen@amd.com>
>>>>>>
>>>>>> kfd process kref count(process->ref) is initialized to 1 by kref_init. After
>>>>>> it is created not need to increaes its kref. Instad add kfd process kref at kfd
>>>>>> process mmu notifier allocation since we decrease the ref at free_notifier of
>>>>>> mmu_notifier_ops, so pair them.
>>>>>>
>>>>>> When user process opens kfd node multiple times the kfd process kref is
>>>>>> increased each time to balance kfd node close operation.
>>>>>>
>>>>>> Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 ++++++++++-----
>>>>>>     1 file changed, 10 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> index d07acf1b2f93..78bf918abf92 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> @@ -850,8 +850,10 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>>>>>>             goto out;
>>>>>>         }
>>>>>>     -    /* A prior open of /dev/kfd could have already created the process. */
>>>>>> -    process = find_process(thread, false);
>>>>>> +    /* A prior open of /dev/kfd could have already created the process.
>>>>>> +     * find_process will increase process kref in this case
>>>>>> +     */
>>>>>> +    process = find_process(thread, true);
>>>>>>         if (process) {
>>>>>>             pr_debug("Process already found\n");
>>>>>>         } else {
>>>>>> @@ -899,8 +901,6 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>>>>>>             init_waitqueue_head(&process->wait_irq_drain);
>>>>>>         }
>>>>>>     out:
>>>>>> -    if (!IS_ERR(process))
>>>>>> -        kref_get(&process->ref);
>>>>>>         mutex_unlock(&kfd_processes_mutex);
>>>>>>         mmput(thread->mm);
>>>>>>     @@ -1191,7 +1191,12 @@ static struct mmu_notifier *kfd_process_alloc_notifier(struct mm_struct *mm)
>>>>>>           srcu_read_unlock(&kfd_processes_srcu, idx);
>>>>>>     -    return p ? &p->mmu_notifier : ERR_PTR(-ESRCH);
>>>>>> +    if (p) {
>>>>>> +        kref_get(&p->ref);
>>>>>> +        return &p->mmu_notifier;
>>>>>> +    }
>>>>>> +
>>>>>> +    return ERR_PTR(-ESRCH);
>>>>> this cb should only allocate the notifier (here it returns an existing notifier ),
>>>>> so I am not sure this is a better place to increase the kref, it seems coupling
>>>>> two low correlated routines.
>>>>>
>>>>> kref is decreased in the free notifier, but not mean it has to be increased in alloc notifier.
>>>> Who referring kfd process should also un-referrer it after finish. Any client should not do un-refer if it did not refer. That keeps balance in clean way.
>>> I think we already do so, see any functions call kfd_lookup_process_by_xxx would unref the kref of the kfd_process.
>>>> The current way is using  mmu's free notifier to unref kfref that was added by kfd process creation. Ex: if not use mmu notifier there would be extra kref that prevent release kfd process.
>>> I am not sure this is about paring, current design is to free the last kref when the whole program exits by the mmu free notifier, so it would destroy the kfd_process.
>>> MMU free notifier would be certainly invoked since it has been registered.
>> This patch is about having "get/put" at correct places, or keeping kref balance in a clean way. We have 'put' kferf at mmu free notifier why not have 'get' kfref at mmu registry(alloc) notifier?
> If we place increasing kref in mmu alloc notifier, it is still increased at kfd_process creation time, actually no difference, but inexplicitly done. Others need to dive into mmu ops to understand. Current approach  actually has a better readability.
I think this patch has better readability that it pairs kref of 
kfd_process in "get" and "put". People see the kref got added at 
"alloc_notifier", and decreased at "free_notifier" in same source file, 
not need to dive into mmu ops. When people see it got decreased at 
free_notifier they would wonder why the kref is not increased at 
alloc_notifier.
>
> MMU alloc notifier is invoked through locking, it locks the whole mm, so better not to add extra dispensable code there.
The change is adding kref for kfd_process , not mm or mmu_notifier at 
alloc_notifier. MMU free_notifier is more complicated then alloc 
notifier. free_notifier is triggered by scru callback and we have kfref 
updated at free_notifier, why not at alloc_notifier?
>
> Current solution runs for years and this change actually does not fix an issue

As said this patch is having "get/put" at correct places, or keeping 
kref balance in a clean way. Do you see any regression?

Regards

Xiaogang

>
> Thanks
> Lingshan
>> Regards
>>
>> Xiaogang
>>
>>> Thanks
>>> Lingshan
>>>> The final kref is same. The patch just makes the balance in a logical way.
>>>>
>>>> Regards
>>>>
>>>> Xiaogang
>>>>
>>>>> Thanks
>>>>> Lingshan
>>>>>
>>>>>>       static void kfd_process_free_notifier(struct mmu_notifier *mn)

[-- Attachment #2: Type: text/html, Size: 8507 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] drm/amdkfd: change kfd process kref count at creation
  2024-10-11 14:41 [PATCH v2] drm/amdkfd: change kfd process kref count at creation Xiaogang.Chen
  2024-10-12  2:56 ` Zhu Lingshan
@ 2024-10-18 19:14 ` Felix Kuehling
  2024-10-18 22:02   ` Chen, Xiaogang
  1 sibling, 1 reply; 9+ messages in thread
From: Felix Kuehling @ 2024-10-18 19:14 UTC (permalink / raw)
  To: Xiaogang.Chen, amd-gfx; +Cc: philip.yang


On 2024-10-11 10:41, Xiaogang.Chen wrote:
> From: Xiaogang Chen <xiaogang.chen@amd.com>
>
> kfd process kref count(process->ref) is initialized to 1 by kref_init. After
> it is created not need to increaes its kref. Instad add kfd process kref at kfd
> process mmu notifier allocation since we decrease the ref at free_notifier of
> mmu_notifier_ops, so pair them.
>
> When user process opens kfd node multiple times the kfd process kref is
> increased each time to balance kfd node close operation.
>
> Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>

Thanks, this is an elegant solution, IMO. The reference returned by 
kfd_create_process comes either from find_process or create_process. And 
the extra reference that gets released by the free_notifier gets 
allocated by the alloc_notifier. I think there is a race condition, 
though. See inline.


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 ++++++++++-----
>   1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index d07acf1b2f93..78bf918abf92 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -850,8 +850,10 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>   		goto out;
>   	}
>   
> -	/* A prior open of /dev/kfd could have already created the process. */
> -	process = find_process(thread, false);
> +	/* A prior open of /dev/kfd could have already created the process.
> +	 * find_process will increase process kref in this case
> +	 */
> +	process = find_process(thread, true);
>   	if (process) {
>   		pr_debug("Process already found\n");
>   	} else {
> @@ -899,8 +901,6 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>   		init_waitqueue_head(&process->wait_irq_drain);
>   	}
>   out:
> -	if (!IS_ERR(process))
> -		kref_get(&process->ref);
>   	mutex_unlock(&kfd_processes_mutex);
>   	mmput(thread->mm);
>   
> @@ -1191,7 +1191,12 @@ static struct mmu_notifier *kfd_process_alloc_notifier(struct mm_struct *mm)
>   
>   	srcu_read_unlock(&kfd_processes_srcu, idx);
>   
> -	return p ? &p->mmu_notifier : ERR_PTR(-ESRCH);
> +	if (p) {
> +		kref_get(&p->ref);

This should be inside the srcu. I think you could use 
kfd_lookup_process_by_mm instead of open-coding the SRCU locking and 
find_process_by_mm. This does the lookup and reference counting safely 
already.

Regards,
   Felix

> +		return &p->mmu_notifier;
> +	}
> +
> +	return ERR_PTR(-ESRCH);
>   }
>   
>   static void kfd_process_free_notifier(struct mmu_notifier *mn)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] drm/amdkfd: change kfd process kref count at creation
  2024-10-18 19:14 ` Felix Kuehling
@ 2024-10-18 22:02   ` Chen, Xiaogang
  0 siblings, 0 replies; 9+ messages in thread
From: Chen, Xiaogang @ 2024-10-18 22:02 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx; +Cc: philip.yang


On 10/18/2024 2:14 PM, Felix Kuehling wrote:
>
> On 2024-10-11 10:41, Xiaogang.Chen wrote:
>> From: Xiaogang Chen <xiaogang.chen@amd.com>
>>
>> kfd process kref count(process->ref) is initialized to 1 by 
>> kref_init. After
>> it is created not need to increaes its kref. Instad add kfd process 
>> kref at kfd
>> process mmu notifier allocation since we decrease the ref at 
>> free_notifier of
>> mmu_notifier_ops, so pair them.
>>
>> When user process opens kfd node multiple times the kfd process kref is
>> increased each time to balance kfd node close operation.
>>
>> Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
>
> Thanks, this is an elegant solution, IMO. The reference returned by 
> kfd_create_process comes either from find_process or create_process. 
> And the extra reference that gets released by the free_notifier gets 
> allocated by the alloc_notifier. I think there is a race condition, 
> though. See inline.
>
>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 ++++++++++-----
>>   1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> index d07acf1b2f93..78bf918abf92 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> @@ -850,8 +850,10 @@ struct kfd_process *kfd_create_process(struct 
>> task_struct *thread)
>>           goto out;
>>       }
>>   -    /* A prior open of /dev/kfd could have already created the 
>> process. */
>> -    process = find_process(thread, false);
>> +    /* A prior open of /dev/kfd could have already created the process.
>> +     * find_process will increase process kref in this case
>> +     */
>> +    process = find_process(thread, true);
>>       if (process) {
>>           pr_debug("Process already found\n");
>>       } else {
>> @@ -899,8 +901,6 @@ struct kfd_process *kfd_create_process(struct 
>> task_struct *thread)
>>           init_waitqueue_head(&process->wait_irq_drain);
>>       }
>>   out:
>> -    if (!IS_ERR(process))
>> -        kref_get(&process->ref);
>>       mutex_unlock(&kfd_processes_mutex);
>>       mmput(thread->mm);
>>   @@ -1191,7 +1191,12 @@ static struct mmu_notifier 
>> *kfd_process_alloc_notifier(struct mm_struct *mm)
>>         srcu_read_unlock(&kfd_processes_srcu, idx);
>>   -    return p ? &p->mmu_notifier : ERR_PTR(-ESRCH);
>> +    if (p) {
>> +        kref_get(&p->ref);
>
> This should be inside the srcu. I think you could use 
> kfd_lookup_process_by_mm instead of open-coding the SRCU locking and 
> find_process_by_mm. This does the lookup and reference counting safely 
> already.
>
ok, understand. Will do it after next week vacation.

Thanks

Xiaogang

> Regards,
>   Felix
>
>> +        return &p->mmu_notifier;
>> +    }
>> +
>> +    return ERR_PTR(-ESRCH);
>>   }
>>     static void kfd_process_free_notifier(struct mmu_notifier *mn)

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-10-18 22:02 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-11 14:41 [PATCH v2] drm/amdkfd: change kfd process kref count at creation Xiaogang.Chen
2024-10-12  2:56 ` Zhu Lingshan
2024-10-12 17:30   ` Chen, Xiaogang
2024-10-14  1:55     ` Zhu Lingshan
2024-10-14 15:07       ` Chen, Xiaogang
2024-10-15  2:51         ` Zhu Lingshan
2024-10-15  4:49           ` Chen, Xiaogang
2024-10-18 19:14 ` Felix Kuehling
2024-10-18 22:02   ` Chen, Xiaogang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox