AMD-GFX Archive on lore.kernel.org
* [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL
@ 2025-10-02 17:43 Philip Yang
  2025-10-02 17:43 ` [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address Philip Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Philip Yang @ 2025-10-02 17:43 UTC (permalink / raw)
  To: amd-gfx; +Cc: Felix.Kuehling, harish.kasiviswanathan, Philip Yang

Add an error check for hmm_range kzalloc returning NULL. If get_pages
fails, free hmm_range and set it to NULL to avoid a double free in
get_pages_done.

Fixes: 29e6f5716115 ("drm/amdgpu: use user provided hmm_range buffer in amdgpu_ttm_tt_get_user_pages")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 8c3787b00f36..e8a15751c125 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1736,15 +1736,20 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
 				continue;
 			}
 
-			WRITE_ONCE(p->svms.faulting_task, current);
 			hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
-			r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
-						       readonly, owner,
-						       hmm_range);
-			WRITE_ONCE(p->svms.faulting_task, NULL);
-			if (r) {
-				kfree(hmm_range);
-				pr_debug("failed %d to get svm range pages\n", r);
+			if (unlikely(!hmm_range)) {
+				r = -ENOMEM;
+			} else {
+				WRITE_ONCE(p->svms.faulting_task, current);
+				r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
+							       readonly, owner,
+							       hmm_range);
+				WRITE_ONCE(p->svms.faulting_task, NULL);
+				if (r) {
+					kfree(hmm_range);
+					hmm_range = NULL;
+					pr_debug("failed %d to get svm range pages\n", r);
+				}
 			}
 		} else {
 			r = -EFAULT;
-- 
2.49.0



* [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address
  2025-10-02 17:43 [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL Philip Yang
@ 2025-10-02 17:43 ` Philip Yang
  2025-10-02 22:04   ` Chen, Xiaogang
  2025-10-02 17:43 ` [PATCH 3/3] drm/amdkfd: Don't stuck in svm restore worker Philip Yang
  2025-10-02 21:48 ` [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL Chen, Xiaogang
  2 siblings, 1 reply; 8+ messages in thread
From: Philip Yang @ 2025-10-02 17:43 UTC (permalink / raw)
  To: amd-gfx; +Cc: Felix.Kuehling, harish.kasiviswanathan, Philip Yang

svm_range_unmap_from_gpus takes page-aligned start and end addresses;
the end address is inclusive.

Fixes: 38c55f6719f7 ("drm/amdkfd: Handle lack of READ permissions in SVM mapping")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index e8a15751c125..742c28833650 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1723,8 +1723,8 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
 				svm_range_lock(prange);
 				if (vma->vm_flags & VM_WRITE)
 					pr_debug("VM_WRITE without VM_READ is not supported");
-				s = max(start, prange->start);
-				e = min(end, prange->last);
+				s = max(start >> PAGE_SHIFT, prange->start);
+				e = min((end - 1) >> PAGE_SHIFT, prange->last);
 				if (e >= s)
 					r = svm_range_unmap_from_gpus(prange, s, e,
 						       KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU);
-- 
2.49.0



* [PATCH 3/3] drm/amdkfd: Don't stuck in svm restore worker
  2025-10-02 17:43 [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL Philip Yang
  2025-10-02 17:43 ` [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address Philip Yang
@ 2025-10-02 17:43 ` Philip Yang
  2025-10-02 21:48 ` [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL Chen, Xiaogang
  2 siblings, 0 replies; 8+ messages in thread
From: Philip Yang @ 2025-10-02 17:43 UTC (permalink / raw)
  To: amd-gfx; +Cc: Felix.Kuehling, harish.kasiviswanathan, Philip Yang

If the vma is not found, the application has freed the memory using
madvise MADV_FREE, but the driver did not receive an unmap callback
from the CPU MMU notifier, so the memory is still mapped on the GPUs.
The svm restore worker keeps rescheduling itself to retry forever; the
user queues are never resumed and the application hangs waiting for the
queues to finish.

The svm restore worker should instead unmap the memory range from the
GPUs and then resume the queues. If a GPU page fault later occurs on
the unmapped address, that is an application use-after-free bug.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 78 ++++++++++++++--------------
 1 file changed, 40 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 742c28833650..608a25c6c865 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1708,51 +1708,53 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
 		bool readonly;
 
 		vma = vma_lookup(mm, addr);
-		if (vma) {
-			readonly = !(vma->vm_flags & VM_WRITE);
+		next = vma ? min(vma->vm_end, end) : end;
 
-			next = min(vma->vm_end, end);
-			npages = (next - addr) >> PAGE_SHIFT;
+		if (!vma || !(vma->vm_flags & VM_READ)) {
 			/* HMM requires at least READ permissions. If provided with PROT_NONE,
 			 * unmap the memory. If it's not already mapped, this is a no-op
 			 * If PROT_WRITE is provided without READ, warn first then unmap
+			 * If vma is not found, addr is invalid, unmap from GPUs
 			 */
-			if (!(vma->vm_flags & VM_READ)) {
-				unsigned long e, s;
-
-				svm_range_lock(prange);
-				if (vma->vm_flags & VM_WRITE)
-					pr_debug("VM_WRITE without VM_READ is not supported");
-				s = max(start >> PAGE_SHIFT, prange->start);
-				e = min((end - 1) >> PAGE_SHIFT, prange->last);
-				if (e >= s)
-					r = svm_range_unmap_from_gpus(prange, s, e,
-						       KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU);
-				svm_range_unlock(prange);
-				/* If unmap returns non-zero, we'll bail on the next for loop
-				 * iteration, so just leave r and continue
-				 */
-				addr = next;
-				continue;
-			}
+			unsigned long e, s;
+
+			svm_range_lock(prange);
+
+			if (!vma)
+				pr_debug("vma not found\n");
+			else if (vma->vm_flags & VM_WRITE)
+				pr_debug("VM_WRITE without VM_READ is not supported");
+
+			s = max(start >> PAGE_SHIFT, prange->start);
+			e = min((end - 1) >> PAGE_SHIFT, prange->last);
+			if (e >= s)
+				r = svm_range_unmap_from_gpus(prange, s, e,
+					       KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU);
+			svm_range_unlock(prange);
+			/* If unmap returns non-zero, we'll bail on the next for loop
+			 * iteration, so just leave r and continue
+			 */
+			addr = next;
+			continue;
+		}
 
-			hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
-			if (unlikely(!hmm_range)) {
-				r = -ENOMEM;
-			} else {
-				WRITE_ONCE(p->svms.faulting_task, current);
-				r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
-							       readonly, owner,
-							       hmm_range);
-				WRITE_ONCE(p->svms.faulting_task, NULL);
-				if (r) {
-					kfree(hmm_range);
-					hmm_range = NULL;
-					pr_debug("failed %d to get svm range pages\n", r);
-				}
-			}
+		readonly = !(vma->vm_flags & VM_WRITE);
+		npages = (next - addr) >> PAGE_SHIFT;
+
+		hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
+		if (unlikely(!hmm_range)) {
+			r = -ENOMEM;
 		} else {
-			r = -EFAULT;
+			WRITE_ONCE(p->svms.faulting_task, current);
+			r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
+						       readonly, owner,
+						       hmm_range);
+			WRITE_ONCE(p->svms.faulting_task, NULL);
+			if (r) {
+				kfree(hmm_range);
+				hmm_range = NULL;
+				pr_debug("failed %d to get svm range pages\n", r);
+			}
 		}
 
 		if (!r) {
-- 
2.49.0



* Re: [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL
  2025-10-02 17:43 [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL Philip Yang
  2025-10-02 17:43 ` [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address Philip Yang
  2025-10-02 17:43 ` [PATCH 3/3] drm/amdkfd: Don't stuck in svm restore worker Philip Yang
@ 2025-10-02 21:48 ` Chen, Xiaogang
  2025-10-03 15:12   ` Philip Yang
  2 siblings, 1 reply; 8+ messages in thread
From: Chen, Xiaogang @ 2025-10-02 21:48 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: Felix.Kuehling, harish.kasiviswanathan


On 10/2/2025 12:43 PM, Philip Yang wrote:
> Add an error check for hmm_range kzalloc returning NULL. If get_pages
> fails, free hmm_range and set it to NULL to avoid a double free in
> get_pages_done.
>
> Fixes: 29e6f5716115 ("drm/amdgpu: use user provided hmm_range buffer in amdgpu_ttm_tt_get_user_pages")
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 21 +++++++++++++--------
>   1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 8c3787b00f36..e8a15751c125 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -1736,15 +1736,20 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
>   				continue;
>   			}
>   
> -			WRITE_ONCE(p->svms.faulting_task, current);
>   			hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
> -			r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
> -						       readonly, owner,
> -						       hmm_range);
> -			WRITE_ONCE(p->svms.faulting_task, NULL);
> -			if (r) {
> -				kfree(hmm_range);
> -				pr_debug("failed %d to get svm range pages\n", r);
> +			if (unlikely(!hmm_range)) {
> +				r = -ENOMEM;
> +			} else {
> +				WRITE_ONCE(p->svms.faulting_task, current);
> +				r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
> +							       readonly, owner,
> +							       hmm_range);
> +				WRITE_ONCE(p->svms.faulting_task, NULL);
> +				if (r) {
> +					kfree(hmm_range);
> +					hmm_range = NULL;

How does this avoid a double free of hmm_range? Currently hmm_range
gets freed in amdgpu_hmm_range_get_pages_done. If you free it here,
amdgpu_hmm_range_get_pages_done will not free it; if you do not free it
here, amdgpu_hmm_range_get_pages_done will.

Besides freeing hmm_range, we also need to free hmm_range->hmm_pfns,
which is done in amdgpu_hmm_range_get_pages_done.

I think the real problem is that hmm_range is allocated and freed in
different places. I do not know why.

Regards

Xiaogang


> +					pr_debug("failed %d to get svm range pages\n", r);
> +				}
>   			}
>   		} else {
>   			r = -EFAULT;


* Re: [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address
  2025-10-02 17:43 ` [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address Philip Yang
@ 2025-10-02 22:04   ` Chen, Xiaogang
  2025-10-03 15:03     ` Philip Yang
  0 siblings, 1 reply; 8+ messages in thread
From: Chen, Xiaogang @ 2025-10-02 22:04 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: Felix.Kuehling, harish.kasiviswanathan


On 10/2/2025 12:43 PM, Philip Yang wrote:
> svm_range_unmap_from_gpus takes page-aligned start and end addresses;
> the end address is inclusive.
>
> Fixes: 38c55f6719f7 ("drm/amdkfd: Handle lack of READ permissions in SVM mapping")
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index e8a15751c125..742c28833650 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -1723,8 +1723,8 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
>   				svm_range_lock(prange);
>   				if (vma->vm_flags & VM_WRITE)
>   					pr_debug("VM_WRITE without VM_READ is not supported");
> -				s = max(start, prange->start);
> -				e = min(end, prange->last);
> +				s = max(start >> PAGE_SHIFT, prange->start);
> +				e = min((end - 1) >> PAGE_SHIFT, prange->last);

I think the problem is more than that. Here "end" is the last byte of 
the prange GPU mapping, but the handling here is for a single vma. You 
should use the end of the vma range to compute "e".

Regards

Xiaogang

>   				if (e >= s)
>   					r = svm_range_unmap_from_gpus(prange, s, e,
>   						       KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU);


* Re: [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address
  2025-10-02 22:04   ` Chen, Xiaogang
@ 2025-10-03 15:03     ` Philip Yang
  0 siblings, 0 replies; 8+ messages in thread
From: Philip Yang @ 2025-10-03 15:03 UTC (permalink / raw)
  To: Chen, Xiaogang, Philip Yang, amd-gfx
  Cc: Felix.Kuehling, harish.kasiviswanathan


On 2025-10-02 18:04, Chen, Xiaogang wrote:
>
> On 10/2/2025 12:43 PM, Philip Yang wrote:
>> svm_range_unmap_from_gpus takes page-aligned start and end
>> addresses; the end address is inclusive.
>>
>> Fixes: 38c55f6719f7 ("drm/amdkfd: Handle lack of READ permissions in 
>> SVM mapping")
>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index e8a15751c125..742c28833650 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -1723,8 +1723,8 @@ static int svm_range_validate_and_map(struct 
>> mm_struct *mm,
>>                   svm_range_lock(prange);
>>                   if (vma->vm_flags & VM_WRITE)
>>                       pr_debug("VM_WRITE without VM_READ is not 
>> supported");
>> -                s = max(start, prange->start);
>> -                e = min(end, prange->last);
>> +                s = max(start >> PAGE_SHIFT, prange->start);
>> +                e = min((end - 1) >> PAGE_SHIFT, prange->last);
>
> I think the problem is more than that. Here "end" is the last byte of 
> the prange GPU mapping, but the handling here is for a single vma. 
> You should use the end of the vma range to compute "e".

You are right, it should use next - 1 instead of end. I will refactor 
the patch series and send out v2.

regards,

Philip

>
> Regards
>
> Xiaogang
>
>>                   if (e >= s)
>>                       r = svm_range_unmap_from_gpus(prange, s, e,
>> KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU);


* Re: [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL
  2025-10-02 21:48 ` [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL Chen, Xiaogang
@ 2025-10-03 15:12   ` Philip Yang
  2025-10-03 15:39     ` Chen, Xiaogang
  0 siblings, 1 reply; 8+ messages in thread
From: Philip Yang @ 2025-10-03 15:12 UTC (permalink / raw)
  To: Chen, Xiaogang, Philip Yang, amd-gfx
  Cc: Felix.Kuehling, harish.kasiviswanathan


On 2025-10-02 17:48, Chen, Xiaogang wrote:
>
> On 10/2/2025 12:43 PM, Philip Yang wrote:
>> Add an error check for hmm_range kzalloc returning NULL. If
>> get_pages fails, free hmm_range and set it to NULL to avoid a double
>> free in get_pages_done.
>>
>> Fixes: 29e6f5716115 ("drm/amdgpu: use user provided hmm_range buffer 
>> in amdgpu_ttm_tt_get_user_pages")
>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 21 +++++++++++++--------
>>   1 file changed, 13 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index 8c3787b00f36..e8a15751c125 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -1736,15 +1736,20 @@ static int svm_range_validate_and_map(struct 
>> mm_struct *mm,
>>                   continue;
>>               }
>>   -            WRITE_ONCE(p->svms.faulting_task, current);
>>               hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
>> -            r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, 
>> npages,
>> -                               readonly, owner,
>> -                               hmm_range);
>> -            WRITE_ONCE(p->svms.faulting_task, NULL);
>> -            if (r) {
>> -                kfree(hmm_range);
>> -                pr_debug("failed %d to get svm range pages\n", r);
>> +            if (unlikely(!hmm_range)) {
>> +                r = -ENOMEM;
>> +            } else {
>> +                WRITE_ONCE(p->svms.faulting_task, current);
>> +                r = amdgpu_hmm_range_get_pages(&prange->notifier, 
>> addr, npages,
>> +                                   readonly, owner,
>> +                                   hmm_range);
>> +                WRITE_ONCE(p->svms.faulting_task, NULL);
>> +                if (r) {
>> +                    kfree(hmm_range);
>> +                    hmm_range = NULL;
>
> How does this avoid a double free of hmm_range? Currently hmm_range 
> gets freed in amdgpu_hmm_range_get_pages_done. If you free it here, 
> amdgpu_hmm_range_get_pages_done will not free it; if you do not free 
> it here, amdgpu_hmm_range_get_pages_done will.
>
> Besides freeing hmm_range, we also need to free hmm_range->hmm_pfns, 
> which is done in amdgpu_hmm_range_get_pages_done.

If amdgpu_hmm_range_get_pages fails, hmm_range->hmm_pfns has already 
been freed. If we only free hmm_range but do not set it to NULL, then 
amdgpu_hmm_range_get_pages_done is called later and will free hmm_range 
a second time.

>
> I think the real problem is that hmm_range is allocated and freed in 
> different places. I do not know why.

This was changed by 29e6f5716115; the details are in that patch.

Regards,

Philip

>
> Regards
>
> Xiaogang
>
>
>> +                    pr_debug("failed %d to get svm range pages\n", r);
>> +                }
>>               }
>>           } else {
>>               r = -EFAULT;


* Re: [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL
  2025-10-03 15:12   ` Philip Yang
@ 2025-10-03 15:39     ` Chen, Xiaogang
  0 siblings, 0 replies; 8+ messages in thread
From: Chen, Xiaogang @ 2025-10-03 15:39 UTC (permalink / raw)
  To: Philip Yang, Philip Yang, amd-gfx; +Cc: Felix.Kuehling, harish.kasiviswanathan


On 10/3/2025 10:12 AM, Philip Yang wrote:
>
> On 2025-10-02 17:48, Chen, Xiaogang wrote:
>>
>> On 10/2/2025 12:43 PM, Philip Yang wrote:
>>> Add an error check for hmm_range kzalloc returning NULL. If
>>> get_pages fails, free hmm_range and set it to NULL to avoid a
>>> double free in get_pages_done.
>>>
>>> Fixes: 29e6f5716115 ("drm/amdgpu: use user provided hmm_range buffer 
>>> in amdgpu_ttm_tt_get_user_pages")
>>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 21 +++++++++++++--------
>>>   1 file changed, 13 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> index 8c3787b00f36..e8a15751c125 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> @@ -1736,15 +1736,20 @@ static int svm_range_validate_and_map(struct 
>>> mm_struct *mm,
>>>                   continue;
>>>               }
>>>   -            WRITE_ONCE(p->svms.faulting_task, current);
>>>               hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
>>> -            r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, 
>>> npages,
>>> -                               readonly, owner,
>>> -                               hmm_range);
>>> -            WRITE_ONCE(p->svms.faulting_task, NULL);
>>> -            if (r) {
>>> -                kfree(hmm_range);
>>> -                pr_debug("failed %d to get svm range pages\n", r);
>>> +            if (unlikely(!hmm_range)) {
>>> +                r = -ENOMEM;
>>> +            } else {
>>> +                WRITE_ONCE(p->svms.faulting_task, current);
>>> +                r = amdgpu_hmm_range_get_pages(&prange->notifier, 
>>> addr, npages,
>>> +                                   readonly, owner,
>>> +                                   hmm_range);
>>> +                WRITE_ONCE(p->svms.faulting_task, NULL);
>>> +                if (r) {
>>> +                    kfree(hmm_range);
>>> +                    hmm_range = NULL;
>>
>> How does this avoid a double free of hmm_range? Currently hmm_range 
>> gets freed in amdgpu_hmm_range_get_pages_done. If you free it here, 
>> amdgpu_hmm_range_get_pages_done will not free it; if you do not free 
>> it here, amdgpu_hmm_range_get_pages_done will.
>>
>> Besides freeing hmm_range, we also need to free hmm_range->hmm_pfns, 
>> which is done in amdgpu_hmm_range_get_pages_done.
>
> If amdgpu_hmm_range_get_pages fails, hmm_range->hmm_pfns has already 
> been freed. If we only free hmm_range but do not set it to NULL, then 
> amdgpu_hmm_range_get_pages_done is called later and will free 
> hmm_range a second time.

Yes, but amdgpu_hmm_range_get_pages_done already frees hmm_range, so 
why do you free hmm_range in svm?

I think what makes this confusing is that hmm_range is allocated and 
freed in different places, not the double free itself. If hmm_range is 
allocated by svm, svm should free it whether amdgpu_hmm_range_get_pages 
succeeds or not. Even if allocation and free happen in different 
places, the free should appear in only one place.

>
>>
>> I think the real problem is that hmm_range is allocated and freed in 
>> different places. I do not know why.
>
> This was changed by 29e6f5716115; the details are in that patch.
>
That patch makes the hmm_range pointer easier to pass around, but I 
think allocating and freeing hmm_range in different components is not 
a good idea.

Regards

Xiaogang

> Regards,
>
> Philip
>
>>
>> Regards
>>
>> Xiaogang
>>
>>
>>> +                    pr_debug("failed %d to get svm range pages\n", r);
>>> +                }
>>>               }
>>>           } else {
>>>               r = -EFAULT;

