* [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL
@ 2025-10-02 17:43 Philip Yang
2025-10-02 17:43 ` [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address Philip Yang
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Philip Yang @ 2025-10-02 17:43 UTC (permalink / raw)
To: amd-gfx; +Cc: Felix.Kuehling, harish.kasiviswanathan, Philip Yang
Add an error check for hmm_range kzalloc returning NULL. If get_pages
fails, free hmm_range and set it to NULL to avoid a double free in
get_pages_done.
Fixes: 29e6f5716115 ("drm/amdgpu: use user provided hmm_range buffer in amdgpu_ttm_tt_get_user_pages")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 8c3787b00f36..e8a15751c125 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1736,15 +1736,20 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
continue;
}
- WRITE_ONCE(p->svms.faulting_task, current);
hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
- r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
- readonly, owner,
- hmm_range);
- WRITE_ONCE(p->svms.faulting_task, NULL);
- if (r) {
- kfree(hmm_range);
- pr_debug("failed %d to get svm range pages\n", r);
+ if (unlikely(!hmm_range)) {
+ r = -ENOMEM;
+ } else {
+ WRITE_ONCE(p->svms.faulting_task, current);
+ r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
+ readonly, owner,
+ hmm_range);
+ WRITE_ONCE(p->svms.faulting_task, NULL);
+ if (r) {
+ kfree(hmm_range);
+ hmm_range = NULL;
+ pr_debug("failed %d to get svm range pages\n", r);
+ }
}
} else {
r = -EFAULT;
--
2.49.0
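The error-path pattern the patch introduces can be sketched in user space. Everything below (fake_range, fake_get_pages, map_range) is a hypothetical stand-in for the real amdgpu code, using calloc/free in place of kzalloc/kfree:

```c
#include <errno.h>
#include <stdlib.h>

struct fake_range { int pages; };

/* stand-in for amdgpu_hmm_range_get_pages(); fails when npages == 0 */
static int fake_get_pages(struct fake_range *r, int npages)
{
	if (npages == 0)
		return -EFAULT;
	r->pages = npages;
	return 0;
}

/*
 * Allocate the range and try to get pages; on failure free the range
 * AND reset the pointer, so a later cleanup pass sees NULL and skips
 * the second free.
 */
static struct fake_range *map_range(int npages, int *err)
{
	struct fake_range *range = calloc(1, sizeof(*range));

	if (!range) {
		*err = -ENOMEM;
		return NULL;
	}
	*err = fake_get_pages(range, npages);
	if (*err) {
		free(range);
		range = NULL;	/* the fix: avoid a later double free */
	}
	return range;
}

/* exercises both paths; returns 0 when both behave as expected */
static int demo(void)
{
	int err = 0;
	struct fake_range *r = map_range(4, &err);

	if (err || !r || r->pages != 4)
		return 1;
	free(r);
	r = map_range(0, &err);
	if (err != -EFAULT || r)
		return 2;
	return 0;
}
```

The caller then owns a pointer that is either valid or NULL, never dangling, which is exactly what the get_pages_done path relies on.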
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address
2025-10-02 17:43 [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL Philip Yang
@ 2025-10-02 17:43 ` Philip Yang
2025-10-02 22:04 ` Chen, Xiaogang
2025-10-02 17:43 ` [PATCH 3/3] drm/amdkfd: Don't stuck in svm restore worker Philip Yang
2025-10-02 21:48 ` [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL Chen, Xiaogang
2 siblings, 1 reply; 8+ messages in thread
From: Philip Yang @ 2025-10-02 17:43 UTC (permalink / raw)
To: amd-gfx; +Cc: Felix.Kuehling, harish.kasiviswanathan, Philip Yang
svm_range_unmap_from_gpus takes page-aligned start and end addresses,
and the end address is inclusive.
Fixes: 38c55f6719f7 ("drm/amdkfd: Handle lack of READ permissions in SVM mapping")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index e8a15751c125..742c28833650 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1723,8 +1723,8 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
svm_range_lock(prange);
if (vma->vm_flags & VM_WRITE)
pr_debug("VM_WRITE without VM_READ is not supported");
- s = max(start, prange->start);
- e = min(end, prange->last);
+ s = max(start >> PAGE_SHIFT, prange->start);
+ e = min((end - 1) >> PAGE_SHIFT, prange->last);
if (e >= s)
r = svm_range_unmap_from_gpus(prange, s, e,
KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU);
--
2.49.0
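The inclusive page-index arithmetic in this patch can be checked with a small sketch; the page shift is hard-coded to 12 (4 KiB pages) here purely for illustration:

```c
#define SKETCH_PAGE_SHIFT 12UL

/* first page index covered by an inclusive byte start address */
static unsigned long first_page(unsigned long start)
{
	return start >> SKETCH_PAGE_SHIFT;
}

/*
 * Last page index covered by an EXCLUSIVE byte end address: subtract 1
 * first, as the patch does, so a range ending exactly on a page
 * boundary does not spill into the next page.
 */
static unsigned long last_page(unsigned long end)
{
	return (end - 1) >> SKETCH_PAGE_SHIFT;
}
```

For the one-page range [0x1000, 0x2000), first_page gives 1 and last_page also gives 1; shifting end without the subtraction would give 2, one page past the mapping.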
* [PATCH 3/3] drm/amdkfd: Don't stuck in svm restore worker
2025-10-02 17:43 [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL Philip Yang
2025-10-02 17:43 ` [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address Philip Yang
@ 2025-10-02 17:43 ` Philip Yang
2025-10-02 21:48 ` [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL Chen, Xiaogang
2 siblings, 0 replies; 8+ messages in thread
From: Philip Yang @ 2025-10-02 17:43 UTC (permalink / raw)
To: amd-gfx; +Cc: Felix.Kuehling, harish.kasiviswanathan, Philip Yang
If the vma is not found, the application has freed the memory with
madvise MADV_FREE, but the driver did not receive an unmap through the
CPU MMU notifier callback, so the memory is still mapped on the GPUs.
The svm restore worker then reschedules itself to retry forever, user
queues are never resumed, and the application hangs waiting for the
queues to finish.
The svm restore worker should instead unmap the memory range from the
GPUs and then resume the queues. If a GPU page fault later happens on
the unmapped address, it is an application use-after-free bug.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 78 ++++++++++++++--------------
1 file changed, 40 insertions(+), 38 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 742c28833650..608a25c6c865 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1708,51 +1708,53 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
bool readonly;
vma = vma_lookup(mm, addr);
- if (vma) {
- readonly = !(vma->vm_flags & VM_WRITE);
+ next = vma ? min(vma->vm_end, end) : end;
- next = min(vma->vm_end, end);
- npages = (next - addr) >> PAGE_SHIFT;
+ if (!vma || !(vma->vm_flags & VM_READ)) {
/* HMM requires at least READ permissions. If provided with PROT_NONE,
* unmap the memory. If it's not already mapped, this is a no-op
* If PROT_WRITE is provided without READ, warn first then unmap
+ * If vma is not found, addr is invalid, unmap from GPUs
*/
- if (!(vma->vm_flags & VM_READ)) {
- unsigned long e, s;
-
- svm_range_lock(prange);
- if (vma->vm_flags & VM_WRITE)
- pr_debug("VM_WRITE without VM_READ is not supported");
- s = max(start >> PAGE_SHIFT, prange->start);
- e = min((end - 1) >> PAGE_SHIFT, prange->last);
- if (e >= s)
- r = svm_range_unmap_from_gpus(prange, s, e,
- KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU);
- svm_range_unlock(prange);
- /* If unmap returns non-zero, we'll bail on the next for loop
- * iteration, so just leave r and continue
- */
- addr = next;
- continue;
- }
+ unsigned long e, s;
+
+ svm_range_lock(prange);
+
+ if (!vma)
+ pr_debug("vma not found\n");
+ else if (vma->vm_flags & VM_WRITE)
+ pr_debug("VM_WRITE without VM_READ is not supported");
+
+ s = max(start >> PAGE_SHIFT, prange->start);
+ e = min((end - 1) >> PAGE_SHIFT, prange->last);
+ if (e >= s)
+ r = svm_range_unmap_from_gpus(prange, s, e,
+ KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU);
+ svm_range_unlock(prange);
+ /* If unmap returns non-zero, we'll bail on the next for loop
+ * iteration, so just leave r and continue
+ */
+ addr = next;
+ continue;
+ }
- hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
- if (unlikely(!hmm_range)) {
- r = -ENOMEM;
- } else {
- WRITE_ONCE(p->svms.faulting_task, current);
- r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
- readonly, owner,
- hmm_range);
- WRITE_ONCE(p->svms.faulting_task, NULL);
- if (r) {
- kfree(hmm_range);
- hmm_range = NULL;
- pr_debug("failed %d to get svm range pages\n", r);
- }
- }
+ readonly = !(vma->vm_flags & VM_WRITE);
+ npages = (next - addr) >> PAGE_SHIFT;
+
+ hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
+ if (unlikely(!hmm_range)) {
+ r = -ENOMEM;
} else {
- r = -EFAULT;
+ WRITE_ONCE(p->svms.faulting_task, current);
+ r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
+ readonly, owner,
+ hmm_range);
+ WRITE_ONCE(p->svms.faulting_task, NULL);
+ if (r) {
+ kfree(hmm_range);
+ hmm_range = NULL;
+ pr_debug("failed %d to get svm range pages\n", r);
+ }
}
if (!r) {
--
2.49.0
* Re: [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL
2025-10-02 17:43 [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL Philip Yang
2025-10-02 17:43 ` [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address Philip Yang
2025-10-02 17:43 ` [PATCH 3/3] drm/amdkfd: Don't stuck in svm restore worker Philip Yang
@ 2025-10-02 21:48 ` Chen, Xiaogang
2025-10-03 15:12 ` Philip Yang
2 siblings, 1 reply; 8+ messages in thread
From: Chen, Xiaogang @ 2025-10-02 21:48 UTC (permalink / raw)
To: Philip Yang, amd-gfx; +Cc: Felix.Kuehling, harish.kasiviswanathan
On 10/2/2025 12:43 PM, Philip Yang wrote:
> Add an error check for hmm_range kzalloc returning NULL. If get_pages
> fails, free hmm_range and set it to NULL to avoid a double free in
> get_pages_done.
>
> Fixes: 29e6f5716115 ("drm/amdgpu: use user provided hmm_range buffer in amdgpu_ttm_tt_get_user_pages")
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 21 +++++++++++++--------
> 1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 8c3787b00f36..e8a15751c125 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -1736,15 +1736,20 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
> continue;
> }
>
> - WRITE_ONCE(p->svms.faulting_task, current);
> hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
> - r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
> - readonly, owner,
> - hmm_range);
> - WRITE_ONCE(p->svms.faulting_task, NULL);
> - if (r) {
> - kfree(hmm_range);
> - pr_debug("failed %d to get svm range pages\n", r);
> + if (unlikely(!hmm_range)) {
> + r = -ENOMEM;
> + } else {
> + WRITE_ONCE(p->svms.faulting_task, current);
> + r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
> + readonly, owner,
> + hmm_range);
> + WRITE_ONCE(p->svms.faulting_task, NULL);
> + if (r) {
> + kfree(hmm_range);
> + hmm_range = NULL;
How does this avoid a double free of hmm_range? Currently hmm_range is
freed in amdgpu_hmm_range_get_pages_done. If you free it here, then
amdgpu_hmm_range_get_pages_done must not free it; if you do not free it
here, amdgpu_hmm_range_get_pages_done will.
And besides freeing hmm_range, we also need to free
hmm_range->hmm_pfns, which is done in
amdgpu_hmm_range_get_pages_done.
I think the real problem is that hmm_range is allocated and freed in
different places. I do not know why.
Regards
Xiaogang
> + pr_debug("failed %d to get svm range pages\n", r);
> + }
> }
> } else {
> r = -EFAULT;
* Re: [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address
2025-10-02 17:43 ` [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address Philip Yang
@ 2025-10-02 22:04 ` Chen, Xiaogang
2025-10-03 15:03 ` Philip Yang
0 siblings, 1 reply; 8+ messages in thread
From: Chen, Xiaogang @ 2025-10-02 22:04 UTC (permalink / raw)
To: Philip Yang, amd-gfx; +Cc: Felix.Kuehling, harish.kasiviswanathan
On 10/2/2025 12:43 PM, Philip Yang wrote:
> svm_range_unmap_from_gpus takes page-aligned start and end addresses,
> and the end address is inclusive.
>
> Fixes: 38c55f6719f7 ("drm/amdkfd: Handle lack of READ permissions in SVM mapping")
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index e8a15751c125..742c28833650 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -1723,8 +1723,8 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
> svm_range_lock(prange);
> if (vma->vm_flags & VM_WRITE)
> pr_debug("VM_WRITE without VM_READ is not supported");
> - s = max(start, prange->start);
> - e = min(end, prange->last);
> + s = max(start >> PAGE_SHIFT, prange->start);
> + e = min((end - 1) >> PAGE_SHIFT, prange->last);
I think the problem is more than that. Here "end" is the last address
of the prange GPU mapping, but the handling here is for a single vma.
You should use the end of the vma range to compute "e".
Regards
Xiaogang
> if (e >= s)
> r = svm_range_unmap_from_gpus(prange, s, e,
> KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU);
* Re: [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address
2025-10-02 22:04 ` Chen, Xiaogang
@ 2025-10-03 15:03 ` Philip Yang
0 siblings, 0 replies; 8+ messages in thread
From: Philip Yang @ 2025-10-03 15:03 UTC (permalink / raw)
To: Chen, Xiaogang, Philip Yang, amd-gfx
Cc: Felix.Kuehling, harish.kasiviswanathan
On 2025-10-02 18:04, Chen, Xiaogang wrote:
>
> On 10/2/2025 12:43 PM, Philip Yang wrote:
>> svm_range_unmap_from_gpus takes page-aligned start and end
>> addresses, and the end address is inclusive.
>>
>> Fixes: 38c55f6719f7 ("drm/amdkfd: Handle lack of READ permissions in
>> SVM mapping")
>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index e8a15751c125..742c28833650 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -1723,8 +1723,8 @@ static int svm_range_validate_and_map(struct
>> mm_struct *mm,
>> svm_range_lock(prange);
>> if (vma->vm_flags & VM_WRITE)
>> pr_debug("VM_WRITE without VM_READ is not
>> supported");
>> - s = max(start, prange->start);
>> - e = min(end, prange->last);
>> + s = max(start >> PAGE_SHIFT, prange->start);
>> + e = min((end - 1) >> PAGE_SHIFT, prange->last);
>
> I think the problem is more than that. Here "end" is the last address
> of the prange GPU mapping, but the handling here is for a single vma.
> You should use the end of the vma range to compute "e".
You are right, the end should use next - 1 instead. I will refactor the
patch series and send out v2.
regards,
Philip
>
> Regards
>
> Xiaogang
>
>> if (e >= s)
>> r = svm_range_unmap_from_gpus(prange, s, e,
>> KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU);
* Re: [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL
2025-10-02 21:48 ` [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL Chen, Xiaogang
@ 2025-10-03 15:12 ` Philip Yang
2025-10-03 15:39 ` Chen, Xiaogang
0 siblings, 1 reply; 8+ messages in thread
From: Philip Yang @ 2025-10-03 15:12 UTC (permalink / raw)
To: Chen, Xiaogang, Philip Yang, amd-gfx
Cc: Felix.Kuehling, harish.kasiviswanathan
On 2025-10-02 17:48, Chen, Xiaogang wrote:
>
> On 10/2/2025 12:43 PM, Philip Yang wrote:
>> Add an error check for hmm_range kzalloc returning NULL. If
>> get_pages fails, free hmm_range and set it to NULL to avoid a double
>> free in get_pages_done.
>>
>> Fixes: 29e6f5716115 ("drm/amdgpu: use user provided hmm_range buffer
>> in amdgpu_ttm_tt_get_user_pages")
>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 21 +++++++++++++--------
>> 1 file changed, 13 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index 8c3787b00f36..e8a15751c125 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -1736,15 +1736,20 @@ static int svm_range_validate_and_map(struct
>> mm_struct *mm,
>> continue;
>> }
>> - WRITE_ONCE(p->svms.faulting_task, current);
>> hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
>> - r = amdgpu_hmm_range_get_pages(&prange->notifier, addr,
>> npages,
>> - readonly, owner,
>> - hmm_range);
>> - WRITE_ONCE(p->svms.faulting_task, NULL);
>> - if (r) {
>> - kfree(hmm_range);
>> - pr_debug("failed %d to get svm range pages\n", r);
>> + if (unlikely(!hmm_range)) {
>> + r = -ENOMEM;
>> + } else {
>> + WRITE_ONCE(p->svms.faulting_task, current);
>> + r = amdgpu_hmm_range_get_pages(&prange->notifier,
>> addr, npages,
>> + readonly, owner,
>> + hmm_range);
>> + WRITE_ONCE(p->svms.faulting_task, NULL);
>> + if (r) {
>> + kfree(hmm_range);
>> + hmm_range = NULL;
>
> How does this avoid a double free of hmm_range? Currently hmm_range
> is freed in amdgpu_hmm_range_get_pages_done. If you free it here,
> then amdgpu_hmm_range_get_pages_done must not free it; if you do not
> free it here, amdgpu_hmm_range_get_pages_done will.
>
> And besides freeing hmm_range, we also need to free
> hmm_range->hmm_pfns, which is done in amdgpu_hmm_range_get_pages_done.
If amdgpu_hmm_range_get_pages fails, hmm_range->hmm_pfns is already
freed. If we only free hmm_range but do not set it to NULL,
amdgpu_hmm_range_get_pages_done is still called later and will free
hmm_range a second time.
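What makes the fix work is that kfree(NULL) is defined as a no-op, so NULLing the pointer turns the later unconditional free into nothing. A user-space sketch (sketch_free and demo are hypothetical and only count would-be frees rather than releasing memory):

```c
#include <stddef.h>

static int frees;	/* counts would-be frees of a live pointer */

/* kfree()-like semantics: freeing NULL is a harmless no-op */
static void sketch_free(void *p)
{
	if (p)
		frees++;	/* a real kfree() would release p here */
}

/*
 * Returns how many times the buffer gets "freed". When the error path
 * resets the pointer (null_after_free != 0), the later unconditional
 * get_pages_done-style cleanup sees NULL and the count stays at 1;
 * otherwise the buffer is freed twice.
 */
static int demo(int null_after_free)
{
	char buf[1];
	void *range = buf;

	frees = 0;
	sketch_free(range);	/* error path frees the range */
	if (null_after_free)
		range = NULL;
	sketch_free(range);	/* later cleanup frees unconditionally */
	return frees;
}
```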
>
> I think the real problem is that hmm_range is allocated and freed in
> different places. I do not know why.
This was changed in 29e6f5716115; the details are in that patch.
Regards,
Philip
>
> Regards
>
> Xiaogang
>
>
>> + pr_debug("failed %d to get svm range pages\n", r);
>> + }
>> }
>> } else {
>> r = -EFAULT;
* Re: [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL
2025-10-03 15:12 ` Philip Yang
@ 2025-10-03 15:39 ` Chen, Xiaogang
0 siblings, 0 replies; 8+ messages in thread
From: Chen, Xiaogang @ 2025-10-03 15:39 UTC (permalink / raw)
To: Philip Yang, Philip Yang, amd-gfx; +Cc: Felix.Kuehling, harish.kasiviswanathan
On 10/3/2025 10:12 AM, Philip Yang wrote:
>
> On 2025-10-02 17:48, Chen, Xiaogang wrote:
>>
>> On 10/2/2025 12:43 PM, Philip Yang wrote:
>>> Add an error check for hmm_range kzalloc returning NULL. If
>>> get_pages fails, free hmm_range and set it to NULL to avoid a
>>> double free in get_pages_done.
>>>
>>> Fixes: 29e6f5716115 ("drm/amdgpu: use user provided hmm_range buffer
>>> in amdgpu_ttm_tt_get_user_pages")
>>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 21 +++++++++++++--------
>>> 1 file changed, 13 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> index 8c3787b00f36..e8a15751c125 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> @@ -1736,15 +1736,20 @@ static int svm_range_validate_and_map(struct
>>> mm_struct *mm,
>>> continue;
>>> }
>>> - WRITE_ONCE(p->svms.faulting_task, current);
>>> hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
>>> - r = amdgpu_hmm_range_get_pages(&prange->notifier, addr,
>>> npages,
>>> - readonly, owner,
>>> - hmm_range);
>>> - WRITE_ONCE(p->svms.faulting_task, NULL);
>>> - if (r) {
>>> - kfree(hmm_range);
>>> - pr_debug("failed %d to get svm range pages\n", r);
>>> + if (unlikely(!hmm_range)) {
>>> + r = -ENOMEM;
>>> + } else {
>>> + WRITE_ONCE(p->svms.faulting_task, current);
>>> + r = amdgpu_hmm_range_get_pages(&prange->notifier,
>>> addr, npages,
>>> + readonly, owner,
>>> + hmm_range);
>>> + WRITE_ONCE(p->svms.faulting_task, NULL);
>>> + if (r) {
>>> + kfree(hmm_range);
>>> + hmm_range = NULL;
>>
>> How does this avoid a double free of hmm_range? Currently hmm_range
>> is freed in amdgpu_hmm_range_get_pages_done. If you free it here,
>> then amdgpu_hmm_range_get_pages_done must not free it; if you do not
>> free it here, amdgpu_hmm_range_get_pages_done will.
>>
>> And besides freeing hmm_range, we also need to free
>> hmm_range->hmm_pfns, which is done in
>> amdgpu_hmm_range_get_pages_done.
>
> If amdgpu_hmm_range_get_pages fails, hmm_range->hmm_pfns is already
> freed. If we only free hmm_range but do not set it to NULL,
> amdgpu_hmm_range_get_pages_done is still called later and will free
> hmm_range a second time.
Yes, but amdgpu_hmm_range_get_pages_done already frees hmm_range, so
why do you free hmm_range in svm?
I think what is confusing is that hmm_range is allocated and freed in
different places, not the double free itself. If hmm_range is allocated
by svm, svm needs to free it whether amdgpu_hmm_range_get_pages
succeeds or not. Even if allocation and free happen in different
places, the free should appear in only one place.
>
>>
>> I think the real problem is that hmm_range is allocated and freed in
>> different places. I do not know why.
>
> This is the change in 29e6f5716115, with details in the patch.
>
This patch wants to make the hmm_range pointer easier to pass around,
but I think allocating and freeing hmm_range in different components is
not a good idea.
Regards
Xiaogang
> Regards,
>
> Philip
>
>>
>> Regards
>>
>> Xiaogang
>>
>>
>>> + pr_debug("failed %d to get svm range pages\n", r);
>>> + }
>>> }
>>> } else {
>>> r = -EFAULT;