From: Felix Kuehling <felix.kuehling@amd.com>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"xinhui pan" <xinhui.pan@amd.com>,
	amd-gfx@lists.freedesktop.org
Cc: alexander.deucher@amd.com, christian.koenig@amd.com
Subject: Re: [PATCH] drm/amdgpu: Fix a NULL pointer of fence
Date: Thu, 7 Jul 2022 11:47:22 -0400
Message-ID: <4b60ece6-afa5-62ca-afa6-bb800cdba982@amd.com>
In-Reply-To: <92f468dc-2fad-5135-4aeb-c8ce2a680c69@gmail.com>

On 2022-07-07 at 05:54, Christian König wrote:
> On 07.07.22 at 11:50, xinhui pan wrote:
>> The fence is now accessed by dma_resv_add_fence().
>> Use amdgpu_amdkfd_remove_eviction_fence() instead.
>>
>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index 0036c9e405af..1e25c400ce4f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -1558,10 +1558,10 @@ void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
>>         if (!process_info)
>>           return;
>> -
>>       /* Release eviction fence from PD */
>>       amdgpu_bo_reserve(pd, false);
>> -    amdgpu_bo_fence(pd, NULL, false);
>> +    amdgpu_amdkfd_remove_eviction_fence(pd,
>> +                    process_info->eviction_fence);
>
> Good catch as well, but Felix needs to take a look at this.

This is weird. We used amdgpu_bo_fence(pd, NULL, false) here, which
would have removed an exclusive fence. But as far as I can tell, we add
the eviction fence as a shared fence in init_kfd_vm and
amdgpu_amdkfd_gpuvm_restore_process_bos, so this removal probably never
worked as intended.

You could test whether this is really needed: just remove the eviction
fence removal, then enable eviction debugging with

     echo Y > /sys/module/amdgpu/parameters/debug_evictions

Run some simple tests and check the kernel log to see if process 
termination is causing any unexpected evictions.

Regards,
   Felix


>
> Regards,
> Christian.
>
>>       amdgpu_bo_unreserve(pd);
>>         /* Update process info */
>

Thread overview: 9+ messages
2022-07-07  9:50 [PATCH] drm/amdgpu: Fix a NULL pointer of fence xinhui pan
2022-07-07  9:54 ` Christian König
2022-07-07 15:47   ` Felix Kuehling [this message]
2022-07-08  1:08     ` Pan, Xinhui
2022-07-08  9:03       ` Christian König
2022-07-18 14:58         ` Mike Lothian
2022-07-18 15:29           ` Felix Kuehling
2022-07-18 15:46             ` Deucher, Alexander
2022-07-07 18:23 ` Mike Lothian