AMD-GFX Archive on lore.kernel.org
From: "Lazar, Lijo" <lijo.lazar@amd.com>
To: "Pan, Xinhui" <Xinhui.Pan@amd.com>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Koenig, Christian" <Christian.Koenig@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Cc: "Deucher, Alexander" <Alexander.Deucher@amd.com>
Subject: Re: Re: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test
Date: Mon, 13 Sep 2021 19:20:18 +0530
Message-ID: <49d5a097-2ea3-4984-c6cc-3df6af338a5b@amd.com>
In-Reply-To: <DM4PR12MB51658EF37C6518E0F166DAF787D99@DM4PR12MB5165.namprd12.prod.outlook.com>

Thanks for the clarification, Xinhui.

Based on Christian's explanation, what I understood is that this is an
exceptional case among debugfs calls, and the other goal is to avoid
maintaining one more lock just to support this API. I no longer have
any issues with this approach.

Thanks,
Lijo

On 9/13/2021 12:53 PM, Pan, Xinhui wrote:
> Of course the IB test can hang the GPU.
> But it waits on the fence with a specific timeout, and it does not depend on the GPU scheduler.
> So the IB test will always return.
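Xinhui's argument is that the IB test's wait is bounded by a timeout rather than by the scheduler, so the caller always regains control and can drop the reset lock, even if the GPU never completes the work. A minimal userspace sketch of such a bounded wait, assuming a hypothetical `fake_fence` built on POSIX condition variables (this is not the amdgpu or dma-fence API):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a GPU fence: a flag that the "hardware" sets. */
struct fake_fence {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool signaled;
};

void fake_fence_init(struct fake_fence *f)
{
	pthread_mutex_init(&f->lock, NULL);
	pthread_cond_init(&f->cond, NULL);
	f->signaled = false;
}

/* Called by the completion side; wakes every waiter. */
void fake_fence_signal(struct fake_fence *f)
{
	pthread_mutex_lock(&f->lock);
	f->signaled = true;
	pthread_cond_broadcast(&f->cond);
	pthread_mutex_unlock(&f->lock);
}

/*
 * Wait at most timeout_ms milliseconds. Returns 0 if the fence signaled,
 * ETIMEDOUT otherwise. Either way the caller gets control back, so any
 * lock it holds (e.g. a reset lock) can then be released.
 */
int fake_fence_wait_timeout(struct fake_fence *f, long timeout_ms)
{
	struct timespec deadline;
	int r = 0;

	assert(timeout_ms >= 0);
	/* pthread_cond_timedwait() uses CLOCK_REALTIME by default. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&f->lock);
	/* Loop to absorb spurious wakeups; stop on signal or timeout. */
	while (!f->signaled && r == 0)
		r = pthread_cond_timedwait(&f->cond, &f->lock, &deadline);
	if (f->signaled)
		r = 0;
	pthread_mutex_unlock(&f->lock);
	return r;
}
```

An unsignaled fence returns `ETIMEDOUT` after roughly the requested time instead of blocking forever, which is the property that lets a hung test still release its lock.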
> 
> ________________________________________
> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> Sent: 13 September 2021 15:15
> To: Christian König; Koenig, Christian; Pan, Xinhui; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander
> Subject: Re: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test
> 
> 
> 
> On 9/13/2021 12:21 PM, Christian König wrote:
>> Keep in mind that we don't try to avoid contention here. The goal is
>> rather to have as few locks as possible to avoid the extra overhead in
>> the hot path.
>>
>> Contention is completely irrelevant for the debug and device reset paths,
>> since those are rarely occurring events and performance doesn't matter for them.
>>
>> It is perfectly reasonable to take the write side of the reset lock as
>> necessary when we need to make sure that we don't have concurrent device
>> access.
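Christian's argument maps onto an ordinary reader/writer lock: the frequent path pays only for the shared (read) side, and rare events such as a reset or the debugfs IB test take the exclusive (write) side. A userspace sketch, assuming a POSIX `pthread_rwlock_t` as a stand-in for `adev->reset_sem`; the `fake_dev` type and all helpers are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical device; the rwlock plays the role of adev->reset_sem. */
struct fake_dev {
	pthread_rwlock_t reset_lock;
};

void fake_dev_init(struct fake_dev *d)
{
	int r = pthread_rwlock_init(&d->reset_lock, NULL);

	assert(r == 0);
	(void)r;
}

/* Hot path: any number of submitters can hold the shared side at once. */
int hot_path_try_submit(struct fake_dev *d)
{
	int r = pthread_rwlock_tryrdlock(&d->reset_lock);

	if (r)
		return r; /* EBUSY: an exclusive user holds the write side */
	/* ... normal submission work would happen here ... */
	pthread_rwlock_unlock(&d->reset_lock);
	return 0;
}

/* Rare events (reset, the debugfs IB test) take the write side. */
void exclusive_begin(struct fake_dev *d)
{
	pthread_rwlock_wrlock(&d->reset_lock);
}

void exclusive_end(struct fake_dev *d)
{
	pthread_rwlock_unlock(&d->reset_lock);
}

/* Helper: run hot_path_try_submit() on another thread, return its result. */
struct probe_arg {
	struct fake_dev *dev;
	int result;
};

static void *probe_fn(void *p)
{
	struct probe_arg *a = p;

	a->result = hot_path_try_submit(a->dev);
	return NULL;
}

int probe_from_other_thread(struct fake_dev *d)
{
	struct probe_arg a = { .dev = d, .result = -1 };
	pthread_t t;

	if (pthread_create(&t, NULL, probe_fn, &a) != 0)
		return -1;
	pthread_join(t, NULL);
	return a.result;
}
```

The kernel's `down_read_killable()`/`down_write_killable()` sleep rather than fail; the sketch uses `pthread_rwlock_tryrdlock()` only so the exclusion can be observed deterministically from a second thread.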
> 
> The original code has down_read, which gave the impression that there is
> some protection against access during reset. Basically, I would like to
> avoid setting a precedent for this sort of usage in any debugfs call.
> The reset semaphore is supposed to be a 'protect all' thing, which makes
> it an easy shortcut.
> 
> BTW, a question about a hypothetical case: what happens if the test
> itself causes a hang and needs to trigger a reset? Will the lock get a
> chance to be released (i.e., could a submit call hang indefinitely) so
> that the actual reset can be executed?
> 
> Thanks,
> Lijo
> 
>>
>> Regards,
>> Christian.
>>
>> On 13.09.21 at 08:43, Lazar, Lijo wrote:
>>> There are other interfaces to emulate the exact reset process, or at
>>> least this is not the one we use for doing any sort of reset through
>>> debugfs.
>>>
>>> In any case, the expectation is that the reset thread takes the write
>>> side of the lock, and that is already done elsewhere.
>>>
>>> If the reset semaphore is treated as protecting the device from concurrent
>>> access (so that any sort of resource usage is protected by default), then
>>> the same logic can be applied to any other call, and that is not a
>>> reasonable ask.
>>>
>>> Thanks,
>>> Lijo
>>>
>>> On 9/13/2021 12:07 PM, Christian König wrote:
>>>> That's complete nonsense.
>>>>
>>>> The debugfs interface emulates parts of the reset procedure for
>>>> testing and we absolutely need to take the same locks as the reset to
>>>> avoid corruption of the involved objects.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> On 13.09.21 at 08:25, Lazar, Lijo wrote:
>>>>> This is a debugfs interface, and adding another writer in debugfs that
>>>>> contends with an actual reset is a lazy fix. This shouldn't be executed
>>>>> in the first place and should not take precedence over any reset.
>>>>>
>>>>> Thanks,
>>>>> Lijo
>>>>>
>>>>>
>>>>> On 9/13/2021 11:52 AM, Christian König wrote:
>>>>>> NAK, this is not the lazy way to fix it at all.
>>>>>>
>>>>>> The reset semaphore protects the scheduler and ring objects from
>>>>>> concurrent modification, so taking the write side of it is
>>>>>> perfectly valid here.
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>> On 13.09.21 at 06:42, Pan, Xinhui wrote:
>>>>>>> Yep, that is a lazy way to fix it.
>>>>>>>
>>>>>>> I am thinking of adding an amdgpu_ring.direct_access_mutex, taken before
>>>>>>> we issue test_ib on each ring.
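For comparison, Xinhui's alternative would give each ring its own lock for direct submission, so an IB test on one ring would not exclude direct access to another. A minimal sketch of that idea with a POSIX mutex; `fake_ring` and the helpers are hypothetical, and `direct_access_mutex` is only the name proposed in this thread, not an existing amdgpu field:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_RINGS 4

/* Hypothetical ring carrying its own lock for direct (non-scheduler) access. */
struct fake_ring {
	int id;
	pthread_mutex_t direct_access_mutex;
};

void fake_rings_init(struct fake_ring *rings, int n)
{
	for (int i = 0; i < n; i++) {
		int r = pthread_mutex_init(&rings[i].direct_access_mutex, NULL);

		assert(r == 0);
		(void)r;
		rings[i].id = i;
	}
}

/*
 * Try to start a direct submission on one ring without blocking.
 * Returns 0 on success (the caller must call ring_direct_access_end()),
 * or EBUSY if a direct submission already owns this ring.
 */
int ring_direct_access_try_begin(struct fake_ring *ring)
{
	return pthread_mutex_trylock(&ring->direct_access_mutex);
}

void ring_direct_access_end(struct fake_ring *ring)
{
	pthread_mutex_unlock(&ring->direct_access_mutex);
}
```

The maintenance cost Lijo mentions at the top of this message is visible here: one more lock per ring to initialise and keep correct, used only by this debug path.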
>>>>>>> ________________________________________
>>>>>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>>>>>>> Sent: 13 September 2021 12:00
>>>>>>> To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
>>>>>>> Cc: Deucher, Alexander; Koenig, Christian
>>>>>>> Subject: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 9/13/2021 5:18 AM, xinhui pan wrote:
>>>>>>>> Direct IB submission should be exclusive, so use the write lock.
>>>>>>>>
>>>>>>>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
>>>>>>>>     1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>>>>>>> index 19323b4cce7b..be5d12ed3db1 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>>>>>>> @@ -1358,7 +1358,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
>>>>>>>>         }
>>>>>>>>
>>>>>>>>         /* Avoid accidently unparking the sched thread during GPU reset */
>>>>>>>> -     r = down_read_killable(&adev->reset_sem);
>>>>>>>> +     r = down_write_killable(&adev->reset_sem);
>>>>>>> There are many ioctls and debugfs calls which take this lock, and as
>>>>>>> you know, the purpose is to avoid them while there is a reset. The
>>>>>>> purpose is *not to* fix any concurrency issues those calls themselves
>>>>>>> have otherwise, and fixing those concurrency issues this way is just
>>>>>>> lazy and not acceptable.
>>>>>>>
>>>>>>> This will take away any fairness given to the writer in this rw lock,
>>>>>>> and that writer is supposed to be the reset thread.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Lijo
>>>>>>>
>>>>>>>>         if (r)
>>>>>>>>                 return r;
>>>>>>>>
>>>>>>>> @@ -1387,7 +1387,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
>>>>>>>>                 kthread_unpark(ring->sched.thread);
>>>>>>>>         }
>>>>>>>>
>>>>>>>> -     up_read(&adev->reset_sem);
>>>>>>>> +     up_write(&adev->reset_sem);
>>>>>>>>
>>>>>>>>         pm_runtime_mark_last_busy(dev->dev);
>>>>>>>>         pm_runtime_put_autosuspend(dev->dev);
>>>>>>>>
>>>>>>
>>>>
>>
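The patch itself only changes which side of `reset_sem` the debugfs handler takes. The overall shape visible in its hunks (take the lock, run the tests while the scheduler threads are parked, unpark them, release the lock) can be modelled in userspace as below. Everything here is hypothetical: a `bool` flag stands in for `kthread_park()`/`kthread_unpark()`, a `pthread_rwlock_t` stands in for `adev->reset_sem`, and the per-ring test is a stub. The parking step is inferred from the comment and the `kthread_unpark()` call in the hunks, not copied from the driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_RINGS 3

/* Hypothetical GPU model: one "parked" flag per scheduler thread. */
struct fake_gpu {
	pthread_rwlock_t reset_sem;
	bool sched_parked[NUM_RINGS];
	int ib_tests_run;
};

void fake_gpu_init(struct fake_gpu *g)
{
	int r = pthread_rwlock_init(&g->reset_sem, NULL);

	assert(r == 0);
	(void)r;
	for (int i = 0; i < NUM_RINGS; i++)
		g->sched_parked[i] = false;
	g->ib_tests_run = 0;
}

/* Stub: a real test would submit an IB and wait on its fence with a timeout. */
static int fake_ring_test_ib(struct fake_gpu *g, int ring)
{
	assert(g->sched_parked[ring]); /* no scheduler may touch the ring now */
	g->ib_tests_run++;
	return 0;
}

/* Write side: no other reset_sem holder can run while the rings are tested. */
int fake_debugfs_test_ib(struct fake_gpu *g)
{
	int ret = 0;

	pthread_rwlock_wrlock(&g->reset_sem);

	for (int i = 0; i < NUM_RINGS; i++)
		g->sched_parked[i] = true;  /* stands in for kthread_park() */

	for (int i = 0; i < NUM_RINGS; i++) {
		int r = fake_ring_test_ib(g, i);

		if (r && !ret)
			ret = r;            /* remember the first failure */
	}

	for (int i = 0; i < NUM_RINGS; i++)
		g->sched_parked[i] = false; /* stands in for kthread_unpark() */

	pthread_rwlock_unlock(&g->reset_sem);
	return ret;
}
```

Unparking before releasing the lock mirrors the order of the second hunk (`kthread_unpark()` followed by `up_write()`), so a reset waiting on the write side never observes half-parked schedulers.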


Thread overview: 13+ messages
2021-09-12 23:48 [PATCH v2] drm/amdgpu: Fix a race of IB test xinhui pan
2021-09-13  4:00 ` Lazar, Lijo
2021-09-13  4:42   ` Re: " Pan, Xinhui
2021-09-13  6:22     ` Christian König
2021-09-13  6:25       ` Lazar, Lijo
2021-09-13  6:37         ` Christian König
2021-09-13  6:43           ` Lazar, Lijo
2021-09-13  6:51             ` Christian König
2021-09-13  7:15               ` Lazar, Lijo
2021-09-13  7:22                 ` Christian König
2021-09-13  7:23                 ` Re: " Pan, Xinhui
2021-09-13 13:50                   ` Lazar, Lijo [this message]
2021-09-13 14:41 ` Andrey Grodzovsky