From: "Lazar, Lijo" <lijo.lazar@amd.com>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>,
"Christian König" <christian.koenig@amd.com>,
"Pan, Xinhui" <Xinhui.Pan@amd.com>,
"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Cc: "Deucher, Alexander" <Alexander.Deucher@amd.com>
Subject: Re: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test
Date: Mon, 13 Sep 2021 12:45:35 +0530
Message-ID: <687ec30e-ef63-0b55-8aa1-dbcf4b36f244@amd.com>
In-Reply-To: <beb849ee-0cf6-53a0-a563-56400e6cfacc@gmail.com>
On 9/13/2021 12:21 PM, Christian König wrote:
> Keep in mind that we don't try to avoid contention here. The goal is
> rather to have as few locks as possible to avoid the extra overhead in
> the hot path.
>
> Contention is completely irrelevant for debugging and device reset, since
> those are rarely occurring events and performance doesn't matter for them.
>
> It is perfectly reasonable to take the write side of the reset lock as
> necessary when we need to make sure that we don't have concurrent device
> access.
The original code takes down_read, which gave the impression that its
purpose is to avoid access during a reset. Basically, I would like to
avoid setting a precedent for this sort of usage in any debugfs call.
Used this way, the reset semaphore becomes a 'protect all' lock and
provides a shortcut.
BTW, a question about a hypothetical case: what happens if the test
itself causes a hang and needs to trigger a reset? Will the lock get a
chance to be released (or will a submit call hang indefinitely) so that
the actual reset can be executed?
Thanks,
Lijo
>
> Regards,
> Christian.
>
> Am 13.09.21 um 08:43 schrieb Lazar, Lijo:
>> There are other interfaces to emulate the exact reset process, or at
>> least this is not the one we use for doing any sort of reset through
>> debugfs.
>>
>> In any case, the expectation is that the reset thread takes the write
>> side of the lock, and that is already done elsewhere.
>>
>> If the reset semaphore is supposed to protect the device from concurrent
>> access (so that any sort of resource usage is protected by default),
>> then the same logic could be applied to any other call, and that is not
>> a reasonable ask.
>>
>> Thanks,
>> Lijo
>>
>> On 9/13/2021 12:07 PM, Christian König wrote:
>>> That's complete nonsense.
>>>
>>> The debugfs interface emulates parts of the reset procedure for
>>> testing and we absolutely need to take the same locks as the reset to
>>> avoid corruption of the involved objects.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 13.09.21 um 08:25 schrieb Lazar, Lijo:
>>>> This is a debugfs interface, and adding another writer in debugfs
>>>> that contends with an actual reset is a lazy fix. This shouldn't be
>>>> executed in the first place and should not take precedence over any
>>>> reset.
>>>>
>>>> Thanks,
>>>> Lijo
>>>>
>>>>
>>>> On 9/13/2021 11:52 AM, Christian König wrote:
>>>>> NAK, this is not the lazy way to fix it at all.
>>>>>
>>>>> The reset semaphore protects the scheduler and ring objects from
>>>>> concurrent modification, so taking the write side of it is
>>>>> perfectly valid here.
>>>>>
>>>>> Christian.
>>>>>
>>>>> Am 13.09.21 um 06:42 schrieb Pan, Xinhui:
>>>>>> [AMD Official Use Only]
>>>>>>
>>>>>> yep, that is a lazy way to fix it.
>>>>>>
>>>>>> I am thinking of adding one amdgpu_ring.direct_access_mutex before
>>>>>> we issue test_ib on each ring.
>>>>>> ________________________________________
>>>>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>>>>>> Sent: 2021-09-13 12:00
>>>>>> To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
>>>>>> Cc: Deucher, Alexander; Koenig, Christian
>>>>>> Subject: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 9/13/2021 5:18 AM, xinhui pan wrote:
>>>>>>> Direct IB submission should be exclusive, so use the write lock.
>>>>>>>
>>>>>>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>>>>>> index 19323b4cce7b..be5d12ed3db1 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>>>>>> @@ -1358,7 +1358,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
>>>>>>> }
>>>>>>>
>>>>>>> /* Avoid accidently unparking the sched thread during GPU reset */
>>>>>>> - r = down_read_killable(&adev->reset_sem);
>>>>>>> + r = down_write_killable(&adev->reset_sem);
>>>>>> There are many ioctls and debugfs calls which take this lock, and as
>>>>>> you know, the purpose is to avoid them while there is a reset. The
>>>>>> purpose is *not to* fix any concurrency issues that those calls
>>>>>> themselves otherwise have, and fixing those concurrency issues this
>>>>>> way is just lazy and not acceptable.
>>>>>>
>>>>>> This would take away any fairness the rw lock gives to its writer,
>>>>>> which is supposed to be the reset thread.
>>>>>>
>>>>>> Thanks,
>>>>>> Lijo
>>>>>>
>>>>>>> if (r)
>>>>>>> return r;
>>>>>>>
>>>>>>> @@ -1387,7 +1387,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
>>>>>>> kthread_unpark(ring->sched.thread);
>>>>>>> }
>>>>>>>
>>>>>>> - up_read(&adev->reset_sem);
>>>>>>> + up_write(&adev->reset_sem);
>>>>>>>
>>>>>>> pm_runtime_mark_last_busy(dev->dev);
>>>>>>> pm_runtime_put_autosuspend(dev->dev);
>>>>>>>
>>>>>
>>>
>
Thread overview: 13+ messages
2021-09-12 23:48 [PATCH v2] drm/amdgpu: Fix a race of IB test xinhui pan
2021-09-13 4:00 ` Lazar, Lijo
2021-09-13 4:42 ` Re: " Pan, Xinhui
2021-09-13 6:22 ` Christian König
2021-09-13 6:25 ` Lazar, Lijo
2021-09-13 6:37 ` Christian König
2021-09-13 6:43 ` Lazar, Lijo
2021-09-13 6:51 ` Christian König
2021-09-13 7:15 ` Lazar, Lijo [this message]
2021-09-13 7:22 ` Christian König
2021-09-13 7:23 ` Re: " Pan, Xinhui
2021-09-13 13:50 ` Lazar, Lijo
2021-09-13 14:41 ` Andrey Grodzovsky