From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
To: "Yadav, Arvind" <arvyadav@amd.com>,
Arvind Yadav <Arvind.Yadav@amd.com>,
Christian.Koenig@amd.com, shashank.sharma@amd.com,
amaranath.somalapuram@amd.com, Arunpravin.PaneerSelvam@amd.com,
sumit.semwal@linaro.org, gustavo@padovan.org, airlied@linux.ie,
daniel@ffwll.ch, linux-media@vger.kernel.org,
dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 5/6] drm/sched: Use parent fence instead of finished
Date: Fri, 9 Sep 2022 16:55:39 -0400
Message-ID: <f96d7b4b-2cbd-223a-3140-dbd5178fbe8d@amd.com>
In-Reply-To: <b0b81d03-840d-bcf2-3593-5fc0079f1e6a@amd.com>
Got it.
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Andrey
On 2022-09-09 16:30, Yadav, Arvind wrote:
>
> On 9/9/2022 11:02 PM, Andrey Grodzovsky wrote:
>> Could you please describe in more detail the scenario that this
>> patch fixes?
>>
> The GPU reset issue started after adding [PATCH 6/6].
>
> Root cause: in drm_sched_get_cleanup_job(), we call
> dma_fence_is_signaled() on the finished fence to check the job
> status. If the job is signaled (DMA_FENCE_FLAG_SIGNALED_BIT is set),
> we cancel the reset worker thread.
>
> After applying [PATCH 6/6], dma_fence_is_signaled() also checks
> whether signaling is enabled by testing the
> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT bit, but signaling is not enabled
> for the finished fence. As a result, dma_fence_is_signaled() always
> returns false, drm_sched_get_cleanup_job() does not cancel the reset
> worker thread, and the GPU is reset.
>
> To fix this, Christian suggested using the parent (hardware) fence
> instead of the finished fence, because signaling is enabled on the
> parent fence by the call to dma_fence_add_callback(). As a result,
> dma_fence_is_signaled() returns the correct fence status and the
> reset worker thread can be cancelled in drm_sched_get_cleanup_job().
>
> ~arvind
>
>> Andrey
>>
>> On 2022-09-09 13:08, Arvind Yadav wrote:
>>> Use the parent fence instead of the finished fence
>>> to get the job status. This avoids a GPU scheduler
>>> timeout error, which can cause a GPU reset.
>>>
>>> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
>>> ---
>>>
>>> Changes in v1, v2: enabling signaling for the finished fence in
>>> sche_main() is removed.
>>>
>>> ---
>>> drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index e0ab14e0fb6b..2ac28ad11432 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -829,7 +829,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>>  	job = list_first_entry_or_null(&sched->pending_list,
>>>  				       struct drm_sched_job, list);
>>> -	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>> +	if (job && dma_fence_is_signaled(job->s_fence->parent)) {
>>>  		/* remove job from pending_list */
>>>  		list_del_init(&job->list);
>>> @@ -841,7 +841,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>>  		if (next) {
>>>  			next->s_fence->scheduled.timestamp =
>>> -				job->s_fence->finished.timestamp;
>>> +				job->s_fence->parent->timestamp;
>>>  			/* start TO timer for next job */
>>>  			drm_sched_start_timeout(sched);
>>>  		}
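
For illustration, the root cause described in the quoted explanation
above can be modelled in a few lines of userspace C. This is only a
sketch under stated assumptions, not the kernel implementation: every
model_* name and MODEL_* flag is hypothetical, and the check_enable
parameter stands in for the extra enable-signaling check that the
explanation attributes to [PATCH 6/6]. It assumes, as the explanation
states, that a fence without signaling enabled is reported as not
signaled once that check is active.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's fence flag bits. */
#define MODEL_SIGNALED      (1u << 0)
#define MODEL_ENABLE_SIGNAL (1u << 1)

struct model_fence {
	unsigned int flags;
};

/* Models only one side effect of adding a callback: signaling is enabled. */
static void model_add_callback(struct model_fence *f)
{
	f->flags |= MODEL_ENABLE_SIGNAL;
}

/* Marks the fence as signaled. */
static void model_signal(struct model_fence *f)
{
	f->flags |= MODEL_SIGNALED;
}

/*
 * check_enable == false models the behaviour before the extra check;
 * check_enable == true models the behaviour with the extra check, where
 * a fence without signaling enabled is never reported as signaled.
 */
static bool model_is_signaled(const struct model_fence *f, bool check_enable)
{
	if (check_enable && !(f->flags & MODEL_ENABLE_SIGNAL))
		return false;
	return (f->flags & MODEL_SIGNALED) != 0;
}

/* Walks through the three cases from the explanation. */
static void model_demo(void)
{
	struct model_fence finished = { 0 }; /* no callback was ever added */
	struct model_fence parent = { 0 };

	model_add_callback(&parent); /* enables signaling on the parent */
	model_signal(&finished);
	model_signal(&parent);

	/* Without the extra check, the finished fence reports its status. */
	assert(model_is_signaled(&finished, false));
	/* With the extra check, the finished fence always looks unsignaled. */
	assert(!model_is_signaled(&finished, true));
	/* The parent fence has signaling enabled, so it still reports true. */
	assert(model_is_signaled(&parent, true));
}
```

Calling model_demo() from a main() runs the three checks. In this model
the middle case is the one that leaves the timeout pending; checking
the parent fence avoids it because a callback has already been added to
that fence.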