From: "Yadav, Arvind" <arvyadav@amd.com>
To: Andrey Grodzovsky <andrey.grodzovsky@amd.com>,
	Arvind Yadav <Arvind.Yadav@amd.com>,
	Christian.Koenig@amd.com, shashank.sharma@amd.com,
	amaranath.somalapuram@amd.com, Arunpravin.PaneerSelvam@amd.com,
	sumit.semwal@linaro.org, gustavo@padovan.org, airlied@linux.ie,
	daniel@ffwll.ch, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 5/6] drm/sched: Use parent fence instead of finished
Date: Sat, 10 Sep 2022 02:00:54 +0530
Message-ID: <b0b81d03-840d-bcf2-3593-5fc0079f1e6a@amd.com>
In-Reply-To: <2937dc45-0b62-7c71-b846-942fa91cbb4e@amd.com>


On 9/9/2022 11:02 PM, Andrey Grodzovsky wrote:
> What exactly is the scenario which this patch fixes in more detail 
> please  ?
>
The GPU reset issue started after [PATCH 6/6] was added.

Root cause: in drm_sched_get_cleanup_job(), we call
dma_fence_is_signaled() on the finished fence to check the job status.
If the job is signaled (DMA_FENCE_FLAG_SIGNALED_BIT is set), we cancel
the reset worker thread.

With [PATCH 6/6] applied, dma_fence_is_signaled() also checks whether
signaling is enabled, by testing DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT. But
signaling is not enabled for the finished fence. As a result,
dma_fence_is_signaled() always returns false for it, so
drm_sched_get_cleanup_job() never cancels the reset worker thread, and
the GPU gets reset.
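
To make the failure mode concrete, here is a minimal userspace sketch of
the two checks. It is not the kernel code: struct model_fence, the
MODEL_FLAG_* bits and both helper functions are hypothetical stand-ins
that only mirror the behaviour described above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for DMA_FENCE_FLAG_SIGNALED_BIT and
 * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT; not the kernel definitions. */
enum {
	MODEL_FLAG_SIGNALED      = 1u << 0,
	MODEL_FLAG_ENABLE_SIGNAL = 1u << 1,
};

/* Simplified fence: only the flag word matters for this sketch. */
struct model_fence {
	unsigned int flags;
};

/* Check as described before [PATCH 6/6]: only the signaled bit counts. */
static bool model_is_signaled_old(const struct model_fence *f)
{
	return (f->flags & MODEL_FLAG_SIGNALED) != 0;
}

/* Check as described with [PATCH 6/6]: a fence whose signaling was never
 * enabled is reported as not signaled, whatever the signaled bit says. */
static bool model_is_signaled_new(const struct model_fence *f)
{
	if (!(f->flags & MODEL_FLAG_ENABLE_SIGNAL))
		return false;
	return (f->flags & MODEL_FLAG_SIGNALED) != 0;
}
```

For a fence that has only MODEL_FLAG_SIGNALED set, which models the
finished fence here, the old check reports it as signaled while the new
check does not, so the cleanup path would never see the job as done.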

To fix this, Christian suggested using the parent (hardware) fence
instead of the finished fence, because signaling is enabled on the
parent fence by the call to dma_fence_add_callback(). As a result,
dma_fence_is_signaled() returns the correct fence status and the reset
worker thread can be cancelled in drm_sched_get_cleanup_job().
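
A rough userspace sketch of why the parent fence behaves differently,
under the same caveat: every sketch_* name below is hypothetical, and the
helpers only imitate the behaviour described in this mail (registering a
callback enables signaling), not the real dma_fence implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits, standing in for the kernel's fence flags. */
enum {
	SKETCH_FLAG_SIGNALED      = 1u << 0,
	SKETCH_FLAG_ENABLE_SIGNAL = 1u << 1,
};

struct sketch_fence;
typedef void (*sketch_cb_t)(struct sketch_fence *fence);

struct sketch_fence {
	unsigned int flags;
	sketch_cb_t cb;		/* at most one callback in this sketch */
};

/* Imitates dma_fence_add_callback(): adding a callback enables signaling. */
static void sketch_add_callback(struct sketch_fence *f, sketch_cb_t cb)
{
	f->flags |= SKETCH_FLAG_ENABLE_SIGNAL;
	f->cb = cb;
}

/* Marks the fence as signaled and runs the callback, if one is set. */
static void sketch_signal(struct sketch_fence *f)
{
	f->flags |= SKETCH_FLAG_SIGNALED;
	if (f->cb)
		f->cb(f);
}

/* Imitates the check with [PATCH 6/6] applied: signaling must be enabled
 * before the signaled bit is taken into account. */
static bool sketch_is_signaled(const struct sketch_fence *f)
{
	if (!(f->flags & SKETCH_FLAG_ENABLE_SIGNAL))
		return false;
	return (f->flags & SKETCH_FLAG_SIGNALED) != 0;
}
```

If both a "parent" and a "finished" sketch_fence are signaled, but only
the parent went through sketch_add_callback(), sketch_is_signaled() is
true for the parent and false for the finished fence. That is the
asymmetry the patch relies on.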

~arvind

> Andrey
>
> On 2022-09-09 13:08, Arvind Yadav wrote:
>> Using the parent fence instead of the finished fence
>> to get the job status. This change is to avoid GPU
>> scheduler timeout error which can cause GPU reset.
>>
>> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
>> ---
>>
>> changes in v1,v2 - Enable signaling for finished fence in sche_main()
>> is removed
>>
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index e0ab14e0fb6b..2ac28ad11432 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -829,7 +829,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>       job = list_first_entry_or_null(&sched->pending_list,
>>                          struct drm_sched_job, list);
>>
>> -    if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>> +    if (job && dma_fence_is_signaled(job->s_fence->parent)) {
>>           /* remove job from pending_list */
>>           list_del_init(&job->list);
>>
>> @@ -841,7 +841,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>
>>           if (next) {
>>               next->s_fence->scheduled.timestamp =
>> -                job->s_fence->finished.timestamp;
>> +                job->s_fence->parent->timestamp;
>>               /* start TO timer for next job */
>>               drm_sched_start_timeout(sched);
>>           }


Thread overview: 15+ messages
2022-09-09 17:08 [PATCH v3 0/6] dma-buf: Check status of enable-signaling bit on debug Arvind Yadav
2022-09-09 17:08 ` [PATCH v3 1/6] dma-buf: Remove the signaled bit status check Arvind Yadav
2022-09-12  6:44   ` Christian König
2022-09-09 17:08 ` [PATCH v3 2/6] dma-buf: set signaling bit for the stub fence Arvind Yadav
2022-09-12  6:45   ` Christian König
2022-09-09 17:08 ` [PATCH v3 3/6] dma-buf: Enable signaling on fence for selftests Arvind Yadav
2022-09-12  6:51   ` Christian König
2022-09-09 17:08 ` [PATCH v3 4/6] drm/amdgpu: Enable signaling on fence Arvind Yadav
2022-09-12  8:46   ` Christian König
2022-09-09 17:08 ` [PATCH v3 5/6] drm/sched: Use parent fence instead of finished Arvind Yadav
2022-09-09 17:32   ` Andrey Grodzovsky
2022-09-09 20:30     ` Yadav, Arvind [this message]
2022-09-09 20:55       ` Andrey Grodzovsky
2022-09-09 17:08 ` [PATCH v3 6/6] dma-buf: Check status of enable-signaling bit on debug Arvind Yadav
2022-09-12  8:48   ` Christian König
