AMD-GFX Archive on lore.kernel.org
From: "Christian König" <christian.koenig@amd.com>
To: "Dmitry Osipenko" <dmitry.osipenko@collabora.com>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	luben.tuikov@amd.com, dri-devel@lists.freedesktop.org,
	amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 12/13] drm/scheduler: rework entity flush, kill and fini
Date: Thu, 17 Nov 2022 14:11:00 +0100
Message-ID: <4f5766ab-d31f-d0c8-6b1e-0c7e0fbabfed@amd.com>
In-Reply-To: <126a8c1e-69ec-5068-1aad-30f5e7c3ef21@collabora.com>

Am 17.11.22 um 14:00 schrieb Dmitry Osipenko:
> On 11/17/22 15:59, Dmitry Osipenko wrote:
>> On 11/17/22 15:55, Christian König wrote:
>>> Am 17.11.22 um 13:47 schrieb Dmitry Osipenko:
>>>> On 11/17/22 12:53, Christian König wrote:
>>>>> Am 17.11.22 um 03:36 schrieb Dmitry Osipenko:
>>>>>> Hi,
>>>>>>
>>>>>> On 10/14/22 11:46, Christian König wrote:
>>>>>>> +/* Remove the entity from the scheduler and kill all pending jobs */
>>>>>>> +static void drm_sched_entity_kill(struct drm_sched_entity *entity)
>>>>>>> +{
>>>>>>> +    struct drm_sched_job *job;
>>>>>>> +    struct dma_fence *prev;
>>>>>>> +
>>>>>>> +    if (!entity->rq)
>>>>>>> +        return;
>>>>>>> +
>>>>>>> +    spin_lock(&entity->rq_lock);
>>>>>>> +    entity->stopped = true;
>>>>>>> +    drm_sched_rq_remove_entity(entity->rq, entity);
>>>>>>> +    spin_unlock(&entity->rq_lock);
>>>>>>> +
>>>>>>> +    /* Make sure this entity is not used by the scheduler at the moment */
>>>>>>> +    wait_for_completion(&entity->entity_idle);
>>>>>> I'm always hitting a lockup here with the Panfrost driver when
>>>>>> terminating Xorg. Reverting this patch helps. Any ideas how to fix it?
>>>>>>
>>>>> Well is the entity idle or are there some unsubmitted jobs left?
>>>> Do you mean unsubmitted to h/w? IIUC, there are unsubmitted jobs left.
>>>>
>>>> I see that there are 5-6 incomplete (in-flight) jobs when
>>>> panfrost_job_close() is invoked.
>>>>
>>>> There are 1-2 jobs that keep being scheduled and finishing once every
>>>> few seconds after the lockup happens.
>>> Well what drm_sched_entity_kill() is supposed to do is to prevent
>>> pushing queued up stuff to the hw when the process which queued it is
>>> killed. Is the process really killed or is that just some incorrect
>>> handling?
>> It's actually 5-6 incomplete jobs of Xorg that are hanging when Xorg
>> process is closed.
>>
>> The two re-scheduled jobs are from sddm, so it's only the Xorg context
>> that hangs.
>>
>>> In other words I see two possibilities here, either we have a bug in the
>>> scheduler or panfrost isn't using it correctly.
>>>
>>> Does panfrost call drm_sched_entity_flush() before it calls
>>> drm_sched_entity_fini()? (I don't have the driver source at hand at the
>>> moment).
>> Panfrost doesn't use drm_sched_entity_flush(), nor drm_sched_entity_flush().
> *nor drm_sched_entity_fini()

Well that would mean that this is *really* buggy! How do you then end up 
in drm_sched_entity_kill()? From drm_sched_entity_destroy()?

drm_sched_entity_flush() should be called from the flush callback of
panfrost's file_operations structure. See amdgpu_flush() and
amdgpu_ctx_mgr_entity_flush(). This makes sure that we wait for all
entities of the process/file descriptor to be flushed out.
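
As a rough illustration of that wiring, here is a minimal userspace
sketch in C. It is not kernel code: every sketch_* name is a
hypothetical stand-in invented for this example, and the "wait" is
modelled by counting down a timeout budget instead of actually
blocking. It only shows the shape of the idea, a flush callback that
walks all entities of a file descriptor and flushes each one in turn.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
struct sketch_entity {
	int queued_jobs;	/* jobs queued but not yet pushed to the hardware */
};

struct sketch_file {
	struct sketch_entity *entities;	/* entities owned by this file descriptor */
	size_t num_entities;
};

struct sketch_file_operations {
	int (*flush)(struct sketch_file *file);
};

/*
 * Stand-in for drm_sched_entity_flush(): "wait" for one entity to drain,
 * spending one unit of the timeout budget per queued job. Returns the
 * budget that is left over so the caller can pass it to the next entity.
 */
static long sketch_entity_flush(struct sketch_entity *entity, long timeout)
{
	while (entity->queued_jobs > 0 && timeout > 0) {
		entity->queued_jobs--;
		timeout--;
	}
	return timeout;
}

/* The driver's flush callback: flush every entity of the file descriptor. */
static int sketch_driver_flush(struct sketch_file *file)
{
	long timeout = 100;	/* arbitrary budget chosen for this sketch */
	size_t i;

	for (i = 0; i < file->num_entities; i++)
		timeout = sketch_entity_flush(&file->entities[i], timeout);

	return 0;
}

/* The callback is hooked up through the (model) file_operations table. */
static const struct sketch_file_operations sketch_fops = {
	.flush = sketch_driver_flush,
};
```

Sharing one timeout budget across all entities keeps the total time
spent in the callback bounded, no matter how many entities the file
descriptor owns.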

drm_sched_entity_fini() must be called before you free the memory of the
entity structure, otherwise we would run into a use-after-free.
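
The ordering constraint can be sketched in plain userspace C as well.
Again this is only a model under stated assumptions: the model_* types
and functions are hypothetical, and model_entity_fini() merely stands in
for the role drm_sched_entity_fini() plays, namely making sure the
scheduler no longer holds a reference to the entity before its memory
goes away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
struct model_entity {
	int id;
};

/* The "scheduler" keeps a raw pointer to an entity it may still touch. */
struct model_sched {
	struct model_entity *active;
};

/* Driver context that embeds the entity, so freeing it frees the entity. */
struct model_ctx {
	struct model_entity entity;
};

static struct model_ctx *model_ctx_create(struct model_sched *sched, int id)
{
	struct model_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->entity.id = id;
	sched->active = &ctx->entity;	/* the scheduler now references the entity */
	return ctx;
}

/* Stand-in for the fini step: detach the entity from the scheduler. */
static void model_entity_fini(struct model_sched *sched,
			      struct model_entity *entity)
{
	if (sched->active == entity)
		sched->active = NULL;
}

static void model_ctx_destroy(struct model_sched *sched, struct model_ctx *ctx)
{
	/* 1. Detach first, so the scheduler cannot reach the entity anymore. */
	model_entity_fini(sched, &ctx->entity);
	assert(sched->active != &ctx->entity);

	/* 2. Only now is it safe to release the memory that holds the entity. */
	free(ctx);
}
```

Swapping the two steps in model_ctx_destroy() would leave sched->active
pointing at freed memory, which is the use-after-free described above.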

Regards,
Christian.
