From: Vlastimil Babka <vbabka@suse.cz>
To: "Christian König" <christian.koenig@amd.com>,
"Matthew Brost" <matthew.brost@intel.com>
Cc: ltuikov89@gmail.com, dri-devel@lists.freedesktop.org,
Thorsten Leemhuis <regressions@leemhuis.info>,
Mario Limonciello <mario.limonciello@amd.com>,
daniel@ffwll.ch, Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>,
airlied@gmail.com, intel-xe@lists.freedesktop.org
Subject: Re: [PATCH] drm/sched: Drain all entities in DRM sched run job worker
Date: Mon, 29 Jan 2024 08:49:04 +0100 [thread overview]
Message-ID: <dd9ec121-b09f-4862-8a9c-14bb3605fc47@suse.cz> (raw)
In-Reply-To: <79a8fe04-66a3-406d-981a-06e40b386d99@amd.com>
On 1/29/24 08:44, Christian König wrote:
> Am 26.01.24 um 17:29 schrieb Matthew Brost:
>> On Fri, Jan 26, 2024 at 11:32:57AM +0100, Christian König wrote:
>>> Am 25.01.24 um 18:30 schrieb Matthew Brost:
>>>> On Thu, Jan 25, 2024 at 04:12:58PM +0100, Christian König wrote:
>>>>> Am 24.01.24 um 22:08 schrieb Matthew Brost:
>>>>>> All entities must be drained in the DRM scheduler run job worker to
>>>>>> avoid the following case. An entity found that is ready, no job found
>>>>>> ready on entity, and run job worker goes idle with other entities + jobs
>>>>>> ready. Draining all ready entities (i.e. loop over all ready entities)
>>>>>> in the run job worker ensures all job that are ready will be scheduled.
>>>>> That doesn't make sense. drm_sched_select_entity() only returns entities
>>>>> which are "ready", e.g. have a job to run.
>>>>>
>>>> That is what I thought too, hence my original design but it is not
>>>> exactly true. Let me explain.
>>>>
>>>> drm_sched_select_entity() returns an entity with a non-empty spsc queue
>>>> (job in queue) and no *current* waiting dependecies [1]. Dependecies for
>>>> an entity can be added when drm_sched_entity_pop_job() is called [2][3]
>>>> returning a NULL job. Thus we can get into a scenario where 2 entities
>>>> A and B both have jobs and no current dependecies. A's job is waiting
>>>> B's job, entity A gets selected first, a dependecy gets installed in
>>>> drm_sched_entity_pop_job(), run work goes idle, and now we deadlock.
>>> And here is the real problem. run work doesn't goes idle in that moment.
>>>
>>> drm_sched_run_job_work() should restarts itself until there is either no
>>> more space in the ring buffer or it can't find a ready entity any more.
>>>
>>> At least that was the original design when that was all still driven by a
>>> kthread.
>>>
>>> It can perfectly be that we messed this up when switching from kthread to a
>>> work item.
>>>
>> Right, that what this patch does - the run worker does not go idle until
>> no ready entities are found. That was incorrect in the original patch
>> and fixed here. Do you have any issues with this fix? It has been tested
>> 3x times and clearly fixes the issue.
>
> Ah! Yes in this case that patch here is a little bit ugly as well.
>
> The original idea was that run_job restarts so that we are able to pause
> the submission thread without searching for an entity to submit more.
>
> I strongly suggest to replace the while loop with a call to
> drm_sched_run_job_queue() so that when the entity can't provide a job we
> just restart the queuing work.
Note it's already included in rc2, so any changes need to be a followup fix.
If these are important, then please make sure they get to rc3 :)
next prev parent reply other threads:[~2024-01-29 7:49 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-24 21:08 [PATCH] drm/sched: Drain all entities in DRM sched run job worker Matthew Brost
2024-01-24 21:10 ` ✓ CI.Patch_applied: success for " Patchwork
2024-01-24 21:10 ` ✓ CI.checkpatch: " Patchwork
2024-01-24 21:11 ` ✓ CI.KUnit: " Patchwork
2024-01-24 21:18 ` ✓ CI.Build: " Patchwork
2024-01-24 21:19 ` ✓ CI.Hooks: " Patchwork
2024-01-24 21:20 ` ✓ CI.checksparse: " Patchwork
2024-01-24 21:43 ` ✓ CI.BAT: " Patchwork
2024-01-25 9:24 ` [PATCH] " Vlastimil Babka
2024-01-25 17:30 ` Matthew Brost
2024-01-26 2:45 ` Dave Airlie
2024-01-25 15:12 ` Christian König
2024-01-25 17:30 ` Matthew Brost
2024-01-26 10:32 ` Christian König
2024-01-26 16:29 ` Matthew Brost
2024-01-29 5:29 ` Luben Tuikov
2024-01-29 7:44 ` Christian König
2024-01-29 7:49 ` Vlastimil Babka [this message]
2024-01-29 17:10 ` Luben Tuikov
2024-01-29 18:31 ` Matthew Brost
2024-01-29 4:08 ` Luben Tuikov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=dd9ec121-b09f-4862-8a9c-14bb3605fc47@suse.cz \
--to=vbabka@suse.cz \
--cc=airlied@gmail.com \
--cc=christian.koenig@amd.com \
--cc=daniel@ffwll.ch \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-xe@lists.freedesktop.org \
--cc=ltuikov89@gmail.com \
--cc=mario.limonciello@amd.com \
--cc=matthew.brost@intel.com \
--cc=mikhail.v.gavrilov@gmail.com \
--cc=regressions@leemhuis.info \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox