From: Pierre-Eric Pelloux-Prayer <pierre-eric@damsy.net>
To: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>,
dri-devel@lists.freedesktop.org
Cc: kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org,
intel-xe@lists.freedesktop.org,
"Christian König" <christian.koenig@amd.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"Matthew Brost" <matthew.brost@intel.com>,
"Philipp Stanner" <phasta@kernel.org>,
"Pierre-Eric Pelloux-Prayer" <pierre-eric.pelloux-prayer@amd.com>
Subject: Re: [RFC v7 10/12] drm/sched: Break submission patterns with some randomness
Date: Mon, 28 Jul 2025 11:28:11 +0200
Message-ID: <fe05e8fd-d56f-4b32-a65b-46c9ef6df9c7@damsy.net>
In-Reply-To: <20250724141921.75583-11-tvrtko.ursulin@igalia.com>

On 24/07/2025 at 16:19, Tvrtko Ursulin wrote:
> GPUs generally don't implement preemption and DRM scheduler definitely
> does not support it at the front end scheduling level. This means
> execution quanta can be quite long and is controlled by userspace,
> consequence of which is picking the "wrong" entity to run can have a
> larger negative effect than it would have with a virtual runtime based CPU
> scheduler.
>
> Another important consideration is that rendering clients often have
> shallow submission queues, meaning they will be entering and exiting the
> scheduler's runnable queue often.
>
> Relevant scenario here is what happens when an entity re-joins the
> runnable queue with other entities already present. One cornerstone of the
> virtual runtime algorithm is to let it re-join at the head and depend on
> the virtual runtime accounting to sort out the order after an execution
> quanta or two.
>
> However, as explained above, this may not work fully reliably in the GPU
> world. Entity could always get to overtake the existing entities, or not,
> depending on the submission order and rbtree equal key insertion
> behaviour.
>
> We can break this latching by adding some randomness for this specific
> corner case.
>
> If an entity is re-joining the runnable queue, was head of the queue the
> last time it got picked, and there is an already queued different entity
> of an equal scheduling priority, we can break the tie by randomly choosing
> the execution order between the two.
>
> For randomness we implement a simple driver global boolean which selects
> whether new entity will be first or not. Because the boolean is global and
> shared between all the run queues and entities, its actual effect can be
> loosely called random. Under the assumption it will not always be the same
> entity which is re-joining the queue under these circumstances.
>
> Another way to look at this is that it is adding a little bit of limited
> random round-robin behaviour to the fair scheduling algorithm.
>
> Net effect is a significant improvement to the scheduling unit tests which
> check the scheduling quality for the interactive client running in
> parallel with GPU hogs.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> ---
> drivers/gpu/drm/scheduler/sched_rq.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
> index d16ee3ee3653..087a6bdbb824 100644
> --- a/drivers/gpu/drm/scheduler/sched_rq.c
> +++ b/drivers/gpu/drm/scheduler/sched_rq.c
> @@ -147,6 +147,16 @@ drm_sched_entity_restore_vruntime(struct drm_sched_entity *entity,
> * Higher priority can go first.
> */
> vruntime = -us_to_ktime(rq_prio - prio);
> + } else {
> + static const int shuffle[2] = { -100, 100 };
> + static bool r = 0;
> +
> + /*
> + * For equal priority apply some randomness to break
> + * latching caused by submission patterns.
> + */
> + vruntime = shuffle[r];
> + r ^= 1;

I don't understand why this is needed at all.

I suppose it is related to how drm_sched_entity_save_vruntime() saves a
relative vruntime (otherwise it would be impossible for an entity to rejoin
with a 0 runtime), but I don't understand that either.

Pierre-Eric

> }
> }
>
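
For readers following the thread, the tie-break in the quoted hunk can be
sketched in plain userspace C. This is only an illustration under stated
assumptions, not the kernel code: tie_break_offset() and count_ahead() are
hypothetical names, the toggle is passed in explicitly instead of being a
function-local static, and a negative offset is assumed to sort the
re-joining entity ahead of the one already queued (as the "Higher priority
can go first" branch suggests).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the equal-priority branch of the quoted hunk:
 * return -100 or +100 and flip the toggle so the next caller gets the
 * opposite value.
 */
static int tie_break_offset(bool *toggle)
{
	static const int shuffle[2] = { -100, 100 };
	int offset = shuffle[*toggle];

	*toggle ^= 1;
	return offset;
}

/*
 * Count how many of n consecutive equal-priority re-joins receive the
 * negative offset (assumed here to mean "goes first").
 */
static int count_ahead(int n)
{
	bool toggle = false;
	int ahead = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		if (tie_break_offset(&toggle) < 0)
			ahead++;

	return ahead;
}
```

With a single toggle the outcome strictly alternates, so it only looks
random when different entities hit this path in an unpredictable order,
which is the assumption the commit message itself states.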
Thread overview: 18+ messages
2025-07-24 14:19 [RFC v7 00/12] Fair DRM scheduler Tvrtko Ursulin
2025-07-24 14:19 ` [RFC v7 01/12] drm/sched: Add some scheduling quality unit tests Tvrtko Ursulin
2025-07-24 14:19 ` [RFC v7 02/12] drm/sched: Add some more " Tvrtko Ursulin
2025-07-24 14:19 ` [RFC v7 03/12] drm/sched: Implement RR via FIFO Tvrtko Ursulin
2025-07-24 14:19 ` [RFC v7 04/12] drm/sched: Consolidate entity run queue management Tvrtko Ursulin
2025-07-24 14:19 ` [RFC v7 05/12] drm/sched: Move run queue related code into a separate file Tvrtko Ursulin
2025-07-24 14:19 ` [RFC v7 06/12] drm/sched: Free all finished jobs at once Tvrtko Ursulin
2025-07-24 14:19 ` [RFC v7 07/12] drm/sched: Account entity GPU time Tvrtko Ursulin
2025-07-24 14:19 ` [RFC v7 08/12] drm/sched: Remove idle entity from tree Tvrtko Ursulin
2025-07-24 14:19 ` [RFC v7 09/12] drm/sched: Add fair scheduling policy Tvrtko Ursulin
2025-07-24 14:19 ` [RFC v7 10/12] drm/sched: Break submission patterns with some randomness Tvrtko Ursulin
2025-07-28 9:28 ` Pierre-Eric Pelloux-Prayer [this message]
2025-07-28 11:14 ` Tvrtko Ursulin
2025-07-30 7:56 ` Philipp Stanner
2025-08-11 12:48 ` Tvrtko Ursulin
2025-07-24 14:19 ` [RFC v7 11/12] drm/sched: Remove FIFO and RR and simplify to a single run queue Tvrtko Ursulin
2025-07-24 14:19 ` [RFC v7 12/12] drm/sched: Embed run queue singleton into the scheduler Tvrtko Ursulin
2025-07-28 10:39 ` [RFC v7 00/12] Fair DRM scheduler Tvrtko Ursulin