Re: [RFC v7 10/12] drm/sched: Break submission patterns with some randomness


From: Pierre-Eric Pelloux-Prayer <pierre-eric@damsy.net>
To: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>,
	dri-devel@lists.freedesktop.org
Cc: kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org,
	"Christian König" <christian.koenig@amd.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Philipp Stanner" <phasta@kernel.org>,
	"Pierre-Eric Pelloux-Prayer" <pierre-eric.pelloux-prayer@amd.com>
Subject: Re: [RFC v7 10/12] drm/sched: Break submission patterns with some randomness
Date: Mon, 28 Jul 2025 11:28:11 +0200
Message-ID: <fe05e8fd-d56f-4b32-a65b-46c9ef6df9c7@damsy.net>
In-Reply-To: <20250724141921.75583-11-tvrtko.ursulin@igalia.com>



On 24/07/2025 at 16:19, Tvrtko Ursulin wrote:
> GPUs generally don't implement preemption and DRM scheduler definitely
> does not support it at the front end scheduling level. This means
> execution quanta can be quite long and is controlled by userspace,
> consequence of which is picking the "wrong" entity to run can have a
> larger negative effect than it would have with a virtual runtime based CPU
> scheduler.
> 
> Another important consideration is that rendering clients often have
> shallow submission queues, meaning they will be entering and exiting the
> scheduler's runnable queue often.
> 
> Relevant scenario here is what happens when an entity re-joins the
> runnable queue with other entities already present. One cornerstone of the
> virtual runtime algorithm is to let it re-join at the head and depend on
> the virtual runtime accounting to sort out the order after an execution
> quanta or two.
> 
> However, as explained above, this may not work fully reliably in the GPU
> world. Entity could always get to overtake the existing entities, or not,
> depending on the submission order and rbtree equal key insertion
> behaviour.
> 
> We can break this latching by adding some randomness for this specific
> corner case.
> 
> If an entity is re-joining the runnable queue, was head of the queue the
> last time it got picked, and there is an already queued different entity
> of an equal scheduling priority, we can break the tie by randomly choosing
> the execution order between the two.
> 
> For randomness we implement a simple driver global boolean which selects
> whether new entity will be first or not. Because the boolean is global and
> shared between all the run queues and entities, its actual effect can be
> loosely called random. Under the assumption it will not always be the same
> entity which is re-joining the queue under these circumstances.
> 
> Another way to look at this is that it is adding a little bit of limited
> random round-robin behaviour to the fair scheduling algorithm.
> 
> Net effect is a significant improvement to the scheduling unit tests which
> check the scheduling quality for the interactive client running in
> parallel with GPU hogs.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> ---
>   drivers/gpu/drm/scheduler/sched_rq.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
> index d16ee3ee3653..087a6bdbb824 100644
> --- a/drivers/gpu/drm/scheduler/sched_rq.c
> +++ b/drivers/gpu/drm/scheduler/sched_rq.c
> @@ -147,6 +147,16 @@ drm_sched_entity_restore_vruntime(struct drm_sched_entity *entity,
>   			 * Higher priority can go first.
>   			 */
>   			vruntime = -us_to_ktime(rq_prio - prio);
> +		} else {
> +			static const int shuffle[2] = { -100, 100 };
> +			static bool r = 0;
> +
> +			/*
> +			 * For equal priority apply some randomness to break
> +			 * latching caused by submission patterns.
> +			 */
> +			vruntime = shuffle[r];
> +			r ^= 1;

I don't understand why this is needed at all.

I suppose this is related to how drm_sched_entity_save_vruntime() saves a
relative vruntime (i.e. an entity rejoining with a 0 runtime would be
impossible otherwise), but I don't understand that either.

Pierre-Eric


>   		}
>   	}
>
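
For readers following the hunk above, here is a minimal userspace sketch of
the alternation it introduces for the equal-priority case. It only models the
toggle. The names ktime_ns_t and equal_prio_offset() are hypothetical and not
part of the patch, and treating the offsets as nanoseconds is an assumption
based on vruntime being a ktime_t in the quoted code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's ktime_t (assumed ns). */
typedef int64_t ktime_ns_t;

/*
 * Hypothetical helper mirroring the quoted else-branch: returns the
 * vruntime offset applied to an equal-priority entity re-joining the
 * run queue. The static flag is shared by every caller, as in the
 * patch, so consecutive calls alternate between the two offsets
 * regardless of which entity triggers them.
 */
static ktime_ns_t equal_prio_offset(void)
{
	static const ktime_ns_t shuffle[2] = { -100, 100 };
	static bool r;	/* starts as false, i.e. index 0 */
	ktime_ns_t off = shuffle[r];

	r = !r;		/* flip for the next caller */
	return off;
}
```

In this sketch successive calls yield -100, 100, -100, and so on. The sequence
is strictly alternating, so any apparent randomness would have to come from
which entity happens to make each call, which is the assumption stated in the
commit message.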
