From: "Christian König" <christian.koenig@amd.com>
To: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>,
dri-devel@lists.freedesktop.org
Cc: intel-xe@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
kernel-dev@igalia.com, Danilo Krummrich <dakr@kernel.org>,
Matthew Brost <matthew.brost@intel.com>,
Philipp Stanner <phasta@kernel.org>
Subject: Re: [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation
Date: Tue, 8 Jul 2025 14:37:33 +0200 [thread overview]
Message-ID: <cb140d4e-01cd-4cd7-bd7c-5c10b44cf98f@amd.com> (raw)
In-Reply-To: <20250708095147.73366-15-tvrtko.ursulin@igalia.com>
On 08.07.25 11:51, Tvrtko Ursulin wrote:
> There is no reason to queue just a single job if scheduler can take more
> and re-queue the worker to queue more.
That's not correct. This was intentionally avoided.
If more than just the scheduler is using the single-threaded workqueue, other workers, especially the timeout worker, can jump in and execute first.
We explicitly removed submitting more than one job in each worker run.
Regards,
Christian.
> We can simply feed the hardware
> with as much as it can take in one go and hopefully win some latency.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> ---
> drivers/gpu/drm/scheduler/sched_internal.h | 2 -
> drivers/gpu/drm/scheduler/sched_main.c | 132 ++++++++++-----------
> drivers/gpu/drm/scheduler/sched_rq.c | 12 +-
> 3 files changed, 64 insertions(+), 82 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
> index 15d78abc48df..1a5c2f255223 100644
> --- a/drivers/gpu/drm/scheduler/sched_internal.h
> +++ b/drivers/gpu/drm/scheduler/sched_internal.h
> @@ -22,8 +22,6 @@ struct drm_sched_entity_stats {
> u64 vruntime;
> };
>
> -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
> - struct drm_sched_entity *entity);
> void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>
> void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 35025edea669..1fb3f1da4821 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -95,35 +95,6 @@ static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
> return credits;
> }
>
> -/**
> - * drm_sched_can_queue -- Can we queue more to the hardware?
> - * @sched: scheduler instance
> - * @entity: the scheduler entity
> - *
> - * Return true if we can push at least one more job from @entity, false
> - * otherwise.
> - */
> -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
> - struct drm_sched_entity *entity)
> -{
> - struct drm_sched_job *s_job;
> -
> - s_job = drm_sched_entity_queue_peek(entity);
> - if (!s_job)
> - return false;
> -
> - /* If a job exceeds the credit limit, truncate it to the credit limit
> - * itself to guarantee forward progress.
> - */
> - if (s_job->credits > sched->credit_limit) {
> - dev_WARN(sched->dev,
> - "Jobs may not exceed the credit limit, truncate.\n");
> - s_job->credits = sched->credit_limit;
> - }
> -
> - return drm_sched_available_credits(sched) >= s_job->credits;
> -}
> -
> /**
> * drm_sched_run_job_queue - enqueue run-job work
> * @sched: scheduler instance
> @@ -940,54 +911,77 @@ static void drm_sched_run_job_work(struct work_struct *w)
> {
> struct drm_gpu_scheduler *sched =
> container_of(w, struct drm_gpu_scheduler, work_run_job);
> + u32 job_credits, submitted_credits = 0;
> struct drm_sched_entity *entity;
> - struct dma_fence *fence;
> struct drm_sched_fence *s_fence;
> struct drm_sched_job *sched_job;
> - int r;
> + struct dma_fence *fence;
>
> - /* Find entity with a ready job */
> - entity = drm_sched_rq_select_entity(sched, sched->rq);
> - if (IS_ERR_OR_NULL(entity))
> - return; /* No more work */
> + while (!READ_ONCE(sched->pause_submit)) {
> + /* Find entity with a ready job */
> + entity = drm_sched_rq_select_entity(sched, sched->rq);
> + if (!entity)
> + break; /* No more work */
> +
> + sched_job = drm_sched_entity_queue_peek(entity);
> + if (!sched_job) {
> + complete_all(&entity->entity_idle);
> + continue;
> + }
> +
> + job_credits = sched_job->credits;
> + /*
> + * If a job exceeds the credit limit truncate it to guarantee
> + * forward progress.
> + */
> + if (dev_WARN_ONCE(sched->dev, job_credits > sched->credit_limit,
> + "Jobs may not exceed the credit limit, truncating.\n"))
> + job_credits = sched_job->credits = sched->credit_limit;
> +
> + if (job_credits > drm_sched_available_credits(sched)) {
> + complete_all(&entity->entity_idle);
> + break;
> + }
> +
> + sched_job = drm_sched_entity_pop_job(entity);
> + if (!sched_job) {
> + /* Top entity is not yet runnable after all */
> + complete_all(&entity->entity_idle);
> + continue;
> + }
> +
> + s_fence = sched_job->s_fence;
> + drm_sched_job_begin(sched_job);
> + trace_drm_sched_job_run(sched_job, entity);
> + submitted_credits += job_credits;
> + atomic_add(job_credits, &sched->credit_count);
> +
> + fence = sched->ops->run_job(sched_job);
> + drm_sched_fence_scheduled(s_fence, fence);
> +
> + if (!IS_ERR_OR_NULL(fence)) {
> + int r;
> +
> + /* Drop for original kref_init of the fence */
> + dma_fence_put(fence);
> +
> + r = dma_fence_add_callback(fence, &sched_job->cb,
> + drm_sched_job_done_cb);
> + if (r == -ENOENT)
> + drm_sched_job_done(sched_job, fence->error);
> + else if (r)
> + DRM_DEV_ERROR(sched->dev,
> + "fence add callback failed (%d)\n", r);
> + } else {
> + drm_sched_job_done(sched_job, IS_ERR(fence) ?
> + PTR_ERR(fence) : 0);
> + }
>
> - sched_job = drm_sched_entity_pop_job(entity);
> - if (!sched_job) {
> complete_all(&entity->entity_idle);
> - drm_sched_run_job_queue(sched);
> - return;
> }
>
> - s_fence = sched_job->s_fence;
> -
> - atomic_add(sched_job->credits, &sched->credit_count);
> - drm_sched_job_begin(sched_job);
> -
> - trace_drm_sched_job_run(sched_job, entity);
> - /*
> - * The run_job() callback must by definition return a fence whose
> - * refcount has been incremented for the scheduler already.
> - */
> - fence = sched->ops->run_job(sched_job);
> - complete_all(&entity->entity_idle);
> - drm_sched_fence_scheduled(s_fence, fence);
> -
> - if (!IS_ERR_OR_NULL(fence)) {
> - r = dma_fence_add_callback(fence, &sched_job->cb,
> - drm_sched_job_done_cb);
> - if (r == -ENOENT)
> - drm_sched_job_done(sched_job, fence->error);
> - else if (r)
> - DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
> -
> - dma_fence_put(fence);
> - } else {
> - drm_sched_job_done(sched_job, IS_ERR(fence) ?
> - PTR_ERR(fence) : 0);
> - }
> -
> - wake_up(&sched->job_scheduled);
> - drm_sched_run_job_queue(sched);
> + if (submitted_credits)
> + wake_up(&sched->job_scheduled);
> }
>
> static struct workqueue_struct *drm_sched_alloc_wq(const char *name)
> diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
> index e22f9ff88822..f0afdc0bd417 100644
> --- a/drivers/gpu/drm/scheduler/sched_rq.c
> +++ b/drivers/gpu/drm/scheduler/sched_rq.c
> @@ -197,9 +197,7 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
> *
> * Find oldest waiting ready entity.
> *
> - * Return an entity if one is found; return an error-pointer (!NULL) if an
> - * entity was ready, but the scheduler had insufficient credits to accommodate
> - * its job; return NULL, if no ready entity was found.
> + * Return an entity if one is found or NULL if no ready entity was found.
> */
> struct drm_sched_entity *
> drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
> @@ -213,14 +211,6 @@ drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
>
> entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
> if (drm_sched_entity_is_ready(entity)) {
> - /* If we can't queue yet, preserve the current entity in
> - * terms of fairness.
> - */
> - if (!drm_sched_can_queue(sched, entity)) {
> - spin_unlock(&rq->lock);
> - return ERR_PTR(-ENOSPC);
> - }
> -
> reinit_completion(&entity->entity_idle);
> break;
> }