From: Philipp Stanner <phasta@kernel.org>
To: "Lyude Paul" <lyude@redhat.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Matthew Brost" <matthew.brost@intel.com>,
"Philipp Stanner" <phasta@kernel.org>,
"Christian König" <ckoenig.leichtzumerken@gmail.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
linux-kernel@vger.kernel.org, Danilo Krummrich <dakr@redhat.com>
Subject: [PATCH v2 2/6] drm/sched: Prevent teardown waitque from blocking too long
Date: Thu, 24 Apr 2025 11:55:32 +0200 [thread overview]
Message-ID: <20250424095535.26119-4-phasta@kernel.org> (raw)
In-Reply-To: <20250424095535.26119-2-phasta@kernel.org>
The waitqueue that ensures that drm_sched_fini() blocks until the
pending_list has become empty could theoretically cause that function to
block for a very long time. That, ultimately, could block userspace
procesess and prevent them from being killable through SIGKILL.
When a driver calls drm_sched_fini(), it is safe to assume that all
still pending jobs are not needed anymore anyways. Thus, they can be
cancelled and thereby it can be ensured that drm_sched_fini() will
return relatively quickly.
Implement a new helper to stop all work items / submission except for
the drm_sched_backend_ops.run_job().
Implement a driver callback, kill_fence_context(), that instructs the
driver to kill the fence context associated with this scheduler, thereby
causing all pending hardware fences to be signalled.
Call those new routines in drm_sched_fini() and ensure backwards
compatibility if the new callback is not implemented.
Suggested-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/sched_main.c | 47 +++++++++++++++++---------
include/drm/gpu_scheduler.h | 11 ++++++
2 files changed, 42 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index ea82e69a72a8..c2ad6c70bfb6 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1390,31 +1390,46 @@ drm_sched_no_jobs_pending(struct drm_gpu_scheduler *sched)
return empty;
}
+/**
+ * drm_sched_cancel_jobs_and_wait - trigger freeing of all pending jobs
+ * @sched: scheduler instance
+ *
+ * Must only be called if &struct drm_sched_backend_ops.kill_fence_context is
+ * implemented.
+ *
+ * Instructs the driver to kill the fence context associated with this scheduler,
+ * thereby signalling all pending fences. This, in turn, will trigger
+ * &struct drm_sched_backend_ops.free_job to be called for all pending jobs.
+ * The function then blocks until all pending jobs have been freed.
+ */
+static inline void
+drm_sched_cancel_jobs_and_wait(struct drm_gpu_scheduler *sched)
+{
+ sched->ops->kill_fence_context(sched);
+ wait_event(sched->pending_list_waitque, drm_sched_no_jobs_pending(sched));
+}
+
/**
* drm_sched_fini - Destroy a gpu scheduler
*
* @sched: scheduler instance
*
- * Tears down and cleans up the scheduler.
- *
- * Note that this function blocks until all the fences returned by
- * &struct drm_sched_backend_ops.run_job have been signalled.
+ * Tears down and cleans up the scheduler. Might leak memory if
+ * &struct drm_sched_backend_ops.kill_fence_context is not implemented.
*/
void drm_sched_fini(struct drm_gpu_scheduler *sched)
{
struct drm_sched_entity *s_entity;
int i;
- /*
- * Jobs that have neither been scheduled or which have timed out are
- * gone by now, but jobs that have been submitted through
- * backend_ops.run_job() and have not yet terminated are still pending.
- *
- * Wait for the pending_list to become empty to avoid leaking those jobs.
- */
- drm_sched_submission_and_timeout_stop(sched);
- wait_event(sched->pending_list_waitque, drm_sched_no_jobs_pending(sched));
- drm_sched_free_stop(sched);
+ if (sched->ops->kill_fence_context) {
+ drm_sched_submission_and_timeout_stop(sched);
+ drm_sched_cancel_jobs_and_wait(sched);
+ drm_sched_free_stop(sched);
+ } else {
+ /* We're in "legacy free-mode" and ignore potential mem leaks */
+ drm_sched_wqueue_stop(sched);
+ }
for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
struct drm_sched_rq *rq = sched->sched_rq[i];
@@ -1502,7 +1517,7 @@ bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched)
EXPORT_SYMBOL(drm_sched_wqueue_ready);
/**
- * drm_sched_wqueue_stop - stop scheduler submission
+ * drm_sched_wqueue_stop - stop scheduler submission and freeing
* @sched: scheduler instance
*
* Stops the scheduler from pulling new jobs from entities. It also stops
@@ -1518,7 +1533,7 @@ void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
EXPORT_SYMBOL(drm_sched_wqueue_stop);
/**
- * drm_sched_wqueue_start - start scheduler submission
+ * drm_sched_wqueue_start - start scheduler submission and freeing
* @sched: scheduler instance
*
* Restarts the scheduler after drm_sched_wqueue_stop() has stopped it.
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index d0b1f416b4d9..8630b4a26f10 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -509,6 +509,17 @@ struct drm_sched_backend_ops {
* and it's time to clean it up.
*/
void (*free_job)(struct drm_sched_job *sched_job);
+
+ /**
+ * @kill_fence_context: kill the fence context belonging to this scheduler
+ *
+ * Needed to cleanly tear the scheduler down in drm_sched_fini(). This
+ * callback will cause all hardware fences to be signalled by the driver,
+ * which, ultimately, ensures that all jobs get freed before teardown.
+ *
+ * This callback is optional, but it is highly recommended to implement it.
+ */
+ void (*kill_fence_context)(struct drm_gpu_scheduler *sched);
};
/**
--
2.48.1
next prev parent reply other threads:[~2025-04-24 9:56 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-24 9:55 [PATCH v2 0/6] drm/sched: Fix memory leaks in drm_sched_fini() Philipp Stanner
2025-04-24 9:55 ` [PATCH v2 1/6] drm/sched: Fix teardown leaks with waitqueue Philipp Stanner
2025-05-16 9:19 ` Tvrtko Ursulin
2025-04-24 9:55 ` Philipp Stanner [this message]
2025-05-16 9:33 ` [PATCH v2 2/6] drm/sched: Prevent teardown waitque from blocking too long Tvrtko Ursulin
2025-05-16 9:53 ` Danilo Krummrich
2025-05-16 10:19 ` Tvrtko Ursulin
2025-05-16 10:54 ` Danilo Krummrich
2025-05-16 11:35 ` Tvrtko Ursulin
2025-05-16 12:00 ` Danilo Krummrich
2025-05-16 15:48 ` Tvrtko Ursulin
2025-05-16 9:54 ` Philipp Stanner
2025-05-16 10:34 ` Tvrtko Ursulin
2025-04-24 9:55 ` [PATCH v2 3/6] drm/sched: Warn if pending list is not empty Philipp Stanner
2025-05-16 9:40 ` Tvrtko Ursulin
2025-04-24 9:55 ` [PATCH v2 4/6] drm/nouveau: Add new callback for scheduler teardown Philipp Stanner
2025-04-24 9:55 ` [PATCH v2 5/6] drm/nouveau: Remove waitque for sched teardown Philipp Stanner
2025-04-24 9:55 ` [PATCH v2 6/6] drm/sched: Port unit tests to new cleanup design Philipp Stanner
2025-05-08 11:03 ` Philipp Stanner
2025-05-08 12:51 ` Tvrtko Ursulin
2025-05-12 8:00 ` Philipp Stanner
2025-05-14 8:30 ` Tvrtko Ursulin
2025-05-14 9:19 ` Philipp Stanner
2025-05-16 9:00 ` Tvrtko Ursulin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250424095535.26119-4-phasta@kernel.org \
--to=phasta@kernel.org \
--cc=airlied@gmail.com \
--cc=ckoenig.leichtzumerken@gmail.com \
--cc=dakr@kernel.org \
--cc=dakr@redhat.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lyude@redhat.com \
--cc=maarten.lankhorst@linux.intel.com \
--cc=matthew.brost@intel.com \
--cc=mripard@kernel.org \
--cc=nouveau@lists.freedesktop.org \
--cc=simona@ffwll.ch \
--cc=tvrtko.ursulin@igalia.com \
--cc=tzimmermann@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox