From: Danilo Krummrich <dakr@kernel.org>
To: Philipp Stanner <phasta@kernel.org>
Cc: "Lyude Paul" <lyude@redhat.com>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Matthew Brost" <matthew.brost@intel.com>,
"Christian König" <ckoenig.leichtzumerken@gmail.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
linux-kernel@vger.kernel.org,
"Philipp Stanner" <pstanner@redhat.com>
Subject: Re: [PATCH v3 1/5] drm/sched: Fix teardown leaks with waitqueue
Date: Thu, 22 May 2025 14:44:08 +0200
Message-ID: <aC8cGPx_m8g2ApcV@pollux>
In-Reply-To: <20250522082742.148191-3-phasta@kernel.org>

On Thu, May 22, 2025 at 10:27:39AM +0200, Philipp Stanner wrote:
> +/**
> + * drm_sched_submission_and_timeout_stop - stop everything except for free_job
> + * @sched: scheduler instance
> + *
> + * Helper for tearing down the scheduler in drm_sched_fini().
> + */
> +static void
> +drm_sched_submission_and_timeout_stop(struct drm_gpu_scheduler *sched)
> +{
> + WRITE_ONCE(sched->pause_submit, true);
> + cancel_work_sync(&sched->work_run_job);
> + cancel_delayed_work_sync(&sched->work_tdr);
> +}
> +
> +/**
> + * drm_sched_free_stop - stop free_job
> + * @sched: scheduler instance
> + *
> + * Helper for tearing down the scheduler in drm_sched_fini().
> + */
> +static void drm_sched_free_stop(struct drm_gpu_scheduler *sched)
> +{
> + WRITE_ONCE(sched->pause_free, true);
> + cancel_work_sync(&sched->work_free_job);
> +}
> +
> +/**
> + * drm_sched_no_jobs_pending - check whether jobs are pending
> + * @sched: scheduler instance
> + *
> + * Checks if jobs are pending for @sched.
> + *
> + * Return: true if jobs are pending, false otherwise.
> + */
> +static bool drm_sched_no_jobs_pending(struct drm_gpu_scheduler *sched)
> +{
> + bool empty;
> +
> + spin_lock(&sched->job_list_lock);
> + empty = list_empty(&sched->pending_list);
> + spin_unlock(&sched->job_list_lock);
> +
> + return empty;
> +}

I understand that the way you use this function is correct, since you only call
it *after* drm_sched_submission_and_timeout_stop(), which means that no new
items can end up on the pending_list.

But if we look at this function without context, it's broken:

The documentation says "Return: true if jobs are pending, false otherwise.", but
you can't guarantee that, since a new job could be added to the pending_list
right after spin_unlock(), which makes the returned value stale.

Hence, providing this function as a standalone helper is a footgun.

Instead, you should put this teardown sequence in a single function, where you
can control the external conditions, i.e. that
drm_sched_submission_and_timeout_stop() has been called.

Please also add a comment explaining why we can release the lock and still work
with the value returned by list_empty() in this case, i.e. because we guarantee
that the number of list items converges to zero.

The other two helpers above, drm_sched_submission_and_timeout_stop() and
drm_sched_free_stop(), should be fine to have.

> +/**
> + * drm_sched_cancel_jobs_and_wait - trigger freeing of all pending jobs
> + * @sched: scheduler instance
> + *
> + * Must only be called if &struct drm_sched_backend_ops.cancel_pending_fences is
> + * implemented.
> + *
> + * Instructs the driver to kill the fence context associated with this scheduler,
> + * thereby signaling all pending fences. This, in turn, will trigger
> + * &struct drm_sched_backend_ops.free_job to be called for all pending jobs.
> + * The function then blocks until all pending jobs have been freed.
> + */
> +static void drm_sched_cancel_jobs_and_wait(struct drm_gpu_scheduler *sched)
> +{
> + sched->ops->cancel_pending_fences(sched);
> + wait_event(sched->pending_list_waitque, drm_sched_no_jobs_pending(sched));
> +}
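
Same here: you can't have this as an isolated helper, since waiting on the
list check is only correct once drm_sched_submission_and_timeout_stop() has
been called.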
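
As an illustration of the single-function teardown requested above, here is a
minimal userspace model. It is a sketch under stated assumptions, not kernel
code and not the actual drm_sched implementation: all model_* names are
hypothetical, a pthread mutex and condition variable stand in for
job_list_lock and the waitqueue, and a worker thread stands in for the
free_job work that cancel_pending_fences() would trigger.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct drm_gpu_scheduler. */
struct model_sched {
	pthread_mutex_t lock;	/* plays the role of job_list_lock */
	pthread_cond_t drained;	/* plays the role of the waitqueue */
	int pending;		/* number of entries on the "pending_list" */
	bool pause_submit;	/* once set, it is never cleared again */
};

/* Submission path: refuses new jobs once pause_submit is set. */
static bool model_submit(struct model_sched *s)
{
	bool accepted;

	pthread_mutex_lock(&s->lock);
	accepted = !s->pause_submit;
	if (accepted)
		s->pending++;
	pthread_mutex_unlock(&s->lock);

	return accepted;
}

/* Stand-in for free_job: frees one job per iteration, wakes the waiter at 0. */
static void *model_free_worker(void *arg)
{
	struct model_sched *s = arg;

	for (;;) {
		pthread_mutex_lock(&s->lock);
		if (s->pending == 0) {
			pthread_mutex_unlock(&s->lock);
			return NULL;
		}
		if (--s->pending == 0)
			pthread_cond_broadcast(&s->drained);
		pthread_mutex_unlock(&s->lock);
	}
}

/* The whole teardown sequence in one place, so the ordering is under control. */
static int model_sched_fini(struct model_sched *s)
{
	pthread_t worker;
	int left;

	/* Step 1: stop submission. From here on, pending can only shrink. */
	pthread_mutex_lock(&s->lock);
	s->pause_submit = true;
	pthread_mutex_unlock(&s->lock);

	/* Step 2: stand-in for cancel_pending_fences(): start freeing jobs. */
	if (pthread_create(&worker, NULL, model_free_worker, s) != 0)
		return -1;

	/* Step 3: wait until the list has drained. */
	pthread_mutex_lock(&s->lock);
	while (s->pending != 0)
		pthread_cond_wait(&s->drained, &s->lock);
	assert(s->pending == 0);
	left = s->pending;
	pthread_mutex_unlock(&s->lock);

	/*
	 * Using 'left' after dropping the lock is only valid because step 1
	 * guarantees that the count converges to zero and cannot grow again.
	 */
	pthread_join(worker, NULL);
	return left;
}

/* Returns 0 if all jobs drained and a late submission was rejected. */
static int model_run_demo(void)
{
	struct model_sched s = { .pending = 0, .pause_submit = false };
	int accepted = 0, left;
	bool late;

	pthread_mutex_init(&s.lock, NULL);
	pthread_cond_init(&s.drained, NULL);

	for (int i = 0; i < 3; i++)
		accepted += model_submit(&s);

	left = model_sched_fini(&s);
	late = model_submit(&s);	/* must be refused after teardown */

	pthread_cond_destroy(&s.drained);
	pthread_mutex_destroy(&s.lock);

	return (accepted == 3 && left == 0 && !late) ? 0 : 1;
}
```

Calling model_run_demo() from a main() exercises the sequence. The point of
the model is the ordering: the emptiness check is only meaningful because it
lives in the same function that first stopped submission, which is exactly the
precondition a standalone drm_sched_no_jobs_pending() cannot enforce.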