From: Jani Nikula <jani.nikula@linux.intel.com>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>,
dakr@kernel.org, pstanner@redhat.com,
dri-devel@lists.freedesktop.org, ltuikov89@gmail.com
Subject: Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
Date: Wed, 18 Sep 2024 17:41:41 +0300 [thread overview]
Message-ID: <87h6adausa.fsf@intel.com> (raw)
In-Reply-To: <20240918133956.26557-1-christian.koenig@amd.com>
On Wed, 18 Sep 2024, "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
> Tearing down the scheduler with jobs still on the pending list can
> lead to use after free issues. Add a warning if drivers try to
> destroy a scheduler which still has work pushed to the HW.
>
> When there are still entities with jobs the situation is even worse
> since the dma_fences for those jobs can never signal we can just
> choose between potentially locking up core memory management and
> random memory corruption. When drivers really mess it up that well
> let them run into a BUG_ON().
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 19 ++++++++++++++++++-
> 1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index f093616fe53c..8a46fab5cdc8 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1333,17 +1333,34 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
>
> drm_sched_wqueue_stop(sched);
>
> + /*
> + * Tearing down the scheduler wile there are still unprocessed jobs can
> + * lead to use after free issues in the scheduler fence.
> + */
> + WARN_ON(!list_empty(&sched->pending_list));
drm_WARN_ON(sched->dev, ...) would identify the device, which I presume
would be helpful in multi-GPU systems.
BR,
Jani.
> +
> for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
> struct drm_sched_rq *rq = sched->sched_rq[i];
>
> spin_lock(&rq->lock);
> - list_for_each_entry(s_entity, &rq->entities, list)
> + list_for_each_entry(s_entity, &rq->entities, list) {
> + /*
> + * The justification for this BUG_ON() is that tearing
> + * down the scheduler while jobs are pending leaves
> + * dma_fences unsignaled. Since we have dependencies
> + * from the core memory management to eventually signal
> + * dma_fences this can trivially lead to a system wide
> + * stop because of a locked up memory management.
> + */
> + BUG_ON(spsc_queue_count(&s_entity->job_queue));
> +
> /*
> * Prevents reinsertion and marks job_queue as idle,
> * it will removed from rq in drm_sched_entity_fini
> * eventually
> */
> s_entity->stopped = true;
> + }
> spin_unlock(&rq->lock);
> kfree(sched->sched_rq[i]);
> }
--
Jani Nikula, Intel
next prev parent reply other threads:[~2024-09-18 14:41 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-18 13:39 [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini Christian König
2024-09-18 13:39 ` [PATCH 2/2] drm/sched: clarify the documentation on drm_sched_entity_error Christian König
2024-11-25 10:00 ` Philipp Stanner
2024-09-18 14:41 ` Jani Nikula [this message]
2024-09-18 14:56 ` [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini Christian König
2024-09-20 8:57 ` Philipp Stanner
2024-09-20 10:33 ` Christian König
2024-09-20 13:26 ` Philipp Stanner
2024-09-23 15:24 ` Christian König
2024-09-24 11:18 ` Simona Vetter
2024-09-25 14:53 ` Philipp Stanner
2024-09-27 9:04 ` Christian König
2024-10-14 14:56 ` Danilo Krummrich
2024-10-18 12:07 ` Philipp Stanner
2024-10-29 7:22 ` Philipp Stanner
2024-11-15 11:26 ` Christian König
2024-11-15 13:55 ` Philipp Stanner
2024-11-15 14:39 ` Christian König
2024-11-15 16:07 ` Philipp Stanner
2024-11-18 11:48 ` Christian König
2024-11-21 9:06 ` Philipp Stanner
2024-11-21 18:10 ` Matthew Brost
2024-11-21 18:05 ` Matthew Brost
2024-11-22 16:24 ` Philipp Stanner
2024-11-22 16:39 ` Matthew Brost
2024-10-14 14:49 ` Danilo Krummrich
2024-11-08 4:00 ` Dave Airlie
2024-11-12 15:26 ` Christian König
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87h6adausa.fsf@intel.com \
--to=jani.nikula@linux.intel.com \
--cc=ckoenig.leichtzumerken@gmail.com \
--cc=dakr@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=ltuikov89@gmail.com \
--cc=pstanner@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.