From: "Christian König" <christian.koenig@amd.com>
To: Asahi Lina <lina@asahilina.net>,
Luben Tuikov <luben.tuikov@amd.com>,
David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch>,
Sumit Semwal <sumit.semwal@linaro.org>
Cc: Faith Ekstrand <faith.ekstrand@collabora.com>,
Alyssa Rosenzweig <alyssa@rosenzweig.io>,
dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
linux-media@vger.kernel.org, asahi@lists.linux.dev
Subject: Re: [PATCH 1/3] drm/scheduler: Add more documentation
Date: Fri, 14 Jul 2023 10:40:03 +0200 [thread overview]
Message-ID: <332e031c-c04e-998c-e401-685c817ea2a1@amd.com> (raw)
In-Reply-To: <20230714-drm-sched-fixes-v1-1-c567249709f7@asahilina.net>
On 14.07.23 10:21, Asahi Lina wrote:
> Document the implied lifetime rules of the scheduler (or at least the
> intended ones), as well as the expectations of how resource acquisition
> should be handled.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 58 ++++++++++++++++++++++++++++++++--
> 1 file changed, 55 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 7b2bfc10c1a5..1f3bc3606239 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -43,9 +43,61 @@
> *
> * The jobs in an entity are always scheduled in the order that they were pushed.
> *
> - * Note that once a job was taken from the entities queue and pushed to the
> - * hardware, i.e. the pending queue, the entity must not be referenced anymore
> - * through the jobs entity pointer.
> + * Lifetime rules
> + * --------------
> + *
> + * Getting object lifetimes right across the stack is critical to avoid UAF
> + * issues. The DRM scheduler has the following lifetime rules:
> + *
> + * - The scheduler must outlive all of its entities.
> + * - Jobs pushed to the scheduler are owned by it, and must only be freed
> + * after the free_job() callback is called.
> + * - Scheduler fences are reference-counted and may outlive the scheduler.
> + * - The scheduler *may* be destroyed while jobs are still in flight.
That's not correct. The scheduler can only be destroyed after all the
entities serving it have been destroyed, and after all the jobs already
pushed to the hw have finished.
What might be possible to add is allowing the hw to keep working on the
already-pushed jobs during teardown, but so far that has been rejected
as undesirable.
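The ordering rule stated above can be sketched as a toy check in plain
userspace C. This is not the kernel API; all names here are invented for
illustration, and the refusal path stands in for what a real teardown
would have to wait on:

```c
#include <assert.h>

/* Toy model of the teardown ordering rule, not kernel code: the
 * scheduler may only be finalized once every entity serving it has
 * been destroyed and every job already pushed to the hw has finished. */
struct sched_state {
	int live_entities; /* entities still serving this scheduler */
	int jobs_on_hw;    /* jobs pushed to the hw, not yet finished */
};

static void sched_entity_gone(struct sched_state *s)
{
	s->live_entities--;
}

static void sched_job_done(struct sched_state *s)
{
	s->jobs_on_hw--;
}

/* Returns 0 on success, -1 (think -EBUSY) if teardown is premature
 * because entities or in-flight jobs still exist. */
static int sched_state_fini(struct sched_state *s)
{
	if (s->live_entities > 0 || s->jobs_on_hw > 0)
		return -1;
	return 0;
}
```

In this model, teardown fails until both the last entity is gone and the
last pushed job has finished, mirroring the constraint that the
scheduler must outlive both.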
> + * - There is no guarantee that all jobs have been freed when all entities
> + * and the scheduler have been destroyed. Jobs may be freed asynchronously
> + * after this point.
> + * - Once a job is taken from the entity's queue and pushed to the hardware,
> + * i.e. the pending queue, the entity must not be referenced any more
> + * through the job's entity pointer. In other words, entities are not
> + * required to outlive job execution.
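The reference-counting rule in the list above (scheduler fences may
outlive the scheduler) can be modeled with a small userspace sketch.
These are not the real DRM types or calls; every name below is invented
for illustration:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy userspace model, not the DRM API: fence lifetimes are managed by
 * a refcount, so a consumer's reference keeps the fence alive after
 * the scheduler that produced it has been torn down. */
struct toy_fence {
	int refcount;
	int error; /* set when the fence is signaled with an error */
};

struct toy_sched {
	struct toy_fence *pending; /* reference held by the scheduler */
};

static struct toy_fence *toy_fence_create(void)
{
	struct toy_fence *f = calloc(1, sizeof(*f));
	f->refcount = 1;
	return f;
}

static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
	f->refcount++;
	return f;
}

static void toy_fence_put(struct toy_fence *f)
{
	if (--f->refcount == 0)
		free(f);
}

/* Tearing the scheduler down drops only its own reference; any
 * reference a consumer still holds keeps the fence valid. */
static void toy_sched_fini(struct toy_sched *s)
{
	toy_fence_put(s->pending);
	s->pending = NULL;
}
```

A consumer that took its own reference via toy_fence_get() can safely
inspect the fence after toy_sched_fini() and drop it last.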
> + *
> + * If the scheduler is destroyed with jobs in flight, the following
> + * happens:
> + *
> + * - Jobs that were pushed but have not yet run will be destroyed as part
> + * of the entity cleanup (which must happen before the scheduler itself
> + * is destroyed, per the first rule above). This signals the job
> + * finished fence with an error flag. This process runs asynchronously
> + * after drm_sched_entity_destroy() returns.
> + * - Jobs that are in-flight on the hardware are "detached" from their
> + * driver fence (the fence returned from the run_job() callback). In
> + * this case, it is up to the driver to ensure that any bookkeeping or
> + * internal data structures have separately managed lifetimes and that
> + * the hardware either cancels the jobs or runs them to completion.
> + * The DRM scheduler itself will immediately signal the job complete
> + * fence (with an error flag) and then call free_job() as part of the
> + * cleanup process.
> + *
> + * After the scheduler is destroyed, drivers *may* (but are not required to)
> + * skip signaling their remaining driver fences, as long as they have only ever
> + * been returned to the scheduler being destroyed as the return value from
> + * run_job() and not passed anywhere else.
This is an outright NAK to this. Fences must always be cleanly signaled.
IIRC Daniel documented this as mandatory dma_fence behavior.
Regards,
Christian.
> If these fences are used in any other
> + * context, then the driver *must* signal them, per the usual fence signaling
> + * rules.
> + *
> + * Resource management
> + * -------------------
> + *
> + * Drivers may need to acquire certain hardware resources (e.g. VM IDs) in order
> + * to run a job. This process must happen during the job's prepare() callback,
> + * not in the run() callback. If any resource is unavailable at job prepare time,
> + * the driver must return a suitable fence the scheduler can wait on until the
> + * resource (potentially) becomes available.
> + *
> + * In order to avoid deadlocks, drivers must always acquire resources in the
> + * same order, and release them in opposite order when a job completes or if
> + * resource acquisition fails.
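The acquire-in-one-order, release-in-reverse rule can be sketched with
plain pthreads. These helper names are hypothetical and this is not a
DRM interface; the global order here is imposed by lock address, which
is one common way to fix an ordering when resources have no natural
index:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical helpers, not a DRM API: always take the two resource
 * locks a job needs in a fixed (address) order, and release them in
 * the opposite order on completion or on acquisition failure, so two
 * jobs contending for the same pair cannot deadlock. */
static void resources_acquire(pthread_mutex_t *a, pthread_mutex_t *b)
{
	if (a > b) { /* normalize: lower address first */
		pthread_mutex_t *t = a;
		a = b;
		b = t;
	}
	pthread_mutex_lock(a); /* lower address first ... */
	pthread_mutex_lock(b); /* ... higher address second */
}

static void resources_release(pthread_mutex_t *a, pthread_mutex_t *b)
{
	if (a > b) { /* same normalization as on acquire */
		pthread_mutex_t *t = a;
		a = b;
		b = t;
	}
	pthread_mutex_unlock(b); /* reverse order of acquisition */
	pthread_mutex_unlock(a);
}
```

Because both paths normalize the argument order, callers may pass the
two locks in either order and still get a consistent global ordering.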
> */
>
> #include <linux/kthread.h>
>