Re: [RFC PATCH] drm/sched: Make sure drm_sched_fence_ops don't vanish

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Boris Brezillon <boris.brezillon@collabora.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: dri-devel@lists.freedesktop.org, kernel@collabora.com,
	"Luben Tuikov" <ltuikov89@gmail.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Danilo Krummrich" <dakr@redhat.com>
Subject: Re: [RFC PATCH] drm/sched: Make sure drm_sched_fence_ops don't vanish
Date: Sat, 31 Aug 2024 09:13:52 +0200	[thread overview]
Message-ID: <20240831091352.1f212fff@collabora.com> (raw)
In-Reply-To: <ZtJHg8JOPi7CVme+@DUT025-TGLU.fm.intel.com>

Hi Matthew,

On Fri, 30 Aug 2024 22:28:19 +0000
Matthew Brost <matthew.brost@intel.com> wrote:

> On Fri, Aug 30, 2024 at 12:40:57PM +0200, Boris Brezillon wrote:
> > dma_fence objects created by drm_sched might hit other subsystems, which
> > means the fence object might potentially outlive the drm_sched module
> > lifetime, and at this point the dma_fence::ops points to a memory region
> > that no longer exists.
> > 
> > In order to fix that, let's make sure the drm_sched_fence code is always
> > statically linked, just like dma-fence{-chain}.c does.
> > 
> > Cc: Luben Tuikov <ltuikov89@gmail.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: Danilo Krummrich <dakr@redhat.com>
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> > Just like for the other UAF fix, this is an RFC, as I'm not sure that's
> > how we want it fixed. The other option we have is to retain a module ref
> > for each initialized fence and release it when the fence is freed.  
> 
> So this is different than your other patch. From Xe PoV the other patch
> is referencing an object which is tied to an IOCTL rather than a module
> whereas this referencing a module.

Well, it's not fixing the exact same problem either. My other patch was
about making sure the timeline name string doesn't disappear when a
scheduler is destroyed, which can happen while the module is still
loaded. As Christian pointed out, the "module unload while fences
exist" problem can be solved by binding the module refcounting to
the drm_sched_fence_timeline object lifetime, but I wanted to show a
simpler alternative to this approach with this patch.

> If a module can disappear when fence
> tied to the module, well that is a bit scary and Xe might have some
> issues there - I'd have audit our of exporting internally created
> fences.

Yeah, I moved away from exposing driver fences directly because of
that. Now all I expose to the outside world are drm_sched_fence
objects, which just moves the problem one level down, but at least we
can fix it generically if all the drivers take this precaution.

> 
> Based on Christian's feedback we really shouldn't be allowing this but
> also don't really love the idea of a fence holding a module ref either.
> 
> Seems like we need a well defined + documented rule - exported fences
> need to be completely decoupled from the module upon signaling

That basically means patching drm_sched_fence::ops (with the lock held)
at signal time so it points to a dummy ops that's statically linked
(something in dma-fence.c, I guess). But that also means the names
returned by get_{driver,timeline}_name() no longer reflect the original
fence owner after signalling, or you have to do some allocation+copy
of these strings. Neither of these options sound good to me.

> or
> exported fences must retain a module ref.

Yes, and that's the option I was originally heading to, until I
realized we have a 3rd option that's already working well for the core
dma-fence code (AKA the stub, the chain, and other containers I might
have missed). If the code is not in a module but instead statically
linked, all our module-lifetime vs fence-lifetime issues go away
without resorting to this module refcounting trick. Given sched_fence.c
is pretty self-contained (no deps on other DRM functions that might be
linked as a module), I now think this is our best shot for drm_sched
fences.

The story is a bit different for driver fences exposed to the outside
world, but if you're using drm_sched, I see no good reason for not using
the drm_sched_fence proxy for all your fences. Note that the same kind
of proxy can be generalized at the dma-fence.c level if it's deemed
valuable for other subsystems.

The idea behind it being that:

- the dma_fence_proxy ops should live in dma-fence.c (or any other
  object that's statically linked)
- driver and timeline names attached to a proxy need to be dynamically
  allocated and owned by some dma_fence_proxy_context object
- the dma_fence_proxy_context object needs to be refcounted, and each
  fence attached to this 'fence-context' needs to hold a ref on this
  struct

> I'd probably lean towards the
> former.

For the reasons exposed above, I don't think that's a viable option,
unless we decide we no longer care about the fence origin after
signalling happened (which would be a mess for people looking a
dma_buf's debug file to be honest).

> One reason to support the former is fences can be released in
> IRQ contexts and dropping a module ref in IRQ context seems a bit
> problematic. Also because some oher part of kernel holds on to fence ref
> lazily block a module unload just seems wrong.

There's always the option to defer the release if we're in a context
where we can't release immediately. But I think we're overthinking it.
For the drm_sched_fence, the simple answer is to make this code static,
just like dma-fence{-chain}.c.

> 
> Sorry if above we have well defined rule and I'm just not aware.

AFAICT, it's not.

Regards,

Boris

next prev parent reply	other threads:[~2024-08-31  7:14 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-30 10:40 [RFC PATCH] drm/sched: Make sure drm_sched_fence_ops don't vanish Boris Brezillon
2024-08-30 22:28 ` Matthew Brost
2024-08-31  7:13   ` Boris Brezillon [this message]
2024-09-03  8:29     ` Simona Vetter
2024-09-02 10:58   ` Christian König
2024-09-02 13:40     ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240831091352.1f212fff@collabora.com \
    --to=boris.brezillon@collabora.com \
    --cc=christian.koenig@amd.com \
    --cc=dakr@redhat.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=kernel@collabora.com \
    --cc=ltuikov89@gmail.com \
    --cc=matthew.brost@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.