Re: [PATCH] drm/panthor: Fix the "done_fence is initialized" detection logic

From: Boris Brezillon <boris.brezillon@collabora.com>
To: Liviu Dudau <liviu.dudau@arm.com>
Cc: "Steven Price" <steven.price@arm.com>,
	"Adrián Larumbe" <adrian.larumbe@collabora.com>,
	dri-devel@lists.freedesktop.org, kernel@collabora.com,
	"Nicolas Frattaroli" <nicolas.frattaroli@collabora.com>,
	"Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
	"Philipp Stanner" <phasta@kernel.org>,
	"Christian König" <christian.koenig@amd.com>
Subject: Re: [PATCH] drm/panthor: Fix the "done_fence is initialized" detection logic
Date: Mon, 9 Mar 2026 16:32:37 +0100
Message-ID: <20260309163237.1941983b@fedora>
In-Reply-To: <aa7fHayRMdHn2Yxo@e142607>

On Mon, 9 Mar 2026 14:54:21 +0000
Liviu Dudau <liviu.dudau@arm.com> wrote:

> On Mon, Mar 09, 2026 at 02:15:49PM +0100, Boris Brezillon wrote:
> > On Mon, 9 Mar 2026 11:05:06 +0000
> > Liviu Dudau <liviu.dudau@arm.com> wrote:
> >   
> > > > After commit 541c8f2468b9 ("dma-buf: detach fence ops on signal v3"),
> > > > dma_fence::ops == NULL can't be used to check if the fence is initialized
> > > > or not. We could turn this into an "is_signaled() || ops == NULL" test,
> > > > but that's fragile, since it's still subject to dma_fence internal
> > > > changes. So let's have the "is_initialized" state encoded directly in
> > > > the pointer through the lowest bit which is guaranteed to be unused
> > > > because of the dma_fence alignment constraint.    
> > > 
> > > I'm confused! There is only one place where we end up being interested if the
> > > fence has been initialized or not, and that is in job_release(). I don't
> > > see why checking for "ops != NULL" before calling dma_fence_put() should not
> > > be enough,  
> > 
> > Because after 541c8f2468b9 ("dma-buf: detach fence ops on signal v3"),
> > dma_fence->ops is set back to NULL at signal time[1].  
> 
> Yes, I gathered that. What I meant to say was that I don't understand why we need
> all this infrastructure just for one check. Meanwhile Christian pointed out that
> a simpler solution already exists.
> 
> >   
> > > or even better, why don't we call dma_fence_put() regardless,
> > > as the core code should take care of an uninitialized dma_fence AFAICT.  
> > 
> > When the job is created, we pre-allocate the done_fence, but we leave it
> > uninitialized until ::run_job() is called. If we call
> > dma_fence_release() (through dma_fence_put()) on a dma_fence that was
> > not dma_fence_init()-ialized, we have a NULL deref on the cb_list, and
> > probably other issues too.  
> 
> I don't see the benefit of not initializing the done_fence until we ::run_job()
> but I might have missed something obvious.

It has to do with the way we connect dma_fence::seqno to the CS_SYNC
object seqno. The submission process is a multi-step operation:

for_each_job() // can fail
	1. allocate and initialize resources (including dma_fence and
	   drm_sched_fence objects)
	2. gather deps

for_each_job() // can't fail
	3. arm drm_sched fences
	4. queue jobs
	5. update syncobjs with the drm_sched fences

If anything fails before step 3, we roll back everything we've done. Now,
if we were initializing the job::dma_fence when we allocate it, we would
consume a seqno on the panthor_queue, and because the execute-job
sequence assumes the seqno increases monotonically (SYNC_ADD(+1)), we
can't leave holes behind, which would happen if we were initializing at
alloc time and something failed halfway through the submission process.
There are ways around it, like using SYNC_SET(seqno) instead of
SYNC_ADD(+1), but those changes are more invasive than delaying the
initialization of the ::done_fence object.
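
To make the seqno argument concrete, here is a minimal user-space sketch
of the two-phase model. All mock_* names are hypothetical stand-ins and
not the panthor code; it only assumes that the queue hands out seqnos in
order and that each completed job bumps the sync object by one, as with
SYNC_ADD(+1).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real panthor/dma_fence types. */
struct mock_fence {
	uint64_t seqno;
	bool initialized;
};

struct mock_queue {
	uint64_t next_seqno;	/* last seqno handed out to a fence */
	uint64_t sync_seqno;	/* models the CS_SYNC object value */
};

struct mock_job {
	struct mock_queue *queue;
	struct mock_fence *done_fence;
};

/* Steps 1-2: can fail, so it must not consume a seqno. */
static int mock_job_prepare(struct mock_job *job, struct mock_queue *q,
			    bool inject_failure)
{
	job->queue = q;
	job->done_fence = calloc(1, sizeof(*job->done_fence));
	if (!job->done_fence)
		return -1;

	if (inject_failure) {
		/* Rollback: the fence was never initialized, plain free. */
		free(job->done_fence);
		job->done_fence = NULL;
		return -1;
	}

	return 0;
}

/* ::run_job() equivalent: can't fail, this is where the seqno is taken. */
static void mock_job_run(struct mock_job *job)
{
	job->done_fence->seqno = ++job->queue->next_seqno;
	job->done_fence->initialized = true;
}

/*
 * Job completion: models SYNC_ADD(+1). Returns true if the job's fence
 * would be seen as signaled, which only holds when no seqno was skipped.
 */
static bool mock_job_complete(struct mock_job *job)
{
	job->queue->sync_seqno += 1;
	return job->queue->sync_seqno >= job->done_fence->seqno;
}
```

In this model a failed mock_job_prepare() leaves next_seqno untouched, so
the next job that runs still gets a seqno the +1 increments can reach.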

> If we want to keep that, maybe we
> should not be dropping the reference in job_release() but when we
> signal the fence.

If we assume that several paths call dma_fence_signal[_locked](), that'd
mean more code and more chances to forget the
dma_fence_put()+done_fence=NULL in case new paths are added. That, or we
need a panthor_job_signal_done_fence() wrapper.
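
For illustration only, such a wrapper could look roughly like the
user-space sketch below. The mock_* types and helpers are made up for
the example (the real thing would go through dma_fence_signal() and
dma_fence_put()); the point is just that signal, put and pointer reset
live in one place.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real dma_fence/panthor_job types. */
struct mock_fence {
	int refcount;
	bool signaled;
};

struct mock_job {
	struct mock_fence *done_fence;
};

/* Drop one reference, free the fence when the last one goes away. */
static void mock_fence_put(struct mock_fence *fence)
{
	if (fence && --fence->refcount == 0)
		free(fence);
}

/*
 * Single entry point for signaling: every path that completes the job
 * calls this instead of open-coding signal + put + done_fence = NULL.
 */
static void mock_job_signal_done_fence(struct mock_job *job)
{
	struct mock_fence *fence = job->done_fence;

	if (!fence)
		return;	/* already signaled and released */

	fence->signaled = true;
	job->done_fence = NULL;
	mock_fence_put(fence);
}
```

Calling the wrapper twice is harmless in this sketch, since the second
call sees a NULL done_fence. It does nothing for a fence that was
allocated but never initialized, though.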

> But that would leak the memory of the uninitialized done_fence.

Yes, the problem with uninitialized fences remains: as soon as we have
this two-step model where allocation and initialization are split, we
need to deal with both cases in the cleanup path.
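
As a rough user-space sketch of what "both cases" means, combined with
the low-bit tagging idea from the patch description: the helper and flag
names are hypothetical, and this is not the actual patch, just the
general shape under the assumption that the fence allocation is at least
2-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted fence. */
struct mock_fence {
	int refcount;
};

/* Bit 0 of the stored pointer is free because of allocation alignment. */
#define MOCK_DONE_FENCE_INITIALIZED	((uintptr_t)1)

struct mock_job {
	uintptr_t done_fence;	/* fence pointer | initialized flag */
};

static struct mock_fence *mock_job_fence(const struct mock_job *job)
{
	return (struct mock_fence *)(job->done_fence &
				     ~MOCK_DONE_FENCE_INITIALIZED);
}

static bool mock_job_fence_is_initialized(const struct mock_job *job)
{
	return (job->done_fence & MOCK_DONE_FENCE_INITIALIZED) != 0;
}

/* Allocation step: memory exists, but no refcount or seqno yet. */
static int mock_job_alloc_fence(struct mock_job *job)
{
	struct mock_fence *fence = calloc(1, sizeof(*fence));

	if (!fence)
		return -1;

	/* The tagging scheme relies on bit 0 being clear. */
	assert(((uintptr_t)fence & MOCK_DONE_FENCE_INITIALIZED) == 0);
	job->done_fence = (uintptr_t)fence;
	return 0;
}

/* Initialization step, done at ::run_job() time in the two-step model. */
static void mock_job_init_fence(struct mock_job *job)
{
	mock_job_fence(job)->refcount = 1;
	job->done_fence |= MOCK_DONE_FENCE_INITIALIZED;
}

/*
 * Cleanup path: returns true if the fence went through the refcounted
 * "put" path, false if it was never initialized (or never allocated).
 */
static bool mock_job_release_fence(struct mock_job *job)
{
	struct mock_fence *fence = mock_job_fence(job);
	bool initialized = mock_job_fence_is_initialized(job);

	job->done_fence = 0;
	if (!fence)
		return false;

	if (!initialized)
		free(fence);		/* never initialized: plain free */
	else if (--fence->refcount == 0)
		free(fence);		/* initialized: drop our reference */

	return initialized;
}
```

The release helper never looks inside the fence to decide which path to
take, so it does not depend on what the fence implementation does to its
own fields at signal time.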
