public inbox for linux-media@vger.kernel.org
 help / color / mirror / Atom feed
From: Philipp Stanner <phasta@mailbox.org>
To: "Christian König" <christian.koenig@amd.com>,
	phasta@kernel.org, "Lyude Paul" <lyude@redhat.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Sabrina Dubroca" <sd@queasysnail.net>,
	"Sumit Semwal" <sumit.semwal@linaro.org>
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	 linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	 linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
	 stable@vger.kernel.org
Subject: Re: [PATCH 1/3] drm/nouveau: Prevent signaled fences in pending list
Date: Fri, 11 Apr 2025 16:10:29 +0200	[thread overview]
Message-ID: <1127db242503055b2e5e8d07db3aeae46cfb7a24.camel@mailbox.org> (raw)
In-Reply-To: <45d66ca4-5390-42e9-869a-f5f9125d05b6@amd.com>

On Fri, 2025-04-11 at 15:06 +0200, Christian König wrote:
> Am 11.04.25 um 14:44 schrieb Philipp Stanner:
> > On Fri, 2025-04-11 at 13:05 +0200, Christian König wrote:
> > >  Am 11.04.25 um 11:29 schrieb Philipp Stanner:
> > >  
> > > > [SNIP]
> > > >  
> > > > It could be, however, that at the same moment
> > > > nouveau_fence_signal() is
> > > > removing that entry, holding the appropriate lock.
> > > > 
> > > > So we have a race. Again.
> > > >  
> > >  
> > >  Ah, yes of course. If signaled is called with or without the
> > > lock is
> > > actually undetermined.
> > >  
> > >  
> > > >  
> > > > You see, fixing things in Nouveau is difficult :)
> > > > It gets more difficult if you want to clean it up "properly",
> > > > so it
> > > > conforms to rules such as those from dma_fence.
> > > > 
> > > > I have now provided two fixes that both work, but you are not
> > > > satisfied
> > > > with from the dma_fence-maintainer's perspective. I understand
> > > > that,
> > > > but please also understand that it's actually not my primary
> > > > task
> > > > to
> > > > work on Nouveau. I just have to fix this bug to move on with my
> > > > scheduler work.
> > > >  
> > >  
> > >  Well I'm happy with whatever solution as long as it works, but
> > > as
> > > far as I can see the approach with the callback simply doesn't.
> > >  
> > >  You just can't drop the fence reference for the list from the
> > > callback.
> > >  
> > >  
> > > >  
> > > > So if you have another idea, feel free to share it. But I'd
> > > > like to
> > > > know how we can go on here.
> > > >  
> > >  
> > >  Well the fence code actually works, doesn't it? The problem is
> > > rather that setting the error throws a warning because it doesn't
> > > expect signaled fences on the pending list.
> > >  
> > >  Maybe we should fix that instead.
> > The fence code works as the author intended, but I would be happy
> > if it
> > were more explicitly documented.
> > 
> > Regarding the WARN_ON: It occurs in dma_fence_set_error() because
> > there
> > is an attempt to set an error code on a signaled fence. I don't
> > think
> > that should be "fixed", it works as intended: You must not set an
> > error
> > code of a fence that was already signaled.
> > 
> > The reason seems to be that once a fence is signaled, a third party
> > might evaluate the error code.
> 
> Yeah, more or less correct. The idea is you can't declare an
> operation as having an error after the operation has already
> completed.
> 
> Because everyone will just wait for the completion and nobody checks
> the status again after that.
> 
> > 
> > But I think this wasn't wat you meant with "fix".
> 
> The idea was to avoid calling dma_fence_set_error() on already
> signaled fences. Something like this:
> 
> @@ -90,7 +90,7 @@ nouveau_fence_context_kill(struct
> nouveau_fence_chan *fctx, int error)
>         while (!list_empty(&fctx->pending)) {
>                 fence = list_entry(fctx->pending.next,
> typeof(*fence), head);
>  
> -               if (error)
> +               if (error & !dma_fence_is_signaled_locked(&fence-
> >base))
>                         dma_fence_set_error(&fence->base, error);
>  
>                 if (nouveau_fence_signal(fence))
> 
> That would also improve the handling quite a bit since we now don't
> set errors on fences which are already completed even if we haven't
> realized that they are already completed yet.
> 
> > In any case, there must not be signaled fences in nouveau's
> > pending-
> > list. They must be removed immediately once they signal, and this
> > must
> > not race.
> 
> Why actually? As far as I can see the pending list is not for the
> unsignaled fences, but rather the pending interrupt processing.

That's a list of fences that are "in the air", i.e., whose jobs are
currently being processed by the hardware. Once a job is done, its
fence must be removed.

> 
> So having signaled fences on the pending list is perfectly possible.

It is possible, and that is a bug. The list is used by
nouveau_fence_context_kill() to kill still pending jobs. It shall not
try to kill and set error codes for fences that are already signaled.



Anyways, forget about the "remove callbacks solution" it actually
causes a MASSIVE performance regression. No idea why, AFAICS the fast
path is only ever evaluated in nouveau_fence_done(), but maybe I missed
something.

Will re-iterate next week…


P.


> 
> Regards,
> Christian.
> 
> > 
> > >  
> > >  
> > > >  
> > > > I'm running out of ideas. What I'm wondering if we couldn't
> > > > just
> > > > remove
> > > > performance hacky fastpath functions such as
> > > > nouveau_fence_is_signaled() completely. It seems redundant to
> > > > me.
> > > >  
> > >  
> > >  That would work for me as well.
> > I'll test this approach. Seems a bit like the nuclear approach, but
> > if
> > it works we'd at least clean up a lot of this mess.
> > 
> > 
> > P.
> > 
> > 
> > >  
> > >  
> > > >  
> > > > 
> > > > Or we might add locking to it, but IDK what was achieved with
> > > > RCU
> > > > here.
> > > > In any case it's definitely bad that Nouveau has so many
> > > > redundant
> > > > and
> > > > half-redundant mechanisms.
> > > >  
> > >  
> > >  Yeah, agree messing with the locks even more won't help us here.
> > >  
> > >  Regards,
> > >  Christian.
> > >  
> > >  
> > > >  
> > > > 
> > > > 
> > > > P.
> > > > 
> > > >  
> > > > >  
> > > > > 
> > > > > P.
> > > > > 
> > > > >  
> > > > > >  
> > > > > > Regards,
> > > > > > Christian.
> > > > > > 
> > > > > >  
> > > > > > >  
> > > > > > > P.
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > >  
> > > > > > > >  
> > > > > > > > Regards,
> > > > > > > > Christian.
> > > > > > > > 
> > > > > > > >  
> > > > > > > > >  
> > > > > > > > > Replace the call to dma_fence_is_signaled() with
> > > > > > > > > nouveau_fence_base_is_signaled().
> > > > > > > > > 
> > > > > > > > > Cc: <stable@vger.kernel.org> # 4.10+, precise commit
> > > > > > > > > not
> > > > > > > > > to
> > > > > > > > > be
> > > > > > > > > determined
> > > > > > > > > Signed-off-by: Philipp Stanner <phasta@kernel.org>
> > > > > > > > > ---
> > > > > > > > >  drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +-
> > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > > > > > > > b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > > > > > > > index 7cc84472cece..33535987d8ed 100644
> > > > > > > > > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > > > > > > > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > > > > > > > @@ -274,7 +274,7 @@ nouveau_fence_done(struct
> > > > > > > > > nouveau_fence
> > > > > > > > > *fence)
> > > > > > > > >  			nvif_event_block(&fctx-
> > > > > > > > > >event);
> > > > > > > > >  		spin_unlock_irqrestore(&fctx->lock,
> > > > > > > > > flags);
> > > > > > > > >  	}
> > > > > > > > > -	return dma_fence_is_signaled(&fence->base);
> > > > > > > > > +	return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > > > > > > > > &fence-
> > > > > > > > >  
> > > > > > > > > >  
> > > > > > > > > > base.flags);
> > > > > > > > > >  
> > > > > > > > >  
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > >  static long
> > > > > > > > >  
> > > > > > > >  
> > > > > > >  
> > > > > >  
> > > > >   
> > > >   
> > >  
> > >  
> 


  reply	other threads:[~2025-04-11 14:10 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-10  9:24 [PATCH 0/3] drm/nouveau: Fix & improve nouveau_fence_done() Philipp Stanner
2025-04-10  9:24 ` [PATCH 1/3] drm/nouveau: Prevent signaled fences in pending list Philipp Stanner
2025-04-10 12:13   ` Christian König
2025-04-10 12:21     ` Danilo Krummrich
2025-04-10 12:42       ` Christian König
2025-04-10 12:58   ` Christian König
2025-04-10 13:09     ` Philipp Stanner
2025-04-10 13:16       ` Christian König
2025-04-10 15:36         ` Philipp Stanner
2025-04-11  9:29           ` Philipp Stanner
     [not found]             ` <81a70ba6-94b1-4bb3-a0b2-9e8890f90b33@amd.com>
2025-04-11 12:44               ` Philipp Stanner
2025-04-11 13:06                 ` Christian König
2025-04-11 14:10                   ` Philipp Stanner [this message]
2025-04-14  8:54                     ` Philipp Stanner
2025-04-14 14:27                       ` Danilo Krummrich
2025-04-15  9:56                         ` Christian König
2025-04-15 12:54                           ` Philipp Stanner
2025-04-10  9:24 ` [PATCH 2/3] drm/nouveau: Remove surplus if-branch Philipp Stanner
2025-04-10 12:15   ` Christian König
2025-04-10  9:24 ` [PATCH 3/3] drm/nouveau: Add helper to check base fence Philipp Stanner
2025-04-10  9:51 ` [PATCH 0/3] drm/nouveau: Fix & improve nouveau_fence_done() Philipp Stanner
2025-04-10 12:18   ` Christian König
2025-04-10 13:18     ` Philipp Stanner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1127db242503055b2e5e8d07db3aeae46cfb7a24.camel@mailbox.org \
    --to=phasta@mailbox.org \
    --cc=airlied@gmail.com \
    --cc=christian.koenig@amd.com \
    --cc=dakr@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=lyude@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=nouveau@lists.freedesktop.org \
    --cc=phasta@kernel.org \
    --cc=sd@queasysnail.net \
    --cc=simona@ffwll.ch \
    --cc=stable@vger.kernel.org \
    --cc=sumit.semwal@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox