* [PATCH] nouveau/fence: handle cross device fences properly.
From: Dave Airlie @ 2025-01-07 5:58 UTC
To: dri-devel; +Cc: nouveau, dakr

From: Dave Airlie <airlied@redhat.com>

If we have two nouveau controlled devices and one passes a dma-fence
to the other, when we hit the sync path it can cause the second device
to try and put a sync wait in its pushbuf for the seqno of the context
on the first device.

Since fence contexts are vmm bound, check if the vmms match between
both users; this should ensure that fence seqnos don't get used wrongly
on incorrect channels.

This seems to happen fairly spuriously and I found it tracking down
a multi-card regression report, that seemed to work by luck before this.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/nouveau/nouveau_fence.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index ee5e9d40c166f..5743c82f4094b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -370,7 +370,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
 
 			rcu_read_lock();
 			prev = rcu_dereference(f->channel);
-			if (prev && (prev == chan ||
+			if (prev && (prev->vmm == chan->vmm) &&
+			    (prev == chan ||
 				     fctx->sync(f, prev, chan) == 0))
 				must_wait = false;
 			rcu_read_unlock();
-- 
2.43.0

* Re: [PATCH] nouveau/fence: handle cross device fences properly.
From: Ben Skeggs @ 2025-01-07 6:16 UTC
To: dri-devel

On 7/1/25 15:58, Dave Airlie wrote:
> From: Dave Airlie <airlied@redhat.com>
>
> If we have two nouveau controlled devices and one passes a dma-fence
> to the other, when we hit the sync path it can cause the second device
> to try and put a sync wait in its pushbuf for the seqno of the context
> on the first device.
>
> Since fence contexts are vmm bound, check if the vmms match between
> both users; this should ensure that fence seqnos don't get used wrongly
> on incorrect channels.
>
> This seems to happen fairly spuriously and I found it tracking down
> a multi-card regression report, that seemed to work by luck before this.
>
> Signed-off-by: Dave Airlie <airlied@redhat.com>
> Cc: stable@vger.kernel.org

Reviewed-by: Ben Skeggs <bskeggs@nvidia.com>

> ---
>  drivers/gpu/drm/nouveau/nouveau_fence.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index ee5e9d40c166f..5743c82f4094b 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -370,7 +370,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
>
>  			rcu_read_lock();
>  			prev = rcu_dereference(f->channel);
> -			if (prev && (prev == chan ||
> +			if (prev && (prev->vmm == chan->vmm) &&
> +			    (prev == chan ||
>  				     fctx->sync(f, prev, chan) == 0))
>  				must_wait = false;
>  			rcu_read_unlock();

* Re: [PATCH] nouveau/fence: handle cross device fences properly.
From: Danilo Krummrich @ 2025-01-07 16:02 UTC
To: Dave Airlie; +Cc: dri-devel, nouveau

On Tue, Jan 07, 2025 at 03:58:46PM +1000, Dave Airlie wrote:
> From: Dave Airlie <airlied@redhat.com>
>
> If we have two nouveau controlled devices and one passes a dma-fence
> to the other, when we hit the sync path it can cause the second device
> to try and put a sync wait in its pushbuf for the seqno of the context
> on the first device.
>
> Since fence contexts are vmm bound, check if the vmms match between
> both users; this should ensure that fence seqnos don't get used wrongly
> on incorrect channels.

The fence sequence number is global, i.e. per device, hence checking the vmm
context seems too restrictive.

Wouldn't it be better to ensure that `prev->cli->drm == chan->cli->drm`?

This way we can still optimize where dependencies are between different
applications, but on the same device.

>
> This seems to happen fairly spuriously and I found it tracking down
> a multi-card regression report, that seemed to work by luck before this.
>
> Signed-off-by: Dave Airlie <airlied@redhat.com>
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/nouveau/nouveau_fence.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index ee5e9d40c166f..5743c82f4094b 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -370,7 +370,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
>
>  			rcu_read_lock();
>  			prev = rcu_dereference(f->channel);
> -			if (prev && (prev == chan ||
> +			if (prev && (prev->vmm == chan->vmm) &&
> +			    (prev == chan ||

Maybe better break it down a bit, e.g.

bool local = prev && (prev->... == chan->...);

if (local && ...) {
	...
}

>  				     fctx->sync(f, prev, chan) == 0))
>  				must_wait = false;
>  			rcu_read_unlock();
> --
> 2.43.0
>

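The restructuring suggested above can be sketched as a small userspace
model. This is a minimal sketch, not driver code: `struct model_chan`,
`model_sync_fn`, `model_must_wait()` and `model_sync_ok()` are hypothetical
names, and the comparison uses the `vmm` pointer only because that is what
the posted diff contains; which field to compare is still under discussion
in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nouveau_channel; only the field the
 * posted patch compares is modelled. */
struct model_chan {
	const void *vmm;	/* address space the channel is bound to */
};

/* Hypothetical stand-in for fctx->sync(): returns 0 when a hardware
 * wait could be emitted. */
typedef int (*model_sync_fn)(const struct model_chan *prev,
			     const struct model_chan *chan);

/* Returns true when the caller still has to wait on the CPU. */
static bool model_must_wait(const struct model_chan *prev,
			    const struct model_chan *chan,
			    model_sync_fn sync)
{
	bool local;

	assert(chan != NULL && sync != NULL);

	/* Step 1: may this channel use the fence's channel at all? */
	local = prev && prev->vmm == chan->vmm;

	/* Step 2: only then try the same-channel or hardware-sync shortcut. */
	if (local && (prev == chan || sync(prev, chan) == 0))
		return false;

	return true;
}

/* Hypothetical sync callback that always succeeds. */
static int model_sync_ok(const struct model_chan *prev,
			 const struct model_chan *chan)
{
	(void)prev;
	(void)chan;
	return 0;
}
```

With two channels that share a vmm the helper reports that no CPU wait is
needed; with a NULL `prev` or a mismatching vmm it falls back to waiting.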
* Re: [PATCH] nouveau/fence: handle cross device fences properly.
From: Dave Airlie @ 2025-01-08 1:04 UTC
To: Danilo Krummrich; +Cc: dri-devel, nouveau

On Wed, 8 Jan 2025 at 02:02, Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Tue, Jan 07, 2025 at 03:58:46PM +1000, Dave Airlie wrote:
> > From: Dave Airlie <airlied@redhat.com>
> >
> > If we have two nouveau controlled devices and one passes a dma-fence
> > to the other, when we hit the sync path it can cause the second device
> > to try and put a sync wait in its pushbuf for the seqno of the context
> > on the first device.
> >
> > Since fence contexts are vmm bound, check if the vmms match between
> > both users; this should ensure that fence seqnos don't get used wrongly
> > on incorrect channels.
>
> The fence sequence number is global, i.e. per device, hence checking the vmm
> context seems too restrictive.
>
> Wouldn't it be better to ensure that `prev->cli->drm == chan->cli->drm`?

Can you prove that? I thought the same and I've gone around a few
times yesterday/today and convinced myself what I wrote is right.

dma_fence_init gets passed the seqno which comes from fctx->sequence,
which is nouveau_fence_chan, which gets allocated for each channel.

So we should hit this path if we have 2 userspace submits, one with,
say, graphics, the other with copy engine contexts; otherwise we should
wait on the CPU.

> >  drivers/gpu/drm/nouveau/nouveau_fence.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > index ee5e9d40c166f..5743c82f4094b 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > @@ -370,7 +370,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
> >
> >  			rcu_read_lock();
> >  			prev = rcu_dereference(f->channel);
> > -			if (prev && (prev == chan ||
> > +			if (prev && (prev->vmm == chan->vmm) &&
> > +			    (prev == chan ||
>
> Maybe better break it down a bit, e.g.
>
> bool local = prev && (prev->... == chan->...);
>
> if (local && ...) {
>	...
> }

I'll update that once we resolve the above.

Dave.

* Re: [PATCH] nouveau/fence: handle cross device fences properly.
From: Ben Skeggs @ 2025-01-08 1:49 UTC
To: nouveau

On 8/1/25 11:04, Dave Airlie wrote:
> On Wed, 8 Jan 2025 at 02:02, Danilo Krummrich <dakr@kernel.org> wrote:
>> On Tue, Jan 07, 2025 at 03:58:46PM +1000, Dave Airlie wrote:
>>> From: Dave Airlie <airlied@redhat.com>
>>>
>>> If we have two nouveau controlled devices and one passes a dma-fence
>>> to the other, when we hit the sync path it can cause the second device
>>> to try and put a sync wait in its pushbuf for the seqno of the context
>>> on the first device.
>>>
>>> Since fence contexts are vmm bound, check if the vmms match between
>>> both users; this should ensure that fence seqnos don't get used wrongly
>>> on incorrect channels.
>> The fence sequence number is global, i.e. per device, hence checking the vmm
>> context seems too restrictive.
>>
>> Wouldn't it be better to ensure that `prev->cli->drm == chan->cli->drm`?
> Can you prove that? I thought the same and I've gone around a few
> times yesterday/today and convinced myself what I wrote is right.

I think Danilo is right. Using the VMM would prevent synchronisation
between clients on the same device, which was one of the intended
purposes.

>
> dma_fence_init gets passed the seqno which comes from fctx->sequence,
> which is nouveau_fence_chan, which gets allocated for each channel.

All this code is really old and horrible, especially after not receiving
much attention through many many DRM changes over the years.

But - all channels share the semaphore buffer, each with their own
(fixed, based on channel id) offset. There are indeed per-channel GPU VA
mappings of the buffer in the fctx, but they all point at the same
underlying memory.

The "new" exec submission path doesn't use nouveau_fence_sync() at all.
This isn't the worst idea in the world, given various shortcomings in
how it's currently implemented, but I've never felt confident
*something* wouldn't regress by removing its use in the older paths (or
buffer moves).

>
> So we should hit this path if we have 2 userspace submits, one with,
> say, graphics, the other with copy engine contexts; otherwise we should
> wait on the CPU.
>
>>>  drivers/gpu/drm/nouveau/nouveau_fence.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> index ee5e9d40c166f..5743c82f4094b 100644
>>> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> @@ -370,7 +370,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
>>>
>>>  			rcu_read_lock();
>>>  			prev = rcu_dereference(f->channel);
>>> -			if (prev && (prev == chan ||
>>> +			if (prev && (prev->vmm == chan->vmm) &&
>>> +			    (prev == chan ||
>> Maybe better break it down a bit, e.g.
>>
>> bool local = prev && (prev->... == chan->...);
>>
>> if (local && ...) {
>>	...
>> }
> I'll update that once we resolve the above.
>
> Dave.

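The layout described above (one semaphore buffer per device, a fixed
per-channel offset, and per-channel GPU VA mappings that all alias the same
memory) can be modelled in a few lines of C. This is a minimal sketch under
assumptions: the 16-byte slot stride, the channel limit, and every `model_*`
name are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SLOT_STRIDE  16u	/* assumed bytes per channel slot */
#define MODEL_MAX_CHANNELS 8u	/* assumed channel limit */

/* One semaphore buffer per device, shared by all of its channels. */
struct model_device {
	uint8_t sema[MODEL_MAX_CHANNELS * MODEL_SLOT_STRIDE];
};

/* A channel's own GPU VA mapping of its device's semaphore buffer. */
struct model_mapping {
	uint64_t va_base;		/* where this channel sees the buffer */
	struct model_device *dev;	/* memory the mapping points at */
};

/* GPU VA of channel chid's slot, as seen through one mapping. */
static uint64_t model_slot_va(const struct model_mapping *map,
			      unsigned int chid)
{
	assert(map != NULL && chid < MODEL_MAX_CHANNELS);
	return map->va_base + (uint64_t)chid * MODEL_SLOT_STRIDE;
}

/* Translate a GPU VA back to the backing memory of the mapping. */
static uint8_t *model_resolve(const struct model_mapping *map, uint64_t va)
{
	assert(map != NULL && va >= map->va_base);
	assert(va - map->va_base < sizeof(map->dev->sema));
	return &map->dev->sema[va - map->va_base];
}
```

Two mappings of the same device yield different virtual addresses for a
given channel id, yet both resolve to the same byte of the shared buffer.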
* Re: [PATCH] nouveau/fence: handle cross device fences properly.
From: Danilo Krummrich @ 2025-01-08 7:39 UTC
To: Dave Airlie; +Cc: dri-devel, nouveau

On Wed, Jan 08, 2025 at 11:04:21AM +1000, Dave Airlie wrote:
> On Wed, 8 Jan 2025 at 02:02, Danilo Krummrich <dakr@kernel.org> wrote:
> >
> > On Tue, Jan 07, 2025 at 03:58:46PM +1000, Dave Airlie wrote:
> > > From: Dave Airlie <airlied@redhat.com>
> > >
> > > If we have two nouveau controlled devices and one passes a dma-fence
> > > to the other, when we hit the sync path it can cause the second device
> > > to try and put a sync wait in its pushbuf for the seqno of the context
> > > on the first device.
> > >
> > > Since fence contexts are vmm bound, check if the vmms match between
> > > both users; this should ensure that fence seqnos don't get used wrongly
> > > on incorrect channels.
> >
> > The fence sequence number is global, i.e. per device, hence checking the vmm
> > context seems too restrictive.
> >
> > Wouldn't it be better to ensure that `prev->cli->drm == chan->cli->drm`?
>
> Can you prove that? I thought the same and I've gone around a few
> times yesterday/today and convinced myself what I wrote is right.

Honestly, I thought you were implying that by the commit summary and message,
but that's more about how you found this.

With that bias, grep led me to pre-nv84 code, where this is actually
still the case (see nv17_fence_sync()).

But of course for later GPUs it's a per fence-context / channel seqno; can't
know what the firmware scheduler puts first.

I think we should change the commit message to "handle cross cli fences
properly" (channels of the same cli share the cli's vmm) and clarify in the
commit message that not only cross device cases are affected.
I'd also put that the problem is that (for nv84 and later) we otherwise take
the channel ID of the fence's channel and add it on top of the fence-context
vma address of the target channel, which (if they have different VMMs) makes
us end up with a wrong synchronization point [1].

Cross device could even be worse with very old GPUs, since ->sync() just
assumes the same fence-context type between the channels.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/nv84_fence.c#n100

>
> dma_fence_init gets passed the seqno which comes from fctx->sequence,
> which is nouveau_fence_chan, which gets allocated for each channel.
>
> So we should hit this path if we have 2 userspace submits, one with,
> say, graphics, the other with copy engine contexts; otherwise we should
> wait on the CPU.
>
> > >  drivers/gpu/drm/nouveau/nouveau_fence.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > index ee5e9d40c166f..5743c82f4094b 100644
> > > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > @@ -370,7 +370,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
> > >
> > >  			rcu_read_lock();
> > >  			prev = rcu_dereference(f->channel);
> > > -			if (prev && (prev == chan ||
> > > +			if (prev && (prev->vmm == chan->vmm) &&
> > > +			    (prev == chan ||
> >
> > Maybe better break it down a bit, e.g.
> >
> > bool local = prev && (prev->... == chan->...);
> >
> > if (local && ...) {
> >	...
> > }
>
> I'll update that once we resolve the above.
>
> Dave.

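The addressing problem described above can be illustrated with a toy model:
the waiting channel computes the slot from its own device's buffer plus the
signalling channel's ID, which is only the right slot when both channels sit
on the same device. Everything named `model_*` is hypothetical, slots are
modelled as array entries rather than byte offsets, and the sketch covers
only the cross-device case; it takes no position on the per-VMM question
debated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_CHANNELS 8u	/* assumed channel limit */

/* One table of per-channel seqno slots per GPU. */
struct model_gpu {
	uint32_t seqno_slot[MODEL_MAX_CHANNELS];
};

struct model_channel {
	struct model_gpu *gpu;	/* device the channel lives on */
	unsigned int chid;	/* channel id, used as slot index */
};

/* Slot where the signalling channel actually publishes its seqno. */
static uint32_t *model_signal_slot(const struct model_channel *prev)
{
	assert(prev != NULL && prev->chid < MODEL_MAX_CHANNELS);
	return &prev->gpu->seqno_slot[prev->chid];
}

/* Slot the waiting channel would poll: its OWN device's table, indexed
 * by the signalling channel's id. */
static uint32_t *model_wait_slot(const struct model_channel *chan,
				 const struct model_channel *prev)
{
	assert(chan != NULL && prev != NULL);
	assert(prev->chid < MODEL_MAX_CHANNELS);
	return &chan->gpu->seqno_slot[prev->chid];
}

/* The hardware wait is only meaningful if both refer to the same slot. */
static bool model_wait_is_valid(const struct model_channel *chan,
				const struct model_channel *prev)
{
	return model_wait_slot(chan, prev) == model_signal_slot(prev);
}
```

For two channels on one `model_gpu` the check holds; for channels on two
different `model_gpu` instances the waiter would watch a slot that the
signalling channel never writes.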
Thread overview: 6+ messages
2025-01-07  5:58 [PATCH] nouveau/fence: handle cross device fences properly Dave Airlie
2025-01-07  6:16 ` Ben Skeggs
2025-01-07 16:02 ` Danilo Krummrich
2025-01-08  1:04   ` Dave Airlie
2025-01-08  1:49     ` Ben Skeggs
2025-01-08  7:39     ` Danilo Krummrich