* [PATCH] drm/i915/gt: Fix -EDEADLK handling regression
@ 2021-06-30 16:44 Ville Syrjala
2021-07-01 7:07 ` Maarten Lankhorst
0 siblings, 1 reply; 7+ messages in thread
From: Ville Syrjala @ 2021-06-30 16:44 UTC
To: intel-gfx; +Cc: stable, Maarten Lankhorst, Thomas Hellström
From: Ville Syrjälä <ville.syrjala@linux.intel.com>
The conversion to ww mutexes failed to address the fence code which
already returns -EDEADLK when we run out of fences. Ww mutexes on
the other hand treat -EDEADLK as an internal errno value indicating
a need to restart the operation due to a deadlock. So now when the
fence code returns -EDEADLK the higher level code erroneously
restarts everything instead of returning the error to userspace
as is expected.
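
The collision described above can be illustrated with a small userspace sketch. This is not the i915 code: find_fence(), pin_with_retry() and the max_attempts cap are hypothetical names introduced only for illustration, and the cap exists solely so the sketch terminates. The point is that a caller following the ww mutex convention cannot tell a genuine "out of fences" -EDEADLK apart from a lock backoff request.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for the fence allocator: it is always out of
 * fences and reports that with the errno value passed in.
 */
static int find_fence(int exhausted_errno)
{
	return -exhausted_errno;
}

/*
 * Hypothetical caller that follows the ww mutex convention of treating
 * -EDEADLK as "back off and restart". Returns the last error seen and
 * stores the number of attempts in *attempts. max_attempts is an
 * assumption of this sketch so that the loop terminates.
 */
static int pin_with_retry(int exhausted_errno, int max_attempts, int *attempts)
{
	int err;

	*attempts = 0;
	do {
		(*attempts)++;
		err = find_fence(exhausted_errno);
	} while (err == -EDEADLK && *attempts < max_attempts);

	return err;
}
```

With EDEADLK as the exhaustion code the loop restarts until the cap is hit, whereas with ENOBUFS the error is returned after a single attempt.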
To remedy this let's switch the fence code to use a different errno
value for this. -ENOBUFS seems like a semi-reasonable unique choice.
Apart from igt the only user of this I could find is sna, and even
there all we do is dump the current fence registers from debugfs
into the X server log. So no user visible functionality is affected.
If we really cared about preserving this we could of course convert
back to -EDEADLK higher up, but doesn't seem like that's worth
the hassle here.
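
For reference, converting back higher up could look roughly like the sketch below. It is only an illustration under assumptions: fence_errno_to_uapi() is a hypothetical helper, not an existing i915 function, and it would have to run only after the ww mutex retry loop has finished, otherwise the original problem comes back.

```c
#include <errno.h>

/*
 * Hypothetical helper: translate the internal "out of fences" code back
 * to the errno userspace historically saw. All other values, including
 * success (0), pass through unchanged.
 */
static int fence_errno_to_uapi(int err)
{
	return err == -ENOBUFS ? -EDEADLK : err;
}
```

Note that any other -ENOBUFS reaching the same boundary would be rewritten as well, so such a mapping only works while the fence code is the sole source of that value on the path.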
Not quite sure which commit specifically broke this, but I'll
just attribute it to the general gem ww mutex work.
Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Testcase: igt/gem_pread/exhaustion
Testcase: igt/gem_pwrite/basic-exhaustion
Testcase: igt/gem_fenced_exec_thrash/too-many-fences
Fixes: 80f0b679d6f0 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
index cac7f3f44642..f8948de72036 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
@@ -348,7 +348,7 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt)
 	if (intel_has_pending_fb_unpin(ggtt->vm.i915))
 		return ERR_PTR(-EAGAIN);
 
-	return ERR_PTR(-EDEADLK);
+	return ERR_PTR(-ENOBUFS);
 }
 
 int __i915_vma_pin_fence(struct i915_vma *vma)
--
2.31.1
* Re: [PATCH] drm/i915/gt: Fix -EDEADLK handling regression
2021-06-30 16:44 [PATCH] drm/i915/gt: Fix -EDEADLK handling regression Ville Syrjala
@ 2021-07-01 7:07 ` Maarten Lankhorst
2021-07-01 17:00 ` Ville Syrjälä
2021-07-13 19:58 ` [Intel-gfx] " Daniel Vetter
0 siblings, 2 replies; 7+ messages in thread
From: Maarten Lankhorst @ 2021-07-01 7:07 UTC
To: Ville Syrjala, intel-gfx; +Cc: stable, Thomas Hellström
On 30-06-2021 at 18:44, Ville Syrjala wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> The conversion to ww mutexes failed to address the fence code which
> already returns -EDEADLK when we run out of fences. Ww mutexes on
> the other hand treat -EDEADLK as an internal errno value indicating
> a need to restart the operation due to a deadlock. So now when the
> fence code returns -EDEADLK the higher level code erroneously
> restarts everything instead of returning the error to userspace
> as is expected.
>
> To remedy this let's switch the fence code to use a different errno
> value for this. -ENOBUFS seems like a semi-reasonable unique choice.
> Apart from igt the only user of this I could find is sna, and even
> there all we do is dump the current fence registers from debugfs
> into the X server log. So no user visible functionality is affected.
> If we really cared about preserving this we could of course convert
> back to -EDEADLK higher up, but doesn't seem like that's worth
> the hassle here.
>
> Not quite sure which commit specifically broke this, but I'll
> just attribute it to the general gem ww mutex work.
>
> Cc: stable@vger.kernel.org
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> Testcase: igt/gem_pread/exhaustion
> Testcase: igt/gem_pwrite/basic-exhaustion
> Testcase: igt/gem_fenced_exec_thrash/too-many-fences
> Fixes: 80f0b679d6f0 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.")
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
> drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> index cac7f3f44642..f8948de72036 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> @@ -348,7 +348,7 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt)
> if (intel_has_pending_fb_unpin(ggtt->vm.i915))
> return ERR_PTR(-EAGAIN);
>
> - return ERR_PTR(-EDEADLK);
> + return ERR_PTR(-ENOBUFS);
> }
>
> int __i915_vma_pin_fence(struct i915_vma *vma)
Makes sense..
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Is it a slightly more recent commit? It is probably the part that converts execbuffer to use ww locks.
* Re: [PATCH] drm/i915/gt: Fix -EDEADLK handling regression
2021-07-01 7:07 ` Maarten Lankhorst
@ 2021-07-01 17:00 ` Ville Syrjälä
2021-07-13 19:58 ` [Intel-gfx] " Daniel Vetter
1 sibling, 0 replies; 7+ messages in thread
From: Ville Syrjälä @ 2021-07-01 17:00 UTC
To: Maarten Lankhorst; +Cc: intel-gfx, stable, Thomas Hellström
On Thu, Jul 01, 2021 at 09:07:27AM +0200, Maarten Lankhorst wrote:
> On 30-06-2021 at 18:44, Ville Syrjala wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >
> > The conversion to ww mutexes failed to address the fence code which
> > already returns -EDEADLK when we run out of fences. Ww mutexes on
> > the other hand treat -EDEADLK as an internal errno value indicating
> > a need to restart the operation due to a deadlock. So now when the
> > fence code returns -EDEADLK the higher level code erroneously
> > restarts everything instead of returning the error to userspace
> > as is expected.
> >
> > To remedy this let's switch the fence code to use a different errno
> > value for this. -ENOBUFS seems like a semi-reasonable unique choice.
> > Apart from igt the only user of this I could find is sna, and even
> > there all we do is dump the current fence registers from debugfs
> > into the X server log. So no user visible functionality is affected.
> > If we really cared about preserving this we could of course convert
> > back to -EDEADLK higher up, but doesn't seem like that's worth
> > the hassle here.
> >
> > Not quite sure which commit specifically broke this, but I'll
> > just attribute it to the general gem ww mutex work.
> >
> > Cc: stable@vger.kernel.org
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> > Testcase: igt/gem_pread/exhaustion
> > Testcase: igt/gem_pwrite/basic-exhaustion
> > Testcase: igt/gem_fenced_exec_thrash/too-many-fences
> > Fixes: 80f0b679d6f0 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.")
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> > drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > index cac7f3f44642..f8948de72036 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > @@ -348,7 +348,7 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt)
> > if (intel_has_pending_fb_unpin(ggtt->vm.i915))
> > return ERR_PTR(-EAGAIN);
> >
> > - return ERR_PTR(-EDEADLK);
> > + return ERR_PTR(-ENOBUFS);
> > }
> >
> > int __i915_vma_pin_fence(struct i915_vma *vma)
>
> Makes sense..
>
> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>
> Is it a slightly more recent commit? It is probably the part that converts execbuffer to use ww locks.
No idea about the specific commit since I've not actually bisected it.
It's just been bugging CI for quite a while now so figured I need to
fix it.
--
Ville Syrjälä
Intel
* Re: [Intel-gfx] [PATCH] drm/i915/gt: Fix -EDEADLK handling regression
2021-07-01 7:07 ` Maarten Lankhorst
2021-07-01 17:00 ` Ville Syrjälä
@ 2021-07-13 19:58 ` Daniel Vetter
2021-07-13 19:59 ` Daniel Vetter
1 sibling, 1 reply; 7+ messages in thread
From: Daniel Vetter @ 2021-07-13 19:58 UTC
To: Maarten Lankhorst, dri-devel
Cc: Ville Syrjala, intel-gfx, Thomas Hellström, stable
On Thu, Jul 1, 2021 at 9:07 AM Maarten Lankhorst
<maarten.lankhorst@linux.intel.com> wrote:
> On 30-06-2021 at 18:44, Ville Syrjala wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >
> > The conversion to ww mutexes failed to address the fence code which
> > already returns -EDEADLK when we run out of fences. Ww mutexes on
> > the other hand treat -EDEADLK as an internal errno value indicating
> > a need to restart the operation due to a deadlock. So now when the
> > fence code returns -EDEADLK the higher level code erroneously
> > restarts everything instead of returning the error to userspace
> > as is expected.
> >
> > To remedy this let's switch the fence code to use a different errno
> > value for this. -ENOBUFS seems like a semi-reasonable unique choice.
> > Apart from igt the only user of this I could find is sna, and even
> > there all we do is dump the current fence registers from debugfs
> > into the X server log. So no user visible functionality is affected.
> > If we really cared about preserving this we could of course convert
> > back to -EDEADLK higher up, but doesn't seem like that's worth
> > the hassle here.
> >
> > Not quite sure which commit specifically broke this, but I'll
> > just attribute it to the general gem ww mutex work.
> >
> > Cc: stable@vger.kernel.org
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> > Testcase: igt/gem_pread/exhaustion
> > Testcase: igt/gem_pwrite/basic-exhaustion
> > Testcase: igt/gem_fenced_exec_thrash/too-many-fences
> > Fixes: 80f0b679d6f0 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.")
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> > drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > index cac7f3f44642..f8948de72036 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > @@ -348,7 +348,7 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt)
> > if (intel_has_pending_fb_unpin(ggtt->vm.i915))
> > return ERR_PTR(-EAGAIN);
> >
> > - return ERR_PTR(-EDEADLK);
> > + return ERR_PTR(-ENOBUFS);
> > }
> >
> > int __i915_vma_pin_fence(struct i915_vma *vma)
>
> Makes sense..
>
> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>
> Is it a slightly more recent commit? It is probably the part that converts execbuffer to use ww locks.
- please cc: dri-devel on anything gem/gt related.
- this should probably be ENOSPC or something like that for at least a
seeming retention of errno consistency:
https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#recommended-ioctl-return-values
Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [Intel-gfx] [PATCH] drm/i915/gt: Fix -EDEADLK handling regression
2021-07-13 19:58 ` [Intel-gfx] " Daniel Vetter
@ 2021-07-13 19:59 ` Daniel Vetter
2021-07-13 20:19 ` Rodrigo Vivi
2021-07-13 20:22 ` Ville Syrjälä
0 siblings, 2 replies; 7+ messages in thread
From: Daniel Vetter @ 2021-07-13 19:59 UTC
To: Maarten Lankhorst, dri-devel
Cc: Ville Syrjala, intel-gfx, Thomas Hellström, stable
On Tue, Jul 13, 2021 at 9:58 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Thu, Jul 1, 2021 at 9:07 AM Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com> wrote:
> > On 30-06-2021 at 18:44, Ville Syrjala wrote:
> > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > >
> > > The conversion to ww mutexes failed to address the fence code which
> > > already returns -EDEADLK when we run out of fences. Ww mutexes on
> > > the other hand treat -EDEADLK as an internal errno value indicating
> > > a need to restart the operation due to a deadlock. So now when the
> > > fence code returns -EDEADLK the higher level code erroneously
> > > restarts everything instead of returning the error to userspace
> > > as is expected.
> > >
> > > To remedy this let's switch the fence code to use a different errno
> > > value for this. -ENOBUFS seems like a semi-reasonable unique choice.
> > > Apart from igt the only user of this I could find is sna, and even
> > > there all we do is dump the current fence registers from debugfs
> > > into the X server log. So no user visible functionality is affected.
> > > If we really cared about preserving this we could of course convert
> > > back to -EDEADLK higher up, but doesn't seem like that's worth
> > > the hassle here.
> > >
> > > Not quite sure which commit specifically broke this, but I'll
> > > just attribute it to the general gem ww mutex work.
> > >
> > > Cc: stable@vger.kernel.org
> > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> > > Testcase: igt/gem_pread/exhaustion
> > > Testcase: igt/gem_pwrite/basic-exhaustion
> > > Testcase: igt/gem_fenced_exec_thrash/too-many-fences
> > > Fixes: 80f0b679d6f0 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.")
> > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > ---
> > > drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > > index cac7f3f44642..f8948de72036 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > > @@ -348,7 +348,7 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt)
> > > if (intel_has_pending_fb_unpin(ggtt->vm.i915))
> > > return ERR_PTR(-EAGAIN);
> > >
> > > - return ERR_PTR(-EDEADLK);
> > > + return ERR_PTR(-ENOBUFS);
> > > }
> > >
> > > int __i915_vma_pin_fence(struct i915_vma *vma)
> >
> > Makes sense..
> >
> > Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> >
> > Is it a slightly more recent commit? It is probably the part that converts execbuffer to use ww locks.
>
> - please cc: dri-devel on anything gem/gt related.
> - this should probably be ENOSPC or something like that for at least a
> seeming retention of errno consistency:
>
> https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#recommended-ioctl-return-values
Other option would be to map that back to EDEADLK in the execbuf ioctl
somewhere, so we retain a distinct errno code.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [Intel-gfx] [PATCH] drm/i915/gt: Fix -EDEADLK handling regression
2021-07-13 19:59 ` Daniel Vetter
@ 2021-07-13 20:19 ` Rodrigo Vivi
2021-07-13 20:22 ` Ville Syrjälä
1 sibling, 0 replies; 7+ messages in thread
From: Rodrigo Vivi @ 2021-07-13 20:19 UTC
To: Daniel Vetter
Cc: Maarten Lankhorst, dri-devel, intel-gfx, Thomas Hellström,
stable
On Tue, Jul 13, 2021 at 09:59:18PM +0200, Daniel Vetter wrote:
> On Tue, Jul 13, 2021 at 9:58 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Thu, Jul 1, 2021 at 9:07 AM Maarten Lankhorst
> > <maarten.lankhorst@linux.intel.com> wrote:
> > > On 30-06-2021 at 18:44, Ville Syrjala wrote:
> > > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > >
> > > > The conversion to ww mutexes failed to address the fence code which
> > > > already returns -EDEADLK when we run out of fences. Ww mutexes on
> > > > the other hand treat -EDEADLK as an internal errno value indicating
> > > > a need to restart the operation due to a deadlock. So now when the
> > > > fence code returns -EDEADLK the higher level code erroneously
> > > > restarts everything instead of returning the error to userspace
> > > > as is expected.
> > > >
> > > > To remedy this let's switch the fence code to use a different errno
> > > > value for this. -ENOBUFS seems like a semi-reasonable unique choice.
> > > > Apart from igt the only user of this I could find is sna, and even
> > > > there all we do is dump the current fence registers from debugfs
> > > > into the X server log. So no user visible functionality is affected.
> > > > If we really cared about preserving this we could of course convert
> > > > back to -EDEADLK higher up, but doesn't seem like that's worth
> > > > the hassle here.
> > > >
> > > > Not quite sure which commit specifically broke this, but I'll
> > > > just attribute it to the general gem ww mutex work.
> > > >
> > > > Cc: stable@vger.kernel.org
> > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> > > > Testcase: igt/gem_pread/exhaustion
> > > > Testcase: igt/gem_pwrite/basic-exhaustion
> > > > Testcase: igt/gem_fenced_exec_thrash/too-many-fences
> > > > Fixes: 80f0b679d6f0 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.")
> > > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > ---
> > > > drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > > > index cac7f3f44642..f8948de72036 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > > > @@ -348,7 +348,7 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt)
> > > > if (intel_has_pending_fb_unpin(ggtt->vm.i915))
> > > > return ERR_PTR(-EAGAIN);
> > > >
> > > > - return ERR_PTR(-EDEADLK);
> > > > + return ERR_PTR(-ENOBUFS);
> > > > }
> > > >
> > > > int __i915_vma_pin_fence(struct i915_vma *vma)
> > >
> > > Makes sense..
> > >
> > > Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > >
> > > Is it a slightly more recent commit? It is probably the part that converts execbuffer to use ww locks.
> >
> > - please cc: dri-devel on anything gem/gt related.
> > - this should probably be ENOSPC or something like that for at least a
> > seeming retention of errno consistency:
> >
> > https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#recommended-ioctl-return-values
>
> Other option would be to map that back to EDEADLK in the execbuf ioctl
> somewhere, so we retain a distinct errno code.
I'm about to push this patch to drm-intel-fixes... I'm assuming if there's any fix it will
be a follow-up patch and not a revert or force push, right?!
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
* Re: [Intel-gfx] [PATCH] drm/i915/gt: Fix -EDEADLK handling regression
2021-07-13 19:59 ` Daniel Vetter
2021-07-13 20:19 ` Rodrigo Vivi
@ 2021-07-13 20:22 ` Ville Syrjälä
1 sibling, 0 replies; 7+ messages in thread
From: Ville Syrjälä @ 2021-07-13 20:22 UTC
To: Daniel Vetter
Cc: Maarten Lankhorst, dri-devel, intel-gfx, Thomas Hellström,
stable
On Tue, Jul 13, 2021 at 09:59:18PM +0200, Daniel Vetter wrote:
> On Tue, Jul 13, 2021 at 9:58 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Thu, Jul 1, 2021 at 9:07 AM Maarten Lankhorst
> > <maarten.lankhorst@linux.intel.com> wrote:
> > > On 30-06-2021 at 18:44, Ville Syrjala wrote:
> > > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > >
> > > > The conversion to ww mutexes failed to address the fence code which
> > > > already returns -EDEADLK when we run out of fences. Ww mutexes on
> > > > the other hand treat -EDEADLK as an internal errno value indicating
> > > > a need to restart the operation due to a deadlock. So now when the
> > > > fence code returns -EDEADLK the higher level code erroneously
> > > > restarts everything instead of returning the error to userspace
> > > > as is expected.
> > > >
> > > > To remedy this let's switch the fence code to use a different errno
> > > > value for this. -ENOBUFS seems like a semi-reasonable unique choice.
> > > > Apart from igt the only user of this I could find is sna, and even
> > > > there all we do is dump the current fence registers from debugfs
> > > > into the X server log. So no user visible functionality is affected.
> > > > If we really cared about preserving this we could of course convert
> > > > back to -EDEADLK higher up, but doesn't seem like that's worth
> > > > the hassle here.
> > > >
> > > > Not quite sure which commit specifically broke this, but I'll
> > > > just attribute it to the general gem ww mutex work.
> > > >
> > > > Cc: stable@vger.kernel.org
> > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> > > > Testcase: igt/gem_pread/exhaustion
> > > > Testcase: igt/gem_pwrite/basic-exhaustion
> > > > Testcase: igt/gem_fenced_exec_thrash/too-many-fences
> > > > Fixes: 80f0b679d6f0 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.")
> > > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > ---
> > > > drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > > > index cac7f3f44642..f8948de72036 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > > > @@ -348,7 +348,7 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt)
> > > > if (intel_has_pending_fb_unpin(ggtt->vm.i915))
> > > > return ERR_PTR(-EAGAIN);
> > > >
> > > > - return ERR_PTR(-EDEADLK);
> > > > + return ERR_PTR(-ENOBUFS);
> > > > }
> > > >
> > > > int __i915_vma_pin_fence(struct i915_vma *vma)
> > >
> > > Makes sense..
> > >
> > > Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > >
> > > Is it a slightly more recent commit? It is probably the part that converts execbuffer to use ww locks.
> >
> > - please cc: dri-devel on anything gem/gt related.
Thought I did. Apparently got lost somewhere.
> > - this should probably be ENOSPC or something like that for at least a
> > seeming retention of errno consistency:
ENOSPC is already used for other things.
> >
> > https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#recommended-ioctl-return-values
>
> Other option would be to map that back to EDEADLK in the execbuf ioctl
> somewhere, so we retain a distinct errno code.
Already mentioned in the commit msg.
--
Ville Syrjälä
Intel