public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Ben Widawsky <ben@bwidawsk.net>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/4] drm/i915: don't return a spurious -EIO from intel_ring_begin
Date: Mon, 2 Jul 2012 09:04:48 -0700	[thread overview]
Message-ID: <20120702090448.119b02cc@bwidawsk.net> (raw)
In-Reply-To: <CAKMK7uFiFL7CQ5Cg8y0qSkBOZH+CrwcWLLowC=zOnnK-EAG6xg@mail.gmail.com>

On Sun, 1 Jul 2012 12:41:19 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Sun, Jul 1, 2012 at 5:09 AM, Ben Widawsky <ben@bwidawsk.net> wrote:
> > On Tue, 26 Jun 2012 23:08:50 +0200
> > Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> >> Having this early check in intel_ring_begin doesn't buy us anything,
> >> since we'll be calling into wait_request in the usual case already
> >> anyway. In the corner case of not waiting for free space using the
> >> last_retired_head we simply need to do the same check, too.
> >>
> >> With these changes we'll only ever get an -EIO from intel_ring_begin
> >> if the gpu has truely been declared dead.
> >>
> >> v2: Don't conflate the change to prevent intel_ring_begin from returning
> >> a spurious -EIO with the refactor to use i915_gem_check_wedge, which is
> >> just prep work to avoid returning -EAGAIN in callsites that can't handle
> >> syscall restarting.
> >
> > I'm really scared by this change. It's diffuclt to review because there
> > are SO many callers of intel_ring_begin, and figuring out if they all
> > work in the wedged case is simply too difficult. I've yet to review the
> > rest of the series, but it may make more sense to put this change last
> > perhaps?
> 
> Well, this patch doesn't really affect much what error codes the
> callers get - we'll still throw both -EGAIN and -EIO at them (later on
> patches will fix this).
> 
> What this patch does though is prevent us from returning too many
> -EIO. Imagine the gpu died and we've noticed already (hence
> dev_priv->mm.wedged is set), but some process is stuck churning
> through the execbuf ioctl, holding dev->struct_mutex. While processing
> the execbuf we call intel_ring_begin to emit a few commands. Now
> usually, even when the gpu is dead, there is enough space in the ring
> to do so, which allows us to complete the execbuf ioctl and then later
> on we can block properly when trying to grab the mutex in the next
> ioctl for the gpu reset work handler to complete.

That in itself is a pretty big change, don't you think? It seems rather
strange and dangerous to modify HW (which we will if we allow execbuf to
continue when we write the tail pointer). I think we want some way to
not write the tail ptr in such a case. Now you might respond, well, who
cares about writes? But this imposes a pretty large restriction on any
code that can't work well after the GPU is hung.

I see the irony. I'm complaining that you can make GPU hangs unstable,
and the patch series is fixing GPU reset. Call it paranoia, it still
seems unsafe to me, and makes us have to think a bit more whenever
adding any code. Am I over-thinking this?

> 
> But thanks to that wedged check in intel_ring_begin we'll instead
> return -EIO, despite the fact that later on we could successfully
> reset the gpu. Returning -EIO forces the X server to fall back to s/w
> rendering and disabling dri2, and in the case of a 3d compositor
> usually results in a abort. After this patch we can still return -EIO
> if the gpu is dead but the reset work hasn't completed yet, but only
> so if the ring is full (which in many cases is unlikely).
> 
> Cheers, Daniel
> 
> >
> >>
> >> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >> ---
> >>  drivers/gpu/drm/i915/intel_ringbuffer.c |    4 ----
> >>  1 file changed, 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >> index f30a53a..501546e 100644
> >> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> >> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >> @@ -1230,13 +1230,9 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
> >>  int intel_ring_begin(struct intel_ring_buffer *ring,
> >>                    int num_dwords)
> >>  {
> >> -     struct drm_i915_private *dev_priv = ring->dev->dev_private;
> >>       int n = 4*num_dwords;
> >>       int ret;
> >>
> >> -     if (unlikely(atomic_read(&dev_priv->mm.wedged)))
> >> -             return -EIO;
> >> -
> >>       if (unlikely(ring->tail + n > ring->effective_size)) {
> >>               ret = intel_wrap_ring_buffer(ring);
> >>               if (unlikely(ret))
> >
> >
> >
> > --
> > Ben Widawsky, Intel Open Source Technology Center
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> 
> 



-- 
Ben Widawsky, Intel Open Source Technology Center

  reply	other threads:[~2012-07-02 16:04 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-26 21:08 [PATCH 1/4] drm/i915: don't return a spurious -EIO from intel_ring_begin Daniel Vetter
2012-06-26 21:08 ` [PATCH 2/4] drm/i915: don't trylock in the gpu reset code Daniel Vetter
2012-06-26 22:08   ` Łukasz Kuryło
2012-06-26 22:34     ` Daniel Vetter
2012-06-26 21:08 ` [PATCH 3/4] drm/i915: non-interruptible sleeps can't handle -EGAIN Daniel Vetter
2012-06-27 15:19   ` Ben Widawsky
2012-06-27 16:36     ` Daniel Vetter
2012-06-26 21:08 ` [PATCH 4/4] drm/i915: don't hange userspace when the gpu reset is stuck Daniel Vetter
2012-07-01  3:09 ` [PATCH 1/4] drm/i915: don't return a spurious -EIO from intel_ring_begin Ben Widawsky
2012-07-01 10:41   ` Daniel Vetter
2012-07-02 16:04     ` Ben Widawsky [this message]
2012-07-02 16:47       ` Daniel Vetter
2012-07-03 15:59 ` Chris Wilson
2012-07-03 18:11   ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120702090448.119b02cc@bwidawsk.net \
    --to=ben@bwidawsk.net \
    --cc=daniel@ffwll.ch \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox