Re: [PATCH 3/4] drm/i915: non-interruptible sleeps can't handle -EGAIN

public inbox for intel-gfx@lists.freedesktop.org

From: Daniel Vetter <daniel@ffwll.ch>
To: Ben Widawsky <ben@bwidawsk.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
	Intel Graphics Development <intel-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 3/4] drm/i915: non-interruptible sleeps can't handle -EGAIN
Date: Wed, 27 Jun 2012 18:36:02 +0200
Message-ID: <20120627163602.GG5326@phenom.ffwll.local>
In-Reply-To: <20120627081903.1a15044a@bwidawsk.net>

On Wed, Jun 27, 2012 at 08:19:03AM -0700, Ben Widawsky wrote:
> On Tue, 26 Jun 2012 23:08:52 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> 
> > So don't return -EAGAIN, even in the case of a gpu hang. Remap it to -EIO
> > instead.
> 
> What I'd really like to see in this rather long commit message is what
> exactly happens in this case that's being fixed (maybe I should know,
> but I don't).

What about adding: "Note that this isn't really an issue with
interruptibility, but rather that we have quite a few codepaths (mostly
around kms stuff) that simply can't handle any errors, not even
-EAGAIN. Instead of adding proper failure paths so that we could restart
these ioctls, we've opted for the cheap way out of sleeping
non-interruptibly. That works everywhere except when the gpu dies, which
is the case this patch fixes.

So essentially interruptible == false means 'wait for the gpu or die
trying'."

-Daniel
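
The rule described above can be modelled outside the driver. The following
is only a minimal userspace sketch, assuming a hypothetical struct gpu_state
in place of the real dev_priv fields and completion; it mirrors the order of
the checks in the patched i915_gem_check_wedge() quoted below, nothing more.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's hang state; not the real i915 struct. */
struct gpu_state {
	bool wedged;            /* a gpu hang has been detected */
	bool recovery_complete; /* the reset handler has finished running */
};

/* Same decision order as the patched i915_gem_check_wedge(). */
static int check_wedge(const struct gpu_state *gpu, bool interruptible)
{
	if (!gpu->wedged)
		return 0;

	/* Non-interruptible callers can't restart, so never give them -EAGAIN. */
	if (!interruptible)
		return -EIO;

	/* Reset finished but the gpu is still wedged: the reset failed. */
	if (gpu->recovery_complete)
		return -EIO;

	/* Reset still pending: an interruptible caller may back off and retry. */
	return -EAGAIN;
}
```

With a hang detected and the reset still pending, an interruptible caller
gets -EAGAIN and can restart, while a non-interruptible caller gets -EIO.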

> 
> > 
> > This is a bit ugly because intel_ring_begin is all non-interruptible
> > and hence only returns -EIO. But as the comment in there says,
> > auditing all the callsites would be a pain.
> > 
> > To avoid duplicating code, reuse i915_gem_check_wedge in __wait_seqno
> > and intel_wait_ring_buffer. Also use the opportunity to clarify the
> > different cases in i915_gem_check_wedge a bit with comments.
> > 
> > v2: Don't access dev_priv->mm.interruptible from check_wedge - we
> > might not hold dev->struct_mutex, making this racy. Instead pass
> > interruptible in as a parameter. I've noticed this because I've hit a
> > BUG_ON(!mutex_is_locked) at the top of check_wedge. This has been
> > added in
> > 
> > commit b4aca0106c466b5a0329318203f65bac2d91b682
> > Author: Ben Widawsky <ben@bwidawsk.net>
> > Date:   Wed Apr 25 20:50:12 2012 -0700
> > 
> >     drm/i915: extract some common olr+wedge code
> > 
> > although that commit is missing any justification for it. I guess
> > it's just copy&paste, because the same commit adds the same BUG_ON
> > check to check_olr, where it indeed makes sense.
> > 
> > But in check_wedge everything we access is protected by other means,
> > so this is superfluous. And because it now gets in the way (we add a
> > new caller in __wait_seqno, which can be called without
> > dev->struct_mutex) let's just remove it.
> > 
> > v3: Group all the i915_gem_check_wedge refactoring into this patch, so
> > that this patch here is all about not returning -EAGAIN to callsites
> > that can't handle syscall restarting.
> > 
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
> 
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h         |    2 ++
> >  drivers/gpu/drm/i915/i915_gem.c         |   26 ++++++++++++++++++--------
> >  drivers/gpu/drm/i915/intel_ringbuffer.c |    6 ++++--
> >  3 files changed, 24 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index a0c15ab..ab9ade0 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1330,6 +1330,8 @@ i915_gem_object_unpin_fence(struct drm_i915_gem_object *obj)
> >  
> >  void i915_gem_retire_requests(struct drm_device *dev);
> >  void i915_gem_retire_requests_ring(struct intel_ring_buffer *ring);
> > +int __must_check i915_gem_check_wedge(struct drm_i915_private *dev_priv,
> > +				      bool interruptible);
> >  
> >  void i915_gem_reset(struct drm_device *dev);
> >  void i915_gem_clflush_object(struct drm_i915_gem_object *obj);
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 6a98c06..af6a510 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1863,11 +1863,10 @@ i915_gem_retire_work_handler(struct work_struct *work)
> >  	mutex_unlock(&dev->struct_mutex);
> >  }
> >  
> > -static int
> > -i915_gem_check_wedge(struct drm_i915_private *dev_priv)
> > +int
> > +i915_gem_check_wedge(struct drm_i915_private *dev_priv,
> > +		     bool interruptible)
> >  {
> > -	BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
> > -
> >  	if (atomic_read(&dev_priv->mm.wedged)) {
> >  		struct completion *x = &dev_priv->error_completion;
> >  		bool recovery_complete;
> > @@ -1878,7 +1877,16 @@ i915_gem_check_wedge(struct drm_i915_private *dev_priv)
> >  		recovery_complete = x->done > 0;
> >  		spin_unlock_irqrestore(&x->wait.lock, flags);
> >  
> > -		return recovery_complete ? -EIO : -EAGAIN;
> > +		/* Non-interruptible callers can't handle -EAGAIN, hence return
> > +		 * -EIO unconditionally for these. */
> > +		if (!interruptible)
> > +			return -EIO;
> > +
> > +		/* Recovery complete, but still wedged means reset failure. */
> > +		if (recovery_complete)
> > +			return -EIO;
> > +
> > +		return -EAGAIN;
> >  	}
> >  
> >  	return 0;
> > @@ -1932,6 +1940,7 @@ static int __wait_seqno(struct intel_ring_buffer *ring, u32 seqno,
> >  	unsigned long timeout_jiffies;
> >  	long end;
> >  	bool wait_forever = true;
> > +	int ret;
> >  
> >  	if (i915_seqno_passed(ring->get_seqno(ring), seqno))
> >  		return 0;
> > @@ -1963,8 +1972,9 @@ static int __wait_seqno(struct intel_ring_buffer *ring, u32 seqno,
> >  			end = wait_event_timeout(ring->irq_queue, EXIT_COND,
> >  						 timeout_jiffies);
> >  
> > -		if (atomic_read(&dev_priv->mm.wedged))
> > -			end = -EAGAIN;
> > +		ret = i915_gem_check_wedge(dev_priv, interruptible);
> > +		if (ret)
> > +			end = ret;
> >  	} while (end == 0 && wait_forever);
> >  
> >  	getrawmonotonic(&now);
> > @@ -2004,7 +2014,7 @@ i915_wait_seqno(struct intel_ring_buffer *ring, uint32_t seqno)
> >  
> >  	BUG_ON(seqno == 0);
> >  
> > -	ret = i915_gem_check_wedge(dev_priv);
> > +	ret = i915_gem_check_wedge(dev_priv, dev_priv->mm.interruptible);
> >  	if (ret)
> >  		return ret;
> >  
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 501546e..6c024d4 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1220,8 +1220,10 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
> >  		}
> >  
> >  		msleep(1);
> > -		if (atomic_read(&dev_priv->mm.wedged))
> > -			return -EAGAIN;
> > +
> > +		ret = i915_gem_check_wedge(dev_priv, dev_priv->mm.interruptible);
> > +		if (ret)
> > +			return ret;
> >  	} while (!time_after(jiffies, end));
> >  	trace_i915_ring_wait_end(ring);
> >  	return -EBUSY;
> 
> 
> 
> -- 
> Ben Widawsky, Intel Open Source Technology Center

-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

