From: Daniel Vetter <daniel@ffwll.ch>
To: Chris Wilson <chris@chris-wilson.co.uk>,
Daniel Vetter <daniel@ffwll.ch>,
ville.syrjala@linux.intel.com, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915: Ignore -EIO from __i915_wait_request() during mmio flip
Date: Wed, 17 Jun 2015 17:07:31 +0200 [thread overview]
Message-ID: <20150617150731.GD23637@phenom.ffwll.local> (raw)
In-Reply-To: <20150617141610.GH24012@nuc-i3427.alporthouse.com>
On Wed, Jun 17, 2015 at 03:16:10PM +0100, Chris Wilson wrote:
> We have gone far off topic.
>
> The question is how we want __i915_wait_request() to handle a wedged
> GPU.
>
> It currently reports EIO, and my argument is that is wrong wrt to the
> semantics of the wait completion and that no caller actually cares about
> EIO from __i915_wait_request().
>
> * Correction: one caller cares!
>
> If we regard a wedged GPU (and in the short term a reset is equally
> terminal to an outstanding request) then the GPU can no longer be
> accesing thta request and the wait can be safely completed. Imo it is
> correct to return 0 in all circumstances. (Reset pending needs to return
> -EAGAIN if we need to backoff, but for the lockless consumers we can
> just ignore the reset notification.
>
> That is set-domain, mmioflip, modesetting do not care if the request
> succeeded, just that it completed.
>
> Throttle() has an -EIO in its ABI for reporting a wedged GPU - this is
> used by X to detect when the GPU is unusable prior to use, e.g. when
> waking up, and also during its periodic flushes.
>
> Overlay reports -EIO when turning on and hanging the GPU. To be fair, it
> can equally report that failure the very next time it touches the ring.
>
> Execbuf itself doesn't rely on wait request reporting EIO, just that we
> report EIO prior to submitting work o a dead GPU/context. Execbuf uses
> wait_request via two paths, syncing to an old request on another ring
> and for flushing requests from the ringbuffer to make room for new
> commands. This is the tricky part, the only instance where we rely on
> aborting after waiting but before further operation - we won't even
> notice a dead GPU prior to starting a request and running out of space
> otherwise). Since it is the only instance, we can move the terminal
> detection of a dead GPU from the wait request into the
> ring_wait_for_space(). This is in keeping with the ethos that we do not
> report -EIO until we attempt to access the GPU.
Ok, following up with my side of the irc discussion we've had. I agree
that there's only 2 places where we must report an EIO if the gpu is
terminally wedge:
- throttle
- execbuf
How that's done doesn't matter, and when it's racy wrt concurrent gpu
deaths that also doens't matter, i.e. we don't need wait_request to EIO
immediately as long as we check terminally_wedged somewhere in these
ioctls.
My main concern is that if we remove the EIO from wait_request we'll
accidentally also remove the EIO from execbuf. And we've had kernels where
the only EIO left was the wait_request from ring_begin ...
But if we add a small igt to manually wedge the gpu through debugfs and
then check that throttle/execbuf do EIO that risk is averted and I'd be ok
with eating EIO from wait_request with extreme prejudice. Since indeed we
still have trouble with EIO at least temporarily totally wreaking modeset
ioclts and other things that really always should work.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2015-06-17 15:04 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-11 16:14 [PATCH] drm/i915: Ignore -EIO from __i915_wait_request() during mmio flip ville.syrjala
2015-06-11 17:05 ` [PATCH v2] drm/i915: Ignore -EIO from __i915_wait_request() during flips ville.syrjala
2015-06-15 4:01 ` shuang.he
2015-06-29 2:53 ` shuang.he
2015-06-11 20:01 ` [PATCH] drm/i915: Ignore -EIO from __i915_wait_request() during mmio flip Chris Wilson
2015-06-15 16:34 ` Daniel Vetter
2015-06-16 12:10 ` Chris Wilson
2015-06-16 16:21 ` Daniel Vetter
2015-06-16 16:30 ` Chris Wilson
2015-06-17 11:53 ` Daniel Vetter
2015-06-17 13:05 ` Chris Wilson
2015-06-17 14:16 ` Chris Wilson
2015-06-17 15:07 ` Daniel Vetter [this message]
2015-06-17 15:46 ` Chris Wilson
2015-06-15 1:40 ` shuang.he
2015-06-29 9:11 ` shuang.he
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150617150731.GD23637@phenom.ffwll.local \
--to=daniel@ffwll.ch \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
--cc=ville.syrjala@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.