From: Daniel Vetter <daniel@ffwll.ch>
To: Chris Wilson <chris@chris-wilson.co.uk>,
Daniel Vetter <daniel@ffwll.ch>,
ville.syrjala@linux.intel.com, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915: Ignore -EIO from __i915_wait_request() during mmio flip
Date: Wed, 17 Jun 2015 17:07:31 +0200 [thread overview]
Message-ID: <20150617150731.GD23637@phenom.ffwll.local> (raw)
In-Reply-To: <20150617141610.GH24012@nuc-i3427.alporthouse.com>
On Wed, Jun 17, 2015 at 03:16:10PM +0100, Chris Wilson wrote:
> We have gone far off topic.
>
> The question is how we want __i915_wait_request() to handle a wedged
> GPU.
>
> It currently reports EIO, and my argument is that is wrong wrt to the
> semantics of the wait completion and that no caller actually cares about
> EIO from __i915_wait_request().
>
> * Correction: one caller cares!
>
> If we regard a wedged GPU (and in the short term a reset is equally
> terminal to an outstanding request) then the GPU can no longer be
> accesing thta request and the wait can be safely completed. Imo it is
> correct to return 0 in all circumstances. (Reset pending needs to return
> -EAGAIN if we need to backoff, but for the lockless consumers we can
> just ignore the reset notification.
>
> That is set-domain, mmioflip, modesetting do not care if the request
> succeeded, just that it completed.
>
> Throttle() has an -EIO in its ABI for reporting a wedged GPU - this is
> used by X to detect when the GPU is unusable prior to use, e.g. when
> waking up, and also during its periodic flushes.
>
> Overlay reports -EIO when turning on and hanging the GPU. To be fair, it
> can equally report that failure the very next time it touches the ring.
>
> Execbuf itself doesn't rely on wait request reporting EIO, just that we
> report EIO prior to submitting work o a dead GPU/context. Execbuf uses
> wait_request via two paths, syncing to an old request on another ring
> and for flushing requests from the ringbuffer to make room for new
> commands. This is the tricky part, the only instance where we rely on
> aborting after waiting but before further operation - we won't even
> notice a dead GPU prior to starting a request and running out of space
> otherwise). Since it is the only instance, we can move the terminal
> detection of a dead GPU from the wait request into the
> ring_wait_for_space(). This is in keeping with the ethos that we do not
> report -EIO until we attempt to access the GPU.
Ok, following up with my side of the irc discussion we've had. I agree
that there's only 2 places where we must report an EIO if the gpu is
terminally wedge:
- throttle
- execbuf
How that's done doesn't matter, and when it's racy wrt concurrent gpu
deaths that also doens't matter, i.e. we don't need wait_request to EIO
immediately as long as we check terminally_wedged somewhere in these
ioctls.
My main concern is that if we remove the EIO from wait_request we'll
accidentally also remove the EIO from execbuf. And we've had kernels where
the only EIO left was the wait_request from ring_begin ...
But if we add a small igt to manually wedge the gpu through debugfs and
then check that throttle/execbuf do EIO that risk is averted and I'd be ok
with eating EIO from wait_request with extreme prejudice. Since indeed we
still have trouble with EIO at least temporarily totally wreaking modeset
ioclts and other things that really always should work.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2015-06-17 15:04 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-11 16:14 [PATCH] drm/i915: Ignore -EIO from __i915_wait_request() during mmio flip ville.syrjala
2015-06-11 17:05 ` [PATCH v2] drm/i915: Ignore -EIO from __i915_wait_request() during flips ville.syrjala
2015-06-15 4:01 ` shuang.he
2015-06-29 2:53 ` shuang.he
2015-06-11 20:01 ` [PATCH] drm/i915: Ignore -EIO from __i915_wait_request() during mmio flip Chris Wilson
2015-06-15 16:34 ` Daniel Vetter
2015-06-16 12:10 ` Chris Wilson
2015-06-16 16:21 ` Daniel Vetter
2015-06-16 16:30 ` Chris Wilson
2015-06-17 11:53 ` Daniel Vetter
2015-06-17 13:05 ` Chris Wilson
2015-06-17 14:16 ` Chris Wilson
2015-06-17 15:07 ` Daniel Vetter [this message]
2015-06-17 15:46 ` Chris Wilson
2015-06-15 1:40 ` shuang.he
2015-06-29 9:11 ` shuang.he
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150617150731.GD23637@phenom.ffwll.local \
--to=daniel@ffwll.ch \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
--cc=ville.syrjala@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox