Intel-GFX Archive on lore.kernel.org
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/2] drm/i915: Set guilty-flag on fence after detecting a hang
Date: Tue, 3 Jan 2017 11:57:44 +0000
Message-ID: <b5cb288b-1ba4-290b-cc7e-856a752d39c3@linux.intel.com>
In-Reply-To: <20170103114612.GI16295@nuc-i3427.alporthouse.com>


On 03/01/2017 11:46, Chris Wilson wrote:
> On Tue, Jan 03, 2017 at 11:34:45AM +0000, Tvrtko Ursulin wrote:
>>
>> On 03/01/2017 11:05, Chris Wilson wrote:
>>> The struct dma_fence carries a status field exposed to userspace by
>>> sync_file. This is inspected after the fence is signaled and can convey
>>> whether or not the request completed successfully, or in our case if we
>>> detected a hang during the request (signaled via -EIO in
>>> SYNC_IOC_FILE_INFO).
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>> ---
>>> drivers/gpu/drm/i915/i915_gem.c | 6 ++++--
>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 204c4a673bf3..bc99c0e292d8 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -2757,10 +2757,12 @@ static void i915_gem_reset_engine(struct intel_engine_cs *engine)
>>> 		ring_hung = false;
>>> 	}
>>>
>>> -	if (ring_hung)
>>> +	if (ring_hung) {
>>> 		i915_gem_context_mark_guilty(request->ctx);
>>> -	else
>>> +		request->fence.status = -EIO;
>>> +	} else {
>>> 		i915_gem_context_mark_innocent(request->ctx);
>>> +	}
>>>
>>> 	if (!ring_hung)
>>> 		return;
>>>
>>
>> Reading what happens later in this function, should we set the
>> status of all the other requests we are about to clear?
>>
>> However one thing I don't understand is how this scheme interacts
>> with the current userspace. We will clear/no-op some of the
>> submitted requests since the state is corrupt. But how will
>> userspace notice this before it submits more requests?
>
> There is no mechanism currently for user space to be able to detect a
> hung request. (It can use the uevent for async notification of the
> hang/reset, but that will not tell you who caused the hang.) Userspace
> can track the number of hangs it caused, but the delay makes any
> roundtripping impractical (i.e. you have to synchronise before all
> rendering if you must detect the event immediately). Note also that we
> do not want to give out interprocess information (i.e. to allow one
> client to spy on another), which makes things harder to get right.
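A minimal sketch of the status semantics the patch relies on, written as a self-contained userspace model rather than kernel code. The struct model_fence type and the model_* helpers are hypothetical; the only convention taken from the sync_file UAPI is that a reported status of 1 means signaled, 0 means still active, and a negative value means the fence signaled with an error (here -EIO for a hang). The assert encodes the assumption that the error is recorded before the fence is signaled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a driver fence. */
struct model_fence {
	bool signaled;
	int error; /* 0, or a negative errno recorded before signaling */
};

/* Mirrors the patch: only a hung (guilty) request gets -EIO. */
static void model_reset_request(struct model_fence *f, bool ring_hung)
{
	assert(!f->signaled); /* the error must be set before signaling */
	if (ring_hung)
		f->error = -EIO;
}

static void model_signal(struct model_fence *f)
{
	f->signaled = true;
}

/*
 * Status as userspace would read it via SYNC_IOC_FILE_INFO:
 * 1 = signaled OK, 0 = still active, < 0 = signaled with an error.
 */
static int model_user_status(const struct model_fence *f)
{
	if (!f->signaled)
		return 0;
	return f->error ? f->error : 1;
}
```

Note that the error only becomes visible once the fence has signaled; a waiter polling an active fence still sees 0.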

So the idea is to clear already submitted requests so that, _if_
userspace is synchronising before all rendering and looking at the
reset stats, it is theoretically possible to detect the corrupt state?
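The detection scheme discussed here can be sketched as follows, assuming userspace snapshots its per-context reset counters before submitting, waits for all outstanding rendering, and then snapshots them again. struct model_reset_stats and model_work_was_lost() are hypothetical names; the two counters are loosely modelled on the batch_active and batch_pending counts that i915 reports per context, but no ioctl is issued here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-context reset counters (monotonically increasing). */
struct model_reset_stats {
	uint32_t batch_active;  /* hangs while this context's batch was running */
	uint32_t batch_pending; /* resets that discarded this context's queued work */
};

/*
 * Compare a snapshot taken after synchronising against one taken before
 * submission. Any change means our work was either guilty or lost in a
 * reset, so the client should treat its GPU state as corrupt.
 */
static bool model_work_was_lost(const struct model_reset_stats *before,
				const struct model_reset_stats *after)
{
	assert(before && after);
	return after->batch_active != before->batch_active ||
	       after->batch_pending != before->batch_pending;
}
```

Comparing with != rather than > keeps the check correct even if a 32-bit counter wraps. The cost Chris describes is visible in the assumption itself: the "after" snapshot is only meaningful once all earlier rendering has been waited for.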

Still, regarding the fences, do you agree that the error status needs
to be set on those as well?
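The suggestion of flagging the other cleared requests can be sketched as a walk over the requests queued from the hung one onwards, marking those from the guilty context that the reset path is about to skip. This is a simplified model, not i915 code: struct model_request and model_mark_skipped() are hypothetical, the queue is a plain array in submission order, and it assumes only the guilty context's requests are discarded while other contexts' work is replayed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified request: owning context and its fence error. */
struct model_request {
	int ctx_id;
	int fence_error; /* 0, or a negative errno */
};

/*
 * Mark the hung request and every later request from the same (guilty)
 * context with -EIO, so waiters on any discarded request see the loss.
 * Requests from other contexts are left untouched.
 * Returns the number of requests marked.
 */
static size_t model_mark_skipped(struct model_request *queue, size_t n,
				 size_t hung_idx)
{
	size_t marked = 0;
	int guilty;

	assert(hung_idx < n);
	guilty = queue[hung_idx].ctx_id;

	for (size_t i = hung_idx; i < n; i++) {
		if (queue[i].ctx_id == guilty) {
			queue[i].fence_error = -EIO;
			marked++;
		}
	}
	return marked;
}
```

For example, with contexts {1, 2, 1, 3, 2} and the hang in index 1, only indices 1 and 4 (both context 2) are marked; requests submitted before the hang are not touched.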

Regards,

Tvrtko






_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Thread overview: 19+ messages
2017-01-03 11:05 [PATCH 1/2] dma-fence: Clear fence->status during dma_fence_init() Chris Wilson
2017-01-03 11:05 ` [PATCH 2/2] drm/i915: Set guilty-flag on fence after detecting a hang Chris Wilson
2017-01-03 11:34   ` Tvrtko Ursulin
2017-01-03 11:46     ` [Intel-gfx] " Chris Wilson
2017-01-03 11:57       ` Tvrtko Ursulin [this message]
2017-01-03 12:13         ` Chris Wilson
2017-01-03 12:34           ` [Intel-gfx] " Tvrtko Ursulin
2017-01-03 12:38             ` Chris Wilson
2017-01-03 13:17               ` Tvrtko Ursulin
2017-01-03 13:25                 ` Chris Wilson
2017-01-03 11:53 ` ✗ Fi.CI.BAT: failure for series starting with [1/2] dma-fence: Clear fence->status during dma_fence_init() Patchwork
2017-01-03 14:04 ` [PATCH 1/2] " Tvrtko Ursulin
2017-01-04  9:15   ` [Intel-gfx] " Daniel Vetter
2017-01-04  9:24     ` Chris Wilson
2017-01-04  9:37       ` [Intel-gfx] " Daniel Vetter
2017-01-04  9:43         ` Chris Wilson
2017-01-04 10:18           ` Daniel Vetter
2017-01-04 10:26             ` Chris Wilson
2017-01-04 11:31               ` [Intel-gfx] " Daniel Vetter
