From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/2] drm/i915: Set guilty-flag on fence after detecting a hang
Date: Tue, 3 Jan 2017 13:17:19 +0000
Message-ID: <1943b5da-fd46-9fce-d79b-0ba28ba89bd8@linux.intel.com>
In-Reply-To: <20170103123824.GL16295@nuc-i3427.alporthouse.com>

On 03/01/2017 12:38, Chris Wilson wrote:
> On Tue, Jan 03, 2017 at 12:34:16PM +0000, Tvrtko Ursulin wrote:
>>
>> On 03/01/2017 12:13, Chris Wilson wrote:
>>> On Tue, Jan 03, 2017 at 11:57:44AM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 03/01/2017 11:46, Chris Wilson wrote:
>>>>> On Tue, Jan 03, 2017 at 11:34:45AM +0000, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 03/01/2017 11:05, Chris Wilson wrote:
>>>>>>> The struct dma_fence carries a status field exposed to userspace by
>>>>>>> sync_file. This is inspected after the fence is signaled and can convey
>>>>>>> whether or not the request completed successfully, or in our case if we
>>>>>>> detected a hang during the request (signaled via -EIO in
>>>>>>> SYNC_IOC_FILE_INFO).
>>>>>>>
>>>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/i915/i915_gem.c | 6 ++++--
>>>>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>>>>> index 204c4a673bf3..bc99c0e292d8 100644
>>>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>>>> @@ -2757,10 +2757,12 @@ static void i915_gem_reset_engine(struct intel_engine_cs *engine)
>>>>>>>  		ring_hung = false;
>>>>>>>  	}
>>>>>>>
>>>>>>> -	if (ring_hung)
>>>>>>> +	if (ring_hung) {
>>>>>>>  		i915_gem_context_mark_guilty(request->ctx);
>>>>>>> -	else
>>>>>>> +		request->fence.status = -EIO;
>>>>>>> +	} else {
>>>>>>>  		i915_gem_context_mark_innocent(request->ctx);
>>>>>>> +	}
>>>>>>>
>>>>>>>  	if (!ring_hung)
>>>>>>>  		return;
>>>>>>>
>>>>>>
>>>>>> Reading what happens later in this function, should we set the
>>>>>> status of all the other requests we are about to clear?
>>>>>>
>>>>>> However, one thing I don't understand is how this scheme interacts
>>>>>> with the current userspace. We will clear/no-op some of the
>>>>>> submitted requests since the state is corrupt. But how will
>>>>>> userspace notice this before it submits more requests?
>>>>>
>>>>> There is no mechanism currently for user space to be able to detect a
>>>>> hung request. (It can use the uevent for async notification of the
>>>>> hang/reset, but that will not tell you who caused the hang.) Userspace
>>>>> can track the number of hangs it caused, but the delay makes any
>>>>> roundtripping impractical (i.e. you have to synchronise before all
>>>>> rendering if you must detect the event immediately). Note also that we
>>>>> do not want to give out interprocess information (i.e. to allow one
>>>>> client to spy on another), which makes things harder to get right.
>>>>
>>>> So the idea is to clear already submitted requests _if_ the userspace is
>>>> synchronising before all rendering and looking at reset stats, to
>>>> make it theoretically possible to detect the corrupt state?
>>>
>>> No, I just don't see a way that userspace can detect the hang without
>>> testing after seeing the request signaled (either by waiting on the
>>> batch or by waiting on the fence), i.e. by being completely synchronous
>>> (or at least choosing its synchronous points very carefully, such as
>>> around IPC). It can either poll reset-count or use sync_file (which
>>> requires fence exporting).
>>>
>>> The current robustness interface is a basic query on whether any reset
>>> occurred within the context, not when.
>>
>> Why do we bother with clearing the submitted requests then?
>
> The same reason we ban processes from submitting new requests if they
> cause repeated hangs. If before we ban that client, it has already
> submitted 1000 hanging requests, it has successfully locked the machine
> up for a couple of hours.

So we would need to gate clearing on the transition to the banned state, I
think, because currently it does it unconditionally.
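
To make the gating idea concrete, here is a minimal standalone sketch in plain C. It is not the i915 code: struct ctx, struct request, mark_guilty(), reset_after_hang() and the ban_threshold field are all hypothetical stand-ins invented for illustration. It also assumes that cancelled requests would get -EIO on their fence, which is the open question raised earlier in the thread rather than something the patch does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a context and a request. */
struct ctx {
	unsigned int guilty_count;	/* hangs attributed to this context */
	unsigned int ban_threshold;	/* assumed ban policy: ban at this count */
	bool banned;
};

struct request {
	struct ctx *ctx;
	int fence_status;	/* 0 = no error, -EIO = hang detected */
	bool cancelled;		/* request was cleared and will not execute */
};

/* Record a hang; return true only on the transition into the banned state. */
static bool mark_guilty(struct ctx *ctx)
{
	bool was_banned = ctx->banned;

	ctx->guilty_count++;
	if (ctx->guilty_count >= ctx->ban_threshold)
		ctx->banned = true;

	return ctx->banned && !was_banned;
}

/*
 * Handle a hang caused by queue[hung]. The hung request always reports
 * -EIO, but requests queued behind it by the same context are cleared
 * only when this hang is the one that gets the context banned.
 * Returns the number of requests cleared.
 */
static size_t reset_after_hang(struct request *queue, size_t count, size_t hung)
{
	struct ctx *guilty = queue[hung].ctx;
	size_t cleared = 0;
	size_t i;

	queue[hung].fence_status = -EIO;

	if (!mark_guilty(guilty))
		return 0;	/* not banned by this hang: leave the queue alone */

	for (i = hung + 1; i < count; i++) {
		if (queue[i].ctx != guilty)
			continue;	/* other contexts are untouched */
		queue[i].cancelled = true;
		queue[i].fence_status = -EIO;	/* assumption, see above */
		cleared++;
	}

	return cleared;
}
```

With a threshold of two, the first hang from a context only flags the hung request, and the second hang both bans the context and clears whatever it still has queued.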
Regards,
Tvrtko