From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915: Always run hangcheck while the GPU is busy
Date: Tue, 30 Jan 2018 15:00:50 +0200
Message-ID: <87vafjwlfh.fsf@gaia.fi.intel.com>
In-Reply-To: <151731504873.10214.10161176460418745617@mail.alporthouse.com>
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Quoting Mika Kuoppala (2018-01-30 12:18:17)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>>
>> > Previously, we relied on only running the hangcheck while somebody was
>> > waiting on the GPU, in order to minimise the amount of time hangcheck
>> > had to run. (If nobody was watching the GPU, nobody would notice if the
>> > GPU wasn't responding -- eventually somebody would care and so kick
>> > hangcheck into action.) However, this falls apart from around commit
>> > 4680816be336 ("drm/i915: Wait first for submission, before waiting for
>> > request completion"), as not all waiters declare themselves to hangcheck
>> > and so we could switch off hangcheck and miss GPU hangs even when
>> > waiting under the struct_mutex.
>> >
>> > If we enable hangcheck from the first request submission, and let it run
>> > until the GPU is idle again, we forgo all the complexity involved with
>> > only enabling around waiters. Instead we have to be careful that we do
>> > not declare a GPU hang when idly waiting for the next request to become
>> > ready.
>>
>> For the complexity part I agree that this is simple and elegant. But
>> I think I have not understood it fully as I don't connect the part where
>> we need to be careful in idly waiting for next request.
>> Could you elaborate and point out the relevant portion of the patch?
>
> It's not in this patch, it's just relating to the experiences we've had
> previously in compensating for an engine with requests scheduled waiting
> for a signal, making sure we treated those engines as idle rather than
> stuck.

Ok. Perhaps the last sentence can be omitted then.

I tried to see whether we could somehow miss an idle engine check
and declare a false hang by running hangcheck on hardware that has
only just idled.

I could not find a clear way for that to happen, but as gt.awake is
now the master, should it be the first thing we check in
intel_engine_is_idle() to limit how far we look into the rabbit hole?
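
To make the question concrete, here is a minimal sketch of such an
early-out. It uses hypothetical stand-in types, not the real i915
structures: sketch_gt, sketch_engine, the seqno fields and ring_busy are
all assumptions for illustration, and the actual checks in
intel_engine_is_idle() may look quite different.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the real i915 structures. */
struct sketch_gt {
	bool awake;	/* models the role of gt.awake as the master state */
};

struct sketch_engine {
	const struct sketch_gt *gt;
	unsigned int last_submitted_seqno;	/* assumed bookkeeping */
	unsigned int last_completed_seqno;	/* assumed bookkeeping */
	bool ring_busy;				/* assumed hardware state */
};

static bool sketch_engine_is_idle(const struct sketch_engine *engine)
{
	/* Early-out: if the device is parked, do not look any deeper. */
	if (!engine->gt->awake)
		return true;

	/* Outstanding requests mean the engine is still busy. */
	if (engine->last_completed_seqno != engine->last_submitted_seqno)
		return false;

	/* Finally fall back to the (assumed) per-engine hardware state. */
	return !engine->ring_busy;
}

/* Usage example: the master flag short-circuits the per-engine checks. */
static void sketch_selftest(void)
{
	struct sketch_gt gt = { .awake = false };
	struct sketch_engine engine = {
		.gt = &gt,
		.last_submitted_seqno = 5,
		.last_completed_seqno = 3,
		.ring_busy = true,
	};

	assert(sketch_engine_is_idle(&engine));		/* parked: idle */

	gt.awake = true;
	assert(!sketch_engine_is_idle(&engine));	/* requests pending */
}
```

With the flag checked first, a parked device answers "idle" without
touching any per-engine state, which is the limit on the rabbit hole I
had in mind.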
-Mika
> -Chris