From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915: Always run hangcheck while the GPU is busy
Date: Tue, 30 Jan 2018 14:18:17 +0200 [thread overview]
Message-ID: <87y3kfwnee.fsf@gaia.fi.intel.com> (raw)
In-Reply-To: <20180129144104.3921-1-chris@chris-wilson.co.uk>
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Previously, we relied on only running the hangcheck while somebody was
> waiting on the GPU, in order to minimise the amount of time hangcheck
> had to run. (If nobody was watching the GPU, nobody would notice if the
> GPU wasn't responding -- eventually somebody would care and so kick
> hangcheck into action.) However, this falls apart from around commit
> 4680816be336 ("drm/i915: Wait first for submission, before waiting for
> request completion"), as not all waiters declare themselves to hangcheck
> and so we could switch off hangcheck and miss GPU hangs even when
> waiting under the struct_mutex.
>
> If we enable hangcheck from the first request submission, and let it run
> until the GPU is idle again, we forgo all the complexity involved with
> only enabling around waiters. Instead we have to be careful that we do
> not declare a GPU hang when idly waiting for the next request to be come
> ready.
For the complexity part I agree that this is simple and elegant. But
I think I have not understood it fully as I don't connect the part where
we need to be careful in idly waiting for next request.
Could you elaborate and point it the relevant portion in the patch for it?
-Mika
>
> Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion"
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 7 +++----
> drivers/gpu/drm/i915/i915_gem_request.c | 2 ++
> drivers/gpu/drm/i915/intel_breadcrumbs.c | 11 -----------
> drivers/gpu/drm/i915/intel_hangcheck.c | 7 +------
> 4 files changed, 6 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 062b21408698..63308ec016a3 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3322,16 +3322,15 @@ i915_gem_retire_work_handler(struct work_struct *work)
> mutex_unlock(&dev->struct_mutex);
> }
>
> - /* Keep the retire handler running until we are finally idle.
> + /*
> + * Keep the retire handler running until we are finally idle.
> * We do not need to do this test under locking as in the worst-case
> * we queue the retire worker once too often.
> */
> - if (READ_ONCE(dev_priv->gt.awake)) {
> - i915_queue_hangcheck(dev_priv);
> + if (READ_ONCE(dev_priv->gt.awake))
> queue_delayed_work(dev_priv->wq,
> &dev_priv->gt.retire_work,
> round_jiffies_up_relative(HZ));
> - }
> }
>
> static void shrink_caches(struct drm_i915_private *i915)
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 9b02310307fc..6cacd78cc849 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -393,6 +393,8 @@ static void mark_busy(struct drm_i915_private *i915)
>
> intel_engines_unpark(i915);
>
> + i915_queue_hangcheck(i915);
> +
> queue_delayed_work(i915->wq,
> &i915->gt.retire_work,
> round_jiffies_up_relative(HZ));
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 86acac010bb8..62b2a20bc24e 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -149,17 +149,6 @@ static void intel_breadcrumbs_fake_irq(struct timer_list *t)
> return;
>
> mod_timer(&b->fake_irq, jiffies + 1);
> -
> - /* Ensure that even if the GPU hangs, we get woken up.
> - *
> - * However, note that if no one is waiting, we never notice
> - * a gpu hang. Eventually, we will have to wait for a resource
> - * held by the GPU and so trigger a hangcheck. In the most
> - * pathological case, this will be upon memory starvation! To
> - * prevent this, we also queue the hangcheck from the retire
> - * worker.
> - */
> - i915_queue_hangcheck(engine->i915);
> }
>
> static void irq_enable(struct intel_engine_cs *engine)
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index 31f01d64c021..348a4f7ffb67 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -411,7 +411,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> struct intel_engine_cs *engine;
> enum intel_engine_id id;
> unsigned int hung = 0, stuck = 0;
> - int busy_count = 0;
>
> if (!i915_modparams.enable_hangcheck)
> return;
> @@ -429,7 +428,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
>
> for_each_engine(engine, dev_priv, id) {
> - const bool busy = intel_engine_has_waiter(engine);
> struct intel_engine_hangcheck hc;
>
> semaphore_clear_deadlocks(dev_priv);
> @@ -443,16 +441,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> if (hc.action != ENGINE_DEAD)
> stuck |= intel_engine_flag(engine);
> }
> -
> - busy_count += busy;
> }
>
> if (hung)
> hangcheck_declare_hang(dev_priv, hung, stuck);
>
> /* Reset timer in case GPU hangs without another request being added */
> - if (busy_count)
> - i915_queue_hangcheck(dev_priv);
> + i915_queue_hangcheck(dev_priv);
> }
>
> void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
> --
> 2.15.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2018-01-30 12:18 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-29 14:41 [PATCH] drm/i915: Always run hangcheck while the GPU is busy Chris Wilson
2018-01-29 15:05 ` ✓ Fi.CI.BAT: success for " Patchwork
2018-01-29 23:41 ` [PATCH] " Antonio Argenziano
2018-01-30 12:18 ` Mika Kuoppala [this message]
2018-01-30 12:24 ` Chris Wilson
2018-01-30 13:00 ` Mika Kuoppala
2018-01-30 15:52 ` Chris Wilson
2018-01-31 9:41 ` Mika Kuoppala
2018-01-31 10:09 ` Chris Wilson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87y3kfwnee.fsf@gaia.fi.intel.com \
--to=mika.kuoppala@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.