From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Cc: stable@vger.kernel.org
Subject: Re: [Intel-gfx] [PATCH 2/4] drm/i915: Cancel outstanding work after disabling heartbeats on an engine
Date: Thu, 24 Sep 2020 14:35:47 +0100
Message-ID: <756c5947-dcc4-0a4e-e5b7-960d0afe8fd3@linux.intel.com>
In-Reply-To: <20200916094219.3878-2-chris@chris-wilson.co.uk>
On 16/09/2020 10:42, Chris Wilson wrote:
> We only allow persistent requests to remain on the GPU past the closure
> of their containing context (and process) so long as they are continuously
> checked for hangs or allow other requests to preempt them, as we need to
> ensure forward progress of the system. If we allow persistent contexts
> to remain on the system after the hangcheck mechanism is disabled,
> the system may grind to a halt. On disabling the mechanism, we sent a
> pulse along the engine to remove all executing contexts from the engine
> which would check for hung contexts -- but we did not prevent those
> contexts from being resubmitted if they survived the final hangcheck.
>
> Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs")
> Testcase: igt/gem_ctx_persistence/heartbeat-stop
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: <stable@vger.kernel.org> # v5.7+
> ---
> drivers/gpu/drm/i915/gt/intel_engine.h | 9 +++++++++
> drivers/gpu/drm/i915/i915_request.c | 5 +++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> index 08e2c000dcc3..7c3a1012e702 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -337,4 +337,13 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
>  	return intel_engine_has_preemption(engine);
>  }
>
> +static inline bool
> +intel_engine_has_heartbeat(const struct intel_engine_cs *engine)
> +{
> +	if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL))
> +		return false;
> +
> +	return READ_ONCE(engine->props.heartbeat_interval_ms);
> +}
> +
>  #endif /* _INTEL_RINGBUFFER_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 436ce368ddaa..0e813819b041 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -542,8 +542,13 @@ bool __i915_request_submit(struct i915_request *request)
>  	if (i915_request_completed(request))
>  		goto xfer;
>
> +	if (unlikely(intel_context_is_closed(request->context) &&
> +		     !intel_engine_has_heartbeat(engine)))
> +		intel_context_set_banned(request->context);
> +
>  	if (unlikely(intel_context_is_banned(request->context)))
>  		i915_request_set_error_once(request, -EIO);
> +
>  	if (unlikely(fatal_error(request->fence.error)))
>  		__i915_request_skip(request);
>
>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Regards,
Tvrtko
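
For readers following the thread, the submit-time rule the patch adds can be
sketched outside the driver. This is a simplified userspace model, not the
i915 code: the model_* types and functions are hypothetical stand-ins, the
CONFIG_DRM_I915_HEARTBEAT_INTERVAL check is omitted, and the driver's
fatal_error() test is reduced to a comparison against -EIO.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver structures (assumption). */
struct model_engine {
	unsigned long heartbeat_interval_ms;	/* 0 means heartbeats are off */
};

struct model_context {
	bool closed;	/* the owning context has been closed */
	bool banned;	/* no further payloads may execute */
};

struct model_request {
	struct model_context *context;
	int error;	/* 0, or a negative errno once set */
};

/* Mirrors the idea of intel_engine_has_heartbeat(), minus the Kconfig test. */
static bool model_engine_has_heartbeat(const struct model_engine *engine)
{
	return engine->heartbeat_interval_ms != 0;
}

/*
 * Models the decision made at submission: a closed context on an engine
 * without heartbeats is banned, a banned context marks its request with
 * -EIO (only once), and such a request has its payload skipped.
 * Returns true when the payload would be skipped.
 */
static bool model_request_submit(const struct model_engine *engine,
				 struct model_request *rq)
{
	if (rq->context->closed && !model_engine_has_heartbeat(engine))
		rq->context->banned = true;

	if (rq->context->banned && !rq->error)
		rq->error = -EIO;

	return rq->error == -EIO;
}
```

In this model a closed context keeps running while the engine has a non-zero
heartbeat interval, and is banned on its next submission once the interval
is set to zero.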