Intel-GFX Archive on lore.kernel.org
* [Intel-gfx] [PATCH 0/6] Default request/fence expiry + watchdog
@ 2021-03-16 16:23 Tvrtko Ursulin
From: Tvrtko Ursulin @ 2021-03-16 16:23 UTC
  To: Intel-gfx; +Cc: Daniel Vetter, dri-devel

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

"Watchdog" aka "restoring hangcheck" aka default request/fence expiry - second
post of a somewhat controversial feature, now upgraded to patch status.

I quote "watchdog" because in the classical sense a watchdog would allow
userspace to ping it and so remain alive.

I quote "restoring hangcheck" because this series, unlike the old hangcheck,
does not look at whether the workload is making any progress from the kernel
side either. (Disclaimer: my memory may be leaky - Daniel suspects the old
hangcheck had some stricter, more indiscriminate, angles to it. But apart from
it being prone to both false negatives and false positives, I can't remember
that myself.)

Short version - the ask is to fail any user submission after a set time
period. In this RFC that time is twelve seconds.

Time counts from the moment a user submission is "runnable" (implicit and
explicit dependencies have been cleared) and keeps counting regardless of the
GPU contention caused by other users of the system.
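
To make the consequence of that concrete, here is a toy model of the timing
rule as described above. It is only a sketch, not driver code: the Submission
fields, the expires() helper and the strict ">" comparison are all made up
for illustration, and only the twelve second figure comes from this cover
letter.

```python
from dataclasses import dataclass

# Twelve seconds, as quoted for this RFC.
EXPIRY_MS = 12_000


@dataclass
class Submission:
    # Hypothetical fields, for illustration only.
    runnable_at_ms: int  # dependencies cleared; the expiry clock starts here
    gpu_start_ms: int    # when the GPU actually starts executing the work
    gpu_runtime_ms: int  # how long the work needs once it is executing


def expires(sub: Submission, expiry_ms: int = EXPIRY_MS) -> bool:
    """Return True if the submission would be failed under this model.

    The clock runs from "runnable" to completion, so time spent queued
    behind other clients counts against the submission as well.
    """
    finished_at_ms = sub.gpu_start_ms + sub.gpu_runtime_ms
    return finished_at_ms - sub.runnable_at_ms > expiry_ms


# Five seconds of work on an idle GPU completes in time.
idle_gpu = Submission(runnable_at_ms=0, gpu_start_ms=0, gpu_runtime_ms=5_000)

# The same five seconds of work, queued for ten seconds behind other
# clients, is failed even though it never ran for long itself.
busy_gpu = Submission(runnable_at_ms=0, gpu_start_ms=10_000, gpu_runtime_ms=5_000)
```

In this model the second submission is failed purely because of contention
from other users, which is the weakness discussed next.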

So semantics are really a bit weak, but again, I understand this is really
really wanted by the DRM core even if I am not convinced it is a good idea.

There are two dangers with doing this, text borrowed from a patch in the series:

    This can have an effect that workloads which used to work fine will
    suddenly start failing.

    Another interaction is with hangcheck, where care needs to be taken that
    the timeout is not set lower than, or close to, three times the heartbeat
    interval. Otherwise a hang in any application can cause complete
    termination of all submissions from unrelated clients. Any users modifying
    the per engine heartbeat intervals therefore need to be aware of this
    potential denial of service to avoid inadvertently enabling it.
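
The relation between the expiry and the heartbeat interval can be written
down as a quick sanity check. This is a sketch under assumptions: the helper
name and the margin_ms parameter are hypothetical ("close to" is not
quantified in the patch text), and the 2500 ms heartbeat value in the example
is only an assumed figure, not something stated in this series.

```python
def expiry_is_safe(expiry_ms: int, heartbeat_interval_ms: int,
                   margin_ms: int = 0) -> bool:
    """Check that the request expiry sits above three heartbeat intervals.

    margin_ms is a hypothetical knob standing in for "not close to":
    the expiry has to exceed three heartbeat intervals by more than
    this amount to count as safe.
    """
    return expiry_ms > 3 * heartbeat_interval_ms + margin_ms


# Example values only: twelve second expiry from this RFC, and an assumed
# 2500 ms per engine heartbeat interval.
ok_with_assumed_heartbeat = expiry_is_safe(12_000, 2_500)

# Raising the heartbeat interval to four seconds makes three intervals
# equal to the expiry, which this check treats as unsafe.
ok_with_raised_heartbeat = expiry_is_safe(12_000, 4_000)
```

Anyone changing the per engine heartbeat interval could run the equivalent
check against whatever expiry is configured on their system.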

v2:
 * Dropped context param.
 * Improved commit messages and Kconfig text.

Test-with: 20210316161840.1993818-1-tvrtko.ursulin@linux.intel.com
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Chris Wilson (1):
  drm/i915: Individual request cancellation

Tvrtko Ursulin (5):
  drm/i915: Restrict sentinel requests further
  drm/i915: Handle async cancellation in sentinel assert
  drm/i915: Request watchdog infrastructure
  drm/i915: Fail too long user submissions by default
  drm/i915: Allow configuring default request expiry via modparam

 drivers/gpu/drm/i915/Kconfig.profile          |  14 ++
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  39 ++++
 .../gpu/drm/i915/gem/i915_gem_context_types.h |   4 +
 drivers/gpu/drm/i915/gt/intel_context_param.h |  11 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   4 +
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   1 +
 .../drm/i915/gt/intel_execlists_submission.c  |  16 +-
 .../drm/i915/gt/intel_execlists_submission.h  |   2 +
 drivers/gpu/drm/i915/gt/intel_gt.c            |   3 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |   2 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   |  21 ++
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |   7 +
 drivers/gpu/drm/i915/i915_params.c            |   5 +
 drivers/gpu/drm/i915/i915_params.h            |   1 +
 drivers/gpu/drm/i915/i915_request.c           | 108 +++++++++-
 drivers/gpu/drm/i915/i915_request.h           |  12 +-
 drivers/gpu/drm/i915/selftests/i915_request.c | 201 ++++++++++++++++++
 17 files changed, 442 insertions(+), 9 deletions(-)

-- 
2.27.0



Thread overview: 8+ messages
2021-03-16 16:23 [Intel-gfx] [PATCH 0/6] Default request/fence expiry + watchdog Tvrtko Ursulin
2021-03-16 16:23 ` [Intel-gfx] [PATCH 1/6] drm/i915: Individual request cancellation Tvrtko Ursulin
2021-03-16 16:23 ` [Intel-gfx] [PATCH 2/6] drm/i915: Restrict sentinel requests further Tvrtko Ursulin
2021-03-16 16:23 ` [Intel-gfx] [PATCH 3/6] drm/i915: Handle async cancellation in sentinel assert Tvrtko Ursulin
2021-03-16 16:23 ` [Intel-gfx] [PATCH 4/6] drm/i915: Request watchdog infrastructure Tvrtko Ursulin
2021-03-16 16:23 ` [Intel-gfx] [PATCH 5/6] drm/i915: Fail too long user submissions by default Tvrtko Ursulin
2021-03-16 16:23 ` [Intel-gfx] [PATCH 6/6] drm/i915: Allow configuring default request expiry via modparam Tvrtko Ursulin
2021-03-16 17:36 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for Default request/fence expiry + watchdog (rev2) Patchwork
