* [PATCH] drm/i915: Taint (TAINT_DIE) the kernel if the GPU reset fails
@ 2017-11-29 13:59 Chris Wilson
From: Chris Wilson @ 2017-11-29 13:59 UTC
  To: intel-gfx; +Cc: Daniel Vetter

History tells us that if we cannot reset the GPU now, we never will. This
then impacts everything that is run subsequently. On failing the reset,
we mark the driver as wedged to try to prevent further execution on the
GPU, forcing userspace to fall back to using the CPU to update its
framebuffers and to let the user know what happened.

We also want to go one step further and add a taint to the kernel so that
any subsequent faults can be traced back to this failure. This is
important for igt, where if the GPU/driver fails we want to reboot and
restart testing rather than continue on into oblivion.

TAINT_DIE is colloquially known as "system on fire", which seems
appropriate for unresponsive hardware.
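
For illustration only, here is a minimal userspace sketch of how a test
runner could notice the taint and decide to reboot. It is not part of the
patch and not igt's actual code. It assumes the taint mask is read as a
decimal number from /proc/sys/kernel/tainted and that TAINT_DIE and
TAINT_WARN are bits 7 and 9, as in the kernel headers. The helper names
(taint_is_set, read_taint_mask, should_reboot) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Bit positions mirrored from the kernel headers (assumed unchanged). */
#define TAINT_DIE_BIT  7
#define TAINT_WARN_BIT 9

/* Test a single taint bit in a mask read from /proc/sys/kernel/tainted. */
bool taint_is_set(unsigned long mask, unsigned int bit)
{
	assert(bit < 8 * sizeof(mask));
	return (mask >> bit) & 1UL;
}

/*
 * Read the decimal taint mask from the given path.
 * Returns false if the file cannot be opened or parsed.
 */
bool read_taint_mask(const char *path, unsigned long *mask)
{
	FILE *f = fopen(path, "r");
	bool ok;

	if (!f)
		return false;

	ok = fscanf(f, "%lu", mask) == 1;
	fclose(f);
	return ok;
}

/* Hypothetical runner policy: reboot once TAINT_DIE has been set. */
bool should_reboot(unsigned long mask)
{
	return taint_is_set(mask, TAINT_DIE_BIT);
}
```

A runner could call read_taint_mask("/proc/sys/kernel/tainted", &mask)
after each test and stop to reboot when should_reboot(mask) is true.
Checking TAINT_WARN_BIT instead would be the equivalent test for a
TAINT_WARN taint.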

References: https://bugs.freedesktop.org/show_bug.cgi?id=103514
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 696d5cdf2779..f08343be880c 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1904,10 +1904,24 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
 
 	ret = intel_gpu_reset(i915, ALL_ENGINES);
 	if (ret) {
-		if (ret != -ENODEV)
-			DRM_ERROR("Failed to reset chip: %i\n", ret);
-		else
+		/*
+		 * History tells us that if we cannot reset the GPU now, we
+		 * never will. This then impacts everything that is run
+		 * subsequently. On failing the reset, we mark the driver
+		 * as wedged, preventing further execution on the GPU.
+		 * We also want to go one step further and add a taint to the
+		 * kernel so that any subsequent faults can be traced back to
+		 * this failure. This is important for igt, where if the
+		 * GPU/driver fails we want to reboot and restart testing
+		 * rather than continue on into oblivion.
+		 */
+		if (ret != -ENODEV) {
+			dev_err(i915->drm.dev,
+				"Failed to reset chip: %i\n", ret);
+			add_taint(TAINT_DIE, LOCKDEP_STILL_OK);
+		} else {
 			DRM_DEBUG_DRIVER("GPU reset disabled\n");
+		}
 		goto error;
 	}
 
-- 
2.15.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Thread overview: 13+ messages
2017-11-29 13:59 [PATCH] drm/i915: Taint (TAINT_DIE) the kernel if the GPU reset fails Chris Wilson
2017-11-29 14:05 ` [PATCH v2] " Chris Wilson
2017-11-30 12:24   ` Lofstedt, Marta
2017-12-04 13:41   ` Joonas Lahtinen
2017-12-04 13:45     ` Chris Wilson
2017-12-05 16:56     ` Chris Wilson
2017-12-05 17:06   ` Chris Wilson
2017-11-30 10:02 ` ✗ Fi.CI.BAT: failure for drm/i915: Taint (TAINT_DIE) the kernel if the GPU reset fails (rev2) Patchwork
2017-11-30 14:15 ` Patchwork
2017-12-05 17:26 ` [PATCH v3] drm/i915: Taint (TAINT_WARN) the kernel if the GPU reset fails Chris Wilson
2017-12-05 17:27 ` [PATCH v4] " Chris Wilson
2017-12-05 18:34 ` ✓ Fi.CI.BAT: success for drm/i915: Taint (TAINT_DIE) the kernel if the GPU reset fails (rev4) Patchwork
2017-12-05 21:09 ` ✓ Fi.CI.IGT: " Patchwork