* GPU and machine hang fixes, take 2
@ 2016-04-09 9:27 Chris Wilson
2016-04-09 9:27 ` [PATCH 1/4] drm/i915: Prevent machine death on Ivybridge context switching Chris Wilson
` (5 more replies)
0 siblings, 6 replies; 12+ messages in thread
From: Chris Wilson @ 2016-04-09 9:27 UTC (permalink / raw)
To: intel-gfx
Please review or even just ack!
Tvrtko, please have another look at 4/4 as it impacts upon requests and
both the state we track in a request and activity tracked by requests.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/4] drm/i915: Prevent machine death on Ivybridge context switching
2016-04-09 9:27 GPU and machine hang fixes, take 2 Chris Wilson
@ 2016-04-09 9:27 ` Chris Wilson
2016-04-09 9:27 ` [PATCH 2/4] drm/i915: Force ringbuffers to not be at offset 0 Chris Wilson
` (4 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2016-04-09 9:27 UTC (permalink / raw)
To: intel-gfx; +Cc: stable
Two concurrent writes into the same register cacheline have the chance of
killing the machine on Ivybridge and other gen7. This includes LRI
emitted from the command parser. The MI_SET_CONTEXT itself serves as a
serialising barrier and prevents the pair of register writes in the first
packet from triggering the fault. However, if a second context switch
immediately occurs then we may have two adjacent blocks of LRI to the
same registers, which may then trigger the hang. To counteract this we
need to insert a delay after the second register write using an SRM.
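The ring-length arithmetic in the first hunk below can be sanity-checked with a small model (the helper and its name are illustrative assumptions, not the real i915 code): with the fix, the gen7 workaround block grows by exactly the four dwords of the SRM delay.

```c
#include <assert.h>

/* Toy model of the len computation in mi_set_context() (names are
 * assumptions for illustration; see the diff for the real code).
 * With the fix, the per-switch workaround grows from 4*num_rings + 2
 * to 4*num_rings + 6 dwords: the four extra dwords are the SRM delay
 * (header, register, GTT address, MI_NOOP).
 */
static int gen7_wa_dwords(int num_rings, int with_srm_delay)
{
	int len = 2;
	if (num_rings)
		len += 4 * num_rings + (with_srm_delay ? 6 : 2);
	return len;
}
```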
This is easiest to reproduce with something like
igt/gem_ctx_switch/interruptible that triggers back-to-back context
switches (with no operations in between them in the command stream,
which requires the execbuf operation to be interrupted after the
MI_SET_CONTEXT), but it can be observed sporadically elsewhere when
running interruptible igt. No reports from the wild though, so it must
be of low enough frequency that no one has correlated the random machine
freezes with i915.ko.
The issue was introduced with
commit 2c550183476dfa25641309ae9a28d30feed14379 [v3.19]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Dec 16 10:02:27 2014 +0000
drm/i915: Disable PSMI sleep messages on all rings around context switches
Testcase: igt/gem_ctx_switch/render-interruptible #ivb
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
---
drivers/gpu/drm/i915/i915_gem_context.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index fe580cb9501a..e5ad7b21e356 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -539,7 +539,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
len = 4;
if (INTEL_INFO(engine->dev)->gen >= 7)
- len += 2 + (num_rings ? 4*num_rings + 2 : 0);
+ len += 2 + (num_rings ? 4*num_rings + 6 : 0);
ret = intel_ring_begin(req, len);
if (ret)
@@ -579,6 +579,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
if (INTEL_INFO(engine->dev)->gen >= 7) {
if (num_rings) {
struct intel_engine_cs *signaller;
+ i915_reg_t last_reg = {}; /* keep gcc quiet */
intel_ring_emit(engine,
MI_LOAD_REGISTER_IMM(num_rings));
@@ -586,11 +587,19 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
if (signaller == engine)
continue;
- intel_ring_emit_reg(engine,
- RING_PSMI_CTL(signaller->mmio_base));
+ last_reg = RING_PSMI_CTL(signaller->mmio_base);
+ intel_ring_emit_reg(engine, last_reg);
intel_ring_emit(engine,
_MASKED_BIT_DISABLE(GEN6_PSMI_SLEEP_MSG_DISABLE));
}
+
+ /* Insert a delay before the next switch! */
+ intel_ring_emit(engine,
+ MI_STORE_REGISTER_MEM |
+ MI_SRM_LRM_GLOBAL_GTT);
+ intel_ring_emit_reg(engine, last_reg);
+ intel_ring_emit(engine, engine->scratch.gtt_offset);
+ intel_ring_emit(engine, MI_NOOP);
}
intel_ring_emit(engine, MI_ARB_ON_OFF | MI_ARB_ENABLE);
}
--
2.8.0.rc3
* [PATCH 2/4] drm/i915: Force ringbuffers to not be at offset 0
2016-04-09 9:27 GPU and machine hang fixes, take 2 Chris Wilson
2016-04-09 9:27 ` [PATCH 1/4] drm/i915: Prevent machine death on Ivybridge context switching Chris Wilson
@ 2016-04-09 9:27 ` Chris Wilson
2016-04-09 9:39 ` Chris Wilson
2016-04-09 9:27 ` [PATCH 3/4] drm/i915: Move the mb() following release-mmap into release-mmap Chris Wilson
` (3 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-04-09 9:27 UTC (permalink / raw)
To: intel-gfx; +Cc: Tvrtko Ursulin, Chris Wilson, stable
For reasons unknown, Sandybridge GT1 (at least) will eventually hang when
it encounters a ring wraparound at offset 0. The test case that
reproduces the bug reliably forces a large number of interrupted context
switches, thereby causing very frequent ring wraparounds, but there are
similar bug reports in the wild with the same symptoms: seqno writes
stop just before the wrap, with the ringbuffer at address 0. The issue
is also timing-sensitive, but adding various delays hasn't helped
pinpoint where the window lies.
Whether the fault is restricted to the ringbuffer itself or the GTT
addressing is unclear, but moving the ringbuffer fixes all the hangs I
have been able to reproduce.
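The fix works by passing a minimum-offset constraint to the pinning call. A rough model of that flags encoding (the bit positions below are assumptions, not the real i915 values) shows why `PIN_OFFSET_BIAS | 4096` can live in a single flags word: the bias is page-aligned, so it never collides with the low flag bits.

```c
#include <assert.h>

/* Illustrative model of the pin flags: i915 packs boolean flags into
 * the low, sub-page bits and a page-aligned minimum offset into the
 * high bits of the same word (the bit positions here are assumed).
 */
#define PIN_MAPPABLE     (1u << 0)
#define PIN_OFFSET_BIAS  (1u << 3)
#define PIN_OFFSET_MASK  (~4095u)	/* page-aligned bias bits */

/* Minimum GGTT offset the allocator must honour for these flags. */
static unsigned int pin_min_offset(unsigned int flags)
{
	if (flags & PIN_OFFSET_BIAS)
		return flags & PIN_OFFSET_MASK;
	return 0;
}
```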
References: (e.g.) https://bugs.freedesktop.org/show_bug.cgi?id=93262
Testcase: igt/gem_exec_whisper/render-contexts-interruptible #snb-gt1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
---
drivers/gpu/drm/i915/intel_ringbuffer.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6b4952031e30..2f13ab2a1098 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2113,10 +2113,12 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
struct drm_i915_private *dev_priv = to_i915(dev);
struct i915_ggtt *ggtt = &dev_priv->ggtt;
struct drm_i915_gem_object *obj = ringbuf->obj;
+ /* Ring wraparound at offset 0 sometimes hangs. No idea why. */
+ unsigned flags = PIN_OFFSET_BIAS | 4096;
int ret;
if (HAS_LLC(dev_priv) && !obj->stolen) {
- ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, 0);
+ ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, flags);
if (ret)
return ret;
@@ -2132,7 +2134,8 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
return -ENOMEM;
}
} else {
- ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
+ ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE,
+ flags | PIN_MAPPABLE);
if (ret)
return ret;
--
2.8.0.rc3
* [PATCH 3/4] drm/i915: Move the mb() following release-mmap into release-mmap
2016-04-09 9:27 GPU and machine hang fixes, take 2 Chris Wilson
2016-04-09 9:27 ` [PATCH 1/4] drm/i915: Prevent machine death on Ivybridge context switching Chris Wilson
2016-04-09 9:27 ` [PATCH 2/4] drm/i915: Force ringbuffers to not be at offset 0 Chris Wilson
@ 2016-04-09 9:27 ` Chris Wilson
2016-04-09 9:27 ` [PATCH 4/4] drm/i915: Late request cancellations are harmful Chris Wilson
` (2 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2016-04-09 9:27 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Goel, Akash
As paranoia, we want to ensure that the CPU's PTEs have been revoked for
the object before we return from i915_gem_release_mmap(). This allows us
to rely on there being no outstanding memory accesses and guarantees
serialisation of the code against concurrent access just by calling
i915_gem_release_mmap().
v2: Reduce the mb() into a wmb() following the revoke.
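A single-threaded sketch of the ordering the patch enforces (function and field names below are illustrative, not the driver's real types): the PTE revoke must be ordered before clearing fault_mappable, so any path that observes the flag cleared may assume no stale CPU mappings remain. The event log stands in for the store ordering that wmb() provides on real hardware.

```c
#include <assert.h>

enum event { EV_NONE, EV_UNMAP, EV_CLEAR_FLAG };

static enum event ev_log[2];
static int ev_n;

/* Model of i915_gem_release_mmap() after the patch: revoke the PTEs,
 * then a write barrier (represented only by program order here), then
 * clear the flag that concurrent paths test.
 */
static void release_mmap_model(int *fault_mappable)
{
	if (!*fault_mappable)
		return;
	ev_log[ev_n++] = EV_UNMAP;	/* drm_vma_node_unmap() */
	/* wmb() would sit here in the real code */
	ev_log[ev_n++] = EV_CLEAR_FLAG;
	*fault_mappable = 0;
}

/* Returns non-zero iff the unmap was logged before the flag clear. */
static int check_order(void)
{
	int fm = 1;

	release_mmap_model(&fm);
	return fm == 0 && ev_log[0] == EV_UNMAP &&
	       ev_log[1] == EV_CLEAR_FLAG;
}
```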
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
drivers/gpu/drm/i915/i915_gem.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5a65a7663b88..1c3ff56594d6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1964,11 +1964,21 @@ out:
void
i915_gem_release_mmap(struct drm_i915_gem_object *obj)
{
+ /* Serialisation between user GTT access and our code depends upon
+ * revoking the CPU's PTE whilst the mutex is held. The next user
+ * pagefault then has to wait until we release the mutex.
+ */
+ lockdep_assert_held(&obj->base.dev->struct_mutex);
+
if (!obj->fault_mappable)
return;
drm_vma_node_unmap(&obj->base.vma_node,
obj->base.dev->anon_inode->i_mapping);
+
+ /* Ensure that the CPU's PTE are revoked before we return */
+ wmb();
+
obj->fault_mappable = false;
}
@@ -3296,9 +3306,6 @@ static void i915_gem_object_finish_gtt(struct drm_i915_gem_object *obj)
if ((obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0)
return;
- /* Wait for any direct GTT access to complete */
- mb();
-
old_read_domains = obj->base.read_domains;
old_write_domain = obj->base.write_domain;
--
2.8.0.rc3
* [PATCH 4/4] drm/i915: Late request cancellations are harmful
2016-04-09 9:27 GPU and machine hang fixes, take 2 Chris Wilson
` (2 preceding siblings ...)
2016-04-09 9:27 ` [PATCH 3/4] drm/i915: Move the mb() following release-mmap into release-mmap Chris Wilson
@ 2016-04-09 9:27 ` Chris Wilson
2016-04-11 13:50 ` Tvrtko Ursulin
2016-04-09 9:56 ` ✗ Fi.CI.BAT: warning for series starting with [1/4] drm/i915: Prevent machine death on Ivybridge context switching Patchwork
2016-04-12 7:54 ` ✗ Fi.CI.BAT: warning for series starting with [1/4] drm/i915: Prevent machine death on Ivybridge context switching (rev2) Patchwork
5 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-04-09 9:27 UTC (permalink / raw)
To: intel-gfx; +Cc: Tvrtko Ursulin, Chris Wilson, Daniel Vetter, stable
Conceptually, each request is a record of a hardware transaction - we
build up a list of pending commands and then either commit them to
hardware, or cancel them. However, whilst building up the list of
pending commands, we may modify state outside of the request and make
references to the pending request. If we do so and then cancel that
request, external objects then point to the deleted request leading to
both graphical and memory corruption.
The easiest example is to consider object/VMA tracking. When we mark an
object as active in a request, we store a pointer to this, the most
recent request, in the object. When we want to free that object, we wait
for the most recent request to be idle before proceeding (otherwise the
hardware will write to pages now owned by the system, or we will attempt
to read from those pages before the hardware is finished writing). If
the request was cancelled instead, that wait completes immediately. As a
result, all requests must be committed and not cancelled if the external
state is unknown.
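The hazard can be sketched in miniature (the structs and names below are a toy model, not the driver's real types): once an external object records a pointer to the pending request, cancelling that request leaves the pointer dangling, whereas committing it keeps the wait-before-free path sound.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the object/VMA tracking hazard described above. */
struct toy_request { int committed; };
struct toy_object  { struct toy_request *last_read_req; };

/* Marking an object active publishes a reference to the request that
 * lives outside the request itself -- the "external state" the commit
 * message refers to.
 */
static void mark_active(struct toy_object *obj, struct toy_request *req)
{
	obj->last_read_req = req;
}

/* Freeing the object may only proceed once the request it points at
 * has been committed (and later retired); a cancelled request would
 * leave obj->last_read_req dangling instead.
 */
static int safe_to_free(const struct toy_object *obj)
{
	return !obj->last_read_req || obj->last_read_req->committed;
}

/* Walk the scenario: active but uncommitted is unsafe; committing the
 * request (rather than cancelling it) restores soundness.
 */
static int check_hazard(void)
{
	struct toy_request req = { 0 };
	struct toy_object obj = { NULL };

	mark_active(&obj, &req);
	if (safe_to_free(&obj))
		return 0;
	req.committed = 1;
	return safe_to_free(&obj);
}
```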
All that remains of the i915_gem_request_cancel() users is a couple of
extremely unlikely allocation-failure paths, so remove the API entirely.
A consequence of committing all incomplete requests is that we generate
excess breadcrumbs and fill the ring much more often with dummy work. We
have completely undone the outstanding_last_seqno optimisation.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: stable@vger.kernel.org
---
drivers/gpu/drm/i915/i915_drv.h | 2 --
drivers/gpu/drm/i915/i915_gem.c | 50 ++++++++++++------------------
drivers/gpu/drm/i915/i915_gem_context.c | 21 ++++++-------
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 15 +++------
drivers/gpu/drm/i915/intel_display.c | 2 +-
drivers/gpu/drm/i915/intel_lrc.c | 4 +--
drivers/gpu/drm/i915/intel_overlay.c | 8 ++---
7 files changed, 39 insertions(+), 63 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a93e5dd4fa9a..f374db8de673 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2320,7 +2320,6 @@ struct drm_i915_gem_request {
struct drm_i915_gem_request * __must_check
i915_gem_request_alloc(struct intel_engine_cs *engine,
struct intel_context *ctx);
-void i915_gem_request_cancel(struct drm_i915_gem_request *req);
void i915_gem_request_free(struct kref *req_ref);
int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
struct drm_file *file);
@@ -2872,7 +2871,6 @@ int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv);
void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
struct drm_i915_gem_request *req);
-void i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params);
int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
struct drm_i915_gem_execbuffer2 *args,
struct list_head *vmas);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1c3ff56594d6..42227495803f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2753,7 +2753,8 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
* fully prepared. Thus it can be cleaned up using the proper
* free code.
*/
- i915_gem_request_cancel(req);
+ intel_ring_reserved_space_cancel(req->ringbuf);
+ i915_gem_request_unreference(req);
return ret;
}
@@ -2790,13 +2791,6 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
return err ? ERR_PTR(err) : req;
}
-void i915_gem_request_cancel(struct drm_i915_gem_request *req)
-{
- intel_ring_reserved_space_cancel(req->ringbuf);
-
- i915_gem_request_unreference(req);
-}
-
struct drm_i915_gem_request *
i915_gem_find_active_request(struct intel_engine_cs *engine)
{
@@ -3410,12 +3404,9 @@ int i915_gpu_idle(struct drm_device *dev)
return PTR_ERR(req);
ret = i915_switch_context(req);
- if (ret) {
- i915_gem_request_cancel(req);
- return ret;
- }
-
i915_add_request_no_flush(req);
+ if (ret)
+ return ret;
}
ret = intel_engine_idle(engine);
@@ -4917,34 +4908,33 @@ i915_gem_init_hw(struct drm_device *dev)
req = i915_gem_request_alloc(engine, NULL);
if (IS_ERR(req)) {
ret = PTR_ERR(req);
- i915_gem_cleanup_engines(dev);
- goto out;
+ break;
}
if (engine->id == RCS) {
- for (j = 0; j < NUM_L3_SLICES(dev); j++)
- i915_gem_l3_remap(req, j);
+ for (j = 0; j < NUM_L3_SLICES(dev); j++) {
+ ret = i915_gem_l3_remap(req, j);
+ if (ret)
+ goto err_request;
+ }
}
ret = i915_ppgtt_init_ring(req);
- if (ret && ret != -EIO) {
- DRM_ERROR("PPGTT enable %s failed %d\n",
- engine->name, ret);
- i915_gem_request_cancel(req);
- i915_gem_cleanup_engines(dev);
- goto out;
- }
+ if (ret)
+ goto err_request;
ret = i915_gem_context_enable(req);
- if (ret && ret != -EIO) {
- DRM_ERROR("Context enable %s failed %d\n",
+ if (ret)
+ goto err_request;
+
+err_request:
+ i915_add_request_no_flush(req);
+ if (ret) {
+ DRM_ERROR("Failed to enable %s, error=%d\n",
engine->name, ret);
- i915_gem_request_cancel(req);
i915_gem_cleanup_engines(dev);
- goto out;
+ break;
}
-
- i915_add_request_no_flush(req);
}
out:
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index e5ad7b21e356..583b373d41ab 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -668,7 +668,6 @@ static int do_switch(struct drm_i915_gem_request *req)
struct drm_i915_private *dev_priv = req->i915;
struct intel_context *from = engine->last_context;
u32 hw_flags = 0;
- bool uninitialized = false;
int ret, i;
if (from != NULL && engine == &dev_priv->engine[RCS]) {
@@ -776,6 +775,15 @@ static int do_switch(struct drm_i915_gem_request *req)
to->remap_slice &= ~(1<<i);
}
+ if (!to->legacy_hw_ctx.initialized) {
+ if (engine->init_context) {
+ ret = engine->init_context(req);
+ if (ret)
+ goto unpin_out;
+ }
+ to->legacy_hw_ctx.initialized = true;
+ }
+
/* The backing object for the context is done after switching to the
* *next* context. Therefore we cannot retire the previous context until
* the next context has already started running. In fact, the below code
@@ -799,21 +807,10 @@ static int do_switch(struct drm_i915_gem_request *req)
i915_gem_context_unreference(from);
}
- uninitialized = !to->legacy_hw_ctx.initialized;
- to->legacy_hw_ctx.initialized = true;
-
done:
i915_gem_context_reference(to);
engine->last_context = to;
- if (uninitialized) {
- if (engine->init_context) {
- ret = engine->init_context(req);
- if (ret)
- DRM_ERROR("ring init context: %d\n", ret);
- }
- }
-
return 0;
unpin_out:
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 0ee61fd014df..b733f7bdafca 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1137,7 +1137,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
}
}
-void
+static void
i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
{
/* Unconditionally force add_request to emit a full flush. */
@@ -1322,7 +1322,6 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
i915_gem_execbuffer_move_to_active(vmas, params->request);
- i915_gem_execbuffer_retire_commands(params);
return 0;
}
@@ -1624,7 +1623,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
ret = i915_gem_request_add_to_client(req, file);
if (ret)
- goto err_batch_unpin;
+ goto err_request;
/*
* Save assorted stuff away to pass through to *_submission().
@@ -1641,6 +1640,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
params->request = req;
ret = dev_priv->gt.execbuf_submit(params, args, &eb->vmas);
+err_request:
+ i915_gem_execbuffer_retire_commands(params);
err_batch_unpin:
/*
@@ -1657,14 +1658,6 @@ err:
i915_gem_context_unreference(ctx);
eb_destroy(eb);
- /*
- * If the request was created but not successfully submitted then it
- * must be freed again. If it was submitted then it is being tracked
- * on the active request list and no clean up is required here.
- */
- if (ret && !IS_ERR_OR_NULL(req))
- i915_gem_request_cancel(req);
-
mutex_unlock(&dev->struct_mutex);
pre_mutex_err:
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index feb7028341b8..e3637a12126c 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11654,7 +11654,7 @@ cleanup_unpin:
intel_unpin_fb_obj(fb, crtc->primary->state->rotation);
cleanup_pending:
if (!IS_ERR_OR_NULL(request))
- i915_gem_request_cancel(request);
+ i915_add_request_no_flush(request);
atomic_dec(&intel_crtc->unpin_work_count);
mutex_unlock(&dev->struct_mutex);
cleanup:
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a1db6a02cf23..8784f0769de3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1016,7 +1016,6 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
i915_gem_execbuffer_move_to_active(vmas, params->request);
- i915_gem_execbuffer_retire_commands(params);
return 0;
}
@@ -2657,13 +2656,12 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
}
ret = engine->init_context(req);
+ i915_add_request_no_flush(req);
if (ret) {
DRM_ERROR("ring init context: %d\n",
ret);
- i915_gem_request_cancel(req);
goto error_ringbuf;
}
- i915_add_request_no_flush(req);
}
return 0;
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 6694e9230cd5..bcc3b6a016d8 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -247,7 +247,7 @@ static int intel_overlay_on(struct intel_overlay *overlay)
ret = intel_ring_begin(req, 4);
if (ret) {
- i915_gem_request_cancel(req);
+ i915_add_request_no_flush(req);
return ret;
}
@@ -290,7 +290,7 @@ static int intel_overlay_continue(struct intel_overlay *overlay,
ret = intel_ring_begin(req, 2);
if (ret) {
- i915_gem_request_cancel(req);
+ i915_add_request_no_flush(req);
return ret;
}
@@ -356,7 +356,7 @@ static int intel_overlay_off(struct intel_overlay *overlay)
ret = intel_ring_begin(req, 6);
if (ret) {
- i915_gem_request_cancel(req);
+ i915_add_request_no_flush(req);
return ret;
}
@@ -431,7 +431,7 @@ static int intel_overlay_release_old_vid(struct intel_overlay *overlay)
ret = intel_ring_begin(req, 2);
if (ret) {
- i915_gem_request_cancel(req);
+ i915_add_request_no_flush(req);
return ret;
}
--
2.8.0.rc3
* Re: [PATCH 2/4] drm/i915: Force ringbuffers to not be at offset 0
2016-04-09 9:27 ` [PATCH 2/4] drm/i915: Force ringbuffers to not be at offset 0 Chris Wilson
@ 2016-04-09 9:39 ` Chris Wilson
0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2016-04-09 9:39 UTC (permalink / raw)
To: intel-gfx; +Cc: stable
On Sat, Apr 09, 2016 at 10:27:37AM +0100, Chris Wilson wrote:
> For reasons unknown Sandybridge GT1 (at least) will eventually hang when
> it encounters a ring wraparound at offset 0. The test case that
> reproduces the bug reliably forces a large number of interrupted context
> switches, thereby causing very frequent ring wraparounds, but there are
> similar bug reports in the wild with the same symptoms, seqno writes
> stop just before the wrap and the ringbuffer at address 0. It is also
> timing crucial, but adding various delays hasn't helped pinpoint where
> the window lies.
This also seems to fix resume on my SNB-GT2, but I guess that is a
different issue related to address 0. And long ago, I reported that PNV
also hung on resume when it had its ringbuffers at address 0 - so I
wonder if that's the same as well...
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* ✗ Fi.CI.BAT: warning for series starting with [1/4] drm/i915: Prevent machine death on Ivybridge context switching
2016-04-09 9:27 GPU and machine hang fixes, take 2 Chris Wilson
` (3 preceding siblings ...)
2016-04-09 9:27 ` [PATCH 4/4] drm/i915: Late request cancellations are harmful Chris Wilson
@ 2016-04-09 9:56 ` Patchwork
2016-04-09 10:39 ` Chris Wilson
2016-04-12 7:54 ` ✗ Fi.CI.BAT: warning for series starting with [1/4] drm/i915: Prevent machine death on Ivybridge context switching (rev2) Patchwork
5 siblings, 1 reply; 12+ messages in thread
From: Patchwork @ 2016-04-09 9:56 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [1/4] drm/i915: Prevent machine death on Ivybridge context switching
URL : https://patchwork.freedesktop.org/series/5484/
State : warning
== Summary ==
Series 5484v1 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/5484/revisions/1/mbox/
Test gem_exec_basic:
Subgroup basic-bsd:
dmesg-warn -> PASS (bsw-nuc-2)
Test gem_exec_suspend:
Subgroup basic-s3:
dmesg-warn -> PASS (bsw-nuc-2)
Test gem_ringfill:
Subgroup basic-default-hang:
pass -> DMESG-WARN (byt-nuc)
pass -> DMESG-WARN (snb-x220t)
pass -> DMESG-WARN (snb-dellxps)
pass -> DMESG-WARN (hsw-brixbox)
pass -> DMESG-WARN (ivb-t430s)
Test gem_sync:
Subgroup basic-bsd:
dmesg-warn -> PASS (bsw-nuc-2)
Test pm_rpm:
Subgroup basic-rte:
dmesg-warn -> PASS (bsw-nuc-2)
bdw-nuci7 total:196 pass:184 dwarn:0 dfail:0 fail:0 skip:12
bdw-ultra total:196 pass:175 dwarn:0 dfail:0 fail:0 skip:21
bsw-nuc-2 total:196 pass:159 dwarn:0 dfail:0 fail:0 skip:37
byt-nuc total:196 pass:160 dwarn:1 dfail:0 fail:0 skip:35
hsw-brixbox total:196 pass:173 dwarn:1 dfail:0 fail:0 skip:22
ilk-hp8440p total:196 pass:132 dwarn:0 dfail:0 fail:0 skip:64
ivb-t430s total:196 pass:170 dwarn:1 dfail:0 fail:0 skip:25
skl-i7k-2 total:196 pass:173 dwarn:0 dfail:0 fail:0 skip:23
snb-dellxps total:196 pass:161 dwarn:1 dfail:0 fail:0 skip:34
snb-x220t total:196 pass:161 dwarn:1 dfail:0 fail:1 skip:33
Results at /archive/results/CI_IGT_test/Patchwork_1852/
949884a57b51aa158e3ae9afe1f08130cdb7a3ef drm-intel-nightly: 2016y-04m-08d-10h-45m-28s UTC integration manifest
37779149c4e4b641d0ff811502d90e594882efda drm/i915: Late request cancellations are harmful
87ac23b892993e2500a35af4b3fc3eff1df38315 drm/i915: Move the mb() following release-mmap into release-mmap
ff41060287aabfb3bf463629144ccae744f9ebda drm/i915: Force ringbuffers to not be at offset 0
a64d18416237b2230a329477e8f201c6c4a42114 drm/i915: Prevent machine death on Ivybridge context switching
* Re: ✗ Fi.CI.BAT: warning for series starting with [1/4] drm/i915: Prevent machine death on Ivybridge context switching
2016-04-09 9:56 ` ✗ Fi.CI.BAT: warning for series starting with [1/4] drm/i915: Prevent machine death on Ivybridge context switching Patchwork
@ 2016-04-09 10:39 ` Chris Wilson
0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2016-04-09 10:39 UTC (permalink / raw)
To: intel-gfx
On Sat, Apr 09, 2016 at 09:56:11AM -0000, Patchwork wrote:
> == Series Details ==
>
> Series: series starting with [1/4] drm/i915: Prevent machine death on Ivybridge context switching
> URL : https://patchwork.freedesktop.org/series/5484/
> State : warning
>
> == Summary ==
>
> Series 5484v1 Series without cover letter
> http://patchwork.freedesktop.org/api/1.0/series/5484/revisions/1/mbox/
>
> Test gem_exec_basic:
> Subgroup basic-bsd:
> dmesg-warn -> PASS (bsw-nuc-2)
> Test gem_exec_suspend:
> Subgroup basic-s3:
> dmesg-warn -> PASS (bsw-nuc-2)
> Test gem_ringfill:
> Subgroup basic-default-hang:
> pass -> DMESG-WARN (byt-nuc)
> pass -> DMESG-WARN (snb-x220t)
> pass -> DMESG-WARN (snb-dellxps)
> pass -> DMESG-WARN (hsw-brixbox)
> pass -> DMESG-WARN (ivb-t430s)
which apparently requires the EIO handling fixes to silence correctly.
Hmm,
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 4/4] drm/i915: Late request cancellations are harmful
2016-04-09 9:27 ` [PATCH 4/4] drm/i915: Late request cancellations are harmful Chris Wilson
@ 2016-04-11 13:50 ` Tvrtko Ursulin
2016-04-11 14:19 ` Chris Wilson
0 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-04-11 13:50 UTC (permalink / raw)
To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter, stable
On 09/04/16 10:27, Chris Wilson wrote:
> Conceptually, each request is a record of a hardware transaction - we
> build up a list of pending commands and then either commit them to
> hardware, or cancel them. However, whilst building up the list of
> pending commands, we may modify state outside of the request and make
> references to the pending request. If we do so and then cancel that
> request, external objects then point to the deleted request leading to
> both graphical and memory corruption.
>
> The easiest example is to consider object/VMA tracking. When we mark an
> object as active in a request, we store a pointer to this, the most
> recent request, in the object. Then we want to free that object, we wait
> for the most recent request to be idle before proceeding (otherwise the
> hardware will write to pages now owned by the system, or we will attempt
> to read from those pages before the hardware is finished writing). If
> the request was cancelled instead, that wait completes immediately. As a
> result, all requests must be committed and not cancelled if the external
> state is unknown.
This was a bit hard to figure out.
So we cannot unwind because once we set last_read_req we lose the data
on what was the previous one, before this transaction started?
Intuitively I don't like the idea of sending unfinished stuff to the
GPU, when it failed at some random point in ring buffer preparation.
So I am struggling with reviewing this, just as I did in the previous round.
> All that remains of i915_gem_request_cancel() users are just a couple of
> extremely unlikely allocation failures, so remove the API entirely.
This part feels extra weird because in the non-execbuf cases we
actually can cancel the transaction without any issues, correct?
Would a middle ground be to keep the cancellations for in-kernel
submits, and for execbuf to rewind the ringbuf so that only the request
postamble is sent to the GPU?
> A consequence of committing all incomplete requests is that we generate
> excess breadcrumbs and fill the ring much more often with dummy work. We
> have completely undone the outstanding_last_seqno optimisation.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: stable@vger.kernel.org
> ---
> drivers/gpu/drm/i915/i915_drv.h | 2 --
> drivers/gpu/drm/i915/i915_gem.c | 50 ++++++++++++------------------
> drivers/gpu/drm/i915/i915_gem_context.c | 21 ++++++-------
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 15 +++------
> drivers/gpu/drm/i915/intel_display.c | 2 +-
> drivers/gpu/drm/i915/intel_lrc.c | 4 +--
> drivers/gpu/drm/i915/intel_overlay.c | 8 ++---
> 7 files changed, 39 insertions(+), 63 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index a93e5dd4fa9a..f374db8de673 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2320,7 +2320,6 @@ struct drm_i915_gem_request {
> struct drm_i915_gem_request * __must_check
> i915_gem_request_alloc(struct intel_engine_cs *engine,
> struct intel_context *ctx);
> -void i915_gem_request_cancel(struct drm_i915_gem_request *req);
> void i915_gem_request_free(struct kref *req_ref);
> int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
> struct drm_file *file);
> @@ -2872,7 +2871,6 @@ int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_priv);
> void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
> struct drm_i915_gem_request *req);
> -void i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params);
> int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
> struct drm_i915_gem_execbuffer2 *args,
> struct list_head *vmas);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1c3ff56594d6..42227495803f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2753,7 +2753,8 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
> * fully prepared. Thus it can be cleaned up using the proper
> * free code.
> */
> - i915_gem_request_cancel(req);
> + intel_ring_reserved_space_cancel(req->ringbuf);
> + i915_gem_request_unreference(req);
> return ret;
> }
>
> @@ -2790,13 +2791,6 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
> return err ? ERR_PTR(err) : req;
> }
>
> -void i915_gem_request_cancel(struct drm_i915_gem_request *req)
> -{
> - intel_ring_reserved_space_cancel(req->ringbuf);
> -
> - i915_gem_request_unreference(req);
> -}
> -
> struct drm_i915_gem_request *
> i915_gem_find_active_request(struct intel_engine_cs *engine)
> {
> @@ -3410,12 +3404,9 @@ int i915_gpu_idle(struct drm_device *dev)
> return PTR_ERR(req);
>
> ret = i915_switch_context(req);
> - if (ret) {
> - i915_gem_request_cancel(req);
> - return ret;
> - }
> -
> i915_add_request_no_flush(req);
> + if (ret)
> + return ret;
Looks like with this it could execute the context switch on the GPU but
not update the engine->last_context in do_switch().
Regards,
Tvrtko
* Re: [PATCH 4/4] drm/i915: Late request cancellations are harmful
2016-04-11 13:50 ` Tvrtko Ursulin
@ 2016-04-11 14:19 ` Chris Wilson
2016-04-12 6:23 ` [PATCH] drm/i915: Reorganise legacy context switch to cope with late failure Chris Wilson
0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-04-11 14:19 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Daniel Vetter, intel-gfx, stable
On Mon, Apr 11, 2016 at 02:50:17PM +0100, Tvrtko Ursulin wrote:
>
> On 09/04/16 10:27, Chris Wilson wrote:
> >Conceptually, each request is a record of a hardware transaction - we
> >build up a list of pending commands and then either commit them to
> >hardware, or cancel them. However, whilst building up the list of
> >pending commands, we may modify state outside of the request and make
> >references to the pending request. If we do so and then cancel that
> >request, external objects then point to the deleted request leading to
> >both graphical and memory corruption.
> >
> >The easiest example is to consider object/VMA tracking. When we mark an
> >object as active in a request, we store a pointer to this, the most
> >recent request, in the object. When we want to free that object, we wait
> >for the most recent request to be idle before proceeding (otherwise the
> >hardware will write to pages now owned by the system, or we will attempt
> >to read from those pages before the hardware is finished writing). If
> >the request was cancelled instead, that wait completes immediately. As a
> >result, all requests must be committed and not cancelled if the external
> >state is unknown.
>
> This was a bit hard to figure out.
>
> So we cannot unwind because once we set last_read_req we lose the
> data on what was the previous one, before this transaction started?
>
> Intuitively I don't like the idea of sending unfinished stuff to the
> GPU, when it failed at some random point in ring buffer preparation.
Yes, but it is not actually unfinished. The mm-switch is made and
flagged as completed, etc. The request does contain instructions for the
state changes it has made so far.
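The hazard being discussed can be illustrated with a small userspace model (the structures and names below are simplified stand-ins, not the real i915 code): an object records a bare pointer to the most recent request that reads it, so dropping the request's last reference on cancellation leaves that external pointer dangling.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for drm_i915_gem_request and the object's
 * activity tracking -- not the real driver structures. */
struct request {
	int refcount;
};

struct gem_object {
	struct request *last_read_req; /* most recent request reading us */
};

static void request_get(struct request *req)
{
	req->refcount++;
}

/* Returns the remaining refcount; 0 means the request would be freed. */
static int request_put(struct request *req)
{
	return --req->refcount;
}

/* Marking an object active stores a back-pointer without taking a
 * reference in this model: external state now depends on the pending
 * request staying alive until it is retired. */
static void move_to_active(struct gem_object *obj, struct request *req)
{
	obj->last_read_req = req;
}
```

With this model, building a request, marking an object active, and then cancelling drops the last reference while obj->last_read_req still points at it -- the dangling reference the commit message warns about. Committing the request instead keeps it alive until retirement clears the back-pointer.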
> So I am struggling with reviewing this as I have in the previous round.
>
> >All that remains of i915_gem_request_cancel() users are just a couple of
> >extremely unlikely allocation failures, so remove the API entirely.
>
> This part feels extra weird because in the non-execbuf cases we
> actually can cancel the transaction without any issues, correct?
Nope. The same problem arises in that they do not know what happens
underneath calls to e.g. object_sync. Before cancelling you would have to
inspect every path, now and in the future, to be sure that no state was
modified inside the request.
> Would middle-ground be to keep the cancellations for in-kernel
> submits, and for execbuf rewind the ringbuf so only request
> post-amble is sent to the GPU?
The only place that knows whether there is an external observer is the
request code. I am thinking about doing the cancellation there; I just
need to check that the lockless idling will be ok with that.
> >@@ -3410,12 +3404,9 @@ int i915_gpu_idle(struct drm_device *dev)
> > return PTR_ERR(req);
> >
> > ret = i915_switch_context(req);
> >- if (ret) {
> >- i915_gem_request_cancel(req);
> >- return ret;
> >- }
> >-
> > i915_add_request_no_flush(req);
> >+ if (ret)
> >+ return ret;
>
> Looks like with this it could execute the context switch on the GPU
> but not update the engine->last_context in do_switch().
Hmm, that problem is present in the current code - requests are not
unwound, just deleted. We need to rearrange switch_context after the
mi_set_context() to ensure that on the subsequent call, the context is
reloaded.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* [PATCH] drm/i915: Reorganise legacy context switch to cope with late failure
2016-04-11 14:19 ` Chris Wilson
@ 2016-04-12 6:23 ` Chris Wilson
0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2016-04-12 6:23 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter
After mi_set_context() succeeds, we need to update the state of the
engine's last_context. This ensures that we hold a pin on the context
whilst the hardware may write to it. However, since we didn't complete
the post-switch setup of the context, we need to force the subsequent
use of the same context to complete the setup (which means updating
should_skip_switch()).
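The reworked skip test can be read as a pure predicate over the two contexts. A userspace sketch of its logic (fields flattened into plain ints and restricted to the render engine, so this is a model of the patch below rather than the driver code itself):

```c
#include <assert.h>

/* Flattened model of struct intel_context, RCS engine only. */
struct ctx {
	int remap_slice;   /* pending L3 remaps force a switch */
	int initialized;   /* legacy_hw_ctx.initialized */
	int pd_dirty_rcs;  /* models pd_dirty_rings & (1 << RCS) */
};

static int should_skip_switch(const struct ctx *from, const struct ctx *to)
{
	if (to->remap_slice)
		return 0;
	if (!to->initialized)	/* force the deferred post-switch setup */
		return 0;
	if (to->pd_dirty_rcs)	/* page directories must be reloaded */
		return 0;
	return to == from;
}
```

The key change is the !to->initialized check: a context whose mi_set_context() succeeded but whose post-switch setup failed is left with initialized clear, so the next request on that context cannot skip the switch and completes the setup.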
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem_context.c | 215 ++++++++++++++++----------------
1 file changed, 109 insertions(+), 106 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index e5ad7b21e356..361959b1d53a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -609,50 +609,40 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
return ret;
}
-static inline bool should_skip_switch(struct intel_engine_cs *engine,
- struct intel_context *from,
+static inline bool should_skip_switch(struct intel_context *from,
struct intel_context *to)
{
if (to->remap_slice)
return false;
- if (to->ppgtt && from == to &&
- !(intel_engine_flag(engine) & to->ppgtt->pd_dirty_rings))
- return true;
+ if (!to->legacy_hw_ctx.initialized)
+ return false;
- return false;
+ if (to->ppgtt && to->ppgtt->pd_dirty_rings & (1 << RCS))
+ return false;
+
+ return to == from;
}
static bool
-needs_pd_load_pre(struct intel_engine_cs *engine, struct intel_context *to)
+needs_pd_load_pre(struct intel_context *to)
{
- struct drm_i915_private *dev_priv = engine->dev->dev_private;
-
if (!to->ppgtt)
return false;
- if (INTEL_INFO(engine->dev)->gen < 8)
- return true;
-
- if (engine != &dev_priv->engine[RCS])
+ if (INTEL_INFO(to->i915)->gen < 8)
return true;
return false;
}
static bool
-needs_pd_load_post(struct intel_engine_cs *engine, struct intel_context *to,
- u32 hw_flags)
+needs_pd_load_post(struct intel_context *to, u32 hw_flags)
{
- struct drm_i915_private *dev_priv = engine->dev->dev_private;
-
if (!to->ppgtt)
return false;
- if (!IS_GEN8(engine->dev))
- return false;
-
- if (engine != &dev_priv->engine[RCS])
+ if (!IS_GEN8(to->i915))
return false;
if (hw_flags & MI_RESTORE_INHIBIT)
@@ -661,60 +651,33 @@ needs_pd_load_post(struct intel_engine_cs *engine, struct intel_context *to,
return false;
}
-static int do_switch(struct drm_i915_gem_request *req)
+static int do_rcs_switch(struct drm_i915_gem_request *req)
{
struct intel_context *to = req->ctx;
struct intel_engine_cs *engine = req->engine;
- struct drm_i915_private *dev_priv = req->i915;
- struct intel_context *from = engine->last_context;
- u32 hw_flags = 0;
- bool uninitialized = false;
+ struct intel_context *from;
+ u32 hw_flags;
int ret, i;
- if (from != NULL && engine == &dev_priv->engine[RCS]) {
- BUG_ON(from->legacy_hw_ctx.rcs_state == NULL);
- BUG_ON(!i915_gem_obj_is_pinned(from->legacy_hw_ctx.rcs_state));
- }
-
- if (should_skip_switch(engine, from, to))
+ if (should_skip_switch(engine->last_context, to))
return 0;
/* Trying to pin first makes error handling easier. */
- if (engine == &dev_priv->engine[RCS]) {
- ret = i915_gem_obj_ggtt_pin(to->legacy_hw_ctx.rcs_state,
- get_context_alignment(engine->dev),
- 0);
- if (ret)
- return ret;
- }
+ ret = i915_gem_obj_ggtt_pin(to->legacy_hw_ctx.rcs_state,
+ get_context_alignment(engine->dev),
+ 0);
+ if (ret)
+ return ret;
/*
* Pin can switch back to the default context if we end up calling into
* evict_everything - as a last ditch gtt defrag effort that also
* switches to the default context. Hence we need to reload from here.
+ *
+ * XXX: Doing so is painfully broken!
*/
from = engine->last_context;
- if (needs_pd_load_pre(engine, to)) {
- /* Older GENs and non render rings still want the load first,
- * "PP_DCLV followed by PP_DIR_BASE register through Load
- * Register Immediate commands in Ring Buffer before submitting
- * a context."*/
- trace_switch_mm(engine, to);
- ret = to->ppgtt->switch_mm(to->ppgtt, req);
- if (ret)
- goto unpin_out;
-
- /* Doing a PD load always reloads the page dirs */
- to->ppgtt->pd_dirty_rings &= ~intel_engine_flag(engine);
- }
-
- if (engine != &dev_priv->engine[RCS]) {
- if (from)
- i915_gem_context_unreference(from);
- goto done;
- }
-
/*
* Clear this page out of any CPU caches for coherent swap-in/out. Note
* that thanks to write = false in this call and us not setting any gpu
@@ -727,55 +690,46 @@ static int do_switch(struct drm_i915_gem_request *req)
if (ret)
goto unpin_out;
- if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
- hw_flags |= MI_RESTORE_INHIBIT;
+ if (needs_pd_load_pre(to)) {
+ /* Older GENs and non render rings still want the load first,
+ * "PP_DCLV followed by PP_DIR_BASE register through Load
+ * Register Immediate commands in Ring Buffer before submitting
+ * a context."*/
+ trace_switch_mm(engine, to);
+ ret = to->ppgtt->switch_mm(to->ppgtt, req);
+ if (ret)
+ goto unpin_out;
+
+ /* Doing a PD load always reloads the page dirs */
+ to->ppgtt->pd_dirty_rings &= ~(1 << RCS);
+ }
+
+ if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
/* NB: If we inhibit the restore, the context is not allowed to
* die because future work may end up depending on valid address
* space. This means we must enforce that a page table load
* occur when this occurs. */
- } else if (to->ppgtt &&
- (intel_engine_flag(engine) & to->ppgtt->pd_dirty_rings)) {
- hw_flags |= MI_FORCE_RESTORE;
- to->ppgtt->pd_dirty_rings &= ~intel_engine_flag(engine);
- }
+ hw_flags = MI_RESTORE_INHIBIT;
+ else if (to->ppgtt && to->ppgtt->pd_dirty_rings & (1 << RCS))
+ hw_flags = MI_FORCE_RESTORE;
+ else
+ hw_flags = 0;
/* We should never emit switch_mm more than once */
- WARN_ON(needs_pd_load_pre(engine, to) &&
- needs_pd_load_post(engine, to, hw_flags));
+ WARN_ON(needs_pd_load_pre(to) && needs_pd_load_post(to, hw_flags));
- ret = mi_set_context(req, hw_flags);
- if (ret)
- goto unpin_out;
-
- /* GEN8 does *not* require an explicit reload if the PDPs have been
- * setup, and we do not wish to move them.
- */
- if (needs_pd_load_post(engine, to, hw_flags)) {
- trace_switch_mm(engine, to);
- ret = to->ppgtt->switch_mm(to->ppgtt, req);
- /* The hardware context switch is emitted, but we haven't
- * actually changed the state - so it's probably safe to bail
- * here. Still, let the user know something dangerous has
- * happened.
- */
+ if (to != from || (hw_flags & MI_FORCE_RESTORE)) {
+ ret = mi_set_context(req, hw_flags);
if (ret) {
- DRM_ERROR("Failed to change address space on context switch\n");
+ /* Force the reload of this and the previous mm */
+ if (needs_pd_load_pre(to))
+ to->ppgtt->pd_dirty_rings |= 1 << RCS;
+ if (from && needs_pd_load_pre(from))
+ from->ppgtt->pd_dirty_rings |= 1 << RCS;
goto unpin_out;
}
}
- for (i = 0; i < MAX_L3_SLICES; i++) {
- if (!(to->remap_slice & (1<<i)))
- continue;
-
- ret = i915_gem_l3_remap(req, i);
- /* If it failed, try again next round */
- if (ret)
- DRM_DEBUG_DRIVER("L3 remapping failed\n");
- else
- to->remap_slice &= ~(1<<i);
- }
-
/* The backing object for the context is done after switching to the
* *next* context. Therefore we cannot retire the previous context until
* the next context has already started running. In fact, the below code
@@ -798,27 +752,52 @@ static int do_switch(struct drm_i915_gem_request *req)
i915_gem_object_ggtt_unpin(from->legacy_hw_ctx.rcs_state);
i915_gem_context_unreference(from);
}
-
- uninitialized = !to->legacy_hw_ctx.initialized;
- to->legacy_hw_ctx.initialized = true;
-
-done:
i915_gem_context_reference(to);
engine->last_context = to;
- if (uninitialized) {
+ /* GEN8 does *not* require an explicit reload if the PDPs have been
+ * setup, and we do not wish to move them.
+ */
+ if (needs_pd_load_post(to, hw_flags)) {
+ trace_switch_mm(engine, to);
+ ret = to->ppgtt->switch_mm(to->ppgtt, req);
+ /* The hardware context switch is emitted, but we haven't
+ * actually changed the state - so it's probably safe to bail
+ * here. Still, let the user know something dangerous has
+ * happened.
+ */
+ if (ret)
+ return ret;
+
+ to->ppgtt->pd_dirty_rings &= ~(1 << RCS);
+ }
+ if (hw_flags & MI_FORCE_RESTORE)
+ to->ppgtt->pd_dirty_rings &= ~(1 << RCS);
+
+ for (i = 0; i < MAX_L3_SLICES; i++) {
+ if (!(to->remap_slice & (1<<i)))
+ continue;
+
+ ret = i915_gem_l3_remap(req, i);
+ if (ret)
+ return ret;
+
+ to->remap_slice &= ~(1<<i);
+ }
+
+ if (!to->legacy_hw_ctx.initialized) {
if (engine->init_context) {
ret = engine->init_context(req);
if (ret)
- DRM_ERROR("ring init context: %d\n", ret);
+ return ret;
}
+ to->legacy_hw_ctx.initialized = true;
}
return 0;
unpin_out:
- if (engine->id == RCS)
- i915_gem_object_ggtt_unpin(to->legacy_hw_ctx.rcs_state);
+ i915_gem_object_ggtt_unpin(to->legacy_hw_ctx.rcs_state);
return ret;
}
@@ -853,7 +832,31 @@ int i915_switch_context(struct drm_i915_gem_request *req)
return 0;
}
- return do_switch(req);
+ if (engine->id != RCS) {
+ struct intel_context *from = engine->last_context;
+ struct intel_context *to = req->ctx;
+
+ if (to->ppgtt &&
+ (from != to ||
+ intel_engine_flag(engine) & to->ppgtt->pd_dirty_rings)) {
+ int ret;
+
+ trace_switch_mm(engine, to);
+ ret = to->ppgtt->switch_mm(to->ppgtt, req);
+ if (ret)
+ return ret;
+
+ to->ppgtt->pd_dirty_rings &= ~intel_engine_flag(engine);
+ }
+
+ if (from)
+ i915_gem_context_unreference(from);
+ i915_gem_context_reference(to);
+ engine->last_context = to;
+ return 0;
+ }
+
+ return do_rcs_switch(req);
}
static bool contexts_enabled(struct drm_device *dev)
--
2.8.0.rc3
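The error-handling ordering this patch introduces -- on mi_set_context() failure, re-mark the page directories of both contexts dirty so the next switch cannot be skipped and must reload them -- can be modelled in a few lines (again simplified stand-ins with flattened fields, not the driver code; needs_pd_load_pre() is reduced to a has_ppgtt flag):

```c
#include <assert.h>

#define RCS_BIT 1 /* models 1 << RCS in pd_dirty_rings */

struct ppgtt_ctx {
	int has_ppgtt;       /* models needs_pd_load_pre(ctx) */
	int pd_dirty_rings;
};

/* Model of the mi_set_context() outcome in do_rcs_switch(): on error,
 * both contexts' page directories are re-marked dirty so a later
 * switch is forced to reload them; on success the dirty bit clears. */
static int switch_rcs(struct ppgtt_ctx *from, struct ppgtt_ctx *to,
		      int mi_set_context_err)
{
	if (mi_set_context_err) {
		if (to->has_ppgtt)
			to->pd_dirty_rings |= RCS_BIT;
		if (from && from->has_ppgtt)
			from->pd_dirty_rings |= RCS_BIT;
		return mi_set_context_err;
	}
	if (to->has_ppgtt)
		to->pd_dirty_rings &= ~RCS_BIT;
	return 0;
}
```

In other words, failure no longer deletes the request's effects; it records enough state (the dirty bits) that the next successful switch repairs both address spaces.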
* ✗ Fi.CI.BAT: warning for series starting with [1/4] drm/i915: Prevent machine death on Ivybridge context switching (rev2)
2016-04-09 9:27 GPU and machine hang fixes, take 2 Chris Wilson
` (4 preceding siblings ...)
2016-04-09 9:56 ` ✗ Fi.CI.BAT: warning for series starting with [1/4] drm/i915: Prevent machine death on Ivybridge context switching Patchwork
@ 2016-04-12 7:54 ` Patchwork
5 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2016-04-12 7:54 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [1/4] drm/i915: Prevent machine death on Ivybridge context switching (rev2)
URL : https://patchwork.freedesktop.org/series/5484/
State : warning
== Summary ==
Series 5484v2 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/5484/revisions/2/mbox/
Test gem_basic:
Subgroup create-fd-close:
incomplete -> PASS (bsw-nuc-2)
Test gem_ctx_param_basic:
Subgroup invalid-param-get:
incomplete -> PASS (bsw-nuc-2)
Test gem_exec_basic:
Subgroup basic-bsd1:
incomplete -> SKIP (bsw-nuc-2)
Test gem_exec_store:
Subgroup basic-bsd1:
incomplete -> SKIP (bsw-nuc-2)
Subgroup basic-bsd2:
incomplete -> SKIP (bsw-nuc-2)
Test gem_ringfill:
Subgroup basic-default-hang:
dmesg-warn -> PASS (bsw-nuc-2)
Test gem_storedw_loop:
Subgroup basic-bsd2:
incomplete -> SKIP (bsw-nuc-2)
Test kms_addfb_basic:
Subgroup bad-pitch-1024:
dmesg-warn -> PASS (bsw-nuc-2)
Subgroup basic-y-tiled:
incomplete -> PASS (bsw-nuc-2)
Subgroup small-bo:
incomplete -> PASS (bsw-nuc-2)
Subgroup unused-handle:
incomplete -> PASS (bsw-nuc-2)
Test kms_flip:
Subgroup basic-flip-vs-modeset:
pass -> DMESG-WARN (skl-i7k-2)
Test kms_force_connector_basic:
Subgroup force-load-detect:
pass -> SKIP (ilk-hp8440p)
Test kms_pipe_crc_basic:
Subgroup hang-read-crc-pipe-a:
incomplete -> PASS (snb-dellxps)
Subgroup read-crc-pipe-a:
incomplete -> SKIP (bsw-nuc-2)
Subgroup read-crc-pipe-b:
incomplete -> SKIP (bsw-nuc-2)
Test pm_rpm:
Subgroup basic-rte:
pass -> DMESG-WARN (byt-nuc) UNSTABLE
Test prime_self_import:
Subgroup basic-with_one_bo:
incomplete -> PASS (bsw-nuc-2)
Subgroup basic-with_two_bos:
incomplete -> PASS (bsw-nuc-2)
bdw-nuci7 total:202 pass:190 dwarn:0 dfail:0 fail:0 skip:12
bdw-ultra total:202 pass:179 dwarn:0 dfail:0 fail:0 skip:23
bsw-nuc-2 total:201 pass:162 dwarn:0 dfail:0 fail:0 skip:39
byt-nuc total:201 pass:162 dwarn:1 dfail:0 fail:0 skip:38
hsw-brixbox total:202 pass:178 dwarn:0 dfail:0 fail:0 skip:24
ilk-hp8440p total:202 pass:133 dwarn:0 dfail:0 fail:0 skip:69
ivb-t430s total:202 pass:174 dwarn:0 dfail:0 fail:0 skip:28
skl-i7k-2 total:202 pass:176 dwarn:1 dfail:0 fail:0 skip:25
snb-dellxps total:202 pass:164 dwarn:0 dfail:0 fail:0 skip:38
snb-x220t total:202 pass:164 dwarn:0 dfail:0 fail:1 skip:37
Results at /archive/results/CI_IGT_test/Patchwork_1866/
dc5380b5263ebb0bf251bb09db542585702b528b drm-intel-nightly: 2016y-04m-11d-19h-43m-10s UTC integration manifest
82c52bb3831f06caddbb6f10ebaefa5e2dbdc44b drm/i915: Reorganise legacy context switch to cope with late failure
36dc69a8bc7e9f48edd60d5120416afa5973d516 drm/i915: Move the mb() following release-mmap into release-mmap
508a7170e74f9eb1436c7c05d601218ce9fb9bf4 drm/i915: Force ringbuffers to not be at offset 0
bd468dc213bc0f4d596bdc6215aedc3a0751dc15 drm/i915: Prevent machine death on Ivybridge context switching
Thread overview: 12+ messages
2016-04-09 9:27 GPU and machine hang fixes, take 2 Chris Wilson
2016-04-09 9:27 ` [PATCH 1/4] drm/i915: Prevent machine death on Ivybridge context switching Chris Wilson
2016-04-09 9:27 ` [PATCH 2/4] drm/i915: Force ringbuffers to not be at offset 0 Chris Wilson
2016-04-09 9:39 ` Chris Wilson
2016-04-09 9:27 ` [PATCH 3/4] drm/i915: Move the mb() following release-mmap into release-mmap Chris Wilson
2016-04-09 9:27 ` [PATCH 4/4] drm/i915: Late request cancellations are harmful Chris Wilson
2016-04-11 13:50 ` Tvrtko Ursulin
2016-04-11 14:19 ` Chris Wilson
2016-04-12 6:23 ` [PATCH] drm/i915: Reorganise legacy context switch to cope with late failure Chris Wilson
2016-04-09 9:56 ` ✗ Fi.CI.BAT: warning for series starting with [1/4] drm/i915: Prevent machine death on Ivybridge context switching Patchwork
2016-04-09 10:39 ` Chris Wilson
2016-04-12 7:54 ` ✗ Fi.CI.BAT: warning for series starting with [1/4] drm/i915: Prevent machine death on Ivybridge context switching (rev2) Patchwork