[PATCH 1/5] drm/i915: kicking rings stuck on semaphores considered harmful

public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed

* [PATCH 1/5] drm/i915: kicking rings stuck on semaphores considered harmful
@ 2011-10-30 19:12 Daniel Vetter
  2011-10-30 19:12 ` [PATCH 2/5] drm/i915: don't bail out of intel_wait_ring_buffer too early Daniel Vetter
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Daniel Vetter @ 2011-10-30 19:12 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

If our semaphore logic gets confused and we have a ring stuck waiting
for one, there's a decent chance it'll just execute garbage when being
kicked. Also, kicking the ring obscures the place where the error
first occured, making error_state decoding much harder.

So drop this an let gpu reset handle this mess in a clean fashion.

In contrast, kicking rings stuck on MI_WAIT is rather harmless, at
worst there'll be a bit of screen-flickering. There's also old
broken userspace out there which needs this as a  work-around.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@hchris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_irq.c |    7 -------
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index b40004b..69d4044 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1649,13 +1649,6 @@ static bool kick_ring(struct intel_ring_buffer *ring)
 		I915_WRITE_CTL(ring, tmp);
 		return true;
 	}
-	if (IS_GEN6(dev) &&
-	    (tmp & RING_WAIT_SEMAPHORE)) {
-		DRM_ERROR("Kicking stuck semaphore on %s\n",
-			  ring->name);
-		I915_WRITE_CTL(ring, tmp);
-		return true;
-	}
 	return false;
 }
 
-- 
1.7.6.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/5] drm/i915: don't bail out of intel_wait_ring_buffer too early
  2011-10-30 19:12 [PATCH 1/5] drm/i915: kicking rings stuck on semaphores considered harmful Daniel Vetter
@ 2011-10-30 19:12 ` Daniel Vetter
  2011-10-31  1:29   ` Ben Widawsky
  2011-11-01 15:31   ` Eugeni Dodonov
  2011-10-30 19:12 ` [PATCH 3/5] drm/i915: switch ring->id to be a real id Daniel Vetter
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 16+ messages in thread
From: Daniel Vetter @ 2011-10-30 19:12 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

In the pre-gem days with non-existing hangcheck and gpu reset code,
this timeout of 3 seconds was pretty important to avoid stuck
processes.

But now we have the hangcheck code in gem that goes to great length
to ensure that the gpu is really dead before declaring it wedged.

So there's no need for this timeout anymore. Actually it's even harmful
because we can bail out too early (e.g. with xscreensaver slip)
when running giant batchbuffers. And our code isn't robust enough
to properly unroll any state-changes, we pretty much rely on the gpu
reset code cleaning up the mess (like cache tracking, fencing state,
active list/request tracking, ...).

With this change intel_begin_ring can only fail when the gpu is
wedged, and it will return -EAGAIN (like wait_request in case the
gpu reset is still outstanding).

v2: Chris Wilson noted that on resume timers aren't running and hence
we won't ever get kicked out of this loop by the hangcheck code. Use
an insanely large timeout instead for the HAS_GEM case to prevent
resume bugs from totally hanging the machine.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ca70e2f..6e28301 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1119,7 +1119,16 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
 	}

 	trace_i915_ring_wait_begin(ring);
-	end = jiffies + 3 * HZ;
+	if (drm_core_check_feature(dev, DRIVER_GEM))
+		/* With GEM the hangcheck timer should kick us out of the loop,
+		 * leaving it early runs the risk of corrupting GEM state (due
+		 * to running on almost untested codepaths). But on resume
+		 * timers don't work yet, so prevent a complete hang in that
+		 * case by choosing an insanely large timeout. */
+		end = jiffies + 60 * HZ;
+	else
+		end = jiffies + 3 * HZ;
+
 	do {
 		ring->head = I915_READ_HEAD(ring);
 		ring->space = ring_space(ring);
-- 
1.7.6.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/5] drm/i915: switch ring->id to be a real id
  2011-10-30 19:12 [PATCH 1/5] drm/i915: kicking rings stuck on semaphores considered harmful Daniel Vetter
  2011-10-30 19:12 ` [PATCH 2/5] drm/i915: don't bail out of intel_wait_ring_buffer too early Daniel Vetter
@ 2011-10-30 19:12 ` Daniel Vetter
  2011-10-31  1:48   ` Ben Widawsky
  2011-11-01 15:29   ` Eugeni Dodonov
  2011-10-30 19:12 ` [PATCH 4/5] drm/i915: refactor ring error state capture to use arrays Daniel Vetter
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 16+ messages in thread
From: Daniel Vetter @ 2011-10-30 19:12 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

... and add a helpr function for the places where we want a flag.

This way we can use ring->id to index into arrays.

v2: Resurrect the missing beautification-space Chris Wilson noted.
I'm moving this space around because I'll reuse ring_str in the next
patch.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |    9 +++++----
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    4 ++--
 drivers/gpu/drm/i915/i915_irq.c            |    2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c    |   14 +++++++-------
 drivers/gpu/drm/i915/intel_ringbuffer.h    |   20 ++++++++++----------
 5 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index f2e0207..9e6cd50 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -666,9 +666,9 @@ static int i915_ringbuffer_info(struct seq_file *m, void *data)
 static const char *ring_str(int ring)
 {
 	switch (ring) {
-	case RING_RENDER: return " render";
-	case RING_BSD: return " bsd";
-	case RING_BLT: return " blt";
+	case RCS: return "render";
+	case VCS: return "bsd";
+	case BCS: return "blt";
 	default: return "";
 	}
 }
@@ -711,7 +711,7 @@ static void print_error_buffers(struct seq_file *m,
 	seq_printf(m, "%s [%d]:\n", name, count);
 
 	while (count--) {
-		seq_printf(m, "  %08x %8u %04x %04x %08x%s%s%s%s%s%s",
+		seq_printf(m, "  %08x %8u %04x %04x %08x%s%s%s%s%s%s%s",
 			   err->gtt_offset,
 			   err->size,
 			   err->read_domains,
@@ -721,6 +721,7 @@ static void print_error_buffers(struct seq_file *m,
 			   tiling_flag(err->tiling),
 			   dirty_flag(err->dirty),
 			   purgeable_flag(err->purgeable),
+			   err->ring != -1 ? " " : "",
 			   ring_str(err->ring),
 			   cache_level_str(err->cache_level));
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3693e83..926ed48 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -202,9 +202,9 @@ i915_gem_object_set_to_gpu_domain(struct drm_i915_gem_object *obj,
 	cd->invalidate_domains |= invalidate_domains;
 	cd->flush_domains |= flush_domains;
 	if (flush_domains & I915_GEM_GPU_DOMAINS)
-		cd->flush_rings |= obj->ring->id;
+		cd->flush_rings |= intel_ring_flag(obj->ring);
 	if (invalidate_domains & I915_GEM_GPU_DOMAINS)
-		cd->flush_rings |= ring->id;
+		cd->flush_rings |= intel_ring_flag(ring);
 }
 
 struct eb_objects {
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 69d4044..3cd85dd 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -804,7 +804,7 @@ static u32 capture_bo_list(struct drm_i915_error_buffer *err,
 		err->tiling = obj->tiling_mode;
 		err->dirty = obj->dirty;
 		err->purgeable = obj->madv != I915_MADV_WILLNEED;
-		err->ring = obj->ring ? obj->ring->id : 0;
+		err->ring = obj->ring ? obj->ring->id : -1;
 		err->cache_level = obj->cache_level;
 
 		if (++i == count)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6e28301..3c30dba 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -726,13 +726,13 @@ void intel_ring_setup_status_page(struct intel_ring_buffer *ring)
 	 */
 	if (IS_GEN7(dev)) {
 		switch (ring->id) {
-		case RING_RENDER:
+		case RCS:
 			mmio = RENDER_HWS_PGA_GEN7;
 			break;
-		case RING_BLT:
+		case BCS:
 			mmio = BLT_HWS_PGA_GEN7;
 			break;
-		case RING_BSD:
+		case VCS:
 			mmio = BSD_HWS_PGA_GEN7;
 			break;
 		}
@@ -1185,7 +1185,7 @@ void intel_ring_advance(struct intel_ring_buffer *ring)
 
 static const struct intel_ring_buffer render_ring = {
 	.name			= "render ring",
-	.id			= RING_RENDER,
+	.id			= RCS,
 	.mmio_base		= RENDER_RING_BASE,
 	.size			= 32 * PAGE_SIZE,
 	.init			= init_render_ring,
@@ -1208,7 +1208,7 @@ static const struct intel_ring_buffer render_ring = {
 
 static const struct intel_ring_buffer bsd_ring = {
 	.name                   = "bsd ring",
-	.id			= RING_BSD,
+	.id			= VCS,
 	.mmio_base		= BSD_RING_BASE,
 	.size			= 32 * PAGE_SIZE,
 	.init			= init_ring_common,
@@ -1318,7 +1318,7 @@ gen6_bsd_ring_put_irq(struct intel_ring_buffer *ring)
 /* ring buffer for Video Codec for Gen6+ */
 static const struct intel_ring_buffer gen6_bsd_ring = {
 	.name			= "gen6 bsd ring",
-	.id			= RING_BSD,
+	.id			= VCS,
 	.mmio_base		= GEN6_BSD_RING_BASE,
 	.size			= 32 * PAGE_SIZE,
 	.init			= init_ring_common,
@@ -1453,7 +1453,7 @@ static void blt_ring_cleanup(struct intel_ring_buffer *ring)
 
 static const struct intel_ring_buffer gen6_blt_ring = {
 	.name			= "blt ring",
-	.id			= RING_BLT,
+	.id			= BCS,
 	.mmio_base		= BLT_RING_BASE,
 	.size			= 32 * PAGE_SIZE,
 	.init			= blt_ring_init,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 68281c9..c8b9cc0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -1,13 +1,6 @@
 #ifndef _INTEL_RINGBUFFER_H_
 #define _INTEL_RINGBUFFER_H_
 
-enum {
-	RCS = 0x0,
-	VCS,
-	BCS,
-	I915_NUM_RINGS,
-};
-
 struct  intel_hw_status_page {
 	u32	__iomem	*page_addr;
 	unsigned int	gfx_addr;
@@ -36,10 +29,11 @@ struct  intel_hw_status_page {
 struct  intel_ring_buffer {
 	const char	*name;
 	enum intel_ring_id {
-		RING_RENDER = 0x1,
-		RING_BSD = 0x2,
-		RING_BLT = 0x4,
+		RCS = 0x0,
+		VCS,
+		BCS,
 	} id;
+#define I915_NUM_RINGS 3
 	u32		mmio_base;
 	void		__iomem *virtual_start;
 	struct		drm_device *dev;
@@ -119,6 +113,12 @@ struct  intel_ring_buffer {
 	void *private;
 };
 
+static inline unsigned
+intel_ring_flag(struct intel_ring_buffer *ring)
+{
+	return 1 << ring->id;
+}
+
 static inline u32
 intel_ring_sync_index(struct intel_ring_buffer *ring,
 		      struct intel_ring_buffer *other)
-- 
1.7.6.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/5] drm/i915: refactor ring error state capture to use arrays
  2011-10-30 19:12 [PATCH 1/5] drm/i915: kicking rings stuck on semaphores considered harmful Daniel Vetter
  2011-10-30 19:12 ` [PATCH 2/5] drm/i915: don't bail out of intel_wait_ring_buffer too early Daniel Vetter
  2011-10-30 19:12 ` [PATCH 3/5] drm/i915: switch ring->id to be a real id Daniel Vetter
@ 2011-10-30 19:12 ` Daniel Vetter
  2011-10-30 19:32   ` Chris Wilson
  2011-10-31  1:47   ` Ben Widawsky
  2011-10-30 19:12 ` [PATCH 5/5] drm/i915: collect more per ring error state Daniel Vetter
  2011-10-31  1:25 ` [PATCH 1/5] drm/i915: kicking rings stuck on semaphores considered harmful Ben Widawsky
  4 siblings, 2 replies; 16+ messages in thread
From: Daniel Vetter @ 2011-10-30 19:12 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

The code already got unwindy and we want to dump more per-ring
registers.

Only functional change is that we now also capture the video
ring registers on ilk.

v2: fixup a refactor fumble spotted by Chris Wilson.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_debugfs.c |   55 ++++++++++++++-------------
 drivers/gpu/drm/i915/i915_drv.h     |   20 ++-------
 drivers/gpu/drm/i915/i915_irq.c     |   70 ++++++++++++++++++-----------------
 drivers/gpu/drm/i915/i915_reg.h     |   11 +----
 4 files changed, 73 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 9e6cd50..290aece 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -735,6 +735,26 @@ static void print_error_buffers(struct seq_file *m,
 	}
 }
 
+static void i915_ring_error_state(struct seq_file *m,
+				  struct drm_device *dev,
+				  struct drm_i915_error_state *error,
+				  unsigned ring)
+{
+	seq_printf(m, "%s command stream:\n", ring_str(ring));
+	seq_printf(m, "  ACTHD: 0x%08x\n", error->acthd[ring]);
+	seq_printf(m, "  IPEIR: 0x%08x\n", error->ipeir[ring]);
+	seq_printf(m, "  IPEHR: 0x%08x\n", error->ipehr[ring]);
+	seq_printf(m, "  INSTDONE: 0x%08x\n", error->instdone[ring]);
+	if (ring == RCS) {
+		if (INTEL_INFO(dev)->gen >= 4) {
+			seq_printf(m, "  INSTDONE1: 0x%08x\n", error->instdone1);
+			seq_printf(m, "  INSTPS: 0x%08x\n", error->instps);
+		}
+		seq_printf(m, "  INSTPM: 0x%08x\n", error->instpm);
+	}
+	seq_printf(m, "  seqno: 0x%08x\n", error->seqno[ring]);
+}
+
 static int i915_error_state(struct seq_file *m, void *unused)
 {
 	struct drm_info_node *node = (struct drm_info_node *) m->private;
@@ -757,36 +777,19 @@ static int i915_error_state(struct seq_file *m, void *unused)
 	seq_printf(m, "PCI ID: 0x%04x\n", dev->pci_device);
 	seq_printf(m, "EIR: 0x%08x\n", error->eir);
 	seq_printf(m, "PGTBL_ER: 0x%08x\n", error->pgtbl_er);
-	if (INTEL_INFO(dev)->gen >= 6) {
-		seq_printf(m, "ERROR: 0x%08x\n", error->error);
-		seq_printf(m, "Blitter command stream:\n");
-		seq_printf(m, "  ACTHD:    0x%08x\n", error->bcs_acthd);
-		seq_printf(m, "  IPEIR:    0x%08x\n", error->bcs_ipeir);
-		seq_printf(m, "  IPEHR:    0x%08x\n", error->bcs_ipehr);
-		seq_printf(m, "  INSTDONE: 0x%08x\n", error->bcs_instdone);
-		seq_printf(m, "  seqno:    0x%08x\n", error->bcs_seqno);
-		seq_printf(m, "Video (BSD) command stream:\n");
-		seq_printf(m, "  ACTHD:    0x%08x\n", error->vcs_acthd);
-		seq_printf(m, "  IPEIR:    0x%08x\n", error->vcs_ipeir);
-		seq_printf(m, "  IPEHR:    0x%08x\n", error->vcs_ipehr);
-		seq_printf(m, "  INSTDONE: 0x%08x\n", error->vcs_instdone);
-		seq_printf(m, "  seqno:    0x%08x\n", error->vcs_seqno);
-	}
-	seq_printf(m, "Render command stream:\n");
-	seq_printf(m, "  ACTHD: 0x%08x\n", error->acthd);
-	seq_printf(m, "  IPEIR: 0x%08x\n", error->ipeir);
-	seq_printf(m, "  IPEHR: 0x%08x\n", error->ipehr);
-	seq_printf(m, "  INSTDONE: 0x%08x\n", error->instdone);
-	if (INTEL_INFO(dev)->gen >= 4) {
-		seq_printf(m, "  INSTDONE1: 0x%08x\n", error->instdone1);
-		seq_printf(m, "  INSTPS: 0x%08x\n", error->instps);
-	}
-	seq_printf(m, "  INSTPM: 0x%08x\n", error->instpm);
-	seq_printf(m, "  seqno: 0x%08x\n", error->seqno);
 
 	for (i = 0; i < dev_priv->num_fence_regs; i++)
 		seq_printf(m, "  fence[%d] = %08llx\n", i, error->fence[i]);
 
+	if (INTEL_INFO(dev)->gen >= 6) 
+		seq_printf(m, "ERROR: 0x%08x\n", error->error);
+
+	i915_ring_error_state(m, dev, error, RCS);
+	if (HAS_BLT(dev))
+		i915_ring_error_state(m, dev, error, BCS);
+	if (HAS_BSD(dev))
+		i915_ring_error_state(m, dev, error, VCS);
+
 	if (error->active_bo)
 		print_error_buffers(m, "Active",
 				    error->active_bo,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d2da91f..17617c1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -151,25 +151,15 @@ struct drm_i915_error_state {
 	u32 eir;
 	u32 pgtbl_er;
 	u32 pipestat[I915_MAX_PIPES];
-	u32 ipeir;
-	u32 ipehr;
-	u32 instdone;
-	u32 acthd;
+	u32 ipeir[I915_NUM_RINGS];
+	u32 ipehr[I915_NUM_RINGS];
+	u32 instdone[I915_NUM_RINGS];
+	u32 acthd[I915_NUM_RINGS];
 	u32 error; /* gen6+ */
-	u32 bcs_acthd; /* gen6+ blt engine */
-	u32 bcs_ipehr;
-	u32 bcs_ipeir;
-	u32 bcs_instdone;
-	u32 bcs_seqno;
-	u32 vcs_acthd; /* gen6+ bsd engine */
-	u32 vcs_ipehr;
-	u32 vcs_ipeir;
-	u32 vcs_instdone;
-	u32 vcs_seqno;
 	u32 instpm;
 	u32 instps;
 	u32 instdone1;
-	u32 seqno;
+	u32 seqno[I915_NUM_RINGS];
 	u64 bbaddr;
 	u64 fence[I915_MAX_NUM_FENCES];
 	struct timeval time;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 3cd85dd..70e67f1 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -876,6 +876,32 @@ i915_error_first_batchbuffer(struct drm_i915_private *dev_priv,
 	return NULL;
 }
 
+static void i915_record_ring_state(struct drm_device *dev,
+				   struct drm_i915_error_state *error,
+				   struct intel_ring_buffer *ring)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	if (INTEL_INFO(dev)->gen >= 4) {
+		error->ipeir[ring->id] = I915_READ(RING_IPEIR(ring->mmio_base));
+		error->ipehr[ring->id] = I915_READ(RING_IPEHR(ring->mmio_base));
+		error->instdone[ring->id] = I915_READ(RING_INSTDONE(ring->mmio_base));
+		if (ring->id == RCS) {
+			error->instps = I915_READ(INSTPS);
+			error->instdone1 = I915_READ(INSTDONE1);
+			error->bbaddr = I915_READ64(BB_ADDR);
+		}
+	} else {
+		error->ipeir[ring->id] = I915_READ(IPEIR);
+		error->ipehr[ring->id] = I915_READ(IPEHR);
+		error->instdone[ring->id] = I915_READ(INSTDONE);
+		error->bbaddr = 0;
+	}
+
+	error->seqno[ring->id] = ring->get_seqno(ring);
+	error->acthd[ring->id] = intel_ring_get_active_head(ring);
+}
+
 /**
  * i915_capture_error_state - capture an error record for later analysis
  * @dev: drm device
@@ -909,47 +935,23 @@ static void i915_capture_error_state(struct drm_device *dev)
 	DRM_INFO("capturing error event; look for more information in /debug/dri/%d/i915_error_state\n",
 		 dev->primary->index);
 
-	error->seqno = dev_priv->ring[RCS].get_seqno(&dev_priv->ring[RCS]);
 	error->eir = I915_READ(EIR);
 	error->pgtbl_er = I915_READ(PGTBL_ER);
 	for_each_pipe(pipe)
 		error->pipestat[pipe] = I915_READ(PIPESTAT(pipe));
 	error->instpm = I915_READ(INSTPM);
-	error->error = 0;
-	if (INTEL_INFO(dev)->gen >= 6) {
+
+	if (INTEL_INFO(dev)->gen >= 6)
 		error->error = I915_READ(ERROR_GEN6);
+	else
+		error->error = 0;
+
+	i915_record_ring_state(dev, error, &dev_priv->ring[RCS]);
+	if (HAS_BLT(dev))
+		i915_record_ring_state(dev, error, &dev_priv->ring[BCS]);
+	if (HAS_BSD(dev))
+		i915_record_ring_state(dev, error, &dev_priv->ring[VCS]);
 
-		error->bcs_acthd = I915_READ(BCS_ACTHD);
-		error->bcs_ipehr = I915_READ(BCS_IPEHR);
-		error->bcs_ipeir = I915_READ(BCS_IPEIR);
-		error->bcs_instdone = I915_READ(BCS_INSTDONE);
-		error->bcs_seqno = 0;
-		if (dev_priv->ring[BCS].get_seqno)
-			error->bcs_seqno = dev_priv->ring[BCS].get_seqno(&dev_priv->ring[BCS]);
-
-		error->vcs_acthd = I915_READ(VCS_ACTHD);
-		error->vcs_ipehr = I915_READ(VCS_IPEHR);
-		error->vcs_ipeir = I915_READ(VCS_IPEIR);
-		error->vcs_instdone = I915_READ(VCS_INSTDONE);
-		error->vcs_seqno = 0;
-		if (dev_priv->ring[VCS].get_seqno)
-			error->vcs_seqno = dev_priv->ring[VCS].get_seqno(&dev_priv->ring[VCS]);
-	}
-	if (INTEL_INFO(dev)->gen >= 4) {
-		error->ipeir = I915_READ(IPEIR_I965);
-		error->ipehr = I915_READ(IPEHR_I965);
-		error->instdone = I915_READ(INSTDONE_I965);
-		error->instps = I915_READ(INSTPS);
-		error->instdone1 = I915_READ(INSTDONE1);
-		error->acthd = I915_READ(ACTHD_I965);
-		error->bbaddr = I915_READ64(BB_ADDR);
-	} else {
-		error->ipeir = I915_READ(IPEIR);
-		error->ipehr = I915_READ(IPEHR);
-		error->instdone = I915_READ(INSTDONE);
-		error->acthd = I915_READ(ACTHD);
-		error->bbaddr = 0;
-	}
 	i915_gem_record_fences(dev, error);
 
 	/* Record the active batch and ring buffers */
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 5a09416..c292957 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -352,6 +352,9 @@
 #define IPEIR_I965	0x02064
 #define IPEHR_I965	0x02068
 #define INSTDONE_I965	0x0206c
+#define RING_IPEIR(base)	((base)+0x64)
+#define RING_IPEHR(base)	((base)+0x68)
+#define RING_INSTDONE(base)	((base)+0x6c)
 #define INSTPS		0x02070 /* 965+ only */
 #define INSTDONE1	0x0207c /* 965+ only */
 #define ACTHD_I965	0x02074
@@ -365,14 +368,6 @@
 #define INSTDONE	0x02090
 #define NOPID		0x02094
 #define HWSTAM		0x02098
-#define VCS_INSTDONE	0x1206C
-#define VCS_IPEIR	0x12064
-#define VCS_IPEHR	0x12068
-#define VCS_ACTHD	0x12074
-#define BCS_INSTDONE	0x2206C
-#define BCS_IPEIR	0x22064
-#define BCS_IPEHR	0x22068
-#define BCS_ACTHD	0x22074
 
 #define ERROR_GEN6	0x040a0
 
-- 
1.7.6.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/5] drm/i915: collect more per ring error state
  2011-10-30 19:12 [PATCH 1/5] drm/i915: kicking rings stuck on semaphores considered harmful Daniel Vetter
                   ` (2 preceding siblings ...)
  2011-10-30 19:12 ` [PATCH 4/5] drm/i915: refactor ring error state capture to use arrays Daniel Vetter
@ 2011-10-30 19:12 ` Daniel Vetter
  2011-10-31  1:51   ` Ben Widawsky
  2011-10-31  1:25 ` [PATCH 1/5] drm/i915: kicking rings stuck on semaphores considered harmful Ben Widawsky
  4 siblings, 1 reply; 16+ messages in thread
From: Daniel Vetter @ 2011-10-30 19:12 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

Based on a patch by Ben Widawsky, but with different colors
for the bikeshed.

In contrast to Ben's patch this one doesn't add the fault regs.
Afaics they're for the optional page fault support which
- we're not enabling
- and which seems to be unsupported by the hw team. Recent bspec
  lacks tons of information about this that the public docs released
  half a year back still contain.

Also dump ring HEAD/TAIL registers - I've recently seen a few
error_state where just guessing these is not good enough.

v2: Also dump INSTPM for every ring.

v3: Fix a few really silly goof-ups spotted by Chris Wilson.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c |   16 ++++++++++------
 drivers/gpu/drm/i915/i915_drv.h     |    7 +++++--
 drivers/gpu/drm/i915/i915_irq.c     |    9 +++++++--
 drivers/gpu/drm/i915/i915_reg.h     |    3 +++
 4 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 290aece..5ba63ad 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -741,17 +741,21 @@ static void i915_ring_error_state(struct seq_file *m,
 				  unsigned ring)
 {
 	seq_printf(m, "%s command stream:\n", ring_str(ring));
+	seq_printf(m, "  HEAD: 0x%08x\n", error->head[ring]);
+	seq_printf(m, "  TAIL: 0x%08x\n", error->tail[ring]);
 	seq_printf(m, "  ACTHD: 0x%08x\n", error->acthd[ring]);
 	seq_printf(m, "  IPEIR: 0x%08x\n", error->ipeir[ring]);
 	seq_printf(m, "  IPEHR: 0x%08x\n", error->ipehr[ring]);
 	seq_printf(m, "  INSTDONE: 0x%08x\n", error->instdone[ring]);
-	if (ring == RCS) {
-		if (INTEL_INFO(dev)->gen >= 4) {
-			seq_printf(m, "  INSTDONE1: 0x%08x\n", error->instdone1);
-			seq_printf(m, "  INSTPS: 0x%08x\n", error->instps);
-		}
-		seq_printf(m, "  INSTPM: 0x%08x\n", error->instpm);
+	if (ring == RCS && INTEL_INFO(dev)->gen >= 4) {
+		seq_printf(m, "  INSTDONE1: 0x%08x\n", error->instdone1);
+		seq_printf(m, "  BBADDR: 0x%08llx\n", error->bbaddr);
 	}
+	if (INTEL_INFO(dev)->gen >= 4)
+		seq_printf(m, "  INSTPS: 0x%08x\n", error->instps[ring]);
+	seq_printf(m, "  INSTPM: 0x%08x\n", error->instpm[ring]);
+	if (INTEL_INFO(dev)->gen >= 6)
+		seq_printf(m, "  FADDR: 0x%08x\n", error->faddr[ring]);
 	seq_printf(m, "  seqno: 0x%08x\n", error->seqno[ring]);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 17617c1..bd98fb3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -151,16 +151,19 @@ struct drm_i915_error_state {
 	u32 eir;
 	u32 pgtbl_er;
 	u32 pipestat[I915_MAX_PIPES];
+	u32 tail[I915_NUM_RINGS];
+	u32 head[I915_NUM_RINGS];
 	u32 ipeir[I915_NUM_RINGS];
 	u32 ipehr[I915_NUM_RINGS];
 	u32 instdone[I915_NUM_RINGS];
 	u32 acthd[I915_NUM_RINGS];
 	u32 error; /* gen6+ */
-	u32 instpm;
-	u32 instps;
+	u32 instpm[I915_NUM_RINGS];
+	u32 instps[I915_NUM_RINGS];
 	u32 instdone1;
 	u32 seqno[I915_NUM_RINGS];
 	u64 bbaddr;
+	u32 faddr[I915_NUM_RINGS];
 	u64 fence[I915_MAX_NUM_FENCES];
 	struct timeval time;
 	struct drm_i915_error_object {
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 70e67f1..a04d606 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -882,12 +882,15 @@ static void i915_record_ring_state(struct drm_device *dev,
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
+	if (INTEL_INFO(dev)->gen >= 6)
+		error->faddr[ring->id] = I915_READ(RING_DMA_FADD(ring->mmio_base));
+
 	if (INTEL_INFO(dev)->gen >= 4) {
 		error->ipeir[ring->id] = I915_READ(RING_IPEIR(ring->mmio_base));
 		error->ipehr[ring->id] = I915_READ(RING_IPEHR(ring->mmio_base));
 		error->instdone[ring->id] = I915_READ(RING_INSTDONE(ring->mmio_base));
+		error->instps[ring->id] = I915_READ(RING_INSTPS(ring->mmio_base));
 		if (ring->id == RCS) {
-			error->instps = I915_READ(INSTPS);
 			error->instdone1 = I915_READ(INSTDONE1);
 			error->bbaddr = I915_READ64(BB_ADDR);
 		}
@@ -898,8 +901,11 @@ static void i915_record_ring_state(struct drm_device *dev,
 		error->bbaddr = 0;
 	}
 
+	error->instpm[ring->id] = I915_READ(RING_INSTPM(ring->mmio_base));
 	error->seqno[ring->id] = ring->get_seqno(ring);
 	error->acthd[ring->id] = intel_ring_get_active_head(ring);
+	error->head[ring->id] = I915_READ_HEAD(ring);
+	error->tail[ring->id] = I915_READ_TAIL(ring);
 }
 
 /**
@@ -939,7 +945,6 @@ static void i915_capture_error_state(struct drm_device *dev)
 	error->pgtbl_er = I915_READ(PGTBL_ER);
 	for_each_pipe(pipe)
 		error->pipestat[pipe] = I915_READ(PIPESTAT(pipe));
-	error->instpm = I915_READ(INSTPM);
 
 	if (INTEL_INFO(dev)->gen >= 6)
 		error->error = I915_READ(ERROR_GEN6);
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index c292957..cbf5f9f 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -355,6 +355,9 @@
 #define RING_IPEIR(base)	((base)+0x64)
 #define RING_IPEHR(base)	((base)+0x68)
 #define RING_INSTDONE(base)	((base)+0x6c)
+#define RING_INSTPS(base)	((base)+0x70)
+#define RING_DMA_FADD(base)	((base)+0x78)
+#define RING_INSTPM(base)	((base)+0xc0)
 #define INSTPS		0x02070 /* 965+ only */
 #define INSTDONE1	0x0207c /* 965+ only */
 #define ACTHD_I965	0x02074
-- 
1.7.6.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] drm/i915: refactor ring error state capture to use arrays
  2011-10-30 19:12 ` [PATCH 4/5] drm/i915: refactor ring error state capture to use arrays Daniel Vetter
@ 2011-10-30 19:32   ` Chris Wilson
  2011-10-31  1:47   ` Ben Widawsky
  1 sibling, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2011-10-30 19:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

On Sun, 30 Oct 2011 20:12:11 +0100, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> The code already got unwindy and we want to dump more per-ring
> registers.

So this patch is to cook the spaghetti then? Not sure if unwindy
conjures the right imagery, I still think you mean unwieldly. ;)
 
> Only functional change is that we now also capture the video
> ring registers on ilk.
> 
> v2: fixup a refactor fumble spotted by Chris Wilson.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] drm/i915: kicking rings stuck on semaphores considered harmful
  2011-10-30 19:12 [PATCH 1/5] drm/i915: kicking rings stuck on semaphores considered harmful Daniel Vetter
                   ` (3 preceding siblings ...)
  2011-10-30 19:12 ` [PATCH 5/5] drm/i915: collect more per ring error state Daniel Vetter
@ 2011-10-31  1:25 ` Ben Widawsky
  4 siblings, 0 replies; 16+ messages in thread
From: Ben Widawsky @ 2011-10-31  1:25 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Sun, 30 Oct 2011 20:12:08 +0100
Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> If our semaphore logic gets confused and we have a ring stuck waiting
> for one, there's a decent chance it'll just execute garbage when being
> kicked. Also, kicking the ring obscures the place where the error
> first occured, making error_state decoding much harder.
> 
> So drop this an let gpu reset handle this mess in a clean fashion.
> 
> In contrast, kicking rings stuck on MI_WAIT is rather harmless, at
> worst there'll be a bit of screen-flickering. There's also old
> broken userspace out there which needs this as a  work-around.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Reviewed-by: Chris Wilson <chris@hchris-wilson.co.uk>
> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_irq.c |    7 -------
>  1 files changed, 0 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index b40004b..69d4044 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1649,13 +1649,6 @@ static bool kick_ring(struct intel_ring_buffer *ring)
>  		I915_WRITE_CTL(ring, tmp);
>  		return true;
>  	}
> -	if (IS_GEN6(dev) &&
> -	    (tmp & RING_WAIT_SEMAPHORE)) {
> -		DRM_ERROR("Kicking stuck semaphore on %s\n",
> -			  ring->name);
> -		I915_WRITE_CTL(ring, tmp);
> -		return true;
> -	}
>  	return false;
>  }
>  

This also has the added benefit that it fixes a forcewake race on gen6+.
Ben

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] drm/i915: don't bail out of intel_wait_ring_buffer too early
  2011-10-30 19:12 ` [PATCH 2/5] drm/i915: don't bail out of intel_wait_ring_buffer too early Daniel Vetter
@ 2011-10-31  1:29   ` Ben Widawsky
  2011-10-31  7:37     ` Daniel Vetter
  2011-11-01 15:31   ` Eugeni Dodonov
  1 sibling, 1 reply; 16+ messages in thread
From: Ben Widawsky @ 2011-10-31  1:29 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Sun, 30 Oct 2011 20:12:09 +0100
Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> In the pre-gem days with non-existing hangcheck and gpu reset code,
> this timeout of 3 seconds was pretty important to avoid stuck
> processes.
> 
> But now we have the hangcheck code in gem that goes to great length
> to ensure that the gpu is really dead before declaring it wedged.
> 
> So there's no need for this timeout anymore. Actually it's even harmful
> because we can bail out too early (e.g. with xscreensaver slip)
> when running giant batchbuffers. And our code isn't robust enough
> to properly unroll any state-changes, we pretty much rely on the gpu
> reset code cleaning up the mess (like cache tracking, fencing state,
> active list/request tracking, ...).
> 
> With this change intel_begin_ring can only fail when the gpu is
> wedged, and it will return -EAGAIN (like wait_request in case the
> gpu reset is still outstanding).
> 
> v2: Chris Wilson noted that on resume timers aren't running and hence
> we won't ever get kicked out of this loop by the hangcheck code. Use
> an insanely large timeout instead for the HAS_GEM case to prevent
> resume bugs from totally hanging the machine.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index ca70e2f..6e28301 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1119,7 +1119,16 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
>  	}
>  
>  	trace_i915_ring_wait_begin(ring);
> -	end = jiffies + 3 * HZ;
> +	if (drm_core_check_feature(dev, DRIVER_GEM))
> +		/* With GEM the hangcheck timer should kick us out of the loop,
> +		 * leaving it early runs the risk of corrupting GEM state (due
> +		 * to running on almost untested codepaths). But on resume
> +		 * timers don't work yet, so prevent a complete hang in that
> +		 * case by choosing an insanely large timeout. */
> +		end = jiffies + 60 * HZ;
> +	else
> +		end = jiffies + 3 * HZ;
> +
>  	do {
>  		ring->head = I915_READ_HEAD(ring);
>  		ring->space = ring_space(ring);

I didn't really check to see if there is actually an issue here, but
instead of 60, do you want to play nice with timeouts such as
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT (ie. the min of all the timeouts and
60)?

Acked-by: Ben Widawsky <ben@bwidawsk.net>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] drm/i915: refactor ring error state capture to use arrays
  2011-10-30 19:12 ` [PATCH 4/5] drm/i915: refactor ring error state capture to use arrays Daniel Vetter
  2011-10-30 19:32   ` Chris Wilson
@ 2011-10-31  1:47   ` Ben Widawsky
  2011-10-31  1:50     ` Ben Widawsky
  1 sibling, 1 reply; 16+ messages in thread
From: Ben Widawsky @ 2011-10-31  1:47 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Sun, 30 Oct 2011 20:12:11 +0100
Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> The code already got unwindy and we want to dump more per-ring
> registers.
> 
> Only functional change is that we now also capture the video
> ring registers on ilk.
> 
> v2: fixup a refactor fumble spotted by Chris Wilson.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |   55 ++++++++++++++-------------
>  drivers/gpu/drm/i915/i915_drv.h     |   20 ++-------
>  drivers/gpu/drm/i915/i915_irq.c     |   70 ++++++++++++++++++-----------------
>  drivers/gpu/drm/i915/i915_reg.h     |   11 +----
>  4 files changed, 73 insertions(+), 83 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 9e6cd50..290aece 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -735,6 +735,26 @@ static void print_error_buffers(struct seq_file *m,
>  	}
>  }
>  
> +static void i915_ring_error_state(struct seq_file *m,
> +				  struct drm_device *dev,
> +				  struct drm_i915_error_state *error,
> +				  unsigned ring)
> +{
> +	seq_printf(m, "%s command stream:\n", ring_str(ring));
> +	seq_printf(m, "  ACTHD: 0x%08x\n", error->acthd[ring]);
> +	seq_printf(m, "  IPEIR: 0x%08x\n", error->ipeir[ring]);
> +	seq_printf(m, "  IPEHR: 0x%08x\n", error->ipehr[ring]);
> +	seq_printf(m, "  INSTDONE: 0x%08x\n", error->instdone[ring]);
> +	if (ring == RCS) {
> +		if (INTEL_INFO(dev)->gen >= 4) {
> +			seq_printf(m, "  INSTDONE1: 0x%08x\n", error->instdone1);
> +			seq_printf(m, "  INSTPS: 0x%08x\n", error->instps);
> +		}
> +		seq_printf(m, "  INSTPM: 0x%08x\n", error->instpm);
> +	}
> +	seq_printf(m, "  seqno: 0x%08x\n", error->seqno[ring]);
> +}
> +
>  static int i915_error_state(struct seq_file *m, void *unused)
>  {
>  	struct drm_info_node *node = (struct drm_info_node *) m->private;
> @@ -757,36 +777,19 @@ static int i915_error_state(struct seq_file *m, void *unused)
>  	seq_printf(m, "PCI ID: 0x%04x\n", dev->pci_device);
>  	seq_printf(m, "EIR: 0x%08x\n", error->eir);
>  	seq_printf(m, "PGTBL_ER: 0x%08x\n", error->pgtbl_er);
> -	if (INTEL_INFO(dev)->gen >= 6) {
> -		seq_printf(m, "ERROR: 0x%08x\n", error->error);
> -		seq_printf(m, "Blitter command stream:\n");
> -		seq_printf(m, "  ACTHD:    0x%08x\n", error->bcs_acthd);
> -		seq_printf(m, "  IPEIR:    0x%08x\n", error->bcs_ipeir);
> -		seq_printf(m, "  IPEHR:    0x%08x\n", error->bcs_ipehr);
> -		seq_printf(m, "  INSTDONE: 0x%08x\n", error->bcs_instdone);
> -		seq_printf(m, "  seqno:    0x%08x\n", error->bcs_seqno);
> -		seq_printf(m, "Video (BSD) command stream:\n");
> -		seq_printf(m, "  ACTHD:    0x%08x\n", error->vcs_acthd);
> -		seq_printf(m, "  IPEIR:    0x%08x\n", error->vcs_ipeir);
> -		seq_printf(m, "  IPEHR:    0x%08x\n", error->vcs_ipehr);
> -		seq_printf(m, "  INSTDONE: 0x%08x\n", error->vcs_instdone);
> -		seq_printf(m, "  seqno:    0x%08x\n", error->vcs_seqno);
> -	}
> -	seq_printf(m, "Render command stream:\n");
> -	seq_printf(m, "  ACTHD: 0x%08x\n", error->acthd);
> -	seq_printf(m, "  IPEIR: 0x%08x\n", error->ipeir);
> -	seq_printf(m, "  IPEHR: 0x%08x\n", error->ipehr);
> -	seq_printf(m, "  INSTDONE: 0x%08x\n", error->instdone);
> -	if (INTEL_INFO(dev)->gen >= 4) {
> -		seq_printf(m, "  INSTDONE1: 0x%08x\n", error->instdone1);
> -		seq_printf(m, "  INSTPS: 0x%08x\n", error->instps);
> -	}
> -	seq_printf(m, "  INSTPM: 0x%08x\n", error->instpm);
> -	seq_printf(m, "  seqno: 0x%08x\n", error->seqno);
>  
>  	for (i = 0; i < dev_priv->num_fence_regs; i++)
>  		seq_printf(m, "  fence[%d] = %08llx\n", i, error->fence[i]);
>  
> +	if (INTEL_INFO(dev)->gen >= 6) 
> +		seq_printf(m, "ERROR: 0x%08x\n", error->error);
> +
> +	i915_ring_error_state(m, dev, error, RCS);
> +	if (HAS_BLT(dev))
> +		i915_ring_error_state(m, dev, error, BCS);
> +	if (HAS_BSD(dev))
> +		i915_ring_error_state(m, dev, error, VCS);
> +
>  	if (error->active_bo)
>  		print_error_buffers(m, "Active",
>  				    error->active_bo,
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index d2da91f..17617c1 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -151,25 +151,15 @@ struct drm_i915_error_state {
>  	u32 eir;
>  	u32 pgtbl_er;
>  	u32 pipestat[I915_MAX_PIPES];
> -	u32 ipeir;
> -	u32 ipehr;
> -	u32 instdone;
> -	u32 acthd;
> +	u32 ipeir[I915_NUM_RINGS];
> +	u32 ipehr[I915_NUM_RINGS];
> +	u32 instdone[I915_NUM_RINGS];
> +	u32 acthd[I915_NUM_RINGS];
>  	u32 error; /* gen6+ */
> -	u32 bcs_acthd; /* gen6+ blt engine */
> -	u32 bcs_ipehr;
> -	u32 bcs_ipeir;
> -	u32 bcs_instdone;
> -	u32 bcs_seqno;
> -	u32 vcs_acthd; /* gen6+ bsd engine */
> -	u32 vcs_ipehr;
> -	u32 vcs_ipeir;
> -	u32 vcs_instdone;
> -	u32 vcs_seqno;
>  	u32 instpm;
>  	u32 instps;
>  	u32 instdone1;
> -	u32 seqno;
> +	u32 seqno[I915_NUM_RINGS];
>  	u64 bbaddr;
>  	u64 fence[I915_MAX_NUM_FENCES];
>  	struct timeval time;
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 3cd85dd..70e67f1 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -876,6 +876,32 @@ i915_error_first_batchbuffer(struct drm_i915_private *dev_priv,
>  	return NULL;
>  }
>  
> +static void i915_record_ring_state(struct drm_device *dev,
> +				   struct drm_i915_error_state *error,
> +				   struct intel_ring_buffer *ring)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	if (INTEL_INFO(dev)->gen >= 4) {
> +		error->ipeir[ring->id] = I915_READ(RING_IPEIR(ring->mmio_base));
> +		error->ipehr[ring->id] = I915_READ(RING_IPEHR(ring->mmio_base));
> +		error->instdone[ring->id] = I915_READ(RING_INSTDONE(ring->mmio_base));
> +		if (ring->id == RCS) {
> +			error->instps = I915_READ(INSTPS);
> +			error->instdone1 = I915_READ(INSTDONE1);
> +			error->bbaddr = I915_READ64(BB_ADDR);
> +		}
> +	} else {
> +		error->ipeir[ring->id] = I915_READ(IPEIR);
> +		error->ipehr[ring->id] = I915_READ(IPEHR);
> +		error->instdone[ring->id] = I915_READ(INSTDONE);
> +		error->bbaddr = 0;
> +	}
> +
> +	error->seqno[ring->id] = ring->get_seqno(ring);
> +	error->acthd[ring->id] = intel_ring_get_active_head(ring);
> +}
> +
>  /**
>   * i915_capture_error_state - capture an error record for later analysis
>   * @dev: drm device
> @@ -909,47 +935,23 @@ static void i915_capture_error_state(struct drm_device *dev)
>  	DRM_INFO("capturing error event; look for more information in /debug/dri/%d/i915_error_state\n",
>  		 dev->primary->index);
>  
> -	error->seqno = dev_priv->ring[RCS].get_seqno(&dev_priv->ring[RCS]);
>  	error->eir = I915_READ(EIR);
>  	error->pgtbl_er = I915_READ(PGTBL_ER);
>  	for_each_pipe(pipe)
>  		error->pipestat[pipe] = I915_READ(PIPESTAT(pipe));
>  	error->instpm = I915_READ(INSTPM);
> -	error->error = 0;
> -	if (INTEL_INFO(dev)->gen >= 6) {
> +
> +	if (INTEL_INFO(dev)->gen >= 6)
>  		error->error = I915_READ(ERROR_GEN6);
> +	else
> +		error->error = 0;
> +
> +	i915_record_ring_state(dev, error, &dev_priv->ring[RCS]);
> +	if (HAS_BLT(dev))
> +		i915_record_ring_state(dev, error, &dev_priv->ring[BCS]);
> +	if (HAS_BSD(dev))
> +		i915_record_ring_state(dev, error, &dev_priv->ring[VCS]);
>  
> -		error->bcs_acthd = I915_READ(BCS_ACTHD);
> -		error->bcs_ipehr = I915_READ(BCS_IPEHR);
> -		error->bcs_ipeir = I915_READ(BCS_IPEIR);
> -		error->bcs_instdone = I915_READ(BCS_INSTDONE);
> -		error->bcs_seqno = 0;
> -		if (dev_priv->ring[BCS].get_seqno)
> -			error->bcs_seqno = dev_priv->ring[BCS].get_seqno(&dev_priv->ring[BCS]);
> -
> -		error->vcs_acthd = I915_READ(VCS_ACTHD);
> -		error->vcs_ipehr = I915_READ(VCS_IPEHR);
> -		error->vcs_ipeir = I915_READ(VCS_IPEIR);
> -		error->vcs_instdone = I915_READ(VCS_INSTDONE);
> -		error->vcs_seqno = 0;
> -		if (dev_priv->ring[VCS].get_seqno)
> -			error->vcs_seqno = dev_priv->ring[VCS].get_seqno(&dev_priv->ring[VCS]);
> -	}
> -	if (INTEL_INFO(dev)->gen >= 4) {
> -		error->ipeir = I915_READ(IPEIR_I965);
> -		error->ipehr = I915_READ(IPEHR_I965);
> -		error->instdone = I915_READ(INSTDONE_I965);
> -		error->instps = I915_READ(INSTPS);
> -		error->instdone1 = I915_READ(INSTDONE1);
> -		error->acthd = I915_READ(ACTHD_I965);
> -		error->bbaddr = I915_READ64(BB_ADDR);
> -	} else {
> -		error->ipeir = I915_READ(IPEIR);
> -		error->ipehr = I915_READ(IPEHR);
> -		error->instdone = I915_READ(INSTDONE);
> -		error->acthd = I915_READ(ACTHD);
> -		error->bbaddr = 0;
> -	}
>  	i915_gem_record_fences(dev, error);
>  
>  	/* Record the active batch and ring buffers */
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 5a09416..c292957 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -352,6 +352,9 @@
>  #define IPEIR_I965	0x02064
>  #define IPEHR_I965	0x02068
>  #define INSTDONE_I965	0x0206c
> +#define RING_IPEIR(base)	((base)+0x64)
> +#define RING_IPEHR(base)	((base)+0x68)
> +#define RING_INSTDONE(base)	((base)+0x6c)
>  #define INSTPS		0x02070 /* 965+ only */
>  #define INSTDONE1	0x0207c /* 965+ only */
>  #define ACTHD_I965	0x02074
> @@ -365,14 +368,6 @@
>  #define INSTDONE	0x02090
>  #define NOPID		0x02094
>  #define HWSTAM		0x02098
> -#define VCS_INSTDONE	0x1206C
> -#define VCS_IPEIR	0x12064
> -#define VCS_IPEHR	0x12068
> -#define VCS_ACTHD	0x12074
> -#define BCS_INSTDONE	0x2206C
> -#define BCS_IPEIR	0x22064
> -#define BCS_IPEHR	0x22068
> -#define BCS_ACTHD	0x22074
>  
>  #define ERROR_GEN6	0x040a0
>  

This patch looks a lot like an earlier patch of mine except your ring ID
changes made it quite nice. I wonder why you didn't keep my INSTPS, and INSTPM
per ring?

Ben

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] drm/i915: switch ring->id to be a real id
  2011-10-30 19:12 ` [PATCH 3/5] drm/i915: switch ring->id to be a real id Daniel Vetter
@ 2011-10-31  1:48   ` Ben Widawsky
  2011-11-01 15:29   ` Eugeni Dodonov
  1 sibling, 0 replies; 16+ messages in thread
From: Ben Widawsky @ 2011-10-31  1:48 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Sun, 30 Oct 2011 20:12:10 +0100
Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> ... and add a helpr function for the places where we want a flag.
> 
> This way we can use ring->id to index into arrays.
> 
> v2: Resurrect the missing beautification-space Chris Wilson noted.
> I'm moving this space around because I'll reuse ring_str in the next
> patch.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] drm/i915: refactor ring error state capture to use arrays
  2011-10-31  1:47   ` Ben Widawsky
@ 2011-10-31  1:50     ` Ben Widawsky
  2011-10-31  7:53       ` Daniel Vetter
  0 siblings, 1 reply; 16+ messages in thread
From: Ben Widawsky @ 2011-10-31  1:50 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Daniel Vetter, intel-gfx

On Sun, 30 Oct 2011 18:47:50 -0700
Ben Widawsky <ben@bwidawsk.net> wrote:

> On Sun, 30 Oct 2011 20:12:11 +0100
> Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> 
> > The code already got unwindy and we want to dump more per-ring
> > registers.
> > 
> > Only functional change is that we now also capture the video
> > ring registers on ilk.
> > 
> > v2: fixup a refactor fumble spotted by Chris Wilson.
> > 
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  drivers/gpu/drm/i915/i915_debugfs.c |   55 ++++++++++++++-------------
> >  drivers/gpu/drm/i915/i915_drv.h     |   20 ++-------
> >  drivers/gpu/drm/i915/i915_irq.c     |   70 ++++++++++++++++++-----------------
> >  drivers/gpu/drm/i915/i915_reg.h     |   11 +----
> >  4 files changed, 73 insertions(+), 83 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 9e6cd50..290aece 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -735,6 +735,26 @@ static void print_error_buffers(struct seq_file *m,
> >  	}
> >  }
> >  
> > +static void i915_ring_error_state(struct seq_file *m,
> > +				  struct drm_device *dev,
> > +				  struct drm_i915_error_state *error,
> > +				  unsigned ring)
> > +{
> > +	seq_printf(m, "%s command stream:\n", ring_str(ring));
> > +	seq_printf(m, "  ACTHD: 0x%08x\n", error->acthd[ring]);
> > +	seq_printf(m, "  IPEIR: 0x%08x\n", error->ipeir[ring]);
> > +	seq_printf(m, "  IPEHR: 0x%08x\n", error->ipehr[ring]);
> > +	seq_printf(m, "  INSTDONE: 0x%08x\n", error->instdone[ring]);
> > +	if (ring == RCS) {
> > +		if (INTEL_INFO(dev)->gen >= 4) {
> > +			seq_printf(m, "  INSTDONE1: 0x%08x\n", error->instdone1);
> > +			seq_printf(m, "  INSTPS: 0x%08x\n", error->instps);
> > +		}
> > +		seq_printf(m, "  INSTPM: 0x%08x\n", error->instpm);
> > +	}
> > +	seq_printf(m, "  seqno: 0x%08x\n", error->seqno[ring]);
> > +}
> > +
> >  static int i915_error_state(struct seq_file *m, void *unused)
> >  {
> >  	struct drm_info_node *node = (struct drm_info_node *) m->private;
> > @@ -757,36 +777,19 @@ static int i915_error_state(struct seq_file *m, void *unused)
> >  	seq_printf(m, "PCI ID: 0x%04x\n", dev->pci_device);
> >  	seq_printf(m, "EIR: 0x%08x\n", error->eir);
> >  	seq_printf(m, "PGTBL_ER: 0x%08x\n", error->pgtbl_er);
> > -	if (INTEL_INFO(dev)->gen >= 6) {
> > -		seq_printf(m, "ERROR: 0x%08x\n", error->error);
> > -		seq_printf(m, "Blitter command stream:\n");
> > -		seq_printf(m, "  ACTHD:    0x%08x\n", error->bcs_acthd);
> > -		seq_printf(m, "  IPEIR:    0x%08x\n", error->bcs_ipeir);
> > -		seq_printf(m, "  IPEHR:    0x%08x\n", error->bcs_ipehr);
> > -		seq_printf(m, "  INSTDONE: 0x%08x\n", error->bcs_instdone);
> > -		seq_printf(m, "  seqno:    0x%08x\n", error->bcs_seqno);
> > -		seq_printf(m, "Video (BSD) command stream:\n");
> > -		seq_printf(m, "  ACTHD:    0x%08x\n", error->vcs_acthd);
> > -		seq_printf(m, "  IPEIR:    0x%08x\n", error->vcs_ipeir);
> > -		seq_printf(m, "  IPEHR:    0x%08x\n", error->vcs_ipehr);
> > -		seq_printf(m, "  INSTDONE: 0x%08x\n", error->vcs_instdone);
> > -		seq_printf(m, "  seqno:    0x%08x\n", error->vcs_seqno);
> > -	}
> > -	seq_printf(m, "Render command stream:\n");
> > -	seq_printf(m, "  ACTHD: 0x%08x\n", error->acthd);
> > -	seq_printf(m, "  IPEIR: 0x%08x\n", error->ipeir);
> > -	seq_printf(m, "  IPEHR: 0x%08x\n", error->ipehr);
> > -	seq_printf(m, "  INSTDONE: 0x%08x\n", error->instdone);
> > -	if (INTEL_INFO(dev)->gen >= 4) {
> > -		seq_printf(m, "  INSTDONE1: 0x%08x\n", error->instdone1);
> > -		seq_printf(m, "  INSTPS: 0x%08x\n", error->instps);
> > -	}
> > -	seq_printf(m, "  INSTPM: 0x%08x\n", error->instpm);
> > -	seq_printf(m, "  seqno: 0x%08x\n", error->seqno);
> >  
> >  	for (i = 0; i < dev_priv->num_fence_regs; i++)
> >  		seq_printf(m, "  fence[%d] = %08llx\n", i, error->fence[i]);
> >  
> > +	if (INTEL_INFO(dev)->gen >= 6) 
> > +		seq_printf(m, "ERROR: 0x%08x\n", error->error);
> > +
> > +	i915_ring_error_state(m, dev, error, RCS);
> > +	if (HAS_BLT(dev))
> > +		i915_ring_error_state(m, dev, error, BCS);
> > +	if (HAS_BSD(dev))
> > +		i915_ring_error_state(m, dev, error, VCS);
> > +
> >  	if (error->active_bo)
> >  		print_error_buffers(m, "Active",
> >  				    error->active_bo,
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index d2da91f..17617c1 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -151,25 +151,15 @@ struct drm_i915_error_state {
> >  	u32 eir;
> >  	u32 pgtbl_er;
> >  	u32 pipestat[I915_MAX_PIPES];
> > -	u32 ipeir;
> > -	u32 ipehr;
> > -	u32 instdone;
> > -	u32 acthd;
> > +	u32 ipeir[I915_NUM_RINGS];
> > +	u32 ipehr[I915_NUM_RINGS];
> > +	u32 instdone[I915_NUM_RINGS];
> > +	u32 acthd[I915_NUM_RINGS];
> >  	u32 error; /* gen6+ */
> > -	u32 bcs_acthd; /* gen6+ blt engine */
> > -	u32 bcs_ipehr;
> > -	u32 bcs_ipeir;
> > -	u32 bcs_instdone;
> > -	u32 bcs_seqno;
> > -	u32 vcs_acthd; /* gen6+ bsd engine */
> > -	u32 vcs_ipehr;
> > -	u32 vcs_ipeir;
> > -	u32 vcs_instdone;
> > -	u32 vcs_seqno;
> >  	u32 instpm;
> >  	u32 instps;
> >  	u32 instdone1;
> > -	u32 seqno;
> > +	u32 seqno[I915_NUM_RINGS];
> >  	u64 bbaddr;
> >  	u64 fence[I915_MAX_NUM_FENCES];
> >  	struct timeval time;
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 3cd85dd..70e67f1 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -876,6 +876,32 @@ i915_error_first_batchbuffer(struct drm_i915_private *dev_priv,
> >  	return NULL;
> >  }
> >  
> > +static void i915_record_ring_state(struct drm_device *dev,
> > +				   struct drm_i915_error_state *error,
> > +				   struct intel_ring_buffer *ring)
> > +{
> > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > +
> > +	if (INTEL_INFO(dev)->gen >= 4) {
> > +		error->ipeir[ring->id] = I915_READ(RING_IPEIR(ring->mmio_base));
> > +		error->ipehr[ring->id] = I915_READ(RING_IPEHR(ring->mmio_base));
> > +		error->instdone[ring->id] = I915_READ(RING_INSTDONE(ring->mmio_base));
> > +		if (ring->id == RCS) {
> > +			error->instps = I915_READ(INSTPS);
> > +			error->instdone1 = I915_READ(INSTDONE1);
> > +			error->bbaddr = I915_READ64(BB_ADDR);
> > +		}
> > +	} else {
> > +		error->ipeir[ring->id] = I915_READ(IPEIR);
> > +		error->ipehr[ring->id] = I915_READ(IPEHR);
> > +		error->instdone[ring->id] = I915_READ(INSTDONE);
> > +		error->bbaddr = 0;
> > +	}
> > +
> > +	error->seqno[ring->id] = ring->get_seqno(ring);
> > +	error->acthd[ring->id] = intel_ring_get_active_head(ring);
> > +}
> > +
> >  /**
> >   * i915_capture_error_state - capture an error record for later analysis
> >   * @dev: drm device
> > @@ -909,47 +935,23 @@ static void i915_capture_error_state(struct drm_device *dev)
> >  	DRM_INFO("capturing error event; look for more information in /debug/dri/%d/i915_error_state\n",
> >  		 dev->primary->index);
> >  
> > -	error->seqno = dev_priv->ring[RCS].get_seqno(&dev_priv->ring[RCS]);
> >  	error->eir = I915_READ(EIR);
> >  	error->pgtbl_er = I915_READ(PGTBL_ER);
> >  	for_each_pipe(pipe)
> >  		error->pipestat[pipe] = I915_READ(PIPESTAT(pipe));
> >  	error->instpm = I915_READ(INSTPM);
> > -	error->error = 0;
> > -	if (INTEL_INFO(dev)->gen >= 6) {
> > +
> > +	if (INTEL_INFO(dev)->gen >= 6)
> >  		error->error = I915_READ(ERROR_GEN6);
> > +	else
> > +		error->error = 0;
> > +
> > +	i915_record_ring_state(dev, error, &dev_priv->ring[RCS]);
> > +	if (HAS_BLT(dev))
> > +		i915_record_ring_state(dev, error, &dev_priv->ring[BCS]);
> > +	if (HAS_BSD(dev))
> > +		i915_record_ring_state(dev, error, &dev_priv->ring[VCS]);
> >  
> > -		error->bcs_acthd = I915_READ(BCS_ACTHD);
> > -		error->bcs_ipehr = I915_READ(BCS_IPEHR);
> > -		error->bcs_ipeir = I915_READ(BCS_IPEIR);
> > -		error->bcs_instdone = I915_READ(BCS_INSTDONE);
> > -		error->bcs_seqno = 0;
> > -		if (dev_priv->ring[BCS].get_seqno)
> > -			error->bcs_seqno = dev_priv->ring[BCS].get_seqno(&dev_priv->ring[BCS]);
> > -
> > -		error->vcs_acthd = I915_READ(VCS_ACTHD);
> > -		error->vcs_ipehr = I915_READ(VCS_IPEHR);
> > -		error->vcs_ipeir = I915_READ(VCS_IPEIR);
> > -		error->vcs_instdone = I915_READ(VCS_INSTDONE);
> > -		error->vcs_seqno = 0;
> > -		if (dev_priv->ring[VCS].get_seqno)
> > -			error->vcs_seqno = dev_priv->ring[VCS].get_seqno(&dev_priv->ring[VCS]);
> > -	}
> > -	if (INTEL_INFO(dev)->gen >= 4) {
> > -		error->ipeir = I915_READ(IPEIR_I965);
> > -		error->ipehr = I915_READ(IPEHR_I965);
> > -		error->instdone = I915_READ(INSTDONE_I965);
> > -		error->instps = I915_READ(INSTPS);
> > -		error->instdone1 = I915_READ(INSTDONE1);
> > -		error->acthd = I915_READ(ACTHD_I965);
> > -		error->bbaddr = I915_READ64(BB_ADDR);
> > -	} else {
> > -		error->ipeir = I915_READ(IPEIR);
> > -		error->ipehr = I915_READ(IPEHR);
> > -		error->instdone = I915_READ(INSTDONE);
> > -		error->acthd = I915_READ(ACTHD);
> > -		error->bbaddr = 0;
> > -	}
> >  	i915_gem_record_fences(dev, error);
> >  
> >  	/* Record the active batch and ring buffers */
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index 5a09416..c292957 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -352,6 +352,9 @@
> >  #define IPEIR_I965	0x02064
> >  #define IPEHR_I965	0x02068
> >  #define INSTDONE_I965	0x0206c
> > +#define RING_IPEIR(base)	((base)+0x64)
> > +#define RING_IPEHR(base)	((base)+0x68)
> > +#define RING_INSTDONE(base)	((base)+0x6c)
> >  #define INSTPS		0x02070 /* 965+ only */
> >  #define INSTDONE1	0x0207c /* 965+ only */
> >  #define ACTHD_I965	0x02074
> > @@ -365,14 +368,6 @@
> >  #define INSTDONE	0x02090
> >  #define NOPID		0x02094
> >  #define HWSTAM		0x02098
> > -#define VCS_INSTDONE	0x1206C
> > -#define VCS_IPEIR	0x12064
> > -#define VCS_IPEHR	0x12068
> > -#define VCS_ACTHD	0x12074
> > -#define BCS_INSTDONE	0x2206C
> > -#define BCS_IPEIR	0x22064
> > -#define BCS_IPEHR	0x22068
> > -#define BCS_ACTHD	0x22074
> >  
> >  #define ERROR_GEN6	0x040a0
> >  
> 
> This patch looks a lot like an earlier patch of mine except your ring ID
> changes made it quite nice. I wonder why you didn't keep my INSTPS, and INSTPM
> per ring?
> 
> Ben

Ah, just saw patch 5... I guess I find this a little weird way to break
it up, but I think I did a much worse job in my patches.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] drm/i915: collect more per ring error state
  2011-10-30 19:12 ` [PATCH 5/5] drm/i915: collect more per ring error state Daniel Vetter
@ 2011-10-31  1:51   ` Ben Widawsky
  0 siblings, 0 replies; 16+ messages in thread
From: Ben Widawsky @ 2011-10-31  1:51 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Sun, 30 Oct 2011 20:12:12 +0100
Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> Based on a patch by Ben Widawsky, but with different colors
> for the bikeshed.
> 
> In contrast to Ben's patch this one doesn't add the fault regs.
> Afaics they're for the optional page fault support which
> - we're not enabling
> - and which seems to be unsupported by the hw team. Recent bspec
>   lacks tons of information about this that the public docs released
>   half a year back still contain.
> 
> Also dump ring HEAD/TAIL registers - I've recently seen a few
> error_state where just guessing these is not good enough.
> 
> v2: Also dump INSTPM for every ring.
> 
> v3: Fix a few really silly goof-ups spotted by Chris Wilson.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] drm/i915: don't bail out of intel_wait_ring_buffer too early
  2011-10-31  1:29   ` Ben Widawsky
@ 2011-10-31  7:37     ` Daniel Vetter
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Vetter @ 2011-10-31  7:37 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Daniel Vetter, intel-gfx

On Sun, Oct 30, 2011 at 06:29:13PM -0700, Ben Widawsky wrote:
> I didn't really check to see if there is actually an issue here, but
> instead of 60, do you want to play nice with timeouts such as
> CONFIG_DEFAULT_HUNG_TASK_TIMEOUT (ie. the min of all the timeouts and
> 60)?

Yeah, 60s should be half the default hung task timeout. But people who
muck around with these settings should know what they're doing, so I don't
care. Worst case we splatter the dmesg with hung task backtraces before we
continue (and then splatter the dmesg with gpu hung warnings).
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] drm/i915: refactor ring error state capture to use arrays
  2011-10-31  1:50     ` Ben Widawsky
@ 2011-10-31  7:53       ` Daniel Vetter
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Vetter @ 2011-10-31  7:53 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Daniel Vetter, intel-gfx

On Sun, Oct 30, 2011 at 06:50:05PM -0700, Ben Widawsky wrote:
> Ah, just saw patch 5... I guess I find this a little weird way to break
> it up, but I think I did a much worse job in my patches.

Safe when the change is really small (i.e. a few one-liner hunks), always
separate the refactor from the actual code change. It's just damn hard to
spot a small functional change amid tons of code movement, making proper
review impossible.

That's also the reason I'll only ever smash an acked-by on top of a
massive refactor - you can't review that kind of stuff without completely
redoing the patch yourself.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] drm/i915: switch ring->id to be a real id
  2011-10-30 19:12 ` [PATCH 3/5] drm/i915: switch ring->id to be a real id Daniel Vetter
  2011-10-31  1:48   ` Ben Widawsky
@ 2011-11-01 15:29   ` Eugeni Dodonov
  1 sibling, 0 replies; 16+ messages in thread
From: Eugeni Dodonov @ 2011-11-01 15:29 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 572 bytes --]

On Sun, Oct 30, 2011 at 17:12, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> ... and add a helpr function for the places where we want a flag.
>
> This way we can use ring->id to index into arrays.
>
> v2: Resurrect the missing beautification-space Chris Wilson noted.
> I'm moving this space around because I'll reuse ring_str in the next
> patch.
>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>

Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>

-- 
Eugeni Dodonov
<http://eugeni.dodonov.net/>

[-- Attachment #1.2: Type: text/html, Size: 1057 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] drm/i915: don't bail out of intel_wait_ring_buffer too early
  2011-10-30 19:12 ` [PATCH 2/5] drm/i915: don't bail out of intel_wait_ring_buffer too early Daniel Vetter
  2011-10-31  1:29   ` Ben Widawsky
@ 2011-11-01 15:31   ` Eugeni Dodonov
  1 sibling, 0 replies; 16+ messages in thread
From: Eugeni Dodonov @ 2011-11-01 15:31 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 1408 bytes --]

On Sun, Oct 30, 2011 at 17:12, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> In the pre-gem days with non-existing hangcheck and gpu reset code,
> this timeout of 3 seconds was pretty important to avoid stuck
> processes.
>
> But now we have the hangcheck code in gem that goes to great length
> to ensure that the gpu is really dead before declaring it wedged.
>
> So there's no need for this timeout anymore. Actually it's even harmful
> because we can bail out too early (e.g. with xscreensaver slip)
> when running giant batchbuffers. And our code isn't robust enough
> to properly unroll any state-changes, we pretty much rely on the gpu
> reset code cleaning up the mess (like cache tracking, fencing state,
> active list/request tracking, ...).
>
> With this change intel_begin_ring can only fail when the gpu is
> wedged, and it will return -EAGAIN (like wait_request in case the
> gpu reset is still outstanding).
>
> v2: Chris Wilson noted that on resume timers aren't running and hence
> we won't ever get kicked out of this loop by the hangcheck code. Use
> an insanely large timeout instead for the HAS_GEM case to prevent
> resume bugs from totally hanging the machine.
>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>

Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>


-- 
Eugeni Dodonov
<http://eugeni.dodonov.net/>

[-- Attachment #1.2: Type: text/html, Size: 1952 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2011-11-01 15:32 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-30 19:12 [PATCH 1/5] drm/i915: kicking rings stuck on semaphores considered harmful Daniel Vetter
2011-10-30 19:12 ` [PATCH 2/5] drm/i915: don't bail out of intel_wait_ring_buffer too early Daniel Vetter
2011-10-31  1:29   ` Ben Widawsky
2011-10-31  7:37     ` Daniel Vetter
2011-11-01 15:31   ` Eugeni Dodonov
2011-10-30 19:12 ` [PATCH 3/5] drm/i915: switch ring->id to be a real id Daniel Vetter
2011-10-31  1:48   ` Ben Widawsky
2011-11-01 15:29   ` Eugeni Dodonov
2011-10-30 19:12 ` [PATCH 4/5] drm/i915: refactor ring error state capture to use arrays Daniel Vetter
2011-10-30 19:32   ` Chris Wilson
2011-10-31  1:47   ` Ben Widawsky
2011-10-31  1:50     ` Ben Widawsky
2011-10-31  7:53       ` Daniel Vetter
2011-10-30 19:12 ` [PATCH 5/5] drm/i915: collect more per ring error state Daniel Vetter
2011-10-31  1:51   ` Ben Widawsky
2011-10-31  1:25 ` [PATCH 1/5] drm/i915: kicking rings stuck on semaphores considered harmful Ben Widawsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox