intel-gfx.lists.freedesktop.org archive mirror
* [PATCH v3 1/5] drm/i915/lrc: Clarify the format of the context image
@ 2017-07-13 10:27 Chris Wilson
  2017-07-13 10:27 ` [PATCH v3 2/5] drm/i915/guc: Don't make assumptions while getting the lrca offset Chris Wilson
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Chris Wilson @ 2017-07-13 10:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: intel-gvt-dev

From: Michel Thierry <michel.thierry@intel.com>

Not only does the context image consist of two parts (the PPHWSP and the
logical context state), but we also allocate a header at the start for
sharing data with the GuC. Thus every lrc looks like this:

  | [guc]          | [hwsp] [logical state] |
  |<- our header ->|<- context image      ->|

So far, we have oversimplified whenever we use each of these parts of the
context, just because the GuC header happens to be in page 0 and the
(PP)HWSP is in page 1. But this has led to using the same define for more
than one meaning (as a page index in the lrc and as a size of 1 page).

This patch adds defines for the GuC shared page, the PPHWSP page and the
start of the logical state. It also updates the places where the old
define was being used. Since we are not changing the size (or format) of
the context, there are no functional changes.

v2: Use PPHWSP index for hws again.

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: intel-gvt-dev@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170712193032.27080-1-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gvt/scheduler.c       |  4 ++--
 drivers/gpu/drm/i915/i915_guc_submission.c |  4 ++--
 drivers/gpu/drm/i915/intel_lrc.c           |  9 ++++++---
 drivers/gpu/drm/i915/intel_lrc.h           | 25 ++++++++++++++++++++++---
 4 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index 0e2e36ad6196..8e9eeff156f6 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -87,7 +87,7 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
 			return -EINVAL;
 		}
 
-		page = i915_gem_object_get_page(ctx_obj, LRC_PPHWSP_PN + i);
+		page = i915_gem_object_get_page(ctx_obj, LRC_HEADER_PAGES + i);
 		dst = kmap(page);
 		intel_gvt_hypervisor_read_gpa(vgpu, context_gpa, dst,
 				GTT_PAGE_SIZE);
@@ -361,7 +361,7 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
 			return;
 		}
 
-		page = i915_gem_object_get_page(ctx_obj, LRC_PPHWSP_PN + i);
+		page = i915_gem_object_get_page(ctx_obj, LRC_HEADER_PAGES + i);
 		src = kmap(page);
 		intel_gvt_hypervisor_write_gpa(vgpu, context_gpa, src,
 				GTT_PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 48a1e9349a2c..b7ca13860677 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -1310,7 +1310,7 @@ int intel_guc_suspend(struct drm_i915_private *dev_priv)
 	/* any value greater than GUC_POWER_D0 */
 	data[1] = GUC_POWER_D1;
 	/* first page is shared data with GuC */
-	data[2] = guc_ggtt_offset(ctx->engine[RCS].state);
+	data[2] = guc_ggtt_offset(ctx->engine[RCS].state) + LRC_GUCSHR_PN * PAGE_SIZE;
 
 	return intel_guc_send(guc, data, ARRAY_SIZE(data));
 }
@@ -1336,7 +1336,7 @@ int intel_guc_resume(struct drm_i915_private *dev_priv)
 	data[0] = INTEL_GUC_ACTION_EXIT_S_STATE;
 	data[1] = GUC_POWER_D0;
 	/* first page is shared data with GuC */
-	data[2] = guc_ggtt_offset(ctx->engine[RCS].state);
+	data[2] = guc_ggtt_offset(ctx->engine[RCS].state) + LRC_GUCSHR_PN * PAGE_SIZE;
 
 	return intel_guc_send(guc, data, ARRAY_SIZE(data));
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 699868d81de8..fde145c552ef 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -279,7 +279,7 @@ intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
 	BUILD_BUG_ON(MAX_CONTEXT_HW_ID > (1<<GEN8_CTX_ID_WIDTH));
 
 	desc = ctx->desc_template;				/* bits  0-11 */
-	desc |= i915_ggtt_offset(ce->state) + LRC_PPHWSP_PN * PAGE_SIZE;
+	desc |= i915_ggtt_offset(ce->state) + LRC_HEADER_PAGES * PAGE_SIZE;
 								/* bits 12-31 */
 	desc |= (u64)ctx->hw_id << GEN8_CTX_ID_SHIFT;		/* bits 32-52 */
 
@@ -2015,8 +2015,11 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 
 	context_size = round_up(engine->context_size, I915_GTT_PAGE_SIZE);
 
-	/* One extra page as the sharing data between driver and GuC */
-	context_size += PAGE_SIZE * LRC_PPHWSP_PN;
+	/*
+	 * Before the actual start of the context image, we insert a few pages
+	 * for our own use and for sharing with the GuC.
+	 */
+	context_size += LRC_HEADER_PAGES * PAGE_SIZE;
 
 	ctx_obj = i915_gem_object_create(ctx->i915, context_size);
 	if (IS_ERR(ctx_obj)) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 52b3a1fd4059..cc17cbf5afd1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -70,10 +70,29 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine);
 
 /* Logical Ring Contexts */
 
-/* One extra page is added before LRC for GuC as shared data */
+/*
+ * We allocate a header at the start of the context image for our own
+ * use, therefore the actual location of the logical state is offset
+ * from the start of the VMA. The layout is
+ *
+ * | [guc]          | [hwsp] [logical state] |
+ * |<- our header ->|<- context image      ->|
+ *
+ */
+/* The first page is used for sharing data with the GuC */
 #define LRC_GUCSHR_PN	(0)
-#define LRC_PPHWSP_PN	(LRC_GUCSHR_PN + 1)
-#define LRC_STATE_PN	(LRC_PPHWSP_PN + 1)
+#define LRC_GUCSHR_SZ	(1)
+/* At the start of the context image is its per-process HWS page */
+#define LRC_PPHWSP_PN	(LRC_GUCSHR_PN + LRC_GUCSHR_SZ)
+#define LRC_PPHWSP_SZ	(1)
+/* Finally we have the logical state for the context */
+#define LRC_STATE_PN	(LRC_PPHWSP_PN + LRC_PPHWSP_SZ)
+
+/*
+ * Currently we include the PPHWSP in __intel_engine_context_size() so
+ * the size of the header is synonymous with the start of the PPHWSP.
+ */
+#define LRC_HEADER_PAGES LRC_PPHWSP_PN
 
 struct drm_i915_private;
 struct i915_gem_context;
-- 
2.13.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH v3 2/5] drm/i915/guc: Don't make assumptions while getting the lrca offset
  2017-07-13 10:27 [PATCH v3 1/5] drm/i915/lrc: Clarify the format of the context image Chris Wilson
@ 2017-07-13 10:27 ` Chris Wilson
  2017-07-13 10:27 ` [PATCH v3 3/5] drm/i915/lrc: allocate separate page for HWSP Chris Wilson
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2017-07-13 10:27 UTC (permalink / raw)
  To: intel-gfx

From: Michel Thierry <michel.thierry@intel.com>

Using the HWSP ggtt_offset to get the lrca offset is only correct if the
HWSP happens to be before it (when we reuse the PPHWSP of the kernel
context as the engine HWSP). Instead of making this assumption, get the
lrca offset from the kernel_context engine state.

And while looking at this part of the GuC interaction, it was also
noticed that the firmware expects the size of only the engine context
(the context minus the execlist part, i.e. excluding the first 80
dwords), so pass the right size.

v2: Use the new macros to prevent abusive overuse of the old ones (Chris).

Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170712193032.27080-2-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_guc_submission.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index b7ca13860677..8b96935cf96a 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -1018,6 +1018,12 @@ static void guc_policies_init(struct guc_policies *policies)
 	policies->is_valid = 1;
 }
 
+/*
+ * The first 80 dwords of the register state context, containing the
+ * execlists and ppgtt registers.
+ */
+#define LR_HW_CONTEXT_SIZE	(80 * sizeof(u32))
+
 static int guc_ads_create(struct intel_guc *guc)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
@@ -1032,6 +1038,8 @@ static int guc_ads_create(struct intel_guc *guc)
 	} __packed *blob;
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
+	const u32 skipped_offset = LRC_HEADER_PAGES * PAGE_SIZE;
+	const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE;
 	u32 base;
 
 	GEM_BUG_ON(guc->ads_vma);
@@ -1062,13 +1070,20 @@ static int guc_ads_create(struct intel_guc *guc)
 	 * engines after a reset. Here we use the Render ring default
 	 * context, which must already exist and be pinned in the GGTT,
 	 * so its address won't change after we've told the GuC where
-	 * to find it.
+	 * to find it. Note that we have to skip our header (1 page),
+	 * because our GuC shared data is there.
 	 */
 	blob->ads.golden_context_lrca =
-		dev_priv->engine[RCS]->status_page.ggtt_offset;
+		guc_ggtt_offset(dev_priv->kernel_context->engine[RCS].state) + skipped_offset;
 
+	/*
+	 * The GuC expects us to exclude the portion of the context image that
+	 * it skips from the size it is to read. It starts reading from after
+	 * the execlist context (so skipping the first page [PPHWSP] and 80
+	 * dwords). Weird guc is weird.
+	 */
 	for_each_engine(engine, dev_priv, id)
-		blob->ads.eng_state_size[engine->guc_id] = engine->context_size;
+		blob->ads.eng_state_size[engine->guc_id] = engine->context_size - skipped_size;
 
 	base = guc_ggtt_offset(vma);
 	blob->ads.scheduler_policies = base + ptr_offset(blob, policies);
-- 
2.13.2


* [PATCH v3 3/5] drm/i915/lrc: allocate separate page for HWSP
  2017-07-13 10:27 [PATCH v3 1/5] drm/i915/lrc: Clarify the format of the context image Chris Wilson
  2017-07-13 10:27 ` [PATCH v3 2/5] drm/i915/guc: Don't make assumptions while getting the lrca offset Chris Wilson
@ 2017-07-13 10:27 ` Chris Wilson
  2017-07-13 17:40   ` Michel Thierry
  2017-07-13 10:27 ` [PATCH v3 4/5] drm/i915/execlists: Read the context-status buffer from the HWSP Chris Wilson
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2017-07-13 10:27 UTC (permalink / raw)
  To: intel-gfx

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

On gen8+ we're currently using the PPHWSP of the kernel ctx as the
global HWSP. However, when the kernel ctx gets submitted (e.g. from
__intel_autoenable_gt_powersave) the HW will use that page as both
HWSP and PPHWSP. This causes a conflict in the register arena of the
HWSP, i.e. dword indices below 0x30. We don't currently utilize this
arena, but in the following patches we will take advantage of the cached
register state for handling execlists' context status interrupt.

To avoid the conflict, instead of re-using the PPHWSP of the kernel
ctx we can allocate a separate page for the HWSP, as happens on
pre-execlists platforms.

v2: Add a use-case for the register arena of the HWSP.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1499357440-34688-1-git-send-email-daniele.ceraolospurio@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c  | 126 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.c        |  34 +--------
 drivers/gpu/drm/i915/intel_ringbuffer.c | 125 +------------------------------
 3 files changed, 127 insertions(+), 158 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 24db316e0fd1..cba120f3d89d 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -445,6 +445,114 @@ static void intel_engine_cleanup_scratch(struct intel_engine_cs *engine)
 	i915_vma_unpin_and_release(&engine->scratch);
 }
 
+static void cleanup_phys_status_page(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+
+	if (!dev_priv->status_page_dmah)
+		return;
+
+	drm_pci_free(&dev_priv->drm, dev_priv->status_page_dmah);
+	engine->status_page.page_addr = NULL;
+}
+
+static void cleanup_status_page(struct intel_engine_cs *engine)
+{
+	struct i915_vma *vma;
+	struct drm_i915_gem_object *obj;
+
+	vma = fetch_and_zero(&engine->status_page.vma);
+	if (!vma)
+		return;
+
+	obj = vma->obj;
+
+	i915_vma_unpin(vma);
+	i915_vma_close(vma);
+
+	i915_gem_object_unpin_map(obj);
+	__i915_gem_object_release_unless_active(obj);
+}
+
+static int init_status_page(struct intel_engine_cs *engine)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	unsigned int flags;
+	void *vaddr;
+	int ret;
+
+	obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE);
+	if (IS_ERR(obj)) {
+		DRM_ERROR("Failed to allocate status page\n");
+		return PTR_ERR(obj);
+	}
+
+	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
+	if (ret)
+		goto err;
+
+	vma = i915_vma_instance(obj, &engine->i915->ggtt.base, NULL);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto err;
+	}
+
+	flags = PIN_GLOBAL;
+	if (!HAS_LLC(engine->i915))
+		/* On g33, we cannot place HWS above 256MiB, so
+		 * restrict its pinning to the low mappable arena.
+		 * Though this restriction is not documented for
+		 * gen4, gen5, or byt, they also behave similarly
+		 * and hang if the HWS is placed at the top of the
+		 * GTT. To generalise, it appears that all !llc
+		 * platforms have issues with us placing the HWS
+		 * above the mappable region (even though we never
+		 * actualy map it).
+		 */
+		flags |= PIN_MAPPABLE;
+	ret = i915_vma_pin(vma, 0, 4096, flags);
+	if (ret)
+		goto err;
+
+	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
+	if (IS_ERR(vaddr)) {
+		ret = PTR_ERR(vaddr);
+		goto err_unpin;
+	}
+
+	engine->status_page.vma = vma;
+	engine->status_page.ggtt_offset = i915_ggtt_offset(vma);
+	engine->status_page.page_addr = memset(vaddr, 0, PAGE_SIZE);
+
+	DRM_DEBUG_DRIVER("%s hws offset: 0x%08x\n",
+			 engine->name, i915_ggtt_offset(vma));
+	return 0;
+
+err_unpin:
+	i915_vma_unpin(vma);
+err:
+	i915_gem_object_put(obj);
+	return ret;
+}
+
+static int init_phys_status_page(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+
+	GEM_BUG_ON(engine->id != RCS);
+
+	dev_priv->status_page_dmah =
+		drm_pci_alloc(&dev_priv->drm, PAGE_SIZE, PAGE_SIZE);
+	if (!dev_priv->status_page_dmah)
+		return -ENOMEM;
+
+	engine->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
+	memset(engine->status_page.page_addr, 0, PAGE_SIZE);
+
+	return 0;
+}
+
 /**
  * intel_engines_init_common - initialize cengine state which might require hw access
  * @engine: Engine to initialize.
@@ -480,10 +588,21 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 
 	ret = i915_gem_render_state_init(engine);
 	if (ret)
-		goto err_unpin;
+		goto err_breadcrumbs;
+
+	if (HWS_NEEDS_PHYSICAL(engine->i915))
+		ret = init_phys_status_page(engine);
+	else
+		ret = init_status_page(engine);
+	if (ret)
+		goto err_rs_fini;
 
 	return 0;
 
+err_rs_fini:
+	i915_gem_render_state_fini(engine);
+err_breadcrumbs:
+	intel_engine_fini_breadcrumbs(engine);
 err_unpin:
 	engine->context_unpin(engine, engine->i915->kernel_context);
 	return ret;
@@ -500,6 +619,11 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 {
 	intel_engine_cleanup_scratch(engine);
 
+	if (HWS_NEEDS_PHYSICAL(engine->i915))
+		cleanup_phys_status_page(engine);
+	else
+		cleanup_status_page(engine);
+
 	i915_gem_render_state_fini(engine);
 	intel_engine_fini_breadcrumbs(engine);
 	intel_engine_cleanup_cmd_parser(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fde145c552ef..3469badedbe0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1641,11 +1641,6 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
 	if (engine->cleanup)
 		engine->cleanup(engine);
 
-	if (engine->status_page.vma) {
-		i915_gem_object_unpin_map(engine->status_page.vma->obj);
-		engine->status_page.vma = NULL;
-	}
-
 	intel_engine_cleanup_common(engine);
 
 	lrc_destroy_wa_ctx(engine);
@@ -1692,24 +1687,6 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
 	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
 }
 
-static int
-lrc_setup_hws(struct intel_engine_cs *engine, struct i915_vma *vma)
-{
-	const int hws_offset = LRC_PPHWSP_PN * PAGE_SIZE;
-	void *hws;
-
-	/* The HWSP is part of the default context object in LRC mode. */
-	hws = i915_gem_object_pin_map(vma->obj, I915_MAP_WB);
-	if (IS_ERR(hws))
-		return PTR_ERR(hws);
-
-	engine->status_page.page_addr = hws + hws_offset;
-	engine->status_page.ggtt_offset = i915_ggtt_offset(vma) + hws_offset;
-	engine->status_page.vma = vma;
-
-	return 0;
-}
-
 static void
 logical_ring_setup(struct intel_engine_cs *engine)
 {
@@ -1742,23 +1719,14 @@ logical_ring_setup(struct intel_engine_cs *engine)
 	logical_ring_default_irqs(engine);
 }
 
-static int
-logical_ring_init(struct intel_engine_cs *engine)
+static int logical_ring_init(struct intel_engine_cs *engine)
 {
-	struct i915_gem_context *dctx = engine->i915->kernel_context;
 	int ret;
 
 	ret = intel_engine_init_common(engine);
 	if (ret)
 		goto error;
 
-	/* And setup the hardware status page. */
-	ret = lrc_setup_hws(engine, dctx->engine[engine->id].state);
-	if (ret) {
-		DRM_ERROR("Failed to set up hws %s: %d\n", engine->name, ret);
-		goto error;
-	}
-
 	return 0;
 
 error:
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 5224b7abb8a3..b2580539051e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1174,113 +1174,7 @@ i915_emit_bb_start(struct drm_i915_gem_request *req,
 	return 0;
 }
 
-static void cleanup_phys_status_page(struct intel_engine_cs *engine)
-{
-	struct drm_i915_private *dev_priv = engine->i915;
-
-	if (!dev_priv->status_page_dmah)
-		return;
-
-	drm_pci_free(&dev_priv->drm, dev_priv->status_page_dmah);
-	engine->status_page.page_addr = NULL;
-}
-
-static void cleanup_status_page(struct intel_engine_cs *engine)
-{
-	struct i915_vma *vma;
-	struct drm_i915_gem_object *obj;
-
-	vma = fetch_and_zero(&engine->status_page.vma);
-	if (!vma)
-		return;
-
-	obj = vma->obj;
-
-	i915_vma_unpin(vma);
-	i915_vma_close(vma);
-
-	i915_gem_object_unpin_map(obj);
-	__i915_gem_object_release_unless_active(obj);
-}
 
-static int init_status_page(struct intel_engine_cs *engine)
-{
-	struct drm_i915_gem_object *obj;
-	struct i915_vma *vma;
-	unsigned int flags;
-	void *vaddr;
-	int ret;
-
-	obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE);
-	if (IS_ERR(obj)) {
-		DRM_ERROR("Failed to allocate status page\n");
-		return PTR_ERR(obj);
-	}
-
-	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
-	if (ret)
-		goto err;
-
-	vma = i915_vma_instance(obj, &engine->i915->ggtt.base, NULL);
-	if (IS_ERR(vma)) {
-		ret = PTR_ERR(vma);
-		goto err;
-	}
-
-	flags = PIN_GLOBAL;
-	if (!HAS_LLC(engine->i915))
-		/* On g33, we cannot place HWS above 256MiB, so
-		 * restrict its pinning to the low mappable arena.
-		 * Though this restriction is not documented for
-		 * gen4, gen5, or byt, they also behave similarly
-		 * and hang if the HWS is placed at the top of the
-		 * GTT. To generalise, it appears that all !llc
-		 * platforms have issues with us placing the HWS
-		 * above the mappable region (even though we never
-		 * actualy map it).
-		 */
-		flags |= PIN_MAPPABLE;
-	ret = i915_vma_pin(vma, 0, 4096, flags);
-	if (ret)
-		goto err;
-
-	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
-	if (IS_ERR(vaddr)) {
-		ret = PTR_ERR(vaddr);
-		goto err_unpin;
-	}
-
-	engine->status_page.vma = vma;
-	engine->status_page.ggtt_offset = i915_ggtt_offset(vma);
-	engine->status_page.page_addr = memset(vaddr, 0, PAGE_SIZE);
-
-	DRM_DEBUG_DRIVER("%s hws offset: 0x%08x\n",
-			 engine->name, i915_ggtt_offset(vma));
-	return 0;
-
-err_unpin:
-	i915_vma_unpin(vma);
-err:
-	i915_gem_object_put(obj);
-	return ret;
-}
-
-static int init_phys_status_page(struct intel_engine_cs *engine)
-{
-	struct drm_i915_private *dev_priv = engine->i915;
-
-	GEM_BUG_ON(engine->id != RCS);
-
-	dev_priv->status_page_dmah =
-		drm_pci_alloc(&dev_priv->drm, PAGE_SIZE, PAGE_SIZE);
-	if (!dev_priv->status_page_dmah)
-		return -ENOMEM;
-
-	engine->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
-	memset(engine->status_page.page_addr, 0, PAGE_SIZE);
-
-	return 0;
-}
 
 int intel_ring_pin(struct intel_ring *ring,
 		   struct drm_i915_private *i915,
@@ -1567,17 +1461,10 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 	if (err)
 		goto err;
 
-	if (HWS_NEEDS_PHYSICAL(engine->i915))
-		err = init_phys_status_page(engine);
-	else
-		err = init_status_page(engine);
-	if (err)
-		goto err;
-
 	ring = intel_engine_create_ring(engine, 32 * PAGE_SIZE);
 	if (IS_ERR(ring)) {
 		err = PTR_ERR(ring);
-		goto err_hws;
+		goto err;
 	}
 
 	/* Ring wraparound at offset 0 sometimes hangs. No idea why. */
@@ -1592,11 +1479,6 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 
 err_ring:
 	intel_ring_free(ring);
-err_hws:
-	if (HWS_NEEDS_PHYSICAL(engine->i915))
-		cleanup_phys_status_page(engine);
-	else
-		cleanup_status_page(engine);
 err:
 	intel_engine_cleanup_common(engine);
 	return err;
@@ -1615,11 +1497,6 @@ void intel_engine_cleanup(struct intel_engine_cs *engine)
 	if (engine->cleanup)
 		engine->cleanup(engine);
 
-	if (HWS_NEEDS_PHYSICAL(dev_priv))
-		cleanup_phys_status_page(engine);
-	else
-		cleanup_status_page(engine);
-
 	intel_engine_cleanup_common(engine);
 
 	dev_priv->engine[engine->id] = NULL;
-- 
2.13.2


* [PATCH v3 4/5] drm/i915/execlists: Read the context-status buffer from the HWSP
  2017-07-13 10:27 [PATCH v3 1/5] drm/i915/lrc: Clarify the format of the context image Chris Wilson
  2017-07-13 10:27 ` [PATCH v3 2/5] drm/i915/guc: Don't make assumptions while getting the lrca offset Chris Wilson
  2017-07-13 10:27 ` [PATCH v3 3/5] drm/i915/lrc: allocate separate page for HWSP Chris Wilson
@ 2017-07-13 10:27 ` Chris Wilson
  2017-07-13 17:58   ` Michel Thierry
  2017-07-13 19:44   ` [PATCH v4] " Chris Wilson
  2017-07-13 10:27 ` [PATCH v3 5/5] drm/i915/execlists: Read the context-status HEAD " Chris Wilson
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 16+ messages in thread
From: Chris Wilson @ 2017-07-13 10:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

The engine provides a mirror of the CSB in the HWSP. If we use the
cacheable reads from the HWSP, we can shave off a few mmio reads per
context-switch interrupt (which are quite frequent!). Just removing a
couple of mmio reads is not enough to actually reduce any latency, but
it does give a small reduction in overall cpu usage.

Much appreciation to Ben for dropping the bombshell that the CSB was in
the HWSP, and to Michel for digging out the details.

v2: Don't be lazy, add the defines for the indices.
v3: Include the HWSP in debugfs/i915_engine_info
v4: Check for GVT-g, it currently depends on intercepting CSB mmio

Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  7 +++++--
 drivers/gpu/drm/i915/intel_lrc.c        | 16 +++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
 3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 620c9218d1c1..5fd01c14a3ec 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3384,6 +3384,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 			   upper_32_bits(addr), lower_32_bits(addr));
 
 		if (i915.enable_execlists) {
+			const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
 			u32 ptr, read, write;
 			unsigned int idx;
 
@@ -3404,10 +3405,12 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 				write += GEN8_CSB_ENTRIES;
 			while (read < write) {
 				idx = ++read % GEN8_CSB_ENTRIES;
-				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x, context: %d\n",
+				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x [0x%08x in hwsp], context: %d [%d in hwsp]\n",
 					   idx,
 					   I915_READ(RING_CONTEXT_STATUS_BUF_LO(engine, idx)),
-					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)));
+					   hws[idx * 2],
+					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)),
+					   hws[idx * 2 + 1]);
 			}
 
 			rcu_read_lock();
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3469badedbe0..5b721f65d232 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -547,10 +547,17 @@ static void intel_lrc_irq_handler(unsigned long data)
 	while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
 		u32 __iomem *csb_mmio =
 			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
-		u32 __iomem *buf =
-			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0));
+		/* The HWSP contains a (cacheable) mirror of the CSB */
+		const u32 *buf =
+			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
 		unsigned int head, tail;
 
+		/* However GVT emulation depends upon intercepting CSB mmio */
+		if (unlikely(intel_vgpu_active(dev_priv))) {
+			buf = (u32 * __force)
+				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
+		}
+
 		/* The write will be ordered by the uncached read (itself
 		 * a memory barrier), so we do not need another in the form
 		 * of a locked instruction. The race between the interrupt
@@ -590,13 +597,12 @@ static void intel_lrc_irq_handler(unsigned long data)
 			 * status notifier.
 			 */
 
-			status = readl(buf + 2 * head);
+			status = buf[2 * head];
 			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
 				continue;
 
 			/* Check the context/desc id for this event matches */
-			GEM_DEBUG_BUG_ON(readl(buf + 2 * head + 1) !=
-					 port->context_id);
+			GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
 
 			rq = port_unpack(port, &count);
 			GEM_BUG_ON(count == 0);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index d33c93444c0d..2c55cfa14fb5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -496,6 +496,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
 #define I915_GEM_HWS_SCRATCH_INDEX	0x40
 #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 
+#define I915_HWS_CSB_BUF0_INDEX		0x10
+
 struct intel_ring *
 intel_engine_create_ring(struct intel_engine_cs *engine, int size);
 int intel_ring_pin(struct intel_ring *ring,
-- 
2.13.2


* [PATCH v3 5/5] drm/i915/execlists: Read the context-status HEAD from the HWSP
  2017-07-13 10:27 [PATCH v3 1/5] drm/i915/lrc: Clarify the format of the context image Chris Wilson
                   ` (2 preceding siblings ...)
  2017-07-13 10:27 ` [PATCH v3 4/5] drm/i915/execlists: Read the context-status buffer from the HWSP Chris Wilson
@ 2017-07-13 10:27 ` Chris Wilson
  2017-07-13 14:17   ` Chris Wilson
                     ` (2 more replies)
  2017-07-13 10:50 ` ✗ Fi.CI.BAT: warning for series starting with [v3,1/5] drm/i915/lrc: Clarify the format of the context image Patchwork
  2017-07-13 19:47 ` ✗ Fi.CI.BAT: failure for series starting with [v3,1/5] drm/i915/lrc: Clarify the format of the context image (rev2) Patchwork
  5 siblings, 3 replies; 16+ messages in thread
From: Chris Wilson @ 2017-07-13 10:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

The engine also provides a mirror of the CSB write pointer in the HWSP,
but not of our read pointer. To take advantage of this we need to
remember where we read up to on the last interrupt and continue off from
there. This poses a problem following a reset, as we don't know where
the hw will start writing from, and due to the use of power contexts we
cannot perform that query during the reset itself. So we continue the
current modus operandi of delaying the first read of the context-status
read/write pointers until after the first interrupt. With this we should
now have eliminated all uncached mmio reads in handling the
context-status interrupt, though we still have the uncached mmio writes
for submitting new work, and many uncached mmio reads in the global
interrupt handler itself. Still a step in the right direction towards
reducing our resubmit latency, although it appears lost in the noise!

v2: Cannonlake moved the CSB write index
v3: Include the sw/hwsp state in debugfs/i915_engine_info
v4: Also revert to using CSB mmio for GVT-g

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  6 ++++--
 drivers/gpu/drm/i915/i915_drv.h         |  8 ++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 25 ++++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
 4 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5fd01c14a3ec..552aef61b47b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3395,8 +3395,10 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 			ptr = I915_READ(RING_CONTEXT_STATUS_PTR(engine));
 			read = GEN8_CSB_READ_PTR(ptr);
 			write = GEN8_CSB_WRITE_PTR(ptr);
-			seq_printf(m, "\tExeclist CSB read %d, write %d\n",
-				   read, write);
+			seq_printf(m, "\tExeclist CSB read %d [%d cached], write %d [%d from hws]\n",
+				   read, engine->csb_head,
+				   write,
+				   intel_read_status_page(engine, intel_hws_csb_write_index(engine->i915)));
 			if (read >= GEN8_CSB_ENTRIES)
 				read = 0;
 			if (write >= GEN8_CSB_ENTRIES)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 81cd21ecfa7d..f62c9db8a9a8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -4228,4 +4228,12 @@ static inline bool i915_gem_object_is_coherent(struct drm_i915_gem_object *obj)
 		HAS_LLC(to_i915(obj->base.dev)));
 }
 
+static inline int intel_hws_csb_write_index(struct drm_i915_private *i915)
+{
+	if (INTEL_GEN(i915) >= 10)
+		return CNL_HWS_CSB_WRITE_INDEX;
+	else
+		return I915_HWS_CSB_WRITE_INDEX;
+}
+
 #endif
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5b721f65d232..7c3dce27e504 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -556,6 +556,7 @@ static void intel_lrc_irq_handler(unsigned long data)
 		if (unlikely(intel_vgpu_active(dev_priv))) {
 			buf = (u32 * __force)
 				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
+			engine->csb_head = -1;
 		}
 
 		/* The write will be ordered by the uncached read (itself
@@ -569,9 +570,19 @@ static void intel_lrc_irq_handler(unsigned long data)
 		 * is set and we do a new loop.
 		 */
 		__clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-		head = readl(csb_mmio);
-		tail = GEN8_CSB_WRITE_PTR(head);
-		head = GEN8_CSB_READ_PTR(head);
+		if (unlikely(engine->csb_head == -1)) { /* following a reset */
+			head = readl(csb_mmio);
+			tail = GEN8_CSB_WRITE_PTR(head);
+			head = GEN8_CSB_READ_PTR(head);
+			engine->csb_head = head;
+		} else {
+			const int write_idx =
+				intel_hws_csb_write_index(dev_priv) -
+				I915_HWS_CSB_BUF0_INDEX;
+
+			head = engine->csb_head;
+			tail = buf[write_idx];
+		}
 		while (head != tail) {
 			struct drm_i915_gem_request *rq;
 			unsigned int status;
@@ -625,8 +636,11 @@ static void intel_lrc_irq_handler(unsigned long data)
 				   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
 		}
 
-		writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
-		       csb_mmio);
+		if (head != engine->csb_head) {
+			engine->csb_head = head;
+			writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
+			       csb_mmio);
+		}
 	}
 
 	if (execlists_elsp_ready(engine))
@@ -1253,6 +1267,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
 
 	/* After a GPU reset, we may have requests to replay */
 	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+	engine->csb_head = -1;
 
 	submit = false;
 	for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2c55cfa14fb5..a182da7eb9a9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -391,6 +391,7 @@ struct intel_engine_cs {
 	struct rb_root execlist_queue;
 	struct rb_node *execlist_first;
 	unsigned int fw_domains;
+	unsigned int csb_head;
 
 	/* Contexts are pinned whilst they are active on the GPU. The last
 	 * context executed remains active whilst the GPU is idle - the
@@ -497,6 +498,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
 #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 
 #define I915_HWS_CSB_BUF0_INDEX		0x10
+#define I915_HWS_CSB_WRITE_INDEX	0x1f
+#define CNL_HWS_CSB_WRITE_INDEX		0x2f
 
 struct intel_ring *
 intel_engine_create_ring(struct intel_engine_cs *engine, int size);
-- 
2.13.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread
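A note on the _MASKED_FIELD write used in the interrupt handler above: gen
"masked" registers take a write-enable mask in their top 16 bits, so a single
write can update just the read-pointer field without a read-modify-write cycle.
A rough model — the helper macro mirrors the driver's definition, while the bit
layout shown is inferred from the `head << 8` shift in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Masked-register helper: high 16 bits are the write-enable mask,
 * low 16 bits carry the new value. */
#define _MASKED_FIELD(mask, value) (((uint32_t)(mask) << 16) | (uint32_t)(value))

/* Read pointer assumed to live in bits 8..10, per the head << 8 above. */
#define GEN8_CSB_READ_PTR_MASK (0x7 << 8)

/* Model of how the hardware applies a masked write to a register:
 * only bits selected by the mask are replaced. */
static uint32_t apply_masked_write(uint32_t reg, uint32_t masked)
{
	uint32_t mask = masked >> 16;

	return (reg & ~mask) | (masked & mask);
}
```

With this layout, updating the read pointer to `head` leaves the adjacent
write-pointer field untouched, which is exactly what the handler relies on.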

* ✗ Fi.CI.BAT: warning for series starting with [v3,1/5] drm/i915/lrc: Clarify the format of the context image
  2017-07-13 10:27 [PATCH v3 1/5] drm/i915/lrc: Clarify the format of the context image Chris Wilson
                   ` (3 preceding siblings ...)
  2017-07-13 10:27 ` [PATCH v3 5/5] drm/i915/execlists: Read the context-status HEAD " Chris Wilson
@ 2017-07-13 10:50 ` Patchwork
  2017-07-13 19:47 ` ✗ Fi.CI.BAT: failure for series starting with [v3,1/5] drm/i915/lrc: Clarify the format of the context image (rev2) Patchwork
  5 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2017-07-13 10:50 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [v3,1/5] drm/i915/lrc: Clarify the format of the context image
URL   : https://patchwork.freedesktop.org/series/27232/
State : warning

== Summary ==

Series 27232v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/27232/revisions/1/mbox/

Test gem_exec_suspend:
        Subgroup basic-s4-devices:
                pass       -> DMESG-WARN (fi-kbl-7560u) fdo#100125
Test kms_flip:
        Subgroup basic-flip-vs-modeset:
                pass       -> SKIP       (fi-skl-x1585l)

fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125

fi-bdw-5557u     total:279  pass:268  dwarn:0   dfail:0   fail:0   skip:11  time:444s
fi-bdw-gvtdvm    total:279  pass:265  dwarn:0   dfail:0   fail:0   skip:14  time:431s
fi-blb-e6850     total:279  pass:224  dwarn:1   dfail:0   fail:0   skip:54  time:356s
fi-bsw-n3050     total:279  pass:243  dwarn:0   dfail:0   fail:0   skip:36  time:528s
fi-bxt-j4205     total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:500s
fi-byt-j1900     total:279  pass:254  dwarn:1   dfail:0   fail:0   skip:24  time:492s
fi-byt-n2820     total:279  pass:250  dwarn:1   dfail:0   fail:0   skip:28  time:482s
fi-glk-2a        total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:595s
fi-hsw-4770      total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:443s
fi-hsw-4770r     total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:413s
fi-ilk-650       total:279  pass:229  dwarn:0   dfail:0   fail:0   skip:50  time:421s
fi-ivb-3520m     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:490s
fi-ivb-3770      total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:494s
fi-kbl-7500u     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:465s
fi-kbl-7560u     total:279  pass:268  dwarn:1   dfail:0   fail:0   skip:10  time:581s
fi-kbl-r         total:279  pass:260  dwarn:1   dfail:0   fail:0   skip:18  time:582s
fi-pnv-d510      total:279  pass:222  dwarn:2   dfail:0   fail:0   skip:55  time:561s
fi-skl-6260u     total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:456s
fi-skl-6700hq    total:279  pass:262  dwarn:0   dfail:0   fail:0   skip:17  time:583s
fi-skl-6700k     total:279  pass:257  dwarn:4   dfail:0   fail:0   skip:18  time:463s
fi-skl-6770hq    total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:476s
fi-skl-gvtdvm    total:279  pass:266  dwarn:0   dfail:0   fail:0   skip:13  time:433s
fi-skl-x1585l    total:279  pass:268  dwarn:0   dfail:0   fail:0   skip:11  time:469s
fi-snb-2520m     total:279  pass:251  dwarn:0   dfail:0   fail:0   skip:28  time:544s
fi-snb-2600      total:279  pass:250  dwarn:0   dfail:0   fail:0   skip:29  time:402s

8ad9e19aafea47c272163c2cbf554e06ff7f9857 drm-tip: 2017y-07m-11d-19h-08m-20s UTC integration manifest
01e5e06 drm/i915/execlists: Read the context-status HEAD from the HWSP
047511e drm/i915/execlists: Read the context-status buffer from the HWSP
8a537bb drm/i915/lrc: allocate separate page for HWSP
5e246f6 drm/i915/guc: Don't make assumptions while getting the lrca offset
9e5da4b drm/i915/lrc: Clarify the format of the context image

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_5182/


* Re: [PATCH v3 5/5] drm/i915/execlists: Read the context-status HEAD from the HWSP
  2017-07-13 10:27 ` [PATCH v3 5/5] drm/i915/execlists: Read the context-status HEAD " Chris Wilson
@ 2017-07-13 14:17   ` Chris Wilson
  2017-07-13 20:44     ` Chris Wilson
  2017-07-13 18:12   ` Michel Thierry
  2017-07-20 13:38   ` Mika Kuoppala
  2 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2017-07-13 14:17 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

Quoting Chris Wilson (2017-07-13 11:27:06)
> The engine also provides a mirror of the CSB write pointer in the HWSP,
> but not of our read pointer. To take advantage of this we need to
> remember where we read up to on the last interrupt and continue off from
> there. This poses a problem following a reset, as we don't know where
> the hw will start writing from, and due to the use of power contexts we
> cannot perform that query during the reset itself. So we continue the
> current modus operandi of delaying the first read of the context-status
> read/write pointers until after the first interrupt. With this we should
> now have eliminated all uncached mmio reads in handling the
> context-status interrupt, though we still have the uncached mmio writes
> for submitting new work, and many uncached mmio reads in the global
> interrupt handler itself. Still a step in the right direction towards
> reducing our resubmit latency, although it appears lost in the noise!

It is also worth noting that Broxton seems to handle per-engine resets
better when using the HWSP than CSB mmio. Too early to say for certain;
it needs a few days of testing to confirm the error doesn't recur.
-Chris


* Re: [PATCH v3 3/5] drm/i915/lrc: allocate separate page for HWSP
  2017-07-13 10:27 ` [PATCH v3 3/5] drm/i915/lrc: allocate separate page for HWSP Chris Wilson
@ 2017-07-13 17:40   ` Michel Thierry
  0 siblings, 0 replies; 16+ messages in thread
From: Michel Thierry @ 2017-07-13 17:40 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



On 13/07/17 03:27, Chris Wilson wrote:
> From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> 
> On gen8+ we're currently using the PPHWSP of the kernel ctx as the
> global HWSP. However, when the kernel ctx gets submitted (e.g. from
> __intel_autoenable_gt_powersave) the HW will use that page as both
> HWSP and PPHWSP. This causes a conflict in the register arena of the
> HWSP, i.e. dword indices below 0x30. We don't currently utilize this arena,
> but in the following patches we will take advantage of the cached
> register state for handling execlist's context status interrupt.
> 
> To avoid the conflict, instead of re-using the PPHWSP of the kernel
> ctx we can allocate a separate page for the HWSP like what happens for
> pre-execlists platform.
> 
> v2: Add a use-case for the register arena of the HWSP.
> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Link: http://patchwork.freedesktop.org/patch/msgid/1499357440-34688-1-git-send-email-daniele.ceraolospurio@intel.com
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewded-by: Michel Thierry <michel.thierry@intel.com>
         ^ that's why patchwork didn't detect the r-b.

> ---
>   drivers/gpu/drm/i915/intel_engine_cs.c  | 126 +++++++++++++++++++++++++++++++-
>   drivers/gpu/drm/i915/intel_lrc.c        |  34 +--------
>   drivers/gpu/drm/i915/intel_ringbuffer.c | 125 +------------------------------
>   3 files changed, 127 insertions(+), 158 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 24db316e0fd1..cba120f3d89d 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -445,6 +445,114 @@ static void intel_engine_cleanup_scratch(struct intel_engine_cs *engine)
>   	i915_vma_unpin_and_release(&engine->scratch);
>   }
>   
> +static void cleanup_phys_status_page(struct intel_engine_cs *engine)
> +{
> +	struct drm_i915_private *dev_priv = engine->i915;
> +
> +	if (!dev_priv->status_page_dmah)
> +		return;
> +
> +	drm_pci_free(&dev_priv->drm, dev_priv->status_page_dmah);
> +	engine->status_page.page_addr = NULL;
> +}
> +
> +static void cleanup_status_page(struct intel_engine_cs *engine)
> +{
> +	struct i915_vma *vma;
> +	struct drm_i915_gem_object *obj;
> +
> +	vma = fetch_and_zero(&engine->status_page.vma);
> +	if (!vma)
> +		return;
> +
> +	obj = vma->obj;
> +
> +	i915_vma_unpin(vma);
> +	i915_vma_close(vma);
> +
> +	i915_gem_object_unpin_map(obj);
> +	__i915_gem_object_release_unless_active(obj);
> +}
> +
> +static int init_status_page(struct intel_engine_cs *engine)
> +{
> +	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma;
> +	unsigned int flags;
> +	void *vaddr;
> +	int ret;
> +
> +	obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE);
> +	if (IS_ERR(obj)) {
> +		DRM_ERROR("Failed to allocate status page\n");
> +		return PTR_ERR(obj);
> +	}
> +
> +	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
> +	if (ret)
> +		goto err;
> +
> +	vma = i915_vma_instance(obj, &engine->i915->ggtt.base, NULL);
> +	if (IS_ERR(vma)) {
> +		ret = PTR_ERR(vma);
> +		goto err;
> +	}
> +
> +	flags = PIN_GLOBAL;
> +	if (!HAS_LLC(engine->i915))
> +		/* On g33, we cannot place HWS above 256MiB, so
> +		 * restrict its pinning to the low mappable arena.
> +		 * Though this restriction is not documented for
> +		 * gen4, gen5, or byt, they also behave similarly
> +		 * and hang if the HWS is placed at the top of the
> +		 * GTT. To generalise, it appears that all !llc
> +		 * platforms have issues with us placing the HWS
> +		 * above the mappable region (even though we never
> +		 * actualy map it).
> +		 */
> +		flags |= PIN_MAPPABLE;
> +	ret = i915_vma_pin(vma, 0, 4096, flags);
> +	if (ret)
> +		goto err;
> +
> +	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
> +	if (IS_ERR(vaddr)) {
> +		ret = PTR_ERR(vaddr);
> +		goto err_unpin;
> +	}
> +
> +	engine->status_page.vma = vma;
> +	engine->status_page.ggtt_offset = i915_ggtt_offset(vma);
> +	engine->status_page.page_addr = memset(vaddr, 0, PAGE_SIZE);
> +
> +	DRM_DEBUG_DRIVER("%s hws offset: 0x%08x\n",
> +			 engine->name, i915_ggtt_offset(vma));
> +	return 0;
> +
> +err_unpin:
> +	i915_vma_unpin(vma);
> +err:
> +	i915_gem_object_put(obj);
> +	return ret;
> +}
> +
> +static int init_phys_status_page(struct intel_engine_cs *engine)
> +{
> +	struct drm_i915_private *dev_priv = engine->i915;
> +
> +	GEM_BUG_ON(engine->id != RCS);
> +
> +	dev_priv->status_page_dmah =
> +		drm_pci_alloc(&dev_priv->drm, PAGE_SIZE, PAGE_SIZE);
> +	if (!dev_priv->status_page_dmah)
> +		return -ENOMEM;
> +
> +	engine->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
> +	memset(engine->status_page.page_addr, 0, PAGE_SIZE);
> +
> +	return 0;
> +}
> +
>   /**
>    * intel_engines_init_common - initialize cengine state which might require hw access
>    * @engine: Engine to initialize.
> @@ -480,10 +588,21 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
>   
>   	ret = i915_gem_render_state_init(engine);
>   	if (ret)
> -		goto err_unpin;
> +		goto err_breadcrumbs;
> +
> +	if (HWS_NEEDS_PHYSICAL(engine->i915))
> +		ret = init_phys_status_page(engine);
> +	else
> +		ret = init_status_page(engine);
> +	if (ret)
> +		goto err_rs_fini;
>   
>   	return 0;
>   
> +err_rs_fini:
> +	i915_gem_render_state_fini(engine);
> +err_breadcrumbs:
> +	intel_engine_fini_breadcrumbs(engine);
>   err_unpin:
>   	engine->context_unpin(engine, engine->i915->kernel_context);
>   	return ret;
> @@ -500,6 +619,11 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
>   {
>   	intel_engine_cleanup_scratch(engine);
>   
> +	if (HWS_NEEDS_PHYSICAL(engine->i915))
> +		cleanup_phys_status_page(engine);
> +	else
> +		cleanup_status_page(engine);
> +
>   	i915_gem_render_state_fini(engine);
>   	intel_engine_fini_breadcrumbs(engine);
>   	intel_engine_cleanup_cmd_parser(engine);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index fde145c552ef..3469badedbe0 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1641,11 +1641,6 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
>   	if (engine->cleanup)
>   		engine->cleanup(engine);
>   
> -	if (engine->status_page.vma) {
> -		i915_gem_object_unpin_map(engine->status_page.vma->obj);
> -		engine->status_page.vma = NULL;
> -	}
> -
>   	intel_engine_cleanup_common(engine);
>   
>   	lrc_destroy_wa_ctx(engine);
> @@ -1692,24 +1687,6 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
>   	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
>   }
>   
> -static int
> -lrc_setup_hws(struct intel_engine_cs *engine, struct i915_vma *vma)
> -{
> -	const int hws_offset = LRC_PPHWSP_PN * PAGE_SIZE;
> -	void *hws;
> -
> -	/* The HWSP is part of the default context object in LRC mode. */
> -	hws = i915_gem_object_pin_map(vma->obj, I915_MAP_WB);
> -	if (IS_ERR(hws))
> -		return PTR_ERR(hws);
> -
> -	engine->status_page.page_addr = hws + hws_offset;
> -	engine->status_page.ggtt_offset = i915_ggtt_offset(vma) + hws_offset;
> -	engine->status_page.vma = vma;
> -
> -	return 0;
> -}
> -
>   static void
>   logical_ring_setup(struct intel_engine_cs *engine)
>   {
> @@ -1742,23 +1719,14 @@ logical_ring_setup(struct intel_engine_cs *engine)
>   	logical_ring_default_irqs(engine);
>   }
>   
> -static int
> -logical_ring_init(struct intel_engine_cs *engine)
> +static int logical_ring_init(struct intel_engine_cs *engine)
>   {
> -	struct i915_gem_context *dctx = engine->i915->kernel_context;
>   	int ret;
>   
>   	ret = intel_engine_init_common(engine);
>   	if (ret)
>   		goto error;
>   
> -	/* And setup the hardware status page. */
> -	ret = lrc_setup_hws(engine, dctx->engine[engine->id].state);
> -	if (ret) {
> -		DRM_ERROR("Failed to set up hws %s: %d\n", engine->name, ret);
> -		goto error;
> -	}
> -
>   	return 0;
>   
>   error:
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 5224b7abb8a3..b2580539051e 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1174,113 +1174,7 @@ i915_emit_bb_start(struct drm_i915_gem_request *req,
>   	return 0;
>   }
>   
> -static void cleanup_phys_status_page(struct intel_engine_cs *engine)
> -{
> -	struct drm_i915_private *dev_priv = engine->i915;
> -
> -	if (!dev_priv->status_page_dmah)
> -		return;
> -
> -	drm_pci_free(&dev_priv->drm, dev_priv->status_page_dmah);
> -	engine->status_page.page_addr = NULL;
> -}
> -
> -static void cleanup_status_page(struct intel_engine_cs *engine)
> -{
> -	struct i915_vma *vma;
> -	struct drm_i915_gem_object *obj;
> -
> -	vma = fetch_and_zero(&engine->status_page.vma);
> -	if (!vma)
> -		return;
> -
> -	obj = vma->obj;
> -
> -	i915_vma_unpin(vma);
> -	i915_vma_close(vma);
> -
> -	i915_gem_object_unpin_map(obj);
> -	__i915_gem_object_release_unless_active(obj);
> -}
>   
> -static int init_status_page(struct intel_engine_cs *engine)
> -{
> -	struct drm_i915_gem_object *obj;
> -	struct i915_vma *vma;
> -	unsigned int flags;
> -	void *vaddr;
> -	int ret;
> -
> -	obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE);
> -	if (IS_ERR(obj)) {
> -		DRM_ERROR("Failed to allocate status page\n");
> -		return PTR_ERR(obj);
> -	}
> -
> -	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
> -	if (ret)
> -		goto err;
> -
> -	vma = i915_vma_instance(obj, &engine->i915->ggtt.base, NULL);
> -	if (IS_ERR(vma)) {
> -		ret = PTR_ERR(vma);
> -		goto err;
> -	}
> -
> -	flags = PIN_GLOBAL;
> -	if (!HAS_LLC(engine->i915))
> -		/* On g33, we cannot place HWS above 256MiB, so
> -		 * restrict its pinning to the low mappable arena.
> -		 * Though this restriction is not documented for
> -		 * gen4, gen5, or byt, they also behave similarly
> -		 * and hang if the HWS is placed at the top of the
> -		 * GTT. To generalise, it appears that all !llc
> -		 * platforms have issues with us placing the HWS
> -		 * above the mappable region (even though we never
> -		 * actualy map it).
> -		 */
> -		flags |= PIN_MAPPABLE;
> -	ret = i915_vma_pin(vma, 0, 4096, flags);
> -	if (ret)
> -		goto err;
> -
> -	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
> -	if (IS_ERR(vaddr)) {
> -		ret = PTR_ERR(vaddr);
> -		goto err_unpin;
> -	}
> -
> -	engine->status_page.vma = vma;
> -	engine->status_page.ggtt_offset = i915_ggtt_offset(vma);
> -	engine->status_page.page_addr = memset(vaddr, 0, PAGE_SIZE);
> -
> -	DRM_DEBUG_DRIVER("%s hws offset: 0x%08x\n",
> -			 engine->name, i915_ggtt_offset(vma));
> -	return 0;
> -
> -err_unpin:
> -	i915_vma_unpin(vma);
> -err:
> -	i915_gem_object_put(obj);
> -	return ret;
> -}
> -
> -static int init_phys_status_page(struct intel_engine_cs *engine)
> -{
> -	struct drm_i915_private *dev_priv = engine->i915;
> -
> -	GEM_BUG_ON(engine->id != RCS);
> -
> -	dev_priv->status_page_dmah =
> -		drm_pci_alloc(&dev_priv->drm, PAGE_SIZE, PAGE_SIZE);
> -	if (!dev_priv->status_page_dmah)
> -		return -ENOMEM;
> -
> -	engine->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
> -	memset(engine->status_page.page_addr, 0, PAGE_SIZE);
> -
> -	return 0;
> -}
>   
>   int intel_ring_pin(struct intel_ring *ring,
>   		   struct drm_i915_private *i915,
> @@ -1567,17 +1461,10 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
>   	if (err)
>   		goto err;
>   
> -	if (HWS_NEEDS_PHYSICAL(engine->i915))
> -		err = init_phys_status_page(engine);
> -	else
> -		err = init_status_page(engine);
> -	if (err)
> -		goto err;
> -
>   	ring = intel_engine_create_ring(engine, 32 * PAGE_SIZE);
>   	if (IS_ERR(ring)) {
>   		err = PTR_ERR(ring);
> -		goto err_hws;
> +		goto err;
>   	}
>   
>   	/* Ring wraparound at offset 0 sometimes hangs. No idea why. */
> @@ -1592,11 +1479,6 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
>   
>   err_ring:
>   	intel_ring_free(ring);
> -err_hws:
> -	if (HWS_NEEDS_PHYSICAL(engine->i915))
> -		cleanup_phys_status_page(engine);
> -	else
> -		cleanup_status_page(engine);
>   err:
>   	intel_engine_cleanup_common(engine);
>   	return err;
> @@ -1615,11 +1497,6 @@ void intel_engine_cleanup(struct intel_engine_cs *engine)
>   	if (engine->cleanup)
>   		engine->cleanup(engine);
>   
> -	if (HWS_NEEDS_PHYSICAL(dev_priv))
> -		cleanup_phys_status_page(engine);
> -	else
> -		cleanup_status_page(engine);
> -
>   	intel_engine_cleanup_common(engine);
>   
>   	dev_priv->engine[engine->id] = NULL;
> 


* Re: [PATCH v3 4/5] drm/i915/execlists: Read the context-status buffer from the HWSP
  2017-07-13 10:27 ` [PATCH v3 4/5] drm/i915/execlists: Read the context-status buffer from the HWSP Chris Wilson
@ 2017-07-13 17:58   ` Michel Thierry
  2017-07-13 19:44   ` [PATCH v4] " Chris Wilson
  1 sibling, 0 replies; 16+ messages in thread
From: Michel Thierry @ 2017-07-13 17:58 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Mika Kuoppala

On 13/07/17 03:27, Chris Wilson wrote:
> The engine provides a mirror of the CSB in the HWSP. If we use the
> cacheable reads from the HWSP, we can shave off a few mmio reads per
> context-switch interrupt (which are quite frequent!). Just removing a
> couple of mmio is not enough to actually reduce any latency, but a small
> reduction in overall cpu usage.
> 
> Much appreciation for Ben dropping the bombshell that the CSB was in the
> HWSP and for Michel in digging out the details.
> 
> v2: Don't be lazy, add the defines for the indices.
> v3: Include the HWSP in debugfs/i915_engine_info
> v4: Check for GVT-g, it currently depends on intercepting CSB mmio
> 
> Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Cc: Zhi Wang <zhi.a.wang@intel.com>
> Acked-by: Michel Thierry <michel.thierry@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c     |  7 +++++--
>   drivers/gpu/drm/i915/intel_lrc.c        | 16 +++++++++++-----
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
>   3 files changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 620c9218d1c1..5fd01c14a3ec 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -3384,6 +3384,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>   			   upper_32_bits(addr), lower_32_bits(addr));
>   
>   		if (i915.enable_execlists) {
> +			const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
>   			u32 ptr, read, write;
>   			unsigned int idx;
>   
> @@ -3404,10 +3405,12 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>   				write += GEN8_CSB_ENTRIES;
>   			while (read < write) {
>   				idx = ++read % GEN8_CSB_ENTRIES;
> -				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x, context: %d\n",
> +				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x [0x%08x in hwsp], context: %d [%d in hwsp]\n",
>   					   idx,
>   					   I915_READ(RING_CONTEXT_STATUS_BUF_LO(engine, idx)),
> -					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)));
> +					   hws[idx * 2],
> +					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)),
> +					   hws[idx * 2 + 1]);
>   			}
>   
>   			rcu_read_lock();
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 3469badedbe0..5b721f65d232 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -547,10 +547,17 @@ static void intel_lrc_irq_handler(unsigned long data)
>   	while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
>   		u32 __iomem *csb_mmio =
>   			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
> -		u32 __iomem *buf =
> -			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0));
> +		/* The HWSP contains a (cacheable) mirror of the CSB */
> +		const u32 *buf =
> +			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
>   		unsigned int head, tail;
>   
> +		/* However GVT emulation depends upon intercepting CSB mmio */
> +		if (unlikely(intel_vgpu_active(dev_priv))) {
> +			buf = (u32 * __force)
> +				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));

RING_CONTEXT_STATUS_BUF_LO(engine, 0) instead of 
RING_CONTEXT_STATUS_PTR(engine)?

> +		}
> +
>   		/* The write will be ordered by the uncached read (itself
>   		 * a memory barrier), so we do not need another in the form
>   		 * of a locked instruction. The race between the interrupt
> @@ -590,13 +597,12 @@ static void intel_lrc_irq_handler(unsigned long data)
>   			 * status notifier.
>   			 */
>   
> -			status = readl(buf + 2 * head);
> +			status = buf[2 * head];
>   			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
>   				continue;
>   
>   			/* Check the context/desc id for this event matches */
> -			GEM_DEBUG_BUG_ON(readl(buf + 2 * head + 1) !=
> -					 port->context_id);
> +			GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
>   
>   			rq = port_unpack(port, &count);
>   			GEM_BUG_ON(count == 0);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index d33c93444c0d..2c55cfa14fb5 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -496,6 +496,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
>   #define I915_GEM_HWS_SCRATCH_INDEX	0x40
>   #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
>   
> +#define I915_HWS_CSB_BUF0_INDEX		0x10
> +
>   struct intel_ring *
>   intel_engine_create_ring(struct intel_engine_cs *engine, int size);
>   int intel_ring_pin(struct intel_ring *ring,
> 


* Re: [PATCH v3 5/5] drm/i915/execlists: Read the context-status HEAD from the HWSP
  2017-07-13 10:27 ` [PATCH v3 5/5] drm/i915/execlists: Read the context-status HEAD " Chris Wilson
  2017-07-13 14:17   ` Chris Wilson
@ 2017-07-13 18:12   ` Michel Thierry
  2017-07-20 13:38   ` Mika Kuoppala
  2 siblings, 0 replies; 16+ messages in thread
From: Michel Thierry @ 2017-07-13 18:12 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Mika Kuoppala

On 13/07/17 03:27, Chris Wilson wrote:
> The engine also provides a mirror of the CSB write pointer in the HWSP,
> but not of our read pointer. To take advantage of this we need to
> remember where we read up to on the last interrupt and continue off from
> there. This poses a problem following a reset, as we don't know where
> the hw will start writing from, and due to the use of power contexts we
> cannot perform that query during the reset itself. So we continue the
> current modus operandi of delaying the first read of the context-status
> read/write pointers until after the first interrupt. With this we should
> now have eliminated all uncached mmio reads in handling the
> context-status interrupt, though we still have the uncached mmio writes
> for submitting new work, and many uncached mmio reads in the global
> interrupt handler itself. Still a step in the right direction towards
> reducing our resubmit latency, although it appears lost in the noise!
> 
> v2: Cannonlake moved the CSB write index
> v3: Include the sw/hwsp state in debugfs/i915_engine_info
> v4: Also revert to using CSB mmio for GVT-g
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Cc: Zhi Wang <zhi.a.wang@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c     |  6 ++++--
>   drivers/gpu/drm/i915/i915_drv.h         |  8 ++++++++
>   drivers/gpu/drm/i915/intel_lrc.c        | 25 ++++++++++++++++++++-----
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
>   4 files changed, 35 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 5fd01c14a3ec..552aef61b47b 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -3395,8 +3395,10 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>   			ptr = I915_READ(RING_CONTEXT_STATUS_PTR(engine));
>   			read = GEN8_CSB_READ_PTR(ptr);
>   			write = GEN8_CSB_WRITE_PTR(ptr);
> -			seq_printf(m, "\tExeclist CSB read %d, write %d\n",
> -				   read, write);
> +			seq_printf(m, "\tExeclist CSB read %d [%d cached], write %d [%d from hws]\n",
> +				   read, engine->csb_head,
> +				   write,
> +				   intel_read_status_page(engine, intel_hws_csb_write_index(engine->i915)));
>   			if (read >= GEN8_CSB_ENTRIES)
>   				read = 0;
>   			if (write >= GEN8_CSB_ENTRIES)
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 81cd21ecfa7d..f62c9db8a9a8 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -4228,4 +4228,12 @@ static inline bool i915_gem_object_is_coherent(struct drm_i915_gem_object *obj)
>   		HAS_LLC(to_i915(obj->base.dev)));
>   }
>   
> +static inline int intel_hws_csb_write_index(struct drm_i915_private *i915)
> +{
> +	if (INTEL_GEN(i915) >= 10)
> +		return CNL_HWS_CSB_WRITE_INDEX;
> +	else
> +		return I915_HWS_CSB_WRITE_INDEX;
> +}
> +
>   #endif
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 5b721f65d232..7c3dce27e504 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -556,6 +556,7 @@ static void intel_lrc_irq_handler(unsigned long data)
>   		if (unlikely(intel_vgpu_active(dev_priv))) {
>   			buf = (u32 * __force)
>   				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
> +			engine->csb_head = -1;

I would add a comment explaining why you have to change csb_head, something like:

			engine->csb_head = -1; /* force CSB via mmio */

>   		}
>   
>   		/* The write will be ordered by the uncached read (itself

Looks good to me and it works on skl. The news about bxt is promising.

Since this patch, and the one before it, modify the behaviour for all GEN8+,
another set (or more) of eyes would be good.

Acked-by: Michel Thierry <michel.thierry@intel.com>

> @@ -569,9 +570,19 @@ static void intel_lrc_irq_handler(unsigned long data)
>   		 * is set and we do a new loop.
>   		 */
>   		__clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> -		head = readl(csb_mmio);
> -		tail = GEN8_CSB_WRITE_PTR(head);
> -		head = GEN8_CSB_READ_PTR(head);
> +		if (unlikely(engine->csb_head == -1)) { /* following a reset */
> +			head = readl(csb_mmio);
> +			tail = GEN8_CSB_WRITE_PTR(head);
> +			head = GEN8_CSB_READ_PTR(head);
> +			engine->csb_head = head;
> +		} else {
> +			const int write_idx =
> +				intel_hws_csb_write_index(dev_priv) -
> +				I915_HWS_CSB_BUF0_INDEX;
> +
> +			head = engine->csb_head;
> +			tail = buf[write_idx];
> +		}
>   		while (head != tail) {
>   			struct drm_i915_gem_request *rq;
>   			unsigned int status;
> @@ -625,8 +636,11 @@ static void intel_lrc_irq_handler(unsigned long data)
>   				   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
>   		}
>   
> -		writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
> -		       csb_mmio);
> +		if (head != engine->csb_head) {
> +			engine->csb_head = head;
> +			writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
> +			       csb_mmio);
> +		}
>   	}
>   
>   	if (execlists_elsp_ready(engine))
> @@ -1253,6 +1267,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
>   
>   	/* After a GPU reset, we may have requests to replay */
>   	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> +	engine->csb_head = -1;
>   
>   	submit = false;
>   	for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++) {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 2c55cfa14fb5..a182da7eb9a9 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -391,6 +391,7 @@ struct intel_engine_cs {
>   	struct rb_root execlist_queue;
>   	struct rb_node *execlist_first;
>   	unsigned int fw_domains;
> +	unsigned int csb_head;
>   
>   	/* Contexts are pinned whilst they are active on the GPU. The last
>   	 * context executed remains active whilst the GPU is idle - the
> @@ -497,6 +498,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
>   #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
>   
>   #define I915_HWS_CSB_BUF0_INDEX		0x10
> +#define I915_HWS_CSB_WRITE_INDEX	0x1f
> +#define CNL_HWS_CSB_WRITE_INDEX		0x2f
>   
>   struct intel_ring *
>   intel_engine_create_ring(struct intel_engine_cs *engine, int size);
> 
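The `writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8), csb_mmio)` in the hunk above relies on i915's masked-register convention, where the top 16 bits of the written value select which of the low 16 bits the hardware actually updates. A standalone sketch of that convention — the register value and field placement below are invented for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Mimics i915's _MASKED_FIELD(): write-enable bits go in the top half
 * of the dword, the new field values in the bottom half. */
#define MASKED_FIELD(mask, value) (((uint32_t)(mask) << 16) | (uint32_t)(value))

/* How hardware applies a write to such a masked register: only the bits
 * enabled in the top half change; every other bit is preserved. */
static uint32_t apply_masked_write(uint32_t reg, uint32_t wr)
{
	uint32_t mask = wr >> 16;

	return (reg & ~mask) | (wr & mask);
}
```

This is why the read pointer can be updated without a read-modify-write cycle on the register, which matters in an interrupt handler trying to minimise uncached mmio.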
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH v4] drm/i915/execlists: Read the context-status buffer from the HWSP
  2017-07-13 10:27 ` [PATCH v3 4/5] drm/i915/execlists: Read the context-status buffer from the HWSP Chris Wilson
  2017-07-13 17:58   ` Michel Thierry
@ 2017-07-13 19:44   ` Chris Wilson
  2017-07-18 14:36     ` Mika Kuoppala
  1 sibling, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2017-07-13 19:44 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

The engine provides a mirror of the CSB in the HWSP. If we use the
cacheable reads from the HWSP, we can shave off a few mmio reads per
context-switch interrupt (which are quite frequent!). Just removing a
couple of mmio reads is not enough to actually reduce any latency, but it
is a small reduction in overall cpu usage.

Much appreciation to Ben for dropping the bombshell that the CSB was in
the HWSP and to Michel for digging out the details.

v2: Don't be lazy, add the defines for the indices.
v3: Include the HWSP in debugfs/i915_engine_info
v4: Check for GVT-g, it currently depends on intercepting CSB mmio
v5: Fixup GVT-g mmio path
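Each event in that HWSP mirror is a pair of dwords — a status word followed by a context id — which is why the hunks below index `buf[2 * head]` and `buf[2 * head + 1]`. A self-contained sketch of decoding one entry (the event values in the test are invented):

```c
#include <assert.h>
#include <stdint.h>

#define GEN8_CSB_ENTRIES 6

struct csb_event {
	uint32_t status;	/* e.g. completed/preempted flags */
	uint32_t context_id;	/* matched against port->context_id */
};

/* The mirror stores GEN8_CSB_ENTRIES events as interleaved
 * status/context-id dwords; decode entry 'head' from it. */
static struct csb_event csb_read(const uint32_t *buf, unsigned int head)
{
	struct csb_event ev = {
		.status = buf[2 * head],
		.context_id = buf[2 * head + 1],
	};

	return ev;
}
```

Because the mirror is a plain cacheable array, these are ordinary loads rather than `readl()` round trips to uncached mmio space.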

Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  7 +++++--
 drivers/gpu/drm/i915/intel_lrc.c        | 16 +++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
 3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 620c9218d1c1..5fd01c14a3ec 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3384,6 +3384,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 			   upper_32_bits(addr), lower_32_bits(addr));
 
 		if (i915.enable_execlists) {
+			const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
 			u32 ptr, read, write;
 			unsigned int idx;
 
@@ -3404,10 +3405,12 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 				write += GEN8_CSB_ENTRIES;
 			while (read < write) {
 				idx = ++read % GEN8_CSB_ENTRIES;
-				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x, context: %d\n",
+				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x [0x%08x in hwsp], context: %d [%d in hwsp]\n",
 					   idx,
 					   I915_READ(RING_CONTEXT_STATUS_BUF_LO(engine, idx)),
-					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)));
+					   hws[idx * 2],
+					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)),
+					   hws[idx * 2 + 1]);
 			}
 
 			rcu_read_lock();
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3469badedbe0..41dc04eb6097 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -547,10 +547,17 @@ static void intel_lrc_irq_handler(unsigned long data)
 	while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
 		u32 __iomem *csb_mmio =
 			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
-		u32 __iomem *buf =
-			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0));
+		/* The HWSP contains a (cacheable) mirror of the CSB */
+		const u32 *buf =
+			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
 		unsigned int head, tail;
 
+		/* However GVT emulation depends upon intercepting CSB mmio */
+		if (unlikely(intel_vgpu_active(dev_priv))) {
+			buf = (u32 * __force)
+				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
+		}
+
 		/* The write will be ordered by the uncached read (itself
 		 * a memory barrier), so we do not need another in the form
 		 * of a locked instruction. The race between the interrupt
@@ -590,13 +597,12 @@ static void intel_lrc_irq_handler(unsigned long data)
 			 * status notifier.
 			 */
 
-			status = readl(buf + 2 * head);
+			status = buf[2 * head];
 			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
 				continue;
 
 			/* Check the context/desc id for this event matches */
-			GEM_DEBUG_BUG_ON(readl(buf + 2 * head + 1) !=
-					 port->context_id);
+			GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
 
 			rq = port_unpack(port, &count);
 			GEM_BUG_ON(count == 0);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index d33c93444c0d..2c55cfa14fb5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -496,6 +496,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
 #define I915_GEM_HWS_SCRATCH_INDEX	0x40
 #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 
+#define I915_HWS_CSB_BUF0_INDEX		0x10
+
 struct intel_ring *
 intel_engine_create_ring(struct intel_engine_cs *engine, int size);
 int intel_ring_pin(struct intel_ring *ring,
-- 
2.13.2


* ✗ Fi.CI.BAT: failure for series starting with [v3,1/5] drm/i915/lrc: Clarify the format of the context image (rev2)
  2017-07-13 10:27 [PATCH v3 1/5] drm/i915/lrc: Clarify the format of the context image Chris Wilson
                   ` (4 preceding siblings ...)
  2017-07-13 10:50 ` ✗ Fi.CI.BAT: warning for series starting with [v3,1/5] drm/i915/lrc: Clarify the format of the context image Patchwork
@ 2017-07-13 19:47 ` Patchwork
  5 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2017-07-13 19:47 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [v3,1/5] drm/i915/lrc: Clarify the format of the context image (rev2)
URL   : https://patchwork.freedesktop.org/series/27232/
State : failure

== Summary ==

  CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  CHK     include/generated/bounds.h
  CHK     include/generated/timeconst.h
  CHK     include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  CHK     scripts/mod/devicetable-offsets.h
  CHK     include/generated/compile.h
  CHK     kernel/config_data.h
  CC [M]  drivers/gpu/drm/i915/intel_lrc.o
drivers/gpu/drm/i915/intel_lrc.c: In function ‘intel_lrc_irq_handler’:
drivers/gpu/drm/i915/intel_lrc.c:558:1: error: expected expression before ‘<<’ token
 <<<<<<< 21a06178866781ffd268721502460b5377876f30
 ^
drivers/gpu/drm/i915/intel_lrc.c:558:9: error: invalid suffix "a06178866781ffd268721502460b5377876f30" on integer constant
 <<<<<<< 21a06178866781ffd268721502460b5377876f30
         ^
drivers/gpu/drm/i915/intel_lrc.c:560:1: error: expected expression before ‘==’ token
 =======
 ^
drivers/gpu/drm/i915/intel_lrc.c:563:1: error: expected expression before ‘>>’ token
 >>>>>>> drm/i915/execlists: Read the context-status HEAD from the HWSP
 ^
scripts/Makefile.build:302: recipe for target 'drivers/gpu/drm/i915/intel_lrc.o' failed
make[4]: *** [drivers/gpu/drm/i915/intel_lrc.o] Error 1
scripts/Makefile.build:561: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:561: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:561: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:1016: recipe for target 'drivers' failed
make: *** [drivers] Error 2


* Re: [PATCH v3 5/5] drm/i915/execlists: Read the context-status HEAD from the HWSP
  2017-07-13 14:17   ` Chris Wilson
@ 2017-07-13 20:44     ` Chris Wilson
  0 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2017-07-13 20:44 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

Quoting Chris Wilson (2017-07-13 15:17:41)
> Quoting Chris Wilson (2017-07-13 11:27:06)
> > The engine also provides a mirror of the CSB write pointer in the HWSP,
> > but not of our read pointer. To take advantage of this we need to
> > remember where we read up to on the last interrupt and continue off from
> > there. This poses a problem following a reset, as we don't know where
> > the hw will start writing from, and due to the use of power contexts we
> > cannot perform that query during the reset itself. So we continue the
> > current modus operandi of delaying the first read of the context-status
> > read/write pointers until after the first interrupt. With this we should
> > now have eliminated all uncached mmio reads in handling the
> > context-status interrupt, though we still have the uncached mmio writes
> > for submitting new work, and many uncached mmio reads in the global
> > interrupt handler itself. Still a step in the right direction towards
> > reducing our resubmit latency, although it appears lost in the noise!
> 
> It is also worth noting that Broxton seems to handle per-engine resets
> better when using the HWSP than the CSB mmio. Too early to say for sure;
> it needs a few days to be certain the error doesn't recur.

Sadly, that didn't last beyond turning KASAN off. I went back to a config
where I could reliably hit the zero mmio reads and then switched HWSP
reads back on. Instant garbage from engine A whilst resetting engine B.
I really did hope those zeroes were limited to the CSB buffer inside
the chipset and didn't make it to the HWSP. Obviously, I'll continue to
double check.
-Chris

* Re: [PATCH v4] drm/i915/execlists: Read the context-status buffer from the HWSP
  2017-07-13 19:44   ` [PATCH v4] " Chris Wilson
@ 2017-07-18 14:36     ` Mika Kuoppala
  2017-07-18 14:56       ` Chris Wilson
  0 siblings, 1 reply; 16+ messages in thread
From: Mika Kuoppala @ 2017-07-18 14:36 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> The engine provides a mirror of the CSB in the HWSP. If we use the
> cacheable reads from the HWSP, we can shave off a few mmio reads per
> context-switch interrupt (which are quite frequent!). Just removing a
> couple of mmio is not enough to actually reduce any latency, but a small
> reduction in overall cpu usage.
>
> Much appreciation for Ben dropping the bombshell that the CSB was in the
> HWSP and for Michel in digging out the details.
>
> v2: Don't be lazy, add the defines for the indices.
> v3: Include the HWSP in debugfs/i915_engine_info
> v4: Check for GVT-g, it currently depends on intercepting CSB mmio
> v5: Fixup GVT-g mmio path
>
> Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Cc: Zhi Wang <zhi.a.wang@intel.com>
> Acked-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c     |  7 +++++--
>  drivers/gpu/drm/i915/intel_lrc.c        | 16 +++++++++++-----
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
>  3 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 620c9218d1c1..5fd01c14a3ec 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -3384,6 +3384,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>  			   upper_32_bits(addr), lower_32_bits(addr));
>  
>  		if (i915.enable_execlists) {
> +			const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
>  			u32 ptr, read, write;
>  			unsigned int idx;
>  
> @@ -3404,10 +3405,12 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>  				write += GEN8_CSB_ENTRIES;
>  			while (read < write) {
>  				idx = ++read % GEN8_CSB_ENTRIES;
> -				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x, context: %d\n",
> +				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x [0x%08x in hwsp], context: %d [%d in hwsp]\n",
>  					   idx,
>  					   I915_READ(RING_CONTEXT_STATUS_BUF_LO(engine, idx)),
> -					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)));
> +					   hws[idx * 2],
> +					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)),
> +					   hws[idx * 2 + 1]);
>  			}
>  
>  			rcu_read_lock();
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 3469badedbe0..41dc04eb6097 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -547,10 +547,17 @@ static void intel_lrc_irq_handler(unsigned long data)
>  	while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
>  		u32 __iomem *csb_mmio =
>  			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
> -		u32 __iomem *buf =
> -			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0));
> +		/* The HWSP contains a (cacheable) mirror of the CSB */
> +		const u32 *buf =
> +			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];

This could also be const u32 * const buf = ...,
as in the debugfs counterpart. The value added is quite thin versus the
clutter, though, so I'm not insisting.

>  		unsigned int head, tail;
>  
> +		/* However GVT emulation depends upon intercepting CSB mmio */
> +		if (unlikely(intel_vgpu_active(dev_priv))) {
> +			buf = (u32 * __force)
> +				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
> +		}
> +
>  		/* The write will be ordered by the uncached read (itself
>  		 * a memory barrier), so we do not need another in the form
>  		 * of a locked instruction. The race between the interrupt
> @@ -590,13 +597,12 @@ static void intel_lrc_irq_handler(unsigned long data)
>  			 * status notifier.
>  			 */
>  
> -			status = readl(buf + 2 * head);
> +			status = buf[2 * head];
>  			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
>  				continue;
>  
>  			/* Check the context/desc id for this event matches */
> -			GEM_DEBUG_BUG_ON(readl(buf + 2 * head + 1) !=
> -					 port->context_id);
> +			GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);

Here I wonder whether a GEM_BUG_ON check on the equivalence of the hwsp
value vs the mmio value would serve any purpose. Adding the mmio delay
here, though, would be harmful.

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


>  
>  			rq = port_unpack(port, &count);
>  			GEM_BUG_ON(count == 0);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index d33c93444c0d..2c55cfa14fb5 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -496,6 +496,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
>  #define I915_GEM_HWS_SCRATCH_INDEX	0x40
>  #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
>  
> +#define I915_HWS_CSB_BUF0_INDEX		0x10
> +
>  struct intel_ring *
>  intel_engine_create_ring(struct intel_engine_cs *engine, int size);
>  int intel_ring_pin(struct intel_ring *ring,
> -- 
> 2.13.2

* Re: [PATCH v4] drm/i915/execlists: Read the context-status buffer from the HWSP
  2017-07-18 14:36     ` Mika Kuoppala
@ 2017-07-18 14:56       ` Chris Wilson
  0 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2017-07-18 14:56 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2017-07-18 15:36:46)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > The engine provides a mirror of the CSB in the HWSP. If we use the
> > cacheable reads from the HWSP, we can shave off a few mmio reads per
> > context-switch interrupt (which are quite frequent!). Just removing a
> > couple of mmio is not enough to actually reduce any latency, but a small
> > reduction in overall cpu usage.
> >
> > Much appreciation for Ben dropping the bombshell that the CSB was in the
> > HWSP and for Michel in digging out the details.
> >
> > v2: Don't be lazy, add the defines for the indices.
> > v3: Include the HWSP in debugfs/i915_engine_info
> > v4: Check for GVT-g, it currently depends on intercepting CSB mmio
> > v5: Fixup GVT-g mmio path
> >
> > Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michel Thierry <michel.thierry@intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> > Cc: Zhi Wang <zhi.a.wang@intel.com>
> > Acked-by: Michel Thierry <michel.thierry@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_debugfs.c     |  7 +++++--
> >  drivers/gpu/drm/i915/intel_lrc.c        | 16 +++++++++++-----
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
> >  3 files changed, 18 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 620c9218d1c1..5fd01c14a3ec 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -3384,6 +3384,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
> >                          upper_32_bits(addr), lower_32_bits(addr));
> >  
> >               if (i915.enable_execlists) {
> > +                     const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
> >                       u32 ptr, read, write;
> >                       unsigned int idx;
> >  
> > @@ -3404,10 +3405,12 @@ static int i915_engine_info(struct seq_file *m, void *unused)
> >                               write += GEN8_CSB_ENTRIES;
> >                       while (read < write) {
> >                               idx = ++read % GEN8_CSB_ENTRIES;
> > -                             seq_printf(m, "    Execlist CSB[%d]: 0x%08x, context: %d\n",
> > +                             seq_printf(m, "    Execlist CSB[%d]: 0x%08x [0x%08x in hwsp], context: %d [%d in hwsp]\n",
> >                                          idx,
> >                                          I915_READ(RING_CONTEXT_STATUS_BUF_LO(engine, idx)),
> > -                                        I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)));
> > +                                        hws[idx * 2],
> > +                                        I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)),
> > +                                        hws[idx * 2 + 1]);
> >                       }
> >  
> >                       rcu_read_lock();
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 3469badedbe0..41dc04eb6097 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -547,10 +547,17 @@ static void intel_lrc_irq_handler(unsigned long data)
> >       while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
> >               u32 __iomem *csb_mmio =
> >                       dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
> > -             u32 __iomem *buf =
> > -                     dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0));
> > +             /* The HWSP contains a (cacheable) mirror of the CSB */
> > +             const u32 *buf =
> > +                     &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
> 
> Could be also const u32 * const buf = ...
> as in debugfs counterpart. Value added is quite thin tho vs clutter so
> not insisting.

> >               unsigned int head, tail;
> >  
> > +             /* However GVT emulation depends upon intercepting CSB mmio */
> > +             if (unlikely(intel_vgpu_active(dev_priv))) {
> > +                     buf = (u32 * __force)
> > +                             (dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
> > +             }

Which is why we can't use const u32 *const buf ;-)

> > +
> >               /* The write will be ordered by the uncached read (itself
> >                * a memory barrier), so we do not need another in the form
> >                * of a locked instruction. The race between the interrupt
> > @@ -590,13 +597,12 @@ static void intel_lrc_irq_handler(unsigned long data)
> >                        * status notifier.
> >                        */
> >  
> > -                     status = readl(buf + 2 * head);
> > +                     status = buf[2 * head];
> >                       if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
> >                               continue;
> >  
> >                       /* Check the context/desc id for this event matches */
> > -                     GEM_DEBUG_BUG_ON(readl(buf + 2 * head + 1) !=
> > -                                      port->context_id);
> > +                     GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
> 
> Here I wonder whether a GEM_BUG_ON check on the equivalence of the hwsp
> value vs the mmio value would serve any purpose. Adding the mmio delay
> here, though, would be harmful.

Hard to do here, as the hwsp equivalence isn't guaranteed under vgpu. Our
sanity checks are already pretty good at confirming that the CSB
sequence matches our input.
-Chris

* Re: [PATCH v3 5/5] drm/i915/execlists: Read the context-status HEAD from the HWSP
  2017-07-13 10:27 ` [PATCH v3 5/5] drm/i915/execlists: Read the context-status HEAD " Chris Wilson
  2017-07-13 14:17   ` Chris Wilson
  2017-07-13 18:12   ` Michel Thierry
@ 2017-07-20 13:38   ` Mika Kuoppala
  2 siblings, 0 replies; 16+ messages in thread
From: Mika Kuoppala @ 2017-07-20 13:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> The engine also provides a mirror of the CSB write pointer in the HWSP,
> but not of our read pointer. To take advantage of this we need to
> remember where we read up to on the last interrupt and continue off from
> there. This poses a problem following a reset, as we don't know where
> the hw will start writing from, and due to the use of power contexts we
> cannot perform that query during the reset itself. So we continue the
> current modus operandi of delaying the first read of the context-status
> read/write pointers until after the first interrupt. With this we should
> now have eliminated all uncached mmio reads in handling the
> context-status interrupt, though we still have the uncached mmio writes
> for submitting new work, and many uncached mmio reads in the global
> interrupt handler itself. Still a step in the right direction towards
> reducing our resubmit latency, although it appears lost in the noise!
>
> v2: Cannonlake moved the CSB write index
> v3: Include the sw/hwsp state in debugfs/i915_engine_info
> v4: Also revert to using CSB mmio for GVT-g
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Cc: Zhi Wang <zhi.a.wang@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c     |  6 ++++--
>  drivers/gpu/drm/i915/i915_drv.h         |  8 ++++++++
>  drivers/gpu/drm/i915/intel_lrc.c        | 25 ++++++++++++++++++++-----
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
>  4 files changed, 35 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 5fd01c14a3ec..552aef61b47b 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -3395,8 +3395,10 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>  			ptr = I915_READ(RING_CONTEXT_STATUS_PTR(engine));
>  			read = GEN8_CSB_READ_PTR(ptr);
>  			write = GEN8_CSB_WRITE_PTR(ptr);
> -			seq_printf(m, "\tExeclist CSB read %d, write %d\n",
> -				   read, write);
> +			seq_printf(m, "\tExeclist CSB read %d [%d cached], write %d [%d from hws]\n",
> +				   read, engine->csb_head,
> +				   write,
> +				   intel_read_status_page(engine, intel_hws_csb_write_index(engine->i915)));
>  			if (read >= GEN8_CSB_ENTRIES)
>  				read = 0;
>  			if (write >= GEN8_CSB_ENTRIES)
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 81cd21ecfa7d..f62c9db8a9a8 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -4228,4 +4228,12 @@ static inline bool i915_gem_object_is_coherent(struct drm_i915_gem_object *obj)
>  		HAS_LLC(to_i915(obj->base.dev)));
>  }
>  
> +static inline int intel_hws_csb_write_index(struct drm_i915_private *i915)
> +{
> +	if (INTEL_GEN(i915) >= 10)
> +		return CNL_HWS_CSB_WRITE_INDEX;
> +	else
> +		return I915_HWS_CSB_WRITE_INDEX;
> +}
> +
>  #endif
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 5b721f65d232..7c3dce27e504 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -556,6 +556,7 @@ static void intel_lrc_irq_handler(unsigned long data)
>  		if (unlikely(intel_vgpu_active(dev_priv))) {
>  			buf = (u32 * __force)
>  				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
> +			engine->csb_head = -1;
>  		}
>  
>  		/* The write will be ordered by the uncached read (itself
> @@ -569,9 +570,19 @@ static void intel_lrc_irq_handler(unsigned long data)
>  		 * is set and we do a new loop.
>  		 */
>  		__clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> -		head = readl(csb_mmio);
> -		tail = GEN8_CSB_WRITE_PTR(head);
> -		head = GEN8_CSB_READ_PTR(head);
> +		if (unlikely(engine->csb_head == -1)) { /* following a reset */
> +			head = readl(csb_mmio);
> +			tail = GEN8_CSB_WRITE_PTR(head);
> +			head = GEN8_CSB_READ_PTR(head);
> +			engine->csb_head = head;
> +		} else {
> +			const int write_idx =
> +				intel_hws_csb_write_index(dev_priv) -
> +				I915_HWS_CSB_BUF0_INDEX;
> +
> +			head = engine->csb_head;
> +			tail = buf[write_idx];
> +		}

I have discussed this with Chris already on irc, but I have a kbl
that can't survive this patch. The HWSP tail seems to update
inside the loop without a corresponding interrupt.

And it results in a system hang instead of hangcheck firing,
which is another mystery by itself.

Resampling the tail inside the loop makes it work.
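For illustration, here is a minimal user-space model of why resampling
helps: if the producer can bump the write pointer while the consumer is
mid-drain (without raising another interrupt), a tail sampled once
before the loop goes stale, whereas re-reading it on every pass picks
the late entries up. All names and sizes below are hypothetical, not
the driver's:

```c
#include <assert.h>
#include <stdint.h>

#define CSB_ENTRIES 6 /* hypothetical ring size, mirroring GEN8_CSB_ENTRIES */

/* Toy model of the HWSP-backed CSB: the "GPU" side appends entries and
 * bumps the write pointer; the consumer drains head..tail. */
struct csb_model {
	uint32_t buf[CSB_ENTRIES];
	unsigned int head;          /* consumer's cached read pointer */
	volatile unsigned int tail; /* write pointer, updated by the GPU */
};

/* Drain the buffer, resampling the tail on every loop iteration so that
 * entries written after the interrupt fired are still picked up before
 * the handler returns. */
static unsigned int csb_drain(struct csb_model *csb,
			      void (*gpu_poke)(struct csb_model *))
{
	unsigned int processed = 0;

	while (csb->head != csb->tail) { /* tail re-read each pass */
		/* consume buf[csb->head] here */
		csb->head = (csb->head + 1) % CSB_ENTRIES;
		processed++;

		if (gpu_poke) { /* simulate a write landing mid-loop */
			gpu_poke(csb);
			gpu_poke = 0;
		}
	}
	return processed;
}
```

With a tail snapshot taken once before the loop, the entry written by
`gpu_poke` would only be seen on the next interrupt (if one arrives).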

-Mika

>  		while (head != tail) {
>  			struct drm_i915_gem_request *rq;
>  			unsigned int status;
> @@ -625,8 +636,11 @@ static void intel_lrc_irq_handler(unsigned long data)
>  				   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
>  		}
>  
> -		writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
> -		       csb_mmio);
> +		if (head != engine->csb_head) {
> +			engine->csb_head = head;
> +			writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
> +			       csb_mmio);
> +		}
>  	}
>  
>  	if (execlists_elsp_ready(engine))
> @@ -1253,6 +1267,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
>  
>  	/* After a GPU reset, we may have requests to replay */
>  	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> +	engine->csb_head = -1;
>  
>  	submit = false;
>  	for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++) {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 2c55cfa14fb5..a182da7eb9a9 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -391,6 +391,7 @@ struct intel_engine_cs {
>  	struct rb_root execlist_queue;
>  	struct rb_node *execlist_first;
>  	unsigned int fw_domains;
> +	unsigned int csb_head;
>  
>  	/* Contexts are pinned whilst they are active on the GPU. The last
>  	 * context executed remains active whilst the GPU is idle - the
> @@ -497,6 +498,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
>  #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
>  
>  #define I915_HWS_CSB_BUF0_INDEX		0x10
> +#define I915_HWS_CSB_WRITE_INDEX	0x1f
> +#define CNL_HWS_CSB_WRITE_INDEX		0x2f
>  
>  struct intel_ring *
>  intel_engine_create_ring(struct intel_engine_cs *engine, int size);
> -- 
> 2.13.2
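The head/tail selection the patch introduces can be summarised with a
small user-space sketch: after a reset (cached head poisoned) both
pointers are sampled from the MMIO register, while in steady state the
cached head is reused and only the tail is read from the HWSP image,
avoiding the expensive uncached MMIO read on every interrupt. The
struct, masks and constants below are a hypothetical model, not the
driver's definitions:

```c
#include <assert.h>

/* Hypothetical packed-pointer accessors, in the spirit of
 * GEN8_CSB_READ_PTR()/GEN8_CSB_WRITE_PTR(). */
#define CSB_READ_PTR(x)  (((x) >> 8) & 0x7)
#define CSB_WRITE_PTR(x) ((x) & 0x7)

struct engine_model {
	unsigned int csb_head;  /* ~0u after a reset */
	unsigned int mmio_ptr;  /* packed head/tail register value */
	unsigned int hwsp_tail; /* write pointer mirrored into the HWSP */
};

static void sample_pointers(struct engine_model *e,
			    unsigned int *head, unsigned int *tail)
{
	if (e->csb_head == ~0u) {	/* following a reset */
		*tail = CSB_WRITE_PTR(e->mmio_ptr);
		*head = CSB_READ_PTR(e->mmio_ptr);
		e->csb_head = *head;	/* prime the cache */
	} else {			/* steady state: cheap HWSP read */
		*head = e->csb_head;
		*tail = e->hwsp_tail;
	}
}
```

The corresponding write-back in the patch is also conditional: the read
pointer is only pushed back to the register when the head actually
moved, saving a second MMIO access on spurious interrupts.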
Thread overview: 16+ messages
2017-07-13 10:27 [PATCH v3 1/5] drm/i915/lrc: Clarify the format of the context image Chris Wilson
2017-07-13 10:27 ` [PATCH v3 2/5] drm/i915/guc: Don't make assumptions while getting the lrca offset Chris Wilson
2017-07-13 10:27 ` [PATCH v3 3/5] drm/i915/lrc: allocate separate page for HWSP Chris Wilson
2017-07-13 17:40   ` Michel Thierry
2017-07-13 10:27 ` [PATCH v3 4/5] drm/i915/execlists: Read the context-status buffer from the HWSP Chris Wilson
2017-07-13 17:58   ` Michel Thierry
2017-07-13 19:44   ` [PATCH v4] " Chris Wilson
2017-07-18 14:36     ` Mika Kuoppala
2017-07-18 14:56       ` Chris Wilson
2017-07-13 10:27 ` [PATCH v3 5/5] drm/i915/execlists: Read the context-status HEAD " Chris Wilson
2017-07-13 14:17   ` Chris Wilson
2017-07-13 20:44     ` Chris Wilson
2017-07-13 18:12   ` Michel Thierry
2017-07-20 13:38   ` Mika Kuoppala
2017-07-13 10:50 ` ✗ Fi.CI.BAT: warning for series starting with [v3,1/5] drm/i915/lrc: Clarify the format of the context image Patchwork
2017-07-13 19:47 ` ✗ Fi.CI.BAT: failure for series starting with [v3,1/5] drm/i915/lrc: Clarify the format of the context image (rev2) Patchwork
