* [RFC 0/5] LRC irq handler cleanups
@ 2015-11-10 10:59 Tvrtko Ursulin
2015-11-10 10:59 ` [PATCH 1/5] drm/i915: Avoid invariant conditionals in lrc interrupt handler Tvrtko Ursulin
` (5 more replies)
0 siblings, 6 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2015-11-10 10:59 UTC
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Some random bits to make the LRC irq handler do less branching,
locking and fewer VMA lookups per interrupt handled.
I failed to measure a significant gain on a powerful chip but it
definitely results in fewer instructions and branches in the hot
path. So maybe it would help a lower power chip more.
Possibly makes the first run of gem_exec_nop have less variance,
e.g. because the branch predictor gets warmed up sooner, but I am
not completely confident in this interpretation.
Tvrtko Ursulin (5):
drm/i915: Avoid invariant conditionals in lrc interrupt handler
drm/i915: Move LRCA check out of the hot path
drm/i915: Cache LRCA in the context
drm/i915: Grab one forcewake across the whole LRC irq handler
drm/i915: Only grab and calculate timestamps when needed
drivers/gpu/drm/i915/i915_debugfs.c | 15 ++--
drivers/gpu/drm/i915/i915_drv.h | 1 +
drivers/gpu/drm/i915/i915_gem.c | 15 ++--
drivers/gpu/drm/i915/intel_lrc.c | 131 +++++++++++++++++++-------------
drivers/gpu/drm/i915/intel_lrc.h | 3 +-
drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +
6 files changed, 97 insertions(+), 70 deletions(-)
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* [PATCH 1/5] drm/i915: Avoid invariant conditionals in lrc interrupt handler
2015-11-10 10:59 [RFC 0/5] LRC irq handler cleanups Tvrtko Ursulin
@ 2015-11-10 10:59 ` Tvrtko Ursulin
2015-11-10 10:59 ` [PATCH 2/5] drm/i915: Move LRCA check out of the hot path Tvrtko Ursulin
` (4 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2015-11-10 10:59 UTC
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
There is no point in evaluating several gen and static feature
dependent branches multiple times per interrupt handled. Evaluate
them once at ring setup and use the cached values.
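As a rough illustration of the idea (a standalone user-space sketch, not the driver code): the invariant descriptor bits are folded into a template once at setup, and the per-interrupt path only ORs in the per-context address. The struct, the function names and the bit values below are hypothetical and chosen for the example; they are not the i915 definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define CTX_VALID          (1u << 0)
#define CTX_FORCE_RESTORE  (1u << 2)
#define CTX_L3LLC_COHERENT (1u << 5)
#define CTX_PRIVILEGE      (1u << 8)

struct engine {
	bool is_gen8;             /* static platform property */
	bool needs_force_restore; /* static workaround flag */
	uint32_t desc_template;   /* invariant bits, computed once */
};

/* Setup path: evaluate the invariant conditionals exactly once. */
static void engine_init_template(struct engine *e)
{
	e->desc_template = CTX_VALID | CTX_PRIVILEGE;
	if (e->is_gen8)
		e->desc_template |= CTX_L3LLC_COHERENT;
	if (e->needs_force_restore)
		e->desc_template |= CTX_FORCE_RESTORE;
}

/* Hot path: no conditionals, only the per-context address is merged in. */
static uint64_t make_descriptor(const struct engine *e, uint64_t lrca)
{
	return (uint64_t)e->desc_template | lrca;
}
```

The hot path then costs one load and one OR regardless of how many static conditions went into the template.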
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/intel_lrc.c | 34 +++++++++++++++++----------------
drivers/gpu/drm/i915/intel_ringbuffer.h | 2 ++
2 files changed, 20 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 06180dce954e..ea031bb46909 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -293,29 +293,15 @@ uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
struct intel_engine_cs *ring)
{
struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
- uint64_t desc;
+ uint64_t desc = ring->ctx_desc_template;
uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
LRC_PPHWSP_PN * PAGE_SIZE;
WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
- desc = GEN8_CTX_VALID;
- desc |= GEN8_CTX_ADDRESSING_MODE(dev) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
- if (IS_GEN8(ctx_obj->base.dev))
- desc |= GEN8_CTX_L3LLC_COHERENT;
- desc |= GEN8_CTX_PRIVILEGE;
desc |= lrca;
desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
- /* TODO: WaDisableLiteRestore when we start using semaphore
- * signalling between Command Streamers */
- /* desc |= GEN8_CTX_FORCE_RESTORE; */
-
- /* WaEnableForceRestoreInCtxtDescForVCS:skl */
- /* WaEnableForceRestoreInCtxtDescForVCS:bxt */
- if (disable_lite_restore_wa(ring))
- desc |= GEN8_CTX_FORCE_RESTORE;
-
return desc;
}
@@ -540,7 +526,7 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
}
}
- if (disable_lite_restore_wa(ring)) {
+ if (ring->disable_lite_restore_wa) {
/* Prevent a ctx to preempt itself */
if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) &&
(submit_contexts != 0))
@@ -1948,6 +1934,22 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
return ret;
}
+ ring->disable_lite_restore_wa = disable_lite_restore_wa(ring);
+
+ ring->ctx_desc_template = GEN8_CTX_VALID;
+ ring->ctx_desc_template |= GEN8_CTX_ADDRESSING_MODE(dev) <<
+ GEN8_CTX_ADDRESSING_MODE_SHIFT;
+ if (IS_GEN8(dev))
+ ring->ctx_desc_template |= GEN8_CTX_L3LLC_COHERENT;
+ ring->ctx_desc_template |= GEN8_CTX_PRIVILEGE;
+
+ /* TODO: WaDisableLiteRestore when we start using semaphore
+ * signalling between Command Streamers */
+ /* WaEnableForceRestoreInCtxtDescForVCS:skl */
+ /* WaEnableForceRestoreInCtxtDescForVCS:bxt */
+ if (ring->disable_lite_restore_wa)
+ ring->ctx_desc_template |= GEN8_CTX_FORCE_RESTORE;
+
return ret;
}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 58b1976a7d0a..ad9cd6d73ab5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -268,6 +268,8 @@ struct intel_engine_cs {
struct list_head execlist_queue;
struct list_head execlist_retired_req_list;
u8 next_context_status_buffer;
+ bool disable_lite_restore_wa;
+ u32 ctx_desc_template;
u32 irq_keep_mask; /* bitmask for interrupts that should not be masked */
int (*emit_request)(struct drm_i915_gem_request *request);
int (*emit_flush)(struct drm_i915_gem_request *request,
--
1.9.1
* [PATCH 2/5] drm/i915: Move LRCA check out of the hot path
2015-11-10 10:59 [RFC 0/5] LRC irq handler cleanups Tvrtko Ursulin
2015-11-10 10:59 ` [PATCH 1/5] drm/i915: Avoid invariant conditionals in lrc interrupt handler Tvrtko Ursulin
@ 2015-11-10 10:59 ` Tvrtko Ursulin
2015-11-10 10:59 ` [PATCH 3/5] drm/i915: Cache LRCA in the context Tvrtko Ursulin
` (3 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2015-11-10 10:59 UTC
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
There is no need to check the LRCA for misalignment or range
several times per interrupt handled when the VMA address in
question is explicitly pinned and unpinned with a wider
lifetime.
So move the check to the place which does the pinning.
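To spell out what the check tests, here is a small standalone sketch. The mask value is the one used in the diff below; the macro and helper names are made up for the example. A set bit in the low 12 bits means the address is not 4 KiB aligned, and a set bit in the upper 32 bits means it does not fit in 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask taken from the patch: low 12 and high 32 bits must all be clear. */
#define LRCA_INVALID_MASK 0xFFFFFFFF00000FFFULL

/* Hypothetical helper: true if the address is 4 KiB aligned and below 4 GiB. */
static bool lrca_is_valid(uint64_t lrca)
{
	return (lrca & LRCA_INVALID_MASK) == 0;
}
```

Since the address cannot change while the object stays pinned, running this once at pin time covers every later use.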
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/intel_lrc.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ea031bb46909..3f9b981cc226 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -297,8 +297,6 @@ uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
LRC_PPHWSP_PN * PAGE_SIZE;
- WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
-
desc |= lrca;
desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
@@ -999,6 +997,7 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
{
struct drm_device *dev = ring->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
+ u64 lrca;
int ret = 0;
WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
@@ -1007,6 +1006,12 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
if (ret)
return ret;
+ lrca = i915_gem_obj_ggtt_offset(ctx_obj) + LRC_PPHWSP_PN * PAGE_SIZE;
+ if (WARN_ON(lrca & 0xFFFFFFFF00000FFFULL)) {
+ ret = -EINVAL;
+ goto unpin_ctx_obj;
+ }
+
ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
if (ret)
goto unpin_ctx_obj;
--
1.9.1
* [PATCH 3/5] drm/i915: Cache LRCA in the context
2015-11-10 10:59 [RFC 0/5] LRC irq handler cleanups Tvrtko Ursulin
2015-11-10 10:59 ` [PATCH 1/5] drm/i915: Avoid invariant conditionals in lrc interrupt handler Tvrtko Ursulin
2015-11-10 10:59 ` [PATCH 2/5] drm/i915: Move LRCA check out of the hot path Tvrtko Ursulin
@ 2015-11-10 10:59 ` Tvrtko Ursulin
2015-11-10 10:59 ` [PATCH 4/5] drm/i915: Grab one forcewake across the whole LRC irq handler Tvrtko Ursulin
` (2 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2015-11-10 10:59 UTC
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
LRCA is static while the context is pinned so we can avoid looking
up the VMA in question several times per interrupt handled.
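A minimal sketch of the caching pattern, assuming a reference-counted pin as in the diff below. All names here are hypothetical, and lookup_offset() merely stands in for the VMA lookup; the real code also adds a fixed page offset to the address, which is left out for brevity.

```c
#include <stdint.h>

struct ctx_engine_state {
	int pin_count;
	uint32_t lrca; /* only meaningful while pin_count > 0 */
};

static int lookups; /* counts how often the slow lookup runs */

/* Stand-in for the expensive address lookup. */
static uint32_t lookup_offset(void)
{
	lookups++;
	return 0x00345000u;
}

/* First pin performs the lookup and caches the result. */
static void ctx_pin(struct ctx_engine_state *s)
{
	if (s->pin_count++ == 0)
		s->lrca = lookup_offset();
}

/* Last unpin invalidates the cached value. */
static void ctx_unpin(struct ctx_engine_state *s)
{
	if (--s->pin_count == 0)
		s->lrca = 0;
}

/* Hot path: the ID is derived from the cached, 4 KiB aligned address. */
static uint32_t ctx_id(const struct ctx_engine_state *s)
{
	return s->lrca >> 12;
}
```

However many times ctx_id() is called between pin and unpin, the lookup itself runs only once.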
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_debugfs.c | 15 ++++++---------
drivers/gpu/drm/i915/i915_drv.h | 1 +
drivers/gpu/drm/i915/intel_lrc.c | 28 +++++++++++++++-------------
drivers/gpu/drm/i915/intel_lrc.h | 3 ++-
4 files changed, 24 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5659d4c6c2c3..2b1598c8e01f 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1973,12 +1973,13 @@ static int i915_context_status(struct seq_file *m, void *unused)
}
static void i915_dump_lrc_obj(struct seq_file *m,
- struct intel_engine_cs *ring,
- struct drm_i915_gem_object *ctx_obj)
+ struct intel_context *ctx,
+ struct intel_engine_cs *ring)
{
struct page *page;
uint32_t *reg_state;
int j;
+ struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
unsigned long ggtt_offset = 0;
if (ctx_obj == NULL) {
@@ -1988,7 +1989,7 @@ static void i915_dump_lrc_obj(struct seq_file *m,
}
seq_printf(m, "CONTEXT: %s %u\n", ring->name,
- intel_execlists_ctx_id(ctx_obj));
+ intel_execlists_ctx_id(ctx, ring));
if (!i915_gem_obj_ggtt_bound(ctx_obj))
seq_puts(m, "\tNot bound in GGTT\n");
@@ -2037,8 +2038,7 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
list_for_each_entry(ctx, &dev_priv->context_list, link) {
for_each_ring(ring, dev_priv, i) {
if (ring->default_context != ctx)
- i915_dump_lrc_obj(m, ring,
- ctx->engine[i].state);
+ i915_dump_lrc_obj(m, ctx, ring);
}
}
@@ -2112,11 +2112,8 @@ static int i915_execlists(struct seq_file *m, void *data)
seq_printf(m, "\t%d requests in queue\n", count);
if (head_req) {
- struct drm_i915_gem_object *ctx_obj;
-
- ctx_obj = head_req->ctx->engine[ring_id].state;
seq_printf(m, "\tHead request id: %u\n",
- intel_execlists_ctx_id(ctx_obj));
+ intel_execlists_ctx_id(head_req->ctx, ring));
seq_printf(m, "\tHead request tail: %u\n",
head_req->tail);
}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d2a546a66203..f97bb4d27996 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -885,6 +885,7 @@ struct intel_context {
struct drm_i915_gem_object *state;
struct intel_ringbuffer *ringbuf;
int pin_count;
+ u32 lrca;
} engine[I915_NUM_RINGS];
struct list_head link;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3f9b981cc226..1f8566b1f072 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -260,7 +260,8 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
/**
* intel_execlists_ctx_id() - get the Execlists Context ID
- * @ctx_obj: Logical Ring Context backing object.
+ * @ctx: User context we are interested in
+ * @ring: Engine to get the Context ID for
*
* Do not confuse with ctx->id! Unfortunately we have a name overload
* here: the old context ID we pass to userspace as a handler so that
@@ -270,14 +271,12 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
*
* Return: 20-bits globally unique context ID.
*/
-u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
+u32 intel_execlists_ctx_id(struct intel_context *ctx,
+ struct intel_engine_cs *ring)
{
- u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
- LRC_PPHWSP_PN * PAGE_SIZE;
-
/* LRCA is required to be 4K aligned so the more significant 20 bits
* are globally unique */
- return lrca >> 12;
+ return ctx->engine[ring->id].lrca >> 12;
}
static bool disable_lite_restore_wa(struct intel_engine_cs *ring)
@@ -292,13 +291,11 @@ static bool disable_lite_restore_wa(struct intel_engine_cs *ring)
uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
struct intel_engine_cs *ring)
{
- struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
uint64_t desc = ring->ctx_desc_template;
- uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
- LRC_PPHWSP_PN * PAGE_SIZE;
+ uint64_t lrca = ctx->engine[ring->id].lrca;
desc |= lrca;
- desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
+ desc |= (u64)intel_execlists_ctx_id(ctx, ring) << GEN8_CTX_ID_SHIFT;
return desc;
}
@@ -457,9 +454,7 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
execlist_link);
if (head_req != NULL) {
- struct drm_i915_gem_object *ctx_obj =
- head_req->ctx->engine[ring->id].state;
- if (intel_execlists_ctx_id(ctx_obj) == request_id) {
+ if (intel_execlists_ctx_id(head_req->ctx, ring) == request_id) {
WARN(head_req->elsp_submitted == 0,
"Never submitted head request\n");
@@ -1041,6 +1036,8 @@ static int intel_lr_context_pin(struct drm_i915_gem_request *rq)
ret = intel_lr_context_do_pin(ring, ctx_obj, ringbuf);
if (ret)
goto reset_pin_count;
+ rq->ctx->engine[ring->id].lrca =
+ i915_gem_obj_ggtt_offset(ctx_obj) + LRC_PPHWSP_PN * PAGE_SIZE;
}
return ret;
@@ -1060,6 +1057,7 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
if (--rq->ctx->engine[ring->id].pin_count == 0) {
intel_unpin_ringbuffer_obj(ringbuf);
i915_gem_object_ggtt_unpin(ctx_obj);
+ rq->ctx->engine[ring->id].lrca = 0;
}
}
}
@@ -1939,6 +1937,10 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
return ret;
}
+ ring->default_context->engine[ring->id].lrca =
+ i915_gem_obj_ggtt_offset(ring->default_context->engine[ring->id].state)
+ + LRC_PPHWSP_PN * PAGE_SIZE;
+
ring->disable_lite_restore_wa = disable_lite_restore_wa(ring);
ring->ctx_desc_template = GEN8_CTX_VALID;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 4e60d54ba66d..cb68bfe91ecd 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -86,6 +86,8 @@ void intel_lr_context_reset(struct drm_device *dev,
struct intel_context *ctx);
uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
struct intel_engine_cs *ring);
+u32 intel_execlists_ctx_id(struct intel_context *ctx,
+ struct intel_engine_cs *ring);
/* Execlists */
int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
@@ -93,7 +95,6 @@ struct i915_execbuffer_params;
int intel_execlists_submission(struct i915_execbuffer_params *params,
struct drm_i915_gem_execbuffer2 *args,
struct list_head *vmas);
-u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
void intel_lrc_irq_handler(struct intel_engine_cs *ring);
void intel_execlists_retire_requests(struct intel_engine_cs *ring);
--
1.9.1
* [PATCH 4/5] drm/i915: Grab one forcewake across the whole LRC irq handler
2015-11-10 10:59 [RFC 0/5] LRC irq handler cleanups Tvrtko Ursulin
` (2 preceding siblings ...)
2015-11-10 10:59 ` [PATCH 3/5] drm/i915: Cache LRCA in the context Tvrtko Ursulin
@ 2015-11-10 10:59 ` Tvrtko Ursulin
2015-11-10 10:59 ` [PATCH 5/5] drm/i915: Only grab and calculate timestamps when needed Tvrtko Ursulin
2015-11-10 11:36 ` [RFC 0/5] LRC irq handler cleanups Chris Wilson
5 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2015-11-10 10:59 UTC
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
This replaces multiple branch- and spinlock-heavy mmio
operations per LRC interrupt with a single forcewake grab.
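The shape of the change can be sketched in plain C with counters standing in for the lock and forcewake calls. Every name below is hypothetical; the only thing borrowed from the patch is the idea of a fw_locked flag that tells the callee the caller already holds the grab.

```c
#include <stdbool.h>

static int fw_gets;  /* total number of forcewake acquisitions */
static int fw_depth; /* currently held acquisitions */

static void fw_get(void) { fw_gets++; fw_depth++; }
static void fw_put(void) { fw_depth--; }

/* One register access; takes its own grab unless the caller holds one. */
static void reg_access(bool fw_locked)
{
	if (!fw_locked)
		fw_get();
	/* the raw register read or write would go here */
	if (!fw_locked)
		fw_put();
}

/* Interrupt path: one grab covers every access made while handling. */
static void irq_handler_sketch(int n_events)
{
	fw_get();
	for (int i = 0; i < n_events; i++)
		reg_access(true);
	fw_put();
}
```

Callers outside the interrupt path pass false and keep the old self-contained behaviour, which is what the submission path does in the diff below.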
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/intel_lrc.c | 60 +++++++++++++++++++++++++---------------
1 file changed, 38 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1f8566b1f072..2833ee642aa1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -301,7 +301,8 @@ uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
}
static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
- struct drm_i915_gem_request *rq1)
+ struct drm_i915_gem_request *rq1,
+ bool fw_locked)
{
struct intel_engine_cs *ring = rq0->ring;
@@ -319,9 +320,12 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
desc[0] = intel_lr_context_descriptor(rq0->ctx, rq0->ring);
rq0->elsp_submitted++;
+ if (!fw_locked) {
+ spin_lock(&dev_priv->uncore.lock);
+ intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
+ }
+
/* You must always write both descriptors in the order below. */
- spin_lock(&dev_priv->uncore.lock);
- intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
I915_WRITE_FW(RING_ELSP(ring), upper_32_bits(desc[1]));
I915_WRITE_FW(RING_ELSP(ring), lower_32_bits(desc[1]));
@@ -331,8 +335,11 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
/* ELSP is a wo register, use another nearby reg for posting */
POSTING_READ_FW(RING_EXECLIST_STATUS_LO(ring));
- intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
- spin_unlock(&dev_priv->uncore.lock);
+
+ if (!fw_locked) {
+ intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
+ spin_unlock(&dev_priv->uncore.lock);
+ }
}
static int execlists_update_context(struct drm_i915_gem_request *rq)
@@ -372,17 +379,19 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
}
static void execlists_submit_requests(struct drm_i915_gem_request *rq0,
- struct drm_i915_gem_request *rq1)
+ struct drm_i915_gem_request *rq1,
+ bool fw_locked)
{
execlists_update_context(rq0);
if (rq1)
execlists_update_context(rq1);
- execlists_elsp_write(rq0, rq1);
+ execlists_elsp_write(rq0, rq1, fw_locked);
}
-static void execlists_context_unqueue(struct intel_engine_cs *ring)
+static void
+execlists_context_unqueue(struct intel_engine_cs *ring, bool fw_locked)
{
struct drm_i915_gem_request *req0 = NULL, *req1 = NULL;
struct drm_i915_gem_request *cursor = NULL, *tmp = NULL;
@@ -439,7 +448,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
WARN_ON(req1 && req1->elsp_submitted);
- execlists_submit_requests(req0, req1);
+ execlists_submit_requests(req0, req1, fw_locked);
}
static bool execlists_check_remove_request(struct intel_engine_cs *ring,
@@ -487,19 +496,23 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
u32 status_id;
u32 submit_contexts = 0;
- status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
+ spin_lock(&ring->execlist_lock);
+
+ spin_lock(&dev_priv->uncore.lock);
+ intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
+
+ status_pointer = I915_READ_FW(RING_CONTEXT_STATUS_PTR(ring));
read_pointer = ring->next_context_status_buffer;
write_pointer = status_pointer & GEN8_CSB_PTR_MASK;
if (read_pointer > write_pointer)
write_pointer += GEN8_CSB_ENTRIES;
- spin_lock(&ring->execlist_lock);
while (read_pointer < write_pointer) {
read_pointer++;
- status = I915_READ(RING_CONTEXT_STATUS_BUF_LO(ring, read_pointer % GEN8_CSB_ENTRIES));
- status_id = I915_READ(RING_CONTEXT_STATUS_BUF_HI(ring, read_pointer % GEN8_CSB_ENTRIES));
+ status = I915_READ_FW(RING_CONTEXT_STATUS_BUF_LO(ring, read_pointer % GEN8_CSB_ENTRIES));
+ status_id = I915_READ_FW(RING_CONTEXT_STATUS_BUF_HI(ring, read_pointer % GEN8_CSB_ENTRIES));
if (status & GEN8_CTX_STATUS_IDLE_ACTIVE)
continue;
@@ -523,20 +536,23 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
/* Prevent a ctx to preempt itself */
if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) &&
(submit_contexts != 0))
- execlists_context_unqueue(ring);
+ execlists_context_unqueue(ring, true);
} else if (submit_contexts != 0) {
- execlists_context_unqueue(ring);
+ execlists_context_unqueue(ring, true);
}
- spin_unlock(&ring->execlist_lock);
-
WARN(submit_contexts > 2, "More than two context complete events?\n");
ring->next_context_status_buffer = write_pointer % GEN8_CSB_ENTRIES;
- I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
- _MASKED_FIELD(GEN8_CSB_PTR_MASK << 8,
- ((u32)ring->next_context_status_buffer &
- GEN8_CSB_PTR_MASK) << 8));
+ I915_WRITE_FW(RING_CONTEXT_STATUS_PTR(ring),
+ _MASKED_FIELD(GEN8_CSB_PTR_MASK << 8,
+ ((u32)ring->next_context_status_buffer &
+ GEN8_CSB_PTR_MASK) << 8));
+
+ intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
+ spin_unlock(&dev_priv->uncore.lock);
+
+ spin_unlock(&ring->execlist_lock);
}
static int execlists_context_queue(struct drm_i915_gem_request *request)
@@ -574,7 +590,7 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
list_add_tail(&request->execlist_link, &ring->execlist_queue);
if (num_elements == 0)
- execlists_context_unqueue(ring);
+ execlists_context_unqueue(ring, false);
spin_unlock_irq(&ring->execlist_lock);
--
1.9.1
* [PATCH 5/5] drm/i915: Only grab and calculate timestamps when needed
2015-11-10 10:59 [RFC 0/5] LRC irq handler cleanups Tvrtko Ursulin
` (3 preceding siblings ...)
2015-11-10 10:59 ` [PATCH 4/5] drm/i915: Grab one forcewake across the whole LRC irq handler Tvrtko Ursulin
@ 2015-11-10 10:59 ` Tvrtko Ursulin
2015-11-10 11:36 ` [RFC 0/5] LRC irq handler cleanups Chris Wilson
5 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2015-11-10 10:59 UTC
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Tiny cleanup to avoid grabbing a timestamp and calculating the
timeout when it is not going to be used. If anything, it makes the
profile correctly show that the busy wait spends its time in get
seqno and not in ktime_get.
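A standalone sketch of the pattern, using a fake clock so the number of clock reads can be observed. The function and variable names are invented for the example. Unlike the diff below, the sketch initialises the start timestamp to zero, which is an addition on my part to keep it free of uninitialised-variable warnings.

```c
#include <stddef.h>
#include <stdint.h>

static int clock_reads;  /* how many times the clock was sampled */
static int64_t fake_now; /* fake monotonic time in ns */

static int64_t now_ns(void)
{
	clock_reads++;
	return fake_now;
}

/* timeout == NULL means "wait without a deadline". */
static void wait_sketch(int64_t *timeout, int64_t elapsed)
{
	int64_t before = 0;

	if (timeout)
		before = now_ns(); /* only sampled when it will be used */

	fake_now += elapsed; /* stand-in for the actual wait */

	if (timeout) {
		int64_t remaining = *timeout - (now_ns() - before);

		*timeout = remaining < 0 ? 0 : remaining;
	}
}
```

With no timeout the clock is never touched, so a profile of the wait attributes its time to the wait itself.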
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f1e3fdeea41f..d85c63dc36ac 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1199,7 +1199,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_ring_flag(ring);
DEFINE_WAIT(wait);
unsigned long timeout_expire;
- s64 before, now;
+ s64 before;
int ret;
WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
@@ -1210,15 +1210,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
if (i915_gem_request_completed(req, true))
return 0;
- timeout_expire = timeout ?
- jiffies + nsecs_to_jiffies_timeout((u64)*timeout) : 0;
-
if (INTEL_INFO(dev_priv)->gen >= 6)
gen6_rps_boost(dev_priv, rps, req->emitted_jiffies);
/* Record current time in case interrupted by signal, or wedged */
trace_i915_gem_request_wait_begin(req);
- before = ktime_get_raw_ns();
+ if (timeout) {
+ before = ktime_get_raw_ns();
+ timeout_expire = jiffies + nsecs_to_jiffies_timeout((u64)*timeout);
+ } else {
+ timeout_expire = 0;
+ }
/* Optimistic spin for the next jiffie before touching IRQs */
ret = __i915_spin_request(req);
@@ -1284,11 +1286,10 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
finish_wait(&ring->irq_queue, &wait);
out:
- now = ktime_get_raw_ns();
trace_i915_gem_request_wait_end(req);
if (timeout) {
- s64 tres = *timeout - (now - before);
+ s64 tres = *timeout - (ktime_get_raw_ns() - before);
*timeout = tres < 0 ? 0 : tres;
--
1.9.1
* Re: [RFC 0/5] LRC irq handler cleanups
2015-11-10 10:59 [RFC 0/5] LRC irq handler cleanups Tvrtko Ursulin
` (4 preceding siblings ...)
2015-11-10 10:59 ` [PATCH 5/5] drm/i915: Only grab and calculate timestamps when needed Tvrtko Ursulin
@ 2015-11-10 11:36 ` Chris Wilson
2015-11-10 12:09 ` Tvrtko Ursulin
5 siblings, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2015-11-10 11:36 UTC
To: Tvrtko Ursulin; +Cc: Intel-gfx
On Tue, Nov 10, 2015 at 10:59:40AM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Some random bits to make the LRC irq handler do less branching,
> locking and fewer VMA lookups per interrupt handled.
>
> I failed to measure a significant gain on a powerful chip but it
> definitely results in fewer instructions and branches in the hot
> path. So maybe it would help a lower power chip more.
>
> Possibly makes the first run of gem_exec_nop have less variance,
> e.g. because the branch predictor gets warmed up sooner, but I am
> not completely confident in this interpretation.
I have previously posted these and so much more.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [RFC 0/5] LRC irq handler cleanups
2015-11-10 11:36 ` [RFC 0/5] LRC irq handler cleanups Chris Wilson
@ 2015-11-10 12:09 ` Tvrtko Ursulin
0 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2015-11-10 12:09 UTC
To: Chris Wilson, Intel-gfx
On 10/11/15 11:36, Chris Wilson wrote:
> On Tue, Nov 10, 2015 at 10:59:40AM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Some random bits to make the LRC irq handler do less branching,
>> locking and fewer VMA lookups per interrupt handled.
>>
>> I failed to measure a significant gain on a powerful chip but it
>> definitely results in fewer instructions and branches in the hot
>> path. So maybe it would help a lower power chip more.
>>
>> Possibly makes the first run of gem_exec_nop have less variance,
>> e.g. because the branch predictor gets warmed up sooner, but I am
>> not completely confident in this interpretation.
>
> I have previously posted these and so much more.
Thread subject?
Regards,
Tvrtko