* [PATCH] drm/i915/execlists: Micro-optimise "idle" context switch
@ 2018-08-17 12:24 Chris Wilson
From: Chris Wilson @ 2018-08-17 12:24 UTC (permalink / raw)
  To: intel-gfx

On gen9, we see an effect whereby, if we perform an element switch just
as the first context completes execution, that switch takes twice as
long, as if the hardware first reloads the completed context. That is,
we observe the cost of

	context1 -> idle -> context1 -> context2

as being twice the cost of the same operation on gen8. Hitting this
window is incredibly rare outside of microbenchmarks that are focused
on assessing the throughput of context switches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
I think this is a microbenchmark too far, as there is no real-world
impact: both the low likelihood of submission at that precise point in
time, and the context switch rarely being a significant fraction of the
batch runtime, make the effect minuscule in practice.

It is also not foolproof, even for gem_ctx_switch:
kbl ctx1 -> idle -> ctx2: ~25us
    ctx1 -> idle -> ctx1 -> ctx2 (unpatched): ~53us
    ctx1 -> idle -> ctx1 -> ctx2 (patched): 30-40us

bxt ctx1 -> idle -> ctx2: ~40us
    ctx1 -> idle -> ctx1 -> ctx2 (unpatched): ~80us
    ctx1 -> idle -> ctx1 -> ctx2 (patched): 60-70us
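
For concreteness, a minimal userspace sketch of the pattern being timed
above. This is not the actual gem_ctx_switch from IGT; the batch is a
bare MI_BATCH_BUFFER_END, and all error handling plus the repetition
needed for stable numbers are omitted:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <time.h>
#include <drm/i915_drm.h>

#define MI_BATCH_BUFFER_END (0xa << 23)

static uint32_t batch_create(int fd)
{
	/* a batch that does nothing and retires immediately */
	const uint32_t bbe = MI_BATCH_BUFFER_END;
	struct drm_i915_gem_create create = { .size = 4096 };
	struct drm_i915_gem_pwrite pwrite = {
		.size = sizeof(bbe),
		.data_ptr = (uintptr_t)&bbe,
	};

	ioctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create);
	pwrite.handle = create.handle;
	ioctl(fd, DRM_IOCTL_I915_GEM_PWRITE, &pwrite);

	return create.handle;
}

static uint32_t ctx_create(int fd)
{
	struct drm_i915_gem_context_create arg = {};

	ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &arg);
	return arg.ctx_id;
}

static void submit(int fd, uint32_t ctx, uint32_t handle)
{
	struct drm_i915_gem_exec_object2 obj = { .handle = handle };
	struct drm_i915_gem_execbuffer2 eb = {
		.buffers_ptr = (uintptr_t)&obj,
		.buffer_count = 1,
		.rsvd1 = ctx, /* context id for this submission */
	};

	ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &eb);
}

static void wait_idle(int fd, uint32_t handle)
{
	struct drm_i915_gem_wait wait = {
		.bo_handle = handle,
		.timeout_ns = -1, /* no timeout */
	};

	ioctl(fd, DRM_IOCTL_I915_GEM_WAIT, &wait);
}

int main(void)
{
	int fd = open("/dev/dri/card0", O_RDWR);
	uint32_t handle = batch_create(fd);
	uint32_t ctx1 = ctx_create(fd), ctx2 = ctx_create(fd);
	struct timespec t0, t1;

	/* ctx1 runs once, then the engine falls idle */
	submit(fd, ctx1, handle);
	wait_idle(fd, handle);

	/*
	 * Resubmit ctx1 immediately followed by ctx2; ctx2 then
	 * switches in just as ctx1 completes, which is the window
	 * where gen9 appears to reload ctx1 first.
	 */
	clock_gettime(CLOCK_MONOTONIC, &t0);
	submit(fd, ctx1, handle);
	submit(fd, ctx2, handle);
	wait_idle(fd, handle);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("ctx1 -> idle -> ctx1 -> ctx2: %ldns\n",
	       (t1.tv_sec - t0.tv_sec) * 1000000000L +
	       (t1.tv_nsec - t0.tv_nsec));

	return 0;
}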

So consider this more of a plea for ideas: why does bdw behave better?
Are we missing a flag, a fox or a chicken?
-Chris
---
 drivers/gpu/drm/i915/intel_lrc.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 36050f085071..682268d4249d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -711,6 +711,24 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 				GEM_BUG_ON(last->hw_context == rq->hw_context);
 
+				/*
+				 * Avoid reloading the previous context if we
+				 * know it has just completed and we want
+				 * to switch over to a new context. The CS
+				 * interrupt is likely waiting for us to
+				 * release the local irq lock and so we will
+				 * proceed with the submission momentarily,
+				 * which is quicker than reloading the context
+				 * on the gpu.
+				 */
+				if (!submit &&
+				    intel_engine_signaled(engine,
+							  last->global_seqno)) {
+					GEM_BUG_ON(!list_is_first(&rq->sched.link,
+								  &p->requests));
+					return;
+				}
+
 				if (submit)
 					port_assign(port, last);
 				port++;
-- 
2.18.0
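
For reference, my reading of the flow this early return relies on; the
tasklet and CSB names below are from intel_lrc.c of this vintage, and
this is only an illustration of the intent, not part of the patch:

/*
 * execlists_dequeue():
 *     port0 holds ctx1 (already signaled), queue head is ctx2
 *     -> return early, skipping the ELSP write that would
 *        lite-restore the just-completed ctx1
 *
 * The CS completion interrupt for ctx1, likely already pending on
 * the local irq lock, then fires:
 *
 * execlists_submission_tasklet():
 *     process_csb()        -> retires ctx1, the ports go idle
 *     execlists_dequeue()  -> submits ctx2 onto an idle engine,
 *                             never reloading ctx1 on the way
 */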

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
