* [PATCH 01/22] drm/i915/execlists: Suppress mere WAIT preemption
From: Chris Wilson @ 2019-02-04 13:21 UTC
To: intel-gfx
WAIT preemption is occasionally suppressed by virtue of preempted
requests being promoted to NEWCLIENT if they have not already received
that boost. Make this consistent for all WAIT boosts: they are not
allowed to preempt executing contexts and are merely granted the right
to be at the front of the queue for the next execution slot. This is in
keeping with the desire that the WAIT boost be a minor tweak that does
not give excessive promotion to its user, nor open us up to trivial abuse.
The problem with the inconsistent WAIT preemption becomes more apparent
as the preemption is propagated across the engines, where one engine may
preempt and the other not, and we end up relying on the exact execution
order being consistent across engines (e.g. using HW semaphores to
coordinate parallel execution).
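As a rough sketch of the intended effect (illustrative only, not part of
the patch; the real comparison lives in __execlists_need_preempt()),
OR'ing __NO_PREEMPTION into the running request's priority means a
queued request whose only advantage is the WAIT bit can never compare
strictly greater:

	#define I915_PRIORITY_WAIT	((u8)BIT(0))	/* lowest internal boost */
	#define __NO_PREEMPTION		(I915_PRIORITY_WAIT)

	static bool sketch_need_preempt(int queue_prio, int active_prio)
	{
		/* preemption requires a strictly higher priority */
		return queue_prio > (active_prio | __NO_PREEMPTION);
	}

	/* queue_prio = P | WAIT vs active_prio = P  =>  false: the WAIT
	 * boost alone never triggers preempt-to-idle, it only reorders
	 * the pending queue. */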
v2: Also protect GuC submission from false preemption loops.
v3: Build-bug safeguards and better debug messages for the selftests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_request.c | 12 ++
drivers/gpu/drm/i915/i915_scheduler.h | 2 +
drivers/gpu/drm/i915/intel_lrc.c | 9 +-
drivers/gpu/drm/i915/selftests/igt_spinner.c | 9 +-
drivers/gpu/drm/i915/selftests/intel_lrc.c | 160 +++++++++++++++++++
5 files changed, 190 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 9ed5baf157a3..d14a1b225f47 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -377,12 +377,24 @@ void __i915_request_submit(struct i915_request *request)
/* We may be recursing from the signal callback of another i915 fence */
spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
+
GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
+
request->global_seqno = seqno;
if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
!i915_request_enable_breadcrumb(request))
intel_engine_queue_breadcrumbs(engine);
+
+ /*
+ * As we do not allow WAIT to preempt inflight requests,
+ * once we have executed a request, along with triggering
+ * any execution callbacks, we must preserve its ordering
+ * within the non-preemptible FIFO.
+ */
+ BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */
+ request->sched.attr.priority |= __NO_PREEMPTION;
+
spin_unlock(&request->lock);
engine->emit_fini_breadcrumb(request,
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index dbe9cb7ecd82..54bd6c89817e 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -33,6 +33,8 @@ enum {
#define I915_PRIORITY_WAIT ((u8)BIT(0))
#define I915_PRIORITY_NEWCLIENT ((u8)BIT(1))
+#define __NO_PREEMPTION (I915_PRIORITY_WAIT)
+
struct i915_sched_attr {
/**
* @priority: execution and service priority
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a9eb0211ce77..773df0bd685b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -188,6 +188,12 @@ static inline int rq_prio(const struct i915_request *rq)
return rq->sched.attr.priority;
}
+static int effective_prio(const struct i915_request *rq)
+{
+ /* Restrict mere WAIT boosts from triggering preemption */
+ return rq_prio(rq) | __NO_PREEMPTION;
+}
+
static int queue_prio(const struct intel_engine_execlists *execlists)
{
struct i915_priolist *p;
@@ -208,7 +214,7 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
static inline bool need_preempt(const struct intel_engine_cs *engine,
const struct i915_request *rq)
{
- const int last_prio = rq_prio(rq);
+ int last_prio;
if (!intel_engine_has_preemption(engine))
return false;
@@ -228,6 +234,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
* preempt. If that hint is stale or we may be trying to preempt
* ourselves, ignore the request.
*/
+ last_prio = effective_prio(rq);
if (!__execlists_need_preempt(engine->execlists.queue_priority_hint,
last_prio))
return false;
diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
index 9ebd9225684e..86354e51bdd3 100644
--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
+++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
@@ -142,10 +142,17 @@ igt_spinner_create_request(struct igt_spinner *spin,
*batch++ = upper_32_bits(vma->node.start);
*batch++ = MI_BATCH_BUFFER_END; /* not reached */
- i915_gem_chipset_flush(spin->i915);
+ if (engine->emit_init_breadcrumb &&
+ rq->timeline->has_initial_breadcrumb) {
+ err = engine->emit_init_breadcrumb(rq);
+ if (err)
+ goto cancel_rq;
+ }
err = engine->emit_bb_start(rq, vma->node.start, PAGE_SIZE, 0);
+ i915_gem_chipset_flush(spin->i915);
+
cancel_rq:
if (err) {
i915_request_skip(rq, err);
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index fb35f53c9ce3..16037a841146 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -405,6 +405,165 @@ static int live_suppress_self_preempt(void *arg)
goto err_client_b;
}
+static int __i915_sw_fence_call
+dummy_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
+{
+ return NOTIFY_DONE;
+}
+
+static struct i915_request *dummy_request(struct intel_engine_cs *engine)
+{
+ struct i915_request *rq;
+
+ rq = kmalloc(sizeof(*rq), GFP_KERNEL | __GFP_ZERO);
+ if (!rq)
+ return NULL;
+
+ INIT_LIST_HEAD(&rq->active_list);
+ rq->engine = engine;
+
+ i915_sched_node_init(&rq->sched);
+
+ /* mark this request as permanently incomplete */
+ rq->fence.seqno = 1;
+ BUILD_BUG_ON(sizeof(rq->fence.seqno) != 8); /* upper 32b == 0 */
+ rq->hwsp_seqno = (u32 *)&rq->fence.seqno + 1;
+
+ i915_sw_fence_init(&rq->submit, dummy_notify);
+ i915_sw_fence_commit(&rq->submit);
+
+ return rq;
+}
+
+static void dummy_request_free(struct i915_request *dummy)
+{
+ i915_request_mark_complete(dummy);
+ i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
+ kfree(dummy);
+}
+
+static int live_suppress_wait_preempt(void *arg)
+{
+ struct drm_i915_private *i915 = arg;
+ struct preempt_client client[4];
+ struct intel_engine_cs *engine;
+ enum intel_engine_id id;
+ intel_wakeref_t wakeref;
+ int err = -ENOMEM;
+ int i;
+
+ /*
+ * Waiters are given a little priority nudge, but not enough
+ * to actually cause any preemption. Double check that we do
+ * not needlessly generate preempt-to-idle cycles.
+ */
+
+ if (!HAS_LOGICAL_RING_PREEMPTION(i915))
+ return 0;
+
+ mutex_lock(&i915->drm.struct_mutex);
+ wakeref = intel_runtime_pm_get(i915);
+
+ if (preempt_client_init(i915, &client[0])) /* ELSP[0] */
+ goto err_unlock;
+ if (preempt_client_init(i915, &client[1])) /* ELSP[1] */
+ goto err_client_0;
+ if (preempt_client_init(i915, &client[2])) /* head of queue */
+ goto err_client_1;
+ if (preempt_client_init(i915, &client[3])) /* bystander */
+ goto err_client_2;
+
+ for_each_engine(engine, i915, id) {
+ int depth;
+
+ if (!engine->emit_init_breadcrumb)
+ continue;
+
+ for (depth = 0; depth < ARRAY_SIZE(client); depth++) {
+ struct i915_request *rq[ARRAY_SIZE(client)];
+ struct i915_request *dummy;
+
+ engine->execlists.preempt_hang.count = 0;
+
+ dummy = dummy_request(engine);
+ if (!dummy)
+ goto err_client_3;
+
+ for (i = 0; i < ARRAY_SIZE(client); i++) {
+ rq[i] = igt_spinner_create_request(&client[i].spin,
+ client[i].ctx, engine,
+ MI_NOOP);
+ if (IS_ERR(rq[i])) {
+ err = PTR_ERR(rq[i]);
+ goto err_wedged;
+ }
+
+ /* Disable NEWCLIENT promotion */
+ i915_gem_active_set(&rq[i]->timeline->last_request,
+ dummy);
+ i915_request_add(rq[i]);
+ }
+
+ dummy_request_free(dummy);
+
+ GEM_BUG_ON(i915_request_completed(rq[0]));
+ if (!igt_wait_for_spinner(&client[0].spin, rq[0])) {
+ pr_err("%s: First client failed to start\n",
+ engine->name);
+ goto err_wedged;
+ }
+ GEM_BUG_ON(!i915_request_started(rq[0]));
+
+ if (i915_request_wait(rq[depth],
+ I915_WAIT_LOCKED |
+ I915_WAIT_PRIORITY,
+ 1) != -ETIME) {
+ pr_err("%s: Waiter depth:%d completed!\n",
+ engine->name, depth);
+ goto err_wedged;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(client); i++)
+ igt_spinner_end(&client[i].spin);
+
+ if (igt_flush_test(i915, I915_WAIT_LOCKED))
+ goto err_wedged;
+
+ if (engine->execlists.preempt_hang.count) {
+ pr_err("%s: Preemption recorded x%d, depth %d; should have been suppressed!\n",
+ engine->name,
+ engine->execlists.preempt_hang.count,
+ depth);
+ err = -EINVAL;
+ goto err_client_3;
+ }
+ }
+ }
+
+ err = 0;
+err_client_3:
+ preempt_client_fini(&client[3]);
+err_client_2:
+ preempt_client_fini(&client[2]);
+err_client_1:
+ preempt_client_fini(&client[1]);
+err_client_0:
+ preempt_client_fini(&client[0]);
+err_unlock:
+ if (igt_flush_test(i915, I915_WAIT_LOCKED))
+ err = -EIO;
+ intel_runtime_pm_put(i915, wakeref);
+ mutex_unlock(&i915->drm.struct_mutex);
+ return err;
+
+err_wedged:
+ for (i = 0; i < ARRAY_SIZE(client); i++)
+ igt_spinner_end(&client[i].spin);
+ i915_gem_set_wedged(i915);
+ err = -EIO;
+ goto err_client_3;
+}
+
static int live_preempt_hang(void *arg)
{
struct drm_i915_private *i915 = arg;
@@ -785,6 +944,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
SUBTEST(live_preempt),
SUBTEST(live_late_preempt),
SUBTEST(live_suppress_self_preempt),
+ SUBTEST(live_suppress_wait_preempt),
SUBTEST(live_preempt_hang),
SUBTEST(live_preempt_smoke),
};
--
2.20.1
* [PATCH 02/22] drm/i915/execlists: Suppress redundant preemption
From: Chris Wilson @ 2019-02-04 13:21 UTC
To: intel-gfx
On unwinding the active request we give it a small (limited to internal
priority levels) boost to prevent it from being gazumped a second time.
However, this means that it can be promoted above the request that
triggered the preemption, causing a preempt-to-idle cycle for no change.
We can avoid this if we take the boost into account when checking
whether the preemption request is valid.
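As a worked example (illustrative only, using the internal priority bits
from the previous patch: WAIT = 1, NEWCLIENT = 2, with ACTIVE_PRIORITY
being the NEWCLIENT bit):

 - the active request started at prio 0; a waiter is queued at 0 | WAIT = 1
 - before: 1 > 0, so we preempt; the unwind then boosts the active
   request to 0 | NEWCLIENT = 2, requeuing it ahead of the prio-1
   waiter, i.e. a preempt-to-idle cycle for no change in execution order
 - after: effective_prio(active) = ((0 | 2) - 1) | WAIT = 1, and 1 > 1
   is false, so the redundant preemption is suppressed; a genuine
   NEWCLIENT peer at prio 2 still preempts, validly so, as the unwound
   active request requeues at the tail of that priority level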
v2: After preemption the active request will be after the preemptee if
they end up with equal priority.
v3: Tvrtko pointed out that this, the existing logic, makes
I915_PRIORITY_WAIT non-preemptible. Document this interesting quirk!
v4: Prove Tvrtko was right about WAIT being non-preemptible and test it.
v5: Except not all priorities were made equal, and the WAIT not preempting
is only if we start off as !NEWCLIENT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/intel_lrc.c | 38 ++++++++++++++++++++++++++++----
1 file changed, 34 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 773df0bd685b..9b6b3acb9070 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -164,6 +164,8 @@
#define WA_TAIL_DWORDS 2
#define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
+#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
+
static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
struct intel_engine_cs *engine,
struct intel_context *ce);
@@ -190,8 +192,30 @@ static inline int rq_prio(const struct i915_request *rq)
static int effective_prio(const struct i915_request *rq)
{
+ int prio = rq_prio(rq);
+
+ /*
+ * On unwinding the active request, we give it a priority bump
+ * equivalent to a freshly submitted request. This protects it from
+ * being gazumped again, but it would be preferable if we didn't
+ * let it be gazumped in the first place!
+ *
+ * See __unwind_incomplete_requests()
+ */
+ if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(rq)) {
+ /*
+ * After preemption, we insert the active request at the
+ * end of the new priority level. This means that we will be
+ * _lower_ priority than the preemptee all things equal (and
+ * so the preemption is valid), so adjust our comparison
+ * accordingly.
+ */
+ prio |= ACTIVE_PRIORITY;
+ prio--;
+ }
+
/* Restrict mere WAIT boosts from triggering preemption */
- return rq_prio(rq) | __NO_PREEMPTION;
+ return prio | __NO_PREEMPTION;
}
static int queue_prio(const struct intel_engine_execlists *execlists)
@@ -360,7 +384,7 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
{
struct i915_request *rq, *rn, *active = NULL;
struct list_head *uninitialized_var(pl);
- int prio = I915_PRIORITY_INVALID | I915_PRIORITY_NEWCLIENT;
+ int prio = I915_PRIORITY_INVALID | ACTIVE_PRIORITY;
lockdep_assert_held(&engine->timeline.lock);
@@ -391,9 +415,15 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
* The active request is now effectively the start of a new client
* stream, so give it the equivalent small priority bump to prevent
* it being gazumped a second time by another peer.
+ *
+ * One consequence of this preemption boost is that we may jump
+ * over lesser priorities (such as I915_PRIORITY_WAIT), effectively
+ * making those priorities non-preemptible. They will be moved forward
+ * in the priority queue, but they will not gain immediate access to
+ * the GPU.
*/
- if (!(prio & I915_PRIORITY_NEWCLIENT)) {
- prio |= I915_PRIORITY_NEWCLIENT;
+ if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(active)) {
+ prio |= ACTIVE_PRIORITY;
active->sched.attr.priority = prio;
list_move_tail(&active->sched.link,
i915_sched_lookup_priolist(engine, prio));
--
2.20.1
* [PATCH 03/22] drm/i915/selftests: Exercise some AB...BA preemption chains
From: Chris Wilson @ 2019-02-04 13:21 UTC
To: intel-gfx
Build a chain using two contexts (A, B), then request a preemption such
that a later A request runs before the spinner in B.
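For reference, a sketch of the per-engine submission order the selftest
builds (hi/lo are the two preempt clients; N walks the small primes up
to 32, bounded by the ring size):

	hi:  [spinner] ........................ [final request -> MAX prio]
	lo:             [spinner] [r1] ... [rN]

Once engine->schedule() bumps the final hi request to
I915_PRIORITY_MAX, it must preempt lo's spinner and complete within the
HZ / 5 wait, despite the chain of N lo requests queued in between.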
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/selftests/intel_lrc.c | 103 +++++++++++++++++++++
1 file changed, 103 insertions(+)
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 16037a841146..967cefa118ee 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -4,6 +4,8 @@
* Copyright © 2018 Intel Corporation
*/
+#include <linux/prime_numbers.h>
+
#include "../i915_reset.h"
#include "../i915_selftest.h"
@@ -564,6 +566,106 @@ static int live_suppress_wait_preempt(void *arg)
goto err_client_3;
}
+static int live_chain_preempt(void *arg)
+{
+ struct drm_i915_private *i915 = arg;
+ struct intel_engine_cs *engine;
+ struct preempt_client hi, lo;
+ enum intel_engine_id id;
+ intel_wakeref_t wakeref;
+ int err = -ENOMEM;
+
+ /*
+ * Build a chain AB...BA between two contexts (A, B) and request
+ * preemption of the last request. It should then complete before
+ * the previously submitted spinner in B.
+ */
+
+ if (!HAS_LOGICAL_RING_PREEMPTION(i915))
+ return 0;
+
+ mutex_lock(&i915->drm.struct_mutex);
+ wakeref = intel_runtime_pm_get(i915);
+
+ if (preempt_client_init(i915, &hi))
+ goto err_unlock;
+
+ if (preempt_client_init(i915, &lo))
+ goto err_client_hi;
+
+ for_each_engine(engine, i915, id) {
+ struct i915_sched_attr attr = {
+ .priority = I915_USER_PRIORITY(I915_PRIORITY_MAX),
+ };
+ int count, i;
+
+ for_each_prime_number_from(count, 1, 32) { /* must fit ring! */
+ struct i915_request *rq;
+
+ rq = igt_spinner_create_request(&hi.spin,
+ hi.ctx, engine,
+ MI_ARB_CHECK);
+ if (IS_ERR(rq))
+ goto err_wedged;
+ i915_request_add(rq);
+ if (!igt_wait_for_spinner(&hi.spin, rq))
+ goto err_wedged;
+
+ rq = igt_spinner_create_request(&lo.spin,
+ lo.ctx, engine,
+ MI_ARB_CHECK);
+ if (IS_ERR(rq))
+ goto err_wedged;
+ i915_request_add(rq);
+
+ for (i = 0; i < count; i++) {
+ rq = i915_request_alloc(engine, lo.ctx);
+ if (IS_ERR(rq))
+ goto err_wedged;
+ i915_request_add(rq);
+ }
+
+ rq = i915_request_alloc(engine, hi.ctx);
+ if (IS_ERR(rq))
+ goto err_wedged;
+ i915_request_add(rq);
+ engine->schedule(rq, &attr);
+
+ igt_spinner_end(&hi.spin);
+ if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) {
+ struct drm_printer p =
+ drm_info_printer(i915->drm.dev);
+
+ pr_err("Failed to preempt over chain of %d\n",
+ count);
+ intel_engine_dump(engine, &p,
+ "%s\n", engine->name);
+ goto err_wedged;
+ }
+ igt_spinner_end(&lo.spin);
+ }
+ }
+
+ err = 0;
+err_client_lo:
+ preempt_client_fini(&lo);
+err_client_hi:
+ preempt_client_fini(&hi);
+err_unlock:
+ if (igt_flush_test(i915, I915_WAIT_LOCKED))
+ err = -EIO;
+ intel_runtime_pm_put(i915, wakeref);
+ mutex_unlock(&i915->drm.struct_mutex);
+ return err;
+
+err_wedged:
+ igt_spinner_end(&hi.spin);
+ igt_spinner_end(&lo.spin);
+ i915_gem_set_wedged(i915);
+ err = -EIO;
+ goto err_client_lo;
+}
+
static int live_preempt_hang(void *arg)
{
struct drm_i915_private *i915 = arg;
@@ -945,6 +1047,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
SUBTEST(live_late_preempt),
SUBTEST(live_suppress_self_preempt),
SUBTEST(live_suppress_wait_preempt),
+ SUBTEST(live_chain_preempt),
SUBTEST(live_preempt_hang),
SUBTEST(live_preempt_smoke),
};
--
2.20.1
* [PATCH 04/22] drm/i915: Trim NEWCLIENT boosting
From: Chris Wilson @ 2019-02-04 13:21 UTC
To: intel-gfx
Limit the NEWCLIENT boost so that it gives its small priority bump only
to fresh clients that have no dependencies.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_request.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index d14a1b225f47..04c65e6d83b9 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -980,7 +980,7 @@ void i915_request_add(struct i915_request *request)
* Allow interactive/synchronous clients to jump ahead of
* the bulk clients. (FQ_CODEL)
*/
- if (!prev || i915_request_completed(prev))
+ if (list_empty(&request->sched.signalers_list))
attr.priority |= I915_PRIORITY_NEWCLIENT;
engine->schedule(request, &attr);
--
2.20.1
* [PATCH] drm/i915: Trim NEWCLIENT boosting
From: Chris Wilson @ 2019-02-04 15:01 UTC
To: intel-gfx
Limit the NEWCLIENT boost so that it gives its small priority bump only
to fresh clients that have no dependencies.
The idea behind NEWCLIENT boosting, commit b16c765122f9 ("drm/i915:
Priority boost for new clients"), is that short-lived streams are often
interactive and require lower latency -- and that by executing those
ahead of the long-running hogs, the short-lived clients interfere little
with overall system throughput by virtue of their short-lived nature.
However, we were only considering the client's own timeline for
determining whether or not it was a fresh stream. This allowed
compositors to wake up before their vblank and bump all of their client
streams. In testing with media-bench, this resulted in chaining all
cooperating contexts together, preventing us from reordering contexts to
reduce bubbles (pipeline stalls), overall increasing latency and
reducing system throughput. The exact opposite of our intent. The
compromise of applying the NEWCLIENT boost strictly to fresh clients
(those that do not wait upon anything else) should maintain the
real-time response under load characteristics of FQ_CODEL, without
locking together the long chains of dependencies across the system.
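For illustration (a sketch of the new rule only, using the fields from
the diff below), a request that waits on anything -- for example a
compositor batch fencing on its clients' rendering -- now carries a
signaler dependency and no longer qualifies:

	/* sketch: NEWCLIENT now means "depends on nothing else" */
	static bool is_fresh_client(const struct i915_request *rq)
	{
		return list_empty(&rq->sched.signalers_list);
	}

whereas previously any request starting an idle timeline was boosted,
regardless of its cross-context dependencies.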
References: b16c765122f9 ("drm/i915: Priority boost for new clients")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_request.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index d14a1b225f47..04c65e6d83b9 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -980,7 +980,7 @@ void i915_request_add(struct i915_request *request)
* Allow interactive/synchronous clients to jump ahead of
* the bulk clients. (FQ_CODEL)
*/
- if (!prev || i915_request_completed(prev))
+ if (list_empty(&request->sched.signalers_list))
attr.priority |= I915_PRIORITY_NEWCLIENT;
engine->schedule(request, &attr);
--
2.20.1
* Re: [PATCH] drm/i915: Trim NEWCLIENT boosting
From: Tvrtko Ursulin @ 2019-02-04 19:05 UTC
To: Chris Wilson, intel-gfx
On 04/02/2019 15:01, Chris Wilson wrote:
> Limit the NEWCLIENT boost so that it gives its small priority bump only
> to fresh clients that have no dependencies.
>
> [snip]
>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Regards,
Tvrtko
* [PATCH 05/22] drm/i915: Show support for accurate sw PMU busyness tracking
From: Chris Wilson @ 2019-02-04 13:21 UTC
To: intel-gfx
Expose whether or not we support accurate software PMU busyness tracking
in our scheduler capabilities, so that userspace can query it at runtime.
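As a hedged sketch of the userspace side (assuming the usual
I915_PARAM_HAS_SCHEDULER getparam, through which these capability bits
are reported):

	#include <sys/ioctl.h>
	#include <drm/i915_drm.h>

	static int scheduler_caps(int fd)
	{
		int caps = 0;
		struct drm_i915_getparam gp = {
			.param = I915_PARAM_HAS_SCHEDULER,
			.value = &caps,
		};

		if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
			return 0;

		return caps; /* mask of I915_SCHEDULER_CAP_* bits */
	}

A set I915_SCHEDULER_CAP_PMU bit would then mean that every engine
supports the accurate software busyness tracking.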
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 2 ++
drivers/gpu/drm/i915/intel_engine_cs.c | 38 +++++++++++++++++++++++++
drivers/gpu/drm/i915/intel_lrc.c | 6 ----
drivers/gpu/drm/i915/intel_ringbuffer.h | 2 ++
include/uapi/drm/i915_drm.h | 1 +
5 files changed, 43 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e802af64d628..bc7d1338b69a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4757,6 +4757,8 @@ static int __i915_gem_restart_engines(void *data)
}
}
+ intel_engines_set_scheduler_caps(i915);
+
return 0;
}
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 71c01eb13af1..ec2cbbe070a4 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -614,6 +614,44 @@ int intel_engine_setup_common(struct intel_engine_cs *engine)
return err;
}
+void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
+{
+ static const struct {
+ u32 engine_flag;
+ u32 sched_cap;
+ } map[] = {
+ { I915_ENGINE_HAS_PREEMPTION, I915_SCHEDULER_CAP_PREEMPTION },
+ { I915_ENGINE_SUPPORTS_STATS, I915_SCHEDULER_CAP_PMU },
+ };
+ struct intel_engine_cs *engine;
+ enum intel_engine_id id;
+ u32 enabled, disabled;
+
+ enabled = 0;
+ disabled = 0;
+ for_each_engine(engine, i915, id) { /* all engines must agree! */
+ int i;
+
+ if (engine->schedule)
+ enabled |= (I915_SCHEDULER_CAP_ENABLED |
+ I915_SCHEDULER_CAP_PRIORITY);
+ else
+ disabled |= (I915_SCHEDULER_CAP_ENABLED |
+ I915_SCHEDULER_CAP_PRIORITY);
+
+ for (i = 0; i < ARRAY_SIZE(map); i++) {
+ if (engine->flags & map[i].engine_flag)
+ enabled |= map[i].sched_cap;
+ else
+ disabled |= map[i].sched_cap;
+ }
+ }
+
+ i915->caps.scheduler = enabled & ~disabled;
+ if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_ENABLED))
+ i915->caps.scheduler = 0;
+}
+
static void __intel_context_unpin(struct i915_gem_context *ctx,
struct intel_engine_cs *engine)
{
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9b6b3acb9070..0869a4fd20c7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2299,12 +2299,6 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
engine->flags |= I915_ENGINE_SUPPORTS_STATS;
if (engine->i915->preempt_context)
engine->flags |= I915_ENGINE_HAS_PREEMPTION;
-
- engine->i915->caps.scheduler =
- I915_SCHEDULER_CAP_ENABLED |
- I915_SCHEDULER_CAP_PRIORITY;
- if (intel_engine_has_preemption(engine))
- engine->i915->caps.scheduler |= I915_SCHEDULER_CAP_PREEMPTION;
}
static void
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 1398eb81dee6..8183d3441907 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -590,6 +590,8 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
return engine->flags & I915_ENGINE_HAS_PREEMPTION;
}
+void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
+
static inline bool __execlists_need_preempt(int prio, int last)
{
/*
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 298b2e197744..d8ac7f105734 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -476,6 +476,7 @@ typedef struct drm_i915_irq_wait {
#define I915_SCHEDULER_CAP_ENABLED (1ul << 0)
#define I915_SCHEDULER_CAP_PRIORITY (1ul << 1)
#define I915_SCHEDULER_CAP_PREEMPTION (1ul << 2)
+#define I915_SCHEDULER_CAP_PMU (1ul << 3)
#define I915_PARAM_HUC_STATUS 42
--
2.20.1
* [PATCH 06/22] drm/i915: Revoke mmaps and prevent access to fence registers across reset
From: Chris Wilson @ 2019-02-04 13:21 UTC
To: intel-gfx; +Cc: Mika Kuoppala
Previously, we were able to rely on the recursive properties of
struct_mutex to allow us to serialise revoking mmaps and reacquiring the
FENCE registers with them being clobbered over a global device reset.
I then proceeded to throw out the baby with the bath water in order to
pursue a struct_mutex-less reset.
Perusing LWN for alternative strategies, the dilemma of how to serialise
access to a global resource was answered by
https://lwn.net/Articles/202847/ -- Sleepable RCU:
	int readside(void)
	{
		int idx;

		rcu_read_lock();
		if (nomoresrcu) {
			rcu_read_unlock();
			return -EINVAL;
		}
		idx = srcu_read_lock(&ss);
		rcu_read_unlock();

		/* SRCU read-side critical section. */

		srcu_read_unlock(&ss, idx);
		return 0;
	}

	void cleanup(void)
	{
		nomoresrcu = 1;
		synchronize_rcu();
		synchronize_srcu(&ss);
		cleanup_srcu_struct(&ss);
	}
No more worrying about stop_machine, just an uber-complex mutex,
optimised for reads, with the overhead pushed to the rare reset path.
However, we do run the risk of a deadlock as we allocate underneath the
SRCU read lock, and the allocation may require a GPU reset, causing a
dependency cycle via the in-flight requests. We resolve that by declaring
the driver wedged and cancelling all in-flight rendering.
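Mapping that pattern onto the driver (a sketch of the intended usage of
the i915_reset_trylock()/i915_reset_unlock() pair added below, where
nomoresrcu corresponds to the I915_RESET_BACKOFF bit):

	int idx;

	idx = i915_reset_trylock(i915); /* waits out I915_RESET_BACKOFF */
	if (idx < 0)
		return idx;	/* -EINTR from the interruptible wait */

	/* ... access the fence registers / service the GGTT fault ... */

	i915_reset_unlock(i915, idx);

with synchronize_srcu() in do_reset() playing the part of cleanup().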
v2: Use expedited rcu barriers to match our earlier timing
characteristics.
v3: Try to annotate locking contexts for sparse
v4: Reduce selftest lock duration to avoid a reset deadlock with fences
Testcase: igt/gem_mmap_gtt/hang
Fixes: eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
drivers/gpu/drm/i915/i915_debugfs.c | 12 +-
drivers/gpu/drm/i915/i915_drv.h | 18 +--
drivers/gpu/drm/i915/i915_gem.c | 56 +++------
drivers/gpu/drm/i915/i915_gem_fence_reg.c | 31 +----
drivers/gpu/drm/i915/i915_gpu_error.h | 12 +-
drivers/gpu/drm/i915/i915_reset.c | 107 +++++++++++-------
drivers/gpu/drm/i915/i915_reset.h | 4 +
.../gpu/drm/i915/selftests/intel_hangcheck.c | 5 +-
.../gpu/drm/i915/selftests/mock_gem_device.c | 1 +
9 files changed, 109 insertions(+), 137 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index fa2c226fc779..2cea263b4d79 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1281,14 +1281,11 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
intel_wakeref_t wakeref;
enum intel_engine_id id;
+ seq_printf(m, "Reset flags: %lx\n", dev_priv->gpu_error.flags);
if (test_bit(I915_WEDGED, &dev_priv->gpu_error.flags))
- seq_puts(m, "Wedged\n");
+ seq_puts(m, "\tWedged\n");
if (test_bit(I915_RESET_BACKOFF, &dev_priv->gpu_error.flags))
- seq_puts(m, "Reset in progress: struct_mutex backoff\n");
- if (waitqueue_active(&dev_priv->gpu_error.wait_queue))
- seq_puts(m, "Waiter holding struct mutex\n");
- if (waitqueue_active(&dev_priv->gpu_error.reset_queue))
- seq_puts(m, "struct_mutex blocked for reset\n");
+ seq_puts(m, "\tDevice (global) reset in progress\n");
if (!i915_modparams.enable_hangcheck) {
seq_puts(m, "Hangcheck disabled\n");
@@ -3885,9 +3882,6 @@ i915_wedged_set(void *data, u64 val)
* while it is writing to 'i915_wedged'
*/
- if (i915_reset_backoff(&i915->gpu_error))
- return -EAGAIN;
-
i915_handle_error(i915, val, I915_ERROR_CAPTURE,
"Manually set wedged engine mask = %llx", val);
return 0;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 534e52e3a8da..3e4538ce5276 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2989,7 +2989,12 @@ i915_gem_obj_finish_shmem_access(struct drm_i915_gem_object *obj)
i915_gem_object_unpin_pages(obj);
}
-int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
+static inline int __must_check
+i915_mutex_lock_interruptible(struct drm_device *dev)
+{
+ return mutex_lock_interruptible(&dev->struct_mutex);
+}
+
int i915_gem_dumb_create(struct drm_file *file_priv,
struct drm_device *dev,
struct drm_mode_create_dumb *args);
@@ -3006,21 +3011,11 @@ int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
struct i915_request *
i915_gem_find_active_request(struct intel_engine_cs *engine);
-static inline bool i915_reset_backoff(struct i915_gpu_error *error)
-{
- return unlikely(test_bit(I915_RESET_BACKOFF, &error->flags));
-}
-
static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
{
return unlikely(test_bit(I915_WEDGED, &error->flags));
}
-static inline bool i915_reset_backoff_or_wedged(struct i915_gpu_error *error)
-{
- return i915_reset_backoff(error) | i915_terminally_wedged(error);
-}
-
static inline u32 i915_reset_count(struct i915_gpu_error *error)
{
return READ_ONCE(error->reset_count);
@@ -3093,7 +3088,6 @@ struct drm_i915_fence_reg *
i915_reserve_fence(struct drm_i915_private *dev_priv);
void i915_unreserve_fence(struct drm_i915_fence_reg *fence);
-void i915_gem_revoke_fences(struct drm_i915_private *dev_priv);
void i915_gem_restore_fences(struct drm_i915_private *dev_priv);
void i915_gem_detect_bit_6_swizzle(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index bc7d1338b69a..2c6161c89cc7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -100,47 +100,6 @@ static void i915_gem_info_remove_obj(struct drm_i915_private *dev_priv,
spin_unlock(&dev_priv->mm.object_stat_lock);
}
-static int
-i915_gem_wait_for_error(struct i915_gpu_error *error)
-{
- int ret;
-
- might_sleep();
-
- /*
- * Only wait 10 seconds for the gpu reset to complete to avoid hanging
- * userspace. If it takes that long something really bad is going on and
- * we should simply try to bail out and fail as gracefully as possible.
- */
- ret = wait_event_interruptible_timeout(error->reset_queue,
- !i915_reset_backoff(error),
- I915_RESET_TIMEOUT);
- if (ret == 0) {
- DRM_ERROR("Timed out waiting for the gpu reset to complete\n");
- return -EIO;
- } else if (ret < 0) {
- return ret;
- } else {
- return 0;
- }
-}
-
-int i915_mutex_lock_interruptible(struct drm_device *dev)
-{
- struct drm_i915_private *dev_priv = to_i915(dev);
- int ret;
-
- ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
- if (ret)
- return ret;
-
- ret = mutex_lock_interruptible(&dev->struct_mutex);
- if (ret)
- return ret;
-
- return 0;
-}
-
static u32 __i915_gem_park(struct drm_i915_private *i915)
{
intel_wakeref_t wakeref;
@@ -1869,6 +1828,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
intel_wakeref_t wakeref;
struct i915_vma *vma;
pgoff_t page_offset;
+ int srcu;
int ret;
/* Sanity check that we allow writing into this object */
@@ -1908,7 +1868,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
goto err_unlock;
}
-
/* Now pin it into the GTT as needed */
vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
PIN_MAPPABLE |
@@ -1946,9 +1905,15 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
if (ret)
goto err_unpin;
+ srcu = i915_reset_trylock(dev_priv);
+ if (srcu < 0) {
+ ret = srcu;
+ goto err_unpin;
+ }
+
ret = i915_vma_pin_fence(vma);
if (ret)
- goto err_unpin;
+ goto err_reset;
/* Finally, remap it using the new GTT offset */
ret = remap_io_mapping(area,
@@ -1969,6 +1934,8 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
err_fence:
i915_vma_unpin_fence(vma);
+err_reset:
+ i915_reset_unlock(dev_priv, srcu);
err_unpin:
__i915_vma_unpin(vma);
err_unlock:
@@ -5326,6 +5293,7 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
init_waitqueue_head(&dev_priv->gpu_error.wait_queue);
init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
mutex_init(&dev_priv->gpu_error.wedge_mutex);
+ init_srcu_struct(&dev_priv->gpu_error.srcu);
atomic_set(&dev_priv->mm.bsd_engine_dispatch_index, 0);
@@ -5358,6 +5326,8 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
WARN_ON(dev_priv->mm.object_count);
+ cleanup_srcu_struct(&dev_priv->gpu_error.srcu);
+
kmem_cache_destroy(dev_priv->priorities);
kmem_cache_destroy(dev_priv->dependencies);
kmem_cache_destroy(dev_priv->requests);
diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index 46e259661294..bd0d5b8d6c96 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -240,6 +240,10 @@ static int fence_update(struct drm_i915_fence_reg *fence,
i915_vma_flush_writes(old);
}
+ ret = i915_reset_trylock(fence->i915);
+ if (ret < 0)
+ return ret;
+
if (fence->vma && fence->vma != vma) {
/* Ensure that all userspace CPU access is completed before
* stealing the fence.
@@ -272,6 +276,7 @@ static int fence_update(struct drm_i915_fence_reg *fence,
list_move_tail(&fence->link, &fence->i915->mm.fence_list);
}
+ i915_reset_unlock(fence->i915, ret);
return 0;
}
@@ -435,32 +440,6 @@ void i915_unreserve_fence(struct drm_i915_fence_reg *fence)
list_add(&fence->link, &fence->i915->mm.fence_list);
}
-/**
- * i915_gem_revoke_fences - revoke fence state
- * @dev_priv: i915 device private
- *
- * Removes all GTT mmappings via the fence registers. This forces any user
- * of the fence to reacquire that fence before continuing with their access.
- * One use is during GPU reset where the fence register is lost and we need to
- * revoke concurrent userspace access via GTT mmaps until the hardware has been
- * reset and the fence registers have been restored.
- */
-void i915_gem_revoke_fences(struct drm_i915_private *dev_priv)
-{
- int i;
-
- lockdep_assert_held(&dev_priv->drm.struct_mutex);
-
- for (i = 0; i < dev_priv->num_fence_regs; i++) {
- struct drm_i915_fence_reg *fence = &dev_priv->fence_regs[i];
-
- GEM_BUG_ON(fence->vma && fence->vma->fence != fence);
-
- if (fence->vma)
- i915_vma_revoke_mmap(fence->vma);
- }
-}
-
/**
* i915_gem_restore_fences - restore fence state
* @dev_priv: i915 device private
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 53b1f22dd365..4e797c552b96 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -231,12 +231,10 @@ struct i915_gpu_error {
/**
* flags: Control various stages of the GPU reset
*
- * #I915_RESET_BACKOFF - When we start a reset, we want to stop any
- * other users acquiring the struct_mutex. To do this we set the
- * #I915_RESET_BACKOFF bit in the error flags when we detect a reset
- * and then check for that bit before acquiring the struct_mutex (in
- * i915_mutex_lock_interruptible()?). I915_RESET_BACKOFF serves a
- * secondary role in preventing two concurrent global reset attempts.
+ * #I915_RESET_BACKOFF - When we start a global reset, we need to
+ * serialise with any other users attempting to do the same, and
+ * any global resources that may be clobbered by the reset (such as
+ * FENCE registers).
*
* #I915_RESET_ENGINE[num_engines] - Since the driver doesn't need to
* acquire the struct_mutex to reset an engine, we need an explicit
@@ -272,6 +270,8 @@ struct i915_gpu_error {
*/
wait_queue_head_t reset_queue;
+ struct srcu_struct srcu;
+
struct i915_gpu_restart *restart;
};
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 4462007a681c..f58fae457ec6 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -639,6 +639,31 @@ static void reset_prepare_engine(struct intel_engine_cs *engine)
engine->reset.prepare(engine);
}
+static void revoke_mmaps(struct drm_i915_private *i915)
+{
+ int i;
+
+ for (i = 0; i < i915->num_fence_regs; i++) {
+ struct i915_vma *vma = i915->fence_regs[i].vma;
+ struct drm_vma_offset_node *node;
+ u64 vma_offset;
+
+ if (!vma)
+ continue;
+
+ GEM_BUG_ON(vma->fence != &i915->fence_regs[i]);
+ if (!i915_vma_has_userfault(vma))
+ continue;
+
+ node = &vma->obj->base.vma_node;
+ vma_offset = vma->ggtt_view.partial.offset << PAGE_SHIFT;
+ unmap_mapping_range(i915->drm.anon_inode->i_mapping,
+ drm_vma_node_offset_addr(node) + vma_offset,
+ vma->size,
+ 1);
+ }
+}
+
static void reset_prepare(struct drm_i915_private *i915)
{
struct intel_engine_cs *engine;
@@ -648,6 +673,7 @@ static void reset_prepare(struct drm_i915_private *i915)
reset_prepare_engine(engine);
intel_uc_sanitize(i915);
+ revoke_mmaps(i915);
}
static int gt_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
@@ -911,50 +937,22 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
return ret;
}
-struct __i915_reset {
- struct drm_i915_private *i915;
- unsigned int stalled_mask;
-};
-
-static int __i915_reset__BKL(void *data)
-{
- struct __i915_reset *arg = data;
- int err;
-
- err = intel_gpu_reset(arg->i915, ALL_ENGINES);
- if (err)
- return err;
-
- return gt_reset(arg->i915, arg->stalled_mask);
-}
-
-#if RESET_UNDER_STOP_MACHINE
-/*
- * XXX An alternative to using stop_machine would be to park only the
- * processes that have a GGTT mmap. By remote parking the threads (SIGSTOP)
- * we should be able to prevent their memmory accesses via the lost fence
- * registers over the course of the reset without the potential recursive
- * of mutexes between the pagefault handler and reset.
- *
- * See igt/gem_mmap_gtt/hang
- */
-#define __do_reset(fn, arg) stop_machine(fn, arg, NULL)
-#else
-#define __do_reset(fn, arg) fn(arg)
-#endif
-
static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
{
- struct __i915_reset arg = { i915, stalled_mask };
int err, i;
- err = __do_reset(__i915_reset__BKL, &arg);
+ /* Flush everyone currently using a resource about to be clobbered */
+ synchronize_srcu(&i915->gpu_error.srcu);
+
+ err = intel_gpu_reset(i915, ALL_ENGINES);
for (i = 0; err && i < RESET_MAX_RETRIES; i++) {
- msleep(100);
- err = __do_reset(__i915_reset__BKL, &arg);
+ msleep(10 * (i + 1));
+ err = intel_gpu_reset(i915, ALL_ENGINES);
}
+ if (err)
+ return err;
- return err;
+ return gt_reset(i915, stalled_mask);
}
/**
@@ -1274,9 +1272,12 @@ void i915_handle_error(struct drm_i915_private *i915,
wait_event(i915->gpu_error.reset_queue,
!test_bit(I915_RESET_BACKOFF,
&i915->gpu_error.flags));
- goto out;
+ goto out; /* piggy-back on the other reset */
}
+ /* Make sure i915_reset_trylock() sees the I915_RESET_BACKOFF */
+ synchronize_rcu_expedited();
+
/* Prevent any other reset-engine attempt. */
for_each_engine(engine, i915, tmp) {
while (test_and_set_bit(I915_RESET_ENGINE + engine->id,
@@ -1300,6 +1301,36 @@ void i915_handle_error(struct drm_i915_private *i915,
intel_runtime_pm_put(i915, wakeref);
}
+int i915_reset_trylock(struct drm_i915_private *i915)
+{
+ struct i915_gpu_error *error = &i915->gpu_error;
+ int srcu;
+
+ rcu_read_lock();
+ while (test_bit(I915_RESET_BACKOFF, &error->flags)) {
+ rcu_read_unlock();
+
+ if (wait_event_interruptible(error->reset_queue,
+ !test_bit(I915_RESET_BACKOFF,
+ &error->flags)))
+ return -EINTR;
+
+ rcu_read_lock();
+ }
+ srcu = srcu_read_lock(&error->srcu);
+ rcu_read_unlock();
+
+ return srcu;
+}
+
+void i915_reset_unlock(struct drm_i915_private *i915, int tag)
+__releases(&i915->gpu_error.srcu)
+{
+ struct i915_gpu_error *error = &i915->gpu_error;
+
+ srcu_read_unlock(&error->srcu, tag);
+}
+
bool i915_reset_flush(struct drm_i915_private *i915)
{
int err;
diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
index f2d347f319df..893c5d1c2eb8 100644
--- a/drivers/gpu/drm/i915/i915_reset.h
+++ b/drivers/gpu/drm/i915/i915_reset.h
@@ -9,6 +9,7 @@
#include <linux/compiler.h>
#include <linux/types.h>
+#include <linux/srcu.h>
struct drm_i915_private;
struct intel_engine_cs;
@@ -32,6 +33,9 @@ int i915_reset_engine(struct intel_engine_cs *engine,
void i915_reset_request(struct i915_request *rq, bool guilty);
bool i915_reset_flush(struct drm_i915_private *i915);
+int __must_check i915_reset_trylock(struct drm_i915_private *i915);
+void i915_reset_unlock(struct drm_i915_private *i915, int tag);
+
bool intel_has_gpu_reset(struct drm_i915_private *i915);
bool intel_has_reset_engine(struct drm_i915_private *i915);
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index 7b6f3bea9ef8..4886fac12628 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -1039,8 +1039,6 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
/* Check that we can recover an unbind stuck on a hanging request */
- igt_global_reset_lock(i915);
-
mutex_lock(&i915->drm.struct_mutex);
err = hang_init(&h, i915);
if (err)
@@ -1138,7 +1136,9 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
}
out_reset:
+ igt_global_reset_lock(i915);
fake_hangcheck(rq->i915, intel_engine_flag(rq->engine));
+ igt_global_reset_unlock(i915);
if (tsk) {
struct igt_wedge_me w;
@@ -1159,7 +1159,6 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
hang_fini(&h);
unlock:
mutex_unlock(&i915->drm.struct_mutex);
- igt_global_reset_unlock(i915);
if (i915_terminally_wedged(&i915->gpu_error))
return -EIO;
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 14ae46fda49f..074a0d9cbf26 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -189,6 +189,7 @@ struct drm_i915_private *mock_gem_device(void)
init_waitqueue_head(&i915->gpu_error.wait_queue);
init_waitqueue_head(&i915->gpu_error.reset_queue);
+ init_srcu_struct(&i915->gpu_error.srcu);
mutex_init(&i915->gpu_error.wedge_mutex);
i915->wq = alloc_ordered_workqueue("mock", 0);
--
2.20.1
* [PATCH 07/22] drm/i915: Force the GPU reset upon wedging
From: Chris Wilson @ 2019-02-04 13:21 UTC
To: intel-gfx; +Cc: Mika Kuoppala
When declaring the GPU wedged, we do need to hit the GPU with the reset
hammer so that its state matches our presumed state during cleanup. If
the reset fails, it fails, and we may be unhappy but wedged. However, if
we are testing our wedge/unwedge handling, the desync carries over into
the next test and promptly explodes.
References: https://bugs.freedesktop.org/show_bug.cgi?id=106702
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
drivers/gpu/drm/i915/i915_reset.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index f58fae457ec6..c6f6400f95b4 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -532,9 +532,6 @@ typedef int (*reset_func)(struct drm_i915_private *,
static reset_func intel_get_gpu_reset(struct drm_i915_private *i915)
{
- if (!i915_modparams.reset)
- return NULL;
-
if (INTEL_GEN(i915) >= 8)
return gen8_reset_engines;
else if (INTEL_GEN(i915) >= 6)
@@ -599,6 +596,9 @@ bool intel_has_gpu_reset(struct drm_i915_private *i915)
if (USES_GUC(i915))
return false;
+ if (!i915_modparams.reset)
+ return false;
+
return intel_get_gpu_reset(i915);
}
@@ -823,7 +823,7 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
reset_prepare_engine(engine);
/* Even if the GPU reset fails, it should still stop the engines */
- if (INTEL_GEN(i915) >= 5)
+ if (!INTEL_INFO(i915)->gpu_reset_clobbers_display)
intel_gpu_reset(i915, ALL_ENGINES);
for_each_engine(engine, i915, id) {
--
2.20.1
* [PATCH 08/22] drm/i915: Uninterruptibly drain the timelines on unwedging
From: Chris Wilson @ 2019-02-04 13:22 UTC
To: intel-gfx
On wedging, we mark all executing requests as complete, and mark all
pending requests as completed as soon as they are ready. Before
unwedging, though, we wish to flush those pending requests prior to
restoring default execution, and so we must wait. Do so uninterruptibly,
as we do not propagate the EINTR gracefully back to userspace in this
case but would instead persist in the permanently wedged state without
restarting the syscall.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_reset.c | 28 ++++++++--------------------
1 file changed, 8 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index c6f6400f95b4..7fc86b44d872 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -861,7 +861,6 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
{
struct i915_gpu_error *error = &i915->gpu_error;
struct i915_timeline *tl;
- bool ret = false;
if (!test_bit(I915_WEDGED, &error->flags))
return true;
@@ -886,30 +885,20 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
mutex_lock(&i915->gt.timelines.mutex);
list_for_each_entry(tl, &i915->gt.timelines.active_list, link) {
struct i915_request *rq;
- long timeout;
rq = i915_gem_active_get_unlocked(&tl->last_request);
if (!rq)
continue;
/*
- * We can't use our normal waiter as we want to
- * avoid recursively trying to handle the current
- * reset. The basic dma_fence_default_wait() installs
- * a callback for dma_fence_signal(), which is
- * triggered by our nop handler (indirectly, the
- * callback enables the signaler thread which is
- * woken by the nop_submit_request() advancing the seqno
- * and when the seqno passes the fence, the signaler
- * then signals the fence waking us up).
+ * All internal dependencies (i915_requests) will have
+ * been flushed by the set-wedge, but we may be stuck waiting
+ * for external fences. These should all be capped to 10s
+ * (I915_FENCE_TIMEOUT) so this wait should not be unbounded
+ * in the worst case.
*/
- timeout = dma_fence_default_wait(&rq->fence, true,
- MAX_SCHEDULE_TIMEOUT);
+ dma_fence_default_wait(&rq->fence, false, MAX_SCHEDULE_TIMEOUT);
i915_request_put(rq);
- if (timeout < 0) {
- mutex_unlock(&i915->gt.timelines.mutex);
- goto unlock;
- }
}
mutex_unlock(&i915->gt.timelines.mutex);
@@ -930,11 +919,10 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
smp_mb__before_atomic(); /* complete takeover before enabling execbuf */
clear_bit(I915_WEDGED, &i915->gpu_error.flags);
- ret = true;
-unlock:
+
mutex_unlock(&i915->gpu_error.wedge_mutex);
- return ret;
+ return true;
}
static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
--
2.20.1
* [PATCH 09/22] drm/i915: Wait for old resets before applying debugfs/i915_wedged
From: Chris Wilson @ 2019-02-04 13:22 UTC
To: intel-gfx
Since we use debugfs to recover the device after modifying the
i915.reset parameter, we need to be sure that we apply a fresh reset,
and do not piggy-back onto a concurrent one, in order for the new
parameter value to take effect.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_debugfs.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2cea263b4d79..54e426883529 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3874,13 +3874,9 @@ i915_wedged_set(void *data, u64 val)
{
struct drm_i915_private *i915 = data;
- /*
- * There is no safeguard against this debugfs entry colliding
- * with the hangcheck calling same i915_handle_error() in
- * parallel, causing an explosion. For now we assume that the
- * test harness is responsible enough not to inject gpu hangs
- * while it is writing to 'i915_wedged'
- */
+ /* Flush any previous reset before applying for a new one */
+ wait_event(i915->gpu_error.reset_queue,
+ !test_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags));
i915_handle_error(i915, val, I915_ERROR_CAPTURE,
"Manually set wedged engine mask = %llx", val);
--
2.20.1
* [PATCH 10/22] drm/i915: Serialise resets with wedging
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (8 preceding siblings ...)
2019-02-04 13:22 ` [PATCH 09/22] drm/i915: Wait for old resets before applying debugfs/i915_wedged Chris Wilson
@ 2019-02-04 13:22 ` Chris Wilson
2019-02-04 13:22 ` [PATCH 11/22] drm/i915: Don't claim an unstarted request was guilty Chris Wilson
` (18 subsequent siblings)
28 siblings, 0 replies; 45+ messages in thread
From: Chris Wilson @ 2019-02-04 13:22 UTC (permalink / raw)
To: intel-gfx
Prevent concurrent set-wedge with ongoing resets (and vice versa) by
taking the same wedge_mutex around both operations.
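The shape of the refactor is the usual locked-wrapper idiom: the bodies move
into __-prefixed variants that assume wedge_mutex is held, and the public
entry points become thin locking wrappers, e.g.:

	void i915_gem_set_wedged(struct drm_i915_private *i915)
	{
		mutex_lock(&i915->gpu_error.wedge_mutex);
		__i915_gem_set_wedged(i915);
		mutex_unlock(&i915->gpu_error.wedge_mutex);
	}

i915_reset() itself now runs entirely under wedge_mutex (taken in
i915_reset_device()), so internally it must call the unlocked
__i915_gem_unset_wedged()/__i915_gem_set_wedged() variants to avoid
deadlocking on its own lock.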
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_reset.c | 68 ++++++++++++++++++-------------
1 file changed, 40 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 7fc86b44d872..ca19fcf29c5b 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -793,17 +793,14 @@ static void nop_submit_request(struct i915_request *request)
intel_engine_queue_breadcrumbs(engine);
}
-void i915_gem_set_wedged(struct drm_i915_private *i915)
+static void __i915_gem_set_wedged(struct drm_i915_private *i915)
{
struct i915_gpu_error *error = &i915->gpu_error;
struct intel_engine_cs *engine;
enum intel_engine_id id;
- mutex_lock(&error->wedge_mutex);
- if (test_bit(I915_WEDGED, &error->flags)) {
- mutex_unlock(&error->wedge_mutex);
+ if (test_bit(I915_WEDGED, &error->flags))
return;
- }
if (GEM_SHOW_DEBUG() && !intel_engines_are_idle(i915)) {
struct drm_printer p = drm_debug_printer(__func__);
@@ -852,12 +849,18 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
set_bit(I915_WEDGED, &error->flags);
GEM_TRACE("end\n");
- mutex_unlock(&error->wedge_mutex);
+}
- wake_up_all(&error->reset_queue);
+void i915_gem_set_wedged(struct drm_i915_private *i915)
+{
+ struct i915_gpu_error *error = &i915->gpu_error;
+
+ mutex_lock(&error->wedge_mutex);
+ __i915_gem_set_wedged(i915);
+ mutex_unlock(&error->wedge_mutex);
}
-bool i915_gem_unset_wedged(struct drm_i915_private *i915)
+static bool __i915_gem_unset_wedged(struct drm_i915_private *i915)
{
struct i915_gpu_error *error = &i915->gpu_error;
struct i915_timeline *tl;
@@ -868,8 +871,6 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
if (!i915->gt.scratch) /* Never full initialised, recovery impossible */
return false;
- mutex_lock(&error->wedge_mutex);
-
GEM_TRACE("start\n");
/*
@@ -920,11 +921,21 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
smp_mb__before_atomic(); /* complete takeover before enabling execbuf */
clear_bit(I915_WEDGED, &i915->gpu_error.flags);
- mutex_unlock(&i915->gpu_error.wedge_mutex);
-
return true;
}
+bool i915_gem_unset_wedged(struct drm_i915_private *i915)
+{
+ struct i915_gpu_error *error = &i915->gpu_error;
+ bool result;
+
+ mutex_lock(&error->wedge_mutex);
+ result = __i915_gem_unset_wedged(i915);
+ mutex_unlock(&error->wedge_mutex);
+
+ return result;
+}
+
static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
{
int err, i;
@@ -976,7 +987,7 @@ void i915_reset(struct drm_i915_private *i915,
GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &error->flags));
/* Clear any previous failed attempts at recovery. Time to try again. */
- if (!i915_gem_unset_wedged(i915))
+ if (!__i915_gem_unset_wedged(i915))
return;
if (reason)
@@ -1038,7 +1049,7 @@ void i915_reset(struct drm_i915_private *i915,
*/
add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
error:
- i915_gem_set_wedged(i915);
+ __i915_gem_set_wedged(i915);
goto finish;
}
@@ -1130,7 +1141,9 @@ static void i915_reset_device(struct drm_i915_private *i915,
i915_wedge_on_timeout(&w, i915, 5 * HZ) {
intel_prepare_reset(i915);
+ mutex_lock(&error->wedge_mutex);
i915_reset(i915, engine_mask, reason);
+ mutex_unlock(&error->wedge_mutex);
intel_finish_reset(i915);
}
@@ -1198,6 +1211,7 @@ void i915_handle_error(struct drm_i915_private *i915,
unsigned long flags,
const char *fmt, ...)
{
+ struct i915_gpu_error *error = &i915->gpu_error;
struct intel_engine_cs *engine;
intel_wakeref_t wakeref;
unsigned int tmp;
@@ -1234,20 +1248,19 @@ void i915_handle_error(struct drm_i915_private *i915,
* Try engine reset when available. We fall back to full reset if
* single reset fails.
*/
- if (intel_has_reset_engine(i915) &&
- !i915_terminally_wedged(&i915->gpu_error)) {
+ if (intel_has_reset_engine(i915) && !i915_terminally_wedged(error)) {
for_each_engine_masked(engine, i915, engine_mask, tmp) {
BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
- &i915->gpu_error.flags))
+ &error->flags))
continue;
if (i915_reset_engine(engine, msg) == 0)
engine_mask &= ~intel_engine_flag(engine);
clear_bit(I915_RESET_ENGINE + engine->id,
- &i915->gpu_error.flags);
- wake_up_bit(&i915->gpu_error.flags,
+ &error->flags);
+ wake_up_bit(&error->flags,
I915_RESET_ENGINE + engine->id);
}
}
@@ -1256,10 +1269,9 @@ void i915_handle_error(struct drm_i915_private *i915,
goto out;
/* Full reset needs the mutex, stop any other user trying to do so. */
- if (test_and_set_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags)) {
- wait_event(i915->gpu_error.reset_queue,
- !test_bit(I915_RESET_BACKOFF,
- &i915->gpu_error.flags));
+ if (test_and_set_bit(I915_RESET_BACKOFF, &error->flags)) {
+ wait_event(error->reset_queue,
+ !test_bit(I915_RESET_BACKOFF, &error->flags));
goto out; /* piggy-back on the other reset */
}
@@ -1269,8 +1281,8 @@ void i915_handle_error(struct drm_i915_private *i915,
/* Prevent any other reset-engine attempt. */
for_each_engine(engine, i915, tmp) {
while (test_and_set_bit(I915_RESET_ENGINE + engine->id,
- &i915->gpu_error.flags))
- wait_on_bit(&i915->gpu_error.flags,
+ &error->flags))
+ wait_on_bit(&error->flags,
I915_RESET_ENGINE + engine->id,
TASK_UNINTERRUPTIBLE);
}
@@ -1279,11 +1291,11 @@ void i915_handle_error(struct drm_i915_private *i915,
for_each_engine(engine, i915, tmp) {
clear_bit(I915_RESET_ENGINE + engine->id,
- &i915->gpu_error.flags);
+ &error->flags);
}
- clear_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags);
- wake_up_all(&i915->gpu_error.reset_queue);
+ clear_bit(I915_RESET_BACKOFF, &error->flags);
+ wake_up_all(&error->reset_queue);
out:
intel_runtime_pm_put(i915, wakeref);
--
2.20.1
* [PATCH 11/22] drm/i915: Don't claim an unstarted request was guilty
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (9 preceding siblings ...)
2019-02-04 13:22 ` [PATCH 10/22] drm/i915: Serialise resets with wedging Chris Wilson
@ 2019-02-04 13:22 ` Chris Wilson
2019-02-04 13:22 ` [PATCH 12/22] drm/i915: Generalise GPU activity tracking Chris Wilson
` (17 subsequent siblings)
28 siblings, 0 replies; 45+ messages in thread
From: Chris Wilson @ 2019-02-04 13:22 UTC (permalink / raw)
To: intel-gfx
If we haven't even begun executing the payload of the stalled request,
then we should not claim that its userspace context was guilty of
submitting a hanging batch.
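i915_request_started() keys off the init breadcrumb emitted ahead of the
payload: until the CS has advanced past it, the request is deemed not to
have begun. The check therefore gates the guilt bookkeeping in
execlists_reset(); in rough outline (the lookup of the stalled request rq
is elided):

	if (!rq || !i915_request_started(rq))
		goto out_unlock; /* payload never began, nobody to blame */
	/* otherwise mark the context guilty or innocent based on stalled */

The selftest hunk is the corollary: hang_create_request() assembles its
batch by hand rather than going through execbuf, so it must emit the init
breadcrumb itself or every hang it injects would now be treated as
unstarted.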
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/intel_lrc.c | 2 +-
drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 6 ++++++
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0869a4fd20c7..8e301f19036b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1947,7 +1947,7 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
rq ? rq->global_seqno : 0,
intel_engine_get_seqno(engine),
yesno(stalled));
- if (!rq)
+ if (!rq || !i915_request_started(rq))
goto out_unlock;
/*
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index 4886fac12628..36c17bfe05a7 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -246,6 +246,12 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine)
if (INTEL_GEN(vm->i915) <= 5)
flags |= I915_DISPATCH_SECURE;
+ if (rq->engine->emit_init_breadcrumb) {
+ err = rq->engine->emit_init_breadcrumb(rq);
+ if (err)
+ goto cancel_rq;
+ }
+
err = rq->engine->emit_bb_start(rq, vma->node.start, PAGE_SIZE, flags);
cancel_rq:
--
2.20.1
* [PATCH 12/22] drm/i915: Generalise GPU activity tracking
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (10 preceding siblings ...)
2019-02-04 13:22 ` [PATCH 11/22] drm/i915: Don't claim an unstarted request was guilty Chris Wilson
@ 2019-02-04 13:22 ` Chris Wilson
2019-02-05 8:51 ` Tvrtko Ursulin
2019-02-04 13:22 ` [PATCH 13/22] drm/i915: Release the active tracker tree upon idling Chris Wilson
` (16 subsequent siblings)
28 siblings, 1 reply; 45+ messages in thread
From: Chris Wilson @ 2019-02-04 13:22 UTC (permalink / raw)
To: intel-gfx
We currently track GPU memory usage inside VMA, such that we never
release memory used by the GPU until after it has finished accessing it.
However, we may want to track other resources aside from VMA, or we may
want to split a VMA into multiple independent regions and track each
separately. For this purpose, generalise our request tracking (akin to
struct reservation_object) so that we can embed it into other objects.
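As a taster of the API shape introduced below, a minimal consumer might look
like this (struct my_resource and my_retire are hypothetical; the function
names and signatures are those from i915_active.h in the diff, with error
handling elided):

	struct my_resource {
		struct i915_active active;
		/* ... the resource being guarded ... */
	};

	static void my_retire(struct i915_active *ref)
	{
		/* all tracked requests retired: safe to release the resource */
	}

	i915_active_init(i915, &res->active, my_retire);

	/* on each submission, remember the newest request per timeline */
	err = i915_active_ref(&res->active, rq->fence.context, rq);

	/* CPU side: block until all tracked requests have retired */
	err = i915_active_wait(&res->active);

	/* GPU side: order a new request after all tracked activity */
	err = i915_request_await_active(rq, &res->active);

	/* teardown, once idle */
	i915_active_fini(&res->active);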
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/Makefile | 4 +-
drivers/gpu/drm/i915/i915_active.c | 228 ++++++++++++++++++
drivers/gpu/drm/i915/i915_active.h | 69 ++++++
drivers/gpu/drm/i915/i915_active_types.h | 26 ++
drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +-
drivers/gpu/drm/i915/i915_vma.c | 173 +++----------
drivers/gpu/drm/i915/i915_vma.h | 9 +-
drivers/gpu/drm/i915/selftests/i915_active.c | 158 ++++++++++++
.../drm/i915/selftests/i915_live_selftests.h | 3 +-
9 files changed, 519 insertions(+), 154 deletions(-)
create mode 100644 drivers/gpu/drm/i915/i915_active.c
create mode 100644 drivers/gpu/drm/i915/i915_active.h
create mode 100644 drivers/gpu/drm/i915/i915_active_types.h
create mode 100644 drivers/gpu/drm/i915/selftests/i915_active.c
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 210d0e8777b6..1787e1299b1b 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -57,7 +57,9 @@ i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o intel_pipe_crc.o
i915-$(CONFIG_PERF_EVENTS) += i915_pmu.o
# GEM code
-i915-y += i915_cmd_parser.o \
+i915-y += \
+ i915_active.o \
+ i915_cmd_parser.o \
i915_gem_batch_pool.o \
i915_gem_clflush.o \
i915_gem_context.o \
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
new file mode 100644
index 000000000000..91950d778cab
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -0,0 +1,228 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "i915_active.h"
+
+#define BKL(ref) (&(ref)->i915->drm.struct_mutex)
+
+struct active_node {
+ struct i915_gem_active base;
+ struct i915_active *ref;
+ struct rb_node node;
+ u64 timeline;
+};
+
+static void
+__active_retire(struct i915_active *ref)
+{
+ GEM_BUG_ON(!ref->count);
+ if (!--ref->count)
+ ref->retire(ref);
+}
+
+static void
+node_retire(struct i915_gem_active *base, struct i915_request *rq)
+{
+ __active_retire(container_of(base, struct active_node, base)->ref);
+}
+
+static void
+last_retire(struct i915_gem_active *base, struct i915_request *rq)
+{
+ __active_retire(container_of(base, struct i915_active, last));
+}
+
+static struct i915_gem_active *
+active_instance(struct i915_active *ref, u64 idx)
+{
+ struct active_node *node;
+ struct rb_node **p, *parent;
+ struct i915_request *old;
+
+ /*
+ * We track the most recently used timeline to skip an rbtree search
+ * in the common case; under typical loads we never need the rbtree
+ * at all. We can reuse the last slot if it is empty, that is
+ * after the previous activity has been retired, or if it matches the
+ * current timeline.
+ *
+ * Note that we allow the timeline to be active simultaneously in
+ * the rbtree and the last cache. We do this to avoid having
+ * to search and replace the rbtree element for a new timeline, with
+ * the cost being that we must be aware that the ref may be retired
+ * twice for the same timeline (as the older rbtree element will be
+ * retired before the new request added to last).
+ */
+ old = i915_gem_active_raw(&ref->last, BKL(ref));
+ if (!old || old->fence.context == idx)
+ goto out;
+
+ /* Move the currently active fence into the rbtree */
+ idx = old->fence.context;
+
+ parent = NULL;
+ p = &ref->tree.rb_node;
+ while (*p) {
+ parent = *p;
+
+ node = rb_entry(parent, struct active_node, node);
+ if (node->timeline == idx)
+ goto replace;
+
+ if (node->timeline < idx)
+ p = &parent->rb_right;
+ else
+ p = &parent->rb_left;
+ }
+
+ node = kmalloc(sizeof(*node), GFP_KERNEL);
+
+ /* kmalloc may retire the ref->last (thanks shrinker)! */
+ if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) {
+ kfree(node);
+ goto out;
+ }
+
+ if (unlikely(!node))
+ return ERR_PTR(-ENOMEM);
+
+ init_request_active(&node->base, node_retire);
+ node->ref = ref;
+ node->timeline = idx;
+
+ rb_link_node(&node->node, parent, p);
+ rb_insert_color(&node->node, &ref->tree);
+
+replace:
+ /*
+ * Overwrite the previous active slot in the rbtree with last,
+ * leaving last zeroed. If the previous slot is still active,
+ * we must be careful as we now only expect to receive one retire
+ * callback not two, and so must undo the active counting for the
+ * overwritten slot.
+ */
+ if (i915_gem_active_isset(&node->base)) {
+ /* Retire ourselves from the old rq->active_list */
+ __list_del_entry(&node->base.link);
+ ref->count--;
+ GEM_BUG_ON(!ref->count);
+ }
+ GEM_BUG_ON(list_empty(&ref->last.link));
+ list_replace_init(&ref->last.link, &node->base.link);
+ node->base.request = fetch_and_zero(&ref->last.request);
+
+out:
+ return &ref->last;
+}
+
+void i915_active_init(struct drm_i915_private *i915,
+ struct i915_active *ref,
+ void (*retire)(struct i915_active *ref))
+{
+ ref->i915 = i915;
+ ref->retire = retire;
+ ref->tree = RB_ROOT;
+ init_request_active(&ref->last, last_retire);
+ ref->count = 0;
+}
+
+int i915_active_ref(struct i915_active *ref,
+ u64 timeline,
+ struct i915_request *rq)
+{
+ struct i915_gem_active *active;
+
+ active = active_instance(ref, timeline);
+ if (IS_ERR(active))
+ return PTR_ERR(active);
+
+ if (!i915_gem_active_isset(active))
+ ref->count++;
+ i915_gem_active_set(active, rq);
+
+ GEM_BUG_ON(!ref->count);
+ return 0;
+}
+
+bool i915_active_acquire(struct i915_active *ref)
+{
+ lockdep_assert_held(BKL(ref));
+ return !ref->count++;
+}
+
+void i915_active_release(struct i915_active *ref)
+{
+ lockdep_assert_held(BKL(ref));
+ __active_retire(ref);
+}
+
+int i915_active_wait(struct i915_active *ref)
+{
+ struct active_node *it, *n;
+ int ret = 0;
+
+ if (i915_active_acquire(ref))
+ goto out_release;
+
+ ret = i915_gem_active_retire(&ref->last, BKL(ref));
+ if (ret)
+ goto out_release;
+
+ rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
+ ret = i915_gem_active_retire(&it->base, BKL(ref));
+ if (ret)
+ break;
+ }
+
+out_release:
+ i915_active_release(ref);
+ return ret;
+}
+
+static int __i915_request_await_active(struct i915_request *rq,
+ struct i915_gem_active *active)
+{
+ struct i915_request *barrier =
+ i915_gem_active_raw(active, &rq->i915->drm.struct_mutex);
+
+ return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
+}
+
+int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
+{
+ struct active_node *it, *n;
+ int ret;
+
+ ret = __i915_request_await_active(rq, &ref->last);
+ if (ret)
+ return ret;
+
+ rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
+ ret = __i915_request_await_active(rq, &it->base);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+void i915_active_fini(struct i915_active *ref)
+{
+ struct active_node *it, *n;
+
+ GEM_BUG_ON(i915_gem_active_isset(&ref->last));
+
+ rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
+ GEM_BUG_ON(i915_gem_active_isset(&it->base));
+ kfree(it);
+ }
+ ref->tree = RB_ROOT;
+}
+
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+#include "selftests/i915_active.c"
+#endif
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
new file mode 100644
index 000000000000..0aa2628ea734
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -0,0 +1,69 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef _I915_ACTIVE_H_
+#define _I915_ACTIVE_H_
+
+#include "i915_active_types.h"
+
+/*
+ * GPU activity tracking
+ *
+ * Each set of commands submitted to the GPU comprises a single request that
+ * signals a fence upon completion. struct i915_request combines the
+ * command submission, scheduling and fence signaling roles. If we want to see
+ * if a particular task is complete, we need to grab the fence (struct
+ * i915_request) for that task and check or wait for it to be signaled. More
+ * often though we want to track the status of a bunch of tasks, for example
+ * to wait for the GPU to finish accessing some memory across a variety of
+ * different command pipelines from different clients. We could choose to
+ * track every single request associated with the task, but knowing that
+ * each request belongs to an ordered timeline (later requests within a
+ * timeline must wait for earlier requests), we need only track the
+ * latest request in each timeline to determine the overall status of the
+ * task.
+ *
+ * struct i915_active provides this tracking across timelines. It builds a
+ * composite shared-fence, and is updated as new work is submitted to the task,
+ * forming a snapshot of the current status. It should be embedded into the
+ * different resources that need to track their associated GPU activity to
+ * provide a callback when that GPU activity has ceased, or otherwise to
+ * provide a serialisation point either for request submission or for CPU
+ * synchronisation.
+ */
+
+void i915_active_init(struct drm_i915_private *i915,
+ struct i915_active *ref,
+ void (*retire)(struct i915_active *ref));
+
+int i915_active_ref(struct i915_active *ref,
+ u64 timeline,
+ struct i915_request *rq);
+
+int i915_active_wait(struct i915_active *ref);
+
+int i915_request_await_active(struct i915_request *rq,
+ struct i915_active *ref);
+
+bool i915_active_acquire(struct i915_active *ref);
+
+static inline void i915_active_cancel(struct i915_active *ref)
+{
+ GEM_BUG_ON(ref->count != 1);
+ ref->count = 0;
+}
+
+void i915_active_release(struct i915_active *ref);
+
+static inline bool
+i915_active_is_idle(const struct i915_active *ref)
+{
+ return !ref->count;
+}
+
+void i915_active_fini(struct i915_active *ref);
+
+#endif /* _I915_ACTIVE_H_ */
diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h
new file mode 100644
index 000000000000..411e502ed8dd
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_active_types.h
@@ -0,0 +1,26 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef _I915_ACTIVE_TYPES_H_
+#define _I915_ACTIVE_TYPES_H_
+
+#include <linux/rbtree.h>
+
+#include "i915_request.h"
+
+struct drm_i915_private;
+
+struct i915_active {
+ struct drm_i915_private *i915;
+
+ struct rb_root tree;
+ struct i915_gem_active last;
+ unsigned int count;
+
+ void (*retire)(struct i915_active *ref);
+};
+
+#endif /* _I915_ACTIVE_TYPES_H_ */
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 49b00996a15e..e625659c03a2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1917,14 +1917,13 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
if (!vma)
return ERR_PTR(-ENOMEM);
+ i915_active_init(i915, &vma->active, NULL);
init_request_active(&vma->last_fence, NULL);
vma->vm = &ggtt->vm;
vma->ops = &pd_vma_ops;
vma->private = ppgtt;
- vma->active = RB_ROOT;
-
vma->size = size;
vma->fence_size = size;
vma->flags = I915_VMA_GGTT;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index d83b8ad5f859..d4772061e642 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -63,22 +63,23 @@ static void vma_print_allocator(struct i915_vma *vma, const char *reason)
#endif
-struct i915_vma_active {
- struct i915_gem_active base;
- struct i915_vma *vma;
- struct rb_node node;
- u64 timeline;
-};
+static void obj_bump_mru(struct drm_i915_gem_object *obj)
+{
+ struct drm_i915_private *i915 = to_i915(obj->base.dev);
-static void
-__i915_vma_retire(struct i915_vma *vma, struct i915_request *rq)
+ spin_lock(&i915->mm.obj_lock);
+ if (obj->bind_count)
+ list_move_tail(&obj->mm.link, &i915->mm.bound_list);
+ spin_unlock(&i915->mm.obj_lock);
+
+ obj->mm.dirty = true; /* be paranoid */
+}
+
+static void __i915_vma_retire(struct i915_active *ref)
{
+ struct i915_vma *vma = container_of(ref, typeof(*vma), active);
struct drm_i915_gem_object *obj = vma->obj;
- GEM_BUG_ON(!i915_vma_is_active(vma));
- if (--vma->active_count)
- return;
-
GEM_BUG_ON(!i915_gem_object_is_active(obj));
if (--obj->active_count)
return;
@@ -90,16 +91,12 @@ __i915_vma_retire(struct i915_vma *vma, struct i915_request *rq)
reservation_object_unlock(obj->resv);
}
- /* Bump our place on the bound list to keep it roughly in LRU order
+ /*
+ * Bump our place on the bound list to keep it roughly in LRU order
* so that we don't steal from recently used but inactive objects
* (unless we are forced to ofc!)
*/
- spin_lock(&rq->i915->mm.obj_lock);
- if (obj->bind_count)
- list_move_tail(&obj->mm.link, &rq->i915->mm.bound_list);
- spin_unlock(&rq->i915->mm.obj_lock);
-
- obj->mm.dirty = true; /* be paranoid */
+ obj_bump_mru(obj);
if (i915_gem_object_has_active_reference(obj)) {
i915_gem_object_clear_active_reference(obj);
@@ -107,21 +104,6 @@ __i915_vma_retire(struct i915_vma *vma, struct i915_request *rq)
}
}
-static void
-i915_vma_retire(struct i915_gem_active *base, struct i915_request *rq)
-{
- struct i915_vma_active *active =
- container_of(base, typeof(*active), base);
-
- __i915_vma_retire(active->vma, rq);
-}
-
-static void
-i915_vma_last_retire(struct i915_gem_active *base, struct i915_request *rq)
-{
- __i915_vma_retire(container_of(base, struct i915_vma, last_active), rq);
-}
-
static struct i915_vma *
vma_create(struct drm_i915_gem_object *obj,
struct i915_address_space *vm,
@@ -137,10 +119,9 @@ vma_create(struct drm_i915_gem_object *obj,
if (vma == NULL)
return ERR_PTR(-ENOMEM);
- vma->active = RB_ROOT;
-
- init_request_active(&vma->last_active, i915_vma_last_retire);
+ i915_active_init(vm->i915, &vma->active, __i915_vma_retire);
init_request_active(&vma->last_fence, NULL);
+
vma->vm = vm;
vma->ops = &vm->vma_ops;
vma->obj = obj;
@@ -823,7 +804,6 @@ void i915_vma_reopen(struct i915_vma *vma)
static void __i915_vma_destroy(struct i915_vma *vma)
{
struct drm_i915_private *i915 = vma->vm->i915;
- struct i915_vma_active *iter, *n;
GEM_BUG_ON(vma->node.allocated);
GEM_BUG_ON(vma->fence);
@@ -843,10 +823,7 @@ static void __i915_vma_destroy(struct i915_vma *vma)
spin_unlock(&obj->vma.lock);
}
- rbtree_postorder_for_each_entry_safe(iter, n, &vma->active, node) {
- GEM_BUG_ON(i915_gem_active_isset(&iter->base));
- kfree(iter);
- }
+ i915_active_fini(&vma->active);
kmem_cache_free(i915->vmas, vma);
}
@@ -931,104 +908,15 @@ static void export_fence(struct i915_vma *vma,
reservation_object_unlock(resv);
}
-static struct i915_gem_active *active_instance(struct i915_vma *vma, u64 idx)
-{
- struct i915_vma_active *active;
- struct rb_node **p, *parent;
- struct i915_request *old;
-
- /*
- * We track the most recently used timeline to skip a rbtree search
- * for the common case, under typical loads we never need the rbtree
- * at all. We can reuse the last_active slot if it is empty, that is
- * after the previous activity has been retired, or if the active
- * matches the current timeline.
- *
- * Note that we allow the timeline to be active simultaneously in
- * the rbtree and the last_active cache. We do this to avoid having
- * to search and replace the rbtree element for a new timeline, with
- * the cost being that we must be aware that the vma may be retired
- * twice for the same timeline (as the older rbtree element will be
- * retired before the new request added to last_active).
- */
- old = i915_gem_active_raw(&vma->last_active,
- &vma->vm->i915->drm.struct_mutex);
- if (!old || old->fence.context == idx)
- goto out;
-
- /* Move the currently active fence into the rbtree */
- idx = old->fence.context;
-
- parent = NULL;
- p = &vma->active.rb_node;
- while (*p) {
- parent = *p;
-
- active = rb_entry(parent, struct i915_vma_active, node);
- if (active->timeline == idx)
- goto replace;
-
- if (active->timeline < idx)
- p = &parent->rb_right;
- else
- p = &parent->rb_left;
- }
-
- active = kmalloc(sizeof(*active), GFP_KERNEL);
-
- /* kmalloc may retire the vma->last_active request (thanks shrinker)! */
- if (unlikely(!i915_gem_active_raw(&vma->last_active,
- &vma->vm->i915->drm.struct_mutex))) {
- kfree(active);
- goto out;
- }
-
- if (unlikely(!active))
- return ERR_PTR(-ENOMEM);
-
- init_request_active(&active->base, i915_vma_retire);
- active->vma = vma;
- active->timeline = idx;
-
- rb_link_node(&active->node, parent, p);
- rb_insert_color(&active->node, &vma->active);
-
-replace:
- /*
- * Overwrite the previous active slot in the rbtree with last_active,
- * leaving last_active zeroed. If the previous slot is still active,
- * we must be careful as we now only expect to receive one retire
- * callback not two, and so much undo the active counting for the
- * overwritten slot.
- */
- if (i915_gem_active_isset(&active->base)) {
- /* Retire ourselves from the old rq->active_list */
- __list_del_entry(&active->base.link);
- vma->active_count--;
- GEM_BUG_ON(!vma->active_count);
- }
- GEM_BUG_ON(list_empty(&vma->last_active.link));
- list_replace_init(&vma->last_active.link, &active->base.link);
- active->base.request = fetch_and_zero(&vma->last_active.request);
-
-out:
- return &vma->last_active;
-}
-
int i915_vma_move_to_active(struct i915_vma *vma,
struct i915_request *rq,
unsigned int flags)
{
struct drm_i915_gem_object *obj = vma->obj;
- struct i915_gem_active *active;
lockdep_assert_held(&rq->i915->drm.struct_mutex);
GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
- active = active_instance(vma, rq->fence.context);
- if (IS_ERR(active))
- return PTR_ERR(active);
-
/*
* Add a reference if we're newly entering the active list.
* The order in which we add operations to the retirement queue is
@@ -1037,9 +925,15 @@ int i915_vma_move_to_active(struct i915_vma *vma,
* add the active reference first and queue for it to be dropped
* *last*.
*/
- if (!i915_gem_active_isset(active) && !vma->active_count++)
+ if (!vma->active.count)
obj->active_count++;
- i915_gem_active_set(active, rq);
+
+ if (unlikely(i915_active_ref(&vma->active, rq->fence.context, rq))) {
+ if (!vma->active.count)
+ obj->active_count--;
+ return -ENOMEM;
+ }
+
GEM_BUG_ON(!i915_vma_is_active(vma));
GEM_BUG_ON(!obj->active_count);
@@ -1073,8 +967,6 @@ int i915_vma_unbind(struct i915_vma *vma)
*/
might_sleep();
if (i915_vma_is_active(vma)) {
- struct i915_vma_active *active, *n;
-
/*
* When a closed VMA is retired, it is unbound - eek.
* In order to prevent it from being recursively closed,
@@ -1090,19 +982,10 @@ int i915_vma_unbind(struct i915_vma *vma)
*/
__i915_vma_pin(vma);
- ret = i915_gem_active_retire(&vma->last_active,
- &vma->vm->i915->drm.struct_mutex);
+ ret = i915_active_wait(&vma->active);
if (ret)
goto unpin;
- rbtree_postorder_for_each_entry_safe(active, n,
- &vma->active, node) {
- ret = i915_gem_active_retire(&active->base,
- &vma->vm->i915->drm.struct_mutex);
- if (ret)
- goto unpin;
- }
-
ret = i915_gem_active_retire(&vma->last_fence,
&vma->vm->i915->drm.struct_mutex);
unpin:
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 5793abe509a2..3c03d4569481 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -34,6 +34,7 @@
#include "i915_gem_fence_reg.h"
#include "i915_gem_object.h"
+#include "i915_active.h"
#include "i915_request.h"
enum i915_cache_level;
@@ -108,9 +109,7 @@ struct i915_vma {
#define I915_VMA_USERFAULT BIT(I915_VMA_USERFAULT_BIT)
#define I915_VMA_GGTT_WRITE BIT(15)
- unsigned int active_count;
- struct rb_root active;
- struct i915_gem_active last_active;
+ struct i915_active active;
struct i915_gem_active last_fence;
/**
@@ -154,9 +153,9 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
#define I915_VMA_RELEASE_MAP BIT(0)
-static inline bool i915_vma_is_active(struct i915_vma *vma)
+static inline bool i915_vma_is_active(const struct i915_vma *vma)
{
- return vma->active_count;
+ return !i915_active_is_idle(&vma->active);
}
int __must_check i915_vma_move_to_active(struct i915_vma *vma,
diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
new file mode 100644
index 000000000000..c05ca366729a
--- /dev/null
+++ b/drivers/gpu/drm/i915/selftests/i915_active.c
@@ -0,0 +1,158 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#include "../i915_selftest.h"
+
+#include "igt_flush_test.h"
+#include "lib_sw_fence.h"
+
+struct live_active {
+ struct i915_active base;
+ bool retired;
+};
+
+static void __live_active_retire(struct i915_active *base)
+{
+ struct live_active *active = container_of(base, typeof(*active), base);
+
+ active->retired = true;
+}
+
+static int __live_active_setup(struct drm_i915_private *i915,
+ struct live_active *active)
+{
+ struct intel_engine_cs *engine;
+ struct i915_sw_fence *submit;
+ enum intel_engine_id id;
+ unsigned int count = 0;
+ int err = 0;
+
+ i915_active_init(i915, &active->base, __live_active_retire);
+ active->retired = false;
+
+ if (!i915_active_acquire(&active->base)) {
+ pr_err("First i915_active_acquire should report being idle\n");
+ return -EINVAL;
+ }
+
+ submit = heap_fence_create(GFP_KERNEL);
+
+ for_each_engine(engine, i915, id) {
+ struct i915_request *rq;
+
+ rq = i915_request_alloc(engine, i915->kernel_context);
+ if (IS_ERR(rq)) {
+ err = PTR_ERR(rq);
+ break;
+ }
+
+ err = i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+ submit,
+ GFP_KERNEL);
+ if (err < 0) {
+ pr_err("Failed to allocate submission fence!\n");
+ i915_request_add(rq);
+ break;
+ }
+
+ err = i915_active_ref(&active->base, rq->fence.context, rq);
+ if (err) {
+ pr_err("Failed to track active ref!\n");
+ i915_request_add(rq);
+ break;
+ }
+
+ i915_request_add(rq);
+ count++;
+ }
+
+ i915_active_release(&active->base);
+ if (active->retired) {
+ pr_err("i915_active retired before submission!\n");
+ err = -EINVAL;
+ }
+ if (active->base.count != count) {
+ pr_err("i915_active not tracking all requests, found %d, expected %d\n",
+ active->base.count, count);
+ err = -EINVAL;
+ }
+
+ i915_sw_fence_commit(submit);
+ heap_fence_put(submit);
+
+ return err;
+}
+
+static int live_active_wait(void *arg)
+{
+ struct drm_i915_private *i915 = arg;
+ struct live_active active;
+ intel_wakeref_t wakeref;
+ int err;
+
+ /* Check that we get a callback when requests retire upon waiting */
+
+ mutex_lock(&i915->drm.struct_mutex);
+ wakeref = intel_runtime_pm_get(i915);
+
+ err = __live_active_setup(i915, &active);
+
+ i915_active_wait(&active.base);
+ if (!active.retired) {
+ pr_err("i915_active not retired after waiting!\n");
+ err = -EINVAL;
+ }
+
+ i915_active_fini(&active.base);
+ if (igt_flush_test(i915, I915_WAIT_LOCKED))
+ err = -EIO;
+
+ intel_runtime_pm_put(i915, wakeref);
+ mutex_unlock(&i915->drm.struct_mutex);
+ return err;
+}
+
+static int live_active_retire(void *arg)
+{
+ struct drm_i915_private *i915 = arg;
+ struct live_active active;
+ intel_wakeref_t wakeref;
+ int err;
+
+ /* Check that we get a callback when requests are indirectly retired */
+
+ mutex_lock(&i915->drm.struct_mutex);
+ wakeref = intel_runtime_pm_get(i915);
+
+ err = __live_active_setup(i915, &active);
+
+ /* waits for & retires all requests */
+ if (igt_flush_test(i915, I915_WAIT_LOCKED))
+ err = -EIO;
+
+ if (!active.retired) {
+ pr_err("i915_active not retired after flushing!\n");
+ err = -EINVAL;
+ }
+
+ i915_active_fini(&active.base);
+ intel_runtime_pm_put(i915, wakeref);
+ mutex_unlock(&i915->drm.struct_mutex);
+ return err;
+}
+
+int i915_active_live_selftests(struct drm_i915_private *i915)
+{
+ static const struct i915_subtest tests[] = {
+ SUBTEST(live_active_wait),
+ SUBTEST(live_active_retire),
+ };
+
+ if (i915_terminally_wedged(&i915->gpu_error))
+ return 0;
+
+ return i915_subtests(tests, i915);
+}
diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
index 76b4f87fc853..6d766925ad04 100644
--- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
@@ -12,8 +12,9 @@
selftest(sanitycheck, i915_live_sanitycheck) /* keep first (igt selfcheck) */
selftest(uncore, intel_uncore_live_selftests)
selftest(workarounds, intel_workarounds_live_selftests)
-selftest(requests, i915_request_live_selftests)
selftest(timelines, i915_timeline_live_selftests)
+selftest(requests, i915_request_live_selftests)
+selftest(active, i915_active_live_selftests)
selftest(objects, i915_gem_object_live_selftests)
selftest(dmabuf, i915_gem_dmabuf_live_selftests)
selftest(coherency, i915_gem_coherency_live_selftests)
--
2.20.1
* Re: [PATCH 12/22] drm/i915: Generalise GPU activity tracking
2019-02-04 13:22 ` [PATCH 12/22] drm/i915: Generalise GPU activity tracking Chris Wilson
@ 2019-02-05 8:51 ` Tvrtko Ursulin
2019-02-05 8:59 ` Chris Wilson
0 siblings, 1 reply; 45+ messages in thread
From: Tvrtko Ursulin @ 2019-02-05 8:51 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 04/02/2019 13:22, Chris Wilson wrote:
> We currently track GPU memory usage inside VMA, such that we never
> release memory used by the GPU until after it has finished accessing it.
> However, we may want to track other resources aside from VMA, or we may
> want to split a VMA into multiple independent regions and track each
> separately. For this purpose, generalise our request tracking (akin to
> struct reservation_object) so that we can embed it into other objects.
Please add changelog.
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/Makefile | 4 +-
> drivers/gpu/drm/i915/i915_active.c | 228 ++++++++++++++++++
> drivers/gpu/drm/i915/i915_active.h | 69 ++++++
> drivers/gpu/drm/i915/i915_active_types.h | 26 ++
> drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +-
> drivers/gpu/drm/i915/i915_vma.c | 173 +++----------
> drivers/gpu/drm/i915/i915_vma.h | 9 +-
> drivers/gpu/drm/i915/selftests/i915_active.c | 158 ++++++++++++
> .../drm/i915/selftests/i915_live_selftests.h | 3 +-
> 9 files changed, 519 insertions(+), 154 deletions(-)
> create mode 100644 drivers/gpu/drm/i915/i915_active.c
> create mode 100644 drivers/gpu/drm/i915/i915_active.h
> create mode 100644 drivers/gpu/drm/i915/i915_active_types.h
> create mode 100644 drivers/gpu/drm/i915/selftests/i915_active.c
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 210d0e8777b6..1787e1299b1b 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -57,7 +57,9 @@ i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o intel_pipe_crc.o
> i915-$(CONFIG_PERF_EVENTS) += i915_pmu.o
>
> # GEM code
> -i915-y += i915_cmd_parser.o \
> +i915-y += \
> + i915_active.o \
> + i915_cmd_parser.o \
> i915_gem_batch_pool.o \
> i915_gem_clflush.o \
> i915_gem_context.o \
> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> new file mode 100644
> index 000000000000..91950d778cab
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -0,0 +1,228 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#include "i915_drv.h"
> +#include "i915_active.h"
> +
> +#define BKL(ref) (&(ref)->i915->drm.struct_mutex)
> +
> +struct active_node {
> + struct i915_gem_active base;
> + struct i915_active *ref;
> + struct rb_node node;
> + u64 timeline;
> +};
> +
> +static void
> +__active_retire(struct i915_active *ref)
> +{
> + GEM_BUG_ON(!ref->count);
> + if (!--ref->count)
> + ref->retire(ref);
> +}
> +
> +static void
> +node_retire(struct i915_gem_active *base, struct i915_request *rq)
> +{
> + __active_retire(container_of(base, struct active_node, base)->ref);
> +}
> +
> +static void
> +last_retire(struct i915_gem_active *base, struct i915_request *rq)
> +{
> + __active_retire(container_of(base, struct i915_active, last));
> +}
> +
> +static struct i915_gem_active *
> +active_instance(struct i915_active *ref, u64 idx)
> +{
> + struct active_node *node;
> + struct rb_node **p, *parent;
> + struct i915_request *old;
> +
> + /*
> + * We track the most recently used timeline to skip an rbtree search
> + * in the common case; under typical loads we never need the rbtree
> + * at all. We can reuse the last slot if it is empty, that is
> + * after the previous activity has been retired, or if it matches the
> + * current timeline.
> + *
> + * Note that we allow the timeline to be active simultaneously in
> + * the rbtree and the last cache. We do this to avoid having
> + * to search and replace the rbtree element for a new timeline, with
> + * the cost being that we must be aware that the ref may be retired
> + * twice for the same timeline (as the older rbtree element will be
> + * retired before the new request added to last).
> + */
> + old = i915_gem_active_raw(&ref->last, BKL(ref));
> + if (!old || old->fence.context == idx)
> + goto out;
> +
> + /* Move the currently active fence into the rbtree */
> + idx = old->fence.context;
> +
> + parent = NULL;
> + p = &ref->tree.rb_node;
> + while (*p) {
> + parent = *p;
> +
> + node = rb_entry(parent, struct active_node, node);
> + if (node->timeline == idx)
> + goto replace;
> +
> + if (node->timeline < idx)
> + p = &parent->rb_right;
> + else
> + p = &parent->rb_left;
> + }
> +
> + node = kmalloc(sizeof(*node), GFP_KERNEL);
> +
> + /* kmalloc may retire the ref->last (thanks shrinker)! */
> + if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) {
> + kfree(node);
> + goto out;
> + }
> +
> + if (unlikely(!node))
> + return ERR_PTR(-ENOMEM);
> +
> + init_request_active(&node->base, node_retire);
> + node->ref = ref;
> + node->timeline = idx;
> +
> + rb_link_node(&node->node, parent, p);
> + rb_insert_color(&node->node, &ref->tree);
> +
> +replace:
> + /*
> + * Overwrite the previous active slot in the rbtree with last,
> + * leaving last zeroed. If the previous slot is still active,
> + * we must be careful as we now only expect to receive one retire
> + * callback not two, and so must undo the active counting for the
> + * overwritten slot.
> + */
> + if (i915_gem_active_isset(&node->base)) {
> + /* Retire ourselves from the old rq->active_list */
> + __list_del_entry(&node->base.link);
> + ref->count--;
> + GEM_BUG_ON(!ref->count);
> + }
> + GEM_BUG_ON(list_empty(&ref->last.link));
> + list_replace_init(&ref->last.link, &node->base.link);
> + node->base.request = fetch_and_zero(&ref->last.request);
> +
> +out:
> + return &ref->last;
> +}
> +
> +void i915_active_init(struct drm_i915_private *i915,
> + struct i915_active *ref,
> + void (*retire)(struct i915_active *ref))
> +{
> + ref->i915 = i915;
> + ref->retire = retire;
> + ref->tree = RB_ROOT;
> + init_request_active(&ref->last, last_retire);
> + ref->count = 0;
> +}
> +
> +int i915_active_ref(struct i915_active *ref,
> + u64 timeline,
> + struct i915_request *rq)
> +{
> + struct i915_gem_active *active;
> +
> + active = active_instance(ref, timeline);
> + if (IS_ERR(active))
> + return PTR_ERR(active);
> +
> + if (!i915_gem_active_isset(active))
> + ref->count++;
> + i915_gem_active_set(active, rq);
> +
> + GEM_BUG_ON(!ref->count);
> + return 0;
> +}
> +
> +bool i915_active_acquire(struct i915_active *ref)
> +{
> + lockdep_assert_held(BKL(ref));
> + return !ref->count++;
> +}
> +
> +void i915_active_release(struct i915_active *ref)
> +{
> + lockdep_assert_held(BKL(ref));
> + __active_retire(ref);
> +}
> +
> +int i915_active_wait(struct i915_active *ref)
> +{
> + struct active_node *it, *n;
> + int ret = 0;
> +
> + if (i915_active_acquire(ref))
> + goto out_release;
> +
> + ret = i915_gem_active_retire(&ref->last, BKL(ref));
> + if (ret)
> + goto out_release;
> +
> + rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
> + ret = i915_gem_active_retire(&it->base, BKL(ref));
> + if (ret)
> + break;
> + }
> +
> +out_release:
> + i915_active_release(ref);
> + return ret;
> +}
> +
> +static int __i915_request_await_active(struct i915_request *rq,
> + struct i915_gem_active *active)
> +{
> + struct i915_request *barrier =
> + i915_gem_active_raw(active, &rq->i915->drm.struct_mutex);
> +
> + return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
> +}
> +
> +int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
> +{
> + struct active_node *it, *n;
> + int ret;
> +
> + ret = __i915_request_await_active(rq, &ref->last);
> + if (ret)
> + return ret;
> +
> + rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
> + ret = __i915_request_await_active(rq, &it->base);
> + if (ret)
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +void i915_active_fini(struct i915_active *ref)
> +{
> + struct active_node *it, *n;
> +
> + GEM_BUG_ON(i915_gem_active_isset(&ref->last));
> +
> + rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
> + GEM_BUG_ON(i915_gem_active_isset(&it->base));
> + kfree(it);
> + }
> + ref->tree = RB_ROOT;
> +}
> +
> +#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> +#include "selftests/i915_active.c"
> +#endif
> diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
> new file mode 100644
> index 000000000000..0aa2628ea734
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_active.h
> @@ -0,0 +1,69 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#ifndef _I915_ACTIVE_H_
> +#define _I915_ACTIVE_H_
> +
> +#include "i915_active_types.h"
> +
> +/*
> + * GPU activity tracking
> + *
> + * Each set of commands submitted to the GPU comprises a single request that
> + * signals a fence upon completion. struct i915_request combines the
> + * command submission, scheduling and fence signaling roles. If we want to see
> + * if a particular task is complete, we need to grab the fence (struct
> + * i915_request) for that task and check or wait for it to be signaled. More
> + * often though we want to track the status of a bunch of tasks, for example
> + * to wait for the GPU to finish accessing some memory across a variety of
> + * different command pipelines from different clients. We could choose to
> + * track every single request associated with the task, but knowing that
> + * each request belongs to an ordered timeline (later requests within a
> + * timeline must wait for earlier requests), we need only track the
> + * latest request in each timeline to determine the overall status of the
> + * task.
> + *
> + * struct i915_active provides this tracking across timelines. It builds a
> + * composite shared-fence, and is updated as new work is submitted to the task,
> + * forming a snapshot of the current status. It should be embedded into the
> + * different resources that need to track their associated GPU activity to
> + * provide a callback when that GPU activity has ceased, or otherwise to
> + * provide a serialisation point either for request submission or for CPU
> + * synchronisation.
> + */
> +
> +void i915_active_init(struct drm_i915_private *i915,
> + struct i915_active *ref,
> + void (*retire)(struct i915_active *ref));
> +
> +int i915_active_ref(struct i915_active *ref,
> + u64 timeline,
> + struct i915_request *rq);
> +
> +int i915_active_wait(struct i915_active *ref);
> +
> +int i915_request_await_active(struct i915_request *rq,
> + struct i915_active *ref);
> +
> +bool i915_active_acquire(struct i915_active *ref);
> +
> +static inline void i915_active_cancel(struct i915_active *ref)
> +{
> + GEM_BUG_ON(ref->count != 1);
> + ref->count = 0;
> +}
> +
> +void i915_active_release(struct i915_active *ref);
> +
> +static inline bool
> +i915_active_is_idle(const struct i915_active *ref)
> +{
> + return !ref->count;
> +}
> +
> +void i915_active_fini(struct i915_active *ref);
> +
> +#endif /* _I915_ACTIVE_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h
> new file mode 100644
> index 000000000000..411e502ed8dd
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_active_types.h
> @@ -0,0 +1,26 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#ifndef _I915_ACTIVE_TYPES_H_
> +#define _I915_ACTIVE_TYPES_H_
> +
> +#include <linux/rbtree.h>
> +
> +#include "i915_request.h"
> +
> +struct drm_i915_private;
> +
> +struct i915_active {
> + struct drm_i915_private *i915;
> +
> + struct rb_root tree;
> + struct i915_gem_active last;
> + unsigned int count;
> +
> + void (*retire)(struct i915_active *ref);
> +};
> +
> +#endif /* _I915_ACTIVE_TYPES_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 49b00996a15e..e625659c03a2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1917,14 +1917,13 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
> if (!vma)
> return ERR_PTR(-ENOMEM);
>
> + i915_active_init(i915, &vma->active, NULL);
> init_request_active(&vma->last_fence, NULL);
>
> vma->vm = &ggtt->vm;
> vma->ops = &pd_vma_ops;
> vma->private = ppgtt;
>
> - vma->active = RB_ROOT;
> -
> vma->size = size;
> vma->fence_size = size;
> vma->flags = I915_VMA_GGTT;
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index d83b8ad5f859..d4772061e642 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -63,22 +63,23 @@ static void vma_print_allocator(struct i915_vma *vma, const char *reason)
>
> #endif
>
> -struct i915_vma_active {
> - struct i915_gem_active base;
> - struct i915_vma *vma;
> - struct rb_node node;
> - u64 timeline;
> -};
> +static void obj_bump_mru(struct drm_i915_gem_object *obj)
> +{
> + struct drm_i915_private *i915 = to_i915(obj->base.dev);
>
> -static void
> -__i915_vma_retire(struct i915_vma *vma, struct i915_request *rq)
> + spin_lock(&i915->mm.obj_lock);
> + if (obj->bind_count)
> + list_move_tail(&obj->mm.link, &i915->mm.bound_list);
> + spin_unlock(&i915->mm.obj_lock);
> +
> + obj->mm.dirty = true; /* be paranoid */
> +}
> +
> +static void __i915_vma_retire(struct i915_active *ref)
> {
> + struct i915_vma *vma = container_of(ref, typeof(*vma), active);
> struct drm_i915_gem_object *obj = vma->obj;
>
> - GEM_BUG_ON(!i915_vma_is_active(vma));
> - if (--vma->active_count)
> - return;
> -
> GEM_BUG_ON(!i915_gem_object_is_active(obj));
> if (--obj->active_count)
> return;
> @@ -90,16 +91,12 @@ __i915_vma_retire(struct i915_vma *vma, struct i915_request *rq)
> reservation_object_unlock(obj->resv);
> }
>
> - /* Bump our place on the bound list to keep it roughly in LRU order
> + /*
> + * Bump our place on the bound list to keep it roughly in LRU order
> * so that we don't steal from recently used but inactive objects
> * (unless we are forced to ofc!)
> */
> - spin_lock(&rq->i915->mm.obj_lock);
> - if (obj->bind_count)
> - list_move_tail(&obj->mm.link, &rq->i915->mm.bound_list);
> - spin_unlock(&rq->i915->mm.obj_lock);
> -
> - obj->mm.dirty = true; /* be paranoid */
> + obj_bump_mru(obj);
>
> if (i915_gem_object_has_active_reference(obj)) {
> i915_gem_object_clear_active_reference(obj);
> @@ -107,21 +104,6 @@ __i915_vma_retire(struct i915_vma *vma, struct i915_request *rq)
> }
> }
>
> -static void
> -i915_vma_retire(struct i915_gem_active *base, struct i915_request *rq)
> -{
> - struct i915_vma_active *active =
> - container_of(base, typeof(*active), base);
> -
> - __i915_vma_retire(active->vma, rq);
> -}
> -
> -static void
> -i915_vma_last_retire(struct i915_gem_active *base, struct i915_request *rq)
> -{
> - __i915_vma_retire(container_of(base, struct i915_vma, last_active), rq);
> -}
> -
> static struct i915_vma *
> vma_create(struct drm_i915_gem_object *obj,
> struct i915_address_space *vm,
> @@ -137,10 +119,9 @@ vma_create(struct drm_i915_gem_object *obj,
> if (vma == NULL)
> return ERR_PTR(-ENOMEM);
>
> - vma->active = RB_ROOT;
> -
> - init_request_active(&vma->last_active, i915_vma_last_retire);
> + i915_active_init(vm->i915, &vma->active, __i915_vma_retire);
> init_request_active(&vma->last_fence, NULL);
> +
> vma->vm = vm;
> vma->ops = &vm->vma_ops;
> vma->obj = obj;
> @@ -823,7 +804,6 @@ void i915_vma_reopen(struct i915_vma *vma)
> static void __i915_vma_destroy(struct i915_vma *vma)
> {
> struct drm_i915_private *i915 = vma->vm->i915;
> - struct i915_vma_active *iter, *n;
>
> GEM_BUG_ON(vma->node.allocated);
> GEM_BUG_ON(vma->fence);
> @@ -843,10 +823,7 @@ static void __i915_vma_destroy(struct i915_vma *vma)
> spin_unlock(&obj->vma.lock);
> }
>
> - rbtree_postorder_for_each_entry_safe(iter, n, &vma->active, node) {
> - GEM_BUG_ON(i915_gem_active_isset(&iter->base));
> - kfree(iter);
> - }
> + i915_active_fini(&vma->active);
>
> kmem_cache_free(i915->vmas, vma);
> }
> @@ -931,104 +908,15 @@ static void export_fence(struct i915_vma *vma,
> reservation_object_unlock(resv);
> }
>
> -static struct i915_gem_active *active_instance(struct i915_vma *vma, u64 idx)
> -{
> - struct i915_vma_active *active;
> - struct rb_node **p, *parent;
> - struct i915_request *old;
> -
> - /*
> - * We track the most recently used timeline to skip a rbtree search
> - * for the common case, under typical loads we never need the rbtree
> - * at all. We can reuse the last_active slot if it is empty, that is
> - * after the previous activity has been retired, or if the active
> - * matches the current timeline.
> - *
> - * Note that we allow the timeline to be active simultaneously in
> - * the rbtree and the last_active cache. We do this to avoid having
> - * to search and replace the rbtree element for a new timeline, with
> - * the cost being that we must be aware that the vma may be retired
> - * twice for the same timeline (as the older rbtree element will be
> - * retired before the new request added to last_active).
> - */
> - old = i915_gem_active_raw(&vma->last_active,
> - &vma->vm->i915->drm.struct_mutex);
> - if (!old || old->fence.context == idx)
> - goto out;
> -
> - /* Move the currently active fence into the rbtree */
> - idx = old->fence.context;
> -
> - parent = NULL;
> - p = &vma->active.rb_node;
> - while (*p) {
> - parent = *p;
> -
> - active = rb_entry(parent, struct i915_vma_active, node);
> - if (active->timeline == idx)
> - goto replace;
> -
> - if (active->timeline < idx)
> - p = &parent->rb_right;
> - else
> - p = &parent->rb_left;
> - }
> -
> - active = kmalloc(sizeof(*active), GFP_KERNEL);
> -
> - /* kmalloc may retire the vma->last_active request (thanks shrinker)! */
> - if (unlikely(!i915_gem_active_raw(&vma->last_active,
> - &vma->vm->i915->drm.struct_mutex))) {
> - kfree(active);
> - goto out;
> - }
> -
> - if (unlikely(!active))
> - return ERR_PTR(-ENOMEM);
> -
> - init_request_active(&active->base, i915_vma_retire);
> - active->vma = vma;
> - active->timeline = idx;
> -
> - rb_link_node(&active->node, parent, p);
> - rb_insert_color(&active->node, &vma->active);
> -
> -replace:
> - /*
> - * Overwrite the previous active slot in the rbtree with last_active,
> - * leaving last_active zeroed. If the previous slot is still active,
> - * we must be careful as we now only expect to receive one retire
> - * callback not two, and so much undo the active counting for the
> - * overwritten slot.
> - */
> - if (i915_gem_active_isset(&active->base)) {
> - /* Retire ourselves from the old rq->active_list */
> - __list_del_entry(&active->base.link);
> - vma->active_count--;
> - GEM_BUG_ON(!vma->active_count);
> - }
> - GEM_BUG_ON(list_empty(&vma->last_active.link));
> - list_replace_init(&vma->last_active.link, &active->base.link);
> - active->base.request = fetch_and_zero(&vma->last_active.request);
> -
> -out:
> - return &vma->last_active;
> -}
> -
> int i915_vma_move_to_active(struct i915_vma *vma,
> struct i915_request *rq,
> unsigned int flags)
> {
> struct drm_i915_gem_object *obj = vma->obj;
> - struct i915_gem_active *active;
>
> lockdep_assert_held(&rq->i915->drm.struct_mutex);
> GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
>
> - active = active_instance(vma, rq->fence.context);
> - if (IS_ERR(active))
> - return PTR_ERR(active);
> -
> /*
> * Add a reference if we're newly entering the active list.
> * The order in which we add operations to the retirement queue is
> @@ -1037,9 +925,15 @@ int i915_vma_move_to_active(struct i915_vma *vma,
> * add the active reference first and queue for it to be dropped
> * *last*.
> */
> - if (!i915_gem_active_isset(active) && !vma->active_count++)
> + if (!vma->active.count)
> obj->active_count++;
> - i915_gem_active_set(active, rq);
> +
> + if (unlikely(i915_active_ref(&vma->active, rq->fence.context, rq))) {
> + if (!vma->active.count)
> + obj->active_count--;
> + return -ENOMEM;
> + }
> +
> GEM_BUG_ON(!i915_vma_is_active(vma));
> GEM_BUG_ON(!obj->active_count);
>
> @@ -1073,8 +967,6 @@ int i915_vma_unbind(struct i915_vma *vma)
> */
> might_sleep();
> if (i915_vma_is_active(vma)) {
> - struct i915_vma_active *active, *n;
> -
> /*
> * When a closed VMA is retired, it is unbound - eek.
> * In order to prevent it from being recursively closed,
> @@ -1090,19 +982,10 @@ int i915_vma_unbind(struct i915_vma *vma)
> */
> __i915_vma_pin(vma);
>
> - ret = i915_gem_active_retire(&vma->last_active,
> - &vma->vm->i915->drm.struct_mutex);
> + ret = i915_active_wait(&vma->active);
> if (ret)
> goto unpin;
>
> - rbtree_postorder_for_each_entry_safe(active, n,
> - &vma->active, node) {
> - ret = i915_gem_active_retire(&active->base,
> - &vma->vm->i915->drm.struct_mutex);
> - if (ret)
> - goto unpin;
> - }
> -
> ret = i915_gem_active_retire(&vma->last_fence,
> &vma->vm->i915->drm.struct_mutex);
> unpin:
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 5793abe509a2..3c03d4569481 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -34,6 +34,7 @@
> #include "i915_gem_fence_reg.h"
> #include "i915_gem_object.h"
>
> +#include "i915_active.h"
> #include "i915_request.h"
>
> enum i915_cache_level;
> @@ -108,9 +109,7 @@ struct i915_vma {
> #define I915_VMA_USERFAULT BIT(I915_VMA_USERFAULT_BIT)
> #define I915_VMA_GGTT_WRITE BIT(15)
>
> - unsigned int active_count;
> - struct rb_root active;
> - struct i915_gem_active last_active;
> + struct i915_active active;
> struct i915_gem_active last_fence;
>
> /**
> @@ -154,9 +153,9 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
> void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
> #define I915_VMA_RELEASE_MAP BIT(0)
>
> -static inline bool i915_vma_is_active(struct i915_vma *vma)
> +static inline bool i915_vma_is_active(const struct i915_vma *vma)
> {
> - return vma->active_count;
> + return !i915_active_is_idle(&vma->active);
> }
>
> int __must_check i915_vma_move_to_active(struct i915_vma *vma,
> diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
> new file mode 100644
> index 000000000000..c05ca366729a
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/selftests/i915_active.c
> @@ -0,0 +1,158 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2018 Intel Corporation
> + */
> +
> +#include "../i915_selftest.h"
> +
> +#include "igt_flush_test.h"
> +#include "lib_sw_fence.h"
> +
> +struct live_active {
> + struct i915_active base;
> + bool retired;
> +};
> +
> +static void __live_active_retire(struct i915_active *base)
> +{
> + struct live_active *active = container_of(base, typeof(*active), base);
> +
> + active->retired = true;
> +}
> +
> +static int __live_active_setup(struct drm_i915_private *i915,
> + struct live_active *active)
> +{
> + struct intel_engine_cs *engine;
> + struct i915_sw_fence *submit;
> + enum intel_engine_id id;
> + unsigned int count = 0;
> + int err = 0;
> +
> + i915_active_init(i915, &active->base, __live_active_retire);
> + active->retired = false;
> +
> + if (!i915_active_acquire(&active->base)) {
> + pr_err("First i915_active_acquire should report being idle\n");
> + return -EINVAL;
> + }
> +
> + submit = heap_fence_create(GFP_KERNEL);
> +
> + for_each_engine(engine, i915, id) {
> + struct i915_request *rq;
> +
> + rq = i915_request_alloc(engine, i915->kernel_context);
> + if (IS_ERR(rq)) {
> + err = PTR_ERR(rq);
Add a message here so the error cause is clear in the logs.
> + break;
> + }
> +
> + err = i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> + submit,
> + GFP_KERNEL);
> + if (err < 0) {
> + pr_err("Failed to allocate submission fence!\n");
> + i915_request_add(rq);
> + break;
> + }
> +
> + err = i915_active_ref(&active->base, rq->fence.context, rq);
> + if (err) {
> + pr_err("Failed to track active ref!\n");
> + i915_request_add(rq);
> + break;
> + }
> +
> + i915_request_add(rq);
The i915_request_add() calls could be consolidated into a single call
after the err = i915_sw_fence_await_sw_fence_gfp() block, if you want.
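Something like this, just as a sketch (the pr_err messages omitted for
brevity):

	err = i915_sw_fence_await_sw_fence_gfp(&rq->submit,
					       submit,
					       GFP_KERNEL);
	if (err >= 0)
		err = i915_active_ref(&active->base,
				      rq->fence.context, rq);
	i915_request_add(rq);
	if (err)
		break;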
> + count++;
> + }
> +
> + i915_active_release(&active->base);
> + if (active->retired) {
> + pr_err("i915_active retired before submission!\n");
> + err = -EINVAL;
> + }
> + if (active->base.count != count) {
> + pr_err("i915_active not tracking all requests, found %d, expected %d\n",
> + active->base.count, count);
> + err = -EINVAL;
> + }
> +
> + i915_sw_fence_commit(submit);
> + heap_fence_put(submit);
> +
> + return err;
> +}
> +
> +static int live_active_wait(void *arg)
> +{
> + struct drm_i915_private *i915 = arg;
> + struct live_active active;
> + intel_wakeref_t wakeref;
> + int err;
> +
> + /* Check that we get a callback when requests retire upon waiting */
> +
> + mutex_lock(&i915->drm.struct_mutex);
> + wakeref = intel_runtime_pm_get(i915);
> +
> + err = __live_active_setup(i915, &active);
> +
> + i915_active_wait(&active.base);
> + if (!active.retired) {
> + pr_err("i915_active not retired after waiting!\n");
> + err = -EINVAL;
> + }
> +
> + i915_active_fini(&active.base);
> + if (igt_flush_test(i915, I915_WAIT_LOCKED))
> + err = -EIO;
> +
> + intel_runtime_pm_put(i915, wakeref);
> + mutex_unlock(&i915->drm.struct_mutex);
> + return err;
> +}
> +
> +static int live_active_retire(void *arg)
> +{
> + struct drm_i915_private *i915 = arg;
> + struct live_active active;
> + intel_wakeref_t wakeref;
> + int err;
> +
> + /* Check that we get a callback when requests are indirectly retired */
> +
> + mutex_lock(&i915->drm.struct_mutex);
> + wakeref = intel_runtime_pm_get(i915);
> +
> + err = __live_active_setup(i915, &active);
> +
> + /* waits for & retires all requests */
> + if (igt_flush_test(i915, I915_WAIT_LOCKED))
> + err = -EIO;
> +
> + if (!active.retired) {
> + pr_err("i915_active not retired after flushing!\n");
> + err = -EINVAL;
> + }
> +
> + i915_active_fini(&active.base);
> + intel_runtime_pm_put(i915, wakeref);
> + mutex_unlock(&i915->drm.struct_mutex);
> + return err;
> +}
> +
> +int i915_active_live_selftests(struct drm_i915_private *i915)
> +{
> + static const struct i915_subtest tests[] = {
> + SUBTEST(live_active_wait),
> + SUBTEST(live_active_retire),
> + };
> +
> + if (i915_terminally_wedged(&i915->gpu_error))
> + return 0;
> +
> + return i915_subtests(tests, i915);
> +}
> diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
> index 76b4f87fc853..6d766925ad04 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
> +++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
> @@ -12,8 +12,9 @@
> selftest(sanitycheck, i915_live_sanitycheck) /* keep first (igt selfcheck) */
> selftest(uncore, intel_uncore_live_selftests)
> selftest(workarounds, intel_workarounds_live_selftests)
> -selftest(requests, i915_request_live_selftests)
> selftest(timelines, i915_timeline_live_selftests)
> +selftest(requests, i915_request_live_selftests)
> +selftest(active, i915_active_live_selftests)
> selftest(objects, i915_gem_object_live_selftests)
> selftest(dmabuf, i915_gem_dmabuf_live_selftests)
> selftest(coherency, i915_gem_coherency_live_selftests)
>
With the change log and error message:
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Regards,
Tvrtko
* Re: [PATCH 12/22] drm/i915: Generalise GPU activity tracking
2019-02-05 8:51 ` Tvrtko Ursulin
@ 2019-02-05 8:59 ` Chris Wilson
0 siblings, 0 replies; 45+ messages in thread
From: Chris Wilson @ 2019-02-05 8:59 UTC (permalink / raw)
To: Tvrtko Ursulin, intel-gfx
Quoting Tvrtko Ursulin (2019-02-05 08:51:19)
>
> On 04/02/2019 13:22, Chris Wilson wrote:
> > We currently track GPU memory usage inside VMA, such that we never
> > release memory used by the GPU until after it has finished accessing it.
> > However, we may want to track other resources aside from VMA, or we may
> > want to split a VMA into multiple independent regions and track each
> > separately. For this purpose, generalise our request tracking (akin to
> > struct reservation_object) so that we can embed it into other objects.
>
> Please add changelog.
Added a GEM_BUG_ON for an erroneous overflow. Nah.
> > +static int __live_active_setup(struct drm_i915_private *i915,
> > + struct live_active *active)
> > +{
> > + struct intel_engine_cs *engine;
> > + struct i915_sw_fence *submit;
> > + enum intel_engine_id id;
> > + unsigned int count = 0;
> > + int err = 0;
> > +
> > + i915_active_init(i915, &active->base, __live_active_retire);
> > + active->retired = false;
> > +
> > + if (!i915_active_acquire(&active->base)) {
> > + pr_err("First i915_active_acquire should report being idle\n");
> > + return -EINVAL;
> > + }
> > +
> > + submit = heap_fence_create(GFP_KERNEL);
> > +
> > + for_each_engine(engine, i915, id) {
> > + struct i915_request *rq;
> > +
> > + rq = i915_request_alloc(engine, i915->kernel_context);
> > + if (IS_ERR(rq)) {
> > + err = PTR_ERR(rq);
>
> Add a message here so the error cause is clear in the logs.
I haven't; I don't see the point. It doesn't generate the same error,
and it doesn't help in investigating test failures (as it is not part
of the test).
> The i915_request_add() calls could be consolidated into a single call
> after the err = i915_sw_fence_await_sw_fence_gfp() block, if you want.
I did that yesterday, and even added a v2 for you.
> With the change log and error message:
What changelog? It's meant to be extracting the code from i915_vma.c and
adding a testcase. What existing logic did we need to change?
-Chris
* [PATCH 13/22] drm/i915: Release the active tracker tree upon idling
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (11 preceding siblings ...)
2019-02-04 13:22 ` [PATCH 12/22] drm/i915: Generalise GPU activity tracking Chris Wilson
@ 2019-02-04 13:22 ` Chris Wilson
2019-02-04 13:22 ` [PATCH 14/22] drm/i915: Allocate active tracking nodes from a slabcache Chris Wilson
` (15 subsequent siblings)
28 siblings, 0 replies; 45+ messages in thread
From: Chris Wilson @ 2019-02-04 13:22 UTC (permalink / raw)
To: intel-gfx
As soon as we detect that the active tracker is idle and we prepare to
call the retire callback, release the storage for our tree of
per-timeline nodes. We expect these to be infrequently used and quick
to allocate, so there is little benefit in keeping the tree cached and
we would prefer to return the pages to the system in a timely
fashion.
This also means that when we finalize the struct as a whole, we know
that, as the activity tracker must be idle, the tree has already been
released. Indeed, we can reduce i915_active_fini() to just the
assertions that there is nothing to do.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_active.c | 33 +++++++++++++++++++++---------
drivers/gpu/drm/i915/i915_active.h | 4 ++++
2 files changed, 27 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 91950d778cab..b1fefe98f9a6 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -16,12 +16,29 @@ struct active_node {
u64 timeline;
};
+static void
+__active_park(struct i915_active *ref)
+{
+ struct active_node *it, *n;
+
+ rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
+ GEM_BUG_ON(i915_gem_active_isset(&it->base));
+ kfree(it);
+ }
+ ref->tree = RB_ROOT;
+}
+
static void
__active_retire(struct i915_active *ref)
{
GEM_BUG_ON(!ref->count);
- if (!--ref->count)
- ref->retire(ref);
+ if (--ref->count)
+ return;
+
+ /* return the unused nodes to our slabcache */
+ __active_park(ref);
+
+ ref->retire(ref);
}
static void
@@ -210,18 +227,14 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
return 0;
}
+#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
void i915_active_fini(struct i915_active *ref)
{
- struct active_node *it, *n;
-
GEM_BUG_ON(i915_gem_active_isset(&ref->last));
-
- rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
- GEM_BUG_ON(i915_gem_active_isset(&it->base));
- kfree(it);
- }
- ref->tree = RB_ROOT;
+ GEM_BUG_ON(!RB_EMPTY_ROOT(&ref->tree));
+ GEM_BUG_ON(ref->count);
}
+#endif
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
#include "selftests/i915_active.c"
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index 0aa2628ea734..ec4b66efd9a7 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -64,6 +64,10 @@ i915_active_is_idle(const struct i915_active *ref)
return !ref->count;
}
+#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
void i915_active_fini(struct i915_active *ref);
+#else
+static inline void i915_active_fini(struct i915_active *ref) { }
+#endif
#endif /* _I915_ACTIVE_H_ */
--
2.20.1
* [PATCH 14/22] drm/i915: Allocate active tracking nodes from a slabcache
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (12 preceding siblings ...)
2019-02-04 13:22 ` [PATCH 13/22] drm/i915: Release the active tracker tree upon idling Chris Wilson
@ 2019-02-04 13:22 ` Chris Wilson
2019-02-04 18:22 ` Tvrtko Ursulin
2019-02-04 13:22 ` [PATCH 15/22] drm/i915: Make request allocation caches global Chris Wilson
` (14 subsequent siblings)
28 siblings, 1 reply; 45+ messages in thread
From: Chris Wilson @ 2019-02-04 13:22 UTC (permalink / raw)
To: intel-gfx
Wrap the active tracking for GPU references in a slabcache for faster
allocations, and hopefully better fragmentation reduction.
v3: Nothing device specific left, it's just a slabcache that we can
make global.
v4: Include i915_active.h and don't put the initfunc under DEBUG_GEM
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_active.c | 31 +++++++++++++++++++++++++++---
drivers/gpu/drm/i915/i915_active.h | 3 +++
drivers/gpu/drm/i915/i915_pci.c | 4 ++++
3 files changed, 35 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index b1fefe98f9a6..64661c41532b 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -9,6 +9,17 @@
#define BKL(ref) (&(ref)->i915->drm.struct_mutex)
+/*
+ * Active refs memory management
+ *
+ * To be more economical with memory, we reap all the i915_active trees as
+ * they idle (when we know the active requests are inactive) and allocate the
+ * nodes from a local slab cache to hopefully reduce the fragmentation.
+ */
+static struct i915_global_active {
+ struct kmem_cache *slab_cache;
+} global;
+
struct active_node {
struct i915_gem_active base;
struct i915_active *ref;
@@ -23,7 +34,7 @@ __active_park(struct i915_active *ref)
rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
GEM_BUG_ON(i915_gem_active_isset(&it->base));
- kfree(it);
+ kmem_cache_free(global.slab_cache, it);
}
ref->tree = RB_ROOT;
}
@@ -96,11 +107,11 @@ active_instance(struct i915_active *ref, u64 idx)
p = &parent->rb_left;
}
- node = kmalloc(sizeof(*node), GFP_KERNEL);
+ node = kmem_cache_alloc(global.slab_cache, GFP_KERNEL);
/* kmalloc may retire the ref->last (thanks shrinker)! */
if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) {
- kfree(node);
+ kmem_cache_free(global.slab_cache, node);
goto out;
}
@@ -239,3 +250,17 @@ void i915_active_fini(struct i915_active *ref)
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
#include "selftests/i915_active.c"
#endif
+
+int __init i915_global_active_init(void)
+{
+ global.slab_cache = KMEM_CACHE(active_node, SLAB_HWCACHE_ALIGN);
+ if (!global.slab_cache)
+ return -ENOMEM;
+
+ return 0;
+}
+
+void __exit i915_global_active_exit(void)
+{
+ kmem_cache_destroy(global.slab_cache);
+}
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index ec4b66efd9a7..179b47aeec33 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -70,4 +70,7 @@ void i915_active_fini(struct i915_active *ref);
static inline void i915_active_fini(struct i915_active *ref) { }
#endif
+int i915_global_active_init(void);
+void i915_global_active_exit(void);
+
#endif /* _I915_ACTIVE_H_ */
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 5d05572c9ff4..852b6b4e8ed8 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -28,6 +28,7 @@
#include <drm/drm_drv.h>
+#include "i915_active.h"
#include "i915_drv.h"
#include "i915_selftest.h"
@@ -800,6 +801,8 @@ static int __init i915_init(void)
bool use_kms = true;
int err;
+ i915_global_active_init();
+
err = i915_mock_selftests();
if (err)
return err > 0 ? 0 : err;
@@ -831,6 +834,7 @@ static void __exit i915_exit(void)
return;
pci_unregister_driver(&i915_pci_driver);
+ i915_global_active_exit();
}
module_init(i915_init);
--
2.20.1
* Re: [PATCH 14/22] drm/i915: Allocate active tracking nodes from a slabcache
2019-02-04 13:22 ` [PATCH 14/22] drm/i915: Allocate active tracking nodes from a slabcache Chris Wilson
@ 2019-02-04 18:22 ` Tvrtko Ursulin
0 siblings, 0 replies; 45+ messages in thread
From: Tvrtko Ursulin @ 2019-02-04 18:22 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 04/02/2019 13:22, Chris Wilson wrote:
> Wrap the active tracking for GPU references in a slabcache for faster
> allocations, and hopefully better fragmentation reduction.
>
> v3: Nothing device specific left, it's just a slabcache that we can
> make global.
> v4: Include i915_active.h and don't put the initfunc under DEBUG_GEM
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/i915_active.c | 31 +++++++++++++++++++++++++++---
> drivers/gpu/drm/i915/i915_active.h | 3 +++
> drivers/gpu/drm/i915/i915_pci.c | 4 ++++
> 3 files changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> index b1fefe98f9a6..64661c41532b 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -9,6 +9,17 @@
>
> #define BKL(ref) (&(ref)->i915->drm.struct_mutex)
>
> +/*
> + * Active refs memory management
> + *
> + * To be more economical with memory, we reap all the i915_active trees as
> + * they idle (when we know the active requests are inactive) and allocate the
> + * nodes from a local slab cache to hopefully reduce the fragmentation.
> + */
> +static struct i915_global_active {
> + struct kmem_cache *slab_cache;
> +} global;
> +
> struct active_node {
> struct i915_gem_active base;
> struct i915_active *ref;
> @@ -23,7 +34,7 @@ __active_park(struct i915_active *ref)
>
> rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
> GEM_BUG_ON(i915_gem_active_isset(&it->base));
> - kfree(it);
> + kmem_cache_free(global.slab_cache, it);
> }
> ref->tree = RB_ROOT;
> }
> @@ -96,11 +107,11 @@ active_instance(struct i915_active *ref, u64 idx)
> p = &parent->rb_left;
> }
>
> - node = kmalloc(sizeof(*node), GFP_KERNEL);
> + node = kmem_cache_alloc(global.slab_cache, GFP_KERNEL);
>
> /* kmalloc may retire the ref->last (thanks shrinker)! */
> if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) {
> - kfree(node);
> + kmem_cache_free(global.slab_cache, node);
> goto out;
> }
>
> @@ -239,3 +250,17 @@ void i915_active_fini(struct i915_active *ref)
> #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> #include "selftests/i915_active.c"
> #endif
> +
> +int __init i915_global_active_init(void)
> +{
> + global.slab_cache = KMEM_CACHE(active_node, SLAB_HWCACHE_ALIGN);
> + if (!global.slab_cache)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +void __exit i915_global_active_exit(void)
> +{
> + kmem_cache_destroy(global.slab_cache);
> +}
> diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
> index ec4b66efd9a7..179b47aeec33 100644
> --- a/drivers/gpu/drm/i915/i915_active.h
> +++ b/drivers/gpu/drm/i915/i915_active.h
> @@ -70,4 +70,7 @@ void i915_active_fini(struct i915_active *ref);
> static inline void i915_active_fini(struct i915_active *ref) { }
> #endif
>
> +int i915_global_active_init(void);
> +void i915_global_active_exit(void);
> +
> #endif /* _I915_ACTIVE_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 5d05572c9ff4..852b6b4e8ed8 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -28,6 +28,7 @@
>
> #include <drm/drm_drv.h>
>
> +#include "i915_active.h"
> #include "i915_drv.h"
> #include "i915_selftest.h"
>
> @@ -800,6 +801,8 @@ static int __init i915_init(void)
> bool use_kms = true;
> int err;
>
> + i915_global_active_init();
> +
> err = i915_mock_selftests();
> if (err)
> return err > 0 ? 0 : err;
> @@ -831,6 +834,7 @@ static void __exit i915_exit(void)
> return;
>
> pci_unregister_driver(&i915_pci_driver);
> + i915_global_active_exit();
> }
>
> module_init(i915_init);
>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Regards,
Tvrtko
* [PATCH 15/22] drm/i915: Make request allocation caches global
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (13 preceding siblings ...)
2019-02-04 13:22 ` [PATCH 14/22] drm/i915: Allocate active tracking nodes from a slabcache Chris Wilson
@ 2019-02-04 13:22 ` Chris Wilson
2019-02-04 18:48 ` Tvrtko Ursulin
2019-02-04 13:22 ` [PATCH 16/22] drm/i915: Add timeline barrier support Chris Wilson
` (13 subsequent siblings)
28 siblings, 1 reply; 45+ messages in thread
From: Chris Wilson @ 2019-02-04 13:22 UTC (permalink / raw)
To: intel-gfx
As kmem_caches share the same properties (size, allocation/free behaviour)
for all potential devices, we can use global caches. While this
potential has worse fragmentation behaviour (one can argue that
different devices would have different activity lifetimes, but you can
also argue that activity is temporal across the system) it is the
default behaviour of the system at large to amalgamate matching caches.
The benefit for us is much reduced pointer dancing along the frequent
allocation paths.
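For illustration (both lines taken from the diff below), the frequent
request allocation goes from chasing pointers through the device:

	rq = kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);

to a flat lookup of the global slab:

	rq = kmem_cache_alloc(global.slab_requests, GFP_KERNEL);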
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/Makefile | 1 +
drivers/gpu/drm/i915/i915_active.c | 9 ++-
drivers/gpu/drm/i915/i915_active.h | 1 +
drivers/gpu/drm/i915/i915_drv.h | 3 -
drivers/gpu/drm/i915/i915_gem.c | 32 +--------
drivers/gpu/drm/i915/i915_globals.c | 49 ++++++++++++++
drivers/gpu/drm/i915/i915_globals.h | 14 ++++
drivers/gpu/drm/i915/i915_pci.c | 8 ++-
drivers/gpu/drm/i915/i915_request.c | 53 ++++++++++++---
drivers/gpu/drm/i915/i915_request.h | 10 +++
drivers/gpu/drm/i915/i915_scheduler.c | 66 +++++++++++++++----
drivers/gpu/drm/i915/i915_scheduler.h | 34 ++++++++--
drivers/gpu/drm/i915/intel_guc_submission.c | 3 +-
drivers/gpu/drm/i915/intel_lrc.c | 6 +-
drivers/gpu/drm/i915/intel_ringbuffer.h | 17 -----
drivers/gpu/drm/i915/selftests/intel_lrc.c | 2 +-
drivers/gpu/drm/i915/selftests/mock_engine.c | 48 +++++++-------
.../gpu/drm/i915/selftests/mock_gem_device.c | 26 --------
drivers/gpu/drm/i915/selftests/mock_request.c | 12 ++--
drivers/gpu/drm/i915/selftests/mock_request.h | 7 --
20 files changed, 248 insertions(+), 153 deletions(-)
create mode 100644 drivers/gpu/drm/i915/i915_globals.c
create mode 100644 drivers/gpu/drm/i915/i915_globals.h
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 1787e1299b1b..a1d834068765 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -77,6 +77,7 @@ i915-y += \
i915_gem_tiling.o \
i915_gem_userptr.o \
i915_gemfs.o \
+ i915_globals.o \
i915_query.o \
i915_request.o \
i915_scheduler.o \
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 64661c41532b..d23092d8c89f 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -251,7 +251,7 @@ void i915_active_fini(struct i915_active *ref)
#include "selftests/i915_active.c"
#endif
-int __init i915_global_active_init(void)
+int i915_global_active_init(void)
{
global.slab_cache = KMEM_CACHE(active_node, SLAB_HWCACHE_ALIGN);
if (!global.slab_cache)
@@ -260,7 +260,12 @@ int __init i915_global_active_init(void)
return 0;
}
-void __exit i915_global_active_exit(void)
+void i915_global_active_shrink(void)
+{
+ kmem_cache_shrink(global.slab_cache);
+}
+
+void i915_global_active_exit(void)
{
kmem_cache_destroy(global.slab_cache);
}
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index 179b47aeec33..6c56d10b1f59 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -71,6 +71,7 @@ static inline void i915_active_fini(struct i915_active *ref) { }
#endif
int i915_global_active_init(void);
+void i915_global_active_shrink(void);
void i915_global_active_exit(void);
#endif /* _I915_ACTIVE_H_ */
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3e4538ce5276..e48e3c228d9c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1459,9 +1459,6 @@ struct drm_i915_private {
struct kmem_cache *objects;
struct kmem_cache *vmas;
struct kmem_cache *luts;
- struct kmem_cache *requests;
- struct kmem_cache *dependencies;
- struct kmem_cache *priorities;
const struct intel_device_info __info; /* Use INTEL_INFO() to access. */
struct intel_runtime_info __runtime; /* Use RUNTIME_INFO() to access. */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2c6161c89cc7..d82e4f990586 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -42,6 +42,7 @@
#include "i915_drv.h"
#include "i915_gem_clflush.h"
#include "i915_gemfs.h"
+#include "i915_globals.h"
#include "i915_reset.h"
#include "i915_trace.h"
#include "i915_vgpu.h"
@@ -2916,12 +2917,11 @@ static void shrink_caches(struct drm_i915_private *i915)
* filled slabs to prioritise allocating from the mostly full slabs,
* with the aim of reducing fragmentation.
*/
- kmem_cache_shrink(i915->priorities);
- kmem_cache_shrink(i915->dependencies);
- kmem_cache_shrink(i915->requests);
kmem_cache_shrink(i915->luts);
kmem_cache_shrink(i915->vmas);
kmem_cache_shrink(i915->objects);
+
+ i915_globals_shrink();
}
struct sleep_rcu_work {
@@ -5264,23 +5264,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
if (!dev_priv->luts)
goto err_vmas;
- dev_priv->requests = KMEM_CACHE(i915_request,
- SLAB_HWCACHE_ALIGN |
- SLAB_RECLAIM_ACCOUNT |
- SLAB_TYPESAFE_BY_RCU);
- if (!dev_priv->requests)
- goto err_luts;
-
- dev_priv->dependencies = KMEM_CACHE(i915_dependency,
- SLAB_HWCACHE_ALIGN |
- SLAB_RECLAIM_ACCOUNT);
- if (!dev_priv->dependencies)
- goto err_requests;
-
- dev_priv->priorities = KMEM_CACHE(i915_priolist, SLAB_HWCACHE_ALIGN);
- if (!dev_priv->priorities)
- goto err_dependencies;
-
INIT_LIST_HEAD(&dev_priv->gt.active_rings);
INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
@@ -5305,12 +5288,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
return 0;
-err_dependencies:
- kmem_cache_destroy(dev_priv->dependencies);
-err_requests:
- kmem_cache_destroy(dev_priv->requests);
-err_luts:
- kmem_cache_destroy(dev_priv->luts);
err_vmas:
kmem_cache_destroy(dev_priv->vmas);
err_objects:
@@ -5328,9 +5305,6 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
cleanup_srcu_struct(&dev_priv->gpu_error.srcu);
- kmem_cache_destroy(dev_priv->priorities);
- kmem_cache_destroy(dev_priv->dependencies);
- kmem_cache_destroy(dev_priv->requests);
kmem_cache_destroy(dev_priv->luts);
kmem_cache_destroy(dev_priv->vmas);
kmem_cache_destroy(dev_priv->objects);
diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c
new file mode 100644
index 000000000000..2ecf9897fd16
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_globals.c
@@ -0,0 +1,49 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include "i915_active.h"
+#include "i915_globals.h"
+#include "i915_request.h"
+#include "i915_scheduler.h"
+
+int __init i915_globals_init(void)
+{
+ int err;
+
+ err = i915_global_active_init();
+ if (err)
+ return err;
+
+ err = i915_global_request_init();
+ if (err)
+ goto err_active;
+
+ err = i915_global_scheduler_init();
+ if (err)
+ goto err_request;
+
+ return 0;
+
+err_request:
+ i915_global_request_exit();
+err_active:
+ i915_global_active_exit();
+ return err;
+}
+
+void i915_globals_shrink(void)
+{
+ i915_global_active_shrink();
+ i915_global_request_shrink();
+ i915_global_scheduler_shrink();
+}
+
+void __exit i915_globals_exit(void)
+{
+ i915_global_scheduler_exit();
+ i915_global_request_exit();
+ i915_global_active_exit();
+}
diff --git a/drivers/gpu/drm/i915/i915_globals.h b/drivers/gpu/drm/i915/i915_globals.h
new file mode 100644
index 000000000000..903f52c0a1d2
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_globals.h
@@ -0,0 +1,14 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef _I915_GLOBALS_H_
+#define _I915_GLOBALS_H_
+
+int i915_globals_init(void);
+void i915_globals_shrink(void);
+void i915_globals_exit(void);
+
+#endif /* _I915_GLOBALS_H_ */
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 852b6b4e8ed8..0d684eb530c3 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -28,8 +28,8 @@
#include <drm/drm_drv.h>
-#include "i915_active.h"
#include "i915_drv.h"
+#include "i915_globals.h"
#include "i915_selftest.h"
#define PLATFORM(x) .platform = (x), .platform_mask = BIT(x)
@@ -801,7 +801,9 @@ static int __init i915_init(void)
bool use_kms = true;
int err;
- i915_global_active_init();
+ err = i915_globals_init();
+ if (err)
+ return err;
err = i915_mock_selftests();
if (err)
@@ -834,7 +836,7 @@ static void __exit i915_exit(void)
return;
pci_unregister_driver(&i915_pci_driver);
- i915_global_active_exit();
+ i915_globals_exit();
}
module_init(i915_init);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 04c65e6d83b9..3bb4840ba761 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -31,6 +31,11 @@
#include "i915_drv.h"
#include "i915_reset.h"
+static struct i915_global_request {
+ struct kmem_cache *slab_requests;
+ struct kmem_cache *slab_dependencies;
+} global;
+
static const char *i915_fence_get_driver_name(struct dma_fence *fence)
{
return "i915";
@@ -83,7 +88,7 @@ static void i915_fence_release(struct dma_fence *fence)
*/
i915_sw_fence_fini(&rq->submit);
- kmem_cache_free(rq->i915->requests, rq);
+ kmem_cache_free(global.slab_requests, rq);
}
const struct dma_fence_ops i915_fence_ops = {
@@ -301,7 +306,7 @@ static void i915_request_retire(struct i915_request *request)
unreserve_gt(request->i915);
- i915_sched_node_fini(request->i915, &request->sched);
+ i915_sched_node_fini(&request->sched);
i915_request_put(request);
}
@@ -535,7 +540,7 @@ i915_request_alloc_slow(struct intel_context *ce)
ring_retire_requests(ring);
out:
- return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);
+ return kmem_cache_alloc(global.slab_requests, GFP_KERNEL);
}
/**
@@ -617,7 +622,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
*
* Do not use kmem_cache_zalloc() here!
*/
- rq = kmem_cache_alloc(i915->requests,
+ rq = kmem_cache_alloc(global.slab_requests,
GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
if (unlikely(!rq)) {
rq = i915_request_alloc_slow(ce);
@@ -701,7 +706,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
- kmem_cache_free(i915->requests, rq);
+ kmem_cache_free(global.slab_requests, rq);
err_unreserve:
unreserve_gt(i915);
intel_context_unpin(ce);
@@ -720,9 +725,7 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
return 0;
if (to->engine->schedule) {
- ret = i915_sched_node_add_dependency(to->i915,
- &to->sched,
- &from->sched);
+ ret = i915_sched_node_add_dependency(&to->sched, &from->sched);
if (ret < 0)
return ret;
}
@@ -1195,3 +1198,37 @@ void i915_retire_requests(struct drm_i915_private *i915)
#include "selftests/mock_request.c"
#include "selftests/i915_request.c"
#endif
+
+int i915_global_request_init(void)
+{
+ global.slab_requests = KMEM_CACHE(i915_request,
+ SLAB_HWCACHE_ALIGN |
+ SLAB_RECLAIM_ACCOUNT |
+ SLAB_TYPESAFE_BY_RCU);
+ if (!global.slab_requests)
+ return -ENOMEM;
+
+ global.slab_dependencies = KMEM_CACHE(i915_dependency,
+ SLAB_HWCACHE_ALIGN |
+ SLAB_RECLAIM_ACCOUNT);
+ if (!global.slab_dependencies)
+ goto err_requests;
+
+ return 0;
+
+err_requests:
+ kmem_cache_destroy(global.slab_requests);
+ return -ENOMEM;
+}
+
+void i915_global_request_shrink(void)
+{
+ kmem_cache_shrink(global.slab_dependencies);
+ kmem_cache_shrink(global.slab_requests);
+}
+
+void i915_global_request_exit(void)
+{
+ kmem_cache_destroy(global.slab_dependencies);
+ kmem_cache_destroy(global.slab_requests);
+}
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 3cffb96203b9..054bd300984b 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -29,6 +29,7 @@
#include "i915_gem.h"
#include "i915_scheduler.h"
+#include "i915_selftest.h"
#include "i915_sw_fence.h"
#include <uapi/drm/i915_drm.h>
@@ -204,6 +205,11 @@ struct i915_request {
struct drm_i915_file_private *file_priv;
/** file_priv list entry for this request */
struct list_head client_link;
+
+ I915_SELFTEST_DECLARE(struct {
+ struct list_head link;
+ unsigned long delay;
+ } mock;)
};
#define I915_FENCE_GFP (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
@@ -786,4 +792,8 @@ i915_gem_active_retire(struct i915_gem_active *active,
#define for_each_active(mask, idx) \
for (; mask ? idx = ffs(mask) - 1, 1 : 0; mask &= ~BIT(idx))
+int i915_global_request_init(void);
+void i915_global_request_shrink(void);
+void i915_global_request_exit(void);
+
#endif /* I915_REQUEST_H */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index d01683167c77..7c1d9ef98374 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -10,6 +10,11 @@
#include "i915_request.h"
#include "i915_scheduler.h"
+static struct i915_global_scheduler {
+ struct kmem_cache *slab_dependencies;
+ struct kmem_cache *slab_priorities;
+} global;
+
static DEFINE_SPINLOCK(schedule_lock);
static const struct i915_request *
@@ -32,16 +37,15 @@ void i915_sched_node_init(struct i915_sched_node *node)
}
static struct i915_dependency *
-i915_dependency_alloc(struct drm_i915_private *i915)
+i915_dependency_alloc(void)
{
- return kmem_cache_alloc(i915->dependencies, GFP_KERNEL);
+ return kmem_cache_alloc(global.slab_dependencies, GFP_KERNEL);
}
static void
-i915_dependency_free(struct drm_i915_private *i915,
- struct i915_dependency *dep)
+i915_dependency_free(struct i915_dependency *dep)
{
- kmem_cache_free(i915->dependencies, dep);
+ kmem_cache_free(global.slab_dependencies, dep);
}
bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
@@ -68,25 +72,23 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
return ret;
}
-int i915_sched_node_add_dependency(struct drm_i915_private *i915,
- struct i915_sched_node *node,
+int i915_sched_node_add_dependency(struct i915_sched_node *node,
struct i915_sched_node *signal)
{
struct i915_dependency *dep;
- dep = i915_dependency_alloc(i915);
+ dep = i915_dependency_alloc();
if (!dep)
return -ENOMEM;
if (!__i915_sched_node_add_dependency(node, signal, dep,
I915_DEPENDENCY_ALLOC))
- i915_dependency_free(i915, dep);
+ i915_dependency_free(dep);
return 0;
}
-void i915_sched_node_fini(struct drm_i915_private *i915,
- struct i915_sched_node *node)
+void i915_sched_node_fini(struct i915_sched_node *node)
{
struct i915_dependency *dep, *tmp;
@@ -106,7 +108,7 @@ void i915_sched_node_fini(struct drm_i915_private *i915,
list_del(&dep->wait_link);
if (dep->flags & I915_DEPENDENCY_ALLOC)
- i915_dependency_free(i915, dep);
+ i915_dependency_free(dep);
}
/* Remove ourselves from everyone who depends upon us */
@@ -116,7 +118,7 @@ void i915_sched_node_fini(struct drm_i915_private *i915,
list_del(&dep->signal_link);
if (dep->flags & I915_DEPENDENCY_ALLOC)
- i915_dependency_free(i915, dep);
+ i915_dependency_free(dep);
}
spin_unlock(&schedule_lock);
@@ -193,7 +195,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
if (prio == I915_PRIORITY_NORMAL) {
p = &execlists->default_priolist;
} else {
- p = kmem_cache_alloc(engine->i915->priorities, GFP_ATOMIC);
+ p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC);
/* Convert an allocation failure to a priority bump */
if (unlikely(!p)) {
prio = I915_PRIORITY_NORMAL; /* recurses just once */
@@ -408,3 +410,39 @@ void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump)
spin_unlock_bh(&schedule_lock);
}
+
+void __i915_priolist_free(struct i915_priolist *p)
+{
+ kmem_cache_free(global.slab_priorities, p);
+}
+
+int i915_global_scheduler_init(void)
+{
+ global.slab_dependencies = KMEM_CACHE(i915_dependency,
+ SLAB_HWCACHE_ALIGN);
+ if (!global.slab_dependencies)
+ return -ENOMEM;
+
+ global.slab_priorities = KMEM_CACHE(i915_priolist,
+ SLAB_HWCACHE_ALIGN);
+ if (!global.slab_priorities)
+ goto err_priorities;
+
+ return 0;
+
+err_priorities:
+ kmem_cache_destroy(global.slab_dependencies);
+ return -ENOMEM;
+}
+
+void i915_global_scheduler_shrink(void)
+{
+ kmem_cache_shrink(global.slab_dependencies);
+ kmem_cache_shrink(global.slab_priorities);
+}
+
+void i915_global_scheduler_exit(void)
+{
+ kmem_cache_destroy(global.slab_dependencies);
+ kmem_cache_destroy(global.slab_priorities);
+}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 54bd6c89817e..5196ce07b6c2 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -85,6 +85,23 @@ struct i915_dependency {
#define I915_DEPENDENCY_ALLOC BIT(0)
};
+struct i915_priolist {
+ struct list_head requests[I915_PRIORITY_COUNT];
+ struct rb_node node;
+ unsigned long used;
+ int priority;
+};
+
+#define priolist_for_each_request(it, plist, idx) \
+ for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
+ list_for_each_entry(it, &(plist)->requests[idx], sched.link)
+
+#define priolist_for_each_request_consume(it, n, plist, idx) \
+ for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+ list_for_each_entry_safe(it, n, \
+ &(plist)->requests[idx - 1], \
+ sched.link)
+
void i915_sched_node_init(struct i915_sched_node *node);
bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
@@ -92,12 +109,10 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
struct i915_dependency *dep,
unsigned long flags);
-int i915_sched_node_add_dependency(struct drm_i915_private *i915,
- struct i915_sched_node *node,
+int i915_sched_node_add_dependency(struct i915_sched_node *node,
struct i915_sched_node *signal);
-void i915_sched_node_fini(struct drm_i915_private *i915,
- struct i915_sched_node *node);
+void i915_sched_node_fini(struct i915_sched_node *node);
void i915_schedule(struct i915_request *request,
const struct i915_sched_attr *attr);
@@ -107,4 +122,15 @@ void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump);
struct list_head *
i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio);
+void __i915_priolist_free(struct i915_priolist *p);
+static inline void i915_priolist_free(struct i915_priolist *p)
+{
+ if (p->priority != I915_PRIORITY_NORMAL)
+ __i915_priolist_free(p);
+}
+
+int i915_global_scheduler_init(void);
+void i915_global_scheduler_shrink(void);
+void i915_global_scheduler_exit(void);
+
#endif /* _I915_SCHEDULER_H_ */
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 8bc8aa54aa35..4cf94513615d 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -781,8 +781,7 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
}
rb_erase_cached(&p->node, &execlists->queue);
- if (p->priority != I915_PRIORITY_NORMAL)
- kmem_cache_free(engine->i915->priorities, p);
+ i915_priolist_free(p);
}
done:
execlists->queue_priority_hint =
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8e301f19036b..e37f207afb5a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -806,8 +806,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
}
rb_erase_cached(&p->node, &execlists->queue);
- if (p->priority != I915_PRIORITY_NORMAL)
- kmem_cache_free(engine->i915->priorities, p);
+ i915_priolist_free(p);
}
done:
@@ -966,8 +965,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
}
rb_erase_cached(&p->node, &execlists->queue);
- if (p->priority != I915_PRIORITY_NORMAL)
- kmem_cache_free(engine->i915->priorities, p);
+ i915_priolist_free(p);
}
intel_write_status_page(engine,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 8183d3441907..5dffccb6740e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -185,23 +185,6 @@ enum intel_engine_id {
#define _VECS(n) (VECS + (n))
};
-struct i915_priolist {
- struct list_head requests[I915_PRIORITY_COUNT];
- struct rb_node node;
- unsigned long used;
- int priority;
-};
-
-#define priolist_for_each_request(it, plist, idx) \
- for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
- list_for_each_entry(it, &(plist)->requests[idx], sched.link)
-
-#define priolist_for_each_request_consume(it, n, plist, idx) \
- for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
- list_for_each_entry_safe(it, n, \
- &(plist)->requests[idx - 1], \
- sched.link)
-
struct st_preempt_hang {
struct completion completion;
unsigned int count;
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 967cefa118ee..30ab0e04a674 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -440,7 +440,7 @@ static struct i915_request *dummy_request(struct intel_engine_cs *engine)
static void dummy_request_free(struct i915_request *dummy)
{
i915_request_mark_complete(dummy);
- i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
+ i915_sched_node_fini(&dummy->sched);
kfree(dummy);
}
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 08f0cab02e0f..0d35af07867b 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -76,28 +76,27 @@ static void mock_ring_free(struct intel_ring *base)
kfree(ring);
}
-static struct mock_request *first_request(struct mock_engine *engine)
+static struct i915_request *first_request(struct mock_engine *engine)
{
return list_first_entry_or_null(&engine->hw_queue,
- struct mock_request,
- link);
+ struct i915_request,
+ mock.link);
}
-static void advance(struct mock_request *request)
+static void advance(struct i915_request *request)
{
- list_del_init(&request->link);
- intel_engine_write_global_seqno(request->base.engine,
- request->base.global_seqno);
- i915_request_mark_complete(&request->base);
- GEM_BUG_ON(!i915_request_completed(&request->base));
+ list_del_init(&request->mock.link);
+ intel_engine_write_global_seqno(request->engine, request->global_seqno);
+ i915_request_mark_complete(request);
+ GEM_BUG_ON(!i915_request_completed(request));
- intel_engine_queue_breadcrumbs(request->base.engine);
+ intel_engine_queue_breadcrumbs(request->engine);
}
static void hw_delay_complete(struct timer_list *t)
{
struct mock_engine *engine = from_timer(engine, t, hw_delay);
- struct mock_request *request;
+ struct i915_request *request;
unsigned long flags;
spin_lock_irqsave(&engine->hw_lock, flags);
@@ -112,8 +111,9 @@ static void hw_delay_complete(struct timer_list *t)
* requeue the timer for the next delayed request.
*/
while ((request = first_request(engine))) {
- if (request->delay) {
- mod_timer(&engine->hw_delay, jiffies + request->delay);
+ if (request->mock.delay) {
+ mod_timer(&engine->hw_delay,
+ jiffies + request->mock.delay);
break;
}
@@ -171,10 +171,8 @@ mock_context_pin(struct intel_engine_cs *engine,
static int mock_request_alloc(struct i915_request *request)
{
- struct mock_request *mock = container_of(request, typeof(*mock), base);
-
- INIT_LIST_HEAD(&mock->link);
- mock->delay = 0;
+ INIT_LIST_HEAD(&request->mock.link);
+ request->mock.delay = 0;
return 0;
}
@@ -192,7 +190,6 @@ static u32 *mock_emit_breadcrumb(struct i915_request *request, u32 *cs)
static void mock_submit_request(struct i915_request *request)
{
- struct mock_request *mock = container_of(request, typeof(*mock), base);
struct mock_engine *engine =
container_of(request->engine, typeof(*engine), base);
unsigned long flags;
@@ -201,12 +198,13 @@ static void mock_submit_request(struct i915_request *request)
GEM_BUG_ON(!request->global_seqno);
spin_lock_irqsave(&engine->hw_lock, flags);
- list_add_tail(&mock->link, &engine->hw_queue);
- if (mock->link.prev == &engine->hw_queue) {
- if (mock->delay)
- mod_timer(&engine->hw_delay, jiffies + mock->delay);
+ list_add_tail(&request->mock.link, &engine->hw_queue);
+ if (list_is_first(&request->mock.link, &engine->hw_queue)) {
+ if (request->mock.delay)
+ mod_timer(&engine->hw_delay,
+ jiffies + request->mock.delay);
else
- advance(mock);
+ advance(request);
}
spin_unlock_irqrestore(&engine->hw_lock, flags);
}
@@ -266,12 +264,12 @@ void mock_engine_flush(struct intel_engine_cs *engine)
{
struct mock_engine *mock =
container_of(engine, typeof(*mock), base);
- struct mock_request *request, *rn;
+ struct i915_request *request, *rn;
del_timer_sync(&mock->hw_delay);
spin_lock_irq(&mock->hw_lock);
- list_for_each_entry_safe(request, rn, &mock->hw_queue, link)
+ list_for_each_entry_safe(request, rn, &mock->hw_queue, mock.link)
advance(request);
spin_unlock_irq(&mock->hw_lock);
}
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 074a0d9cbf26..17915a2d94fa 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -79,9 +79,6 @@ static void mock_device_release(struct drm_device *dev)
destroy_workqueue(i915->wq);
- kmem_cache_destroy(i915->priorities);
- kmem_cache_destroy(i915->dependencies);
- kmem_cache_destroy(i915->requests);
kmem_cache_destroy(i915->vmas);
kmem_cache_destroy(i915->objects);
@@ -211,23 +208,6 @@ struct drm_i915_private *mock_gem_device(void)
if (!i915->vmas)
goto err_objects;
- i915->requests = KMEM_CACHE(mock_request,
- SLAB_HWCACHE_ALIGN |
- SLAB_RECLAIM_ACCOUNT |
- SLAB_TYPESAFE_BY_RCU);
- if (!i915->requests)
- goto err_vmas;
-
- i915->dependencies = KMEM_CACHE(i915_dependency,
- SLAB_HWCACHE_ALIGN |
- SLAB_RECLAIM_ACCOUNT);
- if (!i915->dependencies)
- goto err_requests;
-
- i915->priorities = KMEM_CACHE(i915_priolist, SLAB_HWCACHE_ALIGN);
- if (!i915->priorities)
- goto err_dependencies;
-
i915_timelines_init(i915);
INIT_LIST_HEAD(&i915->gt.active_rings);
@@ -257,12 +237,6 @@ struct drm_i915_private *mock_gem_device(void)
err_unlock:
mutex_unlock(&i915->drm.struct_mutex);
i915_timelines_fini(i915);
- kmem_cache_destroy(i915->priorities);
-err_dependencies:
- kmem_cache_destroy(i915->dependencies);
-err_requests:
- kmem_cache_destroy(i915->requests);
-err_vmas:
kmem_cache_destroy(i915->vmas);
err_objects:
kmem_cache_destroy(i915->objects);
diff --git a/drivers/gpu/drm/i915/selftests/mock_request.c b/drivers/gpu/drm/i915/selftests/mock_request.c
index 0dc29e242597..d1a7c9608712 100644
--- a/drivers/gpu/drm/i915/selftests/mock_request.c
+++ b/drivers/gpu/drm/i915/selftests/mock_request.c
@@ -31,29 +31,25 @@ mock_request(struct intel_engine_cs *engine,
unsigned long delay)
{
struct i915_request *request;
- struct mock_request *mock;
/* NB the i915->requests slab cache is enlarged to fit mock_request */
request = i915_request_alloc(engine, context);
if (IS_ERR(request))
return NULL;
- mock = container_of(request, typeof(*mock), base);
- mock->delay = delay;
-
- return &mock->base;
+ request->mock.delay = delay;
+ return request;
}
bool mock_cancel_request(struct i915_request *request)
{
- struct mock_request *mock = container_of(request, typeof(*mock), base);
struct mock_engine *engine =
container_of(request->engine, typeof(*engine), base);
bool was_queued;
spin_lock_irq(&engine->hw_lock);
- was_queued = !list_empty(&mock->link);
- list_del_init(&mock->link);
+ was_queued = !list_empty(&request->mock.link);
+ list_del_init(&request->mock.link);
spin_unlock_irq(&engine->hw_lock);
if (was_queued)
diff --git a/drivers/gpu/drm/i915/selftests/mock_request.h b/drivers/gpu/drm/i915/selftests/mock_request.h
index 995fb728380c..4acf0211df20 100644
--- a/drivers/gpu/drm/i915/selftests/mock_request.h
+++ b/drivers/gpu/drm/i915/selftests/mock_request.h
@@ -29,13 +29,6 @@
#include "../i915_request.h"
-struct mock_request {
- struct i915_request base;
-
- struct list_head link;
- unsigned long delay;
-};
-
struct i915_request *
mock_request(struct intel_engine_cs *engine,
struct i915_gem_context *context,
--
2.20.1
* Re: [PATCH 15/22] drm/i915: Make request allocation caches global
2019-02-04 13:22 ` [PATCH 15/22] drm/i915: Make request allocation caches global Chris Wilson
@ 2019-02-04 18:48 ` Tvrtko Ursulin
2019-02-04 21:26 ` Chris Wilson
0 siblings, 1 reply; 45+ messages in thread
From: Tvrtko Ursulin @ 2019-02-04 18:48 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 04/02/2019 13:22, Chris Wilson wrote:
> As kmem_caches share the same properties (size, allocation/free behaviour)
> for all potential devices, we can use global caches. While this
> potentially has worse fragmentation behaviour (one can argue that
> different devices would have different activity lifetimes, but you can
> also argue that activity is temporal across the system) it is the
> default behaviour of the system at large to amalgamate matching caches.
>
> The benefit for us is much reduced pointer dancing along the frequent
> allocation paths.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/Makefile | 1 +
> drivers/gpu/drm/i915/i915_active.c | 9 ++-
> drivers/gpu/drm/i915/i915_active.h | 1 +
> drivers/gpu/drm/i915/i915_drv.h | 3 -
> drivers/gpu/drm/i915/i915_gem.c | 32 +--------
> drivers/gpu/drm/i915/i915_globals.c | 49 ++++++++++++++
> drivers/gpu/drm/i915/i915_globals.h | 14 ++++
> drivers/gpu/drm/i915/i915_pci.c | 8 ++-
> drivers/gpu/drm/i915/i915_request.c | 53 ++++++++++++---
> drivers/gpu/drm/i915/i915_request.h | 10 +++
> drivers/gpu/drm/i915/i915_scheduler.c | 66 +++++++++++++++----
> drivers/gpu/drm/i915/i915_scheduler.h | 34 ++++++++--
> drivers/gpu/drm/i915/intel_guc_submission.c | 3 +-
> drivers/gpu/drm/i915/intel_lrc.c | 6 +-
> drivers/gpu/drm/i915/intel_ringbuffer.h | 17 -----
> drivers/gpu/drm/i915/selftests/intel_lrc.c | 2 +-
> drivers/gpu/drm/i915/selftests/mock_engine.c | 48 +++++++-------
> .../gpu/drm/i915/selftests/mock_gem_device.c | 26 --------
> drivers/gpu/drm/i915/selftests/mock_request.c | 12 ++--
> drivers/gpu/drm/i915/selftests/mock_request.h | 7 --
> 20 files changed, 248 insertions(+), 153 deletions(-)
> create mode 100644 drivers/gpu/drm/i915/i915_globals.c
> create mode 100644 drivers/gpu/drm/i915/i915_globals.h
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 1787e1299b1b..a1d834068765 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -77,6 +77,7 @@ i915-y += \
> i915_gem_tiling.o \
> i915_gem_userptr.o \
> i915_gemfs.o \
> + i915_globals.o \
> i915_query.o \
> i915_request.o \
> i915_scheduler.o \
> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> index 64661c41532b..d23092d8c89f 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -251,7 +251,7 @@ void i915_active_fini(struct i915_active *ref)
> #include "selftests/i915_active.c"
> #endif
>
> -int __init i915_global_active_init(void)
> +int i915_global_active_init(void)
Couldn't these remain __init, since they are only called from the
global __init one?
> {
> global.slab_cache = KMEM_CACHE(active_node, SLAB_HWCACHE_ALIGN);
> if (!global.slab_cache)
> @@ -260,7 +260,12 @@ int __init i915_global_active_init(void)
> return 0;
> }
>
> -void __exit i915_global_active_exit(void)
> +void i915_global_active_shrink(void)
> +{
> + kmem_cache_shrink(global.slab_cache);
> +}
> +
> +void i915_global_active_exit(void)
> {
> kmem_cache_destroy(global.slab_cache);
> }
> diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
> index 179b47aeec33..6c56d10b1f59 100644
> --- a/drivers/gpu/drm/i915/i915_active.h
> +++ b/drivers/gpu/drm/i915/i915_active.h
> @@ -71,6 +71,7 @@ static inline void i915_active_fini(struct i915_active *ref) { }
> #endif
>
> int i915_global_active_init(void);
> +void i915_global_active_shrink(void);
> void i915_global_active_exit(void);
>
> #endif /* _I915_ACTIVE_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 3e4538ce5276..e48e3c228d9c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1459,9 +1459,6 @@ struct drm_i915_private {
> struct kmem_cache *objects;
> struct kmem_cache *vmas;
> struct kmem_cache *luts;
> - struct kmem_cache *requests;
> - struct kmem_cache *dependencies;
> - struct kmem_cache *priorities;
>
> const struct intel_device_info __info; /* Use INTEL_INFO() to access. */
> struct intel_runtime_info __runtime; /* Use RUNTIME_INFO() to access. */
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 2c6161c89cc7..d82e4f990586 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -42,6 +42,7 @@
> #include "i915_drv.h"
> #include "i915_gem_clflush.h"
> #include "i915_gemfs.h"
> +#include "i915_globals.h"
> #include "i915_reset.h"
> #include "i915_trace.h"
> #include "i915_vgpu.h"
> @@ -2916,12 +2917,11 @@ static void shrink_caches(struct drm_i915_private *i915)
> * filled slabs to prioritise allocating from the mostly full slabs,
> * with the aim of reducing fragmentation.
> */
> - kmem_cache_shrink(i915->priorities);
> - kmem_cache_shrink(i915->dependencies);
> - kmem_cache_shrink(i915->requests);
> kmem_cache_shrink(i915->luts);
> kmem_cache_shrink(i915->vmas);
> kmem_cache_shrink(i915->objects);
> +
> + i915_globals_shrink();
This is the main bit which worries me.
Global caches are what we want I think, exactly for what you wrote in
the commit message. But would one device going idle have the potential
to inject some latency into another, potentially very busy, client?
Perhaps we could have some sort of aggregated idle signal and defer
shrinking the caches to that point. Like a bitmask of global clients
reporting their idle/active status to the globals core, with the shrink
happening only once all of them are idle.
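A hand-wavy sketch of what I mean (i915_globals_park/unpark and the
atomic counter are names I just made up; a plain counter serves as well
as a bitmask here):

	static atomic_t active_clients;

	void i915_globals_unpark(void)
	{
		atomic_inc(&active_clients);
	}

	void i915_globals_park(void)
	{
		/* Shrink the global slabs only once every client is idle */
		if (atomic_dec_and_test(&active_clients))
			i915_globals_shrink();
	}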
> }
>
> struct sleep_rcu_work {
> @@ -5264,23 +5264,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
> if (!dev_priv->luts)
> goto err_vmas;
>
> - dev_priv->requests = KMEM_CACHE(i915_request,
> - SLAB_HWCACHE_ALIGN |
> - SLAB_RECLAIM_ACCOUNT |
> - SLAB_TYPESAFE_BY_RCU);
> - if (!dev_priv->requests)
> - goto err_luts;
> -
> - dev_priv->dependencies = KMEM_CACHE(i915_dependency,
> - SLAB_HWCACHE_ALIGN |
> - SLAB_RECLAIM_ACCOUNT);
> - if (!dev_priv->dependencies)
> - goto err_requests;
> -
> - dev_priv->priorities = KMEM_CACHE(i915_priolist, SLAB_HWCACHE_ALIGN);
> - if (!dev_priv->priorities)
> - goto err_dependencies;
> -
> INIT_LIST_HEAD(&dev_priv->gt.active_rings);
> INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
>
> @@ -5305,12 +5288,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
>
> return 0;
>
> -err_dependencies:
> - kmem_cache_destroy(dev_priv->dependencies);
> -err_requests:
> - kmem_cache_destroy(dev_priv->requests);
> -err_luts:
> - kmem_cache_destroy(dev_priv->luts);
> err_vmas:
> kmem_cache_destroy(dev_priv->vmas);
> err_objects:
> @@ -5328,9 +5305,6 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>
> cleanup_srcu_struct(&dev_priv->gpu_error.srcu);
>
> - kmem_cache_destroy(dev_priv->priorities);
> - kmem_cache_destroy(dev_priv->dependencies);
> - kmem_cache_destroy(dev_priv->requests);
> kmem_cache_destroy(dev_priv->luts);
> kmem_cache_destroy(dev_priv->vmas);
> kmem_cache_destroy(dev_priv->objects);
> diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c
> new file mode 100644
> index 000000000000..2ecf9897fd16
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -0,0 +1,49 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#include "i915_active.h"
> +#include "i915_globals.h"
> +#include "i915_request.h"
> +#include "i915_scheduler.h"
> +
> +int __init i915_globals_init(void)
> +{
> + int err;
> +
> + err = i915_global_active_init();
> + if (err)
> + return err;
> +
> + err = i915_global_request_init();
> + if (err)
> + goto err_active;
> +
> + err = i915_global_scheduler_init();
> + if (err)
> + goto err_request;
> +
> + return 0;
> +
> +err_request:
> + i915_global_request_exit();
> +err_active:
> + i915_global_active_exit();
> + return err;
> +}
> +
> +void i915_globals_shrink(void)
> +{
> + i915_global_active_shrink();
> + i915_global_request_shrink();
> + i915_global_scheduler_shrink();
> +}
> +
> +void __exit i915_globals_exit(void)
> +{
> + i915_global_scheduler_exit();
> + i915_global_request_exit();
> + i915_global_active_exit();
> +}
> diff --git a/drivers/gpu/drm/i915/i915_globals.h b/drivers/gpu/drm/i915/i915_globals.h
> new file mode 100644
> index 000000000000..903f52c0a1d2
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_globals.h
> @@ -0,0 +1,14 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#ifndef _I915_GLOBALS_H_
> +#define _I915_GLOBALS_H_
> +
> +int i915_globals_init(void);
> +void i915_globals_shrink(void);
> +void i915_globals_exit(void);
> +
> +#endif /* _I915_GLOBALS_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 852b6b4e8ed8..0d684eb530c3 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -28,8 +28,8 @@
>
> #include <drm/drm_drv.h>
>
> -#include "i915_active.h"
> #include "i915_drv.h"
> +#include "i915_globals.h"
> #include "i915_selftest.h"
>
> #define PLATFORM(x) .platform = (x), .platform_mask = BIT(x)
> @@ -801,7 +801,9 @@ static int __init i915_init(void)
> bool use_kms = true;
> int err;
>
> - i915_global_active_init();
> + err = i915_globals_init();
> + if (err)
> + return err;
>
> err = i915_mock_selftests();
> if (err)
> @@ -834,7 +836,7 @@ static void __exit i915_exit(void)
> return;
>
> pci_unregister_driver(&i915_pci_driver);
> - i915_global_active_exit();
> + i915_globals_exit();
> }
>
> module_init(i915_init);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 04c65e6d83b9..3bb4840ba761 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -31,6 +31,11 @@
> #include "i915_drv.h"
> #include "i915_reset.h"
>
> +static struct i915_global_request {
> + struct kmem_cache *slab_requests;
> + struct kmem_cache *slab_dependencies;
> +} global;
> +
> static const char *i915_fence_get_driver_name(struct dma_fence *fence)
> {
> return "i915";
> @@ -83,7 +88,7 @@ static void i915_fence_release(struct dma_fence *fence)
> */
> i915_sw_fence_fini(&rq->submit);
>
> - kmem_cache_free(rq->i915->requests, rq);
> + kmem_cache_free(global.slab_requests, rq);
> }
>
> const struct dma_fence_ops i915_fence_ops = {
> @@ -301,7 +306,7 @@ static void i915_request_retire(struct i915_request *request)
>
> unreserve_gt(request->i915);
>
> - i915_sched_node_fini(request->i915, &request->sched);
> + i915_sched_node_fini(&request->sched);
> i915_request_put(request);
> }
>
> @@ -535,7 +540,7 @@ i915_request_alloc_slow(struct intel_context *ce)
> ring_retire_requests(ring);
>
> out:
> - return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);
> + return kmem_cache_alloc(global.slab_requests, GFP_KERNEL);
> }
>
> /**
> @@ -617,7 +622,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> *
> * Do not use kmem_cache_zalloc() here!
> */
> - rq = kmem_cache_alloc(i915->requests,
> + rq = kmem_cache_alloc(global.slab_requests,
> GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> if (unlikely(!rq)) {
> rq = i915_request_alloc_slow(ce);
> @@ -701,7 +706,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
> GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
>
> - kmem_cache_free(i915->requests, rq);
> + kmem_cache_free(global.slab_requests, rq);
> err_unreserve:
> unreserve_gt(i915);
> intel_context_unpin(ce);
> @@ -720,9 +725,7 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
> return 0;
>
> if (to->engine->schedule) {
> - ret = i915_sched_node_add_dependency(to->i915,
> - &to->sched,
> - &from->sched);
> + ret = i915_sched_node_add_dependency(&to->sched, &from->sched);
> if (ret < 0)
> return ret;
> }
> @@ -1195,3 +1198,37 @@ void i915_retire_requests(struct drm_i915_private *i915)
> #include "selftests/mock_request.c"
> #include "selftests/i915_request.c"
> #endif
> +
> +int i915_global_request_init(void)
> +{
> + global.slab_requests = KMEM_CACHE(i915_request,
> + SLAB_HWCACHE_ALIGN |
> + SLAB_RECLAIM_ACCOUNT |
> + SLAB_TYPESAFE_BY_RCU);
> + if (!global.slab_requests)
> + return -ENOMEM;
> +
> + global.slab_dependencies = KMEM_CACHE(i915_dependency,
> + SLAB_HWCACHE_ALIGN |
> + SLAB_RECLAIM_ACCOUNT);
> + if (!global.slab_dependencies)
> + goto err_requests;
> +
> + return 0;
> +
> +err_requests:
> + kmem_cache_destroy(global.slab_requests);
> + return -ENOMEM;
> +}
> +
> +void i915_global_request_shrink(void)
> +{
> + kmem_cache_shrink(global.slab_dependencies);
> + kmem_cache_shrink(global.slab_requests);
> +}
> +
> +void i915_global_request_exit(void)
> +{
> + kmem_cache_destroy(global.slab_dependencies);
> + kmem_cache_destroy(global.slab_requests);
> +}
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 3cffb96203b9..054bd300984b 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -29,6 +29,7 @@
>
> #include "i915_gem.h"
> #include "i915_scheduler.h"
> +#include "i915_selftest.h"
> #include "i915_sw_fence.h"
>
> #include <uapi/drm/i915_drm.h>
> @@ -204,6 +205,11 @@ struct i915_request {
> struct drm_i915_file_private *file_priv;
> /** file_priv list entry for this request */
> struct list_head client_link;
> +
> + I915_SELFTEST_DECLARE(struct {
> + struct list_head link;
> + unsigned long delay;
> + } mock;)
> };
>
> #define I915_FENCE_GFP (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
> @@ -786,4 +792,8 @@ i915_gem_active_retire(struct i915_gem_active *active,
> #define for_each_active(mask, idx) \
> for (; mask ? idx = ffs(mask) - 1, 1 : 0; mask &= ~BIT(idx))
>
> +int i915_global_request_init(void);
> +void i915_global_request_shrink(void);
> +void i915_global_request_exit(void);
> +
> #endif /* I915_REQUEST_H */
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index d01683167c77..7c1d9ef98374 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -10,6 +10,11 @@
> #include "i915_request.h"
> #include "i915_scheduler.h"
>
> +static struct i915_global_scheduler {
> + struct kmem_cache *slab_dependencies;
> + struct kmem_cache *slab_priorities;
> +} global;
> +
> static DEFINE_SPINLOCK(schedule_lock);
>
> static const struct i915_request *
> @@ -32,16 +37,15 @@ void i915_sched_node_init(struct i915_sched_node *node)
> }
>
> static struct i915_dependency *
> -i915_dependency_alloc(struct drm_i915_private *i915)
> +i915_dependency_alloc(void)
> {
> - return kmem_cache_alloc(i915->dependencies, GFP_KERNEL);
> + return kmem_cache_alloc(global.slab_dependencies, GFP_KERNEL);
> }
>
> static void
> -i915_dependency_free(struct drm_i915_private *i915,
> - struct i915_dependency *dep)
> +i915_dependency_free(struct i915_dependency *dep)
> {
> - kmem_cache_free(i915->dependencies, dep);
> + kmem_cache_free(global.slab_dependencies, dep);
> }
>
> bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
> @@ -68,25 +72,23 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
> return ret;
> }
>
> -int i915_sched_node_add_dependency(struct drm_i915_private *i915,
> - struct i915_sched_node *node,
> +int i915_sched_node_add_dependency(struct i915_sched_node *node,
> struct i915_sched_node *signal)
> {
> struct i915_dependency *dep;
>
> - dep = i915_dependency_alloc(i915);
> + dep = i915_dependency_alloc();
> if (!dep)
> return -ENOMEM;
>
> if (!__i915_sched_node_add_dependency(node, signal, dep,
> I915_DEPENDENCY_ALLOC))
> - i915_dependency_free(i915, dep);
> + i915_dependency_free(dep);
>
> return 0;
> }
>
> -void i915_sched_node_fini(struct drm_i915_private *i915,
> - struct i915_sched_node *node)
> +void i915_sched_node_fini(struct i915_sched_node *node)
> {
> struct i915_dependency *dep, *tmp;
>
> @@ -106,7 +108,7 @@ void i915_sched_node_fini(struct drm_i915_private *i915,
>
> list_del(&dep->wait_link);
> if (dep->flags & I915_DEPENDENCY_ALLOC)
> - i915_dependency_free(i915, dep);
> + i915_dependency_free(dep);
> }
>
> /* Remove ourselves from everyone who depends upon us */
> @@ -116,7 +118,7 @@ void i915_sched_node_fini(struct drm_i915_private *i915,
>
> list_del(&dep->signal_link);
> if (dep->flags & I915_DEPENDENCY_ALLOC)
> - i915_dependency_free(i915, dep);
> + i915_dependency_free(dep);
> }
>
> spin_unlock(&schedule_lock);
> @@ -193,7 +195,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
> if (prio == I915_PRIORITY_NORMAL) {
> p = &execlists->default_priolist;
> } else {
> - p = kmem_cache_alloc(engine->i915->priorities, GFP_ATOMIC);
> + p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC);
> /* Convert an allocation failure to a priority bump */
> if (unlikely(!p)) {
> prio = I915_PRIORITY_NORMAL; /* recurses just once */
> @@ -408,3 +410,39 @@ void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump)
>
> spin_unlock_bh(&schedule_lock);
> }
> +
> +void __i915_priolist_free(struct i915_priolist *p)
> +{
> + kmem_cache_free(global.slab_priorities, p);
> +}
> +
> +int i915_global_scheduler_init(void)
> +{
> + global.slab_dependencies = KMEM_CACHE(i915_dependency,
> + SLAB_HWCACHE_ALIGN);
> + if (!global.slab_dependencies)
> + return -ENOMEM;
Right, so this slab is duplicated. It could end up merged by the slab
core, but I am wondering whether this is the direction we want to go
just to avoid some pointer chasing.

You wouldn't consider i915->global->slab_dependencies, or something along
those lines?
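Something like the below, purely to illustrate the shape (the global
pointer in dev_priv is hypothetical):

struct i915_global {
	struct kmem_cache *slab_dependencies;
	struct kmem_cache *slab_priorities;
};

/* a single shared instance, with i915->global pointing at it */
static struct i915_dependency *
i915_dependency_alloc(struct drm_i915_private *i915)
{
	return kmem_cache_alloc(i915->global->slab_dependencies, GFP_KERNEL);
}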
Regards,
Tvrtko
> +
> + global.slab_priorities = KMEM_CACHE(i915_priolist,
> + SLAB_HWCACHE_ALIGN);
> + if (!global.slab_priorities)
> + goto err_dependencies;
> +
> + return 0;
> +
> +err_dependencies:
> + kmem_cache_destroy(global.slab_dependencies);
> + return -ENOMEM;
> +}
> +
> +void i915_global_scheduler_shrink(void)
> +{
> + kmem_cache_shrink(global.slab_dependencies);
> + kmem_cache_shrink(global.slab_priorities);
> +}
> +
> +void i915_global_scheduler_exit(void)
> +{
> + kmem_cache_destroy(global.slab_dependencies);
> + kmem_cache_destroy(global.slab_priorities);
> +}
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 54bd6c89817e..5196ce07b6c2 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -85,6 +85,23 @@ struct i915_dependency {
> #define I915_DEPENDENCY_ALLOC BIT(0)
> };
>
> +struct i915_priolist {
> + struct list_head requests[I915_PRIORITY_COUNT];
> + struct rb_node node;
> + unsigned long used;
> + int priority;
> +};
> +
> +#define priolist_for_each_request(it, plist, idx) \
> + for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
> + list_for_each_entry(it, &(plist)->requests[idx], sched.link)
> +
> +#define priolist_for_each_request_consume(it, n, plist, idx) \
> + for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
> + list_for_each_entry_safe(it, n, \
> + &(plist)->requests[idx - 1], \
> + sched.link)
> +
> void i915_sched_node_init(struct i915_sched_node *node);
>
> bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
> @@ -92,12 +109,10 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
> struct i915_dependency *dep,
> unsigned long flags);
>
> -int i915_sched_node_add_dependency(struct drm_i915_private *i915,
> - struct i915_sched_node *node,
> +int i915_sched_node_add_dependency(struct i915_sched_node *node,
> struct i915_sched_node *signal);
>
> -void i915_sched_node_fini(struct drm_i915_private *i915,
> - struct i915_sched_node *node);
> +void i915_sched_node_fini(struct i915_sched_node *node);
>
> void i915_schedule(struct i915_request *request,
> const struct i915_sched_attr *attr);
> @@ -107,4 +122,15 @@ void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump);
> struct list_head *
> i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio);
>
> +void __i915_priolist_free(struct i915_priolist *p);
> +static inline void i915_priolist_free(struct i915_priolist *p)
> +{
> + if (p->priority != I915_PRIORITY_NORMAL)
> + __i915_priolist_free(p);
> +}
> +
> +int i915_global_scheduler_init(void);
> +void i915_global_scheduler_shrink(void);
> +void i915_global_scheduler_exit(void);
> +
> #endif /* _I915_SCHEDULER_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 8bc8aa54aa35..4cf94513615d 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -781,8 +781,7 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
> }
>
> rb_erase_cached(&p->node, &execlists->queue);
> - if (p->priority != I915_PRIORITY_NORMAL)
> - kmem_cache_free(engine->i915->priorities, p);
> + i915_priolist_free(p);
> }
> done:
> execlists->queue_priority_hint =
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 8e301f19036b..e37f207afb5a 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -806,8 +806,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> }
>
> rb_erase_cached(&p->node, &execlists->queue);
> - if (p->priority != I915_PRIORITY_NORMAL)
> - kmem_cache_free(engine->i915->priorities, p);
> + i915_priolist_free(p);
> }
>
> done:
> @@ -966,8 +965,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
> }
>
> rb_erase_cached(&p->node, &execlists->queue);
> - if (p->priority != I915_PRIORITY_NORMAL)
> - kmem_cache_free(engine->i915->priorities, p);
> + i915_priolist_free(p);
> }
>
> intel_write_status_page(engine,
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 8183d3441907..5dffccb6740e 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -185,23 +185,6 @@ enum intel_engine_id {
> #define _VECS(n) (VECS + (n))
> };
>
> -struct i915_priolist {
> - struct list_head requests[I915_PRIORITY_COUNT];
> - struct rb_node node;
> - unsigned long used;
> - int priority;
> -};
> -
> -#define priolist_for_each_request(it, plist, idx) \
> - for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
> - list_for_each_entry(it, &(plist)->requests[idx], sched.link)
> -
> -#define priolist_for_each_request_consume(it, n, plist, idx) \
> - for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
> - list_for_each_entry_safe(it, n, \
> - &(plist)->requests[idx - 1], \
> - sched.link)
> -
> struct st_preempt_hang {
> struct completion completion;
> unsigned int count;
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index 967cefa118ee..30ab0e04a674 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -440,7 +440,7 @@ static struct i915_request *dummy_request(struct intel_engine_cs *engine)
> static void dummy_request_free(struct i915_request *dummy)
> {
> i915_request_mark_complete(dummy);
> - i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
> + i915_sched_node_fini(&dummy->sched);
> kfree(dummy);
> }
>
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index 08f0cab02e0f..0d35af07867b 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -76,28 +76,27 @@ static void mock_ring_free(struct intel_ring *base)
> kfree(ring);
> }
>
> -static struct mock_request *first_request(struct mock_engine *engine)
> +static struct i915_request *first_request(struct mock_engine *engine)
> {
> return list_first_entry_or_null(&engine->hw_queue,
> - struct mock_request,
> - link);
> + struct i915_request,
> + mock.link);
> }
>
> -static void advance(struct mock_request *request)
> +static void advance(struct i915_request *request)
> {
> - list_del_init(&request->link);
> - intel_engine_write_global_seqno(request->base.engine,
> - request->base.global_seqno);
> - i915_request_mark_complete(&request->base);
> - GEM_BUG_ON(!i915_request_completed(&request->base));
> + list_del_init(&request->mock.link);
> + intel_engine_write_global_seqno(request->engine, request->global_seqno);
> + i915_request_mark_complete(request);
> + GEM_BUG_ON(!i915_request_completed(request));
>
> - intel_engine_queue_breadcrumbs(request->base.engine);
> + intel_engine_queue_breadcrumbs(request->engine);
> }
>
> static void hw_delay_complete(struct timer_list *t)
> {
> struct mock_engine *engine = from_timer(engine, t, hw_delay);
> - struct mock_request *request;
> + struct i915_request *request;
> unsigned long flags;
>
> spin_lock_irqsave(&engine->hw_lock, flags);
> @@ -112,8 +111,9 @@ static void hw_delay_complete(struct timer_list *t)
> * requeue the timer for the next delayed request.
> */
> while ((request = first_request(engine))) {
> - if (request->delay) {
> - mod_timer(&engine->hw_delay, jiffies + request->delay);
> + if (request->mock.delay) {
> + mod_timer(&engine->hw_delay,
> + jiffies + request->mock.delay);
> break;
> }
>
> @@ -171,10 +171,8 @@ mock_context_pin(struct intel_engine_cs *engine,
>
> static int mock_request_alloc(struct i915_request *request)
> {
> - struct mock_request *mock = container_of(request, typeof(*mock), base);
> -
> - INIT_LIST_HEAD(&mock->link);
> - mock->delay = 0;
> + INIT_LIST_HEAD(&request->mock.link);
> + request->mock.delay = 0;
>
> return 0;
> }
> @@ -192,7 +190,6 @@ static u32 *mock_emit_breadcrumb(struct i915_request *request, u32 *cs)
>
> static void mock_submit_request(struct i915_request *request)
> {
> - struct mock_request *mock = container_of(request, typeof(*mock), base);
> struct mock_engine *engine =
> container_of(request->engine, typeof(*engine), base);
> unsigned long flags;
> @@ -201,12 +198,13 @@ static void mock_submit_request(struct i915_request *request)
> GEM_BUG_ON(!request->global_seqno);
>
> spin_lock_irqsave(&engine->hw_lock, flags);
> - list_add_tail(&mock->link, &engine->hw_queue);
> - if (mock->link.prev == &engine->hw_queue) {
> - if (mock->delay)
> - mod_timer(&engine->hw_delay, jiffies + mock->delay);
> + list_add_tail(&request->mock.link, &engine->hw_queue);
> + if (list_is_first(&request->mock.link, &engine->hw_queue)) {
> + if (request->mock.delay)
> + mod_timer(&engine->hw_delay,
> + jiffies + request->mock.delay);
> else
> - advance(mock);
> + advance(request);
> }
> spin_unlock_irqrestore(&engine->hw_lock, flags);
> }
> @@ -266,12 +264,12 @@ void mock_engine_flush(struct intel_engine_cs *engine)
> {
> struct mock_engine *mock =
> container_of(engine, typeof(*mock), base);
> - struct mock_request *request, *rn;
> + struct i915_request *request, *rn;
>
> del_timer_sync(&mock->hw_delay);
>
> spin_lock_irq(&mock->hw_lock);
> - list_for_each_entry_safe(request, rn, &mock->hw_queue, link)
> + list_for_each_entry_safe(request, rn, &mock->hw_queue, mock.link)
> advance(request);
> spin_unlock_irq(&mock->hw_lock);
> }
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index 074a0d9cbf26..17915a2d94fa 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -79,9 +79,6 @@ static void mock_device_release(struct drm_device *dev)
>
> destroy_workqueue(i915->wq);
>
> - kmem_cache_destroy(i915->priorities);
> - kmem_cache_destroy(i915->dependencies);
> - kmem_cache_destroy(i915->requests);
> kmem_cache_destroy(i915->vmas);
> kmem_cache_destroy(i915->objects);
>
> @@ -211,23 +208,6 @@ struct drm_i915_private *mock_gem_device(void)
> if (!i915->vmas)
> goto err_objects;
>
> - i915->requests = KMEM_CACHE(mock_request,
> - SLAB_HWCACHE_ALIGN |
> - SLAB_RECLAIM_ACCOUNT |
> - SLAB_TYPESAFE_BY_RCU);
> - if (!i915->requests)
> - goto err_vmas;
> -
> - i915->dependencies = KMEM_CACHE(i915_dependency,
> - SLAB_HWCACHE_ALIGN |
> - SLAB_RECLAIM_ACCOUNT);
> - if (!i915->dependencies)
> - goto err_requests;
> -
> - i915->priorities = KMEM_CACHE(i915_priolist, SLAB_HWCACHE_ALIGN);
> - if (!i915->priorities)
> - goto err_dependencies;
> -
> i915_timelines_init(i915);
>
> INIT_LIST_HEAD(&i915->gt.active_rings);
> @@ -257,12 +237,6 @@ struct drm_i915_private *mock_gem_device(void)
> err_unlock:
> mutex_unlock(&i915->drm.struct_mutex);
> i915_timelines_fini(i915);
> - kmem_cache_destroy(i915->priorities);
> -err_dependencies:
> - kmem_cache_destroy(i915->dependencies);
> -err_requests:
> - kmem_cache_destroy(i915->requests);
> -err_vmas:
> kmem_cache_destroy(i915->vmas);
> err_objects:
> kmem_cache_destroy(i915->objects);
> diff --git a/drivers/gpu/drm/i915/selftests/mock_request.c b/drivers/gpu/drm/i915/selftests/mock_request.c
> index 0dc29e242597..d1a7c9608712 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_request.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_request.c
> @@ -31,29 +31,25 @@ mock_request(struct intel_engine_cs *engine,
> unsigned long delay)
> {
> struct i915_request *request;
> - struct mock_request *mock;
>
> /* NB the i915->requests slab cache is enlarged to fit mock_request */
> request = i915_request_alloc(engine, context);
> if (IS_ERR(request))
> return NULL;
>
> - mock = container_of(request, typeof(*mock), base);
> - mock->delay = delay;
> -
> - return &mock->base;
> + request->mock.delay = delay;
> + return request;
> }
>
> bool mock_cancel_request(struct i915_request *request)
> {
> - struct mock_request *mock = container_of(request, typeof(*mock), base);
> struct mock_engine *engine =
> container_of(request->engine, typeof(*engine), base);
> bool was_queued;
>
> spin_lock_irq(&engine->hw_lock);
> - was_queued = !list_empty(&mock->link);
> - list_del_init(&mock->link);
> + was_queued = !list_empty(&request->mock.link);
> + list_del_init(&request->mock.link);
> spin_unlock_irq(&engine->hw_lock);
>
> if (was_queued)
> diff --git a/drivers/gpu/drm/i915/selftests/mock_request.h b/drivers/gpu/drm/i915/selftests/mock_request.h
> index 995fb728380c..4acf0211df20 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_request.h
> +++ b/drivers/gpu/drm/i915/selftests/mock_request.h
> @@ -29,13 +29,6 @@
>
> #include "../i915_request.h"
>
> -struct mock_request {
> - struct i915_request base;
> -
> - struct list_head link;
> - unsigned long delay;
> -};
> -
> struct i915_request *
> mock_request(struct intel_engine_cs *engine,
> struct i915_gem_context *context,
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 45+ messages in thread

* Re: [PATCH 15/22] drm/i915: Make request allocation caches global
2019-02-04 18:48 ` Tvrtko Ursulin
@ 2019-02-04 21:26 ` Chris Wilson
0 siblings, 0 replies; 45+ messages in thread
From: Chris Wilson @ 2019-02-04 21:26 UTC (permalink / raw)
To: Tvrtko Ursulin, intel-gfx
Quoting Tvrtko Ursulin (2019-02-04 18:48:50)
>
> On 04/02/2019 13:22, Chris Wilson wrote:
> > -int __init i915_global_active_init(void)
> > +int i915_global_active_init(void)
>
> Can't these remain __init, since they are only called from the global
> __init one?
I ran into problems, and removed __init until it stopped complaining and
I stopped caring.
> > @@ -2916,12 +2917,11 @@ static void shrink_caches(struct drm_i915_private *i915)
> > * filled slabs to prioritise allocating from the mostly full slabs,
> > * with the aim of reducing fragmentation.
> > */
> > - kmem_cache_shrink(i915->priorities);
> > - kmem_cache_shrink(i915->dependencies);
> > - kmem_cache_shrink(i915->requests);
> > kmem_cache_shrink(i915->luts);
> > kmem_cache_shrink(i915->vmas);
> > kmem_cache_shrink(i915->objects);
> > +
> > + i915_globals_shrink();
>
> This is the main bit which worries me.
>
> Global caches are what we want, I think, exactly for the reasons you give
> in the commit message. But would one device going idle have the potential
> to inject some latency into another, potentially very busy, client?
>
> Perhaps we could have some sort of aggregated idle signal and defer
> shrinking the caches to that point. Like a bitmask of global clients
> reporting their idle/active status to the global core, and then the
> shrink happens only if all of them are idle.
I had mixed feelings too. I didn't want to completely discard the
current logic, but this should be shrinking only when idle across all
future stakeholders... or else we demonstrate that shrinking has no
effect on concurrent allocation latency.

An active counter for unparking seems an easy way out. (Which today is
this imaginary 1-bit counter.) What I thought saved this was that it is
done from a post-RCU worker, so the system has to be pretty stable
before we start shrinking; we only clash with the first user to wake up.

Inside kmem_cache_shrink() there is a loop over each cpu, removing the
local cpu cache and then shrinking the global slab cache. That is
clearly going to increase latency for a concurrent caller.
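The counter version would be something as dumb as this sketch (none of
it exists yet):

static atomic_t active; /* number of unparked devices */

void i915_globals_unpark(void)
{
	atomic_inc(&active);
}

void i915_globals_park(void)
{
	if (atomic_dec_and_test(&active))
		i915_globals_shrink(); /* deferred to a worker in practice */
}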
> > +int i915_global_scheduler_init(void)
> > +{
> > + global.slab_dependencies = KMEM_CACHE(i915_dependency,
> > + SLAB_HWCACHE_ALIGN);
> > + if (!global.slab_dependencies)
> > + return -ENOMEM;
>
> Right, so this slab is duplicated. It could end up merged by the core,
> but I am thinking if this is the direction we want to go just to avoid
> some pointer chasing.
"some pointer chasing" :)
The slab isn't necessary duplicated, that depends on compiletime policy.
In debug environments or those more sensitive to performance, it will be
private so that we can catch stray writes and what not.
add/remove: 11/0 grow/shrink: 11/20 up/down: 595/-668 (-73)
Function                                  old     new   delta
i915_global_request_init                    -     116    +116
i915_global_scheduler_init                  -     111    +111
igt_mock_ppgtt_misaligned_dma             679     748     +69
i915_globals_init                           -      53     +53
global                                      8      40     +32
i915_global_scheduler_shrink                -      29     +29
i915_global_scheduler_exit                  -      29     +29
i915_global_request_shrink                  -      29     +29
i915_global_request_exit                    -      29     +29
i915_globals_shrink                         -      20     +20
__i915_priolist_free                        -      20     +20
i915_global_active_shrink                   -      17     +17
i915_globals_exit                           -      15     +15
live_suppress_wait_preempt.part.cold      202     211      +9
__err_print_to_sgl                       4175    4181      +6
i915_global_active_exit                    12      17      +5
intel_engine_lookup_user                   54      55      +1
init_module                                88      89      +1
igt_mock_ppgtt_misaligned_dma.cold        246     247      +1
i915_init                                  88      89      +1
gen11_irq_handler                         733     734      +1
g4x_pre_enable_dp                         345     346      +1
ring_request_alloc                       1899    1898      -1
live_suppress_wait_preempt.part          1291    1290      -1
i915_sched_lookup_priolist                482     479      -3
i915_request_retire                      1377    1373      -4
i915_request_await_dma_fence              547     543      -4
i915_fence_release                         45      41      -4
__execlists_submission_tasklet           2121    2111     -10
i915_request_alloc                        817     806     -11
i915_request_alloc_slow.isra               76      64     -12
i915_sched_node_add_dependency            114     101     -13
execlists_cancel_requests                 690     676     -14
i915_sched_node_fini                      459     444     -15
guc_submission_tasklet                   1931    1916     -15
__sleep_work                              106      75     -31
mock_device_release                       407     371     -36
igt_mock_ppgtt_huge_fill                 1108    1069     -39
i915_gem_cleanup_early                    213     173     -40
igt_mock_ppgtt_huge_fill.cold             611     531     -80
mock_gem_device                          1263    1102    -161
i915_gem_init_early                       838     664    -174
__i915_priolist_free is the nasty one, but that is only hit for
!I915_PRIORITY_NORMAL, so I considered it not worth inlining.

(I have no idea what the compiler thinks changed in half of those
functions.)
> You wouldn't consider i915->global->slab_dependencies, or something along
> those lines?

I think for bits and bobs that are true globals, like allocators, global
variables do make sense. We either end up with pointers to a singleton,
or we just link them directly into the code. And I like the idea of easy
wins.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH 16/22] drm/i915: Add timeline barrier support
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (14 preceding siblings ...)
2019-02-04 13:22 ` [PATCH 15/22] drm/i915: Make request allocation caches global Chris Wilson
@ 2019-02-04 13:22 ` Chris Wilson
2019-02-04 13:22 ` [PATCH 17/22] drm/i915: Pull i915_gem_active into the i915_active family Chris Wilson
` (12 subsequent siblings)
28 siblings, 0 replies; 45+ messages in thread
From: Chris Wilson @ 2019-02-04 13:22 UTC (permalink / raw)
To: intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
A timeline barrier allows serialization between different timelines.
After calling i915_timeline_set_barrier with a request, all subsequent
submissions on that timeline will be set up as depending on this request,
or barrier. Once the barrier has been completed, it automatically gets
cleared and things continue as normal.

This facility will be used by the upcoming context SSEU code.
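For illustration, usage is expected to look something like the sketch
below (gate_timeline is a hypothetical caller; error handling trimmed):

static int gate_timeline(struct i915_timeline *tl, struct i915_request *rq)
{
	int err;

	/* the barrier tracking is guarded by struct_mutex */
	lockdep_assert_held(&rq->i915->drm.struct_mutex);

	err = i915_timeline_set_barrier(tl, rq);
	if (err)
		return err;

	/*
	 * Requests allocated on @tl from here on will not be submitted
	 * to the GPU until @rq has completed; once @rq retires, the
	 * barrier clears itself.
	 */
	return 0;
}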
v2:
* Assert barrier has been retired on timeline_fini. (Chris Wilson)
* Fix mock_timeline.
v3:
* Improved comment language. (Chris Wilson)
v4:
* Maintain ordering with previous barriers set on the timeline.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_request.c | 17 ++++++++++++++
drivers/gpu/drm/i915/i915_timeline.c | 21 ++++++++++++++++++
drivers/gpu/drm/i915/i915_timeline.h | 22 +++++++++++++++++++
.../gpu/drm/i915/selftests/mock_timeline.c | 1 +
4 files changed, 61 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 3bb4840ba761..f5b2c95125ba 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -543,6 +543,19 @@ i915_request_alloc_slow(struct intel_context *ce)
return kmem_cache_alloc(global.slab_requests, GFP_KERNEL);
}
+static int add_barrier(struct i915_request *rq, struct i915_gem_active *active)
+{
+ struct i915_request *barrier =
+ i915_gem_active_raw(active, &rq->i915->drm.struct_mutex);
+
+ return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
+}
+
+static int add_timeline_barrier(struct i915_request *rq)
+{
+ return add_barrier(rq, &rq->timeline->barrier);
+}
+
/**
* i915_request_alloc - allocate a request structure
*
@@ -685,6 +698,10 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
*/
rq->head = rq->ring->emit;
+ ret = add_timeline_barrier(rq);
+ if (ret)
+ goto err_unwind;
+
ret = engine->request_alloc(rq);
if (ret)
goto err_unwind;
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 5ea3af393ffe..b354843a5040 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -163,6 +163,7 @@ int i915_timeline_init(struct drm_i915_private *i915,
spin_lock_init(&timeline->lock);
+ init_request_active(&timeline->barrier, NULL);
init_request_active(&timeline->last_request, NULL);
INIT_LIST_HEAD(&timeline->requests);
@@ -235,6 +236,7 @@ void i915_timeline_fini(struct i915_timeline *timeline)
{
GEM_BUG_ON(timeline->pin_count);
GEM_BUG_ON(!list_empty(&timeline->requests));
+ GEM_BUG_ON(i915_gem_active_isset(&timeline->barrier));
i915_syncmap_free(&timeline->sync);
hwsp_free(timeline);
@@ -266,6 +268,25 @@ i915_timeline_create(struct drm_i915_private *i915,
return timeline;
}
+int i915_timeline_set_barrier(struct i915_timeline *tl, struct i915_request *rq)
+{
+ struct i915_request *old;
+ int err;
+
+ lockdep_assert_held(&rq->i915->drm.struct_mutex);
+
+ /* Must maintain ordering wrt existing barriers */
+ old = i915_gem_active_raw(&tl->barrier, &rq->i915->drm.struct_mutex);
+ if (old) {
+ err = i915_request_await_dma_fence(rq, &old->fence);
+ if (err)
+ return err;
+ }
+
+ i915_gem_active_set(&tl->barrier, rq);
+ return 0;
+}
+
int i915_timeline_pin(struct i915_timeline *tl)
{
int err;
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 8caeb66d1cd5..d167e04073c5 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -74,6 +74,16 @@ struct i915_timeline {
*/
struct i915_syncmap *sync;
+ /**
+ * Barrier provides the ability to serialize ordering between different
+ * timelines.
+ *
+ * Users can call i915_timeline_set_barrier which will make all
+ * subsequent submissions to this timeline be executed only after the
+ * barrier has been completed.
+ */
+ struct i915_gem_active barrier;
+
struct list_head link;
const char *name;
struct drm_i915_private *i915;
@@ -155,4 +165,16 @@ void i915_timelines_init(struct drm_i915_private *i915);
void i915_timelines_park(struct drm_i915_private *i915);
void i915_timelines_fini(struct drm_i915_private *i915);
+/**
+ * i915_timeline_set_barrier - orders submission between different timelines
+ * @timeline: timeline to set the barrier on
+ * @rq: request after which new submissions can proceed
+ *
+ * Sets the passed in request as the serialization point for all subsequent
+ * submissions on @timeline. Subsequent requests will not be submitted to GPU
+ * until the barrier has been completed.
+ */
+int i915_timeline_set_barrier(struct i915_timeline *timeline,
+ struct i915_request *rq);
+
#endif
diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
index cf39ccd9fc05..e5659aaa856d 100644
--- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
@@ -15,6 +15,7 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
spin_lock_init(&timeline->lock);
+ init_request_active(&timeline->barrier, NULL);
init_request_active(&timeline->last_request, NULL);
INIT_LIST_HEAD(&timeline->requests);
--
2.20.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 45+ messages in thread

* [PATCH 17/22] drm/i915: Pull i915_gem_active into the i915_active family
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (15 preceding siblings ...)
2019-02-04 13:22 ` [PATCH 16/22] drm/i915: Add timeline barrier support Chris Wilson
@ 2019-02-04 13:22 ` Chris Wilson
2019-02-04 18:50 ` Tvrtko Ursulin
2019-02-04 13:22 ` [PATCH 18/22] drm/i915: Keep timeline HWSP allocated until idle across the system Chris Wilson
` (11 subsequent siblings)
28 siblings, 1 reply; 45+ messages in thread
From: Chris Wilson @ 2019-02-04 13:22 UTC (permalink / raw)
To: intel-gfx
Looking forward, we need to break the struct_mutex dependency on
i915_gem_active. In the meantime, external use of i915_gem_active is
quite beguiling, little do new users suspect that it implies a barrier
as each request it tracks must be ordered wrt the previous one. As one
of many, it can be used to track activity across multiple timelines, a
shared fence, which fits our unordered request submission much better. We
need to steer external users away from the singular, exclusive fence
imposed by i915_gem_active to i915_active instead. As part of that
process, we move i915_gem_active out of i915_request.c into
i915_active.c to start separating the two concepts, and rename it to
i915_active_request (both to tie it to the concept of tracking just one
request, and to give it a longer, less appealing name).
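To illustrate the difference, consider a hypothetical helper tracking
two requests from distinct timelines (struct_mutex held throughout):

static int track_two(struct i915_active_request *excl,
		     struct i915_active *shared,
		     struct i915_request *rq_a,
		     struct i915_request *rq_b)
{
	int err;

	/* The singular tracker orders rq_b behind rq_a... */
	err = i915_active_request_set(excl, rq_a);
	if (err)
		return err;
	err = i915_active_request_set(excl, rq_b); /* awaits rq_a first! */
	if (err)
		return err;

	/* ...whereas i915_active tracks both, imposing no order. */
	err = i915_active_ref(shared, rq_a->fence.context, rq_a);
	if (err)
		return err;
	return i915_active_ref(shared, rq_b->fence.context, rq_b);
}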
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_active.c | 62 ++-
drivers/gpu/drm/i915/i915_active.h | 349 ++++++++++++++++
drivers/gpu/drm/i915/i915_active_types.h | 16 +-
drivers/gpu/drm/i915/i915_debugfs.c | 2 +-
drivers/gpu/drm/i915/i915_gem.c | 10 +-
drivers/gpu/drm/i915/i915_gem_context.c | 4 +-
drivers/gpu/drm/i915/i915_gem_fence_reg.c | 4 +-
drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
drivers/gpu/drm/i915/i915_gem_object.h | 2 +-
drivers/gpu/drm/i915/i915_gpu_error.c | 10 +-
drivers/gpu/drm/i915/i915_request.c | 35 +-
drivers/gpu/drm/i915/i915_request.h | 383 ------------------
drivers/gpu/drm/i915/i915_reset.c | 2 +-
drivers/gpu/drm/i915/i915_timeline.c | 25 +-
drivers/gpu/drm/i915/i915_timeline.h | 14 +-
drivers/gpu/drm/i915/i915_vma.c | 12 +-
drivers/gpu/drm/i915/i915_vma.h | 2 +-
drivers/gpu/drm/i915/intel_engine_cs.c | 2 +-
drivers/gpu/drm/i915/intel_overlay.c | 33 +-
drivers/gpu/drm/i915/selftests/intel_lrc.c | 4 +-
.../gpu/drm/i915/selftests/mock_timeline.c | 4 +-
21 files changed, 474 insertions(+), 503 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index d23092d8c89f..846900535d10 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -21,7 +21,7 @@ static struct i915_global_active {
} global;
struct active_node {
- struct i915_gem_active base;
+ struct i915_active_request base;
struct i915_active *ref;
struct rb_node node;
u64 timeline;
@@ -33,7 +33,7 @@ __active_park(struct i915_active *ref)
struct active_node *it, *n;
rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
- GEM_BUG_ON(i915_gem_active_isset(&it->base));
+ GEM_BUG_ON(i915_active_request_isset(&it->base));
kmem_cache_free(global.slab_cache, it);
}
ref->tree = RB_ROOT;
@@ -53,18 +53,18 @@ __active_retire(struct i915_active *ref)
}
static void
-node_retire(struct i915_gem_active *base, struct i915_request *rq)
+node_retire(struct i915_active_request *base, struct i915_request *rq)
{
__active_retire(container_of(base, struct active_node, base)->ref);
}
static void
-last_retire(struct i915_gem_active *base, struct i915_request *rq)
+last_retire(struct i915_active_request *base, struct i915_request *rq)
{
__active_retire(container_of(base, struct i915_active, last));
}
-static struct i915_gem_active *
+static struct i915_active_request *
active_instance(struct i915_active *ref, u64 idx)
{
struct active_node *node;
@@ -85,7 +85,7 @@ active_instance(struct i915_active *ref, u64 idx)
* twice for the same timeline (as the older rbtree element will be
* retired before the new request added to last).
*/
- old = i915_gem_active_raw(&ref->last, BKL(ref));
+ old = i915_active_request_raw(&ref->last, BKL(ref));
if (!old || old->fence.context == idx)
goto out;
@@ -110,7 +110,7 @@ active_instance(struct i915_active *ref, u64 idx)
node = kmem_cache_alloc(global.slab_cache, GFP_KERNEL);
/* kmalloc may retire the ref->last (thanks shrinker)! */
- if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) {
+ if (unlikely(!i915_active_request_raw(&ref->last, BKL(ref)))) {
kmem_cache_free(global.slab_cache, node);
goto out;
}
@@ -118,7 +118,7 @@ active_instance(struct i915_active *ref, u64 idx)
if (unlikely(!node))
return ERR_PTR(-ENOMEM);
- init_request_active(&node->base, node_retire);
+ i915_active_request_init(&node->base, NULL, node_retire);
node->ref = ref;
node->timeline = idx;
@@ -133,7 +133,7 @@ active_instance(struct i915_active *ref, u64 idx)
* callback not two, and so much undo the active counting for the
* overwritten slot.
*/
- if (i915_gem_active_isset(&node->base)) {
+ if (i915_active_request_isset(&node->base)) {
/* Retire ourselves from the old rq->active_list */
__list_del_entry(&node->base.link);
ref->count--;
@@ -154,7 +154,7 @@ void i915_active_init(struct drm_i915_private *i915,
ref->i915 = i915;
ref->retire = retire;
ref->tree = RB_ROOT;
- init_request_active(&ref->last, last_retire);
+ i915_active_request_init(&ref->last, NULL, last_retire);
ref->count = 0;
}
@@ -162,15 +162,15 @@ int i915_active_ref(struct i915_active *ref,
u64 timeline,
struct i915_request *rq)
{
- struct i915_gem_active *active;
+ struct i915_active_request *active;
active = active_instance(ref, timeline);
if (IS_ERR(active))
return PTR_ERR(active);
- if (!i915_gem_active_isset(active))
+ if (!i915_active_request_isset(active))
ref->count++;
- i915_gem_active_set(active, rq);
+ __i915_active_request_set(active, rq);
GEM_BUG_ON(!ref->count);
return 0;
@@ -196,12 +196,12 @@ int i915_active_wait(struct i915_active *ref)
if (i915_active_acquire(ref))
goto out_release;
- ret = i915_gem_active_retire(&ref->last, BKL(ref));
+ ret = i915_active_request_retire(&ref->last, BKL(ref));
if (ret)
goto out_release;
rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
- ret = i915_gem_active_retire(&it->base, BKL(ref));
+ ret = i915_active_request_retire(&it->base, BKL(ref));
if (ret)
break;
}
@@ -211,11 +211,11 @@ int i915_active_wait(struct i915_active *ref)
return ret;
}
-static int __i915_request_await_active(struct i915_request *rq,
- struct i915_gem_active *active)
+int i915_request_await_active_request(struct i915_request *rq,
+ struct i915_active_request *active)
{
struct i915_request *barrier =
- i915_gem_active_raw(active, &rq->i915->drm.struct_mutex);
+ i915_active_request_raw(active, &rq->i915->drm.struct_mutex);
return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
}
@@ -225,12 +225,12 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
struct active_node *it, *n;
int ret;
- ret = __i915_request_await_active(rq, &ref->last);
+ ret = i915_request_await_active_request(rq, &ref->last);
if (ret)
return ret;
rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
- ret = __i915_request_await_active(rq, &it->base);
+ ret = i915_request_await_active_request(rq, &it->base);
if (ret)
return ret;
}
@@ -241,12 +241,32 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
void i915_active_fini(struct i915_active *ref)
{
- GEM_BUG_ON(i915_gem_active_isset(&ref->last));
+ GEM_BUG_ON(i915_active_request_isset(&ref->last));
GEM_BUG_ON(!RB_EMPTY_ROOT(&ref->tree));
GEM_BUG_ON(ref->count);
}
#endif
+int i915_active_request_set(struct i915_active_request *active,
+ struct i915_request *rq)
+{
+ int err;
+
+ /* Must maintain ordering wrt previous active requests */
+ err = i915_request_await_active_request(rq, active);
+ if (err)
+ return err;
+
+ __i915_active_request_set(active, rq);
+ return 0;
+}
+
+void i915_active_retire_noop(struct i915_active_request *active,
+ struct i915_request *request)
+{
+ /* Space left intentionally blank */
+}
+
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
#include "selftests/i915_active.c"
#endif
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index 6c56d10b1f59..5fbd9102384b 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -7,7 +7,354 @@
#ifndef _I915_ACTIVE_H_
#define _I915_ACTIVE_H_
+#include <linux/lockdep.h>
+
#include "i915_active_types.h"
+#include "i915_request.h"
+
+/*
+ * We treat requests as fences. This is not to be confused with our
+ * "fence registers" but pipeline synchronisation objects ala GL_ARB_sync.
+ * We use the fences to synchronize access from the CPU with activity on the
+ * GPU, for example, we should not rewrite an object's PTE whilst the GPU
+ * is reading them. We also track fences at a higher level to provide
+ * implicit synchronisation around GEM objects, e.g. set-domain will wait
+ * for outstanding GPU rendering before marking the object ready for CPU
+ * access, or a pageflip will wait until the GPU is complete before showing
+ * the frame on the scanout.
+ *
+ * In order to use a fence, the object must track the fence it needs to
+ * serialise with. For example, GEM objects want to track both read and
+ * write access so that we can perform concurrent read operations between
+ * the CPU and GPU engines, as well as waiting for all rendering to
+ * complete, or waiting for the last GPU user of a "fence register". The
+ * object then embeds a #i915_active_request to track the most recent (in
+ * retirement order) request relevant for the desired mode of access.
+ * The #i915_active_request is updated with i915_active_request_set() to
+ * track the most recent fence request, typically this is done as part of
+ * i915_vma_move_to_active().
+ *
+ * When the #i915_active_request completes (is retired), it will
+ * signal its completion to the owner through a callback as well as mark
+ * itself as idle (i915_active_request.request == NULL). The owner
+ * can then perform any action, such as delayed freeing of an active
+ * resource including itself.
+ */
+
+void i915_active_retire_noop(struct i915_active_request *active,
+ struct i915_request *request);
+
+/**
+ * i915_active_request_init - prepares the activity tracker for use
+ * @active - the active tracker
+ * @rq - initial request to track, can be NULL
+ * @retire - a callback invoked when the tracker is retired (becomes idle),
+ * can be NULL
+ *
+ * i915_active_request_init() prepares the embedded @active struct for use as
+ * an activity tracker, that is for tracking the last known active request
+ * associated with it. When the last request becomes idle, when it is retired
+ * after completion, the optional callback @func is invoked.
+ */
+static inline void
+i915_active_request_init(struct i915_active_request *active,
+ struct i915_request *rq,
+ i915_active_retire_fn retire)
+{
+ RCU_INIT_POINTER(active->request, rq);
+ INIT_LIST_HEAD(&active->link);
+ active->retire = retire ?: i915_active_retire_noop;
+}
+
+#define INIT_ACTIVE_REQUEST(name) i915_active_request_init((name), NULL, NULL)
+
+/**
+ * __i915_active_request_set - updates the tracker to watch the current request
+ * @active - the active tracker
+ * @request - the request to watch
+ *
+ * __i915_active_request_set() watches the given @request for completion. Whilst
+ * that @request is busy, the @active reports busy. When that @request is
+ * retired, the @active tracker is updated to report idle.
+ */
+static inline void
+__i915_active_request_set(struct i915_active_request *active,
+ struct i915_request *request)
+{
+ list_move(&active->link, &request->active_list);
+ rcu_assign_pointer(active->request, request);
+}
+
+int __must_check
+i915_active_request_set(struct i915_active_request *active,
+ struct i915_request *rq);
+
+/**
+ * i915_active_request_set_retire_fn - updates the retirement callback
+ * @active - the active tracker
+ * @fn - the routine called when the request is retired
+ * @mutex - struct_mutex used to guard retirements
+ *
+ * i915_active_request_set_retire_fn() updates the function pointer that
+ * is called when the final request associated with the @active tracker
+ * is retired.
+ */
+static inline void
+i915_active_request_set_retire_fn(struct i915_active_request *active,
+ i915_active_retire_fn fn,
+ struct mutex *mutex)
+{
+ lockdep_assert_held(mutex);
+ active->retire = fn ?: i915_active_retire_noop;
+}
+
+static inline struct i915_request *
+__i915_active_request_peek(const struct i915_active_request *active)
+{
+ /*
+ * Inside the error capture (running with the driver in an unknown
+ * state), we want to bend the rules slightly (a lot).
+ *
+ * Work is in progress to make it safer, in the meantime this keeps
+ * the known issue from spamming the logs.
+ */
+ return rcu_dereference_protected(active->request, 1);
+}
+
+/**
+ * i915_active_request_raw - return the active request
+ * @active - the active tracker
+ *
+ * i915_active_request_raw() returns the current request being tracked, or NULL.
+ * It does not obtain a reference on the request for the caller, so the caller
+ * must hold struct_mutex.
+ */
+static inline struct i915_request *
+i915_active_request_raw(const struct i915_active_request *active,
+ struct mutex *mutex)
+{
+ return rcu_dereference_protected(active->request,
+ lockdep_is_held(mutex));
+}
+
+/**
+ * i915_active_request_peek - report the active request being monitored
+ * @active - the active tracker
+ *
+ * i915_active_request_peek() returns the current request being tracked if
+ * still active, or NULL. It does not obtain a reference on the request
+ * for the caller, so the caller must hold struct_mutex.
+ */
+static inline struct i915_request *
+i915_active_request_peek(const struct i915_active_request *active,
+ struct mutex *mutex)
+{
+ struct i915_request *request;
+
+ request = i915_active_request_raw(active, mutex);
+ if (!request || i915_request_completed(request))
+ return NULL;
+
+ return request;
+}
+
+/**
+ * i915_active_request_get - return a reference to the active request
+ * @active - the active tracker
+ *
+ * i915_active_request_get() returns a reference to the active request, or NULL
+ * if the active tracker is idle. The caller must hold struct_mutex.
+ */
+static inline struct i915_request *
+i915_active_request_get(const struct i915_active_request *active,
+ struct mutex *mutex)
+{
+ return i915_request_get(i915_active_request_peek(active, mutex));
+}
+
+/**
+ * __i915_active_request_get_rcu - return a reference to the active request
+ * @active - the active tracker
+ *
+ * __i915_active_request_get_rcu() returns a reference to the active request,
+ * or NULL if the active tracker is idle. The caller must hold the RCU read
+ * lock, but the returned pointer is safe to use outside of RCU.
+ */
+static inline struct i915_request *
+__i915_active_request_get_rcu(const struct i915_active_request *active)
+{
+ /*
+ * Performing a lockless retrieval of the active request is super
+ * tricky. SLAB_TYPESAFE_BY_RCU merely guarantees that the backing
+ * slab of request objects will not be freed whilst we hold the
+ * RCU read lock. It does not guarantee that the request itself
+ * will not be freed and then *reused*. Viz,
+ *
+ * Thread A Thread B
+ *
+ * rq = active.request
+ * retire(rq) -> free(rq);
+ * (rq is now first on the slab freelist)
+ * active.request = NULL
+ *
+ * rq = new submission on a new object
+ * ref(rq)
+ *
+ * To prevent the request from being reused whilst the caller
+ * uses it, we take a reference like normal. Whilst acquiring
+ * the reference we check that it is not in a destroyed state
+ * (refcnt == 0). That prevents the request being reallocated
+ * whilst the caller holds on to it. To check that the request
+ * was not reallocated as we acquired the reference we have to
+ * check that our request remains the active request across
+ * the lookup, in the same manner as a seqlock. The visibility
+ * of the pointer versus the reference counting is controlled
+ * by using RCU barriers (rcu_dereference and rcu_assign_pointer).
+ *
+ * In the middle of all that, we inspect whether the request is
+ * complete. Retiring is lazy so the request may be completed long
+ * before the active tracker is updated. Querying whether the
+ * request is complete is far cheaper (as it involves no locked
+ * instructions setting cachelines to exclusive) than acquiring
+ * the reference, so we do it first. The RCU read lock ensures the
+ * pointer dereference is valid, but does not ensure that the
+ * seqno nor HWS is the right one! However, if the request was
+ * reallocated, that means the active tracker's request was complete.
+ * If the new request is also complete, then both are and we can
+ * just report the active tracker is idle. If the new request is
+ * incomplete, then we acquire a reference on it and check that
+ * it remained the active request.
+ *
+ * It is then imperative that we do not zero the request on
+ * reallocation, so that we can chase the dangling pointers!
+ * See i915_request_alloc().
+ */
+ do {
+ struct i915_request *request;
+
+ request = rcu_dereference(active->request);
+ if (!request || i915_request_completed(request))
+ return NULL;
+
+ /*
+ * An especially silly compiler could decide to recompute the
+ * result of i915_request_completed, more specifically
+ * re-emit the load for request->fence.seqno. A race would catch
+ * a later seqno value, which could flip the result from true to
+ * false. Which means part of the instructions below might not
+ * be executed, while later on instructions are executed. Due to
+ * barriers within the refcounting the inconsistency can't reach
+ * past the call to i915_request_get_rcu, but not executing
+ * that while still executing i915_request_put() creates
+ * havoc enough. Prevent this with a compiler barrier.
+ */
+ barrier();
+
+ request = i915_request_get_rcu(request);
+
+ /*
+ * What stops the following rcu_access_pointer() from occurring
+ * before the above i915_request_get_rcu()? If we were
+ * to read the value before pausing to get the reference to
+ * the request, we may not notice a change in the active
+ * tracker.
+ *
+ * The rcu_access_pointer() is a mere compiler barrier, which
+ * means both the CPU and compiler are free to perform the
+ * memory read without constraint. The compiler only has to
+ * ensure that any operations after the rcu_access_pointer()
+ * occur afterwards in program order. This means the read may
+ * be performed earlier by an out-of-order CPU, or adventurous
+ * compiler.
+ *
+ * The atomic operation at the heart of
+ * i915_request_get_rcu(), see dma_fence_get_rcu(), is
+ * atomic_inc_not_zero() which is only a full memory barrier
+ * when successful. That is, if i915_request_get_rcu()
+ * returns the request (and so with the reference counted
+ * incremented) then the following read for rcu_access_pointer()
+ * must occur after the atomic operation and so confirm
+ * that this request is the one currently being tracked.
+ *
+ * The corresponding write barrier is part of
+ * rcu_assign_pointer().
+ */
+ if (!request || request == rcu_access_pointer(active->request))
+ return rcu_pointer_handoff(request);
+
+ i915_request_put(request);
+ } while (1);
+}
+
+/**
+ * i915_active_request_get_unlocked - return a reference to the active request
+ * @active - the active tracker
+ *
+ * i915_active_request_get_unlocked() returns a reference to the active request,
+ * or NULL if the active tracker is idle. The reference is obtained under RCU,
+ * so no locking is required by the caller.
+ *
+ * The reference should be freed with i915_request_put().
+ */
+static inline struct i915_request *
+i915_active_request_get_unlocked(const struct i915_active_request *active)
+{
+ struct i915_request *request;
+
+ rcu_read_lock();
+ request = __i915_active_request_get_rcu(active);
+ rcu_read_unlock();
+
+ return request;
+}
+
+/**
+ * i915_active_request_isset - report whether the active tracker is assigned
+ * @active - the active tracker
+ *
+ * i915_active_request_isset() returns true if the active tracker is currently
+ * assigned to a request. Due to the lazy retiring, that request may be idle
+ * and this may report stale information.
+ */
+static inline bool
+i915_active_request_isset(const struct i915_active_request *active)
+{
+ return rcu_access_pointer(active->request);
+}
+
+/**
+ * i915_active_request_retire - waits until the request is retired
+ * @active - the active request on which to wait
+ *
+ * i915_active_request_retire() waits until the request is completed,
+ * and then ensures that at least the retirement handler for this
+ * @active tracker is called before returning. If the @active
+ * tracker is idle, the function returns immediately.
+ */
+static inline int __must_check
+i915_active_request_retire(struct i915_active_request *active,
+ struct mutex *mutex)
+{
+ struct i915_request *request;
+ long ret;
+
+ request = i915_active_request_raw(active, mutex);
+ if (!request)
+ return 0;
+
+ ret = i915_request_wait(request,
+ I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
+ MAX_SCHEDULE_TIMEOUT);
+ if (ret < 0)
+ return ret;
+
+ list_del_init(&active->link);
+ RCU_INIT_POINTER(active->request, NULL);
+
+ active->retire(active, request);
+
+ return 0;
+}
/*
* GPU activity tracking
@@ -47,6 +394,8 @@ int i915_active_wait(struct i915_active *ref);
int i915_request_await_active(struct i915_request *rq,
struct i915_active *ref);
+int i915_request_await_active_request(struct i915_request *rq,
+ struct i915_active_request *active);
bool i915_active_acquire(struct i915_active *ref);
diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h
index 411e502ed8dd..b679253b53a5 100644
--- a/drivers/gpu/drm/i915/i915_active_types.h
+++ b/drivers/gpu/drm/i915/i915_active_types.h
@@ -8,16 +8,26 @@
#define _I915_ACTIVE_TYPES_H_
#include <linux/rbtree.h>
-
-#include "i915_request.h"
+#include <linux/rcupdate.h>
struct drm_i915_private;
+struct i915_active_request;
+struct i915_request;
+
+typedef void (*i915_active_retire_fn)(struct i915_active_request *,
+ struct i915_request *);
+
+struct i915_active_request {
+ struct i915_request __rcu *request;
+ struct list_head link;
+ i915_active_retire_fn retire;
+};
struct i915_active {
struct drm_i915_private *i915;
struct rb_root tree;
- struct i915_gem_active last;
+ struct i915_active_request last;
unsigned int count;
void (*retire)(struct i915_active *ref);
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 54e426883529..a270af18404f 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -207,7 +207,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
if (vma->fence)
seq_printf(m, " , fence: %d%s",
vma->fence->id,
- i915_gem_active_isset(&vma->last_fence) ? "*" : "");
+ i915_active_request_isset(&vma->last_fence) ? "*" : "");
seq_puts(m, ")");
}
if (obj->stolen)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d82e4f990586..81aa37508bc4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2987,7 +2987,7 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
GEM_BUG_ON(i915->gt.active_requests);
for_each_engine(engine, i915, id) {
- GEM_BUG_ON(__i915_gem_active_peek(&engine->timeline.last_request));
+ GEM_BUG_ON(__i915_active_request_peek(&engine->timeline.last_request));
GEM_BUG_ON(engine->last_retired_context !=
to_intel_context(i915->kernel_context, engine));
}
@@ -3233,7 +3233,7 @@ wait_for_timelines(struct drm_i915_private *i915,
list_for_each_entry(tl, &gt->active_list, link) {
struct i915_request *rq;
- rq = i915_gem_active_get_unlocked(&tl->last_request);
+ rq = i915_active_request_get_unlocked(&tl->last_request);
if (!rq)
continue;
@@ -4134,7 +4134,8 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
}
static void
-frontbuffer_retire(struct i915_gem_active *active, struct i915_request *request)
+frontbuffer_retire(struct i915_active_request *active,
+ struct i915_request *request)
{
struct drm_i915_gem_object *obj =
container_of(active, typeof(*obj), frontbuffer_write);
@@ -4161,7 +4162,8 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
obj->resv = &obj->__builtin_resv;
obj->frontbuffer_ggtt_origin = ORIGIN_GTT;
- init_request_active(&obj->frontbuffer_write, frontbuffer_retire);
+ i915_active_request_init(&obj->frontbuffer_write,
+ NULL, frontbuffer_retire);
obj->mm.madv = I915_MADV_WILLNEED;
INIT_RADIX_TREE(&obj->mm.get_page.radix, GFP_KERNEL | __GFP_NOWARN);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 6faf1f6faab5..ea8e818d22bf 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -653,8 +653,8 @@ last_request_on_engine(struct i915_timeline *timeline,
GEM_BUG_ON(timeline == &engine->timeline);
- rq = i915_gem_active_raw(&timeline->last_request,
- &engine->i915->drm.struct_mutex);
+ rq = i915_active_request_raw(&timeline->last_request,
+ &engine->i915->drm.struct_mutex);
if (rq && rq->engine == engine) {
GEM_TRACE("last request for %s on engine %s: %llx:%llu\n",
timeline->name, engine->name,
diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index bd0d5b8d6c96..36d548fa3aa2 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -223,7 +223,7 @@ static int fence_update(struct drm_i915_fence_reg *fence,
i915_gem_object_get_tiling(vma->obj)))
return -EINVAL;
- ret = i915_gem_active_retire(&vma->last_fence,
+ ret = i915_active_request_retire(&vma->last_fence,
&vma->obj->base.dev->struct_mutex);
if (ret)
return ret;
@@ -232,7 +232,7 @@ static int fence_update(struct drm_i915_fence_reg *fence,
if (fence->vma) {
struct i915_vma *old = fence->vma;
- ret = i915_gem_active_retire(&old->last_fence,
+ ret = i915_active_request_retire(&old->last_fence,
&old->obj->base.dev->struct_mutex);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e625659c03a2..d646d37eec2f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1918,7 +1918,7 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
return ERR_PTR(-ENOMEM);
i915_active_init(i915, &vma->active, NULL);
- init_request_active(&vma->last_fence, NULL);
+ INIT_ACTIVE_REQUEST(&vma->last_fence);
vma->vm = &ggtt->vm;
vma->ops = &pd_vma_ops;
diff --git a/drivers/gpu/drm/i915/i915_gem_object.h b/drivers/gpu/drm/i915/i915_gem_object.h
index 73fec917d097..fab040331cdb 100644
--- a/drivers/gpu/drm/i915/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/i915_gem_object.h
@@ -175,7 +175,7 @@ struct drm_i915_gem_object {
atomic_t frontbuffer_bits;
unsigned int frontbuffer_ggtt_origin; /* write once */
- struct i915_gem_active frontbuffer_write;
+ struct i915_active_request frontbuffer_write;
/** Current tiling stride for the object, if it's tiled. */
unsigned int tiling_and_stride;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 6e2e5ed2bd0a..9a65341fec09 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1062,23 +1062,23 @@ i915_error_object_create(struct drm_i915_private *i915,
}
/* The error capture is special as it tries to run underneath the normal
- * locking rules - so we use the raw version of the i915_gem_active lookup.
+ * locking rules - so we use the raw version of the i915_active_request lookup.
*/
static inline u32
-__active_get_seqno(struct i915_gem_active *active)
+__active_get_seqno(struct i915_active_request *active)
{
struct i915_request *request;
- request = __i915_gem_active_peek(active);
+ request = __i915_active_request_peek(active);
return request ? request->global_seqno : 0;
}
static inline int
-__active_get_engine_id(struct i915_gem_active *active)
+__active_get_engine_id(struct i915_active_request *active)
{
struct i915_request *request;
- request = __i915_gem_active_peek(active);
+ request = __i915_active_request_peek(active);
return request ? request->engine->id : -1;
}
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f5b2c95125ba..ed9f16bca4fe 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -29,6 +29,7 @@
#include <linux/sched/signal.h>
#include "i915_drv.h"
+#include "i915_active.h"
#include "i915_reset.h"
static struct i915_global_request {
@@ -130,12 +131,6 @@ static void unreserve_gt(struct drm_i915_private *i915)
i915_gem_park(i915);
}
-void i915_gem_retire_noop(struct i915_gem_active *active,
- struct i915_request *request)
-{
- /* Space left intentionally blank */
-}
-
static void advance_ring(struct i915_request *request)
{
struct intel_ring *ring = request->ring;
@@ -249,7 +244,7 @@ static void __retire_engine_upto(struct intel_engine_cs *engine,
static void i915_request_retire(struct i915_request *request)
{
- struct i915_gem_active *active, *next;
+ struct i915_active_request *active, *next;
GEM_TRACE("%s fence %llx:%lld, global=%d, current %d:%d\n",
request->engine->name,
@@ -283,10 +278,10 @@ static void i915_request_retire(struct i915_request *request)
* we may spend an inordinate amount of time simply handling
* the retirement of requests and processing their callbacks.
* Of which, this loop itself is particularly hot due to the
- * cache misses when jumping around the list of i915_gem_active.
- * So we try to keep this loop as streamlined as possible and
- * also prefetch the next i915_gem_active to try and hide
- * the likely cache miss.
+ * cache misses when jumping around the list of
+ * i915_active_request. So we try to keep this loop as
+ * streamlined as possible and also prefetch the next
+ * i915_active_request to try and hide the likely cache miss.
*/
prefetchw(next);
@@ -543,17 +538,9 @@ i915_request_alloc_slow(struct intel_context *ce)
return kmem_cache_alloc(global.slab_requests, GFP_KERNEL);
}
-static int add_barrier(struct i915_request *rq, struct i915_gem_active *active)
-{
- struct i915_request *barrier =
- i915_gem_active_raw(active, &rq->i915->drm.struct_mutex);
-
- return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
-}
-
static int add_timeline_barrier(struct i915_request *rq)
{
- return add_barrier(rq, &rq->timeline->barrier);
+ return i915_request_await_active_request(rq, &rq->timeline->barrier);
}
/**
@@ -612,7 +599,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
* We use RCU to look up requests in flight. The lookups may
* race with the request being allocated from the slab freelist.
* That is, the request we are writing to here may be in the process
- * of being read by __i915_gem_active_get_rcu(). As such,
+ * of being read by __i915_active_request_get_rcu(). As such,
* we have to be very careful when overwriting the contents. During
* the RCU lookup, we chase the request->engine pointer,
* read the request->global_seqno and increment the reference count.
@@ -952,8 +939,8 @@ void i915_request_add(struct i915_request *request)
* see a more recent value in the hws than we are tracking.
*/
- prev = i915_gem_active_raw(&timeline->last_request,
- &request->i915->drm.struct_mutex);
+ prev = i915_active_request_raw(&timeline->last_request,
+ &request->i915->drm.struct_mutex);
if (prev && !i915_request_completed(prev)) {
i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
&request->submitq);
@@ -969,7 +956,7 @@ void i915_request_add(struct i915_request *request)
spin_unlock_irq(&timeline->lock);
GEM_BUG_ON(timeline->seqno != request->fence.seqno);
- i915_gem_active_set(&timeline->last_request, request);
+ __i915_active_request_set(&timeline->last_request, request);
list_add_tail(&request->ring_link, &ring->request_list);
if (list_is_first(&request->ring_link, &ring->request_list)) {
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 054bd300984b..071ff1064579 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -409,389 +409,6 @@ static inline void i915_request_mark_complete(struct i915_request *rq)
void i915_retire_requests(struct drm_i915_private *i915);
-/*
- * We treat requests as fences. This is not be to confused with our
- * "fence registers" but pipeline synchronisation objects ala GL_ARB_sync.
- * We use the fences to synchronize access from the CPU with activity on the
- * GPU, for example, we should not rewrite an object's PTE whilst the GPU
- * is reading them. We also track fences at a higher level to provide
- * implicit synchronisation around GEM objects, e.g. set-domain will wait
- * for outstanding GPU rendering before marking the object ready for CPU
- * access, or a pageflip will wait until the GPU is complete before showing
- * the frame on the scanout.
- *
- * In order to use a fence, the object must track the fence it needs to
- * serialise with. For example, GEM objects want to track both read and
- * write access so that we can perform concurrent read operations between
- * the CPU and GPU engines, as well as waiting for all rendering to
- * complete, or waiting for the last GPU user of a "fence register". The
- * object then embeds a #i915_gem_active to track the most recent (in
- * retirement order) request relevant for the desired mode of access.
- * The #i915_gem_active is updated with i915_gem_active_set() to track the
- * most recent fence request, typically this is done as part of
- * i915_vma_move_to_active().
- *
- * When the #i915_gem_active completes (is retired), it will
- * signal its completion to the owner through a callback as well as mark
- * itself as idle (i915_gem_active.request == NULL). The owner
- * can then perform any action, such as delayed freeing of an active
- * resource including itself.
- */
-struct i915_gem_active;
-
-typedef void (*i915_gem_retire_fn)(struct i915_gem_active *,
- struct i915_request *);
-
-struct i915_gem_active {
- struct i915_request __rcu *request;
- struct list_head link;
- i915_gem_retire_fn retire;
-};
-
-void i915_gem_retire_noop(struct i915_gem_active *,
- struct i915_request *request);
-
-/**
- * init_request_active - prepares the activity tracker for use
- * @active - the active tracker
- * @func - a callback when then the tracker is retired (becomes idle),
- * can be NULL
- *
- * init_request_active() prepares the embedded @active struct for use as
- * an activity tracker, that is for tracking the last known active request
- * associated with it. When the last request becomes idle, when it is retired
- * after completion, the optional callback @func is invoked.
- */
-static inline void
-init_request_active(struct i915_gem_active *active,
- i915_gem_retire_fn retire)
-{
- RCU_INIT_POINTER(active->request, NULL);
- INIT_LIST_HEAD(&active->link);
- active->retire = retire ?: i915_gem_retire_noop;
-}
-
-/**
- * i915_gem_active_set - updates the tracker to watch the current request
- * @active - the active tracker
- * @request - the request to watch
- *
- * i915_gem_active_set() watches the given @request for completion. Whilst
- * that @request is busy, the @active reports busy. When that @request is
- * retired, the @active tracker is updated to report idle.
- */
-static inline void
-i915_gem_active_set(struct i915_gem_active *active,
- struct i915_request *request)
-{
- list_move(&active->link, &request->active_list);
- rcu_assign_pointer(active->request, request);
-}
-
-/**
- * i915_gem_active_set_retire_fn - updates the retirement callback
- * @active - the active tracker
- * @fn - the routine called when the request is retired
- * @mutex - struct_mutex used to guard retirements
- *
- * i915_gem_active_set_retire_fn() updates the function pointer that
- * is called when the final request associated with the @active tracker
- * is retired.
- */
-static inline void
-i915_gem_active_set_retire_fn(struct i915_gem_active *active,
- i915_gem_retire_fn fn,
- struct mutex *mutex)
-{
- lockdep_assert_held(mutex);
- active->retire = fn ?: i915_gem_retire_noop;
-}
-
-static inline struct i915_request *
-__i915_gem_active_peek(const struct i915_gem_active *active)
-{
- /*
- * Inside the error capture (running with the driver in an unknown
- * state), we want to bend the rules slightly (a lot).
- *
- * Work is in progress to make it safer, in the meantime this keeps
- * the known issue from spamming the logs.
- */
- return rcu_dereference_protected(active->request, 1);
-}
-
-/**
- * i915_gem_active_raw - return the active request
- * @active - the active tracker
- *
- * i915_gem_active_raw() returns the current request being tracked, or NULL.
- * It does not obtain a reference on the request for the caller, so the caller
- * must hold struct_mutex.
- */
-static inline struct i915_request *
-i915_gem_active_raw(const struct i915_gem_active *active, struct mutex *mutex)
-{
- return rcu_dereference_protected(active->request,
- lockdep_is_held(mutex));
-}
-
-/**
- * i915_gem_active_peek - report the active request being monitored
- * @active - the active tracker
- *
- * i915_gem_active_peek() returns the current request being tracked if
- * still active, or NULL. It does not obtain a reference on the request
- * for the caller, so the caller must hold struct_mutex.
- */
-static inline struct i915_request *
-i915_gem_active_peek(const struct i915_gem_active *active, struct mutex *mutex)
-{
- struct i915_request *request;
-
- request = i915_gem_active_raw(active, mutex);
- if (!request || i915_request_completed(request))
- return NULL;
-
- return request;
-}
-
-/**
- * i915_gem_active_get - return a reference to the active request
- * @active - the active tracker
- *
- * i915_gem_active_get() returns a reference to the active request, or NULL
- * if the active tracker is idle. The caller must hold struct_mutex.
- */
-static inline struct i915_request *
-i915_gem_active_get(const struct i915_gem_active *active, struct mutex *mutex)
-{
- return i915_request_get(i915_gem_active_peek(active, mutex));
-}
-
-/**
- * __i915_gem_active_get_rcu - return a reference to the active request
- * @active - the active tracker
- *
- * __i915_gem_active_get() returns a reference to the active request, or NULL
- * if the active tracker is idle. The caller must hold the RCU read lock, but
- * the returned pointer is safe to use outside of RCU.
- */
-static inline struct i915_request *
-__i915_gem_active_get_rcu(const struct i915_gem_active *active)
-{
- /*
- * Performing a lockless retrieval of the active request is super
- * tricky. SLAB_TYPESAFE_BY_RCU merely guarantees that the backing
- * slab of request objects will not be freed whilst we hold the
- * RCU read lock. It does not guarantee that the request itself
- * will not be freed and then *reused*. Viz,
- *
- * Thread A Thread B
- *
- * rq = active.request
- * retire(rq) -> free(rq);
- * (rq is now first on the slab freelist)
- * active.request = NULL
- *
- * rq = new submission on a new object
- * ref(rq)
- *
- * To prevent the request from being reused whilst the caller
- * uses it, we take a reference like normal. Whilst acquiring
- * the reference we check that it is not in a destroyed state
- * (refcnt == 0). That prevents the request being reallocated
- * whilst the caller holds on to it. To check that the request
- * was not reallocated as we acquired the reference we have to
- * check that our request remains the active request across
- * the lookup, in the same manner as a seqlock. The visibility
- * of the pointer versus the reference counting is controlled
- * by using RCU barriers (rcu_dereference and rcu_assign_pointer).
- *
- * In the middle of all that, we inspect whether the request is
- * complete. Retiring is lazy so the request may be completed long
- * before the active tracker is updated. Querying whether the
- * request is complete is far cheaper (as it involves no locked
- * instructions setting cachelines to exclusive) than acquiring
- * the reference, so we do it first. The RCU read lock ensures the
- * pointer dereference is valid, but does not ensure that the
- * seqno nor HWS is the right one! However, if the request was
- * reallocated, that means the active tracker's request was complete.
- * If the new request is also complete, then both are and we can
- * just report the active tracker is idle. If the new request is
- * incomplete, then we acquire a reference on it and check that
- * it remained the active request.
- *
- * It is then imperative that we do not zero the request on
- * reallocation, so that we can chase the dangling pointers!
- * See i915_request_alloc().
- */
- do {
- struct i915_request *request;
-
- request = rcu_dereference(active->request);
- if (!request || i915_request_completed(request))
- return NULL;
-
- /*
- * An especially silly compiler could decide to recompute the
- * result of i915_request_completed, more specifically
- * re-emit the load for request->fence.seqno. A race would catch
- * a later seqno value, which could flip the result from true to
- * false. Which means part of the instructions below might not
- * be executed, while later on instructions are executed. Due to
- * barriers within the refcounting the inconsistency can't reach
- * past the call to i915_request_get_rcu, but not executing
- * that while still executing i915_request_put() creates
- * havoc enough. Prevent this with a compiler barrier.
- */
- barrier();
-
- request = i915_request_get_rcu(request);
-
- /*
- * What stops the following rcu_access_pointer() from occurring
- * before the above i915_request_get_rcu()? If we were
- * to read the value before pausing to get the reference to
- * the request, we may not notice a change in the active
- * tracker.
- *
- * The rcu_access_pointer() is a mere compiler barrier, which
- * means both the CPU and compiler are free to perform the
- * memory read without constraint. The compiler only has to
- * ensure that any operations after the rcu_access_pointer()
- * occur afterwards in program order. This means the read may
- * be performed earlier by an out-of-order CPU, or adventurous
- * compiler.
- *
- * The atomic operation at the heart of
- * i915_request_get_rcu(), see dma_fence_get_rcu(), is
- * atomic_inc_not_zero() which is only a full memory barrier
- * when successful. That is, if i915_request_get_rcu()
- * returns the request (and so with the reference counted
- * incremented) then the following read for rcu_access_pointer()
- * must occur after the atomic operation and so confirm
- * that this request is the one currently being tracked.
- *
- * The corresponding write barrier is part of
- * rcu_assign_pointer().
- */
- if (!request || request == rcu_access_pointer(active->request))
- return rcu_pointer_handoff(request);
-
- i915_request_put(request);
- } while (1);
-}
-
-/**
- * i915_gem_active_get_unlocked - return a reference to the active request
- * @active - the active tracker
- *
- * i915_gem_active_get_unlocked() returns a reference to the active request,
- * or NULL if the active tracker is idle. The reference is obtained under RCU,
- * so no locking is required by the caller.
- *
- * The reference should be freed with i915_request_put().
- */
-static inline struct i915_request *
-i915_gem_active_get_unlocked(const struct i915_gem_active *active)
-{
- struct i915_request *request;
-
- rcu_read_lock();
- request = __i915_gem_active_get_rcu(active);
- rcu_read_unlock();
-
- return request;
-}
-
-/**
- * i915_gem_active_isset - report whether the active tracker is assigned
- * @active - the active tracker
- *
- * i915_gem_active_isset() returns true if the active tracker is currently
- * assigned to a request. Due to the lazy retiring, that request may be idle
- * and this may report stale information.
- */
-static inline bool
-i915_gem_active_isset(const struct i915_gem_active *active)
-{
- return rcu_access_pointer(active->request);
-}
-
-/**
- * i915_gem_active_wait - waits until the request is completed
- * @active - the active request on which to wait
- * @flags - how to wait
- * @timeout - how long to wait at most
- * @rps - userspace client to charge for a waitboost
- *
- * i915_gem_active_wait() waits until the request is completed before
- * returning, without requiring any locks to be held. Note that it does not
- * retire any requests before returning.
- *
- * This function relies on RCU in order to acquire the reference to the active
- * request without holding any locks. See __i915_gem_active_get_rcu() for the
- * glory details on how that is managed. Once the reference is acquired, we
- * can then wait upon the request, and afterwards release our reference,
- * free of any locking.
- *
- * This function wraps i915_request_wait(), see it for the full details on
- * the arguments.
- *
- * Returns 0 if successful, or a negative error code.
- */
-static inline int
-i915_gem_active_wait(const struct i915_gem_active *active, unsigned int flags)
-{
- struct i915_request *request;
- long ret = 0;
-
- request = i915_gem_active_get_unlocked(active);
- if (request) {
- ret = i915_request_wait(request, flags, MAX_SCHEDULE_TIMEOUT);
- i915_request_put(request);
- }
-
- return ret < 0 ? ret : 0;
-}
-
-/**
- * i915_gem_active_retire - waits until the request is retired
- * @active - the active request on which to wait
- *
- * i915_gem_active_retire() waits until the request is completed,
- * and then ensures that at least the retirement handler for this
- * @active tracker is called before returning. If the @active
- * tracker is idle, the function returns immediately.
- */
-static inline int __must_check
-i915_gem_active_retire(struct i915_gem_active *active,
- struct mutex *mutex)
-{
- struct i915_request *request;
- long ret;
-
- request = i915_gem_active_raw(active, mutex);
- if (!request)
- return 0;
-
- ret = i915_request_wait(request,
- I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
- MAX_SCHEDULE_TIMEOUT);
- if (ret < 0)
- return ret;
-
- list_del_init(&active->link);
- RCU_INIT_POINTER(active->request, NULL);
-
- active->retire(active, request);
-
- return 0;
-}
-
-#define for_each_active(mask, idx) \
- for (; mask ? idx = ffs(mask) - 1, 1 : 0; mask &= ~BIT(idx))
-
int i915_global_request_init(void);
void i915_global_request_shrink(void);
void i915_global_request_exit(void);
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index ca19fcf29c5b..86d9c46aef18 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -887,7 +887,7 @@ static bool __i915_gem_unset_wedged(struct drm_i915_private *i915)
list_for_each_entry(tl, &i915->gt.timelines.active_list, link) {
struct i915_request *rq;
- rq = i915_gem_active_get_unlocked(&tl->last_request);
+ rq = i915_active_request_get_unlocked(&tl->last_request);
if (!rq)
continue;
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index b354843a5040..b2202d2e58a2 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -163,8 +163,8 @@ int i915_timeline_init(struct drm_i915_private *i915,
spin_lock_init(&timeline->lock);
- init_request_active(&timeline->barrier, NULL);
- init_request_active(&timeline->last_request, NULL);
+ INIT_ACTIVE_REQUEST(&timeline->barrier);
+ INIT_ACTIVE_REQUEST(&timeline->last_request);
INIT_LIST_HEAD(&timeline->requests);
i915_syncmap_init(&timeline->sync);
@@ -236,7 +236,7 @@ void i915_timeline_fini(struct i915_timeline *timeline)
{
GEM_BUG_ON(timeline->pin_count);
GEM_BUG_ON(!list_empty(&timeline->requests));
- GEM_BUG_ON(i915_gem_active_isset(&timeline->barrier));
+ GEM_BUG_ON(i915_active_request_isset(&timeline->barrier));
i915_syncmap_free(&timeline->sync);
hwsp_free(timeline);
@@ -268,25 +268,6 @@ i915_timeline_create(struct drm_i915_private *i915,
return timeline;
}
-int i915_timeline_set_barrier(struct i915_timeline *tl, struct i915_request *rq)
-{
- struct i915_request *old;
- int err;
-
- lockdep_assert_held(&rq->i915->drm.struct_mutex);
-
- /* Must maintain ordering wrt existing barriers */
- old = i915_gem_active_raw(&tl->barrier, &rq->i915->drm.struct_mutex);
- if (old) {
- err = i915_request_await_dma_fence(rq, &old->fence);
- if (err)
- return err;
- }
-
- i915_gem_active_set(&tl->barrier, rq);
- return 0;
-}
-
int i915_timeline_pin(struct i915_timeline *tl)
{
int err;
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index d167e04073c5..7bec7d2e45bf 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -28,6 +28,7 @@
#include <linux/list.h>
#include <linux/kref.h>
+#include "i915_active.h"
#include "i915_request.h"
#include "i915_syncmap.h"
#include "i915_utils.h"
@@ -58,10 +59,10 @@ struct i915_timeline {
/* Contains an RCU guarded pointer to the last request. No reference is
* held to the request, users must carefully acquire a reference to
- * the request using i915_gem_active_get_request_rcu(), or hold the
+ * the request using i915_active_request_get_unlocked(), or hold the
* struct_mutex.
*/
- struct i915_gem_active last_request;
+ struct i915_active_request last_request;
/**
* We track the most recent seqno that we wait on in every context so
@@ -82,7 +83,7 @@ struct i915_timeline {
* subsequent submissions to this timeline be executed only after the
* barrier has been completed.
*/
- struct i915_gem_active barrier;
+ struct i915_active_request barrier;
struct list_head link;
const char *name;
@@ -174,7 +175,10 @@ void i915_timelines_fini(struct drm_i915_private *i915);
* submissions on @timeline. Subsequent requests will not be submitted to GPU
* until the barrier has been completed.
*/
-int i915_timeline_set_barrier(struct i915_timeline *timeline,
- struct i915_request *rq);
+static inline int
+i915_timeline_set_barrier(struct i915_timeline *tl, struct i915_request *rq)
+{
+ return i915_active_request_set(&tl->barrier, rq);
+}
#endif
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index d4772061e642..b713bed20c38 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -120,7 +120,7 @@ vma_create(struct drm_i915_gem_object *obj,
return ERR_PTR(-ENOMEM);
i915_active_init(vm->i915, &vma->active, __i915_vma_retire);
- init_request_active(&vma->last_fence, NULL);
+ INIT_ACTIVE_REQUEST(&vma->last_fence);
vma->vm = vm;
vma->ops = &vm->vma_ops;
@@ -808,7 +808,7 @@ static void __i915_vma_destroy(struct i915_vma *vma)
GEM_BUG_ON(vma->node.allocated);
GEM_BUG_ON(vma->fence);
- GEM_BUG_ON(i915_gem_active_isset(&vma->last_fence));
+ GEM_BUG_ON(i915_active_request_isset(&vma->last_fence));
mutex_lock(&vma->vm->mutex);
list_del(&vma->vm_link);
@@ -942,14 +942,14 @@ int i915_vma_move_to_active(struct i915_vma *vma,
obj->write_domain = I915_GEM_DOMAIN_RENDER;
if (intel_fb_obj_invalidate(obj, ORIGIN_CS))
- i915_gem_active_set(&obj->frontbuffer_write, rq);
+ __i915_active_request_set(&obj->frontbuffer_write, rq);
obj->read_domains = 0;
}
obj->read_domains |= I915_GEM_GPU_DOMAINS;
if (flags & EXEC_OBJECT_NEEDS_FENCE)
- i915_gem_active_set(&vma->last_fence, rq);
+ __i915_active_request_set(&vma->last_fence, rq);
export_fence(vma, rq, flags);
return 0;
@@ -986,8 +986,8 @@ int i915_vma_unbind(struct i915_vma *vma)
if (ret)
goto unpin;
- ret = i915_gem_active_retire(&vma->last_fence,
- &vma->vm->i915->drm.struct_mutex);
+ ret = i915_active_request_retire(&vma->last_fence,
+ &vma->vm->i915->drm.struct_mutex);
unpin:
__i915_vma_unpin(vma);
if (ret)
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 3c03d4569481..7c742027f866 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -110,7 +110,7 @@ struct i915_vma {
#define I915_VMA_GGTT_WRITE BIT(15)
struct i915_active active;
- struct i915_gem_active last_fence;
+ struct i915_active_request last_fence;
/**
* Support different GGTT views into the same object.
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index ec2cbbe070a4..0dbd6d7c1693 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1124,7 +1124,7 @@ bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
* the last request that remains in the timeline. When idle, it is
* the last executed context as tracked by retirement.
*/
- rq = __i915_gem_active_peek(&engine->timeline.last_request);
+ rq = __i915_active_request_peek(&engine->timeline.last_request);
if (rq)
return rq->hw_context == kernel_context;
else
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index a9238fd07e30..c0df1dbb0069 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -186,7 +186,7 @@ struct intel_overlay {
struct overlay_registers __iomem *regs;
u32 flip_addr;
/* flip handling */
- struct i915_gem_active last_flip;
+ struct i915_active_request last_flip;
};
static void i830_overlay_clock_gating(struct drm_i915_private *dev_priv,
@@ -214,23 +214,23 @@ static void i830_overlay_clock_gating(struct drm_i915_private *dev_priv,
static void intel_overlay_submit_request(struct intel_overlay *overlay,
struct i915_request *rq,
- i915_gem_retire_fn retire)
+ i915_active_retire_fn retire)
{
- GEM_BUG_ON(i915_gem_active_peek(&overlay->last_flip,
- &overlay->i915->drm.struct_mutex));
- i915_gem_active_set_retire_fn(&overlay->last_flip, retire,
- &overlay->i915->drm.struct_mutex);
- i915_gem_active_set(&overlay->last_flip, rq);
+ GEM_BUG_ON(i915_active_request_peek(&overlay->last_flip,
+ &overlay->i915->drm.struct_mutex));
+ i915_active_request_set_retire_fn(&overlay->last_flip, retire,
+ &overlay->i915->drm.struct_mutex);
+ __i915_active_request_set(&overlay->last_flip, rq);
i915_request_add(rq);
}
static int intel_overlay_do_wait_request(struct intel_overlay *overlay,
struct i915_request *rq,
- i915_gem_retire_fn retire)
+ i915_active_retire_fn retire)
{
intel_overlay_submit_request(overlay, rq, retire);
- return i915_gem_active_retire(&overlay->last_flip,
- &overlay->i915->drm.struct_mutex);
+ return i915_active_request_retire(&overlay->last_flip,
+ &overlay->i915->drm.struct_mutex);
}
static struct i915_request *alloc_request(struct intel_overlay *overlay)
@@ -351,8 +351,9 @@ static void intel_overlay_release_old_vma(struct intel_overlay *overlay)
i915_vma_put(vma);
}
-static void intel_overlay_release_old_vid_tail(struct i915_gem_active *active,
- struct i915_request *rq)
+static void
+intel_overlay_release_old_vid_tail(struct i915_active_request *active,
+ struct i915_request *rq)
{
struct intel_overlay *overlay =
container_of(active, typeof(*overlay), last_flip);
@@ -360,7 +361,7 @@ static void intel_overlay_release_old_vid_tail(struct i915_gem_active *active,
intel_overlay_release_old_vma(overlay);
}
-static void intel_overlay_off_tail(struct i915_gem_active *active,
+static void intel_overlay_off_tail(struct i915_active_request *active,
struct i915_request *rq)
{
struct intel_overlay *overlay =
@@ -423,8 +424,8 @@ static int intel_overlay_off(struct intel_overlay *overlay)
* We have to be careful not to repeat work forever and make forward progress. */
static int intel_overlay_recover_from_interrupt(struct intel_overlay *overlay)
{
- return i915_gem_active_retire(&overlay->last_flip,
- &overlay->i915->drm.struct_mutex);
+ return i915_active_request_retire(&overlay->last_flip,
+ &overlay->i915->drm.struct_mutex);
}
/* Wait for pending overlay flip and release old frame.
@@ -1357,7 +1358,7 @@ void intel_overlay_setup(struct drm_i915_private *dev_priv)
overlay->contrast = 75;
overlay->saturation = 146;
- init_request_active(&overlay->last_flip, NULL);
+ INIT_ACTIVE_REQUEST(&overlay->last_flip);
mutex_lock(&dev_priv->drm.struct_mutex);
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 30ab0e04a674..72151aab208e 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -501,8 +501,8 @@ static int live_suppress_wait_preempt(void *arg)
}
/* Disable NEWCLIENT promotion */
- i915_gem_active_set(&rq[i]->timeline->last_request,
- dummy);
+ __i915_active_request_set(&rq[i]->timeline->last_request,
+ dummy);
i915_request_add(rq[i]);
}
diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
index e5659aaa856d..d2de9ece2118 100644
--- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
@@ -15,8 +15,8 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
spin_lock_init(&timeline->lock);
- init_request_active(&timeline->barrier, NULL);
- init_request_active(&timeline->last_request, NULL);
+ INIT_ACTIVE_REQUEST(&timeline->barrier);
+ INIT_ACTIVE_REQUEST(&timeline->last_request);
INIT_LIST_HEAD(&timeline->requests);
i915_syncmap_init(&timeline->sync);
--
2.20.1
* Re: [PATCH 17/22] drm/i915: Pull i915_gem_active into the i915_active family
2019-02-04 13:22 ` [PATCH 17/22] drm/i915: Pull i915_gem_active into the i915_active family Chris Wilson
@ 2019-02-04 18:50 ` Tvrtko Ursulin
0 siblings, 0 replies; 45+ messages in thread
From: Tvrtko Ursulin @ 2019-02-04 18:50 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 04/02/2019 13:22, Chris Wilson wrote:
> Looking forward, we need to break the struct_mutex dependency on
> i915_gem_active. In the meantime, external use of i915_gem_active is
> quite beguiling, little do new users suspect that it implies a barrier
> as each request it tracks must be ordered wrt the previous one. As one
> of many, it can be used to track activity across multiple timelines, a
> shared fence, which fits our unordered request submission much better. We
> need to steer external users away from the singular, exclusive fence
> imposed by i915_gem_active to i915_active instead. As part of that
> process, we move i915_gem_active out of i915_request.c into
> i915_active.c to start separating the two concepts, and rename it to
> i915_active_request (both to tie it to the concept of tracking just one
> request, and to give it a longer, less appealing name).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Assuming the patch was unchanged, I'll copy&paste from last round:
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Regards,
Tvrtko
> ---
> drivers/gpu/drm/i915/i915_active.c | 62 ++-
> drivers/gpu/drm/i915/i915_active.h | 349 ++++++++++++++++
> drivers/gpu/drm/i915/i915_active_types.h | 16 +-
> drivers/gpu/drm/i915/i915_debugfs.c | 2 +-
> drivers/gpu/drm/i915/i915_gem.c | 10 +-
> drivers/gpu/drm/i915/i915_gem_context.c | 4 +-
> drivers/gpu/drm/i915/i915_gem_fence_reg.c | 4 +-
> drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
> drivers/gpu/drm/i915/i915_gem_object.h | 2 +-
> drivers/gpu/drm/i915/i915_gpu_error.c | 10 +-
> drivers/gpu/drm/i915/i915_request.c | 35 +-
> drivers/gpu/drm/i915/i915_request.h | 383 ------------------
> drivers/gpu/drm/i915/i915_reset.c | 2 +-
> drivers/gpu/drm/i915/i915_timeline.c | 25 +-
> drivers/gpu/drm/i915/i915_timeline.h | 14 +-
> drivers/gpu/drm/i915/i915_vma.c | 12 +-
> drivers/gpu/drm/i915/i915_vma.h | 2 +-
> drivers/gpu/drm/i915/intel_engine_cs.c | 2 +-
> drivers/gpu/drm/i915/intel_overlay.c | 33 +-
> drivers/gpu/drm/i915/selftests/intel_lrc.c | 4 +-
> .../gpu/drm/i915/selftests/mock_timeline.c | 4 +-
> 21 files changed, 474 insertions(+), 503 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> index d23092d8c89f..846900535d10 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -21,7 +21,7 @@ static struct i915_global_active {
> } global;
>
> struct active_node {
> - struct i915_gem_active base;
> + struct i915_active_request base;
> struct i915_active *ref;
> struct rb_node node;
> u64 timeline;
> @@ -33,7 +33,7 @@ __active_park(struct i915_active *ref)
> struct active_node *it, *n;
>
> rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
> - GEM_BUG_ON(i915_gem_active_isset(&it->base));
> + GEM_BUG_ON(i915_active_request_isset(&it->base));
> kmem_cache_free(global.slab_cache, it);
> }
> ref->tree = RB_ROOT;
> @@ -53,18 +53,18 @@ __active_retire(struct i915_active *ref)
> }
>
> static void
> -node_retire(struct i915_gem_active *base, struct i915_request *rq)
> +node_retire(struct i915_active_request *base, struct i915_request *rq)
> {
> __active_retire(container_of(base, struct active_node, base)->ref);
> }
>
> static void
> -last_retire(struct i915_gem_active *base, struct i915_request *rq)
> +last_retire(struct i915_active_request *base, struct i915_request *rq)
> {
> __active_retire(container_of(base, struct i915_active, last));
> }
>
> -static struct i915_gem_active *
> +static struct i915_active_request *
> active_instance(struct i915_active *ref, u64 idx)
> {
> struct active_node *node;
> @@ -85,7 +85,7 @@ active_instance(struct i915_active *ref, u64 idx)
> * twice for the same timeline (as the older rbtree element will be
> * retired before the new request added to last).
> */
> - old = i915_gem_active_raw(&ref->last, BKL(ref));
> + old = i915_active_request_raw(&ref->last, BKL(ref));
> if (!old || old->fence.context == idx)
> goto out;
>
> @@ -110,7 +110,7 @@ active_instance(struct i915_active *ref, u64 idx)
> node = kmem_cache_alloc(global.slab_cache, GFP_KERNEL);
>
> /* kmalloc may retire the ref->last (thanks shrinker)! */
> - if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) {
> + if (unlikely(!i915_active_request_raw(&ref->last, BKL(ref)))) {
> kmem_cache_free(global.slab_cache, node);
> goto out;
> }
> @@ -118,7 +118,7 @@ active_instance(struct i915_active *ref, u64 idx)
> if (unlikely(!node))
> return ERR_PTR(-ENOMEM);
>
> - init_request_active(&node->base, node_retire);
> + i915_active_request_init(&node->base, NULL, node_retire);
> node->ref = ref;
> node->timeline = idx;
>
> @@ -133,7 +133,7 @@ active_instance(struct i915_active *ref, u64 idx)
> * callback not two, and so much undo the active counting for the
> * overwritten slot.
> */
> - if (i915_gem_active_isset(&node->base)) {
> + if (i915_active_request_isset(&node->base)) {
> /* Retire ourselves from the old rq->active_list */
> __list_del_entry(&node->base.link);
> ref->count--;
> @@ -154,7 +154,7 @@ void i915_active_init(struct drm_i915_private *i915,
> ref->i915 = i915;
> ref->retire = retire;
> ref->tree = RB_ROOT;
> - init_request_active(&ref->last, last_retire);
> + i915_active_request_init(&ref->last, NULL, last_retire);
> ref->count = 0;
> }
>
> @@ -162,15 +162,15 @@ int i915_active_ref(struct i915_active *ref,
> u64 timeline,
> struct i915_request *rq)
> {
> - struct i915_gem_active *active;
> + struct i915_active_request *active;
>
> active = active_instance(ref, timeline);
> if (IS_ERR(active))
> return PTR_ERR(active);
>
> - if (!i915_gem_active_isset(active))
> + if (!i915_active_request_isset(active))
> ref->count++;
> - i915_gem_active_set(active, rq);
> + __i915_active_request_set(active, rq);
>
> GEM_BUG_ON(!ref->count);
> return 0;
> @@ -196,12 +196,12 @@ int i915_active_wait(struct i915_active *ref)
> if (i915_active_acquire(ref))
> goto out_release;
>
> - ret = i915_gem_active_retire(&ref->last, BKL(ref));
> + ret = i915_active_request_retire(&ref->last, BKL(ref));
> if (ret)
> goto out_release;
>
> rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
> - ret = i915_gem_active_retire(&it->base, BKL(ref));
> + ret = i915_active_request_retire(&it->base, BKL(ref));
> if (ret)
> break;
> }
> @@ -211,11 +211,11 @@ int i915_active_wait(struct i915_active *ref)
> return ret;
> }
>
> -static int __i915_request_await_active(struct i915_request *rq,
> - struct i915_gem_active *active)
> +int i915_request_await_active_request(struct i915_request *rq,
> + struct i915_active_request *active)
> {
> struct i915_request *barrier =
> - i915_gem_active_raw(active, &rq->i915->drm.struct_mutex);
> + i915_active_request_raw(active, &rq->i915->drm.struct_mutex);
>
> return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
> }
> @@ -225,12 +225,12 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
> struct active_node *it, *n;
> int ret;
>
> - ret = __i915_request_await_active(rq, &ref->last);
> + ret = i915_request_await_active_request(rq, &ref->last);
> if (ret)
> return ret;
>
> rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
> - ret = __i915_request_await_active(rq, &it->base);
> + ret = i915_request_await_active_request(rq, &it->base);
> if (ret)
> return ret;
> }
> @@ -241,12 +241,32 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
> #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
> void i915_active_fini(struct i915_active *ref)
> {
> - GEM_BUG_ON(i915_gem_active_isset(&ref->last));
> + GEM_BUG_ON(i915_active_request_isset(&ref->last));
> GEM_BUG_ON(!RB_EMPTY_ROOT(&ref->tree));
> GEM_BUG_ON(ref->count);
> }
> #endif
>
> +int i915_active_request_set(struct i915_active_request *active,
> + struct i915_request *rq)
> +{
> + int err;
> +
> + /* Must maintain ordering wrt previous active requests */
> + err = i915_request_await_active_request(rq, active);
> + if (err)
> + return err;
> +
> + __i915_active_request_set(active, rq);
> + return 0;
> +}
> +
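
The ordered setter above is the one that can fail, since it first awaits
whatever the tracker already holds; the raw __i915_active_request_set() is
for callers that have serialised by other means. As a sketch against this
patch (the timeline barrier is the in-tree user of the ordered form; the
error handling here is illustrative only):

	/* Ordered: awaits the old request before replacing it. */
	err = i915_active_request_set(&tl->barrier, rq);
	if (err)
		return err;

	/* Raw: requests along a single timeline are already ordered. */
	lockdep_assert_held(&rq->i915->drm.struct_mutex);
	__i915_active_request_set(&tl->last_request, rq);
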
> +void i915_active_retire_noop(struct i915_active_request *active,
> + struct i915_request *request)
> +{
> + /* Space left intentionally blank */
> +}
> +
> #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> #include "selftests/i915_active.c"
> #endif
> diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
> index 6c56d10b1f59..5fbd9102384b 100644
> --- a/drivers/gpu/drm/i915/i915_active.h
> +++ b/drivers/gpu/drm/i915/i915_active.h
> @@ -7,7 +7,354 @@
> #ifndef _I915_ACTIVE_H_
> #define _I915_ACTIVE_H_
>
> +#include <linux/lockdep.h>
> +
> #include "i915_active_types.h"
> +#include "i915_request.h"
> +
> +/*
> + * We treat requests as fences. This is not to be confused with our
> + * "fence registers" but pipeline synchronisation objects ala GL_ARB_sync.
> + * We use the fences to synchronize access from the CPU with activity on the
> + * GPU, for example, we should not rewrite an object's PTE whilst the GPU
> + * is reading them. We also track fences at a higher level to provide
> + * implicit synchronisation around GEM objects, e.g. set-domain will wait
> + * for outstanding GPU rendering before marking the object ready for CPU
> + * access, or a pageflip will wait until the GPU is complete before showing
> + * the frame on the scanout.
> + *
> + * In order to use a fence, the object must track the fence it needs to
> + * serialise with. For example, GEM objects want to track both read and
> + * write access so that we can perform concurrent read operations between
> + * the CPU and GPU engines, as well as waiting for all rendering to
> + * complete, or waiting for the last GPU user of a "fence register". The
> + * object then embeds a #i915_active_request to track the most recent (in
> + * retirement order) request relevant for the desired mode of access.
> + * The #i915_active_request is updated with i915_active_request_set() to
> + * track the most recent fence request, typically this is done as part of
> + * i915_vma_move_to_active().
> + *
> + * When the #i915_active_request completes (is retired), it will
> + * signal its completion to the owner through a callback as well as mark
> + * itself as idle (i915_active_request.request == NULL). The owner
> + * can then perform any action, such as delayed freeing of an active
> + * resource including itself.
> + */
> +
> +void i915_active_retire_noop(struct i915_active_request *active,
> + struct i915_request *request);
> +
> +/**
> + * i915_active_request_init - prepares the activity tracker for use
> + * @active - the active tracker
> + * @rq - initial request to track, can be NULL
> + * @func - a callback when then the tracker is retired (becomes idle),
> + * can be NULL
> + *
> + * i915_active_request_init() prepares the embedded @active struct for use as
> + * an activity tracker, that is for tracking the last known active request
> + * associated with it. When the last request becomes idle, when it is retired
> + * after completion, the optional callback @func is invoked.
> + */
> +static inline void
> +i915_active_request_init(struct i915_active_request *active,
> + struct i915_request *rq,
> + i915_active_retire_fn retire)
> +{
> + RCU_INIT_POINTER(active->request, rq);
> + INIT_LIST_HEAD(&active->link);
> + active->retire = retire ?: i915_active_retire_noop;
> +}
> +
> +#define INIT_ACTIVE_REQUEST(name) i915_active_request_init((name), NULL, NULL)
> +
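
The two init forms pair up with the conversions in the patch, e.g.:

	/* with a retire callback, as for the frontbuffer tracker */
	i915_active_request_init(&obj->frontbuffer_write,
				 NULL, frontbuffer_retire);

	/* without one, the shorthand */
	INIT_ACTIVE_REQUEST(&vma->last_fence);
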
> +/**
> + * __i915_active_request_set - updates the tracker to watch the current request
> + * @active - the active tracker
> + * @request - the request to watch
> + *
> + * __i915_active_request_set() watches the given @request for completion. Whilst
> + * that @request is busy, the @active reports busy. When that @request is
> + * retired, the @active tracker is updated to report idle.
> + */
> +static inline void
> +__i915_active_request_set(struct i915_active_request *active,
> + struct i915_request *request)
> +{
> + list_move(&active->link, &request->active_list);
> + rcu_assign_pointer(active->request, request);
> +}
> +
> +int __must_check
> +i915_active_request_set(struct i915_active_request *active,
> + struct i915_request *rq);
> +
> +/**
> + * i915_active_request_set_retire_fn - updates the retirement callback
> + * @active - the active tracker
> + * @fn - the routine called when the request is retired
> + * @mutex - struct_mutex used to guard retirements
> + *
> + * i915_active_request_set_retire_fn() updates the function pointer that
> + * is called when the final request associated with the @active tracker
> + * is retired.
> + */
> +static inline void
> +i915_active_request_set_retire_fn(struct i915_active_request *active,
> + i915_active_retire_fn fn,
> + struct mutex *mutex)
> +{
> + lockdep_assert_held(mutex);
> + active->retire = fn ?: i915_active_retire_noop;
> +}
> +
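
Matches the overlay flip path, which re-arms the tracker by first swapping
the retire callback under the lock and then assigning the request:

	i915_active_request_set_retire_fn(&overlay->last_flip, retire,
					  &overlay->i915->drm.struct_mutex);
	__i915_active_request_set(&overlay->last_flip, rq);
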
> +static inline struct i915_request *
> +__i915_active_request_peek(const struct i915_active_request *active)
> +{
> + /*
> + * Inside the error capture (running with the driver in an unknown
> + * state), we want to bend the rules slightly (a lot).
> + *
> + * Work is in progress to make it safer, in the meantime this keeps
> + * the known issue from spamming the logs.
> + */
> + return rcu_dereference_protected(active->request, 1);
> +}
> +
> +/**
> + * i915_active_request_raw - return the active request
> + * @active - the active tracker
> + *
> + * i915_active_request_raw() returns the current request being tracked, or NULL.
> + * It does not obtain a reference on the request for the caller, so the caller
> + * must hold struct_mutex.
> + */
> +static inline struct i915_request *
> +i915_active_request_raw(const struct i915_active_request *active,
> + struct mutex *mutex)
> +{
> + return rcu_dereference_protected(active->request,
> + lockdep_is_held(mutex));
> +}
> +
> +/**
> + * i915_active_request_peek - report the active request being monitored
> + * @active - the active tracker
> + *
> + * i915_active_request_peek() returns the current request being tracked if
> + * still active, or NULL. It does not obtain a reference on the request
> + * for the caller, so the caller must hold struct_mutex.
> + */
> +static inline struct i915_request *
> +i915_active_request_peek(const struct i915_active_request *active,
> + struct mutex *mutex)
> +{
> + struct i915_request *request;
> +
> + request = i915_active_request_raw(active, mutex);
> + if (!request || i915_request_completed(request))
> + return NULL;
> +
> + return request;
> +}
> +
> +/**
> + * i915_active_request_get - return a reference to the active request
> + * @active - the active tracker
> + *
> + * i915_active_request_get() returns a reference to the active request, or NULL
> + * if the active tracker is idle. The caller must hold struct_mutex.
> + */
> +static inline struct i915_request *
> +i915_active_request_get(const struct i915_active_request *active,
> + struct mutex *mutex)
> +{
> + return i915_request_get(i915_active_request_peek(active, mutex));
> +}
> +
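
A usage sketch for the locked getter (the vma fence tracker is just an
example here; the caller owns the returned reference):

	struct i915_request *rq;

	rq = i915_active_request_get(&vma->last_fence,
				     &vma->vm->i915->drm.struct_mutex);
	if (rq) {
		/* rq stays valid even after the tracker moves on */
		i915_request_put(rq);
	}
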
> +/**
> + * __i915_active_request_get_rcu - return a reference to the active request
> + * @active - the active tracker
> + *
> + * __i915_active_request_get_rcu() returns a reference to the active request,
> + * or NULL if the active tracker is idle. The caller must hold the RCU read
> + * lock, but the returned pointer is safe to use outside of RCU.
> + */
> +static inline struct i915_request *
> +__i915_active_request_get_rcu(const struct i915_active_request *active)
> +{
> + /*
> + * Performing a lockless retrieval of the active request is super
> + * tricky. SLAB_TYPESAFE_BY_RCU merely guarantees that the backing
> + * slab of request objects will not be freed whilst we hold the
> + * RCU read lock. It does not guarantee that the request itself
> + * will not be freed and then *reused*. Viz,
> + *
> + * Thread A Thread B
> + *
> + * rq = active.request
> + * retire(rq) -> free(rq);
> + * (rq is now first on the slab freelist)
> + * active.request = NULL
> + *
> + * rq = new submission on a new object
> + * ref(rq)
> + *
> + * To prevent the request from being reused whilst the caller
> + * uses it, we take a reference like normal. Whilst acquiring
> + * the reference we check that it is not in a destroyed state
> + * (refcnt == 0). That prevents the request being reallocated
> + * whilst the caller holds on to it. To check that the request
> + * was not reallocated as we acquired the reference we have to
> + * check that our request remains the active request across
> + * the lookup, in the same manner as a seqlock. The visibility
> + * of the pointer versus the reference counting is controlled
> + * by using RCU barriers (rcu_dereference and rcu_assign_pointer).
> + *
> + * In the middle of all that, we inspect whether the request is
> + * complete. Retiring is lazy so the request may be completed long
> + * before the active tracker is updated. Querying whether the
> + * request is complete is far cheaper (as it involves no locked
> + * instructions setting cachelines to exclusive) than acquiring
> + * the reference, so we do it first. The RCU read lock ensures the
> + * pointer dereference is valid, but does not ensure that the
> + * seqno nor HWS is the right one! However, if the request was
> + * reallocated, that means the active tracker's request was complete.
> + * If the new request is also complete, then both are and we can
> + * just report the active tracker is idle. If the new request is
> + * incomplete, then we acquire a reference on it and check that
> + * it remained the active request.
> + *
> + * It is then imperative that we do not zero the request on
> + * reallocation, so that we can chase the dangling pointers!
> + * See i915_request_alloc().
> + */
> + do {
> + struct i915_request *request;
> +
> + request = rcu_dereference(active->request);
> + if (!request || i915_request_completed(request))
> + return NULL;
> +
> + /*
> + * An especially silly compiler could decide to recompute the
> + * result of i915_request_completed, more specifically
> + * re-emit the load for request->fence.seqno. A race would catch
> + * a later seqno value, which could flip the result from true to
> + * false. Which means part of the instructions below might not
> + * be executed, while later on instructions are executed. Due to
> + * barriers within the refcounting the inconsistency can't reach
> + * past the call to i915_request_get_rcu, but not executing
> + * that while still executing i915_request_put() creates
> + * havoc enough. Prevent this with a compiler barrier.
> + */
> + barrier();
> +
> + request = i915_request_get_rcu(request);
> +
> + /*
> + * What stops the following rcu_access_pointer() from occurring
> + * before the above i915_request_get_rcu()? If we were
> + * to read the value before pausing to get the reference to
> + * the request, we may not notice a change in the active
> + * tracker.
> + *
> + * The rcu_access_pointer() is a mere compiler barrier, which
> + * means both the CPU and compiler are free to perform the
> + * memory read without constraint. The compiler only has to
> + * ensure that any operations after the rcu_access_pointer()
> + * occur afterwards in program order. This means the read may
> + * be performed earlier by an out-of-order CPU, or adventurous
> + * compiler.
> + *
> + * The atomic operation at the heart of
> + * i915_request_get_rcu(), see dma_fence_get_rcu(), is
> + * atomic_inc_not_zero() which is only a full memory barrier
> + * when successful. That is, if i915_request_get_rcu()
> + * returns the request (and so with the reference counted
> + * incremented) then the following read for rcu_access_pointer()
> + * must occur after the atomic operation and so confirm
> + * that this request is the one currently being tracked.
> + *
> + * The corresponding write barrier is part of
> + * rcu_assign_pointer().
> + */
> + if (!request || request == rcu_access_pointer(active->request))
> + return rcu_pointer_handoff(request);
> +
> + i915_request_put(request);
> + } while (1);
> +}
> +
> +/**
> + * i915_active_request_get_unlocked - return a reference to the active request
> + * @active - the active tracker
> + *
> + * i915_active_request_get_unlocked() returns a reference to the active request,
> + * or NULL if the active tracker is idle. The reference is obtained under RCU,
> + * so no locking is required by the caller.
> + *
> + * The reference should be freed with i915_request_put().
> + */
> +static inline struct i915_request *
> +i915_active_request_get_unlocked(const struct i915_active_request *active)
> +{
> + struct i915_request *request;
> +
> + rcu_read_lock();
> + request = __i915_active_request_get_rcu(active);
> + rcu_read_unlock();
> +
> + return request;
> +}
> +
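
This is the pattern the wait_for_timelines() conversion relies on, roughly
(a sketch without the timeout bookkeeping of the real caller):

	struct i915_request *rq;

	rq = i915_active_request_get_unlocked(&tl->last_request);
	if (rq) {
		/* no locks held; our reference keeps rq valid */
		i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
				  MAX_SCHEDULE_TIMEOUT);
		i915_request_put(rq);
	}
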
> +/**
> + * i915_active_request_isset - report whether the active tracker is assigned
> + * @active - the active tracker
> + *
> + * i915_active_request_isset() returns true if the active tracker is currently
> + * assigned to a request. Due to the lazy retiring, that request may be idle
> + * and this may report stale information.
> + */
> +static inline bool
> +i915_active_request_isset(const struct i915_active_request *active)
> +{
> + return rcu_access_pointer(active->request);
> +}
> +
> +/**
> + * i915_active_request_retire - waits until the request is retired
> + * @active - the active request on which to wait
> + *
> + * i915_active_request_retire() waits until the request is completed,
> + * and then ensures that at least the retirement handler for this
> + * @active tracker is called before returning. If the @active
> + * tracker is idle, the function returns immediately.
> + */
> +static inline int __must_check
> +i915_active_request_retire(struct i915_active_request *active,
> + struct mutex *mutex)
> +{
> + struct i915_request *request;
> + long ret;
> +
> + request = i915_active_request_raw(active, mutex);
> + if (!request)
> + return 0;
> +
> + ret = i915_request_wait(request,
> + I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
> + MAX_SCHEDULE_TIMEOUT);
> + if (ret < 0)
> + return ret;
> +
> + list_del_init(&active->link);
> + RCU_INIT_POINTER(active->request, NULL);
> +
> + active->retire(active, request);
> +
> + return 0;
> +}
>
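For the locked variant, a typical caller looks like the i915_vma_unbind() hunk
later in this patch: hold the guarding mutex, then retire the tracker before
reusing the resource. In sketch form:

	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);

	err = i915_active_request_retire(&vma->last_fence,
					 &vma->vm->i915->drm.struct_mutex);
	if (err)
		return err;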
> /*
> * GPU activity tracking
> @@ -47,6 +394,8 @@ int i915_active_wait(struct i915_active *ref);
>
> int i915_request_await_active(struct i915_request *rq,
> struct i915_active *ref);
> +int i915_request_await_active_request(struct i915_request *rq,
> + struct i915_active_request *active);
>
> bool i915_active_acquire(struct i915_active *ref);
>
> diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h
> index 411e502ed8dd..b679253b53a5 100644
> --- a/drivers/gpu/drm/i915/i915_active_types.h
> +++ b/drivers/gpu/drm/i915/i915_active_types.h
> @@ -8,16 +8,26 @@
> #define _I915_ACTIVE_TYPES_H_
>
> #include <linux/rbtree.h>
> -
> -#include "i915_request.h"
> +#include <linux/rcupdate.h>
>
> struct drm_i915_private;
> +struct i915_active_request;
> +struct i915_request;
> +
> +typedef void (*i915_active_retire_fn)(struct i915_active_request *,
> + struct i915_request *);
> +
> +struct i915_active_request {
> + struct i915_request __rcu *request;
> + struct list_head link;
> + i915_active_retire_fn retire;
> +};
>
> struct i915_active {
> struct drm_i915_private *i915;
>
> struct rb_root tree;
> - struct i915_gem_active last;
> + struct i915_active_request last;
> unsigned int count;
>
> void (*retire)(struct i915_active *ref);
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 54e426883529..a270af18404f 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -207,7 +207,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
> if (vma->fence)
> seq_printf(m, " , fence: %d%s",
> vma->fence->id,
> - i915_gem_active_isset(&vma->last_fence) ? "*" : "");
> + i915_active_request_isset(&vma->last_fence) ? "*" : "");
> seq_puts(m, ")");
> }
> if (obj->stolen)
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d82e4f990586..81aa37508bc4 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2987,7 +2987,7 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
>
> GEM_BUG_ON(i915->gt.active_requests);
> for_each_engine(engine, i915, id) {
> - GEM_BUG_ON(__i915_gem_active_peek(&engine->timeline.last_request));
> + GEM_BUG_ON(__i915_active_request_peek(&engine->timeline.last_request));
> GEM_BUG_ON(engine->last_retired_context !=
> to_intel_context(i915->kernel_context, engine));
> }
> @@ -3233,7 +3233,7 @@ wait_for_timelines(struct drm_i915_private *i915,
> list_for_each_entry(tl, &gt->active_list, link) {
> struct i915_request *rq;
>
> - rq = i915_gem_active_get_unlocked(&tl->last_request);
> + rq = i915_active_request_get_unlocked(&tl->last_request);
> if (!rq)
> continue;
>
> @@ -4134,7 +4134,8 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> }
>
> static void
> -frontbuffer_retire(struct i915_gem_active *active, struct i915_request *request)
> +frontbuffer_retire(struct i915_active_request *active,
> + struct i915_request *request)
> {
> struct drm_i915_gem_object *obj =
> container_of(active, typeof(*obj), frontbuffer_write);
> @@ -4161,7 +4162,8 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
> obj->resv = &obj->__builtin_resv;
>
> obj->frontbuffer_ggtt_origin = ORIGIN_GTT;
> - init_request_active(&obj->frontbuffer_write, frontbuffer_retire);
> + i915_active_request_init(&obj->frontbuffer_write,
> + NULL, frontbuffer_retire);
>
> obj->mm.madv = I915_MADV_WILLNEED;
> INIT_RADIX_TREE(&obj->mm.get_page.radix, GFP_KERNEL | __GFP_NOWARN);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 6faf1f6faab5..ea8e818d22bf 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -653,8 +653,8 @@ last_request_on_engine(struct i915_timeline *timeline,
>
> GEM_BUG_ON(timeline == &engine->timeline);
>
> - rq = i915_gem_active_raw(&timeline->last_request,
> - &engine->i915->drm.struct_mutex);
> + rq = i915_active_request_raw(&timeline->last_request,
> + &engine->i915->drm.struct_mutex);
> if (rq && rq->engine == engine) {
> GEM_TRACE("last request for %s on engine %s: %llx:%llu\n",
> timeline->name, engine->name,
> diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
> index bd0d5b8d6c96..36d548fa3aa2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
> +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
> @@ -223,7 +223,7 @@ static int fence_update(struct drm_i915_fence_reg *fence,
> i915_gem_object_get_tiling(vma->obj)))
> return -EINVAL;
>
> - ret = i915_gem_active_retire(&vma->last_fence,
> + ret = i915_active_request_retire(&vma->last_fence,
> &vma->obj->base.dev->struct_mutex);
> if (ret)
> return ret;
> @@ -232,7 +232,7 @@ static int fence_update(struct drm_i915_fence_reg *fence,
> if (fence->vma) {
> struct i915_vma *old = fence->vma;
>
> - ret = i915_gem_active_retire(&old->last_fence,
> + ret = i915_active_request_retire(&old->last_fence,
> &old->obj->base.dev->struct_mutex);
> if (ret)
> return ret;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index e625659c03a2..d646d37eec2f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1918,7 +1918,7 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
> return ERR_PTR(-ENOMEM);
>
> i915_active_init(i915, &vma->active, NULL);
> - init_request_active(&vma->last_fence, NULL);
> + INIT_ACTIVE_REQUEST(&vma->last_fence);
>
> vma->vm = &ggtt->vm;
> vma->ops = &pd_vma_ops;
> diff --git a/drivers/gpu/drm/i915/i915_gem_object.h b/drivers/gpu/drm/i915/i915_gem_object.h
> index 73fec917d097..fab040331cdb 100644
> --- a/drivers/gpu/drm/i915/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/i915_gem_object.h
> @@ -175,7 +175,7 @@ struct drm_i915_gem_object {
>
> atomic_t frontbuffer_bits;
> unsigned int frontbuffer_ggtt_origin; /* write once */
> - struct i915_gem_active frontbuffer_write;
> + struct i915_active_request frontbuffer_write;
>
> /** Current tiling stride for the object, if it's tiled. */
> unsigned int tiling_and_stride;
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 6e2e5ed2bd0a..9a65341fec09 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1062,23 +1062,23 @@ i915_error_object_create(struct drm_i915_private *i915,
> }
>
> /* The error capture is special as it tries to run underneath the normal
> - * locking rules - so we use the raw version of the i915_gem_active lookup.
> + * locking rules - so we use the raw version of the i915_active_request lookup.
> */
> static inline u32
> -__active_get_seqno(struct i915_gem_active *active)
> +__active_get_seqno(struct i915_active_request *active)
> {
> struct i915_request *request;
>
> - request = __i915_gem_active_peek(active);
> + request = __i915_active_request_peek(active);
> return request ? request->global_seqno : 0;
> }
>
> static inline int
> -__active_get_engine_id(struct i915_gem_active *active)
> +__active_get_engine_id(struct i915_active_request *active)
> {
> struct i915_request *request;
>
> - request = __i915_gem_active_peek(active);
> + request = __i915_active_request_peek(active);
> return request ? request->engine->id : -1;
> }
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index f5b2c95125ba..ed9f16bca4fe 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -29,6 +29,7 @@
> #include <linux/sched/signal.h>
>
> #include "i915_drv.h"
> +#include "i915_active.h"
> #include "i915_reset.h"
>
> static struct i915_global_request {
> @@ -130,12 +131,6 @@ static void unreserve_gt(struct drm_i915_private *i915)
> i915_gem_park(i915);
> }
>
> -void i915_gem_retire_noop(struct i915_gem_active *active,
> - struct i915_request *request)
> -{
> - /* Space left intentionally blank */
> -}
> -
> static void advance_ring(struct i915_request *request)
> {
> struct intel_ring *ring = request->ring;
> @@ -249,7 +244,7 @@ static void __retire_engine_upto(struct intel_engine_cs *engine,
>
> static void i915_request_retire(struct i915_request *request)
> {
> - struct i915_gem_active *active, *next;
> + struct i915_active_request *active, *next;
>
> GEM_TRACE("%s fence %llx:%lld, global=%d, current %d:%d\n",
> request->engine->name,
> @@ -283,10 +278,10 @@ static void i915_request_retire(struct i915_request *request)
> * we may spend an inordinate amount of time simply handling
> * the retirement of requests and processing their callbacks.
> * Of which, this loop itself is particularly hot due to the
> - * cache misses when jumping around the list of i915_gem_active.
> - * So we try to keep this loop as streamlined as possible and
> - * also prefetch the next i915_gem_active to try and hide
> - * the likely cache miss.
> + * cache misses when jumping around the list of
> + * i915_active_request. So we try to keep this loop as
> + * streamlined as possible and also prefetch the next
> + * i915_active_request to try and hide the likely cache miss.
> */
> prefetchw(next);
>
> @@ -543,17 +538,9 @@ i915_request_alloc_slow(struct intel_context *ce)
> return kmem_cache_alloc(global.slab_requests, GFP_KERNEL);
> }
>
> -static int add_barrier(struct i915_request *rq, struct i915_gem_active *active)
> -{
> - struct i915_request *barrier =
> - i915_gem_active_raw(active, &rq->i915->drm.struct_mutex);
> -
> - return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
> -}
> -
> static int add_timeline_barrier(struct i915_request *rq)
> {
> - return add_barrier(rq, &rq->timeline->barrier);
> + return i915_request_await_active_request(rq, &rq->timeline->barrier);
> }
>
> /**
> @@ -612,7 +599,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> * We use RCU to look up requests in flight. The lookups may
> * race with the request being allocated from the slab freelist.
> * That is the request we are writing to here, may be in the process
> - * of being read by __i915_gem_active_get_rcu(). As such,
> + * of being read by __i915_active_request_get_rcu(). As such,
> * we have to be very careful when overwriting the contents. During
> * the RCU lookup, we chase the request->engine pointer,
> * read the request->global_seqno and increment the reference count.
> @@ -952,8 +939,8 @@ void i915_request_add(struct i915_request *request)
> * see a more recent value in the hws than we are tracking.
> */
>
> - prev = i915_gem_active_raw(&timeline->last_request,
> - &request->i915->drm.struct_mutex);
> + prev = i915_active_request_raw(&timeline->last_request,
> + &request->i915->drm.struct_mutex);
> if (prev && !i915_request_completed(prev)) {
> i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
> &request->submitq);
> @@ -969,7 +956,7 @@ void i915_request_add(struct i915_request *request)
> spin_unlock_irq(&timeline->lock);
>
> GEM_BUG_ON(timeline->seqno != request->fence.seqno);
> - i915_gem_active_set(&timeline->last_request, request);
> + __i915_active_request_set(&timeline->last_request, request);
>
> list_add_tail(&request->ring_link, &ring->request_list);
> if (list_is_first(&request->ring_link, &ring->request_list)) {
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 054bd300984b..071ff1064579 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -409,389 +409,6 @@ static inline void i915_request_mark_complete(struct i915_request *rq)
>
> void i915_retire_requests(struct drm_i915_private *i915);
>
> -/*
> - * We treat requests as fences. This is not to be confused with our
> - * "fence registers" but pipeline synchronisation objects ala GL_ARB_sync.
> - * We use the fences to synchronize access from the CPU with activity on the
> - * GPU, for example, we should not rewrite an object's PTE whilst the GPU
> - * is reading them. We also track fences at a higher level to provide
> - * implicit synchronisation around GEM objects, e.g. set-domain will wait
> - * for outstanding GPU rendering before marking the object ready for CPU
> - * access, or a pageflip will wait until the GPU is complete before showing
> - * the frame on the scanout.
> - *
> - * In order to use a fence, the object must track the fence it needs to
> - * serialise with. For example, GEM objects want to track both read and
> - * write access so that we can perform concurrent read operations between
> - * the CPU and GPU engines, as well as waiting for all rendering to
> - * complete, or waiting for the last GPU user of a "fence register". The
> - * object then embeds a #i915_gem_active to track the most recent (in
> - * retirement order) request relevant for the desired mode of access.
> - * The #i915_gem_active is updated with i915_gem_active_set() to track the
> - * most recent fence request, typically this is done as part of
> - * i915_vma_move_to_active().
> - *
> - * When the #i915_gem_active completes (is retired), it will
> - * signal its completion to the owner through a callback as well as mark
> - * itself as idle (i915_gem_active.request == NULL). The owner
> - * can then perform any action, such as delayed freeing of an active
> - * resource including itself.
> - */
> -struct i915_gem_active;
> -
> -typedef void (*i915_gem_retire_fn)(struct i915_gem_active *,
> - struct i915_request *);
> -
> -struct i915_gem_active {
> - struct i915_request __rcu *request;
> - struct list_head link;
> - i915_gem_retire_fn retire;
> -};
> -
> -void i915_gem_retire_noop(struct i915_gem_active *,
> - struct i915_request *request);
> -
> -/**
> - * init_request_active - prepares the activity tracker for use
> - * @active - the active tracker
> - * @func - a callback when the tracker is retired (becomes idle),
> - * can be NULL
> - *
> - * init_request_active() prepares the embedded @active struct for use as
> - * an activity tracker, that is for tracking the last known active request
> - * associated with it. When the last request becomes idle, when it is retired
> - * after completion, the optional callback @func is invoked.
> - */
> -static inline void
> -init_request_active(struct i915_gem_active *active,
> - i915_gem_retire_fn retire)
> -{
> - RCU_INIT_POINTER(active->request, NULL);
> - INIT_LIST_HEAD(&active->link);
> - active->retire = retire ?: i915_gem_retire_noop;
> -}
> -
> -/**
> - * i915_gem_active_set - updates the tracker to watch the current request
> - * @active - the active tracker
> - * @request - the request to watch
> - *
> - * i915_gem_active_set() watches the given @request for completion. Whilst
> - * that @request is busy, the @active reports busy. When that @request is
> - * retired, the @active tracker is updated to report idle.
> - */
> -static inline void
> -i915_gem_active_set(struct i915_gem_active *active,
> - struct i915_request *request)
> -{
> - list_move(&active->link, &request->active_list);
> - rcu_assign_pointer(active->request, request);
> -}
> -
> -/**
> - * i915_gem_active_set_retire_fn - updates the retirement callback
> - * @active - the active tracker
> - * @fn - the routine called when the request is retired
> - * @mutex - struct_mutex used to guard retirements
> - *
> - * i915_gem_active_set_retire_fn() updates the function pointer that
> - * is called when the final request associated with the @active tracker
> - * is retired.
> - */
> -static inline void
> -i915_gem_active_set_retire_fn(struct i915_gem_active *active,
> - i915_gem_retire_fn fn,
> - struct mutex *mutex)
> -{
> - lockdep_assert_held(mutex);
> - active->retire = fn ?: i915_gem_retire_noop;
> -}
> -
> -static inline struct i915_request *
> -__i915_gem_active_peek(const struct i915_gem_active *active)
> -{
> - /*
> - * Inside the error capture (running with the driver in an unknown
> - * state), we want to bend the rules slightly (a lot).
> - *
> - * Work is in progress to make it safer, in the meantime this keeps
> - * the known issue from spamming the logs.
> - */
> - return rcu_dereference_protected(active->request, 1);
> -}
> -
> -/**
> - * i915_gem_active_raw - return the active request
> - * @active - the active tracker
> - *
> - * i915_gem_active_raw() returns the current request being tracked, or NULL.
> - * It does not obtain a reference on the request for the caller, so the caller
> - * must hold struct_mutex.
> - */
> -static inline struct i915_request *
> -i915_gem_active_raw(const struct i915_gem_active *active, struct mutex *mutex)
> -{
> - return rcu_dereference_protected(active->request,
> - lockdep_is_held(mutex));
> -}
> -
> -/**
> - * i915_gem_active_peek - report the active request being monitored
> - * @active - the active tracker
> - *
> - * i915_gem_active_peek() returns the current request being tracked if
> - * still active, or NULL. It does not obtain a reference on the request
> - * for the caller, so the caller must hold struct_mutex.
> - */
> -static inline struct i915_request *
> -i915_gem_active_peek(const struct i915_gem_active *active, struct mutex *mutex)
> -{
> - struct i915_request *request;
> -
> - request = i915_gem_active_raw(active, mutex);
> - if (!request || i915_request_completed(request))
> - return NULL;
> -
> - return request;
> -}
> -
> -/**
> - * i915_gem_active_get - return a reference to the active request
> - * @active - the active tracker
> - *
> - * i915_gem_active_get() returns a reference to the active request, or NULL
> - * if the active tracker is idle. The caller must hold struct_mutex.
> - */
> -static inline struct i915_request *
> -i915_gem_active_get(const struct i915_gem_active *active, struct mutex *mutex)
> -{
> - return i915_request_get(i915_gem_active_peek(active, mutex));
> -}
> -
> -/**
> - * __i915_gem_active_get_rcu - return a reference to the active request
> - * @active - the active tracker
> - *
> - * __i915_gem_active_get() returns a reference to the active request, or NULL
> - * if the active tracker is idle. The caller must hold the RCU read lock, but
> - * the returned pointer is safe to use outside of RCU.
> - */
> -static inline struct i915_request *
> -__i915_gem_active_get_rcu(const struct i915_gem_active *active)
> -{
> - /*
> - * Performing a lockless retrieval of the active request is super
> - * tricky. SLAB_TYPESAFE_BY_RCU merely guarantees that the backing
> - * slab of request objects will not be freed whilst we hold the
> - * RCU read lock. It does not guarantee that the request itself
> - * will not be freed and then *reused*. Viz,
> - *
> - * Thread A Thread B
> - *
> - * rq = active.request
> - * retire(rq) -> free(rq);
> - * (rq is now first on the slab freelist)
> - * active.request = NULL
> - *
> - * rq = new submission on a new object
> - * ref(rq)
> - *
> - * To prevent the request from being reused whilst the caller
> - * uses it, we take a reference like normal. Whilst acquiring
> - * the reference we check that it is not in a destroyed state
> - * (refcnt == 0). That prevents the request being reallocated
> - * whilst the caller holds on to it. To check that the request
> - * was not reallocated as we acquired the reference we have to
> - * check that our request remains the active request across
> - * the lookup, in the same manner as a seqlock. The visibility
> - * of the pointer versus the reference counting is controlled
> - * by using RCU barriers (rcu_dereference and rcu_assign_pointer).
> - *
> - * In the middle of all that, we inspect whether the request is
> - * complete. Retiring is lazy so the request may be completed long
> - * before the active tracker is updated. Querying whether the
> - * request is complete is far cheaper (as it involves no locked
> - * instructions setting cachelines to exclusive) than acquiring
> - * the reference, so we do it first. The RCU read lock ensures the
> - * pointer dereference is valid, but does not ensure that the
> - * seqno nor HWS is the right one! However, if the request was
> - * reallocated, that means the active tracker's request was complete.
> - * If the new request is also complete, then both are and we can
> - * just report the active tracker is idle. If the new request is
> - * incomplete, then we acquire a reference on it and check that
> - * it remained the active request.
> - *
> - * It is then imperative that we do not zero the request on
> - * reallocation, so that we can chase the dangling pointers!
> - * See i915_request_alloc().
> - */
> - do {
> - struct i915_request *request;
> -
> - request = rcu_dereference(active->request);
> - if (!request || i915_request_completed(request))
> - return NULL;
> -
> - /*
> - * An especially silly compiler could decide to recompute the
> - * result of i915_request_completed, more specifically
> - * re-emit the load for request->fence.seqno. A race would catch
> - * a later seqno value, which could flip the result from true to
> - * false. Which means part of the instructions below might not
> - * be executed, while later on instructions are executed. Due to
> - * barriers within the refcounting the inconsistency can't reach
> - * past the call to i915_request_get_rcu, but not executing
> - * that while still executing i915_request_put() creates
> - * havoc enough. Prevent this with a compiler barrier.
> - */
> - barrier();
> -
> - request = i915_request_get_rcu(request);
> -
> - /*
> - * What stops the following rcu_access_pointer() from occurring
> - * before the above i915_request_get_rcu()? If we were
> - * to read the value before pausing to get the reference to
> - * the request, we may not notice a change in the active
> - * tracker.
> - *
> - * The rcu_access_pointer() is a mere compiler barrier, which
> - * means both the CPU and compiler are free to perform the
> - * memory read without constraint. The compiler only has to
> - * ensure that any operations after the rcu_access_pointer()
> - * occur afterwards in program order. This means the read may
> - * be performed earlier by an out-of-order CPU, or adventurous
> - * compiler.
> - *
> - * The atomic operation at the heart of
> - * i915_request_get_rcu(), see dma_fence_get_rcu(), is
> - * atomic_inc_not_zero() which is only a full memory barrier
> - * when successful. That is, if i915_request_get_rcu()
> - * returns the request (and so with the reference counted
> - * incremented) then the following read for rcu_access_pointer()
> - * must occur after the atomic operation and so confirm
> - * that this request is the one currently being tracked.
> - *
> - * The corresponding write barrier is part of
> - * rcu_assign_pointer().
> - */
> - if (!request || request == rcu_access_pointer(active->request))
> - return rcu_pointer_handoff(request);
> -
> - i915_request_put(request);
> - } while (1);
> -}
> -
> -/**
> - * i915_gem_active_get_unlocked - return a reference to the active request
> - * @active - the active tracker
> - *
> - * i915_gem_active_get_unlocked() returns a reference to the active request,
> - * or NULL if the active tracker is idle. The reference is obtained under RCU,
> - * so no locking is required by the caller.
> - *
> - * The reference should be freed with i915_request_put().
> - */
> -static inline struct i915_request *
> -i915_gem_active_get_unlocked(const struct i915_gem_active *active)
> -{
> - struct i915_request *request;
> -
> - rcu_read_lock();
> - request = __i915_gem_active_get_rcu(active);
> - rcu_read_unlock();
> -
> - return request;
> -}
> -
> -/**
> - * i915_gem_active_isset - report whether the active tracker is assigned
> - * @active - the active tracker
> - *
> - * i915_gem_active_isset() returns true if the active tracker is currently
> - * assigned to a request. Due to the lazy retiring, that request may be idle
> - * and this may report stale information.
> - */
> -static inline bool
> -i915_gem_active_isset(const struct i915_gem_active *active)
> -{
> - return rcu_access_pointer(active->request);
> -}
> -
> -/**
> - * i915_gem_active_wait - waits until the request is completed
> - * @active - the active request on which to wait
> - * @flags - how to wait
> - * @timeout - how long to wait at most
> - * @rps - userspace client to charge for a waitboost
> - *
> - * i915_gem_active_wait() waits until the request is completed before
> - * returning, without requiring any locks to be held. Note that it does not
> - * retire any requests before returning.
> - *
> - * This function relies on RCU in order to acquire the reference to the active
> - * request without holding any locks. See __i915_gem_active_get_rcu() for the
> - * glory details on how that is managed. Once the reference is acquired, we
> - * can then wait upon the request, and afterwards release our reference,
> - * free of any locking.
> - *
> - * This function wraps i915_request_wait(), see it for the full details on
> - * the arguments.
> - *
> - * Returns 0 if successful, or a negative error code.
> - */
> -static inline int
> -i915_gem_active_wait(const struct i915_gem_active *active, unsigned int flags)
> -{
> - struct i915_request *request;
> - long ret = 0;
> -
> - request = i915_gem_active_get_unlocked(active);
> - if (request) {
> - ret = i915_request_wait(request, flags, MAX_SCHEDULE_TIMEOUT);
> - i915_request_put(request);
> - }
> -
> - return ret < 0 ? ret : 0;
> -}
> -
> -/**
> - * i915_gem_active_retire - waits until the request is retired
> - * @active - the active request on which to wait
> - *
> - * i915_gem_active_retire() waits until the request is completed,
> - * and then ensures that at least the retirement handler for this
> - * @active tracker is called before returning. If the @active
> - * tracker is idle, the function returns immediately.
> - */
> -static inline int __must_check
> -i915_gem_active_retire(struct i915_gem_active *active,
> - struct mutex *mutex)
> -{
> - struct i915_request *request;
> - long ret;
> -
> - request = i915_gem_active_raw(active, mutex);
> - if (!request)
> - return 0;
> -
> - ret = i915_request_wait(request,
> - I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
> - MAX_SCHEDULE_TIMEOUT);
> - if (ret < 0)
> - return ret;
> -
> - list_del_init(&active->link);
> - RCU_INIT_POINTER(active->request, NULL);
> -
> - active->retire(active, request);
> -
> - return 0;
> -}
> -
> -#define for_each_active(mask, idx) \
> - for (; mask ? idx = ffs(mask) - 1, 1 : 0; mask &= ~BIT(idx))
> -
> int i915_global_request_init(void);
> void i915_global_request_shrink(void);
> void i915_global_request_exit(void);
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index ca19fcf29c5b..86d9c46aef18 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -887,7 +887,7 @@ static bool __i915_gem_unset_wedged(struct drm_i915_private *i915)
> list_for_each_entry(tl, &i915->gt.timelines.active_list, link) {
> struct i915_request *rq;
>
> - rq = i915_gem_active_get_unlocked(&tl->last_request);
> + rq = i915_active_request_get_unlocked(&tl->last_request);
> if (!rq)
> continue;
>
> diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
> index b354843a5040..b2202d2e58a2 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.c
> +++ b/drivers/gpu/drm/i915/i915_timeline.c
> @@ -163,8 +163,8 @@ int i915_timeline_init(struct drm_i915_private *i915,
>
> spin_lock_init(&timeline->lock);
>
> - init_request_active(&timeline->barrier, NULL);
> - init_request_active(&timeline->last_request, NULL);
> + INIT_ACTIVE_REQUEST(&timeline->barrier);
> + INIT_ACTIVE_REQUEST(&timeline->last_request);
> INIT_LIST_HEAD(&timeline->requests);
>
> i915_syncmap_init(&timeline->sync);
> @@ -236,7 +236,7 @@ void i915_timeline_fini(struct i915_timeline *timeline)
> {
> GEM_BUG_ON(timeline->pin_count);
> GEM_BUG_ON(!list_empty(&timeline->requests));
> - GEM_BUG_ON(i915_gem_active_isset(&timeline->barrier));
> + GEM_BUG_ON(i915_active_request_isset(&timeline->barrier));
>
> i915_syncmap_free(&timeline->sync);
> hwsp_free(timeline);
> @@ -268,25 +268,6 @@ i915_timeline_create(struct drm_i915_private *i915,
> return timeline;
> }
>
> -int i915_timeline_set_barrier(struct i915_timeline *tl, struct i915_request *rq)
> -{
> - struct i915_request *old;
> - int err;
> -
> - lockdep_assert_held(&rq->i915->drm.struct_mutex);
> -
> - /* Must maintain ordering wrt existing barriers */
> - old = i915_gem_active_raw(&tl->barrier, &rq->i915->drm.struct_mutex);
> - if (old) {
> - err = i915_request_await_dma_fence(rq, &old->fence);
> - if (err)
> - return err;
> - }
> -
> - i915_gem_active_set(&tl->barrier, rq);
> - return 0;
> -}
> -
> int i915_timeline_pin(struct i915_timeline *tl)
> {
> int err;
> diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
> index d167e04073c5..7bec7d2e45bf 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.h
> +++ b/drivers/gpu/drm/i915/i915_timeline.h
> @@ -28,6 +28,7 @@
> #include <linux/list.h>
> #include <linux/kref.h>
>
> +#include "i915_active.h"
> #include "i915_request.h"
> #include "i915_syncmap.h"
> #include "i915_utils.h"
> @@ -58,10 +59,10 @@ struct i915_timeline {
>
> /* Contains an RCU guarded pointer to the last request. No reference is
> * held to the request, users must carefully acquire a reference to
> - * the request using i915_gem_active_get_request_rcu(), or hold the
> + * the request using i915_active_request_get_request_rcu(), or hold the
> * struct_mutex.
> */
> - struct i915_gem_active last_request;
> + struct i915_active_request last_request;
>
> /**
> * We track the most recent seqno that we wait on in every context so
> @@ -82,7 +83,7 @@ struct i915_timeline {
> * subsequent submissions to this timeline be executed only after the
> * barrier has been completed.
> */
> - struct i915_gem_active barrier;
> + struct i915_active_request barrier;
>
> struct list_head link;
> const char *name;
> @@ -174,7 +175,10 @@ void i915_timelines_fini(struct drm_i915_private *i915);
> * submissions on @timeline. Subsequent requests will not be submitted to GPU
> * until the barrier has been completed.
> */
> -int i915_timeline_set_barrier(struct i915_timeline *timeline,
> - struct i915_request *rq);
> +static inline int
> +i915_timeline_set_barrier(struct i915_timeline *tl, struct i915_request *rq)
> +{
> + return i915_active_request_set(&tl->barrier, rq);
> +}
>
> #endif
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index d4772061e642..b713bed20c38 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -120,7 +120,7 @@ vma_create(struct drm_i915_gem_object *obj,
> return ERR_PTR(-ENOMEM);
>
> i915_active_init(vm->i915, &vma->active, __i915_vma_retire);
> - init_request_active(&vma->last_fence, NULL);
> + INIT_ACTIVE_REQUEST(&vma->last_fence);
>
> vma->vm = vm;
> vma->ops = &vm->vma_ops;
> @@ -808,7 +808,7 @@ static void __i915_vma_destroy(struct i915_vma *vma)
> GEM_BUG_ON(vma->node.allocated);
> GEM_BUG_ON(vma->fence);
>
> - GEM_BUG_ON(i915_gem_active_isset(&vma->last_fence));
> + GEM_BUG_ON(i915_active_request_isset(&vma->last_fence));
>
> mutex_lock(&vma->vm->mutex);
> list_del(&vma->vm_link);
> @@ -942,14 +942,14 @@ int i915_vma_move_to_active(struct i915_vma *vma,
> obj->write_domain = I915_GEM_DOMAIN_RENDER;
>
> if (intel_fb_obj_invalidate(obj, ORIGIN_CS))
> - i915_gem_active_set(&obj->frontbuffer_write, rq);
> + __i915_active_request_set(&obj->frontbuffer_write, rq);
>
> obj->read_domains = 0;
> }
> obj->read_domains |= I915_GEM_GPU_DOMAINS;
>
> if (flags & EXEC_OBJECT_NEEDS_FENCE)
> - i915_gem_active_set(&vma->last_fence, rq);
> + __i915_active_request_set(&vma->last_fence, rq);
>
> export_fence(vma, rq, flags);
> return 0;
> @@ -986,8 +986,8 @@ int i915_vma_unbind(struct i915_vma *vma)
> if (ret)
> goto unpin;
>
> - ret = i915_gem_active_retire(&vma->last_fence,
> - &vma->vm->i915->drm.struct_mutex);
> + ret = i915_active_request_retire(&vma->last_fence,
> + &vma->vm->i915->drm.struct_mutex);
> unpin:
> __i915_vma_unpin(vma);
> if (ret)
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 3c03d4569481..7c742027f866 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -110,7 +110,7 @@ struct i915_vma {
> #define I915_VMA_GGTT_WRITE BIT(15)
>
> struct i915_active active;
> - struct i915_gem_active last_fence;
> + struct i915_active_request last_fence;
>
> /**
> * Support different GGTT views into the same object.
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index ec2cbbe070a4..0dbd6d7c1693 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1124,7 +1124,7 @@ bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
> * the last request that remains in the timeline. When idle, it is
> * the last executed context as tracked by retirement.
> */
> - rq = __i915_gem_active_peek(&engine->timeline.last_request);
> + rq = __i915_active_request_peek(&engine->timeline.last_request);
> if (rq)
> return rq->hw_context == kernel_context;
> else
> diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
> index a9238fd07e30..c0df1dbb0069 100644
> --- a/drivers/gpu/drm/i915/intel_overlay.c
> +++ b/drivers/gpu/drm/i915/intel_overlay.c
> @@ -186,7 +186,7 @@ struct intel_overlay {
> struct overlay_registers __iomem *regs;
> u32 flip_addr;
> /* flip handling */
> - struct i915_gem_active last_flip;
> + struct i915_active_request last_flip;
> };
>
> static void i830_overlay_clock_gating(struct drm_i915_private *dev_priv,
> @@ -214,23 +214,23 @@ static void i830_overlay_clock_gating(struct drm_i915_private *dev_priv,
>
> static void intel_overlay_submit_request(struct intel_overlay *overlay,
> struct i915_request *rq,
> - i915_gem_retire_fn retire)
> + i915_active_retire_fn retire)
> {
> - GEM_BUG_ON(i915_gem_active_peek(&overlay->last_flip,
> - &overlay->i915->drm.struct_mutex));
> - i915_gem_active_set_retire_fn(&overlay->last_flip, retire,
> - &overlay->i915->drm.struct_mutex);
> - i915_gem_active_set(&overlay->last_flip, rq);
> + GEM_BUG_ON(i915_active_request_peek(&overlay->last_flip,
> + &overlay->i915->drm.struct_mutex));
> + i915_active_request_set_retire_fn(&overlay->last_flip, retire,
> + &overlay->i915->drm.struct_mutex);
> + __i915_active_request_set(&overlay->last_flip, rq);
> i915_request_add(rq);
> }
>
> static int intel_overlay_do_wait_request(struct intel_overlay *overlay,
> struct i915_request *rq,
> - i915_gem_retire_fn retire)
> + i915_active_retire_fn retire)
> {
> intel_overlay_submit_request(overlay, rq, retire);
> - return i915_gem_active_retire(&overlay->last_flip,
> - &overlay->i915->drm.struct_mutex);
> + return i915_active_request_retire(&overlay->last_flip,
> + &overlay->i915->drm.struct_mutex);
> }
>
> static struct i915_request *alloc_request(struct intel_overlay *overlay)
> @@ -351,8 +351,9 @@ static void intel_overlay_release_old_vma(struct intel_overlay *overlay)
> i915_vma_put(vma);
> }
>
> -static void intel_overlay_release_old_vid_tail(struct i915_gem_active *active,
> - struct i915_request *rq)
> +static void
> +intel_overlay_release_old_vid_tail(struct i915_active_request *active,
> + struct i915_request *rq)
> {
> struct intel_overlay *overlay =
> container_of(active, typeof(*overlay), last_flip);
> @@ -360,7 +361,7 @@ static void intel_overlay_release_old_vid_tail(struct i915_gem_active *active,
> intel_overlay_release_old_vma(overlay);
> }
>
> -static void intel_overlay_off_tail(struct i915_gem_active *active,
> +static void intel_overlay_off_tail(struct i915_active_request *active,
> struct i915_request *rq)
> {
> struct intel_overlay *overlay =
> @@ -423,8 +424,8 @@ static int intel_overlay_off(struct intel_overlay *overlay)
> * We have to be careful not to repeat work forever and make forward progress. */
> static int intel_overlay_recover_from_interrupt(struct intel_overlay *overlay)
> {
> - return i915_gem_active_retire(&overlay->last_flip,
> - &overlay->i915->drm.struct_mutex);
> + return i915_active_request_retire(&overlay->last_flip,
> + &overlay->i915->drm.struct_mutex);
> }
>
> /* Wait for pending overlay flip and release old frame.
> @@ -1357,7 +1358,7 @@ void intel_overlay_setup(struct drm_i915_private *dev_priv)
> overlay->contrast = 75;
> overlay->saturation = 146;
>
> - init_request_active(&overlay->last_flip, NULL);
> + INIT_ACTIVE_REQUEST(&overlay->last_flip);
>
> mutex_lock(&dev_priv->drm.struct_mutex);
>
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index 30ab0e04a674..72151aab208e 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -501,8 +501,8 @@ static int live_suppress_wait_preempt(void *arg)
> }
>
> /* Disable NEWCLIENT promotion */
> - i915_gem_active_set(&rq[i]->timeline->last_request,
> - dummy);
> + __i915_active_request_set(&rq[i]->timeline->last_request,
> + dummy);
> i915_request_add(rq[i]);
> }
>
> diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
> index e5659aaa856d..d2de9ece2118 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
> @@ -15,8 +15,8 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
>
> spin_lock_init(&timeline->lock);
>
> - init_request_active(&timeline->barrier, NULL);
> - init_request_active(&timeline->last_request, NULL);
> + INIT_ACTIVE_REQUEST(&timeline->barrier);
> + INIT_ACTIVE_REQUEST(&timeline->last_request);
> INIT_LIST_HEAD(&timeline->requests);
>
> i915_syncmap_init(&timeline->sync);
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH 18/22] drm/i915: Keep timeline HWSP allocated until idle across the system
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (16 preceding siblings ...)
2019-02-04 13:22 ` [PATCH 17/22] drm/i915: Pull i915_gem_active into the i915_active family Chris Wilson
@ 2019-02-04 13:22 ` Chris Wilson
2019-02-04 18:57 ` Tvrtko Ursulin
2019-02-04 13:22 ` [PATCH 19/22] drm/i915/execlists: Refactor out can_merge_rq() Chris Wilson
` (10 subsequent siblings)
28 siblings, 1 reply; 45+ messages in thread
From: Chris Wilson @ 2019-02-04 13:22 UTC (permalink / raw)
To: intel-gfx
In preparation for enabling HW semaphores, we need to keep the in-flight
timeline HWSP alive until its use across the entire system has completed,
as any other timeline active on the GPU may still refer back to the
already retired timeline. We have to delay both recycling available
cachelines and unpinning the old HWSP until the next idle point.
An easy option would be to simply keep all used HWSP until the system as
a whole was idle, i.e. we could release them all at once on parking.
However, on a busy system, we may never see a global idle point,
essentially meaning the resource will be leaked until we are forced to
do a GC pass. We already employ a fine-grained idle detection mechanism
for vma, which we can reuse here so that each cacheline can be freed
immediately after the last request using it is retired.
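Condensed into a sketch (the helper names are the ones introduced below; the
calls are gathered from several functions, so this is illustrative only), the
lifetime of one cacheline becomes:

	cl = cacheline_alloc(hwsp, cacheline);	/* pins a CPU map of the HWSP page */
	cacheline_acquire(cl);			/* while the timeline is pinned */

	/* any request still sampling the seqno keeps the cacheline busy */
	i915_active_ref(&cl->active, tl->fence_context, rq);

	cacheline_release(cl);			/* timeline unpinned */
	cacheline_free(cl);			/* actual free deferred until
						 * cl->active reports idle */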
v3: Keep track of the activity of each cacheline.
v4: cacheline_free() on canceling the seqno tracking
v5: Finally with a testcase to exercise wraparound
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_request.c | 30 +-
drivers/gpu/drm/i915/i915_timeline.c | 264 ++++++++++++++++--
drivers/gpu/drm/i915/i915_timeline.h | 9 +-
.../gpu/drm/i915/selftests/i915_timeline.c | 110 ++++++++
4 files changed, 374 insertions(+), 39 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index ed9f16bca4fe..057bffa56700 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -331,11 +331,6 @@ void i915_request_retire_upto(struct i915_request *rq)
} while (tmp != rq);
}
-static u32 timeline_get_seqno(struct i915_timeline *tl)
-{
- return tl->seqno += 1 + tl->has_initial_breadcrumb;
-}
-
static void move_to_timeline(struct i915_request *request,
struct i915_timeline *timeline)
{
@@ -556,8 +551,10 @@ struct i915_request *
i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
{
struct drm_i915_private *i915 = engine->i915;
- struct i915_request *rq;
struct intel_context *ce;
+ struct i915_timeline *tl;
+ struct i915_request *rq;
+ u32 seqno;
int ret;
lockdep_assert_held(&i915->drm.struct_mutex);
@@ -632,24 +629,26 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
}
}
- rq->rcustate = get_state_synchronize_rcu();
-
INIT_LIST_HEAD(&rq->active_list);
+
+ tl = ce->ring->timeline;
+ ret = i915_timeline_get_seqno(tl, rq, &seqno);
+ if (ret)
+ goto err_free;
+
rq->i915 = i915;
rq->engine = engine;
rq->gem_context = ctx;
rq->hw_context = ce;
rq->ring = ce->ring;
- rq->timeline = ce->ring->timeline;
+ rq->timeline = tl;
GEM_BUG_ON(rq->timeline == &engine->timeline);
- rq->hwsp_seqno = rq->timeline->hwsp_seqno;
+ rq->hwsp_seqno = tl->hwsp_seqno;
+ rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
spin_lock_init(&rq->lock);
- dma_fence_init(&rq->fence,
- &i915_fence_ops,
- &rq->lock,
- rq->timeline->fence_context,
- timeline_get_seqno(rq->timeline));
+ dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock,
+ tl->fence_context, seqno);
/* We bump the ref for the fence chain */
i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
@@ -710,6 +709,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
+err_free:
kmem_cache_free(global.slab_requests, rq);
err_unreserve:
unreserve_gt(i915);
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index b2202d2e58a2..3608e544012f 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -6,19 +6,29 @@
#include "i915_drv.h"
-#include "i915_timeline.h"
+#include "i915_active.h"
#include "i915_syncmap.h"
+#include "i915_timeline.h"
struct i915_timeline_hwsp {
- struct i915_vma *vma;
+ struct i915_gt_timelines *gt;
struct list_head free_link;
+ struct i915_vma *vma;
u64 free_bitmap;
};
-static inline struct i915_timeline_hwsp *
-i915_timeline_hwsp(const struct i915_timeline *tl)
+struct i915_timeline_cacheline {
+ struct i915_active active;
+ struct i915_timeline_hwsp *hwsp;
+ void *vaddr;
+ unsigned int cacheline : 6;
+ unsigned int free : 1;
+};
+
+static inline struct drm_i915_private *
+hwsp_to_i915(struct i915_timeline_hwsp *hwsp)
{
- return tl->hwsp_ggtt->private;
+ return container_of(hwsp->gt, struct drm_i915_private, gt.timelines);
}
static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
@@ -71,6 +81,7 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
vma->private = hwsp;
hwsp->vma = vma;
hwsp->free_bitmap = ~0ull;
+ hwsp->gt = gt;
spin_lock(&gt->hwsp_lock);
list_add(&hwsp->free_link, &gt->hwsp_free_list);
@@ -88,14 +99,9 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
return hwsp->vma;
}
-static void hwsp_free(struct i915_timeline *timeline)
+static void __idle_hwsp_free(struct i915_timeline_hwsp *hwsp, int cacheline)
{
- struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
- struct i915_timeline_hwsp *hwsp;
-
- hwsp = i915_timeline_hwsp(timeline);
- if (!hwsp) /* leave global HWSP alone! */
- return;
+ struct i915_gt_timelines *gt = hwsp->gt;
spin_lock(&gt->hwsp_lock);
@@ -103,7 +109,8 @@ static void hwsp_free(struct i915_timeline *timeline)
if (!hwsp->free_bitmap)
list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
- hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
+ GEM_BUG_ON(cacheline >= BITS_PER_TYPE(hwsp->free_bitmap));
+ hwsp->free_bitmap |= BIT_ULL(cacheline);
/* And if no one is left using it, give the page back to the system */
if (hwsp->free_bitmap == ~0ull) {
@@ -115,6 +122,78 @@ static void hwsp_free(struct i915_timeline *timeline)
spin_unlock(&gt->hwsp_lock);
}
+static void __idle_cacheline_free(struct i915_timeline_cacheline *cl)
+{
+ GEM_BUG_ON(!i915_active_is_idle(&cl->active));
+
+ i915_gem_object_unpin_map(cl->hwsp->vma->obj);
+ i915_vma_put(cl->hwsp->vma);
+ __idle_hwsp_free(cl->hwsp, cl->cacheline);
+
+ i915_active_fini(&cl->active);
+ kfree(cl);
+}
+
+static void __cacheline_retire(struct i915_active *active)
+{
+ struct i915_timeline_cacheline *cl =
+ container_of(active, typeof(*cl), active);
+
+ i915_vma_unpin(cl->hwsp->vma);
+ if (cl->free)
+ __idle_cacheline_free(cl);
+}
+
+static struct i915_timeline_cacheline *
+cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline)
+{
+ struct i915_timeline_cacheline *cl;
+ void *vaddr;
+
+ GEM_BUG_ON(cacheline >= 64);
+
+ cl = kmalloc(sizeof(*cl), GFP_KERNEL);
+ if (!cl)
+ return ERR_PTR(-ENOMEM);
+
+ vaddr = i915_gem_object_pin_map(hwsp->vma->obj, I915_MAP_WB);
+ if (IS_ERR(vaddr)) {
+ kfree(cl);
+ return ERR_CAST(vaddr);
+ }
+
+ i915_vma_get(hwsp->vma);
+ cl->hwsp = hwsp;
+ cl->vaddr = vaddr;
+ cl->cacheline = cacheline;
+ cl->free = false;
+
+ i915_active_init(hwsp_to_i915(hwsp), &cl->active, __cacheline_retire);
+
+ return cl;
+}
+
+static void cacheline_acquire(struct i915_timeline_cacheline *cl)
+{
+ if (cl && i915_active_acquire(&cl->active))
+ __i915_vma_pin(cl->hwsp->vma);
+}
+
+static void cacheline_release(struct i915_timeline_cacheline *cl)
+{
+ if (cl)
+ i915_active_release(&cl->active);
+}
+
+static void cacheline_free(struct i915_timeline_cacheline *cl)
+{
+ GEM_BUG_ON(cl->free);
+ cl->free = true;
+
+ if (i915_active_is_idle(&cl->active))
+ __idle_cacheline_free(cl);
+}
+
int i915_timeline_init(struct drm_i915_private *i915,
struct i915_timeline *timeline,
const char *name,
@@ -136,29 +215,40 @@ int i915_timeline_init(struct drm_i915_private *i915,
timeline->name = name;
timeline->pin_count = 0;
timeline->has_initial_breadcrumb = !hwsp;
+ timeline->hwsp_cacheline = NULL;
- timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
if (!hwsp) {
+ struct i915_timeline_cacheline *cl;
unsigned int cacheline;
hwsp = hwsp_alloc(timeline, &cacheline);
if (IS_ERR(hwsp))
return PTR_ERR(hwsp);
+ cl = cacheline_alloc(hwsp->private, cacheline);
+ if (IS_ERR(cl)) {
+ __idle_hwsp_free(hwsp->private, cacheline);
+ return PTR_ERR(cl);
+ }
+
+ timeline->hwsp_cacheline = cl;
timeline->hwsp_offset = cacheline * CACHELINE_BYTES;
- }
- timeline->hwsp_ggtt = i915_vma_get(hwsp);
- vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
- if (IS_ERR(vaddr)) {
- hwsp_free(timeline);
- i915_vma_put(hwsp);
- return PTR_ERR(vaddr);
+ vaddr = cl->vaddr;
+ } else {
+ timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
+
+ vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
+ if (IS_ERR(vaddr))
+ return PTR_ERR(vaddr);
}
timeline->hwsp_seqno =
memset(vaddr + timeline->hwsp_offset, 0, CACHELINE_BYTES);
+ timeline->hwsp_ggtt = i915_vma_get(hwsp);
+ GEM_BUG_ON(timeline->hwsp_offset >= hwsp->size);
+
timeline->fence_context = dma_fence_context_alloc(1);
spin_lock_init(&timeline->lock);
@@ -239,9 +329,12 @@ void i915_timeline_fini(struct i915_timeline *timeline)
GEM_BUG_ON(i915_active_request_isset(&timeline->barrier));
i915_syncmap_free(&timeline->sync);
- hwsp_free(timeline);
- i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
+ if (timeline->hwsp_cacheline)
+ cacheline_free(timeline->hwsp_cacheline);
+ else
+ i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
+
i915_vma_put(timeline->hwsp_ggtt);
}
@@ -284,6 +377,7 @@ int i915_timeline_pin(struct i915_timeline *tl)
i915_ggtt_offset(tl->hwsp_ggtt) +
offset_in_page(tl->hwsp_offset);
+ cacheline_acquire(tl->hwsp_cacheline);
timeline_add_to_active(tl);
return 0;
@@ -293,6 +387,129 @@ int i915_timeline_pin(struct i915_timeline *tl)
return err;
}
+static u32 timeline_advance(struct i915_timeline *tl)
+{
+ GEM_BUG_ON(!tl->pin_count);
+ GEM_BUG_ON(tl->seqno & tl->has_initial_breadcrumb);
+
+ return tl->seqno += 1 + tl->has_initial_breadcrumb;
+}
+
+static void timeline_rollback(struct i915_timeline *tl)
+{
+ tl->seqno -= 1 + tl->has_initial_breadcrumb;
+}
+
+static noinline int
+__i915_timeline_get_seqno(struct i915_timeline *tl,
+ struct i915_request *rq,
+ u32 *seqno)
+{
+ struct i915_timeline_cacheline *cl;
+ struct i915_vma *vma;
+ unsigned int cacheline;
+ int err;
+
+ /*
+ * If there is an outstanding GPU reference to this cacheline,
+ * such as it being sampled by a HW semaphore on another timeline,
+ * we cannot wraparound our seqno value (the HW semaphore does
+ * a strict greater-than-or-equals compare, not i915_seqno_passed).
+ * So if the cacheline is still busy, we must detach ourselves
+ * from it and leave it inflight alongside its users.
+ *
+ * However, if nobody is watching and we can guarantee that nobody
+ * will, we could simply reuse the same cacheline.
+ *
+ * if (i915_active_request_is_signaled(&tl->last_request) &&
+ * i915_active_is_signaled(&tl->hwsp_cacheline->active))
+ * return 0;
+ *
+ * That seems unlikely for a busy timeline that needed to wrap in
+ * the first place, so just replace the cacheline.
+ */
+
+ vma = hwsp_alloc(tl, &cacheline);
+ if (IS_ERR(vma)) {
+ err = PTR_ERR(vma);
+ goto err_rollback;
+ }
+
+ err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
+ if (err) {
+ __idle_hwsp_free(vma->private, cacheline);
+ goto err_rollback;
+ }
+
+ cl = cacheline_alloc(vma->private, cacheline);
+ if (IS_ERR(cl)) {
+ err = PTR_ERR(cl);
+ __idle_hwsp_free(vma->private, cacheline);
+ goto err_unpin;
+ }
+ GEM_BUG_ON(cl->hwsp->vma != vma);
+
+ /*
+ * Attach the old cacheline to the current request, so that we only
+ * free it after the current request is retired, which ensures that
+ * all writes into the cacheline from previous requests are complete.
+ */
+ err = i915_active_ref(&tl->hwsp_cacheline->active,
+ tl->fence_context, rq);
+ if (err)
+ goto err_cacheline;
+
+ cacheline_release(tl->hwsp_cacheline); /* ownership now xfered to rq */
+ cacheline_free(tl->hwsp_cacheline);
+
+ i915_vma_unpin(tl->hwsp_ggtt); /* binding kept alive by old cacheline */
+ i915_vma_put(tl->hwsp_ggtt);
+
+ tl->hwsp_ggtt = i915_vma_get(vma);
+
+ tl->hwsp_offset = cacheline * CACHELINE_BYTES;
+ tl->hwsp_seqno =
+ memset(cl->vaddr + tl->hwsp_offset, 0, CACHELINE_BYTES);
+
+ tl->hwsp_offset += i915_ggtt_offset(vma);
+
+ cacheline_acquire(cl);
+ tl->hwsp_cacheline = cl;
+
+ *seqno = timeline_advance(tl);
+ GEM_BUG_ON(i915_seqno_passed(*tl->hwsp_seqno, *seqno));
+ return 0;
+
+err_cacheline:
+ cacheline_free(cl);
+err_unpin:
+ i915_vma_unpin(vma);
+err_rollback:
+ timeline_rollback(tl);
+ return err;
+}
+
+int i915_timeline_get_seqno(struct i915_timeline *tl,
+ struct i915_request *rq,
+ u32 *seqno)
+{
+ *seqno = timeline_advance(tl);
+
+ /* Replace the HWSP on wraparound for HW semaphores */
+ if (unlikely(!*seqno && tl->hwsp_cacheline))
+ return __i915_timeline_get_seqno(tl, rq, seqno);
+
+ return 0;
+}
+
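The fast path above only drops into __i915_timeline_get_seqno() when the u32
seqno wraps to zero. A worked example of the arithmetic, mirroring the
live_hwsp_wrap() selftest below (illustrative only):

	/* has_initial_breadcrumb set => each advance adds 2 */
	tl->seqno = -4u;			 /* 0xfffffffc */
	i915_timeline_get_seqno(tl, rq, &seqno); /* 0xfffffffe, fast path */
	i915_timeline_get_seqno(tl, rq, &seqno); /* hits 0: HWSP replaced,
						  * seqno restarts from 2 */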
+int i915_timeline_read_lock(struct i915_timeline *tl, struct i915_request *rq)
+{
+ GEM_BUG_ON(!tl->pin_count);
+ GEM_BUG_ON(!tl->hwsp_cacheline);
+ return i915_active_ref(&tl->hwsp_cacheline->active,
+ rq->fence.context, rq);
+}
+
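i915_timeline_read_lock() is the hook for the foreign readers described above;
a hypothetical caller (not in this patch) that emits a semaphore wait against
another timeline's HWSP might pin the cacheline to its own request first:

	/* 'signal' is a request on the timeline whose seqno we will sample */
	err = i915_timeline_read_lock(signal->timeline, rq);
	if (err)
		return err;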
void i915_timeline_unpin(struct i915_timeline *tl)
{
GEM_BUG_ON(!tl->pin_count);
@@ -300,6 +517,7 @@ void i915_timeline_unpin(struct i915_timeline *tl)
return;
timeline_remove_from_active(tl);
+ cacheline_release(tl->hwsp_cacheline);
/*
* Since this timeline is idle, all barriers upon which we were waiting
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 7bec7d2e45bf..d78ec6fbc000 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -34,7 +34,7 @@
#include "i915_utils.h"
struct i915_vma;
-struct i915_timeline_hwsp;
+struct i915_timeline_cacheline;
struct i915_timeline {
u64 fence_context;
@@ -49,6 +49,8 @@ struct i915_timeline {
struct i915_vma *hwsp_ggtt;
u32 hwsp_offset;
+ struct i915_timeline_cacheline *hwsp_cacheline;
+
bool has_initial_breadcrumb;
/**
@@ -160,6 +162,11 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
}
int i915_timeline_pin(struct i915_timeline *tl);
+int i915_timeline_get_seqno(struct i915_timeline *tl,
+ struct i915_request *rq,
+ u32 *seqno);
+int i915_timeline_read_lock(struct i915_timeline *tl,
+ struct i915_request *rq);
void i915_timeline_unpin(struct i915_timeline *tl);
void i915_timelines_init(struct drm_i915_private *i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
index 12ea69b1a1e5..9e0126867634 100644
--- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
@@ -641,6 +641,115 @@ static int live_hwsp_alternate(void *arg)
#undef NUM_TIMELINES
}
+static int live_hwsp_wrap(void *arg)
+{
+ struct drm_i915_private *i915 = arg;
+ struct intel_engine_cs *engine;
+ struct i915_timeline *tl;
+ enum intel_engine_id id;
+ intel_wakeref_t wakeref;
+ int err = 0;
+
+ /*
+ * Across a seqno wrap, we need to keep the old cacheline alive for
+ * foreign GPU references.
+ */
+
+ mutex_lock(&i915->drm.struct_mutex);
+ wakeref = intel_runtime_pm_get(i915);
+
+ tl = i915_timeline_create(i915, __func__, NULL);
+ if (IS_ERR(tl)) {
+ err = PTR_ERR(tl);
+ goto out_rpm;
+ }
+ if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline)
+ goto out_free;
+
+ err = i915_timeline_pin(tl);
+ if (err)
+ goto out_free;
+
+ for_each_engine(engine, i915, id) {
+ const u32 *hwsp_seqno[2];
+ struct i915_request *rq;
+ u32 seqno[2];
+
+ rq = i915_request_alloc(engine, i915->kernel_context);
+ if (IS_ERR(rq)) {
+ err = PTR_ERR(rq);
+ goto out;
+ }
+
+ tl->seqno = -4u;
+
+ err = i915_timeline_get_seqno(tl, rq, &seqno[0]);
+ if (err) {
+ i915_request_add(rq);
+ goto out;
+ }
+ pr_debug("seqno[0]:%08x, hwsp_offset:%08x\n",
+ seqno[0], tl->hwsp_offset);
+
+ err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[0]);
+ if (err) {
+ i915_request_add(rq);
+ goto out;
+ }
+ hwsp_seqno[0] = tl->hwsp_seqno;
+
+ err = i915_timeline_get_seqno(tl, rq, &seqno[1]);
+ if (err) {
+ i915_request_add(rq);
+ goto out;
+ }
+ pr_debug("seqno[1]:%08x, hwsp_offset:%08x\n",
+ seqno[1], tl->hwsp_offset);
+
+ err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[1]);
+ if (err) {
+ i915_request_add(rq);
+ goto out;
+ }
+ hwsp_seqno[1] = tl->hwsp_seqno;
+
+ /* With wrap should come a new hwsp */
+ GEM_BUG_ON(seqno[1] >= seqno[0]);
+ GEM_BUG_ON(hwsp_seqno[0] == hwsp_seqno[1]);
+
+ i915_request_add(rq);
+
+ if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) {
+ pr_err("Wait for timeline writes timed out!\n");
+ err = -EIO;
+ goto out;
+ }
+
+ if (*hwsp_seqno[0] != seqno[0] || *hwsp_seqno[1] != seqno[1]) {
+ pr_err("Bad timeline values: found (%x, %x), expected (%x, %x)\n",
+ *hwsp_seqno[0], *hwsp_seqno[1],
+ seqno[0], seqno[1]);
+ err = -EINVAL;
+ goto out;
+ }
+
+ i915_retire_requests(i915); /* recycle HWSP */
+ }
+
+out:
+ if (igt_flush_test(i915, I915_WAIT_LOCKED))
+ err = -EIO;
+
+ i915_timeline_unpin(tl);
+out_free:
+ i915_timeline_put(tl);
+out_rpm:
+ intel_runtime_pm_put(i915, wakeref);
+ mutex_unlock(&i915->drm.struct_mutex);
+
+ return err;
+}
+
static int live_hwsp_recycle(void *arg)
{
struct drm_i915_private *i915 = arg;
@@ -723,6 +832,7 @@ int i915_timeline_live_selftests(struct drm_i915_private *i915)
SUBTEST(live_hwsp_recycle),
SUBTEST(live_hwsp_engine),
SUBTEST(live_hwsp_alternate),
+ SUBTEST(live_hwsp_wrap),
};
return i915_subtests(tests, i915);
--
2.20.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH 18/22] drm/i915: Keep timeline HWSP allocated until idle across the system
2019-02-04 13:22 ` [PATCH 18/22] drm/i915: Keep timeline HWSP allocated until idle across the system Chris Wilson
@ 2019-02-04 18:57 ` Tvrtko Ursulin
0 siblings, 0 replies; 45+ messages in thread
From: Tvrtko Ursulin @ 2019-02-04 18:57 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 04/02/2019 13:22, Chris Wilson wrote:
> In preparation for enabling HW semaphores, we need to keep the in-flight
> timeline HWSP alive until its use across the entire system has completed,
> as any other timeline active on the GPU may still refer back to the
> already retired timeline. We have to delay both the recycling of
> available cachelines and the unpinning of the old HWSP until the next
> idle point.
>
> An easy option would be to simply keep all used HWSP until the system as
> a whole was idle, i.e. we could release them all at once on parking.
> However, on a busy system, we may never see a global idle point,
> essentially meaning the resource will be leaked until we are forced to
> do a GC pass. We already employ a fine-grained idle detection mechanism
> for vma, which we can reuse here so that each cacheline can be freed
> immediately after the last request using it is retired.
>
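The fine-grained tracking here hangs an i915_active off each cacheline;
its retire callback, reproduced from the diff below with comments added,
is what lets a wrapped-past cacheline be freed as soon as its last
reader retires rather than at a global idle point:

	static void __cacheline_retire(struct i915_active *active)
	{
		struct i915_timeline_cacheline *cl =
			container_of(active, typeof(*cl), active);

		/* The last request reading this cacheline has retired. */
		i915_vma_unpin(cl->hwsp->vma);

		/* If the timeline has already moved on, free it now. */
		if (cl->free)
			__idle_cacheline_free(cl);
	}
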
> v3: Keep track of the activity of each cacheline.
> v4: cacheline_free() on canceling the seqno tracking
> v5: Finally with a testcase to exercise wraparound
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/i915_request.c | 30 +-
> drivers/gpu/drm/i915/i915_timeline.c | 264 ++++++++++++++++--
> drivers/gpu/drm/i915/i915_timeline.h | 9 +-
> .../gpu/drm/i915/selftests/i915_timeline.c | 110 ++++++++
> 4 files changed, 374 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index ed9f16bca4fe..057bffa56700 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -331,11 +331,6 @@ void i915_request_retire_upto(struct i915_request *rq)
> } while (tmp != rq);
> }
>
> -static u32 timeline_get_seqno(struct i915_timeline *tl)
> -{
> - return tl->seqno += 1 + tl->has_initial_breadcrumb;
> -}
> -
> static void move_to_timeline(struct i915_request *request,
> struct i915_timeline *timeline)
> {
> @@ -556,8 +551,10 @@ struct i915_request *
> i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> {
> struct drm_i915_private *i915 = engine->i915;
> - struct i915_request *rq;
> struct intel_context *ce;
> + struct i915_timeline *tl;
> + struct i915_request *rq;
> + u32 seqno;
> int ret;
>
> lockdep_assert_held(&i915->drm.struct_mutex);
> @@ -632,24 +629,26 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> }
> }
>
> - rq->rcustate = get_state_synchronize_rcu();
> -
> INIT_LIST_HEAD(&rq->active_list);
> +
> + tl = ce->ring->timeline;
> + ret = i915_timeline_get_seqno(tl, rq, &seqno);
> + if (ret)
> + goto err_free;
> +
> rq->i915 = i915;
> rq->engine = engine;
> rq->gem_context = ctx;
> rq->hw_context = ce;
> rq->ring = ce->ring;
> - rq->timeline = ce->ring->timeline;
> + rq->timeline = tl;
> GEM_BUG_ON(rq->timeline == &engine->timeline);
> - rq->hwsp_seqno = rq->timeline->hwsp_seqno;
> + rq->hwsp_seqno = tl->hwsp_seqno;
> + rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
>
> spin_lock_init(&rq->lock);
> - dma_fence_init(&rq->fence,
> - &i915_fence_ops,
> - &rq->lock,
> - rq->timeline->fence_context,
> - timeline_get_seqno(rq->timeline));
> + dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock,
> + tl->fence_context, seqno);
>
> /* We bump the ref for the fence chain */
> i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
> @@ -710,6 +709,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
> GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
>
> +err_free:
> kmem_cache_free(global.slab_requests, rq);
> err_unreserve:
> unreserve_gt(i915);
> diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
> index b2202d2e58a2..3608e544012f 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.c
> +++ b/drivers/gpu/drm/i915/i915_timeline.c
> @@ -6,19 +6,29 @@
>
> #include "i915_drv.h"
>
> -#include "i915_timeline.h"
> +#include "i915_active.h"
> #include "i915_syncmap.h"
> +#include "i915_timeline.h"
>
> struct i915_timeline_hwsp {
> - struct i915_vma *vma;
> + struct i915_gt_timelines *gt;
> struct list_head free_link;
> + struct i915_vma *vma;
> u64 free_bitmap;
> };
>
> -static inline struct i915_timeline_hwsp *
> -i915_timeline_hwsp(const struct i915_timeline *tl)
> +struct i915_timeline_cacheline {
> + struct i915_active active;
> + struct i915_timeline_hwsp *hwsp;
> + void *vaddr;
> + unsigned int cacheline : 6;
> + unsigned int free : 1;
> +};
> +
> +static inline struct drm_i915_private *
> +hwsp_to_i915(struct i915_timeline_hwsp *hwsp)
> {
> - return tl->hwsp_ggtt->private;
> + return container_of(hwsp->gt, struct drm_i915_private, gt.timelines);
> }
>
> static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
> @@ -71,6 +81,7 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
> vma->private = hwsp;
> hwsp->vma = vma;
> hwsp->free_bitmap = ~0ull;
> + hwsp->gt = gt;
>
> spin_lock(>->hwsp_lock);
> list_add(&hwsp->free_link, >->hwsp_free_list);
> @@ -88,14 +99,9 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
> return hwsp->vma;
> }
>
> -static void hwsp_free(struct i915_timeline *timeline)
> +static void __idle_hwsp_free(struct i915_timeline_hwsp *hwsp, int cacheline)
> {
> - struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
> - struct i915_timeline_hwsp *hwsp;
> -
> - hwsp = i915_timeline_hwsp(timeline);
> - if (!hwsp) /* leave global HWSP alone! */
> - return;
> + struct i915_gt_timelines *gt = hwsp->gt;
>
> spin_lock(>->hwsp_lock);
>
> @@ -103,7 +109,8 @@ static void hwsp_free(struct i915_timeline *timeline)
> if (!hwsp->free_bitmap)
> list_add_tail(&hwsp->free_link, >->hwsp_free_list);
>
> - hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
> + GEM_BUG_ON(cacheline >= BITS_PER_TYPE(hwsp->free_bitmap));
> + hwsp->free_bitmap |= BIT_ULL(cacheline);
>
> /* And if no one is left using it, give the page back to the system */
> if (hwsp->free_bitmap == ~0ull) {
> @@ -115,6 +122,78 @@ static void hwsp_free(struct i915_timeline *timeline)
> spin_unlock(>->hwsp_lock);
> }
>
> +static void __idle_cacheline_free(struct i915_timeline_cacheline *cl)
> +{
> + GEM_BUG_ON(!i915_active_is_idle(&cl->active));
> +
> + i915_gem_object_unpin_map(cl->hwsp->vma->obj);
> + i915_vma_put(cl->hwsp->vma);
> + __idle_hwsp_free(cl->hwsp, cl->cacheline);
> +
> + i915_active_fini(&cl->active);
> + kfree(cl);
> +}
> +
> +static void __cacheline_retire(struct i915_active *active)
> +{
> + struct i915_timeline_cacheline *cl =
> + container_of(active, typeof(*cl), active);
> +
> + i915_vma_unpin(cl->hwsp->vma);
> + if (cl->free)
> + __idle_cacheline_free(cl);
> +}
> +
> +static struct i915_timeline_cacheline *
> +cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline)
> +{
> + struct i915_timeline_cacheline *cl;
> + void *vaddr;
> +
> + GEM_BUG_ON(cacheline >= 64);
> +
> + cl = kmalloc(sizeof(*cl), GFP_KERNEL);
> + if (!cl)
> + return ERR_PTR(-ENOMEM);
> +
> + vaddr = i915_gem_object_pin_map(hwsp->vma->obj, I915_MAP_WB);
> + if (IS_ERR(vaddr)) {
> + kfree(cl);
> + return ERR_CAST(vaddr);
> + }
> +
> + i915_vma_get(hwsp->vma);
> + cl->hwsp = hwsp;
> + cl->vaddr = vaddr;
> + cl->cacheline = cacheline;
> + cl->free = false;
> +
> + i915_active_init(hwsp_to_i915(hwsp), &cl->active, __cacheline_retire);
> +
> + return cl;
> +}
> +
> +static void cacheline_acquire(struct i915_timeline_cacheline *cl)
> +{
> + if (cl && i915_active_acquire(&cl->active))
> + __i915_vma_pin(cl->hwsp->vma);
> +}
> +
> +static void cacheline_release(struct i915_timeline_cacheline *cl)
> +{
> + if (cl)
> + i915_active_release(&cl->active);
> +}
> +
> +static void cacheline_free(struct i915_timeline_cacheline *cl)
> +{
> + GEM_BUG_ON(cl->free);
> + cl->free = true;
> +
> + if (i915_active_is_idle(&cl->active))
> + __idle_cacheline_free(cl);
> +}
> +
> int i915_timeline_init(struct drm_i915_private *i915,
> struct i915_timeline *timeline,
> const char *name,
> @@ -136,29 +215,40 @@ int i915_timeline_init(struct drm_i915_private *i915,
> timeline->name = name;
> timeline->pin_count = 0;
> timeline->has_initial_breadcrumb = !hwsp;
> + timeline->hwsp_cacheline = NULL;
>
> - timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
> if (!hwsp) {
> + struct i915_timeline_cacheline *cl;
> unsigned int cacheline;
>
> hwsp = hwsp_alloc(timeline, &cacheline);
> if (IS_ERR(hwsp))
> return PTR_ERR(hwsp);
>
> + cl = cacheline_alloc(hwsp->private, cacheline);
> + if (IS_ERR(cl)) {
> + __idle_hwsp_free(hwsp->private, cacheline);
> + return PTR_ERR(cl);
> + }
> +
> + timeline->hwsp_cacheline = cl;
> timeline->hwsp_offset = cacheline * CACHELINE_BYTES;
> - }
> - timeline->hwsp_ggtt = i915_vma_get(hwsp);
>
> - vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
> - if (IS_ERR(vaddr)) {
> - hwsp_free(timeline);
> - i915_vma_put(hwsp);
> - return PTR_ERR(vaddr);
> + vaddr = cl->vaddr;
> + } else {
> + timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
> +
> + vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
> + if (IS_ERR(vaddr))
> + return PTR_ERR(vaddr);
> }
>
> timeline->hwsp_seqno =
> memset(vaddr + timeline->hwsp_offset, 0, CACHELINE_BYTES);
>
> + timeline->hwsp_ggtt = i915_vma_get(hwsp);
> + GEM_BUG_ON(timeline->hwsp_offset >= hwsp->size);
> +
> timeline->fence_context = dma_fence_context_alloc(1);
>
> spin_lock_init(&timeline->lock);
> @@ -239,9 +329,12 @@ void i915_timeline_fini(struct i915_timeline *timeline)
> GEM_BUG_ON(i915_active_request_isset(&timeline->barrier));
>
> i915_syncmap_free(&timeline->sync);
> - hwsp_free(timeline);
>
> - i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
> + if (timeline->hwsp_cacheline)
> + cacheline_free(timeline->hwsp_cacheline);
> + else
> + i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
> +
> i915_vma_put(timeline->hwsp_ggtt);
> }
>
> @@ -284,6 +377,7 @@ int i915_timeline_pin(struct i915_timeline *tl)
> i915_ggtt_offset(tl->hwsp_ggtt) +
> offset_in_page(tl->hwsp_offset);
>
> + cacheline_acquire(tl->hwsp_cacheline);
> timeline_add_to_active(tl);
>
> return 0;
> @@ -293,6 +387,129 @@ int i915_timeline_pin(struct i915_timeline *tl)
> return err;
> }
>
> +static u32 timeline_advance(struct i915_timeline *tl)
> +{
> + GEM_BUG_ON(!tl->pin_count);
> + GEM_BUG_ON(tl->seqno & tl->has_initial_breadcrumb);
> +
> + return tl->seqno += 1 + tl->has_initial_breadcrumb;
> +}
> +
> +static void timeline_rollback(struct i915_timeline *tl)
> +{
> + tl->seqno -= 1 + tl->has_initial_breadcrumb;
> +}
> +
> +static noinline int
> +__i915_timeline_get_seqno(struct i915_timeline *tl,
> + struct i915_request *rq,
> + u32 *seqno)
> +{
> + struct i915_timeline_cacheline *cl;
> + struct i915_vma *vma;
> + unsigned int cacheline;
> + int err;
> +
> + /*
> + * If there is an outstanding GPU reference to this cacheline,
> + * such as it being sampled by a HW semaphore on another timeline,
> + * we cannot wraparound our seqno value (the HW semaphore does
> + * a strict greater-than-or-equals compare, not i915_seqno_passed).
> + * So if the cacheline is still busy, we must detach ourselves
> + * from it and leave it inflight alongside its users.
> + *
> + * However, if nobody is watching and we can guarantee that nobody
> + * will, we could simply reuse the same cacheline.
> + *
> + * if (i915_active_request_is_signaled(&tl->last_request) &&
> + * i915_active_is_signaled(&tl->hwsp_cacheline->active))
> + * return 0;
> + *
> + * That seems unlikely for a busy timeline that needed to wrap in
> + * the first place, so just replace the cacheline.
> + */
> +
> + vma = hwsp_alloc(tl, &cacheline);
> + if (IS_ERR(vma)) {
> + err = PTR_ERR(vma);
> + goto err_rollback;
> + }
> +
> + err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
> + if (err) {
> + __idle_hwsp_free(vma->private, cacheline);
> + goto err_rollback;
> + }
> +
> + cl = cacheline_alloc(vma->private, cacheline);
> + if (IS_ERR(cl)) {
> + err = PTR_ERR(cl);
> + __idle_hwsp_free(vma->private, cacheline);
> + goto err_unpin;
> + }
> + GEM_BUG_ON(cl->hwsp->vma != vma);
> +
> + /*
> + * Attach the old cacheline to the current request, so that we only
> + * free it after the current request is retired, which ensures that
> + * all writes into the cacheline from previous requests are complete.
> + */
> + err = i915_active_ref(&tl->hwsp_cacheline->active,
> + tl->fence_context, rq);
> + if (err)
> + goto err_cacheline;
> +
> + cacheline_release(tl->hwsp_cacheline); /* ownership now xfered to rq */
> + cacheline_free(tl->hwsp_cacheline);
> +
> + i915_vma_unpin(tl->hwsp_ggtt); /* binding kept alive by old cacheline */
> + i915_vma_put(tl->hwsp_ggtt);
> +
> + tl->hwsp_ggtt = i915_vma_get(vma);
> +
> + tl->hwsp_offset = cacheline * CACHELINE_BYTES;
> + tl->hwsp_seqno =
> + memset(cl->vaddr + tl->hwsp_offset, 0, CACHELINE_BYTES);
> +
> + tl->hwsp_offset += i915_ggtt_offset(vma);
> +
> + cacheline_acquire(cl);
> + tl->hwsp_cacheline = cl;
> +
> + *seqno = timeline_advance(tl);
> + GEM_BUG_ON(i915_seqno_passed(*tl->hwsp_seqno, *seqno));
> + return 0;
> +
> +err_cacheline:
> + cacheline_free(cl);
> +err_unpin:
> + i915_vma_unpin(vma);
> +err_rollback:
> + timeline_rollback(tl);
> + return err;
> +}
> +
> +int i915_timeline_get_seqno(struct i915_timeline *tl,
> + struct i915_request *rq,
> + u32 *seqno)
> +{
> + *seqno = timeline_advance(tl);
> +
> + /* Replace the HWSP on wraparound for HW semaphores */
> + if (unlikely(!*seqno && tl->hwsp_cacheline))
> + return __i915_timeline_get_seqno(tl, rq, seqno);
> +
> + return 0;
> +}
> +
> +int i915_timeline_read_lock(struct i915_timeline *tl, struct i915_request *rq)
> +{
> + GEM_BUG_ON(!tl->pin_count);
> + GEM_BUG_ON(!tl->hwsp_cacheline);
> + return i915_active_ref(&tl->hwsp_cacheline->active,
> + rq->fence.context, rq);
> +}
> +
> void i915_timeline_unpin(struct i915_timeline *tl)
> {
> GEM_BUG_ON(!tl->pin_count);
> @@ -300,6 +517,7 @@ void i915_timeline_unpin(struct i915_timeline *tl)
> return;
>
> timeline_remove_from_active(tl);
> + cacheline_release(tl->hwsp_cacheline);
>
> /*
> * Since this timeline is idle, all barriers upon which we were waiting
> diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
> index 7bec7d2e45bf..d78ec6fbc000 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.h
> +++ b/drivers/gpu/drm/i915/i915_timeline.h
> @@ -34,7 +34,7 @@
> #include "i915_utils.h"
>
> struct i915_vma;
> -struct i915_timeline_hwsp;
> +struct i915_timeline_cacheline;
>
> struct i915_timeline {
> u64 fence_context;
> @@ -49,6 +49,8 @@ struct i915_timeline {
> struct i915_vma *hwsp_ggtt;
> u32 hwsp_offset;
>
> + struct i915_timeline_cacheline *hwsp_cacheline;
> +
> bool has_initial_breadcrumb;
>
> /**
> @@ -160,6 +162,11 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
> }
>
> int i915_timeline_pin(struct i915_timeline *tl);
> +int i915_timeline_get_seqno(struct i915_timeline *tl,
> + struct i915_request *rq,
> + u32 *seqno);
> +int i915_timeline_read_lock(struct i915_timeline *tl,
> + struct i915_request *rq);
> void i915_timeline_unpin(struct i915_timeline *tl);
>
> void i915_timelines_init(struct drm_i915_private *i915);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
> index 12ea69b1a1e5..9e0126867634 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
> @@ -641,6 +641,115 @@ static int live_hwsp_alternate(void *arg)
> #undef NUM_TIMELINES
> }
>
> +static int live_hwsp_wrap(void *arg)
> +{
> + struct drm_i915_private *i915 = arg;
> + struct intel_engine_cs *engine;
> + struct i915_timeline *tl;
> + enum intel_engine_id id;
> + intel_wakeref_t wakeref;
> + int err = 0;
> +
> + /*
> + * Across a seqno wrap, we need to keep the old cacheline alive for
> + * foreign GPU references.
> + */
> +
> + mutex_lock(&i915->drm.struct_mutex);
> + wakeref = intel_runtime_pm_get(i915);
> +
> + tl = i915_timeline_create(i915, __func__, NULL);
> + if (IS_ERR(tl)) {
> + err = PTR_ERR(tl);
> + goto out_rpm;
> + }
> + if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline)
> + goto out_free;
> +
> + err = i915_timeline_pin(tl);
> + if (err)
> + goto out_free;
> +
> + for_each_engine(engine, i915, id) {
> + const u32 *hwsp_seqno[2];
> + struct i915_request *rq;
> + u32 seqno[2];
> +
> + rq = i915_request_alloc(engine, i915->kernel_context);
> + if (IS_ERR(rq)) {
> + err = PTR_ERR(rq);
> + goto out;
> + }
> +
> + tl->seqno = -4u;
> +
> + err = i915_timeline_get_seqno(tl, rq, &seqno[0]);
> + if (err) {
> + i915_request_add(rq);
> + goto out;
> + }
> + pr_debug("seqno[0]:%08x, hwsp_offset:%08x\n",
> + seqno[0], tl->hwsp_offset);
> +
> + err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[0]);
> + if (err) {
> + i915_request_add(rq);
> + goto out;
> + }
> + hwsp_seqno[0] = tl->hwsp_seqno;
> +
> + err = i915_timeline_get_seqno(tl, rq, &seqno[1]);
> + if (err) {
> + i915_request_add(rq);
> + goto out;
> + }
> + pr_debug("seqno[1]:%08x, hwsp_offset:%08x\n",
> + seqno[1], tl->hwsp_offset);
> +
> + err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[1]);
> + if (err) {
> + i915_request_add(rq);
> + goto out;
> + }
> + hwsp_seqno[1] = tl->hwsp_seqno;
> +
> + /* With wrap should come a new hwsp */
> + GEM_BUG_ON(seqno[1] >= seqno[0]);
> + GEM_BUG_ON(hwsp_seqno[0] == hwsp_seqno[1]);
> +
> + i915_request_add(rq);
> +
> + if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) {
> + pr_err("Wait for timeline writes timed out!\n");
> + err = -EIO;
> + goto out;
> + }
> +
> + if (*hwsp_seqno[0] != seqno[0] || *hwsp_seqno[1] != seqno[1]) {
> + pr_err("Bad timeline values: found (%x, %x), expected (%x, %x)\n",
> + *hwsp_seqno[0], *hwsp_seqno[1],
> + seqno[0], seqno[1]);
> + err = -EINVAL;
> + goto out;
> + }
> +
> + i915_retire_requests(i915); /* recycle HWSP */
> + }
> +
> +out:
> + if (igt_flush_test(i915, I915_WAIT_LOCKED))
> + err = -EIO;
> +
> + i915_timeline_unpin(tl);
> +out_free:
> + i915_timeline_put(tl);
> +out_rpm:
> + intel_runtime_pm_put(i915, wakeref);
> + mutex_unlock(&i915->drm.struct_mutex);
> +
> + return err;
> +}
> +
> static int live_hwsp_recycle(void *arg)
> {
> struct drm_i915_private *i915 = arg;
> @@ -723,6 +832,7 @@ int i915_timeline_live_selftests(struct drm_i915_private *i915)
> SUBTEST(live_hwsp_recycle),
> SUBTEST(live_hwsp_engine),
> SUBTEST(live_hwsp_alternate),
> + SUBTEST(live_hwsp_wrap),
> };
>
> return i915_subtests(tests, i915);
>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Regards,
Tvrtko
* [PATCH 19/22] drm/i915/execlists: Refactor out can_merge_rq()
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (17 preceding siblings ...)
2019-02-04 13:22 ` [PATCH 18/22] drm/i915: Keep timeline HWSP allocated until idle across the system Chris Wilson
@ 2019-02-04 13:22 ` Chris Wilson
2019-02-04 19:02 ` Tvrtko Ursulin
2019-02-04 13:22 ` [PATCH 20/22] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
` (9 subsequent siblings)
28 siblings, 1 reply; 45+ messages in thread
From: Chris Wilson @ 2019-02-04 13:22 UTC (permalink / raw)
To: intel-gfx
In the next patch, we add another user that wants to check whether
requests can be merged into a single HW execution, and in the future we
want to add more conditions under which requests from the same context
cannot be merged. In preparation, extract out can_merge_rq().
v2: Reorder tests to decide if we can continue filling ELSP and bonus
comments.
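
For illustration, a hypothetical future condition would slot into the
new predicate as a single extra check; the flag test below is an
invented placeholder, not part of this series:

	static bool can_merge_rq(const struct i915_request *prev,
				 const struct i915_request *next)
	{
		/* Requests must share the same logical context... */
		if (!can_merge_ctx(prev->hw_context, next->hw_context))
			return false;

		/* ...and, in future, satisfy per-request conditions, e.g. */
		if (rq_is_nomerge(next)) /* hypothetical helper */
			return false;

		return true;
	}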
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/intel_lrc.c | 35 ++++++++++++++++++++++----------
1 file changed, 24 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e37f207afb5a..66d465708bc6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -285,12 +285,11 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
}
__maybe_unused static inline bool
-assert_priority_queue(const struct intel_engine_execlists *execlists,
- const struct i915_request *prev,
+assert_priority_queue(const struct i915_request *prev,
const struct i915_request *next)
{
- if (!prev)
- return true;
+ const struct intel_engine_execlists *execlists =
+ &prev->engine->execlists;
/*
* Without preemption, the prev may refer to the still active element
@@ -601,6 +600,17 @@ static bool can_merge_ctx(const struct intel_context *prev,
return true;
}
+static bool can_merge_rq(const struct i915_request *prev,
+ const struct i915_request *next)
+{
+ GEM_BUG_ON(!assert_priority_queue(prev, next));
+
+ if (!can_merge_ctx(prev->hw_context, next->hw_context))
+ return false;
+
+ return true;
+}
+
static void port_assign(struct execlist_port *port, struct i915_request *rq)
{
GEM_BUG_ON(rq == port_request(port));
@@ -753,8 +763,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
int i;
priolist_for_each_request_consume(rq, rn, p, i) {
- GEM_BUG_ON(!assert_priority_queue(execlists, last, rq));
-
/*
* Can we combine this request with the current port?
* It has to be the same context/ringbuffer and not
@@ -766,8 +774,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
* second request, and so we never need to tell the
* hardware about the first.
*/
- if (last &&
- !can_merge_ctx(rq->hw_context, last->hw_context)) {
+ if (last && !can_merge_rq(last, rq)) {
/*
* If we are on the second port and cannot
* combine this request with the last, then we
@@ -776,6 +783,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
if (port == last_port)
goto done;
+ /*
+ * We must not populate both ELSP[] with the
+ * same LRCA, i.e. we must submit 2 different
+ * contexts if we submit 2 ELSP.
+ */
+ if (last->hw_context == rq->hw_context)
+ goto done;
+
/*
* If GVT overrides us we only ever submit
* port[0], leaving port[1] empty. Note that we
@@ -787,7 +802,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
ctx_single_port_submission(rq->hw_context))
goto done;
- GEM_BUG_ON(last->hw_context == rq->hw_context);
if (submit)
port_assign(port, last);
@@ -826,8 +840,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
* request triggering preemption on the next dequeue (or subsequent
* interrupt for secondary ports).
*/
- execlists->queue_priority_hint =
- port != execlists->port ? rq_prio(last) : INT_MIN;
+ execlists->queue_priority_hint = queue_prio(execlists);
if (submit) {
port_assign(port, last);
--
2.20.1
* Re: [PATCH 19/22] drm/i915/execlists: Refactor out can_merge_rq()
2019-02-04 13:22 ` [PATCH 19/22] drm/i915/execlists: Refactor out can_merge_rq() Chris Wilson
@ 2019-02-04 19:02 ` Tvrtko Ursulin
0 siblings, 0 replies; 45+ messages in thread
From: Tvrtko Ursulin @ 2019-02-04 19:02 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 04/02/2019 13:22, Chris Wilson wrote:
> In the next patch, we add another user that wants to check whether
> requests can be merged into a single HW execution, and in the future we
> want to add more conditions under which requests from the same context
> cannot be merged. In preparation, extract out can_merge_rq().
>
> v2: Reorder tests to decide if we can continue filling ELSP and bonus
> comments.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/intel_lrc.c | 35 ++++++++++++++++++++++----------
> 1 file changed, 24 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index e37f207afb5a..66d465708bc6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -285,12 +285,11 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
> }
>
> __maybe_unused static inline bool
> -assert_priority_queue(const struct intel_engine_execlists *execlists,
> - const struct i915_request *prev,
> +assert_priority_queue(const struct i915_request *prev,
> const struct i915_request *next)
> {
> - if (!prev)
> - return true;
> + const struct intel_engine_execlists *execlists =
> + &prev->engine->execlists;
>
> /*
> * Without preemption, the prev may refer to the still active element
> @@ -601,6 +600,17 @@ static bool can_merge_ctx(const struct intel_context *prev,
> return true;
> }
>
> +static bool can_merge_rq(const struct i915_request *prev,
> + const struct i915_request *next)
> +{
> + GEM_BUG_ON(!assert_priority_queue(prev, next));
> +
> + if (!can_merge_ctx(prev->hw_context, next->hw_context))
> + return false;
> +
> + return true;
> +}
> +
> static void port_assign(struct execlist_port *port, struct i915_request *rq)
> {
> GEM_BUG_ON(rq == port_request(port));
> @@ -753,8 +763,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> int i;
>
> priolist_for_each_request_consume(rq, rn, p, i) {
> - GEM_BUG_ON(!assert_priority_queue(execlists, last, rq));
> -
> /*
> * Can we combine this request with the current port?
> * It has to be the same context/ringbuffer and not
> @@ -766,8 +774,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> * second request, and so we never need to tell the
> * hardware about the first.
> */
> - if (last &&
> - !can_merge_ctx(rq->hw_context, last->hw_context)) {
> + if (last && !can_merge_rq(last, rq)) {
> /*
> * If we are on the second port and cannot
> * combine this request with the last, then we
> @@ -776,6 +783,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> if (port == last_port)
> goto done;
>
> + /*
> + * We must not populate both ELSP[] with the
> + * same LRCA, i.e. we must submit 2 different
> + * contexts if we submit 2 ELSP.
> + */
> + if (last->hw_context == rq->hw_context)
> + goto done;
> +
> /*
> * If GVT overrides us we only ever submit
> * port[0], leaving port[1] empty. Note that we
> @@ -787,7 +802,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> ctx_single_port_submission(rq->hw_context))
> goto done;
>
> - GEM_BUG_ON(last->hw_context == rq->hw_context);
>
> if (submit)
> port_assign(port, last);
> @@ -826,8 +840,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> * request triggering preemption on the next dequeue (or subsequent
> * interrupt for secondary ports).
> */
> - execlists->queue_priority_hint =
> - port != execlists->port ? rq_prio(last) : INT_MIN;
> + execlists->queue_priority_hint = queue_prio(execlists);
>
> if (submit) {
> port_assign(port, last);
>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Regards,
Tvrtko
* [PATCH 20/22] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (18 preceding siblings ...)
2019-02-04 13:22 ` [PATCH 19/22] drm/i915/execlists: Refactor out can_merge_rq() Chris Wilson
@ 2019-02-04 13:22 ` Chris Wilson
2019-02-05 9:24 ` Tvrtko Ursulin
2019-02-04 13:22 ` [PATCH 21/22] drm/i915: Prioritise non-busywait semaphore workloads Chris Wilson
` (8 subsequent siblings)
28 siblings, 1 reply; 45+ messages in thread
From: Chris Wilson @ 2019-02-04 13:22 UTC (permalink / raw)
To: intel-gfx
Having introduced the per-context seqno, we now have a means to identify
progress across the system without fear of rollback, as befell the
global_seqno. That is, we can program a MI_SEMAPHORE_WAIT operation in
advance of submission, safe in the knowledge that our target seqno and
address are stable.
However, since we are telling the GPU to busy-spin on the target address
until it matches the signaling seqno, we only want to do so when we are
sure that busy-spin will be completed quickly. To achieve this we only
submit the request to HW once the signaler is itself executing (modulo
preemption causing us to wait longer), and we only do so for default and
above priority requests (so that idle priority tasks never themselves
hog the GPU waiting for others).
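
Concretely, the busywait emitted into the waiter's ring is the
four-dword MI_SEMAPHORE_WAIT from emit_semaphore_wait() below,
annotated here with the semantics described above:

	/* Spin until the signaler's HWSP seqno reaches our target. */
	*cs++ = MI_SEMAPHORE_WAIT |
		MI_SEMAPHORE_GLOBAL_GTT |	/* poll a GGTT address */
		MI_SEMAPHORE_POLL |		/* keep re-sampling */
		MI_SEMAPHORE_SAD_GTE_SDD;	/* until *addr >= data */
	*cs++ = from->fence.seqno;		/* semaphore data */
	*cs++ = from->timeline->hwsp_offset;	/* address, low dword */
	*cs++ = 0;				/* address, high dword */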
v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
v4: Tell the world and include it as part of scheduler caps.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_drv.c | 2 +-
drivers/gpu/drm/i915/i915_request.c | 136 +++++++++++++++++++++-
drivers/gpu/drm/i915/i915_request.h | 1 +
drivers/gpu/drm/i915/i915_sw_fence.c | 4 +-
drivers/gpu/drm/i915/i915_sw_fence.h | 3 +
drivers/gpu/drm/i915/intel_engine_cs.c | 1 +
drivers/gpu/drm/i915/intel_gpu_commands.h | 5 +
drivers/gpu/drm/i915/intel_lrc.c | 1 +
drivers/gpu/drm/i915/intel_ringbuffer.h | 7 ++
include/uapi/drm/i915_drm.h | 1 +
10 files changed, 156 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index a7aaa1ac4c99..7e38e2b61a2e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -349,7 +349,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
value = min_t(int, INTEL_PPGTT(dev_priv), I915_GEM_PPGTT_FULL);
break;
case I915_PARAM_HAS_SEMAPHORES:
- value = 0;
+ value = !!(dev_priv->caps.scheduler & I915_SCHEDULER_CAP_SEMAPHORES);
break;
case I915_PARAM_HAS_SECURE_BATCHES:
value = capable(CAP_SYS_ADMIN);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 057bffa56700..116bd9648db7 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -22,8 +22,9 @@
*
*/
-#include <linux/prefetch.h>
#include <linux/dma-fence-array.h>
+#include <linux/irq_work.h>
+#include <linux/prefetch.h>
#include <linux/sched.h>
#include <linux/sched/clock.h>
#include <linux/sched/signal.h>
@@ -32,9 +33,16 @@
#include "i915_active.h"
#include "i915_reset.h"
+struct execute_cb {
+ struct list_head link;
+ struct irq_work work;
+ struct i915_sw_fence *fence;
+};
+
static struct i915_global_request {
struct kmem_cache *slab_requests;
struct kmem_cache *slab_dependencies;
+ struct kmem_cache *slab_execute_cbs;
} global;
static const char *i915_fence_get_driver_name(struct dma_fence *fence)
@@ -331,6 +339,69 @@ void i915_request_retire_upto(struct i915_request *rq)
} while (tmp != rq);
}
+static void irq_execute_cb(struct irq_work *wrk)
+{
+ struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
+
+ i915_sw_fence_complete(cb->fence);
+ kmem_cache_free(global.slab_execute_cbs, cb);
+}
+
+static void __notify_execute_cb(struct i915_request *rq)
+{
+ struct execute_cb *cb;
+
+ lockdep_assert_held(&rq->lock);
+
+ if (list_empty(&rq->execute_cb))
+ return;
+
+ list_for_each_entry(cb, &rq->execute_cb, link)
+ irq_work_queue(&cb->work);
+
+ /*
+ * XXX Rollback on __i915_request_unsubmit()
+ *
+ * In the future, perhaps when we have an active time-slicing scheduler,
+ * it will be interesting to unsubmit parallel execution and remove
+ * busywaits from the GPU until their master is restarted. This is
+ * quite hairy, we have to carefully rollback the fence and do a
+ * preempt-to-idle cycle on the target engine, all the while the
+ * master execute_cb may refire.
+ */
+ INIT_LIST_HEAD(&rq->execute_cb);
+}
+
+static int
+i915_request_await_execution(struct i915_request *rq,
+ struct i915_request *signal,
+ gfp_t gfp)
+{
+ struct execute_cb *cb;
+
+ if (i915_request_is_active(signal))
+ return 0;
+
+ cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
+ if (!cb)
+ return -ENOMEM;
+
+ cb->fence = &rq->submit;
+ i915_sw_fence_await(cb->fence);
+ init_irq_work(&cb->work, irq_execute_cb);
+
+ spin_lock_irq(&signal->lock);
+ if (i915_request_is_active(signal)) {
+ i915_sw_fence_complete(cb->fence);
+ kmem_cache_free(global.slab_execute_cbs, cb);
+ } else {
+ list_add_tail(&cb->link, &signal->execute_cb);
+ }
+ spin_unlock_irq(&signal->lock);
+
+ return 0;
+}
+
static void move_to_timeline(struct i915_request *request,
struct i915_timeline *timeline)
{
@@ -389,6 +460,7 @@ void __i915_request_submit(struct i915_request *request)
*/
BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */
request->sched.attr.priority |= __NO_PREEMPTION;
+ __notify_execute_cb(request);
spin_unlock(&request->lock);
@@ -630,6 +702,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
}
INIT_LIST_HEAD(&rq->active_list);
+ INIT_LIST_HEAD(&rq->execute_cb);
tl = ce->ring->timeline;
ret = i915_timeline_get_seqno(tl, rq, &seqno);
@@ -717,6 +790,51 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
return ERR_PTR(ret);
}
+static int
+emit_semaphore_wait(struct i915_request *to,
+ struct i915_request *from,
+ gfp_t gfp)
+{
+ u32 *cs;
+ int err;
+
+ GEM_BUG_ON(!from->timeline->has_initial_breadcrumb);
+ GEM_BUG_ON(INTEL_GEN(to->i915) < 8);
+
+ /* We need to pin the signaler's HWSP until we are finished reading. */
+ err = i915_timeline_read_lock(from->timeline, to);
+ if (err)
+ return err;
+
+ /* Only submit our spinner after the signaler is running! */
+ err = i915_request_await_execution(to, from, gfp);
+ if (err)
+ return err;
+
+ cs = intel_ring_begin(to, 4);
+ if (IS_ERR(cs))
+ return PTR_ERR(cs);
+
+ /*
+ * Using greater-than-or-equal here means we have to worry
+ * about seqno wraparound. To sidestep that issue, we swap
+ * the timeline HWSP upon wrapping, so that anyone listening
+ * for the old (pre-wrap) values does not see much smaller
+ * (post-wrap) values than expected (and so wait
+ * forever).
+ */
+ *cs++ = MI_SEMAPHORE_WAIT |
+ MI_SEMAPHORE_GLOBAL_GTT |
+ MI_SEMAPHORE_POLL |
+ MI_SEMAPHORE_SAD_GTE_SDD;
+ *cs++ = from->fence.seqno;
+ *cs++ = from->timeline->hwsp_offset;
+ *cs++ = 0;
+
+ intel_ring_advance(to, cs);
+ return 0;
+}
+
static int
i915_request_await_request(struct i915_request *to, struct i915_request *from)
{
@@ -738,6 +856,9 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
ret = i915_sw_fence_await_sw_fence_gfp(&to->submit,
&from->submit,
I915_FENCE_GFP);
+ } else if (intel_engine_has_semaphores(to->engine) &&
+ to->gem_context->sched.priority >= I915_PRIORITY_NORMAL) {
+ ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
} else {
ret = i915_sw_fence_await_dma_fence(&to->submit,
&from->fence, 0,
@@ -1212,14 +1333,23 @@ int i915_global_request_init(void)
if (!global.slab_requests)
return -ENOMEM;
+ global.slab_execute_cbs = KMEM_CACHE(execute_cb,
+ SLAB_HWCACHE_ALIGN |
+ SLAB_RECLAIM_ACCOUNT |
+ SLAB_TYPESAFE_BY_RCU);
+ if (!global.slab_execute_cbs)
+ goto err_requests;
+
global.slab_dependencies = KMEM_CACHE(i915_dependency,
SLAB_HWCACHE_ALIGN |
SLAB_RECLAIM_ACCOUNT);
if (!global.slab_dependencies)
- goto err_requests;
+ goto err_execute_cbs;
return 0;
+err_execute_cbs:
+ kmem_cache_destroy(global.slab_execute_cbs);
err_requests:
kmem_cache_destroy(global.slab_requests);
return -ENOMEM;
@@ -1228,11 +1358,13 @@ int i915_global_request_init(void)
void i915_global_request_shrink(void)
{
kmem_cache_shrink(global.slab_dependencies);
+ kmem_cache_shrink(global.slab_execute_cbs);
kmem_cache_shrink(global.slab_requests);
}
void i915_global_request_exit(void)
{
kmem_cache_destroy(global.slab_dependencies);
+ kmem_cache_destroy(global.slab_execute_cbs);
kmem_cache_destroy(global.slab_requests);
}
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 071ff1064579..df52776b26cf 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -128,6 +128,7 @@ struct i915_request {
*/
struct i915_sw_fence submit;
wait_queue_entry_t submitq;
+ struct list_head execute_cb;
/*
* A list of everyone we wait upon, and everyone who waits upon us.
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 7c58b049ecb5..8d1400d378d7 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -192,7 +192,7 @@ static void __i915_sw_fence_complete(struct i915_sw_fence *fence,
__i915_sw_fence_notify(fence, FENCE_FREE);
}
-static void i915_sw_fence_complete(struct i915_sw_fence *fence)
+void i915_sw_fence_complete(struct i915_sw_fence *fence)
{
debug_fence_assert(fence);
@@ -202,7 +202,7 @@ static void i915_sw_fence_complete(struct i915_sw_fence *fence)
__i915_sw_fence_complete(fence, NULL);
}
-static void i915_sw_fence_await(struct i915_sw_fence *fence)
+void i915_sw_fence_await(struct i915_sw_fence *fence)
{
debug_fence_assert(fence);
WARN_ON(atomic_inc_return(&fence->pending) <= 1);
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 0e055ea0179f..6dec9e1d1102 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -79,6 +79,9 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
unsigned long timeout,
gfp_t gfp);
+void i915_sw_fence_await(struct i915_sw_fence *fence);
+void i915_sw_fence_complete(struct i915_sw_fence *fence);
+
static inline bool i915_sw_fence_signaled(const struct i915_sw_fence *fence)
{
return atomic_read(&fence->pending) <= 0;
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 0dbd6d7c1693..30a308ebbc89 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -621,6 +621,7 @@ void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
u32 sched_cap;
} map[] = {
{ I915_ENGINE_HAS_PREEMPTION, I915_SCHEDULER_CAP_PREEMPTION },
+ { I915_ENGINE_HAS_SEMAPHORES, I915_SCHEDULER_CAP_SEMAPHORES },
{ I915_ENGINE_SUPPORTS_STATS, I915_SCHEDULER_CAP_PMU },
};
struct intel_engine_cs *engine;
diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
index b96a31bc1080..0efaadd3bc32 100644
--- a/drivers/gpu/drm/i915/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
@@ -106,7 +106,12 @@
#define MI_SEMAPHORE_TARGET(engine) ((engine)<<15)
#define MI_SEMAPHORE_WAIT MI_INSTR(0x1c, 2) /* GEN8+ */
#define MI_SEMAPHORE_POLL (1<<15)
+#define MI_SEMAPHORE_SAD_GT_SDD (0<<12)
#define MI_SEMAPHORE_SAD_GTE_SDD (1<<12)
+#define MI_SEMAPHORE_SAD_LT_SDD (2<<12)
+#define MI_SEMAPHORE_SAD_LTE_SDD (3<<12)
+#define MI_SEMAPHORE_SAD_EQ_SDD (4<<12)
+#define MI_SEMAPHORE_SAD_NEQ_SDD (5<<12)
#define MI_STORE_DWORD_IMM MI_INSTR(0x20, 1)
#define MI_STORE_DWORD_IMM_GEN4 MI_INSTR(0x20, 2)
#define MI_MEM_VIRTUAL (1 << 22) /* 945,g33,965 */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 66d465708bc6..ea813f88fbb3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2307,6 +2307,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
engine->park = NULL;
engine->unpark = NULL;
+ engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
engine->flags |= I915_ENGINE_SUPPORTS_STATS;
if (engine->i915->preempt_context)
engine->flags |= I915_ENGINE_HAS_PREEMPTION;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 5dffccb6740e..c9cd60444987 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -496,6 +496,7 @@ struct intel_engine_cs {
#define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
#define I915_ENGINE_SUPPORTS_STATS BIT(1)
#define I915_ENGINE_HAS_PREEMPTION BIT(2)
+#define I915_ENGINE_HAS_SEMAPHORES BIT(3)
unsigned int flags;
/*
@@ -573,6 +574,12 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
return engine->flags & I915_ENGINE_HAS_PREEMPTION;
}
+static inline bool
+intel_engine_has_semaphores(const struct intel_engine_cs *engine)
+{
+ return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
+}
+
void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
static inline bool __execlists_need_preempt(int prio, int last)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index d8ac7f105734..4b8a07d774b4 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -477,6 +477,7 @@ typedef struct drm_i915_irq_wait {
#define I915_SCHEDULER_CAP_PRIORITY (1ul << 1)
#define I915_SCHEDULER_CAP_PREEMPTION (1ul << 2)
#define I915_SCHEDULER_CAP_PMU (1ul << 3)
+#define I915_SCHEDULER_CAP_SEMAPHORES (1ul << 4)
#define I915_PARAM_HUC_STATUS 42
--
2.20.1
* Re: [PATCH 20/22] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
2019-02-04 13:22 ` [PATCH 20/22] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
@ 2019-02-05 9:24 ` Tvrtko Ursulin
2019-02-05 9:27 ` Chris Wilson
0 siblings, 1 reply; 45+ messages in thread
From: Tvrtko Ursulin @ 2019-02-05 9:24 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 04/02/2019 13:22, Chris Wilson wrote:
> Having introduced the per-context seqno, we now have a means to identify
> progress across the system without fear of rollback, as befell the
> global_seqno. That is, we can program a MI_SEMAPHORE_WAIT operation in
> advance of submission, safe in the knowledge that our target seqno and
> address are stable.
>
> However, since we are telling the GPU to busy-spin on the target address
> until it matches the signaling seqno, we only want to do so when we are
> sure that busy-spin will be completed quickly. To achieve this we only
> submit the request to HW once the signaler is itself executing (modulo
> preemption causing us to wait longer), and we only do so for default and
> above priority requests (so that idle priority tasks never themselves
> hog the GPU waiting for others).
>
> v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
> v4: Tell the world and include it as part of scheduler caps.
Looks okay to me.
Just a paragraph about power and performance, ideally with an up-to-date
table of results from media-bench, would be the usual requirement for
these kinds of additions.
Regards,
Tvrtko
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/i915_drv.c | 2 +-
> drivers/gpu/drm/i915/i915_request.c | 136 +++++++++++++++++++++-
> drivers/gpu/drm/i915/i915_request.h | 1 +
> drivers/gpu/drm/i915/i915_sw_fence.c | 4 +-
> drivers/gpu/drm/i915/i915_sw_fence.h | 3 +
> drivers/gpu/drm/i915/intel_engine_cs.c | 1 +
> drivers/gpu/drm/i915/intel_gpu_commands.h | 5 +
> drivers/gpu/drm/i915/intel_lrc.c | 1 +
> drivers/gpu/drm/i915/intel_ringbuffer.h | 7 ++
> include/uapi/drm/i915_drm.h | 1 +
> 10 files changed, 156 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index a7aaa1ac4c99..7e38e2b61a2e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -349,7 +349,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
> value = min_t(int, INTEL_PPGTT(dev_priv), I915_GEM_PPGTT_FULL);
> break;
> case I915_PARAM_HAS_SEMAPHORES:
> - value = 0;
> + value = !!(dev_priv->caps.scheduler & I915_SCHEDULER_CAP_SEMAPHORES);
> break;
> case I915_PARAM_HAS_SECURE_BATCHES:
> value = capable(CAP_SYS_ADMIN);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 057bffa56700..116bd9648db7 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -22,8 +22,9 @@
> *
> */
>
> -#include <linux/prefetch.h>
> #include <linux/dma-fence-array.h>
> +#include <linux/irq_work.h>
> +#include <linux/prefetch.h>
> #include <linux/sched.h>
> #include <linux/sched/clock.h>
> #include <linux/sched/signal.h>
> @@ -32,9 +33,16 @@
> #include "i915_active.h"
> #include "i915_reset.h"
>
> +struct execute_cb {
> + struct list_head link;
> + struct irq_work work;
> + struct i915_sw_fence *fence;
> +};
> +
> static struct i915_global_request {
> struct kmem_cache *slab_requests;
> struct kmem_cache *slab_dependencies;
> + struct kmem_cache *slab_execute_cbs;
> } global;
>
> static const char *i915_fence_get_driver_name(struct dma_fence *fence)
> @@ -331,6 +339,69 @@ void i915_request_retire_upto(struct i915_request *rq)
> } while (tmp != rq);
> }
>
> +static void irq_execute_cb(struct irq_work *wrk)
> +{
> + struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
> +
> + i915_sw_fence_complete(cb->fence);
> + kmem_cache_free(global.slab_execute_cbs, cb);
> +}
> +
> +static void __notify_execute_cb(struct i915_request *rq)
> +{
> + struct execute_cb *cb;
> +
> + lockdep_assert_held(&rq->lock);
> +
> + if (list_empty(&rq->execute_cb))
> + return;
> +
> + list_for_each_entry(cb, &rq->execute_cb, link)
> + irq_work_queue(&cb->work);
> +
> + /*
> + * XXX Rollback on __i915_request_unsubmit()
> + *
> + * In the future, perhaps when we have an active time-slicing scheduler,
> + * it will be interesting to unsubmit parallel execution and remove
> + * busywaits from the GPU until their master is restarted. This is
> + * quite hairy, we have to carefully rollback the fence and do a
> + * preempt-to-idle cycle on the target engine, all the while the
> + * master execute_cb may refire.
> + */
> + INIT_LIST_HEAD(&rq->execute_cb);
> +}
> +
> +static int
> +i915_request_await_execution(struct i915_request *rq,
> + struct i915_request *signal,
> + gfp_t gfp)
> +{
> + struct execute_cb *cb;
> +
> + if (i915_request_is_active(signal))
> + return 0;
> +
> + cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
> + if (!cb)
> + return -ENOMEM;
> +
> + cb->fence = &rq->submit;
> + i915_sw_fence_await(cb->fence);
> + init_irq_work(&cb->work, irq_execute_cb);
> +
> + spin_lock_irq(&signal->lock);
> + if (i915_request_is_active(signal)) {
> + i915_sw_fence_complete(cb->fence);
> + kmem_cache_free(global.slab_execute_cbs, cb);
> + } else {
> + list_add_tail(&cb->link, &signal->execute_cb);
> + }
> + spin_unlock_irq(&signal->lock);
> +
> + return 0;
> +}
> +
> static void move_to_timeline(struct i915_request *request,
> struct i915_timeline *timeline)
> {
> @@ -389,6 +460,7 @@ void __i915_request_submit(struct i915_request *request)
> */
> BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */
> request->sched.attr.priority |= __NO_PREEMPTION;
> + __notify_execute_cb(request);
>
> spin_unlock(&request->lock);
>
> @@ -630,6 +702,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> }
>
> INIT_LIST_HEAD(&rq->active_list);
> + INIT_LIST_HEAD(&rq->execute_cb);
>
> tl = ce->ring->timeline;
> ret = i915_timeline_get_seqno(tl, rq, &seqno);
> @@ -717,6 +790,51 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> return ERR_PTR(ret);
> }
>
> +static int
> +emit_semaphore_wait(struct i915_request *to,
> + struct i915_request *from,
> + gfp_t gfp)
> +{
> + u32 *cs;
> + int err;
> +
> + GEM_BUG_ON(!from->timeline->has_initial_breadcrumb);
> + GEM_BUG_ON(INTEL_GEN(to->i915) < 8);
> +
> + /* We need to pin the signaler's HWSP until we are finished reading. */
> + err = i915_timeline_read_lock(from->timeline, to);
> + if (err)
> + return err;
> +
> + /* Only submit our spinner after the signaler is running! */
> + err = i915_request_await_execution(to, from, gfp);
> + if (err)
> + return err;
> +
> + cs = intel_ring_begin(to, 4);
> + if (IS_ERR(cs))
> + return PTR_ERR(cs);
> +
> + /*
> + * Using greater-than-or-equal here means we have to worry
> + * about seqno wraparound. To sidestep that issue, we swap
> + * the timeline HWSP upon wrapping, so that anyone listening
> + * for the old (pre-wrap) values does not see much smaller
> + * (post-wrap) values than expected (and so wait
> + * forever).
> + */
> + *cs++ = MI_SEMAPHORE_WAIT |
> + MI_SEMAPHORE_GLOBAL_GTT |
> + MI_SEMAPHORE_POLL |
> + MI_SEMAPHORE_SAD_GTE_SDD;
> + *cs++ = from->fence.seqno;
> + *cs++ = from->timeline->hwsp_offset;
> + *cs++ = 0;
> +
> + intel_ring_advance(to, cs);
> + return 0;
> +}
> +
> static int
> i915_request_await_request(struct i915_request *to, struct i915_request *from)
> {
> @@ -738,6 +856,9 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
> ret = i915_sw_fence_await_sw_fence_gfp(&to->submit,
> &from->submit,
> I915_FENCE_GFP);
> + } else if (intel_engine_has_semaphores(to->engine) &&
> + to->gem_context->sched.priority >= I915_PRIORITY_NORMAL) {
> + ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
> } else {
> ret = i915_sw_fence_await_dma_fence(&to->submit,
> &from->fence, 0,
> @@ -1212,14 +1333,23 @@ int i915_global_request_init(void)
> if (!global.slab_requests)
> return -ENOMEM;
>
> + global.slab_execute_cbs = KMEM_CACHE(execute_cb,
> + SLAB_HWCACHE_ALIGN |
> + SLAB_RECLAIM_ACCOUNT |
> + SLAB_TYPESAFE_BY_RCU);
> + if (!global.slab_execute_cbs)
> + goto err_requests;
> +
> global.slab_dependencies = KMEM_CACHE(i915_dependency,
> SLAB_HWCACHE_ALIGN |
> SLAB_RECLAIM_ACCOUNT);
> if (!global.slab_dependencies)
> - goto err_requests;
> + goto err_execute_cbs;
>
> return 0;
>
> +err_execute_cbs:
> + kmem_cache_destroy(global.slab_execute_cbs);
> err_requests:
> kmem_cache_destroy(global.slab_requests);
> return -ENOMEM;
> @@ -1228,11 +1358,13 @@ int i915_global_request_init(void)
> void i915_global_request_shrink(void)
> {
> kmem_cache_shrink(global.slab_dependencies);
> + kmem_cache_shrink(global.slab_execute_cbs);
> kmem_cache_shrink(global.slab_requests);
> }
>
> void i915_global_request_exit(void)
> {
> kmem_cache_destroy(global.slab_dependencies);
> + kmem_cache_destroy(global.slab_execute_cbs);
> kmem_cache_destroy(global.slab_requests);
> }
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 071ff1064579..df52776b26cf 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -128,6 +128,7 @@ struct i915_request {
> */
> struct i915_sw_fence submit;
> wait_queue_entry_t submitq;
> + struct list_head execute_cb;
>
> /*
> * A list of everyone we wait upon, and everyone who waits upon us.
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
> index 7c58b049ecb5..8d1400d378d7 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> @@ -192,7 +192,7 @@ static void __i915_sw_fence_complete(struct i915_sw_fence *fence,
> __i915_sw_fence_notify(fence, FENCE_FREE);
> }
>
> -static void i915_sw_fence_complete(struct i915_sw_fence *fence)
> +void i915_sw_fence_complete(struct i915_sw_fence *fence)
> {
> debug_fence_assert(fence);
>
> @@ -202,7 +202,7 @@ static void i915_sw_fence_complete(struct i915_sw_fence *fence)
> __i915_sw_fence_complete(fence, NULL);
> }
>
> -static void i915_sw_fence_await(struct i915_sw_fence *fence)
> +void i915_sw_fence_await(struct i915_sw_fence *fence)
> {
> debug_fence_assert(fence);
> WARN_ON(atomic_inc_return(&fence->pending) <= 1);
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
> index 0e055ea0179f..6dec9e1d1102 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.h
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.h
> @@ -79,6 +79,9 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
> unsigned long timeout,
> gfp_t gfp);
>
> +void i915_sw_fence_await(struct i915_sw_fence *fence);
> +void i915_sw_fence_complete(struct i915_sw_fence *fence);
> +
> static inline bool i915_sw_fence_signaled(const struct i915_sw_fence *fence)
> {
> return atomic_read(&fence->pending) <= 0;
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 0dbd6d7c1693..30a308ebbc89 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -621,6 +621,7 @@ void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
> u32 sched_cap;
> } map[] = {
> { I915_ENGINE_HAS_PREEMPTION, I915_SCHEDULER_CAP_PREEMPTION },
> + { I915_ENGINE_HAS_SEMAPHORES, I915_SCHEDULER_CAP_SEMAPHORES },
> { I915_ENGINE_SUPPORTS_STATS, I915_SCHEDULER_CAP_PMU },
> };
> struct intel_engine_cs *engine;
> diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
> index b96a31bc1080..0efaadd3bc32 100644
> --- a/drivers/gpu/drm/i915/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
> @@ -106,7 +106,12 @@
> #define MI_SEMAPHORE_TARGET(engine) ((engine)<<15)
> #define MI_SEMAPHORE_WAIT MI_INSTR(0x1c, 2) /* GEN8+ */
> #define MI_SEMAPHORE_POLL (1<<15)
> +#define MI_SEMAPHORE_SAD_GT_SDD (0<<12)
> #define MI_SEMAPHORE_SAD_GTE_SDD (1<<12)
> +#define MI_SEMAPHORE_SAD_LT_SDD (2<<12)
> +#define MI_SEMAPHORE_SAD_LTE_SDD (3<<12)
> +#define MI_SEMAPHORE_SAD_EQ_SDD (4<<12)
> +#define MI_SEMAPHORE_SAD_NEQ_SDD (5<<12)
> #define MI_STORE_DWORD_IMM MI_INSTR(0x20, 1)
> #define MI_STORE_DWORD_IMM_GEN4 MI_INSTR(0x20, 2)
> #define MI_MEM_VIRTUAL (1 << 22) /* 945,g33,965 */
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 66d465708bc6..ea813f88fbb3 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2307,6 +2307,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
> engine->park = NULL;
> engine->unpark = NULL;
>
> + engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
> engine->flags |= I915_ENGINE_SUPPORTS_STATS;
> if (engine->i915->preempt_context)
> engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 5dffccb6740e..c9cd60444987 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -496,6 +496,7 @@ struct intel_engine_cs {
> #define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
> #define I915_ENGINE_SUPPORTS_STATS BIT(1)
> #define I915_ENGINE_HAS_PREEMPTION BIT(2)
> +#define I915_ENGINE_HAS_SEMAPHORES BIT(3)
> unsigned int flags;
>
> /*
> @@ -573,6 +574,12 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
> return engine->flags & I915_ENGINE_HAS_PREEMPTION;
> }
>
> +static inline bool
> +intel_engine_has_semaphores(const struct intel_engine_cs *engine)
> +{
> + return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
> +}
> +
> void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
>
> static inline bool __execlists_need_preempt(int prio, int last)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index d8ac7f105734..4b8a07d774b4 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -477,6 +477,7 @@ typedef struct drm_i915_irq_wait {
> #define I915_SCHEDULER_CAP_PRIORITY (1ul << 1)
> #define I915_SCHEDULER_CAP_PREEMPTION (1ul << 2)
> #define I915_SCHEDULER_CAP_PMU (1ul << 3)
> +#define I915_SCHEDULER_CAP_SEMAPHORES (1ul << 4)
>
> #define I915_PARAM_HUC_STATUS 42
>
>
* Re: [PATCH 20/22] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
2019-02-05 9:24 ` Tvrtko Ursulin
@ 2019-02-05 9:27 ` Chris Wilson
0 siblings, 0 replies; 45+ messages in thread
From: Chris Wilson @ 2019-02-05 9:27 UTC (permalink / raw)
To: Tvrtko Ursulin, intel-gfx
Quoting Tvrtko Ursulin (2019-02-05 09:24:24)
>
> On 04/02/2019 13:22, Chris Wilson wrote:
> > Having introduced the per-context seqno, we now have a means to identify
> > progress across the system without fear of rollback, as befell the
> > global_seqno. That is, we can program a MI_SEMAPHORE_WAIT operation in
> > advance of submission, safe in the knowledge that our target seqno and
> > address are stable.
> >
> > However, since we are telling the GPU to busy-spin on the target address
> > until it matches the signaling seqno, we only want to do so when we are
> > sure that busy-spin will be completed quickly. To achieve this we only
> > submit the request to HW once the signaler is itself executing (modulo
> > preemption causing us to wait longer), and we only do so for default and
> > above priority requests (so that idle priority tasks never themselves
> > hog the GPU waiting for others).
> >
> > v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
> > v4: Tell the world and include it as part of scheduler caps.
>
> Looks okay to me.
>
> Just a paragraph about power and performance, ideally with an up-to-date
> table of results from media-bench, would be the usual requirement for
> this kind of addition.
"A bunch of vague waffle about how well it performs in
microbenchmarks, single client configs and doesn't regress multiple
clients, without mentioning any details that may be construed as an
absolute promise of performance metrics, despite that we give you the
tools to measure and confirm what we are trying not to say."
-Chris
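For readers skimming the thread: the busy-wait under discussion boils down
to four dwords in the waiter's ring. A minimal sketch, modelled on the
patch's emit_semaphore_wait() and assuming the MI_SEMAPHORE_* flag names
from intel_gpu_commands.h:

/*
 * Sketch of the semaphore wait emitted into the waiter's ring.
 * hwsp_offset is the GGTT address of the signaler's per-context
 * seqno in its HWSP; the function name here is illustrative.
 */
static int emit_wait_for_seqno(struct i915_request *to,
                               u32 seqno, u32 hwsp_offset)
{
        u32 *cs;

        cs = intel_ring_begin(to, 4);
        if (IS_ERR(cs))
                return PTR_ERR(cs);

        /* Busy-spin until *hwsp_offset >= seqno (SAD_GTE_SDD). */
        *cs++ = MI_SEMAPHORE_WAIT |
                MI_SEMAPHORE_GLOBAL_GTT |
                MI_SEMAPHORE_POLL |
                MI_SEMAPHORE_SAD_GTE_SDD;
        *cs++ = seqno;
        *cs++ = hwsp_offset;
        *cs++ = 0;

        intel_ring_advance(to, cs);
        return 0;
}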
* [PATCH 21/22] drm/i915: Prioritise non-busywait semaphore workloads
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (19 preceding siblings ...)
2019-02-04 13:22 ` [PATCH 20/22] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
@ 2019-02-04 13:22 ` Chris Wilson
2019-02-04 13:22 ` [PATCH 22/22] semaphore-no-stats Chris Wilson
` (7 subsequent siblings)
28 siblings, 0 replies; 45+ messages in thread
From: Chris Wilson @ 2019-02-04 13:22 UTC (permalink / raw)
To: intel-gfx
We don't want to busywait on the GPU if we have other work to do. If we
give non-busywaiting workloads higher (initial) priority than workloads
that require a busywait, we will prioritise work that is ready to run
immediately. We then also have to be careful that we don't give earlier
semaphores an accidental boost just because later work doesn't wait on
other rings; hence we keep a history of semaphore usage along the
dependency chain.
Testcase: igt/gem_exec_schedule/semaphore
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_request.c | 16 ++++++++++++++++
drivers/gpu/drm/i915/i915_scheduler.c | 10 ++++++++++
drivers/gpu/drm/i915/i915_scheduler.h | 9 ++++++---
drivers/gpu/drm/i915/intel_lrc.c | 2 +-
4 files changed, 33 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 116bd9648db7..91dcaf07958f 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -832,6 +832,7 @@ emit_semaphore_wait(struct i915_request *to,
*cs++ = 0;
intel_ring_advance(to, cs);
+ to->sched.semaphore |= I915_SCHED_HAS_SEMAPHORE;
return 0;
}
@@ -1102,6 +1103,21 @@ void i915_request_add(struct i915_request *request)
if (engine->schedule) {
struct i915_sched_attr attr = request->gem_context->sched;
+ /*
+ * Boost actual workloads past semaphores!
+ *
+ * With semaphores we spin on one engine waiting for another,
+ * simply to reduce the latency of starting our work when
+ * the signaler completes. However, if there is any other
+ * work that we could be doing on this engine instead, that
+ * is better utilisation and will reduce the overall duration
+ * of the current work. To avoid PI boosting a semaphore far
+ * in the future over more immediate useful work, we keep a history
+ * of any semaphore use along our dependency chain.
+ */
+ if (!request->sched.semaphore)
+ attr.priority |= I915_PRIORITY_NOSEMAPHORE;
+
/*
* Boost priorities to new clients (new request flows).
*
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 7c1d9ef98374..9675ead24f75 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -28,12 +28,18 @@ static inline bool node_signaled(const struct i915_sched_node *node)
return i915_request_completed(node_to_request(node));
}
+static inline bool node_started(const struct i915_sched_node *node)
+{
+ return i915_request_started(node_to_request(node));
+}
+
void i915_sched_node_init(struct i915_sched_node *node)
{
INIT_LIST_HEAD(&node->signalers_list);
INIT_LIST_HEAD(&node->waiters_list);
INIT_LIST_HEAD(&node->link);
node->attr.priority = I915_PRIORITY_INVALID;
+ node->semaphore = 0;
}
static struct i915_dependency *
@@ -64,6 +70,10 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
dep->signaler = signal;
dep->flags = flags;
+ /* Keep track of whether anyone on this chain has a semaphore */
+ if (signal->semaphore && !node_started(signal))
+ node->semaphore |= signal->semaphore << 1;
+
ret = true;
}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 5196ce07b6c2..24c2c027fd2c 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -24,14 +24,15 @@ enum {
I915_PRIORITY_INVALID = INT_MIN
};
-#define I915_USER_PRIORITY_SHIFT 2
+#define I915_USER_PRIORITY_SHIFT 3
#define I915_USER_PRIORITY(x) ((x) << I915_USER_PRIORITY_SHIFT)
#define I915_PRIORITY_COUNT BIT(I915_USER_PRIORITY_SHIFT)
#define I915_PRIORITY_MASK (I915_PRIORITY_COUNT - 1)
-#define I915_PRIORITY_WAIT ((u8)BIT(0))
-#define I915_PRIORITY_NEWCLIENT ((u8)BIT(1))
+#define I915_PRIORITY_WAIT ((u8)BIT(0))
+#define I915_PRIORITY_NEWCLIENT ((u8)BIT(1))
+#define I915_PRIORITY_NOSEMAPHORE ((u8)BIT(2))
#define __NO_PREEMPTION (I915_PRIORITY_WAIT)
@@ -74,6 +75,8 @@ struct i915_sched_node {
struct list_head waiters_list; /* those after us, they depend upon us */
struct list_head link;
struct i915_sched_attr attr;
+ unsigned long semaphore;
+#define I915_SCHED_HAS_SEMAPHORE BIT(0)
};
struct i915_dependency {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ea813f88fbb3..ae90ce034252 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -164,7 +164,7 @@
#define WA_TAIL_DWORDS 2
#define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
-#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
+#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
struct intel_engine_cs *engine,
--
2.20.1
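To see how the history bit ripples down a dependency chain, here is a
standalone sketch of the same shift-left bookkeeping performed by
__i915_sched_node_add_dependency() above (names simplified for the
illustration; compiles with any C compiler):

#include <stdbool.h>
#include <stdio.h>

#define HAS_SEMAPHORE 0x1ul     /* mirrors I915_SCHED_HAS_SEMAPHORE */

struct node {
        unsigned long semaphore;
};

/* While the signaler has not yet started, its semaphore history is
 * inherited by the waiter, shifted up one generation, so a semaphore
 * anywhere along the chain keeps suppressing the NOSEMAPHORE boost. */
static void add_dependency(struct node *waiter,
                           const struct node *signal,
                           bool signal_started)
{
        if (signal->semaphore && !signal_started)
                waiter->semaphore |= signal->semaphore << 1;
}

int main(void)
{
        struct node a = { .semaphore = HAS_SEMAPHORE }; /* emitted a wait */
        struct node b = { 0 }, c = { 0 };

        add_dependency(&b, &a, false);
        add_dependency(&c, &b, false);

        /* Any non-zero history withholds I915_PRIORITY_NOSEMAPHORE. */
        printf("a=%#lx b=%#lx c=%#lx\n",
               a.semaphore, b.semaphore, c.semaphore); /* 0x1 0x2 0x4 */
        return 0;
}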
* [PATCH 22/22] semaphore-no-stats
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (20 preceding siblings ...)
2019-02-04 13:22 ` [PATCH 21/22] drm/i915: Prioritise non-busywait semaphore workloads Chris Wilson
@ 2019-02-04 13:22 ` Chris Wilson
2019-02-05 10:03 ` Tvrtko Ursulin
2019-02-04 13:58 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption Patchwork
` (6 subsequent siblings)
28 siblings, 1 reply; 45+ messages in thread
From: Chris Wilson @ 2019-02-04 13:22 UTC (permalink / raw)
To: intel-gfx
SW PMU reports semaphore time as busy, HW PMU reports semaphore time as
idle. Who is correct?
---
drivers/gpu/drm/i915/intel_lrc.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ae90ce034252..d00b268ed6ee 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2308,7 +2308,6 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
engine->unpark = NULL;
engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
- engine->flags |= I915_ENGINE_SUPPORTS_STATS;
if (engine->i915->preempt_context)
engine->flags |= I915_ENGINE_HAS_PREEMPTION;
}
--
2.20.1
* Re: [PATCH 22/22] semaphore-no-stats
2019-02-04 13:22 ` [PATCH 22/22] semaphore-no-stats Chris Wilson
@ 2019-02-05 10:03 ` Tvrtko Ursulin
2019-02-05 10:07 ` Chris Wilson
0 siblings, 1 reply; 45+ messages in thread
From: Tvrtko Ursulin @ 2019-02-05 10:03 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 04/02/2019 13:22, Chris Wilson wrote:
> SW PMU reports semaphore time as busy, HW PMU reports semaphore time as
> idle. Who is correct?
[It's not really HW PMU, it's a different implementation of the SW PMU. :)]
As an additional data point, HW tracking of accumulated total context
runtime as stored in the PPHWSP also reports semaphore spin time
(polling mode) as context running.
So overall, from the point of view of busy being the opposite of idle, it
is kind of correct, regardless of whether the engine is doing something
useful or not. It is unavailable for other contexts due to some action of
the currently executing context.
In this light we could view busy as an aggregate of busy and semaphore
wait. (MI_WAIT_EVENT is an open question.) But there is indeed an
inconsistency on platforms which cannot do context tracking.
Therefore, solution a): add semaphore wait time to busy when reporting
busy on those platforms.
Advantage - the PMU sampling timer is already running on these platforms,
so the additional cost is small.
From the point of view of wanting to make busy mean "useful" work, that
seems much harder.
Option b) could be to subtract semaphore wait time from busy, on the
other set of platforms.
Disadvantage - this would mean running the PMU sampling timer where today
it does not need to run.
So I am leaning towards option a). Engine busy time semantics would
therefore be defined as engine not being idle = occupied by a context
doing something.
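Concretely, option a) would amount to something like the following when
folding the sampled times together (an illustrative sketch only; the
struct and field names are invented here, not the driver's PMU code):

struct engine_sample {
        u64 busy_ns;    /* time executing a user payload */
        u64 sema_ns;    /* time spinning in MI_SEMAPHORE_WAIT */
};

/* Option a): busy is reported as "engine not idle", so sampled
 * semaphore time is folded into the busy counter. */
static u64 report_busy_ns(const struct engine_sample *s)
{
        return s->busy_ns + s->sema_ns;
}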
Regards,
Tvrtko
* Re: [PATCH 22/22] semaphore-no-stats
2019-02-05 10:03 ` Tvrtko Ursulin
@ 2019-02-05 10:07 ` Chris Wilson
0 siblings, 0 replies; 45+ messages in thread
From: Chris Wilson @ 2019-02-05 10:07 UTC (permalink / raw)
To: Tvrtko Ursulin, intel-gfx
Quoting Tvrtko Ursulin (2019-02-05 10:03:04)
>
> [...]
>
> So I am leaning towards option a). Engine busy time semantics would
> therefore be defined as engine not being idle = occupied by a context
> doing something.
(a) is fine by me. The disadvantage is that if clients care about the
spinning they need to account for it themselves... Or they opt out of
semaphores (but they really want a global switch rather than per-context
for accurate system tracking). Is this the compelling reason to have a
context param?
-Chris
* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (21 preceding siblings ...)
2019-02-04 13:22 ` [PATCH 22/22] semaphore-no-stats Chris Wilson
@ 2019-02-04 13:58 ` Patchwork
2019-02-04 14:07 ` ✗ Fi.CI.SPARSE: " Patchwork
` (5 subsequent siblings)
28 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2019-02-04 13:58 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption
URL : https://patchwork.freedesktop.org/series/56183/
State : warning
== Summary ==
$ dim checkpatch origin/drm-tip
2003a826bcc9 drm/i915/execlists: Suppress mere WAIT preemption
4ea55b47b679 drm/i915/execlists: Suppress redundant preemption
057170380bc2 drm/i915/selftests: Exercise some AB...BA preemption chains
075b40c8a703 drm/i915: Trim NEWCLIENT boosting
770fc9436462 drm/i915: Show support for accurate sw PMU busyness tracking
fc011ccbf2e1 drm/i915: Revoke mmaps and prevent access to fence registers across reset
5ba03f59e64e drm/i915: Force the GPU reset upon wedging
5fc92d0ac60e drm/i915: Uninterruptibly drain the timelines on unwedging
2054755a2346 drm/i915: Wait for old resets before applying debugfs/i915_wedged
3f8a64bf2876 drm/i915: Serialise resets with wedging
c2307ce3ed5b drm/i915: Don't claim an unstarted request was guilty
f524044f24ff drm/i915: Generalise GPU activity tracking
-:31: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#31:
new file mode 100644
-:36: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#36: FILE: drivers/gpu/drm/i915/i915_active.c:1:
+/*
-:270: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#270: FILE: drivers/gpu/drm/i915/i915_active.h:1:
+/*
-:345: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#345: FILE: drivers/gpu/drm/i915/i915_active_types.h:1:
+/*
-:700: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#700: FILE: drivers/gpu/drm/i915/selftests/i915_active.c:1:
+/*
total: 0 errors, 5 warnings, 0 checks, 798 lines checked
02cb9e1dad1c drm/i915: Release the active tracker tree upon idling
6131355f9bde drm/i915: Allocate active tracking nodes from a slabcache
6e02a325bf2e drm/i915: Make request allocation caches global
-:158: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#158:
new file mode 100644
-:163: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#163: FILE: drivers/gpu/drm/i915/i915_globals.c:1:
+/*
-:218: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#218: FILE: drivers/gpu/drm/i915/i915_globals.h:1:
+/*
-:558: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'plist' - possible side-effects?
#558: FILE: drivers/gpu/drm/i915/i915_scheduler.h:95:
+#define priolist_for_each_request(it, plist, idx) \
+ for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
+ list_for_each_entry(it, &(plist)->requests[idx], sched.link)
-:558: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'idx' - possible side-effects?
#558: FILE: drivers/gpu/drm/i915/i915_scheduler.h:95:
+#define priolist_for_each_request(it, plist, idx) \
+ for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
+ list_for_each_entry(it, &(plist)->requests[idx], sched.link)
-:562: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'plist' - possible side-effects?
#562: FILE: drivers/gpu/drm/i915/i915_scheduler.h:99:
+#define priolist_for_each_request_consume(it, n, plist, idx) \
+ for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+ list_for_each_entry_safe(it, n, \
+ &(plist)->requests[idx - 1], \
+ sched.link)
-:562: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'idx' - possible side-effects?
#562: FILE: drivers/gpu/drm/i915/i915_scheduler.h:99:
+#define priolist_for_each_request_consume(it, n, plist, idx) \
+ for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+ list_for_each_entry_safe(it, n, \
+ &(plist)->requests[idx - 1], \
+ sched.link)
total: 0 errors, 3 warnings, 4 checks, 746 lines checked
74cb3e7eceba drm/i915: Add timeline barrier support
7072ed37f6b4 drm/i915: Pull i915_gem_active into the i915_active family
-:690: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#690: FILE: drivers/gpu/drm/i915/i915_gem_fence_reg.c:227:
+ ret = i915_active_request_retire(&vma->last_fence,
&vma->obj->base.dev->struct_mutex);
-:699: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#699: FILE: drivers/gpu/drm/i915/i915_gem_fence_reg.c:236:
+ ret = i915_active_request_retire(&old->last_fence,
&old->obj->base.dev->struct_mutex);
-:1408: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#1408: FILE: drivers/gpu/drm/i915/i915_vma.c:990:
+ ret = i915_active_request_retire(&vma->last_fence,
+ &vma->vm->i915->drm.struct_mutex);
total: 0 errors, 0 warnings, 3 checks, 1392 lines checked
fca21928c6b9 drm/i915: Keep timeline HWSP allocated until idle across the system
a68b8158d4d9 drm/i915/execlists: Refactor out can_merge_rq()
b08aeb9031a5 drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-:326: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#326: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:109:
+#define MI_SEMAPHORE_SAD_GT_SDD (0<<12)
^
-:328: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#328: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:111:
+#define MI_SEMAPHORE_SAD_LT_SDD (2<<12)
^
-:329: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#329: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:112:
+#define MI_SEMAPHORE_SAD_LTE_SDD (3<<12)
^
-:330: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#330: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:113:
+#define MI_SEMAPHORE_SAD_EQ_SDD (4<<12)
^
-:331: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#331: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:114:
+#define MI_SEMAPHORE_SAD_NEQ_SDD (5<<12)
^
total: 0 errors, 0 warnings, 5 checks, 298 lines checked
842b32511f09 drm/i915: Prioritise non-busywait semaphore workloads
5d4fa7842268 semaphore-no-stats
-:20: ERROR:MISSING_SIGN_OFF: Missing Signed-off-by: line(s)
total: 1 errors, 0 warnings, 0 checks, 7 lines checked
* ✗ Fi.CI.SPARSE: warning for series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (22 preceding siblings ...)
2019-02-04 13:58 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption Patchwork
@ 2019-02-04 14:07 ` Patchwork
2019-02-04 14:45 ` ✓ Fi.CI.BAT: success " Patchwork
` (4 subsequent siblings)
28 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2019-02-04 14:07 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption
URL : https://patchwork.freedesktop.org/series/56183/
State : warning
== Summary ==
$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915/execlists: Suppress mere WAIT preemption
Okay!
Commit: drm/i915/execlists: Suppress redundant preemption
Okay!
Commit: drm/i915/selftests: Exercise some AB...BA preemption chains
Okay!
Commit: drm/i915: Trim NEWCLIENT boosting
Okay!
Commit: drm/i915: Show support for accurate sw PMU busyness tracking
Okay!
Commit: drm/i915: Revoke mmaps and prevent access to fence registers across reset
-drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_reset.c:1304:5: warning: context imbalance in 'i915_reset_trylock' - different lock contexts for basic block
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3551:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3545:16: warning: expression using sizeof(void)
Commit: drm/i915: Force the GPU reset upon wedging
Okay!
Commit: drm/i915: Uninterruptibly drain the timelines on unwedging
Okay!
Commit: drm/i915: Wait for old resets before applying debugfs/i915_wedged
Okay!
Commit: drm/i915: Serialise resets with wedging
Okay!
Commit: drm/i915: Don't claim an unstarted request was guilty
Okay!
Commit: drm/i915: Generalise GPU activity tracking
+./include/uapi/linux/perf_event.h:147:56: warning: cast truncates bits from constant value (8000000000000000 becomes 0)
Commit: drm/i915: Release the active tracker tree upon idling
Okay!
Commit: drm/i915: Allocate active tracking nodes from a slabcache
Okay!
Commit: drm/i915: Make request allocation caches global
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3545:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3542:16: warning: expression using sizeof(void)
Commit: drm/i915: Add timeline barrier support
Okay!
Commit: drm/i915: Pull i915_gem_active into the i915_active family
Okay!
Commit: drm/i915: Keep timeline HWSP allocated until idle across the system
Okay!
Commit: drm/i915/execlists: Refactor out can_merge_rq()
Okay!
Commit: drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-O:drivers/gpu/drm/i915/i915_drv.c:349:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_drv.c:349:25: warning: expression using sizeof(void)
Commit: drm/i915: Prioritise non-busywait semaphore workloads
Okay!
Commit: semaphore-no-stats
Okay!
* ✓ Fi.CI.BAT: success for series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (23 preceding siblings ...)
2019-02-04 14:07 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-02-04 14:45 ` Patchwork
2019-02-04 15:22 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption (rev2) Patchwork
` (3 subsequent siblings)
28 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2019-02-04 14:45 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption
URL : https://patchwork.freedesktop.org/series/56183/
State : success
== Summary ==
CI Bug Log - changes from CI_DRM_5536 -> Patchwork_12126
====================================================
Summary
-------
**SUCCESS**
No regressions found.
External URL: https://patchwork.freedesktop.org/api/1.0/series/56183/revisions/1/mbox/
Possible new issues
-------------------
Here are the unknown changes that may have been introduced in Patchwork_12126:
### IGT changes ###
#### Suppressed ####
The following results come from untrusted machines, tests, or statuses.
They do not affect the overall result.
* {igt@i915_selftest@live_timelines}:
- fi-gdg-551: PASS -> DMESG-FAIL
Known issues
------------
Here are the changes found in Patchwork_12126 that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@kms_flip@basic-flip-vs-dpms:
- fi-skl-6700hq: PASS -> DMESG-WARN [fdo#105998]
* igt@pm_rpm@basic-rte:
- fi-byt-j1900: NOTRUN -> FAIL [fdo#108800]
* igt@pm_rpm@module-reload:
- fi-skl-6770hq: PASS -> FAIL [fdo#108511]
* igt@prime_vgem@basic-fence-flip:
- fi-gdg-551: PASS -> FAIL [fdo#103182]
#### Possible fixes ####
* igt@kms_busy@basic-flip-b:
- fi-gdg-551: FAIL [fdo#103182] -> PASS
* igt@kms_chamelium@hdmi-hpd-fast:
- fi-kbl-7500u: FAIL [fdo#109485] -> PASS
* igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence:
- fi-skl-guc: FAIL [fdo#103191] / [fdo#107362] -> PASS
* igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
- fi-blb-e6850: INCOMPLETE [fdo#107718] -> PASS
* igt@prime_vgem@basic-fence-flip:
- fi-ilk-650: FAIL [fdo#104008] -> PASS
{name}: This element is suppressed. This means it is ignored when computing
the status of the difference (SUCCESS, WARNING, or FAILURE).
[fdo#103182]: https://bugs.freedesktop.org/show_bug.cgi?id=103182
[fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
[fdo#104008]: https://bugs.freedesktop.org/show_bug.cgi?id=104008
[fdo#105998]: https://bugs.freedesktop.org/show_bug.cgi?id=105998
[fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
[fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
[fdo#108511]: https://bugs.freedesktop.org/show_bug.cgi?id=108511
[fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
[fdo#108800]: https://bugs.freedesktop.org/show_bug.cgi?id=108800
[fdo#109226]: https://bugs.freedesktop.org/show_bug.cgi?id=109226
[fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
[fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
[fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
[fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
[fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
[fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
[fdo#109294]: https://bugs.freedesktop.org/show_bug.cgi?id=109294
[fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
[fdo#109485]: https://bugs.freedesktop.org/show_bug.cgi?id=109485
[fdo#109527]: https://bugs.freedesktop.org/show_bug.cgi?id=109527
[fdo#109528]: https://bugs.freedesktop.org/show_bug.cgi?id=109528
[fdo#109530]: https://bugs.freedesktop.org/show_bug.cgi?id=109530
Participating hosts (46 -> 42)
------------------------------
Additional (3): fi-icl-y fi-byt-j1900 fi-pnv-d510
Missing (7): fi-kbl-soraka fi-ilk-m540 fi-hsw-peppy fi-byt-squawks fi-bsw-cyan fi-kbl-7560u fi-byt-clapper
Build changes
-------------
* Linux: CI_DRM_5536 -> Patchwork_12126
CI_DRM_5536: 0a5caf6e62fb99d027b3e6af226abb47be732f15 @ git://anongit.freedesktop.org/gfx-ci/linux
IGT_4805: cb6610f5a91a08b1d7f8ae910875891003c6f67c @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
Patchwork_12126: 5d4fa784226829bb9a4e129d835c22c2c7176ee4 @ git://anongit.freedesktop.org/gfx-ci/linux
== Linux commits ==
5d4fa7842268 semaphore-no-stats
842b32511f09 drm/i915: Prioritise non-busywait semaphore workloads
b08aeb9031a5 drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
a68b8158d4d9 drm/i915/execlists: Refactor out can_merge_rq()
fca21928c6b9 drm/i915: Keep timeline HWSP allocated until idle across the system
7072ed37f6b4 drm/i915: Pull i915_gem_active into the i915_active family
74cb3e7eceba drm/i915: Add timeline barrier support
6e02a325bf2e drm/i915: Make request allocation caches global
6131355f9bde drm/i915: Allocate active tracking nodes from a slabcache
02cb9e1dad1c drm/i915: Release the active tracker tree upon idling
f524044f24ff drm/i915: Generalise GPU activity tracking
c2307ce3ed5b drm/i915: Don't claim an unstarted request was guilty
3f8a64bf2876 drm/i915: Serialise resets with wedging
2054755a2346 drm/i915: Wait for old resets before applying debugfs/i915_wedged
5fc92d0ac60e drm/i915: Uninterruptibly drain the timelines on unwedging
5ba03f59e64e drm/i915: Force the GPU reset upon wedging
fc011ccbf2e1 drm/i915: Revoke mmaps and prevent access to fence registers across reset
770fc9436462 drm/i915: Show support for accurate sw PMU busyness tracking
075b40c8a703 drm/i915: Trim NEWCLIENT boosting
057170380bc2 drm/i915/selftests: Exercise some AB...BA preemption chains
4ea55b47b679 drm/i915/execlists: Suppress redundant preemption
2003a826bcc9 drm/i915/execlists: Suppress mere WAIT preemption
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12126/
* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption (rev2)
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (24 preceding siblings ...)
2019-02-04 14:45 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-02-04 15:22 ` Patchwork
2019-02-04 15:30 ` ✗ Fi.CI.SPARSE: " Patchwork
` (2 subsequent siblings)
28 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2019-02-04 15:22 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption (rev2)
URL : https://patchwork.freedesktop.org/series/56183/
State : warning
== Summary ==
$ dim checkpatch origin/drm-tip
1436da54695c drm/i915/execlists: Suppress mere WAIT preemption
c7869190b4c8 drm/i915/execlists: Suppress redundant preemption
cf832a1bbb4e drm/i915/selftests: Exercise some AB...BA preemption chains
a3bddbf1dfc2 drm/i915: Trim NEWCLIENT boosting
-:26: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit b16c765122f9 ("drm/i915: Priority boost for new clients")'
#26:
References: b16c765122f9 ("drm/i915: Priority boost for new clients")
total: 1 errors, 0 warnings, 0 checks, 8 lines checked
31eba806a7c5 drm/i915: Show support for accurate sw PMU busyness tracking
1680182854c0 drm/i915: Revoke mmaps and prevent access to fence registers across reset
fa66be4a0eee drm/i915: Force the GPU reset upon wedging
50864356bb78 drm/i915: Uninterruptibly drain the timelines on unwedging
4abf66518fa5 drm/i915: Wait for old resets before applying debugfs/i915_wedged
4aebd50e5449 drm/i915: Serialise resets with wedging
d2d2d640906f drm/i915: Don't claim an unstarted request was guilty
29a005168925 drm/i915: Generalise GPU activity tracking
-:31: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#31:
new file mode 100644
-:36: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#36: FILE: drivers/gpu/drm/i915/i915_active.c:1:
+/*
-:270: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#270: FILE: drivers/gpu/drm/i915/i915_active.h:1:
+/*
-:345: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#345: FILE: drivers/gpu/drm/i915/i915_active_types.h:1:
+/*
-:700: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#700: FILE: drivers/gpu/drm/i915/selftests/i915_active.c:1:
+/*
total: 0 errors, 5 warnings, 0 checks, 798 lines checked
2904cb59aa0b drm/i915: Release the active tracker tree upon idling
a765b74ee2e7 drm/i915: Allocate active tracking nodes from a slabcache
7e5abacea1b5 drm/i915: Make request allocation caches global
-:158: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#158:
new file mode 100644
-:163: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#163: FILE: drivers/gpu/drm/i915/i915_globals.c:1:
+/*
-:218: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#218: FILE: drivers/gpu/drm/i915/i915_globals.h:1:
+/*
-:558: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'plist' - possible side-effects?
#558: FILE: drivers/gpu/drm/i915/i915_scheduler.h:95:
+#define priolist_for_each_request(it, plist, idx) \
+ for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
+ list_for_each_entry(it, &(plist)->requests[idx], sched.link)
-:558: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'idx' - possible side-effects?
#558: FILE: drivers/gpu/drm/i915/i915_scheduler.h:95:
+#define priolist_for_each_request(it, plist, idx) \
+ for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
+ list_for_each_entry(it, &(plist)->requests[idx], sched.link)
-:562: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'plist' - possible side-effects?
#562: FILE: drivers/gpu/drm/i915/i915_scheduler.h:99:
+#define priolist_for_each_request_consume(it, n, plist, idx) \
+ for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+ list_for_each_entry_safe(it, n, \
+ &(plist)->requests[idx - 1], \
+ sched.link)
-:562: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'idx' - possible side-effects?
#562: FILE: drivers/gpu/drm/i915/i915_scheduler.h:99:
+#define priolist_for_each_request_consume(it, n, plist, idx) \
+ for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+ list_for_each_entry_safe(it, n, \
+ &(plist)->requests[idx - 1], \
+ sched.link)
total: 0 errors, 3 warnings, 4 checks, 746 lines checked
86e2bc88638e drm/i915: Add timeline barrier support
9bd42aa88e72 drm/i915: Pull i915_gem_active into the i915_active family
-:690: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#690: FILE: drivers/gpu/drm/i915/i915_gem_fence_reg.c:227:
+ ret = i915_active_request_retire(&vma->last_fence,
&vma->obj->base.dev->struct_mutex);
-:699: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#699: FILE: drivers/gpu/drm/i915/i915_gem_fence_reg.c:236:
+ ret = i915_active_request_retire(&old->last_fence,
&old->obj->base.dev->struct_mutex);
-:1408: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#1408: FILE: drivers/gpu/drm/i915/i915_vma.c:990:
+ ret = i915_active_request_retire(&vma->last_fence,
+ &vma->vm->i915->drm.struct_mutex);
total: 0 errors, 0 warnings, 3 checks, 1392 lines checked
1b79b880ddc3 drm/i915: Keep timeline HWSP allocated until idle across the system
aa9e7124f366 drm/i915/execlists: Refactor out can_merge_rq()
06fce125dbeb drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-:326: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#326: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:109:
+#define MI_SEMAPHORE_SAD_GT_SDD (0<<12)
^
-:328: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#328: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:111:
+#define MI_SEMAPHORE_SAD_LT_SDD (2<<12)
^
-:329: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#329: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:112:
+#define MI_SEMAPHORE_SAD_LTE_SDD (3<<12)
^
-:330: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#330: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:113:
+#define MI_SEMAPHORE_SAD_EQ_SDD (4<<12)
^
-:331: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#331: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:114:
+#define MI_SEMAPHORE_SAD_NEQ_SDD (5<<12)
^
total: 0 errors, 0 warnings, 5 checks, 298 lines checked
9f920e3ee91d drm/i915: Prioritise non-busywait semaphore workloads
8691b315727a semaphore-no-stats
-:20: ERROR:MISSING_SIGN_OFF: Missing Signed-off-by: line(s)
total: 1 errors, 0 warnings, 0 checks, 7 lines checked
* ✗ Fi.CI.SPARSE: warning for series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption (rev2)
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (25 preceding siblings ...)
2019-02-04 15:22 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption (rev2) Patchwork
@ 2019-02-04 15:30 ` Patchwork
2019-02-04 15:42 ` ✓ Fi.CI.BAT: success " Patchwork
2019-02-04 17:51 ` ✓ Fi.CI.IGT: " Patchwork
28 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2019-02-04 15:30 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption (rev2)
URL : https://patchwork.freedesktop.org/series/56183/
State : warning
== Summary ==
$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915/execlists: Suppress mere WAIT preemption
Okay!
Commit: drm/i915/execlists: Suppress redundant preemption
Okay!
Commit: drm/i915/selftests: Exercise some AB...BA preemption chains
Okay!
Commit: drm/i915: Trim NEWCLIENT boosting
Okay!
Commit: drm/i915: Show support for accurate sw PMU busyness tracking
Okay!
Commit: drm/i915: Revoke mmaps and prevent access to fence registers across reset
-drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_reset.c:1304:5: warning: context imbalance in 'i915_reset_trylock' - different lock contexts for basic block
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3551:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3545:16: warning: expression using sizeof(void)
Commit: drm/i915: Force the GPU reset upon wedging
Okay!
Commit: drm/i915: Uninterruptibly drain the timelines on unwedging
Okay!
Commit: drm/i915: Wait for old resets before applying debugfs/i915_wedged
Okay!
Commit: drm/i915: Serialise resets with wedging
Okay!
Commit: drm/i915: Don't claim an unstarted request was guilty
Okay!
Commit: drm/i915: Generalise GPU activity tracking
+./include/uapi/linux/perf_event.h:147:56: warning: cast truncates bits from constant value (8000000000000000 becomes 0)
Commit: drm/i915: Release the active tracker tree upon idling
Okay!
Commit: drm/i915: Allocate active tracking nodes from a slabcache
Okay!
Commit: drm/i915: Make request allocation caches global
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3545:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3542:16: warning: expression using sizeof(void)
Commit: drm/i915: Add timeline barrier support
Okay!
Commit: drm/i915: Pull i915_gem_active into the i915_active family
Okay!
Commit: drm/i915: Keep timeline HWSP allocated until idle across the system
Okay!
Commit: drm/i915/execlists: Refactor out can_merge_rq()
Okay!
Commit: drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-O:drivers/gpu/drm/i915/i915_drv.c:349:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_drv.c:349:25: warning: expression using sizeof(void)
Commit: drm/i915: Prioritise non-busywait semaphore workloads
Okay!
Commit: semaphore-no-stats
Okay!
* ✓ Fi.CI.BAT: success for series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption (rev2)
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (26 preceding siblings ...)
2019-02-04 15:30 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-02-04 15:42 ` Patchwork
2019-02-04 17:51 ` ✓ Fi.CI.IGT: " Patchwork
28 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2019-02-04 15:42 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption (rev2)
URL : https://patchwork.freedesktop.org/series/56183/
State : success
== Summary ==
CI Bug Log - changes from CI_DRM_5536 -> Patchwork_12127
====================================================
Summary
-------
**SUCCESS**
No regressions found.
External URL: https://patchwork.freedesktop.org/api/1.0/series/56183/revisions/2/mbox/
Possible new issues
-------------------
Here are the unknown changes that may have been introduced in Patchwork_12127:
### IGT changes ###
#### Suppressed ####
The following results come from untrusted machines, tests, or statuses.
They do not affect the overall result.
* {igt@i915_selftest@live_timelines}:
- fi-gdg-551: PASS -> DMESG-FAIL
Known issues
------------
Here are the changes found in Patchwork_12127 that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@amdgpu/amd_basic@userptr:
- fi-kbl-8809g: PASS -> DMESG-WARN [fdo#108965]
* igt@kms_pipe_crc_basic@read-crc-pipe-b:
- fi-byt-clapper: PASS -> FAIL [fdo#107362]
#### Possible fixes ####
* igt@kms_busy@basic-flip-a:
- fi-gdg-551: FAIL [fdo#103182] -> PASS +1
* igt@kms_chamelium@hdmi-hpd-fast:
- fi-kbl-7500u: FAIL [fdo#109485] -> PASS
* igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence:
- fi-skl-guc: FAIL [fdo#103191] / [fdo#107362] -> PASS
* igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
- fi-blb-e6850: INCOMPLETE [fdo#107718] -> PASS
* igt@prime_vgem@basic-fence-flip:
- fi-ilk-650: FAIL [fdo#104008] -> PASS
{name}: This element is suppressed. This means it is ignored when computing
the status of the difference (SUCCESS, WARNING, or FAILURE).
[fdo#103182]: https://bugs.freedesktop.org/show_bug.cgi?id=103182
[fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
[fdo#104008]: https://bugs.freedesktop.org/show_bug.cgi?id=104008
[fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
[fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
[fdo#108965]: https://bugs.freedesktop.org/show_bug.cgi?id=108965
[fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
[fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
[fdo#109485]: https://bugs.freedesktop.org/show_bug.cgi?id=109485
Participating hosts (46 -> 44)
------------------------------
Additional (2): fi-byt-j1900 fi-pnv-d510
Missing (4): fi-kbl-soraka fi-ilk-m540 fi-byt-squawks fi-bsw-cyan
Build changes
-------------
* Linux: CI_DRM_5536 -> Patchwork_12127
CI_DRM_5536: 0a5caf6e62fb99d027b3e6af226abb47be732f15 @ git://anongit.freedesktop.org/gfx-ci/linux
IGT_4805: cb6610f5a91a08b1d7f8ae910875891003c6f67c @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
Patchwork_12127: 8691b315727abf059962d96dfa06c7421977bb61 @ git://anongit.freedesktop.org/gfx-ci/linux
== Linux commits ==
8691b315727a semaphore-no-stats
9f920e3ee91d drm/i915: Prioritise non-busywait semaphore workloads
06fce125dbeb drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
aa9e7124f366 drm/i915/execlists: Refactor out can_merge_rq()
1b79b880ddc3 drm/i915: Keep timeline HWSP allocated until idle across the system
9bd42aa88e72 drm/i915: Pull i915_gem_active into the i915_active family
86e2bc88638e drm/i915: Add timeline barrier support
7e5abacea1b5 drm/i915: Make request allocation caches global
a765b74ee2e7 drm/i915: Allocate active tracking nodes from a slabcache
2904cb59aa0b drm/i915: Release the active tracker tree upon idling
29a005168925 drm/i915: Generalise GPU activity tracking
d2d2d640906f drm/i915: Don't claim an unstarted request was guilty
4aebd50e5449 drm/i915: Serialise resets with wedging
4abf66518fa5 drm/i915: Wait for old resets before applying debugfs/i915_wedged
50864356bb78 drm/i915: Uninterruptibly drain the timelines on unwedging
fa66be4a0eee drm/i915: Force the GPU reset upon wedging
1680182854c0 drm/i915: Revoke mmaps and prevent access to fence registers across reset
31eba806a7c5 drm/i915: Show support for accurate sw PMU busyness tracking
a3bddbf1dfc2 drm/i915: Trim NEWCLIENT boosting
cf832a1bbb4e drm/i915/selftests: Exercise some AB...BA preemption chains
c7869190b4c8 drm/i915/execlists: Suppress redundant preemption
1436da54695c drm/i915/execlists: Suppress mere WAIT preemption
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12127/
* ✓ Fi.CI.IGT: success for series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption (rev2)
2019-02-04 13:21 Fleshing out the picture to Load Balancing^W^W HW semaphores Chris Wilson
` (27 preceding siblings ...)
2019-02-04 15:42 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-02-04 17:51 ` Patchwork
28 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2019-02-04 17:51 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [01/22] drm/i915/execlists: Suppress mere WAIT preemption (rev2)
URL : https://patchwork.freedesktop.org/series/56183/
State : success
== Summary ==
CI Bug Log - changes from CI_DRM_5536_full -> Patchwork_12127_full
====================================================
Summary
-------
**SUCCESS**
No regressions found.
Known issues
------------
Here are the changes found in Patchwork_12127_full that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@gem_exec_schedule@pi-ringfull-blt:
- shard-kbl: NOTRUN -> FAIL [fdo#103158]
* igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-b:
- shard-apl: NOTRUN -> DMESG-WARN [fdo#107956]
* igt@kms_ccs@pipe-b-crc-sprite-planes-basic:
- shard-apl: PASS -> FAIL [fdo#106510] / [fdo#108145]
* igt@kms_color@pipe-b-legacy-gamma:
- shard-apl: PASS -> FAIL [fdo#104782]
* igt@kms_cursor_crc@cursor-256x85-onscreen:
- shard-glk: PASS -> FAIL [fdo#103232]
* igt@kms_cursor_crc@cursor-64x64-onscreen:
- shard-apl: PASS -> FAIL [fdo#103232]
* igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-blt:
- shard-glk: PASS -> FAIL [fdo#103167] +4
* igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-gtt:
- shard-apl: PASS -> FAIL [fdo#103167]
* igt@kms_plane@pixel-format-pipe-b-planes:
- shard-apl: PASS -> FAIL [fdo#103166]
* igt@kms_plane@plane-position-covered-pipe-b-planes:
- shard-apl: NOTRUN -> FAIL [fdo#103166]
* igt@kms_plane_alpha_blend@pipe-b-alpha-basic:
- shard-kbl: NOTRUN -> FAIL [fdo#108145] / [fdo#108590]
* igt@kms_plane_alpha_blend@pipe-b-constant-alpha-max:
- shard-apl: NOTRUN -> FAIL [fdo#108145] +1
* igt@kms_plane_multiple@atomic-pipe-a-tiling-x:
- shard-glk: PASS -> FAIL [fdo#103166] +1
#### Possible fixes ####
* igt@gem_busy@extended-semaphore-blt:
- shard-kbl: {SKIP} [fdo#109271] -> PASS +5
- shard-glk: {SKIP} [fdo#109271] -> PASS +3
* igt@gem_busy@extended-semaphore-vebox:
- shard-apl: {SKIP} [fdo#109271] -> PASS +3
* igt@gem_mmap_gtt@hang:
- shard-kbl: FAIL [fdo#109469] -> PASS
- shard-hsw: FAIL [fdo#109469] -> PASS
- shard-snb: FAIL [fdo#109469] -> PASS
- shard-apl: FAIL [fdo#109469] -> PASS
* igt@gem_pwrite_pread@uncached-copy-performance:
- shard-apl: INCOMPLETE [fdo#103927] -> PASS
* igt@kms_cursor_crc@cursor-128x42-onscreen:
- shard-glk: FAIL [fdo#103232] -> PASS +1
- shard-apl: FAIL [fdo#103232] -> PASS
* igt@kms_flip@basic-flip-vs-dpms:
- shard-kbl: DMESG-WARN [fdo#103313] / [fdo#105345] / [fdo#108473] -> PASS
* igt@kms_frontbuffer_tracking@fbc-1p-primscrn-indfb-msflip-blt:
- shard-glk: FAIL [fdo#103167] -> PASS +2
* igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-fullscreen:
- shard-apl: FAIL [fdo#103167] -> PASS
* igt@kms_plane_multiple@atomic-pipe-b-tiling-yf:
- shard-apl: FAIL [fdo#103166] -> PASS
* igt@kms_setmode@basic:
- shard-kbl: FAIL [fdo#99912] -> PASS
* igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend:
- shard-snb: INCOMPLETE [fdo#105411] -> PASS
{name}: This element is suppressed. This means it is ignored when computing
the status of the difference (SUCCESS, WARNING, or FAILURE).
[fdo#103158]: https://bugs.freedesktop.org/show_bug.cgi?id=103158
[fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
[fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
[fdo#103232]: https://bugs.freedesktop.org/show_bug.cgi?id=103232
[fdo#103313]: https://bugs.freedesktop.org/show_bug.cgi?id=103313
[fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
[fdo#104782]: https://bugs.freedesktop.org/show_bug.cgi?id=104782
[fdo#105345]: https://bugs.freedesktop.org/show_bug.cgi?id=105345
[fdo#105411]: https://bugs.freedesktop.org/show_bug.cgi?id=105411
[fdo#106510]: https://bugs.freedesktop.org/show_bug.cgi?id=106510
[fdo#107956]: https://bugs.freedesktop.org/show_bug.cgi?id=107956
[fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
[fdo#108473]: https://bugs.freedesktop.org/show_bug.cgi?id=108473
[fdo#108590]: https://bugs.freedesktop.org/show_bug.cgi?id=108590
[fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
[fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
[fdo#109469]: https://bugs.freedesktop.org/show_bug.cgi?id=109469
[fdo#99912]: https://bugs.freedesktop.org/show_bug.cgi?id=99912
Participating hosts (7 -> 5)
------------------------------
Missing (2): shard-skl shard-iclb
Build changes
-------------
* Linux: CI_DRM_5536 -> Patchwork_12127
CI_DRM_5536: 0a5caf6e62fb99d027b3e6af226abb47be732f15 @ git://anongit.freedesktop.org/gfx-ci/linux
IGT_4805: cb6610f5a91a08b1d7f8ae910875891003c6f67c @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
Patchwork_12127: 8691b315727abf059962d96dfa06c7421977bb61 @ git://anongit.freedesktop.org/gfx-ci/linux
piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12127/