intel-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/8] Convert requests to use struct fence
@ 2016-04-20 17:09 John.C.Harrison
  2016-04-20 17:09 ` [PATCH v7 1/8] drm/i915: " John.C.Harrison
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: John.C.Harrison @ 2016-04-20 17:09 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the drivers 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track the execution progress so is very
definitely still required. However, the basic completion status side
could be updated to use the ready made fence implementation and gain
all the advantages that provides.

Using the struct fence object also has the advantage that the fence
can be used outside of the i915 driver (by other drivers or by
userland applications). That is the basis of the dma-buff
synchronisation API and allows asynchronous tracking of work
completion. In this case, it allows applications to be signalled
directly when a batch buffer completes without having to make an IOCTL
call into the driver.

Note that in order to allow the full fence API to be used (e.g.
merging multiple fences together), the driver needs to provide an
incrementing timeline for the fence. Currently this timeline is
specific to the fence code as it must be per context. There is future
work planned to make the driver's internal seqno value also be per
context rather than driver global (VIZ-7443). Once this is done the
fence specific timeline code can be dropped in favour of just using
the driver's seqno value.

This is work that was planned since the conversion of the driver from
being seqno value based to being request structure based. This patch
series does that work.

An IGT test to exercise the fence support from user land is in
progress and will follow. Android already makes extensive use of
fences for display composition. Real world linux usage is planned in
the form of Jesse's page table sharing / bufferless execbuf support.
There is also a plan that Wayland (and others) could make use of it in
a similar manner to Android.

v2: Updated for review comments by various people and to add support
for Android style 'native sync'.

v3: Updated from review comments by Tvrtko Ursulin. Also moved sync
framework out of staging and improved request completion handling.

v4: Fixed patch tag (should have been PATCH not RFC). Corrected
ownership of one patch which had passed through many hands before
reaching me. Fixed a bug introduced in v3 and updated for review
comments.

v5: Removed de-staging and further updates to Android sync code. The
de-stage is now being handled by someone else. The sync integration to
the i915 driver will be a separate patch set that can only land after
the external de-stage has been completed.

Assorted changes based on review comments and style checker fixes.
Most significant change is fixing up the fake lost interrupt support
for the 'drv_missed_irq_hang' IGT test and improving the wait request
latency.

v6: Updated to newer nigthly and resolved conflicts around updates
to the wait_request optimisations.

v7: Updated to newer nightly and resolved conflicts around massive
ring -> engine rename and interface change to get_seqno(). Also fixed
up a race condition issue with stale request pointers in file client
lists and added a minor optimisation to not acquire spinlocks when a
list is empty and does not need processing.

[Patches against drm-intel-nightly tree fetched 13/04/2016]

John Harrison (8):
  drm/i915: Convert requests to use struct fence
  drm/i915: Removed now redudant parameter to i915_gem_request_completed()
  drm/i915: Add per context timelines to fence object
  drm/i915: Fix clean up of file client list on execbuff failure
  drm/i915: Delay the freeing of requests until retire time
  drm/i915: Interrupt driven fences
  drm/i915: Updated request structure tracing
  drm/i915: Cache last IRQ seqno to reduce IRQ overhead

 drivers/gpu/drm/i915/i915_debugfs.c        |   7 +-
 drivers/gpu/drm/i915/i915_drv.h            |  74 +++--
 drivers/gpu/drm/i915/i915_gem.c            | 443 ++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_gem_context.c    |  14 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   6 +-
 drivers/gpu/drm/i915/i915_irq.c            |   3 +-
 drivers/gpu/drm/i915/i915_trace.h          |  14 +-
 drivers/gpu/drm/i915/intel_display.c       |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c           |  13 +
 drivers/gpu/drm/i915/intel_pm.c            |   6 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c    |   5 +
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  12 +
 12 files changed, 513 insertions(+), 88 deletions(-)

-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 1/8] drm/i915: Convert requests to use struct fence
  2016-04-20 17:09 [PATCH v7 0/8] Convert requests to use struct fence John.C.Harrison
@ 2016-04-20 17:09 ` John.C.Harrison
  2016-04-21  7:06   ` Maarten Lankhorst
  2016-04-21  7:09   ` Maarten Lankhorst
  2016-04-20 17:09 ` [PATCH v7 2/8] drm/i915: Removed now redudant parameter to i915_gem_request_completed() John.C.Harrison
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 22+ messages in thread
From: John.C.Harrison @ 2016-04-20 17:09 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the drivers 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track the execution progress so is very
definitely still required. However, the basic completion status side
could be updated to use the ready made fence implementation and gain
all the advantages that provides.

This patch makes the first step of integrating a struct fence into the
request. It replaces the explicit reference count with that of the
fence. It also replaces the 'is completed' test with the fence's
equivalent. Currently, that simply chains on to the original request
implementation. A future patch will improve this.

v3: Updated after review comments by Tvrtko Ursulin. Added fence
context/seqno pair to the debugfs request info. Renamed fence 'driver
name' to just 'i915'. Removed BUG_ONs.

v5: Changed seqno format in debugfs to %x rather than %u as that is
apparently the preferred appearance. Line wrapped some long lines to
keep the style checker happy.

v6: Updated to newer nigthly and resolved conflicts. The biggest issue
was with the re-worked busy spin precursor to waiting on a request. In
particular, the addition of a 'request_started' helper function. This
has no corresponding concept within the fence framework. However, it
is only ever used in one place and the whole point of that place is to
always directly read the seqno for absolutely lowest latency possible.
So the simple solution is to just make the seqno test explicit at that
point now rather than later in the series (it was previously being
done anyway when fences become interrupt driven).

v7: Rebased to newer nightly - lots of ring -> engine renaming and
interface change to get_seqno().

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  5 ++-
 drivers/gpu/drm/i915/i915_drv.h         | 51 ++++++++++-------------
 drivers/gpu/drm/i915/i915_gem.c         | 72 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_lrc.c        |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
 6 files changed, 94 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2d11b49..6917515 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -707,11 +707,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
 			task = NULL;
 			if (req->pid)
 				task = pid_task(req->pid, PIDTYPE_PID);
-			seq_printf(m, "    %x @ %d: %s [%d]\n",
+			seq_printf(m, "    %x @ %d: %s [%d], fence = %x:%x\n",
 				   req->seqno,
 				   (int) (jiffies - req->emitted_jiffies),
 				   task ? task->comm : "<unknown>",
-				   task ? task->pid : -1);
+				   task ? task->pid : -1,
+				   req->fence.context, req->fence.seqno);
 			rcu_read_unlock();
 		}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d1e6e58..e5f49f3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -54,6 +54,7 @@
 #include <linux/pm_qos.h>
 #include "intel_guc.h"
 #include "intel_dpll_mgr.h"
+#include <linux/fence.h>
 
 /* General customization:
  */
@@ -2242,7 +2243,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-	struct kref ref;
+	/**
+	 * Underlying object for implementing the signal/wait stuff.
+	 * NB: Never call fence_later() or return this fence object to user
+	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
+	 * etc., there is no guarantee at all about the validity or
+	 * sequentiality of the fence's seqno! It is also unsafe to let
+	 * anything outside of the i915 driver get hold of the fence object
+	 * as the clean up when decrementing the reference count requires
+	 * holding the driver mutex lock.
+	 */
+	struct fence fence;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2328,7 +2339,13 @@ struct drm_i915_gem_request * __must_check
 i915_gem_request_alloc(struct intel_engine_cs *engine,
 		       struct intel_context *ctx);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
-void i915_gem_request_free(struct kref *req_ref);
+
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
+					      bool lazy_coherency)
+{
+	return fence_is_signaled(&req->fence);
+}
+
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 				   struct drm_file *file);
 
@@ -2348,7 +2365,7 @@ static inline struct drm_i915_gem_request *
 i915_gem_request_reference(struct drm_i915_gem_request *req)
 {
 	if (req)
-		kref_get(&req->ref);
+		fence_get(&req->fence);
 	return req;
 }
 
@@ -2356,7 +2373,7 @@ static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
 	WARN_ON(!mutex_is_locked(&req->engine->dev->struct_mutex));
-	kref_put(&req->ref, i915_gem_request_free);
+	fence_put(&req->fence);
 }
 
 static inline void
@@ -2368,7 +2385,7 @@ i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
 		return;
 
 	dev = req->engine->dev;
-	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
+	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
 		mutex_unlock(&dev->struct_mutex);
 }
 
@@ -2385,12 +2402,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 }
 
 /*
- * XXX: i915_gem_request_completed should be here but currently needs the
- * definition of i915_seqno_passed() which is below. It will be moved in
- * a later patch when the call to i915_seqno_passed() is obsoleted...
- */
-
-/*
  * A command that requires special handling by the command parser.
  */
 struct drm_i915_cmd_descriptor {
@@ -3055,24 +3066,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 	return (int32_t)(seq1 - seq2) >= 0;
 }
 
-static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
-					   bool lazy_coherency)
-{
-	if (!lazy_coherency && req->engine->irq_seqno_barrier)
-		req->engine->irq_seqno_barrier(req->engine);
-	return i915_seqno_passed(req->engine->get_seqno(req->engine),
-				 req->previous_seqno);
-}
-
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
-{
-	if (!lazy_coherency && req->engine->irq_seqno_barrier)
-		req->engine->irq_seqno_barrier(req->engine);
-	return i915_seqno_passed(req->engine->get_seqno(req->engine),
-				 req->seqno);
-}
-
 int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
 int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ebef03b..1add29a9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1183,6 +1183,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 {
 	unsigned long timeout;
 	unsigned cpu;
+	uint32_t seqno;
 
 	/* When waiting for high frequency requests, e.g. during synchronous
 	 * rendering split between the CPU and GPU, the finite amount of time
@@ -1198,12 +1199,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 		return -EBUSY;
 
 	/* Only spin if we know the GPU is processing this request */
-	if (!i915_gem_request_started(req, true))
+	seqno = req->engine->get_seqno(req->engine);
+	if (!i915_seqno_passed(seqno, req->previous_seqno))
 		return -EAGAIN;
 
 	timeout = local_clock_us(&cpu) + 5;
 	while (!need_resched()) {
-		if (i915_gem_request_completed(req, true))
+		seqno = req->engine->get_seqno(req->engine);
+		if (i915_seqno_passed(seqno, req->seqno))
 			return 0;
 
 		if (signal_pending_state(state, current))
@@ -1215,7 +1218,10 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 		cpu_relax_lowlatency();
 	}
 
-	if (i915_gem_request_completed(req, false))
+	if (req->engine->irq_seqno_barrier)
+		req->engine->irq_seqno_barrier(req->engine);
+	seqno = req->engine->get_seqno(req->engine);
+	if (i915_seqno_passed(seqno, req->seqno))
 		return 0;
 
 	return -EAGAIN;
@@ -2721,12 +2727,14 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
-void i915_gem_request_free(struct kref *req_ref)
+static void i915_gem_request_free(struct fence *req_fence)
 {
-	struct drm_i915_gem_request *req = container_of(req_ref,
-						 typeof(*req), ref);
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
 	struct intel_context *ctx = req->ctx;
 
+	WARN_ON(!mutex_is_locked(&req->engine->dev->struct_mutex));
+
 	if (req->file_priv)
 		i915_gem_request_remove_from_client(req);
 
@@ -2740,6 +2748,48 @@ void i915_gem_request_free(struct kref *req_ref)
 	kmem_cache_free(req->i915->requests, req);
 }
 
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+{
+	/* Interrupt driven fences are not implemented yet.*/
+	WARN(true, "This should not be called!");
+	return true;
+}
+
+static bool i915_gem_request_is_completed(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	u32 seqno;
+
+/*	if (!lazy_coherency && req->engine->irq_seqno_barrier)
+		req->engine->irq_seqno_barrier(req->engine);*/
+
+	seqno = req->engine->get_seqno(req->engine);
+
+	return i915_seqno_passed(seqno, req->seqno);
+}
+
+static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
+{
+	return "i915";
+}
+
+static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	return req->engine->name;
+}
+
+static const struct fence_ops i915_gem_request_fops = {
+	.enable_signaling	= i915_gem_request_enable_signaling,
+	.signaled		= i915_gem_request_is_completed,
+	.wait			= fence_default_wait,
+	.release		= i915_gem_request_free,
+	.get_driver_name	= i915_gem_request_get_driver_name,
+	.get_timeline_name	= i915_gem_request_get_timeline_name,
+};
+
 static inline int
 __i915_gem_request_alloc(struct intel_engine_cs *engine,
 			 struct intel_context *ctx,
@@ -2762,7 +2812,6 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
 	if (ret)
 		goto err;
 
-	kref_init(&req->ref);
 	req->i915 = dev_priv;
 	req->engine = engine;
 	req->ctx  = ctx;
@@ -2777,6 +2826,9 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
 		goto err;
 	}
 
+	fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
+		   engine->fence_context, req->seqno);
+
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
 	 * eventually emit this request. This is to guarantee that the
@@ -4887,7 +4939,7 @@ i915_gem_init_hw(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *engine;
-	int ret, j;
+	int ret, j, fence_base;
 
 	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
 		return -EIO;
@@ -4957,10 +5009,14 @@ i915_gem_init_hw(struct drm_device *dev)
 	if (ret)
 		goto out;
 
+	fence_base = fence_context_alloc(I915_NUM_ENGINES);
+
 	/* Now it is safe to go back round and do everything else: */
 	for_each_engine(engine, dev_priv) {
 		struct drm_i915_gem_request *req;
 
+		engine->fence_context = fence_base + engine->id;
+
 		req = i915_gem_request_alloc(engine, NULL);
 		if (IS_ERR(req)) {
 			ret = PTR_ERR(req);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5e08ea5..268e024 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2133,6 +2133,7 @@ logical_ring_init(struct drm_device *dev, struct intel_engine_cs *engine)
 	engine->dev = dev;
 	INIT_LIST_HEAD(&engine->active_list);
 	INIT_LIST_HEAD(&engine->request_list);
+	spin_lock_init(&engine->fence_lock);
 	i915_gem_batch_pool_init(dev, &engine->batch_pool);
 	init_waitqueue_head(&engine->irq_queue);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 19ebe77..a52b81c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2229,6 +2229,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&engine->request_list);
 	INIT_LIST_HEAD(&engine->execlist_queue);
 	INIT_LIST_HEAD(&engine->buffers);
+	spin_lock_init(&engine->fence_lock);
 	i915_gem_batch_pool_init(dev, &engine->batch_pool);
 	memset(engine->semaphore.sync_seqno, 0,
 	       sizeof(engine->semaphore.sync_seqno));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2ade194..2cd9d2c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -348,6 +348,9 @@ struct  intel_engine_cs {
 	 * to encode the command length in the header).
 	 */
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
+
+	unsigned fence_context;
+	spinlock_t fence_lock;
 };
 
 static inline bool
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 2/8] drm/i915: Removed now redudant parameter to i915_gem_request_completed()
  2016-04-20 17:09 [PATCH v7 0/8] Convert requests to use struct fence John.C.Harrison
  2016-04-20 17:09 ` [PATCH v7 1/8] drm/i915: " John.C.Harrison
@ 2016-04-20 17:09 ` John.C.Harrison
  2016-04-21  7:19   ` Maarten Lankhorst
  2016-04-20 17:09 ` [PATCH v7 3/8] drm/i915: Add per context timelines to fence object John.C.Harrison
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: John.C.Harrison @ 2016-04-20 17:09 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The change to the implementation of i915_gem_request_completed() means
that the lazy coherency flag is no longer used. This can now be
removed to simplify the interface.

v6: Updated to newer nightly and resolved conflicts.

v7: Updated to newer nightly (lots of ring -> engine renaming).

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  2 +-
 drivers/gpu/drm/i915/i915_drv.h      |  3 +--
 drivers/gpu/drm/i915/i915_gem.c      | 14 +++++++-------
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c      |  4 ++--
 5 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6917515..29ff1b3 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -599,7 +599,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
 					   engine->get_seqno(engine),
-					   i915_gem_request_completed(work->flip_queued_req, true));
+					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
 			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e5f49f3..bb18b89 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2340,8 +2340,7 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 		       struct intel_context *ctx);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return fence_is_signaled(&req->fence);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1add29a9..4209e3b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1266,7 +1266,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (list_empty(&req->list))
 		return 0;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return 0;
 
 	timeout_expire = 0;
@@ -1316,7 +1316,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
-		if (i915_gem_request_completed(req, false)) {
+		if (i915_gem_request_completed(req)) {
 			ret = 0;
 			break;
 		}
@@ -2896,7 +2896,7 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
 	struct drm_i915_gem_request *request;
 
 	list_for_each_entry(request, &engine->request_list, list) {
-		if (i915_gem_request_completed(request, false))
+		if (i915_gem_request_completed(request))
 			continue;
 
 		return request;
@@ -3033,7 +3033,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 					   struct drm_i915_gem_request,
 					   list);
 
-		if (!i915_gem_request_completed(request, true))
+		if (!i915_gem_request_completed(request))
 			break;
 
 		i915_gem_request_retire(request);
@@ -3057,7 +3057,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 	}
 
 	if (unlikely(engine->trace_irq_req &&
-		     i915_gem_request_completed(engine->trace_irq_req, true))) {
+		     i915_gem_request_completed(engine->trace_irq_req))) {
 		engine->irq_put(engine);
 		i915_gem_request_assign(&engine->trace_irq_req, NULL);
 	}
@@ -3160,7 +3160,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (list_empty(&req->list))
 			goto retire;
 
-		if (i915_gem_request_completed(req, true)) {
+		if (i915_gem_request_completed(req)) {
 			__i915_gem_request_retire__upto(req);
 retire:
 			i915_gem_object_retire__read(obj, i);
@@ -3272,7 +3272,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (to == from)
 		return 0;
 
-	if (i915_gem_request_completed(from_req, true))
+	if (i915_gem_request_completed(from_req))
 		return 0;
 
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index e5bf0ab..e14a8bf 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11433,7 +11433,7 @@ static bool __intel_pageflip_stall_check(struct drm_device *dev,
 
 	if (work->flip_ready_vblank == 0) {
 		if (work->flip_queued_req &&
-		    !i915_gem_request_completed(work->flip_queued_req, true))
+		    !i915_gem_request_completed(work->flip_queued_req))
 			return false;
 
 		work->flip_ready_vblank = drm_crtc_vblank_count(crtc);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index b8e395a..781d601 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7359,7 +7359,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct request_boost *boost = container_of(work, struct request_boost, work);
 	struct drm_i915_gem_request *req = boost->req;
 
-	if (!i915_gem_request_completed(req, true))
+	if (!i915_gem_request_completed(req))
 		gen6_rps_boost(to_i915(req->engine->dev), NULL,
 			       req->emitted_jiffies);
 
@@ -7375,7 +7375,7 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
 	if (req == NULL || INTEL_INFO(dev)->gen < 6)
 		return;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return;
 
 	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 3/8] drm/i915: Add per context timelines to fence object
  2016-04-20 17:09 [PATCH v7 0/8] Convert requests to use struct fence John.C.Harrison
  2016-04-20 17:09 ` [PATCH v7 1/8] drm/i915: " John.C.Harrison
  2016-04-20 17:09 ` [PATCH v7 2/8] drm/i915: Removed now redudant parameter to i915_gem_request_completed() John.C.Harrison
@ 2016-04-20 17:09 ` John.C.Harrison
  2016-04-20 17:44   ` Chris Wilson
  2016-04-20 17:09 ` [PATCH v7 4/8] drm/i915: Fix clean up of file client list on execbuff failure John.C.Harrison
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: John.C.Harrison @ 2016-04-20 17:09 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The fence object used inside the request structure requires a sequence
number. Although this is not used by the i915 driver itself, it could
potentially be used by non-i915 code if the fence is passed outside of
the driver. This is the intention as it allows external kernel drivers
and user applications to wait on batch buffer completion
asynchronously via the dma-buff fence API.

To ensure that such external users are not confused by strange things
happening with the seqno, this patch adds in a per context timeline
that can provide a guaranteed in-order seqno value for the fence. This
is safe because the scheduler will not re-order batch buffers within a
context - they are considered to be mutually dependent.

v2: New patch in series.

v3: Renamed/retyped timeline structure fields after review comments by
Tvrtko Ursulin.

Added context information to the timeline's name string for better
identification in debugfs output.

v5: Line wrapping and other white space fixes to keep style checker
happy.

v7: Updated to newer nightly (lots of ring -> engine renaming).

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 25 +++++++---
 drivers/gpu/drm/i915/i915_gem.c         | 83 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++
 drivers/gpu/drm/i915/intel_lrc.c        |  8 ++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 5 files changed, 114 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bb18b89..1c3a1ca 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -813,6 +813,15 @@ struct i915_ctx_hang_stats {
 	bool banned;
 };
 
+struct i915_fence_timeline {
+	char        name[32];
+	unsigned    fence_context;
+	unsigned    next;
+
+	struct intel_context *ctx;
+	struct intel_engine_cs *engine;
+};
+
 /* This must match up with the value previously used for execbuf2.rsvd1. */
 #define DEFAULT_CONTEXT_HANDLE 0
 
@@ -860,6 +869,7 @@ struct intel_context {
 		struct i915_vma *lrc_vma;
 		u64 lrc_desc;
 		uint32_t *lrc_reg_state;
+		struct i915_fence_timeline fence_timeline;
 	} engine[I915_NUM_ENGINES];
 
 	struct list_head link;
@@ -2245,13 +2255,10 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 struct drm_i915_gem_request {
 	/**
 	 * Underlying object for implementing the signal/wait stuff.
-	 * NB: Never call fence_later() or return this fence object to user
-	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
-	 * etc., there is no guarantee at all about the validity or
-	 * sequentiality of the fence's seqno! It is also unsafe to let
-	 * anything outside of the i915 driver get hold of the fence object
-	 * as the clean up when decrementing the reference count requires
-	 * holding the driver mutex lock.
+	 * NB: Never return this fence object to user land! It is unsafe to
+	 * let anything outside of the i915 driver get hold of the fence
+	 * object as the clean up when decrementing the reference count
+	 * requires holding the driver mutex lock.
 	 */
 	struct fence fence;
 
@@ -2340,6 +2347,10 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 		       struct intel_context *ctx);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
+int i915_create_fence_timeline(struct drm_device *dev,
+			       struct intel_context *ctx,
+			       struct intel_engine_cs *ring);
+
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return fence_is_signaled(&req->fence);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4209e3b..99e9ddb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2776,9 +2776,35 @@ static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
 
 static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
 {
-	struct drm_i915_gem_request *req = container_of(req_fence,
-						 typeof(*req), fence);
-	return req->engine->name;
+	struct drm_i915_gem_request *req;
+	struct i915_fence_timeline *timeline;
+
+	req = container_of(req_fence, typeof(*req), fence);
+	timeline = &req->ctx->engine[req->engine->id].fence_timeline;
+
+	return timeline->name;
+}
+
+static void i915_gem_request_timeline_value_str(struct fence *req_fence,
+						char *str, int size)
+{
+	struct drm_i915_gem_request *req;
+
+	req = container_of(req_fence, typeof(*req), fence);
+
+	/* Last signalled timeline value ??? */
+	snprintf(str, size, "? [%d]"/*, timeline->value*/,
+		 req->engine->get_seqno(req->engine));
+}
+
+static void i915_gem_request_fence_value_str(struct fence *req_fence,
+					     char *str, int size)
+{
+	struct drm_i915_gem_request *req;
+
+	req = container_of(req_fence, typeof(*req), fence);
+
+	snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
 }
 
 static const struct fence_ops i915_gem_request_fops = {
@@ -2788,8 +2814,50 @@ static const struct fence_ops i915_gem_request_fops = {
 	.release		= i915_gem_request_free,
 	.get_driver_name	= i915_gem_request_get_driver_name,
 	.get_timeline_name	= i915_gem_request_get_timeline_name,
+	.fence_value_str	= i915_gem_request_fence_value_str,
+	.timeline_value_str	= i915_gem_request_timeline_value_str,
 };
 
+int i915_create_fence_timeline(struct drm_device *dev,
+			       struct intel_context *ctx,
+			       struct intel_engine_cs *engine)
+{
+	struct i915_fence_timeline *timeline;
+
+	timeline = &ctx->engine[engine->id].fence_timeline;
+
+	if (timeline->engine)
+		return 0;
+
+	timeline->fence_context = fence_context_alloc(1);
+
+	/*
+	 * Start the timeline from seqno 0 as this is a special value
+	 * that is reserved for invalid sync points.
+	 */
+	timeline->next       = 1;
+	timeline->ctx        = ctx;
+	timeline->engine     = engine;
+
+	snprintf(timeline->name, sizeof(timeline->name), "%d>%s:%d",
+		 timeline->fence_context, engine->name, ctx->user_handle);
+
+	return 0;
+}
+
+static unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
+{
+	unsigned seqno;
+
+	seqno = timeline->next;
+
+	/* Reserve zero for invalid */
+	if (++timeline->next == 0)
+		timeline->next = 1;
+
+	return seqno;
+}
+
 static inline int
 __i915_gem_request_alloc(struct intel_engine_cs *engine,
 			 struct intel_context *ctx,
@@ -2827,7 +2895,8 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
 	}
 
 	fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
-		   engine->fence_context, req->seqno);
+		   ctx->engine[engine->id].fence_timeline.fence_context,
+		   i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
 
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
@@ -4939,7 +5008,7 @@ i915_gem_init_hw(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *engine;
-	int ret, j, fence_base;
+	int ret, j;
 
 	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
 		return -EIO;
@@ -5009,14 +5078,10 @@ i915_gem_init_hw(struct drm_device *dev)
 	if (ret)
 		goto out;
 
-	fence_base = fence_context_alloc(I915_NUM_ENGINES);
-
 	/* Now it is safe to go back round and do everything else: */
 	for_each_engine(engine, dev_priv) {
 		struct drm_i915_gem_request *req;
 
-		engine->fence_context = fence_base + engine->id;
-
 		req = i915_gem_request_alloc(engine, NULL);
 		if (IS_ERR(req)) {
 			ret = PTR_ERR(req);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 282b17c..6b30de9 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -279,6 +279,20 @@ i915_gem_create_context(struct drm_device *dev,
 	if (IS_ERR(ctx))
 		return ctx;
 
+	if (!i915.enable_execlists) {
+		struct intel_engine_cs *engine;
+
+		/* Create a per context timeline for fences */
+		for_each_engine(engine, to_i915(dev)) {
+			ret = i915_create_fence_timeline(dev, ctx, engine);
+			if (ret) {
+				DRM_ERROR("Fence timeline creation failed for legacy %s: %p\n",
+					  engine->name, ctx);
+				goto err_destroy;
+			}
+		}
+	}
+
 	if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state) {
 		/* We may need to do things with the shrinker which
 		 * require us to immediately switch back to the default
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 268e024..db74cc14 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2694,6 +2694,14 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 		goto error_ringbuf;
 	}
 
+	/* Create a per context timeline for fences */
+	ret = i915_create_fence_timeline(dev, ctx, engine);
+	if (ret) {
+		DRM_ERROR("Fence timeline creation failed for engine %s, ctx %p\n",
+			  engine->name, ctx);
+		goto error_ringbuf;
+	}
+
 	ctx->engine[engine->id].ringbuf = ringbuf;
 	ctx->engine[engine->id].state = ctx_obj;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2cd9d2c..23e1d94 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -349,7 +349,6 @@ struct  intel_engine_cs {
 	 */
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 
-	unsigned fence_context;
 	spinlock_t fence_lock;
 };
 
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 4/8] drm/i915: Fix clean up of file client list on execbuff failure
  2016-04-20 17:09 [PATCH v7 0/8] Convert requests to use struct fence John.C.Harrison
                   ` (2 preceding siblings ...)
  2016-04-20 17:09 ` [PATCH v7 3/8] drm/i915: Add per context timelines to fence object John.C.Harrison
@ 2016-04-20 17:09 ` John.C.Harrison
  2016-04-21  7:20   ` Maarten Lankhorst
  2016-04-20 17:09 ` [PATCH v7 5/8] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: John.C.Harrison @ 2016-04-20 17:09 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

If an execbuff IOCTL call fails for some reason, it would leave the
request in the client list. The request clean up code would remove
this but only later on and only after the reference count has dropped
to zero. The entire sequence is contained within the driver mutex
lock. However, there is still a hole such that any code which does not
require the mutex lock could still find the request on the client list
and start using it. That would lead to broken reference counts, use of
dangling pointers and all sorts of other nastiness.

The throttle IOCTL in particular does not acquire the mutex and does
process the client list. And the likely situation of the execbuff
IOCTL failing is when the system is busy with lots of work
outstanding. That is exactly the situation where the throttle IOCTL
would try to wait on a request.

Currently, this hole is tiny - the gap between the reference count
dropping to zero and the free function being called in response.
However the next patch in this series enlarges that gap considerably
by deferring the free function (to remove the need for the mutex lock
when unreferencing requests).

v7: New patch in series.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            | 1 +
 drivers/gpu/drm/i915/i915_gem.c            | 5 ++---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 6 +++++-
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1c3a1ca..707bf0f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2358,6 +2358,7 @@ static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 				   struct drm_file *file);
+void i915_gem_request_remove_from_client(struct drm_i915_gem_request *request);
 
 static inline uint32_t
 i915_gem_request_get_seqno(struct drm_i915_gem_request *req)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 99e9ddb..74db6a3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1399,8 +1399,7 @@ int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 	return 0;
 }
 
-static inline void
-i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
+void i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 {
 	struct drm_i915_file_private *file_priv = request->file_priv;
 
@@ -2735,7 +2734,7 @@ static void i915_gem_request_free(struct fence *req_fence)
 
 	WARN_ON(!mutex_is_locked(&req->engine->dev->struct_mutex));
 
-	if (req->file_priv)
+	if (WARN_ON(req->file_priv))
 		i915_gem_request_remove_from_client(req);
 
 	if (ctx) {
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index bb95d27..40d333c 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1659,8 +1659,12 @@ err:
 	 * must be freed again. If it was submitted then it is being tracked
 	 * on the active request list and no clean up is required here.
 	 */
-	if (ret && !IS_ERR_OR_NULL(req))
+	if (ret && !IS_ERR_OR_NULL(req)) {
+		if (req->file_priv)
+			i915_gem_request_remove_from_client(req);
+
 		i915_gem_request_cancel(req);
+	}
 
 	mutex_unlock(&dev->struct_mutex);
 
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 5/8] drm/i915: Delay the freeing of requests until retire time
  2016-04-20 17:09 [PATCH v7 0/8] Convert requests to use struct fence John.C.Harrison
                   ` (3 preceding siblings ...)
  2016-04-20 17:09 ` [PATCH v7 4/8] drm/i915: Fix clean up of file client list on execbuff failure John.C.Harrison
@ 2016-04-20 17:09 ` John.C.Harrison
  2016-04-21  7:41   ` Maarten Lankhorst
  2016-04-20 17:09 ` [PATCH v7 6/8] drm/i915: Interrupt driven fences John.C.Harrison
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: John.C.Harrison @ 2016-04-20 17:09 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The request structure is reference counted. When the count reached
zero, the request was immediately freed and all associated objects
were unrefereced/unallocated. This meant that the driver mutex lock
must be held at the point where the count reaches zero. This was fine
while all references were held internally to the driver. However, the
plan is to allow the underlying fence object (and hence the request
itself) to be returned to other drivers and to userland. External
users cannot be expected to acquire a driver private mutex lock.

Rather than attempt to disentangle the request structure from the
driver mutex lock, the decsion was to defer the free code until a
later (safer) point. Hence this patch changes the unreference callback
to merely move the request onto a delayed free list. The driver's
retire worker thread will then process the list and actually call the
free function on the requests.

v2: New patch in series.

v3: Updated after review comments by Tvrtko Ursulin. Rename list nodes
to 'link' rather than 'list'. Update list processing to be more
efficient/safer with respect to spinlocks.

v4: Changed to use basic spinlocks rather than IRQ ones - missed
update from earlier feedback by Tvrtko.

v5: Improved a comment to keep the style checker happy.

v7: Updated to newer nightly (lots of ring -> engine renaming). Also
added a list_empty() check before wasting time with spinlocks and list
processing.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 22 +++----------------
 drivers/gpu/drm/i915/i915_gem.c         | 39 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_display.c    |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c        |  2 ++
 drivers/gpu/drm/i915/intel_pm.c         |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  7 ++++++
 7 files changed, 51 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 707bf0f..81324ba 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2253,14 +2253,9 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-	/**
-	 * Underlying object for implementing the signal/wait stuff.
-	 * NB: Never return this fence object to user land! It is unsafe to
-	 * let anything outside of the i915 driver get hold of the fence
-	 * object as the clean up when decrementing the reference count
-	 * requires holding the driver mutex lock.
-	 */
+	/** Underlying object for implementing the signal/wait stuff. */
 	struct fence fence;
+	struct list_head delayed_free_link;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2383,21 +2378,10 @@ i915_gem_request_reference(struct drm_i915_gem_request *req)
 static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
-	WARN_ON(!mutex_is_locked(&req->engine->dev->struct_mutex));
-	fence_put(&req->fence);
-}
-
-static inline void
-i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
-{
-	struct drm_device *dev;
-
 	if (!req)
 		return;
 
-	dev = req->engine->dev;
-	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
-		mutex_unlock(&dev->struct_mutex);
+	fence_put(&req->fence);
 }
 
 static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 74db6a3..9071058 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2726,10 +2726,26 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
-static void i915_gem_request_free(struct fence *req_fence)
+static void i915_gem_request_release(struct fence *req_fence)
 {
 	struct drm_i915_gem_request *req = container_of(req_fence,
 						 typeof(*req), fence);
+	struct intel_engine_cs *engine = req->engine;
+	struct drm_i915_private *dev_priv = to_i915(engine->dev);
+
+	/*
+	 * Need to add the request to a deferred dereference list to be
+	 * processed at a mutex lock safe time.
+	 */
+	spin_lock(&engine->delayed_free_lock);
+	list_add_tail(&req->delayed_free_link, &engine->delayed_free_list);
+	spin_unlock(&engine->delayed_free_lock);
+
+	queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
+}
+
+static void i915_gem_request_free(struct drm_i915_gem_request *req)
+{
 	struct intel_context *ctx = req->ctx;
 
 	WARN_ON(!mutex_is_locked(&req->engine->dev->struct_mutex));
@@ -2810,7 +2826,7 @@ static const struct fence_ops i915_gem_request_fops = {
 	.enable_signaling	= i915_gem_request_enable_signaling,
 	.signaled		= i915_gem_request_is_completed,
 	.wait			= fence_default_wait,
-	.release		= i915_gem_request_free,
+	.release		= i915_gem_request_release,
 	.get_driver_name	= i915_gem_request_get_driver_name,
 	.get_timeline_name	= i915_gem_request_get_timeline_name,
 	.fence_value_str	= i915_gem_request_fence_value_str,
@@ -3087,6 +3103,9 @@ void i915_gem_reset(struct drm_device *dev)
 void
 i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 {
+	struct drm_i915_gem_request *req, *req_next;
+	LIST_HEAD(list_head);
+
 	WARN_ON(i915_verify_lists(engine->dev));
 
 	/* Retire requests first as we use it above for the early return.
@@ -3130,6 +3149,17 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 		i915_gem_request_assign(&engine->trace_irq_req, NULL);
 	}
 
+	/* Really free any requests that were recently unreferenced */
+	if (!list_empty(&engine->delayed_free_list)) {
+		spin_lock(&engine->delayed_free_lock);
+		list_splice_init(&engine->delayed_free_list, &list_head);
+		spin_unlock(&engine->delayed_free_lock);
+		list_for_each_entry_safe(req, req_next, &list_head, delayed_free_link) {
+			list_del(&req->delayed_free_link);
+			i915_gem_request_free(req);
+		}
+	}
+
 	WARN_ON(i915_verify_lists(engine->dev));
 }
 
@@ -3317,7 +3347,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 			ret = __i915_wait_request(req[i], reset_counter, true,
 						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
 						  to_rps_client(file));
-		i915_gem_request_unreference__unlocked(req[i]);
+		i915_gem_request_unreference(req[i]);
 	}
 	return ret;
 
@@ -4339,7 +4369,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
-	i915_gem_request_unreference__unlocked(target);
+	i915_gem_request_unreference(target);
 
 	return ret;
 }
@@ -5204,6 +5234,7 @@ init_engine_lists(struct intel_engine_cs *engine)
 {
 	INIT_LIST_HEAD(&engine->active_list);
 	INIT_LIST_HEAD(&engine->request_list);
+	INIT_LIST_HEAD(&engine->delayed_free_list);
 }
 
 void
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index e14a8bf..e550e5b 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11370,7 +11370,7 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 					    mmio_flip->crtc->reset_counter,
 					    false, NULL,
 					    &mmio_flip->i915->rps.mmioflips));
-		i915_gem_request_unreference__unlocked(mmio_flip->req);
+		i915_gem_request_unreference(mmio_flip->req);
 	}
 
 	/* For framebuffer backed by dmabuf, wait for fence */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index db74cc14..8eb0c5e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2133,7 +2133,9 @@ logical_ring_init(struct drm_device *dev, struct intel_engine_cs *engine)
 	engine->dev = dev;
 	INIT_LIST_HEAD(&engine->active_list);
 	INIT_LIST_HEAD(&engine->request_list);
+	INIT_LIST_HEAD(&engine->delayed_free_list);
 	spin_lock_init(&engine->fence_lock);
+	spin_lock_init(&engine->delayed_free_lock);
 	i915_gem_batch_pool_init(dev, &engine->batch_pool);
 	init_waitqueue_head(&engine->irq_queue);
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 781d601..568c2fa 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7363,7 +7363,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 		gen6_rps_boost(to_i915(req->engine->dev), NULL,
 			       req->emitted_jiffies);
 
-	i915_gem_request_unreference__unlocked(req);
+	i915_gem_request_unreference(req);
 	kfree(boost);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a52b81c..ac922a5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2229,7 +2229,9 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&engine->request_list);
 	INIT_LIST_HEAD(&engine->execlist_queue);
 	INIT_LIST_HEAD(&engine->buffers);
+	INIT_LIST_HEAD(&engine->delayed_free_list);
 	spin_lock_init(&engine->fence_lock);
+	spin_lock_init(&engine->delayed_free_lock);
 	i915_gem_batch_pool_init(dev, &engine->batch_pool);
 	memset(engine->semaphore.sync_seqno, 0,
 	       sizeof(engine->semaphore.sync_seqno));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 23e1d94..fa07ea9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -309,6 +309,13 @@ struct  intel_engine_cs {
 	u32 last_submitted_seqno;
 	unsigned user_interrupts;
 
+	/*
+	 * Deferred free list to allow unreferencing requests from interrupt
+	 * contexts and from outside of the i915 driver.
+	 */
+	struct list_head delayed_free_list;
+	spinlock_t delayed_free_lock;
+
 	bool gpu_caches_dirty;
 
 	wait_queue_head_t irq_queue;
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 6/8] drm/i915: Interrupt driven fences
  2016-04-20 17:09 [PATCH v7 0/8] Convert requests to use struct fence John.C.Harrison
                   ` (4 preceding siblings ...)
  2016-04-20 17:09 ` [PATCH v7 5/8] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
@ 2016-04-20 17:09 ` John.C.Harrison
  2016-04-21  7:38   ` Maarten Lankhorst
  2016-04-20 17:09 ` [PATCH v7 7/8] drm/i915: Updated request structure tracing John.C.Harrison
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: John.C.Harrison @ 2016-04-20 17:09 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The intended usage model for struct fence is that the signalled status
should be set on demand rather than polled. That is, there should not
be a need for a 'signaled' function to be called everytime the status
is queried. Instead, 'something' should be done to enable a signal
callback from the hardware which will update the state directly. In
the case of requests, this is the seqno update interrupt. The idea is
that this callback will only be enabled on demand when something
actually tries to wait on the fence.

This change removes the polling test and replaces it with the callback
scheme. Each fence is added to a 'please poke me' list at the start of
i915_add_request(). The interrupt handler then scans through the 'poke
me' list when a new seqno pops out and signals any matching
fence/request. The fence is then removed from the list so the entire
request stack does not need to be scanned every time. Note that the
fence is added to the list before the commands to generate the seqno
interrupt are added to the ring. Thus the sequence is guaranteed to be
race free if the interrupt is already enabled.

Note that the interrupt is only enabled on demand (i.e. when
__wait_request() is called). Thus there is still a potential race when
enabling the interrupt as the request may already have completed.
However, this is simply solved by calling the interrupt processing
code immediately after enabling the interrupt and thereby checking for
already completed requests.

Lastly, the ring clean up code has the possibility to cancel
outstanding requests (e.g. because TDR has reset the ring). These
requests will never get signalled and so must be removed from the
signal list manually. This is done by setting a 'cancelled' flag and
then calling the regular notify/retire code path rather than
attempting to duplicate the list manipulatation and clean up code in
multiple places. This also avoid any race condition where the
cancellation request might occur after/during the completion interrupt
actually arriving.

v2: Updated to take advantage of the request unreference no longer
requiring the mutex lock.

v3: Move the signal list processing around to prevent unsubmitted
requests being added to the list. This was occurring on Android
because the native sync implementation calls the
fence->enable_signalling API immediately on fence creation.

Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
'link' instead of 'list'. Added support for returning an error code on
a cancelled fence. Update list processing to be more efficient/safer
with respect to spinlocks.

v5: Made i915_gem_request_submit a static as it is only ever called
from one place.

Fixed up the low latency wait optimisation. The time delay between the
seqno value being to memory and the drive's ISR running can be
significant, at least for the wait request micro-benchmark. This can
be greatly improved by explicitly checking for seqno updates in the
pre-wait busy poll loop. Also added some documentation comments to the
busy poll code.

Fixed up support for the faking of lost interrupts
(test_irq_rings/missed_irq_rings). That is, there is an IGT test that
tells the driver to loose interrupts deliberately and then check that
everything still works as expected (albeit much slower).

Updates from review comments: use non IRQ-save spinlocking, early exit
on WARN and improved comments (Tvrtko Ursulin).

v6: Updated to newer nigthly and resolved conflicts around the
wait_request busy spin optimisation. Also fixed a race condition
between this early exit path and the regular completion path.

v7: Updated to newer nightly - lots of ring -> engine renaming plus an
interface change on get_seqno(). Also added a list_empty() check
before acquring spinlocks and doing list processing.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |   8 +
 drivers/gpu/drm/i915/i915_gem.c         | 249 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_irq.c         |   2 +
 drivers/gpu/drm/i915/intel_lrc.c        |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
 6 files changed, 241 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 81324ba..9519b11 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2255,7 +2255,12 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 struct drm_i915_gem_request {
 	/** Underlying object for implementing the signal/wait stuff. */
 	struct fence fence;
+	struct list_head signal_link;
+	struct list_head unsignal_link;
 	struct list_head delayed_free_link;
+	bool cancelled;
+	bool irq_enabled;
+	bool signal_requested;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2341,6 +2346,9 @@ struct drm_i915_gem_request * __must_check
 i915_gem_request_alloc(struct intel_engine_cs *engine,
 		       struct intel_context *ctx);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
+				       bool fence_locked);
+void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
 
 int i915_create_fence_timeline(struct drm_device *dev,
 			       struct intel_context *ctx,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9071058..9b1de5f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -40,6 +40,8 @@
 
 #define RQ_BUG_ON(expr)
 
+static void i915_gem_request_submit(struct drm_i915_gem_request *req);
+
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
 static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
 static void
@@ -1253,9 +1255,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
 	struct drm_device *dev = engine->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	const bool irq_test_in_progress =
-		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
 	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+	uint32_t seqno;
 	DEFINE_WAIT(wait);
 	unsigned long timeout_expire;
 	s64 before = 0; /* Only to silence a compiler warning. */
@@ -1263,9 +1264,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
 
-	if (list_empty(&req->list))
-		return 0;
-
 	if (i915_gem_request_completed(req))
 		return 0;
 
@@ -1291,15 +1289,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	trace_i915_gem_request_wait_begin(req);
 
 	/* Optimistic spin for the next jiffie before touching IRQs */
-	ret = __i915_spin_request(req, state);
-	if (ret == 0)
-		goto out;
-
-	if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
-		ret = -ENODEV;
-		goto out;
+	if (req->seqno) {
+		ret = __i915_spin_request(req, state);
+		if (ret == 0)
+			goto out;
 	}
 
+	/*
+	 * Enable interrupt completion of the request.
+	 */
+	fence_enable_sw_signaling(&req->fence);
+
 	for (;;) {
 		struct timer_list timer;
 
@@ -1321,6 +1321,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
+		if (req->seqno) {
+			/*
+			 * There is quite a lot of latency in the user interrupt
+			 * path. So do an explicit seqno check and potentially
+			 * remove all that delay.
+			 */
+			if (req->engine->irq_seqno_barrier)
+				req->engine->irq_seqno_barrier(req->engine);
+			seqno = engine->get_seqno(engine);
+			if (i915_seqno_passed(seqno, req->seqno)) {
+				ret = 0;
+				break;
+			}
+		}
+
 		if (signal_pending_state(state, current)) {
 			ret = -ERESTARTSYS;
 			break;
@@ -1347,14 +1362,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			destroy_timer_on_stack(&timer);
 		}
 	}
-	if (!irq_test_in_progress)
-		engine->irq_put(engine);
 
 	finish_wait(&engine->irq_queue, &wait);
 
 out:
 	trace_i915_gem_request_wait_end(req);
 
+	if ((ret == 0) && (req->seqno)) {
+		if (req->engine->irq_seqno_barrier)
+			req->engine->irq_seqno_barrier(req->engine);
+		seqno = engine->get_seqno(engine);
+		if (i915_seqno_passed(seqno, req->seqno) &&
+		    !i915_gem_request_completed(req)) {
+			/*
+			 * Make sure the request is marked as completed before
+			 * returning. NB: Need to acquire the spinlock around
+			 * the whole call to avoid a race condition with the
+			 * interrupt handler is running concurrently and could
+			 * cause this invocation to early exit even though the
+			 * request has not actually been fully processed yet.
+			 */
+			spin_lock_irq(&req->engine->fence_lock);
+			i915_gem_request_notify(req->engine, true);
+			spin_unlock_irq(&req->engine->fence_lock);
+		}
+	}
+
 	if (timeout) {
 		s64 tres = *timeout - (ktime_get_raw_ns() - before);
 
@@ -1432,6 +1465,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	list_del_init(&request->list);
 	i915_gem_request_remove_from_client(request);
 
+	/*
+	 * In case the request is still in the signal pending list,
+	 * e.g. due to being cancelled by TDR, preemption, etc.
+	 */
+	if (!list_empty(&request->signal_link)) {
+		/*
+		 * The request must be marked as cancelled and the underlying
+		 * fence as failed. NB: There is no explicit fence fail API,
+		 * there is only a manual poke and signal.
+		 */
+		request->cancelled = true;
+		/* How to propagate to any associated sync_fence??? */
+		request->fence.status = -EIO;
+		fence_signal_locked(&request->fence);
+	}
+
 	i915_gem_request_unreference(request);
 }
 
@@ -2660,6 +2709,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	 */
 	request->postfix = intel_ring_get_tail(ringbuf);
 
+	/*
+	 * Add the fence to the pending list before emitting the commands to
+	 * generate a seqno notification interrupt.
+	 */
+	i915_gem_request_submit(request);
+
 	if (i915.enable_execlists)
 		ret = engine->emit_request(request);
 	else {
@@ -2760,28 +2815,142 @@ static void i915_gem_request_free(struct drm_i915_gem_request *req)
 		i915_gem_context_unreference(ctx);
 	}
 
+	if (req->irq_enabled)
+		req->engine->irq_put(req->engine);
+
 	kmem_cache_free(req->i915->requests, req);
 }
 
-static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+/*
+ * The request is about to be submitted to the hardware so add the fence to
+ * the list of signalable fences.
+ *
+ * NB: This does not necessarily enable interrupts yet. That only occurs on
+ * demand when the request is actually waited on. However, adding it to the
+ * list early ensures that there is no race condition where the interrupt
+ * could pop out prematurely and thus be completely lost. The race is merely
+ * that the interrupt must be manually checked for after being enabled.
+ */
+static void i915_gem_request_submit(struct drm_i915_gem_request *req)
 {
-	/* Interrupt driven fences are not implemented yet.*/
-	WARN(true, "This should not be called!");
-	return true;
+	/*
+	 * Always enable signal processing for the request's fence object
+	 * before that request is submitted to the hardware. Thus there is no
+	 * race condition whereby the interrupt could pop out before the
+	 * request has been added to the signal list. Hence no need to check
+	 * for completion, undo the list add and return false.
+	 */
+	i915_gem_request_reference(req);
+	spin_lock_irq(&req->engine->fence_lock);
+	WARN_ON(!list_empty(&req->signal_link));
+	list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
+	spin_unlock_irq(&req->engine->fence_lock);
+
+	/*
+	 * NB: Interrupts are only enabled on demand. Thus there is still a
+	 * race where the request could complete before the interrupt has
+	 * been enabled. Thus care must be taken at that point.
+	 */
+
+	/* Have interrupts already been requested? */
+	if (req->signal_requested)
+		i915_gem_request_enable_interrupt(req, false);
+}
+
+/*
+ * The request is being actively waited on, so enable interrupt based
+ * completion signalling.
+ */
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
+				       bool fence_locked)
+{
+	struct drm_i915_private *dev_priv = to_i915(req->engine->dev);
+	const bool irq_test_in_progress =
+		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
+						intel_engine_flag(req->engine);
+
+	if (req->irq_enabled)
+		return;
+
+	if (irq_test_in_progress)
+		return;
+
+	if (!WARN_ON(!req->engine->irq_get(req->engine)))
+		req->irq_enabled = true;
+
+	/*
+	 * Because the interrupt is only enabled on demand, there is a race
+	 * where the interrupt can fire before anyone is looking for it. So
+	 * do an explicit check for missed interrupts.
+	 */
+	i915_gem_request_notify(req->engine, fence_locked);
 }
 
-static bool i915_gem_request_is_completed(struct fence *req_fence)
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
 {
 	struct drm_i915_gem_request *req = container_of(req_fence,
 						 typeof(*req), fence);
+
+	/*
+	 * No need to actually enable interrupt based processing until the
+	 * request has been submitted to the hardware. At which point
+	 * 'i915_gem_request_submit()' is called. So only really enable
+	 * signalling in there. Just set a flag to say that interrupts are
+	 * wanted when the request is eventually submitted. On the other hand
+	 * if the request has already been submitted then interrupts do need
+	 * to be enabled now.
+	 */
+
+	req->signal_requested = true;
+
+	if (!list_empty(&req->signal_link))
+		i915_gem_request_enable_interrupt(req, true);
+
+	return true;
+}
+
+void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
+{
+	struct drm_i915_gem_request *req, *req_next;
+	unsigned long flags;
 	u32 seqno;
 
-/*	if (!lazy_coherency && req->engine->irq_seqno_barrier)
-		req->engine->irq_seqno_barrier(req->engine);*/
+	if (list_empty(&engine->fence_signal_list))
+		return;
 
-	seqno = req->engine->get_seqno(req->engine);
+	if (!fence_locked)
+		spin_lock_irqsave(&engine->fence_lock, flags);
+
+	if (engine->irq_seqno_barrier)
+		engine->irq_seqno_barrier(engine);
+	seqno = engine->get_seqno(engine);
+
+	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
+		if (!req->cancelled) {
+			if (!i915_seqno_passed(seqno, req->seqno))
+				break;
+		}
+
+		/*
+		 * Start by removing the fence from the signal list otherwise
+		 * the retire code can run concurrently and get confused.
+		 */
+		list_del_init(&req->signal_link);
+
+		if (!req->cancelled)
+			fence_signal_locked(&req->fence);
+
+		if (req->irq_enabled) {
+			req->engine->irq_put(req->engine);
+			req->irq_enabled = false;
+		}
 
-	return i915_seqno_passed(seqno, req->seqno);
+		/* Can't unreference here because that might grab fence_lock */
+		list_add_tail(&req->unsignal_link, &engine->fence_unsignal_list);
+	}
+
+	if (!fence_locked)
+		spin_unlock_irqrestore(&engine->fence_lock, flags);
 }
 
 static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
@@ -2824,7 +2993,6 @@ static void i915_gem_request_fence_value_str(struct fence *req_fence,
 
 static const struct fence_ops i915_gem_request_fops = {
 	.enable_signaling	= i915_gem_request_enable_signaling,
-	.signaled		= i915_gem_request_is_completed,
 	.wait			= fence_default_wait,
 	.release		= i915_gem_request_release,
 	.get_driver_name	= i915_gem_request_get_driver_name,
@@ -2909,6 +3077,7 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
 		goto err;
 	}
 
+	INIT_LIST_HEAD(&req->signal_link);
 	fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
 		   ctx->engine[engine->id].fence_timeline.fence_context,
 		   i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
@@ -2971,6 +3140,11 @@ void i915_gem_request_cancel(struct drm_i915_gem_request *req)
 {
 	intel_ring_reserved_space_cancel(req->ringbuf);
 
+	req->cancelled = true;
+	/* How to propagate to any associated sync_fence??? */
+	req->fence.status = -EINVAL;
+	fence_signal_locked(&req->fence);
+
 	i915_gem_request_unreference(req);
 }
 
@@ -3059,6 +3233,13 @@ static void i915_gem_reset_engine_cleanup(struct drm_i915_private *dev_priv,
 		i915_gem_request_retire(request);
 	}
 
+	/*
+	 * Tidy up anything left over. This includes a call to
+	 * i915_gem_request_notify() which will make sure that any requests
+	 * that were on the signal pending list get also cleaned up.
+	 */
+	i915_gem_retire_requests_ring(engine);
+
 	/* Having flushed all requests from all queues, we know that all
 	 * ringbuffers must now be empty. However, since we do not reclaim
 	 * all space when retiring the request (to prevent HEADs colliding
@@ -3108,6 +3289,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 
 	WARN_ON(i915_verify_lists(engine->dev));
 
+	/*
+	 * If no-one has waited on a request recently then interrupts will
+	 * not have been enabled and thus no requests will ever be marked as
+	 * completed. So do an interrupt check now.
+	 */
+	i915_gem_request_notify(engine, false);
+
 	/* Retire requests first as we use it above for the early return.
 	 * If we retire requests last, we may use a later seqno and so clear
 	 * the requests lists without clearing the active list, leading to
@@ -3149,6 +3337,17 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 		i915_gem_request_assign(&engine->trace_irq_req, NULL);
 	}
 
+	/* Tidy up any requests that were recently signalled */
+	if (!list_empty(&engine->fence_unsignal_list)) {
+		spin_lock_irq(&engine->fence_lock);
+		list_splice_init(&engine->fence_unsignal_list, &list_head);
+		spin_unlock_irq(&engine->fence_lock);
+		list_for_each_entry_safe(req, req_next, &list_head, unsignal_link) {
+			list_del(&req->unsignal_link);
+			i915_gem_request_unreference(req);
+		}
+	}
+
 	/* Really free any requests that were recently unreferenced */
 	if (!list_empty(&engine->delayed_free_list)) {
 		spin_lock(&engine->delayed_free_lock);
@@ -5234,6 +5433,8 @@ init_engine_lists(struct intel_engine_cs *engine)
 {
 	INIT_LIST_HEAD(&engine->active_list);
 	INIT_LIST_HEAD(&engine->request_list);
+	INIT_LIST_HEAD(&engine->fence_signal_list);
+	INIT_LIST_HEAD(&engine->fence_unsignal_list);
 	INIT_LIST_HEAD(&engine->delayed_free_list);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 1925a5f..9640fa8 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1002,6 +1002,8 @@ static void notify_ring(struct intel_engine_cs *engine)
 	trace_i915_gem_request_notify(engine);
 	engine->user_interrupts++;
 
+	i915_gem_request_notify(engine, false);
+
 	wake_up_all(&engine->irq_queue);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8eb0c5e..a351a0e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2133,6 +2133,8 @@ logical_ring_init(struct drm_device *dev, struct intel_engine_cs *engine)
 	engine->dev = dev;
 	INIT_LIST_HEAD(&engine->active_list);
 	INIT_LIST_HEAD(&engine->request_list);
+	INIT_LIST_HEAD(&engine->fence_signal_list);
+	INIT_LIST_HEAD(&engine->fence_unsignal_list);
 	INIT_LIST_HEAD(&engine->delayed_free_list);
 	spin_lock_init(&engine->fence_lock);
 	spin_lock_init(&engine->delayed_free_lock);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ac922a5..5f209ba 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2229,6 +2229,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&engine->request_list);
 	INIT_LIST_HEAD(&engine->execlist_queue);
 	INIT_LIST_HEAD(&engine->buffers);
+	INIT_LIST_HEAD(&engine->fence_signal_list);
+	INIT_LIST_HEAD(&engine->fence_unsignal_list);
 	INIT_LIST_HEAD(&engine->delayed_free_list);
 	spin_lock_init(&engine->fence_lock);
 	spin_lock_init(&engine->delayed_free_lock);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index fa07ea9..160e71e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -357,6 +357,8 @@ struct  intel_engine_cs {
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 
 	spinlock_t fence_lock;
+	struct list_head fence_signal_list;
+	struct list_head fence_unsignal_list;
 };
 
 static inline bool
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 7/8] drm/i915: Updated request structure tracing
  2016-04-20 17:09 [PATCH v7 0/8] Convert requests to use struct fence John.C.Harrison
                   ` (5 preceding siblings ...)
  2016-04-20 17:09 ` [PATCH v7 6/8] drm/i915: Interrupt driven fences John.C.Harrison
@ 2016-04-20 17:09 ` John.C.Harrison
  2016-04-20 17:09 ` [PATCH v7 8/8] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
  2016-04-22 15:37 ` [PATCH v7 0/8] Convert requests to use struct fence John Harrison
  8 siblings, 0 replies; 22+ messages in thread
From: John.C.Harrison @ 2016-04-20 17:09 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added the '_complete' trace event which occurs when a fence/request is
signaled as complete. Also moved the notify event from the IRQ handler
code to inside the notify function itself.

v3: Added the current ring seqno to the notify trace point.

v5: Line wrapping to keep the style checker happy.

v7: Updated to newer nightly (lots of ring -> engine renaming).

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c   |  9 +++++++--
 drivers/gpu/drm/i915/i915_irq.c   |  1 -
 drivers/gpu/drm/i915/i915_trace.h | 14 +++++++++-----
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9b1de5f..a89bfe9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2915,8 +2915,10 @@ void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
 	unsigned long flags;
 	u32 seqno;
 
-	if (list_empty(&engine->fence_signal_list))
+	if (list_empty(&engine->fence_signal_list)) {
+		trace_i915_gem_request_notify(engine, 0);
 		return;
+	}
 
 	if (!fence_locked)
 		spin_lock_irqsave(&engine->fence_lock, flags);
@@ -2924,6 +2926,7 @@ void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
 	if (engine->irq_seqno_barrier)
 		engine->irq_seqno_barrier(engine);
 	seqno = engine->get_seqno(engine);
+	trace_i915_gem_request_notify(engine, seqno);
 
 	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
 		if (!req->cancelled) {
@@ -2937,8 +2940,10 @@ void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
 		 */
 		list_del_init(&req->signal_link);
 
-		if (!req->cancelled)
+		if (!req->cancelled) {
 			fence_signal_locked(&req->fence);
+			trace_i915_gem_request_complete(req);
+		}
 
 		if (req->irq_enabled) {
 			req->engine->irq_put(req->engine);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 9640fa8..0952064 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -999,7 +999,6 @@ static void notify_ring(struct intel_engine_cs *engine)
 	if (!intel_engine_initialized(engine))
 		return;
 
-	trace_i915_gem_request_notify(engine);
 	engine->user_interrupts++;
 
 	i915_gem_request_notify(engine, false);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index dc0def2..b7c7031 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -550,23 +550,27 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
 );
 
 TRACE_EVENT(i915_gem_request_notify,
-	    TP_PROTO(struct intel_engine_cs *engine),
-	    TP_ARGS(engine),
+	    TP_PROTO(struct intel_engine_cs *engine, uint32_t seqno),
+	    TP_ARGS(engine, seqno),
 
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u32, ring)
 			     __field(u32, seqno)
+			     __field(bool, is_empty)
 			     ),
 
 	    TP_fast_assign(
 			   __entry->dev = engine->dev->primary->index;
 			   __entry->ring = engine->id;
-			   __entry->seqno = engine->get_seqno(engine);
+			   __entry->seqno = seqno;
+			   __entry->is_empty =
+					list_empty(&engine->fence_signal_list);
 			   ),
 
-	    TP_printk("dev=%u, ring=%u, seqno=%u",
-		      __entry->dev, __entry->ring, __entry->seqno)
+	    TP_printk("dev=%u, ring=%u, seqno=%u, empty=%d",
+		      __entry->dev, __entry->ring, __entry->seqno,
+		      __entry->is_empty)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 8/8] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
  2016-04-20 17:09 [PATCH v7 0/8] Convert requests to use struct fence John.C.Harrison
                   ` (6 preceding siblings ...)
  2016-04-20 17:09 ` [PATCH v7 7/8] drm/i915: Updated request structure tracing John.C.Harrison
@ 2016-04-20 17:09 ` John.C.Harrison
  2016-04-22 15:37 ` [PATCH v7 0/8] Convert requests to use struct fence John Harrison
  8 siblings, 0 replies; 22+ messages in thread
From: John.C.Harrison @ 2016-04-20 17:09 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The notify function can be called many times without the seqno
changing. A large number of duplicates are to prevent races due to the
requirement of not enabling interrupts until requested. However, when
interrupts are enabled the IRQ handle can be called multiple times
without the ring's seqno value changing. This patch reduces the
overhead of these extra calls by caching the last processed seqno
value and early exiting if it has not changed.

v3: New patch for series.

v5: Added comment about last_irq_seqno usage due to code review
feedback (Tvrtko Ursulin).

v6: Minor update to resolve a race condition with the wait_request
optimisation.

v7: Updated to newer nightly - lots of ring -> engine renaming plus an
interface change to get_seqno().

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 26 ++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a89bfe9..d1f8c47 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1383,6 +1383,7 @@ out:
 			 * request has not actually been fully processed yet.
 			 */
 			spin_lock_irq(&req->engine->fence_lock);
+			req->engine->last_irq_seqno = 0;
 			i915_gem_request_notify(req->engine, true);
 			spin_unlock_irq(&req->engine->fence_lock);
 		}
@@ -2584,9 +2585,12 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
 	i915_gem_retire_requests(dev);
 
 	/* Finally reset hw state */
-	for_each_engine(engine, dev_priv)
+	for_each_engine(engine, dev_priv) {
 		intel_ring_init_seqno(engine, seqno);
 
+		engine->last_irq_seqno = 0;
+	}
+
 	return 0;
 }
 
@@ -2920,13 +2924,24 @@ void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
 		return;
 	}
 
-	if (!fence_locked)
-		spin_lock_irqsave(&engine->fence_lock, flags);
-
+	/*
+	 * Check for a new seqno. If it hasn't actually changed then early
+	 * exit without even grabbing the spinlock. Note that this is safe
+	 * because any corruption of last_irq_seqno merely results in doing
+	 * the full processing when there is potentially no work to be done.
+	 * It can never lead to not processing work that does need to happen.
+	 */
 	if (engine->irq_seqno_barrier)
 		engine->irq_seqno_barrier(engine);
 	seqno = engine->get_seqno(engine);
 	trace_i915_gem_request_notify(engine, seqno);
+	if (seqno == engine->last_irq_seqno)
+		return;
+
+	if (!fence_locked)
+		spin_lock_irqsave(&engine->fence_lock, flags);
+
+	engine->last_irq_seqno = seqno;
 
 	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
 		if (!req->cancelled) {
@@ -3242,7 +3257,10 @@ static void i915_gem_reset_engine_cleanup(struct drm_i915_private *dev_priv,
 	 * Tidy up anything left over. This includes a call to
 	 * i915_gem_request_notify() which will make sure that any requests
 	 * that were on the signal pending list get also cleaned up.
+	 * NB: The seqno cache must be cleared otherwise the notify call will
+	 * simply return immediately.
 	 */
+	engine->last_irq_seqno = 0;
 	i915_gem_retire_requests_ring(engine);
 
 	/* Having flushed all requests from all queues, we know that all
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 160e71e..c75c5e1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -359,6 +359,7 @@ struct  intel_engine_cs {
 	spinlock_t fence_lock;
 	struct list_head fence_signal_list;
 	struct list_head fence_unsignal_list;
+	uint32_t last_irq_seqno;
 };
 
 static inline bool
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 3/8] drm/i915: Add per context timelines to fence object
  2016-04-20 17:09 ` [PATCH v7 3/8] drm/i915: Add per context timelines to fence object John.C.Harrison
@ 2016-04-20 17:44   ` Chris Wilson
  2016-04-21 11:37     ` John Harrison
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2016-04-20 17:44 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Wed, Apr 20, 2016 at 06:09:50PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The fence object used inside the request structure requires a sequence
> number. Although this is not used by the i915 driver itself, it could
> potentially be used by non-i915 code if the fence is passed outside of
> the driver. This is the intention as it allows external kernel drivers
> and user applications to wait on batch buffer completion
> asynchronously via the dma-buff fence API.
> 
> To ensure that such external users are not confused by strange things
> happening with the seqno, this patch adds in a per context timeline
> that can provide a guaranteed in-order seqno value for the fence. This
> is safe because the scheduler will not re-order batch buffers within a
> context - they are considered to be mutually dependent.
> 
> v2: New patch in series.
> 
> v3: Renamed/retyped timeline structure fields after review comments by
> Tvrtko Ursulin.
> 
> Added context information to the timeline's name string for better
> identification in debugfs output.
> 
> v5: Line wrapping and other white space fixes to keep style checker
> happy.
> 
> v7: Updated to newer nightly (lots of ring -> engine renaming).
> 
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         | 25 +++++++---
>  drivers/gpu/drm/i915/i915_gem.c         | 83 +++++++++++++++++++++++++++++----
>  drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++
>  drivers/gpu/drm/i915/intel_lrc.c        |  8 ++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
>  5 files changed, 114 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index bb18b89..1c3a1ca 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -813,6 +813,15 @@ struct i915_ctx_hang_stats {
>  	bool banned;
>  };
>  
> +struct i915_fence_timeline {
> +	char        name[32];
> +	unsigned    fence_context;
> +	unsigned    next;
> +
> +	struct intel_context *ctx;
> +	struct intel_engine_cs *engine;
> +};
> +
>  /* This must match up with the value previously used for execbuf2.rsvd1. */
>  #define DEFAULT_CONTEXT_HANDLE 0
>  
> @@ -860,6 +869,7 @@ struct intel_context {
>  		struct i915_vma *lrc_vma;
>  		u64 lrc_desc;
>  		uint32_t *lrc_reg_state;
> +		struct i915_fence_timeline fence_timeline;

This is wrong. The timeline is actually a property of the vm, with
contexts for each engine.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 1/8] drm/i915: Convert requests to use struct fence
  2016-04-20 17:09 ` [PATCH v7 1/8] drm/i915: " John.C.Harrison
@ 2016-04-21  7:06   ` Maarten Lankhorst
  2016-04-21 10:26     ` John Harrison
  2016-04-21  7:09   ` Maarten Lankhorst
  1 sibling, 1 reply; 22+ messages in thread
From: Maarten Lankhorst @ 2016-04-21  7:06 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

Op 20-04-16 om 19:09 schreef John.C.Harrison@Intel.com:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> There is a construct in the linux kernel called 'struct fence' that is
> intended to keep track of work that is executed on hardware. I.e. it
> solves the basic problem that the drivers 'struct
> drm_i915_gem_request' is trying to address. The request structure does
> quite a lot more than simply track the execution progress so is very
> definitely still required. However, the basic completion status side
> could be updated to use the ready made fence implementation and gain
> all the advantages that provides.
>
> This patch makes the first step of integrating a struct fence into the
> request. It replaces the explicit reference count with that of the
> fence. It also replaces the 'is completed' test with the fence's
> equivalent. Currently, that simply chains on to the original request
> implementation. A future patch will improve this.
>
> v3: Updated after review comments by Tvrtko Ursulin. Added fence
> context/seqno pair to the debugfs request info. Renamed fence 'driver
> name' to just 'i915'. Removed BUG_ONs.
>
> v5: Changed seqno format in debugfs to %x rather than %u as that is
> apparently the preferred appearance. Line wrapped some long lines to
> keep the style checker happy.
>
> v6: Updated to newer nigthly and resolved conflicts. The biggest issue
> was with the re-worked busy spin precursor to waiting on a request. In
> particular, the addition of a 'request_started' helper function. This
> has no corresponding concept within the fence framework. However, it
> is only ever used in one place and the whole point of that place is to
> always directly read the seqno for absolutely lowest latency possible.
> So the simple solution is to just make the seqno test explicit at that
> point now rather than later in the series (it was previously being
> done anyway when fences become interrupt driven).
>
> v7: Rebased to newer nightly - lots of ring -> engine renaming and
> interface change to get_seqno().
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c     |  5 ++-
>  drivers/gpu/drm/i915/i915_drv.h         | 51 ++++++++++-------------
>  drivers/gpu/drm/i915/i915_gem.c         | 72 +++++++++++++++++++++++++++++----
>  drivers/gpu/drm/i915/intel_lrc.c        |  1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
>  6 files changed, 94 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 2d11b49..6917515 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -707,11 +707,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>  			task = NULL;
>  			if (req->pid)
>  				task = pid_task(req->pid, PIDTYPE_PID);
> -			seq_printf(m, "    %x @ %d: %s [%d]\n",
> +			seq_printf(m, "    %x @ %d: %s [%d], fence = %x:%x\n",
>  				   req->seqno,
>  				   (int) (jiffies - req->emitted_jiffies),
>  				   task ? task->comm : "<unknown>",
> -				   task ? task->pid : -1);
> +				   task ? task->pid : -1,
> +				   req->fence.context, req->fence.seqno);
>  			rcu_read_unlock();
>  		}
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index d1e6e58..e5f49f3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -54,6 +54,7 @@
>  #include <linux/pm_qos.h>
>  #include "intel_guc.h"
>  #include "intel_dpll_mgr.h"
> +#include <linux/fence.h>
>  
>  /* General customization:
>   */
> @@ -2242,7 +2243,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>   * initial reference taken using kref_init
>   */
>  struct drm_i915_gem_request {
> -	struct kref ref;
> +	/**
> +	 * Underlying object for implementing the signal/wait stuff.
> +	 * NB: Never call fence_later() or return this fence object to user
> +	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
> +	 * etc., there is no guarantee at all about the validity or
> +	 * sequentiality of the fence's seqno! It is also unsafe to let
> +	 * anything outside of the i915 driver get hold of the fence object
> +	 * as the clean up when decrementing the reference count requires
> +	 * holding the driver mutex lock.
> +	 */
> +	struct fence fence;
>  
>  	/** On Which ring this request was generated */
>  	struct drm_i915_private *i915;
> @@ -2328,7 +2339,13 @@ struct drm_i915_gem_request * __must_check
>  i915_gem_request_alloc(struct intel_engine_cs *engine,
>  		       struct intel_context *ctx);
>  void i915_gem_request_cancel(struct drm_i915_gem_request *req);
> -void i915_gem_request_free(struct kref *req_ref);
> +
> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> +					      bool lazy_coherency)
> +{
> +	return fence_is_signaled(&req->fence);
> +}
> +
>  int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>  				   struct drm_file *file);
>  
> @@ -2348,7 +2365,7 @@ static inline struct drm_i915_gem_request *
>  i915_gem_request_reference(struct drm_i915_gem_request *req)
>  {
>  	if (req)
> -		kref_get(&req->ref);
> +		fence_get(&req->fence);
>  	return req;
>  }
>  
> @@ -2356,7 +2373,7 @@ static inline void
>  i915_gem_request_unreference(struct drm_i915_gem_request *req)
>  {
>  	WARN_ON(!mutex_is_locked(&req->engine->dev->struct_mutex));
> -	kref_put(&req->ref, i915_gem_request_free);
> +	fence_put(&req->fence);
>  }
>  
>  static inline void
> @@ -2368,7 +2385,7 @@ i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
>  		return;
>  
>  	dev = req->engine->dev;
> -	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
> +	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
>  		mutex_unlock(&dev->struct_mutex);
>  }
>  
> @@ -2385,12 +2402,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>  }
>  
>  /*
> - * XXX: i915_gem_request_completed should be here but currently needs the
> - * definition of i915_seqno_passed() which is below. It will be moved in
> - * a later patch when the call to i915_seqno_passed() is obsoleted...
> - */
> -
> -/*
>   * A command that requires special handling by the command parser.
>   */
>  struct drm_i915_cmd_descriptor {
> @@ -3055,24 +3066,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>  	return (int32_t)(seq1 - seq2) >= 0;
>  }
>  
> -static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
> -					   bool lazy_coherency)
> -{
> -	if (!lazy_coherency && req->engine->irq_seqno_barrier)
> -		req->engine->irq_seqno_barrier(req->engine);
> -	return i915_seqno_passed(req->engine->get_seqno(req->engine),
> -				 req->previous_seqno);
> -}
> -
> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> -					      bool lazy_coherency)
> -{
> -	if (!lazy_coherency && req->engine->irq_seqno_barrier)
> -		req->engine->irq_seqno_barrier(req->engine);
> -	return i915_seqno_passed(req->engine->get_seqno(req->engine),
> -				 req->seqno);
> -}
> -
>  int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
>  int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ebef03b..1add29a9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1183,6 +1183,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>  {
>  	unsigned long timeout;
>  	unsigned cpu;
> +	uint32_t seqno;
>  
>  	/* When waiting for high frequency requests, e.g. during synchronous
>  	 * rendering split between the CPU and GPU, the finite amount of time
> @@ -1198,12 +1199,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>  		return -EBUSY;
>  
>  	/* Only spin if we know the GPU is processing this request */
> -	if (!i915_gem_request_started(req, true))
> +	seqno = req->engine->get_seqno(req->engine);
> +	if (!i915_seqno_passed(seqno, req->previous_seqno))
>  		return -EAGAIN;
>  
>  	timeout = local_clock_us(&cpu) + 5;
>  	while (!need_resched()) {
> -		if (i915_gem_request_completed(req, true))
> +		seqno = req->engine->get_seqno(req->engine);
> +		if (i915_seqno_passed(seqno, req->seqno))
>  			return 0;
>  
>  		if (signal_pending_state(state, current))
> @@ -1215,7 +1218,10 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>  		cpu_relax_lowlatency();
>  	}
>  
> -	if (i915_gem_request_completed(req, false))
> +	if (req->engine->irq_seqno_barrier)
> +		req->engine->irq_seqno_barrier(req->engine);
> +	seqno = req->engine->get_seqno(req->engine);
> +	if (i915_seqno_passed(seqno, req->seqno))
>  		return 0;
>  
>  	return -EAGAIN;
> @@ -2721,12 +2727,14 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>  	}
>  }
>  
> -void i915_gem_request_free(struct kref *req_ref)
> +static void i915_gem_request_free(struct fence *req_fence)
>  {
> -	struct drm_i915_gem_request *req = container_of(req_ref,
> -						 typeof(*req), ref);
> +	struct drm_i915_gem_request *req = container_of(req_fence,
> +						 typeof(*req), fence);
>  	struct intel_context *ctx = req->ctx;
>  
> +	WARN_ON(!mutex_is_locked(&req->engine->dev->struct_mutex));
> +
>  	if (req->file_priv)
>  		i915_gem_request_remove_from_client(req);
>  
>
Is kmem_cache_free rcu-safe?

I don't think it is, and that would cause some hard to debug issues.

Adding SLAB_DESTROY_BY_RCU to flags wouldn't do what you would expect here,
so your best bet would be to do a call_rcu(&fence->rcu, wrapper_for_kmem_cache_free);

~Maarten
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 1/8] drm/i915: Convert requests to use struct fence
  2016-04-20 17:09 ` [PATCH v7 1/8] drm/i915: " John.C.Harrison
  2016-04-21  7:06   ` Maarten Lankhorst
@ 2016-04-21  7:09   ` Maarten Lankhorst
  1 sibling, 0 replies; 22+ messages in thread
From: Maarten Lankhorst @ 2016-04-21  7:09 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

Op 20-04-16 om 19:09 schreef John.C.Harrison@Intel.com:
> <snip>

> +/*	if (!lazy_coherency && req->engine->irq_seqno_barrier)
> +		req->engine->irq_seqno_barrier(req->engine);*/
>
Why keep this hunk in i915_gem_request_is_completed?
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 2/8] drm/i915: Removed now redudant parameter to i915_gem_request_completed()
  2016-04-20 17:09 ` [PATCH v7 2/8] drm/i915: Removed now redudant parameter to i915_gem_request_completed() John.C.Harrison
@ 2016-04-21  7:19   ` Maarten Lankhorst
  2016-04-21 10:18     ` John Harrison
  0 siblings, 1 reply; 22+ messages in thread
From: Maarten Lankhorst @ 2016-04-21  7:19 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

Typo in subject (redundant). ^

Op 20-04-16 om 19:09 schreef John.C.Harrison@Intel.com:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The change to the implementation of i915_gem_request_completed() means
> that the lazy coherency flag is no longer used. This can now be
> removed to simplify the interface.
>
> v6: Updated to newer nightly and resolved conflicts.
>
> v7: Updated to newer nightly (lots of ring -> engine renaming).
>
I think it would be more readable to put this patch as preparation for patch 1.

~Maarten
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 4/8] drm/i915: Fix clean up of file client list on execbuff failure
  2016-04-20 17:09 ` [PATCH v7 4/8] drm/i915: Fix clean up of file client list on execbuff failure John.C.Harrison
@ 2016-04-21  7:20   ` Maarten Lankhorst
  2016-04-21 10:15     ` John Harrison
  0 siblings, 1 reply; 22+ messages in thread
From: Maarten Lankhorst @ 2016-04-21  7:20 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

Op 20-04-16 om 19:09 schreef John.C.Harrison@Intel.com:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> If an execbuff IOCTL call fails for some reason, it would leave the
> request in the client list. The request clean up code would remove
> this but only later on and only after the reference count has dropped
> to zero. The entire sequence is contained within the driver mutex
> lock. However, there is still a hole such that any code which does not
> require the mutex lock could still find the request on the client list
> and start using it. That would lead to broken reference counts, use of
> dangling pointers and all sorts of other nastiness.
>
> The throttle IOCTL in particular does not acquire the mutex and does
> process the client list. And the likely situation of the execbuff
> IOCTL failing is when the system is busy with lots of work
> outstanding. That is exactly the situation where the throttle IOCTL
> would try to wait on a request.
>
> Currently, this hole is tiny - the gap between the reference count
> dropping to zero and the free function being called in response.
> However the next patch in this series enlarges that gap considerably
> by deferring the free function (to remove the need for the mutex lock
> when unreferencing requests).
>
Seems to already have been fixed by

commit aa9b78104fe3210758fa9e6c644e9a108d371e8b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Apr 13 17:35:15 2016 +0100

    drm/i915: Late request cancellations are harmful

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 6/8] drm/i915: Interrupt driven fences
  2016-04-20 17:09 ` [PATCH v7 6/8] drm/i915: Interrupt driven fences John.C.Harrison
@ 2016-04-21  7:38   ` Maarten Lankhorst
  0 siblings, 0 replies; 22+ messages in thread
From: Maarten Lankhorst @ 2016-04-21  7:38 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

Op 20-04-16 om 19:09 schreef John.C.Harrison@Intel.com:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The intended usage model for struct fence is that the signalled status
> should be set on demand rather than polled. That is, there should not
> be a need for a 'signaled' function to be called everytime the status
> is queried. Instead, 'something' should be done to enable a signal
> callback from the hardware which will update the state directly. In
> the case of requests, this is the seqno update interrupt. The idea is
> that this callback will only be enabled on demand when something
> actually tries to wait on the fence.
>
> This change removes the polling test and replaces it with the callback
> scheme. Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke
> me' list when a new seqno pops out and signals any matching
> fence/request. The fence is then removed from the list so the entire
> request stack does not need to be scanned every time. Note that the
> fence is added to the list before the commands to generate the seqno
> interrupt are added to the ring. Thus the sequence is guaranteed to be
> race free if the interrupt is already enabled.
>
> Note that the interrupt is only enabled on demand (i.e. when
> __wait_request() is called). Thus there is still a potential race when
> enabling the interrupt as the request may already have completed.
> However, this is simply solved by calling the interrupt processing
> code immediately after enabling the interrupt and thereby checking for
> already completed requests.
>
> Lastly, the ring clean up code has the possibility to cancel
> outstanding requests (e.g. because TDR has reset the ring). These
> requests will never get signalled and so must be removed from the
> signal list manually. This is done by setting a 'cancelled' flag and
> then calling the regular notify/retire code path rather than
> attempting to duplicate the list manipulatation and clean up code in
> multiple places. This also avoid any race condition where the
> cancellation request might occur after/during the completion interrupt
> actually arriving.
>
> v2: Updated to take advantage of the request unreference no longer
> requiring the mutex lock.
>
> v3: Move the signal list processing around to prevent unsubmitted
> requests being added to the list. This was occurring on Android
> because the native sync implementation calls the
> fence->enable_signalling API immediately on fence creation.
>
> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
> 'link' instead of 'list'. Added support for returning an error code on
> a cancelled fence. Update list processing to be more efficient/safer
> with respect to spinlocks.
>
> v5: Made i915_gem_request_submit a static as it is only ever called
> from one place.
>
> Fixed up the low latency wait optimisation. The time delay between the
> seqno value being to memory and the drive's ISR running can be
> significant, at least for the wait request micro-benchmark. This can
> be greatly improved by explicitly checking for seqno updates in the
> pre-wait busy poll loop. Also added some documentation comments to the
> busy poll code.
>
> Fixed up support for the faking of lost interrupts
> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
> tells the driver to loose interrupts deliberately and then check that
> everything still works as expected (albeit much slower).
>
> Updates from review comments: use non IRQ-save spinlocking, early exit
> on WARN and improved comments (Tvrtko Ursulin).
>
> v6: Updated to newer nigthly and resolved conflicts around the
> wait_request busy spin optimisation. Also fixed a race condition
> between this early exit path and the regular completion path.
>
> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
> interface change on get_seqno(). Also added a list_empty() check
> before acquring spinlocks and doing list processing.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |   8 +
>  drivers/gpu/drm/i915/i915_gem.c         | 249 +++++++++++++++++++++++++++++---
>  drivers/gpu/drm/i915/i915_irq.c         |   2 +
>  drivers/gpu/drm/i915/intel_lrc.c        |   2 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
>  6 files changed, 241 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 81324ba..9519b11 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2255,7 +2255,12 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>  struct drm_i915_gem_request {
>  	/** Underlying object for implementing the signal/wait stuff. */
>  	struct fence fence;
> +	struct list_head signal_link;
> +	struct list_head unsignal_link;
>  	struct list_head delayed_free_link;
> +	bool cancelled;
> +	bool irq_enabled;
> +	bool signal_requested;
>  
>  	/** On Which ring this request was generated */
>  	struct drm_i915_private *i915;
> @@ -2341,6 +2346,9 @@ struct drm_i915_gem_request * __must_check
>  i915_gem_request_alloc(struct intel_engine_cs *engine,
>  		       struct intel_context *ctx);
>  void i915_gem_request_cancel(struct drm_i915_gem_request *req);
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> +				       bool fence_locked);
> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
>  
>  int i915_create_fence_timeline(struct drm_device *dev,
>  			       struct intel_context *ctx,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9071058..9b1de5f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -40,6 +40,8 @@
>  
>  #define RQ_BUG_ON(expr)
>  
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req);
> +
>  static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>  static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
>  static void
> @@ -1253,9 +1255,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  	struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
>  	struct drm_device *dev = engine->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> -	const bool irq_test_in_progress =
> -		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
>  	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> +	uint32_t seqno;
>  	DEFINE_WAIT(wait);
>  	unsigned long timeout_expire;
>  	s64 before = 0; /* Only to silence a compiler warning. */
> @@ -1263,9 +1264,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  
>  	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>  
> -	if (list_empty(&req->list))
> -		return 0;
> -
>  	if (i915_gem_request_completed(req))
>  		return 0;
>  
> @@ -1291,15 +1289,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  	trace_i915_gem_request_wait_begin(req);
>  
>  	/* Optimistic spin for the next jiffie before touching IRQs */
> -	ret = __i915_spin_request(req, state);
> -	if (ret == 0)
> -		goto out;
> -
> -	if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
> -		ret = -ENODEV;
> -		goto out;
> +	if (req->seqno) {
> +		ret = __i915_spin_request(req, state);
> +		if (ret == 0)
> +			goto out;
>  	}
>  
> +	/*
> +	 * Enable interrupt completion of the request.
> +	 */
> +	fence_enable_sw_signaling(&req->fence);
> +
>  	for (;;) {
>  		struct timer_list timer;
>  
> @@ -1321,6 +1321,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			break;
>  		}
>  
> +		if (req->seqno) {
> +			/*
> +			 * There is quite a lot of latency in the user interrupt
> +			 * path. So do an explicit seqno check and potentially
> +			 * remove all that delay.
> +			 */
> +			if (req->engine->irq_seqno_barrier)
> +				req->engine->irq_seqno_barrier(req->engine);
> +			seqno = engine->get_seqno(engine);
You're using this barrier + get_seqno pattern multiple times, maybe write a helper function for this?
> +			if (i915_seqno_passed(seqno, req->seqno)) {
> +				ret = 0;
> +				break;
> +			}
> +		}
> +
>  		if (signal_pending_state(state, current)) {
>  			ret = -ERESTARTSYS;
>  			break;
> @@ -1347,14 +1362,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			destroy_timer_on_stack(&timer);
>  		}
>  	}
> -	if (!irq_test_in_progress)
> -		engine->irq_put(engine);
>  
>  	finish_wait(&engine->irq_queue, &wait);
>  
>  out:
>  	trace_i915_gem_request_wait_end(req);
>  
> +	if ((ret == 0) && (req->seqno)) {
> +		if (req->engine->irq_seqno_barrier)
> +			req->engine->irq_seqno_barrier(req->engine);
> +		seqno = engine->get_seqno(engine);
> +		if (i915_seqno_passed(seqno, req->seqno) &&
> +		    !i915_gem_request_completed(req)) {
> +			/*
> +			 * Make sure the request is marked as completed before
> +			 * returning. NB: Need to acquire the spinlock around
> +			 * the whole call to avoid a race condition with the
> +			 * interrupt handler is running concurrently and could
> +			 * cause this invocation to early exit even though the
> +			 * request has not actually been fully processed yet.
> +			 */
> +			spin_lock_irq(&req->engine->fence_lock);
> +			i915_gem_request_notify(req->engine, true);
> +			spin_unlock_irq(&req->engine->fence_lock);
> +		}
> +	}
> +
>  	if (timeout) {
>  		s64 tres = *timeout - (ktime_get_raw_ns() - before);
>  
> @@ -1432,6 +1465,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>  	list_del_init(&request->list);
>  	i915_gem_request_remove_from_client(request);
>  
> +	/*
> +	 * In case the request is still in the signal pending list,
> +	 * e.g. due to being cancelled by TDR, preemption, etc.
> +	 */
> +	if (!list_empty(&request->signal_link)) {
> +		/*
> +		 * The request must be marked as cancelled and the underlying
> +		 * fence as failed. NB: There is no explicit fence fail API,
> +		 * there is only a manual poke and signal.
> +		 */
> +		request->cancelled = true;
> +		/* How to propagate to any associated sync_fence??? */
> +		request->fence.status = -EIO;
> +		fence_signal_locked(&request->fence);
> +	}
>
Can this still happen with

commit aa9b78104fe3210758fa9e6c644e9a108d371e8b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Apr 13 17:35:15 2016 +0100

    drm/i915: Late request cancellations are harmful

?
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 5/8] drm/i915: Delay the freeing of requests until retire time
  2016-04-20 17:09 ` [PATCH v7 5/8] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
@ 2016-04-21  7:41   ` Maarten Lankhorst
  0 siblings, 0 replies; 22+ messages in thread
From: Maarten Lankhorst @ 2016-04-21  7:41 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

Op 20-04-16 om 19:09 schreef John.C.Harrison@Intel.com:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The request structure is reference counted. When the count reached
> zero, the request was immediately freed and all associated objects
> were unrefereced/unallocated. This meant that the driver mutex lock
> must be held at the point where the count reaches zero. This was fine
> while all references were held internally to the driver. However, the
> plan is to allow the underlying fence object (and hence the request
> itself) to be returned to other drivers and to userland. External
> users cannot be expected to acquire a driver private mutex lock.
>
> Rather than attempt to disentangle the request structure from the
> driver mutex lock, the decsion was to defer the free code until a
> later (safer) point. Hence this patch changes the unreference callback
> to merely move the request onto a delayed free list. The driver's
> retire worker thread will then process the list and actually call the
> free function on the requests.
>
> v2: New patch in series.
>
> v3: Updated after review comments by Tvrtko Ursulin. Rename list nodes
> to 'link' rather than 'list'. Update list processing to be more
> efficient/safer with respect to spinlocks.
>
> v4: Changed to use basic spinlocks rather than IRQ ones - missed
> update from earlier feedback by Tvrtko.
>
> v5: Improved a comment to keep the style checker happy.
>
> v7: Updated to newer nightly (lots of ring -> engine renaming). Also
> added a list_empty() check before wasting time with spinlocks and list
> processing.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         | 22 +++----------------
>  drivers/gpu/drm/i915/i915_gem.c         | 39 +++++++++++++++++++++++++++++----
>  drivers/gpu/drm/i915/intel_display.c    |  2 +-
>  drivers/gpu/drm/i915/intel_lrc.c        |  2 ++
>  drivers/gpu/drm/i915/intel_pm.c         |  2 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  7 ++++++
>  7 files changed, 51 insertions(+), 25 deletions(-)
>
This patch should go away entirely with the 'Premature unpinning at last' patch series, specifically
'[Intel-gfx] [PATCH 16/19] drm/i915: Move releasing of the GEM request from free to retire/cancel'.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 4/8] drm/i915: Fix clean up of file client list on execbuff failure
  2016-04-21  7:20   ` Maarten Lankhorst
@ 2016-04-21 10:15     ` John Harrison
  0 siblings, 0 replies; 22+ messages in thread
From: John Harrison @ 2016-04-21 10:15 UTC (permalink / raw)
  To: Maarten Lankhorst, Intel-GFX

On 21/04/2016 08:20, Maarten Lankhorst wrote:
> Op 20-04-16 om 19:09 schreef John.C.Harrison@Intel.com:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> If an execbuff IOCTL call fails for some reason, it would leave the
>> request in the client list. The request clean up code would remove
>> this but only later on and only after the reference count has dropped
>> to zero. The entire sequence is contained within the driver mutex
>> lock. However, there is still a hole such that any code which does not
>> require the mutex lock could still find the request on the client list
>> and start using it. That would lead to broken reference counts, use of
>> dangling pointers and all sorts of other nastiness.
>>
>> The throttle IOCTL in particular does not acquire the mutex and does
>> process the client list. And the likely situation of the execbuff
>> IOCTL failing is when the system is busy with lots of work
>> outstanding. That is exactly the situation where the throttle IOCTL
>> would try to wait on a request.
>>
>> Currently, this hole is tiny - the gap between the reference count
>> dropping to zero and the free function being called in response.
>> However the next patch in this series enlarges that gap considerably
>> by deferring the free function (to remove the need for the mutex lock
>> when unreferencing requests).
>>
> Seems to already have been fixed by
>
> commit aa9b78104fe3210758fa9e6c644e9a108d371e8b
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed Apr 13 17:35:15 2016 +0100
>
>      drm/i915: Late request cancellations are harmful
>

I don't believe that patch is the best way to solve the problem.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 2/8] drm/i915: Removed now redudant parameter to i915_gem_request_completed()
  2016-04-21  7:19   ` Maarten Lankhorst
@ 2016-04-21 10:18     ` John Harrison
  0 siblings, 0 replies; 22+ messages in thread
From: John Harrison @ 2016-04-21 10:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Intel-GFX

On 21/04/2016 08:19, Maarten Lankhorst wrote:
> Typo in subject (redundant). ^
>
> Op 20-04-16 om 19:09 schreef John.C.Harrison@Intel.com:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The change to the implementation of i915_gem_request_completed() means
>> that the lazy coherency flag is no longer used. This can now be
>> removed to simplify the interface.
>>
>> v6: Updated to newer nightly and resolved conflicts.
>>
>> v7: Updated to newer nightly (lots of ring -> engine renaming).
>>
> I think it would be more readable to put this patch as preparation for patch 1.
>
> ~Maarten
I can't be prep for patch one. It is a result of patch one that the 
parameter is redundant and can be removed. It can't be removed before 
the patch that makes it redundant! It could be squashed into that patch 
but earlier comments have been that it is best to not do too much in a 
single patch.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 1/8] drm/i915: Convert requests to use struct fence
  2016-04-21  7:06   ` Maarten Lankhorst
@ 2016-04-21 10:26     ` John Harrison
  2016-04-21 11:12       ` Maarten Lankhorst
  0 siblings, 1 reply; 22+ messages in thread
From: John Harrison @ 2016-04-21 10:26 UTC (permalink / raw)
  To: Maarten Lankhorst, Intel-GFX

On 21/04/2016 08:06, Maarten Lankhorst wrote:
> Op 20-04-16 om 19:09 schreef John.C.Harrison@Intel.com:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> There is a construct in the linux kernel called 'struct fence' that is
>> intended to keep track of work that is executed on hardware. I.e. it
>> solves the basic problem that the drivers 'struct
>> drm_i915_gem_request' is trying to address. The request structure does
>> quite a lot more than simply track the execution progress so is very
>> definitely still required. However, the basic completion status side
>> could be updated to use the ready made fence implementation and gain
>> all the advantages that provides.
>>
>> This patch makes the first step of integrating a struct fence into the
>> request. It replaces the explicit reference count with that of the
>> fence. It also replaces the 'is completed' test with the fence's
>> equivalent. Currently, that simply chains on to the original request
>> implementation. A future patch will improve this.
>>
>> v3: Updated after review comments by Tvrtko Ursulin. Added fence
>> context/seqno pair to the debugfs request info. Renamed fence 'driver
>> name' to just 'i915'. Removed BUG_ONs.
>>
>> v5: Changed seqno format in debugfs to %x rather than %u as that is
>> apparently the preferred appearance. Line wrapped some long lines to
>> keep the style checker happy.
>>
>> v6: Updated to newer nigthly and resolved conflicts. The biggest issue
>> was with the re-worked busy spin precursor to waiting on a request. In
>> particular, the addition of a 'request_started' helper function. This
>> has no corresponding concept within the fence framework. However, it
>> is only ever used in one place and the whole point of that place is to
>> always directly read the seqno for absolutely lowest latency possible.
>> So the simple solution is to just make the seqno test explicit at that
>> point now rather than later in the series (it was previously being
>> done anyway when fences become interrupt driven).
>>
>> v7: Rebased to newer nightly - lots of ring -> engine renaming and
>> interface change to get_seqno().
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
>> ---
>>   drivers/gpu/drm/i915/i915_debugfs.c     |  5 ++-
>>   drivers/gpu/drm/i915/i915_drv.h         | 51 ++++++++++-------------
>>   drivers/gpu/drm/i915/i915_gem.c         | 72 +++++++++++++++++++++++++++++----
>>   drivers/gpu/drm/i915/intel_lrc.c        |  1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
>>   6 files changed, 94 insertions(+), 39 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>> index 2d11b49..6917515 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -707,11 +707,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>>   			task = NULL;
>>   			if (req->pid)
>>   				task = pid_task(req->pid, PIDTYPE_PID);
>> -			seq_printf(m, "    %x @ %d: %s [%d]\n",
>> +			seq_printf(m, "    %x @ %d: %s [%d], fence = %x:%x\n",
>>   				   req->seqno,
>>   				   (int) (jiffies - req->emitted_jiffies),
>>   				   task ? task->comm : "<unknown>",
>> -				   task ? task->pid : -1);
>> +				   task ? task->pid : -1,
>> +				   req->fence.context, req->fence.seqno);
>>   			rcu_read_unlock();
>>   		}
>>   
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index d1e6e58..e5f49f3 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -54,6 +54,7 @@
>>   #include <linux/pm_qos.h>
>>   #include "intel_guc.h"
>>   #include "intel_dpll_mgr.h"
>> +#include <linux/fence.h>
>>   
>>   /* General customization:
>>    */
>> @@ -2242,7 +2243,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>>    * initial reference taken using kref_init
>>    */
>>   struct drm_i915_gem_request {
>> -	struct kref ref;
>> +	/**
>> +	 * Underlying object for implementing the signal/wait stuff.
>> +	 * NB: Never call fence_later() or return this fence object to user
>> +	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
>> +	 * etc., there is no guarantee at all about the validity or
>> +	 * sequentiality of the fence's seqno! It is also unsafe to let
>> +	 * anything outside of the i915 driver get hold of the fence object
>> +	 * as the clean up when decrementing the reference count requires
>> +	 * holding the driver mutex lock.
>> +	 */
>> +	struct fence fence;
>>   
>>   	/** On Which ring this request was generated */
>>   	struct drm_i915_private *i915;
>> @@ -2328,7 +2339,13 @@ struct drm_i915_gem_request * __must_check
>>   i915_gem_request_alloc(struct intel_engine_cs *engine,
>>   		       struct intel_context *ctx);
>>   void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>> -void i915_gem_request_free(struct kref *req_ref);
>> +
>> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>> +					      bool lazy_coherency)
>> +{
>> +	return fence_is_signaled(&req->fence);
>> +}
>> +
>>   int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>>   				   struct drm_file *file);
>>   
>> @@ -2348,7 +2365,7 @@ static inline struct drm_i915_gem_request *
>>   i915_gem_request_reference(struct drm_i915_gem_request *req)
>>   {
>>   	if (req)
>> -		kref_get(&req->ref);
>> +		fence_get(&req->fence);
>>   	return req;
>>   }
>>   
>> @@ -2356,7 +2373,7 @@ static inline void
>>   i915_gem_request_unreference(struct drm_i915_gem_request *req)
>>   {
>>   	WARN_ON(!mutex_is_locked(&req->engine->dev->struct_mutex));
>> -	kref_put(&req->ref, i915_gem_request_free);
>> +	fence_put(&req->fence);
>>   }
>>   
>>   static inline void
>> @@ -2368,7 +2385,7 @@ i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
>>   		return;
>>   
>>   	dev = req->engine->dev;
>> -	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
>> +	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
>>   		mutex_unlock(&dev->struct_mutex);
>>   }
>>   
>> @@ -2385,12 +2402,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>>   }
>>   
>>   /*
>> - * XXX: i915_gem_request_completed should be here but currently needs the
>> - * definition of i915_seqno_passed() which is below. It will be moved in
>> - * a later patch when the call to i915_seqno_passed() is obsoleted...
>> - */
>> -
>> -/*
>>    * A command that requires special handling by the command parser.
>>    */
>>   struct drm_i915_cmd_descriptor {
>> @@ -3055,24 +3066,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>>   	return (int32_t)(seq1 - seq2) >= 0;
>>   }
>>   
>> -static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
>> -					   bool lazy_coherency)
>> -{
>> -	if (!lazy_coherency && req->engine->irq_seqno_barrier)
>> -		req->engine->irq_seqno_barrier(req->engine);
>> -	return i915_seqno_passed(req->engine->get_seqno(req->engine),
>> -				 req->previous_seqno);
>> -}
>> -
>> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>> -					      bool lazy_coherency)
>> -{
>> -	if (!lazy_coherency && req->engine->irq_seqno_barrier)
>> -		req->engine->irq_seqno_barrier(req->engine);
>> -	return i915_seqno_passed(req->engine->get_seqno(req->engine),
>> -				 req->seqno);
>> -}
>> -
>>   int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
>>   int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>>   
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index ebef03b..1add29a9 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1183,6 +1183,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>   {
>>   	unsigned long timeout;
>>   	unsigned cpu;
>> +	uint32_t seqno;
>>   
>>   	/* When waiting for high frequency requests, e.g. during synchronous
>>   	 * rendering split between the CPU and GPU, the finite amount of time
>> @@ -1198,12 +1199,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>   		return -EBUSY;
>>   
>>   	/* Only spin if we know the GPU is processing this request */
>> -	if (!i915_gem_request_started(req, true))
>> +	seqno = req->engine->get_seqno(req->engine);
>> +	if (!i915_seqno_passed(seqno, req->previous_seqno))
>>   		return -EAGAIN;
>>   
>>   	timeout = local_clock_us(&cpu) + 5;
>>   	while (!need_resched()) {
>> -		if (i915_gem_request_completed(req, true))
>> +		seqno = req->engine->get_seqno(req->engine);
>> +		if (i915_seqno_passed(seqno, req->seqno))
>>   			return 0;
>>   
>>   		if (signal_pending_state(state, current))
>> @@ -1215,7 +1218,10 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>   		cpu_relax_lowlatency();
>>   	}
>>   
>> -	if (i915_gem_request_completed(req, false))
>> +	if (req->engine->irq_seqno_barrier)
>> +		req->engine->irq_seqno_barrier(req->engine);
>> +	seqno = req->engine->get_seqno(req->engine);
>> +	if (i915_seqno_passed(seqno, req->seqno))
>>   		return 0;
>>   
>>   	return -EAGAIN;
>> @@ -2721,12 +2727,14 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>>   	}
>>   }
>>   
>> -void i915_gem_request_free(struct kref *req_ref)
>> +static void i915_gem_request_free(struct fence *req_fence)
>>   {
>> -	struct drm_i915_gem_request *req = container_of(req_ref,
>> -						 typeof(*req), ref);
>> +	struct drm_i915_gem_request *req = container_of(req_fence,
>> +						 typeof(*req), fence);
>>   	struct intel_context *ctx = req->ctx;
>>   
>> +	WARN_ON(!mutex_is_locked(&req->engine->dev->struct_mutex));
>> +
>>   	if (req->file_priv)
>>   		i915_gem_request_remove_from_client(req);
>>   
>>
> Is kmem_cache_free rcu-safe?
>
> I don't think it is, and that would cause some hard to debug issues.
>
> Adding SLAB_DESTROY_BY_RCU to flags wouldn't do what you would expect here,
> so your best bet would be to do a call_rcu(&fence->rcu, wrapper_for_kmem_cache_free);
>
> ~Maarten
I don't understand what you mean? Are you referring to the 
kmem_cache_free that frees the request object at the end of the above 
function (which you have actually deleted from your reply)? Or are you 
referring to something inside the i915_gem_request_remove_from_client() 
call that your comments seem to be in reply to?

If you mean the free of the request itself, then the only usage of that 
particular kmem_cache are within the driver mutex lock. Does that not 
make it safe? If you mean the client remove, then where is the 
kmem_cache_free in that call path?

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 1/8] drm/i915: Convert requests to use struct fence
  2016-04-21 10:26     ` John Harrison
@ 2016-04-21 11:12       ` Maarten Lankhorst
  0 siblings, 0 replies; 22+ messages in thread
From: Maarten Lankhorst @ 2016-04-21 11:12 UTC (permalink / raw)
  To: John Harrison, Intel-GFX

Op 21-04-16 om 12:26 schreef John Harrison:
> On 21/04/2016 08:06, Maarten Lankhorst wrote:
>> Op 20-04-16 om 19:09 schreef John.C.Harrison@Intel.com:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> There is a construct in the linux kernel called 'struct fence' that is
>>> intended to keep track of work that is executed on hardware. I.e. it
>>> solves the basic problem that the drivers 'struct
>>> drm_i915_gem_request' is trying to address. The request structure does
>>> quite a lot more than simply track the execution progress so is very
>>> definitely still required. However, the basic completion status side
>>> could be updated to use the ready made fence implementation and gain
>>> all the advantages that provides.
>>>
>>> This patch makes the first step of integrating a struct fence into the
>>> request. It replaces the explicit reference count with that of the
>>> fence. It also replaces the 'is completed' test with the fence's
>>> equivalent. Currently, that simply chains on to the original request
>>> implementation. A future patch will improve this.
>>>
>>> v3: Updated after review comments by Tvrtko Ursulin. Added fence
>>> context/seqno pair to the debugfs request info. Renamed fence 'driver
>>> name' to just 'i915'. Removed BUG_ONs.
>>>
>>> v5: Changed seqno format in debugfs to %x rather than %u as that is
>>> apparently the preferred appearance. Line wrapped some long lines to
>>> keep the style checker happy.
>>>
>>> v6: Updated to newer nigthly and resolved conflicts. The biggest issue
>>> was with the re-worked busy spin precursor to waiting on a request. In
>>> particular, the addition of a 'request_started' helper function. This
>>> has no corresponding concept within the fence framework. However, it
>>> is only ever used in one place and the whole point of that place is to
>>> always directly read the seqno for absolutely lowest latency possible.
>>> So the simple solution is to just make the seqno test explicit at that
>>> point now rather than later in the series (it was previously being
>>> done anyway when fences become interrupt driven).
>>>
>>> v7: Rebased to newer nightly - lots of ring -> engine renaming and
>>> interface change to get_seqno().
>>>
>>> For: VIZ-5190
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
>>> ---
>>>   drivers/gpu/drm/i915/i915_debugfs.c     |  5 ++-
>>>   drivers/gpu/drm/i915/i915_drv.h         | 51 ++++++++++-------------
>>>   drivers/gpu/drm/i915/i915_gem.c         | 72 +++++++++++++++++++++++++++++----
>>>   drivers/gpu/drm/i915/intel_lrc.c        |  1 +
>>>   drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
>>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
>>>   6 files changed, 94 insertions(+), 39 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>>> index 2d11b49..6917515 100644
>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>> @@ -707,11 +707,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>>>               task = NULL;
>>>               if (req->pid)
>>>                   task = pid_task(req->pid, PIDTYPE_PID);
>>> -            seq_printf(m, "    %x @ %d: %s [%d]\n",
>>> +            seq_printf(m, "    %x @ %d: %s [%d], fence = %x:%x\n",
>>>                      req->seqno,
>>>                      (int) (jiffies - req->emitted_jiffies),
>>>                      task ? task->comm : "<unknown>",
>>> -                   task ? task->pid : -1);
>>> +                   task ? task->pid : -1,
>>> +                   req->fence.context, req->fence.seqno);
>>>               rcu_read_unlock();
>>>           }
>>>   diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index d1e6e58..e5f49f3 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -54,6 +54,7 @@
>>>   #include <linux/pm_qos.h>
>>>   #include "intel_guc.h"
>>>   #include "intel_dpll_mgr.h"
>>> +#include <linux/fence.h>
>>>     /* General customization:
>>>    */
>>> @@ -2242,7 +2243,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>>>    * initial reference taken using kref_init
>>>    */
>>>   struct drm_i915_gem_request {
>>> -    struct kref ref;
>>> +    /**
>>> +     * Underlying object for implementing the signal/wait stuff.
>>> +     * NB: Never call fence_later() or return this fence object to user
>>> +     * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
>>> +     * etc., there is no guarantee at all about the validity or
>>> +     * sequentiality of the fence's seqno! It is also unsafe to let
>>> +     * anything outside of the i915 driver get hold of the fence object
>>> +     * as the clean up when decrementing the reference count requires
>>> +     * holding the driver mutex lock.
>>> +     */
>>> +    struct fence fence;
>>>         /** On Which ring this request was generated */
>>>       struct drm_i915_private *i915;
>>> @@ -2328,7 +2339,13 @@ struct drm_i915_gem_request * __must_check
>>>   i915_gem_request_alloc(struct intel_engine_cs *engine,
>>>                  struct intel_context *ctx);
>>>   void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>>> -void i915_gem_request_free(struct kref *req_ref);
>>> +
>>> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>>> +                          bool lazy_coherency)
>>> +{
>>> +    return fence_is_signaled(&req->fence);
>>> +}
>>> +
>>>   int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>>>                      struct drm_file *file);
>>>   @@ -2348,7 +2365,7 @@ static inline struct drm_i915_gem_request *
>>>   i915_gem_request_reference(struct drm_i915_gem_request *req)
>>>   {
>>>       if (req)
>>> -        kref_get(&req->ref);
>>> +        fence_get(&req->fence);
>>>       return req;
>>>   }
>>>   @@ -2356,7 +2373,7 @@ static inline void
>>>   i915_gem_request_unreference(struct drm_i915_gem_request *req)
>>>   {
>>>       WARN_ON(!mutex_is_locked(&req->engine->dev->struct_mutex));
>>> -    kref_put(&req->ref, i915_gem_request_free);
>>> +    fence_put(&req->fence);
>>>   }
>>>     static inline void
>>> @@ -2368,7 +2385,7 @@ i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
>>>           return;
>>>         dev = req->engine->dev;
>>> -    if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
>>> +    if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
>>>           mutex_unlock(&dev->struct_mutex);
>>>   }
>>>   @@ -2385,12 +2402,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>>>   }
>>>     /*
>>> - * XXX: i915_gem_request_completed should be here but currently needs the
>>> - * definition of i915_seqno_passed() which is below. It will be moved in
>>> - * a later patch when the call to i915_seqno_passed() is obsoleted...
>>> - */
>>> -
>>> -/*
>>>    * A command that requires special handling by the command parser.
>>>    */
>>>   struct drm_i915_cmd_descriptor {
>>> @@ -3055,24 +3066,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>>>       return (int32_t)(seq1 - seq2) >= 0;
>>>   }
>>>   -static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
>>> -                       bool lazy_coherency)
>>> -{
>>> -    if (!lazy_coherency && req->engine->irq_seqno_barrier)
>>> -        req->engine->irq_seqno_barrier(req->engine);
>>> -    return i915_seqno_passed(req->engine->get_seqno(req->engine),
>>> -                 req->previous_seqno);
>>> -}
>>> -
>>> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>>> -                          bool lazy_coherency)
>>> -{
>>> -    if (!lazy_coherency && req->engine->irq_seqno_barrier)
>>> -        req->engine->irq_seqno_barrier(req->engine);
>>> -    return i915_seqno_passed(req->engine->get_seqno(req->engine),
>>> -                 req->seqno);
>>> -}
>>> -
>>>   int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
>>>   int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>>>   diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index ebef03b..1add29a9 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -1183,6 +1183,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>>   {
>>>       unsigned long timeout;
>>>       unsigned cpu;
>>> +    uint32_t seqno;
>>>         /* When waiting for high frequency requests, e.g. during synchronous
>>>        * rendering split between the CPU and GPU, the finite amount of time
>>> @@ -1198,12 +1199,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>>           return -EBUSY;
>>>         /* Only spin if we know the GPU is processing this request */
>>> -    if (!i915_gem_request_started(req, true))
>>> +    seqno = req->engine->get_seqno(req->engine);
>>> +    if (!i915_seqno_passed(seqno, req->previous_seqno))
>>>           return -EAGAIN;
>>>         timeout = local_clock_us(&cpu) + 5;
>>>       while (!need_resched()) {
>>> -        if (i915_gem_request_completed(req, true))
>>> +        seqno = req->engine->get_seqno(req->engine);
>>> +        if (i915_seqno_passed(seqno, req->seqno))
>>>               return 0;
>>>             if (signal_pending_state(state, current))
>>> @@ -1215,7 +1218,10 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>>           cpu_relax_lowlatency();
>>>       }
>>>   -    if (i915_gem_request_completed(req, false))
>>> +    if (req->engine->irq_seqno_barrier)
>>> +        req->engine->irq_seqno_barrier(req->engine);
>>> +    seqno = req->engine->get_seqno(req->engine);
>>> +    if (i915_seqno_passed(seqno, req->seqno))
>>>           return 0;
>>>         return -EAGAIN;
>>> @@ -2721,12 +2727,14 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>>>       }
>>>   }
>>>   -void i915_gem_request_free(struct kref *req_ref)
>>> +static void i915_gem_request_free(struct fence *req_fence)
>>>   {
>>> -    struct drm_i915_gem_request *req = container_of(req_ref,
>>> -                         typeof(*req), ref);
>>> +    struct drm_i915_gem_request *req = container_of(req_fence,
>>> +                         typeof(*req), fence);
>>>       struct intel_context *ctx = req->ctx;
>>>   +    WARN_ON(!mutex_is_locked(&req->engine->dev->struct_mutex));
>>> +
>>>       if (req->file_priv)
>>>           i915_gem_request_remove_from_client(req);
>>>  
>> Is kmem_cache_free rcu-safe?
>>
>> I don't think it is, and that would cause some hard to debug issues.
>>
>> Adding SLAB_DESTROY_BY_RCU to flags wouldn't do what you would expect here,
>> so your best bet would be to do a call_rcu(&fence->rcu, wrapper_for_kmem_cache_free);
>>
>> ~Maarten
> I don't understand what you mean? Are you referring to the kmem_cache_free that frees the request object at the end of the above function (which you have actually deleted from your reply)? Or are you referring to something inside the i915_gem_request_remove_from_client() call that your comments seem to be in reply to?
>
> If you mean the free of the request itself, then the only usage of that particular kmem_cache are within the driver mutex lock. Does that not make it safe? If you mean the client remove, then where is the kmem_cache_free in that call path?
Just because a function is locked doesn't make it RCU safe. The fence api has function called fence_get_rcu and it requires that the memory backing the fence should be freed with kfree_rcu, call_rcu, or by calling synchronize_rcu before freeing.

In particular kmem_cache_free wouldn't work as intended, which results in another fence possibly re-using the memory.

Needs a __rcu annotated version of https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=59f63caac74bbea817225e134e51ca97ecd06568

~Maarten
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 3/8] drm/i915: Add per context timelines to fence object
  2016-04-20 17:44   ` Chris Wilson
@ 2016-04-21 11:37     ` John Harrison
  0 siblings, 0 replies; 22+ messages in thread
From: John Harrison @ 2016-04-21 11:37 UTC (permalink / raw)
  To: Chris Wilson, Intel-GFX

On 20/04/2016 18:44, Chris Wilson wrote:
> On Wed, Apr 20, 2016 at 06:09:50PM +0100, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The fence object used inside the request structure requires a sequence
>> number. Although this is not used by the i915 driver itself, it could
>> potentially be used by non-i915 code if the fence is passed outside of
>> the driver. This is the intention as it allows external kernel drivers
>> and user applications to wait on batch buffer completion
>> asynchronously via the dma-buff fence API.
>>
>> To ensure that such external users are not confused by strange things
>> happening with the seqno, this patch adds in a per context timeline
>> that can provide a guaranteed in-order seqno value for the fence. This
>> is safe because the scheduler will not re-order batch buffers within a
>> context - they are considered to be mutually dependent.
>>
>> v2: New patch in series.
>>
>> v3: Renamed/retyped timeline structure fields after review comments by
>> Tvrtko Ursulin.
>>
>> Added context information to the timeline's name string for better
>> identification in debugfs output.
>>
>> v5: Line wrapping and other white space fixes to keep style checker
>> happy.
>>
>> v7: Updated to newer nightly (lots of ring -> engine renaming).
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         | 25 +++++++---
>>   drivers/gpu/drm/i915/i915_gem.c         | 83 +++++++++++++++++++++++++++++----
>>   drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++
>>   drivers/gpu/drm/i915/intel_lrc.c        |  8 ++++
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
>>   5 files changed, 114 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index bb18b89..1c3a1ca 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -813,6 +813,15 @@ struct i915_ctx_hang_stats {
>>   	bool banned;
>>   };
>>   
>> +struct i915_fence_timeline {
>> +	char        name[32];
>> +	unsigned    fence_context;
>> +	unsigned    next;
>> +
>> +	struct intel_context *ctx;
>> +	struct intel_engine_cs *engine;
>> +};
>> +
>>   /* This must match up with the value previously used for execbuf2.rsvd1. */
>>   #define DEFAULT_CONTEXT_HANDLE 0
>>   
>> @@ -860,6 +869,7 @@ struct intel_context {
>>   		struct i915_vma *lrc_vma;
>>   		u64 lrc_desc;
>>   		uint32_t *lrc_reg_state;
>> +		struct i915_fence_timeline fence_timeline;
> This is wrong. The timeline is actually a property of the vm, with
> contexts for each engine.
> -Chris
>

I don't get what you mean. The timeline does not have contexts. It is 
just an incrementing number. It could possible be shared between all 
engines of a single software context rather than be specific to the 
hardware context. Although I think it is safer and more future proof to 
be the latter, i.e. per engine per s/w context. I don't see where the vm 
comes into it.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 0/8] Convert requests to use struct fence
  2016-04-20 17:09 [PATCH v7 0/8] Convert requests to use struct fence John.C.Harrison
                   ` (7 preceding siblings ...)
  2016-04-20 17:09 ` [PATCH v7 8/8] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
@ 2016-04-22 15:37 ` John Harrison
  8 siblings, 0 replies; 22+ messages in thread
From: John Harrison @ 2016-04-22 15:37 UTC (permalink / raw)
  To: Intel-GFX

This patch series can be pulled from 
'ssh://people.freedesktop.org/~johnharr/scheduler' as the 'fence' branch.


On 20/04/2016 18:09, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> There is a construct in the linux kernel called 'struct fence' that is
> intended to keep track of work that is executed on hardware. I.e. it
> solves the basic problem that the drivers 'struct
> drm_i915_gem_request' is trying to address. The request structure does
> quite a lot more than simply track the execution progress so is very
> definitely still required. However, the basic completion status side
> could be updated to use the ready made fence implementation and gain
> all the advantages that provides.
>
> Using the struct fence object also has the advantage that the fence
> can be used outside of the i915 driver (by other drivers or by
> userland applications). That is the basis of the dma-buff
> synchronisation API and allows asynchronous tracking of work
> completion. In this case, it allows applications to be signalled
> directly when a batch buffer completes without having to make an IOCTL
> call into the driver.
>
> Note that in order to allow the full fence API to be used (e.g.
> merging multiple fences together), the driver needs to provide an
> incrementing timeline for the fence. Currently this timeline is
> specific to the fence code as it must be per context. There is future
> work planned to make the driver's internal seqno value also be per
> context rather than driver global (VIZ-7443). Once this is done the
> fence specific timeline code can be dropped in favour of just using
> the driver's seqno value.
>
> This is work that was planned since the conversion of the driver from
> being seqno value based to being request structure based. This patch
> series does that work.
>
> An IGT test to exercise the fence support from user land is in
> progress and will follow. Android already makes extensive use of
> fences for display composition. Real world linux usage is planned in
> the form of Jesse's page table sharing / bufferless execbuf support.
> There is also a plan that Wayland (and others) could make use of it in
> a similar manner to Android.
>
> v2: Updated for review comments by various people and to add support
> for Android style 'native sync'.
>
> v3: Updated from review comments by Tvrtko Ursulin. Also moved sync
> framework out of staging and improved request completion handling.
>
> v4: Fixed patch tag (should have been PATCH not RFC). Corrected
> ownership of one patch which had passed through many hands before
> reaching me. Fixed a bug introduced in v3 and updated for review
> comments.
>
> v5: Removed de-staging and further updates to Android sync code. The
> de-stage is now being handled by someone else. The sync integration to
> the i915 driver will be a separate patch set that can only land after
> the external de-stage has been completed.
>
> Assorted changes based on review comments and style checker fixes.
> Most significant change is fixing up the fake lost interrupt support
> for the 'drv_missed_irq_hang' IGT test and improving the wait request
> latency.
>
> v6: Updated to newer nigthly and resolved conflicts around updates
> to the wait_request optimisations.
>
> v7: Updated to newer nightly and resolved conflicts around massive
> ring -> engine rename and interface change to get_seqno(). Also fixed
> up a race condition issue with stale request pointers in file client
> lists and added a minor optimisation to not acquire spinlocks when a
> list is empty and does not need processing.
>
> [Patches against drm-intel-nightly tree fetched 13/04/2016]
>
> John Harrison (8):
>    drm/i915: Convert requests to use struct fence
>    drm/i915: Removed now redudant parameter to i915_gem_request_completed()
>    drm/i915: Add per context timelines to fence object
>    drm/i915: Fix clean up of file client list on execbuff failure
>    drm/i915: Delay the freeing of requests until retire time
>    drm/i915: Interrupt driven fences
>    drm/i915: Updated request structure tracing
>    drm/i915: Cache last IRQ seqno to reduce IRQ overhead
>
>   drivers/gpu/drm/i915/i915_debugfs.c        |   7 +-
>   drivers/gpu/drm/i915/i915_drv.h            |  74 +++--
>   drivers/gpu/drm/i915/i915_gem.c            | 443 ++++++++++++++++++++++++++---
>   drivers/gpu/drm/i915/i915_gem_context.c    |  14 +
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |   6 +-
>   drivers/gpu/drm/i915/i915_irq.c            |   3 +-
>   drivers/gpu/drm/i915/i915_trace.h          |  14 +-
>   drivers/gpu/drm/i915/intel_display.c       |   4 +-
>   drivers/gpu/drm/i915/intel_lrc.c           |  13 +
>   drivers/gpu/drm/i915/intel_pm.c            |   6 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c    |   5 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h    |  12 +
>   12 files changed, 513 insertions(+), 88 deletions(-)
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-04-22 15:37 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-20 17:09 [PATCH v7 0/8] Convert requests to use struct fence John.C.Harrison
2016-04-20 17:09 ` [PATCH v7 1/8] drm/i915: " John.C.Harrison
2016-04-21  7:06   ` Maarten Lankhorst
2016-04-21 10:26     ` John Harrison
2016-04-21 11:12       ` Maarten Lankhorst
2016-04-21  7:09   ` Maarten Lankhorst
2016-04-20 17:09 ` [PATCH v7 2/8] drm/i915: Removed now redudant parameter to i915_gem_request_completed() John.C.Harrison
2016-04-21  7:19   ` Maarten Lankhorst
2016-04-21 10:18     ` John Harrison
2016-04-20 17:09 ` [PATCH v7 3/8] drm/i915: Add per context timelines to fence object John.C.Harrison
2016-04-20 17:44   ` Chris Wilson
2016-04-21 11:37     ` John Harrison
2016-04-20 17:09 ` [PATCH v7 4/8] drm/i915: Fix clean up of file client list on execbuff failure John.C.Harrison
2016-04-21  7:20   ` Maarten Lankhorst
2016-04-21 10:15     ` John Harrison
2016-04-20 17:09 ` [PATCH v7 5/8] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
2016-04-21  7:41   ` Maarten Lankhorst
2016-04-20 17:09 ` [PATCH v7 6/8] drm/i915: Interrupt driven fences John.C.Harrison
2016-04-21  7:38   ` Maarten Lankhorst
2016-04-20 17:09 ` [PATCH v7 7/8] drm/i915: Updated request structure tracing John.C.Harrison
2016-04-20 17:09 ` [PATCH v7 8/8] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
2016-04-22 15:37 ` [PATCH v7 0/8] Convert requests to use struct fence John Harrison

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).