* [PATCH v6 0/7] Convert requests to use struct fence
@ 2016-02-18 14:24 John.C.Harrison
2016-02-18 14:24 ` [PATCH v6 1/7] drm/i915: " John.C.Harrison
` (8 more replies)
0 siblings, 9 replies; 18+ messages in thread
From: John.C.Harrison @ 2016-02-18 14:24 UTC (permalink / raw)
To: Intel-GFX
From: John Harrison <John.C.Harrison@Intel.com>
There is a construct in the Linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. That is, it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track execution progress, so it is
definitely still required. However, the basic completion status side
could be updated to use the ready-made fence implementation and gain
all the advantages that it provides.
Using the struct fence object also has the advantage that the fence
can be used outside of the i915 driver (by other drivers or by
userland applications). That is the basis of the dma-buf
synchronisation API and allows asynchronous tracking of work
completion. In this case, it allows applications to be signalled
directly when a batch buffer completes without having to make an IOCTL
call into the driver.
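To make the gain concrete, the sketch below shows the generic fence API
that another kernel component could use once it holds a reference to
such a fence. This is only an illustration against the stock
<linux/fence.h> interface of this era; the function name is made up and
nothing here is part of this patch series.

  #include <linux/fence.h>
  #include <linux/jiffies.h>

  /* Hypothetical consumer of a fence exported by some producer driver */
  static int example_wait_on_fence(struct fence *fence)
  {
          long ret;

          /* Non-blocking query: maps onto the fence_ops->signaled hook */
          if (fence_is_signaled(fence))
                  return 0;

          /* Blocking, interruptible wait with a one second timeout */
          ret = fence_wait_timeout(fence, true, msecs_to_jiffies(1000));
          if (ret == 0)
                  return -ETIMEDOUT;      /* timed out */
          if (ret < 0)
                  return ret;             /* e.g. -ERESTARTSYS */

          return 0;
  }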
This is work that was planned since the conversion of the driver from
being seqno value based to being request structure based. This patch
series does that work.
An IGT test to exercise the fence support from userland is in
progress and will follow. Android already makes extensive use of
fences for display composition. Real-world Linux usage is planned in
the form of Jesse's page table sharing / bufferless execbuf support.
There is also a plan that Wayland (and others) could make use of it in
a similar manner to Android.
v2: Updated for review comments by various people and to add support
for Android style 'native sync'.
v3: Updated from review comments by Tvrtko Ursulin. Also moved sync
framework out of staging and improved request completion handling.
v4: Fixed patch tag (should have been PATCH not RFC). Corrected
ownership of one patch which had passed through many hands before
reaching me. Fixed a bug introduced in v3 and updated for review
comments.
v5: Removed de-staging and further updates to Android sync code. The
de-stage is now being handled by someone else. The sync integration to
the i915 driver will be a separate patch set that can only land after
the external de-stage has been completed.
Assorted changes based on review comments and style checker fixes.
Most significant change is fixing up the fake lost interrupt support
for the 'drv_missed_irq_hang' IGT test and improving the wait request
latency.
v6: Updated to newer nightly and resolved conflicts around updates
to the wait_request optimisations.
[Patches against drm-intel-nightly tree fetched 19/01/2016]
John Harrison (7):
drm/i915: Convert requests to use struct fence
drm/i915: Removed now redundant parameter to i915_gem_request_completed()
drm/i915: Add per context timelines to fence object
drm/i915: Delay the freeing of requests until retire time
drm/i915: Interrupt driven fences
drm/i915: Updated request structure tracing
drm/i915: Cache last IRQ seqno to reduce IRQ overhead
drivers/gpu/drm/i915/i915_debugfs.c | 7 +-
drivers/gpu/drm/i915/i915_drv.h | 69 +++---
drivers/gpu/drm/i915/i915_gem.c | 423 +++++++++++++++++++++++++++++---
drivers/gpu/drm/i915/i915_gem_context.c | 16 +-
drivers/gpu/drm/i915/i915_irq.c | 2 +-
drivers/gpu/drm/i915/i915_trace.h | 14 +-
drivers/gpu/drm/i915/intel_display.c | 4 +-
drivers/gpu/drm/i915/intel_lrc.c | 13 +
drivers/gpu/drm/i915/intel_pm.c | 6 +-
drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +
drivers/gpu/drm/i915/intel_ringbuffer.h | 12 +
11 files changed, 491 insertions(+), 80 deletions(-)
--
1.9.1
* [PATCH v6 1/7] drm/i915: Convert requests to use struct fence
2016-02-18 14:24 [PATCH v6 0/7] Convert requests to use struct fence John.C.Harrison
@ 2016-02-18 14:24 ` John.C.Harrison
2016-02-18 14:24 ` [PATCH v6 2/7] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
` (7 subsequent siblings)
8 siblings, 0 replies; 18+ messages in thread
From: John.C.Harrison @ 2016-02-18 14:24 UTC (permalink / raw)
To: Intel-GFX
From: John Harrison <John.C.Harrison@Intel.com>
There is a construct in the Linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. That is, it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track execution progress, so it is
definitely still required. However, the basic completion status side
could be updated to use the ready-made fence implementation and gain
all the advantages that it provides.
This patch makes the first step of integrating a struct fence into the
request. It replaces the explicit reference count with that of the
fence. It also replaces the 'is completed' test with the fence's
equivalent. Currently, that simply chains on to the original request
implementation. A future patch will improve this.
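In short, the patch swaps the request's private kref and open-coded
seqno test for the equivalent fence operations. A condensed
before/after view, distilled from the hunks below rather than extra
code:

  /* Before: private reference count plus an explicit seqno comparison */
  kref_get(&req->ref);
  completed = i915_seqno_passed(ring->get_seqno(ring, lazy), req->seqno);
  kref_put(&req->ref, i915_gem_request_free);

  /* After: the embedded fence provides both operations */
  fence_get(&req->fence);
  completed = fence_is_signaled(&req->fence);
  fence_put(&req->fence);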
v3: Updated after review comments by Tvrtko Ursulin. Added fence
context/seqno pair to the debugfs request info. Renamed fence 'driver
name' to just 'i915'. Removed BUG_ONs.
v5: Changed seqno format in debugfs to %x rather than %u as that is
apparently the preferred appearance. Line wrapped some long lines to
keep the style checker happy.
v6: Updated to newer nightly and resolved conflicts. The biggest issue
was with the re-worked busy spin precursor to waiting on a request. In
particular, the addition of a 'request_started' helper function. This
has no corresponding concept within the fence framework. However, it
is only ever used in one place and the whole point of that place is to
always directly read the seqno for the absolute lowest latency
possible. So the simple solution is to make the seqno test explicit at that
point now rather than later in the series (it was previously being
done anyway when fences become interrupt driven).
For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
drivers/gpu/drm/i915/i915_debugfs.c | 5 ++-
drivers/gpu/drm/i915/i915_drv.h | 47 +++++++++++------------
drivers/gpu/drm/i915/i915_gem.c | 67 +++++++++++++++++++++++++++++----
drivers/gpu/drm/i915/intel_lrc.c | 1 +
drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
drivers/gpu/drm/i915/intel_ringbuffer.h | 3 ++
6 files changed, 89 insertions(+), 35 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index ebe7063..d032e9f 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -709,11 +709,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
task = NULL;
if (req->pid)
task = pid_task(req->pid, PIDTYPE_PID);
- seq_printf(m, " %x @ %d: %s [%d]\n",
+ seq_printf(m, " %x @ %d: %s [%d], fence = %x:%x\n",
req->seqno,
(int) (jiffies - req->emitted_jiffies),
task ? task->comm : "<unknown>",
- task ? task->pid : -1);
+ task ? task->pid : -1,
+ req->fence.context, req->fence.seqno);
rcu_read_unlock();
}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 351308f..7c64cc1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -53,6 +53,7 @@
#include <linux/kref.h>
#include <linux/pm_qos.h>
#include "intel_guc.h"
+#include <linux/fence.h>
/* General customization:
*/
@@ -2197,7 +2198,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
* initial reference taken using kref_init
*/
struct drm_i915_gem_request {
- struct kref ref;
+ /**
+ * Underlying object for implementing the signal/wait stuff.
+ * NB: Never call fence_later() or return this fence object to user
+ * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
+ * etc., there is no guarantee at all about the validity or
+ * sequentiality of the fence's seqno! It is also unsafe to let
+ * anything outside of the i915 driver get hold of the fence object
+ * as the clean up when decrementing the reference count requires
+ * holding the driver mutex lock.
+ */
+ struct fence fence;
/** On Which ring this request was generated */
struct drm_i915_private *i915;
@@ -2283,7 +2294,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
struct intel_context *ctx,
struct drm_i915_gem_request **req_out);
void i915_gem_request_cancel(struct drm_i915_gem_request *req);
-void i915_gem_request_free(struct kref *req_ref);
+
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
+ bool lazy_coherency)
+{
+ return fence_is_signaled(&req->fence);
+}
+
int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
struct drm_file *file);
@@ -2303,7 +2320,7 @@ static inline struct drm_i915_gem_request *
i915_gem_request_reference(struct drm_i915_gem_request *req)
{
if (req)
- kref_get(&req->ref);
+ fence_get(&req->fence);
return req;
}
@@ -2311,7 +2328,7 @@ static inline void
i915_gem_request_unreference(struct drm_i915_gem_request *req)
{
WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
- kref_put(&req->ref, i915_gem_request_free);
+ fence_put(&req->fence);
}
static inline void
@@ -2323,7 +2340,7 @@ i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
return;
dev = req->ring->dev;
- if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
+ if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
mutex_unlock(&dev->struct_mutex);
}
@@ -2340,12 +2357,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
}
/*
- * XXX: i915_gem_request_completed should be here but currently needs the
- * definition of i915_seqno_passed() which is below. It will be moved in
- * a later patch when the call to i915_seqno_passed() is obsoleted...
- */
-
-/*
* A command that requires special handling by the command parser.
*/
struct drm_i915_cmd_descriptor {
@@ -2966,20 +2977,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
return (int32_t)(seq1 - seq2) >= 0;
}
-static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
- bool lazy_coherency)
-{
- u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency);
- return i915_seqno_passed(seqno, req->previous_seqno);
-}
-
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
- bool lazy_coherency)
-{
- u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency);
- return i915_seqno_passed(seqno, req->seqno);
-}
-
int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 915b4e2..901be6c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1192,6 +1192,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
{
unsigned long timeout;
unsigned cpu;
+ uint32_t seqno;
/* When waiting for high frequency requests, e.g. during synchronous
* rendering split between the CPU and GPU, the finite amount of time
@@ -1207,12 +1208,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
return -EBUSY;
/* Only spin if we know the GPU is processing this request */
- if (!i915_gem_request_started(req, true))
+ seqno = req->ring->get_seqno(req->ring, true);
+ if (!i915_seqno_passed(seqno, req->previous_seqno))
return -EAGAIN;
timeout = local_clock_us(&cpu) + 5;
while (!need_resched()) {
- if (i915_gem_request_completed(req, true))
+ seqno = req->ring->get_seqno(req->ring, true);
+ if (i915_seqno_passed(seqno, req->seqno))
return 0;
if (signal_pending_state(state, current))
@@ -1224,7 +1227,8 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
cpu_relax_lowlatency();
}
- if (i915_gem_request_completed(req, false))
+ seqno = req->ring->get_seqno(req->ring, false);
+ if (i915_seqno_passed(seqno, req->seqno))
return 0;
return -EAGAIN;
@@ -2679,12 +2683,14 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
}
}
-void i915_gem_request_free(struct kref *req_ref)
+static void i915_gem_request_free(struct fence *req_fence)
{
- struct drm_i915_gem_request *req = container_of(req_ref,
- typeof(*req), ref);
+ struct drm_i915_gem_request *req = container_of(req_fence,
+ typeof(*req), fence);
struct intel_context *ctx = req->ctx;
+ WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
+
if (req->file_priv)
i915_gem_request_remove_from_client(req);
@@ -2700,6 +2706,45 @@ void i915_gem_request_free(struct kref *req_ref)
kmem_cache_free(req->i915->requests, req);
}
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+{
+ /* Interrupt driven fences are not implemented yet.*/
+ WARN(true, "This should not be called!");
+ return true;
+}
+
+static bool i915_gem_request_is_completed(struct fence *req_fence)
+{
+ struct drm_i915_gem_request *req = container_of(req_fence,
+ typeof(*req), fence);
+ u32 seqno;
+
+ seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
+
+ return i915_seqno_passed(seqno, req->seqno);
+}
+
+static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
+{
+ return "i915";
+}
+
+static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
+{
+ struct drm_i915_gem_request *req = container_of(req_fence,
+ typeof(*req), fence);
+ return req->ring->name;
+}
+
+static const struct fence_ops i915_gem_request_fops = {
+ .enable_signaling = i915_gem_request_enable_signaling,
+ .signaled = i915_gem_request_is_completed,
+ .wait = fence_default_wait,
+ .release = i915_gem_request_free,
+ .get_driver_name = i915_gem_request_get_driver_name,
+ .get_timeline_name = i915_gem_request_get_timeline_name,
+};
+
int i915_gem_request_alloc(struct intel_engine_cs *ring,
struct intel_context *ctx,
struct drm_i915_gem_request **req_out)
@@ -2721,7 +2766,6 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
if (ret)
goto err;
- kref_init(&req->ref);
req->i915 = dev_priv;
req->ring = ring;
req->ctx = ctx;
@@ -2736,6 +2780,9 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
goto err;
}
+ fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
+ ring->fence_context, req->seqno);
+
/*
* Reserve space in the ring buffer for all the commands required to
* eventually emit this request. This is to guarantee that the
@@ -4810,7 +4857,7 @@ i915_gem_init_hw(struct drm_device *dev)
{
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_engine_cs *ring;
- int ret, i, j;
+ int ret, i, j, fence_base;
if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
return -EIO;
@@ -4880,12 +4927,16 @@ i915_gem_init_hw(struct drm_device *dev)
if (ret)
goto out;
+ fence_base = fence_context_alloc(I915_NUM_RINGS);
+
/* Now it is safe to go back round and do everything else: */
for_each_ring(ring, dev_priv, i) {
struct drm_i915_gem_request *req;
WARN_ON(!ring->default_context);
+ ring->fence_context = fence_base + i;
+
ret = i915_gem_request_alloc(ring, ring->default_context, &req);
if (ret) {
i915_gem_cleanup_ringbuffer(dev);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index faaf490..4e5195b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2013,6 +2013,7 @@ logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
ring->dev = dev;
INIT_LIST_HEAD(&ring->active_list);
INIT_LIST_HEAD(&ring->request_list);
+ spin_lock_init(&ring->fence_lock);
i915_gem_batch_pool_init(dev, &ring->batch_pool);
init_waitqueue_head(&ring->irq_queue);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index d4e33ac..651b700 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2159,6 +2159,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
INIT_LIST_HEAD(&ring->request_list);
INIT_LIST_HEAD(&ring->execlist_queue);
INIT_LIST_HEAD(&ring->buffers);
+ spin_lock_init(&ring->fence_lock);
i915_gem_batch_pool_init(dev, &ring->batch_pool);
memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index ede5795..22e3d87 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -352,6 +352,9 @@ struct intel_engine_cs {
* to encode the command length in the header).
*/
u32 (*get_cmd_length_mask)(u32 cmd_header);
+
+ unsigned fence_context;
+ spinlock_t fence_lock;
};
static inline bool
--
1.9.1
* [PATCH v6 2/7] drm/i915: Removed now redundant parameter to i915_gem_request_completed()
2016-02-18 14:24 [PATCH v6 0/7] Convert requests to use struct fence John.C.Harrison
2016-02-18 14:24 ` [PATCH v6 1/7] drm/i915: " John.C.Harrison
@ 2016-02-18 14:24 ` John.C.Harrison
2016-02-18 14:24 ` [PATCH v6 3/7] drm/i915: Add per context timelines to fence object John.C.Harrison
` (6 subsequent siblings)
8 siblings, 0 replies; 18+ messages in thread
From: John.C.Harrison @ 2016-02-18 14:24 UTC (permalink / raw)
To: Intel-GFX
From: John Harrison <John.C.Harrison@Intel.com>
The change to the implementation of i915_gem_request_completed() means
that the lazy coherency flag is no longer used. This can now be
removed to simplify the interface.
v6: Updated to newer nightly and resolved conflicts.
For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
drivers/gpu/drm/i915/i915_debugfs.c | 2 +-
drivers/gpu/drm/i915/i915_drv.h | 3 +--
drivers/gpu/drm/i915/i915_gem.c | 14 +++++++-------
drivers/gpu/drm/i915/intel_display.c | 2 +-
drivers/gpu/drm/i915/intel_pm.c | 4 ++--
5 files changed, 12 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index d032e9f..b90d6ea 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -601,7 +601,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
i915_gem_request_get_seqno(work->flip_queued_req),
dev_priv->next_seqno,
ring->get_seqno(ring, true),
- i915_gem_request_completed(work->flip_queued_req, true));
+ i915_gem_request_completed(work->flip_queued_req));
} else
seq_printf(m, "Flip not associated with any ring\n");
seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7c64cc1..86ef0b4 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2295,8 +2295,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
struct drm_i915_gem_request **req_out);
void i915_gem_request_cancel(struct drm_i915_gem_request *req);
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
- bool lazy_coherency)
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
{
return fence_is_signaled(&req->fence);
}
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 901be6c..e170732 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1273,7 +1273,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
if (list_empty(&req->list))
return 0;
- if (i915_gem_request_completed(req, true))
+ if (i915_gem_request_completed(req))
return 0;
timeout_expire = 0;
@@ -1323,7 +1323,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
break;
}
- if (i915_gem_request_completed(req, false)) {
+ if (i915_gem_request_completed(req)) {
ret = 0;
break;
}
@@ -2825,7 +2825,7 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
struct drm_i915_gem_request *request;
list_for_each_entry(request, &ring->request_list, list) {
- if (i915_gem_request_completed(request, false))
+ if (i915_gem_request_completed(request))
continue;
return request;
@@ -2959,7 +2959,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
struct drm_i915_gem_request,
list);
- if (!i915_gem_request_completed(request, true))
+ if (!i915_gem_request_completed(request))
break;
i915_gem_request_retire(request);
@@ -2983,7 +2983,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
}
if (unlikely(ring->trace_irq_req &&
- i915_gem_request_completed(ring->trace_irq_req, true))) {
+ i915_gem_request_completed(ring->trace_irq_req))) {
ring->irq_put(ring);
i915_gem_request_assign(&ring->trace_irq_req, NULL);
}
@@ -3093,7 +3093,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
if (list_empty(&req->list))
goto retire;
- if (i915_gem_request_completed(req, true)) {
+ if (i915_gem_request_completed(req)) {
__i915_gem_request_retire__upto(req);
retire:
i915_gem_object_retire__read(obj, i);
@@ -3205,7 +3205,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
if (to == from)
return 0;
- if (i915_gem_request_completed(from_req, true))
+ if (i915_gem_request_completed(from_req))
return 0;
if (!i915_semaphore_is_enabled(obj->base.dev)) {
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index af4316b..a7bcd87e 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11519,7 +11519,7 @@ static bool __intel_pageflip_stall_check(struct drm_device *dev,
if (work->flip_ready_vblank == 0) {
if (work->flip_queued_req &&
- !i915_gem_request_completed(work->flip_queued_req, true))
+ !i915_gem_request_completed(work->flip_queued_req))
return false;
work->flip_ready_vblank = drm_crtc_vblank_count(crtc);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 6b2b3e3..061c59d 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7306,7 +7306,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
struct request_boost *boost = container_of(work, struct request_boost, work);
struct drm_i915_gem_request *req = boost->req;
- if (!i915_gem_request_completed(req, true))
+ if (!i915_gem_request_completed(req))
gen6_rps_boost(to_i915(req->ring->dev), NULL,
req->emitted_jiffies);
@@ -7322,7 +7322,7 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
if (req == NULL || INTEL_INFO(dev)->gen < 6)
return;
- if (i915_gem_request_completed(req, true))
+ if (i915_gem_request_completed(req))
return;
boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
--
1.9.1
* [PATCH v6 3/7] drm/i915: Add per context timelines to fence object
2016-02-18 14:24 [PATCH v6 0/7] Convert requests to use struct fence John.C.Harrison
2016-02-18 14:24 ` [PATCH v6 1/7] drm/i915: " John.C.Harrison
2016-02-18 14:24 ` [PATCH v6 2/7] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
@ 2016-02-18 14:24 ` John.C.Harrison
2016-02-18 14:49 ` Chris Wilson
2016-02-18 14:24 ` [PATCH v6 4/7] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
` (5 subsequent siblings)
8 siblings, 1 reply; 18+ messages in thread
From: John.C.Harrison @ 2016-02-18 14:24 UTC (permalink / raw)
To: Intel-GFX
From: John Harrison <John.C.Harrison@Intel.com>
The fence object used inside the request structure requires a sequence
number. Although this is not used by the i915 driver itself, it could
potentially be used by non-i915 code if the fence is passed outside of
the driver. This is the intention as it allows external kernel drivers
and user applications to wait on batch buffer completion
asynchronously via the dma-buf fence API.
To ensure that such external users are not confused by strange things
happening with the seqno, this patch adds in a per context timeline
that can provide a guaranteed in-order seqno value for the fence. This
is safe because the scheduler will not re-order batch buffers within a
context - they are considered to be mutually dependent.
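The ordering guarantee this provides matches the generic fence API's
expectations: seqnos are only comparable within a single fence
context, so giving each (context, ring) pair its own context id means
nothing outside the driver ever tries to order fences across
timelines. A minimal sketch of that property using only generic
helpers (the function name is illustrative, not part of this patch):

  #include <linux/fence.h>

  /* Sketch: ordering is only defined within one fence context/timeline */
  static struct fence *pick_later(struct fence *a, struct fence *b)
  {
          if (a->context != b->context)
                  return NULL;    /* different timelines: no ordering defined */

          /*
           * Same timeline: fence_later() picks the fence with the later
           * seqno (or returns NULL if both have already signalled).
           */
          return fence_later(a, b);
  }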
v2: New patch in series.
v3: Renamed/retyped timeline structure fields after review comments by
Tvrtko Ursulin.
Added context information to the timeline's name string for better
identification in debugfs output.
v5: Line wrapping and other white space fixes to keep style checker
happy.
For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 25 +++++++---
drivers/gpu/drm/i915/i915_gem.c | 83 +++++++++++++++++++++++++++++----
drivers/gpu/drm/i915/i915_gem_context.c | 16 ++++++-
drivers/gpu/drm/i915/intel_lrc.c | 8 ++++
drivers/gpu/drm/i915/intel_ringbuffer.h | 1 -
5 files changed, 115 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 86ef0b4..62dbdf2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -845,6 +845,15 @@ struct i915_ctx_hang_stats {
bool banned;
};
+struct i915_fence_timeline {
+ char name[32];
+ unsigned fence_context;
+ unsigned next;
+
+ struct intel_context *ctx;
+ struct intel_engine_cs *ring;
+};
+
/* This must match up with the value previously used for execbuf2.rsvd1. */
#define DEFAULT_CONTEXT_HANDLE 0
@@ -892,6 +901,7 @@ struct intel_context {
struct i915_vma *lrc_vma;
u64 lrc_desc;
uint32_t *lrc_reg_state;
+ struct i915_fence_timeline fence_timeline;
} engine[I915_NUM_RINGS];
struct list_head link;
@@ -2200,13 +2210,10 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
struct drm_i915_gem_request {
/**
* Underlying object for implementing the signal/wait stuff.
- * NB: Never call fence_later() or return this fence object to user
- * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
- * etc., there is no guarantee at all about the validity or
- * sequentiality of the fence's seqno! It is also unsafe to let
- * anything outside of the i915 driver get hold of the fence object
- * as the clean up when decrementing the reference count requires
- * holding the driver mutex lock.
+ * NB: Never return this fence object to user land! It is unsafe to
+ * let anything outside of the i915 driver get hold of the fence
+ * object as the clean up when decrementing the reference count
+ * requires holding the driver mutex lock.
*/
struct fence fence;
@@ -2295,6 +2302,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
struct drm_i915_gem_request **req_out);
void i915_gem_request_cancel(struct drm_i915_gem_request *req);
+int i915_create_fence_timeline(struct drm_device *dev,
+ struct intel_context *ctx,
+ struct intel_engine_cs *ring);
+
static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
{
return fence_is_signaled(&req->fence);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e170732..2d50287 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2731,9 +2731,35 @@ static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
{
- struct drm_i915_gem_request *req = container_of(req_fence,
- typeof(*req), fence);
- return req->ring->name;
+ struct drm_i915_gem_request *req;
+ struct i915_fence_timeline *timeline;
+
+ req = container_of(req_fence, typeof(*req), fence);
+ timeline = &req->ctx->engine[req->ring->id].fence_timeline;
+
+ return timeline->name;
+}
+
+static void i915_gem_request_timeline_value_str(struct fence *req_fence,
+ char *str, int size)
+{
+ struct drm_i915_gem_request *req;
+
+ req = container_of(req_fence, typeof(*req), fence);
+
+ /* Last signalled timeline value ??? */
+ snprintf(str, size, "? [%d]"/*, timeline->value*/,
+ req->ring->get_seqno(req->ring, true));
+}
+
+static void i915_gem_request_fence_value_str(struct fence *req_fence,
+ char *str, int size)
+{
+ struct drm_i915_gem_request *req;
+
+ req = container_of(req_fence, typeof(*req), fence);
+
+ snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
}
static const struct fence_ops i915_gem_request_fops = {
@@ -2743,8 +2769,50 @@ static const struct fence_ops i915_gem_request_fops = {
.release = i915_gem_request_free,
.get_driver_name = i915_gem_request_get_driver_name,
.get_timeline_name = i915_gem_request_get_timeline_name,
+ .fence_value_str = i915_gem_request_fence_value_str,
+ .timeline_value_str = i915_gem_request_timeline_value_str,
};
+int i915_create_fence_timeline(struct drm_device *dev,
+ struct intel_context *ctx,
+ struct intel_engine_cs *ring)
+{
+ struct i915_fence_timeline *timeline;
+
+ timeline = &ctx->engine[ring->id].fence_timeline;
+
+ if (timeline->ring)
+ return 0;
+
+ timeline->fence_context = fence_context_alloc(1);
+
+ /*
+ * Start the timeline from seqno 0 as this is a special value
+ * that is reserved for invalid sync points.
+ */
+ timeline->next = 1;
+ timeline->ctx = ctx;
+ timeline->ring = ring;
+
+ snprintf(timeline->name, sizeof(timeline->name), "%d>%s:%d",
+ timeline->fence_context, ring->name, ctx->user_handle);
+
+ return 0;
+}
+
+static unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
+{
+ unsigned seqno;
+
+ seqno = timeline->next;
+
+ /* Reserve zero for invalid */
+ if (++timeline->next == 0)
+ timeline->next = 1;
+
+ return seqno;
+}
+
int i915_gem_request_alloc(struct intel_engine_cs *ring,
struct intel_context *ctx,
struct drm_i915_gem_request **req_out)
@@ -2781,7 +2849,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
}
fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
- ring->fence_context, req->seqno);
+ ctx->engine[ring->id].fence_timeline.fence_context,
+ i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
/*
* Reserve space in the ring buffer for all the commands required to
@@ -4857,7 +4926,7 @@ i915_gem_init_hw(struct drm_device *dev)
{
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_engine_cs *ring;
- int ret, i, j, fence_base;
+ int ret, i, j;
if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
return -EIO;
@@ -4927,16 +4996,12 @@ i915_gem_init_hw(struct drm_device *dev)
if (ret)
goto out;
- fence_base = fence_context_alloc(I915_NUM_RINGS);
-
/* Now it is safe to go back round and do everything else: */
for_each_ring(ring, dev_priv, i) {
struct drm_i915_gem_request *req;
WARN_ON(!ring->default_context);
- ring->fence_context = fence_base + i;
-
ret = i915_gem_request_alloc(ring, ring->default_context, &req);
if (ret) {
i915_gem_cleanup_ringbuffer(dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 7951140..3dcb2f4 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -271,7 +271,7 @@ i915_gem_create_context(struct drm_device *dev,
{
const bool is_global_default_ctx = file_priv == NULL;
struct intel_context *ctx;
- int ret = 0;
+ int i, ret = 0;
BUG_ON(!mutex_is_locked(&dev->struct_mutex));
@@ -279,6 +279,20 @@ i915_gem_create_context(struct drm_device *dev,
if (IS_ERR(ctx))
return ctx;
+ if (!i915.enable_execlists) {
+ struct intel_engine_cs *ring;
+
+ /* Create a per context timeline for fences */
+ for_each_ring(ring, to_i915(dev), i) {
+ ret = i915_create_fence_timeline(dev, ctx, ring);
+ if (ret) {
+ DRM_ERROR("Fence timeline creation failed for legacy %s: %p\n",
+ ring->name, ctx);
+ goto err_destroy;
+ }
+ }
+ }
+
if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state) {
/* We may need to do things with the shrinker which
* require us to immediately switch back to the default
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4e5195b..ac1f6dc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2515,6 +2515,14 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
goto error_ringbuf;
}
+ /* Create a per context timeline for fences */
+ ret = i915_create_fence_timeline(dev, ctx, ring);
+ if (ret) {
+ DRM_ERROR("Fence timeline creation failed for ring %s, ctx %p\n",
+ ring->name, ctx);
+ goto error_ringbuf;
+ }
+
ctx->engine[ring->id].ringbuf = ringbuf;
ctx->engine[ring->id].state = ctx_obj;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 22e3d87..585a540 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -353,7 +353,6 @@ struct intel_engine_cs {
*/
u32 (*get_cmd_length_mask)(u32 cmd_header);
- unsigned fence_context;
spinlock_t fence_lock;
};
--
1.9.1
* [PATCH v6 4/7] drm/i915: Delay the freeing of requests until retire time
2016-02-18 14:24 [PATCH v6 0/7] Convert requests to use struct fence John.C.Harrison
` (2 preceding siblings ...)
2016-02-18 14:24 ` [PATCH v6 3/7] drm/i915: Add per context timelines to fence object John.C.Harrison
@ 2016-02-18 14:24 ` John.C.Harrison
2016-02-18 14:48 ` Chris Wilson
2016-03-01 15:18 ` Maarten Lankhorst
2016-02-18 14:24 ` [PATCH v6 5/7] drm/i915: Interrupt driven fences John.C.Harrison
` (4 subsequent siblings)
8 siblings, 2 replies; 18+ messages in thread
From: John.C.Harrison @ 2016-02-18 14:24 UTC (permalink / raw)
To: Intel-GFX
From: John Harrison <John.C.Harrison@Intel.com>
The request structure is reference counted. When the count reached
zero, the request was immediately freed and all associated objects
were unreferenced/deallocated. This meant that the driver mutex lock
had to be held at the point where the count reaches zero. This was fine
while all references were held internally to the driver. However, the
plan is to allow the underlying fence object (and hence the request
itself) to be returned to other drivers and to userland. External
users cannot be expected to acquire a driver private mutex lock.
Rather than attempt to disentangle the request structure from the
driver mutex lock, the decision was made to defer the freeing until a
later (safer) point. Hence this patch changes the unreference callback
to merely move the request onto a delayed free list. The driver's
retire worker thread will then process the list and actually call the
free function on the requests.
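Condensed, the new lifecycle looks like the fragment below (pulled
together from the hunks in this patch and simplified; it shows the
shape of the flow rather than standalone code):

  /* Any context, no struct_mutex required: */
  fence_put(&req->fence);         /* may invoke the fence .release callback */

  /* .release (i915_gem_request_release) only queues the request: */
  spin_lock(&ring->delayed_free_lock);
  list_add_tail(&req->delayed_free_link, &ring->delayed_free_list);
  spin_unlock(&ring->delayed_free_lock);
  queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);

  /* Retire worker, with struct_mutex held, does the real teardown: */
  spin_lock(&ring->delayed_free_lock);
  list_splice_init(&ring->delayed_free_list, &list_head);
  spin_unlock(&ring->delayed_free_lock);
  list_for_each_entry_safe(req, req_next, &list_head, delayed_free_link) {
          list_del(&req->delayed_free_link);
          i915_gem_request_free(req);     /* drops ctx ref, frees the slab */
  }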
v2: New patch in series.
v3: Updated after review comments by Tvrtko Ursulin. Rename list nodes
to 'link' rather than 'list'. Update list processing to be more
efficient/safer with respect to spinlocks.
v4: Changed to use basic spinlocks rather than IRQ ones - missed
update from earlier feedback by Tvrtko.
v5: Improved a comment to keep the style checker happy.
For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 22 +++-----------------
drivers/gpu/drm/i915/i915_gem.c | 37 +++++++++++++++++++++++++++++----
drivers/gpu/drm/i915/intel_display.c | 2 +-
drivers/gpu/drm/i915/intel_lrc.c | 2 ++
drivers/gpu/drm/i915/intel_pm.c | 2 +-
drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++
drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++
7 files changed, 49 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 62dbdf2..2c6aefba 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2208,14 +2208,9 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
* initial reference taken using kref_init
*/
struct drm_i915_gem_request {
- /**
- * Underlying object for implementing the signal/wait stuff.
- * NB: Never return this fence object to user land! It is unsafe to
- * let anything outside of the i915 driver get hold of the fence
- * object as the clean up when decrementing the reference count
- * requires holding the driver mutex lock.
- */
+ /** Underlying object for implementing the signal/wait stuff. */
struct fence fence;
+ struct list_head delayed_free_link;
/** On Which ring this request was generated */
struct drm_i915_private *i915;
@@ -2337,21 +2332,10 @@ i915_gem_request_reference(struct drm_i915_gem_request *req)
static inline void
i915_gem_request_unreference(struct drm_i915_gem_request *req)
{
- WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
- fence_put(&req->fence);
-}
-
-static inline void
-i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
-{
- struct drm_device *dev;
-
if (!req)
return;
- dev = req->ring->dev;
- if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
- mutex_unlock(&dev->struct_mutex);
+ fence_put(&req->fence);
}
static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2d50287..aca9fcd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2683,10 +2683,26 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
}
}
-static void i915_gem_request_free(struct fence *req_fence)
+static void i915_gem_request_release(struct fence *req_fence)
{
struct drm_i915_gem_request *req = container_of(req_fence,
typeof(*req), fence);
+ struct intel_engine_cs *ring = req->ring;
+ struct drm_i915_private *dev_priv = to_i915(ring->dev);
+
+ /*
+ * Need to add the request to a deferred dereference list to be
+ * processed at a mutex lock safe time.
+ */
+ spin_lock(&ring->delayed_free_lock);
+ list_add_tail(&req->delayed_free_link, &ring->delayed_free_list);
+ spin_unlock(&ring->delayed_free_lock);
+
+ queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
+}
+
+static void i915_gem_request_free(struct drm_i915_gem_request *req)
+{
struct intel_context *ctx = req->ctx;
WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
@@ -2766,7 +2782,7 @@ static const struct fence_ops i915_gem_request_fops = {
.enable_signaling = i915_gem_request_enable_signaling,
.signaled = i915_gem_request_is_completed,
.wait = fence_default_wait,
- .release = i915_gem_request_free,
+ .release = i915_gem_request_release,
.get_driver_name = i915_gem_request_get_driver_name,
.get_timeline_name = i915_gem_request_get_timeline_name,
.fence_value_str = i915_gem_request_fence_value_str,
@@ -3014,6 +3030,9 @@ void i915_gem_reset(struct drm_device *dev)
void
i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
{
+ struct drm_i915_gem_request *req, *req_next;
+ LIST_HEAD(list_head);
+
WARN_ON(i915_verify_lists(ring->dev));
/* Retire requests first as we use it above for the early return.
@@ -3057,6 +3076,15 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
i915_gem_request_assign(&ring->trace_irq_req, NULL);
}
+ /* Really free any requests that were recently unreferenced */
+ spin_lock(&ring->delayed_free_lock);
+ list_splice_init(&ring->delayed_free_list, &list_head);
+ spin_unlock(&ring->delayed_free_lock);
+ list_for_each_entry_safe(req, req_next, &list_head, delayed_free_link) {
+ list_del(&req->delayed_free_link);
+ i915_gem_request_free(req);
+ }
+
WARN_ON(i915_verify_lists(ring->dev));
}
@@ -3251,7 +3279,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
ret = __i915_wait_request(req[i], reset_counter, true,
args->timeout_ns > 0 ? &args->timeout_ns : NULL,
to_rps_client(file));
- i915_gem_request_unreference__unlocked(req[i]);
+ i915_gem_request_unreference(req[i]);
}
return ret;
@@ -4266,7 +4294,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
if (ret == 0)
queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
- i915_gem_request_unreference__unlocked(target);
+ i915_gem_request_unreference(target);
return ret;
}
@@ -5123,6 +5151,7 @@ init_ring_lists(struct intel_engine_cs *ring)
{
INIT_LIST_HEAD(&ring->active_list);
INIT_LIST_HEAD(&ring->request_list);
+ INIT_LIST_HEAD(&ring->delayed_free_list);
}
void
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index a7bcd87e..6e12ed7 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11456,7 +11456,7 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
mmio_flip->crtc->reset_counter,
false, NULL,
&mmio_flip->i915->rps.mmioflips));
- i915_gem_request_unreference__unlocked(mmio_flip->req);
+ i915_gem_request_unreference(mmio_flip->req);
}
/* For framebuffer backed by dmabuf, wait for fence */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ac1f6dc..fede0dd 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2013,7 +2013,9 @@ logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
ring->dev = dev;
INIT_LIST_HEAD(&ring->active_list);
INIT_LIST_HEAD(&ring->request_list);
+ INIT_LIST_HEAD(&ring->delayed_free_list);
spin_lock_init(&ring->fence_lock);
+ spin_lock_init(&ring->delayed_free_lock);
i915_gem_batch_pool_init(dev, &ring->batch_pool);
init_waitqueue_head(&ring->irq_queue);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 061c59d..18c01b6 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7310,7 +7310,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
gen6_rps_boost(to_i915(req->ring->dev), NULL,
req->emitted_jiffies);
- i915_gem_request_unreference__unlocked(req);
+ i915_gem_request_unreference(req);
kfree(boost);
}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 651b700..ea2235f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2159,7 +2159,9 @@ static int intel_init_ring_buffer(struct drm_device *dev,
INIT_LIST_HEAD(&ring->request_list);
INIT_LIST_HEAD(&ring->execlist_queue);
INIT_LIST_HEAD(&ring->buffers);
+ INIT_LIST_HEAD(&ring->delayed_free_list);
spin_lock_init(&ring->fence_lock);
+ spin_lock_init(&ring->delayed_free_lock);
i915_gem_batch_pool_init(dev, &ring->batch_pool);
memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 585a540..a43247f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -305,6 +305,13 @@ struct intel_engine_cs {
*/
u32 last_submitted_seqno;
+ /*
+ * Deferred free list to allow unreferencing requests from interrupt
+ * contexts and from outside of the i915 driver.
+ */
+ struct list_head delayed_free_list;
+ spinlock_t delayed_free_lock;
+
bool gpu_caches_dirty;
wait_queue_head_t irq_queue;
--
1.9.1
* [PATCH v6 5/7] drm/i915: Interrupt driven fences
2016-02-18 14:24 [PATCH v6 0/7] Convert requests to use struct fence John.C.Harrison
` (3 preceding siblings ...)
2016-02-18 14:24 ` [PATCH v6 4/7] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
@ 2016-02-18 14:24 ` John.C.Harrison
2016-02-18 14:24 ` [PATCH v6 6/7] drm/i915: Updated request structure tracing John.C.Harrison
` (3 subsequent siblings)
8 siblings, 0 replies; 18+ messages in thread
From: John.C.Harrison @ 2016-02-18 14:24 UTC (permalink / raw)
To: Intel-GFX
From: John Harrison <John.C.Harrison@Intel.com>
The intended usage model for struct fence is that the signalled status
should be set on demand rather than polled. That is, there should not
be a need for a 'signaled' function to be called every time the status
is queried. Instead, 'something' should be done to enable a signal
callback from the hardware which will update the state directly. In
the case of requests, this is the seqno update interrupt. The idea is
that this callback will only be enabled on demand when something
actually tries to wait on the fence.
This change removes the polling test and replaces it with the callback
scheme. Each fence is added to a 'please poke me' list at the start of
i915_add_request(). The interrupt handler then scans through the 'poke
me' list when a new seqno pops out and signals any matching
fence/request. The fence is then removed from the list so the entire
request stack does not need to be scanned every time. Note that the
fence is added to the list before the commands to generate the seqno
interrupt are added to the ring. Thus the sequence is guaranteed to be
race free if the interrupt is already enabled.
Note that the interrupt is only enabled on demand (i.e. when
__wait_request() is called). Thus there is still a potential race when
enabling the interrupt as the request may already have completed.
However, this is simply solved by calling the interrupt processing
code immediately after enabling the interrupt and thereby checking for
already completed requests.
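Schematically, the pattern used here is "enable, then immediately
re-check", so an interrupt that fired before anyone was listening is
never lost. A condensed sketch (the function name is illustrative; the
irq reference bookkeeping and test_irq_rings handling are omitted):

  static void enable_and_recheck(struct drm_i915_gem_request *req,
                                 bool fence_locked)
  {
          /* Ask for seqno interrupts from the ring, once */
          if (!req->irq_enabled && req->ring->irq_get(req->ring))
                  req->irq_enabled = true;

          /*
           * The request may already have completed before the interrupt
           * was enabled, so scan the signal list right now as well.
           */
          i915_gem_request_notify(req->ring, fence_locked);
  }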
Lastly, the ring clean up code may need to cancel outstanding
requests (e.g. because TDR has reset the ring). These requests will
never get signalled and so must be removed from the signal list
manually. This is done by setting a 'cancelled' flag and then calling
the regular notify/retire code path rather than attempting to
duplicate the list manipulation and clean up code in multiple places.
This also avoids any race condition where the cancellation might
occur after or during the arrival of the completion interrupt.
v2: Updated to take advantage of the request unreference no longer
requiring the mutex lock.
v3: Move the signal list processing around to prevent unsubmitted
requests being added to the list. This was occurring on Android
because the native sync implementation calls the
fence->enable_signalling API immediately on fence creation.
Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
'link' instead of 'list'. Added support for returning an error code on
a cancelled fence. Update list processing to be more efficient/safer
with respect to spinlocks.
v5: Made i915_gem_request_submit a static as it is only ever called
from one place.
Fixed up the low latency wait optimisation. The time delay between the
seqno value being written to memory and the driver's ISR running can be
significant, at least for the wait request micro-benchmark. This can
be greatly improved by explicitly checking for seqno updates in the
pre-wait busy poll loop. Also added some documentation comments to the
busy poll code.
Fixed up support for the faking of lost interrupts
(test_irq_rings/missed_irq_rings). That is, there is an IGT test that
tells the driver to lose interrupts deliberately and then checks that
everything still works as expected (albeit much slower).
Updates from review comments: use non-irqsave spinlocking, early exit
on WARN and improved comments (Tvrtko Ursulin).
v6: Updated to newer nightly and resolved conflicts around the
wait_request busy spin optimisation. Also fixed a race condition
between this early exit path and the regular completion path.
For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 8 ++
drivers/gpu/drm/i915/i915_gem.c | 240 +++++++++++++++++++++++++++++---
drivers/gpu/drm/i915/i915_irq.c | 2 +
drivers/gpu/drm/i915/intel_lrc.c | 2 +
drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +
drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +
6 files changed, 234 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2c6aefba..0584846 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2210,7 +2210,12 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
struct drm_i915_gem_request {
/** Underlying object for implementing the signal/wait stuff. */
struct fence fence;
+ struct list_head signal_link;
+ struct list_head unsignal_link;
struct list_head delayed_free_link;
+ bool cancelled;
+ bool irq_enabled;
+ bool signal_requested;
/** On Which ring this request was generated */
struct drm_i915_private *i915;
@@ -2296,6 +2301,9 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
struct intel_context *ctx,
struct drm_i915_gem_request **req_out);
void i915_gem_request_cancel(struct drm_i915_gem_request *req);
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
+ bool fence_locked);
+void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
int i915_create_fence_timeline(struct drm_device *dev,
struct intel_context *ctx,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index aca9fcd..635729e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -40,6 +40,8 @@
#define RQ_BUG_ON(expr)
+static void i915_gem_request_submit(struct drm_i915_gem_request *req);
+
static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
static void
@@ -1260,9 +1262,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
struct intel_engine_cs *ring = i915_gem_request_get_ring(req);
struct drm_device *dev = ring->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
- const bool irq_test_in_progress =
- ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_ring_flag(ring);
int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+ uint32_t seqno;
DEFINE_WAIT(wait);
unsigned long timeout_expire;
s64 before = 0; /* Only to silence a compiler warning. */
@@ -1270,9 +1271,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
- if (list_empty(&req->list))
- return 0;
-
if (i915_gem_request_completed(req))
return 0;
@@ -1298,15 +1296,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
trace_i915_gem_request_wait_begin(req);
/* Optimistic spin for the next jiffie before touching IRQs */
- ret = __i915_spin_request(req, state);
- if (ret == 0)
- goto out;
-
- if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring))) {
- ret = -ENODEV;
- goto out;
+ if (req->seqno) {
+ ret = __i915_spin_request(req, state);
+ if (ret == 0)
+ goto out;
}
+ /*
+ * Enable interrupt completion of the request.
+ */
+ fence_enable_sw_signaling(&req->fence);
+
for (;;) {
struct timer_list timer;
@@ -1328,6 +1328,19 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
break;
}
+ if (req->seqno) {
+ /*
+ * There is quite a lot of latency in the user interrupt
+ * path. So do an explicit seqno check and potentially
+ * remove all that delay.
+ */
+ seqno = ring->get_seqno(ring, false);
+ if (i915_seqno_passed(seqno, req->seqno)) {
+ ret = 0;
+ break;
+ }
+ }
+
if (signal_pending_state(state, current)) {
ret = -ERESTARTSYS;
break;
@@ -1354,14 +1367,30 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
destroy_timer_on_stack(&timer);
}
}
- if (!irq_test_in_progress)
- ring->irq_put(ring);
finish_wait(&ring->irq_queue, &wait);
out:
trace_i915_gem_request_wait_end(req);
+ if ((ret == 0) && (req->seqno)) {
+ seqno = ring->get_seqno(ring, false);
+ if (i915_seqno_passed(seqno, req->seqno) &&
+ !i915_gem_request_completed(req)) {
+ /*
+ * Make sure the request is marked as completed before
+ * returning. NB: Need to acquire the spinlock around
+ * the whole call to avoid a race condition with the
+ * interrupt handler is running concurrently and could
+ * cause this invocation to early exit even though the
+ * request has not actually been fully processed yet.
+ */
+ spin_lock_irq(&req->ring->fence_lock);
+ i915_gem_request_notify(req->ring, true);
+ spin_unlock_irq(&req->ring->fence_lock);
+ }
+ }
+
if (timeout) {
s64 tres = *timeout - (ktime_get_raw_ns() - before);
@@ -1442,6 +1471,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
list_del_init(&request->list);
i915_gem_request_remove_from_client(request);
+ /*
+ * In case the request is still in the signal pending list,
+ * e.g. due to being cancelled by TDR, preemption, etc.
+ */
+ if (!list_empty(&request->signal_link)) {
+ /*
+ * The request must be marked as cancelled and the underlying
+ * fence as failed. NB: There is no explicit fence fail API,
+ * there is only a manual poke and signal.
+ */
+ request->cancelled = true;
+ /* How to propagate to any associated sync_fence??? */
+ request->fence.status = -EIO;
+ fence_signal_locked(&request->fence);
+ }
+
i915_gem_request_unreference(request);
}
@@ -2600,6 +2645,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
*/
request->postfix = intel_ring_get_tail(ringbuf);
+ /*
+ * Add the fence to the pending list before emitting the commands to
+ * generate a seqno notification interrupt.
+ */
+ i915_gem_request_submit(request);
+
if (i915.enable_execlists)
ret = ring->emit_request(request);
else {
@@ -2719,25 +2770,140 @@ static void i915_gem_request_free(struct drm_i915_gem_request *req)
i915_gem_context_unreference(ctx);
}
+ if (req->irq_enabled)
+ req->ring->irq_put(req->ring);
+
kmem_cache_free(req->i915->requests, req);
}
-static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+/*
+ * The request is about to be submitted to the hardware so add the fence to
+ * the list of signalable fences.
+ *
+ * NB: This does not necessarily enable interrupts yet. That only occurs on
+ * demand when the request is actually waited on. However, adding it to the
+ * list early ensures that there is no race condition where the interrupt
+ * could pop out prematurely and thus be completely lost. The race is merely
+ * that the interrupt must be manually checked for after being enabled.
+ */
+static void i915_gem_request_submit(struct drm_i915_gem_request *req)
{
- /* Interrupt driven fences are not implemented yet.*/
- WARN(true, "This should not be called!");
- return true;
+ /*
+ * Always enable signal processing for the request's fence object
+ * before that request is submitted to the hardware. Thus there is no
+ * race condition whereby the interrupt could pop out before the
+ * request has been added to the signal list. Hence no need to check
+ * for completion, undo the list add and return false.
+ */
+ i915_gem_request_reference(req);
+ spin_lock_irq(&req->ring->fence_lock);
+ WARN_ON(!list_empty(&req->signal_link));
+ list_add_tail(&req->signal_link, &req->ring->fence_signal_list);
+ spin_unlock_irq(&req->ring->fence_lock);
+
+ /*
+ * NB: Interrupts are only enabled on demand. Thus there is still a
+ * race where the request could complete before the interrupt has
+ * been enabled. Thus care must be taken at that point.
+ */
+
+ /* Have interrupts already been requested? */
+ if (req->signal_requested)
+ i915_gem_request_enable_interrupt(req, false);
}
-static bool i915_gem_request_is_completed(struct fence *req_fence)
+/*
+ * The request is being actively waited on, so enable interrupt based
+ * completion signalling.
+ */
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
+ bool fence_locked)
+{
+ struct drm_i915_private *dev_priv = to_i915(req->ring->dev);
+ const bool irq_test_in_progress =
+ ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
+ intel_ring_flag(req->ring);
+
+ if (req->irq_enabled)
+ return;
+
+ if (irq_test_in_progress)
+ return;
+
+ if (!WARN_ON(!req->ring->irq_get(req->ring)))
+ req->irq_enabled = true;
+
+ /*
+ * Because the interrupt is only enabled on demand, there is a race
+ * where the interrupt can fire before anyone is looking for it. So
+ * do an explicit check for missed interrupts.
+ */
+ i915_gem_request_notify(req->ring, fence_locked);
+}
+
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
{
struct drm_i915_gem_request *req = container_of(req_fence,
typeof(*req), fence);
+
+ /*
+ * No need to actually enable interrupt based processing until the
+ * request has been submitted to the hardware. At which point
+ * 'i915_gem_request_submit()' is called. So only really enable
+ * signalling in there. Just set a flag to say that interrupts are
+ * wanted when the request is eventually submitted. On the other hand
+ * if the request has already been submitted then interrupts do need
+ * to be enabled now.
+ */
+
+ req->signal_requested = true;
+
+ if (!list_empty(&req->signal_link))
+ i915_gem_request_enable_interrupt(req, true);
+
+ return true;
+}
+
+void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
+{
+ struct drm_i915_gem_request *req, *req_next;
+ unsigned long flags;
u32 seqno;
- seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
+ if (list_empty(&ring->fence_signal_list))
+ return;
+
+ if (!fence_locked)
+ spin_lock_irqsave(&ring->fence_lock, flags);
+
+ seqno = ring->get_seqno(ring, false);
+
+ list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
+ if (!req->cancelled) {
+ if (!i915_seqno_passed(seqno, req->seqno))
+ break;
+ }
+
+ /*
+ * Start by removing the fence from the signal list otherwise
+ * the retire code can run concurrently and get confused.
+ */
+ list_del_init(&req->signal_link);
+
+ if (!req->cancelled)
+ fence_signal_locked(&req->fence);
+
+ if (req->irq_enabled) {
+ req->ring->irq_put(req->ring);
+ req->irq_enabled = false;
+ }
- return i915_seqno_passed(seqno, req->seqno);
+ /* Can't unreference here because that might grab fence_lock */
+ list_add_tail(&req->unsignal_link, &ring->fence_unsignal_list);
+ }
+
+ if (!fence_locked)
+ spin_unlock_irqrestore(&ring->fence_lock, flags);
}
static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
@@ -2780,7 +2946,6 @@ static void i915_gem_request_fence_value_str(struct fence *req_fence,
static const struct fence_ops i915_gem_request_fops = {
.enable_signaling = i915_gem_request_enable_signaling,
- .signaled = i915_gem_request_is_completed,
.wait = fence_default_wait,
.release = i915_gem_request_release,
.get_driver_name = i915_gem_request_get_driver_name,
@@ -2864,6 +3029,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
goto err;
}
+ INIT_LIST_HEAD(&req->signal_link);
fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
ctx->engine[ring->id].fence_timeline.fence_context,
i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
@@ -2901,6 +3067,11 @@ void i915_gem_request_cancel(struct drm_i915_gem_request *req)
{
intel_ring_reserved_space_cancel(req->ringbuf);
+ req->cancelled = true;
+ /* How to propagate to any associated sync_fence??? */
+ req->fence.status = -EINVAL;
+ fence_signal_locked(&req->fence);
+
i915_gem_request_unreference(req);
}
@@ -2987,6 +3158,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
i915_gem_request_retire(request);
}
+ /*
+ * Tidy up anything left over. This includes a call to
+ * i915_gem_request_notify() which will make sure that any requests
+ * that were on the signal pending list get also cleaned up.
+ */
+ i915_gem_retire_requests_ring(ring);
+
/* Having flushed all requests from all queues, we know that all
* ringbuffers must now be empty. However, since we do not reclaim
* all space when retiring the request (to prevent HEADs colliding
@@ -3035,6 +3213,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
WARN_ON(i915_verify_lists(ring->dev));
+ /*
+ * If no-one has waited on a request recently then interrupts will
+ * not have been enabled and thus no requests will ever be marked as
+ * completed. So do an interrupt check now.
+ */
+ i915_gem_request_notify(ring, false);
+
/* Retire requests first as we use it above for the early return.
* If we retire requests last, we may use a later seqno and so clear
* the requests lists without clearing the active list, leading to
@@ -3076,6 +3261,15 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
i915_gem_request_assign(&ring->trace_irq_req, NULL);
}
+ /* Tidy up any requests that were recently signalled */
+ spin_lock_irq(&ring->fence_lock);
+ list_splice_init(&ring->fence_unsignal_list, &list_head);
+ spin_unlock_irq(&ring->fence_lock);
+ list_for_each_entry_safe(req, req_next, &list_head, unsignal_link) {
+ list_del(&req->unsignal_link);
+ i915_gem_request_unreference(req);
+ }
+
/* Really free any requests that were recently unreferenced */
spin_lock(&ring->delayed_free_lock);
list_splice_init(&ring->delayed_free_list, &list_head);
@@ -5151,6 +5345,8 @@ init_ring_lists(struct intel_engine_cs *ring)
{
INIT_LIST_HEAD(&ring->active_list);
INIT_LIST_HEAD(&ring->request_list);
+ INIT_LIST_HEAD(&ring->fence_signal_list);
+ INIT_LIST_HEAD(&ring->fence_unsignal_list);
INIT_LIST_HEAD(&ring->delayed_free_list);
}
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 6e2cd9f..a5f64aa 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1001,6 +1001,8 @@ static void notify_ring(struct intel_engine_cs *ring)
trace_i915_gem_request_notify(ring);
+ i915_gem_request_notify(ring, false);
+
wake_up_all(&ring->irq_queue);
}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fede0dd..5657924 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2013,6 +2013,8 @@ logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
ring->dev = dev;
INIT_LIST_HEAD(&ring->active_list);
INIT_LIST_HEAD(&ring->request_list);
+ INIT_LIST_HEAD(&ring->fence_signal_list);
+ INIT_LIST_HEAD(&ring->fence_unsignal_list);
INIT_LIST_HEAD(&ring->delayed_free_list);
spin_lock_init(&ring->fence_lock);
spin_lock_init(&ring->delayed_free_lock);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ea2235f..9d4f19d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2159,6 +2159,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
INIT_LIST_HEAD(&ring->request_list);
INIT_LIST_HEAD(&ring->execlist_queue);
INIT_LIST_HEAD(&ring->buffers);
+ INIT_LIST_HEAD(&ring->fence_signal_list);
+ INIT_LIST_HEAD(&ring->fence_unsignal_list);
INIT_LIST_HEAD(&ring->delayed_free_list);
spin_lock_init(&ring->fence_lock);
spin_lock_init(&ring->delayed_free_lock);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index a43247f..6a7968b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -361,6 +361,8 @@ struct intel_engine_cs {
u32 (*get_cmd_length_mask)(u32 cmd_header);
spinlock_t fence_lock;
+ struct list_head fence_signal_list;
+ struct list_head fence_unsignal_list;
};
static inline bool
--
1.9.1
* [PATCH v6 6/7] drm/i915: Updated request structure tracing
2016-02-18 14:24 [PATCH v6 0/7] Convert requests to use struct fence John.C.Harrison
` (4 preceding siblings ...)
2016-02-18 14:24 ` [PATCH v6 5/7] drm/i915: Interrupt driven fences John.C.Harrison
@ 2016-02-18 14:24 ` John.C.Harrison
2016-02-18 14:24 ` [PATCH v6 7/7] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
` (2 subsequent siblings)
8 siblings, 0 replies; 18+ messages in thread
From: John.C.Harrison @ 2016-02-18 14:24 UTC (permalink / raw)
To: Intel-GFX
From: John Harrison <John.C.Harrison@Intel.com>
Added the '_complete' trace event which occurs when a fence/request is
signaled as complete. Also moved the notify event from the IRQ handler
code to inside the notify function itself.
v3: Added the current ring seqno to the notify trace point.
v5: Line wrapping to keep the style checker happy.
For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 9 +++++++--
drivers/gpu/drm/i915/i915_irq.c | 2 --
drivers/gpu/drm/i915/i915_trace.h | 14 +++++++++-----
3 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 635729e..f7858ea 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2870,13 +2870,16 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
unsigned long flags;
u32 seqno;
- if (list_empty(&ring->fence_signal_list))
+ if (list_empty(&ring->fence_signal_list)) {
+ trace_i915_gem_request_notify(ring, 0);
return;
+ }
if (!fence_locked)
spin_lock_irqsave(&ring->fence_lock, flags);
seqno = ring->get_seqno(ring, false);
+ trace_i915_gem_request_notify(ring, seqno);
list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
if (!req->cancelled) {
@@ -2890,8 +2893,10 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
*/
list_del_init(&req->signal_link);
- if (!req->cancelled)
+ if (!req->cancelled) {
fence_signal_locked(&req->fence);
+ trace_i915_gem_request_complete(req);
+ }
if (req->irq_enabled) {
req->ring->irq_put(req->ring);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index a5f64aa..20c6a90 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -999,8 +999,6 @@ static void notify_ring(struct intel_engine_cs *ring)
if (!intel_ring_initialized(ring))
return;
- trace_i915_gem_request_notify(ring);
-
i915_gem_request_notify(ring, false);
wake_up_all(&ring->irq_queue);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 52b2d40..cfe4f03 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -561,23 +561,27 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
);
TRACE_EVENT(i915_gem_request_notify,
- TP_PROTO(struct intel_engine_cs *ring),
- TP_ARGS(ring),
+ TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno),
+ TP_ARGS(ring, seqno),
TP_STRUCT__entry(
__field(u32, dev)
__field(u32, ring)
__field(u32, seqno)
+ __field(bool, is_empty)
),
TP_fast_assign(
__entry->dev = ring->dev->primary->index;
__entry->ring = ring->id;
- __entry->seqno = ring->get_seqno(ring, false);
+ __entry->seqno = seqno;
+ __entry->is_empty =
+ list_empty(&ring->fence_signal_list);
),
- TP_printk("dev=%u, ring=%u, seqno=%u",
- __entry->dev, __entry->ring, __entry->seqno)
+ TP_printk("dev=%u, ring=%u, seqno=%u, empty=%d",
+ __entry->dev, __entry->ring, __entry->seqno,
+ __entry->is_empty)
);
DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
--
1.9.1
* [PATCH v6 7/7] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
2016-02-18 14:24 [PATCH v6 0/7] Convert requests to use struct fence John.C.Harrison
` (5 preceding siblings ...)
2016-02-18 14:24 ` [PATCH v6 6/7] drm/i915: Updated request structure tracing John.C.Harrison
@ 2016-02-18 14:24 ` John.C.Harrison
2016-02-18 14:51 ` [PATCH v6 0/7] Convert requests to use struct fence Chris Wilson
2016-02-18 15:01 ` ✗ Fi.CI.BAT: failure for Convert requests to use struct fence (rev3) Patchwork
8 siblings, 0 replies; 18+ messages in thread
From: John.C.Harrison @ 2016-02-18 14:24 UTC (permalink / raw)
To: Intel-GFX
From: John Harrison <John.C.Harrison@Intel.com>
The notify function can be called many times without the seqno
changing. Many of these duplicate calls exist to close races caused by
interrupts only being enabled on demand. In addition, once interrupts
are enabled the IRQ handler can be called multiple times without the
ring's seqno value changing. This patch reduces the overhead of these
extra calls by caching the last processed seqno value and exiting
early if it has not changed.
v3: New patch for series.
v5: Added comment about last_irq_seqno usage due to code review
feedback (Tvrtko Ursulin).
v6: Minor update to resolve a race condition with the wait_request
optimisation.
For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 21 +++++++++++++++++++--
drivers/gpu/drm/i915/intel_ringbuffer.h | 1 +
2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f7858ea..72a37d6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1386,6 +1386,7 @@ out:
* request has not actually been fully processed yet.
*/
spin_lock_irq(&req->ring->fence_lock);
+ req->ring->last_irq_seqno = 0;
i915_gem_request_notify(req->ring, true);
spin_unlock_irq(&req->ring->fence_lock);
}
@@ -2543,6 +2544,8 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++)
ring->semaphore.sync_seqno[j] = 0;
+
+ ring->last_irq_seqno = 0;
}
return 0;
@@ -2875,11 +2878,22 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
return;
}
+ /*
+ * Check for a new seqno. If it hasn't actually changed then early
+ * exit without even grabbing the spinlock. Note that this is safe
+ * because any corruption of last_irq_seqno merely results in doing
+ * the full processing when there is potentially no work to be done.
+ * It can never lead to not processing work that does need to happen.
+ */
+ seqno = ring->get_seqno(ring, false);
+ trace_i915_gem_request_notify(ring, seqno);
+ if (seqno == ring->last_irq_seqno)
+ return;
+
if (!fence_locked)
spin_lock_irqsave(&ring->fence_lock, flags);
- seqno = ring->get_seqno(ring, false);
- trace_i915_gem_request_notify(ring, seqno);
+ ring->last_irq_seqno = seqno;
list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
if (!req->cancelled) {
@@ -3167,7 +3181,10 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
* Tidy up anything left over. This includes a call to
* i915_gem_request_notify() which will make sure that any requests
* that were on the signal pending list get also cleaned up.
+ * NB: The seqno cache must be cleared otherwise the notify call will
+ * simply return immediately.
*/
+ ring->last_irq_seqno = 0;
i915_gem_retire_requests_ring(ring);
/* Having flushed all requests from all queues, we know that all
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 6a7968b..ada93a9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -363,6 +363,7 @@ struct intel_engine_cs {
spinlock_t fence_lock;
struct list_head fence_signal_list;
struct list_head fence_unsignal_list;
+ uint32_t last_irq_seqno;
};
static inline bool
--
1.9.1
* Re: [PATCH v6 4/7] drm/i915: Delay the freeing of requests until retire time
2016-02-18 14:24 ` [PATCH v6 4/7] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
@ 2016-02-18 14:48 ` Chris Wilson
2016-02-22 14:34 ` John Harrison
2016-03-01 15:18 ` Maarten Lankhorst
1 sibling, 1 reply; 18+ messages in thread
From: Chris Wilson @ 2016-02-18 14:48 UTC (permalink / raw)
To: John.C.Harrison; +Cc: Intel-GFX
On Thu, Feb 18, 2016 at 02:24:07PM +0000, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
As I said, and have shown in patches several months ago, just fix the
underlying bug to remove the struct_mutex requirement for freeing the
request.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH v6 3/7] drm/i915: Add per context timelines to fence object
2016-02-18 14:24 ` [PATCH v6 3/7] drm/i915: Add per context timelines to fence object John.C.Harrison
@ 2016-02-18 14:49 ` Chris Wilson
2016-02-22 14:33 ` John Harrison
0 siblings, 1 reply; 18+ messages in thread
From: Chris Wilson @ 2016-02-18 14:49 UTC (permalink / raw)
To: John.C.Harrison; +Cc: Intel-GFX
On Thu, Feb 18, 2016 at 02:24:06PM +0000, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The fence object used inside the request structure requires a sequence
> number. Although this is not used by the i915 driver itself, it could
> potentially be used by non-i915 code if the fence is passed outside of
> the driver. This is the intention as it allows external kernel drivers
> and user applications to wait on batch buffer completion
> asynchronously via the dma-buf fence API.
>
> To ensure that such external users are not confused by strange things
> happening with the seqno, this patch adds in a per context timeline
> that can provide a guaranteed in-order seqno value for the fence. This
> is safe because the scheduler will not re-order batch buffers within a
> context - they are considered to be mutually dependent.
This is still nonsense. Just implement per-context seqno.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH v6 0/7] Convert requests to use struct fence
2016-02-18 14:24 [PATCH v6 0/7] Convert requests to use struct fence John.C.Harrison
` (6 preceding siblings ...)
2016-02-18 14:24 ` [PATCH v6 7/7] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
@ 2016-02-18 14:51 ` Chris Wilson
2016-02-22 14:36 ` John Harrison
2016-02-18 15:01 ` ✗ Fi.CI.BAT: failure for Convert requests to use struct fence (rev3) Patchwork
8 siblings, 1 reply; 18+ messages in thread
From: Chris Wilson @ 2016-02-18 14:51 UTC (permalink / raw)
To: John.C.Harrison; +Cc: Intel-GFX
On Thu, Feb 18, 2016 at 02:24:03PM +0000, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
Does this pass igt? If so, which are the bug fixes for the current
regressions from the request conversion?
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* ✗ Fi.CI.BAT: failure for Convert requests to use struct fence (rev3)
2016-02-18 14:24 [PATCH v6 0/7] Convert requests to use struct fence John.C.Harrison
` (7 preceding siblings ...)
2016-02-18 14:51 ` [PATCH v6 0/7] Convert requests to use struct fence Chris Wilson
@ 2016-02-18 15:01 ` Patchwork
8 siblings, 0 replies; 18+ messages in thread
From: Patchwork @ 2016-02-18 15:01 UTC (permalink / raw)
To: John.C.Harrison; +Cc: intel-gfx
== Summary ==
Series 1068v3 Convert requests to use struct fence
2016-02-18T14:23:35.102012 http://patchwork.freedesktop.org/api/1.0/series/1068/revisions/3/mbox/
Applying: drm/i915: Convert requests to use struct fence
Repository lacks necessary blobs to fall back on 3-way merge.
Cannot fall back to three-way merge.
Patch failed at 0001 drm/i915: Convert requests to use struct fence
* Re: [PATCH v6 3/7] drm/i915: Add per context timelines to fence object
2016-02-18 14:49 ` Chris Wilson
@ 2016-02-22 14:33 ` John Harrison
2016-02-24 10:45 ` Maarten Lankhorst
0 siblings, 1 reply; 18+ messages in thread
From: John Harrison @ 2016-02-22 14:33 UTC (permalink / raw)
To: Chris Wilson, Intel-GFX
On 18/02/2016 14:49, Chris Wilson wrote:
> On Thu, Feb 18, 2016 at 02:24:06PM +0000, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The fence object used inside the request structure requires a sequence
>> number. Although this is not used by the i915 driver itself, it could
>> potentially be used by non-i915 code if the fence is passed outside of
>> the driver. This is the intention as it allows external kernel drivers
>> and user applications to wait on batch buffer completion
>> asynchronously via the dma-buf fence API.
>>
>> To ensure that such external users are not confused by strange things
>> happening with the seqno, this patch adds in a per context timeline
>> that can provide a guaranteed in-order seqno value for the fence. This
>> is safe because the scheduler will not re-order batch buffers within a
>> context - they are considered to be mutually dependent.
> This is still nonsense. Just implement per-context seqno.
If you already have a set of patches to implement per-context seqno then
let's get them merged. Otherwise, that is follow-up work to be done once
the scheduler has landed. There has already been too much churn and
delay. So the decision is to get the scheduler in as soon as possible
and any 'could do better' issues should be logged for follow-up work.
> -Chris
>
* Re: [PATCH v6 4/7] drm/i915: Delay the freeing of requests until retire time
2016-02-18 14:48 ` Chris Wilson
@ 2016-02-22 14:34 ` John Harrison
0 siblings, 0 replies; 18+ messages in thread
From: John Harrison @ 2016-02-22 14:34 UTC (permalink / raw)
To: Chris Wilson, Intel-GFX
On 18/02/2016 14:48, Chris Wilson wrote:
> On Thu, Feb 18, 2016 at 02:24:07PM +0000, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
> As I said, and have shown in patches several months ago, just fix the
> underlying bug to remove the struct_mutex requirement for freeing the
> request.
If you have a stand-alone set of patches to implement this then please
post them and let's get them merged.
> -Chris
>
* Re: [PATCH v6 0/7] Convert requests to use struct fence
2016-02-18 14:51 ` [PATCH v6 0/7] Convert requests to use struct fence Chris Wilson
@ 2016-02-22 14:36 ` John Harrison
0 siblings, 0 replies; 18+ messages in thread
From: John Harrison @ 2016-02-22 14:36 UTC (permalink / raw)
To: Chris Wilson, Intel-GFX
On 18/02/2016 14:51, Chris Wilson wrote:
> On Thu, Feb 18, 2016 at 02:24:03PM +0000, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
> Does this pass igt?
AFAICT, it passes as much of IGT as was passing before this patch set.
> If so, which are the bug fixes for the current
> regressions from the request conversion?
Which bugs in particular?
> -Chris
>
* Re: [PATCH v6 3/7] drm/i915: Add per context timelines to fence object
2016-02-22 14:33 ` John Harrison
@ 2016-02-24 10:45 ` Maarten Lankhorst
0 siblings, 0 replies; 18+ messages in thread
From: Maarten Lankhorst @ 2016-02-24 10:45 UTC (permalink / raw)
To: John Harrison, Chris Wilson, Intel-GFX
On 22-02-16 at 15:33, John Harrison wrote:
> On 18/02/2016 14:49, Chris Wilson wrote:
>> On Thu, Feb 18, 2016 at 02:24:06PM +0000, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> The fence object used inside the request structure requires a sequence
>>> number. Although this is not used by the i915 driver itself, it could
>>> potentially be used by non-i915 code if the fence is passed outside of
>>> the driver. This is the intention as it allows external kernel drivers
>>> and user applications to wait on batch buffer completion
>>> asynchronously via the dma-buf fence API.
>>>
>>> To ensure that such external users are not confused by strange things
>>> happening with the seqno, this patch adds in a per context timeline
>>> that can provide a guaranteed in-order seqno value for the fence. This
>>> is safe because the scheduler will not re-order batch buffers within a
>>> context - they are considered to be mutually dependent.
>> This is still nonsense. Just implement per-context seqno.
> If you already have a set of patches to implement per-context seqno then let's get them merged. Otherwise, that is follow up work to be done once the scheduler has landed. There has already been too much churn and delay. So the decision is to get the scheduler in as soon as possible and any 'could do better' issues should be logged for follow up work.
Seems to me that a per-context seqno would be cleaner than this hack...
~Maarten
* Re: [PATCH v6 4/7] drm/i915: Delay the freeing of requests until retire time
2016-02-18 14:24 ` [PATCH v6 4/7] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
2016-02-18 14:48 ` Chris Wilson
@ 2016-03-01 15:18 ` Maarten Lankhorst
2016-03-14 15:20 ` John Harrison
1 sibling, 1 reply; 18+ messages in thread
From: Maarten Lankhorst @ 2016-03-01 15:18 UTC (permalink / raw)
To: John.C.Harrison, Intel-GFX
Hey,
On 18-02-16 at 15:24, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The request structure is reference counted. When the count reached
> zero, the request was immediately freed and all associated objects
> were unreferenced/unallocated. This meant that the driver mutex lock
> must be held at the point where the count reaches zero. This was fine
> while all references were held internally to the driver. However, the
> plan is to allow the underlying fence object (and hence the request
> itself) to be returned to other drivers and to userland. External
> users cannot be expected to acquire a driver private mutex lock.
>
> Rather than attempt to disentangle the request structure from the
> driver mutex lock, the decision was to defer the free code until a
> later (safer) point. Hence this patch changes the unreference callback
> to merely move the request onto a delayed free list. The driver's
> retire worker thread will then process the list and actually call the
> free function on the requests.
>
> v2: New patch in series.
>
> v3: Updated after review comments by Tvrtko Ursulin. Rename list nodes
> to 'link' rather than 'list'. Update list processing to be more
> efficient/safer with respect to spinlocks.
>
> v4: Changed to use basic spinlocks rather than IRQ ones - missed
> update from earlier feedback by Tvrtko.
>
> v5: Improved a comment to keep the style checker happy.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Looks like Chris also mentioned it, but a fence can stay alive for an unknown period of time.
As a result, all associated data should be freed as soon as the fence is signaled,
not when the last refcount drops to 0. This will remove the delayed free dance and clean up the code. :)
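For illustration only, the shape being suggested might look roughly like the sketch below. It assumes (as Chris has argued elsewhere in the thread) that the per-request resources can be dropped without holding struct_mutex; the helper names are illustrative, not actual driver code.
static void i915_gem_request_on_signal(struct drm_i915_gem_request *req)
{
        /* Hypothetical: called once, at the point the fence is signalled. */
        if (req->irq_enabled) {
                req->ring->irq_put(req->ring);
                req->irq_enabled = false;
        }

        /* Assumes the context unreference no longer requires struct_mutex. */
        i915_gem_context_unreference(req->ctx);
        req->ctx = NULL;
}

static void i915_gem_request_fence_release(struct fence *req_fence)
{
        struct drm_i915_gem_request *req =
                container_of(req_fence, typeof(*req), fence);

        /* Last reference gone: only the slab allocation is left to free. */
        kmem_cache_free(req->i915->requests, req);
}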
~Maarten
* Re: [PATCH v6 4/7] drm/i915: Delay the freeing of requests until retire time
2016-03-01 15:18 ` Maarten Lankhorst
@ 2016-03-14 15:20 ` John Harrison
0 siblings, 0 replies; 18+ messages in thread
From: John Harrison @ 2016-03-14 15:20 UTC (permalink / raw)
To: Maarten Lankhorst, Intel-GFX
On 01/03/2016 15:18, Maarten Lankhorst wrote:
> Hey,
>
On 18-02-16 at 15:24, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The request structure is reference counted. When the count reached
>> zero, the request was immediately freed and all associated objects
>> were unreferenced/unallocated. This meant that the driver mutex lock
>> must be held at the point where the count reaches zero. This was fine
>> while all references were held internally to the driver. However, the
>> plan is to allow the underlying fence object (and hence the request
>> itself) to be returned to other drivers and to userland. External
>> users cannot be expected to acquire a driver private mutex lock.
>>
>> Rather than attempt to disentangle the request structure from the
>> driver mutex lock, the decision was to defer the free code until a
>> later (safer) point. Hence this patch changes the unreference callback
>> to merely move the request onto a delayed free list. The driver's
>> retire worker thread will then process the list and actually call the
>> free function on the requests.
>>
>> v2: New patch in series.
>>
>> v3: Updated after review comments by Tvrtko Ursulin. Rename list nodes
>> to 'link' rather than 'list'. Update list processing to be more
>> efficient/safer with respect to spinlocks.
>>
>> v4: Changed to use basic spinlocks rather than IRQ ones - missed
>> update from earlier feedback by Tvrtko.
>>
>> v5: Improved a comment to keep the style checker happy.
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Looks like Chris also mentioned it, but a fence can stay alive for an unknown period of time.
>
> As a result, all associated data should be freed as soon as the fence is signaled,
> not when the last refcount drops to 0. This will remove the delayed free dance and clean up the code. :)
I'm not sure what you mean. The delayed free thing is purely because
freeing up the resources associated with the request requires holding
the driver mutex lock - unpinning and freeing contexts basically. Chris
has claimed that this is easy to resolve but it does not look trivial to
me.
It might be possible to move the context, client and IRQ release from
the final ref count -> 0 function to the retire function instead. I
think that would be the soonest non-interrupt opportunity after the
request has been signalled. I'm not sure it really buys you much though.
The context is likely to be locked by a newer request anyway, the client
release is only removing a node from a list and the IRQ is already
being released at the point of signal (it is only in the ref -> 0 path
for the case where the request got aborted before completing).
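A rough sketch of that alternative, with a hypothetical helper invoked from the retire path (which already runs under struct_mutex) rather than from the final unreference:
static void i915_gem_request_release_resources(struct drm_i915_gem_request *req)
{
        /* Hypothetical helper; the exact names and fields in the driver differ. */
        if (req->irq_enabled) {
                req->ring->irq_put(req->ring);
                req->irq_enabled = false;
        }

        /* Client bookkeeping is just unlinking a list node. */
        if (req->file_priv)
                i915_gem_request_remove_from_client(req);

        /* The context unreference is what actually needs struct_mutex today. */
        if (req->ctx) {
                i915_gem_context_unreference(req->ctx);
                req->ctx = NULL;
        }
}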
The real holder of resources is the object tracking code. It is the
object/vma freeing when the object itself is retired that really
releases memory. And that is not changing - it is not part of the
request signal code path. That all happens from
'i915_gem_retire_requests_ring' or from an explicit wait-on-request. It
might be possible to trigger the process from the request signal handler
as well but again, I can't see it being easy to make that IRQ-time
friendly. I'm pretty sure it would have to be another deferred work
handler rather than doing it in the IRQ.
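For completeness, the deferral meant here is the usual work-queue pattern; a minimal sketch with a hypothetical per-engine work item (not a concrete proposal):
/* 'signal_retire_work' is a hypothetical field, INIT_WORK()'d at engine init. */
static void i915_signal_retire_work(struct work_struct *work)
{
        struct intel_engine_cs *ring =
                container_of(work, struct intel_engine_cs, signal_retire_work);

        mutex_lock(&ring->dev->struct_mutex);
        i915_gem_retire_requests_ring(ring);
        mutex_unlock(&ring->dev->struct_mutex);
}

static void notify_ring(struct intel_engine_cs *ring)
{
        if (!intel_ring_initialized(ring))
                return;

        i915_gem_request_notify(ring, false);

        /* Kick the heavyweight retire processing out of interrupt context. */
        queue_work(system_wq, &ring->signal_retire_work);

        wake_up_all(&ring->irq_queue);
}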
>
> ~Maarten
Thread overview: 18+ messages
2016-02-18 14:24 [PATCH v6 0/7] Convert requests to use struct fence John.C.Harrison
2016-02-18 14:24 ` [PATCH v6 1/7] drm/i915: " John.C.Harrison
2016-02-18 14:24 ` [PATCH v6 2/7] drm/i915: Removed now redudant parameter to i915_gem_request_completed() John.C.Harrison
2016-02-18 14:24 ` [PATCH v6 3/7] drm/i915: Add per context timelines to fence object John.C.Harrison
2016-02-18 14:49 ` Chris Wilson
2016-02-22 14:33 ` John Harrison
2016-02-24 10:45 ` Maarten Lankhorst
2016-02-18 14:24 ` [PATCH v6 4/7] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
2016-02-18 14:48 ` Chris Wilson
2016-02-22 14:34 ` John Harrison
2016-03-01 15:18 ` Maarten Lankhorst
2016-03-14 15:20 ` John Harrison
2016-02-18 14:24 ` [PATCH v6 5/7] drm/i915: Interrupt driven fences John.C.Harrison
2016-02-18 14:24 ` [PATCH v6 6/7] drm/i915: Updated request structure tracing John.C.Harrison
2016-02-18 14:24 ` [PATCH v6 7/7] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
2016-02-18 14:51 ` [PATCH v6 0/7] Convert requests to use struct fence Chris Wilson
2016-02-22 14:36 ` John Harrison
2016-02-18 15:01 ` ✗ Fi.CI.BAT: failure for Convert requests to use struct fence (rev3) Patchwork