* [PATCH v2 2/3] drm/i915: Limit the busy wait on requests to 10us not 10ms!
2015-11-27 14:26 [PATCH v2 1/3] drm/i915: Break busywaiting for requests on pending signals Chris Wilson
@ 2015-11-27 14:26 ` Chris Wilson
2015-11-27 16:52 ` Tvrtko Ursulin
2015-11-27 14:26 ` [PATCH v2 3/3] drm/i915: Only spin whilst waiting on the current request Chris Wilson
2015-11-27 16:51 ` [PATCH v2 1/3] drm/i915: Break busywaiting for requests on pending signals Tvrtko Ursulin
2 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2015-11-27 14:26 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Eero Tamminen, stable, Rantala, Valtteri
When waiting for high frequency requests, the finite amount of time
required to set up the irq and wait upon it limits the response rate. By
busywaiting on the request completion for a short while we can service
the high frequency waits as quickly as possible. However, if it is a slow
request, we want to sleep as quickly as possible. The tradeoff between
waiting and sleeping is roughly the time it takes to sleep on a request,
on the order of a microsecond. Based on measurements of synchronous
workloads from across big core and little atom, I have set the limit for
busywaiting as 10 microseconds. In most of the synchronous cases, we can
reduce the limit to as little as 2 microseconds, but that leaves
quite a few test cases regressing by factors of 3 and more.
The code currently uses the jiffie clock, but that is far too coarse (on
the order of 10 milliseconds) and results in poor interactivity as the
CPU ends up being hogged by slow requests. To get microsecond resolution
we need to use a high-resolution timer, the cheapest of which is polling
local_clock(), but that is only valid on the same CPU. If we switch CPUs
because the task was preempted, we can also use that as an indicator that
the system is too busy to waste cycles on spinning and we should sleep
instead.
__i915_spin_request was introduced in
commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 7 16:20:41 2015 +0100
drm/i915: Optimistically spin for the request completion
v2: Drop full u64 for unsigned long - the timer is 32bit wraparound safe,
so we can use native register sizes on smaller architectures. Mention
the approximate microseconds units for elapsed time and add some extra
comments describing the reason for busywaiting.
v3: Raise the limit to 10us
Reported-by: Jens Axboe <axboe@kernel.dk>
Link: https://lkml.org/lkml/2015/11/12/621
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Cc: stable@vger.kernel.org
---
drivers/gpu/drm/i915/i915_gem.c | 47 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 45 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 51b6e6f8a44e..c0941e8a70f7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1146,14 +1146,57 @@ static bool missed_irq(struct drm_i915_private *dev_priv,
return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
}
+static unsigned long local_clock_us(unsigned *cpu)
+{
+ unsigned long t;
+
+ /* Cheaply and approximately convert from nanoseconds to microseconds.
+ * The result and subsequent calculations are also defined in the same
+ * approximate microseconds units. The principal source of timing
+ * error here is from the simple truncation.
+ *
+ * Note that local_clock() is only defined wrt the current CPU;
+ * the comparisons are no longer valid if we switch CPUs. Instead of
+ * blocking preemption for the entire busywait, we can detect the CPU
+ * switch and use that as indicator of system load and a reason to
+ * stop busywaiting, see busywait_stop().
+ */
+ *cpu = get_cpu();
+ t = local_clock() >> 10;
+ put_cpu();
+
+ return t;
+}
+
+static bool busywait_stop(unsigned long timeout, unsigned cpu)
+{
+ unsigned this_cpu;
+
+ if (time_after(local_clock_us(&this_cpu), timeout))
+ return true;
+
+ return this_cpu != cpu;
+}
+
static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
{
unsigned long timeout;
+ unsigned cpu;
+
+ /* When waiting for high frequency requests, e.g. during synchronous
+ * rendering split between the CPU and GPU, the finite amount of time
+ * required to set up the irq and wait upon it limits the response
+ * rate. By busywaiting on the request completion for a short while we
+ * can service the high frequency waits as quickly as possible. However,
+ * if it is a slow request, we want to sleep as quickly as possible.
+ * The tradeoff between waiting and sleeping is roughly the time it
+ * takes to sleep on a request, on the order of a microsecond.
+ */
if (i915_gem_request_get_ring(req)->irq_refcount)
return -EBUSY;
- timeout = jiffies + 1;
+ timeout = local_clock_us(&cpu) + 10;
while (!need_resched()) {
if (i915_gem_request_completed(req, true))
return 0;
@@ -1161,7 +1204,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
if (signal_pending_state(state, current))
break;
- if (time_after_eq(jiffies, timeout))
+ if (busywait_stop(timeout, cpu))
break;
cpu_relax_lowlatency();
--
2.6.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* [PATCH v2 3/3] drm/i915: Only spin whilst waiting on the current request
2015-11-27 14:26 [PATCH v2 1/3] drm/i915: Break busywaiting for requests on pending signals Chris Wilson
2015-11-27 14:26 ` [PATCH v2 2/3] drm/i915: Limit the busy wait on requests to 10us not 10ms! Chris Wilson
@ 2015-11-27 14:26 ` Chris Wilson
2015-11-27 16:51 ` [PATCH v2 1/3] drm/i915: Break busywaiting for requests on pending signals Tvrtko Ursulin
2 siblings, 0 replies; 5+ messages in thread
From: Chris Wilson @ 2015-11-27 14:26 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Eero Tamminen, stable, Rantala, Valtteri
Limit busywaiting only to the request currently being processed by the
GPU. If the request is not currently being processed by the GPU, there
is a very low likelihood of it being completed within the 2 microsecond
spin timeout and so we will just be wasting CPU cycles.
v2: Fix a logical inversion introduced when rebasing - we were incorrectly
checking that this request was active, and so busywaiting while the GPU
had not yet started processing the request of interest.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Cc: stable@vger.kernel.org
---
drivers/gpu/drm/i915/i915_debugfs.c | 2 +-
drivers/gpu/drm/i915/i915_drv.h | 25 +++++++++++++++----------
drivers/gpu/drm/i915/i915_gem.c | 12 +++++++++---
drivers/gpu/drm/i915/i915_gpu_error.c | 2 +-
drivers/gpu/drm/i915/i915_guc_submission.c | 2 +-
5 files changed, 27 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index a3b22bdacd44..f0aeafbcf307 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -710,7 +710,7 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
if (req->pid)
task = pid_task(req->pid, PIDTYPE_PID);
seq_printf(m, " %x @ %d: %s [%d]\n",
- req->seqno,
+ req->seqno_complete,
(int) (jiffies - req->emitted_jiffies),
task ? task->comm : "<unknown>",
task ? task->pid : -1);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 95bb27de774f..19986bd032ef 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2192,8 +2192,12 @@ struct drm_i915_gem_request {
struct drm_i915_private *i915;
struct intel_engine_cs *ring;
- /** GEM sequence number associated with this request. */
- uint32_t seqno;
+ /* GEM sequence numbers associated with this request:
+ * active - marks the start of the request, passed when
+ * the GPU completes the previous request
+ * complete - marks the end of the request
+ */
+ u32 seqno_active, seqno_complete;
/** Position in the ringbuffer of the start of the request */
u32 head;
@@ -2270,7 +2274,7 @@ int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
static inline uint32_t
i915_gem_request_get_seqno(struct drm_i915_gem_request *req)
{
- return req ? req->seqno : 0;
+ return req ? req->seqno_complete : 0;
}
static inline struct intel_engine_cs *
@@ -2909,16 +2913,17 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
return (int32_t)(seq1 - seq2) >= 0;
}
+static inline bool i915_gem_request_active(struct drm_i915_gem_request *req)
+{
+ u32 seqno = req->ring->get_seqno(req->ring, true);
+ return i915_seqno_passed(seqno, req->seqno_active);
+}
+
static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
bool lazy_coherency)
{
- u32 seqno;
-
- BUG_ON(req == NULL);
-
- seqno = req->ring->get_seqno(req->ring, lazy_coherency);
-
- return i915_seqno_passed(seqno, req->seqno);
+ u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency);
+ return i915_seqno_passed(seqno, req->seqno_complete);
}
int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c0941e8a70f7..490840324e88 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1193,9 +1193,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
* takes to sleep on a request, on the order of a microsecond.
*/
- if (i915_gem_request_get_ring(req)->irq_refcount)
+ if (req->ring->irq_refcount)
return -EBUSY;
+ /* Only spin if we know the GPU is processing this request */
+ if (!i915_seqno_passed(req->ring->get_seqno(req->ring, false),
+ req->seqno_active))
+ return -EAGAIN;
+
timeout = local_clock_us(&cpu) + 10;
while (!need_resched()) {
if (i915_gem_request_completed(req, true))
@@ -2592,7 +2597,8 @@ void __i915_add_request(struct drm_i915_gem_request *request,
request->batch_obj = obj;
request->emitted_jiffies = jiffies;
- ring->last_submitted_seqno = request->seqno;
+ request->seqno_active = ring->last_submitted_seqno;
+ ring->last_submitted_seqno = request->seqno_complete;
list_add_tail(&request->list, &ring->request_list);
trace_i915_gem_request_add(request);
@@ -2691,7 +2697,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
if (req == NULL)
return -ENOMEM;
- ret = i915_gem_get_seqno(ring->dev, &req->seqno);
+ ret = i915_gem_get_seqno(ring->dev, &req->seqno_complete);
if (ret)
goto err;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 2f04e4f2ff35..33501499bf51 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1072,7 +1072,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
struct drm_i915_error_request *erq;
erq = &error->ring[i].requests[count++];
- erq->seqno = request->seqno;
+ erq->seqno = request->seqno_complete;
erq->jiffies = request->emitted_jiffies;
erq->tail = request->postfix;
}
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 036b42bae827..d313e1938d58 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -625,7 +625,7 @@ int i915_guc_submit(struct i915_guc_client *client,
spin_lock(&guc->host2guc_lock);
guc->submissions[ring_id] += 1;
- guc->last_seqno[ring_id] = rq->seqno;
+ guc->last_seqno[ring_id] = rq->seqno_complete;
spin_unlock(&guc->host2guc_lock);
return q_ret;
--
2.6.2