* [PATCH 1/3] drm/i915: Avoid the branch in computing intel_ring_space()
@ 2017-05-03 10:06 Chris Wilson
2017-05-03 10:06 ` [PATCH 2/3] drm/i915: Report the ring->space from intel_ring_update_space() Chris Wilson
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Chris Wilson @ 2017-05-03 10:06 UTC (permalink / raw)
To: intel-gfx; +Cc: Mika Kuoppala
Exploit the power-of-two ring size to compute the space across the
wraparound using a mask rather than a if. Convert to unsigned integers
so the operation is well defined.
References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++----------
drivers/gpu/drm/i915/intel_ringbuffer.h | 36 ++++++++++++++++++++-------------
2 files changed, 34 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 29b5afac7856..5947bba20c4a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -39,12 +39,16 @@
*/
#define LEGACY_REQUEST_SIZE 200
-static int __intel_ring_space(int head, int tail, int size)
+static unsigned int __intel_ring_space(unsigned int head,
+ unsigned int tail,
+ unsigned int size)
{
- int space = head - tail;
- if (space <= 0)
- space += size;
- return space - I915_RING_FREE_SPACE;
+ /*
+ * "If the Ring Buffer Head Pointer and the Tail Pointer are on the
+ * same cacheline, the Head Pointer must not be greater than the Tail
+ * Pointer."
+ */
+ return (head - tail - CACHELINE_BYTES) & (size - 1);
}
void intel_ring_update_space(struct intel_ring *ring)
@@ -1669,12 +1673,9 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
GEM_BUG_ON(!req->reserved_space);
list_for_each_entry(target, &ring->request_list, ring_link) {
- unsigned space;
-
/* Would completion of this request free enough space? */
- space = __intel_ring_space(target->postfix, ring->emit,
- ring->size);
- if (space >= bytes)
+ if (bytes <= __intel_ring_space(target->postfix,
+ ring->emit, ring->size))
break;
}
@@ -1743,11 +1744,11 @@ u32 *intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
}
GEM_BUG_ON(ring->emit > ring->size - bytes);
+ GEM_BUG_ON(ring->space < bytes);
cs = ring->vaddr + ring->emit;
GEM_DEBUG_EXEC(memset(cs, POISON_INUSE, bytes));
ring->emit += bytes;
ring->space -= bytes;
- GEM_BUG_ON(ring->space < 0);
return cs;
}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 02d741ef99ad..64584d134681 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -17,17 +17,6 @@
#define CACHELINE_BYTES 64
#define CACHELINE_DWORDS (CACHELINE_BYTES / sizeof(uint32_t))
-/*
- * Gen2 BSpec "1. Programming Environment" / 1.4.4.6 "Ring Buffer Use"
- * Gen3 BSpec "vol1c Memory Interface Functions" / 2.3.4.5 "Ring Buffer Use"
- * Gen4+ BSpec "vol1c Memory Interface and Command Stream" / 5.3.4.5 "Ring Buffer Use"
- *
- * "If the Ring Buffer Head Pointer and the Tail Pointer are on the same
- * cacheline, the Head Pointer must not be greater than the Tail
- * Pointer."
- */
-#define I915_RING_FREE_SPACE 64
-
struct intel_hw_status_page {
struct i915_vma *vma;
u32 *page_addr;
@@ -145,9 +134,9 @@ struct intel_ring {
u32 tail;
u32 emit;
- int space;
- int size;
- int effective_size;
+ u32 space;
+ u32 size;
+ u32 effective_size;
};
struct i915_gem_context;
@@ -548,6 +537,25 @@ assert_ring_tail_valid(const struct intel_ring *ring, unsigned int tail)
*/
GEM_BUG_ON(!IS_ALIGNED(tail, 8));
GEM_BUG_ON(tail >= ring->size);
+
+ /*
+ * "Ring Buffer Use"
+ * Gen2 BSpec "1. Programming Environment" / 1.4.4.6
+ * Gen3 BSpec "1c Memory Interface Functions" / 2.3.4.5
+ * Gen4+ BSpec "1c Memory Interface and Command Stream" / 5.3.4.5
+ * "If the Ring Buffer Head Pointer and the Tail Pointer are on the
+ * same cacheline, the Head Pointer must not be greater than the Tail
+ * Pointer."
+ *
+ * We use ring->head as the last known location of the actual RING_HEAD,
+ * it may have advanced but in the worst case it is equally the same
+ * as ring->head and so we should never program RING_TAIL to advance
+ * into the same cacheline as ring->head.
+ */
+#define cacheline(a) round_down(a, CACHELINE_BYTES)
+ GEM_BUG_ON(cacheline(tail) == cacheline(ring->head) &&
+ tail < ring->head);
+#undef cacheline
}
static inline unsigned int
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH 2/3] drm/i915: Report the ring->space from intel_ring_update_space()
2017-05-03 10:06 [PATCH 1/3] drm/i915: Avoid the branch in computing intel_ring_space() Chris Wilson
@ 2017-05-03 10:06 ` Chris Wilson
2017-05-03 10:06 ` [PATCH 3/3] drm/i915: Micro-optimise hotpath through intel_ring_begin() Chris Wilson
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-05-03 10:06 UTC (permalink / raw)
To: intel-gfx
Some callers immediately want to know the current ring->space after
calling intel_ring_update_space(), which we can freely provide via the
return parameter.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/intel_ringbuffer.c | 12 ++++++++----
drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +-
2 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 5947bba20c4a..c46e5439d379 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -51,9 +51,14 @@ static unsigned int __intel_ring_space(unsigned int head,
return (head - tail - CACHELINE_BYTES) & (size - 1);
}
-void intel_ring_update_space(struct intel_ring *ring)
+unsigned int intel_ring_update_space(struct intel_ring *ring)
{
- ring->space = __intel_ring_space(ring->head, ring->emit, ring->size);
+ unsigned int space;
+
+ space = __intel_ring_space(ring->head, ring->emit, ring->size);
+
+ ring->space = space;
+ return space;
}
static int
@@ -1657,8 +1662,7 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
lockdep_assert_held(&req->i915->drm.struct_mutex);
- intel_ring_update_space(ring);
- if (ring->space >= bytes)
+ if (intel_ring_update_space(ring) >= bytes)
return 0;
/*
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 64584d134681..4a599e3480a9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -486,7 +486,7 @@ int intel_ring_pin(struct intel_ring *ring,
struct drm_i915_private *i915,
unsigned int offset_bias);
void intel_ring_reset(struct intel_ring *ring, u32 tail);
-void intel_ring_update_space(struct intel_ring *ring);
+unsigned int intel_ring_update_space(struct intel_ring *ring);
void intel_ring_unpin(struct intel_ring *ring);
void intel_ring_free(struct intel_ring *ring);
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH 3/3] drm/i915: Micro-optimise hotpath through intel_ring_begin()
2017-05-03 10:06 [PATCH 1/3] drm/i915: Avoid the branch in computing intel_ring_space() Chris Wilson
2017-05-03 10:06 ` [PATCH 2/3] drm/i915: Report the ring->space from intel_ring_update_space() Chris Wilson
@ 2017-05-03 10:06 ` Chris Wilson
2017-05-04 12:11 ` Mika Kuoppala
2017-05-03 11:04 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Avoid the branch in computing intel_ring_space() Patchwork
2017-05-03 11:53 ` [PATCH 1/3] " Mika Kuoppala
3 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2017-05-03 10:06 UTC (permalink / raw)
To: intel-gfx
Typically, there is space available within the ring and if not we have
to wait (by definition a slow path). Rearrange the code to reduce the
number of branches and stack size for the hotpath, accomodating a slight
growth for the wait.
v2: Fix the new assert that packets are not larger than the actual ring.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/intel_ringbuffer.c | 63 +++++++++++++++++----------------
1 file changed, 33 insertions(+), 30 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index c46e5439d379..53123c1cfcc5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1654,7 +1654,7 @@ static int ring_request_alloc(struct drm_i915_gem_request *request)
return 0;
}
-static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
+static noinline int wait_for_space(struct drm_i915_gem_request *req, int bytes)
{
struct intel_ring *ring = req->ring;
struct drm_i915_gem_request *target;
@@ -1702,49 +1702,52 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
u32 *intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
{
struct intel_ring *ring = req->ring;
- int remain_actual = ring->size - ring->emit;
- int remain_usable = ring->effective_size - ring->emit;
- int bytes = num_dwords * sizeof(u32);
- int total_bytes, wait_bytes;
- bool need_wrap = false;
+ const unsigned int remain_usable = ring->effective_size - ring->emit;
+ const unsigned int bytes = num_dwords * sizeof(u32);
+ unsigned int need_wrap = 0;
+ unsigned int total_bytes;
u32 *cs;
total_bytes = bytes + req->reserved_space;
+ GEM_BUG_ON(total_bytes > ring->effective_size);
- if (unlikely(bytes > remain_usable)) {
- /*
- * Not enough space for the basic request. So need to flush
- * out the remainder and then wait for base + reserved.
- */
- wait_bytes = remain_actual + total_bytes;
- need_wrap = true;
- } else if (unlikely(total_bytes > remain_usable)) {
- /*
- * The base request will fit but the reserved space
- * falls off the end. So we don't need an immediate wrap
- * and only need to effectively wait for the reserved
- * size space from the start of ringbuffer.
- */
- wait_bytes = remain_actual + req->reserved_space;
- } else {
- /* No wrapping required, just waiting. */
- wait_bytes = total_bytes;
+ if (unlikely(total_bytes > remain_usable)) {
+ const int remain_actual = ring->size - ring->emit;
+
+ if (bytes > remain_usable) {
+ /*
+ * Not enough space for the basic request. So need to
+ * flush out the remainder and then wait for
+ * base + reserved.
+ */
+ total_bytes += remain_actual;
+ need_wrap = remain_actual | 1;
+ } else {
+ /*
+ * The base request will fit but the reserved space
+ * falls off the end. So we don't need an immediate
+ * wrap and only need to effectively wait for the
+ * reserved size from the start of ringbuffer.
+ */
+ total_bytes = req->reserved_space + remain_actual;
+ }
}
- if (wait_bytes > ring->space) {
- int ret = wait_for_space(req, wait_bytes);
+ if (unlikely(total_bytes > ring->space)) {
+ int ret = wait_for_space(req, total_bytes);
if (unlikely(ret))
return ERR_PTR(ret);
}
if (unlikely(need_wrap)) {
- GEM_BUG_ON(remain_actual > ring->space);
- GEM_BUG_ON(ring->emit + remain_actual > ring->size);
+ need_wrap &= ~1;
+ GEM_BUG_ON(need_wrap > ring->space);
+ GEM_BUG_ON(ring->emit + need_wrap > ring->size);
/* Fill the tail with MI_NOOP */
- memset(ring->vaddr + ring->emit, 0, remain_actual);
+ memset(ring->vaddr + ring->emit, 0, need_wrap);
ring->emit = 0;
- ring->space -= remain_actual;
+ ring->space -= need_wrap;
}
GEM_BUG_ON(ring->emit > ring->size - bytes);
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH 3/3] drm/i915: Micro-optimise hotpath through intel_ring_begin()
2017-05-03 10:06 ` [PATCH 3/3] drm/i915: Micro-optimise hotpath through intel_ring_begin() Chris Wilson
@ 2017-05-04 12:11 ` Mika Kuoppala
2017-05-04 12:27 ` Chris Wilson
0 siblings, 1 reply; 12+ messages in thread
From: Mika Kuoppala @ 2017-05-04 12:11 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Typically, there is space available within the ring and if not we have
> to wait (by definition a slow path). Rearrange the code to reduce the
> number of branches and stack size for the hotpath, accomodating a slight
> growth for the wait.
>
> v2: Fix the new assert that packets are not larger than the actual ring.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/intel_ringbuffer.c | 63 +++++++++++++++++----------------
> 1 file changed, 33 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index c46e5439d379..53123c1cfcc5 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1654,7 +1654,7 @@ static int ring_request_alloc(struct drm_i915_gem_request *request)
> return 0;
> }
>
> -static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
> +static noinline int wait_for_space(struct drm_i915_gem_request *req, int bytes)
> {
> struct intel_ring *ring = req->ring;
> struct drm_i915_gem_request *target;
> @@ -1702,49 +1702,52 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
> u32 *intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
> {
> struct intel_ring *ring = req->ring;
> - int remain_actual = ring->size - ring->emit;
> - int remain_usable = ring->effective_size - ring->emit;
> - int bytes = num_dwords * sizeof(u32);
> - int total_bytes, wait_bytes;
> - bool need_wrap = false;
> + const unsigned int remain_usable = ring->effective_size - ring->emit;
> + const unsigned int bytes = num_dwords * sizeof(u32);
> + unsigned int need_wrap = 0;
> + unsigned int total_bytes;
> u32 *cs;
>
> total_bytes = bytes + req->reserved_space;
> + GEM_BUG_ON(total_bytes > ring->effective_size);
>
> - if (unlikely(bytes > remain_usable)) {
> - /*
> - * Not enough space for the basic request. So need to flush
> - * out the remainder and then wait for base + reserved.
> - */
> - wait_bytes = remain_actual + total_bytes;
> - need_wrap = true;
> - } else if (unlikely(total_bytes > remain_usable)) {
> - /*
> - * The base request will fit but the reserved space
> - * falls off the end. So we don't need an immediate wrap
> - * and only need to effectively wait for the reserved
> - * size space from the start of ringbuffer.
> - */
> - wait_bytes = remain_actual + req->reserved_space;
> - } else {
> - /* No wrapping required, just waiting. */
> - wait_bytes = total_bytes;
> + if (unlikely(total_bytes > remain_usable)) {
> + const int remain_actual = ring->size - ring->emit;
> +
> + if (bytes > remain_usable) {
> + /*
> + * Not enough space for the basic request. So need to
> + * flush out the remainder and then wait for
> + * base + reserved.
> + */
> + total_bytes += remain_actual;
> + need_wrap = remain_actual | 1;
Your remain_actual should never reach zero. So in here
forcing the lowest bit on, and later off, seems superfluous.
-Mika
> + } else {
> + /*
> + * The base request will fit but the reserved space
> + * falls off the end. So we don't need an immediate
> + * wrap and only need to effectively wait for the
> + * reserved size from the start of ringbuffer.
> + */
> + total_bytes = req->reserved_space + remain_actual;
> + }
> }
>
> - if (wait_bytes > ring->space) {
> - int ret = wait_for_space(req, wait_bytes);
> + if (unlikely(total_bytes > ring->space)) {
> + int ret = wait_for_space(req, total_bytes);
> if (unlikely(ret))
> return ERR_PTR(ret);
> }
>
> if (unlikely(need_wrap)) {
> - GEM_BUG_ON(remain_actual > ring->space);
> - GEM_BUG_ON(ring->emit + remain_actual > ring->size);
> + need_wrap &= ~1;
> + GEM_BUG_ON(need_wrap > ring->space);
> + GEM_BUG_ON(ring->emit + need_wrap > ring->size);
>
> /* Fill the tail with MI_NOOP */
> - memset(ring->vaddr + ring->emit, 0, remain_actual);
> + memset(ring->vaddr + ring->emit, 0, need_wrap);
> ring->emit = 0;
> - ring->space -= remain_actual;
> + ring->space -= need_wrap;
> }
>
> GEM_BUG_ON(ring->emit > ring->size - bytes);
> --
> 2.11.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 3/3] drm/i915: Micro-optimise hotpath through intel_ring_begin()
2017-05-04 12:11 ` Mika Kuoppala
@ 2017-05-04 12:27 ` Chris Wilson
2017-05-04 12:59 ` Mika Kuoppala
0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2017-05-04 12:27 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: intel-gfx
On Thu, May 04, 2017 at 03:11:45PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > Typically, there is space available within the ring and if not we have
> > to wait (by definition a slow path). Rearrange the code to reduce the
> > number of branches and stack size for the hotpath, accomodating a slight
> > growth for the wait.
> >
> > v2: Fix the new assert that packets are not larger than the actual ring.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> > drivers/gpu/drm/i915/intel_ringbuffer.c | 63 +++++++++++++++++----------------
> > 1 file changed, 33 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index c46e5439d379..53123c1cfcc5 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1654,7 +1654,7 @@ static int ring_request_alloc(struct drm_i915_gem_request *request)
> > return 0;
> > }
> >
> > -static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
> > +static noinline int wait_for_space(struct drm_i915_gem_request *req, int bytes)
> > {
> > struct intel_ring *ring = req->ring;
> > struct drm_i915_gem_request *target;
> > @@ -1702,49 +1702,52 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
> > u32 *intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
> > {
> > struct intel_ring *ring = req->ring;
> > - int remain_actual = ring->size - ring->emit;
> > - int remain_usable = ring->effective_size - ring->emit;
> > - int bytes = num_dwords * sizeof(u32);
> > - int total_bytes, wait_bytes;
> > - bool need_wrap = false;
> > + const unsigned int remain_usable = ring->effective_size - ring->emit;
> > + const unsigned int bytes = num_dwords * sizeof(u32);
> > + unsigned int need_wrap = 0;
> > + unsigned int total_bytes;
> > u32 *cs;
> >
> > total_bytes = bytes + req->reserved_space;
> > + GEM_BUG_ON(total_bytes > ring->effective_size);
> >
> > - if (unlikely(bytes > remain_usable)) {
> > - /*
> > - * Not enough space for the basic request. So need to flush
> > - * out the remainder and then wait for base + reserved.
> > - */
> > - wait_bytes = remain_actual + total_bytes;
> > - need_wrap = true;
> > - } else if (unlikely(total_bytes > remain_usable)) {
> > - /*
> > - * The base request will fit but the reserved space
> > - * falls off the end. So we don't need an immediate wrap
> > - * and only need to effectively wait for the reserved
> > - * size space from the start of ringbuffer.
> > - */
> > - wait_bytes = remain_actual + req->reserved_space;
> > - } else {
> > - /* No wrapping required, just waiting. */
> > - wait_bytes = total_bytes;
> > + if (unlikely(total_bytes > remain_usable)) {
> > + const int remain_actual = ring->size - ring->emit;
> > +
> > + if (bytes > remain_usable) {
> > + /*
> > + * Not enough space for the basic request. So need to
> > + * flush out the remainder and then wait for
> > + * base + reserved.
> > + */
> > + total_bytes += remain_actual;
> > + need_wrap = remain_actual | 1;
>
> Your remain_actual should never reach zero. So in here
> forcing the lowest bit on, and later off, seems superfluous.
Why can't we fill up to the last byte with commands? remain_actual is
just (size - tail) and we don't force a wrap until emit crosses the
boundary (and not before). We hit remain_actual == 0 in practice.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 3/3] drm/i915: Micro-optimise hotpath through intel_ring_begin()
2017-05-04 12:27 ` Chris Wilson
@ 2017-05-04 12:59 ` Mika Kuoppala
2017-05-04 13:05 ` Chris Wilson
0 siblings, 1 reply; 12+ messages in thread
From: Mika Kuoppala @ 2017-05-04 12:59 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> On Thu, May 04, 2017 at 03:11:45PM +0300, Mika Kuoppala wrote:
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>>
>> > Typically, there is space available within the ring and if not we have
>> > to wait (by definition a slow path). Rearrange the code to reduce the
>> > number of branches and stack size for the hotpath, accomodating a slight
>> > growth for the wait.
>> >
>> > v2: Fix the new assert that packets are not larger than the actual ring.
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > ---
>> > drivers/gpu/drm/i915/intel_ringbuffer.c | 63 +++++++++++++++++----------------
>> > 1 file changed, 33 insertions(+), 30 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> > index c46e5439d379..53123c1cfcc5 100644
>> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> > @@ -1654,7 +1654,7 @@ static int ring_request_alloc(struct drm_i915_gem_request *request)
>> > return 0;
>> > }
>> >
>> > -static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>> > +static noinline int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>> > {
>> > struct intel_ring *ring = req->ring;
>> > struct drm_i915_gem_request *target;
>> > @@ -1702,49 +1702,52 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>> > u32 *intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
>> > {
>> > struct intel_ring *ring = req->ring;
>> > - int remain_actual = ring->size - ring->emit;
>> > - int remain_usable = ring->effective_size - ring->emit;
>> > - int bytes = num_dwords * sizeof(u32);
>> > - int total_bytes, wait_bytes;
>> > - bool need_wrap = false;
>> > + const unsigned int remain_usable = ring->effective_size - ring->emit;
>> > + const unsigned int bytes = num_dwords * sizeof(u32);
>> > + unsigned int need_wrap = 0;
>> > + unsigned int total_bytes;
>> > u32 *cs;
>> >
>> > total_bytes = bytes + req->reserved_space;
>> > + GEM_BUG_ON(total_bytes > ring->effective_size);
>> >
>> > - if (unlikely(bytes > remain_usable)) {
>> > - /*
>> > - * Not enough space for the basic request. So need to flush
>> > - * out the remainder and then wait for base + reserved.
>> > - */
>> > - wait_bytes = remain_actual + total_bytes;
>> > - need_wrap = true;
>> > - } else if (unlikely(total_bytes > remain_usable)) {
>> > - /*
>> > - * The base request will fit but the reserved space
>> > - * falls off the end. So we don't need an immediate wrap
>> > - * and only need to effectively wait for the reserved
>> > - * size space from the start of ringbuffer.
>> > - */
>> > - wait_bytes = remain_actual + req->reserved_space;
>> > - } else {
>> > - /* No wrapping required, just waiting. */
>> > - wait_bytes = total_bytes;
>> > + if (unlikely(total_bytes > remain_usable)) {
>> > + const int remain_actual = ring->size - ring->emit;
>> > +
>> > + if (bytes > remain_usable) {
>> > + /*
>> > + * Not enough space for the basic request. So need to
>> > + * flush out the remainder and then wait for
>> > + * base + reserved.
>> > + */
>> > + total_bytes += remain_actual;
>> > + need_wrap = remain_actual | 1;
>>
>> Your remain_actual should never reach zero. So in here
>> forcing the lowest bit on, and later off, seems superfluous.
>
> Why can't we fill up to the last byte with commands? remain_actual is
> just (size - tail) and we don't force a wrap until emit crosses the
> boundary (and not before). We hit remain_actual == 0 in practice.
> -Chris
My mistake, was thinking postwrap.
num_dwords and second parameter to wait_for_space should be unsigned.
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 3/3] drm/i915: Micro-optimise hotpath through intel_ring_begin()
2017-05-04 12:59 ` Mika Kuoppala
@ 2017-05-04 13:05 ` Chris Wilson
0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-05-04 13:05 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: intel-gfx
On Thu, May 04, 2017 at 03:59:05PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > On Thu, May 04, 2017 at 03:11:45PM +0300, Mika Kuoppala wrote:
> >> Chris Wilson <chris@chris-wilson.co.uk> writes:
> >>
> >> > Typically, there is space available within the ring and if not we have
> >> > to wait (by definition a slow path). Rearrange the code to reduce the
> >> > number of branches and stack size for the hotpath, accomodating a slight
> >> > growth for the wait.
> >> >
> >> > v2: Fix the new assert that packets are not larger than the actual ring.
> >> >
> >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> > ---
> >> > drivers/gpu/drm/i915/intel_ringbuffer.c | 63 +++++++++++++++++----------------
> >> > 1 file changed, 33 insertions(+), 30 deletions(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >> > index c46e5439d379..53123c1cfcc5 100644
> >> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> >> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >> > @@ -1654,7 +1654,7 @@ static int ring_request_alloc(struct drm_i915_gem_request *request)
> >> > return 0;
> >> > }
> >> >
> >> > -static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
> >> > +static noinline int wait_for_space(struct drm_i915_gem_request *req, int bytes)
> >> > {
> >> > struct intel_ring *ring = req->ring;
> >> > struct drm_i915_gem_request *target;
> >> > @@ -1702,49 +1702,52 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
> >> > u32 *intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
> >> > {
> >> > struct intel_ring *ring = req->ring;
> >> > - int remain_actual = ring->size - ring->emit;
> >> > - int remain_usable = ring->effective_size - ring->emit;
> >> > - int bytes = num_dwords * sizeof(u32);
> >> > - int total_bytes, wait_bytes;
> >> > - bool need_wrap = false;
> >> > + const unsigned int remain_usable = ring->effective_size - ring->emit;
> >> > + const unsigned int bytes = num_dwords * sizeof(u32);
> >> > + unsigned int need_wrap = 0;
> >> > + unsigned int total_bytes;
> >> > u32 *cs;
> >> >
> >> > total_bytes = bytes + req->reserved_space;
> >> > + GEM_BUG_ON(total_bytes > ring->effective_size);
> >> >
> >> > - if (unlikely(bytes > remain_usable)) {
> >> > - /*
> >> > - * Not enough space for the basic request. So need to flush
> >> > - * out the remainder and then wait for base + reserved.
> >> > - */
> >> > - wait_bytes = remain_actual + total_bytes;
> >> > - need_wrap = true;
> >> > - } else if (unlikely(total_bytes > remain_usable)) {
> >> > - /*
> >> > - * The base request will fit but the reserved space
> >> > - * falls off the end. So we don't need an immediate wrap
> >> > - * and only need to effectively wait for the reserved
> >> > - * size space from the start of ringbuffer.
> >> > - */
> >> > - wait_bytes = remain_actual + req->reserved_space;
> >> > - } else {
> >> > - /* No wrapping required, just waiting. */
> >> > - wait_bytes = total_bytes;
> >> > + if (unlikely(total_bytes > remain_usable)) {
> >> > + const int remain_actual = ring->size - ring->emit;
> >> > +
> >> > + if (bytes > remain_usable) {
> >> > + /*
> >> > + * Not enough space for the basic request. So need to
> >> > + * flush out the remainder and then wait for
> >> > + * base + reserved.
> >> > + */
> >> > + total_bytes += remain_actual;
> >> > + need_wrap = remain_actual | 1;
> >>
> >> Your remain_actual should never reach zero. So in here
> >> forcing the lowest bit on, and later off, seems superfluous.
> >
> > Why can't we fill up to the last byte with commands? remain_actual is
> > just (size - tail) and we don't force a wrap until emit crosses the
> > boundary (and not before). We hit remain_actual == 0 in practice.
> > -Chris
>
> My mistake, was thinking postwrap.
>
> num_dwords and second parameter to wait_for_space should be unsigned.
You predictive algorithm is working fine though. Applied after your
suggestion from patch 1.
Thanks,
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread
* ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Avoid the branch in computing intel_ring_space()
2017-05-03 10:06 [PATCH 1/3] drm/i915: Avoid the branch in computing intel_ring_space() Chris Wilson
2017-05-03 10:06 ` [PATCH 2/3] drm/i915: Report the ring->space from intel_ring_update_space() Chris Wilson
2017-05-03 10:06 ` [PATCH 3/3] drm/i915: Micro-optimise hotpath through intel_ring_begin() Chris Wilson
@ 2017-05-03 11:04 ` Patchwork
2017-05-03 11:53 ` [PATCH 1/3] " Mika Kuoppala
3 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2017-05-03 11:04 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [1/3] drm/i915: Avoid the branch in computing intel_ring_space()
URL : https://patchwork.freedesktop.org/series/23878/
State : success
== Summary ==
Series 23878v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/23878/revisions/1/mbox/
Test gem_exec_flush:
Subgroup basic-batch-kernel-default-uc:
pass -> FAIL (fi-snb-2600) fdo#100007
Test gem_exec_suspend:
Subgroup basic-s4-devices:
pass -> DMESG-WARN (fi-kbl-7560u) fdo#100125
fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125
fi-bdw-5557u total:278 pass:267 dwarn:0 dfail:0 fail:0 skip:11 time:429s
fi-bdw-gvtdvm total:278 pass:256 dwarn:8 dfail:0 fail:0 skip:14 time:430s
fi-bsw-n3050 total:278 pass:242 dwarn:0 dfail:0 fail:0 skip:36 time:572s
fi-bxt-j4205 total:278 pass:259 dwarn:0 dfail:0 fail:0 skip:19 time:506s
fi-bxt-t5700 total:278 pass:258 dwarn:0 dfail:0 fail:0 skip:20 time:567s
fi-byt-j1900 total:278 pass:254 dwarn:0 dfail:0 fail:0 skip:24 time:492s
fi-byt-n2820 total:278 pass:250 dwarn:0 dfail:0 fail:0 skip:28 time:484s
fi-hsw-4770 total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 time:412s
fi-hsw-4770r total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 time:406s
fi-ilk-650 total:278 pass:228 dwarn:0 dfail:0 fail:0 skip:50 time:414s
fi-ivb-3520m total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time:492s
fi-ivb-3770 total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time:474s
fi-kbl-7500u total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time:462s
fi-kbl-7560u total:278 pass:267 dwarn:1 dfail:0 fail:0 skip:10 time:568s
fi-skl-6260u total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10 time:458s
fi-skl-6700hq total:278 pass:261 dwarn:0 dfail:0 fail:0 skip:17 time:572s
fi-skl-6700k total:278 pass:256 dwarn:4 dfail:0 fail:0 skip:18 time:462s
fi-skl-6770hq total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10 time:494s
fi-skl-gvtdvm total:278 pass:265 dwarn:0 dfail:0 fail:0 skip:13 time:430s
fi-snb-2520m total:278 pass:250 dwarn:0 dfail:0 fail:0 skip:28 time:522s
fi-snb-2600 total:278 pass:248 dwarn:0 dfail:0 fail:1 skip:29 time:410s
26ef9256ab34901a953bd5338a3c94388c745c99 drm-tip: 2017y-05m-03d-10h-11m-12s UTC integration manifest
fdbe0a2 drm/i915: Micro-optimise hotpath through intel_ring_begin()
90964bf drm/i915: Report the ring->space from intel_ring_update_space()
1c35c55 drm/i915: Avoid the branch in computing intel_ring_space()
== Logs ==
For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4609/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 1/3] drm/i915: Avoid the branch in computing intel_ring_space()
2017-05-03 10:06 [PATCH 1/3] drm/i915: Avoid the branch in computing intel_ring_space() Chris Wilson
` (2 preceding siblings ...)
2017-05-03 11:04 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Avoid the branch in computing intel_ring_space() Patchwork
@ 2017-05-03 11:53 ` Mika Kuoppala
2017-05-03 12:02 ` Chris Wilson
3 siblings, 1 reply; 12+ messages in thread
From: Mika Kuoppala @ 2017-05-03 11:53 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Exploit the power-of-two ring size to compute the space across the
> wraparound using a mask rather than a if. Convert to unsigned integers
> so the operation is well defined.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
> drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++----------
> drivers/gpu/drm/i915/intel_ringbuffer.h | 36 ++++++++++++++++++++-------------
> 2 files changed, 34 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 29b5afac7856..5947bba20c4a 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -39,12 +39,16 @@
> */
> #define LEGACY_REQUEST_SIZE 200
>
> -static int __intel_ring_space(int head, int tail, int size)
> +static unsigned int __intel_ring_space(unsigned int head,
> + unsigned int tail,
> + unsigned int size)
Why not u32 here too?
> {
> - int space = head - tail;
> - if (space <= 0)
> - space += size;
> - return space - I915_RING_FREE_SPACE;
> + /*
> + * "If the Ring Buffer Head Pointer and the Tail Pointer are on the
> + * same cacheline, the Head Pointer must not be greater than the Tail
> + * Pointer."
> + */
> + return (head - tail - CACHELINE_BYTES) & (size - 1);
I almost went and complained about the incorrect calculation here
but rereading the commit message I regained my senses.
> }
>
> void intel_ring_update_space(struct intel_ring *ring)
> @@ -1669,12 +1673,9 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
Bytes should be converted also to unsigned at some point.
> GEM_BUG_ON(!req->reserved_space);
>
> list_for_each_entry(target, &ring->request_list, ring_link) {
> - unsigned space;
> -
> /* Would completion of this request free enough space? */
> - space = __intel_ring_space(target->postfix, ring->emit,
> - ring->size);
> - if (space >= bytes)
> + if (bytes <= __intel_ring_space(target->postfix,
> + ring->emit, ring->size))
> break;
> }
>
> @@ -1743,11 +1744,11 @@ u32 *intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
> }
>
> GEM_BUG_ON(ring->emit > ring->size - bytes);
> + GEM_BUG_ON(ring->space < bytes);
> cs = ring->vaddr + ring->emit;
> GEM_DEBUG_EXEC(memset(cs, POISON_INUSE, bytes));
> ring->emit += bytes;
> ring->space -= bytes;
> - GEM_BUG_ON(ring->space < 0);
>
> return cs;
> }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 02d741ef99ad..64584d134681 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -17,17 +17,6 @@
> #define CACHELINE_BYTES 64
> #define CACHELINE_DWORDS (CACHELINE_BYTES / sizeof(uint32_t))
>
> -/*
> - * Gen2 BSpec "1. Programming Environment" / 1.4.4.6 "Ring Buffer Use"
> - * Gen3 BSpec "vol1c Memory Interface Functions" / 2.3.4.5 "Ring Buffer Use"
> - * Gen4+ BSpec "vol1c Memory Interface and Command Stream" / 5.3.4.5 "Ring Buffer Use"
> - *
> - * "If the Ring Buffer Head Pointer and the Tail Pointer are on the same
> - * cacheline, the Head Pointer must not be greater than the Tail
> - * Pointer."
> - */
> -#define I915_RING_FREE_SPACE 64
> -
> struct intel_hw_status_page {
> struct i915_vma *vma;
> u32 *page_addr;
> @@ -145,9 +134,9 @@ struct intel_ring {
> u32 tail;
> u32 emit;
>
> - int space;
> - int size;
> - int effective_size;
> + u32 space;
> + u32 size;
> + u32 effective_size;
> };
>
> struct i915_gem_context;
> @@ -548,6 +537,25 @@ assert_ring_tail_valid(const struct intel_ring *ring, unsigned int tail)
> */
> GEM_BUG_ON(!IS_ALIGNED(tail, 8));
> GEM_BUG_ON(tail >= ring->size);
> +
> + /*
> + * "Ring Buffer Use"
> + * Gen2 BSpec "1. Programming Environment" / 1.4.4.6
> + * Gen3 BSpec "1c Memory Interface Functions" / 2.3.4.5
> + * Gen4+ BSpec "1c Memory Interface and Command Stream" / 5.3.4.5
> + * "If the Ring Buffer Head Pointer and the Tail Pointer are on the
> + * same cacheline, the Head Pointer must not be greater than the Tail
> + * Pointer."
> + *
> + * We use ring->head as the last known location of the actual RING_HEAD,
> + * it may have advanced but in the worst case it is equally the same
> + * as ring->head and so we should never program RING_TAIL to advance
> + * into the same cacheline as ring->head.
> + */
> +#define cacheline(a) round_down(a, CACHELINE_BYTES)
> + GEM_BUG_ON(cacheline(tail) == cacheline(ring->head) &&
> + tail < ring->head);
Nitpick, ring->head > tail should have fit the quote above better.
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
> +#undef cacheline
> }
>
> static inline unsigned int
> --
> 2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 1/3] drm/i915: Avoid the branch in computing intel_ring_space()
2017-05-03 11:53 ` [PATCH 1/3] " Mika Kuoppala
@ 2017-05-03 12:02 ` Chris Wilson
2017-05-03 12:08 ` Chris Wilson
0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2017-05-03 12:02 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: intel-gfx
On Wed, May 03, 2017 at 02:53:43PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > Exploit the power-of-two ring size to compute the space across the
> > wraparound using a mask rather than a if. Convert to unsigned integers
> > so the operation is well defined.
> >
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > ---
> > drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++----------
> > drivers/gpu/drm/i915/intel_ringbuffer.h | 36 ++++++++++++++++++++-------------
> > 2 files changed, 34 insertions(+), 25 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 29b5afac7856..5947bba20c4a 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -39,12 +39,16 @@
> > */
> > #define LEGACY_REQUEST_SIZE 200
> >
> > -static int __intel_ring_space(int head, int tail, int size)
> > +static unsigned int __intel_ring_space(unsigned int head,
> > + unsigned int tail,
> > + unsigned int size)
>
> Why not u32 here too?
Because there's no need to constrain the compiler's freedom in choosing
a natural uint.
> > {
> > - int space = head - tail;
> > - if (space <= 0)
> > - space += size;
> > - return space - I915_RING_FREE_SPACE;
> > + /*
> > + * "If the Ring Buffer Head Pointer and the Tail Pointer are on the
> > + * same cacheline, the Head Pointer must not be greater than the Tail
> > + * Pointer."
> > + */
> > + return (head - tail - CACHELINE_BYTES) & (size - 1);
>
> I almost went and complained about the incorrect calculation here
> but rereading the commit message I regained my senses.
>
> > }
> >
> > void intel_ring_update_space(struct intel_ring *ring)
> > @@ -1669,12 +1673,9 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>
> Bytes should be converted also to unsigned at some point.
Might as well do it now, indeed.
>
> > GEM_BUG_ON(!req->reserved_space);
> >
> > list_for_each_entry(target, &ring->request_list, ring_link) {
> > - unsigned space;
> > -
> > /* Would completion of this request free enough space? */
> > - space = __intel_ring_space(target->postfix, ring->emit,
> > - ring->size);
> > - if (space >= bytes)
> > + if (bytes <= __intel_ring_space(target->postfix,
> > + ring->emit, ring->size))
> > break;
> > }
> >
> > @@ -1743,11 +1744,11 @@ u32 *intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
> > }
> >
> > GEM_BUG_ON(ring->emit > ring->size - bytes);
> > + GEM_BUG_ON(ring->space < bytes);
> > cs = ring->vaddr + ring->emit;
> > GEM_DEBUG_EXEC(memset(cs, POISON_INUSE, bytes));
> > ring->emit += bytes;
> > ring->space -= bytes;
> > - GEM_BUG_ON(ring->space < 0);
> >
> > return cs;
> > }
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > index 02d741ef99ad..64584d134681 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -17,17 +17,6 @@
> > #define CACHELINE_BYTES 64
> > #define CACHELINE_DWORDS (CACHELINE_BYTES / sizeof(uint32_t))
> >
> > -/*
> > - * Gen2 BSpec "1. Programming Environment" / 1.4.4.6 "Ring Buffer Use"
> > - * Gen3 BSpec "vol1c Memory Interface Functions" / 2.3.4.5 "Ring Buffer Use"
> > - * Gen4+ BSpec "vol1c Memory Interface and Command Stream" / 5.3.4.5 "Ring Buffer Use"
> > - *
> > - * "If the Ring Buffer Head Pointer and the Tail Pointer are on the same
> > - * cacheline, the Head Pointer must not be greater than the Tail
> > - * Pointer."
> > - */
> > -#define I915_RING_FREE_SPACE 64
> > -
> > struct intel_hw_status_page {
> > struct i915_vma *vma;
> > u32 *page_addr;
> > @@ -145,9 +134,9 @@ struct intel_ring {
> > u32 tail;
> > u32 emit;
> >
> > - int space;
> > - int size;
> > - int effective_size;
> > + u32 space;
> > + u32 size;
> > + u32 effective_size;
> > };
> >
> > struct i915_gem_context;
> > @@ -548,6 +537,25 @@ assert_ring_tail_valid(const struct intel_ring *ring, unsigned int tail)
> > */
> > GEM_BUG_ON(!IS_ALIGNED(tail, 8));
> > GEM_BUG_ON(tail >= ring->size);
> > +
> > + /*
> > + * "Ring Buffer Use"
> > + * Gen2 BSpec "1. Programming Environment" / 1.4.4.6
> > + * Gen3 BSpec "1c Memory Interface Functions" / 2.3.4.5
> > + * Gen4+ BSpec "1c Memory Interface and Command Stream" / 5.3.4.5
> > + * "If the Ring Buffer Head Pointer and the Tail Pointer are on the
> > + * same cacheline, the Head Pointer must not be greater than the Tail
> > + * Pointer."
> > + *
> > + * We use ring->head as the last known location of the actual RING_HEAD,
> > + * it may have advanced but in the worst case it is equally the same
> > + * as ring->head and so we should never program RING_TAIL to advance
> > + * into the same cacheline as ring->head.
> > + */
> > +#define cacheline(a) round_down(a, CACHELINE_BYTES)
> > + GEM_BUG_ON(cacheline(tail) == cacheline(ring->head) &&
> > + tail < ring->head);
>
> Nitpick, ring->head > tail should have fit the quote above better.
It's hard though. Here there is an equivalence between ring->tail and
RING_TAIL, but the relationship between ring->head and RING_HEAD is much
more relaxed, so I felt worth commenting upon following our last
discussion. The function is assert_ring_tail_valid() so I assume the
ring->tail == RING_TAIL is clear enough and I don't need to make the
comment even more convoluted.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 1/3] drm/i915: Avoid the branch in computing intel_ring_space()
2017-05-03 12:02 ` Chris Wilson
@ 2017-05-03 12:08 ` Chris Wilson
0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-05-03 12:08 UTC (permalink / raw)
To: Mika Kuoppala, intel-gfx
On Wed, May 03, 2017 at 01:02:40PM +0100, Chris Wilson wrote:
> On Wed, May 03, 2017 at 02:53:43PM +0300, Mika Kuoppala wrote:
> > > void intel_ring_update_space(struct intel_ring *ring)
> > > @@ -1669,12 +1673,9 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
> >
> > Bytes should be converted also to unsigned at some point.
>
> Might as well do it now, indeed.
Actually I'll do it in patch 3 as there are nearby similar changes.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/3] drm/i915: Avoid the branch in computing intel_ring_space()
@ 2017-04-26 8:37 Chris Wilson
2017-04-26 8:37 ` [PATCH 3/3] drm/i915: Micro-optimise hotpath through intel_ring_begin() Chris Wilson
0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2017-04-26 8:37 UTC (permalink / raw)
To: intel-gfx; +Cc: Mika Kuoppala
Exploit the power-of-two ring size to compute the space across the
wraparound using a mask rather than a if. Convert to unsigned integers
so the operation is well defined.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
drivers/gpu/drm/i915/intel_ringbuffer.c | 20 ++++++++++----------
drivers/gpu/drm/i915/intel_ringbuffer.h | 30 ++++++++++++++++--------------
2 files changed, 26 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6836efb7e3d2..16739352f057 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -39,12 +39,15 @@
*/
#define LEGACY_REQUEST_SIZE 200
-static int __intel_ring_space(int head, int tail, int size)
+static unsigned int __intel_ring_space(unsigned int head,
+ unsigned int tail,
+ unsigned int size)
{
- int space = head - tail;
- if (space <= 0)
- space += size;
- return space - I915_RING_FREE_SPACE;
+ /* "If the Ring Buffer Head Pointer and the Tail Pointer are on the
+ * same cacheline, the Head Pointer must not be greater than the Tail
+ * Pointer."
+ */
+ return (head - tail - CACHELINE_BYTES) & (size - 1);
}
void intel_ring_update_space(struct intel_ring *ring)
@@ -1619,12 +1622,9 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
GEM_BUG_ON(!req->reserved_space);
list_for_each_entry(target, &ring->request_list, ring_link) {
- unsigned space;
-
/* Would completion of this request free enough space? */
- space = __intel_ring_space(target->postfix, ring->emit,
- ring->size);
- if (space >= bytes)
+ if (bytes <= __intel_ring_space(target->postfix,
+ ring->emit, ring->size))
break;
}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 96710b616efb..1b1099b94912 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -17,17 +17,6 @@
#define CACHELINE_BYTES 64
#define CACHELINE_DWORDS (CACHELINE_BYTES / sizeof(uint32_t))
-/*
- * Gen2 BSpec "1. Programming Environment" / 1.4.4.6 "Ring Buffer Use"
- * Gen3 BSpec "vol1c Memory Interface Functions" / 2.3.4.5 "Ring Buffer Use"
- * Gen4+ BSpec "vol1c Memory Interface and Command Stream" / 5.3.4.5 "Ring Buffer Use"
- *
- * "If the Ring Buffer Head Pointer and the Tail Pointer are on the same
- * cacheline, the Head Pointer must not be greater than the Tail
- * Pointer."
- */
-#define I915_RING_FREE_SPACE 64
-
struct intel_hw_status_page {
struct i915_vma *vma;
u32 *page_addr;
@@ -145,9 +134,9 @@ struct intel_ring {
u32 tail;
u32 emit;
- int space;
- int size;
- int effective_size;
+ u32 space;
+ u32 size;
+ u32 effective_size;
};
struct i915_gem_context;
@@ -547,6 +536,19 @@ assert_ring_tail_valid(const struct intel_ring *ring, unsigned int tail)
*/
GEM_BUG_ON(!IS_ALIGNED(tail, 8));
GEM_BUG_ON(tail >= ring->size);
+
+ /* "Ring Buffer Use"
+ * Gen2 BSpec "1. Programming Environment" / 1.4.4.6
+ * Gen3 BSpec "1c Memory Interface Functions" / 2.3.4.5
+ * Gen4+ BSpec "1c Memory Interface and Command Stream" / 5.3.4.5
+ * "If the Ring Buffer Head Pointer and the Tail Pointer are on the
+ * same cacheline, the Head Pointer must not be greater than the Tail
+ * Pointer."
+ */
+#define cacheline(a) round_down(a, CACHELINE_BYTES)
+ GEM_BUG_ON(cacheline(tail) == cacheline(ring->head) &&
+ tail < ring->head);
+#undef cacheline
}
static inline unsigned int
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH 3/3] drm/i915: Micro-optimise hotpath through intel_ring_begin()
2017-04-26 8:37 Chris Wilson
@ 2017-04-26 8:37 ` Chris Wilson
0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-04-26 8:37 UTC (permalink / raw)
To: intel-gfx
Typically, there is space available within the ring and if not we have
to wait (by definition a slow path). Rearrange the code to reduce the
number of branches and stack size for the hotpath, accomodating a slight
growth for the wait.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/intel_ringbuffer.c | 61 +++++++++++++++++----------------
1 file changed, 31 insertions(+), 30 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 25e9a0dcc42d..c1227eff28e0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1603,7 +1603,7 @@ static int ring_request_alloc(struct drm_i915_gem_request *request)
return 0;
}
-static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
+static noinline int wait_for_space(struct drm_i915_gem_request *req, int bytes)
{
struct intel_ring *ring = req->ring;
struct drm_i915_gem_request *target;
@@ -1651,49 +1651,50 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
u32 *intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
{
struct intel_ring *ring = req->ring;
- int remain_actual = ring->size - ring->emit;
- int remain_usable = ring->effective_size - ring->emit;
- int bytes = num_dwords * sizeof(u32);
- int total_bytes, wait_bytes;
- bool need_wrap = false;
+ const unsigned int remain_usable = ring->effective_size - ring->emit;
+ const unsigned int bytes = num_dwords * sizeof(u32);
+ unsigned int need_wrap = 0;
+ unsigned int total_bytes;
u32 *cs;
total_bytes = bytes + req->reserved_space;
+ GEM_BUG_ON(total_bytes > remain_usable);
- if (unlikely(bytes > remain_usable)) {
- /*
- * Not enough space for the basic request. So need to flush
- * out the remainder and then wait for base + reserved.
- */
- wait_bytes = remain_actual + total_bytes;
- need_wrap = true;
- } else if (unlikely(total_bytes > remain_usable)) {
- /*
- * The base request will fit but the reserved space
- * falls off the end. So we don't need an immediate wrap
- * and only need to effectively wait for the reserved
- * size space from the start of ringbuffer.
- */
- wait_bytes = remain_actual + req->reserved_space;
- } else {
- /* No wrapping required, just waiting. */
- wait_bytes = total_bytes;
+ if (unlikely(total_bytes > remain_usable)) {
+ const int remain_actual = ring->size - ring->emit;
+
+ if (bytes > remain_usable) {
+ /* Not enough space for the basic request. So need to
+ * flush out the remainder and then wait for
+ * base + reserved.
+ */
+ total_bytes += remain_actual;
+ need_wrap = remain_actual | 1;
+ } else {
+ /* The base request will fit but the reserved space
+ * falls off the end. So we don't need an immediate
+ * wrap and only need to effectively wait for the
+ * reserved size from the start of ringbuffer.
+ */
+ total_bytes = req->reserved_space + remain_actual;
+ }
}
- if (wait_bytes > ring->space) {
- int ret = wait_for_space(req, wait_bytes);
+ if (unlikely(total_bytes > ring->space)) {
+ int ret = wait_for_space(req, total_bytes);
if (unlikely(ret))
return ERR_PTR(ret);
}
if (unlikely(need_wrap)) {
- GEM_BUG_ON(remain_actual > ring->space);
- GEM_BUG_ON(ring->emit + remain_actual > ring->size);
+ need_wrap &= ~1;
+ GEM_BUG_ON(need_wrap > ring->space);
+ GEM_BUG_ON(ring->emit + need_wrap > ring->size);
/* Fill the tail with MI_NOOP */
- memset(ring->vaddr + ring->emit, 0, remain_actual);
+ memset(ring->vaddr + ring->emit, 0, need_wrap);
ring->emit = 0;
- ring->space -= remain_actual;
+ ring->space -= need_wrap;
}
GEM_BUG_ON(ring->emit > ring->size - bytes);
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2017-05-04 13:05 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-05-03 10:06 [PATCH 1/3] drm/i915: Avoid the branch in computing intel_ring_space() Chris Wilson
2017-05-03 10:06 ` [PATCH 2/3] drm/i915: Report the ring->space from intel_ring_update_space() Chris Wilson
2017-05-03 10:06 ` [PATCH 3/3] drm/i915: Micro-optimise hotpath through intel_ring_begin() Chris Wilson
2017-05-04 12:11 ` Mika Kuoppala
2017-05-04 12:27 ` Chris Wilson
2017-05-04 12:59 ` Mika Kuoppala
2017-05-04 13:05 ` Chris Wilson
2017-05-03 11:04 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Avoid the branch in computing intel_ring_space() Patchwork
2017-05-03 11:53 ` [PATCH 1/3] " Mika Kuoppala
2017-05-03 12:02 ` Chris Wilson
2017-05-03 12:08 ` Chris Wilson
-- strict thread matches above, loose matches on Subject: below --
2017-04-26 8:37 Chris Wilson
2017-04-26 8:37 ` [PATCH 3/3] drm/i915: Micro-optimise hotpath through intel_ring_begin() Chris Wilson
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.