public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Optimise VMA lookup slightly
@ 2016-12-13 12:22 Tvrtko Ursulin
  2016-12-13 12:41 ` Chris Wilson
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-12-13 12:22 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A few details to hopefully make a very hot function a tiny bit
more efficient:

 1. Cast VM pointers before substraction to save the compiler
    doing a smart one which includes multiplication.

 2. Use smaller type for comparison since we only care about
    the sign.

 3. Prefer the ppgtt lookup branch and inline it, allowing the
    compiler to optimise out the second part of i915_vma_compare
    and save one call indirection.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 23 +++++++++++++++++------
 drivers/gpu/drm/i915/i915_vma.h     |  9 ++++++---
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ef00d36680c9..aa81945a608b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3338,17 +3338,17 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 	i915_ggtt_flush(dev_priv);
 }
 
-struct i915_vma *
-i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
-		    struct i915_address_space *vm,
-		    const struct i915_ggtt_view *view)
+static inline struct i915_vma *
+__i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
+		      struct i915_address_space *vm,
+		      const struct i915_ggtt_view *view)
 {
 	struct rb_node *rb;
 
 	rb = obj->vma_tree.rb_node;
 	while (rb) {
 		struct i915_vma *vma = rb_entry(rb, struct i915_vma, obj_node);
-		long cmp;
+		int cmp;
 
 		cmp = i915_vma_compare(vma, vm, view);
 		if (cmp == 0)
@@ -3363,6 +3363,14 @@ i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 	return NULL;
 }
 
+noinline struct i915_vma *
+i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
+		    struct i915_address_space *vm,
+		    const struct i915_ggtt_view *view)
+{
+	return __i915_gem_obj_to_vma(obj, vm, view);
+}
+
 struct i915_vma *
 i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
 				  struct i915_address_space *vm,
@@ -3373,7 +3381,10 @@ i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
 	lockdep_assert_held(&obj->base.dev->struct_mutex);
 	GEM_BUG_ON(view && !i915_is_ggtt(vm));
 
-	vma = i915_gem_obj_to_vma(obj, vm, view);
+	if (likely(!view))
+		vma = __i915_gem_obj_to_vma(obj, vm, NULL);
+	else
+		vma = i915_gem_obj_to_vma(obj, vm, view);
 	if (!vma) {
 		vma = i915_vma_create(obj, vm, view);
 		GEM_BUG_ON(vma != i915_gem_obj_to_vma(obj, vm, view));
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 21be74c61065..098f206c1a4d 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -178,15 +178,18 @@ static inline void i915_vma_put(struct i915_vma *vma)
 	i915_gem_object_put(vma->obj);
 }
 
-static inline long
+static inline int
 i915_vma_compare(struct i915_vma *vma,
 		 struct i915_address_space *vm,
 		 const struct i915_ggtt_view *view)
 {
+	long cmp;
+
 	GEM_BUG_ON(view && !i915_is_ggtt(vm));
 
-	if (vma->vm != vm)
-		return vma->vm - vm;
+	cmp = (unsigned long)vma->vm - (unsigned long)vm;
+	if (cmp)
+		return cmp;
 
 	if (!view)
 		return vma->ggtt_view.type;
-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Optimise VMA lookup slightly
  2016-12-13 12:22 [PATCH] drm/i915: Optimise VMA lookup slightly Tvrtko Ursulin
@ 2016-12-13 12:41 ` Chris Wilson
  2016-12-13 13:30   ` Tvrtko Ursulin
  2016-12-13 12:45 ` ✓ Fi.CI.BAT: success for " Patchwork
  2016-12-13 17:23 ` ✗ Fi.CI.BAT: warning for drm/i915: Optimise VMA lookup slightly (rev2) Patchwork
  2 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-12-13 12:41 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Tue, Dec 13, 2016 at 12:22:18PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> A few details to hopefully make a very hot function a tiny bit
> more efficient:
> 
>  1. Cast VM pointers before substraction to save the compiler
>     doing a smart one which includes multiplication.

Indeed. Not pretty though.

static always_inline __kernel_ptrdiff_t ptrdiff(const void *a, const void *b)
{
	return a - b;
}

cmp = ptrdiff(vma->vm, vm);
if (cmp)
	return cmp;


>  2. Use smaller type for comparison since we only care about
>     the sign.

Should be a no-op since the compiler also should only care about the
sign and not be moving the registers about, just the cc and we should be
inlining... Is gcc not smart enough? :(

> 
>  3. Prefer the ppgtt lookup branch and inline it, allowing the
>     compiler to optimise out the second part of i915_vma_compare
>     and save one call indirection.

This runs counter to a better optimisation that completely avoids
calling obj_to_vma for ppgtt lookups (i.e. in execbuffer we go straight
from handle to vma, skipping the handle to obj intermediate lookup).

Primary caller for this function should be ggtt users, with single
negative lookups before creating the ppgtt vma.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: Optimise VMA lookup slightly
  2016-12-13 12:22 [PATCH] drm/i915: Optimise VMA lookup slightly Tvrtko Ursulin
  2016-12-13 12:41 ` Chris Wilson
@ 2016-12-13 12:45 ` Patchwork
  2016-12-13 17:23 ` ✗ Fi.CI.BAT: warning for drm/i915: Optimise VMA lookup slightly (rev2) Patchwork
  2 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2016-12-13 12:45 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Optimise VMA lookup slightly
URL   : https://patchwork.freedesktop.org/series/16740/
State : success

== Summary ==

Series 16740v1 drm/i915: Optimise VMA lookup slightly
https://patchwork.freedesktop.org/api/1.0/series/16740/revisions/1/mbox/

Test gem_ringfill:
        Subgroup basic-default-hang:
                incomplete -> PASS       (fi-snb-2600)

fi-bdw-5557u     total:247  pass:233  dwarn:0   dfail:0   fail:0   skip:14 
fi-bsw-n3050     total:247  pass:208  dwarn:0   dfail:0   fail:0   skip:39 
fi-bxt-t5700     total:247  pass:220  dwarn:0   dfail:0   fail:0   skip:27 
fi-byt-j1900     total:247  pass:220  dwarn:0   dfail:0   fail:0   skip:27 
fi-hsw-4770      total:247  pass:228  dwarn:0   dfail:0   fail:0   skip:19 
fi-hsw-4770r     total:247  pass:228  dwarn:0   dfail:0   fail:0   skip:19 
fi-ilk-650       total:247  pass:195  dwarn:0   dfail:0   fail:0   skip:52 
fi-ivb-3520m     total:247  pass:226  dwarn:0   dfail:0   fail:0   skip:21 
fi-ivb-3770      total:247  pass:226  dwarn:0   dfail:0   fail:0   skip:21 
fi-kbl-7500u     total:247  pass:226  dwarn:0   dfail:0   fail:0   skip:21 
fi-skl-6260u     total:247  pass:234  dwarn:0   dfail:0   fail:0   skip:13 
fi-skl-6700hq    total:247  pass:227  dwarn:0   dfail:0   fail:0   skip:20 
fi-skl-6700k     total:247  pass:224  dwarn:3   dfail:0   fail:0   skip:20 
fi-skl-6770hq    total:247  pass:234  dwarn:0   dfail:0   fail:0   skip:13 
fi-snb-2600      total:247  pass:215  dwarn:0   dfail:0   fail:0   skip:32 

d4ed6d3718cc5f9f2832177d292f425466a8a460 drm-tip: 2016y-12m-13d-10h-41m-44s UTC integration manifest
3858095 drm/i915: Optimise VMA lookup slightly

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3274/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Optimise VMA lookup slightly
  2016-12-13 12:41 ` Chris Wilson
@ 2016-12-13 13:30   ` Tvrtko Ursulin
  2016-12-13 13:41     ` Chris Wilson
  0 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-12-13 13:30 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx, Tvrtko Ursulin


On 13/12/2016 12:41, Chris Wilson wrote:
> On Tue, Dec 13, 2016 at 12:22:18PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> A few details to hopefully make a very hot function a tiny bit
>> more efficient:
>>
>>  1. Cast VM pointers before substraction to save the compiler
>>     doing a smart one which includes multiplication.
>
> Indeed. Not pretty though.
>
> static always_inline __kernel_ptrdiff_t ptrdiff(const void *a, const void *b)
> {
> 	return a - b;
> }
>
> cmp = ptrdiff(vma->vm, vm);
> if (cmp)
> 	return cmp;

Pffifle as you would say. :)

>>  2. Use smaller type for comparison since we only care about
>>     the sign.
>
> Should be a no-op since the compiler also should only care about the
> sign and not be moving the registers about, just the cc and we should be
> inlining... Is gcc not smart enough? :(

I am pretty sure there was a significant difference in the size of 
generated code so apparently it isn't.

>>  3. Prefer the ppgtt lookup branch and inline it, allowing the
>>     compiler to optimise out the second part of i915_vma_compare
>>     and save one call indirection.
>
> This runs counter to a better optimisation that completely avoids
> calling obj_to_vma for ppgtt lookups (i.e. in execbuffer we go straight
> from handle to vma, skipping the handle to obj intermediate lookup).
>
> Primary caller for this function should be ggtt users, with single
> negative lookups before creating the ppgtt vma.

Empirically ppgtt users call this much much more on any benchmarks I've 
tried since with the change all I see is lookup_and_create in the 
profiles and not obj_to_vma.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Optimise VMA lookup slightly
  2016-12-13 13:30   ` Tvrtko Ursulin
@ 2016-12-13 13:41     ` Chris Wilson
  2016-12-13 14:37       ` [PATCH v2] " Tvrtko Ursulin
  2016-12-13 14:40       ` [PATCH] " Tvrtko Ursulin
  0 siblings, 2 replies; 12+ messages in thread
From: Chris Wilson @ 2016-12-13 13:41 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Tue, Dec 13, 2016 at 01:30:19PM +0000, Tvrtko Ursulin wrote:
> 
> On 13/12/2016 12:41, Chris Wilson wrote:
> >On Tue, Dec 13, 2016 at 12:22:18PM +0000, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>A few details to hopefully make a very hot function a tiny bit
> >>more efficient:
> >>
> >> 1. Cast VM pointers before substraction to save the compiler
> >>    doing a smart one which includes multiplication.
> >
> >Indeed. Not pretty though.
> >
> >static always_inline __kernel_ptrdiff_t ptrdiff(const void *a, const void *b)
> >{
> >	return a - b;
> >}
> >
> >cmp = ptrdiff(vma->vm, vm);
> >if (cmp)
> >	return cmp;
> 
> Pffifle as you would say. :)
> 
> >> 2. Use smaller type for comparison since we only care about
> >>    the sign.
> >
> >Should be a no-op since the compiler also should only care about the
> >sign and not be moving the registers about, just the cc and we should be
> >inlining... Is gcc not smart enough? :(
> 
> I am pretty sure there was a significant difference in the size of
> generated code so apparently it isn't.

If you have two available, can you attach the asm?
 
> >> 3. Prefer the ppgtt lookup branch and inline it, allowing the
> >>    compiler to optimise out the second part of i915_vma_compare
> >>    and save one call indirection.
> >
> >This runs counter to a better optimisation that completely avoids
> >calling obj_to_vma for ppgtt lookups (i.e. in execbuffer we go straight
> >from handle to vma, skipping the handle to obj intermediate lookup).
> >
> >Primary caller for this function should be ggtt users, with single
> >negative lookups before creating the ppgtt vma.
> 
> Empirically ppgtt users call this much much more on any benchmarks
> I've tried since with the change all I see is lookup_and_create in
> the profiles and not obj_to_vma.

Exactly, hence why we should tackle the root of the problem... Replace a
slow idr lookup and rbtree walk with a direct lookup (hashtable) from
the fpriv. Since it doesn't just affect full-ppgtt but the introduction
of the second lookup regressed everyone,
https://bugs.freedesktop.org/show_bug.cgi?id=87131 and the myriad of
igt/benchmark/gem_exec_reloc benchmarks.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] drm/i915: Optimise VMA lookup slightly
  2016-12-13 13:41     ` Chris Wilson
@ 2016-12-13 14:37       ` Tvrtko Ursulin
  2016-12-13 14:47         ` Chris Wilson
  2016-12-13 14:40       ` [PATCH] " Tvrtko Ursulin
  1 sibling, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-12-13 14:37 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Cast VM pointers before substraction to save the compiler
doing a smart one which includes multiplication.

v2: Only keep the first optimisation and prettify it. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_vma.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 21be74c61065..e3b2b3b1e056 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -178,15 +178,23 @@ static inline void i915_vma_put(struct i915_vma *vma)
 	i915_gem_object_put(vma->obj);
 }
 
+static __always_inline ptrdiff_t ptrdiff(const void *a, const void *b)
+{
+	return a - b;
+}
+
 static inline long
 i915_vma_compare(struct i915_vma *vma,
 		 struct i915_address_space *vm,
 		 const struct i915_ggtt_view *view)
 {
+	ptrdiff_t cmp;
+
 	GEM_BUG_ON(view && !i915_is_ggtt(vm));
 
-	if (vma->vm != vm)
-		return vma->vm - vm;
+	cmp = ptrdiff(vma->vm, vm);
+	if (cmp)
+		return cmp;
 
 	if (!view)
 		return vma->ggtt_view.type;
-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Optimise VMA lookup slightly
  2016-12-13 13:41     ` Chris Wilson
  2016-12-13 14:37       ` [PATCH v2] " Tvrtko Ursulin
@ 2016-12-13 14:40       ` Tvrtko Ursulin
  1 sibling, 0 replies; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-12-13 14:40 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx, Tvrtko Ursulin


On 13/12/2016 13:41, Chris Wilson wrote:
> On Tue, Dec 13, 2016 at 01:30:19PM +0000, Tvrtko Ursulin wrote:
>>
>> On 13/12/2016 12:41, Chris Wilson wrote:
>>> On Tue, Dec 13, 2016 at 12:22:18PM +0000, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> A few details to hopefully make a very hot function a tiny bit
>>>> more efficient:
>>>>
>>>> 1. Cast VM pointers before substraction to save the compiler
>>>>    doing a smart one which includes multiplication.
>>>
>>> Indeed. Not pretty though.
>>>
>>> static always_inline __kernel_ptrdiff_t ptrdiff(const void *a, const void *b)
>>> {
>>> 	return a - b;
>>> }
>>>
>>> cmp = ptrdiff(vma->vm, vm);
>>> if (cmp)
>>> 	return cmp;
>>
>> Pffifle as you would say. :)
>>
>>>> 2. Use smaller type for comparison since we only care about
>>>>    the sign.
>>>
>>> Should be a no-op since the compiler also should only care about the
>>> sign and not be moving the registers about, just the cc and we should be
>>> inlining... Is gcc not smart enough? :(
>>
>> I am pretty sure there was a significant difference in the size of
>> generated code so apparently it isn't.
>
> If you have two available, can you attach the asm?

http://paste.debian.net/901995/ if you can figure it out.

Dropped it from v2 though.

>>>> 3. Prefer the ppgtt lookup branch and inline it, allowing the
>>>>    compiler to optimise out the second part of i915_vma_compare
>>>>    and save one call indirection.
>>>
>>> This runs counter to a better optimisation that completely avoids
>>> calling obj_to_vma for ppgtt lookups (i.e. in execbuffer we go straight
>> >from handle to vma, skipping the handle to obj intermediate lookup).
>>>
>>> Primary caller for this function should be ggtt users, with single
>>> negative lookups before creating the ppgtt vma.
>>
>> Empirically ppgtt users call this much much more on any benchmarks
>> I've tried since with the change all I see is lookup_and_create in
>> the profiles and not obj_to_vma.
>
> Exactly, hence why we should tackle the root of the problem... Replace a
> slow idr lookup and rbtree walk with a direct lookup (hashtable) from
> the fpriv. Since it doesn't just affect full-ppgtt but the introduction
> of the second lookup regressed everyone,
> https://bugs.freedesktop.org/show_bug.cgi?id=87131 and the myriad of
> igt/benchmark/gem_exec_reloc benchmarks.

Dropped this from v2 as well, only kept ptrdiff.

Will have a look around for the things you mentioned here.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] drm/i915: Optimise VMA lookup slightly
  2016-12-13 14:37       ` [PATCH v2] " Tvrtko Ursulin
@ 2016-12-13 14:47         ` Chris Wilson
  2016-12-15 16:49           ` Tvrtko Ursulin
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-12-13 14:47 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Tue, Dec 13, 2016 at 02:37:27PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Cast VM pointers before substraction to save the compiler
> doing a smart one which includes multiplication.
> 
> v2: Only keep the first optimisation and prettify it. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>

Step 1, ok.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

(I wasn't against the others, just curious as to what gcc was doing for
#2 and #3 I'd like just to pursue a different path altogether :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* ✗ Fi.CI.BAT: warning for drm/i915: Optimise VMA lookup slightly (rev2)
  2016-12-13 12:22 [PATCH] drm/i915: Optimise VMA lookup slightly Tvrtko Ursulin
  2016-12-13 12:41 ` Chris Wilson
  2016-12-13 12:45 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2016-12-13 17:23 ` Patchwork
  2016-12-15 13:33   ` Tvrtko Ursulin
  2 siblings, 1 reply; 12+ messages in thread
From: Patchwork @ 2016-12-13 17:23 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Optimise VMA lookup slightly (rev2)
URL   : https://patchwork.freedesktop.org/series/16740/
State : warning

== Summary ==

Series 16740v2 drm/i915: Optimise VMA lookup slightly
https://patchwork.freedesktop.org/api/1.0/series/16740/revisions/2/mbox/

Test drv_module_reload:
        Subgroup basic-reload:
                pass       -> DMESG-WARN (fi-bsw-n3050)
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-b:
                incomplete -> PASS       (fi-snb-2600)

fi-bdw-5557u     total:247  pass:233  dwarn:0   dfail:0   fail:0   skip:14 
fi-bsw-n3050     total:247  pass:207  dwarn:1   dfail:0   fail:0   skip:39 
fi-bxt-t5700     total:247  pass:220  dwarn:0   dfail:0   fail:0   skip:27 
fi-byt-j1900     total:247  pass:220  dwarn:0   dfail:0   fail:0   skip:27 
fi-byt-n2820     total:247  pass:216  dwarn:0   dfail:0   fail:0   skip:31 
fi-hsw-4770      total:247  pass:228  dwarn:0   dfail:0   fail:0   skip:19 
fi-hsw-4770r     total:247  pass:228  dwarn:0   dfail:0   fail:0   skip:19 
fi-ilk-650       total:247  pass:195  dwarn:0   dfail:0   fail:0   skip:52 
fi-ivb-3520m     total:247  pass:226  dwarn:0   dfail:0   fail:0   skip:21 
fi-ivb-3770      total:247  pass:226  dwarn:0   dfail:0   fail:0   skip:21 
fi-kbl-7500u     total:247  pass:226  dwarn:0   dfail:0   fail:0   skip:21 
fi-skl-6260u     total:247  pass:234  dwarn:0   dfail:0   fail:0   skip:13 
fi-skl-6700hq    total:247  pass:227  dwarn:0   dfail:0   fail:0   skip:20 
fi-skl-6700k     total:247  pass:224  dwarn:3   dfail:0   fail:0   skip:20 
fi-skl-6770hq    total:247  pass:234  dwarn:0   dfail:0   fail:0   skip:13 
fi-snb-2520m     total:247  pass:216  dwarn:0   dfail:0   fail:0   skip:31 
fi-snb-2600      total:247  pass:215  dwarn:0   dfail:0   fail:0   skip:32 

c596a6521837b06a740e894030d11024f391bd6e drm-tip: 2016y-12m-13d-15h-48m-22s UTC integration manifest
ee25c15 drm/i915: Optimise VMA lookup slightly

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3276/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ✗ Fi.CI.BAT: warning for drm/i915: Optimise VMA lookup slightly (rev2)
  2016-12-13 17:23 ` ✗ Fi.CI.BAT: warning for drm/i915: Optimise VMA lookup slightly (rev2) Patchwork
@ 2016-12-15 13:33   ` Tvrtko Ursulin
  0 siblings, 0 replies; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-12-15 13:33 UTC (permalink / raw)
  To: intel-gfx, Tvrtko Ursulin


On 13/12/2016 17:23, Patchwork wrote:
> == Series Details ==
>
> Series: drm/i915: Optimise VMA lookup slightly (rev2)
> URL   : https://patchwork.freedesktop.org/series/16740/
> State : warning
>
> == Summary ==
>
> Series 16740v2 drm/i915: Optimise VMA lookup slightly
> https://patchwork.freedesktop.org/api/1.0/series/16740/revisions/2/mbox/
>
> Test drv_module_reload:
>         Subgroup basic-reload:
>                 pass       -> DMESG-WARN (fi-bsw-n3050)

WARN_ON(dev_priv->gt.awake)

https://bugs.freedesktop.org/show_bug.cgi?id=98670

> Test kms_pipe_crc_basic:
>         Subgroup suspend-read-crc-pipe-b:
>                 incomplete -> PASS       (fi-snb-2600)
>
> fi-bdw-5557u     total:247  pass:233  dwarn:0   dfail:0   fail:0   skip:14
> fi-bsw-n3050     total:247  pass:207  dwarn:1   dfail:0   fail:0   skip:39
> fi-bxt-t5700     total:247  pass:220  dwarn:0   dfail:0   fail:0   skip:27
> fi-byt-j1900     total:247  pass:220  dwarn:0   dfail:0   fail:0   skip:27
> fi-byt-n2820     total:247  pass:216  dwarn:0   dfail:0   fail:0   skip:31
> fi-hsw-4770      total:247  pass:228  dwarn:0   dfail:0   fail:0   skip:19
> fi-hsw-4770r     total:247  pass:228  dwarn:0   dfail:0   fail:0   skip:19
> fi-ilk-650       total:247  pass:195  dwarn:0   dfail:0   fail:0   skip:52
> fi-ivb-3520m     total:247  pass:226  dwarn:0   dfail:0   fail:0   skip:21
> fi-ivb-3770      total:247  pass:226  dwarn:0   dfail:0   fail:0   skip:21
> fi-kbl-7500u     total:247  pass:226  dwarn:0   dfail:0   fail:0   skip:21
> fi-skl-6260u     total:247  pass:234  dwarn:0   dfail:0   fail:0   skip:13
> fi-skl-6700hq    total:247  pass:227  dwarn:0   dfail:0   fail:0   skip:20
> fi-skl-6700k     total:247  pass:224  dwarn:3   dfail:0   fail:0   skip:20
> fi-skl-6770hq    total:247  pass:234  dwarn:0   dfail:0   fail:0   skip:13
> fi-snb-2520m     total:247  pass:216  dwarn:0   dfail:0   fail:0   skip:31
> fi-snb-2600      total:247  pass:215  dwarn:0   dfail:0   fail:0   skip:32
>
> c596a6521837b06a740e894030d11024f391bd6e drm-tip: 2016y-12m-13d-15h-48m-22s UTC integration manifest
> ee25c15 drm/i915: Optimise VMA lookup slightly

Pushed, thanks for the review!

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] drm/i915: Optimise VMA lookup slightly
  2016-12-13 14:47         ` Chris Wilson
@ 2016-12-15 16:49           ` Tvrtko Ursulin
  2016-12-15 17:08             ` Chris Wilson
  0 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-12-15 16:49 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx, Tvrtko Ursulin


On 13/12/2016 14:47, Chris Wilson wrote:
> On Tue, Dec 13, 2016 at 02:37:27PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Cast VM pointers before substraction to save the compiler
>> doing a smart one which includes multiplication.
>>
>> v2: Only keep the first optimisation and prettify it. (Chris Wilson)
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>
> Step 1, ok.
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>
> (I wasn't against the others, just curious as to what gcc was doing for
> #2 and #3 I'd like just to pursue a different path altogether :)

Thanks.

Yes I know. Longer VMA lists is not something I've tested yet. I've just 
noticed that even where lookups are predominantly on short lists it can 
still be up to 1% of CPU time spent in the lookup. It averages around 
0.7% AFAIR.

More precisely in that test (which is simply running a vsync limited 
neverball intro screen :)), 65% of all lookups are on single VMA object! 
29% on objects with two VMAs and 29% on on objects with three VMAs. 
That's it, no longer lists at all.

How much benefit for this case smarter lookup would make I was not sure. 
So simply wanted to tighten up the existing search as much as possible. 
Even for that I am not sure that it makes a difference but at least if 
we can pointless instructions why not.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] drm/i915: Optimise VMA lookup slightly
  2016-12-15 16:49           ` Tvrtko Ursulin
@ 2016-12-15 17:08             ` Chris Wilson
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2016-12-15 17:08 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Thu, Dec 15, 2016 at 04:49:49PM +0000, Tvrtko Ursulin wrote:
> 
> On 13/12/2016 14:47, Chris Wilson wrote:
> >On Tue, Dec 13, 2016 at 02:37:27PM +0000, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>Cast VM pointers before substraction to save the compiler
> >>doing a smart one which includes multiplication.
> >>
> >>v2: Only keep the first optimisation and prettify it. (Chris Wilson)
> >>
> >>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >
> >Step 1, ok.
> >Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> >
> >(I wasn't against the others, just curious as to what gcc was doing for
> >#2 and #3 I'd like just to pursue a different path altogether :)
> 
> Thanks.
> 
> Yes I know. Longer VMA lists is not something I've tested yet. I've
> just noticed that even where lookups are predominantly on short
> lists it can still be up to 1% of CPU time spent in the lookup. It
> averages around 0.7% AFAIR.
> 
> More precisely in that test (which is simply running a vsync limited
> neverball intro screen :)), 65% of all lookups are on single VMA
> object! 29% on objects with two VMAs and 29% on on objects with
> three VMAs. That's it, no longer lists at all.

Also note that I've been trying to teach mesa not to be so dumb as well.
We still do get benefits from improving the execbuf vma/reloc paths, but
that pales in comparison to the improvements we can make in mesa.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-12-15 17:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-13 12:22 [PATCH] drm/i915: Optimise VMA lookup slightly Tvrtko Ursulin
2016-12-13 12:41 ` Chris Wilson
2016-12-13 13:30   ` Tvrtko Ursulin
2016-12-13 13:41     ` Chris Wilson
2016-12-13 14:37       ` [PATCH v2] " Tvrtko Ursulin
2016-12-13 14:47         ` Chris Wilson
2016-12-15 16:49           ` Tvrtko Ursulin
2016-12-15 17:08             ` Chris Wilson
2016-12-13 14:40       ` [PATCH] " Tvrtko Ursulin
2016-12-13 12:45 ` ✓ Fi.CI.BAT: success for " Patchwork
2016-12-13 17:23 ` ✗ Fi.CI.BAT: warning for drm/i915: Optimise VMA lookup slightly (rev2) Patchwork
2016-12-15 13:33   ` Tvrtko Ursulin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox