* Re: Corruption in glxgears with Compiz
2010-10-22 19:29 ` Chris Wilson
@ 2010-10-22 19:38 ` Peter Clifton
2010-10-22 20:41 ` Peter Clifton
2010-10-23 3:35 ` Peter Clifton
2 siblings, 0 replies; 18+ messages in thread
From: Peter Clifton @ 2010-10-22 19:38 UTC (permalink / raw)
To: Chris Wilson, intel-gfx@lists.freedesktop.org
On Fri, 2010-10-22 at 20:29 +0100, Chris Wilson wrote:
> My guess is that it is a double application of the drawable offset when
> doing a CopyRegion swapbuffers. Does the corruption move in relation to
> the window as it moves?
Yes, it appears to. Relative to the screen, (and just squinting at it),
the corruption appears to move twice the distance you move the window.
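As a rough illustration of why a doubled drawable offset would give exactly this "moves twice as far" symptom, here is a small sketch in plain C. It is a toy model, not code from the ddx; `screen_x`, `screen_shift` and their parameters are hypothetical names chosen for this example.

```c
/*
 * Toy model, not driver code: where a window-relative x coordinate lands
 * on screen when the drawable offset is applied `applications` times.
 * 1 is the correct behaviour, 2 models the suspected bug.
 */
int screen_x(int window_x, int drawable_offset_x, int applications)
{
    return window_x + applications * drawable_offset_x;
}

/*
 * How far the on-screen result moves when the window itself is moved by
 * `delta` pixels (the window starts at an arbitrary offset of 100).
 */
int screen_shift(int delta, int applications)
{
    int before = screen_x(0, 100, applications);
    int after  = screen_x(0, 100 + delta, applications);
    return after - before;
}
```

With a single application the copied region tracks the window one-for-one; with two applications a 10 pixel window move shifts the result by 20 pixels relative to the screen, which matches the observation above.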
> This suggests that [my] recent changes to the ddx are to blame, and
> certainly a bisection on -intel might help - though it's probably just as
> easy to test before the shadow+dri changes to confirm.
I thought I'd tried reverting to an older ddx, but thinking more
carefully, perhaps I didn't go back very far. I'll try poking at the 2D
driver. Certainly it is quicker to do that than keep rebuilding drm and
rebooting.
Thanks for the pointer.
--
Peter Clifton
Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA
Tel: +44 (0)7729 980173 - (No signal in the lab!)
Tel: +44 (0)1223 748328 - (Shared lab phone, ask for me)
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Corruption in glxgears with Compiz
2010-10-22 19:29 ` Chris Wilson
2010-10-22 19:38 ` Peter Clifton
@ 2010-10-22 20:41 ` Peter Clifton
2010-10-23 3:35 ` Peter Clifton
2 siblings, 0 replies; 18+ messages in thread
From: Peter Clifton @ 2010-10-22 20:41 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
On Fri, 2010-10-22 at 20:29 +0100, Chris Wilson wrote:
> On Fri, 22 Oct 2010 20:10:44 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> > As an additional data-point, with the bug manifesting, if you go to
> > "expose" mode, (Win+E for default config), you find the corruption is
> > absent. It only appears to be present when the glxgears window is not
> > scaled by the window manager.
>
> My guess is that it is a double application of the drawable offset when
> doing a CopyRegion swapbuffers. Does the corruption move in relation to
> the window as it moves?
>
> This suggests that [my] recent changes to the ddx are to blame, and
> certainly a bisection on -intel might help - though it's probably just as
> easy to test before the shadow+dri changes to confirm.
> -Chris
Well, I'm as far back as:
commit d41684d54592cf93554a4d6534e7ea74562b1798
Author: Eric Anholt <eric@anholt.net>
Date: Mon Jun 7 11:18:09 2010 -0700
And I'm still seeing the glitch. This is with drm backported from
878a3c37d36142a192bdf5b6bfcf920832f431d7
If it weren't for Alexey seeing it too with a non-backported version,
I'd suspect I'd made a mistake somewhere. Hmm.. what to try next?
I'd already attempted to revert mesa versions to when I (thought) it was
working nicely with compiz, but I can't recall quite what version that
was now. (I've purged my /var/cache/apt/archives)
Could the Xorg server cause this kind of issue? Looking at the commit
logs, I don't see much in the way of glx activity. (But would that
affect DRI rendering anyway?)
Are there any tests you'd suggest I run?
Regards,
--
Peter Clifton
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Corruption in glxgears with Compiz
2010-10-22 19:29 ` Chris Wilson
2010-10-22 19:38 ` Peter Clifton
2010-10-22 20:41 ` Peter Clifton
@ 2010-10-23 3:35 ` Peter Clifton
2010-10-23 4:07 ` Peter Clifton
2 siblings, 1 reply; 18+ messages in thread
From: Peter Clifton @ 2010-10-23 3:35 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
On Fri, 2010-10-22 at 20:29 +0100, Chris Wilson wrote:
> On Fri, 22 Oct 2010 20:10:44 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> > As an additional data-point, with the bug manifesting, if you go to
> > "expose" mode, (Win+E for default config), you find the corruption is
> > absent. It only appears to be present when the glxgears window is not
> > scaled by the window manager.
>
> My guess is that it is a double application of the drawable offset when
> doing a CopyRegion swapbuffers. Does the corruption move in relation to
> the window as it moves?
>
> This suggests that [my] recent changes to the ddx are to blame, and
> certainly a bisection on -intel might help - though it's probably just as
> easy to test before the shadow+dri changes to confirm.
> -Chris
Lots of bisecting and backporting later.. and I've identified the bad
commit:
9220434a8768902cd9cf248709972678b74aa8c1 drm/i915: Only emit a flush
request on the active ring.
I'm not sure what the correct fix is, but a workaround is this:
Actually, I've not tested that yet.. oops. It certainly works with the
if(1) and if (obj->write_domain) bypassing the test for ... &
I915_GEM_GPU_DOMAIN. That wasn't enough alone though, it didn't work
until I changed:
- if (flush_rings & RING_RENDER)
+ if (1)
Presumably some object is not getting the RENDER_RING added to the
flush_rings field correctly.
git diff HEAD^
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cf27655..a9d528e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1901,7 +1901,9 @@ i915_gem_flush(struct drm_device *dev,
drm_agp_chipset_flush(dev);
if ((flush_domains | invalidate_domains) & I915_GEM_GPU_DOMAINS) {
- if (flush_rings & RING_RENDER)
+// if (1) {
+// if (flush_rings & RING_RENDER)
+ if (1)
i915_gem_flush_ring(dev,
&dev_priv->render_ring,
invalidate_domains, flush_domains);
@@ -4197,6 +4199,7 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
* flush earlier is beneficial.
*/
if (obj->write_domain & I915_GEM_GPU_DOMAINS) {
+// if (obj->write_domain) {
i915_gem_flush_ring(dev,
obj_priv->ring,
0, obj->write_domain);
--
Peter Clifton
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: Corruption in glxgears with Compiz
2010-10-23 3:35 ` Peter Clifton
@ 2010-10-23 4:07 ` Peter Clifton
2010-10-23 8:23 ` Alexey Fisher
2010-10-23 9:10 ` Chris Wilson
0 siblings, 2 replies; 18+ messages in thread
From: Peter Clifton @ 2010-10-23 4:07 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
On Sat, 2010-10-23 at 04:35 +0100, Peter Clifton wrote:
> Lots of bisecting and backporting later.. and I've identified the bad
> commit:
>
> 9220434a8768902cd9cf248709972678b74aa8c1 drm/i915: Only emit a flush
> request on the active ring.
A minimal fix is this:
commit 78342e8fd01614ac0507db1f9c3e0522f4da3c14
Author: Peter Clifton <pcjc2@cam.ac.uk>
Date: Sat Oct 23 04:00:21 2010 +0100
Attempted fix
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9290f02..868a399 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3759,7 +3759,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
*/
dev->invalidate_domains = 0;
dev->flush_domains = 0;
- dev_priv->mm.flush_rings = 0;
+ dev_priv->mm.flush_rings = ring->id;
for (i = 0; i < args->buffer_count; i++) {
struct drm_gem_object *obj = object_list[i];
Although I don't doubt that it is incorrect for some reason. My logic
was this.. the mm.flush_rings is supposed to be |='d with the object's
ring->id if the ring is set on a given object.
But we transfer objects to GPU domain before we actually put them in the
ring, therefore it never gets set.
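The reasoning above can be modelled in user space. The following is only a sketch under stated assumptions: the structs are simplified stand-ins for the kernel's, `collect_flush_rings` is a hypothetical helper, and the loop mirrors the `if (obj_priv->ring)` test visible in the diffs in this thread rather than the full driver logic.

```c
#include <stddef.h>

/* Simplified stand-in; the value is illustrative, not the kernel's. */
#define RING_RENDER 0x1u

struct ring { unsigned int id; };

struct object {
    struct ring *ring;   /* NULL until the object has been queued on a ring */
};

/*
 * Accumulate the set of rings to flush for one execbuffer. Only objects
 * that already have a ring attached contribute to the mask.
 * `initial_mask` is 0 in the original code and ring->id with the
 * workaround posted above.
 */
unsigned int collect_flush_rings(const struct object *objs, size_t count,
                                 unsigned int initial_mask)
{
    unsigned int flush_rings = initial_mask;

    for (size_t i = 0; i < count; i++)
        if (objs[i].ring)
            flush_rings |= objs[i].ring->id;
    return flush_rings;
}
```

If every object in the batch is still unattached, the mask stays 0 and a later `flush_rings & RING_RENDER` test fails, so no flush is emitted. Seeding the mask with the execbuffer ring forces that bit on regardless.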
So this patch just dumps the execbuffer ring into the list of rings to
be flushed. I guess that might be wrong.. perhaps we don't always need
to flush that ring unless an object in it gets reused.. anyway, I'm not
that familiar with GEM internals, and it is gone 5AM here. Still.. I
think I can claim I've nailed the offending commit at least.
Hopefully someone can come up with a sensible patch and explain to me
how this stuff works ;)
Best wishes,
--
Peter Clifton
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: Corruption in glxgears with Compiz
2010-10-23 4:07 ` Peter Clifton
@ 2010-10-23 8:23 ` Alexey Fisher
2010-10-23 9:10 ` Chris Wilson
1 sibling, 0 replies; 18+ messages in thread
From: Alexey Fisher @ 2010-10-23 8:23 UTC (permalink / raw)
To: Peter Clifton; +Cc: intel-gfx
On Saturday, 23.10.2010 at 05:07 +0100, Peter Clifton wrote:
> On Sat, 2010-10-23 at 04:35 +0100, Peter Clifton wrote:
>
> > Lots of bisecting and backporting later.. and I've identified the bad
> > commit:
> >
> > 9220434a8768902cd9cf248709972678b74aa8c1 drm/i915: Only emit a flush
> > request on the active ring.
>
> A minimal fix is this:
>
> commit 78342e8fd01614ac0507db1f9c3e0522f4da3c14
> Author: Peter Clifton <pcjc2@cam.ac.uk>
> Date: Sat Oct 23 04:00:21 2010 +0100
>
> Attempted fix
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9290f02..868a399 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3759,7 +3759,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> */
> dev->invalidate_domains = 0;
> dev->flush_domains = 0;
> - dev_priv->mm.flush_rings = 0;
> + dev_priv->mm.flush_rings = ring->id;
>
> for (i = 0; i < args->buffer_count; i++) {
> struct drm_gem_object *obj = object_list[i];
>
>
>
>
I can only add a "me too" here: this patch fixes it on my board as well.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Corruption in glxgears with Compiz
2010-10-23 4:07 ` Peter Clifton
2010-10-23 8:23 ` Alexey Fisher
@ 2010-10-23 9:10 ` Chris Wilson
2010-10-23 9:43 ` Alexey Fisher
2010-10-23 11:42 ` Peter Clifton
1 sibling, 2 replies; 18+ messages in thread
From: Chris Wilson @ 2010-10-23 9:10 UTC (permalink / raw)
To: Peter Clifton; +Cc: intel-gfx
On Sat, 23 Oct 2010 05:07:57 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> Although I don't doubt that it is incorrect for some reason. My logic
> was this.. the mm.flush_rings is supposed to be |='d with the object's
> ring->id if the ring is set on a given object.
Well the whole inter-ring flushing is decidedly suspect since we have no
synchronisation between rings, yet. However in this scenario, you are just
using one ring...
If an object is in a GPU domain and so requires a flush, it is attached to
a ring. However, if the object needs an invalidation it may not yet be
attached to the ring (and in any event the invalidation needs to be
performed on the pending ring). Ahah.
Note to self: flushes must be done on the from-ring before the semaphore
and invalidations on the to-ring after the semaphore.
Can you try this patch?
diff --git a/drivers/gpu/drm/i915/i915_gem.c
b/drivers/gpu/drm/i915/i915_gem.c
index 9290f02..e7f27a5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3078,7 +3078,8 @@ i915_gem_object_set_to_cpu_domain(struct
drm_gem_object *o
bj, int write)
* drm_agp_chipset_flush
*/
static void
-i915_gem_object_set_to_gpu_domain(struct drm_gem_object *obj)
+i915_gem_object_set_to_gpu_domain(struct drm_gem_object *obj,
+ struct intel_ring_buffer *ring)
{
struct drm_device *dev = obj->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
@@ -3132,8 +3133,10 @@ i915_gem_object_set_to_gpu_domain(struct
drm_gem_object *
obj)
dev->invalidate_domains |= invalidate_domains;
dev->flush_domains |= flush_domains;
- if (obj_priv->ring)
+ if (flush_domains & I915_GEM_GPU_DOMAINS)
dev_priv->mm.flush_rings |= obj_priv->ring->id;
+ if (invalidate_domains & I915_GEM_GPU_DOMAINS)
+ dev_priv->mm.flush_rings |= ring->id;
trace_i915_gem_object_change_domain(obj,
old_read_domains,
@@ -3765,7 +3768,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void
*data,
struct drm_gem_object *obj = object_list[i];
/* Compute new gpu domains and update invalidate/flush */
- i915_gem_object_set_to_gpu_domain(obj);
+ i915_gem_object_set_to_gpu_domain(obj, ring);
}
if (dev->invalidate_domains | dev->flush_domains) {
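The rule this patch encodes (flushes go to the ring an object was last used on, invalidations go to the ring about to execute) can be sketched as a stand-alone function. This is a toy model, not the kernel code: the flag values are illustrative, `rings_to_flush` is a hypothetical name, and the NULL check on the last-used ring is an addition of this sketch that does not appear in the patch.

```c
#include <stddef.h>

/* Simplified stand-ins; the values are illustrative, not the kernel's. */
#define RING_RENDER 0x1u
#define RING_BSD    0x2u
#define DOMAIN_CPU  0x1u
#define DOMAIN_GPU  0x2u   /* stands in for I915_GEM_GPU_DOMAINS */

struct ring { unsigned int id; };

/*
 * Which rings need a flush command for one object:
 *  - pending GPU writes are flushed on the ring that last used the object
 *  - caches are invalidated on the ring that is about to execute
 */
unsigned int rings_to_flush(const struct ring *last_ring,
                            const struct ring *pending_ring,
                            unsigned int flush_domains,
                            unsigned int invalidate_domains)
{
    unsigned int mask = 0;

    /* The last_ring NULL check is extra caution added by this sketch. */
    if ((flush_domains & DOMAIN_GPU) && last_ring)
        mask |= last_ring->id;
    if (invalidate_domains & DOMAIN_GPU)
        mask |= pending_ring->id;
    return mask;
}
```

A fresh object that only needs invalidating now marks the pending ring, which is the case the earlier `if (obj_priv->ring)` test missed. An object written on one ring and read on another marks both.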
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: Corruption in glxgears with Compiz
2010-10-23 9:10 ` Chris Wilson
@ 2010-10-23 9:43 ` Alexey Fisher
2010-10-23 10:07 ` Chris Wilson
2010-10-23 11:42 ` Peter Clifton
1 sibling, 1 reply; 18+ messages in thread
From: Alexey Fisher @ 2010-10-23 9:43 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
On Saturday, 23.10.2010 at 10:10 +0100, Chris Wilson wrote:
> On Sat, 23 Oct 2010 05:07:57 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> > Although I don't doubt that it is incorrect for some reason. My logic
> > was this.. the mm.flush_rings is supposed to be |='d with the object's
> > ring->id if the ring is set on a given object.
>
> Well the whole inter-ring flushing is decidedly suspect since we have no
> synchronisation between rings, yet. However in this scenario, you are just
> using one ring...
>
> If an object is in a GPU domain and so requires a flush, it is attached to
> a ring. However, if the object needs an invalidation it may not yet be
> attached to the ring (and in any event the invalidation needs to be
> performed on the pending ring). Ahah.
>
> Note to self: flushes must be done on the from-ring before the semaphore
> and invalidations on the to-ring after the semaphore.
>
> Can you try this patch?
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c
> b/drivers/gpu/drm/i915/i915_gem.c
> index 9290f02..e7f27a5 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3078,7 +3078,8 @@ i915_gem_object_set_to_cpu_domain(struct
> drm_gem_object *o
> bj, int write)
> * drm_agp_chipset_flush
> */
> static void
> -i915_gem_object_set_to_gpu_domain(struct drm_gem_object *obj)
> +i915_gem_object_set_to_gpu_domain(struct drm_gem_object *obj,
> + struct intel_ring_buffer *ring)
> {
> struct drm_device *dev = obj->dev;
> struct drm_i915_private *dev_priv = dev->dev_private;
> @@ -3132,8 +3133,10 @@ i915_gem_object_set_to_gpu_domain(struct
> drm_gem_object *
> obj)
>
> dev->invalidate_domains |= invalidate_domains;
> dev->flush_domains |= flush_domains;
> - if (obj_priv->ring)
> + if (flush_domains & I915_GEM_GPU_DOMAINS)
> dev_priv->mm.flush_rings |= obj_priv->ring->id;
> + if (invalidate_domains & I915_GEM_GPU_DOMAINS)
> + dev_priv->mm.flush_rings |= ring->id;
>
> trace_i915_gem_object_change_domain(obj,
> old_read_domains,
> @@ -3765,7 +3768,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void
> *data,
> struct drm_gem_object *obj = object_list[i];
>
> /* Compute new gpu domains and update invalidate/flush */
> - i915_gem_object_set_to_gpu_domain(obj);
> + i915_gem_object_set_to_gpu_domain(obj, ring);
> }
>
> if (dev->invalidate_domains | dev->flush_domains) {
>
Works for me.
Your mail client broke the patch, so "git am" didn't work.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Corruption in glxgears with Compiz
2010-10-23 9:43 ` Alexey Fisher
@ 2010-10-23 10:07 ` Chris Wilson
0 siblings, 0 replies; 18+ messages in thread
From: Chris Wilson @ 2010-10-23 10:07 UTC (permalink / raw)
To: Alexey Fisher; +Cc: intel-gfx
On Sat, 23 Oct 2010 11:43:07 +0200, Alexey Fisher <bug-track@fisher-privat.net> wrote:
> Works for me.
> Your mail client broke the patch, so "git am" didn't work.
Apologies, I was being lazy and just cut'n'paste into vim without paying
attention.
That patch works as a stop gap, the real fun begins getting the inter-ring
synchronisation right.
Thanks,
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Corruption in glxgears with Compiz
2010-10-23 9:10 ` Chris Wilson
2010-10-23 9:43 ` Alexey Fisher
@ 2010-10-23 11:42 ` Peter Clifton
2010-10-23 17:48 ` Chris Wilson
1 sibling, 1 reply; 18+ messages in thread
From: Peter Clifton @ 2010-10-23 11:42 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
On Sat, 2010-10-23 at 10:10 +0100, Chris Wilson wrote:
> On Sat, 23 Oct 2010 05:07:57 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> > Although I don't doubt that it is incorrect for some reason. My logic
> > was this.. the mm.flush_rings is supposed to be |='d with the object's
> > ring->id if the ring is set on a given object.
>
> Well the whole inter-ring flushing is decidedly suspect since we have no
> synchronisation between rings, yet. However in this scenario, you are just
> using one ring...
>
> If an object is in a GPU domain and so requires a flush, it is attached to
> a ring. However, if the object needs an invalidation it may not yet be
> attached to the ring (and in any event the invalidation needs to be
> performed on the pending ring). Ahah.
>
> Note to self: flushes must be done on the from-ring before the semaphore
> and invalidations on the to-ring after the semaphore.
>
> Can you try this patch?
Your patch works a treat.. I knew mine was really only a band-aid which
forced a flush on the pending ring indiscriminately, and was glad to see the
proper fix.
Really difficult to get your head round all this flush / invalidate
stuff. I get the idea, but in practice it is very confusing due to the
fact it is all deferred / scheduled work, and two subtly different
concepts (flush / invalidate) are handled by the same action on
the GPU, and very similar code! Very easy to muddle current / pending
ring in my head, for example.
You replied to Alexey that the patch is only a stop gap, and inter-ring
synchronisation is the real challenge. I guess that is something you'll
be forced to look at with the new Sandybridge chipset having a separate
ring for BLT operations?
I'm just looking for fps with my circuit board rendering GL code at the
moment.. that's why I'm following git HEAD stuff, to see if the drivers
can unlock some performance in the code I'm writing. I'm struggling to
profile just what the bottleneck is!
--
Peter Clifton
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Corruption in glxgears with Compiz
2010-10-23 11:42 ` Peter Clifton
@ 2010-10-23 17:48 ` Chris Wilson
2010-10-23 18:33 ` Peter Clifton
2010-10-24 23:06 ` Peter Clifton
0 siblings, 2 replies; 18+ messages in thread
From: Chris Wilson @ 2010-10-23 17:48 UTC (permalink / raw)
To: Peter Clifton; +Cc: intel-gfx
On Sat, 23 Oct 2010 12:42:05 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> Your patch works a treat.. I knew mine was really only a band-aid which
> forced a flush on the pending ring indiscriminately, and was glad to see the
> proper fix.
>
> Really difficult to get your head round all this flush / invalidate
> stuff. I get the idea, but in practice it is very confusing due to the
> fact it is all deferred / scheduled work, and two subtly different
> concepts (flush / invalidate) are handled by the same action on
> the GPU, and very similar code! Very easy to muddle current / pending
> ring in my head, for example.
>
> You replied to Alexey that the patch is only a stop gap, and inter-ring
> synchronisation is the real challenge. I guess that is something you'll
> be forced to look at with the new Sandybridge chipset having a separate
> ring for BLT operations?
Exactly. We already have the issue on i965 with the Bitstream Decoder ring
which handles video separate from the render ring. Fortunately no one has
fallen over the lack of synchronisation there since the API design makes
interoperating GL/RENDER/Video so difficult. Even worse is that it is only
with Sandybridge that we have the ability to insert semaphores onto the
ring to handle inter-ring synchronisation on the GPU, otherwise we will
simply have to wait on retirement when transferring ownership from one
ring to another. Is it worth the additional complexity to have buffers
reside on multiple rings at the same time? Possibly if we do start mixing
video + GL. Anyway with the BLT split, handling synchronisation will
become an issue.
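The choice described above, between a GPU semaphore and waiting for retirement on the CPU when a buffer changes rings, can be sketched as a small decision function. This is a toy model under stated assumptions: the structs, the `sync_needed` name and the seqno bookkeeping are hypothetical simplifications, and seqno wrap-around is ignored.

```c
#include <stddef.h>

struct ring {
    unsigned int submitted;   /* seqno of the last request queued */
    unsigned int completed;   /* seqno of the last request the GPU finished */
};

struct object {
    struct ring *ring;        /* ring that last used the object, or NULL */
    unsigned int seqno;       /* request on that ring that last touched it */
};

enum sync_action { SYNC_NONE, SYNC_GPU_SEMAPHORE, SYNC_CPU_WAIT };

/*
 * Decide how to hand `obj` over to ring `to`. Commands on a single ring
 * execute in order, so only a change of ring with outstanding work needs
 * synchronisation. `has_semaphores` is non-zero on hardware that can
 * wait on another ring from within the command stream.
 */
enum sync_action sync_needed(const struct object *obj, const struct ring *to,
                             int has_semaphores)
{
    if (obj->ring == NULL || obj->ring == to)
        return SYNC_NONE;
    if (obj->ring->completed >= obj->seqno)
        return SYNC_NONE;                 /* last use already retired */
    return has_semaphores ? SYNC_GPU_SEMAPHORE : SYNC_CPU_WAIT;
}
```

In this model the `SYNC_CPU_WAIT` branch is the "wait on retirement" fallback: the CPU blocks until the old ring's completed seqno catches up before the new ring may touch the buffer.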
> I'm just looking for fps with my circuit board rendering GL code at the
> moment.. that's why I'm following git HEAD stuff, to see if the drivers
> can unlock some performance in the code I'm writing. I'm struggling to
> profile just what the bottleneck is!
Aye, profiling GPU code at the moment is a hard problem. If you do find
some CPU bottlenecks, they're usually the easiest to fix. What may help is
to sync every operation and look at the relative times + relative
frequencies to work out the rate-limiting step, and then see if you can
break it down further and repeat. (Even if we had a GPU callgrind, given
the disconnect between what is executed on the GPU and GL, it may not be
obvious how to improve the code.) uprof may help here given the
annotations Robert Bragg has made for mesa profiling.
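A minimal sketch of the "sync every operation and compare relative times and frequencies" approach, in plain C. The names `op_stats`, `timed_op`, `slowest_op` and `no_sync` are hypothetical; in a GL program the `sync` callback would typically be a wrapper around `glFinish()`, which is an assumption of this example rather than something stated in the thread.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#include <stddef.h>
#include <time.h>

struct op_stats {
    const char *name;
    unsigned long calls;      /* relative frequency */
    double total_seconds;     /* accumulated wall time including the sync */
};

static double now_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Placeholder sync for builds without a GPU; does nothing. */
void no_sync(void)
{
}

/* Run one operation, force it to complete, and account its wall time. */
void timed_op(struct op_stats *st, void (*op)(void), void (*sync)(void))
{
    double start = now_seconds();

    op();
    sync();
    st->total_seconds += now_seconds() - start;
    st->calls++;
}

/* Index of the entry with the largest accumulated time. */
size_t slowest_op(const struct op_stats *stats, size_t count)
{
    size_t worst = 0;

    for (size_t i = 1; i < count; i++)
        if (stats[i].total_seconds > stats[worst].total_seconds)
            worst = i;
    return worst;
}
```

Wrapping each kind of operation (clears, draws, swaps) in `timed_op` gives a table of counts and times. The entry `slowest_op` returns is the candidate rate-limiting step to break down further. Syncing after every call serialises the GPU, so the absolute numbers are pessimistic and only the relative ones are meaningful.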
We're always eager to improve our code to get the most out of our admittedly
lacklustre GPUs. Even suggestions on what tools would be useful or
improvements we could make to improve profiling/development are most
welcome.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Corruption in glxgears with Compiz
2010-10-23 17:48 ` Chris Wilson
@ 2010-10-23 18:33 ` Peter Clifton
2010-10-24 23:06 ` Peter Clifton
1 sibling, 0 replies; 18+ messages in thread
From: Peter Clifton @ 2010-10-23 18:33 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx@lists.freedesktop.org
On Sat, 2010-10-23 at 18:48 +0100, Chris Wilson wrote:
> We're always eager to improve our code to get the most out of our admittedly
> lacklustre GPUs. Even suggestions on what tools would be useful or
> improvements we could make to improve profiling/development are most
> welcome.
One thing I was wondering about, was intel_gpu_top. It reports unit
usage based on busy / done registers in the chip. I wondered what would
happen if we polled those registers and graphed them in time... whether
it would show any hints as to which units were waiting on each other,
and where any gaps are.
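To make the idea concrete, here is a sketch of the analysis side only, assuming the busy bits have already been sampled into an array at a fixed interval. The bit assignments, the sampling mechanism and all function names are hypothetical; nothing here reflects the real register layout or how intel_gpu_top reads it.

```c
#include <stddef.h>
#include <stdint.h>

/* One sample is a bitmask of the units reported busy at that instant. */

/* Fraction of samples (0.0 to 1.0) in which `unit_bit` was busy. */
double busy_fraction(const uint32_t *samples, size_t count, uint32_t unit_bit)
{
    size_t busy = 0;

    for (size_t i = 0; i < count; i++)
        if (samples[i] & unit_bit)
            busy++;
    return count ? (double)busy / (double)count : 0.0;
}

/* Longest run of consecutive samples in which `unit_bit` was idle. */
size_t longest_idle_run(const uint32_t *samples, size_t count, uint32_t unit_bit)
{
    size_t best = 0, run = 0;

    for (size_t i = 0; i < count; i++) {
        if (samples[i] & unit_bit)
            run = 0;
        else if (++run > best)
            best = run;
    }
    return best;
}

/* Samples in which `waiter` was idle while `other` was busy. */
size_t idle_while_busy(const uint32_t *samples, size_t count,
                       uint32_t waiter, uint32_t other)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++)
        if (!(samples[i] & waiter) && (samples[i] & other))
            n++;
    return n;
}
```

A high `idle_while_busy` count for one unit against another would be a hint, not proof, that the first is stalled on the second. Long idle runs across all units would point at gaps where the GPU is waiting on the CPU.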
It would need to be graphical probably, and it would need to be
synchronised in some way to the application / frames being processed, so
all in all, it is rather hard to imagine how it would work with perhaps
unrelated GPU activity going on for other things such as the compositor
and toolkit redrawing.
I sometimes wonder if it is just memory bandwidth constraining things..
perhaps I need to look to the chipset docs and see if there are any
performance diagnostic regs there as well.
--
Peter Clifton
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Corruption in glxgears with Compiz
2010-10-23 17:48 ` Chris Wilson
2010-10-23 18:33 ` Peter Clifton
@ 2010-10-24 23:06 ` Peter Clifton
1 sibling, 0 replies; 18+ messages in thread
From: Peter Clifton @ 2010-10-24 23:06 UTC (permalink / raw)
To: Chris Wilson, intel-gfx@lists.freedesktop.org
On Sat, 2010-10-23 at 18:48 +0100, Chris Wilson wrote:
> Aye, profiling GPU code at the moment is a hard problem. If you do find
> some CPU bottlenecks, they're usually the easiest to fix. What may help is
> to sync every operation and look at the relative times + relative
> frequencies to work out the rate-limiting step, and then see if you can
> break it down further and repeat. (Even if we had a GPU callgrind, given
> the disconnect between what is executed on the GPU and GL, it may not be
> obvious how to improve the code.) uprof may help here given the
> annotations Robert Bragg has made for mesa profiling.
uprof looks interesting, but I couldn't see anything in git head mesa
relating to it. When I profiled in the past, I noticed my use of glClear
was a problem. I've reduced it by a factor of eight by more intelligent
use of the stencil buffer bitplanes, and might be able to do better
still with some thought about encoding and / or abuse of the depth
buffer.
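One way a factor-of-eight reduction in clears could come about, sketched without any GL calls: give each pass its own bitplane of an 8-bit stencil buffer and only clear once all eight have been used. This is a guess at the general technique, not the actual code referred to above. `stencil_allocator` and `next_stencil_mask` are hypothetical names, and an 8-bit stencil buffer is assumed.

```c
#define STENCIL_BITS 8u   /* assumes an 8-bit stencil buffer */

struct stencil_allocator {
    unsigned int next_plane;   /* next unused bitplane, 0..STENCIL_BITS */
    unsigned long clears;      /* number of full clears requested so far */
};

/*
 * Return the stencil mask to use for the next pass. *need_clear is set to
 * 1 when every plane has been used, in which case the caller must clear
 * the stencil buffer (e.g. glClear(GL_STENCIL_BUFFER_BIT)) before drawing.
 * Initialise next_plane to STENCIL_BITS so the very first pass clears.
 */
unsigned int next_stencil_mask(struct stencil_allocator *a, int *need_clear)
{
    *need_clear = (a->next_plane == STENCIL_BITS);
    if (*need_clear) {
        a->next_plane = 0;
        a->clears++;
    }
    return 1u << a->next_plane++;
}
```

The returned mask would be passed to `glStencilMask` and used as the mask argument of `glStencilFunc`, so each pass reads and writes only its own plane. In this model sixteen passes need two clears instead of sixteen.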
Enabling debugging shows that I'm always falling onto the mesa meta
clear path as the depth / stencil buffer is tiled on the GM45. The BLT
engine can't write to that and mesa has to save and restore nearly the
entire 3D state for every clear.
I'm tempted to try open-coding the stencil buffer clears using GL calls
as I won't need to modify so much state as mesa has to. Still, I'm not
sure if there would be much difference in overhead between a big
state-change and a small one.
PCB design / CAD applications are very graphics intensive, so I should
perhaps have looked at a heavier weight laptop to do them on, but I'd
dearly love to support less performant GL capable hardware too as many
of our users are on oldish hardware. Being a bit fps challenged myself
helps me find more devious ways to keep frame-rate up ;) still, glxgears
only manages 30fps at full screen, so I don't expect miracles!
> We're always eager to improve our code to get the most out of our admittedly
> lacklustre GPUs. Even suggestions on what tools would be useful or
> improvements we could make to improve profiling/development are most
> welcome.
The code is already so much better. I can remember the pre-DRI2
days, pre-GEM, pre-KMS.. I just can't imagine a desktop without seamless
compositing and GL working any more.
The hard work from everyone at Intel, the mesa developers, and those
working on all the other OSS drivers is really really bringing the Linux
desktop up to scratch. Very very many people have a lot to be thankful
to you guys for.
--
Peter Clifton
^ permalink raw reply [flat|nested] 18+ messages in thread