* Corruption in glxgears with Compiz
@ 2010-10-22 12:53 Peter Clifton
  2010-10-22 13:39 ` Chris Wilson
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Clifton @ 2010-10-22 12:53 UTC (permalink / raw)
  To: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 1183 bytes --]

Hi guys,

I was wondering whether anyone has tried the latest stack of drivers
with compiz running?

I'm running the Ubuntu Xorg edgers PPA with a home-brew backport of
drm-intel-next against the 2.6.35 ubuntu kernel. That was working fine
until recently, but with compiz running I'm getting some visual
corruption. (I've attached a fragment of a screen-shot captured during
the problem).

The problem does not occur with xcompmgr or metacity (with or without
compositing enabled), just compiz.

Unfortunately, I've not been able to narrow down which update caused the
breakage yet, as a lot of components have changed recently, and I've not
normally been running compiz.

If I can narrow it down further (or it doesn't get resolved in a few
weeks), I'll file a proper bug report ;)

Aside from this little glitch and a small TV-out misdetect problem which
I patch around locally, the drivers have been great recently!

Best wishes,

-- 
Peter Clifton

Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA

Tel: +44 (0)7729 980173 - (No signal in the lab!)
Tel: +44 (0)1223 748328 - (Shared lab phone, ask for me)

[-- Attachment #2: gl_corruption_crop.png --]
[-- Type: image/png, Size: 103047 bytes --]


* Re: Corruption in glxgears with Compiz
  2010-10-22 12:53 Corruption in glxgears with Compiz Peter Clifton
@ 2010-10-22 13:39 ` Chris Wilson
  2010-10-22 14:04   ` Alexey Fisher
                     ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Chris Wilson @ 2010-10-22 13:39 UTC (permalink / raw)
  To: Peter Clifton, intel-gfx

On Fri, 22 Oct 2010 13:53:16 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> Hi guys,
> 
> I was wondering whether anyone has tried the latest stack of drivers
> with compiz running?
> 
> I'm running the Ubuntu Xorg edgers PPA with a home-brew backport of
> drm-intel-next against the 2.6.35 ubuntu kernel. That was working fine
> until recently, but with compiz running I'm getting some visual
> corruption. (I've attached a fragment of a screen-shot captured during
> the problem).

Which commit did you take your backport from? There were a couple of
pageflip related commits that I'm interested in knowing how they fare and
QA has chastised me for introducing a couple of bugs as well.
-chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: Corruption in glxgears with Compiz
  2010-10-22 13:39 ` Chris Wilson
@ 2010-10-22 14:04   ` Alexey Fisher
  2010-10-22 19:10   ` Peter Clifton
  2010-10-22 19:13   ` Peter Clifton
  2 siblings, 0 replies; 18+ messages in thread
From: Alexey Fisher @ 2010-10-22 14:04 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Friday, 22.10.2010, at 14:39 +0100, Chris Wilson wrote:
> On Fri, 22 Oct 2010 13:53:16 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> > Hi guys,
> > 
> > I was wondering whether anyone has tried the latest stack of drivers
> > with compiz running?
> > 
> > I'm running the Ubuntu Xorg edgers PPA with a home-brew backport of
> > drm-intel-next against the 2.6.35 ubuntu kernel. That was working fine
> > until recently, but with compiz running I'm getting some visual
> > corruption. (I've attached a fragment of a screen-shot captured during
> > the problem).
> 
> Which commit did you take your backport from? There were a couple of
> pageflip related commits that I'm interested in knowing how they fare and
> QA has chastised me for introducing a couple of bugs as well.
> -chris
> 

I can reproduce this issue too; here is the video:
http://videobin.org/+28v/2jf.html

I use kernel 2.6.36-rc8-01142-gb5dc608 from intel drm_next
and 
apt-cache policy xserver-xorg-video-intel
xserver-xorg-video-intel:
  Installed: 2:2.12.902+git20101018.a1c54f69-0ubuntu0sarvatt
  Candidate: 2:2.12.902+git20101018.a1c54f69-0ubuntu0sarvatt
  Version table:
 *** 2:2.12.902+git20101018.a1c54f69-0ubuntu0sarvatt 0
        500 http://ppa.launchpad.net/xorg-edgers/ppa/ubuntu/ maverick/main amd64 Packages
        100 /var/lib/dpkg/status
     2:2.12.0-1ubuntu5 0
        500 http://de.archive.ubuntu.com/ubuntu/ maverick/main amd64 Packages


If it is a kernel-side problem, I can bisect it.

PS: my HW, intel DG45ID board with 
intel_stepping 
Vendor: 0x8086, Device: 0x2e22, Revision: 0x03 (A3)


* Re: Corruption in glxgears with Compiz
  2010-10-22 13:39 ` Chris Wilson
  2010-10-22 14:04   ` Alexey Fisher
@ 2010-10-22 19:10   ` Peter Clifton
  2010-10-22 19:29     ` Chris Wilson
  2010-10-22 19:13   ` Peter Clifton
  2 siblings, 1 reply; 18+ messages in thread
From: Peter Clifton @ 2010-10-22 19:10 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, 2010-10-22 at 14:39 +0100, Chris Wilson wrote:
> On Fri, 22 Oct 2010 13:53:16 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> > Hi guys,	
> > 
> > I was wondering whether anyone has tried the latest stack of drivers
> > with compiz running?
> > 
> > I'm running the Ubuntu Xorg edgers PPA with a home-brew backport of
> > drm-intel-next against the 2.6.35 ubuntu kernel. That was working fine
> > until recently, but with compiz running I'm getting some visual
> > corruption. (I've attached a fragment of a screen-shot captured during
> > the problem).
> 
> Which commit did you take your backport from? There were a couple of
> pageflip related commits that I'm interested in knowing how they fare and
> QA has chastised me for introducing a couple of bugs as well.
> -chris

As an additional data-point, with the bug manifesting, if you go to
"expose" mode, (Win+E for default config), you find the corruption is
absent. It only appears to be present when the glxgears window is not
scaled by the window manager.

With (urgh) wobbly windows turned on, it is less manifest when the
window is being wobbled, but there is still some edge tearing and
corruption.

With drm debugging enabled (modprobe drm debug=2), I noticed some pipe
underrun errors in the logs. If you want further info, let me know. I
wasn't sure they were correlated to the problem though.

With glxgears running in the background, corruption isn't just
restricted to that window. For example, I have glxgears running in the
background now, and the email composer caret is sometimes being left behind
as I type. Similarly, on a console window (gnome terminal),
certain glyphs are rendering as blank, or full black. This goes away
when the window is re-exposed by dragging another window on top, so I
guess it is not the glyphs themselves which were corrupted.

Whilst I don't have nearly your level of expertise with the driver, I'm
keen to know what you're thinking these symptoms might point to, and
what you think is worth looking at to debug. I know you'll beat me to a
fix, but I'm curious as to how one even starts to debug this kind of issue
(aside from bisection).

I just tried to build and run with a stock (ish) Ubuntu kernel (removed
the stgit series for my backport of agp and drm drivers), but I couldn't
log in (with compiz auto-starting), without the graphics freezing up.
Probably something else in the "new" stack needs a newer kernel driver I
guess.

Best regards,

-- 
Peter Clifton


* Re: Corruption in glxgears with Compiz
  2010-10-22 13:39 ` Chris Wilson
  2010-10-22 14:04   ` Alexey Fisher
  2010-10-22 19:10   ` Peter Clifton
@ 2010-10-22 19:13   ` Peter Clifton
  2 siblings, 0 replies; 18+ messages in thread
From: Peter Clifton @ 2010-10-22 19:13 UTC (permalink / raw)
  To: Chris Wilson

On Fri, 2010-10-22 at 14:39 +0100, Chris Wilson wrote:
> On Fri, 22 Oct 2010 13:53:16 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> > Hi guys,
> > 
> > I was wondering whether anyone has tried the latest stack of drivers
> > with compiz running?
> > 
> > I'm running the Ubuntu Xorg edgers PPA with a home-brew backport of
> > drm-intel-next against the 2.6.35 ubuntu kernel. That was working fine
> > until recently, but with compiz running I'm getting some visual
> > corruption. (I've attached a fragment of a screen-shot captured during
> > the problem).
> 
> Which commit did you take your backport from? There were a couple of
> pageflip related commits that I'm interested in knowing how they fare and
> QA has chastised me for introducing a couple of bugs as well.
> -chris

I've backported from 878a3c37d36142a192bdf5b6bfcf920832f431d7

I was following since before 36eb1cdc5fcc9a93dc4b7a62393b11522c71, then
have variously updated through:

69dc4987cbe5fe70ae1c2a08906d431d53cdd242
b5dc608c98d929abbf2fe932ed07b3c868d83342
878a3c37d36142a192bdf5b6bfcf920832f431d7

I had hopes from the sound of the bug, that
878a3c37d36142a192bdf5b6bfcf920832f431d7 would fix it, but it did not
appear to, nor did reverting (adjusting for a conflict) commit 9af90d19f
which the above commit addressed an issue with.

Mesa I have libgl1-mesa-dev	        7.10.0+git20101020.28990043-0ubuntu0sarvatt
2D   I have xserver-xorg-video-intel	2:2.12.902+git20101018.a1c54f69-0ubuntu0sarvatt
Xorg I have xserver-xorg-core           2:1.9.0.902+git20101020+server-1.9-branch.4dd316f2-0ubuntu0sarvatt

-- 
Peter Clifton


* Re: Corruption in glxgears with Compiz
  2010-10-22 19:10   ` Peter Clifton
@ 2010-10-22 19:29     ` Chris Wilson
  2010-10-22 19:38       ` Peter Clifton
                         ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Chris Wilson @ 2010-10-22 19:29 UTC (permalink / raw)
  To: Peter Clifton; +Cc: intel-gfx

On Fri, 22 Oct 2010 20:10:44 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> As an additional data-point, with the bug manifesting, if you go to
> "expose" mode, (Win+E for default config), you find the corruption is
> absent. It only appears to be present when the glxgears window is not
> scaled by the window manager.

My guess is that it is a double application of the drawable offset when
doing a CopyRegion swapbuffers. Does the corruption move in relation to
the window as it moves?

This suggests that [my] recent changes to the ddx are to blame, and
certainly a bisection on -intel might help - though it's probably just as
easy to test before the shadow+dri changes to confirm.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: Corruption in glxgears with Compiz
  2010-10-22 19:29     ` Chris Wilson
@ 2010-10-22 19:38       ` Peter Clifton
  2010-10-22 20:41       ` Peter Clifton
  2010-10-23  3:35       ` Peter Clifton
  2 siblings, 0 replies; 18+ messages in thread
From: Peter Clifton @ 2010-10-22 19:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx@lists.freedesktop.org

On Fri, 2010-10-22 at 20:29 +0100, Chris Wilson wrote:

> My guess is that it is a double application of the drawable offset when
> doing a CopyRegion swapbuffers. Does the corruption move in relation to
> the window as it moves?

Yes, it appears to. Relative to the screen, (and just squinting at it),
the corruption appears to move twice the distance you move the window.
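
The "moves twice the distance" observation is what a double application of the drawable offset would predict. A minimal sketch of that arithmetic, using hypothetical names (struct point, to_screen_once, to_screen_twice) that are not taken from the X server or the ddx:

```c
/* Hypothetical stand-in for a 2D coordinate; not an X server type. */
struct point { int x, y; };

/* Correct path: translate window-relative coordinates to screen space once. */
struct point to_screen_once(struct point local, struct point win_origin)
{
    struct point p = { local.x + win_origin.x, local.y + win_origin.y };
    return p;
}

/* Suspected bug: the coordinates are already in screen space and the
 * window origin is added a second time. */
struct point to_screen_twice(struct point local, struct point win_origin)
{
    struct point p = to_screen_once(local, win_origin);
    return to_screen_once(p, win_origin);
}

/* How far the translated point moves on screen (in x) when the window
 * itself moves by delta pixels. */
int screen_shift_x(struct point (*xlate)(struct point, struct point),
                   struct point origin, int delta)
{
    struct point local = { 10, 10 };
    struct point moved = { origin.x + delta, origin.y };
    return xlate(local, moved).x - xlate(local, origin).x;
}
```

With the single translation the content follows the window one-to-one; with the double translation it moves two pixels for every pixel the window moves, which matches the squinting estimate above.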

> This suggests that [my] recent changes to the ddx are to blame, and
> certainly a bisection on -intel might help - though it's probably just as
> easy to test before the shadow+dri changes to confirm.

I thought I'd tried reverting to an older ddx, but thinking more
carefully, perhaps I didn't go back very far. I'll try poking at the 2D
driver. Certainly it is quicker to do that than keep rebuilding drm and
rebooting.

Thanks for the pointer.

-- 
Peter Clifton


* Re: Corruption in glxgears with Compiz
  2010-10-22 19:29     ` Chris Wilson
  2010-10-22 19:38       ` Peter Clifton
@ 2010-10-22 20:41       ` Peter Clifton
  2010-10-23  3:35       ` Peter Clifton
  2 siblings, 0 replies; 18+ messages in thread
From: Peter Clifton @ 2010-10-22 20:41 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, 2010-10-22 at 20:29 +0100, Chris Wilson wrote:
> On Fri, 22 Oct 2010 20:10:44 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> > As an additional data-point, with the bug manifesting, if you go to
> > "expose" mode, (Win+E for default config), you find the corruption is
> > absent. It only appears to be present when the glxgears window is not
> > scaled by the window manager.
> 
> My guess is that it is a double application of the drawable offset when
> doing a CopyRegion swapbuffers. Does the corruption move in relation to
> the window as it moves?
> 
> This suggests that [my] recent changes to the ddx are to blame, and
> certainly a bisection on -intel might help - though it's probably just as
> easy to test before the shadow+dri changes to confirm.
> -Chris

Well, I'm as far back as:

commit d41684d54592cf93554a4d6534e7ea74562b1798
Author: Eric Anholt <eric@anholt.net>
Date:   Mon Jun 7 11:18:09 2010 -0700

And I'm still seeing the glitch. This is with drm backported from
878a3c37d36142a192bdf5b6bfcf920832f431d7

If it weren't for Alexey seeing it too with a non-backported version,
I'd suspect I'd made a mistake somewhere. Hmm.. what to try next?

I'd already attempted to revert mesa versions to when I (thought) it was
working nicely with compiz, but I can't recall quite what version that
was now. (I've purged my /var/cache/apt/archives)

Could the Xorg server cause this kind of issue? Looking at the commit
logs, I don't see much in the way of glx activity. (But would that
affect DRI rendering anyway?)

Are there any tests you'd suggest I run?

Regards,

-- 
Peter Clifton


* Re: Corruption in glxgears with Compiz
  2010-10-22 19:29     ` Chris Wilson
  2010-10-22 19:38       ` Peter Clifton
  2010-10-22 20:41       ` Peter Clifton
@ 2010-10-23  3:35       ` Peter Clifton
  2010-10-23  4:07         ` Peter Clifton
  2 siblings, 1 reply; 18+ messages in thread
From: Peter Clifton @ 2010-10-23  3:35 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, 2010-10-22 at 20:29 +0100, Chris Wilson wrote:
> On Fri, 22 Oct 2010 20:10:44 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> > As an additional data-point, with the bug manifesting, if you go to
> > "expose" mode, (Win+E for default config), you find the corruption is
> > absent. It only appears to be present when the glxgears window is not
> > scaled by the window manager.
> 
> My guess is that it is a double application of the drawable offset when
> doing a CopyRegion swapbuffers. Does the corruption move in relation to
> the window as it moves?
> 
> This suggests that [my] recent changes to the ddx are to blame, and
> certainly a bisection on -intel might help - though it's probably just as
> easy to test before the shadow+dri changes to confirm.
> -Chris

Lots of bisecting and backporting later.. and I've identified the bad
commit:

9220434a8768902cd9cf248709972678b74aa8c1 drm/i915: Only emit a flush
request on the active ring.

I'm not sure what the correct fix is, but a workaround is this:

Actually, I've not tested that yet.. oops. It certainly works with the
if(1) and if (obj->write_domain) bypassing the test for ... &
I915_GEM_GPU_DOMAIN. That wasn't enough alone though, it didn't work
until I changed:

-               if (flush_rings & RING_RENDER)
+               if (1)

Presumably some object is not getting the RING_RENDER bit added to the
flush_rings field correctly.



git diff HEAD^
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cf27655..a9d528e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1901,7 +1901,9 @@ i915_gem_flush(struct drm_device *dev,
                drm_agp_chipset_flush(dev);
 
        if ((flush_domains | invalidate_domains) & I915_GEM_GPU_DOMAINS) {
-               if (flush_rings & RING_RENDER)
+//        if (1) {
+//             if (flush_rings & RING_RENDER)
+               if (1)
                        i915_gem_flush_ring(dev,
                                            &dev_priv->render_ring,
                                            invalidate_domains, flush_domains);
@@ -4197,6 +4199,7 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
                 * flush earlier is beneficial.
                 */
                if (obj->write_domain & I915_GEM_GPU_DOMAINS) {
+//             if (obj->write_domain) {
                        i915_gem_flush_ring(dev,
                                            obj_priv->ring,
                                            0, obj->write_domain);



-- 
Peter Clifton


* Re: Corruption in glxgears with Compiz
  2010-10-23  3:35       ` Peter Clifton
@ 2010-10-23  4:07         ` Peter Clifton
  2010-10-23  8:23           ` Alexey Fisher
  2010-10-23  9:10           ` Chris Wilson
  0 siblings, 2 replies; 18+ messages in thread
From: Peter Clifton @ 2010-10-23  4:07 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Sat, 2010-10-23 at 04:35 +0100, Peter Clifton wrote:

> Lots of bisecting and backporting later.. and I've identified the bad
> commit:
> 
> 9220434a8768902cd9cf248709972678b74aa8c1 drm/i915: Only emit a flush
> request on the active ring.

A minimal fix is this:

commit 78342e8fd01614ac0507db1f9c3e0522f4da3c14
Author: Peter Clifton <pcjc2@cam.ac.uk>
Date:   Sat Oct 23 04:00:21 2010 +0100

    Attempted fix

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9290f02..868a399 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3759,7 +3759,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
         */
        dev->invalidate_domains = 0;
        dev->flush_domains = 0;
-       dev_priv->mm.flush_rings = 0;
+       dev_priv->mm.flush_rings = ring->id;
 
        for (i = 0; i < args->buffer_count; i++) {
                struct drm_gem_object *obj = object_list[i];




Although I don't doubt that it is incorrect for some reason. My logic
was this.. the mm.flush_rings is supposed to be |='d with the object's
ring->id if the ring is set on a given object.

But we transfer objects to GPU domain before we actually put them in the
ring, therefore it never gets set.

So this patch just dumps the execbuffer ring into the list of rings to
be flushed. I guess that might be wrong.. perhaps we don't always need
to flush that ring unless an object in it gets reused.. anyway, I'm not
that familiar with GEM internals, and it is gone 5AM here. Still.. I
think I can claim I've nailed the offending commit at least.
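
The reasoning above can be modelled outside the kernel. This is only a sketch under simplifying assumptions: struct ring, struct object, collect_flush_rings and render_flush_emitted are hypothetical stand-ins, and the bit values are made up; only the names flush_rings and RING_RENDER come from the diffs in this thread.

```c
#include <stddef.h>

/* Made-up ring identifier bits for the model. */
enum { RING_RENDER = 0x1, RING_BSD = 0x2 };

struct ring { unsigned int id; };

struct object {
    struct ring *ring;  /* NULL until the object has been queued on a ring */
};

/* Accumulate the mask of rings to flush. Only objects that already
 * have a ring contribute; 'seed' is the initial value of the mask
 * (0 before the workaround, the execbuffer ring's id with it). */
unsigned int collect_flush_rings(const struct object *objs, size_t n,
                                 unsigned int seed)
{
    unsigned int flush_rings = seed;
    size_t i;

    for (i = 0; i < n; i++)
        if (objs[i].ring)
            flush_rings |= objs[i].ring->id;
    return flush_rings;
}

/* The render-ring flush is emitted only if its bit made it into the mask. */
int render_flush_emitted(unsigned int flush_rings)
{
    return (flush_rings & RING_RENDER) != 0;
}
```

If every object in the batch still has a NULL ring, a seed of 0 leaves the mask empty and no render flush is emitted; seeding the mask with the execbuffer ring's id forces the flush regardless, which is why the workaround is indiscriminate.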

Hopefully someone can come up with a sensible patch and explain to me
how this stuff works ;)


Best wishes,

-- 
Peter Clifton


* Re: Corruption in glxgears with Compiz
  2010-10-23  4:07         ` Peter Clifton
@ 2010-10-23  8:23           ` Alexey Fisher
  2010-10-23  9:10           ` Chris Wilson
  1 sibling, 0 replies; 18+ messages in thread
From: Alexey Fisher @ 2010-10-23  8:23 UTC (permalink / raw)
  To: Peter Clifton; +Cc: intel-gfx

On Saturday, 23.10.2010, at 05:07 +0100, Peter Clifton wrote:
> On Sat, 2010-10-23 at 04:35 +0100, Peter Clifton wrote:
> 
> > Lots of bisecting and backporting later.. and I've identified the bad
> > commit:
> > 
> > 9220434a8768902cd9cf248709972678b74aa8c1 drm/i915: Only emit a flush
> > request on the active ring.
> 
> A minimal fix is this:
> 
> commit 78342e8fd01614ac0507db1f9c3e0522f4da3c14
> Author: Peter Clifton <pcjc2@cam.ac.uk>
> Date:   Sat Oct 23 04:00:21 2010 +0100
> 
>     Attempted fix
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9290f02..868a399 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3759,7 +3759,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>          */
>         dev->invalidate_domains = 0;
>         dev->flush_domains = 0;
> -       dev_priv->mm.flush_rings = 0;
> +       dev_priv->mm.flush_rings = ring->id;
>  
>         for (i = 0; i < args->buffer_count; i++) {
>                 struct drm_gem_object *obj = object_list[i];
> 
> 
> 
> 

I can only add a "me too" here: this patch fixes it on my board as well.


* Re: Corruption in glxgears with Compiz
  2010-10-23  4:07         ` Peter Clifton
  2010-10-23  8:23           ` Alexey Fisher
@ 2010-10-23  9:10           ` Chris Wilson
  2010-10-23  9:43             ` Alexey Fisher
  2010-10-23 11:42             ` Peter Clifton
  1 sibling, 2 replies; 18+ messages in thread
From: Chris Wilson @ 2010-10-23  9:10 UTC (permalink / raw)
  To: Peter Clifton; +Cc: intel-gfx

On Sat, 23 Oct 2010 05:07:57 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> Although I don't doubt that it is incorrect for some reason. My logic
> was this.. the mm.flush_rings is supposed to be |='d with the object's
> ring->id if the ring is set on a given object.

Well the whole inter-ring flushing is decidedly suspect since we have no
synchronisation between rings, yet. However in this scenario, you are just
using one ring...

If an object is in a GPU domain and so requires a flush, it is attached to
a ring. However, if the object needs an invalidation it may not yet be
attached to the ring (and in any event the invalidation needs to be
performed on the pending ring). Ahah.

Note to self: flushes must be done on the from-ring before the semaphore
and invalidations on the to-ring after the semaphore.

Can you try this patch?

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9290f02..e7f27a5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3078,7 +3078,8 @@ i915_gem_object_set_to_cpu_domain(struct drm_gem_object *obj, int write)
  *		drm_agp_chipset_flush
  */
 static void
-i915_gem_object_set_to_gpu_domain(struct drm_gem_object *obj)
+i915_gem_object_set_to_gpu_domain(struct drm_gem_object *obj,
+				  struct intel_ring_buffer *ring)
 {
 	struct drm_device		*dev = obj->dev;
 	struct drm_i915_private		*dev_priv = dev->dev_private;
@@ -3132,8 +3133,10 @@ i915_gem_object_set_to_gpu_domain(struct drm_gem_object *obj)
 
 	dev->invalidate_domains |= invalidate_domains;
 	dev->flush_domains |= flush_domains;
-	if (obj_priv->ring)
+	if (flush_domains & I915_GEM_GPU_DOMAINS)
 		dev_priv->mm.flush_rings |= obj_priv->ring->id;
+	if (invalidate_domains & I915_GEM_GPU_DOMAINS)
+		dev_priv->mm.flush_rings |= ring->id;
 
 	trace_i915_gem_object_change_domain(obj,
 					    old_read_domains,
@@ -3765,7 +3768,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		struct drm_gem_object *obj = object_list[i];
 
 		/* Compute new gpu domains and update invalidate/flush */
-		i915_gem_object_set_to_gpu_domain(obj);
+		i915_gem_object_set_to_gpu_domain(obj, ring);
 	}
 
 	if (dev->invalidate_domains | dev->flush_domains) {
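
The ring-selection rule in the patch (flushes go to the ring the object is attached to, invalidations to the pending ring) can be exercised as a standalone model. This is a sketch, not the kernel code: GPU_DOMAIN_MASK, struct ring, struct object and rings_to_flush are hypothetical, and the NULL check on obj->ring is an extra guard added for the model that the patch itself does not contain.

```c
#include <stddef.h>

/* Hypothetical single-bit stand-in for the GPU domain mask. */
#define GPU_DOMAIN_MASK 0x1u

struct ring { unsigned int id; };

struct object {
    struct ring *ring;  /* ring the object was last used on, or NULL */
};

/* Work out which rings need a flush/invalidate command for one object
 * that is about to be used on the 'pending' ring. */
unsigned int rings_to_flush(const struct object *obj,
                            const struct ring *pending,
                            unsigned int flush_domains,
                            unsigned int invalidate_domains)
{
    unsigned int mask = 0;

    /* A pending GPU write lives on the ring the object was last used on. */
    if ((flush_domains & GPU_DOMAIN_MASK) && obj->ring)
        mask |= obj->ring->id;

    /* An invalidation has to run on the ring about to read the object,
     * whether or not the object is attached to any ring yet. */
    if (invalidate_domains & GPU_DOMAIN_MASK)
        mask |= pending->id;

    return mask;
}
```

An object that has never been on a ring but needs an invalidation now selects the pending ring, which is the case the earlier accumulation missed.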

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: Corruption in glxgears with Compiz
  2010-10-23  9:10           ` Chris Wilson
@ 2010-10-23  9:43             ` Alexey Fisher
  2010-10-23 10:07               ` Chris Wilson
  2010-10-23 11:42             ` Peter Clifton
  1 sibling, 1 reply; 18+ messages in thread
From: Alexey Fisher @ 2010-10-23  9:43 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Saturday, 23.10.2010, at 10:10 +0100, Chris Wilson wrote:
> On Sat, 23 Oct 2010 05:07:57 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> > Although I don't doubt that it is incorrect for some reason. My logic
> > was this.. the mm.flush_rings is supposed to be |='d with the object's
> > ring->id if the ring is set on a given object.
> 
> Well the whole inter-ring flushing is decidedly suspect since we have no
> synchronisation between rings, yet. However in this scenario, you are just
> using one ring...
> 
> If an object is in a GPU domain and so requires a flush, it is attached to
> a ring. However, if the object needs an invalidation it may not yet be
> attached to the ring (and in any event the invalidation needs to be
> performed on the pending ring). Ahah.
> 
> Note to self: flushes must be done on the from-ring before the semaphore
> and invalidations on the to-ring after the semaphore.
> 
> Can you try this patch?
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9290f02..e7f27a5 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3078,7 +3078,8 @@ i915_gem_object_set_to_cpu_domain(struct drm_gem_object *obj, int write)
>   *		drm_agp_chipset_flush
>   */
>  static void
> -i915_gem_object_set_to_gpu_domain(struct drm_gem_object *obj)
> +i915_gem_object_set_to_gpu_domain(struct drm_gem_object *obj,
> +				  struct intel_ring_buffer *ring)
>  {
>  	struct drm_device		*dev = obj->dev;
>  	struct drm_i915_private		*dev_priv = dev->dev_private;
> @@ -3132,8 +3133,10 @@ i915_gem_object_set_to_gpu_domain(struct drm_gem_object *obj)
>  
>  	dev->invalidate_domains |= invalidate_domains;
>  	dev->flush_domains |= flush_domains;
> -	if (obj_priv->ring)
> +	if (flush_domains & I915_GEM_GPU_DOMAINS)
>  		dev_priv->mm.flush_rings |= obj_priv->ring->id;
> +	if (invalidate_domains & I915_GEM_GPU_DOMAINS)
> +		dev_priv->mm.flush_rings |= ring->id;
>  
>  	trace_i915_gem_object_change_domain(obj,
>  					    old_read_domains,
> @@ -3765,7 +3768,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  		struct drm_gem_object *obj = object_list[i];
>  
>  		/* Compute new gpu domains and update invalidate/flush */
> -		i915_gem_object_set_to_gpu_domain(obj);
> +		i915_gem_object_set_to_gpu_domain(obj, ring);
>  	}
>  
>  	if (dev->invalidate_domains | dev->flush_domains) {
> 


Works for me.
Your mail client broke the patch, so "git am" didn't work.


* Re: Corruption in glxgears with Compiz
  2010-10-23  9:43             ` Alexey Fisher
@ 2010-10-23 10:07               ` Chris Wilson
  0 siblings, 0 replies; 18+ messages in thread
From: Chris Wilson @ 2010-10-23 10:07 UTC (permalink / raw)
  To: Alexey Fisher; +Cc: intel-gfx

On Sat, 23 Oct 2010 11:43:07 +0200, Alexey Fisher <bug-track@fisher-privat.net> wrote:
> Works for me.
> Your mail client broke the patch, so "git am" didn't work.

Apologies, I was being lazy and just cut'n'paste into vim without paying
attention.

That patch works as a stop gap, the real fun begins getting the inter-ring
synchronisation right.

Thanks,
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: Corruption in glxgears with Compiz
  2010-10-23  9:10           ` Chris Wilson
  2010-10-23  9:43             ` Alexey Fisher
@ 2010-10-23 11:42             ` Peter Clifton
  2010-10-23 17:48               ` Chris Wilson
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Clifton @ 2010-10-23 11:42 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Sat, 2010-10-23 at 10:10 +0100, Chris Wilson wrote:
> On Sat, 23 Oct 2010 05:07:57 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> > Although I don't doubt that it is incorrect for some reason. My logic
> > was this.. the mm.flush_rings is supposed to be |='d with the object's
> > ring->id if the ring is set on a given object.
> 
> Well the whole inter-ring flushing is decidedly suspect since we have no
> synchronisation between rings, yet. However in this scenario, you are just
> using one ring...
> 
> If an object is in a GPU domain and so requires a flush, it is attached to
> a ring. However, if the object needs an invalidation it may not yet be
> attached to the ring (and in any event the invalidation needs to be
> performed on the pending ring). Ahah.
> 
> Note to self: flushes must be done on the from-ring before the semaphore
> and invalidations on the to-ring after the semaphore.
> 
> Can you try this patch?

Your patch works a treat.. I knew mine was really only a band-aid which
forced a flush on the pending ring indiscriminately, and was glad to see the
proper fix. 

Really difficult to get your head round all this flush / invalidate
stuff. I get the idea, but in practice it is very confusing due to the
fact it is all deferred / scheduled work, and two subtly different
concepts (flush / invalidate) are handled by the same action on
the GPU, and very similar code! Very easy to muddle current / pending
ring in my head, for example.

You replied to Alexey that the patch is only a stop gap, and inter-ring
synchronisation is the real challenge. I guess that is something you'll
be forced to look at with the new Sandybridge chipset having a separate
ring for BLT operations?

I'm just looking for fps with my circuit board rendering GL code at the
moment.. that's why I'm following git HEAD stuff, to see if the drivers
can unlock some performance in the code I'm writing. I'm struggling to
profile just what the bottleneck is!

-- 
Peter Clifton


* Re: Corruption in glxgears with Compiz
  2010-10-23 11:42             ` Peter Clifton
@ 2010-10-23 17:48               ` Chris Wilson
  2010-10-23 18:33                 ` Peter Clifton
  2010-10-24 23:06                 ` Peter Clifton
  0 siblings, 2 replies; 18+ messages in thread
From: Chris Wilson @ 2010-10-23 17:48 UTC (permalink / raw)
  To: Peter Clifton; +Cc: intel-gfx

On Sat, 23 Oct 2010 12:42:05 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> Your patch works a treat.. I knew mine was really only a band-aid which
> forced a flush on the pending indiscriminately, and was glad to see the
> proper fix. 
> 
> Really difficult to get your head round all this flush / invalidate
> stuff. I get the idea, but in practice it is very confusing due to the
> fact it is all deferred / scheduled work, and both subtly different
> concepts (flush / invalidate) are handled by the same action on
> the GPU, and very similar code! Very easy to muddle current / pending
> ring in my head, for example.
> 
> You replied to Alexey that the patch is only a stop gap, and inter-ring
> synchronisation is the real challenge. I guess that is something you'll
> be forced to look at with the new Sandybridge chipset having a separate
> ring for BLT operations?

Exactly. We already have the issue on i965 with the Bitstream Decoder ring
which handles video separate from the render ring. Fortunately no one has
fallen over the lack of synchronisation there since the API design makes
interoperating GL/RENDER/Video so difficult. Even worse is that it is only
with Sandybridge that we have the ability to insert semaphores onto the
ring to handle inter-ring synchronisation on the GPU, otherwise we will
simply have to wait on retirement when transferring ownership from one
ring to another.  Is it worth the additional complexity to have buffers
reside on multiple rings at the same time? Possibly if we do start mixing
video + GL.  Anyway with the BLT split, handling synchronisation will
become an issue.
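
The ordering rule from earlier in the thread (flush on the from-ring before the semaphore, invalidate on the to-ring after it) can be sketched roughly as below. This is only an illustration under stated assumptions: `struct ring`, `ring_emit`, `transfer_ownership` and the `CMD_*` names are hypothetical and are not the i915 driver's API. The fallback path models "wait on retirement" as a flag telling the caller that the CPU must block before submitting to the to-ring.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command kinds, for illustration only (not i915 names). */
enum ring_cmd {
    CMD_FLUSH,            /* write back caches on the producing ring  */
    CMD_SEMAPHORE_SIGNAL, /* producer announces that it has finished  */
    CMD_SEMAPHORE_WAIT,   /* consumer stalls until the signal arrives */
    CMD_INVALIDATE        /* consumer drops stale cached data         */
};

struct ring {
    enum ring_cmd cmds[8];
    size_t len;
};

static void ring_emit(struct ring *r, enum ring_cmd c)
{
    assert(r->len < sizeof r->cmds / sizeof r->cmds[0]);
    r->cmds[r->len++] = c;
}

/*
 * Hand a buffer over from one ring to another.
 *
 * Returns 1 when the hardware has no inter-ring semaphores, in which
 * case the caller must wait on the CPU for the from-ring request to
 * retire before submitting work to the to-ring. Returns 0 when the
 * synchronisation is handled on the GPU.
 */
static int transfer_ownership(struct ring *from, struct ring *to,
                              int has_semaphores)
{
    /* Flush on the from-ring, before any semaphore. */
    ring_emit(from, CMD_FLUSH);

    if (has_semaphores) {
        ring_emit(from, CMD_SEMAPHORE_SIGNAL);
        ring_emit(to, CMD_SEMAPHORE_WAIT);
    }

    /* Invalidate on the to-ring, after the semaphore (or CPU wait). */
    ring_emit(to, CMD_INVALIDATE);

    return !has_semaphores;
}
```

With `has_semaphores` set, the from-ring ends up with flush then signal, and the to-ring with wait then invalidate; without it, each ring gets a single command and the return value asks the caller to block.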
 
> I'm just looking for fps with my circuit board rendering GL code at the
> moment.. that's why I'm following git HEAD stuff, to see if the drivers
> can unlock some performance in the code I'm writing. I'm struggling to
> profile just what the bottleneck is!

Aye, profiling GPU code at the moment is a hard problem. If you do find
some CPU bottlenecks, they're usually the easiest to fix. What may help is
to sync every operation and look at the relative times + relative
frequencies to work out the rate limiting step, and then see if you can
break it down further and repeat. (Even if we had a GPU callgrind, given
the disconnect between what is executed on the GPU and GL, it may not be
obvious how to improve the code.) uprof may help here given the
annotations Robert Bragg has made for mesa profiling.
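
A minimal sketch of the "sync every operation" approach, assuming the application can wrap each operation in a callback. All names here (`op_stat`, `profile_op`, `now_ns`, `count_call`) are made up for the example. In a GL program the `sync` callback would typically just call `glFinish()`, so the measured time includes the GPU work and not only the time spent queueing it.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Accumulated cost of one kind of operation (hypothetical helper type). */
struct op_stat {
    const char *name;
    uint64_t total_ns; /* summed wall-clock time, including the sync */
    uint64_t calls;    /* how often the operation was issued         */
};

typedef void (*op_fn)(void *ctx);

/* Wall-clock time in nanoseconds (C11 timespec_get; a monotonic
 * clock would be preferable where one is available). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Run one operation, force it to complete, and record how long it took. */
static void profile_op(struct op_stat *st, op_fn op, op_fn sync, void *ctx)
{
    assert(st && op && sync);

    uint64_t start = now_ns();
    op(ctx);
    sync(ctx); /* e.g. a wrapper around glFinish() in a GL application */
    uint64_t end = now_ns();

    if (end > start) /* guard against the wall clock stepping backwards */
        st->total_ns += end - start;
    st->calls++;
}

/* Stand-in operation for trying the helper out without a GL context. */
static void count_call(void *ctx)
{
    ++*(int *)ctx;
}
```

Comparing `total_ns` (relative time) and `calls` (relative frequency) across operations then points at the rate limiting step, which can be split into smaller operations and measured again.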

We're always eager to improve our code to get the most out of our admittedly
lack-luster GPUs. Even suggestions on what tools would be useful or
improvements we could make to improve profiling/development are most
welcome.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: Corruption in glxgears with Compiz
  2010-10-23 17:48               ` Chris Wilson
@ 2010-10-23 18:33                 ` Peter Clifton
  2010-10-24 23:06                 ` Peter Clifton
  1 sibling, 0 replies; 18+ messages in thread
From: Peter Clifton @ 2010-10-23 18:33 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx@lists.freedesktop.org

On Sat, 2010-10-23 at 18:48 +0100, Chris Wilson wrote:

> We're always eager to improve our code to get the most out of our admittedly
> lack-luster GPUs. Even suggestions on what tools would be useful or
> improvements we could make to improve profiling/development are most
> welcome.

One thing I was wondering about, was intel_gpu_top. It reports unit
usage based on busy / done registers in the chip. I wondered what would
happen if we polled those registers and graphed them in time... whether
it would show any hints as to which units were waiting on each other,
and where any gaps are.

It would need to be graphical probably, and it would need to be
synchronised in some way to the application / frames being processed, so
all in all, it is rather hard to imagine how it would work with perhaps
unrelated GPU activity going on for other things such as the compositor
and toolkit redrawing.
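
As a rough sketch of the idea, assuming the busy register has already been polled at a fixed interval into an array, the samples can be reduced to a percent-busy value per time bucket, ready for graphing. The function name `busy_timeline` and the notion of a single `busy_mask` bit are assumptions for the example; the real register layout would have to come from the chipset documentation, and this is not how intel_gpu_top itself is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reduce raw register samples to a busy-percentage timeline.
 *
 * samples   - register values polled at a fixed interval
 * n         - number of samples
 * busy_mask - bit(s) meaning "unit busy" (hypothetical layout)
 * window    - samples per output bucket
 * out       - receives percent busy (0..100) per complete bucket
 * out_cap   - capacity of out
 *
 * Returns the number of buckets written. A trailing partial bucket
 * is dropped.
 */
static size_t busy_timeline(const uint32_t *samples, size_t n,
                            uint32_t busy_mask, size_t window,
                            unsigned *out, size_t out_cap)
{
    assert(window > 0);

    size_t buckets = 0;
    for (size_t i = 0; i + window <= n && buckets < out_cap; i += window) {
        size_t busy = 0;
        for (size_t j = 0; j < window; j++)
            if (samples[i + j] & busy_mask)
                busy++;
        out[buckets++] = (unsigned)(busy * 100 / window);
    }
    return buckets;
}
```

Running this once per unit with a different mask gives one timeline per unit; lining them up against frame boundaries is the part that still needs synchronisation with the application.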

I sometimes wonder if it is just memory bandwidth constraining things..
perhaps I need to look to the chipset docs and see if there are any
performance diagnostic regs there as well.

-- 
Peter Clifton

Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA

Tel: +44 (0)7729 980173 - (No signal in the lab!)
Tel: +44 (0)1223 748328 - (Shared lab phone, ask for me)


* Re: Corruption in glxgears with Compiz
  2010-10-23 17:48               ` Chris Wilson
  2010-10-23 18:33                 ` Peter Clifton
@ 2010-10-24 23:06                 ` Peter Clifton
  1 sibling, 0 replies; 18+ messages in thread
From: Peter Clifton @ 2010-10-24 23:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx@lists.freedesktop.org

On Sat, 2010-10-23 at 18:48 +0100, Chris Wilson wrote:

> Aye, profiling GPU code at the moment is a hard problem. If you do find
> some CPU bottlenecks, they're usually the easiest to fix. What may help is
> to sync every operation and look at the relative times + relative
> frequencies to work out the rate limiting step, and then see if you can
> break it down further and repeat. (Even if we had a GPU callgrind, given
> the disconnect between what is executed on the GPU and GL, it may not be
> obvious how to improve the code.) uprof may help here given the
> annotations Robert Bragg has made for mesa profiling.

uprof looks interesting, but I couldn't see anything in git head mesa
relating to it. When I profiled in the past, I noticed my use of glClear
was a problem. I've reduced it by a factor of eight by more intelligent
use of the stencil buffer bitplanes, and might be able to do better
still with some thought about encoding and / or abuse of the depth
buffer.
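
One possible reading of the bitplane trick, sketched under the assumption of an 8-bit stencil buffer: each pass that would otherwise need a fresh stencil buffer uses a single bitplane, and the whole buffer is cleared only when the planes wrap around, so one clear serves eight passes. The names `stencil_pass` and `stencil_pass_for` are hypothetical, and this is not necessarily how the application does it.

```c
#include <assert.h>
#include <stdint.h>

/* What a given pass should do with the stencil buffer. */
struct stencil_pass {
    uint8_t mask;    /* the single bitplane this pass reads and writes */
    int needs_clear; /* non-zero when the whole buffer must be cleared */
};

/*
 * Pick a bitplane for pass number pass_index, cycling through
 * stencil_bits planes. Only the first pass of each cycle needs a clear.
 */
static struct stencil_pass stencil_pass_for(unsigned pass_index,
                                            unsigned stencil_bits)
{
    assert(stencil_bits >= 1 && stencil_bits <= 8);

    unsigned plane = pass_index % stencil_bits;
    struct stencil_pass p;
    p.mask = (uint8_t)(1u << plane);
    p.needs_clear = (plane == 0);
    return p;
}

/*
 * Intended use in a GL renderer (not compiled here):
 *
 *   struct stencil_pass p = stencil_pass_for(i, 8);
 *   if (p.needs_clear) {
 *       glStencilMask(0xFF);            // glClear honours the write mask
 *       glClear(GL_STENCIL_BUFFER_BIT);
 *   }
 *   glStencilMask(p.mask);              // write only this bitplane
 *   glStencilFunc(GL_EQUAL, 0, p.mask); // test only this bitplane
 */
```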

Enabling debugging shows that I'm always falling onto the mesa meta
clear path as the depth / stencil buffer is tiled on the GM45. The BLT
engine can't write to that and mesa has to save and restore nearly the
entire 3D state for every clear.

I'm tempted to try open-coding the stencil buffer clears using GL calls
as I won't need to modify so much state as mesa has to. Still, I'm not
sure if there would be much difference in overhead between a big
state-change and a small one.


PCB design / CAD applications are very graphics intensive, so I should
perhaps have looked at a heavier weight laptop to do them on, but I'd
dearly love to support less performant GL capable hardware too as many
of our users are on oldish hardware. Being a bit fps challenged myself
helps me find more devious ways to keep frame-rate up ;) still, glxgears
only manages 30fps at full screen, so I don't expect miracles!

> We're always eager to improve our code to get the most out of our admittedly
> lack-luster GPUs. Even suggestions on what tools would be useful or
> improvements we could make to improve profiling/development are most
> welcome.

The code is already so much better. I can remember the pre-DRI2,
pre-GEM, pre-KMS days.. I just can't imagine a desktop without seamless
compositing and GL working any more.

The hard work from everyone at Intel, the mesa developers, and those
working on all the other OSS drivers is really really bringing the Linux
desktop up to scratch. A great many people have a lot to thank you
guys for.

-- 
Peter Clifton

Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA

Tel: +44 (0)7729 980173 - (No signal in the lab!)
Tel: +44 (0)1223 748328 - (Shared lab phone, ask for me)


Thread overview: 18+ messages
2010-10-22 12:53 Corruption in glxgears with Compiz Peter Clifton
2010-10-22 13:39 ` Chris Wilson
2010-10-22 14:04   ` Alexey Fisher
2010-10-22 19:10   ` Peter Clifton
2010-10-22 19:29     ` Chris Wilson
2010-10-22 19:38       ` Peter Clifton
2010-10-22 20:41       ` Peter Clifton
2010-10-23  3:35       ` Peter Clifton
2010-10-23  4:07         ` Peter Clifton
2010-10-23  8:23           ` Alexey Fisher
2010-10-23  9:10           ` Chris Wilson
2010-10-23  9:43             ` Alexey Fisher
2010-10-23 10:07               ` Chris Wilson
2010-10-23 11:42             ` Peter Clifton
2010-10-23 17:48               ` Chris Wilson
2010-10-23 18:33                 ` Peter Clifton
2010-10-24 23:06                 ` Peter Clifton
2010-10-22 19:13   ` Peter Clifton
