* intel_gpu_top decode..
@ 2010-10-06 17:55 Peter Clifton
  2010-10-06 22:27 ` Eric Anholt
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Clifton @ 2010-10-06 17:55 UTC
  To: intel-gfx

Hi,

Can anyone point me at what this intel_gpu_top output (below) indicates
regarding what is limiting the frame-rate of my drawing?

Primarily I'm throwing a lot of triangles and texture coordinates into a
vertex array, compiling the lot into a display list and benchmarking the
frame-rate I can achieve at given window sizes (not vblank limited). For
the data below, I'm just drawing lines (with two triangles each, then
two more at each end for caps). I have some colour changes, but these
are handled with a flush of my vertex array and a glColor call. Colour
changes should be relatively infrequent though.
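The batching scheme described above (accumulate triangles in a vertex array, and only flush it and call glColor when the colour changes) can be sketched in plain C. This is a hypothetical illustration, not the code from the application: the names vertex, batch, batch_flush(), batch_set_colour() and batch_add_triangle() are made up, the GL calls a real implementation would make appear only as comments so the sketch compiles without a GL context, and the display-list compilation step is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU-side batch. A real implementation would hand verts[]
 * to glVertexPointer()/glTexCoordPointer() and draw it with
 * glDrawArrays(); those calls appear only as comments here. */
#define BATCH_MAX 4096

typedef struct {
    float x, y;   /* position */
    float s, t;   /* texture coordinates fed to the cap shader */
} vertex;

typedef struct {
    vertex verts[BATCH_MAX];
    size_t count;      /* vertices currently queued */
    float colour[4];   /* colour the queued vertices will be drawn in */
    int flushes;       /* draw calls issued so far (for illustration) */
} batch;

/* Draw whatever is queued, then empty the batch. */
static void batch_flush(batch *b)
{
    if (b->count == 0)
        return;
    /* glDrawArrays(GL_TRIANGLES, 0, (GLsizei) b->count); */
    b->flushes++;
    b->count = 0;
}

/* Change colour only when it actually differs, so runs of
 * same-coloured primitives stay in one batch. */
static void batch_set_colour(batch *b, const float rgba[4])
{
    if (memcmp(b->colour, rgba, sizeof b->colour) == 0)
        return;
    batch_flush(b);   /* queued vertices still use the old colour */
    memcpy(b->colour, rgba, sizeof b->colour);
    /* glColor4fv(rgba); */
}

/* Queue one triangle, flushing first if the array would overflow. */
static void batch_add_triangle(batch *b, vertex v0, vertex v1, vertex v2)
{
    if (b->count + 3 > BATCH_MAX)
        batch_flush(b);
    b->verts[b->count++] = v0;
    b->verts[b->count++] = v1;
    b->verts[b->count++] = v2;
}
```

With this arrangement the number of draw calls is driven mainly by how often the colour changes (plus one per full array), which is why infrequent colour changes keep the batches large.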

Round line caps are being drawn using two triangles (to make a square),
with texture coordinates spaced between -1 and 1 to span the square. The
round object is drawn with an implicit texture using this shader:

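void main()
{
  float sqdist;

  /* Squared distance from the cap centre, in texture-space units */
  sqdist = dot (gl_TexCoord[0].st, gl_TexCoord[0].st);
  if (sqdist > 1.0)
    discard;   /* outside the unit circle: drop the fragment */

  gl_FragColor = gl_Color;
}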

The line geometry also goes through this same shader, but with texture
coordinates set to (0.0, 0.0) so none of its fragments are discarded.
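The cap geometry and the shader's discard test can be modelled on the CPU to make the trick concrete. This is a sketch under stated assumptions, not code from the original program: round_cap_quad() and cap_fragment_kept() are hypothetical names, and it assumes the cap square is axis-aligned around the line endpoint with side 2r, where r is half the line width. A line-body vertex with texture coordinates (0, 0) always passes the test, which is why the line geometry can share the shader.

```c
#include <stdbool.h>

typedef struct { float x, y, s, t; } cap_vertex;

/* Fill out[6] with two triangles covering the square around (cx, cy).
 * Texture coordinates run from -1 to 1 across the square, so the
 * shader's dot(st, st) is the squared distance from the cap centre
 * measured in units of the half-width r. */
static void round_cap_quad(float cx, float cy, float r, cap_vertex out[6])
{
    const cap_vertex bl = { cx - r, cy - r, -1.0f, -1.0f };
    const cap_vertex br = { cx + r, cy - r,  1.0f, -1.0f };
    const cap_vertex tr = { cx + r, cy + r,  1.0f,  1.0f };
    const cap_vertex tl = { cx - r, cy + r, -1.0f,  1.0f };

    out[0] = bl; out[1] = br; out[2] = tr;   /* first triangle */
    out[3] = bl; out[4] = tr; out[5] = tl;   /* second triangle */
}

/* CPU model of the fragment shader: a fragment survives unless
 * s*s + t*t > 1.0, i.e. it is kept inside the inscribed circle. */
static bool cap_fragment_kept(float s, float t)
{
    return s * s + t * t <= 1.0f;
}
```

Since the inscribed circle covers π/4 (about 78.5%) of the square, roughly a fifth of the fragments generated for each cap are shaded and then discarded.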

This is on a GM45. Am I correct in thinking the geometry transfer is the
indicated bottleneck? (VF CS is the vertex fetch command stream, right?)

From the fact that the pixel shader is at 70%, I presume I'm not (yet)
fill-rate limited, but not far from it either.

I've no idea what the other acronyms mean, and the PRM doesn't
immediately help. Is UC0 related to clipping? Can I reduce it?


core clock: 400 Mhz
                     ring idle:   1%: ▌                                        
                    ring space: 256/126976 (0%)
                          task  percent busy

                         VF CS:  91%: ████████████████████████████████████▌    
                        UC0 CS:  88%: ███████████████████████████████████▍     
                        ISC CS:  88%: ███████████████████████████████████▍     
                         GS CS:  88%: ███████████████████████████████████▍     
                        VS0 CS:  82%: █████████████████████████████████        
                         CL CS:  82%: █████████████████████████████████        
                    MASM CS CR:  80%: ████████████████████████████████▏        
                   Row 1, EU 3:  78%: ███████████████████████████████▍         
                   Row 0, EU 3:  71%: ████████████████████████████▌            
                  Pixel shader:  70%: ████████████████████████████▏            
                   Bypass FIFO:  69%: ███████████████████████████▊             
                    Windowizer:  68%: ███████████████████████████▍             
                   Row 1, EU 2:  63%: █████████████████████████▍               
                     Filtering:  62%: █████████████████████████                
                   Row 0, EU 2:  60%: ████████████████████████▏                
                        URB CS:  57%: ███████████████████████                  
                  Setup Engine:  55%: ██████████████████████▏                  
                    Map filter:  54%: █████████████████████▊                   
                   Row 1, EU 1:  50%: ████████████████████▏                    
                   Row 0, EU 1:  47%: ███████████████████                      
            Texture decompress:  45%: ██████████████████▏                      
                 Sampler cache:  44%: █████████████████▊                       
                 Texture fetch:  44%: █████████████████▊                       
                   Row 1, EU 0:  43%: █████████████████▍                       
            Projection and LOD:  24%: █████████▊                               
   Dependent address generator:  22%: █████████                                
                    Dispatcher:  18%: ███████▍                                 
         Message Arbiter row 1:  11%: ████▌                                    
                    SVDR CS CR:   6%: ██▌                                      
                     EM1 CS CR:   5%: ██▏                                      
                    SVSM CS CR:   2%: █     


Best regards,
           

-- 
Peter Clifton

Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA

Tel: +44 (0)7729 980173 - (No signal in the lab!)
Tel: +44 (0)1223 748328 - (Shared lab phone, ask for me)
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: intel_gpu_top decode..
  2010-10-06 17:55 intel_gpu_top decode Peter Clifton
@ 2010-10-06 22:27 ` Eric Anholt
  2010-10-07 11:41   ` Peter Clifton
  2010-10-07 12:55   ` Peter Clifton
  0 siblings, 2 replies; 5+ messages in thread
From: Eric Anholt @ 2010-10-06 22:27 UTC
  To: Peter Clifton, intel-gfx



On Wed, 06 Oct 2010 18:55:55 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> Hi,
> 
> Can anyone point me at what this intel_gpu_top output (below) indicates
> regarding what is limiting the frame-rate of my drawing?

I'll take a stab at what I can, but honestly I find the chip's status
reports fairly mysterious myself.

> [...]
>
> This is on a GM45. Am I correct in thinking the geometry transfer is the
> indicated bottle-neck? (VF CS is vertex fetch command stream, right?)
> 
> From the fact the pixel shader is at 70%, I presume I'm not (yet)
> fill-rate limited, but not that far from it either.

Generally, a unit also appears to report busy while it's stalled waiting
to pass its data downstream.  So I read your output as VF probably being
9% busy and 82% waiting for VS, and CL (clipper) being 14% busy and 68%
waiting on the windowizer (fragment shader).  That's only approximate,
since we don't have metrics here for how much time is actually stalled
vs. accomplishing something.  Ideally, every unit would report busy all
the time while getting work done, but another debug tool for Ironlake
that I'm working on getting released has strongly pointed to "units are
either starved or stalled, and rarely doing real work."
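The rule of thumb above (a unit's busy figure includes time spent blocked on the next stage) can be turned into a small calculation. This is only a sketch of that heuristic: split_busy() is a hypothetical helper, and it assumes the downstream unit's busy figure approximates how long the upstream unit is blocked, which the hardware does not actually report.

```c
/* Hypothetical helper: split a unit's busy percentage into an estimate
 * of real work and an estimate of time stalled on its downstream unit,
 * assuming the downstream unit's busy figure bounds the stall time. */
typedef struct {
    double working;   /* percent of time plausibly doing real work */
    double stalled;   /* percent of time plausibly blocked downstream */
} busy_split;

static busy_split split_busy(double own_busy, double downstream_busy)
{
    busy_split r;

    /* A unit cannot be stalled for longer than it reports being busy. */
    r.stalled = downstream_busy < own_busy ? downstream_busy : own_busy;
    r.working = own_busy - r.stalled;
    return r;
}
```

With the figures from the first trace, split_busy(82, 68) for CL against the windowizer gives 14% working and 68% stalled, and split_busy(91, 82) for VF against VS0 gives 9% working and 82% stalled. When the downstream unit reports more busy time than the upstream one, the estimate degenerates to zero work, which is a reminder of how crude the heuristic is.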

> I've no idea what the other acronyms are, and the PRM doesn't help
> immediately. Is UC0 related to clipping? Can I reduce it?

Not sure what that one is.

> [intel_gpu_top output snipped]

Of this, I'd say that you're spending a surprising amount of time in
texture fetch.  Finding ways to reduce texture bandwidth may pay off,
assuming that (texture fetch / sampler cache) is the percentage of the
time you're cache missing.  I'm not sure if that's true or not, though.
And you said that this data was just for the line drawing, which didn't
appear to have any texturing going on at all, so I'm just confused.



* Re: intel_gpu_top decode..
  2010-10-06 22:27 ` Eric Anholt
@ 2010-10-07 11:41   ` Peter Clifton
  2010-10-07 11:47     ` Peter Clifton
  2010-10-07 12:55   ` Peter Clifton
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Clifton @ 2010-10-07 11:41 UTC
  To: Eric Anholt, intel-gfx@lists.freedesktop.org

On Wed, 2010-10-06 at 15:27 -0700, Eric Anholt wrote:

> Of this, I'd say that you're spending a surprising amount of time in
> texture fetch.  Finding ways to reduce texture bandwidth may pay off,
> assuming that (texture fetch / sampler cache) is the percentage of the
> time you're cache missing.  I'm not sure if that's true or not, though.
> And you said that this data was just for the line drawing, which didn't
> appear to have any texturing going on at all, so I'm just confused.

Indeed, I'm not using texturing at all!

I had assumed the texture unit might have been in use by internal
threads in the GPU for something, or that it might have been due to
other X11 applications running in the background.

I switched off compiz for my benchmarking, and was just using metacity.
Of course, I'm not clear on what cairo / pixman would activate when
drawing. That said, when _not_ running my benchmark, there is very
little activity from the GPU at all, so it doesn't seem related.

I'm probably doing a few bitmasked glClears of things like the stencil
buffer (part of the compositing / rendering in my app). I can't quite
imagine why, but perhaps Mesa is using the texture unit for some
operations.


Thanks for the hint on the busy / stalled figures. That helps interpret
things!

Best wishes,


-- 
Peter Clifton

* Re: intel_gpu_top decode..
  2010-10-07 11:41   ` Peter Clifton
@ 2010-10-07 11:47     ` Peter Clifton
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Clifton @ 2010-10-07 11:47 UTC
  To: Eric Anholt, intel-gfx@lists.freedesktop.org

On Thu, 2010-10-07 at 12:41 +0100, Peter Clifton wrote:
> On Wed, 2010-10-06 at 15:27 -0700, Eric Anholt wrote:
> 
> > Of this, I'd say that you're spending a surprising amount of time in
> > texture fetch.  Finding ways to reduce texture bandwidth may pay off,
> > assuming that (texture fetch / sampler cache) is the percentage of the
> > time you're cache missing.  I'm not sure if that's true or not, though.
> > And you said that this data was just for the line drawing, which didn't
> > appear to have any texturing going on at all, so I'm just confused.
> 
> Indeed, I'm not using texturing at all!

Well, actually, I'm setting texture coordinates for all vertices,
specifically to pass data to the vertex and pixel shaders. I've got the
code running with the fixed-function vertex pipeline (no vertex shader),
and unfortunately can't recall whether that was the configuration I sent
you the profile for. I don't recall it making much of a difference,
though.

 
Peter Clifton

* Re: intel_gpu_top decode..
  2010-10-06 22:27 ` Eric Anholt
  2010-10-07 11:41   ` Peter Clifton
@ 2010-10-07 12:55   ` Peter Clifton
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Clifton @ 2010-10-07 12:55 UTC
  To: Eric Anholt; +Cc: intel-gfx


On Wed, 2010-10-06 at 15:27 -0700, Eric Anholt wrote:

> Of this, I'd say that you're spending a surprising amount of time in
> texture fetch.  Finding ways to reduce texture bandwidth may pay off,
> assuming that (texture fetch / sampler cache) is the percentage of the
> time you're cache missing.  I'm not sure if that's true or not, though.
> And you said that this data was just for the line drawing, which didn't
> appear to have any texturing going on at all, so I'm just confused.

I cut down my rendering code to almost nothing (just draw a blank
full-viewport quad 30 times):

    int count = 0;
    glBegin (GL_QUADS);
    for (count = 0; count < 30; count++) {
      glVertex3i (0,             0,              0);
      glVertex3i (PCB->MaxWidth, 0,              0);
      glVertex3i (PCB->MaxWidth, PCB->MaxHeight, 0);
      glVertex3i (0,             PCB->MaxHeight, 0);
    }
    glEnd ();


and was able to get this trace:

Texture unit %busy seems to go up as count increases.

core clock: 400 Mhz
                     ring idle:   3%: █▍                                       
                    ring space: 60/126976 (0%)
                          task  percent busy

                        UC0 CS:  93%: █████████████████████████████████████▍   
                         VF CS:  92%: █████████████████████████████████████    
                   Row 1, EU 3:  92%: █████████████████████████████████████    
                    Windowizer:  92%: █████████████████████████████████████    
                   Row 0, EU 3:  92%: █████████████████████████████████████    
                   Row 1, EU 2:  92%: █████████████████████████████████████    
                   Row 0, EU 2:  92%: █████████████████████████████████████    
                  Setup Engine:  92%: █████████████████████████████████████    
                   Row 1, EU 1:  92%: █████████████████████████████████████    
                        ISC CS:  92%: █████████████████████████████████████    
                   Row 0, EU 1:  92%: █████████████████████████████████████    
                   Row 1, EU 0:  92%: █████████████████████████████████████    
                   Bypass FIFO:  92%: █████████████████████████████████████    
                    Map filter:  92%: █████████████████████████████████████    
                  Pixel shader:  92%: █████████████████████████████████████    
                     Filtering:  92%: █████████████████████████████████████    
                         CL CS:  79%: ███████████████████████████████▊         
            Texture decompress:  76%: ██████████████████████████████▌          
                 Texture fetch:  74%: █████████████████████████████▊           
                 Sampler cache:  73%: █████████████████████████████▍           
                         GS CS:  70%: ████████████████████████████▏            
                        URB CS:  69%: ███████████████████████████▊             
                        VS0 CS:  54%: █████████████████████▊                   
            Projection and LOD:  50%: ████████████████████▏                    
                    Dispatcher:  39%: ███████████████▊                         
   Dependent address generator:  38%: ███████████████▍                         
         Message Arbiter row 1:  19%: ███████▊                                 
                     EM1 CS CR:   7%: ███                                      
                    SVDR CS CR:   4%: █▊                                       
                    SVSM CS CR:   1%: ▌                                        
                    SVTW CS CR:   0%: ▏                                        
                      GW CS CR:   0%: ▏                                        
                        UC1 CS:   0%: ▏                                        
                    SVRW CS CR:   0%: ▏                                        
                    SVRR CS CR:   0%: ▏                                        
                     EM0 CS CR:   0%: ▏                                        
                     MAW CS CR:   0%: ▏                                        
                    MASM CS CR:   0%: ▏                                        
                    SVDW CS CR:   0%: ▏        


This was with just the rendering code above (plus the fragment shader I
posted before, and the fixed-function vertex pipeline). Without the
fragment shader, I got this (similar) trace:


                        UC0 CS:  82%: █████████████████████████████████        
                         VF CS:  82%: █████████████████████████████████        
                   Row 1, EU 3:  82%: █████████████████████████████████        
                    Windowizer:  82%: █████████████████████████████████        
                   Row 0, EU 3:  82%: █████████████████████████████████        
                        ISC CS:  82%: █████████████████████████████████        
                   Row 1, EU 2:  82%: █████████████████████████████████        
                  Setup Engine:  82%: █████████████████████████████████        
                   Row 0, EU 2:  82%: █████████████████████████████████        
                   Row 1, EU 1:  82%: █████████████████████████████████        
                   Row 1, EU 0:  82%: █████████████████████████████████        
                   Row 0, EU 1:  82%: █████████████████████████████████        
                    Map filter:  81%: ████████████████████████████████▌        
                 Texture fetch:  79%: ███████████████████████████████▊         
                   Bypass FIFO:  77%: ███████████████████████████████          
                  Pixel shader:  75%: ██████████████████████████████▏          
            Texture decompress:  71%: ████████████████████████████▌            
                     Filtering:  71%: ████████████████████████████▌            
                         CL CS:  70%: ████████████████████████████▏            
                 Sampler cache:  70%: ████████████████████████████▏            
                        URB CS:  68%: ███████████████████████████▍             
                         GS CS:  62%: █████████████████████████                
                        VS0 CS:  49%: ███████████████████▊                     
         Message Arbiter row 1:  32%: █████████████                            
                     EM1 CS CR:  13%: █████▍                                   
                    SVDR CS CR:   7%: ███                                      
                    SVSM CS CR:   5%: ██▏                                      
                    SVTW CS CR:   0%: ▏                                        
                      GW CS CR:   0%: ▏                                        
                        UC1 CS:   0%: ▏                                        
                     MAW CS CR:   0%: ▏                                        
                    SVRR CS CR:   0%: ▏                                        
                    SVRW CS CR:   0%: ▏                                        
                     EM0 CS CR:   0%: ▏                                        
                    MASM CS CR:   0%: ▏                                        
                    SVDW CS CR:   0%: ▏                                        
                    MASF CS CR:   0%: ▏                                        
            Projection and LOD:   0%: ▏                                        
                    Dispatcher:   0%: ▏                            
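Every quad in the cut-down test covers the whole viewport, so the number of fragments per frame grows linearly with count, which is at least consistent with the busy figures rising as count increases. The sketch below does that arithmetic. fragments_per_frame() and required_fill_rate() are hypothetical helpers, the window size is an assumed input (the messages do not give one), and it assumes every fragment of every quad is shaded, i.e. no early depth or stencil rejection.

```c
/* Hypothetical helper: fragments generated per frame when `quads`
 * full-viewport quads are drawn; each one touches every pixel once. */
static unsigned long long fragments_per_frame(unsigned quads,
                                              unsigned width,
                                              unsigned height)
{
    return (unsigned long long) quads * width * height;
}

/* Hypothetical helper: fragments per second needed for a frame rate. */
static double required_fill_rate(unsigned long long fragments, double fps)
{
    return (double) fragments * fps;
}
```

For example, with an assumed 1024x768 window, 30 quads produce 23,592,960 fragments per frame, or about 1.4 billion fragments per second at 60 frames per second.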


For the shader case, I captured a GPU dump during execution. It is
attached (gzipped).

Is it possible the texture unit is involved in copying buffers? Or is it
just a case of mis-reporting what is active?


-- 
Peter Clifton

[-- Attachment #2: gpu_dump_texture_unit_busy.txt.gz --]
[-- Type: application/x-gzip, Size: 158631 bytes --]
