* i915_gem_evict_something in sysprof trace using VBOs
From: Peter Clifton @ 2010-11-05 10:21 UTC (permalink / raw)
To: intel-gfx@lists.freedesktop.org
I was playing with my VBO code, and noticed this sysprof trace
(non-interesting stuff pruned):
function                            self    total
drm_ioctl                           0.13%   56.08%
i915_gem_execbuffer2                0.00%   32.50%
i915_gem_do_execbuffer              0.08%   32.50%
i915_gem_object_pin                 0.00%   17.47%
i915_gem_object_bind_to_gtt         0.03%   17.44%
i915_gem_evict_something            0.00%   15.54%
i915_gem_object_unbind              0.00%   15.31%
i915_gem_object_set_to_cpu_domain   0.00%   13.33%
i915_gem_clflush_object             0.00%   13.33%
i915_gem_clflush_object             0.00%   14.29%
i915_gem_mmap_gtt_ioctl             0.00%   10.74%
i915_gem_set_domain_ioctl           0.00%    4.98%
The i915_gem_evict_something has me curious. Presumably I have too many
pages of data actively being used by the GPU (or mapped).
--
Peter Clifton
Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA
Tel: +44 (0)7729 980173 - (No signal in the lab!)
Tel: +44 (0)1223 748328 - (Shared lab phone, ask for me)
* Re: i915_gem_evict_something in sysprof trace using VBOs
From: Chris Wilson @ 2010-11-05 10:35 UTC (permalink / raw)
To: Peter Clifton, intel-gfx@lists.freedesktop.org
On Fri, 05 Nov 2010 10:21:07 +0000, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> I was playing with my VBO code, and noticed this sysprof trace
> (non-interesting stuff pruned):
>
> function                            self    total
> drm_ioctl                           0.13%   56.08%
> i915_gem_execbuffer2                0.00%   32.50%
> i915_gem_do_execbuffer              0.08%   32.50%
> i915_gem_object_pin                 0.00%   17.47%
> i915_gem_object_bind_to_gtt         0.03%   17.44%
> i915_gem_evict_something            0.00%   15.54%
> i915_gem_object_unbind              0.00%   15.31%
> i915_gem_object_set_to_cpu_domain   0.00%   13.33%
> i915_gem_clflush_object             0.00%   13.33%
> i915_gem_clflush_object             0.00%   14.29%
> i915_gem_mmap_gtt_ioctl             0.00%   10.74%
> i915_gem_set_domain_ioctl           0.00%    4.98%
>
>
> The i915_gem_evict_something has me curious. Presumably I have too many
> pages of data actively being used by the GPU (or mapped).
Yes, you are suffering from aperture thrashing. There are a few ways to
work around this: (1) decrease the size of your working set (reduce texture
sizes, reuse as many buffers within the aperture as possible), (2)
increase the size of the aperture (check your BIOS AGP size and apply
drm-intel-next to get the benefit of the full-GTT), (3) add an uncached
page cache to avoid those costly clflushes.
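To make the thrashing concrete, here is a toy model in C. It assumes equally sized buffers, round-robin use, and a plain LRU eviction policy; all of that is a simplification for illustration and not the driver's actual algorithm. The function name and parameters are made up for this sketch.

```c
#include <limits.h>

#define MAX_BUFFERS 64

/*
 * Toy model (hypothetical, not driver code): n_buffers equally sized
 * buffers are used round-robin for `passes` frames in an aperture that
 * can hold `capacity` of them at once. Returns how many binds had to
 * evict another buffer first, assuming least-recently-used eviction.
 */
static unsigned count_evictions(unsigned capacity, unsigned n_buffers,
                                unsigned passes)
{
    unsigned long last_use[MAX_BUFFERS] = {0}; /* 0 means "not resident" */
    unsigned long clock = 0;
    unsigned resident = 0, evictions = 0;

    if (capacity == 0 || n_buffers == 0 || n_buffers > MAX_BUFFERS)
        return 0;

    for (unsigned p = 0; p < passes; p++) {
        for (unsigned b = 0; b < n_buffers; b++) {
            clock++;
            if (last_use[b] == 0) {            /* buffer must be bound */
                if (resident == capacity) {    /* aperture full: evict LRU */
                    unsigned victim = 0;
                    unsigned long oldest = ULONG_MAX;
                    for (unsigned i = 0; i < n_buffers; i++) {
                        if (last_use[i] != 0 && last_use[i] < oldest) {
                            oldest = last_use[i];
                            victim = i;
                        }
                    }
                    last_use[victim] = 0;
                    resident--;
                    evictions++;
                }
                resident++;
            }
            last_use[b] = clock;
        }
    }
    return evictions;
}
```

In this model, four buffers in a four-slot aperture never evict, while five buffers in the same aperture evict on nearly every bind, which is why trimming the working set by even a little can matter so much.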
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: i915_gem_evict_something in sysprof trace using VBOs
From: Peter Clifton @ 2010-11-05 11:44 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx@lists.freedesktop.org
On Fri, 2010-11-05 at 10:35 +0000, Chris Wilson wrote:
>
> Yes, you are suffering from aperture thrashing. There are a few ways to
> work around this: (1) decrease the size of your working set (reduce texture
> sizes, reuse as many buffers within the aperture as possible)
All VBOs (one 1.7M VBO, actually), but it appears the driver / card is
hanging on to it for a while, so every time I call glBufferData(...,
NULL, ...) and glMapBuffer, I get more memory usage. I was expecting
the card to use (say) a handful of copies, but no more.
I guess the CPU got ahead of the GPU in terms of rendering, and it used
up all the aperture space in doing so. In truth, my buffers are RARELY
full, but due to some other (bad) code, needed to be large enough to fit
a particularly complex object in some rare cases.
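As a back-of-the-envelope check of that guess, the arithmetic can be sketched in C. The helper names are hypothetical, and any aperture size plugged in is an assumption (the real figure depends on the hardware and BIOS settings).

```c
#include <stdint.h>

/*
 * Bytes of buffer storage still referenced by queued GPU work when
 * every upload orphans the VBO and gets a fresh allocation.
 */
static uint64_t orphaned_bytes_in_flight(uint64_t vbo_bytes,
                                         uint64_t uploads_in_flight)
{
    return vbo_bytes * uploads_in_flight;
}

/*
 * How many orphaned copies of the VBO fit in the aperture before
 * something has to be evicted. Returns 0 for a zero-sized VBO.
 */
static uint64_t copies_before_eviction(uint64_t aperture_bytes,
                                       uint64_t vbo_bytes)
{
    return vbo_bytes ? aperture_bytes / vbo_bytes : 0;
}
```

With an assumed 256 MiB aperture and a roughly 1.7 MiB VBO, that is on the order of 150 copies, so a CPU running well ahead of the GPU could plausibly exhaust it if nothing else limits the queue depth.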
Having thought about it all now (and read some of the implementation
details in mesa / kernel), I think glBufferSubData should work MUCH
better for my needs.
I'd take bets it's "something I've done wrong", as usually seems to be the
way, but for now - if I just use glBufferSubData to upload changed data
only, I get rendering corruption. It works fine with
LIBGL_ALWAYS_SOFTWARE=1 though, so there is perhaps a small possibility
of a driver bug?
Similarly, if I call glBufferData(..., NULL, ...) before the
glBufferSubData, I get back to bad performance (expected), but rendering
corruption is gone.
Thinking of stupid things I might have done wrong... yes, I did call
glBufferData(..., NULL, ...) once in the case where I re-upload
subsequent times with glBufferSubData(). Turns out that when you
accidentally miss this out, "bad" things happen ;).
All this said, I've discovered docs for glMapBufferRange. With a bit of
extra work to my code (to ensure I use as much of a buffer as possible
before scrapping the whole thing), I think this could be my friend for
getting decent performance out of VBOs without having to glBufferSubData
each set of new data.
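A minimal sketch of the bookkeeping that plan needs, in plain C with no GL calls. The struct and function names are hypothetical. The idea is to hand out successive sub-ranges of one buffer and only signal "scrap the whole thing" when the next request no longer fits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical streaming state for one VBO: capacity and next free byte. */
struct stream_vbo {
    size_t capacity;
    size_t head;
};

/* Result of a reservation: where to write, and whether to orphan first. */
struct stream_alloc {
    size_t offset;
    bool orphan;
};

/*
 * Reserve `bytes` at an `align`-aligned offset. If the range does not fit
 * in the space left, wrap to offset 0 and set `orphan` so the caller can
 * discard the old storage before writing. Returns false if the request
 * can never fit or align is zero.
 */
static bool stream_reserve(struct stream_vbo *vbo, size_t bytes,
                           size_t align, struct stream_alloc *out)
{
    if (align == 0 || bytes > vbo->capacity)
        return false;

    size_t start = (vbo->head + align - 1) / align * align;
    bool orphan = false;

    if (start > vbo->capacity || bytes > vbo->capacity - start) {
        start = 0;      /* buffer exhausted: start over */
        orphan = true;  /* caller should get fresh storage first */
    }

    vbo->head = start + bytes;
    out->offset = start;
    out->orphan = orphan;
    return true;
}
```

In real code, when `orphan` is set the caller would presumably call glBufferData(..., NULL, ...) (or map with GL_MAP_INVALIDATE_BUFFER_BIT), and otherwise map only the reserved range with glMapBufferRange using GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT. That way a new allocation is requested once per full buffer rather than once per upload.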
* Re: i915_gem_evict_something in sysprof trace using VBOs
From: Peter Clifton @ 2010-11-05 14:30 UTC (permalink / raw)
To: Chris Wilson
Cc: intel-gfx@lists.freedesktop.org, mesa3d-dev@lists.sourceforge.net
On Fri, 2010-11-05 at 11:44 +0000, Peter Clifton wrote:
> I'd take bets it's "something I've done wrong", as usually seems to be the
> way, but for now - if I just use glBufferSubData to upload changed data
> only, I get rendering corruption. It works fine with
> LIBGL_ALWAYS_SOFTWARE=1 though, so there is perhaps a small possibility
> of a driver bug?
Does this look correct? Forcing the Gen6 fallback for BufferSubData
fixes my corruption. Seems as if the blit is going wrong. The PRM
suggests the pitch needs to be DWORD aligned.
The attached patch fixes it. What I can't quite fathom is how this has
escaped notice until now. Am I doing something unusual by calling
glBufferSubData with large buffers?
Could someone with access to MESA repositories review and commit it
please?
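For reference, the arithmetic involved can be sketched as standalone C. This is only a model of the pitch/height split for a large linear copy, with hypothetical names, and it does not call into Mesa. Note that in C the unparenthesised form `1 << 15 - 4` parses as `1 << 11`, so the parentheses in `(1 << 15) - 4` matter.

```c
#include <stdint.h>

#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical result type: one pitch x height blit plus leftover bytes. */
struct blit_split {
    uint32_t pitch;     /* bytes per row of the 2D blit */
    uint32_t height;    /* number of full rows */
    uint32_t remainder; /* bytes not covered by the full rows */
};

/*
 * Split a linear copy of `size` bytes into rows of at most `max_pitch`
 * bytes. Returns all zeroes if size or max_pitch is zero.
 */
static struct blit_split split_linear_copy(uint32_t size, uint32_t max_pitch)
{
    struct blit_split s = {0, 0, 0};

    if (size == 0 || max_pitch == 0)
        return s;

    s.pitch = MIN2(size, max_pitch);
    s.height = size / s.pitch;
    s.remainder = size - s.pitch * s.height;
    return s;
}
```

For a copy larger than the limit, a cap of `(1 << 15) - 1` gives a pitch of 32767 bytes, which is not a multiple of 4, while `(1 << 15) - 4` gives 32764, which is. The sketch only models that large-copy case and says nothing about how the remainder is blitted.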
[-- Attachment #2: 0001-intel-Fix-emit_linear_blit-to-use-DWORD-aligned-widt.patch --]
[-- Type: text/x-patch, Size: 1479 bytes --]
From 055cd27dbdf225c084f4bc2254763dba016edd50 Mon Sep 17 00:00:00 2001
From: Peter Clifton <pcjc2@cam.ac.uk>
Date: Fri, 5 Nov 2010 14:26:24 +0000
Subject: [PATCH] intel: Fix emit_linear_blit to use DWORD aligned width blits
The width of the 2D blits used to copy the data is defined as a 16-bit
signed integer, but the pitch must be DWORD aligned. Limit to an integral
number of DWORDs, ((1 << 15) - 4) rather than ((1 << 15) - 1).
Fixes corruption to data uploaded with glBufferSubData.
Signed-off-by: Peter Clifton <pcjc2@cam.ac.uk>
---
src/mesa/drivers/dri/intel/intel_blit.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/intel/intel_blit.c b/src/mesa/drivers/dri/intel/intel_blit.c
index a74e217..4dc8888 100644
--- a/src/mesa/drivers/dri/intel/intel_blit.c
+++ b/src/mesa/drivers/dri/intel/intel_blit.c
@@ -483,8 +483,11 @@ intel_emit_linear_blit(struct intel_context *intel,
/* Blits are in a different ringbuffer so we don't use them. */
assert(intel->gen < 6);
- /* The pitch is a signed value. */
- pitch = MIN2(size, (1 << 15) - 1);
+ /* The pitch reaches the GPU as a signed 16-bit value and must be DWORD
+ * aligned, and we want width to match pitch. Max width is (1 << 15) - 1;
+ * rounding that down to a whole number of DWORDs gives (1 << 15) - 4.
+ */
+ pitch = MIN2(size, (1 << 15) - 4);
height = size / pitch;
ok = intelEmitCopyBlit(intel, 1,
pitch, src_bo, src_offset, I915_TILING_NONE,
--
1.7.1