* [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" @ 2011-06-14 23:43 Eric Anholt 2011-06-15 9:39 ` Chris Wilson 0 siblings, 1 reply; 17+ messages in thread From: Eric Anholt @ 2011-06-14 23:43 UTC (permalink / raw) To: intel-gfx This reverts commit 4a684a4117abd756291969336af454e8a958802f. Userland has always been required to set the object's domain to GTT before using it through a GTT mapping, it's not something that the kernel is supposed to enforce. (The pagefault support is so that we can handle multiple mappings without userland having to pin across them, not so that userland can use GTT after GPU domains without telling the kernel). Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl firefox-talos-gfx on my T420 latop. --- drivers/gpu/drm/i915/i915_gem.c | 10 ++++------ drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 ---- 2 files changed, 4 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index d231729..7bbd355 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1218,11 +1218,11 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf) ret = i915_gem_object_bind_to_gtt(obj, 0, true); if (ret) goto unlock; - } - ret = i915_gem_object_set_to_gtt_domain(obj, write); - if (ret) - goto unlock; + ret = i915_gem_object_set_to_gtt_domain(obj, write); + if (ret) + goto unlock; + } if (obj->tiling_mode == I915_TILING_NONE) ret = i915_gem_object_put_fence(obj); @@ -2953,8 +2953,6 @@ i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj) */ wmb(); - i915_gem_release_mmap(obj); - old_write_domain = obj->base.write_domain; obj->base.write_domain = 0; diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 20a4cc5..4934cf8 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -187,10 +187,6 @@ i915_gem_object_set_to_gpu_domain(struct drm_i915_gem_object *obj, if ((flush_domains | invalidate_domains) & I915_GEM_DOMAIN_CPU) i915_gem_clflush_object(obj); - /* blow away mappings if mapped through GTT */ - if ((flush_domains | invalidate_domains) & I915_GEM_DOMAIN_GTT) - i915_gem_release_mmap(obj); - if (obj->base.pending_write_domain) cd->flips |= atomic_read(&obj->pending_flip); -- 1.7.5.3 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" 2011-06-14 23:43 [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" Eric Anholt @ 2011-06-15 9:39 ` Chris Wilson 2011-06-15 15:08 ` Eric Anholt 0 siblings, 1 reply; 17+ messages in thread From: Chris Wilson @ 2011-06-15 9:39 UTC (permalink / raw) To: Eric Anholt, intel-gfx On Tue, 14 Jun 2011 16:43:09 -0700, Eric Anholt <eric@anholt.net> wrote: > This reverts commit 4a684a4117abd756291969336af454e8a958802f. > Userland has always been required to set the object's domain to GTT > before using it through a GTT mapping, it's not something that the > kernel is supposed to enforce. (The pagefault support is so that we > can handle multiple mappings without userland having to pin across > them, not so that userland can use GTT after GPU domains without > telling the kernel). The only place that can do accurate per-page tracking of domains is the kernel. (And avoid clflushes for GTT eviction. Ok, so we're not quite there yet.) Relying on userspace to get it right, and for co-operating processes to coordinate their access to buffers is a misfeature. > Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl > firefox-talos-gfx on my T420 latop. Since GTT mappings are fundamentally slow, don't you think this suggests that mesa is doing something wrong? Or a bug in cairo-gl to make mesa act this way? For the record, on my SugarBay, this patch on top of drm-intel-fixes and mesa.git + your instruction streaming, regresses cairo-xlib 4.5s to 4.7s (4%, with a claim of .1% stddev) and improves cairo-gl from 21.5s to 20.1s (6.5%, with a claim of .5% stddev) for the firefox-talos-gfx. [This is a system that can do under 2s for cairo-xlib using vmap and the perf patches I posted before.] -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" 2011-06-15 9:39 ` Chris Wilson @ 2011-06-15 15:08 ` Eric Anholt 2011-06-15 16:03 ` Chris Wilson 0 siblings, 1 reply; 17+ messages in thread From: Eric Anholt @ 2011-06-15 15:08 UTC (permalink / raw) To: Chris Wilson, intel-gfx [-- Attachment #1.1: Type: text/plain, Size: 2358 bytes --] On Wed, 15 Jun 2011 10:39:06 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > On Tue, 14 Jun 2011 16:43:09 -0700, Eric Anholt <eric@anholt.net> wrote: > > This reverts commit 4a684a4117abd756291969336af454e8a958802f. > > Userland has always been required to set the object's domain to GTT > > before using it through a GTT mapping, it's not something that the > > kernel is supposed to enforce. (The pagefault support is so that we > > can handle multiple mappings without userland having to pin across > > them, not so that userland can use GTT after GPU domains without > > telling the kernel). > > The only place that can do accurate per-page tracking of domains is the > kernel. (And avoid clflushes for GTT eviction. Ok, so we're not quite > there yet.) Relying on userspace to get it right, and for co-operating > processes to coordinate their access to buffers is a misfeature. We rely on userspace to get it right all the time, which was a design decision made early on for performance reasons. For example, userspace can submit the wrong domains with an exec call, and non-GTT mappings also behave like I'm changing GTT back to. Unless you quietly changed that, too? > > Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl > > firefox-talos-gfx on my T420 latop. > > Since GTT mappings are fundamentally slow, don't you think this suggests > that mesa is doing something wrong? Or a bug in cairo-gl to make mesa act > this way? Page table changes are expensive, yes. That's why we cache BOs in userspace and hang onto the mappings of them. We only moved to GTT mappings in Mesa in places where it was faster than the alternatives (or for tiled buffers, where due to the a17 swizzling there is no other option). > For the record, on my SugarBay, this patch on top of drm-intel-fixes and > mesa.git + your instruction streaming, regresses cairo-xlib 4.5s to 4.7s > (4%, with a claim of .1% stddev) and improves cairo-gl from 21.5s to 20.1s > (6.5%, with a claim of .5% stddev) for the firefox-talos-gfx. [This is a > system that can do under 2s for cairo-xlib using vmap and the perf > patches I posted before.] > -Chris I was testing with semaphores enabled on the LLC series. I'm curious how this revert could possibly regress cairo-xlib, though -- that seems very strange. [-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --] [-- Attachment #2: Type: text/plain, Size: 159 bytes --] _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" 2011-06-15 15:08 ` Eric Anholt @ 2011-06-15 16:03 ` Chris Wilson 2011-06-17 19:06 ` Eric Anholt 0 siblings, 1 reply; 17+ messages in thread From: Chris Wilson @ 2011-06-15 16:03 UTC (permalink / raw) To: Eric Anholt, intel-gfx On Wed, 15 Jun 2011 08:08:59 -0700, Eric Anholt <eric@anholt.net> wrote: > On Wed, 15 Jun 2011 10:39:06 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > For the record, on my SugarBay, this patch on top of drm-intel-fixes and > > mesa.git + your instruction streaming, regresses cairo-xlib 4.5s to 4.7s > > (4%, with a claim of .1% stddev) and improves cairo-gl from 21.5s to 20.1s > > (6.5%, with a claim of .5% stddev) for the firefox-talos-gfx. [This is a > > system that can do under 2s for cairo-xlib using vmap and the perf > > patches I posted before.] > > -Chris > > I was testing with semaphores enabled on the LLC series. I'm curious > how this revert could possibly regress cairo-xlib, though -- that seems > very strange. Moving back to LLC+semaphores (2.6.39-rc2+), firefox-talos-gfx: xlib: 4.473 gl: 20.753 applying the patch: xlib: 4.472 gl: 20.824 I'm just not reproducing the same issue you are seeing. Are you using a standard distro Kconfig, or if not, can you send me yours? -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" 2011-06-15 16:03 ` Chris Wilson @ 2011-06-17 19:06 ` Eric Anholt 2011-06-18 11:43 ` Chris Wilson 0 siblings, 1 reply; 17+ messages in thread From: Eric Anholt @ 2011-06-17 19:06 UTC (permalink / raw) To: Chris Wilson, intel-gfx [-- Attachment #1.1: Type: text/plain, Size: 2614 bytes --] On Wed, 15 Jun 2011 17:03:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > On Wed, 15 Jun 2011 08:08:59 -0700, Eric Anholt <eric@anholt.net> wrote: > > On Wed, 15 Jun 2011 10:39:06 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > > For the record, on my SugarBay, this patch on top of drm-intel-fixes and > > > mesa.git + your instruction streaming, regresses cairo-xlib 4.5s to 4.7s > > > (4%, with a claim of .1% stddev) and improves cairo-gl from 21.5s to 20.1s > > > (6.5%, with a claim of .5% stddev) for the firefox-talos-gfx. [This is a > > > system that can do under 2s for cairo-xlib using vmap and the perf > > > patches I posted before.] > > > -Chris > > > > I was testing with semaphores enabled on the LLC series. I'm curious > > how this revert could possibly regress cairo-xlib, though -- that seems > > very strange. > > Moving back to LLC+semaphores (2.6.39-rc2+), firefox-talos-gfx: > xlib: 4.473 > gl: 20.753 > > applying the patch: > xlib: 4.472 > gl: 20.824 > > I'm just not reproducing the same issue you are seeing. Are you using a > standard distro Kconfig, or if not, can you send me yours? > -Chris http://people.freedesktop.org/~anholt/dotconfig Pushed my kernel tree to "gtt-revert" branch. Fresh set of numbers on two fresh boots, just about as identical testing environments as I could produce: (without revert) anholt@eliezer:src/cairo/perf% cat /home/anholt/cairogl-before 21791099815 21677090234 22060471069 22049244399 21721362892 21966143347 (with revert) anholt@eliezer:src/cairo/perf% cat /home/anholt/cairogl-after 19162606198 19127697366 19462253708 19172535715 19339999910 19221539215 anholt@eliezer:src/cairo/perf% ministat ~/cairogl-before ~/cairogl-after x /home/anholt/cairogl-before + /home/anholt/cairogl-after +------------------------------------------------------------------------------+ | + x| |++ + + + xx x x x| ||__A___| |___A__M_|| +------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 6 2.167709e+10 2.2060471e+10 2.1966143e+10 2.1877569e+10 1.6902073e+08 + 6 1.9127697e+10 1.9462254e+10 1.9221539e+10 1.9247772e+10 1.2847426e+08 Difference at 95.0% confidence -2.6298e+09 +/- 1.93108e+08 -12.0205% +/- 0.882677% (Student's t, pooled s = 1.50123e+08) [-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --] [-- Attachment #2: Type: text/plain, Size: 159 bytes --] _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" 2011-06-17 19:06 ` Eric Anholt @ 2011-06-18 11:43 ` Chris Wilson 2011-06-18 20:20 ` Eric Anholt 0 siblings, 1 reply; 17+ messages in thread From: Chris Wilson @ 2011-06-18 11:43 UTC (permalink / raw) To: Eric Anholt, intel-gfx On Fri, 17 Jun 2011 12:06:54 -0700, Eric Anholt <eric@anholt.net> wrote: > On Wed, 15 Jun 2011 17:03:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > Moving back to LLC+semaphores (2.6.39-rc2+), firefox-talos-gfx: > > xlib: 4.473 > > gl: 20.753 > > > > applying the patch: > > xlib: 4.472 > > gl: 20.824 > > > > I'm just not reproducing the same issue you are seeing. Are you using a > > standard distro Kconfig, or if not, can you send me yours? > > -Chris > > http://people.freedesktop.org/~anholt/dotconfig > > Pushed my kernel tree to "gtt-revert" branch. Thanks, I'm testing with those on my SNB desktop, still only see around a 5% hit for firefox-talos-gfx. I still think this is optimising for bad behaviour of the client, and papering over two slow linear lookups in the kernel. [One for find_vma which is exacerbated by the cached vma on the bo and the other checking whether the GTT aperture is RAM.] Looking at what cairo-gl is doing, the root cause of why we are repeatedly rewriting textures is: commit 3b1c0a4bd66660780095e6016e3db451f34503a3 Author: Benjamin Otte <otte@redhat.com> Date: Fri May 14 15:56:17 2010 +0200 fallback: Remove span renderer paths Those paths were broken, as they didn't properly translate the polygon to the destination size. And rather than adding lots of code that allows translation, it's easier to just delete this code. Note that the only user of the code was the GL backend anyway. i.e. cairo-gl is no longer using spans for firefox-talos-gfx where unaligned clipping dominates. -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" 2011-06-18 11:43 ` Chris Wilson @ 2011-06-18 20:20 ` Eric Anholt 2011-06-19 16:28 ` Chris Wilson 0 siblings, 1 reply; 17+ messages in thread From: Eric Anholt @ 2011-06-18 20:20 UTC (permalink / raw) To: Chris Wilson, intel-gfx [-- Attachment #1.1: Type: text/plain, Size: 1345 bytes --] On Sat, 18 Jun 2011 12:43:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > On Fri, 17 Jun 2011 12:06:54 -0700, Eric Anholt <eric@anholt.net> wrote: > > On Wed, 15 Jun 2011 17:03:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > > Moving back to LLC+semaphores (2.6.39-rc2+), firefox-talos-gfx: > > > xlib: 4.473 > > > gl: 20.753 > > > > > > applying the patch: > > > xlib: 4.472 > > > gl: 20.824 > > > > > > I'm just not reproducing the same issue you are seeing. Are you using a > > > standard distro Kconfig, or if not, can you send me yours? > > > -Chris > > > > http://people.freedesktop.org/~anholt/dotconfig > > > > Pushed my kernel tree to "gtt-revert" branch. > > Thanks, I'm testing with those on my SNB desktop, still only see around > a 5% hit for firefox-talos-gfx. GTT mappings are important. They're how textures are uploaded in GL in general. One of the longstanding things I've wanted to do for GL applications that repeatedly allocate new texture images is to userland BO cache those objects, which we don't do because of tiling. With the patch I'm reverting in place, there's basically no reason to do so because remapping the BO is much of the cost of recreating the BO from scratch. Remapping is the reason we do userland caching instead of kernel-side caching. [-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --] [-- Attachment #2: Type: text/plain, Size: 159 bytes --] _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" 2011-06-18 20:20 ` Eric Anholt @ 2011-06-19 16:28 ` Chris Wilson 2011-06-19 17:01 ` Eric Anholt 0 siblings, 1 reply; 17+ messages in thread From: Chris Wilson @ 2011-06-19 16:28 UTC (permalink / raw) To: Eric Anholt, intel-gfx On Sat, 18 Jun 2011 13:20:05 -0700, Eric Anholt <eric@anholt.net> wrote: > On Sat, 18 Jun 2011 12:43:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > On Fri, 17 Jun 2011 12:06:54 -0700, Eric Anholt <eric@anholt.net> wrote: > > > On Wed, 15 Jun 2011 17:03:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > > > Moving back to LLC+semaphores (2.6.39-rc2+), firefox-talos-gfx: > > > > xlib: 4.473 > > > > gl: 20.753 > > > > > > > > applying the patch: > > > > xlib: 4.472 > > > > gl: 20.824 > > > > > > > > I'm just not reproducing the same issue you are seeing. Are you using a > > > > standard distro Kconfig, or if not, can you send me yours? > > > > -Chris > > > > > > http://people.freedesktop.org/~anholt/dotconfig > > > > > > Pushed my kernel tree to "gtt-revert" branch. > > > > Thanks, I'm testing with those on my SNB desktop, still only see around > > a 5% hit for firefox-talos-gfx. > > GTT mappings are important. They're how textures are uploaded in GL in > general. For lack of a better mechanism. Even using anholt/gtt-revert, I question the value of caching the GTT mapping in drm_intel_bo. For the cairo-gl and pts benchmarks I've run, the efficacy of the cached vma is very small and there is a very slight improvement by unmapping the vma after use. (The difference is so small, that it will take a lot more runs to determine if it is statistically significant.) > One of the longstanding things I've wanted to do for GL > applications that repeatedly allocate new texture images is to userland > BO cache those objects, which we don't do because of tiling. I'm failing to see the difference between this cache and a slightly smarter (i.e. one that preferentially returns an object with appropriate tiling and busy status) libdrm_intel cache. > With the > patch I'm reverting in place, there's basically no reason to do so > because remapping the BO is much of the cost of recreating the BO from > scratch. Remapping is the reason we do userland caching instead of > kernel-side caching. Conversely, it is the lack of such a kernel bo and page cache that is determinental to the ddx (at least on pre-SNB). And for any system running more than one renderer. [Except for the thorny issue of whether having to clear the pages returned from such a cache will nullify its benefits. I have yet to measure the impact of doing so.] -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" 2011-06-19 16:28 ` Chris Wilson @ 2011-06-19 17:01 ` Eric Anholt 2011-06-19 17:14 ` Chris Wilson 0 siblings, 1 reply; 17+ messages in thread From: Eric Anholt @ 2011-06-19 17:01 UTC (permalink / raw) To: Chris Wilson, intel-gfx [-- Attachment #1.1: Type: text/plain, Size: 2294 bytes --] On Sun, 19 Jun 2011 17:28:11 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > On Sat, 18 Jun 2011 13:20:05 -0700, Eric Anholt <eric@anholt.net> wrote: > > On Sat, 18 Jun 2011 12:43:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > > On Fri, 17 Jun 2011 12:06:54 -0700, Eric Anholt <eric@anholt.net> wrote: > > > > On Wed, 15 Jun 2011 17:03:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > > > > Moving back to LLC+semaphores (2.6.39-rc2+), firefox-talos-gfx: > > > > > xlib: 4.473 > > > > > gl: 20.753 > > > > > > > > > > applying the patch: > > > > > xlib: 4.472 > > > > > gl: 20.824 > > > > > > > > > > I'm just not reproducing the same issue you are seeing. Are you using a > > > > > standard distro Kconfig, or if not, can you send me yours? > > > > > -Chris > > > > > > > > http://people.freedesktop.org/~anholt/dotconfig > > > > > > > > Pushed my kernel tree to "gtt-revert" branch. > > > > > > Thanks, I'm testing with those on my SNB desktop, still only see around > > > a 5% hit for firefox-talos-gfx. > > > > GTT mappings are important. They're how textures are uploaded in GL in > > general. > > For lack of a better mechanism. Even using anholt/gtt-revert, I question > the value of caching the GTT mapping in drm_intel_bo. For the cairo-gl and > pts benchmarks I've run, the efficacy of the cached vma is very small and > there is a very slight improvement by unmapping the vma after use. (The > difference is so small, that it will take a lot more runs to determine if > it is statistically significant.) I'm confused. You've measured a 5% impact from removing this part of the caching, and I've measured 12-19%, so what are you planning that doesn't involve caching the mapping that's faster than caching the mapping? > > One of the longstanding things I've wanted to do for GL > > applications that repeatedly allocate new texture images is to userland > > BO cache those objects, which we don't do because of tiling. > > I'm failing to see the difference between this cache and a slightly > smarter (i.e. one that preferentially returns an object with appropriate > tiling and busy status) libdrm_intel cache. I was talking about it being in libdrm_intel, sorry if that was unclear. [-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --] [-- Attachment #2: Type: text/plain, Size: 159 bytes --] _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" 2011-06-19 17:01 ` Eric Anholt @ 2011-06-19 17:14 ` Chris Wilson 2011-06-19 21:20 ` Eric Anholt 0 siblings, 1 reply; 17+ messages in thread From: Chris Wilson @ 2011-06-19 17:14 UTC (permalink / raw) To: Eric Anholt, intel-gfx On Sun, 19 Jun 2011 10:01:23 -0700, Eric Anholt <eric@anholt.net> wrote: > On Sun, 19 Jun 2011 17:28:11 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > For lack of a better mechanism. Even using anholt/gtt-revert, I question > > the value of caching the GTT mapping in drm_intel_bo. For the cairo-gl and > > pts benchmarks I've run, the efficacy of the cached vma is very small and > > there is a very slight improvement by unmapping the vma after use. (The > > difference is so small, that it will take a lot more runs to determine if > > it is statistically significant.) > > I'm confused. You've measured a 5% impact from removing this part of > the caching, and I've measured 12-19%, so what are you planning that > doesn't involve caching the mapping that's faster than caching the > mapping? As I pointed out, cairo-gl is using fallback code and not spans. Once that is corrected, cairo-gl no longer continually recreating textures and so unaffected by the patch. Removing the cached vma is then arguably beneficial, though the difference looks to be in the noise. -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" 2011-06-19 17:14 ` Chris Wilson @ 2011-06-19 21:20 ` Eric Anholt 2011-06-19 21:49 ` Chris Wilson 0 siblings, 1 reply; 17+ messages in thread From: Eric Anholt @ 2011-06-19 21:20 UTC (permalink / raw) To: Chris Wilson, intel-gfx [-- Attachment #1.1: Type: text/plain, Size: 1510 bytes --] On Sun, 19 Jun 2011 18:14:11 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > On Sun, 19 Jun 2011 10:01:23 -0700, Eric Anholt <eric@anholt.net> wrote: > > On Sun, 19 Jun 2011 17:28:11 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > > For lack of a better mechanism. Even using anholt/gtt-revert, I question > > > the value of caching the GTT mapping in drm_intel_bo. For the cairo-gl and > > > pts benchmarks I've run, the efficacy of the cached vma is very small and > > > there is a very slight improvement by unmapping the vma after use. (The > > > difference is so small, that it will take a lot more runs to determine if > > > it is statistically significant.) > > > > I'm confused. You've measured a 5% impact from removing this part of > > the caching, and I've measured 12-19%, so what are you planning that > > doesn't involve caching the mapping that's faster than caching the > > mapping? > > As I pointed out, cairo-gl is using fallback code and not spans. Once that > is corrected, cairo-gl no longer continually recreating textures and so > unaffected by the patch. Removing the cached vma is then arguably > beneficial, though the difference looks to be in the noise. So you don't actually want us to fix the performance regression in texture uploads, and instead want to just tell applications not to upload textures? What about applications where we're not the authors, and where they don't have a choice but to upload textures (media players)? [-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --] [-- Attachment #2: Type: text/plain, Size: 159 bytes --] _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" 2011-06-19 21:20 ` Eric Anholt @ 2011-06-19 21:49 ` Chris Wilson 2011-06-21 16:17 ` Remove the overhead of vm_insert_pfn() Chris Wilson 2011-06-21 18:09 ` [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" Keith Packard 0 siblings, 2 replies; 17+ messages in thread From: Chris Wilson @ 2011-06-19 21:49 UTC (permalink / raw) To: Eric Anholt, intel-gfx On Sun, 19 Jun 2011 14:20:14 -0700, Eric Anholt <eric@anholt.net> wrote: > On Sun, 19 Jun 2011 18:14:11 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > On Sun, 19 Jun 2011 10:01:23 -0700, Eric Anholt <eric@anholt.net> wrote: > > > On Sun, 19 Jun 2011 17:28:11 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > > > For lack of a better mechanism. Even using anholt/gtt-revert, I question > > > > the value of caching the GTT mapping in drm_intel_bo. For the cairo-gl and > > > > pts benchmarks I've run, the efficacy of the cached vma is very small and > > > > there is a very slight improvement by unmapping the vma after use. (The > > > > difference is so small, that it will take a lot more runs to determine if > > > > it is statistically significant.) > > > > > > I'm confused. You've measured a 5% impact from removing this part of > > > the caching, and I've measured 12-19%, so what are you planning that > > > doesn't involve caching the mapping that's faster than caching the > > > mapping? > > > > As I pointed out, cairo-gl is using fallback code and not spans. Once that > > is corrected, cairo-gl no longer continually recreating textures and so > > unaffected by the patch. Removing the cached vma is then arguably > > beneficial, though the difference looks to be in the noise. > > So you don't actually want us to fix the performance regression in > texture uploads, and instead want to just tell applications not to > upload textures? Nowhere, did I say that. I just said you were optimising for broken code and papering over bugs and lack of functionality elsewhere. The patch closes a race condition. The essence of your complaint is that the kernel is not as fast as we need it to be, and that the initial upload to any object is slower than expected. I presume you also have a plan for fixing the underwhelming performance of a glibc memcpy through an established GTT mapping? > What about applications where we're not the authors, and where they > don't have a choice but to upload textures (media players)? No, I'm not necessarily saying application code is broken, just the library. -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 17+ messages in thread
* Remove the overhead of vm_insert_pfn() 2011-06-19 21:49 ` Chris Wilson @ 2011-06-21 16:17 ` Chris Wilson 2011-06-21 16:17 ` [PATCH 1/3] resource: Use a common string for .name = "System RAM" Chris Wilson ` (2 more replies) 2011-06-21 18:09 ` [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" Keith Packard 1 sibling, 3 replies; 17+ messages in thread From: Chris Wilson @ 2011-06-21 16:17 UTC (permalink / raw) To: intel-gfx Poking around in x86,pat I found one way of avoiding the linear walk for pat_pagerange_is_ram() which appears to nullify the regression. I would appreciate confirmation and some review before I go poking dragons. -Chris ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/3] resource: Use a common string for .name = "System RAM" 2011-06-21 16:17 ` Remove the overhead of vm_insert_pfn() Chris Wilson @ 2011-06-21 16:17 ` Chris Wilson 2011-06-21 16:17 ` [PATCH 2/3] x86, pat: Perform interval rbtree lookup of memtype first Chris Wilson 2011-06-21 16:17 ` [PATCH 3/3] x86, pat: Convert memtype_lock spinlock to a rwlock Chris Wilson 2 siblings, 0 replies; 17+ messages in thread From: Chris Wilson @ 2011-06-21 16:17 UTC (permalink / raw) To: intel-gfx On a PAT system, inserting a pfn is quite expensive due to the query of the protections for the page based on its physical location. A large part of this overhead is due to the linear walk of resource regions and the strcmp to find "System RAM". By restricting memory to use a global string for its resource name, we can reduce this strcmp to a direct pointer comparison and so eliminate the largest single overhead for vma_insert_pfn(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- arch/arm/kernel/setup.c | 8 ++++---- arch/arm/plat-samsung/pm-check.c | 2 +- arch/avr32/kernel/setup.c | 2 +- arch/ia64/kernel/efi.c | 8 ++++---- arch/mips/kernel/setup.c | 4 ++-- arch/parisc/mm/init.c | 2 +- arch/s390/kernel/setup.c | 6 +++--- arch/score/kernel/setup.c | 2 +- arch/sh/kernel/setup.c | 2 +- arch/sparc/kernel/pci_common.c | 2 +- arch/tile/kernel/setup.c | 2 +- arch/unicore32/kernel/setup.c | 2 +- arch/x86/kernel/e820.c | 4 ++-- arch/x86/kernel/probe_roms.c | 2 +- include/linux/ioport.h | 5 +++++ kernel/resource.c | 15 +++++++++++---- mm/memory_hotplug.c | 4 ++-- 17 files changed, 42 insertions(+), 30 deletions(-) diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index ed11fb0..e525baa 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -166,19 +166,19 @@ static struct resource mem_res[] = { static struct resource io_res[] = { { - .name = "reserved", + .name = resource_name__reserved, .start = 0x3bc, .end = 0x3be, .flags = IORESOURCE_IO | IORESOURCE_BUSY }, { - .name = "reserved", + .name = resource_name__reserved, .start = 0x378, .end = 0x37f, .flags = IORESOURCE_IO | IORESOURCE_BUSY }, { - .name = "reserved", + .name = resource_name__reserved, .start = 0x278, .end = 0x27f, .flags = IORESOURCE_IO | IORESOURCE_BUSY @@ -543,7 +543,7 @@ static void __init request_standard_resources(struct machine_desc *mdesc) for_each_memblock(memory, region) { res = alloc_bootmem_low(sizeof(*res)); - res->name = "System RAM"; + res->name = resource_name__system_ram; res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region)); res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1; res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; diff --git a/arch/arm/plat-samsung/pm-check.c b/arch/arm/plat-samsung/pm-check.c index 6b733fa..d402409 100644 --- a/arch/arm/plat-samsung/pm-check.c +++ b/arch/arm/plat-samsung/pm-check.c @@ -54,7 +54,7 @@ static void s3c_pm_run_res(struct resource *ptr, run_fn_t fn, u32 *arg) s3c_pm_run_res(ptr->child, fn, arg); if ((ptr->flags & IORESOURCE_MEM) && - strcmp(ptr->name, "System RAM") == 0) { + ptr->name == resource_name__system_ram) { S3C_PMDBG("Found system RAM at %08lx..%08lx\n", (unsigned long)ptr->start, (unsigned long)ptr->end); diff --git a/arch/avr32/kernel/setup.c b/arch/avr32/kernel/setup.c index bb0974c..90ded04 100644 --- a/arch/avr32/kernel/setup.c +++ b/arch/avr32/kernel/setup.c @@ -133,7 +133,7 @@ add_physical_memory(resource_size_t start, resource_size_t end) new = &res_cache[res_cache_next_free++]; new->start = start; new->end = end; - new->name = "System RAM"; + new->name = resource_name__system_ram; new->flags = IORESOURCE_MEM; *pprev = new; diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c index 6fc03af..44d5116 100644 --- a/arch/ia64/kernel/efi.c +++ b/arch/ia64/kernel/efi.c @@ -1233,12 +1233,12 @@ efi_initialize_iomem_resources(struct resource *code_resource, case EFI_BOOT_SERVICES_CODE: case EFI_CONVENTIONAL_MEMORY: if (md->attribute & EFI_MEMORY_WP) { - name = "System ROM"; + name = resource_name__system_rom; flags |= IORESOURCE_READONLY; } else if (md->attribute == EFI_MEMORY_UC) name = "Uncached RAM"; else - name = "System RAM"; + name = resource_name__system_ram; break; case EFI_ACPI_MEMORY_NVS: @@ -1246,7 +1246,7 @@ efi_initialize_iomem_resources(struct resource *code_resource, break; case EFI_UNUSABLE_MEMORY: - name = "reserved"; + name = resource_name__reserved; flags |= IORESOURCE_DISABLED; break; @@ -1255,7 +1255,7 @@ efi_initialize_iomem_resources(struct resource *code_resource, case EFI_RUNTIME_SERVICES_DATA: case EFI_ACPI_RECLAIM_MEMORY: default: - name = "reserved"; + name = resource_name__reserved; break; } diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c index 8ad1d56..713352d 100644 --- a/arch/mips/kernel/setup.c +++ b/arch/mips/kernel/setup.c @@ -524,11 +524,11 @@ static void __init resource_init(void) switch (boot_mem_map.map[i].type) { case BOOT_MEM_RAM: case BOOT_MEM_ROM_DATA: - res->name = "System RAM"; + res->name = resource_name__system_ram; break; case BOOT_MEM_RESERVED: default: - res->name = "reserved"; + res->name = resource_name__reserved; } res->start = start; diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c index 82f364e..4915ca8 100644 --- a/arch/parisc/mm/init.c +++ b/arch/parisc/mm/init.c @@ -183,7 +183,7 @@ static void __init setup_bootmem(void) sysram_resource_count = npmem_ranges; for (i = 0; i < sysram_resource_count; i++) { struct resource *res = &sysram_resources[i]; - res->name = "System RAM"; + res->name = resource_name__system_ram; res->start = pmem_ranges[i].start_pfn << PAGE_SHIFT; res->end = res->start + (pmem_ranges[i].pages << PAGE_SHIFT)-1; res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c index 0c35dee..f892eb4 100644 --- a/arch/s390/kernel/setup.c +++ b/arch/s390/kernel/setup.c @@ -439,14 +439,14 @@ static void __init setup_resources(void) res->flags = IORESOURCE_BUSY | IORESOURCE_MEM; switch (memory_chunk[i].type) { case CHUNK_READ_WRITE: - res->name = "System RAM"; + res->name = resource_name__system_ram; break; case CHUNK_READ_ONLY: - res->name = "System ROM"; + res->name = resource_name__system_rom; res->flags |= IORESOURCE_READONLY; break; default: - res->name = "reserved"; + res->name = resource_name__reserved; } res->start = memory_chunk[i].addr; res->end = res->start + memory_chunk[i].size - 1; diff --git a/arch/score/kernel/setup.c b/arch/score/kernel/setup.c index 6f898c0..5feab05 100644 --- a/arch/score/kernel/setup.c +++ b/arch/score/kernel/setup.c @@ -96,7 +96,7 @@ static void __init resource_init(void) data_resource.end = __pa(&_edata) - 1; res = alloc_bootmem(sizeof(struct resource)); - res->name = "System RAM"; + res->name = resource_name__system_ram; res->start = MEMORY_START; res->end = MEMORY_START + MEMORY_SIZE - 1; res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; diff --git a/arch/sh/kernel/setup.c b/arch/sh/kernel/setup.c index 58bff45..4d8edae 100644 --- a/arch/sh/kernel/setup.c +++ b/arch/sh/kernel/setup.c @@ -199,7 +199,7 @@ void __init __add_active_range(unsigned int nid, unsigned long start_pfn, start = start_pfn << PAGE_SHIFT; end = end_pfn << PAGE_SHIFT; - res->name = "System RAM"; + res->name = resource_name__system_ram; res->start = start; res->end = end - 1; res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; diff --git a/arch/sparc/kernel/pci_common.c b/arch/sparc/kernel/pci_common.c index 6e3874b..e7cf5ea 100644 --- a/arch/sparc/kernel/pci_common.c +++ b/arch/sparc/kernel/pci_common.c @@ -349,7 +349,7 @@ static void pci_register_legacy_regions(struct resource *io_res, if (!p) return; - p->name = "System ROM"; + p->name = resource_name__system_rom; p->start = mem_res->start + 0xf0000UL; p->end = p->start + 0xffffUL; p->flags = IORESOURCE_BUSY; diff --git a/arch/tile/kernel/setup.c b/arch/tile/kernel/setup.c index 6cdc9ba..901d908 100644 --- a/arch/tile/kernel/setup.c +++ b/arch/tile/kernel/setup.c @@ -1466,7 +1466,7 @@ insert_ram_resource(u64 start_pfn, u64 end_pfn) { struct resource *res = kzalloc(sizeof(struct resource), GFP_ATOMIC); - res->name = "System RAM"; + res->name = resource_name__system_ram; res->start = start_pfn << PAGE_SHIFT; res->end = (end_pfn << PAGE_SHIFT) - 1; res->flags = IORESOURCE_BUSY | IORESOURCE_MEM; diff --git a/arch/unicore32/kernel/setup.c b/arch/unicore32/kernel/setup.c index 471b6bc..2522832 100644 --- a/arch/unicore32/kernel/setup.c +++ b/arch/unicore32/kernel/setup.c @@ -203,7 +203,7 @@ request_standard_resources(struct meminfo *mi) continue; res = alloc_bootmem_low(sizeof(*res)); - res->name = "System RAM"; + res->name = resource_name__system_ram; res->start = mi->bank[i].start; res->end = mi->bank[i].start + mi->bank[i].size - 1; res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index 3e2ef84..8642272 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -925,11 +925,11 @@ static inline const char *e820_type_to_string(int e820_type) { switch (e820_type) { case E820_RESERVED_KERN: - case E820_RAM: return "System RAM"; + case E820_RAM: return resource_name__system_ram; case E820_ACPI: return "ACPI Tables"; case E820_NVS: return "ACPI Non-volatile Storage"; case E820_UNUSABLE: return "Unusable memory"; - default: return "reserved"; + default: return resource_name__reserved; } } diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c index ba0a4cc..ce7f8d4 100644 --- a/arch/x86/kernel/probe_roms.c +++ b/arch/x86/kernel/probe_roms.c @@ -21,7 +21,7 @@ #include <asm/setup_arch.h> static struct resource system_rom_resource = { - .name = "System ROM", + .name = resource_name__system_rom, .start = 0xf0000, .end = 0xfffff, .flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM diff --git a/include/linux/ioport.h b/include/linux/ioport.h index e9bb22c..b6222ee 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -109,6 +109,11 @@ struct resource_list { /* PCI control bits. Shares IORESOURCE_BITS with above PCI ROM. */ #define IORESOURCE_PCI_FIXED (1<<4) /* Do not move resource */ +/* Some common resource names, must be used when appropriate. */ +extern const char resource_name__system_ram[]; +extern const char resource_name__system_rom[]; +extern const char resource_name__reserved[]; + /* PC/ISA/whatever - the normal PC address spaces: IO and memory */ extern struct resource ioport_resource; extern struct resource iomem_resource; diff --git a/kernel/resource.c b/kernel/resource.c index 798e2fa..8c29c2e 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -269,13 +269,20 @@ int release_resource(struct resource *old) EXPORT_SYMBOL(release_resource); +const char resource_name__system_ram[] = "System RAM"; +EXPORT_SYMBOL(resource_name__system_ram); +const char resource_name__system_rom[] = "System ROM"; +EXPORT_SYMBOL(resource_name__system_rom); +const char resource_name__reserved[] = "reserved"; +EXPORT_SYMBOL(resource_name__reserved); + #if !defined(CONFIG_ARCH_HAS_WALK_MEMORY) /* * Finds the lowest memory reosurce exists within [res->start.res->end) * the caller must specify res->start, res->end, res->flags and "name". * If found, returns 0, res is overwritten, if not found, returns -1. */ -static int find_next_system_ram(struct resource *res, char *name) +static int find_next_system_ram(struct resource *res, const char *name) { resource_size_t start, end; struct resource *p; @@ -291,7 +298,7 @@ static int find_next_system_ram(struct resource *res, char *name) /* system ram is just marked as IORESOURCE_MEM */ if (p->flags != res->flags) continue; - if (name && strcmp(p->name, name)) + if (name && p->name == name) continue; if (p->start > end) { p = NULL; @@ -329,7 +336,7 @@ int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages, res.flags = IORESOURCE_MEM | IORESOURCE_BUSY; orig_end = res.end; while ((res.start < res.end) && - (find_next_system_ram(&res, "System RAM") >= 0)) { + (find_next_system_ram(&res, resource_name__system_ram) >= 0)) { pfn = (res.start + PAGE_SIZE - 1) >> PAGE_SHIFT; end_pfn = (res.end + 1) >> PAGE_SHIFT; if (end_pfn > pfn) @@ -936,7 +943,7 @@ static int __init reserve_setup(char *str) break; if (x < MAXRESERVE) { struct resource *res = reserve + x; - res->name = "reserved"; + res->name = resource_name__reserved; res->start = io_start; res->end = io_start + io_num - 1; res->flags = IORESOURCE_BUSY; diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 9f64637..97aa891 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -58,7 +58,7 @@ static struct resource *register_memory_resource(u64 start, u64 size) res = kzalloc(sizeof(struct resource), GFP_KERNEL); BUG_ON(!res); - res->name = "System RAM"; + res->name = resource_name__system_ram; res->start = start; res->end = start + size - 1; res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; @@ -571,7 +571,7 @@ int __ref add_memory(int nid, u64 start, u64 size) } /* create new memmap entry */ - firmware_map_add_hotplug(start, start + size, "System RAM"); + firmware_map_add_hotplug(start, start + size, resource_name__system_ram); goto out; -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/3] x86, pat: Perform interval rbtree lookup of memtype first 2011-06-21 16:17 ` Remove the overhead of vm_insert_pfn() Chris Wilson 2011-06-21 16:17 ` [PATCH 1/3] resource: Use a common string for .name = "System RAM" Chris Wilson @ 2011-06-21 16:17 ` Chris Wilson 2011-06-21 16:17 ` [PATCH 3/3] x86, pat: Convert memtype_lock spinlock to a rwlock Chris Wilson 2 siblings, 0 replies; 17+ messages in thread From: Chris Wilson @ 2011-06-21 16:17 UTC (permalink / raw) To: intel-gfx The PAT interval tree is only defined for non-RAM ranges, and is both a shorter list and log-n lookup compared to the linear walk over the resource ranges looking for "System RAM". In the case of heavy vm_insert_pfn() users like the gpu drivers, which regularly modify the contents of the AGP aperture, this gives a significant reduction in the overhead of faulting in fresh addresses. However, note that in 1f9cc3cb6a27521ed (x86, pat: Update the page flags for memtype atomically instead of using memtype_lock), the contention upon the memtype_lock in lookup_memtype() was observed to be behind a factor of 50x reduction in page fault rate for 32 cpus running vm_insert_pfn(). By performing the locked lookup of memtype first, we are once again exposed to that contention for is_ram pages. Though in effect we will be just moving the contention from resource_lock (rwlock) to memtype_lock (spinlock) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> %%Cc: Robin Holt <holt@sgi.com> %%Cc: Suresh Siddha <suresh.b.siddha@intel.com> %%Cc: H. Peter Anvin <hpa@zytor.com> --- arch/x86/mm/pat.c | 24 +++++++++++++----------- 1 files changed, 13 insertions(+), 11 deletions(-) diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c index f6ff57b..18d4aa9 100644 --- a/arch/x86/mm/pat.c +++ b/arch/x86/mm/pat.c @@ -384,10 +384,21 @@ int free_memtype(u64 start, u64 end) */ static unsigned long lookup_memtype(u64 paddr) { - int rettype = _PAGE_CACHE_WB; + int rettype = -1; struct memtype *entry; if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE)) + return _PAGE_CACHE_WB; + + spin_lock(&memtype_lock); + + entry = rbt_memtype_lookup(paddr); + if (entry != NULL) + rettype = entry->type; + + spin_unlock(&memtype_lock); + + if (rettype != -1) return rettype; if (pat_pagerange_is_ram(paddr, paddr + PAGE_SIZE)) { @@ -404,16 +415,7 @@ static unsigned long lookup_memtype(u64 paddr) return rettype; } - spin_lock(&memtype_lock); - - entry = rbt_memtype_lookup(paddr); - if (entry != NULL) - rettype = entry->type; - else - rettype = _PAGE_CACHE_UC_MINUS; - - spin_unlock(&memtype_lock); - return rettype; + return _PAGE_CACHE_UC_MINUS; } /** -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 3/3] x86, pat: Convert memtype_lock spinlock to a rwlock 2011-06-21 16:17 ` Remove the overhead of vm_insert_pfn() Chris Wilson 2011-06-21 16:17 ` [PATCH 1/3] resource: Use a common string for .name = "System RAM" Chris Wilson 2011-06-21 16:17 ` [PATCH 2/3] x86, pat: Perform interval rbtree lookup of memtype first Chris Wilson @ 2011-06-21 16:17 ` Chris Wilson 2 siblings, 0 replies; 17+ messages in thread From: Chris Wilson @ 2011-06-21 16:17 UTC (permalink / raw) To: intel-gfx Presuming that we lookup the memtype of an address far more often than we modify the PAT ranges, favour the reader. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- arch/x86/mm/pat.c | 20 ++++++++++---------- 1 files changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c index 18d4aa9..8cc67e5 100644 --- a/arch/x86/mm/pat.c +++ b/arch/x86/mm/pat.c @@ -130,7 +130,7 @@ void pat_init(void) #undef PAT -static DEFINE_SPINLOCK(memtype_lock); /* protects memtype accesses */ +static DEFINE_RWLOCK(memtype_lock); /* protects memtype accesses */ /* * Does intersection of PAT memory type and MTRR memory type and returns @@ -310,7 +310,7 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type, new->end = end; new->type = actual_type; - spin_lock(&memtype_lock); + write_lock(&memtype_lock); err = rbt_memtype_check_insert(new, new_type); if (err) { @@ -318,12 +318,12 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type, "track %s, req %s\n", start, end, cattr_name(new->type), cattr_name(req_type)); kfree(new); - spin_unlock(&memtype_lock); + write_unlock(&memtype_lock); return err; } - spin_unlock(&memtype_lock); + write_unlock(&memtype_lock); dprintk("reserve_memtype added 0x%Lx-0x%Lx, track %s, req %s, ret %s\n", start, end, cattr_name(new->type), cattr_name(req_type), @@ -355,9 +355,9 @@ int free_memtype(u64 start, u64 end) return -EINVAL; } - spin_lock(&memtype_lock); + write_lock(&memtype_lock); entry = rbt_memtype_erase(start, end); - spin_unlock(&memtype_lock); + write_unlock(&memtype_lock); if (!entry) { printk(KERN_INFO "%s:%d freeing invalid memtype %Lx-%Lx\n", @@ -390,13 +390,13 @@ static unsigned long lookup_memtype(u64 paddr) if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE)) return _PAGE_CACHE_WB; - spin_lock(&memtype_lock); + read_lock(&memtype_lock); entry = rbt_memtype_lookup(paddr); if (entry != NULL) rettype = entry->type; - spin_unlock(&memtype_lock); + read_unlock(&memtype_lock); if (rettype != -1) return rettype; @@ -754,9 +754,9 @@ static struct memtype *memtype_get_idx(loff_t pos) if (!print_entry) return NULL; - spin_lock(&memtype_lock); + read_lock(&memtype_lock); ret = rbt_memtype_copy_nth_element(print_entry, pos); - spin_unlock(&memtype_lock); + read_unlock(&memtype_lock); if (!ret) { return print_entry; -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" 2011-06-19 21:49 ` Chris Wilson 2011-06-21 16:17 ` Remove the overhead of vm_insert_pfn() Chris Wilson @ 2011-06-21 18:09 ` Keith Packard 1 sibling, 0 replies; 17+ messages in thread From: Keith Packard @ 2011-06-21 18:09 UTC (permalink / raw) To: Chris Wilson, Eric Anholt, intel-gfx [-- Attachment #1.1: Type: text/plain, Size: 734 bytes --] On Sun, 19 Jun 2011 22:49:36 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > The patch closes a race condition. The essence of your complaint is that > the kernel is not as fast as we need it to be, and that the initial upload > to any object is slower than expected. I presume you also have a plan for > fixing the underwhelming performance of a glibc memcpy through an > established GTT mapping? I see a performance regression in a critical path for most applications (texture upload), and I'm not seeing any bug report that your fix closes. That seems pretty conclusive to me; I'll plan on pushing a revert of this patch to eliminate the regression from the previous version. -- keith.packard@intel.com [-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --] [-- Attachment #2: Type: text/plain, Size: 159 bytes --] _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2011-06-21 18:09 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-06-14 23:43 [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" Eric Anholt 2011-06-15 9:39 ` Chris Wilson 2011-06-15 15:08 ` Eric Anholt 2011-06-15 16:03 ` Chris Wilson 2011-06-17 19:06 ` Eric Anholt 2011-06-18 11:43 ` Chris Wilson 2011-06-18 20:20 ` Eric Anholt 2011-06-19 16:28 ` Chris Wilson 2011-06-19 17:01 ` Eric Anholt 2011-06-19 17:14 ` Chris Wilson 2011-06-19 21:20 ` Eric Anholt 2011-06-19 21:49 ` Chris Wilson 2011-06-21 16:17 ` Remove the overhead of vm_insert_pfn() Chris Wilson 2011-06-21 16:17 ` [PATCH 1/3] resource: Use a common string for .name = "System RAM" Chris Wilson 2011-06-21 16:17 ` [PATCH 2/3] x86, pat: Perform interval rbtree lookup of memtype first Chris Wilson 2011-06-21 16:17 ` [PATCH 3/3] x86, pat: Convert memtype_lock spinlock to a rwlock Chris Wilson 2011-06-21 18:09 ` [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" Keith Packard
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.