* [PATCH] [RFC] intel: Non-LLC based non-blocking maps.
@ 2012-06-19 3:38 Ben Widawsky
2012-06-19 8:22 ` Chris Wilson
0 siblings, 1 reply; 4+ messages in thread
From: Ben Widawsky @ 2012-06-19 3:38 UTC (permalink / raw)
To: DRI Devel; +Cc: Daniel Vetter, Ben Widawsky, Intel GFX
The history on this patch goes back quite a way. This time around, the
patch builds on top of the map_unsynchronized that Eric pushed. Eric's
patch attempted only to solve the problem for LLC machines. Unlike
my earlier versions of this patch (with the help from Daniel Vetter), we
do not attempt to cpu map objects in a unsynchronized manner.
The concept is fairly simple - once a buffer is moved into the GTT
domain, we can assume it remains there unless we tell it otherwise (via
cpu map). It therefore stands to reason that as long as we can keep the
object in the GTT domain, and don't ever count on reading back contents,
things might just work. I believe as long as we are doing GTT mappings
only, we get to avoid worry about clflushing the dirtied cachelines, but
that could use some fact checking.
The patch makes some assumptions about how the kernel does buffer
tracking, this could be conceived as an ABI dependency, but actually the
behavior is pretty confined. It exploits the fact the BOs are only moved
into the CPU domain under certain circumstances, and daintily dances
around those conditions. The main thing here is we assume MADV_WILLNEED
prevents the object from getting evicted.
I am not aware of a good way to test it's effectiveness
performance-wise; but it introduces no regressions with piglit on my
ILK, or SNB.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
intel/intel_bufmgr_gem.c | 46 ++++++++++++++++++++++++++++++++++------------
1 file changed, 34 insertions(+), 12 deletions(-)
diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index b776d2f..3872fbb 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -204,6 +204,11 @@ struct _drm_intel_bo_gem {
bool reusable;
/**
+ * Boolean of whether or not we think this buffer is in the GTT domain.
+ * */
+ bool in_gtt_domain;
+
+ /**
* Size in bytes of this buffer and its relocation descendents.
*
* Used to avoid costly tree walking in
@@ -1188,6 +1193,9 @@ static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
if (write_enable)
bo_gem->mapped_cpu_write = true;
+ if (!bufmgr_gem->has_llc)
+ bo_gem->in_gtt_domain = false;
+
drm_intel_gem_bo_mark_mmaps_incoherent(bo);
VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_virtual, bo->size));
pthread_mutex_unlock(&bufmgr_gem->lock);
@@ -1292,6 +1300,7 @@ int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo)
strerror(errno));
}
+ bo_gem->in_gtt_domain = true;
drm_intel_gem_bo_mark_mmaps_incoherent(bo);
VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->gtt_virtual, bo->size));
pthread_mutex_unlock(&bufmgr_gem->lock);
@@ -1311,27 +1320,39 @@ int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo)
* data that the GPU is busy using (or, more specifically, that if it
* does write over the data, it acknowledges that rendering is
* undefined).
+ *
*/
-
int drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
{
drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+ drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
int ret;
- /* If the CPU cache isn't coherent with the GTT, then use a
- * regular synchronized mapping. The problem is that we don't
- * track where the buffer was last used on the CPU side in
- * terms of drm_intel_bo_map vs drm_intel_gem_bo_map_gtt, so
- * we would potentially corrupt the buffer even when the user
- * does reasonable things.
- */
- if (!bufmgr_gem->has_llc)
- return drm_intel_gem_bo_map_gtt(bo);
-
pthread_mutex_lock(&bufmgr_gem->lock);
+
ret = map_gtt(bo);
- pthread_mutex_unlock(&bufmgr_gem->lock);
+ if (ret)
+ goto out;
+
+ /* LLC + gtt makes this function quite safe. */
+ if (!bufmgr_gem->has_llc) {
+ if (!bo_gem->in_gtt_domain) {
+ /* Put the object in the gtt domain. We can assume the
+ * object remains there unless set_domain happens which
+ * can only occur if the buffer is flinked, or we do it
+ * ourself. The first two can be managed in userspace,
+ * and the last one works as long as we assume
+ * MADV_WILLNEED prevents unbinding.
+ */
+ ret = drm_intel_gem_bo_madvise_internal(bufmgr_gem, bo_gem, MADV_WILLNEED);
+ if (ret)
+ goto out;
+ drm_intel_gem_bo_start_gtt_access(bo, 1);
+ }
+ }
+out:
+ pthread_mutex_unlock(&bufmgr_gem->lock);
return ret;
}
@@ -1506,6 +1527,7 @@ drm_intel_gem_bo_start_gtt_access(drm_intel_bo *bo, int write_enable)
set_domain.read_domains, set_domain.write_domain,
strerror(errno));
}
+ bo_gem->in_gtt_domain = true;
}
static void
--
1.7.10.4
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] [RFC] intel: Non-LLC based non-blocking maps.
2012-06-19 3:38 [PATCH] [RFC] intel: Non-LLC based non-blocking maps Ben Widawsky
@ 2012-06-19 8:22 ` Chris Wilson
2012-06-19 16:13 ` [Intel-gfx] " Ben Widawsky
0 siblings, 1 reply; 4+ messages in thread
From: Chris Wilson @ 2012-06-19 8:22 UTC (permalink / raw)
To: DRI Devel; +Cc: Daniel Vetter, Ben Widawsky, Intel GFX
On Mon, 18 Jun 2012 20:38:15 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> The history on this patch goes back quite a way. This time around, the
> patch builds on top of the map_unsynchronized that Eric pushed. Eric's
> patch attempted only to solve the problem for LLC machines. Unlike
> my earlier versions of this patch (with the help from Daniel Vetter), we
> do not attempt to cpu map objects in a unsynchronized manner.
>
> The concept is fairly simple - once a buffer is moved into the GTT
> domain, we can assume it remains there unless we tell it otherwise (via
> cpu map). It therefore stands to reason that as long as we can keep the
> object in the GTT domain, and don't ever count on reading back contents,
> things might just work. I believe as long as we are doing GTT mappings
> only, we get to avoid worry about clflushing the dirtied cachelines, but
> that could use some fact checking.
>
> The patch makes some assumptions about how the kernel does buffer
> tracking, this could be conceived as an ABI dependency, but actually the
> behavior is pretty confined. It exploits the fact the BOs are only moved
> into the CPU domain under certain circumstances, and daintily dances
> around those conditions. The main thing here is we assume MADV_WILLNEED
> prevents the object from getting evicted.
>
> I am not aware of a good way to test it's effectiveness
> performance-wise; but it introduces no regressions with piglit on my
> ILK, or SNB.
This is broken wrt to cache invalidation if I want to rewrite part of
the buffer that already has been read by the GPU.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Intel-gfx] [PATCH] [RFC] intel: Non-LLC based non-blocking maps.
2012-06-19 8:22 ` Chris Wilson
@ 2012-06-19 16:13 ` Ben Widawsky
2012-06-19 19:50 ` Chris Wilson
0 siblings, 1 reply; 4+ messages in thread
From: Ben Widawsky @ 2012-06-19 16:13 UTC (permalink / raw)
To: Chris Wilson; +Cc: Daniel Vetter, Intel GFX, DRI Devel
On Tue, 19 Jun 2012 09:22:03 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Mon, 18 Jun 2012 20:38:15 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> > The history on this patch goes back quite a way. This time around, the
> > patch builds on top of the map_unsynchronized that Eric pushed. Eric's
> > patch attempted only to solve the problem for LLC machines. Unlike
> > my earlier versions of this patch (with the help from Daniel Vetter), we
> > do not attempt to cpu map objects in a unsynchronized manner.
> >
> > The concept is fairly simple - once a buffer is moved into the GTT
> > domain, we can assume it remains there unless we tell it otherwise (via
> > cpu map). It therefore stands to reason that as long as we can keep the
> > object in the GTT domain, and don't ever count on reading back contents,
> > things might just work. I believe as long as we are doing GTT mappings
> > only, we get to avoid worry about clflushing the dirtied cachelines, but
> > that could use some fact checking.
> >
> > The patch makes some assumptions about how the kernel does buffer
> > tracking, this could be conceived as an ABI dependency, but actually the
> > behavior is pretty confined. It exploits the fact the BOs are only moved
> > into the CPU domain under certain circumstances, and daintily dances
> > around those conditions. The main thing here is we assume MADV_WILLNEED
> > prevents the object from getting evicted.
> >
> > I am not aware of a good way to test it's effectiveness
> > performance-wise; but it introduces no regressions with piglit on my
> > ILK, or SNB.
>
> This is broken wrt to cache invalidation if I want to rewrite part of
> the buffer that already has been read by the GPU.
> -Chris
>
Well if you're talking about what I think you're talking about (ie. not
clflushing, but simply dealing with the GPUs internal caching). It's a
problem that has existed with all of the non-LLC non-blocking map
patches; and sort of the point of non-blocking maps. Play it fast and
loose, submit pipe controls if you get nervous.
Did I catch your meaning, or were you just talking about clflushing
stuff (we also miss chipset flush on really old platforms; I was
thinking of restricting this to ILK only)?
--
Ben Widawsky, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Intel-gfx] [PATCH] [RFC] intel: Non-LLC based non-blocking maps.
2012-06-19 16:13 ` [Intel-gfx] " Ben Widawsky
@ 2012-06-19 19:50 ` Chris Wilson
0 siblings, 0 replies; 4+ messages in thread
From: Chris Wilson @ 2012-06-19 19:50 UTC (permalink / raw)
To: Ben Widawsky; +Cc: Daniel Vetter, Intel GFX, DRI Devel
On Tue, 19 Jun 2012 09:13:20 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> On Tue, 19 Jun 2012 09:22:03 +0100
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> > On Mon, 18 Jun 2012 20:38:15 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> > > The history on this patch goes back quite a way. This time around, the
> > > patch builds on top of the map_unsynchronized that Eric pushed. Eric's
> > > patch attempted only to solve the problem for LLC machines. Unlike
> > > my earlier versions of this patch (with the help from Daniel Vetter), we
> > > do not attempt to cpu map objects in a unsynchronized manner.
> > >
> > > The concept is fairly simple - once a buffer is moved into the GTT
> > > domain, we can assume it remains there unless we tell it otherwise (via
> > > cpu map). It therefore stands to reason that as long as we can keep the
> > > object in the GTT domain, and don't ever count on reading back contents,
> > > things might just work. I believe as long as we are doing GTT mappings
> > > only, we get to avoid worry about clflushing the dirtied cachelines, but
> > > that could use some fact checking.
> > >
> > > The patch makes some assumptions about how the kernel does buffer
> > > tracking, this could be conceived as an ABI dependency, but actually the
> > > behavior is pretty confined. It exploits the fact the BOs are only moved
> > > into the CPU domain under certain circumstances, and daintily dances
> > > around those conditions. The main thing here is we assume MADV_WILLNEED
> > > prevents the object from getting evicted.
> > >
> > > I am not aware of a good way to test it's effectiveness
> > > performance-wise; but it introduces no regressions with piglit on my
> > > ILK, or SNB.
> >
> > This is broken wrt to cache invalidation if I want to rewrite part of
> > the buffer that already has been read by the GPU.
> > -Chris
> >
>
> Well if you're talking about what I think you're talking about (ie. not
> clflushing, but simply dealing with the GPUs internal caching). It's a
> problem that has existed with all of the non-LLC non-blocking map
> patches; and sort of the point of non-blocking maps. Play it fast and
> loose, submit pipe controls if you get nervous.
>
> Did I catch your meaning, or were you just talking about clflushing
> stuff (we also miss chipset flush on really old platforms; I was
> thinking of restricting this to ILK only)?
Sorry, I actually meant GPU caches. However, I was under the false
impression that you were chaning existing API not bringing GTT maps into
compliance with the new async mappings. My warning was merely about the
issue that can arrise from missing the invalidate when reusing an async
map (or even if the sampler prefetches futher than expected which is
what I was stung by most recently.) Furthermore, Daniel has just added
unconditional invalidates before each batchbuffer which neatly papers
over this issue (in the future at least).
Again, sorry for the noise, please continue :)
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2012-06-19 19:50 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-06-19 3:38 [PATCH] [RFC] intel: Non-LLC based non-blocking maps Ben Widawsky
2012-06-19 8:22 ` Chris Wilson
2012-06-19 16:13 ` [Intel-gfx] " Ben Widawsky
2012-06-19 19:50 ` Chris Wilson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox