public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Only insert the mb() before updating the fence parameter
@ 2012-10-09  8:38 Chris Wilson
  2012-10-09  9:54 ` Daniel Vetter
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Wilson @ 2012-10-09  8:38 UTC (permalink / raw)
  To: intel-gfx

With a fence, we only need to insert a memory barrier around the actual
fence alteration for CPU accesses through the GTT. Performing the
barrier in flush-fence was inserting unnecessary and expensive barriers
for never fenced objects.

Note removing the barriers from flush-fence, which was effectively a
barrier before every direct access through the GTT, revealed that we
where missing a barrier before the first access through the GTT. Lack of
that barrier was sufficient to cause GPU hangs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c |   33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 05ff790..c1ef0a8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2771,9 +2771,22 @@ static void i830_write_fence_reg(struct drm_device *dev, int reg,
 	POSTING_READ(FENCE_REG_830_0 + reg * 4);
 }
 
+inline static bool i915_gem_object_needs_mb(struct drm_i915_gem_object *obj)
+{
+	return obj && obj->base.read_domains & I915_GEM_DOMAIN_GTT;
+}
+
 static void i915_gem_write_fence(struct drm_device *dev, int reg,
 				 struct drm_i915_gem_object *obj)
 {
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	/* Ensure that all CPU reads are completed before installing a fence
+	 * and all writes before removing the fence.
+	 */
+	if (i915_gem_object_needs_mb(dev_priv->fence_regs[reg].obj))
+		mb();
+
 	switch (INTEL_INFO(dev)->gen) {
 	case 7:
 	case 6: sandybridge_write_fence_reg(dev, reg, obj); break;
@@ -2783,6 +2796,9 @@ static void i915_gem_write_fence(struct drm_device *dev, int reg,
 	case 2: i830_write_fence_reg(dev, reg, obj); break;
 	default: break;
 	}
+
+	if (i915_gem_object_needs_mb(obj))
+		mb();
 }
 
 static inline int fence_number(struct drm_i915_private *dev_priv,
@@ -2812,7 +2828,7 @@ static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
 }
 
 static int
-i915_gem_object_flush_fence(struct drm_i915_gem_object *obj)
+i915_gem_object_wait_fence(struct drm_i915_gem_object *obj)
 {
 	if (obj->last_fenced_seqno) {
 		int ret = i915_wait_seqno(obj->ring, obj->last_fenced_seqno);
@@ -2822,12 +2838,6 @@ i915_gem_object_flush_fence(struct drm_i915_gem_object *obj)
 		obj->last_fenced_seqno = 0;
 	}
 
-	/* Ensure that all CPU reads are completed before installing a fence
-	 * and all writes before removing the fence.
-	 */
-	if (obj->base.read_domains & I915_GEM_DOMAIN_GTT)
-		mb();
-
 	obj->fenced_gpu_access = false;
 	return 0;
 }
@@ -2838,7 +2848,7 @@ i915_gem_object_put_fence(struct drm_i915_gem_object *obj)
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
 	int ret;
 
-	ret = i915_gem_object_flush_fence(obj);
+	ret = i915_gem_object_wait_fence(obj);
 	if (ret)
 		return ret;
 
@@ -2912,7 +2922,7 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj)
 	 * will need to serialise the write to the associated fence register?
 	 */
 	if (obj->fence_dirty) {
-		ret = i915_gem_object_flush_fence(obj);
+		ret = i915_gem_object_wait_fence(obj);
 		if (ret)
 			return ret;
 	}
@@ -2933,7 +2943,7 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj)
 		if (reg->obj) {
 			struct drm_i915_gem_object *old = reg->obj;
 
-			ret = i915_gem_object_flush_fence(old);
+			ret = i915_gem_object_wait_fence(old);
 			if (ret)
 				return ret;
 
@@ -3244,6 +3254,9 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 
 	i915_gem_object_flush_cpu_write_domain(obj);
 
+	if ((obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0)
+		mb();
+
 	old_write_domain = obj->base.write_domain;
 	old_read_domains = obj->base.read_domains;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/i915: Only insert the mb() before updating the fence parameter
  2012-10-09  8:38 [PATCH] drm/i915: Only insert the mb() before updating the fence parameter Chris Wilson
@ 2012-10-09  9:54 ` Daniel Vetter
  2012-10-09 10:02   ` Daniel Vetter
  2012-10-09 11:03   ` Chris Wilson
  0 siblings, 2 replies; 6+ messages in thread
From: Daniel Vetter @ 2012-10-09  9:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Oct 09, 2012 at 09:38:58AM +0100, Chris Wilson wrote:
> With a fence, we only need to insert a memory barrier around the actual
> fence alteration for CPU accesses through the GTT. Performing the
> barrier in flush-fence was inserting unnecessary and expensive barriers
> for never fenced objects.
> 
> Note removing the barriers from flush-fence, which was effectively a
> barrier before every direct access through the GTT, revealed that we
> where missing a barrier before the first access through the GTT. Lack of
> that barrier was sufficient to cause GPU hangs.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Looks good and finally puts some clear explanation and consistency behind
our mb()s. Two minor nitpicks, otherwise.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  drivers/gpu/drm/i915/i915_gem.c |   33 +++++++++++++++++++++++----------
>  1 file changed, 23 insertions(+), 10 deletions(-)

[snip]

> @@ -3244,6 +3254,9 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>  
>  	i915_gem_object_flush_cpu_write_domain(obj);
>  
> +	if ((obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0)
> +		mb();
> +

I think a comment here like we have one for all other gtt related memory
barries would be good. Another thing is the flush_gtt_write_domain uses a
wmb, whereas here we don't bother with micro-optimizing things. So I think
it'd be good to just use a mb() for that, too, if just for consistency.

Also, you know the grumpy maintainer drill: Could we exercise these
barriers with a minimal i-g-t testcase, please? Since you've managed to
kill your machine by removing them, they're no longer just there to keep
us happy, hence I'd like to have them exercised ...

Another thing that just crossed my mind: Could we lack a set of mb()s for
cpu access on llc platforms? For non-coherent platforms the mb() in the
clflush paths will do that, but on llc platforms I couldn't find anything.
And that lp bugs seems to make an excellent case for them being required
...

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/i915: Only insert the mb() before updating the fence parameter
  2012-10-09  9:54 ` Daniel Vetter
@ 2012-10-09 10:02   ` Daniel Vetter
  2012-10-09 11:03   ` Chris Wilson
  1 sibling, 0 replies; 6+ messages in thread
From: Daniel Vetter @ 2012-10-09 10:02 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

While looking at barriers, I think we could be a bit more paranoid with
the barrier in intel_read_status_page and up it to a full mb() and move it
into the !lazy_coherency conditional of the various get_seqno functions.
-Daniel

On Tue, Oct 09, 2012 at 11:54:12AM +0200, Daniel Vetter wrote:
> On Tue, Oct 09, 2012 at 09:38:58AM +0100, Chris Wilson wrote:
> > With a fence, we only need to insert a memory barrier around the actual
> > fence alteration for CPU accesses through the GTT. Performing the
> > barrier in flush-fence was inserting unnecessary and expensive barriers
> > for never fenced objects.
> > 
> > Note removing the barriers from flush-fence, which was effectively a
> > barrier before every direct access through the GTT, revealed that we
> > where missing a barrier before the first access through the GTT. Lack of
> > that barrier was sufficient to cause GPU hangs.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Looks good and finally puts some clear explanation and consistency behind
> our mb()s. Two minor nitpicks, otherwise.
> 
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c |   33 +++++++++++++++++++++++----------
> >  1 file changed, 23 insertions(+), 10 deletions(-)
> 
> [snip]
> 
> > @@ -3244,6 +3254,9 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
> >  
> >  	i915_gem_object_flush_cpu_write_domain(obj);
> >  
> > +	if ((obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0)
> > +		mb();
> > +
> 
> I think a comment here like we have one for all other gtt related memory
> barries would be good. Another thing is the flush_gtt_write_domain uses a
> wmb, whereas here we don't bother with micro-optimizing things. So I think
> it'd be good to just use a mb() for that, too, if just for consistency.
> 
> Also, you know the grumpy maintainer drill: Could we exercise these
> barriers with a minimal i-g-t testcase, please? Since you've managed to
> kill your machine by removing them, they're no longer just there to keep
> us happy, hence I'd like to have them exercised ...
> 
> Another thing that just crossed my mind: Could we lack a set of mb()s for
> cpu access on llc platforms? For non-coherent platforms the mb() in the
> clflush paths will do that, but on llc platforms I couldn't find anything.
> And that lp bugs seems to make an excellent case for them being required
> ...
> 
> Cheers, Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/i915: Only insert the mb() before updating the fence parameter
  2012-10-09  9:54 ` Daniel Vetter
  2012-10-09 10:02   ` Daniel Vetter
@ 2012-10-09 11:03   ` Chris Wilson
  2012-10-09 11:14     ` Daniel Vetter
  1 sibling, 1 reply; 6+ messages in thread
From: Chris Wilson @ 2012-10-09 11:03 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, 9 Oct 2012 11:54:12 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Tue, Oct 09, 2012 at 09:38:58AM +0100, Chris Wilson wrote:
> > @@ -3244,6 +3254,9 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
> >  
> >  	i915_gem_object_flush_cpu_write_domain(obj);
> >  
> > +	if ((obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0)
> > +		mb();
> > +
> 
> I think a comment here like we have one for all other gtt related memory
> barries would be good. Another thing is the flush_gtt_write_domain uses a
> wmb, whereas here we don't bother with micro-optimizing things. So I think
> it'd be good to just use a mb() for that, too, if just for consistency.

In fact, not only is that the wmb() the easiest to micro-optimise, it
is the only one that can be - I think. Around manipulating the fence, we
need a read/write barrier in case we have any pending accesses through
the fenced region, since the register write may be reordered passed the
memory reads since there is no obvious dependency. That might just be
heightened paranoia and our memory controller isn't that smart. Yet. So
those two need to be mb() so that I can sleep safely at night. For the
mb() inside set-to-gtt-domain, I don't have a robust explanation other
than that empirically we need a barrier, therefore there is some
lingering incoherency when reusing a bo. (The hangs always seem to occur
when crossing a page boundary, we see stale data.) You could attempt to
insert a read/write barrier depending upon actual usage, but it hardly
seems worth the effort.
 
> Also, you know the grumpy maintainer drill: Could we exercise these
> barriers with a minimal i-g-t testcase, please? Since you've managed to
> kill your machine by removing them, they're no longer just there to keep
> us happy, hence I'd like to have them exercised ...

Still hunting.
 
> Another thing that just crossed my mind: Could we lack a set of mb()s for
> cpu access on llc platforms? For non-coherent platforms the mb() in the
> clflush paths will do that, but on llc platforms I couldn't find anything.
> And that lp bugs seems to make an excellent case for them being required
> ...

Yes, with LLC we need to treat the GPU as another core and so put
similar SMP-esque memory barriers in place.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/i915: Only insert the mb() before updating the fence parameter
  2012-10-09 11:03   ` Chris Wilson
@ 2012-10-09 11:14     ` Daniel Vetter
  2012-10-09 11:26       ` Chris Wilson
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Vetter @ 2012-10-09 11:14 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Oct 9, 2012 at 1:03 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> In fact, not only is that the wmb() the easiest to micro-optimise, it
> is the only one that can be - I think. Around manipulating the fence, we
> need a read/write barrier in case we have any pending accesses through
> the fenced region, since the register write may be reordered passed the
> memory reads since there is no obvious dependency. That might just be
> heightened paranoia and our memory controller isn't that smart. Yet. So
> those two need to be mb() so that I can sleep safely at night. For the
> mb() inside set-to-gtt-domain, I don't have a robust explanation other
> than that empirically we need a barrier, therefore there is some
> lingering incoherency when reusing a bo. (The hangs always seem to occur
> when crossing a page boundary, we see stale data.) You could attempt to
> insert a read/write barrier depending upon actual usage, but it hardly
> seems worth the effort.

Hm, I think we can make a case that the barrier before the fence
change can only be a wmb(): Racing reads against fence changes is
ill-defined anyway, so we don't need the read barrier for that reason.
But we need to flush out any store buffers (especially the wc store
buffer) before changing the fencing. The mb() afterwards seems to be
required, since we need to sync both subsequent reads and writes
against the fence mmio write.

One thing I wonder is whether we miss any barrier between the wc
writes to the ringbuffer and the tail update. If that's the case I
wonder where all the bug reports are ...

Last one: Which machines blow up when you drop that mb()?

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/i915: Only insert the mb() before updating the fence parameter
  2012-10-09 11:14     ` Daniel Vetter
@ 2012-10-09 11:26       ` Chris Wilson
  0 siblings, 0 replies; 6+ messages in thread
From: Chris Wilson @ 2012-10-09 11:26 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, 9 Oct 2012 13:14:09 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> One thing I wonder is whether we miss any barrier between the wc
> writes to the ringbuffer and the tail update. If that's the case I
> wonder where all the bug reports are ...

Ditto. I've often wondered how we get away without a wmb() there...
 
> Last one: Which machines blow up when you drop that mb()?

pnv, though that's the only non-LLC I've been testing with the
incomplete patch so I can't say it is limited to that machine.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-10-09 11:27 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-10-09  8:38 [PATCH] drm/i915: Only insert the mb() before updating the fence parameter Chris Wilson
2012-10-09  9:54 ` Daniel Vetter
2012-10-09 10:02   ` Daniel Vetter
2012-10-09 11:03   ` Chris Wilson
2012-10-09 11:14     ` Daniel Vetter
2012-10-09 11:26       ` Chris Wilson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox