public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH 1/9] drm/i915: Update rules for reading cache lines through the LLC
@ 2013-08-08 13:41 Chris Wilson
  2013-08-08 13:41 ` [PATCH 2/9] drm/i915: Track when an object is pinned for use by the display engine Chris Wilson
                   ` (8 more replies)
  0 siblings, 9 replies; 28+ messages in thread
From: Chris Wilson @ 2013-08-08 13:41 UTC (permalink / raw)
  To: intel-gfx

The LLC is a fun device. The cache is a distinct functional block within
the SA that arbitrates access from both the CPU and GPU cores. As such
all writes to memory land first in the LLC before further action is
taken. For example, an uncached write from either the CPU or GPU will
then proceed to memory and evict the cacheline from the LLC. This means that
a read from the LLC always returns the correct information even if the PTE
bit in the GPU differs from the PAT bit in the CPU. For the older
snooping architecture on non-LLC, the fundamental principle still holds
except that some coordination is required between the CPU and GPU to
explicitly perform the snooping (which is handled by our request
tracking).

The upshot of this is that we know that we can issue a read from either
LLC devices or snoopable memory and trust the contents of the cache -
i.e. we can forgo a clflush before a read in these circumstances.
Writing to memory from the CPU is a little more tricky as we have to
consider that the scanout does not read from the CPU cache at all, but
from main memory. So we have to currently treat all requests to write to
uncached memory as having to be flushed to main memory for coherency
with all consumers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4c420f9..7cd36c5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -62,6 +62,12 @@ static long i915_gem_purge(struct drm_i915_private *dev_priv, long target);
 static void i915_gem_shrink_all(struct drm_i915_private *dev_priv);
 static void i915_gem_object_truncate(struct drm_i915_gem_object *obj);
 
+static bool cpu_cache_is_coherent(struct drm_device *dev,
+				  enum i915_cache_level level)
+{
+	return HAS_LLC(dev) || level != I915_CACHE_NONE;
+}
+
 static inline void i915_gem_object_fence_lost(struct drm_i915_gem_object *obj)
 {
 	if (obj->tiling_mode)
@@ -508,8 +514,7 @@ i915_gem_shmem_pread(struct drm_device *dev,
 		 * read domain and manually flush cachelines (if required). This
 		 * optimizes for the case when the gpu will dirty the data
 		 * anyway again before the next pread happens. */
-		if (obj->cache_level == I915_CACHE_NONE)
-			needs_clflush = 1;
+		needs_clflush = !cpu_cache_is_coherent(dev, obj->cache_level);
 		if (i915_gem_obj_bound_any(obj)) {
 			ret = i915_gem_object_set_to_gtt_domain(obj, false);
 			if (ret)
@@ -833,11 +838,11 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 				return ret;
 		}
 	}
-	/* Same trick applies for invalidate partially written cachelines before
-	 * writing.  */
-	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)
-	    && obj->cache_level == I915_CACHE_NONE)
-		needs_clflush_before = 1;
+	/* Same trick applies to invalidate partially written cachelines read
+	 * before writing. */
+	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0)
+		needs_clflush_before =
+			!cpu_cache_is_coherent(dev, obj->cache_level);
 
 	ret = i915_gem_object_get_pages(obj);
 	if (ret)
@@ -3679,7 +3684,8 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 
 	/* Flush the CPU cache if it's still invalid. */
 	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
-		i915_gem_clflush_object(obj);
+		if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
+			i915_gem_clflush_object(obj);
 
 		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
 	}
-- 
1.8.4.rc1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/9] drm/i915: Track when an object is pinned for use by the display engine
  2013-08-08 13:41 [PATCH 1/9] drm/i915: Update rules for reading cache lines through the LLC Chris Wilson
@ 2013-08-08 13:41 ` Chris Wilson
  2013-08-09 11:25   ` [PATCH] " Chris Wilson
  2013-08-08 13:41 ` [PATCH 3/9] drm/i915: Update rules for writing through the LLC with the cpu Chris Wilson
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2013-08-08 13:41 UTC (permalink / raw)
  To: intel-gfx

The display engine has unique coherency rules such that it requires
special handling to ensure that all writes to cursors, scanouts and
sprites are clflushed. This patch introduces the infrastructure to
simply track when an object is being accessed by the display engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  2 ++
 drivers/gpu/drm/i915/i915_drv.h      |  2 ++
 drivers/gpu/drm/i915/i915_gem.c      | 25 +++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_display.c |  8 ++++----
 4 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b06e6ce..463c660 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -117,6 +117,8 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		seq_printf(m, " (name: %d)", obj->base.name);
 	if (obj->pin_count)
 		seq_printf(m, " (pinned x %d)", obj->pin_count);
+	if (obj->pin_display)
+		seq_printf(m, " (display)");
 	if (obj->fence_reg != I915_FENCE_REG_NONE)
 		seq_printf(m, " (fence: %d)", obj->fence_reg);
 	list_for_each_entry(vma, &obj->vma_list, vma_link) {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 16a46f3..3622ec2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1380,6 +1380,7 @@ struct drm_i915_gem_object {
 	 */
 	unsigned int fault_mappable:1;
 	unsigned int pin_mappable:1;
+	unsigned int pin_display:1;
 
 	/*
 	 * Is the GPU currently using a fence to access this buffer,
@@ -1894,6 +1895,7 @@ int __must_check
 i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 				     u32 alignment,
 				     struct intel_ring_buffer *pipelined);
+void i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj);
 int i915_gem_attach_phys_object(struct drm_device *dev,
 				struct drm_i915_gem_object *obj,
 				int id,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7cd36c5..c5e03ba 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3583,6 +3583,11 @@ unlock:
 	return ret;
 }
 
+static bool is_pin_display(struct drm_i915_gem_object *obj)
+{
+	return obj->pin_count != !!obj->user_pin_count;
+}
+
 /*
  * Prepare buffer for display plane (scanout, cursors, etc).
  * Can be called from an uninterruptible phase (modesetting) and allows
@@ -3602,6 +3607,11 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 			return ret;
 	}
 
+	/* Mark the pin_display early so that we account for the 
+	 * display coherency whilst setting up the cache domains.
+	 */
+	obj->pin_display = true;
+
 	/* The display engine is not coherent with the LLC cache on gen6.  As
 	 * a result, we make sure that the pinning that is about to occur is
 	 * done with uncached PTEs. This is lowest common denominator for all
@@ -3613,7 +3623,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	 */
 	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_NONE);
 	if (ret)
-		return ret;
+		goto err_unpin_display;
 
 	/* As the user may map the buffer once pinned in the display plane
 	 * (e.g. libkms for the bootup splash), we have to ensure that we
@@ -3621,7 +3631,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	 */
 	ret = i915_gem_obj_ggtt_pin(obj, alignment, true, false);
 	if (ret)
-		return ret;
+		goto err_unpin_display;
 
 	i915_gem_object_flush_cpu_write_domain(obj);
 
@@ -3639,6 +3649,17 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 					    old_write_domain);
 
 	return 0;
+
+err_unpin_display:
+	obj->pin_display = is_pin_display(obj);
+	return ret;
+}
+
+void
+i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj)
+{
+	i915_gem_object_unpin(obj);
+	obj->pin_display = is_pin_display(obj);
 }
 
 int
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index ae56e87..41b2cc2 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -1883,7 +1883,7 @@ intel_pin_and_fence_fb_obj(struct drm_device *dev,
 	return 0;
 
 err_unpin:
-	i915_gem_object_unpin(obj);
+	i915_gem_object_unpin_from_display_plane(obj);
 err_interruptible:
 	dev_priv->mm.interruptible = true;
 	return ret;
@@ -1892,7 +1892,7 @@ err_interruptible:
 void intel_unpin_fb_obj(struct drm_i915_gem_object *obj)
 {
 	i915_gem_object_unpin_fence(obj);
-	i915_gem_object_unpin(obj);
+	i915_gem_object_unpin_from_display_plane(obj);
 }
 
 /* Computes the linear offset to the base tile and adjusts x, y. bytes per pixel
@@ -6775,7 +6775,7 @@ static int intel_crtc_cursor_set(struct drm_crtc *crtc,
 			if (intel_crtc->cursor_bo != obj)
 				i915_gem_detach_phys_object(dev, intel_crtc->cursor_bo);
 		} else
-			i915_gem_object_unpin(intel_crtc->cursor_bo);
+			i915_gem_object_unpin_from_display_plane(intel_crtc->cursor_bo);
 		drm_gem_object_unreference(&intel_crtc->cursor_bo->base);
 	}
 
@@ -6790,7 +6790,7 @@ static int intel_crtc_cursor_set(struct drm_crtc *crtc,
 
 	return 0;
 fail_unpin:
-	i915_gem_object_unpin(obj);
+	i915_gem_object_unpin_from_display_plane(obj);
 fail_locked:
 	mutex_unlock(&dev->struct_mutex);
 fail:
-- 
1.8.4.rc1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/9] drm/i915: Update rules for writing through the LLC with the cpu
  2013-08-08 13:41 [PATCH 1/9] drm/i915: Update rules for reading cache lines through the LLC Chris Wilson
  2013-08-08 13:41 ` [PATCH 2/9] drm/i915: Track when an object is pinned for use by the display engine Chris Wilson
@ 2013-08-08 13:41 ` Chris Wilson
  2013-08-08 15:27   ` Ville Syrjälä
  2013-08-09 11:26   ` [PATCH] " Chris Wilson
  2013-08-08 13:41 ` [PATCH 4/9] drm/i915: Allow the GPU to cache stolen memory Chris Wilson
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 28+ messages in thread
From: Chris Wilson @ 2013-08-08 13:41 UTC (permalink / raw)
  To: intel-gfx

As mentioned in the previous commit, reads and writes from both the CPU
and GPU go through the LLC. This gives us coherency between the CPU and
GPU irrespective of the attribute settings either device sets. We can
use to avoid having to clflush even uncached memory.

Except for the scanout.

The scanout resides within another functional block that does not use
the LLC but reads directly from main memory. So in order to maintain
coherency with the scanout, writes to uncached memory must be flushed.
In order to optimize writes elsewhere, we start tracking whether an
framebuffer is attached to an object.

v2: Use pin_display tracking rather than fb_count (to ensure we flush
cursors as well etc) and only force the clflush along explicit writes to
the scanout paths (i.e. pin_to_display_plane and pwrite into scanout).

Based on a patch by Ville Syrjälä.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  2 +-
 drivers/gpu/drm/i915/i915_gem.c            | 58 ++++++++++++++++--------------
 drivers/gpu/drm/i915/i915_gem_exec.c       |  2 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        |  2 +-
 5 files changed, 36 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3622ec2..1ffae08 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1863,7 +1863,7 @@ static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
 }
 
 void i915_gem_reset(struct drm_device *dev);
-void i915_gem_clflush_object(struct drm_i915_gem_object *obj);
+void i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
 int __must_check i915_gem_object_set_domain(struct drm_i915_gem_object *obj,
 					    uint32_t read_domains,
 					    uint32_t write_domain);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c5e03ba..78535e9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -38,7 +38,8 @@
 #include <linux/dma-buf.h>
 
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
-static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
+static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
+						   bool force);
 static __must_check int
 i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 			   struct i915_address_space *vm,
@@ -68,6 +69,14 @@ static bool cpu_cache_is_coherent(struct drm_device *dev,
 	return HAS_LLC(dev) || level != I915_CACHE_NONE;
 }
 
+static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
+{
+	if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
+		return true;
+
+	return obj->pin_display;
+}
+
 static inline void i915_gem_object_fence_lost(struct drm_i915_gem_object *obj)
 {
 	if (obj->tiling_mode)
@@ -830,8 +839,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		 * write domain and manually flush cachelines (if required). This
 		 * optimizes for the case when the gpu will use the data
 		 * right away and we therefore have to clflush anyway. */
-		if (obj->cache_level == I915_CACHE_NONE)
-			needs_clflush_after = 1;
+		needs_clflush_after = cpu_write_needs_clflush(obj);
 		if (i915_gem_obj_bound_any(obj)) {
 			ret = i915_gem_object_set_to_gtt_domain(obj, true);
 			if (ret)
@@ -921,7 +929,7 @@ out:
 		 */
 		if (!needs_clflush_after &&
 		    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
-			i915_gem_clflush_object(obj);
+			i915_gem_clflush_object(obj, false);
 			i915_gem_chipset_flush(dev);
 		}
 	}
@@ -999,9 +1007,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
-	if (obj->cache_level == I915_CACHE_NONE &&
-	    obj->tiling_mode == I915_TILING_NONE &&
-	    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
+	if (obj->tiling_mode == I915_TILING_NONE &&
+	    obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
+	    cpu_write_needs_clflush(obj)) {
 		ret = i915_gem_gtt_pwrite_fast(dev, obj, args, file);
 		/* Note that the gtt paths might fail with non-page-backed user
 		 * pointers (e.g. gtt mappings when moving data between
@@ -1350,8 +1358,8 @@ i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 	}
 
 	/* Pinned buffers may be scanout, so flush the cache */
-	if (obj->pin_count)
-		i915_gem_object_flush_cpu_write_domain(obj);
+	if (obj->pin_display)
+		i915_gem_object_flush_cpu_write_domain(obj, true);
 
 	drm_gem_object_unreference(&obj->base);
 unlock:
@@ -1721,7 +1729,7 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
 		 * hope for the best.
 		 */
 		WARN_ON(ret != -EIO);
-		i915_gem_clflush_object(obj);
+		i915_gem_clflush_object(obj, true);
 		obj->base.read_domains = obj->base.write_domain = I915_GEM_DOMAIN_CPU;
 	}
 
@@ -3299,7 +3307,8 @@ err_unpin:
 }
 
 void
-i915_gem_clflush_object(struct drm_i915_gem_object *obj)
+i915_gem_clflush_object(struct drm_i915_gem_object *obj,
+			bool force)
 {
 	/* If we don't have a page list set up, then we're not pinned
 	 * to GPU, and we can ignore the cache flush because it'll happen
@@ -3323,7 +3332,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj)
 	 * snooping behaviour occurs naturally as the result of our domain
 	 * tracking.
 	 */
-	if (obj->cache_level != I915_CACHE_NONE)
+	if (!force && cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
 		return;
 
 	trace_i915_gem_object_clflush(obj);
@@ -3360,14 +3369,15 @@ i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
 
 /** Flushes the CPU write domain for the object if it's dirty. */
 static void
-i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj)
+i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
+				       bool force)
 {
 	uint32_t old_write_domain;
 
 	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU)
 		return;
 
-	i915_gem_clflush_object(obj);
+	i915_gem_clflush_object(obj, force);
 	i915_gem_chipset_flush(obj->base.dev);
 	old_write_domain = obj->base.write_domain;
 	obj->base.write_domain = 0;
@@ -3401,7 +3411,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	if (ret)
 		return ret;
 
-	i915_gem_object_flush_cpu_write_domain(obj);
+	i915_gem_object_flush_cpu_write_domain(obj, false);
 
 	/* Serialise direct access to this object with the barriers for
 	 * coherent writes from the GPU, by effectively invalidating the
@@ -3491,7 +3501,11 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 					       obj, cache_level);
 	}
 
-	if (cache_level == I915_CACHE_NONE) {
+	list_for_each_entry(vma, &obj->vma_list, vma_link)
+		vma->node.color = cache_level;
+	obj->cache_level = cache_level;
+
+	if (cpu_write_needs_clflush(obj)) {
 		u32 old_read_domains, old_write_domain;
 
 		/* If we're coming from LLC cached, then we haven't
@@ -3514,9 +3528,6 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 						    old_write_domain);
 	}
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link)
-		vma->node.color = cache_level;
-	obj->cache_level = cache_level;
 	i915_gem_verify_gtt(dev);
 	return 0;
 }
@@ -3633,7 +3644,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	if (ret)
 		goto err_unpin_display;
 
-	i915_gem_object_flush_cpu_write_domain(obj);
+	i915_gem_object_flush_cpu_write_domain(obj, true);
 
 	old_write_domain = obj->base.write_domain;
 	old_read_domains = obj->base.read_domains;
@@ -3705,8 +3716,7 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 
 	/* Flush the CPU cache if it's still invalid. */
 	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
-		if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
-			i915_gem_clflush_object(obj);
+		i915_gem_clflush_object(obj, false);
 
 		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
 	}
@@ -3888,10 +3898,6 @@ i915_gem_pin_ioctl(struct drm_device *dev, void *data,
 	obj->user_pin_count++;
 	obj->pin_filp = file;
 
-	/* XXX - flush the CPU caches for pinned objects
-	 * as the X server doesn't manage domains yet
-	 */
-	i915_gem_object_flush_cpu_write_domain(obj);
 	args->offset = i915_gem_obj_ggtt_offset(obj);
 out:
 	drm_gem_object_unreference(&obj->base);
diff --git a/drivers/gpu/drm/i915/i915_gem_exec.c b/drivers/gpu/drm/i915/i915_gem_exec.c
index d2ac077..6af0067 100644
--- a/drivers/gpu/drm/i915/i915_gem_exec.c
+++ b/drivers/gpu/drm/i915/i915_gem_exec.c
@@ -50,7 +50,7 @@ static int i915_gem_exec_flush_object(struct drm_i915_gem_object *obj,
 		return ret;
 
 	if (obj->base.write_domain & I915_GEM_DOMAIN_CPU) {
-		i915_gem_clflush_object(obj);
+		i915_gem_clflush_object(obj, false);
 		i915_gem_chipset_flush(obj->base.dev);
 		obj->base.write_domain &= ~I915_GEM_DOMAIN_CPU;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8ccc29a..e999578 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -716,7 +716,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
 			return ret;
 
 		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
-			i915_gem_clflush_object(obj);
+			i915_gem_clflush_object(obj, false);
 
 		flush_domains |= obj->base.write_domain;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d29b1db..89d8cc8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -487,7 +487,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 				       dev_priv->gtt.base.total / PAGE_SIZE);
 
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		i915_gem_clflush_object(obj);
+		i915_gem_clflush_object(obj, obj->pin_display);
 		i915_gem_gtt_bind_object(obj, obj->cache_level);
 	}
 
-- 
1.8.4.rc1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/9] drm/i915: Allow the GPU to cache stolen memory
  2013-08-08 13:41 [PATCH 1/9] drm/i915: Update rules for reading cache lines through the LLC Chris Wilson
  2013-08-08 13:41 ` [PATCH 2/9] drm/i915: Track when an object is pinned for use by the display engine Chris Wilson
  2013-08-08 13:41 ` [PATCH 3/9] drm/i915: Update rules for writing through the LLC with the cpu Chris Wilson
@ 2013-08-08 13:41 ` Chris Wilson
  2013-08-08 13:41 ` [PATCH 5/9] drm/i915: Allocate LLC ringbuffers from stolen Chris Wilson
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2013-08-08 13:41 UTC (permalink / raw)
  To: intel-gfx

As a corollary to reviewing the interaction between LLC and our cache
domains, the GPU PTE bits are independent of the CPU PAT bits. As such
we can set the cache level on stolen memory based on how we wish the GPU
to cache accesses to it. So we are free to set the same default cache
levels as for normal bo, i.e. enable LLC cacheing by default where
appropriate.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_stolen.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index c0ab8ec..112c5e1 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -254,9 +254,8 @@ _i915_gem_object_create_stolen(struct drm_device *dev,
 	i915_gem_object_pin_pages(obj);
 	obj->stolen = stolen;
 
-	obj->base.write_domain = I915_GEM_DOMAIN_GTT;
-	obj->base.read_domains = I915_GEM_DOMAIN_GTT;
-	obj->cache_level = I915_CACHE_NONE;
+	obj->base.read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
+	obj->cache_level = HAS_LLC(dev) ? I915_CACHE_LLC : I915_CACHE_NONE;
 
 	return obj;
 
-- 
1.8.4.rc1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 5/9] drm/i915: Allocate LLC ringbuffers from stolen
  2013-08-08 13:41 [PATCH 1/9] drm/i915: Update rules for reading cache lines through the LLC Chris Wilson
                   ` (2 preceding siblings ...)
  2013-08-08 13:41 ` [PATCH 4/9] drm/i915: Allow the GPU to cache stolen memory Chris Wilson
@ 2013-08-08 13:41 ` Chris Wilson
  2013-08-08 13:41 ` [PATCH 6/9] drm/i915: Allocate context objects " Chris Wilson
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2013-08-08 13:41 UTC (permalink / raw)
  To: intel-gfx

As stolen objects now behave identically (wrt to default LLC cacheing)
as their normal system counterparts, we no longer have to differentiate
our usage for ringbuffers. So allocate them from stolen on SNB+ as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index dbc1f7c..5d38ced 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1298,9 +1298,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 			return ret;
 	}
 
-	obj = NULL;
-	if (!HAS_LLC(dev))
-		obj = i915_gem_object_create_stolen(dev, ring->size);
+	obj = i915_gem_object_create_stolen(dev, ring->size);
 	if (obj == NULL)
 		obj = i915_gem_alloc_object(dev, ring->size);
 	if (obj == NULL) {
-- 
1.8.4.rc1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 6/9] drm/i915: Allocate context objects from stolen
  2013-08-08 13:41 [PATCH 1/9] drm/i915: Update rules for reading cache lines through the LLC Chris Wilson
                   ` (3 preceding siblings ...)
  2013-08-08 13:41 ` [PATCH 5/9] drm/i915: Allocate LLC ringbuffers from stolen Chris Wilson
@ 2013-08-08 13:41 ` Chris Wilson
  2013-08-10  9:25   ` Daniel Vetter
  2013-08-08 13:41 ` [PATCH 7/9] drm/i915: Only do a chipset flush after a clflush Chris Wilson
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2013-08-08 13:41 UTC (permalink / raw)
  To: intel-gfx

Once again, the CPU PAT bits are irrelevant when considering the GPU
cacheing, and context objects are never accessed from the CPU or
directly by userspace making them another ideal candidate to allocate
from stolen memory.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index a57d49a..498f8a0 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -141,6 +141,7 @@ create_hw_context(struct drm_device *dev,
 		  struct drm_i915_file_private *file_priv)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	const int size = dev_priv->hw_context_size;
 	struct i915_hw_context *ctx;
 	int ret;
 
@@ -149,7 +150,9 @@ create_hw_context(struct drm_device *dev,
 		return ERR_PTR(-ENOMEM);
 
 	kref_init(&ctx->ref);
-	ctx->obj = i915_gem_alloc_object(dev, dev_priv->hw_context_size);
+	ctx->obj = i915_gem_object_create_stolen(dev, size);
+	if (ctx->obj == NULL)
+		ctx->obj = i915_gem_alloc_object(dev, size);
 	if (ctx->obj == NULL) {
 		kfree(ctx);
 		DRM_DEBUG_DRIVER("Context object allocated failed\n");
-- 
1.8.4.rc1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 7/9] drm/i915: Only do a chipset flush after a clflush
  2013-08-08 13:41 [PATCH 1/9] drm/i915: Update rules for reading cache lines through the LLC Chris Wilson
                   ` (4 preceding siblings ...)
  2013-08-08 13:41 ` [PATCH 6/9] drm/i915: Allocate context objects " Chris Wilson
@ 2013-08-08 13:41 ` Chris Wilson
  2013-08-08 13:41 ` [PATCH 8/9] drm/i915: Use Write-Through cacheing for the display plane on Iris Chris Wilson
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2013-08-08 13:41 UTC (permalink / raw)
  To: intel-gfx

Now that we skip clflushes more often, return a boolean indicating
whether the clflush was actually performed, and only if it was do the
chipset flush. (Though on most of the architectures where the clflush will
be skipped, the chipset flush is a no-op!)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  2 +-
 drivers/gpu/drm/i915/i915_gem.c            | 20 +++++++++++---------
 drivers/gpu/drm/i915/i915_gem_exec.c       |  4 ++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 +++--
 4 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1ffae08..c51a771 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1863,7 +1863,7 @@ static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
 }
 
 void i915_gem_reset(struct drm_device *dev);
-void i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
+bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
 int __must_check i915_gem_object_set_domain(struct drm_i915_gem_object *obj,
 					    uint32_t read_domains,
 					    uint32_t write_domain);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 78535e9..2656c69 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -929,8 +929,8 @@ out:
 		 */
 		if (!needs_clflush_after &&
 		    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
-			i915_gem_clflush_object(obj, false);
-			i915_gem_chipset_flush(dev);
+			if (i915_gem_clflush_object(obj, false))
+				i915_gem_chipset_flush(dev);
 		}
 	}
 
@@ -3306,7 +3306,7 @@ err_unpin:
 	return ret;
 }
 
-void
+bool
 i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 			bool force)
 {
@@ -3315,14 +3315,14 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 	 * again at bind time.
 	 */
 	if (obj->pages == NULL)
-		return;
+		return false;
 
 	/*
 	 * Stolen memory is always coherent with the GPU as it is explicitly
 	 * marked as wc by the system, or the system is cache-coherent.
 	 */
 	if (obj->stolen)
-		return;
+		return false;
 
 	/* If the GPU is snooping the contents of the CPU cache,
 	 * we do not need to manually clear the CPU cache lines.  However,
@@ -3333,11 +3333,12 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 	 * tracking.
 	 */
 	if (!force && cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
-		return;
+		return false;
 
 	trace_i915_gem_object_clflush(obj);
-
 	drm_clflush_sg(obj->pages);
+
+	return true;
 }
 
 /** Flushes the GTT write domain for the object if it's dirty. */
@@ -3377,8 +3378,9 @@ i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
 	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU)
 		return;
 
-	i915_gem_clflush_object(obj, force);
-	i915_gem_chipset_flush(obj->base.dev);
+	if (i915_gem_clflush_object(obj, force))
+		i915_gem_chipset_flush(obj->base.dev);
+
 	old_write_domain = obj->base.write_domain;
 	obj->base.write_domain = 0;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_exec.c b/drivers/gpu/drm/i915/i915_gem_exec.c
index 6af0067..a2c6dbf 100644
--- a/drivers/gpu/drm/i915/i915_gem_exec.c
+++ b/drivers/gpu/drm/i915/i915_gem_exec.c
@@ -50,8 +50,8 @@ static int i915_gem_exec_flush_object(struct drm_i915_gem_object *obj,
 		return ret;
 
 	if (obj->base.write_domain & I915_GEM_DOMAIN_CPU) {
-		i915_gem_clflush_object(obj, false);
-		i915_gem_chipset_flush(obj->base.dev);
+		if (i915_gem_clflush_object(obj, false))
+			i915_gem_chipset_flush(obj->base.dev);
 		obj->base.write_domain &= ~I915_GEM_DOMAIN_CPU;
 	}
 	if (obj->base.write_domain & I915_GEM_DOMAIN_GTT) {
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index e999578..7dcf78c 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -708,6 +708,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
 {
 	struct drm_i915_gem_object *obj;
 	uint32_t flush_domains = 0;
+	bool flush_chipset = false;
 	int ret;
 
 	list_for_each_entry(obj, objects, exec_list) {
@@ -716,12 +717,12 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
 			return ret;
 
 		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
-			i915_gem_clflush_object(obj, false);
+			flush_chipset |= i915_gem_clflush_object(obj, false);
 
 		flush_domains |= obj->base.write_domain;
 	}
 
-	if (flush_domains & I915_GEM_DOMAIN_CPU)
+	if (flush_chipset)
 		i915_gem_chipset_flush(ring->dev);
 
 	if (flush_domains & I915_GEM_DOMAIN_GTT)
-- 
1.8.4.rc1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 8/9] drm/i915: Use Write-Through cacheing for the display plane on Iris
  2013-08-08 13:41 [PATCH 1/9] drm/i915: Update rules for reading cache lines through the LLC Chris Wilson
                   ` (5 preceding siblings ...)
  2013-08-08 13:41 ` [PATCH 7/9] drm/i915: Only do a chipset flush after a clflush Chris Wilson
@ 2013-08-08 13:41 ` Chris Wilson
  2013-08-08 13:41 ` [PATCH 9/9] drm/i915: Allow the user to set bo into the DISPLAY cache domain Chris Wilson
  2013-08-08 16:42 ` [PATCH 1/9] drm/i915: Update rules for reading cache lines through the LLC Ville Syrjälä
  8 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2013-08-08 13:41 UTC (permalink / raw)
  To: intel-gfx

Haswell GT3e has the unique feature of supporting Write-Through cacheing
of objects within the eLLC/LLC. The purpose of this is to enable the display
plane to remain coherent whilst objects lie resident in the eLLC/LLC - so
that we, in theory, get the best of both worlds, perfect display and fast
access.

However, we still need to be careful as the CPU does not see the WT when
accessing the cache. In particular, this means that we need to flush the
cache lines after writing to an object through the CPU, and on
transitioning from a cached state to WT.

v2: Actually do the clflush on transition to WT, nagging by Ville.
v3: Flush the CPU cache after writes into WT objects.
v4: Rease onto LLC updates and report WT as "uncached" for
get_cache_level_ioctl to remain symmetric with set_cache_level_ioctl.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
---
 drivers/gpu/drm/i915/i915_dma.c     |  3 +++
 drivers/gpu/drm/i915/i915_drv.h     |  4 +++-
 drivers/gpu/drm/i915/i915_gem.c     | 14 ++++++++++++--
 drivers/gpu/drm/i915/i915_gem_gtt.c | 11 ++++++++++-
 include/uapi/drm/i915_drm.h         |  1 +
 5 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 9decc5b..de0b86a 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -976,6 +976,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_LLC:
 		value = HAS_LLC(dev);
 		break;
+	case I915_PARAM_HAS_WT:
+		value = HAS_WT(dev);
+		break;
 	case I915_PARAM_HAS_ALIASING_PPGTT:
 		value = dev_priv->mm.aliasing_ppgtt ? 1 : 0;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c51a771..0a903c4 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -457,6 +457,7 @@ enum i915_cache_level {
 			      caches, eg sampler/render caches, and the
 			      large Last-Level-Cache. LLC is coherent with
 			      the CPU, but L3 is only visible to the GPU. */
+	I915_CACHE_WT, /* hsw:gt3e WriteThrough for scanouts */
 };
 
 typedef uint32_t gen6_gtt_pte_t;
@@ -1388,7 +1389,7 @@ struct drm_i915_gem_object {
 	unsigned int pending_fenced_gpu_access:1;
 	unsigned int fenced_gpu_access:1;
 
-	unsigned int cache_level:2;
+	unsigned int cache_level:3;
 
 	unsigned int has_aliasing_ppgtt_mapping:1;
 	unsigned int has_global_gtt_mapping:1;
@@ -1545,6 +1546,7 @@ struct drm_i915_file_private {
 #define HAS_BLT(dev)            (INTEL_INFO(dev)->has_blt_ring)
 #define HAS_VEBOX(dev)          (INTEL_INFO(dev)->has_vebox_ring)
 #define HAS_LLC(dev)            (INTEL_INFO(dev)->has_llc)
+#define HAS_WT(dev)            (IS_HASWELL(dev) && to_i915(dev)->ellc_size)
 #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
 
 #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 5)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2656c69..2b897f2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3551,7 +3551,16 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
 		goto unlock;
 	}
 
-	args->caching = obj->cache_level != I915_CACHE_NONE;
+	switch (obj->cache_level) {
+	case I915_CACHE_LLC:
+	case I915_CACHE_L3_LLC:
+		args->caching = I915_CACHING_CACHED;
+		break;
+
+	default:
+		args->caching = I915_CACHING_NONE;
+		break;
+	}
 
 	drm_gem_object_unreference(&obj->base);
 unlock:
@@ -3634,7 +3643,8 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	 * of uncaching, which would allow us to flush all the LLC-cached data
 	 * with that bit in the PTE to main memory with just one PIPE_CONTROL.
 	 */
-	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_NONE);
+	ret = i915_gem_object_set_cache_level(obj,
+					      HAS_WT(obj->base.dev) ? I915_CACHE_WT : I915_CACHE_NONE);
 	if (ret)
 		goto err_unpin_display;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 89d8cc8..1791643 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -55,6 +55,7 @@
 #define HSW_WB_LLC_AGE3			HSW_CACHEABILITY_CONTROL(0x2)
 #define HSW_WB_LLC_AGE0			HSW_CACHEABILITY_CONTROL(0x3)
 #define HSW_WB_ELLC_LLC_AGE0		HSW_CACHEABILITY_CONTROL(0xb)
+#define HSW_WT_ELLC_LLC_AGE0		HSW_CACHEABILITY_CONTROL(0x6)
 
 static gen6_gtt_pte_t snb_pte_encode(dma_addr_t addr,
 				     enum i915_cache_level level)
@@ -138,8 +139,16 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	gen6_gtt_pte_t pte = GEN6_PTE_VALID;
 	pte |= HSW_PTE_ADDR_ENCODE(addr);
 
-	if (level != I915_CACHE_NONE)
+	switch (level) {
+	case I915_CACHE_NONE:
+		break;
+	case I915_CACHE_WT:
+		pte |= HSW_WT_ELLC_LLC_AGE0;
+		break;
+	default:
 		pte |= HSW_WB_ELLC_LLC_AGE0;
+		break;
+	}
 
 	return pte;
 }
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 2b282a2..39d14b3 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -338,6 +338,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_PINNED_BATCHES	 24
 #define I915_PARAM_HAS_EXEC_NO_RELOC	 25
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
+#define I915_PARAM_HAS_WT     	 	 27
 
 typedef struct drm_i915_getparam {
 	int param;
-- 
1.8.4.rc1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 9/9] drm/i915: Allow the user to set bo into the DISPLAY cache domain
  2013-08-08 13:41 [PATCH 1/9] drm/i915: Update rules for reading cache lines through the LLC Chris Wilson
                   ` (6 preceding siblings ...)
  2013-08-08 13:41 ` [PATCH 8/9] drm/i915: Use Write-Through cacheing for the display plane on Iris Chris Wilson
@ 2013-08-08 13:41 ` Chris Wilson
  2013-08-10 10:09   ` Daniel Vetter
  2013-08-08 16:42 ` [PATCH 1/9] drm/i915: Update rules for reading cache lines through the LLC Ville Syrjälä
  8 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2013-08-08 13:41 UTC (permalink / raw)
  To: intel-gfx

This is primarily for the benefit of the create2 ioctl so that the
caller can avoid the later step of rebinding the bo with new PTE bits.
After introducing WT (and possibly GFDT) cacheing for display targets,
not everything in the display is earmarked as UC, and more importantly
what is is controlled by the kernel.

Note that set_cache_level/get_cache_level for DISPLAY is not necessarily
idempotent; get_cache_level may return UC for architectures that have no
special cache domain for the display engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 7 +++++++
 include/uapi/drm/i915_drm.h     | 1 +
 2 files changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2b897f2..e68a129 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3557,6 +3557,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
 		args->caching = I915_CACHING_CACHED;
 		break;
 
+	case I915_CACHE_WT:
+		args->caching = I915_CACHING_DISPLAY;
+		break;
+
 	default:
 		args->caching = I915_CACHING_NONE;
 		break;
@@ -3583,6 +3587,9 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
 	case I915_CACHING_CACHED:
 		level = I915_CACHE_LLC;
 		break;
+	case I915_CACHING_DISPLAY:
+		level = HAS_WT(dev) ? I915_CACHE_WT : I915_CACHE_NONE;
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 39d14b3..40582e5 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -853,6 +853,7 @@ struct drm_i915_gem_busy {
 
 #define I915_CACHING_NONE		0
 #define I915_CACHING_CACHED		1
+#define I915_CACHING_DISPLAY		2
 
 struct drm_i915_gem_caching {
 	/**
-- 
1.8.4.rc1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/9] drm/i915: Update rules for writing through the LLC with the cpu
  2013-08-08 13:41 ` [PATCH 3/9] drm/i915: Update rules for writing through the LLC with the cpu Chris Wilson
@ 2013-08-08 15:27   ` Ville Syrjälä
  2013-08-08 15:36     ` Chris Wilson
  2013-08-09 11:26   ` [PATCH] " Chris Wilson
  1 sibling, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2013-08-08 15:27 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Thu, Aug 08, 2013 at 02:41:05PM +0100, Chris Wilson wrote:
> As mentioned in the previous commit, reads and writes from both the CPU
> and GPU go through the LLC. This gives us coherency between the CPU and
> GPU irrespective of the attribute settings either device sets. We can
> use to avoid having to clflush even uncached memory.
> 
> Except for the scanout.
> 
> The scanout resides within another functional block that does not use
> the LLC but reads directly from main memory. So in order to maintain
> coherency with the scanout, writes to uncached memory must be flushed.
> In order to optimize writes elsewhere, we start tracking whether an
> framebuffer is attached to an object.
> 
> v2: Use pin_display tracking rather than fb_count (to ensure we flush
> cursors as well etc) and only force the clflush along explicit writes to
> the scanout paths (i.e. pin_to_display_plane and pwrite into scanout).
> 
> Based on a patch by Ville Syrjälä.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h            |  2 +-
>  drivers/gpu/drm/i915/i915_gem.c            | 58 ++++++++++++++++--------------
>  drivers/gpu/drm/i915/i915_gem_exec.c       |  2 +-
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.c        |  2 +-
>  5 files changed, 36 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 3622ec2..1ffae08 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1863,7 +1863,7 @@ static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
>  }
>  
>  void i915_gem_reset(struct drm_device *dev);
> -void i915_gem_clflush_object(struct drm_i915_gem_object *obj);
> +void i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
>  int __must_check i915_gem_object_set_domain(struct drm_i915_gem_object *obj,
>  					    uint32_t read_domains,
>  					    uint32_t write_domain);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c5e03ba..78535e9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -38,7 +38,8 @@
>  #include <linux/dma-buf.h>
>  
>  static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
> -static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
> +static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
> +						   bool force);
>  static __must_check int
>  i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
>  			   struct i915_address_space *vm,
> @@ -68,6 +69,14 @@ static bool cpu_cache_is_coherent(struct drm_device *dev,
>  	return HAS_LLC(dev) || level != I915_CACHE_NONE;
>  }
>  
> +static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> +{
> +	if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
> +		return true;
> +
> +	return obj->pin_display;
> +}
> +
>  static inline void i915_gem_object_fence_lost(struct drm_i915_gem_object *obj)
>  {
>  	if (obj->tiling_mode)
> @@ -830,8 +839,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
>  		 * write domain and manually flush cachelines (if required). This
>  		 * optimizes for the case when the gpu will use the data
>  		 * right away and we therefore have to clflush anyway. */
> -		if (obj->cache_level == I915_CACHE_NONE)
> -			needs_clflush_after = 1;
> +		needs_clflush_after = cpu_write_needs_clflush(obj);
>  		if (i915_gem_obj_bound_any(obj)) {
>  			ret = i915_gem_object_set_to_gtt_domain(obj, true);
>  			if (ret)
> @@ -921,7 +929,7 @@ out:
>  		 */
>  		if (!needs_clflush_after &&
>  		    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> -			i915_gem_clflush_object(obj);
> +			i915_gem_clflush_object(obj, false);

Shouldn't that be i915_gem_clflush_object(obj, obj->pin_display) ?

>  			i915_gem_chipset_flush(dev);
>  		}
>  	}

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/9] drm/i915: Update rules for writing through the LLC with the cpu
  2013-08-08 15:27   ` Ville Syrjälä
@ 2013-08-08 15:36     ` Chris Wilson
  2013-08-08 15:51       ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2013-08-08 15:36 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Thu, Aug 08, 2013 at 06:27:12PM +0300, Ville Syrjälä wrote:
> On Thu, Aug 08, 2013 at 02:41:05PM +0100, Chris Wilson wrote:
> >  		if (!needs_clflush_after &&
> >  		    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> > -			i915_gem_clflush_object(obj);
> > +			i915_gem_clflush_object(obj, false);
> 
> Shouldn't that be i915_gem_clflush_object(obj, obj->pin_display) ?

!needs_clflush_after implies that we cache-coherent and not writing to a
scanout, so obj->pin_display must be false here.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/9] drm/i915: Update rules for writing through the LLC with the cpu
  2013-08-08 15:36     ` Chris Wilson
@ 2013-08-08 15:51       ` Ville Syrjälä
  2013-08-08 16:12         ` Chris Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2013-08-08 15:51 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On Thu, Aug 08, 2013 at 04:36:35PM +0100, Chris Wilson wrote:
> On Thu, Aug 08, 2013 at 06:27:12PM +0300, Ville Syrjälä wrote:
> > On Thu, Aug 08, 2013 at 02:41:05PM +0100, Chris Wilson wrote:
> > >  		if (!needs_clflush_after &&
> > >  		    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> > > -			i915_gem_clflush_object(obj);
> > > +			i915_gem_clflush_object(obj, false);
> > 
> > Shouldn't that be i915_gem_clflush_object(obj, obj->pin_display) ?
> 
> !needs_clflush_after implies that we cache-coherent and not writing to a
> scanout, so obj->pin_display must be false here.

But we dropped the lock in the slow path, so couldn't it have changed?

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/9] drm/i915: Update rules for writing through the LLC with the cpu
  2013-08-08 15:51       ` Ville Syrjälä
@ 2013-08-08 16:12         ` Chris Wilson
  0 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2013-08-08 16:12 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Thu, Aug 08, 2013 at 06:51:26PM +0300, Ville Syrjälä wrote:
> On Thu, Aug 08, 2013 at 04:36:35PM +0100, Chris Wilson wrote:
> > On Thu, Aug 08, 2013 at 06:27:12PM +0300, Ville Syrjälä wrote:
> > > On Thu, Aug 08, 2013 at 02:41:05PM +0100, Chris Wilson wrote:
> > > >  		if (!needs_clflush_after &&
> > > >  		    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> > > > -			i915_gem_clflush_object(obj);
> > > > +			i915_gem_clflush_object(obj, false);
> > > 
> > > Shouldn't that be i915_gem_clflush_object(obj, obj->pin_display) ?
> > 
> > !needs_clflush_after implies that we cache-coherent and not writing to a
> > scanout, so obj->pin_display must be false here.
> 
> But we dropped the lock in the slow path, so couldn't it have changed?

Que sera, sera. The contents after userspace races with itself are going
to be fairly arbitrary. Doing the flush is likely to be better than
not...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/9] drm/i915: Update rules for reading cache lines through the LLC
  2013-08-08 13:41 [PATCH 1/9] drm/i915: Update rules for reading cache lines through the LLC Chris Wilson
                   ` (7 preceding siblings ...)
  2013-08-08 13:41 ` [PATCH 9/9] drm/i915: Allow the user to set bo into the DISPLAY cache domain Chris Wilson
@ 2013-08-08 16:42 ` Ville Syrjälä
  8 siblings, 0 replies; 28+ messages in thread
From: Ville Syrjälä @ 2013-08-08 16:42 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Thu, Aug 08, 2013 at 02:41:03PM +0100, Chris Wilson wrote:
> The LLC is a fun device. The cache is a distinct functional block within
> the SA that arbitrates access from both the CPU and GPU cores. As such
> all writes to memory land first in the LLC before further action is
> taken. For example, an uncached write from either the CPU or GPU will
> then proceed to memory and evict the cacheline from the LLC. This means that
> a read from the LLC always returns the correct information even if the PTE
> bit in the GPU differs from the PAT bit in the CPU. For the older
> snooping architecture on non-LLC, the fundamental principle still holds
> except that some coordination is required between the CPU and GPU to
> explicitly perform the snooping (which is handled by our request
> tracking).
> 
> The upshot of this is that we know that we can issue a read from either
> LLC devices or snoopable memory and trust the contents of the cache -
> i.e. we can forgo a clflush before a read in these circumstances.
> Writing to memory from the CPU is a little more tricky as we have to
> consider that the scanout does not read from the CPU cache at all, but
> from main memory. So we have to currently treat all requests to write to
> uncached memory as having to be flushed to main memory for coherency
> with all consumers.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>

I've read through this series a few times now and haven't found any
monsters.

So, I only found these two small issues:
- is_pin_display() confused me and I suspect it could confuse others
  as well, so a comment would be nice
- the pwrite flush after taking the slowpath in 3/9

For everything else:
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] drm/i915: Track when an object is pinned for use by the display engine
  2013-08-08 13:41 ` [PATCH 2/9] drm/i915: Track when an object is pinned for use by the display engine Chris Wilson
@ 2013-08-09 11:25   ` Chris Wilson
  2013-08-09 11:54     ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2013-08-09 11:25 UTC (permalink / raw)
  To: intel-gfx

The display engine has unique coherency rules such that it requires
special handling to ensure that all writes to cursors, scanouts and
sprites are clflushed. This patch introduces the infrastructure to
simply track when an object is being accessed by the display engine.

v2: Explain the is_pin_display() magic as the sources for obj->pin_count
and their individual rules is not obvious. (Ville)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  2 ++
 drivers/gpu/drm/i915/i915_drv.h      |  2 ++
 drivers/gpu/drm/i915/i915_gem.c      | 36 ++++++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_display.c |  8 ++++----
 4 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b06e6ce..463c660 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -117,6 +117,8 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		seq_printf(m, " (name: %d)", obj->base.name);
 	if (obj->pin_count)
 		seq_printf(m, " (pinned x %d)", obj->pin_count);
+	if (obj->pin_display)
+		seq_printf(m, " (display)");
 	if (obj->fence_reg != I915_FENCE_REG_NONE)
 		seq_printf(m, " (fence: %d)", obj->fence_reg);
 	list_for_each_entry(vma, &obj->vma_list, vma_link) {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 957f22e..a6a1cc3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1379,6 +1379,7 @@ struct drm_i915_gem_object {
 	 */
 	unsigned int fault_mappable:1;
 	unsigned int pin_mappable:1;
+	unsigned int pin_display:1;
 
 	/*
 	 * Is the GPU currently using a fence to access this buffer,
@@ -1888,6 +1889,7 @@ int __must_check
 i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 				     u32 alignment,
 				     struct intel_ring_buffer *pipelined);
+void i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj);
 int i915_gem_attach_phys_object(struct drm_device *dev,
 				struct drm_i915_gem_object *obj,
 				int id,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 065f927..3880f05 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3583,6 +3583,22 @@ unlock:
 	return ret;
 }
 
+static bool is_pin_display(struct drm_i915_gem_object *obj)
+{
+	/* There are 3 sources that pin objects:
+	 *   1. The display engine (scanouts, sprites, cursors);
+	 *   2. Reservations for execbuffer;
+	 *   3. The user.
+	 *
+	 * We can ignore reservations as we hold the struct_mutex and
+	 * are only called outside of the reservation path.  The user
+	 * can only increment pin_count once, and so if after
+	 * subtracting the potential reference by the user, any pin_count
+	 * remains, it must be due to another use by the display engine.
+	 */
+	return obj->pin_count - !!obj->user_pin_count;
+}
+
 /*
  * Prepare buffer for display plane (scanout, cursors, etc).
  * Can be called from an uninterruptible phase (modesetting) and allows
@@ -3602,6 +3618,11 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 			return ret;
 	}
 
+	/* Mark the pin_display early so that we account for the
+	 * display coherency whilst setting up the cache domains.
+	 */
+	obj->pin_display = true;
+
 	/* The display engine is not coherent with the LLC cache on gen6.  As
 	 * a result, we make sure that the pinning that is about to occur is
 	 * done with uncached PTEs. This is lowest common denominator for all
@@ -3613,7 +3634,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	 */
 	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_NONE);
 	if (ret)
-		return ret;
+		goto err_unpin_display;
 
 	/* As the user may map the buffer once pinned in the display plane
 	 * (e.g. libkms for the bootup splash), we have to ensure that we
@@ -3621,7 +3642,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	 */
 	ret = i915_gem_obj_ggtt_pin(obj, alignment, true, false);
 	if (ret)
-		return ret;
+		goto err_unpin_display;
 
 	i915_gem_object_flush_cpu_write_domain(obj);
 
@@ -3639,6 +3660,17 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 					    old_write_domain);
 
 	return 0;
+
+err_unpin_display:
+	obj->pin_display = is_pin_display(obj);
+	return ret;
+}
+
+void
+i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj)
+{
+	i915_gem_object_unpin(obj);
+	obj->pin_display = is_pin_display(obj);
 }
 
 int
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 3bbbbe4..809b968 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -1887,7 +1887,7 @@ intel_pin_and_fence_fb_obj(struct drm_device *dev,
 	return 0;
 
 err_unpin:
-	i915_gem_object_unpin(obj);
+	i915_gem_object_unpin_from_display_plane(obj);
 err_interruptible:
 	dev_priv->mm.interruptible = true;
 	return ret;
@@ -1896,7 +1896,7 @@ err_interruptible:
 void intel_unpin_fb_obj(struct drm_i915_gem_object *obj)
 {
 	i915_gem_object_unpin_fence(obj);
-	i915_gem_object_unpin(obj);
+	i915_gem_object_unpin_from_display_plane(obj);
 }
 
 /* Computes the linear offset to the base tile and adjusts x, y. bytes per pixel
@@ -6769,7 +6769,7 @@ static int intel_crtc_cursor_set(struct drm_crtc *crtc,
 			if (intel_crtc->cursor_bo != obj)
 				i915_gem_detach_phys_object(dev, intel_crtc->cursor_bo);
 		} else
-			i915_gem_object_unpin(intel_crtc->cursor_bo);
+			i915_gem_object_unpin_from_display_plane(intel_crtc->cursor_bo);
 		drm_gem_object_unreference(&intel_crtc->cursor_bo->base);
 	}
 
@@ -6784,7 +6784,7 @@ static int intel_crtc_cursor_set(struct drm_crtc *crtc,
 
 	return 0;
 fail_unpin:
-	i915_gem_object_unpin(obj);
+	i915_gem_object_unpin_from_display_plane(obj);
 fail_locked:
 	mutex_unlock(&dev->struct_mutex);
 fail:
-- 
1.8.4.rc1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH] drm/i915: Update rules for writing through the LLC with the cpu
  2013-08-08 13:41 ` [PATCH 3/9] drm/i915: Update rules for writing through the LLC with the cpu Chris Wilson
  2013-08-08 15:27   ` Ville Syrjälä
@ 2013-08-09 11:26   ` Chris Wilson
  2013-08-09 11:56     ` Ville Syrjälä
  1 sibling, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2013-08-09 11:26 UTC (permalink / raw)
  To: intel-gfx

As mentioned in the previous commit, reads and writes from both the CPU
and GPU go through the LLC. This gives us coherency between the CPU and
GPU irrespective of the attribute settings either device sets. We can
use to avoid having to clflush even uncached memory.

Except for the scanout.

The scanout resides within another functional block that does not use
the LLC but reads directly from main memory. So in order to maintain
coherency with the scanout, writes to uncached memory must be flushed.
In order to optimize writes elsewhere, we start tracking whether an
framebuffer is attached to an object.

v2: Use pin_display tracking rather than fb_count (to ensure we flush
cursors as well etc) and only force the clflush along explicit writes to
the scanout paths (i.e. pin_to_display_plane and pwrite into scanout).

v3: Force the flush after hitting the slowpath in pwrite, as after
dropping the lock the object's cache domain may be invalidated. (Ville)

Based on a patch by Ville Syrjälä.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  2 +-
 drivers/gpu/drm/i915/i915_gem.c            | 58 ++++++++++++++++--------------
 drivers/gpu/drm/i915/i915_gem_exec.c       |  2 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        |  2 +-
 5 files changed, 36 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a6a1cc3..48a5463 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1860,7 +1860,7 @@ static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
 }
 
 void i915_gem_reset(struct drm_device *dev);
-void i915_gem_clflush_object(struct drm_i915_gem_object *obj);
+void i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
 int __must_check i915_gem_object_set_tiling(struct drm_i915_gem_object *obj,
 					    int tiling_mode, int pitch);
 int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3880f05..650e0ac 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -38,7 +38,8 @@
 #include <linux/dma-buf.h>
 
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
-static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
+static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
+						   bool force);
 static __must_check int
 i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 			   struct i915_address_space *vm,
@@ -68,6 +69,14 @@ static bool cpu_cache_is_coherent(struct drm_device *dev,
 	return HAS_LLC(dev) || level != I915_CACHE_NONE;
 }
 
+static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
+{
+	if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
+		return true;
+
+	return obj->pin_display;
+}
+
 static inline void i915_gem_object_fence_lost(struct drm_i915_gem_object *obj)
 {
 	if (obj->tiling_mode)
@@ -830,8 +839,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		 * write domain and manually flush cachelines (if required). This
 		 * optimizes for the case when the gpu will use the data
 		 * right away and we therefore have to clflush anyway. */
-		if (obj->cache_level == I915_CACHE_NONE)
-			needs_clflush_after = 1;
+		needs_clflush_after = cpu_write_needs_clflush(obj);
 		if (i915_gem_obj_bound_any(obj)) {
 			ret = i915_gem_object_set_to_gtt_domain(obj, true);
 			if (ret)
@@ -921,7 +929,7 @@ out:
 		 */
 		if (!needs_clflush_after &&
 		    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
-			i915_gem_clflush_object(obj);
+			i915_gem_clflush_object(obj, obj->pin_display);
 			i915_gem_chipset_flush(dev);
 		}
 	}
@@ -999,9 +1007,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
-	if (obj->cache_level == I915_CACHE_NONE &&
-	    obj->tiling_mode == I915_TILING_NONE &&
-	    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
+	if (obj->tiling_mode == I915_TILING_NONE &&
+	    obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
+	    cpu_write_needs_clflush(obj)) {
 		ret = i915_gem_gtt_pwrite_fast(dev, obj, args, file);
 		/* Note that the gtt paths might fail with non-page-backed user
 		 * pointers (e.g. gtt mappings when moving data between
@@ -1350,8 +1358,8 @@ i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 	}
 
 	/* Pinned buffers may be scanout, so flush the cache */
-	if (obj->pin_count)
-		i915_gem_object_flush_cpu_write_domain(obj);
+	if (obj->pin_display)
+		i915_gem_object_flush_cpu_write_domain(obj, true);
 
 	drm_gem_object_unreference(&obj->base);
 unlock:
@@ -1721,7 +1729,7 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
 		 * hope for the best.
 		 */
 		WARN_ON(ret != -EIO);
-		i915_gem_clflush_object(obj);
+		i915_gem_clflush_object(obj, true);
 		obj->base.read_domains = obj->base.write_domain = I915_GEM_DOMAIN_CPU;
 	}
 
@@ -3299,7 +3307,8 @@ err_unpin:
 }
 
 void
-i915_gem_clflush_object(struct drm_i915_gem_object *obj)
+i915_gem_clflush_object(struct drm_i915_gem_object *obj,
+			bool force)
 {
 	/* If we don't have a page list set up, then we're not pinned
 	 * to GPU, and we can ignore the cache flush because it'll happen
@@ -3323,7 +3332,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj)
 	 * snooping behaviour occurs naturally as the result of our domain
 	 * tracking.
 	 */
-	if (obj->cache_level != I915_CACHE_NONE)
+	if (!force && cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
 		return;
 
 	trace_i915_gem_object_clflush(obj);
@@ -3360,14 +3369,15 @@ i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
 
 /** Flushes the CPU write domain for the object if it's dirty. */
 static void
-i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj)
+i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
+				       bool force)
 {
 	uint32_t old_write_domain;
 
 	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU)
 		return;
 
-	i915_gem_clflush_object(obj);
+	i915_gem_clflush_object(obj, force);
 	i915_gem_chipset_flush(obj->base.dev);
 	old_write_domain = obj->base.write_domain;
 	obj->base.write_domain = 0;
@@ -3401,7 +3411,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	if (ret)
 		return ret;
 
-	i915_gem_object_flush_cpu_write_domain(obj);
+	i915_gem_object_flush_cpu_write_domain(obj, false);
 
 	/* Serialise direct access to this object with the barriers for
 	 * coherent writes from the GPU, by effectively invalidating the
@@ -3491,7 +3501,11 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 					       obj, cache_level);
 	}
 
-	if (cache_level == I915_CACHE_NONE) {
+	list_for_each_entry(vma, &obj->vma_list, vma_link)
+		vma->node.color = cache_level;
+	obj->cache_level = cache_level;
+
+	if (cpu_write_needs_clflush(obj)) {
 		u32 old_read_domains, old_write_domain;
 
 		/* If we're coming from LLC cached, then we haven't
@@ -3514,9 +3528,6 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 						    old_write_domain);
 	}
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link)
-		vma->node.color = cache_level;
-	obj->cache_level = cache_level;
 	i915_gem_verify_gtt(dev);
 	return 0;
 }
@@ -3644,7 +3655,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	if (ret)
 		goto err_unpin_display;
 
-	i915_gem_object_flush_cpu_write_domain(obj);
+	i915_gem_object_flush_cpu_write_domain(obj, true);
 
 	old_write_domain = obj->base.write_domain;
 	old_read_domains = obj->base.read_domains;
@@ -3716,8 +3727,7 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 
 	/* Flush the CPU cache if it's still invalid. */
 	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
-		if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
-			i915_gem_clflush_object(obj);
+		i915_gem_clflush_object(obj, false);
 
 		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
 	}
@@ -3899,10 +3909,6 @@ i915_gem_pin_ioctl(struct drm_device *dev, void *data,
 	obj->user_pin_count++;
 	obj->pin_filp = file;
 
-	/* XXX - flush the CPU caches for pinned objects
-	 * as the X server doesn't manage domains yet
-	 */
-	i915_gem_object_flush_cpu_write_domain(obj);
 	args->offset = i915_gem_obj_ggtt_offset(obj);
 out:
 	drm_gem_object_unreference(&obj->base);
diff --git a/drivers/gpu/drm/i915/i915_gem_exec.c b/drivers/gpu/drm/i915/i915_gem_exec.c
index d2ac077..6af0067 100644
--- a/drivers/gpu/drm/i915/i915_gem_exec.c
+++ b/drivers/gpu/drm/i915/i915_gem_exec.c
@@ -50,7 +50,7 @@ static int i915_gem_exec_flush_object(struct drm_i915_gem_object *obj,
 		return ret;
 
 	if (obj->base.write_domain & I915_GEM_DOMAIN_CPU) {
-		i915_gem_clflush_object(obj);
+		i915_gem_clflush_object(obj, false);
 		i915_gem_chipset_flush(obj->base.dev);
 		obj->base.write_domain &= ~I915_GEM_DOMAIN_CPU;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8ccc29a..e999578 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -716,7 +716,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
 			return ret;
 
 		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
-			i915_gem_clflush_object(obj);
+			i915_gem_clflush_object(obj, false);
 
 		flush_domains |= obj->base.write_domain;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d29b1db..89d8cc8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -487,7 +487,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 				       dev_priv->gtt.base.total / PAGE_SIZE);
 
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		i915_gem_clflush_object(obj);
+		i915_gem_clflush_object(obj, obj->pin_display);
 		i915_gem_gtt_bind_object(obj, obj->cache_level);
 	}
 
-- 
1.8.4.rc1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] drm/i915: Track when an object is pinned for use by the display engine
  2013-08-09 11:25   ` [PATCH] " Chris Wilson
@ 2013-08-09 11:54     ` Ville Syrjälä
  0 siblings, 0 replies; 28+ messages in thread
From: Ville Syrjälä @ 2013-08-09 11:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Aug 09, 2013 at 12:25:09PM +0100, Chris Wilson wrote:
> The display engine has unique coherency rules such that it requires
> special handling to ensure that all writes to cursors, scanouts and
> sprites are clflushed. This patch introduces the infrastructure to
> simply track when an object is being accessed by the display engine.
> 
> v2: Explain the is_pin_display() magic as the sources for obj->pin_count
> and their individual rules is not obvious. (Ville)
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c  |  2 ++
>  drivers/gpu/drm/i915/i915_drv.h      |  2 ++
>  drivers/gpu/drm/i915/i915_gem.c      | 36 ++++++++++++++++++++++++++++++++++--
>  drivers/gpu/drm/i915/intel_display.c |  8 ++++----
>  4 files changed, 42 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index b06e6ce..463c660 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -117,6 +117,8 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>  		seq_printf(m, " (name: %d)", obj->base.name);
>  	if (obj->pin_count)
>  		seq_printf(m, " (pinned x %d)", obj->pin_count);
> +	if (obj->pin_display)
> +		seq_printf(m, " (display)");
>  	if (obj->fence_reg != I915_FENCE_REG_NONE)
>  		seq_printf(m, " (fence: %d)", obj->fence_reg);
>  	list_for_each_entry(vma, &obj->vma_list, vma_link) {
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 957f22e..a6a1cc3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1379,6 +1379,7 @@ struct drm_i915_gem_object {
>  	 */
>  	unsigned int fault_mappable:1;
>  	unsigned int pin_mappable:1;
> +	unsigned int pin_display:1;
>  
>  	/*
>  	 * Is the GPU currently using a fence to access this buffer,
> @@ -1888,6 +1889,7 @@ int __must_check
>  i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>  				     u32 alignment,
>  				     struct intel_ring_buffer *pipelined);
> +void i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj);
>  int i915_gem_attach_phys_object(struct drm_device *dev,
>  				struct drm_i915_gem_object *obj,
>  				int id,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 065f927..3880f05 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3583,6 +3583,22 @@ unlock:
>  	return ret;
>  }
>  
> +static bool is_pin_display(struct drm_i915_gem_object *obj)
> +{
> +	/* There are 3 sources that pin objects:
> +	 *   1. The display engine (scanouts, sprites, cursors);
> +	 *   2. Reservations for execbuffer;
> +	 *   3. The user.
> +	 *
> +	 * We can ignore reservations as we hold the struct_mutex and
> +	 * are only called outside of the reservation path.  The user
> +	 * can only increment pin_count once, and so if after
> +	 * subtracting the potential reference by the user, any pin_count
> +	 * remains, it must be due to another use by the display engine.
> +	 */
> +	return obj->pin_count - !!obj->user_pin_count;
> +}
> +
>  /*
>   * Prepare buffer for display plane (scanout, cursors, etc).
>   * Can be called from an uninterruptible phase (modesetting) and allows
> @@ -3602,6 +3618,11 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>  			return ret;
>  	}
>  
> +	/* Mark the pin_display early so that we account for the
> +	 * display coherency whilst setting up the cache domains.
> +	 */
> +	obj->pin_display = true;
> +
>  	/* The display engine is not coherent with the LLC cache on gen6.  As
>  	 * a result, we make sure that the pinning that is about to occur is
>  	 * done with uncached PTEs. This is lowest common denominator for all
> @@ -3613,7 +3634,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>  	 */
>  	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_NONE);
>  	if (ret)
> -		return ret;
> +		goto err_unpin_display;
>  
>  	/* As the user may map the buffer once pinned in the display plane
>  	 * (e.g. libkms for the bootup splash), we have to ensure that we
> @@ -3621,7 +3642,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>  	 */
>  	ret = i915_gem_obj_ggtt_pin(obj, alignment, true, false);
>  	if (ret)
> -		return ret;
> +		goto err_unpin_display;
>  
>  	i915_gem_object_flush_cpu_write_domain(obj);
>  
> @@ -3639,6 +3660,17 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>  					    old_write_domain);
>  
>  	return 0;
> +
> +err_unpin_display:
> +	obj->pin_display = is_pin_display(obj);
> +	return ret;
> +}
> +
> +void
> +i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj)
> +{
> +	i915_gem_object_unpin(obj);
> +	obj->pin_display = is_pin_display(obj);
>  }
>  
>  int
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 3bbbbe4..809b968 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -1887,7 +1887,7 @@ intel_pin_and_fence_fb_obj(struct drm_device *dev,
>  	return 0;
>  
>  err_unpin:
> -	i915_gem_object_unpin(obj);
> +	i915_gem_object_unpin_from_display_plane(obj);
>  err_interruptible:
>  	dev_priv->mm.interruptible = true;
>  	return ret;
> @@ -1896,7 +1896,7 @@ err_interruptible:
>  void intel_unpin_fb_obj(struct drm_i915_gem_object *obj)
>  {
>  	i915_gem_object_unpin_fence(obj);
> -	i915_gem_object_unpin(obj);
> +	i915_gem_object_unpin_from_display_plane(obj);
>  }
>  
>  /* Computes the linear offset to the base tile and adjusts x, y. bytes per pixel
> @@ -6769,7 +6769,7 @@ static int intel_crtc_cursor_set(struct drm_crtc *crtc,
>  			if (intel_crtc->cursor_bo != obj)
>  				i915_gem_detach_phys_object(dev, intel_crtc->cursor_bo);
>  		} else
> -			i915_gem_object_unpin(intel_crtc->cursor_bo);
> +			i915_gem_object_unpin_from_display_plane(intel_crtc->cursor_bo);
>  		drm_gem_object_unreference(&intel_crtc->cursor_bo->base);
>  	}
>  
> @@ -6784,7 +6784,7 @@ static int intel_crtc_cursor_set(struct drm_crtc *crtc,
>  
>  	return 0;
>  fail_unpin:
> -	i915_gem_object_unpin(obj);
> +	i915_gem_object_unpin_from_display_plane(obj);
>  fail_locked:
>  	mutex_unlock(&dev->struct_mutex);
>  fail:
> -- 
> 1.8.4.rc1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] drm/i915: Update rules for writing through the LLC with the cpu
  2013-08-09 11:26   ` [PATCH] " Chris Wilson
@ 2013-08-09 11:56     ` Ville Syrjälä
  0 siblings, 0 replies; 28+ messages in thread
From: Ville Syrjälä @ 2013-08-09 11:56 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Aug 09, 2013 at 12:26:45PM +0100, Chris Wilson wrote:
> As mentioned in the previous commit, reads and writes from both the CPU
> and GPU go through the LLC. This gives us coherency between the CPU and
> GPU irrespective of the attribute settings either device sets. We can
> use to avoid having to clflush even uncached memory.
> 
> Except for the scanout.
> 
> The scanout resides within another functional block that does not use
> the LLC but reads directly from main memory. So in order to maintain
> coherency with the scanout, writes to uncached memory must be flushed.
> In order to optimize writes elsewhere, we start tracking whether an
> framebuffer is attached to an object.
> 
> v2: Use pin_display tracking rather than fb_count (to ensure we flush
> cursors as well etc) and only force the clflush along explicit writes to
> the scanout paths (i.e. pin_to_display_plane and pwrite into scanout).
> 
> v3: Force the flush after hitting the slowpath in pwrite, as after
> dropping the lock the object's cache domain may be invalidated. (Ville)
> 
> Based on a patch by Ville Syrjälä.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/9] drm/i915: Allocate context objects from stolen
  2013-08-08 13:41 ` [PATCH 6/9] drm/i915: Allocate context objects " Chris Wilson
@ 2013-08-10  9:25   ` Daniel Vetter
  2013-08-10  9:34     ` Chris Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2013-08-10  9:25 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Thu, Aug 08, 2013 at 02:41:08PM +0100, Chris Wilson wrote:
> Once again, the CPU PAT bits are irrelevant when considering the GPU
> cacheing, and context objects are never accessed from the CPU or
> directly by userspace making them another ideal candidate to allocate
> from stolen memory.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I think this will break hibernate, since over hibernate stolen get some
garbage. And userspace (and the gpu) probably don't expect garbage when
trying to restore a hw context.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index a57d49a..498f8a0 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -141,6 +141,7 @@ create_hw_context(struct drm_device *dev,
>  		  struct drm_i915_file_private *file_priv)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> +	const int size = dev_priv->hw_context_size;
>  	struct i915_hw_context *ctx;
>  	int ret;
>  
> @@ -149,7 +150,9 @@ create_hw_context(struct drm_device *dev,
>  		return ERR_PTR(-ENOMEM);
>  
>  	kref_init(&ctx->ref);
> -	ctx->obj = i915_gem_alloc_object(dev, dev_priv->hw_context_size);
> +	ctx->obj = i915_gem_object_create_stolen(dev, size);
> +	if (ctx->obj == NULL)
> +		ctx->obj = i915_gem_alloc_object(dev, size);
>  	if (ctx->obj == NULL) {
>  		kfree(ctx);
>  		DRM_DEBUG_DRIVER("Context object allocated failed\n");
> -- 
> 1.8.4.rc1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/9] drm/i915: Allocate context objects from stolen
  2013-08-10  9:25   ` Daniel Vetter
@ 2013-08-10  9:34     ` Chris Wilson
  2013-08-10  9:44       ` Daniel Vetter
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2013-08-10  9:34 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Sat, Aug 10, 2013 at 11:25:46AM +0200, Daniel Vetter wrote:
> On Thu, Aug 08, 2013 at 02:41:08PM +0100, Chris Wilson wrote:
> > Once again, the CPU PAT bits are irrelevant when considering the GPU
> > cacheing, and context objects are never accessed from the CPU or
> > directly by userspace making them another ideal candidate to allocate
> > from stolen memory.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> I think this will break hibernate, since over hibernate stolen get some
> garbage. And userspace (and the gpu) probably don't expect garbage when
> trying to restore a hw context.

Same will be true for framebuffers, which we definitely want to use
stolen for. Suggests we will need to allocate temporaries across
hibernate or warn userspace that the contents are garbage.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/9] drm/i915: Allocate context objects from stolen
  2013-08-10  9:34     ` Chris Wilson
@ 2013-08-10  9:44       ` Daniel Vetter
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Vetter @ 2013-08-10  9:44 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx

On Sat, Aug 10, 2013 at 10:34:29AM +0100, Chris Wilson wrote:
> On Sat, Aug 10, 2013 at 11:25:46AM +0200, Daniel Vetter wrote:
> > On Thu, Aug 08, 2013 at 02:41:08PM +0100, Chris Wilson wrote:
> > > Once again, the CPU PAT bits are irrelevant when considering the GPU
> > > cacheing, and context objects are never accessed from the CPU or
> > > directly by userspace making them another ideal candidate to allocate
> > > from stolen memory.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > I think this will break hibernate, since over hibernate stolen get some
> > garbage. And userspace (and the gpu) probably don't expect garbage when
> > trying to restore a hw context.
> 
> Same will be true for framebuffers, which we definitely want to use
> stolen for. Suggests we will need to allocate temporaries across
> hibernate or warn userspace that the contents are garbage.

Yeah, it's not just an issue for contexts but for everything that we don't
fully reinit upon wake-up from hibernate. But I have no idea how
allocating a temporary shmem node when we do hibernate interferes with the
hibernate's page cache shrinker ... Maybe we need to simply clear to black
upon resume. The other solution would be to simply give back all the
leftover stolen mem to the linux kernel, maybe as a cma block. And teach
our allocation functions more smarts to grab hugepages (or cma junks for
the fb compressed buffer).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 9/9] drm/i915: Allow the user to set bo into the DISPLAY cache domain
  2013-08-08 13:41 ` [PATCH 9/9] drm/i915: Allow the user to set bo into the DISPLAY cache domain Chris Wilson
@ 2013-08-10 10:09   ` Daniel Vetter
  2013-08-10 10:19     ` Chris Wilson
  2013-08-12 16:53     ` [PATCH 9/9] drm/i915: Allow the user to set bo into the DISPLAY cache domain Daniel Vetter
  0 siblings, 2 replies; 28+ messages in thread
From: Daniel Vetter @ 2013-08-10 10:09 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Thu, Aug 08, 2013 at 02:41:11PM +0100, Chris Wilson wrote:
> This is primarily for the benefit of the create2 ioctl so that the
> caller can avoid the later step of rebinding the bo with new PTE bits.
> After introducing WT (and possibly GFDT) cacheing for display targets,
> not everything in the display is earmarked as UC, and more importantly
> what is is controlled by the kernel.
> 
> Note that set_cache_level/get_cache_level for DISPLAY is not necessarily
> idempotent; get_cache_level may return UC for architectures that have no
> special cache domain for the display engine.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

You know the drill: A bit of igt testcoverage would be neat, I think
simply adding the display domain everywhere we test uncached/snooped
already should be more than good enough. So I'll punt on this one here for
now, all other patches (with the exception of the hw context from stolen
one) are merged to dinq.

Thanks, Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 7 +++++++
>  include/uapi/drm/i915_drm.h     | 1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 2b897f2..e68a129 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3557,6 +3557,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>  		args->caching = I915_CACHING_CACHED;
>  		break;
>  
> +	case I915_CACHE_WT:
> +		args->caching = I915_CACHING_DISPLAY;
> +		break;
> +
>  	default:
>  		args->caching = I915_CACHING_NONE;
>  		break;
> @@ -3583,6 +3587,9 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>  	case I915_CACHING_CACHED:
>  		level = I915_CACHE_LLC;
>  		break;
> +	case I915_CACHING_DISPLAY:
> +		level = HAS_WT(dev) ? I915_CACHE_WT : I915_CACHE_NONE;
> +		break;
>  	default:
>  		return -EINVAL;
>  	}
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 39d14b3..40582e5 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -853,6 +853,7 @@ struct drm_i915_gem_busy {
>  
>  #define I915_CACHING_NONE		0
>  #define I915_CACHING_CACHED		1
> +#define I915_CACHING_DISPLAY		2
>  
>  struct drm_i915_gem_caching {
>  	/**
> -- 
> 1.8.4.rc1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 9/9] drm/i915: Allow the user to set bo into the DISPLAY cache domain
  2013-08-10 10:09   ` Daniel Vetter
@ 2013-08-10 10:19     ` Chris Wilson
  2013-08-10 12:54       ` [PATCH] drm/i915: reserve I915_CACHING_DISPLAY and document cache modes Daniel Vetter
  2013-08-10 12:57       ` Daniel Vetter
  2013-08-12 16:53     ` [PATCH 9/9] drm/i915: Allow the user to set bo into the DISPLAY cache domain Daniel Vetter
  1 sibling, 2 replies; 28+ messages in thread
From: Chris Wilson @ 2013-08-10 10:19 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Sat, Aug 10, 2013 at 12:09:26PM +0200, Daniel Vetter wrote:
> On Thu, Aug 08, 2013 at 02:41:11PM +0100, Chris Wilson wrote:
> > This is primarily for the benefit of the create2 ioctl so that the
> > caller can avoid the later step of rebinding the bo with new PTE bits.
> > After introducing WT (and possibly GFDT) cacheing for display targets,
> > not everything in the display is earmarked as UC, and more importantly
> > what is is controlled by the kernel.
> > 
> > Note that set_cache_level/get_cache_level for DISPLAY is not necessarily
> > idempotent; get_cache_level may return UC for architectures that have no
> > special cache domain for the display engine.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> You know the drill: A bit of igt testcoverage would be neat, I think
> simply adding the display domain everywhere we test uncached/snooped
> already should be more than good enough. So I'll punt on this one here for
> now, all other patches (with the exception of the hw context from stolen
> one) are merged to dinq.

Catch-22, you will not get igt until you at least reserve the enum.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] drm/i915: reserve I915_CACHING_DISPLAY and document cache modes
  2013-08-10 10:19     ` Chris Wilson
@ 2013-08-10 12:54       ` Daniel Vetter
  2013-08-10 12:57       ` Daniel Vetter
  1 sibling, 0 replies; 28+ messages in thread
From: Daniel Vetter @ 2013-08-10 12:54 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

Resolve the catch-22 of igt needing a stable number and patches first
needing testcases by reserving the interface number up-front.

Requested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 include/uapi/drm/i915_drm.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a1a7b6b..96b86f8 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -768,8 +768,32 @@ struct drm_i915_gem_busy {
 	__u32 busy;
 };
 
+/**
+ * I915_CACHING_NONE
+ *
+ * GPU access is not coherent with cpu caches. Default for machines without a
+ * LLC.
+ */
 #define I915_CACHING_NONE		0
+/**
+ * I915_CACHGING_CACHED
+ *
+ * GPU access is coherent with cpu caches and furthermore the data is cached in
+ * last-level caches shared between cpu cores and the gpu GT. Default on
+ * machines with HAS_LLC.
+ */
 #define I915_CACHING_CACHED		1
+/**
+ * I915_CACHING_DISPLAY
+ *
+ * Special GPU which is coherent with the scanout engines. Transparently falls
+ * back to I915_CACHING_NONE on platforms where not special cache mode (like
+ * write-through or gfdt flushing) is available. The kernel automatically sets
+ * this mode when using a buffer as a scanout target. Userspace can manually set
+ * this mode to avoid a costly stall and clflush in the hotpath of drawing the
+ * first frame.
+ */
+#define I915_CACHING_CACHED		2
 
 struct drm_i915_gem_caching {
 	/**
-- 
1.8.4.rc1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH] drm/i915: reserve I915_CACHING_DISPLAY and document cache modes
  2013-08-10 10:19     ` Chris Wilson
  2013-08-10 12:54       ` [PATCH] drm/i915: reserve I915_CACHING_DISPLAY and document cache modes Daniel Vetter
@ 2013-08-10 12:57       ` Daniel Vetter
  2013-08-10 13:09         ` Chris Wilson
  1 sibling, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2013-08-10 12:57 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

Resolve the catch-22 of igt needing a stable number and patches first
needing testcases by reserving the interface number up-front.

v2: Improve the spelling a bit.

Requested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 include/uapi/drm/i915_drm.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a1a7b6b..86fc1ba 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -768,8 +768,32 @@ struct drm_i915_gem_busy {
 	__u32 busy;
 };
 
+/**
+ * I915_CACHING_NONE
+ *
+ * GPU access is not coherent with cpu caches. Default for machines without an
+ * LLC.
+ */
 #define I915_CACHING_NONE		0
+/**
+ * I915_CACHGING_CACHED
+ *
+ * GPU access is coherent with cpu caches and furthermore the data is cached in
+ * last-level caches shared between cpu cores and the gpu GT. Default on
+ * machines with HAS_LLC.
+ */
 #define I915_CACHING_CACHED		1
+/**
+ * I915_CACHING_DISPLAY
+ *
+ * Special GPU caching mode which is coherent with the scanout engines.
+ * Transparently falls back to I915_CACHING_NONE on platforms where not special
+ * cache mode (like write-through or gfdt flushing) is available. The kernel
+ * automatically sets this mode when using a buffer as a scanout target.
+ * Userspace can manually set this mode to avoid a costly stall and clflush in
+ * the hotpath of drawing the first frame.
+ */
+#define I915_CACHING_CACHED		2
 
 struct drm_i915_gem_caching {
 	/**
-- 
1.8.4.rc1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] drm/i915: reserve I915_CACHING_DISPLAY and document cache modes
  2013-08-10 12:57       ` Daniel Vetter
@ 2013-08-10 13:09         ` Chris Wilson
  2013-08-10 15:54           ` Daniel Vetter
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2013-08-10 13:09 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development

On Sat, Aug 10, 2013 at 02:57:27PM +0200, Daniel Vetter wrote:
> Resolve the catch-22 of igt needing a stable number and patches first
> needing testcases by reserving the interface number up-front.
> 
> v2: Improve the spelling a bit.

Missed CACHGING and s/not special/no special/
Otherwise,
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] drm/i915: reserve I915_CACHING_DISPLAY and document cache modes
  2013-08-10 13:09         ` Chris Wilson
@ 2013-08-10 15:54           ` Daniel Vetter
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Vetter @ 2013-08-10 15:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, Intel Graphics Development

On Sat, Aug 10, 2013 at 02:09:20PM +0100, Chris Wilson wrote:
> On Sat, Aug 10, 2013 at 02:57:27PM +0200, Daniel Vetter wrote:
> > Resolve the catch-22 of igt needing a stable number and patches first
> > needing testcases by reserving the interface number up-front.
> > 
> > v2: Improve the spelling a bit.
> 
> Missed CACHGING and s/not special/no special/
> Otherwise,
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Fixed and merged, thanks for the review.
-Daniel


> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 9/9] drm/i915: Allow the user to set bo into the DISPLAY cache domain
  2013-08-10 10:09   ` Daniel Vetter
  2013-08-10 10:19     ` Chris Wilson
@ 2013-08-12 16:53     ` Daniel Vetter
  1 sibling, 0 replies; 28+ messages in thread
From: Daniel Vetter @ 2013-08-12 16:53 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Sat, Aug 10, 2013 at 12:09:26PM +0200, Daniel Vetter wrote:
> On Thu, Aug 08, 2013 at 02:41:11PM +0100, Chris Wilson wrote:
> > This is primarily for the benefit of the create2 ioctl so that the
> > caller can avoid the later step of rebinding the bo with new PTE bits.
> > After introducing WT (and possibly GFDT) cacheing for display targets,
> > not everything in the display is earmarked as UC, and more importantly
> > what is is controlled by the kernel.
> > 
> > Note that set_cache_level/get_cache_level for DISPLAY is not necessarily
> > idempotent; get_cache_level may return UC for architectures that have no
> > special cache domain for the display engine.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> You know the drill: A bit of igt testcoverage would be neat, I think
> simply adding the display domain everywhere we test uncached/snooped
> already should be more than good enough. So I'll punt on this one here for
> now, all other patches (with the exception of the hw context from stolen
> one) are merged to dinq.

Ok, merged this one and the previous one (which I've thought I've merged,
but apparently didn't). Thanks for the patches and writing the igts.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2013-08-12 16:53 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-08-08 13:41 [PATCH 1/9] drm/i915: Update rules for reading cache lines through the LLC Chris Wilson
2013-08-08 13:41 ` [PATCH 2/9] drm/i915: Track when an object is pinned for use by the display engine Chris Wilson
2013-08-09 11:25   ` [PATCH] " Chris Wilson
2013-08-09 11:54     ` Ville Syrjälä
2013-08-08 13:41 ` [PATCH 3/9] drm/i915: Update rules for writing through the LLC with the cpu Chris Wilson
2013-08-08 15:27   ` Ville Syrjälä
2013-08-08 15:36     ` Chris Wilson
2013-08-08 15:51       ` Ville Syrjälä
2013-08-08 16:12         ` Chris Wilson
2013-08-09 11:26   ` [PATCH] " Chris Wilson
2013-08-09 11:56     ` Ville Syrjälä
2013-08-08 13:41 ` [PATCH 4/9] drm/i915: Allow the GPU to cache stolen memory Chris Wilson
2013-08-08 13:41 ` [PATCH 5/9] drm/i915: Allocate LLC ringbuffers from stolen Chris Wilson
2013-08-08 13:41 ` [PATCH 6/9] drm/i915: Allocate context objects " Chris Wilson
2013-08-10  9:25   ` Daniel Vetter
2013-08-10  9:34     ` Chris Wilson
2013-08-10  9:44       ` Daniel Vetter
2013-08-08 13:41 ` [PATCH 7/9] drm/i915: Only do a chipset flush after a clflush Chris Wilson
2013-08-08 13:41 ` [PATCH 8/9] drm/i915: Use Write-Through cacheing for the display plane on Iris Chris Wilson
2013-08-08 13:41 ` [PATCH 9/9] drm/i915: Allow the user to set bo into the DISPLAY cache domain Chris Wilson
2013-08-10 10:09   ` Daniel Vetter
2013-08-10 10:19     ` Chris Wilson
2013-08-10 12:54       ` [PATCH] drm/i915: reserve I915_CACHING_DISPLAY and document cache modes Daniel Vetter
2013-08-10 12:57       ` Daniel Vetter
2013-08-10 13:09         ` Chris Wilson
2013-08-10 15:54           ` Daniel Vetter
2013-08-12 16:53     ` [PATCH 9/9] drm/i915: Allow the user to set bo into the DISPLAY cache domain Daniel Vetter
2013-08-08 16:42 ` [PATCH 1/9] drm/i915: Update rules for reading cache lines through the LLC Ville Syrjälä

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox