public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH v3 0/5] Command parser batch buffer copy
@ 2014-11-03 19:19 bradley.d.volkin
  2014-11-03 19:19 ` [PATCH v3 1/5] drm/i915: Implement a framework for batch buffer pools bradley.d.volkin
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: bradley.d.volkin @ 2014-11-03 19:19 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

This is v3 of the series I sent here:
http://lists.freedesktop.org/archives/intel-gfx/2014-July/048705.html

Most of the previous commentary still applies. We've fixed the secure
dispatch regression though, so the series now puts the parser into
enabling mode in patch 2.

There are currently some regressions. I've sent i-g-t patches for a couple that
are test issues. The remaining issues are:

drv_hangman error-state-capture-*
    The test has checks that the logged 'gtt_offset' matches the expected
    offset of the userspace-supplied batch buffer. Similarly for the address
    in an MI_BATCH_BUFFER_START command found in the logged ringbuffer contents.
    These obviously won't match if the buffer submitted to hardware is from the
    batch pool instead of the one from userspace.

gem_reloc_vs_gpu *-thrash-inactive
gem_persistent_relocs *-thrash-inactive
    These fail with this type of error:

        Test assertion failure function do_test, file gem_reloc_vs_gpu.c:221:
        Failed assertion: test == 0xdeadbeef
        mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
        child 6 failed with exit status 99
        Subtest forked-thrash-inactive: FAIL (3.824s)

    One crashed, apparently in i915_gem_object_move_to_inactive() called via
    i915_gem_reset(). I assume there's an issue with my active tracking or
    madv usage for batch pool objects. Any input would be helpful.

gem_cs_tlb
    This test takes longer and may time out.

Brad Volkin (5):
  drm/i915: Implement a framework for batch buffer pools
  drm/i915: Use batch pools with the command parser
  drm/i915: Add a batch pool debugfs file
  drm/i915: Add batch pool details to i915_gem_objects debugfs
  drm/i915: Use batch length instead of object size in command parser

 Documentation/DocBook/drm.tmpl             |   5 +
 drivers/gpu/drm/i915/Makefile              |   1 +
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 102 +++++++++++++++----
 drivers/gpu/drm/i915/i915_debugfs.c        |  86 ++++++++++++++--
 drivers/gpu/drm/i915/i915_dma.c            |   1 +
 drivers/gpu/drm/i915/i915_drv.h            |  26 +++++
 drivers/gpu/drm/i915/i915_gem.c            |  11 +++
 drivers/gpu/drm/i915/i915_gem_batch_pool.c | 153 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  62 +++++++++++-
 9 files changed, 416 insertions(+), 31 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_batch_pool.c

-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/5] drm/i915: Implement a framework for batch buffer pools
  2014-11-03 19:19 [PATCH v3 0/5] Command parser batch buffer copy bradley.d.volkin
@ 2014-11-03 19:19 ` bradley.d.volkin
  2014-11-03 19:19 ` [PATCH v3 2/5] drm/i915: Use batch pools with the command parser bradley.d.volkin
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: bradley.d.volkin @ 2014-11-03 19:19 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

This adds a small module for managing a pool of batch buffers.
The only current use case is for the command parser, as described
in the kerneldoc in the patch. The code is simple, but separating
it out makes it easier to change the underlying algorithms and to
extend to future use cases should they arise.

The interface is simple: init to create an empty pool, fini to
clean it up; get to obtain a new buffer, put to return it to the
pool. Note that all buffers must be returned to the pool before
cleaning up the pool.

Buffers are purgeable while in the pool, but not explicitly
truncated in order to avoid overhead during execbuf.

Locking is currently based on the caller holding the struct_mutex.
We already do that in the places where we will use the batch pool
for the command parser.

v2:
- s/BUG_ON/WARN_ON/ for locking assertions
- Remove the cap on pool size
- Switch from alloc/free to init/fini

v3:
- Idiomatic looping structure in _fini
- Correct handling of purged objects
- Don't return a buffer that's too much larger than needed

v4:
- Rebased to latest -nightly

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 Documentation/DocBook/drm.tmpl             |   5 +
 drivers/gpu/drm/i915/Makefile              |   1 +
 drivers/gpu/drm/i915/i915_drv.h            |  17 ++++
 drivers/gpu/drm/i915/i915_gem.c            |   1 +
 drivers/gpu/drm/i915/i915_gem_batch_pool.c | 153 +++++++++++++++++++++++++++++
 5 files changed, 177 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_batch_pool.c

diff --git a/Documentation/DocBook/drm.tmpl b/Documentation/DocBook/drm.tmpl
index f6a9d7b..133f4e6 100644
--- a/Documentation/DocBook/drm.tmpl
+++ b/Documentation/DocBook/drm.tmpl
@@ -3958,6 +3958,11 @@ int num_ioctls;</synopsis>
 !Idrivers/gpu/drm/i915/i915_cmd_parser.c
       </sect2>
       <sect2>
+        <title>Batchbuffer Pools</title>
+!Pdrivers/gpu/drm/i915/i915_gem_batch_pool.c batch pool
+!Idrivers/gpu/drm/i915/i915_gem_batch_pool.c
+      </sect2>
+      <sect2>
         <title>Logical Rings, Logical Ring Contexts and Execlists</title>
 !Pdrivers/gpu/drm/i915/intel_lrc.c Logical Rings, Logical Ring Contexts and Execlists
 !Idrivers/gpu/drm/i915/intel_lrc.c
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 891e584..73cd2d7 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -19,6 +19,7 @@ i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o
 
 # GEM code
 i915-y += i915_cmd_parser.o \
+	  i915_gem_batch_pool.o \
 	  i915_gem_context.o \
 	  i915_gem_render_state.o \
 	  i915_gem_debug.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6a73803..fbf10cc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1126,6 +1126,12 @@ struct intel_l3_parity {
 	int which_slice;
 };
 
+struct i915_gem_batch_pool {
+	struct drm_device *dev;
+	struct list_head active_list;
+	struct list_head inactive_list;
+};
+
 struct i915_gem_mm {
 	/** Memory allocator for GTT stolen memory */
 	struct drm_mm stolen;
@@ -1797,6 +1803,8 @@ struct drm_i915_gem_object {
 	/** Used in execbuf to temporarily hold a ref */
 	struct list_head obj_exec_link;
 
+	struct list_head batch_pool_list;
+
 	/**
 	 * This is set if the object is on the active lists (has pending
 	 * rendering and so a non-zero seqno), and is not set if it i s on
@@ -2758,6 +2766,15 @@ void i915_destroy_error_state(struct drm_device *dev);
 void i915_get_extra_instdone(struct drm_device *dev, uint32_t *instdone);
 const char *i915_cache_level_str(struct drm_i915_private *i915, int type);
 
+/* i915_gem_batch_pool.c */
+void i915_gem_batch_pool_init(struct drm_device *dev,
+			      struct i915_gem_batch_pool *pool);
+void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool);
+struct drm_i915_gem_object*
+i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool, size_t size);
+void i915_gem_batch_pool_put(struct i915_gem_batch_pool *pool,
+			     struct drm_i915_gem_object *obj);
+
 /* i915_cmd_parser.c */
 int i915_cmd_parser_get_version(void);
 int i915_cmd_parser_init_ring(struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7e91978..4dbd7b9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4337,6 +4337,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 	INIT_LIST_HEAD(&obj->ring_list);
 	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
+	INIT_LIST_HEAD(&obj->batch_pool_list);
 
 	obj->ops = ops;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
new file mode 100644
index 0000000..6d526fa
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -0,0 +1,153 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+
+/**
+ * DOC: batch pool
+ *
+ * In order to submit batch buffers as 'secure', the software command parser
+ * must ensure that a batch buffer cannot be modified after parsing. It does
+ * this by copying the user provided batch buffer contents to a kernel owned
+ * buffer from which the hardware will actually execute, and by carefully
+ * managing the address space bindings for such buffers.
+ *
+ * The batch pool framework provides a mechanism for the driver to manage a
+ * set of scratch buffers to use for this purpose. The framework can be
+ * extended to support other uses cases should they arise.
+ */
+
+/**
+ * i915_gem_batch_pool_init() - initialize a batch buffer pool
+ * @dev: the drm device
+ * @pool: the batch buffer pool
+ */
+void i915_gem_batch_pool_init(struct drm_device *dev,
+			      struct i915_gem_batch_pool *pool)
+{
+	pool->dev = dev;
+	INIT_LIST_HEAD(&pool->active_list);
+	INIT_LIST_HEAD(&pool->inactive_list);
+}
+
+/**
+ * i915_gem_batch_pool_fini() - clean up a batch buffer pool
+ * @pool: the pool to clean up
+ *
+ * Note: Callers must hold the struct_mutex. Callers must also ensure
+ *       that all buffers have been returned to the pool.
+ */
+void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool)
+{
+	WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));
+	WARN_ON(!list_empty(&pool->active_list));
+
+	while (!list_empty(&pool->inactive_list)) {
+		struct drm_i915_gem_object *obj =
+			list_first_entry(&pool->inactive_list,
+					 struct drm_i915_gem_object,
+					 batch_pool_list);
+
+		list_del_init(&obj->batch_pool_list);
+		drm_gem_object_unreference(&obj->base);
+	}
+}
+
+/**
+ * i915_gem_batch_pool_get() - select a buffer from the pool
+ * @pool: the batch buffer pool
+ * @size: the minimum desired size of the returned buffer
+ *
+ * Finds or allocates a batch buffer in the pool with at least the requested
+ * size. The buffer will not be used to satisfy further
+ * i915_gem_batch_pool_get() requests until the corresponding
+ * i915_gem_batch_pool_put() call. The caller is responsible for any domain or
+ * active/inactive management for the returned buffer.
+ *
+ * Note: Callers must hold the struct_mutex
+ *
+ * Return: the selected batch buffer object
+ */
+struct drm_i915_gem_object *
+i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
+			size_t size)
+{
+	struct drm_i915_gem_object *obj = NULL;
+	struct drm_i915_gem_object *tmp;
+	bool was_purged;
+
+	WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));
+
+	do {
+		was_purged = false;
+
+		list_for_each_entry(tmp, &pool->inactive_list, batch_pool_list) {
+			/*
+			 * Select a buffer that is at least as big as needed
+			 * but not 'too much' bigger. A better way to do this
+			 * might be to bucket the pool objects based on size.
+			 */
+			if (tmp->base.size >= size &&
+			    tmp->base.size <= (2 * size)) {
+				obj = tmp;
+				break;
+			}
+		}
+
+		if (obj && obj->madv == __I915_MADV_PURGED) {
+			was_purged = true;
+			list_del(&obj->batch_pool_list);
+			drm_gem_object_unreference(&obj->base);
+			obj = NULL;
+		}
+	} while (was_purged);
+
+	if (!obj) {
+		obj = i915_gem_alloc_object(pool->dev, size);
+		if (!obj)
+			return ERR_PTR(-ENOMEM);
+
+		list_add_tail(&obj->batch_pool_list, &pool->inactive_list);
+	}
+
+	obj->madv = I915_MADV_WILLNEED;
+	list_move_tail(&obj->batch_pool_list, &pool->active_list);
+
+	return obj;
+}
+
+/**
+ * i915_gem_batch_pool_put() - return a buffer to the pool
+ * @pool: the batch buffer pool
+ * @obj: the batch buffer object
+ *
+ * Note: Callers must hold the struct_mutex
+ */
+void i915_gem_batch_pool_put(struct i915_gem_batch_pool *pool,
+			     struct drm_i915_gem_object *obj)
+{
+	WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));
+	obj->madv = I915_MADV_DONTNEED;
+	list_move_tail(&obj->batch_pool_list, &pool->inactive_list);
+}
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 2/5] drm/i915: Use batch pools with the command parser
  2014-11-03 19:19 [PATCH v3 0/5] Command parser batch buffer copy bradley.d.volkin
  2014-11-03 19:19 ` [PATCH v3 1/5] drm/i915: Implement a framework for batch buffer pools bradley.d.volkin
@ 2014-11-03 19:19 ` bradley.d.volkin
  2014-11-04 10:17   ` Daniel Vetter
  2014-11-04 10:30   ` Daniel Vetter
  2014-11-03 19:19 ` [PATCH v3 3/5] drm/i915: Add a batch pool debugfs file bradley.d.volkin
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 23+ messages in thread
From: bradley.d.volkin @ 2014-11-03 19:19 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

This patch sets up all of the tracking and copying necessary to
use batch pools with the command parser and dispatches the copied
(shadow) batch to the hardware.

After this patch, the parser is in 'enabling' mode.

Note that performance takes a hit from the copy in some cases
and will likely need some work. At a rough pass, the memcpy
appears to be the bottleneck. Without having done a deeper
analysis, two ideas that come to mind are:
1) Copy sections of the batch at a time, as they are reached
   by parsing. Might improve cache locality.
2) Copy only up to the userspace-supplied batch length and
   memset the rest of the buffer. Reduces the number of reads.

v2:
- Remove setting the capacity of the pool
- One global pool instead of per-ring pools
- Replace batch_obj with shadow_batch_obj and hook into eb->vmas
- Memset any space in the shadow batch beyond what gets copied
- Rebased on execlist prep refactoring

v3:
- Rebase on chained batch handling
- Squash in setting the secure dispatch flag
- Add a note about the interaction w/secure dispatch pinning
- Check for request->batch_obj == NULL in i915_gem_free_request

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 84 ++++++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_dma.c            |  1 +
 drivers/gpu/drm/i915/i915_drv.h            |  8 +++
 drivers/gpu/drm/i915/i915_gem.c            | 10 ++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 61 ++++++++++++++++++++--
 5 files changed, 143 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 809bb95..c8fe403 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -838,6 +838,56 @@ finish:
 	return (u32*)addr;
 }
 
+/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
+static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
+		       struct drm_i915_gem_object *src_obj)
+{
+	int ret = 0;
+	int needs_clflush = 0;
+	u32 *src_addr, *dest_addr = NULL;
+
+	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
+		return ERR_PTR(ret);
+	}
+
+	src_addr = vmap_batch(src_obj);
+	if (!src_addr) {
+		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
+		ret = -ENOMEM;
+		goto unpin_src;
+	}
+
+	if (needs_clflush)
+		drm_clflush_virt_range((char *)src_addr, src_obj->base.size);
+
+	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: Failed to set batch CPU domain\n");
+		goto unmap_src;
+	}
+
+	dest_addr = vmap_batch(dest_obj);
+	if (!dest_addr) {
+		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
+		ret = -ENOMEM;
+		goto unmap_src;
+	}
+
+	memcpy(dest_addr, src_addr, src_obj->base.size);
+	if (dest_obj->base.size > src_obj->base.size)
+		memset((u8 *)dest_addr + src_obj->base.size, 0,
+		       dest_obj->base.size - src_obj->base.size);
+
+unmap_src:
+	vunmap(src_addr);
+unpin_src:
+	i915_gem_object_unpin_pages(src_obj);
+
+	return ret ? ERR_PTR(ret) : dest_addr;
+}
+
 /**
  * i915_needs_cmd_parser() - should a given ring use software command parsing?
  * @ring: the ring in question
@@ -954,6 +1004,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
  * i915_parse_cmds() - parse a submitted batch buffer for privilege violations
  * @ring: the ring on which the batch is to execute
  * @batch_obj: the batch buffer in question
+ * @shadow_batch_obj: copy of the batch buffer in question
  * @batch_start_offset: byte offset in the batch at which execution starts
  * @is_master: is the submitting process the drm master?
  *
@@ -965,32 +1016,28 @@ static bool check_cmd(const struct intel_engine_cs *ring,
  */
 int i915_parse_cmds(struct intel_engine_cs *ring,
 		    struct drm_i915_gem_object *batch_obj,
+		    struct drm_i915_gem_object *shadow_batch_obj,
 		    u32 batch_start_offset,
 		    bool is_master)
 {
 	int ret = 0;
 	u32 *cmd, *batch_base, *batch_end;
 	struct drm_i915_cmd_descriptor default_desc = { 0 };
-	int needs_clflush = 0;
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
 
-	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
-		return ret;
+	batch_base = copy_batch(shadow_batch_obj, batch_obj);
+	if (IS_ERR(batch_base)) {
+		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
+		return PTR_ERR(batch_base);
 	}
 
-	batch_base = vmap_batch(batch_obj);
-	if (!batch_base) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
-		i915_gem_object_unpin_pages(batch_obj);
-		return -ENOMEM;
-	}
-
-	if (needs_clflush)
-		drm_clflush_virt_range((char *)batch_base, batch_obj->base.size);
-
 	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
+
+	/*
+	 * We use the source object's size because the shadow object is as
+	 * large or larger and copy_batch() will write MI_NOPs to the extra
+	 * space. Parsing should be faster in some cases this way.
+	 */
 	batch_end = cmd + (batch_obj->base.size / sizeof(*batch_end));
 
 	while (cmd < batch_end) {
@@ -1052,7 +1099,12 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 
 	vunmap(batch_base);
 
-	i915_gem_object_unpin_pages(batch_obj);
+	if (!ret) {
+		ret = i915_gem_object_set_to_gtt_domain(shadow_batch_obj,
+							false);
+		if (ret)
+			DRM_DEBUG_DRIVER("CMD: Failed to set batch GTT domain\n");
+	}
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 9a73533..dea0f45 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1891,6 +1891,7 @@ int i915_driver_unload(struct drm_device *dev)
 
 		mutex_lock(&dev->struct_mutex);
 		i915_gem_cleanup_ringbuffer(dev);
+		i915_gem_batch_pool_fini(&dev_priv->mm.batch_pool);
 		i915_gem_context_fini(dev);
 		mutex_unlock(&dev->struct_mutex);
 		i915_gem_cleanup_stolen(dev);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fbf10cc..50d974d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1145,6 +1145,13 @@ struct i915_gem_mm {
 	 */
 	struct list_head unbound_list;
 
+	/*
+	 * A pool of objects to use as shadow copies of client batch buffers
+	 * when the command parser is enabled. Prevents the client from
+	 * modifying the batch contents after software parsing.
+	 */
+	struct i915_gem_batch_pool batch_pool;
+
 	/** Usable portion of the GTT for GEM */
 	unsigned long stolen_base; /* limited to low memory (32-bit) */
 
@@ -2782,6 +2789,7 @@ void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring);
 bool i915_needs_cmd_parser(struct intel_engine_cs *ring);
 int i915_parse_cmds(struct intel_engine_cs *ring,
 		    struct drm_i915_gem_object *batch_obj,
+		    struct drm_i915_gem_object *shadow_batch_obj,
 		    u32 batch_start_offset,
 		    bool is_master);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4dbd7b9..c59100d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2500,6 +2500,14 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
 	if (request->ctx)
 		i915_gem_context_unreference(request->ctx);
 
+	if (request->batch_obj &&
+	    !list_empty(&request->batch_obj->batch_pool_list)) {
+		struct drm_i915_private *dev_priv = to_i915(request->ring->dev);
+
+		i915_gem_batch_pool_put(&dev_priv->mm.batch_pool,
+					request->batch_obj);
+	}
+
 	kfree(request);
 }
 
@@ -5019,6 +5027,8 @@ i915_gem_load(struct drm_device *dev)
 	dev_priv->mm.oom_notifier.notifier_call = i915_gem_shrinker_oom;
 	register_oom_notifier(&dev_priv->mm.oom_notifier);
 
+	i915_gem_batch_pool_init(dev, &dev_priv->mm.batch_pool);
+
 	mutex_init(&dev_priv->fb_tracking.lock);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 4b7f5c1..c216ff2 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1236,6 +1236,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct eb_vmas *eb;
 	struct drm_i915_gem_object *batch_obj;
+	struct drm_i915_gem_object *shadow_batch_obj = NULL;
+	struct drm_i915_gem_exec_object2 shadow_exec_entry;
 	struct intel_engine_cs *ring;
 	struct intel_context *ctx;
 	struct i915_address_space *vm;
@@ -1361,22 +1363,63 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
 
 	if (i915_needs_cmd_parser(ring)) {
+		shadow_batch_obj =
+			i915_gem_batch_pool_get(&dev_priv->mm.batch_pool,
+						batch_obj->base.size);
+		if (IS_ERR(shadow_batch_obj)) {
+			ret = PTR_ERR(shadow_batch_obj);
+			/* Don't try to clean up the obj in the error path */
+			shadow_batch_obj = NULL;
+			goto err;
+		}
+
+		ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 4096, 0);
+		if (ret)
+			goto err;
+
 		ret = i915_parse_cmds(ring,
 				      batch_obj,
+				      shadow_batch_obj,
 				      args->batch_start_offset,
 				      file->is_master);
 		if (ret) {
 			if (ret != -EACCES)
 				goto err;
+			else {
+				i915_gem_object_ggtt_unpin(shadow_batch_obj);
+				i915_gem_batch_pool_put(&dev_priv->mm.batch_pool,
+							shadow_batch_obj);
+			}
 		} else {
+			struct i915_vma *vma;
+
+			memset(&shadow_exec_entry, 0,
+			       sizeof(shadow_exec_entry));
+
+			vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
+			vma->exec_entry = &shadow_exec_entry;
+			vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
+			drm_gem_object_reference(&shadow_batch_obj->base);
+			list_add_tail(&vma->exec_list, &eb->vmas);
+
+			batch_obj = shadow_batch_obj;
+
 			/*
-			 * XXX: Actually do this when enabling batch copy...
+			 * Set the DISPATCH_SECURE bit to remove the NON_SECURE
+			 * bit from MI_BATCH_BUFFER_START commands issued in the
+			 * dispatch_execbuffer implementations. We specifically
+			 * don't want that set when the command parser is
+			 * enabled.
 			 *
-			 * Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
-			 * from MI_BATCH_BUFFER_START commands issued in the
-			 * dispatch_execbuffer implementations. We specifically don't
-			 * want that set when the command parser is enabled.
+			 * NB: By setting this flag, we're going to hit the
+			 * normal secure dispatch path below. That path will
+			 * do an extra pin of the shadow batch object and then
+			 * unpin after ->gt.do_execbuf(). At that point, the
+			 * object will have the single outstanding pin from
+			 * just before calling i915_parse_cmds(), and will get
+			 * unpinned via eb_destroy() and __EXEC_OBJECT_HAS_PIN.
 			 */
+			flags |= I915_DISPATCH_SECURE;
 		}
 	}
 
@@ -1418,6 +1461,14 @@ err:
 	i915_gem_context_unreference(ctx);
 	eb_destroy(eb);
 
+	if (ret && ring && shadow_batch_obj) {
+		if (i915_gem_obj_is_pinned(shadow_batch_obj))
+			i915_gem_object_ggtt_unpin(shadow_batch_obj);
+
+		i915_gem_batch_pool_put(&dev_priv->mm.batch_pool,
+					shadow_batch_obj);
+	}
+
 	mutex_unlock(&dev->struct_mutex);
 
 pre_mutex_err:
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 3/5] drm/i915: Add a batch pool debugfs file
  2014-11-03 19:19 [PATCH v3 0/5] Command parser batch buffer copy bradley.d.volkin
  2014-11-03 19:19 ` [PATCH v3 1/5] drm/i915: Implement a framework for batch buffer pools bradley.d.volkin
  2014-11-03 19:19 ` [PATCH v3 2/5] drm/i915: Use batch pools with the command parser bradley.d.volkin
@ 2014-11-03 19:19 ` bradley.d.volkin
  2014-11-03 19:19 ` [PATCH v3 4/5] drm/i915: Add batch pool details to i915_gem_objects debugfs bradley.d.volkin
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: bradley.d.volkin @ 2014-11-03 19:19 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

It provides some useful information about the buffers in
the global command parser batch pool.

v2: rebase on global pool instead of per-ring pools

v3: rebase

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 41 +++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index a79f83c..5f7cbed 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -582,6 +582,46 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 	return 0;
 }
 
+static int i915_gem_batch_pool_info(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_gem_object *obj;
+	int count = 0;
+	int ret;
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		return ret;
+
+	seq_puts(m, "active:\n");
+	list_for_each_entry(obj,
+			    &dev_priv->mm.batch_pool.active_list,
+			    batch_pool_list) {
+		seq_puts(m, "   ");
+		describe_obj(m, obj);
+		seq_putc(m, '\n');
+		count++;
+	}
+
+	seq_puts(m, "inactive:\n");
+	list_for_each_entry(obj,
+			    &dev_priv->mm.batch_pool.inactive_list,
+			    batch_pool_list) {
+		seq_puts(m, "   ");
+		describe_obj(m, obj);
+		seq_putc(m, '\n');
+		count++;
+	}
+
+	seq_printf(m, "total: %d\n", count);
+
+	mutex_unlock(&dev->struct_mutex);
+
+	return 0;
+}
+
 static int i915_gem_request_info(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = m->private;
@@ -4177,6 +4217,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_gem_hws_blt", i915_hws_info, 0, (void *)BCS},
 	{"i915_gem_hws_bsd", i915_hws_info, 0, (void *)VCS},
 	{"i915_gem_hws_vebox", i915_hws_info, 0, (void *)VECS},
+	{"i915_gem_batch_pool", i915_gem_batch_pool_info, 0},
 	{"i915_frequency_info", i915_frequency_info, 0},
 	{"i915_drpc_info", i915_drpc_info, 0},
 	{"i915_emon_status", i915_emon_status, 0},
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 4/5] drm/i915: Add batch pool details to i915_gem_objects debugfs
  2014-11-03 19:19 [PATCH v3 0/5] Command parser batch buffer copy bradley.d.volkin
                   ` (2 preceding siblings ...)
  2014-11-03 19:19 ` [PATCH v3 3/5] drm/i915: Add a batch pool debugfs file bradley.d.volkin
@ 2014-11-03 19:19 ` bradley.d.volkin
  2014-11-03 19:19 ` [PATCH v3 5/5] drm/i915: Use batch length instead of object size in command parser bradley.d.volkin
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: bradley.d.volkin @ 2014-11-03 19:19 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

To better account for the potentially large memory consumption
of the batch pool.

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 45 +++++++++++++++++++++++++++++--------
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5f7cbed..53f78da 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -360,6 +360,38 @@ static int per_file_stats(int id, void *ptr, void *data)
 	return 0;
 }
 
+#define print_file_stats(m, name, stats) \
+	seq_printf(m, "%s: %u objects, %zu bytes (%zu active, %zu inactive, %zu global, %zu shared, %zu unbound)\n", \
+		   name, \
+		   stats.count, \
+		   stats.total, \
+		   stats.active, \
+		   stats.inactive, \
+		   stats.global, \
+		   stats.shared, \
+		   stats.unbound)
+
+static void print_batch_pool_stats(struct seq_file *m,
+				   struct drm_i915_private *dev_priv)
+{
+	struct drm_i915_gem_object *obj;
+	struct file_stats stats;
+
+	memset(&stats, 0, sizeof(stats));
+
+	list_for_each_entry(obj,
+			    &dev_priv->mm.batch_pool.active_list,
+			    batch_pool_list)
+		per_file_stats(0, obj, &stats);
+
+	list_for_each_entry(obj,
+			    &dev_priv->mm.batch_pool.inactive_list,
+			    batch_pool_list)
+		per_file_stats(0, obj, &stats);
+
+	print_file_stats(m, "batch pool", stats);
+}
+
 #define count_vmas(list, member) do { \
 	list_for_each_entry(vma, list, member) { \
 		size += i915_gem_obj_ggtt_size(vma->obj); \
@@ -442,6 +474,9 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 		   dev_priv->gtt.mappable_end - dev_priv->gtt.base.start);
 
 	seq_putc(m, '\n');
+	print_batch_pool_stats(m, dev_priv);
+
+	seq_putc(m, '\n');
 	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
 		struct file_stats stats;
 		struct task_struct *task;
@@ -459,15 +494,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 		 */
 		rcu_read_lock();
 		task = pid_task(file->pid, PIDTYPE_PID);
-		seq_printf(m, "%s: %u objects, %zu bytes (%zu active, %zu inactive, %zu global, %zu shared, %zu unbound)\n",
-			   task ? task->comm : "<unknown>",
-			   stats.count,
-			   stats.total,
-			   stats.active,
-			   stats.inactive,
-			   stats.global,
-			   stats.shared,
-			   stats.unbound);
+		print_file_stats(m, task ? task->comm : "<unknown>", stats);
 		rcu_read_unlock();
 	}
 
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 5/5] drm/i915: Use batch length instead of object size in command parser
  2014-11-03 19:19 [PATCH v3 0/5] Command parser batch buffer copy bradley.d.volkin
                   ` (3 preceding siblings ...)
  2014-11-03 19:19 ` [PATCH v3 4/5] drm/i915: Add batch pool details to i915_gem_objects debugfs bradley.d.volkin
@ 2014-11-03 19:19 ` bradley.d.volkin
  2014-11-03 19:32   ` [PATCH v3 5/5] drm/i915: Use batch length instead of shuang.he
  2014-11-03 22:44 ` [PATCH v3 0/5] Command parser batch buffer copy Volkin, Bradley D
  2014-11-04 10:31 ` Daniel Vetter
  6 siblings, 1 reply; 23+ messages in thread
From: bradley.d.volkin @ 2014-11-03 19:19 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Previously we couldn't trust the user-supplied batch length because
it came directly from userspace (i.e. untrusted code). It would have
affected what commands software parsed without regard to what hardware
would actually execute, leaving a potential hole.

With the parser now copying the user supplied batch buffer and writing
MI_NOP commands to any space after the copied region, we can safely use
the batch length input. This should be a performance win as the actual
batch length is frequently much smaller than the allocated object size.

v2: Fix handling of non-zero batch_start_offset

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 48 ++++++++++++++++++++----------
 drivers/gpu/drm/i915/i915_drv.h            |  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  1 +
 3 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index c8fe403..d4d13b1 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -840,11 +840,19 @@ finish:
 
 /* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
 static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
-		       struct drm_i915_gem_object *src_obj)
+		       struct drm_i915_gem_object *src_obj,
+		       u32 batch_start_offset,
+		       u32 batch_len)
 {
 	int ret = 0;
 	int needs_clflush = 0;
-	u32 *src_addr, *dest_addr = NULL;
+	u32 *src_base, *dest_base = NULL;
+	u32 *src_addr, *dest_addr;
+	u32 offset = batch_start_offset / sizeof(*dest_addr);
+	u32 end = batch_start_offset + batch_len;
+
+	if (end > dest_obj->base.size || end > src_obj->base.size)
+		return ERR_PTR(-E2BIG);
 
 	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
 	if (ret) {
@@ -852,15 +860,17 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
 		return ERR_PTR(ret);
 	}
 
-	src_addr = vmap_batch(src_obj);
-	if (!src_addr) {
+	src_base = vmap_batch(src_obj);
+	if (!src_base) {
 		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
 		ret = -ENOMEM;
 		goto unpin_src;
 	}
 
+	src_addr = src_base + offset;
+
 	if (needs_clflush)
-		drm_clflush_virt_range((char *)src_addr, src_obj->base.size);
+		drm_clflush_virt_range((char *)src_addr, batch_len);
 
 	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
 	if (ret) {
@@ -868,24 +878,27 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
 		goto unmap_src;
 	}
 
-	dest_addr = vmap_batch(dest_obj);
-	if (!dest_addr) {
+	dest_base = vmap_batch(dest_obj);
+	if (!dest_base) {
 		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
 		ret = -ENOMEM;
 		goto unmap_src;
 	}
 
-	memcpy(dest_addr, src_addr, src_obj->base.size);
-	if (dest_obj->base.size > src_obj->base.size)
-		memset((u8 *)dest_addr + src_obj->base.size, 0,
-		       dest_obj->base.size - src_obj->base.size);
+	dest_addr = dest_base + offset;
+
+	if (batch_start_offset != 0)
+		memset((u8 *)dest_base, 0, batch_start_offset);
+
+	memcpy(dest_addr, src_addr, batch_len);
+	memset((u8 *)dest_addr + batch_len, 0, dest_obj->base.size - end);
 
 unmap_src:
-	vunmap(src_addr);
+	vunmap(src_base);
 unpin_src:
 	i915_gem_object_unpin_pages(src_obj);
 
-	return ret ? ERR_PTR(ret) : dest_addr;
+	return ret ? ERR_PTR(ret) : dest_base;
 }
 
 /**
@@ -1006,6 +1019,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
  * @batch_obj: the batch buffer in question
  * @shadow_batch_obj: copy of the batch buffer in question
  * @batch_start_offset: byte offset in the batch at which execution starts
+ * @batch_len: length of the commands in batch_obj
  * @is_master: is the submitting process the drm master?
  *
  * Parses the specified batch buffer looking for privilege violations as
@@ -1018,6 +1032,7 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		    struct drm_i915_gem_object *batch_obj,
 		    struct drm_i915_gem_object *shadow_batch_obj,
 		    u32 batch_start_offset,
+		    u32 batch_len,
 		    bool is_master)
 {
 	int ret = 0;
@@ -1025,7 +1040,8 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 	struct drm_i915_cmd_descriptor default_desc = { 0 };
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
 
-	batch_base = copy_batch(shadow_batch_obj, batch_obj);
+	batch_base = copy_batch(shadow_batch_obj, batch_obj,
+				batch_start_offset, batch_len);
 	if (IS_ERR(batch_base)) {
 		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
 		return PTR_ERR(batch_base);
@@ -1034,11 +1050,11 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
 
 	/*
-	 * We use the source object's size because the shadow object is as
+	 * We use the batch length as size because the shadow object is as
 	 * large or larger and copy_batch() will write MI_NOPs to the extra
 	 * space. Parsing should be faster in some cases this way.
 	 */
-	batch_end = cmd + (batch_obj->base.size / sizeof(*batch_end));
+	batch_end = cmd + (batch_len / sizeof(*batch_end));
 
 	while (cmd < batch_end) {
 		const struct drm_i915_cmd_descriptor *desc;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 50d974d..6785ca2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2791,6 +2791,7 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		    struct drm_i915_gem_object *batch_obj,
 		    struct drm_i915_gem_object *shadow_batch_obj,
 		    u32 batch_start_offset,
+		    u32 batch_len,
 		    bool is_master);
 
 /* i915_suspend.c */
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c216ff2..bab7ebf 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1381,6 +1381,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 				      batch_obj,
 				      shadow_batch_obj,
 				      args->batch_start_offset,
+				      args->batch_len,
 				      file->is_master);
 		if (ret) {
 			if (ret != -EACCES)
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 5/5] drm/i915: Use batch length instead of
  2014-11-03 19:19 ` [PATCH v3 5/5] drm/i915: Use batch length instead of object size in command parser bradley.d.volkin
@ 2014-11-03 19:32   ` shuang.he
  0 siblings, 0 replies; 23+ messages in thread
From: shuang.he @ 2014-11-03 19:32 UTC (permalink / raw)
  To: shuang.he, intel-gfx, bradley.d.volkin

Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
-------------------------------------Summary-------------------------------------
Platform: baseline_drm_intel_nightly_pass_rate->patch_applied_pass_rate
BYT: pass/total=251/251->248/251
PNV: pass/total=328/328->325/328
ILK: pass/total=323/330->327/330
IVB: pass/total=304/304->293/304
SNB: pass/total=336/336->334/336
HSW: pass/total=312/312->298/312
BDW: pass/total=284/285->284/285
-------------------------------------Detailed-------------------------------------
test_platform: test_suite, test_case, result_with_drm_intel_nightly(count, machine_id...)...->result_with_patch_applied(count, machine_id)...
BYT: Intel_gpu_tools, igt_gem_cs_tlb_blt, PASS(1, M29) -> TIMEOUT(1, M36)
BYT: Intel_gpu_tools, igt_gem_persistent_relocs_forked-thrash-inactive, PASS(1, M29) -> TIMEOUT(1, M36)
BYT: Intel_gpu_tools, igt_gem_reloc_vs_gpu_forked-thrash-inactive, PASS(1, M29) -> TIMEOUT(1, M36)
PNV: Intel_gpu_tools, igt_gem_reloc_vs_gpu_forked-faulting-reloc-thrash-inactive, PASS(1, M25) -> TIMEOUT(1, M25)
PNV: Intel_gpu_tools, igt_gem_reloc_vs_gpu_forked-interruptible-thrash-inactive, PASS(1, M25) -> TIMEOUT(1, M25)
PNV: Intel_gpu_tools, igt_gem_reloc_vs_gpu_forked-thrash-inactive, PASS(1, M25) -> TIMEOUT(1, M25)
ILK: Intel_gpu_tools, igt_gem_persistent_relocs_forked-thrash-inactive, PASS(1, M26) -> TIMEOUT(1, M37)
ILK: Intel_gpu_tools, igt_gem_reloc_vs_gpu_forked-interruptible-faulting-reloc-thrash-inactive, PASS(1, M26) -> TIMEOUT(1, M37)
ILK: Intel_gpu_tools, igt_gem_reloc_vs_gpu_forked-thrash-inactive, PASS(1, M26) -> TIMEOUT(1, M37)
ILK: Intel_gpu_tools, igt_kms_flip_dpms-off-confusion, DMESG_WARN(1, M26) -> PASS(1, M37)
ILK: Intel_gpu_tools, igt_kms_flip_flip-vs-fences-interruptible, DMESG_WARN(1, M26) -> PASS(1, M37)
ILK: Intel_gpu_tools, igt_kms_flip_flip-vs-modeset-interruptible, DMESG_WARN(1, M26) -> PASS(1, M37)
ILK: Intel_gpu_tools, igt_kms_flip_flip-vs-panning-vs-hang, DMESG_WARN(1, M26) -> PASS(1, M37)
ILK: Intel_gpu_tools, igt_kms_flip_flip-vs-rmfb, DMESG_WARN(1, M26) -> PASS(1, M37)
ILK: Intel_gpu_tools, igt_kms_flip_wf_vblank-vs-modeset, DMESG_WARN(1, M26) -> PASS(1, M37)
ILK: Intel_gpu_tools, igt_kms_render_gpu-blit, DMESG_WARN(1, M26) -> PASS(1, M37)
IVB: Intel_gpu_tools, igt_drv_hangman_error-state-capture-blt, PASS(1, M21) -> FAIL(1, M34)
IVB: Intel_gpu_tools, igt_drv_hangman_error-state-capture-bsd, PASS(1, M21) -> FAIL(1, M34)
IVB: Intel_gpu_tools, igt_drv_hangman_error-state-capture-render, PASS(1, M21) -> FAIL(1, M34)
IVB: Intel_gpu_tools, igt_gem_exec_parse_cmd-crossing-page, PASS(1, M21) -> FAIL(1, M34)
IVB: Intel_gpu_tools, igt_gem_madvise_dontneed-before-exec, PASS(1, M21) -> FAIL(1, M34)
IVB: Intel_gpu_tools, igt_gem_persistent_relocs_forked-faulting-reloc-thrash-inactive, PASS(1, M21) -> NO_RESULT(1, M34)
IVB: Intel_gpu_tools, igt_gem_persistent_relocs_forked-interruptible-faulting-reloc-thrash-inactive, PASS(1, M21) -> NO_RESULT(1, M34)
IVB: Intel_gpu_tools, igt_gem_persistent_relocs_forked-interruptible-thrash-inactive, PASS(1, M21) -> NO_RESULT(1, M34)
IVB: Intel_gpu_tools, igt_gem_persistent_relocs_forked-thrash-inactive, PASS(1, M21) -> NO_RESULT(1, M34)
IVB: Intel_gpu_tools, igt_gem_reloc_vs_gpu_forked-interruptible-thrash-inactive, PASS(1, M21) -> FAIL(1, M34)
IVB: Intel_gpu_tools, igt_gem_reloc_vs_gpu_forked-thrash-inactive, PASS(1, M21) -> FAIL(1, M34)
SNB: Intel_gpu_tools, igt_gem_concurrent_blit_gpu-rcs-early-read-forked, PASS(1, M35) -> TIMEOUT(1, M35)
SNB: Intel_gpu_tools, igt_gem_concurrent_blit_gpuX-rcs-early-read-forked, PASS(1, M35) -> TIMEOUT(1, M35)
HSW: Intel_gpu_tools, igt_drv_hangman_error-state-capture-blt, PASS(1, M39) -> FAIL(1, M39)
HSW: Intel_gpu_tools, igt_drv_hangman_error-state-capture-bsd, PASS(1, M39) -> FAIL(1, M39)
HSW: Intel_gpu_tools, igt_drv_hangman_error-state-capture-render, PASS(1, M39) -> FAIL(1, M39)
HSW: Intel_gpu_tools, igt_drv_hangman_error-state-capture-vebox, PASS(1, M39) -> FAIL(1, M39)
HSW: Intel_gpu_tools, igt_gem_bad_reloc_negative-reloc, PASS(1, M39) -> NSPT(1, M39)
HSW: Intel_gpu_tools, igt_gem_bad_reloc_negative-reloc-lut, PASS(1, M39) -> NSPT(1, M39)
HSW: Intel_gpu_tools, igt_gem_exec_parse_cmd-crossing-page, PASS(1, M39) -> FAIL(1, M39)
HSW: Intel_gpu_tools, igt_gem_madvise_dontneed-before-exec, PASS(1, M39) -> FAIL(1, M39)
HSW: Intel_gpu_tools, igt_gem_persistent_relocs_forked-faulting-reloc-thrash-inactive, PASS(1, M39) -> NO_RESULT(1, M39)
HSW: Intel_gpu_tools, igt_gem_persistent_relocs_forked-interruptible-faulting-reloc-thrash-inactive, PASS(1, M39) -> NO_RESULT(1, M39)
HSW: Intel_gpu_tools, igt_gem_persistent_relocs_forked-interruptible-thrash-inactive, PASS(1, M39) -> NO_RESULT(1, M39)
HSW: Intel_gpu_tools, igt_gem_persistent_relocs_forked-thrash-inactive, PASS(1, M39) -> NO_RESULT(1, M39)
HSW: Intel_gpu_tools, igt_gem_reloc_vs_gpu_forked-interruptible-thrash-inactive, PASS(1, M39) -> FAIL(1, M39)
HSW: Intel_gpu_tools, igt_gem_reloc_vs_gpu_forked-thrash-inactive, PASS(1, M39) -> FAIL(1, M39)
BDW: Intel_gpu_tools, igt_gem_linear_blits_interruptible, TIMEOUT(1, M30) -> PASS(1, M42)
BDW: Intel_gpu_tools, igt_gem_reloc_vs_gpu_forked-faulting-reloc-thrash-inactive, PASS(1, M30) -> TIMEOUT(1, M42)
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/5] Command parser batch buffer copy
  2014-11-03 19:19 [PATCH v3 0/5] Command parser batch buffer copy bradley.d.volkin
                   ` (4 preceding siblings ...)
  2014-11-03 19:19 ` [PATCH v3 5/5] drm/i915: Use batch length instead of object size in command parser bradley.d.volkin
@ 2014-11-03 22:44 ` Volkin, Bradley D
  2014-11-04  9:51   ` Daniel Vetter
  2014-11-04 10:31 ` Daniel Vetter
  6 siblings, 1 reply; 23+ messages in thread
From: Volkin, Bradley D @ 2014-11-03 22:44 UTC (permalink / raw)
  To: intel-gfx@lists.freedesktop.org

On Mon, Nov 03, 2014 at 11:19:40AM -0800, Volkin, Bradley D wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> This is v3 of the series I sent here:
> http://lists.freedesktop.org/archives/intel-gfx/2014-July/048705.html
> 
> Most of the previous commentary still applies. We've fixed the secure
> dispatch regression though, so the series now puts the parser into
> enabling mode in patch 2.
> 
> There are currently some regressions. I've sent i-g-t patches for a couple that
> are test issues. The remaining issues are:
> 
> drv_hangman error-state-capture-*
>     The test has checks that the logged 'gtt_offset' matches the expected
>     offset of the userspace-supplied batch buffer. Similarly for the address
>     in an MI_BATCH_BUFFER_START command found in the logged ringbuffer contents.
>     These obviously won't match if the buffer submitted to hardware is from the
>     batch pool instead of the one from userspace.
> 
> gem_reloc_vs_gpu *-thrash-inactive
> gem_persistent_relocs *-thrash-inactive
>     These fail with this type of error:
> 
>         Test assertion failure function do_test, file gem_reloc_vs_gpu.c:221:
>         Failed assertion: test == 0xdeadbeef
>         mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
>         child 6 failed with exit status 99
>         Subtest forked-thrash-inactive: FAIL (3.824s)
> 
>     One crashed, apparently in i915_gem_object_move_to_inactive() called via
>     i915_gem_reset(). I assume there's an issue with my active tracking or
>     madv usage for batch pool objects. Any input would be helpful.

Follow up: the current patches aren't setting any read domains for the shadow
batch object. Simply setting

shadow_batch_obj->base.pending_read_domains = batch_obj->base.pending_read_domains;

after parsing has these passing consistently in local testing. So unless anyone
sees further problems, I'll send a new version of the one patch with that change
squashed in.

Brad

> 
> gem_cs_tlb
>     This test takes longer and may time out.
> 
> Brad Volkin (5):
>   drm/i915: Implement a framework for batch buffer pools
>   drm/i915: Use batch pools with the command parser
>   drm/i915: Add a batch pool debugfs file
>   drm/i915: Add batch pool details to i915_gem_objects debugfs
>   drm/i915: Use batch length instead of object size in command parser
> 
>  Documentation/DocBook/drm.tmpl             |   5 +
>  drivers/gpu/drm/i915/Makefile              |   1 +
>  drivers/gpu/drm/i915/i915_cmd_parser.c     | 102 +++++++++++++++----
>  drivers/gpu/drm/i915/i915_debugfs.c        |  86 ++++++++++++++--
>  drivers/gpu/drm/i915/i915_dma.c            |   1 +
>  drivers/gpu/drm/i915/i915_drv.h            |  26 +++++
>  drivers/gpu/drm/i915/i915_gem.c            |  11 +++
>  drivers/gpu/drm/i915/i915_gem_batch_pool.c | 153 +++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  62 +++++++++++-
>  9 files changed, 416 insertions(+), 31 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_gem_batch_pool.c
> 
> -- 
> 1.9.1
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/5] Command parser batch buffer copy
  2014-11-03 22:44 ` [PATCH v3 0/5] Command parser batch buffer copy Volkin, Bradley D
@ 2014-11-04  9:51   ` Daniel Vetter
  0 siblings, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2014-11-04  9:51 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx@lists.freedesktop.org

On Mon, Nov 03, 2014 at 02:44:55PM -0800, Volkin, Bradley D wrote:
> On Mon, Nov 03, 2014 at 11:19:40AM -0800, Volkin, Bradley D wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > This is v3 of the series I sent here:
> > http://lists.freedesktop.org/archives/intel-gfx/2014-July/048705.html
> > 
> > Most of the previous commentary still applies. We've fixed the secure
> > dispatch regression though, so the series now puts the parser into
> > enabling mode in patch 2.
> > 
> > There are currently some regressions. I've sent i-g-t patches for a couple that
> > are test issues. The remaining issues are:
> > 
> > drv_hangman error-state-capture-*
> >     The test has checks that the logged 'gtt_offset' matches the expected
> >     offset of the userspace-supplied batch buffer. Similarly for the address
> >     in an MI_BATCH_BUFFER_START command found in the logged ringbuffer contents.
> >     These obviously won't match if the buffer submitted to hardware is from the
> >     batch pool instead of the one from userspace.

Sounds like we need a patch to check for the cmd parser getparam and
remove these checks if that is active.

> > 
> > gem_reloc_vs_gpu *-thrash-inactive
> > gem_persistent_relocs *-thrash-inactive
> >     These fail with this type of error:
> > 
> >         Test assertion failure function do_test, file gem_reloc_vs_gpu.c:221:
> >         Failed assertion: test == 0xdeadbeef
> >         mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
> >         child 6 failed with exit status 99
> >         Subtest forked-thrash-inactive: FAIL (3.824s)
> > 
> >     One crashed, apparently in i915_gem_object_move_to_inactive() called via
> >     i915_gem_reset(). I assume there's an issue with my active tracking or
> >     madv usage for batch pool objects. Any input would be helpful.
> 
> Follow up: the current patches aren't setting any read domains for the shadow
> batch object. Simply setting
> 
> shadow_batch_obj->base.pending_read_domains = batch_obj->base.pending_read_domains;
> 
> after parsing has these passing consistently in local testing. So unless anyone
> sees further problems, I'll send a new version of the one patch with that change
> squashed in.

Yeah, PRTS results looked like you've broken the shrinker logic. Not
setting read domains will result in no conflict and so unbind the shadow
batches way too early. Makes sense.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/5] drm/i915: Use batch pools with the command parser
  2014-11-03 19:19 ` [PATCH v3 2/5] drm/i915: Use batch pools with the command parser bradley.d.volkin
@ 2014-11-04 10:17   ` Daniel Vetter
  2014-11-04 16:35     ` Volkin, Bradley D
  2014-11-04 10:30   ` Daniel Vetter
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel Vetter @ 2014-11-04 10:17 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Mon, Nov 03, 2014 at 11:19:42AM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> This patch sets up all of the tracking and copying necessary to
> use batch pools with the command parser and dispatches the copied
> (shadow) batch to the hardware.
> 
> After this patch, the parser is in 'enabling' mode.

Awesome. Ok, a few small comments and notes for polish on this one.

> 
> Note that performance takes a hit from the copy in some cases
> and will likely need some work. At a rough pass, the memcpy
> appears to be the bottleneck. Without having done a deeper
> analysis, two ideas that come to mind are:
> 1) Copy sections of the batch at a time, as they are reached
>    by parsing. Might improve cache locality.
> 2) Copy only up to the userspace-supplied batch length and
>    memset the rest of the buffer. Reduces the number of reads.
> 
> v2:
> - Remove setting the capacity of the pool
> - One global pool instead of per-ring pools
> - Replace batch_obj with shadow_batch_obj and hook into eb->vmas
> - Memset any space in the shadow batch beyond what gets copied

Hm, what is that memset for? Just appeasing the cmdparser?

In that case why can't we just fail the scanning if we don't find the
MI_BB_END within the limit - the hardware should never execute beyond
that, so garbage beyond is ok. Or do I miss something.

> - Rebased on execlist prep refactoring
> 
> v3:
> - Rebase on chained batch handling
> - Squash in setting the secure dispatch flag
> - Add a note about the interaction w/secure dispatch pinning
> - Check for request->batch_obj == NULL in i915_gem_free_request
> 
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c     | 84 ++++++++++++++++++++++++------
>  drivers/gpu/drm/i915/i915_dma.c            |  1 +
>  drivers/gpu/drm/i915/i915_drv.h            |  8 +++
>  drivers/gpu/drm/i915/i915_gem.c            | 10 ++++
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 61 ++++++++++++++++++++--
>  5 files changed, 143 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 809bb95..c8fe403 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -838,6 +838,56 @@ finish:
>  	return (u32*)addr;
>  }
>  
> +/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
> +static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
> +		       struct drm_i915_gem_object *src_obj)
> +{
> +	int ret = 0;
> +	int needs_clflush = 0;
> +	u32 *src_addr, *dest_addr = NULL;
> +
> +	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
> +		return ERR_PTR(ret);
> +	}
> +
> +	src_addr = vmap_batch(src_obj);
> +	if (!src_addr) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
> +		ret = -ENOMEM;
> +		goto unpin_src;
> +	}
> +
> +	if (needs_clflush)
> +		drm_clflush_virt_range((char *)src_addr, src_obj->base.size);
> +
> +	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to set batch CPU domain\n");
> +		goto unmap_src;
> +	}
> +
> +	dest_addr = vmap_batch(dest_obj);
> +	if (!dest_addr) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
> +		ret = -ENOMEM;
> +		goto unmap_src;
> +	}
> +
> +	memcpy(dest_addr, src_addr, src_obj->base.size);
> +	if (dest_obj->base.size > src_obj->base.size)
> +		memset((u8 *)dest_addr + src_obj->base.size, 0,
> +		       dest_obj->base.size - src_obj->base.size);

Hm, no in-line clflush and cached cpu mmaps. How slow is this on vlv?

> +
> +unmap_src:
> +	vunmap(src_addr);
> +unpin_src:
> +	i915_gem_object_unpin_pages(src_obj);
> +
> +	return ret ? ERR_PTR(ret) : dest_addr;
> +}
> +
>  /**
>   * i915_needs_cmd_parser() - should a given ring use software command parsing?
>   * @ring: the ring in question
> @@ -954,6 +1004,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
>   * i915_parse_cmds() - parse a submitted batch buffer for privilege violations
>   * @ring: the ring on which the batch is to execute
>   * @batch_obj: the batch buffer in question
> + * @shadow_batch_obj: copy of the batch buffer in question
>   * @batch_start_offset: byte offset in the batch at which execution starts
>   * @is_master: is the submitting process the drm master?
>   *
> @@ -965,32 +1016,28 @@ static bool check_cmd(const struct intel_engine_cs *ring,
>   */
>  int i915_parse_cmds(struct intel_engine_cs *ring,
>  		    struct drm_i915_gem_object *batch_obj,
> +		    struct drm_i915_gem_object *shadow_batch_obj,
>  		    u32 batch_start_offset,
>  		    bool is_master)
>  {
>  	int ret = 0;
>  	u32 *cmd, *batch_base, *batch_end;
>  	struct drm_i915_cmd_descriptor default_desc = { 0 };
> -	int needs_clflush = 0;
>  	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
>  
> -	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
> -	if (ret) {
> -		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
> -		return ret;
> +	batch_base = copy_batch(shadow_batch_obj, batch_obj);
> +	if (IS_ERR(batch_base)) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
> +		return PTR_ERR(batch_base);
>  	}
>  
> -	batch_base = vmap_batch(batch_obj);
> -	if (!batch_base) {
> -		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
> -		i915_gem_object_unpin_pages(batch_obj);
> -		return -ENOMEM;
> -	}
> -
> -	if (needs_clflush)
> -		drm_clflush_virt_range((char *)batch_base, batch_obj->base.size);
> -
>  	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
> +
> +	/*
> +	 * We use the source object's size because the shadow object is as
> +	 * large or larger and copy_batch() will write MI_NOPs to the extra
> +	 * space. Parsing should be faster in some cases this way.
> +	 */
>  	batch_end = cmd + (batch_obj->base.size / sizeof(*batch_end));
>  
>  	while (cmd < batch_end) {
> @@ -1052,7 +1099,12 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>  
>  	vunmap(batch_base);
>  
> -	i915_gem_object_unpin_pages(batch_obj);
> +	if (!ret) {
> +		ret = i915_gem_object_set_to_gtt_domain(shadow_batch_obj,
> +							false);

This smells like hacking around the domain tracking system. The batch is
executed by the gpu, gtt domain is for cpu access. This might also be a
reason why the shrinker wreaks havoc with you shadow patches. With the
correct pending_read_domains move_to_active should take care of things
mostly.

> +		if (ret)
> +			DRM_DEBUG_DRIVER("CMD: Failed to set batch GTT domain\n");
> +	}
>  
>  	return ret;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 9a73533..dea0f45 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1891,6 +1891,7 @@ int i915_driver_unload(struct drm_device *dev)
>  
>  		mutex_lock(&dev->struct_mutex);
>  		i915_gem_cleanup_ringbuffer(dev);
> +		i915_gem_batch_pool_fini(&dev_priv->mm.batch_pool);
>  		i915_gem_context_fini(dev);
>  		mutex_unlock(&dev->struct_mutex);
>  		i915_gem_cleanup_stolen(dev);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index fbf10cc..50d974d 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1145,6 +1145,13 @@ struct i915_gem_mm {
>  	 */
>  	struct list_head unbound_list;
>  
> +	/*
> +	 * A pool of objects to use as shadow copies of client batch buffers
> +	 * when the command parser is enabled. Prevents the client from
> +	 * modifying the batch contents after software parsing.
> +	 */
> +	struct i915_gem_batch_pool batch_pool;
> +
>  	/** Usable portion of the GTT for GEM */
>  	unsigned long stolen_base; /* limited to low memory (32-bit) */
>  
> @@ -2782,6 +2789,7 @@ void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring);
>  bool i915_needs_cmd_parser(struct intel_engine_cs *ring);
>  int i915_parse_cmds(struct intel_engine_cs *ring,
>  		    struct drm_i915_gem_object *batch_obj,
> +		    struct drm_i915_gem_object *shadow_batch_obj,
>  		    u32 batch_start_offset,
>  		    bool is_master);
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 4dbd7b9..c59100d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2500,6 +2500,14 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
>  	if (request->ctx)
>  		i915_gem_context_unreference(request->ctx);
>  
> +	if (request->batch_obj &&
> +	    !list_empty(&request->batch_obj->batch_pool_list)) {
> +		struct drm_i915_private *dev_priv = to_i915(request->ring->dev);
> +
> +		i915_gem_batch_pool_put(&dev_priv->mm.batch_pool,
> +					request->batch_obj);
> +	}

This looks a bit like reinvented active tracking. If we copy the libdrm
cache design a bit more the cache would simply walk the active list when
there's nothing in the inactive list and shovel all the objects with
->active.

The advantage would be that we wouldn't leak special batch_pool_put code
all over the place, keeping things nice&tidy.

> +
>  	kfree(request);
>  }
>  
> @@ -5019,6 +5027,8 @@ i915_gem_load(struct drm_device *dev)
>  	dev_priv->mm.oom_notifier.notifier_call = i915_gem_shrinker_oom;
>  	register_oom_notifier(&dev_priv->mm.oom_notifier);
>  
> +	i915_gem_batch_pool_init(dev, &dev_priv->mm.batch_pool);
> +
>  	mutex_init(&dev_priv->fb_tracking.lock);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 4b7f5c1..c216ff2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1236,6 +1236,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct eb_vmas *eb;
>  	struct drm_i915_gem_object *batch_obj;
> +	struct drm_i915_gem_object *shadow_batch_obj = NULL;
> +	struct drm_i915_gem_exec_object2 shadow_exec_entry;
>  	struct intel_engine_cs *ring;
>  	struct intel_context *ctx;
>  	struct i915_address_space *vm;
> @@ -1361,22 +1363,63 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;

This line here to assign read domains is only required on the real batch
obj. So I think this should be into the if maze below to make sure we only
set it on the real batch (either the userspace one or the shadow copy).

>  
>  	if (i915_needs_cmd_parser(ring)) {
> +		shadow_batch_obj =
> +			i915_gem_batch_pool_get(&dev_priv->mm.batch_pool,
> +						batch_obj->base.size);
> +		if (IS_ERR(shadow_batch_obj)) {
> +			ret = PTR_ERR(shadow_batch_obj);
> +			/* Don't try to clean up the obj in the error path */
> +			shadow_batch_obj = NULL;
> +			goto err;
> +		}
> +
> +		ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 4096, 0);
> +		if (ret)
> +			goto err;
> +
>  		ret = i915_parse_cmds(ring,
>  				      batch_obj,
> +				      shadow_batch_obj,
>  				      args->batch_start_offset,
>  				      file->is_master);
>  		if (ret) {
>  			if (ret != -EACCES)
>  				goto err;
> +			else {
> +				i915_gem_object_ggtt_unpin(shadow_batch_obj);
> +				i915_gem_batch_pool_put(&dev_priv->mm.batch_pool,
> +							shadow_batch_obj);
> +			}
>  		} else {
> +			struct i915_vma *vma;
> +
> +			memset(&shadow_exec_entry, 0,
> +			       sizeof(shadow_exec_entry));
> +
> +			vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
> +			vma->exec_entry = &shadow_exec_entry;
> +			vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
> +			drm_gem_object_reference(&shadow_batch_obj->base);
> +			list_add_tail(&vma->exec_list, &eb->vmas);
> +
> +			batch_obj = shadow_batch_obj;
> +
>  			/*
> -			 * XXX: Actually do this when enabling batch copy...
> +			 * Set the DISPATCH_SECURE bit to remove the NON_SECURE
> +			 * bit from MI_BATCH_BUFFER_START commands issued in the
> +			 * dispatch_execbuffer implementations. We specifically
> +			 * don't want that set when the command parser is
> +			 * enabled.
>  			 *
> -			 * Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
> -			 * from MI_BATCH_BUFFER_START commands issued in the
> -			 * dispatch_execbuffer implementations. We specifically don't
> -			 * want that set when the command parser is enabled.
> +			 * NB: By setting this flag, we're going to hit the
> +			 * normal secure dispatch path below. That path will
> +			 * do an extra pin of the shadow batch object and then
> +			 * unpin after ->gt.do_execbuf(). At that point, the
> +			 * object will have the single outstanding pin from
> +			 * just before calling i915_parse_cmds(), and will get
> +			 * unpinned via eb_destroy() and __EXEC_OBJECT_HAS_PIN.
>  			 */

That sounds like this additional pin should be put into the cmd parser,
just around the copy function.

Also do_execbuf is already way too big a function. I think it would be
good to extract all the SECURE_DISPATCH/cmd parser/batch_bo special
handling code into a new function for clarity. And then if possible not
leak any of the cleanup code out.


> +			flags |= I915_DISPATCH_SECURE;
>  		}
>  	}
>  
> @@ -1418,6 +1461,14 @@ err:
>  	i915_gem_context_unreference(ctx);
>  	eb_destroy(eb);
>  
> +	if (ret && ring && shadow_batch_obj) {
> +		if (i915_gem_obj_is_pinned(shadow_batch_obj))
> +			i915_gem_object_ggtt_unpin(shadow_batch_obj);

Which would also clean up this code, which looks more like a hack to work
around the pin_count check on final unref than anything solid and
intentional ;-)

> +
> +		i915_gem_batch_pool_put(&dev_priv->mm.batch_pool,
> +					shadow_batch_obj);
> +	}
> +
>  	mutex_unlock(&dev->struct_mutex);
>  
>  pre_mutex_err:
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/5] drm/i915: Use batch pools with the command parser
  2014-11-03 19:19 ` [PATCH v3 2/5] drm/i915: Use batch pools with the command parser bradley.d.volkin
  2014-11-04 10:17   ` Daniel Vetter
@ 2014-11-04 10:30   ` Daniel Vetter
  2014-11-04 16:46     ` Volkin, Bradley D
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel Vetter @ 2014-11-04 10:30 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Mon, Nov 03, 2014 at 11:19:42AM -0800, bradley.d.volkin@intel.com wrote:
> +			flags |= I915_DISPATCH_SECURE;

I've forgotten one: You must have a full ppgtt check here since the
binding for aliasing ppgtt is still broken - all PIN_GLOBAL stuff still
ends up in the ppgtt ptes too, so yourspace can still get at your shadow
batch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/5] Command parser batch buffer copy
  2014-11-03 19:19 [PATCH v3 0/5] Command parser batch buffer copy bradley.d.volkin
                   ` (5 preceding siblings ...)
  2014-11-03 22:44 ` [PATCH v3 0/5] Command parser batch buffer copy Volkin, Bradley D
@ 2014-11-04 10:31 ` Daniel Vetter
  6 siblings, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2014-11-04 10:31 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Mon, Nov 03, 2014 at 11:19:40AM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> This is v3 of the series I sent here:
> http://lists.freedesktop.org/archives/intel-gfx/2014-July/048705.html
> 
> Most of the previous commentary still applies. We've fixed the secure
> dispatch regression though, so the series now puts the parser into
> enabling mode in patch 2.
> 
> There are currently some regressions. I've sent i-g-t patches for a couple that
> are test issues. The remaining issues are:
> 
> drv_hangman error-state-capture-*
>     The test has checks that the logged 'gtt_offset' matches the expected
>     offset of the userspace-supplied batch buffer. Similarly for the address
>     in an MI_BATCH_BUFFER_START command found in the logged ringbuffer contents.
>     These obviously won't match if the buffer submitted to hardware is from the
>     batch pool instead of the one from userspace.
> 
> gem_reloc_vs_gpu *-thrash-inactive
> gem_persistent_relocs *-thrash-inactive
>     These fail with this type of error:
> 
>         Test assertion failure function do_test, file gem_reloc_vs_gpu.c:221:
>         Failed assertion: test == 0xdeadbeef
>         mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
>         child 6 failed with exit status 99
>         Subtest forked-thrash-inactive: FAIL (3.824s)
> 
>     One crashed, apparently in i915_gem_object_move_to_inactive() called via
>     i915_gem_reset(). I assume there's an issue with my active tracking or
>     madv usage for batch pool objects. Any input would be helpful.
> 
> gem_cs_tlb
>     This test takes longer and may time out.
> 
> Brad Volkin (5):
>   drm/i915: Implement a framework for batch buffer pools
>   drm/i915: Use batch pools with the command parser
>   drm/i915: Add a batch pool debugfs file
>   drm/i915: Add batch pool details to i915_gem_objects debugfs
>   drm/i915: Use batch length instead of object size in command parser

Looks good overall, some comments on the 2nd patch. With those addressed
imo ready for detailed review.

Cheers, Daniel
> 
>  Documentation/DocBook/drm.tmpl             |   5 +
>  drivers/gpu/drm/i915/Makefile              |   1 +
>  drivers/gpu/drm/i915/i915_cmd_parser.c     | 102 +++++++++++++++----
>  drivers/gpu/drm/i915/i915_debugfs.c        |  86 ++++++++++++++--
>  drivers/gpu/drm/i915/i915_dma.c            |   1 +
>  drivers/gpu/drm/i915/i915_drv.h            |  26 +++++
>  drivers/gpu/drm/i915/i915_gem.c            |  11 +++
>  drivers/gpu/drm/i915/i915_gem_batch_pool.c | 153 +++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  62 +++++++++++-
>  9 files changed, 416 insertions(+), 31 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_gem_batch_pool.c
> 
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/5] drm/i915: Use batch pools with the command parser
  2014-11-04 10:17   ` Daniel Vetter
@ 2014-11-04 16:35     ` Volkin, Bradley D
  2014-11-05  9:50       ` Daniel Vetter
  0 siblings, 1 reply; 23+ messages in thread
From: Volkin, Bradley D @ 2014-11-04 16:35 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx@lists.freedesktop.org

On Tue, Nov 04, 2014 at 02:17:59AM -0800, Daniel Vetter wrote:
> On Mon, Nov 03, 2014 at 11:19:42AM -0800, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > This patch sets up all of the tracking and copying necessary to
> > use batch pools with the command parser and dispatches the copied
> > (shadow) batch to the hardware.
> > 
> > After this patch, the parser is in 'enabling' mode.
> 
> Awesome. Ok, a few small comments and notes for polish on this one.
> 
> > 
> > Note that performance takes a hit from the copy in some cases
> > and will likely need some work. At a rough pass, the memcpy
> > appears to be the bottleneck. Without having done a deeper
> > analysis, two ideas that come to mind are:
> > 1) Copy sections of the batch at a time, as they are reached
> >    by parsing. Might improve cache locality.
> > 2) Copy only up to the userspace-supplied batch length and
> >    memset the rest of the buffer. Reduces the number of reads.
> > 
> > v2:
> > - Remove setting the capacity of the pool
> > - One global pool instead of per-ring pools
> > - Replace batch_obj with shadow_batch_obj and hook into eb->vmas
> > - Memset any space in the shadow batch beyond what gets copied
> 
> Hm, what is that memset for? Just appeasing the cmdparser?
> 
> In that case why can't we just fail the scanning if we don't find the
> MI_BB_END within the limit - the hardware should never execute beyond
> that, so garbage beyond is ok. Or do I miss something.

Yeah, in retrospect this probably isn't needed. I think it was just some
extra paranoia to make sure the extra space is harmless.

> 
> > - Rebased on execlist prep refactoring
> > 
> > v3:
> > - Rebase on chained batch handling
> > - Squash in setting the secure dispatch flag
> > - Add a note about the interaction w/secure dispatch pinning
> > - Check for request->batch_obj == NULL in i915_gem_free_request
> > 
> > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_cmd_parser.c     | 84 ++++++++++++++++++++++++------
> >  drivers/gpu/drm/i915/i915_dma.c            |  1 +
> >  drivers/gpu/drm/i915/i915_drv.h            |  8 +++
> >  drivers/gpu/drm/i915/i915_gem.c            | 10 ++++
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 61 ++++++++++++++++++++--
> >  5 files changed, 143 insertions(+), 21 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > index 809bb95..c8fe403 100644
> > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > @@ -838,6 +838,56 @@ finish:
> >  	return (u32*)addr;
> >  }
> >  
> > +/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
> > +static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
> > +		       struct drm_i915_gem_object *src_obj)
> > +{
> > +	int ret = 0;
> > +	int needs_clflush = 0;
> > +	u32 *src_addr, *dest_addr = NULL;
> > +
> > +	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
> > +	if (ret) {
> > +		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
> > +		return ERR_PTR(ret);
> > +	}
> > +
> > +	src_addr = vmap_batch(src_obj);
> > +	if (!src_addr) {
> > +		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
> > +		ret = -ENOMEM;
> > +		goto unpin_src;
> > +	}
> > +
> > +	if (needs_clflush)
> > +		drm_clflush_virt_range((char *)src_addr, src_obj->base.size);
> > +
> > +	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
> > +	if (ret) {
> > +		DRM_DEBUG_DRIVER("CMD: Failed to set batch CPU domain\n");
> > +		goto unmap_src;
> > +	}
> > +
> > +	dest_addr = vmap_batch(dest_obj);
> > +	if (!dest_addr) {
> > +		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
> > +		ret = -ENOMEM;
> > +		goto unmap_src;
> > +	}
> > +
> > +	memcpy(dest_addr, src_addr, src_obj->base.size);
> > +	if (dest_obj->base.size > src_obj->base.size)
> > +		memset((u8 *)dest_addr + src_obj->base.size, 0,
> > +		       dest_obj->base.size - src_obj->base.size);
> 
> Hm, no in-line clflush and cached cpu mmaps. How slow is this on vlv?

No system to test on. Maybe we can get some qa perf testing on all relevant
platforms, since we'll want an overall sense of perf impact anyhow.

Is there a specific change you would make here to avoid the issue?

> 
> > +
> > +unmap_src:
> > +	vunmap(src_addr);
> > +unpin_src:
> > +	i915_gem_object_unpin_pages(src_obj);
> > +
> > +	return ret ? ERR_PTR(ret) : dest_addr;
> > +}
> > +
> >  /**
> >   * i915_needs_cmd_parser() - should a given ring use software command parsing?
> >   * @ring: the ring in question
> > @@ -954,6 +1004,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
> >   * i915_parse_cmds() - parse a submitted batch buffer for privilege violations
> >   * @ring: the ring on which the batch is to execute
> >   * @batch_obj: the batch buffer in question
> > + * @shadow_batch_obj: copy of the batch buffer in question
> >   * @batch_start_offset: byte offset in the batch at which execution starts
> >   * @is_master: is the submitting process the drm master?
> >   *
> > @@ -965,32 +1016,28 @@ static bool check_cmd(const struct intel_engine_cs *ring,
> >   */
> >  int i915_parse_cmds(struct intel_engine_cs *ring,
> >  		    struct drm_i915_gem_object *batch_obj,
> > +		    struct drm_i915_gem_object *shadow_batch_obj,
> >  		    u32 batch_start_offset,
> >  		    bool is_master)
> >  {
> >  	int ret = 0;
> >  	u32 *cmd, *batch_base, *batch_end;
> >  	struct drm_i915_cmd_descriptor default_desc = { 0 };
> > -	int needs_clflush = 0;
> >  	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
> >  
> > -	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
> > -	if (ret) {
> > -		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
> > -		return ret;
> > +	batch_base = copy_batch(shadow_batch_obj, batch_obj);
> > +	if (IS_ERR(batch_base)) {
> > +		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
> > +		return PTR_ERR(batch_base);
> >  	}
> >  
> > -	batch_base = vmap_batch(batch_obj);
> > -	if (!batch_base) {
> > -		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
> > -		i915_gem_object_unpin_pages(batch_obj);
> > -		return -ENOMEM;
> > -	}
> > -
> > -	if (needs_clflush)
> > -		drm_clflush_virt_range((char *)batch_base, batch_obj->base.size);
> > -
> >  	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
> > +
> > +	/*
> > +	 * We use the source object's size because the shadow object is as
> > +	 * large or larger and copy_batch() will write MI_NOPs to the extra
> > +	 * space. Parsing should be faster in some cases this way.
> > +	 */
> >  	batch_end = cmd + (batch_obj->base.size / sizeof(*batch_end));
> >  
> >  	while (cmd < batch_end) {
> > @@ -1052,7 +1099,12 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
> >  
> >  	vunmap(batch_base);
> >  
> > -	i915_gem_object_unpin_pages(batch_obj);
> > +	if (!ret) {
> > +		ret = i915_gem_object_set_to_gtt_domain(shadow_batch_obj,
> > +							false);
> 
> This smells like hacking around the domain tracking system. The batch is
> executed by the gpu, gtt domain is for cpu access. This might also be a
> reason why the shrinker wreaks havoc with you shadow patches. With the
> correct pending_read_domains move_to_active should take care of things
> mostly.

Ok, I'll look at this again. I was using the golden context batch buffer code
as a reference for this, so perhaps there's something to fix there as well.

> 
> > +		if (ret)
> > +			DRM_DEBUG_DRIVER("CMD: Failed to set batch GTT domain\n");
> > +	}
> >  
> >  	return ret;
> >  }
> > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > index 9a73533..dea0f45 100644
> > --- a/drivers/gpu/drm/i915/i915_dma.c
> > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > @@ -1891,6 +1891,7 @@ int i915_driver_unload(struct drm_device *dev)
> >  
> >  		mutex_lock(&dev->struct_mutex);
> >  		i915_gem_cleanup_ringbuffer(dev);
> > +		i915_gem_batch_pool_fini(&dev_priv->mm.batch_pool);
> >  		i915_gem_context_fini(dev);
> >  		mutex_unlock(&dev->struct_mutex);
> >  		i915_gem_cleanup_stolen(dev);
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index fbf10cc..50d974d 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1145,6 +1145,13 @@ struct i915_gem_mm {
> >  	 */
> >  	struct list_head unbound_list;
> >  
> > +	/*
> > +	 * A pool of objects to use as shadow copies of client batch buffers
> > +	 * when the command parser is enabled. Prevents the client from
> > +	 * modifying the batch contents after software parsing.
> > +	 */
> > +	struct i915_gem_batch_pool batch_pool;
> > +
> >  	/** Usable portion of the GTT for GEM */
> >  	unsigned long stolen_base; /* limited to low memory (32-bit) */
> >  
> > @@ -2782,6 +2789,7 @@ void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring);
> >  bool i915_needs_cmd_parser(struct intel_engine_cs *ring);
> >  int i915_parse_cmds(struct intel_engine_cs *ring,
> >  		    struct drm_i915_gem_object *batch_obj,
> > +		    struct drm_i915_gem_object *shadow_batch_obj,
> >  		    u32 batch_start_offset,
> >  		    bool is_master);
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 4dbd7b9..c59100d 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -2500,6 +2500,14 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
> >  	if (request->ctx)
> >  		i915_gem_context_unreference(request->ctx);
> >  
> > +	if (request->batch_obj &&
> > +	    !list_empty(&request->batch_obj->batch_pool_list)) {
> > +		struct drm_i915_private *dev_priv = to_i915(request->ring->dev);
> > +
> > +		i915_gem_batch_pool_put(&dev_priv->mm.batch_pool,
> > +					request->batch_obj);
> > +	}
> 
> This looks a bit like reinvented active tracking. If we copy the libdrm
> cache design a bit more the cache would simply walk the active list when
> there's nothing in the inactive list and shovel all the objects with
> ->active.
> 
> The advantage would be that we wouldn't leak special batch_pool_put code
> all over the place, keeping things nice&tidy.

Chris had also suggested looking at the libdrm cache with respect to this,
but I was concerned about the impact of the scheduler reordering requests.
I'd have to go back and look at libdrm, but it sounded like it was relying
on the fifo nature of submissions. Could be wrong though.

Plus, we're hardly leaking batch_pool_put "all over the place" :). There's
exactly 3 call sites - this one for batches that execute and complete,
one for the chained batch fallback, and one in the execbuf error handling
path. So my inclination is really to leave it as is.

> 
> > +
> >  	kfree(request);
> >  }
> >  
> > @@ -5019,6 +5027,8 @@ i915_gem_load(struct drm_device *dev)
> >  	dev_priv->mm.oom_notifier.notifier_call = i915_gem_shrinker_oom;
> >  	register_oom_notifier(&dev_priv->mm.oom_notifier);
> >  
> > +	i915_gem_batch_pool_init(dev, &dev_priv->mm.batch_pool);
> > +
> >  	mutex_init(&dev_priv->fb_tracking.lock);
> >  }
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index 4b7f5c1..c216ff2 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -1236,6 +1236,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	struct eb_vmas *eb;
> >  	struct drm_i915_gem_object *batch_obj;
> > +	struct drm_i915_gem_object *shadow_batch_obj = NULL;
> > +	struct drm_i915_gem_exec_object2 shadow_exec_entry;
> >  	struct intel_engine_cs *ring;
> >  	struct intel_context *ctx;
> >  	struct i915_address_space *vm;
> > @@ -1361,22 +1363,63 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
> 
> This line here to assign read domains is only required on the real batch
> obj. So I think this should be into the if maze below to make sure we only
> set it on the real batch (either the userspace one or the shadow copy).

Ok, I can put it after the whole parser block, after we've set
	batch_obj = shadow_batch_obj;

> 
> >  
> >  	if (i915_needs_cmd_parser(ring)) {
> > +		shadow_batch_obj =
> > +			i915_gem_batch_pool_get(&dev_priv->mm.batch_pool,
> > +						batch_obj->base.size);
> > +		if (IS_ERR(shadow_batch_obj)) {
> > +			ret = PTR_ERR(shadow_batch_obj);
> > +			/* Don't try to clean up the obj in the error path */
> > +			shadow_batch_obj = NULL;
> > +			goto err;
> > +		}
> > +
> > +		ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 4096, 0);
> > +		if (ret)
> > +			goto err;
> > +
> >  		ret = i915_parse_cmds(ring,
> >  				      batch_obj,
> > +				      shadow_batch_obj,
> >  				      args->batch_start_offset,
> >  				      file->is_master);
> >  		if (ret) {
> >  			if (ret != -EACCES)
> >  				goto err;
> > +			else {
> > +				i915_gem_object_ggtt_unpin(shadow_batch_obj);
> > +				i915_gem_batch_pool_put(&dev_priv->mm.batch_pool,
> > +							shadow_batch_obj);
> > +			}
> >  		} else {
> > +			struct i915_vma *vma;
> > +
> > +			memset(&shadow_exec_entry, 0,
> > +			       sizeof(shadow_exec_entry));
> > +
> > +			vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
> > +			vma->exec_entry = &shadow_exec_entry;
> > +			vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
> > +			drm_gem_object_reference(&shadow_batch_obj->base);
> > +			list_add_tail(&vma->exec_list, &eb->vmas);
> > +
> > +			batch_obj = shadow_batch_obj;
> > +
> >  			/*
> > -			 * XXX: Actually do this when enabling batch copy...
> > +			 * Set the DISPATCH_SECURE bit to remove the NON_SECURE
> > +			 * bit from MI_BATCH_BUFFER_START commands issued in the
> > +			 * dispatch_execbuffer implementations. We specifically
> > +			 * don't want that set when the command parser is
> > +			 * enabled.
> >  			 *
> > -			 * Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
> > -			 * from MI_BATCH_BUFFER_START commands issued in the
> > -			 * dispatch_execbuffer implementations. We specifically don't
> > -			 * want that set when the command parser is enabled.
> > +			 * NB: By setting this flag, we're going to hit the
> > +			 * normal secure dispatch path below. That path will
> > +			 * do an extra pin of the shadow batch object and then
> > +			 * unpin after ->gt.do_execbuf(). At that point, the
> > +			 * object will have the single outstanding pin from
> > +			 * just before calling i915_parse_cmds(), and will get
> > +			 * unpinned via eb_destroy() and __EXEC_OBJECT_HAS_PIN.
> >  			 */
> 
> That sounds like this additional pin should be put into the cmd parser,
> just around the copy function.

I suppose we could do that. I had originally been trying to keep i915_parse_cmds()
to only mapping/parsing the batch, but it's grown beyond that already I think.

> 
> Also do_execbuf is already way too big a function. I think it would be
> good to extract all the SECURE_DISPATCH/cmd parser/batch_bo special
> handling code into a new function for clarity. And then if possible not
> leak any of the cleanup code out.

The issue is all the potential failure points after this block. I'll look at it
again and see if there's a way to keep it clean.

> 
> 
> > +			flags |= I915_DISPATCH_SECURE;
> >  		}
> >  	}
> >  
> > @@ -1418,6 +1461,14 @@ err:
> >  	i915_gem_context_unreference(ctx);
> >  	eb_destroy(eb);
> >  
> > +	if (ret && ring && shadow_batch_obj) {
> > +		if (i915_gem_obj_is_pinned(shadow_batch_obj))
> > +			i915_gem_object_ggtt_unpin(shadow_batch_obj);
> 
> Which would also clean up this code, which looks more like a hack to work
> around the pin_count check on final unref than anything solid and
> intentional ;-)

This is error path cleanup, at least for a window where we've allocated the
shadow batch, pinned it, but haven't hooked it into the vma list. Again,
I'll see if there's a cleaner way to do it, but I don't think it's that bad.

Brad

> 
> > +
> > +		i915_gem_batch_pool_put(&dev_priv->mm.batch_pool,
> > +					shadow_batch_obj);
> > +	}
> > +
> >  	mutex_unlock(&dev->struct_mutex);
> >  
> >  pre_mutex_err:
> > -- 
> > 1.9.1
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/5] drm/i915: Use batch pools with the command parser
  2014-11-04 10:30   ` Daniel Vetter
@ 2014-11-04 16:46     ` Volkin, Bradley D
  2014-11-05  9:53       ` Daniel Vetter
  0 siblings, 1 reply; 23+ messages in thread
From: Volkin, Bradley D @ 2014-11-04 16:46 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx@lists.freedesktop.org

On Tue, Nov 04, 2014 at 02:30:14AM -0800, Daniel Vetter wrote:
> On Mon, Nov 03, 2014 at 11:19:42AM -0800, bradley.d.volkin@intel.com wrote:
> > +			flags |= I915_DISPATCH_SECURE;
> 
> I've forgotten one: You must have a full ppgtt check here since the
> binding for aliasing ppgtt is still broken - all PIN_GLOBAL stuff still
> ends up in the ppgtt ptes too, so yourspace can still get at your shadow
> batch.

Hmm. We have a check for USES_PPGTT in i915_needs_cmd_parser(). Should we
change that to USES_FULL_PPGTT and just not do any parser stuff without
full ppgtt? I guess I'm just wondering if there's a benefit to running the
parser in a case where we can't enable secure dispatch.

Brad

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/5] drm/i915: Use batch pools with the command parser
  2014-11-04 16:35     ` Volkin, Bradley D
@ 2014-11-05  9:50       ` Daniel Vetter
  2014-11-05 10:20         ` Daniel Vetter
  2014-11-05 22:42         ` Volkin, Bradley D
  0 siblings, 2 replies; 23+ messages in thread
From: Daniel Vetter @ 2014-11-05  9:50 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx@lists.freedesktop.org

On Tue, Nov 04, 2014 at 08:35:00AM -0800, Volkin, Bradley D wrote:
> On Tue, Nov 04, 2014 at 02:17:59AM -0800, Daniel Vetter wrote:
> > On Mon, Nov 03, 2014 at 11:19:42AM -0800, bradley.d.volkin@intel.com wrote:

> > Hm, no in-line clflush and cached cpu mmaps. How slow is this on vlv?
> 
> No system to test on. Maybe we can get some qa perf testing on all relevant
> platforms, since we'll want an overall sense of perf impact anyhow.
> 
> Is there a specific change you would make here to avoid the issue?

The fastest upload path afaik (without big setup costs like new
pagetables) is write + immediate clflushing. But that means that the cmd
parsing and write-out to the shadow batch needs to be merged first.

The downside of your approache here of first setting things to the cpu
write domain is that we'll incur an additional clflush pass before we copy
things. Which isn't needed since we'll overwrite all the cachelines
anyway. So an intermediate would be to leave the object in the GTT
(flushed domain) and only clflush after the copy and cmd parser is done.

But since clflush is only required on vlv (LLC machines are coherent) you
really need such a box to test/develop this. So I guess we'll just wait
with this.

> > > +	if (!ret) {
> > > +		ret = i915_gem_object_set_to_gtt_domain(shadow_batch_obj,
> > > +							false);
> > 
> > This smells like hacking around the domain tracking system. The batch is
> > executed by the gpu, gtt domain is for cpu access. This might also be a
> > reason why the shrinker wreaks havoc with you shadow patches. With the
> > correct pending_read_domains move_to_active should take care of things
> > mostly.
> 
> Ok, I'll look at this again. I was using the golden context batch buffer code
> as a reference for this, so perhaps there's something to fix there as well.

Hm, indeed that's a bit hacked up too. So let's document the sequenc as
it's used by execbuf:

i915_gem_execbuffer_move_to_gpu takes care of any cpu and gpu side cache
flushing. For the cmd parser you should be able to reuse that just by
putting the the shadow batch vma onto the execbuf list. render ctx init
would need to send in a single entry vma list.

i915_gem_execbuffer_move_to_active updates the domain tracking an puts the
vma onto the active list. That requires correctly adjusted
pending_read_domains though.

With these two no set_to_gtt domain should be required for flushing data
out. Mika, can you perhaps look into this, together with kerneldoc for the
render init functions?

> 
> > 
> > > +		if (ret)
> > > +			DRM_DEBUG_DRIVER("CMD: Failed to set batch GTT domain\n");
> > > +	}
> > >  
> > >  	return ret;
> > >  }
> > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > > index 9a73533..dea0f45 100644
> > > --- a/drivers/gpu/drm/i915/i915_dma.c
> > > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > > @@ -1891,6 +1891,7 @@ int i915_driver_unload(struct drm_device *dev)
> > >  
> > >  		mutex_lock(&dev->struct_mutex);
> > >  		i915_gem_cleanup_ringbuffer(dev);
> > > +		i915_gem_batch_pool_fini(&dev_priv->mm.batch_pool);
> > >  		i915_gem_context_fini(dev);
> > >  		mutex_unlock(&dev->struct_mutex);
> > >  		i915_gem_cleanup_stolen(dev);
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > index fbf10cc..50d974d 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > @@ -1145,6 +1145,13 @@ struct i915_gem_mm {
> > >  	 */
> > >  	struct list_head unbound_list;
> > >  
> > > +	/*
> > > +	 * A pool of objects to use as shadow copies of client batch buffers
> > > +	 * when the command parser is enabled. Prevents the client from
> > > +	 * modifying the batch contents after software parsing.
> > > +	 */
> > > +	struct i915_gem_batch_pool batch_pool;
> > > +
> > >  	/** Usable portion of the GTT for GEM */
> > >  	unsigned long stolen_base; /* limited to low memory (32-bit) */
> > >  
> > > @@ -2782,6 +2789,7 @@ void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring);
> > >  bool i915_needs_cmd_parser(struct intel_engine_cs *ring);
> > >  int i915_parse_cmds(struct intel_engine_cs *ring,
> > >  		    struct drm_i915_gem_object *batch_obj,
> > > +		    struct drm_i915_gem_object *shadow_batch_obj,
> > >  		    u32 batch_start_offset,
> > >  		    bool is_master);
> > >  
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index 4dbd7b9..c59100d 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -2500,6 +2500,14 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
> > >  	if (request->ctx)
> > >  		i915_gem_context_unreference(request->ctx);
> > >  
> > > +	if (request->batch_obj &&
> > > +	    !list_empty(&request->batch_obj->batch_pool_list)) {
> > > +		struct drm_i915_private *dev_priv = to_i915(request->ring->dev);
> > > +
> > > +		i915_gem_batch_pool_put(&dev_priv->mm.batch_pool,
> > > +					request->batch_obj);
> > > +	}
> > 
> > This looks a bit like reinvented active tracking. If we copy the libdrm
> > cache design a bit more the cache would simply walk the active list when
> > there's nothing in the inactive list and shovel all the objects with
> > ->active.
> > 
> > The advantage would be that we wouldn't leak special batch_pool_put code
> > all over the place, keeping things nice&tidy.
> 
> Chris had also suggested looking at the libdrm cache with respect to this,
> but I was concerned about the impact of the scheduler reordering requests.
> I'd have to go back and look at libdrm, but it sounded like it was relying
> on the fifo nature of submissions. Could be wrong though.

It just walks the list and moves all the non-active stuff over. Not
terribly efficient in the face of a scheduler (or multiple rings even),
but I think we could start to care about this if it turns out to be a real
problem.

> Plus, we're hardly leaking batch_pool_put "all over the place" :). There's
> exactly 3 call sites - this one for batches that execute and complete,
> one for the chained batch fallback, and one in the execbuf error handling
> path. So my inclination is really to leave it as is.

Oh I know, everyone's just adding 3 callsites over the driver ;-) GEM is
already way too complex for me to fully understand all the corners, so I
tend to be fairly nasty with separation and encapsulation. E.g. I totally
agree with Chris that all the execlist special cases we're sprinkling all
over the driver will come back and bite us hard. Especially since they
seem to be growing still with recent patches.

> > >  			/*
> > > -			 * XXX: Actually do this when enabling batch copy...
> > > +			 * Set the DISPATCH_SECURE bit to remove the NON_SECURE
> > > +			 * bit from MI_BATCH_BUFFER_START commands issued in the
> > > +			 * dispatch_execbuffer implementations. We specifically
> > > +			 * don't want that set when the command parser is
> > > +			 * enabled.
> > >  			 *
> > > -			 * Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
> > > -			 * from MI_BATCH_BUFFER_START commands issued in the
> > > -			 * dispatch_execbuffer implementations. We specifically don't
> > > -			 * want that set when the command parser is enabled.
> > > +			 * NB: By setting this flag, we're going to hit the
> > > +			 * normal secure dispatch path below. That path will
> > > +			 * do an extra pin of the shadow batch object and then
> > > +			 * unpin after ->gt.do_execbuf(). At that point, the
> > > +			 * object will have the single outstanding pin from
> > > +			 * just before calling i915_parse_cmds(), and will get
> > > +			 * unpinned via eb_destroy() and __EXEC_OBJECT_HAS_PIN.
> > >  			 */
> > 
> > That sounds like this additional pin should be put into the cmd parser,
> > just around the copy function.
> 
> I suppose we could do that. I had originally been trying to keep i915_parse_cmds()
> to only mapping/parsing the batch, but it's grown beyond that already I think.

Yeah I think that split would make sense, but now with the batch copy it's
probably better to have it within i915_cmd_parser.c
> 
> > 
> > Also do_execbuf is already way too big a function. I think it would be
> > good to extract all the SECURE_DISPATCH/cmd parser/batch_bo special
> > handling code into a new function for clarity. And then if possible not
> > leak any of the cleanup code out.
> 
> The issue is all the potential failure points after this block. I'll look at it
> again and see if there's a way to keep it clean.

Yeah. Although I have to agree that the other motivation for this request
was to make sure you'll have to think about no leaking such cleanups for
error paths, like the unpin below ;-)
> 
> > 
> > 
> > > +			flags |= I915_DISPATCH_SECURE;
> > >  		}
> > >  	}
> > >  
> > > @@ -1418,6 +1461,14 @@ err:
> > >  	i915_gem_context_unreference(ctx);
> > >  	eb_destroy(eb);
> > >  
> > > +	if (ret && ring && shadow_batch_obj) {
> > > +		if (i915_gem_obj_is_pinned(shadow_batch_obj))
> > > +			i915_gem_object_ggtt_unpin(shadow_batch_obj);
> > 
> > Which would also clean up this code, which looks more like a hack to work
> > around the pin_count check on final unref than anything solid and
> > intentional ;-)
> 
> This is error path cleanup, at least for a window where we've allocated the
> shadow batch, pinned it, but haven't hooked it into the vma list. Again,
> I'll see if there's a cleaner way to do it, but I don't think it's that bad.

I think if you put the pinning into the cmd parser function it should just
vanish.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/5] drm/i915: Use batch pools with the command parser
  2014-11-04 16:46     ` Volkin, Bradley D
@ 2014-11-05  9:53       ` Daniel Vetter
  0 siblings, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2014-11-05  9:53 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx@lists.freedesktop.org

On Tue, Nov 04, 2014 at 08:46:16AM -0800, Volkin, Bradley D wrote:
> On Tue, Nov 04, 2014 at 02:30:14AM -0800, Daniel Vetter wrote:
> > On Mon, Nov 03, 2014 at 11:19:42AM -0800, bradley.d.volkin@intel.com wrote:
> > > +			flags |= I915_DISPATCH_SECURE;
> > 
> > I've forgotten one: You must have a full ppgtt check here since the
> > binding for aliasing ppgtt is still broken - all PIN_GLOBAL stuff still
> > ends up in the ppgtt ptes too, so yourspace can still get at your shadow
> > batch.
> 
> Hmm. We have a check for USES_PPGTT in i915_needs_cmd_parser(). Should we
> change that to USES_FULL_PPGTT and just not do any parser stuff without
> full ppgtt? I guess I'm just wondering if there's a benefit to running the
> parser in a case where we can't enable secure dispatch.

Given the situation we're in (ugly) I think the USES_PPGTT is ok (since
that's kinda indicating whether the cmd parser is technically possible).
But granting mode should have an additional USES_FULL_PPGTT with a FIXME
comment that until the aliasing ppgtt binding is fixed we can only hide
the shadow batch with full ppgtt.

Wrt still keeping it running: Atm we can't enable it yet by default (since
neither full ppgtt is solid nor the aliasing ppgtt binding fixed). So I
think it makes sense to keep it in scan-only mode for now. Hence also why
I think an additional check is better - we can then just bin that when
aliasing ppgtt is fixed.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/5] drm/i915: Use batch pools with the command parser
  2014-11-05  9:50       ` Daniel Vetter
@ 2014-11-05 10:20         ` Daniel Vetter
  2014-11-05 22:42         ` Volkin, Bradley D
  1 sibling, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2014-11-05 10:20 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx@lists.freedesktop.org

On Wed, Nov 05, 2014 at 10:50:24AM +0100, Daniel Vetter wrote:
> On Tue, Nov 04, 2014 at 08:35:00AM -0800, Volkin, Bradley D wrote:
> > On Tue, Nov 04, 2014 at 02:17:59AM -0800, Daniel Vetter wrote:
> > > On Mon, Nov 03, 2014 at 11:19:42AM -0800, bradley.d.volkin@intel.com wrote:
> > > > +	if (!ret) {
> > > > +		ret = i915_gem_object_set_to_gtt_domain(shadow_batch_obj,
> > > > +							false);
> > > 
> > > This smells like hacking around the domain tracking system. The batch is
> > > executed by the gpu, gtt domain is for cpu access. This might also be a
> > > reason why the shrinker wreaks havoc with you shadow patches. With the
> > > correct pending_read_domains move_to_active should take care of things
> > > mostly.
> > 
> > Ok, I'll look at this again. I was using the golden context batch buffer code
> > as a reference for this, so perhaps there's something to fix there as well.
> 
> Hm, indeed that's a bit hacked up too. So let's document the sequenc as
> it's used by execbuf:
> 
> i915_gem_execbuffer_move_to_gpu takes care of any cpu and gpu side cache
> flushing. For the cmd parser you should be able to reuse that just by
> putting the the shadow batch vma onto the execbuf list. render ctx init
> would need to send in a single entry vma list.
> 
> i915_gem_execbuffer_move_to_active updates the domain tracking an puts the
> vma onto the active list. That requires correctly adjusted
> pending_read_domains though.
> 
> With these two no set_to_gtt domain should be required for flushing data
> out. Mika, can you perhaps look into this, together with kerneldoc for the
> render init functions?

Actually cc Mika ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/5] drm/i915: Use batch pools with the command parser
  2014-11-05  9:50       ` Daniel Vetter
  2014-11-05 10:20         ` Daniel Vetter
@ 2014-11-05 22:42         ` Volkin, Bradley D
  2014-11-06  7:36           ` Chris Wilson
  1 sibling, 1 reply; 23+ messages in thread
From: Volkin, Bradley D @ 2014-11-05 22:42 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx@lists.freedesktop.org

[snip]

On Wed, Nov 05, 2014 at 01:50:24AM -0800, Daniel Vetter wrote:
> On Tue, Nov 04, 2014 at 08:35:00AM -0800, Volkin, Bradley D wrote:
> > On Tue, Nov 04, 2014 at 02:17:59AM -0800, Daniel Vetter wrote:
> > > On Mon, Nov 03, 2014 at 11:19:42AM -0800, bradley.d.volkin@intel.com wrote:
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > index 4dbd7b9..c59100d 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > @@ -2500,6 +2500,14 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
> > > >  	if (request->ctx)
> > > >  		i915_gem_context_unreference(request->ctx);
> > > >  
> > > > +	if (request->batch_obj &&
> > > > +	    !list_empty(&request->batch_obj->batch_pool_list)) {
> > > > +		struct drm_i915_private *dev_priv = to_i915(request->ring->dev);
> > > > +
> > > > +		i915_gem_batch_pool_put(&dev_priv->mm.batch_pool,
> > > > +					request->batch_obj);
> > > > +	}
> > > 
> > > This looks a bit like reinvented active tracking. If we copy the libdrm
> > > cache design a bit more the cache would simply walk the active list when
> > > there's nothing in the inactive list and shovel all the objects with
> > > ->active.
> > > 
> > > The advantage would be that we wouldn't leak special batch_pool_put code
> > > all over the place, keeping things nice&tidy.
> > 
> > Chris had also suggested looking at the libdrm cache with respect to this,
> > but I was concerned about the impact of the scheduler reordering requests.
> > I'd have to go back and look at libdrm, but it sounded like it was relying
> > on the fifo nature of submissions. Could be wrong though.
> 
> It just walks the list and moves all the non-active stuff over. Not
> terribly efficient in the face of a scheduler (or multiple rings even),
> but I think we could start to care about this if it turns out to be a real
> problem.

I'm testing patches with the majority of the changes you've requested, so
should have a v4 tonight/tomorrow.

For this part, I've got an implementation that works ok but one difference is
that if we stop submitting batches, and therefore stop calling batch_pool_get,
we stop moving buffers to the batch pool's inactive list. This means some buffers
don't get marked purgeable even when they are. The solution that I see is to
add a function to do the batch pool active -> inactive work and then call that
from the appropriate place(s), but that seems to defeat the purpose of the
proposed change. Suggestions?

Brad
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/5] drm/i915: Use batch pools with the command parser
  2014-11-05 22:42         ` Volkin, Bradley D
@ 2014-11-06  7:36           ` Chris Wilson
  2014-11-06 13:56             ` Daniel Vetter
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2014-11-06  7:36 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx@lists.freedesktop.org

On Wed, Nov 05, 2014 at 02:42:00PM -0800, Volkin, Bradley D wrote:
> For this part, I've got an implementation that works ok but one difference is
> that if we stop submitting batches, and therefore stop calling batch_pool_get,
> we stop moving buffers to the batch pool's inactive list. This means some buffers
> don't get marked purgeable even when they are. The solution that I see is to
> add a function to do the batch pool active -> inactive work and then call that
> from the appropriate place(s), but that seems to defeat the purpose of the
> proposed change. Suggestions?

Just mark them always as purgeable.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/5] drm/i915: Use batch pools with the command parser
  2014-11-06  7:36           ` Chris Wilson
@ 2014-11-06 13:56             ` Daniel Vetter
  2014-11-06 17:38               ` Volkin, Bradley D
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Vetter @ 2014-11-06 13:56 UTC (permalink / raw)
  To: Chris Wilson, Volkin, Bradley D, Daniel Vetter,
	intel-gfx@lists.freedesktop.org

On Thu, Nov 06, 2014 at 07:36:55AM +0000, Chris Wilson wrote:
> On Wed, Nov 05, 2014 at 02:42:00PM -0800, Volkin, Bradley D wrote:
> > For this part, I've got an implementation that works ok but one difference is
> > that if we stop submitting batches, and therefore stop calling batch_pool_get,
> > we stop moving buffers to the batch pool's inactive list. This means some buffers
> > don't get marked purgeable even when they are. The solution that I see is to
> > add a function to do the batch pool active -> inactive work and then call that
> > from the appropriate place(s), but that seems to defeat the purpose of the
> > proposed change. Suggestions?
> 
> Just mark them always as purgeable.

Yeah the trick with purgeable is that the shrinker will wait for the
buffers to retire if they're still active. So you can mark the purgeable
right after the move_to_active call. Then the only part that doesn't
happen automatically is the batch-pool internal accounting. But we also
don't really care about that until we want a new shadow batch.

libdrm works the same way btw: Userspace grabs a new batch (from the cache
hopefully, setting willneed again), builds the cmd stream and submits it.
Then it frees the buffer right away, libdrm puts it into the bo cache and
also marks it as purgeable right away.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/5] drm/i915: Use batch pools with the command parser
  2014-11-06 13:56             ` Daniel Vetter
@ 2014-11-06 17:38               ` Volkin, Bradley D
  2014-11-07  9:32                 ` Chris Wilson
  2014-11-07  9:45                 ` Daniel Vetter
  0 siblings, 2 replies; 23+ messages in thread
From: Volkin, Bradley D @ 2014-11-06 17:38 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx@lists.freedesktop.org

On Thu, Nov 06, 2014 at 05:56:36AM -0800, Daniel Vetter wrote:
> On Thu, Nov 06, 2014 at 07:36:55AM +0000, Chris Wilson wrote:
> > On Wed, Nov 05, 2014 at 02:42:00PM -0800, Volkin, Bradley D wrote:
> > > For this part, I've got an implementation that works ok but one difference is
> > > that if we stop submitting batches, and therefore stop calling batch_pool_get,
> > > we stop moving buffers to the batch pool's inactive list. This means some buffers
> > > don't get marked purgeable even when they are. The solution that I see is to
> > > add a function to do the batch pool active -> inactive work and then call that
> > > from the appropriate place(s), but that seems to defeat the purpose of the
> > > proposed change. Suggestions?
> > 
> > Just mark them always as purgeable.
> 
> Yeah the trick with purgeable is that the shrinker will wait for the
> buffers to retire if they're still active. So you can mark the purgeable
> right after the move_to_active call. Then the only part that doesn't
> happen automatically is the batch-pool internal accounting. But we also
> don't really care about that until we want a new shadow batch.

Ok. I was concerned about leaving objects purgeable because there are various
places where the driver checks that an object is not purgeable. Looking at it
again, the only one I'm nervous about is i915_gem_object_get_pages(), but I'll
put something together and see if it's a problem. I imagine we can avoid the
issue by carefully setting madv during/after the parser flow.

Brad

> 
> libdrm works the same way btw: Userspace grabs a new batch (from the cache
> hopefully, setting willneed again), builds the cmd stream and submits it.
> Then it frees the buffer right away, libdrm puts it into the bo cache and
> also marks it as purgeable right away.
> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/5] drm/i915: Use batch pools with the command parser
  2014-11-06 17:38               ` Volkin, Bradley D
@ 2014-11-07  9:32                 ` Chris Wilson
  2014-11-07  9:45                 ` Daniel Vetter
  1 sibling, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2014-11-07  9:32 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx@lists.freedesktop.org

On Thu, Nov 06, 2014 at 09:38:00AM -0800, Volkin, Bradley D wrote:
> On Thu, Nov 06, 2014 at 05:56:36AM -0800, Daniel Vetter wrote:
> > On Thu, Nov 06, 2014 at 07:36:55AM +0000, Chris Wilson wrote:
> > > On Wed, Nov 05, 2014 at 02:42:00PM -0800, Volkin, Bradley D wrote:
> > > > For this part, I've got an implementation that works ok but one difference is
> > > > that if we stop submitting batches, and therefore stop calling batch_pool_get,
> > > > we stop moving buffers to the batch pool's inactive list. This means some buffers
> > > > don't get marked purgeable even when they are. The solution that I see is to
> > > > add a function to do the batch pool active -> inactive work and then call that
> > > > from the appropriate place(s), but that seems to defeat the purpose of the
> > > > proposed change. Suggestions?
> > > 
> > > Just mark them always as purgeable.
> > 
> > Yeah the trick with purgeable is that the shrinker will wait for the
> > buffers to retire if they're still active. So you can mark the purgeable
> > right after the move_to_active call. Then the only part that doesn't
> > happen automatically is the batch-pool internal accounting. But we also
> > don't really care about that until we want a new shadow batch.
> 
> Ok. I was concerned about leaving objects purgeable because there are various
> places where the driver checks that an object is not purgeable. Looking at it
> again, the only one I'm nervous about is i915_gem_object_get_pages(), but I'll
> put something together and see if it's a problem. I imagine we can avoid the
> issue by carefully setting madv during/after the parser flow.

Once we have the pages we will only lose them to the shrinker.

  batch_get_obj() {
        // presume we have sufficient i915_gem_retire_requests_ring()
	// see i915_gem_execbuffer_reserver()
	list_for_each(cache, list) {
		obj = cache->obj;
			
		if (obj->active)
			break;
		if (obj->madv == __I915_MADV_PURGED) {
			free(cache+obj);
			continue;
		}
		list_move_tail(&cache->link, &list);
		return obj;
	}

	obj = i915_gem_obj_alloc();
	i915_gem_object_get_pages(obj);
	obj->madv = I915_MADV_DONTNEED;

	cache = kmalloc();
	cache->obj = obj;
	list_add_tail(&cache->link, &cache);

	return obj;
 }

Whilst the object is active, it won't be preferrentially shrunk, but
will still be discarded in dire straits. So we don't need to touch madv
or worry about pages after allocation. However, as the free of the
kernel structs will not happen until the next execbuffer, we may consider
explicitly adding the cleanup of the cache to the oom-notifier.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/5] drm/i915: Use batch pools with the command parser
  2014-11-06 17:38               ` Volkin, Bradley D
  2014-11-07  9:32                 ` Chris Wilson
@ 2014-11-07  9:45                 ` Daniel Vetter
  1 sibling, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2014-11-07  9:45 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx@lists.freedesktop.org

On Thu, Nov 06, 2014 at 09:38:00AM -0800, Volkin, Bradley D wrote:
> On Thu, Nov 06, 2014 at 05:56:36AM -0800, Daniel Vetter wrote:
> > On Thu, Nov 06, 2014 at 07:36:55AM +0000, Chris Wilson wrote:
> > > On Wed, Nov 05, 2014 at 02:42:00PM -0800, Volkin, Bradley D wrote:
> > > > For this part, I've got an implementation that works ok but one difference is
> > > > that if we stop submitting batches, and therefore stop calling batch_pool_get,
> > > > we stop moving buffers to the batch pool's inactive list. This means some buffers
> > > > don't get marked purgeable even when they are. The solution that I see is to
> > > > add a function to do the batch pool active -> inactive work and then call that
> > > > from the appropriate place(s), but that seems to defeat the purpose of the
> > > > proposed change. Suggestions?
> > > 
> > > Just mark them always as purgeable.
> > 
> > Yeah the trick with purgeable is that the shrinker will wait for the
> > buffers to retire if they're still active. So you can mark the purgeable
> > right after the move_to_active call. Then the only part that doesn't
> > happen automatically is the batch-pool internal accounting. But we also
> > don't really care about that until we want a new shadow batch.
> 
> Ok. I was concerned about leaving objects purgeable because there are various
> places where the driver checks that an object is not purgeable. Looking at it
> again, the only one I'm nervous about is i915_gem_object_get_pages(), but I'll
> put something together and see if it's a problem. I imagine we can avoid the
> issue by carefully setting madv during/after the parser flow.

Yeah, calling get_pages on a purgeable object is a bug. But you have the
bo already pinned, so the only thing we might call is put_pages. And
being able to free pages is the point of purgeable. Of course before you
reuse it you have to set the bo to willneed again.

Aside: Since you're digging around in all this, feel like doing a DOC:
comment about purgeable memeory and pulling it into the kerneldoc? I know
that GEM driver docs are really thin still so that comment will look
lonely in the docbook, but we need to start somewhere.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-11-07  9:44 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-03 19:19 [PATCH v3 0/5] Command parser batch buffer copy bradley.d.volkin
2014-11-03 19:19 ` [PATCH v3 1/5] drm/i915: Implement a framework for batch buffer pools bradley.d.volkin
2014-11-03 19:19 ` [PATCH v3 2/5] drm/i915: Use batch pools with the command parser bradley.d.volkin
2014-11-04 10:17   ` Daniel Vetter
2014-11-04 16:35     ` Volkin, Bradley D
2014-11-05  9:50       ` Daniel Vetter
2014-11-05 10:20         ` Daniel Vetter
2014-11-05 22:42         ` Volkin, Bradley D
2014-11-06  7:36           ` Chris Wilson
2014-11-06 13:56             ` Daniel Vetter
2014-11-06 17:38               ` Volkin, Bradley D
2014-11-07  9:32                 ` Chris Wilson
2014-11-07  9:45                 ` Daniel Vetter
2014-11-04 10:30   ` Daniel Vetter
2014-11-04 16:46     ` Volkin, Bradley D
2014-11-05  9:53       ` Daniel Vetter
2014-11-03 19:19 ` [PATCH v3 3/5] drm/i915: Add a batch pool debugfs file bradley.d.volkin
2014-11-03 19:19 ` [PATCH v3 4/5] drm/i915: Add batch pool details to i915_gem_objects debugfs bradley.d.volkin
2014-11-03 19:19 ` [PATCH v3 5/5] drm/i915: Use batch length instead of object size in command parser bradley.d.volkin
2014-11-03 19:32   ` [PATCH v3 5/5] drm/i915: Use batch length instead of shuang.he
2014-11-03 22:44 ` [PATCH v3 0/5] Command parser batch buffer copy Volkin, Bradley D
2014-11-04  9:51   ` Daniel Vetter
2014-11-04 10:31 ` Daniel Vetter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox