public inbox for intel-gfx@lists.freedesktop.org
* [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror
@ 2014-05-10  3:58 Ben Widawsky
  2014-05-10  3:58 ` [PATCH 01/56] drm/i915: Fix flush before context switch comment Ben Widawsky
                   ` (56 more replies)
  0 siblings, 57 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:58 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky

Just as before, these patches are living based off of my Broadwell
branch, here:
http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=gpu_mirror

These are the follow-on patches for [1].

This patch series brings 3 things:
1. Dynamic page table allocation for gen6-8
2. 64b (48b canonical) graphics virtual address space for Broadwell
3. An interface to specify a specific offset for a BO.

It's taken way longer than I thought to get this work done, and given
the current state of our driver, I fear I may not have time to see this
through to the end before I am pulled onto other things. If people want
to send me smallish bugfixes, I will gladly do my best to fix them
quickly. If there are more substantial change requests wrt design or
patch reorganization, I will not be able to accommodate. Someone else
must take over this patch series at that point if they want these
features. I do believe that everything up until the userptr patch is in
decent shape though, so we'll see, I guess. (if you are qualified to
take this over, and have interest, please let me know).

The patch series is highly volatile and not manicured. I've run exactly
1 test on the GPU mirror (see below for what that means), though many
more on the prior stuff. The series depends on full PPGTT, which is not
yet enabled by default, and has a few outstanding issues. It also has
been developed exclusively on pre-production hardware. I am only sending
out now because I will be on vacation for the next 10 days, and I know
there are people that can benefit from this code before I return. With
that, I got the last parts of this working very recently, and they're
very hackish. There are two reasons for this lack of refinement: I expect
the interfaces for letting userspace dictate things to change (more on
this later), and I just ran out of time before my vacation. Throughout
development, I've been hitting issues, and I am not yet sure whether they
are bugs in my code, bugs in full PPGTT, bugs in userptr, or general
flakiness. A few patches in here are marked TESTME to reflect this. Also,
if you want to run this, I highly recommend
turning off semaphores, and rc6. (To be honest, I've not tried it
recently). You also need to turn on PPGTT since it is disabled by
default.

modprobe i915 enable_ppgtt=2 semaphores=0 enable_rc6=0

What you get in this series is what I'm going to call GPU mirror. This
patch series allows one to choose an arbitrary address for a GPU buffer
object, and map it at that specific location within the GPU's address
space. This is only possible because on Broadwell we get a 64b canonical
GPU address space, and this allows us to map any CPU address as a GPU
address. The obvious usage here is malloc(). malloc() returns a pointer
that is valid on the CPU. Now that address can be identical on the GPU.
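
For illustration only (not part of the series): a minimal sketch of what
"48b canonical" means for an address, assuming the same rule as x86-64,
where bits 63..47 must all be copies of bit 47. The helper names and the
GPU_VA_BITS constant are hypothetical. On a typical x86-64 Linux system
user pointers such as those returned by malloc() are expected to fall in
the lower canonical half, which is why they could be reused unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed number of implemented virtual address bits (from "48b canonical"). */
#define GPU_VA_BITS 48

static_assert(GPU_VA_BITS < 64, "sign extension needs spare upper bits");

/*
 * Hypothetical helper: an address is canonical when bits 63..47 are all
 * zero or all one, i.e. they are a sign extension of bit 47.
 */
static inline bool gpu_addr_is_canonical(uint64_t addr)
{
	uint64_t upper = addr >> (GPU_VA_BITS - 1);	/* bits 63..47 (17 bits) */

	return upper == 0 || upper == 0x1ffff;
}

/* Hypothetical helper: keep the low 48 bits and sign-extend bit 47. */
static inline uint64_t gpu_addr_canonicalize(uint64_t addr)
{
	const uint64_t low_mask = (UINT64_C(1) << GPU_VA_BITS) - 1;
	const uint64_t sign_bit = UINT64_C(1) << (GPU_VA_BITS - 1);

	addr &= low_mask;
	return (addr & sign_bit) ? (addr | ~low_mask) : addr;
}
```

A userspace caller could check gpu_addr_is_canonical((uint64_t)ptr) before
asking for a mirrored mapping; whether the kernel would enforce the same
check is not stated in this cover letter.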

The interface provided is identical to the userptr interface previously
posted by Chris Wilson. I've added a flag to that interface that
indicates this new functionality. This is not necessarily the final
version, and it's arguably not the best idea either. The reason for this
choice is we had users of userptr that wanted to try out this concept
and not have to do much porting.

To get to the userptr interface, I had to make a few things happen
first. I needed to get dynamic page table allocation and teardown
working. This was posted previously for gen6-7 [1] (with very rough code
for gen8). I've now added more robust support for gen8 dynamic page
table allocations. Doing the allocations dynamically was important
because preallocating all 4 levels of page tables is not feasible in a
real system. 4 level page tables are required in order to be able to
support the 64b canonical address space.
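
For illustration only (not taken from the patches): a sketch of how a 48b
address could be split across 4 levels, assuming an x86-64-style layout
of 9 index bits per level and a 12 bit page offset. The struct, field and
function names are hypothetical. The second helper shows why
preallocation is not feasible: under these assumptions the last-level
page tables alone for a full 48b space would occupy 512 GiB.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K	12	/* assumed 4 KiB pages */
#define LEVEL_BITS	9	/* assumed 512 entries per table */
#define LEVEL_MASK	((1u << LEVEL_BITS) - 1)

static_assert(4 * LEVEL_BITS + PAGE_SHIFT_4K == 48,
	      "four levels of 9 bits plus the page offset cover 48 bits");

/* Hypothetical container for the per-level indices of one address. */
struct va_indices {
	unsigned pml4e;		/* top level index, bits 47..39 */
	unsigned pdpe;		/* bits 38..30 */
	unsigned pde;		/* bits 29..21 */
	unsigned pte;		/* bits 20..12 */
	unsigned offset;	/* byte offset within the page, bits 11..0 */
};

static inline struct va_indices va_split(uint64_t addr)
{
	struct va_indices ix;

	ix.offset = (unsigned)(addr & ((1u << PAGE_SHIFT_4K) - 1));
	ix.pte    = (unsigned)(addr >> PAGE_SHIFT_4K) & LEVEL_MASK;
	ix.pde    = (unsigned)(addr >> (PAGE_SHIFT_4K + LEVEL_BITS)) & LEVEL_MASK;
	ix.pdpe   = (unsigned)(addr >> (PAGE_SHIFT_4K + 2 * LEVEL_BITS)) & LEVEL_MASK;
	ix.pml4e  = (unsigned)(addr >> (PAGE_SHIFT_4K + 3 * LEVEL_BITS)) & LEVEL_MASK;
	return ix;
}

/* Bytes needed to preallocate every last-level page table for va_bits. */
static inline uint64_t leaf_table_bytes(unsigned va_bits)
{
	/* Each 4 KiB page table maps 512 pages, i.e. 2 MiB of address space. */
	uint64_t tables = UINT64_C(1) << (va_bits - PAGE_SHIFT_4K - LEVEL_BITS);

	return tables << PAGE_SHIFT_4K;
}
```

With dynamic allocation, only the tables on the path to an address that
is actually bound need to exist, so the cost scales with what is mapped
rather than with the size of the address space.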

With that all done, I was able to make a few minor hacks to userptr,
take the intel-gpu-tools test from Tvrtko, and see at least one pass.
FWIW, I am currently running,
./tests/gem_userptr_blits --run-subtest coherency-unsync

Since I feel the interface will likely change, I do not feel compelled
to post either my libdrm or my IGT changes. If you want the modified
test, let me know, as I don't think it's really relevant here.

One last thing. Intel GPU tools, as it stands today, makes a lot of
assumptions about using an address space > 32b. I have not had time to
fix this. It is something which needs fixing before this series could
even be considered testable.

[1] http://lists.freedesktop.org/archives/intel-gfx/2014-March/041814.html

Ben Widawsky (54):
  drm/i915: Fix flush before context switch comment
  Revert "drm/i915: Drop I915_PARAM_HAS_FULL_PPGTT again"
  drm/i915: Wrap VMA binding
  drm/i915: Make pin global flags explicit
  drm/i915: Split out aliasing binds
  drm/i915: fix gtt_total_entries()
  drm/i915: Rename to GEN8_LEGACY_PDPES
  drm/i915: Split out verbose PPGTT dumping
  drm/i915: s/pd/pdpe, s/pt/pde
  drm/i915: rename map/unmap to dma_map/unmap
  drm/i915: Setup less PPGTT on failed pagedir
  drm/i915: clean up PPGTT init error path
  drm/i915: Un-hardcode number of page directories
  drm/i915: Make gen6_write_pdes gen6_map_page_tables
  drm/i915: Range clearing is PPGTT agnostic
  drm/i915: Page table helpers, and define renames
  drm/i915: construct page table abstractions
  drm/i915: Complete page table structures
  drm/i915: Create page table allocators
  drm/i915: Generalize GEN6 mapping
  drm/i915: Clean up pagetable DMA map & unmap
  drm/i915: Always dma map page table allocations
  drm/i915: Consolidate dma mappings
  drm/i915: Always dma map page directory allocations
  drm/i915: Track GEN6 page table usage
  drm/i915: Extract context switch skip logic
  drm/i915: Force pd restore when PDEs change, gen6-7
  drm/i915: Finish gen6/7 dynamic page table allocation
  drm/i915/bdw: Use dynamic allocation idioms on free
  drm/i915/bdw: pagedirs rework allocation
  drm/i915/bdw: pagetable allocation rework
  drm/i915/bdw: Make the pdp switch a bit less hacky
  drm/i915: num_pd_pages/num_pd_entries isn't useful
  drm/i915: Extract PPGTT param from pagedir alloc
  drm/i915/bdw: Split out mappings
  drm/i915/bdw: begin bitmap tracking
  drm/i915/bdw: Dynamic page table allocations
  drm/i915/bdw: Scratch unused pages
  drm/i915/bdw: Add ppgtt info for dynamic pages
  drm/i915/bdw: Optimize PDP loads
  TESTME: Either drop the last patch or fix it.
  drm/i915/bdw: Add dynamic page trace events
  drm/i915/bdw: Make pdp allocation more dynamic
  drm/i915/bdw: Abstract PDP usage
  drm/i915/bdw: implement alloc/teardown for 4lvl
  drm/i915/bdw: 4 level pages tables
  drm/i915: Restructure map vs. insert entries
  drm/i915/bdw: make aliasing PPGTT dynamic
  drm/i915: Expand error state's address width to 64b
  drm/i915/bdw: Flip the 48b switch
  TESTME: GFX_TLB_INVALIDATE_EXPLICIT
  TESTME: Always force invalidate
  drm/i915: Track userptr VMAs
  drm/i915/userptr: Mirror GPU addr at ioctl (HACK/POC)

Chris Wilson (2):
  drm/i915: Prevent signals from interrupting close()
  drm/i915: Introduce mapping of user pages into video memory (userptr)
    ioctl

 drivers/gpu/drm/i915/Kconfig               |    1 +
 drivers/gpu/drm/i915/Makefile              |    1 +
 drivers/gpu/drm/i915/i915_debugfs.c        |  112 +-
 drivers/gpu/drm/i915/i915_dma.c            |   15 +-
 drivers/gpu/drm/i915/i915_drv.h            |   40 +-
 drivers/gpu/drm/i915/i915_gem.c            |   61 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |   31 +-
 drivers/gpu/drm/i915/i915_gem_dmabuf.c     |    5 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   22 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 1810 +++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  354 +++++-
 drivers/gpu/drm/i915/i915_gem_userptr.c    |  767 ++++++++++++
 drivers/gpu/drm/i915/i915_gpu_error.c      |   21 +-
 drivers/gpu/drm/i915/i915_reg.h            |    1 +
 drivers/gpu/drm/i915/i915_trace.h          |  140 +++
 drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +-
 include/uapi/drm/i915_drm.h                |   20 +
 17 files changed, 2823 insertions(+), 580 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_userptr.c

-- 
1.9.2

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 01/56] drm/i915: Fix flush before context switch comment
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
@ 2014-05-10  3:58 ` Ben Widawsky
  2014-05-10  3:58 ` [PATCH 02/56] Revert "drm/i915: Drop I915_PARAM_HAS_FULL_PPGTT again" Ben Widawsky
                   ` (55 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:58 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 6e2145b..29dd825 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -553,9 +553,7 @@ mi_set_context(struct intel_ring_buffer *ring,
 	int ret;
 
 	/* w/a: If Flush TLB Invalidation Mode is enabled, driver must do a TLB
-	 * invalidation prior to MI_SET_CONTEXT. On GEN6 we don't set the value
-	 * explicitly, so we rely on the value at ring init, stored in
-	 * itlb_before_ctx_switch.
+	 * invalidation prior to MI_SET_CONTEXT.
 	 */
 	if (IS_GEN6(ring->dev)) {
 		ret = ring->flush(ring, I915_GEM_GPU_DOMAINS, 0);
-- 
1.9.2

* [PATCH 02/56] Revert "drm/i915: Drop I915_PARAM_HAS_FULL_PPGTT again"
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
  2014-05-10  3:58 ` [PATCH 01/56] drm/i915: Fix flush before context switch comment Ben Widawsky
@ 2014-05-10  3:58 ` Ben Widawsky
  2014-05-10  3:58 ` [PATCH 03/56] drm/i915: Prevent signals from interrupting close() Ben Widawsky
                   ` (54 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:58 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky

This reverts commit 7d9c477966e739a52d4c9655149958a2671ef376.

Conflicts:
	drivers/gpu/drm/i915/i915_dma.c
	include/uapi/drm/i915_drm.h
---
 drivers/gpu/drm/i915/i915_dma.c | 5 ++++-
 include/uapi/drm/i915_drm.h     | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index d02c8de..d10ddcc 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -994,7 +994,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		value = HAS_WT(dev);
 		break;
 	case I915_PARAM_HAS_ALIASING_PPGTT:
-		value = dev_priv->mm.aliasing_ppgtt || USES_FULL_PPGTT(dev);
+		value = dev_priv->mm.aliasing_ppgtt ? 1 : 0;
 		break;
 	case I915_PARAM_HAS_WAIT_TIMEOUT:
 		value = 1;
@@ -1020,6 +1020,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_CMD_PARSER_VERSION:
 		value = i915_cmd_parser_get_version();
 		break;
+	case I915_PARAM_HAS_FULL_PPGTT:
+		value = USES_FULL_PPGTT(dev);
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 8a3e4ef00..6306a84 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -338,6 +338,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT     	 	 27
 #define I915_PARAM_CMD_PARSER_VERSION	 28
+#define I915_PARAM_HAS_FULL_PPGTT	 29
 
 typedef struct drm_i915_getparam {
 	int param;
-- 
1.9.2

* [PATCH 03/56] drm/i915: Prevent signals from interrupting close()
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
  2014-05-10  3:58 ` [PATCH 01/56] drm/i915: Fix flush before context switch comment Ben Widawsky
  2014-05-10  3:58 ` [PATCH 02/56] Revert "drm/i915: Drop I915_PARAM_HAS_FULL_PPGTT again" Ben Widawsky
@ 2014-05-10  3:58 ` Ben Widawsky
  2014-05-10  3:58 ` [PATCH 04/56] drm/i915: Wrap VMA binding Ben Widawsky
                   ` (53 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:58 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky

From: Chris Wilson <chris@chris-wilson.co.uk>

We do not report any unfinished operations while releasing the GEM
objects associated with the file, and even if we did, it is bad form to
report -EINTR from a close().

The root cause of the bug that first showed itself during close is that
we do not do proper live tracking of vma and contexts under full-ppgtt,
but this is a useful piece of defensive programming enforcing our
userspace API contract.
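
For illustration only: a standalone model of the save/clear/restore
pattern the diff below applies to dev_priv->mm.interruptible. The struct
and function names here are hypothetical stand-ins, and sample_release()
merely simulates a release path that would fail with -EINTR if it were
still interruptible.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of dev_priv->mm. */
struct mm_state {
	bool interruptible;
};

/* Simulated release path: only fails when waits may be interrupted. */
static inline int sample_release(struct mm_state *mm)
{
	return mm->interruptible ? -EINTR : 0;
}

/*
 * Run a release callback with interruptible waits disabled, then restore
 * whatever value the flag had before the call.
 */
static inline int release_uninterruptible(struct mm_state *mm,
					  int (*release)(struct mm_state *))
{
	bool was_interruptible;
	int ret;

	assert(mm != NULL && release != NULL);

	was_interruptible = mm->interruptible;
	mm->interruptible = false;
	ret = release(mm);
	mm->interruptible = was_interruptible;

	return ret;
}
```

Restoring the saved value, rather than unconditionally writing true,
keeps the helper correct even if a caller had already disabled
interruptible waits.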

Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_dma.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index d10ddcc..54a08a9 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1921,9 +1921,18 @@ void i915_driver_lastclose(struct drm_device * dev)
 
 void i915_driver_preclose(struct drm_device * dev, struct drm_file *file_priv)
 {
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	bool was_interruptible;
+
 	mutex_lock(&dev->struct_mutex);
+	was_interruptible = dev_priv->mm.interruptible;
+	WARN_ON(!was_interruptible);
+	dev_priv->mm.interruptible = false;
+
 	i915_gem_context_close(dev, file_priv);
 	i915_gem_release(dev, file_priv);
+
+	dev_priv->mm.interruptible = was_interruptible;
 	mutex_unlock(&dev->struct_mutex);
 }
 
-- 
1.9.2

* [PATCH 04/56] drm/i915: Wrap VMA binding
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (2 preceding siblings ...)
  2014-05-10  3:58 ` [PATCH 03/56] drm/i915: Prevent signals from interrupting close() Ben Widawsky
@ 2014-05-10  3:58 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 05/56] drm/i915: Make pin global flags explicit Ben Widawsky
                   ` (52 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:58 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This will be useful for some upcoming patches which do more platform
specific work. Having it in one central place just makes things a bit
cleaner and easier.

NOTE: I didn't actually end up using this patch for the intended
purpose, but I thought it was a nice patch to keep around.

v2: s/i915_gem_bind_vma/i915_gem_vma_bind/
s/i915_gem_unbind_vma/i915_gem_vma_unbind/
(Chris)

v3: Missed one spot

v4: Don't change the trace events (Daniel)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h            |  3 +++
 drivers/gpu/drm/i915/i915_gem.c            | 12 ++++++------
 drivers/gpu/drm/i915/i915_gem_context.c    |  2 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 +++--
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 13 ++++++++++++-
 5 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1a190a1..88d3d82 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2312,6 +2312,9 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
 			struct i915_address_space *vm);
 unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 				struct i915_address_space *vm);
+void i915_gem_vma_bind(struct i915_vma *vma, enum i915_cache_level,
+		       unsigned flags);
+void i915_gem_vma_unbind(struct i915_vma *vma);
 struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 				     struct i915_address_space *vm);
 struct i915_vma *
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8fd1824..59b0e67 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2771,7 +2771,7 @@ int i915_vma_unbind(struct i915_vma *vma)
 
 	trace_i915_vma_unbind(vma);
 
-	vma->unbind_vma(vma);
+	i915_gem_vma_unbind(vma);
 
 	i915_gem_gtt_finish_object(obj);
 
@@ -3317,8 +3317,8 @@ search_free:
 	WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
 
 	trace_i915_vma_bind(vma, flags);
-	vma->bind_vma(vma, obj->cache_level,
-		      flags & (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0);
+	i915_gem_vma_bind(vma, obj->cache_level,
+			  flags & (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0);
 
 	i915_gem_verify_gtt(dev);
 	return vma;
@@ -3522,8 +3522,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 
 		list_for_each_entry(vma, &obj->vma_list, vma_link)
 			if (drm_mm_node_allocated(&vma->node))
-				vma->bind_vma(vma, cache_level,
-					      obj->has_global_gtt_mapping ? GLOBAL_BIND : 0);
+				i915_gem_vma_bind(vma, cache_level,
+						  obj->has_global_gtt_mapping ? GLOBAL_BIND : 0);
 	}
 
 	list_for_each_entry(vma, &obj->vma_list, vma_link)
@@ -3892,7 +3892,7 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
 	}
 
 	if (flags & PIN_GLOBAL && !obj->has_global_gtt_mapping)
-		vma->bind_vma(vma, obj->cache_level, GLOBAL_BIND);
+		i915_gem_vma_bind(vma, obj->cache_level, GLOBAL_BIND);
 
 	vma->pin_count++;
 	if (flags & PIN_MAPPABLE)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 29dd825..f2dc17a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -652,7 +652,7 @@ static int do_switch(struct intel_ring_buffer *ring,
 	if (!to->obj->has_global_gtt_mapping) {
 		struct i915_vma *vma = i915_gem_obj_to_vma(to->obj,
 							   &dev_priv->gtt.base);
-		vma->bind_vma(vma, to->obj->cache_level, GLOBAL_BIND);
+		i915_gem_vma_bind(vma, to->obj->cache_level, GLOBAL_BIND);
 	}
 
 	if (!to->is_initialized || i915_gem_context_is_default(to))
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 47fe8ec..cd9b932 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -373,7 +373,8 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 		struct i915_vma *vma =
 			list_first_entry(&target_i915_obj->vma_list,
 					 typeof(*vma), vma_link);
-		vma->bind_vma(vma, target_i915_obj->cache_level, GLOBAL_BIND);
+		i915_gem_vma_bind(vma, target_i915_obj->cache_level,
+				  GLOBAL_BIND);
 	}
 
 	/* Validate that the target is in a valid r/w GPU domain */
@@ -1269,7 +1270,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		 * allocate space first */
 		struct i915_vma *vma = i915_gem_obj_to_ggtt(batch_obj);
 		BUG_ON(!vma);
-		vma->bind_vma(vma, batch_obj->cache_level, GLOBAL_BIND);
+		i915_gem_vma_bind(vma, batch_obj->cache_level, GLOBAL_BIND);
 	}
 
 	if (flags & I915_DISPATCH_SECURE)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 33610fe..87d92d0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1340,7 +1340,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 		 * without telling our object about it. So we need to fake it.
 		 */
 		obj->has_global_gtt_mapping = 0;
-		vma->bind_vma(vma, obj->cache_level, GLOBAL_BIND);
+		i915_gem_vma_bind(vma, obj->cache_level, GLOBAL_BIND);
 	}
 
 
@@ -2041,6 +2041,17 @@ int i915_gem_gtt_init(struct drm_device *dev)
 	return 0;
 }
 
+void i915_gem_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
+		       unsigned flags)
+{
+	vma->bind_vma(vma, cache_level, flags);
+}
+
+void i915_gem_vma_unbind(struct i915_vma *vma)
+{
+	vma->unbind_vma(vma);
+}
+
 static struct i915_vma *__i915_gem_vma_create(struct drm_i915_gem_object *obj,
 					      struct i915_address_space *vm)
 {
-- 
1.9.2

* [PATCH 05/56] drm/i915: Make pin global flags explicit
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (3 preceding siblings ...)
  2014-05-10  3:58 ` [PATCH 04/56] drm/i915: Wrap VMA binding Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 06/56] drm/i915: Split out aliasing binds Ben Widawsky
                   ` (51 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

The driver currently lets callers pin global, and then tries to do
things correctly inside the function. Doing so has two downsides:
1. It's not possible to exclusively pin to a global, or an aliasing
address space.
2. It's difficult to read, and understand.

The eventual goal, when realized, should fix both of these issues. This
patch, which should have no functional impact, begins to address them
without intentionally breaking things.
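
For illustration only: a standalone model of the pin-flag to bind-flag
translation that the i915_gem_object_pin() hunk below introduces. The
flag values are copied from the diff; struct obj_state and
pin_to_bind_flags() are hypothetical names used to make the logic
testable outside the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as they appear in the diff below. */
#define PIN_MAPPABLE		0x1
#define PIN_NONBLOCK		0x2
#define PIN_GLOBAL		0x4
#define PIN_ALIASING		0x8
#define PIN_GLOBAL_ALIASED	(PIN_ALIASING | PIN_GLOBAL)

#define GLOBAL_BIND		(1 << 0)
#define ALIASING_BIND		(1 << 1)

static_assert(PIN_GLOBAL_ALIASED == 0xc, "both pin bits set");

/* Hypothetical stand-in for the two mapping bits tracked on the object. */
struct obj_state {
	bool has_global_gtt_mapping;
	bool has_aliasing_ppgtt_mapping;
};

/*
 * Request a bind only for the address spaces the caller asked for and
 * that the object is not already mapped into.
 */
static inline unsigned pin_to_bind_flags(unsigned pin_flags,
					 const struct obj_state *obj)
{
	unsigned bind_flags = 0;

	if ((pin_flags & PIN_GLOBAL) && !obj->has_global_gtt_mapping)
		bind_flags |= GLOBAL_BIND;
	if ((pin_flags & PIN_ALIASING) && !obj->has_aliasing_ppgtt_mapping)
		bind_flags |= ALIASING_BIND;

	return bind_flags;
}
```

Keeping the two bits separate is what later makes it possible to pin
exclusively into the global GTT or exclusively into the aliasing PPGTT.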

v2: Replace PIN_GLOBAL with PIN_ALIASING in _pin(). Copy paste error

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h            |  4 +++-
 drivers/gpu/drm/i915/i915_gem.c            | 31 +++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  8 ++++++--
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 12 ++++++++++--
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  4 ++++
 5 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 88d3d82..62e1ecb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2136,6 +2136,8 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
 #define PIN_MAPPABLE 0x1
 #define PIN_NONBLOCK 0x2
 #define PIN_GLOBAL 0x4
+#define PIN_ALIASING 0x8
+#define PIN_GLOBAL_ALIASED (PIN_ALIASING | PIN_GLOBAL)
 int __must_check i915_gem_object_pin(struct drm_i915_gem_object *obj,
 				     struct i915_address_space *vm,
 				     uint32_t alignment,
@@ -2362,7 +2364,7 @@ i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
 		      uint32_t alignment,
 		      unsigned flags)
 {
-	return i915_gem_object_pin(obj, obj_to_ggtt(obj), alignment, flags | PIN_GLOBAL);
+	return i915_gem_object_pin(obj, obj_to_ggtt(obj), alignment, flags | PIN_GLOBAL_ALIASED);
 }
 
 static inline int
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 59b0e67..e3ac643 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3231,8 +3231,12 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	size_t gtt_max =
 		flags & PIN_MAPPABLE ? dev_priv->gtt.mappable_end : vm->total;
 	struct i915_vma *vma;
+	u32 vma_bind_flags = 0;
 	int ret;
 
+	if (WARN_ON((flags & (PIN_MAPPABLE | PIN_GLOBAL)) == PIN_MAPPABLE))
+		flags |= PIN_GLOBAL;
+
 	fence_size = i915_gem_get_gtt_size(dev,
 					   obj->base.size,
 					   obj->tiling_mode);
@@ -3316,9 +3320,11 @@ search_free:
 
 	WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
 
+	if (flags & PIN_GLOBAL_ALIASED)
+		vma_bind_flags = GLOBAL_BIND | ALIASING_BIND;
+
 	trace_i915_vma_bind(vma, flags);
-	i915_gem_vma_bind(vma, obj->cache_level,
-			  flags & (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0);
+	i915_gem_vma_bind(vma, obj->cache_level, vma_bind_flags);
 
 	i915_gem_verify_gtt(dev);
 	return vma;
@@ -3521,9 +3527,14 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		}
 
 		list_for_each_entry(vma, &obj->vma_list, vma_link)
-			if (drm_mm_node_allocated(&vma->node))
-				i915_gem_vma_bind(vma, cache_level,
-						  obj->has_global_gtt_mapping ? GLOBAL_BIND : 0);
+			if (drm_mm_node_allocated(&vma->node)) {
+				u32 bind_flags = 0;
+				if (obj->has_global_gtt_mapping)
+					bind_flags |= GLOBAL_BIND;
+				if (obj->has_aliasing_ppgtt_mapping)
+					bind_flags |= ALIASING_BIND;
+				i915_gem_vma_bind(vma, cache_level, bind_flags);
+			}
 	}
 
 	list_for_each_entry(vma, &obj->vma_list, vma_link)
@@ -3891,8 +3902,14 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
 			return PTR_ERR(vma);
 	}
 
-	if (flags & PIN_GLOBAL && !obj->has_global_gtt_mapping)
-		i915_gem_vma_bind(vma, obj->cache_level, GLOBAL_BIND);
+	if (flags & PIN_GLOBAL_ALIASED) {
+		u32 bind_flags = 0;
+		if (flags & PIN_GLOBAL && !obj->has_global_gtt_mapping)
+			bind_flags |= GLOBAL_BIND;
+		if (flags & PIN_ALIASING && !obj->has_aliasing_ppgtt_mapping)
+			bind_flags |= ALIASING_BIND;
+		i915_gem_vma_bind(vma, obj->cache_level, bind_flags);
+	}
 
 	vma->pin_count++;
 	if (flags & PIN_MAPPABLE)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index cd9b932..7cad10f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -374,7 +374,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 			list_first_entry(&target_i915_obj->vma_list,
 					 typeof(*vma), vma_link);
 		i915_gem_vma_bind(vma, target_i915_obj->cache_level,
-				  GLOBAL_BIND);
+				  GLOBAL_BIND | ALIASING_BIND);
 	}
 
 	/* Validate that the target is in a valid r/w GPU domain */
@@ -561,6 +561,7 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 	if (need_fence || need_reloc_mappable(vma))
 		flags |= PIN_MAPPABLE;
 
+	/* FIXME: What kind of bind does Chris want? */
 	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
 		flags |= PIN_GLOBAL;
 
@@ -1270,7 +1271,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		 * allocate space first */
 		struct i915_vma *vma = i915_gem_obj_to_ggtt(batch_obj);
 		BUG_ON(!vma);
-		i915_gem_vma_bind(vma, batch_obj->cache_level, GLOBAL_BIND);
+		/* FIXME: Current secure dispatch code actually uses PPGTT. We
+		 * need to fix this eventually */
+		i915_gem_vma_bind(vma, batch_obj->cache_level,
+				  GLOBAL_BIND | ALIASING_BIND);
 	}
 
 	if (flags & I915_DISPATCH_SECURE)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 87d92d0..226afea 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1339,8 +1339,16 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 		 * Unfortunately above, we've just wiped out the mappings
 		 * without telling our object about it. So we need to fake it.
 		 */
-		obj->has_global_gtt_mapping = 0;
-		i915_gem_vma_bind(vma, obj->cache_level, GLOBAL_BIND);
+		if (obj->has_global_gtt_mapping || obj->has_aliasing_ppgtt_mapping) {
+			u32 bind_flags = 0;
+			if (obj->has_global_gtt_mapping)
+				        bind_flags |= GLOBAL_BIND;
+			if (obj->has_aliasing_ppgtt_mapping)
+				        bind_flags |= ALIASING_BIND;
+			obj->has_global_gtt_mapping = 0;
+			obj->has_aliasing_ppgtt_mapping = 0;
+			i915_gem_vma_bind(vma, obj->cache_level, bind_flags);
+		}
 	}
 
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index cfca023..5635c65 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -153,7 +153,11 @@ struct i915_vma {
 	 * setting the valid PTE entries to a reserved scratch page. */
 	void (*unbind_vma)(struct i915_vma *vma);
 	/* Map an object into an address space with the given cache flags. */
+
+/* Only use this if you know you want a strictly global binding */
 #define GLOBAL_BIND (1<<0)
+/* Only use this if you know you want a strictly aliased binding */
+#define ALIASING_BIND (1<<1)
 	void (*bind_vma)(struct i915_vma *vma,
 			 enum i915_cache_level cache_level,
 			 u32 flags);
-- 
1.9.2

* [PATCH 06/56] drm/i915: Split out aliasing binds
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (4 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 05/56] drm/i915: Make pin global flags explicit Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 07/56] drm/i915: fix gtt_total_entries() Ben Widawsky
                   ` (50 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This patch finishes off actually separating the aliasing and global
binds. Prior to this, all global binds would be aliased. Now if aliasing
binds are required, they must be explicitly asked for. So far, we have
no users of this outside of execbuf - but Mika has already submitted a
patch requiring just this.

A nice benefit of this is we should no longer be able to clobber GTT
only objects from the aliasing PPGTT.

v2: Only add aliasing binds for the GGTT/Aliasing PPGTT at execbuf

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h            | 2 +-
 drivers/gpu/drm/i915/i915_gem.c            | 6 ++++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +++--
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 3 +++
 4 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 62e1ecb..29bf034 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2364,7 +2364,7 @@ i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
 		      uint32_t alignment,
 		      unsigned flags)
 {
-	return i915_gem_object_pin(obj, obj_to_ggtt(obj), alignment, flags | PIN_GLOBAL_ALIASED);
+	return i915_gem_object_pin(obj, obj_to_ggtt(obj), alignment, flags | PIN_GLOBAL);
 }
 
 static inline int
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e3ac643..320d6b0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3320,8 +3320,10 @@ search_free:
 
 	WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
 
-	if (flags & PIN_GLOBAL_ALIASED)
-		vma_bind_flags = GLOBAL_BIND | ALIASING_BIND;
+	if (flags & PIN_ALIASING)
+		vma_bind_flags = ALIASING_BIND;
+	if (flags & PIN_GLOBAL)
+		vma_bind_flags = GLOBAL_BIND;
 
 	trace_i915_vma_bind(vma, flags);
 	i915_gem_vma_bind(vma, obj->cache_level, vma_bind_flags);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 7cad10f..3c3aba7 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -549,10 +549,11 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
 	bool need_fence;
-	unsigned flags;
+	unsigned flags = 0;
 	int ret;
 
-	flags = 0;
+	if (i915_is_ggtt(vma->vm))
+		flags = PIN_ALIASING;
 
 	need_fence =
 		has_fenced_gpu_access &&
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 226afea..bec637b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1595,6 +1595,9 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 		}
 	}
 
+	if (!(flags & ALIASING_BIND))
+		return;
+
 	if (dev_priv->mm.aliasing_ppgtt &&
 	    (!obj->has_aliasing_ppgtt_mapping ||
 	     (cache_level != obj->cache_level))) {
-- 
1.9.2

* [PATCH 07/56] drm/i915: fix gtt_total_entries()
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (5 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 06/56] drm/i915: Split out aliasing binds Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 08/56] drm/i915: Rename to GEN8_LEGACY_PDPES Ben Widawsky
                   ` (49 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Some upcoming work benefits from having this as a function rather than a
macro. Since we generally try to avoid macros anyway, I think it doesn't
hurt to put this in its own patch.
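
As a rough illustration of what the conversion buys (a sketch only, not the
driver code): the function form forces callers to pass a pointer of the right
type, which is why the call sites in the diff change from dev_priv->gtt to
&dev_priv->gtt. The sketch_ names and the 4 KiB page size are assumptions made
for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-ins for the driver's address-space structures. */
struct sketch_address_space {
	uint64_t total; /* size of the address space in bytes */
};

struct sketch_gtt {
	struct sketch_address_space base;
};

/* Function form: the compiler rejects anything that is not a
 * struct sketch_gtt pointer, which a macro would not do. */
static size_t sketch_gtt_total_entries(const struct sketch_gtt *gtt)
{
	assert(gtt != NULL);
	return (size_t)(gtt->base.total >> SKETCH_PAGE_SHIFT);
}
```

A 2 GiB address space with 4 KiB pages yields 524288 entries under these
assumptions.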

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 9 +++++++--
 drivers/gpu/drm/i915/i915_gem_gtt.h | 2 --
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bec637b..33cac92 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -73,6 +73,11 @@ static void ppgtt_bind_vma(struct i915_vma *vma,
 static void ppgtt_unbind_vma(struct i915_vma *vma);
 static int gen8_ppgtt_enable(struct i915_hw_ppgtt *ppgtt);
 
+static size_t gtt_total_entries(struct i915_gtt *gtt)
+{
+	return gtt->base.total >> PAGE_SHIFT;
+}
+
 static inline gen8_gtt_pte_t gen8_pte_encode(dma_addr_t addr,
 					     enum i915_cache_level level,
 					     bool valid)
@@ -1491,7 +1496,7 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
 	unsigned num_entries = length >> PAGE_SHIFT;
 	gen8_gtt_pte_t scratch_pte, __iomem *gtt_base =
 		(gen8_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
-	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
+	const int max_entries = gtt_total_entries(&dev_priv->gtt) - first_entry;
 	int i;
 
 	if (WARN(num_entries > max_entries,
@@ -1517,7 +1522,7 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
 	unsigned num_entries = length >> PAGE_SHIFT;
 	gen6_gtt_pte_t scratch_pte, __iomem *gtt_base =
 		(gen6_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
-	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
+	const int max_entries = gtt_total_entries(&dev_priv->gtt) - first_entry;
 	int i;
 
 	if (WARN(num_entries > max_entries,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 5635c65..ad68079 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -38,8 +38,6 @@ typedef uint32_t gen6_gtt_pte_t;
 typedef uint64_t gen8_gtt_pte_t;
 typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 
-#define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
-
 #define I915_PPGTT_PT_ENTRIES		(PAGE_SIZE / sizeof(gen6_gtt_pte_t))
 /* gen6-hsw has bit 11-4 for physical addr bit 39-32 */
 #define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
-- 
1.9.2

* [PATCH 08/56] drm/i915: Rename to GEN8_LEGACY_PDPES
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (6 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 07/56] drm/i915: fix gtt_total_entries() Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 09/56] drm/i915: Split out verbose PPGTT dumping Ben Widawsky
                   ` (48 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

In gen8, 32b PPGTT has always had one "pdp" (it doesn't actually have
one, but it resembles having one). The #define was confusing as is, and
using "PDPE" is a much better description.

sed -i 's/GEN8_LEGACY_PDPS/GEN8_LEGACY_PDPES/' drivers/gpu/drm/i915/*.[ch]

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 +++---
 drivers/gpu/drm/i915/i915_gem_gtt.h | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 33cac92..d3c52b1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -319,7 +319,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
-		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPS))
+		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
 			break;
 
 		if (pt_vaddr == NULL)
@@ -433,7 +433,7 @@ bail:
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
 					   const int max_pdp)
 {
-	struct page **pt_pages[GEN8_LEGACY_PDPS];
+	struct page **pt_pages[GEN8_LEGACY_PDPES];
 	int i, ret;
 
 	for (i = 0; i < max_pdp; i++) {
@@ -485,7 +485,7 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 		return -ENOMEM;
 
 	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
-	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
+	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index ad68079..7c06c43 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -84,7 +84,7 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN8_PDE_MASK			0x1ff
 #define GEN8_PTE_SHIFT			12
 #define GEN8_PTE_MASK			0x1ff
-#define GEN8_LEGACY_PDPS		4
+#define GEN8_LEGACY_PDPES		4
 #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
 #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
 
@@ -247,12 +247,12 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_pages; /* gen8+ */
 	union {
 		struct page **pt_pages;
-		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
+		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
 	};
 	struct page *pd_pages;
 	union {
 		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
+		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
 	};
 	union {
 		dma_addr_t *pt_dma_addr;
-- 
1.9.2

* [PATCH 09/56] drm/i915: Split out verbose PPGTT dumping
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (7 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 08/56] drm/i915: Rename to GEN8_LEGACY_PDPES Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 10/56] drm/i915: s/pd/pdpe, s/pt/pde Ben Widawsky
                   ` (47 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

There often is not enough memory to dump the full contents of the PPGTT.
As a temporary bandage, to continue getting valuable basic PPGTT info,
wrap the dangerous, memory hungry part inside of a new verbose version
of the debugfs file.
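
The diff below hands the verbose flag to the idr_for_each() callback by
folding it into the low bit of the seq_file pointer. A minimal userspace
sketch of that packing technique follows; the sketch_ names are hypothetical,
and it assumes the pointee is at least 2-byte aligned so that bit 0 of the
real address is always zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack a boolean into bit 0 of a pointer. Only valid when the pointee
 * is at least 2-byte aligned. */
static void *sketch_pack(void *ptr, bool verbose)
{
	assert(((uintptr_t)ptr & 1) == 0);
	return (void *)((uintptr_t)ptr | (uintptr_t)verbose);
}

/* Recover the flag from a packed pointer. */
static bool sketch_flag(void *packed)
{
	return ((uintptr_t)packed & 1) != 0;
}

/* Recover the pointer. The flag bit has to be masked off before the
 * result is dereferenced. */
static void *sketch_ptr(void *packed)
{
	return (void *)((uintptr_t)packed & ~(uintptr_t)1);
}
```

The receiving side needs both accessors: reading the flag alone leaves the
pointer one byte off whenever the flag is set.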

Also while here we can split out the PPGTT print function so it's more
reusable.

I'd really like to get PPGTT info into our error state, but I found it too
difficult to make work in the limited time I have. Maybe Mika can find a way.

v2: Get the info for the non-default contexts. Merge a patch from Chris
into this patch (Chris). All credit goes to him.

References: 20140320115742.GA4463@nuc-i3427.alporthouse.com
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 49 +++++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index d9c1414..4a0b1c8 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1812,18 +1812,13 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
 	return 0;
 }
 
-static int per_file_ctx(int id, void *ptr, void *data)
+static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt, const char *name)
 {
-	struct i915_hw_context *ctx = ptr;
-	struct seq_file *m = data;
-	struct i915_hw_ppgtt *ppgtt = ctx_to_ppgtt(ctx);
-
-	ppgtt->debug_dump(ppgtt, m);
-
-	return 0;
+	seq_printf(m, "%s:\n", name);
+	seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
 }
 
-static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
+static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verbose)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *ring;
@@ -1847,7 +1842,21 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 	}
 }
 
-static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
+static int per_file_ctx(int id, void *ptr, void *data)
+{
+	struct i915_hw_context *ctx = ptr;
+	struct seq_file *m = (struct seq_file *)((unsigned long)data & ~1UL);
+	bool verbose = (unsigned long)data & 1;
+	struct i915_hw_ppgtt *ppgtt = ctx_to_ppgtt(ctx);
+
+	print_ppgtt(m, ppgtt, ctx->id == DEFAULT_CONTEXT_ID ? "Default context" : "User context");
+	if (verbose)
+		ppgtt->debug_dump(ppgtt, m);
+
+	return 0;
+}
+
+static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool verbose)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *ring;
@@ -1868,10 +1877,9 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 	if (dev_priv->mm.aliasing_ppgtt) {
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
-		seq_puts(m, "aliasing PPGTT:\n");
-		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
-
-		ppgtt->debug_dump(ppgtt, m);
+		print_ppgtt(m, ppgtt, "Aliasing PPGTT");
+		if (verbose)
+			ppgtt->debug_dump(ppgtt, m);
 	} else
 		return;
 
@@ -1880,10 +1888,11 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		struct i915_hw_ppgtt *pvt_ppgtt;
 
 		pvt_ppgtt = ctx_to_ppgtt(file_priv->private_default_ctx);
-		seq_printf(m, "proc: %s\n",
+		seq_printf(m, "\nproc: %s\n",
 			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
-		seq_puts(m, "  default context:\n");
-		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
+		print_ppgtt(m, pvt_ppgtt, "Default context");
+		idr_for_each(&file_priv->context_idr, per_file_ctx,
+			     (void *)((unsigned long)m | verbose));
 	}
 	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
@@ -1893,6 +1902,7 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	struct drm_info_node *node = (struct drm_info_node *) m->private;
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	bool verbose = node->info_ent->data ? true : false;
 
 	int ret = mutex_lock_interruptible(&dev->struct_mutex);
 	if (ret)
@@ -1900,9 +1910,9 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	intel_runtime_pm_get(dev_priv);
 
 	if (INTEL_INFO(dev)->gen >= 8)
-		gen8_ppgtt_info(m, dev);
+		gen8_ppgtt_info(m, dev, verbose);
 	else if (INTEL_INFO(dev)->gen >= 6)
-		gen6_ppgtt_info(m, dev);
+		gen6_ppgtt_info(m, dev, verbose);
 
 	intel_runtime_pm_put(dev_priv);
 	mutex_unlock(&dev->struct_mutex);
@@ -3843,6 +3853,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
 	{"i915_swizzle_info", i915_swizzle_info, 0},
 	{"i915_ppgtt_info", i915_ppgtt_info, 0},
+	{"i915_ppgtt_verbose_info", i915_ppgtt_info, 0, (void *)1},
 	{"i915_llc", i915_llc, 0},
 	{"i915_edp_psr_status", i915_edp_psr_status, 0},
 	{"i915_sink_crc_eDP1", i915_sink_crc, 0},
-- 
1.9.2

* [PATCH 10/56] drm/i915: s/pd/pdpe, s/pt/pde
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (8 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 09/56] drm/i915: Split out verbose PPGTT dumping Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 11/56] drm/i915: rename map/unmap to dma_map/unmap Ben Widawsky
                   ` (46 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

With the new style of page table data structures, the correct way to
think about these parameters is as the entry being indexed in the array.
"pd" and "pt" aren't representative of what the operation is doing.

The clarity here will improve the readability of future patches.
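
To make the naming concrete, here is a sketch of how a gen8 32b PPGTT address
could be split into the three indices. The PTE shift and the PTE/PDE masks
match the defines quoted elsewhere in this series (GEN8_PTE_SHIFT,
GEN8_PTE_MASK, GEN8_PDE_MASK); the PDE shift, PDPE shift and PDPE mask are
assumptions derived from 9 bits per level and 4 legacy PDPEs. The sk_ names
are hypothetical.

```c
#include <stdint.h>

#define SK_PTE_SHIFT	12	/* as GEN8_PTE_SHIFT */
#define SK_PTE_MASK	0x1ff	/* as GEN8_PTE_MASK */
#define SK_PDE_SHIFT	21	/* assumption: 12 + 9 */
#define SK_PDE_MASK	0x1ff	/* as GEN8_PDE_MASK */
#define SK_PDPE_SHIFT	30	/* assumption: 21 + 9 */
#define SK_PDPE_MASK	0x3	/* assumption: 4 legacy PDPEs */

/* Which page directory pointer entry the address falls under. */
static unsigned sk_pdpe_index(uint64_t addr)
{
	return (unsigned)((addr >> SK_PDPE_SHIFT) & SK_PDPE_MASK);
}

/* Which entry within that page directory. */
static unsigned sk_pde_index(uint64_t addr)
{
	return (unsigned)((addr >> SK_PDE_SHIFT) & SK_PDE_MASK);
}

/* Which entry within that page table. */
static unsigned sk_pte_index(uint64_t addr)
{
	return (unsigned)((addr >> SK_PTE_SHIFT) & SK_PTE_MASK);
}
```

With this split, an expression such as gen8_pt_pages[pdpe][pde] reads as
"the page table selected by entry pde of the directory selected by entry
pdpe".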

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d3c52b1..0869e54 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -515,40 +515,40 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 }
 
 static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
-					     const int pd)
+					     const int pdpe)
 {
 	dma_addr_t pd_addr;
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       &ppgtt->pd_pages[pd], 0,
+			       &ppgtt->pd_pages[pdpe], 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->pd_dma_addr[pd] = pd_addr;
+	ppgtt->pd_dma_addr[pdpe] = pd_addr;
 
 	return 0;
 }
 
 static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
-					const int pd,
-					const int pt)
+					const int pdpe,
+					const int pde)
 {
 	dma_addr_t pt_addr;
 	struct page *p;
 	int ret;
 
-	p = ppgtt->gen8_pt_pages[pd][pt];
+	p = ppgtt->gen8_pt_pages[pdpe][pde];
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
+	ppgtt->gen8_pt_dma_addr[pdpe][pde] = pt_addr;
 
 	return 0;
 }
-- 
1.9.2

* [PATCH 11/56] drm/i915: rename map/unmap to dma_map/unmap
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (9 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 10/56] drm/i915: s/pd/pdpe, s/pt/pde Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 12/56] drm/i915: Setup less PPGTT on failed pagedir Ben Widawsky
                   ` (45 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Upcoming patches will use the terms map and unmap in reference to the
page table entries. Having this distinction will really help with code
clarity at that point.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0869e54..d772577 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -372,7 +372,7 @@ static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
 	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
-static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
+static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
@@ -403,7 +403,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	list_del(&vm->global_link);
 	drm_mm_takedown(&vm->mm);
 
-	gen8_ppgtt_unmap_pages(ppgtt);
+	gen8_ppgtt_dma_unmap_pages(ppgtt);
 	gen8_ppgtt_free(ppgtt);
 }
 
@@ -631,7 +631,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	return 0;
 
 bail:
-	gen8_ppgtt_unmap_pages(ppgtt);
+	gen8_ppgtt_dma_unmap_pages(ppgtt);
 	gen8_ppgtt_free(ppgtt);
 	return ret;
 }
@@ -999,7 +999,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
-static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
+static void gen6_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
@@ -1030,7 +1030,7 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 	drm_mm_takedown(&ppgtt->base.mm);
 	drm_mm_remove_node(&ppgtt->node);
 
-	gen6_ppgtt_unmap_pages(ppgtt);
+	gen6_ppgtt_dma_unmap_pages(ppgtt);
 	gen6_ppgtt_free(ppgtt);
 }
 
@@ -1128,7 +1128,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 				       PCI_DMA_BIDIRECTIONAL);
 
 		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
-			gen6_ppgtt_unmap_pages(ppgtt);
+			gen6_ppgtt_dma_unmap_pages(ppgtt);
 			return -EIO;
 		}
 
-- 
1.9.2

* [PATCH 12/56] drm/i915: Setup less PPGTT on failed pagedir
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (10 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 11/56] drm/i915: rename map/unmap to dma_map/unmap Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 13/56] drm/i915: clean up PPGTT init error path Ben Widawsky
                   ` (44 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

When the page directory allocation fails, the current code will
potentially both print a WARN and set up part of the PPGTT structure.
Neither of these harms the current code; the change is simply for clarity,
and perhaps to prevent later bugs or weird debug messages.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d772577..5ca8208 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1063,11 +1063,14 @@ alloc:
 		goto alloc;
 	}
 
+	if (ret)
+		return ret;
+
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
 	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
-	return ret;
+	return 0;
 }
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-- 
1.9.2

* [PATCH 13/56] drm/i915: clean up PPGTT init error path
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (11 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 12/56] drm/i915: Setup less PPGTT on failed pagedir Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 14/56] drm/i915: Un-hardcode number of page directories Ben Widawsky
                   ` (43 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

The old code (I'm having trouble finding the commit) had a reason for
doing things when there was an error, and would continue on, thus the
!ret. For the newer code however, this looks completely silly.

Follow the normal idiom of if (ret) return ret.
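
For reference, a small sketch of the idiom in isolation. The sk_ names and
the fail parameter are hypothetical and exist only to make the example
self-contained; the point is that the error path leaves immediately and the
success path stays unindented.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the PPGTT structure. */
struct sk_ppgtt {
	int initialized;
};

/* Hypothetical gen-specific init step that can be made to fail. */
static int sk_hw_init(struct sk_ppgtt *ppgtt, int fail)
{
	(void)ppgtt;
	return fail ? -ENOMEM : 0;
}

static int sk_init_ppgtt(struct sk_ppgtt *ppgtt, int fail)
{
	int ret;

	assert(ppgtt != NULL);

	ret = sk_hw_init(ppgtt, fail);
	if (ret)
		return ret; /* nothing below runs on failure */

	ppgtt->initialized = 1;
	return 0;
}
```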

Also, put the pde wiring in the gen specific init, now that GEN8 exists.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5ca8208..08b1b25 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1180,6 +1180,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
+	gen6_write_pdes(ppgtt);
+
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
@@ -1204,20 +1206,14 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	else
 		BUG();
 
-	if (!ret) {
-		struct drm_i915_private *dev_priv = dev->dev_private;
-		kref_init(&ppgtt->ref);
-		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
-			    ppgtt->base.total);
-		i915_init_vm(dev_priv, &ppgtt->base);
-		if (INTEL_INFO(dev)->gen < 8) {
-			gen6_write_pdes(ppgtt);
-			DRM_DEBUG("Adding PPGTT at offset %x\n",
-				  ppgtt->pd_offset << 10);
-		}
-	}
+	if (ret)
+		return ret;
 
-	return ret;
+	kref_init(&ppgtt->ref);
+	drm_mm_init(&ppgtt->base.mm, ppgtt->base.start, ppgtt->base.total);
+	i915_init_vm(dev_priv, &ppgtt->base);
+
+	return 0;
 }
 
 static void
-- 
1.9.2

* [PATCH 14/56] drm/i915: Un-hardcode number of page directories
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (12 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 13/56] drm/i915: clean up PPGTT init error path Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 15/56] drm/i915: Make gen6_write_pdes gen6_map_page_tables Ben Widawsky
                   ` (42 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Trivial: use GEN8_LEGACY_PDPES instead of a hard-coded 4.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 7c06c43..2002393 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -256,7 +256,7 @@ struct i915_hw_ppgtt {
 	};
 	union {
 		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[4];
+		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
 	};
 
 	struct i915_hw_context *ctx;
-- 
1.9.2

* [PATCH 15/56] drm/i915: Make gen6_write_pdes gen6_map_page_tables
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (13 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 14/56] drm/i915: Un-hardcode number of page directories Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 16/56] drm/i915: Range clearing is PPGTT agnostic Ben Widawsky
                   ` (41 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Split out single mappings which will help with upcoming work. Also while
here, rename the function because it is a better description - but this
function is going away soon.
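
The shape of the split, sketched in userspace: one helper writes a single
page directory entry, and the bulk function is just a loop over it. This is
an illustration under assumptions, not the driver code: a plain array stands
in for the PDE slots in the GTT, the entry encoding (address ORed with a
valid bit) is made up, and the sk_ names and the 512-entry directory size are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PD_ENTRIES	512	/* assumption: entries per page directory */
#define SK_PDE_VALID	1u	/* hypothetical valid bit */

struct sk_ppgtt {
	uint32_t pd[SK_PD_ENTRIES];	 /* stands in for the PDE slots */
	uint32_t pt_addr[SK_PD_ENTRIES]; /* page-aligned page table addresses */
	unsigned num_pd_entries;
};

/* Point one page directory entry at one page table. */
static void sk_map_single(struct sk_ppgtt *ppgtt, unsigned pde_index,
			  uint32_t daddr)
{
	assert(pde_index < SK_PD_ENTRIES);
	assert((daddr & 0xfff) == 0); /* page tables are page aligned */
	ppgtt->pd[pde_index] = daddr | SK_PDE_VALID;
}

/* Map every page table to incrementing page directory entries. */
static void sk_map_page_tables(struct sk_ppgtt *ppgtt)
{
	unsigned i;

	for (i = 0; i < ppgtt->num_pd_entries; i++)
		sk_map_single(ppgtt, i, ppgtt->pt_addr[i]);
}
```

Having the single-entry helper available means a later caller can map one
page table at a time without rewriting the whole directory.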

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 39 ++++++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 08b1b25..bfa9811 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -692,26 +692,33 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	}
 }
 
-static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
+static void gen6_map_single(struct i915_hw_ppgtt *ppgtt,
+			    const unsigned pde_index,
+			    dma_addr_t daddr)
 {
 	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
-	gen6_gtt_pte_t __iomem *pd_addr;
 	uint32_t pd_entry;
+	gen6_gtt_pte_t __iomem *pd_addr =
+		(gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+
+	pd_entry = GEN6_PDE_ADDR_ENCODE(daddr);
+	pd_entry |= GEN6_PDE_VALID;
+
+	writel(pd_entry, pd_addr + pde_index);
+}
+
+/* Map all the page tables found in the ppgtt structure to incrementing page
+ * directories. */
+static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
 	int i;
 
 	WARN_ON(ppgtt->pd_offset & 0x3f);
-	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		dma_addr_t pt_addr;
-
-		pt_addr = ppgtt->pt_dma_addr[i];
-		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
-		pd_entry |= GEN6_PDE_VALID;
+	for (i = 0; i < ppgtt->num_pd_entries; i++)
+		gen6_map_single(ppgtt, i, ppgtt->pt_dma_addr[i]);
 
-		writel(pd_entry, pd_addr + i);
-	}
-	readl(pd_addr);
+	readl(dev_priv->gtt.gsm);
 }
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
@@ -1180,7 +1187,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
-	gen6_write_pdes(ppgtt);
+	gen6_map_page_tables(ppgtt);
 
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 
@@ -1369,11 +1376,11 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 		/* TODO: Perhaps it shouldn't be gen6 specific */
 		if (i915_is_ggtt(vm)) {
 			if (dev_priv->mm.aliasing_ppgtt)
-				gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
+				gen6_map_page_tables(dev_priv->mm.aliasing_ppgtt);
 			continue;
 		}
 
-		gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
+		gen6_map_page_tables(container_of(vm, struct i915_hw_ppgtt, base));
 	}
 
 	i915_gem_chipset_flush(dev);
-- 
1.9.2

* [PATCH 16/56] drm/i915: Range clearing is PPGTT agnostic
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (14 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 15/56] drm/i915: Make gen6_write_pdes gen6_map_page_tables Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 17/56] drm/i915: Page table helpers, and define renames Ben Widawsky
                   ` (40 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Therefore we can do it from our general init function. Eventually, I
hope to have a lot more commonality like this. It won't arrive yet, but
this was a nice easy one.
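
A sketch of the pattern, assuming a simplified address space with a
clear_range function pointer: the gen-specific init only fills in the size
and the callback, and the common init performs the clear once. The sk_ names,
the address space sizes and the byte counter are hypothetical and only serve
the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sk_vm {
	uint64_t total;		/* size of the address space in bytes */
	uint64_t cleared_bytes;	/* bookkeeping for this sketch only */
	void (*clear_range)(struct sk_vm *vm, uint64_t start, uint64_t length);
};

static void sk_gen6_clear_range(struct sk_vm *vm, uint64_t start,
				uint64_t length)
{
	(void)start;
	vm->cleared_bytes += length;
}

static void sk_gen8_clear_range(struct sk_vm *vm, uint64_t start,
				uint64_t length)
{
	(void)start;
	vm->cleared_bytes += length;
}

/* Gen-specific init: set the size and the callback, nothing else. */
static void sk_gen6_init(struct sk_vm *vm)
{
	vm->total = 2ull << 30; /* illustrative size */
	vm->clear_range = sk_gen6_clear_range;
}

static void sk_gen8_init(struct sk_vm *vm)
{
	vm->total = 4ull << 30; /* illustrative size */
	vm->clear_range = sk_gen8_clear_range;
}

/* Common init: the clear happens here, once, for every generation. */
static int sk_init_ppgtt(struct sk_vm *vm, int gen)
{
	assert(vm != NULL);

	if (gen >= 8)
		sk_gen8_init(vm);
	else if (gen >= 6)
		sk_gen6_init(vm);
	else
		return -1;

	vm->clear_range(vm, 0, vm->total);
	return 0;
}
```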

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bfa9811..086c533 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -621,8 +621,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.start = 0;
 	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
 
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
 	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
 			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
 	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
@@ -1189,8 +1187,6 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 
 	gen6_map_page_tables(ppgtt);
 
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
 			 ppgtt->node.start / PAGE_SIZE);
@@ -1218,6 +1214,7 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 
 	kref_init(&ppgtt->ref);
 	drm_mm_init(&ppgtt->base.mm, ppgtt->base.start, ppgtt->base.total);
+	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 	i915_init_vm(dev_priv, &ppgtt->base);
 
 	return 0;
-- 
1.9.2

* [PATCH 17/56] drm/i915: Page table helpers, and define renames
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (15 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 16/56] drm/i915: Range clearing is PPGTT agnostic Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 18/56] drm/i915: construct page table abstractions Ben Widawsky
                   ` (39 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

These page table helpers make the code much cleaner. There is some
room to use the arch/x86 header files. The reason I've opted not to is
that in several cases, the definitions are dictated by the CONFIG_ options
which do not always indicate the restrictions in the GPU. While here,
clean up the defines to have more concise names, and consolidate between
gen6 and gen8 where appropriate.

v2: Use I915_PAGE_SIZE to remove PAGE_SIZE dep in the new code (Jesse)
Fix bugged I915_PTE_MASK define, which was unused (Chris)
BUG_ON bad length/size - taking directly from Chris (Chris)
define NUM_PTE (Chris)

I've made a lot of tiny errors in these helpers. Often I'd correct an
error only to introduce another one. While IGT was capable of catching
them, the tests often took a while to catch them, and they were hard/slow
to debug in the kernel. As a result, to test this, I compiled
i915_gem_gtt.h in userspace, and ran tests from userspace. What follows
isn't by any means complete, but it was able to catch a lot of bugs. Gen8
is also untested, but since the current code is almost identical, I feel
pretty comfortable with that.

void test_pte(uint32_t base) {
        uint32_t ret;
        assert_pte_index((base + 0), 0);
        assert_pte_index((base + 1), 0);
        assert_pte_index((base + 0x1000), 1);
        assert_pte_index((base + (1<<22)), 0);
        assert_pte_index((base + ((1<<22) - 1)), 1023);
        assert_pte_index((base + (1<<21)), 512);

        assert_pte_count(base + 0, 0, 0);
        assert_pte_count(base + 0, 1, 1);
        assert_pte_count(base + 0, 0x1000, 1);
        assert_pte_count(base + 0, 0x1001, 2);
        assert_pte_count(base + 0, 1<<21, 512);

        assert_pte_count(base + 0, 1<<22, 1024);
        assert_pte_count(base + 0, (1<<22) - 1, 1024);
        assert_pte_count(base + (1<<21), 1<<22, 512);
        assert_pte_count(base + (1<<21), (1<<22)+1, 512);
        assert_pte_count(base + (1<<21), 10<<22, 512);
}

void test_pde(uint32_t base) {
        assert(gen6_pde_index(base + 0) == 0);
        assert(gen6_pde_index(base + 1) == 0);
        assert(gen6_pde_index(base + (1<<21)) == 0);
        assert(gen6_pde_index(base + (1<<22)) == 1);
        assert(gen6_pde_index(base + ((256<<22)))== 256);
        assert(gen6_pde_index(base + ((512<<22))) == 0);
        /* This is actually not possible on gen6 */
        assert(gen6_pde_index(base + ((513<<22))) == 1);

        assert(gen6_pde_count(base + 0, 0) == 0);
        assert(gen6_pde_count(base + 0, 1) == 1);
        assert(gen6_pde_count(base + 0, 1<<21) == 1);
        assert(gen6_pde_count(base + 0, 1<<22) == 1);
        assert(gen6_pde_count(base + 0, (1<<22) + 0x1000) == 2);
        assert(gen6_pde_count(base + 0x1000, 1<<22) == 2);
        assert(gen6_pde_count(base + 0, 511<<22) == 511);
        assert(gen6_pde_count(base + 0, 512<<22) == 512);
        assert(gen6_pde_count(base + 0x1000, 512<<22) == 512);
        assert(gen6_pde_count(base + (1<<22), 512<<22) == 511);
}

int main()
{
        test_pde(0);
        while (1)
                test_pte(rand() & ~((1<<22) - 1));

        return 0;
}
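
For anyone who wants to reproduce this kind of check, here is a minimal
standalone sketch. It is not the harness used for the listing above: the
assert_pte_index()/assert_pte_count() wrappers were not posted, so the
function names, the local copies of the constants, and the 64-bit mask
below are assumptions made for illustration. It mirrors the index and
count helpers from this patch for page-aligned ranges only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the kernel definitions used by the helpers. */
#define PAGE_SHIFT		12
#define GEN6_PDE_SHIFT		22
#define GEN8_PDE_SHIFT		21
#define I915_PDES_PER_PD	512
#define NUM_PTE(pde_shift)	(1u << ((pde_shift) - PAGE_SHIFT))

static inline uint32_t pte_index(uint64_t addr, uint32_t pde_shift)
{
	return (addr >> PAGE_SHIFT) & (NUM_PTE(pde_shift) - 1);
}

static inline uint32_t pde_index(uint64_t addr, uint32_t pde_shift)
{
	return (addr >> pde_shift) & (I915_PDES_PER_PD - 1);
}

/*
 * Number of PTEs in [addr, addr + length), clamped to the end of the
 * page table that contains addr.
 */
static inline size_t pte_count(uint64_t addr, uint64_t length,
			       uint32_t pde_shift)
{
	/* Assumption of this sketch: build the mask in 64 bits. */
	const uint64_t mask = ~((UINT64_C(1) << pde_shift) - 1);
	const uint64_t end = addr + length;

	if ((addr & mask) != (end & mask))
		return NUM_PTE(pde_shift) - pte_index(addr, pde_shift);

	return pte_index(end, pde_shift) - pte_index(addr, pde_shift);
}

/* Hypothetical self-check; call it from main(). */
static inline void check_index_helpers(void)
{
	/* gen6: 31:22 PDE | 21:12 PTE | 11:0 offset */
	assert(pde_index(0x00400000, GEN6_PDE_SHIFT) == 1);
	assert(pte_index(0x00400000, GEN6_PDE_SHIFT) == 0);
	assert(pte_index(0x003ff000, GEN6_PDE_SHIFT) == 1023);
	assert(pte_count(0, 0x1000, GEN6_PDE_SHIFT) == 1);
	assert(pte_count(1u << 21, 1u << 22, GEN6_PDE_SHIFT) == 512);

	/* gen8 legacy: 29:21 PDE | 20:12 PTE | 11:0 offset */
	assert(pde_index(0x00200000, GEN8_PDE_SHIFT) == 1);
	assert(pte_index(0x001ff000, GEN8_PDE_SHIFT) == 511);
}
```

Keeping the helpers free of kernel-only types is what makes this cheap
to run: the file builds with any hosted C compiler, and a failing
assert points straight at the offending address.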

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c |  88 +++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 123 +++++++++++++++++++++++++++++++++---
 2 files changed, 156 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 086c533..a8eb077 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -248,7 +248,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int i, ret;
 
 	/* bit of a hack to find the actual last used pd */
-	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
+	int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
 
 	for (i = used_pd - 1; i >= 0; i--) {
 		dma_addr_t addr = ppgtt->pd_dma_addr[i];
@@ -268,9 +268,9 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
-	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
-	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
-	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+	unsigned pdpe = gen8_pdpe_index(start);
+	unsigned pde = gen8_pde_index(start);
+	unsigned pte = gen8_pte_index(start);
 	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned last_pte, i;
 
@@ -281,8 +281,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
 
 		last_pte = pte + num_entries;
-		if (last_pte > GEN8_PTES_PER_PAGE)
-			last_pte = GEN8_PTES_PER_PAGE;
+		if (last_pte > GEN8_PTES_PER_PT)
+			last_pte = GEN8_PTES_PER_PT;
 
 		pt_vaddr = kmap_atomic(page_table);
 
@@ -296,7 +296,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 
 		pte = 0;
-		if (++pde == GEN8_PDES_PER_PAGE) {
+		if (++pde == I915_PDES_PER_PD) {
 			pdpe++;
 			pde = 0;
 		}
@@ -311,9 +311,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr;
-	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
-	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
-	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+	unsigned pdpe = gen8_pdpe_index(start);
+	unsigned pde = gen8_pde_index(start);
+	unsigned pte = gen8_pte_index(start);
 	struct sg_page_iter sg_iter;
 
 	pt_vaddr = NULL;
@@ -328,12 +328,12 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		pt_vaddr[pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
 					cache_level, true);
-		if (++pte == GEN8_PTES_PER_PAGE) {
+		if (++pte == GEN8_PTES_PER_PT) {
 			if (!HAS_LLC(ppgtt->base.dev))
 				drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
 			kunmap_atomic(pt_vaddr);
 			pt_vaddr = NULL;
-			if (++pde == GEN8_PDES_PER_PAGE) {
+			if (++pde == I915_PDES_PER_PD) {
 				pdpe++;
 				pde = 0;
 			}
@@ -354,7 +354,7 @@ static void gen8_free_page_tables(struct page **pt_pages)
 	if (pt_pages == NULL)
 		return;
 
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
+	for (i = 0; i < I915_PDES_PER_PD; i++)
 		if (pt_pages[i])
 			__free_pages(pt_pages[i], 0);
 }
@@ -386,7 +386,7 @@ static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+		for (j = 0; j < I915_PDES_PER_PD; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
@@ -412,11 +412,11 @@ static struct page **__gen8_alloc_page_tables(void)
 	struct page **pt_pages;
 	int i;
 
-	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
+	pt_pages = kcalloc(I915_PDES_PER_PD, sizeof(struct page *), GFP_KERNEL);
 	if (!pt_pages)
 		return ERR_PTR(-ENOMEM);
 
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
+	for (i = 0; i < I915_PDES_PER_PD; i++) {
 		pt_pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
 		if (!pt_pages[i])
 			goto bail;
@@ -466,7 +466,7 @@ static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
+		ppgtt->gen8_pt_dma_addr[i] = kcalloc(I915_PDES_PER_PD,
 						     sizeof(dma_addr_t),
 						     GFP_KERNEL);
 		if (!ppgtt->gen8_pt_dma_addr[i])
@@ -505,7 +505,7 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 		return ret;
 	}
 
-	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
+	ppgtt->num_pd_entries = max_pdp * I915_PDES_PER_PD;
 
 	ret = gen8_ppgtt_allocate_dma(ppgtt);
 	if (ret)
@@ -566,7 +566,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
 	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
+	const int min_pt_pages = I915_PDES_PER_PD * max_pdp;
 	int i, j, ret;
 
 	if (size % (1<<30))
@@ -585,7 +585,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		if (ret)
 			goto bail;
 
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+		for (j = 0; j < I915_PDES_PER_PD; j++) {
 			ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
 			if (ret)
 				goto bail;
@@ -603,7 +603,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	for (i = 0; i < max_pdp; i++) {
 		gen8_ppgtt_pde_t *pd_vaddr;
 		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+		for (j = 0; j < I915_PDES_PER_PD; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
@@ -619,7 +619,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
+	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PT * PAGE_SIZE;
 
 	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
 			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
@@ -665,9 +665,9 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
 		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
-		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
+		for (pte = 0; pte < GEN6_PTES_PER_PT; pte+=4) {
 			unsigned long va =
-				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
+				(pde * PAGE_SIZE * GEN6_PTES_PER_PT) +
 				(pte * PAGE_SIZE);
 			int i;
 			bool found = false;
@@ -946,29 +946,28 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen6_gtt_pte_t *pt_vaddr, scratch_pte;
-	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned pde = gen6_pde_index(start);
 	unsigned num_entries = length >> PAGE_SHIFT;
-	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
-	unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
+	unsigned pte = gen6_pte_index(start);
 	unsigned last_pte, i;
 
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
 
 	while (num_entries) {
-		last_pte = first_pte + num_entries;
-		if (last_pte > I915_PPGTT_PT_ENTRIES)
-			last_pte = I915_PPGTT_PT_ENTRIES;
+		last_pte = pte + num_entries;
+		if (last_pte > GEN6_PTES_PER_PT)
+			last_pte = GEN6_PTES_PER_PT;
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
 
-		for (i = first_pte; i < last_pte; i++)
+		for (i = pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
 
 		kunmap_atomic(pt_vaddr);
 
-		num_entries -= last_pte - first_pte;
-		first_pte = 0;
-		act_pt++;
+		num_entries -= last_pte - pte;
+		pte = 0;
+		pde++;
 	}
 }
 
@@ -980,24 +979,23 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen6_gtt_pte_t *pt_vaddr;
-	unsigned first_entry = start >> PAGE_SHIFT;
-	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
-	unsigned act_pte = first_entry % I915_PPGTT_PT_ENTRIES;
+	unsigned pde = gen6_pde_index(start);
+	unsigned pte = gen6_pte_index(start);
 	struct sg_page_iter sg_iter;
 
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+			pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
 
-		pt_vaddr[act_pte] =
+		pt_vaddr[pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
 				       cache_level, true);
-		if (++act_pte == I915_PPGTT_PT_ENTRIES) {
+		if (++pte == GEN6_PTES_PER_PT) {
 			kunmap_atomic(pt_vaddr);
 			pt_vaddr = NULL;
-			act_pt++;
-			act_pte = 0;
+			pde++;
+			pte = 0;
 		}
 	}
 	if (pt_vaddr)
@@ -1074,7 +1072,7 @@ alloc:
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
-	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
+	ppgtt->num_pd_entries = I915_PDES_PER_PD;
 	return 0;
 }
 
@@ -1179,7 +1177,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total =  ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	ppgtt->base.total =  ppgtt->num_pd_entries * GEN6_PTES_PER_PT * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd_offset =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 2002393..3d3337b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -38,8 +38,16 @@ typedef uint32_t gen6_gtt_pte_t;
 typedef uint64_t gen8_gtt_pte_t;
 typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 
-#define I915_PPGTT_PT_ENTRIES		(PAGE_SIZE / sizeof(gen6_gtt_pte_t))
-/* gen6-hsw has bit 11-4 for physical addr bit 39-32 */
+/* GEN Agnostic defines */
+#define I915_PAGE_SIZE			4096
+#define I915_PDES_PER_PD		512
+#define I915_PTE_MASK			(I915_PAGE_SIZE-1)
+#define I915_PDE_MASK			(I915_PDES_PER_PD-1)
+
+/* GEN6 PPGTT resembles a 2 level page table:
+ * 31:22 | 21:12 |  11:0
+ *  PDE  |  PTE  | offset
+ */
 #define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
 #define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
 #define GEN6_PDE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
@@ -47,13 +55,16 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN6_PTE_UNCACHED		(1 << 1)
 #define GEN6_PTE_VALID			(1 << 0)
 
-#define GEN6_PPGTT_PD_ENTRIES		512
-#define GEN6_PD_SIZE			(GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
+#define GEN6_PD_SIZE			(I915_PDES_PER_PD * PAGE_SIZE)
 #define GEN6_PD_ALIGN			(PAGE_SIZE * 16)
 #define GEN6_PDE_VALID			(1 << 0)
 
 #define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
 
+#define GEN6_PDE_SHIFT			22
+#define GEN6_PTES_PER_PT		(PAGE_SIZE / sizeof(gen6_gtt_pte_t))
+#define NUM_PTE(pde_shift)		(1 << (pde_shift - PAGE_SHIFT))
+
 #define BYT_PTE_SNOOPED_BY_CPU_CACHES	(1 << 2)
 #define BYT_PTE_WRITEABLE		(1 << 1)
 
@@ -72,6 +83,14 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define HSW_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0x7f0))
 #define HSW_PTE_ADDR_ENCODE(addr)	HSW_GTT_ADDR_ENCODE(addr)
 
+#define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
+#define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
+#define PPAT_CACHED_INDEX		_PAGE_PAT /* WB LLCeLLC */
+#define PPAT_DISPLAY_ELLC_INDEX		_PAGE_PCD /* WT eLLC */
+
+#define GEN8_LEGACY_PDPES		4
+#define GEN8_PTES_PER_PT		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
+
 /* GEN8 legacy style address is defined as a 3 level page table:
  * 31:30 | 29:21 | 20:12 |  11:0
  * PDPE  |  PDE  |  PTE  | offset
@@ -81,12 +100,6 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN8_PDPE_SHIFT			30
 #define GEN8_PDPE_MASK			0x3
 #define GEN8_PDE_SHIFT			21
-#define GEN8_PDE_MASK			0x1ff
-#define GEN8_PTE_SHIFT			12
-#define GEN8_PTE_MASK			0x1ff
-#define GEN8_LEGACY_PDPES		4
-#define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
-#define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
 
 #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
 #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
@@ -268,6 +281,96 @@ struct i915_hw_ppgtt {
 	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
 };
 
+static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
+{
+	const uint32_t mask = NUM_PTE(pde_shift) - 1;
+	return (address >> PAGE_SHIFT) & mask;
+}
+
+/* Helper to count the number of PTEs within the given length. This count does
+ * not cross a page table boundary, so the max value would be
+ * GEN6_PTES_PER_PT for GEN6, and GEN8_PTES_PER_PT for GEN8.
+ */
+static inline size_t i915_pte_count(uint64_t addr, size_t length,
+				    uint32_t pde_shift)
+{
+	const uint64_t mask = ~((1 << pde_shift) - 1);
+	uint64_t end;
+
+	BUG_ON(length == 0);
+	BUG_ON(offset_in_page(addr|length));
+
+	end = addr + length;
+
+	if ((addr & mask) != (end & mask))
+		return NUM_PTE(pde_shift) - i915_pte_index(addr, pde_shift);
+
+	return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
+}
+
+static inline uint32_t i915_pde_index(uint64_t addr, uint32_t shift)
+{
+	return (addr >> shift) & I915_PDE_MASK;
+}
+
+static inline size_t i915_pde_count(uint64_t addr, uint64_t length,
+				    uint32_t pde_shift)
+{
+	const uint32_t pdp_shift = pde_shift + 9;
+	const uint64_t mask = ~((1 << pdp_shift) - 1);
+	uint64_t end;
+
+	BUG_ON(length == 0);
+	BUG_ON(offset_in_page(addr|length));
+
+	end = addr + length;
+
+	if ((addr & mask) != (end & mask))
+		return I915_PDES_PER_PD - i915_pde_index(addr, pde_shift);
+
+	return i915_pde_index(end, pde_shift) - i915_pde_index(addr, pde_shift);
+}
+
+static inline uint32_t gen6_pte_index(uint32_t addr)
+{
+	return i915_pte_index(addr, GEN6_PDE_SHIFT);
+}
+
+static inline size_t gen6_pte_count(uint32_t addr, uint32_t length)
+{
+	return i915_pte_count(addr, length, GEN6_PDE_SHIFT);
+}
+
+static inline uint32_t gen6_pde_index(uint32_t addr)
+{
+	return i915_pde_index(addr, GEN6_PDE_SHIFT);
+}
+
+static inline size_t gen6_pde_count(uint32_t addr, uint32_t length)
+{
+	return i915_pde_count(addr, length, GEN6_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pte_index(uint64_t address)
+{
+	return i915_pte_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pde_index(uint64_t address)
+{
+	return i915_pde_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pdpe_index(uint64_t address)
+{
+	return (address >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK;
+}
+
+static inline uint32_t gen8_pml4e_index(uint64_t address)
+{
+	BUG();
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_gem_setup_global_gtt(struct drm_device *dev, unsigned long start,
-- 
1.9.2

* [PATCH 18/56] drm/i915: construct page table abstractions
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (16 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 17/56] drm/i915: Page table helpers, and define renames Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 19/56] drm/i915: Complete page table structures Ben Widawsky
                   ` (38 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Thus far we've opted for complex code that is difficult to review. The
code is only going to become more complex in the future, so we'll take
the hit now and start to encapsulate things.

To help transition the code nicely there is some wasted space in gen6/7.
This will be ameliorated shortly.
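
As a rough picture of the encapsulation, the sketch below models the new
nesting in plain userspace C. The three structure layouts follow the
header hunk in this patch; the stub struct page, the lookup_pt() helper
and the self-check are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define I915_PDES_PER_PD	512
#define GEN8_LEGACY_PDPES	4

struct page { int id; };	/* stub for the kernel's struct page */

struct i915_pagetab {
	struct page *page;
};

struct i915_pagedir {
	struct page *page;			/* NULL for GEN6-GEN7 */
	struct i915_pagetab *page_tables;	/* I915_PDES_PER_PD entries */
};

struct i915_pagedirpo {
	struct i915_pagedir pagedir[GEN8_LEGACY_PDPES];
};

/* Hypothetical helper: walk pdp -> page directory -> page table. */
static inline struct page *lookup_pt(struct i915_pagedirpo *pdp,
				     unsigned pdpe, unsigned pde)
{
	struct i915_pagedir *pd;

	if (pdpe >= GEN8_LEGACY_PDPES || pde >= I915_PDES_PER_PD)
		return NULL;

	pd = &pdp->pagedir[pdpe];
	if (pd->page_tables == NULL)
		return NULL;

	return pd->page_tables[pde].page;
}

/* Hypothetical self-check; call it from main(). */
static inline void check_lookup(void)
{
	static struct page pg = { 7 };
	static struct i915_pagetab tabs[I915_PDES_PER_PD];
	static struct i915_pagedirpo pdp;

	tabs[3].page = &pg;
	pdp.pagedir[1].page_tables = tabs;

	assert(lookup_pt(&pdp, 1, 3) == &pg);
	assert(lookup_pt(&pdp, 0, 3) == NULL);	/* directory not populated */
	assert(lookup_pt(&pdp, 4, 0) == NULL);	/* pdpe out of range */
}
```

The gen6/7 waste mentioned above shows up here as the unused page
pointer in struct i915_pagedir.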

NOTE: The pun in the subject was intentional.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Conflicts:
	drivers/gpu/drm/i915/i915_drv.h
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 175 ++++++++++++++++++------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  24 +++--
 2 files changed, 104 insertions(+), 95 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a8eb077..f2478c9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -278,7 +278,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
+		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
+		struct page *page_table = pd->page_tables[pde].page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PT)
@@ -322,8 +323,11 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
 			break;
 
-		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
+		if (pt_vaddr == NULL) {
+			struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
+			struct page *page_table = pd->page_tables[pde].page;
+			pt_vaddr = kmap_atomic(page_table);
+		}
 
 		pt_vaddr[pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -347,29 +351,33 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct page **pt_pages)
+static void gen8_free_page_tables(struct i915_pagedir *pd)
 {
 	int i;
 
-	if (pt_pages == NULL)
+	if (pd->page_tables == NULL)
 		return;
 
 	for (i = 0; i < I915_PDES_PER_PD; i++)
-		if (pt_pages[i])
-			__free_pages(pt_pages[i], 0);
+		if (pd->page_tables[i].page)
+			__free_page(pd->page_tables[i].page);
+}
+
+static void gen8_free_page_directories(struct i915_pagedir *pd)
+{
+	kfree(pd->page_tables);
+	__free_page(pd->page);
 }
 
-static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
-		kfree(ppgtt->gen8_pt_pages[i]);
+		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
+		gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
 		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
-
-	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
 static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
@@ -407,87 +415,73 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static struct page **__gen8_alloc_page_tables(void)
+static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages;
 	int i;
 
-	pt_pages = kcalloc(I915_PDES_PER_PD, sizeof(struct page *), GFP_KERNEL);
-	if (!pt_pages)
-		return ERR_PTR(-ENOMEM);
-
-	for (i = 0; i < I915_PDES_PER_PD; i++) {
-		pt_pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
-		if (!pt_pages[i])
-			goto bail;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		ppgtt->gen8_pt_dma_addr[i] = kcalloc(I915_PDES_PER_PD,
+						     sizeof(dma_addr_t),
+						     GFP_KERNEL);
+		if (!ppgtt->gen8_pt_dma_addr[i])
+			return -ENOMEM;
 	}
 
-	return pt_pages;
-
-bail:
-	gen8_free_page_tables(pt_pages);
-	kfree(pt_pages);
-	return ERR_PTR(-ENOMEM);
+	return 0;
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
-					   const int max_pdp)
+static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages[GEN8_LEGACY_PDPES];
-	int i, ret;
+	int i, j;
 
-	for (i = 0; i < max_pdp; i++) {
-		pt_pages[i] = __gen8_alloc_page_tables();
-		if (IS_ERR(pt_pages[i])) {
-			ret = PTR_ERR(pt_pages[i]);
-			goto unwind_out;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		for (j = 0; j < I915_PDES_PER_PD; j++) {
+			struct i915_pagetab *pt = &ppgtt->pdp.pagedir[i].page_tables[j];
+			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+			if (!pt->page)
+				goto unwind_out;
 		}
 	}
 
-	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
-	 * "atomic" - for cleanup purposes.
-	 */
-	for (i = 0; i < max_pdp; i++)
-		ppgtt->gen8_pt_pages[i] = pt_pages[i];
-
 	return 0;
 
 unwind_out:
-	while (i--) {
-		gen8_free_page_tables(pt_pages[i]);
-		kfree(pt_pages[i]);
-	}
+	while (i--)
+		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
 
-	return ret;
+	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
+						const int max_pdp)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(I915_PDES_PER_PD,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
+	for (i = 0; i < max_pdp; i++) {
+		struct i915_pagetab *pt;
+		pt = kcalloc(I915_PDES_PER_PD, sizeof(*pt), GFP_KERNEL);
+		if (!pt)
+			goto unwind_out;
 
-	return 0;
-}
+		ppgtt->pdp.pagedir[i].page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!ppgtt->pdp.pagedir[i].page)
+			goto unwind_out;
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
-{
-	ppgtt->pd_pages = alloc_pages(GFP_KERNEL | __GFP_ZERO,
-				      get_order(max_pdp << PAGE_SHIFT));
-	if (!ppgtt->pd_pages)
-		return -ENOMEM;
+		ppgtt->pdp.pagedir[i].page_tables = pt;
+	}
 
-	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
+	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
+
+unwind_out:
+	while (i--) {
+		kfree(ppgtt->pdp.pagedir[i].page_tables);
+		__free_page(ppgtt->pdp.pagedir[i].page);
+	}
+
+	return -ENOMEM;
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
@@ -499,18 +493,19 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
-	if (ret) {
-		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
-		return ret;
-	}
+	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
+	if (ret)
+		goto err_out;
 
 	ppgtt->num_pd_entries = max_pdp * I915_PDES_PER_PD;
 
 	ret = gen8_ppgtt_allocate_dma(ppgtt);
-	if (ret)
-		gen8_ppgtt_free(ppgtt);
+	if (!ret)
+		return ret;
 
+	/* TODO: Check this for all cases */
+err_out:
+	gen8_ppgtt_free(ppgtt);
 	return ret;
 }
 
@@ -521,7 +516,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       &ppgtt->pd_pages[pdpe], 0,
+			       ppgtt->pdp.pagedir[pdpe].page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
@@ -541,7 +536,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 	struct page *p;
 	int ret;
 
-	p = ppgtt->gen8_pt_pages[pdpe][pde];
+	p = ppgtt->pdp.pagedir[pdpe].page_tables[pde].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
@@ -602,7 +597,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 */
 	for (i = 0; i < max_pdp; i++) {
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i].page);
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
@@ -664,7 +659,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
 		for (pte = 0; pte < GEN6_PTES_PER_PT; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * GEN6_PTES_PER_PT) +
@@ -958,7 +953,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > GEN6_PTES_PER_PT)
 			last_pte = GEN6_PTES_PER_PT;
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
 
 		for (i = pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -986,7 +981,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
 
 		pt_vaddr[pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -1020,8 +1015,8 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pt_pages[i]);
-	kfree(ppgtt->pt_pages);
+		__free_page(ppgtt->pd.page_tables[i].page);
+	kfree(ppgtt->pd.page_tables);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1078,22 +1073,22 @@ alloc:
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
+	struct i915_pagetab *pt;
 	int i;
 
-	ppgtt->pt_pages = kcalloc(ppgtt->num_pd_entries, sizeof(struct page *),
-				  GFP_KERNEL);
-
-	if (!ppgtt->pt_pages)
+	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
+	if (!pt)
 		return -ENOMEM;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
-		if (!ppgtt->pt_pages[i]) {
+		pt[i].page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!pt[i].page) {
 			gen6_ppgtt_free(ppgtt);
 			return -ENOMEM;
 		}
 	}
 
+	ppgtt->pd.page_tables = pt;
 	return 0;
 }
 
@@ -1128,9 +1123,11 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct page *page;
 		dma_addr_t pt_addr;
 
-		pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i], 0, 4096,
+		page = ppgtt->pd.page_tables[i].page;
+		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
 		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
@@ -1177,7 +1174,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total =  ppgtt->num_pd_entries * GEN6_PTES_PER_PT * PAGE_SIZE;
+	ppgtt->base.total = ppgtt->num_pd_entries * GEN6_PTES_PER_PT * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd_offset =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 3d3337b..cddd1e8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -252,6 +252,20 @@ struct i915_gtt {
 			  unsigned long *mappable_end);
 };
 
+struct i915_pagetab {
+	struct page *page;
+};
+
+struct i915_pagedir {
+	struct page *page; /* NULL for GEN6-GEN7 */
+	struct i915_pagetab *page_tables;
+};
+
+struct i915_pagedirpo {
+	/* struct page *page; */
+	struct i915_pagedir pagedir[GEN8_LEGACY_PDPES];
+};
+
 struct i915_hw_ppgtt {
 	struct i915_address_space base;
 	struct kref ref;
@@ -259,11 +273,6 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		struct page **pt_pages;
-		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
-	};
-	struct page *pd_pages;
-	union {
 		uint32_t pd_offset;
 		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
 	};
@@ -271,7 +280,10 @@ struct i915_hw_ppgtt {
 		dma_addr_t *pt_dma_addr;
 		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
 	};
-
+	union {
+		struct i915_pagedirpo pdp;
+		struct i915_pagedir pd;
+	};
 	struct i915_hw_context *ctx;
 
 	int (*enable)(struct i915_hw_ppgtt *ppgtt);
-- 
1.9.2

* [PATCH 19/56] drm/i915: Complete page table structures
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (17 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 18/56] drm/i915: construct page table abstractions Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 20/56] drm/i915: Create page table allocators Ben Widawsky
                   ` (37 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Move the remaining members over to the new page table structures.

This can be squashed with the previous commit if desired. The reasoning
is the same as for that patch; I simply felt it was easier to review
when split.
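
The effect of the move can be pictured with a small userspace model. The
daddr member name is taken from the hunks in this patch; the uint64_t
stand-in for dma_addr_t, the opaque struct page, and the count_mapped()
helper are assumptions for illustration, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define I915_PDES_PER_PD	512

typedef uint64_t dma_addr_t;	/* stand-in for the kernel type */
struct page;			/* opaque in this sketch */

struct i915_pagetab {
	struct page *page;
	dma_addr_t daddr;	/* was ppgtt->gen8_pt_dma_addr[i][j] */
};

struct i915_pagedir {
	struct page *page;
	dma_addr_t daddr;	/* was ppgtt->pd_dma_addr[i] */
	struct i915_pagetab *page_tables;
};

/*
 * Hypothetical helper: count page tables that carry a DMA address,
 * using the same "non-zero means mapped" test as the unmap loop.
 */
static inline size_t count_mapped(const struct i915_pagedir *pd)
{
	size_t i, n = 0;

	if (pd->page_tables == NULL)
		return 0;

	for (i = 0; i < I915_PDES_PER_PD; i++)
		if (pd->page_tables[i].daddr)
			n++;

	return n;
}

/* Hypothetical self-check; call it from main(). */
static inline void check_count_mapped(void)
{
	static struct i915_pagetab tabs[I915_PDES_PER_PD];
	struct i915_pagedir pd = { NULL, 0, tabs };

	assert(count_mapped(&pd) == 0);
	tabs[0].daddr = 0x1000;
	tabs[511].daddr = 0x2000;
	assert(count_mapped(&pd) == 2);
}
```

With the address stored next to the page it describes, the separate
per-directory DMA arrays and their allocator are no longer needed.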

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Conflicts:
	drivers/gpu/drm/i915/i915_drv.h
	drivers/gpu/drm/i915/i915_gem_gtt.c
---
 drivers/gpu/drm/i915/i915_debugfs.c   |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c   | 85 +++++++++++++----------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h   | 15 +++----
 drivers/gpu/drm/i915/i915_gpu_error.c |  1 -
 4 files changed, 38 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 4a0b1c8..64051b0 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1815,7 +1815,7 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
 static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt, const char *name)
 {
 	seq_printf(m, "%s:\n", name);
-	seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
+	seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
 }
 
 static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verbose)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f2478c9..1f186d3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -251,7 +251,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pd_dma_addr[i];
+		dma_addr_t addr = ppgtt->pdp.pagedir[i].daddr;
 		ret = gen8_write_pdp(ring, i, addr, synchronous);
 		if (ret)
 			return ret;
@@ -376,7 +376,6 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
 		gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
-		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
 }
 
@@ -388,14 +387,14 @@ static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pd_dma_addr[i])
+		if (!ppgtt->pdp.pagedir[i].daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i].daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -415,31 +414,18 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
-{
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(I915_PDES_PER_PD,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
-
-	return 0;
-}
-
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
 	int i, j;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[i];
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			struct i915_pagetab *pt = &ppgtt->pdp.pagedir[i].page_tables[j];
+			struct i915_pagetab *pt = &pd->page_tables[j];
 			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 			if (!pt->page)
 				goto unwind_out;
+
 		}
 	}
 
@@ -499,9 +485,7 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 
 	ppgtt->num_pd_entries = max_pdp * I915_PDES_PER_PD;
 
-	ret = gen8_ppgtt_allocate_dma(ppgtt);
-	if (!ret)
-		return ret;
+	return 0;
 
 	/* TODO: Check this for all cases */
 err_out:
@@ -523,7 +507,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ppgtt->pd_dma_addr[pdpe] = pd_addr;
+	ppgtt->pdp.pagedir[pdpe].daddr = pd_addr;
 
 	return 0;
 }
@@ -533,17 +517,18 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pde)
 {
 	dma_addr_t pt_addr;
-	struct page *p;
+	struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
+	struct i915_pagetab *pt = &pd->page_tables[pde];
+	struct page *p = pt->page;
 	int ret;
 
-	p = ppgtt->pdp.pagedir[pdpe].page_tables[pde].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->gen8_pt_dma_addr[pdpe][pde] = pt_addr;
+	pt->daddr = pt_addr;
 
 	return 0;
 }
@@ -599,7 +584,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		gen8_ppgtt_pde_t *pd_vaddr;
 		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i].page);
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -641,14 +626,15 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
 
 	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
 	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-		   ppgtt->pd_offset, ppgtt->pd_offset + ppgtt->num_pd_entries);
+		   ppgtt->pd.pd_offset,
+		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pt_dma_addr[pde];
+		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -691,8 +677,8 @@ static void gen6_map_single(struct i915_hw_ppgtt *ppgtt,
 {
 	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
 	uint32_t pd_entry;
-	gen6_gtt_pte_t __iomem *pd_addr =
-		(gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+	gen6_gtt_pte_t __iomem *pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm;
+	pd_addr	+= ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
 	pd_entry = GEN6_PDE_ADDR_ENCODE(daddr);
 	pd_entry |= GEN6_PDE_VALID;
@@ -707,18 +693,18 @@ static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
 	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
 	int i;
 
-	WARN_ON(ppgtt->pd_offset & 0x3f);
+	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		gen6_map_single(ppgtt, i, ppgtt->pt_dma_addr[i]);
+		gen6_map_single(ppgtt, i, ppgtt->pd.page_tables[i].daddr);
 
 	readl(dev_priv->gtt.gsm);
 }
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 {
-	BUG_ON(ppgtt->pd_offset & 0x3f);
+	BUG_ON(ppgtt->pd.pd_offset & 0x3f);
 
-	return (ppgtt->pd_offset / 64) << 16;
+	return (ppgtt->pd.pd_offset / 64) << 16;
 }
 
 static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
@@ -1001,19 +987,16 @@ static void gen6_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	if (ppgtt->pt_dma_addr) {
-		for (i = 0; i < ppgtt->num_pd_entries; i++)
-			pci_unmap_page(ppgtt->base.dev->pdev,
-				       ppgtt->pt_dma_addr[i],
-				       4096, PCI_DMA_BIDIRECTIONAL);
-	}
+	for (i = 0; i < ppgtt->num_pd_entries; i++)
+		pci_unmap_page(ppgtt->base.dev->pdev,
+			       ppgtt->pd.page_tables[i].daddr,
+			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		__free_page(ppgtt->pd.page_tables[i].page);
 	kfree(ppgtt->pd.page_tables);
@@ -1106,14 +1089,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 	}
 
-	ppgtt->pt_dma_addr = kcalloc(ppgtt->num_pd_entries, sizeof(dma_addr_t),
-				     GFP_KERNEL);
-	if (!ppgtt->pt_dma_addr) {
-		drm_mm_remove_node(&ppgtt->node);
-		gen6_ppgtt_free(ppgtt);
-		return -ENOMEM;
-	}
-
 	return 0;
 }
 
@@ -1135,7 +1110,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pt_dma_addr[i] = pt_addr;
+		ppgtt->pd.page_tables[i].daddr = pt_addr;
 	}
 
 	return 0;
@@ -1177,7 +1152,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.total = ppgtt->num_pd_entries * GEN6_PTES_PER_PT * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
-	ppgtt->pd_offset =
+	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
 	gen6_map_page_tables(ppgtt);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index cddd1e8..07a5cd4 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -254,10 +254,16 @@ struct i915_gtt {
 
 struct i915_pagetab {
 	struct page *page;
+	dma_addr_t daddr;
 };
 
 struct i915_pagedir {
 	struct page *page; /* NULL for GEN6-GEN7 */
+	union {
+		uint32_t pd_offset;
+		dma_addr_t daddr;
+	};
+
 	struct i915_pagetab *page_tables;
 };
 
@@ -273,17 +279,10 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
-	};
-	union {
-		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
-	};
-	union {
 		struct i915_pagedirpo pdp;
 		struct i915_pagedir pd;
 	};
+
 	struct i915_hw_context *ctx;
 
 	int (*enable)(struct i915_hw_ppgtt *ppgtt);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 50d2af8..5d691cd 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -756,7 +756,6 @@ static void i915_gem_record_fences(struct drm_device *dev,
 	}
 }
 
-
 static void gen8_record_semaphore_state(struct drm_i915_private *dev_priv,
 					struct drm_i915_error_state *error,
 					struct intel_ring_buffer *ring,
-- 
1.9.2


* [PATCH 20/56] drm/i915: Create page table allocators
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (18 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 19/56] drm/i915: Complete page table structures Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 21/56] drm/i915: Generalize GEN6 mapping Ben Widawsky
                   ` (36 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if we do things less coarsely, breaking up
all of our actions into individual tasks.  This makes the code easier to
write, read, and verify.

Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures within a page directory.
This remains the same for all platforms.

The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain non-trivial complexity as a
result.

This patch exists for a few reasons:
1. Splitting out the functions makes it easy to combine the GEN6 and GEN8
code. Page tables do not differ between GEN6 and GEN8. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make it work, and error handling should
just fall into place.

2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).

3. Discrete functions are easier to review and understand. All
allocations and frees now take place in just a couple of locations.
Reviewing the code and catching leaks should be easy.

4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.
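
As an illustration of points 1-3, here is a minimal user-space sketch of
the single/range allocator split with unwind on failure. It is not the
driver code: calloc()/free() stand in for kzalloc()/alloc_page(), the
struct names are shortened, and fail_on_call, calls and live_tables are
hypothetical bookkeeping added only so the error path can be exercised.
The sketch also returns -EINVAL for an out-of-range request where the
patch uses BUG_ON().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define PDES_PER_PD 512 /* mirrors I915_PDES_PER_PD in the patch */

/* User-space stand-ins; 'page' is plain heap memory, not a struct page. */
struct pagetab {
	void *page;
};

struct pagedir {
	struct pagetab *page_tables[PDES_PER_PD]; /* PDEs */
};

/* Hypothetical fault injection: the Nth allocation fails (0 disables). */
static int fail_on_call;
static int calls;
static int live_tables; /* page tables currently allocated */

static struct pagetab *alloc_pt_single(void)
{
	struct pagetab *pt;

	if (fail_on_call && ++calls == fail_on_call)
		return NULL;

	pt = calloc(1, sizeof(*pt));
	if (!pt)
		return NULL;

	pt->page = calloc(1, 4096);
	if (!pt->page) {
		free(pt);
		return NULL;
	}

	live_tables++;
	return pt;
}

static void free_pt_single(struct pagetab *pt)
{
	free(pt->page);
	free(pt);
	live_tables--;
}

/* Allocate page tables for entries [pde, pde + count). On failure, free
 * only the entries created by this call and return -ENOMEM. */
static int alloc_pt_range(struct pagedir *pd, unsigned pde, unsigned count)
{
	unsigned i;

	if (pde + count > PDES_PER_PD)
		return -EINVAL;

	for (i = pde; i < pde + count; i++) {
		struct pagetab *pt = alloc_pt_single();

		if (!pt)
			goto err_out;
		pd->page_tables[i] = pt;
	}

	return 0;

err_out:
	/* Walk back from the failing index, stopping at the first entry
	 * of this request so earlier tables are left alone. */
	while (i-- > pde) {
		free_pt_single(pd->page_tables[i]);
		pd->page_tables[i] = NULL;
	}
	return -ENOMEM;
}
```

Because every allocation and free goes through the two single-object
helpers, a leak check reduces to watching one counter, which is the
review benefit described in point 3.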

v2: Updated commit message to explain why this patch exists

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 226 +++++++++++++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |   4 +-
 2 files changed, 147 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1f186d3..35370eb 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -211,6 +211,102 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
+static void free_pt_single(struct i915_pagetab *pt)
+{
+	if (WARN_ON(!pt->page))
+		return;
+	__free_page(pt->page);
+	kfree(pt);
+}
+
+static struct i915_pagetab *alloc_pt_single(void)
+{
+	struct i915_pagetab *pt;
+
+	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
+	if (!pt)
+		return ERR_PTR(-ENOMEM);
+
+	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pt->page) {
+		kfree(pt);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pt;
+}
+
+/**
+ * alloc_pt_range() - Allocate multiple page tables
+ * @pd:		The page directory which will have at least @count entries
+ *		available to point to the allocated page tables.
+ * @pde:	First page directory entry for which we are allocating.
+ * @count:	Number of pages to allocate.
+ *
+ * Allocates multiple page table pages and sets the appropriate entries in the
+ * page table structure within the page directory. Function cleans up after
+ * itself on any failures.
+ *
+ * Return: 0 if allocation succeeded.
+ */
+static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
+{
+	int i, ret;
+
+	/* 512 is the max page tables per pagedir on any platform.
+	 * TODO: make WARN after patch series is done
+	 */
+	BUG_ON(pde + count > I915_PDES_PER_PD);
+
+	for (i = pde; i < pde + count; i++) {
+		struct i915_pagetab *pt = alloc_pt_single();
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto err_out;
+		}
+		WARN(pd->page_tables[i],
+		     "Leaking page directory entry %d (%p)\n",
+		     i, pd->page_tables[i]);
+		pd->page_tables[i] = pt;
+	}
+
+	return 0;
+
+err_out:
+	while (i-- > pde)
+		free_pt_single(pd->page_tables[i]);
+	return ret;
+}
+
+static void __free_pd_single(struct i915_pagedir *pd)
+{
+	__free_page(pd->page);
+	kfree(pd);
+}
+
+#define free_pd_single(pd) do { \
+	if ((pd)->page) { \
+		__free_pd_single(pd); \
+	} \
+} while (0)
+
+static struct i915_pagedir *alloc_pd_single(void)
+{
+	struct i915_pagedir *pd;
+
+	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+	if (!pd)
+		return ERR_PTR(-ENOMEM);
+
+	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pd->page) {
+		kfree(pd);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pd;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_ring_buffer *ring, unsigned entry,
 			   uint64_t val, bool synchronous)
@@ -251,7 +347,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.pagedir[i].daddr;
+		dma_addr_t addr = ppgtt->pdp.pagedir[i]->daddr;
 		ret = gen8_write_pdp(ring, i, addr, synchronous);
 		if (ret)
 			return ret;
@@ -278,8 +374,9 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
-		struct page *page_table = pd->page_tables[pde].page;
+		struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+		struct i915_pagetab *pt = pd->page_tables[pde];
+		struct page *page_table = pt->page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PT)
@@ -324,8 +421,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 			break;
 
 		if (pt_vaddr == NULL) {
-			struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
-			struct page *page_table = pd->page_tables[pde].page;
+			struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+			struct i915_pagetab *pt = pd->page_tables[pde];
+			struct page *page_table = pt->page;
 			pt_vaddr = kmap_atomic(page_table);
 		}
 
@@ -355,18 +453,13 @@ static void gen8_free_page_tables(struct i915_pagedir *pd)
 {
 	int i;
 
-	if (pd->page_tables == NULL)
+	if (!pd->page)
 		return;
 
-	for (i = 0; i < I915_PDES_PER_PD; i++)
-		if (pd->page_tables[i].page)
-			__free_page(pd->page_tables[i].page);
-}
-
-static void gen8_free_page_directories(struct i915_pagedir *pd)
-{
-	kfree(pd->page_tables);
-	__free_page(pd->page);
+	for (i = 0; i < I915_PDES_PER_PD; i++) {
+		free_pt_single(pd->page_tables[i]);
+		pd->page_tables[i] = NULL;
+	}
 }
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
@@ -374,8 +467,8 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
-		gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+		free_pd_single(ppgtt->pdp.pagedir[i]);
 	}
 }
 
@@ -387,14 +480,16 @@ static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pdp.pagedir[i].daddr)
+		if (!ppgtt->pdp.pagedir[i]->daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i].daddr, PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i]->daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
+			struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
+			struct i915_pagetab *pt =  pd->page_tables[j];
+			dma_addr_t addr = pt->daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -416,24 +511,20 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	int i, j;
+	int i, ret;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[i];
-		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			struct i915_pagetab *pt = &pd->page_tables[j];
-			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-			if (!pt->page)
-				goto unwind_out;
-
-		}
+		ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
+				     0, I915_PDES_PER_PD);
+		if (ret)
+			goto unwind_out;
 	}
 
 	return 0;
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
 
 	return -ENOMEM;
 }
@@ -444,16 +535,9 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int i;
 
 	for (i = 0; i < max_pdp; i++) {
-		struct i915_pagetab *pt;
-		pt = kcalloc(I915_PDES_PER_PD, sizeof(*pt), GFP_KERNEL);
-		if (!pt)
+		ppgtt->pdp.pagedir[i] = alloc_pd_single();
+		if (IS_ERR(ppgtt->pdp.pagedir[i]))
 			goto unwind_out;
-
-		ppgtt->pdp.pagedir[i].page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-		if (!ppgtt->pdp.pagedir[i].page)
-			goto unwind_out;
-
-		ppgtt->pdp.pagedir[i].page_tables = pt;
 	}
 
 	ppgtt->num_pd_pages = max_pdp;
@@ -462,10 +546,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 
 unwind_out:
-	while (i--) {
-		kfree(ppgtt->pdp.pagedir[i].page_tables);
-		__free_page(ppgtt->pdp.pagedir[i].page);
-	}
+	while (i--)
+		free_pd_single(ppgtt->pdp.pagedir[i]);
 
 	return -ENOMEM;
 }
@@ -500,14 +582,14 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.pagedir[pdpe].page, 0,
+			       ppgtt->pdp.pagedir[pdpe]->page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->pdp.pagedir[pdpe].daddr = pd_addr;
+	ppgtt->pdp.pagedir[pdpe]->daddr = pd_addr;
 
 	return 0;
 }
@@ -517,8 +599,8 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pde)
 {
 	dma_addr_t pt_addr;
-	struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
-	struct i915_pagetab *pt = &pd->page_tables[pde];
+	struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+	struct i915_pagetab *pt = pd->page_tables[pde];
 	struct page *p = pt->page;
 	int ret;
 
@@ -581,10 +663,12 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 * will never need to touch the PDEs again.
 	 */
 	for (i = 0; i < max_pdp; i++) {
+		struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i].page);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i]->page);
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
+			struct i915_pagetab *pt = pd->page_tables[j];
+			dma_addr_t addr = pt->daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -634,7 +718,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
+		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -645,7 +729,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
 		for (pte = 0; pte < GEN6_PTES_PER_PT; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * GEN6_PTES_PER_PT) +
@@ -695,7 +779,7 @@ static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		gen6_map_single(ppgtt, i, ppgtt->pd.page_tables[i].daddr);
+		gen6_map_single(ppgtt, i, ppgtt->pd.page_tables[i]->daddr);
 
 	readl(dev_priv->gtt.gsm);
 }
@@ -939,7 +1023,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > GEN6_PTES_PER_PT)
 			last_pte = GEN6_PTES_PER_PT;
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
 
 		for (i = pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -967,7 +1051,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
 
 		pt_vaddr[pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -989,7 +1073,7 @@ static void gen6_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_tables[i].daddr,
+			       ppgtt->pd.page_tables[i]->daddr,
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
@@ -998,8 +1082,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pd.page_tables[i].page);
-	kfree(ppgtt->pd.page_tables);
+		free_pt_single(ppgtt->pd.page_tables[i]);
+
+	free_pd_single(&ppgtt->pd);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1054,27 +1139,6 @@ alloc:
 	return 0;
 }
 
-static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct i915_pagetab *pt;
-	int i;
-
-	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
-	if (!pt)
-		return -ENOMEM;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		pt[i].page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-		if (!pt->page) {
-			gen6_ppgtt_free(ppgtt);
-			return -ENOMEM;
-		}
-	}
-
-	ppgtt->pd.page_tables = pt;
-	return 0;
-}
-
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 {
 	int ret;
@@ -1083,7 +1147,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_allocate_page_tables(ppgtt);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
@@ -1101,7 +1165,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 		struct page *page;
 		dma_addr_t pt_addr;
 
-		page = ppgtt->pd.page_tables[i].page;
+		page = ppgtt->pd.page_tables[i]->page;
 		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
@@ -1110,7 +1174,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pd.page_tables[i].daddr = pt_addr;
+		ppgtt->pd.page_tables[i]->daddr = pt_addr;
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 07a5cd4..9b714b5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -264,12 +264,12 @@ struct i915_pagedir {
 		dma_addr_t daddr;
 	};
 
-	struct i915_pagetab *page_tables;
+	struct i915_pagetab *page_tables[I915_PDES_PER_PD]; /* PDEs */
 };
 
 struct i915_pagedirpo {
 	/* struct page *page; */
-	struct i915_pagedir pagedir[GEN8_LEGACY_PDPES];
+	struct i915_pagedir *pagedir[GEN8_LEGACY_PDPES];
 };
 
 struct i915_hw_ppgtt {
-- 
1.9.2


* [PATCH 21/56] drm/i915: Generalize GEN6 mapping
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (19 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 20/56] drm/i915: Create page table allocators Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 22/56] drm/i915: Clean up pagetable DMA map & unmap Ben Widawsky
                   ` (35 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Having a more general way of doing mappings makes it easy to map and
unmap a specific page table. Specifically in this case, we pass down the
page directory + entry, and the page table to map. This works similarly
to the x86 code.

The same work will need to happen for GEN8. At that point I will try to
combine functionality.
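
The calling convention described above (page directory, starting entry,
and a count) can be sketched in user space as follows. This is only a
model: a plain array stands in for the PDEs that the driver writes with
writel(), PDE_VALID and the "address | valid" encoding are simplifying
assumptions rather than the real GEN6_PDE_ADDR_ENCODE, and returning the
number of entries written is an addition made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PDES_PER_PD 512
#define PDE_VALID   (1u << 0) /* assumption: bit 0 marks a valid entry */

struct pagetab {
	uint32_t daddr; /* page-aligned bus address of the page table */
};

struct pagedir {
	uint32_t entries[PDES_PER_PD]; /* stand-in for the hardware PDEs */
	struct pagetab *page_tables[PDES_PER_PD];
};

/* Point entry 'pde' of page directory 'pd' at page table 'pt'. */
static void map_single(struct pagedir *pd, unsigned pde,
		       const struct pagetab *pt)
{
	pd->entries[pde] = pt->daddr | PDE_VALID;
}

/* Map entries [pde, pde + n), clamping the range to the directory size.
 * Returns the number of entries actually written. */
static size_t map_page_range(struct pagedir *pd, unsigned pde, size_t n)
{
	unsigned i, end;

	if (pde >= PDES_PER_PD)
		return 0;
	if (n > PDES_PER_PD - pde)
		n = PDES_PER_PD - pde;

	end = pde + (unsigned)n;
	for (i = pde; i < end; i++)
		map_single(pd, i, pd->page_tables[i]);

	return n;
}
```

Since map_single() needs only the directory, the index and the table, a
later patch can map one freshly allocated page table without walking the
whole PPGTT.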

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 61 +++++++++++++++++++------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  2 ++
 2 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 35370eb..e396b89 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -700,18 +700,13 @@ bail:
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
 	struct i915_address_space *vm = &ppgtt->base;
-	gen6_gtt_pte_t __iomem *pd_addr;
 	gen6_gtt_pte_t scratch_pte;
 	uint32_t pd_entry;
 	int pte, pde;
 
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
 
-	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-
 	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
 		   ppgtt->pd.pd_offset,
 		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
@@ -719,7 +714,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
 		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
-		pd_entry = readl(pd_addr + pde);
+		pd_entry = readl(ppgtt->pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
 		if (pd_entry != expected)
@@ -755,39 +750,43 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	}
 }
 
-static void gen6_map_single(struct i915_hw_ppgtt *ppgtt,
-			    const unsigned pde_index,
-			    dma_addr_t daddr)
+/* Map pde (index) from the page directory @pd to the page table @pt */
+static void gen6_map_single(struct i915_pagedir *pd,
+			    const int pde, struct i915_pagetab *pt)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
-	uint32_t pd_entry;
-	gen6_gtt_pte_t __iomem *pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm;
-	pd_addr	+= ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pd, struct i915_hw_ppgtt, pd);
+	u32 pd_entry;
 
-	pd_entry = GEN6_PDE_ADDR_ENCODE(daddr);
+	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
 	pd_entry |= GEN6_PDE_VALID;
 
-	writel(pd_entry, pd_addr + pde_index);
+	writel(pd_entry, ppgtt->pd_addr + pde);
+
+	/* XXX: Caller needs to make sure the write completes if necessary */
 }
 
 /* Map all the page tables found in the ppgtt structure to incrementing page
  * directories. */
-static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
+static void gen6_map_page_range(struct drm_i915_private *dev_priv,
+				struct i915_pagedir *pd, unsigned pde, size_t n)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
-	int i;
+	if (WARN_ON(pde + n > I915_PDES_PER_PD))
+		n = I915_PDES_PER_PD - pde;
 
-	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		gen6_map_single(ppgtt, i, ppgtt->pd.page_tables[i]->daddr);
+	n += pde;
+
+	for (; pde < n; pde++)
+		gen6_map_single(pd, pde, pd->page_tables[pde]);
 
+	/* Make sure write is complete before other code can use this page
+	 * table. Also required for WC mapped PTEs */
 	readl(dev_priv->gtt.gsm);
 }
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 {
 	BUG_ON(ppgtt->pd.pd_offset & 0x3f);
-
 	return (ppgtt->pd.pd_offset / 64) << 16;
 }
 
@@ -1219,7 +1218,10 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
-	gen6_map_page_tables(ppgtt);
+	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
+
+	gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
@@ -1405,13 +1407,14 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 
 	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
 		/* TODO: Perhaps it shouldn't be gen6 specific */
-		if (i915_is_ggtt(vm)) {
-			if (dev_priv->mm.aliasing_ppgtt)
-				gen6_map_page_tables(dev_priv->mm.aliasing_ppgtt);
-			continue;
-		}
 
-		gen6_map_page_tables(container_of(vm, struct i915_hw_ppgtt, base));
+		struct i915_hw_ppgtt *ppgtt =
+			container_of(vm, struct i915_hw_ppgtt, base);
+
+		if (i915_is_ggtt(vm))
+			ppgtt = dev_priv->mm.aliasing_ppgtt;
+
+		gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
 	}
 
 	i915_gem_chipset_flush(dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9b714b5..fea846d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -285,6 +285,8 @@ struct i915_hw_ppgtt {
 
 	struct i915_hw_context *ctx;
 
+	gen6_gtt_pte_t __iomem *pd_addr;
+
 	int (*enable)(struct i915_hw_ppgtt *ppgtt);
 	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
 			 struct intel_ring_buffer *ring,
-- 
1.9.2


* [PATCH 22/56] drm/i915: Clean up pagetable DMA map & unmap
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (20 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 21/56] drm/i915: Generalize GEN6 mapping Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 23/56] drm/i915: Always dma map page table allocations Ben Widawsky
                   ` (34 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Map and unmap are common operations across all generations for
pagetables. With a simple helper, we can get a nice net code reduction
as well as reduced complexity.

There is some room for optimization here. For instance, the multiple
page mapping could be done in one pci_map operation. In that case
however, the max value we'll ever see there is 512, and so I believe the
simpler code makes this a worthwhile trade-off. Also, the range mapping
functions are placeholders to help transition the code. Eventually,
mapping will only occur during a page allocation which will always be a
discrete operation.
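
A minimal user-space sketch of the map-range-with-rollback idea follows.
It is a model under stated assumptions, not the driver code:
map_pt_single() fakes pci_map_page() by handing out made-up addresses,
fail_index is a hypothetical knob for forcing an error, and a daddr of 0
is taken to mean "not mapped". The point it shows is that the unmap
helper takes a count, so the rollback passes the number of entries mapped
so far rather than the failing index.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PDES_PER_PD 512

struct pagetab {
	uint64_t daddr; /* 0 means "not mapped" in this model */
};

struct pagedir {
	struct pagetab *page_tables[PDES_PER_PD];
};

/* Hypothetical fault injection: mapping entry 'fail_index' fails. */
static int fail_index = -1;

/* Fake single mapping; the driver would call pci_map_page() here. */
static int map_pt_single(struct pagetab *pt, unsigned idx)
{
	if ((int)idx == fail_index)
		return -EIO;
	pt->daddr = 0x1000ull * (idx + 1);
	return 0;
}

static void unmap_pt_single(struct pagetab *pt)
{
	pt->daddr = 0;
}

/* Unmap 'n' entries starting at 'pde'. */
static void unmap_pt_range(struct pagedir *pd, unsigned pde, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		unmap_pt_single(pd->page_tables[pde + i]);
}

/* Map entries [pde, pde + n); on failure undo what this call mapped. */
static int map_pt_range(struct pagedir *pd, unsigned pde, size_t n)
{
	const unsigned first = pde;
	size_t end;

	if (pde >= PDES_PER_PD)
		return -EINVAL;
	if (n > PDES_PER_PD - pde)
		n = PDES_PER_PD - pde;

	end = pde + n;
	for (; pde < end; pde++) {
		int ret = map_pt_single(pd->page_tables[pde], pde);

		if (ret) {
			/* Entries [first, pde) were mapped: pass a count. */
			unmap_pt_range(pd, first, pde - first);
			return ret;
		}
	}

	return 0;
}
```

Looping over single mappings keeps the error path this short, which is
the trade-off argued for above: at most 512 iterations in exchange for
one obvious rollback.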

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 147 +++++++++++++++++++++---------------
 1 file changed, 85 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e396b89..e8d4dfa 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -211,6 +211,76 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
+#define dma_unmap_pt_single(pt, dev) do { \
+	pci_unmap_page((dev)->pdev, (pt)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
+} while (0)
+
+
+static void dma_unmap_pt_range(struct i915_pagedir *pd,
+			       unsigned pde, size_t n,
+			       struct drm_device *dev)
+{
+	if (WARN_ON(pde + n > I915_PDES_PER_PD))
+		n = I915_PDES_PER_PD - pde;
+
+	n += pde;
+
+	for (; pde < n; pde++)
+		dma_unmap_pt_single(pd->page_tables[pde], dev);
+}
+
+/**
+ * dma_map_pt_single() - Create a dma mapping for a page table
+ * @pt:		Page table to get a DMA map for
+ * @dev:	drm device
+ *
+ * Page table allocations are unified across all gens. They always require a
+ * single 4k allocation, as well as a DMA mapping.
+ *
+ * Return: 0 if success.
+ */
+static int dma_map_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
+{
+	struct page *page;
+	dma_addr_t pt_addr;
+	int ret;
+
+	page = pt->page;
+	pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
+			       PCI_DMA_BIDIRECTIONAL);
+
+	ret = pci_dma_mapping_error(dev->pdev, pt_addr);
+	if (ret)
+		return ret;
+
+	pt->daddr = pt_addr;
+
+	return 0;
+}
+
+static int dma_map_pt_range(struct i915_pagedir *pd,
+			    unsigned pde, size_t n,
+			    struct drm_device *dev)
+{
+	const int first = pde;
+
+	if (WARN_ON(pde + n > I915_PDES_PER_PD))
+		n = I915_PDES_PER_PD - pde;
+
+	n += pde;
+
+	for (; pde < n; pde++) {
+		int ret;
+		ret = dma_map_pt_single(pd->page_tables[pde], dev);
+		if (ret) {
+			dma_unmap_pt_range(pd, first, pde, dev);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
 static void free_pt_single(struct i915_pagetab *pt)
 {
 	if (WARN_ON(!pt->page))
@@ -219,7 +289,7 @@ static void free_pt_single(struct i915_pagetab *pt)
 	kfree(pt);
 }
 
-static struct i915_pagetab *alloc_pt_single(void)
+static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_pagetab *pt;
 
@@ -242,6 +312,7 @@ static struct i915_pagetab *alloc_pt_single(void)
  *		available to point to the allocated page tables.
  * @pde:	First page directory entry for which we are allocating.
  * @count:	Number of pages to allocate.
+ * @dev:	DRM device used for DMA mapping.
  *
  * Allocates multiple page table pages and sets the appropriate entries in the
  * page table structure within the page directory. Function cleans up after
@@ -249,7 +320,8 @@ static struct i915_pagetab *alloc_pt_single(void)
  *
  * Return: 0 if allocation succeeded.
  */
-static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
+static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count,
+			  struct drm_device *dev)
 {
 	int i, ret;
 
@@ -259,7 +331,7 @@ static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
 	BUG_ON(pde + count > I915_PDES_PER_PD);
 
 	for (i = pde; i < pde + count; i++) {
-		struct i915_pagetab *pt = alloc_pt_single();
+		struct i915_pagetab *pt = alloc_pt_single(dev);
 		if (IS_ERR(pt)) {
 			ret = PTR_ERR(pt);
 			goto err_out;
@@ -515,7 +587,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
-				     0, I915_PDES_PER_PD);
+				     0, I915_PDES_PER_PD, ppgtt->base.dev);
 		if (ret)
 			goto unwind_out;
 	}
@@ -594,27 +666,6 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 }
 
-static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
-					const int pdpe,
-					const int pde)
-{
-	dma_addr_t pt_addr;
-	struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
-	struct i915_pagetab *pt = pd->page_tables[pde];
-	struct page *p = pt->page;
-	int ret;
-
-	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
-	if (ret)
-		return ret;
-
-	pt->daddr = pt_addr;
-
-	return 0;
-}
-
 /**
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -643,12 +694,15 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 * 2. Create DMA mappings for the page directories and page tables.
 	 */
 	for (i = 0; i < max_pdp; i++) {
+		struct i915_pagedir *pd;
 		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
 		if (ret)
 			goto bail;
 
+		pd = ppgtt->pdp.pagedir[i];
+
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
+			ret = dma_map_pt_single(pd->page_tables[j], ppgtt->base.dev);
 			if (ret)
 				goto bail;
 		}
@@ -1066,16 +1120,6 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
-static void gen6_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
-{
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_tables[i]->daddr,
-			       4096, PCI_DMA_BIDIRECTIONAL);
-}
-
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
@@ -1095,7 +1139,7 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 	drm_mm_takedown(&ppgtt->base.mm);
 	drm_mm_remove_node(&ppgtt->node);
 
-	gen6_ppgtt_dma_unmap_pages(ppgtt);
+	dma_unmap_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries, vm->dev);
 	gen6_ppgtt_free(ppgtt);
 }
 
@@ -1146,7 +1190,8 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+			     ppgtt->base.dev);
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
@@ -1155,29 +1200,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct drm_device *dev = ppgtt->base.dev;
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		struct page *page;
-		dma_addr_t pt_addr;
-
-		page = ppgtt->pd.page_tables[i]->page;
-		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
-				       PCI_DMA_BIDIRECTIONAL);
-
-		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
-			gen6_ppgtt_dma_unmap_pages(ppgtt);
-			return -EIO;
-		}
-
-		ppgtt->pd.page_tables[i]->daddr = pt_addr;
-	}
-
-	return 0;
-}
 
 static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
@@ -1202,7 +1224,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_setup_page_tables(ppgtt);
+	ret = dma_map_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+			       ppgtt->base.dev);
 	if (ret) {
 		gen6_ppgtt_free(ppgtt);
 		return ret;
-- 
1.9.2


* [PATCH 23/56] drm/i915: Always dma map page table allocations
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (21 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 22/56] drm/i915: Clean up pagetable DMA map & unmap Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 24/56] drm/i915: Consolidate dma mappings Ben Widawsky
                   ` (33 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

There is never a case where we don't want to DMA map a page table
allocation. Since we've broken
up the allocations into nice clean helper functions, it's both easy and
obvious to do the dma mapping at the same time.
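
As a rough illustration of pairing the mapping with the allocation, here
is a userspace sketch. Everything in it is hypothetical: fake_pt,
fake_map(), fake_unmap(), live_mappings and fail_next_map stand in for
the driver's page table struct and the PCI DMA calls, and failure is
reported with NULL instead of the ERR_PTR() convention the driver uses.
The point is only that alloc undoes its own partial work and free undoes
the mapping before releasing the page.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical page table: backing page plus a pretend bus address. */
struct fake_pt {
	void *page;
	uintptr_t daddr;	/* 0 means "not mapped" in this sketch */
};

static int live_mappings;	/* number of currently mapped tables */
static int fail_next_map;	/* set nonzero to make the next map fail */

static int fake_map(struct fake_pt *pt)
{
	if (fail_next_map) {
		fail_next_map = 0;
		return -1;
	}
	pt->daddr = (uintptr_t)pt->page;
	live_mappings++;
	return 0;
}

static void fake_unmap(struct fake_pt *pt)
{
	pt->daddr = 0;
	live_mappings--;
}

/* Allocate and map in one step; unwind everything on any failure. */
static struct fake_pt *fake_alloc_pt(void)
{
	struct fake_pt *pt = calloc(1, sizeof(*pt));

	if (!pt)
		return NULL;

	pt->page = calloc(1, 4096);
	if (!pt->page) {
		free(pt);
		return NULL;
	}

	if (fake_map(pt)) {
		free(pt->page);
		free(pt);
		return NULL;
	}

	return pt;
}

/* Free is symmetric: unmap first, then release the page and the struct. */
static void fake_free_pt(struct fake_pt *pt)
{
	if (!pt || !pt->page)
		return;

	fake_unmap(pt);
	free(pt->page);
	free(pt);
}
```

With this shape, callers never see an allocated-but-unmapped table, so
no separate map/unmap pass is needed in init or cleanup paths.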

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 78 ++++++++-----------------------------
 1 file changed, 17 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e8d4dfa..92ffee7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -215,20 +215,6 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	pci_unmap_page((dev)->pdev, (pt)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
 } while (0);
 
-
-static void dma_unmap_pt_range(struct i915_pagedir *pd,
-			       unsigned pde, size_t n,
-			       struct drm_device *dev)
-{
-	if (WARN_ON(pde + n > I915_PDES_PER_PD))
-		n = I915_PDES_PER_PD - pde;
-
-	n += pde;
-
-	for (; pde < n; pde++)
-		dma_unmap_pt_single(pd->page_tables[pde], dev);
-}
-
 /**
  * dma_map_pt_single() - Create a dma mapping for a page table
  * @pt:		Page table to get a DMA map for
@@ -258,33 +244,12 @@ static int dma_map_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
 	return 0;
 }
 
-static int dma_map_pt_range(struct i915_pagedir *pd,
-			    unsigned pde, size_t n,
-			    struct drm_device *dev)
-{
-	const int first = pde;
-
-	if (WARN_ON(pde + n > I915_PDES_PER_PD))
-		n = I915_PDES_PER_PD - pde;
-
-	n += pde;
-
-	for (; pde < n; pde++) {
-		int ret;
-		ret = dma_map_pt_single(pd->page_tables[pde], dev);
-		if (ret) {
-			dma_unmap_pt_range(pd, first, pde, dev);
-			return ret;
-		}
-	}
-
-	return 0;
-}
-
-static void free_pt_single(struct i915_pagetab *pt)
+static void free_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
 {
 	if (WARN_ON(!pt->page))
 		return;
+
+	dma_unmap_pt_single(pt, dev);
 	__free_page(pt->page);
 	kfree(pt);
 }
@@ -292,6 +257,7 @@ static void free_pt_single(struct i915_pagetab *pt)
 static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_pagetab *pt;
+	int ret;
 
 	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
 	if (!pt)
@@ -303,6 +269,13 @@ static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	ret = dma_map_pt_single(pt, dev);
+	if (ret) {
+		__free_page(pt->page);
+		kfree(pt);
+		return ERR_PTR(ret);
+	}
+
 	return pt;
 }
 
@@ -346,7 +319,7 @@ static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count,
 
 err_out:
 	while (i--)
-		free_pt_single(pd->page_tables[i]);
+		free_pt_single(pd->page_tables[i], dev);
 	return ret;
 }
 
@@ -521,7 +494,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct i915_pagedir *pd)
+static void gen8_free_page_tables(struct i915_pagedir *pd, struct drm_device *dev)
 {
 	int i;
 
@@ -529,7 +502,7 @@ static void gen8_free_page_tables(struct i915_pagedir *pd)
 		return;
 
 	for (i = 0; i < I915_PDES_PER_PD; i++) {
-		free_pt_single(pd->page_tables[i]);
+		free_pt_single(pd->page_tables[i], dev);
 		pd->page_tables[i] = NULL;
 	}
 }
@@ -539,7 +512,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
 		free_pd_single(ppgtt->pdp.pagedir[i]);
 	}
 }
@@ -596,7 +569,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
@@ -694,18 +667,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 * 2. Create DMA mappings for the page directories and page tables.
 	 */
 	for (i = 0; i < max_pdp; i++) {
-		struct i915_pagedir *pd;
 		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
 		if (ret)
 			goto bail;
-
-		pd = ppgtt->pdp.pagedir[i];
-
-		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			ret = dma_map_pt_single(pd->page_tables[j], ppgtt->base.dev);
-			if (ret)
-				goto bail;
-		}
 	}
 
 	/*
@@ -1125,7 +1089,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		free_pt_single(ppgtt->pd.page_tables[i]);
+		free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
 
 	free_pd_single(&ppgtt->pd);
 }
@@ -1139,7 +1103,6 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 	drm_mm_takedown(&ppgtt->base.mm);
 	drm_mm_remove_node(&ppgtt->node);
 
-	dma_unmap_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries, vm->dev);
 	gen6_ppgtt_free(ppgtt);
 }
 
@@ -1224,13 +1187,6 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = dma_map_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
-			       ppgtt->base.dev);
-	if (ret) {
-		gen6_ppgtt_free(ppgtt);
-		return ret;
-	}
-
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
-- 
1.9.2


* [PATCH 24/56] drm/i915: Consolidate dma mappings
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (22 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 23/56] drm/i915: Always dma map page table allocations Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 25/56] drm/i915: Always dma map page directory allocations Ben Widawsky
                   ` (32 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

With a little bit of macro magic, and the fact that every page
table/dir/etc. we wish to map will have a page and a daddr member, we
can greatly simplify and reduce the code.

The patch introduces i915_dma_map_px_single() and
i915_dma_unmap_single(), which have the same semantics as
pci_map_page() and pci_unmap_page(), but fit on one line and don't
require line breaks or local variables to fit cleanly.

Notice that even the page allocation shares this same attribute. For
now, I am leaving that code untouched because the macro version would be
a bit on the big side - but it's a nice cleanup as well (IMO)
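
The "macro magic" relies on the macro argument being any struct pointer
that happens to have page and daddr members. A small userspace sketch of
that idea follows. It is not the driver macro: fake_map_page() and
fake_mapping_error() are hypothetical stand-ins for pci_map_page() and
pci_dma_mapping_error(), and the two structs are invented for the
example.

```c
#include <stdint.h>

/* Hypothetical stand-in for pci_map_page(): a NULL page yields address 0. */
static uintptr_t fake_map_page(void *page)
{
	return (uintptr_t)page;
}

/* Hypothetical stand-in for pci_dma_mapping_error(): 0 is the error value. */
static int fake_mapping_error(uintptr_t daddr)
{
	return daddr == 0 ? -1 : 0;
}

/*
 * Works for any struct pointer with ->page and ->daddr members.
 * Note that px is evaluated twice, so arguments with side effects
 * should be avoided.
 */
#define fake_dma_map_px(px) \
	fake_mapping_error((px)->daddr = fake_map_page((px)->page))

/* Two unrelated types that merely share the member names. */
struct fake_pagetab {
	void *page;
	uintptr_t daddr;
	int used;
};

struct fake_pagedir {
	void *page;
	uintptr_t daddr;
	struct fake_pagetab *tables[4];
};
```

The trade-off is that the compiler only checks member names, not types,
so keeping the structs symmetric is a convention the code has to
maintain by hand.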

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 56 ++++++++++++-------------------------
 1 file changed, 18 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 92ffee7..bb909e9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -211,45 +211,33 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-#define dma_unmap_pt_single(pt, dev) do { \
-	pci_unmap_page((dev)->pdev, (pt)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
+#define i915_dma_unmap_single(px, dev) do { \
+	pci_unmap_page((dev)->pdev, (px)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
 } while (0);
 
 /**
- * dma_map_pt_single() - Create a dma mapping for a page table
- * @pt:		Page table to get a DMA map for
+ * i915_dma_map_px_single() - Create a dma mapping for a page table/dir/etc.
+ * @px:		Page table/dir/etc to get a DMA map for
  * @dev:	drm device
  *
  * Page table allocations are unified across all gens. They always require a
- * single 4k allocation, as well as a DMA mapping.
+ * single 4k allocation, as well as a DMA mapping. If we keep the structs
+ * symmetric here, the simple macro covers us for every page table type.
  *
  * Return: 0 if success.
  */
-static int dma_map_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
-{
-	struct page *page;
-	dma_addr_t pt_addr;
-	int ret;
-
-	page = pt->page;
-	pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
-			       PCI_DMA_BIDIRECTIONAL);
-
-	ret = pci_dma_mapping_error(dev->pdev, pt_addr);
-	if (ret)
-		return ret;
-
-	pt->daddr = pt_addr;
-
-	return 0;
-}
+#define i915_dma_map_px_single(px, dev) \
+	pci_dma_mapping_error((dev)->pdev, \
+			      (px)->daddr = pci_map_page((dev)->pdev, \
+							 (px)->page, 0, 4096, \
+							 PCI_DMA_BIDIRECTIONAL))
 
 static void free_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
 {
 	if (WARN_ON(!pt->page))
 		return;
 
-	dma_unmap_pt_single(pt, dev);
+	i915_dma_unmap_single(pt, dev);
 	__free_page(pt->page);
 	kfree(pt);
 }
@@ -269,7 +257,7 @@ static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
 		return ERR_PTR(-ENOMEM);
 	}
 
-	ret = dma_map_pt_single(pt, dev);
+	ret = i915_dma_map_px_single(pt, dev);
 	if (ret) {
 		__free_page(pt->page);
 		kfree(pt);
@@ -519,7 +507,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
-	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
+	struct drm_device *dev = ppgtt->base.dev;
 	int i, j;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
@@ -528,16 +516,14 @@ static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 		if (!ppgtt->pdp.pagedir[i]->daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i]->daddr, PAGE_SIZE,
-			       PCI_DMA_BIDIRECTIONAL);
+		i915_dma_unmap_single(ppgtt->pdp.pagedir[i], dev);
 
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
 			struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
 			struct i915_pagetab *pt =  pd->page_tables[j];
 			dma_addr_t addr = pt->daddr;
 			if (addr)
-				pci_unmap_page(hwdev, addr, PAGE_SIZE,
-					       PCI_DMA_BIDIRECTIONAL);
+				i915_dma_unmap_single(pt, dev);
 		}
 	}
 }
@@ -623,19 +609,13 @@ err_out:
 static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 					     const int pdpe)
 {
-	dma_addr_t pd_addr;
 	int ret;
 
-	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.pagedir[pdpe]->page, 0,
-			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
+	ret = i915_dma_map_px_single(ppgtt->pdp.pagedir[pdpe],
+				     ppgtt->base.dev);
 	if (ret)
 		return ret;
 
-	ppgtt->pdp.pagedir[pdpe]->daddr = pd_addr;
-
 	return 0;
 }
 
-- 
1.9.2


* [PATCH 25/56] drm/i915: Always dma map page directory allocations
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (23 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 24/56] drm/i915: Consolidate dma mappings Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 26/56] drm/i915: Track GEN6 page table usage Ben Widawsky
                   ` (31 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Similar to the patch a few back in the series, we can always map and
unmap page directories when we do their allocation and teardown. Page
directory pages only exist on gen8+, so this should only affect behavior
on those platforms.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 79 +++++++++----------------------------
 1 file changed, 19 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bb909e9..51fc036 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -311,21 +311,23 @@ err_out:
 	return ret;
 }
 
-static void __free_pd_single(struct i915_pagedir *pd)
+static void __free_pd_single(struct i915_pagedir *pd, struct drm_device *dev)
 {
+	i915_dma_unmap_single(pd, dev);
 	__free_page(pd->page);
 	kfree(pd);
 }
 
-#define free_pd_single(pd) do { \
+#define free_pd_single(pd, dev) do { \
 	if ((pd)->page) { \
-		__free_pd_single(pd); \
+		__free_pd_single(pd, dev); \
 	} \
 } while (0)
 
-static struct i915_pagedir *alloc_pd_single(void)
+static struct i915_pagedir *alloc_pd_single(struct drm_device *dev)
 {
 	struct i915_pagedir *pd;
+	int ret;
 
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd)
@@ -337,6 +339,13 @@ static struct i915_pagedir *alloc_pd_single(void)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	ret = i915_dma_map_px_single(pd, dev);
+	if (ret) {
+		__free_page(pd->page);
+		kfree(pd);
+		return ERR_PTR(ret);
+	}
+
 	return pd;
 }
 
@@ -501,30 +510,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
-		free_pd_single(ppgtt->pdp.pagedir[i]);
-	}
-}
-
-static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
-{
-	struct drm_device *dev = ppgtt->base.dev;
-	int i, j;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		/* TODO: In the future we'll support sparse mappings, so this
-		 * will have to change. */
-		if (!ppgtt->pdp.pagedir[i]->daddr)
-			continue;
-
-		i915_dma_unmap_single(ppgtt->pdp.pagedir[i], dev);
-
-		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
-			struct i915_pagetab *pt =  pd->page_tables[j];
-			dma_addr_t addr = pt->daddr;
-			if (addr)
-				i915_dma_unmap_single(pt, dev);
-		}
+		free_pd_single(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
 	}
 }
 
@@ -536,7 +522,6 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	list_del(&vm->global_link);
 	drm_mm_takedown(&vm->mm);
 
-	gen8_ppgtt_dma_unmap_pages(ppgtt);
 	gen8_ppgtt_free(ppgtt);
 }
 
@@ -566,7 +551,7 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int i;
 
 	for (i = 0; i < max_pdp; i++) {
-		ppgtt->pdp.pagedir[i] = alloc_pd_single();
+		ppgtt->pdp.pagedir[i] = alloc_pd_single(ppgtt->base.dev);
 		if (IS_ERR(ppgtt->pdp.pagedir[i]))
 			goto unwind_out;
 	}
@@ -578,7 +563,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 
 unwind_out:
 	while (i--)
-		free_pd_single(ppgtt->pdp.pagedir[i]);
+		free_pd_single(ppgtt->pdp.pagedir[i],
+			       ppgtt->base.dev);
 
 	return -ENOMEM;
 }
@@ -606,19 +592,6 @@ err_out:
 	return ret;
 }
 
-static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
-					     const int pdpe)
-{
-	int ret;
-
-	ret = i915_dma_map_px_single(ppgtt->pdp.pagedir[pdpe],
-				     ppgtt->base.dev);
-	if (ret)
-		return ret;
-
-	return 0;
-}
-
 /**
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -644,16 +617,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		return ret;
 
 	/*
-	 * 2. Create DMA mappings for the page directories and page tables.
-	 */
-	for (i = 0; i < max_pdp; i++) {
-		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
-		if (ret)
-			goto bail;
-	}
-
-	/*
-	 * 3. Map all the page directory entires to point to the page tables
+	 * 2. Map all the page directory entires to point to the page tables
 	 * we've allocated.
 	 *
 	 * For now, the PPGTT helper functions all require that the PDEs are
@@ -689,11 +653,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 			 ppgtt->num_pd_entries,
 			 (ppgtt->num_pd_entries - min_pt_pages) + size % (1<<30));
 	return 0;
-
-bail:
-	gen8_ppgtt_dma_unmap_pages(ppgtt);
-	gen8_ppgtt_free(ppgtt);
-	return ret;
 }
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
@@ -1071,7 +1030,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
 
-	free_pd_single(&ppgtt->pd);
+	free_pd_single(&ppgtt->pd, ppgtt->base.dev);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
-- 
1.9.2


* [PATCH 26/56] drm/i915: Track GEN6 page table usage
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (24 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 25/56] drm/i915: Always dma map page directory allocations Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 27/56] drm/i915: Extract context switch skip logic Ben Widawsky
                   ` (30 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Instead of implementing the full tracking + dynamic allocation, this
patch does a bit less than half of the work, by tracking and warning on
unexpected conditions. The tracking itself follows which PTEs within a
page table are currently being used for objects. The next patch will
modify this to actually allocate the page tables only when necessary.
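
A userspace sketch of this kind of per-table PTE tracking is below. It
is only an illustration under stated assumptions: 4 KiB pages, 1024 PTEs
per page table (so each table spans 4 MiB), page-aligned ranges that
stay within the tables provided, and a plain byte-array bitmap instead
of the kernel bitmap API. The names fake_pt, set_bits(), count_used()
and track_va_range() are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT_4K	12
#define PTES_PER_PT	1024u	/* assumed: 4 KiB table of 4-byte PTEs */
#define PT_SPAN		((uint32_t)PTES_PER_PT << PAGE_SHIFT_4K) /* 4 MiB */
#define NUM_PTS		4	/* small directory for the sketch */

/* Hypothetical page table: one "in use" bit per PTE. */
struct fake_pt {
	uint8_t used[PTES_PER_PT / 8];
};

/* Set (value != 0) or clear (value == 0) count bits starting at first. */
static void set_bits(uint8_t *map, uint32_t first, uint32_t count, int value)
{
	uint32_t i;

	for (i = first; i < first + count; i++) {
		if (value)
			map[i / 8] |= (uint8_t)(1u << (i % 8));
		else
			map[i / 8] &= (uint8_t)~(1u << (i % 8));
	}
}

/* Number of PTEs currently marked as used in one table. */
static uint32_t count_used(const struct fake_pt *pt)
{
	uint32_t i, n = 0;

	for (i = 0; i < PTES_PER_PT; i++)
		n += (pt->used[i / 8] >> (i % 8)) & 1u;

	return n;
}

/*
 * Walk [start, start + length) one page table at a time and mark or
 * unmark the PTEs that the range covers in each table it touches.
 */
static void track_va_range(struct fake_pt *pts, uint32_t start,
			   uint32_t length, int value)
{
	uint32_t end = start + length;

	while (start < end) {
		uint32_t pde = start / PT_SPAN;
		uint32_t table_end = (pde + 1) * PT_SPAN;
		uint32_t chunk_end = end < table_end ? end : table_end;
		uint32_t first_pte = (start % PT_SPAN) >> PAGE_SHIFT_4K;
		uint32_t npte = (chunk_end - start) >> PAGE_SHIFT_4K;

		set_bits(pts[pde].used, first_pte, npte, value);
		start = chunk_end;
	}
}
```

A table whose count_used() is nonzero at free time corresponds to the
"unexpected condition" the patch warns about.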

With the current patch, not much stands in the way of making a
gen-agnostic range allocation function. However, the next patch adds
more specificity, which makes having separate functions a bit easier to
manage.

Notice that aliasing PPGTT is not managed here. The patch which actually
begins dynamic allocation/teardown explains the reasoning for this.

v2: s/pdp.pagedir/pdp.pagedirs
Make a scratch page allocation helper

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 203 ++++++++++++++++++++++++++++--------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 117 +++++++++++++--------
 2 files changed, 231 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 51fc036..b7a0232 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -66,10 +66,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 	return HAS_ALIASING_PPGTT(dev) ? 1 : 0;
 }
 
-
-static void ppgtt_bind_vma(struct i915_vma *vma,
-			   enum i915_cache_level cache_level,
-			   u32 flags);
+static int ppgtt_bind_vma(struct i915_vma *vma,
+			  enum i915_cache_level cache_level,
+			  u32 flags);
 static void ppgtt_unbind_vma(struct i915_vma *vma);
 static int gen8_ppgtt_enable(struct i915_hw_ppgtt *ppgtt);
 
@@ -232,37 +231,78 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 							 (px)->page, 0, 4096, \
 							 PCI_DMA_BIDIRECTIONAL))
 
-static void free_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
+static void __free_pt_single(struct i915_pagetab *pt, struct drm_device *dev,
+			     int scratch)
 {
+	if (WARN(scratch ^ pt->scratch,
+		 "Tried to free scratch = %d. Is scratch = %d\n",
+		 scratch, pt->scratch))
+		return;
+
 	if (WARN_ON(!pt->page))
 		return;
 
+	if (!scratch) {
+		const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+			GEN8_PTES_PER_PT : GEN6_PTES_PER_PT;
+		WARN(!bitmap_empty(pt->used_ptes, count),
+		     "Free page table with %d used pages\n",
+		     bitmap_weight(pt->used_ptes, count));
+	}
+
 	i915_dma_unmap_single(pt, dev);
 	__free_page(pt->page);
+	kfree(pt->used_ptes);
 	kfree(pt);
 }
 
+#define free_pt_single(pt, dev) \
+	__free_pt_single(pt, dev, false)
+#define free_pt_scratch(pt, dev) \
+	__free_pt_single(pt, dev, true)
+
 static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_pagetab *pt;
-	int ret;
+	const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+		GEN8_PTES_PER_PT : GEN6_PTES_PER_PT;
+	int ret = -ENOMEM;
 
 	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
 	if (!pt)
 		return ERR_PTR(-ENOMEM);
 
+	pt->used_ptes = kcalloc(BITS_TO_LONGS(count), sizeof(*pt->used_ptes),
+				GFP_KERNEL);
+
+	if (!pt->used_ptes)
+		goto fail_bitmap;
+
 	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!pt->page) {
-		kfree(pt);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!pt->page)
+		goto fail_page;
 
 	ret = i915_dma_map_px_single(pt, dev);
-	if (ret) {
-		__free_page(pt->page);
-		kfree(pt);
-		return ERR_PTR(ret);
-	}
+	if (ret)
+		goto fail_dma;
+
+	return pt;
+
+fail_dma:
+	__free_page(pt->page);
+fail_page:
+	kfree(pt->used_ptes);
+fail_bitmap:
+	kfree(pt);
+
+	return ERR_PTR(ret);
+}
+
+static inline struct i915_pagetab *alloc_pt_scratch(struct drm_device *dev)
+{
+	struct i915_pagetab *pt = alloc_pt_single(dev);
+	if (!IS_ERR(pt))
+		pt->scratch = 1;
 
 	return pt;
 }
@@ -389,7 +429,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.pagedir[i]->daddr;
+		dma_addr_t addr = ppgtt->pdp.pagedirs[i]->daddr;
 		ret = gen8_write_pdp(ring, i, addr, synchronous);
 		if (ret)
 			return ret;
@@ -416,7 +456,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+		struct i915_pagedir *pd = ppgtt->pdp.pagedirs[pdpe];
 		struct i915_pagetab *pt = pd->page_tables[pde];
 		struct page *page_table = pt->page;
 
@@ -463,7 +503,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 			break;
 
 		if (pt_vaddr == NULL) {
-			struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+			struct i915_pagedir *pd = ppgtt->pdp.pagedirs[pdpe];
 			struct i915_pagetab *pt = pd->page_tables[pde];
 			struct page *page_table = pt->page;
 			pt_vaddr = kmap_atomic(page_table);
@@ -509,8 +549,8 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
-		free_pd_single(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
+		gen8_free_page_tables(ppgtt->pdp.pagedirs[i], ppgtt->base.dev);
+		free_pd_single(ppgtt->pdp.pagedirs[i], ppgtt->base.dev);
 	}
 }
 
@@ -530,7 +570,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 	int i, ret;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
+		ret = alloc_pt_range(ppgtt->pdp.pagedirs[i],
 				     0, I915_PDES_PER_PD, ppgtt->base.dev);
 		if (ret)
 			goto unwind_out;
@@ -540,7 +580,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
+		gen8_free_page_tables(ppgtt->pdp.pagedirs[i], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
@@ -551,8 +591,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int i;
 
 	for (i = 0; i < max_pdp; i++) {
-		ppgtt->pdp.pagedir[i] = alloc_pd_single(ppgtt->base.dev);
-		if (IS_ERR(ppgtt->pdp.pagedir[i]))
+		ppgtt->pdp.pagedirs[i] = alloc_pd_single(ppgtt->base.dev);
+		if (IS_ERR(ppgtt->pdp.pagedirs[i]))
 			goto unwind_out;
 	}
 
@@ -563,7 +603,7 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 
 unwind_out:
 	while (i--)
-		free_pd_single(ppgtt->pdp.pagedir[i],
+		free_pd_single(ppgtt->pdp.pagedirs[i],
 			       ppgtt->base.dev);
 
 	return -ENOMEM;
@@ -625,9 +665,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 * will never need to touch the PDEs again.
 	 */
 	for (i = 0; i < max_pdp; i++) {
-		struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
+		struct i915_pagedir *pd = ppgtt->pdp.pagedirs[i];
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i]->page);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedirs[i]->page);
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
 			struct i915_pagetab *pt = pd->page_tables[j];
 			dma_addr_t addr = pt->daddr;
@@ -726,15 +766,13 @@ static void gen6_map_single(struct i915_pagedir *pd,
 /* Map all the page tables found in the ppgtt structure to incrementing page
  * directories. */
 static void gen6_map_page_range(struct drm_i915_private *dev_priv,
-				struct i915_pagedir *pd, unsigned pde, size_t n)
+				struct i915_pagedir *pd, uint32_t start, uint32_t length)
 {
-	if (WARN_ON(pde + n > I915_PDES_PER_PD))
-		n = I915_PDES_PER_PD - pde;
-
-	n += pde;
+	struct i915_pagetab *pt;
+	uint32_t pde, temp;
 
-	for (; pde < n; pde++)
-		gen6_map_single(pd, pde, pd->page_tables[pde]);
+	gen6_for_each_pde(pt, pd, start, length, temp, pde)
+		gen6_map_single(pd, pde, pt);
 
 	/* Make sure write is complete before other code can use this page
 	 * table. Also require for WC mapped PTEs */
@@ -1023,6 +1061,51 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
+static int gen6_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		        container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_pagetab *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		int j;
+
+		DECLARE_BITMAP(tmp_bitmap, GEN6_PTES_PER_PT);
+		bitmap_zero(tmp_bitmap, GEN6_PTES_PER_PT);
+		bitmap_set(tmp_bitmap, gen6_pte_index(start),
+			   gen6_pte_count(start, length));
+
+		/* TODO: To be done in the next patch. Map the page/insert
+		 * entries here */
+		for_each_set_bit(j, tmp_bitmap, GEN6_PTES_PER_PT) {
+			if (test_bit(j, pt->used_ptes)) {
+				/* Check that we're changing cache levels */
+			}
+		}
+
+		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+			  GEN6_PTES_PER_PT);
+	}
+
+	return 0;
+}
+
+static void gen6_teardown_va_range(struct i915_address_space *vm,
+				   uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		        container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_pagetab *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
+			     gen6_pte_count(start, length));
+	}
+}
+
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
@@ -1030,6 +1113,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
 
+	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
 	free_pd_single(&ppgtt->pd, ppgtt->base.dev);
 }
 
@@ -1057,6 +1141,9 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 	 * size. We allocate at the top of the GTT to avoid fragmentation.
 	 */
 	BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
+	ppgtt->scratch_pt = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pt))
+		return PTR_ERR(ppgtt->scratch_pt);
 alloc:
 	ret = drm_mm_insert_node_in_range_generic(&dev_priv->gtt.base.mm,
 						  &ppgtt->node, GEN6_PD_SIZE,
@@ -1068,20 +1155,25 @@ alloc:
 					       GEN6_PD_SIZE, GEN6_PD_ALIGN,
 					       I915_CACHE_NONE, 0);
 		if (ret)
-			return ret;
+			goto err_out;
 
 		retried = true;
 		goto alloc;
 	}
 
 	if (ret)
-		return ret;
+		goto err_out;
+
 
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
 	ppgtt->num_pd_entries = I915_PDES_PER_PD;
 	return 0;
+
+err_out:
+	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
+	return ret;
 }
 
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
@@ -1126,6 +1218,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
+	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
+	ppgtt->base.teardown_va_range = gen6_teardown_va_range;
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
@@ -1139,7 +1233,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
 		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
-	gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
+	gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
@@ -1174,13 +1268,25 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static void
+static int
 ppgtt_bind_vma(struct i915_vma *vma,
 	       enum i915_cache_level cache_level,
 	       u32 flags)
 {
+	int ret;
+
+	WARN_ON(flags);
+	if (vma->vm->allocate_va_range) {
+		ret = vma->vm->allocate_va_range(vma->vm,
+						 vma->node.start,
+						 vma->node.size);
+		if (ret)
+			return ret;
+	}
+
 	vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
 				cache_level);
+	return 0;
 }
 
 static void ppgtt_unbind_vma(struct i915_vma *vma)
@@ -1189,6 +1295,9 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 			     vma->node.start,
 			     vma->obj->base.size,
 			     true);
+	if (vma->vm->teardown_va_range)
+		vma->vm->teardown_va_range(vma->vm,
+					   vma->node.start, vma->node.size);
 }
 
 extern int intel_iommu_gfx_mapped;
@@ -1496,9 +1605,9 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
 }
 
 
-static void i915_ggtt_bind_vma(struct i915_vma *vma,
-			       enum i915_cache_level cache_level,
-			       u32 unused)
+static int i915_ggtt_bind_vma(struct i915_vma *vma,
+			      enum i915_cache_level cache_level,
+			      u32 unused)
 {
 	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
 	unsigned int flags = (cache_level == I915_CACHE_NONE) ?
@@ -1507,6 +1616,8 @@ static void i915_ggtt_bind_vma(struct i915_vma *vma,
 	BUG_ON(!i915_is_ggtt(vma->vm));
 	intel_gtt_insert_sg_entries(vma->obj->pages, entry, flags);
 	vma->obj->has_global_gtt_mapping = 1;
+
+	return 0;
 }
 
 static void i915_ggtt_clear_range(struct i915_address_space *vm,
@@ -1529,9 +1640,9 @@ static void i915_ggtt_unbind_vma(struct i915_vma *vma)
 	intel_gtt_clear_range(first, size);
 }
 
-static void ggtt_bind_vma(struct i915_vma *vma,
-			  enum i915_cache_level cache_level,
-			  u32 flags)
+static int ggtt_bind_vma(struct i915_vma *vma,
+			 enum i915_cache_level cache_level,
+			 u32 flags)
 {
 	struct drm_device *dev = vma->vm->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1559,7 +1670,7 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 	}
 
 	if (!(flags & ALIASING_BIND))
-		return;
+		return 0;
 
 	if (dev_priv->mm.aliasing_ppgtt &&
 	    (!obj->has_aliasing_ppgtt_mapping ||
@@ -1571,6 +1682,8 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 					    cache_level);
 		vma->obj->has_aliasing_ppgtt_mapping = 1;
 	}
+
+	return 0;
 }
 
 static void ggtt_unbind_vma(struct i915_vma *vma)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index fea846d..1246df1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -169,9 +169,33 @@ struct i915_vma {
 #define GLOBAL_BIND (1<<0)
 /* Only use this if you know you want a strictly aliased binding */
 #define ALIASING_BIND (1<<1)
-	void (*bind_vma)(struct i915_vma *vma,
-			 enum i915_cache_level cache_level,
-			 u32 flags);
+	int (*bind_vma)(struct i915_vma *vma,
+			enum i915_cache_level cache_level,
+			u32 flags);
+};
+
+
+struct i915_pagetab {
+	struct page *page;
+	dma_addr_t daddr;
+
+	unsigned long *used_ptes;
+	unsigned int scratch:1;
+};
+
+struct i915_pagedir {
+	struct page *page; /* NULL for GEN6-GEN7 */
+	union {
+		uint32_t pd_offset;
+		dma_addr_t daddr;
+	};
+
+	struct i915_pagetab *page_tables[I915_PDES_PER_PD];
+};
+
+struct i915_pagedirpo {
+	/* struct page *page; */
+	struct i915_pagedir *pagedirs[GEN8_LEGACY_PDPES];
 };
 
 struct i915_address_space {
@@ -213,6 +237,12 @@ struct i915_address_space {
 	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
 				     enum i915_cache_level level,
 				     bool valid); /* Create a valid PTE */
+	int (*allocate_va_range)(struct i915_address_space *vm,
+				 uint64_t start,
+				 uint64_t length);
+	void (*teardown_va_range)(struct i915_address_space *vm,
+				  uint64_t start,
+				  uint64_t length);
 	void (*clear_range)(struct i915_address_space *vm,
 			    uint64_t start,
 			    uint64_t length,
@@ -224,6 +254,30 @@ struct i915_address_space {
 	void (*cleanup)(struct i915_address_space *vm);
 };
 
+struct i915_hw_ppgtt {
+	struct i915_address_space base;
+	struct kref ref;
+	struct drm_mm_node node;
+	unsigned num_pd_entries;
+	unsigned num_pd_pages; /* gen8+ */
+	union {
+		struct i915_pagedirpo pdp;
+		struct i915_pagedir pd;
+	};
+
+	struct i915_pagetab *scratch_pt;
+
+	struct i915_hw_context *ctx;
+
+	gen6_gtt_pte_t __iomem *pd_addr;
+
+	int (*enable)(struct i915_hw_ppgtt *ppgtt);
+	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
+			 struct intel_ring_buffer *ring,
+			 bool synchronous);
+	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
+};
+
 /* The Graphics Translation Table is the way in which GEN hardware translates a
  * Graphics Virtual Address into a Physical Address. In addition to the normal
  * collateral associated with any va->pa translations GEN hardware also has a
@@ -252,47 +306,22 @@ struct i915_gtt {
 			  unsigned long *mappable_end);
 };
 
-struct i915_pagetab {
-	struct page *page;
-	dma_addr_t daddr;
-};
-
-struct i915_pagedir {
-	struct page *page; /* NULL for GEN6-GEN7 */
-	union {
-		uint32_t pd_offset;
-		dma_addr_t daddr;
-	};
-
-	struct i915_pagetab *page_tables[I915_PDES_PER_PD]; /* PDEs */
-};
-
-struct i915_pagedirpo {
-	/* struct page *page; */
-	struct i915_pagedir *pagedir[GEN8_LEGACY_PDPES];
-};
-
-struct i915_hw_ppgtt {
-	struct i915_address_space base;
-	struct kref ref;
-	struct drm_mm_node node;
-	unsigned num_pd_entries;
-	unsigned num_pd_pages; /* gen8+ */
-	union {
-		struct i915_pagedirpo pdp;
-		struct i915_pagedir pd;
-	};
-
-	struct i915_hw_context *ctx;
-
-	gen6_gtt_pte_t __iomem *pd_addr;
-
-	int (*enable)(struct i915_hw_ppgtt *ppgtt);
-	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
-			 struct intel_ring_buffer *ring,
-			 bool synchronous);
-	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
-};
+/* Iterates over every pde from start until start + length.
+ * If start, and start+length are not perfectly divisible, the macro will round
+ * down, and up as needed. The macro modifies pde, start, and length. Dev is
+ * only used to differentiate shift values. Temp is temp.  On gen6/7, start = 0,
+ * and length = 2G effectively iterates over every PDE in the system. On gen8+
+ * it simply iterates over every page directory entry in a page directory.
+ *
+ * XXX: temp is not actually needed, but it saves doing the ALIGN operation.
+ */
+#define gen6_for_each_pde(pt, pd, start, length, temp, iter) \
+	for (iter = gen6_pde_index(start), pt = (pd)->page_tables[iter]; \
+	     length > 0 && iter < I915_PDES_PER_PD; \
+	     pt = (pd)->page_tables[++iter], \
+	     temp = ALIGN(start+1, 1 << GEN6_PDE_SHIFT) - start, \
+	     temp = min(temp, (unsigned)length), \
+	     start += temp, length -= temp)
 
 static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
 {
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 27/56] drm/i915: Extract context switch skip logic
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (25 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 26/56] drm/i915: Track GEN6 page table usage Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 28/56] drm/i915: Force pd restore when PDEs change, gen6-7 Ben Widawsky
                   ` (29 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

We have some fanciness coming up. This patch just breaks out the logic.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f2dc17a..7eb4091 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -594,6 +594,16 @@ mi_set_context(struct intel_ring_buffer *ring,
 	return ret;
 }
 
+static inline bool should_skip_switch(struct intel_ring_buffer *ring,
+				      struct i915_hw_context *from,
+				      struct i915_hw_context *to)
+{
+	if (from == to && from->last_ring == ring && !to->remap_slice)
+		return true;
+
+	return false;
+}
+
 static int do_switch(struct intel_ring_buffer *ring,
 		     struct i915_hw_context *to)
 {
@@ -608,7 +618,7 @@ static int do_switch(struct intel_ring_buffer *ring,
 		BUG_ON(!i915_gem_obj_is_pinned(from->obj));
 	}
 
-	if (from == to && from->last_ring == ring && !to->remap_slice)
+	if (should_skip_switch(ring, from, to))
 		return 0;
 
 	/* Trying to pin first makes error handling easier. */
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 28/56] drm/i915: Force pd restore when PDEs change, gen6-7
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (26 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 27/56] drm/i915: Extract context switch skip logic Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 29/56] drm/i915: Finish gen6/7 dynamic page table allocation Ben Widawsky
                   ` (28 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

The docs say you cannot change the PDEs of a currently running context.
We never map new PDEs of a running context, and expect them to be
present - so I think this is okay. (We can unmap, but this should also
be okay since we only unmap unreferenced objects that the GPU shouldn't
be trying to va->pa xlate.) The MI_SET_CONTEXT command does have a flag
to force a reload even if the context is the same. It's unclear exactly
what this does, but I have a hunch it's the right thing to do.

The logic assumes that we always emit a context switch after mapping new
PDEs, and before we submit a batch. This is the case today, and has been
the case since the inception of hardware contexts. A note in the comment
lets the user know.
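
For anyone reading along without the tree handy, here is a rough
userspace sketch of the skip/force-reload decision described above. It
is not the driver code: the sketch_* names, the struct layouts, and the
SKETCH_FORCE_RESTORE value are made up for illustration, and the plain
bit operations stand in for test_and_clear_bit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the MI_FORCE_RESTORE flag. */
#define SKETCH_FORCE_RESTORE (1u << 1)

struct sketch_vm {
	unsigned long pd_reload_mask;	/* one bit per ring needing a reload */
};

struct sketch_ctx {
	struct sketch_vm *vm;
	int last_ring;
	bool remap_slice;
};

/* After PDEs change, flag every ring in ring_mask for a reload. */
static void sketch_invalidate_tlbs(struct sketch_vm *vm,
				   unsigned long ring_mask)
{
	vm->pd_reload_mask = ring_mask;
}

/* Returns true when the context switch can be skipped on this ring. */
static bool sketch_should_skip_switch(int ring,
				      const struct sketch_ctx *from,
				      struct sketch_ctx *to,
				      uint32_t *flags)
{
	unsigned long bit = 1ul << ring;

	/* A pending reload always wins, and is consumed per ring. */
	if (to->vm->pd_reload_mask & bit) {
		to->vm->pd_reload_mask &= ~bit;
		*flags |= SKETCH_FORCE_RESTORE;
		return false;
	}

	if (to->remap_slice)
		return false;

	return from == to && from->last_ring == ring;
}
```

The point of the ordering is that the reload check comes first, so a
same-context "switch" is no longer skipped once PDEs have changed.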

NOTE: I have no evidence to suggest this is actually needed other than a
few tidbits which lead me to believe there are some corner cases that
will require it. I'm mostly depending on the reload of DCLV to
invalidate the old TLBs. We can try to remove this patch and see what
happens.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_context.c    | 15 ++++++++++++---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 +++++
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 17 ++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  2 ++
 4 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 7eb4091..5155d09 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -596,9 +596,18 @@ mi_set_context(struct intel_ring_buffer *ring,
 
 static inline bool should_skip_switch(struct intel_ring_buffer *ring,
 				      struct i915_hw_context *from,
-				      struct i915_hw_context *to)
+				      struct i915_hw_context *to,
+				      u32 *flags)
 {
-	if (from == to && from->last_ring == ring && !to->remap_slice)
+	if (test_and_clear_bit(ring->id, &to->vm->pd_reload_mask)) {
+		*flags |= MI_FORCE_RESTORE;
+		return false;
+	}
+
+	if (to->remap_slice)
+		return false;
+
+	if (from == to && from->last_ring == ring)
 		return true;
 
 	return false;
@@ -618,7 +627,7 @@ static int do_switch(struct intel_ring_buffer *ring,
 		BUG_ON(!i915_gem_obj_is_pinned(from->obj));
 	}
 
-	if (should_skip_switch(ring, from, to))
+	if (should_skip_switch(ring, from, to, &hw_flags))
 		return 0;
 
 	/* Trying to pin first makes error handling easier. */
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3c3aba7..08fde7d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1224,6 +1224,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		goto err;
 
+	/* XXX: Reserve has possibly changed PDEs which means we must do a
+	 * context switch before we can coherently read some of the reserved
+	 * VMAs. */
+
 	/* The objects are in their final locations, apply the relocations. */
 	if (need_relocs)
 		ret = i915_gem_execbuffer_relocate(eb);
@@ -1328,6 +1332,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 				goto err;
 		}
 	} else {
+		WARN_ON(vm->pd_reload_mask & (1<<ring->id));
 		ret = ring->dispatch_execbuffer(ring,
 						exec_start, exec_len,
 						flags);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b7a0232..1d459e3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1268,6 +1268,16 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
+/* PDE TLBs are a pain to invalidate pre GEN8. It requires a context reload. If we
+ * are switching between contexts with the same LRCA, we also must do a force
+ * restore.
+ */
+#define ppgtt_invalidate_tlbs(vm) do {\
+	if (INTEL_INFO(vm->dev)->gen < 8) { \
+		vm->pd_reload_mask = INTEL_INFO(vm->dev)->ring_mask; \
+	} \
+} while(0)
+
 static int
 ppgtt_bind_vma(struct i915_vma *vma,
 	       enum i915_cache_level cache_level,
@@ -1282,10 +1292,13 @@ ppgtt_bind_vma(struct i915_vma *vma,
 						 vma->node.size);
 		if (ret)
 			return ret;
+
+		ppgtt_invalidate_tlbs(vma->vm);
 	}
 
 	vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
 				cache_level);
+
 	return 0;
 }
 
@@ -1295,9 +1308,11 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 			     vma->node.start,
 			     vma->obj->base.size,
 			     true);
-	if (vma->vm->teardown_va_range)
+	if (vma->vm->teardown_va_range) {
 		vma->vm->teardown_va_range(vma->vm,
 					   vma->node.start, vma->node.size);
+		ppgtt_invalidate_tlbs(vma->vm);
+	}
 }
 
 extern int intel_iommu_gfx_mapped;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 1246df1..08d49c1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -210,6 +210,8 @@ struct i915_address_space {
 		struct page *page;
 	} scratch;
 
+	unsigned long pd_reload_mask;
+
 	/**
 	 * List of objects currently involved in rendering.
 	 *
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 29/56] drm/i915: Finish gen6/7 dynamic page table allocation
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (27 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 28/56] drm/i915: Force pd restore when PDEs change, gen6-7 Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 30/56] drm/i915/bdw: Use dynamic allocation idioms on free Ben Widawsky
                   ` (27 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This patch continues on the idea from the previous patch. From here on,
in the steady state, PDEs are all pointing to the scratch page table (as
recommended in the spec). When an object is allocated in the VA range,
the code will determine if we need to allocate a page for the page
table. Similarly, when the object is destroyed, we will remove and free
the page table, pointing the PDE back to the scratch page table.
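
As a toy illustration of that lifecycle (not the driver code), the
sketch below keeps every directory slot pointing at a shared scratch
table until a range is allocated, and points the slot back at scratch
once the last PTE in a table is torn down. The sk_* names and the tiny
SK_PDES/SK_PTES sizes are assumptions made for the example; the real
patch uses I915_PDES_PER_PD, GEN6_PTES_PER_PT and a kernel bitmap, and
also unwinds partial allocations on failure, which this sketch skips.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_PDES 8	/* toy number of page directory entries */
#define SK_PTES 16	/* toy number of PTEs per page table */

struct sk_pt {
	unsigned used;	/* bit i set means PTE i is in use */
};

struct sk_pd {
	struct sk_pt scratch;
	struct sk_pt *tables[SK_PDES];
};

/* Steady state: every PDE points at the scratch page table. */
static void sk_init(struct sk_pd *pd)
{
	pd->scratch.used = 0;
	for (int i = 0; i < SK_PDES; i++)
		pd->tables[i] = &pd->scratch;
}

/* Mark pages [first, first + count) used, allocating tables on demand. */
static int sk_alloc_range(struct sk_pd *pd, unsigned first, unsigned count)
{
	if (first + count > SK_PDES * SK_PTES)
		return -1;

	for (unsigned p = first; p < first + count; p++) {
		unsigned pde = p / SK_PTES;

		if (pd->tables[pde] == &pd->scratch) {
			struct sk_pt *pt = calloc(1, sizeof(*pt));

			if (!pt)
				return -1;	/* no unwind in this sketch */
			pd->tables[pde] = pt;
		}
		pd->tables[pde]->used |= 1u << (p % SK_PTES);
	}
	return 0;
}

/* Clear pages [first, first + count) and free tables that become empty. */
static void sk_teardown_range(struct sk_pd *pd, unsigned first, unsigned count)
{
	if (first + count > SK_PDES * SK_PTES)
		return;

	for (unsigned p = first; p < first + count; p++) {
		unsigned pde = p / SK_PTES;
		struct sk_pt *pt = pd->tables[pde];

		if (pt == &pd->scratch)
			continue;

		pt->used &= ~(1u << (p % SK_PTES));
		if (pt->used == 0) {
			free(pt);
			pd->tables[pde] = &pd->scratch;
		}
	}
}

/* Number of PDEs that point at a real (non-scratch) page table. */
static unsigned sk_count_tables(const struct sk_pd *pd)
{
	unsigned cnt = 0;

	for (int i = 0; i < SK_PDES; i++)
		if (pd->tables[i] != &pd->scratch)
			cnt++;
	return cnt;
}
```

The counting helper mirrors what the debugfs hunk in this patch reports:
any slot that is not the scratch table counts as an allocated page.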

Following patches will work to unify the code a bit as we bring in GEN8
support. GEN6 and GEN8 are different enough that I had a hard time
getting to this point with as much common code as I have.

The aliasing PPGTT must pre-allocate all of the page tables. There are a
few reasons for this. Two trivial ones: the aliasing PPGTT goes through
the ggtt paths, so it's hard to maintain, and we currently do not
restore the default context (assuming the previous force reload is
indeed necessary). Most importantly though, the only way (it seems from
empirical evidence) to invalidate the CS TLBs on non-render rings is to
either use ring sync (which requires actually stopping the rings in
order to synchronize when the sync completes vs. where you are in
execution), or to reload DCLV.  Since without full PPGTT we do not ever
reload the DCLV register, there is no good way to achieve this. The
simplest solution is just to not support dynamic page table
creation/destruction in the aliasing PPGTT.

We could always reload DCLV, but this seems like quite a bit of excess
overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
page tables.
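
For reference, the 2MB-4k figure can be checked with a few lines of
arithmetic. The constants below are assumptions for gen6/7 (512 PDEs,
1024 PTEs per page table, 4k pages), and the "minus one page" reading
assumes the scratch page table is allocated either way; the sk_* names
are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE	4096u
#define SK_PDE_COUNT	512u	/* assumed gen6/7 PDEs per page directory */
#define SK_PTES_PER_PT	1024u	/* assumed gen6/7 PTEs per page table */

/* Address space covered when every PDE has a page table behind it. */
static uint64_t sk_va_bytes(void)
{
	return (uint64_t)SK_PDE_COUNT * SK_PTES_PER_PT * SK_PAGE_SIZE;
}

/* Memory spent on page tables when all of them are preallocated. */
static uint32_t sk_full_pt_bytes(void)
{
	return SK_PDE_COUNT * SK_PAGE_SIZE;
}

/* Best-case saving: everything but the one scratch page table. */
static uint32_t sk_max_saving_bytes(void)
{
	return sk_full_pt_bytes() - SK_PAGE_SIZE;
}
```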

v2: Make the page table bitmap declared inside the function (Chris)
Simplify the way scratching address space works.
Move the alloc/teardown tracepoints up a level in the call stack so that
all implementations get the trace.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  19 ++++-
 drivers/gpu/drm/i915/i915_gem_context.c |   2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c     | 118 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_gem_gtt.h     |   2 +-
 drivers/gpu/drm/i915/i915_trace.h       | 108 +++++++++++++++++++++++++++++
 5 files changed, 238 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 64051b0..921d898 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1812,10 +1812,26 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
 	return 0;
 }
 
+static size_t gen6_ppgtt_count_pt_pages(struct i915_hw_ppgtt *ppgtt)
+{
+	struct i915_pagedir *pd = &ppgtt->pd;
+	struct i915_pagetab **pt = &pd->page_tables[0];
+	size_t cnt = 0;
+	int i;
+
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		if (pt[i] != ppgtt->scratch_pt)
+			cnt++;
+	}
+
+	return cnt;
+}
+
 static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt, const char *name)
 {
 	seq_printf(m, "%s:\n", name);
 	seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
+	seq_printf(m, "\tpd pages: %zu\n", gen6_ppgtt_count_pt_pages(ppgtt));
 }
 
 static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verbose)
@@ -1874,6 +1890,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool ver
 		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", I915_READ(RING_PP_DIR_BASE_READ(ring)));
 		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", I915_READ(RING_PP_DIR_DCLV(ring)));
 	}
+	seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
+
 	if (dev_priv->mm.aliasing_ppgtt) {
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
@@ -1894,7 +1912,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool ver
 		idr_for_each(&file_priv->context_idr, per_file_ctx,
 			     (void *)((unsigned long)m | verbose));
 	}
-	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
 
 static int i915_ppgtt_info(struct seq_file *m, void *data)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 5155d09..fec8114 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -208,7 +208,7 @@ create_vm_for_ctx(struct drm_device *dev, struct i915_hw_context *ctx)
 	if (!ppgtt)
 		return ERR_PTR(-ENOMEM);
 
-	ret = i915_gem_init_ppgtt(dev, ppgtt);
+	ret = i915_gem_init_ppgtt(dev, ppgtt, ctx->file_priv == NULL);
 	if (ret) {
 		kfree(ppgtt);
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1d459e3..68cc1ab 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1064,10 +1064,47 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start, uint64_t length)
 {
+	DECLARE_BITMAP(new_page_tables, I915_PDES_PER_PD);
+	struct drm_device *dev = vm->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_hw_ppgtt *ppgtt =
 		        container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_pagetab *pt;
+	const uint32_t start_save = start, length_save = length;
 	uint32_t pde, temp;
+	int ret;
+
+	BUG_ON(upper_32_bits(start));
+
+	bitmap_zero(new_page_tables, I915_PDES_PER_PD);
+
+	/* The allocation is done in two stages so that we can bail out with
+	 * minimal amount of pain. The first stage finds new page tables that
+	 * need allocation. The second stage marks used ptes within the page
+	 * tables.
+	 */
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		if (pt != ppgtt->scratch_pt) {
+			WARN_ON(bitmap_empty(pt->used_ptes, GEN6_PTES_PER_PT));
+			continue;
+		}
+
+		/* We've already allocated a page table */
+		WARN_ON(!bitmap_empty(pt->used_ptes, GEN6_PTES_PER_PT));
+
+		pt = alloc_pt_single(dev);
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto unwind_out;
+		}
+
+		ppgtt->pd.page_tables[pde] = pt;
+		set_bit(pde, new_page_tables);
+		trace_i915_pagetable_alloc(vm, pde, start, GEN6_PDE_SHIFT);
+	}
+
+	start = start_save;
+	length = length_save;
 
 	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
 		int j;
@@ -1085,11 +1122,32 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 			}
 		}
 
-		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+		if (test_and_clear_bit(pde, new_page_tables))
+			gen6_map_single(&ppgtt->pd, pde, pt);
+
+		trace_i915_pagetable_map(vm, pde, pt,
+					 gen6_pte_index(start),
+					 gen6_pte_count(start, length),
+					 GEN6_PTES_PER_PT);
+		bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
 			  GEN6_PTES_PER_PT);
 	}
 
+	WARN_ON(!bitmap_empty(new_page_tables, I915_PDES_PER_PD));
+
+	/* Make sure write is complete before other code can use this page
+	 * table. Also require for WC mapped PTEs */
+	readl(dev_priv->gtt.gsm);
+
 	return 0;
+
+unwind_out:
+	for_each_set_bit(pde, new_page_tables, I915_PDES_PER_PD) {
+		struct i915_pagetab *pt = ppgtt->pd.page_tables[pde];
+		ppgtt->pd.page_tables[pde] = NULL;
+		free_pt_single(pt, vm->dev);
+	}
+	return ret;
 }
 
 static void gen6_teardown_va_range(struct i915_address_space *vm,
@@ -1101,8 +1159,27 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
 	uint32_t pde, temp;
 
 	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+
+		if (WARN(pt == ppgtt->scratch_pt,
+		    "Tried to teardown scratch page vm %p. pde %u: %llx-%llx\n",
+		    vm, pde, start, start + length))
+			continue;
+
+		trace_i915_pagetable_unmap(vm, pde, pt,
+					   gen6_pte_index(start),
+					   gen6_pte_count(start, length),
+					   GEN6_PTES_PER_PT);
+
 		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
 			     gen6_pte_count(start, length));
+
+		if (bitmap_empty(pt->used_ptes, GEN6_PTES_PER_PT)) {
+			trace_i915_pagetable_destroy(vm, pde,
+						     start & GENMASK_ULL(63, GEN6_PDE_SHIFT),
+						     GEN6_PDE_SHIFT);
+			gen6_map_single(&ppgtt->pd, pde, ppgtt->scratch_pt);
+			ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
+		}
 	}
 }
 
@@ -1110,9 +1187,13 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct i915_pagetab *pt = ppgtt->pd.page_tables[i];
+		if (pt != ppgtt->scratch_pt)
+			free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+	}
 
+	/* Consider putting this as part of pd free. */
 	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
 	free_pd_single(&ppgtt->pd, ppgtt->base.dev);
 }
@@ -1176,7 +1257,7 @@ err_out:
 	return ret;
 }
 
-static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt, bool preallocate_pt)
 {
 	int ret;
 
@@ -1184,9 +1265,13 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
+	if (!preallocate_pt)
+		return 0;
+
 	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
 			     ppgtt->base.dev);
 	if (ret) {
+		free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
 	}
@@ -1194,8 +1279,17 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
+static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
+				  uint64_t start, uint64_t length)
+{
+	struct i915_pagetab *unused;
+	uint32_t pde, temp;
 
-static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde)
+		ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
+}
+
+static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1214,7 +1308,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	} else
 		BUG();
 
-	ret = gen6_ppgtt_alloc(ppgtt);
+	ret = gen6_ppgtt_alloc(ppgtt, aliasing);
 	if (ret)
 		return ret;
 
@@ -1233,6 +1327,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
 		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
+	if (!aliasing)
+		gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
+
 	gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
@@ -1242,7 +1339,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
+int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt, bool aliasing)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret = 0;
@@ -1251,7 +1348,7 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.scratch = dev_priv->gtt.base.scratch;
 
 	if (INTEL_INFO(dev)->gen < 8)
-		ret = gen6_ppgtt_init(ppgtt);
+		ret = gen6_ppgtt_init(ppgtt, aliasing);
 	else if (IS_GEN8(dev))
 		ret = gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
 	else
@@ -1287,6 +1384,8 @@ ppgtt_bind_vma(struct i915_vma *vma,
 
 	WARN_ON(flags);
 	if (vma->vm->allocate_va_range) {
+		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size);
+
 		ret = vma->vm->allocate_va_range(vma->vm,
 						 vma->node.start,
 						 vma->node.size);
@@ -1309,6 +1408,9 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 			     vma->obj->base.size,
 			     true);
 	if (vma->vm->teardown_va_range) {
+		trace_i915_va_teardown(vma->vm,
+				       vma->node.start, vma->node.size);
+
 		vma->vm->teardown_va_range(vma->vm,
 					   vma->node.start, vma->node.size);
 		ppgtt_invalidate_tlbs(vma->vm);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 08d49c1..d8a990e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -421,7 +421,7 @@ void i915_gem_setup_global_gtt(struct drm_device *dev, unsigned long start,
 			       unsigned long mappable_end, unsigned long end);
 
 bool intel_enable_ppgtt(struct drm_device *dev, bool full);
-int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt);
+int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt, bool aliasing);
 
 void i915_check_and_clear_faults(struct drm_device *dev);
 void i915_gem_suspend_gtt_mappings(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index b29d7b1..99a436d 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -156,6 +156,114 @@ TRACE_EVENT(i915_vma_unbind,
 		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
 );
 
+DECLARE_EVENT_CLASS(i915_va,
+	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length),
+	TP_ARGS(vm, start, length),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u64, start)
+		__field(u64, end)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->start = start;
+		__entry->end = start + length;
+	),
+
+	TP_printk("vm=%p, 0x%llx-0x%llx", __entry->vm, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_va, i915_va_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length),
+	     TP_ARGS(vm, start, length)
+);
+
+DEFINE_EVENT(i915_va, i915_va_teardown,
+	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length),
+	     TP_ARGS(vm, start, length)
+);
+
+DECLARE_EVENT_CLASS(i915_pagetable,
+	TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	TP_ARGS(vm, pde, start, pde_shift),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u64, start)
+		__field(u64, end)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->start = start;
+		__entry->end = start + (1ULL << pde_shift);
+	),
+
+	TP_printk("vm=%p, pde=%d (0x%llx-0x%llx)",
+		  __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_pagetable, i915_pagetable_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	     TP_ARGS(vm, pde, start, pde_shift)
+);
+
+DEFINE_EVENT(i915_pagetable, i915_pagetable_destroy,
+	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	     TP_ARGS(vm, pde, start, pde_shift)
+);
+
+/* Avoid extra math because we only support two sizes. The format is defined by
+ * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
+#define TRACE_PT_SIZE(bits) \
+	((((bits) == 1024) ? 288 : 144) + 1)
+
+DECLARE_EVENT_CLASS(i915_pagetable_update,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u32, first)
+		__field(u32, last)
+		__dynamic_array(char, cur_ptes, TRACE_PT_SIZE(bits))
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->first = first;
+		__entry->last = first + len;
+
+		bitmap_scnprintf(__get_str(cur_ptes),
+				 TRACE_PT_SIZE(bits),
+				 pt->used_ptes,
+				 bits);
+	),
+
+	TP_printk("vm=%p, pde=%d, updating %u:%u\t%s",
+		  __entry->vm, __entry->pde, __entry->last, __entry->first,
+		  __get_str(cur_ptes))
+);
+
+DEFINE_EVENT(i915_pagetable_update, i915_pagetable_map,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits)
+);
+
+DEFINE_EVENT(i915_pagetable_update, i915_pagetable_unmap,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits)
+);
+
 TRACE_EVENT(i915_gem_object_change_domain,
 	    TP_PROTO(struct drm_i915_gem_object *obj, u32 old_read, u32 old_write),
 	    TP_ARGS(obj, old_read, old_write),
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 30/56] drm/i915/bdw: Use dynamic allocation idioms on free
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (28 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 29/56] drm/i915: Finish gen6/7 dynamic page table allocation Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 31/56] drm/i915/bdw: pagedirs rework allocation Ben Widawsky
                   ` (26 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

The page directory freer is left here for now as it's still useful given
that GEN8 still preallocates. Once the allocation functions are broken
up into more discrete chunks, we'll follow suit and destroy this
leftover piece.
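
As a rough illustration (not part of the patch), the range walk that
gen8_teardown_va_range() relies on can be sketched in standalone C. The
shift of 30 (1GB per page directory) and the helper names align_up,
clamp_pd and count_pds are assumptions made for this sketch; only the
clamping logic mirrors gen8_clamp_pd() from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PDPE_SHIFT 30	/* assumed: one page directory spans 1GB */

/* Round x up to a multiple of a, where a is a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Portion of [start, start + length) that lies inside the page
 * directory containing start (same logic as gen8_clamp_pd). */
static uint64_t clamp_pd(uint64_t start, uint64_t length)
{
	uint64_t next_pd = align_up(start + 1, 1ULL << PDPE_SHIFT);

	if (next_pd > start + length)
		return length;

	return next_pd - start;
}

/* Count how many page directories a VA range touches by walking it
 * one clamped chunk at a time. */
static unsigned count_pds(uint64_t start, uint64_t length)
{
	unsigned n = 0;

	while (length > 0) {
		uint64_t chunk = clamp_pd(start, length);

		start += chunk;
		length -= chunk;
		n++;
	}

	return n;
}
```

The sketch uses 1ULL for the shifted constant so that the arithmetic
stays 64 bits wide in a standalone build.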

comments

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 45 ++++++++++++++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 26 +++++++++++++++++++++
 2 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 68cc1ab..14aae05 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -531,27 +531,40 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct i915_pagedir *pd, struct drm_device *dev)
+static void gen8_teardown_va_range(struct i915_address_space *vm,
+				   uint64_t start, uint64_t length)
 {
-	int i;
-
-	if (!pd->page)
-		return;
-
-	for (i = 0; i < I915_PDES_PER_PD; i++) {
-		free_pt_single(pd->page_tables[i], dev);
-		pd->page_tables[i] = NULL;
+	struct i915_hw_ppgtt *ppgtt =
+		        container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_pagedir *pd;
+	struct i915_pagetab *pt;
+	uint64_t temp;
+	uint32_t pdpe, pde;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		uint64_t pd_len = gen8_clamp_pd(start, length);
+		uint64_t pd_start = start;
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			free_pt_single(pt, vm->dev);
+		}
+		free_pd_single(pd, vm->dev);
 	}
 }
 
-static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+/* This function will die soon */
+static void gen8_free_full_pagedir(struct i915_hw_ppgtt *ppgtt, int i)
 {
-	int i;
+	gen8_teardown_va_range(&ppgtt->base,
+			       i << GEN8_PDPE_SHIFT,
+			       (1 << GEN8_PDPE_SHIFT));
+}
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->pdp.pagedirs[i], ppgtt->base.dev);
-		free_pd_single(ppgtt->pdp.pagedirs[i], ppgtt->base.dev);
-	}
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+{
+	trace_i915_va_teardown(&ppgtt->base,
+			       ppgtt->base.start, ppgtt->base.total);
+	gen8_teardown_va_range(&ppgtt->base,
+			       ppgtt->base.start, ppgtt->base.total);
 }
 
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
@@ -580,7 +593,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.pagedirs[i], ppgtt->base.dev);
+		gen8_free_full_pagedir(ppgtt, i);
 
 	return -ENOMEM;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d8a990e..f81b26a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -395,6 +395,32 @@ static inline size_t gen6_pde_count(uint32_t addr, uint32_t length)
 	return i915_pde_count(addr, length, GEN6_PDE_SHIFT);
 }
 
+#define gen8_for_each_pde(pt, pd, start, length, temp, iter)		\
+	for (iter = gen8_pde_index(start), pt = (pd)->page_tables[iter]; \
+	     length > 0 && iter < I915_PDES_PER_PD;			\
+	     pt = (pd)->page_tables[++iter],				\
+	     temp = ALIGN(start+1, 1 << GEN8_PDE_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
+	for (iter = gen8_pdpe_index(start), pd = (pdp)->pagedirs[iter];	\
+	     length > 0 && iter < GEN8_LEGACY_PDPES;			\
+	     pd = (pdp)->pagedirs[++iter],				\
+	     temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
+/* Clamp length to the next pagedir boundary */
+static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
+{
+	uint64_t next_pd = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
+	if (next_pd > (start + length))
+		return length;
+
+	return next_pd - start;
+}
+
 static inline uint32_t gen8_pte_index(uint64_t address)
 {
 	return i915_pte_index(address, GEN8_PDE_SHIFT);
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 31/56] drm/i915/bdw: pagedirs rework allocation
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (29 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 30/56] drm/i915/bdw: Use dynamic allocation idioms on free Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 32/56] drm/i915/bdw: pagetable allocation rework Ben Widawsky
                   ` (25 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 43 ++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 14aae05..10cfad8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -546,8 +546,10 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 		uint64_t pd_start = start;
 		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
 			free_pt_single(pt, vm->dev);
+			pd->page_tables[pde] = NULL;
 		}
 		free_pd_single(pd, vm->dev);
+		ppgtt->pdp.pagedirs[pdpe] = NULL;
 	}
 }
 
@@ -598,26 +600,40 @@ unwind_out:
 	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
+static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
+				     uint64_t start,
+				     uint64_t length)
 {
-	int i;
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pdp, struct i915_hw_ppgtt, pdp);
+	struct i915_pagedir *unused;
+	uint64_t temp;
+	uint32_t pdpe;
 
-	for (i = 0; i < max_pdp; i++) {
-		ppgtt->pdp.pagedirs[i] = alloc_pd_single(ppgtt->base.dev);
-		if (IS_ERR(ppgtt->pdp.pagedirs[i]))
+	/* FIXME: PPGTT container_of won't work for 64b */
+	BUG_ON((start + length) > 0x800000000ULL);
+
+	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
+		BUG_ON(unused);
+		pdp->pagedirs[pdpe] = alloc_pd_single(ppgtt->base.dev);
+		if (IS_ERR(ppgtt->pdp.pagedirs[pdpe]))
 			goto unwind_out;
+
+		ppgtt->num_pd_pages++;
 	}
 
-	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
 
 unwind_out:
-	while (i--)
-		free_pd_single(ppgtt->pdp.pagedirs[i],
+	while (pdpe--) {
+		free_pd_single(ppgtt->pdp.pagedirs[pdpe],
 			       ppgtt->base.dev);
+		ppgtt->num_pd_pages--;
+	}
+
+	WARN_ON(ppgtt->num_pd_pages);
 
 	return -ENOMEM;
 }
@@ -627,7 +643,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 {
 	int ret;
 
-	ret = gen8_ppgtt_allocate_page_directories(ppgtt, max_pdp);
+	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, ppgtt->base.start,
+					ppgtt->base.total);
 	if (ret)
 		return ret;
 
@@ -664,6 +681,10 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	if (size % (1<<30))
 		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
 
+	ppgtt->base.start = 0;
+	ppgtt->base.total = size;
+	BUG_ON(ppgtt->base.total == 0);
+
 	/* 1. Do all our allocations for page directories and page tables. */
 	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
 	if (ret)
@@ -697,8 +718,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
-	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PT * PAGE_SIZE;
 
 	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
 			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 32/56] drm/i915/bdw: pagetable allocation rework
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (30 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 31/56] drm/i915/bdw: pagedirs rework allocation Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 33/56] drm/i915/bdw: Make the pdp switch a bit less hacky Ben Widawsky
                   ` (24 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 54 ++++++++++++++++++++-----------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 10 +++++++
 2 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 10cfad8..041ddca 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -553,14 +553,6 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 	}
 }
 
-/* This function will die soon */
-static void gen8_free_full_pagedir(struct i915_hw_ppgtt *ppgtt, int i)
-{
-	gen8_teardown_va_range(&ppgtt->base,
-			       i << GEN8_PDPE_SHIFT,
-			       (1 << GEN8_PDPE_SHIFT));
-}
-
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	trace_i915_va_teardown(&ppgtt->base,
@@ -580,22 +572,27 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_alloc_pagetabs(struct i915_pagedir *pd,
+				     uint64_t start,
+				     uint64_t length,
+				     struct drm_device *dev)
 {
-	int i, ret;
+	struct i915_pagetab *unused;
+	uint64_t temp;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ret = alloc_pt_range(ppgtt->pdp.pagedirs[i],
-				     0, I915_PDES_PER_PD, ppgtt->base.dev);
-		if (ret)
+	gen8_for_each_pde(unused, pd, start, length, temp, pde) {
+		BUG_ON(unused);
+		pd->page_tables[pde] = alloc_pt_single(dev);
+		if (IS_ERR(pd->page_tables[pde]))
 			goto unwind_out;
 	}
 
 	return 0;
 
 unwind_out:
-	while (i--)
-		gen8_free_full_pagedir(ppgtt, i);
+	while (pde--)
+		free_pt_single(pd->page_tables[pde], dev);
 
 	return -ENOMEM;
 }
@@ -639,20 +636,28 @@ unwind_out:
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-			    const int max_pdp)
+			    uint64_t start,
+			    uint64_t length)
 {
+	struct i915_pagedir *pd;
+	uint64_t temp;
+	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, ppgtt->base.start,
-					ppgtt->base.total);
+	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length);
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
-	if (ret)
-		goto err_out;
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
+						ppgtt->base.dev);
+		if (ret)
+			goto err_out;
+
+		ppgtt->num_pd_entries += I915_PDES_PER_PD;
+	}
 
-	ppgtt->num_pd_entries = max_pdp * I915_PDES_PER_PD;
+	BUG_ON(pdpe > ppgtt->num_pd_pages);
 
 	return 0;
 
@@ -683,10 +688,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
-	BUG_ON(ppgtt->base.total == 0);
 
 	/* 1. Do all our allocations for page directories and page tables. */
-	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
+	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index f81b26a..fae0867 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -411,6 +411,16 @@ static inline size_t gen6_pde_count(uint32_t addr, uint32_t length)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
+/* Clamp length to the next pagetab boundary */
+static inline uint64_t gen8_clamp_pt(uint64_t start, uint64_t length)
+{
+	uint64_t next_pt = ALIGN(start + 1, 1 << GEN8_PDE_SHIFT);
+	if (next_pt > (start + length))
+		return length;
+
+	return next_pt - start;
+}
+
 /* Clamp length to the next pagedir boundary */
 static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
 {
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 33/56] drm/i915/bdw: Make the pdp switch a bit less hacky
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (31 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 32/56] drm/i915/bdw: pagetable allocation rework Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 34/56] drm/i915: num_pd_pages/num_pd_entries isn't useful Ben Widawsky
                   ` (23 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

One important part of this patch is that we now write a scratch page
directory into any unused PDP descriptors. This matters for two reasons:
first, it's not clear we're allowed to just use 0, or an invalid
pointer; second, we must wipe out any previous contents from the last
context.

The latter point only matters with full PPGTT. The former point would
only affect 32b platforms, or platforms with less than 4GB memory.
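
For readers who want the idea in isolation, here is a minimal
standalone sketch of the fallback: every one of the four legacy PDP
slots gets an address, and empty slots get the scratch address. The
struct pagedir layout, the regs[][] output array and the helper names
are assumptions for illustration only; the real code writes the values
to registers via gen8_write_pdp().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEGACY_PDPES 4	/* four PDP descriptors in legacy 32b mode */

/* Hypothetical stand-in for struct i915_pagedir: only the DMA address
 * matters here. */
struct pagedir {
	uint64_t daddr;
};

static uint32_t upper_32(uint64_t v) { return (uint32_t)(v >> 32); }
static uint32_t lower_32(uint64_t v) { return (uint32_t)v; }

/* Compute the upper/lower dwords for every PDP slot. A NULL page
 * directory falls back to the scratch address, so nothing from a
 * previous context is left behind in that slot. */
static void fill_pdps(struct pagedir *const pds[LEGACY_PDPES],
		      uint64_t scratch_daddr,
		      uint32_t regs[LEGACY_PDPES][2])
{
	int i;

	assert(scratch_daddr != 0);	/* a scratch page must exist */

	for (i = LEGACY_PDPES - 1; i >= 0; i--) {
		uint64_t addr = pds[i] ? pds[i]->daddr : scratch_daddr;

		regs[i][0] = upper_32(addr);
		regs[i][1] = lower_32(addr);
	}
}
```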

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 32 ++++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  5 ++++-
 2 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 041ddca..a895f4b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -390,8 +390,10 @@ static struct i915_pagedir *alloc_pd_single(struct drm_device *dev)
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
-static int gen8_write_pdp(struct intel_ring_buffer *ring, unsigned entry,
-			   uint64_t val, bool synchronous)
+static int gen8_write_pdp(struct intel_ring_buffer *ring,
+			  unsigned entry,
+			  dma_addr_t addr,
+			  bool synchronous)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	int ret;
@@ -399,8 +401,8 @@ static int gen8_write_pdp(struct intel_ring_buffer *ring, unsigned entry,
 	BUG_ON(entry >= 4);
 
 	if (synchronous) {
-		I915_WRITE(GEN8_RING_PDP_UDW(ring, entry), val >> 32);
-		I915_WRITE(GEN8_RING_PDP_LDW(ring, entry), (u32)val);
+		I915_WRITE(GEN8_RING_PDP_UDW(ring, entry), upper_32_bits(addr));
+		I915_WRITE(GEN8_RING_PDP_LDW(ring, entry), lower_32_bits(addr));
 		return 0;
 	}
 
@@ -410,10 +412,10 @@ static int gen8_write_pdp(struct intel_ring_buffer *ring, unsigned entry,
 
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 	intel_ring_emit(ring, GEN8_RING_PDP_UDW(ring, entry));
-	intel_ring_emit(ring, (u32)(val >> 32));
+	intel_ring_emit(ring, upper_32_bits(addr));
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 	intel_ring_emit(ring, GEN8_RING_PDP_LDW(ring, entry));
-	intel_ring_emit(ring, (u32)(val));
+	intel_ring_emit(ring, lower_32_bits(addr));
 	intel_ring_advance(ring);
 
 	return 0;
@@ -425,11 +427,11 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 {
 	int i, ret;
 
-	/* bit of a hack to find the actual last used pd */
-	int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
-
-	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.pagedirs[i]->daddr;
+	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+		struct i915_pagedir *pd = ppgtt->pdp.pagedirs[i];
+		dma_addr_t addr = pd ? pd->daddr : ppgtt->scratch_pt->daddr;
+		/* The page directory might be NULL, but we need to clear out
+		 * whatever the previous context might have used. */
 		ret = gen8_write_pdp(ring, i, addr, synchronous);
 		if (ret)
 			return ret;
@@ -689,10 +691,16 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
 
+	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pd))
+		return PTR_ERR(ppgtt->scratch_pd);
+
 	/* 1. Do all our allocations for page directories and page tables. */
 	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
-	if (ret)
+	if (ret) {
+		free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
+	}
 
 	/*
 	 * 2. Map all the page directory entires to point to the page tables
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index fae0867..5c6db90 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -267,7 +267,10 @@ struct i915_hw_ppgtt {
 		struct i915_pagedir pd;
 	};
 
-	struct i915_pagetab *scratch_pt;
+	union {
+		struct i915_pagetab *scratch_pt;
+		struct i915_pagetab *scratch_pd; /* Just need the daddr */
+	};
 
 	struct i915_hw_context *ctx;
 
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 34/56] drm/i915: num_pd_pages/num_pd_entries isn't useful
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (32 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 33/56] drm/i915/bdw: Make the pdp switch a bit less hacky Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 35/56] drm/i915: Extract PPGTT param from pagedir alloc Ben Widawsky
                   ` (22 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

These values are not useful once the page tables are allocated
dynamically. Getting rid of them will help prevent later confusion.
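
To make the replacement arithmetic concrete, the sketch below derives
the gen6 address space size from constants instead of a stored counter.
The values (512 PDEs per directory, 4-byte PTEs, 4K pages, a PDE shift
of 22) and the function names are assumptions for illustration, chosen
to match what the patch appears to rely on.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ		4096ULL
#define PDES_PER_PD	512ULL	/* assumed value of I915_PDES_PER_PD */
#define GEN6_PTES	(PAGE_SZ / sizeof(uint32_t))	/* 1024 PTEs per table */
#define GEN6_PDE_SHIFT	22	/* assumed: each PDE covers 4MB */

/* Total VA space covered by one fully populated gen6 page directory. */
static uint64_t gen6_total(void)
{
	return PDES_PER_PD * GEN6_PTES * PAGE_SZ;
}

/* Number of PDEs needed to cover [0, total). This is a count, not a
 * masked index, so a full address space yields PDES_PER_PD. */
static uint64_t pdes_for(uint64_t total)
{
	assert(total <= (1ULL << 32));	/* gen6 PPGTT is a 32b space */
	return (total + (1ULL << GEN6_PDE_SHIFT) - 1) >> GEN6_PDE_SHIFT;
}
```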

TODO: this probably needs to be earlier in the series

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 11 ++++-----
 drivers/gpu/drm/i915/i915_gem_gtt.c | 45 ++++++++++---------------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  7 ++++--
 3 files changed, 21 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 921d898..40aca7f 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1814,13 +1814,12 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
 
 static size_t gen6_ppgtt_count_pt_pages(struct i915_hw_ppgtt *ppgtt)
 {
-	struct i915_pagedir *pd = &ppgtt->pd;
-	struct i915_pagetab **pt = &pd->page_tables[0];
+	struct i915_pagetab *pt;
 	size_t cnt = 0;
-	int i;
+	uint32_t useless;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		if (pt[i] != ppgtt->scratch_pt)
+	gen6_for_all_pdes(pt, ppgtt, useless) {
+		if (pt != ppgtt->scratch_pt)
 			cnt++;
 	}
 
@@ -1844,8 +1843,6 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verb
 	if (!ppgtt)
 		return;
 
-	seq_printf(m, "Page directories: %d\n", ppgtt->num_pd_pages);
-	seq_printf(m, "Page tables: %d\n", ppgtt->num_pd_entries);
 	for_each_ring(ring, dev_priv, unused) {
 		seq_printf(m, "%s\n", ring->name);
 		for (i = 0; i < 4; i++) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a895f4b..a646475 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -617,22 +617,14 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 		pdp->pagedirs[pdpe] = alloc_pd_single(ppgtt->base.dev);
 		if (IS_ERR(ppgtt->pdp.pagedirs[pdpe]))
 			goto unwind_out;
-
-		ppgtt->num_pd_pages++;
 	}
 
-	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
-
 	return 0;
 
 unwind_out:
-	while (pdpe--) {
+	while (pdpe--)
 		free_pd_single(ppgtt->pdp.pagedirs[pdpe],
 			       ppgtt->base.dev);
-		ppgtt->num_pd_pages--;
-	}
-
-	WARN_ON(ppgtt->num_pd_pages);
 
 	return -ENOMEM;
 }
@@ -655,12 +647,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 						ppgtt->base.dev);
 		if (ret)
 			goto err_out;
-
-		ppgtt->num_pd_entries += I915_PDES_PER_PD;
 	}
 
-	BUG_ON(pdpe > ppgtt->num_pd_pages);
-
 	return 0;
 
 	/* TODO: Check this for all cases */
@@ -682,7 +670,6 @@ err_out:
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
 	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	const int min_pt_pages = I915_PDES_PER_PD * max_pdp;
 	int i, j, ret;
 
 	if (size % (1<<30))
@@ -731,27 +718,21 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 
-	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
-			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
-	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
-			 ppgtt->num_pd_entries,
-			 (ppgtt->num_pd_entries - min_pt_pages) + size % (1<<30));
 	return 0;
 }
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 {
 	struct i915_address_space *vm = &ppgtt->base;
+	struct i915_pagetab *unused;
 	gen6_gtt_pte_t scratch_pte;
 	uint32_t pd_entry;
-	int pte, pde;
+	uint32_t  pte, pde, temp;
+	uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
 
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
 
-	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-		   ppgtt->pd.pd_offset,
-		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
-	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
+	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
 		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
@@ -1229,12 +1210,12 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
-	int i;
+	struct i915_pagetab *pt;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		struct i915_pagetab *pt = ppgtt->pd.page_tables[i];
+	gen6_for_all_pdes(pt, ppgtt, pde) {
 		if (pt != ppgtt->scratch_pt)
-			free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+			free_pt_single(pt, ppgtt->base.dev);
 	}
 
 	/* Consider putting this as part of pd free. */
@@ -1293,7 +1274,6 @@ alloc:
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
-	ppgtt->num_pd_entries = I915_PDES_PER_PD;
 	return 0;
 
 err_out:
@@ -1312,8 +1292,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt, bool preallocate_pt)
 	if (!preallocate_pt)
 		return 0;
 
-	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
-			     ppgtt->base.dev);
+	ret = alloc_pt_range(&ppgtt->pd, 0, I915_PDES_PER_PD, ppgtt->base.dev);
 	if (ret) {
 		free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
 		drm_mm_remove_node(&ppgtt->node);
@@ -1362,7 +1341,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * GEN6_PTES_PER_PT * PAGE_SIZE;
+	ppgtt->base.total = I915_PDES_PER_PD * GEN6_PTES_PER_PT * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd.pd_offset =
@@ -1602,7 +1581,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 		if (i915_is_ggtt(vm))
 			ppgtt = dev_priv->mm.aliasing_ppgtt;
 
-		gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
+		gen6_map_page_range(dev_priv, &ppgtt->pd, 0, I915_PDES_PER_PD);
 	}
 
 	i915_gem_chipset_flush(dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 5c6db90..a581b33 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -260,8 +260,6 @@ struct i915_hw_ppgtt {
 	struct i915_address_space base;
 	struct kref ref;
 	struct drm_mm_node node;
-	unsigned num_pd_entries;
-	unsigned num_pd_pages; /* gen8+ */
 	union {
 		struct i915_pagedirpo pdp;
 		struct i915_pagedir pd;
@@ -328,6 +326,11 @@ struct i915_gtt {
 	     temp = min(temp, (unsigned)length), \
 	     start += temp, length -= temp)
 
+#define gen6_for_all_pdes(pt, ppgtt, iter)  \
+	for (iter = 0, pt = ppgtt->pd.page_tables[iter];			\
+	     iter < gen6_pde_index(ppgtt->base.total);			\
+	     pt =  ppgtt->pd.page_tables[++iter])
+
 static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
 {
 	const uint32_t mask = NUM_PTE(pde_shift) - 1;
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 35/56] drm/i915: Extract PPGTT param from pagedir alloc
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (33 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 34/56] drm/i915: num_pd_pages/num_pd_entries isn't useful Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 36/56] drm/i915/bdw: Split out mappings Ben Widawsky
                   ` (21 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Now that we don't need to track num_pd_pages, we may as well remove all
need for the PPGTT structure in gen8_ppgtt_alloc_pagedirs(). This will
be very useful when we move to 48b addressing, where the PDP isn't the
root of the page table structure.

The param is replaced with drm_device, which is an unavoidable wart
throughout the series (in other words, not an especially flagrant one).
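
A hedged sketch of the resulting shape: an allocator that needs only
the PDP and the range, with unwinding limited to what this call
allocated. Everything here (struct layouts, the fail_at injection
parameter, calloc standing in for alloc_pd_single) is an assumption
for illustration and not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define PDPES		4
#define PDPE_SHIFT	30	/* assumed: 1GB per page directory */

struct pagedir {
	int dummy;
};

struct pdp {
	struct pagedir *pagedirs[PDPES];
};

/* Allocate a page directory for every PDPE that [start, start+length)
 * touches. fail_at is a hypothetical hook: the allocation for that
 * index fails (pass -1 for none). */
static int alloc_pagedirs(struct pdp *pdp, uint64_t start,
			  uint64_t length, int fail_at)
{
	uint32_t first = (uint32_t)(start >> PDPE_SHIFT);
	uint32_t last = (uint32_t)((start + length - 1) >> PDPE_SHIFT);
	uint32_t i;

	assert(length > 0 && last < PDPES);

	for (i = first; i <= last; i++) {
		struct pagedir *pd =
			((int)i == fail_at) ? NULL : calloc(1, sizeof(*pd));

		if (!pd)
			goto unwind;
		pdp->pagedirs[i] = pd;
	}

	return 0;

unwind:
	/* Free only the entries this call created, newest first. */
	while (i-- > first) {
		free(pdp->pagedirs[i]);
		pdp->pagedirs[i] = NULL;
	}

	return -ENOMEM;
}
```

Since nothing in the function refers back to an enclosing PPGTT, the
same code could in principle be reused when a PDP hangs off a higher
level table.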

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a646475..eded6a1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -601,10 +601,9 @@ unwind_out:
 
 static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 				     uint64_t start,
-				     uint64_t length)
+				     uint64_t length,
+				     struct drm_device *dev)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(pdp, struct i915_hw_ppgtt, pdp);
 	struct i915_pagedir *unused;
 	uint64_t temp;
 	uint32_t pdpe;
@@ -614,8 +613,8 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
-		pdp->pagedirs[pdpe] = alloc_pd_single(ppgtt->base.dev);
-		if (IS_ERR(ppgtt->pdp.pagedirs[pdpe]))
+		pdp->pagedirs[pdpe] = alloc_pd_single(dev);
+		if (IS_ERR(pdp->pagedirs[pdpe]))
 			goto unwind_out;
 	}
 
@@ -623,8 +622,7 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 
 unwind_out:
 	while (pdpe--)
-		free_pd_single(ppgtt->pdp.pagedirs[pdpe],
-			       ppgtt->base.dev);
+		free_pd_single(pdp->pagedirs[pdpe], dev);
 
 	return -ENOMEM;
 }
@@ -638,7 +636,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length);
+	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length,
+					ppgtt->base.dev);
 	if (ret)
 		return ret;
 
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 36/56] drm/i915/bdw: Split out mappings
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (34 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 35/56] drm/i915: Extract PPGTT param from pagedir alloc Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 37/56] drm/i915/bdw: begin bitmap tracking Ben Widawsky
                   ` (20 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

When we do dynamic page table allocations for gen8, we'll need to have
more control over how and when we map page tables, similar to gen6.

This patch adds the functionality and calls it at init, which should
cause no functional change.

The PDPEs are still a special case for now. We'll need a function for
that in the future as well.
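
The mapping step can be pictured with a small standalone sketch that
writes one PDE per page table in a range. The PDE shift of 21 (2MB per
page table), the flag bits, and all names below are assumptions for
illustration; the real encoding comes from gen8_pde_encode(), and the
real code also handles kmap and cache flushing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PDE_SHIFT	21	/* assumed: one page table spans 2MB */
#define PDES_PER_PD	512
#define PDE_PRESENT	(1ULL << 0)	/* hypothetical flag bits */
#define PDE_RW		(1ULL << 1)

struct pagetab {
	uint64_t daddr;
};

static uint64_t pde_encode(uint64_t daddr)
{
	return daddr | PDE_PRESENT | PDE_RW;
}

/* Point each PDE covering [start, start + length) at its page table.
 * Returns the number of PDEs written. */
static unsigned map_pagetable_range(uint64_t *pagedir,
				    struct pagetab *const *page_tables,
				    uint64_t start, uint64_t length)
{
	unsigned pde = (start >> PDE_SHIFT) & (PDES_PER_PD - 1);
	unsigned mapped = 0;

	while (length > 0 && pde < PDES_PER_PD) {
		/* Distance to the next page table boundary. */
		uint64_t next = ((start >> PDE_SHIFT) + 1) << PDE_SHIFT;
		uint64_t chunk = next - start;

		if (chunk > length)
			chunk = length;

		assert(page_tables[pde] != NULL);
		pagedir[pde] = pde_encode(page_tables[pde]->daddr);

		start += chunk;
		length -= chunk;
		pde++;
		mapped++;
	}

	return mapped;
}
```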

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 94 ++++++++++++++++++++-----------------
 1 file changed, 52 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index eded6a1..e2bc274 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -533,6 +533,36 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+			     struct i915_pagetab *pt,
+			     struct drm_device *dev)
+{
+	gen8_ppgtt_pde_t entry =
+		gen8_pde_encode(dev, pt->daddr, I915_CACHE_LLC);
+	*pde = entry;
+}
+
+/* It's likely we'll map more than one pagetable at a time. This function will
+ * save us unnecessary kmap calls, but do no more functionally than multiple
+ * calls to map_pt. */
+static void gen8_map_pagetable_range(struct i915_pagedir *pd,
+				     uint64_t start,
+				     uint64_t length,
+				     struct drm_device *dev)
+{
+	gen8_ppgtt_pde_t *pagedir = kmap_atomic(pd->page);
+	struct i915_pagetab *pt;
+	uint64_t temp, pde;
+
+	gen8_for_each_pde(pt, pd, start, length, temp, pde)
+		__gen8_do_map_pt(pagedir + pde, pt, dev);
+
+	if (!HAS_LLC(dev))
+		drm_clflush_virt_range(pagedir, PAGE_SIZE);
+
+	kunmap_atomic(pagedir);
+}
+
 static void gen8_teardown_va_range(struct i915_address_space *vm,
 				   uint64_t start, uint64_t length)
 {
@@ -627,11 +657,14 @@ unwind_out:
 	return -ENOMEM;
 }
 
-static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-			    uint64_t start,
-			    uint64_t length)
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start,
+			       uint64_t length)
 {
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_pagedir *pd;
+	const uint64_t orig_start = start;
 	uint64_t temp;
 	uint32_t pdpe;
 	int ret;
@@ -650,9 +683,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 
 	return 0;
 
-	/* TODO: Check this for all cases */
 err_out:
-	gen8_ppgtt_free(ppgtt);
+	gen8_teardown_va_range(vm, orig_start, start);
 	return ret;
 }
 
@@ -662,60 +694,38 @@ err_out:
  * PDP represents 1GB of memory 4 * 512 * 512 * 4096 = 4GB legacy 32b address
  * space.
  *
- * FIXME: split allocation into smaller pieces. For now we only ever do this
- * once, but with full PPGTT, the multiple contiguous allocations will be bad.
- * TODO: Do something with the size parameter
  */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	int i, j, ret;
-
-	if (size % (1<<30))
-		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
+	struct i915_pagedir *pd;
+	uint64_t temp, start = 0;
+	const uint64_t orig_length = size;
+	uint32_t pdpe;
+	int ret;
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->enable = gen8_ppgtt_enable;
+	ppgtt->switch_mm = gen8_mm_switch;
 
 	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pd))
 		return PTR_ERR(ppgtt->scratch_pd);
 
-	/* 1. Do all our allocations for page directories and page tables. */
-	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
+	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
 		free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
 	}
 
-	/*
-	 * 2. Map all the page directory entires to point to the page tables
-	 * we've allocated.
-	 *
-	 * For now, the PPGTT helper functions all require that the PDEs are
-	 * plugged in correctly. So we do that now/here. For aliasing PPGTT, we
-	 * will never need to touch the PDEs again.
-	 */
-	for (i = 0; i < max_pdp; i++) {
-		struct i915_pagedir *pd = ppgtt->pdp.pagedirs[i];
-		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedirs[i]->page);
-		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			struct i915_pagetab *pt = pd->page_tables[j];
-			dma_addr_t addr = pt->daddr;
-			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
-						      I915_CACHE_LLC);
-		}
-		if (!HAS_LLC(ppgtt->base.dev))
-			drm_clflush_virt_range(pd_vaddr, PAGE_SIZE);
-		kunmap_atomic(pd_vaddr);
-	}
+	start = 0;
+	size = orig_length;
 
-	ppgtt->enable = gen8_ppgtt_enable;
-	ppgtt->switch_mm = gen8_mm_switch;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
-	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
+		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
 
 	return 0;
 }
-- 
1.9.2

* [PATCH 37/56] drm/i915/bdw: begin bitmap tracking
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (35 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 36/56] drm/i915/bdw: Split out mappings Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 38/56] drm/i915/bdw: Dynamic page table allocations Ben Widawsky
                   ` (19 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

As with gen6/7, we can enable bitmap tracking while everything is still
preallocated, to make sure things don't actually blow up.
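
As a rough illustration of the tracking scheme (a userspace sketch, not
the driver code): each page table carries a bitmap of used PTEs, and the
table is only freed once teardown has cleared the last bit. The model_*
helpers below are hypothetical stand-ins for the kernel's bitmap_set(),
bitmap_clear() and bitmap_empty().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PTES_PER_PT 512u
#define MODEL_WORDS (MODEL_PTES_PER_PT / 64u)

struct model_pt {
	uint64_t used_ptes[MODEL_WORDS];	/* one bit per PTE */
};

static void model_bitmap_set(uint64_t *map, size_t first, size_t count)
{
	for (size_t i = first; i < first + count; i++)
		map[i / 64] |= 1ull << (i % 64);
}

static void model_bitmap_clear(uint64_t *map, size_t first, size_t count)
{
	for (size_t i = first; i < first + count; i++)
		map[i / 64] &= ~(1ull << (i % 64));
}

static bool model_bitmap_empty(const uint64_t *map)
{
	for (size_t i = 0; i < MODEL_WORDS; i++)
		if (map[i])
			return false;
	return true;
}

/* Mark PTEs [first, first + count) as in use. */
static void model_pt_reserve(struct model_pt *pt, size_t first, size_t count)
{
	assert(first + count <= MODEL_PTES_PER_PT);
	model_bitmap_set(pt->used_ptes, first, count);
}

/*
 * Release PTEs [first, first + count). When the last used PTE goes away,
 * free the table and clear the caller's slot. Returns true if it was freed.
 */
static bool model_pt_release(struct model_pt **slot, size_t first, size_t count)
{
	struct model_pt *pt = *slot;

	assert(pt && first + count <= MODEL_PTES_PER_PT);
	model_bitmap_clear(pt->used_ptes, first, count);
	if (!model_bitmap_empty(pt->used_ptes))
		return false;

	free(pt);
	*slot = NULL;
	return true;
}
```

The same pattern repeats one level up: a page directory is freed only
when its used_pdes bitmap becomes empty.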

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 101 +++++++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_gtt.h |  12 +++++
 2 files changed, 99 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e2bc274..82b98ea 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -353,8 +353,12 @@ err_out:
 
 static void __free_pd_single(struct i915_pagedir *pd, struct drm_device *dev)
 {
+	WARN(!bitmap_empty(pd->used_pdes, I915_PDES_PER_PD),
+	     "Free page directory with %d used pages\n",
+	     bitmap_weight(pd->used_pdes, I915_PDES_PER_PD));
 	i915_dma_unmap_single(pd, dev);
 	__free_page(pd->page);
+	kfree(pd->used_pdes);
 	kfree(pd);
 }
 
@@ -367,26 +371,35 @@ static void __free_pd_single(struct i915_pagedir *pd, struct drm_device *dev)
 static struct i915_pagedir *alloc_pd_single(struct drm_device *dev)
 {
 	struct i915_pagedir *pd;
-	int ret;
+	int ret = -ENOMEM;
 
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd)
 		return ERR_PTR(-ENOMEM);
 
+	pd->used_pdes = kcalloc(BITS_TO_LONGS(I915_PDES_PER_PD),
+				sizeof(*pd->used_pdes), GFP_KERNEL);
+	if (!pd->used_pdes)
+		goto free_pd;
+
 	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!pd->page) {
-		kfree(pd);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!pd->page)
+		goto free_bitmap;
 
 	ret = i915_dma_map_px_single(pd, dev);
-	if (ret) {
-		__free_page(pd->page);
-		kfree(pd);
-		return ERR_PTR(ret);
-	}
+	if (ret)
+		goto free_page;
 
 	return pd;
+
+free_page:
+	__free_page(pd->page);
+free_bitmap:
+	kfree(pd->used_pdes);
+free_pd:
+	kfree(pd);
+
+	return ERR_PTR(ret);
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
@@ -576,12 +589,48 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
+
+		/* Page directories might not be present since the macro rounds
+		 * down, and up.
+		 */
+		if (!pd) {
+			WARN(test_bit(pdpe, ppgtt->pdp.used_pdpes),
+			     "PDPE %d is not allocated, but is reserved (%p)\n",
+			     pdpe, vm);
+			continue;
+		} else {
+			WARN(!test_bit(pdpe, ppgtt->pdp.used_pdpes),
+			     "PDPE %d not reserved, but is allocated (%p)",
+			     pdpe, vm);
+		}
+
 		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
-			free_pt_single(pt, vm->dev);
-			pd->page_tables[pde] = NULL;
+			if (!pt) {
+				WARN(test_bit(pde, pd->used_pdes),
+				     "PDE %d is not allocated, but is reserved (%p)\n",
+				     pde, vm);
+				continue;
+			} else
+				WARN(!test_bit(pde, pd->used_pdes),
+				     "PDE %d not reserved, but is allocated (%p)",
+				     pde, vm);
+
+			bitmap_clear(pt->used_ptes,
+				     gen8_pte_index(pd_start),
+				     gen8_pte_count(pd_start, pd_len));
+
+			if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PT)) {
+				free_pt_single(pt, vm->dev);
+				pd->page_tables[pde] = NULL;
+				WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+			}
+		}
+
+		if (bitmap_empty(pd->used_pdes, I915_PDES_PER_PD)) {
+			free_pd_single(pd, vm->dev);
+			ppgtt->pdp.pagedirs[pdpe] = NULL;
+			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
 		}
-		free_pd_single(pd, vm->dev);
-		ppgtt->pdp.pagedirs[pdpe] = NULL;
 	}
 }
 
@@ -629,6 +678,7 @@ unwind_out:
 	return -ENOMEM;
 }
 
+/* bitmap of new pagedirs */
 static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 				     uint64_t start,
 				     uint64_t length,
@@ -644,6 +694,7 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
 		pdp->pagedirs[pdpe] = alloc_pd_single(dev);
+
 		if (IS_ERR(pdp->pagedirs[pdpe]))
 			goto unwind_out;
 	}
@@ -665,10 +716,12 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_pagedir *pd;
 	const uint64_t orig_start = start;
+	const uint64_t orig_length = length;
 	uint64_t temp;
 	uint32_t pdpe;
 	int ret;
 
+	/* Do the allocations first so we can easily bail out */
 	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length,
 					ppgtt->base.dev);
 	if (ret)
@@ -681,6 +734,26 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 			goto err_out;
 	}
 
+	/* Now mark everything we've touched as used. This doesn't allow for
+	 * robust error checking, but it makes the code a hell of a lot simpler.
+	 */
+	start = orig_start;
+	length = orig_length;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		struct i915_pagetab *pt;
+		uint64_t pd_len = gen8_clamp_pd(start, length);
+		uint64_t pd_start = start;
+		uint32_t pde;
+		gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
+			bitmap_set(pd->page_tables[pde]->used_ptes,
+				   gen8_pte_index(start),
+				   gen8_pte_count(start, length));
+			set_bit(pde, pd->used_pdes);
+		}
+		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+	}
+
 	return 0;
 
 err_out:
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index a581b33..bce4124 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -190,11 +190,13 @@ struct i915_pagedir {
 		dma_addr_t daddr;
 	};
 
+	unsigned long *used_pdes;
 	struct i915_pagetab *page_tables[I915_PDES_PER_PD];
 };
 
 struct i915_pagedirpo {
 	/* struct page *page; */
+	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
 	struct i915_pagedir *pagedirs[GEN8_LEGACY_PDPES];
 };
 
@@ -457,6 +459,16 @@ static inline uint32_t gen8_pml4e_index(uint64_t address)
 	BUG();
 }
 
+static inline size_t gen8_pte_count(uint64_t addr, uint64_t length)
+{
+	return i915_pte_count(addr, length, GEN8_PDE_SHIFT);
+}
+
+static inline size_t gen8_pde_count(uint64_t addr, uint64_t length)
+{
+	return i915_pde_count(addr, length, GEN8_PDE_SHIFT);
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_gem_setup_global_gtt(struct drm_device *dev, unsigned long start,
-- 
1.9.2

* [PATCH 38/56] drm/i915/bdw: Dynamic page table allocations
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (36 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 37/56] drm/i915/bdw: begin bitmap tracking Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 39/56] drm/i915/bdw: Scratch unused pages Ben Widawsky
                   ` (18 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This finishes off the dynamic page table allocations, in the legacy 3
level style that already exists. Almost everything has already been set
up to this point; the patch finishes off the enabling by setting the
appropriate function pointers.
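
The error handling in this patch hinges on remembering which entries a
call newly allocated, so that a failure frees only those and leaves
preexisting ones alone. A minimal userspace sketch of that idea follows.
It assumes a 32-slot table and a 32-bit mask in place of the driver's
bitmaps, and the model_* names (including the budgeted allocator used to
simulate failure) are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_SLOTS 32u

typedef void *(*model_alloc_fn)(void);

/* Allocations allowed before the budgeted allocator starts failing. */
static unsigned int model_budget;

static void *model_alloc_budgeted(void)
{
	if (model_budget == 0)
		return NULL;	/* simulated -ENOMEM */
	model_budget--;
	return malloc(16);
}

/*
 * Populate every empty slot in [first, first + count). Each new allocation
 * is recorded in *new_mask. On failure, only the slots recorded in
 * *new_mask are freed; slots that were already populated are untouched.
 * Returns 0 on success, -1 on allocation failure.
 */
static int model_alloc_range(void *slots[MODEL_SLOTS], size_t first,
			     size_t count, uint32_t *new_mask,
			     model_alloc_fn alloc)
{
	size_t i;

	assert(first + count <= MODEL_SLOTS);
	assert(*new_mask == 0);

	for (i = first; i < first + count; i++) {
		if (slots[i])
			continue;	/* already backed, skip it */

		slots[i] = alloc();
		if (!slots[i])
			goto unwind;

		*new_mask |= UINT32_C(1) << i;
	}

	return 0;

unwind:
	for (i = 0; i < MODEL_SLOTS; i++) {
		if (*new_mask & (UINT32_C(1) << i)) {
			free(slots[i]);
			slots[i] = NULL;
		}
	}
	*new_mask = 0;
	return -1;
}
```

Skipping populated slots instead of treating them as a bug is what makes
the function safe to call repeatedly on overlapping ranges.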

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 261 +++++++++++++++++++++++++++++-------
 1 file changed, 216 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 82b98ea..66ed943 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -653,58 +653,160 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_alloc_pagetabs(struct i915_pagedir *pd,
+/**
+ * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
+ * @ppgtt:	Master ppgtt structure.
+ * @pd:		Page directory for this address range.
+ * @start:	Starting virtual address to begin allocations.
+ * @length:	Size of the allocations.
+ * @new_pts:	Bitmap set by function with new allocations. Likely used by the
+ *		caller to free on error.
+ *
+ * Allocate the required number of page tables. Extremely similar to
+ * gen8_ppgtt_alloc_pagedirs(). The main difference is here we are limited by
+ * the page directory boundary (instead of the page directory pointer). That
+ * boundary is 1GB virtual. Therefore, unlike gen8_ppgtt_alloc_pagedirs(), it is
+ * possible, and likely that the caller will need to use multiple calls of this
+ * function to achieve the appropriate allocation.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_pagedir *pd,
 				     uint64_t start,
 				     uint64_t length,
-				     struct drm_device *dev)
+				     unsigned long *new_pts)
 {
 	struct i915_pagetab *unused;
 	uint64_t temp;
 	uint32_t pde;
 
 	gen8_for_each_pde(unused, pd, start, length, temp, pde) {
-		BUG_ON(unused);
-		pd->page_tables[pde] = alloc_pt_single(dev);
+		if (unused)
+			continue;
+
+		pd->page_tables[pde] = alloc_pt_single(ppgtt->base.dev);
+
 		if (IS_ERR(pd->page_tables[pde]))
 			goto unwind_out;
+
+		set_bit(pde, new_pts);
 	}
 
 	return 0;
 
 unwind_out:
-	while (pde--)
-		free_pt_single(pd->page_tables[pde], dev);
+	for_each_set_bit(pde, new_pts, I915_PDES_PER_PD)
+		free_pt_single(pd->page_tables[pde], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
 
-/* bitmap of new pagedirs */
-static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
+/**
+ * gen8_ppgtt_alloc_pagedirs() - Allocate page directories for VA range.
+ * @ppgtt:	Master ppgtt structure.
+ * @pdp:	Page directory pointer for this address range.
+ * @start:	Starting virtual address to begin allocations.
+ * @length:	Size of the allocations.
+ * @new_pds:	Bitmap set by function with new allocations. Likely used by the
+ *		caller to free on error.
+ *
+ * Allocate the required number of page directories starting at the pdpe index
+ * of @start, and ending at the pdpe index of @start + @length. This function
+ * will skip over already allocated page directories within the range, and only
+ * allocate new ones, setting the appropriate pointer within the pdp as well as
+ * the correct position in the bitmap @new_pds.
+ *
+ * The function will only allocate the pages within the range for a given page
+ * directory pointer. In other words, if @start + @length straddles a virtually
+ * addressed PDP boundary (512GB for 4k pages), there will be more allocations
+ * required by the caller. This is not currently possible, and the BUG in the
+ * code will prevent it.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_pagedirs(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_pagedirpo *pdp,
 				     uint64_t start,
 				     uint64_t length,
-				     struct drm_device *dev)
+				     unsigned long *new_pds)
 {
 	struct i915_pagedir *unused;
 	uint64_t temp;
 	uint32_t pdpe;
 
+	BUG_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+
 	/* FIXME: PPGTT container_of won't work for 64b */
 	BUG_ON((start + length) > 0x800000000ULL);
 
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
-		BUG_ON(unused);
-		pdp->pagedirs[pdpe] = alloc_pd_single(dev);
+		struct i915_pagedir *pd;
+		if (unused)
+			continue;
 
-		if (IS_ERR(pdp->pagedirs[pdpe]))
+		pd = alloc_pd_single(ppgtt->base.dev);
+		if (IS_ERR(pd))
 			goto unwind_out;
+
+		pdp->pagedirs[pdpe] = pd;
+		set_bit(pdpe, new_pds);
 	}
 
 	return 0;
 
 unwind_out:
-	while (pdpe--)
-		free_pd_single(pdp->pagedirs[pdpe], dev);
+	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
+		free_pd_single(pdp->pagedirs[pdpe], ppgtt->base.dev);
+
+	return -ENOMEM;
+}
+
+void free_gen8_temp_bitmaps(unsigned long *new_pds,
+			    unsigned long **new_pts)
+{
+	int i;
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+		kfree(new_pts[i]);
+	kfree(new_pts);
+	kfree(new_pds);
+}
 
+/* Fills in the page directory bitmap, and the array of page table bitmaps. Both
+ * of these are based on the number of PDPEs in the system.
+ */
+int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
+					 unsigned long ***new_pts)
+{
+	int i;
+	unsigned long *pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES),
+				     sizeof(unsigned long),
+				     GFP_KERNEL);
+
+	unsigned long **pts = kcalloc(I915_PDES_PER_PD,
+				      sizeof(unsigned long *),
+				      GFP_KERNEL);
+
+	if (!pts || !pds)
+		goto err_out;
+
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+		pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES_PER_PD),
+				 sizeof(unsigned long), GFP_KERNEL);
+		if (!pts[i])
+			goto err_out;
+	}
+
+	*new_pds = pds;
+	*new_pts = (unsigned long **)pts;
+
+	return 0;
+
+err_out:
+	for (i = 0; pts && i < GEN8_LEGACY_PDPES; i++)
+		kfree(pts[i]);
+	kfree(pds);
+	kfree(pts);
 	return -ENOMEM;
 }
 
@@ -714,6 +816,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	unsigned long *new_page_dirs, **new_page_tables;
 	struct i915_pagedir *pd;
 	const uint64_t orig_start = start;
 	const uint64_t orig_length = length;
@@ -721,22 +824,40 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	uint32_t pdpe;
 	int ret;
 
-	/* Do the allocations first so we can easily bail out */
-	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length,
-					ppgtt->base.dev);
+#ifdef CONFIG_32BIT
+	/* Disallow 64b address on 32b platforms. Nothing is wrong with doing
+	 * this in hardware, but a lot of the drm code is not prepared to handle
+	 * 64b offset on 32b platforms. */
+	if (start + length > 0x100000000ULL)
+		return -E2BIG;
+#endif
+
+	/* Wrap is never okay since we can only represent 48b, and we don't
+	 * actually use the other side of the canonical address space.
+	 */
+	if (WARN_ON(start + length < start))
+		return -ERANGE;
+
+	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
 	if (ret)
 		return ret;
 
+	/* Do the allocations first so we can easily bail out */
+	ret = gen8_ppgtt_alloc_pagedirs(ppgtt, &ppgtt->pdp, start, length,
+					new_page_dirs);
+	if (ret) {
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+		return ret;
+	}
+
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
-						ppgtt->base.dev);
+		bitmap_zero(new_page_tables[pdpe], I915_PDES_PER_PD);
+		ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
+						new_page_tables[pdpe]);
 		if (ret)
 			goto err_out;
 	}
 
-	/* Now mark everything we've touched as used. This doesn't allow for
-	 * robust error checking, but it makes the code a hell of a lot simpler.
-	 */
 	start = orig_start;
 	length = orig_length;
 
@@ -745,19 +866,37 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
 		uint32_t pde;
-		gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
-			bitmap_set(pd->page_tables[pde]->used_ptes,
-				   gen8_pte_index(start),
-				   gen8_pte_count(start, length));
+
+		BUG_ON(!pd);
+
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			BUG_ON(!pt);
+
+			bitmap_set(pt->used_ptes,
+				   gen8_pte_index(pd_start),
+				   gen8_pte_count(pd_start, pd_len));
+
 			set_bit(pde, pd->used_pdes);
 		}
+
 		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+
+		gen8_map_pagetable_range(pd, start, length, ppgtt->base.dev);
 	}
 
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	return 0;
 
 err_out:
-	gen8_teardown_va_range(vm, orig_start, start);
+	while (pdpe--) {
+		pd = ppgtt->pdp.pagedirs[pdpe];
+		for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES_PER_PD)
+			free_pt_single(pd->page_tables[temp], ppgtt->base.dev);
+	}
+	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
+		free_pd_single(ppgtt->pdp.pagedirs[pdpe], ppgtt->base.dev);
+
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	return ret;
 }
 
@@ -768,38 +907,65 @@ err_out:
  * space.
  *
  */
-static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
+static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	struct i915_pagedir *pd;
-	uint64_t temp, start = 0;
-	const uint64_t orig_length = size;
-	uint32_t pdpe;
-	int ret;
-
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 	ppgtt->enable = gen8_ppgtt_enable;
 	ppgtt->switch_mm = gen8_mm_switch;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 
 	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pd))
 		return PTR_ERR(ppgtt->scratch_pd);
 
+	return 0;
+}
+
+static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_pagedir *pd;
+	uint64_t temp, start = 0, size = dev_priv->gtt.base.total;
+	uint32_t pdpe;
+	int ret;
+
+	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	if (ret)
+		return ret;
+
 	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
 		free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
 	}
 
-	start = 0;
-	size = orig_length;
-
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
 		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
 
+	ppgtt->base.allocate_va_range = NULL;
+	ppgtt->base.teardown_va_range = NULL;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+
+	return 0;
+}
+
+static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	if (ret)
+		return ret;
+
+	ppgtt->base.allocate_va_range = gen8_alloc_va_range;
+	ppgtt->base.teardown_va_range = gen8_teardown_va_range;
+	ppgtt->base.clear_range = NULL;
+
 	return 0;
 }
 
@@ -1454,8 +1620,10 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt, boo
 
 	if (INTEL_INFO(dev)->gen < 8)
 		ret = gen6_ppgtt_init(ppgtt, aliasing);
+	else if (IS_GEN8(dev) && aliasing)
+		ret = gen8_aliasing_ppgtt_init(ppgtt);
 	else if (IS_GEN8(dev))
-		ret = gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
+		ret = gen8_ppgtt_init(ppgtt);
 	else
 		BUG();
 
@@ -1464,7 +1632,8 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt, boo
 
 	kref_init(&ppgtt->ref);
 	drm_mm_init(&ppgtt->base.mm, ppgtt->base.start, ppgtt->base.total);
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
+	if (ppgtt->base.clear_range)
+		ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 	i915_init_vm(dev_priv, &ppgtt->base);
 
 	return 0;
@@ -1508,10 +1677,12 @@ ppgtt_bind_vma(struct i915_vma *vma,
 
 static void ppgtt_unbind_vma(struct i915_vma *vma)
 {
-	vma->vm->clear_range(vma->vm,
-			     vma->node.start,
-			     vma->obj->base.size,
-			     true);
+	if (vma->vm->clear_range)
+		vma->vm->clear_range(vma->vm,
+				     vma->node.start,
+				     vma->obj->base.size,
+				     true);
+
 	if (vma->vm->teardown_va_range) {
 		trace_i915_va_teardown(vma->vm,
 				       vma->node.start, vma->node.size);
-- 
1.9.2

* [PATCH 39/56] drm/i915/bdw: Scratch unused pages
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (37 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 38/56] drm/i915/bdw: Dynamic page table allocations Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 40/56] drm/i915/bdw: Add ppgtt info for dynamic pages Ben Widawsky
                   ` (17 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This is probably not required since BDW is hopefully a bit more robust
than previous generations. Note also that scratch will not exist for
every entry within the page table structure. Doing this would waste
an extraordinary amount of space when we move to 4 level page tables.
Therefore, the scratch pages/tables will only be pointed to by page
tables which have fewer than all of the entries filled.

I wrote the patch while debugging so I figured why not put it in the
series.
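
In case the intent isn't obvious from the diff, here is a small userspace
sketch of what "scratching" an unused entry means: when a page table goes
away, its PDE is rewritten to point at a shared scratch table rather than
being left stale. The model shrinks the directory to 64 entries so one
word can hold the bitmap, and the model_* names and flag bit are
assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PDES    64u	/* shrunk from 512 to fit a one-word bitmap */
#define MODEL_PRESENT 0x1ull	/* illustrative flag bit only */

struct model_pd {
	uint64_t entries[MODEL_PDES];	/* what the hardware would walk */
	void *page_tables[MODEL_PDES];	/* software bookkeeping */
	uint64_t used_pdes;		/* one bit per populated PDE */
};

/*
 * Drop the page table at @pde and repoint the hardware entry at the
 * scratch table, so a stray access lands somewhere harmless.
 */
static void model_unmap_pagetable(struct model_pd *pd, size_t pde,
				  uint64_t scratch_daddr)
{
	uint64_t bit;

	assert(pde < MODEL_PDES);
	bit = 1ull << pde;
	assert(pd->used_pdes & bit);	/* must have been in use */

	pd->page_tables[pde] = NULL;
	pd->used_pdes &= ~bit;
	pd->entries[pde] = scratch_daddr | MODEL_PRESENT;
}
```

Only entries inside a directory that still exists get repointed, which
is why the cost stays bounded even with deeper page table hierarchies.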

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 66ed943..2b732ca 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -576,6 +576,25 @@ static void gen8_map_pagetable_range(struct i915_pagedir *pd,
 	kunmap_atomic(pagedir);
 }
 
+static void gen8_map_pagedir(struct i915_pagedir *pd,
+			     struct i915_pagetab *pt,
+			     int entry,
+			     struct drm_device *dev)
+{
+	gen8_ppgtt_pde_t *pagedir = kmap_atomic(pd->page);
+	__gen8_do_map_pt(pagedir + entry, pt, dev);
+	kunmap_atomic(pagedir);
+}
+
+static void gen8_unmap_pagetable(struct i915_hw_ppgtt *ppgtt,
+				 struct i915_pagedir *pd,
+				 int pde)
+{
+	pd->page_tables[pde] = NULL;
+	WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+	gen8_map_pagedir(pd, ppgtt->scratch_pt, pde, ppgtt->base.dev);
+}
+
 static void gen8_teardown_va_range(struct i915_address_space *vm,
 				   uint64_t start, uint64_t length)
 {
@@ -621,8 +640,10 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 
 			if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PT)) {
 				free_pt_single(pt, vm->dev);
-				pd->page_tables[pde] = NULL;
-				WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+				/* This may be nixed later. Optimize? */
+				gen8_unmap_pagetable(ppgtt, pd, pde);
+			} else {
+				gen8_ppgtt_clear_range(vm, pd_start, pd_len, true);
 			}
 		}
 
-- 
1.9.2

* [PATCH 40/56] drm/i915/bdw: Add ppgtt info for dynamic pages
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (38 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 39/56] drm/i915/bdw: Scratch unused pages Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 41/56] drm/i915/bdw: Optimize PDP loads Ben Widawsky
                   ` (16 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 59 +++++++++++++++++++++++++++----------
 drivers/gpu/drm/i915/i915_gem_gtt.c | 32 ++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h |  9 ++++++
 3 files changed, 85 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 40aca7f..c29c71a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1826,11 +1826,40 @@ static size_t gen6_ppgtt_count_pt_pages(struct i915_hw_ppgtt *ppgtt)
 	return cnt;
 }
 
+static void gen8_ppgtt_debugfs_counter(struct i915_pagedirpo *pdp,
+				       struct i915_pagedir *pd,
+				       struct i915_pagetab *pt,
+				       unsigned pdpe,
+				       unsigned pde,
+				       void *data)
+{
+	if (!pd || !pt)
+		return;
+
+	(*(size_t *)data)++;
+}
+
+static size_t gen8_ppgtt_count_pt_pages(struct i915_hw_ppgtt *ppgtt)
+{
+	size_t count = 0;
+
+	gen8_for_every_pdpe_pde(ppgtt, gen8_ppgtt_debugfs_counter, &count);
+
+	return count;
+}
+
 static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt, const char *name)
 {
+	struct drm_device *dev = ppgtt->base.dev;
+
 	seq_printf(m, "%s:\n", name);
-	seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
-	seq_printf(m, "\tpd pages: %zu\n", gen6_ppgtt_count_pt_pages(ppgtt));
+
+	if (INTEL_INFO(dev)->gen < 8) {
+		seq_printf(m, "\tpd pages: %zu\n", gen6_ppgtt_count_pt_pages(ppgtt));
+		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
+	} else {
+		seq_printf(m, "\tpage table overhead: %zu pages\n", gen8_ppgtt_count_pt_pages(ppgtt));
+	}
 }
 
 static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verbose)
@@ -1873,7 +1902,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool ver
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *ring;
-	struct drm_file *file;
 	int i;
 
 	if (INTEL_INFO(dev)->gen == 6)
@@ -1897,18 +1925,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool ver
 			ppgtt->debug_dump(ppgtt, m);
 	} else
 		return;
-
-	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
-		struct drm_i915_file_private *file_priv = file->driver_priv;
-		struct i915_hw_ppgtt *pvt_ppgtt;
-
-		pvt_ppgtt = ctx_to_ppgtt(file_priv->private_default_ctx);
-		seq_printf(m, "\nproc: %s\n",
-			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
-		print_ppgtt(m, pvt_ppgtt, "Default context");
-		idr_for_each(&file_priv->context_idr, per_file_ctx,
-			     (void *)((unsigned long)m | verbose));
-	}
 }
 
 static int i915_ppgtt_info(struct seq_file *m, void *data)
@@ -1917,6 +1933,7 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	bool verbose = node->info_ent->data ? true : false;
+	struct drm_file *file;
 
 	int ret = mutex_lock_interruptible(&dev->struct_mutex);
 	if (ret)
@@ -1928,6 +1945,18 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	else if (INTEL_INFO(dev)->gen >= 6)
 		gen6_ppgtt_info(m, dev, verbose);
 
+	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
+		struct drm_i915_file_private *file_priv = file->driver_priv;
+		struct i915_hw_ppgtt *pvt_ppgtt;
+
+		pvt_ppgtt = ctx_to_ppgtt(file_priv->private_default_ctx);
+		seq_printf(m, "\nproc: %s\n",
+			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
+		print_ppgtt(m, pvt_ppgtt, "Default context");
+		idr_for_each(&file_priv->context_idr, per_file_ctx,
+			     (void *)((unsigned long)m | verbose));
+	}
+
 	intel_runtime_pm_put(dev_priv);
 	mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2b732ca..d8bb4dc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1993,6 +1993,38 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
 	readl(gtt_base);
 }
 
+void gen8_for_every_pdpe_pde(struct i915_hw_ppgtt *ppgtt,
+			     void (*callback)(struct i915_pagedirpo *pdp,
+					      struct i915_pagedir *pd,
+					      struct i915_pagetab *pt,
+					      unsigned pdpe,
+					      unsigned pde,
+					      void *data),
+			     void *data)
+{
+	uint64_t start = ppgtt->base.start;
+	uint64_t length = ppgtt->base.total;
+	uint64_t pdpe, pde, temp;
+
+	struct i915_pagedir *pd;
+	struct i915_pagetab *pt;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		uint64_t pd_start = start, pd_length = length;
+		int i;
+
+		if (pd == NULL) {
+			for (i = 0; i < I915_PDES_PER_PD; i++)
+				callback(&ppgtt->pdp, NULL, NULL, pdpe, i, data);
+			continue;
+		}
+
+		gen8_for_each_pde(pt, pd, pd_start, pd_length, temp, pde) {
+			callback(&ppgtt->pdp, pd, pt, pdpe, pde, data);
+		}
+	}
+}
+
 static void gen6_ggtt_clear_range(struct i915_address_space *vm,
 				  uint64_t start,
 				  uint64_t length,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index bce4124..b3d0776 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -469,6 +469,15 @@ static inline size_t gen8_pde_count(uint64_t addr, uint64_t length)
 	return i915_pde_count(addr, length, GEN8_PDE_SHIFT);
 }
 
+void gen8_for_every_pdpe_pde(struct i915_hw_ppgtt *ppgtt,
+			     void (*callback)(struct i915_pagedirpo *pdp,
+					      struct i915_pagedir *pd,
+					      struct i915_pagetab *pt,
+					      unsigned pdpe,
+					      unsigned pde,
+					      void *data),
+			     void *data);
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_gem_setup_global_gtt(struct drm_device *dev, unsigned long start,
-- 
1.9.2

* [PATCH 41/56] drm/i915/bdw: Optimize PDP loads
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (39 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 40/56] drm/i915/bdw: Add ppgtt info for dynamic pages Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 42/56] TESTME: Either drop the last patch or fix it Ben Widawsky
                   ` (15 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

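Skip the PDP loads when they are not necessary. The RCS is reloaded by
the hardware context state, so for that ring the PDPs only need to be
written when one of the page directory pointers has changed.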
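
For illustration, the skip condition can be modeled in a small
standalone sketch. The types and names below (pdp_state,
must_load_pdps, note_pdpe_allocated, note_pdps_loaded) are simplified
stand-ins invented for this example, not the driver's structures. The
allocation helper assumes the intent is to request a reload only when a
PDPE goes from unused to used.

```c
#include <stdbool.h>

enum ring_id { RCS, VCS, BCS };	/* stand-in for the driver's ring ids */

/* Hypothetical, reduced model of the per-PPGTT PDP bookkeeping. */
struct pdp_state {
	unsigned long used_pdpes;	/* one bit per PDPE */
	bool needs_reload;		/* set whenever a PDPE changes */
	bool is_aliasing;		/* the aliasing PPGTT is not tracked */
};

/* Returns true when the PDP registers must be written for this switch. */
static bool must_load_pdps(const struct pdp_state *pdp, enum ring_id ring)
{
	if (pdp->is_aliasing)
		return true;	/* reload needs are not tracked, always load */
	if (ring != RCS)
		return true;	/* only the RCS is restored by context state */
	return pdp->needs_reload;
}

/* Mark a PDPE as used; flag a reload only if it was previously unused. */
static void note_pdpe_allocated(struct pdp_state *pdp, unsigned int pdpe)
{
	unsigned long bit = 1UL << pdpe;

	if (!(pdp->used_pdpes & bit)) {
		pdp->used_pdpes |= bit;
		pdp->needs_reload = true;
	}
}

/* Called once the PDPs have actually been written. */
static void note_pdps_loaded(struct pdp_state *pdp)
{
	pdp->needs_reload = false;
}
```

In this model a freshly initialized state would start with needs_reload
set, matching what gen8_ppgtt_init_common does in the patch.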

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 20 ++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h |  1 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d8bb4dc..3ea0c7d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -438,8 +438,20 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 			  struct intel_ring_buffer *ring,
 			  bool synchronous)
 {
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	int i, ret;
 
+	/* The RCS ring gets reloaded by the hardware context state. So we only
+	 * need to actually reload if one of the page directory pointer have
+	 * changed, or it's !RCS
+	 *
+	 * Aliasing PPGTT remains special, as we do not track it's
+	 * reloading needs.
+	 */
+	if (ppgtt != dev_priv->mm.aliasing_ppgtt &&
+	    ring->id == RCS && !ppgtt->pdp.needs_reload)
+		return 0;
+
 	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
 		struct i915_pagedir *pd = ppgtt->pdp.pagedirs[i];
 		dma_addr_t addr = pd ? pd->daddr : ppgtt->scratch_pt->daddr;
@@ -450,6 +462,9 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 			return ret;
 	}
 
+
+	ppgtt->pdp.needs_reload = 0;
+
 	return 0;
 }
 
@@ -651,6 +666,7 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 			free_pd_single(pd, vm->dev);
 			ppgtt->pdp.pagedirs[pdpe] = NULL;
 			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
+			ppgtt->pdp.needs_reload = 1;
 		}
 	}
 }
@@ -901,6 +917,8 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		}
 
 		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+		if (test_and_set_bit(pdpe, ppgtt->pdp.used_pdpes))
+			ppgtt->pdp.needs_reload = 1;
 
 		gen8_map_pagetable_range(pd, start, length, ppgtt->base.dev);
 	}
@@ -937,6 +955,8 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->switch_mm = gen8_mm_switch;
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 
+	ppgtt->pdp.needs_reload = 1;
+
 	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pd))
 		return PTR_ERR(ppgtt->scratch_pd);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index b3d0776..dd561f3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -198,6 +198,7 @@ struct i915_pagedirpo {
 	/* struct page *page; */
 	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
 	struct i915_pagedir *pagedirs[GEN8_LEGACY_PDPES];
+	unsigned needs_reload:1;
 };
 
 struct i915_address_space {
-- 
1.9.2

* [PATCH 42/56] TESTME: Either drop the last patch or fix it.
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (40 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 41/56] drm/i915/bdw: Optimize PDP loads Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 43/56] drm/i915/bdw: Add dynamic page trace events Ben Widawsky
                   ` (14 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky

I was getting inexplicable hangs with the previous patch, even though I
think it should be correct. As the subject says, if this ever gets
merged, it needs to be coordinated with the patch it reverts.

Revert "drm/i915/bdw: Optimize PDP loads"

This reverts commit 64053129b5cbd3a5f87dab27d026c17efbdf0387.
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 20 --------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  1 -
 2 files changed, 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3ea0c7d..d8bb4dc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -438,20 +438,8 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 			  struct intel_ring_buffer *ring,
 			  bool synchronous)
 {
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	int i, ret;
 
-	/* The RCS ring gets reloaded by the hardware context state. So we only
-	 * need to actually reload if one of the page directory pointer have
-	 * changed, or it's !RCS
-	 *
-	 * Aliasing PPGTT remains special, as we do not track it's
-	 * reloading needs.
-	 */
-	if (ppgtt != dev_priv->mm.aliasing_ppgtt &&
-	    ring->id == RCS && !ppgtt->pdp.needs_reload)
-		return 0;
-
 	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
 		struct i915_pagedir *pd = ppgtt->pdp.pagedirs[i];
 		dma_addr_t addr = pd ? pd->daddr : ppgtt->scratch_pt->daddr;
@@ -462,9 +450,6 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 			return ret;
 	}
 
-
-	ppgtt->pdp.needs_reload = 0;
-
 	return 0;
 }
 
@@ -666,7 +651,6 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 			free_pd_single(pd, vm->dev);
 			ppgtt->pdp.pagedirs[pdpe] = NULL;
 			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
-			ppgtt->pdp.needs_reload = 1;
 		}
 	}
 }
@@ -917,8 +901,6 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		}
 
 		set_bit(pdpe, ppgtt->pdp.used_pdpes);
-		if (test_and_set_bit(pdpe, ppgtt->pdp.used_pdpes))
-			ppgtt->pdp.needs_reload = 1;
 
 		gen8_map_pagetable_range(pd, start, length, ppgtt->base.dev);
 	}
@@ -955,8 +937,6 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->switch_mm = gen8_mm_switch;
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 
-	ppgtt->pdp.needs_reload = 1;
-
 	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pd))
 		return PTR_ERR(ppgtt->scratch_pd);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index dd561f3..b3d0776 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -198,7 +198,6 @@ struct i915_pagedirpo {
 	/* struct page *page; */
 	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
 	struct i915_pagedir *pagedirs[GEN8_LEGACY_PDPES];
-	unsigned needs_reload:1;
 };
 
 struct i915_address_space {
-- 
1.9.2

* [PATCH 43/56] drm/i915/bdw: Add dynamic page trace events
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (41 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 42/56] TESTME: Either drop the last patch or fix it Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 44/56] drm/i915/bdw: Make pdp allocation more dynamic Ben Widawsky
                   ` (13 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This works the same as GEN6.

I was disappointed that I now need to pass the vm around, but it's not
much uglier than passing the drm_device, and having the vm in trace
events is hugely important.

v2: Consolidate pagetable/pagedirectory events
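
As a rough illustration of the address ranges these events print, the
sketch below derives the first and last address covered by the entry
that contains a given address. It assumes the range is the naturally
aligned region spanned by one entry, i.e. 1 << shift bytes. The helper
names are invented for this example and are not part of the driver; a
shift of 21 corresponds to GEN8_PDE_SHIFT and 30 to GEN8_PDPE_SHIFT.

```c
#include <stdint.h>

/* Hypothetical helper: first address of the region covered by the entry
 * that contains addr, where each entry spans 1 << shift bytes. */
static uint64_t entry_range_start(uint64_t addr, unsigned int shift)
{
	return addr & ~((UINT64_C(1) << shift) - 1);
}

/* Hypothetical helper: last address (inclusive) of that same region. */
static uint64_t entry_range_end(uint64_t addr, unsigned int shift)
{
	return entry_range_start(addr, shift) + (UINT64_C(1) << shift) - 1;
}
```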

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 41 ++++++++++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_trace.h   | 16 +++++++++++++++
 2 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d8bb4dc..4d01d4e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -558,19 +558,24 @@ static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
 /* It's likely we'll map more than one pagetable at a time. This function will
  * save us unnecessary kmap calls, but do no more functionally than multiple
  * calls to map_pt. */
-static void gen8_map_pagetable_range(struct i915_pagedir *pd,
+static void gen8_map_pagetable_range(struct i915_address_space *vm,
+				     struct i915_pagedir *pd,
 				     uint64_t start,
-				     uint64_t length,
-				     struct drm_device *dev)
+				     uint64_t length)
 {
 	gen8_ppgtt_pde_t *pagedir = kmap_atomic(pd->page);
 	struct i915_pagetab *pt;
 	uint64_t temp, pde;
 
-	gen8_for_each_pde(pt, pd, start, length, temp, pde)
-		__gen8_do_map_pt(pagedir + pde, pt, dev);
+	gen8_for_each_pde(pt, pd, start, length, temp, pde) {
+		__gen8_do_map_pt(pagedir + pde, pt, vm->dev);
+		trace_i915_pagetable_map(vm, pde, pt,
+					 gen8_pte_index(start),
+					 gen8_pte_count(start, length),
+					 GEN8_PTES_PER_PT);
+	}
 
-	if (!HAS_LLC(dev))
+	if (!HAS_LLC(vm->dev))
 		drm_clflush_virt_range(pagedir, PAGE_SIZE);
 
 	kunmap_atomic(pagedir);
@@ -634,11 +639,20 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 				     "PDE %d not reserved, but is allocated (%p)",
 				     pde, vm);
 
+			trace_i915_pagetable_unmap(vm, pde, pt,
+						   gen8_pte_index(pd_start),
+						   gen8_pte_count(pd_start, pd_len),
+						   GEN8_PTES_PER_PT);
+
 			bitmap_clear(pt->used_ptes,
 				     gen8_pte_index(pd_start),
 				     gen8_pte_count(pd_start, pd_len));
 
 			if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PT)) {
+				trace_i915_pagetable_destroy(vm,
+							     pde,
+							     pd_start & GENMASK_ULL(64, GEN8_PDE_SHIFT),
+							     GEN8_PDE_SHIFT);
 				free_pt_single(pt, vm->dev);
 				/* This may be nixed later. Optimize? */
 				gen8_unmap_pagetable(ppgtt, pd, pde);
@@ -650,6 +664,9 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 		if (bitmap_empty(pd->used_pdes, I915_PDES_PER_PD)) {
 			free_pd_single(pd, vm->dev);
 			ppgtt->pdp.pagedirs[pdpe] = NULL;
+			trace_i915_pagedirectory_destroy(vm, pdpe,
+							 start & GENMASK_ULL(64, GEN8_PDPE_SHIFT),
+							 GEN8_PDPE_SHIFT);
 			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
 		}
 	}
@@ -698,6 +715,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
 				     uint64_t length,
 				     unsigned long *new_pts)
 {
+	struct drm_device *dev = ppgtt->base.dev;
 	struct i915_pagetab *unused;
 	uint64_t temp;
 	uint32_t pde;
@@ -706,19 +724,20 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
 		if (unused)
 			continue;
 
-		pd->page_tables[pde] = alloc_pt_single(ppgtt->base.dev);
+		pd->page_tables[pde] = alloc_pt_single(dev);
 
 		if (IS_ERR(pd->page_tables[pde]))
 			goto unwind_out;
 
 		set_bit(pde, new_pts);
+		trace_i915_pagetable_alloc(&ppgtt->base, pde, start, GEN8_PDE_SHIFT);
 	}
 
 	return 0;
 
 unwind_out:
 	for_each_set_bit(pde, new_pts, I915_PDES_PER_PD)
-		free_pt_single(pd->page_tables[pde], ppgtt->base.dev);
+		free_pt_single(pd->page_tables[pde], dev);
 
 	return -ENOMEM;
 }
@@ -772,6 +791,8 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_hw_ppgtt *ppgtt,
 
 		pdp->pagedirs[pdpe] = pd;
 		set_bit(pdpe, new_pds);
+		trace_i915_pagedirectory_alloc(&ppgtt->base, pdpe, start,
+					       GEN8_PDPE_SHIFT);
 	}
 
 	return 0;
@@ -902,7 +923,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 
 		set_bit(pdpe, ppgtt->pdp.used_pdpes);
 
-		gen8_map_pagetable_range(pd, start, length, ppgtt->base.dev);
+		gen8_map_pagetable_range(vm, pd, start, length);
 	}
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
@@ -964,7 +985,7 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	}
 
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
-		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
+		gen8_map_pagetable_range(&ppgtt->base, pd, start, size);
 
 	ppgtt->base.allocate_va_range = NULL;
 	ppgtt->base.teardown_va_range = NULL;
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 99a436d..49f2389 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -217,6 +217,22 @@ DEFINE_EVENT(i915_pagetable, i915_pagetable_destroy,
 	     TP_ARGS(vm, pde, start, pde_shift)
 );
 
+DEFINE_EVENT_PRINT(i915_pagetable, i915_pagedirectory_alloc,
+		   TP_PROTO(struct i915_address_space *vm, u32 pdpe, u64 start, u64 pdpe_shift),
+		   TP_ARGS(vm, pdpe, start, pdpe_shift),
+
+		   TP_printk("vm=%p, pdpe=%d (0x%llx-0x%llx)",
+			     __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT_PRINT(i915_pagetable, i915_pagedirectory_destroy,
+		   TP_PROTO(struct i915_address_space *vm, u32 pdpe, u64 start, u64 pdpe_shift),
+		   TP_ARGS(vm, pdpe, start, pdpe_shift),
+
+		   TP_printk("vm=%p, pdpe=%d (0x%llx-0x%llx)",
+			     __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
 /* Avoid extra math because we only support two sizes. The format is defined by
  * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
 #define TRACE_PT_SIZE(bits) \
-- 
1.9.2

* [PATCH 44/56] drm/i915/bdw: Make pdp allocation more dynamic
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (42 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 43/56] drm/i915/bdw: Add dynamic page trace events Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 45/56] drm/i915/bdw: Abstract PDP usage Ben Widawsky
                   ` (12 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This transitional patch doesn't do much for the existing code. However,
it should make the upcoming patches that use the full 48b address space
a bit easier to swallow. The patch also introduces the PML4, i.e. the
new top level structure of the page tables.
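
To make the new level concrete, here is a minimal sketch of how a 48b
address splits into per-level indices. The shifts 39, 30 and 21 and the
0x1ff mask are the values this patch uses; the PTE shift of 12 assumes
4 KiB pages, and the struct and function names are invented for this
example only.

```c
#include <stdint.h>

#define PML4E_SHIFT	39
#define PDPE_SHIFT	30
#define PDE_SHIFT	21
#define PTE_SHIFT	12	/* assumption: 4 KiB pages */
#define INDEX_MASK	0x1ff	/* 512 entries per level */

/* Hypothetical container for the four table indices of one address. */
struct va_indices {
	unsigned int pml4e;
	unsigned int pdpe;
	unsigned int pde;
	unsigned int pte;
};

/* Split a graphics virtual address into its per-level indices. */
static struct va_indices split_va(uint64_t addr)
{
	struct va_indices idx;

	idx.pml4e = (addr >> PML4E_SHIFT) & INDEX_MASK;
	idx.pdpe = (addr >> PDPE_SHIFT) & INDEX_MASK;
	idx.pde = (addr >> PDE_SHIFT) & INDEX_MASK;
	idx.pte = (addr >> PTE_SHIFT) & INDEX_MASK;
	return idx;
}
```

For a legacy 32b address the pml4e index is always 0 and the pdpe index
stays in the range 0-3, which is why the wider mask does no harm there.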

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h     |   5 ++
 drivers/gpu/drm/i915/i915_gem_gtt.c | 122 +++++++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  40 +++++++++---
 drivers/gpu/drm/i915/i915_trace.h   |  16 +++++
 4 files changed, 151 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 29bf034..4d53728 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1920,6 +1920,11 @@ struct drm_i915_cmd_table {
 #define HAS_PPGTT(dev)		(INTEL_INFO(dev)->gen >= 7 && !IS_VALLEYVIEW(dev))
 #define USES_PPGTT(dev)		intel_enable_ppgtt(dev, false)
 #define USES_FULL_PPGTT(dev)	intel_enable_ppgtt(dev, true)
+#ifdef CONFIG_32BIT
+# define HAS_48B_PPGTT(dev)	false
+#else
+# define HAS_48B_PPGTT(dev)	(IS_BROADWELL(dev) && false)
+#endif
 
 #define HAS_OVERLAY(dev)		(INTEL_INFO(dev)->has_overlay)
 #define OVERLAY_NEEDS_PHYSICAL(dev)	(INTEL_INFO(dev)->overlay_needs_physical)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 4d01d4e..df3cd41 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -402,6 +402,45 @@ free_pd:
 	return ERR_PTR(ret);
 }
 
+static void __pdp_fini(struct i915_pagedirpo *pdp)
+{
+	kfree(pdp->used_pdpes);
+	kfree(pdp->pagedirs);
+	/* HACK */
+	pdp->pagedirs = NULL;
+}
+
+static void free_pdp_single(struct i915_pagedirpo *pdp,
+			    struct drm_device *dev)
+{
+	__pdp_fini(pdp);
+	if (HAS_48B_PPGTT(dev))
+		kfree(pdp);
+}
+
+static int __pdp_init(struct i915_pagedirpo *pdp,
+		      struct drm_device *dev)
+{
+	size_t pdpes = I915_PDPES_PER_PDP(dev);
+
+	pdp->used_pdpes = kcalloc(BITS_TO_LONGS(pdpes),
+				  sizeof(unsigned long),
+				  GFP_KERNEL);
+	if (!pdp->used_pdpes)
+		return -ENOMEM;
+
+	pdp->pagedirs = kcalloc(pdpes, sizeof(*pdp->pagedirs), GFP_KERNEL);
+	if (!pdp->pagedirs) {
+		kfree(pdp->used_pdpes);
+		/* the PDP might be the statically allocated top level. Keep it
+		 * as clean as possible */
+		pdp->used_pdpes = NULL;
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_ring_buffer *ring,
 			  unsigned entry,
@@ -440,7 +479,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 {
 	int i, ret;
 
-	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+	for (i = 3; i >= 0; i--) {
 		struct i915_pagedir *pd = ppgtt->pdp.pagedirs[i];
 		dma_addr_t addr = pd ? pd->daddr : ppgtt->scratch_pt->daddr;
 		/* The page directory might be NULL, but we need to clear out
@@ -514,9 +553,6 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
-		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
-			break;
-
 		if (pt_vaddr == NULL) {
 			struct i915_pagedir *pd = ppgtt->pdp.pagedirs[pdpe];
 			struct i915_pagetab *pt = pd->page_tables[pde];
@@ -605,10 +641,16 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		        container_of(vm, struct i915_hw_ppgtt, base);
+	struct drm_device *dev = vm->dev;
 	struct i915_pagedir *pd;
 	struct i915_pagetab *pt;
 	uint64_t temp;
-	uint32_t pdpe, pde;
+	uint32_t pdpe, pde, orig_start = start;
+
+	if (!ppgtt->pdp.pagedirs) {
+		/* If pagedirs are already free, there is nothing to do.*/
+		return;
+	}
 
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
 		uint64_t pd_len = gen8_clamp_pd(start, length);
@@ -653,7 +695,7 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 							     pde,
 							     pd_start & GENMASK_ULL(64, GEN8_PDE_SHIFT),
 							     GEN8_PDE_SHIFT);
-				free_pt_single(pt, vm->dev);
+				free_pt_single(pt, dev);
 				/* This may be nixed later. Optimize? */
 				gen8_unmap_pagetable(ppgtt, pd, pde);
 			} else {
@@ -662,7 +704,7 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 		}
 
 		if (bitmap_empty(pd->used_pdes, I915_PDES_PER_PD)) {
-			free_pd_single(pd, vm->dev);
+			free_pd_single(pd, dev);
 			ppgtt->pdp.pagedirs[pdpe] = NULL;
 			trace_i915_pagedirectory_destroy(vm, pdpe,
 							 start & GENMASK_ULL(64, GEN8_PDPE_SHIFT),
@@ -670,6 +712,14 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
 		}
 	}
+
+	if (bitmap_empty(ppgtt->pdp.used_pdpes, I915_PDPES_PER_PDP(dev))) {
+		/* TODO: When pagetables are fully dynamic:
+		free_pdp_single(&ppgtt->pdp, dev); */
+		trace_i915_pagedirpo_destroy(vm, 0,
+					     orig_start & GENMASK_ULL(64, GEN8_PML4E_SHIFT),
+					     GEN8_PML4E_SHIFT);
+	}
 }
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
@@ -678,6 +728,10 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 			       ppgtt->base.start, ppgtt->base.total);
 	gen8_teardown_va_range(&ppgtt->base,
 			       ppgtt->base.start, ppgtt->base.total);
+
+	WARN_ON(!bitmap_empty(ppgtt->pdp.used_pdpes,
+			      I915_PDPES_PER_PDP(ppgtt->base.dev)));
+	free_pdp_single(&ppgtt->pdp, ppgtt->base.dev);
 }
 
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
@@ -771,11 +825,13 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_hw_ppgtt *ppgtt,
 				     uint64_t length,
 				     unsigned long *new_pds)
 {
+	struct drm_device *dev = ppgtt->base.dev;
 	struct i915_pagedir *unused;
 	uint64_t temp;
 	uint32_t pdpe;
+	size_t pdpes =  I915_PDPES_PER_PDP(ppgtt->base.dev);
 
-	BUG_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+	BUG_ON(!bitmap_empty(new_pds, pdpes));
 
 	/* FIXME: PPGTT container_of won't work for 64b */
 	BUG_ON((start + length) > 0x800000000ULL);
@@ -798,17 +854,18 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 
 unwind_out:
-	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
+	for_each_set_bit(pdpe, new_pds, pdpes)
 		free_pd_single(pdp->pagedirs[pdpe], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
 
 void free_gen8_temp_bitmaps(unsigned long *new_pds,
-			    unsigned long **new_pts)
+			    unsigned long **new_pts,
+			    size_t pdpes)
 {
 	int i;
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+	for (i = 0; i < pdpes; i++)
 		kfree(new_pts[i]);
 	kfree(new_pts);
 	kfree(new_pds);
@@ -818,10 +875,11 @@ void free_gen8_temp_bitmaps(unsigned long *new_pds,
  * of these are based on the number of PDPEs in the system.
  */
 int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
-					 unsigned long ***new_pts)
+					 unsigned long ***new_pts,
+					 size_t pdpes)
 {
 	int i;
-	unsigned long *pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES),
+	unsigned long *pds = kcalloc(BITS_TO_LONGS(pdpes),
 				     sizeof(unsigned long),
 				     GFP_KERNEL);
 
@@ -832,7 +890,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
 	if (!pts || !pds)
 		goto err_out;
 
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+	for (i = 0; i < pdpes; i++) {
 		pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES_PER_PD),
 				 sizeof(unsigned long), GFP_KERNEL);
 		if (!pts[i])
@@ -845,7 +903,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
 	return 0;
 
 err_out:
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+	for (i = 0; i < pdpes; i++)
 		kfree(pts[i]);
 	kfree(pds);
 	kfree(pts);
@@ -859,11 +917,13 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	unsigned long *new_page_dirs, **new_page_tables;
+	struct drm_device *dev = vm->dev;
 	struct i915_pagedir *pd;
 	const uint64_t orig_start = start;
 	const uint64_t orig_length = length;
 	uint64_t temp;
 	uint32_t pdpe;
+	size_t pdpes = I915_PDPES_PER_PDP(dev);
 	int ret;
 
 #ifdef CONFIG_32BIT
@@ -880,7 +940,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	if (WARN_ON(start + length < start))
 		return -ERANGE;
 
-	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
+	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables, pdpes);
 	if (ret)
 		return ret;
 
@@ -888,7 +948,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	ret = gen8_ppgtt_alloc_pagedirs(ppgtt, &ppgtt->pdp, start, length,
 					new_page_dirs);
 	if (ret) {
-		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 		return ret;
 	}
 
@@ -926,7 +986,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		gen8_map_pagetable_range(vm, pd, start, length);
 	}
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 	return 0;
 
 err_out:
@@ -935,13 +995,19 @@ err_out:
 			free_pt_single(pd->page_tables[temp], ppgtt->base.dev);
 	}
 
-	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
+	for_each_set_bit(pdpe, new_page_dirs, pdpes)
 		free_pd_single(ppgtt->pdp.pagedirs[pdpe], ppgtt->base.dev);
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 	return ret;
 }
 
+static void gen8_ppgtt_fini_common(struct i915_hw_ppgtt *ppgtt)
+{
+	free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
+	free_pdp_single(&ppgtt->pdp, ppgtt->base.dev);
+}
+
 /**
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -955,13 +1021,24 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.total = size;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 	ppgtt->enable = gen8_ppgtt_enable;
-	ppgtt->switch_mm = gen8_mm_switch;
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 
 	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pd))
 		return PTR_ERR(ppgtt->scratch_pd);
 
+	if (!HAS_48B_PPGTT(ppgtt->base.dev)) {
+		int ret = __pdp_init(&ppgtt->pdp, false);
+		if (ret) {
+			free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
+			return ret;
+		}
+
+		ppgtt->switch_mm = gen8_mm_switch;
+		trace_i915_pagedirpo_alloc(&ppgtt->base, 0, 0, GEN8_PML4E_SHIFT);
+	} else
+		BUG(); /* Not yet implemented */
+
 	return 0;
 }
 
@@ -980,7 +1057,7 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 
 	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
-		free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
+		gen8_ppgtt_fini_common(ppgtt);
 		return ret;
 	}
 
@@ -2023,6 +2100,7 @@ void gen8_for_every_pdpe_pde(struct i915_hw_ppgtt *ppgtt,
 					      void *data),
 			     void *data)
 {
+	struct drm_device *dev = ppgtt->base.dev;
 	uint64_t start = ppgtt->base.start;
 	uint64_t length = ppgtt->base.total;
 	uint64_t pdpe, pde, temp;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index b3d0776..94c825e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -88,7 +88,6 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define PPAT_CACHED_INDEX		_PAGE_PAT /* WB LLCeLLC */
 #define PPAT_DISPLAY_ELLC_INDEX		_PAGE_PCD /* WT eLLC */
 
-#define GEN8_LEGACY_PDPES		4
 #define GEN8_PTES_PER_PT		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
 
 /* GEN8 legacy style address is defined as a 3 level page table:
@@ -97,8 +96,17 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
  * The difference as compared to normal x86 3 level page table is the PDPEs are
  * programmed via register.
  */
+#ifndef CONFIG_32BIT
+# define I915_PDPES_PER_PDP(dev) (HAS_48B_PPGTT(dev) ? 512 : 4)
+#else
+# define I915_PDPES_PER_PDP		4
+#endif
+#define GEN8_PML4ES_PER_PML4		512
+#define GEN8_PML4E_SHIFT		39
 #define GEN8_PDPE_SHIFT			30
-#define GEN8_PDPE_MASK			0x3
+/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
+ * tables */
+#define GEN8_PDPE_MASK			0x1ff
 #define GEN8_PDE_SHIFT			21
 
 #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
@@ -195,9 +203,17 @@ struct i915_pagedir {
 };
 
 struct i915_pagedirpo {
-	/* struct page *page; */
-	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
-	struct i915_pagedir *pagedirs[GEN8_LEGACY_PDPES];
+	struct page *page;
+	dma_addr_t daddr;
+	unsigned long *used_pdpes;
+	struct i915_pagedir **pagedirs;
+};
+
+struct i915_pml4 {
+	struct page *page;
+	dma_addr_t daddr;
+	DECLARE_BITMAP(used_pml4es, GEN8_PML4ES_PER_PML4);
+	struct i915_pagedirpo *pdps[GEN8_PML4ES_PER_PML4];
 };
 
 struct i915_address_space {
@@ -263,8 +279,9 @@ struct i915_hw_ppgtt {
 	struct kref ref;
 	struct drm_mm_node node;
 	union {
-		struct i915_pagedirpo pdp;
-		struct i915_pagedir pd;
+		struct i915_pml4 pml4;		/* GEN8+ & 64b PPGTT */
+		struct i915_pagedirpo pdp;	/* GEN8+ */
+		struct i915_pagedir pd;		/* GEN6-7 */
 	};
 
 	union {
@@ -411,14 +428,17 @@ static inline size_t gen6_pde_count(uint32_t addr, uint32_t length)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
-#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
-	for (iter = gen8_pdpe_index(start), pd = (pdp)->pagedirs[iter];	\
-	     length > 0 && iter < GEN8_LEGACY_PDPES;			\
+#define gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, b)	\
+	for (iter = gen8_pdpe_index(start), pd = (pdp)->pagedirs[iter]; \
+	     length > 0 && (iter < b);					\
 	     pd = (pdp)->pagedirs[++iter],				\
 	     temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start,	\
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
+	gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
+
 /* Clamp length to the next pagetab boundary */
 static inline uint64_t gen8_clamp_pt(uint64_t start, uint64_t length)
 {
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 49f2389..17b8059 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -233,6 +233,22 @@ DEFINE_EVENT_PRINT(i915_pagetable, i915_pagedirectory_destroy,
 			     __entry->vm, __entry->pde, __entry->start, __entry->end)
 );
 
+DEFINE_EVENT_PRINT(i915_pagetable, i915_pagedirpo_alloc,
+		   TP_PROTO(struct i915_address_space *vm, u32 pml4e, u64 start, u64 pml4e_shift),
+		   TP_ARGS(vm, pml4e, start, pml4e_shift),
+
+		   TP_printk("vm=%p, pml4e=%d (0x%llx-0x%llx)",
+			     __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT_PRINT(i915_pagetable, i915_pagedirpo_destroy,
+		   TP_PROTO(struct i915_address_space *vm, u32 pml4e, u64 start, u64 pml4e_shift),
+		   TP_ARGS(vm, pml4e, start, pml4e_shift),
+
+		   TP_printk("vm=%p, pml4e=%d (0x%llx-0x%llx)",
+			     __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
 /* Avoid extra math because we only support two sizes. The format is defined by
  * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
 #define TRACE_PT_SIZE(bits) \
-- 
1.9.2

* [PATCH 45/56] drm/i915/bdw: Abstract PDP usage
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (43 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 44/56] drm/i915/bdw: Make pdp allocation more dynamic Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 46/56] drm/i915/bdw: implement alloc/teardown for 4lvl Ben Widawsky
                   ` (11 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Up until now, ppgtt->pdp has always been the root of our page tables.
Legacy 32b addressing acts as if it has 1 PDP with 4 PDPEs.

In preparation for 4 level page tables, we need to stop using ppgtt->pdp
directly unless we know it's what we want. The future structure will use
ppgtt->pml4 for the top level, and the pdp is just one of the entries
being pointed to by a pml4e.

This patch cleans up careless assumptions about the root page tables
that crept in throughout development.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 129 ++++++++++++++++++++++++------------
 1 file changed, 85 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index df3cd41..c4b53ef 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -499,6 +499,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_pagedirpo *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
 	unsigned pdpe = gen8_pdpe_index(start);
 	unsigned pde = gen8_pde_index(start);
@@ -510,7 +511,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct i915_pagedir *pd = ppgtt->pdp.pagedirs[pdpe];
+		struct i915_pagedir *pd = pdp->pagedirs[pdpe];
 		struct i915_pagetab *pt = pd->page_tables[pde];
 		struct page *page_table = pt->page;
 
@@ -544,6 +545,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_pagedirpo *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	gen8_gtt_pte_t *pt_vaddr;
 	unsigned pdpe = gen8_pdpe_index(start);
 	unsigned pde = gen8_pde_index(start);
@@ -554,7 +556,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL) {
-			struct i915_pagedir *pd = ppgtt->pdp.pagedirs[pdpe];
+			struct i915_pagedir *pd = pdp->pagedirs[pdpe];
 			struct i915_pagetab *pt = pd->page_tables[pde];
 			struct page *page_table = pt->page;
 			pt_vaddr = kmap_atomic(page_table);
@@ -636,23 +638,22 @@ static void gen8_unmap_pagetable(struct i915_hw_ppgtt *ppgtt,
 	gen8_map_pagedir(pd, ppgtt->scratch_pt, pde, ppgtt->base.dev);
 }
 
-static void gen8_teardown_va_range(struct i915_address_space *vm,
-				   uint64_t start, uint64_t length)
+static void gen8_teardown_va_range_3lvl(struct i915_address_space *vm,
+					struct i915_pagedirpo *pdp,
+					uint64_t start, uint64_t length)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		        container_of(vm, struct i915_hw_ppgtt, base);
 	struct drm_device *dev = vm->dev;
 	struct i915_pagedir *pd;
 	struct i915_pagetab *pt;
 	uint64_t temp;
 	uint32_t pdpe, pde, orig_start = start;
 
-	if (!ppgtt->pdp.pagedirs) {
+	if (!pdp || !pdp->pagedirs) {
 		/* If pagedirs are already free, there is nothing to do.*/
 		return;
 	}
 
-	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
 
@@ -660,12 +661,12 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 		 * down, and up.
 		 */
 		if (!pd) {
-			WARN(test_bit(pdpe, ppgtt->pdp.used_pdpes),
+			WARN(test_bit(pdpe, pdp->used_pdpes),
 			     "PDPE %d is not allocated, but is reserved (%p)\n",
 			     pdpe, vm);
 			continue;
 		} else {
-			WARN(!test_bit(pdpe, ppgtt->pdp.used_pdpes),
+			WARN(!test_bit(pdpe, pdp->used_pdpes),
 			     "PDPE %d not reserved, but is allocated (%p)",
 			     pdpe, vm);
 		}
@@ -691,6 +692,8 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 				     gen8_pte_count(pd_start, pd_len));
 
 			if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PT)) {
+				struct i915_hw_ppgtt *ppgtt =
+					container_of(vm, struct i915_hw_ppgtt, base);
 				trace_i915_pagetable_destroy(vm,
 							     pde,
 							     pd_start & GENMASK_ULL(64, GEN8_PDE_SHIFT),
@@ -705,23 +708,42 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 
 		if (bitmap_empty(pd->used_pdes, I915_PDES_PER_PD)) {
 			free_pd_single(pd, dev);
-			ppgtt->pdp.pagedirs[pdpe] = NULL;
+			pdp->pagedirs[pdpe] = NULL;
 			trace_i915_pagedirectory_destroy(vm, pdpe,
 							 start & GENMASK_ULL(64, GEN8_PDPE_SHIFT),
 							 GEN8_PDPE_SHIFT);
-			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
+			WARN_ON(!test_and_clear_bit(pdpe, pdp->used_pdpes));
 		}
 	}
 
-	if (bitmap_empty(ppgtt->pdp.used_pdpes, I915_PDPES_PER_PDP(dev))) {
+	if (bitmap_empty(pdp->used_pdpes, I915_PDPES_PER_PDP(dev))) {
 		/* TODO: When pagetables are fully dynamic:
-		free_pdp_single(&ppgtt->pdp, dev); */
+		free_pdp_single(pdp, dev); */
 		trace_i915_pagedirpo_destroy(vm, 0,
 					     orig_start & GENMASK_ULL(64, GEN8_PML4E_SHIFT),
 					     GEN8_PML4E_SHIFT);
 	}
 }
 
+static void gen8_teardown_va_range_4lvl(struct i915_address_space *vm,
+					struct i915_pml4 *pml4,
+					uint64_t start, uint64_t length)
+{
+	BUG();
+}
+
+static void gen8_teardown_va_range(struct i915_address_space *vm,
+				   uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+
+	if (!HAS_48B_PPGTT(vm->dev))
+		gen8_teardown_va_range_3lvl(vm, &ppgtt->pdp, start, length);
+	else
+		gen8_teardown_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+}
+
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	trace_i915_va_teardown(&ppgtt->base,
@@ -747,7 +769,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 
 /**
  * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
- * @ppgtt:	Master ppgtt structure.
+ * @vm:		Master vm structure.
  * @pd:		Page directory for this address range.
  * @start:	Starting virtual address to begin allocations.
  * @length	Size of the allocations.
@@ -763,13 +785,13 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
  *
  * Return: 0 if success; negative error code otherwise.
  */
-static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
 				     struct i915_pagedir *pd,
 				     uint64_t start,
 				     uint64_t length,
 				     unsigned long *new_pts)
 {
-	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_device *dev = vm->dev;
 	struct i915_pagetab *unused;
 	uint64_t temp;
 	uint32_t pde;
@@ -784,7 +806,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
 			goto unwind_out;
 
 		set_bit(pde, new_pts);
-		trace_i915_pagetable_alloc(&ppgtt->base, pde, start, GEN8_PDE_SHIFT);
+		trace_i915_pagetable_alloc(vm, pde, start, GEN8_PDE_SHIFT);
 	}
 
 	return 0;
@@ -798,7 +820,7 @@ unwind_out:
 
 /**
  * gen8_ppgtt_alloc_pagedirs() - Allocate page directories for VA range.
- * @ppgtt:	Master ppgtt structure.
+ * @vm:		Master vm structure.
  * @pdp:	Page directory pointer for this address range.
  * @start:	Starting virtual address to begin allocations.
  * @length	Size of the allocations.
@@ -819,17 +841,17 @@ unwind_out:
  *
  * Return: 0 if success; negative error code otherwise.
  */
-static int gen8_ppgtt_alloc_pagedirs(struct i915_hw_ppgtt *ppgtt,
+static int gen8_ppgtt_alloc_pagedirs(struct i915_address_space *vm,
 				     struct i915_pagedirpo *pdp,
 				     uint64_t start,
 				     uint64_t length,
 				     unsigned long *new_pds)
 {
-	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_device *dev = vm->dev;
 	struct i915_pagedir *unused;
 	uint64_t temp;
 	uint32_t pdpe;
-	size_t pdpes =  I915_PDPES_PER_PDP(ppgtt->base.dev);
+	size_t pdpes =  I915_PDPES_PER_PDP(vm->dev);
 
 	BUG_ON(!bitmap_empty(new_pds, pdpes));
 
@@ -841,13 +863,13 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_hw_ppgtt *ppgtt,
 		if (unused)
 			continue;
 
-		pd = alloc_pd_single(ppgtt->base.dev);
+		pd = alloc_pd_single(dev);
 		if (IS_ERR(pd))
 			goto unwind_out;
 
 		pdp->pagedirs[pdpe] = pd;
 		set_bit(pdpe, new_pds);
-		trace_i915_pagedirectory_alloc(&ppgtt->base, pdpe, start,
+		trace_i915_pagedirectory_alloc(vm, pdpe, start,
 					       GEN8_PDPE_SHIFT);
 	}
 
@@ -855,7 +877,7 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_hw_ppgtt *ppgtt,
 
 unwind_out:
 	for_each_set_bit(pdpe, new_pds, pdpes)
-		free_pd_single(pdp->pagedirs[pdpe], ppgtt->base.dev);
+		free_pd_single(pdp->pagedirs[pdpe], dev);
 
 	return -ENOMEM;
 }
@@ -910,12 +932,11 @@ err_out:
 	return -ENOMEM;
 }
 
-static int gen8_alloc_va_range(struct i915_address_space *vm,
-			       uint64_t start,
-			       uint64_t length)
+static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
+				    struct i915_pagedirpo *pdp,
+				    uint64_t start,
+				    uint64_t length)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
 	unsigned long *new_page_dirs, **new_page_tables;
 	struct drm_device *dev = vm->dev;
 	struct i915_pagedir *pd;
@@ -945,17 +966,15 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		return ret;
 
 	/* Do the allocations first so we can easily bail out */
-	ret = gen8_ppgtt_alloc_pagedirs(ppgtt, &ppgtt->pdp, start, length,
-					new_page_dirs);
+	ret = gen8_ppgtt_alloc_pagedirs(vm, pdp, start, length, new_page_dirs);
 	if (ret) {
 		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 		return ret;
 	}
 
-	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		bitmap_zero(new_page_tables[pdpe], I915_PDES_PER_PD);
-		ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
-						new_page_tables[pdpe]);
+		ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length, new_page_tables[pdpe]);
 		if (ret)
 			goto err_out;
 	}
@@ -963,7 +982,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	start = orig_start;
 	length = orig_length;
 
-	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		struct i915_pagetab *pt;
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
@@ -981,7 +1000,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 			set_bit(pde, pd->used_pdes);
 		}
 
-		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+		set_bit(pdpe, pdp->used_pdpes);
 
 		gen8_map_pagetable_range(vm, pd, start, length);
 	}
@@ -992,16 +1011,36 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 err_out:
 	while (pdpe--) {
 		for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES_PER_PD)
-			free_pt_single(pd->page_tables[temp], ppgtt->base.dev);
+			free_pt_single(pd->page_tables[temp], dev);
 	}
 
 	for_each_set_bit(pdpe, new_page_dirs, pdpes)
-		free_pd_single(ppgtt->pdp.pagedirs[pdpe], ppgtt->base.dev);
+		free_pd_single(pdp->pagedirs[pdpe], dev);
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 	return ret;
 }
 
+static int __noreturn gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
+					       struct i915_pml4 *pml4,
+					       uint64_t start,
+					       uint64_t length)
+{
+	BUG();
+}
+
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+
+	if (!HAS_48B_PPGTT(vm->dev))
+		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
+	else
+		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+}
+
 static void gen8_ppgtt_fini_common(struct i915_hw_ppgtt *ppgtt)
 {
 	free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
@@ -1046,12 +1085,13 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_pagedirpo *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	struct i915_pagedir *pd;
 	uint64_t temp, start = 0, size = dev_priv->gtt.base.total;
 	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	ret = gen8_ppgtt_init_common(ppgtt, size);
 	if (ret)
 		return ret;
 
@@ -1061,7 +1101,7 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 	}
 
-	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
+	gen8_for_each_pdpe(pd, pdp, start, size, temp, pdpe)
 		gen8_map_pagetable_range(&ppgtt->base, pd, start, size);
 
 	ppgtt->base.allocate_va_range = NULL;
@@ -2101,6 +2141,7 @@ void gen8_for_every_pdpe_pde(struct i915_hw_ppgtt *ppgtt,
 			     void *data)
 {
 	struct drm_device *dev = ppgtt->base.dev;
+	struct i915_pagedirpo *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	uint64_t start = ppgtt->base.start;
 	uint64_t length = ppgtt->base.total;
 	uint64_t pdpe, pde, temp;
@@ -2108,18 +2149,18 @@ void gen8_for_every_pdpe_pde(struct i915_hw_ppgtt *ppgtt,
 	struct i915_pagedir *pd;
 	struct i915_pagetab *pt;
 
-	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		uint64_t pd_start = start, pd_length = length;
 		int i;
 
 		if (pd == NULL) {
 			for (i = 0; i < I915_PDES_PER_PD; i++)
-				callback(&ppgtt->pdp, NULL, NULL, pdpe, i, data);
+				callback(pdp, NULL, NULL, pdpe, i, data);
 			continue;
 		}
 
 		gen8_for_each_pde(pt, pd, pd_start, pd_length, temp, pde) {
-			callback(&ppgtt->pdp, pd, pt, pdpe, pde, data);
+			callback(pdp, pd, pt, pdpe, pde, data);
 		}
 	}
 }
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 46/56] drm/i915/bdw: implement alloc/teardown for 4lvl
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (44 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 45/56] drm/i915/bdw: Abstract PDP usage Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 47/56] drm/i915/bdw: 4 level pages tables Ben Widawsky
                   ` (10 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

The 4lvl code works just as one would expect and, nicely, it is able to
call into the existing 3lvl page table code to handle all of the lower
levels.

PML4 has no special attributes.
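
A rough sketch of how a VA range can be carved up per PML4 entry before
each piece is handed to the 3lvl code. The shift (39) and entry count
(512) match GEN8_PML4E_SHIFT and GEN8_PML4ES_PER_PML4; the chunk struct
and split_by_pml4e() are hypothetical helpers for illustration and are
not what gen8_for_each_pml4e expands to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PML4E_SHIFT	39			/* one PML4E covers 512GB */
#define PML4ES_PER_PML4	512
#define PML4E_MASK	(PML4ES_PER_PML4 - 1)

static inline uint32_t pml4e_index(uint64_t address)
{
	return (address >> PML4E_SHIFT) & PML4E_MASK;
}

/* Hypothetical record of one per-PML4E piece of a larger range. */
struct chunk {
	uint32_t pml4e;
	uint64_t start;
	uint64_t length;
};

/*
 * Split [start, start + length) on 512GB boundaries. Writes at most
 * max chunks to out and returns how many were written.
 */
static size_t split_by_pml4e(uint64_t start, uint64_t length,
			     struct chunk *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);
	while (length > 0 && n < max) {
		/* Distance from start to the next 512GB boundary. */
		uint64_t next = ((start >> PML4E_SHIFT) + 1) << PML4E_SHIFT;
		uint64_t len = next - start;

		if (len > length)
			len = length;

		out[n].pml4e = pml4e_index(start);
		out[n].start = start;
		out[n].length = len;
		n++;

		start += len;
		length -= len;
	}
	return n;
}
```

For example, an 8KB range that straddles the first 512GB boundary
yields two chunks, one under pml4e 0 and one under pml4e 1, and each
chunk is what a per-pdp worker would operate on.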

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 170 ++++++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem_gtt.h |  12 ++-
 2 files changed, 163 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c4b53ef..3478bf5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -413,9 +413,12 @@ static void __pdp_fini(struct i915_pagedirpo *pdp)
 static void free_pdp_single(struct i915_pagedirpo *pdp,
 			    struct drm_device *dev)
 {
-	__pdp_fini(pdp);
-	if (HAS_48B_PPGTT(dev))
+	if (HAS_48B_PPGTT(dev)) {
+		__pdp_fini(pdp);
+		i915_dma_unmap_single(pdp, dev);
+		__free_page(pdp->page);
 		kfree(pdp);
+	}
 }
 
 static int __pdp_init(struct i915_pagedirpo *pdp,
@@ -441,6 +444,58 @@ static int __pdp_init(struct i915_pagedirpo *pdp,
 	return 0;
 }
 
+static struct i915_pagedirpo *alloc_pdp_single(struct i915_hw_ppgtt *ppgtt,
+					       struct i915_pml4 *pml4)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct i915_pagedirpo *pdp;
+	int ret;
+
+	BUG_ON(!HAS_48B_PPGTT(dev));
+
+	pdp = kmalloc(sizeof(*pdp), GFP_KERNEL);
+	if (!pdp)
+		return ERR_PTR(-ENOMEM);
+
+	pdp->page = alloc_page(GFP_KERNEL | GFP_DMA32 | __GFP_ZERO);
+	if (!pdp->page) {
+		kfree(pdp);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ret = __pdp_init(pdp, dev);
+	if (ret) {
+		__free_page(pdp->page);
+		kfree(pdp);
+		return ERR_PTR(ret);
+	}
+
+	i915_dma_map_px_single(pdp, dev);
+
+	return pdp;
+}
+
+static void pml4_fini(struct i915_pml4 *pml4)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pml4, struct i915_hw_ppgtt, pml4);
+	i915_dma_unmap_single(pml4, ppgtt->base.dev);
+	__free_page(pml4->page);
+}
+
+static int pml4_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct i915_pml4 *pml4 = &ppgtt->pml4;
+
+	pml4->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pml4->page)
+		return -ENOMEM;
+
+	i915_dma_map_px_single(pml4, ppgtt->base.dev);
+
+	return 0;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_ring_buffer *ring,
 			  unsigned entry,
@@ -729,7 +784,14 @@ static void gen8_teardown_va_range_4lvl(struct i915_address_space *vm,
 					struct i915_pml4 *pml4,
 					uint64_t start, uint64_t length)
 {
-	BUG();
+	struct i915_pagedirpo *pdp;
+	uint64_t temp, pml4e;
+
+	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+		gen8_teardown_va_range_3lvl(vm, pdp, start, length);
+		if (bitmap_empty(pdp->used_pdpes, I915_PDPES_PER_PDP(vm->dev)))
+			clear_bit(pml4e, pml4->used_pml4es);
+	}
 }
 
 static void gen8_teardown_va_range(struct i915_address_space *vm,
@@ -738,10 +800,10 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 
-	if (!HAS_48B_PPGTT(vm->dev))
-		gen8_teardown_va_range_3lvl(vm, &ppgtt->pdp, start, length);
-	else
+	if (HAS_48B_PPGTT(vm->dev))
 		gen8_teardown_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+	else
+		gen8_teardown_va_range_3lvl(vm, &ppgtt->pdp, start, length);
 }
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
@@ -1021,12 +1083,76 @@ err_out:
 	return ret;
 }
 
-static int __noreturn gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
-					       struct i915_pml4 *pml4,
-					       uint64_t start,
-					       uint64_t length)
+static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
+				    struct i915_pml4 *pml4,
+				    uint64_t start,
+				    uint64_t length)
 {
-	BUG();
+	DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_pagedirpo *pdp;
+	const uint64_t orig_start = start;
+	const uint64_t orig_length = length;
+	uint64_t temp, pml4e;
+
+	/* Do the pml4 allocations first, so we don't need to track the newly
+	 * allocated tables below the pdp */
+	bitmap_zero(new_pdps, GEN8_PML4ES_PER_PML4);
+
+	/* The pagedirectory and pagetable allocations are done in the shared 3
+	 * and 4 level code. Just allocate the pdps.
+	 */
+	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+		if (!pdp) {
+			WARN_ON(test_bit(pml4e, pml4->used_pml4es));
+			pdp = alloc_pdp_single(ppgtt, pml4);
+			if (IS_ERR(pdp))
+				goto err_alloc;
+
+			pml4->pdps[pml4e] = pdp;
+			set_bit(pml4e, new_pdps);
+			trace_i915_pagedirpo_alloc(&ppgtt->base, pml4e,
+						   pml4e << GEN8_PML4E_SHIFT,
+						   GEN8_PML4E_SHIFT);
+
+		} else
+			WARN(!test_bit(pml4e, pml4->used_pml4es),
+			     "%lld %p", pml4e, vm);
+	}
+
+	start = orig_start;
+	length = orig_length;
+
+	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+		int ret;
+
+		BUG_ON(!pdp);
+
+		ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
+		if (ret)
+			goto err_out;
+	}
+
+	WARN(bitmap_weight(pml4->used_pml4es, GEN8_PML4ES_PER_PML4) > 2,
+	     "The allocation has spanned more than 512GB. It is highly likely this is incorrect.");
+
+	bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
+		  GEN8_PML4ES_PER_PML4);
+
+	return 0;
+
+err_out:
+	/* This will teardown more than we allocated. It should be fine, and
+	 * makes code simpler. */
+	start = orig_start;
+	length = orig_length;
+	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e)
+		gen8_teardown_va_range_3lvl(vm, pdp, start, length);
+
+err_alloc:
+	for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
+		free_pdp_single(pdp, vm->dev);
 }
 
 static int gen8_alloc_va_range(struct i915_address_space *vm,
@@ -1035,16 +1161,19 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 
-	if (!HAS_48B_PPGTT(vm->dev))
-		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
-	else
+	if (HAS_48B_PPGTT(vm->dev))
 		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+	else
+		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
 }
 
 static void gen8_ppgtt_fini_common(struct i915_hw_ppgtt *ppgtt)
 {
 	free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
-	free_pdp_single(&ppgtt->pdp, ppgtt->base.dev);
+	if (HAS_48B_PPGTT(ppgtt->base.dev))
+		pml4_fini(&ppgtt->pml4);
+	else
+		free_pdp_single(&ppgtt->pdp, ppgtt->base.dev);
 }
 
 /**
@@ -1066,7 +1195,13 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	if (IS_ERR(ppgtt->scratch_pd))
 		return PTR_ERR(ppgtt->scratch_pd);
 
-	if (!HAS_48B_PPGTT(ppgtt->base.dev)) {
+	if (HAS_48B_PPGTT(ppgtt->base.dev)) {
+		int ret = pml4_init(ppgtt);
+		if (ret) {
+			free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
+			return ret;
+		}
+	} else {
 		int ret = __pdp_init(&ppgtt->pdp, false);
 		if (ret) {
 			free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
@@ -1075,8 +1210,7 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 
 		ppgtt->switch_mm = gen8_mm_switch;
 		trace_i915_pagedirpo_alloc(&ppgtt->base, 0, 0, GEN8_PML4E_SHIFT);
-	} else
-		BUG(); /* Not yet implemented */
+	}
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 94c825e..0e5cd58 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -103,6 +103,7 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #endif
 #define GEN8_PML4ES_PER_PML4		512
 #define GEN8_PML4E_SHIFT		39
+#define GEN8_PML4E_MASK			(GEN8_PML4ES_PER_PML4 - 1)
 #define GEN8_PDPE_SHIFT			30
 /* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
  * tables */
@@ -436,9 +437,18 @@ static inline size_t gen6_pde_count(uint32_t addr, uint32_t length)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
+#define gen8_for_each_pml4e(pdp, pml4, start, length, temp, iter)	\
+	for (iter = gen8_pml4e_index(start), pdp = (pml4)->pdps[iter];	\
+	     length > 0 && iter < GEN8_PML4ES_PER_PML4;			\
+	     pdp = (pml4)->pdps[++iter],				\
+	     temp = ALIGN(start+1, 1ULL << GEN8_PML4E_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
 #define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
 	gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
 
+
 /* Clamp length to the next pagetab boundary */
 static inline uint64_t gen8_clamp_pt(uint64_t start, uint64_t length)
 {
@@ -476,7 +486,7 @@ static inline uint32_t gen8_pdpe_index(uint64_t address)
 
 static inline uint32_t gen8_pml4e_index(uint64_t address)
 {
-	BUG();
+	return (address >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;
 }
 
 static inline size_t gen8_pte_count(uint64_t addr, uint64_t length)
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 47/56] drm/i915/bdw: 4 level pages tables
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (45 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 46/56] drm/i915/bdw: implement alloc/teardown for 4lvl Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 48/56] drm/i915: Restructure map vs. insert entries Ben Widawsky
                   ` (9 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Map is easy: it's the same register as PDP descriptor 0, but it only
has one entry. Also, the mapping code is now trivial thanks to all of
the prep patches.
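
To make the difference between the two switch paths concrete, here is a
small sketch. It assumes a fake in-memory "register file" (ring_sketch)
instead of real ring commands, and the function names are hypothetical.
It only illustrates that legacy mode programs four PDP descriptors
while 48b mode programs descriptor 0 with the PML4 address.

```c
#include <assert.h>
#include <stdint.h>

#define LEGACY_PDPES 4

/* Hypothetical stand-in for the per-ring PDP descriptor registers. */
struct ring_sketch {
	uint64_t pdp_desc[LEGACY_PDPES];
};

static void write_pdp(struct ring_sketch *ring, unsigned int entry,
		      uint64_t addr)
{
	assert(entry < LEGACY_PDPES);
	ring->pdp_desc[entry] = addr;
}

/* Legacy 32b: one descriptor per page directory. */
static void legacy_mm_switch(struct ring_sketch *ring,
			     const uint64_t pd_daddr[LEGACY_PDPES])
{
	unsigned int i;

	for (i = 0; i < LEGACY_PDPES; i++)
		write_pdp(ring, i, pd_daddr[i]);
}

/* 48b: descriptor 0 carries the single PML4 address. */
static void mm_switch_48b(struct ring_sketch *ring, uint64_t pml4_daddr)
{
	write_pdp(ring, 0, pml4_daddr);
}
```

The remaining descriptors are simply left alone in the 48b case in this
sketch; what the hardware does with them is not modelled here.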

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 53 +++++++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem_gtt.h |  4 ++-
 drivers/gpu/drm/i915/i915_reg.h     |  1 +
 3 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3478bf5..15e61d8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -528,9 +528,9 @@ static int gen8_write_pdp(struct intel_ring_buffer *ring,
 	return 0;
 }
 
-static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
-			  struct intel_ring_buffer *ring,
-			  bool synchronous)
+static int gen8_legacy_mm_switch(struct i915_hw_ppgtt *ppgtt,
+				 struct intel_ring_buffer *ring,
+				 bool synchronous)
 {
 	int i, ret;
 
@@ -547,6 +547,13 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 }
 
+static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
+			      struct intel_ring_buffer *ring,
+			      bool synchronous)
+{
+	return gen8_write_pdp(ring, 0, ppgtt->pml4.daddr, synchronous);
+}
+
 static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				   uint64_t start,
 				   uint64_t length,
@@ -674,6 +681,7 @@ static void gen8_map_pagetable_range(struct i915_address_space *vm,
 	kunmap_atomic(pagedir);
 }
 
+
 static void gen8_map_pagedir(struct i915_pagedir *pd,
 			     struct i915_pagetab *pt,
 			     int entry,
@@ -693,6 +701,35 @@ static void gen8_unmap_pagetable(struct i915_hw_ppgtt *ppgtt,
 	gen8_map_pagedir(pd, ppgtt->scratch_pt, pde, ppgtt->base.dev);
 }
 
+static void gen8_map_page_directory(struct i915_pagedirpo *pdp,
+				    struct i915_pagedir *pd,
+				    int index,
+				    struct drm_device *dev)
+{
+	gen8_ppgtt_pdpe_t *pagedirpo;
+	gen8_ppgtt_pdpe_t pdpe;
+
+	if (!HAS_48B_PPGTT(dev))
+		return;
+
+	pagedirpo = kmap_atomic(pdp->page);
+	pdpe = gen8_pde_encode(dev, pd->daddr, I915_CACHE_LLC);
+	pagedirpo[index] = pdpe;
+	kunmap_atomic(pagedirpo);
+}
+
+static void gen8_map_page_directory_pointer(struct i915_pml4 *pml4,
+					    struct i915_pagedirpo *pdp,
+					    int index,
+					    struct drm_device *dev)
+{
+	gen8_ppgtt_pml4e_t *pagemap = kmap_atomic(pml4->page);
+	gen8_ppgtt_pml4e_t pml4e = gen8_pde_encode(dev, pdp->daddr, I915_CACHE_LLC);
+	BUG_ON(!HAS_48B_PPGTT(dev));
+	pagemap[index] = pml4e;
+	kunmap_atomic(pagemap);
+}
+
 static void gen8_teardown_va_range_3lvl(struct i915_address_space *vm,
 					struct i915_pagedirpo *pdp,
 					uint64_t start, uint64_t length)
@@ -1065,6 +1102,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 		set_bit(pdpe, pdp->used_pdpes);
 
 		gen8_map_pagetable_range(vm, pd, start, length);
+		gen8_map_page_directory(pdp, pd, pdpe, dev);
 	}
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1132,6 +1170,8 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
 		ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
 		if (ret)
 			goto err_out;
+
+		gen8_map_page_directory_pointer(pml4, pdp, pml4e, vm->dev);
 	}
 
 	WARN(bitmap_weight(pml4->used_pml4es, GEN8_PML4ES_PER_PML4) > 2,
@@ -1201,6 +1241,7 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 			free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
 			return ret;
 		}
+		ppgtt->switch_mm = gen8_48b_mm_switch;
 	} else {
 		int ret = __pdp_init(&ppgtt->pdp, false);
 		if (ret) {
@@ -1208,7 +1249,7 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 			return ret;
 		}
 
-		ppgtt->switch_mm = gen8_mm_switch;
+		ppgtt->switch_mm = gen8_legacy_mm_switch;
 		trace_i915_pagedirpo_alloc(&ppgtt->base, 0, 0, GEN8_PML4E_SHIFT);
 	}
 
@@ -1235,6 +1276,7 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 	}
 
+	/* FIXME: PML4 */
 	gen8_for_each_pdpe(pd, pdp, start, size, temp, pdpe)
 		gen8_map_pagetable_range(&ppgtt->base, pd, start, size);
 
@@ -1472,8 +1514,9 @@ static int gen8_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
 	int j, ret;
 
 	for_each_ring(ring, dev_priv, j) {
+		u32 four_level = HAS_48B_PPGTT(dev) ? GEN8_GFX_PPGTT_64B : 0;
 		I915_WRITE(RING_MODE_GEN7(ring),
-			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
+			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE | four_level));
 
 		/* We promise to do a switch later with FULL PPGTT. If this is
 		 * aliasing, this is the one and only switch we'll do */
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 0e5cd58..3904ae5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -36,7 +36,9 @@
 
 typedef uint32_t gen6_gtt_pte_t;
 typedef uint64_t gen8_gtt_pte_t;
-typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
+typedef gen8_gtt_pte_t		gen8_ppgtt_pde_t;
+typedef gen8_ppgtt_pde_t	gen8_ppgtt_pdpe_t;
+typedef gen8_ppgtt_pdpe_t	gen8_ppgtt_pml4e_t;
 
 /* GEN Agnostic defines */
 #define I915_PAGE_SIZE			4096
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index bc34250..d8ee8ed 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -969,6 +969,7 @@ enum punit_power_well {
 #define   GFX_REPLAY_MODE		(1<<11)
 #define   GFX_PSMI_GRANULARITY		(1<<10)
 #define   GFX_PPGTT_ENABLE		(1<<9)
+#define   GEN8_GFX_PPGTT_64B		(1<<7)
 
 #define VLV_DISPLAY_BASE 0x180000
 
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 48/56] drm/i915: Restructure map vs. insert entries
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (46 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 47/56] drm/i915/bdw: 4 level pages tables Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 49/56] drm/i915/bdw: make aliasing PPGTT dynamic Ben Widawsky
                   ` (8 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

After this change, the old GGTT keeps its insert_entries/clear_range
functions as we don't expect those to ever change in terms of page table
levels. The address space now gets map_vma/unmap_vma, which better
reflect the operations we actually want to support for a VMA.
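
A minimal sketch of the per-VMA interface idea, assuming simplified
stand-in types (vm_sketch, vma_sketch) and a byte counter in place of
real page tables. The map_vma signature here is a guess for
illustration; only the unmap side, which forwards the VMA's start and
size to a range teardown, mirrors what the diff below does.

```c
#include <assert.h>
#include <stdint.h>

struct vm_sketch;

/* Hypothetical stand-in for a VMA: which VM it lives in, and where. */
struct vma_sketch {
	struct vm_sketch *vm;
	uint64_t start;	/* stands in for vma->node.start */
	uint64_t size;	/* stands in for vma->node.size */
};

/* Hypothetical stand-in for the address space and its VMA hooks. */
struct vm_sketch {
	uint64_t mapped_bytes;
	int (*map_vma)(struct vma_sketch *vma);
	void (*unmap_vma)(struct vma_sketch *vma);
};

/* Range-based helper that the VMA hook wraps. */
static void teardown_va_range(struct vm_sketch *vm, uint64_t start,
			      uint64_t length)
{
	(void)start;	/* a real implementation walks the tables here */
	assert(vm->mapped_bytes >= length);
	vm->mapped_bytes -= length;
}

static int sketch_map_vma(struct vma_sketch *vma)
{
	vma->vm->mapped_bytes += vma->size;
	return 0;
}

static void sketch_unmap_vma(struct vma_sketch *vma)
{
	teardown_va_range(vma->vm, vma->start, vma->size);
}
```

Callers then deal only in VMAs, and the start/length arithmetic stays
inside the address space implementation.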

I was too lazy to do it here, but the GGTT should really use these new
functions as well.

BISECT WARNING: This commit breaks aliasing PPGTT as is. If you see this
during bisect, please skip. There was no other way I could find to make
these changes remotely readable.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h     |   1 +
 drivers/gpu/drm/i915/i915_gem_gtt.c | 223 +++++++++++++++++++-----------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  24 ++--
 3 files changed, 126 insertions(+), 122 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4d53728..a043941 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -571,6 +571,7 @@ enum i915_cache_level {
 			      large Last-Level-Cache. LLC is coherent with
 			      the CPU, but L3 is only visible to the GPU. */
 	I915_CACHE_WT, /* hsw:gt3e WriteThrough for scanouts */
+	I915_CACHE_MAX,
 };
 
 struct i915_ctx_hang_stats {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 15e61d8..d67d803 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -730,9 +730,9 @@ static void gen8_map_page_directory_pointer(struct i915_pml4 *pml4,
 	kunmap_atomic(pagemap);
 }
 
-static void gen8_teardown_va_range_3lvl(struct i915_address_space *vm,
-					struct i915_pagedirpo *pdp,
-					uint64_t start, uint64_t length)
+static void gen8_unmap_vma_3lvl(struct i915_address_space *vm,
+				struct i915_pagedirpo *pdp,
+				uint64_t start, uint64_t length)
 {
 	struct drm_device *dev = vm->dev;
 	struct i915_pagedir *pd;
@@ -817,38 +817,43 @@ static void gen8_teardown_va_range_3lvl(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_teardown_va_range_4lvl(struct i915_address_space *vm,
-					struct i915_pml4 *pml4,
-					uint64_t start, uint64_t length)
+static void gen8_unmap_vma_4lvl(struct i915_address_space *vm,
+				struct i915_pml4 *pml4,
+				uint64_t start, uint64_t length)
 {
 	struct i915_pagedirpo *pdp;
 	uint64_t temp, pml4e;
 
 	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
-		gen8_teardown_va_range_3lvl(vm, pdp, start, length);
+		gen8_unmap_vma_3lvl(vm, pdp, start, length);
 		if (bitmap_empty(pdp->used_pdpes, I915_PDPES_PER_PDP(vm->dev)))
 			clear_bit(pml4e, pml4->used_pml4es);
 	}
 }
 
-static void gen8_teardown_va_range(struct i915_address_space *vm,
-				   uint64_t start, uint64_t length)
+static void __gen8_teardown_va_range(struct i915_address_space *vm,
+				     uint64_t start, uint64_t length)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 
 	if (HAS_48B_PPGTT(vm->dev))
-		gen8_teardown_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+		gen8_unmap_vma_4lvl(vm, &ppgtt->pml4, start, length);
 	else
-		gen8_teardown_va_range_3lvl(vm, &ppgtt->pdp, start, length);
+		gen8_unmap_vma_3lvl(vm, &ppgtt->pdp, start, length);
+}
+
+static void gen8_unmap_vma(struct i915_vma *vma)
+{
+	__gen8_teardown_va_range(vma->vm, vma->node.start, vma->node.size);
 }
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	trace_i915_va_teardown(&ppgtt->base,
 			       ppgtt->base.start, ppgtt->base.total);
-	gen8_teardown_va_range(&ppgtt->base,
-			       ppgtt->base.start, ppgtt->base.total);
+	__gen8_teardown_va_range(&ppgtt->base,
+				 ppgtt->base.start, ppgtt->base.total);
 
 	WARN_ON(!bitmap_empty(ppgtt->pdp.used_pdpes,
 			      I915_PDPES_PER_PDP(ppgtt->base.dev)));
@@ -1188,15 +1193,15 @@ err_out:
 	start = orig_start;
 	length = orig_length;
 	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e)
-		gen8_teardown_va_range_3lvl(vm, pdp, start, length);
+		gen8_unmap_vma_3lvl(vm, pdp, start, length);
 
 err_alloc:
 	for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
 		free_pdp_single(pdp, vm->dev);
 }
 
-static int gen8_alloc_va_range(struct i915_address_space *vm,
-			       uint64_t start, uint64_t length)
+static int __gen8_alloc_va_range(struct i915_address_space *vm,
+				 uint64_t start, uint64_t length)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
@@ -1207,6 +1212,18 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
 }
 
+static int gen8_map_vma(struct i915_vma *vma, u32 flags)
+{
+	int ret =  __gen8_alloc_va_range(vma->vm, vma->node.start,vma->node.size);
+	if (!ret) {
+		BUG_ON(flags >= I915_CACHE_MAX);
+		gen8_ppgtt_insert_entries(vma->vm, vma->obj->pages, vma->node.start,
+					  flags);
+	}
+
+	return ret;
+}
+
 static void gen8_ppgtt_fini_common(struct i915_hw_ppgtt *ppgtt)
 {
 	free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
@@ -1229,7 +1246,6 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.total = size;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 	ppgtt->enable = gen8_ppgtt_enable;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 
 	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pd))
@@ -1270,7 +1286,7 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
+	ret = __gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
 		gen8_ppgtt_fini_common(ppgtt);
 		return ret;
@@ -1280,9 +1296,11 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	gen8_for_each_pdpe(pd, pdp, start, size, temp, pdpe)
 		gen8_map_pagetable_range(&ppgtt->base, pd, start, size);
 
-	ppgtt->base.allocate_va_range = NULL;
-	ppgtt->base.teardown_va_range = NULL;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+	BUG(); // we need a map_vma for aliasing
+	ppgtt->base.map_vma = NULL;
+	ppgtt->base.unmap_vma = NULL;
+
+	gen8_ppgtt_clear_range(&ppgtt->base, 0, dev_priv->gtt.base.total, true);
 
 	return 0;
 }
@@ -1297,9 +1315,8 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ppgtt->base.allocate_va_range = gen8_alloc_va_range;
-	ppgtt->base.teardown_va_range = gen8_teardown_va_range;
-	ppgtt->base.clear_range = NULL;
+	ppgtt->base.map_vma = gen8_map_vma;
+	ppgtt->base.unmap_vma = gen8_unmap_vma;
 
 	return 0;
 }
@@ -1670,15 +1687,16 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
-static int gen6_alloc_va_range(struct i915_address_space *vm,
-			       uint64_t start, uint64_t length)
+static int gen6_alloc_va_range(struct i915_vma *vma, u32 flags)
 {
 	DECLARE_BITMAP(new_page_tables, I915_PDES_PER_PD);
+	struct i915_address_space *vm = vma->vm;
 	struct drm_device *dev = vm->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_hw_ppgtt *ppgtt =
 		        container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_pagetab *pt;
+	uint32_t start = vma->node.start, length = vma->node.size;
 	const uint32_t start_save = start, length_save = length;
 	uint32_t pde, temp;
 	int ret;
@@ -1748,6 +1766,9 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 	 * table. Also require for WC mapped PTEs */
 	readl(dev_priv->gtt.gsm);
 
+	BUG_ON(flags >= I915_CACHE_MAX);
+	gen6_ppgtt_insert_entries(vm, vma->obj->pages, vma->node.start, flags);
+
 	return 0;
 
 unwind_out:
@@ -1759,18 +1780,20 @@ unwind_out:
 	return ret;
 }
 
-static void gen6_teardown_va_range(struct i915_address_space *vm,
-				   uint64_t start, uint64_t length)
+static void gen6_unmap_vma(struct i915_vma *vma)
 {
+	struct i915_address_space *vm = vma->vm;
 	struct i915_hw_ppgtt *ppgtt =
 		        container_of(vm, struct i915_hw_ppgtt, base);
+	uint32_t start = vma->node.start, length = vma->node.size;
+	const uint32_t orig_start = start, orig_length = length;
 	struct i915_pagetab *pt;
 	uint32_t pde, temp;
 
 	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
 
 		if (WARN(pt == ppgtt->scratch_pt,
-		    "Tried to teardown scratch page vm %p. pde %u: %llx-%llx\n",
+		    "Tried to teardown scratch page vm %p. pde %u: %x-%x\n",
 		    vm, pde, start, start + length))
 			continue;
 
@@ -1790,6 +1813,8 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
 			ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
 		}
 	}
+
+	gen6_ppgtt_clear_range(vm, orig_start, orig_length, true);
 }
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
@@ -1919,10 +1944,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 	if (ret)
 		return ret;
 
-	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
-	ppgtt->base.teardown_va_range = gen6_teardown_va_range;
-	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
+	ppgtt->base.map_vma = gen6_alloc_va_range;
+	ppgtt->base.unmap_vma = gen6_unmap_vma;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
 	ppgtt->base.total = I915_PDES_PER_PD * GEN6_PTES_PER_PT * PAGE_SIZE;
@@ -1968,8 +1991,6 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt, boo
 
 	kref_init(&ppgtt->ref);
 	drm_mm_init(&ppgtt->base.mm, ppgtt->base.start, ppgtt->base.total);
-	if (ppgtt->base.clear_range)
-		ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 	i915_init_vm(dev_priv, &ppgtt->base);
 
 	return 0;
@@ -1993,40 +2014,28 @@ ppgtt_bind_vma(struct i915_vma *vma,
 	int ret;
 
 	WARN_ON(flags);
-	if (vma->vm->allocate_va_range) {
-		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size);
-
-		ret = vma->vm->allocate_va_range(vma->vm,
-						 vma->node.start,
-						 vma->node.size);
-		if (ret)
-			return ret;
+	BUG_ON(!vma->vm->map_vma);
+	trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size);
 
-		ppgtt_invalidate_tlbs(vma->vm);
-	}
+	ret = vma->vm->map_vma(vma, cache_level);
+	if (ret)
+		return ret;
 
-	vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
-				cache_level);
+	ppgtt_invalidate_tlbs(vma->vm);
 
 	return 0;
 }
 
 static void ppgtt_unbind_vma(struct i915_vma *vma)
 {
-	if (vma->vm->clear_range)
-		vma->vm->clear_range(vma->vm,
-				     vma->node.start,
-				     vma->obj->base.size,
-				     true);
-
-	if (vma->vm->teardown_va_range) {
+	if (vma->vm->unmap_vma) {
 		trace_i915_va_teardown(vma->vm,
 				       vma->node.start, vma->node.size);
 
-		vma->vm->teardown_va_range(vma->vm,
-					   vma->node.start, vma->node.size);
+		vma->vm->unmap_vma(vma);
 		ppgtt_invalidate_tlbs(vma->vm);
-	}
+	} else
+		BUG();
 }
 
 extern int intel_iommu_gfx_mapped;
@@ -2108,10 +2117,10 @@ void i915_gem_suspend_gtt_mappings(struct drm_device *dev)
 
 	i915_check_and_clear_faults(dev);
 
-	dev_priv->gtt.base.clear_range(&dev_priv->gtt.base,
-				       dev_priv->gtt.base.start,
-				       dev_priv->gtt.base.total,
-				       true);
+	dev_priv->gtt.clear_range(&dev_priv->gtt,
+				  dev_priv->gtt.base.start,
+				  dev_priv->gtt.base.total,
+				  true);
 }
 
 void i915_gem_restore_gtt_mappings(struct drm_device *dev)
@@ -2123,10 +2132,10 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 	i915_check_and_clear_faults(dev);
 
 	/* First fill our portion of the GTT with scratch pages */
-	dev_priv->gtt.base.clear_range(&dev_priv->gtt.base,
-				       dev_priv->gtt.base.start,
-				       dev_priv->gtt.base.total,
-				       true);
+	dev_priv->gtt.clear_range(&dev_priv->gtt,
+				  dev_priv->gtt.base.start,
+				  dev_priv->gtt.base.total,
+				  true);
 
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
 		struct i915_vma *vma = i915_gem_obj_to_vma(obj,
@@ -2199,15 +2208,16 @@ static inline void gen8_set_pte(void __iomem *addr, gen8_gtt_pte_t pte)
 #endif
 }
 
-static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
+static void gen8_ggtt_insert_entries(struct i915_gtt *gtt,
 				     struct sg_table *st,
 				     uint64_t start,
 				     enum i915_cache_level level)
 {
-	struct drm_i915_private *dev_priv = vm->dev->dev_private;
+	struct drm_i915_private *dev_priv =
+		container_of(gtt, struct drm_i915_private, gtt);
 	unsigned first_entry = start >> PAGE_SHIFT;
 	gen8_gtt_pte_t __iomem *gtt_entries =
-		(gen8_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
+		(gen8_gtt_pte_t __iomem *)gtt->gsm + first_entry;
 	int i = 0;
 	struct sg_page_iter sg_iter;
 	dma_addr_t addr = 0;
@@ -2245,22 +2255,23 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
  * within the global GTT as well as accessible by the GPU through the GMADR
  * mapped BAR (dev_priv->mm.gtt->gtt).
  */
-static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
+static void gen6_ggtt_insert_entries(struct i915_gtt *gtt,
 				     struct sg_table *st,
 				     uint64_t start,
 				     enum i915_cache_level level)
 {
-	struct drm_i915_private *dev_priv = vm->dev->dev_private;
+	struct drm_i915_private *dev_priv =
+		container_of(gtt, struct drm_i915_private, gtt);
 	unsigned first_entry = start >> PAGE_SHIFT;
 	gen6_gtt_pte_t __iomem *gtt_entries =
-		(gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
+		(gen6_gtt_pte_t __iomem *)gtt->gsm + first_entry;
 	int i = 0;
 	struct sg_page_iter sg_iter;
 	dma_addr_t addr;
 
 	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0) {
 		addr = sg_page_iter_dma_address(&sg_iter);
-		iowrite32(vm->pte_encode(addr, level, true), &gtt_entries[i]);
+		iowrite32(gtt->base.pte_encode(addr, level, true), &gtt_entries[i]);
 		i++;
 	}
 
@@ -2272,7 +2283,7 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 	 */
 	if (i != 0)
 		WARN_ON(readl(&gtt_entries[i-1]) !=
-			vm->pte_encode(addr, level, true));
+			gtt->base.pte_encode(addr, level, true));
 
 	/* This next bit makes the above posting read even more important. We
 	 * want to flush the TLBs only after we're certain all the PTE updates
@@ -2282,17 +2293,16 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 	POSTING_READ(GFX_FLSH_CNTL_GEN6);
 }
 
-static void gen8_ggtt_clear_range(struct i915_address_space *vm,
+static void gen8_ggtt_clear_range(struct i915_gtt *gtt,
 				  uint64_t start,
 				  uint64_t length,
 				  bool use_scratch)
 {
-	struct drm_i915_private *dev_priv = vm->dev->dev_private;
 	unsigned first_entry = start >> PAGE_SHIFT;
 	unsigned num_entries = length >> PAGE_SHIFT;
 	gen8_gtt_pte_t scratch_pte, __iomem *gtt_base =
-		(gen8_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
-	const int max_entries = gtt_total_entries(&dev_priv->gtt) - first_entry;
+		(gen8_gtt_pte_t __iomem *) gtt->gsm + first_entry;
+	const int max_entries = gtt_total_entries(gtt) - first_entry;
 	int i;
 
 	if (WARN(num_entries > max_entries,
@@ -2300,7 +2310,7 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
 		 first_entry, num_entries, max_entries))
 		num_entries = max_entries;
 
-	scratch_pte = gen8_pte_encode(vm->scratch.addr,
+	scratch_pte = gen8_pte_encode(gtt->base.scratch.addr,
 				      I915_CACHE_LLC,
 				      use_scratch);
 	for (i = 0; i < num_entries; i++)
@@ -2342,17 +2352,16 @@ void gen8_for_every_pdpe_pde(struct i915_hw_ppgtt *ppgtt,
 	}
 }
 
-static void gen6_ggtt_clear_range(struct i915_address_space *vm,
+static void gen6_ggtt_clear_range(struct i915_gtt *gtt,
 				  uint64_t start,
 				  uint64_t length,
 				  bool use_scratch)
 {
-	struct drm_i915_private *dev_priv = vm->dev->dev_private;
 	unsigned first_entry = start >> PAGE_SHIFT;
 	unsigned num_entries = length >> PAGE_SHIFT;
 	gen6_gtt_pte_t scratch_pte, __iomem *gtt_base =
-		(gen6_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
-	const int max_entries = gtt_total_entries(&dev_priv->gtt) - first_entry;
+		(gen6_gtt_pte_t __iomem *) gtt->gsm + first_entry;
+	const int max_entries = gtt_total_entries(gtt) - first_entry;
 	int i;
 
 	if (WARN(num_entries > max_entries,
@@ -2360,7 +2369,8 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
 		 first_entry, num_entries, max_entries))
 		num_entries = max_entries;
 
-	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, use_scratch);
+	scratch_pte = gtt->base.pte_encode(gtt->base.scratch.addr,
+					   I915_CACHE_LLC, use_scratch);
 
 	for (i = 0; i < num_entries; i++)
 		iowrite32(scratch_pte, &gtt_base[i]);
@@ -2383,7 +2393,7 @@ static int i915_ggtt_bind_vma(struct i915_vma *vma,
 	return 0;
 }
 
-static void i915_ggtt_clear_range(struct i915_address_space *vm,
+static void i915_ggtt_clear_range(struct i915_gtt *gunused,
 				  uint64_t start,
 				  uint64_t length,
 				  bool unused)
@@ -2425,9 +2435,10 @@ static int ggtt_bind_vma(struct i915_vma *vma,
 	if (!dev_priv->mm.aliasing_ppgtt || flags & GLOBAL_BIND) {
 		if (!obj->has_global_gtt_mapping ||
 		    (cache_level != obj->cache_level)) {
-			vma->vm->insert_entries(vma->vm, obj->pages,
-						vma->node.start,
-						cache_level);
+			struct i915_gtt *gtt = &dev_priv->gtt;
+			gtt->insert_entries(gtt, obj->pages,
+					    vma->node.start,
+					    cache_level);
 			obj->has_global_gtt_mapping = 1;
 		}
 	}
@@ -2439,10 +2450,8 @@ static int ggtt_bind_vma(struct i915_vma *vma,
 	    (!obj->has_aliasing_ppgtt_mapping ||
 	     (cache_level != obj->cache_level))) {
 		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
-		appgtt->base.insert_entries(&appgtt->base,
-					    vma->obj->pages,
-					    vma->node.start,
-					    cache_level);
+		BUG();
+		appgtt->base.map_vma(vma, cache_level);
 		vma->obj->has_aliasing_ppgtt_mapping = 1;
 	}
 
@@ -2453,22 +2462,19 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
 {
 	struct drm_device *dev = vma->vm->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_gtt *gtt = &dev_priv->gtt;
 	struct drm_i915_gem_object *obj = vma->obj;
 
+	BUG_ON(vma->vm != &gtt->base);
+
 	if (obj->has_global_gtt_mapping) {
-		vma->vm->clear_range(vma->vm,
-				     vma->node.start,
-				     obj->base.size,
-				     true);
+		gtt->clear_range(gtt, vma->node.start, obj->base.size, true);
 		obj->has_global_gtt_mapping = 0;
 	}
 
 	if (obj->has_aliasing_ppgtt_mapping) {
 		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
-		appgtt->base.clear_range(&appgtt->base,
-					 vma->node.start,
-					 obj->base.size,
-					 true);
+		appgtt->base.unmap_vma(vma);
 		obj->has_aliasing_ppgtt_mapping = 0;
 	}
 }
@@ -2521,7 +2527,8 @@ void i915_gem_setup_global_gtt(struct drm_device *dev,
 	 * of the aperture.
 	 */
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct i915_address_space *ggtt_vm = &dev_priv->gtt.base;
+	struct i915_gtt *gtt = &dev_priv->gtt;
+	struct i915_address_space *ggtt_vm = &gtt->base;
 	struct drm_mm_node *entry;
 	struct drm_i915_gem_object *obj;
 	unsigned long hole_start, hole_end;
@@ -2554,12 +2561,12 @@ void i915_gem_setup_global_gtt(struct drm_device *dev,
 	drm_mm_for_each_hole(entry, &ggtt_vm->mm, hole_start, hole_end) {
 		DRM_DEBUG_KMS("clearing unused GTT space: [%lx, %lx]\n",
 			      hole_start, hole_end);
-		ggtt_vm->clear_range(ggtt_vm, hole_start,
-				     hole_end - hole_start, true);
+		gtt->clear_range(gtt, hole_start,
+				 hole_end - hole_start, true);
 	}
 
 	/* And finally clear the reserved guard page */
-	ggtt_vm->clear_range(ggtt_vm, end - PAGE_SIZE, PAGE_SIZE, true);
+	gtt->clear_range(gtt, end - PAGE_SIZE, PAGE_SIZE, true);
 }
 
 void i915_gem_init_global_gtt(struct drm_device *dev)
@@ -2749,8 +2756,8 @@ static int gen8_gmch_probe(struct drm_device *dev,
 
 	ret = ggtt_probe_common(dev, gtt_size);
 
-	dev_priv->gtt.base.clear_range = gen8_ggtt_clear_range;
-	dev_priv->gtt.base.insert_entries = gen8_ggtt_insert_entries;
+	dev_priv->gtt.clear_range = gen8_ggtt_clear_range;
+	dev_priv->gtt.insert_entries = gen8_ggtt_insert_entries;
 
 	return ret;
 }
@@ -2789,8 +2796,8 @@ static int gen6_gmch_probe(struct drm_device *dev,
 
 	ret = ggtt_probe_common(dev, gtt_size);
 
-	dev_priv->gtt.base.clear_range = gen6_ggtt_clear_range;
-	dev_priv->gtt.base.insert_entries = gen6_ggtt_insert_entries;
+	dev_priv->gtt.clear_range = gen6_ggtt_clear_range;
+	dev_priv->gtt.insert_entries = gen6_ggtt_insert_entries;
 
 	return ret;
 }
@@ -2823,7 +2830,7 @@ static int i915_gmch_probe(struct drm_device *dev,
 	intel_gtt_get(gtt_total, stolen, mappable_base, mappable_end);
 
 	dev_priv->gtt.do_idle_maps = needs_idle_maps(dev_priv->dev);
-	dev_priv->gtt.base.clear_range = i915_ggtt_clear_range;
+	dev_priv->gtt.clear_range = i915_ggtt_clear_range;
 
 	if (unlikely(dev_priv->gtt.do_idle_maps))
 		DRM_INFO("applying Ironlake quirks for intel_iommu\n");
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 3904ae5..c265c23 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -260,20 +260,8 @@ struct i915_address_space {
 	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
 				     enum i915_cache_level level,
 				     bool valid); /* Create a valid PTE */
-	int (*allocate_va_range)(struct i915_address_space *vm,
-				 uint64_t start,
-				 uint64_t length);
-	void (*teardown_va_range)(struct i915_address_space *vm,
-				  uint64_t start,
-				  uint64_t length);
-	void (*clear_range)(struct i915_address_space *vm,
-			    uint64_t start,
-			    uint64_t length,
-			    bool use_scratch);
-	void (*insert_entries)(struct i915_address_space *vm,
-			       struct sg_table *st,
-			       uint64_t start,
-			       enum i915_cache_level cache_level);
+	int (*map_vma)(struct i915_vma *vma, u32 flags);
+	void (*unmap_vma)(struct i915_vma *vma);
 	void (*cleanup)(struct i915_address_space *vm);
 };
 
@@ -329,6 +317,14 @@ struct i915_gtt {
 	int (*gtt_probe)(struct drm_device *dev, size_t *gtt_total,
 			  size_t *stolen, phys_addr_t *mappable_base,
 			  unsigned long *mappable_end);
+	void (*insert_entries)(struct i915_gtt *gtt,
+			       struct sg_table *st,
+			       uint64_t start,
+			       enum i915_cache_level cache_level);
+	void (*clear_range)(struct i915_gtt *gtt,
+			    uint64_t start,
+			    uint64_t length,
+			    bool use_scratch);
 };
 
 /* For each pde iterates over every pde between from start until start + length.
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 49/56] drm/i915/bdw: make aliasing PPGTT dynamic
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (47 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 48/56] drm/i915: Restructure map vs. insert entries Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 50/56] drm/i915: Expand error state's address width to 64b Ben Widawsky
                   ` (7 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

There is no need to preallocate the aliasing PPGTT. The code is properly
plumbed now to treat this address space like any other.
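
A minimal sketch of the idea, under simplifying assumptions: the regular
and aliasing map hooks share one helper and differ only in which address
space they populate. The regular hook targets vma->vm, while the aliasing
hook ignores vma->vm and targets the single shared aliasing address space.
All alias_* names are hypothetical; alias_shared stands in for
dev_priv->mm.aliasing_ppgtt.

```c
#include <stddef.h>
#include <stdint.h>

struct alias_vma;

struct alias_vm {
	int (*map_vma)(struct alias_vma *vma, uint32_t flags);
	unsigned int mapped_pages;	/* bookkeeping for the sketch only */
};

struct alias_vma {
	struct alias_vm *vm;	/* address space the VMA belongs to */
	uint64_t start;
	uint64_t size;
};

#define ALIAS_PAGE_SIZE 4096u

/* Hypothetical stand-in for the one aliasing PPGTT of the device. */
static struct alias_vm alias_shared;

/* Common path: populate page tables of 'target' for the VMA's range. */
static int alias_map_common(struct alias_vm *target, struct alias_vma *vma,
			    uint32_t flags)
{
	(void)flags;	/* the cache level is not modelled in this sketch */
	target->mapped_pages += (unsigned int)(vma->size / ALIAS_PAGE_SIZE);
	return 0;
}

/* Regular PPGTT: map into the address space that owns the VMA. */
static int alias_map_vma(struct alias_vma *vma, uint32_t flags)
{
	return alias_map_common(vma->vm, vma, flags);
}

/* Aliasing PPGTT: the VMA lives elsewhere, so redirect to the shared one. */
static int alias_map_aliasing_vma(struct alias_vma *vma, uint32_t flags)
{
	return alias_map_common(&alias_shared, vma, flags);
}
```

With this split, nothing has to be preallocated for the aliasing address
space: its page tables are populated on demand by the same common path.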

v2: Updated for CHV. Note CHV doesn't support 64b address space.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 281 ++++++++++++++++++++----------------
 1 file changed, 153 insertions(+), 128 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d67d803..959054c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -554,14 +554,14 @@ static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	return gen8_write_pdp(ring, 0, ppgtt->pml4.daddr, synchronous);
 }
 
-static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
-				   uint64_t start,
-				   uint64_t length,
-				   bool use_scratch)
+/* Helper to clear a range of PTEs. The range may span multiple page
+ * tables. */
+static void gen8_ppgtt_clear_pte_range(struct i915_hw_ppgtt *ppgtt,
+				       struct i915_pagedirpo *pdp,
+				       uint64_t start,
+				       uint64_t length,
+				       bool scratch)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
-	struct i915_pagedirpo *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
 	unsigned pdpe = gen8_pdpe_index(start);
 	unsigned pde = gen8_pde_index(start);
@@ -570,7 +570,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 	unsigned last_pte, i;
 
 	scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
-				      I915_CACHE_LLC, use_scratch);
+				      I915_CACHE_LLC, scratch);
 
 	while (num_entries) {
 		struct i915_pagedir *pd = pdp->pagedirs[pdpe];
@@ -600,23 +600,21 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
-				      struct sg_table *pages,
-				      uint64_t start,
-				      enum i915_cache_level cache_level)
+static void gen8_ppgtt_insert_pte_entries(struct i915_pagedirpo *pdp,
+					  struct sg_page_iter *sg_iter,
+					  uint64_t start,
+					  size_t pages,
+					  enum i915_cache_level cache_level,
+					  bool flush_pt)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
-	struct i915_pagedirpo *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	gen8_gtt_pte_t *pt_vaddr;
 	unsigned pdpe = gen8_pdpe_index(start);
 	unsigned pde = gen8_pde_index(start);
 	unsigned pte = gen8_pte_index(start);
-	struct sg_page_iter sg_iter;
 
 	pt_vaddr = NULL;
 
-	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
+	while (pages-- && __sg_page_iter_next(sg_iter)) {
 		if (pt_vaddr == NULL) {
 			struct i915_pagedir *pd = pdp->pagedirs[pdpe];
 			struct i915_pagetab *pt = pd->page_tables[pde];
@@ -625,10 +623,10 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		}
 
 		pt_vaddr[pte] =
-			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
+			gen8_pte_encode(sg_page_iter_dma_address(sg_iter),
 					cache_level, true);
 		if (++pte == GEN8_PTES_PER_PT) {
-			if (!HAS_LLC(ppgtt->base.dev))
+			if (flush_pt)
 				drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
 			kunmap_atomic(pt_vaddr);
 			pt_vaddr = NULL;
@@ -640,7 +638,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		}
 	}
 	if (pt_vaddr) {
-		if (!HAS_LLC(ppgtt->base.dev))
+		if (flush_pt)
 			drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
 		kunmap_atomic(pt_vaddr);
 	}
@@ -730,10 +728,14 @@ static void gen8_map_page_directory_pointer(struct i915_pml4 *pml4,
 	kunmap_atomic(pagemap);
 }
 
-static void gen8_unmap_vma_3lvl(struct i915_address_space *vm,
-				struct i915_pagedirpo *pdp,
-				uint64_t start, uint64_t length)
+/* Returns 1 if the PDP has been freed and the caller could potentially
+ * clean up. */
+static int gen8_unmap_vma_3lvl(struct i915_address_space *vm,
+			       struct i915_pagedirpo *pdp,
+			       uint64_t start, uint64_t length)
 {
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
 	struct drm_device *dev = vm->dev;
 	struct i915_pagedir *pd;
 	struct i915_pagetab *pt;
@@ -742,7 +744,7 @@ static void gen8_unmap_vma_3lvl(struct i915_address_space *vm,
 
 	if (!pdp || !pdp->pagedirs) {
 		/* If pagedirs are already free, there is nothing to do.*/
-		return;
+		return 0;
 	}
 
 	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
@@ -784,8 +786,6 @@ static void gen8_unmap_vma_3lvl(struct i915_address_space *vm,
 				     gen8_pte_count(pd_start, pd_len));
 
 			if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PT)) {
-				struct i915_hw_ppgtt *ppgtt =
-					container_of(vm, struct i915_hw_ppgtt, base);
 				trace_i915_pagetable_destroy(vm,
 							     pde,
 							     pd_start & GENMASK_ULL(64, GEN8_PDE_SHIFT),
@@ -794,7 +794,9 @@ static void gen8_unmap_vma_3lvl(struct i915_address_space *vm,
 				/* This may be nixed later. Optimize? */
 				gen8_unmap_pagetable(ppgtt, pd, pde);
 			} else {
-				gen8_ppgtt_clear_range(vm, pd_start, pd_len, true);
+				gen8_ppgtt_clear_pte_range(ppgtt, pdp,
+							   pd_start, pd_len,
+							   true);
 			}
 		}
 
@@ -809,12 +811,14 @@ static void gen8_unmap_vma_3lvl(struct i915_address_space *vm,
 	}
 
 	if (bitmap_empty(pdp->used_pdpes, I915_PDPES_PER_PDP(dev))) {
-		/* TODO: When pagetables are fully dynamic:
-		free_pdp_single(pdp, dev); */
+		free_pdp_single(pdp, dev);
 		trace_i915_pagedirpo_destroy(vm, 0,
 					     orig_start & GENMASK_ULL(64, GEN8_PML4E_SHIFT),
 					     GEN8_PML4E_SHIFT);
+		return 1;
 	}
+
+	return 0;
 }
 
 static void gen8_unmap_vma_4lvl(struct i915_address_space *vm,
@@ -824,10 +828,15 @@ static void gen8_unmap_vma_4lvl(struct i915_address_space *vm,
 	struct i915_pagedirpo *pdp;
 	uint64_t temp, pml4e;
 
+	BUG_ON(I915_PDPES_PER_PDP(vm->dev) != 512);
 	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
-		gen8_unmap_vma_3lvl(vm, pdp, start, length);
-		if (bitmap_empty(pdp->used_pdpes, I915_PDPES_PER_PDP(vm->dev)))
+		if (!pdp)
+			continue;
+
+		if (gen8_unmap_vma_3lvl(vm, pdp, start, length)) {
 			clear_bit(pml4e, pml4->used_pml4es);
+			pml4->pdps[pml4e] = NULL;
+		}
 	}
 }
 
@@ -848,6 +857,15 @@ static void gen8_unmap_vma(struct i915_vma *vma)
 	__gen8_teardown_va_range(vma->vm, vma->node.start, vma->node.size);
 }
 
+static void gen8_unmap_aliasing_vma(struct i915_vma *vma)
+{
+	struct drm_device *dev = vma->vm->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	__gen8_teardown_va_range(&dev_priv->mm.aliasing_ppgtt->base,
+				 vma->node.start, vma->node.size);
+}
+
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	trace_i915_va_teardown(&ppgtt->base,
@@ -855,9 +873,14 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	__gen8_teardown_va_range(&ppgtt->base,
 				 ppgtt->base.start, ppgtt->base.total);
 
-	WARN_ON(!bitmap_empty(ppgtt->pdp.used_pdpes,
-			      I915_PDPES_PER_PDP(ppgtt->base.dev)));
-	free_pdp_single(&ppgtt->pdp, ppgtt->base.dev);
+	if (HAS_48B_PPGTT(ppgtt->base.dev)) {
+		WARN_ON(!bitmap_empty(ppgtt->pml4.used_pml4es, GEN8_PML4ES_PER_PML4));
+		pml4_fini(&ppgtt->pml4);
+	} else {
+		WARN_ON(!bitmap_empty(ppgtt->pdp.used_pdpes,
+				      I915_PDPES_PER_PDP(ppgtt->base.dev)));
+		free_pdp_single(&ppgtt->pdp, ppgtt->base.dev);
+	}
 }
 
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1036,12 +1059,20 @@ err_out:
 	return -ENOMEM;
 }
 
-static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
-				    struct i915_pagedirpo *pdp,
-				    uint64_t start,
-				    uint64_t length)
+/**
+ * __gen8_alloc_vma_range_3lvl() - Map PDEs for a given range
+ * @
+ *
+ */
+static int __gen8_alloc_vma_range_3lvl(struct i915_pagedirpo *pdp,
+				       struct i915_vma *vma,
+				       struct sg_page_iter *sg_iter,
+				       uint64_t start,
+				       uint64_t length,
+				       u32 flags)
 {
 	unsigned long *new_page_dirs, **new_page_tables;
+	struct i915_address_space *vm = vma->vm;
 	struct drm_device *dev = vm->dev;
 	struct i915_pagedir *pd;
 	const uint64_t orig_start = start;
@@ -1051,6 +1082,8 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 	size_t pdpes = I915_PDPES_PER_PDP(dev);
 	int ret;
 
+	BUG_ON(!sg_iter->sg);
+
 #ifdef CONFIG_32BIT
 	/* Disallow 64b address on 32b platforms. Nothing is wrong with doing
 	 * this in hardware, but a lot of the drm code is not prepared to handle
@@ -1096,16 +1129,23 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 
 		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
 			BUG_ON(!pt);
+			BUG_ON(!pd_len);
+			BUG_ON(!gen8_pte_count(pd_start, pd_len));
+			BUG_ON(!sg_iter->__nents);
 
 			bitmap_set(pt->used_ptes,
 				   gen8_pte_index(pd_start),
 				   gen8_pte_count(pd_start, pd_len));
 
+			gen8_ppgtt_insert_pte_entries(pdp, sg_iter, pd_start,
+						      gen8_pte_count(pd_start, pd_len),
+						      flags, !HAS_LLC(vm->dev));
 			set_bit(pde, pd->used_pdes);
 		}
 
 		set_bit(pdpe, pdp->used_pdpes);
 
+
 		gen8_map_pagetable_range(vm, pd, start, length);
 		gen8_map_page_directory(pdp, pd, pdpe, dev);
 	}
@@ -1126,18 +1166,21 @@ err_out:
 	return ret;
 }
 
-static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
-				    struct i915_pml4 *pml4,
-				    uint64_t start,
-				    uint64_t length)
+static int __gen8_alloc_vma_range_4lvl(struct i915_pml4 *pml4,
+				       struct i915_vma *vma,
+				       struct sg_page_iter *sg_iter,
+				       u32 flags)
 {
 	DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
+	struct i915_address_space *vm = vma->vm;
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_pagedirpo *pdp;
+	uint64_t start = vma->node.start, length = vma->node.size;
 	const uint64_t orig_start = start;
 	const uint64_t orig_length = length;
 	uint64_t temp, pml4e;
+	int ret;
 
 	/* Do the pml4 allocations first, so we don't need to track the newly
 	 * allocated tables below the pdp */
@@ -1168,11 +1211,10 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
 	length = orig_length;
 
 	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
-		int ret;
-
 		BUG_ON(!pdp);
 
-		ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
+		ret = __gen8_alloc_vma_range_3lvl(pdp, vma, sg_iter,
+						  start, length, flags);
 		if (ret)
 			goto err_out;
 
@@ -1198,39 +1240,36 @@ err_out:
 err_alloc:
 	for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
 		free_pdp_single(pdp, vm->dev);
+
+	return ret;
 }
 
-static int __gen8_alloc_va_range(struct i915_address_space *vm,
-				 uint64_t start, uint64_t length)
+static int __gen8_map_vma(struct i915_address_space *vm, struct i915_vma *vma, u32 flags)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	struct sg_page_iter sg_iter;
 
-	if (HAS_48B_PPGTT(vm->dev))
-		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+	__sg_page_iter_start(&sg_iter, vma->obj->pages->sgl, sg_nents(vma->obj->pages->sgl), 0);
+	if (HAS_48B_PPGTT(vma->vm->dev))
+		return __gen8_alloc_vma_range_4lvl(&ppgtt->pml4, vma, &sg_iter, flags);
 	else
-		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
+		return __gen8_alloc_vma_range_3lvl(&ppgtt->pdp, vma, &sg_iter,
+						   vma->node.start,
+						   vma->node.size,
+						   flags);
 }
 
-static int gen8_map_vma(struct i915_vma *vma, u32 flags)
+static int gen8_map_aliasing_vma(struct i915_vma *vma, u32 flags)
 {
-	int ret =  __gen8_alloc_va_range(vma->vm, vma->node.start,vma->node.size);
-	if (!ret) {
-		BUG_ON(flags >= I915_CACHE_MAX);
-		gen8_ppgtt_insert_entries(vma->vm, vma->obj->pages, vma->node.start,
-					  flags);
-	}
-
-	return ret;
+	struct drm_device *dev = vma->vm->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	return __gen8_map_vma(&dev_priv->mm.aliasing_ppgtt->base, vma, flags);
 }
 
-static void gen8_ppgtt_fini_common(struct i915_hw_ppgtt *ppgtt)
+static int gen8_map_vma(struct i915_vma *vma, u32 flags)
 {
-	free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
-	if (HAS_48B_PPGTT(ppgtt->base.dev))
-		pml4_fini(&ppgtt->pml4);
-	else
-		free_pdp_single(&ppgtt->pdp, ppgtt->base.dev);
+	return __gen8_map_vma(vma->vm, vma, flags);
 }
 
 /**
@@ -1240,12 +1279,18 @@ static void gen8_ppgtt_fini_common(struct i915_hw_ppgtt *ppgtt)
  * space.
  *
  */
-static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
+static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 {
 	ppgtt->base.start = 0;
-	ppgtt->base.total = size;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 	ppgtt->enable = gen8_ppgtt_enable;
+	if (aliasing) {
+		ppgtt->base.map_vma = gen8_map_aliasing_vma;
+		ppgtt->base.unmap_vma = gen8_unmap_aliasing_vma;
+	} else {
+		ppgtt->base.map_vma = gen8_map_vma;
+		ppgtt->base.unmap_vma = gen8_unmap_vma;
+	}
 
 	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pd))
@@ -1257,6 +1302,7 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 			free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
 			return ret;
 		}
+		ppgtt->base.total = (1ULL<<48);
 		ppgtt->switch_mm = gen8_48b_mm_switch;
 	} else {
 		int ret = __pdp_init(&ppgtt->pdp, false);
@@ -1266,61 +1312,13 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		}
 
 		ppgtt->switch_mm = gen8_legacy_mm_switch;
+		ppgtt->base.total = (1ULL<<32);
 		trace_i915_pagedirpo_alloc(&ppgtt->base, 0, 0, GEN8_PML4E_SHIFT);
 	}
 
 	return 0;
 }
 
-static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
-{
-	struct drm_device *dev = ppgtt->base.dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct i915_pagedirpo *pdp = &ppgtt->pdp; /* FIXME: 48b */
-	struct i915_pagedir *pd;
-	uint64_t temp, start = 0, size = dev_priv->gtt.base.total;
-	uint32_t pdpe;
-	int ret;
-
-	ret = gen8_ppgtt_init_common(ppgtt, size);
-	if (ret)
-		return ret;
-
-	ret = __gen8_alloc_va_range(&ppgtt->base, start, size);
-	if (ret) {
-		gen8_ppgtt_fini_common(ppgtt);
-		return ret;
-	}
-
-	/* FIXME: PML4 */
-	gen8_for_each_pdpe(pd, pdp, start, size, temp, pdpe)
-		gen8_map_pagetable_range(&ppgtt->base, pd, start, size);
-
-	BUG(); // we need a map_vma for aliasing
-	ppgtt->base.map_vma = NULL;
-	ppgtt->base.unmap_vma = NULL;
-
-	gen8_ppgtt_clear_range(&ppgtt->base, 0, dev_priv->gtt.base.total, true);
-
-	return 0;
-}
-
-static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
-{
-	struct drm_device *dev = ppgtt->base.dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	int ret;
-
-	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
-	if (ret)
-		return ret;
-
-	ppgtt->base.map_vma = gen8_map_vma;
-	ppgtt->base.unmap_vma = gen8_unmap_vma;
-
-	return 0;
-}
-
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 {
 	struct i915_address_space *vm = &ppgtt->base;
@@ -1687,10 +1685,10 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
-static int gen6_alloc_va_range(struct i915_vma *vma, u32 flags)
+static int _gen6_map_vma(struct i915_address_space *vm,
+			 struct i915_vma *vma, u32 flags)
 {
 	DECLARE_BITMAP(new_page_tables, I915_PDES_PER_PD);
-	struct i915_address_space *vm = vma->vm;
 	struct drm_device *dev = vm->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_hw_ppgtt *ppgtt =
@@ -1780,9 +1778,20 @@ unwind_out:
 	return ret;
 }
 
-static void gen6_unmap_vma(struct i915_vma *vma)
+static int gen6_map_aliasing_vma(struct i915_vma *vma, u32 flags)
+{
+	struct drm_i915_private *dev_priv = vma->vm->dev->dev_private;
+	return _gen6_map_vma(&dev_priv->mm.aliasing_ppgtt->base, vma, flags);
+}
+
+static int gen6_map_vma(struct i915_vma *vma, u32 flags)
+{
+	return _gen6_map_vma(vma->vm, vma, flags);
+}
+
+static void _gen6_unmap_vma(struct i915_address_space *vm,
+			    struct i915_vma *vma)
 {
-	struct i915_address_space *vm = vma->vm;
 	struct i915_hw_ppgtt *ppgtt =
 		        container_of(vm, struct i915_hw_ppgtt, base);
 	uint32_t start = vma->node.start, length = vma->node.size;
@@ -1817,6 +1826,17 @@ static void gen6_unmap_vma(struct i915_vma *vma)
 	gen6_ppgtt_clear_range(vm, orig_start, orig_length, true);
 }
 
+static void gen6_unmap_aliasing_vma(struct i915_vma *vma)
+{
+	struct drm_i915_private *dev_priv = vma->vm->dev->dev_private;
+	_gen6_unmap_vma(&dev_priv->mm.aliasing_ppgtt->base, vma);
+}
+
+static void gen6_unmap_vma(struct i915_vma *vma)
+{
+	_gen6_unmap_vma(vma->vm, vma);
+}
+
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	struct i915_pagetab *pt;
@@ -1944,8 +1964,13 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 	if (ret)
 		return ret;
 
-	ppgtt->base.map_vma = gen6_alloc_va_range;
-	ppgtt->base.unmap_vma = gen6_unmap_vma;
+	if (aliasing) {
+		ppgtt->base.map_vma = gen6_map_aliasing_vma;
+		ppgtt->base.unmap_vma = gen6_unmap_aliasing_vma;
+	} else {
+		ppgtt->base.map_vma = gen6_map_vma;
+		ppgtt->base.unmap_vma = gen6_unmap_vma;
+	}
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
 	ppgtt->base.total = I915_PDES_PER_PD * GEN6_PTES_PER_PT * PAGE_SIZE;
@@ -1957,8 +1982,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
 		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
-	if (!aliasing)
-		gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
+	gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
 
 	gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
 
@@ -1979,16 +2003,18 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt, boo
 
 	if (INTEL_INFO(dev)->gen < 8)
 		ret = gen6_ppgtt_init(ppgtt, aliasing);
-	else if (IS_GEN8(dev) && aliasing)
-		ret = gen8_aliasing_ppgtt_init(ppgtt);
 	else if (IS_GEN8(dev))
-		ret = gen8_ppgtt_init(ppgtt);
+		ret = gen8_ppgtt_init(ppgtt, aliasing);
 	else
 		BUG();
 
 	if (ret)
 		return ret;
 
+	BUG_ON(ppgtt->base.total < dev_priv->gtt.base.total && aliasing);
+	if (aliasing)
+		ppgtt->base.total = dev_priv->gtt.base.total;
+
 	kref_init(&ppgtt->ref);
 	drm_mm_init(&ppgtt->base.mm, ppgtt->base.start, ppgtt->base.total);
 	i915_init_vm(dev_priv, &ppgtt->base);
@@ -2450,7 +2476,6 @@ static int ggtt_bind_vma(struct i915_vma *vma,
 	    (!obj->has_aliasing_ppgtt_mapping ||
 	     (cache_level != obj->cache_level))) {
 		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
-		BUG();
 		appgtt->base.map_vma(vma, cache_level);
 		vma->obj->has_aliasing_ppgtt_mapping = 1;
 	}
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread
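
The 4-level allocation path in the patch above walks a pml4e, pdpe, pde and
pte for every address it maps. A minimal sketch of that decomposition follows,
assuming the usual gen8 layout of 4 KiB pages and 512 entries (9 bits) per
level. The struct, macro and function names are hypothetical and are not taken
from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 12-bit page offset, then four 9-bit indices (48 bits). */
#define SKETCH_PTE_SHIFT	12
#define SKETCH_PDE_SHIFT	21
#define SKETCH_PDPE_SHIFT	30
#define SKETCH_PML4E_SHIFT	39
#define SKETCH_INDEX_MASK	0x1ffu	/* 512 entries per level */

/* Hypothetical container for the per-level indices of one address. */
struct sketch_va_indices {
	unsigned int pml4e;
	unsigned int pdpe;
	unsigned int pde;
	unsigned int pte;
};

/* Split a 48b graphics virtual address into its four table indices. */
static struct sketch_va_indices sketch_split_va(uint64_t addr)
{
	struct sketch_va_indices idx;

	assert(addr < (1ULL << 48));	/* only 48 bits are translated */

	idx.pte   = (addr >> SKETCH_PTE_SHIFT)   & SKETCH_INDEX_MASK;
	idx.pde   = (addr >> SKETCH_PDE_SHIFT)   & SKETCH_INDEX_MASK;
	idx.pdpe  = (addr >> SKETCH_PDPE_SHIFT)  & SKETCH_INDEX_MASK;
	idx.pml4e = (addr >> SKETCH_PML4E_SHIFT) & SKETCH_INDEX_MASK;
	return idx;
}
```

Under this assumed layout, a 32b address space (base.total = 1ULL<<32) only
ever touches pdpe 0..3 of the first pml4e, whereas 1ULL<<48 spans all 512
pml4 entries.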

* [PATCH 50/56] drm/i915: Expand error state's address width to 64b
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (48 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 49/56] drm/i915/bdw: make aliasing PPGTT dynamic Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 51/56] drm/i915/bdw: Flip the 48b switch Ben Widawsky
                   ` (6 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

v2: Zero-pad the new 8B fields, or else intel_error_decode has a hard time.
Note that we need an igt update regardless.

v3: Make reloc_offset 64b also.
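
A sketch of the v2 point: printing a 64b offset in a zero-padded, 16-digit
field gives every line the same width, so a decoder can rely on column
positions. Plain snprintf() stands in for err_printf() here, and both helper
names are hypothetical userspace stand-ins, not the driver's code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit GTT offset as exactly 16 zero-padded hex digits.
 * Returns the number of characters written (excluding the NUL).
 */
static int sketch_format_gtt_offset(char *buf, size_t len, uint64_t gtt_offset)
{
	assert(len >= 17);	/* 16 digits plus the terminating NUL */
	return snprintf(buf, len, "%016" PRIx64, gtt_offset);
}

/* Userspace stand-in for the kernel's lower_32_bits() helper. */
static uint32_t sketch_lower_32_bits(uint64_t value)
{
	return (uint32_t)(value & 0xffffffffu);
}
```

Note that the ringbuffer, HW status, HW context, w/a batch and semaphore lines
in this patch keep the %08x format and pass lower_32_bits(), so any offset
above 4 GiB is truncated in those lines.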

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h       |  4 ++--
 drivers/gpu/drm/i915/i915_gpu_error.c | 18 ++++++++++--------
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a043941..b3b52cf 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -365,7 +365,7 @@ struct drm_i915_error_state {
 
 		struct drm_i915_error_object {
 			int page_count;
-			u32 gtt_offset;
+			u64 gtt_offset;
 			u32 *pages[0];
 		} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
 
@@ -390,7 +390,7 @@ struct drm_i915_error_state {
 		u32 size;
 		u32 name;
 		u32 rseqno, wseqno;
-		u32 gtt_offset;
+		u64 gtt_offset;
 		u32 read_domains;
 		u32 write_domain;
 		s32 fence_reg:I915_MAX_NUM_FENCE_BITS;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 5d691cd..d639d6f 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -195,7 +195,7 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
 	err_printf(m, "%s [%d]:\n", name, count);
 
 	while (count--) {
-		err_printf(m, "  %08x %8u %02x %02x %x %x",
+		err_printf(m, "  %016llx %8u %02x %02x %x %x",
 			   err->gtt_offset,
 			   err->size,
 			   err->read_domains,
@@ -402,7 +402,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 				err_printf(m, " (submitted by %s [%d])",
 					   error->ring[i].comm,
 					   error->ring[i].pid);
-			err_printf(m, " --- gtt_offset = 0x%08x\n",
+			err_printf(m, " --- gtt_offset = 0x%016llx\n",
 				   obj->gtt_offset);
 			print_error_obj(m, obj);
 		}
@@ -410,7 +410,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		obj = error->ring[i].wa_batchbuffer;
 		if (obj) {
 			err_printf(m, "%s (w/a) --- gtt_offset = 0x%08x\n",
-				   dev_priv->ring[i].name, obj->gtt_offset);
+				   dev_priv->ring[i].name,
+				   lower_32_bits(obj->gtt_offset));
 			print_error_obj(m, obj);
 		}
 
@@ -429,14 +430,14 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		if ((obj = error->ring[i].ringbuffer)) {
 			err_printf(m, "%s --- ringbuffer = 0x%08x\n",
 				   dev_priv->ring[i].name,
-				   obj->gtt_offset);
+				   lower_32_bits(obj->gtt_offset));
 			print_error_obj(m, obj);
 		}
 
 		if ((obj = error->ring[i].hws_page)) {
 			err_printf(m, "%s --- HW Status = 0x%08x\n",
 				   dev_priv->ring[i].name,
-				   obj->gtt_offset);
+				   lower_32_bits(obj->gtt_offset));
 			offset = 0;
 			for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
 				err_printf(m, "[%04x] %08x %08x %08x %08x\n",
@@ -452,13 +453,14 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		if ((obj = error->ring[i].ctx)) {
 			err_printf(m, "%s --- HW Context = 0x%08x\n",
 				   dev_priv->ring[i].name,
-				   obj->gtt_offset);
+				   lower_32_bits(obj->gtt_offset));
 			print_error_obj(m, obj);
 		}
 	}
 
 	if ((obj = error->semaphore_obj)) {
-		err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
+		err_printf(m, "Semaphore page = 0x%08x\n",
+			   lower_32_bits(obj->gtt_offset));
 		for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
 			err_printf(m, "[%04x] %08x %08x %08x %08x\n",
 				   elt * 4,
@@ -554,7 +556,7 @@ i915_error_object_create_sized(struct drm_i915_private *dev_priv,
 {
 	struct drm_i915_error_object *dst;
 	int i;
-	u32 reloc_offset;
+	u64 reloc_offset;
 
 	if (src == NULL || src->pages == NULL)
 		return NULL;
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 51/56] drm/i915/bdw: Flip the 48b switch
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (49 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 50/56] drm/i915: Expand error state's address width to 64b Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 52/56] TESTME: GFX_TLB_INVALIDATE_EXPLICIT Ben Widawsky
                   ` (5 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h     | 2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 3 ---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b3b52cf..0848638 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1924,7 +1924,7 @@ struct drm_i915_cmd_table {
 #ifdef CONFIG_32BIT
 # define HAS_48B_PPGTT(dev)	false
 #else
-# define HAS_48B_PPGTT(dev)	(IS_BROADWELL(dev) && false)
+# define HAS_48B_PPGTT(dev)	IS_BROADWELL(dev)
 #endif
 
 #define HAS_OVERLAY(dev)		(INTEL_INFO(dev)->has_overlay)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 959054c..d73a132 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -982,9 +982,6 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_address_space *vm,
 
 	BUG_ON(!bitmap_empty(new_pds, pdpes));
 
-	/* FIXME: PPGTT container_of won't work for 64b */
-	BUG_ON((start + length) > 0x800000000ULL);
-
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		struct i915_pagedir *pd;
 		if (unused)
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 52/56] TESTME: GFX_TLB_INVALIDATE_EXPLICIT
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (50 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 51/56] drm/i915/bdw: Flip the 48b switch Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 53/56] TESTME: Always force invalidate Ben Widawsky
                   ` (4 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky

---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 33f9abd..15ede8e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -630,7 +630,7 @@ static int init_render_ring(struct intel_ring_buffer *ring)
 			   _MASKED_BIT_ENABLE(GFX_TLB_INVALIDATE_EXPLICIT));
 
 	/* WaBCSVCSTlbInvalidationMode:ivb,vlv,hsw */
-	if (IS_GEN7(dev))
+	if (IS_GEN7(dev) || IS_GEN8(dev))
 		I915_WRITE(GFX_MODE_GEN7,
 			   _MASKED_BIT_ENABLE(GFX_TLB_INVALIDATE_EXPLICIT) |
 			   _MASKED_BIT_ENABLE(GFX_REPLAY_MODE));
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 53/56] TESTME: Always force invalidate
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (51 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 52/56] TESTME: GFX_TLB_INVALIDATE_EXPLICIT Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 54/56] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl Ben Widawsky
                   ` (3 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky

---
 drivers/gpu/drm/i915/i915_gem_context.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index fec8114..a4ea50a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -681,7 +681,7 @@ static int do_switch(struct intel_ring_buffer *ring,
 	 * it must avoid lite restores in HW by programming "Force Restore" bit
 	 * to ‘1’ in context descriptor during context submission
 	 */
-	if (IS_GEN8(ring->dev) && i915_semaphore_is_enabled(ring->dev))
+	if (IS_GEN8(ring->dev) && to->is_initialized)
 		hw_flags |= MI_FORCE_RESTORE;
 
 	ret = mi_set_context(ring, to, hw_flags);
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 54/56] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (52 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 53/56] TESTME: Always force invalidate Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 55/56] drm/i915: Track userptr VMAs Ben Widawsky
                   ` (2 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Akash Goel

From: Chris Wilson <chris@chris-wilson.co.uk>

By exporting the ability to map user addresses and insert PTEs
representing their backing pages into the GTT, we can exploit UMA in order
to utilize normal application data as a texture source or even as a
render target (depending upon the capabilities of the chipset). This has
a number of uses, with zero-copy downloads to the GPU and efficient
readback making the intermixed streaming of CPU and GPU operations
fairly efficient. This ability has widespread implications, from
faster rendering of client-side software rasterisers (chromium) and
mitigation of stalls due to readback (firefox) to faster pipelining
of texture data (such as pixel buffer objects in GL or data blobs in CL).

v2: Compile with CONFIG_MMU_NOTIFIER
v3: We can sleep while performing invalidate-range, which we can utilise
to drop our page references prior to the kernel manipulating the vma
(for either discard or cloning) and so protect normal users.
v4: Only run the invalidate notifier if the range intersects the bo.
v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
v6: Recheck for a lost mmu after reacquiring the mutex.
v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
v8: Fix rebasing error after forward porting the back port.
v9: Limit the userptr to page aligned entries. We now expect userspace
    to handle all the offset-in-page adjustments itself.
v10: Prevent vma from being copied across fork to avoid issues with cow.
v11: Drop vma behaviour changes -- locking is nigh on impossible.
     Use a worker to load user pages to avoid lock inversions.
v12: Use get_task_mm()/mmput() for correct refcounting of mm.
v13: Use a worker to release the mmu_notifier to avoid lock inversion
v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifier
     with its own locking and tree of objects for each mm/mmu_notifier.
v15: Prevent overlapping userptr objects, and invalidate all objects
     within the mmu_notifier range
v16: Fix a typo for iterating over multiple objects in the range and
     rearrange error path to destroy the mmu_notifier locklessly.
     Also close a race between invalidate_range and the get_pages_worker.
v17: Close a race between get_pages_worker/invalidate_range and fresh
     allocations of the same userptr range - and notice that
     struct_mutex was presumed to be held when during creation it wasn't.
v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
     for the struct sg_table and to clear it before reporting an error.
v19: Always error out on read-only userptr requests as we don't have the
     hardware infrastructure to support them at the moment.
v20: Refuse to implement read-only support until we have the required
     infrastructure - but reserve the bit in flags for future use.
v21: use_mm() is not required for get_user_pages(). It is only meant to
     be used to fix up the kernel thread's current->mm for use with
     copy_user().
v22: Use sg_alloc_table_from_pages for that chunky feeling
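
Two of the rules in the changelog above are easy to state in code: v9
requires page-aligned userptr ranges, and the notifier converts the exclusive
end of an invalidate range into the inclusive form an interval tree expects.
A sketch assuming 4 KiB pages; all names are hypothetical and the checks in
the real ioctl may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size */

/* v9 rule: both the user pointer and the size must be page aligned. */
static bool sketch_userptr_is_page_aligned(uint64_t ptr, uint64_t size)
{
	return size != 0 && ((ptr | size) & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/* Convert an exclusive [start, end) range into its inclusive last address. */
static uint64_t sketch_inclusive_last(uint64_t start, uint64_t end)
{
	assert(end > start);
	return end - 1;
}

/* Overlap test on inclusive ranges, as an interval tree lookup would apply. */
static bool sketch_ranges_overlap(uint64_t a_start, uint64_t a_last,
				  uint64_t b_start, uint64_t b_last)
{
	return a_start <= b_last && b_start <= a_last;
}
```

With the inclusive form, an invalidate of [0x1000, 0x2000) does not match an
object that starts at 0x2000, which is what "only run the invalidate notifier
if the range intersects the bo" asks for.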

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>

Conflicts:
	drivers/gpu/drm/i915/i915_dma.c
	drivers/gpu/drm/i915/i915_drv.h
	include/uapi/drm/i915_drm.h
---
 drivers/gpu/drm/i915/Kconfig            |   1 +
 drivers/gpu/drm/i915/Makefile           |   1 +
 drivers/gpu/drm/i915/i915_dma.c         |   1 +
 drivers/gpu/drm/i915/i915_drv.h         |  24 +-
 drivers/gpu/drm/i915/i915_gem.c         |   4 +
 drivers/gpu/drm/i915/i915_gem_dmabuf.c  |   5 +
 drivers/gpu/drm/i915/i915_gem_userptr.c | 701 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gpu_error.c   |   2 +
 include/uapi/drm/i915_drm.h             |  16 +
 9 files changed, 754 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_userptr.c

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index e4e3c01..437e182 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -5,6 +5,7 @@ config DRM_I915
 	depends on (AGP || AGP=n)
 	select INTEL_GTT
 	select AGP_INTEL if AGP
+	select INTERVAL_TREE
 	# we need shmfs for the swappable backing store, and in particular
 	# the shmem_readpage() which depends upon tmpfs
 	select SHMEM
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index b5d4029..e548f4e 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -27,6 +27,7 @@ i915-y += i915_cmd_parser.o \
 	  i915_gem.o \
 	  i915_gem_stolen.o \
 	  i915_gem_tiling.o \
+	  i915_gem_userptr.o \
 	  i915_gpu_error.o \
 	  i915_irq.o \
 	  i915_trace_points.o \
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 54a08a9..00ae6d6 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1995,6 +1995,7 @@ const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, i915_gem_context_destroy_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_get_reset_stats_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 };
 
 int i915_max_ioctl = DRM_ARRAY_SIZE(i915_ioctls);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0848638..60513e7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -41,6 +41,7 @@
 #include <linux/i2c-algo-bit.h>
 #include <drm/intel-gtt.h>
 #include <linux/backlight.h>
+#include <linux/hashtable.h>
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
@@ -172,6 +173,7 @@ enum hpd_pin {
 		if ((intel_connector)->base.encoder == (__encoder))
 
 struct drm_i915_private;
+struct i915_mmu_object;
 
 enum intel_dpll_id {
 	DPLL_ID_PRIVATE = -1, /* non-shared dpll in use */
@@ -398,6 +400,7 @@ struct drm_i915_error_state {
 		u32 tiling:2;
 		u32 dirty:1;
 		u32 purgeable:1;
+		u32 userptr:1;
 		s32 ring:4;
 		u32 cache_level:3;
 	} **active_bo, **pinned_bo;
@@ -1444,6 +1447,9 @@ struct drm_i915_private {
 	struct i915_gtt gtt; /* VM representing the global address space */
 
 	struct i915_gem_mm mm;
+#if defined(CONFIG_MMU_NOTIFIER)
+	DECLARE_HASHTABLE(mmu_notifiers, 7);
+#endif
 
 	/* Kernel Modesetting */
 
@@ -1577,6 +1583,7 @@ struct drm_i915_gem_object_ops {
 	 */
 	int (*get_pages)(struct drm_i915_gem_object *);
 	void (*put_pages)(struct drm_i915_gem_object *);
+	void (*release)(struct drm_i915_gem_object *);
 };
 
 struct drm_i915_gem_object {
@@ -1690,8 +1697,20 @@ struct drm_i915_gem_object {
 
 	/** for phy allocated objects */
 	struct drm_i915_gem_phys_object *phys_obj;
-};
 
+	union {
+		struct i915_gem_userptr {
+			uintptr_t ptr;
+			unsigned read_only :1;
+			unsigned active :4;
+#define I915_GEM_USERPTR_MAX_ACTIVE 15
+
+			struct mm_struct *mm;
+			struct i915_mmu_object *mn;
+			struct work_struct *work;
+		} userptr;
+	};
+};
 #define to_intel_bo(x) container_of(x, struct drm_i915_gem_object, base)
 
 /**
@@ -2123,6 +2142,9 @@ int i915_gem_set_tiling(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
 int i915_gem_get_tiling(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
+int i915_gem_init_userptr(struct drm_device *dev);
+int i915_gem_userptr_ioctl(struct drm_device *dev, void *data,
+			   struct drm_file *file);
 int i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *file_priv);
 int i915_gem_wait_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 320d6b0..287d48e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4240,6 +4240,9 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 	if (obj->base.import_attach)
 		drm_prime_gem_destroy(&obj->base, NULL);
 
+	if (obj->ops->release)
+		obj->ops->release(obj);
+
 	drm_gem_object_release(&obj->base);
 	i915_gem_info_remove_obj(dev_priv, obj->base.size);
 
@@ -4519,6 +4522,7 @@ int i915_gem_init(struct drm_device *dev)
 			DRM_DEBUG_DRIVER("allow wake ack timed out\n");
 	}
 
+	i915_gem_init_userptr(dev);
 	i915_gem_init_global_gtt(dev);
 
 	ret = i915_gem_context_init(dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 321102a..61195bc 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -229,6 +229,11 @@ static const struct dma_buf_ops i915_dmabuf_ops =  {
 struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
 				      struct drm_gem_object *gem_obj, int flags)
 {
+	struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
+
+	if (obj->userptr.mm && obj->userptr.mn == NULL)
+		return ERR_PTR(-EINVAL);
+
 	return dma_buf_export(gem_obj, &i915_dmabuf_ops, gem_obj->size, flags);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
new file mode 100644
index 0000000..5da37cc
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -0,0 +1,701 @@
+/*
+ * Copyright © 2012-2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "drmP.h"
+#include "i915_drm.h"
+#include "i915_drv.h"
+#include "i915_trace.h"
+#include "intel_drv.h"
+#include <linux/mmu_context.h>
+#include <linux/mmu_notifier.h>
+#include <linux/mempolicy.h>
+#include <linux/swap.h>
+
+#if defined(CONFIG_MMU_NOTIFIER)
+#include <linux/interval_tree.h>
+
+struct i915_mmu_notifier {
+	spinlock_t lock;
+	struct hlist_node node;
+	struct mmu_notifier mn;
+	struct rb_root objects;
+	struct drm_device *dev;
+	struct mm_struct *mm;
+	struct work_struct work;
+	unsigned long count;
+	unsigned long serial;
+};
+
+struct i915_mmu_object {
+	struct i915_mmu_notifier *mmu;
+	struct interval_tree_node it;
+	struct drm_i915_gem_object *obj;
+};
+
+static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
+						       struct mm_struct *mm,
+						       unsigned long start,
+						       unsigned long end)
+{
+	struct i915_mmu_notifier *mn = container_of(_mn, struct i915_mmu_notifier, mn);
+	struct interval_tree_node *it = NULL;
+	unsigned long serial = 0;
+
+	end--; /* interval ranges are inclusive, but invalidate range is exclusive */
+	while (start < end) {
+		struct drm_i915_gem_object *obj;
+
+		obj = NULL;
+		spin_lock(&mn->lock);
+		if (serial == mn->serial)
+			it = interval_tree_iter_next(it, start, end);
+		else
+			it = interval_tree_iter_first(&mn->objects, start, end);
+		if (it != NULL) {
+			obj = container_of(it, struct i915_mmu_object, it)->obj;
+			drm_gem_object_reference(&obj->base);
+			serial = mn->serial;
+		}
+		spin_unlock(&mn->lock);
+		if (obj == NULL)
+			return;
+
+		mutex_lock(&mn->dev->struct_mutex);
+		/* Cancel any active worker and force us to re-evaluate gup */
+		obj->userptr.work = NULL;
+
+		if (obj->pages != NULL) {
+			struct drm_i915_private *dev_priv = to_i915(mn->dev);
+			struct i915_vma *vma, *tmp;
+			bool was_interruptible;
+
+			was_interruptible = dev_priv->mm.interruptible;
+			dev_priv->mm.interruptible = false;
+
+			list_for_each_entry_safe(vma, tmp, &obj->vma_list, vma_link) {
+				int ret = i915_vma_unbind(vma);
+				WARN_ON(ret && ret != -EIO);
+			}
+			WARN_ON(i915_gem_object_put_pages(obj));
+
+			dev_priv->mm.interruptible = was_interruptible;
+		}
+
+		start = obj->userptr.ptr + obj->base.size;
+
+		drm_gem_object_unreference(&obj->base);
+		mutex_unlock(&mn->dev->struct_mutex);
+	}
+}
+
+static const struct mmu_notifier_ops i915_gem_userptr_notifier = {
+	.invalidate_range_start = i915_gem_userptr_mn_invalidate_range_start,
+};
+
+static struct i915_mmu_notifier *
+__i915_mmu_notifier_lookup(struct drm_device *dev, struct mm_struct *mm)
+{
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct i915_mmu_notifier *mmu;
+
+	/* Protected by dev->struct_mutex */
+	hash_for_each_possible(dev_priv->mmu_notifiers, mmu, node, (unsigned long)mm)
+		if (mmu->mm == mm)
+			return mmu;
+
+	return NULL;
+}
+
+static struct i915_mmu_notifier *
+i915_mmu_notifier_get(struct drm_device *dev, struct mm_struct *mm)
+{
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct i915_mmu_notifier *mmu;
+	int ret;
+
+	lockdep_assert_held(&dev->struct_mutex);
+
+	mmu = __i915_mmu_notifier_lookup(dev, mm);
+	if (mmu)
+		return mmu;
+
+	mmu = kmalloc(sizeof(*mmu), GFP_KERNEL);
+	if (mmu == NULL)
+		return ERR_PTR(-ENOMEM);
+
+	spin_lock_init(&mmu->lock);
+	mmu->dev = dev;
+	mmu->mn.ops = &i915_gem_userptr_notifier;
+	mmu->mm = mm;
+	mmu->objects = RB_ROOT;
+	mmu->count = 0;
+	mmu->serial = 0;
+
+	/* Protected by mmap_sem (write-lock) */
+	ret = __mmu_notifier_register(&mmu->mn, mm);
+	if (ret) {
+		kfree(mmu);
+		return ERR_PTR(ret);
+	}
+
+	/* Protected by dev->struct_mutex */
+	hash_add(dev_priv->mmu_notifiers, &mmu->node, (unsigned long)mm);
+	return mmu;
+}
+
+static void
+__i915_mmu_notifier_destroy_worker(struct work_struct *work)
+{
+	struct i915_mmu_notifier *mmu = container_of(work, typeof(*mmu), work);
+	mmu_notifier_unregister(&mmu->mn, mmu->mm);
+	kfree(mmu);
+}
+
+static void
+__i915_mmu_notifier_destroy(struct i915_mmu_notifier *mmu)
+{
+	lockdep_assert_held(&mmu->dev->struct_mutex);
+
+	/* Protected by dev->struct_mutex */
+	hash_del(&mmu->node);
+
+	/* Our lock ordering is: mmap_sem, mmu_notifier_srcu, struct_mutex.
+	 * We enter the function holding struct_mutex, therefore we need
+	 * to drop our mutex prior to calling mmu_notifier_unregister in
+	 * order to prevent lock inversion (and system-wide deadlock)
+	 * between the mmap_sem and struct_mutex. Hence we defer the
+	 * unregistration to a workqueue where we hold no locks.
+	 */
+	INIT_WORK(&mmu->work, __i915_mmu_notifier_destroy_worker);
+	schedule_work(&mmu->work);
+}
+
+static void __i915_mmu_notifier_update_serial(struct i915_mmu_notifier *mmu)
+{
+	if (++mmu->serial == 0)
+		mmu->serial = 1;
+}
+
+static void
+i915_mmu_notifier_del(struct i915_mmu_notifier *mmu,
+		      struct i915_mmu_object *mn)
+{
+	lockdep_assert_held(&mmu->dev->struct_mutex);
+
+	spin_lock(&mmu->lock);
+	interval_tree_remove(&mn->it, &mmu->objects);
+	__i915_mmu_notifier_update_serial(mmu);
+	spin_unlock(&mmu->lock);
+
+	/* Protected against _add() by dev->struct_mutex */
+	if (--mmu->count == 0)
+		__i915_mmu_notifier_destroy(mmu);
+}
+
+static int
+i915_mmu_notifier_add(struct i915_mmu_notifier *mmu,
+		      struct i915_mmu_object *mn)
+{
+	struct interval_tree_node *it;
+	int ret;
+
+	/* Make sure we drop the final active reference (and thereby
+	 * remove the objects from the interval tree) before we do
+	 * the check for overlapping objects.
+	 */
+	ret = i915_mutex_lock_interruptible(mmu->dev);
+	if (ret)
+		return ret;
+
+	i915_gem_retire_requests(mmu->dev);
+
+	/* Disallow overlapping userptr objects */
+	spin_lock(&mmu->lock);
+	it = interval_tree_iter_first(&mmu->objects,
+				      mn->it.start, mn->it.last);
+	if (it) {
+		struct drm_i915_gem_object *obj;
+
+		/* We only need to check the first object as it either
+		 * is idle (and in use elsewhere) or we try again in order
+		 * to give time for the gup-worker to run and flush its
+		 * object references. Afterwards if we find another
+		 * object that is idle (and so referenced elsewhere)
+		 * we know that the overlap with a pinned object is
+		 * genuine.
+		 */
+		obj = container_of(it, struct i915_mmu_object, it)->obj;
+		ret = obj->userptr.active ? -EAGAIN : -EINVAL;
+	} else {
+		interval_tree_insert(&mn->it, &mmu->objects);
+		__i915_mmu_notifier_update_serial(mmu);
+		ret = 0;
+	}
+	spin_unlock(&mmu->lock);
+	mutex_unlock(&mmu->dev->struct_mutex);
+
+	return ret;
+}
+
+static void
+i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj)
+{
+	struct i915_mmu_object *mn;
+
+	mn = obj->userptr.mn;
+	if (mn == NULL)
+		return;
+
+	i915_mmu_notifier_del(mn->mmu, mn);
+	obj->userptr.mn = NULL;
+}
+
+static int
+i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj,
+				    unsigned flags)
+{
+	struct i915_mmu_notifier *mmu;
+	struct i915_mmu_object *mn;
+	int ret;
+
+	if (flags & I915_USERPTR_UNSYNCHRONIZED)
+		return capable(CAP_SYS_ADMIN) ? 0 : -EPERM;
+
+	down_write(&obj->userptr.mm->mmap_sem);
+	ret = i915_mutex_lock_interruptible(obj->base.dev);
+	if (ret == 0) {
+		mmu = i915_mmu_notifier_get(obj->base.dev, obj->userptr.mm);
+		if (!IS_ERR(mmu))
+			mmu->count++; /* preemptive add to act as a refcount */
+		else
+			ret = PTR_ERR(mmu);
+		mutex_unlock(&obj->base.dev->struct_mutex);
+	}
+	up_write(&obj->userptr.mm->mmap_sem);
+	if (ret)
+		return ret;
+
+	mn = kzalloc(sizeof(*mn), GFP_KERNEL);
+	if (mn == NULL) {
+		ret = -ENOMEM;
+		goto destroy_mmu;
+	}
+
+	mn->mmu = mmu;
+	mn->it.start = obj->userptr.ptr;
+	mn->it.last = mn->it.start + obj->base.size - 1;
+	mn->obj = obj;
+
+	ret = i915_mmu_notifier_add(mmu, mn);
+	if (ret)
+		goto free_mn;
+
+	obj->userptr.mn = mn;
+	return 0;
+
+free_mn:
+	kfree(mn);
+destroy_mmu:
+	mutex_lock(&obj->base.dev->struct_mutex);
+	if (--mmu->count == 0)
+		__i915_mmu_notifier_destroy(mmu);
+	mutex_unlock(&obj->base.dev->struct_mutex);
+	return ret;
+}
+
+#else
+
+static void
+i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj)
+{
+}
+
+static int
+i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj,
+				    unsigned flags)
+{
+	if ((flags & I915_USERPTR_UNSYNCHRONIZED) == 0)
+		return -ENODEV;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	return 0;
+}
+#endif
+
+struct get_pages_work {
+	struct work_struct work;
+	struct drm_i915_gem_object *obj;
+	struct task_struct *task;
+};
+
+
+#if IS_ENABLED(CONFIG_SWIOTLB)
+#define swiotlb_active() swiotlb_nr_tbl()
+#else
+#define swiotlb_active() 0
+#endif
+
+static int
+st_set_pages(struct sg_table **st, struct page **pvec, int num_pages)
+{
+	struct scatterlist *sg;
+	int ret, n;
+
+	*st = kmalloc(sizeof(**st), GFP_KERNEL);
+	if (*st == NULL)
+		return -ENOMEM;
+
+	if (swiotlb_active()) {
+		ret = sg_alloc_table(*st, num_pages, GFP_KERNEL);
+		if (ret)
+			goto err;
+
+		for_each_sg((*st)->sgl, sg, num_pages, n)
+			sg_set_page(sg, pvec[n], PAGE_SIZE, 0);
+	} else {
+		ret = sg_alloc_table_from_pages(*st, pvec, num_pages,
+						0, num_pages << PAGE_SHIFT,
+						GFP_KERNEL);
+		if (ret)
+			goto err;
+	}
+
+	return 0;
+
+err:
+	kfree(*st);
+	*st = NULL;
+	return ret;
+}
+
+static void
+__i915_gem_userptr_get_pages_worker(struct work_struct *_work)
+{
+	struct get_pages_work *work = container_of(_work, typeof(*work), work);
+	struct drm_i915_gem_object *obj = work->obj;
+	struct drm_device *dev = obj->base.dev;
+	const int num_pages = obj->base.size >> PAGE_SHIFT;
+	struct page **pvec;
+	int pinned, ret;
+
+	ret = -ENOMEM;
+	pinned = 0;
+
+	pvec = kmalloc(num_pages*sizeof(struct page *),
+		       GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
+	if (pvec == NULL)
+		pvec = drm_malloc_ab(num_pages, sizeof(struct page *));
+	if (pvec != NULL) {
+		struct mm_struct *mm = obj->userptr.mm;
+
+		down_read(&mm->mmap_sem);
+		while (pinned < num_pages) {
+			ret = get_user_pages(work->task, mm,
+					     obj->userptr.ptr + pinned * PAGE_SIZE,
+					     num_pages - pinned,
+					     !obj->userptr.read_only, 0,
+					     pvec + pinned, NULL);
+			if (ret < 0)
+				break;
+
+			pinned += ret;
+		}
+		up_read(&mm->mmap_sem);
+	}
+
+	mutex_lock(&dev->struct_mutex);
+	if (obj->userptr.work != &work->work) {
+		ret = 0;
+	} else if (pinned == num_pages) {
+		ret = st_set_pages(&obj->pages, pvec, num_pages);
+		if (ret == 0) {
+			list_add_tail(&obj->global_list, &to_i915(dev)->mm.unbound_list);
+			pinned = 0;
+		}
+	}
+
+	obj->userptr.work = ERR_PTR(ret);
+	obj->userptr.active--;
+	drm_gem_object_unreference(&obj->base);
+	mutex_unlock(&dev->struct_mutex);
+
+	release_pages(pvec, pinned, 0);
+	drm_free_large(pvec);
+
+	put_task_struct(work->task);
+	kfree(work);
+}
+
+static int
+i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
+{
+	const int num_pages = obj->base.size >> PAGE_SHIFT;
+	struct page **pvec;
+	int pinned, ret;
+
+	/* If userspace should engineer that these pages are replaced in
+	 * the vma between us binding this page into the GTT and completion
+	 * of rendering... Their loss. If they change the mapping of their
+	 * pages they need to create a new bo to point to the new vma.
+	 *
+	 * However, that still leaves open the possibility of the vma
+	 * being copied upon fork. Which falls under the same userspace
+	 * synchronisation issue as a regular bo, except that this time
+	 * the process may not be expecting that a particular piece of
+	 * memory is tied to the GPU.
+	 *
+	 * Fortunately, we can hook into the mmu_notifier in order to
+	 * discard the page references prior to anything nasty happening
+	 * to the vma (discard or cloning) which should prevent the more
+	 * egregious cases from causing harm.
+	 */
+
+	pvec = NULL;
+	pinned = 0;
+	if (obj->userptr.mm == current->mm) {
+		pvec = kmalloc(num_pages*sizeof(struct page *),
+			       GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
+		if (pvec == NULL) {
+			pvec = drm_malloc_ab(num_pages, sizeof(struct page *));
+			if (pvec == NULL)
+				return -ENOMEM;
+		}
+
+		pinned = __get_user_pages_fast(obj->userptr.ptr, num_pages,
+					       !obj->userptr.read_only, pvec);
+	}
+	if (pinned < num_pages) {
+		if (pinned < 0) {
+			ret = pinned;
+			pinned = 0;
+		} else {
+			/* Spawn a worker so that we can acquire the
+			 * user pages without holding our mutex. Access
+			 * to the user pages requires mmap_sem, and we have
+			 * a strict lock ordering of mmap_sem, struct_mutex -
+			 * we already hold struct_mutex here and so cannot
+			 * call gup without encountering a lock inversion.
+			 *
+			 * Userspace will keep on repeating the operation
+			 * (thanks to EAGAIN) until either we hit the fast
+			 * path or the worker completes. If the worker is
+			 * cancelled or superseded, the task is still run
+			 * but the results ignored. (This leads to
+			 * complications that we may have a stray object
+			 * refcount that we need to be wary of when
+			 * checking for existing objects during creation.)
+			 * If the worker encounters an error, it reports
+			 * that error back to this function through
+			 * obj->userptr.work = ERR_PTR.
+			 */
+			ret = -EAGAIN;
+			if (obj->userptr.work == NULL &&
+			    obj->userptr.active < I915_GEM_USERPTR_MAX_ACTIVE) {
+				struct get_pages_work *work;
+
+				work = kmalloc(sizeof(*work), GFP_KERNEL);
+				if (work != NULL) {
+					obj->userptr.work = &work->work;
+					obj->userptr.active++;
+
+					work->obj = obj;
+					drm_gem_object_reference(&obj->base);
+
+					work->task = current;
+					get_task_struct(work->task);
+
+					INIT_WORK(&work->work, __i915_gem_userptr_get_pages_worker);
+					schedule_work(&work->work);
+				} else
+					ret = -ENOMEM;
+			} else {
+				if (IS_ERR(obj->userptr.work)) {
+					ret = PTR_ERR(obj->userptr.work);
+					obj->userptr.work = NULL;
+				}
+			}
+		}
+	} else {
+		ret = st_set_pages(&obj->pages, pvec, num_pages);
+		if (ret == 0) {
+			obj->userptr.work = NULL;
+			pinned = 0;
+		}
+	}
+
+	release_pages(pvec, pinned, 0);
+	drm_free_large(pvec);
+	return ret;
+}
+
+static void
+i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj)
+{
+	struct scatterlist *sg;
+	int i;
+
+	BUG_ON(obj->userptr.work != NULL);
+
+	if (obj->madv != I915_MADV_WILLNEED)
+		obj->dirty = 0;
+
+	for_each_sg(obj->pages->sgl, sg, obj->pages->nents, i) {
+		struct page *page = sg_page(sg);
+
+		if (obj->dirty)
+			set_page_dirty(page);
+
+		mark_page_accessed(page);
+		page_cache_release(page);
+	}
+	obj->dirty = 0;
+
+	sg_free_table(obj->pages);
+	kfree(obj->pages);
+}
+
+static void
+i915_gem_userptr_release(struct drm_i915_gem_object *obj)
+{
+	i915_gem_userptr_release__mmu_notifier(obj);
+
+	if (obj->userptr.mm) {
+		mmput(obj->userptr.mm);
+		obj->userptr.mm = NULL;
+	}
+}
+
+static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
+	.get_pages = i915_gem_userptr_get_pages,
+	.put_pages = i915_gem_userptr_put_pages,
+	.release = i915_gem_userptr_release,
+};
+
+/**
+ * Creates a new mm object that wraps some normal memory from the process
+ * context - user memory.
+ *
+ * We impose several restrictions upon the memory being mapped
+ * into the GPU.
+ * 1. It must be page aligned (both start/end addresses, i.e. ptr and size).
+ * 2. It cannot overlap any other userptr object in the same address space.
+ * 3. It must be normal system memory, not a pointer into another map of IO
+ *    space (e.g. it must not be a GTT mmapping of another object).
+ * 4. We only allow a bo as large as we could in theory map into the GTT,
+ *    that is we limit the size to the total size of the GTT.
+ * 5. The bo is marked as being snoopable. The backing pages are left
+ *    accessible directly by the CPU, but reads and writes by the GPU may
+ *    incur the cost of a snoop (unless you have an LLC architecture).
+ *
+ * Synchronisation between multiple users and the GPU is left to userspace
+ * through the normal set-domain-ioctl. The kernel will enforce that the
+ * GPU relinquishes the VMA before it is returned back to the system
+ * i.e. upon free(), munmap() or process termination. However, the userspace
+ * malloc() library may not immediately relinquish the VMA after free() and
+ * instead reuse it whilst the GPU is still reading and writing to the VMA.
+ * Caveat emptor.
+ *
+ * Also note that the object created here is not currently a "first class"
+ * object, in that several ioctls are banned. These are the CPU access
+ * ioctls: mmap(), pwrite and pread. In practice, you are expected to use
+ * direct access via your pointer rather than use those ioctls.
+ *
+ * If you think this is a good interface to use to pass GPU memory between
+ * drivers, please use dma-buf instead. In fact, wherever possible use
+ * dma-buf instead.
+ */
+int
+i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_gem_userptr *args = data;
+	struct drm_i915_gem_object *obj;
+	int ret;
+	u32 handle;
+
+	if (args->flags & ~(I915_USERPTR_READ_ONLY |
+			    I915_USERPTR_UNSYNCHRONIZED))
+		return -EINVAL;
+
+	if (offset_in_page(args->user_ptr | args->user_size))
+		return -EINVAL;
+
+	if (args->user_size > dev_priv->gtt.base.total)
+		return -E2BIG;
+
+	if (!access_ok(args->flags & I915_USERPTR_READ_ONLY ? VERIFY_READ : VERIFY_WRITE,
+		       (char __user *)(unsigned long)args->user_ptr, args->user_size))
+		return -EFAULT;
+
+	if (args->flags & I915_USERPTR_READ_ONLY) {
+		/* On almost all of the current hw, we cannot tell the GPU that a
+		 * page is readonly, so this is just a placeholder in the uAPI.
+		 */
+		return -ENODEV;
+	}
+
+	/* Allocate the new object */
+	obj = i915_gem_object_alloc(dev);
+	if (obj == NULL)
+		return -ENOMEM;
+
+	drm_gem_private_object_init(dev, &obj->base, args->user_size);
+	i915_gem_object_init(obj, &i915_gem_userptr_ops);
+	obj->cache_level = I915_CACHE_LLC;
+	obj->base.write_domain = I915_GEM_DOMAIN_CPU;
+	obj->base.read_domains = I915_GEM_DOMAIN_CPU;
+
+	obj->userptr.ptr = args->user_ptr;
+	obj->userptr.read_only = !!(args->flags & I915_USERPTR_READ_ONLY);
+
+	/* And keep a pointer to the current->mm for resolving the user pages
+	 * at binding. This means that we need to hook into the mmu_notifier
+	 * in order to detect if the mmu is destroyed.
+	 */
+	ret = -ENOMEM;
+	if ((obj->userptr.mm = get_task_mm(current)))
+		ret = i915_gem_userptr_init__mmu_notifier(obj, args->flags);
+	if (ret == 0)
+		ret = drm_gem_handle_create(file, &obj->base, &handle);
+
+	/* drop reference from allocate - handle holds it now */
+	drm_gem_object_unreference_unlocked(&obj->base);
+	if (ret)
+		return ret;
+
+	args->handle = handle;
+	return 0;
+}
+
+int
+i915_gem_init_userptr(struct drm_device *dev)
+{
+#if defined(CONFIG_MMU_NOTIFIER)
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	hash_init(dev_priv->mmu_notifiers);
+#endif
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index d639d6f..49f3200 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -205,6 +205,7 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
 		err_puts(m, tiling_flag(err->tiling));
 		err_puts(m, dirty_flag(err->dirty));
 		err_puts(m, purgeable_flag(err->purgeable));
+		err_puts(m, err->userptr ? " userptr" : "");
 		err_puts(m, err->ring != -1 ? " " : "");
 		err_puts(m, ring_str(err->ring));
 		err_puts(m, i915_cache_level_str(err->cache_level));
@@ -655,6 +656,7 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->tiling = obj->tiling_mode;
 	err->dirty = obj->dirty;
 	err->purgeable = obj->madv != I915_MADV_WILLNEED;
+	err->userptr = obj->userptr.mm != NULL;
 	err->ring = obj->ring ? obj->ring->id : -1;
 	err->cache_level = obj->cache_level;
 }
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 6306a84..dd98463 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -223,6 +223,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_GET_CACHING	0x30
 #define DRM_I915_REG_READ		0x31
 #define DRM_I915_GET_RESET_STATS	0x32
+#define DRM_I915_GEM_USERPTR		0x34
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
 #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
@@ -273,6 +274,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
 #define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
 #define DRM_IOCTL_I915_GET_RESET_STATS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
+#define DRM_IOCTL_I915_GEM_USERPTR			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_USERPTR, struct drm_i915_gem_userptr)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1051,4 +1053,18 @@ struct drm_i915_reset_stats {
 	__u32 pad;
 };
 
+struct drm_i915_gem_userptr {
+	__u64 user_ptr;
+	__u64 user_size;
+	__u32 flags;
+#define I915_USERPTR_READ_ONLY 0x1
+#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
+	/**
+	 * Returned handle for the object.
+	 *
+	 * Object handles are nonzero.
+	 */
+	__u32 handle;
+};
+
 #endif /* _UAPI_I915_DRM_H_ */
-- 
1.9.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 55/56] drm/i915: Track userptr VMAs
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (53 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 54/56] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-10  3:59 ` [PATCH 56/56] drm/i915/userptr: Mirror GPU addr at ioctl (HACK/POC) Ben Widawsky
  2014-05-11 17:33 ` [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Daniel Vetter
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This HACK lets users reuse the userptr ioctl to pre-reserve a VMA at a
specific location. The VMA follows all the same paths as other userptr
objects - only the drm_mm node is actually allocated.

Again, this patch is a big HACK to enable some other people who are
currently using userptr.
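
For anyone trying to follow the pin path, the change can be modelled
roughly as below. This is only a sketch under assumptions, not the driver
code: struct toy_vma and the toy_* helpers are made up for illustration.
Only PIN_BOUND (0x10) and the uptr/uptr_bind flag names come from the
patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PIN_BOUND 0x10 /* value taken from the patch */

/* Hypothetical stand-in for the bits of struct i915_vma used here. */
struct toy_vma {
	bool node_allocated; /* drm_mm node has been reserved or inserted */
	bool uptr;           /* node was reserved via the userptr ioctl */
	bool uptr_bind;      /* the reserved node has actually been bound */
};

/* Mirrors the execbuffer hunk: userptr'd VMAs are pinned with PIN_BOUND. */
unsigned int toy_reserve_flags(const struct toy_vma *vma)
{
	assert(vma != NULL);
	return vma->uptr ? PIN_BOUND : 0;
}

/* Mirrors the condition in i915_gem_object_pin() after this patch. */
bool toy_needs_bind(const struct toy_vma *vma, unsigned int flags)
{
	if (vma == NULL || !vma->node_allocated)
		return true;
	return (flags & PIN_BOUND) && !vma->uptr_bind;
}

/* Mirrors the "goto skip_search" in i915_gem_object_bind_to_vm(). */
bool toy_needs_search(unsigned int flags)
{
	return !(flags & PIN_BOUND);
}
```

In this model a pre-reserved VMA still goes through the bind step once,
but never through the drm_mm search, since its node already exists.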

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h            |  1 +
 drivers/gpu/drm/i915/i915_gem.c            | 22 +++++++++++++++++++---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +++
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  4 ++++
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 60513e7..71e39ff 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2166,6 +2166,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
 #define PIN_GLOBAL 0x4
 #define PIN_ALIASING 0x8
 #define PIN_GLOBAL_ALIASED (PIN_ALIASING | PIN_GLOBAL)
+#define PIN_BOUND	0x10
 int __must_check i915_gem_object_pin(struct drm_i915_gem_object *obj,
 				     struct i915_address_space *vm,
 				     uint32_t alignment,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 287d48e..ff75971 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3279,7 +3279,13 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	if (IS_ERR(vma))
 		goto err_unpin;
 
+	if (flags & PIN_BOUND) {
+		WARN_ON(!vma->node.allocated && !vma->obj->userptr.ptr);
+		goto skip_search;
+	}
+
 search_free:
+	WARN_ON(vma->node.allocated);
 	ret = drm_mm_insert_node_in_range_generic(&vm->mm, &vma->node,
 						  size, alignment,
 						  obj->cache_level, 0, gtt_max,
@@ -3293,6 +3299,7 @@ search_free:
 
 		goto err_free_vma;
 	}
+skip_search:
 	if (WARN_ON(!i915_gem_valid_gtt_space(dev, &vma->node,
 					      obj->cache_level))) {
 		ret = -EINVAL;
@@ -3329,10 +3336,13 @@ search_free:
 	i915_gem_vma_bind(vma, obj->cache_level, vma_bind_flags);
 
 	i915_gem_verify_gtt(dev);
+	if (flags & PIN_BOUND)
+		vma->uptr_bind = 1;
 	return vma;
 
 err_remove_node:
-	drm_mm_remove_node(&vma->node);
+	if ((flags & PIN_BOUND) == 0)
+		drm_mm_remove_node(&vma->node);
 err_free_vma:
 	i915_gem_vma_destroy(vma);
 	vma = ERR_PTR(ret);
@@ -3875,6 +3885,11 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
 	if (WARN_ON(flags & (PIN_GLOBAL | PIN_MAPPABLE) && !i915_is_ggtt(vm)))
 		return -EINVAL;
 
+	if (flags & PIN_BOUND) {
+		if (WARN_ON(flags & ~PIN_BOUND))
+			return -EINVAL;
+	}
+
 	vma = i915_gem_obj_to_vma(obj, vm);
 	if (vma) {
 		if (WARN_ON(vma->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT))
@@ -3898,7 +3913,8 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
 		}
 	}
 
-	if (vma == NULL || !drm_mm_node_allocated(&vma->node)) {
+	if (vma == NULL || !drm_mm_node_allocated(&vma->node) ||
+	    ((flags & PIN_BOUND) && !vma->uptr_bind)) {
 		vma = i915_gem_object_bind_to_vm(obj, vm, alignment, flags);
 		if (IS_ERR(vma))
 			return PTR_ERR(vma);
@@ -4265,7 +4281,7 @@ struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 
 void i915_gem_vma_destroy(struct i915_vma *vma)
 {
-	WARN_ON(vma->node.allocated);
+	WARN_ON(vma->node.allocated && !vma->uptr);
 
 	/* Keep the vma as a placeholder in the execbuffer reservation lists */
 	if (!list_empty(&vma->exec_list))
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 08fde7d..596e51e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -566,6 +566,9 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
 		flags |= PIN_GLOBAL;
 
+	if (vma->uptr)
+		flags |= PIN_BOUND;
+
 	ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, flags);
 	if (ret)
 		return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c265c23..bdb4b05 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -171,6 +171,10 @@ struct i915_vma {
 	unsigned int pin_count:4;
 #define DRM_I915_GEM_OBJECT_MAX_PIN_COUNT 0xf
 
+	/* FIXME: */
+	unsigned int uptr:1; /* Whether this VMA has been userptr'd */
+	unsigned int uptr_bind:1; /* Whether we've actually bound it */
+
 	/** Unmap an object from an address space. This usually consists of
 	 * setting the valid PTE entries to a reserved scratch page. */
 	void (*unbind_vma)(struct i915_vma *vma);
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 56/56] drm/i915/userptr: Mirror GPU addr at ioctl (HACK/POC)
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (54 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 55/56] drm/i915: Track userptr VMAs Ben Widawsky
@ 2014-05-10  3:59 ` Ben Widawsky
  2014-05-11 17:33 ` [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Daniel Vetter
  56 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2014-05-10  3:59 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This is needed for the proof of concept work that will allow mirrored
GPU addressing via the existing userptr interface. Part of the hack
involves passing the context ID to the ioctl in order to get a VM.
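
A rough sketch of what userspace would fill in for a mirrored mapping
follows. It is illustrative only and makes several assumptions: the
toy_* names are hypothetical, the struct simply copies the field layout
from the uAPI hunk in this patch, pages are assumed to be 4 KiB, and the
actual ioctl call is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Flag values as defined by this patch, written as unsigned shifts. */
#define TOY_USERPTR_READ_ONLY      (1u << 0)
#define TOY_USERPTR_GPU_MIRROR     (1u << 1)
#define TOY_USERPTR_UNSYNCHRONIZED (1u << 31)

/* Local copy of the drm_i915_gem_userptr layout after this patch. */
struct toy_userptr_args {
	uint64_t user_ptr;
	uint64_t user_size;
	uint32_t ctx_id;
	uint32_t flags;
	uint32_t handle; /* written back by the kernel */
	uint32_t pad;
};

/*
 * Fill the arguments for a GPU-mirrored userptr object. The alignment
 * test matches the ioctl's offset_in_page(user_ptr | user_size) check,
 * so a misaligned request is rejected before it reaches the kernel.
 */
int toy_fill_mirror_args(struct toy_userptr_args *args,
			 uint64_t ptr, uint64_t size, uint32_t ctx_id)
{
	assert(args != NULL);

	if ((ptr | size) & (TOY_PAGE_SIZE - 1))
		return -EINVAL;

	memset(args, 0, sizeof(*args));
	args->user_ptr = ptr;   /* CPU address, reused as the GPU offset */
	args->user_size = size;
	args->ctx_id = ctx_id;  /* selects the context whose VM is used */
	args->flags = TOY_USERPTR_GPU_MIRROR;
	return 0;
}
```

The filled structure would then be passed to DRM_IOCTL_I915_GEM_USERPTR.
On the kernel side the request is refused with -ENODEV unless the device
has 48b PPGTT.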

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 120 +++++++++++++++++++++++++-------
 include/uapi/drm/i915_drm.h             |   7 +-
 2 files changed, 98 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 5da37cc..795ea3e 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -224,10 +224,6 @@ i915_mmu_notifier_add(struct i915_mmu_notifier *mmu,
 	 * remove the objects from the interval tree) before we do
 	 * the check for overlapping objects.
 	 */
-	ret = i915_mutex_lock_interruptible(mmu->dev);
-	if (ret)
-		return ret;
-
 	i915_gem_retire_requests(mmu->dev);
 
 	/* Disallow overlapping userptr objects */
@@ -253,7 +249,6 @@ i915_mmu_notifier_add(struct i915_mmu_notifier *mmu,
 		ret = 0;
 	}
 	spin_unlock(&mmu->lock);
-	mutex_unlock(&mmu->dev->struct_mutex);
 
 	return ret;
 }
@@ -283,19 +278,12 @@ i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj,
 		return capable(CAP_SYS_ADMIN) ? 0 : -EPERM;
 
 	down_write(&obj->userptr.mm->mmap_sem);
-	ret = i915_mutex_lock_interruptible(obj->base.dev);
-	if (ret == 0) {
-		mmu = i915_mmu_notifier_get(obj->base.dev, obj->userptr.mm);
-		if (!IS_ERR(mmu))
-			mmu->count++; /* preemptive add to act as a refcount */
-		else
-			ret = PTR_ERR(mmu);
-		mutex_unlock(&obj->base.dev->struct_mutex);
-	}
+	mmu = i915_mmu_notifier_get(obj->base.dev, obj->userptr.mm);
+	if (!IS_ERR(mmu))
+		mmu->count++; /* preemptive add to act as a refcount */
+	else
+		ret = PTR_ERR(mmu);
 	up_write(&obj->userptr.mm->mmap_sem);
-	if (ret)
-		return ret;
-
 	mn = kzalloc(sizeof(*mn), GFP_KERNEL);
 	if (mn == NULL) {
 		ret = -ENOMEM;
@@ -588,6 +576,52 @@ i915_gem_userptr_release(struct drm_i915_gem_object *obj)
 	}
 }
 
+/* Carve out the address space for later use */
+static int i915_gem_userptr_reserve_vma(struct drm_i915_gem_object *obj,
+					struct i915_address_space *vm,
+					uint64_t offset,
+					uint64_t size)
+{
+	struct i915_vma *vma;
+	int ret;
+
+	vma = i915_gem_obj_to_vma(obj, vm);
+	if (vma)
+		return -ENXIO;
+
+	vma = i915_gem_obj_lookup_or_create_vma(obj, vm);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
+
+	BUG_ON(!drm_mm_initialized(&vm->mm));
+
+	if (vma->uptr) {
+		DRM_INFO("Already had a userptr\n");
+		return 0;
+	}
+	if (vma->node.allocated) {
+		DRM_INFO("Node was previously allocated\n");
+		return -EBUSY;
+	}
+
+	vma->node.start = offset;
+	vma->node.size = size;
+	vma->node.color = 0;
+	ret = drm_mm_reserve_node(&vm->mm, &vma->node);
+	if (ret) {
+		/* There are two reasons this can fail.
+		 * 1. The user is using a mix of relocs and userptr, and a reloc
+		 * won.
+		 * TODO: handle better.
+		 */
+		return ret;
+	}
+
+	vma->uptr = 1;
+
+	return 0;
+}
+
 static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
 	.get_pages = i915_gem_userptr_get_pages,
 	.put_pages = i915_gem_userptr_put_pages,
@@ -630,37 +664,62 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
 int
 i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct drm_i915_gem_userptr *args = data;
 	struct drm_i915_gem_object *obj;
+	struct i915_hw_context *ctx;
+	struct i915_address_space *vm;
 	int ret;
 	u32 handle;
 
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret)
+		return ret;
+
+#define goto_err(__err) do { \
+	ret = (__err); \
+	goto out; \
+} while (0)
+
+	ctx = i915_gem_context_get(file_priv, args->ctx_id);
+	if (IS_ERR(ctx))
+		goto_err(PTR_ERR(ctx));
+
+	/* i915_gem_context_reference(ctx); */
+
 	if (args->flags & ~(I915_USERPTR_READ_ONLY |
+			    I915_USERPTR_GPU_MIRROR |
 			    I915_USERPTR_UNSYNCHRONIZED))
-		return -EINVAL;
+		goto_err(-EINVAL);
 
 	if (offset_in_page(args->user_ptr | args->user_size))
-		return -EINVAL;
-
-	if (args->user_size > dev_priv->gtt.base.total)
-		return -E2BIG;
+		goto_err(-EINVAL);
 
 	if (!access_ok(args->flags & I915_USERPTR_READ_ONLY ? VERIFY_READ : VERIFY_WRITE,
 		       (char __user *)(unsigned long)args->user_ptr, args->user_size))
-		return -EFAULT;
+		goto_err(-EFAULT);
 
 	if (args->flags & I915_USERPTR_READ_ONLY) {
 		/* On almost all of the current hw, we cannot tell the GPU that a
 		 * page is readonly, so this is just a placeholder in the uAPI.
 		 */
-		return -ENODEV;
+		goto_err(-ENODEV);
+	}
+
+	vm = ctx->vm;
+	if (args->user_size > vm->total)
+		goto_err(-E2BIG);
+
+	if (args->flags & I915_USERPTR_GPU_MIRROR) {
+		if (!HAS_48B_PPGTT(dev))
+			goto_err(-ENODEV);
 	}
 
 	/* Allocate the new object */
 	obj = i915_gem_object_alloc(dev);
 	if (obj == NULL)
-		return -ENOMEM;
+		goto_err(-ENOMEM);
+#undef goto_err
 
 	drm_gem_private_object_init(dev, &obj->base, args->user_size);
 	i915_gem_object_init(obj, &i915_gem_userptr_ops);
@@ -680,9 +739,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file
 		ret = i915_gem_userptr_init__mmu_notifier(obj, args->flags);
 	if (ret == 0)
 		ret = drm_gem_handle_create(file, &obj->base, &handle);
+	if (ret == 0 && args->flags & I915_USERPTR_GPU_MIRROR) {
+		ret = i915_gem_userptr_reserve_vma(obj, vm, args->user_ptr, args->user_size);
+		if (ret)
+			DRM_DEBUG_DRIVER("Failed to reserve GPU mirror %d\n", ret);
+	}
 
 	/* drop reference from allocate - handle holds it now */
-	drm_gem_object_unreference_unlocked(&obj->base);
+	drm_gem_object_unreference(&obj->base);
+out:
+	mutex_unlock(&dev->struct_mutex);
 	if (ret)
 		return ret;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index dd98463..addab22 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1056,15 +1056,18 @@ struct drm_i915_reset_stats {
 struct drm_i915_gem_userptr {
 	__u64 user_ptr;
 	__u64 user_size;
+	__u32 ctx_id;
 	__u32 flags;
-#define I915_USERPTR_READ_ONLY 0x1
-#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
+#define I915_USERPTR_READ_ONLY		(1<<0)
+#define I915_USERPTR_GPU_MIRROR		(1<<1)
+#define I915_USERPTR_UNSYNCHRONIZED	(1<<31)
 	/**
 	 * Returned handle for the object.
 	 *
 	 * Object handles are nonzero.
 	 */
 	__u32 handle;
+	__u32 pad;
 };
 
 #endif /* _UAPI_I915_DRM_H_ */
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror
  2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
                   ` (55 preceding siblings ...)
  2014-05-10  3:59 ` [PATCH 56/56] drm/i915/userptr: Mirror GPU addr at ioctl (HACK/POC) Ben Widawsky
@ 2014-05-11 17:33 ` Daniel Vetter
  56 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2014-05-11 17:33 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Sat, May 10, 2014 at 5:58 AM, Ben Widawsky
<benjamin.widawsky@intel.com> wrote:
> Just as before, these patches are living based off of my Broadwell
> branch, here:
> http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=gpu_mirror
>
> This is the follow-on patches for [1]
>
> This patch series brings 3 things:
> 1. Dynamic page table allocation for gen6-8
> 2. 64b (48b canonical) graphics virtual address space for Broadwell
> 3. An interface to specify a specific offset for a BO.
>
> It's taken way longer than I thought to get this work done, and given
> the current state of our driver, I fear I may not have time to see this
> through to the end before I am pulled onto other things. If people want
> to send me smallish bugfixes, I will gladly do my best to fix them
> quickly. If there are more substantial change requests wrt design or
> patch reorganization, I will not be able to accommodate. Someone else
> must take over this patch series at that point if they want these
> features. I do believe that everything up until the userptr patch is in
> decent shape though, so we'll see, I guess. (if you are qualified to
> take this over, and have interest, please let me know).
>
> The patch series is highly volatile and not manicured. I've run exactly
> 1 test on the GPU mirror (see below for what that means), though many
> more on the prior stuff. The series depends on full PPGTT, which is not
> yet enabled by default, and has a few outstanding issues. It also has
> been developed exclusively on pre-production hardware. I am only sending
> out now because I will be on vacation for the next 10 days, and I know
> there are people that can benefit from this code before I return. With
> that, I got the last parts of this working very recently, and they're
> very hackish. The reason for this lack of refinement is I expect the
> interfaces for letting userspace dictate things to change (more on this
> later), and the other part is I just ran out of time before my vacation.
> Throughout development, I've been hitting issues which I am not yet sure
> if they are bugs in my code, bugs in full PPGTT, bugs in userptr, or
> generally flakiness. There are a few patches in here which say TESTME
> reflecting upon this. Also, if you want to run this, I highly recommend
> turning off semaphores, and rc6. (To be honest, I've not tried it
> recently). You also need to turn on PPGTT since it is disabled by
> default.
>
> modprobe i915 enable_ppgtt=2 semaphores=0 enable_rc6=0
>
> What you get in this series is what I'm going to coin, GPU mirror. This
> patch series allows one to allocate an arbitrary address for your GPU
> buffer object, and map it to a specific space within the GPU's address
> space. This is only possible because on Broadwell we get a 64b canonical
> GPU address space, and this allows us to map any CPU address as a GPU
> address. The obvious usage here is malloc(). malloc() returns a pointer
> that is valid on the CPU. Now that address can be identical on the GPU.
>
> The interface provided is identical to the userptr interface previously
> posted by Chris Wilson. I've added a flag to that interface that
> indicates this new functionality. This is not necessarily the final
> version, and it's arguably not the best idea either. The reason for this
> choice is we had users of userptr that wanted to try out this concept
> and not have to do much porting.
>
> To get to the userptr interface, I had to make a few things happen
> first. I needed to get dynamic page table allocation and teardown
> working. This was posted previously for gen6-7 [1] (with very rough code
> for gen8). I've now added more robust support for gen8 dynamic page
> table allocations. Doing the allocations dynamically was important
> because preallocating all 4 levels of page tables is not feasible in a
> real system. 4 level page tables are required in order to be able to
> support the 64b canonical address space.
>
> With that all done, I was able to make a few minor hacks to userptr,
> take the intel-gpu-tools test from Tvrtko, and see at least one pass.
> FWIW, I am currently running,
> ./tests/gem_userptr_blits --run-subtest coherency-unsync
>
> Since I feel the interface will likely change, I do not feel compelled
> to post either my libdrm or my IGT changes. If you want the modified
> test, let me know, as I don't think it's really relevant here.
>
> One last thing. Intel GPU tools, as it stands today, makes a lot of
> assumptions about using an address space > 32b. I have not had time to
> fix this. It is something which needs fixing before this series could
> even be considered testable.

Until full ppgtt is fixed up and enabled by default, it doesn't make
sense, in my opinion, to pile more ppgtt features on top. Until that's
addressed, or someone convinces me that my opinion is stupid, I'll
reject this.

Of course that doesn't apply to the entire patch series, since the
dynamic ppgtt pte stuff is part of fixing up ppgtt. But I still think
we should address the bugs first before we make the code even more
complicated, so I'd prefer to merge even the dynamic pte stuff only
after full ppgtt is again enabled by default.
-Daniel

>
> [1] http://lists.freedesktop.org/archives/intel-gfx/2014-March/041814.html
>
> Ben Widawsky (54):
>   drm/i915: Fix flush before context switch comment
>   Revert "drm/i915: Drop I915_PARAM_HAS_FULL_PPGTT again"
>   drm/i915: Wrap VMA binding
>   drm/i915: Make pin global flags explicit
>   drm/i915: Split out aliasing binds
>   drm/i915: fix gtt_total_entries()
>   drm/i915: Rename to GEN8_LEGACY_PDPES
>   drm/i915: Split out verbose PPGTT dumping
>   drm/i915: s/pd/pdpe, s/pt/pde
>   drm/i915: rename map/unmap to dma_map/unmap
>   drm/i915: Setup less PPGTT on failed pagedir
>   drm/i915: clean up PPGTT init error path
>   drm/i915: Un-hardcode number of page directories
>   drm/i915: Make gen6_write_pdes gen6_map_page_tables
>   drm/i915: Range clearing is PPGTT agnostic
>   drm/i915: Page table helpers, and define renames
>   drm/i915: construct page table abstractions
>   drm/i915: Complete page table structures
>   drm/i915: Create page table allocators
>   drm/i915: Generalize GEN6 mapping
>   drm/i915: Clean up pagetable DMA map & unmap
>   drm/i915: Always dma map page table allocations
>   drm/i915: Consolidate dma mappings
>   drm/i915: Always dma map page directory allocations
>   drm/i915: Track GEN6 page table usage
>   drm/i915: Extract context switch skip logic
>   drm/i915: Force pd restore when PDEs change, gen6-7
>   drm/i915: Finish gen6/7 dynamic page table allocation
>   drm/i915/bdw: Use dynamic allocation idioms on free
>   drm/i915/bdw: pagedirs rework allocation
>   drm/i915/bdw: pagetable allocation rework
>   drm/i915/bdw: Make the pdp switch a bit less hacky
>   drm/i915: num_pd_pages/num_pd_entries isn't useful
>   drm/i915: Extract PPGTT param from pagedir alloc
>   drm/i915/bdw: Split out mappings
>   drm/i915/bdw: begin bitmap tracking
>   drm/i915/bdw: Dynamic page table allocations
>   drm/i915/bdw: Scratch unused pages
>   drm/i915/bdw: Add ppgtt info for dynamic pages
>   drm/i915/bdw: Optimize PDP loads
>   TESTME: Either drop the last patch or fix it.
>   drm/i915/bdw: Add dynamic page trace events
>   drm/i915/bdw: Make pdp allocation more dynamic
>   drm/i915/bdw: Abstract PDP usage
>   drm/i915/bdw: implement alloc/teardown for 4lvl
>   drm/i915/bdw: 4 level pages tables
>   drm/i915: Restructure map vs. insert entries
>   drm/i915/bdw: make aliasing PPGTT dynamic
>   drm/i915: Expand error state's address width to 64b
>   drm/i915/bdw: Flip the 48b switch
>   TESTME: GFX_TLB_INVALIDATE_EXPLICIT
>   TESTME: Always force invalidate
>   drm/i915: Track userptr VMAs
>   drm/i915/userptr: Mirror GPU addr at ioctl (HACK/POC)
>
> Chris Wilson (2):
>   drm/i915: Prevent signals from interrupting close()
>   drm/i915: Introduce mapping of user pages into video memory (userptr)
>     ioctl
>
>  drivers/gpu/drm/i915/Kconfig               |    1 +
>  drivers/gpu/drm/i915/Makefile              |    1 +
>  drivers/gpu/drm/i915/i915_debugfs.c        |  112 +-
>  drivers/gpu/drm/i915/i915_dma.c            |   15 +-
>  drivers/gpu/drm/i915/i915_drv.h            |   40 +-
>  drivers/gpu/drm/i915/i915_gem.c            |   61 +-
>  drivers/gpu/drm/i915/i915_gem_context.c    |   31 +-
>  drivers/gpu/drm/i915/i915_gem_dmabuf.c     |    5 +
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   22 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.c        | 1810 +++++++++++++++++++++-------
>  drivers/gpu/drm/i915/i915_gem_gtt.h        |  354 +++++-
>  drivers/gpu/drm/i915/i915_gem_userptr.c    |  767 ++++++++++++
>  drivers/gpu/drm/i915/i915_gpu_error.c      |   21 +-
>  drivers/gpu/drm/i915/i915_reg.h            |    1 +
>  drivers/gpu/drm/i915/i915_trace.h          |  140 +++
>  drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +-
>  include/uapi/drm/i915_drm.h                |   20 +
>  17 files changed, 2823 insertions(+), 580 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_gem_userptr.c
>
> --
> 1.9.2
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
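
The cover letter's remark that preallocating all 4 levels of page tables is not feasible can be made concrete with some arithmetic. The sketch below assumes an x86-64-style layout for the 48b address space: 4 KiB pages and 9 index bits (512 eight-byte entries) per level. The thread does not spell this layout out, so treat the constants and the helper names `level_index()` and `full_prealloc_bytes()` as illustrative assumptions.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdint.h>

#define PAGE_SHIFT	12	/* assumed 4 KiB pages and 4 KiB tables */
#define LEVEL_BITS	9	/* assumed 512 entries per table */
#define LEVELS		4

static_assert(LEVELS * LEVEL_BITS + PAGE_SHIFT == 48, "48b address space");

/* Index into the table at a given level for a GPU virtual address;
 * level 0 is the page table, level 3 is the top-level table. */
static unsigned level_index(uint64_t addr, unsigned level)
{
	return (addr >> (PAGE_SHIFT + LEVEL_BITS * level)) &
	       ((1u << LEVEL_BITS) - 1);
}

/* Bytes needed to preallocate every table at every level, walking from
 * the single top-level table down to the page tables. */
static uint64_t full_prealloc_bytes(void)
{
	uint64_t tables = 1, total = 0;

	for (unsigned depth = 0; depth < LEVELS; depth++) {
		total += tables << PAGE_SHIFT;	/* each table is one page */
		tables <<= LEVEL_BITS;		/* 512 children per table */
	}
	return total;
}
```

Under these assumptions the lowest level alone needs 2^27 page tables, or 512 GiB, per address space, so tables have to be allocated on demand as ranges are bound.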

end of thread, other threads:[~2014-05-11 17:33 UTC | newest]

Thread overview: 58+ messages
2014-05-10  3:58 [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Ben Widawsky
2014-05-10  3:58 ` [PATCH 01/56] drm/i915: Fix flush before context switch comment Ben Widawsky
2014-05-10  3:58 ` [PATCH 02/56] Revert "drm/i915: Drop I915_PARAM_HAS_FULL_PPGTT again" Ben Widawsky
2014-05-10  3:58 ` [PATCH 03/56] drm/i915: Prevent signals from interrupting close() Ben Widawsky
2014-05-10  3:58 ` [PATCH 04/56] drm/i915: Wrap VMA binding Ben Widawsky
2014-05-10  3:59 ` [PATCH 05/56] drm/i915: Make pin global flags explicit Ben Widawsky
2014-05-10  3:59 ` [PATCH 06/56] drm/i915: Split out aliasing binds Ben Widawsky
2014-05-10  3:59 ` [PATCH 07/56] drm/i915: fix gtt_total_entries() Ben Widawsky
2014-05-10  3:59 ` [PATCH 08/56] drm/i915: Rename to GEN8_LEGACY_PDPES Ben Widawsky
2014-05-10  3:59 ` [PATCH 09/56] drm/i915: Split out verbose PPGTT dumping Ben Widawsky
2014-05-10  3:59 ` [PATCH 10/56] drm/i915: s/pd/pdpe, s/pt/pde Ben Widawsky
2014-05-10  3:59 ` [PATCH 11/56] drm/i915: rename map/unmap to dma_map/unmap Ben Widawsky
2014-05-10  3:59 ` [PATCH 12/56] drm/i915: Setup less PPGTT on failed pagedir Ben Widawsky
2014-05-10  3:59 ` [PATCH 13/56] drm/i915: clean up PPGTT init error path Ben Widawsky
2014-05-10  3:59 ` [PATCH 14/56] drm/i915: Un-hardcode number of page directories Ben Widawsky
2014-05-10  3:59 ` [PATCH 15/56] drm/i915: Make gen6_write_pdes gen6_map_page_tables Ben Widawsky
2014-05-10  3:59 ` [PATCH 16/56] drm/i915: Range clearing is PPGTT agnostic Ben Widawsky
2014-05-10  3:59 ` [PATCH 17/56] drm/i915: Page table helpers, and define renames Ben Widawsky
2014-05-10  3:59 ` [PATCH 18/56] drm/i915: construct page table abstractions Ben Widawsky
2014-05-10  3:59 ` [PATCH 19/56] drm/i915: Complete page table structures Ben Widawsky
2014-05-10  3:59 ` [PATCH 20/56] drm/i915: Create page table allocators Ben Widawsky
2014-05-10  3:59 ` [PATCH 21/56] drm/i915: Generalize GEN6 mapping Ben Widawsky
2014-05-10  3:59 ` [PATCH 22/56] drm/i915: Clean up pagetable DMA map & unmap Ben Widawsky
2014-05-10  3:59 ` [PATCH 23/56] drm/i915: Always dma map page table allocations Ben Widawsky
2014-05-10  3:59 ` [PATCH 24/56] drm/i915: Consolidate dma mappings Ben Widawsky
2014-05-10  3:59 ` [PATCH 25/56] drm/i915: Always dma map page directory allocations Ben Widawsky
2014-05-10  3:59 ` [PATCH 26/56] drm/i915: Track GEN6 page table usage Ben Widawsky
2014-05-10  3:59 ` [PATCH 27/56] drm/i915: Extract context switch skip logic Ben Widawsky
2014-05-10  3:59 ` [PATCH 28/56] drm/i915: Force pd restore when PDEs change, gen6-7 Ben Widawsky
2014-05-10  3:59 ` [PATCH 29/56] drm/i915: Finish gen6/7 dynamic page table allocation Ben Widawsky
2014-05-10  3:59 ` [PATCH 30/56] drm/i915/bdw: Use dynamic allocation idioms on free Ben Widawsky
2014-05-10  3:59 ` [PATCH 31/56] drm/i915/bdw: pagedirs rework allocation Ben Widawsky
2014-05-10  3:59 ` [PATCH 32/56] drm/i915/bdw: pagetable allocation rework Ben Widawsky
2014-05-10  3:59 ` [PATCH 33/56] drm/i915/bdw: Make the pdp switch a bit less hacky Ben Widawsky
2014-05-10  3:59 ` [PATCH 34/56] drm/i915: num_pd_pages/num_pd_entries isn't useful Ben Widawsky
2014-05-10  3:59 ` [PATCH 35/56] drm/i915: Extract PPGTT param from pagedir alloc Ben Widawsky
2014-05-10  3:59 ` [PATCH 36/56] drm/i915/bdw: Split out mappings Ben Widawsky
2014-05-10  3:59 ` [PATCH 37/56] drm/i915/bdw: begin bitmap tracking Ben Widawsky
2014-05-10  3:59 ` [PATCH 38/56] drm/i915/bdw: Dynamic page table allocations Ben Widawsky
2014-05-10  3:59 ` [PATCH 39/56] drm/i915/bdw: Scratch unused pages Ben Widawsky
2014-05-10  3:59 ` [PATCH 40/56] drm/i915/bdw: Add ppgtt info for dynamic pages Ben Widawsky
2014-05-10  3:59 ` [PATCH 41/56] drm/i915/bdw: Optimize PDP loads Ben Widawsky
2014-05-10  3:59 ` [PATCH 42/56] TESTME: Either drop the last patch or fix it Ben Widawsky
2014-05-10  3:59 ` [PATCH 43/56] drm/i915/bdw: Add dynamic page trace events Ben Widawsky
2014-05-10  3:59 ` [PATCH 44/56] drm/i915/bdw: Make pdp allocation more dynamic Ben Widawsky
2014-05-10  3:59 ` [PATCH 45/56] drm/i915/bdw: Abstract PDP usage Ben Widawsky
2014-05-10  3:59 ` [PATCH 46/56] drm/i915/bdw: implement alloc/teardown for 4lvl Ben Widawsky
2014-05-10  3:59 ` [PATCH 47/56] drm/i915/bdw: 4 level pages tables Ben Widawsky
2014-05-10  3:59 ` [PATCH 48/56] drm/i915: Restructure map vs. insert entries Ben Widawsky
2014-05-10  3:59 ` [PATCH 49/56] drm/i915/bdw: make aliasing PPGTT dynamic Ben Widawsky
2014-05-10  3:59 ` [PATCH 50/56] drm/i915: Expand error state's address width to 64b Ben Widawsky
2014-05-10  3:59 ` [PATCH 51/56] drm/i915/bdw: Flip the 48b switch Ben Widawsky
2014-05-10  3:59 ` [PATCH 52/56] TESTME: GFX_TLB_INVALIDATE_EXPLICIT Ben Widawsky
2014-05-10  3:59 ` [PATCH 53/56] TESTME: Always force invalidate Ben Widawsky
2014-05-10  3:59 ` [PATCH 54/56] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl Ben Widawsky
2014-05-10  3:59 ` [PATCH 55/56] drm/i915: Track userptr VMAs Ben Widawsky
2014-05-10  3:59 ` [PATCH 56/56] drm/i915/userptr: Mirror GPU addr at ioctl (HACK/POC) Ben Widawsky
2014-05-11 17:33 ` [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror Daniel Vetter